All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] RAID-6 check standalone
@ 2011-02-21 20:45 Piergiorgio Sartor
  2011-03-07 19:33 ` Piergiorgio Sartor
  2011-03-21  3:02 ` NeilBrown
  0 siblings, 2 replies; 29+ messages in thread
From: Piergiorgio Sartor @ 2011-02-21 20:45 UTC (permalink / raw)
  To: linux-raid

Hi Neil,

please find attached a patch, to mdadm-3.2 base, including
a standalone versione of the raid-6 check.

This is basically a re-working (and hopefully improvement)
of the already implemented check in "restripe.c".

I splitted the check function into "collect" and "stats",
so that the second one could be easily replaced.
The API is also simplified.

The command line option are reduced, since we only level
is raid-6, but the ":offset" option is included.

The output reports the block/stripe rotation, P/Q errors
and the possible HDD (or unknown).

BTW, the patch applies also to the already patched "restripe.c",
including the last ":offset" patch (which is not yet in git).

Other item is that due to "sysfs.c" linking (see below) the
"Makefile" needed some changes, I hope this is not a problem.

Next steps (TODO list you like) would be:

1) Add the "sysfs.c" code in order to retrieve the HDDs info
from the MD device. It is already linked, together with the
whole (mdadm) universe, since it seems it cannot leave alone.
I'll need some advice or hint on how to do use it. I checked
"sysfs.c", but before I dig deep into it maybe better to
have some advice (maybe just one function call will do it).

2) Add the suspend lo/hi control. Fellow John Robinson was
suggesting to look into "Grow.c", which I did, but I guess
the same story as 1) is valid: better to have some hint on
where to look before wasting time.

3) Add a repair option (future). This should have different
levels, like "all", "disk", "stripe". That is, fix everything
(more or less like "repair"), fix only if a disk is clearly
having problems, fix each stripe which has clearly a problem
(but maybe different stripes may belong to different HDDs).

So, for the point 1) and 2) would be nice to have some more
detail on where to look what. Point 3) we will discuss later.

Thanks, please consider for inclusion,

bye,

pg


diff -urN a/Makefile b/Makefile
--- a/Makefile	2011-02-01 05:31:07.000000000 +0100
+++ b/Makefile	2011-02-20 00:03:44.929998038 +0100
@@ -98,7 +98,7 @@
 MAN5DIR = $(MANDIR)/man5
 MAN8DIR = $(MANDIR)/man8
 
-OBJS =  mdadm.o config.o policy.o mdstat.o  ReadMe.o util.o Manage.o Assemble.o Build.o \
+OBJSX = config.o policy.o mdstat.o  ReadMe.o util.o Manage.o Assemble.o Build.o \
 	Create.o Detail.o Examine.o Grow.o Monitor.o dlink.o Kill.o Query.o \
 	Incremental.o \
 	mdopen.o super0.o super1.o super-ddf.o super-intel.o bitmap.o \
@@ -106,6 +106,8 @@
 	restripe.o sysfs.o sha1.o mapfile.o crc32.o sg_io.o msg.o \
 	platform-intel.o probe_roms.o
 
+OBJS = mdadm.o $(OBJSX)
+
 SRCS =  mdadm.c config.c policy.c mdstat.c  ReadMe.c util.c Manage.c Assemble.c Build.c \
 	Create.c Detail.c Examine.c Grow.c Monitor.c dlink.c Kill.c Query.c \
 	Incremental.c \
@@ -182,6 +184,9 @@
 test_stripe : restripe.c mdadm.h
 	$(CC) $(CXFLAGS) $(LDFLAGS) -o test_stripe -DMAIN restripe.c
 
+raid6check : raid6check.o mdadm.h $(OBJSX)
+	$(CC) $(CXFLAGS) $(LDFLAGS) -o raid6check raid6check.o $(OBJSX)
+
 mdassemble : $(ASSEMBLE_SRCS) $(INCL)
 	rm -f $(OBJS)
 	$(DIET_GCC) $(ASSEMBLE_FLAGS) -o mdassemble $(ASSEMBLE_SRCS)  $(STATICSRC)
@@ -265,7 +270,7 @@
 	mdadm.Os mdadm.O2 mdmon.O2 \
 	mdassemble mdassemble.static mdassemble.auto mdassemble.uclibc \
 	mdassemble.klibc swap_super \
-	init.cpio.gz mdadm.uclibc.static test_stripe mdmon \
+	init.cpio.gz mdadm.uclibc.static test_stripe raid6check raid6check.o mdmon \
 	mdadm.8
 
 dist : clean
diff -urN a/raid6check.c b/raid6check.c
--- a/raid6check.c	1970-01-01 01:00:00.000000000 +0100
+++ b/raid6check.c	2011-02-21 21:40:27.813656214 +0100
@@ -0,0 +1,263 @@
+/*
+ * raid6check - extended consistency check for RAID-6
+ *
+ * Copyright (C) 2011 Piergiorgio Sartor
+ *
+ *
+ *    This program is free software; you can redistribute it and/or modify
+ *    it under the terms of the GNU General Public License as published by
+ *    the Free Software Foundation; either version 2 of the License, or
+ *    (at your option) any later version.
+ *
+ *    This program is distributed in the hope that it will be useful,
+ *    but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *    GNU General Public License for more details.
+ *
+ *    You should have received a copy of the GNU General Public License
+ *    along with this program; if not, write to the Free Software
+ *    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ *
+ *    Author: Piergiorgio Sartor
+ *    Based on "restripe.c" from "mdadm" codebase
+ */
+
+#include "mdadm.h"
+#include <stdint.h>
+
+int geo_map(int block, unsigned long long stripe, int raid_disks,
+	    int level, int layout);
+void qsyndrome(uint8_t *p, uint8_t *q, uint8_t **sources, int disks, int size);
+void make_tables(void);
+
+/* Collect per stripe consistency information */
+void raid6_collect(int chunk_size, uint8_t *p, uint8_t *q,
+		   char *chunkP, char *chunkQ, int *results)
+{
+	int i;
+	int data_id;
+	uint8_t Px, Qx;
+	extern uint8_t raid6_gflog[];
+
+	for(i = 0; i < chunk_size; i++) {
+		Px = (uint8_t)chunkP[i] ^ (uint8_t)p[i];
+		Qx = (uint8_t)chunkQ[i] ^ (uint8_t)q[i];
+
+		if((Px != 0) && (Qx == 0))
+			results[i] = -1;
+
+		if((Px == 0) && (Qx != 0))
+			results[i] = -2;
+
+		if((Px != 0) && (Qx != 0)) {
+			data_id = (raid6_gflog[Qx] - raid6_gflog[Px]);
+			if(data_id < 0) data_id += 255;
+			results[i] = data_id;
+		}
+
+		if((Px == 0) && (Qx == 0))
+			results[i] = -255;
+	}
+}
+
+/* Try to find out if a specific disk has problems */
+int raid6_stats(int *results, int raid_disks, int chunk_size)
+{
+	int i;
+	int curr_broken_disk = -255;
+	int prev_broken_disk = -255;
+	int broken_status = 0;
+	
+	for(i = 0; i < chunk_size; i++) {
+
+		if(results[i] != -255)
+			curr_broken_disk = results[i];
+
+		if(curr_broken_disk >= raid_disks)
+			broken_status = 2;
+
+		switch(broken_status) {
+		case 0:
+			if(curr_broken_disk != -255) {
+				prev_broken_disk = curr_broken_disk;
+				broken_status = 1;
+			}
+			break;
+
+		case 1:
+			if(curr_broken_disk != prev_broken_disk)
+				broken_status = 2;
+			break;
+
+		case 2:
+		default:
+			curr_broken_disk = prev_broken_disk = -65535;
+			break;
+		}
+	}
+
+	return curr_broken_disk;
+}
+
+int check_stripes(int *source, unsigned long long *offsets,
+		  int raid_disks, int chunk_size, int level, int layout,
+		  unsigned long long start, unsigned long long length, char *name[])
+{
+	/* read the data and p and q blocks, and check we got them right */
+	char *stripe_buf = malloc(raid_disks * chunk_size);
+	char **stripes = malloc(raid_disks * sizeof(char*));
+	char **blocks = malloc(raid_disks * sizeof(char*));
+	uint8_t *p = malloc(chunk_size);
+	uint8_t *q = malloc(chunk_size);
+	int *results = malloc(chunk_size * sizeof(int));
+
+	int i;
+	int diskP, diskQ;
+	int data_disks = raid_disks - 2;
+
+	extern int tables_ready;
+
+	if (!tables_ready)
+		make_tables();
+
+	for ( i = 0 ; i < raid_disks ; i++)
+		stripes[i] = stripe_buf + i * chunk_size;
+
+	while (length > 0) {
+		int disk;
+
+		for (i = 0 ; i < raid_disks ; i++) {
+			lseek64(source[i], offsets[i]+start, 0);
+			read(source[i], stripes[i], chunk_size);
+		}
+		for (i = 0 ; i < data_disks ; i++) {
+			int disk = geo_map(i, start/chunk_size, raid_disks,
+					   level, layout);
+			blocks[i] = stripes[disk];
+			printf("%d->%d\n", i, disk);
+		}
+
+		qsyndrome(p, q, (uint8_t**)blocks, data_disks, chunk_size);
+		diskP = geo_map(-1, start/chunk_size, raid_disks,
+				level, layout);
+		if (memcmp(p, stripes[diskP], chunk_size) != 0) {
+			printf("P(%d) wrong at %llu\n", diskP,
+			       start / chunk_size);
+		}
+		diskQ = geo_map(-2, start/chunk_size, raid_disks,
+				level, layout);
+		if (memcmp(q, stripes[diskQ], chunk_size) != 0) {
+			printf("Q(%d) wrong at %llu\n", diskQ,
+			       start / chunk_size);
+		}
+		raid6_collect(chunk_size, p, q,
+			      stripes[diskP], stripes[diskQ], results);
+		disk = raid6_stats(results, raid_disks, chunk_size);
+
+		if(disk >= -2) {
+			disk = geo_map(disk, start/chunk_size, raid_disks,
+				       level, layout);
+		}
+		if(disk >= 0) {
+			printf("Possible failed disk: %d --> %s\n", disk, name[disk]);
+		}
+		if(disk == -65535) {
+			printf("Failure detected, but disk unknown\n");
+		}
+
+		length -= chunk_size;
+		start += chunk_size;
+	}
+
+	free(stripe_buf);
+	free(stripes);
+	free(blocks);
+	free(p);
+	free(q);
+	free(results);
+
+	return 0;
+}
+
+unsigned long long getnum(char *str, char **err)
+{
+	char *e;
+	unsigned long long rv = strtoull(str, &e, 10);
+	if (e==str || *e) {
+		*err = str;
+		return 0;
+	}
+	return rv;
+}
+
+int main(int argc, char *argv[])
+{
+	/* raid_disks chunk_size layout start length devices...
+	 */
+	int *fds;
+	char *buf;
+	unsigned long long *offsets;
+	int raid_disks, chunk_size, layout;
+	int level = 6;
+	unsigned long long start, length;
+	int i;
+
+	char *err = NULL;
+	if (argc < 8) {
+		fprintf(stderr, "Usage: raid6check raid_disks"
+			" chunk_size layout start length devices...\n");
+		exit(1);
+	}
+
+	raid_disks = getnum(argv[1], &err);
+	chunk_size = getnum(argv[2], &err);
+	layout = getnum(argv[3], &err);
+	start = getnum(argv[4], &err);
+	length = getnum(argv[5], &err);
+	if (err) {
+		fprintf(stderr, "test_stripe: Bad number: %s\n", err);
+		exit(2);
+	}
+	if (argc != raid_disks + 6) {
+		fprintf(stderr, "test_stripe: wrong number of devices: want %d found %d\n",
+			raid_disks, argc-6);
+		exit(2);
+	}
+	fds = malloc(raid_disks * sizeof(*fds));
+	offsets = malloc(raid_disks * sizeof(*offsets));
+	memset(offsets, 0, raid_disks * sizeof(*offsets));
+
+	for (i=0; i<raid_disks; i++) {
+		char *p;
+		p = strchr(argv[6+i], ':');
+
+		if(p != NULL) {
+			*p++ = '\0';
+			offsets[i] = atoll(p) * 512;
+		}
+		fds[i] = open(argv[6+i], O_RDWR);
+		if (fds[i] < 0) {
+			perror(argv[6+i]);
+			fprintf(stderr,"test_stripe: cannot open %s.\n", argv[6+i]);
+			exit(3);
+		}
+	}
+
+	buf = malloc(raid_disks * chunk_size);
+
+	int rv = check_stripes(fds, offsets,
+			       raid_disks, chunk_size, level, layout,
+			       start, length, &argv[6]);
+	if (rv != 0) {
+		fprintf(stderr,
+			"test_stripe: test_stripes returned %d\n", rv);
+		exit(1);
+	}
+
+	free(fds);
+	free(offsets);
+	free(buf);
+
+	exit(0);
+}
+
diff -urN a/restripe.c b/restripe.c
--- a/restripe.c	2011-02-19 22:36:29.707651642 +0100
+++ b/restripe.c	2011-02-19 18:13:14.468626986 +0100
@@ -33,7 +33,7 @@
  *
  */
 
-static int geo_map(int block, unsigned long long stripe, int raid_disks,
+int geo_map(int block, unsigned long long stripe, int raid_disks,
 		   int level, int layout)
 {
 	/* On the given stripe, find which disk in the array will have
@@ -223,7 +223,7 @@
 	}
 }
 
-static void qsyndrome(uint8_t *p, uint8_t *q, uint8_t **sources, int disks, int size)
+void qsyndrome(uint8_t *p, uint8_t *q, uint8_t **sources, int disks, int size)
 {
 	int d, z;
 	uint8_t wq0, wp0, wd0, w10, w20;

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] RAID-6 check standalone
  2011-02-21 20:45 [PATCH] RAID-6 check standalone Piergiorgio Sartor
@ 2011-03-07 19:33 ` Piergiorgio Sartor
  2011-03-21  3:02 ` NeilBrown
  1 sibling, 0 replies; 29+ messages in thread
From: Piergiorgio Sartor @ 2011-03-07 19:33 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: linux-raid

Hi Neil,

around a couple of weeks ago I sent two patches, one
was the ":offset" change for "restripe.c" and another
(here below) a complete standalone RAID-6 check.

Since then I did not get any feedback, are the patches
OK and are just in the waiting queue or there are issues?

I also have a couple of items I would like to clarify,
still in the text below (TODO list).

Thanks,

bye,

pg

On Mon, Feb 21, 2011 at 09:45:51PM +0100, Piergiorgio Sartor wrote:
> Hi Neil,
> 
> please find attached a patch, to mdadm-3.2 base, including
> a standalone versione of the raid-6 check.
> 
> This is basically a re-working (and hopefully improvement)
> of the already implemented check in "restripe.c".
> 
> I splitted the check function into "collect" and "stats",
> so that the second one could be easily replaced.
> The API is also simplified.
> 
> The command line option are reduced, since we only level
> is raid-6, but the ":offset" option is included.
> 
> The output reports the block/stripe rotation, P/Q errors
> and the possible HDD (or unknown).
> 
> BTW, the patch applies also to the already patched "restripe.c",
> including the last ":offset" patch (which is not yet in git).
> 
> Other item is that due to "sysfs.c" linking (see below) the
> "Makefile" needed some changes, I hope this is not a problem.
> 
> Next steps (TODO list you like) would be:
> 
> 1) Add the "sysfs.c" code in order to retrieve the HDDs info
> from the MD device. It is already linked, together with the
> whole (mdadm) universe, since it seems it cannot leave alone.
> I'll need some advice or hint on how to do use it. I checked
> "sysfs.c", but before I dig deep into it maybe better to
> have some advice (maybe just one function call will do it).
> 
> 2) Add the suspend lo/hi control. Fellow John Robinson was
> suggesting to look into "Grow.c", which I did, but I guess
> the same story as 1) is valid: better to have some hint on
> where to look before wasting time.
> 
> 3) Add a repair option (future). This should have different
> levels, like "all", "disk", "stripe". That is, fix everything
> (more or less like "repair"), fix only if a disk is clearly
> having problems, fix each stripe which has clearly a problem
> (but maybe different stripes may belong to different HDDs).
> 
> So, for the point 1) and 2) would be nice to have some more
> detail on where to look what. Point 3) we will discuss later.
> 
> Thanks, please consider for inclusion,
> 
> bye,
> 
> pg
> 
> 
> diff -urN a/Makefile b/Makefile
> --- a/Makefile	2011-02-01 05:31:07.000000000 +0100
> +++ b/Makefile	2011-02-20 00:03:44.929998038 +0100
> @@ -98,7 +98,7 @@
>  MAN5DIR = $(MANDIR)/man5
>  MAN8DIR = $(MANDIR)/man8
>  
> -OBJS =  mdadm.o config.o policy.o mdstat.o  ReadMe.o util.o Manage.o Assemble.o Build.o \
> +OBJSX = config.o policy.o mdstat.o  ReadMe.o util.o Manage.o Assemble.o Build.o \
>  	Create.o Detail.o Examine.o Grow.o Monitor.o dlink.o Kill.o Query.o \
>  	Incremental.o \
>  	mdopen.o super0.o super1.o super-ddf.o super-intel.o bitmap.o \
> @@ -106,6 +106,8 @@
>  	restripe.o sysfs.o sha1.o mapfile.o crc32.o sg_io.o msg.o \
>  	platform-intel.o probe_roms.o
>  
> +OBJS = mdadm.o $(OBJSX)
> +
>  SRCS =  mdadm.c config.c policy.c mdstat.c  ReadMe.c util.c Manage.c Assemble.c Build.c \
>  	Create.c Detail.c Examine.c Grow.c Monitor.c dlink.c Kill.c Query.c \
>  	Incremental.c \
> @@ -182,6 +184,9 @@
>  test_stripe : restripe.c mdadm.h
>  	$(CC) $(CXFLAGS) $(LDFLAGS) -o test_stripe -DMAIN restripe.c
>  
> +raid6check : raid6check.o mdadm.h $(OBJSX)
> +	$(CC) $(CXFLAGS) $(LDFLAGS) -o raid6check raid6check.o $(OBJSX)
> +
>  mdassemble : $(ASSEMBLE_SRCS) $(INCL)
>  	rm -f $(OBJS)
>  	$(DIET_GCC) $(ASSEMBLE_FLAGS) -o mdassemble $(ASSEMBLE_SRCS)  $(STATICSRC)
> @@ -265,7 +270,7 @@
>  	mdadm.Os mdadm.O2 mdmon.O2 \
>  	mdassemble mdassemble.static mdassemble.auto mdassemble.uclibc \
>  	mdassemble.klibc swap_super \
> -	init.cpio.gz mdadm.uclibc.static test_stripe mdmon \
> +	init.cpio.gz mdadm.uclibc.static test_stripe raid6check raid6check.o mdmon \
>  	mdadm.8
>  
>  dist : clean
> diff -urN a/raid6check.c b/raid6check.c
> --- a/raid6check.c	1970-01-01 01:00:00.000000000 +0100
> +++ b/raid6check.c	2011-02-21 21:40:27.813656214 +0100
> @@ -0,0 +1,263 @@
> +/*
> + * raid6check - extended consistency check for RAID-6
> + *
> + * Copyright (C) 2011 Piergiorgio Sartor
> + *
> + *
> + *    This program is free software; you can redistribute it and/or modify
> + *    it under the terms of the GNU General Public License as published by
> + *    the Free Software Foundation; either version 2 of the License, or
> + *    (at your option) any later version.
> + *
> + *    This program is distributed in the hope that it will be useful,
> + *    but WITHOUT ANY WARRANTY; without even the implied warranty of
> + *    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + *    GNU General Public License for more details.
> + *
> + *    You should have received a copy of the GNU General Public License
> + *    along with this program; if not, write to the Free Software
> + *    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
> + *
> + *    Author: Piergiorgio Sartor
> + *    Based on "restripe.c" from "mdadm" codebase
> + */
> +
> +#include "mdadm.h"
> +#include <stdint.h>
> +
> +int geo_map(int block, unsigned long long stripe, int raid_disks,
> +	    int level, int layout);
> +void qsyndrome(uint8_t *p, uint8_t *q, uint8_t **sources, int disks, int size);
> +void make_tables(void);
> +
> +/* Collect per stripe consistency information */
> +void raid6_collect(int chunk_size, uint8_t *p, uint8_t *q,
> +		   char *chunkP, char *chunkQ, int *results)
> +{
> +	int i;
> +	int data_id;
> +	uint8_t Px, Qx;
> +	extern uint8_t raid6_gflog[];
> +
> +	for(i = 0; i < chunk_size; i++) {
> +		Px = (uint8_t)chunkP[i] ^ (uint8_t)p[i];
> +		Qx = (uint8_t)chunkQ[i] ^ (uint8_t)q[i];
> +
> +		if((Px != 0) && (Qx == 0))
> +			results[i] = -1;
> +
> +		if((Px == 0) && (Qx != 0))
> +			results[i] = -2;
> +
> +		if((Px != 0) && (Qx != 0)) {
> +			data_id = (raid6_gflog[Qx] - raid6_gflog[Px]);
> +			if(data_id < 0) data_id += 255;
> +			results[i] = data_id;
> +		}
> +
> +		if((Px == 0) && (Qx == 0))
> +			results[i] = -255;
> +	}
> +}
> +
> +/* Try to find out if a specific disk has problems */
> +int raid6_stats(int *results, int raid_disks, int chunk_size)
> +{
> +	int i;
> +	int curr_broken_disk = -255;
> +	int prev_broken_disk = -255;
> +	int broken_status = 0;
> +	
> +	for(i = 0; i < chunk_size; i++) {
> +
> +		if(results[i] != -255)
> +			curr_broken_disk = results[i];
> +
> +		if(curr_broken_disk >= raid_disks)
> +			broken_status = 2;
> +
> +		switch(broken_status) {
> +		case 0:
> +			if(curr_broken_disk != -255) {
> +				prev_broken_disk = curr_broken_disk;
> +				broken_status = 1;
> +			}
> +			break;
> +
> +		case 1:
> +			if(curr_broken_disk != prev_broken_disk)
> +				broken_status = 2;
> +			break;
> +
> +		case 2:
> +		default:
> +			curr_broken_disk = prev_broken_disk = -65535;
> +			break;
> +		}
> +	}
> +
> +	return curr_broken_disk;
> +}
> +
> +int check_stripes(int *source, unsigned long long *offsets,
> +		  int raid_disks, int chunk_size, int level, int layout,
> +		  unsigned long long start, unsigned long long length, char *name[])
> +{
> +	/* read the data and p and q blocks, and check we got them right */
> +	char *stripe_buf = malloc(raid_disks * chunk_size);
> +	char **stripes = malloc(raid_disks * sizeof(char*));
> +	char **blocks = malloc(raid_disks * sizeof(char*));
> +	uint8_t *p = malloc(chunk_size);
> +	uint8_t *q = malloc(chunk_size);
> +	int *results = malloc(chunk_size * sizeof(int));
> +
> +	int i;
> +	int diskP, diskQ;
> +	int data_disks = raid_disks - 2;
> +
> +	extern int tables_ready;
> +
> +	if (!tables_ready)
> +		make_tables();
> +
> +	for ( i = 0 ; i < raid_disks ; i++)
> +		stripes[i] = stripe_buf + i * chunk_size;
> +
> +	while (length > 0) {
> +		int disk;
> +
> +		for (i = 0 ; i < raid_disks ; i++) {
> +			lseek64(source[i], offsets[i]+start, 0);
> +			read(source[i], stripes[i], chunk_size);
> +		}
> +		for (i = 0 ; i < data_disks ; i++) {
> +			int disk = geo_map(i, start/chunk_size, raid_disks,
> +					   level, layout);
> +			blocks[i] = stripes[disk];
> +			printf("%d->%d\n", i, disk);
> +		}
> +
> +		qsyndrome(p, q, (uint8_t**)blocks, data_disks, chunk_size);
> +		diskP = geo_map(-1, start/chunk_size, raid_disks,
> +				level, layout);
> +		if (memcmp(p, stripes[diskP], chunk_size) != 0) {
> +			printf("P(%d) wrong at %llu\n", diskP,
> +			       start / chunk_size);
> +		}
> +		diskQ = geo_map(-2, start/chunk_size, raid_disks,
> +				level, layout);
> +		if (memcmp(q, stripes[diskQ], chunk_size) != 0) {
> +			printf("Q(%d) wrong at %llu\n", diskQ,
> +			       start / chunk_size);
> +		}
> +		raid6_collect(chunk_size, p, q,
> +			      stripes[diskP], stripes[diskQ], results);
> +		disk = raid6_stats(results, raid_disks, chunk_size);
> +
> +		if(disk >= -2) {
> +			disk = geo_map(disk, start/chunk_size, raid_disks,
> +				       level, layout);
> +		}
> +		if(disk >= 0) {
> +			printf("Possible failed disk: %d --> %s\n", disk, name[disk]);
> +		}
> +		if(disk == -65535) {
> +			printf("Failure detected, but disk unknown\n");
> +		}
> +
> +		length -= chunk_size;
> +		start += chunk_size;
> +	}
> +
> +	free(stripe_buf);
> +	free(stripes);
> +	free(blocks);
> +	free(p);
> +	free(q);
> +	free(results);
> +
> +	return 0;
> +}
> +
> +unsigned long long getnum(char *str, char **err)
> +{
> +	char *e;
> +	unsigned long long rv = strtoull(str, &e, 10);
> +	if (e==str || *e) {
> +		*err = str;
> +		return 0;
> +	}
> +	return rv;
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +	/* raid_disks chunk_size layout start length devices...
> +	 */
> +	int *fds;
> +	char *buf;
> +	unsigned long long *offsets;
> +	int raid_disks, chunk_size, layout;
> +	int level = 6;
> +	unsigned long long start, length;
> +	int i;
> +
> +	char *err = NULL;
> +	if (argc < 8) {
> +		fprintf(stderr, "Usage: raid6check raid_disks"
> +			" chunk_size layout start length devices...\n");
> +		exit(1);
> +	}
> +
> +	raid_disks = getnum(argv[1], &err);
> +	chunk_size = getnum(argv[2], &err);
> +	layout = getnum(argv[3], &err);
> +	start = getnum(argv[4], &err);
> +	length = getnum(argv[5], &err);
> +	if (err) {
> +		fprintf(stderr, "test_stripe: Bad number: %s\n", err);
> +		exit(2);
> +	}
> +	if (argc != raid_disks + 6) {
> +		fprintf(stderr, "test_stripe: wrong number of devices: want %d found %d\n",
> +			raid_disks, argc-6);
> +		exit(2);
> +	}
> +	fds = malloc(raid_disks * sizeof(*fds));
> +	offsets = malloc(raid_disks * sizeof(*offsets));
> +	memset(offsets, 0, raid_disks * sizeof(*offsets));
> +
> +	for (i=0; i<raid_disks; i++) {
> +		char *p;
> +		p = strchr(argv[6+i], ':');
> +
> +		if(p != NULL) {
> +			*p++ = '\0';
> +			offsets[i] = atoll(p) * 512;
> +		}
> +		fds[i] = open(argv[6+i], O_RDWR);
> +		if (fds[i] < 0) {
> +			perror(argv[6+i]);
> +			fprintf(stderr,"test_stripe: cannot open %s.\n", argv[6+i]);
> +			exit(3);
> +		}
> +	}
> +
> +	buf = malloc(raid_disks * chunk_size);
> +
> +	int rv = check_stripes(fds, offsets,
> +			       raid_disks, chunk_size, level, layout,
> +			       start, length, &argv[6]);
> +	if (rv != 0) {
> +		fprintf(stderr,
> +			"test_stripe: test_stripes returned %d\n", rv);
> +		exit(1);
> +	}
> +
> +	free(fds);
> +	free(offsets);
> +	free(buf);
> +
> +	exit(0);
> +}
> +
> diff -urN a/restripe.c b/restripe.c
> --- a/restripe.c	2011-02-19 22:36:29.707651642 +0100
> +++ b/restripe.c	2011-02-19 18:13:14.468626986 +0100
> @@ -33,7 +33,7 @@
>   *
>   */
>  
> -static int geo_map(int block, unsigned long long stripe, int raid_disks,
> +int geo_map(int block, unsigned long long stripe, int raid_disks,
>  		   int level, int layout)
>  {
>  	/* On the given stripe, find which disk in the array will have
> @@ -223,7 +223,7 @@
>  	}
>  }
>  
> -static void qsyndrome(uint8_t *p, uint8_t *q, uint8_t **sources, int disks, int size)
> +void qsyndrome(uint8_t *p, uint8_t *q, uint8_t **sources, int disks, int size)
>  {
>  	int d, z;
>  	uint8_t wq0, wp0, wd0, w10, w20;
> 
> -- 
> 
> piergiorgio
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] RAID-6 check standalone
  2011-02-21 20:45 [PATCH] RAID-6 check standalone Piergiorgio Sartor
  2011-03-07 19:33 ` Piergiorgio Sartor
@ 2011-03-21  3:02 ` NeilBrown
  2011-03-21 10:40   ` Piergiorgio Sartor
  1 sibling, 1 reply; 29+ messages in thread
From: NeilBrown @ 2011-03-21  3:02 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: linux-raid

On Mon, 21 Feb 2011 21:45:51 +0100 Piergiorgio Sartor
<piergiorgio.sartor@nexgo.de> wrote:

> Hi Neil,
> 
> please find attached a patch, to mdadm-3.2 base, including
> a standalone versione of the raid-6 check.
> 
> This is basically a re-working (and hopefully improvement)
> of the already implemented check in "restripe.c".
> 
> I splitted the check function into "collect" and "stats",
> so that the second one could be easily replaced.
> The API is also simplified.
> 
> The command line option are reduced, since we only level
> is raid-6, but the ":offset" option is included.
> 
> The output reports the block/stripe rotation, P/Q errors
> and the possible HDD (or unknown).
> 
> BTW, the patch applies also to the already patched "restripe.c",
> including the last ":offset" patch (which is not yet in git).
> 

Sorry for not replying for a month!!  I manage to lose track of this then
couldn't find it when I went looking, then got distracted, then....

Thanks for your patience and persistence.

I have applied this patch to me git now - with some minor changes.

> Other item is that due to "sysfs.c" linking (see below) the
> "Makefile" needed some changes, I hope this is not a problem.

I have reverted most of these changes and the linking with sysfs.c isn't
needed yet.  When it is we can sort out how best to achieve it.

> 
> Next steps (TODO list you like) would be:
> 
> 1) Add the "sysfs.c" code in order to retrieve the HDDs info
> from the MD device. It is already linked, together with the
> whole (mdadm) universe, since it seems it cannot leave alone.
> I'll need some advice or hint on how to do use it. I checked
> "sysfs.c", but before I dig deep into it maybe better to
> have some advice (maybe just one function call will do it).

What exactly do you want to achieve?

I suspect you want to a open the md device, then

  info = sysfs_read(fd, -1,
  GET_LEVEL|GET_LAYOUT_GET_DISKS|GET_CHUNK|GET_DEVS|GET_OFFSET);

(possibly with other flags) and then extract the info you want from the data
structure returned - but I'm only guessing at what you might want.


> 
> 2) Add the suspend lo/hi control. Fellow John Robinson was
> suggesting to look into "Grow.c", which I did, but I guess
> the same story as 1) is valid: better to have some hint on
> where to look before wasting time.

This would be:

  sysfs_set_num(info, NULL, "suspend_lo", offset*data_disks);
  sysfs_set_num(info, NULL, "suspend_hi", (offset+chunksize)*data_disks);

to freeze one stripe.  Then work in there.
The addresses are addresses in the array, hence the multiplication
by data_disks (which is raid_disks - 2 for RAID6).

Don't hold the array suspended for too long or something might get
upset.  And allocate any memory you need first, and call
    	mlockall(MCL_CURRENT | MCL_FUTURE);

first to be even more safe.

> 
> 3) Add a repair option (future). This should have different
> levels, like "all", "disk", "stripe". That is, fix everything
> (more or less like "repair"), fix only if a disk is clearly
> having problems, fix each stripe which has clearly a problem
> (but maybe different stripes may belong to different HDDs).
> 
> So, for the point 1) and 2) would be nice to have some more
> detail on where to look what. Point 3) we will discuss later.

Agreed.
Hopefully the info I have provided is sufficient to help you progress.

Thanks,
NeilBrown


> 
> Thanks, please consider for inclusion,
> 
> bye,
> 
> pg

> 
> 
> diff -urN a/Makefile b/Makefile
> --- a/Makefile	2011-02-01 05:31:07.000000000 +0100
> +++ b/Makefile	2011-02-20 00:03:44.929998038 +0100
> @@ -98,7 +98,7 @@
>  MAN5DIR = $(MANDIR)/man5
>  MAN8DIR = $(MANDIR)/man8
>  
> -OBJS =  mdadm.o config.o policy.o mdstat.o  ReadMe.o util.o Manage.o Assemble.o Build.o \
> +OBJSX = config.o policy.o mdstat.o  ReadMe.o util.o Manage.o Assemble.o Build.o \
>  	Create.o Detail.o Examine.o Grow.o Monitor.o dlink.o Kill.o Query.o \
>  	Incremental.o \
>  	mdopen.o super0.o super1.o super-ddf.o super-intel.o bitmap.o \
> @@ -106,6 +106,8 @@
>  	restripe.o sysfs.o sha1.o mapfile.o crc32.o sg_io.o msg.o \
>  	platform-intel.o probe_roms.o
>  
> +OBJS = mdadm.o $(OBJSX)
> +
>  SRCS =  mdadm.c config.c policy.c mdstat.c  ReadMe.c util.c Manage.c Assemble.c Build.c \
>  	Create.c Detail.c Examine.c Grow.c Monitor.c dlink.c Kill.c Query.c \
>  	Incremental.c \
> @@ -182,6 +184,9 @@
>  test_stripe : restripe.c mdadm.h
>  	$(CC) $(CXFLAGS) $(LDFLAGS) -o test_stripe -DMAIN restripe.c
>  
> +raid6check : raid6check.o mdadm.h $(OBJSX)
> +	$(CC) $(CXFLAGS) $(LDFLAGS) -o raid6check raid6check.o $(OBJSX)
> +
>  mdassemble : $(ASSEMBLE_SRCS) $(INCL)
>  	rm -f $(OBJS)
>  	$(DIET_GCC) $(ASSEMBLE_FLAGS) -o mdassemble $(ASSEMBLE_SRCS)  $(STATICSRC)
> @@ -265,7 +270,7 @@
>  	mdadm.Os mdadm.O2 mdmon.O2 \
>  	mdassemble mdassemble.static mdassemble.auto mdassemble.uclibc \
>  	mdassemble.klibc swap_super \
> -	init.cpio.gz mdadm.uclibc.static test_stripe mdmon \
> +	init.cpio.gz mdadm.uclibc.static test_stripe raid6check raid6check.o mdmon \
>  	mdadm.8
>  
>  dist : clean
> diff -urN a/raid6check.c b/raid6check.c
> --- a/raid6check.c	1970-01-01 01:00:00.000000000 +0100
> +++ b/raid6check.c	2011-02-21 21:40:27.813656214 +0100
> @@ -0,0 +1,263 @@
> +/*
> + * raid6check - extended consistency check for RAID-6
> + *
> + * Copyright (C) 2011 Piergiorgio Sartor
> + *
> + *
> + *    This program is free software; you can redistribute it and/or modify
> + *    it under the terms of the GNU General Public License as published by
> + *    the Free Software Foundation; either version 2 of the License, or
> + *    (at your option) any later version.
> + *
> + *    This program is distributed in the hope that it will be useful,
> + *    but WITHOUT ANY WARRANTY; without even the implied warranty of
> + *    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + *    GNU General Public License for more details.
> + *
> + *    You should have received a copy of the GNU General Public License
> + *    along with this program; if not, write to the Free Software
> + *    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
> + *
> + *    Author: Piergiorgio Sartor
> + *    Based on "restripe.c" from "mdadm" codebase
> + */
> +
> +#include "mdadm.h"
> +#include <stdint.h>
> +
> +int geo_map(int block, unsigned long long stripe, int raid_disks,
> +	    int level, int layout);
> +void qsyndrome(uint8_t *p, uint8_t *q, uint8_t **sources, int disks, int size);
> +void make_tables(void);
> +
> +/* Collect per stripe consistency information */
> +void raid6_collect(int chunk_size, uint8_t *p, uint8_t *q,
> +		   char *chunkP, char *chunkQ, int *results)
> +{
> +	int i;
> +	int data_id;
> +	uint8_t Px, Qx;
> +	extern uint8_t raid6_gflog[];
> +
> +	for(i = 0; i < chunk_size; i++) {
> +		Px = (uint8_t)chunkP[i] ^ (uint8_t)p[i];
> +		Qx = (uint8_t)chunkQ[i] ^ (uint8_t)q[i];
> +
> +		if((Px != 0) && (Qx == 0))
> +			results[i] = -1;
> +
> +		if((Px == 0) && (Qx != 0))
> +			results[i] = -2;
> +
> +		if((Px != 0) && (Qx != 0)) {
> +			data_id = (raid6_gflog[Qx] - raid6_gflog[Px]);
> +			if(data_id < 0) data_id += 255;
> +			results[i] = data_id;
> +		}
> +
> +		if((Px == 0) && (Qx == 0))
> +			results[i] = -255;
> +	}
> +}
> +
> +/* Try to find out if a specific disk has problems */
> +int raid6_stats(int *results, int raid_disks, int chunk_size)
> +{
> +	int i;
> +	int curr_broken_disk = -255;
> +	int prev_broken_disk = -255;
> +	int broken_status = 0;
> +	
> +	for(i = 0; i < chunk_size; i++) {
> +
> +		if(results[i] != -255)
> +			curr_broken_disk = results[i];
> +
> +		if(curr_broken_disk >= raid_disks)
> +			broken_status = 2;
> +
> +		switch(broken_status) {
> +		case 0:
> +			if(curr_broken_disk != -255) {
> +				prev_broken_disk = curr_broken_disk;
> +				broken_status = 1;
> +			}
> +			break;
> +
> +		case 1:
> +			if(curr_broken_disk != prev_broken_disk)
> +				broken_status = 2;
> +			break;
> +
> +		case 2:
> +		default:
> +			curr_broken_disk = prev_broken_disk = -65535;
> +			break;
> +		}
> +	}
> +
> +	return curr_broken_disk;
> +}
> +
> +int check_stripes(int *source, unsigned long long *offsets,
> +		  int raid_disks, int chunk_size, int level, int layout,
> +		  unsigned long long start, unsigned long long length, char *name[])
> +{
> +	/* read the data and p and q blocks, and check we got them right */
> +	char *stripe_buf = malloc(raid_disks * chunk_size);
> +	char **stripes = malloc(raid_disks * sizeof(char*));
> +	char **blocks = malloc(raid_disks * sizeof(char*));
> +	uint8_t *p = malloc(chunk_size);
> +	uint8_t *q = malloc(chunk_size);
> +	int *results = malloc(chunk_size * sizeof(int));
> +
> +	int i;
> +	int diskP, diskQ;
> +	int data_disks = raid_disks - 2;
> +
> +	extern int tables_ready;
> +
> +	if (!tables_ready)
> +		make_tables();
> +
> +	for ( i = 0 ; i < raid_disks ; i++)
> +		stripes[i] = stripe_buf + i * chunk_size;
> +
> +	while (length > 0) {
> +		int disk;
> +
> +		for (i = 0 ; i < raid_disks ; i++) {
> +			lseek64(source[i], offsets[i]+start, 0);
> +			read(source[i], stripes[i], chunk_size);
> +		}
> +		for (i = 0 ; i < data_disks ; i++) {
> +			int disk = geo_map(i, start/chunk_size, raid_disks,
> +					   level, layout);
> +			blocks[i] = stripes[disk];
> +			printf("%d->%d\n", i, disk);
> +		}
> +
> +		qsyndrome(p, q, (uint8_t**)blocks, data_disks, chunk_size);
> +		diskP = geo_map(-1, start/chunk_size, raid_disks,
> +				level, layout);
> +		if (memcmp(p, stripes[diskP], chunk_size) != 0) {
> +			printf("P(%d) wrong at %llu\n", diskP,
> +			       start / chunk_size);
> +		}
> +		diskQ = geo_map(-2, start/chunk_size, raid_disks,
> +				level, layout);
> +		if (memcmp(q, stripes[diskQ], chunk_size) != 0) {
> +			printf("Q(%d) wrong at %llu\n", diskQ,
> +			       start / chunk_size);
> +		}
> +		raid6_collect(chunk_size, p, q,
> +			      stripes[diskP], stripes[diskQ], results);
> +		disk = raid6_stats(results, raid_disks, chunk_size);
> +
> +		if(disk >= -2) {
> +			disk = geo_map(disk, start/chunk_size, raid_disks,
> +				       level, layout);
> +		}
> +		if(disk >= 0) {
> +			printf("Possible failed disk: %d --> %s\n", disk, name[disk]);
> +		}
> +		if(disk == -65535) {
> +			printf("Failure detected, but disk unknown\n");
> +		}
> +
> +		length -= chunk_size;
> +		start += chunk_size;
> +	}
> +
> +	free(stripe_buf);
> +	free(stripes);
> +	free(blocks);
> +	free(p);
> +	free(q);
> +	free(results);
> +
> +	return 0;
> +}
> +
> +unsigned long long getnum(char *str, char **err)
> +{
> +	char *e;
> +	unsigned long long rv = strtoull(str, &e, 10);
> +	if (e==str || *e) {
> +		*err = str;
> +		return 0;
> +	}
> +	return rv;
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +	/* raid_disks chunk_size layout start length devices...
> +	 */
> +	int *fds;
> +	char *buf;
> +	unsigned long long *offsets;
> +	int raid_disks, chunk_size, layout;
> +	int level = 6;
> +	unsigned long long start, length;
> +	int i;
> +
> +	char *err = NULL;
> +	if (argc < 8) {
> +		fprintf(stderr, "Usage: raid6check raid_disks"
> +			" chunk_size layout start length devices...\n");
> +		exit(1);
> +	}
> +
> +	raid_disks = getnum(argv[1], &err);
> +	chunk_size = getnum(argv[2], &err);
> +	layout = getnum(argv[3], &err);
> +	start = getnum(argv[4], &err);
> +	length = getnum(argv[5], &err);
> +	if (err) {
> +		fprintf(stderr, "test_stripe: Bad number: %s\n", err);
> +		exit(2);
> +	}
> +	if (argc != raid_disks + 6) {
> +		fprintf(stderr, "test_stripe: wrong number of devices: want %d found %d\n",
> +			raid_disks, argc-6);
> +		exit(2);
> +	}
> +	fds = malloc(raid_disks * sizeof(*fds));
> +	offsets = malloc(raid_disks * sizeof(*offsets));
> +	memset(offsets, 0, raid_disks * sizeof(*offsets));
> +
> +	for (i=0; i<raid_disks; i++) {
> +		char *p;
> +		p = strchr(argv[6+i], ':');
> +
> +		if(p != NULL) {
> +			*p++ = '\0';
> +			offsets[i] = atoll(p) * 512;
> +		}
> +		fds[i] = open(argv[6+i], O_RDWR);
> +		if (fds[i] < 0) {
> +			perror(argv[6+i]);
> +			fprintf(stderr,"test_stripe: cannot open %s.\n", argv[6+i]);
> +			exit(3);
> +		}
> +	}
> +
> +	buf = malloc(raid_disks * chunk_size);
> +
> +	int rv = check_stripes(fds, offsets,
> +			       raid_disks, chunk_size, level, layout,
> +			       start, length, &argv[6]);
> +	if (rv != 0) {
> +		fprintf(stderr,
> +			"test_stripe: test_stripes returned %d\n", rv);
> +		exit(1);
> +	}
> +
> +	free(fds);
> +	free(offsets);
> +	free(buf);
> +
> +	exit(0);
> +}
> +
> diff -urN a/restripe.c b/restripe.c
> --- a/restripe.c	2011-02-19 22:36:29.707651642 +0100
> +++ b/restripe.c	2011-02-19 18:13:14.468626986 +0100
> @@ -33,7 +33,7 @@
>   *
>   */
>  
> -static int geo_map(int block, unsigned long long stripe, int raid_disks,
> +int geo_map(int block, unsigned long long stripe, int raid_disks,
>  		   int level, int layout)
>  {
>  	/* On the given stripe, find which disk in the array will have
> @@ -223,7 +223,7 @@
>  	}
>  }
>  
> -static void qsyndrome(uint8_t *p, uint8_t *q, uint8_t **sources, int disks, int size)
> +void qsyndrome(uint8_t *p, uint8_t *q, uint8_t **sources, int disks, int size)
>  {
>  	int d, z;
>  	uint8_t wq0, wp0, wd0, w10, w20;
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] RAID-6 check standalone
  2011-03-21  3:02 ` NeilBrown
@ 2011-03-21 10:40   ` Piergiorgio Sartor
  2011-03-21 11:04     ` NeilBrown
  0 siblings, 1 reply; 29+ messages in thread
From: Piergiorgio Sartor @ 2011-03-21 10:40 UTC (permalink / raw)
  To: NeilBrown; +Cc: Piergiorgio Sartor, linux-raid

Hi Neil,

On Mon, Mar 21, 2011 at 02:02:44PM +1100, NeilBrown wrote:
> On Mon, 21 Feb 2011 21:45:51 +0100 Piergiorgio Sartor
> <piergiorgio.sartor@nexgo.de> wrote:
> 
> > Hi Neil,
> > 
> > please find attached a patch, to mdadm-3.2 base, including
> > a standalone versione of the raid-6 check.
> > 
> > This is basically a re-working (and hopefully improvement)
> > of the already implemented check in "restripe.c".
> > 
> > I splitted the check function into "collect" and "stats",
> > so that the second one could be easily replaced.
> > The API is also simplified.
> > 
> > The command line option are reduced, since we only level
> > is raid-6, but the ":offset" option is included.
> > 
> > The output reports the block/stripe rotation, P/Q errors
> > and the possible HDD (or unknown).
> > 
> > BTW, the patch applies also to the already patched "restripe.c",
> > including the last ":offset" patch (which is not yet in git).
> > 
> 
> Sorry for not replying for a month!!  I manage to lose track of this then
> couldn't find it when I went looking, then got distracted, then....

don't worry, I know how easy is to loose track of
such items. That's why I kept (slowly) insisting.

> Thanks for your patience and persistence.
> 
> I have applied this patch to me git now - with some minor changes.

Thanks.

One question, I did not find the previous patch to "restripe.c",
which was adding the ":offset" capability. This was in order to
be able to cope with metadata 1.1 or 1.2 (offset to be determined).

> > Other item is that due to "sysfs.c" linking (see below) the
> > "Makefile" needed some changes, I hope this is not a problem.
> 
> I have reverted most of these changes and the linking with sysfs.c isn't
> needed yet.  When it is we can sort out how best to achieve it.
> 
> > 
> > Next steps (TODO list you like) would be:
> > 
> > 1) Add the "sysfs.c" code in order to retrieve the HDDs info
> > from the MD device. It is already linked, together with the
> > whole (mdadm) universe, since it seems it cannot leave alone.
> > I'll need some advice or hint on how to do use it. I checked
> > "sysfs.c", but before I dig deep into it maybe better to
> > have some advice (maybe just one function call will do it).
> 
> What exactly do you want to achieve?
> 
> I suspect you want to a open the md device, then
> 
>   info = sysfs_read(fd, -1,
>   GET_LEVEL|GET_LAYOUT_GET_DISKS|GET_CHUNK|GET_DEVS|GET_OFFSET);
> 
> (possibly with other flags) and then extract the info you want from the data
> structure returned - but I'm only guessing at what you might want.

I think this more or less correct.

I would like to pass, to "raid6check", the md device as only
parameter, maybe with some start/stop position, and derive all
the needed information from that.

That is, it should be: raid6check /dev/mdX start stop
Or, possibly: raid6check /dev/mdX start length
Or just: raid6check /dev/mdX

This means, the information that should be derived is:

1) level, in order to confirm it is 6
2) layout
3) disk components
4) offset for each component
5) chunk size

That should be all, I guess.

Your "sysfs_read" seems to do exactly this, please confirm.

> > 
> > 2) Add the suspend lo/hi control. Fellow John Robinson was
> > suggesting to look into "Grow.c", which I did, but I guess
> > the same story as 1) is valid: better to have some hint on
> > where to look before wasting time.
> 
> This would be:
> 
>   sysfs_set_num(info, NULL, "suspend_lo", offset*data_disks);
>   sysfs_set_num(info, NULL, "suspend_hi", (offset+chunksize)*data_disks);
> 
> to freeze one stripe.  Then work in there.
> The addresses are addresses in the array, hence the multiplication
> by data_disks (which is raid_disks - 2 for RAID6).
> 
> Don't hold the array suspended for too long or something might get
> upset.  And allocate any memory you need first, and call
>     	mlockall(MCL_CURRENT | MCL_FUTURE);
> 
> first to be even more safe.

Thanks for the explanation. How is, then, the "unfreeze"?
Just writing (0) to both hi and lo?

> > 
> > 3) Add a repair option (future). This should have different
> > levels, like "all", "disk", "stripe". That is, fix everything
> > (more or less like "repair"), fix only if a disk is clearly
> > having problems, fix each stripe which has clearly a problem
> > (but maybe different stripes may belong to different HDDs).
> > 
> > So, for the point 1) and 2) would be nice to have some more
> > detail on where to look what. Point 3) we will discuss later.
> 
> Agreed.
> Hopefully the info I have provided is sufficient to help you progress.

I'll ask, if I need more information.

> Thanks,
> NeilBrown

Thank you,

bye,

pg

> 
> > 
> > Thanks, please consider for inclusion,
> > 
> > bye,
> > 
> > pg
> 
> > 
> > 
> > diff -urN a/Makefile b/Makefile
> > --- a/Makefile	2011-02-01 05:31:07.000000000 +0100
> > +++ b/Makefile	2011-02-20 00:03:44.929998038 +0100
> > @@ -98,7 +98,7 @@
> >  MAN5DIR = $(MANDIR)/man5
> >  MAN8DIR = $(MANDIR)/man8
> >  
> > -OBJS =  mdadm.o config.o policy.o mdstat.o  ReadMe.o util.o Manage.o Assemble.o Build.o \
> > +OBJSX = config.o policy.o mdstat.o  ReadMe.o util.o Manage.o Assemble.o Build.o \
> >  	Create.o Detail.o Examine.o Grow.o Monitor.o dlink.o Kill.o Query.o \
> >  	Incremental.o \
> >  	mdopen.o super0.o super1.o super-ddf.o super-intel.o bitmap.o \
> > @@ -106,6 +106,8 @@
> >  	restripe.o sysfs.o sha1.o mapfile.o crc32.o sg_io.o msg.o \
> >  	platform-intel.o probe_roms.o
> >  
> > +OBJS = mdadm.o $(OBJSX)
> > +
> >  SRCS =  mdadm.c config.c policy.c mdstat.c  ReadMe.c util.c Manage.c Assemble.c Build.c \
> >  	Create.c Detail.c Examine.c Grow.c Monitor.c dlink.c Kill.c Query.c \
> >  	Incremental.c \
> > @@ -182,6 +184,9 @@
> >  test_stripe : restripe.c mdadm.h
> >  	$(CC) $(CXFLAGS) $(LDFLAGS) -o test_stripe -DMAIN restripe.c
> >  
> > +raid6check : raid6check.o mdadm.h $(OBJSX)
> > +	$(CC) $(CXFLAGS) $(LDFLAGS) -o raid6check raid6check.o $(OBJSX)
> > +
> >  mdassemble : $(ASSEMBLE_SRCS) $(INCL)
> >  	rm -f $(OBJS)
> >  	$(DIET_GCC) $(ASSEMBLE_FLAGS) -o mdassemble $(ASSEMBLE_SRCS)  $(STATICSRC)
> > @@ -265,7 +270,7 @@
> >  	mdadm.Os mdadm.O2 mdmon.O2 \
> >  	mdassemble mdassemble.static mdassemble.auto mdassemble.uclibc \
> >  	mdassemble.klibc swap_super \
> > -	init.cpio.gz mdadm.uclibc.static test_stripe mdmon \
> > +	init.cpio.gz mdadm.uclibc.static test_stripe raid6check raid6check.o mdmon \
> >  	mdadm.8
> >  
> >  dist : clean
> > diff -urN a/raid6check.c b/raid6check.c
> > --- a/raid6check.c	1970-01-01 01:00:00.000000000 +0100
> > +++ b/raid6check.c	2011-02-21 21:40:27.813656214 +0100
> > @@ -0,0 +1,263 @@
> > +/*
> > + * raid6check - extended consistency check for RAID-6
> > + *
> > + * Copyright (C) 2011 Piergiorgio Sartor
> > + *
> > + *
> > + *    This program is free software; you can redistribute it and/or modify
> > + *    it under the terms of the GNU General Public License as published by
> > + *    the Free Software Foundation; either version 2 of the License, or
> > + *    (at your option) any later version.
> > + *
> > + *    This program is distributed in the hope that it will be useful,
> > + *    but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + *    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + *    GNU General Public License for more details.
> > + *
> > + *    You should have received a copy of the GNU General Public License
> > + *    along with this program; if not, write to the Free Software
> > + *    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
> > + *
> > + *    Author: Piergiorgio Sartor
> > + *    Based on "restripe.c" from "mdadm" codebase
> > + */
> > +
> > +#include "mdadm.h"
> > +#include <stdint.h>
> > +
> > +int geo_map(int block, unsigned long long stripe, int raid_disks,
> > +	    int level, int layout);
> > +void qsyndrome(uint8_t *p, uint8_t *q, uint8_t **sources, int disks, int size);
> > +void make_tables(void);
> > +
> > +/* Collect per stripe consistency information */
> > +void raid6_collect(int chunk_size, uint8_t *p, uint8_t *q,
> > +		   char *chunkP, char *chunkQ, int *results)
> > +{
> > +	int i;
> > +	int data_id;
> > +	uint8_t Px, Qx;
> > +	extern uint8_t raid6_gflog[];
> > +
> > +	for(i = 0; i < chunk_size; i++) {
> > +		Px = (uint8_t)chunkP[i] ^ (uint8_t)p[i];
> > +		Qx = (uint8_t)chunkQ[i] ^ (uint8_t)q[i];
> > +
> > +		if((Px != 0) && (Qx == 0))
> > +			results[i] = -1;
> > +
> > +		if((Px == 0) && (Qx != 0))
> > +			results[i] = -2;
> > +
> > +		if((Px != 0) && (Qx != 0)) {
> > +			data_id = (raid6_gflog[Qx] - raid6_gflog[Px]);
> > +			if(data_id < 0) data_id += 255;
> > +			results[i] = data_id;
> > +		}
> > +
> > +		if((Px == 0) && (Qx == 0))
> > +			results[i] = -255;
> > +	}
> > +}
> > +
> > +/* Try to find out if a specific disk has problems */
> > +int raid6_stats(int *results, int raid_disks, int chunk_size)
> > +{
> > +	int i;
> > +	int curr_broken_disk = -255;
> > +	int prev_broken_disk = -255;
> > +	int broken_status = 0;
> > +	
> > +	for(i = 0; i < chunk_size; i++) {
> > +
> > +		if(results[i] != -255)
> > +			curr_broken_disk = results[i];
> > +
> > +		if(curr_broken_disk >= raid_disks)
> > +			broken_status = 2;
> > +
> > +		switch(broken_status) {
> > +		case 0:
> > +			if(curr_broken_disk != -255) {
> > +				prev_broken_disk = curr_broken_disk;
> > +				broken_status = 1;
> > +			}
> > +			break;
> > +
> > +		case 1:
> > +			if(curr_broken_disk != prev_broken_disk)
> > +				broken_status = 2;
> > +			break;
> > +
> > +		case 2:
> > +		default:
> > +			curr_broken_disk = prev_broken_disk = -65535;
> > +			break;
> > +		}
> > +	}
> > +
> > +	return curr_broken_disk;
> > +}
> > +
> > +int check_stripes(int *source, unsigned long long *offsets,
> > +		  int raid_disks, int chunk_size, int level, int layout,
> > +		  unsigned long long start, unsigned long long length, char *name[])
> > +{
> > +	/* read the data and p and q blocks, and check we got them right */
> > +	char *stripe_buf = malloc(raid_disks * chunk_size);
> > +	char **stripes = malloc(raid_disks * sizeof(char*));
> > +	char **blocks = malloc(raid_disks * sizeof(char*));
> > +	uint8_t *p = malloc(chunk_size);
> > +	uint8_t *q = malloc(chunk_size);
> > +	int *results = malloc(chunk_size * sizeof(int));
> > +
> > +	int i;
> > +	int diskP, diskQ;
> > +	int data_disks = raid_disks - 2;
> > +
> > +	extern int tables_ready;
> > +
> > +	if (!tables_ready)
> > +		make_tables();
> > +
> > +	for ( i = 0 ; i < raid_disks ; i++)
> > +		stripes[i] = stripe_buf + i * chunk_size;
> > +
> > +	while (length > 0) {
> > +		int disk;
> > +
> > +		for (i = 0 ; i < raid_disks ; i++) {
> > +			lseek64(source[i], offsets[i]+start, 0);
> > +			read(source[i], stripes[i], chunk_size);
> > +		}
> > +		for (i = 0 ; i < data_disks ; i++) {
> > +			int disk = geo_map(i, start/chunk_size, raid_disks,
> > +					   level, layout);
> > +			blocks[i] = stripes[disk];
> > +			printf("%d->%d\n", i, disk);
> > +		}
> > +
> > +		qsyndrome(p, q, (uint8_t**)blocks, data_disks, chunk_size);
> > +		diskP = geo_map(-1, start/chunk_size, raid_disks,
> > +				level, layout);
> > +		if (memcmp(p, stripes[diskP], chunk_size) != 0) {
> > +			printf("P(%d) wrong at %llu\n", diskP,
> > +			       start / chunk_size);
> > +		}
> > +		diskQ = geo_map(-2, start/chunk_size, raid_disks,
> > +				level, layout);
> > +		if (memcmp(q, stripes[diskQ], chunk_size) != 0) {
> > +			printf("Q(%d) wrong at %llu\n", diskQ,
> > +			       start / chunk_size);
> > +		}
> > +		raid6_collect(chunk_size, p, q,
> > +			      stripes[diskP], stripes[diskQ], results);
> > +		disk = raid6_stats(results, raid_disks, chunk_size);
> > +
> > +		if(disk >= -2) {
> > +			disk = geo_map(disk, start/chunk_size, raid_disks,
> > +				       level, layout);
> > +		}
> > +		if(disk >= 0) {
> > +			printf("Possible failed disk: %d --> %s\n", disk, name[disk]);
> > +		}
> > +		if(disk == -65535) {
> > +			printf("Failure detected, but disk unknown\n");
> > +		}
> > +
> > +		length -= chunk_size;
> > +		start += chunk_size;
> > +	}
> > +
> > +	free(stripe_buf);
> > +	free(stripes);
> > +	free(blocks);
> > +	free(p);
> > +	free(q);
> > +	free(results);
> > +
> > +	return 0;
> > +}
> > +
> > +unsigned long long getnum(char *str, char **err)
> > +{
> > +	char *e;
> > +	unsigned long long rv = strtoull(str, &e, 10);
> > +	if (e==str || *e) {
> > +		*err = str;
> > +		return 0;
> > +	}
> > +	return rv;
> > +}
> > +
> > +int main(int argc, char *argv[])
> > +{
> > +	/* raid_disks chunk_size layout start length devices...
> > +	 */
> > +	int *fds;
> > +	char *buf;
> > +	unsigned long long *offsets;
> > +	int raid_disks, chunk_size, layout;
> > +	int level = 6;
> > +	unsigned long long start, length;
> > +	int i;
> > +
> > +	char *err = NULL;
> > +	if (argc < 8) {
> > +		fprintf(stderr, "Usage: raid6check raid_disks"
> > +			" chunk_size layout start length devices...\n");
> > +		exit(1);
> > +	}
> > +
> > +	raid_disks = getnum(argv[1], &err);
> > +	chunk_size = getnum(argv[2], &err);
> > +	layout = getnum(argv[3], &err);
> > +	start = getnum(argv[4], &err);
> > +	length = getnum(argv[5], &err);
> > +	if (err) {
> > +		fprintf(stderr, "test_stripe: Bad number: %s\n", err);
> > +		exit(2);
> > +	}
> > +	if (argc != raid_disks + 6) {
> > +		fprintf(stderr, "test_stripe: wrong number of devices: want %d found %d\n",
> > +			raid_disks, argc-6);
> > +		exit(2);
> > +	}
> > +	fds = malloc(raid_disks * sizeof(*fds));
> > +	offsets = malloc(raid_disks * sizeof(*offsets));
> > +	memset(offsets, 0, raid_disks * sizeof(*offsets));
> > +
> > +	for (i=0; i<raid_disks; i++) {
> > +		char *p;
> > +		p = strchr(argv[6+i], ':');
> > +
> > +		if(p != NULL) {
> > +			*p++ = '\0';
> > +			offsets[i] = atoll(p) * 512;
> > +		}
> > +		fds[i] = open(argv[6+i], O_RDWR);
> > +		if (fds[i] < 0) {
> > +			perror(argv[6+i]);
> > +			fprintf(stderr,"test_stripe: cannot open %s.\n", argv[6+i]);
> > +			exit(3);
> > +		}
> > +	}
> > +
> > +	buf = malloc(raid_disks * chunk_size);
> > +
> > +	int rv = check_stripes(fds, offsets,
> > +			       raid_disks, chunk_size, level, layout,
> > +			       start, length, &argv[6]);
> > +	if (rv != 0) {
> > +		fprintf(stderr,
> > +			"test_stripe: test_stripes returned %d\n", rv);
> > +		exit(1);
> > +	}
> > +
> > +	free(fds);
> > +	free(offsets);
> > +	free(buf);
> > +
> > +	exit(0);
> > +}
> > +
> > diff -urN a/restripe.c b/restripe.c
> > --- a/restripe.c	2011-02-19 22:36:29.707651642 +0100
> > +++ b/restripe.c	2011-02-19 18:13:14.468626986 +0100
> > @@ -33,7 +33,7 @@
> >   *
> >   */
> >  
> > -static int geo_map(int block, unsigned long long stripe, int raid_disks,
> > +int geo_map(int block, unsigned long long stripe, int raid_disks,
> >  		   int level, int layout)
> >  {
> >  	/* On the given stripe, find which disk in the array will have
> > @@ -223,7 +223,7 @@
> >  	}
> >  }
> >  
> > -static void qsyndrome(uint8_t *p, uint8_t *q, uint8_t **sources, int disks, int size)
> > +void qsyndrome(uint8_t *p, uint8_t *q, uint8_t **sources, int disks, int size)
> >  {
> >  	int d, z;
> >  	uint8_t wq0, wp0, wd0, w10, w20;
> > 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] RAID-6 check standalone
  2011-03-21 10:40   ` Piergiorgio Sartor
@ 2011-03-21 11:04     ` NeilBrown
  2011-03-21 11:54       ` Piergiorgio Sartor
  0 siblings, 1 reply; 29+ messages in thread
From: NeilBrown @ 2011-03-21 11:04 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: linux-raid

On Mon, 21 Mar 2011 11:40:07 +0100 Piergiorgio Sartor
<piergiorgio.sartor@nexgo.de> wrote:


> > I have applied this patch to me git now - with some minor changes.
> 
> Thanks.
> 
> One question, I did not find the previous patch to "restripe.c",
> which was adding the ":offset" capability. This was in order to
> be able to cope with metadata 1.1 or 1.2 (offset to be determined).

I cannot seem to find it.  Please resend.


> 
> > > Other item is that due to "sysfs.c" linking (see below) the
> > > "Makefile" needed some changes, I hope this is not a problem.
> > 
> > I have reverted most of these changes and the linking with sysfs.c isn't
> > needed yet.  When it is we can sort out how best to achieve it.
> > 
> > > 
> > > Next steps (TODO list you like) would be:
> > > 
> > > 1) Add the "sysfs.c" code in order to retrieve the HDDs info
> > > from the MD device. It is already linked, together with the
> > > whole (mdadm) universe, since it seems it cannot leave alone.
> > > I'll need some advice or hint on how to do use it. I checked
> > > "sysfs.c", but before I dig deep into it maybe better to
> > > have some advice (maybe just one function call will do it).
> > 
> > What exactly do you want to achieve?
> > 
> > I suspect you want to a open the md device, then
> > 
> >   info = sysfs_read(fd, -1,
> >   GET_LEVEL|GET_LAYOUT_GET_DISKS|GET_CHUNK|GET_DEVS|GET_OFFSET);
> > 
> > (possibly with other flags) and then extract the info you want from the data
> > structure returned - but I'm only guessing at what you might want.
> 
> I think this more or less correct.
> 
> I would like to pass, to "raid6check", the md device as only
> parameter, maybe with some start/stop position, and derive all
> the needed information from that.
> 
> That is, it should be: raid6check /dev/mdX start stop
> Or, possibly: raid6check /dev/mdX start length
> Or just: raid6check /dev/mdX
> 
> This means, the information that should be derived is:
> 
> 1) level, in order to confirm it is 6
> 2) layout
> 3) disk components
> 4) offset for each component
> 5) chunk size
> 
> That should be all, I guess.
> 
> Your "sysfs_read" seems to do exactly this, please confirm.

Yes, all that should be in there - experiment and see if you get what you
expect!

> 
> > > 
> > > 2) Add the suspend lo/hi control. Fellow John Robinson was
> > > suggesting to look into "Grow.c", which I did, but I guess
> > > the same story as 1) is valid: better to have some hint on
> > > where to look before wasting time.
> > 
> > This would be:
> > 
> >   sysfs_set_num(info, NULL, "suspend_lo", offset*data_disks);
> >   sysfs_set_num(info, NULL, "suspend_hi", (offset+chunksize)*data_disks);
> > 
> > to freeze one stripe.  Then work in there.
> > The addresses are addresses in the array, hence the multiplication
> > by data_disks (which is raid_disks - 2 for RAID6).
> > 
> > Don't hold the array suspended for too long or something might get
> > upset.  And allocate any memory you need first, and call
> >     	mlockall(MCL_CURRENT | MCL_FUTURE);
> > 
> > first to be even more safe.
> 
> Thanks for the explanation. How is, then, the "unfreeze"?
> Just writing (0) to both hi and lo?

Just make sure lo >= hi and it will unfreeze.
Some kernels are a bit fussy about the order of writing to these.
So if you just write a big number to 'lo', then the same to 'hi', you should
be safe.


NeilBrown


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] RAID-6 check standalone
  2011-03-21 11:04     ` NeilBrown
@ 2011-03-21 11:54       ` Piergiorgio Sartor
  2011-03-21 22:59         ` NeilBrown
  0 siblings, 1 reply; 29+ messages in thread
From: Piergiorgio Sartor @ 2011-03-21 11:54 UTC (permalink / raw)
  To: NeilBrown; +Cc: Piergiorgio Sartor, linux-raid

Hi Neil,

On Mon, Mar 21, 2011 at 10:04:57PM +1100, NeilBrown wrote:
> On Mon, 21 Mar 2011 11:40:07 +0100 Piergiorgio Sartor
> <piergiorgio.sartor@nexgo.de> wrote:
> 
> 
> > > I have applied this patch to me git now - with some minor changes.
> > 
> > Thanks.
> > 
> > One question, I did not find the previous patch to "restripe.c",
> > which was adding the ":offset" capability. This was in order to
> > be able to cope with metadata 1.1 or 1.2 (offset to be determined).
> 
> I cannot seem to find it.  Please resend.

neither me, maybe I just forgot to send it.

Anyway, here below is the patch:

--- cut here ---

diff -urN a/restripe.c b/restripe.c
--- a/restripe.c	2011-02-18 23:18:20.377740868 +0100
+++ b/restripe.c	2011-02-18 23:30:07.589841525 +0100
@@ -875,6 +875,14 @@
 		exit(3);
 	}
 	for (i=0; i<raid_disks; i++) {
+		char *p;
+		p = strchr(argv[9+i], ':');
+
+		if(p != NULL) {
+			*p++ = '\0';
+			offsets[i] = atoll(p) * 512;
+		}
+			
 		fds[i] = open(argv[9+i], O_RDWR);
 		if (fds[i] < 0) {
 			perror(argv[9+i]);

--- cut here ---

Thanks,

bye,

pg



> 
> > 
> > > > Other item is that due to "sysfs.c" linking (see below) the
> > > > "Makefile" needed some changes, I hope this is not a problem.
> > > 
> > > I have reverted most of these changes and the linking with sysfs.c isn't
> > > needed yet.  When it is we can sort out how best to achieve it.
> > > 
> > > > 
> > > > Next steps (TODO list you like) would be:
> > > > 
> > > > 1) Add the "sysfs.c" code in order to retrieve the HDDs info
> > > > from the MD device. It is already linked, together with the
> > > > whole (mdadm) universe, since it seems it cannot leave alone.
> > > > I'll need some advice or hint on how to do use it. I checked
> > > > "sysfs.c", but before I dig deep into it maybe better to
> > > > have some advice (maybe just one function call will do it).
> > > 
> > > What exactly do you want to achieve?
> > > 
> > > I suspect you want to a open the md device, then
> > > 
> > >   info = sysfs_read(fd, -1,
> > >   GET_LEVEL|GET_LAYOUT_GET_DISKS|GET_CHUNK|GET_DEVS|GET_OFFSET);
> > > 
> > > (possibly with other flags) and then extract the info you want from the data
> > > structure returned - but I'm only guessing at what you might want.
> > 
> > I think this more or less correct.
> > 
> > I would like to pass, to "raid6check", the md device as only
> > parameter, maybe with some start/stop position, and derive all
> > the needed information from that.
> > 
> > That is, it should be: raid6check /dev/mdX start stop
> > Or, possibly: raid6check /dev/mdX start length
> > Or just: raid6check /dev/mdX
> > 
> > This means, the information that should be derived is:
> > 
> > 1) level, in order to confirm it is 6
> > 2) layout
> > 3) disk components
> > 4) offset for each component
> > 5) chunk size
> > 
> > That should be all, I guess.
> > 
> > Your "sysfs_read" seems to do exactly this, please confirm.
> 
> Yes, all that should be in there - experiment and see if you get what you
> expect!
> 
> > 
> > > > 
> > > > 2) Add the suspend lo/hi control. Fellow John Robinson was
> > > > suggesting to look into "Grow.c", which I did, but I guess
> > > > the same story as 1) is valid: better to have some hint on
> > > > where to look before wasting time.
> > > 
> > > This would be:
> > > 
> > >   sysfs_set_num(info, NULL, "suspend_lo", offset*data_disks);
> > >   sysfs_set_num(info, NULL, "suspend_hi", (offset+chunksize)*data_disks);
> > > 
> > > to freeze one stripe.  Then work in there.
> > > The addresses are addresses in the array, hence the multiplication
> > > by data_disks (which is raid_disks - 2 for RAID6).
> > > 
> > > Don't hold the array suspended for too long or something might get
> > > upset.  And allocate any memory you need first, and call
> > >     	mlockall(MCL_CURRENT | MCL_FUTURE);
> > > 
> > > first to be even more safe.
> > 
> > Thanks for the explanation. How is, then, the "unfreeze"?
> > Just writing (0) to both hi and lo?
> 
> Just make sure lo >= hi and it will unfreeze.
> Some kernels are a bit fussy about the order of writing to these.
> So if you just write a big number to 'lo', then the same to 'hi', you should
> be safe.
> 
> 
> NeilBrown
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] RAID-6 check standalone
  2011-03-21 11:54       ` Piergiorgio Sartor
@ 2011-03-21 22:59         ` NeilBrown
  2011-03-31 18:53           ` [PATCH] RAID-6 check standalone md device Piergiorgio Sartor
  2011-04-04 17:52           ` [PATCH] RAID-6 check standalone code cleanup Piergiorgio Sartor
  0 siblings, 2 replies; 29+ messages in thread
From: NeilBrown @ 2011-03-21 22:59 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: linux-raid

On Mon, 21 Mar 2011 12:54:41 +0100 Piergiorgio Sartor
<piergiorgio.sartor@nexgo.de> wrote:

> Hi Neil,
> 
> On Mon, Mar 21, 2011 at 10:04:57PM +1100, NeilBrown wrote:
> > On Mon, 21 Mar 2011 11:40:07 +0100 Piergiorgio Sartor
> > <piergiorgio.sartor@nexgo.de> wrote:
> > 
> > 
> > > > I have applied this patch to me git now - with some minor changes.
> > > 
> > > Thanks.
> > > 
> > > One question, I did not find the previous patch to "restripe.c",
> > > which was adding the ":offset" capability. This was in order to
> > > be able to cope with metadata 1.1 or 1.2 (offset to be determined).
> > 
> > I cannot seem to find it.  Please resend.
> 
> neither me, maybe I just forgot to send it.
> 
> Anyway, here below is the patch:
> 
> --- cut here ---
> 
> diff -urN a/restripe.c b/restripe.c
> --- a/restripe.c	2011-02-18 23:18:20.377740868 +0100
> +++ b/restripe.c	2011-02-18 23:30:07.589841525 +0100
> @@ -875,6 +875,14 @@
>  		exit(3);
>  	}
>  	for (i=0; i<raid_disks; i++) {
> +		char *p;
> +		p = strchr(argv[9+i], ':');
> +
> +		if(p != NULL) {
> +			*p++ = '\0';
> +			offsets[i] = atoll(p) * 512;
> +		}
> +			
>  		fds[i] = open(argv[9+i], O_RDWR);
>  		if (fds[i] < 0) {
>  			perror(argv[9+i]);
> 

OK, that's applied now.
thanks,
NeilBrown


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH] RAID-6 check standalone md device
  2011-03-21 22:59         ` NeilBrown
@ 2011-03-31 18:53           ` Piergiorgio Sartor
       [not found]             ` <4D96597C.1020103@tuxes.nl>
  2011-04-04 23:01             ` NeilBrown
  2011-04-04 17:52           ` [PATCH] RAID-6 check standalone code cleanup Piergiorgio Sartor
  1 sibling, 2 replies; 29+ messages in thread
From: Piergiorgio Sartor @ 2011-03-31 18:53 UTC (permalink / raw)
  To: NeilBrown; +Cc: Piergiorgio Sartor, linux-raid

Hi Neil,

please find below the promised patch for the
RAID-6 check, which allows to pass only the
MD device, start and length.
The three parameters are mandatory.

All necessary information is collected using
the "sysfs_read()" call.
Furthermore, if "length" is "0", then the check
is performed until the end of the array.

The "Makefile" needed modifications too (as done
previously) in order to link "sysfs.c".

Some checks are done, for example if the md device
is really a RAID-6. Nevertheless I guess it is not
bullet proof...

Next patch will include the "suspend" action.
My idea is to do it "per stripe", please let me
know if you've some better options.

bye,

pg

--- cut here ---

diff -uNr a/Makefile b/Makefile
--- a/Makefile	2011-03-24 04:21:58.000000000 +0100
+++ b/Makefile	2011-03-29 21:43:33.495939342 +0200
@@ -98,7 +98,7 @@
 MAN5DIR = $(MANDIR)/man5
 MAN8DIR = $(MANDIR)/man8
 
-OBJS =  mdadm.o config.o policy.o mdstat.o  ReadMe.o util.o Manage.o Assemble.o Build.o \
+OBJSX = config.o policy.o mdstat.o  ReadMe.o util.o Manage.o Assemble.o Build.o \
 	Create.o Detail.o Examine.o Grow.o Monitor.o dlink.o Kill.o Query.o \
 	Incremental.o \
 	mdopen.o super0.o super1.o super-ddf.o super-intel.o bitmap.o \
@@ -106,7 +106,7 @@
 	restripe.o sysfs.o sha1.o mapfile.o crc32.o sg_io.o msg.o \
 	platform-intel.o probe_roms.o
 
-OBJSX = restripe.o
+OBJS = mdadm.o $(OBJSX)
 
 SRCS =  mdadm.c config.c policy.c mdstat.c  ReadMe.c util.c Manage.c Assemble.c Build.c \
 	Create.c Detail.c Examine.c Grow.c Monitor.c dlink.c Kill.c Query.c \
diff -uNr a/raid6check.c b/raid6check.c
--- a/raid6check.c	2011-03-24 04:21:58.000000000 +0100
+++ b/raid6check.c	2011-03-31 20:44:45.820258613 +0200
@@ -159,7 +159,7 @@
 				       level, layout);
 		}
 		if(disk >= 0) {
-			printf("Possible failed disk: %d --> %s\n", disk, name[disk]);
+			printf("Possible failed disk slot: %d --> %s\n", disk, name[disk]);
 		}
 		if(disk == -65535) {
 			printf("Failure detected, but disk unknown\n");
@@ -192,71 +192,121 @@
 
 int main(int argc, char *argv[])
 {
-	/* raid_disks chunk_size layout start length devices...
-	 */
+	/* md_device start length */
 	int *fds;
 	char *buf;
+	char **disk_name;
 	unsigned long long *offsets;
 	int raid_disks, chunk_size, layout;
 	int level = 6;
 	unsigned long long start, length;
 	int i;
+	int mdfd;
+	struct mdinfo *info, *comp;
+        char *err = NULL;
+	const char prg[] = "raid6check";
+
+        if (argc < 3) {
+                fprintf(stderr, "Usage: %s md_device start length\n", prg);
+                exit(1);
+        }
+
+	mdfd = open(argv[1], O_RDONLY);
+	if(mdfd < 0) {
+		perror(argv[1]);
+		fprintf(stderr,"%s: cannot open %s\n", prg, argv[1]);
+		exit(4);
+	}
+
+	info  = sysfs_read(mdfd, -1,
+		GET_LEVEL|
+		GET_LAYOUT|
+		GET_DISKS|
+		GET_COMPONENT|
+		GET_CHUNK|
+		GET_DEVS|
+		GET_OFFSET|
+		GET_SIZE);
+
+	if(info->array.level != level) {
+		fprintf(stderr, "%s: %s not a RAID-6\n", prg, argv[1]);
+		exit(5);
+	}
+
+	printf("layout: %d\n", info->array.layout);
+	printf("disks: %d\n", info->array.raid_disks);
+	printf("component size: %llu\n", info->component_size*512);
+	printf("chunk size: %d\n", info->array.chunk_size);
+	printf("\n");
+
+	comp = info->devs;
+	for(i = 0; i < info->array.raid_disks; i++) {
+		printf("disk: %d - offset: %llu - size: %llu - name: %s - slot: %d\n",
+			i, comp->data_offset, comp->component_size*512,
+			map_dev(comp->disk.major, comp->disk.minor, 0),
+			comp->disk.raid_disk);
+
+		comp = comp->next;
+	}
+	printf("\n");
+
+	close(mdfd);
+
+	raid_disks = info->array.raid_disks;
+	chunk_size = info->array.chunk_size;
+	layout = info->array.layout;
+	start = getnum(argv[2], &err);
+	length = getnum(argv[3], &err);
 
-	char *err = NULL;
-	if (argc < 8) {
-		fprintf(stderr, "Usage: raid6check raid_disks"
-			" chunk_size layout start length devices...\n");
-		exit(1);
-	}
-
-	raid_disks = getnum(argv[1], &err);
-	chunk_size = getnum(argv[2], &err);
-	layout = getnum(argv[3], &err);
-	start = getnum(argv[4], &err);
-	length = getnum(argv[5], &err);
 	if (err) {
-		fprintf(stderr, "test_stripe: Bad number: %s\n", err);
+		fprintf(stderr, "%s: Bad number: %s\n", prg, err);
 		exit(2);
 	}
-	if (argc != raid_disks + 6) {
-		fprintf(stderr, "test_stripe: wrong number of devices: want %d found %d\n",
-			raid_disks, argc-6);
-		exit(2);
+
+	start = (start / chunk_size) * chunk_size;
+
+	if(length == 0) {
+		length = info->component_size * 512 - start;
 	}
+
+	disk_name = malloc(raid_disks * sizeof(*disk_name));
 	fds = malloc(raid_disks * sizeof(*fds));
 	offsets = malloc(raid_disks * sizeof(*offsets));
 	memset(offsets, 0, raid_disks * sizeof(*offsets));
 
+	comp = info->devs;
 	for (i=0; i<raid_disks; i++) {
-		char *p;
-		p = strchr(argv[6+i], ':');
-
-		if(p != NULL) {
-			*p++ = '\0';
-			offsets[i] = atoll(p) * 512;
-		}
-		fds[i] = open(argv[6+i], O_RDWR);
-		if (fds[i] < 0) {
-			perror(argv[6+i]);
-			fprintf(stderr,"test_stripe: cannot open %s.\n", argv[6+i]);
+		int disk_slot = comp->disk.raid_disk;
+		disk_name[disk_slot] = map_dev(comp->disk.major, comp->disk.minor, 0);
+		offsets[disk_slot] = comp->data_offset * 512;
+		fds[disk_slot] = open(disk_name[disk_slot], O_RDWR);
+		if (fds[disk_slot] < 0) {
+			perror(disk_name[disk_slot]);
+			fprintf(stderr,"%s: cannot open %s\n", prg, disk_name[disk_slot]);
 			exit(3);
 		}
+
+		comp = comp->next;
 	}
 
 	buf = malloc(raid_disks * chunk_size);
 
 	int rv = check_stripes(fds, offsets,
 			       raid_disks, chunk_size, level, layout,
-			       start, length, &argv[6]);
+			       start, length, disk_name);
 	if (rv != 0) {
 		fprintf(stderr,
-			"test_stripe: test_stripes returned %d\n", rv);
+			"%s: check_stripes returned %d\n", prg, rv);
 		exit(1);
 	}
 
+	free(disk_name);
 	free(fds);
 	free(offsets);
 	free(buf);
 
+	for(i=0; i<raid_disks; i++)
+		close(fds[i]);
+
 	exit(0);
 }

--- cut here ---

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] RAID-6 check standalone md device
       [not found]               ` <20110402071310.GA2640@lazy.lzy>
@ 2011-04-02 10:33                 ` Bas van Schaik
  2011-04-02 11:03                   ` Piergiorgio Sartor
  0 siblings, 1 reply; 29+ messages in thread
From: Bas van Schaik @ 2011-04-02 10:33 UTC (permalink / raw)
  To: Piergiorgio Sartor, linux-raid

On 04/02/2011 08:13 AM, Piergiorgio Sartor wrote:
> Hi Bas,
>
> On Sat, Apr 02, 2011 at 12:02:20AM +0100, Bas van Schaik wrote:
>> Dear Piergiorgio,
>>
>> On 03/31/2011 07:53 PM, Piergiorgio Sartor wrote:
>>> please find below the promised patch for the
>>> RAID-6 check, which allows to pass only the
>>> MD device, start and length.
>>> (...)
>> Recently, I repeatedly ran into block mismatches in my 8x2TB RAID-6
>> array (and a corrupted EXT3 filesystem). I've been playing with the idea
>> (as expressed in my posts to the linux-raid list in the last month) to
>> extend the RAID-6 check in order to report more verbose information
>> about a block mismatch, which might help to identify the failing hard drive.
>>
>> Do I understand correctly that your recent patches implement exactly
>> this feature?
> yes, that's the target.
That's great! The code can currently only be found in GIT, right? I'll
try to fetch it from there and compile the user-space binaries. Of
course I'll report any suggestions back to you, including patches if I
feel confident enough...

Thanks,

  Bas

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] RAID-6 check standalone md device
  2011-04-02 10:33                 ` Bas van Schaik
@ 2011-04-02 11:03                   ` Piergiorgio Sartor
  0 siblings, 0 replies; 29+ messages in thread
From: Piergiorgio Sartor @ 2011-04-02 11:03 UTC (permalink / raw)
  To: Bas van Schaik; +Cc: Piergiorgio Sartor, linux-raid

Hi Bas,

On Sat, Apr 02, 2011 at 11:33:33AM +0100, Bas van Schaik wrote:
> On 04/02/2011 08:13 AM, Piergiorgio Sartor wrote:
> > Hi Bas,
> >
> > On Sat, Apr 02, 2011 at 12:02:20AM +0100, Bas van Schaik wrote:
> >> Dear Piergiorgio,
> >>
> >> On 03/31/2011 07:53 PM, Piergiorgio Sartor wrote:
> >>> please find below the promised patch for the
> >>> RAID-6 check, which allows to pass only the
> >>> MD device, start and length.
> >>> (...)
> >> Recently, I repeatedly ran into block mismatches in my 8x2TB RAID-6
> >> array (and a corrupted EXT3 filesystem). I've been playing with the idea
> >> (as expressed in my posts to the linux-raid list in the last month) to
> >> extend the RAID-6 check in order to report more verbose information
> >> about a block mismatch, which might help to identify the failing hard drive.
> >>
> >> Do I understand correctly that your recent patches implement exactly
> >> this feature?
> > yes, that's the target.
> That's great! The code can currently only be found in GIT, right? I'll

you'll have to add the latest patch, from two days ago,
yourself, I do not see it in git, yet.
It is for mdadm 3.2, that is the development branch.

> try to fetch it from there and compile the user-space binaries. Of

Just unpack the tar.gz, patch it and "make raid6check".

> course I'll report any suggestions back to you, including patches if I
> feel confident enough...

Yes please, I'll try to follow.
 
> Thanks,

Thank you,

bye,

pg

>   Bas
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH] RAID-6 check standalone code cleanup
  2011-03-21 22:59         ` NeilBrown
  2011-03-31 18:53           ` [PATCH] RAID-6 check standalone md device Piergiorgio Sartor
@ 2011-04-04 17:52           ` Piergiorgio Sartor
  2011-04-04 23:12             ` NeilBrown
  1 sibling, 1 reply; 29+ messages in thread
From: Piergiorgio Sartor @ 2011-04-04 17:52 UTC (permalink / raw)
  To: NeilBrown; +Cc: Piergiorgio Sartor, linux-raid

Hi Neil,

please find below a second patch to "raid6check.c".
This applies on top of the previous one.

Major change is code cleanup and simplification.
Furthermore, a better error handling and a couple
of bug fixes.
Last but not least, the command line parameters are
changed from "bytes" to "stripes", which is more
convenient, I guess.

If you prefer, I can send a single patch, including
in one shot the last one and this one.

Thanks,

bye,

pg

--- cut here ---

diff -uNr a/raid6check.c b/raid6check.c
--- a/raid6check.c	2011-03-31 20:44:45.820258613 +0200
+++ b/raid6check.c	2011-04-04 19:44:52.539745305 +0200
@@ -114,9 +114,20 @@
 	int i;
 	int diskP, diskQ;
 	int data_disks = raid_disks - 2;
+	int err = 0;
 
 	extern int tables_ready;
 
+	if((stripe_buf == NULL) ||
+	   (stripes == NULL) ||
+	   (blocks == NULL) ||
+	   (p == NULL) ||
+	   (q == NULL) ||
+	   (results == NULL)) {
+		err = 1;
+		goto exitCheck;
+	}
+
 	if (!tables_ready)
 		make_tables();
 
@@ -126,49 +137,47 @@
 	while (length > 0) {
 		int disk;
 
+		printf("pos --> %llu\n", start);
+
 		for (i = 0 ; i < raid_disks ; i++) {
-			lseek64(source[i], offsets[i]+start, 0);
+			lseek64(source[i], offsets[i] + start * chunk_size, 0);
 			read(source[i], stripes[i], chunk_size);
 		}
 		for (i = 0 ; i < data_disks ; i++) {
-			int disk = geo_map(i, start/chunk_size, raid_disks,
-					   level, layout);
+			int disk = geo_map(i, start, raid_disks, level, layout);
 			blocks[i] = stripes[disk];
 			printf("%d->%d\n", i, disk);
 		}
 
 		qsyndrome(p, q, (uint8_t**)blocks, data_disks, chunk_size);
-		diskP = geo_map(-1, start/chunk_size, raid_disks,
-				level, layout);
+		diskP = geo_map(-1, start, raid_disks, level, layout);
 		if (memcmp(p, stripes[diskP], chunk_size) != 0) {
-			printf("P(%d) wrong at %llu\n", diskP,
-			       start / chunk_size);
+			printf("P(%d) wrong at %llu\n", diskP, start);
 		}
-		diskQ = geo_map(-2, start/chunk_size, raid_disks,
-				level, layout);
+		diskQ = geo_map(-2, start, raid_disks, level, layout);
 		if (memcmp(q, stripes[diskQ], chunk_size) != 0) {
-			printf("Q(%d) wrong at %llu\n", diskQ,
-			       start / chunk_size);
+			printf("Q(%d) wrong at %llu\n", diskQ, start);
 		}
-		raid6_collect(chunk_size, p, q,
-			      stripes[diskP], stripes[diskQ], results);
+		raid6_collect(chunk_size, p, q, stripes[diskP], stripes[diskQ], results);
 		disk = raid6_stats(results, raid_disks, chunk_size);
 
 		if(disk >= -2) {
-			disk = geo_map(disk, start/chunk_size, raid_disks,
-				       level, layout);
+			disk = geo_map(disk, start, raid_disks, level, layout);
 		}
 		if(disk >= 0) {
-			printf("Possible failed disk slot: %d --> %s\n", disk, name[disk]);
+			printf("Error detected at %llu: possible failed disk slot: %d --> %s\n",
+				start, disk, name[disk]);
 		}
 		if(disk == -65535) {
-			printf("Failure detected, but disk unknown\n");
+			printf("Error detected at %llu: disk slot unknown\n", start);
 		}
 
-		length -= chunk_size;
-		start += chunk_size;
+		length--;
+		start++;
 	}
 
+exitCheck:
+
 	free(stripe_buf);
 	free(stripes);
 	free(blocks);
@@ -176,7 +185,7 @@
 	free(q);
 	free(results);
 
-	return 0;
+	return err;
 }
 
 unsigned long long getnum(char *str, char **err)
@@ -193,29 +202,40 @@
 int main(int argc, char *argv[])
 {
 	/* md_device start length */
-	int *fds;
-	char *buf;
-	char **disk_name;
-	unsigned long long *offsets;
-	int raid_disks, chunk_size, layout;
+	int *fds = NULL;
+	char *buf = NULL;
+	char **disk_name = NULL;
+	unsigned long long *offsets = NULL;
+	int raid_disks = 0;
+	int chunk_size = 0;
+	int layout = -1;
 	int level = 6;
 	unsigned long long start, length;
 	int i;
 	int mdfd;
 	struct mdinfo *info, *comp;
         char *err = NULL;
-	const char prg[] = "raid6check";
-
-        if (argc < 3) {
-                fprintf(stderr, "Usage: %s md_device start length\n", prg);
-                exit(1);
+	int exit_err = 0;
+	int close_flag = 0;
+	char *prg = strrchr(argv[0], '/');
+
+	if (prg == NULL)
+		prg = argv[0];
+	else
+		prg++;
+
+        if (argc < 4) {
+                fprintf(stderr, "Usage: %s md_device start_stripe length_stripes\n", prg);
+                exit_err = 1;
+		goto exitHere;
         }
 
 	mdfd = open(argv[1], O_RDONLY);
 	if(mdfd < 0) {
 		perror(argv[1]);
 		fprintf(stderr,"%s: cannot open %s\n", prg, argv[1]);
-		exit(4);
+		exit_err = 2;
+		goto exitHere;
 	}
 
 	info  = sysfs_read(mdfd, -1,
@@ -230,19 +250,21 @@
 
 	if(info->array.level != level) {
 		fprintf(stderr, "%s: %s not a RAID-6\n", prg, argv[1]);
-		exit(5);
+		exit_err = 3;
+		goto exitHere;
 	}
 
 	printf("layout: %d\n", info->array.layout);
 	printf("disks: %d\n", info->array.raid_disks);
-	printf("component size: %llu\n", info->component_size*512);
+	printf("component size: %llu\n", info->component_size * 512);
+	printf("total stripes: %llu\n", (info->component_size * 512) / info->array.chunk_size);
 	printf("chunk size: %d\n", info->array.chunk_size);
 	printf("\n");
 
 	comp = info->devs;
 	for(i = 0; i < info->array.raid_disks; i++) {
 		printf("disk: %d - offset: %llu - size: %llu - name: %s - slot: %d\n",
-			i, comp->data_offset, comp->component_size*512,
+			i, comp->data_offset * 512, comp->component_size * 512,
 			map_dev(comp->disk.major, comp->disk.minor, 0),
 			comp->disk.raid_disk);
 
@@ -260,19 +282,39 @@
 
 	if (err) {
 		fprintf(stderr, "%s: Bad number: %s\n", prg, err);
-		exit(2);
+		exit_err = 4;
+		goto exitHere;
 	}
 
-	start = (start / chunk_size) * chunk_size;
+	if(start > ((info->component_size * 512) / chunk_size)) {
+		start = (info->component_size * 512) / chunk_size;
+		fprintf(stderr, "%s: start beyond disks size\n", prg);
+	}
 
-	if(length == 0) {
-		length = info->component_size * 512 - start;
+	if((length == 0) ||
+	   ((length + start) > ((info->component_size * 512) / chunk_size))) {
+		length = (info->component_size * 512) / chunk_size - start;
 	}
 
 	disk_name = malloc(raid_disks * sizeof(*disk_name));
 	fds = malloc(raid_disks * sizeof(*fds));
 	offsets = malloc(raid_disks * sizeof(*offsets));
+	buf = malloc(raid_disks * chunk_size);
+
+	if((disk_name == NULL) ||
+	   (fds == NULL) ||
+	   (offsets == NULL) ||
+	   (buf == NULL)) {
+		fprintf(stderr, "%s: allocation fail\n", prg);
+		exit_err = 5;
+		goto exitHere;
+	} 
+
 	memset(offsets, 0, raid_disks * sizeof(*offsets));
+	for(i=0; i<raid_disks; i++) {	
+		fds[i] = -1;
+	}
+	close_flag = 1;
 
 	comp = info->devs;
 	for (i=0; i<raid_disks; i++) {
@@ -283,21 +325,29 @@
 		if (fds[disk_slot] < 0) {
 			perror(disk_name[disk_slot]);
 			fprintf(stderr,"%s: cannot open %s\n", prg, disk_name[disk_slot]);
-			exit(3);
+			exit_err = 6;
+			goto exitHere;
 		}
 
 		comp = comp->next;
 	}
 
-	buf = malloc(raid_disks * chunk_size);
-
 	int rv = check_stripes(fds, offsets,
 			       raid_disks, chunk_size, level, layout,
 			       start, length, disk_name);
 	if (rv != 0) {
 		fprintf(stderr,
 			"%s: check_stripes returned %d\n", prg, rv);
-		exit(1);
+		exit_err = 7;
+		goto exitHere;
+	}
+
+exitHere:
+
+	if(close_flag == 1) {
+		for(i=0; i<raid_disks; i++) {
+			close(fds[i]);
+		}
 	}
 
 	free(disk_name);
@@ -305,8 +355,5 @@
 	free(offsets);
 	free(buf);
 
-	for(i=0; i<raid_disks; i++)
-		close(fds[i]);
-
-	exit(0);
+	exit(exit_err);
 }

--- cut here ---

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] RAID-6 check standalone md device
  2011-03-31 18:53           ` [PATCH] RAID-6 check standalone md device Piergiorgio Sartor
       [not found]             ` <4D96597C.1020103@tuxes.nl>
@ 2011-04-04 23:01             ` NeilBrown
  2011-04-05 19:56               ` Piergiorgio Sartor
  1 sibling, 1 reply; 29+ messages in thread
From: NeilBrown @ 2011-04-04 23:01 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: linux-raid

On Thu, 31 Mar 2011 20:53:46 +0200 Piergiorgio Sartor
<piergiorgio.sartor@nexgo.de> wrote:

> Hi Neil,
> 
> please find below the promised patch for the
> RAID-6 check, which allows to pass only the
> MD device, start and length.
> The three parameters are mandatory.
> 
> All necessary information is collected using
> the "sysfs_read()" call.
> Furthermore, if "length" is "0", then the check
> is performed until the end of the array.
> 
> The "Makefile" needed modifications too (as done
> previously) in order to link "sysfs.c".
> 
> Some checks are done, for example if the md device
> is really a RAID-6. Nevertheless I guess it is not
> bullet proof...
> 
> Next patch will include the "suspend" action.
> My idea is to do it "per stripe", please let me
> know if you've some better options.

Hi,
 I've applied this patch after some minor clean ups (indenting mostly).
 I rearranged some code first so that we don't need to link all all of the
 rest of mdadm - just a few 'library-style' files.

 I'm not sure about the loop:
>  
> +	comp = info->devs;
>  	for (i=0; i<raid_disks; i++) {
> -		char *p;
> -		p = strchr(argv[6+i], ':');
> -
> -		if(p != NULL) {
> -			*p++ = '\0';
> -			offsets[i] = atoll(p) * 512;
> -		}
> -		fds[i] = open(argv[6+i], O_RDWR);
> -		if (fds[i] < 0) {
> -			perror(argv[6+i]);
> -			fprintf(stderr,"test_stripe: cannot open %s.\n", argv[6+i]);
> +		int disk_slot = comp->disk.raid_disk;
> +		disk_name[disk_slot] = map_dev(comp->disk.major, comp->disk.minor, 0);
> +		offsets[disk_slot] = comp->data_offset * 512;
> +		fds[disk_slot] = open(disk_name[disk_slot], O_RDWR);
> +		if (fds[disk_slot] < 0) {
> +			perror(disk_name[disk_slot]);
> +			fprintf(stderr,"%s: cannot open %s\n", prg, disk_name[disk_slot]);
>  			exit(3);
>  		}
> +
> +		comp = comp->next;
>  	}


The 'info->devs' list could include spare devices mixed in with the other
devices.  So the 'for' loop should go to the end of list list, and we should
ignore devices which are not active..
So you should probably fix this.

You will be able to find the applied patch in my 'master' branch shortly.

Thanks,
NeilBrown


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] RAID-6 check standalone code cleanup
  2011-04-04 17:52           ` [PATCH] RAID-6 check standalone code cleanup Piergiorgio Sartor
@ 2011-04-04 23:12             ` NeilBrown
  2011-04-06 18:02               ` Piergiorgio Sartor
  2011-05-08 18:54               ` [PATCH] RAID-6 check standalone suspend array Piergiorgio Sartor
  0 siblings, 2 replies; 29+ messages in thread
From: NeilBrown @ 2011-04-04 23:12 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: linux-raid

On Mon, 4 Apr 2011 19:52:42 +0200 Piergiorgio Sartor
<piergiorgio.sartor@nexgo.de> wrote:

> Hi Neil,
> 
> please find below a second patch to "raid6check.c".
> This applies on top of the previous one.
> 
> Major change is code cleanup and simplification.
> Furthermore, a better error handling and a couple
> of bug fixes.
> Last but not least, the command line parameters are
> changed from "bytes" to "stripes", which is more
> convenient, I guess.

Thanks - I've applied this.

I'm not sure about using 'stripes', though it would be hard to argue in
favour of 'bytes'.
Possibly the best number to use would be 'sectors' as that is how the kernel
would report an inconsistency.

Once the code settles and you work out what the expected usage pattern would
be, it might then be obvious what the best number is.  i.e. try to document
how it would be use and if you find yourself describing complex calculations,
then change the program so it does the the calculations and you document can
avoid the complexity.

> 
> If you prefer, I can send a single patch, including
> in one shot the last one and this one.

no, multiple patches are much better - thanks.

As for the granularity for suspend/check/fix/unsuspend, I suspect that 
per-stripe would be best.
A smaller size wouldn't work, and a bigger size would only be helpful if
there were lots and lots of fixes needed ... which hopefully won't be the
case.

NeilBrown


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] RAID-6 check standalone md device
  2011-04-04 23:01             ` NeilBrown
@ 2011-04-05 19:56               ` Piergiorgio Sartor
  0 siblings, 0 replies; 29+ messages in thread
From: Piergiorgio Sartor @ 2011-04-05 19:56 UTC (permalink / raw)
  To: NeilBrown; +Cc: Piergiorgio Sartor, linux-raid

Hi Neil,

On Tue, Apr 05, 2011 at 09:01:13AM +1000, NeilBrown wrote:
> On Thu, 31 Mar 2011 20:53:46 +0200 Piergiorgio Sartor
> <piergiorgio.sartor@nexgo.de> wrote:
> 
> > Hi Neil,
> > 
> > please find below the promised patch for the
> > RAID-6 check, which allows to pass only the
> > MD device, start and length.
> > The three parameters are mandatory.
> > 
> > All necessary information is collected using
> > the "sysfs_read()" call.
> > Furthermore, if "length" is "0", then the check
> > is performed until the end of the array.
> > 
> > The "Makefile" needed modifications too (as done
> > previously) in order to link "sysfs.c".
> > 
> > Some checks are done, for example if the md device
> > is really a RAID-6. Nevertheless I guess it is not
> > bullet proof...
> > 
> > Next patch will include the "suspend" action.
> > My idea is to do it "per stripe", please let me
> > know if you've some better options.
> 
> Hi,
>  I've applied this patch after some minor clean ups (indenting mostly).
>  I rearranged some code first so that we don't need to link all all of the
>  rest of mdadm - just a few 'library-style' files.
> 
>  I'm not sure about the loop:
> >  
> > +	comp = info->devs;
> >  	for (i=0; i<raid_disks; i++) {
> > -		char *p;
> > -		p = strchr(argv[6+i], ':');
> > -
> > -		if(p != NULL) {
> > -			*p++ = '\0';
> > -			offsets[i] = atoll(p) * 512;
> > -		}
> > -		fds[i] = open(argv[6+i], O_RDWR);
> > -		if (fds[i] < 0) {
> > -			perror(argv[6+i]);
> > -			fprintf(stderr,"test_stripe: cannot open %s.\n", argv[6+i]);
> > +		int disk_slot = comp->disk.raid_disk;
> > +		disk_name[disk_slot] = map_dev(comp->disk.major, comp->disk.minor, 0);
> > +		offsets[disk_slot] = comp->data_offset * 512;
> > +		fds[disk_slot] = open(disk_name[disk_slot], O_RDWR);
> > +		if (fds[disk_slot] < 0) {
> > +			perror(disk_name[disk_slot]);
> > +			fprintf(stderr,"%s: cannot open %s\n", prg, disk_name[disk_slot]);
> >  			exit(3);
> >  		}
> > +
> > +		comp = comp->next;
> >  	}
> 
> 
> The 'info->devs' list could include spare devices mixed in with the other
> devices.  So the 'for' loop should go to the end of list list, and we should
> ignore devices which are not active..
> So you should probably fix this.

thanks for the information, that's actually very
important.

I'll try to provide a fix patch in then next days.

> You will be able to find the applied patch in my 'master' branch shortly.

Good!

Thanks again,

bye,

pg
 
> Thanks,
> NeilBrown
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] RAID-6 check standalone code cleanup
  2011-04-04 23:12             ` NeilBrown
@ 2011-04-06 18:02               ` Piergiorgio Sartor
  2011-04-13 20:48                 ` [PATCH] RAID-6 check standalone fix component list parsing Piergiorgio Sartor
  2011-04-14  7:32                 ` [PATCH] RAID-6 check standalone code cleanup NeilBrown
  2011-05-08 18:54               ` [PATCH] RAID-6 check standalone suspend array Piergiorgio Sartor
  1 sibling, 2 replies; 29+ messages in thread
From: Piergiorgio Sartor @ 2011-04-06 18:02 UTC (permalink / raw)
  To: NeilBrown; +Cc: Piergiorgio Sartor, linux-raid

Hi Neil,

On Tue, Apr 05, 2011 at 09:12:42AM +1000, NeilBrown wrote:
> On Mon, 4 Apr 2011 19:52:42 +0200 Piergiorgio Sartor
> <piergiorgio.sartor@nexgo.de> wrote:
> 
> > Hi Neil,
> > 
> > please find below a second patch to "raid6check.c".
> > This applies on top of the previous one.
> > 
> > Major change is code cleanup and simplification.
> > Furthermore, a better error handling and a couple
> > of bug fixes.
> > Last but not least, the command line parameters are
> > changed from "bytes" to "stripes", which is more
> > convenient, I guess.
> 
> Thanks - I've applied this.

please find attached very below the fix for the
component list scanning. Taking care, hopefully,
to skip/avoid spare drives.
Furthermore, I added also a check for degraded
array, which should not be checked.

> I'm not sure about using 'stripes', though it would be hard to argue in
> favour of 'bytes'.
> Possibly the best number to use would be 'sectors' as that is how the kernel
> would report an inconsistency.
> 
> Once the code settles and you work out what the expected usage pattern would
> be, it might then be obvious what the best number is.  i.e. try to document
> how it would be use and if you find yourself describing complex calculations,
> then change the program so it does the the calculations and you document can
> avoid the complexity.

I switched to "stripes" because the code is using theme
all over and because I was continuosly calculating from
stripe to bytes.

I guess you're right, later it will be possible to decided
which is the better unit for command line and for the
error reporting.

> > 
> > If you prefer, I can send a single patch, including
> > in one shot the last one and this one.
> 
> no, multiple patches are much better - thanks.
> 
> As for the granularity for suspend/check/fix/unsuspend, I suspect that 
> per-stripe would be best.
> A smaller size wouldn't work, and a bigger size would only be helpful if
> there were lots and lots of fixes needed ... which hopefully won't be the
> case.

The suspend story might be a bit more complex than
I was considering.

For example, what will happen if the user hits ctrl-c
while the array is suspended?
Maybe the signals will have to be blocked or re-routed
to a proper cleanup function.
How about kill -9?

Second issue, the stripe in the array should be suspend
also in case the user wants a correction to happen.
In this situation, the suspend should include read, check
and write, since it will not be possible to allow some
other access in between the operations.
Could it be this is too long time for the stripe to be
blocked?

Maybe it would be simpler to require the arrays is in
read only mode....

What do you think?

Thanks,

bye,

pg

Patch follows here:

--- cut here ---

diff -uNr a/raid6check.c b/raid6check.c
--- a/raid6check.c	2011-04-05 01:29:45.000000000 +0200
+++ b/raid6check.c	2011-04-05 22:51:32.587032612 +0200
@@ -207,6 +207,7 @@
 	char **disk_name = NULL;
 	unsigned long long *offsets = NULL;
 	int raid_disks = 0;
+	int active_disks = 0;
 	int chunk_size = 0;
 	int layout = -1;
 	int level = 6;
@@ -242,6 +243,7 @@
 			  GET_LEVEL|
 			  GET_LAYOUT|
 			  GET_DISKS|
+			  GET_DEGRADED |
 			  GET_COMPONENT|
 			  GET_CHUNK|
 			  GET_DEVS|
@@ -254,6 +256,12 @@
 		goto exitHere;
 	}
 
+	if(info->array.failed_disks > 0) {
+		fprintf(stderr, "%s: %s degraded array\n", prg, argv[1]);
+		exit_err = 8;
+		goto exitHere;
+	}
+
 	printf("layout: %d\n", info->array.layout);
 	printf("disks: %d\n", info->array.raid_disks);
 	printf("component size: %llu\n", info->component_size * 512);
@@ -262,12 +270,13 @@
 	printf("\n");
 
 	comp = info->devs;
-	for(i = 0; i < info->array.raid_disks; i++) {
+	for(i = 0, active_disks = 0; active_disks < info->array.raid_disks; i++) {
 		printf("disk: %d - offset: %llu - size: %llu - name: %s - slot: %d\n",
 			i, comp->data_offset * 512, comp->component_size * 512,
 			map_dev(comp->disk.major, comp->disk.minor, 0),
 			comp->disk.raid_disk);
-
+		if(comp->disk.raid_disk >= 0)
+			active_disks++;
 		comp = comp->next;
 	}
 	printf("\n");
@@ -317,18 +326,20 @@
 	close_flag = 1;
 
 	comp = info->devs;
-	for (i=0; i<raid_disks; i++) {
+	for (i=0, active_disks=0; active_disks<raid_disks; i++) {
 		int disk_slot = comp->disk.raid_disk;
-		disk_name[disk_slot] = map_dev(comp->disk.major, comp->disk.minor, 0);
-		offsets[disk_slot] = comp->data_offset * 512;
-		fds[disk_slot] = open(disk_name[disk_slot], O_RDWR);
-		if (fds[disk_slot] < 0) {
-			perror(disk_name[disk_slot]);
-			fprintf(stderr,"%s: cannot open %s\n", prg, disk_name[disk_slot]);
-			exit_err = 6;
-			goto exitHere;
+		if(disk_slot >= 0) {
+			disk_name[disk_slot] = map_dev(comp->disk.major, comp->disk.minor, 0);
+			offsets[disk_slot] = comp->data_offset * 512;
+			fds[disk_slot] = open(disk_name[disk_slot], O_RDWR);
+			if (fds[disk_slot] < 0) {
+				perror(disk_name[disk_slot]);
+				fprintf(stderr,"%s: cannot open %s\n", prg, disk_name[disk_slot]);
+				exit_err = 6;
+				goto exitHere;
+			}
+			active_disks++;
 		}
-
 		comp = comp->next;
 	}
 
--- cut here ---

> NeilBrown
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH] RAID-6 check standalone fix component list parsing
  2011-04-06 18:02               ` Piergiorgio Sartor
@ 2011-04-13 20:48                 ` Piergiorgio Sartor
  2011-04-14  7:29                   ` NeilBrown
  2011-04-14  7:32                 ` [PATCH] RAID-6 check standalone code cleanup NeilBrown
  1 sibling, 1 reply; 29+ messages in thread
From: Piergiorgio Sartor @ 2011-04-13 20:48 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: NeilBrown, linux-raid

Hi Neil,

maybe you missed the other email, anyway please
find attached the patch to fix the parsing of
the component list, i.e. skipping the "spare" one.

I also added a check in case the array is degraded.

Thanks,

pg

--- cut here ---


diff -uNr a/raid6check.c b/raid6check.c
--- a/raid6check.c	2011-04-05 01:29:45.000000000 +0200
+++ b/raid6check.c	2011-04-05 22:51:32.587032612 +0200
@@ -207,6 +207,7 @@
 	char **disk_name = NULL;
 	unsigned long long *offsets = NULL;
 	int raid_disks = 0;
+	int active_disks = 0;
 	int chunk_size = 0;
 	int layout = -1;
 	int level = 6;
@@ -242,6 +243,7 @@
 			  GET_LEVEL|
 			  GET_LAYOUT|
 			  GET_DISKS|
+			  GET_DEGRADED |
 			  GET_COMPONENT|
 			  GET_CHUNK|
 			  GET_DEVS|
@@ -254,6 +256,12 @@
 		goto exitHere;
 	}
 
+	if(info->array.failed_disks > 0) {
+		fprintf(stderr, "%s: %s degraded array\n", prg, argv[1]);
+		exit_err = 8;
+		goto exitHere;
+	}
+
 	printf("layout: %d\n", info->array.layout);
 	printf("disks: %d\n", info->array.raid_disks);
 	printf("component size: %llu\n", info->component_size * 512);
@@ -262,12 +270,13 @@
 	printf("\n");
 
 	comp = info->devs;
-	for(i = 0; i < info->array.raid_disks; i++) {
+	for(i = 0, active_disks = 0; active_disks < info->array.raid_disks; i++) {
 		printf("disk: %d - offset: %llu - size: %llu - name: %s - slot: %d\n",
 			i, comp->data_offset * 512, comp->component_size * 512,
 			map_dev(comp->disk.major, comp->disk.minor, 0),
 			comp->disk.raid_disk);
-
+		if(comp->disk.raid_disk >= 0)
+			active_disks++;
 		comp = comp->next;
 	}
 	printf("\n");
@@ -317,18 +326,20 @@
 	close_flag = 1;
 
 	comp = info->devs;
-	for (i=0; i<raid_disks; i++) {
+	for (i=0, active_disks=0; active_disks<raid_disks; i++) {
 		int disk_slot = comp->disk.raid_disk;
-		disk_name[disk_slot] = map_dev(comp->disk.major, comp->disk.minor, 0);
-		offsets[disk_slot] = comp->data_offset * 512;
-		fds[disk_slot] = open(disk_name[disk_slot], O_RDWR);
-		if (fds[disk_slot] < 0) {
-			perror(disk_name[disk_slot]);
-			fprintf(stderr,"%s: cannot open %s\n", prg, disk_name[disk_slot]);
-			exit_err = 6;
-			goto exitHere;
+		if(disk_slot >= 0) {
+			disk_name[disk_slot] = map_dev(comp->disk.major, comp->disk.minor, 0);
+			offsets[disk_slot] = comp->data_offset * 512;
+			fds[disk_slot] = open(disk_name[disk_slot], O_RDWR);
+			if (fds[disk_slot] < 0) {
+				perror(disk_name[disk_slot]);
+				fprintf(stderr,"%s: cannot open %s\n", prg, disk_name[disk_slot]);
+				exit_err = 6;
+				goto exitHere;
+			}
+			active_disks++;
 		}
-
 		comp = comp->next;
 	}
 
--- cut here ---

bye,

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] RAID-6 check standalone fix component list parsing
  2011-04-13 20:48                 ` [PATCH] RAID-6 check standalone fix component list parsing Piergiorgio Sartor
@ 2011-04-14  7:29                   ` NeilBrown
  0 siblings, 0 replies; 29+ messages in thread
From: NeilBrown @ 2011-04-14  7:29 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: linux-raid

On Wed, 13 Apr 2011 22:48:25 +0200 Piergiorgio Sartor
<piergiorgio.sartor@nexgo.de> wrote:

> Hi Neil,
> 
> maybe you missed the other email, anyway please
> find attached the patch to fix the parsing of
> the component list, i.e. skipping the "spare" one.
> 
> I also added a check in case the array is degraded.
> 
> Thanks,
> 

Thanks for the reminded.  I've applied this now.

NeilBrown


> pg
> 
> --- cut here ---
> 
> 
> diff -uNr a/raid6check.c b/raid6check.c
> --- a/raid6check.c	2011-04-05 01:29:45.000000000 +0200
> +++ b/raid6check.c	2011-04-05 22:51:32.587032612 +0200
> @@ -207,6 +207,7 @@
>  	char **disk_name = NULL;
>  	unsigned long long *offsets = NULL;
>  	int raid_disks = 0;
> +	int active_disks = 0;
>  	int chunk_size = 0;
>  	int layout = -1;
>  	int level = 6;
> @@ -242,6 +243,7 @@
>  			  GET_LEVEL|
>  			  GET_LAYOUT|
>  			  GET_DISKS|
> +			  GET_DEGRADED |
>  			  GET_COMPONENT|
>  			  GET_CHUNK|
>  			  GET_DEVS|
> @@ -254,6 +256,12 @@
>  		goto exitHere;
>  	}
>  
> +	if(info->array.failed_disks > 0) {
> +		fprintf(stderr, "%s: %s degraded array\n", prg, argv[1]);
> +		exit_err = 8;
> +		goto exitHere;
> +	}
> +
>  	printf("layout: %d\n", info->array.layout);
>  	printf("disks: %d\n", info->array.raid_disks);
>  	printf("component size: %llu\n", info->component_size * 512);
> @@ -262,12 +270,13 @@
>  	printf("\n");
>  
>  	comp = info->devs;
> -	for(i = 0; i < info->array.raid_disks; i++) {
> +	for(i = 0, active_disks = 0; active_disks < info->array.raid_disks; i++) {
>  		printf("disk: %d - offset: %llu - size: %llu - name: %s - slot: %d\n",
>  			i, comp->data_offset * 512, comp->component_size * 512,
>  			map_dev(comp->disk.major, comp->disk.minor, 0),
>  			comp->disk.raid_disk);
> -
> +		if(comp->disk.raid_disk >= 0)
> +			active_disks++;
>  		comp = comp->next;
>  	}
>  	printf("\n");
> @@ -317,18 +326,20 @@
>  	close_flag = 1;
>  
>  	comp = info->devs;
> -	for (i=0; i<raid_disks; i++) {
> +	for (i=0, active_disks=0; active_disks<raid_disks; i++) {
>  		int disk_slot = comp->disk.raid_disk;
> -		disk_name[disk_slot] = map_dev(comp->disk.major, comp->disk.minor, 0);
> -		offsets[disk_slot] = comp->data_offset * 512;
> -		fds[disk_slot] = open(disk_name[disk_slot], O_RDWR);
> -		if (fds[disk_slot] < 0) {
> -			perror(disk_name[disk_slot]);
> -			fprintf(stderr,"%s: cannot open %s\n", prg, disk_name[disk_slot]);
> -			exit_err = 6;
> -			goto exitHere;
> +		if(disk_slot >= 0) {
> +			disk_name[disk_slot] = map_dev(comp->disk.major, comp->disk.minor, 0);
> +			offsets[disk_slot] = comp->data_offset * 512;
> +			fds[disk_slot] = open(disk_name[disk_slot], O_RDWR);
> +			if (fds[disk_slot] < 0) {
> +				perror(disk_name[disk_slot]);
> +				fprintf(stderr,"%s: cannot open %s\n", prg, disk_name[disk_slot]);
> +				exit_err = 6;
> +				goto exitHere;
> +			}
> +			active_disks++;
>  		}
> -
>  		comp = comp->next;
>  	}
>  
> --- cut here ---
> 
> bye,
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] RAID-6 check standalone code cleanup
  2011-04-06 18:02               ` Piergiorgio Sartor
  2011-04-13 20:48                 ` [PATCH] RAID-6 check standalone fix component list parsing Piergiorgio Sartor
@ 2011-04-14  7:32                 ` NeilBrown
  1 sibling, 0 replies; 29+ messages in thread
From: NeilBrown @ 2011-04-14  7:32 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: linux-raid

On Wed, 6 Apr 2011 20:02:02 +0200 Piergiorgio Sartor
<piergiorgio.sartor@nexgo.de> wrote:

> Hi Neil,
> 
> On Tue, Apr 05, 2011 at 09:12:42AM +1000, NeilBrown wrote:
> > On Mon, 4 Apr 2011 19:52:42 +0200 Piergiorgio Sartor
> > <piergiorgio.sartor@nexgo.de> wrote:
> > 
> > > Hi Neil,
> > > 
> > > please find below a second patch to "raid6check.c".
> > > This applies on top of the previous one.
> > > 
> > > Major change is code cleanup and simplification.
> > > Furthermore, a better error handling and a couple
> > > of bug fixes.
> > > Last but not least, the command line parameters are
> > > changed from "bytes" to "stripes", which is more
> > > convenient, I guess.
> > 
> > Thanks - I've applied this.
> 
> please find attached very below the fix for the
> component list scanning. Taking care, hopefully,
> to skip/avoid spare drives.
> Furthermore, I added also a check for degraded
> array, which should not be checked.
> 
> > I'm not sure about using 'stripes', though it would be hard to argue in
> > favour of 'bytes'.
> > Possibly the best number to use would be 'sectors' as that is how the kernel
> > would report an inconsistency.
> > 
> > Once the code settles and you work out what the expected usage pattern would
> > be, it might then be obvious what the best number is.  i.e. try to document
> > how it would be use and if you find yourself describing complex calculations,
> > then change the program so it does the the calculations and you document can
> > avoid the complexity.
> 
> I switched to "stripes" because the code is using theme
> all over and because I was continuosly calculating from
> stripe to bytes.
> 
> I guess you're right, later it will be possible to decided
> which is the better unit for command line and for the
> error reporting.
> 
> > > 
> > > If you prefer, I can send a single patch, including
> > > in one shot the last one and this one.
> > 
> > no, multiple patches are much better - thanks.
> > 
> > As for the granularity for suspend/check/fix/unsuspend, I suspect that 
> > per-stripe would be best.
> > A smaller size wouldn't work, and a bigger size would only be helpful if
> > there were lots and lots of fixes needed ... which hopefully won't be the
> > case.
> 
> The suspend story might be a bit more complex than
> I was considering.
> 
> For example, what will happen if the user hits ctrl-c
> while the array is suspended?
> Maybe the signals will have to be blocked or re-routed
> to a proper cleanup function.
> How about kill -9?

Definitely catch signals like SIGTERM SIGINT SIGQUIT.

There is nothing you can do about SIGKILL - if someone sends that, it is
their own fault if things break.

> 
> Second issue, the stripe in the array should be suspend
> also in case the user wants a correction to happen.
> In this situation, the suspend should include read, check
> and write, since it will not be possible to allow some
> other access in between the operations.
> Could it be this is too long time for the stripe to be
> blocked?
> 
> Maybe it would be simpler to require the arrays is in
> read only mode....
> 
> What do you think?
> 

Certainly encouraging - in the documentation - the array to not be in active
use while you check/fix it would be a good thing.
But a read/check/write should not take more than a few dozen milliseconds.
Suspending IO for that long shouldn't be a problem.

NeilBrown


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH] RAID-6 check standalone suspend array
  2011-04-04 23:12             ` NeilBrown
  2011-04-06 18:02               ` Piergiorgio Sartor
@ 2011-05-08 18:54               ` Piergiorgio Sartor
  2011-05-09  1:45                 ` NeilBrown
  1 sibling, 1 reply; 29+ messages in thread
From: Piergiorgio Sartor @ 2011-05-08 18:54 UTC (permalink / raw)
  To: NeilBrown; +Cc: Piergiorgio Sartor, linux-raid

Hi Neil,

please find below a small patch which should suspend the
array while reading the stripes in order to perform the
check of the RAID-6.

This should complete the "check" part of the SW.
Please let me know what else could be needed (docs,
test or else).

Please have a careful look at it, since I did not know
how to test it.

Thanks.

--- cut here ---


diff -uNr a/raid6check.c b/raid6check.c
--- a/raid6check.c	2011-05-07 20:35:18.693370007 +0200
+++ b/raid6check.c	2011-05-07 21:00:07.713865939 +0200
@@ -24,6 +24,7 @@
 
 #include "mdadm.h"
 #include <stdint.h>
+#include <signal.h>
 
 int geo_map(int block, unsigned long long stripe, int raid_disks,
 	    int level, int layout);
@@ -99,7 +100,7 @@
 	return curr_broken_disk;
 }
 
-int check_stripes(int *source, unsigned long long *offsets,
+int check_stripes(struct mdinfo *info, int *source, unsigned long long *offsets,
 		  int raid_disks, int chunk_size, int level, int layout,
 		  unsigned long long start, unsigned long long length, char *name[])
 {
@@ -139,10 +140,22 @@
 
 		printf("pos --> %llu\n", start);
 
+		signal(SIGTERM, SIG_IGN);
+		signal(SIGINT, SIG_IGN);
+		signal(SIGQUIT, SIG_IGN);
+		sysfs_set_num(info, NULL, "suspend_lo", start * data_disks);
+		sysfs_set_num(info, NULL, "suspend_hi", (start + chunk_size) * data_disks);
 		for (i = 0 ; i < raid_disks ; i++) {
 			lseek64(source[i], offsets[i] + start * chunk_size, 0);
 			read(source[i], stripes[i], chunk_size);
 		}
+		sysfs_set_num(info, NULL, "suspend_lo", 0x7FFFFFFFFFFFFFFFULL);
+		sysfs_set_num(info, NULL, "suspend_hi", 0);
+		sysfs_set_num(info, NULL, "suspend_lo", 0);
+		signal(SIGQUIT, SIG_DFL);
+		signal(SIGINT, SIG_DFL);
+		signal(SIGTERM, SIG_DFL);
+
 		for (i = 0 ; i < data_disks ; i++) {
 			int disk = geo_map(i, start, raid_disks, level, layout);
 			blocks[i] = stripes[disk];
@@ -343,7 +356,7 @@
 		comp = comp->next;
 	}
 
-	int rv = check_stripes(fds, offsets,
+	int rv = check_stripes(info, fds, offsets,
 			       raid_disks, chunk_size, level, layout,
 			       start, length, disk_name);
 	if (rv != 0) {

--- cut here ---

bye,

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] RAID-6 check standalone suspend array
  2011-05-08 18:54               ` [PATCH] RAID-6 check standalone suspend array Piergiorgio Sartor
@ 2011-05-09  1:45                 ` NeilBrown
  2011-05-09 18:43                   ` [PATCH] RAID-6 check standalone suspend array V2.0 Piergiorgio Sartor
  0 siblings, 1 reply; 29+ messages in thread
From: NeilBrown @ 2011-05-09  1:45 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: linux-raid

On Sun, 8 May 2011 20:54:08 +0200 Piergiorgio Sartor
<piergiorgio.sartor@nexgo.de> wrote:

> Hi Neil,
> 
> please find below a small patch which should suspend the
> array while reading the stripes in order to perform the
> check of the RAID-6.
> 
> This should complete the "check" part of the SW.
> Please let me know what else could be needed (docs,
> test or else).
> 
> Please have a careful look at it, since I did not know
> how to test it.
> 
> Thanks.
> 
> --- cut here ---
> 
> 
> diff -uNr a/raid6check.c b/raid6check.c
> --- a/raid6check.c	2011-05-07 20:35:18.693370007 +0200
> +++ b/raid6check.c	2011-05-07 21:00:07.713865939 +0200
> @@ -24,6 +24,7 @@
>  
>  #include "mdadm.h"
>  #include <stdint.h>
> +#include <signal.h>
>  
>  int geo_map(int block, unsigned long long stripe, int raid_disks,
>  	    int level, int layout);
> @@ -99,7 +100,7 @@
>  	return curr_broken_disk;
>  }
>  
> -int check_stripes(int *source, unsigned long long *offsets,
> +int check_stripes(struct mdinfo *info, int *source, unsigned long long *offsets,
>  		  int raid_disks, int chunk_size, int level, int layout,
>  		  unsigned long long start, unsigned long long length, char *name[])
>  {
> @@ -139,10 +140,22 @@
>  
>  		printf("pos --> %llu\n", start);
>  
> +		signal(SIGTERM, SIG_IGN);
> +		signal(SIGINT, SIG_IGN);
> +		signal(SIGQUIT, SIG_IGN);
> +		sysfs_set_num(info, NULL, "suspend_lo", start * data_disks);
> +		sysfs_set_num(info, NULL, "suspend_hi", (start + chunk_size) * data_disks);
>  		for (i = 0 ; i < raid_disks ; i++) {
>  			lseek64(source[i], offsets[i] + start * chunk_size, 0);
>  			read(source[i], stripes[i], chunk_size);
>  		}
> +		sysfs_set_num(info, NULL, "suspend_lo", 0x7FFFFFFFFFFFFFFFULL);
> +		sysfs_set_num(info, NULL, "suspend_hi", 0);
> +		sysfs_set_num(info, NULL, "suspend_lo", 0);
> +		signal(SIGQUIT, SIG_DFL);
> +		signal(SIGINT, SIG_DFL);
> +		signal(SIGTERM, SIG_DFL);
> +
>  		for (i = 0 ; i < data_disks ; i++) {
>  			int disk = geo_map(i, start, raid_disks, level, layout);
>  			blocks[i] = stripes[disk];
> @@ -343,7 +356,7 @@
>  		comp = comp->next;
>  	}
>  
> -	int rv = check_stripes(fds, offsets,
> +	int rv = check_stripes(info, fds, offsets,
>  			       raid_disks, chunk_size, level, layout,
>  			       start, length, disk_name);
>  	if (rv != 0) {
> 
> --- cut here ---
> 
> bye,
> 


Looks pretty good.  However:

 - you shouldn't blindly reset the signals to 'SIG_DFL'.  You should capture
   the return value from 'signal', and feed tha back in to restore the
   previous setting.  Alternately use 'sigblock' to just block the signal
   rather than ignoring it, then unblock afterwards.

 - When suspending IO it is safest to call
        mlockall(MCL_CURRENT|MCL_FUTURE);
   before you start.  That ensures that if the device is used for swap there
   is no chance of deadlocking trying to swap-out while the device is locked.

 - You should check the return value from sysfs_set_num and at least report
   any error.  If they return an error then you can know something is wrong...

 - Finally, I think the numbers you are giving to suspend_{lo,hi} are wrong.
   'start' is a number of chunks, so you should write
           start * chunk_size * data_disks
   to suspend_hi, and make a similar change to the calculation for suspend_lo.


Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH] RAID-6 check standalone suspend array V2.0
  2011-05-09  1:45                 ` NeilBrown
@ 2011-05-09 18:43                   ` Piergiorgio Sartor
  2011-05-15 21:15                     ` Piergiorgio Sartor
  0 siblings, 1 reply; 29+ messages in thread
From: Piergiorgio Sartor @ 2011-05-09 18:43 UTC (permalink / raw)
  To: NeilBrown; +Cc: Piergiorgio Sartor, linux-raid

On Mon, May 09, 2011 at 11:45:00AM +1000, NeilBrown wrote:
> On Sun, 8 May 2011 20:54:08 +0200 Piergiorgio Sartor
> <piergiorgio.sartor@nexgo.de> wrote:
> 
> > Hi Neil,
> > 
> > please find below a small patch which should suspend the
> > array while reading the stripes in order to perform the
> > check of the RAID-6.
> > 
> > This should complete the "check" part of the SW.
> > Please let me know what else could be needed (docs,
> > test or else).
> > 
> > Please have a careful look at it, since I did not know
> > how to test it.
> > 
> > Thanks.
> > 
> > --- cut here ---
> > 
> > 
> > diff -uNr a/raid6check.c b/raid6check.c
> > --- a/raid6check.c	2011-05-07 20:35:18.693370007 +0200
> > +++ b/raid6check.c	2011-05-07 21:00:07.713865939 +0200
> > @@ -24,6 +24,7 @@
> >  
> >  #include "mdadm.h"
> >  #include <stdint.h>
> > +#include <signal.h>
> >  
> >  int geo_map(int block, unsigned long long stripe, int raid_disks,
> >  	    int level, int layout);
> > @@ -99,7 +100,7 @@
> >  	return curr_broken_disk;
> >  }
> >  
> > -int check_stripes(int *source, unsigned long long *offsets,
> > +int check_stripes(struct mdinfo *info, int *source, unsigned long long *offsets,
> >  		  int raid_disks, int chunk_size, int level, int layout,
> >  		  unsigned long long start, unsigned long long length, char *name[])
> >  {
> > @@ -139,10 +140,22 @@
> >  
> >  		printf("pos --> %llu\n", start);
> >  
> > +		signal(SIGTERM, SIG_IGN);
> > +		signal(SIGINT, SIG_IGN);
> > +		signal(SIGQUIT, SIG_IGN);
> > +		sysfs_set_num(info, NULL, "suspend_lo", start * data_disks);
> > +		sysfs_set_num(info, NULL, "suspend_hi", (start + chunk_size) * data_disks);
> >  		for (i = 0 ; i < raid_disks ; i++) {
> >  			lseek64(source[i], offsets[i] + start * chunk_size, 0);
> >  			read(source[i], stripes[i], chunk_size);
> >  		}
> > +		sysfs_set_num(info, NULL, "suspend_lo", 0x7FFFFFFFFFFFFFFFULL);
> > +		sysfs_set_num(info, NULL, "suspend_hi", 0);
> > +		sysfs_set_num(info, NULL, "suspend_lo", 0);
> > +		signal(SIGQUIT, SIG_DFL);
> > +		signal(SIGINT, SIG_DFL);
> > +		signal(SIGTERM, SIG_DFL);
> > +
> >  		for (i = 0 ; i < data_disks ; i++) {
> >  			int disk = geo_map(i, start, raid_disks, level, layout);
> >  			blocks[i] = stripes[disk];
> > @@ -343,7 +356,7 @@
> >  		comp = comp->next;
> >  	}
> >  
> > -	int rv = check_stripes(fds, offsets,
> > +	int rv = check_stripes(info, fds, offsets,
> >  			       raid_disks, chunk_size, level, layout,
> >  			       start, length, disk_name);
> >  	if (rv != 0) {
> > 
> > --- cut here ---
> > 
> > bye,
> > 
> 
> 
> Looks pretty good.  However:
> 
>  - you shouldn't blindly reset the signals to 'SIG_DFL'.  You should capture
>    the return value from 'signal', and feed tha back in to restore the
>    previous setting.  Alternately use 'sigblock' to just block the signal
>    rather than ignoring it, then unblock afterwards.
> 
>  - When suspending IO it is safest to call
>         mlockall(MCL_CURRENT|MCL_FUTURE);
>    before you start.  That ensures that if the device is used for swap there
>    is no chance of deadlocking trying to swap-out while the device is locked.
> 
>  - You should check the return value from sysfs_set_num and at least report
>    any error.  If they return an error then you can know something is wrong...
> 
>  - Finally, I think the numbers you are giving to suspend_{lo,hi} are wrong.
>    'start' is a number of chunks, so you should write
>            start * chunk_size * data_disks
>    to suspend_hi, and make a similar change to the calculation for suspend_lo.
> 
> 
> Thanks,
> NeilBrown
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Hi Neil,

thank you so much for the code review.

I modified the code in order to fix, hopefully, all the flaws.

New patch attached below.

Please note that "sigblock()" cannot be used, since it is
declared, at least on my system, as "deprecated".
Furthermore, I noticed that "Grow.c" is not checking the
return value of "sysfs_set_num()" while suspending the
array, maybe you'll need to look at this.

Finally, please check the new patch too, while I can
confirm the software is doing what is supposed to do,
I still need support in order to confirm the suspend
and resume code.

Thanks again for your help, again let me know what
is the next expected step.

bye,

--- cut here ---

diff -uNr a/raid6check.c b/raid6check.c
--- a/raid6check.c	2011-05-07 20:35:18.693370007 +0200
+++ b/raid6check.c	2011-05-09 20:32:14.551695036 +0200
@@ -24,6 +24,8 @@
 
 #include "mdadm.h"
 #include <stdint.h>
+#include <signal.h>
+#include <sys/mman.h>
 
 int geo_map(int block, unsigned long long stripe, int raid_disks,
 	    int level, int layout);
@@ -99,7 +101,7 @@
 	return curr_broken_disk;
 }
 
-int check_stripes(int *source, unsigned long long *offsets,
+int check_stripes(struct mdinfo *info, int *source, unsigned long long *offsets,
 		  int raid_disks, int chunk_size, int level, int layout,
 		  unsigned long long start, unsigned long long length, char *name[])
 {
@@ -115,6 +117,8 @@
 	int diskP, diskQ;
 	int data_disks = raid_disks - 2;
 	int err = 0;
+	sighandler_t sig[3];
+	int rv;
 
 	extern int tables_ready;
 
@@ -139,10 +143,35 @@
 
 		printf("pos --> %llu\n", start);
 
+		if(mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
+			err = 2;
+			goto exitCheck;
+		}
+		sig[0] = signal(SIGTERM, SIG_IGN);
+		sig[1] = signal(SIGINT, SIG_IGN);
+		sig[2] = signal(SIGQUIT, SIG_IGN);
+		rv = sysfs_set_num(info, NULL, "suspend_lo", start * chunk_size * data_disks);
+		rv |= sysfs_set_num(info, NULL, "suspend_hi", (start + 1) * chunk_size * data_disks);
 		for (i = 0 ; i < raid_disks ; i++) {
 			lseek64(source[i], offsets[i] + start * chunk_size, 0);
 			read(source[i], stripes[i], chunk_size);
 		}
+		rv |= sysfs_set_num(info, NULL, "suspend_lo", 0x7FFFFFFFFFFFFFFFULL);
+		rv |= sysfs_set_num(info, NULL, "suspend_hi", 0);
+		rv |= sysfs_set_num(info, NULL, "suspend_lo", 0);
+		signal(SIGQUIT, sig[2]);
+		signal(SIGINT, sig[1]);
+		signal(SIGTERM, sig[0]);
+		if(munlockall() != 0) {
+			err = 3;
+			goto exitCheck;
+		}
+
+		if(rv != 0) {
+			err = rv * 256;
+			goto exitCheck;
+		}
+
 		for (i = 0 ; i < data_disks ; i++) {
 			int disk = geo_map(i, start, raid_disks, level, layout);
 			blocks[i] = stripes[disk];
@@ -214,7 +243,7 @@
 	unsigned long long start, length;
 	int i;
 	int mdfd;
-	struct mdinfo *info, *comp;
+	struct mdinfo *info = NULL, *comp = NULL;
 	char *err = NULL;
 	int exit_err = 0;
 	int close_flag = 0;
@@ -250,6 +279,12 @@
 			  GET_OFFSET|
 			  GET_SIZE);
 
+	if(info == NULL) {
+		fprintf(stderr, "%s: Error reading sysfs information of %s\n", prg, argv[1]);
+		exit_err = 9;
+		goto exitHere;
+	}
+
 	if(info->array.level != level) {
 		fprintf(stderr, "%s: %s not a RAID-6\n", prg, argv[1]);
 		exit_err = 3;
@@ -343,7 +378,7 @@
 		comp = comp->next;
 	}
 
-	int rv = check_stripes(fds, offsets,
+	int rv = check_stripes(info, fds, offsets,
 			       raid_disks, chunk_size, level, layout,
 			       start, length, disk_name);
 	if (rv != 0) {

--- cut here ---

bye,

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH] RAID-6 check standalone suspend array V2.0
  2011-05-09 18:43                   ` [PATCH] RAID-6 check standalone suspend array V2.0 Piergiorgio Sartor
@ 2011-05-15 21:15                     ` Piergiorgio Sartor
  2011-05-16 10:08                       ` NeilBrown
  0 siblings, 1 reply; 29+ messages in thread
From: Piergiorgio Sartor @ 2011-05-15 21:15 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: NeilBrown, linux-raid

Hi Neil,

reminder for the suspend patch.

Thank you so much for the code review.

I modified it in order to fix, hopefully, all the flaws.

New patch attached below.

Please note that "sigblock()" cannot be used, since it is
declared, at least on my system, as "deprecated".
Furthermore, I noticed that "Grow.c" is not checking the
return value of "sysfs_set_num()" while suspending the
array, maybe you'll need to look at this.

Finally, please check the new patch too, while I can
confirm the software is doing what is supposed to do,
I still need support in order to confirm the suspend
and resume code.

Thanks again for your help, again let me know what
is the next expected step.

bye,

--- cut here ---

diff -uNr a/raid6check.c b/raid6check.c
--- a/raid6check.c	2011-05-07 20:35:18.693370007 +0200
+++ b/raid6check.c	2011-05-09 20:32:14.551695036 +0200
@@ -24,6 +24,8 @@
 
 #include "mdadm.h"
 #include <stdint.h>
+#include <signal.h>
+#include <sys/mman.h>
 
 int geo_map(int block, unsigned long long stripe, int raid_disks,
 	    int level, int layout);
@@ -99,7 +101,7 @@
 	return curr_broken_disk;
 }
 
-int check_stripes(int *source, unsigned long long *offsets,
+int check_stripes(struct mdinfo *info, int *source, unsigned long long *offsets,
 		  int raid_disks, int chunk_size, int level, int layout,
 		  unsigned long long start, unsigned long long length, char *name[])
 {
@@ -115,6 +117,8 @@
 	int diskP, diskQ;
 	int data_disks = raid_disks - 2;
 	int err = 0;
+	sighandler_t sig[3];
+	int rv;
 
 	extern int tables_ready;
 
@@ -139,10 +143,35 @@
 
 		printf("pos --> %llu\n", start);
 
+		if(mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
+			err = 2;
+			goto exitCheck;
+		}
+		sig[0] = signal(SIGTERM, SIG_IGN);
+		sig[1] = signal(SIGINT, SIG_IGN);
+		sig[2] = signal(SIGQUIT, SIG_IGN);
+		rv = sysfs_set_num(info, NULL, "suspend_lo", start * chunk_size * data_disks);
+		rv |= sysfs_set_num(info, NULL, "suspend_hi", (start + 1) * chunk_size * data_disks);
 		for (i = 0 ; i < raid_disks ; i++) {
 			lseek64(source[i], offsets[i] + start * chunk_size, 0);
 			read(source[i], stripes[i], chunk_size);
 		}
+		rv |= sysfs_set_num(info, NULL, "suspend_lo", 0x7FFFFFFFFFFFFFFFULL);
+		rv |= sysfs_set_num(info, NULL, "suspend_hi", 0);
+		rv |= sysfs_set_num(info, NULL, "suspend_lo", 0);
+		signal(SIGQUIT, sig[2]);
+		signal(SIGINT, sig[1]);
+		signal(SIGTERM, sig[0]);
+		if(munlockall() != 0) {
+			err = 3;
+			goto exitCheck;
+		}
+
+		if(rv != 0) {
+			err = rv * 256;
+			goto exitCheck;
+		}
+
 		for (i = 0 ; i < data_disks ; i++) {
 			int disk = geo_map(i, start, raid_disks, level, layout);
 			blocks[i] = stripes[disk];
@@ -214,7 +243,7 @@
 	unsigned long long start, length;
 	int i;
 	int mdfd;
-	struct mdinfo *info, *comp;
+	struct mdinfo *info = NULL, *comp = NULL;
 	char *err = NULL;
 	int exit_err = 0;
 	int close_flag = 0;
@@ -250,6 +279,12 @@
 			  GET_OFFSET|
 			  GET_SIZE);
 
+	if(info == NULL) {
+		fprintf(stderr, "%s: Error reading sysfs information of %s\n", prg, argv[1]);
+		exit_err = 9;
+		goto exitHere;
+	}
+
 	if(info->array.level != level) {
 		fprintf(stderr, "%s: %s not a RAID-6\n", prg, argv[1]);
 		exit_err = 3;
@@ -343,7 +378,7 @@
 		comp = comp->next;
 	}
 
-	int rv = check_stripes(fds, offsets,
+	int rv = check_stripes(info, fds, offsets,
 			       raid_disks, chunk_size, level, layout,
 			       start, length, disk_name);
 	if (rv != 0) {

--- cut here ---

bye,

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] RAID-6 check standalone suspend array V2.0
  2011-05-15 21:15                     ` Piergiorgio Sartor
@ 2011-05-16 10:08                       ` NeilBrown
  2011-07-20 17:57                         ` Piergiorgio Sartor
  0 siblings, 1 reply; 29+ messages in thread
From: NeilBrown @ 2011-05-16 10:08 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: linux-raid

On Sun, 15 May 2011 23:15:15 +0200 Piergiorgio Sartor
<piergiorgio.sartor@nexgo.de> wrote:

> Hi Neil,
> 
> reminder for the suspend patch.
> 
> Thank you so much for the code review.
> 
> I modified it in order to fix, hopefully, all the flaws.
> 
> New patch attached below.
> 
> Please note that "sigblock()" cannot be used, since it is
> declared, at least on my system, as "deprecated".
> Furthermore, I noticed that "Grow.c" is not checking the
> return value of "sysfs_set_num()" while suspending the
> array, maybe you'll need to look at this.
> 
> Finally, please check the new patch too, while I can
> confirm the software is doing what is supposed to do,
> I still need support in order to confirm the suspend
> and resume code.
> 
> Thanks again for your help, again let me know what
> is the next expected step.

That all looks fine thank.  I've applied it and pushed it out.

I'm not sure what you mean exactly by the 'next expected step'... 

Thanks,
NeilBrown


> 
> bye,
> 
> --- cut here ---
> 
> diff -uNr a/raid6check.c b/raid6check.c
> --- a/raid6check.c	2011-05-07 20:35:18.693370007 +0200
> +++ b/raid6check.c	2011-05-09 20:32:14.551695036 +0200
> @@ -24,6 +24,8 @@
>  
>  #include "mdadm.h"
>  #include <stdint.h>
> +#include <signal.h>
> +#include <sys/mman.h>
>  
>  int geo_map(int block, unsigned long long stripe, int raid_disks,
>  	    int level, int layout);
> @@ -99,7 +101,7 @@
>  	return curr_broken_disk;
>  }
>  
> -int check_stripes(int *source, unsigned long long *offsets,
> +int check_stripes(struct mdinfo *info, int *source, unsigned long long *offsets,
>  		  int raid_disks, int chunk_size, int level, int layout,
>  		  unsigned long long start, unsigned long long length, char *name[])
>  {
> @@ -115,6 +117,8 @@
>  	int diskP, diskQ;
>  	int data_disks = raid_disks - 2;
>  	int err = 0;
> +	sighandler_t sig[3];
> +	int rv;
>  
>  	extern int tables_ready;
>  
> @@ -139,10 +143,35 @@
>  
>  		printf("pos --> %llu\n", start);
>  
> +		if(mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
> +			err = 2;
> +			goto exitCheck;
> +		}
> +		sig[0] = signal(SIGTERM, SIG_IGN);
> +		sig[1] = signal(SIGINT, SIG_IGN);
> +		sig[2] = signal(SIGQUIT, SIG_IGN);
> +		rv = sysfs_set_num(info, NULL, "suspend_lo", start * chunk_size * data_disks);
> +		rv |= sysfs_set_num(info, NULL, "suspend_hi", (start + 1) * chunk_size * data_disks);
>  		for (i = 0 ; i < raid_disks ; i++) {
>  			lseek64(source[i], offsets[i] + start * chunk_size, 0);
>  			read(source[i], stripes[i], chunk_size);
>  		}
> +		rv |= sysfs_set_num(info, NULL, "suspend_lo", 0x7FFFFFFFFFFFFFFFULL);
> +		rv |= sysfs_set_num(info, NULL, "suspend_hi", 0);
> +		rv |= sysfs_set_num(info, NULL, "suspend_lo", 0);
> +		signal(SIGQUIT, sig[2]);
> +		signal(SIGINT, sig[1]);
> +		signal(SIGTERM, sig[0]);
> +		if(munlockall() != 0) {
> +			err = 3;
> +			goto exitCheck;
> +		}
> +
> +		if(rv != 0) {
> +			err = rv * 256;
> +			goto exitCheck;
> +		}
> +
>  		for (i = 0 ; i < data_disks ; i++) {
>  			int disk = geo_map(i, start, raid_disks, level, layout);
>  			blocks[i] = stripes[disk];
> @@ -214,7 +243,7 @@
>  	unsigned long long start, length;
>  	int i;
>  	int mdfd;
> -	struct mdinfo *info, *comp;
> +	struct mdinfo *info = NULL, *comp = NULL;
>  	char *err = NULL;
>  	int exit_err = 0;
>  	int close_flag = 0;
> @@ -250,6 +279,12 @@
>  			  GET_OFFSET|
>  			  GET_SIZE);
>  
> +	if(info == NULL) {
> +		fprintf(stderr, "%s: Error reading sysfs information of %s\n", prg, argv[1]);
> +		exit_err = 9;
> +		goto exitHere;
> +	}
> +
>  	if(info->array.level != level) {
>  		fprintf(stderr, "%s: %s not a RAID-6\n", prg, argv[1]);
>  		exit_err = 3;
> @@ -343,7 +378,7 @@
>  		comp = comp->next;
>  	}
>  
> -	int rv = check_stripes(fds, offsets,
> +	int rv = check_stripes(info, fds, offsets,
>  			       raid_disks, chunk_size, level, layout,
>  			       start, length, disk_name);
>  	if (rv != 0) {
> 
> --- cut here ---
> 
> bye,
> 


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] RAID-6 check standalone suspend array V2.0
  2011-05-16 10:08                       ` NeilBrown
@ 2011-07-20 17:57                         ` Piergiorgio Sartor
  2011-07-22  6:41                           ` Luca Berra
  2011-07-26  5:25                           ` NeilBrown
  0 siblings, 2 replies; 29+ messages in thread
From: Piergiorgio Sartor @ 2011-07-20 17:57 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Hi Neil,

sorry for the very late answer.

On 05/16/2011 12:08 PM, NeilBrown wrote:
> On Sun, 15 May 2011 23:15:15 +0200 Piergiorgio Sartor
> <piergiorgio.sartor@nexgo.de> wrote:
> 
>> Hi Neil,
>>
>> reminder for the suspend patch.
>>
>> Thank you so much for the code review.
>>
>> I modified it in order to fix, hopefully, all the flaws.
>>
>> New patch attached below.
>>
>> Please note that "sigblock()" cannot be used, since it is
>> declared, at least on my system, as "deprecated".
>> Furthermore, I noticed that "Grow.c" is not checking the
>> return value of "sysfs_set_num()" while suspending the
>> array, maybe you'll need to look at this.
>>
>> Finally, please check the new patch too, while I can
>> confirm the software is doing what is supposed to do,
>> I still need support in order to confirm the suspend
>> and resume code.
>>
>> Thanks again for your help, again let me know what
>> is the next expected step.
> 
> That all looks fine thank.  I've applied it and pushed it out.
> 
> I'm not sure what you mean exactly by the 'next expected step'... 

Well, is there anything than should be done, like
documentation or code cleanup?

At the moment, it seems to me, the check itself it is
fine, maybe performance is not at best (anyone wants
to help, here?).

So, I was thinking about the "repair" process, that is
fixing the chunks which seem corrupted, instead of just
the parity.

Before I go that way, I would like to close pending issues,
if any, with the actual software.

Thanks,

bye,

pg


> Thanks,
> NeilBrown
> 
> 
>>
>> bye,
>>
>> --- cut here ---
>>
>> diff -uNr a/raid6check.c b/raid6check.c
>> --- a/raid6check.c	2011-05-07 20:35:18.693370007 +0200
>> +++ b/raid6check.c	2011-05-09 20:32:14.551695036 +0200
>> @@ -24,6 +24,8 @@
>>  
>>  #include "mdadm.h"
>>  #include <stdint.h>
>> +#include <signal.h>
>> +#include <sys/mman.h>
>>  
>>  int geo_map(int block, unsigned long long stripe, int raid_disks,
>>  	    int level, int layout);
>> @@ -99,7 +101,7 @@
>>  	return curr_broken_disk;
>>  }
>>  
>> -int check_stripes(int *source, unsigned long long *offsets,
>> +int check_stripes(struct mdinfo *info, int *source, unsigned long long *offsets,
>>  		  int raid_disks, int chunk_size, int level, int layout,
>>  		  unsigned long long start, unsigned long long length, char *name[])
>>  {
>> @@ -115,6 +117,8 @@
>>  	int diskP, diskQ;
>>  	int data_disks = raid_disks - 2;
>>  	int err = 0;
>> +	sighandler_t sig[3];
>> +	int rv;
>>  
>>  	extern int tables_ready;
>>  
>> @@ -139,10 +143,35 @@
>>  
>>  		printf("pos --> %llu\n", start);
>>  
>> +		if(mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
>> +			err = 2;
>> +			goto exitCheck;
>> +		}
>> +		sig[0] = signal(SIGTERM, SIG_IGN);
>> +		sig[1] = signal(SIGINT, SIG_IGN);
>> +		sig[2] = signal(SIGQUIT, SIG_IGN);
>> +		rv = sysfs_set_num(info, NULL, "suspend_lo", start * chunk_size * data_disks);
>> +		rv |= sysfs_set_num(info, NULL, "suspend_hi", (start + 1) * chunk_size * data_disks);
>>  		for (i = 0 ; i < raid_disks ; i++) {
>>  			lseek64(source[i], offsets[i] + start * chunk_size, 0);
>>  			read(source[i], stripes[i], chunk_size);
>>  		}
>> +		rv |= sysfs_set_num(info, NULL, "suspend_lo", 0x7FFFFFFFFFFFFFFFULL);
>> +		rv |= sysfs_set_num(info, NULL, "suspend_hi", 0);
>> +		rv |= sysfs_set_num(info, NULL, "suspend_lo", 0);
>> +		signal(SIGQUIT, sig[2]);
>> +		signal(SIGINT, sig[1]);
>> +		signal(SIGTERM, sig[0]);
>> +		if(munlockall() != 0) {
>> +			err = 3;
>> +			goto exitCheck;
>> +		}
>> +
>> +		if(rv != 0) {
>> +			err = rv * 256;
>> +			goto exitCheck;
>> +		}
>> +
>>  		for (i = 0 ; i < data_disks ; i++) {
>>  			int disk = geo_map(i, start, raid_disks, level, layout);
>>  			blocks[i] = stripes[disk];
>> @@ -214,7 +243,7 @@
>>  	unsigned long long start, length;
>>  	int i;
>>  	int mdfd;
>> -	struct mdinfo *info, *comp;
>> +	struct mdinfo *info = NULL, *comp = NULL;
>>  	char *err = NULL;
>>  	int exit_err = 0;
>>  	int close_flag = 0;
>> @@ -250,6 +279,12 @@
>>  			  GET_OFFSET|
>>  			  GET_SIZE);
>>  
>> +	if(info == NULL) {
>> +		fprintf(stderr, "%s: Error reading sysfs information of %s\n", prg, argv[1]);
>> +		exit_err = 9;
>> +		goto exitHere;
>> +	}
>> +
>>  	if(info->array.level != level) {
>>  		fprintf(stderr, "%s: %s not a RAID-6\n", prg, argv[1]);
>>  		exit_err = 3;
>> @@ -343,7 +378,7 @@
>>  		comp = comp->next;
>>  	}
>>  
>> -	int rv = check_stripes(fds, offsets,
>> +	int rv = check_stripes(info, fds, offsets,
>>  			       raid_disks, chunk_size, level, layout,
>>  			       start, length, disk_name);
>>  	if (rv != 0) {
>>
>> --- cut here ---
>>
>> bye,
>>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


-- 

piergiorgio

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] RAID-6 check standalone suspend array V2.0
  2011-07-20 17:57                         ` Piergiorgio Sartor
@ 2011-07-22  6:41                           ` Luca Berra
  2011-07-25 18:53                             ` Piergiorgio Sartor
  2011-07-26  5:25                           ` NeilBrown
  1 sibling, 1 reply; 29+ messages in thread
From: Luca Berra @ 2011-07-22  6:41 UTC (permalink / raw)
  To: linux-raid

On Wed, Jul 20, 2011 at 07:57:03PM +0200, Piergiorgio Sartor wrote:
>Hi Neil,
>
>sorry for the very late answer.
>
>On 05/16/2011 12:08 PM, NeilBrown wrote:
>> On Sun, 15 May 2011 23:15:15 +0200 Piergiorgio Sartor
>>> Thanks again for your help, again let me know what
>>> is the next expected step.
>> 
>> That all looks fine thank.  I've applied it and pushed it out.
>> 
>> I'm not sure what you mean exactly by the 'next expected step'... 
>
>Well, is there anything than should be done, like
>documentation or code cleanup?

I would love if you could also provide a man page

Regards,
L.

-- 
Luca Berra -- bluca@comedia.it

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] RAID-6 check standalone suspend array V2.0
  2011-07-22  6:41                           ` Luca Berra
@ 2011-07-25 18:53                             ` Piergiorgio Sartor
  0 siblings, 0 replies; 29+ messages in thread
From: Piergiorgio Sartor @ 2011-07-25 18:53 UTC (permalink / raw)
  To: linux-raid

Hi Luca,

On 07/22/2011 08:41 AM, Luca Berra wrote:
> On Wed, Jul 20, 2011 at 07:57:03PM +0200, Piergiorgio Sartor wrote:
>> Hi Neil,
>>
>> sorry for the very late answer.
>>
>> On 05/16/2011 12:08 PM, NeilBrown wrote:
>>> On Sun, 15 May 2011 23:15:15 +0200 Piergiorgio Sartor
>>>> Thanks again for your help, again let me know what
>>>> is the next expected step.
>>>
>>> That all looks fine thank.  I've applied it and pushed it out.
>>>
>>> I'm not sure what you mean exactly by the 'next expected step'... 
>>
>> Well, is there anything than should be done, like
>> documentation or code cleanup?
> 
> I would love if you could also provide a man page
> 
> Regards,
> L.
> 

is there any tool to prepare a man page or should I
copy & modify something already existing?
Would a "readme.txt" do the same for you?

Thanks,

bye,

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] RAID-6 check standalone suspend array V2.0
  2011-07-20 17:57                         ` Piergiorgio Sartor
  2011-07-22  6:41                           ` Luca Berra
@ 2011-07-26  5:25                           ` NeilBrown
  2011-08-07 17:09                             ` [PATCH] RAID-6 check standalone man page Piergiorgio Sartor
  1 sibling, 1 reply; 29+ messages in thread
From: NeilBrown @ 2011-07-26  5:25 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: linux-raid

On Wed, 20 Jul 2011 19:57:03 +0200 Piergiorgio Sartor
<piergiorgio.sartor@nexgo.de> wrote:

> Hi Neil,
> 
> sorry for the very late answer.
> 
> On 05/16/2011 12:08 PM, NeilBrown wrote:
> > On Sun, 15 May 2011 23:15:15 +0200 Piergiorgio Sartor
> > <piergiorgio.sartor@nexgo.de> wrote:
> > 
> >> Hi Neil,
> >>
> >> reminder for the suspend patch.
> >>
> >> Thank you so much for the code review.
> >>
> >> I modified it in order to fix, hopefully, all the flaws.
> >>
> >> New patch attached below.
> >>
> >> Please note that "sigblock()" cannot be used, since it is
> >> declared, at least on my system, as "deprecated".
> >> Furthermore, I noticed that "Grow.c" is not checking the
> >> return value of "sysfs_set_num()" while suspending the
> >> array, maybe you'll need to look at this.
> >>
> >> Finally, please check the new patch too, while I can
> >> confirm the software is doing what is supposed to do,
> >> I still need support in order to confirm the suspend
> >> and resume code.
> >>
> >> Thanks again for your help, again let me know what
> >> is the next expected step.
> > 
> > That all looks fine thank.  I've applied it and pushed it out.
> > 
> > I'm not sure what you mean exactly by the 'next expected step'... 
> 
> Well, is there anything than should be done, like
> documentation or code cleanup?
> 
> At the moment, it seems to me, the check itself it is
> fine, maybe performance is not at best (anyone wants
> to help, here?).
> 
> So, I was thinking about the "repair" process, that is
> fixing the chunks which seem corrupted, instead of just
> the parity.
> 
> Before I go that way, I would like to close pending issues,
> if any, with the actual software.

Very sensible - thanks.

I haven't actually used it or tested it at all so I don't know of any issues
in that regard.
I agree with an earlier reply that a man-page would be a good idea.
You could start by looking at "mdadm.8.in" and basing your man page on that.
It doesn't have to be very long - just explain what it does and how to use
it, and maybe how to interpret the results.
Often writing a man page is enough to flush out any serious usability issues
- if you find it hard to explain how to use it, it is probably because it is
hard to use :-)

If you aren't familiar with the troff -man markup language don't let it worry
you - I am happy to fix up any markup issues before including it.

It might be good to create a test script too - something that can go in the
tests/ directory would be ideal.
e.g. create and initialise a RAID6, deliberately corrupt one block, then run
your program an check that it reports the right thing.
Probably corrupt a different blocks (randomly?) on each device and test that
it reports all of the errors correctly... something like that.

NeilBrown

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH] RAID-6 check standalone man page
  2011-07-26  5:25                           ` NeilBrown
@ 2011-08-07 17:09                             ` Piergiorgio Sartor
  2011-08-09  0:43                               ` NeilBrown
  0 siblings, 1 reply; 29+ messages in thread
From: Piergiorgio Sartor @ 2011-08-07 17:09 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On 07/26/2011 07:25 AM, NeilBrown wrote:
> On Wed, 20 Jul 2011 19:57:03 +0200 Piergiorgio Sartor
> <piergiorgio.sartor@nexgo.de> wrote:
> 
>> Hi Neil,
>>
>> sorry for the very late answer.
>>
>> On 05/16/2011 12:08 PM, NeilBrown wrote:
>>> On Sun, 15 May 2011 23:15:15 +0200 Piergiorgio Sartor
>>> <piergiorgio.sartor@nexgo.de> wrote:
>>>
>>>> Hi Neil,
>>>>
>>>> reminder for the suspend patch.
>>>>
>>>> Thank you so much for the code review.
>>>>
>>>> I modified it in order to fix, hopefully, all the flaws.
>>>>
>>>> New patch attached below.
>>>>
>>>> Please note that "sigblock()" cannot be used, since it is
>>>> declared, at least on my system, as "deprecated".
>>>> Furthermore, I noticed that "Grow.c" is not checking the
>>>> return value of "sysfs_set_num()" while suspending the
>>>> array, maybe you'll need to look at this.
>>>>
>>>> Finally, please check the new patch too, while I can
>>>> confirm the software is doing what is supposed to do,
>>>> I still need support in order to confirm the suspend
>>>> and resume code.
>>>>
>>>> Thanks again for your help, again let me know what
>>>> is the next expected step.
>>>
>>> That all looks fine thank.  I've applied it and pushed it out.
>>>
>>> I'm not sure what you mean exactly by the 'next expected step'... 
>>
>> Well, is there anything than should be done, like
>> documentation or code cleanup?
>>
>> At the moment, it seems to me, the check itself it is
>> fine, maybe performance is not at best (anyone wants
>> to help, here?).
>>
>> So, I was thinking about the "repair" process, that is
>> fixing the chunks which seem corrupted, instead of just
>> the parity.
>>
>> Before I go that way, I would like to close pending issues,
>> if any, with the actual software.
> 
> Very sensible - thanks.
> 
> I haven't actually used it or tested it at all so I don't know of any issues
> in that regard.
> I agree with an earlier reply that a man-page would be a good idea.
> You could start by looking at "mdadm.8.in" and basing your man page on that.
> It doesn't have to be very long - just explain what it does and how to use
> it, and maybe how to interpret the results.
> Often writing a man page is enough to flush out any serious usability issues
> - if you find it hard to explain how to use it, it is probably because it is
> hard to use :-)
> 
> If you aren't familiar with the troff -man markup language don't let it worry
> you - I am happy to fix up any markup issues before including it.
> 
> It might be good to create a test script too - something that can go in the
> tests/ directory would be ideal.
> e.g. create and initialise a RAID6, deliberately corrupt one block, then run
> your program an check that it reports the right thing.
> Probably corrupt a different blocks (randomly?) on each device and test that
> it reports all of the errors correctly... something like that.
> 
> NeilBrown
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Hi Neil,

please find below a draft of the man page (as patch) for raid6check.
Please have a look and let me know any issues (typos for sure).

The test cases will come later.

--- cut here ---

diff -uNr a/raid6check.8.in b/raid6check.8.in
--- a/raid6check.8.in	1970-01-01 01:00:00.000000000 +0100
+++ b/raid6check.8.in	2011-08-07 19:04:17.352680683 +0200
@@ -0,0 +1,101 @@
+.\" -*- nroff -*-
+.\" Copyright Piergiorgio Sartor and others.
+.\"   This program is free software; you can redistribute it and/or modify
+.\"   it under the terms of the GNU General Public License as published by
+.\"   the Free Software Foundation; either version 2 of the License, or
+.\"   (at your option) any later version.
+.\" See file COPYING in distribution for details.
+.TH RAID6CHECK 8 "" v1.0.0
+.SH NAME
+raid6check \- check MD RAID6 device for errors
+.I aka
+Linux Software RAID
+
+.SH SYNOPSIS
+
+.BI raid6check " <raid6 device> <start stripe> <number of stripes>"
+
+.SH DESCRIPTION
+RAID 6 devices in which one single component drive has errors can use
+the double parity in order to find out the component drive.
+The "raid6check" tool checks, for each stripe, the double parity
+consistency and, it reports mismatches and, if possible, which
+component drive has the mismatch.
+Since it works at stripe level, it can report different drives with
+mismatches at different stripes.
+
+"raid6check" requires a non-degraded RAID 6 MD device, as first
+parameter, a starting stripe, usually 0, and the number of stripes
+to be checked.
+If this third parameter is also 0, it will check the array up to
+the end.
+
+"raid6check" will start printing information about the RAID 6, then
+for each stripe, it will report the parity rotation status.
+In case of parity mismatches, "raid6check" reports, if possible,
+which component drive could be responsible. Otherwise it reports
+that it is not possible to find the component drive.
+
+If the given MD device is not a RAID 6, "raid6check" will, of
+course, not continue.
+
+If the RAID6 MD device is degraded, "raid6check" will report
+an error and it will not proceed further.
+
+No write operation are performed on the array or the components.
+Furthermore, the checked array can be online and in use during
+the operation of "raid6check".
+
+.SH EXAMPLES
+
+.B "  raid6check /dev/md0 0 0"
+.br
+This will check /dev/md0 from start to end.
+
+.B "  raid6check /dev/md3 0 1"
+.br
+This will check the first stripe of /dev/md3.
+
+.B "  raid6check /dev/md1 1000 0"
+.br
+This will check /dev/md1 from stripe 1000 up to the end.
+
+.B "  raid6check /dev/m127 128 256"
+.br
+This will check 256 stripes of /dev/md127 starting from stripe 128.
+
+.B "  raid6check /dev/md0 0 0 | grep -i error > md0_err.log"
+.br
+This will check /dev/md0 completely and create a log file only
+with errors, if any.
+
+.SH FILES
+
+"raid6check" uses directly the component drives as found in /dev.
+Furthermore, the sysfs interface is needed in order to find out
+the RAID 6 parameters.
+
+.SH BUGS
+Negative parameters can lead to unexpected results.
+
+It is not clear what will happen if the RAID 6 MD device gets
+degraded during the check.
+
+.PP
+The latest version of
+.I raid6check
+should always be available from
+.IP
+.B http://www.kernel.org/pub/linux/utils/raid/mdadm/
+.PP
+Related man pages:
+.PP
+.IR mdadm (8)
+.IR mdmon (8),
+.IR mdadm.conf (5),
+.IR md (4).
+.PP
+.IR raidtab (5),
+.IR raid0run (8),
+.IR raidstop (8),
+.IR mkraid (8).

--- cut here ---

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH] RAID-6 check standalone man page
  2011-08-07 17:09                             ` [PATCH] RAID-6 check standalone man page Piergiorgio Sartor
@ 2011-08-09  0:43                               ` NeilBrown
  0 siblings, 0 replies; 29+ messages in thread
From: NeilBrown @ 2011-08-09  0:43 UTC (permalink / raw)
  To: Piergiorgio Sartor; +Cc: linux-raid

On Sun, 07 Aug 2011 19:09:57 +0200 Piergiorgio Sartor
<piergiorgio.sartor@nexgo.de> wrote:

> On 07/26/2011 07:25 AM, NeilBrown wrote:
> > On Wed, 20 Jul 2011 19:57:03 +0200 Piergiorgio Sartor
> > <piergiorgio.sartor@nexgo.de> wrote:
> > 
> >> Hi Neil,
> >>
> >> sorry for the very late answer.
> >>
> >> On 05/16/2011 12:08 PM, NeilBrown wrote:
> >>> On Sun, 15 May 2011 23:15:15 +0200 Piergiorgio Sartor
> >>> <piergiorgio.sartor@nexgo.de> wrote:
> >>>
> >>>> Hi Neil,
> >>>>
> >>>> reminder for the suspend patch.
> >>>>
> >>>> Thank you so much for the code review.
> >>>>
> >>>> I modified it in order to fix, hopefully, all the flaws.
> >>>>
> >>>> New patch attached below.
> >>>>
> >>>> Please note that "sigblock()" cannot be used, since it is
> >>>> declared, at least on my system, as "deprecated".
> >>>> Furthermore, I noticed that "Grow.c" is not checking the
> >>>> return value of "sysfs_set_num()" while suspending the
> >>>> array, maybe you'll need to look at this.
> >>>>
> >>>> Finally, please check the new patch too, while I can
> >>>> confirm the software is doing what is supposed to do,
> >>>> I still need support in order to confirm the suspend
> >>>> and resume code.
> >>>>
> >>>> Thanks again for your help, again let me know what
> >>>> is the next expected step.
> >>>
> >>> That all looks fine thank.  I've applied it and pushed it out.
> >>>
> >>> I'm not sure what you mean exactly by the 'next expected step'... 
> >>
> >> Well, is there anything than should be done, like
> >> documentation or code cleanup?
> >>
> >> At the moment, it seems to me, the check itself it is
> >> fine, maybe performance is not at best (anyone wants
> >> to help, here?).
> >>
> >> So, I was thinking about the "repair" process, that is
> >> fixing the chunks which seem corrupted, instead of just
> >> the parity.
> >>
> >> Before I go that way, I would like to close pending issues,
> >> if any, with the actual software.
> > 
> > Very sensible - thanks.
> > 
> > I haven't actually used it or tested it at all so I don't know of any issues
> > in that regard.
> > I agree with an earlier reply that a man-page would be a good idea.
> > You could start by looking at "mdadm.8.in" and basing your man page on that.
> > It doesn't have to be very long - just explain what it does and how to use
> > it, and maybe how to interpret the results.
> > Often writing a man page is enough to flush out any serious usability issues
> > - if you find it hard to explain how to use it, it is probably because it is
> > hard to use :-)
> > 
> > If you aren't familiar with the troff -man markup language don't let it worry
> > you - I am happy to fix up any markup issues before including it.
> > 
> > It might be good to create a test script too - something that can go in the
> > tests/ directory would be ideal.
> > e.g. create and initialise a RAID6, deliberately corrupt one block, then run
> > your program an check that it reports the right thing.
> > Probably corrupt a different blocks (randomly?) on each device and test that
> > it reports all of the errors correctly... something like that.
> > 
> > NeilBrown
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> Hi Neil,
> 
> please find below a draft of the man page (as patch) for raid6check.
> Please have a look and let me know any issues (typos for sure).
> 
> The test cases will come later.
> 
> --- cut here ---
> 
> diff -uNr a/raid6check.8.in b/raid6check.8.in
> --- a/raid6check.8.in	1970-01-01 01:00:00.000000000 +0100
> +++ b/raid6check.8.in	2011-08-07 19:04:17.352680683 +0200
> @@ -0,0 +1,101 @@
> +.\" -*- nroff -*-
> +.\" Copyright Piergiorgio Sartor and others.
> +.\"   This program is free software; you can redistribute it and/or modify
> +.\"   it under the terms of the GNU General Public License as published by
> +.\"   the Free Software Foundation; either version 2 of the License, or
> +.\"   (at your option) any later version.
> +.\" See file COPYING in distribution for details.
> +.TH RAID6CHECK 8 "" v1.0.0
> +.SH NAME
> +raid6check \- check MD RAID6 device for errors
> +.I aka
> +Linux Software RAID
> +
> +.SH SYNOPSIS
> +
> +.BI raid6check " <raid6 device> <start stripe> <number of stripes>"
> +
> +.SH DESCRIPTION
> +RAID 6 devices in which one single component drive has errors can use
> +the double parity in order to find out the component drive.
> +The "raid6check" tool checks, for each stripe, the double parity
> +consistency and, it reports mismatches and, if possible, which
> +component drive has the mismatch.
> +Since it works at stripe level, it can report different drives with
> +mismatches at different stripes.
> +
> +"raid6check" requires a non-degraded RAID 6 MD device, as first
> +parameter, a starting stripe, usually 0, and the number of stripes
> +to be checked.
> +If this third parameter is also 0, it will check the array up to
> +the end.
> +
> +"raid6check" will start printing information about the RAID 6, then
> +for each stripe, it will report the parity rotation status.
> +In case of parity mismatches, "raid6check" reports, if possible,
> +which component drive could be responsible. Otherwise it reports
> +that it is not possible to find the component drive.
> +
> +If the given MD device is not a RAID 6, "raid6check" will, of
> +course, not continue.
> +
> +If the RAID6 MD device is degraded, "raid6check" will report
> +an error and it will not proceed further.
> +
> +No write operation are performed on the array or the components.
> +Furthermore, the checked array can be online and in use during
> +the operation of "raid6check".
> +
> +.SH EXAMPLES
> +
> +.B "  raid6check /dev/md0 0 0"
> +.br
> +This will check /dev/md0 from start to end.
> +
> +.B "  raid6check /dev/md3 0 1"
> +.br
> +This will check the first stripe of /dev/md3.
> +
> +.B "  raid6check /dev/md1 1000 0"
> +.br
> +This will check /dev/md1 from stripe 1000 up to the end.
> +
> +.B "  raid6check /dev/m127 128 256"
> +.br
> +This will check 256 stripes of /dev/md127 starting from stripe 128.
> +
> +.B "  raid6check /dev/md0 0 0 | grep -i error > md0_err.log"
> +.br
> +This will check /dev/md0 completely and create a log file only
> +with errors, if any.
> +
> +.SH FILES
> +
> +"raid6check" uses directly the component drives as found in /dev.
> +Furthermore, the sysfs interface is needed in order to find out
> +the RAID 6 parameters.
> +
> +.SH BUGS
> +Negative parameters can lead to unexpected results.
> +
> +It is not clear what will happen if the RAID 6 MD device gets
> +degraded during the check.
> +
> +.PP
> +The latest version of
> +.I raid6check
> +should always be available from
> +.IP
> +.B http://www.kernel.org/pub/linux/utils/raid/mdadm/
> +.PP
> +Related man pages:
> +.PP
> +.IR mdadm (8)
> +.IR mdmon (8),
> +.IR mdadm.conf (5),
> +.IR md (4).
> +.PP
> +.IR raidtab (5),
> +.IR raid0run (8),
> +.IR raidstop (8),
> +.IR mkraid (8).
> 
> --- cut here ---
> 


Thanks.
I've made few changes as follow - mostly "RAID 6" to "RAID6" for consistency
with other man pages.  I've also named the file "raid6check.8" (no ".in") and
added bits to the Makefile.

I was thinking that it might be good to restore the old interface as an
option.  i.e. to be able to list some devices explicitly even though they
aren't in an active array.  Occasionally people lose their metadata and this
could be used to confirm if a collection of devices in a particular order
really look like a RAID6.... Just a thought.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2011-08-09  0:43 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-02-21 20:45 [PATCH] RAID-6 check standalone Piergiorgio Sartor
2011-03-07 19:33 ` Piergiorgio Sartor
2011-03-21  3:02 ` NeilBrown
2011-03-21 10:40   ` Piergiorgio Sartor
2011-03-21 11:04     ` NeilBrown
2011-03-21 11:54       ` Piergiorgio Sartor
2011-03-21 22:59         ` NeilBrown
2011-03-31 18:53           ` [PATCH] RAID-6 check standalone md device Piergiorgio Sartor
     [not found]             ` <4D96597C.1020103@tuxes.nl>
     [not found]               ` <20110402071310.GA2640@lazy.lzy>
2011-04-02 10:33                 ` Bas van Schaik
2011-04-02 11:03                   ` Piergiorgio Sartor
2011-04-04 23:01             ` NeilBrown
2011-04-05 19:56               ` Piergiorgio Sartor
2011-04-04 17:52           ` [PATCH] RAID-6 check standalone code cleanup Piergiorgio Sartor
2011-04-04 23:12             ` NeilBrown
2011-04-06 18:02               ` Piergiorgio Sartor
2011-04-13 20:48                 ` [PATCH] RAID-6 check standalone fix component list parsing Piergiorgio Sartor
2011-04-14  7:29                   ` NeilBrown
2011-04-14  7:32                 ` [PATCH] RAID-6 check standalone code cleanup NeilBrown
2011-05-08 18:54               ` [PATCH] RAID-6 check standalone suspend array Piergiorgio Sartor
2011-05-09  1:45                 ` NeilBrown
2011-05-09 18:43                   ` [PATCH] RAID-6 check standalone suspend array V2.0 Piergiorgio Sartor
2011-05-15 21:15                     ` Piergiorgio Sartor
2011-05-16 10:08                       ` NeilBrown
2011-07-20 17:57                         ` Piergiorgio Sartor
2011-07-22  6:41                           ` Luca Berra
2011-07-25 18:53                             ` Piergiorgio Sartor
2011-07-26  5:25                           ` NeilBrown
2011-08-07 17:09                             ` [PATCH] RAID-6 check standalone man page Piergiorgio Sartor
2011-08-09  0:43                               ` NeilBrown

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.