All of lore.kernel.org
 help / color / mirror / Atom feed
* [md PATCH 00/11] Series short description
@ 2009-10-16  5:49 NeilBrown
  2009-10-16  5:49 ` [md PATCH 01/11] Revert "md: do not progress the resync process if the stripe was blocked" NeilBrown
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: NeilBrown @ 2009-10-16  5:49 UTC (permalink / raw)
  To: linux-raid; +Cc: Dan Williams

Hi,
 The following are a series of patches for 2.6.32-rc that I hope to
 ask Linus to pull mid next week.  They are already in my for-next tree.

 Review always welcome.

 Dan, I would particularly like you to check over 
      raid6/async_tx: handle holes in block list in async_syndrome_val
      md/async: don't pass a memory pointer as a page pointer.

 as they fix some issues with the recent conversion of raid6 to work
 with async offload.  Problems particularly occurred with 'ddf' style
 layouts where there the Q syndrome is calculated over data blocks and
 some 'zero' blocks.

 My mdadm test suite now has some tests that test every single RAID5
 and RAID6 layout and every combination of devices failing.  And also
 uses the new reshape functionality to transition an array through
 each different layout.  It has been very helpful at finding bugs.

 One particularly important bug is that thought 2.6.31 can reshape a
 RAID5 to have fewer devices, if the reshape is aborted in the middle,
 2.6.31 cannot restart that reshape.  With these patches 2.6.32 will
 be able to.  I suspect that when I release mdadm-3.1 it will refuse
 to reduce the number of devices on kernels earlier than 2.6.32.

Thanks,
NeilBrown
---

Dan Williams (2):
      md/raid456: downlevel multicore operations to raid_run_ops
      md/raid5: initialize conf->device_lock earlier

NeilBrown (8):
      raid6/async_tx: handle holes in block list in async_syndrome_val
      md/async: don't pass a memory pointer as a page pointer.
      md: Fix handling of raid5 array which is being reshaped to fewer devices.
      md: fix problems with RAID6 calculations for DDF.
      md: remove clumsy usage of do_sync_mapping_range from bitmap code
      md: raid1/raid10: handle allocation errors during array setup.
      md/raid1/raid10: add a cond_resched
      Revert "md: do not progress the resync process if the stripe was blocked"

Vladimir Dronnikov (1):
      md: drivers/md/unroll.pl replaced with awk analog


 crypto/async_tx/async_pq.c          |   40 +++++---
 crypto/async_tx/async_raid6_recov.c |   16 ++-
 crypto/async_tx/async_xor.c         |   18 ++--
 drivers/md/Makefile                 |   22 ++--
 drivers/md/bitmap.c                 |    9 +-
 drivers/md/md.c                     |    2 
 drivers/md/raid1.c                  |    6 +
 drivers/md/raid10.c                 |    5 +
 drivers/md/raid5.c                  |  180 ++++++++++++++++++-----------------
 drivers/md/raid5.h                  |   14 ++-
 drivers/md/raid6altivec.uc          |    2 
 drivers/md/raid6int.uc              |    2 
 drivers/md/raid6test/Makefile       |   42 ++++----
 drivers/md/unroll.awk               |   20 ++++
 drivers/md/unroll.pl                |   24 -----
 15 files changed, 217 insertions(+), 185 deletions(-)
 create mode 100644 drivers/md/unroll.awk
 delete mode 100644 drivers/md/unroll.pl

-- 
Signature


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [md PATCH 01/11] Revert "md: do not progress the resync process if the stripe was blocked"
  2009-10-16  5:49 [md PATCH 00/11] Series short description NeilBrown
@ 2009-10-16  5:49 ` NeilBrown
  2009-10-16  5:49 ` [md PATCH 05/11] md: remove clumsy usage of do_sync_mapping_range from bitmap code NeilBrown
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: NeilBrown @ 2009-10-16  5:49 UTC (permalink / raw)
  To: linux-raid; +Cc: Dan Williams, NeilBrown

This reverts commit df10cfbc4d7ab93260d997df754219d390d62a9d.

This patch was based on a misunderstanding and risks introducing a busy-wait loop.
So revert it.

Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
---

 drivers/md/raid5.c |   19 ++++++-------------
 1 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 9482980..ec14ff5 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2896,7 +2896,7 @@ static void handle_stripe_expansion(raid5_conf_t *conf, struct stripe_head *sh,
  *
  */
 
-static bool handle_stripe5(struct stripe_head *sh)
+static void handle_stripe5(struct stripe_head *sh)
 {
 	raid5_conf_t *conf = sh->raid_conf;
 	int disks = sh->disks, i;
@@ -3167,11 +3167,9 @@ static bool handle_stripe5(struct stripe_head *sh)
 	ops_run_io(sh, &s);
 
 	return_io(return_bi);
-
-	return blocked_rdev == NULL;
 }
 
-static bool handle_stripe6(struct stripe_head *sh)
+static void handle_stripe6(struct stripe_head *sh)
 {
 	raid5_conf_t *conf = sh->raid_conf;
 	int disks = sh->disks;
@@ -3455,17 +3453,14 @@ static bool handle_stripe6(struct stripe_head *sh)
 	ops_run_io(sh, &s);
 
 	return_io(return_bi);
-
-	return blocked_rdev == NULL;
 }
 
-/* returns true if the stripe was handled */
-static bool handle_stripe(struct stripe_head *sh)
+static void handle_stripe(struct stripe_head *sh)
 {
 	if (sh->raid_conf->level == 6)
-		return handle_stripe6(sh);
+		handle_stripe6(sh);
 	else
-		return handle_stripe5(sh);
+		handle_stripe5(sh);
 }
 
 static void raid5_activate_delayed(raid5_conf_t *conf)
@@ -4277,9 +4272,7 @@ static inline sector_t sync_request(mddev_t *mddev, sector_t sector_nr, int *ski
 	clear_bit(STRIPE_INSYNC, &sh->state);
 	spin_unlock(&sh->lock);
 
-	/* wait for any blocked device to be handled */
-	while (unlikely(!handle_stripe(sh)))
-		;
+	handle_stripe(sh);
 	release_stripe(sh);
 
 	return STRIPE_SECTORS;



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [md PATCH 03/11] md/raid5: initialize conf->device_lock earlier
  2009-10-16  5:49 [md PATCH 00/11] Series short description NeilBrown
                   ` (2 preceding siblings ...)
  2009-10-16  5:49 ` [md PATCH 02/11] md/raid1/raid10: add a cond_resched NeilBrown
@ 2009-10-16  5:49 ` NeilBrown
  2009-10-16  5:49 ` [md PATCH 04/11] md: raid1/raid10: handle allocation errors during array setup NeilBrown
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: NeilBrown @ 2009-10-16  5:49 UTC (permalink / raw)
  To: linux-raid; +Cc: Dan Williams, NeilBrown

From: Dan Williams <dan.j.williams@intel.com>

Deallocating a raid5_conf_t structure requires taking 'device_lock'.
Ensure it is initialized before it is used, i.e. initialize the lock
before attempting any further initializations that might fail.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
---

 drivers/md/raid5.c |   25 ++++++++++++-------------
 1 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index ec14ff5..c3e5967 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4715,6 +4715,18 @@ static raid5_conf_t *setup_conf(mddev_t *mddev)
 	conf = kzalloc(sizeof(raid5_conf_t), GFP_KERNEL);
 	if (conf == NULL)
 		goto abort;
+	spin_lock_init(&conf->device_lock);
+	init_waitqueue_head(&conf->wait_for_stripe);
+	init_waitqueue_head(&conf->wait_for_overlap);
+	INIT_LIST_HEAD(&conf->handle_list);
+	INIT_LIST_HEAD(&conf->hold_list);
+	INIT_LIST_HEAD(&conf->delayed_list);
+	INIT_LIST_HEAD(&conf->bitmap_list);
+	INIT_LIST_HEAD(&conf->inactive_list);
+	atomic_set(&conf->active_stripes, 0);
+	atomic_set(&conf->preread_active_stripes, 0);
+	atomic_set(&conf->active_aligned_reads, 0);
+	conf->bypass_threshold = BYPASS_THRESHOLD;
 
 	conf->raid_disks = mddev->raid_disks;
 	conf->scribble_len = scribble_len(conf->raid_disks);
@@ -4737,19 +4749,6 @@ static raid5_conf_t *setup_conf(mddev_t *mddev)
 	if (raid5_alloc_percpu(conf) != 0)
 		goto abort;
 
-	spin_lock_init(&conf->device_lock);
-	init_waitqueue_head(&conf->wait_for_stripe);
-	init_waitqueue_head(&conf->wait_for_overlap);
-	INIT_LIST_HEAD(&conf->handle_list);
-	INIT_LIST_HEAD(&conf->hold_list);
-	INIT_LIST_HEAD(&conf->delayed_list);
-	INIT_LIST_HEAD(&conf->bitmap_list);
-	INIT_LIST_HEAD(&conf->inactive_list);
-	atomic_set(&conf->active_stripes, 0);
-	atomic_set(&conf->preread_active_stripes, 0);
-	atomic_set(&conf->active_aligned_reads, 0);
-	conf->bypass_threshold = BYPASS_THRESHOLD;
-
 	pr_debug("raid5: run(%s) called.\n", mdname(mddev));
 
 	list_for_each_entry(rdev, &mddev->disks, same_set) {



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [md PATCH 05/11] md: remove clumsy usage of do_sync_mapping_range from bitmap code
  2009-10-16  5:49 [md PATCH 00/11] Series short description NeilBrown
  2009-10-16  5:49 ` [md PATCH 01/11] Revert "md: do not progress the resync process if the stripe was blocked" NeilBrown
@ 2009-10-16  5:49 ` NeilBrown
  2009-10-16  5:49 ` [md PATCH 02/11] md/raid1/raid10: add a cond_resched NeilBrown
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: NeilBrown @ 2009-10-16  5:49 UTC (permalink / raw)
  To: linux-raid; +Cc: Christoph Hellwig, NeilBrown

and replace with vfs_fsync which is much neater (but wasn't exported,
or even in existence at the time the code was written).

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.de>
---

 drivers/md/bitmap.c |    9 +++++----
 1 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c
index 6986b00..60e2b32 100644
--- a/drivers/md/bitmap.c
+++ b/drivers/md/bitmap.c
@@ -1624,10 +1624,11 @@ int bitmap_create(mddev_t *mddev)
 	bitmap->offset = mddev->bitmap_offset;
 	if (file) {
 		get_file(file);
-		do_sync_mapping_range(file->f_mapping, 0, LLONG_MAX,
-				      SYNC_FILE_RANGE_WAIT_BEFORE |
-				      SYNC_FILE_RANGE_WRITE |
-				      SYNC_FILE_RANGE_WAIT_AFTER);
+		/* As future accesses to this file will use bmap,
+		 * and bypass the page cache, we must sync the file
+		 * first.
+		 */
+		vfs_fsync(file, file->f_dentry, 1);
 	}
 	/* read superblock from bitmap file (this sets bitmap->chunksize) */
 	err = bitmap_read_sb(bitmap);



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [md PATCH 04/11] md: raid1/raid10: handle allocation errors during array setup.
  2009-10-16  5:49 [md PATCH 00/11] Series short description NeilBrown
                   ` (3 preceding siblings ...)
  2009-10-16  5:49 ` [md PATCH 03/11] md/raid5: initialize conf->device_lock earlier NeilBrown
@ 2009-10-16  5:49 ` NeilBrown
  2009-10-16  5:49 ` [md PATCH 06/11] md: drivers/md/unroll.pl replaced with awk analog NeilBrown
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: NeilBrown @ 2009-10-16  5:49 UTC (permalink / raw)
  To: linux-raid; +Cc: NeilBrown

Both raid1 and raid10 create a mempool during startup.
If the 'alloc' function for this mempool fails, unplug_slaves
is called.
If that happens when the pool is being initialised, unplug_slaves
will try to use the 'conf' structure that isn't filled in yet, and
badness will happen.

So ensure that unplug_slaves doesn't get called unless we know
that the conf structure if fully initialised.

Signed-off-by: NeilBrown <neilb@suse.de>
---

 drivers/md/raid1.c  |    5 +++--
 drivers/md/raid10.c |    4 ++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 71a01a2..a053423 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -64,7 +64,7 @@ static void * r1bio_pool_alloc(gfp_t gfp_flags, void *data)
 
 	/* allocate a r1bio with room for raid_disks entries in the bios array */
 	r1_bio = kzalloc(size, gfp_flags);
-	if (!r1_bio)
+	if (!r1_bio && pi->mddev)
 		unplug_slaves(pi->mddev);
 
 	return r1_bio;
@@ -1979,13 +1979,14 @@ static int run(mddev_t *mddev)
 	conf->poolinfo = kmalloc(sizeof(*conf->poolinfo), GFP_KERNEL);
 	if (!conf->poolinfo)
 		goto out_no_mem;
-	conf->poolinfo->mddev = mddev;
+	conf->poolinfo->mddev = NULL;
 	conf->poolinfo->raid_disks = mddev->raid_disks;
 	conf->r1bio_pool = mempool_create(NR_RAID1_BIOS, r1bio_pool_alloc,
 					  r1bio_pool_free,
 					  conf->poolinfo);
 	if (!conf->r1bio_pool)
 		goto out_no_mem;
+	conf->poolinfo->mddev = mddev;
 
 	spin_lock_init(&conf->device_lock);
 	mddev->queue->queue_lock = &conf->device_lock;
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 69fc76c..c2cb7b8 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -68,7 +68,7 @@ static void * r10bio_pool_alloc(gfp_t gfp_flags, void *data)
 
 	/* allocate a r10bio with room for raid_disks entries in the bios array */
 	r10_bio = kzalloc(size, gfp_flags);
-	if (!r10_bio)
+	if (!r10_bio && conf->mddev)
 		unplug_slaves(conf->mddev);
 
 	return r10_bio;
@@ -2096,7 +2096,6 @@ static int run(mddev_t *mddev)
 	if (!conf->tmppage)
 		goto out_free_conf;
 
-	conf->mddev = mddev;
 	conf->raid_disks = mddev->raid_disks;
 	conf->near_copies = nc;
 	conf->far_copies = fc;
@@ -2133,6 +2132,7 @@ static int run(mddev_t *mddev)
 		goto out_free_conf;
 	}
 
+	conf->mddev = mddev;
 	spin_lock_init(&conf->device_lock);
 	mddev->queue->queue_lock = &conf->device_lock;
 



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [md PATCH 02/11] md/raid1/raid10: add a cond_resched
  2009-10-16  5:49 [md PATCH 00/11] Series short description NeilBrown
  2009-10-16  5:49 ` [md PATCH 01/11] Revert "md: do not progress the resync process if the stripe was blocked" NeilBrown
  2009-10-16  5:49 ` [md PATCH 05/11] md: remove clumsy usage of do_sync_mapping_range from bitmap code NeilBrown
@ 2009-10-16  5:49 ` NeilBrown
  2009-10-16  5:49 ` [md PATCH 03/11] md/raid5: initialize conf->device_lock earlier NeilBrown
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: NeilBrown @ 2009-10-16  5:49 UTC (permalink / raw)
  To: linux-raid; +Cc: NeilBrown

During 'check' of a raid1 or raid10 it is possible for the management
thread to spend a lot of time running 'memcmp' on blocks from
different devices, so make sure the thread has a chance to schedule.
raid5d already has a cond_resched (in process_stripe).

Reported-By: Lee Howard <faxguy@howardsilvan.com>
Signed-off-by: NeilBrown <neilb@suse.de>
---

 drivers/md/raid1.c  |    1 +
 drivers/md/raid10.c |    1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index d1b9bd5..71a01a2 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1683,6 +1683,7 @@ static void raid1d(mddev_t *mddev)
 				generic_make_request(bio);
 			}
 		}
+		cond_resched();
 	}
 	if (unplug)
 		unplug_slaves(mddev);
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 51c4c5c..69fc76c 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1632,6 +1632,7 @@ static void raid10d(mddev_t *mddev)
 				generic_make_request(bio);
 			}
 		}
+		cond_resched();
 	}
 	if (unplug)
 		unplug_slaves(mddev);



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [md PATCH 06/11] md: drivers/md/unroll.pl replaced with awk analog
  2009-10-16  5:49 [md PATCH 00/11] Series short description NeilBrown
                   ` (4 preceding siblings ...)
  2009-10-16  5:49 ` [md PATCH 04/11] md: raid1/raid10: handle allocation errors during array setup NeilBrown
@ 2009-10-16  5:49 ` NeilBrown
  2009-10-16  5:49 ` [md PATCH 09/11] md: Fix handling of raid5 array which is being reshaped to fewer devices NeilBrown
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: NeilBrown @ 2009-10-16  5:49 UTC (permalink / raw)
  To: linux-raid; +Cc: Vladimir Dronnikov, NeilBrown

From: Vladimir Dronnikov <dronnikov@gmail.com>

drivers/md/unroll.pl replaced by awk script to drop build-time
dependency on perl

Signed-off-by: Vladimir Dronnikov <dronnikov@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
---

 drivers/md/Makefile           |   22 +++++++++++----------
 drivers/md/raid6altivec.uc    |    2 +-
 drivers/md/raid6int.uc        |    2 +-
 drivers/md/raid6test/Makefile |   42 +++++++++++++++++++++--------------------
 drivers/md/unroll.awk         |   20 ++++++++++++++++++++
 drivers/md/unroll.pl          |   24 -----------------------
 6 files changed, 54 insertions(+), 58 deletions(-)
 create mode 100644 drivers/md/unroll.awk
 delete mode 100644 drivers/md/unroll.pl

diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index 1dc4185..e355e7f 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -46,7 +46,7 @@ obj-$(CONFIG_DM_LOG_USERSPACE)	+= dm-log-userspace.o
 obj-$(CONFIG_DM_ZERO)		+= dm-zero.o
 
 quiet_cmd_unroll = UNROLL  $@
-      cmd_unroll = $(PERL) $(srctree)/$(src)/unroll.pl $(UNROLL) \
+      cmd_unroll = $(AWK) -f$(srctree)/$(src)/unroll.awk -vN=$(UNROLL) \
                    < $< > $@ || ( rm -f $@ && exit 1 )
 
 ifeq ($(CONFIG_ALTIVEC),y)
@@ -59,56 +59,56 @@ endif
 
 targets += raid6int1.c
 $(obj)/raid6int1.c:   UNROLL := 1
-$(obj)/raid6int1.c:   $(src)/raid6int.uc $(src)/unroll.pl FORCE
+$(obj)/raid6int1.c:   $(src)/raid6int.uc $(src)/unroll.awk FORCE
 	$(call if_changed,unroll)
 
 targets += raid6int2.c
 $(obj)/raid6int2.c:   UNROLL := 2
-$(obj)/raid6int2.c:   $(src)/raid6int.uc $(src)/unroll.pl FORCE
+$(obj)/raid6int2.c:   $(src)/raid6int.uc $(src)/unroll.awk FORCE
 	$(call if_changed,unroll)
 
 targets += raid6int4.c
 $(obj)/raid6int4.c:   UNROLL := 4
-$(obj)/raid6int4.c:   $(src)/raid6int.uc $(src)/unroll.pl FORCE
+$(obj)/raid6int4.c:   $(src)/raid6int.uc $(src)/unroll.awk FORCE
 	$(call if_changed,unroll)
 
 targets += raid6int8.c
 $(obj)/raid6int8.c:   UNROLL := 8
-$(obj)/raid6int8.c:   $(src)/raid6int.uc $(src)/unroll.pl FORCE
+$(obj)/raid6int8.c:   $(src)/raid6int.uc $(src)/unroll.awk FORCE
 	$(call if_changed,unroll)
 
 targets += raid6int16.c
 $(obj)/raid6int16.c:  UNROLL := 16
-$(obj)/raid6int16.c:  $(src)/raid6int.uc $(src)/unroll.pl FORCE
+$(obj)/raid6int16.c:  $(src)/raid6int.uc $(src)/unroll.awk FORCE
 	$(call if_changed,unroll)
 
 targets += raid6int32.c
 $(obj)/raid6int32.c:  UNROLL := 32
-$(obj)/raid6int32.c:  $(src)/raid6int.uc $(src)/unroll.pl FORCE
+$(obj)/raid6int32.c:  $(src)/raid6int.uc $(src)/unroll.awk FORCE
 	$(call if_changed,unroll)
 
 CFLAGS_raid6altivec1.o += $(altivec_flags)
 targets += raid6altivec1.c
 $(obj)/raid6altivec1.c:   UNROLL := 1
-$(obj)/raid6altivec1.c:   $(src)/raid6altivec.uc $(src)/unroll.pl FORCE
+$(obj)/raid6altivec1.c:   $(src)/raid6altivec.uc $(src)/unroll.awk FORCE
 	$(call if_changed,unroll)
 
 CFLAGS_raid6altivec2.o += $(altivec_flags)
 targets += raid6altivec2.c
 $(obj)/raid6altivec2.c:   UNROLL := 2
-$(obj)/raid6altivec2.c:   $(src)/raid6altivec.uc $(src)/unroll.pl FORCE
+$(obj)/raid6altivec2.c:   $(src)/raid6altivec.uc $(src)/unroll.awk FORCE
 	$(call if_changed,unroll)
 
 CFLAGS_raid6altivec4.o += $(altivec_flags)
 targets += raid6altivec4.c
 $(obj)/raid6altivec4.c:   UNROLL := 4
-$(obj)/raid6altivec4.c:   $(src)/raid6altivec.uc $(src)/unroll.pl FORCE
+$(obj)/raid6altivec4.c:   $(src)/raid6altivec.uc $(src)/unroll.awk FORCE
 	$(call if_changed,unroll)
 
 CFLAGS_raid6altivec8.o += $(altivec_flags)
 targets += raid6altivec8.c
 $(obj)/raid6altivec8.c:   UNROLL := 8
-$(obj)/raid6altivec8.c:   $(src)/raid6altivec.uc $(src)/unroll.pl FORCE
+$(obj)/raid6altivec8.c:   $(src)/raid6altivec.uc $(src)/unroll.awk FORCE
 	$(call if_changed,unroll)
 
 quiet_cmd_mktable = TABLE   $@
diff --git a/drivers/md/raid6altivec.uc b/drivers/md/raid6altivec.uc
index 699dfee..2654d5c 100644
--- a/drivers/md/raid6altivec.uc
+++ b/drivers/md/raid6altivec.uc
@@ -15,7 +15,7 @@
  *
  * $#-way unrolled portable integer math RAID-6 instruction set
  *
- * This file is postprocessed using unroll.pl
+ * This file is postprocessed using unroll.awk
  *
  * <benh> hpa: in process,
  * you can just "steal" the vec unit with enable_kernel_altivec() (but
diff --git a/drivers/md/raid6int.uc b/drivers/md/raid6int.uc
index f9bf9cb..d1e276a 100644
--- a/drivers/md/raid6int.uc
+++ b/drivers/md/raid6int.uc
@@ -15,7 +15,7 @@
  *
  * $#-way unrolled portable integer math RAID-6 instruction set
  *
- * This file is postprocessed using unroll.pl
+ * This file is postprocessed using unroll.awk
  */
 
 #include <linux/raid/pq.h>
diff --git a/drivers/md/raid6test/Makefile b/drivers/md/raid6test/Makefile
index 58ffdf4..2874cbe 100644
--- a/drivers/md/raid6test/Makefile
+++ b/drivers/md/raid6test/Makefile
@@ -7,7 +7,7 @@ CC	 = gcc
 OPTFLAGS = -O2			# Adjust as desired
 CFLAGS	 = -I.. -I ../../../include -g $(OPTFLAGS)
 LD	 = ld
-PERL	 = perl
+AWK	 = awk
 AR	 = ar
 RANLIB	 = ranlib
 
@@ -35,35 +35,35 @@ raid6.a: raid6int1.o raid6int2.o raid6int4.o raid6int8.o raid6int16.o \
 raid6test: test.c raid6.a
 	$(CC) $(CFLAGS) -o raid6test $^
 
-raid6altivec1.c: raid6altivec.uc ../unroll.pl
-	$(PERL) ../unroll.pl 1 < raid6altivec.uc > $@
+raid6altivec1.c: raid6altivec.uc ../unroll.awk
+	$(AWK) ../unroll.awk -vN=1 < raid6altivec.uc > $@
 
-raid6altivec2.c: raid6altivec.uc ../unroll.pl
-	$(PERL) ../unroll.pl 2 < raid6altivec.uc > $@
+raid6altivec2.c: raid6altivec.uc ../unroll.awk
+	$(AWK) ../unroll.awk -vN=2 < raid6altivec.uc > $@
 
-raid6altivec4.c: raid6altivec.uc ../unroll.pl
-	$(PERL) ../unroll.pl 4 < raid6altivec.uc > $@
+raid6altivec4.c: raid6altivec.uc ../unroll.awk
+	$(AWK) ../unroll.awk -vN=4 < raid6altivec.uc > $@
 
-raid6altivec8.c: raid6altivec.uc ../unroll.pl
-	$(PERL) ../unroll.pl 8 < raid6altivec.uc > $@
+raid6altivec8.c: raid6altivec.uc ../unroll.awk
+	$(AWK) ../unroll.awk -vN=8 < raid6altivec.uc > $@
 
-raid6int1.c: raid6int.uc ../unroll.pl
-	$(PERL) ../unroll.pl 1 < raid6int.uc > $@
+raid6int1.c: raid6int.uc ../unroll.awk
+	$(AWK) ../unroll.awk -vN=1 < raid6int.uc > $@
 
-raid6int2.c: raid6int.uc ../unroll.pl
-	$(PERL) ../unroll.pl 2 < raid6int.uc > $@
+raid6int2.c: raid6int.uc ../unroll.awk
+	$(AWK) ../unroll.awk -vN=2 < raid6int.uc > $@
 
-raid6int4.c: raid6int.uc ../unroll.pl
-	$(PERL) ../unroll.pl 4 < raid6int.uc > $@
+raid6int4.c: raid6int.uc ../unroll.awk
+	$(AWK) ../unroll.awk -vN=4 < raid6int.uc > $@
 
-raid6int8.c: raid6int.uc ../unroll.pl
-	$(PERL) ../unroll.pl 8 < raid6int.uc > $@
+raid6int8.c: raid6int.uc ../unroll.awk
+	$(AWK) ../unroll.awk -vN=8 < raid6int.uc > $@
 
-raid6int16.c: raid6int.uc ../unroll.pl
-	$(PERL) ../unroll.pl 16 < raid6int.uc > $@
+raid6int16.c: raid6int.uc ../unroll.awk
+	$(AWK) ../unroll.awk -vN=16 < raid6int.uc > $@
 
-raid6int32.c: raid6int.uc ../unroll.pl
-	$(PERL) ../unroll.pl 32 < raid6int.uc > $@
+raid6int32.c: raid6int.uc ../unroll.awk
+	$(AWK) ../unroll.awk -vN=32 < raid6int.uc > $@
 
 raid6tables.c: mktables
 	./mktables > raid6tables.c
diff --git a/drivers/md/unroll.awk b/drivers/md/unroll.awk
new file mode 100644
index 0000000..c6aa036
--- /dev/null
+++ b/drivers/md/unroll.awk
@@ -0,0 +1,20 @@
+
+# This filter requires one command line option of form -vN=n
+# where n must be a decimal number.
+#
+# Repeat each input line containing $$ n times, replacing $$ with 0...n-1.
+# Replace each $# with n, and each $* with a single $.
+
+BEGIN {
+	n = N + 0
+}
+{
+	if (/\$\$/) { rep = n } else { rep = 1 }
+	for (i = 0; i < rep; ++i) {
+		tmp = $0
+		gsub(/\$\$/, i, tmp)
+		gsub(/\$\#/, n, tmp)
+		gsub(/\$\*/, "$", tmp)
+		print tmp
+	}
+}
diff --git a/drivers/md/unroll.pl b/drivers/md/unroll.pl
deleted file mode 100644
index 3acc710..0000000
--- a/drivers/md/unroll.pl
+++ /dev/null
@@ -1,24 +0,0 @@
-#!/usr/bin/perl
-#
-# Take a piece of C code and for each line which contains the sequence $$
-# repeat n times with $ replaced by 0...n-1; the sequence $# is replaced
-# by the unrolling factor, and $* with a single $
-#
-
-($n) = @ARGV;
-$n += 0;
-
-while ( defined($line = <STDIN>) ) {
-    if ( $line =~ /\$\$/ ) {
-	$rep = $n;
-    } else {
-	$rep = 1;
-    }
-    for ( $i = 0 ; $i < $rep ; $i++ ) {
-	$tmp = $line;
-	$tmp =~ s/\$\$/$i/g;
-	$tmp =~ s/\$\#/$n/g;
-	$tmp =~ s/\$\*/\$/g;
-	print $tmp;
-    }
-}



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [md PATCH 07/11] md/raid456: downlevel multicore operations to raid_run_ops
  2009-10-16  5:49 [md PATCH 00/11] Series short description NeilBrown
                   ` (9 preceding siblings ...)
  2009-10-16  5:49 ` [md PATCH 11/11] raid6/async_tx: handle holes in block list in async_syndrome_val NeilBrown
@ 2009-10-16  5:49 ` NeilBrown
  10 siblings, 0 replies; 12+ messages in thread
From: NeilBrown @ 2009-10-16  5:49 UTC (permalink / raw)
  To: linux-raid; +Cc: Dan Williams, NeilBrown

From: Dan Williams <dan.j.williams@intel.com>

The percpu conversion allowed a straightforward handoff of stripe
processing to the async subsytem that initially showed some modest gains
(+4%).  However, this model is too simplistic and leads to stripes
bouncing between raid5d and the async thread pool for every invocation
of handle_stripe().  As reported by Holger this can fall into a
pathological situation severely impacting throughput (6x performance
loss).

By downleveling the parallelism to raid_run_ops the pathological
stripe_head bouncing is eliminated.  This version still exhibits an
average 11% throughput loss for:

	mdadm --create /dev/md0 /dev/sd[b-q] -n 16 -l 6
	echo 1024 > /sys/block/md0/md/stripe_cache_size
	dd if=/dev/zero of=/dev/md0 bs=1024k count=2048

...but the results are at least stable and can be used as a base for
further multicore experimentation.

Reported-by: Holger Kiehl <Holger.Kiehl@dwd.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>
---

 drivers/md/raid5.c |   75 ++++++++++++++++++++++++++++------------------------
 drivers/md/raid5.h |   12 ++++++++
 2 files changed, 51 insertions(+), 36 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index c3e5967..25c3c29 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -1139,7 +1139,7 @@ static void ops_run_check_pq(struct stripe_head *sh, struct raid5_percpu *percpu
 			   &sh->ops.zero_sum_result, percpu->spare_page, &submit);
 }
 
-static void raid_run_ops(struct stripe_head *sh, unsigned long ops_request)
+static void __raid_run_ops(struct stripe_head *sh, unsigned long ops_request)
 {
 	int overlap_clear = 0, i, disks = sh->disks;
 	struct dma_async_tx_descriptor *tx = NULL;
@@ -1204,6 +1204,36 @@ static void raid_run_ops(struct stripe_head *sh, unsigned long ops_request)
 	put_cpu();
 }
 
+#ifdef CONFIG_MULTICORE_RAID456
+static void async_run_ops(void *param, async_cookie_t cookie)
+{
+	struct stripe_head *sh = param;
+	unsigned long ops_request = sh->ops.request;
+
+	clear_bit_unlock(STRIPE_OPS_REQ_PENDING, &sh->state);
+	wake_up(&sh->ops.wait_for_ops);
+
+	__raid_run_ops(sh, ops_request);
+	release_stripe(sh);
+}
+
+static void raid_run_ops(struct stripe_head *sh, unsigned long ops_request)
+{
+	/* since handle_stripe can be called outside of raid5d context
+	 * we need to ensure sh->ops.request is de-staged before another
+	 * request arrives
+	 */
+	wait_event(sh->ops.wait_for_ops,
+		   !test_and_set_bit_lock(STRIPE_OPS_REQ_PENDING, &sh->state));
+	sh->ops.request = ops_request;
+
+	atomic_inc(&sh->count);
+	async_schedule(async_run_ops, sh);
+}
+#else
+#define raid_run_ops __raid_run_ops
+#endif
+
 static int grow_one_stripe(raid5_conf_t *conf)
 {
 	struct stripe_head *sh;
@@ -1213,6 +1243,9 @@ static int grow_one_stripe(raid5_conf_t *conf)
 	memset(sh, 0, sizeof(*sh) + (conf->raid_disks-1)*sizeof(struct r5dev));
 	sh->raid_conf = conf;
 	spin_lock_init(&sh->lock);
+	#ifdef CONFIG_MULTICORE_RAID456
+	init_waitqueue_head(&sh->ops.wait_for_ops);
+	#endif
 
 	if (grow_buffers(sh, conf->raid_disks)) {
 		shrink_buffers(sh, conf->raid_disks);
@@ -1329,6 +1362,9 @@ static int resize_stripes(raid5_conf_t *conf, int newsize)
 
 		nsh->raid_conf = conf;
 		spin_lock_init(&nsh->lock);
+		#ifdef CONFIG_MULTICORE_RAID456
+		init_waitqueue_head(&nsh->ops.wait_for_ops);
+		#endif
 
 		list_add(&nsh->lru, &newstripes);
 	}
@@ -4342,37 +4378,6 @@ static int  retry_aligned_read(raid5_conf_t *conf, struct bio *raid_bio)
 	return handled;
 }
 
-#ifdef CONFIG_MULTICORE_RAID456
-static void __process_stripe(void *param, async_cookie_t cookie)
-{
-	struct stripe_head *sh = param;
-
-	handle_stripe(sh);
-	release_stripe(sh);
-}
-
-static void process_stripe(struct stripe_head *sh, struct list_head *domain)
-{
-	async_schedule_domain(__process_stripe, sh, domain);
-}
-
-static void synchronize_stripe_processing(struct list_head *domain)
-{
-	async_synchronize_full_domain(domain);
-}
-#else
-static void process_stripe(struct stripe_head *sh, struct list_head *domain)
-{
-	handle_stripe(sh);
-	release_stripe(sh);
-	cond_resched();
-}
-
-static void synchronize_stripe_processing(struct list_head *domain)
-{
-}
-#endif
-
 
 /*
  * This is our raid5 kernel thread.
@@ -4386,7 +4391,6 @@ static void raid5d(mddev_t *mddev)
 	struct stripe_head *sh;
 	raid5_conf_t *conf = mddev->private;
 	int handled;
-	LIST_HEAD(raid_domain);
 
 	pr_debug("+++ raid5d active\n");
 
@@ -4423,7 +4427,9 @@ static void raid5d(mddev_t *mddev)
 		spin_unlock_irq(&conf->device_lock);
 		
 		handled++;
-		process_stripe(sh, &raid_domain);
+		handle_stripe(sh);
+		release_stripe(sh);
+		cond_resched();
 
 		spin_lock_irq(&conf->device_lock);
 	}
@@ -4431,7 +4437,6 @@ static void raid5d(mddev_t *mddev)
 
 	spin_unlock_irq(&conf->device_lock);
 
-	synchronize_stripe_processing(&raid_domain);
 	async_tx_issue_pending_all();
 	unplug_slaves(mddev);
 
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index 2390e0e..dcefdc9 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -214,12 +214,20 @@ struct stripe_head {
 	int			disks;		/* disks in stripe */
 	enum check_states	check_state;
 	enum reconstruct_states reconstruct_state;
-	/* stripe_operations
+	/**
+	 * struct stripe_operations
 	 * @target - STRIPE_OP_COMPUTE_BLK target
+	 * @target2 - 2nd compute target in the raid6 case
+	 * @zero_sum_result - P and Q verification flags
+	 * @request - async service request flags for raid_run_ops
 	 */
 	struct stripe_operations {
 		int 		     target, target2;
 		enum sum_check_flags zero_sum_result;
+		#ifdef CONFIG_MULTICORE_RAID456
+		unsigned long	     request;
+		wait_queue_head_t    wait_for_ops;
+		#endif
 	} ops;
 	struct r5dev {
 		struct bio	req;
@@ -294,6 +302,8 @@ struct r6_state {
 #define	STRIPE_FULL_WRITE	13 /* all blocks are set to be overwritten */
 #define	STRIPE_BIOFILL_RUN	14
 #define	STRIPE_COMPUTE_RUN	15
+#define	STRIPE_OPS_REQ_PENDING	16
+
 /*
  * Operation request flags
  */



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [md PATCH 08/11] md: fix problems with RAID6 calculations for DDF.
  2009-10-16  5:49 [md PATCH 00/11] Series short description NeilBrown
                   ` (6 preceding siblings ...)
  2009-10-16  5:49 ` [md PATCH 09/11] md: Fix handling of raid5 array which is being reshaped to fewer devices NeilBrown
@ 2009-10-16  5:49 ` NeilBrown
  2009-10-16  5:49 ` [md PATCH 10/11] md/async: don't pass a memory pointer as a page pointer NeilBrown
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: NeilBrown @ 2009-10-16  5:49 UTC (permalink / raw)
  To: linux-raid; +Cc: NeilBrown

Signed-off-by: NeilBrown <neilb@suse.de>
---

 drivers/md/raid5.c |   20 +++++++++++++-------
 drivers/md/raid5.h |    2 +-
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 25c3c29..d4ce51b 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -158,11 +158,14 @@ static int raid6_idx_to_slot(int idx, struct stripe_head *sh,
 {
 	int slot;
 
+	if (sh->ddf_layout)
+		slot = (*count)++;
 	if (idx == sh->pd_idx)
 		return syndrome_disks;
 	if (idx == sh->qd_idx)
 		return syndrome_disks + 1;
-	slot = (*count)++;
+	if (!sh->ddf_layout)
+		slot = (*count)++;
 	return slot;
 }
 
@@ -727,9 +730,8 @@ static int set_syndrome_sources(struct page **srcs, struct stripe_head *sh)
 		srcs[slot] = sh->dev[i].page;
 		i = raid6_next_disk(i, disks);
 	} while (i != d0_idx);
-	BUG_ON(count != syndrome_disks);
 
-	return count;
+	return syndrome_disks;
 }
 
 static struct dma_async_tx_descriptor *
@@ -828,7 +830,6 @@ ops_run_compute6_2(struct stripe_head *sh, struct raid5_percpu *percpu)
 			failb = slot;
 		i = raid6_next_disk(i, disks);
 	} while (i != d0_idx);
-	BUG_ON(count != syndrome_disks);
 
 	BUG_ON(faila == failb);
 	if (failb < faila)
@@ -845,7 +846,7 @@ ops_run_compute6_2(struct stripe_head *sh, struct raid5_percpu *percpu)
 			init_async_submit(&submit, ASYNC_TX_FENCE, NULL,
 					  ops_complete_compute, sh,
 					  to_addr_conv(sh, percpu));
-			return async_gen_syndrome(blocks, 0, count+2,
+			return async_gen_syndrome(blocks, 0, syndrome_disks+2,
 						  STRIPE_SIZE, &submit);
 		} else {
 			struct page *dest;
@@ -1935,10 +1936,15 @@ static sector_t compute_blocknr(struct stripe_head *sh, int i, int previous)
 		case ALGORITHM_PARITY_N:
 			break;
 		case ALGORITHM_ROTATING_N_CONTINUE:
+			/* Like left_symmetric, but P is before Q */
 			if (sh->pd_idx == 0)
 				i--;	/* P D D D Q */
-			else if (i > sh->pd_idx)
-				i -= 2; /* D D Q P D */
+			else {
+				/* D D Q P D */
+				if (i < sh->pd_idx)
+					i += raid_disks;
+				i -= (sh->pd_idx + 1);
+			}
 			break;
 		case ALGORITHM_LEFT_ASYMMETRIC_6:
 		case ALGORITHM_RIGHT_ASYMMETRIC_6:
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index dcefdc9..dd70835 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -488,7 +488,7 @@ static inline int algorithm_valid_raid6(int layout)
 {
 	return (layout >= 0 && layout <= 5)
 		||
-		(layout == 8 || layout == 10)
+		(layout >= 8 && layout <= 10)
 		||
 		(layout >= 16 && layout <= 20);
 }



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [md PATCH 11/11] raid6/async_tx: handle holes in block list in async_syndrome_val
  2009-10-16  5:49 [md PATCH 00/11] Series short description NeilBrown
                   ` (8 preceding siblings ...)
  2009-10-16  5:49 ` [md PATCH 10/11] md/async: don't pass a memory pointer as a page pointer NeilBrown
@ 2009-10-16  5:49 ` NeilBrown
  2009-10-16  5:49 ` [md PATCH 07/11] md/raid456: downlevel multicore operations to raid_run_ops NeilBrown
  10 siblings, 0 replies; 12+ messages in thread
From: NeilBrown @ 2009-10-16  5:49 UTC (permalink / raw)
  To: linux-raid; +Cc: Dan Williams

async_syndrome_val check the P and Q blocks used for RAID6
calculations.
With DDF raid6, some of the data blocks might be NULL, so
this needs to be handled in the same way that async_gen_syndrome
handles it.

As async_syndrome_val calls async_xor, also enhance async_xor
to detect and skip NULL blocks in the list.

Signed-off-by: NeilBrown <neilb@suse.de>
---

 crypto/async_tx/async_pq.c  |   31 ++++++++++++++++++++++++-------
 crypto/async_tx/async_xor.c |   18 +++++++++++-------
 2 files changed, 35 insertions(+), 14 deletions(-)

diff --git a/crypto/async_tx/async_pq.c b/crypto/async_tx/async_pq.c
index 9ab1ce4..43b1436 100644
--- a/crypto/async_tx/async_pq.c
+++ b/crypto/async_tx/async_pq.c
@@ -260,8 +260,10 @@ async_syndrome_val(struct page **blocks, unsigned int offset, int disks,
 						      len);
 	struct dma_device *device = chan ? chan->device : NULL;
 	struct dma_async_tx_descriptor *tx;
+	unsigned char coefs[disks-2];
 	enum dma_ctrl_flags dma_flags = submit->cb_fn ? DMA_PREP_INTERRUPT : 0;
 	dma_addr_t *dma_src = NULL;
+	int src_cnt = 0;
 
 	BUG_ON(disks < 4);
 
@@ -280,20 +282,35 @@ async_syndrome_val(struct page **blocks, unsigned int offset, int disks,
 			 __func__, disks, len);
 		if (!P(blocks, disks))
 			dma_flags |= DMA_PREP_PQ_DISABLE_P;
+		else
+			pq[0] = dma_map_page(dev, P(blocks,disks),
+					     offset, len,
+					     DMA_TO_DEVICE);
 		if (!Q(blocks, disks))
 			dma_flags |= DMA_PREP_PQ_DISABLE_Q;
+		else
+			pq[1] = dma_map_page(dev, Q(blocks,disks),
+					     offset, len,
+					     DMA_TO_DEVICE);
+
 		if (submit->flags & ASYNC_TX_FENCE)
 			dma_flags |= DMA_PREP_FENCE;
-		for (i = 0; i < disks; i++)
-			if (likely(blocks[i]))
-				dma_src[i] = dma_map_page(dev, blocks[i],
-							  offset, len,
-							  DMA_TO_DEVICE);
+		for (i = 0; i < disks-2; i++)
+			if (likely(blocks[i])) {
+				dma_src[src_cnt] = dma_map_page(dev, blocks[i],
+								offset, len,
+								DMA_TO_DEVICE);
+				coefs[src_cnt] = raid6_gfexp[i];
+				src_cnt++;
+			}
+		pq[1] = dma_map_page(dev, Q(blocks,disks),
+				     offset, len,
+				     DMA_TO_DEVICE);
 
 		for (;;) {
 			tx = device->device_prep_dma_pq_val(chan, pq, dma_src,
-							    disks - 2,
-							    raid6_gfexp,
+							    src_cnt,
+							    coefs,
 							    len, pqres,
 							    dma_flags);
 			if (likely(tx))
diff --git a/crypto/async_tx/async_xor.c b/crypto/async_tx/async_xor.c
index b459a90..79182dc 100644
--- a/crypto/async_tx/async_xor.c
+++ b/crypto/async_tx/async_xor.c
@@ -44,20 +44,23 @@ do_async_xor(struct dma_chan *chan, struct page *dest, struct page **src_list,
 	void *cb_param_orig = submit->cb_param;
 	enum async_tx_flags flags_orig = submit->flags;
 	enum dma_ctrl_flags dma_flags;
-	int xor_src_cnt;
+	int xor_src_cnt = 0;
 	dma_addr_t dma_dest;
 
 	/* map the dest bidrectional in case it is re-used as a source */
 	dma_dest = dma_map_page(dma->dev, dest, offset, len, DMA_BIDIRECTIONAL);
 	for (i = 0; i < src_cnt; i++) {
 		/* only map the dest once */
+		if (!src_list[i])
+			continue;
 		if (unlikely(src_list[i] == dest)) {
-			dma_src[i] = dma_dest;
+			dma_src[xor_src_cnt++] = dma_dest;
 			continue;
 		}
-		dma_src[i] = dma_map_page(dma->dev, src_list[i], offset,
-					  len, DMA_TO_DEVICE);
+		dma_src[xor_src_cnt++] = dma_map_page(dma->dev, src_list[i], offset,
+						      len, DMA_TO_DEVICE);
 	}
+	src_cnt = xor_src_cnt;
 
 	while (src_cnt) {
 		submit->flags = flags_orig;
@@ -123,7 +126,7 @@ do_sync_xor(struct page *dest, struct page **src_list, unsigned int offset,
 	    int src_cnt, size_t len, struct async_submit_ctl *submit)
 {
 	int i;
-	int xor_src_cnt;
+	int xor_src_cnt = 0;
 	int src_off = 0;
 	void *dest_buf;
 	void **srcs;
@@ -135,8 +138,9 @@ do_sync_xor(struct page *dest, struct page **src_list, unsigned int offset,
 
 	/* convert to buffer pointers */
 	for (i = 0; i < src_cnt; i++)
-		srcs[i] = page_address(src_list[i]) + offset;
-
+		if (src_list[i])
+			srcs[xor_src_cnt++] = page_address(src_list[i]) + offset;
+	src_cnt = xor_src_cnt;
 	/* set destination address */
 	dest_buf = page_address(dest) + offset;
 



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [md PATCH 10/11] md/async: don't pass a memory pointer as a page pointer.
  2009-10-16  5:49 [md PATCH 00/11] Series short description NeilBrown
                   ` (7 preceding siblings ...)
  2009-10-16  5:49 ` [md PATCH 08/11] md: fix problems with RAID6 calculations for DDF NeilBrown
@ 2009-10-16  5:49 ` NeilBrown
  2009-10-16  5:49 ` [md PATCH 11/11] raid6/async_tx: handle holes in block list in async_syndrome_val NeilBrown
  2009-10-16  5:49 ` [md PATCH 07/11] md/raid456: downlevel multicore operations to raid_run_ops NeilBrown
  10 siblings, 0 replies; 12+ messages in thread
From: NeilBrown @ 2009-10-16  5:49 UTC (permalink / raw)
  To: linux-raid; +Cc: Dan Williams

md/raid6 passes a list of 'struct page *' to the async_tx routines,
which then either DMA map them for offload, or take the page_address
for CPU based calculations.

For RAID6 we sometime leave 'blanks' in the list of pages.
For CPU based calcs, we want to treat theses as a page of zeros.
For offloaded calculations, we simply don't pass a page to the
hardware.

Currently the 'blanks' are encoded as a pointer to
raid6_empty_zero_page.  This is a 4096 byte memory region, not a
'struct page'.  This is mostly handled correctly but is rather ugly.

So change the code to pass and expect a NULL pointer for the blanks.
When taking page_address of a page, we need to check for a NULL and
in that case use raid6_empty_zero_page.

Signed-off-by: NeilBrown <neilb@suse.de>
---

 crypto/async_tx/async_pq.c          |   15 ++++-----------
 crypto/async_tx/async_raid6_recov.c |   16 +++++++++++-----
 drivers/md/raid5.c                  |    4 ++--
 3 files changed, 17 insertions(+), 18 deletions(-)

diff --git a/crypto/async_tx/async_pq.c b/crypto/async_tx/async_pq.c
index b88db6d..9ab1ce4 100644
--- a/crypto/async_tx/async_pq.c
+++ b/crypto/async_tx/async_pq.c
@@ -30,11 +30,6 @@
  */
 static struct page *scribble;
 
-static bool is_raid6_zero_block(struct page *p)
-{
-	return p == (void *) raid6_empty_zero_page;
-}
-
 /* the struct page *blocks[] parameter passed to async_gen_syndrome()
  * and async_syndrome_val() contains the 'P' destination address at
  * blocks[disks-2] and the 'Q' destination address at blocks[disks-1]
@@ -83,7 +78,7 @@ do_async_gen_syndrome(struct dma_chan *chan, struct page **blocks,
 	 * sources and update the coefficients accordingly
 	 */
 	for (i = 0, idx = 0; i < src_cnt; i++) {
-		if (is_raid6_zero_block(blocks[i]))
+		if (blocks[i] == NULL)
 			continue;
 		dma_src[idx] = dma_map_page(dma->dev, blocks[i], offset, len,
 					    DMA_TO_DEVICE);
@@ -160,9 +155,9 @@ do_sync_gen_syndrome(struct page **blocks, unsigned int offset, int disks,
 		srcs = (void **) blocks;
 
 	for (i = 0; i < disks; i++) {
-		if (is_raid6_zero_block(blocks[i])) {
+		if (blocks[i] == NULL) {
 			BUG_ON(i > disks - 3); /* P or Q can't be zero */
-			srcs[i] = blocks[i];
+			srcs[i] = (void*)raid6_empty_zero_page;
 		} else
 			srcs[i] = page_address(blocks[i]) + offset;
 	}
@@ -290,12 +285,10 @@ async_syndrome_val(struct page **blocks, unsigned int offset, int disks,
 		if (submit->flags & ASYNC_TX_FENCE)
 			dma_flags |= DMA_PREP_FENCE;
 		for (i = 0; i < disks; i++)
-			if (likely(blocks[i])) {
-				BUG_ON(is_raid6_zero_block(blocks[i]));
+			if (likely(blocks[i]))
 				dma_src[i] = dma_map_page(dev, blocks[i],
 							  offset, len,
 							  DMA_TO_DEVICE);
-			}
 
 		for (;;) {
 			tx = device->device_prep_dma_pq_val(chan, pq, dma_src,
diff --git a/crypto/async_tx/async_raid6_recov.c b/crypto/async_tx/async_raid6_recov.c
index 6d73dde..8e30b6e 100644
--- a/crypto/async_tx/async_raid6_recov.c
+++ b/crypto/async_tx/async_raid6_recov.c
@@ -263,10 +263,10 @@ __2data_recov_n(int disks, size_t bytes, int faila, int failb,
 	 * delta p and delta q
 	 */
 	dp = blocks[faila];
-	blocks[faila] = (void *)raid6_empty_zero_page;
+	blocks[faila] = NULL;
 	blocks[disks-2] = dp;
 	dq = blocks[failb];
-	blocks[failb] = (void *)raid6_empty_zero_page;
+	blocks[failb] = NULL;
 	blocks[disks-1] = dq;
 
 	init_async_submit(submit, ASYNC_TX_FENCE, tx, NULL, NULL, scribble);
@@ -338,7 +338,10 @@ async_raid6_2data_recov(int disks, size_t bytes, int faila, int failb,
 
 		async_tx_quiesce(&submit->depend_tx);
 		for (i = 0; i < disks; i++)
-			ptrs[i] = page_address(blocks[i]);
+			if (blocks[i] == NULL)
+				ptrs[i] = (void*)raid6_empty_zero_page;
+			else
+				ptrs[i] = page_address(blocks[i]);
 
 		raid6_2data_recov(disks, bytes, faila, failb, ptrs);
 
@@ -398,7 +401,10 @@ async_raid6_datap_recov(int disks, size_t bytes, int faila,
 
 		async_tx_quiesce(&submit->depend_tx);
 		for (i = 0; i < disks; i++)
-			ptrs[i] = page_address(blocks[i]);
+			if (blocks[i] == NULL)
+				ptrs[i] = (void*)raid6_empty_zero_page;
+			else
+				ptrs[i] = page_address(blocks[i]);
 
 		raid6_datap_recov(disks, bytes, faila, ptrs);
 
@@ -414,7 +420,7 @@ async_raid6_datap_recov(int disks, size_t bytes, int faila,
 	 * Use the dead data page as temporary storage for delta q
 	 */
 	dq = blocks[faila];
-	blocks[faila] = (void *)raid6_empty_zero_page;
+	blocks[faila] = NULL;
 	blocks[disks-1] = dq;
 
 	/* in the 4 disk case we only need to perform a single source
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index c4366c9..dcd9e65 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -720,7 +720,7 @@ static int set_syndrome_sources(struct page **srcs, struct stripe_head *sh)
 	int i;
 
 	for (i = 0; i < disks; i++)
-		srcs[i] = (void *)raid6_empty_zero_page;
+		srcs[i] = NULL;
 
 	count = 0;
 	i = d0_idx;
@@ -816,7 +816,7 @@ ops_run_compute6_2(struct stripe_head *sh, struct raid5_percpu *percpu)
 	 * slot number conversion for 'faila' and 'failb'
 	 */
 	for (i = 0; i < disks ; i++)
-		blocks[i] = (void *)raid6_empty_zero_page;
+		blocks[i] = NULL;
 	count = 0;
 	i = d0_idx;
 	do {



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [md PATCH 09/11] md: Fix handling of raid5 array which is being reshaped to fewer devices.
  2009-10-16  5:49 [md PATCH 00/11] Series short description NeilBrown
                   ` (5 preceding siblings ...)
  2009-10-16  5:49 ` [md PATCH 06/11] md: drivers/md/unroll.pl replaced with awk analog NeilBrown
@ 2009-10-16  5:49 ` NeilBrown
  2009-10-16  5:49 ` [md PATCH 08/11] md: fix problems with RAID6 calculations for DDF NeilBrown
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: NeilBrown @ 2009-10-16  5:49 UTC (permalink / raw)
  To: linux-raid; +Cc: NeilBrown

When a raid5 (or raid6) array is being reshaped to have fewer devices,
conf->raid_disks is the latter and hence smaller number of devices.
However sometimes we want to use a number which is the total number of
currently required devices - the larger of the 'old' and 'new' sizes.
Before we implemented reducing the number of devices, this was always
'new' i.e. ->raid_disks.
Now we need max(raid_disks, previous_raid_disks) in those places.

This particularly affects assembling an array that was shutdown while
in the middle of a reshape to fewer devices.

md.c needs a similar fix when interpreting the md metadata.

Signed-off-by: NeilBrown <neilb@suse.de>
---

 drivers/md/md.c    |    2 +-
 drivers/md/raid5.c |   37 ++++++++++++++++++-------------------
 2 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 26ba42a..10eb1fc 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2631,7 +2631,7 @@ static void analyze_sbs(mddev_t * mddev)
 			rdev->desc_nr = i++;
 			rdev->raid_disk = rdev->desc_nr;
 			set_bit(In_sync, &rdev->flags);
-		} else if (rdev->raid_disk >= mddev->raid_disks) {
+		} else if (rdev->raid_disk >= (mddev->raid_disks - min(0, mddev->delta_disks))) {
 			rdev->raid_disk = -1;
 			clear_bit(In_sync, &rdev->flags);
 		}
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index d4ce51b..c4366c9 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -1238,22 +1238,22 @@ static void raid_run_ops(struct stripe_head *sh, unsigned long ops_request)
 static int grow_one_stripe(raid5_conf_t *conf)
 {
 	struct stripe_head *sh;
+	int disks = max(conf->raid_disks, conf->previous_raid_disks);
 	sh = kmem_cache_alloc(conf->slab_cache, GFP_KERNEL);
 	if (!sh)
 		return 0;
-	memset(sh, 0, sizeof(*sh) + (conf->raid_disks-1)*sizeof(struct r5dev));
+	memset(sh, 0, sizeof(*sh) + (disks-1)*sizeof(struct r5dev));
 	sh->raid_conf = conf;
 	spin_lock_init(&sh->lock);
 	#ifdef CONFIG_MULTICORE_RAID456
 	init_waitqueue_head(&sh->ops.wait_for_ops);
 	#endif
 
-	if (grow_buffers(sh, conf->raid_disks)) {
-		shrink_buffers(sh, conf->raid_disks);
+	if (grow_buffers(sh, disks)) {
+		shrink_buffers(sh, disks);
 		kmem_cache_free(conf->slab_cache, sh);
 		return 0;
 	}
-	sh->disks = conf->raid_disks;
 	/* we just created an active stripe so... */
 	atomic_set(&sh->count, 1);
 	atomic_inc(&conf->active_stripes);
@@ -1265,7 +1265,7 @@ static int grow_one_stripe(raid5_conf_t *conf)
 static int grow_stripes(raid5_conf_t *conf, int num)
 {
 	struct kmem_cache *sc;
-	int devs = conf->raid_disks;
+	int devs = max(conf->raid_disks, conf->previous_raid_disks);
 
 	sprintf(conf->cache_name[0],
 		"raid%d-%s", conf->level, mdname(conf->mddev));
@@ -3540,9 +3540,10 @@ static void unplug_slaves(mddev_t *mddev)
 {
 	raid5_conf_t *conf = mddev->private;
 	int i;
+	int devs = max(conf->raid_disks, conf->previous_raid_disks);
 
 	rcu_read_lock();
-	for (i = 0; i < conf->raid_disks; i++) {
+	for (i = 0; i < devs; i++) {
 		mdk_rdev_t *rdev = rcu_dereference(conf->disks[i].rdev);
 		if (rdev && !test_bit(Faulty, &rdev->flags) && atomic_read(&rdev->nr_pending)) {
 			struct request_queue *r_queue = bdev_get_queue(rdev->bdev);
@@ -4562,13 +4563,9 @@ raid5_size(mddev_t *mddev, sector_t sectors, int raid_disks)
 
 	if (!sectors)
 		sectors = mddev->dev_sectors;
-	if (!raid_disks) {
+	if (!raid_disks)
 		/* size is defined by the smallest of previous and new size */
-		if (conf->raid_disks < conf->previous_raid_disks)
-			raid_disks = conf->raid_disks;
-		else
-			raid_disks = conf->previous_raid_disks;
-	}
+		raid_disks = min(conf->raid_disks, conf->previous_raid_disks);
 
 	sectors &= ~((sector_t)mddev->chunk_sectors - 1);
 	sectors &= ~((sector_t)mddev->new_chunk_sectors - 1);
@@ -4669,7 +4666,7 @@ static int raid5_alloc_percpu(raid5_conf_t *conf)
 			}
 			per_cpu_ptr(conf->percpu, cpu)->spare_page = spare_page;
 		}
-		scribble = kmalloc(scribble_len(conf->raid_disks), GFP_KERNEL);
+		scribble = kmalloc(conf->scribble_len, GFP_KERNEL);
 		if (!scribble) {
 			err = -ENOMEM;
 			break;
@@ -4690,7 +4687,7 @@ static int raid5_alloc_percpu(raid5_conf_t *conf)
 static raid5_conf_t *setup_conf(mddev_t *mddev)
 {
 	raid5_conf_t *conf;
-	int raid_disk, memory;
+	int raid_disk, memory, max_disks;
 	mdk_rdev_t *rdev;
 	struct disk_info *disk;
 
@@ -4740,13 +4737,14 @@ static raid5_conf_t *setup_conf(mddev_t *mddev)
 	conf->bypass_threshold = BYPASS_THRESHOLD;
 
 	conf->raid_disks = mddev->raid_disks;
-	conf->scribble_len = scribble_len(conf->raid_disks);
 	if (mddev->reshape_position == MaxSector)
 		conf->previous_raid_disks = mddev->raid_disks;
 	else
 		conf->previous_raid_disks = mddev->raid_disks - mddev->delta_disks;
+	max_disks = max(conf->raid_disks, conf->previous_raid_disks);
+	conf->scribble_len = scribble_len(max_disks);
 
-	conf->disks = kzalloc(conf->raid_disks * sizeof(struct disk_info),
+	conf->disks = kzalloc(max_disks * sizeof(struct disk_info),
 			      GFP_KERNEL);
 	if (!conf->disks)
 		goto abort;
@@ -4764,7 +4762,7 @@ static raid5_conf_t *setup_conf(mddev_t *mddev)
 
 	list_for_each_entry(rdev, &mddev->disks, same_set) {
 		raid_disk = rdev->raid_disk;
-		if (raid_disk >= conf->raid_disks
+		if (raid_disk >= max_disks
 		    || raid_disk < 0)
 			continue;
 		disk = conf->disks + raid_disk;
@@ -4796,7 +4794,7 @@ static raid5_conf_t *setup_conf(mddev_t *mddev)
 	}
 
 	memory = conf->max_nr_stripes * (sizeof(struct stripe_head) +
-		 conf->raid_disks * ((sizeof(struct bio) + PAGE_SIZE))) / 1024;
+		 max_disks * ((sizeof(struct bio) + PAGE_SIZE))) / 1024;
 	if (grow_stripes(conf, conf->max_nr_stripes)) {
 		printk(KERN_ERR
 			"raid5: couldn't allocate %dkB for buffers\n", memory);
@@ -4921,7 +4919,8 @@ static int run(mddev_t *mddev)
 		    test_bit(In_sync, &rdev->flags))
 			working_disks++;
 
-	mddev->degraded = conf->raid_disks - working_disks;
+	mddev->degraded = (max(conf->raid_disks, conf->previous_raid_disks)
+			   - working_disks);
 
 	if (mddev->degraded > conf->max_degraded) {
 		printk(KERN_ERR "raid5: not enough operational devices for %s"



^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2009-10-16  5:49 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-10-16  5:49 [md PATCH 00/11] Series short description NeilBrown
2009-10-16  5:49 ` [md PATCH 01/11] Revert "md: do not progress the resync process if the stripe was blocked" NeilBrown
2009-10-16  5:49 ` [md PATCH 05/11] md: remove clumsy usage of do_sync_mapping_range from bitmap code NeilBrown
2009-10-16  5:49 ` [md PATCH 02/11] md/raid1/raid10: add a cond_resched NeilBrown
2009-10-16  5:49 ` [md PATCH 03/11] md/raid5: initialize conf->device_lock earlier NeilBrown
2009-10-16  5:49 ` [md PATCH 04/11] md: raid1/raid10: handle allocation errors during array setup NeilBrown
2009-10-16  5:49 ` [md PATCH 06/11] md: drivers/md/unroll.pl replaced with awk analog NeilBrown
2009-10-16  5:49 ` [md PATCH 09/11] md: Fix handling of raid5 array which is being reshaped to fewer devices NeilBrown
2009-10-16  5:49 ` [md PATCH 08/11] md: fix problems with RAID6 calculations for DDF NeilBrown
2009-10-16  5:49 ` [md PATCH 10/11] md/async: don't pass a memory pointer as a page pointer NeilBrown
2009-10-16  5:49 ` [md PATCH 11/11] raid6/async_tx: handle holes in block list in async_syndrome_val NeilBrown
2009-10-16  5:49 ` [md PATCH 07/11] md/raid456: downlevel multicore operations to raid_run_ops NeilBrown

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.