* [PATCH/Resend] md: Push down data integrity code to personalities.
@ 2009-07-01  8:38 Andre Noll
  2009-07-07  3:42 ` Neil Brown
  0 siblings, 1 reply; 12+ messages in thread
From: Andre Noll @ 2009-07-01  8:38 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid, Martin K. Petersen


Hi Neil,

here is the patch again; it removes knowledge of specific raid levels
from md.c by moving the data integrity code into the personalities.
The patch has been tested and acked by Martin.

Please review.

Thanks
Andre

commit 51295532895ffe532a5d8401fc32073100268b29
Author: Andre Noll <maan@systemlinux.org>
Date:   Fri Jun 19 14:40:46 2009 +0200

    [PATCH/RFC] md: Push down data integrity code to personalities.
    
    This patch replaces md_integrity_check() by two new functions:
    md_integrity_register() and md_integrity_add_rdev() which are both
    personality-independent.
    
    md_integrity_register() is a public function which is called from
    the ->run method of all personalities that support data integrity.
    The function iterates over the component devices of the array and
    determines if all active devices are integrity capable and if their
    profiles match. If this is the case, the common profile is registered
    for the mddev via blk_integrity_register().
    
    The second new function, md_integrity_add_rdev(), is internal to
    md.c and is called by bind_rdev_to_array(), i.e. whenever a new
    device is about to be added to a raid array. If the new device does
    not support data integrity or has a profile different from the one
    already registered, data integrity for the mddev is disabled.
    
    Conversely, removing a device from a (raid1) array might make the mddev
    integrity-capable. The patch adds a call to md_integrity_register()
    to the error path of raid1.c in order to activate data integrity in
    this case.

    Signed-off-by: Andre Noll <maan@systemlinux.org>
    Acked-by: Martin K. Petersen <martin.petersen@oracle.com>

diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index dda2f1b..15aa325 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -201,6 +201,7 @@ static int linear_run (mddev_t *mddev)
 	mddev->queue->unplug_fn = linear_unplug;
 	mddev->queue->backing_dev_info.congested_fn = linear_congested;
 	mddev->queue->backing_dev_info.congested_data = mddev;
+	md_integrity_register(mddev);
 	return 0;
 }
 
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 0f11fd1..54436cb 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -1491,36 +1491,71 @@ static int match_mddev_units(mddev_t *mddev1, mddev_t *mddev2)
 
 static LIST_HEAD(pending_raid_disks);
 
-static void md_integrity_check(mdk_rdev_t *rdev, mddev_t *mddev)
+/*
+ * Try to register data integrity profile for an mddev
+ *
+ * This only succeeds if all working and active component devices are integrity
+ * capable with matching profiles.
+ */
+int md_integrity_register(mddev_t *mddev)
 {
-	struct mdk_personality *pers = mddev->pers;
-	struct gendisk *disk = mddev->gendisk;
+	mdk_rdev_t *rdev, *reference = NULL;
+
+	if (list_empty(&mddev->disks))
+		return 0; /* nothing to do */
+	if (blk_get_integrity(mddev->gendisk))
+		return 0; /* already registered */
+	list_for_each_entry(rdev, &mddev->disks, same_set) {
+		/* skip spares and non-functional disks */
+		if (test_bit(Faulty, &rdev->flags))
+			continue;
+		if (rdev->raid_disk < 0)
+			continue;
+		/*
+		 * If at least one rdev is not integrity capable, we can not
+		 * enable data integrity for the md device.
+		 */
+		if (!bdev_get_integrity(rdev->bdev))
+			return -EINVAL;
+		if (!reference) {
+			/* Use the first rdev as the reference */
+			reference = rdev;
+			continue;
+		}
+		/* does this rdev's profile match the reference profile? */
+		if (blk_integrity_compare(reference->bdev->bd_disk,
+				rdev->bdev->bd_disk) < 0)
+			return -EINVAL;
+	}
+	/*
+	 * All component devices are integrity capable and have matching
+	 * profiles, register the common profile for the md device.
+	 */
+	if (blk_integrity_register(mddev->gendisk,
+			bdev_get_integrity(reference->bdev)) != 0) {
+		printk(KERN_ERR "md: failed to register integrity for %s\n",
+			mdname(mddev));
+		return -EINVAL;
+	}
+	printk(KERN_NOTICE "md: data integrity on %s enabled\n",
+		mdname(mddev));
+	return 0;
+}
+EXPORT_SYMBOL(md_integrity_register);
+
+/* Disable data integrity if non-capable/non-matching disk is being added */
+static void md_integrity_add_rdev(mdk_rdev_t *rdev, mddev_t *mddev)
+{
+	struct gendisk *gd = mddev->gendisk;
 	struct blk_integrity *bi_rdev = bdev_get_integrity(rdev->bdev);
-	struct blk_integrity *bi_mddev = blk_get_integrity(disk);
+	struct blk_integrity *bi_mddev = blk_get_integrity(gd);
 
-	/* Data integrity passthrough not supported on RAID 4, 5 and 6 */
-	if (pers && pers->level >= 4 && pers->level <= 6)
+	if (!bi_mddev) /* nothing to do */
 		return;
-
-	/* If rdev is integrity capable, register profile for mddev */
-	if (!bi_mddev && bi_rdev) {
-		if (blk_integrity_register(disk, bi_rdev))
-			printk(KERN_ERR "%s: %s Could not register integrity!\n",
-			       __func__, disk->disk_name);
-		else
-			printk(KERN_NOTICE "Enabling data integrity on %s\n",
-			       disk->disk_name);
+	if (bi_rdev && blk_integrity_compare(gd, rdev->bdev->bd_disk) >= 0)
 		return;
-	}
-
-	/* Check that mddev and rdev have matching profiles */
-	if (blk_integrity_compare(disk, rdev->bdev->bd_disk) < 0) {
-		printk(KERN_ERR "%s: %s/%s integrity mismatch!\n", __func__,
-		       disk->disk_name, rdev->bdev->bd_disk->disk_name);
-		printk(KERN_NOTICE "Disabling data integrity on %s\n",
-		       disk->disk_name);
-		blk_integrity_unregister(disk);
-	}
+	printk(KERN_NOTICE "disabling data integrity on %s\n", mdname(mddev));
+	blk_integrity_unregister(gd);
 }
 
 static int bind_rdev_to_array(mdk_rdev_t * rdev, mddev_t * mddev)
@@ -1595,7 +1630,7 @@ static int bind_rdev_to_array(mdk_rdev_t * rdev, mddev_t * mddev)
 	/* May as well allow recovery to be retried once */
 	mddev->recovery_disabled = 0;
 
-	md_integrity_check(rdev, mddev);
+	md_integrity_add_rdev(rdev, mddev);
 	return 0;
 
  fail:
@@ -4048,10 +4083,6 @@ static int do_md_run(mddev_t * mddev)
 	}
 	strlcpy(mddev->clevel, pers->name, sizeof(mddev->clevel));
 
-	if (pers->level >= 4 && pers->level <= 6)
-		/* Cannot support integrity (yet) */
-		blk_integrity_unregister(mddev->gendisk);
-
 	if (mddev->reshape_position != MaxSector &&
 	    pers->start_reshape == NULL) {
 		/* This personality cannot handle reshaping... */
diff --git a/drivers/md/md.h b/drivers/md/md.h
index ea2c441..9433a5d 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -430,5 +430,6 @@ extern void md_new_event(mddev_t *mddev);
 extern int md_allow_write(mddev_t *mddev);
 extern void md_wait_for_blocked_rdev(mdk_rdev_t *rdev, mddev_t *mddev);
 extern void md_set_array_sectors(mddev_t *mddev, sector_t array_sectors);
+extern int md_integrity_register(mddev_t *mddev);
 
 #endif /* _MD_MD_H */
diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index c1ca63f..3d3a308 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -515,7 +515,7 @@ static int multipath_run (mddev_t *mddev)
 	mddev->queue->unplug_fn = multipath_unplug;
 	mddev->queue->backing_dev_info.congested_fn = multipath_congested;
 	mddev->queue->backing_dev_info.congested_data = mddev;
-
+	md_integrity_register(mddev);
 	return 0;
 
 out_free_conf:
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 851e631..902de77 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -346,6 +346,7 @@ static int raid0_run(mddev_t *mddev)
 
 	blk_queue_merge_bvec(mddev->queue, raid0_mergeable_bvec);
 	dump_zones(mddev);
+	md_integrity_register(mddev);
 	return 0;
 }
 
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 89939a7..44fbeda 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1045,6 +1045,11 @@ static void error(mddev_t *mddev, mdk_rdev_t *rdev)
 	printk(KERN_ALERT "raid1: Disk failure on %s, disabling device.\n"
 		"raid1: Operation continuing on %d devices.\n",
 		bdevname(rdev->bdev,b), conf->raid_disks - mddev->degraded);
+	/*
+	 * The good news is that kicking a disk might make it possible to
+	 * enable data integrity on the mddev.
+	 */
+	md_integrity_register(mddev);
 }
 
 static void print_conf(conf_t *conf)
@@ -1178,7 +1183,9 @@ static int raid1_remove_disk(mddev_t *mddev, int number)
 			/* lost the race, try later */
 			err = -EBUSY;
 			p->rdev = rdev;
+			goto abort;
 		}
+		md_integrity_register(mddev);
 	}
 abort:
 
@@ -2068,7 +2075,7 @@ static int run(mddev_t *mddev)
 	mddev->queue->unplug_fn = raid1_unplug;
 	mddev->queue->backing_dev_info.congested_fn = raid1_congested;
 	mddev->queue->backing_dev_info.congested_data = mddev;
-
+	md_integrity_register(mddev);
 	return 0;
 
 out_no_mem:
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index ae12cea..3e553e3 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1203,7 +1203,9 @@ static int raid10_remove_disk(mddev_t *mddev, int number)
 			/* lost the race, try later */
 			err = -EBUSY;
 			p->rdev = rdev;
+			goto abort;
 		}
+		md_integrity_register(mddev);
 	}
 abort:
 
@@ -2218,6 +2220,7 @@ static int run(mddev_t *mddev)
 
 	if (conf->near_copies < mddev->raid_disks)
 		blk_queue_merge_bvec(mddev->queue, raid10_mergeable_bvec);
+	md_integrity_register(mddev);
 	return 0;
 
 out_free_conf:
-- 
The only person who always got his work done by Friday was Robinson Crusoe



* Re: [PATCH/Resend] md: Push down data integrity code to personalities.
  2009-07-01  8:38 [PATCH/Resend] md: Push down data integrity code to personalities Andre Noll
@ 2009-07-07  3:42 ` Neil Brown
  2009-07-07 13:44   ` Andre Noll
                     ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Neil Brown @ 2009-07-07  3:42 UTC (permalink / raw)
  To: Andre Noll; +Cc: linux-raid, Martin K. Petersen

On Wednesday July 1, maan@systemlinux.org wrote:
> Hi Neil,
> 
> here is the patch again; it removes knowledge of specific raid levels
> from md.c by moving the data integrity code into the personalities.
> The patch has been tested and acked by Martin.
> 
> Please review.

Apologies for the delay.  I've been fighting a flu :-(

This patch seems to treat spares inconsistently.

md_integrity_register ignores spares.
However bind_rdev_to_array - which is used for adding a spare - calls
md_integrity_add_rdev to check that the integrity profile of the new
device matches.

We need to be consistent.
Either all devices that are bound to the array - whether active,
spare, or failed - are considered, or only the active devices are
considered.

In the former case we want to take action in bind_rdev_to_array
and possibly in unbind_rdev_from_array.
In the latter we need to take action either in remove_and_add_spares,
or in the per-personality ->hot_add_disk and ->hot_remove_disk
methods.

I think I lean towards the latter, and would put the code in the
->hot_*_disk methods, but it isn't a strong leaning.

Thanks,

NeilBrown




* Re: [PATCH/Resend] md: Push down data integrity code to personalities.
  2009-07-07  3:42 ` Neil Brown
@ 2009-07-07 13:44   ` Andre Noll
  2009-07-07 22:10   ` Bill Davidsen
  2009-07-13  8:54   ` Andre Noll
  2 siblings, 0 replies; 12+ messages in thread
From: Andre Noll @ 2009-07-07 13:44 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid, Martin K. Petersen


On 13:42, Neil Brown wrote:

> We need to be consistent.
> Either all devices that are bound to the array - whether active,
> spare, or failed - are considered, or only the active devices are
> considered.

OK, I'll look at this and send a revised version of the patch.

Get well soon.
Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe



* Re: [PATCH/Resend] md: Push down data integrity code to personalities.
  2009-07-07  3:42 ` Neil Brown
  2009-07-07 13:44   ` Andre Noll
@ 2009-07-07 22:10   ` Bill Davidsen
  2009-07-13  8:54   ` Andre Noll
  2 siblings, 0 replies; 12+ messages in thread
From: Bill Davidsen @ 2009-07-07 22:10 UTC (permalink / raw)
  To: Neil Brown; +Cc: Andre Noll, linux-raid, Martin K. Petersen

Neil Brown wrote:
> On Wednesday July 1, maan@systemlinux.org wrote:
>   
>> Hi Neil,
>>
>> here is the patch again; it removes knowledge of specific raid levels
>> from md.c by moving the data integrity code into the personalities.
>> The patch has been tested and acked by Martin.
>>
>> Please review.
>>     
>
> Apologies for the delay.  I've been fighting a flu :-(
>
> This patch seems to treat spares inconsistently.
>
> md_integrity_register ignores spares.
> However bind_rdev_to_array - which is used for adding a spare - calls
> md_integrity_add_rdev to check that the integrity profile of the new
> device matches.
>
> We need to be consistent.
> Either all devices that are bound to the array - whether active,
> spare, or failed - are considered, or only the active devices are
> considered.
>
> In the former case we want to take action in bind_rdev_to_array
> and possibly in unbind_rdev_from_array.
> In the latter we need to take action either in remove_and_add_spares,
> or in the per-personality ->hot_add_disk and ->hot_remove_disk
> methods.
>
> I think I lean towards the latter, and put code in ->hot_*_disk, but
> it isn't a strong leaning.
>   

Does this create problems with spares shared between arrays of
different types?

-- 
Bill Davidsen <davidsen@tmr.com>
  Obscure bug of 2004: BASH BUFFER OVERFLOW - if bash is being run by a
normal user and is setuid root, with the "vi" line edit mode selected,
and the character set is "big5," an off-by-one error occurs during
wildcard (glob) expansion.



* Re: [PATCH/Resend] md: Push down data integrity code to personalities.
  2009-07-07  3:42 ` Neil Brown
  2009-07-07 13:44   ` Andre Noll
  2009-07-07 22:10   ` Bill Davidsen
@ 2009-07-13  8:54   ` Andre Noll
  2009-07-31  5:06     ` Neil Brown
  2 siblings, 1 reply; 12+ messages in thread
From: Andre Noll @ 2009-07-13  8:54 UTC (permalink / raw)
  To: Neil Brown; +Cc: Martin K. Petersen, linux-raid


On Tue, Jul 07, 2009 at 01:42:56PM +1000, Neil Brown wrote:

> On Wednesday July 1, maan@systemlinux.org wrote:

> This patch seems to treat spares inconsistently.
> 
> md_integrity_register ignores spares.
> However bind_rdev_to_array - which is used for adding a spare - calls
> md_integrity_add_rdev to check that the integrity profile of the new
> device matches.
> 
> We need to be consistent.
> Either all devices that are bound to the array - whether active,
> spare, or failed - are considered, or only the active devices are
> considered.
> 
> In the former case we want to take action in bind_rdev_to_array
> and possibly in unbind_rdev_from_array.
> In the latter we need to take action either in remove_and_add_spares,
> or in the per-personality ->hot_add_disk and ->hot_remove_disk
> methods.
> 
> I think I lean towards the latter, and put code in ->hot_*_disk, but
> it isn't a strong leaning.

Here is a revised patch that puts calls to the new functions into
the ->hot_*_disk methods as you propose.

Side note: The patch causes the new gcc warning

	drivers/md/md.c: In function 'md_integrity_add_rdev':
	drivers/md/md.c:1546: warning: unused variable 'gd'

if data integrity is not compiled in. Any ideas on how to avoid it
without introducing an #ifdef-mess are welcome.

Please review.

Thanks
Andre

commit fff7635e729d3b767855b4435d7c9cc36ed62568
Author: Andre Noll <maan@systemlinux.org>
Date:   Sun Jul 12 22:17:28 2009 +0200

    md: Push down data integrity code to personalities.
    
    This patch replaces md_integrity_check() by two new public functions:
    md_integrity_register() and md_integrity_add_rdev() which are both
    personality-independent.
    
    md_integrity_register() is called from the ->run, ->hot_remove_disk
    and ->error methods of all personalities that support data integrity.
    The function iterates over the component devices of the array and
    determines if all active devices are integrity capable and if their
    profiles match. If this is the case, the common profile is registered
    for the mddev via blk_integrity_register().
    
    The second new function, md_integrity_add_rdev() is called from the
    ->hot_add_disk methods, i.e. whenever a new device is being added
    to a raid array. If the new device does not support data integrity,
    or has a profile different from the one already registered, data
    integrity for the mddev is disabled.
    
    For raid0 and linear, only the call to md_integrity_register() from
    the ->run method is necessary.
    
    Signed-off-by: Andre Noll <maan@systemlinux.org>

diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index 5810fa9..54c8677 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -220,6 +220,7 @@ static int linear_run (mddev_t *mddev)
 	mddev->queue->unplug_fn = linear_unplug;
 	mddev->queue->backing_dev_info.congested_fn = linear_congested;
 	mddev->queue->backing_dev_info.congested_data = mddev;
+	md_integrity_register(mddev);
 	return 0;
 }
 
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 0f4a70c..1cbba68 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -1487,36 +1487,74 @@ static int match_mddev_units(mddev_t *mddev1, mddev_t *mddev2)
 
 static LIST_HEAD(pending_raid_disks);
 
-static void md_integrity_check(mdk_rdev_t *rdev, mddev_t *mddev)
+/*
+ * Try to register data integrity profile for an mddev
+ *
+ * This is called when an array is started and after a disk has been kicked
+ * from the array. It only succeeds if all working and active component devices
+ * are integrity capable with matching profiles.
+ */
+int md_integrity_register(mddev_t *mddev)
 {
-	struct mdk_personality *pers = mddev->pers;
-	struct gendisk *disk = mddev->gendisk;
+	mdk_rdev_t *rdev, *reference = NULL;
+
+	if (list_empty(&mddev->disks))
+		return 0; /* nothing to do */
+	if (blk_get_integrity(mddev->gendisk))
+		return 0; /* already registered */
+	list_for_each_entry(rdev, &mddev->disks, same_set) {
+		/* skip spares and non-functional disks */
+		if (test_bit(Faulty, &rdev->flags))
+			continue;
+		if (rdev->raid_disk < 0)
+			continue;
+		/*
+		 * If at least one rdev is not integrity capable, we can not
+		 * enable data integrity for the md device.
+		 */
+		if (!bdev_get_integrity(rdev->bdev))
+			return -EINVAL;
+		if (!reference) {
+			/* Use the first rdev as the reference */
+			reference = rdev;
+			continue;
+		}
+		/* does this rdev's profile match the reference profile? */
+		if (blk_integrity_compare(reference->bdev->bd_disk,
+				rdev->bdev->bd_disk) < 0)
+			return -EINVAL;
+	}
+	/*
+	 * All component devices are integrity capable and have matching
+	 * profiles, register the common profile for the md device.
+	 */
+	if (blk_integrity_register(mddev->gendisk,
+			bdev_get_integrity(reference->bdev)) != 0) {
+		printk(KERN_ERR "md: failed to register integrity for %s\n",
+			mdname(mddev));
+		return -EINVAL;
+	}
+	printk(KERN_NOTICE "md: data integrity on %s enabled\n",
+		mdname(mddev));
+	return 0;
+}
+EXPORT_SYMBOL(md_integrity_register);
+
+/* Disable data integrity if non-capable/non-matching disk is being added */
+void md_integrity_add_rdev(mdk_rdev_t *rdev, mddev_t *mddev)
+{
+	struct gendisk *gd = mddev->gendisk;
 	struct blk_integrity *bi_rdev = bdev_get_integrity(rdev->bdev);
-	struct blk_integrity *bi_mddev = blk_get_integrity(disk);
+	struct blk_integrity *bi_mddev = blk_get_integrity(gd);
 
-	/* Data integrity passthrough not supported on RAID 4, 5 and 6 */
-	if (pers && pers->level >= 4 && pers->level <= 6)
+	if (!bi_mddev) /* nothing to do */
 		return;
-
-	/* If rdev is integrity capable, register profile for mddev */
-	if (!bi_mddev && bi_rdev) {
-		if (blk_integrity_register(disk, bi_rdev))
-			printk(KERN_ERR "%s: %s Could not register integrity!\n",
-			       __func__, disk->disk_name);
-		else
-			printk(KERN_NOTICE "Enabling data integrity on %s\n",
-			       disk->disk_name);
+	if (rdev->raid_disk < 0) /* skip spares */
 		return;
-	}
-
-	/* Check that mddev and rdev have matching profiles */
-	if (blk_integrity_compare(disk, rdev->bdev->bd_disk) < 0) {
-		printk(KERN_ERR "%s: %s/%s integrity mismatch!\n", __func__,
-		       disk->disk_name, rdev->bdev->bd_disk->disk_name);
-		printk(KERN_NOTICE "Disabling data integrity on %s\n",
-		       disk->disk_name);
-		blk_integrity_unregister(disk);
-	}
+	if (bi_rdev && blk_integrity_compare(gd, rdev->bdev->bd_disk) >= 0)
+		return;
+	printk(KERN_NOTICE "disabling data integrity on %s\n", mdname(mddev));
+	blk_integrity_unregister(gd);
 }
 
 static int bind_rdev_to_array(mdk_rdev_t * rdev, mddev_t * mddev)
@@ -1591,7 +1629,6 @@ static int bind_rdev_to_array(mdk_rdev_t * rdev, mddev_t * mddev)
 	/* May as well allow recovery to be retried once */
 	mddev->recovery_disabled = 0;
 
-	md_integrity_check(rdev, mddev);
 	return 0;
 
  fail:
@@ -4046,10 +4083,6 @@ static int do_md_run(mddev_t * mddev)
 	}
 	strlcpy(mddev->clevel, pers->name, sizeof(mddev->clevel));
 
-	if (pers->level >= 4 && pers->level <= 6)
-		/* Cannot support integrity (yet) */
-		blk_integrity_unregister(mddev->gendisk);
-
 	if (mddev->reshape_position != MaxSector &&
 	    pers->start_reshape == NULL) {
 		/* This personality cannot handle reshaping... */
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 9430a11..78f0316 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -431,5 +431,7 @@ extern int md_allow_write(mddev_t *mddev);
 extern void md_wait_for_blocked_rdev(mdk_rdev_t *rdev, mddev_t *mddev);
 extern void md_set_array_sectors(mddev_t *mddev, sector_t array_sectors);
 extern int md_check_no_bitmap(mddev_t *mddev);
+extern int md_integrity_register(mddev_t *mddev);
+void md_integrity_add_rdev(mdk_rdev_t *rdev, mddev_t *mddev);
 
 #endif /* _MD_MD_H */
diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index 237fe3f..7289c30 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -248,6 +248,7 @@ static void multipath_error (mddev_t *mddev, mdk_rdev_t *rdev)
 				" on %d IO paths.\n",
 				bdevname (rdev->bdev,b),
 				conf->working_disks);
+			md_integrity_register(mddev);
 		}
 	}
 }
@@ -313,6 +314,7 @@ static int multipath_add_disk(mddev_t *mddev, mdk_rdev_t *rdev)
 			set_bit(In_sync, &rdev->flags);
 			rcu_assign_pointer(p->rdev, rdev);
 			err = 0;
+			md_integrity_add_rdev(rdev, mddev);
 			break;
 		}
 
@@ -345,7 +347,9 @@ static int multipath_remove_disk(mddev_t *mddev, int number)
 			/* lost the race, try later */
 			err = -EBUSY;
 			p->rdev = rdev;
+			goto abort;
 		}
+		md_integrity_register(mddev);
 	}
 abort:
 
@@ -519,7 +523,7 @@ static int multipath_run (mddev_t *mddev)
 	mddev->queue->unplug_fn = multipath_unplug;
 	mddev->queue->backing_dev_info.congested_fn = multipath_congested;
 	mddev->queue->backing_dev_info.congested_data = mddev;
-
+	md_integrity_register(mddev);
 	return 0;
 
 out_free_conf:
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 335f490..898e2bd 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -351,6 +351,7 @@ static int raid0_run(mddev_t *mddev)
 
 	blk_queue_merge_bvec(mddev->queue, raid0_mergeable_bvec);
 	dump_zones(mddev);
+	md_integrity_register(mddev);
 	return 0;
 }
 
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 0569efb..f316bfc 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1045,6 +1045,7 @@ static void error(mddev_t *mddev, mdk_rdev_t *rdev)
 	printk(KERN_ALERT "raid1: Disk failure on %s, disabling device.\n"
 		"raid1: Operation continuing on %d devices.\n",
 		bdevname(rdev->bdev,b), conf->raid_disks - mddev->degraded);
+	md_integrity_register(mddev);
 }
 
 static void print_conf(conf_t *conf)
@@ -1144,7 +1145,7 @@ static int raid1_add_disk(mddev_t *mddev, mdk_rdev_t *rdev)
 			rcu_assign_pointer(p->rdev, rdev);
 			break;
 		}
-
+	md_integrity_add_rdev(rdev, mddev);
 	print_conf(conf);
 	return err;
 }
@@ -1178,7 +1179,9 @@ static int raid1_remove_disk(mddev_t *mddev, int number)
 			/* lost the race, try later */
 			err = -EBUSY;
 			p->rdev = rdev;
+			goto abort;
 		}
+		md_integrity_register(mddev);
 	}
 abort:
 
@@ -2067,7 +2070,7 @@ static int run(mddev_t *mddev)
 	mddev->queue->unplug_fn = raid1_unplug;
 	mddev->queue->backing_dev_info.congested_fn = raid1_congested;
 	mddev->queue->backing_dev_info.congested_data = mddev;
-
+	md_integrity_register(mddev);
 	return 0;
 
 out_no_mem:
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 7298a5e..b224b75 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1039,6 +1039,7 @@ static void error(mddev_t *mddev, mdk_rdev_t *rdev)
 	printk(KERN_ALERT "raid10: Disk failure on %s, disabling device.\n"
 		"raid10: Operation continuing on %d devices.\n",
 		bdevname(rdev->bdev,b), conf->raid_disks - mddev->degraded);
+	md_integrity_register(mddev);
 }
 
 static void print_conf(conf_t *conf)
@@ -1170,6 +1171,7 @@ static int raid10_add_disk(mddev_t *mddev, mdk_rdev_t *rdev)
 			break;
 		}
 
+	md_integrity_add_rdev(rdev, mddev);
 	print_conf(conf);
 	return err;
 }
@@ -1203,7 +1205,9 @@ static int raid10_remove_disk(mddev_t *mddev, int number)
 			/* lost the race, try later */
 			err = -EBUSY;
 			p->rdev = rdev;
+			goto abort;
 		}
+		md_integrity_register(mddev);
 	}
 abort:
 
@@ -2225,6 +2229,7 @@ static int run(mddev_t *mddev)
 
 	if (conf->near_copies < mddev->raid_disks)
 		blk_queue_merge_bvec(mddev->queue, raid10_mergeable_bvec);
+	md_integrity_register(mddev);
 	return 0;
 
 out_free_conf:
-- 
The only person who always got his work done by Friday was Robinson Crusoe

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH/Resend] md: Push down data integrity code to personalities.
  2009-07-13  8:54   ` Andre Noll
@ 2009-07-31  5:06     ` Neil Brown
  2009-08-03 16:40       ` Andre Noll
  0 siblings, 1 reply; 12+ messages in thread
From: Neil Brown @ 2009-07-31  5:06 UTC (permalink / raw)
  To: Andre Noll; +Cc: Martin K. Petersen, linux-raid

On Monday July 13, maan@systemlinux.org wrote:

Only 18 days later.....

> 
> Here is a revised patch that puts calls to the new functions into
> the ->hot_*_disk methods as you propose.

Thanks.  Much better.
Calling md_integrity_check from the ->error routines isn't a good idea
though.  They can be called in interrupt context, and
md_integrity_check can call kmalloc(..., GFP_KERNEL) which might try
to sleep.  That would be bad.
We don't need the call in ->error, having it in ->hot_remove_disk is
adequate, as that is called soon after any failure (it doesn't wait
for the sysadmin to run "mdadm --remove ...").

So I have made that change and updated the comment accordingly.

> 
> Side note: The patch causes the new gcc warning
> 
> 	drivers/md/md.c: In function 'md_integrity_add_rdev':
> 	drivers/md/md.c:1546: warning: unused variable 'gd'
> 
> if data integrity is not compiled in. Any ideas on how to avoid it
> without introducing an #ifdef-mess are welcome.

I just replaced every occurrence of "gd" with "mddev->gendisk".  It
isn't too ugly.
Alternately, blkdev.h could have

static inline struct blk_integrity *blk_get_integrity(struct gendisk *disk)
{
	return NULL;
}

in place of

#define blk_get_integrity(a)			(0)

and then there would be no complaints about unused variables, and
slightly improved type checking.

Thanks.

NeilBrown


From 66f4f3f41f5c334f399e920b2aaad9b82514acda Mon Sep 17 00:00:00 2001
From: Andre Noll <maan@systemlinux.org>
Date: Fri, 31 Jul 2009 15:04:50 +1000
Subject: [PATCH] md: Push down data integrity code to personalities.

This patch replaces md_integrity_check() by two new public functions:
md_integrity_register() and md_integrity_add_rdev() which are both
personality-independent.

md_integrity_register() is called from the ->run and ->hot_remove
methods of all personalities that support data integrity.  The
function iterates over the component devices of the array and
determines if all active devices are integrity capable and if their
profiles match. If this is the case, the common profile is registered
for the mddev via blk_integrity_register().

The second new function, md_integrity_add_rdev() is called from the
->hot_add_disk methods, i.e. whenever a new device is being added
to a raid array. If the new device does not support data integrity,
or has a profile different from the one already registered, data
integrity for the mddev is disabled.

For raid0 and linear, only the call to md_integrity_register() from
the ->run method is necessary.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
---
 drivers/md/linear.c    |    1 +
 drivers/md/md.c        |   93 ++++++++++++++++++++++++++++++++---------------
 drivers/md/md.h        |    2 +
 drivers/md/multipath.c |    5 ++-
 drivers/md/raid0.c     |    1 +
 drivers/md/raid1.c     |    6 ++-
 drivers/md/raid10.c    |    4 ++
 7 files changed, 79 insertions(+), 33 deletions(-)

diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index 5810fa9..54c8677 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -220,6 +220,7 @@ static int linear_run (mddev_t *mddev)
 	mddev->queue->unplug_fn = linear_unplug;
 	mddev->queue->backing_dev_info.congested_fn = linear_congested;
 	mddev->queue->backing_dev_info.congested_data = mddev;
+	md_integrity_register(mddev);
 	return 0;
 }
 
diff --git a/drivers/md/md.c b/drivers/md/md.c
index d4351ff..13535f0 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -1487,36 +1487,74 @@ static int match_mddev_units(mddev_t *mddev1, mddev_t *mddev2)
 
 static LIST_HEAD(pending_raid_disks);
 
-static void md_integrity_check(mdk_rdev_t *rdev, mddev_t *mddev)
+/*
+ * Try to register data integrity profile for an mddev
+ *
+ * This is called when an array is started and after a disk has been kicked
+ * from the array. It only succeeds if all working and active component devices
+ * are integrity capable with matching profiles.
+ */
+int md_integrity_register(mddev_t *mddev)
+{
+	mdk_rdev_t *rdev, *reference = NULL;
+
+	if (list_empty(&mddev->disks))
+		return 0; /* nothing to do */
+	if (blk_get_integrity(mddev->gendisk))
+		return 0; /* already registered */
+	list_for_each_entry(rdev, &mddev->disks, same_set) {
+		/* skip spares and non-functional disks */
+		if (test_bit(Faulty, &rdev->flags))
+			continue;
+		if (rdev->raid_disk < 0)
+			continue;
+		/*
+		 * If at least one rdev is not integrity capable, we can not
+		 * enable data integrity for the md device.
+		 */
+		if (!bdev_get_integrity(rdev->bdev))
+			return -EINVAL;
+		if (!reference) {
+			/* Use the first rdev as the reference */
+			reference = rdev;
+			continue;
+		}
+		/* does this rdev's profile match the reference profile? */
+		if (blk_integrity_compare(reference->bdev->bd_disk,
+				rdev->bdev->bd_disk) < 0)
+			return -EINVAL;
+	}
+	/*
+	 * All component devices are integrity capable and have matching
+	 * profiles, register the common profile for the md device.
+	 */
+	if (blk_integrity_register(mddev->gendisk,
+			bdev_get_integrity(reference->bdev)) != 0) {
+		printk(KERN_ERR "md: failed to register integrity for %s\n",
+			mdname(mddev));
+		return -EINVAL;
+	}
+	printk(KERN_NOTICE "md: data integrity on %s enabled\n",
+		mdname(mddev));
+	return 0;
+}
+EXPORT_SYMBOL(md_integrity_register);
+
+/* Disable data integrity if non-capable/non-matching disk is being added */
+void md_integrity_add_rdev(mdk_rdev_t *rdev, mddev_t *mddev)
 {
-	struct mdk_personality *pers = mddev->pers;
-	struct gendisk *disk = mddev->gendisk;
 	struct blk_integrity *bi_rdev = bdev_get_integrity(rdev->bdev);
-	struct blk_integrity *bi_mddev = blk_get_integrity(disk);
+	struct blk_integrity *bi_mddev = blk_get_integrity(mddev->gendisk);
 
-	/* Data integrity passthrough not supported on RAID 4, 5 and 6 */
-	if (pers && pers->level >= 4 && pers->level <= 6)
+	if (!bi_mddev) /* nothing to do */
 		return;
-
-	/* If rdev is integrity capable, register profile for mddev */
-	if (!bi_mddev && bi_rdev) {
-		if (blk_integrity_register(disk, bi_rdev))
-			printk(KERN_ERR "%s: %s Could not register integrity!\n",
-			       __func__, disk->disk_name);
-		else
-			printk(KERN_NOTICE "Enabling data integrity on %s\n",
-			       disk->disk_name);
+	if (rdev->raid_disk < 0) /* skip spares */
 		return;
-	}
-
-	/* Check that mddev and rdev have matching profiles */
-	if (blk_integrity_compare(disk, rdev->bdev->bd_disk) < 0) {
-		printk(KERN_ERR "%s: %s/%s integrity mismatch!\n", __func__,
-		       disk->disk_name, rdev->bdev->bd_disk->disk_name);
-		printk(KERN_NOTICE "Disabling data integrity on %s\n",
-		       disk->disk_name);
-		blk_integrity_unregister(disk);
-	}
+	if (bi_rdev && blk_integrity_compare(mddev->gendisk,
+					     rdev->bdev->bd_disk) >= 0)
+		return;
+	printk(KERN_NOTICE "disabling data integrity on %s\n", mdname(mddev));
+	blk_integrity_unregister(mddev->gendisk);
 }
 
 static int bind_rdev_to_array(mdk_rdev_t * rdev, mddev_t * mddev)
@@ -1591,7 +1629,6 @@ static int bind_rdev_to_array(mdk_rdev_t * rdev, mddev_t * mddev)
 	/* May as well allow recovery to be retried once */
 	mddev->recovery_disabled = 0;
 
-	md_integrity_check(rdev, mddev);
 	return 0;
 
  fail:
@@ -4048,10 +4085,6 @@ static int do_md_run(mddev_t * mddev)
 	}
 	strlcpy(mddev->clevel, pers->name, sizeof(mddev->clevel));
 
-	if (pers->level >= 4 && pers->level <= 6)
-		/* Cannot support integrity (yet) */
-		blk_integrity_unregister(mddev->gendisk);
-
 	if (mddev->reshape_position != MaxSector &&
 	    pers->start_reshape == NULL) {
 		/* This personality cannot handle reshaping... */
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 9430a11..78f0316 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -431,5 +431,7 @@ extern int md_allow_write(mddev_t *mddev);
 extern void md_wait_for_blocked_rdev(mdk_rdev_t *rdev, mddev_t *mddev);
 extern void md_set_array_sectors(mddev_t *mddev, sector_t array_sectors);
 extern int md_check_no_bitmap(mddev_t *mddev);
+extern int md_integrity_register(mddev_t *mddev);
+void md_integrity_add_rdev(mdk_rdev_t *rdev, mddev_t *mddev);
 
 #endif /* _MD_MD_H */
diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
index 237fe3f..7140909 100644
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -313,6 +313,7 @@ static int multipath_add_disk(mddev_t *mddev, mdk_rdev_t *rdev)
 			set_bit(In_sync, &rdev->flags);
 			rcu_assign_pointer(p->rdev, rdev);
 			err = 0;
+			md_integrity_add_rdev(rdev, mddev);
 			break;
 		}
 
@@ -345,7 +346,9 @@ static int multipath_remove_disk(mddev_t *mddev, int number)
 			/* lost the race, try later */
 			err = -EBUSY;
 			p->rdev = rdev;
+			goto abort;
 		}
+		md_integrity_register(mddev);
 	}
 abort:
 
@@ -519,7 +522,7 @@ static int multipath_run (mddev_t *mddev)
 	mddev->queue->unplug_fn = multipath_unplug;
 	mddev->queue->backing_dev_info.congested_fn = multipath_congested;
 	mddev->queue->backing_dev_info.congested_data = mddev;
-
+	md_integrity_register(mddev);
 	return 0;
 
 out_free_conf:
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 335f490..898e2bd 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -351,6 +351,7 @@ static int raid0_run(mddev_t *mddev)
 
 	blk_queue_merge_bvec(mddev->queue, raid0_mergeable_bvec);
 	dump_zones(mddev);
+	md_integrity_register(mddev);
 	return 0;
 }
 
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 0569efb..67e794d 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1144,7 +1144,7 @@ static int raid1_add_disk(mddev_t *mddev, mdk_rdev_t *rdev)
 			rcu_assign_pointer(p->rdev, rdev);
 			break;
 		}
-
+	md_integrity_add_rdev(rdev, mddev);
 	print_conf(conf);
 	return err;
 }
@@ -1178,7 +1178,9 @@ static int raid1_remove_disk(mddev_t *mddev, int number)
 			/* lost the race, try later */
 			err = -EBUSY;
 			p->rdev = rdev;
+			goto abort;
 		}
+		md_integrity_register(mddev);
 	}
 abort:
 
@@ -2067,7 +2069,7 @@ static int run(mddev_t *mddev)
 	mddev->queue->unplug_fn = raid1_unplug;
 	mddev->queue->backing_dev_info.congested_fn = raid1_congested;
 	mddev->queue->backing_dev_info.congested_data = mddev;
-
+	md_integrity_register(mddev);
 	return 0;
 
 out_no_mem:
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 7298a5e..3d9020c 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1170,6 +1170,7 @@ static int raid10_add_disk(mddev_t *mddev, mdk_rdev_t *rdev)
 			break;
 		}
 
+	md_integrity_add_rdev(rdev, mddev);
 	print_conf(conf);
 	return err;
 }
@@ -1203,7 +1204,9 @@ static int raid10_remove_disk(mddev_t *mddev, int number)
 			/* lost the race, try later */
 			err = -EBUSY;
 			p->rdev = rdev;
+			goto abort;
 		}
+		md_integrity_register(mddev);
 	}
 abort:
 
@@ -2225,6 +2228,7 @@ static int run(mddev_t *mddev)
 
 	if (conf->near_copies < mddev->raid_disks)
 		blk_queue_merge_bvec(mddev->queue, raid10_mergeable_bvec);
+	md_integrity_register(mddev);
 	return 0;
 
 out_free_conf:
-- 
1.6.3.3



* Re: [PATCH/Resend] md: Push down data integrity code to personalities.
  2009-07-31  5:06     ` Neil Brown
@ 2009-08-03 16:40       ` Andre Noll
  2009-08-04  5:28         ` Martin K. Petersen
  2009-08-06  3:31         ` Neil Brown
  0 siblings, 2 replies; 12+ messages in thread
From: Andre Noll @ 2009-08-03 16:40 UTC (permalink / raw)
  To: Neil Brown; +Cc: Martin K. Petersen, linux-raid

[-- Attachment #1: Type: text/plain, Size: 1838 bytes --]

On 15:06, Neil Brown wrote:
> > Here is a revised patch that puts calls to the new functions into
> > the ->hot_*_disk methods as you propose.
> 
> Thanks.  Much better.
> Calling md_integrity_check from the ->error routines isn't a good idea
> though.  They can be called in interrupt context, and
> md_integrity_check can call kmalloc(..., GFP_KERNEL) which might try
> to sleep.  That would be bad.
> We don't need the call in ->error, having it in ->hot_remove_disk is
> adequate, as that is called soon after any failure (it doesn't wait
> for the sysadmin to run "mdadm --remove ...").
> 
> So I have made that change and updated the comment accordingly.

Thanks. Sorry if this is a FAQ, but how can one tell whether a given
function may be called in interrupt context? Is there a better way
than recursively checking all its callers?

> > Side note: The patch causes the new gcc warning
> > 
> > 	drivers/md/md.c: In function 'md_integrity_add_rdev':
> > 	drivers/md/md.c:1546: warning: unused variable 'gd'
> > 
> > if data integrity is not compiled in. Any ideas on how to avoid it
> > without introducing an #ifdef-mess are welcome.
> 
> I just replaced every occurrence of "gd" with "mddev->gendisk".  It
> isn't too ugly.
> Alternately, blkdev.h could have
> 
> static inline struct blk_integrity *blk_get_integrity(struct gendisk *disk)
> {
> 	return NULL;
> }
> 
> in place of
> 
> #define blk_get_integrity(a)			(0)
> 
> and then there would be no complaints about unused variables, and
> slightly improved type checking.

IMO this second alternative is a bit cleaner and it might help
other users of blk_get_integrity() as well. Martin, are you OK with
this change to blkdev.h?

Regards
Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe



* Re: [PATCH/Resend] md: Push down data integrity code to personalities.
  2009-08-03 16:40       ` Andre Noll
@ 2009-08-04  5:28         ` Martin K. Petersen
  2009-08-06  8:37           ` Andre Noll
  2009-08-06  3:31         ` Neil Brown
  1 sibling, 1 reply; 12+ messages in thread
From: Martin K. Petersen @ 2009-08-04  5:28 UTC (permalink / raw)
  To: Andre Noll; +Cc: Neil Brown, Martin K. Petersen, linux-raid

>>>>> "Andre" == Andre Noll <maan@systemlinux.org> writes:

[Inlining blk_get_integrity]

Andre> IMO this second alternative is a bit cleaner and it might help
Andre> other users of blk_get_integrity() as well. Martin, are you OK
Andre> with this change to blkdev.h?

Yeah, that's fine.

-- 
Martin K. Petersen	Oracle Linux Engineering


* Re: [PATCH/Resend] md: Push down data integrity code to personalities.
  2009-08-03 16:40       ` Andre Noll
  2009-08-04  5:28         ` Martin K. Petersen
@ 2009-08-06  3:31         ` Neil Brown
  2009-08-07 16:46           ` Andre Noll
  1 sibling, 1 reply; 12+ messages in thread
From: Neil Brown @ 2009-08-06  3:31 UTC (permalink / raw)
  To: Andre Noll; +Cc: Martin K. Petersen, linux-raid

On Monday August 3, maan@systemlinux.org wrote:
> On 15:06, Neil Brown wrote:
> > > Here is a revised patch that puts calls to the new functions into
> > > the ->hot_*_disk methods as you propose.
> > 
> > Thanks.  Much better.
> > Calling md_integrity_check from the ->error routines isn't a good idea
> > though.  They can be called in interrupt context, and
> > md_integrity_check can call kmalloc(..., GFP_KERNEL) which might try
> > to sleep.  That would be bad.
> > We don't need the call in ->error, having it in ->hot_remove_disk is
> > adequate, as that is called soon after any failure (it doesn't wait
> > for the sysadmin to run "mdadm --remove ...").
> > 
> > So I have made that change and updated the comment accordingly.
> 
> Thanks. Sorry if this is a FAQ, but how can one tell whether a given
> function may be called in interrupt context? Is there a better way
> than recursively checking all its callers?

I think you just have to 'know'. :-(
Some functions which mustn't be called from interrupts are
'documented' as such by calling "might_sleep()", but there is no
similar documentation for the reverse.

All ->bi_end_io routines are called from interrupts, or maybe
from softirqs or something similar.  I think the important point is
that they are called without a process context, so they cannot sleep
(i.e. no kmalloc unless you use GFP_ATOMIC, no mutex_lock, no
wait_event etc) and should use spin_lock_irqsave or spin_lock_bh
rather than a bare spin_lock.

NeilBrown


* Re: [PATCH/Resend] md: Push down data integrity code to personalities.
  2009-08-04  5:28         ` Martin K. Petersen
@ 2009-08-06  8:37           ` Andre Noll
  2009-08-07  4:48             ` Martin K. Petersen
  0 siblings, 1 reply; 12+ messages in thread
From: Andre Noll @ 2009-08-06  8:37 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: Neil Brown, linux-raid


On 01:28, Martin K. Petersen wrote:
> >>>>> "Andre" == Andre Noll <maan@systemlinux.org> writes:
> 
> [Inlining blk_get_integrity]
> 
> Andre> IMO this second alternative is a bit cleaner and it might help
> Andre> other users of blk_get_integrity() as well. Martin, are you OK
> Andre> with this change to blkdev.h?
> 
> Yeah, that's fine.

So here's a patch that replaces the dummy macros for data integrity
by inline functions. The patch is only compile-tested, but that's
hopefully OK as the new functions are only used if data integrity is
not compiled in and are thrown away by gcc anyway.

Thanks
Andre

commit e550b6fe0850933d1be4c430a22d25f9040c7627
Author: Andre Noll <maan@systemlinux.org>
Date:   Thu Aug 6 10:32:37 2009 +0200

    [blkdev.h] Replace dummy macros by inline functions.
    
    This avoids complaints about unused variables and improves type checking.

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 69103e0..81d683f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1191,14 +1191,47 @@ static inline int blk_integrity_rq(struct request *rq)
 
 #else /* CONFIG_BLK_DEV_INTEGRITY */
 
-#define blk_integrity_rq(rq)			(0)
-#define blk_rq_count_integrity_sg(a)		(0)
-#define blk_rq_map_integrity_sg(a, b)		(0)
-#define bdev_get_integrity(a)			(0)
-#define blk_get_integrity(a)			(0)
-#define blk_integrity_compare(a, b)		(0)
-#define blk_integrity_register(a, b)		(0)
-#define blk_integrity_unregister(a)		do { } while (0);
+static inline int blk_integrity_rq(struct request *rq)
+{
+	return 0;
+}
+
+static inline int blk_rq_count_integrity_sg(struct request *rq)
+{
+	return 0;
+}
+
+static inline
+int blk_rq_map_integrity_sg(struct request *rq, struct scatterlist *sglist)
+{
+	return 0;
+}
+
+static inline
+struct blk_integrity *bdev_get_integrity(struct block_device *bdev)
+{
+	return NULL;
+}
+
+static inline struct blk_integrity *blk_get_integrity(struct gendisk *disk)
+{
+	return NULL;
+}
+
+static inline int blk_integrity_compare(struct gendisk *gd1, struct gendisk *gd2)
+{
+	return 0;
+}
+
+static inline
+int blk_integrity_register(struct gendisk *disk, struct blk_integrity *template)
+{
+	return 0;
+}
+
+static inline void blk_integrity_unregister(struct gendisk *disk)
+{
+}
 
 #endif /* CONFIG_BLK_DEV_INTEGRITY */
 
-- 
The only person who always got his work done by Friday was Robinson Crusoe



* Re: [PATCH/Resend] md: Push down data integrity code to personalities.
  2009-08-06  8:37           ` Andre Noll
@ 2009-08-07  4:48             ` Martin K. Petersen
  0 siblings, 0 replies; 12+ messages in thread
From: Martin K. Petersen @ 2009-08-07  4:48 UTC (permalink / raw)
  To: Andre Noll; +Cc: Martin K. Petersen, Neil Brown, linux-raid

>>>>> "Andre" == Andre Noll <maan@systemlinux.org> writes:

Andre> So here's a patch that replaces the dummy macros for data
Andre> integrity by inline functions. The patch is only compile-tested,
Andre> but that's hopefully OK as the new functions are only used if
Andre> data integrity is not compiled in and are thrown away by gcc
Andre> anyway.

Looks ok.  Please send it to Jens.

Acked-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering


* Re: [PATCH/Resend] md: Push down data integrity code to personalities.
  2009-08-06  3:31         ` Neil Brown
@ 2009-08-07 16:46           ` Andre Noll
  0 siblings, 0 replies; 12+ messages in thread
From: Andre Noll @ 2009-08-07 16:46 UTC (permalink / raw)
  To: Neil Brown; +Cc: Martin K. Petersen, linux-raid


On 13:31, Neil Brown wrote:
> > Thanks. Sorry if this is a FAQ, but how can one tell whether a given
> > function may be called in interrupt context? Is there a better way
> > than recursively checking all its callers?
> 
> I think you just have to 'know'. :-(
> Some functions which mustn't be called from interrupts are
> 'documented' as such by calling "might_sleep()", but there is no
> similar documentation for the reverse.

There's exactly one call to might_sleep() in drivers/md/*.c ;)

> All ->bi_endio routines are called from interrupts.. or maybe
> from softirqs or something similar.  I think the important point is
> that they are called without a process context, so they cannot sleep
> (i.e. no kmalloc unless you use GFP_ATOMIC, no mutex_lock, no
> wait_event etc) and should use spin_lock_irqsave or spin_lock_bh
> rather than a bare spin_lock.

So spin_lock_irqsave() and friends are another indicator that the
function in question might be called from interrupts, while kmalloc(...,
GFP_KERNEL) indicates the converse, i.e. that the function is called
in process context.

Thanks for the explanation.
Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe




Thread overview: 12+ messages
2009-07-01  8:38 [PATCH/Resend] md: Push down data integrity code to personalities Andre Noll
2009-07-07  3:42 ` Neil Brown
2009-07-07 13:44   ` Andre Noll
2009-07-07 22:10   ` Bill Davidsen
2009-07-13  8:54   ` Andre Noll
2009-07-31  5:06     ` Neil Brown
2009-08-03 16:40       ` Andre Noll
2009-08-04  5:28         ` Martin K. Petersen
2009-08-06  8:37           ` Andre Noll
2009-08-07  4:48             ` Martin K. Petersen
2009-08-06  3:31         ` Neil Brown
2009-08-07 16:46           ` Andre Noll
