* [PATCH] md linear: Protecting mddev with rcu locks to avoid races in
From: SandeepKsinha @ 2009-06-06 15:24
To: Neil Brown; +Cc: Linux RAID

Signed-off-by: Sandeep K Sinha <sandeepksinha@gmail.com>

Protecting mddev with barriers to avoid races.

diff --git a/drivers/md/linear.c b/drivers/md/linear.c
index 9ad6ec4..a56095c 100644
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -28,9 +28,11 @@
 static inline dev_info_t *which_dev(mddev_t *mddev, sector_t sector)
 {
 	int lo, mid, hi;
-	linear_conf_t *conf = mddev_to_conf(mddev);
+	linear_conf_t *conf;
 
+	rcu_read_lock();
 	lo = 0;
+	conf = rcu_dereference(mddev->private);
 	hi = mddev->raid_disks - 1;
 
 	/*
@@ -45,7 +47,7 @@ static inline dev_info_t *which_dev(mddev_t *mddev, sector_t sector)
 		else
 			lo = mid + 1;
 	}
-
+	rcu_read_unlock();
 	return conf->disks + lo;
 }
 
@@ -86,36 +88,49 @@ static int linear_mergeable_bvec(struct request_queue *q,
 static void linear_unplug(struct request_queue *q)
 {
 	mddev_t *mddev = q->queuedata;
-	linear_conf_t *conf = mddev_to_conf(mddev);
+	linear_conf_t *conf;
 	int i;
 
+	rcu_read_lock();
+	conf = rcu_dereference(mddev->private);
+
 	for (i=0; i < mddev->raid_disks; i++) {
 		struct request_queue *r_queue = bdev_get_queue(conf->disks[i].rdev->bdev);
 		blk_unplug(r_queue);
 	}
+	rcu_read_unlock();
 }
 
 static int linear_congested(void *data, int bits)
 {
 	mddev_t *mddev = data;
-	linear_conf_t *conf = mddev_to_conf(mddev);
+	linear_conf_t *conf;
 	int i, ret = 0;
 
+	rcu_read_lock();
+	conf = rcu_dereference(mddev->private);
+
 	for (i = 0; i < mddev->raid_disks && !ret ; i++) {
 		struct request_queue *q = bdev_get_queue(conf->disks[i].rdev->bdev);
 		ret |= bdi_congested(&q->backing_dev_info, bits);
 	}
+
+	rcu_read_unlock();
 	return ret;
 }
 
 static sector_t linear_size(mddev_t *mddev, sector_t sectors, int raid_disks)
 {
-	linear_conf_t *conf = mddev_to_conf(mddev);
-
+	linear_conf_t *conf;
+	sector_t array_sectors;
+
+	rcu_read_lock();
+	conf = rcu_dereference(mddev->private);
 	WARN_ONCE(sectors || raid_disks,
 		"%s does not support generic reshape\n", __func__);
-
-	return conf->array_sectors;
+	array_sectors = conf->array_sectors;
+	rcu_read_unlock();
+
+	return array_sectors;
 }
 
 static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks)
@@ -215,15 +230,14 @@ static int linear_add(mddev_t *mddev, mdk_rdev_t *rdev)
 		return -EINVAL;
 
 	rdev->raid_disk = rdev->saved_raid_disk;
-
-	newconf = linear_conf(mddev,mddev->raid_disks+1);
+	newconf = linear_conf(mddev,mddev->raid_disks + 1);
 
 	if (!newconf)
 		return -ENOMEM;
 
 	newconf->prev = mddev_to_conf(mddev);
-	mddev->private = newconf;
 	mddev->raid_disks++;
+	rcu_assign_pointer(mddev->private,newconf);
 	md_set_array_sectors(mddev, linear_size(mddev, 0, 0));
 	set_capacity(mddev->gendisk, mddev->array_sectors);
 	return 0;
@@ -231,14 +245,17 @@ static int linear_add(mddev_t *mddev, mdk_rdev_t *rdev)
 
 static int linear_stop (mddev_t *mddev)
 {
-	linear_conf_t *conf = mddev_to_conf(mddev);
-
+	linear_conf_t *conf;
+
+	rcu_read_lock();
+	conf = rcu_dereference(mddev->private);
 	blk_sync_queue(mddev->queue); /* the unplug fn references 'conf'*/
 	do {
 		linear_conf_t *t = conf->prev;
 		kfree(conf);
 		conf = t;
 	} while (conf);
+	rcu_read_unlock();
 
 	return 0;
 }
--
Regards,
Sandeep.

"To learn is to change. Education is a process that changes the learner."
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
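For readers unfamiliar with the pattern the patch relies on, here is a minimal single-threaded userspace model of the publish/subscribe contract between rcu_assign_pointer() and rcu_dereference(). The macros below are illustrative stand-ins built on C11 atomics, not the kernel's implementations: the writer must fully initialise the new conf *before* publishing it, and the release/acquire pairing is what guarantees readers never see a half-built structure.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Toy stand-ins for the kernel primitives (assumption for illustration):
 * rcu_assign_pointer() is a release store, so every write to the pointee
 * happens-before the publish; rcu_dereference() is an acquire load. */
#define rcu_assign_pointer(p, v) atomic_store_explicit(&(p), (v), memory_order_release)
#define rcu_dereference(p)       atomic_load_explicit(&(p), memory_order_acquire)

struct linear_conf { long array_sectors; };

static _Atomic(struct linear_conf *) private_ptr;  /* models mddev->private */

long publish_and_read(void)
{
    struct linear_conf *newconf = malloc(sizeof(*newconf));
    newconf->array_sectors = 1024;            /* initialise fully first ... */
    rcu_assign_pointer(private_ptr, newconf); /* ... then publish */

    /* A reader dereferences the published pointer and is guaranteed to
     * observe the initialised contents. */
    struct linear_conf *conf = rcu_dereference(private_ptr);
    long sectors = conf->array_sectors;
    free(newconf);
    return sectors;
}
```

This is only the ordering half of RCU; the lifetime half (when the old conf may be freed) is what the rest of the thread discusses.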
* Re: [PATCH] md linear: Protecting mddev with rcu locks to avoid races in
From: Neil Brown @ 2009-06-14 22:56
To: SandeepKsinha; +Cc: Linux RAID

Hi,
 Thanks for this patch, and sorry for the delay in reviewing it.

I have a few issues:

On Saturday June 6, sandeepksinha@gmail.com wrote:
> Signed-off-by: Sandeep K Sinha <sandeepksinha@gmail.com>
>
> Protecting mddev with barriers to avoid races.

1/ You need a lot more of an explanatory comment than this.
   At least give some hint as to what the races are.
   Given that the rcu primitives are used, it now makes sense to use
   e.g. call_rcu to free the old 'conf'.  That might reasonably be in a
   separate patch, but the comment on this patch should at least mention
   that possibility.

>
> diff --git a/drivers/md/linear.c b/drivers/md/linear.c
> index 9ad6ec4..a56095c 100644
> --- a/drivers/md/linear.c
> +++ b/drivers/md/linear.c
> @@ -28,9 +28,11 @@
>  static inline dev_info_t *which_dev(mddev_t *mddev, sector_t sector)
>  {
>  	int lo, mid, hi;
> -	linear_conf_t *conf = mddev_to_conf(mddev);
> +	linear_conf_t *conf;
>
> +	rcu_read_lock();
>  	lo = 0;
> +	conf = rcu_dereference(mddev->private);
>  	hi = mddev->raid_disks - 1;
>

2/ mddev->raid_disks should really be dereferenced before 'conf'.
   Doing it the way you have done it, the 'raid_disks' value could be
   larger than the value supported by the 'conf' so things could
   go wrong.

>  	/*
> @@ -45,7 +47,7 @@ static inline dev_info_t *which_dev(mddev_t *mddev, sector_t sector)
>  		else
>  			lo = mid + 1;
>  	}
> -
> +	rcu_read_unlock();
>  	return conf->disks + lo;
>  }

3/ We are accessing conf->disks well after the rcu_read_lock has been
   released.  That is not exactly a problem with the code as it stands.
   But if we do go ahead and free the old 'conf' with call_rcu, then
   this becomes racy.
   We should hold the rcu_read_lock for the entire time that we are
   accessing the contents of 'conf'.
   That means we don't take rcu_read_lock in which_dev, but rather
   take it in the two functions that call which_dev.

>
> @@ -86,36 +88,49 @@ static int linear_mergeable_bvec(struct request_queue *q,
>  static void linear_unplug(struct request_queue *q)
>  {
>  	mddev_t *mddev = q->queuedata;
> -	linear_conf_t *conf = mddev_to_conf(mddev);
> +	linear_conf_t *conf;
>  	int i;
>
> +	rcu_read_lock();
> +	conf = rcu_dereference(mddev->private);
> +
>  	for (i=0; i < mddev->raid_disks; i++) {
>  		struct request_queue *r_queue = bdev_get_queue(conf->disks[i].rdev->bdev);
>  		blk_unplug(r_queue);
>  	}
> +	rcu_read_unlock();
>  }
>
>  static int linear_congested(void *data, int bits)
>  {
>  	mddev_t *mddev = data;
> -	linear_conf_t *conf = mddev_to_conf(mddev);
> +	linear_conf_t *conf;
>  	int i, ret = 0;
>
> +	rcu_read_lock();
> +	conf = rcu_dereference(mddev->private);
> +
>  	for (i = 0; i < mddev->raid_disks && !ret ; i++) {
>  		struct request_queue *q = bdev_get_queue(conf->disks[i].rdev->bdev);
>  		ret |= bdi_congested(&q->backing_dev_info, bits);
>  	}
> +
> +	rcu_read_unlock();
>  	return ret;
>  }
>
>  static sector_t linear_size(mddev_t *mddev, sector_t sectors, int raid_disks)
>  {
> -	linear_conf_t *conf = mddev_to_conf(mddev);
> -
> +	linear_conf_t *conf;
> +	sector_t array_sectors;
> +	rcu_read_lock();
> +	conf = rcu_dereference(mddev->private);
>  	WARN_ONCE(sectors || raid_disks,
>  		"%s does not support generic reshape\n", __func__);
> -
> -	return conf->array_sectors;
> +	array_sectors = conf->array_sectors;
> +	rcu_read_unlock();
> +
> +	return array_sectors;
>  }
>
>  static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks)
> @@ -215,15 +230,14 @@ static int linear_add(mddev_t *mddev, mdk_rdev_t *rdev)
>  		return -EINVAL;
>
>  	rdev->raid_disk = rdev->saved_raid_disk;
> -
> -	newconf = linear_conf(mddev,mddev->raid_disks+1);
> +	newconf = linear_conf(mddev,mddev->raid_disks + 1);
>
>  	if (!newconf)
>  		return -ENOMEM;
>
>  	newconf->prev = mddev_to_conf(mddev);
> -	mddev->private = newconf;
>  	mddev->raid_disks++;
> +	rcu_assign_pointer(mddev->private,newconf);
>  	md_set_array_sectors(mddev, linear_size(mddev, 0, 0));
>  	set_capacity(mddev->gendisk, mddev->array_sectors);
>  	return 0;
> @@ -231,14 +245,17 @@ static int linear_add(mddev_t *mddev, mdk_rdev_t *rdev)
>
>  static int linear_stop (mddev_t *mddev)
>  {
> -	linear_conf_t *conf = mddev_to_conf(mddev);
> -
> +	linear_conf_t *conf;
> +
> +	rcu_read_lock();
> +	conf = rcu_dereference(mddev->private);
>  	blk_sync_queue(mddev->queue); /* the unplug fn references 'conf'*/
>  	do {
>  		linear_conf_t *t = conf->prev;
>  		kfree(conf);
>  		conf = t;
>  	} while (conf);
> +	rcu_read_unlock();
>

4/ We don't need the rcu protection here as we hold ->reconfig_mutex
   both in linear_add and linear_stop, so they cannot race.
   Adding a comment to this effect might be a good idea though.

Thanks,

NeilBrown

>  	return 0;
>  }
> --
> Regards,
> Sandeep.
>
> "To learn is to change. Education is a process that changes the learner."
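Neil's points 2 and 3 amount to a restructuring: which_dev() should assume its caller already holds the read-side lock, and the caller should both snapshot raid_disks no later than it dereferences conf and keep the lock held across all use of the result. A self-contained userspace sketch of that shape (the RCU macros here are no-op stand-ins for illustration, and the simplified structs are not the kernel's dev_info_t/linear_conf_t):

```c
/* No-op stand-ins: in this single-threaded model the read-side critical
 * section does nothing; the point being illustrated is its *placement*. */
#define rcu_read_lock()    do { } while (0)
#define rcu_read_unlock()  do { } while (0)
#define rcu_dereference(p) (p)

struct dev_info    { long end_sector; };            /* cumulative end of disk i */
struct linear_conf { struct dev_info disks[4]; };

/* which_dev() no longer takes the lock itself; it relies on the caller.
 * The binary search mirrors the one in drivers/md/linear.c: find the
 * first disk whose cumulative end is past 'sector'. */
static struct dev_info *which_dev(struct linear_conf *conf, int raid_disks,
                                  long sector)
{
    int lo = 0, hi = raid_disks - 1;

    while (lo < hi) {
        int mid = (lo + hi) / 2;
        if (sector < conf->disks[mid].end_sector)
            hi = mid;
        else
            lo = mid + 1;
    }
    return conf->disks + lo;
}

/* A caller in the style Neil suggests: lock, fetch conf (with raid_disks
 * read no later than conf, per review point 2), use the result entirely
 * inside the critical section, then unlock. */
long lookup_end(struct linear_conf *shared, int raid_disks, long sector)
{
    long end;

    rcu_read_lock();
    struct linear_conf *conf = rcu_dereference(shared);
    end = which_dev(conf, raid_disks, sector)->end_sector;
    rcu_read_unlock();
    return end;
}
```

Once the old conf is freed with call_rcu, any access to conf->disks outside the lock (as in the patch's which_dev) would be a genuine use-after-free window; this placement closes it.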
* Re: [PATCH] md linear: Protecting mddev with rcu locks to avoid races in 2009-06-14 22:56 ` Neil Brown @ 2009-06-17 6:35 ` SandeepKsinha 2009-06-17 6:46 ` SandeepKsinha 2009-06-17 6:50 ` NeilBrown 0 siblings, 2 replies; 14+ messages in thread From: SandeepKsinha @ 2009-06-17 6:35 UTC (permalink / raw) To: Neil Brown; +Cc: Linux RAID Neil, On Mon, Jun 15, 2009 at 4:26 AM, Neil Brown<neilb@suse.de> wrote: > > Hi, > Thanks for this patch, and sorry for the delay in reviewing it. > > I have a few issues: > > On Saturday June 6, sandeepksinha@gmail.com wrote: >> Signed-off-by: Sandeep K Sinha <sandeepksinha@gmail.com> >> >> Protecting mddev with barriers to avoid races. > > 1/ You need a lot more of an explanatory comment than this. > At least give some hint as to what the races are. > Give than the rcu primitives are used, it now makes sense to use > e.g. call_rcu to free the old 'conf'. That might reasonably be in a > separate patch, but the comment on this patch should at least at that > possibility. >> Sure. I shall do it for the final patch. I will also take care of this henceforth. >> diff --git a/drivers/md/linear.c b/drivers/md/linear.c >> index 9ad6ec4..a56095c 100644 >> --- a/drivers/md/linear.c >> +++ b/drivers/md/linear.c >> @@ -28,9 +28,11 @@ >> static inline dev_info_t *which_dev(mddev_t *mddev, sector_t sector) >> { >> int lo, mid, hi; >> - linear_conf_t *conf = mddev_to_conf(mddev); >> + linear_conf_t *conf; >> >> + rcu_read_lock(); >> lo = 0; >> + conf = rcu_dereference(mddev->private); >> hi = mddev->raid_disks - 1; >> > > 2/ mddev->raid_disks should really be dereferenced before 'conf'. > Doing it the way you have done it, the 'raid_disks' value could be > larger than the value supported by the 'conf' so things could > go wrong. > Agreed. I hope you are referring to the case where a disk is in the process of being added to an array. Is that right ? Kindly confirm. 
> >> /* >> @@ -45,7 +47,7 @@ static inline dev_info_t *which_dev(mddev_t *mddev, >> sector_t sector) >> else >> lo = mid + 1; >> } >> - >> + rcu_read_unlock(); >> return conf->disks + lo; >> } > > 3/ We are accessing conf->disks well after the rcu_lock has been released. > That is not exactly a problem with the code as it stands. But if > we do go ahead and free the old 'conf' with call_rcu, then this > becomes racy. > We should hold the rcu_read_lock for the entire time that we are > accessing the contents of 'conf'. > True. > That means we don't take rcu_read_lock in which_dev, but rather > take it in the two functions that call which_dev. > I have made that change. >> >> @@ -86,36 +88,49 @@ static int linear_mergeable_bvec(struct request_queue *q, >> static void linear_unplug(struct request_queue *q) >> { >> mddev_t *mddev = q->queuedata; >> - linear_conf_t *conf = mddev_to_conf(mddev); >> + linear_conf_t *conf; >> int i; >> >> + rcu_read_lock(); >> + conf = rcu_dereference(mddev->private); >> + >> for (i=0; i < mddev->raid_disks; i++) { >> struct request_queue *r_queue = bdev_get_queue(conf->disks[i].rdev->bdev); >> blk_unplug(r_queue); >> } >> + rcu_read_unlock(); >> } >> >> static int linear_congested(void *data, int bits) >> { >> mddev_t *mddev = data; >> - linear_conf_t *conf = mddev_to_conf(mddev); >> + linear_conf_t *conf; >> int i, ret = 0; >> >> + rcu_read_lock(); >> + conf = rcu_dereference(mddev->private); >> + >> for (i = 0; i < mddev->raid_disks && !ret ; i++) { >> struct request_queue *q = bdev_get_queue(conf->disks[i].rdev->bdev); >> ret |= bdi_congested(&q->backing_dev_info, bits); >> } >> + >> + rcu_read_unlock(); >> return ret; >> } >> >> static sector_t linear_size(mddev_t *mddev, sector_t sectors, int raid_disks) >> { >> - linear_conf_t *conf = mddev_to_conf(mddev); >> - >> + linear_conf_t *conf; >> + sector_t array_sectors; >> + rcu_read_lock(); >> + conf = rcu_dereference(mddev->private); >> WARN_ONCE(sectors || raid_disks, >> "%s does not 
support generic reshape\n", __func__); >> - >> - return conf->array_sectors; >> + array_sectors = conf->array_sectors; >> + rcu_read_unlock(); >> + >> + return array_sectors; >> } >> >> static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks) >> @@ -215,15 +230,14 @@ static int linear_add(mddev_t *mddev, mdk_rdev_t *rdev) >> return -EINVAL; >> >> rdev->raid_disk = rdev->saved_raid_disk; >> - >> - newconf = linear_conf(mddev,mddev->raid_disks+1); >> + newconf = linear_conf(mddev,mddev->raid_disks + 1); >> >> if (!newconf) >> return -ENOMEM; >> >> newconf->prev = mddev_to_conf(mddev); >> - mddev->private = newconf; >> mddev->raid_disks++; >> + rcu_assign_pointer(mddev->private,newconf); >> md_set_array_sectors(mddev, linear_size(mddev, 0, 0)); >> set_capacity(mddev->gendisk, mddev->array_sectors); >> return 0; >> @@ -231,14 +245,17 @@ static int linear_add(mddev_t *mddev, mdk_rdev_t *rdev) >> >> static int linear_stop (mddev_t *mddev) >> { >> - linear_conf_t *conf = mddev_to_conf(mddev); >> - >> + linear_conf_t *conf; >> + >> + rcu_read_lock(); >> + conf = rcu_dereference(mddev->private); >> blk_sync_queue(mddev->queue); /* the unplug fn references 'conf'*/ >> do { >> linear_conf_t *t = conf->prev; >> kfree(conf); >> conf = t; >> } while (conf); >> + rcu_read_unlock(); >> > > 4/ We don't need the rcu protection here as we hold ->reconfig_mutex > both in linear_add and linear_stop, so they cannot race. > Adding a comment to this effect might be a good idea though. > Fine. Shall do this as well. The new patch will follow soon. > Thanks, > > NeilBrown > > >> return 0; >> } >> -- >> Regards, >> Sandeep. >> >> >> >> >> >> >> “To learn is to change. Education is a process that changes the learner.” > -- Regards, Sandeep. “To learn is to change. 
Education is a process that changes the learner.” -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 14+ messages in thread
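Neil's point 1 suggests freeing the old conf via call_rcu rather than keeping it chained on ->prev until linear_stop. A minimal userspace model of that idea follows; note the call_rcu() here is a stub that runs the callback immediately, whereas the real primitive defers it until all pre-existing readers have left their critical sections, and the field names are illustrative rather than the driver's actual layout.

```c
#include <stdlib.h>

struct rcu_head { void (*func)(struct rcu_head *); };

/* Stub for illustration: the kernel's call_rcu() queues the callback to
 * run after a grace period; this model just invokes it at once. */
static void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *))
{
    head->func = func;
    func(head);
}

struct linear_conf {
    struct rcu_head rcu;      /* first member, so the cast below is valid */
    long array_sectors;
};

static int frees;             /* counts completed deferred frees */

static void free_conf(struct rcu_head *head)
{
    /* container_of() in miniature: rcu is the first member. */
    free((struct linear_conf *)head);
    frees++;
}

/* What linear_add() could do after rcu_assign_pointer() publishes the
 * new conf: hand the superseded one to RCU instead of chaining it on
 * ->prev to be reaped at linear_stop() time. */
int retire_old_conf(void)
{
    struct linear_conf *oldconf = calloc(1, sizeof(*oldconf));
    call_rcu(&oldconf->rcu, free_conf);
    return frees;
}
```

With this scheme the ->prev chain (and the chain-walking loop in linear_stop) becomes unnecessary, which is presumably why Neil frames it as a natural follow-up patch.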
* Re: [PATCH] md linear: Protecting mddev with rcu locks to avoid races in 2009-06-17 6:35 ` SandeepKsinha @ 2009-06-17 6:46 ` SandeepKsinha 2009-06-17 8:02 ` Sujit Karataparambil 2009-06-17 9:59 ` [PATCH] md linear: Protecting mddev with rcu locks to avoid races in Neil Brown 2009-06-17 6:50 ` NeilBrown 1 sibling, 2 replies; 14+ messages in thread From: SandeepKsinha @ 2009-06-17 6:46 UTC (permalink / raw) To: Neil Brown; +Cc: Linux RAID Here goes the updated patch. commit 74da1595eb711b77969275070cda7516bac36f5e Signed-off-by: Sandeep K Sinha <sandeepksinha@gmail.com> Date: Sat Jun 6 20:49:37 2009 +0530 Due to the lack of memory ordering guarantees, we may have races around mddev->conf. This patch addresses the same using rcu protection to avoid such race conditions. diff --git a/drivers/md/linear.c b/drivers/md/linear.c index 9ad6ec4..a56095c 100644 --- a/drivers/md/linear.c +++ b/drivers/md/linear.c @@ -28,9 +28,11 @@ static inline dev_info_t *which_dev(mddev_t *mddev, sector_t sector) { int lo, mid, hi; - linear_conf_t *conf = mddev_to_conf(mddev); + linear_conf_t *conf; + rcu_read_lock(); lo = 0; + conf = rcu_dereference(mddev->private); hi = mddev->raid_disks - 1; /* @@ -45,7 +47,7 @@ static inline dev_info_t *which_dev(mddev_t *mddev, sector_t sector) else lo = mid + 1; } - + rcu_read_unlock(); return conf->disks + lo; } @@ -86,36 +88,49 @@ static int linear_mergeable_bvec(struct request_queue *q, static void linear_unplug(struct request_queue *q) { mddev_t *mddev = q->queuedata; - linear_conf_t *conf = mddev_to_conf(mddev); + linear_conf_t *conf; int i; + rcu_read_lock(); + conf = rcu_dereference(mddev->private); + for (i=0; i < mddev->raid_disks; i++) { struct request_queue *r_queue = bdev_get_queue(conf->disks[i].rdev->bdev); blk_unplug(r_queue); } + rcu_read_unlock(); } static int linear_congested(void *data, int bits) { mddev_t *mddev = data; - linear_conf_t *conf = mddev_to_conf(mddev); + linear_conf_t *conf; int i, ret = 0; + rcu_read_lock(); + conf 
= rcu_dereference(mddev->private); + for (i = 0; i < mddev->raid_disks && !ret ; i++) { struct request_queue *q = bdev_get_queue(conf->disks[i].rdev->bdev); ret |= bdi_congested(&q->backing_dev_info, bits); } + + rcu_read_unlock(); return ret; } On Wed, Jun 17, 2009 at 12:05 PM, SandeepKsinha<sandeepksinha@gmail.com> wrote: > Neil, > > On Mon, Jun 15, 2009 at 4:26 AM, Neil Brown<neilb@suse.de> wrote: >> >> Hi, >> Thanks for this patch, and sorry for the delay in reviewing it. >> >> I have a few issues: >> >> On Saturday June 6, sandeepksinha@gmail.com wrote: >>> Signed-off-by: Sandeep K Sinha <sandeepksinha@gmail.com> >>> >>> Protecting mddev with barriers to avoid races. >> >> 1/ You need a lot more of an explanatory comment than this. >> At least give some hint as to what the races are. >> Give than the rcu primitives are used, it now makes sense to use >> e.g. call_rcu to free the old 'conf'. That might reasonably be in a >> separate patch, but the comment on this patch should at least at that >> possibility. >>> > > Sure. I shall do it for the final patch. I will also take care of this > henceforth. > >>> diff --git a/drivers/md/linear.c b/drivers/md/linear.c >>> index 9ad6ec4..a56095c 100644 >>> --- a/drivers/md/linear.c >>> +++ b/drivers/md/linear.c >>> @@ -28,9 +28,11 @@ >>> static inline dev_info_t *which_dev(mddev_t *mddev, sector_t sector) >>> { >>> int lo, mid, hi; >>> - linear_conf_t *conf = mddev_to_conf(mddev); >>> + linear_conf_t *conf; >>> >>> + rcu_read_lock(); >>> lo = 0; >>> + conf = rcu_dereference(mddev->private); >>> hi = mddev->raid_disks - 1; >>> >> >> 2/ mddev->raid_disks should really be dereferenced before 'conf'. >> Doing it the way you have done it, the 'raid_disks' value could be >> larger than the value supported by the 'conf' so things could >> go wrong. >> > Agreed. I hope you are referring to the case where a disk is in the > process of being added to an array. Is that right ? > Kindly confirm. 
>> >>> /* >>> @@ -45,7 +47,7 @@ static inline dev_info_t *which_dev(mddev_t *mddev, >>> sector_t sector) >>> else >>> lo = mid + 1; >>> } >>> - >>> + rcu_read_unlock(); >>> return conf->disks + lo; >>> } >> >> 3/ We are accessing conf->disks well after the rcu_lock has been released. >> That is not exactly a problem with the code as it stands. But if >> we do go ahead and free the old 'conf' with call_rcu, then this >> becomes racy. >> We should hold the rcu_read_lock for the entire time that we are >> accessing the contents of 'conf'. >> > True. > >> That means we don't take rcu_read_lock in which_dev, but rather >> take it in the two functions that call which_dev. >> > > I have made that change. >>> >>> @@ -86,36 +88,49 @@ static int linear_mergeable_bvec(struct request_queue *q, >>> static void linear_unplug(struct request_queue *q) >>> { >>> mddev_t *mddev = q->queuedata; >>> - linear_conf_t *conf = mddev_to_conf(mddev); >>> + linear_conf_t *conf; >>> int i; >>> >>> + rcu_read_lock(); >>> + conf = rcu_dereference(mddev->private); >>> + >>> for (i=0; i < mddev->raid_disks; i++) { >>> struct request_queue *r_queue = bdev_get_queue(conf->disks[i].rdev->bdev); >>> blk_unplug(r_queue); >>> } >>> + rcu_read_unlock(); >>> } >>> >>> static int linear_congested(void *data, int bits) >>> { >>> mddev_t *mddev = data; >>> - linear_conf_t *conf = mddev_to_conf(mddev); >>> + linear_conf_t *conf; >>> int i, ret = 0; >>> >>> + rcu_read_lock(); >>> + conf = rcu_dereference(mddev->private); >>> + >>> for (i = 0; i < mddev->raid_disks && !ret ; i++) { >>> struct request_queue *q = bdev_get_queue(conf->disks[i].rdev->bdev); >>> ret |= bdi_congested(&q->backing_dev_info, bits); >>> } >>> + >>> + rcu_read_unlock(); >>> return ret; >>> } >>> >>> static sector_t linear_size(mddev_t *mddev, sector_t sectors, int raid_disks) >>> { >>> - linear_conf_t *conf = mddev_to_conf(mddev); >>> - >>> + linear_conf_t *conf; >>> + sector_t array_sectors; >>> + rcu_read_lock(); >>> + conf = 
rcu_dereference(mddev->private); >>> WARN_ONCE(sectors || raid_disks, >>> "%s does not support generic reshape\n", __func__); >>> - >>> - return conf->array_sectors; >>> + array_sectors = conf->array_sectors; >>> + rcu_read_unlock(); >>> + >>> + return array_sectors; >>> } >>> >>> static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks) >>> @@ -215,15 +230,14 @@ static int linear_add(mddev_t *mddev, mdk_rdev_t *rdev) >>> return -EINVAL; >>> >>> rdev->raid_disk = rdev->saved_raid_disk; >>> - >>> - newconf = linear_conf(mddev,mddev->raid_disks+1); >>> + newconf = linear_conf(mddev,mddev->raid_disks + 1); >>> >>> if (!newconf) >>> return -ENOMEM; >>> >>> newconf->prev = mddev_to_conf(mddev); >>> - mddev->private = newconf; >>> mddev->raid_disks++; >>> + rcu_assign_pointer(mddev->private,newconf); >>> md_set_array_sectors(mddev, linear_size(mddev, 0, 0)); >>> set_capacity(mddev->gendisk, mddev->array_sectors); >>> return 0; >>> @@ -231,14 +245,17 @@ static int linear_add(mddev_t *mddev, mdk_rdev_t *rdev) >>> >>> static int linear_stop (mddev_t *mddev) >>> { >>> - linear_conf_t *conf = mddev_to_conf(mddev); >>> - >>> + linear_conf_t *conf; >>> + >>> + rcu_read_lock(); >>> + conf = rcu_dereference(mddev->private); >>> blk_sync_queue(mddev->queue); /* the unplug fn references 'conf'*/ >>> do { >>> linear_conf_t *t = conf->prev; >>> kfree(conf); >>> conf = t; >>> } while (conf); >>> + rcu_read_unlock(); >>> >> >> 4/ We don't need the rcu protection here as we hold ->reconfig_mutex >> both in linear_add and linear_stop, so they cannot race. >> Adding a comment to this effect might be a good idea though. >> > > Fine. Shall do this as well. > > The new patch will follow soon. > >> Thanks, >> >> NeilBrown >> >> >>> return 0; >>> } >>> -- >>> Regards, >>> Sandeep. >>> >>> >>> >>> >>> >>> >>> “To learn is to change. Education is a process that changes the learner.” >> > > > > -- > Regards, > Sandeep. > > > > > > > “To learn is to change. 
Education is a process that changes the learner.” > -- Regards, Sandeep. “To learn is to change. Education is a process that changes the learner.” -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] md linear: Protecting mddev with rcu locks to avoid races in 2009-06-17 6:46 ` SandeepKsinha @ 2009-06-17 8:02 ` Sujit Karataparambil 2009-06-17 8:48 ` SandeepKsinha 2009-06-17 9:59 ` [PATCH] md linear: Protecting mddev with rcu locks to avoid races in Neil Brown 1 sibling, 1 reply; 14+ messages in thread From: Sujit Karataparambil @ 2009-06-17 8:02 UTC (permalink / raw) To: SandeepKsinha; +Cc: Neil Brown, Linux RAID Neil, mddev is an static one time setup. Which should not have any problem with rcu, anyway. Any changes to the raid will require an rebuild using md. Is this correct or wrong. Thanks, Sujit On Wed, Jun 17, 2009 at 12:16 PM, SandeepKsinha<sandeepksinha@gmail.com> wrote: > Here goes the updated patch. > > > commit 74da1595eb711b77969275070cda7516bac36f5e > Signed-off-by: Sandeep K Sinha <sandeepksinha@gmail.com> > Date: Sat Jun 6 20:49:37 2009 +0530 > Due to the lack of memory ordering guarantees, we may have races around > mddev->conf. This patch addresses the same using rcu protection to avoid > such race conditions. 
> > diff --git a/drivers/md/linear.c b/drivers/md/linear.c > index 9ad6ec4..a56095c 100644 > --- a/drivers/md/linear.c > +++ b/drivers/md/linear.c > @@ -28,9 +28,11 @@ > static inline dev_info_t *which_dev(mddev_t *mddev, sector_t sector) > { > int lo, mid, hi; > - linear_conf_t *conf = mddev_to_conf(mddev); > + linear_conf_t *conf; > > + rcu_read_lock(); > lo = 0; > + conf = rcu_dereference(mddev->private); > hi = mddev->raid_disks - 1; > > /* > @@ -45,7 +47,7 @@ static inline dev_info_t *which_dev(mddev_t *mddev, > sector_t sector) > else > lo = mid + 1; > } > - > + rcu_read_unlock(); > return conf->disks + lo; > } > > @@ -86,36 +88,49 @@ static int linear_mergeable_bvec(struct request_queue *q, > static void linear_unplug(struct request_queue *q) > { > mddev_t *mddev = q->queuedata; > - linear_conf_t *conf = mddev_to_conf(mddev); > + linear_conf_t *conf; > int i; > > + rcu_read_lock(); > + conf = rcu_dereference(mddev->private); > + > for (i=0; i < mddev->raid_disks; i++) { > struct request_queue *r_queue = bdev_get_queue(conf->disks[i].rdev->bdev); > blk_unplug(r_queue); > } > + rcu_read_unlock(); > } > > static int linear_congested(void *data, int bits) > { > mddev_t *mddev = data; > - linear_conf_t *conf = mddev_to_conf(mddev); > + linear_conf_t *conf; > int i, ret = 0; > > + rcu_read_lock(); > + conf = rcu_dereference(mddev->private); > + > for (i = 0; i < mddev->raid_disks && !ret ; i++) { > struct request_queue *q = bdev_get_queue(conf->disks[i].rdev->bdev); > ret |= bdi_congested(&q->backing_dev_info, bits); > } > + > + rcu_read_unlock(); > return ret; > } > > > On Wed, Jun 17, 2009 at 12:05 PM, SandeepKsinha<sandeepksinha@gmail.com> wrote: >> Neil, >> >> On Mon, Jun 15, 2009 at 4:26 AM, Neil Brown<neilb@suse.de> wrote: >>> >>> Hi, >>> Thanks for this patch, and sorry for the delay in reviewing it. 
>>> >>> I have a few issues: >>> >>> On Saturday June 6, sandeepksinha@gmail.com wrote: >>>> Signed-off-by: Sandeep K Sinha <sandeepksinha@gmail.com> >>>> >>>> Protecting mddev with barriers to avoid races. >>> >>> 1/ You need a lot more of an explanatory comment than this. >>> At least give some hint as to what the races are. >>> Give than the rcu primitives are used, it now makes sense to use >>> e.g. call_rcu to free the old 'conf'. That might reasonably be in a >>> separate patch, but the comment on this patch should at least at that >>> possibility. >>>> >> >> Sure. I shall do it for the final patch. I will also take care of this >> henceforth. >> >>>> diff --git a/drivers/md/linear.c b/drivers/md/linear.c >>>> index 9ad6ec4..a56095c 100644 >>>> --- a/drivers/md/linear.c >>>> +++ b/drivers/md/linear.c >>>> @@ -28,9 +28,11 @@ >>>> static inline dev_info_t *which_dev(mddev_t *mddev, sector_t sector) >>>> { >>>> int lo, mid, hi; >>>> - linear_conf_t *conf = mddev_to_conf(mddev); >>>> + linear_conf_t *conf; >>>> >>>> + rcu_read_lock(); >>>> lo = 0; >>>> + conf = rcu_dereference(mddev->private); >>>> hi = mddev->raid_disks - 1; >>>> >>> >>> 2/ mddev->raid_disks should really be dereferenced before 'conf'. >>> Doing it the way you have done it, the 'raid_disks' value could be >>> larger than the value supported by the 'conf' so things could >>> go wrong. >>> >> Agreed. I hope you are referring to the case where a disk is in the >> process of being added to an array. Is that right ? >> Kindly confirm. >>> >>>> /* >>>> @@ -45,7 +47,7 @@ static inline dev_info_t *which_dev(mddev_t *mddev, >>>> sector_t sector) >>>> else >>>> lo = mid + 1; >>>> } >>>> - >>>> + rcu_read_unlock(); >>>> return conf->disks + lo; >>>> } >>> >>> 3/ We are accessing conf->disks well after the rcu_lock has been released. >>> That is not exactly a problem with the code as it stands. But if >>> we do go ahead and free the old 'conf' with call_rcu, then this >>> becomes racy. 
>>> We should hold the rcu_read_lock for the entire time that we are >>> accessing the contents of 'conf'. >>> >> True. >> >>> That means we don't take rcu_read_lock in which_dev, but rather >>> take it in the two functions that call which_dev. >>> >> >> I have made that change. >>>> >>>> @@ -86,36 +88,49 @@ static int linear_mergeable_bvec(struct request_queue *q, >>>> static void linear_unplug(struct request_queue *q) >>>> { >>>> mddev_t *mddev = q->queuedata; >>>> - linear_conf_t *conf = mddev_to_conf(mddev); >>>> + linear_conf_t *conf; >>>> int i; >>>> >>>> + rcu_read_lock(); >>>> + conf = rcu_dereference(mddev->private); >>>> + >>>> for (i=0; i < mddev->raid_disks; i++) { >>>> struct request_queue *r_queue = bdev_get_queue(conf->disks[i].rdev->bdev); >>>> blk_unplug(r_queue); >>>> } >>>> + rcu_read_unlock(); >>>> } >>>> >>>> static int linear_congested(void *data, int bits) >>>> { >>>> mddev_t *mddev = data; >>>> - linear_conf_t *conf = mddev_to_conf(mddev); >>>> + linear_conf_t *conf; >>>> int i, ret = 0; >>>> >>>> + rcu_read_lock(); >>>> + conf = rcu_dereference(mddev->private); >>>> + >>>> for (i = 0; i < mddev->raid_disks && !ret ; i++) { >>>> struct request_queue *q = bdev_get_queue(conf->disks[i].rdev->bdev); >>>> ret |= bdi_congested(&q->backing_dev_info, bits); >>>> } >>>> + >>>> + rcu_read_unlock(); >>>> return ret; >>>> } >>>> >>>> static sector_t linear_size(mddev_t *mddev, sector_t sectors, int raid_disks) >>>> { >>>> - linear_conf_t *conf = mddev_to_conf(mddev); >>>> - >>>> + linear_conf_t *conf; >>>> + sector_t array_sectors; >>>> + rcu_read_lock(); >>>> + conf = rcu_dereference(mddev->private); >>>> WARN_ONCE(sectors || raid_disks, >>>> "%s does not support generic reshape\n", __func__); >>>> - >>>> - return conf->array_sectors; >>>> + array_sectors = conf->array_sectors; >>>> + rcu_read_unlock(); >>>> + >>>> + return array_sectors; >>>> } >>>> >>>> static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks) >>>> @@ -215,15 +230,14 @@ 
static int linear_add(mddev_t *mddev, mdk_rdev_t *rdev) >>>> return -EINVAL; >>>> >>>> rdev->raid_disk = rdev->saved_raid_disk; >>>> - >>>> - newconf = linear_conf(mddev,mddev->raid_disks+1); >>>> + newconf = linear_conf(mddev,mddev->raid_disks + 1); >>>> >>>> if (!newconf) >>>> return -ENOMEM; >>>> >>>> newconf->prev = mddev_to_conf(mddev); >>>> - mddev->private = newconf; >>>> mddev->raid_disks++; >>>> + rcu_assign_pointer(mddev->private,newconf); >>>> md_set_array_sectors(mddev, linear_size(mddev, 0, 0)); >>>> set_capacity(mddev->gendisk, mddev->array_sectors); >>>> return 0; >>>> @@ -231,14 +245,17 @@ static int linear_add(mddev_t *mddev, mdk_rdev_t *rdev) >>>> >>>> static int linear_stop (mddev_t *mddev) >>>> { >>>> - linear_conf_t *conf = mddev_to_conf(mddev); >>>> - >>>> + linear_conf_t *conf; >>>> + >>>> + rcu_read_lock(); >>>> + conf = rcu_dereference(mddev->private); >>>> blk_sync_queue(mddev->queue); /* the unplug fn references 'conf'*/ >>>> do { >>>> linear_conf_t *t = conf->prev; >>>> kfree(conf); >>>> conf = t; >>>> } while (conf); >>>> + rcu_read_unlock(); >>>> >>> >>> 4/ We don't need the rcu protection here as we hold ->reconfig_mutex >>> both in linear_add and linear_stop, so they cannot race. >>> Adding a comment to this effect might be a good idea though. >>> >> >> Fine. Shall do this as well. >> >> The new patch will follow soon. >> >>> Thanks, >>> >>> NeilBrown >>> >>> >>>> return 0; >>>> } >>>> -- >>>> Regards, >>>> Sandeep. >>>> >>>> >>>> >>>> >>>> >>>> >>>> “To learn is to change. Education is a process that changes the learner.” >>> >> >> >> >> -- >> Regards, >> Sandeep. >> >> >> >> >> >> >> “To learn is to change. Education is a process that changes the learner.” >> > > > > -- > Regards, > Sandeep. > > > > > > > “To learn is to change. 
Education is a process that changes the learner.” -- -- Sujit K M
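Neil's point 3 in the review quoted above — that conf->disks must not be touched after rcu_read_unlock() — is easy to model outside the kernel. The sketch below is a userspace toy (a bare reader counter standing in for real RCU, and every name is invented; none of this is the driver code), showing why the caller must hold the "lock" across both the lookup and the dereference of the pointer that the lookup returns:

```c
#include <assert.h>

/* Toy reader-side lock: a bare counter standing in for rcu_read_lock()/
 * rcu_read_unlock(). A real grace period would wait for this to reach 0. */
static int toy_readers;

static void toy_rcu_read_lock(void)   { toy_readers++; }
static void toy_rcu_read_unlock(void) { toy_readers--; }

/* Invented miniature stand-ins for dev_info_t / linear_conf_t. */
struct toy_dev  { unsigned long long end_sector; };
struct toy_conf { struct toy_dev disks[2]; };

static struct toy_conf the_conf = { { { 100 }, { 200 } } };

/* Like which_dev() after the review: the helper itself takes no lock;
 * the returned pointer still points inside the protected structure, so
 * the caller must keep the read lock held while using it. */
static struct toy_dev *toy_which_dev(unsigned long long sector)
{
    return &the_conf.disks[sector < the_conf.disks[0].end_sector ? 0 : 1];
}

/* Correct caller: the lock spans both the lookup and the dereference. */
static unsigned long long toy_end_sector_of(unsigned long long sector)
{
    unsigned long long end;

    toy_rcu_read_lock();
    end = toy_which_dev(sector)->end_sector; /* deref while still locked */
    toy_rcu_read_unlock();
    return end;
}
```

If the unlock sat inside toy_which_dev() instead, the dereference in the caller would run with toy_readers back at 0, which is exactly the window a call_rcu-based free would exploit.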
* Re: [PATCH] md linear: Protecting mddev with rcu locks to avoid races in 2009-06-17 8:02 ` Sujit Karataparambil @ 2009-06-17 8:48 ` SandeepKsinha 2009-06-17 9:14 ` Sujit Karataparambil 0 siblings, 1 reply; 14+ messages in thread From: SandeepKsinha @ 2009-06-17 8:48 UTC (permalink / raw) To: Sujit Karataparambil; +Cc: Neil Brown, Linux RAID Hi Sujit, Just an attempt to answer your question. On Wed, Jun 17, 2009 at 1:32 PM, Sujit Karataparambil<sjt.kar@gmail.com> wrote: > Neil, > > mddev is a static one-time setup, which should not have any problem > with rcu, anyway. > Any changes to the raid will require a rebuild using md. > mddev->raid_disks can change when you add disks; and while you are updating the conf, the older conf might still be in use at various other places. > Is this correct or wrong. > > Thanks, > Sujit > > On Wed, Jun 17, 2009 at 12:16 PM, SandeepKsinha<sandeepksinha@gmail.com> wrote: >> Here goes the updated patch. >> >> >> commit 74da1595eb711b77969275070cda7516bac36f5e >> Signed-off-by: Sandeep K Sinha <sandeepksinha@gmail.com> >> Date: Sat Jun 6 20:49:37 2009 +0530 >> Due to the lack of memory ordering guarantees, we may have races around >> mddev->conf. This patch addresses the same using rcu protection to avoid >> such race conditions.
>> >> diff --git a/drivers/md/linear.c b/drivers/md/linear.c >> index 9ad6ec4..a56095c 100644 >> --- a/drivers/md/linear.c >> +++ b/drivers/md/linear.c >> @@ -28,9 +28,11 @@ >> static inline dev_info_t *which_dev(mddev_t *mddev, sector_t sector) >> { >> int lo, mid, hi; >> - linear_conf_t *conf = mddev_to_conf(mddev); >> + linear_conf_t *conf; >> >> + rcu_read_lock(); >> lo = 0; >> + conf = rcu_dereference(mddev->private); >> hi = mddev->raid_disks - 1; >> >> /* >> @@ -45,7 +47,7 @@ static inline dev_info_t *which_dev(mddev_t *mddev, >> sector_t sector) >> else >> lo = mid + 1; >> } >> - >> + rcu_read_unlock(); >> return conf->disks + lo; >> } >> >> @@ -86,36 +88,49 @@ static int linear_mergeable_bvec(struct request_queue *q, >> static void linear_unplug(struct request_queue *q) >> { >> mddev_t *mddev = q->queuedata; >> - linear_conf_t *conf = mddev_to_conf(mddev); >> + linear_conf_t *conf; >> int i; >> >> + rcu_read_lock(); >> + conf = rcu_dereference(mddev->private); >> + >> for (i=0; i < mddev->raid_disks; i++) { >> struct request_queue *r_queue = bdev_get_queue(conf->disks[i].rdev->bdev); >> blk_unplug(r_queue); >> } >> + rcu_read_unlock(); >> } >> >> static int linear_congested(void *data, int bits) >> { >> mddev_t *mddev = data; >> - linear_conf_t *conf = mddev_to_conf(mddev); >> + linear_conf_t *conf; >> int i, ret = 0; >> >> + rcu_read_lock(); >> + conf = rcu_dereference(mddev->private); >> + >> for (i = 0; i < mddev->raid_disks && !ret ; i++) { >> struct request_queue *q = bdev_get_queue(conf->disks[i].rdev->bdev); >> ret |= bdi_congested(&q->backing_dev_info, bits); >> } >> + >> + rcu_read_unlock(); >> return ret; >> } >> >> >> On Wed, Jun 17, 2009 at 12:05 PM, SandeepKsinha<sandeepksinha@gmail.com> wrote: >>> Neil, >>> >>> On Mon, Jun 15, 2009 at 4:26 AM, Neil Brown<neilb@suse.de> wrote: >>>> >>>> Hi, >>>> Thanks for this patch, and sorry for the delay in reviewing it. 
>>>> >>>> I have a few issues: >>>> >>>> On Saturday June 6, sandeepksinha@gmail.com wrote: >>>>> Signed-off-by: Sandeep K Sinha <sandeepksinha@gmail.com> >>>>> >>>>> Protecting mddev with barriers to avoid races. >>>> >>>> 1/ You need a lot more of an explanatory comment than this. >>>> At least give some hint as to what the races are. >>>> Give than the rcu primitives are used, it now makes sense to use >>>> e.g. call_rcu to free the old 'conf'. That might reasonably be in a >>>> separate patch, but the comment on this patch should at least at that >>>> possibility. >>>>> >>> >>> Sure. I shall do it for the final patch. I will also take care of this >>> henceforth. >>> >>>>> diff --git a/drivers/md/linear.c b/drivers/md/linear.c >>>>> index 9ad6ec4..a56095c 100644 >>>>> --- a/drivers/md/linear.c >>>>> +++ b/drivers/md/linear.c >>>>> @@ -28,9 +28,11 @@ >>>>> static inline dev_info_t *which_dev(mddev_t *mddev, sector_t sector) >>>>> { >>>>> int lo, mid, hi; >>>>> - linear_conf_t *conf = mddev_to_conf(mddev); >>>>> + linear_conf_t *conf; >>>>> >>>>> + rcu_read_lock(); >>>>> lo = 0; >>>>> + conf = rcu_dereference(mddev->private); >>>>> hi = mddev->raid_disks - 1; >>>>> >>>> >>>> 2/ mddev->raid_disks should really be dereferenced before 'conf'. >>>> Doing it the way you have done it, the 'raid_disks' value could be >>>> larger than the value supported by the 'conf' so things could >>>> go wrong. >>>> >>> Agreed. I hope you are referring to the case where a disk is in the >>> process of being added to an array. Is that right ? >>> Kindly confirm. >>>> >>>>> /* >>>>> @@ -45,7 +47,7 @@ static inline dev_info_t *which_dev(mddev_t *mddev, >>>>> sector_t sector) >>>>> else >>>>> lo = mid + 1; >>>>> } >>>>> - >>>>> + rcu_read_unlock(); >>>>> return conf->disks + lo; >>>>> } >>>> >>>> 3/ We are accessing conf->disks well after the rcu_lock has been released. >>>> That is not exactly a problem with the code as it stands. 
But if >>>> we do go ahead and free the old 'conf' with call_rcu, then this >>>> becomes racy. >>>> We should hold the rcu_read_lock for the entire time that we are >>>> accessing the contents of 'conf'. >>>> >>> True. >>> >>>> That means we don't take rcu_read_lock in which_dev, but rather >>>> take it in the two functions that call which_dev. >>>> >>> >>> I have made that change. >>>>> >>>>> @@ -86,36 +88,49 @@ static int linear_mergeable_bvec(struct request_queue *q, >>>>> static void linear_unplug(struct request_queue *q) >>>>> { >>>>> mddev_t *mddev = q->queuedata; >>>>> - linear_conf_t *conf = mddev_to_conf(mddev); >>>>> + linear_conf_t *conf; >>>>> int i; >>>>> >>>>> + rcu_read_lock(); >>>>> + conf = rcu_dereference(mddev->private); >>>>> + >>>>> for (i=0; i < mddev->raid_disks; i++) { >>>>> struct request_queue *r_queue = bdev_get_queue(conf->disks[i].rdev->bdev); >>>>> blk_unplug(r_queue); >>>>> } >>>>> + rcu_read_unlock(); >>>>> } >>>>> >>>>> static int linear_congested(void *data, int bits) >>>>> { >>>>> mddev_t *mddev = data; >>>>> - linear_conf_t *conf = mddev_to_conf(mddev); >>>>> + linear_conf_t *conf; >>>>> int i, ret = 0; >>>>> >>>>> + rcu_read_lock(); >>>>> + conf = rcu_dereference(mddev->private); >>>>> + >>>>> for (i = 0; i < mddev->raid_disks && !ret ; i++) { >>>>> struct request_queue *q = bdev_get_queue(conf->disks[i].rdev->bdev); >>>>> ret |= bdi_congested(&q->backing_dev_info, bits); >>>>> } >>>>> + >>>>> + rcu_read_unlock(); >>>>> return ret; >>>>> } >>>>> >>>>> static sector_t linear_size(mddev_t *mddev, sector_t sectors, int raid_disks) >>>>> { >>>>> - linear_conf_t *conf = mddev_to_conf(mddev); >>>>> - >>>>> + linear_conf_t *conf; >>>>> + sector_t array_sectors; >>>>> + rcu_read_lock(); >>>>> + conf = rcu_dereference(mddev->private); >>>>> WARN_ONCE(sectors || raid_disks, >>>>> "%s does not support generic reshape\n", __func__); >>>>> - >>>>> - return conf->array_sectors; >>>>> + array_sectors = conf->array_sectors; >>>>> + 
rcu_read_unlock(); >>>>> + >>>>> + return array_sectors; >>>>> } >>>>> >>>>> static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks) >>>>> @@ -215,15 +230,14 @@ static int linear_add(mddev_t *mddev, mdk_rdev_t *rdev) >>>>> return -EINVAL; >>>>> >>>>> rdev->raid_disk = rdev->saved_raid_disk; >>>>> - >>>>> - newconf = linear_conf(mddev,mddev->raid_disks+1); >>>>> + newconf = linear_conf(mddev,mddev->raid_disks + 1); >>>>> >>>>> if (!newconf) >>>>> return -ENOMEM; >>>>> >>>>> newconf->prev = mddev_to_conf(mddev); >>>>> - mddev->private = newconf; >>>>> mddev->raid_disks++; >>>>> + rcu_assign_pointer(mddev->private,newconf); >>>>> md_set_array_sectors(mddev, linear_size(mddev, 0, 0)); >>>>> set_capacity(mddev->gendisk, mddev->array_sectors); >>>>> return 0; >>>>> @@ -231,14 +245,17 @@ static int linear_add(mddev_t *mddev, mdk_rdev_t *rdev) >>>>> >>>>> static int linear_stop (mddev_t *mddev) >>>>> { >>>>> - linear_conf_t *conf = mddev_to_conf(mddev); >>>>> - >>>>> + linear_conf_t *conf; >>>>> + >>>>> + rcu_read_lock(); >>>>> + conf = rcu_dereference(mddev->private); >>>>> blk_sync_queue(mddev->queue); /* the unplug fn references 'conf'*/ >>>>> do { >>>>> linear_conf_t *t = conf->prev; >>>>> kfree(conf); >>>>> conf = t; >>>>> } while (conf); >>>>> + rcu_read_unlock(); >>>>> >>>> >>>> 4/ We don't need the rcu protection here as we hold ->reconfig_mutex >>>> both in linear_add and linear_stop, so they cannot race. >>>> Adding a comment to this effect might be a good idea though. >>>> >>> >>> Fine. Shall do this as well. >>> >>> The new patch will follow soon. >>> >>>> Thanks, >>>> >>>> NeilBrown >>>> >>>> >>>>> return 0; >>>>> } >>>>> -- >>>>> Regards, >>>>> Sandeep. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> “To learn is to change. Education is a process that changes the learner.” >>>> >>> >>> >>> >>> -- >>> Regards, >>> Sandeep. >>> >>> >>> >>> >>> >>> >>> “To learn is to change. 
Education is a process that changes the learner.” -- Regards, Sandeep.
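As background for the which_dev() hunks quoted throughout this thread: the function binary-searches the per-disk cumulative end_sector values to find the member disk that holds a given sector. A minimal userspace model of that search (the struct is a stand-in for dev_info_t, and the three-disk layout is invented for illustration):

```c
#include <assert.h>

/* Invented stand-in for dev_info_t: end_sector is cumulative, i.e. the
 * first sector past the end of that member disk. */
struct dev_info {
    unsigned long long end_sector;
};

/* Mirror of the binary search in which_dev(): return the index of the
 * first disk whose end_sector is strictly greater than 'sector'. */
static int which_dev_index(const struct dev_info *disks, int raid_disks,
                           unsigned long long sector)
{
    int lo = 0;
    int hi = raid_disks - 1;

    while (hi > lo) {
        int mid = (hi + lo) / 2;

        if (sector < disks[mid].end_sector)
            hi = mid;
        else
            lo = mid + 1;
    }
    return lo;
}

/* Hypothetical three-disk layout: disks of 100, 150 and 150 sectors. */
static const struct dev_info example_disks[3] = {
    { 100 }, { 250 }, { 400 }
};
```

The RCU concern in the review is that both conf (the disks array) and mddev->raid_disks feed this search, so a reader must see a consistent pair of the two.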
* Re: [PATCH] md linear: Protecting mddev with rcu locks to avoid races in 2009-06-17 8:48 ` SandeepKsinha @ 2009-06-17 9:14 ` Sujit Karataparambil 2009-06-17 9:32 ` SandeepKsinha 0 siblings, 1 reply; 14+ messages in thread From: Sujit Karataparambil @ 2009-06-17 9:14 UTC (permalink / raw) To: SandeepKsinha; +Cc: Neil Brown, Linux RAID Sorry, this was not my question. What I am asking is whether md is built on top of the raid statically or is a dynamic part of the raid array. In my experience md is built statically. Your answer > mddev->raid_disks can change, when you add disks and also in the mean > while when you are updating the conf, the older conf might be being > used at various other places. does not specify either. It seems to be from manual pages that try to administer raid arrays.
* Re: [PATCH] md linear: Protecting mddev with rcu locks to avoid races in 2009-06-17 9:14 ` Sujit Karataparambil @ 2009-06-17 9:32 ` SandeepKsinha 2009-06-17 9:37 ` Sujit Karataparambil 0 siblings, 1 reply; 14+ messages in thread From: SandeepKsinha @ 2009-06-17 9:32 UTC (permalink / raw) To: Sujit Karataparambil; +Cc: Neil Brown, Linux RAID Hi Sujit, On Wed, Jun 17, 2009 at 2:44 PM, Sujit Karataparambil<sjt.kar@gmail.com> wrote: > Sorry this was not my question. What I am asking is whether md is built on > top of the raid statically or is it an dynamic part of the raid array. > What makes you think that md is static when your array itself can grow? md contains information pertaining to the array it holds. > As with my experiance md is built statically. Your answer > >> mddev->raid_disks can change, when you add disks and also in the mean >> while when you are updating the conf, the older conf might be being >> used at various other places. > > does not speicify either. It seems to be from manual pages that tries > to administrate > raid array's. > -- Regards, Sandeep. “To learn is to change. Education is a process that changes the learner.”
* Re: [PATCH] md linear: Protecting mddev with rcu locks to avoid races in 2009-06-17 9:32 ` SandeepKsinha @ 2009-06-17 9:37 ` Sujit Karataparambil 2009-06-17 10:01 ` Neil Brown 0 siblings, 1 reply; 14+ messages in thread From: Sujit Karataparambil @ 2009-06-17 9:37 UTC (permalink / raw) To: SandeepKsinha; +Cc: Neil Brown, Linux RAID Sorry, what your patch claims is that the raid array can change the number of disks it holds; otherwise there would be no need for RCU. > What makes you think that md is static when you array itself can grow. > md contains information pertaining to the array it holds. This is what I am saying: md is built statically on top of the raid. -- -- Sujit K M
* Re: [PATCH] md linear: Protecting mddev with rcu locks to avoid races in 2009-06-17 9:37 ` Sujit Karataparambil @ 2009-06-17 10:01 ` Neil Brown [not found] ` <37d33d830906170315k4087d532nc2426879c2063fd7@mail.gmail.com> 0 siblings, 1 reply; 14+ messages in thread From: Neil Brown @ 2009-06-17 10:01 UTC (permalink / raw) To: Sujit Karataparambil; +Cc: SandeepKsinha, Linux RAID On Wednesday June 17, sjt.kar@gmail.com wrote: > Sorry what your patch claims is that the raid array can change the number of > array's otherwise there is no need for an rcu. > > > What makes you think that md is static when you array itself can grow. > > md contains information pertaining to the array it holds. > > This is what I saying my the md is built statically on top of the raid. md/linear allows an array to grow while it is online. You can add a drive to a 'linear' array and it will become larger to include that drive. So the array is not completely static. NeilBrown
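Neil's observation that a linear array can grow while online is the reason the patch replaces the plain `mddev->private = newconf` store with rcu_assign_pointer(): the new conf must be fully initialized before any reader can observe the pointer. Here is a userspace model of that publish/subscribe discipline using C11 release/acquire atomics (this models only the barrier pairing — the kernel primitives are rcu_assign_pointer()/rcu_dereference(), and the struct and function names below are invented):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Invented miniature 'conf': readers must never observe the published
 * pointer before both fields behind it are initialized. */
struct mini_conf {
    int raid_disks;
    unsigned long long array_sectors;
};

static _Atomic(struct mini_conf *) shared_conf;

/* Update side: build the new conf completely, then publish it with a
 * release store -- the userspace analogue of rcu_assign_pointer().
 * (The old conf is leaked here; the kernel series frees it via call_rcu.) */
static void publish_conf(int raid_disks, unsigned long long sectors)
{
    struct mini_conf *c = malloc(sizeof(*c));

    c->raid_disks = raid_disks;            /* initialize first... */
    c->array_sectors = sectors;
    atomic_store_explicit(&shared_conf, c,
                          memory_order_release); /* ...publish last */
}

/* Read side: acquire load, the analogue of rcu_dereference(). */
static struct mini_conf *read_conf(void)
{
    return atomic_load_explicit(&shared_conf, memory_order_acquire);
}
```

Without the release/publish ordering, a concurrent reader could see the new pointer while the fields behind it still hold garbage — the "lack of memory ordering guarantees" the patch description refers to.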
[parent not found: <37d33d830906170315k4087d532nc2426879c2063fd7@mail.gmail.com>]
* Re: [PATCH] md linear: Protecting mddev with rcu locks to avoid races in [not found] ` <37d33d830906170315k4087d532nc2426879c2063fd7@mail.gmail.com> @ 2009-06-17 10:17 ` SandeepKsinha 2009-06-17 23:38 ` [PATCH] md linear: Protecting mddev with rcu locks to avoid races Neil Brown 0 siblings, 1 reply; 14+ messages in thread From: SandeepKsinha @ 2009-06-17 10:17 UTC (permalink / raw) To: Neil Brown, Linux RAID [-- Attachment #1: Type: text/plain, Size: 4966 bytes --] Hi Neil, I am attaching the patch file as I am seeing some issues with copy-paste into the mail. On Wed, Jun 17, 2009 at 3:45 PM, SandeepKsinha<sandeepksinha@gmail.com> wrote: > Hi Neil, > My Mistake. I missed "stg refresh". Here is the updated one. > Sorry for the inconvinience. > > Signed-off-by: Sandeep K Sinha <sandeepksinha@gmail.com> > Due to the lack of memory ordering guarantees, we may have races around > mddev->conf. This patch addresses the same using rcu protection to avoid > such race conditions. > diff --git a/drivers/md/linear.c b/drivers/md/linear.c > index 9ad6ec4..f254a8a 100644 > --- a/drivers/md/linear.c > +++ b/drivers/md/linear.c > @@ -28,10 +28,11 @@ > static inline dev_info_t *which_dev(mddev_t *mddev, sector_t sector) > { > int lo, mid, hi; > - linear_conf_t *conf = mddev_to_conf(mddev); > - > + linear_conf_t *conf; > + > lo = 0; > hi = mddev->raid_disks - 1; > + conf = rcu_dereference(mddev->private); > > /* > * Binary Search > @@ -45,7 +46,6 @@ static inline dev_info_t *which_dev(mddev_t *mddev, > sector_t sector) > else > lo = mid + 1; > } > - > return conf->disks + lo; > } > > @@ -66,7 +66,9 @@ static int linear_mergeable_bvec(struct request_queue *q, > unsigned long maxsectors, bio_sectors = bvm->bi_size >> 9; > sector_t sector = bvm->bi_sector + get_start_sect(bvm->bi_bdev); > > + rcu_read_lock(); > dev0 = which_dev(mddev, sector); > + rcu_read_unlock(); > maxsectors = dev0->end_sector - sector; > > if (maxsectors < bio_sectors) > @@ -86,36 +88,49 @@ static int 
linear_mergeable_bvec(struct request_queue > *q, > static void linear_unplug(struct request_queue *q) > { > mddev_t *mddev = q->queuedata; > - linear_conf_t *conf = mddev_to_conf(mddev); > + linear_conf_t *conf; > int i; > > + rcu_read_lock(); > + conf = rcu_dereference(mddev->private); > + > for (i=0; i < mddev->raid_disks; i++) { > struct request_queue *r_queue = bdev_get_queue(conf->disks[i].rdev->bdev); > blk_unplug(r_queue); > } > + rcu_read_unlock(); > } > > static int linear_congested(void *data, int bits) > { > mddev_t *mddev = data; > - linear_conf_t *conf = mddev_to_conf(mddev); > + linear_conf_t *conf; > int i, ret = 0; > > + rcu_read_lock(); > + conf = rcu_dereference(mddev->private); > + > for (i = 0; i < mddev->raid_disks && !ret ; i++) { > struct request_queue *q = bdev_get_queue(conf->disks[i].rdev->bdev); > ret |= bdi_congested(&q->backing_dev_info, bits); > } > + > + rcu_read_unlock(); > return ret; > } > > static sector_t linear_size(mddev_t *mddev, sector_t sectors, int > raid_disks) > { > - linear_conf_t *conf = mddev_to_conf(mddev); > - > + linear_conf_t *conf; > + sector_t array_sectors; > + rcu_read_lock(); > + conf = rcu_dereference(mddev->private); > WARN_ONCE(sectors || raid_disks, > "%s does not support generic reshape\n", __func__); > - > - return conf->array_sectors; > + array_sectors = conf->array_sectors; > + rcu_read_unlock(); > + > + return array_sectors; > } > > static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks) > @@ -215,15 +230,14 @@ static int linear_add(mddev_t *mddev, mdk_rdev_t > *rdev) > return -EINVAL; > > rdev->raid_disk = rdev->saved_raid_disk; > - > - newconf = linear_conf(mddev,mddev->raid_disks+1); > + newconf = linear_conf(mddev,mddev->raid_disks + 1); > > if (!newconf) > return -ENOMEM; > > newconf->prev = mddev_to_conf(mddev); > - mddev->private = newconf; > mddev->raid_disks++; > + rcu_assign_pointer(mddev->private,newconf); > md_set_array_sectors(mddev, linear_size(mddev, 0, 0)); > 
set_capacity(mddev->gendisk, mddev->array_sectors); > return 0; > @@ -231,8 +245,15 @@ static int linear_add(mddev_t *mddev, mdk_rdev_t *rdev) > > static int linear_stop (mddev_t *mddev) > { > - linear_conf_t *conf = mddev_to_conf(mddev); > - > + linear_conf_t *conf; > + > + /* > + * We do not require rcu protection here since > + * we hold reconfig_mutex for both linear_add and > + * linear_stop, so they cannot race. > + */ > + > + conf = rcu_dereference(mddev->private); > blk_sync_queue(mddev->queue); /* the unplug fn references 'conf'*/ > do { > linear_conf_t *t = conf->prev; > @@ -262,7 +283,9 @@ static int linear_make_request (struct request_queue *q, > struct bio *bio) > bio_sectors(bio)); > part_stat_unlock(); > > + rcu_read_lock(); > tmp_dev = which_dev(mddev, bio->bi_sector); > + rcu_read_unlock(); > start_sector = tmp_dev->end_sector - tmp_dev->rdev->sectors; > > if (unlikely(bio->bi_sector >= (tmp_dev->end_sector) > -- > Regards, > Sandeep. > > > > > > > “To learn is to change. Education is a process that changes the learner.” > -- Regards, Sandeep. “To learn is to change. Education is a process that changes the learner.” [-- Attachment #2: linear-rcu.patch --] [-- Type: text/x-patch, Size: 4120 bytes --] commit 848cea3974a6c63a8a810b8e74c87a42ea727399 Author: Sandeep K Sinha <sandeepksinha@gmail.com> Date: Wed Jun 17 15:33:37 2009 +0530 Due to the lack of memory ordering guarantees, we may have races around mddev->conf. This patch addresses the same using rcu protection to avoid such race conditions. 
diff --git a/drivers/md/linear.c b/drivers/md/linear.c index 9ad6ec4..f254a8a 100644 --- a/drivers/md/linear.c +++ b/drivers/md/linear.c @@ -28,10 +28,11 @@ static inline dev_info_t *which_dev(mddev_t *mddev, sector_t sector) { int lo, mid, hi; - linear_conf_t *conf = mddev_to_conf(mddev); - + linear_conf_t *conf; + lo = 0; hi = mddev->raid_disks - 1; + conf = rcu_dereference(mddev->private); /* * Binary Search @@ -45,7 +46,6 @@ static inline dev_info_t *which_dev(mddev_t *mddev, sector_t sector) else lo = mid + 1; } - return conf->disks + lo; } @@ -66,7 +66,9 @@ static int linear_mergeable_bvec(struct request_queue *q, unsigned long maxsectors, bio_sectors = bvm->bi_size >> 9; sector_t sector = bvm->bi_sector + get_start_sect(bvm->bi_bdev); + rcu_read_lock(); dev0 = which_dev(mddev, sector); + rcu_read_unlock(); maxsectors = dev0->end_sector - sector; if (maxsectors < bio_sectors) @@ -86,36 +88,49 @@ static int linear_mergeable_bvec(struct request_queue *q, static void linear_unplug(struct request_queue *q) { mddev_t *mddev = q->queuedata; - linear_conf_t *conf = mddev_to_conf(mddev); + linear_conf_t *conf; int i; + rcu_read_lock(); + conf = rcu_dereference(mddev->private); + for (i=0; i < mddev->raid_disks; i++) { struct request_queue *r_queue = bdev_get_queue(conf->disks[i].rdev->bdev); blk_unplug(r_queue); } + rcu_read_unlock(); } static int linear_congested(void *data, int bits) { mddev_t *mddev = data; - linear_conf_t *conf = mddev_to_conf(mddev); + linear_conf_t *conf; int i, ret = 0; + rcu_read_lock(); + conf = rcu_dereference(mddev->private); + for (i = 0; i < mddev->raid_disks && !ret ; i++) { struct request_queue *q = bdev_get_queue(conf->disks[i].rdev->bdev); ret |= bdi_congested(&q->backing_dev_info, bits); } + + rcu_read_unlock(); return ret; } static sector_t linear_size(mddev_t *mddev, sector_t sectors, int raid_disks) { - linear_conf_t *conf = mddev_to_conf(mddev); - + linear_conf_t *conf; + sector_t array_sectors; + rcu_read_lock(); + conf = 
rcu_dereference(mddev->private); WARN_ONCE(sectors || raid_disks, "%s does not support generic reshape\n", __func__); - - return conf->array_sectors; + array_sectors = conf->array_sectors; + rcu_read_unlock(); + + return array_sectors; } static linear_conf_t *linear_conf(mddev_t *mddev, int raid_disks) @@ -215,15 +230,14 @@ static int linear_add(mddev_t *mddev, mdk_rdev_t *rdev) return -EINVAL; rdev->raid_disk = rdev->saved_raid_disk; - - newconf = linear_conf(mddev,mddev->raid_disks+1); + newconf = linear_conf(mddev,mddev->raid_disks + 1); if (!newconf) return -ENOMEM; newconf->prev = mddev_to_conf(mddev); - mddev->private = newconf; mddev->raid_disks++; + rcu_assign_pointer(mddev->private,newconf); md_set_array_sectors(mddev, linear_size(mddev, 0, 0)); set_capacity(mddev->gendisk, mddev->array_sectors); return 0; @@ -231,8 +245,15 @@ static int linear_add(mddev_t *mddev, mdk_rdev_t *rdev) static int linear_stop (mddev_t *mddev) { - linear_conf_t *conf = mddev_to_conf(mddev); - + linear_conf_t *conf; + + /* + * We do not require rcu protection here since + * we hold reconfig_mutex for both linear_add and + * linear_stop, so they cannot race. + */ + + conf = rcu_dereference(mddev->private); blk_sync_queue(mddev->queue); /* the unplug fn references 'conf'*/ do { linear_conf_t *t = conf->prev; @@ -262,7 +283,9 @@ static int linear_make_request (struct request_queue *q, struct bio *bio) bio_sectors(bio)); part_stat_unlock(); + rcu_read_lock(); tmp_dev = which_dev(mddev, bio->bi_sector); + rcu_read_unlock(); start_sector = tmp_dev->end_sector - tmp_dev->rdev->sectors; if (unlikely(bio->bi_sector >= (tmp_dev->end_sector) ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] md linear: Protecting mddev with rcu locks to avoid races 2009-06-17 10:17 ` SandeepKsinha @ 2009-06-17 23:38 ` Neil Brown 0 siblings, 0 replies; 14+ messages in thread From: Neil Brown @ 2009-06-17 23:38 UTC (permalink / raw) To: SandeepKsinha; +Cc: Linux RAID On Wednesday June 17, sandeepksinha@gmail.com wrote: > Hi Neil, > > I am attaching the patch file as I am seeing some issues with > copy-paste into the mail. > Thanks. Much better. I possibly didn't explain myself properly when saying that the rcu_read_lock needed to be taken outside of the call to which_dev. The reason was that it returns a pointer which still needs to be protected. It possibly didn't matter for the code as it stood, but I wanted to use call_rcu to free the old conf, so it makes sense to protect the whole usage of the structure. So I made that little change. Then added a patch to free the old conf with call_rcu. It is all now pushed out and you can see the final version of this patch at http://neil.brown.name/git?p=md;a=commitdiff;h=af11c397fd8835c70ec0bb777104e4ab98b2d660 Thanks, NeilBrown
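Neil's follow-up patch frees the old conf with call_rcu() after a grace period, instead of freeing it while a reader may still hold a pointer into it. A toy userspace model of the deferred-free idea (the grace period is collapsed into an explicit drain call — real RCU waits for all pre-existing readers — and every name here is hypothetical):

```c
#include <assert.h>
#include <stdlib.h>

/* Toy deferred-free queue standing in for call_rcu(): the callback is
 * only queued now, and runs when the simulated grace period ends. */
struct toy_rcu_cb {
    void (*func)(void *arg);
    void *arg;
    struct toy_rcu_cb *next;
};

static struct toy_rcu_cb *toy_pending;

static void toy_call_rcu(void (*func)(void *), void *arg)
{
    struct toy_rcu_cb *cb = malloc(sizeof(*cb));

    cb->func = func;
    cb->arg = arg;
    cb->next = toy_pending;
    toy_pending = cb;
}

/* Stand-in for the end of a grace period: by now no reader can still
 * hold a pointer to the old conf, so run every queued callback. */
static void toy_grace_period_elapsed(void)
{
    while (toy_pending) {
        struct toy_rcu_cb *cb = toy_pending;

        toy_pending = cb->next;
        cb->func(cb->arg);
        free(cb);
    }
}

/* A callback that just counts invocations, so the deferral is visible. */
static int toy_freed;

static void toy_free_conf(void *arg)
{
    (void)arg;
    toy_freed++;
}
```

The point is the window between queueing and running: the old conf stays valid for readers that entered their critical section before the pointer was switched, and is reclaimed only afterwards.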
* Re: [PATCH] md linear: Protecting mddev with rcu locks to avoid races in 2009-06-17 6:46 ` SandeepKsinha 2009-06-17 8:02 ` Sujit Karataparambil @ 2009-06-17 9:59 ` Neil Brown 1 sibling, 0 replies; 14+ messages in thread From: Neil Brown @ 2009-06-17 9:59 UTC (permalink / raw) To: SandeepKsinha; +Cc: Linux RAID On Wednesday June 17, sandeepksinha@gmail.com wrote: > Here goes the updated patch. The patch is corrupt. It looks like there are about 15 lines missing at the end (at least). Also, you are still doing the locking inside which_dev. I said >
* Re: [PATCH] md linear: Protecting mddev with rcu locks to avoid races in 2009-06-17 6:35 ` SandeepKsinha 2009-06-17 6:46 ` SandeepKsinha @ 2009-06-17 6:50 ` NeilBrown 1 sibling, 0 replies; 14+ messages in thread From: NeilBrown @ 2009-06-17 6:50 UTC (permalink / raw) To: SandeepKsinha; +Cc: Linux RAID On Wed, June 17, 2009 4:35 pm, SandeepKsinha wrote: >> >> 2/ mddev->raid_disks should really be dereferenced before 'conf'. >> Doing it the way you have done it, the 'raid_disks' value could be >> larger than the value supported by the 'conf' so things could >> go wrong. >> > Agreed. I hope you are referring to the case where a disk is in the > process of being added to an array. Is that right ? > Kindly confirm. Yes, that is correct. >> 4/ We don't need the rcu protection here as we hold ->reconfig_mutex >> both in linear_add and linear_stop, so they cannot race. >> Adding a comment to this effect might be a good idea though. >> > > Fine. Shall do this as well. > > The new patch will follow soon. > Thanks! NeilBrown
Thread overview: 14+ messages
2009-06-06 15:24 [PATCH] md linear: Protecting mddev with rcu locks to avoid races in SandeepKsinha
2009-06-14 22:56 ` Neil Brown
2009-06-17 6:35 ` SandeepKsinha
2009-06-17 6:46 ` SandeepKsinha
2009-06-17 8:02 ` Sujit Karataparambil
2009-06-17 8:48 ` SandeepKsinha
2009-06-17 9:14 ` Sujit Karataparambil
2009-06-17 9:32 ` SandeepKsinha
2009-06-17 9:37 ` Sujit Karataparambil
2009-06-17 10:01 ` Neil Brown
[not found] ` <37d33d830906170315k4087d532nc2426879c2063fd7@mail.gmail.com>
2009-06-17 10:17 ` SandeepKsinha
2009-06-17 23:38 ` [PATCH] md linear: Protecting mddev with rcu locks to avoid races Neil Brown
2009-06-17 9:59 ` [PATCH] md linear: Protecting mddev with rcu locks to avoid races in Neil Brown
2009-06-17 6:50 ` NeilBrown