All of lore.kernel.org
* [PATCHES]: dm lock optimization
@ 2012-04-19  3:03 Mikulas Patocka
  2012-04-19  5:17 ` Jun'ichi Nomura
  2012-04-23 13:14 ` Joe Thornber
  0 siblings, 2 replies; 10+ messages in thread
From: Mikulas Patocka @ 2012-04-19  3:03 UTC (permalink / raw)
  To: Alasdair G. Kergon, dm-devel

Hi

I have placed the dm lock optimization patches here:

http://people.redhat.com/mpatocka/patches/kernel/dm-lock-optimization/

The patches remove cache-line bouncing from device mapper when multiple 
processors submit requests simultaneously.


I was able to measure a performance improvement, although only in a very 
specific setup (with normal disks the difference is unmeasurable).

The setup to measure it is:
- two quad-core Opterons
- use ramdisk as a block device, leave it empty
- create 11 nested dm-linear mappings on it (so that each i/o passes 
through 11 levels of dm-linear and then goes to the ramdisk)
- run fio with 8 threads, using direct-io to read 512-byte blocks
(time fio --rw=randread --size=1G --bsrange=512-512 --direct=1 
--filename=/dev/mapper/dm-test --name=job1 --name=job2 --name=job3 
--name=job4 --name=job5 --name=job6 --name=job7 --name=job8)

With this setup the benefit is measurable; total run time drops with each 
patch applied (roughly a 2x speedup with all five):

no patch (3.3 kernel):  68.9s
patch 1:                64.3s
patch 1,2:              52.3s
patch 1,2,3:            45.2s
patch 1,2,3,4:          41.2s
patch 1,2,3,4,5:        34.0s

Mikulas

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHES]: dm lock optimization
  2012-04-19  3:03 [PATCHES]: dm lock optimization Mikulas Patocka
@ 2012-04-19  5:17 ` Jun'ichi Nomura
  2012-04-21 16:17   ` Mikulas Patocka
  2012-04-23 13:14 ` Joe Thornber
  1 sibling, 1 reply; 10+ messages in thread
From: Jun'ichi Nomura @ 2012-04-19  5:17 UTC (permalink / raw)
  To: dm-devel

Hello Mikulas,

On 04/19/12 12:03, Mikulas Patocka wrote:
> http://people.redhat.com/mpatocka/patches/kernel/dm-lock-optimization/
> 
> The patches remove cache-line bouncing from device mapper when multiple 
> processors submit requests simultaneously.

I think the optimization idea is nice.

From a quick look, however, I have some comments:

dm-optimize-no-irqsave-map-lock.patch:
  - dm_get_live_table() is called from request_fn,
    which can be called from in_interrupt context,
    so the added BUG_ON will be triggered.

dm-optimize-percpu-io-lock.patch:
  - The following existing mechanisms could be used:
      * include/linux/rcupdate.h
      * include/linux/lglock.h
    (or extended if necessary).

dm-optimize-get_live_table_fast.patch:
  - dm_lld_busy() can (theoretically, if dm-mpath is stacked)
    be called from in_interrupt context and trigger the
    BUG_ON.

-- 
Jun'ichi Nomura, NEC Corporation


* Re: [PATCHES]: dm lock optimization
  2012-04-19  5:17 ` Jun'ichi Nomura
@ 2012-04-21 16:17   ` Mikulas Patocka
  2012-04-23 10:56     ` Jun'ichi Nomura
  0 siblings, 1 reply; 10+ messages in thread
From: Mikulas Patocka @ 2012-04-21 16:17 UTC (permalink / raw)
  To: device-mapper development; +Cc: Jun'ichi Nomura, Alasdair G. Kergon



On Thu, 19 Apr 2012, Jun'ichi Nomura wrote:

> Hello Mikulas,
> 
> On 04/19/12 12:03, Mikulas Patocka wrote:
> > http://people.redhat.com/mpatocka/patches/kernel/dm-lock-optimization/
> > 
> > The patches remove cache-line bouncing from device mapper when multiple 
> > processors submit requests simultaneously.
> 
> I think the idea of optimization is nice.
> 
> From a quick look, however, I have some comments:
> 
> dm-optimize-no-irqsave-map-lock.patch:
>   - dm_get_live_table() is called from request_fn,
>     that can be called from in_interrupt context.
>     So the added BUG_ON will be triggered.
> 
> dm-optimize-percpu-io-lock.patch:
>   - The following existing mechanisms could be used:
>       * include/linux/rcupdate.h
>       * include/linux/lglock.h
>     (or extended if necessary).
> 
> dm-optimize-get_live_table_fast.patch:
>   - dm_lld_busy() can (theoretically, if dm-mpath is stacked)
>     be called from in_interrupt context and trigger the
>     BUG_ON.
> 
> -- 
> Jun'ichi Nomura, NEC Corporation

Hi

I created new patches that use RCU instead of map_lock, so they address 
the issues you mentioned. Get the new patches here:
http://people.redhat.com/mpatocka/patches/kernel/dm-lock-optimization/

Performance with the new patches (same benchmark, in seconds):
no patch:               69.3
patch 1:                54.0
patch 1,2:              44.2
patch 1,2,3:            39.8
patch 1,2,3,4:          32.7

Mikulas


* Re: [PATCHES]: dm lock optimization
  2012-04-21 16:17   ` Mikulas Patocka
@ 2012-04-23 10:56     ` Jun'ichi Nomura
  2012-05-02  2:17       ` Mikulas Patocka
  0 siblings, 1 reply; 10+ messages in thread
From: Jun'ichi Nomura @ 2012-04-23 10:56 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: device-mapper development, Alasdair G. Kergon

Hi Mikulas,

On 04/22/12 01:17, Mikulas Patocka wrote:
> I created new patches that use rcu instead of map_lock, so they address 
> the issues you mentioned. Get the new patches here:
> http://people.redhat.com/mpatocka/patches/kernel/dm-lock-optimization/
> 
> performance with new patches:
> no patch:               69.3
> patch 1:                54.0
> patch 1,2:              44.2
> patch 1,2,3:            39.8
> patch 1,2,3,4:          32.7

Thank you. I have two comments on the new patches.

synchronize_rcu could be put in dm_table_destroy() instead of __bind().
I think that's a safer place to wait.

io_lock could be converted to SRCU.
I.e. something like:
  On reader-side:
    idx = srcu_read_lock(io_srcu);
    if (!DMF_BLOCK_IO_FOR_SUSPEND)
      split_and_process_bio();
    srcu_read_unlock(io_srcu,idx);
  In dm_suspend:
    set_bit(DMF_BLOCK_IO_FOR_SUSPEND);
    mb();
    synchronize_srcu(io_srcu);
    <from here, nobody will enter split_and_process_bio>
That makes dm-optimize-percpu-io-lock.patch simpler.
dm-optimize-take-io_lock-on-table-swap.patch may become simpler, too.

-- 
Jun'ichi Nomura, NEC Corporation


* Re: [PATCHES]: dm lock optimization
  2012-04-19  3:03 [PATCHES]: dm lock optimization Mikulas Patocka
  2012-04-19  5:17 ` Jun'ichi Nomura
@ 2012-04-23 13:14 ` Joe Thornber
  2012-05-02  0:03   ` Mikulas Patocka
  1 sibling, 1 reply; 10+ messages in thread
From: Joe Thornber @ 2012-04-23 13:14 UTC (permalink / raw)
  To: device-mapper development; +Cc: Alasdair G. Kergon

On Wed, Apr 18, 2012 at 11:03:33PM -0400, Mikulas Patocka wrote:
> Hi
> 
> I placed dm lock optimization patches here
> 
> http://people.redhat.com/mpatocka/patches/kernel/dm-lock-optimization/

Just a heads up that sparse gives some warnings with these patches:

drivers/md/dm.c:689:24: warning: context imbalance in 'dm_get_live_table_fast' - wrong count at exit
drivers/md/dm.c:695:13: warning: context imbalance in 'dm_put_live_table_fast' - unexpected unlock
drivers/md/dm.c:1789:17: warning: context imbalance in 'dm_request_fn' - unexpected unlock

- Joe


* Re: [PATCHES]: dm lock optimization
  2012-04-23 13:14 ` Joe Thornber
@ 2012-05-02  0:03   ` Mikulas Patocka
  0 siblings, 0 replies; 10+ messages in thread
From: Mikulas Patocka @ 2012-05-02  0:03 UTC (permalink / raw)
  To: Joe Thornber; +Cc: device-mapper development, Alasdair G. Kergon



On Mon, 23 Apr 2012, Joe Thornber wrote:

> On Wed, Apr 18, 2012 at 11:03:33PM -0400, Mikulas Patocka wrote:
> > Hi
> > 
> > I placed dm lock optimization patches here
> > 
> > http://people.redhat.com/mpatocka/patches/kernel/dm-lock-optimization/
> 
> Just a heads up that sparse gives some warnings with these patches:
> 
> drivers/md/dm.c:689:24: warning: context imbalance in 'dm_get_live_table_fast' - wrong count at exit
> drivers/md/dm.c:695:13: warning: context imbalance in 'dm_put_live_table_fast' - unexpected unlock
> drivers/md/dm.c:1789:17: warning: context imbalance in 'dm_request_fn' - unexpected unlock
> 
> - Joe

I couldn't figure out how to silence the sparse warnings. For RCU you can 
mark functions that take/release the RCU lock with 
__acquires(RCU)/__releases(RCU), but I couldn't find how to do the same 
for SRCU.

If I use "__acquires(&md->io_barrier)":

struct dm_table *dm_get_live_table(struct mapped_device *md, int *srcu_idx)
	__acquires(&md->io_barrier)
{
        *srcu_idx = srcu_read_lock(&md->io_barrier);
        return rcu_dereference(md->map);
}

it doesn't work and I get a warning:
drivers/md/dm.c:557:9: warning: context imbalance in 'dm_get_live_table': unexpected unlock
drivers/md/dm.c:557:9:    default context: wanted 1, got 0

Maybe it is a bug in sparse, or maybe I am using the wrong annotation? Is 
there a reference for all the sparse annotations somewhere?

Mikulas


* Re: [PATCHES]: dm lock optimization
  2012-04-23 10:56     ` Jun'ichi Nomura
@ 2012-05-02  2:17       ` Mikulas Patocka
  2012-05-10  4:33         ` Jun'ichi Nomura
  0 siblings, 1 reply; 10+ messages in thread
From: Mikulas Patocka @ 2012-05-02  2:17 UTC (permalink / raw)
  To: Jun'ichi Nomura; +Cc: device-mapper development, Alasdair G. Kergon

Hi

I placed the new code using SRCU here: 
http://people.redhat.com/mpatocka/patches/kernel/dm-lock-optimization/

It removes io_lock, map_lock and holders, and replaces them with SRCU.


On Mon, 23 Apr 2012, Jun'ichi Nomura wrote:

> Hi Mikulas,
> 
> On 04/22/12 01:17, Mikulas Patocka wrote:
> > I created new patches that use rcu instead of map_lock, so they address 
> > the issues you mentioned. Get the new patches here:
> > http://people.redhat.com/mpatocka/patches/kernel/dm-lock-optimization/
> > 
> > performance with new patches:
> > no patch:               69.3
> > patch 1:                54.0
> > patch 1,2:              44.2
> > patch 1,2,3:            39.8
> > patch 1,2,3,4:          32.7
> 
> Thank you. I have 2 comments for the new patches.
> 
> synchronize_rcu could be put in dm_table_destroy() instead of __bind().
> I think it's safer place to wait.

I think the code is more readable if the RCU synchronization comes right 
after the assignment of the RCU-protected pointer.

> io_lock could be converted to SRCU.
> I.e. something like:
>   On reader-side:
>     idx = srcu_read_lock(io_srcu);
>     if (!DMF_BLOCK_IO_FOR_SUSPEND)
>       split_and_process_bio();
>     srcu_read_unlock(io_srcu,idx);
>   In dm_suspend:
>     set_bit(DMF_BLOCK_IO_FOR_SUSPEND);
>     mb();
>     synchronize_srcu(io_srcu);
>     <from here, nobody will enter split_and_process_bio>
> That makes dm-optimize-percpu-io-lock.patch simpler.
> dm-optimize-take-io_lock-on-table-swap.patch may become simpler, too.
> 
> -- 
> Jun'ichi Nomura, NEC Corporation

Mikulas


* Re: [PATCHES]: dm lock optimization
  2012-05-02  2:17       ` Mikulas Patocka
@ 2012-05-10  4:33         ` Jun'ichi Nomura
  2012-05-18  6:37           ` Mikulas Patocka
  0 siblings, 1 reply; 10+ messages in thread
From: Jun'ichi Nomura @ 2012-05-10  4:33 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: device-mapper development, Alasdair G. Kergon

Hi Mikulas,

On 05/02/12 11:17, Mikulas Patocka wrote:
> I placed the new code using srcu here: 
> http://people.redhat.com/mpatocka/patches/kernel/dm-lock-optimization/
> 
> It removes io_lock, map_lock and holders and replaces them with srcu.

I've reviewed the patches; here are some comments.
The first one appears to be a bug.
The others are minor comments about readability.

  - There are functions that destroy the inactive table
    without SRCU synchronization:
      * table_load
      * table_clear
      * do_resume
      * __hash_remove
    This could cause a use-after-free for users of
    dm_get_inactive_table().
    Synchronization is needed, outside of the exclusive hash_lock.

  - I couldn't see the reason why the locking is different
    for request-based devices in dm_wq_work().

  - We could use dm_get_live_table() in the caller of
    __split_and_process_bio() and pass the result to it.
    Then, naked use of srcu_read_lock/unlock and
    rcu_dereference can be eliminated.
    I think it helps readability.

  - md->map is directly referenced in dm_suspend/dm_resume.
    That's OK because md->suspend_lock is held and nobody
    can replace md->map, but I think it's worth putting a
    comment about this rule in 'struct mapped_device'.

The attached patch illustrates the above comments in code.

>> synchronize_rcu could be put in dm_table_destroy() instead of __bind().
>> I think it's safer place to wait.
> 
> I think the code is more readable if synchronizing rcu is just after 
> assigning the pointer that is protected by rcu.

OK. I don't object.

Thanks,
--- 
Jun'ichi Nomura, NEC Corporation

* Added a comment about locking for reading md->map
* dm_sync_table() to wait for other processes to exit from
  read-side critical section
* __split_and_process_bio() takes map
* __hash_remove() returns a table pointer to be destroyed later
* Added dm_sync_table() in functions in dm-ioctl.c to synchronize with
  inactive table users

Index: linux-3.3/drivers/md/dm.c
===================================================================
--- linux-3.3.orig/drivers/md/dm.c	2012-05-10 12:19:10.977242964 +0900
+++ linux-3.3/drivers/md/dm.c	2012-05-10 13:54:30.150853855 +0900
@@ -141,6 +141,8 @@ struct mapped_device {
 
 	/*
 	 * The current mapping.
+	 * Use dm_get_live_table{_fast} or take suspend_lock for
+	 * dereference.
 	 */
 	struct dm_table *map;
 
@@ -561,6 +563,12 @@ void dm_put_live_table(struct mapped_dev
 	srcu_read_unlock(&md->io_barrier, srcu_idx);
 }
 
+void dm_sync_table(struct mapped_device *md)
+{
+	synchronize_srcu(&md->io_barrier);
+	synchronize_rcu_expedited();
+}
+
 /*
  * A fast alternative to dm_get_live_table/dm_put_live_table.
  * The caller must not block between these two functions.
@@ -1316,17 +1324,18 @@ static int __clone_and_map(struct clone_
 /*
  * Split the bio into several clones and submit it to targets.
  */
-static void __split_and_process_bio(struct mapped_device *md, struct bio *bio)
+static void __split_and_process_bio(struct mapped_device *md,
+				    struct dm_table *map, struct bio *bio)
 {
 	struct clone_info ci;
 	int error = 0;
 
-	ci.map = srcu_dereference(md->map, &md->io_barrier);
-	if (unlikely(!ci.map)) {
+	if (unlikely(!map)) {
 		bio_io_error(bio);
 		return;
 	}
 
+	ci.map = map;
 	ci.md = md;
 	ci.io = alloc_io(md);
 	ci.io->error = 0;
@@ -1422,8 +1431,9 @@ static void _dm_request(struct request_q
 	struct mapped_device *md = q->queuedata;
 	int cpu;
 	int srcu_idx;
+	struct dm_table *map;
 
-	srcu_idx = srcu_read_lock(&md->io_barrier);
+	map = dm_get_live_table(md, &srcu_idx);
 
 	cpu = part_stat_lock();
 	part_stat_inc(cpu, &dm_disk(md)->part0, ios[rw]);
@@ -1432,7 +1442,7 @@ static void _dm_request(struct request_q
 
 	/* if we're suspended, we have to queue this io for later */
 	if (unlikely(test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags))) {
-		srcu_read_unlock(&md->io_barrier, srcu_idx);
+		dm_put_live_table(md, srcu_idx);
 
 		if (bio_rw(bio) != READA)
 			queue_io(md, bio);
@@ -1441,8 +1451,8 @@ static void _dm_request(struct request_q
 		return;
 	}
 
-	__split_and_process_bio(md, bio);
-	srcu_read_unlock(&md->io_barrier, srcu_idx);
+	__split_and_process_bio(md, map, bio);
+	dm_put_live_table(md, srcu_idx);
 	return;
 }
 
@@ -2115,8 +2125,7 @@ static struct dm_table *__bind(struct ma
 		set_bit(DMF_MERGE_IS_OPTIONAL, &md->flags);
 	else
 		clear_bit(DMF_MERGE_IS_OPTIONAL, &md->flags);
-	synchronize_srcu(&md->io_barrier);
-	synchronize_rcu_expedited();
+	dm_sync_table(md);
 
 	return old_map;
 }
@@ -2133,8 +2142,7 @@ static struct dm_table *__unbind(struct 
 
 	dm_table_event_callback(map, NULL, NULL);
 	rcu_assign_pointer(md->map, NULL);
-	synchronize_srcu(&md->io_barrier);
-	synchronize_rcu_expedited();
+	dm_sync_table(md);
 
 	return map;
 }
@@ -2375,8 +2383,9 @@ static void dm_wq_work(struct work_struc
 						work);
 	struct bio *c;
 	int srcu_idx;
+	struct dm_table *map;
 
-	srcu_idx = srcu_read_lock(&md->io_barrier);
+	map = dm_get_live_table(md, &srcu_idx);
 
 	while (!test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)) {
 		spin_lock_irq(&md->deferred_lock);
@@ -2386,15 +2395,13 @@ static void dm_wq_work(struct work_struc
 		if (!c)
 			break;
 
-		if (dm_request_based(md)) {
-			srcu_read_unlock(&md->io_barrier, srcu_idx);
+		if (dm_request_based(md))
 			generic_make_request(c);
-			srcu_idx = srcu_read_lock(&md->io_barrier);
-		} else
-			__split_and_process_bio(md, c);
+		else
+			__split_and_process_bio(md, map, c);
 	}
 
-	srcu_read_unlock(&md->io_barrier, srcu_idx);
+	dm_put_live_table(md, srcu_idx);
 }
 
 static void dm_queue_flush(struct mapped_device *md)
Index: linux-3.3/drivers/md/dm-ioctl.c
===================================================================
--- linux-3.3.orig/drivers/md/dm-ioctl.c	2012-05-10 12:19:10.995242964 +0900
+++ linux-3.3/drivers/md/dm-ioctl.c	2012-05-10 12:54:30.312180811 +0900
@@ -250,7 +250,7 @@ static int dm_hash_insert(const char *na
 	return -EBUSY;
 }
 
-static void __hash_remove(struct hash_cell *hc)
+static struct dm_table *__hash_remove(struct hash_cell *hc)
 {
 	struct dm_table *table;
 	int srcu_idx;
@@ -267,10 +267,13 @@ static void __hash_remove(struct hash_ce
 		dm_table_event(table);
 	dm_put_live_table(hc->md, srcu_idx);
 
+	table = NULL;
 	if (hc->new_map)
-		dm_table_destroy(hc->new_map);
+		table = hc->new_map;
 	dm_put(hc->md);
 	free_cell(hc);
+
+	return table;
 }
 
 static void dm_hash_remove_all(int keep_open_devices)
@@ -278,6 +281,7 @@ static void dm_hash_remove_all(int keep_
 	int i, dev_skipped;
 	struct hash_cell *hc;
 	struct mapped_device *md;
+	struct dm_table *t;
 
 retry:
 	dev_skipped = 0;
@@ -295,10 +299,14 @@ retry:
 				continue;
 			}
 
-			__hash_remove(hc);
+			t = __hash_remove(hc);
 
 			up_write(&_hash_lock);
 
+			if (t) {
+				dm_sync_table(md);
+				dm_table_destroy(t);
+			}
 			dm_put(md);
 			if (likely(keep_open_devices))
 				dm_destroy(md);
@@ -808,6 +816,7 @@ static int dev_remove(struct dm_ioctl *p
 	struct hash_cell *hc;
 	struct mapped_device *md;
 	int r;
+	struct dm_table *t;
 
 	down_write(&_hash_lock);
 	hc = __find_device_hash_cell(param);
@@ -831,9 +840,14 @@ static int dev_remove(struct dm_ioctl *p
 		return r;
 	}
 
-	__hash_remove(hc);
+	t = __hash_remove(hc);
 	up_write(&_hash_lock);
 
+	if (t) {
+		dm_sync_table(md);
+		dm_table_destroy(t);
+	}
+
 	if (!dm_kobject_uevent(md, KOBJ_REMOVE, param->event_nr))
 		param->flags |= DM_UEVENT_GENERATED_FLAG;
 
@@ -997,6 +1011,7 @@ static int do_resume(struct dm_ioctl *pa
 
 		old_map = dm_swap_table(md, new_map);
 		if (IS_ERR(old_map)) {
+			dm_sync_table(md);
 			dm_table_destroy(new_map);
 			dm_put(md);
 			return PTR_ERR(old_map);
@@ -1014,6 +1029,10 @@ static int do_resume(struct dm_ioctl *pa
 			param->flags |= DM_UEVENT_GENERATED_FLAG;
 	}
 
+	/*
+	 * Since dm_swap_table synchronizes RCU, nobody should be in
+	 * read-side critical section already.
+	 */
 	if (old_map)
 		dm_table_destroy(old_map);
 
@@ -1225,7 +1244,7 @@ static int table_load(struct dm_ioctl *p
 {
 	int r;
 	struct hash_cell *hc;
-	struct dm_table *t;
+	struct dm_table *t, *old_map = NULL;
 	struct mapped_device *md;
 	struct target_type *immutable_target_type;
 
@@ -1281,14 +1300,14 @@ static int table_load(struct dm_ioctl *p
 	hc = dm_get_mdptr(md);
 	if (!hc || hc->md != md) {
 		DMWARN("device has been removed from the dev hash table.");
-		dm_table_destroy(t);
 		up_write(&_hash_lock);
+		dm_table_destroy(t);
 		r = -ENXIO;
 		goto out;
 	}
 
 	if (hc->new_map)
-		dm_table_destroy(hc->new_map);
+		old_map = hc->new_map;
 	hc->new_map = t;
 	up_write(&_hash_lock);
 
@@ -1296,6 +1315,11 @@ static int table_load(struct dm_ioctl *p
 	__dev_status(md, param);
 
 out:
+	if (old_map) {
+		dm_sync_table(md);
+		dm_table_destroy(old_map);
+	}
+
 	dm_put(md);
 
 	return r;
@@ -1305,6 +1329,7 @@ static int table_clear(struct dm_ioctl *
 {
 	struct hash_cell *hc;
 	struct mapped_device *md;
+	struct dm_table *old_map = NULL;
 
 	down_write(&_hash_lock);
 
@@ -1316,7 +1341,7 @@ static int table_clear(struct dm_ioctl *
 	}
 
 	if (hc->new_map) {
-		dm_table_destroy(hc->new_map);
+		old_map = hc->new_map;
 		hc->new_map = NULL;
 	}
 
@@ -1325,6 +1350,10 @@ static int table_clear(struct dm_ioctl *
 	__dev_status(hc->md, param);
 	md = hc->md;
 	up_write(&_hash_lock);
+	if (old_map) {
+		dm_sync_table(md);
+		dm_table_destroy(old_map);
+	}
 	dm_put(md);
 
 	return 0;
Index: linux-3.3/include/linux/device-mapper.h
===================================================================
--- linux-3.3.orig/include/linux/device-mapper.h	2012-05-10 10:03:06.000000000 +0900
+++ linux-3.3/include/linux/device-mapper.h	2012-05-10 12:45:48.510196182 +0900
@@ -364,6 +364,7 @@ int dm_table_complete(struct dm_table *t
  */
 struct dm_table *dm_get_live_table(struct mapped_device *md, int *srcu_idx);
 void dm_put_live_table(struct mapped_device *md, int srcu_idx);
+void dm_sync_table(struct mapped_device *md);
 
 /*
  * Queries


* Re: [PATCHES]: dm lock optimization
  2012-05-10  4:33         ` Jun'ichi Nomura
@ 2012-05-18  6:37           ` Mikulas Patocka
  2012-05-23  6:27             ` Jun'ichi Nomura
  0 siblings, 1 reply; 10+ messages in thread
From: Mikulas Patocka @ 2012-05-18  6:37 UTC (permalink / raw)
  To: Jun'ichi Nomura; +Cc: device-mapper development, Alasdair G. Kergon

Hi

Thanks for reviewing this. I applied your patch, the new version is here: 
http://people.redhat.com/mpatocka/patches/kernel/dm-lock-optimization/

Mikulas


On Thu, 10 May 2012, Jun'ichi Nomura wrote:

> Hi Mikulas,
> 
> On 05/02/12 11:17, Mikulas Patocka wrote:
> > I placed the new code using srcu here: 
> > http://people.redhat.com/mpatocka/patches/kernel/dm-lock-optimization/
> > 
> > It removes io_lock, map_lock and holders and replaces them with srcu.
> 
> I've reviewed the patches and here are some comments.
> The 1st one seems to be a bug.
> Others are minor comments about readability.
> 
>   - There are functions destroying inactive table
>     without SRCU synchronization.
>       * table_load
>       * table_clear
>       * do_resume
>       * __hash_remove
>     It could cause use-after-free by the user of
>     dm_get_inactive_table().
>     synchronization is needed, outside of exclusive hash_lock.
> 
>   - I couldn't see the reason why locking is different
>     for request-based in dm_wq_work().
> 
>   - We could use dm_get_live_table() in the caller of
>     __split_and_process_bio() and pass the result to it.
>     Then, naked use of srcu_read_lock/unlock and
>     rcu_dereference can be eliminated.
>     I think it helps readability.
> 
>   - md->map is directly referenced in dm_suspend/dm_resume.
>     It's ok because md->suspend_lock is taken and nobody
>     can replace md->map. I think it's worth putting a comment
>     in 'struct mapped_device' about the rule.
> 
> Attached patch explains the above comments by code.
> 
> >> synchronize_rcu could be put in dm_table_destroy() instead of __bind().
> >> I think it's safer place to wait.
> > 
> > I think the code is more readable if synchronizing rcu is just after 
> > assigning the pointer that is protected by rcu.
> 
> OK. I don't object.
> 
> Thanks,
> --- 
> Jun'ichi Nomura, NEC Corporation

* Re: [PATCHES]: dm lock optimization
  2012-05-18  6:37           ` Mikulas Patocka
@ 2012-05-23  6:27             ` Jun'ichi Nomura
  0 siblings, 0 replies; 10+ messages in thread
From: Jun'ichi Nomura @ 2012-05-23  6:27 UTC (permalink / raw)
  To: Mikulas Patocka; +Cc: device-mapper development, Alasdair G. Kergon

Hi Mikulas,

On 05/18/12 15:37, Mikulas Patocka wrote:
> Thanks for reviewing this. I applied your patch, the new version is here: 
> http://people.redhat.com/mpatocka/patches/kernel/dm-lock-optimization/

The patch (dm-optimize.patch) looks good.
I have no other issues with it.
Thank you.
-- 
Jun'ichi Nomura, NEC Corporation

