* [PATCHES]: dm lock optimization
@ 2012-04-19 3:03 Mikulas Patocka
2012-04-19 5:17 ` Jun'ichi Nomura
2012-04-23 13:14 ` Joe Thornber
0 siblings, 2 replies; 10+ messages in thread
From: Mikulas Patocka @ 2012-04-19 3:03 UTC (permalink / raw)
To: Alasdair G. Kergon, dm-devel
Hi
I placed dm lock optimization patches here
http://people.redhat.com/mpatocka/patches/kernel/dm-lock-optimization/
The patches remove cache-line bouncing from device mapper when multiple
processors submit requests simultaneously.
I was able to measure performance improvement, although in a very specific
setup (it is unmeasurable with normal disks).
The setup to measure it is:
- two quad-core Opterons
- use ramdisk as a block device, leave it empty
- create 11 nested dm-linear mappings on it (so that each i/o passes
through 11 levels of dm-linear and then goes to the ramdisk)
- run fio with 8 threads, using direct-io to read 512-byte blocks
(time fio --rw=randread --size=1G --bsrange=512-512 --direct=1
--filename=/dev/mapper/dm-test --name=job1 --name=job2 --name=job3
--name=job4 --name=job5 --name=job6 --name=job7 --name=job8)
With this setup it can be measured that the patches help:
no patch (3.3 kernel): 68.9s
patch 1: 64.3s
patch 1,2: 52.3s
patch 1,2,3: 45.2s
patch 1,2,3,4: 41.2s
patch 1,2,3,4,5: 34.0s
Mikulas
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCHES]: dm lock optimization
2012-04-19 3:03 [PATCHES]: dm lock optimization Mikulas Patocka
@ 2012-04-19 5:17 ` Jun'ichi Nomura
2012-04-21 16:17 ` Mikulas Patocka
2012-04-23 13:14 ` Joe Thornber
1 sibling, 1 reply; 10+ messages in thread
From: Jun'ichi Nomura @ 2012-04-19 5:17 UTC (permalink / raw)
To: dm-devel
Hello Mikulas,
On 04/19/12 12:03, Mikulas Patocka wrote:
> http://people.redhat.com/mpatocka/patches/kernel/dm-lock-optimization/
>
> The patches remove cache-line bouncing from device mapper when multiple
> processors submit requests simultaneously.
I think the idea of optimization is nice.
From a quick look, however, I have some comments:
dm-optimize-no-irqsave-map-lock.patch:
- dm_get_live_table() is called from request_fn,
that can be called from in_interrupt context.
So the added BUG_ON will be triggered.
dm-optimize-percpu-io-lock.patch:
- The following existing mechanisms could be used:
* include/linux/rcupdate.h
* include/linux/lglock.h
(or extended if necessary).
dm-optimize-get_live_table_fast.patch:
- dm_lld_busy() can (theoretically, if dm-mpath is stacked)
be called from in_interrupt context and trigger the
BUG_ON.
--
Jun'ichi Nomura, NEC Corporation
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCHES]: dm lock optimization
2012-04-19 5:17 ` Jun'ichi Nomura
@ 2012-04-21 16:17 ` Mikulas Patocka
2012-04-23 10:56 ` Jun'ichi Nomura
0 siblings, 1 reply; 10+ messages in thread
From: Mikulas Patocka @ 2012-04-21 16:17 UTC (permalink / raw)
To: device-mapper development; +Cc: Jun'ichi Nomura, Alasdair G. Kergon
On Thu, 19 Apr 2012, Jun'ichi Nomura wrote:
> Hello Mikulas,
>
> On 04/19/12 12:03, Mikulas Patocka wrote:
> > http://people.redhat.com/mpatocka/patches/kernel/dm-lock-optimization/
> >
> > The patches remove cache-line bouncing from device mapper when multiple
> > processors submit requests simultaneously.
>
> I think the idea of optimization is nice.
>
> >From a quick look, however, I have some comments:
>
> dm-optimize-no-irqsave-map-lock.patch:
> - dm_get_live_table() is called from request_fn,
> that can be called from in_interrupt context.
> So the added BUG_ON will be triggered.
>
> dm-optimize-percpu-io-lock.patch:
> - The following existing mechanisms could be used:
> * include/linux/rcupdate.h
> * include/linux/lglock.h
> (or extended if necessary).
>
> dm-optimize-get_live_table_fast.patch:
> - dm_lld_busy() can (theoretically, if dm-mpath is stacked)
> be called from in_interrupt context and trigger the
> BUG_ON.
>
> --
> Jun'ichi Nomura, NEC Corporation
Hi
I created new patches that use rcu instead of map_lock, so they address
the issues you mentioned. Get the new patches here:
http://people.redhat.com/mpatocka/patches/kernel/dm-lock-optimization/
performance with new patches:
no patch: 69.3
patch 1: 54.0
patch 1,2: 44.2
patch 1,2,3: 39.8
patch 1,2,3,4: 32.7
Mikulas
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCHES]: dm lock optimization
2012-04-21 16:17 ` Mikulas Patocka
@ 2012-04-23 10:56 ` Jun'ichi Nomura
2012-05-02 2:17 ` Mikulas Patocka
0 siblings, 1 reply; 10+ messages in thread
From: Jun'ichi Nomura @ 2012-04-23 10:56 UTC (permalink / raw)
To: Mikulas Patocka; +Cc: device-mapper development, Alasdair G. Kergon
Hi Mikulas,
On 04/22/12 01:17, Mikulas Patocka wrote:
> I created new patches that use rcu instead of map_lock, so they address
> the issues you mentioned. Get the new patches here:
> http://people.redhat.com/mpatocka/patches/kernel/dm-lock-optimization/
>
> performance with new patches:
> no patch: 69.3
> patch 1: 54.0
> patch 1,2: 44.2
> patch 1,2,3: 39.8
> patch 1,2,3,4: 32.7
Thank you. I have 2 comments for the new patches.
synchronize_rcu could be put in dm_table_destroy() instead of __bind().
I think it's safer place to wait.
io_lock could be converted to SRCU.
I.e. something like:
On reader-side:
idx = srcu_read_lock(io_srcu);
if (!DMF_BLOCK_IO_FOR_SUSPEND)
split_and_process_bio();
srcu_read_unlock(io_srcu,idx);
In dm_suspend:
set_bit(DMF_BLOCK_IO_FOR_SUSPEND);
mb();
synchronize_srcu(io_srcu);
<from here, nobody will enter split_and_process_bio>
That makes dm-optimize-percpu-io-lock.patch simpler.
dm-optimize-take-io_lock-on-table-swap.patch may become simpler, too.
--
Jun'ichi Nomura, NEC Corporation
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCHES]: dm lock optimization
2012-04-19 3:03 [PATCHES]: dm lock optimization Mikulas Patocka
2012-04-19 5:17 ` Jun'ichi Nomura
@ 2012-04-23 13:14 ` Joe Thornber
2012-05-02 0:03 ` Mikulas Patocka
1 sibling, 1 reply; 10+ messages in thread
From: Joe Thornber @ 2012-04-23 13:14 UTC (permalink / raw)
To: device-mapper development; +Cc: Alasdair G. Kergon
On Wed, Apr 18, 2012 at 11:03:33PM -0400, Mikulas Patocka wrote:
> Hi
>
> I placed dm lock optimization patches here
>
> http://people.redhat.com/mpatocka/patches/kernel/dm-lock-optimization/
Just a heads up that sparse gives some warnings with these patches:
drivers/md/dm.c:689:24: warning: context imbalance in 'dm_get_live_table_fast' - wrong count at exit
drivers/md/dm.c:695:13: warning: context imbalance in 'dm_put_live_table_fast' - unexpected unlock
drivers/md/dm.c:1789:17: warning: context imbalance in 'dm_request_fn' - unexpected unlock
- Joe
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCHES]: dm lock optimization
2012-04-23 13:14 ` Joe Thornber
@ 2012-05-02 0:03 ` Mikulas Patocka
0 siblings, 0 replies; 10+ messages in thread
From: Mikulas Patocka @ 2012-05-02 0:03 UTC (permalink / raw)
To: Joe Thornber; +Cc: device-mapper development, Alasdair G. Kergon
On Mon, 23 Apr 2012, Joe Thornber wrote:
> On Wed, Apr 18, 2012 at 11:03:33PM -0400, Mikulas Patocka wrote:
> > Hi
> >
> > I placed dm lock optimization patches here
> >
> > http://people.redhat.com/mpatocka/patches/kernel/dm-lock-optimization/
>
> Just a heads up that sparse gives some warnings with these patches:
>
> drivers/md/dm.c:689:24: warning: context imbalance in 'dm_get_live_table_fast' - wrong count at exit
> drivers/md/dm.c:695:13: warning: context imbalance in 'dm_put_live_table_fast' - unexpected unlock
> drivers/md/dm.c:1789:17: warning: context imbalance in 'dm_request_fn' - unexpected unlock
>
> - Joe
I couldn't figure out how to shut up sparse warnings. For RCU you can mark
functions that take/release RCU with __acquires(RCU)/__releases(RCU), but
I didn't find out how to do it for SRCU.
If I use "__acquires(&md->io_barrier)"
struct dm_table *dm_get_live_table(struct mapped_device *md, int
*srcu_idx) __acquires(&md->io_barrier)
{
*srcu_idx = srcu_read_lock(&md->io_barrier);
return rcu_dereference(md->map);
}
it doens't work and I get an error:
drivers/md/dm.c:557:9: warning: context imbalance in 'dm_get_live_table':
unexpected unlock
drivers/md/dm.c:557:9: default context: wanted 1, got 0
Maybe it is a bug in sparse, maybe I am using a wrong tag? Is there a
reference of all sparse tags somewhere?
Mikulas
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCHES]: dm lock optimization
2012-04-23 10:56 ` Jun'ichi Nomura
@ 2012-05-02 2:17 ` Mikulas Patocka
2012-05-10 4:33 ` Jun'ichi Nomura
0 siblings, 1 reply; 10+ messages in thread
From: Mikulas Patocka @ 2012-05-02 2:17 UTC (permalink / raw)
To: Jun'ichi Nomura; +Cc: device-mapper development, Alasdair G. Kergon
Hi
I placed the new code using srcu here:
http://people.redhat.com/mpatocka/patches/kernel/dm-lock-optimization/
It removes io_lock, map_lock and holders and replaces them with srcu.
On Mon, 23 Apr 2012, Jun'ichi Nomura wrote:
> Hi Mikulas,
>
> On 04/22/12 01:17, Mikulas Patocka wrote:
> > I created new patches that use rcu instead of map_lock, so they address
> > the issues you mentioned. Get the new patches here:
> > http://people.redhat.com/mpatocka/patches/kernel/dm-lock-optimization/
> >
> > performance with new patches:
> > no patch: 69.3
> > patch 1: 54.0
> > patch 1,2: 44.2
> > patch 1,2,3: 39.8
> > patch 1,2,3,4: 32.7
>
> Thank you. I have 2 comments for the new patches.
>
> synchronize_rcu could be put in dm_table_destroy() instead of __bind().
> I think it's safer place to wait.
I think the code is more readable if synchronizing rcu is just after
assigning the pointer that is protected by rcu.
> io_lock could be converted to SRCU.
> I.e. something like:
> On reader-side:
> idx = srcu_read_lock(io_srcu);
> if (!DMF_BLOCK_IO_FOR_SUSPEND)
> split_and_process_bio();
> srcu_read_unlock(io_srcu,idx);
> In dm_suspend:
> set_bit(DMF_BLOCK_IO_FOR_SUSPEND);
> mb();
> synchronize_srcu(io_srcu);
> <from here, nobody will enter split_and_process_bio>
> That makes dm-optimize-percpu-io-lock.patch simpler.
> dm-optimize-take-io_lock-on-table-swap.patch may become simpler, too.
>
> --
> Jun'ichi Nomura, NEC Corporation
Mikulas
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCHES]: dm lock optimization
2012-05-02 2:17 ` Mikulas Patocka
@ 2012-05-10 4:33 ` Jun'ichi Nomura
2012-05-18 6:37 ` Mikulas Patocka
0 siblings, 1 reply; 10+ messages in thread
From: Jun'ichi Nomura @ 2012-05-10 4:33 UTC (permalink / raw)
To: Mikulas Patocka; +Cc: device-mapper development, Alasdair G. Kergon
Hi Mikulas,
On 05/02/12 11:17, Mikulas Patocka wrote:
> I placed the new code using srcu here:
> http://people.redhat.com/mpatocka/patches/kernel/dm-lock-optimization/
>
> It removes io_lock, map_lock and holders and replaces them with srcu.
I've reviewed the patches and here are some comments.
The 1st one seems to be a bug.
Others are minor comments about readability.
- There are functions destroying inactive table
without SRCU synchronization.
* table_load
* table_clear
* do_resume
* __hash_remove
It could cause use-after-free by the user of
dm_get_inactive_table().
synchronization is needed, outside of exclusive hash_lock.
- I couldn't see the reason why locking is different
for request-based in dm_wq_work().
- We could use dm_get_live_table() in the caller of
__split_and_process_bio() and pass the result to it.
Then, naked use of srcu_read_lock/unlock and
rcu_dereference can be eliminated.
I think it helps readability.
- md->map is directly referenced in dm_suspend/dm_resume.
It's ok because md->suspend_lock is taken and nobody
can replace md->map. I think it's worth putting a comment
in 'struct mapped_device' about the rule.
Attached patch explains the above comments by code.
>> synchronize_rcu could be put in dm_table_destroy() instead of __bind().
>> I think it's safer place to wait.
>
> I think the code is more readable if synchronizing rcu is just after
> assigning the pointer that is protected by rcu.
OK. I don't object.
Thanks,
---
Jun'ichi Nomura, NEC Corporation
* Added a comment about locking for reading md->map
* dm_sync_table() to wait for other processes to exit from
read-side critical section
* __split_and_process_bio() takes map
* __hash_remove() returns a table pointer to be destroyed later
* Added dm_sync_table() in functions in dm-ioctl.c to synchronize with
inactive table users
Index: linux-3.3/drivers/md/dm.c
===================================================================
--- linux-3.3.orig/drivers/md/dm.c 2012-05-10 12:19:10.977242964 +0900
+++ linux-3.3/drivers/md/dm.c 2012-05-10 13:54:30.150853855 +0900
@@ -141,6 +141,8 @@ struct mapped_device {
/*
* The current mapping.
+ * Use dm_get_live_table{_fast} or take suspend_lock for
+ * dereference.
*/
struct dm_table *map;
@@ -561,6 +563,12 @@ void dm_put_live_table(struct mapped_dev
srcu_read_unlock(&md->io_barrier, srcu_idx);
}
+void dm_sync_table(struct mapped_device *md)
+{
+ synchronize_srcu(&md->io_barrier);
+ synchronize_rcu_expedited();
+}
+
/*
* A fast alternative to dm_get_live_table/dm_put_live_table.
* The caller must not block between these two functions.
@@ -1316,17 +1324,18 @@ static int __clone_and_map(struct clone_
/*
* Split the bio into several clones and submit it to targets.
*/
-static void __split_and_process_bio(struct mapped_device *md, struct bio *bio)
+static void __split_and_process_bio(struct mapped_device *md,
+ struct dm_table *map, struct bio *bio)
{
struct clone_info ci;
int error = 0;
- ci.map = srcu_dereference(md->map, &md->io_barrier);
- if (unlikely(!ci.map)) {
+ if (unlikely(!map)) {
bio_io_error(bio);
return;
}
+ ci.map = map;
ci.md = md;
ci.io = alloc_io(md);
ci.io->error = 0;
@@ -1422,8 +1431,9 @@ static void _dm_request(struct request_q
struct mapped_device *md = q->queuedata;
int cpu;
int srcu_idx;
+ struct dm_table *map;
- srcu_idx = srcu_read_lock(&md->io_barrier);
+ map = dm_get_live_table(md, &srcu_idx);
cpu = part_stat_lock();
part_stat_inc(cpu, &dm_disk(md)->part0, ios[rw]);
@@ -1432,7 +1442,7 @@ static void _dm_request(struct request_q
/* if we're suspended, we have to queue this io for later */
if (unlikely(test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags))) {
- srcu_read_unlock(&md->io_barrier, srcu_idx);
+ dm_put_live_table(md, srcu_idx);
if (bio_rw(bio) != READA)
queue_io(md, bio);
@@ -1441,8 +1451,8 @@ static void _dm_request(struct request_q
return;
}
- __split_and_process_bio(md, bio);
- srcu_read_unlock(&md->io_barrier, srcu_idx);
+ __split_and_process_bio(md, map, bio);
+ dm_put_live_table(md, srcu_idx);
return;
}
@@ -2115,8 +2125,7 @@ static struct dm_table *__bind(struct ma
set_bit(DMF_MERGE_IS_OPTIONAL, &md->flags);
else
clear_bit(DMF_MERGE_IS_OPTIONAL, &md->flags);
- synchronize_srcu(&md->io_barrier);
- synchronize_rcu_expedited();
+ dm_sync_table(md);
return old_map;
}
@@ -2133,8 +2142,7 @@ static struct dm_table *__unbind(struct
dm_table_event_callback(map, NULL, NULL);
rcu_assign_pointer(md->map, NULL);
- synchronize_srcu(&md->io_barrier);
- synchronize_rcu_expedited();
+ dm_sync_table(md);
return map;
}
@@ -2375,8 +2383,9 @@ static void dm_wq_work(struct work_struc
work);
struct bio *c;
int srcu_idx;
+ struct dm_table *map;
- srcu_idx = srcu_read_lock(&md->io_barrier);
+ map = dm_get_live_table(md, &srcu_idx);
while (!test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)) {
spin_lock_irq(&md->deferred_lock);
@@ -2386,15 +2395,13 @@ static void dm_wq_work(struct work_struc
if (!c)
break;
- if (dm_request_based(md)) {
- srcu_read_unlock(&md->io_barrier, srcu_idx);
+ if (dm_request_based(md))
generic_make_request(c);
- srcu_idx = srcu_read_lock(&md->io_barrier);
- } else
- __split_and_process_bio(md, c);
+ else
+ __split_and_process_bio(md, map, c);
}
- srcu_read_unlock(&md->io_barrier, srcu_idx);
+ dm_put_live_table(md, srcu_idx);
}
static void dm_queue_flush(struct mapped_device *md)
Index: linux-3.3/drivers/md/dm-ioctl.c
===================================================================
--- linux-3.3.orig/drivers/md/dm-ioctl.c 2012-05-10 12:19:10.995242964 +0900
+++ linux-3.3/drivers/md/dm-ioctl.c 2012-05-10 12:54:30.312180811 +0900
@@ -250,7 +250,7 @@ static int dm_hash_insert(const char *na
return -EBUSY;
}
-static void __hash_remove(struct hash_cell *hc)
+static struct dm_table *__hash_remove(struct hash_cell *hc)
{
struct dm_table *table;
int srcu_idx;
@@ -267,10 +267,13 @@ static void __hash_remove(struct hash_ce
dm_table_event(table);
dm_put_live_table(hc->md, srcu_idx);
+ table = NULL;
if (hc->new_map)
- dm_table_destroy(hc->new_map);
+ table = hc->new_map;
dm_put(hc->md);
free_cell(hc);
+
+ return table;
}
static void dm_hash_remove_all(int keep_open_devices)
@@ -278,6 +281,7 @@ static void dm_hash_remove_all(int keep_
int i, dev_skipped;
struct hash_cell *hc;
struct mapped_device *md;
+ struct dm_table *t;
retry:
dev_skipped = 0;
@@ -295,10 +299,14 @@ retry:
continue;
}
- __hash_remove(hc);
+ t = __hash_remove(hc);
up_write(&_hash_lock);
+ if (t) {
+ dm_sync_table(md);
+ dm_table_destroy(t);
+ }
dm_put(md);
if (likely(keep_open_devices))
dm_destroy(md);
@@ -808,6 +816,7 @@ static int dev_remove(struct dm_ioctl *p
struct hash_cell *hc;
struct mapped_device *md;
int r;
+ struct dm_table *t;
down_write(&_hash_lock);
hc = __find_device_hash_cell(param);
@@ -831,9 +840,14 @@ static int dev_remove(struct dm_ioctl *p
return r;
}
- __hash_remove(hc);
+ t = __hash_remove(hc);
up_write(&_hash_lock);
+ if (t) {
+ dm_sync_table(md);
+ dm_table_destroy(t);
+ }
+
if (!dm_kobject_uevent(md, KOBJ_REMOVE, param->event_nr))
param->flags |= DM_UEVENT_GENERATED_FLAG;
@@ -997,6 +1011,7 @@ static int do_resume(struct dm_ioctl *pa
old_map = dm_swap_table(md, new_map);
if (IS_ERR(old_map)) {
+ dm_sync_table(md);
dm_table_destroy(new_map);
dm_put(md);
return PTR_ERR(old_map);
@@ -1014,6 +1029,10 @@ static int do_resume(struct dm_ioctl *pa
param->flags |= DM_UEVENT_GENERATED_FLAG;
}
+ /*
+ * Since dm_swap_table synchronizes RCU, nobody should be in
+ * read-side critical section already.
+ */
if (old_map)
dm_table_destroy(old_map);
@@ -1225,7 +1244,7 @@ static int table_load(struct dm_ioctl *p
{
int r;
struct hash_cell *hc;
- struct dm_table *t;
+ struct dm_table *t, *old_map = NULL;
struct mapped_device *md;
struct target_type *immutable_target_type;
@@ -1281,14 +1300,14 @@ static int table_load(struct dm_ioctl *p
hc = dm_get_mdptr(md);
if (!hc || hc->md != md) {
DMWARN("device has been removed from the dev hash table.");
- dm_table_destroy(t);
up_write(&_hash_lock);
+ dm_table_destroy(t);
r = -ENXIO;
goto out;
}
if (hc->new_map)
- dm_table_destroy(hc->new_map);
+ old_map = hc->new_map;
hc->new_map = t;
up_write(&_hash_lock);
@@ -1296,6 +1315,11 @@ static int table_load(struct dm_ioctl *p
__dev_status(md, param);
out:
+ if (old_map) {
+ dm_sync_table(md);
+ dm_table_destroy(old_map);
+ }
+
dm_put(md);
return r;
@@ -1305,6 +1329,7 @@ static int table_clear(struct dm_ioctl *
{
struct hash_cell *hc;
struct mapped_device *md;
+ struct dm_table *old_map = NULL;
down_write(&_hash_lock);
@@ -1316,7 +1341,7 @@ static int table_clear(struct dm_ioctl *
}
if (hc->new_map) {
- dm_table_destroy(hc->new_map);
+ old_map = hc->new_map;
hc->new_map = NULL;
}
@@ -1325,6 +1350,10 @@ static int table_clear(struct dm_ioctl *
__dev_status(hc->md, param);
md = hc->md;
up_write(&_hash_lock);
+ if (old_map) {
+ dm_sync_table(md);
+ dm_table_destroy(old_map);
+ }
dm_put(md);
return 0;
Index: linux-3.3/include/linux/device-mapper.h
===================================================================
--- linux-3.3.orig/include/linux/device-mapper.h 2012-05-10 10:03:06.000000000 +0900
+++ linux-3.3/include/linux/device-mapper.h 2012-05-10 12:45:48.510196182 +0900
@@ -364,6 +364,7 @@ int dm_table_complete(struct dm_table *t
*/
struct dm_table *dm_get_live_table(struct mapped_device *md, int *srcu_idx);
void dm_put_live_table(struct mapped_device *md, int srcu_idx);
+void dm_sync_table(struct mapped_device *md);
/*
* Queries
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCHES]: dm lock optimization
2012-05-10 4:33 ` Jun'ichi Nomura
@ 2012-05-18 6:37 ` Mikulas Patocka
2012-05-23 6:27 ` Jun'ichi Nomura
0 siblings, 1 reply; 10+ messages in thread
From: Mikulas Patocka @ 2012-05-18 6:37 UTC (permalink / raw)
To: Jun'ichi Nomura; +Cc: device-mapper development, Alasdair G. Kergon
Hi
Thanks for reviewing this. I applied your patch, the new version is here:
http://people.redhat.com/mpatocka/patches/kernel/dm-lock-optimization/
Mikulas
On Thu, 10 May 2012, Jun'ichi Nomura wrote:
> Hi Mikulas,
>
> On 05/02/12 11:17, Mikulas Patocka wrote:
> > I placed the new code using srcu here:
> > http://people.redhat.com/mpatocka/patches/kernel/dm-lock-optimization/
> >
> > It removes io_lock, map_lock and holders and replaces them with srcu.
>
> I've reviewed the patches and here are some comments.
> The 1st one seems to be a bug.
> Others are minor comments about readability.
>
> - There are functions destroying inactive table
> without SRCU synchronization.
> * table_load
> * table_clear
> * do_resume
> * __hash_remove
> It could cause use-after-free by the user of
> dm_get_inactive_table().
> synchronization is needed, outside of exclusive hash_lock.
>
> - I couldn't see the reason why locking is different
> for request-based in dm_wq_work().
>
> - We could use dm_get_live_table() in the caller of
> __split_and_process_bio() and pass the result to it.
> Then, naked use of srcu_read_lock/unlock and
> rcu_dereference can be eliminated.
> I think it helps readability.
>
> - md->map is directly referenced in dm_suspend/dm_resume.
> It's ok because md->suspend_lock is taken and nobody
> can replace md->map. I think it's worth putting a comment
> in 'struct mapped_device' about the rule.
>
> Attached patch explains the above comments by code.
>
> >> synchronize_rcu could be put in dm_table_destroy() instead of __bind().
> >> I think it's safer place to wait.
> >
> > I think the code is more readable if synchronizing rcu is just after
> > assigning the pointer that is protected by rcu.
>
> OK. I don't object.
>
> Thanks,
> ---
> Jun'ichi Nomura, NEC Corporation
>
> * Added a comment about locking for reading md->map
> * dm_sync_table() to wait for other processes to exit from
> read-side critical section
> * __split_and_process_bio() takes map
> * __hash_remove() returns a table pointer to be destroyed later
> * Added dm_sync_table() in functions in dm-ioctl.c to synchronize with
> inactive table users
>
> Index: linux-3.3/drivers/md/dm.c
> ===================================================================
> --- linux-3.3.orig/drivers/md/dm.c 2012-05-10 12:19:10.977242964 +0900
> +++ linux-3.3/drivers/md/dm.c 2012-05-10 13:54:30.150853855 +0900
> @@ -141,6 +141,8 @@ struct mapped_device {
>
> /*
> * The current mapping.
> + * Use dm_get_live_table{_fast} or take suspend_lock for
> + * dereference.
> */
> struct dm_table *map;
>
> @@ -561,6 +563,12 @@ void dm_put_live_table(struct mapped_dev
> srcu_read_unlock(&md->io_barrier, srcu_idx);
> }
>
> +void dm_sync_table(struct mapped_device *md)
> +{
> + synchronize_srcu(&md->io_barrier);
> + synchronize_rcu_expedited();
> +}
> +
> /*
> * A fast alternative to dm_get_live_table/dm_put_live_table.
> * The caller must not block between these two functions.
> @@ -1316,17 +1324,18 @@ static int __clone_and_map(struct clone_
> /*
> * Split the bio into several clones and submit it to targets.
> */
> -static void __split_and_process_bio(struct mapped_device *md, struct bio *bio)
> +static void __split_and_process_bio(struct mapped_device *md,
> + struct dm_table *map, struct bio *bio)
> {
> struct clone_info ci;
> int error = 0;
>
> - ci.map = srcu_dereference(md->map, &md->io_barrier);
> - if (unlikely(!ci.map)) {
> + if (unlikely(!map)) {
> bio_io_error(bio);
> return;
> }
>
> + ci.map = map;
> ci.md = md;
> ci.io = alloc_io(md);
> ci.io->error = 0;
> @@ -1422,8 +1431,9 @@ static void _dm_request(struct request_q
> struct mapped_device *md = q->queuedata;
> int cpu;
> int srcu_idx;
> + struct dm_table *map;
>
> - srcu_idx = srcu_read_lock(&md->io_barrier);
> + map = dm_get_live_table(md, &srcu_idx);
>
> cpu = part_stat_lock();
> part_stat_inc(cpu, &dm_disk(md)->part0, ios[rw]);
> @@ -1432,7 +1442,7 @@ static void _dm_request(struct request_q
>
> /* if we're suspended, we have to queue this io for later */
> if (unlikely(test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags))) {
> - srcu_read_unlock(&md->io_barrier, srcu_idx);
> + dm_put_live_table(md, srcu_idx);
>
> if (bio_rw(bio) != READA)
> queue_io(md, bio);
> @@ -1441,8 +1451,8 @@ static void _dm_request(struct request_q
> return;
> }
>
> - __split_and_process_bio(md, bio);
> - srcu_read_unlock(&md->io_barrier, srcu_idx);
> + __split_and_process_bio(md, map, bio);
> + dm_put_live_table(md, srcu_idx);
> return;
> }
>
> @@ -2115,8 +2125,7 @@ static struct dm_table *__bind(struct ma
> set_bit(DMF_MERGE_IS_OPTIONAL, &md->flags);
> else
> clear_bit(DMF_MERGE_IS_OPTIONAL, &md->flags);
> - synchronize_srcu(&md->io_barrier);
> - synchronize_rcu_expedited();
> + dm_sync_table(md);
>
> return old_map;
> }
> @@ -2133,8 +2142,7 @@ static struct dm_table *__unbind(struct
>
> dm_table_event_callback(map, NULL, NULL);
> rcu_assign_pointer(md->map, NULL);
> - synchronize_srcu(&md->io_barrier);
> - synchronize_rcu_expedited();
> + dm_sync_table(md);
>
> return map;
> }
> @@ -2375,8 +2383,9 @@ static void dm_wq_work(struct work_struc
> work);
> struct bio *c;
> int srcu_idx;
> + struct dm_table *map;
>
> - srcu_idx = srcu_read_lock(&md->io_barrier);
> + map = dm_get_live_table(md, &srcu_idx);
>
> while (!test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)) {
> spin_lock_irq(&md->deferred_lock);
> @@ -2386,15 +2395,13 @@ static void dm_wq_work(struct work_struc
> if (!c)
> break;
>
> - if (dm_request_based(md)) {
> - srcu_read_unlock(&md->io_barrier, srcu_idx);
> + if (dm_request_based(md))
> generic_make_request(c);
> - srcu_idx = srcu_read_lock(&md->io_barrier);
> - } else
> - __split_and_process_bio(md, c);
> + else
> + __split_and_process_bio(md, map, c);
> }
>
> - srcu_read_unlock(&md->io_barrier, srcu_idx);
> + dm_put_live_table(md, srcu_idx);
> }
>
> static void dm_queue_flush(struct mapped_device *md)
> Index: linux-3.3/drivers/md/dm-ioctl.c
> ===================================================================
> --- linux-3.3.orig/drivers/md/dm-ioctl.c 2012-05-10 12:19:10.995242964 +0900
> +++ linux-3.3/drivers/md/dm-ioctl.c 2012-05-10 12:54:30.312180811 +0900
> @@ -250,7 +250,7 @@ static int dm_hash_insert(const char *na
> return -EBUSY;
> }
>
> -static void __hash_remove(struct hash_cell *hc)
> +static struct dm_table *__hash_remove(struct hash_cell *hc)
> {
> struct dm_table *table;
> int srcu_idx;
> @@ -267,10 +267,13 @@ static void __hash_remove(struct hash_ce
> dm_table_event(table);
> dm_put_live_table(hc->md, srcu_idx);
>
> + table = NULL;
> if (hc->new_map)
> - dm_table_destroy(hc->new_map);
> + table = hc->new_map;
> dm_put(hc->md);
> free_cell(hc);
> +
> + return table;
> }
>
> static void dm_hash_remove_all(int keep_open_devices)
> @@ -278,6 +281,7 @@ static void dm_hash_remove_all(int keep_
> int i, dev_skipped;
> struct hash_cell *hc;
> struct mapped_device *md;
> + struct dm_table *t;
>
> retry:
> dev_skipped = 0;
> @@ -295,10 +299,14 @@ retry:
> continue;
> }
>
> - __hash_remove(hc);
> + t = __hash_remove(hc);
>
> up_write(&_hash_lock);
>
> + if (t) {
> + dm_sync_table(md);
> + dm_table_destroy(t);
> + }
> dm_put(md);
> if (likely(keep_open_devices))
> dm_destroy(md);
> @@ -808,6 +816,7 @@ static int dev_remove(struct dm_ioctl *p
> struct hash_cell *hc;
> struct mapped_device *md;
> int r;
> + struct dm_table *t;
>
> down_write(&_hash_lock);
> hc = __find_device_hash_cell(param);
> @@ -831,9 +840,14 @@ static int dev_remove(struct dm_ioctl *p
> return r;
> }
>
> - __hash_remove(hc);
> + t = __hash_remove(hc);
> up_write(&_hash_lock);
>
> + if (t) {
> + dm_sync_table(md);
> + dm_table_destroy(t);
> + }
> +
> if (!dm_kobject_uevent(md, KOBJ_REMOVE, param->event_nr))
> param->flags |= DM_UEVENT_GENERATED_FLAG;
>
> @@ -997,6 +1011,7 @@ static int do_resume(struct dm_ioctl *pa
>
> old_map = dm_swap_table(md, new_map);
> if (IS_ERR(old_map)) {
> + dm_sync_table(md);
> dm_table_destroy(new_map);
> dm_put(md);
> return PTR_ERR(old_map);
> @@ -1014,6 +1029,10 @@ static int do_resume(struct dm_ioctl *pa
> param->flags |= DM_UEVENT_GENERATED_FLAG;
> }
>
> + /*
> + * Since dm_swap_table synchronizes RCU, nobody should be in
> + * read-side critical section already.
> + */
> if (old_map)
> dm_table_destroy(old_map);
>
> @@ -1225,7 +1244,7 @@ static int table_load(struct dm_ioctl *p
> {
> int r;
> struct hash_cell *hc;
> - struct dm_table *t;
> + struct dm_table *t, *old_map = NULL;
> struct mapped_device *md;
> struct target_type *immutable_target_type;
>
> @@ -1281,14 +1300,14 @@ static int table_load(struct dm_ioctl *p
> hc = dm_get_mdptr(md);
> if (!hc || hc->md != md) {
> DMWARN("device has been removed from the dev hash table.");
> - dm_table_destroy(t);
> up_write(&_hash_lock);
> + dm_table_destroy(t);
> r = -ENXIO;
> goto out;
> }
>
> if (hc->new_map)
> - dm_table_destroy(hc->new_map);
> + old_map = hc->new_map;
> hc->new_map = t;
> up_write(&_hash_lock);
>
> @@ -1296,6 +1315,11 @@ static int table_load(struct dm_ioctl *p
> __dev_status(md, param);
>
> out:
> + if (old_map) {
> + dm_sync_table(md);
> + dm_table_destroy(old_map);
> + }
> +
> dm_put(md);
>
> return r;
> @@ -1305,6 +1329,7 @@ static int table_clear(struct dm_ioctl *
> {
> struct hash_cell *hc;
> struct mapped_device *md;
> + struct dm_table *old_map = NULL;
>
> down_write(&_hash_lock);
>
> @@ -1316,7 +1341,7 @@ static int table_clear(struct dm_ioctl *
> }
>
> if (hc->new_map) {
> - dm_table_destroy(hc->new_map);
> + old_map = hc->new_map;
> hc->new_map = NULL;
> }
>
> @@ -1325,6 +1350,10 @@ static int table_clear(struct dm_ioctl *
> __dev_status(hc->md, param);
> md = hc->md;
> up_write(&_hash_lock);
> + if (old_map) {
> + dm_sync_table(md);
> + dm_table_destroy(old_map);
> + }
> dm_put(md);
>
> return 0;
> Index: linux-3.3/include/linux/device-mapper.h
> ===================================================================
> --- linux-3.3.orig/include/linux/device-mapper.h 2012-05-10 10:03:06.000000000 +0900
> +++ linux-3.3/include/linux/device-mapper.h 2012-05-10 12:45:48.510196182 +0900
> @@ -364,6 +364,7 @@ int dm_table_complete(struct dm_table *t
> */
> struct dm_table *dm_get_live_table(struct mapped_device *md, int *srcu_idx);
> void dm_put_live_table(struct mapped_device *md, int srcu_idx);
> +void dm_sync_table(struct mapped_device *md);
>
> /*
> * Queries
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCHES]: dm lock optimization
2012-05-18 6:37 ` Mikulas Patocka
@ 2012-05-23 6:27 ` Jun'ichi Nomura
0 siblings, 0 replies; 10+ messages in thread
From: Jun'ichi Nomura @ 2012-05-23 6:27 UTC (permalink / raw)
To: Mikulas Patocka; +Cc: device-mapper development, Alasdair G. Kergon
Hi Mikulas,
On 05/18/12 15:37, Mikulas Patocka wrote:
> Thanks for reviewing this. I applied your patch, the new version is here:
> http://people.redhat.com/mpatocka/patches/kernel/dm-lock-optimization/
The patch (dm-optimize.patch) looks good.
I have no other issues with it.
Thank you.
--
Jun'ichi Nomura, NEC Corporation
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2012-05-23 6:27 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-19 3:03 [PATCHES]: dm lock optimization Mikulas Patocka
2012-04-19 5:17 ` Jun'ichi Nomura
2012-04-21 16:17 ` Mikulas Patocka
2012-04-23 10:56 ` Jun'ichi Nomura
2012-05-02 2:17 ` Mikulas Patocka
2012-05-10 4:33 ` Jun'ichi Nomura
2012-05-18 6:37 ` Mikulas Patocka
2012-05-23 6:27 ` Jun'ichi Nomura
2012-04-23 13:14 ` Joe Thornber
2012-05-02 0:03 ` Mikulas Patocka
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.