* [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock
@ 2015-01-26 12:28 yangwenfang
2015-01-27 7:08 ` Srinivas Eeda
2015-01-29 0:05 ` Goldwyn Rodrigues
0 siblings, 2 replies; 16+ messages in thread
From: yangwenfang @ 2015-01-26 12:28 UTC (permalink / raw)
To: ocfs2-devel
What:
A byte range lock locks a region of a file rather than the whole file, so that
concurrent reads and writes to different regions can proceed in parallel.
Why:
Currently ocfs2 does not support byte range locks. In database workloads,
multiple nodes may concurrently update different positions of the same file,
so the performance (tpmC) of DB+ocfs2 is much poorer than that of DB+GPFS
when running TPC-C.
To improve the efficiency of parallel access to the same file, we have
implemented a demo of the range lock feature, which Lustre and GPFS already
support, so that different nodes in the cluster can update a file at the
same time when they are accessing different blocks.
How:
Key issues in design and implementation:
1. In ocfs2, each file has only one lock, which cannot distinguish between
different positions in the file.
One solution is to add a range field (start, end) to a lock. For example:
ocfs2_lock_res(N1)                dlm_lock_resource(Master)   ocfs2_lock_res(N2)
ocfs2_res_range_lock(0,9)    ---> dlm_lock(0,9)   N1
                                  dlm_lock(10,19) N2     <--- ocfs2_res_range_lock(10,19)
ocfs2_res_range_lock(20,29)  ---> dlm_lock(20,29) N1
                                  dlm_lock(30,49) N2     <--- ocfs2_res_range_lock(30,49)
ocfs2_res_range_lock(50,59)  ---> dlm_lock(50,59) N1
                                  dlm_lock(60,69) N2     <--- ocfs2_res_range_lock(60,69)
Each lock resource uses an interval tree to manage its ranges. The tree
supports basic operations such as insert, delete, find, split and merge.
The most important task is to determine whether ranges conflict.
Conflict-free ranges of the same file can be accessed concurrently.
Otherwise, a node must wait for the conflicting lock to be released before
accessing that range of the file.
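The conflict rule above can be sketched in a few lines of Python. This is only an illustration, not the demo code: the `RangeLock` tuple, `conflicts()` and `find_conflicts()` are hypothetical names, ranges are assumed to be inclusive, PR is assumed to be compatible with PR only, and a linear scan stands in for the interval tree.

```python
from typing import List, NamedTuple


class RangeLock(NamedTuple):
    start: int   # first unit covered (inclusive)
    end: int     # last unit covered (inclusive)
    level: str   # "PR" (shared read) or "EX" (exclusive write)


def overlaps(a: RangeLock, b: RangeLock) -> bool:
    """True when the two inclusive ranges share at least one unit."""
    return a.start <= b.end and b.start <= a.end


def conflicts(a: RangeLock, b: RangeLock) -> bool:
    """Assumed rule: overlapping ranges conflict unless both are PR."""
    return overlaps(a, b) and (a.level == "EX" or b.level == "EX")


def find_conflicts(granted: List[RangeLock], request: RangeLock) -> List[RangeLock]:
    """Linear scan over granted locks; an interval tree would prune this search."""
    return [lock for lock in granted if conflicts(lock, request)]
```

With the ranges from the diagram, an EX request on (10,19) does not conflict with a granted EX lock on (0,9), so both nodes can write at the same time.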
Byte range locks follow split and merge rules. Locks at the same level merge
into the larger range. At different levels, write takes precedence over read
(if a node holds an EX lock on range (start, end), it also holds a PR lock
on that range).
For example:
(1) merge: N1 holds range locks (0,9) PR and (5,19) PR; they are merged into
(0,19) PR;
(2) merge: N1 holds range locks (0,9) PR and (5,19) EX; the merged locks
become (0,19) PR and (5,19) EX;
(3) split: N1 holds range lock (0,9) PR and N2 tries to lock (0,5) PR; N1
splits its lock and keeps (6,9) PR.
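The range arithmetic behind examples (1) to (3) can be sketched as follows. This is an illustrative Python sketch under assumptions, not the demo implementation: `merge()` and `split()` are hypothetical helpers, ranges are inclusive (start, end) tuples, and tracking of lock levels is left to the caller.

```python
from typing import List, Optional, Tuple

Range = Tuple[int, int]  # inclusive (start, end)


def merge(a: Range, b: Range) -> Optional[Range]:
    """Union of two ranges if they overlap or are adjacent, else None."""
    if a[0] <= b[1] + 1 and b[0] <= a[1] + 1:
        return (min(a[0], b[0]), max(a[1], b[1]))
    return None


def split(held: Range, taken: Range) -> List[Range]:
    """Pieces of `held` that remain after giving up the part covered by `taken`."""
    remaining = []
    if held[0] < taken[0]:
        # Part of `held` lying to the left of `taken`.
        remaining.append((held[0], min(held[1], taken[0] - 1)))
    if held[1] > taken[1]:
        # Part of `held` lying to the right of `taken`.
        remaining.append((max(held[0], taken[1] + 1), held[1]))
    return remaining
```

In example (2), the PR range would become `merge((0, 9), (5, 19))`, that is (0,19), while the EX lock on (5,19) is kept as a separate entry.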
2. In ocfs2, there are only three types of lock resources (rw, inode and open),
which protect different contents.
We need to add another lock resource (ip_range_lock_lockres) to protect
different ranges in the read/write IO path.
For example, buffered read/write:
(1) ocfs2_file_aio_write    ------------->  ocfs2_file_aio_write
      ocfs2_rw_lock(ex)                       ocfs2_rw_lock(pr)
                                              ocfs2_range_lock(start, end, ex)
      ocfs2_write_begin
        ocfs2_inode_lock(ex)                    ocfs2_inode_lock(pr)
                                                if append, update to ex;
(2) ocfs2_file_aio_read     ------------->  no need to change.
      ocfs2_readpage
        ocfs2_inode_lock(pr)
(3) but it is a problem in read_ahead.
    ocfs2_readpages         ------------->  ocfs2_readpages
      ocfs2_inode_lock(pr)                    ocfs2_inode_lock(pr)
                                              ocfs2_range_lock(start, end, pr)
Limitations, based on our assumptions:
1. Byte range locks are only beneficial for update writes.
2. Delayed unlock leaves too many locks cached.
3. Significant source code changes are required, touching almost all of the
dlmglue and dlm modules.
As described above, this approach has many limitations under our assumptions.
Any advice is much appreciated.
Thanks.
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock
2015-01-26 12:28 [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock yangwenfang
@ 2015-01-27 7:08 ` Srinivas Eeda
2015-01-29 6:42 ` yangwenfang
2015-01-29 0:05 ` Goldwyn Rodrigues
1 sibling, 1 reply; 16+ messages in thread
From: Srinivas Eeda @ 2015-01-27 7:08 UTC (permalink / raw)
To: ocfs2-devel
Hi Yangwenfang,
thank you very much for initiating this RFC :). This feature is long overdue
for OCFS2, and we are also interested in implementing it.
Wengang (cc'ed) has been analysing it and attempting an implementation.
We haven't looked at splitting and merging range locks yet, but we have
looked at lock fairness and range locking.
Wengang has made some of the dlm changes to see how it can be done, but
the other changes are still work in progress. We will email more details in
the coming few days.
Since you are also looking into it, it would be great if we could
collaborate on this feature. Can you please share more info on the
demo code you mentioned? For example, what does it do, and how much work
has been done on it?
One of the things we considered was making the rw lock itself support
range locking, which is a different approach from what you mentioned. Is
there any reason why the rw lock cannot be used and we need a new
ip_range_lock_lockres?
Thanks,
--Srini
* [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock
2015-01-26 12:28 [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock yangwenfang
2015-01-27 7:08 ` Srinivas Eeda
@ 2015-01-29 0:05 ` Goldwyn Rodrigues
2015-01-29 3:21 ` Wengang Wang
2015-01-29 7:47 ` yangwenfang
1 sibling, 2 replies; 16+ messages in thread
From: Goldwyn Rodrigues @ 2015-01-29 0:05 UTC (permalink / raw)
To: ocfs2-devel
Hi Yangwenfang,
I appreciate the effort in this regard.
On 01/26/2015 06:28 AM, yangwenfang wrote:
> Byte range lock supports split and merge rules: for same level, larger
> scope; different level, write > read(If a node keeps EX lock with
> range(start,end), then it has PR range lock(start,end)).
> For example:
> (1) merge: N1 keeps range lock (0,9)PR and (5,19)PR, the lock is merged into
> (0,19) PR;
> (2) merge: N1 keeps range lock (0,9)PR and (5,19)EX, the merged lock should
> become(0,19) PR, (5,19)EX;
> (3) split: N1 keeps range lock (0,9)PR, N2 tries to lock(0,5) PR, N1 should
> split the lock and keep (6,9)PR.
What is the purpose of doing this kind of merge/split? I assume this
will be required when multiple processes on the same node
read/write the file. Would it not be simpler to skip merging and splitting
and keep separate instances in the lock resources? That way you would have
relatively less bookkeeping to do with respect to comparisons.
Are the numbers in your pseudocode byte ranges? If so, how do you propose
to handle multiple writes that lie within a single block_size/cluster_size range?
>
> 2.In ocfs2, there are only three types of lock resources: rw, inode and open
> which provide protections to different contents.
> We need to add another lock resource(ip_range_lock_lockres) to protect
> different ranges in IO read/write process.
> For example: buffer read/write.
> (1)ocfs2_file_aio_write ------------->ocfs2_file_aio_write
> ocfs2_rw_lock(ex) ocfs2_rw_lock(pr)
> ocfs2_range_lock(start, end, ex)
This does not seem right. ocfs2_rw_lock is meant to serialize writes to
the same file. Changing it from ex to pr would make the file
inconsistent for writes to the same file. As Srini proposed, why create
a new lock instead of adding the feature to rw_lock?
--
Goldwyn
* [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock
2015-01-29 0:05 ` Goldwyn Rodrigues
@ 2015-01-29 3:21 ` Wengang Wang
2015-01-29 7:47 ` yangwenfang
1 sibling, 0 replies; 16+ messages in thread
From: Wengang Wang @ 2015-01-29 3:21 UTC (permalink / raw)
To: ocfs2-devel
On 2015-01-29 08:05, Goldwyn Rodrigues wrote:
> Hi Yangwenfang,
>
> I appreciate the effort in this regard.
>
> On 01/26/2015 06:28 AM, yangwenfang wrote:
>> Byte range lock supports split and merge rules: for same level, larger
>> scope; different level, write > read(If a node keeps EX lock with
>> range(start,end), then it has PR range lock(start,end)).
>> For example:
>> (1) merge: N1 keeps range lock (0,9)PR and (5,19)PR, the lock is merged into
>> (0,19) PR;
>> (2) merge: N1 keeps range lock (0,9)PR and (5,19)EX, the merged lock should
>> become(0,19) PR, (5,19)EX;
>> (3) split: N1 keeps range lock (0,9)PR, N2 tries to lock(0,5) PR, N1 should
>> split the lock and keep (6,9)PR.
> What is the purpose of doing this kind of merge/split? I assume this
> will be required in case of multiple processes from the same node
> read/write to the file. Would it not be simpler to not merge or split
> and keep separate instances in lock resources? This way you would have
> to do relatively lesser book keeping with respect to comparisons.
>
> Are these numbers in your pseudocode byte ranges? If yes, how do you
> propose multiple writes which lie within a block_size/cluster_size range?
>
Yes, if the range lock is used for file read/write, the granularity
would be a block rather than a byte.
For example, if the block size is 512, a write to bytes 0~9 would need the
whole 0~511 byte range to be locked, that is, block 0~0. Otherwise,
if two write requests access the same block, say one writes
0~254 and the other writes 255~511, and they lock only 0~254 and 255~511
respectively, the contents of this block may get corrupted after the two
writes.
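The rounding described here can be sketched as below. It is a minimal Python illustration, assuming inclusive block indices and a 512-byte block as in the example; `block_range()` and `byte_span()` are hypothetical helper names, not functions from ocfs2.

```python
from typing import Tuple


def block_range(offset: int, length: int, block_size: int = 512) -> Tuple[int, int]:
    """Inclusive (first_block, last_block) covering bytes [offset, offset + length)."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return first, last


def byte_span(first_block: int, last_block: int, block_size: int = 512) -> Tuple[int, int]:
    """Inclusive byte range actually covered by a lock on the given blocks."""
    return first_block * block_size, (last_block + 1) * block_size - 1
```

A write to bytes 0~254 and a write to bytes 255~511 both map to block 0, so the two requests would ask for the same range lock and be serialized instead of corrupting the block.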
thanks,
wengang
* [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock
2015-01-27 7:08 ` Srinivas Eeda
@ 2015-01-29 6:42 ` yangwenfang
2015-01-29 11:04 ` Goldwyn Rodrigues
2015-01-29 11:07 ` Goldwyn Rodrigues
0 siblings, 2 replies; 16+ messages in thread
From: yangwenfang @ 2015-01-29 6:42 UTC (permalink / raw)
To: ocfs2-devel
On 2015/1/27 15:08, Srinivas Eeda wrote:
> Hi Yangwenfang,
>
> thank you very much for initiating this RFC :). This feature is long due for OCFS2 and we are also interested in implementing this feature. Wengang(cc'ed) has been looking into analysing and giving an attempt to implement it. We haven't looked at splitting and merging the range locking yet, but looked at having lock fairness and range locking. Wengang has done some of the dlm changes to see how it can be done but other changes are still work in progress. We will email more details in coming few days.
>
> Since you are also looking into it, it would be great if we can collaborate work on this feature. Can you please share more info on the demo code you mentioned ? Like what it does and how much work has been done on this ?
>
Hi,
About 6k lines of code were modified in our demo, including dlmglue and dlm.
Code modifications:
1. read/write IO: get the range (start, end) and call ocfs2_range_lock.
2. dlmglue: modify the key data structure so that each inode has one ocfs2_lock_res containing many range locks with different ranges;
detect conflicts between multiple threads within the node;
manage the range lock cache to support delayed unlock.
3. dlm: detect conflicts between multiple nodes;
add splitting and merging of range locks.
4. lib: interval tree.
> One of the thing we considered was making the rw lock itself support range locking, which is a different approach from what you mentioned. Is there any reason why rw lock cannot be used and we needa new ip_range_lock_lockres ?
>
The RW lock can be used, but adding the feature to rw_lock is complicated because the RW lock is also used in read/write/truncate.
A byte range lock is only beneficial for update writes, so I only modified the write IO path, to finish the demo and get performance results as soon as possible.
I think ocfs2_rw_lock(pr) + ocfs2_range_lock(start, end, ex) is equivalent to ocfs2_rw_lock(ex). Am I right?
* [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock
2015-01-29 0:05 ` Goldwyn Rodrigues
2015-01-29 3:21 ` Wengang Wang
@ 2015-01-29 7:47 ` yangwenfang
2015-01-29 8:06 ` Wengang Wang
1 sibling, 1 reply; 16+ messages in thread
From: yangwenfang @ 2015-01-29 7:47 UTC (permalink / raw)
To: ocfs2-devel
On 2015/1/29 8:05, Goldwyn Rodrigues wrote:
>
> Hi Yangwenfang,
>
> I appreciate the effort in this regard.
>
> On 01/26/2015 06:28 AM, yangwenfang wrote:
>> Byte range lock supports split and merge rules: for same level, larger
>> scope; different level, write > read(If a node keeps EX lock with
>> range(start,end), then it has PR range lock(start,end)).
>> For example:
>> (1) merge: N1 keeps range lock (0,9)PR and (5,19)PR, the lock is merged into
>> (0,19) PR;
>> (2) merge: N1 keeps range lock (0,9)PR and (5,19)EX, the merged lock should
>> become(0,19) PR, (5,19)EX;
>> (3) split: N1 keeps range lock (0,9)PR, N2 tries to lock(0,5) PR, N1 should
>> split the lock and keep (6,9)PR.
>
> What is the purpose of doing this kind of merge/split? I assume this will be required in case of multiple processes from the same node read/write to the file. Would it not be simpler to not merge or split and keep separate instances in lock resources? This way you would have to do relatively lesser book keeping with respect to comparisons.
>
Hi,
The purpose of this kind of merge/split is to cache range locks so that unlock can be delayed.
For example (the granularity is block size):
1. Node 1 writes to 0-9; it keeps the range lock (0,9,EX) as long as no other node writes to the same range of the file.
2. Node 1 writes to 10-19; the range lock is merged into (0,19,EX). Without merging, the number of locks would keep growing.
3. Node 1 writes to 5-10; no dlmlock request to the master is needed.
4. Node 2 writes to 5-10, which conflicts with Node 1, so Node 1 drops (5,10) and its range lock is split into (0,4) and (11,19).
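The walk-through above can be replayed with a small Python model of one node's range lock cache. This is a sketch under assumptions rather than the demo code: `RangeCache`, `lock()`, `revoke()` and the `master_requests` counter are hypothetical, every cached lock is treated as EX, and a sorted list stands in for the interval tree.

```python
from typing import List, Tuple


class RangeCache:
    """Per-node cache of EX range locks kept after use (delayed unlock)."""

    def __init__(self) -> None:
        self.ranges: List[Tuple[int, int]] = []  # sorted, disjoint, inclusive
        self.master_requests = 0                 # stands in for dlmlock calls

    def covers(self, start: int, end: int) -> bool:
        return any(s <= start and end <= e for s, e in self.ranges)

    def lock(self, start: int, end: int) -> None:
        if self.covers(start, end):
            return  # cache hit: no request to the master
        self.master_requests += 1
        kept = []
        for s, e in self.ranges:
            if s <= end + 1 and start <= e + 1:
                # Overlapping or adjacent: fold into the new range.
                start, end = min(s, start), max(e, end)
            else:
                kept.append((s, e))
        kept.append((start, end))
        self.ranges = sorted(kept)

    def revoke(self, start: int, end: int) -> None:
        """Drop (start, end) because another node needs it; keep the rest."""
        kept = []
        for s, e in self.ranges:
            if e < start or s > end:
                kept.append((s, e))
                continue
            if s < start:
                kept.append((s, start - 1))
            if e > end:
                kept.append((end + 1, e))
        self.ranges = kept


node1 = RangeCache()
node1.lock(0, 9)      # step 1: cached as (0,9)
node1.lock(10, 19)    # step 2: merged into (0,19)
node1.lock(5, 10)     # step 3: covered by the cache
node1.revoke(5, 10)   # step 4: node 2 takes (5,10)
```

After step 3 the model has made two master requests, and after step 4 it holds (0,4) and (11,19), matching the sequence described.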
> Are these numbers in your pseudocode byte ranges? If yes, how do you propose multiple writes which lie within a block_size/cluster_size range?
>
No, the granularity of these numbers is the block size or PAGE_SIZE. The smaller the granularity, the more conflicts there are. In our test we actually use 1M.
thanks,
yangwenfang
* [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock
2015-01-29 7:47 ` yangwenfang
@ 2015-01-29 8:06 ` Wengang Wang
2015-01-30 3:54 ` yangwenfang
0 siblings, 1 reply; 16+ messages in thread
From: Wengang Wang @ 2015-01-29 8:06 UTC (permalink / raw)
To: ocfs2-devel
On 2015-01-29 15:47, yangwenfang wrote:
> On 2015/1/29 8:05, Goldwyn Rodrigues wrote:
>> Hi Yangwenfang,
>>
>> I appreciate the effort in this regard.
>>
>> On 01/26/2015 06:28 AM, yangwenfang wrote:
>>> Byte range lock supports split and merge rules: for same level, larger
>>> scope; different level, write > read(If a node keeps EX lock with
>>> range(start,end), then it has PR range lock(start,end)).
>>> For example:
>>> (1) merge: N1 keeps range lock (0,9)PR and (5,19)PR, the lock is merged into
>>> (0,19) PR;
>>> (2) merge: N1 keeps range lock (0,9)PR and (5,19)EX, the merged lock should
>>> become(0,19) PR, (5,19)EX;
>>> (3) split: N1 keeps range lock (0,9)PR, N2 tries to lock(0,5) PR, N1 should
>>> split the lock and keep (6,9)PR.
>> What is the purpose of doing this kind of merge/split? I assume this will be required in case of multiple processes from the same node read/write to the file. Would it not be simpler to not merge or split and keep separate instances in lock resources? This way you would have to do relatively lesser book keeping with respect to comparisons.
>>
> Hi,
> Realization of this kind of merge/split is for cache of range lock to support unlock-delay.
> For example(the granularity is block size)
> 1.Node 1 writes to 0-9, it will keep the range lock(0,9,EX) if no other node write the same range of file.
> 2.Node 1 writes to 10-19, then the range lock will be merged into (0,19,EX). if not, the number of locks will be more and more.
> 3.Node 1 writes to 5-10, then no need to dlmlock from master.
> 3.Node 2 writes to 5-10, conflict with Node 1, so Node 1 will drop (5,10), the range lock is splitted into (0,4) and (11,19).
What would the merge look like in the dlm module? Would it cause a deadlock
when node 1 extends 0-9 to 0-19 and node 2 extends 10-19 to 0-19?
thanks,
wengang
* [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock
2015-01-29 6:42 ` yangwenfang
@ 2015-01-29 11:04 ` Goldwyn Rodrigues
2015-01-30 2:59 ` Xue jiufei
2015-01-29 11:07 ` Goldwyn Rodrigues
1 sibling, 1 reply; 16+ messages in thread
From: Goldwyn Rodrigues @ 2015-01-29 11:04 UTC (permalink / raw)
To: ocfs2-devel
Yangwenfang,
On 01/29/2015 12:42 AM, yangwenfang wrote:
> On 2015/1/27 15:08, Srinivas Eeda wrote:
>> Hi Yangwenfang,
>>
>> thank you very much for initiating this RFC :). This feature is long due for OCFS2 and we are also interested in implementing this feature. Wengang(cc'ed) has been looking into analysing and giving an attempt to implement it. We haven't looked at splitting and merging the range locking yet, but looked at having lock fairness and range locking. Wengang has done some of the dlm changes to see how it can be done but other changes are still work in progress. We will email more details in coming few days.
>>
>> Since you are also looking into it, it would be great if we can collaborate work on this feature. Can you please share more info on the demo code you mentioned ? Like what it does and how much work has been done on this ?
>>
> Hi,
> About 6k lines of code was modified including dlmglue and dlm in our demo.
>
> code modification:
> 1.read/write IO: get the range(start, end) and call ocfs2_range_lock.
> 2.dlmglue: modify key data struct: each inode has one ocfs2_lock_res including many range locks which have different range.
> determine the existance of conflicts betwen multiple threads within the node.
> manage the cache of range lock to support unlock-delay.
> 3.dlm: determine the existance of conflicts betwen multiple nodes.
> add splitting and merging the range locking.
> 4.lib: interval tree.
>> One of the thing we considered was making the rw lock itself support range locking, which is a different approach from what you mentioned. Is there any reason why rw lock cannot be used and we needa new ip_range_lock_lockres ?
>>
> RW lock can be used, but it is complicated to add the feature to rw_lock because RW lock is also applicated in read/write/truncate.
> Byte range lock is only beneficial for update write, so I just modify write IO to finish the demo to get performance results as soon as possible.
> I think ocfs2_rw_lock(pr) + ocfs2_range_lock(start, end, ex) are equivalent to ocfs2_rw_lock(ex)?am I rigth?
Okay, let me ask this question in another way: What is the purpose of
ocfs2_rw_lock(pr) in *this* scenario, where you are using
ocfs2_range_lock in conjunction with ocfs2_rw_lock. What is
ocfs2_rw_lock guarding?
--
Goldwyn
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock
2015-01-29 11:04 ` Goldwyn Rodrigues
@ 2015-01-30 2:59 ` Xue jiufei
2015-01-30 12:37 ` Goldwyn Rodrigues
0 siblings, 1 reply; 16+ messages in thread
From: Xue jiufei @ 2015-01-30 2:59 UTC (permalink / raw)
To: ocfs2-devel
Hi Goldwyn,
On 2015/1/29 19:04, Goldwyn Rodrigues wrote:
> Yangwenfang,
>
> On 01/29/2015 12:42 AM, yangwenfang wrote:
>> On 2015/1/27 15:08, Srinivas Eeda wrote:
>>> Hi Yangwenfang,
>>>
>>> thank you very much for initiating this RFC :). This feature is long due for OCFS2 and we are also interested in implementing this feature. Wengang(cc'ed) has been looking into analysing and giving an attempt to implement it. We haven't looked at splitting and merging the range locking yet, but looked at having lock fairness and range locking. Wengang has done some of the dlm changes to see how it can be done but other changes are still work in progress. We will email more details in coming few days.
>>>
>>> Since you are also looking into it, it would be great if we can collaborate work on this feature. Can you please share more info on the demo code you mentioned ? Like what it does and how much work has been done on this ?
>>>
>> Hi,
>> About 6k lines of code was modified including dlmglue and dlm in our demo.
>>
>> code modification:
>> 1.read/write IO: get the range(start, end) and call ocfs2_range_lock.
>> 2.dlmglue: modify key data struct: each inode has one ocfs2_lock_res including many range locks which have different range.
>> determine the existance of conflicts betwen multiple threads within the node.
>> manage the cache of range lock to support unlock-delay.
>> 3.dlm: determine the existance of conflicts betwen multiple nodes.
>> add splitting and merging the range locking.
>> 4.lib: interval tree.
>>> One of the thing we considered was making the rw lock itself support range locking, which is a different approach from what you mentioned. Is there any reason why rw lock cannot be used and we needa new ip_range_lock_lockres ?
>>>
>> RW lock can be used, but it is complicated to add the feature to rw_lock because RW lock is also applicated in read/write/truncate.
>> Byte range lock is only beneficial for update write, so I just modify write IO to finish the demo to get performance results as soon as possible.
>> I think ocfs2_rw_lock(pr) + ocfs2_range_lock(start, end, ex) are equivalent to ocfs2_rw_lock(ex)?am I rigth?
>
> Okay, let me ask this question in another way: What is the purpose of
> ocfs2_rw_lock(pr) in *this* scenario, where you are using
> ocfs2_range_lock in conjunction with ocfs2_rw_lock. What is
> ocfs2_rw_lock guarding?
>
The RW lock is also used to protect O_DIRECT reads from racing with truncate,
while buffered reads are not protected by the RW lock. We did not want to change
RW lock behavior in the buffered read path, so we added a separate range lock to complete this demo.
Thanks
Xuejiufei
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock
2015-01-29 8:06 ` Wengang Wang
@ 2015-01-30 3:54 ` yangwenfang
2015-01-30 6:02 ` Wengang Wang
0 siblings, 1 reply; 16+ messages in thread
From: yangwenfang @ 2015-01-30 3:54 UTC (permalink / raw)
To: ocfs2-devel
On 2015/1/29 16:06, Wengang Wang wrote:
>
>> On 2015/1/29 8:05, Goldwyn Rodrigues wrote:
>>> Hi Yangwenfang,
>>>
>>> I appreciate the effort in this regard.
>>>
>>> On 01/26/2015 06:28 AM, yangwenfang wrote:
>>>> What:
>>>> Byte range lock is applied to lock a region of a file to accelerate
>>>> reading/writing concurrently.
>>>> Each lock resource deploys an interval tree to manage the range, which
>>>> supports basic operations like add, delete, insert, find, split and merge.
>>>> The most important issue is to determine the existance of conflicts
>>>> among the ranges. Conflict-free ranges of the same file can be accessed
>>>> concurrently. In the contrary, nodes must wait for the release of a
>>>> conflicted lock before accessing the range of file.
>>>>
>>>> Byte range lock supports split and merge rules: for same level, larger
>>>> scope; different level, write > read(If a node keeps EX lock with
>>>> range(start,end), then it has PR range lock(start,end)).
>>>> For example:
>>>> (1) merge: N1 keeps range lock (0,9)PR and (5,19)PR, the lock is merged into
>>>> (0,19) PR;
>>>> (2) merge: N1 keeps range lock (0,9)PR and (5,19)EX, the merged lock should
>>>> become(0,19) PR, (5,19)EX;
>>>> (3) split: N1 keeps range lock (0,9)PR, N2 tries to lock(0,5) PR, N1 should
>>>> split the lock and keep (6,9)PR.
>>> What is the purpose of doing this kind of merge/split? I assume this will be required in case of multiple processes from the same node read/write to the file. Would it not be simpler to not merge or split and keep separate instances in lock resources? This way you would have to do relatively lesser book keeping with respect to comparisons.
>>>
>> Hi,
>> Realization of this kind of merge/split is for cache of range lock to support unlock-delay.
>> For example(the granularity is block size)
>> 1.Node 1 writes to 0-9, it will keep the range lock(0,9,EX) if no other node write the same range of file.
>> 2.Node 1 writes to 10-19, then the range lock will be merged into (0,19,EX). if not, the number of locks will be more and more.
>> 3.Node 1 writes to 5-10, then no need to dlmlock from master.
>> 3.Node 2 writes to 5-10, conflict with Node 1, so Node 1 will drop (5,10), the range lock is splitted into (0,4) and (11,19).
>
> What's the merge would be like in dlm module? Will it cause deadlock when
> node1 extend 0-9 to 0-19 and node 2 extend 10-19 to 0-19?
>
> thanks,
> wengang
>
Hi,
Do you mean the following?
N1 holds range lock (0,9) and wants to lock (10,19).
N2 holds range lock (10,19) and wants to lock (0,9).
First, N1 sends a lock request for (10,19) to the master, and the master checks for conflicts among the ranges.
N1's request (10,19) conflicts with N2's lock (10,19), so the master sends a BAST message to N2.
Second, N2 sends a lock request for (0,9) to the master. N2's request (0,9) conflicts with N1's lock (0,9), so the master sends a BAST message to N1.
N2 drops range lock (10,19), then N1 merges its range lock into (0,19).
N1 drops range lock (0,9), then N1 splits its range lock into (10,19).
Finally, N1 holds range lock (10,19) and N2 holds range lock (0,9).
So there is no deadlock. Merging applies only to granted locks.
But if N2 holds range lock (10,19) and wants to lock (0,15), there is a deadlock.
When N2 drops range lock (10,19), (10,19) conflicts with another request (0,15), so the range lock request (0,15) must be canceled.
So the most important issue is to determine whether conflicts exist among the ranges.
thanks,
yangwenfang
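The conflict check that this message keeps returning to can be sketched in a few lines of C. This is only an illustration, not the demo code: `struct range_req`, `ranges_overlap` and `range_conflict` are hypothetical names, ranges are treated as inclusive block ranges, and the usual DLM mode rule is assumed (PR is compatible with PR, EX conflicts with everything). Requests from the same node are treated as non-conflicting here, since the thread says conflicts inside a node are resolved in dlmglue rather than by the master.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Lock levels named in the thread: PR = protected read, EX = exclusive. */
enum range_level { RANGE_PR, RANGE_EX };

/* Hypothetical illustration type; not a structure from the demo code. */
struct range_req {
	int node;               /* requesting or holding node */
	uint64_t start;         /* first block, inclusive */
	uint64_t end;           /* last block, inclusive */
	enum range_level level;
};

/* Two inclusive ranges overlap when each one starts before the other ends. */
static bool ranges_overlap(const struct range_req *a, const struct range_req *b)
{
	return a->start <= b->end && b->start <= a->end;
}

/*
 * Master-side check (assumed semantics): a request conflicts with a held
 * lock when it comes from another node, the ranges overlap, and at least
 * one side is EX.
 */
static bool range_conflict(const struct range_req *held,
			   const struct range_req *req)
{
	assert(held->start <= held->end && req->start <= req->end);

	if (held->node == req->node)
		return false;
	if (!ranges_overlap(held, req))
		return false;
	return held->level == RANGE_EX || req->level == RANGE_EX;
}
```

With these rules, N1's request for (10,19) conflicts with N2's granted (10,19), while the two original locks (0,9) and (10,19) coexist, which matches the first scenario in the message.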
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock
2015-01-30 3:54 ` yangwenfang
@ 2015-01-30 6:02 ` Wengang Wang
2015-01-30 7:46 ` yangwenfang
0 siblings, 1 reply; 16+ messages in thread
From: Wengang Wang @ 2015-01-30 6:02 UTC (permalink / raw)
To: ocfs2-devel
Hi Wenfang,
On 2015-01-30 11:54, yangwenfang wrote:
> On 2015/1/29 16:06, Wengang Wang wrote:
>>> On 2015/1/29 8:05, Goldwyn Rodrigues wrote:
>>>> Hi Yangwenfang,
>>>>
>>>> I appreciate the effort in this regard.
>>>>
>>>> On 01/26/2015 06:28 AM, yangwenfang wrote:
>>>>> What:
>>>>> Byte range lock is applied to lock a region of a file to accelerate
>>>>> reading/writing concurrently.
>>>>> Each lock resource deploys an interval tree to manage the range, which
>>>>> supports basic operations like add, delete, insert, find, split and merge.
>>>>> The most important issue is to determine the existance of conflicts
>>>>> among the ranges. Conflict-free ranges of the same file can be accessed
>>>>> concurrently. In the contrary, nodes must wait for the release of a
>>>>> conflicted lock before accessing the range of file.
>>>>>
>>>>> Byte range lock supports split and merge rules: for same level, larger
>>>>> scope; different level, write > read(If a node keeps EX lock with
>>>>> range(start,end), then it has PR range lock(start,end)).
>>>>> For example:
>>>>> (1) merge: N1 keeps range lock (0,9)PR and (5,19)PR, the lock is merged into
>>>>> (0,19) PR;
>>>>> (2) merge: N1 keeps range lock (0,9)PR and (5,19)EX, the merged lock should
>>>>> become(0,19) PR, (5,19)EX;
>>>>> (3) split: N1 keeps range lock (0,9)PR, N2 tries to lock(0,5) PR, N1 should
>>>>> split the lock and keep (6,9)PR.
>>>> What is the purpose of doing this kind of merge/split? I assume this will be required in case of multiple processes from the same node read/write to the file. Would it not be simpler to not merge or split and keep separate instances in lock resources? This way you would have to do relatively lesser book keeping with respect to comparisons.
>>>>
>>> Hi,
>>> Realization of this kind of merge/split is for cache of range lock to support unlock-delay.
>>> For example(the granularity is block size)
>>> 1.Node 1 writes to 0-9, it will keep the range lock(0,9,EX) if no other node write the same range of file.
>>> 2.Node 1 writes to 10-19, then the range lock will be merged into (0,19,EX). if not, the number of locks will be more and more.
>>> 3.Node 1 writes to 5-10, then no need to dlmlock from master.
>>> 3.Node 2 writes to 5-10, conflict with Node 1, so Node 1 will drop (5,10), the range lock is splitted into (0,4) and (11,19).
>> What's the merge would be like in dlm module? Will it cause deadlock when
>> node1 extend 0-9 to 0-19 and node 2 extend 10-19 to 0-19?
>>
>> thanks,
>> wengang
>>
> Hi,
> Do you mean that:
> N1 keeps range lock(0,9), and wants to lock(10,19).
> N2 keeps range lock(10,19), and wants to lock(0,9).
>
> Firstly N1 sends locking message (10,19) to master, then master determines the existance of conflicts among the ranges.
> N1(10,19) is conflict with N2(10,19). So master sends bast message to N2.
> Sencond N2 sends locking message (0,9) to master, N1(0,9) is conflict with N2 (0,9), so master sends bast message to N1.
> N2 drops range lock(10,19), then N1 merges range lock into (0,19).
> N1 drops range lock (0,9), then N1 splits range lock into (10,19).
> Finally, N1 keeps range lock (10,19), N2 keeps range lock (0,9).
>
> So, there is no deadlock. Merging is only to the granted lock.
>
> But if N2 keeps range lock(10,19), and wants to lock(0,15), there is deadlock.
> When N2 drops range lock(10,19), (10,19) is conflict with another request (0,15), range lock (0,15) must be canceled
How do you detect the deadlock and avoid it?
thanks,
wengang
> So, the most important issue is to determine the existance of conflicts among the ranges.
>
> thanks,
> yangwenfang
>
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock
2015-01-30 6:02 ` Wengang Wang
@ 2015-01-30 7:46 ` yangwenfang
0 siblings, 0 replies; 16+ messages in thread
From: yangwenfang @ 2015-01-30 7:46 UTC (permalink / raw)
To: ocfs2-devel
On 2015/1/30 14:02, Wengang Wang wrote:
> Hi Wenfang,
>
> On 2015-01-30 11:54, yangwenfang wrote:
>> On 2015/1/29 16:06, Wengang Wang wrote:
>>>> On 2015/1/29 8:05, Goldwyn Rodrigues wrote:
>>>>> Hi Yangwenfang,
>>>>>
>>>>> I appreciate the effort in this regard.
>>>>>
>>>>> On 01/26/2015 06:28 AM, yangwenfang wrote:
>>>>>> What:
>>>>>> Byte range lock is applied to lock a region of a file to accelerate
>>>>>> reading/writing concurrently.
>>>>>> Each lock resource deploys an interval tree to manage the range, which
>>>>>> supports basic operations like add, delete, insert, find, split and merge.
>>>>>> The most important issue is to determine the existance of conflicts
>>>>>> among the ranges. Conflict-free ranges of the same file can be accessed
>>>>>> concurrently. In the contrary, nodes must wait for the release of a
>>>>>> conflicted lock before accessing the range of file.
>>>>>>
>>>>>> Byte range lock supports split and merge rules: for same level, larger
>>>>>> scope; different level, write > read(If a node keeps EX lock with
>>>>>> range(start,end), then it has PR range lock(start,end)).
>>>>>> For example:
>>>>>> (1) merge: N1 keeps range lock (0,9)PR and (5,19)PR, the lock is merged into
>>>>>> (0,19) PR;
>>>>>> (2) merge: N1 keeps range lock (0,9)PR and (5,19)EX, the merged lock should
>>>>>> become(0,19) PR, (5,19)EX;
>>>>>> (3) split: N1 keeps range lock (0,9)PR, N2 tries to lock(0,5) PR, N1 should
>>>>>> split the lock and keep (6,9)PR.
>>>>> What is the purpose of doing this kind of merge/split? I assume this will be required in case of multiple processes from the same node read/write to the file. Would it not be simpler to not merge or split and keep separate instances in lock resources? This way you would have to do relatively lesser book keeping with respect to comparisons.
>>>>>
>>>> Hi,
>>>> Realization of this kind of merge/split is for cache of range lock to support unlock-delay.
>>>> For example(the granularity is block size)
>>>> 1.Node 1 writes to 0-9, it will keep the range lock(0,9,EX) if no other node write the same range of file.
>>>> 2.Node 1 writes to 10-19, then the range lock will be merged into (0,19,EX). if not, the number of locks will be more and more.
>>>> 3.Node 1 writes to 5-10, then no need to dlmlock from master.
>>>> 3.Node 2 writes to 5-10, conflict with Node 1, so Node 1 will drop (5,10), the range lock is splitted into (0,4) and (11,19).
>>> What's the merge would be like in dlm module? Will it cause deadlock when
>>> node1 extend 0-9 to 0-19 and node 2 extend 10-19 to 0-19?
>>>
>>> thanks,
>>> wengang
>>>
>> Hi,
>> Do you mean that:
>> N1 keeps range lock(0,9), and wants to lock(10,19).
>> N2 keeps range lock(10,19), and wants to lock(0,9).
>>
>> Firstly N1 sends locking message (10,19) to master, then master determines the existance of conflicts among the ranges.
>> N1(10,19) is conflict with N2(10,19). So master sends bast message to N2.
>> Sencond N2 sends locking message (0,9) to master, N1(0,9) is conflict with N2 (0,9), so master sends bast message to N1.
>> N2 drops range lock(10,19), then N1 merges range lock into (0,19).
>> N1 drops range lock (0,9), then N1 splits range lock into (10,19).
>> Finally, N1 keeps range lock (10,19), N2 keeps range lock (0,9).
>>
>> So, there is no deadlock. Merging is only to the granted lock.
>>
>> But if N2 keeps range lock(10,19), and wants to lock(0,15), there is deadlock.
>> When N2 drops range lock(10,19), (10,19) is conflict with another request (0,15), range lock (0,15) must be canceled
>
> How you detect the deadlock and avoid it?
> thanks,
> wengang
There is no additional deadlock detection mechanism.
We keep the original cancel process, which uses OCFS2_LOCK_BUSY and OCFS2_LOCK_PENDING in ocfs2_unblock_lock.
Maybe we can talk by telephone, ok?
Key data structures:
struct ocfs2_lock_res {
	struct ocfs2_cluster_connection *conn;
	void *l_priv;
	struct ocfs2_lock_res_ops *l_ops;
	spinlock_t l_lock;
	struct mutex l_wait_blocked_mutex;
	char l_name[OCFS2_LOCK_ID_MAX_LEN];
	/* Data packed - type enum ocfs2_lock_type */
	unsigned char l_type;
	unsigned long l_flags;
	wait_queue_head_t l_event;
	char lvb[DLM_LVB_LEN];
	struct list_head l_mask_waiters;
	struct list_head l_grant_list; //l_list
	struct list_head l_request_list; //l_list
	struct list_head l_region_list; //l_list
	struct list_head l_blocked_list; //l_list, remote blocking list
	struct interval_node *list_root;
	struct list_head l_debug_list;
#ifdef CONFIG_OCFS2_FS_STATS
	struct ocfs2_lock_stats l_lock_prmode; /* PR mode stats */
	u32 l_lock_refresh; /* Disk refreshes */
	struct ocfs2_lock_stats l_lock_exmode; /* EX mode stats */
#endif
};

struct ocfs2_res_range_lock {
	struct ocfs2_lock_res *l_lockres;
	struct list_head l_list;
	struct list_head l_tmp_list; //for args
	struct list_head l_remote_list; //for osb
	struct list_head l_wait_blocked_list;
	struct list_head l_mask_waiters;
	wait_queue_head_t l_event;
	struct kref l_refs;
	unsigned long l_flags;
	unsigned long l_state;
	signed char l_level;
	struct interval_node_extent out_extent;
	struct lock_interval *l_tree_node;
	struct list_head l_same_range_list;
	/* used from AST/BAST funcs. */
	/* Data packed - enum type ocfs2_ast_action */
	unsigned char l_action;
	/* Data packed - enum type ocfs2_unlock_action */
	unsigned char l_unlock_action;
	unsigned int l_pending_gen;
	struct ocfs2_dlm_lksb l_lksb;
#ifdef CONFIG_DEBUG_LOCK_ALLOC
	struct lockdep_map l_lockdep_map;
#endif
};
thanks,
yangwenfang
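The merge and split rules quoted in this message can be illustrated with a small user-space sketch. It is not taken from the demo: `struct blk_range`, `range_try_merge` and `range_split` are hypothetical names, ranges are inclusive block ranges, and both ranges are assumed to be held by one node at the same lock level (the mixed PR/EX case from the RFC is not modelled).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inclusive block range; stands in for an interval tree node. */
struct blk_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Merge two cached ranges of the same level when they overlap or are
 * directly adjacent. Returns false and leaves *out untouched when a gap
 * separates them.
 */
static bool range_try_merge(struct blk_range a, struct blk_range b,
			    struct blk_range *out)
{
	assert(a.start <= a.end && b.start <= b.end);

	/* Order the pair so that `a` starts first. */
	if (b.start < a.start) {
		struct blk_range tmp = a;
		a = b;
		b = tmp;
	}
	/* The UINT64_MAX test avoids overflow in a.end + 1. */
	if (a.end != UINT64_MAX && b.start > a.end + 1)
		return false;

	out->start = a.start;
	out->end = a.end > b.end ? a.end : b.end;
	return true;
}

/*
 * Drop `drop` from the cached range `held`. Writes the zero, one or two
 * remaining pieces to out[] and returns how many were written.
 */
static int range_split(struct blk_range held, struct blk_range drop,
		       struct blk_range out[2])
{
	int n = 0;

	assert(held.start <= held.end && drop.start <= drop.end);

	/* No overlap: nothing to give up, the held range survives as is. */
	if (drop.end < held.start || drop.start > held.end) {
		out[n++] = held;
		return n;
	}
	if (drop.start > held.start)
		out[n++] = (struct blk_range){ held.start, drop.start - 1 };
	if (drop.end < held.end)
		out[n++] = (struct blk_range){ drop.end + 1, held.end };
	return n;
}
```

Merging (0,9) with (10,19) gives (0,19), and dropping (5,10) from (0,19) leaves (0,4) and (11,19), as in the quoted example. A real implementation would do this on interval tree nodes under the lock resource's spinlock; that part is left out here.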
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock
2015-01-30 2:59 ` Xue jiufei
@ 2015-01-30 12:37 ` Goldwyn Rodrigues
2015-01-31 4:15 ` yangwenfang
0 siblings, 1 reply; 16+ messages in thread
From: Goldwyn Rodrigues @ 2015-01-30 12:37 UTC (permalink / raw)
To: ocfs2-devel
On 01/29/2015 08:59 PM, Xue jiufei wrote:
> Hi Goldwyn,
> On 2015/1/29 19:04, Goldwyn Rodrigues wrote:
>> Yangwenfang,
>>
>> On 01/29/2015 12:42 AM, yangwenfang wrote:
>>> On 2015/1/27 15:08, Srinivas Eeda wrote:
>>>> Hi Yangwenfang,
>>>>
>>>> thank you very much for initiating this RFC :). This feature is long due for OCFS2 and we are also interested in implementing this feature. Wengang(cc'ed) has been looking into analysing and giving an attempt to implement it. We haven't looked at splitting and merging the range locking yet, but looked at having lock fairness and range locking. Wengang has done some of the dlm changes to see how it can be done but other changes are still work in progress. We will email more details in coming few days.
>>>>
>>>> Since you are also looking into it, it would be great if we can collaborate work on this feature. Can you please share more info on the demo code you mentioned ? Like what it does and how much work has been done on this ?
>>>>
>>> Hi,
>>> About 6k lines of code was modified including dlmglue and dlm in our demo.
>>>
>>> code modification:
>>> 1.read/write IO: get the range(start, end) and call ocfs2_range_lock.
>>> 2.dlmglue: modify key data struct: each inode has one ocfs2_lock_res including many range locks which have different range.
>>> determine the existance of conflicts betwen multiple threads within the node.
>>> manage the cache of range lock to support unlock-delay.
>>> 3.dlm: determine the existance of conflicts betwen multiple nodes.
>>> add splitting and merging the range locking.
>>> 4.lib: interval tree.
>>>> One of the thing we considered was making the rw lock itself support range locking, which is a different approach from what you mentioned. Is there any reason why rw lock cannot be used and we needa new ip_range_lock_lockres ?
>>>>
>>> RW lock can be used, but it is complicated to add the feature to rw_lock because RW lock is also applicated in read/write/truncate.
>>> Byte range lock is only beneficial for update write, so I just modify write IO to finish the demo to get performance results as soon as possible.
>>> I think ocfs2_rw_lock(pr) + ocfs2_range_lock(start, end, ex) are equivalent to ocfs2_rw_lock(ex)?am I rigth?
>>
>> Okay, let me ask this question in another way: What is the purpose of
>> ocfs2_rw_lock(pr) in *this* scenario, where you are using
>> ocfs2_range_lock in conjunction with ocfs2_rw_lock. What is
>> ocfs2_rw_lock guarding?
>>
> Because RW lock is also used to protect O_DIRECT reads from racing with truncate,
> buffer read is not protected by RW lock. We do not want to change rw lock in buffer
> read scenario. So we add another range lock to complete this demo.
>
Sorry, I still don't understand this. You are changing the RW lock from
EX to PR during writes. This will add races (with respect to reads)
rather than solving them.
--
Goldwyn
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock
2015-01-30 12:37 ` Goldwyn Rodrigues
@ 2015-01-31 4:15 ` yangwenfang
0 siblings, 0 replies; 16+ messages in thread
From: yangwenfang @ 2015-01-31 4:15 UTC (permalink / raw)
To: ocfs2-devel
On 2015/1/30 20:37, Goldwyn Rodrigues wrote:
>
>
> On 01/29/2015 08:59 PM, Xue jiufei wrote:
>> Hi Goldwyn,
>> On 2015/1/29 19:04, Goldwyn Rodrigues wrote:
>>> Yangwenfang,
>>>
>>> On 01/29/2015 12:42 AM, yangwenfang wrote:
>>>> On 2015/1/27 15:08, Srinivas Eeda wrote:
>>>>> Hi Yangwenfang,
>>>>>
>>>>> thank you very much for initiating this RFC :). This feature is long due for OCFS2 and we are also interested in implementing this feature. Wengang(cc'ed) has been looking into analysing and giving an attempt to implement it. We haven't looked at splitting and merging the range locking yet, but looked at having lock fairness and range locking. Wengang has done some of the dlm changes to see how it can be done but other changes are still work in progress. We will email more details in coming few days.
>>>>>
>>>>> Since you are also looking into it, it would be great if we can collaborate work on this feature. Can you please share more info on the demo code you mentioned ? Like what it does and how much work has been done on this ?
>>>>>
>>>> Hi,
>>>> About 6k lines of code was modified including dlmglue and dlm in our demo.
>>>>
>>>> code modification:
>>>> 1.read/write IO: get the range(start, end) and call ocfs2_range_lock.
>>>> 2.dlmglue: modify key data struct: each inode has one ocfs2_lock_res including many range locks which have different range.
>>>> determine the existance of conflicts betwen multiple threads within the node.
>>>> manage the cache of range lock to support unlock-delay.
>>>> 3.dlm: determine the existance of conflicts betwen multiple nodes.
>>>> add splitting and merging the range locking.
>>>> 4.lib: interval tree.
>>>>> One of the thing we considered was making the rw lock itself support range locking, which is a different approach from what you mentioned. Is there any reason why rw lock cannot be used and we needa new ip_range_lock_lockres ?
>>>>>
>>>> RW lock can be used, but it is complicated to add the feature to rw_lock because RW lock is also applicated in read/write/truncate.
>>>> Byte range lock is only beneficial for update write, so I just modify write IO to finish the demo to get performance results as soon as possible.
>>>> I think ocfs2_rw_lock(pr) + ocfs2_range_lock(start, end, ex) are equivalent to ocfs2_rw_lock(ex)?am I rigth?
>>>
>>> Okay, let me ask this question in another way: What is the purpose of
>>> ocfs2_rw_lock(pr) in *this* scenario, where you are using
>>> ocfs2_range_lock in conjunction with ocfs2_rw_lock. What is
>>> ocfs2_rw_lock guarding?
>>>
>> Because RW lock is also used to protect O_DIRECT reads from racing with truncate,
>> buffer read is not protected by RW lock. We do not want to change rw lock in buffer
>> read scenario. So we add another range lock to complete this demo.
>>
>
> Sorry, I still don't understand this. You are changing the RW lock from EX to PR during writes. This will add races (with respect to reads) rather solving it.
>
OK, let me try to answer this in another way.
1. ocfs2_rw_lock is called by ocfs2_setattr(EX), __ocfs2_change_file_space(EX), ocfs2_move_extents(EX),
ocfs2_file_splice_write(EX), ocfs2_file_aio_write(EX or PR), and ocfs2_file_aio_read(PR).
Changing the RW lock from EX to PR is risky when another node holds the RW PR lock.
So we add another type of lock in the functions that use the RW PR lock, namely ocfs2_file_aio_write and ocfs2_file_aio_read.
This implementation does not change the original read/write IO flows.
The RW lock and the range lock work together to prevent races only in ocfs2_file_aio_write and ocfs2_file_aio_read.
For example, buffered read/write:
ocfs2_file_aio_write
    mutex_lock(&inode->i_mutex);
    ocfs2_rw_lock(0, Max_LEN, PR)
    ocfs2_range_lock(start, end, EX)
    generic_file_buffered_write
    ocfs2_write_begin
    ocfs2_write_end
ocfs2_file_aio_read
    ocfs2_range_lock(start, end, PR)
    generic_file_aio_read
2. Making the RW lock itself support range locking is feasible after the following changes:
1) Modify the interface of ocfs2_rw_lock to add start and end parameters.
2) Get the scope in ocfs2_file_aio_write.
3) Add a call to ocfs2_rw_lock in ocfs2_file_aio_read.
4) Use the scope (0, Max_LEN) in ocfs2_setattr, __ocfs2_change_file_space, ocfs2_move_extents, and ocfs2_file_splice_write.
thanks,
yangwenfang
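Steps 2) and 4) of the second option amount to deriving a lock scope from each caller. A minimal sketch of that step, assuming block granularity as mentioned earlier in the thread: `struct lock_scope`, `io_scope`, `whole_file_scope` and `RANGE_MAX_LEN` are hypothetical names used only for illustration (`RANGE_MAX_LEN` stands in for the Max_LEN of the message), and the sketch does not model the locking call itself.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the thread's Max_LEN; the real value is not given there. */
#define RANGE_MAX_LEN UINT64_MAX

/* Hypothetical inclusive block scope passed to a range-aware rw lock. */
struct lock_scope {
	uint64_t start;
	uint64_t end;
};

/* Scope for whole-file callers such as setattr or truncate-like paths. */
static struct lock_scope whole_file_scope(void)
{
	return (struct lock_scope){ 0, RANGE_MAX_LEN };
}

/*
 * Scope for a read or write of `count` bytes at byte offset `pos`,
 * rounded outwards to blocks of size (1 << blkbits).
 */
static struct lock_scope io_scope(uint64_t pos, uint64_t count,
				  unsigned int blkbits)
{
	struct lock_scope s;

	assert(count > 0 && blkbits < 64);
	assert(pos <= UINT64_MAX - (count - 1));  /* no wrap-around */

	s.start = pos >> blkbits;
	s.end = (pos + count - 1) >> blkbits;
	return s;
}
```

For 4 KiB blocks (blkbits = 12), a 2-byte write at offset 4095 straddles a block boundary and yields the scope (0,1), while whole-file operations always take (0, RANGE_MAX_LEN) and therefore conflict with every ranged EX holder on another node.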
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock
@ 2015-01-28 8:43 David Weber
0 siblings, 0 replies; 16+ messages in thread
From: David Weber @ 2015-01-28 8:43 UTC (permalink / raw)
To: ocfs2-devel
Hi,
On 01/26/2015 04:28 AM, yangwenfang wrote:
>
> What:
> Byte range lock is applied to lock a region of a file to accelerate
> reading/writing concurrently.
>
> Why:
> Currently ocfs2 does not support byte range lock. Since multiple nodes
> may concurrently update/write at different positions of the same file
> in database workloads, the performance(tpmc) of DB+ocfs2 is much poorer
> than
> DB+GPFS in running TPCC.
> Aiming at improving the efficiency of parallel accesses to the same file,
> we have implemented a demo of range lock feature which has been supported
> by lustre and GPFS, so that a file can be updated by different nodes in
> the cluster when they are visiting different blocks.
Would this also make cluster-aware fcntl(2) locks possible with the o2cb stack?
Cheers,
David
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2015-01-31 4:15 UTC | newest]
Thread overview: 16+ messages
2015-01-26 12:28 [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock yangwenfang
2015-01-27 7:08 ` Srinivas Eeda
2015-01-29 6:42 ` yangwenfang
2015-01-29 11:04 ` Goldwyn Rodrigues
2015-01-30 2:59 ` Xue jiufei
2015-01-30 12:37 ` Goldwyn Rodrigues
2015-01-31 4:15 ` yangwenfang
2015-01-29 11:07 ` Goldwyn Rodrigues
2015-01-29 0:05 ` Goldwyn Rodrigues
2015-01-29 3:21 ` Wengang Wang
2015-01-29 7:47 ` yangwenfang
2015-01-29 8:06 ` Wengang Wang
2015-01-30 3:54 ` yangwenfang
2015-01-30 6:02 ` Wengang Wang
2015-01-30 7:46 ` yangwenfang
2015-01-28 8:43 David Weber