* [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock
@ 2015-01-26 12:28 yangwenfang
  2015-01-27  7:08 ` Srinivas Eeda
  2015-01-29  0:05 ` Goldwyn Rodrigues
  0 siblings, 2 replies; 16+ messages in thread
From: yangwenfang @ 2015-01-26 12:28 UTC (permalink / raw)
  To: ocfs2-devel

What:
A byte range lock locks a region of a file rather than the whole file, so
that concurrent reads and writes to different regions can proceed in
parallel.

Why:
Currently ocfs2 does not support byte range locks. Database workloads often
have multiple nodes updating different positions of the same file
concurrently, so the performance (tpmC) of DB+ocfs2 is much poorer than that
of DB+GPFS when running TPC-C.
To improve the efficiency of parallel access to the same file, we have
implemented a demo of the range lock feature, which Lustre and GPFS already
support, so that different nodes in the cluster can update the same file as
long as they are accessing different blocks.

How:
Key issues in design and implementation:
1. In ocfs2, each file has only one lock, which cannot distinguish between
different positions in the file.
One solution is to add a range field (start,end) to the lock. For example:
ocfs2_lock_res (N1)               dlm_lock_resource (Master)    ocfs2_lock_res (N2)
ocfs2_res_range_lock(0,9)   --->  dlm_lock(0,9)    N1
                                  dlm_lock(10,19)  N2  <---  ocfs2_res_range_lock(10,19)
ocfs2_res_range_lock(20,29) --->  dlm_lock(20,29)  N1
                                  dlm_lock(30,49)  N2  <---  ocfs2_res_range_lock(30,49)
ocfs2_res_range_lock(50,59) --->  dlm_lock(50,59)  N1
                                  dlm_lock(60,69)  N2  <---  ocfs2_res_range_lock(60,69)

Each lock resource uses an interval tree to manage its ranges. The tree
supports basic operations such as add, delete, insert, find, split and merge.
The most important task is to determine whether ranges conflict.
Conflict-free ranges of the same file can be accessed concurrently.
Otherwise, a node must wait for the conflicting lock to be released before
accessing that range of the file.
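
As a rough illustration of the conflict check described above, here is a
minimal userspace sketch in C. It is not taken from the demo code:
`struct range_lock`, `enum range_level` and both function names are
hypothetical, ranges are treated as inclusive as in the examples, and it
assumes the usual DLM compatibility in which PR is compatible with PR and only
EX conflicts. A real implementation would look up overlapping entries in the
interval tree instead of comparing two locks directly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock levels mirroring the PR (shared) and EX (exclusive) modes. */
enum range_level { RANGE_PR, RANGE_EX };

/* Hypothetical range lock; [start, end] is inclusive. */
struct range_lock {
	uint64_t start;
	uint64_t end;
	enum range_level level;
	int node;	/* number of the node that owns the lock */
};

/* Two inclusive ranges overlap when each one starts before the other ends. */
static bool ranges_overlap(const struct range_lock *a, const struct range_lock *b)
{
	return a->start <= b->end && b->start <= a->end;
}

/*
 * Assumed rule: a request conflicts with a granted lock only if it comes
 * from another node, the ranges overlap, and at least one side is EX.
 * Conflicts between threads on the same node are not modelled here.
 */
static bool range_locks_conflict(const struct range_lock *held,
				 const struct range_lock *req)
{
	if (held->node == req->node)
		return false;
	if (!ranges_overlap(held, req))
		return false;
	return held->level == RANGE_EX || req->level == RANGE_EX;
}
```

With these assumptions, (0,9)EX on N1 and (10,19)EX on N2 can be granted
together, while (0,9)EX on N1 blocks a (5,12)PR request from N2.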

Byte range locks follow these split and merge rules: for locks of the same
level, the merged lock covers the larger scope; for locks of different
levels, write takes precedence over read (if a node holds an EX lock on
range (start,end), it also holds a PR range lock on (start,end)).
For example:
(1) merge: N1 holds range locks (0,9)PR and (5,19)PR; they are merged into
(0,19)PR;
(2) merge: N1 holds range locks (0,9)PR and (5,19)EX; the merged locks
become (0,19)PR and (5,19)EX;
(3) split: N1 holds range lock (0,9)PR and N2 tries to lock (0,5)PR; N1
splits its lock and keeps (6,9)PR.
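
The merge rules in examples (1) and (2) could be sketched roughly as follows.
This is only an illustration under assumptions: `struct range_lock` and
`range_try_merge` are hypothetical names, ranges are inclusive, and adjacent
ranges such as (0,9) and (10,19) are assumed to be mergeable as well as
overlapping ones.

```c
#include <stdbool.h>
#include <stdint.h>

enum range_level { RANGE_PR, RANGE_EX };

/* Hypothetical range lock held by one node; [start, end] is inclusive. */
struct range_lock {
	uint64_t start;
	uint64_t end;
	enum range_level level;
};

/*
 * Try to widen lock a so that it also covers lock b. Lock b is never changed.
 * Allowed when both locks have the same level, or when a is PR and b is EX
 * (an EX lock implies a PR lock on the same range).
 * Returns true and updates a on success, false if nothing was merged.
 */
static bool range_try_merge(struct range_lock *a, const struct range_lock *b)
{
	/* A PR lock cannot widen an EX lock. */
	if (a->level == RANGE_EX && b->level == RANGE_PR)
		return false;
	/* A gap of at least one unit between the ranges: nothing to merge. */
	if (b->start > a->end && b->start - a->end > 1)
		return false;
	if (a->start > b->end && a->start - b->end > 1)
		return false;
	/* Take the union of the two ranges. */
	if (b->start < a->start)
		a->start = b->start;
	if (b->end > a->end)
		a->end = b->end;
	return true;
}
```

Merging (0,9)PR with (5,19)PR gives (0,19)PR, and merging (0,9)PR with
(5,19)EX widens the PR lock to (0,19) while the EX lock stays at (5,19).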

2. In ocfs2, there are only three types of lock resources (rw, inode and
open), each protecting different contents.
We need to add another lock resource (ip_range_lock_lockres) to protect
different ranges in the IO read/write path.
For example, for buffered read/write:
(1) Write path (current ---> proposed):
    ocfs2_file_aio_write              ocfs2_file_aio_write
      ocfs2_rw_lock(ex)                 ocfs2_rw_lock(pr)
                                        ocfs2_range_lock(start, end, ex)
      ocfs2_write_begin
        ocfs2_inode_lock(ex)              ocfs2_inode_lock(pr)
                                          if append, upgrade to ex
(2) Read path: no change needed.
    ocfs2_file_aio_read
      ocfs2_readpage
        ocfs2_inode_lock(pr)
(3) Readahead, however, is a problem (current ---> proposed):
    ocfs2_readpages                   ocfs2_readpages
      ocfs2_inode_lock(pr)              ocfs2_inode_lock(pr)
                                        ocfs2_range_lock(start, end, pr)
																	
Limitations based on our assumptions:
1. Byte range locks only benefit update writes.
2. Delayed unlock leaves too many locks cached.
3. Significant source code changes are required, touching almost all of the
dlmglue and dlm modules.

As described above, our approach has many limitations.
Any advice is much appreciated.

Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock
  2015-01-26 12:28 [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock yangwenfang
@ 2015-01-27  7:08 ` Srinivas Eeda
  2015-01-29  6:42   ` yangwenfang
  2015-01-29  0:05 ` Goldwyn Rodrigues
  1 sibling, 1 reply; 16+ messages in thread
From: Srinivas Eeda @ 2015-01-27  7:08 UTC (permalink / raw)
  To: ocfs2-devel

Hi Yangwenfang,

Thank you very much for initiating this RFC :). This feature is long overdue
for OCFS2 and we are also interested in implementing it.
Wengang (cc'ed) has been analysing it and attempting an implementation. We
haven't looked at splitting and merging range locks yet, but we have looked
at lock fairness and range locking.
Wengang has made some of the dlm changes to see how it can be done, but the
other changes are still work in progress. We will email more details in the
coming few days.

Since you are also looking into it, it would be great if we could
collaborate on this feature. Can you please share more information on the
demo code you mentioned? For example, what does it do, and how much work has
been done on it?

One of the things we considered was making the rw lock itself support
range locking, which is a different approach from the one you described. Is
there any reason why the rw lock cannot be used and we need a new
ip_range_lock_lockres?

Thanks,
--Srini



* [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock
  2015-01-26 12:28 [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock yangwenfang
  2015-01-27  7:08 ` Srinivas Eeda
@ 2015-01-29  0:05 ` Goldwyn Rodrigues
  2015-01-29  3:21   ` Wengang Wang
  2015-01-29  7:47   ` yangwenfang
  1 sibling, 2 replies; 16+ messages in thread
From: Goldwyn Rodrigues @ 2015-01-29  0:05 UTC (permalink / raw)
  To: ocfs2-devel


Hi Yangwenfang,

I appreciate the effort in this regard.

On 01/26/2015 06:28 AM, yangwenfang wrote:
> What:
> Byte range lock is applied to lock a region of a file to accelerate
> reading/writing concurrently.
>
> Why:		
> Currently ocfs2 does not support byte range lock. Since multiple nodes
> may concurrently update/write at different positions of the same file
> in database workloads, the performance(tpmc) of DB+ocfs2 is much poorer than
> DB+GPFS in running TPCC.
> Aiming at improving the efficiency of parallel accesses to the same file,
> we have implemented a demo of range lock feature which has been supported
> by lustre and GPFS, so that a file can be updated by different nodes in
> the cluster when they are visiting different blocks.
>
> How:
> Key issues in design and implementation:
> 1.In ocfs2, each file only has one lock, which is incapable of telling
> different position.
> One solution is to add a range field (start,end) in a lock. For example:
> -ocfs2_lock_res(N1)	      dlm_lock_resource(Master)	ocfs2_lock_res(N2)
> -ocfs2_res_range_lock (0,9)----dlm_lock(0,9)    N1			
> -				dlm_lock(10,19)  N2<--ocfs2_res_range_lock(10,19)
> -ocfs2_res_range_lock (20,29)---dlm_lock(20,29)  N1			
> -				dlm_lock(30,49)  N2<--ocfs2_res_range_lock(30,49)
> -ocfs2_res_range_lock (50,59)---dlm_lock(50,59)  N1			
> -				dlm_lock(60,69)  N2<--ocfs2_res_range_lock(60,69)
>
> Each lock resource deploys an interval tree to manage the range, which
> supports basic operations like add, delete, insert, find, split and merge.
> The most important issue is to determine the existance of conflicts
> among the ranges. Conflict-free ranges of the same file can be accessed
> concurrently. In the contrary, nodes must wait for the release of a
> conflicted lock before accessing the range of file.
>
> Byte range lock supports split and merge rules: for same level, larger
> scope; different level, write > read(If a node keeps EX lock with
> range(start,end), then it has PR range lock(start,end)).
> For example:
> (1) merge: N1 keeps range lock (0,9)PR and (5,19)PR, the lock is merged into
> (0,19) PR;
> (2) merge: N1 keeps range lock (0,9)PR and (5,19)EX, the merged lock should
> become(0,19) PR, (5,19)EX;
> (3) split: N1 keeps range lock (0,9)PR, N2 tries to lock(0,5) PR, N1 should
> split the lock and keep (6,9)PR.

What is the purpose of this kind of merge/split? I assume it would be
required when multiple processes on the same node read from and write to
the file. Would it not be simpler to skip merging and splitting and keep
separate instances in the lock resource? That way you would have
relatively less bookkeeping to do with respect to comparisons.

Are the numbers in your pseudocode byte ranges? If so, how do you propose
to handle multiple writes that fall within a single block_size/cluster_size
range?


>
> 2.In ocfs2, there are only three types of lock resources: rw, inode and open
> which provide protections to different contents.
> We need to add another lock resource(ip_range_lock_lockres) to protect
> different ranges in IO read/write process.
> For example: buffer read/write.
> (1)ocfs2_file_aio_write	------------->ocfs2_file_aio_write
> 	ocfs2_rw_lock(ex)		ocfs2_rw_lock(pr)
> 					ocfs2_range_lock(start, end, ex)

This does not seem right. ocfs2_rw_lock is meant to serialize writes to 
the same file. Changing it from ex to pr would make the file 
inconsistent for writes to the same file. As Srini proposed, why create 
a new lock instead of adding the feature to rw_lock?

> 	ocfs2_write_begin
> 		ocfs2_inode_lock(ex)    ocfs2_inode_lock(pr)
> 					if append, update to ex;
> (2)ocfs2_file_aio_read---------------> no need to change.
> 	ocfs2_readpage
> 		ocfs2_inode_lock(pr)
> (3)but it is a problem in read_ahead.
> 	ocfs2_readpages------------------>ocfs2_readpages
> 	ocfs2_inode_lock(pr)		ocfs2_inode_lock(pr)
> 					ocfs2_range_lock(start, end, pr)
> 																	
> Limitations based on our assumption:
> 1.Byte range lock is only beneficial for update write.
> 2.Too many locks because of delayed unlock.
> 3.Significant source code modification is necessitated, involving almost the
> whole dlmglue and dlm modules.
>
> As described above, there are also many limitations base on our assumption.
> Many thanks for any advice.
>


-- 
Goldwyn


* [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock
  2015-01-29  0:05 ` Goldwyn Rodrigues
@ 2015-01-29  3:21   ` Wengang Wang
  2015-01-29  7:47   ` yangwenfang
  1 sibling, 0 replies; 16+ messages in thread
From: Wengang Wang @ 2015-01-29  3:21 UTC (permalink / raw)
  To: ocfs2-devel


On 2015-01-29 08:05, Goldwyn Rodrigues wrote:
> Hi Yangwenfang,
>
> I appreciate the effort in this regard.
>
> On 01/26/2015 06:28 AM, yangwenfang wrote:
>> What:
>> Byte range lock is applied to lock a region of a file to accelerate
>> reading/writing concurrently.
>>
>> Why:		
>> Currently ocfs2 does not support byte range lock. Since multiple nodes
>> may concurrently update/write at different positions of the same file
>> in database workloads, the performance(tpmc) of DB+ocfs2 is much poorer than
>> DB+GPFS in running TPCC.
>> Aiming at improving the efficiency of parallel accesses to the same file,
>> we have implemented a demo of range lock feature which has been supported
>> by lustre and GPFS, so that a file can be updated by different nodes in
>> the cluster when they are visiting different blocks.
>>
>> How:
>> Key issues in design and implementation:
>> 1.In ocfs2, each file only has one lock, which is incapable of telling
>> different position.
>> One solution is to add a range field (start,end) in a lock. For example:
>> -ocfs2_lock_res(N1)	      dlm_lock_resource(Master)	ocfs2_lock_res(N2)
>> -ocfs2_res_range_lock (0,9)----dlm_lock(0,9)    N1			
>> -				dlm_lock(10,19)  N2<--ocfs2_res_range_lock(10,19)
>> -ocfs2_res_range_lock (20,29)---dlm_lock(20,29)  N1			
>> -				dlm_lock(30,49)  N2<--ocfs2_res_range_lock(30,49)
>> -ocfs2_res_range_lock (50,59)---dlm_lock(50,59)  N1			
>> -				dlm_lock(60,69)  N2<--ocfs2_res_range_lock(60,69)
>>
>> Each lock resource deploys an interval tree to manage the range, which
>> supports basic operations like add, delete, insert, find, split and merge.
>> The most important issue is to determine the existance of conflicts
>> among the ranges. Conflict-free ranges of the same file can be accessed
>> concurrently. In the contrary, nodes must wait for the release of a
>> conflicted lock before accessing the range of file.
>>
>> Byte range lock supports split and merge rules: for same level, larger
>> scope; different level, write > read(If a node keeps EX lock with
>> range(start,end), then it has PR range lock(start,end)).
>> For example:
>> (1) merge: N1 keeps range lock (0,9)PR and (5,19)PR, the lock is merged into
>> (0,19) PR;
>> (2) merge: N1 keeps range lock (0,9)PR and (5,19)EX, the merged lock should
>> become(0,19) PR, (5,19)EX;
>> (3) split: N1 keeps range lock (0,9)PR, N2 tries to lock(0,5) PR, N1 should
>> split the lock and keep (6,9)PR.
> What is the purpose of doing this kind of merge/split? I assume this
> will be required in case of multiple processes from the same node
> read/write to the file. Would it not be simpler to not merge or split
> and keep separate instances in lock resources? This way you would have
> to do relatively lesser book keeping with respect to comparisons.
>
> Are these numbers in your pseudocode byte ranges? If yes, how do you
> propose multiple writes which lie within a block_size/cluster_size range?
>

Yes. If the range lock is used for file read/write, the granularity
would be the block rather than the byte.
For example, with a block size of 512, a write to bytes 0-9 would need to
lock the whole byte range 0-511 (or, in block units, blocks 0-0).
Otherwise, if two write requests touch the same block, say one writes
bytes 0-254 and the other writes bytes 255-511, and each locks only its own
byte range, the contents of that block may be corrupted after the two
writes.
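
A minimal sketch of this rounding, assuming inclusive ranges and a
hypothetical helper `bytes_to_blocks` (neither the name nor the struct comes
from ocfs2):

```c
#include <stdint.h>

/* Hypothetical block-granular range; both ends are inclusive block numbers. */
struct block_range {
	uint64_t first;
	uint64_t last;
};

/*
 * Convert an inclusive byte range into the inclusive range of blocks it
 * touches. block_size must be non-zero.
 */
static struct block_range bytes_to_blocks(uint64_t byte_start, uint64_t byte_end,
					  uint32_t block_size)
{
	struct block_range r = {
		.first = byte_start / block_size,
		.last = byte_end / block_size,
	};
	return r;
}
```

With a block size of 512, the writes to bytes 0-254 and 255-511 both map to
blocks 0-0, so their range locks overlap and the writes are serialized.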

thanks,
wengang

>> 2.In ocfs2, there are only three types of lock resources: rw, inode and open
>> which provide protections to different contents.
>> We need to add another lock resource(ip_range_lock_lockres) to protect
>> different ranges in IO read/write process.
>> For example: buffer read/write.
>> (1)ocfs2_file_aio_write	------------->ocfs2_file_aio_write
>> 	ocfs2_rw_lock(ex)		ocfs2_rw_lock(pr)
>> 					ocfs2_range_lock(start, end, ex)
> This does not seem right. ocfs2_rw_lock is meant to serialize writes to
> the same file. Changing it from ex to pr would make the file
> inconsistent for writes to the same file. As Srini proposed, why create
> a new lock instead of adding the feature to rw_lock?
>
>> 	ocfs2_write_begin
>> 		ocfs2_inode_lock(ex)    ocfs2_inode_lock(pr)
>> 					if append, update to ex;
>> (2)ocfs2_file_aio_read---------------> no need to change.
>> 	ocfs2_readpage
>> 		ocfs2_inode_lock(pr)
>> (3)but it is a problem in read_ahead.
>> 	ocfs2_readpages------------------>ocfs2_readpages
>> 	ocfs2_inode_lock(pr)		ocfs2_inode_lock(pr)
>> 					ocfs2_range_lock(start, end, pr)
>> 																	
>> Limitations based on our assumption:
>> 1.Byte range lock is only beneficial for update write.
>> 2.Too many locks because of delayed unlock.
>> 3.Significant source code modification is necessitated, involving almost the
>> whole dlmglue and dlm modules.
>>
>> As described above, there are also many limitations base on our assumption.
>> Many thanks for any advice.
>>
>


* [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock
  2015-01-27  7:08 ` Srinivas Eeda
@ 2015-01-29  6:42   ` yangwenfang
  2015-01-29 11:04     ` Goldwyn Rodrigues
  2015-01-29 11:07     ` Goldwyn Rodrigues
  0 siblings, 2 replies; 16+ messages in thread
From: yangwenfang @ 2015-01-29  6:42 UTC (permalink / raw)
  To: ocfs2-devel

On 2015/1/27 15:08, Srinivas Eeda wrote:
> Hi Yangwenfang,
> 
> thank you very much for initiating this RFC :). This feature is long due for OCFS2 and we are also interested in implementing this feature. Wengang(cc'ed) has been looking into analysing and giving an attempt to implement it. We haven't  looked at splitting and merging the range locking yet, but looked at having lock fairness and range locking. Wengang has done some of the dlm changes to see how it can be done but other changes are still work in progress. We will email more details in coming few days.
> 
> Since you are also looking into it, it would be great if we can collaborate work on this feature. Can you please share more info on the demo code you mentioned ? Like what it does and how much work has been done on this ?
> 
Hi,
About 6k lines of code were modified in our demo, including dlmglue and dlm.

Code modifications:
1. read/write IO: get the range (start, end) and call ocfs2_range_lock.
2. dlmglue: modify the key data structure so that each inode has one ocfs2_lock_res containing many range locks with different ranges;
   determine whether there are conflicts between multiple threads within the node;
   manage the cache of range locks to support delayed unlock.
3. dlm: determine whether there are conflicts between multiple nodes;
   add splitting and merging of range locks.
4. lib: interval tree.
> One of the thing we considered was making the rw lock itself support range locking, which is a different approach from what you mentioned. Is there any reason why rw lock cannot be used and we needa new ip_range_lock_lockres ?
> 
The RW lock can be used, but adding the feature to rw_lock is complicated because the RW lock is also used in read/write/truncate.
Byte range locks only benefit update writes, so I modified only the write IO path to finish the demo and get performance results as soon as possible.
I think ocfs2_rw_lock(pr) + ocfs2_range_lock(start, end, ex) is equivalent to ocfs2_rw_lock(ex). Am I right?

* [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock
  2015-01-29  0:05 ` Goldwyn Rodrigues
  2015-01-29  3:21   ` Wengang Wang
@ 2015-01-29  7:47   ` yangwenfang
  2015-01-29  8:06     ` Wengang Wang
  1 sibling, 1 reply; 16+ messages in thread
From: yangwenfang @ 2015-01-29  7:47 UTC (permalink / raw)
  To: ocfs2-devel

On 2015/1/29 8:05, Goldwyn Rodrigues wrote:
> 
> Hi Yangwenfang,
> 
> I appreciate the effort in this regard.
> 
> On 01/26/2015 06:28 AM, yangwenfang wrote:
>> What:
>> Byte range lock is applied to lock a region of a file to accelerate
>> reading/writing concurrently.

>> Each lock resource deploys an interval tree to manage the range, which
>> supports basic operations like add, delete, insert, find, split and merge.
>> The most important issue is to determine the existance of conflicts
>> among the ranges. Conflict-free ranges of the same file can be accessed
>> concurrently. In the contrary, nodes must wait for the release of a
>> conflicted lock before accessing the range of file.
>>
>> Byte range lock supports split and merge rules: for same level, larger
>> scope; different level, write > read(If a node keeps EX lock with
>> range(start,end), then it has PR range lock(start,end)).
>> For example:
>> (1) merge: N1 keeps range lock (0,9)PR and (5,19)PR, the lock is merged into
>> (0,19) PR;
>> (2) merge: N1 keeps range lock (0,9)PR and (5,19)EX, the merged lock should
>> become(0,19) PR, (5,19)EX;
>> (3) split: N1 keeps range lock (0,9)PR, N2 tries to lock(0,5) PR, N1 should
>> split the lock and keep (6,9)PR.
> 
> What is the purpose of doing this kind of merge/split? I assume this will be required in case of multiple processes from the same node read/write to the file. Would it not be simpler to not merge or split and keep separate instances in lock resources? This way you would have to do relatively lesser book keeping with respect to comparisons.
> 
Hi,
This kind of merge/split is implemented so that range locks can be cached to support delayed unlock.
For example (the granularity is the block size):
1. Node 1 writes to 0-9; it keeps the range lock (0,9,EX) as long as no other node writes to the same range of the file.
2. Node 1 writes to 10-19; the range lock is then merged into (0,19,EX). Without merging, the number of locks would keep growing.
3. Node 1 writes to 5-10; there is no need to request a dlmlock from the master.
4. Node 2 writes to 5-10, which conflicts with Node 1, so Node 1 drops (5,10) and its range lock is split into (0,4) and (11,19).
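
The split in the last step can be sketched as follows. This is a userspace
illustration only, not the demo code: `struct range` and `range_split` are
hypothetical names, and ranges are inclusive.

```c
#include <stdint.h>

/* Hypothetical inclusive range in lock-granularity units. */
struct range {
	uint64_t start;
	uint64_t end;
};

/*
 * Remove the range drop from the cached range held. The surviving pieces
 * are written to out[] and their count (0, 1 or 2) is returned.
 */
static int range_split(struct range held, struct range drop, struct range out[2])
{
	int n = 0;

	/* No overlap: the cached range survives untouched. */
	if (drop.end < held.start || drop.start > held.end) {
		out[n++] = held;
		return n;
	}
	/* Piece to the left of the dropped range. */
	if (drop.start > held.start) {
		out[n].start = held.start;
		out[n].end = drop.start - 1;
		n++;
	}
	/* Piece to the right of the dropped range. */
	if (drop.end < held.end) {
		out[n].start = drop.end + 1;
		out[n].end = held.end;
		n++;
	}
	return n;
}
```

Dropping (5,10) from a cached (0,19) leaves (0,4) and (11,19), and dropping
(0,5) from (0,9) leaves only (6,9), as in example (3) of the original
proposal.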

> Are these numbers in your pseudocode byte ranges? If yes, how do you propose multiple writes which lie within a block_size/cluster_size range?
> 
No. These numbers are in units of the block size or PAGE_SIZE. The smaller the granularity, the more conflicts occur. In our tests we actually use 1M.

thanks,
yangwenfang


* [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock
  2015-01-29  7:47   ` yangwenfang
@ 2015-01-29  8:06     ` Wengang Wang
  2015-01-30  3:54       ` yangwenfang
  0 siblings, 1 reply; 16+ messages in thread
From: Wengang Wang @ 2015-01-29  8:06 UTC (permalink / raw)
  To: ocfs2-devel


On 2015-01-29 15:47, yangwenfang wrote:
> On 2015/1/29 8:05, Goldwyn Rodrigues wrote:
>> Hi Yangwenfang,
>>
>> I appreciate the effort in this regard.
>>
>> On 01/26/2015 06:28 AM, yangwenfang wrote:
>>> What:
>>> Byte range lock is applied to lock a region of a file to accelerate
>>> reading/writing concurrently.
>>> Each lock resource deploys an interval tree to manage the range, which
>>> supports basic operations like add, delete, insert, find, split and merge.
>>> The most important issue is to determine the existance of conflicts
>>> among the ranges. Conflict-free ranges of the same file can be accessed
>>> concurrently. In the contrary, nodes must wait for the release of a
>>> conflicted lock before accessing the range of file.
>>>
>>> Byte range lock supports split and merge rules: for same level, larger
>>> scope; different level, write > read(If a node keeps EX lock with
>>> range(start,end), then it has PR range lock(start,end)).
>>> For example:
>>> (1) merge: N1 keeps range lock (0,9)PR and (5,19)PR, the lock is merged into
>>> (0,19) PR;
>>> (2) merge: N1 keeps range lock (0,9)PR and (5,19)EX, the merged lock should
>>> become(0,19) PR, (5,19)EX;
>>> (3) split: N1 keeps range lock (0,9)PR, N2 tries to lock(0,5) PR, N1 should
>>> split the lock and keep (6,9)PR.
>> What is the purpose of doing this kind of merge/split? I assume this will be required in case of multiple processes from the same node read/write to the file. Would it not be simpler to not merge or split and keep separate instances in lock resources? This way you would have to do relatively lesser book keeping with respect to comparisons.
>>
> Hi,
> Realization of this kind of merge/split is for cache of range lock to support unlock-delay.
> For example(the granularity is block size)
> 1.Node 1 writes to 0-9, it will keep the range lock(0,9,EX) if no other node write the same range of file.
> 2.Node 1 writes to 10-19, then the range lock will be merged into (0,19,EX). if not, the number of locks will be more and more.
> 3.Node 1 writes to 5-10, then no need to dlmlock from master.
> 3.Node 2 writes to 5-10, conflict with Node 1, so Node 1 will drop (5,10), the range lock is splitted into (0,4) and (11,19).

What would the merge look like in the dlm module? Would it cause a deadlock
when node 1 extends 0-9 to 0-19 while node 2 extends 10-19 to 0-19?
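
To make the scenario concrete, here is a small sketch under stated
assumptions: `struct node_state` and `mutual_wait` are hypothetical, both
nodes are assumed to hold and request EX, and ranges are inclusive. It only
shows that each extension request overlaps the range the other node already
holds. Whether the dlm would really deadlock depends on how such requests are
queued and on whether a holder is asked to drop its cached range, which has
not been described in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inclusive range. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Hypothetical per-node view: the EX range held and the EX range requested. */
struct node_state {
	struct range held;
	struct range wanted;
};

static bool overlap(struct range a, struct range b)
{
	return a.start <= b.end && b.start <= a.end;
}

/* True when each node's request is blocked by the range the other one holds. */
static bool mutual_wait(const struct node_state *n1, const struct node_state *n2)
{
	return overlap(n1->wanted, n2->held) && overlap(n2->wanted, n1->held);
}
```

For node 1 holding (0,9) and node 2 holding (10,19), both asking for (0,19),
`mutual_wait` returns true, so some rule is needed to decide which node gives
way first.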

thanks,
wengang

>> Are these numbers in your pseudocode byte ranges? If yes, how do you propose multiple writes which lie within a block_size/cluster_size range?
>>
> No, the granularity of these numbers is block size or PAGE_SIZE. The granularity is smaller, the conflict is more. Actually, we use 1M in our test.
>
> thanks,
> yangwenfang

* [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock
  2015-01-29  6:42   ` yangwenfang
@ 2015-01-29 11:04     ` Goldwyn Rodrigues
  2015-01-30  2:59       ` Xue jiufei
  2015-01-29 11:07     ` Goldwyn Rodrigues
  1 sibling, 1 reply; 16+ messages in thread
From: Goldwyn Rodrigues @ 2015-01-29 11:04 UTC (permalink / raw)
  To: ocfs2-devel

Yangwenfang,

On 01/29/2015 12:42 AM, yangwenfang wrote:
> On 2015/1/27 15:08, Srinivas Eeda wrote:
>> Hi Yangwenfang,
>>
>> thank you very much for initiating this RFC :). This feature is long due for OCFS2 and we are also interested in implementing this feature. Wengang(cc'ed) has been looking into analysing and giving an attempt to implement it. We haven't  looked at splitting and merging the range locking yet, but looked at having lock fairness and range locking. Wengang has done some of the dlm changes to see how it can be done but other changes are still work in progress. We will email more details in coming few days.
>>
>> Since you are also looking into it, it would be great if we can collaborate work on this feature. Can you please share more info on the demo code you mentioned ? Like what it does and how much work has been done on this ?
>>
> Hi,
> About 6k lines of code was modified including dlmglue and dlm in our demo.
>
> code modification:
> 1.read/write IO: get the range(start, end) and call ocfs2_range_lock.
> 2.dlmglue: modify key data struct: each inode has one ocfs2_lock_res including many range locks which have different range.
> 		   determine the existance of conflicts betwen multiple threads within the node.
> 		   manage the cache of range lock to support unlock-delay.
> 3.dlm: determine the existance of conflicts betwen multiple nodes.
> 	   add splitting and merging the range locking.
> 4.lib: interval tree.
>> One of the thing we considered was making the rw lock itself support range locking, which is a different approach from what you mentioned. Is there any reason why rw lock cannot be used and we needa new ip_range_lock_lockres ?
>>
> RW lock can be used, but it is complicated to add the feature to rw_lock because RW lock is also applicated in read/write/truncate.
> Byte range lock is only beneficial for update write, so I just modify write IO to finish the demo to get performance results as soon as possible.
> I think ocfs2_rw_lock(pr)  + ocfs2_range_lock(start, end, ex) are equivalent to ocfs2_rw_lock(ex)?am I rigth?

Okay, let me ask this question another way: what is the purpose of
ocfs2_rw_lock(pr) in *this* scenario, where you are using
ocfs2_range_lock in conjunction with ocfs2_rw_lock? What is
ocfs2_rw_lock guarding?

-- 
Goldwyn

* [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock
  2015-01-29  6:42   ` yangwenfang
  2015-01-29 11:04     ` Goldwyn Rodrigues
@ 2015-01-29 11:07     ` Goldwyn Rodrigues
  1 sibling, 0 replies; 16+ messages in thread
From: Goldwyn Rodrigues @ 2015-01-29 11:07 UTC (permalink / raw)
  To: ocfs2-devel

Yangwenfang,

On 01/29/2015 12:42 AM, yangwenfang wrote:
> On 2015/1/27 15:08, Srinivas Eeda wrote:
>> Hi Yangwenfang,
>>
>> thank you very much for initiating this RFC :). This feature is long due for OCFS2 and we are also interested in implementing this feature. Wengang(cc'ed) has been looking into analysing and giving an attempt to implement it. We haven't  looked at splitting and merging the range locking yet, but looked at having lock fairness and range locking. Wengang has done some of the dlm changes to see how it can be done but other changes are still work in progress. We will email more details in coming few days.
>>
>> Since you are also looking into it, it would be great if we can collaborate work on this feature. Can you please share more info on the demo code you mentioned ? Like what it does and how much work has been done on this ?
>>
> Hi,
> About 6k lines of code was modified including dlmglue and dlm in our demo.
>
> code modification:
> 1.read/write IO: get the range(start, end) and call ocfs2_range_lock.
> 2.dlmglue: modify key data struct: each inode has one ocfs2_lock_res including many range locks which have different range.
> 		   determine the existance of conflicts betwen multiple threads within the node.
> 		   manage the cache of range lock to support unlock-delay.
> 3.dlm: determine the existance of conflicts betwen multiple nodes.
> 	   add splitting and merging the range locking.
> 4.lib: interval tree.
>> One of the thing we considered was making the rw lock itself support range locking, which is a different approach from what you mentioned. Is there any reason why rw lock cannot be used and we needa new ip_range_lock_lockres ?
>>
> RW lock can be used, but it is complicated to add the feature to rw_lock because RW lock is also applicated in read/write/truncate.
> Byte range lock is only beneficial for update write, so I just modify write IO to finish the demo to get performance results as soon as possible.
> I think ocfs2_rw_lock(pr)  + ocfs2_range_lock(start, end, ex) are equivalent to ocfs2_rw_lock(ex)?am I rigth?


Okay, let me ask this question another way: what is the purpose of
ocfs2_rw_lock(pr) in *this* scenario, where you are using
ocfs2_range_lock in conjunction with ocfs2_rw_lock? What is
ocfs2_rw_lock guarding?

-- 
Goldwyn

* [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock
  2015-01-29 11:04     ` Goldwyn Rodrigues
@ 2015-01-30  2:59       ` Xue jiufei
  2015-01-30 12:37         ` Goldwyn Rodrigues
  0 siblings, 1 reply; 16+ messages in thread
From: Xue jiufei @ 2015-01-30  2:59 UTC (permalink / raw)
  To: ocfs2-devel

Hi Goldwyn,
On 2015/1/29 19:04, Goldwyn Rodrigues wrote:
> Yangwenfang,
> 
> On 01/29/2015 12:42 AM, yangwenfang wrote:
>> On 2015/1/27 15:08, Srinivas Eeda wrote:
>>> Hi Yangwenfang,
>>>
>>> thank you very much for initiating this RFC :). This feature is long due for OCFS2 and we are also interested in implementing this feature. Wengang(cc'ed) has been looking into analysing and giving an attempt to implement it. We haven't  looked at splitting and merging the range locking yet, but looked at having lock fairness and range locking. Wengang has done some of the dlm changes to see how it can be done but other changes are still work in progress. We will email more details in coming few days.
>>>
>>> Since you are also looking into it, it would be great if we can collaborate work on this feature. Can you please share more info on the demo code you mentioned ? Like what it does and how much work has been done on this ?
>>>
>> Hi,
>> About 6k lines of code was modified including dlmglue and dlm in our demo.
>>
>> code modification:
>> 1.read/write IO: get the range(start, end) and call ocfs2_range_lock.
>> 2.dlmglue: modify key data struct: each inode has one ocfs2_lock_res including many range locks which have different range.
>> 		   determine the existance of conflicts betwen multiple threads within the node.
>> 		   manage the cache of range lock to support unlock-delay.
>> 3.dlm: determine the existance of conflicts betwen multiple nodes.
>> 	   add splitting and merging the range locking.
>> 4.lib: interval tree.
>>> One of the thing we considered was making the rw lock itself support range locking, which is a different approach from what you mentioned. Is there any reason why rw lock cannot be used and we needa new ip_range_lock_lockres ?
>>>
>> RW lock can be used, but it is complicated to add the feature to rw_lock because RW lock is also applicated in read/write/truncate.
>> Byte range lock is only beneficial for update write, so I just modify write IO to finish the demo to get performance results as soon as possible.
>> I think ocfs2_rw_lock(pr)  + ocfs2_range_lock(start, end, ex) are equivalent to ocfs2_rw_lock(ex)?am I rigth?
> 
> Okay, let me ask this question in another way: What is the purpose of 
> ocfs2_rw_lock(pr) in *this* scenario, where you are using 
> ocfs2_range_lock in conjunction with ocfs2_rw_lock. What is 
> ocfs2_rw_lock guarding?
> 
The RW lock is also used to protect O_DIRECT reads from racing with truncate,
while buffered reads are not protected by the RW lock. We did not want to change the
RW lock in the buffered read scenario, so we added a separate range lock to complete this demo.

Thanks
Xuejiufei

* [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock
  2015-01-29  8:06     ` Wengang Wang
@ 2015-01-30  3:54       ` yangwenfang
  2015-01-30  6:02         ` Wengang Wang
  0 siblings, 1 reply; 16+ messages in thread
From: yangwenfang @ 2015-01-30  3:54 UTC (permalink / raw)
  To: ocfs2-devel

On 2015/1/29 16:06, Wengang Wang wrote:
> 
>> On 2015/1/29 8:05, Goldwyn Rodrigues wrote:
>>> Hi Yangwenfang,
>>>
>>> I appreciate the effort in this regard.
>>>
>>> On 01/26/2015 06:28 AM, yangwenfang wrote:
>>>> What:
>>>> Byte range lock is applied to lock a region of a file to accelerate
>>>> reading/writing concurrently.
>>>> Each lock resource deploys an interval tree to manage the range, which
>>>> supports basic operations like add, delete, insert, find, split and merge.
>>>> The most important issue is to determine the existance of conflicts
>>>> among the ranges. Conflict-free ranges of the same file can be accessed
>>>> concurrently. In the contrary, nodes must wait for the release of a
>>>> conflicted lock before accessing the range of file.
>>>>
>>>> Byte range lock supports split and merge rules: for same level, larger
>>>> scope; different level, write > read(If a node keeps EX lock with
>>>> range(start,end), then it has PR range lock(start,end)).
>>>> For example:
>>>> (1) merge: N1 keeps range lock (0,9)PR and (5,19)PR, the lock is merged into
>>>> (0,19) PR;
>>>> (2) merge: N1 keeps range lock (0,9)PR and (5,19)EX, the merged lock should
>>>> become(0,19) PR, (5,19)EX;
>>>> (3) split: N1 keeps range lock (0,9)PR, N2 tries to lock(0,5) PR, N1 should
>>>> split the lock and keep (6,9)PR.
>>> What is the purpose of doing this kind of merge/split? I assume this will be required in case of multiple processes from the same node read/write to the file. Would it not be simpler to not merge or split and keep separate instances in lock resources? This way you would have to do relatively lesser book keeping with respect to comparisons.
>>>
>> Hi,
>> Realization of this kind of merge/split is for cache of range lock to support unlock-delay.
>> For example(the granularity is block size)
>> 1.Node 1 writes to 0-9, it will keep the range lock(0,9,EX) if no other node write the same range of file.
>> 2.Node 1 writes to 10-19, then the range lock will be merged into (0,19,EX). if not, the number of locks will be more and more.
>> 3.Node 1 writes to 5-10, then no need to dlmlock from master.
>> 3.Node 2 writes to 5-10, conflict with Node 1, so Node 1 will drop (5,10), the range lock is splitted into (0,4) and (11,19).
> 
> What's the merge would be like in dlm module? Will it cause deadlock when
> node1 extend 0-9 to 0-19  and node 2 extend 10-19 to 0-19?
> 
> thanks,
> wengang
> 
Hi,
Do you mean the following?
N1 holds range lock (0,9) and wants to lock (10,19).
N2 holds range lock (10,19) and wants to lock (0,9).

First, N1 sends a lock request for (10,19) to the master, and the master checks for conflicts among the ranges.
N1's request (10,19) conflicts with N2's lock (10,19), so the master sends a BAST message to N2.
Second, N2 sends a lock request for (0,9) to the master. N2's request (0,9) conflicts with N1's lock (0,9), so the master sends a BAST message to N1.
N2 drops range lock (10,19), then N1 merges its range lock into (0,19).
N1 drops range lock (0,9), then N1 splits its range lock into (10,19).
Finally, N1 holds range lock (10,19) and N2 holds range lock (0,9).

So there is no deadlock. Merging applies only to granted locks.
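
The merge and split steps in this exchange can be sketched in a few lines of C. This is only an illustration, not the demo code: struct range, range_merge and range_split are hypothetical names, ranges are assumed to be inclusive and counted in lock granules, and both ranges passed to range_merge are assumed to be held at the same level.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inclusive range of lock granules; not the demo's actual type. */
struct range {
	uint64_t start;
	uint64_t end;	/* inclusive, start <= end */
};

/*
 * Merge two granted ranges of the same level if they overlap or are adjacent.
 * Returns true and stores the union in *out; returns false if a gap of at
 * least one granule separates them.
 */
static bool range_merge(struct range a, struct range b, struct range *out)
{
	assert(a.start <= a.end && b.start <= b.end);
	if ((a.end < b.start && b.start - a.end > 1) ||
	    (b.end < a.start && a.start - b.end > 1))
		return false;
	out->start = a.start < b.start ? a.start : b.start;
	out->end = a.end > b.end ? a.end : b.end;
	return true;
}

/*
 * Drop the part of `held` that overlaps `drop`. The surviving pieces
 * (zero, one or two of them) are written to out[]; the count is returned.
 */
static int range_split(struct range held, struct range drop, struct range out[2])
{
	int n = 0;

	assert(held.start <= held.end && drop.start <= drop.end);
	/* No overlap: the held range survives unchanged. */
	if (drop.end < held.start || drop.start > held.end) {
		out[n++] = held;
		return n;
	}
	if (drop.start > held.start)
		out[n++] = (struct range){ held.start, drop.start - 1 };
	if (drop.end < held.end)
		out[n++] = (struct range){ drop.end + 1, held.end };
	return n;
}
```

With these helpers, merging (0,9) and (10,19) gives (0,19), and dropping (0,9) from (0,19) leaves the single piece (10,19), which matches the sequence described for N1.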

But if N2 holds range lock (10,19) and wants to lock (0,15), there is a deadlock.
When N2 drops range lock (10,19), (10,19) conflicts with another request (0,15), so the range lock (0,15) must be canceled.

So the most important issue is to determine whether the ranges conflict.
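
As a rough sketch of such a conflict check on the master side, assuming the usual DLM compatibility rule (PR is compatible with PR, EX is compatible with nothing) and inclusive ranges: the names struct range_req, ranges_overlap and ranges_conflict are hypothetical and are not taken from the demo. The demo's real rule may be stricter, since split example (3) in the RFC has N1 give up part of a PR range to a PR request from N2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock levels: shared read and exclusive. */
enum lock_level { LEVEL_PR, LEVEL_EX };

/* Hypothetical description of a granted or requested range lock. */
struct range_req {
	int node;		/* node number that holds or requests the lock */
	uint64_t start;
	uint64_t end;		/* inclusive */
	enum lock_level level;
};

/* Two inclusive ranges overlap when each one starts before the other ends. */
static bool ranges_overlap(const struct range_req *a, const struct range_req *b)
{
	assert(a->start <= a->end && b->start <= b->end);
	return a->start <= b->end && b->start <= a->end;
}

/*
 * Cross-node conflict check: locks from the same node never conflict here
 * (conflicts between threads on one node are assumed to be resolved locally),
 * and overlapping locks conflict only if at least one of them is EX.
 */
static bool ranges_conflict(const struct range_req *a, const struct range_req *b)
{
	if (a->node == b->node)
		return false;
	if (!ranges_overlap(a, b))
		return false;
	return a->level == LEVEL_EX || b->level == LEVEL_EX;
}
```

An interval tree would replace the pairwise comparison with a stabbing query over all granted ranges, but the per-pair predicate stays the same.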

thanks,
yangwenfang

* [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock
  2015-01-30  3:54       ` yangwenfang
@ 2015-01-30  6:02         ` Wengang Wang
  2015-01-30  7:46           ` yangwenfang
  0 siblings, 1 reply; 16+ messages in thread
From: Wengang Wang @ 2015-01-30  6:02 UTC (permalink / raw)
  To: ocfs2-devel

Hi Wenfang,

On 2015/01/30 11:54, yangwenfang wrote:
> On 2015/1/29 16:06, Wengang Wang wrote:
>>> On 2015/1/29 8:05, Goldwyn Rodrigues wrote:
>>>> Hi Yangwenfang,
>>>>
>>>> I appreciate the effort in this regard.
>>>>
>>>> On 01/26/2015 06:28 AM, yangwenfang wrote:
>>>>> What:
>>>>> Byte range lock is applied to lock a region of a file to accelerate
>>>>> reading/writing concurrently.
>>>>> Each lock resource deploys an interval tree to manage the range, which
>>>>> supports basic operations like add, delete, insert, find, split and merge.
>>>>> The most important issue is to determine the existance of conflicts
>>>>> among the ranges. Conflict-free ranges of the same file can be accessed
>>>>> concurrently. In the contrary, nodes must wait for the release of a
>>>>> conflicted lock before accessing the range of file.
>>>>>
>>>>> Byte range lock supports split and merge rules: for same level, larger
>>>>> scope; different level, write > read(If a node keeps EX lock with
>>>>> range(start,end), then it has PR range lock(start,end)).
>>>>> For example:
>>>>> (1) merge: N1 keeps range lock (0,9)PR and (5,19)PR, the lock is merged into
>>>>> (0,19) PR;
>>>>> (2) merge: N1 keeps range lock (0,9)PR and (5,19)EX, the merged lock should
>>>>> become(0,19) PR, (5,19)EX;
>>>>> (3) split: N1 keeps range lock (0,9)PR, N2 tries to lock(0,5) PR, N1 should
>>>>> split the lock and keep (6,9)PR.
>>>> What is the purpose of doing this kind of merge/split? I assume this will be required in case of multiple processes from the same node read/write to the file. Would it not be simpler to not merge or split and keep separate instances in lock resources? This way you would have to do relatively lesser book keeping with respect to comparisons.
>>>>
>>> Hi,
>>> Realization of this kind of merge/split is for cache of range lock to support unlock-delay.
>>> For example(the granularity is block size)
>>> 1.Node 1 writes to 0-9, it will keep the range lock(0,9,EX) if no other node write the same range of file.
>>> 2.Node 1 writes to 10-19, then the range lock will be merged into (0,19,EX). if not, the number of locks will be more and more.
>>> 3.Node 1 writes to 5-10, then no need to dlmlock from master.
>>> 3.Node 2 writes to 5-10, conflict with Node 1, so Node 1 will drop (5,10), the range lock is splitted into (0,4) and (11,19).
>> What's the merge would be like in dlm module? Will it cause deadlock when
>> node1 extend 0-9 to 0-19  and node 2 extend 10-19 to 0-19?
>>
>> thanks,
>> wengang
>>
> Hi,
> Do you mean that:
> N1 keeps range lock(0,9), and wants to lock(10,19).
> N2 keeps range lock(10,19), and wants to lock(0,9).
>
> Firstly N1 sends locking message (10,19) to master, then master determines the existance of conflicts among the ranges.
> N1(10,19) is conflict with N2(10,19). So master sends bast message to N2.
> Sencond N2 sends locking message (0,9) to master, N1(0,9) is conflict with N2 (0,9), so master sends bast message to N1.
> N2 drops range lock(10,19), then N1 merges range lock into (0,19).
> N1 drops range lock (0,9), then N1 splits range lock into (10,19).
> Finally, N1 keeps range lock (10,19), N2 keeps range lock (0,9).
>
> So, there is no deadlock. Merging is only to the granted lock.
>
> But if N2 keeps range lock(10,19), and wants to lock(0,15), there is deadlock.
> When N2 drops range lock(10,19), (10,19) is conflict with another request (0,15), range lock (0,15) must be canceled

How do you detect the deadlock and avoid it?
thanks,
wengang
> So, the most important issue is to determine the existance of conflicts among the ranges.
>
> thanks,
> yangwenfang
>
>

* [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock
  2015-01-30  6:02         ` Wengang Wang
@ 2015-01-30  7:46           ` yangwenfang
  0 siblings, 0 replies; 16+ messages in thread
From: yangwenfang @ 2015-01-30  7:46 UTC (permalink / raw)
  To: ocfs2-devel

On 2015/1/30 14:02, Wengang Wang wrote:
> Hi Wenfang,
> 
> On 2015/01/30 11:54, yangwenfang wrote:
>> On 2015/1/29 16:06, Wengang Wang wrote:
>>>> On 2015/1/29 8:05, Goldwyn Rodrigues wrote:
>>>>> Hi Yangwenfang,
>>>>>
>>>>> I appreciate the effort in this regard.
>>>>>
>>>>> On 01/26/2015 06:28 AM, yangwenfang wrote:
>>>>>> What:
>>>>>> Byte range lock is applied to lock a region of a file to accelerate
>>>>>> reading/writing concurrently.
>>>>>> Each lock resource deploys an interval tree to manage the range, which
>>>>>> supports basic operations like add, delete, insert, find, split and merge.
>>>>>> The most important issue is to determine the existance of conflicts
>>>>>> among the ranges. Conflict-free ranges of the same file can be accessed
>>>>>> concurrently. In the contrary, nodes must wait for the release of a
>>>>>> conflicted lock before accessing the range of file.
>>>>>>
>>>>>> Byte range lock supports split and merge rules: for same level, larger
>>>>>> scope; different level, write > read(If a node keeps EX lock with
>>>>>> range(start,end), then it has PR range lock(start,end)).
>>>>>> For example:
>>>>>> (1) merge: N1 keeps range lock (0,9)PR and (5,19)PR, the lock is merged into
>>>>>> (0,19) PR;
>>>>>> (2) merge: N1 keeps range lock (0,9)PR and (5,19)EX, the merged lock should
>>>>>> become(0,19) PR, (5,19)EX;
>>>>>> (3) split: N1 keeps range lock (0,9)PR, N2 tries to lock(0,5) PR, N1 should
>>>>>> split the lock and keep (6,9)PR.
>>>>> What is the purpose of doing this kind of merge/split? I assume this will be required in case of multiple processes from the same node read/write to the file. Would it not be simpler to not merge or split and keep separate instances in lock resources? This way you would have to do relatively lesser book keeping with respect to comparisons.
>>>>>
>>>> Hi,
>>>> Realization of this kind of merge/split is for cache of range lock to support unlock-delay.
>>>> For example(the granularity is block size)
>>>> 1.Node 1 writes to 0-9, it will keep the range lock(0,9,EX) if no other node write the same range of file.
>>>> 2.Node 1 writes to 10-19, then the range lock will be merged into (0,19,EX). if not, the number of locks will be more and more.
>>>> 3.Node 1 writes to 5-10, then no need to dlmlock from master.
>>>> 3.Node 2 writes to 5-10, conflict with Node 1, so Node 1 will drop (5,10), the range lock is splitted into (0,4) and (11,19).
>>> What's the merge would be like in dlm module? Will it cause deadlock when
>>> node1 extend 0-9 to 0-19  and node 2 extend 10-19 to 0-19?
>>>
>>> thanks,
>>> wengang
>>>
>> Hi,
>> Do you mean that:
>> N1 keeps range lock(0,9), and wants to lock(10,19).
>> N2 keeps range lock(10,19), and wants to lock(0,9).
>>
>> Firstly N1 sends locking message (10,19) to master, then master determines the existance of conflicts among the ranges.
>> N1(10,19) is conflict with N2(10,19). So master sends bast message to N2.
>> Sencond N2 sends locking message (0,9) to master, N1(0,9) is conflict with N2 (0,9), so master sends bast message to N1.
>> N2 drops range lock(10,19), then N1 merges range lock into (0,19).
>> N1 drops range lock (0,9), then N1 splits range lock into (10,19).
>> Finally, N1 keeps range lock (10,19), N2 keeps range lock (0,9).
>>
>> So, there is no deadlock. Merging is only to the granted lock.
>>
>> But if N2 keeps range lock(10,19), and wants to lock(0,15), there is deadlock.
>> When N2 drops range lock(10,19), (10,19) is conflict with another request (0,15), range lock (0,15) must be canceled
> 
> How you detect the deadlock and avoid it?
> thanks,
> wengang

There is no additional deadlock detection mechanism.
We keep the original cancel process, which uses OCFS2_LOCK_BUSY and OCFS2_LOCK_PENDING in ocfs2_unblock_lock.

Maybe we could discuss this by telephone?

Key data structures:
struct ocfs2_lock_res {
	struct ocfs2_cluster_connection *conn;
	void                    *l_priv;
	struct ocfs2_lock_res_ops *l_ops;

	spinlock_t               l_lock;
	struct mutex      l_wait_blocked_mutex;

	char                     l_name[OCFS2_LOCK_ID_MAX_LEN];
	/* Data packed - type enum ocfs2_lock_type */
	unsigned char            l_type;
	unsigned long		 l_flags;
	wait_queue_head_t        l_event;

	char lvb[DLM_LVB_LEN];
	
	struct list_head         l_mask_waiters;
	struct list_head  l_grant_list;   //l_list
	struct list_head  l_request_list; //l_list
	struct list_head  l_region_list;  //l_list

	struct list_head  l_blocked_list;   //l_list, remote blocking list

	struct interval_node  *list_root;
	struct list_head         l_debug_list;

#ifdef CONFIG_OCFS2_FS_STATS
	struct ocfs2_lock_stats  l_lock_prmode;		/* PR mode stats */
	u32                      l_lock_refresh;	/* Disk refreshes */
	struct ocfs2_lock_stats  l_lock_exmode;		/* EX mode stats */
#endif
};
struct ocfs2_res_range_lock {
	struct ocfs2_lock_res *l_lockres;

	struct list_head  l_list;
	struct list_head  l_tmp_list;  //for args
	struct list_head  l_remote_list; //for osb
	struct list_head  l_wait_blocked_list;
	struct list_head         l_mask_waiters;
	wait_queue_head_t        l_event;

	struct kref l_refs;
	unsigned long		 l_flags;
	unsigned long l_state;
	signed char		 l_level;

	struct interval_node_extent	out_extent;

	struct lock_interval *l_tree_node;
	struct list_head l_same_range_list;

	/* used from AST/BAST funcs. */
	/* Data packed - enum type ocfs2_ast_action */
	unsigned char            l_action;
	/* Data packed - enum type ocfs2_unlock_action */
	unsigned char            l_unlock_action;
	unsigned int             l_pending_gen;

	struct ocfs2_dlm_lksb    l_lksb;

#ifdef CONFIG_DEBUG_LOCK_ALLOC
	struct lockdep_map	 l_lockdep_map;
#endif
};
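
To illustrate how a cache of granted range locks supports unlock-delay (for example, a write to 5-10 needs no new dlmlock while (0,19,EX) is still held), here is a simplified sketch. It is not the demo code: struct cached_range and cache_covers are hypothetical names, a plain array stands in for the interval tree, and EX is assumed to satisfy PR requests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical levels, ordered so that a higher value is a stronger lock. */
enum cache_level { CACHE_PR = 0, CACHE_EX = 1 };

/* Hypothetical cached grant; the demo keeps these in an interval tree. */
struct cached_range {
	uint64_t start;
	uint64_t end;		/* inclusive */
	enum cache_level level;
};

/*
 * Return true if a single cached grant fully covers [start, end] at the
 * requested level or stronger, so no request has to go to the master.
 * Merging adjacent grants keeps this single-grant test sufficient.
 */
static bool cache_covers(const struct cached_range *grants, size_t n,
			 uint64_t start, uint64_t end, enum cache_level level)
{
	assert(grants != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (grants[i].level >= level &&
		    grants[i].start <= start && end <= grants[i].end)
			return true;
	}
	return false;
}
```

A linear scan is used here only to keep the example short; the lookup cost is what the interval tree is meant to reduce.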

thanks,
yangwenfang

* [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock
  2015-01-30  2:59       ` Xue jiufei
@ 2015-01-30 12:37         ` Goldwyn Rodrigues
  2015-01-31  4:15           ` yangwenfang
  0 siblings, 1 reply; 16+ messages in thread
From: Goldwyn Rodrigues @ 2015-01-30 12:37 UTC (permalink / raw)
  To: ocfs2-devel



On 01/29/2015 08:59 PM, Xue jiufei wrote:
> Hi Goldwyn,
> On 2015/1/29 19:04, Goldwyn Rodrigues wrote:
>> Yangwenfang,
>>
>> On 01/29/2015 12:42 AM, yangwenfang wrote:
>>> On 2015/1/27 15:08, Srinivas Eeda wrote:
>>>> Hi Yangwenfang,
>>>>
>>>> thank you very much for initiating this RFC :). This feature is long due for OCFS2 and we are also interested in implementing this feature. Wengang(cc'ed) has been looking into analysing and giving an attempt to implement it. We haven't  looked at splitting and merging the range locking yet, but looked at having lock fairness and range locking. Wengang has done some of the dlm changes to see how it can be done but other changes are still work in progress. We will email more details in coming few days.
>>>>
>>>> Since you are also looking into it, it would be great if we can collaborate work on this feature. Can you please share more info on the demo code you mentioned ? Like what it does and how much work has been done on this ?
>>>>
>>> Hi,
>>> About 6k lines of code was modified including dlmglue and dlm in our demo.
>>>
>>> code modification:
>>> 1.read/write IO: get the range(start, end) and call ocfs2_range_lock.
>>> 2.dlmglue: modify key data struct: each inode has one ocfs2_lock_res including many range locks which have different range.
>>> 		   determine the existance of conflicts betwen multiple threads within the node.
>>> 		   manage the cache of range lock to support unlock-delay.
>>> 3.dlm: determine the existance of conflicts betwen multiple nodes.
>>> 	   add splitting and merging the range locking.
>>> 4.lib: interval tree.
>>>> One of the thing we considered was making the rw lock itself support range locking, which is a different approach from what you mentioned. Is there any reason why rw lock cannot be used and we needa new ip_range_lock_lockres ?
>>>>
>>> RW lock can be used, but it is complicated to add the feature to rw_lock because RW lock is also applicated in read/write/truncate.
>>> Byte range lock is only beneficial for update write, so I just modify write IO to finish the demo to get performance results as soon as possible.
>>> I think ocfs2_rw_lock(pr)  + ocfs2_range_lock(start, end, ex) are equivalent to ocfs2_rw_lock(ex)?am I rigth?
>>
>> Okay, let me ask this question in another way: What is the purpose of
>> ocfs2_rw_lock(pr) in *this* scenario, where you are using
>> ocfs2_range_lock in conjunction with ocfs2_rw_lock. What is
>> ocfs2_rw_lock guarding?
>>
> Because RW lock is also used to protect O_DIRECT reads from racing with truncate,
> buffer read is not protected by RW lock. We do not want to change rw lock in buffer
> read scenario. So we add another range lock to complete this demo.
>

Sorry, I still don't understand this. You are changing the RW lock from
EX to PR during writes. This will add races (with respect to reads)
rather than solving them.

-- 
Goldwyn

* [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock
  2015-01-30 12:37         ` Goldwyn Rodrigues
@ 2015-01-31  4:15           ` yangwenfang
  0 siblings, 0 replies; 16+ messages in thread
From: yangwenfang @ 2015-01-31  4:15 UTC (permalink / raw)
  To: ocfs2-devel

On 2015/1/30 20:37, Goldwyn Rodrigues wrote:
> 
> 
> On 01/29/2015 08:59 PM, Xue jiufei wrote:
>> Hi Goldwyn,
>> On 2015/1/29 19:04, Goldwyn Rodrigues wrote:
>>> Yangwenfang,
>>>
>>> On 01/29/2015 12:42 AM, yangwenfang wrote:
>>>> On 2015/1/27 15:08, Srinivas Eeda wrote:
>>>>> Hi Yangwenfang,
>>>>>
>>>>> thank you very much for initiating this RFC :). This feature is long due for OCFS2 and we are also interested in implementing this feature. Wengang(cc'ed) has been looking into analysing and giving an attempt to implement it. We haven't  looked at splitting and merging the range locking yet, but looked at having lock fairness and range locking. Wengang has done some of the dlm changes to see how it can be done but other changes are still work in progress. We will email more details in coming few days.
>>>>>
>>>>> Since you are also looking into it, it would be great if we can collaborate work on this feature. Can you please share more info on the demo code you mentioned ? Like what it does and how much work has been done on this ?
>>>>>
>>>> Hi,
>>>> About 6k lines of code was modified including dlmglue and dlm in our demo.
>>>>
>>>> code modification:
>>>> 1.read/write IO: get the range(start, end) and call ocfs2_range_lock.
>>>> 2.dlmglue: modify key data struct: each inode has one ocfs2_lock_res including many range locks which have different range.
>>>>            determine the existance of conflicts betwen multiple threads within the node.
>>>>            manage the cache of range lock to support unlock-delay.
>>>> 3.dlm: determine the existance of conflicts betwen multiple nodes.
>>>>        add splitting and merging the range locking.
>>>> 4.lib: interval tree.
>>>>> One of the thing we considered was making the rw lock itself support range locking, which is a different approach from what you mentioned. Is there any reason why rw lock cannot be used and we needa new ip_range_lock_lockres ?
>>>>>
>>>> RW lock can be used, but it is complicated to add the feature to rw_lock because RW lock is also applicated in read/write/truncate.
>>>> Byte range lock is only beneficial for update write, so I just modify write IO to finish the demo to get performance results as soon as possible.
>>>> I think ocfs2_rw_lock(pr)  + ocfs2_range_lock(start, end, ex) are equivalent to ocfs2_rw_lock(ex)?am I rigth?
>>>
>>> Okay, let me ask this question in another way: What is the purpose of
>>> ocfs2_rw_lock(pr) in *this* scenario, where you are using
>>> ocfs2_range_lock in conjunction with ocfs2_rw_lock. What is
>>> ocfs2_rw_lock guarding?
>>>
>> Because RW lock is also used to protect O_DIRECT reads from racing with truncate,
>> buffer read is not protected by RW lock. We do not want to change rw lock in buffer
>> read scenario. So we add another range lock to complete this demo.
>>
> 
> Sorry, I still don't understand this. You are changing the RW lock from EX to PR during writes. This will add races (with respect to reads) rather solving it.
> 
OK, let me try to answer this another way.

1. ocfs2_rw_lock is called by ocfs2_setattr(EX), __ocfs2_change_file_space(EX), ocfs2_move_extents(EX),
ocfs2_file_splice_write(EX), ocfs2_file_aio_write(EX or PR), and ocfs2_file_aio_read(PR).

Changing the RW lock from EX to PR is risky when another node holds the RW lock at PR.
So we add another type of lock in the functions that use the RW lock at PR, including ocfs2_file_aio_write and ocfs2_file_aio_read.
This implementation does not change the original read/write IO flows.

The RW lock and the range lock work together to prevent races only in ocfs2_file_aio_write and ocfs2_file_aio_read.
For example, for buffered read/write:
ocfs2_file_aio_write
	mutex_lock(&inode->i_mutex);
	ocfs2_rw_lock(0, Max_LEN, PR)
	ocfs2_range_lock(start, end, EX)
	generic_file_buffered_write
		ocfs2_write_begin
		ocfs2_write_end

ocfs2_file_aio_read
	ocfs2_range_lock(start, end, PR)
	generic_file_aio_read
		
2. Making the RW lock itself support range locking is feasible once the following are done:
1) Modify the interface of ocfs2_rw_lock to add start and end parameters.
2) Get the range in ocfs2_file_aio_write.
3) Add a call to ocfs2_rw_lock in ocfs2_file_aio_read.
4) Use the range (0, Max_LEN) in ocfs2_setattr, __ocfs2_change_file_space, ocfs2_move_extents, and ocfs2_file_splice_write.
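
Items 1) and 4) could be sketched as follows. This is only an illustration under assumptions: struct lock_span, span_for_io and span_whole_file are hypothetical helpers, the granule size is fixed at the 1M used in our test, and UINT64_MAX stands in for Max_LEN.

```c
#include <assert.h>
#include <stdint.h>

#define RANGE_GRANULE_SHIFT 20		/* assumption: 1 MiB lock granules */
#define RANGE_MAX_LEN UINT64_MAX	/* stand-in for Max_LEN */

/* Hypothetical inclusive span of lock granules. */
struct lock_span {
	uint64_t start;
	uint64_t end;
};

/* Span for callers that must lock the whole file (setattr, truncate, ...). */
static struct lock_span span_whole_file(void)
{
	return (struct lock_span){ 0, RANGE_MAX_LEN };
}

/* Span covering the bytes [pos, pos + count - 1] of a read or write. */
static struct lock_span span_for_io(uint64_t pos, uint64_t count)
{
	/* Reject empty I/O and byte ranges that would wrap around. */
	assert(count > 0 && pos <= UINT64_MAX - (count - 1));
	return (struct lock_span){
		pos >> RANGE_GRANULE_SHIFT,
		(pos + count - 1) >> RANGE_GRANULE_SHIFT,
	};
}
```

A range-aware ocfs2_rw_lock would then take the span from span_for_io in the read and write paths and the span from span_whole_file everywhere else, so existing callers keep whole-file semantics.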

thanks,
yangwenfang

* [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock
@ 2015-01-28  8:43 David Weber
  0 siblings, 0 replies; 16+ messages in thread
From: David Weber @ 2015-01-28  8:43 UTC (permalink / raw)
  To: ocfs2-devel

Hi,

On 01/26/2015 04:28 AM, yangwenfang wrote:
> 
> What:
> Byte range lock is applied to lock a region of a file to accelerate
> reading/writing concurrently.
> 
> Why:		
> Currently ocfs2 does not support byte range lock. Since multiple nodes
> may concurrently update/write at different positions of the same file
> in database workloads, the performance(tpmc) of DB+ocfs2 is much poorer than
> DB+GPFS in running TPCC.
> Aiming at improving the efficiency of parallel accesses to the same file,
> we have implemented a demo of range lock feature which has been supported
> by lustre and GPFS, so that a file can be updated by different nodes in
> the cluster when they are visiting different blocks.

Would this also make cluster-aware fcntl(2) locks possible with the o2cb stack?

Cheers,
David

Thread overview: 16+ messages
2015-01-26 12:28 [Ocfs2-devel] [RFC] ocfs2/dlm: support range lock yangwenfang
2015-01-27  7:08 ` Srinivas Eeda
2015-01-29  6:42   ` yangwenfang
2015-01-29 11:04     ` Goldwyn Rodrigues
2015-01-30  2:59       ` Xue jiufei
2015-01-30 12:37         ` Goldwyn Rodrigues
2015-01-31  4:15           ` yangwenfang
2015-01-29 11:07     ` Goldwyn Rodrigues
2015-01-29  0:05 ` Goldwyn Rodrigues
2015-01-29  3:21   ` Wengang Wang
2015-01-29  7:47   ` yangwenfang
2015-01-29  8:06     ` Wengang Wang
2015-01-30  3:54       ` yangwenfang
2015-01-30  6:02         ` Wengang Wang
2015-01-30  7:46           ` yangwenfang
2015-01-28  8:43 David Weber