* [Lustre-devel] Lustre dcache clean up and distributed locking
       [not found]           ` <8A50205D-3704-4969-9997-3D6157C252D0@whamcloud.com>
@ 2012-01-22 16:11             ` Fan Yong
  2012-01-22 17:33               ` Andreas Dilger
  0 siblings, 1 reply; 16+ messages in thread
From: Fan Yong @ 2012-01-22 16:11 UTC (permalink / raw)
  To: lustre-devel

On 1/21/12 2:06 AM, Andreas Dilger wrote:
> Hi all, please enjoy your New Years holiday. This issue can wait until your return.
>
> Fan Yong,
> About DNE locking (I write this now only to avoid forgetting it, no immediate answer is needed) - in order to handle the split name/permission issue, the current plan is for the client to get the LOOKUP lock for the FID on both the master MDT and the remote MDT.
Hi Andreas,

Why take two LOOKUP locks on two MDTs, rather than one LOOKUP lock on the 
inode holder MDT and a new name lock on the name entry holder MDT? Is 
there any special advantage? I think both methods need client-side support 
(for protocol changes).


Cheers,
Fan Yong
>
> If the link is removed from the master MDT, then the FID LOOKUP lock will be cancelled there, but if another hard link is removed from a different MDT then only the LOOKUP from the other MDT needs to be cancelled.
>
> This will be similar to Tao's proposal for a separate lock bit for the name, in the DNE case where the name is remote from the inode. It still suffers from some inefficiency in case of multiple hard links from the same master MDT to a remote inode (canceling the FID LOOKUP lock when unlinking one name will force it to be refetched for the other links), but since hard links are so rare this should not significantly impact performance.
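The refetch inefficiency described in the quoted paragraph above can be sketched with a tiny model (everything here is illustrative toy code, not the real Lustre implementation): one FID LOOKUP lock per MDT covers every name that MDT holds for the inode, so cancelling it when one name is unlinked drops coverage for that MDT's other links too, while the other MDT's lock survives.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of per-MDT FID LOOKUP locks (illustrative only). */
enum { MDT_MASTER = 0, MDT_REMOTE = 1, MDT_COUNT = 2 };

struct fid_lookup_state {
	bool lock_held[MDT_COUNT];	/* client holds one LOOKUP lock per MDT */
};

/* Client looks up the file: takes the FID LOOKUP lock on both MDTs. */
static void client_lookup(struct fid_lookup_state *s)
{
	s->lock_held[MDT_MASTER] = true;
	s->lock_held[MDT_REMOTE] = true;
}

/* Unlinking a name on one MDT cancels the FID LOOKUP lock there;
 * any other hard links held by that same MDT lose coverage and must
 * refetch, but the other MDT's lock is untouched. */
static void unlink_name(struct fid_lookup_state *s, int mdt)
{
	s->lock_held[mdt] = false;
}
```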
>
> Cheers, Andreas
>
> On 2012-01-20, at 10:10, Fan Yong<yong.fan@whamcloud.com>  wrote:
>> Excellent work. I just added some comments inside your document. Please
>> check.
>>
>>
>> Best Regards,
>> Fan Yong
>> Whamcloud, Inc.
>>
>> On 1/20/12 9:34 AM, haiying.tang at emc.com wrote:
>>> Hi Andreas,
>>>
>>> Tao is on vacation now. It might be difficult for him to check emails due to limited internet access in his hometown.
>>> For urgent issues, you folks could send email to his gmail account bergwolf at gmail.com.
>>>
>>> Thanks,
>>> Haiying
>>>
>>> -----Original Message-----
>>> From: Andreas Dilger [mailto:adilger at whamcloud.com]
>>> Sent: January 20, 2012 1:39
>>> To: Peng, Tao
>>> Cc: faibish, sorin; Tang, Haiying; Liu, Xuezhao; laisiyao at whamcloud.com; yong.fan at whamcloud.com; green at whamcloud.com; eeb at whamcloud.com
>>> Subject: Re: Lustre dcache clean up and distributed locking
>>>
>>> On 2012-01-17, at 3:21 AM,<tao.peng@emc.com>  <tao.peng@emc.com>  wrote:
>>>> Thanks Siyao and Oleg for answering my dcache revalidation question on lustre mailing list. I updated the design to reflect it.
>>>>
>>>> Please see attachment.
>>> Tao,
>>> Fan Yong is also taking a more detailed look at your document and will
>>> hopefully have a chance to reply before the New Year holidays.
>>>
>>> Also, we are just working on landing the patches to add support for Linux
>>> 2.6.38 for the Lustre client.  One of the patches relates to the lockless
>>> dcache changes that were introduced in that kernel.  If you are interested
>>> to review this patch, and become more familiar with the Lustre development
>>> process, you should visit http://review.whamcloud.com/1865 for the patch.
>>>
>>> You need to create an account in Gerrit using OpenID (Google, mostly), and
>>> an account in our bug tracking system (http://jira.whamcloud.com) if you
>>> haven't already.
>>>
>>>>> -----Original Message-----
>>>>> From: Andreas Dilger [mailto:adilger at whamcloud.com]
>>>>> Sent: Tuesday, January 17, 2012 4:16 PM
>>>>> To: Peng, Tao
>>>>> Cc: faibish, sorin; Tang, Haiying; Liu, Xuezhao; Lai Siyao; Fan Yong; Oleg Drokin; Eric Barton
>>>>> Subject: Re: Lustre dcache clean up and distributed locking
>>>>>
>>>>> On 2012-01-16, at 9:25 PM,<tao.peng@emc.com>  wrote:
>>>>>> I finally started to work on Lustre dcache cleanup and locking. After reading Lustre, ocfs2 and VFS
>>>>> dcache related code, I came to a design for cleaning up Lustre dcache code and doing distributed
>>>>> locking under dcache. For distributed locking, the main idea is to add a new inode bitlock DENTRY lock
>>>>> to just protect valid dentry, instead of letting LOOKUP lock handle multiple purpose, which makes
>>>>> client unable to know whether file is deleted or not when server cancels LOOKUP lock. Instead, client
>>>>> holds PR mode DENTRY lock on all valid dentries and when server cancels it, client knows the file is
>>>>> deleted.
>>>>>> For details, please see the attachments (I attached both word and pdf versions as I am not sure if
>>>>> which is more convenient for you:). And please help to review and comment. Thank you!
>>>>>
>>>>> Hi Tao,
>>>>> I'm passing this on to the engineers that are most familiar with the DLM and client dcache code in
>>>>> Lustre.  After a quick first read through your document, your investigation of the client code is very
>>>>> insightful and looks like it will be able to remove some of the significant complexity that has grown
>>>>> over time in the llite code.
>>>>>
>>>>> Cheers, Andreas
>>>>> --
>>>>> Andreas Dilger                       Whamcloud, Inc.
>>>>> Principal Engineer                   http://www.whamcloud.com/
>>>>>
>>>>>
>>>> <dcache_distributed_locking-v2.docx>
>>> Cheers, Andreas
>>> --
>>> Andreas Dilger                       Whamcloud, Inc.
>>> Principal Engineer                   http://www.whamcloud.com/
>>>
>> <dcache_distributed_locking-v2_comment.docx>


* [Lustre-devel] Lustre dcache clean up and distributed locking
  2012-01-22 16:11             ` Fan Yong
@ 2012-01-22 17:33               ` Andreas Dilger
  0 siblings, 0 replies; 16+ messages in thread
From: Andreas Dilger @ 2012-01-22 17:33 UTC (permalink / raw)
  To: lustre-devel

On 2012-01-22, at 9:11, Fan Yong <yong.fan@whamcloud.com> wrote:

> On 1/21/12 2:06 AM, Andreas Dilger wrote:
>> 
>> Fan Yong,
>> About DNE locking - in order to handle the split name/permission issue, the current plan is for the client to get the LOOKUP lock for the FID on both the master MDT and the remote MDT.
> Hi Andreas,
> 
> Why two LOOKUP locks on two MDTs, why not one LOOKUP lock on inode holder MDT and the other name lock (new) on name entry holder MDT? Any special advantage? I think both methods need client-side support (for protocol changes).

The difference between the two approaches is relatively small in terms of the functional outcome, but in terms of implementation there is a bigger difference. 

Two reasons for this:
- we already have LOOKUP locks, so fewer code changes are needed
- having a single lock per FID is easier to manage compared to one lock per name

Given that chmod/chgrp/chown/ACL operations are more common in practice than hard links, having a single LOOKUP lock is more efficient. 
 
Even with hard links, if the client does a lookup on a link on the master, it can find the FID from the dirent and may have the inode data cached from the remote MDT. 

I agree that name locks may have advantages in some cases, but I don't know if that is worth the extra complexity. 
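As a back-of-the-envelope check of the trade-off above (toy code with hypothetical names, not real Lustre code): a chmod invalidates one FID-wide LOOKUP lock no matter how many names the file has, whereas per-name locks would cost one cancellation per cached name.

```c
#include <assert.h>

/* Toy cancellation-RPC count for a chmod on a file whose client has
 * nlinks cached names (illustrative only). */
static int cancels_single_lookup(int nlinks)
{
	(void)nlinks;	/* one FID LOOKUP lock covers all names */
	return 1;
}

static int cancels_per_name(int nlinks)
{
	return nlinks;	/* one lock (and one cancel) per cached name */
}
```

For the common single-link file the two schemes cost the same; the per-name scheme only pays extra for hard links, which is why the argument hinges on how frequent chmod-style operations are relative to multi-link files.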

Cheers, Andreas

>> If the link is removed from the master MDT, then the FID LOOKUP lock will be cancelled there, but if another hard link is removed from a different MDT then only the LOOKUP from the other MDT needs to be cancelled.
>> 
>> This will be similar to Tao's proposal for a separate lock bit for the name, in the DNE case where the name is remote from the inode. It still suffers from some inefficiency in case of multiple hard links from the same master MDT to a remote inode (canceling the FID LOOKUP lock when unlinking one name will force it to be refetched for the other links), but since hard links are so rare this should not significantly impact performance.
>> 
>> Cheers, Andreas
>> 
>> On 2012-01-20, at 10:10, Fan Yong<yong.fan@whamcloud.com>  wrote:
>>> Excellent work. I just added some comments inside your document. Please
>>> check.
>>> 
>>> 
>>> Best Regards,
>>> Fan Yong
>>> Whamcloud, Inc.
>>> 
>>> On 1/20/12 9:34 AM, haiying.tang at emc.com wrote:
>>>> Hi Andreas,
>>>> 
>>>> Tao is on vacation now. It might be difficult for him to check emails due to limited internet access at hometown.
>>>> For urgent issue, you folks could send email to his gmail account bergwolf at gmail.com.
>>>> 
>>>> Thanks,
>>>> Haiying
>>>> 
>>>> -----Original Message-----
>>>> From: Andreas Dilger [mailto:adilger at whamcloud.com]
>>>> Sent: January 20, 2012 1:39
>>>> To: Peng, Tao
>>>> Cc: faibish, sorin; Tang, Haiying; Liu, Xuezhao; laisiyao at whamcloud.com; yong.fan at whamcloud.com; green at whamcloud.com; eeb at whamcloud.com
>>>> Subject: Re: Lustre dcache clean up and distributed locking
>>>> 
>>>> On 2012-01-17, at 3:21 AM,<tao.peng@emc.com>  <tao.peng@emc.com>  wrote:
>>>>> Thanks Siyao and Oleg for answering my dcache revalidation question on lustre mailing list. I updated the design to reflect it.
>>>>> 
>>>>> Please see attachment.
>>>> Tao,
>>>> Fan Yong is also taking a more detailed look at your document and will
>>>> hopefully have a chance to reply before the New Year holidays.
>>>> 
>>>> Also, we are just working on landing the patches to add support for Linux
>>>> 2.6.38 for the Lustre client.  One of the patches relates to the lockless
>>>> dcache changes that were introduced in that kernel.  If you are interested
>>>> to review this patch, and become more familiar with the Lustre development
>>>> process, you should visit http://review.whamcloud.com/1865 for the patch.
>>>> 
>>>> You need to create an account in Gerrit using OpenID (Google, mostly), and
>>>> an account in our bug tracking system (http://jira.whamcloud.com) if you
>>>> haven't already.
>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Andreas Dilger [mailto:adilger at whamcloud.com]
>>>>>> Sent: Tuesday, January 17, 2012 4:16 PM
>>>>>> To: Peng, Tao
>>>>>> Cc: faibish, sorin; Tang, Haiying; Liu, Xuezhao; Lai Siyao; Fan Yong; Oleg Drokin; Eric Barton
>>>>>> Subject: Re: Lustre dcache clean up and distributed locking
>>>>>> 
>>>>>> On 2012-01-16, at 9:25 PM,<tao.peng@emc.com>  wrote:
>>>>>>> I finally started to work on Lustre dcache cleanup and locking. After reading Lustre, ocfs2 and VFS
>>>>>> dcache related code, I came to a design for cleaning up Lustre dcache code and doing distributed
>>>>>> locking under dcache. For distributed locking, the main idea is to add a new inode bitlock DENTRY lock
>>>>>> to just protect valid dentry, instead of letting LOOKUP lock handle multiple purpose, which makes
>>>>>> client unable to know whether file is deleted or not when server cancels LOOKUP lock. Instead, client
>>>>>> holds PR mode DENTRY lock on all valid dentries and when server cancels it, client knows the file is
>>>>>> deleted.
>>>>>>> For details, please see the attachments (I attached both word and pdf versions as I am not sure if
>>>>>> which is more convenient for you:). And please help to review and comment. Thank you!
>>>>>> 
>>>>>> Hi Tao,
>>>>>> I'm passing this on to the engineers that are most familiar with the DLM and client dcache code in
>>>>>> Lustre.  After a quick first read through your document, your investigation of the client code is very
>>>>>> insightful and looks like it will be able to remove some of the significant complexity that has grown
>>>>>> over time in the llite code.
>>>>>> 
>>>>>> Cheers, Andreas
>>>>>> --
>>>>>> Andreas Dilger                       Whamcloud, Inc.
>>>>>> Principal Engineer                   http://www.whamcloud.com/
>>>>>> 
>>>>>> 
>>>>> <dcache_distributed_locking-v2.docx>
>>>> Cheers, Andreas
>>>> --
>>>> Andreas Dilger                       Whamcloud, Inc.
>>>> Principal Engineer                   http://www.whamcloud.com/
>>>> 
>>> <dcache_distributed_locking-v2_comment.docx>
> 


* [Lustre-devel] Lustre dcache clean up and distributed locking
       [not found]     ` <CALxHnhL5Oz7cwqF15x65enx1qhgadSkftGa-=QAuY1N+9B5rSw@mail.gmail.com>
@ 2012-01-30  7:44       ` tao.peng at emc.com
  0 siblings, 0 replies; 16+ messages in thread
From: tao.peng at emc.com @ 2012-01-30  7:44 UTC (permalink / raw)
  To: lustre-devel

Hi Siyao,

Yes, ll_lookup_lock should be removable. It looks like this will be handled by Yang Sheng in "LU-506 kernel: FC15 - Dcache RCU-walk changes", as commented by Fan Yong (http://review.whamcloud.com/#patch,sidebyside,1865,2,lustre/include/linux/lustre_patchless_compat.h). I will leave it as is and let Sheng handle it.

Thanks,
Tao

From: Lai Siyao [mailto:laisiyao at whamcloud.com]
Sent: Friday, January 20, 2012 10:56 AM
To: Oleg Drokin
Cc: Andreas Dilger; Peng, Tao; faibish, sorin; Tang, Haiying; Liu, Xuezhao; Fan Yong; Eric Barton
Subject: Re: Lustre dcache clean up and distributed locking

Hi Tao,

The reason ll_lookup_lock was introduced is that in some kernels (e.g. 2.6.32 for RHEL6) d_rehash_cond() doesn't exist and only d_rehash() can be used, which requires dcache_lock to be dropped before entry. That opens a race window, so a global lock similar to dcache_lock is needed. One possible improvement: when d_rehash_cond() is exported, the ll_lookup_lock lock/unlock can become a no-op; and in recent kernels, where dcache_lock has been removed, ll_lookup_lock should be removable as well.
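A rough userspace sketch of that shim (HAVE_D_REHASH_COND and the function names here are hypothetical stand-ins, not the actual llite code): when the kernel provides d_rehash_cond(), the global guard degenerates to a no-op; otherwise it protects the unhash-to-rehash window that opens when dcache_lock must be dropped.

```c
#include <assert.h>

/* Pretend this kernel exports d_rehash_cond() (hypothetical macro). */
#define HAVE_D_REHASH_COND 1

static int ll_lookup_guard_held;	/* stands in for a global spinlock */

static void ll_lookup_lock(void)
{
#if !HAVE_D_REHASH_COND
	ll_lookup_guard_held++;		/* guard the rehash race window */
#endif
}

static void ll_lookup_unlock(void)
{
#if !HAVE_D_REHASH_COND
	ll_lookup_guard_held--;
#endif
}

/* A lookup path wraps the unhash/rehash window in the guard. */
static void do_lookup_rehash(void)
{
	ll_lookup_lock();
	/* ... d_rehash_cond() or d_rehash() would run here ... */
	ll_lookup_unlock();
}
```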

I like most of the cleanups, but I don't see much benefit in introducing a DENTRY bit lock, because a dentry without a LOOKUP lock can hardly be used; inodebit lock coverage upgrade/downgrade is more useful here.

Bye,
- Lai
On Fri, Jan 20, 2012 at 1:11 AM, Oleg Drokin <green at whamcloud.com> wrote:
Hello!

On Jan 17, 2012, at 3:16 AM, Andreas Dilger wrote:

> On 2012-01-16, at 9:25 PM, <tao.peng at emc.com> wrote:
>> I finally started to work on Lustre dcache cleanup and locking. After reading Lustre, ocfs2 and VFS dcache related code, I came to a design for cleaning up Lustre dcache code and doing distributed locking under dcache. For distributed locking, the main idea is to add a new inode bitlock DENTRY lock to just protect valid dentry, instead of letting LOOKUP lock handle multiple purpose, which makes client unable to know whether file is deleted or not when server cancels LOOKUP lock. Instead, client holds PR mode DENTRY lock on all valid dentries and when server cancels it, client knows the file is deleted.
>>
>> For details, please see the attachments (I attached both word and pdf versions as I am not sure if which is more convenient for you:). And please help to review and comment. Thank you!
Thanks for looking into this.
It is indeed possible to add another bit for just the dentry, but the problem is that it would need to be granted in conjunction with
the LOOKUP bit anyway (along with permission information). If we don't do this, then we would need to send an additional RPC to get
this information very shortly after lookup in many cases, and that would be pretty slow. Of course there is no way to get a different
lock mode for every bit, so in this scheme the lock modes would really be the same.
And the (currently) big problem with multiple bit locks is that we don't have a way to cancel just a subset of a lock (or degrade a lock's
coverage, I should say). This was planned quite a while ago, but somehow never ended up at the top of the priority list, so it was never implemented after all.
There are some minor comments on the logic flow after that too. E.g. locks in Lustre are unlocked right away on the client side, but are just placed in the LRU and remain on the client in this sort of unreferenced state. When there is a conflict, the server sends the AST RPC and then the lock
is actually released (cancelled) by the client. Currently the AST is not delivered to a lock that has non-zero usage counters.
Additionally, due to the way we implement locks, it is not possible to have a separate lock for every name of the file.
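The LRU/AST behaviour described above can be modelled as a small state machine (a sketch under assumed semantics, not the real ldlm code): "unlocking" only parks the lock in the LRU, and a server blocking AST leads to an actual cancel only once the lock is unreferenced.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy client-side DLM lock, illustrative only. */
struct toy_lock {
	int  refcount;		/* active users on the client */
	bool in_lru;		/* unreferenced, cached for reuse */
	bool cancelled;		/* actually released back to the server */
};

/* "Unlocking" doesn't release the lock; it parks it in the LRU. */
static void client_unlock(struct toy_lock *lk)
{
	if (--lk->refcount == 0)
		lk->in_lru = true;
}

/* On a conflict the server sends a blocking AST; the client cancels
 * only an unreferenced lock (the AST is not delivered to a lock with
 * non-zero usage counters). Returns true if the lock was cancelled. */
static bool server_blocking_ast(struct toy_lock *lk)
{
	if (lk->refcount == 0) {
		lk->in_lru = false;
		lk->cancelled = true;
		return true;
	}
	return false;
}
```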

For your cleanup list, I think that is good. Note that item 'c' was already implemented some time ago; you can see it here:
https://bugzilla.lustre.org/show_bug.cgi?id=16417 and the patch is here: https://bugzilla.lustre.org/attachment.cgi?id=24200
It's probably somewhat stale now. It did not work correctly back at the time due to some other logic, but it should work now.
You can read the comments in that bug for some more info.

Bye,
   Oleg
--
Oleg Drokin
Senior Software Engineer
Whamcloud, Inc.



* [Lustre-devel] Lustre dcache clean up and distributed locking
       [not found]         ` <4F19A01B.4070405@whamcloud.com>
       [not found]           ` <8A50205D-3704-4969-9997-3D6157C252D0@whamcloud.com>
@ 2012-01-30  7:46           ` tao.peng at emc.com
  2012-01-30 23:27             ` Andreas Dilger
  1 sibling, 1 reply; 16+ messages in thread
From: tao.peng at emc.com @ 2012-01-30  7:46 UTC (permalink / raw)
  To: lustre-devel

Hi Fan Yong,

Sorry for the late reply. I got your email earlier but didn't have a chance to read the document due to very limited internet access.

Thank you very much for taking a detailed look at the document and giving so many valuable comments.

About adding a new inode bitlock: I must admit that I missed the hardlink case. Although the lock mode can be changed to be the same as the LOOKUP lock's, I agree that an inode bitlock cannot sufficiently protect every name/dentry separately for hardlinked files. And I agree that, as you suggested in the document, adding a name-entry-based lock can solve it.

IIUC, to add a name-entry-based lock, both client and server would have to be modified, and the LDLM needs to be changed to manage the new lock information. Like the new inode bitlock, the name-entry-based lock will also introduce interoperability issues as a result of the on-wire protocol change. And in a DNE environment, as Andreas explained, the original plan is to get LOOKUP locks on both the master MDT and the remote MDT; if we introduce a name-entry-based lock, the DNE case may need some change as well.

I will take a closer look at the approach. In the meanwhile, could you please confirm whether it is the right way to go? The benefit of the above complexity seems to be better distributed dentry locking under the dcache framework.

What do others think? Andreas?

Thanks and Best Regards,
Tao

> -----Original Message-----
> From: Fan Yong [mailto:yong.fan at whamcloud.com]
> Sent: Saturday, January 21, 2012 1:11 AM
> To: Peng, Tao; bergwolf at gmail.com
> Cc: Tang, Haiying; adilger at whamcloud.com; faibish, sorin; Liu, Xuezhao; laisiyao at whamcloud.com;
> green at whamcloud.com; eeb at whamcloud.com; Bryon Neitzel
> Subject: Re: Lustre dcache clean up and distributed locking
> 
> Hi Tao,
> 
> Excellent work. I just added some comments inside your document. Please check.
> 
> 
> Best Regards,
> Fan Yong
> Whamcloud, Inc.
> 
> On 1/20/12 9:34 AM, haiying.tang at emc.com wrote:
> > Hi Andreas,
> >
> > Tao is on vacation now. It might be difficult for him to check emails due to limited internet access
> at hometown.
> > For urgent issue, you folks could send email to his gmail account bergwolf at gmail.com.
> >
> > Thanks,
> > Haiying
> >
> > -----Original Message-----
> > From: Andreas Dilger [mailto:adilger at whamcloud.com]
> > Sent: January 20, 2012 1:39
> > To: Peng, Tao
> > Cc: faibish, sorin; Tang, Haiying; Liu, Xuezhao;
> > laisiyao at whamcloud.com; yong.fan at whamcloud.com; green at whamcloud.com;
> > eeb at whamcloud.com
> > Subject: Re: Lustre dcache clean up and distributed locking
> >
> > On 2012-01-17, at 3:21 AM, <tao.peng@emc.com> <tao.peng@emc.com> wrote:
> >> Thanks Siyao and Oleg for answering my dcache revalidation question on lustre mailing list. I
> updated the design to reflect it.
> >>
> >> Please see attachment.
> > Tao,
> > Fan Yong is also taking a more detailed look at your document and will
> > hopefully have a chance to reply before the New Year holidays.
> >
> > Also, we are just working on landing the patches to add support for
> > Linux
> > 2.6.38 for the Lustre client.  One of the patches relates to the
> > lockless dcache changes that were introduced in that kernel.  If you
> > are interested to review this patch, and become more familiar with the
> > Lustre development process, you should visit http://review.whamcloud.com/1865 for the patch.
> >
> > You need to create an account in Gerrit using OpenID (Google, mostly),
> > and an account in our bug tracking system (http://jira.whamcloud.com)
> > if you haven't already.
> >
> >>> -----Original Message-----
> >>> From: Andreas Dilger [mailto:adilger at whamcloud.com]
> >>> Sent: Tuesday, January 17, 2012 4:16 PM
> >>> To: Peng, Tao
> >>> Cc: faibish, sorin; Tang, Haiying; Liu, Xuezhao; Lai Siyao; Fan
> >>> Yong; Oleg Drokin; Eric Barton
> >>> Subject: Re: Lustre dcache clean up and distributed locking
> >>>
> >>> On 2012-01-16, at 9:25 PM, <tao.peng@emc.com> wrote:
> >>>> I finally started to work on Lustre dcache cleanup and locking.
> >>>> After reading Lustre, ocfs2 and VFS
> >>> dcache related code, I came to a design for cleaning up Lustre
> >>> dcache code and doing distributed locking under dcache. For
> >>> distributed locking, the main idea is to add a new inode bitlock
> >>> DENTRY lock to just protect valid dentry, instead of letting LOOKUP
> >>> lock handle multiple purpose, which makes client unable to know
> >>> whether file is deleted or not when server cancels LOOKUP lock. Instead, client holds PR mode
> DENTRY lock on all valid dentries and when server cancels it, client knows the file is deleted.
> >>>> For details, please see the attachments (I attached both word and
> >>>> pdf versions as I am not sure if
> >>> which is more convenient for you:). And please help to review and comment. Thank you!
> >>>
> >>> Hi Tao,
> >>> I'm passing this on to the engineers that are most familiar with the
> >>> DLM and client dcache code in Lustre.  After a quick first read
> >>> through your document, your investigation of the client code is very
> >>> insightful and looks like it will be able to remove some of the significant complexity that has
> grown over time in the llite code.
> >>>
> >>> Cheers, Andreas
> >>> --
> >>> Andreas Dilger                       Whamcloud, Inc.
> >>> Principal Engineer                   http://www.whamcloud.com/
> >>>
> >>>
> >> <dcache_distributed_locking-v2.docx>
> >
> > Cheers, Andreas
> > --
> > Andreas Dilger                       Whamcloud, Inc.
> > Principal Engineer                   http://www.whamcloud.com/
> >


* [Lustre-devel] Lustre dcache clean up and distributed locking
  2012-01-30  7:46           ` tao.peng at emc.com
@ 2012-01-30 23:27             ` Andreas Dilger
  2012-01-31 12:02               ` Fan Yong
  0 siblings, 1 reply; 16+ messages in thread
From: Andreas Dilger @ 2012-01-30 23:27 UTC (permalink / raw)
  To: lustre-devel

On 2012-01-30, at 12:46 AM, <tao.peng@emc.com> wrote:
> Sorry for the late reply. I got your email earlier but didn't have chance to read the document due to very limited internet access.
> 
> Thank you very much for taking detailed look at the document and giving so many valuable comments.
> 
> About adding new inode bitlock, I must admit that I missed hardlink case. Although the lock mode can be changed to be the same as LOOKUP lock, I agree that inode bitlock cannot sufficiently protect every name/dentry separately for hardlinked files. And I agree, as you suggested in the document, adding a name entry based lock can solve it.
> 
> IIUC, to add a name entry based lock, both client and server should be modified, and LDLM need to be changed to manage the new lock information. Like new inode bitlock, the name entry based lock will as well introduce interoperability issues as a result of on wire protocol change. And in DNE environment as Andreas explained, the original plan is to get LOOKUP on both master MDT and remote MDT. If we introduce name entry based lock, DNE case may need some change as well.
> 
> I will take a closer look at the approach. In the meanwhile, could you please confirm if it is the right way to go? The benefit of above complexity seems to be better distributed dentry locking under dcache framework.
> 
> What do others think? Andreas?

Tao, before we go down the road of implementing such a change, there would have to be clear benefits in performance and/or kernel integration before we could accept the interoperability issues and added code complexity.

The benefits of having a per-name lock, as I see them, are:
- can be cancelled separately from other name locks
- can be changed independently from the inode permissions (LOOKUP)
- potential ease of integration with the Linux dcache

The drawbacks of introducing a per-name lock:
- only useful for hard-linked files (uncommon case)
- added code complexity for a new lock type
- incompatibility with existing client/server
- additional code to handle incompatibility transparently
- lock enqueue may need extra RPC for each name lock (different resource)
- lock cancellation may need an extra RPC for each resource

The way I see it is that the DENTRY lock would need to be a different resource for each name on a file.  The only way this is going to be useful AFAICS is if the client enqueues each DENTRY lock on a separate resource, so that it can be granted and cancelled independently of the LOOKUP lock (which is on the FID and would be common for all names that reference this inode).  That would either need an extra RPC for every filename lookup (to enqueue both the DENTRY and LOOKUP locks) or a new LDLM_ENQUEUE RPC that can request and be granted two different locks (FID [seq, oid, ver, 0], and FID+name hash [seq, oid, ver, hash]) in a single RPC.
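The two resource types described above can be written out concretely. The sketch below mirrors the shape of Lustre's four-__u64 ldlm resource ID but uses a stand-in string hash (the real name hash differs), to show why the FID lock and each per-name lock land on distinct resources:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Matches the shape of an ldlm resource ID: four 64-bit words. */
struct res_id {
	uint64_t name[4];	/* [seq, oid, ver, hash-or-zero] */
};

/* Stand-in string hash (the real name hash differs); FNV-1a here. */
static uint64_t name_hash(const char *s)
{
	uint64_t h = 14695981039346656037ULL;
	for (; *s; s++)
		h = (h ^ (uint8_t)*s) * 1099511628211ULL;
	return h;
}

/* LOOKUP lock resource: [seq, oid, ver, 0] -- one per inode. */
static struct res_id fid_res(uint64_t seq, uint64_t oid, uint64_t ver)
{
	struct res_id r = { { seq, oid, ver, 0 } };
	return r;
}

/* DENTRY lock resource: [seq, oid, ver, hash(name)] -- one per name. */
static struct res_id dentry_res(uint64_t seq, uint64_t oid, uint64_t ver,
				const char *name)
{
	struct res_id r = { { seq, oid, ver, name_hash(name) } };
	return r;
}

static int res_eq(struct res_id a, struct res_id b)
{
	return memcmp(&a, &b, sizeof(a)) == 0;
}
```

With a layout like this, a DENTRY lock could be granted and cancelled per name without touching the [seq, oid, ver, 0] LOOKUP resource, which is exactly why a single enqueue RPC covering both locks would be attractive.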

So, the main question is how much better the client dcache/VFS integration will be with the DCACHE lock compared to the existing mechanism with the LOOKUP (FID) lock?  Is it impossible (or very difficult) to use the existing mechanism for the new lockless dcache code?  Or is it a matter of this being a somewhat cleaner implementation, but not critical for moving forward?

I'm aware that our dcache code has been too complex for some time, but I haven't looked recently to see how much of this is due to using/supporting old kernel APIs, and how much of it is due to the use of the LOOKUP lock on the FID protecting too much of the state on the client.

Cheers, Andreas


>> -----Original Message-----
>> From: Fan Yong [mailto:yong.fan at whamcloud.com]
>> Sent: Saturday, January 21, 2012 1:11 AM
>> To: Peng, Tao; bergwolf at gmail.com
>> Cc: Tang, Haiying; adilger at whamcloud.com; faibish, sorin; Liu, Xuezhao; laisiyao at whamcloud.com;
>> green at whamcloud.com; eeb at whamcloud.com; Bryon Neitzel
>> Subject: Re: Lustre dcache clean up and distributed locking
>> 
>> Hi Tao,
>> 
>> Excellent work. I just added some comments inside your document. Please check.
>> 
>> 
>> Best Regards,
>> Fan Yong
>> Whamcloud, Inc.
>> 
>> On 1/20/12 9:34 AM, haiying.tang at emc.com wrote:
>>> Hi Andreas,
>>> 
>>> Tao is on vacation now. It might be difficult for him to check emails due to limited internet access
>> at hometown.
>>> For urgent issue, you folks could send email to his gmail account bergwolf at gmail.com.
>>> 
>>> Thanks,
>>> Haiying
>>> 
>>> -----Original Message-----
>>> From: Andreas Dilger [mailto:adilger at whamcloud.com]
>>> Sent: January 20, 2012 1:39
>>> To: Peng, Tao
>>> Cc: faibish, sorin; Tang, Haiying; Liu, Xuezhao;
>>> laisiyao at whamcloud.com; yong.fan at whamcloud.com; green at whamcloud.com;
>>> eeb at whamcloud.com
>>> Subject: Re: Lustre dcache clean up and distributed locking
>>> 
>>> On 2012-01-17, at 3:21 AM, <tao.peng@emc.com> <tao.peng@emc.com> wrote:
>>>> Thanks Siyao and Oleg for answering my dcache revalidation question on lustre mailing list. I
>> updated the design to reflect it.
>>>> 
>>>> Please see attachment.
>>> Tao,
>>> Fan Yong is also taking a more detailed look at your document and will
>>> hopefully have a chance to reply before the New Year holidays.
>>> 
>>> Also, we are just working on landing the patches to add support for
>>> Linux
>>> 2.6.38 for the Lustre client.  One of the patches relates to the
>>> lockless dcache changes that were introduced in that kernel.  If you
>>> are interested to review this patch, and become more familiar with the
>>> Lustre development process, you should visit http://review.whamcloud.com/1865 for the patch.
>>> 
>>> You need to create an account in Gerrit using OpenID (Google, mostly),
>>> and an account in our bug tracking system (http://jira.whamcloud.com)
>>> if you haven't already.
>>> 
>>>>> -----Original Message-----
>>>>> From: Andreas Dilger [mailto:adilger at whamcloud.com]
>>>>> Sent: Tuesday, January 17, 2012 4:16 PM
>>>>> To: Peng, Tao
>>>>> Cc: faibish, sorin; Tang, Haiying; Liu, Xuezhao; Lai Siyao; Fan
>>>>> Yong; Oleg Drokin; Eric Barton
>>>>> Subject: Re: Lustre dcache clean up and distributed locking
>>>>> 
>>>>> On 2012-01-16, at 9:25 PM, <tao.peng@emc.com> wrote:
>>>>>> I finally started to work on Lustre dcache cleanup and locking.
>>>>>> After reading Lustre, ocfs2 and VFS
>>>>> dcache related code, I came to a design for cleaning up Lustre
>>>>> dcache code and doing distributed locking under dcache. For
>>>>> distributed locking, the main idea is to add a new inode bitlock
>>>>> DENTRY lock to just protect valid dentry, instead of letting LOOKUP
>>>>> lock handle multiple purpose, which makes client unable to know
>>>>> whether file is deleted or not when server cancels LOOKUP lock. Instead, client holds PR mode
>> DENTRY lock on all valid dentries and when server cancels it, client knows the file is deleted.
>>>>>> For details, please see the attachments (I attached both word and
>>>>>> pdf versions as I am not sure if
>>>>> which is more convenient for you:). And please help to review and comment. Thank you!
>>>>> 
>>>>> Hi Tao,
>>>>> I'm passing this on to the engineers that are most familiar with the
>>>>> DLM and client dcache code in Lustre.  After a quick first read
>>>>> through your document, your investigation of the client code is very
>>>>> insightful and looks like it will be able to remove some of the significant complexity that has
>> grown over time in the llite code.
>>>>> 
>>>>> Cheers, Andreas
>>>>> --
>>>>> Andreas Dilger                       Whamcloud, Inc.
>>>>> Principal Engineer                   http://www.whamcloud.com/
>>>>> 
>>>>> 
>>>> <dcache_distributed_locking-v2.docx>
>>> 
>>> Cheers, Andreas
>>> --
>>> Andreas Dilger                       Whamcloud, Inc.
>>> Principal Engineer                   http://www.whamcloud.com/
>>> 
> 


Cheers, Andreas
--
Andreas Dilger                       Whamcloud, Inc.
Principal Engineer                   http://www.whamcloud.com/


* [Lustre-devel] Lustre dcache clean up and distributed locking
  2012-01-30 23:27             ` Andreas Dilger
@ 2012-01-31 12:02               ` Fan Yong
  2012-02-03  7:37                 ` tao.peng at emc.com
  0 siblings, 1 reply; 16+ messages in thread
From: Fan Yong @ 2012-01-31 12:02 UTC (permalink / raw)
  To: lustre-devel

On 1/31/12 7:27 AM, Andreas Dilger wrote:
> On 2012-01-30, at 12:46 AM,<tao.peng@emc.com>  wrote:
>> Sorry for the late reply. I got your email earlier but didn't have a chance to read the document due to very limited internet access.
>>
>> Thank you very much for taking a detailed look at the document and giving so many valuable comments.
>>
>> About adding a new inode bitlock, I must admit that I missed the hardlink case. Although the lock mode can be changed to be the same as the LOOKUP lock, I agree that an inode bitlock cannot sufficiently protect each name/dentry separately for hardlinked files. And I agree that, as you suggested in the document, adding a name-entry-based lock can solve it.
>>
>> IIUC, to add a name-entry-based lock, both client and server would have to be modified, and the LDLM would need to be changed to manage the new lock information. Like the new inode bitlock, the name-entry-based lock will also introduce interoperability issues as a result of the on-wire protocol change. And in a DNE environment, as Andreas explained, the original plan is to take the LOOKUP lock on both the master MDT and the remote MDT. If we introduce a name-entry-based lock, the DNE case may need some changes as well.
>>
>> I will take a closer look at the approach. In the meanwhile, could you please confirm whether it is the right way to go? The benefit of the above complexity seems to be better distributed dentry locking under the dcache framework.
>>
>> What do others think? Andreas?
> Tao, before we go down the road of implementing such a change, there would have to be clear benefits to the performance and/or kernel integration before we could accept the interoperability issues and added code complexity.
>
> The benefits of having a per-name lock, as I see them, are:
> - can be cancelled separately from other name locks
> - can be changed independently from the inode permissions (LOOKUP)
> - potential ease of integration with the Linux dcache
>
> The drawbacks of introducing a per-name lock:
> - only useful for hard-linked files (uncommon case)
> - added code complexity for a new lock type
> - incompatibility with existing client/server
> - additional code to handle incompatibility transparently
> - lock enqueue may need extra RPC for each name lock (different resource)
> - lock cancellation may need an extra RPC for each resource
>
> The way I see it is that the DENTRY lock would need to be a different resource for each name on a file.  The only way this is going to be useful AFAICS is if the client enqueues each DENTRY lock on a separate resource, so that it can be granted and cancelled independently of the LOOKUP lock (which is on the FID and would be common for all names that reference this inode).  That would either need an extra RPC for every filename lookup (to enqueue both the DENTRY and LOOKUP locks) or a new LDLM_ENQUEUE RPC that can request and be granted two different locks (FID [seq, oid, ver, 0], and FID+name hash [seq, oid, ver, hash]) in a single RPC.
>
> So, the main question is how much better the client dcache/VFS integration will be with the DCACHE lock compared to the existing mechanism with the LOOKUP (FID) lock?  Is it impossible (or very difficult) to use the existing mechanism for the new lockless dcache code?  Or is it a matter of this being a somewhat cleaner implementation, but not critical for moving forward?
>
> I'm aware that our dcache code has been too complex for some time, but I haven't looked recently to see how much of this is due to using/supporting old kernel APIs, and how much of it is due to the use of the LOOKUP lock on the FID protecting too much of the state on the client.
>
> Cheers, Andreas
Hi Tao,

Introducing either a per-name lock or a DENTRY lock is mainly about 
removing the Lustre-specific flag "DCACHE_LUSTRE_INVALID" to simplify 
the Lustre code. But if that simplification leads to more complex ldlm 
logic, we have to consider whether removing "DCACHE_LUSTRE_INVALID" is 
worth it. It is not impossible to get "DCACHE_LUSTRE_INVALID" accepted 
by the Linux kernel, and it can also work together with the lockless 
dcache in newer Linux kernels.

So to move the project forward, we can divide it into two parts 
(following your document): the Lustre dcache code cleanup, and the new 
ldlm logic to remove "DCACHE_LUSTRE_INVALID". Before we reach 
agreement on the second part, we can start with the first part and 
clean up the Lustre dcache code. What do you think?

Cheers,
--
Nasf

>
>
>>> -----Original Message-----
>>> From: Fan Yong [mailto:yong.fan at whamcloud.com]
>>> Sent: Saturday, January 21, 2012 1:11 AM
>>> To: Peng, Tao; bergwolf at gmail.com
>>> Cc: Tang, Haiying; adilger at whamcloud.com; faibish, sorin; Liu, Xuezhao; laisiyao at whamcloud.com;
>>> green at whamcloud.com; eeb at whamcloud.com; Bryon Neitzel
>>> Subject: Re: Lustre dcache clean up and distributed locking
>>>
>>> Hi Tao,
>>>
>>> Excellent work. I just added some comments inside your document. Please check.
>>>
>>>
>>> Best Regards,
>>> Fan Yong
>>> Whamcloud, Inc.
>>>
>>> On 1/20/12 9:34 AM, haiying.tang at emc.com wrote:
>>>> Hi Andreas,
>>>>
>>>> Tao is on vacation now. It might be difficult for him to check emails due to limited internet access
>>> at hometown.
>>>> For urgent issue, you folks could send email to his gmail account bergwolf at gmail.com.
>>>>
>>>> Thanks,
>>>> Haiying
>>>>
>>>> -----Original Message-----
>>>> From: Andreas Dilger [mailto:adilger at whamcloud.com]
>>>> Sent: 2012-01-20 1:39
>>>> To: Peng, Tao
>>>> Cc: faibish, sorin; Tang, Haiying; Liu, Xuezhao;
>>>> laisiyao at whamcloud.com; yong.fan at whamcloud.com; green at whamcloud.com;
>>>> eeb at whamcloud.com
>>>> Subject: Re: Lustre dcache clean up and distributed locking
>>>>
>>>> On 2012-01-17, at 3:21 AM,<tao.peng@emc.com>  <tao.peng@emc.com>  wrote:
>>>>> Thanks Siyao and Oleg for answering my dcache revalidation question on lustre mailing list. I
>>> updated the design to reflect it.
>>>>> Please see attachment.
>>>> Tao,
>>>> Fan Yong is also taking a more detailed look at your document and will
>>>> hopefully have a chance to reply before the New Year holidays.
>>>>
>>>> Also, we are just working on landing the patches to add support for
>>>> Linux
>>>> 2.6.38 for the Lustre client.  One of the patches relates to the
>>>> lockless dcache changes that were introduced in that kernel.  If you
>>>> are interested to review this patch, and become more familiar with the
>>>> Lustre development process, you should visit http://review.whamcloud.com/1865 for the patch.
>>>>
>>>> You need to create an account in Gerrit using OpenID (Google, mostly),
>>>> and an account in our bug tracking system (http://jira.whamcloud.com)
>>>> if you haven't already.
>>>>
>>>>>> -----Original Message-----
>>>>>> From: Andreas Dilger [mailto:adilger at whamcloud.com]
>>>>>> Sent: Tuesday, January 17, 2012 4:16 PM
>>>>>> To: Peng, Tao
>>>>>> Cc: faibish, sorin; Tang, Haiying; Liu, Xuezhao; Lai Siyao; Fan
>>>>>> Yong; Oleg Drokin; Eric Barton
>>>>>> Subject: Re: Lustre dcache clean up and distributed locking
>>>>>>
>>>>>> On 2012-01-16, at 9:25 PM,<tao.peng@emc.com>  wrote:
>>>>>>> I finally started to work on Lustre dcache cleanup and locking.
>>>>>>> After reading Lustre, ocfs2 and VFS
>>>>>> dcache related code, I came to a design for cleaning up Lustre
>>>>>> dcache code and doing distributed locking under dcache. For
>>>>>> distributed locking, the main idea is to add a new inode bitlock
>>>>>> DENTRY lock to just protect valid dentry, instead of letting LOOKUP
>>>>>> lock handle multiple purpose, which makes client unable to know
>>>>>> whether file is deleted or not when server cancels LOOKUP lock. Instead, client holds PR mode
>>> DENTRY lock on all valid dentries and when server cancels it, client knows the file is deleted.
>>>>>>> For details, please see the attachments (I attached both word and
>>>>>>> pdf versions as I am not sure if
>>>>>> which is more convenient for you:). And please help to review and comment. Thank you!
>>>>>>
>>>>>> Hi Tao,
>>>>>> I'm passing this on to the engineers that are most familiar with the
>>>>>> DLM and client dcache code in Lustre.  After a quick first read
>>>>>> through your document, your investigation of the client code is very
>>>>>> insightful and looks like it will be able to remove some of the significant complexity that has
>>> grown over time in the llite code.
>>>>>> Cheers, Andreas
>>>>>> --
>>>>>> Andreas Dilger                       Whamcloud, Inc.
>>>>>> Principal Engineer                   http://www.whamcloud.com/
>>>>>>
>>>>>>
>>>>> <dcache_distributed_locking-v2.docx>
>>>> Cheers, Andreas
>>>> --
>>>> Andreas Dilger                       Whamcloud, Inc.
>>>> Principal Engineer                   http://www.whamcloud.com/
>>>>
>>>>
>>>>
>>>>
>>>>
>
> Cheers, Andreas
> --
> Andreas Dilger                       Whamcloud, Inc.
> Principal Engineer                   http://www.whamcloud.com/
>
>
>
>


* [Lustre-devel] Lustre dcache clean up and distributed locking
  2012-01-31 12:02               ` Fan Yong
@ 2012-02-03  7:37                 ` tao.peng at emc.com
  2012-02-03 16:17                   ` Fan Yong
  0 siblings, 1 reply; 16+ messages in thread
From: tao.peng at emc.com @ 2012-02-03  7:37 UTC (permalink / raw)
  To: lustre-devel

> -----Original Message-----
> From: Fan Yong [mailto:yong.fan at whamcloud.com]
> Sent: Tuesday, January 31, 2012 8:02 PM
> To: Peng, Tao
> Cc: Andreas Dilger; bergwolf at gmail.com; Tang, Haiying; faibish, sorin; Liu, Xuezhao;
> laisiyao at whamcloud.com; green at whamcloud.com; eeb at whamcloud.com; bryon at whamcloud.com; lustre-
> devel at lists.lustre.org
> Subject: Re: Lustre dcache clean up and distributed locking
> 
> On 1/31/12 7:27 AM, Andreas Dilger wrote:
> > On 2012-01-30, at 12:46 AM,<tao.peng@emc.com>  wrote:
> >> Sorry for the late reply. I got your email earlier but didn't have chance to read the document due
> to very limited internet access.
> >>
> >> Thank you very much for taking detailed look at the document and giving so many valuable comments.
> >>
> >> About adding new inode bitlock, I must admit that I missed hardlink case. Although the lock mode
> can be changed to be the same as LOOKUP lock, I agree that inode bitlock cannot sufficiently protect
> every name/dentry separately for hardlinked files. And I agree, as you suggested in the document,
> adding a name entry based lock can solve it.
> >>
> >> IIUC, to add a name entry based lock, both client and server should be modified, and LDLM need to
> be changed to manage the new lock information. Like new inode bitlock, the name entry based lock will
> as well introduce interoperability issues as a result of on wire protocol change. And in DNE
> environment as Andreas explained, the original plan is to get LOOKUP on both master MDT and remote MDT.
> If we introduce name entry based lock, DNE case may need some change as well.
> >>
> >> I will take a closer look at the approach. In the meanwhile, could you please confirm if it is the
> right way to go? The benefit of above complexity seems to be better distributed dentry locking under
> dcache framework.
> >>
> >> What do others think? Andreas?
> > Tao, before we go down the road of implementing such a change, there would have to be clear benefits
> to the performance and/or kernel integration before we could accept the interoperability issues and
> added code complexity.
> >
> > The benefits of having a per-name lock, as I see them, are:
> > - can be cancelled separately from other name locks
> > - can be changed independently from the inode permissions (LOOKUP)
> > - potential ease of integration with the Linux dcache
> >
> > The drawbacks of introducing a per-name lock:
> > - only useful for hard-linked files (uncommon case)
> > - added code complexity for a new lock type
> > - incompatibility with existing client/server
> > - additional code to handle incompatibility transparently
> > - lock enqueue may need extra RPC for each name lock (different resource)
> > - lock cancellation may need an extra RPC for each resource
> >
> > The way I see it is that the DENTRY lock would need to be a different resource for each name on a
> file.  The only way this is going to be useful AFAICS is if the client enqueues each DENTRY lock on a
> separate resource, so that it can be granted and cancelled independently of the LOOKUP lock (which is
> on the FID and would be common for all names that reference this inode).  That would either need an
> extra RPC for every filename lookup (to enqueue both the DENTRY and LOOKUP locks) or a new
> LDLM_ENQUEUE RPC that can request and be granted two different locks (FID [seq, oid, ver, 0], and
> FID+name hash [seq, oid, ver, hash]) in a single RPC.
> >
> > So, the main question is how much better the client dcache/VFS integration will be with the DCACHE
> lock compared to the existing mechanism with the LOOKUP (FID) lock?  Is it impossible (or very
> difficult) to use the existing mechanism for the new lockless dcache code?  Or is it a matter of this
> being a somewhat cleaner implementation, but not critical for moving forward?
> >
> > I'm aware that our dcache code has been too complex for some time, but I haven't looked recently to
> see how much of this is due to using/supporting old kernel APIs, and how much of it is due to the use
> of the LOOKUP lock on the FID protecting too much of the state on the client.
> >
> > Cheers, Andreas
> Hi Tao,
> 
> Introducing either a per-name lock or a DENTRY lock is mainly about
> removing the Lustre-specific flag "DCACHE_LUSTRE_INVALID" to simplify
> the Lustre code. But if that simplification leads to more complex ldlm
> logic, we have to consider whether removing "DCACHE_LUSTRE_INVALID" is
> worth it. It is not impossible to get "DCACHE_LUSTRE_INVALID" accepted
> by the Linux kernel, and it can also work together with the lockless
> dcache in newer Linux kernels.
> 
> So to move the project forward, we can divide it into two parts
> (following your document): the Lustre dcache code cleanup, and the new
> ldlm logic to remove "DCACHE_LUSTRE_INVALID". Before we reach
> agreement on the second part, we can start with the first part and
> clean up the Lustre dcache code. What do you think?
> 
Thank you, Andreas and Yong. I tend to agree with you that we should avoid the complexity if there are no obvious benefits. However, since we need to make sure the code is acceptable to the kernel community, I will try to consolidate a design for adding the new type of lock but not implement it at this point. If kernel community developers later ask us to do it, we will just need to do the implementation.

Please stay tuned.

Best,
Tao

> Cheers,
> --
> Nasf
> 
> >
> >
> >>> -----Original Message-----
> >>> From: Fan Yong [mailto:yong.fan at whamcloud.com]
> >>> Sent: Saturday, January 21, 2012 1:11 AM
> >>> To: Peng, Tao; bergwolf at gmail.com
> >>> Cc: Tang, Haiying; adilger at whamcloud.com; faibish, sorin; Liu, Xuezhao; laisiyao at whamcloud.com;
> >>> green at whamcloud.com; eeb at whamcloud.com; Bryon Neitzel
> >>> Subject: Re: Lustre dcache clean up and distributed locking
> >>>
> >>> Hi Tao,
> >>>
> >>> Excellent work. I just added some comments inside your document. Please check.
> >>>
> >>>
> >>> Best Regards,
> >>> Fan Yong
> >>> Whamcloud, Inc.
> >>>
> >>> On 1/20/12 9:34 AM, haiying.tang at emc.com wrote:
> >>>> Hi Andreas,
> >>>>
> >>>> Tao is on vacation now. It might be difficult for him to check emails due to limited internet
> access
> >>> at hometown.
> >>>> For urgent issue, you folks could send email to his gmail account bergwolf at gmail.com.
> >>>>
> >>>> Thanks,
> >>>> Haiying
> >>>>
> >>>> -----Original Message-----
> >>>> From: Andreas Dilger [mailto:adilger at whamcloud.com]
> >>>> Sent: 2012-01-20 1:39
> >>>> To: Peng, Tao
> >>>> Cc: faibish, sorin; Tang, Haiying; Liu, Xuezhao;
> >>>> laisiyao at whamcloud.com; yong.fan at whamcloud.com; green at whamcloud.com;
> >>>> eeb at whamcloud.com
> >>>> Subject: Re: Lustre dcache clean up and distributed locking
> >>>>
> >>>> On 2012-01-17, at 3:21 AM,<tao.peng@emc.com>  <tao.peng@emc.com>  wrote:
> >>>>> Thanks Siyao and Oleg for answering my dcache revalidation question on lustre mailing list. I
> >>> updated the design to reflect it.
> >>>>> Please see attachment.
> >>>> Tao,
> >>>> Fan Yong is also taking a more detailed look at your document and will
> >>>> hopefully have a chance to reply before the New Year holidays.
> >>>>
> >>>> Also, we are just working on landing the patches to add support for
> >>>> Linux
> >>>> 2.6.38 for the Lustre client.  One of the patches relates to the
> >>>> lockless dcache changes that were introduced in that kernel.  If you
> >>>> are interested to review this patch, and become more familiar with the
> >>>> Lustre development process, you should visit http://review.whamcloud.com/1865 for the patch.
> >>>>
> >>>> You need to create an account in Gerrit using OpenID (Google, mostly),
> >>>> and an account in our bug tracking system (http://jira.whamcloud.com)
> >>>> if you haven't already.
> >>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Andreas Dilger [mailto:adilger at whamcloud.com]
> >>>>>> Sent: Tuesday, January 17, 2012 4:16 PM
> >>>>>> To: Peng, Tao
> >>>>>> Cc: faibish, sorin; Tang, Haiying; Liu, Xuezhao; Lai Siyao; Fan
> >>>>>> Yong; Oleg Drokin; Eric Barton
> >>>>>> Subject: Re: Lustre dcache clean up and distributed locking
> >>>>>>
> >>>>>> On 2012-01-16, at 9:25 PM,<tao.peng@emc.com>  wrote:
> >>>>>>> I finally started to work on Lustre dcache cleanup and locking.
> >>>>>>> After reading Lustre, ocfs2 and VFS
> >>>>>> dcache related code, I came to a design for cleaning up Lustre
> >>>>>> dcache code and doing distributed locking under dcache. For
> >>>>>> distributed locking, the main idea is to add a new inode bitlock
> >>>>>> DENTRY lock to just protect valid dentry, instead of letting LOOKUP
> >>>>>> lock handle multiple purpose, which makes client unable to know
> >>>>>> whether file is deleted or not when server cancels LOOKUP lock. Instead, client holds PR mode
> >>> DENTRY lock on all valid dentries and when server cancels it, client knows the file is deleted.
> >>>>>>> For details, please see the attachments (I attached both word and
> >>>>>>> pdf versions as I am not sure if
> >>>>>> which is more convenient for you:). And please help to review and comment. Thank you!
> >>>>>>
> >>>>>> Hi Tao,
> >>>>>> I'm passing this on to the engineers that are most familiar with the
> >>>>>> DLM and client dcache code in Lustre.  After a quick first read
> >>>>>> through your document, your investigation of the client code is very
> >>>>>> insightful and looks like it will be able to remove some of the significant complexity that has
> >>> grown over time in the llite code.
> >>>>>> Cheers, Andreas
> >>>>>> --
> >>>>>> Andreas Dilger                       Whamcloud, Inc.
> >>>>>> Principal Engineer                   http://www.whamcloud.com/
> >>>>>>
> >>>>>>
> >>>>> <dcache_distributed_locking-v2.docx>
> >>>> Cheers, Andreas
> >>>> --
> >>>> Andreas Dilger                       Whamcloud, Inc.
> >>>> Principal Engineer                   http://www.whamcloud.com/
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >
> > Cheers, Andreas
> > --
> > Andreas Dilger                       Whamcloud, Inc.
> > Principal Engineer                   http://www.whamcloud.com/
> >
> >
> >
> >
> 


* [Lustre-devel] Lustre dcache clean up and distributed locking
  2012-02-03  7:37                 ` tao.peng at emc.com
@ 2012-02-03 16:17                   ` Fan Yong
  2012-02-05 13:50                     ` Peng Tao
                                       ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Fan Yong @ 2012-02-03 16:17 UTC (permalink / raw)
  To: lustre-devel

On 2/3/12 3:37 PM, tao.peng at emc.com wrote:
>> -----Original Message-----
>> From: Fan Yong [mailto:yong.fan at whamcloud.com]
>> Sent: Tuesday, January 31, 2012 8:02 PM
>> To: Peng, Tao
>> Cc: Andreas Dilger; bergwolf at gmail.com; Tang, Haiying; faibish, sorin; Liu, Xuezhao;
>> laisiyao at whamcloud.com; green at whamcloud.com; eeb at whamcloud.com; bryon at whamcloud.com; lustre-
>> devel at lists.lustre.org
>> Subject: Re: Lustre dcache clean up and distributed locking
>>
>> On 1/31/12 7:27 AM, Andreas Dilger wrote:
>>> On 2012-01-30, at 12:46 AM,<tao.peng@emc.com>   wrote:
>>>> Sorry for the late reply. I got your email earlier but didn't have chance to read the document due
>> to very limited internet access.
>>>> Thank you very much for taking detailed look at the document and giving so many valuable comments.
>>>>
>>>> About adding new inode bitlock, I must admit that I missed hardlink case. Although the lock mode
>> can be changed to be the same as LOOKUP lock, I agree that inode bitlock cannot sufficiently protect
>> every name/dentry separately for hardlinked files. And I agree, as you suggested in the document,
>> adding a name entry based lock can solve it.
>>>> IIUC, to add a name entry based lock, both client and server should be modified, and LDLM need to
>> be changed to manage the new lock information. Like new inode bitlock, the name entry based lock will
>> as well introduce interoperability issues as a result of on wire protocol change. And in DNE
>> environment as Andreas explained, the original plan is to get LOOKUP on both master MDT and remote MDT.
>> If we introduce name entry based lock, DNE case may need some change as well.
>>>> I will take a closer look at the approach. In the meanwhile, could you please confirm if it is the
>> right way to go? The benefit of above complexity seems to be better distributed dentry locking under
>> dcache framework.
>>>> What do others think? Andreas?
>>> Tao, before we go down the road of implementing such a change, there would have to be clear benefits
>> to the performance and/or kernel integration before we could accept the interoperability issues and
>> added code complexity.
>>> The benefits of having a per-name lock, as I see them, are:
>>> - can be cancelled separately from other name locks
>>> - can be changed independently from the inode permissions (LOOKUP)
>>> - potential ease of integration with the Linux dcache
>>>
>>> The drawbacks of introducing a per-name lock:
>>> - only useful for hard-linked files (uncommon case)
>>> - added code complexity for a new lock type
>>> - incompatibility with existing client/server
>>> - additional code to handle incompatibility transparently
>>> - lock enqueue may need extra RPC for each name lock (different resource)
>>> - lock cancellation may need an extra RPC for each resource
>>>
>>> The way I see it is that the DENTRY lock would need to be a different resource for each name on a
>> file.  The only way this is going to be useful AFAICS is if the client enqueues each DENTRY lock on a
>> separate resource, so that it can be granted and cancelled independently of the LOOKUP lock (which is
>> on the FID and would be common for all names that reference this inode).  That would either need an
>> extra RPC for every filename lookup (to enqueue both the DENTRY and LOOKUP locks) or a new
>> LDLM_ENQUEUE RPC that can request and be granted two different locks (FID [seq, oid, ver, 0], and
>> FID+name hash [seq, oid, ver, hash]) in a single RPC.
>>> So, the main question is how much better the client dcache/VFS integration will be with the DCACHE
>> lock compared to the existing mechanism with the LOOKUP (FID) lock?  Is it impossible (or very
>> difficult) to use the existing mechanism for the new lockless dcache code?  Or is it a matter of this
>> being a somewhat cleaner implementation, but not critical for moving forward?
>>> I'm aware that our dcache code has been too complex for some time, but I haven't looked recently to
>> see how much of this is due to using/supporting old kernel APIs, and how much of it is due to the use
>> of the LOOKUP lock on the FID protecting too much of the state on the client.
>>> Cheers, Andreas
>> Hi Tao,
>>
>> Introducing either a per-name lock or a DENTRY lock is mainly about
>> removing the Lustre-specific flag "DCACHE_LUSTRE_INVALID" to simplify
>> the Lustre code. But if that simplification leads to more complex ldlm
>> logic, we have to consider whether removing "DCACHE_LUSTRE_INVALID" is
>> worth it. It is not impossible to get "DCACHE_LUSTRE_INVALID" accepted
>> by the Linux kernel, and it can also work together with the lockless
>> dcache in newer Linux kernels.
>>
>> So to move the project forward, we can divide it into two parts
>> (following your document): the Lustre dcache code cleanup, and the new
>> ldlm logic to remove "DCACHE_LUSTRE_INVALID". Before we reach
>> agreement on the second part, we can start with the first part and
>> clean up the Lustre dcache code. What do you think?
>>
> Thank you, Andreas and Yong. I tend to agree with you that we should avoid the complexity if there are no obvious benefits. However, since we need to make sure the code is acceptable to the kernel community, I will try to consolidate a design for adding the new type of lock but not implement it at this point. If kernel community developers later ask us to do it, we will just need to do the implementation.

Hi Tao,

We have another way to remove the Lustre-specific dentry flag 
"DCACHE_LUSTRE_INVALID". To mark a dentry that is still hashed in the 
dcache but no longer protected by a LOOKUP lock, we can add a new flag 
in "ll_dentry_data" (pointed to by "dentry::d_fsdata"). Since that is 
Lustre-private data, there is no need to worry about the new flag 
conflicting with any standard VFS "dentry::d_flags" bits, and it may 
save some of the effort of implementing the complex DENTRY lock in 
your original design.

Cheers,
Nasf

> Please stay tuned.
>
> Best,
> Tao
>
>> Cheers,
>> --
>> Nasf
>>
>>>
>>>>> -----Original Message-----
>>>>> From: Fan Yong [mailto:yong.fan at whamcloud.com]
>>>>> Sent: Saturday, January 21, 2012 1:11 AM
>>>>> To: Peng, Tao; bergwolf at gmail.com
>>>>> Cc: Tang, Haiying; adilger at whamcloud.com; faibish, sorin; Liu, Xuezhao; laisiyao at whamcloud.com;
>>>>> green at whamcloud.com; eeb at whamcloud.com; Bryon Neitzel
>>>>> Subject: Re: Lustre dcache clean up and distributed locking
>>>>>
>>>>> Hi Tao,
>>>>>
>>>>> Excellent work. I just added some comments inside your document. Please check.
>>>>>
>>>>>
>>>>> Best Regards,
>>>>> Fan Yong
>>>>> Whamcloud, Inc.
>>>>>
>>>>> On 1/20/12 9:34 AM, haiying.tang at emc.com wrote:
>>>>>> Hi Andreas,
>>>>>>
>>>>>> Tao is on vacation now. It might be difficult for him to check emails due to limited internet
>> access
>>>>> at hometown.
>>>>>> For urgent issue, you folks could send email to his gmail account bergwolf at gmail.com.
>>>>>>
>>>>>> Thanks,
>>>>>> Haiying
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Andreas Dilger [mailto:adilger at whamcloud.com]
>>>>>> Sent: 2012-01-20 1:39
>>>>>> To: Peng, Tao
>>>>>> Cc: faibish, sorin; Tang, Haiying; Liu, Xuezhao;
>>>>>> laisiyao at whamcloud.com; yong.fan at whamcloud.com; green at whamcloud.com;
>>>>>> eeb at whamcloud.com
>>>>>> Subject: Re: Lustre dcache clean up and distributed locking
>>>>>>
>>>>>> On 2012-01-17, at 3:21 AM,<tao.peng@emc.com>   <tao.peng@emc.com>   wrote:
>>>>>>> Thanks Siyao and Oleg for answering my dcache revalidation question on lustre mailing list. I
>>>>> updated the design to reflect it.
>>>>>>> Please see attachment.
>>>>>> Tao,
>>>>>> Fan Yong is also taking a more detailed look at your document and will
>>>>>> hopefully have a chance to reply before the New Year holidays.
>>>>>>
>>>>>> Also, we are just working on landing the patches to add support for
>>>>>> Linux
>>>>>> 2.6.38 for the Lustre client.  One of the patches relates to the
>>>>>> lockless dcache changes that were introduced in that kernel.  If you
>>>>>> are interested to review this patch, and become more familiar with the
>>>>>> Lustre development process, you should visit http://review.whamcloud.com/1865 for the patch.
>>>>>>
>>>>>> You need to create an account in Gerrit using OpenID (Google, mostly),
>>>>>> and an account in our bug tracking system (http://jira.whamcloud.com)
>>>>>> if you haven't already.
>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Andreas Dilger [mailto:adilger at whamcloud.com]
>>>>>>>> Sent: Tuesday, January 17, 2012 4:16 PM
>>>>>>>> To: Peng, Tao
>>>>>>>> Cc: faibish, sorin; Tang, Haiying; Liu, Xuezhao; Lai Siyao; Fan
>>>>>>>> Yong; Oleg Drokin; Eric Barton
>>>>>>>> Subject: Re: Lustre dcache clean up and distributed locking
>>>>>>>>
>>>>>>>> On 2012-01-16, at 9:25 PM,<tao.peng@emc.com>   wrote:
>>>>>>>>> I finally started to work on Lustre dcache cleanup and locking.
>>>>>>>>> After reading Lustre, ocfs2 and VFS
>>>>>>>> dcache related code, I came to a design for cleaning up Lustre
>>>>>>>> dcache code and doing distributed locking under dcache. For
>>>>>>>> distributed locking, the main idea is to add a new inode bitlock
>>>>>>>> DENTRY lock to just protect valid dentry, instead of letting LOOKUP
>>>>>>>> lock handle multiple purpose, which makes client unable to know
>>>>>>>> whether file is deleted or not when server cancels LOOKUP lock. Instead, client holds PR mode
>>>>> DENTRY lock on all valid dentries and when server cancels it, client knows the file is deleted.
>>>>>>>>> For details, please see the attachments (I attached both word and
>>>>>>>>> pdf versions as I am not sure if
>>>>>>>> which is more convenient for you:). And please help to review and comment. Thank you!
>>>>>>>>
>>>>>>>> Hi Tao,
>>>>>>>> I'm passing this on to the engineers that are most familiar with the
>>>>>>>> DLM and client dcache code in Lustre.  After a quick first read
>>>>>>>> through your document, your investigation of the client code is very
>>>>>>>> insightful and looks like it will be able to remove some of the significant complexity that has
>>>>> grown over time in the llite code.
>>>>>>>> Cheers, Andreas
>>>>>>>> --
>>>>>>>> Andreas Dilger                       Whamcloud, Inc.
>>>>>>>> Principal Engineer                   http://www.whamcloud.com/
>>>>>>>>
>>>>>>>>
>>>>>>> <dcache_distributed_locking-v2.docx>
>>>>>> Cheers, Andreas
>>>>>> --
>>>>>> Andreas Dilger                       Whamcloud, Inc.
>>>>>> Principal Engineer                   http://www.whamcloud.com/
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>> Cheers, Andreas
>>> --
>>> Andreas Dilger                       Whamcloud, Inc.
>>> Principal Engineer                   http://www.whamcloud.com/
>>>
>>>
>>>
>>>


* [Lustre-devel] Lustre dcache clean up and distributed locking
  2012-02-03 16:17                   ` Fan Yong
@ 2012-02-05 13:50                     ` Peng Tao
  2012-02-09  7:15                     ` tao.peng at emc.com
  2012-02-09  7:35                     ` tao.peng at emc.com
  2 siblings, 0 replies; 16+ messages in thread
From: Peng Tao @ 2012-02-05 13:50 UTC (permalink / raw)
  To: lustre-devel

On Sat, Feb 4, 2012 at 12:17 AM, Fan Yong <yong.fan@whamcloud.com> wrote:
> On 2/3/12 3:37 PM, tao.peng at emc.com wrote:
>>>
>>> -----Original Message-----
>>> From: Fan Yong [mailto:yong.fan at whamcloud.com]
>>> Sent: Tuesday, January 31, 2012 8:02 PM
>>> To: Peng, Tao
>>> Cc: Andreas Dilger; bergwolf at gmail.com; Tang, Haiying; faibish, sorin;
>>> Liu, Xuezhao;
>>> laisiyao at whamcloud.com; green at whamcloud.com; eeb at whamcloud.com;
>>> bryon at whamcloud.com; lustre-
>>> devel at lists.lustre.org
>>> Subject: Re: Lustre dcache clean up and distributed locking
>>>
>>> On 1/31/12 7:27 AM, Andreas Dilger wrote:
>>>>
>>>> On 2012-01-30, at 12:46 AM,<tao.peng@emc.com> wrote:
>>>>>
>>>>> Sorry for the late reply. I got your email earlier but didn't have
>>>>> chance to read the document due
>>>
>>> to very limited internet access.
>>>>>
>>>>> Thank you very much for taking detailed look at the document and giving
>>>>> so many valuable comments.
>>>>>
>>>>> About adding the new inode bitlock, I must admit that I missed the
>>>>> hardlink case. Although the lock mode can be changed to be the same as
>>>>> the LOOKUP lock, I agree that an inode bitlock cannot sufficiently
>>>>> protect every name/dentry separately for hardlinked files. And I agree,
>>>>> as you suggested in the document, that adding a name-entry-based lock
>>>>> can solve it.
>>>>>
>>>>> IIUC, to add a name-entry-based lock, both client and server would need
>>>>> to be modified, and LDLM would need to be changed to manage the new lock
>>>>> information. Like the new inode bitlock, the name-entry-based lock will
>>>>> also introduce interoperability issues as a result of the on-wire
>>>>> protocol change. And in a DNE environment, as Andreas explained, the
>>>>> original plan is to get LOOKUP locks on both the master MDT and the
>>>>> remote MDT. If we introduce a name-entry-based lock, the DNE case may
>>>>> need some changes as well.
>>>>>
>>>>> I will take a closer look at the approach. In the meantime, could you
>>>>> please confirm whether it is the right way to go? The benefit of the
>>>>> above complexity seems to be better distributed dentry locking under
>>>>> the dcache framework.
>>>>>
>>>>> What do others think? Andreas?
>>>>
>>>> Tao, before we go down the road of implementing such a change, there
>>>> would have to be clear benefits to the performance and/or kernel
>>>> integration before we could accept the interoperability issues and
>>>> added code complexity.
>>>>
>>>> The benefits of having a per-name lock, as I see them, are:
>>>> - can be cancelled separately from other name locks
>>>> - can be changed independently from the inode permissions (LOOKUP)
>>>> - potential ease of integration with the Linux dcache
>>>>
>>>> The drawbacks of introducing a per-name lock:
>>>> - only useful for hard-linked files (uncommon case)
>>>> - added code complexity for a new lock type
>>>> - incompatibility with existing client/server
>>>> - additional code to handle incompatibility transparently
>>>> - lock enqueue may need extra RPC for each name lock (different
>>>> resource)
>>>> - lock cancellation may need an extra RPC for each resource
>>>>
>>>> The way I see it is that the DENTRY lock would need to be a different
>>>> resource for each name on a file. The only way this is going to be
>>>> useful AFAICS is if the client enqueues each DENTRY lock on a separate
>>>> resource, so that it can be granted and cancelled independently of the
>>>> LOOKUP lock (which is on the FID and would be common for all names that
>>>> reference this inode). That would either need an extra RPC for every
>>>> filename lookup (to enqueue both the DENTRY and LOOKUP locks) or a new
>>>> LDLM_ENQUEUE RPC that can request and be granted two different locks
>>>> (FID [seq, oid, ver, 0], and FID+name hash [seq, oid, ver, hash]) in a
>>>> single RPC.
>>>>
>>>> So, the main question is how much better the client dcache/VFS
>>>> integration will be with the DCACHE lock compared to the existing
>>>> mechanism with the LOOKUP (FID) lock? Is it impossible (or very
>>>> difficult) to use the existing mechanism for the new lockless dcache
>>>> code? Or is it a matter of this being a somewhat cleaner implementation,
>>>> but not critical for moving forward?
>>>>
>>>> I'm aware that our dcache code has been too complex for some time, but I
>>>> haven't looked recently to see how much of this is due to
>>>> using/supporting old kernel APIs, and how much of it is due to the use
>>>> of the LOOKUP lock on the FID protecting too much of the state on the
>>>> client.
>>>>
>>>> Cheers, Andreas
>>>
>>> Hi Tao,
>>>
>>> Introducing either a per-name lock or a DENTRY lock is mainly about
>>> removing the Lustre-special flag "DCACHE_LUSTRE_INVALID" to simplify the
>>> Lustre code. But if that simplification causes more complex ldlm logic,
>>> we have to consider whether removing "DCACHE_LUSTRE_INVALID" is worth it.
>>> It is not impossible to get "DCACHE_LUSTRE_INVALID" accepted by the linux
>>> kernel. On the other hand, "DCACHE_LUSTRE_INVALID" can also work together
>>> with the lockless dcache in newer linux kernels.
>>>
>>> So to push the project forward, we can divide it into two parts
>>> (according to your document): Lustre dcache code cleanup, and some new
>>> ldlm logic to remove "DCACHE_LUSTRE_INVALID". Before we reach an
>>> agreement on the second part, we can start with the first part and
>>> clean up the Lustre dcache code. What do you think?
>>>
>> Thank you, Andreas and Yong. I tend to agree with you that we should avoid
>> the complexity if there are no obvious benefits. However, as we need to make
>> sure the code is acceptable to the kernel community, I will try to consolidate
>> a design for adding the new type of lock but just not implement it at this
>> point. And if later we are asked (by kernel community developers) to do it,
>> we just need to do the implementation.
>
>
> Hi Tao,
>
> We have another way to remove the Lustre-special dentry flag
> "DCACHE_LUSTRE_INVALID". To mark a dentry that is still in the dcache but no
> longer protected by a LOOKUP lock, and not unhashed yet, we can add new flags
> in "ll_dentry_data" (pointed to by "dentry::d_fsdata"). Since it is
> Lustre-private data, there is no need to worry about the new flags conflicting
> with any standard VFS "dentry::d_flags", and this may save some effort of
> implementing the complex DENTRY lock from your original design.
>
Brilliant! Then it is a lustre-private flag and should not receive any
objection from the VFS layer.

Thanks,
Tao
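To make the suggestion above concrete, here is a minimal userspace sketch of the idea: keep the "no LOOKUP lock, not yet unhashed" state in Lustre-private per-dentry data hanging off d_fsdata, rather than in the VFS-visible d_flags. The structs are simplified stand-ins for the kernel's, and the helper names and the flag's value are illustrative, not Lustre's actual code (the flag name LLD_LUSTRE_INVALID is the one proposed later in this thread).

```c
#include <assert.h>

/* Userspace model of the proposal: the invalid state lives in
 * filesystem-private data, so the VFS never sees a non-standard bit. */

#define LLD_LUSTRE_INVALID 0x01  /* no LOOKUP lock held, but not unhashed yet */

struct ll_dentry_data {
    unsigned int lld_flags;
};

struct dentry {
    void *d_fsdata;              /* -> struct ll_dentry_data */
};

static struct ll_dentry_data *ll_d2d(struct dentry *de)
{
    return (struct ll_dentry_data *)de->d_fsdata;
}

/* Called when the server cancels the LOOKUP lock for this dentry:
 * mark it invalid without touching the VFS-visible d_flags. */
static void ll_dentry_set_invalid(struct dentry *de)
{
    ll_d2d(de)->lld_flags |= LLD_LUSTRE_INVALID;
}

/* Called from revalidation: a dentry without the flag is still valid. */
static int ll_dentry_is_valid(struct dentry *de)
{
    return !(ll_d2d(de)->lld_flags & LLD_LUSTRE_INVALID);
}
```

Because the flag lives entirely in private data, no change to generic dcache code is needed, which is exactly why it avoids the VFS-layer objections discussed above.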

> Cheers,
> Nasf
>
>
>> Please stay tuned.
>>
>> Best,
>> Tao
>>
>>> Cheers,
>>> --
>>> Nasf
>>>
>>>>
>>>>>> -----Original Message-----
>>>>>> From: Fan Yong [mailto:yong.fan at whamcloud.com]
>>>>>> Sent: Saturday, January 21, 2012 1:11 AM
>>>>>> To: Peng, Tao; bergwolf at gmail.com
>>>>>> Cc: Tang, Haiying; adilger at whamcloud.com; faibish, sorin; Liu,
>>>>>> Xuezhao; laisiyao at whamcloud.com;
>>>>>> green at whamcloud.com; eeb at whamcloud.com; Bryon Neitzel
>>>>>> Subject: Re: Lustre dcache clean up and distributed locking
>>>>>>
>>>>>> Hi Tao,
>>>>>>
>>>>>> Excellent work. I just added some comments inside your document.
>>>>>> Please check.
>>>>>>
>>>>>>
>>>>>> Best Regards,
>>>>>> Fan Yong
>>>>>> Whamcloud, Inc.
>>>>>>
>>>>>> On 1/20/12 9:34 AM, haiying.tang at emc.com wrote:
>>>>>>>
>>>>>>> Hi Andreas,
>>>>>>>
>>>>>>> Tao is on vacation now. It might be difficult for him to check emails
>>>>>>> due to limited internet access at his hometown.
>>>>>>>
>>>>>>> For urgent issues, you folks could send email to his gmail account
>>>>>>> bergwolf at gmail.com.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Haiying
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Andreas Dilger [mailto:adilger at whamcloud.com]
>>>>>>> Sent: January 20, 2012 1:39
>>>>>>> To: Peng, Tao
>>>>>>> Cc: faibish, sorin; Tang, Haiying; Liu, Xuezhao;
>>>>>>> laisiyao at whamcloud.com; yong.fan at whamcloud.com; green at whamcloud.com;
>>>>>>> eeb at whamcloud.com
>>>>>>> Subject: Re: Lustre dcache clean up and distributed locking
>>>>>>>
>>>>>>> On 2012-01-17, at 3:21 AM, <tao.peng@emc.com> wrote:
>>>>>>>>
>>>>>>>> Thanks Siyao and Oleg for answering my dcache revalidation question
>>>>>>>> on lustre mailing list. I
>>>>>>
>>>>>> updated the design to reflect it.
>>>>>>>>
>>>>>>>> Please see attachment.
>>>>>>>
>>>>>>> Tao,
>>>>>>> Fan Yong is also taking a more detailed look at your document and
>>>>>>> will
>>>>>>> hopefully have a chance to reply before the New Year holidays.
>>>>>>>
>>>>>>> Also, we are just working on landing the patches to add support for
>>>>>>> Linux
>>>>>>> 2.6.38 for the Lustre client. One of the patches relates to the
>>>>>>> lockless dcache changes that were introduced in that kernel. If you
>>>>>>> are interested to review this patch, and become more familiar with
>>>>>>> the
>>>>>>> Lustre development process, you should visit
>>>>>>> http://review.whamcloud.com/1865 for the patch.
>>>>>>>
>>>>>>> You need to create an account in Gerrit using OpenID (Google,
>>>>>>> mostly),
>>>>>>> and an account in our bug tracking system (http://jira.whamcloud.com)
>>>>>>> if you haven't already.
>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Andreas Dilger [mailto:adilger at whamcloud.com]
>>>>>>>>> Sent: Tuesday, January 17, 2012 4:16 PM
>>>>>>>>> To: Peng, Tao
>>>>>>>>> Cc: faibish, sorin; Tang, Haiying; Liu, Xuezhao; Lai Siyao; Fan
>>>>>>>>> Yong; Oleg Drokin; Eric Barton
>>>>>>>>> Subject: Re: Lustre dcache clean up and distributed locking
>>>>>>>>>
>>>>>>>>>> On 2012-01-16, at 9:25 PM, <tao.peng@emc.com> wrote:
>>>>>>>>>>
>>>>>>>>>> I finally started to work on Lustre dcache cleanup and locking.
>>>>>>>>>> After reading Lustre, ocfs2, and VFS dcache-related code, I came to
>>>>>>>>>> a design for cleaning up the Lustre dcache code and doing
>>>>>>>>>> distributed locking under the dcache. For distributed locking, the
>>>>>>>>>> main idea is to add a new inode bitlock, the DENTRY lock, to protect
>>>>>>>>>> just the valid dentry, instead of letting the LOOKUP lock serve
>>>>>>>>>> multiple purposes, which makes the client unable to know whether a
>>>>>>>>>> file has been deleted when the server cancels the LOOKUP lock.
>>>>>>>>>> Instead, the client holds a PR-mode DENTRY lock on all valid
>>>>>>>>>> dentries, and when the server cancels it, the client knows the file
>>>>>>>>>> is deleted.
>>>>>>>>>>
>>>>>>>>>> For details, please see the attachments (I attached both word and
>>>>>>>>>> pdf versions as I am not sure which is more convenient for you :).
>>>>>>>>>> And please help to review and comment. Thank you!
>>>>>>>>>
>>>>>>>>> Hi Tao,
>>>>>>>>> I'm passing this on to the engineers that are most familiar with the
>>>>>>>>> DLM and client dcache code in Lustre. After a quick first read
>>>>>>>>> through your document, your investigation of the client code is very
>>>>>>>>> insightful and looks like it will be able to remove some of the
>>>>>>>>> significant complexity that has grown over time in the llite code.
>>>>>>>>>
>>>>>>>>> Cheers, Andreas
>>>>>>>>> --
>>>>>>>>> Andreas Dilger                       Whamcloud, Inc.
>>>>>>>>> Principal Engineer                   http://www.whamcloud.com/
>>>>>>>>>
>>>>>>>>>
>>>>>>>> <dcache_distributed_locking-v2.docx>
>>>>>>>
>>>>>>> Cheers, Andreas
>>>>>>> --
>>>>>>> Andreas Dilger                       Whamcloud, Inc.
>>>>>>> Principal Engineer                   http://www.whamcloud.com/
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>> Cheers, Andreas
>>>> --
>>>> Andreas Dilger                       Whamcloud, Inc.
>>>> Principal Engineer                   http://www.whamcloud.com/
>>>>
>>>>
>>>>
>>>>
>



-- 
Thanks,
Tao

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Lustre-devel] Lustre dcache clean up and distributed locking
  2012-02-03 16:17                   ` Fan Yong
  2012-02-05 13:50                     ` Peng Tao
@ 2012-02-09  7:15                     ` tao.peng at emc.com
  2012-02-10 20:37                       ` Andreas Dilger
  2012-02-11 15:58                       ` Yong Fan
  2012-02-09  7:35                     ` tao.peng at emc.com
  2 siblings, 2 replies; 16+ messages in thread
From: tao.peng at emc.com @ 2012-02-09  7:15 UTC (permalink / raw)
  To: lustre-devel


> -----Original Message-----
> From: Fan Yong [mailto:yong.fan at whamcloud.com]
> Sent: Saturday, February 04, 2012 12:18 AM
> To: Peng, Tao
> Cc: adilger at whamcloud.com; bergwolf at gmail.com; Tang, Haiying; faibish, sorin; Liu, Xuezhao;
> laisiyao at whamcloud.com; green at whamcloud.com; eeb at whamcloud.com; bryon at whamcloud.com; lustre-
> devel at lists.lustre.org
> Subject: Re: Lustre dcache clean up and distributed locking
> 
> On 2/3/12 3:37 PM, tao.peng at emc.com wrote:
> >> -----Original Message-----
> >> From: Fan Yong [mailto:yong.fan at whamcloud.com]
> >> Sent: Tuesday, January 31, 2012 8:02 PM
> >> To: Peng, Tao
> >> Cc: Andreas Dilger; bergwolf at gmail.com; Tang, Haiying; faibish, sorin; Liu, Xuezhao;
> >> laisiyao at whamcloud.com; green at whamcloud.com; eeb at whamcloud.com; bryon at whamcloud.com; lustre-
> >> devel at lists.lustre.org
> >> Subject: Re: Lustre dcache clean up and distributed locking
> >>
> >> On 1/31/12 7:27 AM, Andreas Dilger wrote:
> >>> On 2012-01-30, at 12:46 AM,<tao.peng@emc.com>   wrote:
> >>>> Sorry for the late reply. I got your email earlier but didn't have a chance to read the document
> >>>> due to very limited internet access.
> >>>> Thank you very much for taking a detailed look at the document and giving so many valuable comments.
> >>>>
> >>>> About adding the new inode bitlock, I must admit that I missed the hardlink case. Although the lock
> >> mode can be changed to be the same as the LOOKUP lock, I agree that an inode bitlock cannot sufficiently
> >> protect every name/dentry separately for hardlinked files. And I agree, as you suggested in the document,
> >> that adding a name-entry-based lock can solve it.
> >>>> IIUC, to add a name-entry-based lock, both client and server would need to be modified, and LDLM
> >> would need to be changed to manage the new lock information. Like the new inode bitlock, the
> >> name-entry-based lock will also introduce interoperability issues as a result of the on-wire protocol
> >> change. And in a DNE environment, as Andreas explained, the original plan is to get LOOKUP locks on
> >> both the master MDT and the remote MDT. If we introduce a name-entry-based lock, the DNE case may need
> >> some changes as well.
> >>>> I will take a closer look at the approach. In the meantime, could you please confirm whether it is
> >> the right way to go? The benefit of the above complexity seems to be better distributed dentry locking
> >> under the dcache framework.
> >>>> What do others think? Andreas?
> >>> Tao, before we go down the road of implementing such a change, there would have to be clear
> >>> benefits to the performance and/or kernel integration before we could accept the interoperability
> >>> issues and added code complexity.
> >>> The benefits of having a per-name lock, as I see them, are:
> >>> - can be cancelled separately from other name locks
> >>> - can be changed independently from the inode permissions (LOOKUP)
> >>> - potential ease of integration with the Linux dcache
> >>>
> >>> The drawbacks of introducing a per-name lock:
> >>> - only useful for hard-linked files (uncommon case)
> >>> - added code complexity for a new lock type
> >>> - incompatibility with existing client/server
> >>> - additional code to handle incompatibility transparently
> >>> - lock enqueue may need extra RPC for each name lock (different resource)
> >>> - lock cancellation may need an extra RPC for each resource
> >>>
> >>> The way I see it is that the DENTRY lock would need to be a different resource for each name on a
> >>> file.  The only way this is going to be useful AFAICS is if the client enqueues each DENTRY lock on
> >>> a separate resource, so that it can be granted and cancelled independently of the LOOKUP lock (which
> >>> is on the FID and would be common for all names that reference this inode).  That would either need
> >>> an extra RPC for every filename lookup (to enqueue both the DENTRY and LOOKUP locks) or a new
> >>> LDLM_ENQUEUE RPC that can request and be granted two different locks (FID [seq, oid, ver, 0], and
> >>> FID+name hash [seq, oid, ver, hash]) in a single RPC.
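For readers following the resource-naming discussion above, the distinction can be sketched in plain C: the LOOKUP lock lives on a resource derived from the FID alone (shared by every name of the inode), while each per-name DENTRY lock would live on a resource derived from the FID plus the name hash. This is only an illustration of the [seq, oid, ver, hash] notation in the mail; the struct and helper names are hypothetical and do not match Lustre's actual ldlm_res_id API.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative resource name: fields follow the [seq, oid, ver, hash]
 * notation used in the discussion, not Lustre's real definitions. */
struct res_id {
    uint64_t seq, oid, ver, hash;
};

/* LOOKUP lock resource: the FID alone, common to all names of the inode. */
static struct res_id lookup_res(uint64_t seq, uint64_t oid, uint64_t ver)
{
    struct res_id r = { seq, oid, ver, 0 };
    return r;
}

/* Per-name DENTRY lock resource: FID plus the name hash, so each hard
 * link gets its own resource and can be cancelled independently. */
static struct res_id dentry_res(uint64_t seq, uint64_t oid, uint64_t ver,
                                uint64_t name_hash)
{
    struct res_id r = { seq, oid, ver, name_hash };
    return r;
}

static int res_eq(const struct res_id *a, const struct res_id *b)
{
    return a->seq == b->seq && a->oid == b->oid &&
           a->ver == b->ver && a->hash == b->hash;
}
```

Since two hard links to the same inode hash their names differently, their DENTRY resources differ while their LOOKUP resource is identical, which is exactly why unlinking one name need not cancel the other name's lock.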
> >>> So, the main question is how much better the client dcache/VFS integration will be with the DCACHE
> >>> lock compared to the existing mechanism with the LOOKUP (FID) lock?  Is it impossible (or very
> >>> difficult) to use the existing mechanism for the new lockless dcache code?  Or is it a matter of
> >>> this being a somewhat cleaner implementation, but not critical for moving forward?
> >>> I'm aware that our dcache code has been too complex for some time, but I haven't looked recently
> >>> to see how much of this is due to using/supporting old kernel APIs, and how much of it is due to
> >>> the use of the LOOKUP lock on the FID protecting too much of the state on the client.
> >>> Cheers, Andreas
> >> Hi Tao,
> >>
> >> Introducing either a per-name lock or a DENTRY lock is mainly about removing
> >> the Lustre-special flag "DCACHE_LUSTRE_INVALID" to simplify the Lustre code. But
> >> if that simplification causes more complex ldlm logic, we have to
> >> consider whether removing "DCACHE_LUSTRE_INVALID" is worth it. It is
> >> not impossible to get "DCACHE_LUSTRE_INVALID" accepted by the linux kernel.
> >> On the other hand, "DCACHE_LUSTRE_INVALID" can also work together with
> >> the lockless dcache in newer linux kernels.
> >>
> >> So to push the project forward, we can divide it into two parts
> >> (according to your document): Lustre dcache code cleanup, and some new
> >> ldlm logic to remove "DCACHE_LUSTRE_INVALID". Before we reach an
> >> agreement on the second part, we can start with the first part and
> >> clean up the Lustre dcache code. What do you think?
> >>
> > Thank you, Andreas and Yong. I tend to agree with you that we should avoid the complexity if there
> > are no obvious benefits. However, as we need to make sure the code is acceptable to the kernel
> > community, I will try to consolidate a design for adding the new type of lock but just not implement
> > it at this point. And if later we are asked (by kernel community developers) to do it, we just need
> > to do the implementation.
> 
> Hi Tao,
> 
> We have another way to remove the Lustre-special dentry flag
> "DCACHE_LUSTRE_INVALID". To mark a dentry that is still in the dcache
> but no longer protected by a LOOKUP lock, and not unhashed yet, we can
> add new flags in "ll_dentry_data" (pointed to by "dentry::d_fsdata").
> Since it is Lustre-private data, there is no need to worry about the new
> flags conflicting with any standard VFS "dentry::d_flags", and this may
> save some effort of implementing the complex DENTRY lock in your original design.
> 
Hi Fan Yong,

Thanks for your great suggestion. I updated the design doc with the following changes; please help review it. Thank you!

Changelog:
1. Macro cleanups are removed. I prefer to do them separately while cleaning up other obsolete macros.
2. Removed the DENTRY LDLM lock proposal. Because the DCACHE_LUSTRE_INVALID flag can be hidden from the generic VFS layer, it is no longer necessary to introduce such a complex change.
3. Added a section describing how to add an LLD_LUSTRE_INVALID flag to the ll_dentry_data structure, replacing the existing DCACHE_LUSTRE_INVALID flag.

Cheers,
Tao
> Cheers,
> Nasf
> 
> > Please stay tuned.
> >
> > Best,
> > Tao
> >
> >> Cheers,
> >> --
> >> Nasf
> >>
> >>>
> >>>>> -----Original Message-----
> >>>>> From: Fan Yong [mailto:yong.fan at whamcloud.com]
> >>>>> Sent: Saturday, January 21, 2012 1:11 AM
> >>>>> To: Peng, Tao; bergwolf at gmail.com
> >>>>> Cc: Tang, Haiying; adilger at whamcloud.com; faibish, sorin; Liu, Xuezhao; laisiyao at whamcloud.com;
> >>>>> green at whamcloud.com; eeb at whamcloud.com; Bryon Neitzel
> >>>>> Subject: Re: Lustre dcache clean up and distributed locking
> >>>>>
> >>>>> Hi Tao,
> >>>>>
> >>>>> Excellent work. I just added some comments inside your document. Please check.
> >>>>>
> >>>>>
> >>>>> Best Regards,
> >>>>> Fan Yong
> >>>>> Whamcloud, Inc.
> >>>>>
> >>>>> On 1/20/12 9:34 AM, haiying.tang at emc.com wrote:
> >>>>>> Hi Andreas,
> >>>>>>
> >>>>>> Tao is on vacation now. It might be difficult for him to check emails due to limited internet
> >>>>>> access at his hometown.
> >>>>>> For urgent issues, you folks could send email to his gmail account bergwolf at gmail.com.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Haiying
> >>>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Andreas Dilger [mailto:adilger at whamcloud.com]
> >>>>>> Sent: January 20, 2012 1:39
> >>>>>> To: Peng, Tao
> >>>>>> Cc: faibish, sorin; Tang, Haiying; Liu, Xuezhao;
> >>>>>> laisiyao at whamcloud.com; yong.fan at whamcloud.com; green at whamcloud.com;
> >>>>>> eeb at whamcloud.com
> >>>>>> Subject: Re: Lustre dcache clean up and distributed locking
> >>>>>>
> >>>>>> On 2012-01-17, at 3:21 AM, <tao.peng@emc.com> wrote:
> >>>>>>> Thanks Siyao and Oleg for answering my dcache revalidation question on lustre mailing list. I
> >>>>> updated the design to reflect it.
> >>>>>>> Please see attachment.
> >>>>>> Tao,
> >>>>>> Fan Yong is also taking a more detailed look at your document and will
> >>>>>> hopefully have a chance to reply before the New Year holidays.
> >>>>>>
> >>>>>> Also, we are just working on landing the patches to add support for
> >>>>>> Linux
> >>>>>> 2.6.38 for the Lustre client.  One of the patches relates to the
> >>>>>> lockless dcache changes that were introduced in that kernel.  If you
> >>>>>> are interested to review this patch, and become more familiar with the
> >>>>>> Lustre development process, you should visit http://review.whamcloud.com/1865 for the patch.
> >>>>>>
> >>>>>> You need to create an account in Gerrit using OpenID (Google, mostly),
> >>>>>> and an account in our bug tracking system (http://jira.whamcloud.com)
> >>>>>> if you haven't already.
> >>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Andreas Dilger [mailto:adilger at whamcloud.com]
> >>>>>>>> Sent: Tuesday, January 17, 2012 4:16 PM
> >>>>>>>> To: Peng, Tao
> >>>>>>>> Cc: faibish, sorin; Tang, Haiying; Liu, Xuezhao; Lai Siyao; Fan
> >>>>>>>> Yong; Oleg Drokin; Eric Barton
> >>>>>>>> Subject: Re: Lustre dcache clean up and distributed locking
> >>>>>>>>
> >>>>>>>> On 2012-01-16, at 9:25 PM,<tao.peng@emc.com>   wrote:
> >>>>>>>>> I finally started to work on Lustre dcache cleanup and locking.
> >>>>>>>>> After reading Lustre, ocfs2, and VFS
> >>>>>>>> dcache-related code, I came to a design for cleaning up the Lustre
> >>>>>>>> dcache code and doing distributed locking under the dcache. For
> >>>>>>>> distributed locking, the main idea is to add a new inode bitlock,
> >>>>>>>> the DENTRY lock, to protect just the valid dentry, instead of letting
> >>>>>>>> the LOOKUP lock serve multiple purposes, which makes the client unable
> >>>>>>>> to know whether a file has been deleted when the server cancels the LOOKUP lock. Instead, the
> >>>>>>>> client holds a PR-mode DENTRY lock on all valid dentries, and when the server cancels it, the
> >>>>>>>> client knows the file is deleted.
> >>>>>>>>> For details, please see the attachments (I attached both word and
> >>>>>>>>> pdf versions as I am not sure which is more convenient for you :).
> >>>>>>>> And please help to review and comment. Thank you!
> >>>>>>>>
> >>>>>>>> Hi Tao,
> >>>>>>>> I'm passing this on to the engineers that are most familiar with the
> >>>>>>>> DLM and client dcache code in Lustre.  After a quick first read
> >>>>>>>> through your document, your investigation of the client code is very
> >>>>>>>> insightful and looks like it will be able to remove some of the
> >>>>>>>> significant complexity that has grown over time in the llite code.
> >>>>>>>> Cheers, Andreas
> >>>>>>>> --
> >>>>>>>> Andreas Dilger                       Whamcloud, Inc.
> >>>>>>>> Principal Engineer                   http://www.whamcloud.com/
> >>>>>>>>
> >>>>>>>>
> >>>>>>> <dcache_distributed_locking-v2.docx>
> >>>>>> Cheers, Andreas
> >>>>>> --
> >>>>>> Andreas Dilger                       Whamcloud, Inc.
> >>>>>> Principal Engineer                   http://www.whamcloud.com/
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>> Cheers, Andreas
> >>> --
> >>> Andreas Dilger                       Whamcloud, Inc.
> >>> Principal Engineer                   http://www.whamcloud.com/
> >>>
> >>>
> >>>
> >>>
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: dcache_cleanup_v3.docx
Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
Size: 24186 bytes
Desc: dcache_cleanup_v3.docx
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20120209/639046f5/attachment.bin>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Lustre-devel] Lustre dcache clean up and distributed locking
  2012-02-03 16:17                   ` Fan Yong
  2012-02-05 13:50                     ` Peng Tao
  2012-02-09  7:15                     ` tao.peng at emc.com
@ 2012-02-09  7:35                     ` tao.peng at emc.com
  2 siblings, 0 replies; 16+ messages in thread
From: tao.peng at emc.com @ 2012-02-09  7:35 UTC (permalink / raw)
  To: lustre-devel

> -----Original Message-----
> From: Peng, Tao
> Sent: Thursday, February 09, 2012 3:16 PM
> To: 'Fan Yong'
> Cc: adilger at whamcloud.com; bergwolf at gmail.com; Tang, Haiying; faibish, sorin; Liu, Xuezhao;
> laisiyao at whamcloud.com; green at whamcloud.com; eeb at whamcloud.com; bryon at whamcloud.com; lustre-
> devel at lists.lustre.org
> Subject: RE: Lustre dcache clean up and distributed locking
> 
> 
> > -----Original Message-----
> > From: Fan Yong [mailto:yong.fan at whamcloud.com]
> > Sent: Saturday, February 04, 2012 12:18 AM
> > To: Peng, Tao
> > Cc: adilger at whamcloud.com; bergwolf at gmail.com; Tang, Haiying; faibish,
> > sorin; Liu, Xuezhao; laisiyao at whamcloud.com; green at whamcloud.com;
> > eeb at whamcloud.com; bryon at whamcloud.com; lustre- devel at lists.lustre.org
> > Subject: Re: Lustre dcache clean up and distributed locking
> >
> > On 2/3/12 3:37 PM, tao.peng at emc.com wrote:
> > >> -----Original Message-----
> > >> From: Fan Yong [mailto:yong.fan at whamcloud.com]
> > >> Sent: Tuesday, January 31, 2012 8:02 PM
> > >> To: Peng, Tao
> > >> Cc: Andreas Dilger; bergwolf at gmail.com; Tang, Haiying; faibish,
> > >> sorin; Liu, Xuezhao; laisiyao at whamcloud.com; green at whamcloud.com;
> > >> eeb at whamcloud.com; bryon at whamcloud.com; lustre-
> > >> devel at lists.lustre.org
> > >> Subject: Re: Lustre dcache clean up and distributed locking
> > >>
> > >> On 1/31/12 7:27 AM, Andreas Dilger wrote:
> > >>> On 2012-01-30, at 12:46 AM,<tao.peng@emc.com>   wrote:
> > >>>> Sorry for the late reply. I got your email earlier but didn't
> > >>>> have a chance to read the document due to very limited internet access.
> > >>>> Thank you very much for taking a detailed look at the document and
> > >>>> giving so many valuable comments.
> > >>>>
> > >>>> About adding the new inode bitlock, I must admit that I missed
> > >>>> the hardlink case. Although the lock mode
> > >> can be changed to be the same as the LOOKUP lock, I agree that an
> > >> inode bitlock cannot sufficiently protect
> > >> every name/dentry separately for hardlinked files. And I agree, as
> > >> you suggested in the document, that adding a name-entry-based lock can solve it.
> > >>>> IIUC, to add a name-entry-based lock, both client and server
> > >>>> would need to be modified, and LDLM would need to
> > >> be changed to manage the new lock information. Like the new inode
> > >> bitlock, the name-entry-based lock will
> > >> also introduce interoperability issues as a result of the on-wire
> > >> protocol change. And in a DNE environment, as Andreas explained, the
> > >> original plan is to get LOOKUP locks on both the master MDT and the
> > >> remote MDT. If we introduce a name-entry-based lock, the DNE case may need some changes as well.
> > >>>> I will take a closer look at the approach. In the meantime,
> > >>>> could you please confirm whether it is
> > >> the right way to go? The benefit of the above complexity seems to be
> > >> better distributed dentry locking
> > >> under the dcache framework.
> > >>>> What do others think? Andreas?
> > >>> Tao, before we go down the road of implementing such a change,
> > >>> there would have to be clear benefits
> > >> to the performance and/or kernel integration before we could accept
> > >> the interoperability issues and added code complexity.
> > >>> The benefits of having a per-name lock, as I see them, are:
> > >>> - can be cancelled separately from other name locks
> > >>> - can be changed independently from the inode permissions (LOOKUP)
> > >>> - potential ease of integration with the Linux dcache
> > >>>
> > >>> The drawbacks of introducing a per-name lock:
> > >>> - only useful for hard-linked files (uncommon case)
> > >>> - added code complexity for a new lock type
> > >>> - incompatibility with existing client/server
> > >>> - additional code to handle incompatibility transparently
> > >>> - lock enqueue may need extra RPC for each name lock (different
> > >>> resource)
> > >>> - lock cancellation may need an extra RPC for each resource
> > >>>
> > >>> The way I see it is that the DENTRY lock would need to be a
> > >>> different resource for each name on a
> > >> file.  The only way this is going to be useful AFAICS is if the
> > >> client enqueues each DENTRY lock on a
> > >> separate resource, so that it can be granted and cancelled
> > >> independently of the LOOKUP lock (which is
> > >> on the FID and would be common for all names that reference this
> > >> inode).  That would either need an extra RPC for every filename
> > >> lookup (to enqueue both the DENTRY and LOOKUP locks) or a new
> > >> LDLM_ENQUEUE RPC that can request and be granted two different
> > >> locks (FID [seq, oid, ver, 0], and
> > >> FID+name hash [seq, oid, ver, hash]) in a single RPC.
> > >>> So, the main question is how much better the client dcache/VFS
> > >>> integration will be with the DCACHE
> > >> lock compared to the existing mechanism with the LOOKUP (FID) lock?
> > >> Is it impossible (or very
> > >> difficult) to use the existing mechanism for the new lockless
> > >> dcache code?  Or is it a matter of
> > >> this being a somewhat cleaner implementation, but not critical for moving forward?
> > >>> I'm aware that our dcache code has been too complex for some time,
> > >>> but I haven't looked recently
> > >> to see how much of this is due to using/supporting old kernel APIs,
> > >> and how much of it is due to the
> > >> use of the LOOKUP lock on the FID protecting too much of the state on the client.
> > >>> Cheers, Andreas
> > >> Hi Tao,
> > >>
> > >> Introducing either a per-name lock or a DENTRY lock is mainly about
> > >> removing the Lustre-special flag "DCACHE_LUSTRE_INVALID" to simplify
> > >> the Lustre code. But if that simplification causes more complex ldlm
> > >> logic, we have to consider whether removing
> > >> "DCACHE_LUSTRE_INVALID" is worth it. It is not impossible to get
> > >> "DCACHE_LUSTRE_INVALID" accepted by the linux kernel.
> > >> On the other hand, "DCACHE_LUSTRE_INVALID" can also work together with
> > >> the lockless dcache in newer linux kernels.
> > >>
> > >> So to push the project forward, we can divide it into two parts
> > >> (according to your document): Lustre dcache code cleanup, and some
> > >> new ldlm logic to remove "DCACHE_LUSTRE_INVALID". Before we reach
> > >> an agreement on the second part, we can start with the first part
> > >> and clean up the Lustre dcache code. What do you think?
> > >>
> > > Thank you, Andreas and Yong. I tend to agree with you that we should
> > > avoid the complexity if there
> > > are no obvious benefits. However, as we need to make sure the code is
> > > acceptable to the kernel community, I will try to consolidate a design
> > > for adding the new type of lock but just not implement it at this point.
> > > And if later we are asked (by kernel community developers) to do it, we
> > > just need to do the implementation.
> >
> > Hi Tao,
> >
> > We have another way to remove the Lustre-special dentry flag
> > "DCACHE_LUSTRE_INVALID". To mark a dentry that is still in the dcache
> > but no longer protected by a LOOKUP lock, and not unhashed yet, we can
> > add new flags in "ll_dentry_data" (pointed to by "dentry::d_fsdata").
> > Since it is Lustre-private data, there is no need to worry about the
> > new flags conflicting with any standard VFS "dentry::d_flags", and this
> > may save some effort of implementing the complex DENTRY lock in your original design.
> >
> Hi Fan Yong,
> 
> Thanks for your great suggestion. I updated the design doc with the following changes; please help
> review it. Thank you!
> 
> Changelog:
> 1. Macro cleanups are removed. I prefer to do them separately while cleaning up other obsolete macros.
> 2. Removed the DENTRY LDLM lock proposal. Because the DCACHE_LUSTRE_INVALID flag can be hidden from the
> generic VFS layer, it is no longer necessary to introduce such a complex change.
> 3. Added a section describing how to add an LLD_LUSTRE_INVALID flag to the ll_dentry_data structure,
> replacing the existing DCACHE_LUSTRE_INVALID flag.
> 
BTW, I notice that there are some conflicts with Siyao's latest RCU walk patch proposal (http://review.whamcloud.com/#change,1865). The main conflict seems to be about how to handle parent UPDATE lock cancellation. I will update the doc to resolve the conflict once Siyao's new patchset is out.

Thanks,
Tao
> Cheers,
> Tao
> > Cheers,
> > Nasf
> >
> > > Please stay tuned.
> > >
> > > Best,
> > > Tao
> > >
> > >> Cheers,
> > >> --
> > >> Nasf
> > >>
> > >>>
> > >>>>> -----Original Message-----
> > >>>>> From: Fan Yong [mailto:yong.fan at whamcloud.com]
> > >>>>> Sent: Saturday, January 21, 2012 1:11 AM
> > >>>>> To: Peng, Tao; bergwolf at gmail.com
> > >>>>> Cc: Tang, Haiying; adilger at whamcloud.com; faibish, sorin; Liu,
> > >>>>> Xuezhao; laisiyao at whamcloud.com; green at whamcloud.com;
> > >>>>> eeb at whamcloud.com; Bryon Neitzel
> > >>>>> Subject: Re: Lustre dcache clean up and distributed locking
> > >>>>>
> > >>>>> Hi Tao,
> > >>>>>
> > >>>>> Excellent work. I just added some comments inside your document. Please check.
> > >>>>>
> > >>>>>
> > >>>>> Best Regards,
> > >>>>> Fan Yong
> > >>>>> Whamcloud, Inc.
> > >>>>>
> > >>>>> On 1/20/12 9:34 AM, haiying.tang at emc.com wrote:
> > >>>>>> Hi Andreas,
> > >>>>>>
> > >>>>>> Tao is on vacation now. It might be difficult for him to check
> > >>>>>> emails due to limited internet
> > >> access
> > >>>>> at hometown.
> > >>>>>> For urgent issue, you folks could send email to his gmail account bergwolf at gmail.com.
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> Haiying
> > >>>>>>
> > >>>>>> -----Original Message-----
> > >>>>>> From: Andreas Dilger [mailto:adilger at whamcloud.com]
> > >>>>>> Sent: January 20, 2012 1:39
> > >>>>>> To: Peng, Tao
> > >>>>>> Cc: faibish, sorin; Tang, Haiying; Liu, Xuezhao;
> > >>>>>> laisiyao at whamcloud.com; yong.fan at whamcloud.com;
> > >>>>>> green at whamcloud.com; eeb at whamcloud.com
> > >>>>>> Subject: Re: Lustre dcache clean up and distributed locking
> > >>>>>>
> > >>>>>> On 2012-01-17, at 3:21 AM,<tao.peng@emc.com>   <tao.peng@emc.com>   wrote:
> > >>>>>>> Thanks Siyao and Oleg for answering my dcache revalidation
> > >>>>>>> question on lustre mailing list. I
> > >>>>> updated the design to reflect it.
> > >>>>>>> Please see attachment.
> > >>>>>> Tao,
> > >>>>>> Fan Yong is also taking a more detailed look at your document
> > >>>>>> and will hopefully have a chance to reply before the New Year holidays.
> > >>>>>>
> > >>>>>> Also, we are just working on landing the patches to add support
> > >>>>>> for Linux
> > >>>>>> 2.6.38 for the Lustre client.  One of the patches relates to
> > >>>>>> the lockless dcache changes that were introduced in that
> > >>>>>> kernel.  If you are interested to review this patch, and become
> > >>>>>> more familiar with the Lustre development process, you should visit
> http://review.whamcloud.com/1865 for the patch.
> > >>>>>>
> > >>>>>> You need to create an account in Gerrit using OpenID (Google,
> > >>>>>> mostly), and an account in our bug tracking system
> > >>>>>> (http://jira.whamcloud.com) if you haven't already.
> > >>>>>>
> > >>>>>>>> -----Original Message-----
> > >>>>>>>> From: Andreas Dilger [mailto:adilger at whamcloud.com]
> > >>>>>>>> Sent: Tuesday, January 17, 2012 4:16 PM
> > >>>>>>>> To: Peng, Tao
> > >>>>>>>> Cc: faibish, sorin; Tang, Haiying; Liu, Xuezhao; Lai Siyao;
> > >>>>>>>> Fan Yong; Oleg Drokin; Eric Barton
> > >>>>>>>> Subject: Re: Lustre dcache clean up and distributed locking
> > >>>>>>>>
> > >>>>>>>> On 2012-01-16, at 9:25 PM,<tao.peng@emc.com>   wrote:
> > >>>>>>>>> I finally started to work on Lustre dcache cleanup and locking.
> > >>>>>>>>> After reading Lustre, ocfs2 and VFS
> > >>>>>>>> dcache related code, I came to a design for cleaning up
> > >>>>>>>> Lustre dcache code and doing distributed locking under
> > >>>>>>>> dcache. For distributed locking, the main idea is to add a
> > >>>>>>>> new inode bitlock, a DENTRY lock, to protect just the valid dentry,
> > >>>>>>>> instead of letting the LOOKUP lock serve multiple purposes, which
> > >>>>>>>> leaves the client unable to know whether the file is deleted
> > >>>>>>>> when the server cancels the LOOKUP lock. Instead, the client holds a PR
> > >>>>>>>> mode
> > >>>>> DENTRY lock on all valid dentries and when the server cancels it, the client knows the file is deleted.
> > >>>>>>>>> For details, please see the attachments (I attached both
> > >>>>>>>>> word and pdf versions as I am not sure if
> > >>>>>>>> which is more convenient for you:). And please help to review and comment. Thank you!
> > >>>>>>>>
> > >>>>>>>> Hi Tao,
> > >>>>>>>> I'm passing this on to the engineers that are most familiar
> > >>>>>>>> with the DLM and client dcache code in Lustre.  After a quick
> > >>>>>>>> first read through your document, your investigation of the
> > >>>>>>>> client code is very insightful and looks like it will be able
> > >>>>>>>> to remove some of the significant complexity that
> > has
> > >>>>> grown over time in the llite code.
> > >>>>>>>> Cheers, Andreas
> > >>>>>>>> --
> > >>>>>>>> Andreas Dilger                       Whamcloud, Inc.
> > >>>>>>>> Principal Engineer                   http://www.whamcloud.com/
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>> <dcache_distributed_locking-v2.docx>
> > >>>>>> Cheers, Andreas
> > >>>>>> --
> > >>>>>> Andreas Dilger                       Whamcloud, Inc.
> > >>>>>> Principal Engineer                   http://www.whamcloud.com/
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>> Cheers, Andreas
> > >>> --
> > >>> Andreas Dilger                       Whamcloud, Inc.
> > >>> Principal Engineer                   http://www.whamcloud.com/
> > >>>
> > >>>
> > >>>
> > >>>
> >


* [Lustre-devel] Lustre dcache clean up and distributed locking
  2012-02-09  7:15                     ` tao.peng at emc.com
@ 2012-02-10 20:37                       ` Andreas Dilger
  2012-02-11 13:40                         ` Peng Tao
  2012-02-11 15:58                       ` Yong Fan
  1 sibling, 1 reply; 16+ messages in thread
From: Andreas Dilger @ 2012-02-10 20:37 UTC (permalink / raw)
  To: lustre-devel

On 2012-02-09, at 12:15 AM, <tao.peng@emc.com> <tao.peng@emc.com> wrote:
> Fan Yong <yong.fan@whamcloud.com> wrote:
>> We have another way to remove the Lustre-specific dentry flag
>> "DCACHE_LUSTRE_INVALID". To indicate a dentry that is in the dcache
>> without LOOKUP lock protection but not yet unhashed, we can add a new
>> flag in "ll_dentry_data" (pointed to by "dentry::d_fsdata"). Since it is
>> Lustre private data, there is no need to worry about whether the new flag
>> will conflict with any standard VFS "dentry::d_flags", and it may save
>> some effort in implementing the complex DENTRY lock in your original design.
> 
> Hi Fan Yong,
> 
> Thanks for your great suggestion. I updated the design doc with the following changes; please help review. Thank you!
> 
> Changelog:
> 1. Macro cleanups are removed. I prefer to do it separately while cleaning up other obsolete macros.
> 2. Remove DENTRY LDLM lock proposal. Because DCACHE_LUSTRE_INVALID flag can be hidden from generic VFS layer, it is no longer necessary to introduce such complex change.
> 3. Add a section to describe how to add LLD_LUSTRE_INVALID flag to ll_dentry_data structure to replace existing DCACHE_LUSTRE_INVALID flag.

Hi Tao,
these cleanups look quite reasonable now.  Will you be working on making
these changes for Lustre?  The feature freeze for the Lustre 2.2 release
was Jan 7, so any non-bugfix changes will need to land for Lustre 2.3.

However, that doesn't prevent you from working on these patches and sending
them to Gerrit for inspection, testing, and getting them ready for landing.
You should base your patch on top of the most recent dcache patch in

http://review.whamcloud.com/1865

If you haven't looked already, please read the following wiki pages:

http://wiki.whamcloud.com/display/PUB/Submitting+Changes
http://wiki.whamcloud.com/display/PUB/Using+Gerrit

for details on how to submit patches to Gerrit for review/testing/landing.



Cheers, Andreas
--
Andreas Dilger                       Whamcloud, Inc.
Principal Lustre Engineer            http://www.whamcloud.com/


* [Lustre-devel] Lustre dcache clean up and distributed locking
  2012-02-10 20:37                       ` Andreas Dilger
@ 2012-02-11 13:40                         ` Peng Tao
  0 siblings, 0 replies; 16+ messages in thread
From: Peng Tao @ 2012-02-11 13:40 UTC (permalink / raw)
  To: lustre-devel

On Sat, Feb 11, 2012 at 4:37 AM, Andreas Dilger <adilger@whamcloud.com> wrote:
> On 2012-02-09, at 12:15 AM, <tao.peng@emc.com> <tao.peng@emc.com> wrote:
>> Fan Yong <yong.fan@whamcloud.com> wrote:
>>> We have another way to remove the Lustre-specific dentry flag
>>> "DCACHE_LUSTRE_INVALID". To indicate a dentry that is in the dcache
>>> without LOOKUP lock protection but not yet unhashed, we can add a new
>>> flag in "ll_dentry_data" (pointed to by "dentry::d_fsdata"). Since it is
>>> Lustre private data, there is no need to worry about whether the new flag
>>> will conflict with any standard VFS "dentry::d_flags", and it may save
>>> some effort in implementing the complex DENTRY lock in your original design.
>>
>> Hi Fan Yong,
>>
>> Thanks for your great suggestion. I updated the design doc with the following changes; please help review. Thank you!
>>
>> Changelog:
>> 1. Macro cleanups are removed. I prefer to do it separately while cleaning up other obsolete macros.
>> 2. Remove DENTRY LDLM lock proposal. Because DCACHE_LUSTRE_INVALID flag can be hidden from generic VFS layer, it is no longer necessary to introduce such complex change.
>> 3. Add a section to describe how to add LLD_LUSTRE_INVALID flag to ll_dentry_data structure to replace existing DCACHE_LUSTRE_INVALID flag.
>
> Hi Tao,
Hi Andreas,

> these cleanups look quite reasonable now.  Will you be working on making
> these changes for Lustre?  The feature freeze for the Lustre 2.2 release
> was Jan 7, so any non-bugfix changes will need to land for Lustre 2.3.
>
Yes, it is on my todo list. Our plan is for me to first work on
removing the outdated configure macros and related code that are there
to support older (older than 2.6.18-latest) kernels. Then I will work
on the dcache cleanups based on top of the most recent dcache RCU walk
patch.

> However, that doesn't prevent you from working on these patches and sending
> them to Gerrit for inspection, testing, and getting them ready for landing.
> You should base your patch on top of the most recent dcache patch in
>
> http://review.whamcloud.com/1865
>
> If you haven't looked already, please read the following wiki pages:
>
> http://wiki.whamcloud.com/display/PUB/Submitting+Changes
> http://wiki.whamcloud.com/display/PUB/Using+Gerrit
>
> for details on how to submit patches to Gerrit for review/testing/landing.
>
Thanks for sharing the links. I will make sure all patches follow the
guidelines.

Best,
Tao
>
>
> Cheers, Andreas
> --
> Andreas Dilger                       Whamcloud, Inc.
> Principal Lustre Engineer            http://www.whamcloud.com/
>
>
>
>


* [Lustre-devel] Lustre dcache clean up and distributed locking
  2012-02-09  7:15                     ` tao.peng at emc.com
  2012-02-10 20:37                       ` Andreas Dilger
@ 2012-02-11 15:58                       ` Yong Fan
  2012-02-12  3:46                         ` Peng Tao
  1 sibling, 1 reply; 16+ messages in thread
From: Yong Fan @ 2012-02-11 15:58 UTC (permalink / raw)
  To: lustre-devel

Hi Tao,

The new design looks OK to me.
BTW, you have removed the section about "ll_lookup_lock" from the first
version of the document. So you will not change the related logic, right? I think
this part should be fixed in new kernels.

Cheers,
Nasf

On Thu, Feb 9, 2012 at 3:15 PM, <tao.peng@emc.com> wrote:

>
> > -----Original Message-----
> > From: Fan Yong [mailto:yong.fan at whamcloud.com]
> > Sent: Saturday, February 04, 2012 12:18 AM
> > To: Peng, Tao
> > Cc: adilger at whamcloud.com; bergwolf at gmail.com; Tang, Haiying; faibish,
> sorin; Liu, Xuezhao;
> > laisiyao at whamcloud.com; green at whamcloud.com; eeb at whamcloud.com;
> bryon at whamcloud.com; lustre-
> > devel at lists.lustre.org
> > Subject: Re: Lustre dcache clean up and distributed locking
> >
> > On 2/3/12 3:37 PM, tao.peng at emc.com wrote:
> > >> -----Original Message-----
> > >> From: Fan Yong [mailto:yong.fan at whamcloud.com]
> > >> Sent: Tuesday, January 31, 2012 8:02 PM
> > >> To: Peng, Tao
> > >> Cc: Andreas Dilger; bergwolf at gmail.com; Tang, Haiying; faibish,
> sorin; Liu, Xuezhao;
> > >> laisiyao at whamcloud.com; green at whamcloud.com; eeb at whamcloud.com;
> bryon at whamcloud.com; lustre-
> > >> devel at lists.lustre.org
> > >> Subject: Re: Lustre dcache clean up and distributed locking
> > >>
> > >> On 1/31/12 7:27 AM, Andreas Dilger wrote:
> > >>> On 2012-01-30, at 12:46 AM,<tao.peng@emc.com>   wrote:
> > >>>> Sorry for the late reply. I got your email earlier but didn't have
> chance to read the document
> > due
> > >> to very limited internet access.
> > >>>> Thank you very much for taking detailed look at the document and
> giving so many valuable comments.
> > >>>>
> > >>>> About adding new inode bitlock, I must admit that I missed hardlink
> case. Although the lock mode
> > >> can be changed to be the same as LOOKUP lock, I agree that inode
> bitlock cannot sufficiently
> > protect
> > >> every name/dentry separately for hardlinked files. And I agree, as
> you suggested in the document,
> > >> adding a name entry based lock can solve it.
> > >>>> IIUC, to add a name entry based lock, both client and server should
> be modified, and LDLM needs to
> > >> be changed to manage the new lock information. Like new inode
> bitlock, the name entry based lock
> > will
> > >> as well introduce interoperability issues as a result of on wire
> protocol change. And in DNE
> > >> environment as Andreas explained, the original plan is to get LOOKUP
> on both master MDT and remote
> > MDT.
> > >> If we introduce name entry based lock, DNE case may need some change
> as well.
> > >>>> I will take a closer look at the approach. In the meanwhile, could
> you please confirm if it is
> > the
> > >> right way to go? The benefit of above complexity seems to be better
> distributed dentry locking
> > under
> > >> dcache framework.
> > >>>> What do others think? Andreas?
> > >>> Tao, before we go down the road of implementing such a change, there
> would have to be clear
> > benefits
> > >> to the performance and/or kernel integration before we could accept
> the interoperability issues and
> > >> added code complexity.
> > >>> The benefits of having a per-name lock, as I see them, are:
> > >>> - can be cancelled separately from other name locks
> > >>> - can be changed independently from the inode permissions (LOOKUP)
> > >>> - potential ease of integration with the Linux dcache
> > >>>
> > >>> The drawbacks of introducing a per-name lock:
> > >>> - only useful for hard-linked files (uncommon case)
> > >>> - added code complexity for a new lock type
> > >>> - incompatibility with existing client/server
> > >>> - additional code to handle incompatibility transparently
> > >>> - lock enqueue may need extra RPC for each name lock (different
> resource)
> > >>> - lock cancellation may need an extra RPC for each resource
> > >>>
> > >>> The way I see it is that the DENTRY lock would need to be a
> different resource for each name on a
> > >> file.  The only way this is going to be useful AFAICS is if the
> client enqueues each DENTRY lock on
> > a
> > >> separate resource, so that it can be granted and cancelled
> independently of the LOOKUP lock (which
> > is
> > >> on the FID and would be common for all names that reference this
> inode).  That would either need an
> > >> extra RPC for every filename lookup (to enqueue both the DENTRY and
> LOOKUP locks) or a new
> > >> LDLM_ENQUEUE RPC that can request and be granted two different locks
> (FID [seq, oid, ver, 0], and
> > >> FID+name hash [seq, oid, ver, hash]) in a single RPC.
> > >>> So, the main question is how much better the client dcache/VFS
> integration will be with the DCACHE
> > >> lock compared to the existing mechanism with the LOOKUP (FID) lock?
>  Is it impossible (or very
> > >> difficult) to use the existing mechanism for the new lockless dcache
> code?  Or is it a matter of
> > this
> > >> being a somewhat cleaner implementation, but not critical for moving
> forward?
> > >>> I'm aware that our dcache code has been too complex for some time,
> but I haven't looked recently
> > to
> > >> see how much of this is due to using/supporting old kernel APIs, and
> how much of it is due to the
> > use
> > >> of the LOOKUP lock on the FID protecting too much of the state on the
> client.
> > >>> Cheers, Andreas
> > >> Hi Tao,
> > >>
> > >> To introduce either per-name lock or DENTRY lock is mainly for
> removing
> > >> the Lustre-specific flag "DCACHE_LUSTRE_INVALID" to simplify Lustre code. But
> > >> if such simplification causes more complex ldlm logic, we have to
> > >> consider whether it is worth removing "DCACHE_LUSTRE_INVALID". It is
> > >> not impossible to get "DCACHE_LUSTRE_INVALID" accepted by the Linux
> kernel.
> > >> On the other hand, "DCACHE_LUSTRE_INVALID" can work together with
> > >> lockless dcache in new linux kernel also.
> > >>
> > >> So to push the project forward, we can divide it into two parts
> > >> (according to your document): Lustre dcache code cleanup, and some new
> > >> ldlm logic to remove "DCACHE_LUSTRE_INVALID". Before we reach an
> > >> agreement on the second part, we can start from the first part to
> > >> clean up the Lustre dcache code. What do you think?
> > >>
> > > Thank you, Andreas and Yong. I tend to agree with you that we should
> avoid the complexity if there
> > are no obvious benefits. However, as we need to make sure the code is
> acceptable to the kernel community, I
> > will try to consolidate a design for adding the new type of lock but
> just not implement it at this
> > point. And if later we are asked (by kernel community developers) to do
> it, we just need to do the
> > implementation.
> >
> > Hi Tao,
> >
> > We have another way to remove the Lustre-specific dentry flag
> > "DCACHE_LUSTRE_INVALID". To indicate a dentry that is in the dcache without LOOKUP
> > lock protection but not yet unhashed, we can add a new flag in
> > "ll_dentry_data" (pointed to by "dentry::d_fsdata"). Since it is Lustre
> > private data, there is no need to worry about whether the new flag will conflict
> > with any standard VFS "dentry::d_flags", and it may save
> > some effort in implementing the complex DENTRY lock in your original design.
> >
> Hi Fan Yong,
>
> Thanks for your great suggestion. I updated the design doc with the following
> changes; please help review. Thank you!
>
> Changelog:
> 1. Macro cleanups are removed. I prefer to do it separately while cleaning
> up other obsolete macros.
> 2. Remove DENTRY LDLM lock proposal. Because DCACHE_LUSTRE_INVALID flag
> can be hidden from generic VFS layer, it is no longer necessary to
> introduce such complex change.
> 3. Add a section to describe how to add LLD_LUSTRE_INVALID flag to
> ll_dentry_data structure to replace existing DCACHE_LUSTRE_INVALID flag.
>
> Cheers,
> Tao
> > Cheers,
> > Nasf
> >
> > > Please stay tuned.
> > >
> > > Best,
> > > Tao
> > >
> > >> Cheers,
> > >> --
> > >> Nasf
> > >>
> > >>>
> > >>>>> -----Original Message-----
> > >>>>> From: Fan Yong [mailto:yong.fan at whamcloud.com]
> > >>>>> Sent: Saturday, January 21, 2012 1:11 AM
> > >>>>> To: Peng, Tao; bergwolf at gmail.com
> > >>>>> Cc: Tang, Haiying; adilger at whamcloud.com; faibish, sorin; Liu,
> Xuezhao; laisiyao at whamcloud.com;
> > >>>>> green at whamcloud.com; eeb at whamcloud.com; Bryon Neitzel
> > >>>>> Subject: Re: Lustre dcache clean up and distributed locking
> > >>>>>
> > >>>>> Hi Tao,
> > >>>>>
> > >>>>> Excellent work. I just added some comments inside your document.
> Please check.
> > >>>>>
> > >>>>>
> > >>>>> Best Regards,
> > >>>>> Fan Yong
> > >>>>> Whamcloud, Inc.
> > >>>>>
> > >>>>> On 1/20/12 9:34 AM, haiying.tang at emc.com wrote:
> > >>>>>> Hi Andreas,
> > >>>>>>
> > >>>>>> Tao is on vacation now. It might be difficult for him to check
> emails due to limited internet
> > >> access
> > >>>>> at hometown.
> > >>>>>> For urgent issue, you folks could send email to his gmail account
> bergwolf at gmail.com.
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> Haiying
> > >>>>>>
> > >>>>>> -----Original Message-----
> > >>>>>> From: Andreas Dilger [mailto:adilger at whamcloud.com]
> > >>>>>> Sent: January 20, 2012 1:39
> > >>>>>> To: Peng, Tao
> > >>>>>> Cc: faibish, sorin; Tang, Haiying; Liu, Xuezhao;
> > >>>>>> laisiyao at whamcloud.com; yong.fan at whamcloud.com;
> green at whamcloud.com;
> > >>>>>> eeb at whamcloud.com
> > >>>>>> Subject: Re: Lustre dcache clean up and distributed locking
> > >>>>>>
> > >>>>>> On 2012-01-17, at 3:21 AM,<tao.peng@emc.com>   <tao.peng@emc.com>
>   wrote:
> > >>>>>>> Thanks Siyao and Oleg for answering my dcache revalidation
> question on lustre mailing list. I
> > >>>>> updated the design to reflect it.
> > >>>>>>> Please see attachment.
> > >>>>>> Tao,
> > >>>>>> Fan Yong is also taking a more detailed look at your document and
> will
> > >>>>>> hopefully have a chance to reply before the New Year holidays.
> > >>>>>>
> > >>>>>> Also, we are just working on landing the patches to add support
> for
> > >>>>>> Linux
> > >>>>>> 2.6.38 for the Lustre client.  One of the patches relates to the
> > >>>>>> lockless dcache changes that were introduced in that kernel.  If
> you
> > >>>>>> are interested to review this patch, and become more familiar
> with the
> > >>>>>> Lustre development process, you should visit
> http://review.whamcloud.com/1865 for the patch.
> > >>>>>>
> > >>>>>> You need to create an account in Gerrit using OpenID (Google,
> mostly),
> > >>>>>> and an account in our bug tracking system (
> http://jira.whamcloud.com)
> > >>>>>> if you haven't already.
> > >>>>>>
> > >>>>>>>> -----Original Message-----
> > >>>>>>>> From: Andreas Dilger [mailto:adilger at whamcloud.com]
> > >>>>>>>> Sent: Tuesday, January 17, 2012 4:16 PM
> > >>>>>>>> To: Peng, Tao
> > >>>>>>>> Cc: faibish, sorin; Tang, Haiying; Liu, Xuezhao; Lai Siyao; Fan
> > >>>>>>>> Yong; Oleg Drokin; Eric Barton
> > >>>>>>>> Subject: Re: Lustre dcache clean up and distributed locking
> > >>>>>>>>
> > >>>>>>>> On 2012-01-16, at 9:25 PM,<tao.peng@emc.com>   wrote:
> > >>>>>>>>> I finally started to work on Lustre dcache cleanup and locking.
> > >>>>>>>>> After reading Lustre, ocfs2 and VFS
> > >>>>>>>> dcache related code, I came to a design for cleaning up Lustre
> > >>>>>>>> dcache code and doing distributed locking under dcache. For
> > >>>>>>>> distributed locking, the main idea is to add a new inode bitlock
> > >>>>>>>> DENTRY lock to just protect valid dentry, instead of letting
> LOOKUP
> > >>>>>>>> lock handle multiple purposes, which leaves the client unable to know
> > >>>>>>>> whether the file is deleted when the server cancels the LOOKUP lock.
> Instead, the client holds a PR mode
> > >>>>> DENTRY lock on all valid dentries and when the server cancels it,
> client knows the file is deleted.
> > >>>>>>>>> For details, please see the attachments (I attached both word
> and
> > >>>>>>>>> pdf versions as I am not sure if
> > >>>>>>>> which is more convenient for you:). And please help to review
> and comment. Thank you!
> > >>>>>>>>
> > >>>>>>>> Hi Tao,
> > >>>>>>>> I'm passing this on to the engineers that are most familiar
> with the
> > >>>>>>>> DLM and client dcache code in Lustre.  After a quick first read
> > >>>>>>>> through your document, your investigation of the client code is
> very
> > >>>>>>>> insightful and looks like it will be able to remove some of the
> significant complexity that
> > has
> > >>>>> grown over time in the llite code.
> > >>>>>>>> Cheers, Andreas
> > >>>>>>>> --
> > >>>>>>>> Andreas Dilger                       Whamcloud, Inc.
> > >>>>>>>> Principal Engineer                   http://www.whamcloud.com/
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>> <dcache_distributed_locking-v2.docx>
> > >>>>>> Cheers, Andreas
> > >>>>>> --
> > >>>>>> Andreas Dilger                       Whamcloud, Inc.
> > >>>>>> Principal Engineer                   http://www.whamcloud.com/
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>> Cheers, Andreas
> > >>> --
> > >>> Andreas Dilger                       Whamcloud, Inc.
> > >>> Principal Engineer                   http://www.whamcloud.com/
> > >>>
> > >>>
> > >>>
> > >>>
> >
>
>


* [Lustre-devel] Lustre dcache clean up and distributed locking
  2012-02-11 15:58                       ` Yong Fan
@ 2012-02-12  3:46                         ` Peng Tao
  2012-02-23  3:13                           ` tao.peng at emc.com
  0 siblings, 1 reply; 16+ messages in thread
From: Peng Tao @ 2012-02-12  3:46 UTC (permalink / raw)
  To: lustre-devel

Hi Yong,

On Sat, Feb 11, 2012 at 11:58 PM, Yong Fan <yong.fan@whamcloud.com> wrote:
> Hi Tao,
>
> The new design looks OK to me.
Thanks for reviewing!

> BTW, you have removed the section about "ll_lookup_lock" from the first
> version document. So you will not change related logic, right? I think this
> part should be fixed in new kernels.
"ll_lookup_lock" is removed in Siyao's latest dcache patch and
replaced by inode lock. So I removed the section about
"ll_lookup_lock" from the design and I will not change related logic.

Cheers,
Tao
>
> Cheers,
> Nasf
>
>
> On Thu, Feb 9, 2012 at 3:15 PM, <tao.peng@emc.com> wrote:
>>
>>
>> > -----Original Message-----
>> > From: Fan Yong [mailto:yong.fan at whamcloud.com]
>> > Sent: Saturday, February 04, 2012 12:18 AM
>> > To: Peng, Tao
>> > Cc: adilger at whamcloud.com; bergwolf at gmail.com; Tang, Haiying; faibish,
>> > sorin; Liu, Xuezhao;
>> > laisiyao at whamcloud.com; green at whamcloud.com; eeb at whamcloud.com;
>> > bryon at whamcloud.com; lustre-
>> > devel at lists.lustre.org
>> > Subject: Re: Lustre dcache clean up and distributed locking
>> >
>> > On 2/3/12 3:37 PM, tao.peng at emc.com wrote:
>> > >> -----Original Message-----
>> > >> From: Fan Yong [mailto:yong.fan at whamcloud.com]
>> > >> Sent: Tuesday, January 31, 2012 8:02 PM
>> > >> To: Peng, Tao
>> > >> Cc: Andreas Dilger; bergwolf at gmail.com; Tang, Haiying; faibish,
>> > >> sorin; Liu, Xuezhao;
>> > >> laisiyao at whamcloud.com; green at whamcloud.com; eeb at whamcloud.com;
>> > >> bryon at whamcloud.com; lustre-
>> > >> devel at lists.lustre.org
>> > >> Subject: Re: Lustre dcache clean up and distributed locking
>> > >>
>> > >> On 1/31/12 7:27 AM, Andreas Dilger wrote:
>> > >>> On 2012-01-30, at 12:46 AM,<tao.peng@emc.com> wrote:
>> > >>>> Sorry for the late reply. I got your email earlier but didn't have
>> > >>>> chance to read the document
>> > due
>> > >> to very limited internet access.
>> > >>>> Thank you very much for taking detailed look at the document and
>> > >>>> giving so many valuable comments.
>> > >>>>
>> > >>>> About adding new inode bitlock, I must admit that I missed hardlink
>> > >>>> case. Although the lock mode
>> > >> can be changed to be the same as LOOKUP lock, I agree that inode
>> > >> bitlock cannot sufficiently
>> > protect
>> > >> every name/dentry separately for hardlinked files. And I agree, as
>> > >> you suggested in the document,
>> > >> adding a name entry based lock can solve it.
>> > >>>> IIUC, to add a name entry based lock, both client and server should
>> > >>>> be modified, and LDLM needs to
>> > >> be changed to manage the new lock information. Like new inode
>> > >> bitlock, the name entry based lock
>> > will
>> > >> as well introduce interoperability issues as a result of on wire
>> > >> protocol change. And in DNE
>> > >> environment as Andreas explained, the original plan is to get LOOKUP
>> > >> on both master MDT and remote
>> > MDT.
>> > >> If we introduce name entry based lock, DNE case may need some change
>> > >> as well.
>> > >>>> I will take a closer look at the approach. In the meanwhile, could
>> > >>>> you please confirm if it is
>> > the
>> > >> right way to go? The benefit of above complexity seems to be better
>> > >> distributed dentry locking
>> > under
>> > >> dcache framework.
>> > >>>> What do others think? Andreas?
>> > >>> Tao, before we go down the road of implementing such a change, there
>> > >>> would have to be clear
>> > benefits
>> > >> to the performance and/or kernel integration before we could accept
>> > >> the interoperability issues and
>> > >> added code complexity.
>> > >>> The benefits of having a per-name lock, as I see them, are:
>> > >>> - can be cancelled separately from other name locks
>> > >>> - can be changed independently from the inode permissions (LOOKUP)
>> > >>> - potential ease of integration with the Linux dcache
>> > >>>
>> > >>> The drawbacks of introducing a per-name lock:
>> > >>> - only useful for hard-linked files (uncommon case)
>> > >>> - added code complexity for a new lock type
>> > >>> - incompatibility with existing client/server
>> > >>> - additional code to handle incompatibility transparently
>> > >>> - lock enqueue may need extra RPC for each name lock (different
>> > >>> resource)
>> > >>> - lock cancellation may need an extra RPC for each resource
>> > >>>
>> > >>> The way I see it is that the DENTRY lock would need to be a
>> > >>> different resource for each name on a
>> > >> file.  The only way this is going to be useful AFAICS is if the
>> > >> client enqueues each DENTRY lock on
>> > a
>> > >> separate resource, so that it can be granted and cancelled
>> > >> independently of the LOOKUP lock (which
>> > is
>> > >> on the FID and would be common for all names that reference this
>> > >> inode).  That would either need an
>> > >> extra RPC for every filename lookup (to enqueue both the DENTRY and
>> > >> LOOKUP locks) or a new
>> > >> LDLM_ENQUEUE RPC that can request and be granted two different locks
>> > >> (FID [seq, oid, ver, 0], and
>> > >> FID+name hash [seq, oid, ver, hash]) in a single RPC.
>> > >>> So, the main question is how much better the client dcache/VFS
>> > >>> integration will be with the DCACHE
>> > >> lock compared to the existing mechanism with the LOOKUP (FID) lock?
>> > >> Is it impossible (or very
>> > >> difficult) to use the existing mechanism for the new lockless dcache
>> > >> code?  Or is it a matter of
>> > this
>> > >> being a somewhat cleaner implementation, but not critical for moving
>> > >> forward?
>> > >>> I'm aware that our dcache code has been too complex for some time,
>> > >>> but I haven't looked recently
>> > to
>> > >> see how much of this is due to using/supporting old kernel APIs, and
>> > >> how much of it is due to the
>> > use
>> > >> of the LOOKUP lock on the FID protecting too much of the state on the
>> > >> client.
>> > >>> Cheers, Andreas
>> > >> Hi Tao,
>> > >>
>> > >> To introduce either per-name lock or DENTRY lock is mainly for
>> > >> removing
>> > >> the Lustre-specific flag "DCACHE_LUSTRE_INVALID" to simplify Lustre code.
>> > >> But
>> > >> if such simplification causes more complex ldlm logic, we have to
>> > >> consider whether it is worth removing "DCACHE_LUSTRE_INVALID". It is
>> > >> not impossible to get "DCACHE_LUSTRE_INVALID" accepted by the Linux
>> > >> kernel.
>> > >> On the other hand, "DCACHE_LUSTRE_INVALID" can work together with
>> > >> lockless dcache in new linux kernel also.
>> > >>
>> > >> So to push the project forward, we can divide it into two parts
>> > >> (according to your document): Lustre dcache code cleanup, and some
>> > >> new
>> > >> ldlm logic to remove "DCACHE_LUSTRE_INVALID". Before we reach an
>> > >> agreement on the second part, we can start from the first part to
>> > >> clean up the Lustre dcache code. What do you think?
>> > >>
>> > > Thank you, Andreas and Yong. I tend to agree with you that we should
>> > > avoid the complexity if there
>> > are no obvious benefits. However, as we need to make sure the code is
>> > acceptable by kernel community, I
>> > will try to consolidate a design for adding the new type of lock but
>> > just don't implement it at this
>> > point. And if later we are asked (by kernel community developers) to do
>> > it, we just need to do the
>> > implementation.
>> >
>> > Hi Tao,
>> >
>> > We have another way to remove the Lustre-special dentry flag
>> > "DCACHE_LUSTRE_INVALID". To indicate a dentry in the dcache that is no
>> > longer protected by a LOOKUP lock but has not been unhashed yet, we can
>> > add new flags in
>> > "ll_dentry_data" (pointed by "dentry::d_fsdata"). Since it is Lustre
>> > private data, there is no need to worry about whether the new flags will
>> > conflict with the standard VFS "dentry::d_flags" bits, and it may save
>> > some effort compared to implementing the complex DENTRY lock in your
>> > original design.
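The suggestion above might look roughly like the following userspace sketch. The LLD_LUSTRE_INVALID flag name, the ll_dentry_data layout, and the helper names are assumptions modeled on the email's description, not the actual Lustre code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag kept in Lustre-private per-dentry data hung off
 * d_fsdata, instead of a DCACHE_LUSTRE_INVALID bit in the VFS d_flags. */
#define LLD_LUSTRE_INVALID 0x0001u

struct ll_dentry_data {
    unsigned int lld_flags;
};

struct dentry {                 /* minimal stand-in for the VFS dentry */
    void *d_fsdata;
};

/* Query the private "invalid" state without touching d_flags. */
static bool ll_dentry_invalid(const struct dentry *de)
{
    const struct ll_dentry_data *lld = de->d_fsdata;
    return lld != NULL && (lld->lld_flags & LLD_LUSTRE_INVALID);
}

/* Called when the LOOKUP lock covering this name is cancelled:
 * mark the dentry invalid in Lustre-private data only. */
static void ll_dentry_set_invalid(struct dentry *de)
{
    struct ll_dentry_data *lld = de->d_fsdata;
    if (lld != NULL)
        lld->lld_flags |= LLD_LUSTRE_INVALID;
}
```

Since the flag lives entirely in filesystem-private storage, it can never collide with present or future generic d_flags bits, which is the point of the proposal.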
>> >
>> Hi Fan Yong,
>>
>> Thanks for your great suggestion. I updated the design doc with the following
>> changes, please help to review. Thank you!
>>
>> Changelog:
>> 1. Macro cleanups are removed. I prefer to do it separately while cleaning
>> up other obsolete macros.
>> 2. Removed the DENTRY LDLM lock proposal. Because the DCACHE_LUSTRE_INVALID
>> flag can be hidden from the generic VFS layer, it is no longer necessary to
>> introduce such a complex change.
>> 3. Add a section to describe how to add LLD_LUSTRE_INVALID flag to
>> ll_dentry_data structure to replace existing DCACHE_LUSTRE_INVALID flag.
>>
>> Cheers,
>> Tao
>> > Cheers,
>> > Nasf
>> >
>> > > Please stay tuned.
>> > >
>> > > Best,
>> > > Tao
>> > >
>> > >> Cheers,
>> > >> --
>> > >> Nasf
>> > >>
>> > >>>
>> > >>>>> -----Original Message-----
>> > >>>>> From: Fan Yong [mailto:yong.fan at whamcloud.com]
>> > >>>>> Sent: Saturday, January 21, 2012 1:11 AM
>> > >>>>> To: Peng, Tao; bergwolf at gmail.com
>> > >>>>> Cc: Tang, Haiying; adilger at whamcloud.com; faibish, sorin; Liu,
>> > >>>>> Xuezhao; laisiyao at whamcloud.com;
>> > >>>>> green at whamcloud.com; eeb at whamcloud.com; Bryon Neitzel
>> > >>>>> Subject: Re: Lustre dcache clean up and distributed locking
>> > >>>>>
>> > >>>>> Hi Tao,
>> > >>>>>
>> > >>>>> Excellent work. I just added some comments inside your document.
>> > >>>>> Please check.
>> > >>>>>
>> > >>>>>
>> > >>>>> Best Regards,
>> > >>>>> Fan Yong
>> > >>>>> Whamcloud, Inc.
>> > >>>>>
>> > >>>>> On 1/20/12 9:34 AM, haiying.tang at emc.com wrote:
>> > >>>>>> Hi Andreas,
>> > >>>>>>
>> > >>>>>> Tao is on vacation now. It might be difficult for him to check
>> > >>>>>> emails due to limited internet
>> > >> access
>> > >>>>> at hometown.
>> > >>>>>> For urgent issue, you folks could send email to his gmail account
>> > >>>>>> bergwolf at gmail.com.
>> > >>>>>>
>> > >>>>>> Thanks,
>> > >>>>>> Haiying
>> > >>>>>>
>> > >>>>>> -----Original Message-----
>> > >>>>>> From: Andreas Dilger [mailto:adilger at whamcloud.com]
>> > >>>>>> Sent: 2012-01-20 1:39
>> > >>>>>> To: Peng, Tao
>> > >>>>>> Cc: faibish, sorin; Tang, Haiying; Liu, Xuezhao;
>> > >>>>>> laisiyao at whamcloud.com; yong.fan at whamcloud.com;
>> > >>>>>> green at whamcloud.com;
>> > >>>>>> eeb at whamcloud.com
>> > >>>>>> Subject: Re: Lustre dcache clean up and distributed locking
>> > >>>>>>
>> > >>>>>> On 2012-01-17, at 3:21 AM, <tao.peng@emc.com>
>> > >>>>>> wrote:
>> > >>>>>>> Thanks Siyao and Oleg for answering my dcache revalidation
>> > >>>>>>> question on lustre mailing list. I
>> > >>>>> updated the design to reflect it.
>> > >>>>>>> Please see attachment.
>> > >>>>>> Tao,
>> > >>>>>> Fan Yong is also taking a more detailed look at your document and
>> > >>>>>> will
>> > >>>>>> hopefully have a chance to reply before the New Year holidays.
>> > >>>>>>
>> > >>>>>> Also, we are just working on landing the patches to add support
>> > >>>>>> for
>> > >>>>>> Linux
>> > >>>>>> 2.6.38 for the Lustre client.  One of the patches relates to the
>> > >>>>>> lockless dcache changes that were introduced in that kernel.  If
>> > >>>>>> you
>> > >>>>>> are interested to review this patch, and become more familiar
>> > >>>>>> with the
>> > >>>>>> Lustre development process, you should visit
>> > >>>>>> http://review.whamcloud.com/1865 for the patch.
>> > >>>>>>
>> > >>>>>> You need to create an account in Gerrit using OpenID (Google,
>> > >>>>>> mostly),
>> > >>>>>> and an account in our bug tracking system
>> > >>>>>> (http://jira.whamcloud.com)
>> > >>>>>> if you haven't already.
>> > >>>>>>
>> > >>>>>>>> -----Original Message-----
>> > >>>>>>>> From: Andreas Dilger [mailto:adilger at whamcloud.com]
>> > >>>>>>>> Sent: Tuesday, January 17, 2012 4:16 PM
>> > >>>>>>>> To: Peng, Tao
>> > >>>>>>>> Cc: faibish, sorin; Tang, Haiying; Liu, Xuezhao; Lai Siyao; Fan
>> > >>>>>>>> Yong; Oleg Drokin; Eric Barton
>> > >>>>>>>> Subject: Re: Lustre dcache clean up and distributed locking
>> > >>>>>>>>
>> > >>>>>>>> On 2012-01-16, at 9:25 PM, <tao.peng@emc.com> wrote:
>> > >>>>>>>>> I finally started to work on Lustre dcache cleanup and
>> > >>>>>>>>> locking.
>> > >>>>>>>>> After reading Lustre, ocfs2 and VFS
>> > >>>>>>>> dcache related code, I came to a design for cleaning up Lustre
>> > >>>>>>>> dcache code and doing distributed locking under dcache. For
>> > >>>>>>>> distributed locking, the main idea is to add a new inode
>> > >>>>>>>> bitlock
>> > >>>>>>>> DENTRY lock to just protect valid dentry, instead of letting
>> > >>>>>>>> LOOKUP
>> > >>>>>>>> lock serve multiple purposes, which leaves the client unable to know
>> > >>>>>>>> whether a file is deleted or not when the server cancels the LOOKUP
>> > >>>>>>>> lock. Instead, the client holds a PR-mode
>> > >>>>> DENTRY lock on all valid dentries and when the server cancels it,
>> > >>>>> client knows the file is deleted.
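The cancel-means-deleted semantics proposed here can be modeled minimally as below. The bit names echo Lustre's inodebits convention but the values, struct, and callback shape are hypothetical, not the real ldlm API:

```c
#include <assert.h>

/* Minimal model of the proposed DENTRY bit lock: the client holds a
 * PR-mode DENTRY bit for each valid name, so a server-initiated cancel
 * of that bit is an unambiguous "this name was deleted" signal, unlike
 * the overloaded LOOKUP bit. Values below are illustrative. */
enum mds_inodelock {
    MDS_INODELOCK_LOOKUP = 1 << 0,  /* name/permission state (existing) */
    MDS_INODELOCK_DENTRY = 1 << 1,  /* proposed per-dentry validity bit */
};

struct client_dentry {
    unsigned held_bits;  /* lock bits the client currently holds */
    int      deleted;    /* set once the server cancels DENTRY */
};

/* Blocking callback run when the server revokes lock bits. */
static void dentry_blocking_cb(struct client_dentry *de, unsigned cancelled)
{
    de->held_bits &= ~cancelled;
    if (cancelled & MDS_INODELOCK_DENTRY)
        de->deleted = 1;  /* the name was removed on the MDS, not just
                           * dropped from the lock LRU */
}
```

The point of the split is visible in the callback: losing only LOOKUP tells the client nothing definite, while losing DENTRY positively identifies an unlink.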
>> > >>>>>>>>> For details, please see the attachments (I attached both word
>> > >>>>>>>>> and
>> > >>>>>>>>> pdf versions as I am not sure
>> > >>>>>>>> which is more convenient for you :). And please help to review
>> > >>>>>>>> and comment. Thank you!
>> > >>>>>>>>
>> > >>>>>>>> Hi Tao,
>> > >>>>>>>> I'm passing this on to the engineers that are most familiar
>> > >>>>>>>> with the
>> > >>>>>>>> DLM and client dcache code in Lustre.  After a quick first read
>> > >>>>>>>> through your document, your investigation of the client code is
>> > >>>>>>>> very
>> > >>>>>>>> insightful and looks like it will be able to remove some of the
>> > >>>>>>>> significant complexity that
>> > has
>> > >>>>> grown over time in the llite code.
>> > >>>>>>>> Cheers, Andreas
>> > >>>>>>>> --
>> > >>>>>>>> Andreas Dilger                       Whamcloud, Inc.
>> > >>>>>>>> Principal Engineer                   http://www.whamcloud.com/
>> > >>>>>>>>
>> > >>>>>>>>
>> > >>>>>>> <dcache_distributed_locking-v2.docx>
>> > >>>>>> Cheers, Andreas
>> > >>>>>> --
>> > >>>>>> Andreas Dilger                       Whamcloud, Inc.
>> > >>>>>> Principal Engineer                   http://www.whamcloud.com/
>> > >>>>>>
>> > >>>>>>
>> > >>>>>>
>> > >>>>>>
>> > >>>>>>
>> > >>> Cheers, Andreas
>> > >>> --
>> > >>> Andreas Dilger                       Whamcloud, Inc.
>> > >>> Principal Engineer                   http://www.whamcloud.com/
>> > >>>
>> > >>>
>> > >>>
>> > >>>
>> >
>>
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [Lustre-devel] Lustre dcache clean up and distributed locking
  2012-02-12  3:46                         ` Peng Tao
@ 2012-02-23  3:13                           ` tao.peng at emc.com
  0 siblings, 0 replies; 16+ messages in thread
From: tao.peng at emc.com @ 2012-02-23  3:13 UTC (permalink / raw)
  To: lustre-devel

Hi all,

I cleaned up the design a bit more to reflect the changes in Siyao's latest dcache patch. Some of the previously mentioned dcache cleanups were dropped because they are already addressed in the dcache scalability patch. The remaining task is now mainly about hiding the DCACHE_LUSTRE_INVALID flag. Please see the attachment for details. I also uploaded it to LU-1113.

Cheers,
Tao


> -----Original Message-----
> From: Peng Tao [mailto:bergwolf at gmail.com]
> Sent: Sunday, February 12, 2012 11:46 AM
> To: Yong Fan
> Cc: Peng, Tao; adilger at whamcloud.com; Tang, Haiying; faibish, sorin; Liu, Xuezhao;
> laisiyao at whamcloud.com; green at whamcloud.com; eeb at whamcloud.com; bryon at whamcloud.com; lustre-
> devel at lists.lustre.org
> Subject: Re: Lustre dcache clean up and distributed locking
>
> Hi Yong,
>
> On Sat, Feb 11, 2012 at 11:58 PM, Yong Fan <yong.fan@whamcloud.com> wrote:
> > Hi Tao,
> >
> > The new design looks OK to me.
> Thanks for reviewing!
>
> > BTW, you have removed the section about "ll_lookup_lock" from the first
> > version document. So you will not change related logic, right? I think this
> > part should be fixed in new kernels.
> "ll_lookup_lock" is removed in Siyao's latest dcache patch and
> replaced by inode lock. So I removed the section about
> "ll_lookup_lock" from the design and I will not change related logic.
>
> Cheers,
> Tao
> >
> > Cheers,
> > Nasf
> >
> >
> > On Thu, Feb 9, 2012 at 3:15 PM, <tao.peng@emc.com> wrote:
> >>
> >>
> >> > -----Original Message-----
> >> > From: Fan Yong [mailto:yong.fan at whamcloud.com]
> >> > Sent: Saturday, February 04, 2012 12:18 AM
> >> > To: Peng, Tao
> >> > Cc: adilger at whamcloud.com; bergwolf at gmail.com; Tang, Haiying; faibish,
> >> > sorin; Liu, Xuezhao;
> >> > laisiyao at whamcloud.com; green at whamcloud.com; eeb at whamcloud.com;
> >> > bryon at whamcloud.com; lustre-
> >> > devel at lists.lustre.org
> >> > Subject: Re: Lustre dcache clean up and distributed locking
> >> >
> >> > On 2/3/12 3:37 PM, tao.peng at emc.com wrote:
> >> > >> -----Original Message-----
> >> > >> From: Fan Yong [mailto:yong.fan at whamcloud.com]
> >> > >> Sent: Tuesday, January 31, 2012 8:02 PM
> >> > >> To: Peng, Tao
> >> > >> Cc: Andreas Dilger; bergwolf at gmail.com; Tang, Haiying; faibish,
> >> > >> sorin; Liu, Xuezhao;
> >> > >> laisiyao at whamcloud.com; green at whamcloud.com; eeb at whamcloud.com;
> >> > >> bryon at whamcloud.com; lustre-
> >> > >> devel at lists.lustre.org
> >> > >> Subject: Re: Lustre dcache clean up and distributed locking
> >> > >>
> >> > >> On 1/31/12 7:27 AM, Andreas Dilger wrote:
> >> > >>> On 2012-01-30, at 12:46 AM,<tao.peng@emc.com>   wrote:
> >> > >>>> Sorry for the late reply. I got your email earlier but didn't have
> >> > >>>> chance to read the document
> >> > due
> >> > >> to very limited internet access.
> >> > >>>> Thank you very much for taking detailed look at the document and
> >> > >>>> giving so many valuable comments.
> >> > >>>>
> >> > >>>> About adding new inode bitlock, I must admit that I missed hardlink
> >> > >>>> case. Although the lock mode
> >> > >> can be changed to be the same as LOOKUP lock, I agree that inode
> >> > >> bitlock cannot sufficiently
> >> > protect
> >> > >> every name/dentry separately for hardlinked files. And I agree, as
> >> > >> you suggested in the document,
> >> > >> adding a name entry based lock can solve it.
> >> > >>>> IIUC, to add a name entry based lock, both client and server should
> >> > >>>> be modified, and LDLM need to
> >> > >> be changed to manage the new lock information. Like new inode
> >> > >> bitlock, the name entry based lock
> >> > will
> >> > >> as well introduce interoperability issues as a result of on wire
> >> > >> protocol change. And in DNE
> >> > >> environment as Andreas explained, the original plan is to get LOOKUP
> >> > >> on both master MDT and remote
> >> > MDT.
> >> > >> If we introduce name entry based lock, DNE case may need some change
> >> > >> as well.
> >> > >>>> I will take a closer look at the approach. In the meanwhile, could
> >> > >>>> you please confirm if it is
> >> > the
> >> > >> right way to go? The benefit of above complexity seems to be better
> >> > >> distributed dentry locking
> >> > under
> >> > >> dcache framework.
> >> > >>>> What do others think? Andreas?
> >> > >>> Tao, before we go down the road of implementing such a change, there
> >> > >>> would have to be clear
> >> > benefits
> >> > >> to the performance and/or kernel integration before we could accept
> >> > >> the interoperability issues and
> >> > >> added code complexity.
> >> > >>> The benefits of having a per-name lock, as I see them, are:
> >> > >>> - can be cancelled separately from other name locks
> >> > >>> - can be changed independently from the inode permissions (LOOKUP)
> >> > >>> - potential ease of integration with the Linux dcache
> >> > >>>
> >> > >>> The drawbacks of introducing a per-name lock:
> >> > >>> - only useful for hard-linked files (uncommon case)
> >> > >>> - added code complexity for a new lock type
> >> > >>> - incompatibility with existing client/server
> >> > >>> - additional code to handle incompatibility transparently
> >> > >>> - lock enqueue may need extra RPC for each name lock (different
> >> > >>> resource)
> >> > >>> - lock cancellation may need an extra RPC for each resource
> >> > >>>
> >> > >>> The way I see it is that the DENTRY lock would need to be a
> >> > >>> different resource for each name on a
> >> > >> file.  The only way this is going to be useful AFAICS is if the
> >> > >> client enqueues each DENTRY lock on
> >> > a
> >> > >> separate resource, so that it can be granted and cancelled
> >> > >> independently of the LOOKUP lock (which
> >> > is
> >> > >> on the FID and would be common for all names that reference this
> >> > >> inode).  That would either need an
> >> > >> extra RPC for every filename lookup (to enqueue both the DENTRY and
> >> > >> LOOKUP locks) or a new
> >> > >> LDLM_ENQUEUE RPC that can request and be granted two different locks
> >> > >> (FID [seq, oid, ver, 0], and
> >> > >> FID+name hash [seq, oid, ver, hash]) in a single RPC.
> >> > >>> So, the main question is how much better the client dcache/VFS
> >> > >>> integration will be with the DCACHE
> >> > >> lock compared to the existing mechanism with the LOOKUP (FID) lock?
> >> > >>  Is it impossible (or very
> >> > >> difficult) to use the existing mechanism for the new lockless dcache
> >> > >> code?  Or is it a matter of
> >> > this
> >> > >> being a somewhat cleaner implementation, but not critical for moving
> >> > >> forward?
> >> > >>> I'm aware that our dcache code has been too complex for some time,
> >> > >>> but I haven't looked recently
> >> > to
> >> > >> see how much of this is due to using/supporting old kernel APIs, and
> >> > >> how much of it is due to the
> >> > use
> >> > >> of the LOOKUP lock on the FID protecting too much of the state on the
> >> > >> client.
> >> > >>> Cheers, Andreas
> >> > >> Hi Tao,
> >> > >>
> >> > >> To introduce either per-name lock or DENTRY lock is mainly for
> >> > >> removing
> >> > >> Lustre special flag "DCACHE_LUSTRE_INVALID" to simplify Lustre code.
> >> > >> But
> >> > >> if such simplification causes more complex ldlm logic, we have to
> >> > >> consider whether it is worth removing "DCACHE_LUSTRE_INVALID". It is
> >> > >> not impossible to make "DCACHE_LUSTRE_INVALID" accepted by linux
> >> > >> kernel.
> >> > >> On the other hand, "DCACHE_LUSTRE_INVALID" can work together with
> >> > >> lockless dcache in new linux kernel also.
> >> > >>
> >> > >> So to push the project forward, we can divide it into two parts
> >> > >> (according to your document): Lustre dcache code cleanup, and some
> >> > >> new
> >> > >> ldlm logic to remove "DCACHE_LUSTRE_INVALID". Before we reach an
> >> > >> agreement on the second part, we can start from the first part to
> >> > >> clean up the Lustre dcache code. What do you think?
> >> > >>
> >> > > Thank you, Andreas and Yong. I tend to agree with you that we should
> >> > > avoid the complexity if there
> >> > are no obvious benefits. However, as we need to make sure the code is
> >> > acceptable by kernel community, I
> >> > will try to consolidate a design for adding the new type of lock but
> >> > just don't implement it at this
> >> > point. And if later we are asked (by kernel community developers) to do
> >> > it, we just need to do the
> >> > implementation.
> >> >
> >> > Hi Tao,
> >> >
> >> > We have another way to remove the Lustre-special dentry flag
> >> > "DCACHE_LUSTRE_INVALID". To indicate a dentry in the dcache that is no
> >> > longer protected by a LOOKUP lock but has not been unhashed yet, we can
> >> > add new flags in
> >> > "ll_dentry_data" (pointed by "dentry::d_fsdata"). Since it is Lustre
> >> > private data, there is no need to worry about whether the new flags will
> >> > conflict with the standard VFS "dentry::d_flags" bits, and it may save
> >> > some effort compared to implementing the complex DENTRY lock in your
> >> > original design.
> >> >
> >> Hi Fan Yong,
> >>
> >> Thanks for your great suggestion. I updated the design doc with the following
> >> changes, please help to review. Thank you!
> >>
> >> Changelog:
> >> 1. Macro cleanups are removed. I prefer to do it separately while cleaning
> >> up other obsolete macros.
> >> 2. Removed the DENTRY LDLM lock proposal. Because the DCACHE_LUSTRE_INVALID
> >> flag can be hidden from the generic VFS layer, it is no longer necessary to
> >> introduce such a complex change.
> >> 3. Add a section to describe how to add LLD_LUSTRE_INVALID flag to
> >> ll_dentry_data structure to replace existing DCACHE_LUSTRE_INVALID flag.
> >>
> >> Cheers,
> >> Tao
> >> > Cheers,
> >> > Nasf
> >> >
> >> > > Please stay tuned.
> >> > >
> >> > > Best,
> >> > > Tao
> >> > >
> >> > >> Cheers,
> >> > >> --
> >> > >> Nasf
> >> > >>
> >> > >>>
> >> > >>>>> -----Original Message-----
> >> > >>>>> From: Fan Yong [mailto:yong.fan at whamcloud.com]
> >> > >>>>> Sent: Saturday, January 21, 2012 1:11 AM
> >> > >>>>> To: Peng, Tao; bergwolf at gmail.com
> >> > >>>>> Cc: Tang, Haiying; adilger at whamcloud.com; faibish, sorin; Liu,
> >> > >>>>> Xuezhao; laisiyao at whamcloud.com;
> >> > >>>>> green at whamcloud.com; eeb at whamcloud.com; Bryon Neitzel
> >> > >>>>> Subject: Re: Lustre dcache clean up and distributed locking
> >> > >>>>>
> >> > >>>>> Hi Tao,
> >> > >>>>>
> >> > >>>>> Excellent work. I just added some comments inside your document.
> >> > >>>>> Please check.
> >> > >>>>>
> >> > >>>>>
> >> > >>>>> Best Regards,
> >> > >>>>> Fan Yong
> >> > >>>>> Whamcloud, Inc.
> >> > >>>>>
> >> > >>>>> On 1/20/12 9:34 AM, haiying.tang at emc.com wrote:
> >> > >>>>>> Hi Andreas,
> >> > >>>>>>
> >> > >>>>>> Tao is on vacation now. It might be difficult for him to check
> >> > >>>>>> emails due to limited internet
> >> > >> access
> >> > >>>>> at hometown.
> >> > >>>>>> For urgent issue, you folks could send email to his gmail account
> >> > >>>>>> bergwolf at gmail.com.
> >> > >>>>>>
> >> > >>>>>> Thanks,
> >> > >>>>>> Haiying
> >> > >>>>>>
> >> > >>>>>> -----Original Message-----
> >> > >>>>>> From: Andreas Dilger [mailto:adilger at whamcloud.com]
> >> > >>>>>> Sent: 2012-01-20 1:39
> >> > >>>>>> To: Peng, Tao
> >> > >>>>>> Cc: faibish, sorin; Tang, Haiying; Liu, Xuezhao;
> >> > >>>>>> laisiyao at whamcloud.com; yong.fan at whamcloud.com;
> >> > >>>>>> green at whamcloud.com;
> >> > >>>>>> eeb at whamcloud.com
> >> > >>>>>> Subject: Re: Lustre dcache clean up and distributed locking
> >> > >>>>>>
> >> > >>>>>> On 2012-01-17, at 3:21 AM,<tao.peng@emc.com>   <tao.peng@emc.com>
> >> > >>>>>>   wrote:
> >> > >>>>>>> Thanks Siyao and Oleg for answering my dcache revalidation
> >> > >>>>>>> question on lustre mailing list. I
> >> > >>>>> updated the design to reflect it.
> >> > >>>>>>> Please see attachment.
> >> > >>>>>> Tao,
> >> > >>>>>> Fan Yong is also taking a more detailed look at your document and
> >> > >>>>>> will
> >> > >>>>>> hopefully have a chance to reply before the New Year holidays.
> >> > >>>>>>
> >> > >>>>>> Also, we are just working on landing the patches to add support
> >> > >>>>>> for
> >> > >>>>>> Linux
> >> > >>>>>> 2.6.38 for the Lustre client.  One of the patches relates to the
> >> > >>>>>> lockless dcache changes that were introduced in that kernel.  If
> >> > >>>>>> you
> >> > >>>>>> are interested to review this patch, and become more familiar
> >> > >>>>>> with the
> >> > >>>>>> Lustre development process, you should visit
> >> > >>>>>> http://review.whamcloud.com/1865 for the patch.
> >> > >>>>>>
> >> > >>>>>> You need to create an account in Gerrit using OpenID (Google,
> >> > >>>>>> mostly),
> >> > >>>>>> and an account in our bug tracking system
> >> > >>>>>> (http://jira.whamcloud.com)
> >> > >>>>>> if you haven't already.
> >> > >>>>>>
> >> > >>>>>>>> -----Original Message-----
> >> > >>>>>>>> From: Andreas Dilger [mailto:adilger at whamcloud.com]
> >> > >>>>>>>> Sent: Tuesday, January 17, 2012 4:16 PM
> >> > >>>>>>>> To: Peng, Tao
> >> > >>>>>>>> Cc: faibish, sorin; Tang, Haiying; Liu, Xuezhao; Lai Siyao; Fan
> >> > >>>>>>>> Yong; Oleg Drokin; Eric Barton
> >> > >>>>>>>> Subject: Re: Lustre dcache clean up and distributed locking
> >> > >>>>>>>>
> >> > >>>>>>>> On 2012-01-16, at 9:25 PM,<tao.peng@emc.com>   wrote:
> >> > >>>>>>>>> I finally started to work on Lustre dcache cleanup and
> >> > >>>>>>>>> locking.
> >> > >>>>>>>>> After reading Lustre, ocfs2 and VFS
> >> > >>>>>>>> dcache related code, I came to a design for cleaning up Lustre
> >> > >>>>>>>> dcache code and doing distributed locking under dcache. For
> >> > >>>>>>>> distributed locking, the main idea is to add a new inode
> >> > >>>>>>>> bitlock
> >> > >>>>>>>> DENTRY lock to just protect valid dentry, instead of letting
> >> > >>>>>>>> LOOKUP
> >> > >>>>>>>> lock serve multiple purposes, which leaves the client unable to
> >> > >>>>>>>> know whether a file is deleted or not when the server cancels the
> >> > >>>>>>>> LOOKUP lock. Instead, the client holds a PR-mode
> >> > >>>>> DENTRY lock on all valid dentries and when the server cancels it,
> >> > >>>>> client knows the file is deleted.
> >> > >>>>>>>>> For details, please see the attachments (I attached both word
> >> > >>>>>>>>> and
> >> > >>>>>>>>> pdf versions as I am not sure
> >> > >>>>>>>> which is more convenient for you :). And please help to review
> >> > >>>>>>>> and comment. Thank you!
> >> > >>>>>>>>
> >> > >>>>>>>> Hi Tao,
> >> > >>>>>>>> I'm passing this on to the engineers that are most familiar
> >> > >>>>>>>> with the
> >> > >>>>>>>> DLM and client dcache code in Lustre.  After a quick first read
> >> > >>>>>>>> through your document, your investigation of the client code is
> >> > >>>>>>>> very
> >> > >>>>>>>> insightful and looks like it will be able to remove some of the
> >> > >>>>>>>> significant complexity that
> >> > has
> >> > >>>>> grown over time in the llite code.
> >> > >>>>>>>> Cheers, Andreas
> >> > >>>>>>>> --
> >> > >>>>>>>> Andreas Dilger                       Whamcloud, Inc.
> >> > >>>>>>>> Principal Engineer                   http://www.whamcloud.com/
> >> > >>>>>>>>
> >> > >>>>>>>>
> >> > >>>>>>> <dcache_distributed_locking-v2.docx>
> >> > >>>>>> Cheers, Andreas
> >> > >>>>>> --
> >> > >>>>>> Andreas Dilger                       Whamcloud, Inc.
> >> > >>>>>> Principal Engineer                   http://www.whamcloud.com/
> >> > >>>>>>
> >> > >>>>>>
> >> > >>>>>>
> >> > >>>>>>
> >> > >>>>>>
> >> > >>> Cheers, Andreas
> >> > >>> --
> >> > >>> Andreas Dilger                       Whamcloud, Inc.
> >> > >>> Principal Engineer                   http://www.whamcloud.com/
> >> > >>>
> >> > >>>
> >> > >>>
> >> > >>>
> >> >
> >>
> >

-------------- next part --------------
A non-text attachment was scrubbed...
Name: dcache_cleanup_v4.docx
Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
Size: 24430 bytes
Desc: dcache_cleanup_v4.docx
URL: <http://lists.lustre.org/pipermail/lustre-devel-lustre.org/attachments/20120222/36c6b598/attachment.bin>


end of thread, other threads:[~2012-02-23  3:13 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <F19688880B763E40B28B2B462677FBF805E7E684CD@MX09A.corp.emc.com>
     [not found] ` <AC57B2B9-5559-4713-905D-7AC9BAE3B1DB@whamcloud.com>
     [not found]   ` <28CD3A7C-52EE-4E75-B00F-4280D60CE17D@whamcloud.com>
     [not found]     ` <CALxHnhL5Oz7cwqF15x65enx1qhgadSkftGa-=QAuY1N+9B5rSw@mail.gmail.com>
2012-01-30  7:44       ` [Lustre-devel] Lustre dcache clean up and distributed locking tao.peng at emc.com
     [not found]   ` <F19688880B763E40B28B2B462677FBF805E7E6857F@MX09A.corp.emc.com>
     [not found]     ` <8449503C-66B3-4CDC-89BF-5B52E6D29579@whamcloud.com>
     [not found]       ` <5F3A45681A0F5944B3194CF8C8FB905B66FA4DD4F9@MX17A.corp.emc.com>
     [not found]         ` <4F19A01B.4070405@whamcloud.com>
     [not found]           ` <8A50205D-3704-4969-9997-3D6157C252D0@whamcloud.com>
2012-01-22 16:11             ` Fan Yong
2012-01-22 17:33               ` Andreas Dilger
2012-01-30  7:46           ` tao.peng at emc.com
2012-01-30 23:27             ` Andreas Dilger
2012-01-31 12:02               ` Fan Yong
2012-02-03  7:37                 ` tao.peng at emc.com
2012-02-03 16:17                   ` Fan Yong
2012-02-05 13:50                     ` Peng Tao
2012-02-09  7:15                     ` tao.peng at emc.com
2012-02-10 20:37                       ` Andreas Dilger
2012-02-11 13:40                         ` Peng Tao
2012-02-11 15:58                       ` Yong Fan
2012-02-12  3:46                         ` Peng Tao
2012-02-23  3:13                           ` tao.peng at emc.com
2012-02-09  7:35                     ` tao.peng at emc.com
