* configfs, dlm_controld & lockdep @ 2008-12-11 14:20 ` Steven Whitehouse 0 siblings, 0 replies; 53+ messages in thread From: Steven Whitehouse @ 2008-12-11 14:20 UTC (permalink / raw) To: joel.becker; +Cc: linux-kernel, cluster-devel Hi, I've been trying to track down the cause of the following messages which appear in my logs each time I start up dlm_controld: Dec 1 12:53:17 men-an-tol kernel: ============================================= Dec 1 12:53:17 men-an-tol kernel: [ INFO: possible recursive locking detected ] Dec 1 12:53:17 men-an-tol kernel: 2.6.28-rc5 #179 Dec 1 12:53:17 men-an-tol kernel: --------------------------------------------- Dec 1 12:53:17 men-an-tol kernel: dlm_controld/2455 is trying to acquire lock: Dec 1 12:53:17 men-an-tol kernel: (&sb->s_type->i_mutex_key#11/2){--..}, at: [< ffffffffa0294d76>] configfs_attach_group+0x4a/0x183 [configfs] Dec 1 12:53:17 men-an-tol kernel: Dec 1 12:53:17 men-an-tol kernel: but task is already holding lock: Dec 1 12:53:17 men-an-tol kernel: (&sb->s_type->i_mutex_key#11/2){--..}, at: [< ffffffffa0294d76>] configfs_attach_group+0x4a/0x183 [configfs] Dec 1 12:53:17 men-an-tol kernel: Dec 1 12:53:17 men-an-tol kernel: other info that might help us debug this: Dec 1 12:53:17 men-an-tol kernel: 2 locks held by dlm_controld/2455: Dec 1 12:53:17 men-an-tol kernel: #0: (&sb->s_type->i_mutex_key#10/1){--..}, a t: [<ffffffff802b5e11>] lookup_create+0x26/0x94 Dec 1 12:53:17 men-an-tol kernel: #1: (&sb->s_type->i_mutex_key#11/2){--..}, a t: [<ffffffffa0294d76>] configfs_attach_group+0x4a/0x183 [configfs] Dec 1 12:53:17 men-an-tol kernel: which seems to be caused by the mkdir which dlm_controld does in configfs. Looking at the stack trace, this didn't make much sense until I stuck noinline in front of several functions in configfs, whereupon I get: [<ffffffff8025808e>] __lock_acquire+0xdce/0x14f5 [<ffffffff80254894>] ? get_lock_stats+0x34/0x5c [<ffffffff802548ca>] ? put_lock_stats+0xe/0x27 [<ffffffff802549c3>] ? lock_release_holdtime+0xe0/0xe5 [<ffffffff8025880a>] lock_acquire+0x55/0x71 [<ffffffffa024add0>] ? configfs_attach_group+0x40/0x89 [configfs] [<ffffffff8050563c>] mutex_lock_nested+0xf9/0x2c5 [<ffffffffa024add0>] ? configfs_attach_group+0x40/0x89 [configfs] [<ffffffffa024add0>] ? configfs_attach_group+0x40/0x89 [configfs] [<ffffffffa024ac7c>] ? configfs_attach_item+0xed/0x201 [configfs] [<ffffffffa024add0>] configfs_attach_group+0x40/0x89 [configfs] <- second call [<ffffffffa024aec5>] create_default_group+0xac/0xe3 [configfs] [<ffffffffa024af24>] populate_groups+0x28/0x52 [configfs] [<ffffffffa024add8>] configfs_attach_group+0x48/0x89 [configfs] <- first call [<ffffffffa024b222>] configfs_mkdir+0x2d4/0x3bf [configfs] [<ffffffff802b66ef>] vfs_mkdir+0xb0/0x121 [<ffffffff802b8afa>] sys_mkdirat+0xa2/0xf5 [<ffffffff8020b4fc>] ? sysret_check+0x27/0x62 [<ffffffff802569ec>] ? trace_hardirqs_on_caller+0xf0/0x114 [<ffffffff8026cf14>] ? audit_syscall_entry+0x126/0x15a [<ffffffff802b8b60>] sys_mkdir+0x13/0x15 [<ffffffff8020b4cb>] system_call_fastpath+0x16/0x1b so it looks like configfs_attach_group is being called recursively in this case, and I think thats the cause of the warning messages. Also I spotted a couple of other things... from configfs_attach_item() the inode mutex which is being locked just uses a plain old mutex_lock() call whereas in configfs_attach_group() which calls configfs_attach_item() there is an annotated I_MUTEX_CHILD call. I would have expected them both to be the same since I presume that the parent is common (locked by the VFS if I've understood whats going on here). The second thing is that configfs_attach_item() calls populate_attrs() which calls through to configfs_add_file(), so in order words it seems to also be called from the context of the mkdir call. In that case the inode mutex is locked with I_MUTEX_NORMAL annotation. So I'm a bit confused as to why lockdep doesn't flag up those issues too since they appear to occur before the one which produced the above message, or maybe I've misunderstood how the annotation works. Any ideas what is going wrong here? I think it must just be an annotation issue since it appears that configfs works perfectly ok otherwise, but it would be nice to get to the bottom of it, Steve. ^ permalink raw reply [flat|nested] 53+ messages in thread
* [Cluster-devel] configfs, dlm_controld & lockdep @ 2008-12-11 14:20 ` Steven Whitehouse 0 siblings, 0 replies; 53+ messages in thread From: Steven Whitehouse @ 2008-12-11 14:20 UTC (permalink / raw) To: cluster-devel.redhat.com Hi, I've been trying to track down the cause of the following messages which appear in my logs each time I start up dlm_controld: Dec 1 12:53:17 men-an-tol kernel: ============================================= Dec 1 12:53:17 men-an-tol kernel: [ INFO: possible recursive locking detected ] Dec 1 12:53:17 men-an-tol kernel: 2.6.28-rc5 #179 Dec 1 12:53:17 men-an-tol kernel: --------------------------------------------- Dec 1 12:53:17 men-an-tol kernel: dlm_controld/2455 is trying to acquire lock: Dec 1 12:53:17 men-an-tol kernel: (&sb->s_type->i_mutex_key#11/2){--..}, at: [< ffffffffa0294d76>] configfs_attach_group+0x4a/0x183 [configfs] Dec 1 12:53:17 men-an-tol kernel: Dec 1 12:53:17 men-an-tol kernel: but task is already holding lock: Dec 1 12:53:17 men-an-tol kernel: (&sb->s_type->i_mutex_key#11/2){--..}, at: [< ffffffffa0294d76>] configfs_attach_group+0x4a/0x183 [configfs] Dec 1 12:53:17 men-an-tol kernel: Dec 1 12:53:17 men-an-tol kernel: other info that might help us debug this: Dec 1 12:53:17 men-an-tol kernel: 2 locks held by dlm_controld/2455: Dec 1 12:53:17 men-an-tol kernel: #0: (&sb->s_type->i_mutex_key#10/1){--..}, a t: [<ffffffff802b5e11>] lookup_create+0x26/0x94 Dec 1 12:53:17 men-an-tol kernel: #1: (&sb->s_type->i_mutex_key#11/2){--..}, a t: [<ffffffffa0294d76>] configfs_attach_group+0x4a/0x183 [configfs] Dec 1 12:53:17 men-an-tol kernel: which seems to be caused by the mkdir which dlm_controld does in configfs. Looking at the stack trace, this didn't make much sense until I stuck noinline in front of several functions in configfs, whereupon I get: [<ffffffff8025808e>] __lock_acquire+0xdce/0x14f5 [<ffffffff80254894>] ? get_lock_stats+0x34/0x5c [<ffffffff802548ca>] ? put_lock_stats+0xe/0x27 [<ffffffff802549c3>] ? lock_release_holdtime+0xe0/0xe5 [<ffffffff8025880a>] lock_acquire+0x55/0x71 [<ffffffffa024add0>] ? configfs_attach_group+0x40/0x89 [configfs] [<ffffffff8050563c>] mutex_lock_nested+0xf9/0x2c5 [<ffffffffa024add0>] ? configfs_attach_group+0x40/0x89 [configfs] [<ffffffffa024add0>] ? configfs_attach_group+0x40/0x89 [configfs] [<ffffffffa024ac7c>] ? configfs_attach_item+0xed/0x201 [configfs] [<ffffffffa024add0>] configfs_attach_group+0x40/0x89 [configfs] <- second call [<ffffffffa024aec5>] create_default_group+0xac/0xe3 [configfs] [<ffffffffa024af24>] populate_groups+0x28/0x52 [configfs] [<ffffffffa024add8>] configfs_attach_group+0x48/0x89 [configfs] <- first call [<ffffffffa024b222>] configfs_mkdir+0x2d4/0x3bf [configfs] [<ffffffff802b66ef>] vfs_mkdir+0xb0/0x121 [<ffffffff802b8afa>] sys_mkdirat+0xa2/0xf5 [<ffffffff8020b4fc>] ? sysret_check+0x27/0x62 [<ffffffff802569ec>] ? trace_hardirqs_on_caller+0xf0/0x114 [<ffffffff8026cf14>] ? audit_syscall_entry+0x126/0x15a [<ffffffff802b8b60>] sys_mkdir+0x13/0x15 [<ffffffff8020b4cb>] system_call_fastpath+0x16/0x1b so it looks like configfs_attach_group is being called recursively in this case, and I think thats the cause of the warning messages. Also I spotted a couple of other things... from configfs_attach_item() the inode mutex which is being locked just uses a plain old mutex_lock() call whereas in configfs_attach_group() which calls configfs_attach_item() there is an annotated I_MUTEX_CHILD call. I would have expected them both to be the same since I presume that the parent is common (locked by the VFS if I've understood whats going on here). The second thing is that configfs_attach_item() calls populate_attrs() which calls through to configfs_add_file(), so in order words it seems to also be called from the context of the mkdir call. In that case the inode mutex is locked with I_MUTEX_NORMAL annotation. So I'm a bit confused as to why lockdep doesn't flag up those issues too since they appear to occur before the one which produced the above message, or maybe I've misunderstood how the annotation works. Any ideas what is going wrong here? I think it must just be an annotation issue since it appears that configfs works perfectly ok otherwise, but it would be nice to get to the bottom of it, Steve. ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: configfs, dlm_controld & lockdep 2008-12-11 14:20 ` [Cluster-devel] " Steven Whitehouse (?) @ 2008-12-11 14:44 ` Louis Rilling 2008-12-11 17:34 ` [Cluster-devel] " Joel Becker -1 siblings, 1 reply; 53+ messages in thread From: Louis Rilling @ 2008-12-11 14:44 UTC (permalink / raw) To: Steven Whitehouse; +Cc: joel.becker, linux-kernel, cluster-devel [-- Attachment #1: Type: text/plain, Size: 6000 bytes --] Hi, On 11/12/08 14:20 +0000, Steven Whitehouse wrote: > Hi, > > I've been trying to track down the cause of the following messages which > appear in my logs each time I start up dlm_controld: > > Dec 1 12:53:17 men-an-tol kernel: > ============================================= > Dec 1 12:53:17 men-an-tol kernel: [ INFO: possible recursive locking > detected ] > Dec 1 12:53:17 men-an-tol kernel: 2.6.28-rc5 #179 > Dec 1 12:53:17 men-an-tol kernel: > --------------------------------------------- > Dec 1 12:53:17 men-an-tol kernel: dlm_controld/2455 is trying to > acquire lock: > Dec 1 12:53:17 men-an-tol kernel: > (&sb->s_type->i_mutex_key#11/2){--..}, at: [< > ffffffffa0294d76>] configfs_attach_group+0x4a/0x183 [configfs] > Dec 1 12:53:17 men-an-tol kernel: > Dec 1 12:53:17 men-an-tol kernel: but task is already holding lock: > Dec 1 12:53:17 men-an-tol kernel: > (&sb->s_type->i_mutex_key#11/2){--..}, at: [< > ffffffffa0294d76>] configfs_attach_group+0x4a/0x183 [configfs] > Dec 1 12:53:17 men-an-tol kernel: > Dec 1 12:53:17 men-an-tol kernel: other info that might help us debug > this: > Dec 1 12:53:17 men-an-tol kernel: 2 locks held by dlm_controld/2455: > Dec 1 12:53:17 men-an-tol kernel: #0: > (&sb->s_type->i_mutex_key#10/1){--..}, a > t: [<ffffffff802b5e11>] lookup_create+0x26/0x94 > Dec 1 12:53:17 men-an-tol kernel: #1: > (&sb->s_type->i_mutex_key#11/2){--..}, a > t: [<ffffffffa0294d76>] configfs_attach_group+0x4a/0x183 [configfs] > Dec 1 12:53:17 men-an-tol kernel: > > which seems to be caused by the mkdir which dlm_controld does in > configfs. Looking at the stack trace, this didn't make much sense until > I stuck noinline in front of several functions in configfs, whereupon I > get: > [<ffffffff8025808e>] __lock_acquire+0xdce/0x14f5 > [<ffffffff80254894>] ? get_lock_stats+0x34/0x5c > [<ffffffff802548ca>] ? put_lock_stats+0xe/0x27 > [<ffffffff802549c3>] ? lock_release_holdtime+0xe0/0xe5 > [<ffffffff8025880a>] lock_acquire+0x55/0x71 > [<ffffffffa024add0>] ? configfs_attach_group+0x40/0x89 [configfs] > [<ffffffff8050563c>] mutex_lock_nested+0xf9/0x2c5 > [<ffffffffa024add0>] ? configfs_attach_group+0x40/0x89 [configfs] > [<ffffffffa024add0>] ? configfs_attach_group+0x40/0x89 [configfs] > [<ffffffffa024ac7c>] ? configfs_attach_item+0xed/0x201 [configfs] > [<ffffffffa024add0>] configfs_attach_group+0x40/0x89 [configfs] <- second call > [<ffffffffa024aec5>] create_default_group+0xac/0xe3 [configfs] > [<ffffffffa024af24>] populate_groups+0x28/0x52 [configfs] > [<ffffffffa024add8>] configfs_attach_group+0x48/0x89 [configfs] <- first call > [<ffffffffa024b222>] configfs_mkdir+0x2d4/0x3bf [configfs] > [<ffffffff802b66ef>] vfs_mkdir+0xb0/0x121 > [<ffffffff802b8afa>] sys_mkdirat+0xa2/0xf5 > [<ffffffff8020b4fc>] ? sysret_check+0x27/0x62 > [<ffffffff802569ec>] ? trace_hardirqs_on_caller+0xf0/0x114 > [<ffffffff8026cf14>] ? audit_syscall_entry+0x126/0x15a > [<ffffffff802b8b60>] sys_mkdir+0x13/0x15 > [<ffffffff8020b4cb>] system_call_fastpath+0x16/0x1b These warnings are known issues. This results from a lack of lockdep annotations in configfs. I must admit that I started to send patches for that a few months ago, and then could not find time to finish this work. The problem is a bit harder than just playing with I_MUTEX_CHILD, I_MUTEX_PARENT and I_MUTEX_NORMAL, since configfs recursively locks variable numbers (this can go to as many as the depth of the whole configfs tree) of config_group inodes during operations like mkdir(), rmdir(), and depend_item(). I was working on two kinds of solutions: 1) insert lockdep_off()/lockdep_on() at the places of recursion, 2) separate default groups inode mutex classes according to their depth under the created group they belong to. People tend to reject any proposition like 1), but IIRC Joel was tending to accept it. Solution 2) does not work for depend_item(). This needs to rework the locking scheme of depend_item() by removing the variable lock recursion depth, and I think that it's doable thanks to the configfs_dirent_lock. Joel, what do you think about this? Anyway, I still hope to find time for this :) Louis > > so it looks like configfs_attach_group is being called recursively in > this case, and I think thats the cause of the warning messages. Also I > spotted a couple of other things... from configfs_attach_item() the > inode mutex which is being locked just uses a plain old mutex_lock() > call whereas in configfs_attach_group() which calls > configfs_attach_item() there is an annotated I_MUTEX_CHILD call. I would > have expected them both to be the same since I presume that the parent > is common (locked by the VFS if I've understood whats going on here). > > The second thing is that configfs_attach_item() calls populate_attrs() > which calls through to configfs_add_file(), so in order words it seems > to also be called from the context of the mkdir call. In that case the > inode mutex is locked with I_MUTEX_NORMAL annotation. > > So I'm a bit confused as to why lockdep doesn't flag up those issues too > since they appear to occur before the one which produced the above > message, or maybe I've misunderstood how the annotation works. > > Any ideas what is going wrong here? I think it must just be an > annotation issue since it appears that configfs works perfectly ok > otherwise, but it would be nice to get to the bottom of it, > > Steve. > > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- Dr Louis Rilling Kerlabs Skype: louis.rilling Batiment Germanium Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes http://www.kerlabs.com/ 35700 Rennes [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: configfs, dlm_controld & lockdep 2008-12-11 14:44 ` Louis Rilling @ 2008-12-11 17:34 ` Joel Becker 0 siblings, 0 replies; 53+ messages in thread From: Joel Becker @ 2008-12-11 17:34 UTC (permalink / raw) To: Steven Whitehouse, linux-kernel, cluster-devel On Thu, Dec 11, 2008 at 03:44:41PM +0100, Louis Rilling wrote: > These warnings are known issues. This results from a lack of lockdep annotations > in configfs. I must admit that I started to send patches for that a few months > ago, and then could not find time to finish this work. > > The problem is a bit harder than just playing with I_MUTEX_CHILD, I_MUTEX_PARENT > and I_MUTEX_NORMAL, since configfs recursively locks variable numbers > (this can go to as many as the depth of the whole configfs tree) of > config_group inodes during operations like mkdir(), rmdir(), and depend_item(). > > I was working on two kinds of solutions: > 1) insert lockdep_off()/lockdep_on() at the places of recursion, > 2) separate default groups inode mutex classes according to their depth under > the created group they belong to. > > People tend to reject any proposition like 1), but IIRC Joel was tending to > accept it. > > Solution 2) does not work for depend_item(). This needs to rework the locking > scheme of depend_item() by removing the variable lock recursion depth, and I > think that it's doable thanks to the configfs_dirent_lock. > Joel, what do you think about this? I've been waiting for your patch for (1). I am wary of the (2) approach. Not because it wouldn't work for mkdir(2) - I think it would. But rmdir(2) has the same recursive locking, with far more importance (live objects), and would print the same error. Joel -- Life's Little Instruction Book #267 "Lie on your back and look at the stars." Joel Becker Principal Software Developer Oracle E-mail: joel.becker@oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 53+ messages in thread
* [Cluster-devel] Re: configfs, dlm_controld & lockdep @ 2008-12-11 17:34 ` Joel Becker 0 siblings, 0 replies; 53+ messages in thread From: Joel Becker @ 2008-12-11 17:34 UTC (permalink / raw) To: cluster-devel.redhat.com On Thu, Dec 11, 2008 at 03:44:41PM +0100, Louis Rilling wrote: > These warnings are known issues. This results from a lack of lockdep annotations > in configfs. I must admit that I started to send patches for that a few months > ago, and then could not find time to finish this work. > > The problem is a bit harder than just playing with I_MUTEX_CHILD, I_MUTEX_PARENT > and I_MUTEX_NORMAL, since configfs recursively locks variable numbers > (this can go to as many as the depth of the whole configfs tree) of > config_group inodes during operations like mkdir(), rmdir(), and depend_item(). > > I was working on two kinds of solutions: > 1) insert lockdep_off()/lockdep_on() at the places of recursion, > 2) separate default groups inode mutex classes according to their depth under > the created group they belong to. > > People tend to reject any proposition like 1), but IIRC Joel was tending to > accept it. > > Solution 2) does not work for depend_item(). This needs to rework the locking > scheme of depend_item() by removing the variable lock recursion depth, and I > think that it's doable thanks to the configfs_dirent_lock. > Joel, what do you think about this? I've been waiting for your patch for (1). I am wary of the (2) approach. Not because it wouldn't work for mkdir(2) - I think it would. But rmdir(2) has the same recursive locking, with far more importance (live objects), and would print the same error. Joel -- Life's Little Instruction Book #267 "Lie on your back and look at the stars." Joel Becker Principal Software Developer Oracle E-mail: joel.becker at oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: configfs, dlm_controld & lockdep 2008-12-11 17:34 ` [Cluster-devel] " Joel Becker (?) @ 2008-12-12 10:06 ` Louis Rilling 2008-12-12 15:29 ` [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() Louis Rilling -1 siblings, 1 reply; 53+ messages in thread From: Louis Rilling @ 2008-12-12 10:06 UTC (permalink / raw) To: Steven Whitehouse, linux-kernel, cluster-devel [-- Attachment #1: Type: text/plain, Size: 1988 bytes --] On 11/12/08 9:34 -0800, Joel Becker wrote: > On Thu, Dec 11, 2008 at 03:44:41PM +0100, Louis Rilling wrote: > > These warnings are known issues. This results from a lack of lockdep annotations > > in configfs. I must admit that I started to send patches for that a few months > > ago, and then could not find time to finish this work. > > > > The problem is a bit harder than just playing with I_MUTEX_CHILD, I_MUTEX_PARENT > > and I_MUTEX_NORMAL, since configfs recursively locks variable numbers > > (this can go to as many as the depth of the whole configfs tree) of > > config_group inodes during operations like mkdir(), rmdir(), and depend_item(). > > > > I was working on two kinds of solutions: > > 1) insert lockdep_off()/lockdep_on() at the places of recursion, > > 2) separate default groups inode mutex classes according to their depth under > > the created group they belong to. > > > > People tend to reject any proposition like 1), but IIRC Joel was tending to > > accept it. > > > > Solution 2) does not work for depend_item(). This needs to rework the locking > > scheme of depend_item() by removing the variable lock recursion depth, and I > > think that it's doable thanks to the configfs_dirent_lock. > > Joel, what do you think about this? > > I've been waiting for your patch for (1). I am wary of the (2) > approach. Not because it wouldn't work for mkdir(2) - I think it would. > But rmdir(2) has the same recursive locking, with far more importance > (live objects), and would print the same error. ?? (2) is fine for both mkdir() and rmdir(), since they both lock inodes' mutexes along paths, and only paths. The problem is depend_item(). Anyway, I'll post patches based on (1) and later I'll propose patches based on (2). Louis -- Dr Louis Rilling Kerlabs Skype: louis.rilling Batiment Germanium Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes http://www.kerlabs.com/ 35700 Rennes [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 53+ messages in thread
* [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() 2008-12-12 10:06 ` Louis Rilling @ 2008-12-12 15:29 ` Louis Rilling 2008-12-17 21:40 ` [Cluster-devel] " Andrew Morton 0 siblings, 1 reply; 53+ messages in thread From: Louis Rilling @ 2008-12-12 15:29 UTC (permalink / raw) To: Joel Becker; +Cc: linux-kernel, cluster-devel, swhiteho, Louis Rilling When attaching default groups (subdirs) of a new group (in mkdir() or in configfs_register()), configfs recursively takes inode's mutexes along the path from the parent of the new group to the default subdirs. This is needed to ensure that the VFS will not race with operations on these sub-dirs. This is safe for the following reasons: - the VFS allows one to lock first an inode and second one of its children (The lock subclasses for this pattern are respectively I_MUTEX_PARENT and I_MUTEX_CHILD); - from this rule any inode path can be recursively locked in descending order as long as it stays under a single mountpoint and does not follow symlinks. Unfortunately lockdep does not know (yet?) how to handle such recursion. I've tried to use Peter Zijlstra's lock_set_subclass() helper to upgrade i_mutexes from I_MUTEX_CHILD to I_MUTEX_PARENT when we know that we might recursively lock some of their descendant, but this usage does not seem to fit the purpose of lock_set_subclass() because it leads to several i_mutex locked with subclass I_MUTEX_PARENT by the same task. >From inside configfs it is not possible to serialize those recursive locking with a top-level one, because mkdir() and rmdir() are already called with inodes locked by the VFS. So using some mutex_lock_nest_lock() is not an option. I am proposing two solutions: 1) one that wraps recursive mutex_lock()s with lockdep_off()/lockdep_on(). 2) (as suggested earlier by Peter Zijlstra) one that puts the i_mutexes recursively locked in different classes based on their depth from the top-level config_group created. This induces an arbitrary limit (MAX_LOCK_DEPTH - 2 == 46) on the nesting of configfs default groups whenever lockdep is activated but this limit looks reasonably high. Unfortunately, this alos isolates VFS operations on configfs default groups from the others and thus lowers the chances to detect locking issues. This patch implements solution 1). Solution 2) looks better from lockdep's point of view, but fails with configfs_depend_item(). This needs to rework the locking scheme of configfs_depend_item() by removing the variable lock recursion depth, and I think that it's doable thanks to the configfs_dirent_lock. For now, let's stick to solution 1). Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com> --- fs/configfs/dir.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 59 insertions(+), 0 deletions(-) diff --git a/fs/configfs/dir.c b/fs/configfs/dir.c index 8e93341..9c23583 100644 --- a/fs/configfs/dir.c +++ b/fs/configfs/dir.c @@ -553,12 +553,24 @@ static void detach_groups(struct config_group *group) child = sd->s_dentry; + /* + * Note: we hide this from lockdep since we have no way + * to teach lockdep about recursive + * I_MUTEX_PARENT -> I_MUTEX_CHILD patterns along a path + * in an inode tree, which are valid as soon as + * I_MUTEX_PARENT -> I_MUTEX_CHILD is valid from a + * parent inode to one of its children. + */ + lockdep_off(); mutex_lock(&child->d_inode->i_mutex); + lockdep_on(); configfs_detach_group(sd->s_element); child->d_inode->i_flags |= S_DEAD; + lockdep_off(); mutex_unlock(&child->d_inode->i_mutex); + lockdep_on(); d_delete(child); dput(child); @@ -748,11 +760,22 @@ static int configfs_attach_item(struct config_item *parent_item, * We are going to remove an inode and its dentry but * the VFS may already have hit and used them. Thus, * we must lock them as rmdir() would. + * + * Note: we hide this from lockdep since we have no way + * to teach lockdep about recursive + * I_MUTEX_PARENT -> I_MUTEX_CHILD patterns along a path + * in an inode tree, which are valid as soon as + * I_MUTEX_PARENT -> I_MUTEX_CHILD is valid from a + * parent inode to one of its children. */ + lockdep_off(); mutex_lock(&dentry->d_inode->i_mutex); + lockdep_on(); configfs_remove_dir(item); dentry->d_inode->i_flags |= S_DEAD; + lockdep_off(); mutex_unlock(&dentry->d_inode->i_mutex); + lockdep_on(); d_delete(dentry); } } @@ -787,14 +810,25 @@ static int configfs_attach_group(struct config_item *parent_item, * * We must also lock the inode to remove it safely in case of * error, as rmdir() would. + * + * Note: we hide this from lockdep since we have no way + * to teach lockdep about recursive + * I_MUTEX_PARENT -> I_MUTEX_CHILD patterns along a path + * in an inode tree, which are valid as soon as + * I_MUTEX_PARENT -> I_MUTEX_CHILD is valid from a + * parent inode to one of its children. */ + lockdep_off(); mutex_lock_nested(&dentry->d_inode->i_mutex, I_MUTEX_CHILD); + lockdep_on(); ret = populate_groups(to_config_group(item)); if (ret) { configfs_detach_item(item); dentry->d_inode->i_flags |= S_DEAD; } + lockdep_off(); mutex_unlock(&dentry->d_inode->i_mutex); + lockdep_on(); if (ret) d_delete(dentry); } @@ -956,7 +990,17 @@ static int configfs_depend_prep(struct dentry *origin, BUG_ON(!origin || !sd); /* Lock this guy on the way down */ + /* + * Note: we hide this from lockdep since we have no way + * to teach lockdep about recursive + * I_MUTEX_PARENT -> I_MUTEX_CHILD patterns along a path + * in an inode tree, which are valid as soon as + * I_MUTEX_PARENT -> I_MUTEX_CHILD is valid from a + * parent inode to one of its children. + */ + lockdep_off(); mutex_lock(&sd->s_dentry->d_inode->i_mutex); + lockdep_on(); if (sd->s_element == target) /* Boo-yah */ goto out; @@ -970,7 +1014,9 @@ static int configfs_depend_prep(struct dentry *origin, } /* We looped all our children and didn't find target */ + lockdep_off(); mutex_unlock(&sd->s_dentry->d_inode->i_mutex); + lockdep_on(); ret = -ENOENT; out: @@ -990,11 +1036,16 @@ static void configfs_depend_rollback(struct dentry *origin, struct dentry *dentry = item->ci_dentry; while (dentry != origin) { + /* See comments in configfs_depend_prep() */ + lockdep_off(); mutex_unlock(&dentry->d_inode->i_mutex); + lockdep_on(); dentry = dentry->d_parent; } + lockdep_off(); mutex_unlock(&origin->d_inode->i_mutex); + lockdep_on(); } int configfs_depend_item(struct configfs_subsystem *subsys, @@ -1329,8 +1380,16 @@ static int configfs_rmdir(struct inode *dir, struct dentry *dentry) } /* Wait until the racing operation terminates */ + /* + * Note: we hide this from lockdep since we are locked + * with subclass I_MUTEX_NORMAL from vfs_rmdir() (why + * not I_MUTEX_CHILD?), and I_MUTEX_XATTR or + * I_MUTEX_QUOTA are not relevant for the locked inode. + */ + lockdep_off(); mutex_lock(wait_mutex); mutex_unlock(wait_mutex); + lockdep_on(); } } while (ret == -EAGAIN); -- 1.5.6.5 ^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() 2008-12-12 15:29 ` [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() Louis Rilling @ 2008-12-17 21:40 ` Andrew Morton 0 siblings, 0 replies; 53+ messages in thread From: Andrew Morton @ 2008-12-17 21:40 UTC (permalink / raw) To: Louis Rilling Cc: Joel.Becker, linux-kernel, cluster-devel, swhiteho, louis.rilling, Peter Zijlstra On Fri, 12 Dec 2008 16:29:11 +0100 Louis Rilling <louis.rilling@kerlabs.com> wrote: > When attaching default groups (subdirs) of a new group (in mkdir() or > in configfs_register()), configfs recursively takes inode's mutexes > along the path from the parent of the new group to the default > subdirs. This is needed to ensure that the VFS will not race with > operations on these sub-dirs. This is safe for the following reasons: > > - the VFS allows one to lock first an inode and second one of its > children (The lock subclasses for this pattern are respectively > I_MUTEX_PARENT and I_MUTEX_CHILD); > - from this rule any inode path can be recursively locked in > descending order as long as it stays under a single mountpoint and > does not follow symlinks. > > Unfortunately lockdep does not know (yet?) how to handle such > recursion. > > I've tried to use Peter Zijlstra's lock_set_subclass() helper to > upgrade i_mutexes from I_MUTEX_CHILD to I_MUTEX_PARENT when we know > that we might recursively lock some of their descendant, but this > usage does not seem to fit the purpose of lock_set_subclass() because > it leads to several i_mutex locked with subclass I_MUTEX_PARENT by > the same task. > > >From inside configfs it is not possible to serialize those recursive > locking with a top-level one, because mkdir() and rmdir() are already > called with inodes locked by the VFS. So using some > mutex_lock_nest_lock() is not an option. > > I am proposing two solutions: > 1) one that wraps recursive mutex_lock()s with > lockdep_off()/lockdep_on(). > 2) (as suggested earlier by Peter Zijlstra) one that puts the > i_mutexes recursively locked in different classes based on their > depth from the top-level config_group created. This > induces an arbitrary limit (MAX_LOCK_DEPTH - 2 == 46) on the > nesting of configfs default groups whenever lockdep is activated > but this limit looks reasonably high. Unfortunately, this alos > isolates VFS operations on configfs default groups from the others > and thus lowers the chances to detect locking issues. > > This patch implements solution 1). > > Solution 2) looks better from lockdep's point of view, but fails with > configfs_depend_item(). This needs to rework the locking > scheme of configfs_depend_item() by removing the variable lock recursion > depth, and I think that it's doable thanks to the configfs_dirent_lock. > For now, let's stick to solution 1). > > Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com> > --- > fs/configfs/dir.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 files changed, 59 insertions(+), 0 deletions(-) > > diff --git a/fs/configfs/dir.c b/fs/configfs/dir.c > index 8e93341..9c23583 100644 > --- a/fs/configfs/dir.c > +++ b/fs/configfs/dir.c > @@ -553,12 +553,24 @@ static void detach_groups(struct config_group *group) > > child = sd->s_dentry; > > + /* > + * Note: we hide this from lockdep since we have no way > + * to teach lockdep about recursive > + * I_MUTEX_PARENT -> I_MUTEX_CHILD patterns along a path > + * in an inode tree, which are valid as soon as > + * I_MUTEX_PARENT -> I_MUTEX_CHILD is valid from a > + * parent inode to one of its children. > + */ > + lockdep_off(); > mutex_lock(&child->d_inode->i_mutex); > + lockdep_on(); > > [etc] > Oh dear, what an unpleasant patch. Peter, can this be saved? ^ permalink raw reply [flat|nested] 53+ messages in thread
* [Cluster-devel] Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() @ 2008-12-17 21:40 ` Andrew Morton 0 siblings, 0 replies; 53+ messages in thread From: Andrew Morton @ 2008-12-17 21:40 UTC (permalink / raw) To: cluster-devel.redhat.com On Fri, 12 Dec 2008 16:29:11 +0100 Louis Rilling <louis.rilling@kerlabs.com> wrote: > When attaching default groups (subdirs) of a new group (in mkdir() or > in configfs_register()), configfs recursively takes inode's mutexes > along the path from the parent of the new group to the default > subdirs. This is needed to ensure that the VFS will not race with > operations on these sub-dirs. This is safe for the following reasons: > > - the VFS allows one to lock first an inode and second one of its > children (The lock subclasses for this pattern are respectively > I_MUTEX_PARENT and I_MUTEX_CHILD); > - from this rule any inode path can be recursively locked in > descending order as long as it stays under a single mountpoint and > does not follow symlinks. > > Unfortunately lockdep does not know (yet?) how to handle such > recursion. > > I've tried to use Peter Zijlstra's lock_set_subclass() helper to > upgrade i_mutexes from I_MUTEX_CHILD to I_MUTEX_PARENT when we know > that we might recursively lock some of their descendant, but this > usage does not seem to fit the purpose of lock_set_subclass() because > it leads to several i_mutex locked with subclass I_MUTEX_PARENT by > the same task. > > >From inside configfs it is not possible to serialize those recursive > locking with a top-level one, because mkdir() and rmdir() are already > called with inodes locked by the VFS. So using some > mutex_lock_nest_lock() is not an option. > > I am proposing two solutions: > 1) one that wraps recursive mutex_lock()s with > lockdep_off()/lockdep_on(). > 2) (as suggested earlier by Peter Zijlstra) one that puts the > i_mutexes recursively locked in different classes based on their > depth from the top-level config_group created. This > induces an arbitrary limit (MAX_LOCK_DEPTH - 2 == 46) on the > nesting of configfs default groups whenever lockdep is activated > but this limit looks reasonably high. Unfortunately, this alos > isolates VFS operations on configfs default groups from the others > and thus lowers the chances to detect locking issues. > > This patch implements solution 1). > > Solution 2) looks better from lockdep's point of view, but fails with > configfs_depend_item(). This needs to rework the locking > scheme of configfs_depend_item() by removing the variable lock recursion > depth, and I think that it's doable thanks to the configfs_dirent_lock. > For now, let's stick to solution 1). > > Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com> > --- > fs/configfs/dir.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 files changed, 59 insertions(+), 0 deletions(-) > > diff --git a/fs/configfs/dir.c b/fs/configfs/dir.c > index 8e93341..9c23583 100644 > --- a/fs/configfs/dir.c > +++ b/fs/configfs/dir.c > @@ -553,12 +553,24 @@ static void detach_groups(struct config_group *group) > > child = sd->s_dentry; > > + /* > + * Note: we hide this from lockdep since we have no way > + * to teach lockdep about recursive > + * I_MUTEX_PARENT -> I_MUTEX_CHILD patterns along a path > + * in an inode tree, which are valid as soon as > + * I_MUTEX_PARENT -> I_MUTEX_CHILD is valid from a > + * parent inode to one of its children. > + */ > + lockdep_off(); > mutex_lock(&child->d_inode->i_mutex); > + lockdep_on(); > > [etc] > Oh dear, what an unpleasant patch. Peter, can this be saved? ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() 2008-12-17 21:40 ` [Cluster-devel] " Andrew Morton @ 2008-12-17 22:03 ` Joel Becker -1 siblings, 0 replies; 53+ messages in thread From: Joel Becker @ 2008-12-17 22:03 UTC (permalink / raw) To: Andrew Morton Cc: Louis Rilling, linux-kernel, cluster-devel, swhiteho, Peter Zijlstra On Wed, Dec 17, 2008 at 01:40:20PM -0800, Andrew Morton wrote: > On Fri, 12 Dec 2008 16:29:11 +0100 > Louis Rilling <louis.rilling@kerlabs.com> wrote: > > > When attaching default groups (subdirs) of a new group (in mkdir() or > > in configfs_register()), configfs recursively takes inode's mutexes > > along the path from the parent of the new group to the default > > subdirs. This is needed to ensure that the VFS will not race with > > operations on these sub-dirs. This is safe for the following reasons: > > > > - the VFS allows one to lock first an inode and second one of its > > children (The lock subclasses for this pattern are respectively > > I_MUTEX_PARENT and I_MUTEX_CHILD); > > - from this rule any inode path can be recursively locked in > > descending order as long as it stays under a single mountpoint and > > does not follow symlinks. > > > > Unfortunately lockdep does not know (yet?) how to handle such > > recursion. > > > > I've tried to use Peter Zijlstra's lock_set_subclass() helper to > > upgrade i_mutexes from I_MUTEX_CHILD to I_MUTEX_PARENT when we know > > that we might recursively lock some of their descendant, but this > > usage does not seem to fit the purpose of lock_set_subclass() because > > it leads to several i_mutex locked with subclass I_MUTEX_PARENT by > > the same task. > > > > >From inside configfs it is not possible to serialize those recursive > > locking with a top-level one, because mkdir() and rmdir() are already > > called with inodes locked by the VFS. So using some > > mutex_lock_nest_lock() is not an option. > > > > I am proposing two solutions: > > 1) one that wraps recursive mutex_lock()s with > > lockdep_off()/lockdep_on(). > > 2) (as suggested earlier by Peter Zijlstra) one that puts the > > i_mutexes recursively locked in different classes based on their > > depth from the top-level config_group created. This > > induces an arbitrary limit (MAX_LOCK_DEPTH - 2 == 46) on the > > nesting of configfs default groups whenever lockdep is activated > > but this limit looks reasonably high. Unfortunately, this alos > > isolates VFS operations on configfs default groups from the others > > and thus lowers the chances to detect locking issues. > > > > This patch implements solution 1). > > > > Solution 2) looks better from lockdep's point of view, but fails with > > configfs_depend_item(). This needs to rework the locking > > scheme of configfs_depend_item() by removing the variable lock recursion > > depth, and I think that it's doable thanks to the configfs_dirent_lock. > > For now, let's stick to solution 1). > > > > Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com> > > --- > > fs/configfs/dir.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++++ > > 1 files changed, 59 insertions(+), 0 deletions(-) > > > > diff --git a/fs/configfs/dir.c b/fs/configfs/dir.c > > index 8e93341..9c23583 100644 > > --- a/fs/configfs/dir.c > > +++ b/fs/configfs/dir.c > > @@ -553,12 +553,24 @@ static void detach_groups(struct config_group *group) > > > > child = sd->s_dentry; > > > > + /* > > + * Note: we hide this from lockdep since we have no way > > + * to teach lockdep about recursive > > + * I_MUTEX_PARENT -> I_MUTEX_CHILD patterns along a path > > + * in an inode tree, which are valid as soon as > > + * I_MUTEX_PARENT -> I_MUTEX_CHILD is valid from a > > + * parent inode to one of its children. > > + */ > > + lockdep_off(); > > mutex_lock(&child->d_inode->i_mutex); > > + lockdep_on(); > > > > [etc] > > > > Oh dear, what an unpleasant patch. > > Peter, can this be saved? I'd love to see it work within lockdep, but it seems rather hard, so that's why I recommended Louis cook up this version. I see you picked it up in -mm. Do you want me to push it through ocfs2.git? Joel -- Life's Little Instruction Book #157 "Take time to smell the roses." Joel Becker Principal Software Developer Oracle E-mail: joel.becker@oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 53+ messages in thread
* [Cluster-devel] Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() @ 2008-12-17 22:03 ` Joel Becker 0 siblings, 0 replies; 53+ messages in thread From: Joel Becker @ 2008-12-17 22:03 UTC (permalink / raw) To: cluster-devel.redhat.com On Wed, Dec 17, 2008 at 01:40:20PM -0800, Andrew Morton wrote: > On Fri, 12 Dec 2008 16:29:11 +0100 > Louis Rilling <louis.rilling@kerlabs.com> wrote: > > > When attaching default groups (subdirs) of a new group (in mkdir() or > > in configfs_register()), configfs recursively takes inode's mutexes > > along the path from the parent of the new group to the default > > subdirs. This is needed to ensure that the VFS will not race with > > operations on these sub-dirs. This is safe for the following reasons: > > > > - the VFS allows one to lock first an inode and second one of its > > children (The lock subclasses for this pattern are respectively > > I_MUTEX_PARENT and I_MUTEX_CHILD); > > - from this rule any inode path can be recursively locked in > > descending order as long as it stays under a single mountpoint and > > does not follow symlinks. > > > > Unfortunately lockdep does not know (yet?) how to handle such > > recursion. > > > > I've tried to use Peter Zijlstra's lock_set_subclass() helper to > > upgrade i_mutexes from I_MUTEX_CHILD to I_MUTEX_PARENT when we know > > that we might recursively lock some of their descendant, but this > > usage does not seem to fit the purpose of lock_set_subclass() because > > it leads to several i_mutex locked with subclass I_MUTEX_PARENT by > > the same task. > > > > >From inside configfs it is not possible to serialize those recursive > > locking with a top-level one, because mkdir() and rmdir() are already > > called with inodes locked by the VFS. So using some > > mutex_lock_nest_lock() is not an option. > > > > I am proposing two solutions: > > 1) one that wraps recursive mutex_lock()s with > > lockdep_off()/lockdep_on(). > > 2) (as suggested earlier by Peter Zijlstra) one that puts the > > i_mutexes recursively locked in different classes based on their > > depth from the top-level config_group created. This > > induces an arbitrary limit (MAX_LOCK_DEPTH - 2 == 46) on the > > nesting of configfs default groups whenever lockdep is activated > > but this limit looks reasonably high. Unfortunately, this alos > > isolates VFS operations on configfs default groups from the others > > and thus lowers the chances to detect locking issues. > > > > This patch implements solution 1). > > > > Solution 2) looks better from lockdep's point of view, but fails with > > configfs_depend_item(). This needs to rework the locking > > scheme of configfs_depend_item() by removing the variable lock recursion > > depth, and I think that it's doable thanks to the configfs_dirent_lock. > > For now, let's stick to solution 1). > > > > Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com> > > --- > > fs/configfs/dir.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++++ > > 1 files changed, 59 insertions(+), 0 deletions(-) > > > > diff --git a/fs/configfs/dir.c b/fs/configfs/dir.c > > index 8e93341..9c23583 100644 > > --- a/fs/configfs/dir.c > > +++ b/fs/configfs/dir.c > > @@ -553,12 +553,24 @@ static void detach_groups(struct config_group *group) > > > > child = sd->s_dentry; > > > > + /* > > + * Note: we hide this from lockdep since we have no way > > + * to teach lockdep about recursive > > + * I_MUTEX_PARENT -> I_MUTEX_CHILD patterns along a path > > + * in an inode tree, which are valid as soon as > > + * I_MUTEX_PARENT -> I_MUTEX_CHILD is valid from a > > + * parent inode to one of its children. > > + */ > > + lockdep_off(); > > mutex_lock(&child->d_inode->i_mutex); > > + lockdep_on(); > > > > [etc] > > > > Oh dear, what an unpleasant patch. > > Peter, can this be saved? I'd love to see it work within lockdep, but it seems rather hard, so that's why I recommended Louis cook up this version. I see you picked it up in -mm. Do you want me to push it through ocfs2.git? Joel -- Life's Little Instruction Book #157 "Take time to smell the roses." Joel Becker Principal Software Developer Oracle E-mail: joel.becker at oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() 2008-12-17 22:03 ` [Cluster-devel] " Joel Becker @ 2008-12-17 22:09 ` Andrew Morton -1 siblings, 0 replies; 53+ messages in thread From: Andrew Morton @ 2008-12-17 22:09 UTC (permalink / raw) To: Joel Becker Cc: louis.rilling, linux-kernel, cluster-devel, swhiteho, a.p.zijlstra On Wed, 17 Dec 2008 14:03:10 -0800 Joel Becker <Joel.Becker@oracle.com> wrote: > > > > > > [etc] > > > > > > > Oh dear, what an unpleasant patch. > > > > Peter, can this be saved? > > I'd love to see it work within lockdep, but it seems rather > hard, so that's why I recommended Louis cook up this version. I see you > picked it up in -mm. Do you want me to push it through ocfs2.git? Please do. If Peter refuses to save us. ^ permalink raw reply [flat|nested] 53+ messages in thread
* [Cluster-devel] Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() @ 2008-12-17 22:09 ` Andrew Morton 0 siblings, 0 replies; 53+ messages in thread From: Andrew Morton @ 2008-12-17 22:09 UTC (permalink / raw) To: cluster-devel.redhat.com On Wed, 17 Dec 2008 14:03:10 -0800 Joel Becker <Joel.Becker@oracle.com> wrote: > > > > > > [etc] > > > > > > > Oh dear, what an unpleasant patch. > > > > Peter, can this be saved? > > I'd love to see it work within lockdep, but it seems rather > hard, so that's why I recommended Louis cook up this version. I see you > picked it up in -mm. Do you want me to push it through ocfs2.git? Please do. If Peter refuses to save us. ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() 2008-12-17 21:40 ` [Cluster-devel] " Andrew Morton (?) (?) @ 2008-12-18 7:26 ` Peter Zijlstra 2008-12-18 9:27 ` [Cluster-devel] " Joel Becker -1 siblings, 1 reply; 53+ messages in thread From: Peter Zijlstra @ 2008-12-18 7:26 UTC (permalink / raw) To: Andrew Morton Cc: Louis Rilling, Joel.Becker, linux-kernel, cluster-devel, swhiteho On Wed, 2008-12-17 at 13:40 -0800, Andrew Morton wrote: > On Fri, 12 Dec 2008 16:29:11 +0100 > Louis Rilling <louis.rilling@kerlabs.com> wrote: > > > When attaching default groups (subdirs) of a new group (in mkdir() or > > in configfs_register()), configfs recursively takes inode's mutexes > > along the path from the parent of the new group to the default > > subdirs. This is needed to ensure that the VFS will not race with > > operations on these sub-dirs. This is safe for the following reasons: > > > > - the VFS allows one to lock first an inode and second one of its > > children (The lock subclasses for this pattern are respectively > > I_MUTEX_PARENT and I_MUTEX_CHILD); > > - from this rule any inode path can be recursively locked in > > descending order as long as it stays under a single mountpoint and > > does not follow symlinks. > > > > Unfortunately lockdep does not know (yet?) how to handle such > > recursion. > > > > I've tried to use Peter Zijlstra's lock_set_subclass() helper to > > upgrade i_mutexes from I_MUTEX_CHILD to I_MUTEX_PARENT when we know > > that we might recursively lock some of their descendant, but this > > usage does not seem to fit the purpose of lock_set_subclass() because > > it leads to several i_mutex locked with subclass I_MUTEX_PARENT by > > the same task. > > > > >From inside configfs it is not possible to serialize those recursive > > locking with a top-level one, because mkdir() and rmdir() are already > > called with inodes locked by the VFS. So using some > > mutex_lock_nest_lock() is not an option. > > > > I am proposing two solutions: > > 1) one that wraps recursive mutex_lock()s with > > lockdep_off()/lockdep_on(). > > 2) (as suggested earlier by Peter Zijlstra) one that puts the > > i_mutexes recursively locked in different classes based on their > > depth from the top-level config_group created. This > > induces an arbitrary limit (MAX_LOCK_DEPTH - 2 == 46) on the > > nesting of configfs default groups whenever lockdep is activated > > but this limit looks reasonably high. Unfortunately, this alos > > isolates VFS operations on configfs default groups from the others > > and thus lowers the chances to detect locking issues. > > > > This patch implements solution 1). > > > > Solution 2) looks better from lockdep's point of view, but fails with > > configfs_depend_item(). This needs to rework the locking > > scheme of configfs_depend_item() by removing the variable lock recursion > > depth, and I think that it's doable thanks to the configfs_dirent_lock. > > For now, let's stick to solution 1). > > > > Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com> > > --- > > fs/configfs/dir.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++++ > > 1 files changed, 59 insertions(+), 0 deletions(-) > > > > diff --git a/fs/configfs/dir.c b/fs/configfs/dir.c > > index 8e93341..9c23583 100644 > > --- a/fs/configfs/dir.c > > +++ b/fs/configfs/dir.c > > @@ -553,12 +553,24 @@ static void detach_groups(struct config_group *group) > > > > child = sd->s_dentry; > > > > + /* > > + * Note: we hide this from lockdep since we have no way > > + * to teach lockdep about recursive > > + * I_MUTEX_PARENT -> I_MUTEX_CHILD patterns along a path > > + * in an inode tree, which are valid as soon as > > + * I_MUTEX_PARENT -> I_MUTEX_CHILD is valid from a > > + * parent inode to one of its children. > > + */ > > + lockdep_off(); > > mutex_lock(&child->d_inode->i_mutex); > > + lockdep_on(); > > > > [etc] > > > > Oh dear, what an unpleasant patch. > > Peter, can this be saved? I'm still not sure why configfs is the only virtual filesystem that suffers this - something in its design is weird (and not so wonderfull). Also, while I usually applaud fine grained locking, is configfs really in need of that?, I mean, its not like its meant to create and remove directories all day every day at breakneck speed, right? AFAIK you just stomp some data in to configure some kernel stuff and then let it sit for the duration of whatever that kernel thing does while it does it. See I'm not even clear on what configfs is.. and why its better than sysfs for example.. - /me goes read configfs.txt Right, so basically we avoid syscalls by making vfs ops do stuff.. lovely - still not seeing it though - regular filesystems seems to cope just fine and they get to create arbitrary tree structures too. Yes lockdep has 2 major weaknesses - 1) it doesn't support arbitrary lock depths (and I've tried, twice, to fix that, but its a _hard_ problem) and 2) it can't deal with arbitrary recursion. The thing is, in practise it turns out that reworking code to not run into these issues often makes the code better - if only for the fact that taking locks is expensive and doing less is better, and holding locks stifles concurrency, so holding less it better (yes, I said _often_, there likely are counter cases but I don't believe configfs is one of them). Anyway - I'm against just turning lockdep off, that will make folks complacent and let the stuff rot to bits inside - and I for one will refuse to run anything using it (but since that only seems to be dlm/ocfs and I'm of the believe that centralized cluster stuff sucks rocks anyway that won't be a problem). ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() 2008-12-18 7:26 ` Peter Zijlstra @ 2008-12-18 9:27 ` Joel Becker 0 siblings, 0 replies; 53+ messages in thread From: Joel Becker @ 2008-12-18 9:27 UTC (permalink / raw) To: Peter Zijlstra Cc: Andrew Morton, Louis Rilling, linux-kernel, cluster-devel, swhiteho On Thu, Dec 18, 2008 at 08:26:48AM +0100, Peter Zijlstra wrote: > On Wed, 2008-12-17 at 13:40 -0800, Andrew Morton wrote: > > On Fri, 12 Dec 2008 16:29:11 +0100 > > Louis Rilling <louis.rilling@kerlabs.com> wrote: > > > >From inside configfs it is not possible to serialize those recursive > > > locking with a top-level one, because mkdir() and rmdir() are already > > > called with inodes locked by the VFS. So using some > > > mutex_lock_nest_lock() is not an option. <snip> > > > > Oh dear, what an unpleasant patch. > > > > Peter, can this be saved? > > I'm still not sure why configfs is the only virtual filesystem that > suffers this - something in its design is weird (and not so wonderfull). > > Also, while I usually applaud fine grained locking, is configfs really > in need of that?, I mean, its not like its meant to create and remove > directories all day every day at breakneck speed, right? AFAIK you just > stomp some data in to configure some kernel stuff and then let it sit > for the duration of whatever that kernel thing does while it does it. It's not about breakneck speed, it's about living in the VFS. But I think you get that later. > See I'm not even clear on what configfs is.. and why its better than > sysfs for example.. - /me goes read configfs.txt > > Right, so basically we avoid syscalls by making vfs ops do stuff.. > lovely - still not seeing it though - regular filesystems seems to cope > just fine and they get to create arbitrary tree structures too. It's about the default_groups and how they build up and tear down small bits of tree. A simple creation of a config_item, a mkdir(2), is a normal VFS lock set and doesn't make lockdep unhappy. But if the new config_item has a default_group or two, they need locking too. Not so much on mkdir(2), but on rmdir(2). > Yes lockdep has 2 major weaknesses - 1) it doesn't support arbitrary > lock depths (and I've tried, twice, to fix that, but its a _hard_ > problem) and 2) it can't deal with arbitrary recursion. I know it's hard, or I'd have sent you patches :-) In fact, Louis tried to use the subclass bits to make this work to a depth of N (where N was probably deep enough in practice). However, this creates subclasses that don't get seen by the regular VFS locking - and the big deal here is making sure configfs's use of i_mutex meshes with the VFS. That is, his code made the warnings go away, but removed much of lockdep's ability to see when we got the locking wrong. > The thing is, in practise it turns out that reworking code to not run > into these issues often makes the code better - if only for the fact > that taking locks is expensive and doing less is better, and holding > locks stifles concurrency, so holding less it better (yes, I said > _often_, there likely are counter cases but I don't believe configfs is > one of them). This isn't about concurrency or speed. This is about safety while configfs is attaching or (especially) detaching config_items from the filesystem view it presents. When the VFS travels down a path, it unlocks the trailing directory. We can't do that when tearing down default groups, because we need to lock that small hunk and tear it out atomically. > Anyway - I'm against just turning lockdep off, that will make folks > complacent and let the stuff rot to bits inside - and I for one will > refuse to run anything using it (but since that only seems to be > dlm/ocfs and I'm of the believe that centralized cluster stuff sucks > rocks anyway that won't be a problem). Oh, be nice :-) You are absolutely right that turning off lockdep leaves the possibility of complacency and bitrot. That's precisely why I didn't like Louis' subclass solution - again, bitrot might go unnoticed. Now, I know that I will be paying attention to the locking and going over it with a fine-toothed comb. But I'd much rather have an actual working lockdep analysis. Whether that means we find a way for lockdep to describe what's happening here, or we find another way to keep folks out of the tree we're removing, I don't care. Joel -- Life's Little Instruction Book #109 "Know how to drive a stick shift." Joel Becker Principal Software Developer Oracle E-mail: joel.becker@oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 53+ messages in thread
* [Cluster-devel] Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() @ 2008-12-18 9:27 ` Joel Becker 0 siblings, 0 replies; 53+ messages in thread From: Joel Becker @ 2008-12-18 9:27 UTC (permalink / raw) To: cluster-devel.redhat.com On Thu, Dec 18, 2008 at 08:26:48AM +0100, Peter Zijlstra wrote: > On Wed, 2008-12-17 at 13:40 -0800, Andrew Morton wrote: > > On Fri, 12 Dec 2008 16:29:11 +0100 > > Louis Rilling <louis.rilling@kerlabs.com> wrote: > > > >From inside configfs it is not possible to serialize those recursive > > > locking with a top-level one, because mkdir() and rmdir() are already > > > called with inodes locked by the VFS. So using some > > > mutex_lock_nest_lock() is not an option. <snip> > > > > Oh dear, what an unpleasant patch. > > > > Peter, can this be saved? > > I'm still not sure why configfs is the only virtual filesystem that > suffers this - something in its design is weird (and not so wonderfull). > > Also, while I usually applaud fine grained locking, is configfs really > in need of that?, I mean, its not like its meant to create and remove > directories all day every day at breakneck speed, right? AFAIK you just > stomp some data in to configure some kernel stuff and then let it sit > for the duration of whatever that kernel thing does while it does it. It's not about breakneck speed, it's about living in the VFS. But I think you get that later. > See I'm not even clear on what configfs is.. and why its better than > sysfs for example.. - /me goes read configfs.txt > > Right, so basically we avoid syscalls by making vfs ops do stuff.. > lovely - still not seeing it though - regular filesystems seems to cope > just fine and they get to create arbitrary tree structures too. It's about the default_groups and how they build up and tear down small bits of tree. A simple creation of a config_item, a mkdir(2), is a normal VFS lock set and doesn't make lockdep unhappy. But if the new config_item has a default_group or two, they need locking too. Not so much on mkdir(2), but on rmdir(2). > Yes lockdep has 2 major weaknesses - 1) it doesn't support arbitrary > lock depths (and I've tried, twice, to fix that, but its a _hard_ > problem) and 2) it can't deal with arbitrary recursion. I know it's hard, or I'd have sent you patches :-) In fact, Louis tried to use the subclass bits to make this work to a depth of N (where N was probably deep enough in practice). However, this creates subclasses that don't get seen by the regular VFS locking - and the big deal here is making sure configfs's use of i_mutex meshes with the VFS. That is, his code made the warnings go away, but removed much of lockdep's ability to see when we got the locking wrong. > The thing is, in practise it turns out that reworking code to not run > into these issues often makes the code better - if only for the fact > that taking locks is expensive and doing less is better, and holding > locks stifles concurrency, so holding less it better (yes, I said > _often_, there likely are counter cases but I don't believe configfs is > one of them). This isn't about concurrency or speed. This is about safety while configfs is attaching or (especially) detaching config_items from the filesystem view it presents. When the VFS travels down a path, it unlocks the trailing directory. We can't do that when tearing down default groups, because we need to lock that small hunk and tear it out atomically. > Anyway - I'm against just turning lockdep off, that will make folks > complacent and let the stuff rot to bits inside - and I for one will > refuse to run anything using it (but since that only seems to be > dlm/ocfs and I'm of the believe that centralized cluster stuff sucks > rocks anyway that won't be a problem). Oh, be nice :-) You are absolutely right that turning off lockdep leaves the possibility of complacency and bitrot. That's precisely why I didn't like Louis' subclass solution - again, bitrot might go unnoticed. Now, I know that I will be paying attention to the locking and going over it with a fine-toothed comb. But I'd much rather have an actual working lockdep analysis. Whether that means we find a way for lockdep to describe what's happening here, or we find another way to keep folks out of the tree we're removing, I don't care. Joel -- Life's Little Instruction Book #109 "Know how to drive a stick shift." Joel Becker Principal Software Developer Oracle E-mail: joel.becker at oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() 2008-12-18 9:27 ` [Cluster-devel] " Joel Becker (?) @ 2008-12-18 11:15 ` Louis Rilling 2008-12-18 18:00 ` Make lockdep happy with configfs Louis Rilling ` (2 more replies) -1 siblings, 3 replies; 53+ messages in thread From: Louis Rilling @ 2008-12-18 11:15 UTC (permalink / raw) To: linux-kernel; +Cc: Peter Zijlstra, Andrew Morton, cluster-devel, swhiteho [-- Attachment #1: Type: text/plain, Size: 4727 bytes --] On 18/12/08 1:27 -0800, Joel Becker wrote: > > I know it's hard, or I'd have sent you patches :-) In fact, > Louis tried to use the subclass bits to make this work to a depth of N > (where N was probably deep enough in practice). However, this creates > subclasses that don't get seen by the regular VFS locking - and the big > deal here is making sure configfs's use of i_mutex meshes with the VFS. > That is, his code made the warnings go away, but removed much of > lockdep's ability to see when we got the locking wrong. > > > The thing is, in practise it turns out that reworking code to not run > > into these issues often makes the code better - if only for the fact > > that taking locks is expensive and doing less is better, and holding > > locks stifles concurrency, so holding less it better (yes, I said > > _often_, there likely are counter cases but I don't believe configfs is > > one of them). > > This isn't about concurrency or speed. This is about safety > while configfs is attaching or (especially) detaching config_items from > the filesystem view it presents. When the VFS travels down a path, it > unlocks the trailing directory. We can't do that when tearing down > default groups, because we need to lock that small hunk and tear it out > atomically. > > > Anyway - I'm against just turning lockdep off, that will make folks > > complacent and let the stuff rot to bits inside - and I for one will > > refuse to run anything using it (but since that only seems to be > > dlm/ocfs and I'm of the believe that centralized cluster stuff sucks > > rocks anyway that won't be a problem). > > Oh, be nice :-) > You are absolutely right that turning off lockdep leaves the > possibility of complacency and bitrot. That's precisely why I didn't > like Louis' subclass solution - again, bitrot might go unnoticed. > Now, I know that I will be paying attention to the locking and > going over it with a fine-toothed comb. But I'd much rather have an > actual working lockdep analysis. Whether that means we find a way for > lockdep to describe what's happening here, or we find another way to > keep folks out of the tree we're removing, I don't care. Perhaps I didn't explain myself well. Quoting my original post: <quote> I am proposing two solutions: 1) one that wraps recursive mutex_lock()s with lockdep_off()/lockdep_on(). 2) (as suggested earlier by Peter Zijlstra) one that puts the i_mutexes recursively locked in different classes based on their depth from the top-level config_group created. This induces an arbitrary limit (MAX_LOCK_DEPTH - 2 == 46) on the nesting of configfs default groups whenever lockdep is activated but this limit looks reasonably high. Unfortunately, this alos isolates VFS operations on configfs default groups from the others and thus lowers the chances to detect locking issues. This patch implements solution 1). Solution 2) looks better from lockdep's point of view, but fails with configfs_depend_item(). This needs to rework the locking scheme of configfs_depend_item() by removing the variable lock recursion depth, and I think that it's doable thanks to the configfs_dirent_lock. For now, let's stick to solution 1). </quote> Solution 2) does not play with i_mutex sub-classes as I proposed earlier, but instead put default_groups' i_mutex in separate classes (actually one class per default group depth). This is not worse than putting each run queue lock in a separate class, as it used to be. For instance, if a created group A has default groups A/B, A/D, and A/B/C, A's i_mutex class will be the regular i_mutex class used everywhere else in the VFS, A/B and A/D will have default_group_class[0], and A/B/C will have default_group_class[1]. Of course those default_group classes will not benefit from locking schemes seen by lockdep outside configfs, but they still will interact nicely with the VFS. Moreover, a default group depth limit of 46 (MAX_LOCK_DEPTH - 2) looks rather reasonable, doesn't it? To me the real drawback of this solution is that it needs to rework locking in configfs_depend_item(). Peter says it is preferable, I know how it could be done, but as any code rework this may bring new bugs, and I realize that I'm spending time to explain this while 1) I don't have much time to just explain what could be done, 2) I'd prefer having time to code what I am explaining. Let's see if I can show you something today. Louis -- Dr Louis Rilling Kerlabs Skype: louis.rilling Batiment Germanium Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes http://www.kerlabs.com/ 35700 Rennes [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 53+ messages in thread
* Make lockdep happy with configfs 2008-12-18 11:15 ` Louis Rilling @ 2008-12-18 18:00 ` Louis Rilling 2009-01-26 11:51 ` Louis Rilling 2008-12-18 18:00 ` [PATCH 1/2] configfs: Silence lockdep on mkdir() and rmdir() Louis Rilling 2008-12-18 18:00 ` [PATCH 2/2] configfs: Rework configfs_depend_item() locking and make lockdep happy Louis Rilling 2 siblings, 1 reply; 53+ messages in thread From: Louis Rilling @ 2008-12-18 18:00 UTC (permalink / raw) To: Joel Becker; +Cc: linux-kernel, akpm, cluster-devel, swhiteho, peterz Hi, Here is a patchset implementing a more lockdep-friendly solution to make lockdep happy with configfs. The first patch could probably be cleaner, but first let's see if the approach is ok. I don't have a good setup to test the second patch beyond compilation, but I guess that Joel has one :) So, does it look better? Louis Louis Rilling (2): configfs: Silence lockdep on mkdir() and rmdir() configfs: Rework configfs_depend_item() locking and make lockdep happy fs/configfs/configfs_internal.h | 3 + fs/configfs/dir.c | 131 +++++++++++++++++++++++---------------- fs/configfs/inode.c | 28 ++++++++ 3 files changed, 108 insertions(+), 54 deletions(-) -- Dr Louis Rilling Kerlabs Skype: louis.rilling Batiment Germanium Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes http://www.kerlabs.com/ 35700 Rennes ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Make lockdep happy with configfs 2008-12-18 18:00 ` Make lockdep happy with configfs Louis Rilling @ 2009-01-26 11:51 ` Louis Rilling 2009-01-28 3:44 ` [Cluster-devel] " Joel Becker 0 siblings, 1 reply; 53+ messages in thread From: Louis Rilling @ 2009-01-26 11:51 UTC (permalink / raw) To: Joel Becker; +Cc: linux-kernel, akpm, cluster-devel, swhiteho, peterz [-- Attachment #1: Type: text/plain, Size: 706 bytes --] On 18/12/08 19:00 +0100, Louis Rilling wrote: > > Hi, > > Here is a patchset implementing a more lockdep-friendly solution to make lockdep > happy with configfs. > > The first patch could probably be cleaner, but first let's see if the > approach is ok. > I don't have a good setup to test the second patch beyond compilation, but I > guess that Joel has one :) > > So, does it look better? Joel, Are you going to take one of these approaches to make lockdep and configfs cohabit? Thanks, Louis -- Dr Louis Rilling Kerlabs Skype: louis.rilling Batiment Germanium Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes http://www.kerlabs.com/ 35700 Rennes [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: Make lockdep happy with configfs 2009-01-26 11:51 ` Louis Rilling @ 2009-01-28 3:44 ` Joel Becker 0 siblings, 0 replies; 53+ messages in thread From: Joel Becker @ 2009-01-28 3:44 UTC (permalink / raw) To: linux-kernel, akpm, cluster-devel, swhiteho, peterz On Mon, Jan 26, 2009 at 12:51:52PM +0100, Louis Rilling wrote: > Are you going to take one of these approaches to make lockdep and configfs > cohabit? I'm going to look at your patches now and do some review - sorry it took me so long. I do think our discussion with Peter elsewhere may have even better fruit, so I'm going to hold of on including these just yet. Joel -- Life's Little Instruction Book #510 "Count your blessings." Joel Becker Principal Software Developer Oracle E-mail: joel.becker@oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 53+ messages in thread
* [Cluster-devel] Re: Make lockdep happy with configfs @ 2009-01-28 3:44 ` Joel Becker 0 siblings, 0 replies; 53+ messages in thread From: Joel Becker @ 2009-01-28 3:44 UTC (permalink / raw) To: cluster-devel.redhat.com On Mon, Jan 26, 2009 at 12:51:52PM +0100, Louis Rilling wrote: > Are you going to take one of these approaches to make lockdep and configfs > cohabit? I'm going to look at your patches now and do some review - sorry it took me so long. I do think our discussion with Peter elsewhere may have even better fruit, so I'm going to hold of on including these just yet. Joel -- Life's Little Instruction Book #510 "Count your blessings." Joel Becker Principal Software Developer Oracle E-mail: joel.becker at oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 53+ messages in thread
* [PATCH 1/2] configfs: Silence lockdep on mkdir() and rmdir() 2008-12-18 11:15 ` Louis Rilling 2008-12-18 18:00 ` Make lockdep happy with configfs Louis Rilling @ 2008-12-18 18:00 ` Louis Rilling 2009-01-28 3:55 ` [Cluster-devel] " Joel Becker 2008-12-18 18:00 ` [PATCH 2/2] configfs: Rework configfs_depend_item() locking and make lockdep happy Louis Rilling 2 siblings, 1 reply; 53+ messages in thread From: Louis Rilling @ 2008-12-18 18:00 UTC (permalink / raw) To: Joel Becker Cc: linux-kernel, akpm, cluster-devel, swhiteho, peterz, Louis Rilling When attaching default groups (subdirs) of a new group (in mkdir() or in configfs_register()), configfs recursively takes inode's mutexes along the path from the parent of the new group to the default subdirs. This is needed to ensure that the VFS will not race with operations on these sub-dirs. This is safe for the following reasons: - the VFS allows one to lock first an inode and second one of its children (The lock subclasses for this pattern are respectively I_MUTEX_PARENT and I_MUTEX_CHILD); - from this rule any inode path can be recursively locked in descending order as long as it stays under a single mountpoint and does not follow symlinks. Unfortunately lockdep does not know (yet?) how to handle such recursion. I've tried to use Peter Zijlstra's lock_set_subclass() helper to upgrade i_mutexes from I_MUTEX_CHILD to I_MUTEX_PARENT when we know that we might recursively lock some of their descendant, but this usage does not seem to fit the purpose of lock_set_subclass() because it leads to several i_mutex locked with subclass I_MUTEX_PARENT by the same task. >From inside configfs it is not possible to serialize those recursive locking with a top-level one, because mkdir() and rmdir() are already called with inodes locked by the VFS. So using some mutex_lock_nest_lock() is not an option. I am proposing two solutions: 1) one that wraps recursive mutex_lock()s with lockdep_off()/lockdep_on(). 2) (as suggested earlier by Peter Zijlstra) one that puts the i_mutexes recursively locked in different classes based on their depth from the top-level config_group created. This induces an arbitrary limit (MAX_LOCK_DEPTH - 2 == 46) on the nesting of configfs default groups whenever lockdep is activated but this limit looks reasonably high. Unfortunately, this also isolates VFS operations on configfs default groups from the others and thus lowers the chances to detect locking issues. Nobody likes solution 1), what I can understand. This patch implements solution 2). However lockdep is still not happy with configfs_depend_item(). Next patch reworks the locking of configfs_depend_item() and finally makes lockdep happy. Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com> --- fs/configfs/configfs_internal.h | 3 +++ fs/configfs/dir.c | 39 +++++++++++++++++++++++++++++++++++++++ fs/configfs/inode.c | 28 ++++++++++++++++++++++++++++ 3 files changed, 70 insertions(+), 0 deletions(-) diff --git a/fs/configfs/configfs_internal.h b/fs/configfs/configfs_internal.h index 762d287..da6061a 100644 --- a/fs/configfs/configfs_internal.h +++ b/fs/configfs/configfs_internal.h @@ -39,6 +39,9 @@ struct configfs_dirent { umode_t s_mode; struct dentry * s_dentry; struct iattr * s_iattr; +#ifdef CONFIG_LOCKDEP + int s_depth; +#endif }; #define CONFIGFS_ROOT 0x0001 diff --git a/fs/configfs/dir.c b/fs/configfs/dir.c index 8e93341..f21be74 100644 --- a/fs/configfs/dir.c +++ b/fs/configfs/dir.c @@ -94,6 +94,9 @@ static struct configfs_dirent *configfs_new_dirent(struct configfs_dirent * pare INIT_LIST_HEAD(&sd->s_links); INIT_LIST_HEAD(&sd->s_children); sd->s_element = element; +#ifdef CONFIG_LOCKDEP + sd->s_depth = -1; +#endif spin_lock(&configfs_dirent_lock); if (parent_sd->s_type & CONFIGFS_USET_DROPPING) { spin_unlock(&configfs_dirent_lock); @@ -176,6 +179,17 @@ static int init_symlink(struct inode * inode) return 0; } +#ifdef CONFIG_LOCKDEP +static void configfs_set_dir_dirent_depth(struct configfs_dirent *parent_sd, + struct configfs_dirent *sd) +{ + int parent_depth = parent_sd->s_depth; + + if (parent_depth >= 0) + sd->s_depth = parent_depth + 1; +} +#endif + static int create_dir(struct config_item * k, struct dentry * p, struct dentry * d) { @@ -187,6 +201,9 @@ static int create_dir(struct config_item * k, struct dentry * p, error = configfs_make_dirent(p->d_fsdata, d, k, mode, CONFIGFS_DIR | CONFIGFS_USET_CREATING); if (!error) { +#ifdef CONFIG_LOCKDEP + configfs_set_dir_dirent_depth(p->d_fsdata, d->d_fsdata); +#endif error = configfs_create(d, mode, init_dir); if (!error) { inc_nlink(p->d_inode); @@ -789,11 +806,33 @@ static int configfs_attach_group(struct config_item *parent_item, * error, as rmdir() would. */ mutex_lock_nested(&dentry->d_inode->i_mutex, I_MUTEX_CHILD); +#ifdef CONFIG_LOCKDEP + /* + * item's i_mutex class is already setup, so s_depth is now only + * used to set new sub-directories s_depth, which is always done + * with item's i_mutex locked. + */ + /* + * sd->s_depth == -1 iff we are a non default group. + * else (we are a default group) sd->s_depth > 0 (see + * create_dir()). + */ + if (sd->s_depth == -1) + /* + * We are a non default group and we are going to create + * default groups. + */ + sd->s_depth = 0; +#endif ret = populate_groups(to_config_group(item)); if (ret) { configfs_detach_item(item); dentry->d_inode->i_flags |= S_DEAD; } +#ifdef CONFIG_LOCKDEP + /* We will not create default groups anymore. */ + sd->s_depth = -1; +#endif mutex_unlock(&dentry->d_inode->i_mutex); if (ret) d_delete(dentry); diff --git a/fs/configfs/inode.c b/fs/configfs/inode.c index 4803ccc..5e66b40 100644 --- a/fs/configfs/inode.c +++ b/fs/configfs/inode.c @@ -33,10 +33,15 @@ #include <linux/backing-dev.h> #include <linux/capability.h> #include <linux/sched.h> +#include <linux/lockdep.h> #include <linux/configfs.h> #include "configfs_internal.h" +#ifdef CONFIG_LOCKDEP +static struct lock_class_key default_group_class[MAX_LOCK_DEPTH]; +#endif + extern struct super_block * configfs_sb; static const struct address_space_operations configfs_aops = { @@ -153,6 +158,26 @@ struct inode * configfs_new_inode(mode_t mode, struct configfs_dirent * sd) return inode; } +#ifdef CONFIG_LOCKDEP +static void configfs_set_inode_lock_class(struct configfs_dirent *sd, + struct inode *inode) +{ + if (sd->s_depth > 0) { + if (sd->s_depth <= ARRAY_SIZE(default_group_class)) { + lockdep_set_class(&inode->i_mutex, + &default_group_class[sd->s_depth - 1]); + } else { + /* + * In practice the maximum level of locking depth is + * already reached. Just inform about possible reasons. + */ + printk("configfs: Too many levels of inodes for the locking correctness validator.\n"); + printk("Spurious warnings may appear.\n"); + } + } +} +#endif + int configfs_create(struct dentry * dentry, int mode, int (*init)(struct inode *)) { int error = 0; @@ -165,6 +190,9 @@ int configfs_create(struct dentry * dentry, int mode, int (*init)(struct inode * struct inode *p_inode = dentry->d_parent->d_inode; p_inode->i_mtime = p_inode->i_ctime = CURRENT_TIME; } +#ifdef CONFIG_LOCKDEP + configfs_set_inode_lock_class(sd, inode); +#endif goto Proceed; } else -- 1.5.6.5 ^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: [PATCH 1/2] configfs: Silence lockdep on mkdir() and rmdir() 2008-12-18 18:00 ` [PATCH 1/2] configfs: Silence lockdep on mkdir() and rmdir() Louis Rilling @ 2009-01-28 3:55 ` Joel Becker 0 siblings, 0 replies; 53+ messages in thread From: Joel Becker @ 2009-01-28 3:55 UTC (permalink / raw) To: Louis Rilling; +Cc: linux-kernel, akpm, cluster-devel, swhiteho, peterz On Thu, Dec 18, 2008 at 07:00:17PM +0100, Louis Rilling wrote: > >From inside configfs it is not possible to serialize those recursive > locking with a top-level one, because mkdir() and rmdir() are already > called with inodes locked by the VFS. So using some > mutex_lock_nest_lock() is not an option. > > I am proposing two solutions: > 1) one that wraps recursive mutex_lock()s with > lockdep_off()/lockdep_on(). > 2) (as suggested earlier by Peter Zijlstra) one that puts the > i_mutexes recursively locked in different classes based on their > depth from the top-level config_group created. This > induces an arbitrary limit (MAX_LOCK_DEPTH - 2 == 46) on the > nesting of configfs default groups whenever lockdep is activated > but this limit looks reasonably high. Unfortunately, this also > isolates VFS operations on configfs default groups from the others > and thus lowers the chances to detect locking issues. > > Nobody likes solution 1), what I can understand. Me too :-P > This patch implements solution 2). However lockdep is still not happy with > configfs_depend_item(). Next patch reworks the locking of > configfs_depend_item() and finally makes lockdep happy. <snip> > #define CONFIGFS_ROOT 0x0001 > diff --git a/fs/configfs/dir.c b/fs/configfs/dir.c > index 8e93341..f21be74 100644 > --- a/fs/configfs/dir.c > +++ b/fs/configfs/dir.c > @@ -94,6 +94,9 @@ static struct configfs_dirent *configfs_new_dirent(struct configfs_dirent * pare > INIT_LIST_HEAD(&sd->s_links); > INIT_LIST_HEAD(&sd->s_children); > sd->s_element = element; > +#ifdef CONFIG_LOCKDEP > + sd->s_depth = -1; > +#endif > spin_lock(&configfs_dirent_lock); > if (parent_sd->s_type & CONFIGFS_USET_DROPPING) { > spin_unlock(&configfs_dirent_lock); > @@ -176,6 +179,17 @@ static int init_symlink(struct inode * inode) > return 0; > } > > +#ifdef CONFIG_LOCKDEP > +static void configfs_set_dir_dirent_depth(struct configfs_dirent *parent_sd, > + struct configfs_dirent *sd) > +{ > + int parent_depth = parent_sd->s_depth; > + > + if (parent_depth >= 0) > + sd->s_depth = parent_depth + 1; > +} > +#endif > + > static int create_dir(struct config_item * k, struct dentry * p, > struct dentry * d) > { > @@ -187,6 +201,9 @@ static int create_dir(struct config_item * k, struct dentry * p, > error = configfs_make_dirent(p->d_fsdata, d, k, mode, > CONFIGFS_DIR | CONFIGFS_USET_CREATING); > if (!error) { > +#ifdef CONFIG_LOCKDEP > + configfs_set_dir_dirent_depth(p->d_fsdata, d->d_fsdata); > +#endif > error = configfs_create(d, mode, init_dir); > if (!error) { > inc_nlink(p->d_inode); Can you change this to provide non-lockdep versions of functions? We don't want "#ifdef CONFIG_LOCKDEP" everywhere. What we want is the code to call functions unconditionally, and the functions to do nothing if lockdep is not enabled. Like: #ifdef CONFIG_LOCKDEP static inline void configfs_init_dir_dirent_depth(dirent) { dirent->s_depth = -1; } static void configfs_set_dir_dirent_depth(struct configfs_dirent *parent_sd, struct configfs_dirent *sd) { int parent_depth = parent_sd->s_depth; if (parent_depth >= 0) sd->s_depth = parent_depth + 1; } #else static inline void configfs_init_dir_dirent_depth(dirent) { } static void configfs_set_dir_dirent_depth(struct configfs_dirent *parent_sd, struct configfs_dirent *sd) { } #endif This makes the callsites much nicer to read: @@ -94,6 +94,9 @@ static struct configfs_dirent *configfs_new_dirent(struct configfs_dirent * pare INIT_LIST_HEAD(&sd->s_links); INIT_LIST_HEAD(&sd->s_children); sd->s_element = element; + configfs_init_dir_dirent_depth(sd); spin_lock(&configfs_dirent_lock); if (parent_sd->s_type & CONFIGFS_USET_DROPPING) { spin_unlock(&configfs_dirent_lock); @@ -187,6 +201,7 @@ static int create_dir(struct config_item * k, struct dentry * p, error = configfs_make_dirent(p->d_fsdata, d, k, mode, CONFIGFS_DIR | CONFIGFS_USET_CREATING); if (!error) { + configfs_set_dir_dirent_depth(p->d_fsdata, d->d_fsdata); error = configfs_create(d, mode, init_dir); if (!error) { inc_nlink(p->d_inode); Otherwise, this patch seems pretty straightforward. Joel -- Life's Little Instruction Book #30 "Never buy a house without a fireplace." Joel Becker Principal Software Developer Oracle E-mail: joel.becker@oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 53+ messages in thread
* [Cluster-devel] Re: [PATCH 1/2] configfs: Silence lockdep on mkdir() and rmdir() @ 2009-01-28 3:55 ` Joel Becker 0 siblings, 0 replies; 53+ messages in thread From: Joel Becker @ 2009-01-28 3:55 UTC (permalink / raw) To: cluster-devel.redhat.com On Thu, Dec 18, 2008 at 07:00:17PM +0100, Louis Rilling wrote: > >From inside configfs it is not possible to serialize those recursive > locking with a top-level one, because mkdir() and rmdir() are already > called with inodes locked by the VFS. So using some > mutex_lock_nest_lock() is not an option. > > I am proposing two solutions: > 1) one that wraps recursive mutex_lock()s with > lockdep_off()/lockdep_on(). > 2) (as suggested earlier by Peter Zijlstra) one that puts the > i_mutexes recursively locked in different classes based on their > depth from the top-level config_group created. This > induces an arbitrary limit (MAX_LOCK_DEPTH - 2 == 46) on the > nesting of configfs default groups whenever lockdep is activated > but this limit looks reasonably high. Unfortunately, this also > isolates VFS operations on configfs default groups from the others > and thus lowers the chances to detect locking issues. > > Nobody likes solution 1), what I can understand. Me too :-P > This patch implements solution 2). However lockdep is still not happy with > configfs_depend_item(). Next patch reworks the locking of > configfs_depend_item() and finally makes lockdep happy. <snip> > #define CONFIGFS_ROOT 0x0001 > diff --git a/fs/configfs/dir.c b/fs/configfs/dir.c > index 8e93341..f21be74 100644 > --- a/fs/configfs/dir.c > +++ b/fs/configfs/dir.c > @@ -94,6 +94,9 @@ static struct configfs_dirent *configfs_new_dirent(struct configfs_dirent * pare > INIT_LIST_HEAD(&sd->s_links); > INIT_LIST_HEAD(&sd->s_children); > sd->s_element = element; > +#ifdef CONFIG_LOCKDEP > + sd->s_depth = -1; > +#endif > spin_lock(&configfs_dirent_lock); > if (parent_sd->s_type & CONFIGFS_USET_DROPPING) { > spin_unlock(&configfs_dirent_lock); > @@ -176,6 +179,17 @@ static int init_symlink(struct inode * inode) > return 0; > } > > +#ifdef CONFIG_LOCKDEP > +static void configfs_set_dir_dirent_depth(struct configfs_dirent *parent_sd, > + struct configfs_dirent *sd) > +{ > + int parent_depth = parent_sd->s_depth; > + > + if (parent_depth >= 0) > + sd->s_depth = parent_depth + 1; > +} > +#endif > + > static int create_dir(struct config_item * k, struct dentry * p, > struct dentry * d) > { > @@ -187,6 +201,9 @@ static int create_dir(struct config_item * k, struct dentry * p, > error = configfs_make_dirent(p->d_fsdata, d, k, mode, > CONFIGFS_DIR | CONFIGFS_USET_CREATING); > if (!error) { > +#ifdef CONFIG_LOCKDEP > + configfs_set_dir_dirent_depth(p->d_fsdata, d->d_fsdata); > +#endif > error = configfs_create(d, mode, init_dir); > if (!error) { > inc_nlink(p->d_inode); Can you change this to provide non-lockdep versions of functions? We don't want "#ifdef CONFIG_LOCKDEP" everywhere. What we want is the code to call functions unconditionally, and the functions to do nothing if lockdep is not enabled. Like: #ifdef CONFIG_LOCKDEP static inline void configfs_init_dir_dirent_depth(dirent) { dirent->s_depth = -1; } static void configfs_set_dir_dirent_depth(struct configfs_dirent *parent_sd, struct configfs_dirent *sd) { int parent_depth = parent_sd->s_depth; if (parent_depth >= 0) sd->s_depth = parent_depth + 1; } #else static inline void configfs_init_dir_dirent_depth(dirent) { } static void configfs_set_dir_dirent_depth(struct configfs_dirent *parent_sd, struct configfs_dirent *sd) { } #endif This makes the callsites much nicer to read: @@ -94,6 +94,9 @@ static struct configfs_dirent *configfs_new_dirent(struct configfs_dirent * pare INIT_LIST_HEAD(&sd->s_links); INIT_LIST_HEAD(&sd->s_children); sd->s_element = element; + configfs_init_dir_dirent_depth(sd); spin_lock(&configfs_dirent_lock); if (parent_sd->s_type & CONFIGFS_USET_DROPPING) { spin_unlock(&configfs_dirent_lock); @@ -187,6 +201,7 @@ static int create_dir(struct config_item * k, struct dentry * p, error = configfs_make_dirent(p->d_fsdata, d, k, mode, CONFIGFS_DIR | CONFIGFS_USET_CREATING); if (!error) { + configfs_set_dir_dirent_depth(p->d_fsdata, d->d_fsdata); error = configfs_create(d, mode, init_dir); if (!error) { inc_nlink(p->d_inode); Otherwise, this patch seems pretty straightforward. Joel -- Life's Little Instruction Book #30 "Never buy a house without a fireplace." Joel Becker Principal Software Developer Oracle E-mail: joel.becker at oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH 1/2] configfs: Silence lockdep on mkdir() and rmdir() 2009-01-28 3:55 ` [Cluster-devel] " Joel Becker (?) @ 2009-01-28 10:38 ` Louis Rilling -1 siblings, 0 replies; 53+ messages in thread From: Louis Rilling @ 2009-01-28 10:38 UTC (permalink / raw) To: linux-kernel, akpm, cluster-devel, swhiteho, peterz [-- Attachment #1: Type: text/plain, Size: 3723 bytes --] On 27/01/09 19:55 -0800, Joel Becker wrote: > On Thu, Dec 18, 2008 at 07:00:17PM +0100, Louis Rilling wrote: [...] > > > #define CONFIGFS_ROOT 0x0001 > > diff --git a/fs/configfs/dir.c b/fs/configfs/dir.c > > index 8e93341..f21be74 100644 > > --- a/fs/configfs/dir.c > > +++ b/fs/configfs/dir.c > > @@ -94,6 +94,9 @@ static struct configfs_dirent *configfs_new_dirent(struct configfs_dirent * pare > > INIT_LIST_HEAD(&sd->s_links); > > INIT_LIST_HEAD(&sd->s_children); > > sd->s_element = element; > > +#ifdef CONFIG_LOCKDEP > > + sd->s_depth = -1; > > +#endif > > spin_lock(&configfs_dirent_lock); > > if (parent_sd->s_type & CONFIGFS_USET_DROPPING) { > > spin_unlock(&configfs_dirent_lock); > > @@ -176,6 +179,17 @@ static int init_symlink(struct inode * inode) > > return 0; > > } > > > > +#ifdef CONFIG_LOCKDEP > > +static void configfs_set_dir_dirent_depth(struct configfs_dirent *parent_sd, > > + struct configfs_dirent *sd) > > +{ > > + int parent_depth = parent_sd->s_depth; > > + > > + if (parent_depth >= 0) > > + sd->s_depth = parent_depth + 1; > > +} > > +#endif > > + > > static int create_dir(struct config_item * k, struct dentry * p, > > struct dentry * d) > > { > > @@ -187,6 +201,9 @@ static int create_dir(struct config_item * k, struct dentry * p, > > error = configfs_make_dirent(p->d_fsdata, d, k, mode, > > CONFIGFS_DIR | CONFIGFS_USET_CREATING); > > if (!error) { > > +#ifdef CONFIG_LOCKDEP > > + configfs_set_dir_dirent_depth(p->d_fsdata, d->d_fsdata); > > +#endif > > error = configfs_create(d, mode, init_dir); > > if (!error) { > > inc_nlink(p->d_inode); > > Can you change this to provide non-lockdep versions of > functions? We don't want "#ifdef CONFIG_LOCKDEP" everywhere. What we > want is the code to call functions unconditionally, and the functions to > do nothing if lockdep is not enabled. Like: > > #ifdef CONFIG_LOCKDEP > static inline void configfs_init_dir_dirent_depth(dirent) > { > dirent->s_depth = -1; > } > > static void configfs_set_dir_dirent_depth(struct configfs_dirent *parent_sd, > struct configfs_dirent *sd) > { > int parent_depth = parent_sd->s_depth; > > if (parent_depth >= 0) > sd->s_depth = parent_depth + 1; > } > #else > static inline void configfs_init_dir_dirent_depth(dirent) > { > } > > static void configfs_set_dir_dirent_depth(struct configfs_dirent *parent_sd, > struct configfs_dirent *sd) > { > } > #endif > > This makes the callsites much nicer to read: > > @@ -94,6 +94,9 @@ static struct configfs_dirent *configfs_new_dirent(struct configfs_dirent * pare > INIT_LIST_HEAD(&sd->s_links); > INIT_LIST_HEAD(&sd->s_children); > sd->s_element = element; > + configfs_init_dir_dirent_depth(sd); > spin_lock(&configfs_dirent_lock); > if (parent_sd->s_type & CONFIGFS_USET_DROPPING) { > spin_unlock(&configfs_dirent_lock); > @@ -187,6 +201,7 @@ static int create_dir(struct config_item * k, struct dentry * p, > error = configfs_make_dirent(p->d_fsdata, d, k, mode, > CONFIGFS_DIR | CONFIGFS_USET_CREATING); > if (!error) { > + configfs_set_dir_dirent_depth(p->d_fsdata, d->d_fsdata); > error = configfs_create(d, mode, init_dir); > if (!error) { > inc_nlink(p->d_inode); Sure, that's why I said that this could be cleaner ;) > > Otherwise, this patch seems pretty straightforward. Cleaner patch coming. Thanks, Louis -- Dr Louis Rilling Kerlabs Skype: louis.rilling Batiment Germanium Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes http://www.kerlabs.com/ 35700 Rennes [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 53+ messages in thread
* [PATCH 2/2] configfs: Rework configfs_depend_item() locking and make lockdep happy 2008-12-18 11:15 ` Louis Rilling 2008-12-18 18:00 ` Make lockdep happy with configfs Louis Rilling 2008-12-18 18:00 ` [PATCH 1/2] configfs: Silence lockdep on mkdir() and rmdir() Louis Rilling @ 2008-12-18 18:00 ` Louis Rilling 2009-01-28 4:13 ` [Cluster-devel] " Joel Becker 2 siblings, 1 reply; 53+ messages in thread From: Louis Rilling @ 2008-12-18 18:00 UTC (permalink / raw) To: Joel Becker Cc: linux-kernel, akpm, cluster-devel, swhiteho, peterz, Louis Rilling configfs_depend_item() recursively locks all inodes mutex from configfs root to the target item, which makes lockdep unhappy. The purpose of this recursive locking is to ensure that the item tree can be safely parsed and that the target item, if found, is not about to leave. This patch reworks configfs_depend_item() locking using configfs_dirent_lock. Since configfs_dirent_lock protects all changes to the configfs_dirent tree, and protects tagging of items to be removed, this lock can be used instead of the inodes mutex lock chain. This needs that the check for dependents be done atomically with CONFIGFS_USET_DROPPING tagging. Now lockdep looks happy with configfs. Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com> --- fs/configfs/dir.c | 92 ++++++++++++++++++++++------------------------------- 1 files changed, 38 insertions(+), 54 deletions(-) diff --git a/fs/configfs/dir.c b/fs/configfs/dir.c index f21be74..4dea906 100644 --- a/fs/configfs/dir.c +++ b/fs/configfs/dir.c @@ -955,11 +955,11 @@ static int configfs_dump(struct configfs_dirent *sd, int level) * Note, btw, that this can be called at *any* time, even when a configfs * subsystem isn't registered, or when configfs is loading or unloading. * Just like configfs_register_subsystem(). So we take the same - * precautions. We pin the filesystem. We lock each i_mutex _in_order_ - * on our way down the tree. If we can find the target item in the + * precautions. We pin the filesystem. We lock configfs_dirent_lock. + * If we can find the target item in the * configfs tree, it must be part of the subsystem tree as well, so we - * do not need the subsystem semaphore. Holding the i_mutex chain locks - * out mkdir() and rmdir(), who might be racing us. + * do not need the subsystem semaphore. Holding configfs_dirent_lock helps + * locking out mkdir() and rmdir(), who might be racing us. */ /* @@ -972,17 +972,17 @@ static int configfs_dump(struct configfs_dirent *sd, int level) * do that so we can unlock it if we find nothing. * * Here we do a depth-first search of the dentry hierarchy looking for - * our object. We take i_mutex on each step of the way down. IT IS - * ESSENTIAL THAT i_mutex LOCKING IS ORDERED. If we come back up a branch, - * we'll drop the i_mutex. + * our object. + * We deliberately ignore items tagged as dropping since they are virtually + * dead, as well as items in the middle of attachment since they virtually + * do not exist yet. This completes the locking out of racing mkdir() and + * rmdir(). * - * If the target is not found, -ENOENT is bubbled up and we have released - * all locks. If the target was found, the locks will be cleared by - * configfs_depend_rollback(). + * If the target is not found, -ENOENT is bubbled up. * * This adds a requirement that all config_items be unique! * - * This is recursive because the locking traversal is tricky. There isn't + * This is recursive. There isn't * much on the stack, though, so folks that need this function - be careful * about your stack! Patches will be accepted to make it iterative. */ @@ -994,13 +994,13 @@ static int configfs_depend_prep(struct dentry *origin, BUG_ON(!origin || !sd); - /* Lock this guy on the way down */ - mutex_lock(&sd->s_dentry->d_inode->i_mutex); if (sd->s_element == target) /* Boo-yah */ goto out; list_for_each_entry(child_sd, &sd->s_children, s_sibling) { - if (child_sd->s_type & CONFIGFS_DIR) { + if (child_sd->s_type & CONFIGFS_DIR && + !child_sd->s_type & CONFIGFS_USET_DROPPING && + !child_sd->s_type & CONFIGFS_USET_CREATING) { ret = configfs_depend_prep(child_sd->s_dentry, target); if (!ret) @@ -1009,33 +1009,12 @@ static int configfs_depend_prep(struct dentry *origin, } /* We looped all our children and didn't find target */ - mutex_unlock(&sd->s_dentry->d_inode->i_mutex); ret = -ENOENT; out: return ret; } -/* - * This is ONLY called if configfs_depend_prep() did its job. So we can - * trust the entire path from item back up to origin. - * - * We walk backwards from item, unlocking each i_mutex. We finish by - * unlocking origin. - */ -static void configfs_depend_rollback(struct dentry *origin, - struct config_item *item) -{ - struct dentry *dentry = item->ci_dentry; - - while (dentry != origin) { - mutex_unlock(&dentry->d_inode->i_mutex); - dentry = dentry->d_parent; - } - - mutex_unlock(&origin->d_inode->i_mutex); -} - int configfs_depend_item(struct configfs_subsystem *subsys, struct config_item *target) { @@ -1076,17 +1055,21 @@ int configfs_depend_item(struct configfs_subsystem *subsys, /* Ok, now we can trust subsys/s_item */ - /* Scan the tree, locking i_mutex recursively, return 0 if found */ + spin_lock(&configfs_dirent_lock); + /* Scan the tree, protected by configfs_dirent_lock, return 0 if found */ ret = configfs_depend_prep(subsys_sd->s_dentry, target); if (ret) - goto out_unlock_fs; + goto out_unlock_dirent_lock; - /* We hold all i_mutexes from the subsystem down to the target */ + /* + * We are sure that the item is not about to be removed by rmdir(), and + * not in the middle of attachment by mkdir(). + */ p = target->ci_dentry->d_fsdata; p->s_dependent_count += 1; - configfs_depend_rollback(subsys_sd->s_dentry, target); - +out_unlock_dirent_lock: + spin_unlock(&configfs_dirent_lock); out_unlock_fs: mutex_unlock(&configfs_sb->s_root->d_inode->i_mutex); @@ -1111,10 +1094,10 @@ void configfs_undepend_item(struct configfs_subsystem *subsys, struct configfs_dirent *sd; /* - * Since we can trust everything is pinned, we just need i_mutex - * on the item. + * Since we can trust everything is pinned, we just need + * configfs_dirent_lock. */ - mutex_lock(&target->ci_dentry->d_inode->i_mutex); + spin_lock(&configfs_dirent_lock); sd = target->ci_dentry->d_fsdata; BUG_ON(sd->s_dependent_count < 1); @@ -1125,7 +1108,7 @@ void configfs_undepend_item(struct configfs_subsystem *subsys, * After this unlock, we cannot trust the item to stay alive! * DO NOT REFERENCE item after this unlock. */ - mutex_unlock(&target->ci_dentry->d_inode->i_mutex); + spin_unlock(&configfs_dirent_lock); } EXPORT_SYMBOL(configfs_undepend_item); @@ -1325,13 +1308,6 @@ static int configfs_rmdir(struct inode *dir, struct dentry *dentry) if (sd->s_type & CONFIGFS_USET_DEFAULT) return -EPERM; - /* - * Here's where we check for dependents. We're protected by - * i_mutex. - */ - if (sd->s_dependent_count) - return -EBUSY; - /* Get a working ref until we have the child */ parent_item = configfs_get_config_item(dentry->d_parent); subsys = to_config_group(parent_item)->cg_subsys; @@ -1355,9 +1331,17 @@ static int configfs_rmdir(struct inode *dir, struct dentry *dentry) mutex_lock(&configfs_symlink_mutex); spin_lock(&configfs_dirent_lock); - ret = configfs_detach_prep(dentry, &wait_mutex); - if (ret) - configfs_detach_rollback(dentry); + /* + * Here's where we check for dependents. We're protected by + * configfs_dirent_lock. + * If no dependent, atomically tag the item as dropping. + */ + ret = sd->s_dependent_count ? -EBUSY : 0; + if (!ret) { + ret = configfs_detach_prep(dentry, &wait_mutex); + if (ret) + configfs_detach_rollback(dentry); + } spin_unlock(&configfs_dirent_lock); mutex_unlock(&configfs_symlink_mutex); -- 1.5.6.5 ^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: [PATCH 2/2] configfs: Rework configfs_depend_item() locking and make lockdep happy 2008-12-18 18:00 ` [PATCH 2/2] configfs: Rework configfs_depend_item() locking and make lockdep happy Louis Rilling @ 2009-01-28 4:13 ` Joel Becker 0 siblings, 0 replies; 53+ messages in thread From: Joel Becker @ 2009-01-28 4:13 UTC (permalink / raw) To: Louis Rilling; +Cc: linux-kernel, akpm, cluster-devel, swhiteho, peterz On Thu, Dec 18, 2008 at 07:00:18PM +0100, Louis Rilling wrote: > configfs_depend_item() recursively locks all inodes mutex from configfs root to > the target item, which makes lockdep unhappy. The purpose of this recursive > locking is to ensure that the item tree can be safely parsed and that the target > item, if found, is not about to leave. > > This patch reworks configfs_depend_item() locking using configfs_dirent_lock. > Since configfs_dirent_lock protects all changes to the configfs_dirent tree, and > protects tagging of items to be removed, this lock can be used instead of the > inodes mutex lock chain. > This needs that the check for dependents be done atomically with > CONFIGFS_USET_DROPPING tagging. > > Now lockdep looks happy with configfs. This looks almost, but not quite right. In the create path, we do configfs_new_dirent() before we set sd->s_type. But configfs_new_dirent() attaches sd->s_sibling. So, in aonther thread, configfs_depend_prep() can traverse this s_sibling without CONFIGFS_USET_CREATING being set. This turns out to be safe because CONFIGFS_DIR is also not set - but boy I'd like a comment about that. What if we're in mkdir(2) in one thread and another thread is trying to pin the parent directory? That is, we are inside configfs_mkdir(parent, new_dentry, mode). The other thread is doing configfs_depend_item(subsys, parent). With this patch, the other thread will not take parent->i_mutex. It will happily determine that parent is part of the tree and bump its s_dependent with no locking. Is this OK? If it is - isn't this patch good without any other reason? That is, aside from the issues of lockdep, isn't it better for configfs_depend_item() to never have to worry about the VFS locks other than the configfs root? Joel -- The zen have a saying: "When you learn how to listen, ANYONE can be your teacher." Joel Becker Principal Software Developer Oracle E-mail: joel.becker@oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 53+ messages in thread
* [Cluster-devel] Re: [PATCH 2/2] configfs: Rework configfs_depend_item() locking and make lockdep happy @ 2009-01-28 4:13 ` Joel Becker 0 siblings, 0 replies; 53+ messages in thread From: Joel Becker @ 2009-01-28 4:13 UTC (permalink / raw) To: cluster-devel.redhat.com On Thu, Dec 18, 2008 at 07:00:18PM +0100, Louis Rilling wrote: > configfs_depend_item() recursively locks all inodes mutex from configfs root to > the target item, which makes lockdep unhappy. The purpose of this recursive > locking is to ensure that the item tree can be safely parsed and that the target > item, if found, is not about to leave. > > This patch reworks configfs_depend_item() locking using configfs_dirent_lock. > Since configfs_dirent_lock protects all changes to the configfs_dirent tree, and > protects tagging of items to be removed, this lock can be used instead of the > inodes mutex lock chain. > This needs that the check for dependents be done atomically with > CONFIGFS_USET_DROPPING tagging. > > Now lockdep looks happy with configfs. This looks almost, but not quite right. In the create path, we do configfs_new_dirent() before we set sd->s_type. But configfs_new_dirent() attaches sd->s_sibling. So, in aonther thread, configfs_depend_prep() can traverse this s_sibling without CONFIGFS_USET_CREATING being set. This turns out to be safe because CONFIGFS_DIR is also not set - but boy I'd like a comment about that. What if we're in mkdir(2) in one thread and another thread is trying to pin the parent directory? That is, we are inside configfs_mkdir(parent, new_dentry, mode). The other thread is doing configfs_depend_item(subsys, parent). With this patch, the other thread will not take parent->i_mutex. It will happily determine that parent is part of the tree and bump its s_dependent with no locking. Is this OK? If it is - isn't this patch good without any other reason? That is, aside from the issues of lockdep, isn't it better for configfs_depend_item() to never have to worry about the VFS locks other than the configfs root? Joel -- The zen have a saying: "When you learn how to listen, ANYONE can be your teacher." Joel Becker Principal Software Developer Oracle E-mail: joel.becker at oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH 2/2] configfs: Rework configfs_depend_item() locking and make lockdep happy 2009-01-28 4:13 ` [Cluster-devel] " Joel Becker (?) @ 2009-01-28 10:32 ` Louis Rilling -1 siblings, 0 replies; 53+ messages in thread From: Louis Rilling @ 2009-01-28 10:32 UTC (permalink / raw) To: linux-kernel, akpm, cluster-devel, swhiteho, peterz [-- Attachment #1: Type: text/plain, Size: 2929 bytes --] On 27/01/09 20:13 -0800, Joel Becker wrote: > On Thu, Dec 18, 2008 at 07:00:18PM +0100, Louis Rilling wrote: > > configfs_depend_item() recursively locks all inodes mutex from configfs root to > > the target item, which makes lockdep unhappy. The purpose of this recursive > > locking is to ensure that the item tree can be safely parsed and that the target > > item, if found, is not about to leave. > > > > This patch reworks configfs_depend_item() locking using configfs_dirent_lock. > > Since configfs_dirent_lock protects all changes to the configfs_dirent tree, and > > protects tagging of items to be removed, this lock can be used instead of the > > inodes mutex lock chain. > > This needs that the check for dependents be done atomically with > > CONFIGFS_USET_DROPPING tagging. > > > > Now lockdep looks happy with configfs. > > This looks almost, but not quite right. > In the create path, we do configfs_new_dirent() before we set > sd->s_type. But configfs_new_dirent() attaches sd->s_sibling. So, in > aonther thread, configfs_depend_prep() can traverse this s_sibling > without CONFIGFS_USET_CREATING being set. This turns out to be safe > because CONFIGFS_DIR is also not set - but boy I'd like a comment about > that. Definitely agreed. I should have written this comment instead of letting you notice this. > What if we're in mkdir(2) in one thread and another thread is > trying to pin the parent directory? That is, we are inside > configfs_mkdir(parent, new_dentry, mode). The other thread is doing > configfs_depend_item(subsys, parent). With this patch, the other thread > will not take parent->i_mutex. It will happily determine that > parent is part of the tree and bump its s_dependent with no locking. Is > this OK? Yes this is the expected impact. It is OK because 1) under a same critical section of configfs_dirent_lock, depend_item() checks that CONFIGFS_USET_DROPPING is not set and bumps s_dependent; 2) under a same critical section of configfs_dirent_lock, configfs_rmdir() checks the s_dependent count and tries to set CONFIGFS_USET_DROPPING. > If it is - isn't this patch good without any other reason? That > is, aside from the issues of lockdep, isn't it better for > configfs_depend_item() to never have to worry about the VFS locks other > than the configfs root? Yes, this patch may look like an improvement, independently from lockdep. I think that locking is simpler this way, and this also removes the need for configfs_depend_rollback(). Moreover this moves towards the management of configfs_dirents protected by configfs_dirent_lock only. In the end, it's up to you to judge if this is a good direction ;) Thanks, Louis -- Dr Louis Rilling Kerlabs Skype: louis.rilling Batiment Germanium Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes http://www.kerlabs.com/ 35700 Rennes [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() 2008-12-18 9:27 ` [Cluster-devel] " Joel Becker @ 2008-12-18 11:26 ` Steven Whitehouse -1 siblings, 0 replies; 53+ messages in thread From: Steven Whitehouse @ 2008-12-18 11:26 UTC (permalink / raw) To: Joel Becker Cc: Peter Zijlstra, Andrew Morton, Louis Rilling, linux-kernel, cluster-devel Hi, I have to say that when I brought this subject up, I didn't realise quite how tricky this was going to be, however... On Thu, 2008-12-18 at 01:27 -0800, Joel Becker wrote: > On Thu, Dec 18, 2008 at 08:26:48AM +0100, Peter Zijlstra wrote: > > On Wed, 2008-12-17 at 13:40 -0800, Andrew Morton wrote: > > > On Fri, 12 Dec 2008 16:29:11 +0100 > > > Louis Rilling <louis.rilling@kerlabs.com> wrote: > > > > >From inside configfs it is not possible to serialize those recursive > > > > locking with a top-level one, because mkdir() and rmdir() are already > > > > called with inodes locked by the VFS. So using some > > > > mutex_lock_nest_lock() is not an option. > > <snip> > <more snipping> > > Right, so basically we avoid syscalls by making vfs ops do stuff.. > > lovely - still not seeing it though - regular filesystems seems to cope > > just fine and they get to create arbitrary tree structures too. > > It's about the default_groups and how they build up and tear > down small bits of tree. > A simple creation of a config_item, a mkdir(2), is a normal VFS > lock set and doesn't make lockdep unhappy. But if the new config_item > has a default_group or two, they need locking too. Not so much on > mkdir(2), but on rmdir(2). > So if I've understood this correctly, the dentries created upon mkdir live until such time as they are removed at some later date, presumably with rmdir? When creating the tree, would it be possible to build it starting from the bottom and working towards the point of attachment and then to not actually attach it until the last moment? That way it would not be visible from userland until it was linked into the existing dir and that solves the locking issue for mkdir I think. As you say, rmdir seems the harder problem, but again, is it possible to separate the unlink operation from the destruction of the tree by keeping the tree, after its been unlinked, until the last userland reference has gone away? At least I assume that is why the locking is there. I may be a bit off the mark, but it seems like this is quite similar to how the VFS does umount, so maybe there are some hints in that code which may help us solve this issue? Steve. ^ permalink raw reply [flat|nested] 53+ messages in thread
* [Cluster-devel] Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() @ 2008-12-18 11:26 ` Steven Whitehouse 0 siblings, 0 replies; 53+ messages in thread From: Steven Whitehouse @ 2008-12-18 11:26 UTC (permalink / raw) To: cluster-devel.redhat.com Hi, I have to say that when I brought this subject up, I didn't realise quite how tricky this was going to be, however... On Thu, 2008-12-18 at 01:27 -0800, Joel Becker wrote: > On Thu, Dec 18, 2008 at 08:26:48AM +0100, Peter Zijlstra wrote: > > On Wed, 2008-12-17 at 13:40 -0800, Andrew Morton wrote: > > > On Fri, 12 Dec 2008 16:29:11 +0100 > > > Louis Rilling <louis.rilling@kerlabs.com> wrote: > > > > >From inside configfs it is not possible to serialize those recursive > > > > locking with a top-level one, because mkdir() and rmdir() are already > > > > called with inodes locked by the VFS. So using some > > > > mutex_lock_nest_lock() is not an option. > > <snip> > <more snipping> > > Right, so basically we avoid syscalls by making vfs ops do stuff.. > > lovely - still not seeing it though - regular filesystems seems to cope > > just fine and they get to create arbitrary tree structures too. > > It's about the default_groups and how they build up and tear > down small bits of tree. > A simple creation of a config_item, a mkdir(2), is a normal VFS > lock set and doesn't make lockdep unhappy. But if the new config_item > has a default_group or two, they need locking too. Not so much on > mkdir(2), but on rmdir(2). > So if I've understood this correctly, the dentries created upon mkdir live until such time as they are removed at some later date, presumably with rmdir? When creating the tree, would it be possible to build it starting from the bottom and working towards the point of attachment and then to not actually attach it until the last moment? That way it would not be visible from userland until it was linked into the existing dir and that solves the locking issue for mkdir I think. As you say, rmdir seems the harder problem, but again, is it possible to separate the unlink operation from the destruction of the tree by keeping the tree, after its been unlinked, until the last userland reference has gone away? At least I assume that is why the locking is there. I may be a bit off the mark, but it seems like this is quite similar to how the VFS does umount, so maybe there are some hints in that code which may help us solve this issue? Steve. ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() 2008-12-18 11:26 ` [Cluster-devel] " Steven Whitehouse (?) @ 2008-12-18 11:48 ` Louis Rilling -1 siblings, 0 replies; 53+ messages in thread From: Louis Rilling @ 2008-12-18 11:48 UTC (permalink / raw) To: Steven Whitehouse Cc: Joel Becker, Peter Zijlstra, Andrew Morton, linux-kernel, cluster-devel [-- Attachment #1: Type: text/plain, Size: 2004 bytes --] On 18/12/08 11:26 +0000, Steven Whitehouse wrote: > On Thu, 2008-12-18 at 01:27 -0800, Joel Becker wrote: > > > > It's about the default_groups and how they build up and tear > > down small bits of tree. > > A simple creation of a config_item, a mkdir(2), is a normal VFS > > lock set and doesn't make lockdep unhappy. But if the new config_item > > has a default_group or two, they need locking too. Not so much on > > mkdir(2), but on rmdir(2). > > > So if I've understood this correctly, the dentries created upon mkdir > live until such time as they are removed at some later date, presumably > with rmdir? > > When creating the tree, would it be possible to build it starting from > the bottom and working towards the point of attachment and then to not > actually attach it until the last moment? That way it would not be > visible from userland until it was linked into the existing dir and that > solves the locking issue for mkdir I think. > > As you say, rmdir seems the harder problem, but again, is it possible to > separate the unlink operation from the destruction of the tree by > keeping the tree, after its been unlinked, until the last userland > reference has gone away? At least I assume that is why the locking is > there. I may be a bit off the mark, but it seems like this is quite > similar to how the VFS does umount, so maybe there are some hints in > that code which may help us solve this issue? I second this kind of rework and even think that it is doable. This would avoid exposing things to userspace before they are created for sure. I even had to include dirty hacks in configfs to avoid exposing too much of such temporary things, and I'm definitely not proud of this. Unfortunately I don't have time to rework configfs this way at all. Louis -- Dr Louis Rilling Kerlabs Skype: louis.rilling Batiment Germanium Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes http://www.kerlabs.com/ 35700 Rennes [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() 2008-12-18 9:27 ` [Cluster-devel] " Joel Becker ` (2 preceding siblings ...) (?) @ 2008-12-18 11:56 ` Peter Zijlstra 2008-12-18 12:28 ` Peter Zijlstra -1 siblings, 1 reply; 53+ messages in thread From: Peter Zijlstra @ 2008-12-18 11:56 UTC (permalink / raw) To: Joel Becker Cc: Andrew Morton, Louis Rilling, linux-kernel, cluster-devel, swhiteho On Thu, 2008-12-18 at 01:27 -0800, Joel Becker wrote: > It's about the default_groups and how they build up and tear > down small bits of tree. > A simple creation of a config_item, a mkdir(2), is a normal VFS > lock set and doesn't make lockdep unhappy. But if the new config_item > has a default_group or two, they need locking too. Not so much on > mkdir(2), but on rmdir(2). Hohumm,.. So the problem is that mkdir() doesn't just create a single entity but a whole tree: configfs:/my_subsystem/$ mkdir foo might result in: foo/ foo/A/ foo/B/ foo/B/C/ which on rmdir foo you'd have to tear down, but only if its that exact tree and not when say A has any user created directories. VFS mkdir A/blah only synchronizes on A.i_mutex and checks S_DEAD to avoid races with rmdir A - which would lock first parent(A).i_mutex and then A.i_mutex before detaching A and marking it S_DEAD. So what you're now doing is locking the full foo/ subtree in order to check there is no user content and block mkdir/creat from generating any - which is where the trouble comes from, right? Like said on IRC, the whole populated thing made me think of mount/umount (steven whitehouse seems to have had a similar notion). You basically want to synchronize any user mkdir/creat against foo instead of just the new parent so that rmdir foo can tell if there is any such content without having to lock the whole subtree. That would mean them locking both foo and the new parent (when they're not one and the same). Trouble seems to be that vfs_mkdir() and such already have their new parent locked, which means you cannot go about locking foo anymore. But that would have resulted in a 3 deep lock-chain. (and I don't see any filesystem hooks in user_path_parent() -- which is probably a good thing) Bugger.. ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() 2008-12-18 11:56 ` Peter Zijlstra @ 2008-12-18 12:28 ` Peter Zijlstra 2008-12-18 22:58 ` [Cluster-devel] " Joel Becker 0 siblings, 1 reply; 53+ messages in thread From: Peter Zijlstra @ 2008-12-18 12:28 UTC (permalink / raw) To: Joel Becker Cc: Andrew Morton, Louis Rilling, linux-kernel, cluster-devel, swhiteho On Thu, 2008-12-18 at 12:56 +0100, Peter Zijlstra wrote: > On Thu, 2008-12-18 at 01:27 -0800, Joel Becker wrote: > > > It's about the default_groups and how they build up and tear > > down small bits of tree. > > A simple creation of a config_item, a mkdir(2), is a normal VFS > > lock set and doesn't make lockdep unhappy. But if the new config_item > > has a default_group or two, they need locking too. Not so much on > > mkdir(2), but on rmdir(2). > > Hohumm,.. > > So the problem is that mkdir() doesn't just create a single entity but a > whole tree: > > configfs:/my_subsystem/$ mkdir foo > > might result in: > > foo/ > foo/A/ > foo/B/ > foo/B/C/ > > which on rmdir foo you'd have to tear down, but only if its that exact > tree and not when say A has any user created directories. > > VFS mkdir A/blah only synchronizes on A.i_mutex and checks S_DEAD to > avoid races with rmdir A - which would lock first parent(A).i_mutex and > then A.i_mutex before detaching A and marking it S_DEAD. > > So what you're now doing is locking the full foo/ subtree in order to > check there is no user content and block mkdir/creat from generating any > - which is where the trouble comes from, right? > > Like said on IRC, the whole populated thing made me think of > mount/umount (steven whitehouse seems to have had a similar notion). > > You basically want to synchronize any user mkdir/creat against foo > instead of just the new parent so that rmdir foo can tell if there is > any such content without having to lock the whole subtree. > > That would mean them locking both foo and the new parent (when they're > not one and the same). Trouble seems to be that vfs_mkdir() and such > already have their new parent locked, which means you cannot go about > locking foo anymore. But that would have resulted in a 3 deep > lock-chain. > > (and I don't see any filesystem hooks in user_path_parent() -- which is > probably a good thing) > > > Bugger.. In fact, both (configfs) mkdir and rmdir seem to synchronize on su_mutex.. mkdir B/C/bar C.i_mutex su_mutex vs rmdir foo parent(foo).i_mutex foo.i_mutex su_mutex once holding the rmdir su_mutex you can check foo's user-content, since any mkdir will be blocked. All you have to do is then re-validate in mkdir's su_mutex that !IS_DEADDIR(C). Does that sound plausible, or am I missing something obvious.. ? ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() 2008-12-18 12:28 ` Peter Zijlstra @ 2008-12-18 22:58 ` Joel Becker 0 siblings, 0 replies; 53+ messages in thread From: Joel Becker @ 2008-12-18 22:58 UTC (permalink / raw) To: Peter Zijlstra Cc: Andrew Morton, Louis Rilling, linux-kernel, cluster-devel, swhiteho On Thu, Dec 18, 2008 at 01:28:28PM +0100, Peter Zijlstra wrote: > In fact, both (configfs) mkdir and rmdir seem to synchronize on > su_mutex.. > > mkdir B/C/bar > > C.i_mutex > su_mutex > > vs > > rmdir foo > > parent(foo).i_mutex > foo.i_mutex > su_mutex > > > once holding the rmdir su_mutex you can check foo's user-content, since > any mkdir will be blocked. All you have to do is then re-validate in > mkdir's su_mutex that !IS_DEADDIR(C). We explicitly do not take any i_mutex locks after taking su_mutex. That's an ABBA risk. su_mutex protects the hierarchy of config_items. i_mutex protects the vfs view thereof. If you look in mkdir, we take su_mutex, get a new item from the client subsystem, then drop su_mutex. After that, we go about building our filesystem structure, using i_mutex where appropriate. More importantly is rmdir(2), where we use i_mutex in configfs_detach_group(), but are not holding su_sem. Only when configfs_detach_group() has successfully returned and we have torn down the filesystem structure do we take su_mutex and tear down the config_item structure. In fact, we're part of the way there. Check out that USET_DROPPING flag we set in detach_prep() while scanning for user objects. That flags us racing mkdir(2). When we are done with detach_prep(), we know that mkdir(2) calls racing behind us will do nothing until we safely lock them out with the locking in detach_group(). All mkdir(2) calls will have exited by the time we get the mutex, and no new mkdir(2) call can start because we have the mutex. Now look in detach_groups(). We drop the groups children before marking them DEAD. Louis' plan, I think, is to perhaps mark a group DEAD, disconnect it from the vfs, and then operate on its children. In this fashion, perhaps we can unlock the trailing lock like a normal VFS operation. This will require some serious auditing, however, because now vfs functions can get into the vfs objects behind us. And more vfs changes affect us. Whereas the current locking relies on the vfs's parent->child lock ordering only, something that isn't likely to be changed. Joel -- "You cannot bring about prosperity by discouraging thrift. You cannot strengthen the weak by weakening the strong. You cannot help the wage earner by pulling down the wage payer. You cannot further the brotherhood of man by encouraging class hatred. You cannot help the poor by destroying the rich. You cannot build character and courage by taking away a man's initiative and independence. You cannot help men permanently by doing for them what they could and should do for themselves." - Abraham Lincoln Joel Becker Principal Software Developer Oracle E-mail: joel.becker@oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 53+ messages in thread
* [Cluster-devel] Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() @ 2008-12-18 22:58 ` Joel Becker 0 siblings, 0 replies; 53+ messages in thread From: Joel Becker @ 2008-12-18 22:58 UTC (permalink / raw) To: cluster-devel.redhat.com On Thu, Dec 18, 2008 at 01:28:28PM +0100, Peter Zijlstra wrote: > In fact, both (configfs) mkdir and rmdir seem to synchronize on > su_mutex.. > > mkdir B/C/bar > > C.i_mutex > su_mutex > > vs > > rmdir foo > > parent(foo).i_mutex > foo.i_mutex > su_mutex > > > once holding the rmdir su_mutex you can check foo's user-content, since > any mkdir will be blocked. All you have to do is then re-validate in > mkdir's su_mutex that !IS_DEADDIR(C). We explicitly do not take any i_mutex locks after taking su_mutex. That's an ABBA risk. su_mutex protects the hierarchy of config_items. i_mutex protects the vfs view thereof. If you look in mkdir, we take su_mutex, get a new item from the client subsystem, then drop su_mutex. After that, we go about building our filesystem structure, using i_mutex where appropriate. More importantly is rmdir(2), where we use i_mutex in configfs_detach_group(), but are not holding su_sem. Only when configfs_detach_group() has successfully returned and we have torn down the filesystem structure do we take su_mutex and tear down the config_item structure. In fact, we're part of the way there. Check out that USET_DROPPING flag we set in detach_prep() while scanning for user objects. That flags us racing mkdir(2). When we are done with detach_prep(), we know that mkdir(2) calls racing behind us will do nothing until we safely lock them out with the locking in detach_group(). All mkdir(2) calls will have exited by the time we get the mutex, and no new mkdir(2) call can start because we have the mutex. Now look in detach_groups(). We drop the groups children before marking them DEAD. Louis' plan, I think, is to perhaps mark a group DEAD, disconnect it from the vfs, and then operate on its children. In this fashion, perhaps we can unlock the trailing lock like a normal VFS operation. This will require some serious auditing, however, because now vfs functions can get into the vfs objects behind us. And more vfs changes affect us. Whereas the current locking relies on the vfs's parent->child lock ordering only, something that isn't likely to be changed. Joel -- "You cannot bring about prosperity by discouraging thrift. You cannot strengthen the weak by weakening the strong. You cannot help the wage earner by pulling down the wage payer. You cannot further the brotherhood of man by encouraging class hatred. You cannot help the poor by destroying the rich. You cannot build character and courage by taking away a man's initiative and independence. You cannot help men permanently by doing for them what they could and should do for themselves." - Abraham Lincoln Joel Becker Principal Software Developer Oracle E-mail: joel.becker at oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() 2008-12-18 22:58 ` [Cluster-devel] " Joel Becker (?) @ 2008-12-19 10:29 ` Louis Rilling -1 siblings, 0 replies; 53+ messages in thread From: Louis Rilling @ 2008-12-19 10:29 UTC (permalink / raw) To: Peter Zijlstra, Andrew Morton, linux-kernel, cluster-devel, swhiteho [-- Attachment #1: Type: text/plain, Size: 3022 bytes --] On 18/12/08 14:58 -0800, Joel Becker wrote: > On Thu, Dec 18, 2008 at 01:28:28PM +0100, Peter Zijlstra wrote: > > In fact, both (configfs) mkdir and rmdir seem to synchronize on > > su_mutex.. > > > > mkdir B/C/bar > > > > C.i_mutex > > su_mutex > > > > vs > > > > rmdir foo > > > > parent(foo).i_mutex > > foo.i_mutex > > su_mutex > > > > > > once holding the rmdir su_mutex you can check foo's user-content, since > > any mkdir will be blocked. All you have to do is then re-validate in > > mkdir's su_mutex that !IS_DEADDIR(C). > > We explicitly do not take any i_mutex locks after taking > su_mutex. That's an ABBA risk. su_mutex protects the hierarchy of > config_items. i_mutex protects the vfs view thereof. > If you look in mkdir, we take su_mutex, get a new item from the > client subsystem, then drop su_mutex. After that, we go about building > our filesystem structure, using i_mutex where appropriate. More > importantly is rmdir(2), where we use i_mutex in > configfs_detach_group(), but are not holding su_sem. Only when > configfs_detach_group() has successfully returned and we have torn down > the filesystem structure do we take su_mutex and tear down the > config_item structure. > In fact, we're part of the way there. Check out that > USET_DROPPING flag we set in detach_prep() while scanning for user > objects. That flags us racing mkdir(2). When we are done with > detach_prep(), we know that mkdir(2) calls racing behind us will do > nothing until we safely lock them out with the locking in > detach_group(). All mkdir(2) calls will have exited by the time we get > the mutex, and no new mkdir(2) call can start because we have the mutex. > Now look in detach_groups(). We drop the groups children before > marking them DEAD. Louis' plan, I think, is to perhaps mark a group > DEAD, disconnect it from the vfs, and then operate on its children. In > this fashion, perhaps we can unlock the trailing lock like a normal VFS > operation. > This will require some serious auditing, however, because now > vfs functions can get into the vfs objects behind us. And more vfs > changes affect us. Whereas the current locking relies on the vfs's > parent->child lock ordering only, something that isn't likely to be > changed. I've thought about such plan, but I'm not comfortable enough with the VFS to tell how it could be done precisely, and whether it is safe to remove a whole tree from the dcache by just unlinking its root. In particular, how could we deal with racing operations under default groups? Should we setup a link from any default group to its youngest non-default group ancestor? As Steven suggested, looking at unmount might be interesting, but not today as far as I am concerned. Louis -- Dr Louis Rilling Kerlabs Skype: louis.rilling Batiment Germanium Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes http://www.kerlabs.com/ 35700 Rennes [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() 2008-12-18 22:58 ` [Cluster-devel] " Joel Becker (?) (?) @ 2009-01-26 12:30 ` Peter Zijlstra 2009-01-26 13:24 ` Louis Rilling 2009-01-28 3:41 ` [Cluster-devel] " Joel Becker -1 siblings, 2 replies; 53+ messages in thread From: Peter Zijlstra @ 2009-01-26 12:30 UTC (permalink / raw) To: Joel Becker Cc: Andrew Morton, Louis Rilling, linux-kernel, cluster-devel, swhiteho On Thu, 2008-12-18 at 14:58 -0800, Joel Becker wrote: > On Thu, Dec 18, 2008 at 01:28:28PM +0100, Peter Zijlstra wrote: > > In fact, both (configfs) mkdir and rmdir seem to synchronize on > > su_mutex.. > > > > mkdir B/C/bar > > > > C.i_mutex > > su_mutex > > > > vs > > > > rmdir foo > > > > parent(foo).i_mutex > > foo.i_mutex > > su_mutex > > > > > > once holding the rmdir su_mutex you can check foo's user-content, since > > any mkdir will be blocked. All you have to do is then re-validate in > > mkdir's su_mutex that !IS_DEADDIR(C). > > We explicitly do not take any i_mutex locks after taking > su_mutex. That's an ABBA risk. su_mutex protects the hierarchy of > config_items. i_mutex protects the vfs view thereof. I don't think I was suggesting that. All you need is to serialize any mkdir/creat against the rmdir of the youngest non-default group, and you can do that by holding su_mutex. In rmdir, you already own all the i_mutex instances you need to uncouple the whole tree, all you need to do is validate that its indeed empty -- you don't need i_mutex's for that, because you're holding su_mutex, and any concurrent mkdir/creat will be blocking on that. If you find it empty, just mark everybody DEAD, drop su_mutex and decouple. All concurrent mkdir/creat thingies that were blocking will now bail because their parent is found DEAD. > If you look in mkdir, we take su_mutex, get a new item from the > client subsystem, then drop su_mutex. All you need to do before dropping su_mutex again is checking IS_DEADDIR(), if so, you just fail the whole mkdir() no extra i_mutex's needed. > After that, we go about building > our filesystem structure, using i_mutex where appropriate. Sure, but its ok to grow the default groups non-atomically, right? mkdir will only need to check that everything is empty in as far as it has been linked, and ensure the not yet linked entries won't be. > More > importantly is rmdir(2), where we use i_mutex in > configfs_detach_group(), but are not holding su_sem. Only when > configfs_detach_group() has successfully returned and we have torn down > the filesystem structure do we take su_mutex and tear down the > config_item structure. The only thing that matters is that you can hold su_mutex inside i_mutex. configfs_rmdir( "foo" ) { /* we hold i_mutex for foo and its parent */ mutex_lock(&subsys->su_mutex); if (default_tree_empty()) mark_default_tree_dead(); else ret = -EBUSY; mutex_unlock(&subsys->su_mutex); if (ret) return ret; /* do actual unlink foo */ } configfs_mkdir( "B/A/bar" ) { /* we hold i_mutex for A */ mutex_lock(&subsys->su_mutex); if (IS_DEADDIR(A)) ret = -EINVAL; /* or whatever */ /* increase A's use count, so default_tree_empty() will fail. * inc_A_or_subsys_use_count(); mutex_unlock(&subsys->su_mutex); if (ret) return ret; /* do actual mkdir */ } Surely something along these lines ought to work? ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() 2009-01-26 12:30 ` Peter Zijlstra @ 2009-01-26 13:24 ` Louis Rilling 2009-01-26 13:41 ` Peter Zijlstra 2009-01-28 3:41 ` [Cluster-devel] " Joel Becker 1 sibling, 1 reply; 53+ messages in thread From: Louis Rilling @ 2009-01-26 13:24 UTC (permalink / raw) To: Peter Zijlstra Cc: Joel Becker, Andrew Morton, linux-kernel, cluster-devel, swhiteho [-- Attachment #1: Type: text/plain, Size: 4524 bytes --] Hi Peter, Thank you for continuing this discussion :) On 26/01/09 13:30 +0100, Peter Zijlstra wrote: > On Thu, 2008-12-18 at 14:58 -0800, Joel Becker wrote: > > On Thu, Dec 18, 2008 at 01:28:28PM +0100, Peter Zijlstra wrote: > > > In fact, both (configfs) mkdir and rmdir seem to synchronize on > > > su_mutex.. > > > > > > mkdir B/C/bar > > > > > > C.i_mutex > > > su_mutex > > > > > > vs > > > > > > rmdir foo > > > > > > parent(foo).i_mutex > > > foo.i_mutex > > > su_mutex > > > > > > > > > once holding the rmdir su_mutex you can check foo's user-content, since > > > any mkdir will be blocked. All you have to do is then re-validate in > > > mkdir's su_mutex that !IS_DEADDIR(C). > > > > We explicitly do not take any i_mutex locks after taking > > su_mutex. That's an ABBA risk. su_mutex protects the hierarchy of > > config_items. i_mutex protects the vfs view thereof. > > I don't think I was suggesting that. All you need is to serialize any > mkdir/creat against the rmdir of the youngest non-default group, and you > can do that by holding su_mutex. > > In rmdir, you already own all the i_mutex instances you need to uncouple > the whole tree, all you need to do is validate that its indeed empty -- > you don't need i_mutex's for that, because you're holding su_mutex, and > any concurrent mkdir/creat will be blocking on that. > > If you find it empty, just mark everybody DEAD, drop su_mutex and > decouple. All concurrent mkdir/creat thingies that were blocking will > now bail because their parent is found DEAD. > > > If you look in mkdir, we take su_mutex, get a new item from the > > client subsystem, then drop su_mutex. > > All you need to do before dropping su_mutex again is checking > IS_DEADDIR(), if so, you just fail the whole mkdir() no extra i_mutex's > needed. > > > After that, we go about building > > our filesystem structure, using i_mutex where appropriate. > > Sure, but its ok to grow the default groups non-atomically, right? mkdir > will only need to check that everything is empty in as far as it has > been linked, and ensure the not yet linked entries won't be. > > > More > > importantly is rmdir(2), where we use i_mutex in > > configfs_detach_group(), but are not holding su_sem. Only when > > configfs_detach_group() has successfully returned and we have torn down > > the filesystem structure do we take su_mutex and tear down the > > config_item structure. > > The only thing that matters is that you can hold su_mutex inside > i_mutex. > > > configfs_rmdir( "foo" ) > { > /* we hold i_mutex for foo and its parent */ > > mutex_lock(&subsys->su_mutex); > if (default_tree_empty()) > mark_default_tree_dead(); > else > ret = -EBUSY; > mutex_unlock(&subsys->su_mutex); > > if (ret) > return ret; > > /* do actual unlink foo */ > } > > > configfs_mkdir( "B/A/bar" ) > { > /* we hold i_mutex for A */ > > mutex_lock(&subsys->su_mutex); > if (IS_DEADDIR(A)) > ret = -EINVAL; /* or whatever */ > > /* increase A's use count, so default_tree_empty() will fail. * > inc_A_or_subsys_use_count(); > mutex_unlock(&subsys->su_mutex); > if (ret) > return ret; > > /* do actual mkdir */ > } > > > Surely something along these lines ought to work? I may have missed something important here, but how does your suggestion remove the need for recursively locking inodes? In configfs_rmdir(), we do not need to lock default groups inodes to prevent racing mkdir() under them. This race is already dealt with earlier in configfs_detach_prep(), which tries to set the CONFIGS_USET_DROPPING flag on the group to remove and on all its hierarchy of default groups, protected by configfs_dirent_lock. The matching logic in configfs_mkdir() may look a bit burried but is simple: configfs_attach_item()/configfs_attach_group() eventually calls configfs_new_dirent(), which fails whenever the parent is tagged with CONFIGFS_USET_DROPPING. Again, no inode locking is required for this logic. However configfs_rmdir() and configfs_mkdir() (recursively) lock inodes because this is how the VFS works when removing/adding entries under a directory which has already lived in the dcache. Thanks, Louis -- Dr Louis Rilling Kerlabs Skype: louis.rilling Batiment Germanium Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes http://www.kerlabs.com/ 35700 Rennes [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() 2009-01-26 13:24 ` Louis Rilling @ 2009-01-26 13:41 ` Peter Zijlstra 2009-01-26 14:00 ` Louis Rilling 0 siblings, 1 reply; 53+ messages in thread From: Peter Zijlstra @ 2009-01-26 13:41 UTC (permalink / raw) To: Louis.Rilling Cc: Joel Becker, Andrew Morton, linux-kernel, cluster-devel, swhiteho On Mon, 2009-01-26 at 14:24 +0100, Louis Rilling wrote: > However configfs_rmdir() and configfs_mkdir() (recursively) lock inodes because > this is how the VFS works when removing/adding entries under a directory which > has already lived in the dcache. Ok, so then I'm not understanding things correctly. Its not a locking correctness thing, but simply not being able to do it from the vfs calls because those assume locks held? Can't you simply punt the work to a worklet once you've created/removed the non-default group, which can be done from within the vfs callback ? ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() 2009-01-26 13:41 ` Peter Zijlstra @ 2009-01-26 14:00 ` Louis Rilling 2009-01-26 14:19 ` Peter Zijlstra 0 siblings, 1 reply; 53+ messages in thread From: Louis Rilling @ 2009-01-26 14:00 UTC (permalink / raw) To: Peter Zijlstra Cc: Joel Becker, Andrew Morton, linux-kernel, cluster-devel, swhiteho [-- Attachment #1: Type: text/plain, Size: 1728 bytes --] On 26/01/09 14:41 +0100, Peter Zijlstra wrote: > On Mon, 2009-01-26 at 14:24 +0100, Louis Rilling wrote: > > > However configfs_rmdir() and configfs_mkdir() (recursively) lock inodes because > > this is how the VFS works when removing/adding entries under a directory which > > has already lived in the dcache. > > Ok, so then I'm not understanding things correctly. You may understand the VFS better than I do actually. > > Its not a locking correctness thing, but simply not being able to do it > from the vfs calls because those assume locks held? > > Can't you simply punt the work to a worklet once you've created/removed > the non-default group, which can be done from within the vfs callback ? I'm not sure to understand your suggestion. Is this: 1) for mkdir(), create the non-default group, but without its default groups, and defer their creation to a worker which won't have constraints on locks held by any caller; 2) for rmdir(), unlink the non-default group, but without unlinking its default groups, and defer the recursive work to a lock-free context? For mkdir(), this may work. Maybe a bit confusing for userspace, since mkdir(A) returns as soon as A is created, but A may be populated later and userspace may rely on A being populated as soon as it is created (current behavior). As a configfs user, this makes my life harder... For rmdir(), is this safe to unlink a non-empty directory, and to empty it afterwards? This looks like going back to the unmount problem. Thanks, Louis -- Dr Louis Rilling Kerlabs Skype: louis.rilling Batiment Germanium Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes http://www.kerlabs.com/ 35700 Rennes [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() 2009-01-26 14:00 ` Louis Rilling @ 2009-01-26 14:19 ` Peter Zijlstra 2009-01-26 14:55 ` Louis Rilling 0 siblings, 1 reply; 53+ messages in thread From: Peter Zijlstra @ 2009-01-26 14:19 UTC (permalink / raw) To: Louis.Rilling Cc: Joel Becker, Andrew Morton, linux-kernel, cluster-devel, swhiteho On Mon, 2009-01-26 at 15:00 +0100, Louis Rilling wrote: > > Its not a locking correctness thing, but simply not being able to do it > > from the vfs calls because those assume locks held? > > > > Can't you simply punt the work to a worklet once you've created/removed > > the non-default group, which can be done from within the vfs callback ? > > I'm not sure to understand your suggestion. Is this: > 1) for mkdir(), create the non-default group, but without its default groups, > and defer their creation to a worker which won't have constraints on locks held > by any caller; > 2) for rmdir(), unlink the non-default group, but without unlinking its default > groups, and defer the recursive work to a lock-free context? > > For mkdir(), this may work. Maybe a bit confusing for userspace, since mkdir(A) > returns as soon as A is created, but A may be populated later and userspace may > rely on A being populated as soon as it is created (current behavior). As a > configfs user, this makes my life harder... Right, so that is the whole crux of the matter? Initially I understood the whole recursive locking issue to be about having to serialize mkdir vs rmdir so that you would know the default groups to be empty etc. You could create the subtree before you link it in. i_op->mkdir() only has the parent i_mutex held, so you should be able to create your inode, and all default groups (some of who will have the non-default group as parent, but that's ok, as we don't have that locked yet). Once you've constructed this, you could connect the non-default group to its parent. Also, you don't _need_ to have any i_mutex's locked here, because non of these inodes are reachable. > For rmdir(), is this safe to unlink a non-empty directory, and to empty it > afterwards? This looks like going back to the unmount problem. Dunno :-), I think it should be safe. The only guarantee you need is that there are no refs to inodes in the decoupled sub-tree (other than your own of course.) So you'd only need to punt the rmdir cleanup to eventd or something. ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() 2009-01-26 14:19 ` Peter Zijlstra @ 2009-01-26 14:55 ` Louis Rilling 2009-01-28 3:05 ` [Cluster-devel] " Joel Becker 0 siblings, 1 reply; 53+ messages in thread From: Louis Rilling @ 2009-01-26 14:55 UTC (permalink / raw) To: Peter Zijlstra Cc: Joel Becker, Andrew Morton, linux-kernel, cluster-devel, swhiteho [-- Attachment #1: Type: text/plain, Size: 3181 bytes --] On 26/01/09 15:19 +0100, Peter Zijlstra wrote: > On Mon, 2009-01-26 at 15:00 +0100, Louis Rilling wrote: > > > > Its not a locking correctness thing, but simply not being able to do it > > > from the vfs calls because those assume locks held? > > > > > > Can't you simply punt the work to a worklet once you've created/removed > > > the non-default group, which can be done from within the vfs callback ? > > > > I'm not sure to understand your suggestion. Is this: > > 1) for mkdir(), create the non-default group, but without its default groups, > > and defer their creation to a worker which won't have constraints on locks held > > by any caller; > > 2) for rmdir(), unlink the non-default group, but without unlinking its default > > groups, and defer the recursive work to a lock-free context? > > > > For mkdir(), this may work. Maybe a bit confusing for userspace, since mkdir(A) > > returns as soon as A is created, but A may be populated later and userspace may > > rely on A being populated as soon as it is created (current behavior). As a > > configfs user, this makes my life harder... > > Right, so that is the whole crux of the matter? Probably not. I'm not the maintainer of configfs, but I guess that Joel is a bit reluctant to deeply rework parts of something that actually works (conflicts with lockdep excepted). > > Initially I understood the whole recursive locking issue to be about > having to serialize mkdir vs rmdir so that you would know the default > groups to be empty etc. > > You could create the subtree before you link it in. i_op->mkdir() only > has the parent i_mutex held, so you should be able to create your inode, > and all default groups (some of who will have the non-default group as > parent, but that's ok, as we don't have that locked yet). > > Once you've constructed this, you could connect the non-default group to > its parent. > > Also, you don't _need_ to have any i_mutex's locked here, because non of > these inodes are reachable. True. I already suggested this to Joel (while fixing a race condition), but this raises other issues (see http://marc.info/?l=linux-kernel&m=121438776626316&w=2 for a previous discussion on this). > > > For rmdir(), is this safe to unlink a non-empty directory, and to empty it > > afterwards? This looks like going back to the unmount problem. > > Dunno :-), I think it should be safe. The only guarantee you need is > that there are no refs to inodes in the decoupled sub-tree (other than > your own of course.) > > So you'd only need to punt the rmdir cleanup to eventd or something. May be. Anyway I can't investigate this right now, and that's why I'm asking Joel if he is going to accept one of the temporary solutions that I provided (Note that my second solution http://marc.info/?l=linux-kernel&m=122962334723834&w=2 does not turn off lockdep!). Of course it's better if someone can just do this rework :) Thanks, Louis -- Dr Louis Rilling Kerlabs Skype: louis.rilling Batiment Germanium Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes http://www.kerlabs.com/ 35700 Rennes [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() 2009-01-26 14:55 ` Louis Rilling @ 2009-01-28 3:05 ` Joel Becker 0 siblings, 0 replies; 53+ messages in thread From: Joel Becker @ 2009-01-28 3:05 UTC (permalink / raw) To: Peter Zijlstra, Andrew Morton, linux-kernel, cluster-devel, swhiteho Thank you both for keeping up on this. I owe Louis some review that I'm going to try to get to. On Mon, Jan 26, 2009 at 03:55:36PM +0100, Louis Rilling wrote: > On 26/01/09 15:19 +0100, Peter Zijlstra wrote: > > On Mon, 2009-01-26 at 15:00 +0100, Louis Rilling wrote: > > > > > > Its not a locking correctness thing, but simply not being able to do it > > > > from the vfs calls because those assume locks held? > > > > > > > > Can't you simply punt the work to a worklet once you've created/removed > > > > the non-default group, which can be done from within the vfs callback ? > > > > > > I'm not sure to understand your suggestion. Is this: > > > 1) for mkdir(), create the non-default group, but without its default groups, > > > and defer their creation to a worker which won't have constraints on locks held > > > by any caller; > > > 2) for rmdir(), unlink the non-default group, but without unlinking its default > > > groups, and defer the recursive work to a lock-free context? > > > > > > For mkdir(), this may work. Maybe a bit confusing for userspace, since mkdir(A) > > > returns as soon as A is created, but A may be populated later and userspace may > > > rely on A being populated as soon as it is created (current behavior). As a > > > configfs user, this makes my life harder... > > > > Right, so that is the whole crux of the matter? The "appearance" of the entire default group hierarchy should be atomic. When mkdir(2) returns, it is all there. This does make our lives a little difficult inside configfs, but it makes everyone else's lives much easier. > Probably not. I'm not the maintainer of configfs, but I guess that Joel is a bit > reluctant to deeply rework parts of something that actually works (conflicts > with lockdep excepted). That is true - it works, and safely, and the lockdep conflict is the only real known issue. I'm not wary of changing it, I'm only wary of breaking it. That is, I want to understand what is being changed and be sure that we're keeping the correctness we have so far. I don't want to change it and "hope" we got it right, you know? This makes me cautious, but don't think I'm against change. As I stated, having lockdep work for us is a good thing. More replies to the other mails coming. Joel -- You can use a screwdriver to screw in screws or to clean your ears, however, the latter needs real skill, determination and a lack of fear of injuring yourself. It is much the same with JavaScript. - Chris Heilmann Joel Becker Principal Software Developer Oracle E-mail: joel.becker@oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 53+ messages in thread
* [Cluster-devel] Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() @ 2009-01-28 3:05 ` Joel Becker 0 siblings, 0 replies; 53+ messages in thread From: Joel Becker @ 2009-01-28 3:05 UTC (permalink / raw) To: cluster-devel.redhat.com Thank you both for keeping up on this. I owe Louis some review that I'm going to try to get to. On Mon, Jan 26, 2009 at 03:55:36PM +0100, Louis Rilling wrote: > On 26/01/09 15:19 +0100, Peter Zijlstra wrote: > > On Mon, 2009-01-26 at 15:00 +0100, Louis Rilling wrote: > > > > > > Its not a locking correctness thing, but simply not being able to do it > > > > from the vfs calls because those assume locks held? > > > > > > > > Can't you simply punt the work to a worklet once you've created/removed > > > > the non-default group, which can be done from within the vfs callback ? > > > > > > I'm not sure to understand your suggestion. Is this: > > > 1) for mkdir(), create the non-default group, but without its default groups, > > > and defer their creation to a worker which won't have constraints on locks held > > > by any caller; > > > 2) for rmdir(), unlink the non-default group, but without unlinking its default > > > groups, and defer the recursive work to a lock-free context? > > > > > > For mkdir(), this may work. Maybe a bit confusing for userspace, since mkdir(A) > > > returns as soon as A is created, but A may be populated later and userspace may > > > rely on A being populated as soon as it is created (current behavior). As a > > > configfs user, this makes my life harder... > > > > Right, so that is the whole crux of the matter? The "appearance" of the entire default group hierarchy should be atomic. When mkdir(2) returns, it is all there. This does make our lives a little difficult inside configfs, but it makes everyone else's lives much easier. > Probably not. I'm not the maintainer of configfs, but I guess that Joel is a bit > reluctant to deeply rework parts of something that actually works (conflicts > with lockdep excepted). That is true - it works, and safely, and the lockdep conflict is the only real known issue. I'm not wary of changing it, I'm only wary of breaking it. That is, I want to understand what is being changed and be sure that we're keeping the correctness we have so far. I don't want to change it and "hope" we got it right, you know? This makes me cautious, but don't think I'm against change. As I stated, having lockdep work for us is a good thing. More replies to the other mails coming. Joel -- You can use a screwdriver to screw in screws or to clean your ears, however, the latter needs real skill, determination and a lack of fear of injuring yourself. It is much the same with JavaScript. - Chris Heilmann Joel Becker Principal Software Developer Oracle E-mail: joel.becker at oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() 2009-01-26 12:30 ` Peter Zijlstra @ 2009-01-28 3:41 ` Joel Becker 2009-01-28 3:41 ` [Cluster-devel] " Joel Becker 1 sibling, 0 replies; 53+ messages in thread From: Joel Becker @ 2009-01-28 3:41 UTC (permalink / raw) To: Peter Zijlstra Cc: Andrew Morton, Louis Rilling, linux-kernel, cluster-devel, swhiteho On Mon, Jan 26, 2009 at 01:30:09PM +0100, Peter Zijlstra wrote: > I don't think I was suggesting that. All you need is to serialize any > mkdir/creat against the rmdir of the youngest non-default group, and you > can do that by holding su_mutex. > > In rmdir, you already own all the i_mutex instances you need to uncouple > the whole tree, all you need to do is validate that its indeed empty -- > you don't need i_mutex's for that, because you're holding su_mutex, and > any concurrent mkdir/creat will be blocking on that. > > If you find it empty, just mark everybody DEAD, drop su_mutex and > decouple. All concurrent mkdir/creat thingies that were blocking will > now bail because their parent is found DEAD. Right, that's what I was talking about when I said: > > Now look in detach_groups(). We drop the groups children before > > marking them DEAD. Louis' plan, I think, is to perhaps mark a group > > DEAD, disconnect it from the vfs, and then operate on its children. In > > this fashion, perhaps we can unlock the trailing lock like a normal VFS > > operation. > > This will require some serious auditing, however, because now > > vfs functions can get into the vfs objects behind us. And more vfs > > changes affect us. Whereas the current locking relies on the vfs's > > parent->child lock ordering only, something that isn't likely to be > > changed. That is, the vfs has already walked past this directory, dropped i_mutex, and is in a child default group holding its i_mutex. It wants to mkdir(2) down there. You're saying that, if mkdir(2) holds su_mutex higher up, it can check S_DEAD and compare with us, and that's exactly the scheme I mentioned in the first of the quoted paragraphs (Louis proposed it a while back). Thus, though we don't hold i_mutex on that child, it will eventually either a) have gotten su_mutex first, and will cause rmdir to ENOTEMPTY or b) have gotten su_mutex second, will see S_DEAD, and return -ENOENT. As far as preventing mkdir(2), I don't see why it wouldn't work. The issue is in that second quoted paragraph. We know that the VFS can look up our children if we're not holding i_mutex. In fact, cached_lookup() can find them without i_mutex. Now, we know that mkdir(2) and rmdir(2) will block at su_mutex. But what about all other file operations, both on the child directories *and* the attribute files? For attribute files, we prevent access at creation time with a flag. We can trust the flag because we hold i_mutex. This might hold anyway because we're holding that toplevel i_mutex. At teardown time, though, we know they can't be found because of i_mutex. Now we don't have that protection for processes that are farther down the tree. But the bigger issue is just the plain regular operations on our directories. An example is ->readdir(). We currently lock out ->readdir() during rmdir(2) with our holding of i_mutex. However, if we are not holding i_mutex, vfs_readdir() can call into our ->readdir() right as we're tearing things down. We may not have gotten to S_DEAD yet in the rmdir(2) path, and the two will race. We can't take su_mutex in ->readdir(), because that sort of solution effectively says "we have to take su_mutex for all operations", and we end up serializing all operations on a subsustem. Now, I can think of a way to make ->readdir() safe without su_mutex. But what other operation is next? How do I know I have them all? How do I notice when someone adds a new operation or code path to the generic code that I have to protect against? With i_mutex, I *know* that everyone has agreed that is the gatekeeper. Without it... *That's* the big worry. That's what I'm worried about being sure of. I'd love to hear a solution that we know will work, or at least move towards one. Joel PS: And we haven't even talked about configfs_depend_item() yet. -- Life's Little Instruction Book #252 "Take good care of those you love." Joel Becker Principal Software Developer Oracle E-mail: joel.becker@oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 53+ messages in thread
* [Cluster-devel] Re: [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() @ 2009-01-28 3:41 ` Joel Becker 0 siblings, 0 replies; 53+ messages in thread From: Joel Becker @ 2009-01-28 3:41 UTC (permalink / raw) To: cluster-devel.redhat.com On Mon, Jan 26, 2009 at 01:30:09PM +0100, Peter Zijlstra wrote: > I don't think I was suggesting that. All you need is to serialize any > mkdir/creat against the rmdir of the youngest non-default group, and you > can do that by holding su_mutex. > > In rmdir, you already own all the i_mutex instances you need to uncouple > the whole tree, all you need to do is validate that its indeed empty -- > you don't need i_mutex's for that, because you're holding su_mutex, and > any concurrent mkdir/creat will be blocking on that. > > If you find it empty, just mark everybody DEAD, drop su_mutex and > decouple. All concurrent mkdir/creat thingies that were blocking will > now bail because their parent is found DEAD. Right, that's what I was talking about when I said: > > Now look in detach_groups(). We drop the groups children before > > marking them DEAD. Louis' plan, I think, is to perhaps mark a group > > DEAD, disconnect it from the vfs, and then operate on its children. In > > this fashion, perhaps we can unlock the trailing lock like a normal VFS > > operation. > > This will require some serious auditing, however, because now > > vfs functions can get into the vfs objects behind us. And more vfs > > changes affect us. Whereas the current locking relies on the vfs's > > parent->child lock ordering only, something that isn't likely to be > > changed. That is, the vfs has already walked past this directory, dropped i_mutex, and is in a child default group holding its i_mutex. It wants to mkdir(2) down there. You're saying that, if mkdir(2) holds su_mutex higher up, it can check S_DEAD and compare with us, and that's exactly the scheme I mentioned in the first of the quoted paragraphs (Louis proposed it a while back). Thus, though we don't hold i_mutex on that child, it will eventually either a) have gotten su_mutex first, and will cause rmdir to ENOTEMPTY or b) have gotten su_mutex second, will see S_DEAD, and return -ENOENT. As far as preventing mkdir(2), I don't see why it wouldn't work. The issue is in that second quoted paragraph. We know that the VFS can look up our children if we're not holding i_mutex. In fact, cached_lookup() can find them without i_mutex. Now, we know that mkdir(2) and rmdir(2) will block at su_mutex. But what about all other file operations, both on the child directories *and* the attribute files? For attribute files, we prevent access at creation time with a flag. We can trust the flag because we hold i_mutex. This might hold anyway because we're holding that toplevel i_mutex. At teardown time, though, we know they can't be found because of i_mutex. Now we don't have that protection for processes that are farther down the tree. But the bigger issue is just the plain regular operations on our directories. An example is ->readdir(). We currently lock out ->readdir() during rmdir(2) with our holding of i_mutex. However, if we are not holding i_mutex, vfs_readdir() can call into our ->readdir() right as we're tearing things down. We may not have gotten to S_DEAD yet in the rmdir(2) path, and the two will race. We can't take su_mutex in ->readdir(), because that sort of solution effectively says "we have to take su_mutex for all operations", and we end up serializing all operations on a subsustem. Now, I can think of a way to make ->readdir() safe without su_mutex. But what other operation is next? How do I know I have them all? How do I notice when someone adds a new operation or code path to the generic code that I have to protect against? With i_mutex, I *know* that everyone has agreed that is the gatekeeper. Without it... *That's* the big worry. That's what I'm worried about being sure of. I'd love to hear a solution that we know will work, or at least move towards one. Joel PS: And we haven't even talked about configfs_depend_item() yet. -- Life's Little Instruction Book #252 "Take good care of those you love." Joel Becker Principal Software Developer Oracle E-mail: joel.becker at oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 53+ messages in thread
* [PATCH v2] Make lockdep happy with configfs @ 2009-01-28 18:18 Louis Rilling 2009-01-28 18:18 ` [PATCH 2/2] configfs: Rework configfs_depend_item() locking and make lockdep happy Louis Rilling 0 siblings, 1 reply; 53+ messages in thread From: Louis Rilling @ 2009-01-28 18:18 UTC (permalink / raw) To: Joel Becker; +Cc: linux-kernel, akpm, cluster-devel, swhiteho, peterz Hi Joel, Here is a revised version of the patchset making lockdep happy with configfs. I still don't have a good setup to test the second patch beyond compilation, and I still guess that you have one :) Louis Changelog: - put s_depth logic in separate functions and remove #ifdef LOCKDEP in the hooked functions. - added the following note to explain why configfs_depend_prep() is correct when examining attaching items: + * Note: items in the middle of attachment start with s_type = 0 + * (configfs_new_dirent()), and configfs_make_dirent() (called from + * create_dir()) sets s_type = CONFIGFS_DIR|CONFIGFS_USET_CREATING. In both + * cases the item is ignored. Since s_type is an int, we rely on the CPU to + * atomically update the value, without making configfs_make_dirent() take + * configfs_dirent_lock. - fixed parenthesis on pattern !a & b && c --> !(a & b) && c - quiet checkpatch Louis Rilling (2): configfs: Silence lockdep on mkdir() and rmdir() configfs: Rework configfs_depend_item() locking and make lockdep happy fs/configfs/configfs_internal.h | 3 + fs/configfs/dir.c | 188 ++++++++++++++++++++++++++++----------- fs/configfs/inode.c | 38 ++++++++ 3 files changed, 175 insertions(+), 54 deletions(-) -- Dr Louis Rilling Kerlabs Skype: louis.rilling Batiment Germanium Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes http://www.kerlabs.com/ 35700 Rennes ^ permalink raw reply [flat|nested] 53+ messages in thread
* [PATCH 2/2] configfs: Rework configfs_depend_item() locking and make lockdep happy 2009-01-28 18:18 [PATCH v2] Make lockdep happy with configfs Louis Rilling @ 2009-01-28 18:18 ` Louis Rilling 2009-04-29 18:52 ` Joel Becker 0 siblings, 1 reply; 53+ messages in thread From: Louis Rilling @ 2009-01-28 18:18 UTC (permalink / raw) To: Joel Becker Cc: linux-kernel, akpm, cluster-devel, swhiteho, peterz, Louis Rilling configfs_depend_item() recursively locks all inodes mutex from configfs root to the target item, which makes lockdep unhappy. The purpose of this recursive locking is to ensure that the item tree can be safely parsed and that the target item, if found, is not about to leave. This patch reworks configfs_depend_item() locking using configfs_dirent_lock. Since configfs_dirent_lock protects all changes to the configfs_dirent tree, and protects tagging of items to be removed, this lock can be used instead of the inodes mutex lock chain. This needs that the check for dependents be done atomically with CONFIGFS_USET_DROPPING tagging. Now lockdep looks happy with configfs. Changelog: - added the following note to explain why configfs_depend_prep() is correct when examining attaching items: + * Note: items in the middle of attachment start with s_type = 0 + * (configfs_new_dirent()), and configfs_make_dirent() (called from + * create_dir()) sets s_type = CONFIGFS_DIR|CONFIGFS_USET_CREATING. In both + * cases the item is ignored. Since s_type is an int, we rely on the CPU to + * atomically update the value, without making configfs_make_dirent() take + * configfs_dirent_lock. - fixed parenthesis on pattern !a & b && c --> !(a & b) && c - quiet checkpatch Signed-off-by: Louis Rilling <louis.rilling@kerlabs.com> --- fs/configfs/dir.c | 98 ++++++++++++++++++++++++----------------------------- 1 files changed, 44 insertions(+), 54 deletions(-) diff --git a/fs/configfs/dir.c b/fs/configfs/dir.c index 836596b..2739514 100644 --- a/fs/configfs/dir.c +++ b/fs/configfs/dir.c @@ -1006,11 +1006,11 @@ static int configfs_dump(struct configfs_dirent *sd, int level) * Note, btw, that this can be called at *any* time, even when a configfs * subsystem isn't registered, or when configfs is loading or unloading. * Just like configfs_register_subsystem(). So we take the same - * precautions. We pin the filesystem. We lock each i_mutex _in_order_ - * on our way down the tree. If we can find the target item in the + * precautions. We pin the filesystem. We lock configfs_dirent_lock. + * If we can find the target item in the * configfs tree, it must be part of the subsystem tree as well, so we - * do not need the subsystem semaphore. Holding the i_mutex chain locks - * out mkdir() and rmdir(), who might be racing us. + * do not need the subsystem semaphore. Holding configfs_dirent_lock helps + * locking out mkdir() and rmdir(), who might be racing us. */ /* @@ -1023,17 +1023,23 @@ static int configfs_dump(struct configfs_dirent *sd, int level) * do that so we can unlock it if we find nothing. * * Here we do a depth-first search of the dentry hierarchy looking for - * our object. We take i_mutex on each step of the way down. IT IS - * ESSENTIAL THAT i_mutex LOCKING IS ORDERED. If we come back up a branch, - * we'll drop the i_mutex. + * our object. + * We deliberately ignore items tagged as dropping since they are virtually + * dead, as well as items in the middle of attachment since they virtually + * do not exist yet. This completes the locking out of racing mkdir() and + * rmdir(). + * Note: items in the middle of attachment start with s_type = 0 + * (configfs_new_dirent()), and configfs_make_dirent() (called from + * create_dir()) sets s_type = CONFIGFS_DIR|CONFIGFS_USET_CREATING. In both + * cases the item is ignored. Since s_type is an int, we rely on the CPU to + * atomically update the value, without making configfs_make_dirent() take + * configfs_dirent_lock. * - * If the target is not found, -ENOENT is bubbled up and we have released - * all locks. If the target was found, the locks will be cleared by - * configfs_depend_rollback(). + * If the target is not found, -ENOENT is bubbled up. * * This adds a requirement that all config_items be unique! * - * This is recursive because the locking traversal is tricky. There isn't + * This is recursive. There isn't * much on the stack, though, so folks that need this function - be careful * about your stack! Patches will be accepted to make it iterative. */ @@ -1045,13 +1051,13 @@ static int configfs_depend_prep(struct dentry *origin, BUG_ON(!origin || !sd); - /* Lock this guy on the way down */ - mutex_lock(&sd->s_dentry->d_inode->i_mutex); if (sd->s_element == target) /* Boo-yah */ goto out; list_for_each_entry(child_sd, &sd->s_children, s_sibling) { - if (child_sd->s_type & CONFIGFS_DIR) { + if ((child_sd->s_type & CONFIGFS_DIR) && + !(child_sd->s_type & CONFIGFS_USET_DROPPING) && + !(child_sd->s_type & CONFIGFS_USET_CREATING)) { ret = configfs_depend_prep(child_sd->s_dentry, target); if (!ret) @@ -1060,33 +1066,12 @@ static int configfs_depend_prep(struct dentry *origin, } /* We looped all our children and didn't find target */ - mutex_unlock(&sd->s_dentry->d_inode->i_mutex); ret = -ENOENT; out: return ret; } -/* - * This is ONLY called if configfs_depend_prep() did its job. So we can - * trust the entire path from item back up to origin. - * - * We walk backwards from item, unlocking each i_mutex. We finish by - * unlocking origin. - */ -static void configfs_depend_rollback(struct dentry *origin, - struct config_item *item) -{ - struct dentry *dentry = item->ci_dentry; - - while (dentry != origin) { - mutex_unlock(&dentry->d_inode->i_mutex); - dentry = dentry->d_parent; - } - - mutex_unlock(&origin->d_inode->i_mutex); -} - int configfs_depend_item(struct configfs_subsystem *subsys, struct config_item *target) { @@ -1127,17 +1112,21 @@ int configfs_depend_item(struct configfs_subsystem *subsys, /* Ok, now we can trust subsys/s_item */ - /* Scan the tree, locking i_mutex recursively, return 0 if found */ + spin_lock(&configfs_dirent_lock); + /* Scan the tree, return 0 if found */ ret = configfs_depend_prep(subsys_sd->s_dentry, target); if (ret) - goto out_unlock_fs; + goto out_unlock_dirent_lock; - /* We hold all i_mutexes from the subsystem down to the target */ + /* + * We are sure that the item is not about to be removed by rmdir(), and + * not in the middle of attachment by mkdir(). + */ p = target->ci_dentry->d_fsdata; p->s_dependent_count += 1; - configfs_depend_rollback(subsys_sd->s_dentry, target); - +out_unlock_dirent_lock: + spin_unlock(&configfs_dirent_lock); out_unlock_fs: mutex_unlock(&configfs_sb->s_root->d_inode->i_mutex); @@ -1162,10 +1151,10 @@ void configfs_undepend_item(struct configfs_subsystem *subsys, struct configfs_dirent *sd; /* - * Since we can trust everything is pinned, we just need i_mutex - * on the item. + * Since we can trust everything is pinned, we just need + * configfs_dirent_lock. */ - mutex_lock(&target->ci_dentry->d_inode->i_mutex); + spin_lock(&configfs_dirent_lock); sd = target->ci_dentry->d_fsdata; BUG_ON(sd->s_dependent_count < 1); @@ -1176,7 +1165,7 @@ void configfs_undepend_item(struct configfs_subsystem *subsys, * After this unlock, we cannot trust the item to stay alive! * DO NOT REFERENCE item after this unlock. */ - mutex_unlock(&target->ci_dentry->d_inode->i_mutex); + spin_unlock(&configfs_dirent_lock); } EXPORT_SYMBOL(configfs_undepend_item); @@ -1376,13 +1365,6 @@ static int configfs_rmdir(struct inode *dir, struct dentry *dentry) if (sd->s_type & CONFIGFS_USET_DEFAULT) return -EPERM; - /* - * Here's where we check for dependents. We're protected by - * i_mutex. - */ - if (sd->s_dependent_count) - return -EBUSY; - /* Get a working ref until we have the child */ parent_item = configfs_get_config_item(dentry->d_parent); subsys = to_config_group(parent_item)->cg_subsys; @@ -1406,9 +1388,17 @@ static int configfs_rmdir(struct inode *dir, struct dentry *dentry) mutex_lock(&configfs_symlink_mutex); spin_lock(&configfs_dirent_lock); - ret = configfs_detach_prep(dentry, &wait_mutex); - if (ret) - configfs_detach_rollback(dentry); + /* + * Here's where we check for dependents. We're protected by + * configfs_dirent_lock. + * If no dependent, atomically tag the item as dropping. + */ + ret = sd->s_dependent_count ? -EBUSY : 0; + if (!ret) { + ret = configfs_detach_prep(dentry, &wait_mutex); + if (ret) + configfs_detach_rollback(dentry); + } spin_unlock(&configfs_dirent_lock); mutex_unlock(&configfs_symlink_mutex); -- 1.5.6.5 ^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: [PATCH 2/2] configfs: Rework configfs_depend_item() locking and make lockdep happy 2009-01-28 18:18 ` [PATCH 2/2] configfs: Rework configfs_depend_item() locking and make lockdep happy Louis Rilling @ 2009-04-29 18:52 ` Joel Becker 2009-04-30 9:18 ` Louis Rilling 0 siblings, 1 reply; 53+ messages in thread From: Joel Becker @ 2009-04-29 18:52 UTC (permalink / raw) To: Louis Rilling; +Cc: linux-kernel, akpm, cluster-devel, swhiteho, peterz On Wed, Jan 28, 2009 at 07:18:33PM +0100, Louis Rilling wrote: > configfs_depend_item() recursively locks all inodes mutex from configfs root to > the target item, which makes lockdep unhappy. The purpose of this recursive > locking is to ensure that the item tree can be safely parsed and that the target > item, if found, is not about to leave. > > This patch reworks configfs_depend_item() locking using configfs_dirent_lock. > Since configfs_dirent_lock protects all changes to the configfs_dirent tree, and > protects tagging of items to be removed, this lock can be used instead of the > inodes mutex lock chain. > This needs that the check for dependents be done atomically with > CONFIGFS_USET_DROPPING tagging. These patches are now in the 'lockdep' branch of the configfs tree. I'm planning to send them in the next merge window. I've made one change. > + * Note: items in the middle of attachment start with s_type = 0 > + * (configfs_new_dirent()), and configfs_make_dirent() (called from > + * create_dir()) sets s_type = CONFIGFS_DIR|CONFIGFS_USET_CREATING. In both > + * cases the item is ignored. Since s_type is an int, we rely on the CPU to > + * atomically update the value, without making configfs_make_dirent() take > + * configfs_dirent_lock. I've added configfs_dirent_lock in configfs_make_dirent(), because it is not safe at all to rely on the fact that s_type is an int. It's an atomic set on one CPU, but there's no guarantee that it's seen correctly on other CPUs. Plus, there's no real need for speed here. So we properly take configfs_dirent_lock around s_type in configfs_make_dirent(), and that ensures we see things correctly on SMP. Joel -- "Three o'clock is always too late or too early for anything you want to do." - Jean-Paul Sartre Joel Becker Principal Software Developer Oracle E-mail: joel.becker@oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH 2/2] configfs: Rework configfs_depend_item() locking and make lockdep happy 2009-04-29 18:52 ` Joel Becker @ 2009-04-30 9:18 ` Louis Rilling 2009-04-30 17:20 ` Joel Becker 0 siblings, 1 reply; 53+ messages in thread From: Louis Rilling @ 2009-04-30 9:18 UTC (permalink / raw) To: linux-kernel, akpm, cluster-devel, swhiteho, peterz [-- Attachment #1: Type: text/plain, Size: 2230 bytes --] On 29/04/09 11:52 -0700, Joel Becker wrote: > On Wed, Jan 28, 2009 at 07:18:33PM +0100, Louis Rilling wrote: > > configfs_depend_item() recursively locks all inodes mutex from configfs root to > > the target item, which makes lockdep unhappy. The purpose of this recursive > > locking is to ensure that the item tree can be safely parsed and that the target > > item, if found, is not about to leave. > > > > This patch reworks configfs_depend_item() locking using configfs_dirent_lock. > > Since configfs_dirent_lock protects all changes to the configfs_dirent tree, and > > protects tagging of items to be removed, this lock can be used instead of the > > inodes mutex lock chain. > > This needs that the check for dependents be done atomically with > > CONFIGFS_USET_DROPPING tagging. > > These patches are now in the 'lockdep' branch of the configfs > tree. I'm planning to send them in the next merge window. I've made > one change. > > > + * Note: items in the middle of attachment start with s_type = 0 > > + * (configfs_new_dirent()), and configfs_make_dirent() (called from > > + * create_dir()) sets s_type = CONFIGFS_DIR|CONFIGFS_USET_CREATING. In both > > + * cases the item is ignored. Since s_type is an int, we rely on the CPU to > > + * atomically update the value, without making configfs_make_dirent() take > > + * configfs_dirent_lock. > > I've added configfs_dirent_lock in configfs_make_dirent(), > because it is not safe at all to rely on the fact that s_type is an int. > It's an atomic set on one CPU, but there's no guarantee that it's seen > correctly on other CPUs. Plus, there's no real need for speed here. So > we properly take configfs_dirent_lock around s_type in > configfs_make_dirent(), and that ensures we see things correctly on SMP. Agreed, I was going to suggest something like this. Actually I'd push the initialization of s_type down to configfs_new_dirent(), so that s_type either is always NULL, or always shows the correct type of object. Thanks, Louis -- Dr Louis Rilling Kerlabs Skype: louis.rilling Batiment Germanium Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes http://www.kerlabs.com/ 35700 Rennes [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH 2/2] configfs: Rework configfs_depend_item() locking and make lockdep happy 2009-04-30 9:18 ` Louis Rilling @ 2009-04-30 17:20 ` Joel Becker 2009-04-30 17:30 ` Joel Becker 0 siblings, 1 reply; 53+ messages in thread From: Joel Becker @ 2009-04-30 17:20 UTC (permalink / raw) To: linux-kernel, akpm, cluster-devel, swhiteho, peterz On Thu, Apr 30, 2009 at 11:18:28AM +0200, Louis Rilling wrote: > On 29/04/09 11:52 -0700, Joel Becker wrote: > > On Wed, Jan 28, 2009 at 07:18:33PM +0100, Louis Rilling wrote: > > > configfs_depend_item() recursively locks all inodes mutex from configfs root to > > > the target item, which makes lockdep unhappy. The purpose of this recursive > > > locking is to ensure that the item tree can be safely parsed and that the target > > > item, if found, is not about to leave. > > > > > > This patch reworks configfs_depend_item() locking using configfs_dirent_lock. > > > Since configfs_dirent_lock protects all changes to the configfs_dirent tree, and > > > protects tagging of items to be removed, this lock can be used instead of the > > > inodes mutex lock chain. > > > This needs that the check for dependents be done atomically with > > > CONFIGFS_USET_DROPPING tagging. > > > > These patches are now in the 'lockdep' branch of the configfs > > tree. I'm planning to send them in the next merge window. I've made > > one change. > > > > > + * Note: items in the middle of attachment start with s_type = 0 > > > + * (configfs_new_dirent()), and configfs_make_dirent() (called from > > > + * create_dir()) sets s_type = CONFIGFS_DIR|CONFIGFS_USET_CREATING. In both > > > + * cases the item is ignored. Since s_type is an int, we rely on the CPU to > > > + * atomically update the value, without making configfs_make_dirent() take > > > + * configfs_dirent_lock. > > > > I've added configfs_dirent_lock in configfs_make_dirent(), > > because it is not safe at all to rely on the fact that s_type is an int. > > It's an atomic set on one CPU, but there's no guarantee that it's seen > > correctly on other CPUs. Plus, there's no real need for speed here. So > > we properly take configfs_dirent_lock around s_type in > > configfs_make_dirent(), and that ensures we see things correctly on SMP. > > Agreed, I was going to suggest something like this. Actually I'd push the > initialization of s_type down to configfs_new_dirent(), so that s_type either > is always NULL, or always shows the correct type of object. 0, not "NULL", but yeah I think that's a good plan. Joel -- "If at first you don't succeed, cover all traces that you tried." -Unknown Joel Becker Principal Software Developer Oracle E-mail: joel.becker@oracle.com Phone: (650) 506-8127 ^ permalink raw reply [flat|nested] 53+ messages in thread
* Re: [PATCH 2/2] configfs: Rework configfs_depend_item() locking and make lockdep happy 2009-04-30 17:20 ` Joel Becker @ 2009-04-30 17:30 ` Joel Becker 2009-05-04 10:20 ` Louis Rilling 0 siblings, 1 reply; 53+ messages in thread From: Joel Becker @ 2009-04-30 17:30 UTC (permalink / raw) To: linux-kernel, akpm, cluster-devel, swhiteho, peterz On Thu, Apr 30, 2009 at 10:20:14AM -0700, Joel Becker wrote: > On Thu, Apr 30, 2009 at 11:18:28AM +0200, Louis Rilling wrote: > > Agreed, I was going to suggest something like this. Actually I'd push the > > initialization of s_type down to configfs_new_dirent(), so that s_type either > > is always NULL, or always shows the correct type of object. > > 0, not "NULL", but yeah I think that's a good plan. Like this. Please review the comment change mostly. diff --git a/fs/configfs/dir.c b/fs/configfs/dir.c index 63d8815..8e48b52 100644 --- a/fs/configfs/dir.c +++ b/fs/configfs/dir.c @@ -167,8 +167,8 @@ configfs_adjust_dir_dirent_depth_after_populate(struct configfs_dirent *sd) /* * Allocates a new configfs_dirent and links it to the parent configfs_dirent */ -static struct configfs_dirent *configfs_new_dirent(struct configfs_dirent * parent_sd, - void * element) +static struct configfs_dirent *configfs_new_dirent(struct configfs_dirent *parent_sd, + void *element, int type) { struct configfs_dirent * sd; @@ -180,6 +180,7 @@ static struct configfs_dirent *configfs_new_dirent(struct configfs_dirent * pare INIT_LIST_HEAD(&sd->s_links); INIT_LIST_HEAD(&sd->s_children); sd->s_element = element; + sd->s_type = type; configfs_init_dirent_depth(sd); spin_lock(&configfs_dirent_lock); if (parent_sd->s_type & CONFIGFS_USET_DROPPING) { @@ -225,19 +226,12 @@ int configfs_make_dirent(struct configfs_dirent * parent_sd, { struct configfs_dirent * sd; - sd = configfs_new_dirent(parent_sd, element); + sd = configfs_new_dirent(parent_sd, element, type); if (IS_ERR(sd)) return PTR_ERR(sd); - /* - * We need configfs_dirent_lock so that configfs_depend_prep() - * can see s_type accurately on other CPUs. - */ - spin_lock(&configfs_dirent_lock); sd->s_mode = mode; - sd->s_type = type; sd->s_dentry = dentry; - spin_unlock(&configfs_dirent_lock); if (dentry) { dentry->d_fsdata = configfs_get(sd); dentry->d_op = &configfs_dentry_ops; @@ -1034,11 +1028,10 @@ static int configfs_dump(struct configfs_dirent *sd, int level) * dead, as well as items in the middle of attachment since they virtually * do not exist yet. This completes the locking out of racing mkdir() and * rmdir(). - * Note: items in the middle of attachment start with s_type = 0 - * (configfs_new_dirent()), and configfs_make_dirent() (called from - * create_dir()) sets s_type = CONFIGFS_DIR|CONFIGFS_USET_CREATING. In both - * cases the item is ignored. configfs_make_dirent() is locked out from - * updating s_type by configfs_dirent_lock. + * Note: subdirectories in the middle of attachment start with s_type = + * CONFIGFS_DIR|CONFIGFS_USET_CREATING set by create_dir(). When + * CONFIGFS_USET_CREATING is set, we ignore the item. The actual set of + * s_type is in configfs_new_dirent(), which has configfs_dirent_lock. * * If the target is not found, -ENOENT is bubbled up. * @@ -1514,7 +1507,7 @@ static int configfs_dir_open(struct inode *inode, struct file *file) */ err = -ENOENT; if (configfs_dirent_is_ready(parent_sd)) { - file->private_data = configfs_new_dirent(parent_sd, NULL); + file->private_data = configfs_new_dirent(parent_sd, NULL, 0); if (IS_ERR(file->private_data)) err = PTR_ERR(file->private_data); else -- "There is no sincerer love than the love of food." - George Bernard Shaw Joel Becker Principal Software Developer Oracle E-mail: joel.becker@oracle.com Phone: (650) 506-8127 ^ permalink raw reply related [flat|nested] 53+ messages in thread
* Re: [PATCH 2/2] configfs: Rework configfs_depend_item() locking and make lockdep happy 2009-04-30 17:30 ` Joel Becker @ 2009-05-04 10:20 ` Louis Rilling 0 siblings, 0 replies; 53+ messages in thread From: Louis Rilling @ 2009-05-04 10:20 UTC (permalink / raw) To: linux-kernel, akpm, cluster-devel, swhiteho, peterz [-- Attachment #1: Type: text/plain, Size: 4567 bytes --] On 30/04/09 10:30 -0700, Joel Becker wrote: > On Thu, Apr 30, 2009 at 10:20:14AM -0700, Joel Becker wrote: > > On Thu, Apr 30, 2009 at 11:18:28AM +0200, Louis Rilling wrote: > > > Agreed, I was going to suggest something like this. Actually I'd push the > > > initialization of s_type down to configfs_new_dirent(), so that s_type either > > > is always NULL, or always shows the correct type of object. > > > > 0, not "NULL", but yeah I think that's a good plan. > > Like this. Please review the comment change mostly. Indeed, the code is exactly what I had in mind. Small comments about the comment change below. Thanks! Acked-by: Louis Rilling <louis.rilling@kerlabs.com> > > > diff --git a/fs/configfs/dir.c b/fs/configfs/dir.c > index 63d8815..8e48b52 100644 > --- a/fs/configfs/dir.c > +++ b/fs/configfs/dir.c > @@ -167,8 +167,8 @@ configfs_adjust_dir_dirent_depth_after_populate(struct configfs_dirent *sd) > /* > * Allocates a new configfs_dirent and links it to the parent configfs_dirent > */ > -static struct configfs_dirent *configfs_new_dirent(struct configfs_dirent * parent_sd, > - void * element) > +static struct configfs_dirent *configfs_new_dirent(struct configfs_dirent *parent_sd, > + void *element, int type) > { > struct configfs_dirent * sd; > > @@ -180,6 +180,7 @@ static struct configfs_dirent *configfs_new_dirent(struct configfs_dirent * pare > INIT_LIST_HEAD(&sd->s_links); > INIT_LIST_HEAD(&sd->s_children); > sd->s_element = element; > + sd->s_type = type; > configfs_init_dirent_depth(sd); > spin_lock(&configfs_dirent_lock); > if (parent_sd->s_type & CONFIGFS_USET_DROPPING) { > @@ -225,19 +226,12 @@ int configfs_make_dirent(struct configfs_dirent * parent_sd, > { > struct configfs_dirent * sd; > > - sd = configfs_new_dirent(parent_sd, element); > + sd = configfs_new_dirent(parent_sd, element, type); > if (IS_ERR(sd)) > return PTR_ERR(sd); > > - /* > - * We need configfs_dirent_lock so that configfs_depend_prep() > - * can see s_type accurately on other CPUs. > - */ > - spin_lock(&configfs_dirent_lock); > sd->s_mode = mode; > - sd->s_type = type; > sd->s_dentry = dentry; > - spin_unlock(&configfs_dirent_lock); > if (dentry) { > dentry->d_fsdata = configfs_get(sd); > dentry->d_op = &configfs_dentry_ops; > @@ -1034,11 +1028,10 @@ static int configfs_dump(struct configfs_dirent *sd, int level) > * dead, as well as items in the middle of attachment since they virtually > * do not exist yet. This completes the locking out of racing mkdir() and > * rmdir(). > - * Note: items in the middle of attachment start with s_type = 0 > - * (configfs_new_dirent()), and configfs_make_dirent() (called from > - * create_dir()) sets s_type = CONFIGFS_DIR|CONFIGFS_USET_CREATING. In both > - * cases the item is ignored. configfs_make_dirent() is locked out from > - * updating s_type by configfs_dirent_lock. > + * Note: subdirectories in the middle of attachment start with s_type = > + * CONFIGFS_DIR|CONFIGFS_USET_CREATING set by create_dir(). When I'd say "As long as CONFIGFS_USET_CREATING is set", since, by design, once cleared, CONFIGFS_USET_CREATING never comes back. Louis > + * CONFIGFS_USET_CREATING is set, we ignore the item. The actual set of > + * s_type is in configfs_new_dirent(), which has configfs_dirent_lock. > * > * If the target is not found, -ENOENT is bubbled up. > * > @@ -1514,7 +1507,7 @@ static int configfs_dir_open(struct inode *inode, struct file *file) > */ > err = -ENOENT; > if (configfs_dirent_is_ready(parent_sd)) { > - file->private_data = configfs_new_dirent(parent_sd, NULL); > + file->private_data = configfs_new_dirent(parent_sd, NULL, 0); > if (IS_ERR(file->private_data)) > err = PTR_ERR(file->private_data); > else > > -- > > "There is no sincerer love than the love of food." > - George Bernard Shaw > > Joel Becker > Principal Software Developer > Oracle > E-mail: joel.becker@oracle.com > Phone: (650) 506-8127 > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- Dr Louis Rilling Kerlabs Skype: louis.rilling Batiment Germanium Phone: (+33|0) 6 80 89 08 23 80 avenue des Buttes de Coesmes http://www.kerlabs.com/ 35700 Rennes [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 53+ messages in thread
end of thread, other threads:[~2009-05-04 10:20 UTC | newest] Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2008-12-11 14:20 configfs, dlm_controld & lockdep Steven Whitehouse 2008-12-11 14:20 ` [Cluster-devel] " Steven Whitehouse 2008-12-11 14:44 ` Louis Rilling 2008-12-11 17:34 ` Joel Becker 2008-12-11 17:34 ` [Cluster-devel] " Joel Becker 2008-12-12 10:06 ` Louis Rilling 2008-12-12 15:29 ` [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() Louis Rilling 2008-12-17 21:40 ` Andrew Morton 2008-12-17 21:40 ` [Cluster-devel] " Andrew Morton 2008-12-17 22:03 ` Joel Becker 2008-12-17 22:03 ` [Cluster-devel] " Joel Becker 2008-12-17 22:09 ` Andrew Morton 2008-12-17 22:09 ` [Cluster-devel] " Andrew Morton 2008-12-18 7:26 ` Peter Zijlstra 2008-12-18 9:27 ` Joel Becker 2008-12-18 9:27 ` [Cluster-devel] " Joel Becker 2008-12-18 11:15 ` Louis Rilling 2008-12-18 18:00 ` Make lockdep happy with configfs Louis Rilling 2009-01-26 11:51 ` Louis Rilling 2009-01-28 3:44 ` Joel Becker 2009-01-28 3:44 ` [Cluster-devel] " Joel Becker 2008-12-18 18:00 ` [PATCH 1/2] configfs: Silence lockdep on mkdir() and rmdir() Louis Rilling 2009-01-28 3:55 ` Joel Becker 2009-01-28 3:55 ` [Cluster-devel] " Joel Becker 2009-01-28 10:38 ` Louis Rilling 2008-12-18 18:00 ` [PATCH 2/2] configfs: Rework configfs_depend_item() locking and make lockdep happy Louis Rilling 2009-01-28 4:13 ` Joel Becker 2009-01-28 4:13 ` [Cluster-devel] " Joel Becker 2009-01-28 10:32 ` Louis Rilling 2008-12-18 11:26 ` [PATCH] configfs: Silence lockdep on mkdir(), rmdir() and configfs_depend_item() Steven Whitehouse 2008-12-18 11:26 ` [Cluster-devel] " Steven Whitehouse 2008-12-18 11:48 ` Louis Rilling 2008-12-18 11:56 ` Peter Zijlstra 2008-12-18 12:28 ` Peter Zijlstra 2008-12-18 22:58 ` Joel Becker 2008-12-18 22:58 ` [Cluster-devel] " Joel Becker 2008-12-19 10:29 ` Louis Rilling 2009-01-26 12:30 ` Peter Zijlstra 2009-01-26 13:24 ` Louis Rilling 2009-01-26 13:41 ` Peter Zijlstra 2009-01-26 14:00 ` Louis Rilling 2009-01-26 14:19 ` Peter Zijlstra 2009-01-26 14:55 ` Louis Rilling 2009-01-28 3:05 ` Joel Becker 2009-01-28 3:05 ` [Cluster-devel] " Joel Becker 2009-01-28 3:41 ` Joel Becker 2009-01-28 3:41 ` [Cluster-devel] " Joel Becker 2009-01-28 18:18 [PATCH v2] Make lockdep happy with configfs Louis Rilling 2009-01-28 18:18 ` [PATCH 2/2] configfs: Rework configfs_depend_item() locking and make lockdep happy Louis Rilling 2009-04-29 18:52 ` Joel Becker 2009-04-30 9:18 ` Louis Rilling 2009-04-30 17:20 ` Joel Becker 2009-04-30 17:30 ` Joel Becker 2009-05-04 10:20 ` Louis Rilling
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.