* CCF locking problems
@ 2014-12-12 17:37 Russell King - ARM Linux
2014-12-12 18:50 ` Stephen Boyd
0 siblings, 1 reply; 3+ messages in thread
From: Russell King - ARM Linux @ 2014-12-12 17:37 UTC (permalink / raw)
To: linux-arm-kernel
CCF is causing lockdep problems elsewhere in the kernel...
During kernel boot, the following lock chain is established via CCF when
debugfs is enabled:
clk_register()
`-> __clk_init() (takes prepare_lock)
`-> clk_debug_register() (takes clk_debug_lock)
`-> debugfs_create_dir()
`-> __create_file() (takes i_mutex)
So, prepare_lock ends up being a parent of the i_mutex class.
Generic kernel code creates a dependency between i_mutex and mmap_sem:
iterate_dir() (takes i_mutex)
`-> dcache_readdir()
`-> filldir64()
`-> might_fault() (marks mmap_sem as potentially taken by
a fault)
This means prepare_lock is also a parent lock of mmap_sem.
The kernel mmap() implementation of takes the mmap_sem, before calling
into drivers. DRM's drm_gem_mmap() function which will be called as a
result of a mmap() takes the dev->struct_mutex mutex - which ends up as
a child of mmap_sem.
Now, if a DRM driver _then_ wants to use runtime PM, it may need to
call runtime PM functions beneath DRM's dev->struct_mutex (since, eg,
it may be protecting other DRM data while deciding which GPU to forward
to.) This ends up creating a circular dependency between dev->struct_mutex
and prepare_lock, involving all the above mentioned locks.
I believe it is totally unreasonable for CCF to allow the prepare lock
to depend on something as fundamental as core kernel locks - in fact,
looking at __clk_init(), it looks like this fails in a very basic aspect
of kernel programming: do setup first, then publish. If it did follow
that principle, it probably would not need to take the prepare lock
while calling clk_debug_register(), which would mean that prepare_lock
would not end up being a parent of potentially a lot of core kernel locks.
When you consider what prepare_lock is supposed to be doing, it's quite
clear that it should not be a parent to those.
Another interesting point is that clk_debug_create_one() has a comment
above it which is untrue:
/* caller must hold prepare_lock */
... except when it's called by clk_debug_init().
--
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.
^ permalink raw reply [flat|nested] 3+ messages in thread
* CCF locking problems
2014-12-12 17:37 CCF locking problems Russell King - ARM Linux
@ 2014-12-12 18:50 ` Stephen Boyd
2014-12-12 22:37 ` Russell King - ARM Linux
0 siblings, 1 reply; 3+ messages in thread
From: Stephen Boyd @ 2014-12-12 18:50 UTC (permalink / raw)
To: linux-arm-kernel
On 12/12/2014 09:37 AM, Russell King - ARM Linux wrote:
> CCF is causing lockdep problems elsewhere in the kernel...
>
> During kernel boot, the following lock chain is established via CCF when
> debugfs is enabled:
>
> clk_register()
> `-> __clk_init() (takes prepare_lock)
> `-> clk_debug_register() (takes clk_debug_lock)
> `-> debugfs_create_dir()
> `-> __create_file() (takes i_mutex)
>
> So, prepare_lock ends up being a parent of the i_mutex class.
>
> Generic kernel code creates a dependency between i_mutex and mmap_sem:
>
> iterate_dir() (takes i_mutex)
> `-> dcache_readdir()
> `-> filldir64()
> `-> might_fault() (marks mmap_sem as potentially taken by
> a fault)
>
> This means prepare_lock is also a parent lock of mmap_sem.
>
> The kernel mmap() implementation of takes the mmap_sem, before calling
> into drivers. DRM's drm_gem_mmap() function which will be called as a
> result of a mmap() takes the dev->struct_mutex mutex - which ends up as
> a child of mmap_sem.
>
> Now, if a DRM driver _then_ wants to use runtime PM, it may need to
> call runtime PM functions beneath DRM's dev->struct_mutex (since, eg,
> it may be protecting other DRM data while deciding which GPU to forward
> to.) This ends up creating a circular dependency between dev->struct_mutex
> and prepare_lock, involving all the above mentioned locks.
>
> I believe it is totally unreasonable for CCF to allow the prepare lock
> to depend on something as fundamental as core kernel locks - in fact,
> looking at __clk_init(), it looks like this fails in a very basic aspect
> of kernel programming: do setup first, then publish. If it did follow
> that principle, it probably would not need to take the prepare lock
> while calling clk_debug_register(), which would mean that prepare_lock
> would not end up being a parent of potentially a lot of core kernel locks.
>
> When you consider what prepare_lock is supposed to be doing, it's quite
> clear that it should not be a parent to those.
>
> Another interesting point is that clk_debug_create_one() has a comment
> above it which is untrue:
>
> /* caller must hold prepare_lock */
>
> ... except when it's called by clk_debug_init().
>
Do you have a lockdep splat? What kernel version are you running? There
was an earlier report of this that I tried to fix with this commit:
commit 6314b6796e3c070d4c8086b08dfd453a0aeac4cf
Author: Stephen Boyd <sboyd@codeaurora.org>
Date: Thu Sep 4 23:37:49 2014 -0700
clk: Don't hold prepare_lock across debugfs creation
but I'm not sure if you have that patch or if more recent changes to the
framework have caused this problem to reoccur. I think I missed the part
where clk_debug_register() is called after clk_debug_init() though, so
perhaps you have clocks that are getting registered after late init?
Here's an untested patch.
----8<-----
diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
index 44cdc47a6cc5..c9430653ddc9 100644
--- a/drivers/clk/clk.c
+++ b/drivers/clk/clk.c
@@ -240,12 +240,13 @@ static const struct file_operations clk_dump_fops = {
.release = single_release,
};
-/* caller must hold prepare_lock */
static int clk_debug_create_one(struct clk *clk, struct dentry *pdentry)
{
struct dentry *d;
int ret = -ENOMEM;
+ lockdep_assert_held(clk_debug_lock);
+
if (!clk || !pdentry) {
ret = -EINVAL;
goto out;
@@ -1944,7 +1945,6 @@ int __clk_init(struct device *dev, struct clk *clk)
else
clk->rate = 0;
- clk_debug_register(clk);
/*
* walk the list of orphan clocks and reparent any that are children of
* this clock
@@ -1979,6 +1979,9 @@ int __clk_init(struct device *dev, struct clk *clk)
out:
clk_prepare_unlock();
+ if (!ret)
+ clk_debug_register(clk);
+
return ret;
}
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
^ permalink raw reply related [flat|nested] 3+ messages in thread
* CCF locking problems
2014-12-12 18:50 ` Stephen Boyd
@ 2014-12-12 22:37 ` Russell King - ARM Linux
0 siblings, 0 replies; 3+ messages in thread
From: Russell King - ARM Linux @ 2014-12-12 22:37 UTC (permalink / raw)
To: linux-arm-kernel
On Fri, Dec 12, 2014 at 10:50:35AM -0800, Stephen Boyd wrote:
> Do you have a lockdep splat? What kernel version are you running?
I could if I dug it out of the syslog files... but your patch would also
fix it - it's pretty close to what I'm now running. Mine was found on
a v3.18 based kernel.
> but I'm not sure if you have that patch or if more recent changes to the
> framework have caused this problem to reoccur. I think I missed the part
> where clk_debug_register() is called after clk_debug_init() though, so
> perhaps you have clocks that are getting registered after late init?
> Here's an untested patch.
>
> ----8<-----
>
> diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c
> index 44cdc47a6cc5..c9430653ddc9 100644
> --- a/drivers/clk/clk.c
> +++ b/drivers/clk/clk.c
> @@ -240,12 +240,13 @@ static const struct file_operations clk_dump_fops = {
> .release = single_release,
> };
>
> -/* caller must hold prepare_lock */
> static int clk_debug_create_one(struct clk *clk, struct dentry *pdentry)
> {
> struct dentry *d;
> int ret = -ENOMEM;
>
> + lockdep_assert_held(clk_debug_lock);
> +
> if (!clk || !pdentry) {
> ret = -EINVAL;
> goto out;
> @@ -1944,7 +1945,6 @@ int __clk_init(struct device *dev, struct clk *clk)
> else
> clk->rate = 0;
>
> - clk_debug_register(clk);
> /*
> * walk the list of orphan clocks and reparent any that are children of
> * this clock
> @@ -1979,6 +1979,9 @@ int __clk_init(struct device *dev, struct clk *clk)
> out:
> clk_prepare_unlock();
>
> + if (!ret)
> + clk_debug_register(clk);
> +
> return ret;
> }
Apart from the lockdep_assert_held(), this is basically what I'm
currently running, which does fix the problem. I was intending to send
my patch, but pub with good beer, gliding syndicate, and Christmas food
intervened tonight.
Thanks.
--
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2014-12-12 22:37 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-12 17:37 CCF locking problems Russell King - ARM Linux
2014-12-12 18:50 ` Stephen Boyd
2014-12-12 22:37 ` Russell King - ARM Linux
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).