All of lore.kernel.org
 help / color / mirror / Atom feed
From: Rob Clark <robdclark@gmail.com>
To: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Daniel Vetter <daniel@ffwll.ch>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	Rob Clark <robdclark@chromium.org>, Sean Paul <sean@poorly.run>,
	David Airlie <airlied@linux.ie>,
	"open list:DRM DRIVER FOR MSM ADRENO GPU" 
	<linux-arm-msm@vger.kernel.org>,
	"open list:DRM DRIVER FOR MSM ADRENO GPU" 
	<freedreno@lists.freedesktop.org>,
	open list <linux-kernel@vger.kernel.org>,
	"Menon, Nishanth" <nm@ti.com>
Subject: Re: [PATCH v2 07/22] drm/msm: Do rpm get sooner in the submit path
Date: Sun, 25 Oct 2020 10:39:06 -0700	[thread overview]
Message-ID: <CAF6AEGv6RMCsK4yp-W2d1mVTMcEiiwFGAb+V8rYLhDdMhqP80Q@mail.gmail.com> (raw)
In-Reply-To: <20201022080644.2ck4okrxygmkuatn@vireshk-i7>

On Thu, Oct 22, 2020 at 1:06 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 20-10-20, 07:13, Rob Clark wrote:
> > On Tue, Oct 20, 2020 at 4:24 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > >
> > > On 20-10-20, 12:56, Daniel Vetter wrote:
> > > > Yeah that's bad practice. Generally you shouldn't need to hold locks
> > > > in setup/teardown code, since there's no other thread which can
> > > > possible hold a reference to anything your touching anymore. Ofc
> > > > excluding quickly grabbing/dropping a lock to insert/remove objects
> > > > into lists and stuff.
> > > >
> > > > The other reason is that especially with anything related to sysfs or
> > > > debugfs, the locking dependencies you're pulling in are enormous: vfs
> > > > locks pull in mm locks (due to mmap) and at that point there's pretty
> > > > much nothing left you're allowed to hold while acquiring such a lock.
> > > > For simple drivers this is no issue, but for fancy drivers (like gpu
> > > > drivers) which need to interact with core mm) this means your
> > > > subsystem is a major pain to use.
> > > >
> > > > Usually the correct fix is to only hold your subsystem locks in
> > > > setup/teardown when absolutely required, and fix any data
> > > > inconsistency issues by reordering your setup/teardown code: When you
> > > > register as the last step and unregister as the first step, there's no
> > > > need for any additional locking. And hence no need to call debugfs
> > > > functions while holding your subsystem locks.
> > > >
> > > > The catch phrase I use for this is "don't solve object lifetime issues
> > > > with locking". Instead use refcounting and careful ordering in
> > > > setup/teardown code.
> > >
> > > This is exactly what I have done in the OPP core, the locks were taken
> > > only when really necessary, though as we have seen now I have missed
> > > that at a single place and that should be fixed as well. Will do that,
> > > thanks.
> >
> > I do have an easy enough way to repro the issue, so if you have a
> > patch I can certainly test it.
>
> Does this fix it for you ? There is one more place still left where we
> are taking the opp_table_lock while adding stuff to debugfs and that's
> not that straight forward to fix. But I didn't see that path in your
> circular dependency trace, so who knows :)

Nope, I suspect any creation of debugfs files will be problematic.

(btw, _add_opp_dev_unlocked() looks like it should be called
_add_opp_dev_locked()?)

It does look like 'struct opp_table' is already refcnt'd, so I suspect
you could replace holding opp_table_lock while calling into debugfs
with holding a reference to the opp_table instead?

BR,
-R

[  +0.074543] ======================================================
[  +0.006347] WARNING: possible circular locking dependency detected
[  +0.006349] 5.4.72 #14 Not tainted
[  +0.003501] ------------------------------------------------------
[  +0.006350] chrome/1865 is trying to acquire lock:
[  +0.004922] ffffffdd34921750 (opp_table_lock){+.+.}, at:
_find_opp_table+0x34/0x74
[  +0.007779]
              but task is already holding lock:
[  +0.005989] ffffff81f0fc71a8 (reservation_ww_class_mutex){+.+.}, at:
submit_lock_objects+0x70/0x1ec
[  +0.001132] fscrypt: AES-256-CTS-CBC using implementation "cts-cbc-aes-ce"
[  +0.008156]
              which lock already depends on the new lock.

[  +0.000002]
              the existing dependency chain (in reverse order) is:
[  +0.000002]
              -> #4 (reservation_ww_class_mutex){+.+.}:
[  +0.000009]        __mutex_lock_common+0xec/0xc0c
[  +0.000004]        ww_mutex_lock_interruptible+0x5c/0xc4
[  +0.000003]        msm_gem_fault+0x2c/0x124
[  +0.000005]        __do_fault+0x40/0x16c
[  +0.000003]        handle_mm_fault+0x7cc/0xd98
[  +0.000005]        do_page_fault+0x230/0x3b4
[  +0.000003]        do_translation_fault+0x5c/0x78
[  +0.000004]        do_mem_abort+0x4c/0xb4
[  +0.000006]        el0_da+0x1c/0x20
[  +0.069917]
              -> #3 (&mm->mmap_sem){++++}:
[  +0.005548]        __might_fault+0x70/0x98
[  +0.004209]        compat_filldir+0xf8/0x48c
[  +0.004394]        dcache_readdir+0x70/0x1dc
[  +0.004394]        iterate_dir+0xd4/0x180
[  +0.004119]        __arm64_compat_sys_getdents+0xa0/0x19c
[  +0.005549]        el0_svc_common+0xa8/0x178
[  +0.004394]        el0_svc_compat_handler+0x2c/0x40
[  +0.005007]        el0_svc_compat+0x8/0x10
[  +0.004205]
              -> #2 (&sb->s_type->i_mutex_key#3){++++}:
[  +0.006708]        down_write+0x54/0x16c
[  +0.004034]        start_creating+0x68/0x128
[  +0.004392]        debugfs_create_dir+0x28/0x114
[  +0.004747]        opp_debug_register+0x8c/0xc0
[  +0.004657]        _add_opp_dev_unlocked+0x5c/0x70
[  +0.004920]        _add_opp_dev+0x38/0x58
[  +0.004118]        _opp_get_opp_table+0xdc/0x1ac
[  +0.004745]        dev_pm_opp_get_opp_table_indexed+0x24/0x30
[  +0.005899]        dev_pm_opp_of_add_table_indexed+0x48/0x84
[  +0.005813]        of_genpd_add_provider_onecell+0xc0/0x1b8
[  +0.005724]        rpmhpd_probe+0x240/0x268
[  +0.004307]        platform_drv_probe+0x90/0xb0
[  +0.004654]        really_probe+0x134/0x2ec
[  +0.004304]        driver_probe_device+0x64/0xfc
[  +0.004746]        __device_attach_driver+0x8c/0xa4
[  +0.005008]        bus_for_each_drv+0x90/0xd8
[  +0.004481]        __device_attach+0xc0/0x148
[  +0.004480]        device_initial_probe+0x20/0x2c
[  +0.004832]        bus_probe_device+0x34/0x94
[  +0.004482]        device_add+0x1fc/0x3b0
[  +0.004121]        of_device_add+0x3c/0x4c
[  +0.004206]        of_platform_device_create_pdata+0xb8/0xfc
[  +0.005811]        of_platform_bus_create+0x1e4/0x368
[  +0.005185]        of_platform_populate+0x70/0xbc
[  +0.004833]        devm_of_platform_populate+0x58/0xa0
[  +0.005283]        rpmh_rsc_probe+0x36c/0x3cc
[  +0.004481]        platform_drv_probe+0x90/0xb0
[  +0.004657]        really_probe+0x134/0x2ec
[  +0.004305]        driver_probe_device+0x64/0xfc
[  +0.004745]        __device_attach_driver+0x8c/0xa4
[  +0.005007]        bus_for_each_drv+0x90/0xd8
[  +0.004480]        __device_attach+0xc0/0x148
[  +0.004481]        device_initial_probe+0x20/0x2c
[  +0.004833]        bus_probe_device+0x34/0x94
[  +0.004481]        device_add+0x1fc/0x3b0
[  +0.004119]        of_device_add+0x3c/0x4c
[  +0.004206]        of_platform_device_create_pdata+0xb8/0xfc
[  +0.005811]        of_platform_bus_create+0x1e4/0x368
[  +0.005185]        of_platform_bus_create+0x230/0x368
[  +0.005185]        of_platform_populate+0x70/0xbc
[  +0.004836]        of_platform_default_populate_init+0xa8/0xc0
[  +0.005986]        do_one_initcall+0x1c8/0x3fc
[  +0.004572]        do_initcall_level+0xb4/0x10c
[  +0.004657]        do_basic_setup+0x30/0x48
[  +0.004304]        kernel_init_freeable+0x124/0x1a4
[  +0.005009]        kernel_init+0x14/0x104
[  +0.004119]        ret_from_fork+0x10/0x18
[  +0.004205]
              -> #1 (&opp_table->lock){+.+.}:
[  +0.005815]        __mutex_lock_common+0xec/0xc0c
[  +0.004832]        mutex_lock_nested+0x40/0x50
[  +0.004570]        _add_opp_dev+0x2c/0x58
[  +0.004119]        _opp_get_opp_table+0xdc/0x1ac
[  +0.004745]        dev_pm_opp_get_opp_table_indexed+0x24/0x30
[  +0.005899]        dev_pm_opp_of_add_table_indexed+0x48/0x84
[  +0.005814]        of_genpd_add_provider_onecell+0xc0/0x1b8
[  +0.005721]        rpmhpd_probe+0x240/0x268
[  +0.004306]        platform_drv_probe+0x90/0xb0
[  +0.004656]        really_probe+0x134/0x2ec
[  +0.004305]        driver_probe_device+0x64/0xfc
[  +0.004745]        __device_attach_driver+0x8c/0xa4
[  +0.005007]        bus_for_each_drv+0x90/0xd8
[  +0.004481]        __device_attach+0xc0/0x148
[  +0.004481]        device_initial_probe+0x20/0x2c
[  +0.004832]        bus_probe_device+0x34/0x94
[  +0.004481]        device_add+0x1fc/0x3b0
[  +0.004119]        of_device_add+0x3c/0x4c
[  +0.004206]        of_platform_device_create_pdata+0xb8/0xfc
[  +0.005810]        of_platform_bus_create+0x1e4/0x368
[  +0.005197]        of_platform_populate+0x70/0xbc
[  +0.004832]        devm_of_platform_populate+0x58/0xa0
[  +0.005284]        rpmh_rsc_probe+0x36c/0x3cc
[  +0.004480]        platform_drv_probe+0x90/0xb0
[  +0.004658]        really_probe+0x134/0x2ec
[  +0.004301]        driver_probe_device+0x64/0xfc
[  +0.004745]        __device_attach_driver+0x8c/0xa4
[  +0.005007]        bus_for_each_drv+0x90/0xd8
[  +0.004480]        __device_attach+0xc0/0x148
[  +0.004481]        device_initial_probe+0x20/0x2c
[  +0.004831]        bus_probe_device+0x34/0x94
[  +0.004482]        device_add+0x1fc/0x3b0
[  +0.004118]        of_device_add+0x3c/0x4c
[  +0.004214]        of_platform_device_create_pdata+0xb8/0xfc
[  +0.005817]        of_platform_bus_create+0x1e4/0x368
[  +0.005186]        of_platform_bus_create+0x230/0x368
[  +0.005188]        of_platform_populate+0x70/0xbc
[  +0.004832]        of_platform_default_populate_init+0xa8/0xc0
[  +0.005987]        do_one_initcall+0x1c8/0x3fc
[  +0.004569]        do_initcall_level+0xb4/0x10c
[  +0.004657]        do_basic_setup+0x30/0x48
[  +0.004305]        kernel_init_freeable+0x124/0x1a4
[  +0.005008]        kernel_init+0x14/0x104
[  +0.004119]        ret_from_fork+0x10/0x18
[  +0.004206]
              -> #0 (opp_table_lock){+.+.}:
[  +0.005640]        __lock_acquire+0xee4/0x2450
[  +0.004570]        lock_acquire+0x1cc/0x210
[  +0.004305]        __mutex_lock_common+0xec/0xc0c
[  +0.004833]        mutex_lock_nested+0x40/0x50
[  +0.004570]        _find_opp_table+0x34/0x74
[  +0.004393]        dev_pm_opp_find_freq_exact+0x2c/0xdc
[  +0.005372]        a6xx_gmu_resume+0xc8/0xecc
[  +0.004480]        a6xx_pm_resume+0x148/0x200
[  +0.004482]        adreno_resume+0x28/0x34
[  +0.004209]        pm_generic_runtime_resume+0x34/0x48
[  +0.005283]        __rpm_callback+0x70/0x10c
[  +0.004393]        rpm_callback+0x34/0x8c
[  +0.004119]        rpm_resume+0x414/0x550
[  +0.004119]        __pm_runtime_resume+0x7c/0xa0
[  +0.004746]        msm_gpu_submit+0x60/0x1c0
[  +0.004394]        msm_ioctl_gem_submit+0xadc/0xb60
[  +0.005010]        drm_ioctl_kernel+0x9c/0x118
[  +0.004569]        drm_ioctl+0x27c/0x408
[  +0.004034]        drm_compat_ioctl+0xcc/0xdc
[  +0.004483]        __se_compat_sys_ioctl+0x100/0x206c
[  +0.005186]        __arm64_compat_sys_ioctl+0x20/0x2c
[  +0.005187]        el0_svc_common+0xa8/0x178
[  +0.004393]        el0_svc_compat_handler+0x2c/0x40
[  +0.005009]        el0_svc_compat+0x8/0x10
[  +0.004205]
              other info that might help us debug this:

[  +0.008213] Chain exists of:
                opp_table_lock --> &mm->mmap_sem --> reservation_ww_class_mutex

[  +0.011780]  Possible unsafe locking scenario:

[  +0.006082]        CPU0                    CPU1
[  +0.004660]        ----                    ----
[  +0.004656]   lock(reservation_ww_class_mutex);
[  +0.004657]                                lock(&mm->mmap_sem);
[  +0.006079]                                lock(reservation_ww_class_mutex);
[  +0.007237]   lock(opp_table_lock);
[  +0.003592]
               *** DEADLOCK ***

[  +0.006084] 3 locks held by chrome/1865:
[  +0.004031]  #0: ffffff81edecc0d8 (&dev->struct_mutex){+.+.}, at:
msm_ioctl_gem_submit+0x264/0xb60
[  +0.009198]  #1: ffffff81d0000870
(reservation_ww_class_acquire){+.+.}, at:
msm_ioctl_gem_submit+0x8e8/0xb60
[  +0.010086]  #2: ffffff81f0fc71a8
(reservation_ww_class_mutex){+.+.}, at: submit_lock_objects+0x70/0x1ec
[  +0.009735]
              stack backtrace:
[  +0.004482] CPU: 0 PID: 1865 Comm: chrome Not tainted 5.4.72 #14
[  +0.006173] Hardware name: Google Lazor (rev1+) with LTE (DT)
[  +0.005899] Call trace:
[  +0.002515]  dump_backtrace+0x0/0x158
[  +0.003768]  show_stack+0x20/0x2c
[  +0.003407]  dump_stack+0xc8/0x160
[  +0.003506]  print_circular_bug+0x2c4/0x2c8
[  +0.004305]  check_noncircular+0x1a8/0x1b0
[  +0.004206]  __lock_acquire+0xee4/0x2450
[  +0.004032]  lock_acquire+0x1cc/0x210
[  +0.003768]  __mutex_lock_common+0xec/0xc0c
[  +0.004305]  mutex_lock_nested+0x40/0x50
[  +0.004033]  _find_opp_table+0x34/0x74
[  +0.003855]  dev_pm_opp_find_freq_exact+0x2c/0xdc
[  +0.004833]  a6xx_gmu_resume+0xc8/0xecc
[  +0.003943]  a6xx_pm_resume+0x148/0x200
[  +0.003944]  adreno_resume+0x28/0x34
[  +0.003681]  pm_generic_runtime_resume+0x34/0x48
[  +0.004745]  __rpm_callback+0x70/0x10c
[  +0.003854]  rpm_callback+0x34/0x8c
[  +0.003592]  rpm_resume+0x414/0x550
[  +0.003592]  __pm_runtime_resume+0x7c/0xa0
[  +0.004207]  msm_gpu_submit+0x60/0x1c0
[  +0.003855]  msm_ioctl_gem_submit+0xadc/0xb60
[  +0.004481]  drm_ioctl_kernel+0x9c/0x118
[  +0.004031]  drm_ioctl+0x27c/0x408
[  +0.003504]  drm_compat_ioctl+0xcc/0xdc
[  +0.003945]  __se_compat_sys_ioctl+0x100/0x206c
[  +0.004658]  __arm64_compat_sys_ioctl+0x20/0x2c
[  +0.004659]  el0_svc_common+0xa8/0x178
[  +0.003855]  el0_svc_compat_handler+0x2c/0x40
[  +0.004480]  el0_svc_compat+0x8/0x10

> diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> index 2483e765318a..4cc0fb716381 100644
> --- a/drivers/opp/core.c
> +++ b/drivers/opp/core.c
> @@ -1181,6 +1181,10 @@ static void _opp_table_kref_release(struct kref *kref)
>         struct opp_device *opp_dev, *temp;
>         int i;
>
> +       /* Drop the lock as soon as we can */
> +       list_del(&opp_table->node);
> +       mutex_unlock(&opp_table_lock);
> +
>         _of_clear_opp_table(opp_table);
>
>         /* Release clk */
> @@ -1208,10 +1212,7 @@ static void _opp_table_kref_release(struct kref *kref)
>
>         mutex_destroy(&opp_table->genpd_virt_dev_lock);
>         mutex_destroy(&opp_table->lock);
> -       list_del(&opp_table->node);
>         kfree(opp_table);
> -
> -       mutex_unlock(&opp_table_lock);
>  }
>
>  void dev_pm_opp_put_opp_table(struct opp_table *opp_table)
>
> --
> viresh

WARNING: multiple messages have this Message-ID (diff)
From: Rob Clark <robdclark@gmail.com>
To: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Rob Clark <robdclark@chromium.org>,
	"open list:DRM DRIVER FOR MSM ADRENO GPU"
	<freedreno@lists.freedesktop.org>, "Menon, Nishanth" <nm@ti.com>,
	David Airlie <airlied@linux.ie>,
	"open list:DRM DRIVER FOR MSM ADRENO GPU"
	<linux-arm-msm@vger.kernel.org>,
	open list <linux-kernel@vger.kernel.org>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	Sean Paul <sean@poorly.run>
Subject: Re: [PATCH v2 07/22] drm/msm: Do rpm get sooner in the submit path
Date: Sun, 25 Oct 2020 10:39:06 -0700	[thread overview]
Message-ID: <CAF6AEGv6RMCsK4yp-W2d1mVTMcEiiwFGAb+V8rYLhDdMhqP80Q@mail.gmail.com> (raw)
In-Reply-To: <20201022080644.2ck4okrxygmkuatn@vireshk-i7>

On Thu, Oct 22, 2020 at 1:06 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 20-10-20, 07:13, Rob Clark wrote:
> > On Tue, Oct 20, 2020 at 4:24 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > >
> > > On 20-10-20, 12:56, Daniel Vetter wrote:
> > > > Yeah that's bad practice. Generally you shouldn't need to hold locks
> > > > in setup/teardown code, since there's no other thread which can
> > > > possible hold a reference to anything your touching anymore. Ofc
> > > > excluding quickly grabbing/dropping a lock to insert/remove objects
> > > > into lists and stuff.
> > > >
> > > > The other reason is that especially with anything related to sysfs or
> > > > debugfs, the locking dependencies you're pulling in are enormous: vfs
> > > > locks pull in mm locks (due to mmap) and at that point there's pretty
> > > > much nothing left you're allowed to hold while acquiring such a lock.
> > > > For simple drivers this is no issue, but for fancy drivers (like gpu
> > > > drivers) which need to interact with core mm) this means your
> > > > subsystem is a major pain to use.
> > > >
> > > > Usually the correct fix is to only hold your subsystem locks in
> > > > setup/teardown when absolutely required, and fix any data
> > > > inconsistency issues by reordering your setup/teardown code: When you
> > > > register as the last step and unregister as the first step, there's no
> > > > need for any additional locking. And hence no need to call debugfs
> > > > functions while holding your subsystem locks.
> > > >
> > > > The catch phrase I use for this is "don't solve object lifetime issues
> > > > with locking". Instead use refcounting and careful ordering in
> > > > setup/teardown code.
> > >
> > > This is exactly what I have done in the OPP core, the locks were taken
> > > only when really necessary, though as we have seen now I have missed
> > > that at a single place and that should be fixed as well. Will do that,
> > > thanks.
> >
> > I do have an easy enough way to repro the issue, so if you have a
> > patch I can certainly test it.
>
> Does this fix it for you ? There is one more place still left where we
> are taking the opp_table_lock while adding stuff to debugfs and that's
> not that straight forward to fix. But I didn't see that path in your
> circular dependency trace, so who knows :)

Nope, I suspect any creation of debugfs files will be problematic.

(btw, _add_opp_dev_unlocked() looks like it should be called
_add_opp_dev_locked()?)

It does look like 'struct opp_table' is already refcnt'd, so I suspect
you could replace holding opp_table_lock while calling into debugfs
with holding a reference to the opp_table instead?

BR,
-R

[  +0.074543] ======================================================
[  +0.006347] WARNING: possible circular locking dependency detected
[  +0.006349] 5.4.72 #14 Not tainted
[  +0.003501] ------------------------------------------------------
[  +0.006350] chrome/1865 is trying to acquire lock:
[  +0.004922] ffffffdd34921750 (opp_table_lock){+.+.}, at:
_find_opp_table+0x34/0x74
[  +0.007779]
              but task is already holding lock:
[  +0.005989] ffffff81f0fc71a8 (reservation_ww_class_mutex){+.+.}, at:
submit_lock_objects+0x70/0x1ec
[  +0.001132] fscrypt: AES-256-CTS-CBC using implementation "cts-cbc-aes-ce"
[  +0.008156]
              which lock already depends on the new lock.

[  +0.000002]
              the existing dependency chain (in reverse order) is:
[  +0.000002]
              -> #4 (reservation_ww_class_mutex){+.+.}:
[  +0.000009]        __mutex_lock_common+0xec/0xc0c
[  +0.000004]        ww_mutex_lock_interruptible+0x5c/0xc4
[  +0.000003]        msm_gem_fault+0x2c/0x124
[  +0.000005]        __do_fault+0x40/0x16c
[  +0.000003]        handle_mm_fault+0x7cc/0xd98
[  +0.000005]        do_page_fault+0x230/0x3b4
[  +0.000003]        do_translation_fault+0x5c/0x78
[  +0.000004]        do_mem_abort+0x4c/0xb4
[  +0.000006]        el0_da+0x1c/0x20
[  +0.069917]
              -> #3 (&mm->mmap_sem){++++}:
[  +0.005548]        __might_fault+0x70/0x98
[  +0.004209]        compat_filldir+0xf8/0x48c
[  +0.004394]        dcache_readdir+0x70/0x1dc
[  +0.004394]        iterate_dir+0xd4/0x180
[  +0.004119]        __arm64_compat_sys_getdents+0xa0/0x19c
[  +0.005549]        el0_svc_common+0xa8/0x178
[  +0.004394]        el0_svc_compat_handler+0x2c/0x40
[  +0.005007]        el0_svc_compat+0x8/0x10
[  +0.004205]
              -> #2 (&sb->s_type->i_mutex_key#3){++++}:
[  +0.006708]        down_write+0x54/0x16c
[  +0.004034]        start_creating+0x68/0x128
[  +0.004392]        debugfs_create_dir+0x28/0x114
[  +0.004747]        opp_debug_register+0x8c/0xc0
[  +0.004657]        _add_opp_dev_unlocked+0x5c/0x70
[  +0.004920]        _add_opp_dev+0x38/0x58
[  +0.004118]        _opp_get_opp_table+0xdc/0x1ac
[  +0.004745]        dev_pm_opp_get_opp_table_indexed+0x24/0x30
[  +0.005899]        dev_pm_opp_of_add_table_indexed+0x48/0x84
[  +0.005813]        of_genpd_add_provider_onecell+0xc0/0x1b8
[  +0.005724]        rpmhpd_probe+0x240/0x268
[  +0.004307]        platform_drv_probe+0x90/0xb0
[  +0.004654]        really_probe+0x134/0x2ec
[  +0.004304]        driver_probe_device+0x64/0xfc
[  +0.004746]        __device_attach_driver+0x8c/0xa4
[  +0.005008]        bus_for_each_drv+0x90/0xd8
[  +0.004481]        __device_attach+0xc0/0x148
[  +0.004480]        device_initial_probe+0x20/0x2c
[  +0.004832]        bus_probe_device+0x34/0x94
[  +0.004482]        device_add+0x1fc/0x3b0
[  +0.004121]        of_device_add+0x3c/0x4c
[  +0.004206]        of_platform_device_create_pdata+0xb8/0xfc
[  +0.005811]        of_platform_bus_create+0x1e4/0x368
[  +0.005185]        of_platform_populate+0x70/0xbc
[  +0.004833]        devm_of_platform_populate+0x58/0xa0
[  +0.005283]        rpmh_rsc_probe+0x36c/0x3cc
[  +0.004481]        platform_drv_probe+0x90/0xb0
[  +0.004657]        really_probe+0x134/0x2ec
[  +0.004305]        driver_probe_device+0x64/0xfc
[  +0.004745]        __device_attach_driver+0x8c/0xa4
[  +0.005007]        bus_for_each_drv+0x90/0xd8
[  +0.004480]        __device_attach+0xc0/0x148
[  +0.004481]        device_initial_probe+0x20/0x2c
[  +0.004833]        bus_probe_device+0x34/0x94
[  +0.004481]        device_add+0x1fc/0x3b0
[  +0.004119]        of_device_add+0x3c/0x4c
[  +0.004206]        of_platform_device_create_pdata+0xb8/0xfc
[  +0.005811]        of_platform_bus_create+0x1e4/0x368
[  +0.005185]        of_platform_bus_create+0x230/0x368
[  +0.005185]        of_platform_populate+0x70/0xbc
[  +0.004836]        of_platform_default_populate_init+0xa8/0xc0
[  +0.005986]        do_one_initcall+0x1c8/0x3fc
[  +0.004572]        do_initcall_level+0xb4/0x10c
[  +0.004657]        do_basic_setup+0x30/0x48
[  +0.004304]        kernel_init_freeable+0x124/0x1a4
[  +0.005009]        kernel_init+0x14/0x104
[  +0.004119]        ret_from_fork+0x10/0x18
[  +0.004205]
              -> #1 (&opp_table->lock){+.+.}:
[  +0.005815]        __mutex_lock_common+0xec/0xc0c
[  +0.004832]        mutex_lock_nested+0x40/0x50
[  +0.004570]        _add_opp_dev+0x2c/0x58
[  +0.004119]        _opp_get_opp_table+0xdc/0x1ac
[  +0.004745]        dev_pm_opp_get_opp_table_indexed+0x24/0x30
[  +0.005899]        dev_pm_opp_of_add_table_indexed+0x48/0x84
[  +0.005814]        of_genpd_add_provider_onecell+0xc0/0x1b8
[  +0.005721]        rpmhpd_probe+0x240/0x268
[  +0.004306]        platform_drv_probe+0x90/0xb0
[  +0.004656]        really_probe+0x134/0x2ec
[  +0.004305]        driver_probe_device+0x64/0xfc
[  +0.004745]        __device_attach_driver+0x8c/0xa4
[  +0.005007]        bus_for_each_drv+0x90/0xd8
[  +0.004481]        __device_attach+0xc0/0x148
[  +0.004481]        device_initial_probe+0x20/0x2c
[  +0.004832]        bus_probe_device+0x34/0x94
[  +0.004481]        device_add+0x1fc/0x3b0
[  +0.004119]        of_device_add+0x3c/0x4c
[  +0.004206]        of_platform_device_create_pdata+0xb8/0xfc
[  +0.005810]        of_platform_bus_create+0x1e4/0x368
[  +0.005197]        of_platform_populate+0x70/0xbc
[  +0.004832]        devm_of_platform_populate+0x58/0xa0
[  +0.005284]        rpmh_rsc_probe+0x36c/0x3cc
[  +0.004480]        platform_drv_probe+0x90/0xb0
[  +0.004658]        really_probe+0x134/0x2ec
[  +0.004301]        driver_probe_device+0x64/0xfc
[  +0.004745]        __device_attach_driver+0x8c/0xa4
[  +0.005007]        bus_for_each_drv+0x90/0xd8
[  +0.004480]        __device_attach+0xc0/0x148
[  +0.004481]        device_initial_probe+0x20/0x2c
[  +0.004831]        bus_probe_device+0x34/0x94
[  +0.004482]        device_add+0x1fc/0x3b0
[  +0.004118]        of_device_add+0x3c/0x4c
[  +0.004214]        of_platform_device_create_pdata+0xb8/0xfc
[  +0.005817]        of_platform_bus_create+0x1e4/0x368
[  +0.005186]        of_platform_bus_create+0x230/0x368
[  +0.005188]        of_platform_populate+0x70/0xbc
[  +0.004832]        of_platform_default_populate_init+0xa8/0xc0
[  +0.005987]        do_one_initcall+0x1c8/0x3fc
[  +0.004569]        do_initcall_level+0xb4/0x10c
[  +0.004657]        do_basic_setup+0x30/0x48
[  +0.004305]        kernel_init_freeable+0x124/0x1a4
[  +0.005008]        kernel_init+0x14/0x104
[  +0.004119]        ret_from_fork+0x10/0x18
[  +0.004206]
              -> #0 (opp_table_lock){+.+.}:
[  +0.005640]        __lock_acquire+0xee4/0x2450
[  +0.004570]        lock_acquire+0x1cc/0x210
[  +0.004305]        __mutex_lock_common+0xec/0xc0c
[  +0.004833]        mutex_lock_nested+0x40/0x50
[  +0.004570]        _find_opp_table+0x34/0x74
[  +0.004393]        dev_pm_opp_find_freq_exact+0x2c/0xdc
[  +0.005372]        a6xx_gmu_resume+0xc8/0xecc
[  +0.004480]        a6xx_pm_resume+0x148/0x200
[  +0.004482]        adreno_resume+0x28/0x34
[  +0.004209]        pm_generic_runtime_resume+0x34/0x48
[  +0.005283]        __rpm_callback+0x70/0x10c
[  +0.004393]        rpm_callback+0x34/0x8c
[  +0.004119]        rpm_resume+0x414/0x550
[  +0.004119]        __pm_runtime_resume+0x7c/0xa0
[  +0.004746]        msm_gpu_submit+0x60/0x1c0
[  +0.004394]        msm_ioctl_gem_submit+0xadc/0xb60
[  +0.005010]        drm_ioctl_kernel+0x9c/0x118
[  +0.004569]        drm_ioctl+0x27c/0x408
[  +0.004034]        drm_compat_ioctl+0xcc/0xdc
[  +0.004483]        __se_compat_sys_ioctl+0x100/0x206c
[  +0.005186]        __arm64_compat_sys_ioctl+0x20/0x2c
[  +0.005187]        el0_svc_common+0xa8/0x178
[  +0.004393]        el0_svc_compat_handler+0x2c/0x40
[  +0.005009]        el0_svc_compat+0x8/0x10
[  +0.004205]
              other info that might help us debug this:

[  +0.008213] Chain exists of:
                opp_table_lock --> &mm->mmap_sem --> reservation_ww_class_mutex

[  +0.011780]  Possible unsafe locking scenario:

[  +0.006082]        CPU0                    CPU1
[  +0.004660]        ----                    ----
[  +0.004656]   lock(reservation_ww_class_mutex);
[  +0.004657]                                lock(&mm->mmap_sem);
[  +0.006079]                                lock(reservation_ww_class_mutex);
[  +0.007237]   lock(opp_table_lock);
[  +0.003592]
               *** DEADLOCK ***

[  +0.006084] 3 locks held by chrome/1865:
[  +0.004031]  #0: ffffff81edecc0d8 (&dev->struct_mutex){+.+.}, at:
msm_ioctl_gem_submit+0x264/0xb60
[  +0.009198]  #1: ffffff81d0000870
(reservation_ww_class_acquire){+.+.}, at:
msm_ioctl_gem_submit+0x8e8/0xb60
[  +0.010086]  #2: ffffff81f0fc71a8
(reservation_ww_class_mutex){+.+.}, at: submit_lock_objects+0x70/0x1ec
[  +0.009735]
              stack backtrace:
[  +0.004482] CPU: 0 PID: 1865 Comm: chrome Not tainted 5.4.72 #14
[  +0.006173] Hardware name: Google Lazor (rev1+) with LTE (DT)
[  +0.005899] Call trace:
[  +0.002515]  dump_backtrace+0x0/0x158
[  +0.003768]  show_stack+0x20/0x2c
[  +0.003407]  dump_stack+0xc8/0x160
[  +0.003506]  print_circular_bug+0x2c4/0x2c8
[  +0.004305]  check_noncircular+0x1a8/0x1b0
[  +0.004206]  __lock_acquire+0xee4/0x2450
[  +0.004032]  lock_acquire+0x1cc/0x210
[  +0.003768]  __mutex_lock_common+0xec/0xc0c
[  +0.004305]  mutex_lock_nested+0x40/0x50
[  +0.004033]  _find_opp_table+0x34/0x74
[  +0.003855]  dev_pm_opp_find_freq_exact+0x2c/0xdc
[  +0.004833]  a6xx_gmu_resume+0xc8/0xecc
[  +0.003943]  a6xx_pm_resume+0x148/0x200
[  +0.003944]  adreno_resume+0x28/0x34
[  +0.003681]  pm_generic_runtime_resume+0x34/0x48
[  +0.004745]  __rpm_callback+0x70/0x10c
[  +0.003854]  rpm_callback+0x34/0x8c
[  +0.003592]  rpm_resume+0x414/0x550
[  +0.003592]  __pm_runtime_resume+0x7c/0xa0
[  +0.004207]  msm_gpu_submit+0x60/0x1c0
[  +0.003855]  msm_ioctl_gem_submit+0xadc/0xb60
[  +0.004481]  drm_ioctl_kernel+0x9c/0x118
[  +0.004031]  drm_ioctl+0x27c/0x408
[  +0.003504]  drm_compat_ioctl+0xcc/0xdc
[  +0.003945]  __se_compat_sys_ioctl+0x100/0x206c
[  +0.004658]  __arm64_compat_sys_ioctl+0x20/0x2c
[  +0.004659]  el0_svc_common+0xa8/0x178
[  +0.003855]  el0_svc_compat_handler+0x2c/0x40
[  +0.004480]  el0_svc_compat+0x8/0x10

> diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> index 2483e765318a..4cc0fb716381 100644
> --- a/drivers/opp/core.c
> +++ b/drivers/opp/core.c
> @@ -1181,6 +1181,10 @@ static void _opp_table_kref_release(struct kref *kref)
>         struct opp_device *opp_dev, *temp;
>         int i;
>
> +       /* Drop the lock as soon as we can */
> +       list_del(&opp_table->node);
> +       mutex_unlock(&opp_table_lock);
> +
>         _of_clear_opp_table(opp_table);
>
>         /* Release clk */
> @@ -1208,10 +1212,7 @@ static void _opp_table_kref_release(struct kref *kref)
>
>         mutex_destroy(&opp_table->genpd_virt_dev_lock);
>         mutex_destroy(&opp_table->lock);
> -       list_del(&opp_table->node);
>         kfree(opp_table);
> -
> -       mutex_unlock(&opp_table_lock);
>  }
>
>  void dev_pm_opp_put_opp_table(struct opp_table *opp_table)
>
> --
> viresh
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

  reply	other threads:[~2020-10-25 17:39 UTC|newest]

Thread overview: 100+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-12  2:09 [PATCH 00/14] drm/msm: de-struct_mutex-ification Rob Clark
2020-10-12  2:09 ` Rob Clark
2020-10-12  2:09 ` [PATCH v2 01/22] drm/msm/gem: Add obj->lock wrappers Rob Clark
2020-10-12  2:09   ` Rob Clark
2020-10-12  2:09 ` [PATCH v2 02/22] drm/msm/gem: Rename internal get_iova_locked helper Rob Clark
2020-10-12  2:09   ` Rob Clark
2020-10-12  2:09 ` [PATCH v2 03/22] drm/msm/gem: Move prototypes to msm_gem.h Rob Clark
2020-10-12  2:09   ` Rob Clark
2020-10-12  2:09 ` [PATCH v2 04/22] drm/msm/gem: Add some _locked() helpers Rob Clark
2020-10-12  2:09   ` Rob Clark
2020-10-12  2:09 ` [PATCH v2 05/22] drm/msm/gem: Move locking in shrinker path Rob Clark
2020-10-12  2:09   ` Rob Clark
2020-10-12  2:09 ` [PATCH v2 06/22] drm/msm/submit: Move copy_from_user ahead of locking bos Rob Clark
2020-10-12  2:09   ` Rob Clark
2020-10-12  2:09 ` [PATCH v2 07/22] drm/msm: Do rpm get sooner in the submit path Rob Clark
2020-10-12  2:09   ` Rob Clark
2020-10-12 14:35   ` Daniel Vetter
2020-10-12 14:35     ` Daniel Vetter
2020-10-12 15:43     ` Rob Clark
2020-10-12 15:43       ` Rob Clark
2020-10-20  9:07       ` Viresh Kumar
2020-10-20  9:07         ` Viresh Kumar
2020-10-20 10:56         ` Daniel Vetter
2020-10-20 10:56           ` Daniel Vetter
2020-10-20 11:24           ` Viresh Kumar
2020-10-20 11:24             ` Viresh Kumar
2020-10-20 11:42             ` Daniel Vetter
2020-10-20 11:42               ` Daniel Vetter
2020-10-20 14:13             ` Rob Clark
2020-10-20 14:13               ` Rob Clark
2020-10-22  8:06               ` Viresh Kumar
2020-10-22  8:06                 ` Viresh Kumar
2020-10-25 17:39                 ` Rob Clark [this message]
2020-10-25 17:39                   ` Rob Clark
2020-10-27 11:35                   ` Viresh Kumar
2020-10-27 11:35                     ` Viresh Kumar
2020-11-03  5:47                     ` Viresh Kumar
2020-11-03  5:47                       ` Viresh Kumar
2020-11-03 16:50                       ` Rob Clark
2020-11-03 16:50                         ` Rob Clark
2020-11-04  3:03                         ` Viresh Kumar
2020-11-04  3:03                           ` Viresh Kumar
2020-11-05 19:24                           ` Rob Clark
2020-11-05 19:24                             ` Rob Clark
2020-11-06  7:16                             ` Viresh Kumar
2020-11-06  7:16                               ` Viresh Kumar
2020-11-17 10:03                               ` Viresh Kumar
2020-11-17 10:03                                 ` Viresh Kumar
2020-11-17 17:02                               ` Rob Clark
2020-11-17 17:02                                 ` Rob Clark
2020-11-18  5:28                                 ` Viresh Kumar
2020-11-18  5:28                                   ` Viresh Kumar
2020-11-18 16:53                                   ` Rob Clark
2020-11-18 16:53                                     ` Rob Clark
2020-11-19  6:05                                     ` Viresh Kumar
2020-11-19  6:05                                       ` Viresh Kumar
2020-12-07  6:16                                       ` Viresh Kumar
2020-12-07  6:16                                         ` Viresh Kumar
2020-12-16  5:22                                         ` Viresh Kumar
2020-12-16  5:22                                           ` Viresh Kumar
2020-10-12  2:09 ` [PATCH v2 08/22] drm/msm/gem: Switch over to obj->resv for locking Rob Clark
2020-10-12  2:09   ` Rob Clark
2020-10-12  2:09 ` [PATCH v2 09/22] drm/msm: Use correct drm_gem_object_put() in fail case Rob Clark
2020-10-12  2:09   ` Rob Clark
2020-10-12  2:09 ` [PATCH v2 10/22] drm/msm: Drop chatty trace Rob Clark
2020-10-12  2:09   ` Rob Clark
2020-10-12  2:09 ` [PATCH v2 11/22] drm/msm: Move update_fences() Rob Clark
2020-10-12  2:09   ` Rob Clark
2020-10-12  2:09 ` [PATCH v2 12/22] drm/msm: Add priv->mm_lock to protect active/inactive lists Rob Clark
2020-10-12  2:09   ` Rob Clark
2020-10-12  2:09 ` [PATCH v2 13/22] drm/msm: Document and rename preempt_lock Rob Clark
2020-10-12  2:09   ` Rob Clark
2020-10-12  2:09 ` [PATCH v2 14/22] drm/msm: Protect ring->submits with it's own lock Rob Clark
2020-10-12  2:09   ` Rob Clark
2020-10-12  2:09 ` [PATCH v2 15/22] drm/msm: Refcount submits Rob Clark
2020-10-12  2:09   ` Rob Clark
2020-10-12  2:09 ` [PATCH v2 16/22] drm/msm: Remove obj->gpu Rob Clark
2020-10-12  2:09   ` Rob Clark
2020-10-12  2:09 ` [PATCH v2 17/22] drm/msm: Drop struct_mutex from the retire path Rob Clark
2020-10-12  2:09   ` Rob Clark
2020-10-12  2:09 ` [PATCH v2 18/22] drm/msm: Drop struct_mutex in free_object() path Rob Clark
2020-10-12  2:09   ` Rob Clark
2020-10-12  2:09 ` [PATCH v2 19/22] drm/msm: remove msm_gem_free_work Rob Clark
2020-10-12  2:09   ` Rob Clark
2020-10-12  2:09 ` [PATCH v2 20/22] drm/msm: drop struct_mutex in madvise path Rob Clark
2020-10-12  2:09   ` Rob Clark
2020-10-12  2:09 ` [PATCH v2 21/22] drm/msm: Drop struct_mutex in shrinker path Rob Clark
2020-10-12  2:09   ` Rob Clark
2020-10-12  2:09 ` [PATCH v2 22/22] drm/msm: Don't implicit-sync if only a single ring Rob Clark
2020-10-12  2:09   ` Rob Clark
2020-10-12 14:40   ` Daniel Vetter
2020-10-12 14:40     ` Daniel Vetter
2020-10-12 15:07     ` Rob Clark
2020-10-12 15:07       ` Rob Clark
2020-10-13 11:08       ` Daniel Vetter
2020-10-13 11:08         ` Daniel Vetter
2020-10-13 16:15         ` [Freedreno] " Rob Clark
2020-10-13 16:15           ` Rob Clark
2020-10-15  8:22           ` Daniel Vetter
2020-10-15  8:22             ` Daniel Vetter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAF6AEGv6RMCsK4yp-W2d1mVTMcEiiwFGAb+V8rYLhDdMhqP80Q@mail.gmail.com \
    --to=robdclark@gmail.com \
    --cc=airlied@linux.ie \
    --cc=daniel@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=freedreno@lists.freedesktop.org \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nm@ti.com \
    --cc=robdclark@chromium.org \
    --cc=sean@poorly.run \
    --cc=viresh.kumar@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.