linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Rob Clark <robdclark@gmail.com>
To: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Daniel Vetter <daniel@ffwll.ch>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	Rob Clark <robdclark@chromium.org>, Sean Paul <sean@poorly.run>,
	David Airlie <airlied@linux.ie>,
	"open list:DRM DRIVER FOR MSM ADRENO GPU" 
	<linux-arm-msm@vger.kernel.org>,
	"open list:DRM DRIVER FOR MSM ADRENO GPU" 
	<freedreno@lists.freedesktop.org>,
	open list <linux-kernel@vger.kernel.org>,
	"Menon, Nishanth" <nm@ti.com>
Subject: Re: [PATCH v2 07/22] drm/msm: Do rpm get sooner in the submit path
Date: Sun, 25 Oct 2020 10:39:06 -0700	[thread overview]
Message-ID: <CAF6AEGv6RMCsK4yp-W2d1mVTMcEiiwFGAb+V8rYLhDdMhqP80Q@mail.gmail.com> (raw)
In-Reply-To: <20201022080644.2ck4okrxygmkuatn@vireshk-i7>

On Thu, Oct 22, 2020 at 1:06 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 20-10-20, 07:13, Rob Clark wrote:
> > On Tue, Oct 20, 2020 at 4:24 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > >
> > > On 20-10-20, 12:56, Daniel Vetter wrote:
> > > > Yeah that's bad practice. Generally you shouldn't need to hold locks
> > > > in setup/teardown code, since there's no other thread which can
> > > > possible hold a reference to anything your touching anymore. Ofc
> > > > excluding quickly grabbing/dropping a lock to insert/remove objects
> > > > into lists and stuff.
> > > >
> > > > The other reason is that especially with anything related to sysfs or
> > > > debugfs, the locking dependencies you're pulling in are enormous: vfs
> > > > locks pull in mm locks (due to mmap) and at that point there's pretty
> > > > much nothing left you're allowed to hold while acquiring such a lock.
> > > > For simple drivers this is no issue, but for fancy drivers (like gpu
> > > > drivers) which need to interact with core mm) this means your
> > > > subsystem is a major pain to use.
> > > >
> > > > Usually the correct fix is to only hold your subsystem locks in
> > > > setup/teardown when absolutely required, and fix any data
> > > > inconsistency issues by reordering your setup/teardown code: When you
> > > > register as the last step and unregister as the first step, there's no
> > > > need for any additional locking. And hence no need to call debugfs
> > > > functions while holding your subsystem locks.
> > > >
> > > > The catch phrase I use for this is "don't solve object lifetime issues
> > > > with locking". Instead use refcounting and careful ordering in
> > > > setup/teardown code.
> > >
> > > This is exactly what I have done in the OPP core, the locks were taken
> > > only when really necessary, though as we have seen now I have missed
> > > that at a single place and that should be fixed as well. Will do that,
> > > thanks.
> >
> > I do have an easy enough way to repro the issue, so if you have a
> > patch I can certainly test it.
>
> Does this fix it for you ? There is one more place still left where we
> are taking the opp_table_lock while adding stuff to debugfs and that's
> not that straight forward to fix. But I didn't see that path in your
> circular dependency trace, so who knows :)

Nope, I suspect any creation of debugfs files will be problematic.

(btw, _add_opp_dev_unlocked() looks like it should be called
_add_opp_dev_locked()?)

It does look like 'struct opp_table' is already refcnt'd, so I suspect
you could replace holding opp_table_lock while calling into debugfs
with holding a reference to the opp_table instead?

BR,
-R

[  +0.074543] ======================================================
[  +0.006347] WARNING: possible circular locking dependency detected
[  +0.006349] 5.4.72 #14 Not tainted
[  +0.003501] ------------------------------------------------------
[  +0.006350] chrome/1865 is trying to acquire lock:
[  +0.004922] ffffffdd34921750 (opp_table_lock){+.+.}, at:
_find_opp_table+0x34/0x74
[  +0.007779]
              but task is already holding lock:
[  +0.005989] ffffff81f0fc71a8 (reservation_ww_class_mutex){+.+.}, at:
submit_lock_objects+0x70/0x1ec
[  +0.001132] fscrypt: AES-256-CTS-CBC using implementation "cts-cbc-aes-ce"
[  +0.008156]
              which lock already depends on the new lock.

[  +0.000002]
              the existing dependency chain (in reverse order) is:
[  +0.000002]
              -> #4 (reservation_ww_class_mutex){+.+.}:
[  +0.000009]        __mutex_lock_common+0xec/0xc0c
[  +0.000004]        ww_mutex_lock_interruptible+0x5c/0xc4
[  +0.000003]        msm_gem_fault+0x2c/0x124
[  +0.000005]        __do_fault+0x40/0x16c
[  +0.000003]        handle_mm_fault+0x7cc/0xd98
[  +0.000005]        do_page_fault+0x230/0x3b4
[  +0.000003]        do_translation_fault+0x5c/0x78
[  +0.000004]        do_mem_abort+0x4c/0xb4
[  +0.000006]        el0_da+0x1c/0x20
[  +0.069917]
              -> #3 (&mm->mmap_sem){++++}:
[  +0.005548]        __might_fault+0x70/0x98
[  +0.004209]        compat_filldir+0xf8/0x48c
[  +0.004394]        dcache_readdir+0x70/0x1dc
[  +0.004394]        iterate_dir+0xd4/0x180
[  +0.004119]        __arm64_compat_sys_getdents+0xa0/0x19c
[  +0.005549]        el0_svc_common+0xa8/0x178
[  +0.004394]        el0_svc_compat_handler+0x2c/0x40
[  +0.005007]        el0_svc_compat+0x8/0x10
[  +0.004205]
              -> #2 (&sb->s_type->i_mutex_key#3){++++}:
[  +0.006708]        down_write+0x54/0x16c
[  +0.004034]        start_creating+0x68/0x128
[  +0.004392]        debugfs_create_dir+0x28/0x114
[  +0.004747]        opp_debug_register+0x8c/0xc0
[  +0.004657]        _add_opp_dev_unlocked+0x5c/0x70
[  +0.004920]        _add_opp_dev+0x38/0x58
[  +0.004118]        _opp_get_opp_table+0xdc/0x1ac
[  +0.004745]        dev_pm_opp_get_opp_table_indexed+0x24/0x30
[  +0.005899]        dev_pm_opp_of_add_table_indexed+0x48/0x84
[  +0.005813]        of_genpd_add_provider_onecell+0xc0/0x1b8
[  +0.005724]        rpmhpd_probe+0x240/0x268
[  +0.004307]        platform_drv_probe+0x90/0xb0
[  +0.004654]        really_probe+0x134/0x2ec
[  +0.004304]        driver_probe_device+0x64/0xfc
[  +0.004746]        __device_attach_driver+0x8c/0xa4
[  +0.005008]        bus_for_each_drv+0x90/0xd8
[  +0.004481]        __device_attach+0xc0/0x148
[  +0.004480]        device_initial_probe+0x20/0x2c
[  +0.004832]        bus_probe_device+0x34/0x94
[  +0.004482]        device_add+0x1fc/0x3b0
[  +0.004121]        of_device_add+0x3c/0x4c
[  +0.004206]        of_platform_device_create_pdata+0xb8/0xfc
[  +0.005811]        of_platform_bus_create+0x1e4/0x368
[  +0.005185]        of_platform_populate+0x70/0xbc
[  +0.004833]        devm_of_platform_populate+0x58/0xa0
[  +0.005283]        rpmh_rsc_probe+0x36c/0x3cc
[  +0.004481]        platform_drv_probe+0x90/0xb0
[  +0.004657]        really_probe+0x134/0x2ec
[  +0.004305]        driver_probe_device+0x64/0xfc
[  +0.004745]        __device_attach_driver+0x8c/0xa4
[  +0.005007]        bus_for_each_drv+0x90/0xd8
[  +0.004480]        __device_attach+0xc0/0x148
[  +0.004481]        device_initial_probe+0x20/0x2c
[  +0.004833]        bus_probe_device+0x34/0x94
[  +0.004481]        device_add+0x1fc/0x3b0
[  +0.004119]        of_device_add+0x3c/0x4c
[  +0.004206]        of_platform_device_create_pdata+0xb8/0xfc
[  +0.005811]        of_platform_bus_create+0x1e4/0x368
[  +0.005185]        of_platform_bus_create+0x230/0x368
[  +0.005185]        of_platform_populate+0x70/0xbc
[  +0.004836]        of_platform_default_populate_init+0xa8/0xc0
[  +0.005986]        do_one_initcall+0x1c8/0x3fc
[  +0.004572]        do_initcall_level+0xb4/0x10c
[  +0.004657]        do_basic_setup+0x30/0x48
[  +0.004304]        kernel_init_freeable+0x124/0x1a4
[  +0.005009]        kernel_init+0x14/0x104
[  +0.004119]        ret_from_fork+0x10/0x18
[  +0.004205]
              -> #1 (&opp_table->lock){+.+.}:
[  +0.005815]        __mutex_lock_common+0xec/0xc0c
[  +0.004832]        mutex_lock_nested+0x40/0x50
[  +0.004570]        _add_opp_dev+0x2c/0x58
[  +0.004119]        _opp_get_opp_table+0xdc/0x1ac
[  +0.004745]        dev_pm_opp_get_opp_table_indexed+0x24/0x30
[  +0.005899]        dev_pm_opp_of_add_table_indexed+0x48/0x84
[  +0.005814]        of_genpd_add_provider_onecell+0xc0/0x1b8
[  +0.005721]        rpmhpd_probe+0x240/0x268
[  +0.004306]        platform_drv_probe+0x90/0xb0
[  +0.004656]        really_probe+0x134/0x2ec
[  +0.004305]        driver_probe_device+0x64/0xfc
[  +0.004745]        __device_attach_driver+0x8c/0xa4
[  +0.005007]        bus_for_each_drv+0x90/0xd8
[  +0.004481]        __device_attach+0xc0/0x148
[  +0.004481]        device_initial_probe+0x20/0x2c
[  +0.004832]        bus_probe_device+0x34/0x94
[  +0.004481]        device_add+0x1fc/0x3b0
[  +0.004119]        of_device_add+0x3c/0x4c
[  +0.004206]        of_platform_device_create_pdata+0xb8/0xfc
[  +0.005810]        of_platform_bus_create+0x1e4/0x368
[  +0.005197]        of_platform_populate+0x70/0xbc
[  +0.004832]        devm_of_platform_populate+0x58/0xa0
[  +0.005284]        rpmh_rsc_probe+0x36c/0x3cc
[  +0.004480]        platform_drv_probe+0x90/0xb0
[  +0.004658]        really_probe+0x134/0x2ec
[  +0.004301]        driver_probe_device+0x64/0xfc
[  +0.004745]        __device_attach_driver+0x8c/0xa4
[  +0.005007]        bus_for_each_drv+0x90/0xd8
[  +0.004480]        __device_attach+0xc0/0x148
[  +0.004481]        device_initial_probe+0x20/0x2c
[  +0.004831]        bus_probe_device+0x34/0x94
[  +0.004482]        device_add+0x1fc/0x3b0
[  +0.004118]        of_device_add+0x3c/0x4c
[  +0.004214]        of_platform_device_create_pdata+0xb8/0xfc
[  +0.005817]        of_platform_bus_create+0x1e4/0x368
[  +0.005186]        of_platform_bus_create+0x230/0x368
[  +0.005188]        of_platform_populate+0x70/0xbc
[  +0.004832]        of_platform_default_populate_init+0xa8/0xc0
[  +0.005987]        do_one_initcall+0x1c8/0x3fc
[  +0.004569]        do_initcall_level+0xb4/0x10c
[  +0.004657]        do_basic_setup+0x30/0x48
[  +0.004305]        kernel_init_freeable+0x124/0x1a4
[  +0.005008]        kernel_init+0x14/0x104
[  +0.004119]        ret_from_fork+0x10/0x18
[  +0.004206]
              -> #0 (opp_table_lock){+.+.}:
[  +0.005640]        __lock_acquire+0xee4/0x2450
[  +0.004570]        lock_acquire+0x1cc/0x210
[  +0.004305]        __mutex_lock_common+0xec/0xc0c
[  +0.004833]        mutex_lock_nested+0x40/0x50
[  +0.004570]        _find_opp_table+0x34/0x74
[  +0.004393]        dev_pm_opp_find_freq_exact+0x2c/0xdc
[  +0.005372]        a6xx_gmu_resume+0xc8/0xecc
[  +0.004480]        a6xx_pm_resume+0x148/0x200
[  +0.004482]        adreno_resume+0x28/0x34
[  +0.004209]        pm_generic_runtime_resume+0x34/0x48
[  +0.005283]        __rpm_callback+0x70/0x10c
[  +0.004393]        rpm_callback+0x34/0x8c
[  +0.004119]        rpm_resume+0x414/0x550
[  +0.004119]        __pm_runtime_resume+0x7c/0xa0
[  +0.004746]        msm_gpu_submit+0x60/0x1c0
[  +0.004394]        msm_ioctl_gem_submit+0xadc/0xb60
[  +0.005010]        drm_ioctl_kernel+0x9c/0x118
[  +0.004569]        drm_ioctl+0x27c/0x408
[  +0.004034]        drm_compat_ioctl+0xcc/0xdc
[  +0.004483]        __se_compat_sys_ioctl+0x100/0x206c
[  +0.005186]        __arm64_compat_sys_ioctl+0x20/0x2c
[  +0.005187]        el0_svc_common+0xa8/0x178
[  +0.004393]        el0_svc_compat_handler+0x2c/0x40
[  +0.005009]        el0_svc_compat+0x8/0x10
[  +0.004205]
              other info that might help us debug this:

[  +0.008213] Chain exists of:
                opp_table_lock --> &mm->mmap_sem --> reservation_ww_class_mutex

[  +0.011780]  Possible unsafe locking scenario:

[  +0.006082]        CPU0                    CPU1
[  +0.004660]        ----                    ----
[  +0.004656]   lock(reservation_ww_class_mutex);
[  +0.004657]                                lock(&mm->mmap_sem);
[  +0.006079]                                lock(reservation_ww_class_mutex);
[  +0.007237]   lock(opp_table_lock);
[  +0.003592]
               *** DEADLOCK ***

[  +0.006084] 3 locks held by chrome/1865:
[  +0.004031]  #0: ffffff81edecc0d8 (&dev->struct_mutex){+.+.}, at:
msm_ioctl_gem_submit+0x264/0xb60
[  +0.009198]  #1: ffffff81d0000870
(reservation_ww_class_acquire){+.+.}, at:
msm_ioctl_gem_submit+0x8e8/0xb60
[  +0.010086]  #2: ffffff81f0fc71a8
(reservation_ww_class_mutex){+.+.}, at: submit_lock_objects+0x70/0x1ec
[  +0.009735]
              stack backtrace:
[  +0.004482] CPU: 0 PID: 1865 Comm: chrome Not tainted 5.4.72 #14
[  +0.006173] Hardware name: Google Lazor (rev1+) with LTE (DT)
[  +0.005899] Call trace:
[  +0.002515]  dump_backtrace+0x0/0x158
[  +0.003768]  show_stack+0x20/0x2c
[  +0.003407]  dump_stack+0xc8/0x160
[  +0.003506]  print_circular_bug+0x2c4/0x2c8
[  +0.004305]  check_noncircular+0x1a8/0x1b0
[  +0.004206]  __lock_acquire+0xee4/0x2450
[  +0.004032]  lock_acquire+0x1cc/0x210
[  +0.003768]  __mutex_lock_common+0xec/0xc0c
[  +0.004305]  mutex_lock_nested+0x40/0x50
[  +0.004033]  _find_opp_table+0x34/0x74
[  +0.003855]  dev_pm_opp_find_freq_exact+0x2c/0xdc
[  +0.004833]  a6xx_gmu_resume+0xc8/0xecc
[  +0.003943]  a6xx_pm_resume+0x148/0x200
[  +0.003944]  adreno_resume+0x28/0x34
[  +0.003681]  pm_generic_runtime_resume+0x34/0x48
[  +0.004745]  __rpm_callback+0x70/0x10c
[  +0.003854]  rpm_callback+0x34/0x8c
[  +0.003592]  rpm_resume+0x414/0x550
[  +0.003592]  __pm_runtime_resume+0x7c/0xa0
[  +0.004207]  msm_gpu_submit+0x60/0x1c0
[  +0.003855]  msm_ioctl_gem_submit+0xadc/0xb60
[  +0.004481]  drm_ioctl_kernel+0x9c/0x118
[  +0.004031]  drm_ioctl+0x27c/0x408
[  +0.003504]  drm_compat_ioctl+0xcc/0xdc
[  +0.003945]  __se_compat_sys_ioctl+0x100/0x206c
[  +0.004658]  __arm64_compat_sys_ioctl+0x20/0x2c
[  +0.004659]  el0_svc_common+0xa8/0x178
[  +0.003855]  el0_svc_compat_handler+0x2c/0x40
[  +0.004480]  el0_svc_compat+0x8/0x10

> diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> index 2483e765318a..4cc0fb716381 100644
> --- a/drivers/opp/core.c
> +++ b/drivers/opp/core.c
> @@ -1181,6 +1181,10 @@ static void _opp_table_kref_release(struct kref *kref)
>         struct opp_device *opp_dev, *temp;
>         int i;
>
> +       /* Drop the lock as soon as we can */
> +       list_del(&opp_table->node);
> +       mutex_unlock(&opp_table_lock);
> +
>         _of_clear_opp_table(opp_table);
>
>         /* Release clk */
> @@ -1208,10 +1212,7 @@ static void _opp_table_kref_release(struct kref *kref)
>
>         mutex_destroy(&opp_table->genpd_virt_dev_lock);
>         mutex_destroy(&opp_table->lock);
> -       list_del(&opp_table->node);
>         kfree(opp_table);
> -
> -       mutex_unlock(&opp_table_lock);
>  }
>
>  void dev_pm_opp_put_opp_table(struct opp_table *opp_table)
>
> --
> viresh

  reply	other threads:[~2020-10-25 17:39 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-12  2:09 [PATCH 00/14] drm/msm: de-struct_mutex-ification Rob Clark
2020-10-12  2:09 ` [PATCH v2 01/22] drm/msm/gem: Add obj->lock wrappers Rob Clark
2020-10-12  2:09 ` [PATCH v2 02/22] drm/msm/gem: Rename internal get_iova_locked helper Rob Clark
2020-10-12  2:09 ` [PATCH v2 03/22] drm/msm/gem: Move prototypes to msm_gem.h Rob Clark
2020-10-12  2:09 ` [PATCH v2 04/22] drm/msm/gem: Add some _locked() helpers Rob Clark
2020-10-12  2:09 ` [PATCH v2 05/22] drm/msm/gem: Move locking in shrinker path Rob Clark
2020-10-12  2:09 ` [PATCH v2 06/22] drm/msm/submit: Move copy_from_user ahead of locking bos Rob Clark
2020-10-12  2:09 ` [PATCH v2 07/22] drm/msm: Do rpm get sooner in the submit path Rob Clark
2020-10-12 14:35   ` Daniel Vetter
2020-10-12 15:43     ` Rob Clark
2020-10-20  9:07       ` Viresh Kumar
2020-10-20 10:56         ` Daniel Vetter
2020-10-20 11:24           ` Viresh Kumar
2020-10-20 11:42             ` Daniel Vetter
2020-10-20 14:13             ` Rob Clark
2020-10-22  8:06               ` Viresh Kumar
2020-10-25 17:39                 ` Rob Clark [this message]
2020-10-27 11:35                   ` Viresh Kumar
2020-11-03  5:47                     ` Viresh Kumar
2020-11-03 16:50                       ` Rob Clark
2020-11-04  3:03                         ` Viresh Kumar
2020-11-05 19:24                           ` Rob Clark
2020-11-06  7:16                             ` Viresh Kumar
2020-11-17 10:03                               ` Viresh Kumar
2020-11-17 17:02                               ` Rob Clark
2020-11-18  5:28                                 ` Viresh Kumar
2020-11-18 16:53                                   ` Rob Clark
2020-11-19  6:05                                     ` Viresh Kumar
2020-12-07  6:16                                       ` Viresh Kumar
2020-12-16  5:22                                         ` Viresh Kumar
2020-10-12  2:09 ` [PATCH v2 08/22] drm/msm/gem: Switch over to obj->resv for locking Rob Clark
2020-10-12  2:09 ` [PATCH v2 09/22] drm/msm: Use correct drm_gem_object_put() in fail case Rob Clark
2020-10-12  2:09 ` [PATCH v2 10/22] drm/msm: Drop chatty trace Rob Clark
2020-10-12  2:09 ` [PATCH v2 11/22] drm/msm: Move update_fences() Rob Clark
2020-10-12  2:09 ` [PATCH v2 12/22] drm/msm: Add priv->mm_lock to protect active/inactive lists Rob Clark
2020-10-12  2:09 ` [PATCH v2 13/22] drm/msm: Document and rename preempt_lock Rob Clark
2020-10-12  2:09 ` [PATCH v2 14/22] drm/msm: Protect ring->submits with it's own lock Rob Clark
2020-10-12  2:09 ` [PATCH v2 15/22] drm/msm: Refcount submits Rob Clark
2020-10-12  2:09 ` [PATCH v2 16/22] drm/msm: Remove obj->gpu Rob Clark
2020-10-12  2:09 ` [PATCH v2 17/22] drm/msm: Drop struct_mutex from the retire path Rob Clark
2020-10-12  2:09 ` [PATCH v2 18/22] drm/msm: Drop struct_mutex in free_object() path Rob Clark
2020-10-12  2:09 ` [PATCH v2 19/22] drm/msm: remove msm_gem_free_work Rob Clark
2020-10-12  2:09 ` [PATCH v2 20/22] drm/msm: drop struct_mutex in madvise path Rob Clark
2020-10-12  2:09 ` [PATCH v2 21/22] drm/msm: Drop struct_mutex in shrinker path Rob Clark
2020-10-12  2:09 ` [PATCH v2 22/22] drm/msm: Don't implicit-sync if only a single ring Rob Clark
2020-10-12 14:40   ` Daniel Vetter
2020-10-12 15:07     ` Rob Clark
2020-10-13 11:08       ` Daniel Vetter
2020-10-13 16:15         ` [Freedreno] " Rob Clark
2020-10-15  8:22           ` Daniel Vetter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAF6AEGv6RMCsK4yp-W2d1mVTMcEiiwFGAb+V8rYLhDdMhqP80Q@mail.gmail.com \
    --to=robdclark@gmail.com \
    --cc=airlied@linux.ie \
    --cc=daniel@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=freedreno@lists.freedesktop.org \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=nm@ti.com \
    --cc=robdclark@chromium.org \
    --cc=sean@poorly.run \
    --cc=viresh.kumar@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).