From: Viresh Kumar <viresh.kumar@linaro.org>
To: Rob Clark <robdclark@gmail.com>
Cc: dri-devel <dri-devel@lists.freedesktop.org>,
Rob Clark <robdclark@chromium.org>, Sean Paul <sean@poorly.run>,
David Airlie <airlied@linux.ie>,
"open list:DRM DRIVER FOR MSM ADRENO GPU"
<linux-arm-msm@vger.kernel.org>,
"open list:DRM DRIVER FOR MSM ADRENO GPU"
<freedreno@lists.freedesktop.org>,
open list <linux-kernel@vger.kernel.org>,
Daniel Vetter <daniel@ffwll.ch>, "Menon, Nishanth" <nm@ti.com>
Subject: Re: [PATCH v2 07/22] drm/msm: Do rpm get sooner in the submit path
Date: Tue, 20 Oct 2020 14:37:29 +0530 [thread overview]
Message-ID: <20201020090729.qgqish5kqamhvatj@vireshk-i7> (raw)
In-Reply-To: <CAF6AEGstGtBswUUiyHxT2cCm8NwZekDnMzD0J_pQH37GwS=LiA@mail.gmail.com>
On 12-10-20, 08:43, Rob Clark wrote:
> On Mon, Oct 12, 2020 at 7:35 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Sun, Oct 11, 2020 at 07:09:34PM -0700, Rob Clark wrote:
> > > From: Rob Clark <robdclark@chromium.org>
> > >
> > > Unfortunately, due to a dev_pm_opp locking interaction with
> > > mm->mmap_sem, we need to do the pm get before acquiring obj locks,
> > > otherwise lockdep gets angry about the chain:
> >
> > tbh this sounds like a bug in that subsystem, since it means we cannot use
> > said subsystem in mmap handlers either.
> >
> > So if you have some remapping unit or need to wake up your gpu to blt the
> > buffer into system memory first, we're toast. That doesn't sound right. So
> > maybe Cc: pm folks and figure out how to fix this long term properly? Imo
> > not a good reason to hold up this patch set, since unwrangling mmap_sem
> > tends to be work ...
>
> + a couple of PM folks
>
> Looks like it has been this way for quite some time, so I guess the
> overlap between things using dev_pm_opp and mmap is low..
>
> fwiw, example splat so folks can see the locking interaction I am
> talking about.. I suspect the pm_opp interaction with mm->mmap_sem is
> from the debugfs calls while opp_table_lock is held?
I am not very sure why this circular locking dependency is happening
here, or how exactly we can fix it. The OPP core works under the
opp_table_lock, from within which it creates/removes the debugfs
entries as well.
> [ 15.627855] ======================================================
> [ 15.634202] WARNING: possible circular locking dependency detected
> [ 15.640550] 5.4.70 #41 Not tainted
> [ 15.644050] ------------------------------------------------------
> [ 15.650397] chrome/1805 is trying to acquire lock:
> [ 15.655314] ffffffed90720738 (opp_table_lock){+.+.}, at: _find_opp_table+0x34/0x74
> [ 15.663092]
> [ 15.663092] but task is already holding lock:
> [ 15.669082] ffffff80ff3911a8 (reservation_ww_class_mutex){+.+.}, at: submit_lock_objects+0x70/0x1ec
> [ 15.678369]
> [ 15.678369] which lock already depends on the new lock.
> [ 15.678369]
> [ 15.686764]
> [ 15.686764] the existing dependency chain (in reverse order) is:
> [ 15.694438]
> [ 15.694438] -> #3 (reservation_ww_class_mutex){+.+.}:
> [ 15.701146] __mutex_lock_common+0xec/0xc0c
> [ 15.705978] ww_mutex_lock_interruptible+0x5c/0xc4
> [ 15.711432] msm_gem_fault+0x2c/0x124
> [ 15.715731] __do_fault+0x40/0x16c
> [ 15.719766] handle_mm_fault+0x7cc/0xd98
> [ 15.724337] do_page_fault+0x230/0x3b4
> [ 15.728721] do_translation_fault+0x5c/0x78
> [ 15.733558] do_mem_abort+0x4c/0xb4
> [ 15.737680] el0_da+0x1c/0x20
> [ 15.741266]
> [ 15.741266] -> #2 (&mm->mmap_sem){++++}:
> [ 15.746809] __might_fault+0x70/0x98
> [ 15.751022] compat_filldir+0xf8/0x48c
> [ 15.755412] dcache_readdir+0x70/0x1dc
> [ 15.759808] iterate_dir+0xd4/0x180
> [ 15.763931] __arm64_compat_sys_getdents+0xa0/0x19c
> [ 15.769476] el0_svc_common+0xa8/0x178
> [ 15.773861] el0_svc_compat_handler+0x2c/0x40
> [ 15.778868] el0_svc_compat+0x8/0x10
> [ 15.783075]
> [ 15.783075] -> #1 (&sb->s_type->i_mutex_key#3){++++}:
> [ 15.789788] down_write+0x54/0x16c
> [ 15.793826] debugfs_remove_recursive+0x50/0x158
> [ 15.799108] opp_debug_unregister+0x34/0x114
> [ 15.804028] dev_pm_opp_put_opp_table+0xd0/0x14c
> [ 15.809308] dev_pm_opp_put_clkname+0x3c/0x50
> [ 15.814318] msm_dsi_host_destroy+0xb0/0xcc
> [ 15.819149] dsi_destroy+0x40/0x58
> [ 15.823184] dsi_bind+0x90/0x170
> [ 15.827041] component_bind_all+0xf0/0x208
> [ 15.831787] msm_drm_init+0x188/0x60c
> [ 15.836084] msm_drm_bind+0x24/0x30
> [ 15.840205] try_to_bring_up_master+0x15c/0x1a4
> [ 15.845396] __component_add+0x98/0x14c
> [ 15.849878] component_add+0x28/0x34
> [ 15.854086] dp_display_probe+0x324/0x370
> [ 15.858744] platform_drv_probe+0x90/0xb0
> [ 15.863400] really_probe+0x134/0x2ec
> [ 15.867699] driver_probe_device+0x64/0xfc
> [ 15.872443] __device_attach_driver+0x8c/0xa4
> [ 15.877459] bus_for_each_drv+0x90/0xd8
> [ 15.881939] __device_attach+0xc0/0x148
> [ 15.886420] device_initial_probe+0x20/0x2c
> [ 15.891254] bus_probe_device+0x34/0x94
> [ 15.895726] deferred_probe_work_func+0x78/0xb4
> [ 15.900914] process_one_work+0x30c/0x5d0
> [ 15.905573] worker_thread+0x240/0x3f0
> [ 15.909959] kthread+0x144/0x154
> [ 15.913809] ret_from_fork+0x10/0x18
> [ 15.918016]
> [ 15.918016] -> #0 (opp_table_lock){+.+.}:
> [ 15.923660] __lock_acquire+0xee4/0x2450
> [ 15.928230] lock_acquire+0x1cc/0x210
> [ 15.932527] __mutex_lock_common+0xec/0xc0c
> [ 15.937359] mutex_lock_nested+0x40/0x50
> [ 15.941928] _find_opp_table+0x34/0x74
> [ 15.946312] dev_pm_opp_find_freq_exact+0x2c/0xdc
> [ 15.951680] a6xx_gmu_resume+0xc8/0xecc
> [ 15.952812] fscrypt: AES-256-CTS-CBC using implementation "cts-cbc-aes-ce"
> [ 15.956161] a6xx_pm_resume+0x148/0x200
> [ 15.956166] adreno_resume+0x28/0x34
> [ 15.956171] pm_generic_runtime_resume+0x34/0x48
> [ 15.956174] __rpm_callback+0x70/0x10c
> [ 15.956176] rpm_callback+0x34/0x8c
> [ 15.956179] rpm_resume+0x414/0x550
> [ 15.956182] __pm_runtime_resume+0x7c/0xa0
> [ 15.956185] msm_gpu_submit+0x60/0x1c0
> [ 15.956190] msm_ioctl_gem_submit+0xadc/0xb60
> [ 16.003961] drm_ioctl_kernel+0x9c/0x118
> [ 16.008532] drm_ioctl+0x27c/0x408
> [ 16.012562] drm_compat_ioctl+0xcc/0xdc
> [ 16.017038] __se_compat_sys_ioctl+0x100/0x206c
> [ 16.022224] __arm64_compat_sys_ioctl+0x20/0x2c
> [ 16.027412] el0_svc_common+0xa8/0x178
> [ 16.031800] el0_svc_compat_handler+0x2c/0x40
> [ 16.036810] el0_svc_compat+0x8/0x10
> [ 16.041021]
> [ 16.041021] other info that might help us debug this:
> [ 16.041021]
> [ 16.049235] Chain exists of:
> [ 16.049235] opp_table_lock --> &mm->mmap_sem --> reservation_ww_class_mutex
> [ 16.049235]
> [ 16.061014] Possible unsafe locking scenario:
> [ 16.061014]
> [ 16.067091] CPU0 CPU1
> [ 16.071750] ---- ----
> [ 16.076399] lock(reservation_ww_class_mutex);
> [ 16.081059] lock(&mm->mmap_sem);
> [ 16.087134] lock(reservation_ww_class_mutex);
> [ 16.094369] lock(opp_table_lock);
> [ 16.097961]
> [ 16.097961] *** DEADLOCK ***
> [ 16.097961]
> [ 16.104038] 3 locks held by chrome/1805:
> [ 16.108068] #0: ffffff80fb20c0d8 (&dev->struct_mutex){+.+.}, at: msm_ioctl_gem_submit+0x264/0xb60
> [ 16.117264] #1: ffffff80dd712c70 (reservation_ww_class_acquire){+.+.}, at: msm_ioctl_gem_submit+0x8e8/0xb60
> [ 16.127357] #2: ffffff80ff3911a8 (reservation_ww_class_mutex){+.+.}, at: submit_lock_objects+0x70/0x1ec
> [ 16.137089]
> [ 16.137089] stack backtrace:
> [ 16.141567] CPU: 4 PID: 1805 Comm: chrome Not tainted 5.4.70 #41
> [ 16.147733] Hardware name: Google Lazor (rev1+) with LTE (DT)
> [ 16.153632] Call trace:
> [ 16.156154] dump_backtrace+0x0/0x158
> [ 16.159924] show_stack+0x20/0x2c
> [ 16.163340] dump_stack+0xc8/0x160
> [ 16.166840] print_circular_bug+0x2c4/0x2c8
> [ 16.171144] check_noncircular+0x1a8/0x1b0
> [ 16.175351] __lock_acquire+0xee4/0x2450
> [ 16.179382] lock_acquire+0x1cc/0x210
> [ 16.183146] __mutex_lock_common+0xec/0xc0c
> [ 16.187450] mutex_lock_nested+0x40/0x50
> [ 16.191481] _find_opp_table+0x34/0x74
> [ 16.195344] dev_pm_opp_find_freq_exact+0x2c/0xdc
> [ 16.200178] a6xx_gmu_resume+0xc8/0xecc
> [ 16.204120] a6xx_pm_resume+0x148/0x200
> [ 16.208064] adreno_resume+0x28/0x34
> [ 16.211743] pm_generic_runtime_resume+0x34/0x48
> [ 16.216488] __rpm_callback+0x70/0x10c
> [ 16.220342] rpm_callback+0x34/0x8c
> [ 16.223933] rpm_resume+0x414/0x550
> [ 16.227524] __pm_runtime_resume+0x7c/0xa0
> [ 16.231731] msm_gpu_submit+0x60/0x1c0
> [ 16.235586] msm_ioctl_gem_submit+0xadc/0xb60
> [ 16.240066] drm_ioctl_kernel+0x9c/0x118
> [ 16.244097] drm_ioctl+0x27c/0x408
> [ 16.247602] drm_compat_ioctl+0xcc/0xdc
> [ 16.251546] __se_compat_sys_ioctl+0x100/0x206c
> [ 16.256204] __arm64_compat_sys_ioctl+0x20/0x2c
> [ 16.260861] el0_svc_common+0xa8/0x178
> [ 16.264716] el0_svc_compat_handler+0x2c/0x40
> [ 16.269196] el0_svc_compat+0x8/0x10
>
> BR,
> -R
>
> > -Daniel
> >
> > >
> > > opp_table_lock --> &mm->mmap_sem --> reservation_ww_class_mutex
> > >
> > > For an explicit fencing userspace, the impact should be minimal
> > > as we do all the fence waits before this point. It could result
> > > in some needless resumes in error cases, etc.
> > >
> > > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > > ---
> > > drivers/gpu/drm/msm/msm_gem_submit.c | 15 +++++++++++++--
> > > 1 file changed, 13 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
> > > index 002130d826aa..a9422d043bfe 100644
> > > --- a/drivers/gpu/drm/msm/msm_gem_submit.c
> > > +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
> > > @@ -744,11 +744,20 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
> > >
> > > ret = submit_lookup_objects(submit, args, file);
> > > if (ret)
> > > - goto out;
> > > + goto out_pre_pm;
> > >
> > > ret = submit_lookup_cmds(submit, args, file);
> > > if (ret)
> > > - goto out;
> > > + goto out_pre_pm;
> > > +
> > > + /*
> > > + * Thanks to dev_pm_opp opp_table_lock interactions with mm->mmap_sem
> > > + * in the resume path, we need to do the rpm get before we lock objs.
> > > + * Which unfortunately might involve powering up the GPU sooner than
> > > + * is necessary. But at least in the explicit fencing case, we will
> > > + * have already done all the fence waiting.
> > > + */
> > > + pm_runtime_get_sync(&gpu->pdev->dev);
> > >
> > > /* copy_*_user while holding a ww ticket upsets lockdep */
> > > ww_acquire_init(&submit->ticket, &reservation_ww_class);
> > > @@ -825,6 +834,8 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
> > >
> > >
> > > out:
> > > + pm_runtime_put(&gpu->pdev->dev);
> > > +out_pre_pm:
> > > submit_cleanup(submit);
> > > if (has_ww_ticket)
> > > ww_acquire_fini(&submit->ticket);
--
viresh
Thread overview:
2020-10-12 2:09 [PATCH 00/14] drm/msm: de-struct_mutex-ification Rob Clark
2020-10-12 2:09 ` [PATCH v2 01/22] drm/msm/gem: Add obj->lock wrappers Rob Clark
2020-10-12 2:09 ` [PATCH v2 02/22] drm/msm/gem: Rename internal get_iova_locked helper Rob Clark
2020-10-12 2:09 ` [PATCH v2 03/22] drm/msm/gem: Move prototypes to msm_gem.h Rob Clark
2020-10-12 2:09 ` [PATCH v2 04/22] drm/msm/gem: Add some _locked() helpers Rob Clark
2020-10-12 2:09 ` [PATCH v2 05/22] drm/msm/gem: Move locking in shrinker path Rob Clark
2020-10-12 2:09 ` [PATCH v2 06/22] drm/msm/submit: Move copy_from_user ahead of locking bos Rob Clark
2020-10-12 2:09 ` [PATCH v2 07/22] drm/msm: Do rpm get sooner in the submit path Rob Clark
2020-10-12 14:35 ` Daniel Vetter
2020-10-12 15:43 ` Rob Clark
2020-10-20 9:07 ` Viresh Kumar [this message]
2020-10-20 10:56 ` Daniel Vetter
2020-10-20 11:24 ` Viresh Kumar
2020-10-20 11:42 ` Daniel Vetter
2020-10-20 14:13 ` Rob Clark
2020-10-22 8:06 ` Viresh Kumar
2020-10-25 17:39 ` Rob Clark
2020-10-27 11:35 ` Viresh Kumar
2020-11-03 5:47 ` Viresh Kumar
2020-11-03 16:50 ` Rob Clark
2020-11-04 3:03 ` Viresh Kumar
2020-11-05 19:24 ` Rob Clark
2020-11-06 7:16 ` Viresh Kumar
2020-11-17 10:03 ` Viresh Kumar
2020-11-17 17:02 ` Rob Clark
2020-11-18 5:28 ` Viresh Kumar
2020-11-18 16:53 ` Rob Clark
2020-11-19 6:05 ` Viresh Kumar
2020-12-07 6:16 ` Viresh Kumar
2020-12-16 5:22 ` Viresh Kumar
2020-10-12 2:09 ` [PATCH v2 08/22] drm/msm/gem: Switch over to obj->resv for locking Rob Clark
2020-10-12 2:09 ` [PATCH v2 09/22] drm/msm: Use correct drm_gem_object_put() in fail case Rob Clark
2020-10-12 2:09 ` [PATCH v2 10/22] drm/msm: Drop chatty trace Rob Clark
2020-10-12 2:09 ` [PATCH v2 11/22] drm/msm: Move update_fences() Rob Clark
2020-10-12 2:09 ` [PATCH v2 12/22] drm/msm: Add priv->mm_lock to protect active/inactive lists Rob Clark
2020-10-12 2:09 ` [PATCH v2 13/22] drm/msm: Document and rename preempt_lock Rob Clark
2020-10-12 2:09 ` [PATCH v2 14/22] drm/msm: Protect ring->submits with it's own lock Rob Clark
2020-10-12 2:09 ` [PATCH v2 15/22] drm/msm: Refcount submits Rob Clark
2020-10-12 2:09 ` [PATCH v2 16/22] drm/msm: Remove obj->gpu Rob Clark
2020-10-12 2:09 ` [PATCH v2 17/22] drm/msm: Drop struct_mutex from the retire path Rob Clark
2020-10-12 2:09 ` [PATCH v2 18/22] drm/msm: Drop struct_mutex in free_object() path Rob Clark
2020-10-12 2:09 ` [PATCH v2 19/22] drm/msm: remove msm_gem_free_work Rob Clark
2020-10-12 2:09 ` [PATCH v2 20/22] drm/msm: drop struct_mutex in madvise path Rob Clark
2020-10-12 2:09 ` [PATCH v2 21/22] drm/msm: Drop struct_mutex in shrinker path Rob Clark
2020-10-12 2:09 ` [PATCH v2 22/22] drm/msm: Don't implicit-sync if only a single ring Rob Clark
2020-10-12 14:40 ` Daniel Vetter
2020-10-12 15:07 ` Rob Clark
2020-10-13 11:08 ` Daniel Vetter
2020-10-13 16:15 ` [Freedreno] " Rob Clark
2020-10-15 8:22 ` Daniel Vetter