From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
    aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
    by smtp.lore.kernel.org (Postfix) with ESMTP id 39BC5C433EF
    for ; Mon, 14 Mar 2022 06:59:47 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S236496AbiCNHAy (ORCPT ); Mon, 14 Mar 2022 03:00:54 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43790 "EHLO
    lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
    with ESMTP id S233747AbiCNHAx (ORCPT ); Mon, 14 Mar 2022 03:00:53 -0400
Received: from lgeamrelo11.lge.com (lgeamrelo13.lge.com [156.147.23.53])
    by lindbergh.monkeyblade.net (Postfix) with ESMTP id 1C70C40908
    for ; Sun, 13 Mar 2022 23:59:42 -0700 (PDT)
Received: from unknown (HELO lgeamrelo02.lge.com) (156.147.1.126)
    by 156.147.23.53 with ESMTP; 14 Mar 2022 15:59:40 +0900
X-Original-SENDERIP: 156.147.1.126
X-Original-MAILFROM: byungchul.park@lge.com
Received: from unknown (HELO X58A-UD3R) (10.177.244.38)
    by 156.147.1.126 with ESMTP; 14 Mar 2022 15:59:40 +0900
X-Original-SENDERIP: 10.177.244.38
X-Original-MAILFROM: byungchul.park@lge.com
Date: Mon, 14 Mar 2022 15:59:06 +0900
From: Byungchul Park
To: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: torvalds@linux-foundation.org, damien.lemoal@opensource.wdc.com,
    linux-ide@vger.kernel.org, adilger.kernel@dilger.ca,
    linux-ext4@vger.kernel.org, mingo@redhat.com,
    linux-kernel@vger.kernel.org, peterz@infradead.org, will@kernel.org,
    tglx@linutronix.de, rostedt@goodmis.org, joel@joelfernandes.org,
    sashal@kernel.org, daniel.vetter@ffwll.ch, chris@chris-wilson.co.uk,
    duyuyang@gmail.com, johannes.berg@intel.com, tj@kernel.org,
    tytso@mit.edu, willy@infradead.org, david@fromorbit.com,
    amir73il@gmail.com, bfields@fieldses.org, gregkh@linuxfoundation.org,
    kernel-team@lge.com, linux-mm@kvack.org, akpm@linux-foundation.org,
    mhocko@kernel.org, minchan@kernel.org, hannes@cmpxchg.org,
    vdavydov.dev@gmail.com, sj@kernel.org, jglisse@redhat.com,
    dennis@kernel.org, cl@linux.com, penberg@kernel.org,
    rientjes@google.com, vbabka@suse.cz, ngupta@vflare.org,
    linux-block@vger.kernel.org, paolo.valente@linaro.org,
    josef@toxicpanda.com, linux-fsdevel@vger.kernel.org,
    viro@zeniv.linux.org.uk, jack@suse.cz, jack@suse.com,
    jlayton@kernel.org, dan.j.williams@intel.com, hch@infradead.org,
    djwong@kernel.org, dri-devel@lists.freedesktop.org, airlied@linux.ie,
    rodrigosiqueiramelo@gmail.com, melissa.srw@gmail.com,
    hamohammed.sa@gmail.com
Subject: Re: [PATCH v4 00/24] DEPT(Dependency Tracker)
Message-ID: <20220314065906.GA6255@X58A-UD3R>
References: <1646377603-19730-1-git-send-email-byungchul.park@lge.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.21 (2010-09-15)
Precedence: bulk
List-ID:
X-Mailing-List: linux-block@vger.kernel.org

On Sat, Mar 12, 2022 at 01:53:26AM +0000, Hyeonggon Yoo wrote:
> On Fri, Mar 04, 2022 at 04:06:19PM +0900, Byungchul Park wrote:
> > Hi Linus and folks,
> >
> > I've been developing a tool for detecting deadlock possibilities by
> > tracking wait/event rather than lock(?) acquisition order to try to
> > cover all synchronization mechanisms. It's done on v5.17-rc1 tag.
> >
> > https://github.com/lgebyungchulpark/linux-dept/commits/dept1.14_on_v5.17-rc1
> >
>
> Small feedback unrelated to thread:
> I'm not sure "Need to expand the ring buffer" is something to call
> WARN(). Is this stack trace useful for something?

Yeah. It seems to happen too often. I won't warn on it.

Thanks.

> ========
>
> Hello Byungchul. These are two warnings of DEPT on system.
> Both cases look similar.
>
> In what case DEPT says (unknown)?
> I'm not sure we can properly debug this.
>
> ===================================================
> DEPT: Circular dependency has been detected.
> 5.17.0-rc1+ #3 Tainted: G W
> ---------------------------------------------------
> summary
> ---------------------------------------------------
> *** AA DEADLOCK ***
>
> context A
> [S] (unknown)(&vfork:0)
> [W] wait_for_completion_killable(&vfork:0)
> [E] complete(&vfork:0)

All the reports appear to be related to kernel_clone(). I need to
look into it more. Thank you very much. You are awesome, Hyeonggon.

Thank you,
Byungchul

> [S]: start of the event context
> [W]: the wait blocked
> [E]: the event not reachable
> ---------------------------------------------------
> context A's detail
> ---------------------------------------------------
> context A
> [S] (unknown)(&vfork:0)
> [W] wait_for_completion_killable(&vfork:0)
> [E] complete(&vfork:0)
>
> [S] (unknown)(&vfork:0):
> (N/A)
>
> [W] wait_for_completion_killable(&vfork:0):
> [] kernel_clone+0x25c/0x2b8
> stacktrace:
> dept_wait+0x74/0x88
> wait_for_completion_killable+0x60/0xa0
> kernel_clone+0x25c/0x2b8
> __do_sys_clone+0x5c/0x74
> __arm64_sys_clone+0x18/0x20
> invoke_syscall.constprop.0+0x78/0xc4
> do_el0_svc+0x98/0xd0
> el0_svc+0x44/0xe4
> el0t_64_sync_handler+0xb0/0x12c
> el0t_64_sync+0x158/0x15c
>
> [E] complete(&vfork:0):
> [] mm_release+0x7c/0x90
> stacktrace:
> dept_event+0xe0/0x100
> complete+0x48/0x98
> mm_release+0x7c/0x90
> exit_mm_release+0xc/0x14
> do_exit+0x1b4/0x81c
> do_group_exit+0x30/0x9c
> __wake_up_parent+0x0/0x24
> invoke_syscall.constprop.0+0x78/0xc4
> do_el0_svc+0x98/0xd0
> el0_svc+0x44/0xe4
> el0t_64_sync_handler+0xb0/0x12c
> el0t_64_sync+0x158/0x15c
> ---------------------------------------------------
> information that might be helpful
> ---------------------------------------------------
> CPU: 6 PID: 229 Comm: start-stop-daem Tainted: G W 5.17.0-rc1+ #3
> Hardware name: linux,dummy-virt (DT)
> Call trace:
> dump_backtrace.part.0+0x9c/0xc4
> show_stack+0x14/0x28
> dump_stack_lvl+0x9c/0xcc
> dump_stack+0x14/0x2c
> print_circle+0x2d4/0x438
> cb_check_dl+0x44/0x70
> bfs+0x60/0x168
> add_dep+0x88/0x11c
> do_event.constprop.0+0x19c/0x2c0
> dept_event+0xe0/0x100
> complete+0x48/0x98
> mm_release+0x7c/0x90
> exit_mm_release+0xc/0x14
> do_exit+0x1b4/0x81c
> do_group_exit+0x30/0x9c
> __wake_up_parent+0x0/0x24
> invoke_syscall.constprop.0+0x78/0xc4
> do_el0_svc+0x98/0xd0
> el0_svc+0x44/0xe4
> el0t_64_sync_handler+0xb0/0x12c
> el0t_64_sync+0x158/0x15c
>
>
>
>
> ===================================================
> DEPT: Circular dependency has been detected.
> 5.17.0-rc1+ #3 Tainted: G W
> ---------------------------------------------------
> summary
> ---------------------------------------------------
> *** AA DEADLOCK ***
>
> context A
> [S] (unknown)(&try_completion:0)
> [W] wait_for_completion_timeout(&try_completion:0)
> [E] complete(&try_completion:0)
>
> [S]: start of the event context
> [W]: the wait blocked
> [E]: the event not reachable
> ---------------------------------------------------
> context A's detail
> ---------------------------------------------------
> context A
> [S] (unknown)(&try_completion:0)
> [W] wait_for_completion_timeout(&try_completion:0)
> [E] complete(&try_completion:0)
>
> [S] (unknown)(&try_completion:0):
> (N/A)
>
> [W] wait_for_completion_timeout(&try_completion:0):
> [] kunit_try_catch_run+0xb4/0x160
> stacktrace:
> dept_wait+0x74/0x88
> wait_for_completion_timeout+0x64/0xa0
> kunit_try_catch_run+0xb4/0x160
> kunit_test_try_catch_successful_try_no_catch+0x3c/0x98
> kunit_try_run_case+0x9c/0xa0
> kunit_generic_run_threadfn_adapter+0x1c/0x28
> kthread+0xd4/0xe4
> ret_from_fork+0x10/0x20
>
> [E] complete(&try_completion:0):
> [] kthread_complete_and_exit+0x18/0x20
> stacktrace:
> dept_event+0xe0/0x100
> complete+0x48/0x98
> kthread_complete_and_exit+0x18/0x20
> kunit_try_catch_throw+0x0/0x1c
> kthread+0xd4/0xe4
> ret_from_fork+0x10/0x20
>
> ---------------------------------------------------
> information that might be helpful
> ---------------------------------------------------
> CPU: 15 PID: 132 Comm: kunit_try_catch Tainted: G W 5.17.0-rc1+ #3
> Hardware name: linux,dummy-virt (DT)
> Call trace:
> dump_backtrace.part.0+0x9c/0xc4
> show_stack+0x14/0x28
> dump_stack_lvl+0x9c/0xcc
> dump_stack+0x14/0x2c
> print_circle+0x2d4/0x438
> cb_check_dl+0x44/0x70
> bfs+0x60/0x168
> add_dep+0x88/0x11c
> do_event.constprop.0+0x19c/0x2c0
> dept_event+0xe0/0x100
> complete+0x48/0x98
> kthread_complete_and_exit+0x18/0x20
> kunit_try_catch_throw+0x0/0x1c
> kthread+0xd4/0xe4
> ret_from_fork+0x10/0x20
> --
> Thank you, You are awesome!
> Hyeonggon :-)