From: Rocco.Yue <rocco.yue@mediatek.com> To: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: "David S . Miller" <davem@davemloft.net>, Jakub Kicinski <kuba@kernel.org>, Matthias Brugger <matthias.bgg@gmail.com>, Andrew Morton <akpm@linux-foundation.org>, Masahiro Yamada <masahiroy@kernel.org>, "Nick Desaulniers" <ndesaulniers@google.com>, Andy Shevchenko <andriy.shevchenko@linux.intel.com>, "Peter Zijlstra (Intel)" <peterz@infradead.org>, Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>, Peter Enderborg <peter.enderborg@sony.com>, Thomas Gleixner <tglx@linutronix.de>, Anshuman Khandual <anshuman.khandual@arm.com>, "Vitor Massaru Iha" <vitor@massaru.org>, Sedat Dilek <sedat.dilek@gmail.com>, "Wei Yang" <richard.weiyang@gmail.com>, Cong Wang <cong.wang@bytedance.com>, "Di Zhu" <zhudi21@huawei.com>, Stephen Hemminger <stephen@networkplumber.org>, Francis Laniel <laniel_francis@privacyrequired.com>, Roopa Prabhu <roopa@cumulusnetworks.com>, Andrii Nakryiko <andrii@kernel.org>, "Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>, netdev <netdev@vger.kernel.org>, linux-arm Mailing List <linux-arm-kernel@lists.infradead.org>, "moderated list:ARM/Mediatek SoC support" <linux-mediatek@lists.infradead.org>, <wsd_upsream@mediatek.com> Subject: Re: [PATCH] rtnetlink: add rtnl_lock debug log Date: Sat, 8 May 2021 16:53:32 +0800 [thread overview] Message-ID: <1620464012.29475.76.camel@mbjsdccf07> (raw) In-Reply-To: <CAHp75Vc=GMv5dJ1dJYr=B3W6c+nuPCfXa4wxLJOYPTuqrMskFg@mail.gmail.com> > > We often encounter system hangs caused by certain processes > > holding rtnl_lock for a long time. Even if there is a lock > > detection mechanism in Linux, it is a bit troublesome and > > affects the system performance. We hope to add a lightweight > > debugging mechanism for detecting rtnl_lock. > > > > Up to now, we have discovered and solved some potential bugs > > through such debug information of this lightweight rtnl_lock, > > which is helpful for us. > > > > When you say Y for RTNL_LOCK_DEBUG, then the kernel will detect > > if any function hold rtnl_lock too long and some key information > > will be printed to help identify the issue point. > > > > i.e: from the following logs, we can clear know that the pid=5546 > > clearly > > > RfxSender_4 process hold rtnl_lock for a long time, causing the > > holds > > > system hang. And we can also speculate that the delay operation > > to hang > Thanks for your review, Andy. I Will fix them. > > may be performed in devinet_ioctl(), resulting in rtnl_lock was > > not released in time. > > > > <6>[ 141.151364] ----------- rtnl_print_btrace start ----------- > > Can you, please, shrink this to the point? > Will shrink these points. > > <6>[ 141.152079] RfxSender_4[5546][R] hold rtnl_lock more than 2 sec, > > start time: 139129481562 > > <4>[ 141.153114] rtnl_lock+0x88/0xfc > > <4>[ 141.153523] devinet_ioctl+0x190/0x1268 > > <4>[ 141.154007] inet_ioctl+0x108/0x1f4 > > <4>[ 141.154449] sock_do_ioctl+0x88/0x200 > > <4>[ 141.154911] sock_ioctl+0x4b0/0x884 > > <4>[ 141.155367] do_vfs_ioctl+0x6b0/0xcc4 > > <4>[ 141.155830] __arm64_sys_ioctl+0xc0/0xec > > <4>[ 141.156326] el0_svc_common+0x130/0x2c0 > > <4>[ 141.156810] el0_svc_handler+0xd0/0xe0 > > <4>[ 141.157283] el0_svc+0x8/0xc > > <4>[ 141.157646] Call trace: > > <4>[ 141.157956] dump_backtrace+0x0/0x240 > > <4>[ 141.158418] show_stack+0x18/0x24 > > <4>[ 141.158836] rtnl_print_btrace+0x138/0x1cc > > <4>[ 141.159362] call_timer_fn+0x120/0x47c > > <4>[ 141.159834] expire_timers+0x28c/0x420 > > <4>[ 141.160306] __run_timers+0x3d0/0x494 > > <4>[ 141.160768] run_timer_softirq+0x24/0x48 > > <4>[ 141.161262] __do_softirq+0x26c/0x968 > > <4>[ 141.161725] irq_exit+0x1f8/0x2b4 > > <4>[ 141.162145] __handle_domain_irq+0xdc/0x15c > > <4>[ 141.162672] gic_handle_irq+0xe4/0x188 > > <4>[ 141.163144] el1_irq+0x104/0x200 > > <4>[ 141.163559] __const_udelay+0x118/0x1b0 > > <4>[ 141.164044] devinet_ioctl+0x1a0/0x1268 > > <4>[ 141.164527] inet_ioctl+0x108/0x1f4 > > <4>[ 141.164968] sock_do_ioctl+0x88/0x200 > > <4>[ 141.165428] sock_ioctl+0x4b0/0x884 > > <4>[ 141.165868] do_vfs_ioctl+0x6b0/0xcc4 > > <4>[ 141.166330] __arm64_sys_ioctl+0xc0/0xec > > <4>[ 141.166825] el0_svc_common+0x130/0x2c0 > > <4>[ 141.167308] el0_svc_handler+0xd0/0xe0 > > <4>[ 141.167786] el0_svc+0x8/0xc > > <6>[ 141.168153] ------------ rtnl_print_btrace end ----------- > > > > <6>[ 147.321389] rtnl_lock is held by [5546] from > > [139129481562] to [147321378812] > > > ... > > > +static struct rtnl_debug_btrace_t rtnl_instance = { > > + .task = NULL, > > + .pid = 0, > > + .start_time = 0, > > + .end_time = 0, > > + .nr_entries = 0, > > static assumes all 0:s, what's the point? > will fix it in the patch v2. > > +}; > > ... > > > +static void rtnl_print_btrace(struct timer_list *unused) > > +{ > > + pr_info("----------- %s start -----------\n", __func__); > > + pr_info("%s[%d][%c] hold rtnl_lock more than 2 sec, start time: %llu\n", > > + rtnl_instance.task->comm, > > + rtnl_instance.pid, > > + task_state_to_char(rtnl_instance.task), > > + rtnl_instance.start_time); > > + stack_trace_print(rtnl_instance.addrs, rtnl_instance.nr_entries, 0); > > > + show_stack(rtnl_instance.task, NULL, KERN_DEBUG); > > Unaligned debug level. > KERN_INFO is expected, I will fix it in patch v2. > > + pr_info("------------ %s end -----------\n", __func__); > > Looking into tons of these, perhaps you need to define pr_fmt(). I > haven't checked if it's already defined, though. > In the files related rtnetlink, pr_fmt() is not defined. I am not sure if I can define one since there are other error logs in the rtnetlink.c. In addition, there is "rtnl_lock" field in the log, I usually use it to retrieve whether rtnl_lock is held for a long time. If adding pr_fmt() is better, it is ok to me :-) > > +} > > ... > > > + if (rtnl_instance.end_time - rtnl_instance.start_time > 2000000000ULL) { > > Perhaps you should use one of the defined constants from time64.h ? > Will have a separate patch for this. > > + pr_info("rtnl_lock is held by [%d] from [%llu] to [%llu]\n", > > + rtnl_instance.pid, > > + rtnl_instance.start_time, > > + rtnl_instance.end_time); > > + } > Best Regard Rocco _______________________________________________ Linux-mediatek mailing list Linux-mediatek@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-mediatek
WARNING: multiple messages have this Message-ID (diff)
From: Rocco.Yue <rocco.yue@mediatek.com> To: Andy Shevchenko <andy.shevchenko@gmail.com> Cc: "David S . Miller" <davem@davemloft.net>, Jakub Kicinski <kuba@kernel.org>, Matthias Brugger <matthias.bgg@gmail.com>, Andrew Morton <akpm@linux-foundation.org>, Masahiro Yamada <masahiroy@kernel.org>, "Nick Desaulniers" <ndesaulniers@google.com>, Andy Shevchenko <andriy.shevchenko@linux.intel.com>, "Peter Zijlstra (Intel)" <peterz@infradead.org>, Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>, Peter Enderborg <peter.enderborg@sony.com>, Thomas Gleixner <tglx@linutronix.de>, Anshuman Khandual <anshuman.khandual@arm.com>, "Vitor Massaru Iha" <vitor@massaru.org>, Sedat Dilek <sedat.dilek@gmail.com>, "Wei Yang" <richard.weiyang@gmail.com>, Cong Wang <cong.wang@bytedance.com>, "Di Zhu" <zhudi21@huawei.com>, Stephen Hemminger <stephen@networkplumber.org>, Francis Laniel <laniel_francis@privacyrequired.com>, Roopa Prabhu <roopa@cumulusnetworks.com>, Andrii Nakryiko <andrii@kernel.org>, "Linux Kernel Mailing List" <linux-kernel@vger.kernel.org>, netdev <netdev@vger.kernel.org>, linux-arm Mailing List <linux-arm-kernel@lists.infradead.org>, "moderated list:ARM/Mediatek SoC support" <linux-mediatek@lists.infradead.org>, <wsd_upsream@mediatek.com> Subject: Re: [PATCH] rtnetlink: add rtnl_lock debug log Date: Sat, 8 May 2021 16:53:32 +0800 [thread overview] Message-ID: <1620464012.29475.76.camel@mbjsdccf07> (raw) In-Reply-To: <CAHp75Vc=GMv5dJ1dJYr=B3W6c+nuPCfXa4wxLJOYPTuqrMskFg@mail.gmail.com> > > We often encounter system hangs caused by certain processes > > holding rtnl_lock for a long time. Even if there is a lock > > detection mechanism in Linux, it is a bit troublesome and > > affects the system performance. We hope to add a lightweight > > debugging mechanism for detecting rtnl_lock. > > > > Up to now, we have discovered and solved some potential bugs > > through such debug information of this lightweight rtnl_lock, > > which is helpful for us. > > > > When you say Y for RTNL_LOCK_DEBUG, then the kernel will detect > > if any function hold rtnl_lock too long and some key information > > will be printed to help identify the issue point. > > > > i.e: from the following logs, we can clear know that the pid=5546 > > clearly > > > RfxSender_4 process hold rtnl_lock for a long time, causing the > > holds > > > system hang. And we can also speculate that the delay operation > > to hang > Thanks for your review, Andy. I Will fix them. > > may be performed in devinet_ioctl(), resulting in rtnl_lock was > > not released in time. > > > > <6>[ 141.151364] ----------- rtnl_print_btrace start ----------- > > Can you, please, shrink this to the point? > Will shrink these points. > > <6>[ 141.152079] RfxSender_4[5546][R] hold rtnl_lock more than 2 sec, > > start time: 139129481562 > > <4>[ 141.153114] rtnl_lock+0x88/0xfc > > <4>[ 141.153523] devinet_ioctl+0x190/0x1268 > > <4>[ 141.154007] inet_ioctl+0x108/0x1f4 > > <4>[ 141.154449] sock_do_ioctl+0x88/0x200 > > <4>[ 141.154911] sock_ioctl+0x4b0/0x884 > > <4>[ 141.155367] do_vfs_ioctl+0x6b0/0xcc4 > > <4>[ 141.155830] __arm64_sys_ioctl+0xc0/0xec > > <4>[ 141.156326] el0_svc_common+0x130/0x2c0 > > <4>[ 141.156810] el0_svc_handler+0xd0/0xe0 > > <4>[ 141.157283] el0_svc+0x8/0xc > > <4>[ 141.157646] Call trace: > > <4>[ 141.157956] dump_backtrace+0x0/0x240 > > <4>[ 141.158418] show_stack+0x18/0x24 > > <4>[ 141.158836] rtnl_print_btrace+0x138/0x1cc > > <4>[ 141.159362] call_timer_fn+0x120/0x47c > > <4>[ 141.159834] expire_timers+0x28c/0x420 > > <4>[ 141.160306] __run_timers+0x3d0/0x494 > > <4>[ 141.160768] run_timer_softirq+0x24/0x48 > > <4>[ 141.161262] __do_softirq+0x26c/0x968 > > <4>[ 141.161725] irq_exit+0x1f8/0x2b4 > > <4>[ 141.162145] __handle_domain_irq+0xdc/0x15c > > <4>[ 141.162672] gic_handle_irq+0xe4/0x188 > > <4>[ 141.163144] el1_irq+0x104/0x200 > > <4>[ 141.163559] __const_udelay+0x118/0x1b0 > > <4>[ 141.164044] devinet_ioctl+0x1a0/0x1268 > > <4>[ 141.164527] inet_ioctl+0x108/0x1f4 > > <4>[ 141.164968] sock_do_ioctl+0x88/0x200 > > <4>[ 141.165428] sock_ioctl+0x4b0/0x884 > > <4>[ 141.165868] do_vfs_ioctl+0x6b0/0xcc4 > > <4>[ 141.166330] __arm64_sys_ioctl+0xc0/0xec > > <4>[ 141.166825] el0_svc_common+0x130/0x2c0 > > <4>[ 141.167308] el0_svc_handler+0xd0/0xe0 > > <4>[ 141.167786] el0_svc+0x8/0xc > > <6>[ 141.168153] ------------ rtnl_print_btrace end ----------- > > > > <6>[ 147.321389] rtnl_lock is held by [5546] from > > [139129481562] to [147321378812] > > > ... > > > +static struct rtnl_debug_btrace_t rtnl_instance = { > > + .task = NULL, > > + .pid = 0, > > + .start_time = 0, > > + .end_time = 0, > > + .nr_entries = 0, > > static assumes all 0:s, what's the point? > will fix it in the patch v2. > > +}; > > ... > > > +static void rtnl_print_btrace(struct timer_list *unused) > > +{ > > + pr_info("----------- %s start -----------\n", __func__); > > + pr_info("%s[%d][%c] hold rtnl_lock more than 2 sec, start time: %llu\n", > > + rtnl_instance.task->comm, > > + rtnl_instance.pid, > > + task_state_to_char(rtnl_instance.task), > > + rtnl_instance.start_time); > > + stack_trace_print(rtnl_instance.addrs, rtnl_instance.nr_entries, 0); > > > + show_stack(rtnl_instance.task, NULL, KERN_DEBUG); > > Unaligned debug level. > KERN_INFO is expected, I will fix it in patch v2. > > + pr_info("------------ %s end -----------\n", __func__); > > Looking into tons of these, perhaps you need to define pr_fmt(). I > haven't checked if it's already defined, though. > In the files related rtnetlink, pr_fmt() is not defined. I am not sure if I can define one since there are other error logs in the rtnetlink.c. In addition, there is "rtnl_lock" field in the log, I usually use it to retrieve whether rtnl_lock is held for a long time. If adding pr_fmt() is better, it is ok to me :-) > > +} > > ... > > > + if (rtnl_instance.end_time - rtnl_instance.start_time > 2000000000ULL) { > > Perhaps you should use one of the defined constants from time64.h ? > Will have a separate patch for this. > > + pr_info("rtnl_lock is held by [%d] from [%llu] to [%llu]\n", > > + rtnl_instance.pid, > > + rtnl_instance.start_time, > > + rtnl_instance.end_time); > > + } > Best Regard Rocco _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2021-05-08 9:06 UTC|newest] Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top 2021-04-29 7:02 [PATCH] rtnetlink: add rtnl_lock debug log Rocco Yue 2021-04-29 7:02 ` Rocco Yue 2021-04-29 7:02 ` Rocco Yue 2021-04-29 9:32 ` Andy Shevchenko 2021-04-29 9:32 ` Andy Shevchenko 2021-04-29 9:32 ` Andy Shevchenko 2021-05-08 8:53 ` Rocco.Yue [this message] 2021-05-08 8:53 ` Rocco.Yue 2021-04-29 12:07 ` kernel test robot 2021-04-29 12:07 ` kernel test robot 2021-04-30 11:48 ` Tetsuo Handa 2021-04-30 11:48 ` Tetsuo Handa 2021-04-30 11:48 ` Tetsuo Handa 2021-05-08 8:54 ` Rocco.Yue 2021-05-08 8:54 ` Rocco.Yue
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=1620464012.29475.76.camel@mbjsdccf07 \ --to=rocco.yue@mediatek.com \ --cc=akpm@linux-foundation.org \ --cc=andrii@kernel.org \ --cc=andriy.shevchenko@linux.intel.com \ --cc=andy.shevchenko@gmail.com \ --cc=anshuman.khandual@arm.com \ --cc=cong.wang@bytedance.com \ --cc=davem@davemloft.net \ --cc=kuba@kernel.org \ --cc=laniel_francis@privacyrequired.com \ --cc=linux-arm-kernel@lists.infradead.org \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-mediatek@lists.infradead.org \ --cc=masahiroy@kernel.org \ --cc=matthias.bgg@gmail.com \ --cc=ndesaulniers@google.com \ --cc=netdev@vger.kernel.org \ --cc=penguin-kernel@i-love.sakura.ne.jp \ --cc=peter.enderborg@sony.com \ --cc=peterz@infradead.org \ --cc=richard.weiyang@gmail.com \ --cc=roopa@cumulusnetworks.com \ --cc=sedat.dilek@gmail.com \ --cc=stephen@networkplumber.org \ --cc=tglx@linutronix.de \ --cc=vitor@massaru.org \ --cc=wsd_upsream@mediatek.com \ --cc=zhudi21@huawei.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.