netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dongliang Mu <mudongliangabcd@gmail.com>
To: Pavel Skripkin <paskripkin@gmail.com>
Cc: alex.aring@gmail.com, "David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-wpan@vger.kernel.org, netdev@vger.kernel.org,
	stefan@datenfreihafen.org,
	syzkaller-bugs <syzkaller-bugs@googlegroups.com>,
	syzbot+b80c9959009a9325cdff@syzkaller.appspotmail.com,
	Dan Carpenter <dan.carpenter@oracle.com>,
	Greg KH <gregkh@linuxfoundation.org>
Subject: Re: Suggestions on how to debug kernel crashes where printk and gdb both does not work
Date: Tue, 15 Jun 2021 18:37:14 +0800	[thread overview]
Message-ID: <CAD-N9QWj7LpdJvDy7r2+WCeFKw2P7DFos=88186-h3GFZPKAvw@mail.gmail.com> (raw)
In-Reply-To: <20210614233011.79ebe38a@gmail.com>

On Tue, Jun 15, 2021 at 4:30 AM Pavel Skripkin <paskripkin@gmail.com> wrote:
>
> On Mon, 14 Jun 2021 23:04:03 +0800
> Dongliang Mu <mudongliangabcd@gmail.com> wrote:
>
> > On Mon, Jun 14, 2021 at 10:47 PM Pavel Skripkin
> > <paskripkin@gmail.com> wrote:
> > >
> > > On Mon, 14 Jun 2021 22:40:55 +0800
> > > Dongliang Mu <mudongliangabcd@gmail.com> wrote:
> > >
> > > > On Mon, Jun 14, 2021 at 10:25 PM Pavel Skripkin
> > > > <paskripkin@gmail.com> wrote:
> > > > >
> > > > > On Mon, 14 Jun 2021 22:19:10 +0800
> > > > > Dongliang Mu <mudongliangabcd@gmail.com> wrote:
> > > > >
> > > > > > On Mon, Jun 14, 2021 at 9:34 PM Pavel Skripkin
> > > > > > <paskripkin@gmail.com> wrote:
> > > > > > >
> > > > > > > On Mon, 14 Jun 2021 21:22:43 +0800
> > > > > > > Dongliang Mu <mudongliangabcd@gmail.com> wrote:
> > > > > > >
> > > > > > > > Dear kernel developers,
> > > > > > > >
> > > > > > > > I was trying to debug the crash - memory leak in
> > > > > > > > hwsim_add_one [1] recently. However, I encountered a
> > > > > > > > disgusting issue: my breakpoint and printk/pr_alert in the
> > > > > > > > functions that will be surely executed do not work. The
> > > > > > > > stack trace is in the following. I wrote this email to
> > > > > > > > ask for some suggestions on how to debug such cases?
> > > > > > > >
> > > > > > > > Thanks very much. Looking forward to your reply.
> > > > > > > >
> > > > > > >
> > > > > > > Hi, Dongliang!
> > > > > > >
> > > > > > > This bug is not similar to others on the dashboard. I spent
> > > > > > > some time debugging it a week ago. The main problem here,
> > > > > > > that memory allocation happens in the boot time:
> > > > > > >
> > > > > > > > [<ffffffff84359255>] kernel_init+0xc/0x1a7
> > > > > > > > init/main.c:1447
> > > > > > >
> > > > > >
> > > > > > Oh, nice catch. No wonder why my debugging does not work. :(
> > > > > >
> > > > > > > and reproducer simply tries to
> > > > > > > free this data. You can use ftrace to look at it. Smth like
> > > > > > > this:
> > > > > > >
> > > > > > > $ echo 'hwsim_*' > $TRACE_DIR/set_ftrace_filter
> > > > > >
> > > > > > Thanks for your suggestion.
> > > > > >
> > > > > > Do you have any conclusions about this case? If you have found
> > > > > > out the root cause and start writing patches, I will turn my
> > > > > > focus to other cases.
> > > > >
> > > > > No, I had some busy days and I have nothing about this bug for
> > > > > now. I've just traced the reproducer execution and that's all :)
> > > > >
> > > > > I guess, some error handling paths are broken, but Im not sure
> > > >
> > > > In the beginning, I agreed with you. However, after I manually
> > > > checked functions: hwsim_probe (initialization) and  hwsim_remove
> > > > (cleanup), then things may be different. The cleanup looks
> > > > correct to me. I would like to debug but stuck with the debugging
> > > > process.
> > > >
> > > > And there is another issue: the cleanup function also does not
> > > > output anything or hit the breakpoint. I don't quite understand
> > > > it since the cleanup is not at the boot time.
> > > >
> > > > Any idea?
> > > >
> > >
> > > Output from ftrace (syzkaller repro):
> > >
> > > root@syzkaller:~# cat /sys/kernel/tracing/trace
> > > # tracer: function_graph
> > > #
> > > # CPU  DURATION                  FUNCTION CALLS
> > > # |     |   |                     |   |   |   |
> > >  1)               |  hwsim_del_radio_nl() {
> > >  1)               |    hwsim_del() {
> > >  1)               |      hwsim_edge_unsubscribe_me() {
> > >  1) ! 310.041 us  |        hwsim_free_edge();
> > >  1) ! 665.221 us  |      }
> > >  1) * 52999.05 us |    }
> > >  1) * 53035.38 us |  }
> > >
> > > Cleanup function is not the case, I think :)
> >
> > It seems like I spot the incorrect cleanup function (hwsim_remove is
> > the right one is in my mind). Let me learn how to use ftrace to log
> > the executed functions and then discuss this case with you guys.
> >
>
> Hmmm, I think, there is a mess with lists.
>
> I just want to share my debug results, I have no idea about the fix for
> now.
>
> In hwsim_probe() edge for phy->idx = 1 is allocated, then reproduces
> sends a request to delete phy with idx == 0, so this check in
> hwsim_edge_unsubscribe_me():
>
>         if (e->endpoint->idx == phy->idx) {
>                 ... clean up code ...
>         }
>
> won't be passed and edge won't be freed (because it was allocated for
> phy with idx == 1). Allocated edge for phy 1 becomes leaked after
> hwsim_del(). I can't really see the code where phy with idx == 1 can
> be deleted from list...

Thanks for sharing your debugging result.

              hwsim_phys
                       |
   ---------------------------------
   |                                      |
sub0 (edges)                 sub1 (edges)
   ----> e (idx = 1)               ----> e (idx = 0)

hwsim_del_radio_nl will call hwsim_del to delete phy (idx:1).
However, in this function, it only deletes the e in the edge list of sub1.
Then it deletes phy (i.e., sub0) from the hwsim_phys list. So it
leaves the e in the edge list of sub0 non-free.

I proposed a patch and test it successfully in the syzbot dashboard.

diff --git a/drivers/net/ieee802154/mac802154_hwsim.c
b/drivers/net/ieee802154/mac802154_hwsim.c
index da9135231c07..b05159cff33a 100644
--- a/drivers/net/ieee802154/mac802154_hwsim.c
+++ b/drivers/net/ieee802154/mac802154_hwsim.c
@@ -824,9 +824,16 @@ static int hwsim_add_one(struct genl_info *info,
struct device *dev,
 static void hwsim_del(struct hwsim_phy *phy)
 {
  struct hwsim_pib *pib;
+ struct hwsim_edge *e;

  hwsim_edge_unsubscribe_me(phy);

+ // remove the edges in the list
+ list_for_each_entry_rcu(e, &phy->edges, list) {
+ list_del_rcu(&e->list);
+ hwsim_free_edge(e);
+ }
+
  list_del(&phy->list);

  rcu_read_lock();

 I will send a patch later.


>
> Maybe, it's kmemleak bug. Similar strange case was with this one
> https://syzkaller.appspot.com/bug?id=3a325b8389fc41c1bc94de0f4ac437ed13cce584.
> I find it strange, that I could reach leaked pointers after kmemleak reported a
> leak. Im not familiar with kmemleak internals and I might be wrong
>
>
> With regards,
> Pavel Skripkin

  reply	other threads:[~2021-06-15 10:37 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-14 13:22 Suggestions on how to debug kernel crashes where printk and gdb both does not work Dongliang Mu
2021-06-14 13:34 ` Pavel Skripkin
2021-06-14 14:19   ` Dongliang Mu
2021-06-14 14:25     ` Pavel Skripkin
2021-06-14 14:40       ` Dongliang Mu
2021-06-14 14:47         ` Pavel Skripkin
2021-06-14 15:04           ` Dongliang Mu
2021-06-14 20:30             ` Pavel Skripkin
2021-06-15 10:37               ` Dongliang Mu [this message]
2021-06-15 13:34                 ` Pavel Skripkin
2021-06-16  2:02                   ` Dongliang Mu
2021-06-16  2:11                     ` Dongliang Mu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAD-N9QWj7LpdJvDy7r2+WCeFKw2P7DFos=88186-h3GFZPKAvw@mail.gmail.com' \
    --to=mudongliangabcd@gmail.com \
    --cc=alex.aring@gmail.com \
    --cc=dan.carpenter@oracle.com \
    --cc=davem@davemloft.net \
    --cc=gregkh@linuxfoundation.org \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-wpan@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=paskripkin@gmail.com \
    --cc=stefan@datenfreihafen.org \
    --cc=syzbot+b80c9959009a9325cdff@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).