From: Andrew Morton <akpm@linux-foundation.org>
To: Jarek Poplawski <jarkao2@o2.pl>
Cc: Folkert van Heusden <folkert@vanheusden.com>,
linux-kernel@vger.kernel.org,
Jason Wessel <jason.wessel@windriver.com>,
Thomas Gleixner <tglx@linutronix.de>,
stable@kernel.org, netdev@vger.kernel.org
Subject: Re: [PATCH] Re: [2.6.21.1] soft lockup when removing netconsole module
Date: Tue, 26 Jun 2007 16:07:53 -0700
Message-ID: <20070626160753.f86d8803.akpm@linux-foundation.org>
In-Reply-To: <20070613092537.GA2432@ff.dom.local>
On Wed, 13 Jun 2007 11:25:37 +0200
Jarek Poplawski <jarkao2@o2.pl> wrote:
> On Tue, Jun 12, 2007 at 01:02:33PM +0200, Jarek Poplawski wrote:
> ...
> > Of course such a problem should preferably be fixed by somebody who
> > knows the code (alas I don't know netconsole), to be sure all needed
> > cancels are still done after this change. I hope Jason's patch is
> > right but I'm a little surprised I can't see netdev in cc (I'll try
> > to fix this).
>
> So, I've had a look into netpoll and, unfortunately, I don't
> think this patch is right...
>
> > > From: Jason Wessel <jason.wessel@windriver.com>
> > >
> > > Do not call cancel_rearming_delayed_work() if there is no
> > > pending work.
> > >
> > > Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
> > > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> > > ---
> > >
> > > net/core/netpoll.c | 6 ++++--
> > > 1 file changed, 4 insertions(+), 2 deletions(-)
> > >
> > > diff -puN net/core/netpoll.c~a net/core/netpoll.c
> > > --- a/net/core/netpoll.c~a
> > > +++ a/net/core/netpoll.c
> > > @@ -784,8 +784,10 @@ void netpoll_cleanup(struct netpoll *np)
> > > >  			if (atomic_dec_and_test(&npinfo->refcnt)) {
> > > >  				skb_queue_purge(&npinfo->arp_tx);
> > > >  				skb_queue_purge(&npinfo->txq);
> > > > -				cancel_rearming_delayed_work(&npinfo->tx_work);
> > > > -				flush_scheduled_work();
> > > > +				if (delayed_work_pending(&npinfo->tx_work)) {
> > > > +					cancel_rearming_delayed_work(&npinfo->tx_work);
> > > > +					flush_scheduled_work();
> > > > +				}
> > > >
> > > >  				kfree(npinfo);
> > > >  			}
> > > _
>
> There are two possibilities:
>
> 1. After a positive delayed_work_pending(&npinfo->tx_work) test
> some work is queued, but there is no guarantee that it will rearm
> when it runs, so cancel_rearming_delayed_work() can still loop;
>
> 2. After a negative delayed_work_pending(&npinfo->tx_work) test
> a work may still be running, e.g. waiting on netif_tx_lock, while
> kfree(npinfo) is done here (oops?!).
>
> I've found an additional problem here, with or without this patch:
> after the timer is deleted in cancel_rearming_delayed_work(), a last
> skb could stay queued in npinfo->txq, and after kfree(npinfo)
> we have a small memory leak. If I'm right, a similar fix is needed
> in the current netpoll code: either just an additional npinfo->txq
> purge, or the whole cancel_rearming_ call changed as below.
>
> I've tried to eliminate these problems in the patch proposal
> attached below. I'm not sure it's all right: as I wrote earlier, I
> don't know netconsole well enough, but it's probably a little better
> than the above solution.
>
> I still have some doubts (I didn't have time to check all of this):
>
> 1. I hope the other schedule_delayed_work(), from netpoll_send_skb(),
> cannot happen while netpoll_cleanup() runs - if I'm wrong, an additional
> check of npinfo->refcnt should be done there;
> 2. I also hope checking npinfo->refcnt before scheduling is enough here
> - if not, another possibility is adding some locking, e.g.
> netif_tx_lock before the cancel, for synchronization.
>
> Of course it would be very nice if somebody could test or verify
> this patch more.
>
> Regards,
> Jarek P.
>
>
> Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
>
> ---
>
> diff -Nurp 2.6.21-/net/core/netpoll.c 2.6.21/net/core/netpoll.c
> --- 2.6.21-/net/core/netpoll.c 2007-04-26 15:08:32.000000000 +0200
> +++ 2.6.21/net/core/netpoll.c 2007-06-12 21:05:23.000000000 +0200
> @@ -73,7 +73,8 @@ static void queue_process(struct work_st
>  			netif_tx_unlock(dev);
>  			local_irq_restore(flags);
>
> -			schedule_delayed_work(&npinfo->tx_work, HZ/10);
> +			if (atomic_read(&npinfo->refcnt))
> +				schedule_delayed_work(&npinfo->tx_work, HZ/10);
>  			return;
>  		}
>  		netif_tx_unlock(dev);
> @@ -780,9 +781,15 @@ void netpoll_cleanup(struct netpoll *np)
>  			if (atomic_dec_and_test(&npinfo->refcnt)) {
>  				skb_queue_purge(&npinfo->arp_tx);
>  				skb_queue_purge(&npinfo->txq);
> -				cancel_rearming_delayed_work(&npinfo->tx_work);
> +				cancel_delayed_work(&npinfo->tx_work);
>  				flush_scheduled_work();
>
> +				/* clean after last, unfinished work */
> +				if (!skb_queue_empty(&npinfo->txq)) {
> +					struct sk_buff *skb;
> +					skb = __skb_dequeue(&npinfo->txq);
> +					kfree_skb(skb);
> +				}
>  				kfree(npinfo);
>  			}
>  		}
Everything went quiet?

If this patch has been tested and fixes the bug, can you please send a
version which is ready for merging? (i.e., add a suitable description of
what it does).

Thanks.
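
For readers following the thread, the teardown ordering that Jarek's patch
proposes can be sketched in plain user-space C. This is only a
single-threaded model under stated assumptions, not kernel code: the
struct fake_npinfo type, fake_queue_process(), fake_netpoll_cleanup() and
the requeued_by_running_work parameter are all hypothetical stand-ins
invented for illustration, and real concurrency (the part the thread is
still unsure about) is not modelled. It shows the three steps: the worker
rearms only while refcnt is non-zero, cleanup cancels and flushes, and a
final drain frees anything the last work run put back on txq.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct netpoll_info (illustration only). */
struct fake_npinfo {
	int refcnt;        /* models atomic_t refcnt */
	int txq_len;       /* number of skbs sitting in txq */
	bool work_pending; /* models tx_work being armed */
	bool dev_busy;     /* device cannot transmit right now */
};

/* Models queue_process(): send what we can; on failure leave the skb
 * queued and rearm only if the structure is still referenced. */
static void fake_queue_process(struct fake_npinfo *ni)
{
	ni->work_pending = false;
	while (ni->txq_len > 0) {
		if (ni->dev_busy) {
			if (ni->refcnt > 0)
				ni->work_pending = true; /* rearm */
			return;
		}
		ni->txq_len--; /* transmitted */
	}
}

/* Models the proposed netpoll_cleanup() ordering.
 * requeued_by_running_work is an assumption of this sketch: the number
 * of skbs a work item that was already running puts back after the purge.
 * Returns how many skbs the final drain had to free. */
static int fake_netpoll_cleanup(struct fake_npinfo *ni,
				int requeued_by_running_work)
{
	int drained;

	if (--ni->refcnt > 0)
		return 0;                 /* not the last user */

	ni->txq_len = 0;                  /* skb_queue_purge(&npinfo->txq) */
	ni->work_pending = false;         /* cancel_delayed_work() */

	/* flush_scheduled_work(): let the last work run to completion */
	ni->txq_len += requeued_by_running_work;
	fake_queue_process(ni);

	/* clean after last, unfinished work */
	drained = ni->txq_len;
	ni->txq_len = 0;
	return drained;
}
```

In this model, with the device busy and one skb requeued by a running
work item, the work does not rearm (refcnt is already zero) and the final
drain frees that one skb, which is the leak described in the quoted mail.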