* Re: 4.4.103 linux kernel regression
       [not found] <CAKRBrgF3206o=0nwR63A2JqeRP82ZU9WZ3U9L=zR_dAjy3tL1g@mail.gmail.com>
@ 2017-12-23 13:52 ` Greg KH
  2017-12-23 16:36   ` Konstantin Khlebnikov
  0 siblings, 1 reply; 3+ messages in thread
From: Greg KH @ 2017-12-23 13:52 UTC (permalink / raw)
  To: Mathias Tillman
  Cc: netdev, stable, xiyou.wangcong, dsahern, jeffy.chen, davem, khlebnikov

adding stable@ and netdev@

On Sat, Dec 23, 2017 at 10:49:27AM +0000, Mathias Tillman wrote:
> Hi, I wanted to make you aware of a recent regression to the Linux kernel
> introduced with commit 2417da3f4d6bc4fc6c77f613f0e2264090892aa5:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/net/ipv6?h=linux-4.4.y&id=2417da3f4d6bc4fc6c77f613f0e2264090892aa5

Is this issue also present in Linus's tree?

> I have reported it here:
> https://bugzilla.kernel.org/show_bug.cgi?id=198189

Bugzilla doesn't work for networking bugs, nor stable stuff, just for a
few subsystems, sorry.

> Basically, that commit causes an endless loop if, for some reason, not all
> devices are unregistered in the rollback_registered_many function in
> net/core/dev.c
> 
> Decided to contact you directly since I have yet to receive any reply on
> the bug report, and I wasn't entirely sure what the procedure was. Please
> do let me know if I have to change anything in the report.

I can revert it, but it would be good to verify if this is an issue in
the latest releases or not first.

thanks,

greg k-h


* Re: 4.4.103 linux kernel regression
  2017-12-23 13:52 ` 4.4.103 linux kernel regression Greg KH
@ 2017-12-23 16:36   ` Konstantin Khlebnikov
       [not found]     ` <CAKRBrgFT0_U=T4VhiSw69k8cMr-v+65gKbMpG4gh7=7ddNiFVg@mail.gmail.com>
  0 siblings, 1 reply; 3+ messages in thread
From: Konstantin Khlebnikov @ 2017-12-23 16:36 UTC (permalink / raw)
  To: Greg KH, Mathias Tillman
  Cc: netdev, stable, xiyou.wangcong, dsahern, jeffy.chen, davem

[-- Attachment #1: Type: text/plain, Size: 1493 bytes --]

On 23.12.2017 16:52, Greg KH wrote:
> adding stable@ and netdev@
> 
> On Sat, Dec 23, 2017 at 10:49:27AM +0000, Mathias Tillman wrote:
>> Hi, I wanted to make you aware of a recent regression to the Linux kernel
>> introduced with commit 2417da3f4d6bc4fc6c77f613f0e2264090892aa5:
>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/net/ipv6?h=linux-4.4.y&id=2417da3f4d6bc4fc6c77f613f0e2264090892aa5
> 
> Is this issue also present in Linus's tree?
> 
>> I have reported it here:
>> https://bugzilla.kernel.org/show_bug.cgi?id=198189
> 
> Bugzilla doesn't work for networking bugs, nor stable stuff, just for a
> few subsystems, sorry.
> 
>> Basically, that commit causes an endless loop if, for some reason, not all
>> devices are unregistered in the rollback_registered_many function in
>> net/core/dev.c
>>
>> Decided to contact you directly since I have yet to receive any reply on
>> the bug report, and I wasn't entirely sure what the procedure was. Please
>> do let me know if I have to change anything in the report.
> 
> I can revert it, but it would be good to verify if this is an issue in
> the latest releases or not first.

Most likely the bug fixed by that commit was hiding a refcount leak on the
loopback device.

Mathias, please try the debug patch in the attachment.
It logs all refcount changes for the loopback device in non-host net namespaces.
Hopefully the log will be tiny and show what is missing.

It looks like vsftpd creates and destroys an empty net-ns, like "unshare -n true".
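
For anyone without vsftpd at hand, the "unshare -n true" pattern can be
sketched in C. This is an illustration only, not something posted in the
thread: the helper name create_and_destroy_netns() is hypothetical, and
whether it triggers the hang on an affected kernel is an untested assumption.
It needs CAP_SYS_ADMIN (typically root).

```c
#include <errno.h>
#include <linux/sched.h>   /* CLONE_NEWNET */
#include <sys/syscall.h>   /* SYS_unshare */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): fork a child that moves into a
 * fresh, empty network namespace and exits immediately. Once the child is
 * gone nothing references the namespace, so the kernel tears it down along
 * with its loopback device.
 *
 * Returns 0 on success, the child's errno (> 0) if unshare failed, or -1 if
 * fork/waitpid failed or the child was killed by a signal.
 */
static int create_and_destroy_netns(void)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Raw syscall; same effect as unshare(CLONE_NEWNET). */
		if (syscall(SYS_unshare, CLONE_NEWNET) != 0)
			_exit(errno);	/* errno values here fit in 8 bits */
		_exit(0);
	}

	if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
		return -1;

	return WEXITSTATUS(status);
}
```

Calling the helper once from a root shell and then watching dmesg should be
roughly equivalent to running "unshare -n true". A return value of EPERM
means the caller lacks the required capability.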

[-- Attachment #2: net-debug-lo-refcnt --]
[-- Type: text/plain, Size: 1892 bytes --]

net: debug lo refcnt

From: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
 include/linux/netdevice.h |    4 ++++
 net/core/dev.c            |   14 ++++++++++++++
 2 files changed, 18 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 310e729c47a4..b483b0eb22e7 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3141,6 +3141,8 @@ extern int		netdev_budget;
 /* Called by rtnetlink.c:rtnl_unlock() */
 void netdev_run_todo(void);
 
+void netdev_refcnt_log(const struct net_device *dev, char op);
+
 /**
  *	dev_put - release reference to device
  *	@dev: network device
@@ -3150,6 +3152,7 @@ void netdev_run_todo(void);
 static inline void dev_put(struct net_device *dev)
 {
 	this_cpu_dec(*dev->pcpu_refcnt);
+	netdev_refcnt_log(dev, '-');
 }
 
 /**
@@ -3161,6 +3164,7 @@ static inline void dev_put(struct net_device *dev)
 static inline void dev_hold(struct net_device *dev)
 {
 	this_cpu_inc(*dev->pcpu_refcnt);
+	netdev_refcnt_log(dev, '+');
 }
 
 /* Carrier loss detection, dial on demand. The functions netif_carrier_on
diff --git a/net/core/dev.c b/net/core/dev.c
index 2e47d40388fc..d56c834140c8 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6957,6 +6957,20 @@ int netdev_refcnt_read(const struct net_device *dev)
 }
 EXPORT_SYMBOL(netdev_refcnt_read);
 
+void netdev_refcnt_log(const struct net_device *dev, char op)
+{
+	static DEFINE_SPINLOCK(lock);
+	unsigned long flags;
+
+	if ((dev->flags & IFF_LOOPBACK) && !net_eq(dev_net(dev), &init_net)) {
+		spin_lock_irqsave(&lock, flags);
+		printk("%c %p %d\n", op, dev, netdev_refcnt_read(dev));
+		dump_stack();
+		spin_unlock_irqrestore(&lock, flags);
+	}
+}
+EXPORT_SYMBOL(netdev_refcnt_log);
+
 /**
  * netdev_wait_allrefs - wait until all references are gone.
  * @dev: target net_device
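
The patch above prints one "%c %p %d" record per dev_hold()/dev_put(),
followed by a stack dump. A small userspace helper can tally those records per
device to see whether holds and puts balance. The following is a minimal
sketch and not part of the posted patch: the names dev_tally and tally_line()
are hypothetical, and the optional "[ seconds ]" timestamp prefix is an
assumption about how the log is captured.

```c
#include <stdio.h>
#include <string.h>

#define MAX_DEVS 16

struct dev_tally {
	char ptr[32];		/* device pointer text as printed by %p */
	int holds;		/* number of '+' records (dev_hold) */
	int puts;		/* number of '-' records (dev_put) */
	int last_refcnt;	/* refcount value from the latest record */
};

/* Skip an optional leading "[...]" timestamp and any spaces after it. */
static const char *skip_timestamp(const char *line)
{
	if (line[0] == '[') {
		const char *end = strchr(line, ']');

		if (end)
			line = end + 1;
	}
	while (*line == ' ')
		line++;
	return line;
}

/*
 * Parse one log line. Returns 1 if it was a refcount record and was counted,
 * 0 for anything else (stack dump lines, unrelated messages, table full).
 */
static int tally_line(struct dev_tally *devs, int *ndevs, const char *line)
{
	char op, ptr[32];
	int refcnt, i;

	line = skip_timestamp(line);
	if ((line[0] != '+' && line[0] != '-') || line[1] != ' ')
		return 0;
	if (sscanf(line, "%c %31s %d", &op, ptr, &refcnt) != 3)
		return 0;

	/* Find the entry for this device pointer, or create one. */
	for (i = 0; i < *ndevs; i++)
		if (strcmp(devs[i].ptr, ptr) == 0)
			break;
	if (i == *ndevs) {
		if (*ndevs == MAX_DEVS)
			return 0;
		memset(&devs[i], 0, sizeof(devs[i]));
		strcpy(devs[i].ptr, ptr);
		(*ndevs)++;
	}

	if (op == '+')
		devs[i].holds++;
	else
		devs[i].puts++;
	devs[i].last_refcnt = refcnt;
	return 1;
}
```

Feeding each line of the captured log to tally_line() and then comparing
holds against puts for every entry gives a first hint of which device keeps a
reference. The stack dumps next to the unmatched '+' records are what would
point at the code path responsible.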


* Re: 4.4.103 linux kernel regression
       [not found]     ` <CAKRBrgFT0_U=T4VhiSw69k8cMr-v+65gKbMpG4gh7=7ddNiFVg@mail.gmail.com>
@ 2017-12-24  9:25       ` Konstantin Khlebnikov
  0 siblings, 0 replies; 3+ messages in thread
From: Konstantin Khlebnikov @ 2017-12-24  9:25 UTC (permalink / raw)
  To: Mathias Tillman
  Cc: Greg KH, netdev, stable, xiyou.wangcong, dsahern, jeffy.chen, davem

On 23.12.2017 21:10, Mathias Tillman wrote:
> Thank you, I will test that patch and see if I can find anything
> interesting in the log. Will have to be some time later next week due
> to the holidays, but I will get back to you with the results.

Ok, I'll be waiting.

Could you also share your kernel config and lsmod output?

> 
> What commit are you referring to exactly? I can test it to see if it's fixed.

The commit that was added in v4.4.103: upstream commit 76da0704507bbc51875013f6557877ab308cfd0a.

> 
> Also, I should mention that it's not just vsftpd it causes problems
> with - some other people have reported problems with starting and
> stopping lxc containers. I don't use those myself so I can't really
> comment on that, but it does seem to have been fixed by reverting the
> commit I mentioned.

Yes. This is a common problem for all network namespaces.
The bug is somewhere else and requires a particular configuration.

> 
> Greg: Can't say if the problem exists on master or not - I'm really
> only able to reproduce it on the Turris Omnia router as I said in the
> bug report. It's based on openwrt and requires some device-specific
> patches to function properly, so I'm not sure it would work on the
> latest - but I can give it a try.
> 
> Regards
> Mathias
