linux-kernel.vger.kernel.org archive mirror
* PROBLEM: Old content of /proc/net after switching network namespace
@ 2019-01-04  8:21 Mateusz Stępień
  2019-01-04 11:09 ` Alexey Dobriyan
From: Mateusz Stępień @ 2019-01-04  8:21 UTC
  To: adobriyan; +Cc: linux-fsdevel, linux-kernel

Hello everyone,

After changing the network namespace with setns(), the content of /proc/net 
still represents the original namespace.
It looks like procfs dentries are not properly invalidated in the dcache 
after the namespace switch.
This happens only if the content of /proc/net was read before changing 
the namespace.
The problem is reproducible in 4.19.13 but not in 4.14.X.
Bisecting the stable kernel tree shows that the commit
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=1da4d377f943fe4194ffb9fb9c26cc58fad4dd24 
introduced the problem.
Reverting that commit resolves it.

MCVE (slightly modified example from [man 2 setns]):

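#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>

#define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \
                         } while (0)

/* Print the first chunk of /proc/net/dev as seen by this process. */
static void print_dev(void)
{
     char buf[2048] = {0};
     int fd2;

     fd2 = open("/proc/net/dev", O_RDONLY);
     if (fd2 == -1)
         errExit("open /proc/net/dev");

     /* Read one byte less than the buffer so it stays NUL-terminated */
     if (read(fd2, buf, sizeof(buf) - 1) == -1)
         errExit("read");

     printf("%s", buf);
     close(fd2);
}

int
main(int argc, char *argv[])
{
     int fd;

     if (argc < 2) {
         fprintf(stderr, "usage: %s /proc/PID/ns/net\n", argv[0]);
         exit(EXIT_FAILURE);
     }

     printf("before namespace switch =========\n");
     print_dev();
     fd = open(argv[1], O_RDONLY); /* Get file descriptor for namespace */
     if (fd == -1)
         errExit("open");

     if (setns(fd, 0) == -1)       /* Join that namespace */
         errExit("setns");
     close(fd);                    /* Not needed once we have joined */

     printf("after namespace switch ++++++++++\n");
     print_dev();
     return 0;
}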
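
A failed or ineffective setns() would produce the same symptom, so it can help to confirm the switch independently of /proc/net. namespaces(7) notes that two /proc/[pid]/ns/net links refer to the same namespace when their st_dev and st_ino match. A minimal sketch under that assumption; same_object() is a hypothetical helper name and is not part of the reproducer above:

```c
#include <sys/stat.h>

/*
 * Return 1 if both paths refer to the same object (same device and
 * inode), 0 if they differ, -1 if either stat() call fails.
 * same_object() is a hypothetical helper, not an existing API.
 */
int same_object(const char *a, const char *b)
{
     struct stat sa, sb;

     if (stat(a, &sa) == -1 || stat(b, &sb) == -1)
         return -1;
     return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Called as same_object("/proc/self/ns/net", argv[1]) after setns(), a result of 1 would indicate that the process really is in the target namespace, so any old content in /proc/net/dev comes from procfs rather than from a failed switch.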


Steps to reproduce (assuming we have an interface named enp0s9):

ip netns add test
ip link set dev enp0s9 netns test
ip netns exec test sleep 30 &
gcc -o mcve mcve.c # mcve.c contains above C code
./mcve /proc/$(pidof sleep)/ns/net
# before namespace switch =========
# Inter-|   Receive                                                |  Transmit
#  face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
#     lo:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
#    br0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
# enp0s3:  149625    1117    0    0    0     0          0         1    61664     485    0    0    0     0       0          0
# docker0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
# enp0s8:   17086      60    0    0    0     0          0        29     1006      13    0    0    0     0       0          0
# after namespace switch ++++++++++
# Inter-|   Receive                                                |  Transmit
#  face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
#     lo:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
#    br0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
# enp0s3:  150813    1135    0    0    0     0          0         1    64348     503    0    0    0     0       0          0
# docker0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
# enp0s8:   17086      60    0    0    0     0          0        29     1006      13    0    0    0     0       0          0
ip netns exec test cat /proc/net/dev
## Should display
# Inter-|   Receive                                                |  Transmit
#  face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
#     lo:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
# enp0s9:   10438      33    0    0    0     0          0        17      936      12    0    0    0     0       0          0
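
Comparing the two dumps by eye is error-prone because the counters keep changing, while the interface names are what actually distinguish the namespaces. A minimal sketch that checks a /proc/net/dev buffer for a given interface; has_iface() is a hypothetical helper, and it assumes the row layout shown above (optional leading blanks, then the name followed by ':'):

```c
#include <string.h>

/*
 * Return 1 if the /proc/net/dev text in buf has a row for interface
 * name, 0 otherwise. The two header rows never match because the name
 * must be followed directly by ':'.
 * has_iface() is a hypothetical helper, not an existing API.
 */
int has_iface(const char *buf, const char *name)
{
     size_t len = strlen(name);
     const char *line = buf;

     while (line && *line) {
         /* Skip the padding in front of the interface name */
         const char *p = line + strspn(line, " \t");

         if (strncmp(p, name, len) == 0 && p[len] == ':')
             return 1;

         /* Advance to the start of the next row, if any */
         line = strchr(line, '\n');
         if (line)
             line++;
     }
     return 0;
}
```

With the setup above, the buffer read after setns() would be expected to contain enp0s9 and not enp0s3 on an unaffected kernel; the reported behaviour corresponds to the opposite result.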

Output from awk -f scripts/ver_linux:

Linux test-agent 4.19.13 #1 SMP PREEMPT Thu Jan 3 12:03:20 UTC 2019 x86_64 GNU/Linux

Util-linux          	2.29.2
Mount               	2.29.2
Module-init-tools   	23
E2fsprogs           	1.43.4
Linux C Library     	2.24
Dynamic linker (ldd)	2.24
Linux C++ Library   	6.0.22
Procps              	3.3.12
Net-tools           	2.10
Sh-utils            	8.26
Udev                	232
Modules Loaded      	ahci ata_generic ata_piix crc32c_intel e1000 ehci_hcd ehci_pci i2c_core i2c_piix4 libahci serio_raw usb_common usbcore


* Re: PROBLEM: Old content of /proc/net after switching network namespace
  2019-01-04  8:21 PROBLEM: Old content of /proc/net after switching network namespace Mateusz Stępień
@ 2019-01-04 11:09 ` Alexey Dobriyan
  2019-01-07  8:54   ` Mateusz Stępień
From: Alexey Dobriyan @ 2019-01-04 11:09 UTC
  To: Mateusz Stępień; +Cc: linux-fsdevel, linux-kernel

On Fri, Jan 04, 2019 at 09:21:23AM +0100, Mateusz Stępień wrote:
> After changing the network namespace with setns(), the content of /proc/net 
> still represents the original namespace.
> It looks like procfs dentries are not properly invalidated in the dcache 
> after the namespace switch.
> This happens only if the content of /proc/net was read before changing 
> the namespace.
> The problem is reproducible in 4.19.13 but not in 4.14.X.
> Bisecting the stable kernel tree shows that the commit
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=1da4d377f943fe4194ffb9fb9c26cc58fad4dd24 
> introduced the problem.
> Reverting that commit resolves it.
> 
> MCVE (slightly modified example from [man 2 setns]):

	open /proc/net/dev
	read
	close

	setns

	open /proc/net/dev
	read
	close

This bug was discussed recently and Al even posted how to fix it
properly.

Try this horror patch meanwhile:

--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -219,19 +219,30 @@ void proc_free_inum(unsigned int inum)
 	ida_simple_remove(&proc_inum_ida, inum - PROC_DYNAMIC_FIRST);
 }
 
+static bool net_root(const struct proc_dir_entry *pde)
+{
+	return pde->parent == &proc_root && pde->mode == 0;
+}
+
 static int proc_misc_d_revalidate(struct dentry *dentry, unsigned int flags)
 {
+	const struct proc_dir_entry *pde;
+
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
 
-	if (atomic_read(&PDE(d_inode(dentry))->in_use) < 0)
+	pde = PDE(d_inode(dentry));
+
+	if (atomic_read(&pde->in_use) < 0 || net_root(pde))
 		return 0; /* revalidate */
 	return 1;
 }
 
 static int proc_misc_d_delete(const struct dentry *dentry)
 {
-	return atomic_read(&PDE(d_inode(dentry))->in_use) < 0;
+	const struct proc_dir_entry *pde = PDE(d_inode(dentry));
+
+	return atomic_read(&pde->in_use) < 0 || net_root(pde);
 }
 
 static const struct dentry_operations proc_misc_dentry_ops = {
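
The decision the patch adds to d_revalidate can be tried out in userspace. This is only a sketch of the logic in the diff above, not kernel code: struct model_pde, model_proc_root and MODEL_LOOKUP_RCU are hypothetical stand-ins that keep just the fields and the flag the patch reads, and in_use is a plain int rather than an atomic_t.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's LOOKUP_RCU flag (value arbitrary) */
#define MODEL_LOOKUP_RCU 0x1u

/* Hypothetical model of struct proc_dir_entry: only what the patch reads */
struct model_pde {
     const struct model_pde *parent;
     unsigned int mode;
     int in_use;               /* negative once the entry is being removed */
};

/* Stand-in for proc_root: its own parent, with a directory mode */
const struct model_pde model_proc_root = { &model_proc_root, 0040555, 0 };

/* Same test as net_root() in the patch: child of the root with mode 0 */
static bool model_net_root(const struct model_pde *pde)
{
     return pde->parent == &model_proc_root && pde->mode == 0;
}

/*
 * 1 = the cached dentry may be reused, 0 = drop it and look up again,
 * -ECHILD = cannot decide in RCU-walk mode.
 */
int model_d_revalidate(const struct model_pde *pde, unsigned int flags)
{
     if (flags & MODEL_LOOKUP_RCU)
         return -ECHILD;
     if (pde->in_use < 0 || model_net_root(pde))
         return 0;
     return 1;
}
```

In this model an entry that matches net_root() is never served from the cache, an ordinary live entry is, and an entry under removal is dropped as before the patch.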


* Re: PROBLEM: Old content of /proc/net after switching network namespace
  2019-01-04 11:09 ` Alexey Dobriyan
@ 2019-01-07  8:54   ` Mateusz Stępień
  2019-01-07 12:39     ` Alexey Dobriyan
From: Mateusz Stępień @ 2019-01-07  8:54 UTC
  To: Alexey Dobriyan; +Cc: linux-fsdevel, linux-kernel

Thank you for the quick response. Unfortunately the patch does not work; 
the problem still persists.
I saw the other mail thread, and it looks like the proper solution 
described by Al will take some time. I don't know if we can afford to 
write a proper patch ourselves right now. My understanding is 
that this is a regression, so my question is: can we expect a proper 
patch in the coming weeks? If not, will reverting the offending commit have 
any negative side effects? Of course it will bring back the 
behaviour it was supposed to fix, but are there any other issues?

Thanks,
Mateusz


* Re: PROBLEM: Old content of /proc/net after switching network namespace
  2019-01-07  8:54   ` Mateusz Stępień
@ 2019-01-07 12:39     ` Alexey Dobriyan
From: Alexey Dobriyan @ 2019-01-07 12:39 UTC
  To: Mateusz Stępień; +Cc: linux-fsdevel, linux-kernel

On Mon, Jan 07, 2019 at 09:54:26AM +0100, Mateusz Stępień wrote:
> Thank you for the quick response. Unfortunately the patch does not work; 
> the problem still persists.

OK, I reproduced the bug.

> I saw the other mail thread, and it looks like the proper solution 
> described by Al will take some time. I don't know if we can afford to 
> write a proper patch ourselves right now. My understanding is 
> that this is a regression, so my question is: can we expect a proper 
> patch in the coming weeks? If not, will reverting the offending commit have 
> any negative side effects? Of course it will bring back the 
> behaviour it was supposed to fix, but are there any other issues?

Other issues, no.

