linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Christian Brauner <christian.brauner@canonical.com>
To: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	davem@davemloft.net, kuznet@ms2.inr.ac.ru,
	yoshfuji@linux-ipv6.org, pombredanne@nexb.com,
	kstewart@linuxfoundation.org, gregkh@linuxfoundation.org,
	dsahern@gmail.com, fw@strlen.de, lucien.xin@gmail.com,
	jakub.kicinski@netronome.com, jbenc@redhat.com,
	nicolas.dichtel@6wind.com
Subject: Re: [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR
Date: Mon, 3 Sep 2018 16:22:47 +0200	[thread overview]
Message-ID: <20180903142246.wvgucy57phpipy7h@gmail.com> (raw)
In-Reply-To: <2319a029-7aca-b7aa-2e8f-4dfdeedcb6df@virtuozzo.com>

On Mon, Sep 03, 2018 at 04:41:45PM +0300, Kirill Tkhai wrote:
> On 01.09.2018 04:34, Christian Brauner wrote:
> > On Thu, Aug 30, 2018 at 04:45:45PM +0200, Christian Brauner wrote:
> >> On Thu, Aug 30, 2018 at 11:49:31AM +0300, Kirill Tkhai wrote:
> >>> On 29.08.2018 21:13, Christian Brauner wrote:
> >>>> Hi Kirill,
> >>>>
> >>>> Thanks for the question!
> >>>>
> >>>> On Wed, Aug 29, 2018 at 11:30:37AM +0300, Kirill Tkhai wrote:
> >>>>> Hi, Christian,
> >>>>>
> >>>>> On 29.08.2018 02:18, Christian Brauner wrote:
> >>>>>> From: Christian Brauner <christian@brauner.io>
> >>>>>>
> >>>>>> Hey,
> >>>>>>
> >>>>>> A while back we introduced and enabled IFLA_IF_NETNSID in
> >>>>>> RTM_{DEL,GET,NEW}LINK requests (cf. [1], [2], [3], [4], [5]). This has led
> >>>>>> to signficant performance increases since it allows userspace to avoid
> >>>>>> taking the hit of a setns(netns_fd, CLONE_NEWNET), then getting the
> >>>>>> interfaces from the netns associated with the netns_fd. Especially when a
> >>>>>> lot of network namespaces are in use, using setns() becomes increasingly
> >>>>>> problematic when performance matters.
> >>>>>
> >>>>> could you please give a real example, when setns()+socket(AF_NETLINK) cause
> >>>>> problems with the performance? You should do this only once on application
> >>>>> startup, and then you have created netlink sockets in any net namespaces you
> >>>>> need. What is the problem here?
> >>>>
> >>>> So we have a daemon (LXD) that is often running thousands of containers.
> >>>> When users issue a lxc list request against the daemon it returns a list
> >>>> of all containers including all of the interfaces and addresses for each
> >>>> container. To retrieve those addresses we currently rely on setns() +
> >>>> getifaddrs() for each of those containers. That has horrible
> >>>> performance.
> >>>
> >>> Could you please provide some numbers showing that setns()
> >>> introduces signify performance decrease in the application?
> >>
> >> Sure, might take a few days++ though since I'm traveling.
> > 
> > Hey Kirill,
> > 
> > As promised here's some code [1] that compares performance. I basically
> > did a setns() to the network namespace and called getifaddrs() and
> > compared this to the scenario where I use the newly introduced property.
> > I did this 1 million times and calculated the mean getifaddrs()
> > retrieval time based on that.
> > My patch cuts the time in half.
> > 
> > brauner@wittgenstein:~/netns_getifaddrs$ ./getifaddrs_perf 0 1178
> > Mean time in microseconds (netnsid): 81
> > Mean time in microseconds (setns): 162
> > 
> > Christian
> > 
> > I'm only appending the main file since the netsns_getifaddrs() code I
> > used is pretty long:
> > 
> > [1]:
> > 
> > #define _GNU_SOURCE
> > #define __STDC_FORMAT_MACROS
> > #include <fcntl.h>
> > #include <inttypes.h>
> > #include <linux/types.h>
> > #include <sched.h>
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <string.h>
> > #include <sys/stat.h>
> > #include <sys/time.h>
> > #include <sys/types.h>
> > #include <unistd.h>
> > 
> > #include "netns_getifaddrs.h"
> > #include "print_getifaddrs.h"
> > 
> > #define ITERATIONS 1000000
> > #define SEC_TO_MICROSEC(x) (1000000 * (x))
> > 
> > int main(int argc, char *argv[])
> > {
> > 	int i, ret;
> > 	__s32 netns_id;
> > 	pid_t netns_pid;
> > 	char path[1024];
> > 	intmax_t times[ITERATIONS];
> > 	struct timeval t1, t2;
> > 	intmax_t time_in_mcs;
> > 	int fret = EXIT_FAILURE;
> > 	intmax_t sum = 0;
> > 	int host_netns_fd = -1, netns_fd = -1;
> > 
> > 	struct ifaddrs *ifaddrs = NULL;
> > 
> > 	if (argc != 3)
> > 		goto on_error;
> > 
> > 	netns_id = atoi(argv[1]);
> > 	netns_pid = atoi(argv[2]);
> > 	printf("%d\n", netns_id);
> > 	printf("%d\n", netns_pid);
> > 
> > 	for (i = 0; i < ITERATIONS; i++) {
> > 		ret = gettimeofday(&t1, NULL);
> > 		if (ret < 0)
> > 			goto on_error;
> > 
> > 		ret = netns_getifaddrs(&ifaddrs, netns_id);
> > 		freeifaddrs(ifaddrs);
> > 		if (ret < 0)
> > 			goto on_error;
> > 
> > 		ret = gettimeofday(&t2, NULL);
> > 		if (ret < 0)
> > 			goto on_error;
> > 
> > 		time_in_mcs = (SEC_TO_MICROSEC(t2.tv_sec) + t2.tv_usec) -
> > 			      (SEC_TO_MICROSEC(t1.tv_sec) + t1.tv_usec);
> > 		times[i] = time_in_mcs;
> > 	}
> > 
> > 	for (i = 0; i < ITERATIONS; i++)
> > 		sum += times[i];
> > 
> > 	printf("Mean time in microseconds (netnsid): %ju\n",
> > 	       sum / ITERATIONS);
> > 
> > 	ret = snprintf(path, sizeof(path), "/proc/%d/ns/net", netns_pid);
> > 	if (ret < 0 || (size_t)ret >= sizeof(path))
> > 		goto on_error;
> > 
> > 	netns_fd = open(path, O_RDONLY | O_CLOEXEC);
> > 	if (netns_fd < 0)
> > 		goto on_error;
> > 
> > 	host_netns_fd = open("/proc/self/ns/net", O_RDONLY | O_CLOEXEC);
> > 	if (host_netns_fd < 0)
> > 		goto on_error;
> > 
> > 	memset(times, 0, sizeof(times));
> > 	for (i = 0; i < ITERATIONS; i++) {
> > 		ret = gettimeofday(&t1, NULL);
> > 		if (ret < 0)
> > 			goto on_error;
> > 
> > 		ret = setns(netns_fd, CLONE_NEWNET);
> > 		if (ret < 0)
> > 			goto on_error;
> > 
> > 		ret = getifaddrs(&ifaddrs);
> > 		freeifaddrs(ifaddrs);
> > 		if (ret < 0)
> > 			goto on_error;
> > 
> > 		ret = gettimeofday(&t2, NULL);
> > 		if (ret < 0)
> > 			goto on_error;
> > 
> > 		ret = setns(host_netns_fd, CLONE_NEWNET);
> > 		if (ret < 0)
> > 			goto on_error;
> > 
> > 		time_in_mcs = (SEC_TO_MICROSEC(t2.tv_sec) + t2.tv_usec) -
> > 			      (SEC_TO_MICROSEC(t1.tv_sec) + t1.tv_usec);
> > 		times[i] = time_in_mcs;
> > 	}
> > 
> > 	for (i = 0; i < ITERATIONS; i++)
> > 		sum += times[i];
> > 
> > 	printf("Mean time in microseconds (setns): %ju\n",
> > 	       sum / ITERATIONS);
> > 
> > 	fret = EXIT_SUCCESS;
> > 
> > on_error:
> > 	if (netns_fd >= 0)
> > 		close(netns_fd);
> > 
> > 	if (host_netns_fd >= 0)
> > 		close(host_netns_fd);
> > 
> > 	exit(fret);
> > }
> 
> But this is a synthetic test, while I asked about real workflow.
> Is this real problem for lxd, and there is observed performance
> decrease?

As you can see in this mail I explicitly stated that it is a real
performacne issue we see with LXD. You asked for numbers I gave you
numbers by writing a test-program just per your request. The benefit of
this "synthetic" case is that it allows us to clearly see the
performance benefit. Expecting me to hack all of this into LXD just to
get some perf numbers that will show the exact same thing per your
request is - and I hope I'm not being unreasonable here - expecting a
bit much.

> 
> I see, there are already nsid use in existing code, but I have to say,
> that adding new types of variables as a system call arguments make it
> less modular. When you request RTM_GETADDR for a specific nsid, this
> obligates the kernel to make everything unchangable during the call,
> doesn't it?
> 
> We may look at existing code as example, what problems this may cause.
> Look at do_setlink(). There are many different types of variables,
> and all of them should be dereferenced atomically. So, all the function
> is executed under global rtnl. And this causes delays in another config
> places, which are sensitive to rtnl. So, adding more dimensions to RTM_GETADDR
> may turn it in the same overloaded function as do_setlink() is. And one
> day, when we reach the state, when we must rework all of this, we won't
> be able to do this. I'm not sure, now is not too late.
> 
> I just say about this, because it's possible we should consider another
> approach in rtnl communication in general, and stop to overload it.

While I sympathize with your concerns this all seems very vague. There
is a real-world use case that is solved by this patchset.

      parent reply	other threads:[~2018-09-03 14:22 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-28 23:18 [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR Christian Brauner
2018-08-28 23:18 ` [PATCH net-next 1/5] rtnetlink: add rtnl_get_net_ns_capable() Christian Brauner
2018-08-28 23:18 ` [PATCH net-next 2/5] if_addr: add IFA_IF_NETNSID Christian Brauner
2018-08-28 23:18 ` [PATCH net-next 3/5] ipv4: enable IFA_IF_NETNSID for RTM_GETADDR Christian Brauner
2018-08-28 23:18 ` [PATCH net-next 4/5] ipv6: " Christian Brauner
2018-08-30 18:41   ` kbuild test robot
2018-09-03  1:18     ` Christian Brauner
2018-08-28 23:18 ` [PATCH net-next 5/5] rtnetlink: move type calculation out of loop Christian Brauner
2018-08-29  8:30 ` [PATCH net-next 0/5] rtnetlink: add IFA_IF_NETNSID for RTM_GETADDR Kirill Tkhai
2018-08-29 18:13   ` Christian Brauner
2018-08-30  8:49     ` Kirill Tkhai
2018-08-30 14:45       ` Christian Brauner
2018-08-30 15:49         ` Nicolas Dichtel
2018-09-01  0:58           ` David Miller
2018-09-01 18:47             ` Christian Brauner
2018-09-02  9:58               ` Jiri Benc
2018-09-03  7:50                 ` Nicolas Dichtel
2018-09-03  9:32                   ` Christian Brauner
2018-09-01  1:34         ` Christian Brauner
2018-09-03 13:41           ` Kirill Tkhai
2018-09-03 13:50             ` Jiri Benc
2018-09-03 14:53               ` Kirill Tkhai
2018-09-03 14:22             ` Christian Brauner [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180903142246.wvgucy57phpipy7h@gmail.com \
    --to=christian.brauner@canonical.com \
    --cc=davem@davemloft.net \
    --cc=dsahern@gmail.com \
    --cc=fw@strlen.de \
    --cc=gregkh@linuxfoundation.org \
    --cc=jakub.kicinski@netronome.com \
    --cc=jbenc@redhat.com \
    --cc=kstewart@linuxfoundation.org \
    --cc=ktkhai@virtuozzo.com \
    --cc=kuznet@ms2.inr.ac.ru \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lucien.xin@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=nicolas.dichtel@6wind.com \
    --cc=pombredanne@nexb.com \
    --cc=yoshfuji@linux-ipv6.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).