* [RFC V3 PATCH 00/26] Kernel NET policy
@ 2016-09-12 14:55 kan.liang
  2016-09-12 14:55 ` [RFC V3 PATCH 01/26] net: introduce " kan.liang
                   ` (27 more replies)
  0 siblings, 28 replies; 44+ messages in thread
From: kan.liang @ 2016-09-12 14:55 UTC (permalink / raw)
  To: davem, linux-kernel, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
	hannes, stephen, alexei.starovoitov, jesse.brandeburg, andi,
	Kan Liang

From: Kan Liang <kan.liang@intel.com>

Getting good network performance is a big challenge. First, network
performance is not good with the default system settings. Second, it is too
difficult to tune automatically for all possible workloads, since workloads
have different requirements. Some workloads may want high throughput. Some may
need low latency. Last but not least, there are lots of manual configurations,
and fine-grained configuration is too difficult for users.

NET policy intends to simplify the network configuration and get good network
performance according to hints (policies) supplied by the user. It
provides some typical "policies" which can be set per-socket, per-task
or per-device. The kernel automatically figures out how to merge the different
requests to get good network performance.

NET policy is designed for multiqueue network devices. This implementation is
only for Intel NICs using the i40e driver, but the concepts and generic code
should apply to other multiqueue NICs too.

NET policy is also a combination of generic policy manager code and some
ethtool callbacks (per queue coalesce setting, flow classification rules) to
configure the driver.

This series also supports CPU hotplug and device hotplug.

Here are some common questions about NET policy.
 1. Why can't a userspace tool do the same thing?
    A: The kernel is more suitable for NET policy.
       - User space code would be far more complicated to get right and perform
         well. It always needs to work with state that is out of date compared
         to the latest, because it cannot do any locking with the kernel state.
       - User space code is less efficient than kernel code, because of the
         additional context switches needed.
       - The kernel is in the right position to coordinate requests from
         multiple users.

 2. Is NET policy looking for optimal settings?
    A: No. NET policy intends to get good network performance according
       to the user's specific request. Our target for good performance is ~90%
       of what the optimal settings would achieve.

 3. How does the configuration impact the connection rates?
    A: There are two places where the rtnl mutex is acquired to configure the
       device.
       - One is device policy setting. It happens at the initialization stage,
         on hotplug, or when the queue number changes; the device policy is
         then set to NET_POLICY_NONE, which "falls back" to the system default
         way of directing packets, so it doesn't block connections.
       - The other is setting Rx network flow classification options or rules.
         This uses a work queue to apply the settings asynchronously, which
         avoids hurting the connection rates.

 4. What about disabling IRQ balance?
    A: Disabling IRQ balance is a common way (the recommended way for some
       devices) to tune network performance. NET policy provides an option for
       the driver to choose to disable IRQ balance and set IRQ affinity.

Here are some key Interfaces/APIs for NET policy.

Interfaces exported to user space

   /proc/net/netpolicy/$DEV/policy
   Users can set/get the per-device policy via /proc.

   /proc/$PID/net_policy
   Users can set/get the per-task policy via /proc.
   prctl(PR_SET_NETPOLICY, POLICY_NAME, NULL, NULL, NULL)
   An alternative way to set/get the per-task policy is prctl.

   setsockopt(sockfd, SOL_SOCKET, SO_NETPOLICY, &policy, sizeof(int))
   Users can set/get the per-socket policy via setsockopt.

New ndo ops

   int (*ndo_netpolicy_init)(struct net_device *dev,
                             struct netpolicy_info *info);
   Initialize device driver for NET policy

   int (*ndo_get_irq_info)(struct net_device *dev,
                           struct netpolicy_dev_info *info);
   Collect device information. Currently, collecting only the IRQ
   information should be enough.

   int (*ndo_set_net_policy)(struct net_device *dev,
                             enum netpolicy_name name);
   This interface is used to set the device NET policy by name. It is the
   device driver's responsibility to set driver-specific configuration for the
   given policy.

NET policy subsystem APIs

   netpolicy_register(struct netpolicy_instance *instance,
                      enum netpolicy_name policy)
   netpolicy_unregister(struct netpolicy_instance *instance)
   Register/unregister a per-task/socket NET policy.
   A socket/task only benefits once it registers itself with a specific
   policy. After registration, a record is created and inserted into an RCU
   hash table, which includes all the NET policy related information for the
   socket/task.

   netpolicy_pick_queue(struct netpolicy_instance *instance, bool is_rx);
   Find the proper queue, according to the policy, for packet receiving and
   transmitting.

   netpolicy_set_rules(struct netpolicy_instance *instance);
   Configure Rx network flow classification rules

To use NET policy, the per-device policy must be set in advance. Setting it
automatically configures the system and reorganizes system resources
accordingly. For system configuration, this series disables IRQ balance, sets
the device queue IRQ affinity, and modifies interrupt moderation. For resource
reorganization, the current implementation forces a 1:1 mapping between CPUs
and queue IRQs. A 1:1 mapping group is also called a NET policy object.
Each device policy maintains a policy list. Once the device policy is applied,
the objects are inserted and tracked in that device's policy list. The policy
list is only updated on CPU/device hotplug, queue number changes or device
policy changes.

Users can use /proc, prctl and setsockopt to set the per-task and per-socket
NET policy. Once the policy is set, a related record is inserted into an RCU
hash table. The record includes ptr, policy and the NET policy object. The ptr
is the pointer address of the task/socket. The object is not assigned until
the first packet is received/transmitted; it is picked round-robin from the
object list. Once the object is determined, subsequent packets are redirected
to the queue of that object.
Objects can be shared, and the per-task or per-socket policy can be inherited.

Now NET policy supports four per-device policies and three per-task/socket
policies.
    - BULK policy: This policy is designed for high throughput. It can be
      applied to either per device policy or per task/socket policy.
    - CPU policy: This policy is designed for high throughput but lower CPU
      utilization (power saving). It can be applied to either per device policy
      or per task/socket policy.
    - LATENCY policy: This policy is designed for low latency. It can be
      applied to either per device policy or per task/socket policy.
    - MIX policy: This policy can only be applied as a per-device policy. It
      is designed for the case where miscellaneous types of workloads are
      running on the device.

Lots of tests have been done for NET policy on platforms with an Intel Xeon
E5 V2 and an XL710 40G NIC. The baseline is the Linux 4.6.0 kernel.
Netperf is used to evaluate throughput and latency performance.
  - "netperf -f m -t TCP_RR -H server_IP -c -C -l 60 -- -r buffersize
    -b burst -D" is used to evaluate throughput performance, which is
    called throughput-first workload.
  - "netperf -t TCP_RR -H server_IP -c -C -l 60 -- -r buffersize" is
    used to evaluate latency performance, which is called latency-first
    workload.
  - Different loads are also evaluated by running 1, 12, 24, 48 or 96
    throughput-first workloads/latency-first workload simultaneously.

For the "BULK" policy, throughput is on average ~1.22X the baseline.
For the "CPU" policy, throughput is on average ~1.19X the baseline, with
lower CPU utilization (on average ~5% lower than the "BULK" policy).
For the "LATENCY" policy, latency is on average 49.8% lower than the baseline.
For the "MIX" policy, the performance of mixed workloads is evaluated.
The mixed workloads are combinations of throughput-first and latency-first
workloads. Five different combinations are evaluated
(pure throughput-first workloads, pure latency-first workloads,
 2/3 throughput-first + 1/3 latency-first workloads,
 1/3 throughput-first + 2/3 latency-first workloads and
 1/2 throughput-first + 1/2 latency-first workloads).
To calculate the performance of mixed workloads, a weighted-sum scoring
system is introduced.
Score = normalized_latency * Weight + normalized_throughput * (1 - Weight).
If we assume that the user has an equal interest in latency and throughput
performance, the score for the "MIX" policy is on average ~1.63X the baseline.

Changes since V2:
 - Set the default to n for the NET policy subsystem
 - Modify the queue selection algorithm. The new algorithm considers
   CPU load and the reference count
 - Extend netpolicy to support tc bpf when selecting the Tx queue
 - Provide an irq_affinity option for the driver to choose to disable IRQ
   balance and set IRQ affinity
 - Make netpolicy_sys_map_version per device instead of global
 - Modify the changelog accordingly

Changes since V1:
 - Use a work queue to set Rx network flow classification rules and to
   search for available NET policy objects asynchronously.
 - Use RCU to replace the read-write lock.
 - Redo the performance tests and update the performance results.
 - Some minor modifications to code and documents.
 - Remove the i40e related patches, which will be submitted in a separate
   thread.

Kan Liang (26):
  net: introduce NET policy
  net/netpolicy: init NET policy
  net/netpolicy: get device queue irq information
  net/netpolicy: get CPU information
  net/netpolicy: create CPU and queue mapping
  net/netpolicy: set and remove IRQ affinity
  net/netpolicy: enable and disable NET policy
  net/netpolicy: introduce NET policy object
  net/netpolicy: set NET policy by policy name
  net/netpolicy: add three new NET policies
  net/netpolicy: add MIX policy
  net/netpolicy: NET device hotplug
  net/netpolicy: support CPU hotplug
  net/netpolicy: handle channel changes
  net/netpolicy: implement netpolicy register
  net/netpolicy: introduce per socket netpolicy
  net/netpolicy: introduce netpolicy_pick_queue
  net/netpolicy: set tx queues according to policy
  net/netpolicy: tc bpf extension to pick Tx queue
  net/netpolicy: set Rx queues according to policy
  net/netpolicy: introduce per task net policy
  net/netpolicy: set per task policy by proc
  net/netpolicy: fast path for finding the queues
  net/netpolicy: optimize for queue pair
  net/netpolicy: limit the total record number
  Documentation/networking: Document NET policy

 Documentation/networking/netpolicy.txt |  157 ++++
 arch/alpha/include/uapi/asm/socket.h   |    2 +
 arch/avr32/include/uapi/asm/socket.h   |    2 +
 arch/frv/include/uapi/asm/socket.h     |    2 +
 arch/ia64/include/uapi/asm/socket.h    |    2 +
 arch/m32r/include/uapi/asm/socket.h    |    2 +
 arch/mips/include/uapi/asm/socket.h    |    2 +
 arch/mn10300/include/uapi/asm/socket.h |    2 +
 arch/parisc/include/uapi/asm/socket.h  |    2 +
 arch/powerpc/include/uapi/asm/socket.h |    2 +
 arch/s390/include/uapi/asm/socket.h    |    2 +
 arch/sparc/include/uapi/asm/socket.h   |    2 +
 arch/xtensa/include/uapi/asm/socket.h  |    2 +
 fs/proc/base.c                         |   64 ++
 include/linux/init_task.h              |    9 +
 include/linux/netdevice.h              |   31 +
 include/linux/netpolicy.h              |  177 ++++
 include/linux/sched.h                  |    8 +
 include/net/net_namespace.h            |    3 +
 include/net/request_sock.h             |    4 +-
 include/net/sock.h                     |   28 +
 include/uapi/asm-generic/socket.h      |    2 +
 include/uapi/linux/bpf.h               |    8 +
 include/uapi/linux/prctl.h             |    4 +
 kernel/exit.c                          |    4 +
 kernel/fork.c                          |    6 +
 kernel/sched/fair.c                    |    8 +-
 kernel/sys.c                           |   31 +
 net/Kconfig                            |    7 +
 net/core/Makefile                      |    1 +
 net/core/dev.c                         |   20 +-
 net/core/ethtool.c                     |    8 +-
 net/core/filter.c                      |   36 +
 net/core/netpolicy.c                   | 1571 ++++++++++++++++++++++++++++++++
 net/core/sock.c                        |   36 +
 net/ipv4/af_inet.c                     |   71 ++
 net/ipv4/udp.c                         |    4 +
 samples/bpf/Makefile                   |    1 +
 samples/bpf/bpf_helpers.h              |    2 +
 39 files changed, 2317 insertions(+), 8 deletions(-)
 create mode 100644 Documentation/networking/netpolicy.txt
 create mode 100644 include/linux/netpolicy.h
 create mode 100644 net/core/netpolicy.c

-- 
2.5.5


* [RFC V3 PATCH 01/26] net: introduce NET policy
  2016-09-12 14:55 [RFC V3 PATCH 00/26] Kernel NET policy kan.liang
@ 2016-09-12 14:55 ` kan.liang
  2016-09-12 14:55 ` [RFC V3 PATCH 02/26] net/netpolicy: init " kan.liang
                   ` (26 subsequent siblings)
  27 siblings, 0 replies; 44+ messages in thread
From: kan.liang @ 2016-09-12 14:55 UTC (permalink / raw)
  To: davem, linux-kernel, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
	hannes, stephen, alexei.starovoitov, jesse.brandeburg, andi,
	Kan Liang

From: Kan Liang <kan.liang@intel.com>

This patch introduces the NET policy subsystem. If procfs is supported in
the system, it creates a netpolicy node in the proc filesystem.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netdevice.h   |   7 +++
 include/net/net_namespace.h |   3 ++
 net/Kconfig                 |   7 +++
 net/core/Makefile           |   1 +
 net/core/netpolicy.c        | 128 ++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 146 insertions(+)
 create mode 100644 net/core/netpolicy.c

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 67bb978..435573c 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1618,6 +1618,8 @@ enum netdev_priv_flags {
  *			switch driver and used to set the phys state of the
  *			switch port.
  *
+ *	@proc_dev:	device node in proc to configure device net policy
+ *
  *	FIXME: cleanup struct net_device such that network protocol info
  *	moves out.
  */
@@ -1885,6 +1887,11 @@ struct net_device {
 	struct lock_class_key	*qdisc_tx_busylock;
 	struct lock_class_key	*qdisc_running_key;
 	bool			proto_down;
+#ifdef CONFIG_NETPOLICY
+#ifdef CONFIG_PROC_FS
+	struct proc_dir_entry	*proc_dev;
+#endif /* CONFIG_PROC_FS */
+#endif /* CONFIG_NETPOLICY */
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 0933c74..571f005 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -142,6 +142,9 @@ struct net {
 #endif
 	struct sock		*diag_nlsk;
 	atomic_t		fnhe_genid;
+#ifdef CONFIG_NETPOLICY
+	struct proc_dir_entry	*proc_netpolicy;
+#endif /* CONFIG_NETPOLICY */
 };
 
 #include <linux/seq_file_net.h>
diff --git a/net/Kconfig b/net/Kconfig
index 7b6cd34..b2b0354 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -205,6 +205,13 @@ source "net/bridge/netfilter/Kconfig"
 
 endif
 
+config NETPOLICY
+	depends on NET
+	bool "Net policy support"
+	default n
+	---help---
+	Net policy support
+
 source "net/dccp/Kconfig"
 source "net/sctp/Kconfig"
 source "net/rds/Kconfig"
diff --git a/net/core/Makefile b/net/core/Makefile
index d6508c2..0be7092 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -27,3 +27,4 @@ obj-$(CONFIG_LWTUNNEL) += lwtunnel.o
 obj-$(CONFIG_DST_CACHE) += dst_cache.o
 obj-$(CONFIG_HWBM) += hwbm.o
 obj-$(CONFIG_NET_DEVLINK) += devlink.o
+obj-$(CONFIG_NETPOLICY) += netpolicy.o
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
new file mode 100644
index 0000000..faabfe7
--- /dev/null
+++ b/net/core/netpolicy.c
@@ -0,0 +1,128 @@
+/*
+ * netpolicy.c: Net policy support
+ * Copyright (c) 2016, Intel Corporation.
+ * Author: Kan Liang (kan.liang@intel.com)
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * NET policy intends to simplify the network configuration and get a good
+ * network performance according to the hints(policy) which is applied by user.
+ *
+ * Motivation
+ * 	- The network performance is not good with default system settings.
+ *	- It is too difficult to do automatic tuning for all possible
+ *	  workloads, since workloads have different requirements. Some
+ *	  workloads may want high throughput. Some may need low latency.
+ *	- There are lots of manual configurations. Fine grained configuration
+ *	  is too difficult for users.
+ * 	So, it is a big challenge to get good network performance.
+ *
+ */
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/seq_file.h>
+#include <linux/proc_fs.h>
+#include <linux/uaccess.h>
+#include <linux/netdevice.h>
+#include <net/net_namespace.h>
+
+#ifdef CONFIG_PROC_FS
+
+static int net_policy_proc_show(struct seq_file *m, void *v)
+{
+	struct net_device *dev = (struct net_device *)m->private;
+
+	seq_printf(m, "%s doesn't support net policy manager\n", dev->name);
+
+	return 0;
+}
+
+static int net_policy_proc_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, net_policy_proc_show, PDE_DATA(inode));
+}
+
+static const struct file_operations proc_net_policy_operations = {
+	.open		= net_policy_proc_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= seq_release,
+	.owner		= THIS_MODULE,
+};
+
+static int netpolicy_proc_dev_init(struct net *net, struct net_device *dev)
+{
+	dev->proc_dev = proc_net_mkdir(net, dev->name, net->proc_netpolicy);
+	if (!dev->proc_dev)
+		return -ENOMEM;
+
+	if (!proc_create_data("policy", S_IWUSR | S_IRUGO,
+			      dev->proc_dev, &proc_net_policy_operations,
+			      (void *)dev)) {
+		remove_proc_subtree(dev->name, net->proc_netpolicy);
+		return -ENOMEM;
+	}
+	return 0;
+}
+
+static int __net_init netpolicy_net_init(struct net *net)
+{
+	struct net_device *dev, *aux;
+
+	net->proc_netpolicy = proc_net_mkdir(net, "netpolicy",
+					     net->proc_net);
+	if (!net->proc_netpolicy)
+		return -ENOMEM;
+
+	for_each_netdev_safe(net, dev, aux) {
+		netpolicy_proc_dev_init(net, dev);
+	}
+
+	return 0;
+}
+
+#else /* CONFIG_PROC_FS */
+
+static int __net_init netpolicy_net_init(struct net *net)
+{
+	return 0;
+}
+#endif /* CONFIG_PROC_FS */
+
+static void __net_exit netpolicy_net_exit(struct net *net)
+{
+#ifdef CONFIG_PROC_FS
+	remove_proc_subtree("netpolicy", net->proc_net);
+#endif /* CONFIG_PROC_FS */
+}
+
+static struct pernet_operations netpolicy_net_ops = {
+	.init = netpolicy_net_init,
+	.exit = netpolicy_net_exit,
+};
+
+static int __init netpolicy_init(void)
+{
+	int ret;
+
+	ret = register_pernet_subsys(&netpolicy_net_ops);
+
+	return ret;
+}
+
+static void __exit netpolicy_exit(void)
+{
+	unregister_pernet_subsys(&netpolicy_net_ops);
+}
+
+subsys_initcall(netpolicy_init);
+module_exit(netpolicy_exit);
-- 
2.5.5


* [RFC V3 PATCH 02/26] net/netpolicy: init NET policy
  2016-09-12 14:55 [RFC V3 PATCH 00/26] Kernel NET policy kan.liang
  2016-09-12 14:55 ` [RFC V3 PATCH 01/26] net: introduce " kan.liang
@ 2016-09-12 14:55 ` kan.liang
  2016-09-12 14:55 ` [RFC V3 PATCH 03/26] net/netpolicy: get device queue irq information kan.liang
                   ` (25 subsequent siblings)
  27 siblings, 0 replies; 44+ messages in thread
From: kan.liang @ 2016-09-12 14:55 UTC (permalink / raw)
  To: davem, linux-kernel, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
	hannes, stephen, alexei.starovoitov, jesse.brandeburg, andi,
	Kan Liang

From: Kan Liang <kan.liang@intel.com>

This patch tries to initialize NET policy for all the devices in the
system. However, not all device drivers have NET policy support. For
those drivers which do not have NET policy support, the node will not be
shown in /proc/net/netpolicy/.
A device driver which has NET policy support must implement the
ndo_netpolicy_init interface, which is used to do the necessary
initialization and collect information (e.g. supported policies) from
the driver.

Users can check /proc/net/netpolicy/ and
/proc/net/netpolicy/$DEV/policy to see the available devices and their
supported policies.

np_lock is also introduced to protect the state of NET policy.

Device hotplug will be handled later in this series.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netdevice.h | 12 +++++++
 include/linux/netpolicy.h | 31 ++++++++++++++++
 net/core/netpolicy.c      | 91 +++++++++++++++++++++++++++++++++++++++++------
 3 files changed, 123 insertions(+), 11 deletions(-)
 create mode 100644 include/linux/netpolicy.h

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 435573c..876293d 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -53,6 +53,7 @@
 #include <uapi/linux/if_bonding.h>
 #include <uapi/linux/pkt_cls.h>
 #include <linux/hashtable.h>
+#include <linux/netpolicy.h>
 
 struct netpoll_info;
 struct device;
@@ -1121,6 +1122,9 @@ struct netdev_xdp {
  * int (*ndo_xdp)(struct net_device *dev, struct netdev_xdp *xdp);
  *	This function is used to set or query state related to XDP on the
  *	netdevice. See definition of enum xdp_netdev_command for details.
+ * int(*ndo_netpolicy_init)(struct net_device *dev,
+ *			    struct netpolicy_info *info);
+ *	This function is used to init and get supported policy.
  *
  */
 struct net_device_ops {
@@ -1307,6 +1311,10 @@ struct net_device_ops {
 						       int needed_headroom);
 	int			(*ndo_xdp)(struct net_device *dev,
 					   struct netdev_xdp *xdp);
+#ifdef CONFIG_NETPOLICY
+	int			(*ndo_netpolicy_init)(struct net_device *dev,
+						      struct netpolicy_info *info);
+#endif /* CONFIG_NETPOLICY */
 };
 
 /**
@@ -1619,6 +1627,8 @@ enum netdev_priv_flags {
  *			switch port.
  *
  *	@proc_dev:	device node in proc to configure device net policy
+ *	@netpolicy:	NET policy related information of net device
+ *	@np_lock:	protect the state of NET policy
  *
  *	FIXME: cleanup struct net_device such that network protocol info
  *	moves out.
@@ -1891,6 +1901,8 @@ struct net_device {
 #ifdef CONFIG_PROC_FS
 	struct proc_dir_entry	*proc_dev;
 #endif /* CONFIG_PROC_FS */
+	struct netpolicy_info	*netpolicy;
+	spinlock_t		np_lock;
 #endif /* CONFIG_NETPOLICY */
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
new file mode 100644
index 0000000..ca1f131
--- /dev/null
+++ b/include/linux/netpolicy.h
@@ -0,0 +1,31 @@
+/*
+ * netpolicy.h: Net policy support
+ * Copyright (c) 2016, Intel Corporation.
+ * Author: Kan Liang (kan.liang@intel.com)
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+#ifndef __LINUX_NETPOLICY_H
+#define __LINUX_NETPOLICY_H
+
+enum netpolicy_name {
+	NET_POLICY_NONE		= 0,
+	NET_POLICY_MAX,
+};
+
+extern const char *policy_name[];
+
+struct netpolicy_info {
+	enum netpolicy_name	cur_policy;
+	unsigned long avail_policy[BITS_TO_LONGS(NET_POLICY_MAX)];
+};
+
+#endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index faabfe7..c1c0dc0 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -34,14 +34,31 @@
 #include <linux/uaccess.h>
 #include <linux/netdevice.h>
 #include <net/net_namespace.h>
+#include <net/rtnetlink.h>
 
+const char *policy_name[NET_POLICY_MAX] = {
+	"NONE"
+};
 #ifdef CONFIG_PROC_FS
 
 static int net_policy_proc_show(struct seq_file *m, void *v)
 {
 	struct net_device *dev = (struct net_device *)m->private;
-
-	seq_printf(m, "%s doesn't support net policy manager\n", dev->name);
+	int i;
+
+	if (WARN_ON(!dev->netpolicy))
+		return -EINVAL;
+
+	if (dev->netpolicy->cur_policy == NET_POLICY_NONE) {
+		seq_printf(m, "%s: There is no policy applied\n", dev->name);
+		seq_printf(m, "%s: The available policy include:", dev->name);
+		for_each_set_bit(i, dev->netpolicy->avail_policy, NET_POLICY_MAX)
+			seq_printf(m, " %s", policy_name[i]);
+		seq_printf(m, "\n");
+	} else {
+		seq_printf(m, "%s: POLICY %s is running on the system\n",
+			   dev->name, policy_name[dev->netpolicy->cur_policy]);
+	}
 
 	return 0;
 }
@@ -73,33 +90,85 @@ static int netpolicy_proc_dev_init(struct net *net, struct net_device *dev)
 	}
 	return 0;
 }
+#endif /* CONFIG_PROC_FS */
+
+int init_netpolicy(struct net_device *dev)
+{
+	int ret;
+
+	spin_lock(&dev->np_lock);
+	ret = 0;
+
+	if (!dev->netdev_ops->ndo_netpolicy_init) {
+		ret = -ENOTSUPP;
+		goto unlock;
+	}
+
+	if (dev->netpolicy)
+		goto unlock;
+
+	dev->netpolicy = kzalloc(sizeof(*dev->netpolicy), GFP_ATOMIC);
+	if (!dev->netpolicy) {
+		ret = -ENOMEM;
+		goto unlock;
+	}
+
+	ret = dev->netdev_ops->ndo_netpolicy_init(dev, dev->netpolicy);
+	if (ret) {
+		kfree(dev->netpolicy);
+		dev->netpolicy = NULL;
+	}
+
+unlock:
+	spin_unlock(&dev->np_lock);
+	return ret;
+}
+
+void uninit_netpolicy(struct net_device *dev)
+{
+	spin_lock(&dev->np_lock);
+	if (dev->netpolicy) {
+		kfree(dev->netpolicy);
+		dev->netpolicy = NULL;
+	}
+	spin_unlock(&dev->np_lock);
+}
 
 static int __net_init netpolicy_net_init(struct net *net)
 {
 	struct net_device *dev, *aux;
 
+#ifdef CONFIG_PROC_FS
 	net->proc_netpolicy = proc_net_mkdir(net, "netpolicy",
 					     net->proc_net);
 	if (!net->proc_netpolicy)
 		return -ENOMEM;
+#endif /* CONFIG_PROC_FS */
 
+	rtnl_lock();
 	for_each_netdev_safe(net, dev, aux) {
-		netpolicy_proc_dev_init(net, dev);
+		if (!init_netpolicy(dev)) {
+#ifdef CONFIG_PROC_FS
+			if (netpolicy_proc_dev_init(net, dev))
+				uninit_netpolicy(dev);
+			else
+#endif /* CONFIG_PROC_FS */
+			pr_info("NETPOLICY: Init net policy for %s\n", dev->name);
+		}
 	}
+	rtnl_unlock();
 
 	return 0;
 }
 
-#else /* CONFIG_PROC_FS */
-
-static int __net_init netpolicy_net_init(struct net *net)
-{
-	return 0;
-}
-#endif /* CONFIG_PROC_FS */
-
 static void __net_exit netpolicy_net_exit(struct net *net)
 {
+	struct net_device *dev, *aux;
+
+	rtnl_lock();
+	for_each_netdev_safe(net, dev, aux)
+		uninit_netpolicy(dev);
+	rtnl_unlock();
 #ifdef CONFIG_PROC_FS
 	remove_proc_subtree("netpolicy", net->proc_net);
 #endif /* CONFIG_PROC_FS */
-- 
2.5.5


* [RFC V3 PATCH 03/26] net/netpolicy: get device queue irq information
  2016-09-12 14:55 [RFC V3 PATCH 00/26] Kernel NET policy kan.liang
  2016-09-12 14:55 ` [RFC V3 PATCH 01/26] net: introduce " kan.liang
  2016-09-12 14:55 ` [RFC V3 PATCH 02/26] net/netpolicy: init " kan.liang
@ 2016-09-12 14:55 ` kan.liang
  2016-09-12 16:48   ` Sergei Shtylyov
  2016-09-12 14:55 ` [RFC V3 PATCH 04/26] net/netpolicy: get CPU information kan.liang
                   ` (24 subsequent siblings)
  27 siblings, 1 reply; 44+ messages in thread
From: kan.liang @ 2016-09-12 14:55 UTC (permalink / raw)
  To: davem, linux-kernel, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
	hannes, stephen, alexei.starovoitov, jesse.brandeburg, andi,
	Kan Liang

From: Kan Liang <kan.liang@intel.com>

NET policy needs to know device information. Currently, getting the IRQ
information of the Rx and Tx queues is enough.

This patch introduces an ndo op, rather than an ethtool op, to do so.
Because there are already several ways to get IRQ information in user
space, it is not necessary to extend ethtool.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netdevice.h |  5 +++++
 include/linux/netpolicy.h |  7 +++++++
 net/core/netpolicy.c      | 14 ++++++++++++++
 3 files changed, 26 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 876293d..e1b5685 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1125,6 +1125,9 @@ struct netdev_xdp {
  * int(*ndo_netpolicy_init)(struct net_device *dev,
  *			    struct netpolicy_info *info);
  *	This function is used to init and get supported policy.
+ * int (*ndo_get_irq_info)(struct net_device *dev,
+ *			   struct netpolicy_dev_info *info);
+ *	This function is used to get irq information of rx and tx queues
  *
  */
 struct net_device_ops {
@@ -1314,6 +1317,8 @@ struct net_device_ops {
 #ifdef CONFIG_NETPOLICY
 	int			(*ndo_netpolicy_init)(struct net_device *dev,
 						      struct netpolicy_info *info);
+	int			(*ndo_get_irq_info)(struct net_device *dev,
+						    struct netpolicy_dev_info *info);
 #endif /* CONFIG_NETPOLICY */
 };
 
diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index ca1f131..fc87d9b 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -23,6 +23,13 @@ enum netpolicy_name {
 
 extern const char *policy_name[];
 
+struct netpolicy_dev_info {
+	u32	rx_num;
+	u32	tx_num;
+	u32	*rx_irq;
+	u32	*tx_irq;
+};
+
 struct netpolicy_info {
 	enum netpolicy_name	cur_policy;
 	unsigned long avail_policy[BITS_TO_LONGS(NET_POLICY_MAX)];
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index c1c0dc0..882f0de 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -36,6 +36,20 @@
 #include <net/net_namespace.h>
 #include <net/rtnetlink.h>
 
+static int netpolicy_get_dev_info(struct net_device *dev,
+				  struct netpolicy_dev_info *d_info)
+{
+	if (!dev->netdev_ops->ndo_get_irq_info)
+		return -ENOTSUPP;
+	return dev->netdev_ops->ndo_get_irq_info(dev, d_info);
+}
+
+static void netpolicy_free_dev_info(struct netpolicy_dev_info *d_info)
+{
+	kfree(d_info->rx_irq);
+	kfree(d_info->tx_irq);
+}
+
 const char *policy_name[NET_POLICY_MAX] = {
 	"NONE"
 };
-- 
2.5.5


* [RFC V3 PATCH 04/26] net/netpolicy: get CPU information
  2016-09-12 14:55 [RFC V3 PATCH 00/26] Kernel NET policy kan.liang
                   ` (2 preceding siblings ...)
  2016-09-12 14:55 ` [RFC V3 PATCH 03/26] net/netpolicy: get device queue irq information kan.liang
@ 2016-09-12 14:55 ` kan.liang
  2016-09-12 14:55 ` [RFC V3 PATCH 05/26] net/netpolicy: create CPU and queue mapping kan.liang
                   ` (23 subsequent siblings)
  27 siblings, 0 replies; 44+ messages in thread
From: kan.liang @ 2016-09-12 14:55 UTC (permalink / raw)
  To: davem, linux-kernel, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
	hannes, stephen, alexei.starovoitov, jesse.brandeburg, andi,
	Kan Liang

From: Kan Liang <kan.liang@intel.com>

NET policy also needs CPU information. For now, the online CPU count
is sufficient.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 net/core/netpolicy.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 882f0de..31c41ca 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -50,6 +50,11 @@ static void netpolicy_free_dev_info(struct netpolicy_dev_info *d_info)
 	kfree(d_info->tx_irq);
 }
 
+static u32 netpolicy_get_cpu_information(void)
+{
+	return num_online_cpus();
+}
+
 const char *policy_name[NET_POLICY_MAX] = {
 	"NONE"
 };
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [RFC V3 PATCH 05/26] net/netpolicy: create CPU and queue mapping
  2016-09-12 14:55 [RFC V3 PATCH 00/26] Kernel NET policy kan.liang
                   ` (3 preceding siblings ...)
  2016-09-12 14:55 ` [RFC V3 PATCH 04/26] net/netpolicy: get CPU information kan.liang
@ 2016-09-12 14:55 ` kan.liang
  2016-09-12 14:55 ` [RFC V3 PATCH 06/26] net/netpolicy: set and remove IRQ affinity kan.liang
                   ` (22 subsequent siblings)
  27 siblings, 0 replies; 44+ messages in thread
From: kan.liang @ 2016-09-12 14:55 UTC (permalink / raw)
  To: davem, linux-kernel, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
	hannes, stephen, alexei.starovoitov, jesse.brandeburg, andi,
	Kan Liang

From: Kan Liang <kan.liang@intel.com>

The current implementation enforces a 1:1 CPU-to-queue mapping. This
patch introduces the function netpolicy_update_sys_map to create that
mapping. The result is stored in netpolicy_sys_info.

If the CPU count and queue count differ, the remaining CPUs/queues are
left unused for now.

CPU hotplug, device hotplug, or ethtool may change the CPU or queue
count. In those cases, this function can be called again to rebuild
the mapping. These cases are handled later in this series.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netpolicy.h | 18 ++++++++++++
 net/core/netpolicy.c      | 74 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 92 insertions(+)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index fc87d9b..a946b75c 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -30,9 +30,27 @@ struct netpolicy_dev_info {
 	u32	*tx_irq;
 };
 
+struct netpolicy_sys_map {
+	u32	cpu;
+	u32	queue;
+	u32	irq;
+};
+
+struct netpolicy_sys_info {
+	/*
+	 * Record the cpu and queue 1:1 mapping
+	 */
+	u32				avail_rx_num;
+	struct netpolicy_sys_map	*rx;
+	u32				avail_tx_num;
+	struct netpolicy_sys_map	*tx;
+};
+
 struct netpolicy_info {
 	enum netpolicy_name	cur_policy;
 	unsigned long avail_policy[BITS_TO_LONGS(NET_POLICY_MAX)];
+	/* cpu and queue mapping information */
+	struct netpolicy_sys_info	sys_info;
 };
 
 #endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 31c41ca..0972341 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -55,6 +55,80 @@ static u32 netpolicy_get_cpu_information(void)
 	return num_online_cpus();
 }
 
+static void netpolicy_free_sys_map(struct net_device *dev)
+{
+	struct netpolicy_sys_info *s_info = &dev->netpolicy->sys_info;
+
+	kfree(s_info->rx);
+	s_info->rx = NULL;
+	s_info->avail_rx_num = 0;
+	kfree(s_info->tx);
+	s_info->tx = NULL;
+	s_info->avail_tx_num = 0;
+}
+
+static int netpolicy_update_sys_map(struct net_device *dev,
+				    struct netpolicy_dev_info *d_info,
+				    u32 cpu)
+{
+	struct netpolicy_sys_info *s_info = &dev->netpolicy->sys_info;
+	u32 num, i, online_cpu;
+	cpumask_var_t cpumask;
+
+	if (!alloc_cpumask_var(&cpumask, GFP_ATOMIC))
+		return -ENOMEM;
+
+	/* update rx cpu map */
+	if (cpu >= d_info->rx_num)
+		num = d_info->rx_num;
+	else
+		num = cpu;
+
+	s_info->avail_rx_num = num;
+	s_info->rx = kcalloc(num, sizeof(*s_info->rx), GFP_ATOMIC);
+	if (!s_info->rx)
+		goto err;
+	cpumask_copy(cpumask, cpu_online_mask);
+
+	i = 0;
+	for_each_cpu(online_cpu, cpumask) {
+		if (i == num)
+			break;
+		s_info->rx[i].cpu = online_cpu;
+		s_info->rx[i].queue = i;
+		s_info->rx[i].irq = d_info->rx_irq[i];
+		i++;
+	}
+
+	/* update tx cpu map */
+	if (cpu >= d_info->tx_num)
+		num = d_info->tx_num;
+	else
+		num = cpu;
+
+	s_info->avail_tx_num = num;
+	s_info->tx = kcalloc(num, sizeof(*s_info->tx), GFP_ATOMIC);
+	if (!s_info->tx)
+		goto err;
+
+	i = 0;
+	for_each_cpu(online_cpu, cpumask) {
+		if (i == num)
+			break;
+		s_info->tx[i].cpu = online_cpu;
+		s_info->tx[i].queue = i;
+		s_info->tx[i].irq = d_info->tx_irq[i];
+		i++;
+	}
+
+	free_cpumask_var(cpumask);
+	return 0;
+err:
+	netpolicy_free_sys_map(dev);
+	free_cpumask_var(cpumask);
+	return -ENOMEM;
+}
+
 const char *policy_name[NET_POLICY_MAX] = {
 	"NONE"
 };
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 44+ messages in thread
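
The pairing rule in netpolicy_update_sys_map() above can be sketched in userspace. This is an illustration only: cpu_ids stands in for iterating cpu_online_mask, and build_map is a hypothetical name. The first min(n_cpu, n_queue) CPUs are paired 1:1 with queues 0..n-1; leftover CPUs or queues stay unused.

```c
#include <assert.h>

/* Userspace model of one entry of the sys map added by the patch. */
struct sys_map {
	unsigned int cpu;
	unsigned int queue;
};

/*
 * Pair the first min(n_cpu, n_queue) CPUs with queues 0..n-1, as
 * netpolicy_update_sys_map() does for the rx and tx directions.
 */
static unsigned int build_map(struct sys_map *map,
			      const unsigned int *cpu_ids,
			      unsigned int n_cpu, unsigned int n_queue)
{
	unsigned int n = n_cpu < n_queue ? n_cpu : n_queue;
	unsigned int i;

	for (i = 0; i < n; i++) {
		map[i].cpu = cpu_ids[i];
		map[i].queue = i;
	}
	return n;	/* becomes avail_rx_num / avail_tx_num */
}
```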

* [RFC V3 PATCH 06/26] net/netpolicy: set and remove IRQ affinity
  2016-09-12 14:55 [RFC V3 PATCH 00/26] Kernel NET policy kan.liang
                   ` (4 preceding siblings ...)
  2016-09-12 14:55 ` [RFC V3 PATCH 05/26] net/netpolicy: create CPU and queue mapping kan.liang
@ 2016-09-12 14:55 ` kan.liang
  2016-09-12 14:55 ` [RFC V3 PATCH 07/26] net/netpolicy: enable and disable NET policy kan.liang
                   ` (21 subsequent siblings)
  27 siblings, 0 replies; 44+ messages in thread
From: kan.liang @ 2016-09-12 14:55 UTC (permalink / raw)
  To: davem, linux-kernel, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
	hannes, stephen, alexei.starovoitov, jesse.brandeburg, andi,
	Kan Liang

From: Kan Liang <kan.liang@intel.com>

This patch introduces functions to set and remove IRQ affinity
according to the CPU/queue mapping.

The functions do not record the previous affinity state. After a
set/remove cycle, the affinity is reset to all online CPUs and IRQ
balancing is re-enabled.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 net/core/netpolicy.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 0972341..adcb5e3 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -29,6 +29,7 @@
 #include <linux/kernel.h>
 #include <linux/errno.h>
 #include <linux/init.h>
+#include <linux/irq.h>
 #include <linux/seq_file.h>
 #include <linux/proc_fs.h>
 #include <linux/uaccess.h>
@@ -129,6 +130,38 @@ err:
 	return -ENOMEM;
 }
 
+static void netpolicy_clear_affinity(struct net_device *dev)
+{
+	struct netpolicy_sys_info *s_info = &dev->netpolicy->sys_info;
+	u32 i;
+
+	for (i = 0; i < s_info->avail_rx_num; i++) {
+		irq_clear_status_flags(s_info->rx[i].irq, IRQ_NO_BALANCING);
+		irq_set_affinity_hint(s_info->rx[i].irq, cpu_online_mask);
+	}
+
+	for (i = 0; i < s_info->avail_tx_num; i++) {
+		irq_clear_status_flags(s_info->tx[i].irq, IRQ_NO_BALANCING);
+		irq_set_affinity_hint(s_info->tx[i].irq, cpu_online_mask);
+	}
+}
+
+static void netpolicy_set_affinity(struct net_device *dev)
+{
+	struct netpolicy_sys_info *s_info = &dev->netpolicy->sys_info;
+	u32 i;
+
+	for (i = 0; i < s_info->avail_rx_num; i++) {
+		irq_set_status_flags(s_info->rx[i].irq, IRQ_NO_BALANCING);
+		irq_set_affinity_hint(s_info->rx[i].irq, cpumask_of(s_info->rx[i].cpu));
+	}
+
+	for (i = 0; i < s_info->avail_tx_num; i++) {
+		irq_set_status_flags(s_info->tx[i].irq, IRQ_NO_BALANCING);
+		irq_set_affinity_hint(s_info->tx[i].irq, cpumask_of(s_info->tx[i].cpu));
+	}
+}
+
 const char *policy_name[NET_POLICY_MAX] = {
 	"NONE"
 };
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 44+ messages in thread
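
A userspace analogue of the per-IRQ pinning above: the single-CPU mask that irq_set_affinity_hint(irq, cpumask_of(cpu)) requests, formatted as the hex string an administrator would write to /proc/irq/&lt;n&gt;/smp_affinity. The helper name is hypothetical, and the sketch assumes cpu < 32 for brevity (real affinity masks are comma-separated 32-bit words).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the affinity mask for a single CPU as the hex string used by
 * /proc/irq/<n>/smp_affinity. Assumes cpu < 32.
 */
static void cpu_affinity_mask(unsigned int cpu, char *buf, size_t len)
{
	snprintf(buf, len, "%x", 1u << cpu);
}
```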

* [RFC V3 PATCH 07/26] net/netpolicy: enable and disable NET policy
  2016-09-12 14:55 [RFC V3 PATCH 00/26] Kernel NET policy kan.liang
                   ` (5 preceding siblings ...)
  2016-09-12 14:55 ` [RFC V3 PATCH 06/26] net/netpolicy: set and remove IRQ affinity kan.liang
@ 2016-09-12 14:55 ` kan.liang
  2016-09-12 14:55 ` [RFC V3 PATCH 08/26] net/netpolicy: introduce NET policy object kan.liang
                   ` (20 subsequent siblings)
  27 siblings, 0 replies; 44+ messages in thread
From: kan.liang @ 2016-09-12 14:55 UTC (permalink / raw)
  To: davem, linux-kernel, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
	hannes, stephen, alexei.starovoitov, jesse.brandeburg, andi,
	Kan Liang

From: Kan Liang <kan.liang@intel.com>

This patch introduces functions to enable and disable NET policy.

For enabling, it collects device and CPU information and sets up the
CPU/queue mapping. Some drivers, such as i40e, get better performance
when IRQ affinity is set, so this patch provides an option,
irq_affinity, that lets a driver request IRQ affinity to be set when
NET policy is enabled.

For disabling, it removes the IRQ affinity (if set) and the mapping
information.

np_lock should protect the enable and disable state. This will be done
later in this series.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netpolicy.h |  1 +
 net/core/netpolicy.c      | 41 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 42 insertions(+)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index a946b75c..bfab7b8 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -49,6 +49,7 @@ struct netpolicy_sys_info {
 struct netpolicy_info {
 	enum netpolicy_name	cur_policy;
 	unsigned long avail_policy[BITS_TO_LONGS(NET_POLICY_MAX)];
+	bool irq_affinity;
 	/* cpu and queue mapping information */
 	struct netpolicy_sys_info	sys_info;
 };
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index adcb5e3..a6e240f 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -162,6 +162,47 @@ static void netpolicy_set_affinity(struct net_device *dev)
 	}
 }
 
+static int netpolicy_disable(struct net_device *dev)
+{
+	if (dev->netpolicy->irq_affinity)
+		netpolicy_clear_affinity(dev);
+	netpolicy_free_sys_map(dev);
+
+	return 0;
+}
+
+static int netpolicy_enable(struct net_device *dev)
+{
+	int ret;
+	struct netpolicy_dev_info d_info;
+	u32 cpu;
+
+	if (WARN_ON(!dev->netpolicy))
+		return -EINVAL;
+
+	/* get driver information */
+	ret = netpolicy_get_dev_info(dev, &d_info);
+	if (ret)
+		return ret;
+
+	/* get cpu information */
+	cpu = netpolicy_get_cpu_information();
+
+	/* create sys map */
+	ret = netpolicy_update_sys_map(dev, &d_info, cpu);
+	if (ret) {
+		netpolicy_free_dev_info(&d_info);
+		return ret;
+	}
+
+	/* set irq affinity */
+	if (dev->netpolicy->irq_affinity)
+		netpolicy_set_affinity(dev);
+
+	netpolicy_free_dev_info(&d_info);
+	return 0;
+}
+
 const char *policy_name[NET_POLICY_MAX] = {
 	"NONE"
 };
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [RFC V3 PATCH 08/26] net/netpolicy: introduce NET policy object
  2016-09-12 14:55 [RFC V3 PATCH 00/26] Kernel NET policy kan.liang
                   ` (6 preceding siblings ...)
  2016-09-12 14:55 ` [RFC V3 PATCH 07/26] net/netpolicy: enable and disable NET policy kan.liang
@ 2016-09-12 14:55 ` kan.liang
  2016-09-12 14:55 ` [RFC V3 PATCH 09/26] net/netpolicy: set NET policy by policy name kan.liang
                   ` (19 subsequent siblings)
  27 siblings, 0 replies; 44+ messages in thread
From: kan.liang @ 2016-09-12 14:55 UTC (permalink / raw)
  To: davem, linux-kernel, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
	hannes, stephen, alexei.starovoitov, jesse.brandeburg, andi,
	Kan Liang

From: Kan Liang <kan.liang@intel.com>

This patch introduces the concept of a NET policy object and
per-policy object lists.

A NET policy object is an instance of a CPU/queue mapping. An object
can be shared between different tasks/sockets, so besides the CPU and
queue information it also maintains a reference counter.

Each policy has a dedicated object list. When a policy is set as the
device policy, all objects are inserted into the corresponding policy
object list. Users later search the list and pick up the available
objects.

Network performance can differ between objects because of queue and
CPU topology. To generate a properly ordered object list, device
location, hyper-threading, and CPU topology have to be considered; the
high performance objects are placed at the front of the list.

The object lists are regenerated whenever the system mapping or the
device's net policy changes.

The lock np_ob_list_lock protects the object lists.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netdevice.h |   2 +
 include/linux/netpolicy.h |  15 +++
 net/core/netpolicy.c      | 238 +++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 254 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e1b5685..8fcea13 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1634,6 +1634,7 @@ enum netdev_priv_flags {
  *	@proc_dev:	device node in proc to configure device net policy
  *	@netpolicy:	NET policy related information of net device
  *	@np_lock:	protect the state of NET policy
+ *	@np_ob_list_lock:	protect the net policy object list
  *
  *	FIXME: cleanup struct net_device such that network protocol info
  *	moves out.
@@ -1908,6 +1909,7 @@ struct net_device {
 #endif /* CONFIG_PROC_FS */
 	struct netpolicy_info	*netpolicy;
 	spinlock_t		np_lock;
+	spinlock_t		np_ob_list_lock;
 #endif /* CONFIG_NETPOLICY */
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index bfab7b8..1c89dda 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -21,6 +21,12 @@ enum netpolicy_name {
 	NET_POLICY_MAX,
 };
 
+enum netpolicy_traffic {
+	NETPOLICY_RX		= 0,
+	NETPOLICY_TX,
+	NETPOLICY_RXTX,
+};
+
 extern const char *policy_name[];
 
 struct netpolicy_dev_info {
@@ -46,12 +52,21 @@ struct netpolicy_sys_info {
 	struct netpolicy_sys_map	*tx;
 };
 
+struct netpolicy_object {
+	struct list_head	list;
+	u32			cpu;
+	u32			queue;
+	atomic_t		refcnt;
+};
+
 struct netpolicy_info {
 	enum netpolicy_name	cur_policy;
 	unsigned long avail_policy[BITS_TO_LONGS(NET_POLICY_MAX)];
 	bool irq_affinity;
 	/* cpu and queue mapping information */
 	struct netpolicy_sys_info	sys_info;
+	/* List of policy objects 0 rx 1 tx */
+	struct list_head	obj_list[NETPOLICY_RXTX][NET_POLICY_MAX];
 };
 
 #endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index a6e240f..b330cf3 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -36,6 +36,7 @@
 #include <linux/netdevice.h>
 #include <net/net_namespace.h>
 #include <net/rtnetlink.h>
+#include <linux/sort.h>
 
 static int netpolicy_get_dev_info(struct net_device *dev,
 				  struct netpolicy_dev_info *d_info)
@@ -162,11 +163,31 @@ static void netpolicy_set_affinity(struct net_device *dev)
 	}
 }
 
+static void netpolicy_free_obj_list(struct net_device *dev)
+{
+	int i, j;
+	struct netpolicy_object *obj, *tmp;
+
+	spin_lock(&dev->np_ob_list_lock);
+	for (i = 0; i < NETPOLICY_RXTX; i++) {
+		for (j = NET_POLICY_NONE; j < NET_POLICY_MAX; j++) {
+			if (list_empty(&dev->netpolicy->obj_list[i][j]))
+				continue;
+			list_for_each_entry_safe(obj, tmp, &dev->netpolicy->obj_list[i][j], list) {
+				list_del(&obj->list);
+				kfree(obj);
+			}
+		}
+	}
+	spin_unlock(&dev->np_ob_list_lock);
+}
+
 static int netpolicy_disable(struct net_device *dev)
 {
 	if (dev->netpolicy->irq_affinity)
 		netpolicy_clear_affinity(dev);
 	netpolicy_free_sys_map(dev);
+	netpolicy_free_obj_list(dev);
 
 	return 0;
 }
@@ -206,6 +227,213 @@ static int netpolicy_enable(struct net_device *dev)
 const char *policy_name[NET_POLICY_MAX] = {
 	"NONE"
 };
+
+static u32 cpu_to_queue(struct net_device *dev,
+			u32 cpu, bool is_rx)
+{
+	struct netpolicy_sys_info *s_info = &dev->netpolicy->sys_info;
+	int i;
+
+	if (is_rx) {
+		for (i = 0; i < s_info->avail_rx_num; i++) {
+			if (s_info->rx[i].cpu == cpu)
+				return s_info->rx[i].queue;
+		}
+	} else {
+		for (i = 0; i < s_info->avail_tx_num; i++) {
+			if (s_info->tx[i].cpu == cpu)
+				return s_info->tx[i].queue;
+		}
+	}
+
+	return ~0;
+}
+
+static int netpolicy_add_obj(struct net_device *dev,
+			     u32 cpu, bool is_rx,
+			     enum netpolicy_name policy)
+{
+	struct netpolicy_object *obj;
+	int dir = is_rx ? NETPOLICY_RX : NETPOLICY_TX;
+
+	obj = kzalloc(sizeof(*obj), GFP_ATOMIC);
+	if (!obj)
+		return -ENOMEM;
+	obj->cpu = cpu;
+	obj->queue = cpu_to_queue(dev, cpu, is_rx);
+	list_add_tail(&obj->list, &dev->netpolicy->obj_list[dir][policy]);
+
+	return 0;
+}
+
+struct sort_node {
+	int	node;
+	int	distance;
+};
+
+static inline int node_distance_cmp(const void *a, const void *b)
+{
+	const struct sort_node *_a = a;
+	const struct sort_node *_b = b;
+
+	return _a->distance - _b->distance;
+}
+
+static int _netpolicy_gen_obj_list(struct net_device *dev, bool is_rx,
+				   enum netpolicy_name policy,
+				   struct sort_node *nodes, int num_node,
+				   struct cpumask *node_avail_cpumask)
+{
+	cpumask_var_t node_tmp_cpumask, sibling_tmp_cpumask;
+	struct cpumask *node_assigned_cpumask;
+	int i, ret = -ENOMEM;
+	u32 cpu;
+
+	if (!alloc_cpumask_var(&node_tmp_cpumask, GFP_ATOMIC))
+		return ret;
+	if (!alloc_cpumask_var(&sibling_tmp_cpumask, GFP_ATOMIC))
+		goto alloc_fail1;
+
+	node_assigned_cpumask = kcalloc(num_node, sizeof(struct cpumask), GFP_ATOMIC);
+	if (!node_assigned_cpumask)
+		goto alloc_fail2;
+
+	/* Don't share physical core */
+	for (i = 0; i < num_node; i++) {
+		if (cpumask_weight(&node_avail_cpumask[nodes[i].node]) == 0)
+			continue;
+		spin_lock(&dev->np_ob_list_lock);
+		cpumask_copy(node_tmp_cpumask, &node_avail_cpumask[nodes[i].node]);
+		while (cpumask_weight(node_tmp_cpumask)) {
+			cpu = cpumask_first(node_tmp_cpumask);
+
+			/* push to obj list */
+			ret = netpolicy_add_obj(dev, cpu, is_rx, policy);
+			if (ret) {
+				spin_unlock(&dev->np_ob_list_lock);
+				goto err;
+			}
+
+			cpumask_set_cpu(cpu, &node_assigned_cpumask[nodes[i].node]);
+			cpumask_and(sibling_tmp_cpumask, node_tmp_cpumask, topology_sibling_cpumask(cpu));
+			cpumask_xor(node_tmp_cpumask, node_tmp_cpumask, sibling_tmp_cpumask);
+		}
+		spin_unlock(&dev->np_ob_list_lock);
+	}
+
+	for (i = 0; i < num_node; i++) {
+		cpumask_xor(node_tmp_cpumask, &node_avail_cpumask[nodes[i].node], &node_assigned_cpumask[nodes[i].node]);
+		if (cpumask_weight(node_tmp_cpumask) == 0)
+			continue;
+		spin_lock(&dev->np_ob_list_lock);
+		for_each_cpu(cpu, node_tmp_cpumask) {
+			/* push to obj list */
+			ret = netpolicy_add_obj(dev, cpu, is_rx, policy);
+			if (ret) {
+				spin_unlock(&dev->np_ob_list_lock);
+				goto err;
+			}
+			cpumask_set_cpu(cpu, &node_assigned_cpumask[nodes[i].node]);
+		}
+		spin_unlock(&dev->np_ob_list_lock);
+	}
+
+err:
+	kfree(node_assigned_cpumask);
+alloc_fail2:
+	free_cpumask_var(sibling_tmp_cpumask);
+alloc_fail1:
+	free_cpumask_var(node_tmp_cpumask);
+
+	return ret;
+}
+
+static int netpolicy_gen_obj_list(struct net_device *dev,
+				  enum netpolicy_name policy)
+{
+	struct netpolicy_sys_info *s_info = &dev->netpolicy->sys_info;
+	struct cpumask *node_avail_cpumask;
+	struct sort_node *nodes;
+	int i, ret, node = 0;
+	int num_nodes = 1;
+	u32 cpu;
+#ifdef CONFIG_NUMA
+	int dev_node = 0;
+	int val;
+#endif
+	/* The network performance for objects could be different
+	 * because of the queue and cpu topology.
+	 * The objects will be ordered accordingly,
+	 * with the high performance objects at the front.
+	 *
+	 * The priority rules are as follows:
+	 * - The local object. (Local means cpu and queue are in the same node.)
+	 * - The cpu in the object is the only logical core in the physical core.
+	 *   The sibling core's object has not been added to the object list yet.
+	 * - The rest of objects
+	 *
+	 * So the order of object list is as below:
+	 * 1. Local core + the only logical core
+	 * 2. Remote core + the only logical core
+	 * 3. Local core + the core's sibling is already in the object list
+	 * 4. Remote core + the core's sibling is already in the object list
+	 */
+#ifdef CONFIG_NUMA
+	dev_node = dev_to_node(dev->dev.parent);
+	num_nodes = num_online_nodes();
+#endif
+
+	nodes = kcalloc(num_nodes, sizeof(*nodes), GFP_ATOMIC);
+	if (!nodes)
+		return -ENOMEM;
+
+	node_avail_cpumask = kcalloc(num_nodes, sizeof(struct cpumask), GFP_ATOMIC);
+	if (!node_avail_cpumask) {
+		kfree(nodes);
+		return -ENOMEM;
+	}
+
+#ifdef CONFIG_NUMA
+	/* order the node from near to far */
+	for_each_node_mask(i, node_online_map) {
+		val = node_distance(dev_node, i);
+		nodes[node].node = i;
+		nodes[node].distance = val;
+		node++;
+	}
+	sort(nodes, num_nodes, sizeof(*nodes),
+	     node_distance_cmp, NULL);
+#else
+	nodes[0].node = 0;
+#endif
+
+	for (i = 0; i < s_info->avail_rx_num; i++) {
+		cpu = s_info->rx[i].cpu;
+		cpumask_set_cpu(cpu, &node_avail_cpumask[cpu_to_node(cpu)]);
+	}
+	ret = _netpolicy_gen_obj_list(dev, true, policy, nodes,
+				      num_nodes, node_avail_cpumask);
+	if (ret)
+		goto err;
+
+	for (i = 0; i < num_nodes; i++)
+		cpumask_clear(&node_avail_cpumask[nodes[i].node]);
+
+	for (i = 0; i < s_info->avail_tx_num; i++) {
+		cpu = s_info->tx[i].cpu;
+		cpumask_set_cpu(cpu, &node_avail_cpumask[cpu_to_node(cpu)]);
+	}
+	ret = _netpolicy_gen_obj_list(dev, false, policy, nodes,
+				      num_nodes, node_avail_cpumask);
+	if (ret)
+		goto err;
+
+err:
+	kfree(nodes);
+	kfree(node_avail_cpumask);
+	return ret;
+}
+
 #ifdef CONFIG_PROC_FS
 
 static int net_policy_proc_show(struct seq_file *m, void *v)
@@ -261,7 +489,7 @@ static int netpolicy_proc_dev_init(struct net *net, struct net_device *dev)
 
 int init_netpolicy(struct net_device *dev)
 {
-	int ret;
+	int ret, i, j;
 
 	spin_lock(&dev->np_lock);
 	ret = 0;
@@ -284,7 +512,15 @@ int init_netpolicy(struct net_device *dev)
 	if (ret) {
 		kfree(dev->netpolicy);
 		dev->netpolicy = NULL;
+		goto unlock;
+	}
+
+	spin_lock(&dev->np_ob_list_lock);
+	for (i = 0; i < NETPOLICY_RXTX; i++) {
+		for (j = NET_POLICY_NONE; j < NET_POLICY_MAX; j++)
+			INIT_LIST_HEAD(&dev->netpolicy->obj_list[i][j]);
 	}
+	spin_unlock(&dev->np_ob_list_lock);
 
 unlock:
 	spin_unlock(&dev->np_lock);
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 44+ messages in thread
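
The four-tier ordering described in the commit message above can be sketched with two boolean properties per object. This comparator is an illustration only; the patch builds the order incrementally with cpumask operations (topology_sibling_cpumask and node distance sorting) rather than by sorting finished objects, and the struct/field names here are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified object: is it on the device's node, and is its CPU the
 * only logical core of its physical core not yet listed? */
struct obj {
	int local;		/* cpu and queue on the device's node */
	int sole_logical;	/* no sibling hyper-thread listed yet */
};

static int obj_rank(const struct obj *o)
{
	if (o->sole_logical)
		return o->local ? 0 : 1;	/* tiers 1 and 2 */
	return o->local ? 2 : 3;		/* tiers 3 and 4 */
}

/* qsort comparator: lower tier (better expected performance) first. */
static int obj_cmp(const void *a, const void *b)
{
	return obj_rank(a) - obj_rank(b);
}
```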

* [RFC V3 PATCH 09/26] net/netpolicy: set NET policy by policy name
  2016-09-12 14:55 [RFC V3 PATCH 00/26] Kernel NET policy kan.liang
                   ` (7 preceding siblings ...)
  2016-09-12 14:55 ` [RFC V3 PATCH 08/26] net/netpolicy: introduce NET policy object kan.liang
@ 2016-09-12 14:55 ` kan.liang
  2016-09-12 14:55 ` [RFC V3 PATCH 10/26] net/netpolicy: add three new NET policies kan.liang
                   ` (18 subsequent siblings)
  27 siblings, 0 replies; 44+ messages in thread
From: kan.liang @ 2016-09-12 14:55 UTC (permalink / raw)
  To: davem, linux-kernel, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
	hannes, stephen, alexei.starovoitov, jesse.brandeburg, andi,
	Kan Liang

From: Kan Liang <kan.liang@intel.com>

Users can write a policy name to /proc/net/netpolicy/$DEV/policy to
enable net policy for a specific device.

When a policy is enabled, the subsystem automatically disables IRQ
balancing and sets IRQ affinity. The object lists are also generated
accordingly.

It is the device driver's responsibility to apply the driver-specific
configuration for the given policy.

np_lock will be used to protect the state.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netdevice.h |  5 +++
 include/linux/netpolicy.h |  1 +
 net/core/netpolicy.c      | 95 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 101 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 8fcea13..3bfa5df 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1128,6 +1128,9 @@ struct netdev_xdp {
  * int (*ndo_get_irq_info)(struct net_device *dev,
  *			   struct netpolicy_dev_info *info);
  *	This function is used to get irq information of rx and tx queues
+ * int (*ndo_set_net_policy)(struct net_device *dev,
+ *			     enum netpolicy_name name);
+ *	This function is used to set per device net policy by name
  *
  */
 struct net_device_ops {
@@ -1319,6 +1322,8 @@ struct net_device_ops {
 						      struct netpolicy_info *info);
 	int			(*ndo_get_irq_info)(struct net_device *dev,
 						    struct netpolicy_dev_info *info);
+	int			(*ndo_set_net_policy)(struct net_device *dev,
+						      enum netpolicy_name name);
 #endif /* CONFIG_NETPOLICY */
 };
 
diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index 1c89dda..8596b6a 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -27,6 +27,7 @@ enum netpolicy_traffic {
 	NETPOLICY_RXTX,
 };
 
+#define POLICY_NAME_LEN_MAX	64
 extern const char *policy_name[];
 
 struct netpolicy_dev_info {
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index b330cf3..511d1c6 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -37,6 +37,7 @@
 #include <net/net_namespace.h>
 #include <net/rtnetlink.h>
 #include <linux/sort.h>
+#include <linux/ctype.h>
 
 static int netpolicy_get_dev_info(struct net_device *dev,
 				  struct netpolicy_dev_info *d_info)
@@ -434,6 +435,69 @@ err:
 	return ret;
 }
 
+static int net_policy_set_by_name(char *name, struct net_device *dev)
+{
+	int i, ret;
+
+	spin_lock(&dev->np_lock);
+	ret = 0;
+
+	if (!dev->netpolicy ||
+	    !dev->netdev_ops->ndo_set_net_policy) {
+		ret = -ENOTSUPP;
+		goto unlock;
+	}
+
+	for (i = 0; i < NET_POLICY_MAX; i++) {
+		if (!strncmp(name, policy_name[i], strlen(policy_name[i])))
+			break;
+	}
+
+	if (!test_bit(i, dev->netpolicy->avail_policy)) {
+		ret = -ENOTSUPP;
+		goto unlock;
+	}
+
+	if (i == dev->netpolicy->cur_policy)
+		goto unlock;
+
+	/* If there is no policy applied yet, enable must be done first. */
+	if (dev->netpolicy->cur_policy == NET_POLICY_NONE) {
+		ret = netpolicy_enable(dev);
+		if (ret)
+			goto unlock;
+	}
+
+	netpolicy_free_obj_list(dev);
+
+	/* Generate object list according to policy name */
+	ret = netpolicy_gen_obj_list(dev, i);
+	if (ret)
+		goto err;
+
+	/* set policy */
+	ret = dev->netdev_ops->ndo_set_net_policy(dev, i);
+	if (ret)
+		goto err;
+
+	/* If removing policy, need to do disable. */
+	if (i == NET_POLICY_NONE)
+		netpolicy_disable(dev);
+
+	dev->netpolicy->cur_policy = i;
+
+	spin_unlock(&dev->np_lock);
+	return 0;
+
+err:
+	netpolicy_free_obj_list(dev);
+	if (dev->netpolicy->cur_policy == NET_POLICY_NONE)
+		netpolicy_disable(dev);
+unlock:
+	spin_unlock(&dev->np_lock);
+	return ret;
+}
+
 #ifdef CONFIG_PROC_FS
 
 static int net_policy_proc_show(struct seq_file *m, void *v)
@@ -463,11 +527,40 @@ static int net_policy_proc_open(struct inode *inode, struct file *file)
 	return single_open(file, net_policy_proc_show, PDE_DATA(inode));
 }
 
+static ssize_t net_policy_proc_write(struct file *file, const char __user *buf,
+				     size_t count, loff_t *pos)
+{
+	struct seq_file *m = file->private_data;
+	struct net_device *dev = (struct net_device *)m->private;
+	char name[POLICY_NAME_LEN_MAX];
+	int i, ret;
+
+	if (!dev->netpolicy)
+		return -ENOTSUPP;
+
+	if (count >= POLICY_NAME_LEN_MAX)
+		return -EINVAL;
+
+	if (copy_from_user(name, buf, count))
+		return -EINVAL;
+
+	for (i = 0; i < count; i++)
+		name[i] = toupper(name[i]);
+	name[count] = '\0';
+
+	ret = net_policy_set_by_name(name, dev);
+	if (ret)
+		return ret;
+
+	return count;
+}
+
 static const struct file_operations proc_net_policy_operations = {
 	.open		= net_policy_proc_open,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
 	.release	= seq_release,
+	.write		= net_policy_proc_write,
 	.owner		= THIS_MODULE,
 };
 
@@ -531,6 +624,8 @@ void uninit_netpolicy(struct net_device *dev)
 {
 	spin_lock(&dev->np_lock);
 	if (dev->netpolicy) {
+		if (dev->netpolicy->cur_policy > NET_POLICY_NONE)
+			netpolicy_disable(dev);
 		kfree(dev->netpolicy);
 		dev->netpolicy = NULL;
 	}
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 44+ messages in thread
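
The name-matching step of the proc write path above can be exercised in isolation. This userspace sketch mirrors the uppercase-then-strncmp logic of net_policy_proc_write()/net_policy_set_by_name(); the policy_name table is a local copy and match_policy is a hypothetical name. A trailing '\n' from `echo bulk > .../policy` is left in place, which is why the match uses strncmp() against each policy name's length.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

static const char *policy_name[] = { "NONE", "CPU", "BULK", "LATENCY" };
#define NET_POLICY_MAX 4

/*
 * Uppercase the user's string and return the matching policy index,
 * or NET_POLICY_MAX when nothing matches.
 */
static int match_policy(char *name)
{
	int i;

	for (i = 0; name[i]; i++)
		name[i] = toupper((unsigned char)name[i]);
	for (i = 0; i < NET_POLICY_MAX; i++)
		if (!strncmp(name, policy_name[i], strlen(policy_name[i])))
			return i;
	return NET_POLICY_MAX;
}
```

Note the strncmp() prefix match: an input such as "CPUS" would also select the CPU policy, a looseness inherited from the patch's matching loop.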

* [RFC V3 PATCH 10/26] net/netpolicy: add three new NET policies
  2016-09-12 14:55 [RFC V3 PATCH 00/26] Kernel NET policy kan.liang
                   ` (8 preceding siblings ...)
  2016-09-12 14:55 ` [RFC V3 PATCH 09/26] net/netpolicy: set NET policy by policy name kan.liang
@ 2016-09-12 14:55 ` kan.liang
  2016-09-12 14:55 ` [RFC V3 PATCH 11/26] net/netpolicy: add MIX policy kan.liang
                   ` (17 subsequent siblings)
  27 siblings, 0 replies; 44+ messages in thread
From: kan.liang @ 2016-09-12 14:55 UTC (permalink / raw)
  To: davem, linux-kernel, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
	hannes, stephen, alexei.starovoitov, jesse.brandeburg, andi,
	Kan Liang

From: Kan Liang <kan.liang@intel.com>

Introduce three NET policies:
CPU policy: configured for higher throughput at lower CPU utilization
(power saving).
BULK policy: configured for the highest throughput.
LATENCY policy: configured for the lowest latency.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netpolicy.h | 3 +++
 net/core/netpolicy.c      | 5 ++++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index 8596b6a..951227b 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -18,6 +18,9 @@
 
 enum netpolicy_name {
 	NET_POLICY_NONE		= 0,
+	NET_POLICY_CPU,
+	NET_POLICY_BULK,
+	NET_POLICY_LATENCY,
 	NET_POLICY_MAX,
 };
 
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 511d1c6..1d796a8 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -226,7 +226,10 @@ static int netpolicy_enable(struct net_device *dev)
 }
 
 const char *policy_name[NET_POLICY_MAX] = {
-	"NONE"
+	"NONE",
+	"CPU",
+	"BULK",
+	"LATENCY"
 };
 
 static u32 cpu_to_queue(struct net_device *dev,
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [RFC V3 PATCH 11/26] net/netpolicy: add MIX policy
  2016-09-12 14:55 [RFC V3 PATCH 00/26] Kernel NET policy kan.liang
                   ` (9 preceding siblings ...)
  2016-09-12 14:55 ` [RFC V3 PATCH 10/26] net/netpolicy: add three new NET policies kan.liang
@ 2016-09-12 14:55 ` kan.liang
  2016-09-12 14:55 ` [RFC V3 PATCH 12/26] net/netpolicy: NET device hotplug kan.liang
                   ` (16 subsequent siblings)
  27 siblings, 0 replies; 44+ messages in thread
From: kan.liang @ 2016-09-12 14:55 UTC (permalink / raw)
  To: davem, linux-kernel, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
	hannes, stephen, alexei.starovoitov, jesse.brandeburg, andi,
	Kan Liang

From: Kan Liang <kan.liang@intel.com>

MIX policy is a combination of other policies. It allows different
queues to have different policies. If MIX policy is applied,
/proc/net/netpolicy/$DEV/policy shows the per-queue policy.

Usually, a workload requires either high throughput or low latency.
So in the current implementation, MIX policy is a combination of
LATENCY policy and BULK policy.

Workloads which require high throughput usually utilize more CPU
resources than workloads which require low latency. This means that if
there is an equal interest in latency and throughput performance, it is
better to reserve more BULK queues than LATENCY queues. In this patch,
MIX policy is forced to include 1/3 LATENCY policy queues and 2/3 BULK
policy queues.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netpolicy.h |   7 +++
 net/core/netpolicy.c      | 139 ++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 136 insertions(+), 10 deletions(-)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index 951227b..f60331d 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -22,6 +22,12 @@ enum netpolicy_name {
 	NET_POLICY_BULK,
 	NET_POLICY_LATENCY,
 	NET_POLICY_MAX,
+
+	/*
+	 * Mixture of the above policy
+	 * Can only be set as global policy.
+	 */
+	NET_POLICY_MIX,
 };
 
 enum netpolicy_traffic {
@@ -67,6 +73,7 @@ struct netpolicy_info {
 	enum netpolicy_name	cur_policy;
 	unsigned long avail_policy[BITS_TO_LONGS(NET_POLICY_MAX)];
 	bool irq_affinity;
+	bool has_mix_policy;
 	/* cpu and queue mapping information */
 	struct netpolicy_sys_info	sys_info;
 	/* List of policy objects 0 rx 1 tx */
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 1d796a8..f56beca 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -283,6 +283,9 @@ static inline int node_distance_cmp(const void *a, const void *b)
 	return _a->distance - _b->distance;
 }
 
+#define mix_latency_num(num)	((num) / 3)
+#define mix_throughput_num(num)	((num) - mix_latency_num(num))
+
 static int _netpolicy_gen_obj_list(struct net_device *dev, bool is_rx,
 				   enum netpolicy_name policy,
 				   struct sort_node *nodes, int num_node,
@@ -290,7 +293,9 @@ static int _netpolicy_gen_obj_list(struct net_device *dev, bool is_rx,
 {
 	cpumask_var_t node_tmp_cpumask, sibling_tmp_cpumask;
 	struct cpumask *node_assigned_cpumask;
+	int *l_num = NULL, *b_num = NULL;
 	int i, ret = -ENOMEM;
+	int num_node_cpu;
 	u32 cpu;
 
 	if (!alloc_cpumask_var(&node_tmp_cpumask, GFP_ATOMIC))
@@ -302,6 +307,23 @@ static int _netpolicy_gen_obj_list(struct net_device *dev, bool is_rx,
 	if (!node_assigned_cpumask)
 		goto alloc_fail2;
 
+	if (policy == NET_POLICY_MIX) {
+		l_num = kcalloc(num_node, sizeof(int), GFP_ATOMIC);
+		if (!l_num)
+			goto alloc_fail3;
+		b_num = kcalloc(num_node, sizeof(int), GFP_ATOMIC);
+		if (!b_num) {
+			kfree(l_num);
+			goto alloc_fail3;
+		}
+
+		for (i = 0; i < num_node; i++) {
+			num_node_cpu = cpumask_weight(&node_avail_cpumask[nodes[i].node]);
+			l_num[i] = mix_latency_num(num_node_cpu);
+			b_num[i] = mix_throughput_num(num_node_cpu);
+		}
+	}
+
 	/* Don't share physical core */
 	for (i = 0; i < num_node; i++) {
 		if (cpumask_weight(&node_avail_cpumask[nodes[i].node]) == 0)
@@ -312,7 +334,13 @@ static int _netpolicy_gen_obj_list(struct net_device *dev, bool is_rx,
 			cpu = cpumask_first(node_tmp_cpumask);
 
 			/* push to obj list */
-			ret = netpolicy_add_obj(dev, cpu, is_rx, policy);
+			if (policy == NET_POLICY_MIX) {
+				if (l_num[i]-- > 0)
+					ret = netpolicy_add_obj(dev, cpu, is_rx, NET_POLICY_LATENCY);
+				else if (b_num[i]-- > 0)
+					ret = netpolicy_add_obj(dev, cpu, is_rx, NET_POLICY_BULK);
+			} else
+				ret = netpolicy_add_obj(dev, cpu, is_rx, policy);
 			if (ret) {
 				spin_unlock(&dev->np_ob_list_lock);
 				goto err;
@@ -325,6 +353,41 @@ static int _netpolicy_gen_obj_list(struct net_device *dev, bool is_rx,
 		spin_unlock(&dev->np_ob_list_lock);
 	}
 
+	if (policy == NET_POLICY_MIX) {
+		struct netpolicy_object *obj;
+		int dir = is_rx ? 0 : 1;
+		u32 sibling;
+
+		/* if have to share core, choose latency core first. */
+		for (i = 0; i < num_node; i++) {
+			if ((l_num[i] < 1) && (b_num[i] < 1))
+				continue;
+			spin_lock(&dev->np_ob_list_lock);
+			list_for_each_entry(obj, &dev->netpolicy->obj_list[dir][NET_POLICY_LATENCY], list) {
+				if (cpu_to_node(obj->cpu) != nodes[i].node)
+					continue;
+
+				cpu = obj->cpu;
+				for_each_cpu(sibling, topology_sibling_cpumask(cpu)) {
+					if (cpumask_test_cpu(sibling, &node_assigned_cpumask[nodes[i].node]) ||
+					    !cpumask_test_cpu(sibling, &node_avail_cpumask[nodes[i].node]))
+						continue;
+
+					if (l_num[i]-- > 0)
+						ret = netpolicy_add_obj(dev, sibling, is_rx, NET_POLICY_LATENCY);
+					else if (b_num[i]-- > 0)
+						ret = netpolicy_add_obj(dev, sibling, is_rx, NET_POLICY_BULK);
+					if (ret) {
+						spin_unlock(&dev->np_ob_list_lock);
+						goto err;
+					}
+					cpumask_set_cpu(sibling, &node_assigned_cpumask[nodes[i].node]);
+				}
+			}
+			spin_unlock(&dev->np_ob_list_lock);
+		}
+	}
+
 	for (i = 0; i < num_node; i++) {
 		cpumask_xor(node_tmp_cpumask, &node_avail_cpumask[nodes[i].node], &node_assigned_cpumask[nodes[i].node]);
 		if (cpumask_weight(node_tmp_cpumask) == 0)
@@ -332,7 +395,15 @@ static int _netpolicy_gen_obj_list(struct net_device *dev, bool is_rx,
 		spin_lock(&dev->np_ob_list_lock);
 		for_each_cpu(cpu, node_tmp_cpumask) {
 			/* push to obj list */
-			ret = netpolicy_add_obj(dev, cpu, is_rx, policy);
+			if (policy == NET_POLICY_MIX) {
+				if (l_num[i]-- > 0)
+					ret = netpolicy_add_obj(dev, cpu, is_rx, NET_POLICY_LATENCY);
+				else if (b_num[i]-- > 0)
+					ret = netpolicy_add_obj(dev, cpu, is_rx, NET_POLICY_BULK);
+				else
+					ret = netpolicy_add_obj(dev, cpu, is_rx, NET_POLICY_NONE);
+			} else
+				ret = netpolicy_add_obj(dev, cpu, is_rx, policy);
 			if (ret) {
 				spin_unlock(&dev->np_ob_list_lock);
 				goto err;
@@ -343,6 +414,11 @@ static int _netpolicy_gen_obj_list(struct net_device *dev, bool is_rx,
 	}
 
 err:
+	if (policy == NET_POLICY_MIX) {
+		kfree(l_num);
+		kfree(b_num);
+	}
+alloc_fail3:
 	kfree(node_assigned_cpumask);
 alloc_fail2:
 	free_cpumask_var(sibling_tmp_cpumask);
@@ -381,6 +457,22 @@ static int netpolicy_gen_obj_list(struct net_device *dev,
 	 * 2. Remote core + the only logical core
 	 * 3. Local core + the core's sibling is already in the object list
 	 * 4. Remote core + the core's sibling is already in the object list
+	 *
+	 * For MIX policy, on each node, force 1/3 of the cores to be latency
+	 * policy cores; the rest are bulk policy cores.
+	 *
+	 * Besides the above priority rules, there is one more rule:
+	 * - If a sibling core's object has already been assigned a policy,
+	 *   choose the object whose sibling logical core applies latency policy first.
+	 *
+	 * So the order of object list for MIX policy is as below:
+	 * 1. Local core + the only logical core
+	 * 2. Remote core + the only logical core
+	 * 3. Local core + the core's sibling is latency policy core
+	 * 4. Remote core + the core's sibling is latency policy core
+	 * 5. Local core + the core's sibling is bulk policy core
+	 * 6. Remote core + the core's sibling is bulk policy core
+	 *
 	 */
 #ifdef CONFIG_NUMA
 	dev_node = dev_to_node(dev->dev.parent);
@@ -451,14 +543,23 @@ static int net_policy_set_by_name(char *name, struct net_device *dev)
 		goto unlock;
 	}
 
-	for (i = 0; i < NET_POLICY_MAX; i++) {
-		if (!strncmp(name, policy_name[i], strlen(policy_name[i])))
-		break;
-	}
+	if (!strncmp(name, "MIX", strlen("MIX"))) {
+		if (dev->netpolicy->has_mix_policy) {
+			i = NET_POLICY_MIX;
+		} else {
+			ret = -ENOTSUPP;
+			goto unlock;
+		}
+	} else {
+		for (i = 0; i < NET_POLICY_MAX; i++) {
+			if (!strncmp(name, policy_name[i], strlen(policy_name[i])))
+			break;
+		}
 
-	if (!test_bit(i, dev->netpolicy->avail_policy)) {
-		ret = -ENOTSUPP;
-		goto unlock;
+		if (!test_bit(i, dev->netpolicy->avail_policy)) {
+			ret = -ENOTSUPP;
+			goto unlock;
+		}
 	}
 
 	if (i == dev->netpolicy->cur_policy)
@@ -506,17 +607,35 @@ unlock:
 static int net_policy_proc_show(struct seq_file *m, void *v)
 {
 	struct net_device *dev = (struct net_device *)m->private;
+	enum netpolicy_name cur;
+	struct netpolicy_object *obj, *tmp;
 	int i;
 
 	if (WARN_ON(!dev->netpolicy))
 		return -EINVAL;
 
-	if (dev->netpolicy->cur_policy == NET_POLICY_NONE) {
+	cur = dev->netpolicy->cur_policy;
+	if (cur == NET_POLICY_NONE) {
 		seq_printf(m, "%s: There is no policy applied\n", dev->name);
 		seq_printf(m, "%s: The available policy include:", dev->name);
 		for_each_set_bit(i, dev->netpolicy->avail_policy, NET_POLICY_MAX)
 			seq_printf(m, " %s", policy_name[i]);
+		if (dev->netpolicy->has_mix_policy)
+			seq_printf(m, " MIX");
 		seq_printf(m, "\n");
+	} else if (cur == NET_POLICY_MIX) {
+		seq_printf(m, "%s: MIX policy is running on the system\n", dev->name);
+		spin_lock(&dev->np_ob_list_lock);
+		for (i = NET_POLICY_NONE; i < NET_POLICY_MAX; i++) {
+			seq_printf(m, "%s: queues for %s policy\n", dev->name, policy_name[i]);
+			list_for_each_entry_safe(obj, tmp, &dev->netpolicy->obj_list[NETPOLICY_RX][i], list) {
+				seq_printf(m, "%s: rx queue %d\n", dev->name, obj->queue);
+			}
+			list_for_each_entry_safe(obj, tmp, &dev->netpolicy->obj_list[NETPOLICY_TX][i], list) {
+				seq_printf(m, "%s: tx queue %d\n", dev->name, obj->queue);
+			}
+		}
+		spin_unlock(&dev->np_ob_list_lock);
 	} else {
 		seq_printf(m, "%s: POLICY %s is running on the system\n",
 			   dev->name, policy_name[dev->netpolicy->cur_policy]);
-- 
2.5.5


* [RFC V3 PATCH 12/26] net/netpolicy: NET device hotplug
@ 2016-09-12 14:55 ` kan.liang
From: kan.liang @ 2016-09-12 14:55 UTC (permalink / raw)
  To: davem, linux-kernel, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
	hannes, stephen, alexei.starovoitov, jesse.brandeburg, andi,
	Kan Liang

From: Kan Liang <kan.liang@intel.com>

Support NET device up/down/namechange in the NET policy code.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 net/core/netpolicy.c | 66 +++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 58 insertions(+), 8 deletions(-)

diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index f56beca..271ecc3 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -688,6 +688,9 @@ static const struct file_operations proc_net_policy_operations = {
 
 static int netpolicy_proc_dev_init(struct net *net, struct net_device *dev)
 {
+	if (dev->proc_dev)
+		proc_remove(dev->proc_dev);
+
 	dev->proc_dev = proc_net_mkdir(net, dev->name, net->proc_netpolicy);
 	if (!dev->proc_dev)
 		return -ENOMEM;
@@ -754,6 +757,19 @@ void uninit_netpolicy(struct net_device *dev)
 	spin_unlock(&dev->np_lock);
 }
 
+static void netpolicy_dev_init(struct net *net,
+			       struct net_device *dev)
+{
+	if (!init_netpolicy(dev)) {
+#ifdef CONFIG_PROC_FS
+		if (netpolicy_proc_dev_init(net, dev))
+			uninit_netpolicy(dev);
+		else
+#endif /* CONFIG_PROC_FS */
+		pr_info("NETPOLICY: Init net policy for %s\n", dev->name);
+	}
+}
+
 static int __net_init netpolicy_net_init(struct net *net)
 {
 	struct net_device *dev, *aux;
@@ -767,14 +783,7 @@ static int __net_init netpolicy_net_init(struct net *net)
 
 	rtnl_lock();
 	for_each_netdev_safe(net, dev, aux) {
-		if (!init_netpolicy(dev)) {
-#ifdef CONFIG_PROC_FS
-			if (netpolicy_proc_dev_init(net, dev))
-				uninit_netpolicy(dev);
-			else
-#endif /* CONFIG_PROC_FS */
-			pr_info("NETPOLICY: Init net policy for %s\n", dev->name);
-		}
+		netpolicy_dev_init(net, dev);
 	}
 	rtnl_unlock();
 
@@ -799,17 +808,58 @@ static struct pernet_operations netpolicy_net_ops = {
 	.exit = netpolicy_net_exit,
 };
 
+static int netpolicy_notify(struct notifier_block *this,
+			    unsigned long event,
+			    void *ptr)
+{
+	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+
+	switch (event) {
+	case NETDEV_CHANGENAME:
+#ifdef CONFIG_PROC_FS
+		if (dev->proc_dev) {
+			proc_remove(dev->proc_dev);
+			if ((netpolicy_proc_dev_init(dev_net(dev), dev) < 0) &&
+			    dev->proc_dev) {
+				proc_remove(dev->proc_dev);
+				dev->proc_dev = NULL;
+			}
+		}
+#endif
+		break;
+	case NETDEV_UP:
+		netpolicy_dev_init(dev_net(dev), dev);
+		break;
+	case NETDEV_GOING_DOWN:
+		uninit_netpolicy(dev);
+#ifdef CONFIG_PROC_FS
+		proc_remove(dev->proc_dev);
+		dev->proc_dev = NULL;
+#endif
+		break;
+	}
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block netpolicy_dev_notf = {
+	.notifier_call = netpolicy_notify,
+};
+
 static int __init netpolicy_init(void)
 {
 	int ret;
 
 	ret = register_pernet_subsys(&netpolicy_net_ops);
+	if (!ret)
+		register_netdevice_notifier(&netpolicy_dev_notf);
 
 	return ret;
 }
 
 static void __exit netpolicy_exit(void)
 {
+	unregister_netdevice_notifier(&netpolicy_dev_notf);
 	unregister_pernet_subsys(&netpolicy_net_ops);
 }
 
-- 
2.5.5


* [RFC V3 PATCH 13/26] net/netpolicy: support CPU hotplug
@ 2016-09-12 14:55 ` kan.liang
From: kan.liang @ 2016-09-12 14:55 UTC (permalink / raw)
  To: davem, linux-kernel, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
	hannes, stephen, alexei.starovoitov, jesse.brandeburg, andi,
	Kan Liang

From: Kan Liang <kan.liang@intel.com>

For CPU hotplug, the NET policy subsystem will rebuild the sys map and
object list.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 net/core/netpolicy.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 80 insertions(+)

diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 271ecc3..3bf0a44 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -38,6 +38,7 @@
 #include <net/rtnetlink.h>
 #include <linux/sort.h>
 #include <linux/ctype.h>
+#include <linux/cpu.h>
 
 static int netpolicy_get_dev_info(struct net_device *dev,
 				  struct netpolicy_dev_info *d_info)
@@ -846,6 +847,77 @@ static struct notifier_block netpolicy_dev_notf = {
 	.notifier_call = netpolicy_notify,
 };
 
+/**
+ * update_netpolicy_sys_map() - rebuild the sys map and object list
+ *
+ * This function goes through all the available NET policy supported devices
+ * and rebuilds the sys map and object list.
+ *
+ */
+void update_netpolicy_sys_map(void)
+{
+	struct net *net;
+	struct net_device *dev, *aux;
+	enum netpolicy_name cur_policy;
+
+	for_each_net(net) {
+		for_each_netdev_safe(net, dev, aux) {
+			spin_lock(&dev->np_lock);
+			if (!dev->netpolicy)
+				goto unlock;
+			cur_policy = dev->netpolicy->cur_policy;
+			if (cur_policy == NET_POLICY_NONE)
+				goto unlock;
+
+			dev->netpolicy->cur_policy = NET_POLICY_NONE;
+
+			/* rebuild everything */
+			netpolicy_disable(dev);
+			netpolicy_enable(dev);
+			if (netpolicy_gen_obj_list(dev, cur_policy)) {
+				pr_warn("NETPOLICY: Failed to generate netpolicy object list for dev %s\n",
+					dev->name);
+				netpolicy_disable(dev);
+				goto unlock;
+			}
+			if (dev->netdev_ops->ndo_set_net_policy(dev, cur_policy)) {
+				pr_warn("NETPOLICY: Failed to set netpolicy for dev %s\n",
+					dev->name);
+				netpolicy_disable(dev);
+				goto unlock;
+			}
+
+			dev->netpolicy->cur_policy = cur_policy;
+unlock:
+			spin_unlock(&dev->np_lock);
+		}
+	}
+}
+
+static int netpolicy_cpu_callback(struct notifier_block *nfb,
+				  unsigned long action, void *hcpu)
+{
+	switch (action & ~CPU_TASKS_FROZEN) {
+	case CPU_ONLINE:
+		rtnl_lock();
+		update_netpolicy_sys_map();
+		rtnl_unlock();
+		break;
+	case CPU_DYING:
+		rtnl_lock();
+		update_netpolicy_sys_map();
+		rtnl_unlock();
+		break;
+	}
+	return NOTIFY_OK;
+}
+
+static struct notifier_block netpolicy_cpu_notifier = {
+	&netpolicy_cpu_callback,
+	NULL,
+	0
+};
+
 static int __init netpolicy_init(void)
 {
 	int ret;
@@ -854,6 +926,10 @@ static int __init netpolicy_init(void)
 	if (!ret)
 		register_netdevice_notifier(&netpolicy_dev_notf);
 
+	cpu_notifier_register_begin();
+	__register_cpu_notifier(&netpolicy_cpu_notifier);
+	cpu_notifier_register_done();
+
 	return ret;
 }
 
@@ -861,6 +937,10 @@ static void __exit netpolicy_exit(void)
 {
 	unregister_netdevice_notifier(&netpolicy_dev_notf);
 	unregister_pernet_subsys(&netpolicy_net_ops);
+
+	cpu_notifier_register_begin();
+	__unregister_cpu_notifier(&netpolicy_cpu_notifier);
+	cpu_notifier_register_done();
 }
 
 subsys_initcall(netpolicy_init);
-- 
2.5.5


* [RFC V3 PATCH 14/26] net/netpolicy: handle channel changes
@ 2016-09-12 14:55 ` kan.liang
From: kan.liang @ 2016-09-12 14:55 UTC (permalink / raw)
  To: davem, linux-kernel, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
	hannes, stephen, alexei.starovoitov, jesse.brandeburg, andi,
	Kan Liang

From: Kan Liang <kan.liang@intel.com>

Users can use ethtool to set the channel number. This patch handles
channel changes by rebuilding the object list.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netpolicy.h | 8 ++++++++
 net/core/ethtool.c        | 8 +++++++-
 net/core/netpolicy.c      | 1 +
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index f60331d..d6ba9f6 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -80,4 +80,12 @@ struct netpolicy_info {
 	struct list_head	obj_list[NETPOLICY_RXTX][NET_POLICY_MAX];
 };
 
+#ifdef CONFIG_NETPOLICY
+extern void update_netpolicy_sys_map(void);
+#else
+static inline void update_netpolicy_sys_map(void)
+{
+}
+#endif
+
 #endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 9774898..e1f8bd0 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -1703,6 +1703,7 @@ static noinline_for_stack int ethtool_set_channels(struct net_device *dev,
 {
 	struct ethtool_channels channels, max;
 	u32 max_rx_in_use = 0;
+	int ret;
 
 	if (!dev->ethtool_ops->set_channels || !dev->ethtool_ops->get_channels)
 		return -EOPNOTSUPP;
@@ -1726,7 +1727,12 @@ static noinline_for_stack int ethtool_set_channels(struct net_device *dev,
 	    (channels.combined_count + channels.rx_count) <= max_rx_in_use)
 	    return -EINVAL;
 
-	return dev->ethtool_ops->set_channels(dev, &channels);
+	ret = dev->ethtool_ops->set_channels(dev, &channels);
+#ifdef CONFIG_NETPOLICY
+	if (!ret)
+		update_netpolicy_sys_map();
+#endif
+	return ret;
 }
 
 static int ethtool_get_pauseparam(struct net_device *dev, void __user *useraddr)
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 3bf0a44..a739ac7 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -893,6 +893,7 @@ unlock:
 		}
 	}
 }
+EXPORT_SYMBOL(update_netpolicy_sys_map);
 
 static int netpolicy_cpu_callback(struct notifier_block *nfb,
 				  unsigned long action, void *hcpu)
-- 
2.5.5


* [RFC V3 PATCH 15/26] net/netpolicy: implement netpolicy register
@ 2016-09-12 14:55 ` kan.liang
From: kan.liang @ 2016-09-12 14:55 UTC (permalink / raw)
  To: davem, linux-kernel, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
	hannes, stephen, alexei.starovoitov, jesse.brandeburg, andi,
	Kan Liang

From: Kan Liang <kan.liang@intel.com>

A socket/task can only benefit from NET policy after it registers
itself with a specific policy. If it's the first time to register, a
record will be created and inserted into an RCU hash table. The record
includes ptr, policy, and object information. ptr is the socket/task
pointer, which is used as the key to search for the record in the hash
table. The object will be assigned later.

This patch also introduces a new type, NET_POLICY_INVALID, which
indicates that the task/socket is not registered.

np_hashtable_lock is introduced to protect the hash table.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netpolicy.h |  26 ++++++++
 net/core/netpolicy.c      | 153 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 179 insertions(+)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index d6ba9f6..ee33978 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -17,6 +17,7 @@
 #define __LINUX_NETPOLICY_H
 
 enum netpolicy_name {
+	NET_POLICY_INVALID	= -1,
 	NET_POLICY_NONE		= 0,
 	NET_POLICY_CPU,
 	NET_POLICY_BULK,
@@ -80,12 +81,37 @@ struct netpolicy_info {
 	struct list_head	obj_list[NETPOLICY_RXTX][NET_POLICY_MAX];
 };
 
+struct netpolicy_instance {
+	struct net_device	*dev;
+	enum netpolicy_name	policy; /* required policy */
+	void			*ptr;   /* pointers */
+};
+
+/* check if policy is valid */
+static inline int is_net_policy_valid(enum netpolicy_name policy)
+{
+	return ((policy < NET_POLICY_MAX) && (policy > NET_POLICY_INVALID));
+}
+
 #ifdef CONFIG_NETPOLICY
 extern void update_netpolicy_sys_map(void);
+extern int netpolicy_register(struct netpolicy_instance *instance,
+			      enum netpolicy_name policy);
+extern void netpolicy_unregister(struct netpolicy_instance *instance);
 #else
 static inline void update_netpolicy_sys_map(void)
 {
 }
+
+static inline int netpolicy_register(struct netpolicy_instance *instance,
+				     enum netpolicy_name policy)
+{	return 0;
+}
+
+static inline void netpolicy_unregister(struct netpolicy_instance *instance)
+{
+}
+
 #endif
 
 #endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index a739ac7..503ebd1 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -39,6 +39,19 @@
 #include <linux/sort.h>
 #include <linux/ctype.h>
 #include <linux/cpu.h>
+#include <linux/hashtable.h>
+
+struct netpolicy_record {
+	struct hlist_node	hash_node;
+	unsigned long		ptr_id;
+	enum netpolicy_name	policy;
+	struct net_device	*dev;
+	struct netpolicy_object	*rx_obj;
+	struct netpolicy_object	*tx_obj;
+};
+
+static DEFINE_HASHTABLE(np_record_hash, 10);
+static DEFINE_SPINLOCK(np_hashtable_lock);
 
 static int netpolicy_get_dev_info(struct net_device *dev,
 				  struct netpolicy_dev_info *d_info)
@@ -226,6 +239,143 @@ static int netpolicy_enable(struct net_device *dev)
 	return 0;
 }
 
+static struct netpolicy_record *netpolicy_record_search(unsigned long ptr_id)
+{
+	struct netpolicy_record *rec = NULL;
+
+	hash_for_each_possible_rcu(np_record_hash, rec, hash_node, ptr_id) {
+		if (rec->ptr_id == ptr_id)
+			break;
+	}
+
+	return rec;
+}
+
+static void put_queue(struct net_device *dev,
+		      struct netpolicy_object *rx_obj,
+		      struct netpolicy_object *tx_obj)
+{
+	if (!dev || !dev->netpolicy)
+		return;
+
+	if (rx_obj)
+		atomic_dec(&rx_obj->refcnt);
+	if (tx_obj)
+		atomic_dec(&tx_obj->refcnt);
+}
+
+static void netpolicy_record_clear_obj(void)
+{
+	struct netpolicy_record *rec;
+	int i;
+
+	spin_lock(&np_hashtable_lock);
+	hash_for_each_rcu(np_record_hash, i, rec, hash_node) {
+		put_queue(rec->dev, rec->rx_obj, rec->tx_obj);
+		rec->rx_obj = NULL;
+		rec->tx_obj = NULL;
+	}
+	spin_unlock(&np_hashtable_lock);
+}
+
+static void netpolicy_record_clear_dev_node(struct net_device *dev)
+{
+	struct netpolicy_record *rec;
+	int i;
+
+	spin_lock_bh(&np_hashtable_lock);
+	hash_for_each_rcu(np_record_hash, i, rec, hash_node) {
+		if (rec->dev == dev) {
+			hash_del_rcu(&rec->hash_node);
+			kfree(rec);
+		}
+	}
+	spin_unlock_bh(&np_hashtable_lock);
+}
+
+/**
+ * netpolicy_register() - Register per socket/task policy request
+ * @instance:	NET policy per socket/task instance info
+ * @policy:	request NET policy
+ *
+ * This function intends to register a per socket/task policy request.
+ * If it's the first time to register, a record will be created and
+ * inserted into an RCU hash table.
+ *
+ * The record includes ptr, policy and object info. ptr of the socket/task
+ * is the key to search the record in hash table. Object will be assigned
+ * until the first packet is received/transmitted.
+ *
+ * Return: 0 on success, others on failure
+ */
+int netpolicy_register(struct netpolicy_instance *instance,
+		       enum netpolicy_name policy)
+{
+	unsigned long ptr_id = (uintptr_t)instance->ptr;
+	struct netpolicy_record *new, *old;
+
+	if (!is_net_policy_valid(policy)) {
+		instance->policy = NET_POLICY_INVALID;
+		return -EINVAL;
+	}
+
+	new = kzalloc(sizeof(*new), GFP_KERNEL);
+	if (!new) {
+		instance->policy = NET_POLICY_INVALID;
+		return -ENOMEM;
+	}
+
+	spin_lock_bh(&np_hashtable_lock);
+	/* Check it in mapping table */
+	old = netpolicy_record_search(ptr_id);
+	if (old) {
+		if (old->policy != policy) {
+			put_queue(old->dev, old->rx_obj, old->tx_obj);
+			old->rx_obj = NULL;
+			old->tx_obj = NULL;
+			old->policy = policy;
+		}
+		kfree(new);
+	} else {
+		new->ptr_id = ptr_id;
+		new->dev = instance->dev;
+		new->policy = policy;
+		hash_add_rcu(np_record_hash, &new->hash_node, ptr_id);
+	}
+	instance->policy = policy;
+	spin_unlock_bh(&np_hashtable_lock);
+
+	return 0;
+}
+EXPORT_SYMBOL(netpolicy_register);
+
+/**
+ * netpolicy_unregister() - Unregister per socket/task policy request
+ * @instance:	NET policy per socket/task instance info
+ *
+ * This function intends to unregister the policy request by deleting
+ * the related record from the hash table.
+ *
+ */
+void netpolicy_unregister(struct netpolicy_instance *instance)
+{
+	struct netpolicy_record *record;
+	unsigned long ptr_id = (uintptr_t)instance->ptr;
+
+	spin_lock_bh(&np_hashtable_lock);
+	/* del from hash table */
+	record = netpolicy_record_search(ptr_id);
+	if (record) {
+		hash_del_rcu(&record->hash_node);
+		/* The record cannot be shared. It can be safely freed. */
+		put_queue(record->dev, record->rx_obj, record->tx_obj);
+		kfree(record);
+	}
+	instance->policy = NET_POLICY_INVALID;
+	spin_unlock_bh(&np_hashtable_lock);
+}
+EXPORT_SYMBOL(netpolicy_unregister);
+
 const char *policy_name[NET_POLICY_MAX] = {
 	"NONE",
 	"CPU",
@@ -833,6 +983,7 @@ static int netpolicy_notify(struct notifier_block *this,
 		break;
 	case NETDEV_GOING_DOWN:
 		uninit_netpolicy(dev);
+		netpolicy_record_clear_dev_node(dev);
 #ifdef CONFIG_PROC_FS
 		proc_remove(dev->proc_dev);
 		dev->proc_dev = NULL;
@@ -871,6 +1022,8 @@ void update_netpolicy_sys_map(void)
 
 			dev->netpolicy->cur_policy = NET_POLICY_NONE;
 
+			/* clear mapping table */
+			netpolicy_record_clear_obj();
 			/* rebuild everything */
 			netpolicy_disable(dev);
 			netpolicy_enable(dev);
-- 
2.5.5


* [RFC V3 PATCH 16/26] net/netpolicy: introduce per socket netpolicy
@ 2016-09-12 14:55 ` kan.liang
From: kan.liang @ 2016-09-12 14:55 UTC (permalink / raw)
  To: davem, linux-kernel, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
	hannes, stephen, alexei.starovoitov, jesse.brandeburg, andi,
	Kan Liang

From: Kan Liang <kan.liang@intel.com>

The network socket is the most basic unit which controls network
traffic. A socket option is needed for users to set their own policy
on a socket to improve network performance.

No existing socket option can be reused. SO_MARK, or maybe
SO_PRIORITY, comes close to NET policy's requirements, but neither can
be reused for NET policy. SO_MARK can be used for routing and packet
filtering, but NET policy doesn't intend to change the routing; it
only redirects the packet to a specific device queue. Also, the target
queue is assigned by the NET policy subsystem at run time, so it
should not be set in advance. SO_PRIORITY can set a protocol-defined
priority for all packets on the socket, but the NET policies don't
have priorities yet.

This patch introduces a new socket option, SO_NETPOLICY, to set/get
the net policy for a socket, so that an application can set its own
policy on the socket to improve network performance.
The per-socket net policy can also be inherited by a new socket.

The usage of SO_NETPOLICY socket option is as below.
setsockopt(sockfd,SOL_SOCKET,SO_NETPOLICY,&policy,sizeof(int))
getsockopt(sockfd,SOL_SOCKET,SO_NETPOLICY,&policy,sizeof(int))
The policy set by the SO_NETPOLICY socket option must be valid and
compatible with the current device policy. Otherwise, it will error
out, and the socket policy will be set to NET_POLICY_INVALID.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 arch/alpha/include/uapi/asm/socket.h   |  2 ++
 arch/avr32/include/uapi/asm/socket.h   |  2 ++
 arch/frv/include/uapi/asm/socket.h     |  2 ++
 arch/ia64/include/uapi/asm/socket.h    |  2 ++
 arch/m32r/include/uapi/asm/socket.h    |  2 ++
 arch/mips/include/uapi/asm/socket.h    |  2 ++
 arch/mn10300/include/uapi/asm/socket.h |  2 ++
 arch/parisc/include/uapi/asm/socket.h  |  2 ++
 arch/powerpc/include/uapi/asm/socket.h |  2 ++
 arch/s390/include/uapi/asm/socket.h    |  2 ++
 arch/sparc/include/uapi/asm/socket.h   |  2 ++
 arch/xtensa/include/uapi/asm/socket.h  |  2 ++
 include/net/request_sock.h             |  4 +++-
 include/net/sock.h                     |  9 +++++++++
 include/uapi/asm-generic/socket.h      |  2 ++
 net/core/sock.c                        | 28 ++++++++++++++++++++++++++++
 16 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index 9e46d6e..06b2ef9 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -97,4 +97,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SO_NETPOLICY		54
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/avr32/include/uapi/asm/socket.h b/arch/avr32/include/uapi/asm/socket.h
index 1fd147f..24f85f0 100644
--- a/arch/avr32/include/uapi/asm/socket.h
+++ b/arch/avr32/include/uapi/asm/socket.h
@@ -90,4 +90,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SO_NETPOLICY		54
+
 #endif /* _UAPI__ASM_AVR32_SOCKET_H */
diff --git a/arch/frv/include/uapi/asm/socket.h b/arch/frv/include/uapi/asm/socket.h
index afbc98f0..82c8d44 100644
--- a/arch/frv/include/uapi/asm/socket.h
+++ b/arch/frv/include/uapi/asm/socket.h
@@ -90,5 +90,7 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SO_NETPOLICY		54
+
 #endif /* _ASM_SOCKET_H */
 
diff --git a/arch/ia64/include/uapi/asm/socket.h b/arch/ia64/include/uapi/asm/socket.h
index 0018fad..b99c1df 100644
--- a/arch/ia64/include/uapi/asm/socket.h
+++ b/arch/ia64/include/uapi/asm/socket.h
@@ -99,4 +99,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SO_NETPOLICY		54
+
 #endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/m32r/include/uapi/asm/socket.h b/arch/m32r/include/uapi/asm/socket.h
index 5fe42fc..71a43ed 100644
--- a/arch/m32r/include/uapi/asm/socket.h
+++ b/arch/m32r/include/uapi/asm/socket.h
@@ -90,4 +90,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SO_NETPOLICY		54
+
 #endif /* _ASM_M32R_SOCKET_H */
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index 2027240a..ce8b9ba 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -108,4 +108,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SO_NETPOLICY		54
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/mn10300/include/uapi/asm/socket.h b/arch/mn10300/include/uapi/asm/socket.h
index 5129f23..c041265 100644
--- a/arch/mn10300/include/uapi/asm/socket.h
+++ b/arch/mn10300/include/uapi/asm/socket.h
@@ -90,4 +90,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SO_NETPOLICY		54
+
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index 9c935d7..2639dcd 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -89,4 +89,6 @@
 
 #define SO_CNX_ADVICE		0x402E
 
+#define SO_NETPOLICY		0x402F
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/powerpc/include/uapi/asm/socket.h b/arch/powerpc/include/uapi/asm/socket.h
index 1672e33..e04e3b6 100644
--- a/arch/powerpc/include/uapi/asm/socket.h
+++ b/arch/powerpc/include/uapi/asm/socket.h
@@ -97,4 +97,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SO_NETPOLICY		54
+
 #endif	/* _ASM_POWERPC_SOCKET_H */
diff --git a/arch/s390/include/uapi/asm/socket.h b/arch/s390/include/uapi/asm/socket.h
index 41b51c2..d43b854 100644
--- a/arch/s390/include/uapi/asm/socket.h
+++ b/arch/s390/include/uapi/asm/socket.h
@@ -96,4 +96,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SO_NETPOLICY		54
+
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index 31aede3..94a2cdf 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -86,6 +86,8 @@
 
 #define SO_CNX_ADVICE		0x0037
 
+#define SO_NETPOLICY		0x0038
+
 /* Security levels - as per NRL IPv6 - don't actually do anything */
 #define SO_SECURITY_AUTHENTICATION		0x5001
 #define SO_SECURITY_ENCRYPTION_TRANSPORT	0x5002
diff --git a/arch/xtensa/include/uapi/asm/socket.h b/arch/xtensa/include/uapi/asm/socket.h
index 81435d9..97f1691 100644
--- a/arch/xtensa/include/uapi/asm/socket.h
+++ b/arch/xtensa/include/uapi/asm/socket.h
@@ -101,4 +101,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SO_NETPOLICY		54
+
 #endif	/* _XTENSA_SOCKET_H */
diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index 6ebe13e..1fa2d0e 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -101,7 +101,9 @@ reqsk_alloc(const struct request_sock_ops *ops, struct sock *sk_listener,
 	sk_tx_queue_clear(req_to_sk(req));
 	req->saved_syn = NULL;
 	atomic_set(&req->rsk_refcnt, 0);
-
+#ifdef CONFIG_NETPOLICY
+	memcpy(&req_to_sk(req)->sk_netpolicy, &sk_listener->sk_netpolicy, sizeof(sk_listener->sk_netpolicy));
+#endif
 	return req;
 }
 
diff --git a/include/net/sock.h b/include/net/sock.h
index c797c57..e1e9e3d 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -70,6 +70,7 @@
 #include <net/checksum.h>
 #include <net/tcp_states.h>
 #include <linux/net_tstamp.h>
+#include <linux/netpolicy.h>
 
 /*
  * This structure really needs to be cleaned up.
@@ -141,6 +142,7 @@ typedef __u64 __bitwise __addrpair;
  *		%SO_OOBINLINE settings, %SO_TIMESTAMPING settings
  *	@skc_incoming_cpu: record/match cpu processing incoming packets
  *	@skc_refcnt: reference count
+ *	@skc_netpolicy: per socket net policy
  *
  *	This is the minimal network layer representation of sockets, the header
  *	for struct sock and struct inet_timewait_sock.
@@ -200,6 +202,10 @@ struct sock_common {
 		struct sock	*skc_listener; /* request_sock */
 		struct inet_timewait_death_row *skc_tw_dr; /* inet_timewait_sock */
 	};
+
+#ifdef CONFIG_NETPOLICY
+	struct netpolicy_instance    skc_netpolicy;
+#endif
 	/*
 	 * fields between dontcopy_begin/dontcopy_end
 	 * are not copied in sock_copy()
@@ -339,6 +345,9 @@ struct sock {
 #define sk_incoming_cpu		__sk_common.skc_incoming_cpu
 #define sk_flags		__sk_common.skc_flags
 #define sk_rxhash		__sk_common.skc_rxhash
+#ifdef CONFIG_NETPOLICY
+#define sk_netpolicy		__sk_common.skc_netpolicy
+#endif
 
 	socket_lock_t		sk_lock;
 	struct sk_buff_head	sk_receive_queue;
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index 67d632f..d2a5aeb 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -92,4 +92,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SO_NETPOLICY		54
+
 #endif /* __ASM_GENERIC_SOCKET_H */
diff --git a/net/core/sock.c b/net/core/sock.c
index 51a7304..80d9f08 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1003,6 +1003,12 @@ set_rcvbuf:
 		if (val == 1)
 			dst_negative_advice(sk);
 		break;
+
+#ifdef CONFIG_NETPOLICY
+	case SO_NETPOLICY:
+		ret = netpolicy_register(&sk->sk_netpolicy, val);
+		break;
+#endif
 	default:
 		ret = -ENOPROTOOPT;
 		break;
@@ -1263,6 +1269,11 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 		v.val = sk->sk_incoming_cpu;
 		break;
 
+#ifdef CONFIG_NETPOLICY
+	case SO_NETPOLICY:
+		v.val = sk->sk_netpolicy.policy;
+		break;
+#endif
 	default:
 		/* We implement the SO_SNDLOWAT etc to not be settable
 		 * (1003.1g 7).
@@ -1402,6 +1413,12 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 
 		sock_update_classid(&sk->sk_cgrp_data);
 		sock_update_netprioidx(&sk->sk_cgrp_data);
+
+#ifdef CONFIG_NETPOLICY
+		sk->sk_netpolicy.dev = NULL;
+		sk->sk_netpolicy.ptr = (void *)sk;
+		sk->sk_netpolicy.policy = NET_POLICY_INVALID;
+#endif
 	}
 
 	return sk;
@@ -1439,6 +1456,10 @@ static void __sk_destruct(struct rcu_head *head)
 	put_pid(sk->sk_peer_pid);
 	if (likely(sk->sk_net_refcnt))
 		put_net(sock_net(sk));
+#ifdef CONFIG_NETPOLICY
+	if (is_net_policy_valid(sk->sk_netpolicy.policy))
+		netpolicy_unregister(&sk->sk_netpolicy);
+#endif
 	sk_prot_free(sk->sk_prot_creator, sk);
 }
 
@@ -1575,6 +1596,13 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 		if (sock_needs_netstamp(sk) &&
 		    newsk->sk_flags & SK_FLAGS_TIMESTAMP)
 			net_enable_timestamp();
+
+#ifdef CONFIG_NETPOLICY
+		newsk->sk_netpolicy.ptr = (void *)newsk;
+		if (is_net_policy_valid(newsk->sk_netpolicy.policy))
+			netpolicy_register(&newsk->sk_netpolicy, newsk->sk_netpolicy.policy);
+
+#endif
 	}
 out:
 	return newsk;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [RFC V3 PATCH 17/26] net/netpolicy: introduce netpolicy_pick_queue
  2016-09-12 14:55 [RFC V3 PATCH 00/26] Kernel NET policy kan.liang
                   ` (15 preceding siblings ...)
  2016-09-12 14:55 ` [RFC V3 PATCH 16/26] net/netpolicy: introduce per socket netpolicy kan.liang
@ 2016-09-12 14:55 ` kan.liang
  2016-09-12 14:55 ` [RFC V3 PATCH 18/26] net/netpolicy: set tx queues according to policy kan.liang
                   ` (10 subsequent siblings)
  27 siblings, 0 replies; 44+ messages in thread
From: kan.liang @ 2016-09-12 14:55 UTC (permalink / raw)
  To: davem, linux-kernel, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
	hannes, stephen, alexei.starovoitov, jesse.brandeburg, andi,
	Kan Liang

From: Kan Liang <kan.liang@intel.com>

To achieve better network performance, the key step is to distribute
packets to dedicated queues according to the policy and the system's
run-time status.

This patch provides an interface which returns the proper dedicated
queue for a socket/task. The packets of that socket/task are then
redirected to the dedicated queue for better network performance.

To select the proper queue, it currently checks the CPU load and the
reference count. The object with the lowest CPU load and reference
count is chosen.

The selected object is stored in a hashtable, so there is no need to
walk the whole object list every time.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netpolicy.h |  12 ++++
 include/linux/sched.h     |   3 +
 kernel/sched/fair.c       |   8 +--
 net/core/netpolicy.c      | 179 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 198 insertions(+), 4 deletions(-)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index ee33978..e06b74c 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -85,8 +85,15 @@ struct netpolicy_instance {
 	struct net_device	*dev;
 	enum netpolicy_name	policy; /* required policy */
 	void			*ptr;   /* pointers */
+	struct task_struct	*task;
 };
 
+struct netpolicy_cpu_load {
+	unsigned long		load;
+	struct netpolicy_object	*obj;
+};
+#define LOAD_TOLERANCE	5
+
 /* check if policy is valid */
 static inline int is_net_policy_valid(enum netpolicy_name policy)
 {
@@ -98,6 +105,7 @@ extern void update_netpolicy_sys_map(void);
 extern int netpolicy_register(struct netpolicy_instance *instance,
 			      enum netpolicy_name policy);
 extern void netpolicy_unregister(struct netpolicy_instance *instance);
+extern int netpolicy_pick_queue(struct netpolicy_instance *instance, bool is_rx);
 #else
 static inline void update_netpolicy_sys_map(void)
 {
@@ -112,6 +120,10 @@ static inline void netpolicy_unregister(struct netpolicy_instance *instance)
 {
 }
 
+static inline int netpolicy_pick_queue(struct netpolicy_instance *instance, bool is_rx)
+{
+	return 0;
+}
 #endif
 
 #endif /*__LINUX_NETPOLICY_H*/
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 62c68e5..3b716a3 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -3481,4 +3481,7 @@ void cpufreq_add_update_util_hook(int cpu, struct update_util_data *data,
 void cpufreq_remove_update_util_hook(int cpu);
 #endif /* CONFIG_CPU_FREQ */
 
+extern unsigned long weighted_cpuload(const int cpu);
+extern unsigned long capacity_of(int cpu);
+
 #endif
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 039de34..a579ba2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1257,10 +1257,10 @@ bool should_numa_migrate_memory(struct task_struct *p, struct page * page,
 	       group_faults_cpu(ng, src_nid) * group_faults(p, dst_nid) * 4;
 }
 
-static unsigned long weighted_cpuload(const int cpu);
+unsigned long weighted_cpuload(const int cpu);
 static unsigned long source_load(int cpu, int type);
 static unsigned long target_load(int cpu, int type);
-static unsigned long capacity_of(int cpu);
+unsigned long capacity_of(int cpu);
 static long effective_load(struct task_group *tg, int cpu, long wl, long wg);
 
 /* Cached statistics for all CPUs within a node */
@@ -4752,7 +4752,7 @@ static void cpu_load_update(struct rq *this_rq, unsigned long this_load,
 }
 
 /* Used instead of source_load when we know the type == 0 */
-static unsigned long weighted_cpuload(const int cpu)
+unsigned long weighted_cpuload(const int cpu)
 {
 	return cfs_rq_runnable_load_avg(&cpu_rq(cpu)->cfs);
 }
@@ -4902,7 +4902,7 @@ static unsigned long target_load(int cpu, int type)
 	return max(rq->cpu_load[type-1], total);
 }
 
-static unsigned long capacity_of(int cpu)
+unsigned long capacity_of(int cpu)
 {
 	return cpu_rq(cpu)->cpu_capacity;
 }
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 503ebd1..e82e0d3 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -40,6 +40,7 @@
 #include <linux/ctype.h>
 #include <linux/cpu.h>
 #include <linux/hashtable.h>
+#include <linux/sched.h>
 
 struct netpolicy_record {
 	struct hlist_node	hash_node;
@@ -293,6 +294,184 @@ static void netpolicy_record_clear_dev_node(struct net_device *dev)
 	spin_unlock_bh(&np_hashtable_lock);
 }
 
+static struct netpolicy_object *get_avail_object(struct net_device *dev,
+						 enum netpolicy_name policy,
+						 struct netpolicy_instance *instance,
+						 bool is_rx)
+{
+	int avail_cpu_num = cpumask_weight(tsk_cpus_allowed(instance->task));
+	int dir = is_rx ? NETPOLICY_RX : NETPOLICY_TX;
+	struct netpolicy_object *tmp, *obj = NULL;
+	unsigned long load = 0, min_load = -1;
+	struct netpolicy_cpu_load *cpu_load;
+	int i = 0, val = -1;
+
+	/* Check if net policy is supported */
+	if (!dev || !dev->netpolicy)
+		goto exit;
+
+	/* The system should have queues which support the request policy. */
+	if ((policy != dev->netpolicy->cur_policy) &&
+	    (dev->netpolicy->cur_policy != NET_POLICY_MIX))
+		goto exit;
+
+	if (!avail_cpu_num)
+		goto exit;
+
+	cpu_load = kcalloc(avail_cpu_num, sizeof(*cpu_load), GFP_KERNEL);
+	if (!cpu_load)
+		goto exit;
+
+	spin_lock_bh(&dev->np_ob_list_lock);
+
+	/* find the lowest load and remove obvious high load objects */
+	list_for_each_entry(tmp, &dev->netpolicy->obj_list[dir][policy], list) {
+		if (!cpumask_test_cpu(tmp->cpu, tsk_cpus_allowed(instance->task)))
+			continue;
+
+#ifdef CONFIG_SMP
+		/* normalized load */
+		load = weighted_cpuload(tmp->cpu) * 100 / capacity_of(tmp->cpu);
+
+		if ((min_load != -1) &&
+		    load > (min_load + LOAD_TOLERANCE))
+			continue;
+#endif
+		cpu_load[i].load = load;
+		cpu_load[i].obj = tmp;
+		if ((min_load == -1) ||
+		    (load < min_load))
+			min_load = load;
+		i++;
+	}
+	avail_cpu_num = i;
+	spin_unlock_bh(&dev->np_ob_list_lock);
+
+	for (i = 0; i < avail_cpu_num; i++) {
+		if (cpu_load[i].load > (min_load + LOAD_TOLERANCE))
+			continue;
+
+		tmp = cpu_load[i].obj;
+		if ((val > atomic_read(&tmp->refcnt)) ||
+		    (val == -1)) {
+			val = atomic_read(&tmp->refcnt);
+			obj = tmp;
+		}
+	}
+
+	if (!obj)
+		goto free_load;
+
+	atomic_inc(&obj->refcnt);
+
+free_load:
+	kfree(cpu_load);
+exit:
+	return obj;
+}
+
+static int get_avail_queue(struct netpolicy_instance *instance, bool is_rx)
+{
+	struct netpolicy_record *old_record, *new_record;
+	struct net_device *dev = instance->dev;
+	unsigned long ptr_id = (uintptr_t)instance->ptr;
+	int queue = -1;
+
+	spin_lock_bh(&np_hashtable_lock);
+	old_record = netpolicy_record_search(ptr_id);
+	if (!old_record) {
+		pr_warn("NETPOLICY: doesn't registered. Remove net policy settings!\n");
+		instance->policy = NET_POLICY_INVALID;
+		goto err;
+	}
+
+	if (is_rx && old_record->rx_obj) {
+		queue = old_record->rx_obj->queue;
+	} else if (!is_rx && old_record->tx_obj) {
+		queue = old_record->tx_obj->queue;
+	} else {
+		new_record = kzalloc(sizeof(*new_record), GFP_KERNEL);
+		if (!new_record)
+			goto err;
+		memcpy(new_record, old_record, sizeof(*new_record));
+
+		if (is_rx) {
+			new_record->rx_obj = get_avail_object(dev, new_record->policy,
+							      instance, is_rx);
+			if (!new_record->dev)
+				new_record->dev = dev;
+			if (!new_record->rx_obj) {
+				kfree(new_record);
+				goto err;
+			}
+			queue = new_record->rx_obj->queue;
+		} else {
+			new_record->tx_obj = get_avail_object(dev, new_record->policy,
+							      instance, is_rx);
+			if (!new_record->dev)
+				new_record->dev = dev;
+			if (!new_record->tx_obj) {
+				kfree(new_record);
+				goto err;
+			}
+			queue = new_record->tx_obj->queue;
+		}
+		/* update record */
+		hlist_replace_rcu(&old_record->hash_node, &new_record->hash_node);
+		kfree(old_record);
+	}
+err:
+	spin_unlock_bh(&np_hashtable_lock);
+	return queue;
+}
+
+static inline bool policy_validate(struct netpolicy_instance *instance)
+{
+	struct net_device *dev = instance->dev;
+	enum netpolicy_name cur_policy;
+
+	cur_policy = dev->netpolicy->cur_policy;
+	if ((instance->policy == NET_POLICY_NONE) ||
+	    (cur_policy == NET_POLICY_NONE))
+		return false;
+
+	if (((cur_policy != NET_POLICY_MIX) && (cur_policy != instance->policy)) ||
+	    ((cur_policy == NET_POLICY_MIX) && (instance->policy == NET_POLICY_CPU))) {
+		pr_warn("NETPOLICY: %s current device policy %s doesn't support required policy %s! Remove net policy settings!\n",
+			dev->name, policy_name[cur_policy],
+			policy_name[instance->policy]);
+		return false;
+	}
+	return true;
+}
+
+/**
+ * netpolicy_pick_queue() - Find proper queue
+ * @instance:	NET policy per socket/task instance info
+ * @is_rx:	RX queue or TX queue
+ *
+ * This function intends to find the proper queue according to policy.
+ * For selecting the proper queue, currently it uses round-robin algorithm
+ * to find the available object from the given policy object list.
+ * The selected object will be stored in hashtable. So it does not need to
+ * go through the whole object list every time.
+ *
+ * Return: negative on failure, otherwise on the assigned queue
+ */
+int netpolicy_pick_queue(struct netpolicy_instance *instance, bool is_rx)
+{
+	struct net_device *dev = instance->dev;
+
+	if (!dev || !dev->netpolicy)
+		return -EINVAL;
+
+	if (!policy_validate(instance))
+		return -EINVAL;
+
+	return get_avail_queue(instance, is_rx);
+}
+EXPORT_SYMBOL(netpolicy_pick_queue);
+
 /**
  * netpolicy_register() - Register per socket/task policy request
  * @instance:	NET policy per socket/task instance info
-- 
2.5.5


* [RFC V3 PATCH 18/26] net/netpolicy: set tx queues according to policy
  2016-09-12 14:55 [RFC V3 PATCH 00/26] Kernel NET policy kan.liang
                   ` (16 preceding siblings ...)
  2016-09-12 14:55 ` [RFC V3 PATCH 17/26] net/netpolicy: introduce netpolicy_pick_queue kan.liang
@ 2016-09-12 14:55 ` kan.liang
  2016-09-12 20:23     ` Tom Herbert
  2016-09-12 14:55 ` [RFC V3 PATCH 19/26] net/netpolicy: tc bpf extension to pick Tx queue kan.liang
                   ` (9 subsequent siblings)
  27 siblings, 1 reply; 44+ messages in thread
From: kan.liang @ 2016-09-12 14:55 UTC (permalink / raw)
  To: davem, linux-kernel, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
	hannes, stephen, alexei.starovoitov, jesse.brandeburg, andi,
	Kan Liang

From: Kan Liang <kan.liang@intel.com>

When the device transmits a packet, netdev_pick_tx is called to select a
tx queue. If a net policy is applied, it picks up the assigned tx queue
from the net policy subsystem and redirects the traffic to that queue.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/net/sock.h |  9 +++++++++
 net/core/dev.c     | 20 ++++++++++++++++++--
 2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index e1e9e3d..ca97f35 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2280,4 +2280,13 @@ extern int sysctl_optmem_max;
 extern __u32 sysctl_wmem_default;
 extern __u32 sysctl_rmem_default;
 
+/* Return netpolicy instance information from socket. */
+static inline struct netpolicy_instance *netpolicy_find_instance(struct sock *sk)
+{
+#ifdef CONFIG_NETPOLICY
+	if (is_net_policy_valid(sk->sk_netpolicy.policy))
+		return &sk->sk_netpolicy;
+#endif
+	return NULL;
+}
 #endif	/* _SOCK_H */
diff --git a/net/core/dev.c b/net/core/dev.c
index 34b5322..b9a8044 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3266,6 +3266,7 @@ struct netdev_queue *netdev_pick_tx(struct net_device *dev,
 				    struct sk_buff *skb,
 				    void *accel_priv)
 {
+	struct sock *sk = skb->sk;
 	int queue_index = 0;
 
 #ifdef CONFIG_XPS
@@ -3280,8 +3281,23 @@ struct netdev_queue *netdev_pick_tx(struct net_device *dev,
 		if (ops->ndo_select_queue)
 			queue_index = ops->ndo_select_queue(dev, skb, accel_priv,
 							    __netdev_pick_tx);
-		else
-			queue_index = __netdev_pick_tx(dev, skb);
+		else {
+#ifdef CONFIG_NETPOLICY
+			struct netpolicy_instance *instance;
+
+			queue_index = -1;
+			if (dev->netpolicy && sk) {
+				instance = netpolicy_find_instance(sk);
+				if (instance) {
+					if (!instance->dev)
+						instance->dev = dev;
+					queue_index = netpolicy_pick_queue(instance, false);
+				}
+			}
+			if (queue_index < 0)
+#endif
+				queue_index = __netdev_pick_tx(dev, skb);
+		}
 
 		if (!accel_priv)
 			queue_index = netdev_cap_txqueue(dev, queue_index);
-- 
2.5.5


* [RFC V3 PATCH 19/26] net/netpolicy: tc bpf extension to pick Tx queue
  2016-09-12 14:55 [RFC V3 PATCH 00/26] Kernel NET policy kan.liang
                   ` (17 preceding siblings ...)
  2016-09-12 14:55 ` [RFC V3 PATCH 18/26] net/netpolicy: set tx queues according to policy kan.liang
@ 2016-09-12 14:55 ` kan.liang
  2016-09-12 14:55 ` [RFC V3 PATCH 20/26] net/netpolicy: set Rx queues according to policy kan.liang
                   ` (8 subsequent siblings)
  27 siblings, 0 replies; 44+ messages in thread
From: kan.liang @ 2016-09-12 14:55 UTC (permalink / raw)
  To: davem, linux-kernel, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
	hannes, stephen, alexei.starovoitov, jesse.brandeburg, andi,
	Kan Liang

From: Kan Liang <kan.liang@intel.com>

This patch extends net policy to support tc bpf when selecting the Tx
queue. It implements a bpf classifier for the clsact qdisc. The
classifier picks the proper queue from the net policy subsystem. This
queue selection from tc is not compatible with XPS, so XPS is
effectively bypassed.

Currently, the tc bpf extension only supports queue selection on egress.
To enable the extension, apply the following commands:
 # ./tc qdisc add dev $DEVNAME clsact
 # ./tc filter add dev $DEVNAME egress bpf obj netpolicy_kern.o
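
The netpolicy_kern.o object referenced above is built from samples/bpf,
but its source is not part of this excerpt. A plausible minimal
classifier, sketched from the bpf_helpers.h stub added below, would
simply ask the net policy subsystem to cache the Tx queue for the
socket and let the packet continue:

```c
/* Hypothetical samples/bpf/netpolicy_kern.c sketch; the real sample is
 * not shown in this series excerpt. bpf_netpolicy() caches the chosen
 * queue via sk_tx_queue_set(), so netdev_pick_tx() finds it later. */
#include <uapi/linux/bpf.h>
#include "bpf_helpers.h"

SEC("classifier")
int netpolicy_select_queue(struct __sk_buff *skb)
{
	bpf_netpolicy(skb);
	return 0; /* TC_ACT_OK: do not drop or redirect here */
}

char _license[] SEC("license") = "GPL";
```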

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/uapi/linux/bpf.h  |  8 ++++++++
 net/core/dev.c            |  4 ++--
 net/core/filter.c         | 36 ++++++++++++++++++++++++++++++++++++
 samples/bpf/Makefile      |  1 +
 samples/bpf/bpf_helpers.h |  2 ++
 5 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index f896dfa..9c7d847 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -398,6 +398,14 @@ enum bpf_func_id {
 	 */
 	BPF_FUNC_skb_change_tail,
 
+	/**
+	 * bpf_netpolicy(skb)
+	 * Netpolicy tc extension. Search for proper Tx queue
+	 * @skb: pointer to skb
+	 * Return: 0 on success or negative error
+	 */
+	BPF_FUNC_netpolicy,
+
 	__BPF_FUNC_MAX_ID,
 };
 
diff --git a/net/core/dev.c b/net/core/dev.c
index b9a8044..82304ce 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3285,8 +3285,8 @@ struct netdev_queue *netdev_pick_tx(struct net_device *dev,
 #ifdef CONFIG_NETPOLICY
 			struct netpolicy_instance *instance;
 
-			queue_index = -1;
-			if (dev->netpolicy && sk) {
+			queue_index = sk_tx_queue_get(sk);
+			if ((queue_index < 0) && dev->netpolicy && sk) {
 				instance = netpolicy_find_instance(sk);
 				if (instance) {
 					if (!instance->dev)
diff --git a/net/core/filter.c b/net/core/filter.c
index a83766b..ce32288 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2351,6 +2351,38 @@ static const struct bpf_func_proto bpf_skb_set_tunnel_opt_proto = {
 	.arg3_type	= ARG_CONST_STACK_SIZE,
 };
 
+#ifdef CONFIG_NETPOLICY
+static u64 bpf_netpolicy(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
+{
+	struct sk_buff *skb = (struct sk_buff *) (unsigned long) r1;
+	struct netpolicy_instance *instance;
+	struct net_device *dev = skb->dev;
+	struct sock *sk = skb->sk;
+	int queue_index;
+
+	if (dev->netpolicy && sk) {
+		instance = netpolicy_find_instance(sk);
+		if (instance) {
+			if (!instance->dev)
+				instance->dev = dev;
+			queue_index = netpolicy_pick_queue(instance, false);
+			if ((queue_index >= 0) && sk_fullsock(sk) &&
+			    rcu_access_pointer(sk->sk_dst_cache))
+				sk_tx_queue_set(sk, queue_index);
+		}
+	}
+
+	return 0;
+}
+
+static const struct bpf_func_proto bpf_netpolicy_proto = {
+	.func		= bpf_netpolicy,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+};
+#endif
+
 static const struct bpf_func_proto *
 bpf_get_skb_set_tunnel_proto(enum bpf_func_id which)
 {
@@ -2515,6 +2547,10 @@ tc_cls_act_func_proto(enum bpf_func_id func_id)
 		return &bpf_get_smp_processor_id_proto;
 	case BPF_FUNC_skb_under_cgroup:
 		return &bpf_skb_under_cgroup_proto;
+#ifdef CONFIG_NETPOLICY
+	case BPF_FUNC_netpolicy:
+		return &bpf_netpolicy_proto;
+#endif
 	default:
 		return sk_filter_func_proto(func_id);
 	}
diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 12b7304..4aedbb9 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -85,6 +85,7 @@ always += xdp2_kern.o
 always += test_current_task_under_cgroup_kern.o
 always += trace_event_kern.o
 always += sampleip_kern.o
+always += netpolicy_kern.o
 
 HOSTCFLAGS += -I$(objtree)/usr/include
 
diff --git a/samples/bpf/bpf_helpers.h b/samples/bpf/bpf_helpers.h
index 90f44bd..b295bbc 100644
--- a/samples/bpf/bpf_helpers.h
+++ b/samples/bpf/bpf_helpers.h
@@ -88,6 +88,8 @@ static int (*bpf_l4_csum_replace)(void *ctx, int off, int from, int to, int flag
 	(void *) BPF_FUNC_l4_csum_replace;
 static int (*bpf_skb_under_cgroup)(void *ctx, void *map, int index) =
 	(void *) BPF_FUNC_skb_under_cgroup;
+static int (*bpf_netpolicy)(void *ctx) =
+	(void *) BPF_FUNC_netpolicy;
 
 #if defined(__x86_64__)
 
-- 
2.5.5


* [RFC V3 PATCH 20/26] net/netpolicy: set Rx queues according to policy
  2016-09-12 14:55 [RFC V3 PATCH 00/26] Kernel NET policy kan.liang
                   ` (18 preceding siblings ...)
  2016-09-12 14:55 ` [RFC V3 PATCH 19/26] net/netpolicy: tc bpf extension to pick Tx queue kan.liang
@ 2016-09-12 14:55 ` kan.liang
  2016-09-12 14:55 ` [RFC V3 PATCH 21/26] net/netpolicy: introduce per task net policy kan.liang
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 44+ messages in thread
From: kan.liang @ 2016-09-12 14:55 UTC (permalink / raw)
  To: davem, linux-kernel, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
	hannes, stephen, alexei.starovoitov, jesse.brandeburg, andi,
	Kan Liang

From: Kan Liang <kan.liang@intel.com>

To set Rx queues, this patch configures Rx network flow classification
rules that redirect the packets to the assigned queue.

Since all the information required for a rule may not be available until
the first packet arrives, the rule is added after recvmsg. Also, to
avoid hurting connection rates, the configuration is done asynchronously
via a work queue. So the first several packets may not use the assigned
queue.

The dev information is discarded in udp_queue_rcv_skb, so we record it
in the netpolicy struct in advance.

This patch only supports INET tcp4 and udp4. It can be extended to other
socket types and IPv6 later.

For each sk, only one rule is supported. If the port/address changes,
the previous rule is replaced.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netpolicy.h |  30 +++++++++++
 net/core/netpolicy.c      | 132 +++++++++++++++++++++++++++++++++++++++++++++-
 net/ipv4/af_inet.c        |  71 +++++++++++++++++++++++++
 net/ipv4/udp.c            |   4 ++
 4 files changed, 236 insertions(+), 1 deletion(-)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index e06b74c..04cd07d 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -37,6 +37,8 @@ enum netpolicy_traffic {
 	NETPOLICY_RXTX,
 };
 
+#define NETPOLICY_INVALID_QUEUE	-1
+#define NETPOLICY_INVALID_LOC	NETPOLICY_INVALID_QUEUE
 #define POLICY_NAME_LEN_MAX	64
 extern const char *policy_name[];
 
@@ -81,11 +83,34 @@ struct netpolicy_info {
 	struct list_head	obj_list[NETPOLICY_RXTX][NET_POLICY_MAX];
 };
 
+struct netpolicy_tcpudpip4_spec {
+	/* source and Destination host and port */
+	__be32	ip4src;
+	__be32	ip4dst;
+	__be16	psrc;
+	__be16	pdst;
+};
+
+union netpolicy_flow_union {
+	struct netpolicy_tcpudpip4_spec	tcp_udp_ip4_spec;
+};
+
+struct netpolicy_flow_spec {
+	__u32	flow_type;
+	union netpolicy_flow_union	spec;
+};
+
 struct netpolicy_instance {
 	struct net_device	*dev;
 	enum netpolicy_name	policy; /* required policy */
 	void			*ptr;   /* pointers */
 	struct task_struct	*task;
+	int			location;	/* rule location */
+	atomic_t		rule_queue;	/* queue set by rule */
+	struct work_struct	fc_wk;		/* flow classification work */
+	atomic_t		fc_wk_cnt;	/* flow classification work number */
+	struct netpolicy_flow_spec	flow;	/* flow information */
+
 };
 
 struct netpolicy_cpu_load {
@@ -106,6 +131,7 @@ extern int netpolicy_register(struct netpolicy_instance *instance,
 			      enum netpolicy_name policy);
 extern void netpolicy_unregister(struct netpolicy_instance *instance);
 extern int netpolicy_pick_queue(struct netpolicy_instance *instance, bool is_rx);
+extern void netpolicy_set_rules(struct netpolicy_instance *instance);
 #else
 static inline void update_netpolicy_sys_map(void)
 {
@@ -124,6 +150,10 @@ static inline int netpolicy_pick_queue(struct netpolicy_instance *instance, bool
 {
 	return 0;
 }
+
+static inline void netpolicy_set_rules(struct netpolicy_instance *instance)
+{
+}
 #endif
 
 #endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index e82e0d3..252cbee 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -54,6 +54,8 @@ struct netpolicy_record {
 static DEFINE_HASHTABLE(np_record_hash, 10);
 static DEFINE_SPINLOCK(np_hashtable_lock);
 
+struct workqueue_struct *np_fc_wq;
+
 static int netpolicy_get_dev_info(struct net_device *dev,
 				  struct netpolicy_dev_info *d_info)
 {
@@ -472,6 +474,90 @@ int netpolicy_pick_queue(struct netpolicy_instance *instance, bool is_rx)
 }
 EXPORT_SYMBOL(netpolicy_pick_queue);
 
+void np_flow_rule_set(struct work_struct *wk)
+{
+	struct netpolicy_instance *instance;
+	struct netpolicy_flow_spec *flow;
+	struct ethtool_rxnfc cmd;
+	struct net_device *dev;
+	int queue, ret;
+
+	instance = container_of(wk, struct netpolicy_instance,
+				fc_wk);
+	if (!instance)
+		goto done;
+
+	flow = &instance->flow;
+	if (WARN_ON(!flow))
+		goto done;
+	dev = instance->dev;
+	if (WARN_ON(!dev))
+		goto done;
+
+	/* Check if ntuple is supported */
+	if (!dev->ethtool_ops->set_rxnfc)
+		goto done;
+
+	/* Only support TCP/UDP V4 by now */
+	if ((flow->flow_type != TCP_V4_FLOW) &&
+	    (flow->flow_type != UDP_V4_FLOW))
+		goto done;
+
+	queue = get_avail_queue(instance, true);
+	if (queue < 0)
+		goto done;
+
+	/* using ethtool flow-type to configure
+	 * Rx network flow classification options or rules
+	 * RX_CLS_LOC_ANY must be supported by the driver
+	 */
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.cmd = ETHTOOL_SRXCLSRLINS;
+	cmd.fs.flow_type = flow->flow_type;
+	cmd.fs.h_u.tcp_ip4_spec.ip4src = flow->spec.tcp_udp_ip4_spec.ip4src;
+	cmd.fs.h_u.tcp_ip4_spec.psrc = flow->spec.tcp_udp_ip4_spec.psrc;
+	cmd.fs.h_u.tcp_ip4_spec.ip4dst = flow->spec.tcp_udp_ip4_spec.ip4dst;
+	cmd.fs.h_u.tcp_ip4_spec.pdst = flow->spec.tcp_udp_ip4_spec.pdst;
+	cmd.fs.ring_cookie = queue;
+	cmd.fs.location = RX_CLS_LOC_ANY;
+	rtnl_lock();
+	ret = dev->ethtool_ops->set_rxnfc(dev, &cmd);
+	rtnl_unlock();
+	if (ret < 0) {
+		pr_warn("Failed to set rules ret %d\n", ret);
+		atomic_set(&instance->rule_queue, NETPOLICY_INVALID_QUEUE);
+		goto done;
+	}
+
+	/* TODO: now one sk only has one rule */
+	if (instance->location != NETPOLICY_INVALID_LOC) {
+		/* delete the old rule */
+		struct ethtool_rxnfc del_cmd;
+
+		del_cmd.cmd = ETHTOOL_SRXCLSRLDEL;
+		del_cmd.fs.location = instance->location;
+		rtnl_lock();
+		ret = dev->ethtool_ops->set_rxnfc(dev, &del_cmd);
+		rtnl_unlock();
+		if (ret < 0)
+			pr_warn("Failed to delete rules ret %d\n", ret);
+	}
+
+	/* record rule location */
+	instance->location = cmd.fs.location;
+	atomic_set(&instance->rule_queue, queue);
+done:
+	atomic_set(&instance->fc_wk_cnt, 0);
+}
+
+static void init_instance(struct netpolicy_instance *instance)
+{
+	instance->location = NETPOLICY_INVALID_LOC;
+	atomic_set(&instance->rule_queue, NETPOLICY_INVALID_QUEUE);
+	atomic_set(&instance->fc_wk_cnt, 0);
+	INIT_WORK(&instance->fc_wk, np_flow_rule_set);
+}
+
 /**
  * netpolicy_register() - Register per socket/task policy request
  * @instance:	NET policy per socket/task instance info
@@ -516,6 +602,7 @@ int netpolicy_register(struct netpolicy_instance *instance,
 		}
 		kfree(new);
 	} else {
+		init_instance(instance);
 		new->ptr_id = ptr_id;
 		new->dev = instance->dev;
 		new->policy = policy;
@@ -538,8 +625,23 @@ EXPORT_SYMBOL(netpolicy_register);
  */
 void netpolicy_unregister(struct netpolicy_instance *instance)
 {
-	struct netpolicy_record *record;
 	unsigned long ptr_id = (uintptr_t)instance->ptr;
+	struct net_device *dev = instance->dev;
+	struct netpolicy_record *record;
+
+	cancel_work_sync(&instance->fc_wk);
+	/* remove FD rules */
+	if (dev && instance->location != NETPOLICY_INVALID_LOC) {
+		struct ethtool_rxnfc del_cmd;
+
+		del_cmd.cmd = ETHTOOL_SRXCLSRLDEL;
+		del_cmd.fs.location = instance->location;
+		rtnl_lock();
+		dev->ethtool_ops->set_rxnfc(dev, &del_cmd);
+		rtnl_unlock();
+		instance->location = NETPOLICY_INVALID_LOC;
+		atomic_set(&instance->rule_queue, NETPOLICY_INVALID_QUEUE);
+	}
 
 	spin_lock_bh(&np_hashtable_lock);
 	/* del from hash table */
@@ -555,6 +657,28 @@ void netpolicy_unregister(struct netpolicy_instance *instance)
 }
 EXPORT_SYMBOL(netpolicy_unregister);
 
+/**
+ * netpolicy_set_rules() - Configure Rx network flow classification rules
+ * @instance:	NET policy per socket/task instance info
+ *
+ * This function configures Rx network flow classification rules
+ * according to IP and port information. The configuration is done
+ * asynchronously via a work queue, to avoid hurting connection rates.
+ *
+ * Currently, it only supports TCP and UDP V4. Other protocols will be
+ * supported later.
+ *
+ */
+void netpolicy_set_rules(struct netpolicy_instance *instance)
+{
+	/* There should be only one work to run at the same time */
+	if (!atomic_cmpxchg(&instance->fc_wk_cnt, 0, 1)) {
+		instance->task = current;
+		queue_work(np_fc_wq, &instance->fc_wk);
+	}
+}
+EXPORT_SYMBOL(netpolicy_set_rules);
+
 const char *policy_name[NET_POLICY_MAX] = {
 	"NONE",
 	"CPU",
@@ -1255,6 +1379,10 @@ static int __init netpolicy_init(void)
 {
 	int ret;
 
+	np_fc_wq = create_workqueue("np_fc");
+	if (!np_fc_wq)
+		return -ENOMEM;
+
 	ret = register_pernet_subsys(&netpolicy_net_ops);
 	if (!ret)
 		register_netdevice_notifier(&netpolicy_dev_notf);
@@ -1268,6 +1396,8 @@ static int __init netpolicy_init(void)
 
 static void __exit netpolicy_exit(void)
 {
+	destroy_workqueue(np_fc_wq);
+
 	unregister_netdevice_notifier(&netpolicy_dev_notf);
 	unregister_pernet_subsys(&netpolicy_net_ops);
 
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index e94b47b..209edc4 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -754,6 +754,71 @@ ssize_t inet_sendpage(struct socket *sock, struct page *page, int offset,
 }
 EXPORT_SYMBOL(inet_sendpage);
 
+static void sock_netpolicy_manage_flow(struct sock *sk, struct msghdr *msg)
+{
+#ifdef CONFIG_NETPOLICY
+	struct netpolicy_instance *instance;
+	struct netpolicy_flow_spec *flow;
+	bool change = false;
+	int queue;
+
+	instance = netpolicy_find_instance(sk);
+	if (!instance)
+		return;
+
+	if (!instance->dev)
+		return;
+
+	flow = &instance->flow;
+	/* TODO: need to change here and add more protocol support */
+	if (sk->sk_family != AF_INET)
+		return;
+	if ((sk->sk_protocol == IPPROTO_TCP) &&
+	    (sk->sk_type == SOCK_STREAM)) {
+		if ((flow->flow_type != TCP_V4_FLOW) ||
+		    (flow->spec.tcp_udp_ip4_spec.ip4src != sk->sk_daddr) ||
+		    (flow->spec.tcp_udp_ip4_spec.psrc != sk->sk_dport) ||
+		    (flow->spec.tcp_udp_ip4_spec.ip4dst != sk->sk_rcv_saddr) ||
+		    (flow->spec.tcp_udp_ip4_spec.pdst != htons(sk->sk_num)))
+			change = true;
+		if (change) {
+			flow->flow_type = TCP_V4_FLOW;
+			flow->spec.tcp_udp_ip4_spec.ip4src = sk->sk_daddr;
+			flow->spec.tcp_udp_ip4_spec.psrc = sk->sk_dport;
+			flow->spec.tcp_udp_ip4_spec.ip4dst = sk->sk_rcv_saddr;
+			flow->spec.tcp_udp_ip4_spec.pdst = htons(sk->sk_num);
+		}
+	} else if ((sk->sk_protocol == IPPROTO_UDP) &&
+		   (sk->sk_type == SOCK_DGRAM)) {
+			DECLARE_SOCKADDR(struct sockaddr_in *, sin, msg->msg_name);
+
+			if (!sin || !sin->sin_addr.s_addr || !sin->sin_port)
+				return;
+			if ((flow->flow_type != UDP_V4_FLOW) ||
+			    (flow->spec.tcp_udp_ip4_spec.ip4src != sin->sin_addr.s_addr) ||
+			    (flow->spec.tcp_udp_ip4_spec.psrc != sin->sin_port) ||
+			    (flow->spec.tcp_udp_ip4_spec.ip4dst != sk->sk_rcv_saddr) ||
+			    (flow->spec.tcp_udp_ip4_spec.pdst != htons(sk->sk_num)))
+				change = true;
+			if (change) {
+				flow->flow_type = UDP_V4_FLOW;
+				flow->spec.tcp_udp_ip4_spec.ip4src = sin->sin_addr.s_addr;
+				flow->spec.tcp_udp_ip4_spec.psrc = sin->sin_port;
+				flow->spec.tcp_udp_ip4_spec.ip4dst = sk->sk_rcv_saddr;
+				flow->spec.tcp_udp_ip4_spec.pdst = htons(sk->sk_num);
+			}
+	} else {
+		return;
+	}
+
+	queue = netpolicy_pick_queue(instance, true);
+	if (queue < 0)
+		return;
+	if ((queue != atomic_read(&instance->rule_queue)) || change)
+		netpolicy_set_rules(instance);
+#endif
+}
+
 int inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
 		 int flags)
 {
@@ -767,6 +832,12 @@ int inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
 				   flags & ~MSG_DONTWAIT, &addr_len);
 	if (err >= 0)
 		msg->msg_namelen = addr_len;
+
+	/* The dev info, src address and port information for UDP
+	 * can only be retrieved after processing the msg.
+	 */
+	sock_netpolicy_manage_flow(sk, msg);
+
 	return err;
 }
 EXPORT_SYMBOL(inet_recvmsg);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 058c312..cc2499d 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1786,6 +1786,10 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	if (sk) {
 		int ret;
 
+#ifdef CONFIG_NETPOLICY
+		/* Record dev info before it's discarded in udp_queue_rcv_skb */
+		sk->sk_netpolicy.dev = skb->dev;
+#endif
 		if (inet_get_convert_csum(sk) && uh->check && !IS_UDPLITE(sk))
 			skb_checksum_try_convert(skb, IPPROTO_UDP, uh->check,
 						 inet_compute_pseudo);
-- 
2.5.5


* [RFC V3 PATCH 21/26] net/netpolicy: introduce per task net policy
  2016-09-12 14:55 [RFC V3 PATCH 00/26] Kernel NET policy kan.liang
                   ` (19 preceding siblings ...)
  2016-09-12 14:55 ` [RFC V3 PATCH 20/26] net/netpolicy: set Rx queues according to policy kan.liang
@ 2016-09-12 14:55 ` kan.liang
  2016-09-12 14:55 ` [RFC V3 PATCH 22/26] net/netpolicy: set per task policy by proc kan.liang
                   ` (6 subsequent siblings)
  27 siblings, 0 replies; 44+ messages in thread
From: kan.liang @ 2016-09-12 14:55 UTC (permalink / raw)
  To: davem, linux-kernel, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
	hannes, stephen, alexei.starovoitov, jesse.brandeburg, andi,
	Kan Liang

From: Kan Liang <kan.liang@intel.com>

Usually, an application as a whole has a specific requirement. Applying
the net policy to every socket in the application one by one is too
cumbersome. This patch introduces a per-task net policy to address this
case. Once a per-task net policy is applied, all sockets in the
application use the same net policy. The per-task net policy is also
inherited by all children.

The PR_SET_NETPOLICY option is used as below.
prctl(PR_SET_NETPOLICY, POLICY_NAME, NULL, NULL, NULL).
It applies the per-task policy. The policy name must be valid and
compatible with the current device policy; otherwise the call errors
out and the task policy is left as NET_POLICY_INVALID.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/init_task.h  |  9 +++++++++
 include/linux/sched.h      |  5 +++++
 include/net/sock.h         | 12 +++++++++++-
 include/uapi/linux/prctl.h |  4 ++++
 kernel/exit.c              |  4 ++++
 kernel/fork.c              |  6 ++++++
 kernel/sys.c               | 31 +++++++++++++++++++++++++++++++
 net/core/netpolicy.c       | 35 +++++++++++++++++++++++++++++++++++
 net/core/sock.c            | 10 +++++++++-
 net/ipv4/af_inet.c         |  7 +++++--
 10 files changed, 119 insertions(+), 4 deletions(-)

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index f8834f8..133d1cb 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -183,6 +183,14 @@ extern struct task_group root_task_group;
 # define INIT_KASAN(tsk)
 #endif
 
+#ifdef CONFIG_NETPOLICY
+#define INIT_NETPOLICY(tsk)						\
+	.task_netpolicy.policy = NET_POLICY_INVALID,			\
+	.task_netpolicy.dev = NULL,					\
+	.task_netpolicy.ptr = (void *)&tsk,
+#else
+#define INIT_NETPOLICY(tsk)
+#endif
 /*
  *  INIT_TASK is used to set up the first task table, touch at
  * your own risk!. Base=0, limit=0x1fffff (=2MB)
@@ -260,6 +268,7 @@ extern struct task_group root_task_group;
 	INIT_VTIME(tsk)							\
 	INIT_NUMA_BALANCING(tsk)					\
 	INIT_KASAN(tsk)							\
+	INIT_NETPOLICY(tsk)						\
 }
 
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 3b716a3..1c8c674 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -62,6 +62,8 @@ struct sched_param {
 
 #include <asm/processor.h>
 
+#include <linux/netpolicy.h>
+
 #define SCHED_ATTR_SIZE_VER0	48	/* sizeof first published struct */
 
 /*
@@ -1923,6 +1925,9 @@ struct task_struct {
 #ifdef CONFIG_MMU
 	struct task_struct *oom_reaper_list;
 #endif
+#ifdef CONFIG_NETPOLICY
+	struct netpolicy_instance task_netpolicy;
+#endif
 /* CPU-specific state of this task */
 	struct thread_struct thread;
 /*
diff --git a/include/net/sock.h b/include/net/sock.h
index ca97f35..867dc84 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1484,6 +1484,7 @@ void sock_edemux(struct sk_buff *skb);
 #define sock_edemux(skb) sock_efree(skb)
 #endif
 
+void sock_setnetpolicy(struct socket *sock);
 int sock_setsockopt(struct socket *sock, int level, int op,
 		    char __user *optval, unsigned int optlen);
 
@@ -2280,10 +2281,19 @@ extern int sysctl_optmem_max;
 extern __u32 sysctl_wmem_default;
 extern __u32 sysctl_rmem_default;
 
-/* Return netpolicy instance information from socket. */
+/* Return netpolicy instance information from either task or socket.
+ * If both the task and the socket have netpolicy instance information,
+ * use the task's and unregister the socket's, because the task policy
+ * is the dominant policy.
+ */
 static inline struct netpolicy_instance *netpolicy_find_instance(struct sock *sk)
 {
 #ifdef CONFIG_NETPOLICY
+	if (is_net_policy_valid(current->task_netpolicy.policy)) {
+		if (is_net_policy_valid(sk->sk_netpolicy.policy))
+			netpolicy_unregister(&sk->sk_netpolicy);
+		return &current->task_netpolicy;
+	}
 	if (is_net_policy_valid(sk->sk_netpolicy.policy))
 		return &sk->sk_netpolicy;
 #endif
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index a8d0759..bc182d2 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -197,4 +197,8 @@ struct prctl_mm_map {
 # define PR_CAP_AMBIENT_LOWER		3
 # define PR_CAP_AMBIENT_CLEAR_ALL	4
 
+/* Control net policy */
+#define PR_SET_NETPOLICY		48
+#define PR_GET_NETPOLICY		49
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/exit.c b/kernel/exit.c
index 2f974ae..37841da 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -858,6 +858,10 @@ void do_exit(long code)
 	if (unlikely(current->pi_state_cache))
 		kfree(current->pi_state_cache);
 #endif
+#ifdef CONFIG_NETPOLICY
+	if (is_net_policy_valid(current->task_netpolicy.policy))
+		netpolicy_unregister(&current->task_netpolicy);
+#endif
 	/*
 	 * Make sure we are holding no locks:
 	 */
diff --git a/kernel/fork.c b/kernel/fork.c
index 52e725d..fd61b7d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1451,6 +1451,12 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	p->sequential_io_avg	= 0;
 #endif
 
+#ifdef CONFIG_NETPOLICY
+	p->task_netpolicy.ptr = (void *)p;
+	if (is_net_policy_valid(p->task_netpolicy.policy))
+		netpolicy_register(&p->task_netpolicy, p->task_netpolicy.policy);
+#endif
+
 	/* Perform scheduler related setup. Assign this task to a CPU. */
 	retval = sched_fork(clone_flags, p);
 	if (retval)
diff --git a/kernel/sys.c b/kernel/sys.c
index 89d5be4..b481a64 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2072,6 +2072,31 @@ static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr)
 }
 #endif
 
+#ifdef CONFIG_NETPOLICY
+static int prctl_set_netpolicy(struct task_struct *me, int policy)
+{
+	return netpolicy_register(&me->task_netpolicy, policy);
+}
+
+static int prctl_get_netpolicy(struct task_struct *me, unsigned long adr)
+{
+	return put_user(me->task_netpolicy.policy, (int __user *)adr);
+}
+
+#else /* CONFIG_NETPOLICY */
+
+static int prctl_set_netpolicy(struct task_struct *me, int policy)
+{
+	return -EINVAL;
+}
+
+static int prctl_get_netpolicy(struct task_struct *me, unsigned long adr)
+{
+	return -EINVAL;
+}
+
+#endif /* CONFIG_NETPOLICY */
+
 SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		unsigned long, arg4, unsigned long, arg5)
 {
@@ -2270,6 +2295,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 	case PR_GET_FP_MODE:
 		error = GET_FP_MODE(me);
 		break;
+	case PR_SET_NETPOLICY:
+		error = prctl_set_netpolicy(me, arg2);
+		break;
+	case PR_GET_NETPOLICY:
+		error = prctl_get_netpolicy(me, arg2);
+		break;
 	default:
 		error = -EINVAL;
 		break;
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 252cbee..60a6d69 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -24,6 +24,35 @@
  *	  is too difficult for users.
  * 	So, it is a big challenge to get good network performance.
  *
+ * NET policy supports four policies per device, and three policies per task
+ * and per socket. To use NET policy, the device policy must be set in
+ * advance. The task policy or socket policy must be compatible with the
+ * device policy.
+ *
+ * BULK policy		This policy is designed for high throughput. It can be
+ *			applied to either device policy or task/socket policy.
+ *			If it is applied to device policy, the only compatible
+ *			task/socket policy is BULK policy itself.
+ * CPU policy		This policy is designed for high throughput and lower
+ *			CPU utilization. It can be applied to either device
+ *			policy or task/socket policy. If it is applied to
+ *			device policy, the only compatible task/socket policy
+ *			is CPU policy itself.
+ * LATENCY policy	This policy is designed for low latency. It can be
+ *			applied to either device policy or task/socket policy.
+ *			If it is applied to device policy, the only compatible
+ *			task/socket policy is LATENCY policy itself.
+ * MIX policy		This policy can only be applied to device policy. It
+ *			is compatible with BULK and LATENCY policy. This
+ *			policy is designed for the case in which
+ *			miscellaneous types of workload run on the device.
+ *
+ * The device policy changes the system configuration and reorganizes the
+ * resources on the device, but it does not change packet behavior.
+ * The task policy and socket policy redirect packets to get good
+ * performance. If both a task policy and a socket policy are set in the
+ * same task, the task policy is applied. The task policy can also be
+ * inherited by children.
  */
 #include <linux/module.h>
 #include <linux/kernel.h>
@@ -444,6 +473,12 @@ static inline bool policy_validate(struct netpolicy_instance *instance)
 			policy_name[instance->policy]);
 		return false;
 	}
+
+	/* task policy is dominant policy */
+	if (is_net_policy_valid(current->task_netpolicy.policy) &&
+	    (current->task_netpolicy.policy != instance->policy))
+		return false;
+
 	return true;
 }
 
diff --git a/net/core/sock.c b/net/core/sock.c
index 80d9f08..1726a3c 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1006,7 +1006,13 @@ set_rcvbuf:
 
 #ifdef CONFIG_NETPOLICY
 	case SO_NETPOLICY:
-		ret = netpolicy_register(&sk->sk_netpolicy, val);
+		if (is_net_policy_valid(current->task_netpolicy.policy) &&
+		    (current->task_netpolicy.policy != val)) {
+			printk_ratelimited(KERN_WARNING "NETPOLICY: new policy is not compatible with task netpolicy\n");
+			ret = -EINVAL;
+		} else {
+			ret = netpolicy_register(&sk->sk_netpolicy, val);
+		}
 		break;
 #endif
 	default:
@@ -1599,6 +1605,8 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 
 #ifdef CONFIG_NETPOLICY
 		newsk->sk_netpolicy.ptr = (void *)newsk;
+		if (is_net_policy_valid(current->task_netpolicy.policy))
+			newsk->sk_netpolicy.policy = NET_POLICY_INVALID;
 		if (is_net_policy_valid(newsk->sk_netpolicy.policy))
 			netpolicy_register(&newsk->sk_netpolicy, newsk->sk_netpolicy.policy);
 
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 209edc4..71bee44 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -766,8 +766,11 @@ static void sock_netpolicy_manage_flow(struct sock *sk, struct msghdr *msg)
 	if (!instance)
 		return;
 
-	if (!instance->dev)
-		return;
+	if (!instance->dev) {
+		if (!sk->sk_netpolicy.dev)
+			return;
+		instance->dev = sk->sk_netpolicy.dev;
+	}
 
 	flow = &instance->flow;
 	/* TODO: need to change here and add more protocol support */
-- 
2.5.5


* [RFC V3 PATCH 22/26] net/netpolicy: set per task policy by proc
  2016-09-12 14:55 [RFC V3 PATCH 00/26] Kernel NET policy kan.liang
                   ` (20 preceding siblings ...)
  2016-09-12 14:55 ` [RFC V3 PATCH 21/26] net/netpolicy: introduce per task net policy kan.liang
@ 2016-09-12 14:55 ` kan.liang
  2016-09-12 17:01   ` Sergei Shtylyov
  2016-09-12 14:55 ` [RFC V3 PATCH 23/26] net/netpolicy: fast path for finding the queues kan.liang
                   ` (5 subsequent siblings)
  27 siblings, 1 reply; 44+ messages in thread
From: kan.liang @ 2016-09-12 14:55 UTC (permalink / raw)
  To: davem, linux-kernel, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
	hannes, stephen, alexei.starovoitov, jesse.brandeburg, andi,
	Kan Liang

From: Kan Liang <kan.liang@intel.com>

Users may not want to change the source code to add per task net policy
support, or they may want to change a running task's net policy. prctl
works for neither case.

This patch adds an interface in /proc, which can be used to set and
retrieve the policy of already running tasks. Users can write the policy
name into /proc/$PID/net_policy to set the per task net policy.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 fs/proc/base.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 54e2702..cfd7f5d 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -91,6 +91,8 @@
 #include <asm/hardwall.h>
 #endif
 #include <trace/events/oom.h>
+#include <linux/netpolicy.h>
+#include <linux/ctype.h>
 #include "internal.h"
 #include "fd.h"
 
@@ -2811,6 +2813,65 @@ static int proc_pid_personality(struct seq_file *m, struct pid_namespace *ns,
 	return err;
 }
 
+#ifdef CONFIG_NETPOLICY
+static int proc_net_policy_show(struct seq_file *m, void *v)
+{
+	struct inode *inode = m->private;
+	struct task_struct *task = get_proc_task(inode);
+
+	if (is_net_policy_valid(task->task_netpolicy.policy))
+		seq_printf(m, "%s\n", policy_name[task->task_netpolicy.policy]);
+
+	return 0;
+}
+
+static int proc_net_policy_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, proc_net_policy_show, inode);
+}
+
+static ssize_t proc_net_policy_write(struct file *file, const char __user *buf,
+				     size_t count, loff_t *ppos)
+{
+	struct inode *inode = file_inode(file);
+	struct task_struct *task = get_proc_task(inode);
+	char name[POLICY_NAME_LEN_MAX];
+	int i, ret;
+
+	if (count >= POLICY_NAME_LEN_MAX)
+		return -EINVAL;
+
+	if (copy_from_user(name, buf, count))
+		return -EINVAL;
+
+	for (i = 0; i < count - 1; i++)
+		name[i] = toupper(name[i]);
+	name[POLICY_NAME_LEN_MAX - 1] = 0;
+
+	for (i = 0; i < NET_POLICY_MAX; i++) {
+		if (!strncmp(name, policy_name[i], strlen(policy_name[i]))) {
+			ret = netpolicy_register(&task->task_netpolicy, i);
+			if (ret)
+				return ret;
+			break;
+		}
+	}
+
+	if (i == NET_POLICY_MAX)
+		return -EINVAL;
+
+	return count;
+}
+
+static const struct file_operations proc_net_policy_operations = {
+	.open		= proc_net_policy_open,
+	.write		= proc_net_policy_write,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= seq_release,
+};
+#endif /* CONFIG_NETPOLICY */
+
 /*
  * Thread groups
  */
@@ -2910,6 +2971,9 @@ static const struct pid_entry tgid_base_stuff[] = {
 	REG("timers",	  S_IRUGO, proc_timers_operations),
 #endif
 	REG("timerslack_ns", S_IRUGO|S_IWUGO, proc_pid_set_timerslack_ns_operations),
+#if IS_ENABLED(CONFIG_NETPOLICY)
+	REG("net_policy", S_IRUSR|S_IWUSR, proc_net_policy_operations),
+#endif
 };
 
 static int proc_tgid_base_readdir(struct file *file, struct dir_context *ctx)
-- 
2.5.5


* [RFC V3 PATCH 23/26] net/netpolicy: fast path for finding the queues
  2016-09-12 14:55 [RFC V3 PATCH 00/26] Kernel NET policy kan.liang
                   ` (21 preceding siblings ...)
  2016-09-12 14:55 ` [RFC V3 PATCH 22/26] net/netpolicy: set per task policy by proc kan.liang
@ 2016-09-12 14:55 ` kan.liang
  2016-09-12 14:55 ` [RFC V3 PATCH 24/26] net/netpolicy: optimize for queue pair kan.liang
                   ` (4 subsequent siblings)
  27 siblings, 0 replies; 44+ messages in thread
From: kan.liang @ 2016-09-12 14:55 UTC (permalink / raw)
  To: davem, linux-kernel, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
	hannes, stephen, alexei.starovoitov, jesse.brandeburg, andi,
	Kan Liang

From: Kan Liang <kan.liang@intel.com>

The current implementation searches the hash table to get the assigned
object for each transmitted/received packet. This is not necessary,
because the assigned object usually remains unchanged. This patch
stores the assigned queue to speed up the search.

But in certain situations the assigned objects have to change,
especially when the system CPU-to-queue mapping changes, such as on CPU
hotplug, device hotplug, or a change in the number of queues. In this
patch, netpolicy_sys_map_version is used to track changes to the system
CPU-to-queue mapping. If netpolicy_sys_map_version does not match the
instance's version, the stored queue is dropped.
netpolicy_sys_map_version is protected by RCU.

Also, to reduce overhead, this patch finds the available object
asynchronously via a work queue, so the first several packets may not
benefit.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netpolicy.h |  23 +++++++---
 net/core/netpolicy.c      | 106 +++++++++++++++++++++++++++++++++++++++++++++-
 net/ipv4/af_inet.c        |   7 +--
 3 files changed, 125 insertions(+), 11 deletions(-)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index 04cd07d..88f4f60 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -55,14 +55,20 @@ struct netpolicy_sys_map {
 	u32	irq;
 };
 
+struct netpolicy_sys_map_version {
+	struct rcu_head		rcu;
+	int			major;
+};
+
 struct netpolicy_sys_info {
 	/*
 	 * Record the cpu and queue 1:1 mapping
 	 */
-	u32				avail_rx_num;
-	struct netpolicy_sys_map	*rx;
-	u32				avail_tx_num;
-	struct netpolicy_sys_map	*tx;
+	u32					avail_rx_num;
+	struct netpolicy_sys_map		*rx;
+	u32					avail_tx_num;
+	struct netpolicy_sys_map		*tx;
+	struct netpolicy_sys_map_version __rcu	*version;
 };
 
 struct netpolicy_object {
@@ -110,7 +116,14 @@ struct netpolicy_instance {
 	struct work_struct	fc_wk;		/* flow classification work */
 	atomic_t		fc_wk_cnt;	/* flow classification work number */
 	struct netpolicy_flow_spec	flow;	/* flow information */
-
+	/* For fast path */
+	atomic_t		rx_queue;
+	atomic_t		tx_queue;
+	struct work_struct	get_rx_wk;
+	atomic_t		get_rx_wk_cnt;
+	struct work_struct	get_tx_wk;
+	atomic_t		get_tx_wk_cnt;
+	int			sys_map_version;
 };
 
 struct netpolicy_cpu_load {
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 60a6d69..2f55a14 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -84,6 +84,7 @@ static DEFINE_HASHTABLE(np_record_hash, 10);
 static DEFINE_SPINLOCK(np_hashtable_lock);
 
 struct workqueue_struct *np_fc_wq;
+struct workqueue_struct *np_fast_path_wq;
 
 static int netpolicy_get_dev_info(struct net_device *dev,
 				  struct netpolicy_dev_info *d_info)
@@ -456,6 +457,37 @@ err:
 	return queue;
 }
 
+static void np_find_rx_queue(struct work_struct *wk)
+{
+	struct netpolicy_instance *instance;
+	int queue;
+
+	instance = container_of(wk, struct netpolicy_instance,
+				get_rx_wk);
+
+	if (instance) {
+		queue = get_avail_queue(instance, true);
+		if (queue >= 0)
+			atomic_set(&instance->rx_queue, queue);
+	}
+	atomic_set(&instance->get_rx_wk_cnt, 0);
+}
+
+static void np_find_tx_queue(struct work_struct *wk)
+{
+	struct netpolicy_instance *instance;
+	int queue;
+
+	instance = container_of(wk, struct netpolicy_instance,
+				get_tx_wk);
+	if (instance) {
+		queue = get_avail_queue(instance, false);
+		if (queue >= 0)
+			atomic_set(&instance->tx_queue, queue);
+	}
+	atomic_set(&instance->get_tx_wk_cnt, 0);
+}
+
 static inline bool policy_validate(struct netpolicy_instance *instance)
 {
 	struct net_device *dev = instance->dev;
@@ -498,6 +530,7 @@ static inline bool policy_validate(struct netpolicy_instance *instance)
 int netpolicy_pick_queue(struct netpolicy_instance *instance, bool is_rx)
 {
 	struct net_device *dev = instance->dev;
+	int version;
 
 	if (!dev || !dev->netpolicy)
 		return -EINVAL;
@@ -505,7 +538,35 @@ int netpolicy_pick_queue(struct netpolicy_instance *instance, bool is_rx)
 	if (!policy_validate(instance))
 		return -EINVAL;
 
-	return get_avail_queue(instance, is_rx);
+	/* fast path */
+	rcu_read_lock();
+	version = rcu_dereference(dev->netpolicy->sys_info.version)->major;
+	if (version == instance->sys_map_version) {
+		if (is_rx && (atomic_read(&instance->rx_queue) != NETPOLICY_INVALID_QUEUE)) {
+			rcu_read_unlock();
+			return atomic_read(&instance->rx_queue);
+		}
+		if (!is_rx && (atomic_read(&instance->tx_queue) != NETPOLICY_INVALID_QUEUE)) {
+			rcu_read_unlock();
+			return atomic_read(&instance->tx_queue);
+		}
+	} else {
+		atomic_set(&instance->rx_queue, NETPOLICY_INVALID_QUEUE);
+		atomic_set(&instance->tx_queue, NETPOLICY_INVALID_QUEUE);
+		instance->sys_map_version = version;
+	}
+	rcu_read_unlock();
+
+	if (is_rx && !atomic_cmpxchg(&instance->get_rx_wk_cnt, 0, 1)) {
+		instance->task = current;
+		queue_work(np_fast_path_wq, &instance->get_rx_wk);
+	}
+	if (!is_rx && !atomic_cmpxchg(&instance->get_tx_wk_cnt, 0, 1)) {
+		instance->task = current;
+		queue_work(np_fast_path_wq, &instance->get_tx_wk);
+	}
+
+	return -1;
 }
 EXPORT_SYMBOL(netpolicy_pick_queue);
 
@@ -541,6 +602,7 @@ void np_flow_rule_set(struct work_struct *wk)
 	queue = get_avail_queue(instance, true);
 	if (queue < 0)
 		goto done;
+	atomic_set(&instance->rx_queue, queue);
 
 	/* using ethtool flow-type to configure
 	 * Rx network flow classification options or rules
@@ -591,6 +653,14 @@ static void init_instance(struct netpolicy_instance *instance)
 	atomic_set(&instance->rule_queue, NETPOLICY_INVALID_QUEUE);
 	atomic_set(&instance->fc_wk_cnt, 0);
 	INIT_WORK(&instance->fc_wk, np_flow_rule_set);
+
+	atomic_set(&instance->rx_queue, NETPOLICY_INVALID_QUEUE);
+	atomic_set(&instance->tx_queue, NETPOLICY_INVALID_QUEUE);
+	instance->sys_map_version = 0;
+	atomic_set(&instance->get_rx_wk_cnt, 0);
+	atomic_set(&instance->get_tx_wk_cnt, 0);
+	INIT_WORK(&instance->get_rx_wk, np_find_rx_queue);
+	INIT_WORK(&instance->get_tx_wk, np_find_tx_queue);
 }
 
 /**
@@ -664,6 +734,8 @@ void netpolicy_unregister(struct netpolicy_instance *instance)
 	struct net_device *dev = instance->dev;
 	struct netpolicy_record *record;
 
+	cancel_work_sync(&instance->get_rx_wk);
+	cancel_work_sync(&instance->get_tx_wk);
 	cancel_work_sync(&instance->fc_wk);
 	/* remove FD rules */
 	if (dev && instance->location != NETPOLICY_INVALID_LOC) {
@@ -1196,6 +1268,7 @@ static int netpolicy_proc_dev_init(struct net *net, struct net_device *dev)
 
 int init_netpolicy(struct net_device *dev)
 {
+	struct netpolicy_sys_map_version *version;
 	int ret, i, j;
 
 	spin_lock(&dev->np_lock);
@@ -1229,6 +1302,14 @@ int init_netpolicy(struct net_device *dev)
 	}
 	spin_unlock(&dev->np_ob_list_lock);
 
+	version = kzalloc(sizeof(*version), GFP_ATOMIC);
+	if (version)
+		rcu_assign_pointer(dev->netpolicy->sys_info.version, version);
+	else {
+		kfree(dev->netpolicy);
+		dev->netpolicy = NULL;
+		ret = -ENOMEM;
+	}
 unlock:
 	spin_unlock(&dev->np_lock);
 	return ret;
@@ -1240,6 +1321,8 @@ void uninit_netpolicy(struct net_device *dev)
 	if (dev->netpolicy) {
 		if (dev->netpolicy->cur_policy > NET_POLICY_NONE)
 			netpolicy_disable(dev);
+		RCU_INIT_POINTER(dev->netpolicy->sys_info.version, NULL);
+		synchronize_rcu();
 		kfree(dev->netpolicy);
 		dev->netpolicy = NULL;
 	}
@@ -1348,6 +1431,7 @@ void update_netpolicy_sys_map(void)
 	struct net *net;
 	struct net_device *dev, *aux;
 	enum netpolicy_name cur_policy;
+	struct netpolicy_sys_map_version *new_version, *old_version;
 
 	for_each_net(net) {
 		for_each_netdev_safe(net, dev, aux) {
@@ -1379,6 +1463,19 @@ void update_netpolicy_sys_map(void)
 			}
 
 			dev->netpolicy->cur_policy = cur_policy;
+
+			old_version = rcu_dereference_protected(dev->netpolicy->sys_info.version, 1);
+			new_version = kzalloc(sizeof(*new_version), GFP_ATOMIC);
+			if (new_version) {
+				new_version->major = old_version->major + 1;
+				if (new_version->major < 0)
+					new_version->major = 0;
+				rcu_assign_pointer(dev->netpolicy->sys_info.version, new_version);
+				kfree_rcu(old_version, rcu);
+			} else {
+				pr_warn("NETPOLICY: Failed to update sys map version for dev %s\n",
+					dev->name);
+			}
 unlock:
 			spin_unlock(&dev->np_lock);
 		}
@@ -1418,6 +1515,12 @@ static int __init netpolicy_init(void)
 	if (!np_fc_wq)
 		return -ENOMEM;
 
+	np_fast_path_wq = create_workqueue("np_fast_path");
+	if (!np_fast_path_wq) {
+		destroy_workqueue(np_fc_wq);
+		return -ENOMEM;
+	}
+
 	ret = register_pernet_subsys(&netpolicy_net_ops);
 	if (!ret)
 		register_netdevice_notifier(&netpolicy_dev_notf);
@@ -1432,6 +1535,7 @@ static int __init netpolicy_init(void)
 static void __exit netpolicy_exit(void)
 {
 	destroy_workqueue(np_fc_wq);
+	destroy_workqueue(np_fast_path_wq);
 
 	unregister_netdevice_notifier(&netpolicy_dev_notf);
 	unregister_pernet_subsys(&netpolicy_net_ops);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 71bee44..8d90afd 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -760,7 +760,6 @@ static void sock_netpolicy_manage_flow(struct sock *sk, struct msghdr *msg)
 	struct netpolicy_instance *instance;
 	struct netpolicy_flow_spec *flow;
 	bool change = false;
-	int queue;
 
 	instance = netpolicy_find_instance(sk);
 	if (!instance)
@@ -814,10 +813,8 @@ static void sock_netpolicy_manage_flow(struct sock *sk, struct msghdr *msg)
 		return;
 	}
 
-	queue = netpolicy_pick_queue(instance, true);
-	if (queue < 0)
-		return;
-	if ((queue != atomic_read(&instance->rule_queue)) || change)
+	if ((atomic_read(&instance->rx_queue) != atomic_read(&instance->rule_queue)) ||
+	    change)
 		netpolicy_set_rules(instance);
 #endif
 }
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [RFC V3 PATCH 24/26] net/netpolicy: optimize for queue pair
  2016-09-12 14:55 [RFC V3 PATCH 00/26] Kernel NET policy kan.liang
                   ` (22 preceding siblings ...)
  2016-09-12 14:55 ` [RFC V3 PATCH 23/26] net/netpolicy: fast path for finding the queues kan.liang
@ 2016-09-12 14:55 ` kan.liang
  2016-09-12 14:55 ` [RFC V3 PATCH 25/26] net/netpolicy: limit the total record number kan.liang
                   ` (3 subsequent siblings)
  27 siblings, 0 replies; 44+ messages in thread
From: kan.liang @ 2016-09-12 14:55 UTC (permalink / raw)
  To: davem, linux-kernel, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
	hannes, stephen, alexei.starovoitov, jesse.brandeburg, andi,
	Kan Liang

From: Kan Liang <kan.liang@intel.com>

Some drivers, such as the i40e driver, do not support separate Tx and Rx
queues as channels. If queue_pair is set by the driver, the Rx queue is
used to stand for the channel.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netpolicy.h | 1 +
 net/core/netpolicy.c      | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index 88f4f60..9b03b4d 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -83,6 +83,7 @@ struct netpolicy_info {
 	unsigned long avail_policy[BITS_TO_LONGS(NET_POLICY_MAX)];
 	bool irq_affinity;
 	bool has_mix_policy;
+	bool queue_pair;
 	/* cpu and queue mapping information */
 	struct netpolicy_sys_info	sys_info;
 	/* List of policy objects 0 rx 1 tx */
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 2f55a14..84503a4 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -538,6 +538,9 @@ int netpolicy_pick_queue(struct netpolicy_instance *instance, bool is_rx)
 	if (!policy_validate(instance))
 		return -EINVAL;
 
+	if (dev->netpolicy->queue_pair)
+		is_rx = true;
+
 	/* fast path */
 	rcu_read_lock();
 	version = rcu_dereference(dev->netpolicy->sys_info.version)->major;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [RFC V3 PATCH 25/26] net/netpolicy: limit the total record number
  2016-09-12 14:55 [RFC V3 PATCH 00/26] Kernel NET policy kan.liang
                   ` (23 preceding siblings ...)
  2016-09-12 14:55 ` [RFC V3 PATCH 24/26] net/netpolicy: optimize for queue pair kan.liang
@ 2016-09-12 14:55 ` kan.liang
  2016-09-12 14:55 ` [RFC V3 PATCH 26/26] Documentation/networking: Document NET policy kan.liang
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 44+ messages in thread
From: kan.liang @ 2016-09-12 14:55 UTC (permalink / raw)
  To: davem, linux-kernel, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
	hannes, stephen, alexei.starovoitov, jesse.brandeburg, andi,
	Kan Liang

From: Kan Liang <kan.liang@intel.com>

NET policy cannot fulfill user requests without limit, for both security
and device-capacity reasons. On the security side, an attacker could
fake millions of per-task/socket requests to crash the system. On the
device side, the number of flow director rules is limited in the i40e
driver. NET policy must not run out of rules, otherwise it cannot
guarantee good performance.

This patch limits the total number of records in the RCU hash table to
address both cases. The maximum record number can vary between devices.
For the i40e driver, it is derived from the number of flow director
rules. If the limit is exceeded, registration and new object requests
are denied.

Since the dev may not be known at registration time, cur_rec_num may not
be updated immediately, so the actual number of registered records can
exceed max_rec_num. This does not cause problems, because the limit is
also checked on object requests, which guarantees that device resources
will not run out.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netpolicy.h |  4 ++++
 net/core/netpolicy.c      | 23 +++++++++++++++++++++--
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index 9b03b4d..27fe8e9 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -40,6 +40,7 @@ enum netpolicy_traffic {
 #define NETPOLICY_INVALID_QUEUE	-1
 #define NETPOLICY_INVALID_LOC	NETPOLICY_INVALID_QUEUE
 #define POLICY_NAME_LEN_MAX	64
+#define NETPOLICY_MAX_RECORD_NUM	7000
 extern const char *policy_name[];
 
 struct netpolicy_dev_info {
@@ -88,6 +89,9 @@ struct netpolicy_info {
 	struct netpolicy_sys_info	sys_info;
 	/* List of policy objects 0 rx 1 tx */
 	struct list_head	obj_list[NETPOLICY_RXTX][NET_POLICY_MAX];
+	/* for record number limitation */
+	int			max_rec_num;
+	atomic_t		cur_rec_num;
 };
 
 struct netpolicy_tcpudpip4_spec {
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 84503a4..81afc47 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -409,6 +409,9 @@ static int get_avail_queue(struct netpolicy_instance *instance, bool is_rx)
 	unsigned long ptr_id = (uintptr_t)instance->ptr;
 	int queue = -1;
 
+	if (atomic_read(&dev->netpolicy->cur_rec_num) > dev->netpolicy->max_rec_num)
+		return queue;
+
 	spin_lock_bh(&np_hashtable_lock);
 	old_record = netpolicy_record_search(ptr_id);
 	if (!old_record) {
@@ -430,8 +433,10 @@ static int get_avail_queue(struct netpolicy_instance *instance, bool is_rx)
 		if (is_rx) {
 			new_record->rx_obj = get_avail_object(dev, new_record->policy,
 							      instance, is_rx);
-			if (!new_record->dev)
+			if (!new_record->dev) {
 				new_record->dev = dev;
+				atomic_inc(&dev->netpolicy->cur_rec_num);
+			}
 			if (!new_record->rx_obj) {
 				kfree(new_record);
 				goto err;
@@ -440,8 +445,10 @@ static int get_avail_queue(struct netpolicy_instance *instance, bool is_rx)
 		} else {
 			new_record->tx_obj = get_avail_object(dev, new_record->policy,
 							      instance, is_rx);
-			if (!new_record->dev)
+			if (!new_record->dev) {
 				new_record->dev = dev;
+				atomic_inc(&dev->netpolicy->cur_rec_num);
+			}
 			if (!new_record->tx_obj) {
 				kfree(new_record);
 				goto err;
@@ -685,6 +692,7 @@ int netpolicy_register(struct netpolicy_instance *instance,
 		       enum netpolicy_name policy)
 {
 	unsigned long ptr_id = (uintptr_t)instance->ptr;
+	struct net_device *dev = instance->dev;
 	struct netpolicy_record *new, *old;
 
 	if (!is_net_policy_valid(policy)) {
@@ -692,6 +700,10 @@ int netpolicy_register(struct netpolicy_instance *instance,
 		return -EINVAL;
 	}
 
+	if (dev && dev->netpolicy &&
+	    (atomic_read(&dev->netpolicy->cur_rec_num) > dev->netpolicy->max_rec_num))
+		return -ENOSPC;
+
 	new = kzalloc(sizeof(*new), GFP_KERNEL);
 	if (!new) {
 		instance->policy = NET_POLICY_INVALID;
@@ -715,6 +727,8 @@ int netpolicy_register(struct netpolicy_instance *instance,
 		new->dev = instance->dev;
 		new->policy = policy;
 		hash_add_rcu(np_record_hash, &new->hash_node, ptr_id);
+		if (dev && dev->netpolicy)
+			atomic_inc(&dev->netpolicy->cur_rec_num);
 	}
 	instance->policy = policy;
 	spin_unlock_bh(&np_hashtable_lock);
@@ -761,6 +775,8 @@ void netpolicy_unregister(struct netpolicy_instance *instance)
 		/* The record cannot be share. It can be safely free. */
 		put_queue(record->dev, record->rx_obj, record->tx_obj);
 		kfree(record);
+		if (dev && dev->netpolicy)
+			atomic_dec(&dev->netpolicy->cur_rec_num);
 	}
 	instance->policy = NET_POLICY_INVALID;
 	spin_unlock_bh(&np_hashtable_lock);
@@ -1298,6 +1314,9 @@ int init_netpolicy(struct net_device *dev)
 		goto unlock;
 	}
 
+	if (!dev->netpolicy->max_rec_num)
+		dev->netpolicy->max_rec_num = NETPOLICY_MAX_RECORD_NUM;
+
 	spin_lock(&dev->np_ob_list_lock);
 	for (i = 0; i < NETPOLICY_RXTX; i++) {
 		for (j = NET_POLICY_NONE; j < NET_POLICY_MAX; j++)
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [RFC V3 PATCH 26/26] Documentation/networking: Document NET policy
  2016-09-12 14:55 [RFC V3 PATCH 00/26] Kernel NET policy kan.liang
                   ` (24 preceding siblings ...)
  2016-09-12 14:55 ` [RFC V3 PATCH 25/26] net/netpolicy: limit the total record number kan.liang
@ 2016-09-12 14:55 ` kan.liang
  2016-09-12 15:38 ` [RFC V3 PATCH 00/26] Kernel " Florian Westphal
  2016-09-12 15:52 ` Eric Dumazet
  27 siblings, 0 replies; 44+ messages in thread
From: kan.liang @ 2016-09-12 14:55 UTC (permalink / raw)
  To: davem, linux-kernel, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
	hannes, stephen, alexei.starovoitov, jesse.brandeburg, andi,
	Kan Liang

From: Kan Liang <kan.liang@intel.com>

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 Documentation/networking/netpolicy.txt | 157 +++++++++++++++++++++++++++++++++
 1 file changed, 157 insertions(+)
 create mode 100644 Documentation/networking/netpolicy.txt

diff --git a/Documentation/networking/netpolicy.txt b/Documentation/networking/netpolicy.txt
new file mode 100644
index 0000000..b8e3d4c
--- /dev/null
+++ b/Documentation/networking/netpolicy.txt
@@ -0,0 +1,157 @@
+What is Linux Net Policy?
+
+It is a big challenge to get good network performance. First, network
+performance is poor with default system settings. Second, it is too
+difficult to do automatic tuning for all possible workloads, since
+workloads have different requirements: some want high throughput, some
+need low latency. Last but not least, there are many manual
+configuration knobs, and fine-grained configuration is too difficult
+for users.
+
+"NET policy" intends to simplify network configuration and achieve good
+network performance according to the hints (policies) applied by the
+user. It provides some typical "policies" that can be set per-socket,
+per-task, or per-device. The kernel automatically figures out how to
+merge the different requests to get good network performance.
+
+"Net policy" is designed for multiqueue network devices. This document
+describes the concepts and APIs of "net policy" support.
+
+NET POLICY CONCEPTS
+
+Scope of Net Policies
+
+    Device net policy: this policy applies to the whole device. Once the
+    device net policy is set, it automatically configures the system
+    according to the applied policy. The configuration usually includes IRQ
+    affinity, IRQ balance disable, interrupt moderation, and so on. But the
+    device net policy does not change the packet direction.
+
+    Task net policy: this is a per-task policy. When it is applied to specific
+    task, all packet transmissions of the task will be redirected to the
+    assigned queues accordingly. If a task does not define a task policy,
+    it "falls back" to the system default way to direct the packets. The
+    per-task policy must be compatible with device net policy.
+
+    Socket net policy: this is a per-socket policy. When it is applied to
+    specific socket, all packet transmissions of the socket will be redirected
+    to the assigned queues accordingly. If a socket does not define a socket
+    policy, it "falls back" to the system default way to direct the packets.
+    The per-socket policy must be compatible with both device net policy and
+    per-task policy.
+
+Components of Net Policies
+
+    Net policy object: a combination of a CPU and a queue. The queue's IRQ
+    affinity must be set to that CPU. An object can be shared between
+    sockets and tasks; a reference counter tracks how many share it.
+
+    Net policy object list: each device policy has an object list. Once the
+    device policy is determined, the net policy object will be inserted into
+    the net policy object list. The net policy object list does not change
+    unless the CPU/queue number is changed, the netpolicy is disabled or
+    the device policy is changed.
+    The network performance of the objects can differ because of the
+    CPU/queue topology and device location. The objects that bring higher
+    performance are placed at the front of the list.
+
+    RCU hash table: an RCU hash table maintains the relationship between
+    a task/socket and its assigned object. The task/socket can look up its
+    assigned object in the table.
+    On first access there is no assigned object in the table; the object
+    list is walked to find an available object based on position and
+    reference count.
+    If the net policy object list changes, all assigned objects become
+    invalid.
+
+NET POLICY APIs
+
+Interfaces between net policy and device driver
+
+    int (*ndo_netpolicy_init)(struct net_device *dev,
+                              struct netpolicy_info *info);
+
+    A device driver with NET policy support must implement this interface.
+    In it, the driver does the necessary initialization and fills in the
+    info for the net policy module. The information may include the
+    supported policies, MIX policy support, queue pair support, and so on.
+
+    int (*ndo_get_irq_info)(struct net_device *dev,
+                            struct netpolicy_dev_info *info);
+
+    This interface is used to get more accurate device IRQ information.
+
+    int (*ndo_set_net_policy)(struct net_device *dev,
+                              enum netpolicy_name name);
+
+    This interface is used to set the device net policy by name. It is the
+    device driver's responsibility to apply driver-specific configuration
+    for the given policy.
+
+Interfaces between net policy and kernel
+
+    int netpolicy_register(struct netpolicy_instance *instance);
+    void netpolicy_unregister(struct netpolicy_instance *instance);
+
+    These interfaces register and unregister a per-task/socket net policy.
+    A socket/task only benefits once it registers itself with a specific
+    policy. After registration, a record is created and inserted into the
+    RCU hash table; it includes all NET-policy-related information for the
+    socket/task, such as the pointer, policy, object, and so on.
+
+    int netpolicy_pick_queue(struct netpolicy_instance *instance, bool is_rx);
+
+    This interface is used to find the proper queue (object) for packet
+    receive and transmit. The queue is picked from the object list
+    according to policy, reference count, location, and so on.
+
+
+    int netpolicy_set_rules(struct netpolicy_instance *instance);
+
+    This interface is used to add device-specific rules. Once a rule is
+    applied, packets from the given IP and port are redirected to the
+    given queue. This interface is usually used on the receive side.
+
+NET POLICY INTERFACE
+
+Device net policy setting
+
+    /proc/net/netpolicy/$DEV/policy
+
+    Reading (cat) the "policy" file shows the available device policies
+    if no device policy is applied; otherwise it prints the applied
+    device policy name. For the MIX policy, the policy for each queue is
+    also printed.
+    The user can set the device net policy by writing a policy name.
+
+Task policy setting
+
+    /proc/$PID/net_policy
+
+    Reading (cat) the "net_policy" file shows the applied per-task
+    policy.
+    The user can set the per-task net policy by writing a policy name.
+
+    OR
+
+    prctl(PR_SET_NETPOLICY, POLICY_NAME, NULL, NULL, NULL)
+
+    "prctl" is an alternative way to set/get the per-task policy.
+
+Socket policy setting
+
+    setsockopt(sockfd, SOL_SOCKET, SO_NETPOLICY, &policy, sizeof(int))
+
+    The socket net policy can be set with the SO_NETPOLICY option of
+    setsockopt.
+
+AVAILABLE NET POLICIES
+
+    The available net policies are defined as below:
+    - CPU: intends to get higher throughput and lower CPU% (power saving).
+           This policy can be applied as either device net policy or
+           task/socket net policy.
+    - BULK: intends to get highest throughput. This policy can be applied as
+            either device net policy or task/socket net policy.
+    - LATENCY: intends to get lowest latency. This policy can be applied as
+               either device net policy or task/socket net policy.
+    - MIX: combination of other policies, which allows each queue to have a
+           different policy. This policy can only be set as device net policy.
+
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [RFC V3 PATCH 00/26] Kernel NET policy
  2016-09-12 14:55 [RFC V3 PATCH 00/26] Kernel NET policy kan.liang
                   ` (25 preceding siblings ...)
  2016-09-12 14:55 ` [RFC V3 PATCH 26/26] Documentation/networking: Document NET policy kan.liang
@ 2016-09-12 15:38 ` Florian Westphal
  2016-09-12 17:21     ` Cong Wang
  2016-09-12 15:52 ` Eric Dumazet
  27 siblings, 1 reply; 44+ messages in thread
From: Florian Westphal @ 2016-09-12 15:38 UTC (permalink / raw)
  To: kan.liang
  Cc: davem, linux-kernel, netdev, jeffrey.t.kirsher, mingo, peterz,
	kuznet, jmorris, yoshfuji, kaber, akpm, keescook, viro, gorcunov,
	john.stultz, aduyck, ben, decot, fw, alexander.duyck, daniel,
	tom, rdunlap, xiyou.wangcong, hannes, stephen,
	alexei.starovoitov, jesse.brandeburg, andi

kan.liang@intel.com <kan.liang@intel.com> wrote:
> From: Kan Liang <kan.liang@intel.com>
> 
> It is a big challenge to get good network performance. First, the network
> performance is not good with default system settings. Second, it is too

[..]

I ask to be dropped from the CC list of further submissions of this series,
I've said all I have to say about this ('do it in userspace') and
it's very unlikely I will change my opinion.

Thanks.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC V3 PATCH 00/26] Kernel NET policy
  2016-09-12 14:55 [RFC V3 PATCH 00/26] Kernel NET policy kan.liang
                   ` (26 preceding siblings ...)
  2016-09-12 15:38 ` [RFC V3 PATCH 00/26] Kernel " Florian Westphal
@ 2016-09-12 15:52 ` Eric Dumazet
  2016-09-19 20:39   ` Stephen Hemminger
  27 siblings, 1 reply; 44+ messages in thread
From: Eric Dumazet @ 2016-09-12 15:52 UTC (permalink / raw)
  To: kan.liang
  Cc: davem, linux-kernel, netdev, jeffrey.t.kirsher, mingo, peterz,
	kuznet, jmorris, yoshfuji, kaber, akpm, keescook, viro, gorcunov,
	john.stultz, aduyck, ben, decot, fw, alexander.duyck, daniel,
	tom, rdunlap, xiyou.wangcong, hannes, stephen,
	alexei.starovoitov, jesse.brandeburg, andi

On Mon, 2016-09-12 at 07:55 -0700, kan.liang@intel.com wrote:
> From: Kan Liang <kan.liang@intel.com>
> 

> 
>  Documentation/networking/netpolicy.txt |  157 ++++


I find this patch series very suspect, as
Documentation/networking/scaling.txt is untouched.

I highly recommend you present your ideas at next netdev conference.

I really doubt the mailing lists are the best place to present your
work, given the huge amount of code/layers you want to add in linux
kernel.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC V3 PATCH 03/26] net/netpolicy: get device queue irq information
  2016-09-12 14:55 ` [RFC V3 PATCH 03/26] net/netpolicy: get device queue irq information kan.liang
@ 2016-09-12 16:48   ` Sergei Shtylyov
  2016-09-13 12:23       ` Liang, Kan
  0 siblings, 1 reply; 44+ messages in thread
From: Sergei Shtylyov @ 2016-09-12 16:48 UTC (permalink / raw)
  To: kan.liang, davem, linux-kernel, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
	hannes, stephen, alexei.starovoitov, jesse.brandeburg, andi

Hello.

On 09/12/2016 05:55 PM, kan.liang@intel.com wrote:

> From: Kan Liang <kan.liang@intel.com>
>
> Net policy needs to know device information. Currently, it's enough to
> only get irq information of rx and tx queues.
>
> This patch introduces ndo ops to do so, not ethtool ops.
> Because there are already several ways to get irq information in
> userspace. It's not necessory to extend the ethtool.

    Necessary.

> Signed-off-by: Kan Liang <kan.liang@intel.com>

[...]

MBR, Sergei

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC V3 PATCH 22/26] net/netpolicy: set per task policy by proc
  2016-09-12 14:55 ` [RFC V3 PATCH 22/26] net/netpolicy: set per task policy by proc kan.liang
@ 2016-09-12 17:01   ` Sergei Shtylyov
  0 siblings, 0 replies; 44+ messages in thread
From: Sergei Shtylyov @ 2016-09-12 17:01 UTC (permalink / raw)
  To: kan.liang, davem, linux-kernel, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, fw, alexander.duyck, daniel, tom, rdunlap, xiyou.wangcong,
	hannes, stephen, alexei.starovoitov, jesse.brandeburg, andi

On 09/12/2016 05:55 PM, kan.liang@intel.com wrote:

> From: Kan Liang <kan.liang@intel.com>
>
> Users may not want to change the source code to add per task net polic

    Policy?

> support. Or they may want to change a running task's net policy. prctl
> does not work for both cases.
>
> This patch adds an interface in /proc, which can be used to set and
> retrieve policy of already running tasks. User can write the policy name
> into /proc/$PID/net_policy to set per task net policy.
>
> Signed-off-by: Kan Liang <kan.liang@intel.com>

[...]

MBR, Sergei

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC V3 PATCH 00/26] Kernel NET policy
  2016-09-12 15:38 ` [RFC V3 PATCH 00/26] Kernel " Florian Westphal
@ 2016-09-12 17:21     ` Cong Wang
  0 siblings, 0 replies; 44+ messages in thread
From: Cong Wang @ 2016-09-12 17:21 UTC (permalink / raw)
  To: Florian Westphal
  Cc: Liang, Kan, David Miller, LKML, Linux Kernel Network Developers,
	Jeff Kirsher, Ingo Molnar, Peter Zijlstra, Alexey Kuznetsov,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, Andrew Morton,
	Kees Cook, Al Viro, Cyrill Gorcunov, John Stultz,
	Alexander Duyck, ben, David Decotigny, Alexander Duyck,
	Daniel Borkmann, Tom Herbert, rdunlap, Hannes Frederic Sowa,
	Stephen Hemminger, Alexei Starovoitov, Jesse Brandeburg,
	Andi Kleen

On Mon, Sep 12, 2016 at 8:38 AM, Florian Westphal <fw@strlen.de> wrote:
> kan.liang@intel.com <kan.liang@intel.com> wrote:
>> From: Kan Liang <kan.liang@intel.com>
>>
>> It is a big challenge to get good network performance. First, the network
>> performance is not good with default system settings. Second, it is too
>
> [..]
>
> I ask to be dropped from CC list of further submissions of this series,
> I've said all I have say about this ('do it in userspace') and
> its very unlikely I will change my opinion.

+1
Same for me.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC V3 PATCH 18/26] net/netpolicy: set tx queues according to policy
  2016-09-12 14:55 ` [RFC V3 PATCH 18/26] net/netpolicy: set tx queues according to policy kan.liang
@ 2016-09-12 20:23     ` Tom Herbert
  0 siblings, 0 replies; 44+ messages in thread
From: Tom Herbert @ 2016-09-12 20:23 UTC (permalink / raw)
  To: Liang, Kan
  Cc: David S. Miller, LKML, Linux Kernel Network Developers,
	Jeff Kirsher, Ingo Molnar, peterz, Alexey Kuznetsov,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, akpm,
	Kees Cook, viro, gorcunov, John Stultz, Alexander Duyck,
	Ben Hutchings, David Decotigny, Florian Westphal,
	Alexander Duyck, Daniel Borkmann, rdunlap, Cong Wang,
	Hannes Frederic Sowa, Stephen Hemminger, Alexei Starovoitov,
	Jesse Brandeburg, Andi Kleen

On Mon, Sep 12, 2016 at 7:55 AM,  <kan.liang@intel.com> wrote:
> From: Kan Liang <kan.liang@intel.com>
>
> When the device tries to transmit a packet, netdev_pick_tx is called to
> find the available tx queues. If the net policy is applied, it picks up
> the assigned tx queue from net policy subsystem, and redirect the
> traffic to the assigned queue.
>
> Signed-off-by: Kan Liang <kan.liang@intel.com>
> ---
>  include/net/sock.h |  9 +++++++++
>  net/core/dev.c     | 20 ++++++++++++++++++--
>  2 files changed, 27 insertions(+), 2 deletions(-)
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index e1e9e3d..ca97f35 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -2280,4 +2280,13 @@ extern int sysctl_optmem_max;
>  extern __u32 sysctl_wmem_default;
>  extern __u32 sysctl_rmem_default;
>
> +/* Return netpolicy instance information from socket. */
> +static inline struct netpolicy_instance *netpolicy_find_instance(struct sock *sk)
> +{
> +#ifdef CONFIG_NETPOLICY
> +       if (is_net_policy_valid(sk->sk_netpolicy.policy))
> +               return &sk->sk_netpolicy;
> +#endif
> +       return NULL;
> +}
>  #endif /* _SOCK_H */
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 34b5322..b9a8044 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3266,6 +3266,7 @@ struct netdev_queue *netdev_pick_tx(struct net_device *dev,
>                                     struct sk_buff *skb,
>                                     void *accel_priv)
>  {
> +       struct sock *sk = skb->sk;
>         int queue_index = 0;
>
>  #ifdef CONFIG_XPS
> @@ -3280,8 +3281,23 @@ struct netdev_queue *netdev_pick_tx(struct net_device *dev,
>                 if (ops->ndo_select_queue)
>                         queue_index = ops->ndo_select_queue(dev, skb, accel_priv,
>                                                             __netdev_pick_tx);
> -               else
> -                       queue_index = __netdev_pick_tx(dev, skb);
> +               else {
> +#ifdef CONFIG_NETPOLICY
> +                       struct netpolicy_instance *instance;
> +
> +                       queue_index = -1;
> +                       if (dev->netpolicy && sk) {
> +                               instance = netpolicy_find_instance(sk);
> +                               if (instance) {
> +                                       if (!instance->dev)
> +                                               instance->dev = dev;
> +                                       queue_index = netpolicy_pick_queue(instance, false);
> +                               }
> +                       }
> +                       if (queue_index < 0)
> +#endif

I doubt this produces the intended effect. Several drivers use
ndo_select_queue (such as mlx4) where they might do something special
for a few packets but end up calling the default handler
__netdev_pick_tx for most packets. So in such cases the netpolicy path
would be routinely bypassed. Maybe this code should be in
__netdev_pick_tx.

Tom

> +                               queue_index = __netdev_pick_tx(dev, skb);
> +               }
>
>                 if (!accel_priv)
>                         queue_index = netdev_cap_txqueue(dev, queue_index);
> --
> 2.5.5
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC V3 PATCH 18/26] net/netpolicy: set tx queues according to policy
@ 2016-09-12 20:23     ` Tom Herbert
  0 siblings, 0 replies; 44+ messages in thread
From: Tom Herbert @ 2016-09-12 20:23 UTC (permalink / raw)
  To: Liang, Kan
  Cc: David S. Miller, LKML, Linux Kernel Network Developers,
	Jeff Kirsher, Ingo Molnar, peterz, Alexey Kuznetsov,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, akpm,
	Kees Cook, viro, gorcunov, John Stultz, Alexander Duyck,
	Ben Hutchings, David Decotigny, Florian Westphal,
	Alexander Duyck, Daniel Borkmann, r

On Mon, Sep 12, 2016 at 7:55 AM,  <kan.liang@intel.com> wrote:
> From: Kan Liang <kan.liang@intel.com>
>
> When the device tries to transmit a packet, netdev_pick_tx is called to
> find the available tx queues. If the net policy is applied, it picks up
> the assigned tx queue from net policy subsystem, and redirect the
> traffic to the assigned queue.
>
> Signed-off-by: Kan Liang <kan.liang@intel.com>
> ---
>  include/net/sock.h |  9 +++++++++
>  net/core/dev.c     | 20 ++++++++++++++++++--
>  2 files changed, 27 insertions(+), 2 deletions(-)
>
> diff --git a/include/net/sock.h b/include/net/sock.h
> index e1e9e3d..ca97f35 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -2280,4 +2280,13 @@ extern int sysctl_optmem_max;
>  extern __u32 sysctl_wmem_default;
>  extern __u32 sysctl_rmem_default;
>
> +/* Return netpolicy instance information from socket. */
> +static inline struct netpolicy_instance *netpolicy_find_instance(struct sock *sk)
> +{
> +#ifdef CONFIG_NETPOLICY
> +       if (is_net_policy_valid(sk->sk_netpolicy.policy))
> +               return &sk->sk_netpolicy;
> +#endif
> +       return NULL;
> +}
>  #endif /* _SOCK_H */
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 34b5322..b9a8044 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3266,6 +3266,7 @@ struct netdev_queue *netdev_pick_tx(struct net_device *dev,
>                                     struct sk_buff *skb,
>                                     void *accel_priv)
>  {
> +       struct sock *sk = skb->sk;
>         int queue_index = 0;
>
>  #ifdef CONFIG_XPS
> @@ -3280,8 +3281,23 @@ struct netdev_queue *netdev_pick_tx(struct net_device *dev,
>                 if (ops->ndo_select_queue)
>                         queue_index = ops->ndo_select_queue(dev, skb, accel_priv,
>                                                             __netdev_pick_tx);
> -               else
> -                       queue_index = __netdev_pick_tx(dev, skb);
> +               else {
> +#ifdef CONFIG_NETPOLICY
> +                       struct netpolicy_instance *instance;
> +
> +                       queue_index = -1;
> +                       if (dev->netpolicy && sk) {
> +                               instance = netpolicy_find_instance(sk);
> +                               if (instance) {
> +                                       if (!instance->dev)
> +                                               instance->dev = dev;
> +                                       queue_index = netpolicy_pick_queue(instance, false);
> +                               }
> +                       }
> +                       if (queue_index < 0)
> +#endif

I doubt this produces the intended effect. Several drivers use
ndo_select_queue (such as mlx4) where they might do something special
for a few packets but end up calling the default handler,
__netdev_pick_tx, for most packets. So in such cases the netpolicy path
would be routinely bypassed. Maybe this code should be in
__netdev_pick_tx.

Tom

> +                               queue_index = __netdev_pick_tx(dev, skb);
> +               }
>
>                 if (!accel_priv)
>                         queue_index = netdev_cap_txqueue(dev, queue_index);
> --
> 2.5.5
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [RFC V3 PATCH 18/26] net/netpolicy: set tx queues according to policy
  2016-09-12 20:23     ` Tom Herbert
@ 2016-09-13 12:22       ` Liang, Kan
  -1 siblings, 0 replies; 44+ messages in thread
From: Liang, Kan @ 2016-09-13 12:22 UTC (permalink / raw)
  To: Tom Herbert
  Cc: David S. Miller, LKML, Linux Kernel Network Developers, Kirsher,
	Jeffrey T, Ingo Molnar, peterz, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, akpm, Kees Cook, viro,
	gorcunov, John Stultz, Alexander Duyck, David Decotigny,
	Alexander Duyck, Daniel Borkmann, rdunlap, Hannes Frederic Sowa,
	Stephen Hemminger, Alexei Starovoitov, Brandeburg, Jesse,
	Andi Kleen



> -----Original Message-----
> From: Tom Herbert [mailto:tom@herbertland.com]
> Sent: Monday, September 12, 2016 4:23 PM
> To: Liang, Kan <kan.liang@intel.com>
> Cc: David S. Miller <davem@davemloft.net>; LKML <linux-
> kernel@vger.kernel.org>; Linux Kernel Network Developers
> <netdev@vger.kernel.org>; Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>;
> Ingo Molnar <mingo@redhat.com>; peterz@infradead.org; Alexey Kuznetsov
> <kuznet@ms2.inr.ac.ru>; James Morris <jmorris@namei.org>; Hideaki
> YOSHIFUJI <yoshfuji@linux-ipv6.org>; Patrick McHardy <kaber@trash.net>;
> akpm@linux-foundation.org; Kees Cook <keescook@chromium.org>;
> viro@zeniv.linux.org.uk; gorcunov@openvz.org; John Stultz
> <john.stultz@linaro.org>; Alexander Duyck <aduyck@mirantis.com>; Ben
> Hutchings <ben@decadent.org.uk>; David Decotigny <decot@googlers.com>;
> Florian Westphal <fw@strlen.de>; Alexander Duyck
> <alexander.duyck@gmail.com>; Daniel Borkmann <daniel@iogearbox.net>;
> rdunlap@infradead.org; Cong Wang <xiyou.wangcong@gmail.com>; Hannes
> Frederic Sowa <hannes@stressinduktion.org>; Stephen Hemminger
> <stephen@networkplumber.org>; Alexei Starovoitov
> <alexei.starovoitov@gmail.com>; Brandeburg, Jesse
> <jesse.brandeburg@intel.com>; Andi Kleen <andi@firstfloor.org>
> Subject: Re: [RFC V3 PATCH 18/26] net/netpolicy: set tx queues according to
> policy
> 
> On Mon, Sep 12, 2016 at 7:55 AM,  <kan.liang@intel.com> wrote:
> > From: Kan Liang <kan.liang@intel.com>
> >
> > When the device tries to transmit a packet, netdev_pick_tx is called
> > to find an available tx queue. If a net policy is applied, it picks
> > up the assigned tx queue from the net policy subsystem, and redirects
> > the traffic to the assigned queue.
> >
> > Signed-off-by: Kan Liang <kan.liang@intel.com>
> > ---
> >  include/net/sock.h |  9 +++++++++
> >  net/core/dev.c     | 20 ++++++++++++++++++--
> >  2 files changed, 27 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/net/sock.h b/include/net/sock.h
> > index e1e9e3d..ca97f35 100644
> > --- a/include/net/sock.h
> > +++ b/include/net/sock.h
> > @@ -2280,4 +2280,13 @@ extern int sysctl_optmem_max;
> >  extern __u32 sysctl_wmem_default;
> >  extern __u32 sysctl_rmem_default;
> >
> > +/* Return netpolicy instance information from socket. */
> > +static inline struct netpolicy_instance *netpolicy_find_instance(struct sock *sk)
> > +{
> > +#ifdef CONFIG_NETPOLICY
> > +       if (is_net_policy_valid(sk->sk_netpolicy.policy))
> > +               return &sk->sk_netpolicy;
> > +#endif
> > +       return NULL;
> > +}
> >  #endif /* _SOCK_H */
> > diff --git a/net/core/dev.c b/net/core/dev.c
> > index 34b5322..b9a8044 100644
> > --- a/net/core/dev.c
> > +++ b/net/core/dev.c
> > @@ -3266,6 +3266,7 @@ struct netdev_queue *netdev_pick_tx(struct net_device *dev,
> >                                     struct sk_buff *skb,
> >                                     void *accel_priv)
> >  {
> > +       struct sock *sk = skb->sk;
> >         int queue_index = 0;
> >
> >  #ifdef CONFIG_XPS
> > @@ -3280,8 +3281,23 @@ struct netdev_queue *netdev_pick_tx(struct net_device *dev,
> >                 if (ops->ndo_select_queue)
> >                         queue_index = ops->ndo_select_queue(dev, skb, accel_priv,
> >                                                             __netdev_pick_tx);
> > -               else
> > -                       queue_index = __netdev_pick_tx(dev, skb);
> > +               else {
> > +#ifdef CONFIG_NETPOLICY
> > +                       struct netpolicy_instance *instance;
> > +
> > +                       queue_index = -1;
> > +                       if (dev->netpolicy && sk) {
> > +                               instance = netpolicy_find_instance(sk);
> > +                               if (instance) {
> > +                                       if (!instance->dev)
> > +                                               instance->dev = dev;
> > +                                       queue_index = netpolicy_pick_queue(instance, false);
> > +                               }
> > +                       }
> > +                       if (queue_index < 0)
> > +#endif
> 
> I doubt this produces the intended effect. Several drivers use
> ndo_select_queue (such as mlx4) where they might do something special
> for a few packets but end up calling the default handler, __netdev_pick_tx,
> for most packets. So in such cases the netpolicy path would
> be routinely bypassed. Maybe this code should be in __netdev_pick_tx.

I will move the code to __netdev_pick_tx in next version.

Thanks,
Kan

> 
> Tom
> 
> > +                               queue_index = __netdev_pick_tx(dev, skb);
> > +               }
> >
> >                 if (!accel_priv)
> >                         queue_index = netdev_cap_txqueue(dev, queue_index);
> > --
> > 2.5.5
> >

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [RFC V3 PATCH 03/26] net/netpolicy: get device queue irq information
  2016-09-12 16:48   ` Sergei Shtylyov
@ 2016-09-13 12:23       ` Liang, Kan
  0 siblings, 0 replies; 44+ messages in thread
From: Liang, Kan @ 2016-09-13 12:23 UTC (permalink / raw)
  To: Sergei Shtylyov, davem, linux-kernel, netdev
  Cc: Kirsher, Jeffrey T, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck,
	decot, alexander.duyck, daniel, tom, rdunlap, hannes, stephen,
	alexei.starovoitov, Brandeburg, Jesse, andi

> 
> Hello.
> 
> On 09/12/2016 05:55 PM, kan.liang@intel.com wrote:
> 
> > From: Kan Liang <kan.liang@intel.com>
> >
> > Net policy needs to know device information. Currently, it's enough to
> > only get irq information of rx and tx queues.
> >
> > This patch introduces ndo ops to do so, not ethtool ops.
> > Because there are already several ways to get irq information in
> > userspace. It's not necessory to extend the ethtool.
> 
>     Necessary.

OK. I will extend the ethtool in next version.

Thanks,
Kan

> 
> > Signed-off-by: Kan Liang <kan.liang@intel.com>
> 
> [...]
> 
> MBR, Sergei

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC V3 PATCH 03/26] net/netpolicy: get device queue irq information
  2016-09-13 12:23       ` Liang, Kan
@ 2016-09-13 13:14         ` Alexander Duyck
  -1 siblings, 0 replies; 44+ messages in thread
From: Alexander Duyck @ 2016-09-13 13:14 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Sergei Shtylyov, davem, linux-kernel, netdev, Kirsher, Jeffrey T,
	mingo, peterz, kuznet, jmorris, yoshfuji, kaber, akpm, keescook,
	viro, gorcunov, john.stultz, aduyck, decot, daniel, tom, rdunlap,
	hannes, stephen, alexei.starovoitov, Brandeburg, Jesse, andi

On Tue, Sep 13, 2016 at 5:23 AM, Liang, Kan <kan.liang@intel.com> wrote:
>>
>> Hello.
>>
>> On 09/12/2016 05:55 PM, kan.liang@intel.com wrote:
>>
>> > From: Kan Liang <kan.liang@intel.com>
>> >
>> > Net policy needs to know device information. Currently, it's enough to
>> > only get irq information of rx and tx queues.
>> >
>> > This patch introduces ndo ops to do so, not ethtool ops.
>> > Because there are already several ways to get irq information in
>> > userspace. It's not necessory to extend the ethtool.
>>
>>     Necessary.
>
> OK. I will extend the ethtool in next version.
>
> Thanks,
> Kan

Kan, I don't think Sergei was saying you have to extend the ethtool.
Your spelling of necessary was incorrect in your patch description.

Sergei, please feel free to tell me I am wrong if my assumption on
that is incorrect.

- Alex

^ permalink raw reply	[flat|nested] 44+ messages in thread

* RE: [RFC V3 PATCH 03/26] net/netpolicy: get device queue irq information
  2016-09-13 13:14         ` Alexander Duyck
@ 2016-09-13 13:22           ` Liang, Kan
  -1 siblings, 0 replies; 44+ messages in thread
From: Liang, Kan @ 2016-09-13 13:22 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Sergei Shtylyov, davem, linux-kernel, netdev, Kirsher, Jeffrey T,
	mingo, peterz, kuznet, jmorris, yoshfuji, kaber, akpm, keescook,
	viro, gorcunov, john.stultz, aduyck, decot, daniel, tom, rdunlap,
	hannes, stephen, alexei.starovoitov, Brandeburg, Jesse, andi



> On Tue, Sep 13, 2016 at 5:23 AM, Liang, Kan <kan.liang@intel.com> wrote:
> >>
> >> Hello.
> >>
> >> On 09/12/2016 05:55 PM, kan.liang@intel.com wrote:
> >>
> >> > From: Kan Liang <kan.liang@intel.com>
> >> >
> >> > Net policy needs to know device information. Currently, it's enough
> >> > to only get irq information of rx and tx queues.
> >> >
> >> > This patch introduces ndo ops to do so, not ethtool ops.
> >> > Because there are already several ways to get irq information in
> >> > userspace. It's not necessory to extend the ethtool.
> >>
> >>     Necessary.
> >
> > OK. I will extend the ethtool in next version.
> >
> > Thanks,
> > Kan
> 
> Kan, I don't think Sergei was saying you have to extend the ethtool.
> Your spelling of necessary was incorrect in your patch description.
> 
> Sergei, please feel free to tell me I am wrong if my assumption on that is
> incorrect.
> 
> - Alex

Oh, I see. Thanks Alex. :)

Kan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC V3 PATCH 00/26] Kernel NET policy
  2016-09-12 15:52 ` Eric Dumazet
@ 2016-09-19 20:39   ` Stephen Hemminger
  0 siblings, 0 replies; 44+ messages in thread
From: Stephen Hemminger @ 2016-09-19 20:39 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: kan.liang, davem, linux-kernel, netdev, jeffrey.t.kirsher, mingo,
	peterz, kuznet, jmorris, yoshfuji, kaber, akpm, keescook, viro,
	gorcunov, john.stultz, aduyck, ben, decot, fw, alexander.duyck,
	daniel, tom, rdunlap, xiyou.wangcong, hannes, alexei.starovoitov,
	jesse.brandeburg, andi

On Mon, 12 Sep 2016 08:52:14 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Mon, 2016-09-12 at 07:55 -0700, kan.liang@intel.com wrote:
> > From: Kan Liang <kan.liang@intel.com>
> >   
> 
> > 
> >  Documentation/networking/netpolicy.txt |  157 ++++  
> 
> 
> I find this patch series very suspect, as
> Documentation/networking/scaling.txt is untouched.
> 
> I highly recommend you present your ideas at next netdev conference.
> 
> I really doubt the mailing lists are the best place to present your
> work, given the huge amount of code/layers you want to add in linux
> kernel.

Agreed.

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2016-09-19 20:39 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-12 14:55 [RFC V3 PATCH 00/26] Kernel NET policy kan.liang
2016-09-12 14:55 ` [RFC V3 PATCH 01/26] net: introduce " kan.liang
2016-09-12 14:55 ` [RFC V3 PATCH 02/26] net/netpolicy: init " kan.liang
2016-09-12 14:55 ` [RFC V3 PATCH 03/26] net/netpolicy: get device queue irq information kan.liang
2016-09-12 16:48   ` Sergei Shtylyov
2016-09-13 12:23     ` Liang, Kan
2016-09-13 13:14       ` Alexander Duyck
2016-09-13 13:22         ` Liang, Kan
2016-09-12 14:55 ` [RFC V3 PATCH 04/26] net/netpolicy: get CPU information kan.liang
2016-09-12 14:55 ` [RFC V3 PATCH 05/26] net/netpolicy: create CPU and queue mapping kan.liang
2016-09-12 14:55 ` [RFC V3 PATCH 06/26] net/netpolicy: set and remove IRQ affinity kan.liang
2016-09-12 14:55 ` [RFC V3 PATCH 07/26] net/netpolicy: enable and disable NET policy kan.liang
2016-09-12 14:55 ` [RFC V3 PATCH 08/26] net/netpolicy: introduce NET policy object kan.liang
2016-09-12 14:55 ` [RFC V3 PATCH 09/26] net/netpolicy: set NET policy by policy name kan.liang
2016-09-12 14:55 ` [RFC V3 PATCH 10/26] net/netpolicy: add three new NET policies kan.liang
2016-09-12 14:55 ` [RFC V3 PATCH 11/26] net/netpolicy: add MIX policy kan.liang
2016-09-12 14:55 ` [RFC V3 PATCH 12/26] net/netpolicy: NET device hotplug kan.liang
2016-09-12 14:55 ` [RFC V3 PATCH 13/26] net/netpolicy: support CPU hotplug kan.liang
2016-09-12 14:55 ` [RFC V3 PATCH 14/26] net/netpolicy: handle channel changes kan.liang
2016-09-12 14:55 ` [RFC V3 PATCH 15/26] net/netpolicy: implement netpolicy register kan.liang
2016-09-12 14:55 ` [RFC V3 PATCH 16/26] net/netpolicy: introduce per socket netpolicy kan.liang
2016-09-12 14:55 ` [RFC V3 PATCH 17/26] net/netpolicy: introduce netpolicy_pick_queue kan.liang
2016-09-12 14:55 ` [RFC V3 PATCH 18/26] net/netpolicy: set tx queues according to policy kan.liang
2016-09-12 20:23   ` Tom Herbert
2016-09-13 12:22     ` Liang, Kan
2016-09-12 14:55 ` [RFC V3 PATCH 19/26] net/netpolicy: tc bpf extension to pick Tx queue kan.liang
2016-09-12 14:55 ` [RFC V3 PATCH 20/26] net/netpolicy: set Rx queues according to policy kan.liang
2016-09-12 14:55 ` [RFC V3 PATCH 21/26] net/netpolicy: introduce per task net policy kan.liang
2016-09-12 14:55 ` [RFC V3 PATCH 22/26] net/netpolicy: set per task policy by proc kan.liang
2016-09-12 17:01   ` Sergei Shtylyov
2016-09-12 14:55 ` [RFC V3 PATCH 23/26] net/netpolicy: fast path for finding the queues kan.liang
2016-09-12 14:55 ` [RFC V3 PATCH 24/26] net/netpolicy: optimize for queue pair kan.liang
2016-09-12 14:55 ` [RFC V3 PATCH 25/26] net/netpolicy: limit the total record number kan.liang
2016-09-12 14:55 ` [RFC V3 PATCH 26/26] Documentation/networking: Document NET policy kan.liang
2016-09-12 15:38 ` [RFC V3 PATCH 00/26] Kernel " Florian Westphal
2016-09-12 17:21   ` Cong Wang
2016-09-12 15:52 ` Eric Dumazet
2016-09-19 20:39   ` Stephen Hemminger
