All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-18  6:55 ` kan.liang
  0 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:55 UTC (permalink / raw)
  To: davem, linux-kernel, intel-wired-lan, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

It is a big challenge to get good network performance. First, the network
performance is not good with default system settings. Second, it is too
difficult to do automatic tuning for all possible workloads, since workloads
have different requirements. Some workloads may want high throughput. Some may
need low latency. Last but not least, there are lots of manual configurations.
Fine grained configuration is too difficult for users.

NET policy intends to simplify the network configuration and get a good network
performance according to the hints(policy) which is applied by user. It
provides some typical "policies" for user which can be set per-socket, per-task
or per-device. The kernel will automatically figures out how to merge different
requests to get good network performance.
Net policy is designed for multiqueue network devices. This implementation is
only for Intel NICs using i40e driver. But the concepts and generic code should
apply to other multiqueue NICs too.
Net policy is also a combination of generic policy manager code and some
ethtool callbacks (per queue coalesce setting, flow classification rules) to
configure the driver.
This series also supports CPU hotplug and device hotplug.

Here are some key Interfaces/APIs for NET policy.

   /proc/net/netpolicy/$DEV/policy
   User can set/get per device policy from /proc

   /proc/$PID/net_policy
   User can set/get per task policy from /proc
   prctl(PR_SET_NETPOLICY, POLICY_NAME, NULL, NULL, NULL)
   An alternative way to set/get per task policy is from prctl.

   setsockopt(sockfd,SOL_SOCKET,SO_NETPOLICY,&policy,sizeof(int))
   User can set/get per socket policy by setsockopt


   int (*ndo_netpolicy_init)(struct net_device *dev,
                             struct netpolicy_info *info);
   Initialize device driver for NET policy

   int (*ndo_get_irq_info)(struct net_device *dev,
                           struct netpolicy_dev_info *info);
   Collect device irq information

   int (*ndo_set_net_policy)(struct net_device *dev,
                             enum netpolicy_name name);
   Configure device according to policy name

   netpolicy_register(struct netpolicy_reg *reg);
   netpolicy_unregister(struct netpolicy_reg *reg);
   NET policy API to register/unregister per task/socket net policy.
   For each task/socket, an record will be created and inserted into an RCU
   hash table.

   netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx);
   NET policy API to find the proper queue for packet receiving and
   transmitting.

   netpolicy_set_rules(struct netpolicy_reg *reg, u32 queue_index,
                        struct netpolicy_flow_spec *flow);
   NET policy API to add flow director rules.

For using NET policy, the per-device policy must be set in advance. It will
automatically configure the system and re-organize the resource of the system
accordingly. For system configuration, in this series, it will disable irq
balance, set device queue irq affinity, and modify interrupt moderation. For
re-organizing the resource, current implementation forces that CPU and queue
irq are 1:1 mapping. An 1:1 mapping group is also called net policy object.
For each device policy, it maintains a policy list. Once the device policy is
applied, the objects will be insert and tracked in that device policy list. The
policy list only be updated when cpu/device hotplug, queue number changes or
device policy changes.
The user can use /proc, prctl and setsockopt to set per-task and per-socket
net policy. Once the policy is set, an related record will be inserted into RCU
hash table. The record includes ptr, policy and net policy object. The ptr is
the pointer address of task/socket. The object will not be assigned until the
first package receive/transmit. The object is picked by round-robin from object
list. Once the object is determined, the following packets will be set to
redirect to the queue(object).
The object can be shared. The per-task or per-socket policy can be inherited.

Now NET policy supports four per device policies and three per task/socket
policies.
    - BULK policy: This policy is designed for high throughput. It can be
      applied to either per device policy or per task/socket policy.
    - CPU policy: This policy is designed for high throughput but lower CPU
      utilization. It can be applied to either per device policy or
      per task/socket policy.
    - LATENCY policy: This policy is designed for low latency. It can be
      applied to either per device policy or per task/socket policy.
    - MIX policy: This policy can only be applied to per device policy. This
      is designed for the case which miscellaneous types of workload running
      on the device.

Lots of tests are done for net policy on platforms with Intel Xeon E5 V2
and XL710 40G NIC. The baseline test is with Linux 4.6.0 kernel.
Netperf is used to evaluate the throughput and latency performance.
  - "netperf -f m -t TCP_RR -H server_IP -c -C -l 60 -- -r buffersize
    -b burst -D" is used to evaluate throughput performance, which is
    called throughput-first workload.
  - "netperf -t TCP_RR -H server_IP -c -C -l 60 -- -r buffersize" is
    used to evaluate latency performance, which is called latency-first
    workload.
  - Different loads are also evaluated by running 1, 12, 24, 48 or 96
    throughput-first workloads/latency-first workload simultaneously.

For "BULK" policy, the throughput performance is on average ~1.26X than
baseline.
For "CPU" policy, the throughput performance is on average ~1.20X than
baseline, and has lower CPU% (on average ~5% lower than "BULK" policy).
For "LATENCY" policy, the latency is on average 53.5% less than the baseline.
For "MIX" policy, mixed workloads performance is evaluated.
The mixed workloads are combination of throughput-first workload and
latency-first workload. Five different types of combinations are evaluated
(pure throughput-first workload, pure latency-first workloads,
 2/3 throughput-first workload + 1/3 latency-first workloads,
 1/3 throughput-first workload + 2/3 latency-first workloads and
 1/2 throughput-first workload + 1/2 latency-first workloads).
For caculating the performance of mixed workloads, a weighted sum system
is introduced.
Score = normalized_latency * Weight + normalized_throughput * (1 - Weight).
If we assume that the user has an equal interest in latency and throughput
performance, the Score for "MIX" policy is on average ~1.52X than baseline.


Kan Liang (30):
  net: introduce NET policy
  net/netpolicy: init NET policy
  i40e/netpolicy: Implement ndo_netpolicy_init
  net/netpolicy: get driver information
  i40e/netpolicy: implement ndo_get_irq_info
  net/netpolicy: get CPU information
  net/netpolicy: create CPU and queue mapping
  net/netpolicy: set and remove irq affinity
  net/netpolicy: enable and disable net policy
  net/netpolicy: introduce netpolicy object
  net/netpolicy: set net policy by policy name
  i40e/netpolicy: implement ndo_set_net_policy
  i40e/netpolicy: add three new net policies
  net/netpolicy: add MIX policy
  i40e/netpolicy: add MIX policy support
  net/netpolicy: net device hotplug
  net/netpolicy: support CPU hotplug
  net/netpolicy: handle channel changes
  net/netpolicy: implement netpolicy register
  net/netpolicy: introduce per socket netpolicy
  net/policy: introduce netpolicy_pick_queue
  net/netpolicy: set tx queues according to policy
  i40e/ethtool: support RX_CLS_LOC_ANY
  net/netpolicy: set rx queues according to policy
  net/netpolicy: introduce per task net policy
  net/netpolicy: set per task policy by proc
  net/netpolicy: fast path for finding the queues
  net/netpolicy: optimize for queue pair
  net/netpolicy: limit the total record number
  Documentation/networking: Document net policy

 Documentation/networking/netpolicy.txt         |  158 +++
 arch/alpha/include/uapi/asm/socket.h           |    2 +
 arch/avr32/include/uapi/asm/socket.h           |    2 +
 arch/frv/include/uapi/asm/socket.h             |    2 +
 arch/ia64/include/uapi/asm/socket.h            |    2 +
 arch/m32r/include/uapi/asm/socket.h            |    2 +
 arch/mips/include/uapi/asm/socket.h            |    2 +
 arch/mn10300/include/uapi/asm/socket.h         |    2 +
 arch/parisc/include/uapi/asm/socket.h          |    2 +
 arch/powerpc/include/uapi/asm/socket.h         |    2 +
 arch/s390/include/uapi/asm/socket.h            |    2 +
 arch/sparc/include/uapi/asm/socket.h           |    2 +
 arch/xtensa/include/uapi/asm/socket.h          |    2 +
 drivers/net/ethernet/intel/i40e/i40e.h         |    3 +
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c |   44 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c    |  174 +++
 fs/proc/base.c                                 |   64 ++
 include/linux/init_task.h                      |   14 +
 include/linux/netdevice.h                      |   31 +
 include/linux/netpolicy.h                      |  160 +++
 include/linux/sched.h                          |    5 +
 include/net/net_namespace.h                    |    3 +
 include/net/request_sock.h                     |    4 +-
 include/net/sock.h                             |   10 +
 include/uapi/asm-generic/socket.h              |    2 +
 include/uapi/linux/prctl.h                     |    4 +
 kernel/exit.c                                  |    4 +
 kernel/fork.c                                  |   11 +
 kernel/sys.c                                   |   31 +
 net/Kconfig                                    |    7 +
 net/core/Makefile                              |    1 +
 net/core/dev.c                                 |   30 +-
 net/core/ethtool.c                             |    8 +-
 net/core/netpolicy.c                           | 1387 ++++++++++++++++++++++++
 net/core/sock.c                                |   46 +
 net/ipv4/af_inet.c                             |   75 ++
 net/ipv4/udp.c                                 |    4 +
 37 files changed, 2294 insertions(+), 10 deletions(-)
 create mode 100644 Documentation/networking/netpolicy.txt
 create mode 100644 include/linux/netpolicy.h
 create mode 100644 net/core/netpolicy.c

-- 
2.5.5

^ permalink raw reply	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-18  6:55 ` kan.liang
  0 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:55 UTC (permalink / raw)
  To: intel-wired-lan

From: Kan Liang <kan.liang@intel.com>

It is a big challenge to get good network performance. First, the network
performance is not good with default system settings. Second, it is too
difficult to do automatic tuning for all possible workloads, since workloads
have different requirements. Some workloads may want high throughput. Some may
need low latency. Last but not least, there are lots of manual configurations.
Fine grained configuration is too difficult for users.

NET policy intends to simplify the network configuration and get a good network
performance according to the hints(policy) which is applied by user. It
provides some typical "policies" for user which can be set per-socket, per-task
or per-device. The kernel will automatically figures out how to merge different
requests to get good network performance.
Net policy is designed for multiqueue network devices. This implementation is
only for Intel NICs using i40e driver. But the concepts and generic code should
apply to other multiqueue NICs too.
Net policy is also a combination of generic policy manager code and some
ethtool callbacks (per queue coalesce setting, flow classification rules) to
configure the driver.
This series also supports CPU hotplug and device hotplug.

Here are some key Interfaces/APIs for NET policy.

   /proc/net/netpolicy/$DEV/policy
   User can set/get per device policy from /proc

   /proc/$PID/net_policy
   User can set/get per task policy from /proc
   prctl(PR_SET_NETPOLICY, POLICY_NAME, NULL, NULL, NULL)
   An alternative way to set/get per task policy is from prctl.

   setsockopt(sockfd,SOL_SOCKET,SO_NETPOLICY,&policy,sizeof(int))
   User can set/get per socket policy by setsockopt


   int (*ndo_netpolicy_init)(struct net_device *dev,
                             struct netpolicy_info *info);
   Initialize device driver for NET policy

   int (*ndo_get_irq_info)(struct net_device *dev,
                           struct netpolicy_dev_info *info);
   Collect device irq information

   int (*ndo_set_net_policy)(struct net_device *dev,
                             enum netpolicy_name name);
   Configure device according to policy name

   netpolicy_register(struct netpolicy_reg *reg);
   netpolicy_unregister(struct netpolicy_reg *reg);
   NET policy API to register/unregister per task/socket net policy.
   For each task/socket, an record will be created and inserted into an RCU
   hash table.

   netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx);
   NET policy API to find the proper queue for packet receiving and
   transmitting.

   netpolicy_set_rules(struct netpolicy_reg *reg, u32 queue_index,
                        struct netpolicy_flow_spec *flow);
   NET policy API to add flow director rules.

For using NET policy, the per-device policy must be set in advance. It will
automatically configure the system and re-organize the resource of the system
accordingly. For system configuration, in this series, it will disable irq
balance, set device queue irq affinity, and modify interrupt moderation. For
re-organizing the resource, current implementation forces that CPU and queue
irq are 1:1 mapping. An 1:1 mapping group is also called net policy object.
For each device policy, it maintains a policy list. Once the device policy is
applied, the objects will be insert and tracked in that device policy list. The
policy list only be updated when cpu/device hotplug, queue number changes or
device policy changes.
The user can use /proc, prctl and setsockopt to set per-task and per-socket
net policy. Once the policy is set, an related record will be inserted into RCU
hash table. The record includes ptr, policy and net policy object. The ptr is
the pointer address of task/socket. The object will not be assigned until the
first package receive/transmit. The object is picked by round-robin from object
list. Once the object is determined, the following packets will be set to
redirect to the queue(object).
The object can be shared. The per-task or per-socket policy can be inherited.

Now NET policy supports four per device policies and three per task/socket
policies.
    - BULK policy: This policy is designed for high throughput. It can be
      applied to either per device policy or per task/socket policy.
    - CPU policy: This policy is designed for high throughput but lower CPU
      utilization. It can be applied to either per device policy or
      per task/socket policy.
    - LATENCY policy: This policy is designed for low latency. It can be
      applied to either per device policy or per task/socket policy.
    - MIX policy: This policy can only be applied to per device policy. This
      is designed for the case which miscellaneous types of workload running
      on the device.

Lots of tests are done for net policy on platforms with Intel Xeon E5 V2
and XL710 40G NIC. The baseline test is with Linux 4.6.0 kernel.
Netperf is used to evaluate the throughput and latency performance.
  - "netperf -f m -t TCP_RR -H server_IP -c -C -l 60 -- -r buffersize
    -b burst -D" is used to evaluate throughput performance, which is
    called throughput-first workload.
  - "netperf -t TCP_RR -H server_IP -c -C -l 60 -- -r buffersize" is
    used to evaluate latency performance, which is called latency-first
    workload.
  - Different loads are also evaluated by running 1, 12, 24, 48 or 96
    throughput-first workloads/latency-first workload simultaneously.

For "BULK" policy, the throughput performance is on average ~1.26X than
baseline.
For "CPU" policy, the throughput performance is on average ~1.20X than
baseline, and has lower CPU% (on average ~5% lower than "BULK" policy).
For "LATENCY" policy, the latency is on average 53.5% less than the baseline.
For "MIX" policy, mixed workloads performance is evaluated.
The mixed workloads are combination of throughput-first workload and
latency-first workload. Five different types of combinations are evaluated
(pure throughput-first workload, pure latency-first workloads,
 2/3 throughput-first workload + 1/3 latency-first workloads,
 1/3 throughput-first workload + 2/3 latency-first workloads and
 1/2 throughput-first workload + 1/2 latency-first workloads).
For caculating the performance of mixed workloads, a weighted sum system
is introduced.
Score = normalized_latency * Weight + normalized_throughput * (1 - Weight).
If we assume that the user has an equal interest in latency and throughput
performance, the Score for "MIX" policy is on average ~1.52X than baseline.


Kan Liang (30):
  net: introduce NET policy
  net/netpolicy: init NET policy
  i40e/netpolicy: Implement ndo_netpolicy_init
  net/netpolicy: get driver information
  i40e/netpolicy: implement ndo_get_irq_info
  net/netpolicy: get CPU information
  net/netpolicy: create CPU and queue mapping
  net/netpolicy: set and remove irq affinity
  net/netpolicy: enable and disable net policy
  net/netpolicy: introduce netpolicy object
  net/netpolicy: set net policy by policy name
  i40e/netpolicy: implement ndo_set_net_policy
  i40e/netpolicy: add three new net policies
  net/netpolicy: add MIX policy
  i40e/netpolicy: add MIX policy support
  net/netpolicy: net device hotplug
  net/netpolicy: support CPU hotplug
  net/netpolicy: handle channel changes
  net/netpolicy: implement netpolicy register
  net/netpolicy: introduce per socket netpolicy
  net/policy: introduce netpolicy_pick_queue
  net/netpolicy: set tx queues according to policy
  i40e/ethtool: support RX_CLS_LOC_ANY
  net/netpolicy: set rx queues according to policy
  net/netpolicy: introduce per task net policy
  net/netpolicy: set per task policy by proc
  net/netpolicy: fast path for finding the queues
  net/netpolicy: optimize for queue pair
  net/netpolicy: limit the total record number
  Documentation/networking: Document net policy

 Documentation/networking/netpolicy.txt         |  158 +++
 arch/alpha/include/uapi/asm/socket.h           |    2 +
 arch/avr32/include/uapi/asm/socket.h           |    2 +
 arch/frv/include/uapi/asm/socket.h             |    2 +
 arch/ia64/include/uapi/asm/socket.h            |    2 +
 arch/m32r/include/uapi/asm/socket.h            |    2 +
 arch/mips/include/uapi/asm/socket.h            |    2 +
 arch/mn10300/include/uapi/asm/socket.h         |    2 +
 arch/parisc/include/uapi/asm/socket.h          |    2 +
 arch/powerpc/include/uapi/asm/socket.h         |    2 +
 arch/s390/include/uapi/asm/socket.h            |    2 +
 arch/sparc/include/uapi/asm/socket.h           |    2 +
 arch/xtensa/include/uapi/asm/socket.h          |    2 +
 drivers/net/ethernet/intel/i40e/i40e.h         |    3 +
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c |   44 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c    |  174 +++
 fs/proc/base.c                                 |   64 ++
 include/linux/init_task.h                      |   14 +
 include/linux/netdevice.h                      |   31 +
 include/linux/netpolicy.h                      |  160 +++
 include/linux/sched.h                          |    5 +
 include/net/net_namespace.h                    |    3 +
 include/net/request_sock.h                     |    4 +-
 include/net/sock.h                             |   10 +
 include/uapi/asm-generic/socket.h              |    2 +
 include/uapi/linux/prctl.h                     |    4 +
 kernel/exit.c                                  |    4 +
 kernel/fork.c                                  |   11 +
 kernel/sys.c                                   |   31 +
 net/Kconfig                                    |    7 +
 net/core/Makefile                              |    1 +
 net/core/dev.c                                 |   30 +-
 net/core/ethtool.c                             |    8 +-
 net/core/netpolicy.c                           | 1387 ++++++++++++++++++++++++
 net/core/sock.c                                |   46 +
 net/ipv4/af_inet.c                             |   75 ++
 net/ipv4/udp.c                                 |    4 +
 37 files changed, 2294 insertions(+), 10 deletions(-)
 create mode 100644 Documentation/networking/netpolicy.txt
 create mode 100644 include/linux/netpolicy.h
 create mode 100644 net/core/netpolicy.c

-- 
2.5.5


^ permalink raw reply	[flat|nested] 123+ messages in thread

* [RFC PATCH 01/30] net: introduce NET policy
  2016-07-18  6:55 ` [Intel-wired-lan] " kan.liang
@ 2016-07-18  6:55   ` kan.liang
  -1 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:55 UTC (permalink / raw)
  To: davem, linux-kernel, intel-wired-lan, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

This patch introduce NET policy subsystem. If proc is supported in the
system, it creates netpolicy node in proc system.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netdevice.h   |   7 +++
 include/net/net_namespace.h |   3 ++
 net/Kconfig                 |   7 +++
 net/core/Makefile           |   1 +
 net/core/netpolicy.c        | 128 ++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 146 insertions(+)
 create mode 100644 net/core/netpolicy.c

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 49736a3..9e30a31 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1584,6 +1584,8 @@ enum netdev_priv_flags {
  *			switch driver and used to set the phys state of the
  *			switch port.
  *
+ *	@proc_dev:	device node in proc to configure device net policy
+ *
  *	FIXME: cleanup struct net_device such that network protocol info
  *	moves out.
  */
@@ -1851,6 +1853,11 @@ struct net_device {
 	struct lock_class_key	*qdisc_tx_busylock;
 	struct lock_class_key	*qdisc_running_key;
 	bool			proto_down;
+#ifdef CONFIG_NETPOLICY
+#ifdef CONFIG_PROC_FS
+	struct proc_dir_entry	*proc_dev;
+#endif /* CONFIG_PROC_FS */
+#endif /* CONFIG_NETPOLICY */
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 4089abc..d2ff6c4 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -142,6 +142,9 @@ struct net {
 #endif
 	struct sock		*diag_nlsk;
 	atomic_t		fnhe_genid;
+#ifdef CONFIG_NETPOLICY
+	struct proc_dir_entry	*proc_netpolicy;
+#endif /* CONFIG_NETPOLICY */
 };
 
 #include <linux/seq_file_net.h>
diff --git a/net/Kconfig b/net/Kconfig
index ff40562..c3ed726 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -205,6 +205,13 @@ source "net/bridge/netfilter/Kconfig"
 
 endif
 
+config NETPOLICY
+	depends on NET
+	bool "Net policy support"
+	default y
+	---help---
+	Net policy support
+
 source "net/dccp/Kconfig"
 source "net/sctp/Kconfig"
 source "net/rds/Kconfig"
diff --git a/net/core/Makefile b/net/core/Makefile
index d6508c2..0be7092 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -27,3 +27,4 @@ obj-$(CONFIG_LWTUNNEL) += lwtunnel.o
 obj-$(CONFIG_DST_CACHE) += dst_cache.o
 obj-$(CONFIG_HWBM) += hwbm.o
 obj-$(CONFIG_NET_DEVLINK) += devlink.o
+obj-$(CONFIG_NETPOLICY) += netpolicy.o
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
new file mode 100644
index 0000000..faabfe7
--- /dev/null
+++ b/net/core/netpolicy.c
@@ -0,0 +1,128 @@
+/*
+ * netpolicy.c: Net policy support
+ * Copyright (c) 2016, Intel Corporation.
+ * Author: Kan Liang (kan.liang@intel.com)
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * NET policy intends to simplify the network configuration and get a good
+ * network performance according to the hints(policy) which is applied by user.
+ *
+ * Motivation
+ * 	- The network performance is not good with default system settings.
+ *	- It is too difficult to do automatic tuning for all possible
+ *	  workloads, since workloads have different requirements. Some
+ *	  workloads may want high throughput. Some may need low latency.
+ *	- There are lots of manual configurations. Fine grained configuration
+ *	  is too difficult for users.
+ * 	So, it is a big challenge to get good network performance.
+ *
+ */
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/seq_file.h>
+#include <linux/proc_fs.h>
+#include <linux/uaccess.h>
+#include <linux/netdevice.h>
+#include <net/net_namespace.h>
+
+#ifdef CONFIG_PROC_FS
+
+static int net_policy_proc_show(struct seq_file *m, void *v)
+{
+	struct net_device *dev = (struct net_device *)m->private;
+
+	seq_printf(m, "%s doesn't support net policy manager\n", dev->name);
+
+	return 0;
+}
+
+static int net_policy_proc_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, net_policy_proc_show, PDE_DATA(inode));
+}
+
+static const struct file_operations proc_net_policy_operations = {
+	.open		= net_policy_proc_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= seq_release,
+	.owner		= THIS_MODULE,
+};
+
+static int netpolicy_proc_dev_init(struct net *net, struct net_device *dev)
+{
+	dev->proc_dev = proc_net_mkdir(net, dev->name, net->proc_netpolicy);
+	if (!dev->proc_dev)
+		return -ENOMEM;
+
+	if (!proc_create_data("policy", S_IWUSR | S_IRUGO,
+			      dev->proc_dev, &proc_net_policy_operations,
+			      (void *)dev)) {
+		remove_proc_subtree(dev->name, net->proc_netpolicy);
+		return -ENOMEM;
+	}
+	return 0;
+}
+
+static int __net_init netpolicy_net_init(struct net *net)
+{
+	struct net_device *dev, *aux;
+
+	net->proc_netpolicy = proc_net_mkdir(net, "netpolicy",
+					     net->proc_net);
+	if (!net->proc_netpolicy)
+		return -ENOMEM;
+
+	for_each_netdev_safe(net, dev, aux) {
+		netpolicy_proc_dev_init(net, dev);
+	}
+
+	return 0;
+}
+
+#else /* CONFIG_PROC_FS */
+
+static int __net_init netpolicy_net_init(struct net *net)
+{
+	return 0;
+}
+#endif /* CONFIG_PROC_FS */
+
+static void __net_exit netpolicy_net_exit(struct net *net)
+{
+#ifdef CONFIG_PROC_FS
+	remove_proc_subtree("netpolicy", net->proc_net);
+#endif /* CONFIG_PROC_FS */
+}
+
+static struct pernet_operations netpolicy_net_ops = {
+	.init = netpolicy_net_init,
+	.exit = netpolicy_net_exit,
+};
+
+static int __init netpolicy_init(void)
+{
+	int ret;
+
+	ret = register_pernet_subsys(&netpolicy_net_ops);
+
+	return ret;
+}
+
+static void __exit netpolicy_exit(void)
+{
+	unregister_pernet_subsys(&netpolicy_net_ops);
+}
+
+subsys_initcall(netpolicy_init);
+module_exit(netpolicy_exit);
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 01/30] net: introduce NET policy
@ 2016-07-18  6:55   ` kan.liang
  0 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:55 UTC (permalink / raw)
  To: intel-wired-lan

From: Kan Liang <kan.liang@intel.com>

This patch introduce NET policy subsystem. If proc is supported in the
system, it creates netpolicy node in proc system.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netdevice.h   |   7 +++
 include/net/net_namespace.h |   3 ++
 net/Kconfig                 |   7 +++
 net/core/Makefile           |   1 +
 net/core/netpolicy.c        | 128 ++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 146 insertions(+)
 create mode 100644 net/core/netpolicy.c

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 49736a3..9e30a31 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1584,6 +1584,8 @@ enum netdev_priv_flags {
  *			switch driver and used to set the phys state of the
  *			switch port.
  *
+ *	@proc_dev:	device node in proc to configure device net policy
+ *
  *	FIXME: cleanup struct net_device such that network protocol info
  *	moves out.
  */
@@ -1851,6 +1853,11 @@ struct net_device {
 	struct lock_class_key	*qdisc_tx_busylock;
 	struct lock_class_key	*qdisc_running_key;
 	bool			proto_down;
+#ifdef CONFIG_NETPOLICY
+#ifdef CONFIG_PROC_FS
+	struct proc_dir_entry	*proc_dev;
+#endif /* CONFIG_PROC_FS */
+#endif /* CONFIG_NETPOLICY */
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
 
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 4089abc..d2ff6c4 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -142,6 +142,9 @@ struct net {
 #endif
 	struct sock		*diag_nlsk;
 	atomic_t		fnhe_genid;
+#ifdef CONFIG_NETPOLICY
+	struct proc_dir_entry	*proc_netpolicy;
+#endif /* CONFIG_NETPOLICY */
 };
 
 #include <linux/seq_file_net.h>
diff --git a/net/Kconfig b/net/Kconfig
index ff40562..c3ed726 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -205,6 +205,13 @@ source "net/bridge/netfilter/Kconfig"
 
 endif
 
+config NETPOLICY
+	depends on NET
+	bool "Net policy support"
+	default y
+	---help---
+	Net policy support
+
 source "net/dccp/Kconfig"
 source "net/sctp/Kconfig"
 source "net/rds/Kconfig"
diff --git a/net/core/Makefile b/net/core/Makefile
index d6508c2..0be7092 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -27,3 +27,4 @@ obj-$(CONFIG_LWTUNNEL) += lwtunnel.o
 obj-$(CONFIG_DST_CACHE) += dst_cache.o
 obj-$(CONFIG_HWBM) += hwbm.o
 obj-$(CONFIG_NET_DEVLINK) += devlink.o
+obj-$(CONFIG_NETPOLICY) += netpolicy.o
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
new file mode 100644
index 0000000..faabfe7
--- /dev/null
+++ b/net/core/netpolicy.c
@@ -0,0 +1,128 @@
+/*
+ * netpolicy.c: Net policy support
+ * Copyright (c) 2016, Intel Corporation.
+ * Author: Kan Liang (kan.liang at intel.com)
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * NET policy intends to simplify the network configuration and get a good
+ * network performance according to the hints(policy) which is applied by user.
+ *
+ * Motivation
+ * 	- The network performance is not good with default system settings.
+ *	- It is too difficult to do automatic tuning for all possible
+ *	  workloads, since workloads have different requirements. Some
+ *	  workloads may want high throughput. Some may need low latency.
+ *	- There are lots of manual configurations. Fine grained configuration
+ *	  is too difficult for users.
+ * 	So, it is a big challenge to get good network performance.
+ *
+ */
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/seq_file.h>
+#include <linux/proc_fs.h>
+#include <linux/uaccess.h>
+#include <linux/netdevice.h>
+#include <net/net_namespace.h>
+
+#ifdef CONFIG_PROC_FS
+
+static int net_policy_proc_show(struct seq_file *m, void *v)
+{
+	struct net_device *dev = (struct net_device *)m->private;
+
+	seq_printf(m, "%s doesn't support net policy manager\n", dev->name);
+
+	return 0;
+}
+
+static int net_policy_proc_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, net_policy_proc_show, PDE_DATA(inode));
+}
+
+static const struct file_operations proc_net_policy_operations = {
+	.open		= net_policy_proc_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= seq_release,
+	.owner		= THIS_MODULE,
+};
+
+static int netpolicy_proc_dev_init(struct net *net, struct net_device *dev)
+{
+	dev->proc_dev = proc_net_mkdir(net, dev->name, net->proc_netpolicy);
+	if (!dev->proc_dev)
+		return -ENOMEM;
+
+	if (!proc_create_data("policy", S_IWUSR | S_IRUGO,
+			      dev->proc_dev, &proc_net_policy_operations,
+			      (void *)dev)) {
+		remove_proc_subtree(dev->name, net->proc_netpolicy);
+		return -ENOMEM;
+	}
+	return 0;
+}
+
+static int __net_init netpolicy_net_init(struct net *net)
+{
+	struct net_device *dev, *aux;
+
+	net->proc_netpolicy = proc_net_mkdir(net, "netpolicy",
+					     net->proc_net);
+	if (!net->proc_netpolicy)
+		return -ENOMEM;
+
+	for_each_netdev_safe(net, dev, aux) {
+		netpolicy_proc_dev_init(net, dev);
+	}
+
+	return 0;
+}
+
+#else /* CONFIG_PROC_FS */
+
+static int __net_init netpolicy_net_init(struct net *net)
+{
+	return 0;
+}
+#endif /* CONFIG_PROC_FS */
+
+static void __net_exit netpolicy_net_exit(struct net *net)
+{
+#ifdef CONFIG_PROC_FS
+	remove_proc_subtree("netpolicy", net->proc_net);
+#endif /* CONFIG_PROC_FS */
+}
+
+static struct pernet_operations netpolicy_net_ops = {
+	.init = netpolicy_net_init,
+	.exit = netpolicy_net_exit,
+};
+
+static int __init netpolicy_init(void)
+{
+	int ret;
+
+	ret = register_pernet_subsys(&netpolicy_net_ops);
+
+	return ret;
+}
+
+static void __exit netpolicy_exit(void)
+{
+	unregister_pernet_subsys(&netpolicy_net_ops);
+}
+
+subsys_initcall(netpolicy_init);
+module_exit(netpolicy_exit);
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [RFC PATCH 02/30] net/netpolicy: init NET policy
  2016-07-18  6:55 ` [Intel-wired-lan] " kan.liang
@ 2016-07-18  6:55   ` kan.liang
  -1 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:55 UTC (permalink / raw)
  To: davem, linux-kernel, intel-wired-lan, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

This patch tries to initialize NET policy for all the devices in the
system. However, not all device drivers have NET policy support. For
those drivers who does not have NET policy support, the node will not be
showed in /proc/net/netpolicy/.
The device driver who has NET policy support must implement the
interface ndo_netpolicy_init, which is used to do necessory
initialization and collect information (E.g. supported policies) from
driver.
The user can check /proc/netpolicy/ and /proc/net/netpolicy/$DEV/policy
to know the available device and its supported policy.
np_lock is also introduced to protect the state of NET policy.
Device hotplug will be handled later in this series.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netdevice.h | 12 +++++++
 include/linux/netpolicy.h | 31 +++++++++++++++++
 net/core/netpolicy.c      | 86 +++++++++++++++++++++++++++++++++++++++++------
 3 files changed, 118 insertions(+), 11 deletions(-)
 create mode 100644 include/linux/netpolicy.h

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9e30a31..ef45dfe 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -52,6 +52,7 @@
 #include <uapi/linux/netdevice.h>
 #include <uapi/linux/if_bonding.h>
 #include <uapi/linux/pkt_cls.h>
+#include <linux/netpolicy.h>
 
 struct netpoll_info;
 struct device;
@@ -1087,6 +1088,9 @@ struct tc_to_netdev {
  *	appropriate rx headroom value allows avoiding skb head copy on
  *	forward. Setting a negative value resets the rx headroom to the
  *	default value.
+ * int (*ndo_netpolicy_init)(struct net_device *dev,
+ * 			     struct netpolicy_info *info);
+ * 	This function is used to init and get supported policy.
  *
  */
 struct net_device_ops {
@@ -1271,6 +1275,10 @@ struct net_device_ops {
 						       struct sk_buff *skb);
 	void			(*ndo_set_rx_headroom)(struct net_device *dev,
 						       int needed_headroom);
+#ifdef CONFIG_NETPOLICY
+	int			(*ndo_netpolicy_init)(struct net_device *dev,
+						      struct netpolicy_info *info);
+#endif /* CONFIG_NETPOLICY */
 };
 
 /**
@@ -1585,6 +1593,8 @@ enum netdev_priv_flags {
  *			switch port.
  *
  *	@proc_dev:	device node in proc to configure device net policy
+ *	@netpolicy:	NET policy related information of net device
+ *	@np_lock:	protect the state of NET policy
  *
  *	FIXME: cleanup struct net_device such that network protocol info
  *	moves out.
@@ -1857,6 +1867,8 @@ struct net_device {
 #ifdef CONFIG_PROC_FS
 	struct proc_dir_entry	*proc_dev;
 #endif /* CONFIG_PROC_FS */
+	struct netpolicy_info	*netpolicy;
+	spinlock_t		np_lock;
 #endif /* CONFIG_NETPOLICY */
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
new file mode 100644
index 0000000..ca1f131
--- /dev/null
+++ b/include/linux/netpolicy.h
@@ -0,0 +1,31 @@
+/*
+ * netpolicy.h: Net policy support
+ * Copyright (c) 2016, Intel Corporation.
+ * Author: Kan Liang (kan.liang@intel.com)
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+#ifndef __LINUX_NETPOLICY_H
+#define __LINUX_NETPOLICY_H
+
+enum netpolicy_name {
+	NET_POLICY_NONE		= 0,
+	NET_POLICY_MAX,
+};
+
+extern const char *policy_name[];
+
+struct netpolicy_info {
+	enum netpolicy_name	cur_policy;
+	unsigned long avail_policy[BITS_TO_LONGS(NET_POLICY_MAX)];
+};
+
+#endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index faabfe7..5f304d5 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -35,13 +35,29 @@
 #include <linux/netdevice.h>
 #include <net/net_namespace.h>
 
+const char *policy_name[NET_POLICY_MAX] = {
+	"NONE"
+};
 #ifdef CONFIG_PROC_FS
 
 static int net_policy_proc_show(struct seq_file *m, void *v)
 {
 	struct net_device *dev = (struct net_device *)m->private;
-
-	seq_printf(m, "%s doesn't support net policy manager\n", dev->name);
+	int i;
+
+	if (WARN_ON(!dev->netpolicy))
+		return -EINVAL;
+
+	if (dev->netpolicy->cur_policy == NET_POLICY_NONE) {
+		seq_printf(m, "%s: There is no policy applied\n", dev->name);
+		seq_printf(m, "%s: The available policy include:", dev->name);
+		for_each_set_bit(i, dev->netpolicy->avail_policy, NET_POLICY_MAX)
+			seq_printf(m, " %s", policy_name[i]);
+		seq_printf(m, "\n");
+	} else {
+		seq_printf(m, "%s: POLICY %s is running on the system\n",
+			   dev->name, policy_name[dev->netpolicy->cur_policy]);
+	}
 
 	return 0;
 }
@@ -73,33 +89,81 @@ static int netpolicy_proc_dev_init(struct net *net, struct net_device *dev)
 	}
 	return 0;
 }
+#endif /* CONFIG_PROC_FS */
+
+int init_netpolicy(struct net_device *dev)
+{
+	int ret;
+
+	spin_lock(&dev->np_lock);
+	ret = 0;
+
+	if (!dev->netdev_ops->ndo_netpolicy_init) {
+		ret = -ENOTSUPP;
+		goto unlock;
+	}
+
+	if (dev->netpolicy)
+		goto unlock;
+
+	dev->netpolicy = kzalloc(sizeof(*dev->netpolicy), GFP_ATOMIC);
+	if (!dev->netpolicy) {
+		ret = -ENOMEM;
+		goto unlock;
+	}
+
+	ret = dev->netdev_ops->ndo_netpolicy_init(dev, dev->netpolicy);
+	if (ret) {
+		kfree(dev->netpolicy);
+		dev->netpolicy = NULL;
+	}
+
+unlock:
+	spin_unlock(&dev->np_lock);
+	return ret;
+}
+
+void uninit_netpolicy(struct net_device *dev)
+{
+	spin_lock(&dev->np_lock);
+	if (dev->netpolicy) {
+		kfree(dev->netpolicy);
+		dev->netpolicy = NULL;
+	}
+	spin_unlock(&dev->np_lock);
+}
 
 static int __net_init netpolicy_net_init(struct net *net)
 {
 	struct net_device *dev, *aux;
 
+#ifdef CONFIG_PROC_FS
 	net->proc_netpolicy = proc_net_mkdir(net, "netpolicy",
 					     net->proc_net);
 	if (!net->proc_netpolicy)
 		return -ENOMEM;
+#endif /* CONFIG_PROC_FS */
 
 	for_each_netdev_safe(net, dev, aux) {
-		netpolicy_proc_dev_init(net, dev);
+		if (!init_netpolicy(dev)) {
+#ifdef CONFIG_PROC_FS
+			if (netpolicy_proc_dev_init(net, dev))
+				uninit_netpolicy(dev);
+			else
+#endif /* CONFIG_PROC_FS */
+			pr_info("NETPOLICY: Init net policy for %s\n", dev->name);
+		}
 	}
 
 	return 0;
 }
 
-#else /* CONFIG_PROC_FS */
-
-static int __net_init netpolicy_net_init(struct net *net)
-{
-	return 0;
-}
-#endif /* CONFIG_PROC_FS */
-
 static void __net_exit netpolicy_net_exit(struct net *net)
 {
+	struct net_device *dev, *aux;
+
+	for_each_netdev_safe(net, dev, aux)
+		uninit_netpolicy(dev);
 #ifdef CONFIG_PROC_FS
 	remove_proc_subtree("netpolicy", net->proc_net);
 #endif /* CONFIG_PROC_FS */
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 02/30] net/netpolicy: init NET policy
@ 2016-07-18  6:55   ` kan.liang
  0 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:55 UTC (permalink / raw)
  To: intel-wired-lan

From: Kan Liang <kan.liang@intel.com>

This patch tries to initialize NET policy for all the devices in the
system. However, not all device drivers have NET policy support. For
those drivers who does not have NET policy support, the node will not be
showed in /proc/net/netpolicy/.
The device driver who has NET policy support must implement the
interface ndo_netpolicy_init, which is used to do necessory
initialization and collect information (E.g. supported policies) from
driver.
The user can check /proc/netpolicy/ and /proc/net/netpolicy/$DEV/policy
to know the available device and its supported policy.
np_lock is also introduced to protect the state of NET policy.
Device hotplug will be handled later in this series.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netdevice.h | 12 +++++++
 include/linux/netpolicy.h | 31 +++++++++++++++++
 net/core/netpolicy.c      | 86 +++++++++++++++++++++++++++++++++++++++++------
 3 files changed, 118 insertions(+), 11 deletions(-)
 create mode 100644 include/linux/netpolicy.h

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9e30a31..ef45dfe 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -52,6 +52,7 @@
 #include <uapi/linux/netdevice.h>
 #include <uapi/linux/if_bonding.h>
 #include <uapi/linux/pkt_cls.h>
+#include <linux/netpolicy.h>
 
 struct netpoll_info;
 struct device;
@@ -1087,6 +1088,9 @@ struct tc_to_netdev {
  *	appropriate rx headroom value allows avoiding skb head copy on
  *	forward. Setting a negative value resets the rx headroom to the
  *	default value.
+ * int (*ndo_netpolicy_init)(struct net_device *dev,
+ * 			     struct netpolicy_info *info);
+ * 	This function is used to init and get supported policy.
  *
  */
 struct net_device_ops {
@@ -1271,6 +1275,10 @@ struct net_device_ops {
 						       struct sk_buff *skb);
 	void			(*ndo_set_rx_headroom)(struct net_device *dev,
 						       int needed_headroom);
+#ifdef CONFIG_NETPOLICY
+	int			(*ndo_netpolicy_init)(struct net_device *dev,
+						      struct netpolicy_info *info);
+#endif /* CONFIG_NETPOLICY */
 };
 
 /**
@@ -1585,6 +1593,8 @@ enum netdev_priv_flags {
  *			switch port.
  *
  *	@proc_dev:	device node in proc to configure device net policy
+ *	@netpolicy:	NET policy related information of net device
+ *	@np_lock:	protect the state of NET policy
  *
  *	FIXME: cleanup struct net_device such that network protocol info
  *	moves out.
@@ -1857,6 +1867,8 @@ struct net_device {
 #ifdef CONFIG_PROC_FS
 	struct proc_dir_entry	*proc_dev;
 #endif /* CONFIG_PROC_FS */
+	struct netpolicy_info	*netpolicy;
+	spinlock_t		np_lock;
 #endif /* CONFIG_NETPOLICY */
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
new file mode 100644
index 0000000..ca1f131
--- /dev/null
+++ b/include/linux/netpolicy.h
@@ -0,0 +1,31 @@
+/*
+ * netpolicy.h: Net policy support
+ * Copyright (c) 2016, Intel Corporation.
+ * Author: Kan Liang (kan.liang at intel.com)
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+#ifndef __LINUX_NETPOLICY_H
+#define __LINUX_NETPOLICY_H
+
+enum netpolicy_name {
+	NET_POLICY_NONE		= 0,
+	NET_POLICY_MAX,
+};
+
+extern const char *policy_name[];
+
+struct netpolicy_info {
+	enum netpolicy_name	cur_policy;
+	unsigned long avail_policy[BITS_TO_LONGS(NET_POLICY_MAX)];
+};
+
+#endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index faabfe7..5f304d5 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -35,13 +35,29 @@
 #include <linux/netdevice.h>
 #include <net/net_namespace.h>
 
+const char *policy_name[NET_POLICY_MAX] = {
+	"NONE"
+};
 #ifdef CONFIG_PROC_FS
 
 static int net_policy_proc_show(struct seq_file *m, void *v)
 {
 	struct net_device *dev = (struct net_device *)m->private;
-
-	seq_printf(m, "%s doesn't support net policy manager\n", dev->name);
+	int i;
+
+	if (WARN_ON(!dev->netpolicy))
+		return -EINVAL;
+
+	if (dev->netpolicy->cur_policy == NET_POLICY_NONE) {
+		seq_printf(m, "%s: There is no policy applied\n", dev->name);
+		seq_printf(m, "%s: The available policy include:", dev->name);
+		for_each_set_bit(i, dev->netpolicy->avail_policy, NET_POLICY_MAX)
+			seq_printf(m, " %s", policy_name[i]);
+		seq_printf(m, "\n");
+	} else {
+		seq_printf(m, "%s: POLICY %s is running on the system\n",
+			   dev->name, policy_name[dev->netpolicy->cur_policy]);
+	}
 
 	return 0;
 }
@@ -73,33 +89,81 @@ static int netpolicy_proc_dev_init(struct net *net, struct net_device *dev)
 	}
 	return 0;
 }
+#endif /* CONFIG_PROC_FS */
+
+int init_netpolicy(struct net_device *dev)
+{
+	int ret;
+
+	spin_lock(&dev->np_lock);
+	ret = 0;
+
+	if (!dev->netdev_ops->ndo_netpolicy_init) {
+		ret = -ENOTSUPP;
+		goto unlock;
+	}
+
+	if (dev->netpolicy)
+		goto unlock;
+
+	dev->netpolicy = kzalloc(sizeof(*dev->netpolicy), GFP_ATOMIC);
+	if (!dev->netpolicy) {
+		ret = -ENOMEM;
+		goto unlock;
+	}
+
+	ret = dev->netdev_ops->ndo_netpolicy_init(dev, dev->netpolicy);
+	if (ret) {
+		kfree(dev->netpolicy);
+		dev->netpolicy = NULL;
+	}
+
+unlock:
+	spin_unlock(&dev->np_lock);
+	return ret;
+}
+
+void uninit_netpolicy(struct net_device *dev)
+{
+	spin_lock(&dev->np_lock);
+	if (dev->netpolicy) {
+		kfree(dev->netpolicy);
+		dev->netpolicy = NULL;
+	}
+	spin_unlock(&dev->np_lock);
+}
 
 static int __net_init netpolicy_net_init(struct net *net)
 {
 	struct net_device *dev, *aux;
 
+#ifdef CONFIG_PROC_FS
 	net->proc_netpolicy = proc_net_mkdir(net, "netpolicy",
 					     net->proc_net);
 	if (!net->proc_netpolicy)
 		return -ENOMEM;
+#endif /* CONFIG_PROC_FS */
 
 	for_each_netdev_safe(net, dev, aux) {
-		netpolicy_proc_dev_init(net, dev);
+		if (!init_netpolicy(dev)) {
+#ifdef CONFIG_PROC_FS
+			if (netpolicy_proc_dev_init(net, dev))
+				uninit_netpolicy(dev);
+			else
+#endif /* CONFIG_PROC_FS */
+			pr_info("NETPOLICY: Init net policy for %s\n", dev->name);
+		}
 	}
 
 	return 0;
 }
 
-#else /* CONFIG_PROC_FS */
-
-static int __net_init netpolicy_net_init(struct net *net)
-{
-	return 0;
-}
-#endif /* CONFIG_PROC_FS */
-
 static void __net_exit netpolicy_net_exit(struct net *net)
 {
+	struct net_device *dev, *aux;
+
+	for_each_netdev_safe(net, dev, aux)
+		uninit_netpolicy(dev);
 #ifdef CONFIG_PROC_FS
 	remove_proc_subtree("netpolicy", net->proc_net);
 #endif /* CONFIG_PROC_FS */
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [RFC PATCH 03/30] i40e/netpolicy: Implement ndo_netpolicy_init
  2016-07-18  6:55 ` [Intel-wired-lan] " kan.liang
@ 2016-07-18  6:55   ` kan.liang
  -1 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:55 UTC (permalink / raw)
  To: davem, linux-kernel, intel-wired-lan, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

Support ndo_netpolicy_init in i40e driver.
For i40e driver, there is no extra initialization work to do. It only
needs to update the available policy bitmap.
policy_param will be filled according to different policies later.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 41 +++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 2b11405..ee1f0b2 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -8960,6 +8960,44 @@ static netdev_features_t i40e_features_check(struct sk_buff *skb,
 	return features;
 }
 
+#ifdef CONFIG_NETPOLICY
+
+#define NET_POLICY_NOT_SUPPORT	-2
+#define NET_POLICY_END		-3
+static int policy_param[NET_POLICY_MAX + 1][2] = {
+	/* rx-usec, tx-usec */
+	{0, 0},
+
+	{NET_POLICY_END, NET_POLICY_END},
+};
+
+/**
+ * i40e_ndo_netpolicy_init
+ * @dev: the net device pointer
+ * @info: netpolicy info which need to be updated
+ *
+ * Init and update available policy on i40e driver
+ * Returns 0 on success, negative on failure
+ */
+static int i40e_ndo_netpolicy_init(struct net_device *dev,
+				   struct netpolicy_info *info)
+{
+	int i;
+
+	for (i = 0; i < NET_POLICY_MAX; i++) {
+		if ((policy_param[i][0] == NET_POLICY_END) &&
+		    (policy_param[i][1] == NET_POLICY_END))
+			break;
+
+		if ((policy_param[i][0] != NET_POLICY_NOT_SUPPORT) &&
+		    (policy_param[i][1] != NET_POLICY_NOT_SUPPORT))
+			set_bit(i, info->avail_policy);
+	}
+
+	return 0;
+}
+#endif /* CONFIG_NETPOLICY */
+
 static const struct net_device_ops i40e_netdev_ops = {
 	.ndo_open		= i40e_open,
 	.ndo_stop		= i40e_close,
@@ -8996,6 +9034,9 @@ static const struct net_device_ops i40e_netdev_ops = {
 	.ndo_features_check	= i40e_features_check,
 	.ndo_bridge_getlink	= i40e_ndo_bridge_getlink,
 	.ndo_bridge_setlink	= i40e_ndo_bridge_setlink,
+#ifdef CONFIG_NETPOLICY
+	.ndo_netpolicy_init	= i40e_ndo_netpolicy_init,
+#endif /* CONFIG_NETPOLICY */
 };
 
 /**
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 03/30] i40e/netpolicy: Implement ndo_netpolicy_init
@ 2016-07-18  6:55   ` kan.liang
  0 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:55 UTC (permalink / raw)
  To: intel-wired-lan

From: Kan Liang <kan.liang@intel.com>

Support ndo_netpolicy_init in i40e driver.
For i40e driver, there is no extra initialization work to do. It only
needs to update the available policy bitmap.
policy_param will be filled according to different policies later.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 41 +++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 2b11405..ee1f0b2 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -8960,6 +8960,44 @@ static netdev_features_t i40e_features_check(struct sk_buff *skb,
 	return features;
 }
 
+#ifdef CONFIG_NETPOLICY
+
+#define NET_POLICY_NOT_SUPPORT	-2
+#define NET_POLICY_END		-3
+static int policy_param[NET_POLICY_MAX + 1][2] = {
+	/* rx-usec, tx-usec */
+	{0, 0},
+
+	{NET_POLICY_END, NET_POLICY_END},
+};
+
+/**
+ * i40e_ndo_netpolicy_init
+ * @dev: the net device pointer
+ * @info: netpolicy info which need to be updated
+ *
+ * Init and update available policy on i40e driver
+ * Returns 0 on success, negative on failure
+ */
+static int i40e_ndo_netpolicy_init(struct net_device *dev,
+				   struct netpolicy_info *info)
+{
+	int i;
+
+	for (i = 0; i < NET_POLICY_MAX; i++) {
+		if ((policy_param[i][0] == NET_POLICY_END) &&
+		    (policy_param[i][1] == NET_POLICY_END))
+			break;
+
+		if ((policy_param[i][0] != NET_POLICY_NOT_SUPPORT) &&
+		    (policy_param[i][1] != NET_POLICY_NOT_SUPPORT))
+			set_bit(i, info->avail_policy);
+	}
+
+	return 0;
+}
+#endif /* CONFIG_NETPOLICY */
+
 static const struct net_device_ops i40e_netdev_ops = {
 	.ndo_open		= i40e_open,
 	.ndo_stop		= i40e_close,
@@ -8996,6 +9034,9 @@ static const struct net_device_ops i40e_netdev_ops = {
 	.ndo_features_check	= i40e_features_check,
 	.ndo_bridge_getlink	= i40e_ndo_bridge_getlink,
 	.ndo_bridge_setlink	= i40e_ndo_bridge_setlink,
+#ifdef CONFIG_NETPOLICY
+	.ndo_netpolicy_init	= i40e_ndo_netpolicy_init,
+#endif /* CONFIG_NETPOLICY */
 };
 
 /**
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [RFC PATCH 04/30] net/netpolicy: get driver information
  2016-07-18  6:55 ` [Intel-wired-lan] " kan.liang
@ 2016-07-18  6:55   ` kan.liang
  -1 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:55 UTC (permalink / raw)
  To: davem, linux-kernel, intel-wired-lan, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

Net policy needs to know device information. Currently, it's enough to
only get irq information of rx and tx queues. This patch
introduces ndo_get_irq_info to do so.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netdevice.h |  5 +++++
 include/linux/netpolicy.h |  7 +++++++
 net/core/netpolicy.c      | 14 ++++++++++++++
 3 files changed, 26 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ef45dfe..3470943 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1091,6 +1091,9 @@ struct tc_to_netdev {
  * int (*ndo_netpolicy_init)(struct net_device *dev,
  * 			     struct netpolicy_info *info);
  * 	This function is used to init and get supported policy.
+ * int (*ndo_get_irq_info)(struct net_device *dev,
+ * 			   struct netpolicy_dev_info *info);
+ * 	This function is used to get irq information of rx and tx queues
  *
  */
 struct net_device_ops {
@@ -1278,6 +1281,8 @@ struct net_device_ops {
 #ifdef CONFIG_NETPOLICY
 	int			(*ndo_netpolicy_init)(struct net_device *dev,
 						      struct netpolicy_info *info);
+	int			(*ndo_get_irq_info)(struct net_device *dev,
+						    struct netpolicy_dev_info *info);
 #endif /* CONFIG_NETPOLICY */
 };
 
diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index ca1f131..fc87d9b 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -23,6 +23,13 @@ enum netpolicy_name {
 
 extern const char *policy_name[];
 
+struct netpolicy_dev_info {
+	u32	rx_num;
+	u32	tx_num;
+	u32	*rx_irq;
+	u32	*tx_irq;
+};
+
 struct netpolicy_info {
 	enum netpolicy_name	cur_policy;
 	unsigned long avail_policy[BITS_TO_LONGS(NET_POLICY_MAX)];
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 5f304d5..7c34c8a 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -35,6 +35,20 @@
 #include <linux/netdevice.h>
 #include <net/net_namespace.h>
 
+static int netpolicy_get_dev_info(struct net_device *dev,
+				  struct netpolicy_dev_info *d_info)
+{
+	if (!dev->netdev_ops->ndo_get_irq_info)
+		return -ENOTSUPP;
+	return dev->netdev_ops->ndo_get_irq_info(dev, d_info);
+}
+
+static void netpolicy_free_dev_info(struct netpolicy_dev_info *d_info)
+{
+	kfree(d_info->rx_irq);
+	kfree(d_info->tx_irq);
+}
+
 const char *policy_name[NET_POLICY_MAX] = {
 	"NONE"
 };
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 04/30] net/netpolicy: get driver information
@ 2016-07-18  6:55   ` kan.liang
  0 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:55 UTC (permalink / raw)
  To: intel-wired-lan

From: Kan Liang <kan.liang@intel.com>

Net policy needs to know device information. Currently, it's enough to
only get irq information of rx and tx queues. This patch
introduces ndo_get_irq_info to do so.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netdevice.h |  5 +++++
 include/linux/netpolicy.h |  7 +++++++
 net/core/netpolicy.c      | 14 ++++++++++++++
 3 files changed, 26 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ef45dfe..3470943 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1091,6 +1091,9 @@ struct tc_to_netdev {
  * int (*ndo_netpolicy_init)(struct net_device *dev,
  * 			     struct netpolicy_info *info);
  * 	This function is used to init and get supported policy.
+ * int (*ndo_get_irq_info)(struct net_device *dev,
+ * 			   struct netpolicy_dev_info *info);
+ * 	This function is used to get irq information of rx and tx queues
  *
  */
 struct net_device_ops {
@@ -1278,6 +1281,8 @@ struct net_device_ops {
 #ifdef CONFIG_NETPOLICY
 	int			(*ndo_netpolicy_init)(struct net_device *dev,
 						      struct netpolicy_info *info);
+	int			(*ndo_get_irq_info)(struct net_device *dev,
+						    struct netpolicy_dev_info *info);
 #endif /* CONFIG_NETPOLICY */
 };
 
diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index ca1f131..fc87d9b 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -23,6 +23,13 @@ enum netpolicy_name {
 
 extern const char *policy_name[];
 
+struct netpolicy_dev_info {
+	u32	rx_num;
+	u32	tx_num;
+	u32	*rx_irq;
+	u32	*tx_irq;
+};
+
 struct netpolicy_info {
 	enum netpolicy_name	cur_policy;
 	unsigned long avail_policy[BITS_TO_LONGS(NET_POLICY_MAX)];
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 5f304d5..7c34c8a 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -35,6 +35,20 @@
 #include <linux/netdevice.h>
 #include <net/net_namespace.h>
 
+static int netpolicy_get_dev_info(struct net_device *dev,
+				  struct netpolicy_dev_info *d_info)
+{
+	if (!dev->netdev_ops->ndo_get_irq_info)
+		return -ENOTSUPP;
+	return dev->netdev_ops->ndo_get_irq_info(dev, d_info);
+}
+
+static void netpolicy_free_dev_info(struct netpolicy_dev_info *d_info)
+{
+	kfree(d_info->rx_irq);
+	kfree(d_info->tx_irq);
+}
+
 const char *policy_name[NET_POLICY_MAX] = {
 	"NONE"
 };
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [RFC PATCH 05/30] i40e/netpolicy: implement ndo_get_irq_info
  2016-07-18  6:55 ` [Intel-wired-lan] " kan.liang
@ 2016-07-18  6:55   ` kan.liang
  -1 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:55 UTC (permalink / raw)
  To: davem, linux-kernel, intel-wired-lan, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

Implement ndo_get_irq_info in i40e driver to get irq information of rx
and tx queues.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 40 +++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index ee1f0b2..8a919e44 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -8996,6 +8996,45 @@ static int i40e_ndo_netpolicy_init(struct net_device *dev,
 
 	return 0;
 }
+
+/**
+ * i40e_ndo_get_irq_info
+ * @dev: the net device pointer
+ * @info: irq information which need to be updated
+ *
+ * Update irq information of tx and rx queues
+ * Returns 0 on success, negative on failure
+ */
+static int i40e_ndo_get_irq_info(struct net_device *dev,
+				 struct netpolicy_dev_info *info)
+{
+	struct i40e_netdev_priv *np = netdev_priv(dev);
+	struct i40e_vsi *vsi = np->vsi;
+	struct i40e_pf *pf = vsi->back;
+	int i;
+
+	info->rx_num = vsi->num_queue_pairs;
+	info->rx_irq = kmalloc_array(info->rx_num, sizeof(u32), GFP_KERNEL);
+	if (!info->rx_irq) {
+		info->rx_num = 0;
+		return -ENOMEM;
+	}
+
+	info->tx_num = vsi->num_queue_pairs;
+	info->tx_irq = kmalloc_array(info->tx_num, sizeof(u32), GFP_KERNEL);
+	if (!info->tx_irq) {
+		info->tx_num = 0;
+		kfree(info->rx_irq);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < vsi->num_queue_pairs; i++) {
+		info->rx_irq[i] = pf->msix_entries[vsi->base_vector + i].vector;
+		info->tx_irq[i] = pf->msix_entries[vsi->base_vector + i].vector;
+	}
+
+	return 0;
+}
 #endif /* CONFIG_NETPOLICY */
 
 static const struct net_device_ops i40e_netdev_ops = {
@@ -9036,6 +9075,7 @@ static const struct net_device_ops i40e_netdev_ops = {
 	.ndo_bridge_setlink	= i40e_ndo_bridge_setlink,
 #ifdef CONFIG_NETPOLICY
 	.ndo_netpolicy_init	= i40e_ndo_netpolicy_init,
+	.ndo_get_irq_info	= i40e_ndo_get_irq_info,
 #endif /* CONFIG_NETPOLICY */
 };
 
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 05/30] i40e/netpolicy: implement ndo_get_irq_info
@ 2016-07-18  6:55   ` kan.liang
  0 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:55 UTC (permalink / raw)
  To: intel-wired-lan

From: Kan Liang <kan.liang@intel.com>

Implement ndo_get_irq_info in i40e driver to get irq information of rx
and tx queues.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 40 +++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index ee1f0b2..8a919e44 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -8996,6 +8996,45 @@ static int i40e_ndo_netpolicy_init(struct net_device *dev,
 
 	return 0;
 }
+
+/**
+ * i40e_ndo_get_irq_info
+ * @dev: the net device pointer
+ * @info: irq information which need to be updated
+ *
+ * Update irq information of tx and rx queues
+ * Returns 0 on success, negative on failure
+ */
+static int i40e_ndo_get_irq_info(struct net_device *dev,
+				 struct netpolicy_dev_info *info)
+{
+	struct i40e_netdev_priv *np = netdev_priv(dev);
+	struct i40e_vsi *vsi = np->vsi;
+	struct i40e_pf *pf = vsi->back;
+	int i;
+
+	info->rx_num = vsi->num_queue_pairs;
+	info->rx_irq = kmalloc_array(info->rx_num, sizeof(u32), GFP_KERNEL);
+	if (!info->rx_irq) {
+		info->rx_num = 0;
+		return -ENOMEM;
+	}
+
+	info->tx_num = vsi->num_queue_pairs;
+	info->tx_irq = kmalloc_array(info->tx_num, sizeof(u32), GFP_KERNEL);
+	if (!info->tx_irq) {
+		info->tx_num = 0;
+		kfree(info->rx_irq);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < vsi->num_queue_pairs; i++) {
+		info->rx_irq[i] = pf->msix_entries[vsi->base_vector + i].vector;
+		info->tx_irq[i] = pf->msix_entries[vsi->base_vector + i].vector;
+	}
+
+	return 0;
+}
 #endif /* CONFIG_NETPOLICY */
 
 static const struct net_device_ops i40e_netdev_ops = {
@@ -9036,6 +9075,7 @@ static const struct net_device_ops i40e_netdev_ops = {
 	.ndo_bridge_setlink	= i40e_ndo_bridge_setlink,
 #ifdef CONFIG_NETPOLICY
 	.ndo_netpolicy_init	= i40e_ndo_netpolicy_init,
+	.ndo_get_irq_info	= i40e_ndo_get_irq_info,
 #endif /* CONFIG_NETPOLICY */
 };
 
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [RFC PATCH 06/30] net/netpolicy: get CPU information
  2016-07-18  6:55 ` [Intel-wired-lan] " kan.liang
@ 2016-07-18  6:56   ` kan.liang
  -1 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: davem, linux-kernel, intel-wired-lan, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

Net policy also needs to know CPU information. Currently, online
CPU number is enough.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 net/core/netpolicy.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 7c34c8a..075aaca 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -49,6 +49,11 @@ static void netpolicy_free_dev_info(struct netpolicy_dev_info *d_info)
 	kfree(d_info->tx_irq);
 }
 
+static u32 netpolicy_get_cpu_information(void)
+{
+	return num_online_cpus();
+}
+
 const char *policy_name[NET_POLICY_MAX] = {
 	"NONE"
 };
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 06/30] net/netpolicy: get CPU information
@ 2016-07-18  6:56   ` kan.liang
  0 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: intel-wired-lan

From: Kan Liang <kan.liang@intel.com>

Net policy also needs to know CPU information. Currently, online
CPU number is enough.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 net/core/netpolicy.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 7c34c8a..075aaca 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -49,6 +49,11 @@ static void netpolicy_free_dev_info(struct netpolicy_dev_info *d_info)
 	kfree(d_info->tx_irq);
 }
 
+static u32 netpolicy_get_cpu_information(void)
+{
+	return num_online_cpus();
+}
+
 const char *policy_name[NET_POLICY_MAX] = {
 	"NONE"
 };
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [RFC PATCH 07/30] net/netpolicy: create CPU and queue mapping
  2016-07-18  6:55 ` [Intel-wired-lan] " kan.liang
@ 2016-07-18  6:56   ` kan.liang
  -1 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: davem, linux-kernel, intel-wired-lan, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

Current implementation forces CPU and queue 1:1 mapping. This patch
introduces the function netpolicy_update_sys_map to create this mapping.
The result is stored in netpolicy_sys_info.
If the CPU count and queue count are different, the remaining
CPUs/queues are not used.
CPU hotplug, device hotplug or ethtool may change the CPU count or
queue count. For these cases, this function can also be called to
reconstruct the mapping. These cases will be handled later in this
series.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netpolicy.h | 18 ++++++++++++
 net/core/netpolicy.c      | 74 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 92 insertions(+)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index fc87d9b..a946b75c 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -30,9 +30,27 @@ struct netpolicy_dev_info {
 	u32	*tx_irq;
 };
 
+struct netpolicy_sys_map {
+	u32	cpu;
+	u32	queue;
+	u32	irq;
+};
+
+struct netpolicy_sys_info {
+	/*
+	 * Record the cpu and queue 1:1 mapping
+	 */
+	u32				avail_rx_num;
+	struct netpolicy_sys_map	*rx;
+	u32				avail_tx_num;
+	struct netpolicy_sys_map	*tx;
+};
+
 struct netpolicy_info {
 	enum netpolicy_name	cur_policy;
 	unsigned long avail_policy[BITS_TO_LONGS(NET_POLICY_MAX)];
+	/* cpu and queue mapping information */
+	struct netpolicy_sys_info	sys_info;
 };
 
 #endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 075aaca..ff7fc04 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -54,6 +54,80 @@ static u32 netpolicy_get_cpu_information(void)
 	return num_online_cpus();
 }
 
+static void netpolicy_free_sys_map(struct net_device *dev)
+{
+	struct netpolicy_sys_info *s_info = &dev->netpolicy->sys_info;
+
+	kfree(s_info->rx);
+	s_info->rx = NULL;
+	s_info->avail_rx_num = 0;
+	kfree(s_info->tx);
+	s_info->tx = NULL;
+	s_info->avail_tx_num = 0;
+}
+
+static int netpolicy_update_sys_map(struct net_device *dev,
+				    struct netpolicy_dev_info *d_info,
+				    u32 cpu)
+{
+	struct netpolicy_sys_info *s_info = &dev->netpolicy->sys_info;
+	u32 num, i, online_cpu;
+	cpumask_var_t cpumask;
+
+	if (!alloc_cpumask_var(&cpumask, GFP_ATOMIC))
+		return -ENOMEM;
+
+	/* update rx cpu map */
+	if (cpu > d_info->rx_num)
+		num = d_info->rx_num;
+	else
+		num = cpu;
+
+	s_info->avail_rx_num = num;
+	s_info->rx = kcalloc(num, sizeof(*s_info->rx), GFP_ATOMIC);
+	if (!s_info->rx)
+		goto err;
+	cpumask_copy(cpumask, cpu_online_mask);
+
+	i = 0;
+	for_each_cpu(online_cpu, cpumask) {
+		if (i == num)
+			break;
+		s_info->rx[i].cpu = online_cpu;
+		s_info->rx[i].queue = i;
+		s_info->rx[i].irq = d_info->rx_irq[i];
+		i++;
+	}
+
+	/* update tx cpu map */
+	if (cpu >= d_info->tx_num)
+		num = d_info->tx_num;
+	else
+		num = cpu;
+
+	s_info->avail_tx_num = num;
+	s_info->tx = kcalloc(num, sizeof(*s_info->tx), GFP_ATOMIC);
+	if (!s_info->tx)
+		goto err;
+
+	i = 0;
+	for_each_cpu(online_cpu, cpumask) {
+		if (i == num)
+			break;
+		s_info->tx[i].cpu = online_cpu;
+		s_info->tx[i].queue = i;
+		s_info->tx[i].irq = d_info->tx_irq[i];
+		i++;
+	}
+
+	free_cpumask_var(cpumask);
+	return 0;
+err:
+	netpolicy_free_sys_map(dev);
+	free_cpumask_var(cpumask);
+	return -ENOMEM;
+}
+
 const char *policy_name[NET_POLICY_MAX] = {
 	"NONE"
 };
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 07/30] net/netpolicy: create CPU and queue mapping
@ 2016-07-18  6:56   ` kan.liang
  0 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: intel-wired-lan

From: Kan Liang <kan.liang@intel.com>

Current implementation forces CPU and queue 1:1 mapping. This patch
introduces the function netpolicy_update_sys_map to create this mapping.
The result is stored in netpolicy_sys_info.
If the CPU count and queue count are different, the remaining
CPUs/queues are not used.
CPU hotplug, device hotplug or ethtool may change the CPU count or
queue count. For these cases, this function can also be called to
reconstruct the mapping. These cases will be handled later in this
series.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netpolicy.h | 18 ++++++++++++
 net/core/netpolicy.c      | 74 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 92 insertions(+)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index fc87d9b..a946b75c 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -30,9 +30,27 @@ struct netpolicy_dev_info {
 	u32	*tx_irq;
 };
 
+struct netpolicy_sys_map {
+	u32	cpu;
+	u32	queue;
+	u32	irq;
+};
+
+struct netpolicy_sys_info {
+	/*
+	 * Record the cpu and queue 1:1 mapping
+	 */
+	u32				avail_rx_num;
+	struct netpolicy_sys_map	*rx;
+	u32				avail_tx_num;
+	struct netpolicy_sys_map	*tx;
+};
+
 struct netpolicy_info {
 	enum netpolicy_name	cur_policy;
 	unsigned long avail_policy[BITS_TO_LONGS(NET_POLICY_MAX)];
+	/* cpu and queue mapping information */
+	struct netpolicy_sys_info	sys_info;
 };
 
 #endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 075aaca..ff7fc04 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -54,6 +54,80 @@ static u32 netpolicy_get_cpu_information(void)
 	return num_online_cpus();
 }
 
+static void netpolicy_free_sys_map(struct net_device *dev)
+{
+	struct netpolicy_sys_info *s_info = &dev->netpolicy->sys_info;
+
+	kfree(s_info->rx);
+	s_info->rx = NULL;
+	s_info->avail_rx_num = 0;
+	kfree(s_info->tx);
+	s_info->tx = NULL;
+	s_info->avail_tx_num = 0;
+}
+
+static int netpolicy_update_sys_map(struct net_device *dev,
+				    struct netpolicy_dev_info *d_info,
+				    u32 cpu)
+{
+	struct netpolicy_sys_info *s_info = &dev->netpolicy->sys_info;
+	u32 num, i, online_cpu;
+	cpumask_var_t cpumask;
+
+	if (!alloc_cpumask_var(&cpumask, GFP_ATOMIC))
+		return -ENOMEM;
+
+	/* update rx cpu map */
+	if (cpu > d_info->rx_num)
+		num = d_info->rx_num;
+	else
+		num = cpu;
+
+	s_info->avail_rx_num = num;
+	s_info->rx = kcalloc(num, sizeof(*s_info->rx), GFP_ATOMIC);
+	if (!s_info->rx)
+		goto err;
+	cpumask_copy(cpumask, cpu_online_mask);
+
+	i = 0;
+	for_each_cpu(online_cpu, cpumask) {
+		if (i == num)
+			break;
+		s_info->rx[i].cpu = online_cpu;
+		s_info->rx[i].queue = i;
+		s_info->rx[i].irq = d_info->rx_irq[i];
+		i++;
+	}
+
+	/* update tx cpu map */
+	if (cpu >= d_info->tx_num)
+		num = d_info->tx_num;
+	else
+		num = cpu;
+
+	s_info->avail_tx_num = num;
+	s_info->tx = kcalloc(num, sizeof(*s_info->tx), GFP_ATOMIC);
+	if (!s_info->tx)
+		goto err;
+
+	i = 0;
+	for_each_cpu(online_cpu, cpumask) {
+		if (i == num)
+			break;
+		s_info->tx[i].cpu = online_cpu;
+		s_info->tx[i].queue = i;
+		s_info->tx[i].irq = d_info->tx_irq[i];
+		i++;
+	}
+
+	free_cpumask_var(cpumask);
+	return 0;
+err:
+	netpolicy_free_sys_map(dev);
+	free_cpumask_var(cpumask);
+	return -ENOMEM;
+}
+
 const char *policy_name[NET_POLICY_MAX] = {
 	"NONE"
 };
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [RFC PATCH 08/30] net/netpolicy: set and remove irq affinity
  2016-07-18  6:55 ` [Intel-wired-lan] " kan.liang
@ 2016-07-18  6:56   ` kan.liang
  -1 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: davem, linux-kernel, intel-wired-lan, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

This patches introduces functions to set and remove irq affinity
according to cpu and queue mapping.
The functions will not record the previous affinity status. After a
set/remove cycles, it will set the affinity on all online cpu with irq
balance enabling.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 net/core/netpolicy.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index ff7fc04..c44818d 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -29,6 +29,7 @@
 #include <linux/kernel.h>
 #include <linux/errno.h>
 #include <linux/init.h>
+#include <linux/irq.h>
 #include <linux/seq_file.h>
 #include <linux/proc_fs.h>
 #include <linux/uaccess.h>
@@ -128,6 +129,38 @@ err:
 	return -ENOMEM;
 }
 
+static void netpolicy_clear_affinity(struct net_device *dev)
+{
+	struct netpolicy_sys_info *s_info = &dev->netpolicy->sys_info;
+	u32 i;
+
+	for (i = 0; i < s_info->avail_rx_num; i++) {
+		irq_clear_status_flags(s_info->rx[i].irq, IRQ_NO_BALANCING);
+		irq_set_affinity_hint(s_info->rx[i].irq, cpu_online_mask);
+	}
+
+	for (i = 0; i < s_info->avail_tx_num; i++) {
+		irq_clear_status_flags(s_info->tx[i].irq, IRQ_NO_BALANCING);
+		irq_set_affinity_hint(s_info->tx[i].irq, cpu_online_mask);
+	}
+}
+
+static void netpolicy_set_affinity(struct net_device *dev)
+{
+	struct netpolicy_sys_info *s_info = &dev->netpolicy->sys_info;
+	u32 i;
+
+	for (i = 0; i < s_info->avail_rx_num; i++) {
+		irq_set_status_flags(s_info->rx[i].irq, IRQ_NO_BALANCING);
+		irq_set_affinity_hint(s_info->rx[i].irq, cpumask_of(s_info->rx[i].cpu));
+	}
+
+	for (i = 0; i < s_info->avail_tx_num; i++) {
+		irq_set_status_flags(s_info->tx[i].irq, IRQ_NO_BALANCING);
+		irq_set_affinity_hint(s_info->tx[i].irq, cpumask_of(s_info->tx[i].cpu));
+	}
+}
+
 const char *policy_name[NET_POLICY_MAX] = {
 	"NONE"
 };
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 08/30] net/netpolicy: set and remove irq affinity
@ 2016-07-18  6:56   ` kan.liang
  0 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: intel-wired-lan

From: Kan Liang <kan.liang@intel.com>

This patches introduces functions to set and remove irq affinity
according to cpu and queue mapping.
The functions will not record the previous affinity status. After a
set/remove cycles, it will set the affinity on all online cpu with irq
balance enabling.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 net/core/netpolicy.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index ff7fc04..c44818d 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -29,6 +29,7 @@
 #include <linux/kernel.h>
 #include <linux/errno.h>
 #include <linux/init.h>
+#include <linux/irq.h>
 #include <linux/seq_file.h>
 #include <linux/proc_fs.h>
 #include <linux/uaccess.h>
@@ -128,6 +129,38 @@ err:
 	return -ENOMEM;
 }
 
+static void netpolicy_clear_affinity(struct net_device *dev)
+{
+	struct netpolicy_sys_info *s_info = &dev->netpolicy->sys_info;
+	u32 i;
+
+	for (i = 0; i < s_info->avail_rx_num; i++) {
+		irq_clear_status_flags(s_info->rx[i].irq, IRQ_NO_BALANCING);
+		irq_set_affinity_hint(s_info->rx[i].irq, cpu_online_mask);
+	}
+
+	for (i = 0; i < s_info->avail_tx_num; i++) {
+		irq_clear_status_flags(s_info->tx[i].irq, IRQ_NO_BALANCING);
+		irq_set_affinity_hint(s_info->tx[i].irq, cpu_online_mask);
+	}
+}
+
+static void netpolicy_set_affinity(struct net_device *dev)
+{
+	struct netpolicy_sys_info *s_info = &dev->netpolicy->sys_info;
+	u32 i;
+
+	for (i = 0; i < s_info->avail_rx_num; i++) {
+		irq_set_status_flags(s_info->rx[i].irq, IRQ_NO_BALANCING);
+		irq_set_affinity_hint(s_info->rx[i].irq, cpumask_of(s_info->rx[i].cpu));
+	}
+
+	for (i = 0; i < s_info->avail_tx_num; i++) {
+		irq_set_status_flags(s_info->tx[i].irq, IRQ_NO_BALANCING);
+		irq_set_affinity_hint(s_info->tx[i].irq, cpumask_of(s_info->tx[i].cpu));
+	}
+}
+
 const char *policy_name[NET_POLICY_MAX] = {
 	"NONE"
 };
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [RFC PATCH 09/30] net/netpolicy: enable and disable net policy
  2016-07-18  6:55 ` [Intel-wired-lan] " kan.liang
@ 2016-07-18  6:56   ` kan.liang
  -1 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: davem, linux-kernel, intel-wired-lan, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

This patch introduces functions to enable and disable net policy.
For enabling, it collects device and cpu information, setup cpu/queue
mapping, and set irq affinity accordingly.
For disabling, it removes the irq affinity and mapping information.
np_lock should protect the enable and disable state. It will be done
later in this series.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 net/core/netpolicy.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index c44818d..7d4a49d 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -161,6 +161,45 @@ static void netpolicy_set_affinity(struct net_device *dev)
 	}
 }
 
+static int netpolicy_disable(struct net_device *dev)
+{
+	netpolicy_clear_affinity(dev);
+	netpolicy_free_sys_map(dev);
+
+	return 0;
+}
+
+static int netpolicy_enable(struct net_device *dev)
+{
+	int ret;
+	struct netpolicy_dev_info d_info;
+	u32 cpu;
+
+	if (WARN_ON(!dev->netpolicy))
+		return -EINVAL;
+
+	/* get driver information */
+	ret = netpolicy_get_dev_info(dev, &d_info);
+	if (ret)
+		return ret;
+
+	/* get cpu information */
+	cpu = netpolicy_get_cpu_information();
+
+	/* create sys map */
+	ret = netpolicy_update_sys_map(dev, &d_info, cpu);
+	if (ret) {
+		netpolicy_free_dev_info(&d_info);
+		return ret;
+	}
+
+	/* set irq affinity */
+	netpolicy_set_affinity(dev);
+
+	netpolicy_free_dev_info(&d_info);
+	return 0;
+}
+
 const char *policy_name[NET_POLICY_MAX] = {
 	"NONE"
 };
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 09/30] net/netpolicy: enable and disable net policy
@ 2016-07-18  6:56   ` kan.liang
  0 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: intel-wired-lan

From: Kan Liang <kan.liang@intel.com>

This patch introduces functions to enable and disable net policy.
For enabling, it collects device and cpu information, setup cpu/queue
mapping, and set irq affinity accordingly.
For disabling, it removes the irq affinity and mapping information.
np_lock should protect the enable and disable state. It will be done
later in this series.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 net/core/netpolicy.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index c44818d..7d4a49d 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -161,6 +161,45 @@ static void netpolicy_set_affinity(struct net_device *dev)
 	}
 }
 
+static int netpolicy_disable(struct net_device *dev)
+{
+	netpolicy_clear_affinity(dev);
+	netpolicy_free_sys_map(dev);
+
+	return 0;
+}
+
+static int netpolicy_enable(struct net_device *dev)
+{
+	int ret;
+	struct netpolicy_dev_info d_info;
+	u32 cpu;
+
+	if (WARN_ON(!dev->netpolicy))
+		return -EINVAL;
+
+	/* get driver information */
+	ret = netpolicy_get_dev_info(dev, &d_info);
+	if (ret)
+		return ret;
+
+	/* get cpu information */
+	cpu = netpolicy_get_cpu_information();
+
+	/* create sys map */
+	ret = netpolicy_update_sys_map(dev, &d_info, cpu);
+	if (ret) {
+		netpolicy_free_dev_info(&d_info);
+		return ret;
+	}
+
+	/* set irq affinity */
+	netpolicy_set_affinity(dev);
+
+	netpolicy_free_dev_info(&d_info);
+	return 0;
+}
+
 const char *policy_name[NET_POLICY_MAX] = {
 	"NONE"
 };
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [RFC PATCH 10/30] net/netpolicy: introduce netpolicy object
  2016-07-18  6:55 ` [Intel-wired-lan] " kan.liang
@ 2016-07-18  6:56   ` kan.liang
  -1 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: davem, linux-kernel, intel-wired-lan, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

This patch introduces the concept of netpolicy object and policy object
list.

The netpolicy object is the instance of CPU/queue mapping. The object
can be shared between different tasks/sockets. So besides CPU and queue
information, the object also maintains a reference counter.

Each policy will have a dedicated object list. If the policy is set as
device policy, all objects will be inserted into the related policy
object list. The user will search and pickup the available objects from
the list later.

The network performance for objects could be different because of the
queue and CPU topology. To generate a proper object list, dev location,
HT and CPU topology have to be considered. The high performance objects
are in the front of the list.

The object lists will be regenerated if sys mapping changes or device
net policy changes.

Lock np_ob_list_lock is used to protect the object list.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netdevice.h |   2 +
 include/linux/netpolicy.h |  15 +++
 net/core/netpolicy.c      | 237 +++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 253 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3470943..e60c30f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1600,6 +1600,7 @@ enum netdev_priv_flags {
  *	@proc_dev:	device node in proc to configure device net policy
  *	@netpolicy:	NET policy related information of net device
  *	@np_lock:	protect the state of NET policy
+ *	@np_ob_list_lock:	protect the net policy object list
  *
  *	FIXME: cleanup struct net_device such that network protocol info
  *	moves out.
@@ -1874,6 +1875,7 @@ struct net_device {
 #endif /* CONFIG_PROC_FS */
 	struct netpolicy_info	*netpolicy;
 	spinlock_t		np_lock;
+	spinlock_t		np_ob_list_lock;
 #endif /* CONFIG_NETPOLICY */
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index a946b75c..73a5fa6 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -21,6 +21,12 @@ enum netpolicy_name {
 	NET_POLICY_MAX,
 };
 
+enum netpolicy_traffic {
+	NETPOLICY_RX		= 0,
+	NETPOLICY_TX,
+	NETPOLICY_RXTX,
+};
+
 extern const char *policy_name[];
 
 struct netpolicy_dev_info {
@@ -46,11 +52,20 @@ struct netpolicy_sys_info {
 	struct netpolicy_sys_map	*tx;
 };
 
+struct netpolicy_object {
+	struct list_head	list;
+	u32			cpu;
+	u32			queue;
+	atomic_t		refcnt;
+};
+
 struct netpolicy_info {
 	enum netpolicy_name	cur_policy;
 	unsigned long avail_policy[BITS_TO_LONGS(NET_POLICY_MAX)];
 	/* cpu and queue mapping information */
 	struct netpolicy_sys_info	sys_info;
+	/* List of policy objects 0 rx 1 tx */
+	struct list_head		obj_list[NETPOLICY_RXTX][NET_POLICY_MAX];
 };
 
 #endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 7d4a49d..0f8ff16 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -35,6 +35,7 @@
 #include <linux/uaccess.h>
 #include <linux/netdevice.h>
 #include <net/net_namespace.h>
+#include <linux/sort.h>
 
 static int netpolicy_get_dev_info(struct net_device *dev,
 				  struct netpolicy_dev_info *d_info)
@@ -161,10 +162,30 @@ static void netpolicy_set_affinity(struct net_device *dev)
 	}
 }
 
+static void netpolicy_free_obj_list(struct net_device *dev)
+{
+	int i, j;
+	struct netpolicy_object *obj, *tmp;
+
+	spin_lock(&dev->np_ob_list_lock);
+	for (i = 0; i < NETPOLICY_RXTX; i++) {
+		for (j = NET_POLICY_NONE; j < NET_POLICY_MAX; j++) {
+			if (list_empty(&dev->netpolicy->obj_list[i][j]))
+				continue;
+			list_for_each_entry_safe(obj, tmp, &dev->netpolicy->obj_list[i][j], list) {
+				list_del(&obj->list);
+				kfree(obj);
+			}
+		}
+	}
+	spin_unlock(&dev->np_ob_list_lock);
+}
+
 static int netpolicy_disable(struct net_device *dev)
 {
 	netpolicy_clear_affinity(dev);
 	netpolicy_free_sys_map(dev);
+	netpolicy_free_obj_list(dev);
 
 	return 0;
 }
@@ -203,6 +224,212 @@ static int netpolicy_enable(struct net_device *dev)
 const char *policy_name[NET_POLICY_MAX] = {
 	"NONE"
 };
+
+static u32 cpu_to_queue(struct net_device *dev,
+			u32 cpu, bool is_rx)
+{
+	struct netpolicy_sys_info *s_info = &dev->netpolicy->sys_info;
+	int i;
+
+	if (is_rx) {
+		for (i = 0; i < s_info->avail_rx_num; i++) {
+			if (s_info->rx[i].cpu == cpu)
+				return s_info->rx[i].queue;
+		}
+	} else {
+		for (i = 0; i < s_info->avail_tx_num; i++) {
+			if (s_info->tx[i].cpu == cpu)
+				return s_info->tx[i].queue;
+		}
+	}
+
+	return ~0;
+}
+
+static int netpolicy_add_obj(struct net_device *dev,
+			     u32 cpu, bool is_rx,
+			     enum netpolicy_name policy)
+{
+	struct netpolicy_object *obj;
+	int dir = is_rx ? NETPOLICY_RX : NETPOLICY_TX;
+
+	obj = kzalloc(sizeof(*obj), GFP_ATOMIC);
+	if (!obj)
+		return -ENOMEM;
+	obj->cpu = cpu;
+	obj->queue = cpu_to_queue(dev, cpu, is_rx);
+	list_add_tail(&obj->list, &dev->netpolicy->obj_list[dir][policy]);
+
+	return 0;
+}
+
+struct sort_node {
+	int	node;
+	int	distance;
+};
+
+static inline int node_distance_cmp(const void *a, const void *b)
+{
+	const struct sort_node *_a = a;
+	const struct sort_node *_b = b;
+
+	return _a->distance - _b->distance;
+}
+
+static int _netpolicy_gen_obj_list(struct net_device *dev, bool is_rx,
+				   enum netpolicy_name policy,
+				   struct sort_node *nodes, int num_node,
+				   struct cpumask *node_avail_cpumask)
+{
+	cpumask_var_t node_tmp_cpumask, sibling_tmp_cpumask;
+	struct cpumask *node_assigned_cpumask;
+	int i, ret = -ENOMEM;
+	u32 cpu;
+
+	if (!alloc_cpumask_var(&node_tmp_cpumask, GFP_ATOMIC))
+		return ret;
+	if (!alloc_cpumask_var(&sibling_tmp_cpumask, GFP_ATOMIC))
+		goto alloc_fail1;
+
+	node_assigned_cpumask = kcalloc(num_node, sizeof(struct cpumask), GFP_ATOMIC);
+	if (!node_assigned_cpumask)
+		goto alloc_fail2;
+
+	/* Don't share physical core */
+	for (i = 0; i < num_node; i++) {
+		if (cpumask_weight(&node_avail_cpumask[nodes[i].node]) == 0)
+			continue;
+		spin_lock(&dev->np_ob_list_lock);
+		cpumask_copy(node_tmp_cpumask, &node_avail_cpumask[nodes[i].node]);
+		while (cpumask_weight(node_tmp_cpumask)) {
+			cpu = cpumask_first(node_tmp_cpumask);
+
+			/* push to obj list */
+			ret = netpolicy_add_obj(dev, cpu, is_rx, policy);
+			if (ret) {
+				spin_unlock(&dev->np_ob_list_lock);
+				goto err;
+			}
+
+			cpumask_set_cpu(cpu, &node_assigned_cpumask[nodes[i].node]);
+			cpumask_and(sibling_tmp_cpumask, node_tmp_cpumask, topology_sibling_cpumask(cpu));
+			cpumask_xor(node_tmp_cpumask, node_tmp_cpumask, sibling_tmp_cpumask);
+		}
+		spin_unlock(&dev->np_ob_list_lock);
+	}
+
+	for (i = 0; i < num_node; i++) {
+		cpumask_xor(node_tmp_cpumask, &node_avail_cpumask[nodes[i].node], &node_assigned_cpumask[nodes[i].node]);
+		if (cpumask_weight(node_tmp_cpumask) == 0)
+			continue;
+		spin_lock(&dev->np_ob_list_lock);
+		for_each_cpu(cpu, node_tmp_cpumask) {
+			/* push to obj list */
+			ret = netpolicy_add_obj(dev, cpu, is_rx, policy);
+			if (ret) {
+				spin_unlock(&dev->np_ob_list_lock);
+				goto err;
+			}
+			cpumask_set_cpu(cpu, &node_assigned_cpumask[nodes[i].node]);
+		}
+		spin_unlock(&dev->np_ob_list_lock);
+	}
+
+err:
+	kfree(node_assigned_cpumask);
+alloc_fail2:
+	free_cpumask_var(sibling_tmp_cpumask);
+alloc_fail1:
+	free_cpumask_var(node_tmp_cpumask);
+
+	return ret;
+}
+
+static int netpolicy_gen_obj_list(struct net_device *dev,
+				  enum netpolicy_name policy)
+{
+	struct netpolicy_sys_info *s_info = &dev->netpolicy->sys_info;
+	struct cpumask *node_avail_cpumask;
+	int dev_node = 0, num_nodes = 1;
+	struct sort_node *nodes;
+	int i, ret, node = 0;
+	u32 cpu;
+#ifdef CONFIG_NUMA
+	int val;
+#endif
+	/* The network performance for objects could be different
+	 * because of the queue and cpu topology.
+	 * The objects will be ordered accordingly,
+	 * and put high performance object in the front.
+	 *
+	 * The priority rules as below,
+	 * - The local object. (Local means cpu and queue are in the same node.)
+	 * - The cpu in the object is the only logical core in physical core.
+	 *   The sibiling core's object has not been added in the object list yet.
+	 * - The rest of objects
+	 *
+	 * So the order of object list is as below:
+	 * 1. Local core + the only logical core
+	 * 2. Remote core + the only logical core
+	 * 3. Local core + the core's sibling is already in the object list
+	 * 4. Remote core + the core's sibling is already in the object list
+	 */
+#ifdef CONFIG_NUMA
+	dev_node = dev_to_node(dev->dev.parent);
+	num_nodes = num_online_nodes();
+#endif
+
+	nodes = kcalloc(num_nodes, sizeof(*nodes), GFP_ATOMIC);
+	if (!nodes)
+		return -ENOMEM;
+
+	node_avail_cpumask = kcalloc(num_nodes, sizeof(struct cpumask), GFP_ATOMIC);
+	if (!node_avail_cpumask) {
+		kfree(nodes);
+		return -ENOMEM;
+	}
+
+#ifdef CONFIG_NUMA
+	/* order the node from near to far */
+	for_each_node_mask(i, node_online_map) {
+		val = node_distance(dev_node, i);
+		nodes[node].node = i;
+		nodes[node].distance = val;
+		node++;
+	}
+	sort(nodes, num_nodes, sizeof(*nodes),
+	     node_distance_cmp, NULL);
+#else
+	nodes[0].node = 0;
+#endif
+
+	for (i = 0; i < s_info->avail_rx_num; i++) {
+		cpu = s_info->rx[i].cpu;
+		cpumask_set_cpu(cpu, &node_avail_cpumask[cpu_to_node(cpu)]);
+	}
+	ret = _netpolicy_gen_obj_list(dev, true, policy, nodes,
+				      node, node_avail_cpumask);
+	if (ret)
+		goto err;
+
+	for (i = 0; i < node; i++)
+		cpumask_clear(&node_avail_cpumask[nodes[i].node]);
+
+	for (i = 0; i < s_info->avail_tx_num; i++) {
+		cpu = s_info->tx[i].cpu;
+		cpumask_set_cpu(cpu, &node_avail_cpumask[cpu_to_node(cpu)]);
+	}
+	ret = _netpolicy_gen_obj_list(dev, false, policy, nodes,
+				      node, node_avail_cpumask);
+	if (ret)
+		goto err;
+
+err:
+	kfree(nodes);
+	kfree(node_avail_cpumask);
+	return ret;
+}
+
 #ifdef CONFIG_PROC_FS
 
 static int net_policy_proc_show(struct seq_file *m, void *v)
@@ -258,7 +485,7 @@ static int netpolicy_proc_dev_init(struct net *net, struct net_device *dev)
 
 int init_netpolicy(struct net_device *dev)
 {
-	int ret;
+	int ret, i, j;
 
 	spin_lock(&dev->np_lock);
 	ret = 0;
@@ -281,7 +508,15 @@ int init_netpolicy(struct net_device *dev)
 	if (ret) {
 		kfree(dev->netpolicy);
 		dev->netpolicy = NULL;
+		goto unlock;
+	}
+
+	spin_lock(&dev->np_ob_list_lock);
+	for (i = 0; i < NETPOLICY_RXTX; i++) {
+		for (j = NET_POLICY_NONE; j < NET_POLICY_MAX; j++)
+			INIT_LIST_HEAD(&dev->netpolicy->obj_list[i][j]);
 	}
+	spin_unlock(&dev->np_ob_list_lock);
 
 unlock:
 	spin_unlock(&dev->np_lock);
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 10/30] net/netpolicy: introduce netpolicy object
@ 2016-07-18  6:56   ` kan.liang
  0 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: intel-wired-lan

From: Kan Liang <kan.liang@intel.com>

This patch introduces the concept of netpolicy object and policy object
list.

The netpolicy object is the instance of CPU/queue mapping. The object
can be shared between different tasks/sockets. So besides CPU and queue
information, the object also maintains a reference counter.

Each policy will have a dedicated object list. If the policy is set as
device policy, all objects will be inserted into the related policy
object list. The user will search and pickup the available objects from
the list later.

The network performance for objects could be different because of the
queue and CPU topology. To generate a proper object list, dev location,
HT and CPU topology have to be considered. The high performance objects
are in the front of the list.

The object lists will be regenerated if sys mapping changes or device
net policy changes.

Lock np_ob_list_lock is used to protect the object list.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netdevice.h |   2 +
 include/linux/netpolicy.h |  15 +++
 net/core/netpolicy.c      | 237 +++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 253 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3470943..e60c30f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1600,6 +1600,7 @@ enum netdev_priv_flags {
  *	@proc_dev:	device node in proc to configure device net policy
  *	@netpolicy:	NET policy related information of net device
  *	@np_lock:	protect the state of NET policy
+ *	@np_ob_list_lock:	protect the net policy object list
  *
  *	FIXME: cleanup struct net_device such that network protocol info
  *	moves out.
@@ -1874,6 +1875,7 @@ struct net_device {
 #endif /* CONFIG_PROC_FS */
 	struct netpolicy_info	*netpolicy;
 	spinlock_t		np_lock;
+	spinlock_t		np_ob_list_lock;
 #endif /* CONFIG_NETPOLICY */
 };
 #define to_net_dev(d) container_of(d, struct net_device, dev)
diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index a946b75c..73a5fa6 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -21,6 +21,12 @@ enum netpolicy_name {
 	NET_POLICY_MAX,
 };
 
+enum netpolicy_traffic {
+	NETPOLICY_RX		= 0,
+	NETPOLICY_TX,
+	NETPOLICY_RXTX,
+};
+
 extern const char *policy_name[];
 
 struct netpolicy_dev_info {
@@ -46,11 +52,20 @@ struct netpolicy_sys_info {
 	struct netpolicy_sys_map	*tx;
 };
 
+struct netpolicy_object {
+	struct list_head	list;
+	u32			cpu;
+	u32			queue;
+	atomic_t		refcnt;
+};
+
 struct netpolicy_info {
 	enum netpolicy_name	cur_policy;
 	unsigned long avail_policy[BITS_TO_LONGS(NET_POLICY_MAX)];
 	/* cpu and queue mapping information */
 	struct netpolicy_sys_info	sys_info;
+	/* List of policy objects 0 rx 1 tx */
+	struct list_head		obj_list[NETPOLICY_RXTX][NET_POLICY_MAX];
 };
 
 #endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 7d4a49d..0f8ff16 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -35,6 +35,7 @@
 #include <linux/uaccess.h>
 #include <linux/netdevice.h>
 #include <net/net_namespace.h>
+#include <linux/sort.h>
 
 static int netpolicy_get_dev_info(struct net_device *dev,
 				  struct netpolicy_dev_info *d_info)
@@ -161,10 +162,30 @@ static void netpolicy_set_affinity(struct net_device *dev)
 	}
 }
 
+static void netpolicy_free_obj_list(struct net_device *dev)
+{
+	int i, j;
+	struct netpolicy_object *obj, *tmp;
+
+	spin_lock(&dev->np_ob_list_lock);
+	for (i = 0; i < NETPOLICY_RXTX; i++) {
+		for (j = NET_POLICY_NONE; j < NET_POLICY_MAX; j++) {
+			if (list_empty(&dev->netpolicy->obj_list[i][j]))
+				continue;
+			list_for_each_entry_safe(obj, tmp, &dev->netpolicy->obj_list[i][j], list) {
+				list_del(&obj->list);
+				kfree(obj);
+			}
+		}
+	}
+	spin_unlock(&dev->np_ob_list_lock);
+}
+
 static int netpolicy_disable(struct net_device *dev)
 {
 	netpolicy_clear_affinity(dev);
 	netpolicy_free_sys_map(dev);
+	netpolicy_free_obj_list(dev);
 
 	return 0;
 }
@@ -203,6 +224,212 @@ static int netpolicy_enable(struct net_device *dev)
 const char *policy_name[NET_POLICY_MAX] = {
 	"NONE"
 };
+
+static u32 cpu_to_queue(struct net_device *dev,
+			u32 cpu, bool is_rx)
+{
+	struct netpolicy_sys_info *s_info = &dev->netpolicy->sys_info;
+	int i;
+
+	if (is_rx) {
+		for (i = 0; i < s_info->avail_rx_num; i++) {
+			if (s_info->rx[i].cpu == cpu)
+				return s_info->rx[i].queue;
+		}
+	} else {
+		for (i = 0; i < s_info->avail_tx_num; i++) {
+			if (s_info->tx[i].cpu == cpu)
+				return s_info->tx[i].queue;
+		}
+	}
+
+	return ~0;
+}
+
+static int netpolicy_add_obj(struct net_device *dev,
+			     u32 cpu, bool is_rx,
+			     enum netpolicy_name policy)
+{
+	struct netpolicy_object *obj;
+	int dir = is_rx ? NETPOLICY_RX : NETPOLICY_TX;
+
+	obj = kzalloc(sizeof(*obj), GFP_ATOMIC);
+	if (!obj)
+		return -ENOMEM;
+	obj->cpu = cpu;
+	obj->queue = cpu_to_queue(dev, cpu, is_rx);
+	list_add_tail(&obj->list, &dev->netpolicy->obj_list[dir][policy]);
+
+	return 0;
+}
+
+struct sort_node {
+	int	node;
+	int	distance;
+};
+
+static inline int node_distance_cmp(const void *a, const void *b)
+{
+	const struct sort_node *_a = a;
+	const struct sort_node *_b = b;
+
+	return _a->distance - _b->distance;
+}
+
+static int _netpolicy_gen_obj_list(struct net_device *dev, bool is_rx,
+				   enum netpolicy_name policy,
+				   struct sort_node *nodes, int num_node,
+				   struct cpumask *node_avail_cpumask)
+{
+	cpumask_var_t node_tmp_cpumask, sibling_tmp_cpumask;
+	struct cpumask *node_assigned_cpumask;
+	int i, ret = -ENOMEM;
+	u32 cpu;
+
+	if (!alloc_cpumask_var(&node_tmp_cpumask, GFP_ATOMIC))
+		return ret;
+	if (!alloc_cpumask_var(&sibling_tmp_cpumask, GFP_ATOMIC))
+		goto alloc_fail1;
+
+	node_assigned_cpumask = kcalloc(num_node, sizeof(struct cpumask), GFP_ATOMIC);
+	if (!node_assigned_cpumask)
+		goto alloc_fail2;
+
+	/* Don't share physical core */
+	for (i = 0; i < num_node; i++) {
+		if (cpumask_weight(&node_avail_cpumask[nodes[i].node]) == 0)
+			continue;
+		spin_lock(&dev->np_ob_list_lock);
+		cpumask_copy(node_tmp_cpumask, &node_avail_cpumask[nodes[i].node]);
+		while (cpumask_weight(node_tmp_cpumask)) {
+			cpu = cpumask_first(node_tmp_cpumask);
+
+			/* push to obj list */
+			ret = netpolicy_add_obj(dev, cpu, is_rx, policy);
+			if (ret) {
+				spin_unlock(&dev->np_ob_list_lock);
+				goto err;
+			}
+
+			cpumask_set_cpu(cpu, &node_assigned_cpumask[nodes[i].node]);
+			cpumask_and(sibling_tmp_cpumask, node_tmp_cpumask, topology_sibling_cpumask(cpu));
+			cpumask_xor(node_tmp_cpumask, node_tmp_cpumask, sibling_tmp_cpumask);
+		}
+		spin_unlock(&dev->np_ob_list_lock);
+	}
+
+	for (i = 0; i < num_node; i++) {
+		cpumask_xor(node_tmp_cpumask, &node_avail_cpumask[nodes[i].node], &node_assigned_cpumask[nodes[i].node]);
+		if (cpumask_weight(node_tmp_cpumask) == 0)
+			continue;
+		spin_lock(&dev->np_ob_list_lock);
+		for_each_cpu(cpu, node_tmp_cpumask) {
+			/* push to obj list */
+			ret = netpolicy_add_obj(dev, cpu, is_rx, policy);
+			if (ret) {
+				spin_unlock(&dev->np_ob_list_lock);
+				goto err;
+			}
+			cpumask_set_cpu(cpu, &node_assigned_cpumask[nodes[i].node]);
+		}
+		spin_unlock(&dev->np_ob_list_lock);
+	}
+
+err:
+	kfree(node_assigned_cpumask);
+alloc_fail2:
+	free_cpumask_var(sibling_tmp_cpumask);
+alloc_fail1:
+	free_cpumask_var(node_tmp_cpumask);
+
+	return ret;
+}
+
+static int netpolicy_gen_obj_list(struct net_device *dev,
+				  enum netpolicy_name policy)
+{
+	struct netpolicy_sys_info *s_info = &dev->netpolicy->sys_info;
+	struct cpumask *node_avail_cpumask;
+	int dev_node = 0, num_nodes = 1;
+	struct sort_node *nodes;
+	int i, ret, node = 0;
+	u32 cpu;
+#ifdef CONFIG_NUMA
+	int val;
+#endif
+	/* The network performance for objects could be different
+	 * because of the queue and cpu topology.
+	 * The objects will be ordered accordingly,
+	 * and put high performance object in the front.
+	 *
+	 * The priority rules as below,
+	 * - The local object. (Local means cpu and queue are in the same node.)
+	 * - The cpu in the object is the only logical core in physical core.
+	 *   The sibiling core's object has not been added in the object list yet.
+	 * - The rest of objects
+	 *
+	 * So the order of object list is as below:
+	 * 1. Local core + the only logical core
+	 * 2. Remote core + the only logical core
+	 * 3. Local core + the core's sibling is already in the object list
+	 * 4. Remote core + the core's sibling is already in the object list
+	 */
+#ifdef CONFIG_NUMA
+	dev_node = dev_to_node(dev->dev.parent);
+	num_nodes = num_online_nodes();
+#endif
+
+	nodes = kcalloc(num_nodes, sizeof(*nodes), GFP_ATOMIC);
+	if (!nodes)
+		return -ENOMEM;
+
+	node_avail_cpumask = kcalloc(num_nodes, sizeof(struct cpumask), GFP_ATOMIC);
+	if (!node_avail_cpumask) {
+		kfree(nodes);
+		return -ENOMEM;
+	}
+
+#ifdef CONFIG_NUMA
+	/* order the node from near to far */
+	for_each_node_mask(i, node_online_map) {
+		val = node_distance(dev_node, i);
+		nodes[node].node = i;
+		nodes[node].distance = val;
+		node++;
+	}
+	sort(nodes, num_nodes, sizeof(*nodes),
+	     node_distance_cmp, NULL);
+#else
+	nodes[0].node = 0;
+#endif
+
+	for (i = 0; i < s_info->avail_rx_num; i++) {
+		cpu = s_info->rx[i].cpu;
+		cpumask_set_cpu(cpu, &node_avail_cpumask[cpu_to_node(cpu)]);
+	}
+	ret = _netpolicy_gen_obj_list(dev, true, policy, nodes,
+				      node, node_avail_cpumask);
+	if (ret)
+		goto err;
+
+	for (i = 0; i < node; i++)
+		cpumask_clear(&node_avail_cpumask[nodes[i].node]);
+
+	for (i = 0; i < s_info->avail_tx_num; i++) {
+		cpu = s_info->tx[i].cpu;
+		cpumask_set_cpu(cpu, &node_avail_cpumask[cpu_to_node(cpu)]);
+	}
+	ret = _netpolicy_gen_obj_list(dev, false, policy, nodes,
+				      node, node_avail_cpumask);
+	if (ret)
+		goto err;
+
+err:
+	kfree(nodes);
+	kfree(node_avail_cpumask);
+	return ret;
+}
+
 #ifdef CONFIG_PROC_FS
 
 static int net_policy_proc_show(struct seq_file *m, void *v)
@@ -258,7 +485,7 @@ static int netpolicy_proc_dev_init(struct net *net, struct net_device *dev)
 
 int init_netpolicy(struct net_device *dev)
 {
-	int ret;
+	int ret, i, j;
 
 	spin_lock(&dev->np_lock);
 	ret = 0;
@@ -281,7 +508,15 @@ int init_netpolicy(struct net_device *dev)
 	if (ret) {
 		kfree(dev->netpolicy);
 		dev->netpolicy = NULL;
+		goto unlock;
+	}
+
+	spin_lock(&dev->np_ob_list_lock);
+	for (i = 0; i < NETPOLICY_RXTX; i++) {
+		for (j = NET_POLICY_NONE; j < NET_POLICY_MAX; j++)
+			INIT_LIST_HEAD(&dev->netpolicy->obj_list[i][j]);
 	}
+	spin_unlock(&dev->np_ob_list_lock);
 
 unlock:
 	spin_unlock(&dev->np_lock);
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [RFC PATCH 11/30] net/netpolicy: set net policy by policy name
  2016-07-18  6:55 ` [Intel-wired-lan] " kan.liang
@ 2016-07-18  6:56   ` kan.liang
  -1 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: davem, linux-kernel, intel-wired-lan, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

User can write policy name to /proc/net/netpolicy/$DEV/policy to enable
net policy for specific device.

When the policy is enabled, the module automatically disables irq
balance and set irq affinity. The object list is also generated
accordingly.

np_lock will be used to protect the state.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netdevice.h |  5 +++
 include/linux/netpolicy.h |  1 +
 net/core/netpolicy.c      | 95 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 101 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e60c30f..45cb589 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1094,6 +1094,9 @@ struct tc_to_netdev {
  * int (*ndo_get_irq_info)(struct net_device *dev,
  * 			   struct netpolicy_dev_info *info);
  * 	This function is used to get irq information of rx and tx queues
+ * int (*ndo_set_net_policy)(struct net_device *dev,
+ * 			     enum netpolicy_name name);
+ * 	This function is used to set global net policy by name
  *
  */
 struct net_device_ops {
@@ -1283,6 +1286,8 @@ struct net_device_ops {
 						      struct netpolicy_info *info);
 	int			(*ndo_get_irq_info)(struct net_device *dev,
 						    struct netpolicy_dev_info *info);
+	int			(*ndo_set_net_policy)(struct net_device *dev,
+						      enum netpolicy_name name);
 #endif /* CONFIG_NETPOLICY */
 };
 
diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index 73a5fa6..b1d9277 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -27,6 +27,7 @@ enum netpolicy_traffic {
 	NETPOLICY_RXTX,
 };
 
+#define POLICY_NAME_LEN_MAX	64
 extern const char *policy_name[];
 
 struct netpolicy_dev_info {
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 0f8ff16..8112839 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -36,6 +36,7 @@
 #include <linux/netdevice.h>
 #include <net/net_namespace.h>
 #include <linux/sort.h>
+#include <linux/ctype.h>
 
 static int netpolicy_get_dev_info(struct net_device *dev,
 				  struct netpolicy_dev_info *d_info)
@@ -430,6 +431,69 @@ err:
 	return ret;
 }
 
+static int net_policy_set_by_name(char *name, struct net_device *dev)
+{
+	int i, ret;
+
+	spin_lock(&dev->np_lock);
+	ret = 0;
+
+	if (!dev->netpolicy ||
+	    !dev->netdev_ops->ndo_set_net_policy) {
+		ret = -ENOTSUPP;
+		goto unlock;
+	}
+
+	for (i = 0; i < NET_POLICY_MAX; i++) {
+		if (!strncmp(name, policy_name[i], strlen(policy_name[i])))
+		break;
+	}
+
+	if (!test_bit(i, dev->netpolicy->avail_policy)) {
+		ret = -ENOTSUPP;
+		goto unlock;
+	}
+
+	if (i == dev->netpolicy->cur_policy)
+		goto unlock;
+
+	/* If there is no policy applied yet, need to do enable first . */
+	if (dev->netpolicy->cur_policy == NET_POLICY_NONE) {
+		ret = netpolicy_enable(dev);
+		if (ret)
+			goto unlock;
+	}
+
+	netpolicy_free_obj_list(dev);
+
+	/* Generate object list according to policy name */
+	ret = netpolicy_gen_obj_list(dev, i);
+	if (ret)
+		goto err;
+
+	/* set policy */
+	ret = dev->netdev_ops->ndo_set_net_policy(dev, i);
+	if (ret)
+		goto err;
+
+	/* If removing policy, need to do disable. */
+	if (i == NET_POLICY_NONE)
+		netpolicy_disable(dev);
+
+	dev->netpolicy->cur_policy = i;
+
+	spin_unlock(&dev->np_lock);
+	return 0;
+
+err:
+	netpolicy_free_obj_list(dev);
+	if (dev->netpolicy->cur_policy == NET_POLICY_NONE)
+		netpolicy_disable(dev);
+unlock:
+	spin_unlock(&dev->np_lock);
+	return ret;
+}
+
 #ifdef CONFIG_PROC_FS
 
 static int net_policy_proc_show(struct seq_file *m, void *v)
@@ -459,11 +523,40 @@ static int net_policy_proc_open(struct inode *inode, struct file *file)
 	return single_open(file, net_policy_proc_show, PDE_DATA(inode));
 }
 
+static ssize_t net_policy_proc_write(struct file *file, const char __user *buf,
+				     size_t count, loff_t *pos)
+{
+	struct seq_file *m = file->private_data;
+	struct net_device *dev = (struct net_device *)m->private;
+	char name[POLICY_NAME_LEN_MAX];
+	int i, ret;
+
+	if (!dev->netpolicy)
+		return -ENOTSUPP;
+
+	if (count > POLICY_NAME_LEN_MAX)
+		return -EINVAL;
+
+	if (copy_from_user(name, buf, count))
+		return -EINVAL;
+
+	for (i = 0; i < count - 1; i++)
+		name[i] = toupper(name[i]);
+	name[POLICY_NAME_LEN_MAX - 1] = 0;
+
+	ret = net_policy_set_by_name(name, dev);
+	if (ret)
+		return ret;
+
+	return count;
+}
+
 static const struct file_operations proc_net_policy_operations = {
 	.open		= net_policy_proc_open,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
 	.release	= seq_release,
+	.write		= net_policy_proc_write,
 	.owner		= THIS_MODULE,
 };
 
@@ -527,6 +620,8 @@ void uninit_netpolicy(struct net_device *dev)
 {
 	spin_lock(&dev->np_lock);
 	if (dev->netpolicy) {
+		if (dev->netpolicy->cur_policy > NET_POLICY_NONE)
+			netpolicy_disable(dev);
 		kfree(dev->netpolicy);
 		dev->netpolicy = NULL;
 	}
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 11/30] net/netpolicy: set net policy by policy name
@ 2016-07-18  6:56   ` kan.liang
  0 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: intel-wired-lan

From: Kan Liang <kan.liang@intel.com>

User can write policy name to /proc/net/netpolicy/$DEV/policy to enable
net policy for specific device.

When the policy is enabled, the module automatically disables irq
balance and set irq affinity. The object list is also generated
accordingly.

np_lock will be used to protect the state.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netdevice.h |  5 +++
 include/linux/netpolicy.h |  1 +
 net/core/netpolicy.c      | 95 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 101 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e60c30f..45cb589 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1094,6 +1094,9 @@ struct tc_to_netdev {
  * int (*ndo_get_irq_info)(struct net_device *dev,
  * 			   struct netpolicy_dev_info *info);
  * 	This function is used to get irq information of rx and tx queues
+ * int (*ndo_set_net_policy)(struct net_device *dev,
+ * 			     enum netpolicy_name name);
+ * 	This function is used to set global net policy by name
  *
  */
 struct net_device_ops {
@@ -1283,6 +1286,8 @@ struct net_device_ops {
 						      struct netpolicy_info *info);
 	int			(*ndo_get_irq_info)(struct net_device *dev,
 						    struct netpolicy_dev_info *info);
+	int			(*ndo_set_net_policy)(struct net_device *dev,
+						      enum netpolicy_name name);
 #endif /* CONFIG_NETPOLICY */
 };
 
diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index 73a5fa6..b1d9277 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -27,6 +27,7 @@ enum netpolicy_traffic {
 	NETPOLICY_RXTX,
 };
 
+#define POLICY_NAME_LEN_MAX	64
 extern const char *policy_name[];
 
 struct netpolicy_dev_info {
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 0f8ff16..8112839 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -36,6 +36,7 @@
 #include <linux/netdevice.h>
 #include <net/net_namespace.h>
 #include <linux/sort.h>
+#include <linux/ctype.h>
 
 static int netpolicy_get_dev_info(struct net_device *dev,
 				  struct netpolicy_dev_info *d_info)
@@ -430,6 +431,69 @@ err:
 	return ret;
 }
 
+static int net_policy_set_by_name(char *name, struct net_device *dev)
+{
+	int i, ret;
+
+	spin_lock(&dev->np_lock);
+	ret = 0;
+
+	if (!dev->netpolicy ||
+	    !dev->netdev_ops->ndo_set_net_policy) {
+		ret = -ENOTSUPP;
+		goto unlock;
+	}
+
+	for (i = 0; i < NET_POLICY_MAX; i++) {
+		if (!strncmp(name, policy_name[i], strlen(policy_name[i])))
+		break;
+	}
+
+	if (!test_bit(i, dev->netpolicy->avail_policy)) {
+		ret = -ENOTSUPP;
+		goto unlock;
+	}
+
+	if (i == dev->netpolicy->cur_policy)
+		goto unlock;
+
+	/* If there is no policy applied yet, need to do enable first . */
+	if (dev->netpolicy->cur_policy == NET_POLICY_NONE) {
+		ret = netpolicy_enable(dev);
+		if (ret)
+			goto unlock;
+	}
+
+	netpolicy_free_obj_list(dev);
+
+	/* Generate object list according to policy name */
+	ret = netpolicy_gen_obj_list(dev, i);
+	if (ret)
+		goto err;
+
+	/* set policy */
+	ret = dev->netdev_ops->ndo_set_net_policy(dev, i);
+	if (ret)
+		goto err;
+
+	/* If removing policy, need to do disable. */
+	if (i == NET_POLICY_NONE)
+		netpolicy_disable(dev);
+
+	dev->netpolicy->cur_policy = i;
+
+	spin_unlock(&dev->np_lock);
+	return 0;
+
+err:
+	netpolicy_free_obj_list(dev);
+	if (dev->netpolicy->cur_policy == NET_POLICY_NONE)
+		netpolicy_disable(dev);
+unlock:
+	spin_unlock(&dev->np_lock);
+	return ret;
+}
+
 #ifdef CONFIG_PROC_FS
 
 static int net_policy_proc_show(struct seq_file *m, void *v)
@@ -459,11 +523,40 @@ static int net_policy_proc_open(struct inode *inode, struct file *file)
 	return single_open(file, net_policy_proc_show, PDE_DATA(inode));
 }
 
+static ssize_t net_policy_proc_write(struct file *file, const char __user *buf,
+				     size_t count, loff_t *pos)
+{
+	struct seq_file *m = file->private_data;
+	struct net_device *dev = (struct net_device *)m->private;
+	char name[POLICY_NAME_LEN_MAX];
+	int i, ret;
+
+	if (!dev->netpolicy)
+		return -ENOTSUPP;
+
+	if (count > POLICY_NAME_LEN_MAX)
+		return -EINVAL;
+
+	if (copy_from_user(name, buf, count))
+		return -EINVAL;
+
+	for (i = 0; i < count - 1; i++)
+		name[i] = toupper(name[i]);
+	name[POLICY_NAME_LEN_MAX - 1] = 0;
+
+	ret = net_policy_set_by_name(name, dev);
+	if (ret)
+		return ret;
+
+	return count;
+}
+
 static const struct file_operations proc_net_policy_operations = {
 	.open		= net_policy_proc_open,
 	.read		= seq_read,
 	.llseek		= seq_lseek,
 	.release	= seq_release,
+	.write		= net_policy_proc_write,
 	.owner		= THIS_MODULE,
 };
 
@@ -527,6 +620,8 @@ void uninit_netpolicy(struct net_device *dev)
 {
 	spin_lock(&dev->np_lock);
 	if (dev->netpolicy) {
+		if (dev->netpolicy->cur_policy > NET_POLICY_NONE)
+			netpolicy_disable(dev);
 		kfree(dev->netpolicy);
 		dev->netpolicy = NULL;
 	}
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [RFC PATCH 12/30] i40e/netpolicy: implement ndo_set_net_policy
  2016-07-18  6:55 ` [Intel-wired-lan] " kan.liang
@ 2016-07-18  6:56   ` kan.liang
  -1 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: davem, linux-kernel, intel-wired-lan, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

Setting net policy for all tx and rx queues according to policy name.
For i40e driver, the policy only changes per queue interrupt moderation.
It uses new ethtool callback (per queue coalesce setting) to configure
the driver.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h         |  3 ++
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c |  6 ++--
 drivers/net/ethernet/intel/i40e/i40e_main.c    | 43 ++++++++++++++++++++++++++
 3 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index e83fc8a..a4bd430 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -862,4 +862,7 @@ i40e_status i40e_get_npar_bw_setting(struct i40e_pf *pf);
 i40e_status i40e_set_npar_bw_setting(struct i40e_pf *pf);
 i40e_status i40e_commit_npar_bw_setting(struct i40e_pf *pf);
 void i40e_print_link_message(struct i40e_vsi *vsi, bool isup);
+void i40e_set_itr_per_queue(struct i40e_vsi *vsi,
+			    struct ethtool_coalesce *ec,
+			    int queue);
 #endif /* _I40E_H_ */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 4962e85..1f3537e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -2012,9 +2012,9 @@ static int i40e_get_per_queue_coalesce(struct net_device *netdev, u32 queue,
 	return __i40e_get_coalesce(netdev, ec, queue);
 }
 
-static void i40e_set_itr_per_queue(struct i40e_vsi *vsi,
-				   struct ethtool_coalesce *ec,
-				   int queue)
+void i40e_set_itr_per_queue(struct i40e_vsi *vsi,
+			    struct ethtool_coalesce *ec,
+			    int queue)
 {
 	struct i40e_pf *pf = vsi->back;
 	struct i40e_hw *hw = &pf->hw;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 8a919e44..3336373 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -9035,6 +9035,48 @@ static int i40e_ndo_get_irq_info(struct net_device *dev,
 
 	return 0;
 }
+
+/**
+ * i40e_set_net_policy
+ * @dev: the net device pointer
+ * @name: policy name
+ *
+ * set policy to each tx and rx queue
+ * Returns 0 on success, negative on failure
+ */
+static int i40e_set_net_policy(struct net_device *dev,
+			       enum netpolicy_name name)
+{
+	struct i40e_netdev_priv *np = netdev_priv(dev);
+	struct i40e_vsi *vsi = np->vsi;
+	struct netpolicy_object *obj;
+	struct ethtool_coalesce ec;
+
+	if (policy_param[name][NETPOLICY_RX] > 0) {
+		ec.rx_coalesce_usecs = policy_param[name][NETPOLICY_RX];
+		ec.use_adaptive_rx_coalesce = 0;
+	} else if (policy_param[name][NETPOLICY_RX] == 0) {
+		ec.use_adaptive_rx_coalesce = 1;
+	} else {
+		return -EINVAL;
+	}
+
+	if (policy_param[name][NETPOLICY_TX] > 0) {
+		ec.tx_coalesce_usecs = policy_param[name][NETPOLICY_TX];
+		ec.use_adaptive_tx_coalesce = 0;
+	} else if (policy_param[name][NETPOLICY_TX] == 0) {
+		ec.use_adaptive_tx_coalesce = 1;
+	} else {
+		return -EINVAL;
+	}
+
+	/*For i40e driver, tx and rx are always in pair */
+	list_for_each_entry(obj, &dev->netpolicy->obj_list[NETPOLICY_RX][name], list) {
+		i40e_set_itr_per_queue(vsi, &ec, obj->queue);
+	}
+
+	return 0;
+}
 #endif /* CONFIG_NETPOLICY */
 
 static const struct net_device_ops i40e_netdev_ops = {
@@ -9076,6 +9118,7 @@ static const struct net_device_ops i40e_netdev_ops = {
 #ifdef CONFIG_NETPOLICY
 	.ndo_netpolicy_init	= i40e_ndo_netpolicy_init,
 	.ndo_get_irq_info	= i40e_ndo_get_irq_info,
+	.ndo_set_net_policy	= i40e_set_net_policy,
 #endif /* CONFIG_NETPOLICY */
 };
 
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 12/30] i40e/netpolicy: implement ndo_set_net_policy
@ 2016-07-18  6:56   ` kan.liang
  0 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: intel-wired-lan

From: Kan Liang <kan.liang@intel.com>

Setting net policy for all tx and rx queues according to policy name.
For i40e driver, the policy only changes per queue interrupt moderation.
It uses new ethtool callback (per queue coalesce setting) to configure
the driver.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h         |  3 ++
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c |  6 ++--
 drivers/net/ethernet/intel/i40e/i40e_main.c    | 43 ++++++++++++++++++++++++++
 3 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index e83fc8a..a4bd430 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -862,4 +862,7 @@ i40e_status i40e_get_npar_bw_setting(struct i40e_pf *pf);
 i40e_status i40e_set_npar_bw_setting(struct i40e_pf *pf);
 i40e_status i40e_commit_npar_bw_setting(struct i40e_pf *pf);
 void i40e_print_link_message(struct i40e_vsi *vsi, bool isup);
+void i40e_set_itr_per_queue(struct i40e_vsi *vsi,
+			    struct ethtool_coalesce *ec,
+			    int queue);
 #endif /* _I40E_H_ */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 4962e85..1f3537e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -2012,9 +2012,9 @@ static int i40e_get_per_queue_coalesce(struct net_device *netdev, u32 queue,
 	return __i40e_get_coalesce(netdev, ec, queue);
 }
 
-static void i40e_set_itr_per_queue(struct i40e_vsi *vsi,
-				   struct ethtool_coalesce *ec,
-				   int queue)
+void i40e_set_itr_per_queue(struct i40e_vsi *vsi,
+			    struct ethtool_coalesce *ec,
+			    int queue)
 {
 	struct i40e_pf *pf = vsi->back;
 	struct i40e_hw *hw = &pf->hw;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 8a919e44..3336373 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -9035,6 +9035,48 @@ static int i40e_ndo_get_irq_info(struct net_device *dev,
 
 	return 0;
 }
+
+/**
+ * i40e_set_net_policy
+ * @dev: the net device pointer
+ * @name: policy name
+ *
+ * set policy to each tx and rx queue
+ * Returns 0 on success, negative on failure
+ */
+static int i40e_set_net_policy(struct net_device *dev,
+			       enum netpolicy_name name)
+{
+	struct i40e_netdev_priv *np = netdev_priv(dev);
+	struct i40e_vsi *vsi = np->vsi;
+	struct netpolicy_object *obj;
+	struct ethtool_coalesce ec;
+
+	if (policy_param[name][NETPOLICY_RX] > 0) {
+		ec.rx_coalesce_usecs = policy_param[name][NETPOLICY_RX];
+		ec.use_adaptive_rx_coalesce = 0;
+	} else if (policy_param[name][NETPOLICY_RX] == 0) {
+		ec.use_adaptive_rx_coalesce = 1;
+	} else {
+		return -EINVAL;
+	}
+
+	if (policy_param[name][NETPOLICY_TX] > 0) {
+		ec.tx_coalesce_usecs = policy_param[name][NETPOLICY_TX];
+		ec.use_adaptive_tx_coalesce = 0;
+	} else if (policy_param[name][NETPOLICY_TX] == 0) {
+		ec.use_adaptive_tx_coalesce = 1;
+	} else {
+		return -EINVAL;
+	}
+
+	/*For i40e driver, tx and rx are always in pair */
+	list_for_each_entry(obj, &dev->netpolicy->obj_list[NETPOLICY_RX][name], list) {
+		i40e_set_itr_per_queue(vsi, &ec, obj->queue);
+	}
+
+	return 0;
+}
 #endif /* CONFIG_NETPOLICY */
 
 static const struct net_device_ops i40e_netdev_ops = {
@@ -9076,6 +9118,7 @@ static const struct net_device_ops i40e_netdev_ops = {
 #ifdef CONFIG_NETPOLICY
 	.ndo_netpolicy_init	= i40e_ndo_netpolicy_init,
 	.ndo_get_irq_info	= i40e_ndo_get_irq_info,
+	.ndo_set_net_policy	= i40e_set_net_policy,
 #endif /* CONFIG_NETPOLICY */
 };
 
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [RFC PATCH 13/30] i40e/netpolicy: add three new net policies
  2016-07-18  6:55 ` [Intel-wired-lan] " kan.liang
@ 2016-07-18  6:56   ` kan.liang
  -1 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: davem, linux-kernel, intel-wired-lan, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

Introduce three net policies for i40e driver.
CPU policy: configure for higher throughput and lower CPU%.
BULK policy: configure for highest throughput.
LATENCY policy: configure for lowest latency.

Lots of tests are done for net policy on platforms with Intel Xeon E5 V2
and XL710 40G NIC. The baseline test is with Linux 4.6.0 kernel.
Netperf is used to evaluate the throughput and latency performance for
these three net policies.
For "BULK" policy, the throughput performance is on average ~1.26X than
baseline.
For "CPU" policy, the throughput performance is on average ~1.20X than
baseline, and has lower CPU% (on average ~5% lower than "BULK" policy).
For "LATENCY" policy, the latency is on average 53.5% less than the
baseline.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 10 ++++++++++
 include/linux/netpolicy.h                   |  3 +++
 net/core/netpolicy.c                        |  5 ++++-
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 3336373..11b921b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -8961,12 +8961,22 @@ static netdev_features_t i40e_features_check(struct sk_buff *skb,
 }
 
 #ifdef CONFIG_NETPOLICY
+/* Interrupt moderation in microseconds */
+#define NET_POLICY_CPU_RX	125
+#define NET_POLICY_CPU_TX	250
+#define NET_POLICY_BULK_RX	50
+#define NET_POLICY_BULK_TX	125
+#define NET_POLICY_LATENCY_RX	5
+#define NET_POLICY_LATENCY_TX	10
 
 #define NET_POLICY_NOT_SUPPORT	-2
 #define NET_POLICY_END		-3
 static int policy_param[NET_POLICY_MAX + 1][2] = {
 	/* rx-usec, tx-usec */
 	{0, 0},
+	{NET_POLICY_CPU_RX, NET_POLICY_CPU_TX},		/* CPU policy */
+	{NET_POLICY_BULK_RX, NET_POLICY_BULK_TX},	/* BULK policy */
+	{NET_POLICY_LATENCY_RX, NET_POLICY_LATENCY_TX},	/* LATENCY policy */
 
 	{NET_POLICY_END, NET_POLICY_END},
 };
diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index b1d9277..3d348a7 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -18,6 +18,9 @@
 
 enum netpolicy_name {
 	NET_POLICY_NONE		= 0,
+	NET_POLICY_CPU,
+	NET_POLICY_BULK,
+	NET_POLICY_LATENCY,
 	NET_POLICY_MAX,
 };
 
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 8112839..71e9163 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -223,7 +223,10 @@ static int netpolicy_enable(struct net_device *dev)
 }
 
 const char *policy_name[NET_POLICY_MAX] = {
-	"NONE"
+	"NONE",
+	"CPU",
+	"BULK",
+	"LATENCY"
 };
 
 static u32 cpu_to_queue(struct net_device *dev,
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 13/30] i40e/netpolicy: add three new net policies
@ 2016-07-18  6:56   ` kan.liang
  0 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: intel-wired-lan

From: Kan Liang <kan.liang@intel.com>

Introduce three net policies for i40e driver.
CPU policy: configure for higher throughput and lower CPU%.
BULK policy: configure for highest throughput.
LATENCY policy: configure for lowest latency.

Lots of tests are done for net policy on platforms with Intel Xeon E5 V2
and XL710 40G NIC. The baseline test is with Linux 4.6.0 kernel.
Netperf is used to evaluate the throughput and latency performance for
these three net policies.
For "BULK" policy, the throughput performance is on average ~1.26X than
baseline.
For "CPU" policy, the throughput performance is on average ~1.20X than
baseline, and has lower CPU% (on average ~5% lower than "BULK" policy).
For "LATENCY" policy, the latency is on average 53.5% less than the
baseline.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 10 ++++++++++
 include/linux/netpolicy.h                   |  3 +++
 net/core/netpolicy.c                        |  5 ++++-
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 3336373..11b921b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -8961,12 +8961,22 @@ static netdev_features_t i40e_features_check(struct sk_buff *skb,
 }
 
 #ifdef CONFIG_NETPOLICY
+/* Interrupt moderation in microseconds */
+#define NET_POLICY_CPU_RX	125
+#define NET_POLICY_CPU_TX	250
+#define NET_POLICY_BULK_RX	50
+#define NET_POLICY_BULK_TX	125
+#define NET_POLICY_LATENCY_RX	5
+#define NET_POLICY_LATENCY_TX	10
 
 #define NET_POLICY_NOT_SUPPORT	-2
 #define NET_POLICY_END		-3
 static int policy_param[NET_POLICY_MAX + 1][2] = {
 	/* rx-usec, tx-usec */
 	{0, 0},
+	{NET_POLICY_CPU_RX, NET_POLICY_CPU_TX},		/* CPU policy */
+	{NET_POLICY_BULK_RX, NET_POLICY_BULK_TX},	/* BULK policy */
+	{NET_POLICY_LATENCY_RX, NET_POLICY_LATENCY_TX},	/* LATENCY policy */
 
 	{NET_POLICY_END, NET_POLICY_END},
 };
diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index b1d9277..3d348a7 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -18,6 +18,9 @@
 
 enum netpolicy_name {
 	NET_POLICY_NONE		= 0,
+	NET_POLICY_CPU,
+	NET_POLICY_BULK,
+	NET_POLICY_LATENCY,
 	NET_POLICY_MAX,
 };
 
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 8112839..71e9163 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -223,7 +223,10 @@ static int netpolicy_enable(struct net_device *dev)
 }
 
 const char *policy_name[NET_POLICY_MAX] = {
-	"NONE"
+	"NONE",
+	"CPU",
+	"BULK",
+	"LATENCY"
 };
 
 static u32 cpu_to_queue(struct net_device *dev,
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [RFC PATCH 14/30] net/netpolicy: add MIX policy
  2016-07-18  6:55 ` [Intel-wired-lan] " kan.liang
@ 2016-07-18  6:56   ` kan.liang
  -1 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: davem, linux-kernel, intel-wired-lan, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

MIX policy is combine of other policies. It allows different queue has
different policy. If MIX policy is applied,
/proc/net/netpolicy/$DEV/policy shows per queue policy.
Usually, the workloads requires either high throughput or low latency.
So for current implementation, MIX policy is combine of latency policy
and bulk policy.
The workloads which requires high throughput are usually utilize more
CPU resources compared to the workloads which requires low latency. This
means that if there is an equal interest in latency and throughput
performance, it is better to reserve more bulk queues than latency
queues.
In this patch, MIX policy is forced to include 1/3 latency policy queues
and 2/3 bulk policy queues.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netpolicy.h |   7 +++
 net/core/netpolicy.c      | 139 ++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 136 insertions(+), 10 deletions(-)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index 3d348a7..579ff98 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -22,6 +22,12 @@ enum netpolicy_name {
 	NET_POLICY_BULK,
 	NET_POLICY_LATENCY,
 	NET_POLICY_MAX,
+
+	/*
+	 * Mixture of the above policy
+	 * Can only be set as global policy.
+	 */
+	NET_POLICY_MIX,
 };
 
 enum netpolicy_traffic {
@@ -66,6 +72,7 @@ struct netpolicy_object {
 struct netpolicy_info {
 	enum netpolicy_name	cur_policy;
 	unsigned long avail_policy[BITS_TO_LONGS(NET_POLICY_MAX)];
+	bool	has_mix_policy;
 	/* cpu and queue mapping information */
 	struct netpolicy_sys_info	sys_info;
 	/* List of policy objects 0 rx 1 tx */
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 71e9163..8336106 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -280,6 +280,9 @@ static inline int node_distance_cmp(const void *a, const void *b)
 	return _a->distance - _b->distance;
 }
 
+#define mix_latency_num(num)	((num) / 3)
+#define mix_throughput_num(num)	((num) - mix_latency_num(num))
+
 static int _netpolicy_gen_obj_list(struct net_device *dev, bool is_rx,
 				   enum netpolicy_name policy,
 				   struct sort_node *nodes, int num_node,
@@ -287,7 +290,9 @@ static int _netpolicy_gen_obj_list(struct net_device *dev, bool is_rx,
 {
 	cpumask_var_t node_tmp_cpumask, sibling_tmp_cpumask;
 	struct cpumask *node_assigned_cpumask;
+	int *l_num = NULL, *b_num = NULL;
 	int i, ret = -ENOMEM;
+	int num_node_cpu;
 	u32 cpu;
 
 	if (!alloc_cpumask_var(&node_tmp_cpumask, GFP_ATOMIC))
@@ -299,6 +304,23 @@ static int _netpolicy_gen_obj_list(struct net_device *dev, bool is_rx,
 	if (!node_assigned_cpumask)
 		goto alloc_fail2;
 
+	if (policy == NET_POLICY_MIX) {
+		l_num = kcalloc(num_node, sizeof(int), GFP_ATOMIC);
+		if (!l_num)
+			goto alloc_fail3;
+		b_num = kcalloc(num_node, sizeof(int), GFP_ATOMIC);
+		if (!b_num) {
+			kfree(l_num);
+			goto alloc_fail3;
+		}
+
+		for (i = 0; i < num_node; i++) {
+			num_node_cpu = cpumask_weight(&node_avail_cpumask[nodes[i].node]);
+			l_num[i] = mix_latency_num(num_node_cpu);
+			b_num[i] = mix_throughput_num(num_node_cpu);
+		}
+	}
+
 	/* Don't share physical core */
 	for (i = 0; i < num_node; i++) {
 		if (cpumask_weight(&node_avail_cpumask[nodes[i].node]) == 0)
@@ -309,7 +331,13 @@ static int _netpolicy_gen_obj_list(struct net_device *dev, bool is_rx,
 			cpu = cpumask_first(node_tmp_cpumask);
 
 			/* push to obj list */
-			ret = netpolicy_add_obj(dev, cpu, is_rx, policy);
+			if (policy == NET_POLICY_MIX) {
+				if (l_num[i]-- > 0)
+					ret = netpolicy_add_obj(dev, cpu, is_rx, NET_POLICY_LATENCY);
+				else if (b_num[i]-- > 0)
+					ret = netpolicy_add_obj(dev, cpu, is_rx, NET_POLICY_BULK);
+			} else
+				ret = netpolicy_add_obj(dev, cpu, is_rx, policy);
 			if (ret) {
 				spin_unlock(&dev->np_ob_list_lock);
 				goto err;
@@ -322,6 +350,41 @@ static int _netpolicy_gen_obj_list(struct net_device *dev, bool is_rx,
 		spin_unlock(&dev->np_ob_list_lock);
 	}
 
+	if (policy == NET_POLICY_MIX) {
+		struct netpolicy_object *obj;
+		int dir = is_rx ? 0 : 1;
+		u32 sibling;
+
+		/* if have to share core, choose latency core first. */
+		for (i = 0; i < num_node; i++) {
+			if ((l_num[i] < 1) && (b_num[i] < 1))
+				continue;
+			spin_lock(&dev->np_ob_list_lock);
+			list_for_each_entry(obj, &dev->netpolicy->obj_list[dir][NET_POLICY_LATENCY], list) {
+				if (cpu_to_node(obj->cpu) != nodes[i].node)
+					continue;
+
+				cpu = obj->cpu;
+				for_each_cpu(sibling, topology_sibling_cpumask(cpu)) {
+					if (cpumask_test_cpu(sibling, &node_assigned_cpumask[nodes[i].node]) ||
+					    !cpumask_test_cpu(sibling, &node_avail_cpumask[nodes[i].node]))
+						continue;
+
+					if (l_num[i]-- > 0)
+						ret = netpolicy_add_obj(dev, sibling, is_rx, NET_POLICY_LATENCY);
+					else if (b_num[i]-- > 0)
+						ret = netpolicy_add_obj(dev, sibling, is_rx, NET_POLICY_BULK);
+					if (ret) {
+						spin_unlock(&dev->np_ob_list_lock);
+						goto err;
+					}
+					cpumask_set_cpu(sibling, &node_assigned_cpumask[nodes[i].node]);
+				}
+			}
+			spin_unlock(&dev->np_ob_list_lock);
+		}
+	}
+
 	for (i = 0; i < num_node; i++) {
 		cpumask_xor(node_tmp_cpumask, &node_avail_cpumask[nodes[i].node], &node_assigned_cpumask[nodes[i].node]);
 		if (cpumask_weight(node_tmp_cpumask) == 0)
@@ -329,7 +392,15 @@ static int _netpolicy_gen_obj_list(struct net_device *dev, bool is_rx,
 		spin_lock(&dev->np_ob_list_lock);
 		for_each_cpu(cpu, node_tmp_cpumask) {
 			/* push to obj list */
-			ret = netpolicy_add_obj(dev, cpu, is_rx, policy);
+			if (policy == NET_POLICY_MIX) {
+				if (l_num[i]-- > 0)
+					ret = netpolicy_add_obj(dev, cpu, is_rx, NET_POLICY_LATENCY);
+				else if (b_num[i]-- > 0)
+					ret = netpolicy_add_obj(dev, cpu, is_rx, NET_POLICY_BULK);
+				else
+					ret = netpolicy_add_obj(dev, cpu, is_rx, NET_POLICY_NONE);
+			} else
+				ret = netpolicy_add_obj(dev, cpu, is_rx, policy);
 			if (ret) {
 				spin_unlock(&dev->np_ob_list_lock);
 				goto err;
@@ -340,6 +411,11 @@ static int _netpolicy_gen_obj_list(struct net_device *dev, bool is_rx,
 	}
 
 err:
+	if (policy == NET_POLICY_MIX) {
+		kfree(l_num);
+		kfree(b_num);
+	}
+alloc_fail3:
 	kfree(node_assigned_cpumask);
 alloc_fail2:
 	free_cpumask_var(sibling_tmp_cpumask);
@@ -377,6 +453,22 @@ static int netpolicy_gen_obj_list(struct net_device *dev,
 	 * 2. Remote core + the only logical core
 	 * 3. Local core + the core's sibling is already in the object list
 	 * 4. Remote core + the core's sibling is already in the object list
+	 *
+	 * For MIX policy, on each node, force 1/3 core as latency policy core,
+	 * the rest cores are bulk policy core.
+	 *
+	 * Besides the above priority rules, there is one more rule
+	 * - If it's sibling core's object has been applied a policy
+	 *   Choose the object which the sibling logical core applies latency policy first
+	 *
+	 * So the order of object list for MIX policy is as below:
+	 * 1. Local core + the only logical core
+	 * 2. Remote core + the only logical core
+	 * 3. Local core + the core's sibling is latency policy core
+	 * 4. Remote core + the core's sibling is latency policy core
+	 * 5. Local core + the core's sibling is bulk policy core
+	 * 6. Remote core + the core's sibling is bulk policy core
+	 *
 	 */
 #ifdef CONFIG_NUMA
 	dev_node = dev_to_node(dev->dev.parent);
@@ -447,14 +539,23 @@ static int net_policy_set_by_name(char *name, struct net_device *dev)
 		goto unlock;
 	}
 
-	for (i = 0; i < NET_POLICY_MAX; i++) {
-		if (!strncmp(name, policy_name[i], strlen(policy_name[i])))
-		break;
-	}
+	if (!strncmp(name, "MIX", strlen("MIX"))) {
+		if (dev->netpolicy->has_mix_policy) {
+			i = NET_POLICY_MIX;
+		} else {
+			ret = -ENOTSUPP;
+			goto unlock;
+		}
+	} else {
+		for (i = 0; i < NET_POLICY_MAX; i++) {
+			if (!strncmp(name, policy_name[i], strlen(policy_name[i])))
+			break;
+		}
 
-	if (!test_bit(i, dev->netpolicy->avail_policy)) {
-		ret = -ENOTSUPP;
-		goto unlock;
+		if (!test_bit(i, dev->netpolicy->avail_policy)) {
+			ret = -ENOTSUPP;
+			goto unlock;
+		}
 	}
 
 	if (i == dev->netpolicy->cur_policy)
@@ -502,17 +603,35 @@ unlock:
 static int net_policy_proc_show(struct seq_file *m, void *v)
 {
 	struct net_device *dev = (struct net_device *)m->private;
+	enum netpolicy_name cur;
+	struct netpolicy_object *obj, *tmp;
 	int i;
 
 	if (WARN_ON(!dev->netpolicy))
 		return -EINVAL;
 
-	if (dev->netpolicy->cur_policy == NET_POLICY_NONE) {
+	cur = dev->netpolicy->cur_policy;
+	if (cur == NET_POLICY_NONE) {
 		seq_printf(m, "%s: There is no policy applied\n", dev->name);
 		seq_printf(m, "%s: The available policy include:", dev->name);
 		for_each_set_bit(i, dev->netpolicy->avail_policy, NET_POLICY_MAX)
 			seq_printf(m, " %s", policy_name[i]);
+		if (dev->netpolicy->has_mix_policy)
+			seq_printf(m, " MIX");
 		seq_printf(m, "\n");
+	} else if (cur == NET_POLICY_MIX) {
+		seq_printf(m, "%s: MIX policy is running on the system\n", dev->name);
+		spin_lock(&dev->np_ob_list_lock);
+		for (i = NET_POLICY_NONE; i < NET_POLICY_MAX; i++) {
+			seq_printf(m, "%s: queues for %s policy\n", dev->name, policy_name[i]);
+			list_for_each_entry_safe(obj, tmp, &dev->netpolicy->obj_list[NETPOLICY_RX][i], list) {
+				seq_printf(m, "%s: rx queue %d\n", dev->name, obj->queue);
+			}
+			list_for_each_entry_safe(obj, tmp, &dev->netpolicy->obj_list[NETPOLICY_TX][i], list) {
+				seq_printf(m, "%s: tx queue %d\n", dev->name, obj->queue);
+			}
+		}
+		spin_unlock(&dev->np_ob_list_lock);
 	} else {
 		seq_printf(m, "%s: POLICY %s is running on the system\n",
 			   dev->name, policy_name[dev->netpolicy->cur_policy]);
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 14/30] net/netpolicy: add MIX policy
@ 2016-07-18  6:56   ` kan.liang
  0 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: intel-wired-lan

From: Kan Liang <kan.liang@intel.com>

MIX policy is combine of other policies. It allows different queue has
different policy. If MIX policy is applied,
/proc/net/netpolicy/$DEV/policy shows per queue policy.
Usually, the workloads requires either high throughput or low latency.
So for current implementation, MIX policy is combine of latency policy
and bulk policy.
The workloads which requires high throughput are usually utilize more
CPU resources compared to the workloads which requires low latency. This
means that if there is an equal interest in latency and throughput
performance, it is better to reserve more bulk queues than latency
queues.
In this patch, MIX policy is forced to include 1/3 latency policy queues
and 2/3 bulk policy queues.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netpolicy.h |   7 +++
 net/core/netpolicy.c      | 139 ++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 136 insertions(+), 10 deletions(-)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index 3d348a7..579ff98 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -22,6 +22,12 @@ enum netpolicy_name {
 	NET_POLICY_BULK,
 	NET_POLICY_LATENCY,
 	NET_POLICY_MAX,
+
+	/*
+	 * Mixture of the above policy
+	 * Can only be set as global policy.
+	 */
+	NET_POLICY_MIX,
 };
 
 enum netpolicy_traffic {
@@ -66,6 +72,7 @@ struct netpolicy_object {
 struct netpolicy_info {
 	enum netpolicy_name	cur_policy;
 	unsigned long avail_policy[BITS_TO_LONGS(NET_POLICY_MAX)];
+	bool	has_mix_policy;
 	/* cpu and queue mapping information */
 	struct netpolicy_sys_info	sys_info;
 	/* List of policy objects 0 rx 1 tx */
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 71e9163..8336106 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -280,6 +280,9 @@ static inline int node_distance_cmp(const void *a, const void *b)
 	return _a->distance - _b->distance;
 }
 
+#define mix_latency_num(num)	((num) / 3)
+#define mix_throughput_num(num)	((num) - mix_latency_num(num))
+
 static int _netpolicy_gen_obj_list(struct net_device *dev, bool is_rx,
 				   enum netpolicy_name policy,
 				   struct sort_node *nodes, int num_node,
@@ -287,7 +290,9 @@ static int _netpolicy_gen_obj_list(struct net_device *dev, bool is_rx,
 {
 	cpumask_var_t node_tmp_cpumask, sibling_tmp_cpumask;
 	struct cpumask *node_assigned_cpumask;
+	int *l_num = NULL, *b_num = NULL;
 	int i, ret = -ENOMEM;
+	int num_node_cpu;
 	u32 cpu;
 
 	if (!alloc_cpumask_var(&node_tmp_cpumask, GFP_ATOMIC))
@@ -299,6 +304,23 @@ static int _netpolicy_gen_obj_list(struct net_device *dev, bool is_rx,
 	if (!node_assigned_cpumask)
 		goto alloc_fail2;
 
+	if (policy == NET_POLICY_MIX) {
+		l_num = kcalloc(num_node, sizeof(int), GFP_ATOMIC);
+		if (!l_num)
+			goto alloc_fail3;
+		b_num = kcalloc(num_node, sizeof(int), GFP_ATOMIC);
+		if (!b_num) {
+			kfree(l_num);
+			goto alloc_fail3;
+		}
+
+		for (i = 0; i < num_node; i++) {
+			num_node_cpu = cpumask_weight(&node_avail_cpumask[nodes[i].node]);
+			l_num[i] = mix_latency_num(num_node_cpu);
+			b_num[i] = mix_throughput_num(num_node_cpu);
+		}
+	}
+
 	/* Don't share physical core */
 	for (i = 0; i < num_node; i++) {
 		if (cpumask_weight(&node_avail_cpumask[nodes[i].node]) == 0)
@@ -309,7 +331,13 @@ static int _netpolicy_gen_obj_list(struct net_device *dev, bool is_rx,
 			cpu = cpumask_first(node_tmp_cpumask);
 
 			/* push to obj list */
-			ret = netpolicy_add_obj(dev, cpu, is_rx, policy);
+			if (policy == NET_POLICY_MIX) {
+				if (l_num[i]-- > 0)
+					ret = netpolicy_add_obj(dev, cpu, is_rx, NET_POLICY_LATENCY);
+				else if (b_num[i]-- > 0)
+					ret = netpolicy_add_obj(dev, cpu, is_rx, NET_POLICY_BULK);
+			} else
+				ret = netpolicy_add_obj(dev, cpu, is_rx, policy);
 			if (ret) {
 				spin_unlock(&dev->np_ob_list_lock);
 				goto err;
@@ -322,6 +350,41 @@ static int _netpolicy_gen_obj_list(struct net_device *dev, bool is_rx,
 		spin_unlock(&dev->np_ob_list_lock);
 	}
 
+	if (policy == NET_POLICY_MIX) {
+		struct netpolicy_object *obj;
+		int dir = is_rx ? 0 : 1;
+		u32 sibling;
+
+		/* if have to share core, choose latency core first. */
+		for (i = 0; i < num_node; i++) {
+			if ((l_num[i] < 1) && (b_num[i] < 1))
+				continue;
+			spin_lock(&dev->np_ob_list_lock);
+			list_for_each_entry(obj, &dev->netpolicy->obj_list[dir][NET_POLICY_LATENCY], list) {
+				if (cpu_to_node(obj->cpu) != nodes[i].node)
+					continue;
+
+				cpu = obj->cpu;
+				for_each_cpu(sibling, topology_sibling_cpumask(cpu)) {
+					if (cpumask_test_cpu(sibling, &node_assigned_cpumask[nodes[i].node]) ||
+					    !cpumask_test_cpu(sibling, &node_avail_cpumask[nodes[i].node]))
+						continue;
+
+					if (l_num[i]-- > 0)
+						ret = netpolicy_add_obj(dev, sibling, is_rx, NET_POLICY_LATENCY);
+					else if (b_num[i]-- > 0)
+						ret = netpolicy_add_obj(dev, sibling, is_rx, NET_POLICY_BULK);
+					if (ret) {
+						spin_unlock(&dev->np_ob_list_lock);
+						goto err;
+					}
+					cpumask_set_cpu(sibling, &node_assigned_cpumask[nodes[i].node]);
+				}
+			}
+			spin_unlock(&dev->np_ob_list_lock);
+		}
+	}
+
 	for (i = 0; i < num_node; i++) {
 		cpumask_xor(node_tmp_cpumask, &node_avail_cpumask[nodes[i].node], &node_assigned_cpumask[nodes[i].node]);
 		if (cpumask_weight(node_tmp_cpumask) == 0)
@@ -329,7 +392,15 @@ static int _netpolicy_gen_obj_list(struct net_device *dev, bool is_rx,
 		spin_lock(&dev->np_ob_list_lock);
 		for_each_cpu(cpu, node_tmp_cpumask) {
 			/* push to obj list */
-			ret = netpolicy_add_obj(dev, cpu, is_rx, policy);
+			if (policy == NET_POLICY_MIX) {
+				if (l_num[i]-- > 0)
+					ret = netpolicy_add_obj(dev, cpu, is_rx, NET_POLICY_LATENCY);
+				else if (b_num[i]-- > 0)
+					ret = netpolicy_add_obj(dev, cpu, is_rx, NET_POLICY_BULK);
+				else
+					ret = netpolicy_add_obj(dev, cpu, is_rx, NET_POLICY_NONE);
+			} else
+				ret = netpolicy_add_obj(dev, cpu, is_rx, policy);
 			if (ret) {
 				spin_unlock(&dev->np_ob_list_lock);
 				goto err;
@@ -340,6 +411,11 @@ static int _netpolicy_gen_obj_list(struct net_device *dev, bool is_rx,
 	}
 
 err:
+	if (policy == NET_POLICY_MIX) {
+		kfree(l_num);
+		kfree(b_num);
+	}
+alloc_fail3:
 	kfree(node_assigned_cpumask);
 alloc_fail2:
 	free_cpumask_var(sibling_tmp_cpumask);
@@ -377,6 +453,22 @@ static int netpolicy_gen_obj_list(struct net_device *dev,
 	 * 2. Remote core + the only logical core
 	 * 3. Local core + the core's sibling is already in the object list
 	 * 4. Remote core + the core's sibling is already in the object list
+	 *
+	 * For MIX policy, on each node, force 1/3 core as latency policy core,
+	 * the rest cores are bulk policy core.
+	 *
+	 * Besides the above priority rules, there is one more rule
+	 * - If it's sibling core's object has been applied a policy
+	 *   Choose the object which the sibling logical core applies latency policy first
+	 *
+	 * So the order of object list for MIX policy is as below:
+	 * 1. Local core + the only logical core
+	 * 2. Remote core + the only logical core
+	 * 3. Local core + the core's sibling is latency policy core
+	 * 4. Remote core + the core's sibling is latency policy core
+	 * 5. Local core + the core's sibling is bulk policy core
+	 * 6. Remote core + the core's sibling is bulk policy core
+	 *
 	 */
 #ifdef CONFIG_NUMA
 	dev_node = dev_to_node(dev->dev.parent);
@@ -447,14 +539,23 @@ static int net_policy_set_by_name(char *name, struct net_device *dev)
 		goto unlock;
 	}
 
-	for (i = 0; i < NET_POLICY_MAX; i++) {
-		if (!strncmp(name, policy_name[i], strlen(policy_name[i])))
-		break;
-	}
+	if (!strncmp(name, "MIX", strlen("MIX"))) {
+		if (dev->netpolicy->has_mix_policy) {
+			i = NET_POLICY_MIX;
+		} else {
+			ret = -ENOTSUPP;
+			goto unlock;
+		}
+	} else {
+		for (i = 0; i < NET_POLICY_MAX; i++) {
+			if (!strncmp(name, policy_name[i], strlen(policy_name[i])))
+			break;
+		}
 
-	if (!test_bit(i, dev->netpolicy->avail_policy)) {
-		ret = -ENOTSUPP;
-		goto unlock;
+		if (!test_bit(i, dev->netpolicy->avail_policy)) {
+			ret = -ENOTSUPP;
+			goto unlock;
+		}
 	}
 
 	if (i == dev->netpolicy->cur_policy)
@@ -502,17 +603,35 @@ unlock:
 static int net_policy_proc_show(struct seq_file *m, void *v)
 {
 	struct net_device *dev = (struct net_device *)m->private;
+	enum netpolicy_name cur;
+	struct netpolicy_object *obj, *tmp;
 	int i;
 
 	if (WARN_ON(!dev->netpolicy))
 		return -EINVAL;
 
-	if (dev->netpolicy->cur_policy == NET_POLICY_NONE) {
+	cur = dev->netpolicy->cur_policy;
+	if (cur == NET_POLICY_NONE) {
 		seq_printf(m, "%s: There is no policy applied\n", dev->name);
 		seq_printf(m, "%s: The available policy include:", dev->name);
 		for_each_set_bit(i, dev->netpolicy->avail_policy, NET_POLICY_MAX)
 			seq_printf(m, " %s", policy_name[i]);
+		if (dev->netpolicy->has_mix_policy)
+			seq_printf(m, " MIX");
 		seq_printf(m, "\n");
+	} else if (cur == NET_POLICY_MIX) {
+		seq_printf(m, "%s: MIX policy is running on the system\n", dev->name);
+		spin_lock(&dev->np_ob_list_lock);
+		for (i = NET_POLICY_NONE; i < NET_POLICY_MAX; i++) {
+			seq_printf(m, "%s: queues for %s policy\n", dev->name, policy_name[i]);
+			list_for_each_entry_safe(obj, tmp, &dev->netpolicy->obj_list[NETPOLICY_RX][i], list) {
+				seq_printf(m, "%s: rx queue %d\n", dev->name, obj->queue);
+			}
+			list_for_each_entry_safe(obj, tmp, &dev->netpolicy->obj_list[NETPOLICY_TX][i], list) {
+				seq_printf(m, "%s: tx queue %d\n", dev->name, obj->queue);
+			}
+		}
+		spin_unlock(&dev->np_ob_list_lock);
 	} else {
 		seq_printf(m, "%s: POLICY %s is running on the system\n",
 			   dev->name, policy_name[dev->netpolicy->cur_policy]);
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [RFC PATCH 15/30] i40e/netpolicy: add MIX policy support
  2016-07-18  6:55 ` [Intel-wired-lan] " kan.liang
@ 2016-07-18  6:56   ` kan.liang
  -1 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: davem, linux-kernel, intel-wired-lan, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

Enable i40e MIX policy support. Based on the test, the MIX policy has
better performance if increasing rx interrupt moderation a little bit.

For evaluating the MIX policy performance, mixed workloads are tested.
The mixed workloads are combination of throughput-first workload and
latency-first workload. Five different types of combinations are
evaluated.
(pure throughput-first workload, pure latency-first workloads,
 2/3 throughput-first workload + 1/3 latency-first workloads,
 1/3 throughput-first workload + 2/3 latency-first workloads and
 1/2 throughput-first workload + 1/2 latency-first workloads).

For caculating the performance of mixed workloads, a weighted sum system
is also introduced. Here is the formula.

Score = normalized_latency * Weight + normalized_throughput * (1 -
Weight).

If we assume that the user has an equal interest in latency and
throughput performance, the Score for "MIX" policy is on average ~1.52X
than baseline.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 71 +++++++++++++++++++++--------
 1 file changed, 51 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 11b921b..d3f087d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -8966,6 +8966,8 @@ static netdev_features_t i40e_features_check(struct sk_buff *skb,
 #define NET_POLICY_CPU_TX	250
 #define NET_POLICY_BULK_RX	50
 #define NET_POLICY_BULK_TX	125
+#define NET_POLICY_MIX_BULK_RX	62
+#define NET_POLICY_MIX_BULK_TX	122
 #define NET_POLICY_LATENCY_RX	5
 #define NET_POLICY_LATENCY_TX	10
 
@@ -9004,6 +9006,9 @@ static int i40e_ndo_netpolicy_init(struct net_device *dev,
 			set_bit(i, info->avail_policy);
 	}
 
+	/* support MIX policy */
+	info->has_mix_policy = true;
+
 	return 0;
 }
 
@@ -9046,6 +9051,30 @@ static int i40e_ndo_get_irq_info(struct net_device *dev,
 	return 0;
 }
 
+static int i40e_fill_coalesce_for_policy(struct ethtool_coalesce *ec,
+					 enum netpolicy_name name)
+{
+	if (policy_param[name][NETPOLICY_RX] > 0) {
+		ec->rx_coalesce_usecs = policy_param[name][NETPOLICY_RX];
+		ec->use_adaptive_rx_coalesce = 0;
+	} else if (policy_param[name][NETPOLICY_RX] == 0) {
+		ec->use_adaptive_rx_coalesce = 1;
+	} else {
+		return -EINVAL;
+	}
+
+	if (policy_param[name][NETPOLICY_TX] > 0) {
+		ec->tx_coalesce_usecs = policy_param[name][NETPOLICY_TX];
+		ec->use_adaptive_tx_coalesce = 0;
+	} else if (policy_param[name][NETPOLICY_TX] == 0) {
+		ec->use_adaptive_tx_coalesce = 1;
+	} else {
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 /**
  * i40e_set_net_policy
  * @dev: the net device pointer
@@ -9061,28 +9090,30 @@ static int i40e_set_net_policy(struct net_device *dev,
 	struct i40e_vsi *vsi = np->vsi;
 	struct netpolicy_object *obj;
 	struct ethtool_coalesce ec;
-
-	if (policy_param[name][NETPOLICY_RX] > 0) {
-		ec.rx_coalesce_usecs = policy_param[name][NETPOLICY_RX];
-		ec.use_adaptive_rx_coalesce = 0;
-	} else if (policy_param[name][NETPOLICY_RX] == 0) {
-		ec.use_adaptive_rx_coalesce = 1;
-	} else {
-		return -EINVAL;
-	}
-
-	if (policy_param[name][NETPOLICY_TX] > 0) {
-		ec.tx_coalesce_usecs = policy_param[name][NETPOLICY_TX];
-		ec.use_adaptive_tx_coalesce = 0;
-	} else if (policy_param[name][NETPOLICY_TX] == 0) {
-		ec.use_adaptive_tx_coalesce = 1;
-	} else {
-		return -EINVAL;
-	}
+	int i, ret;
 
 	/*For i40e driver, tx and rx are always in pair */
-	list_for_each_entry(obj, &dev->netpolicy->obj_list[NETPOLICY_RX][name], list) {
-		i40e_set_itr_per_queue(vsi, &ec, obj->queue);
+	if (name == NET_POLICY_MIX) {
+		/* Under MIX policy, the paramers for BULK object are different */
+		policy_param[NET_POLICY_BULK][NETPOLICY_RX] = NET_POLICY_MIX_BULK_RX;
+		policy_param[NET_POLICY_BULK][NETPOLICY_TX] = NET_POLICY_MIX_BULK_TX;
+		for (i = NET_POLICY_NONE; i < NET_POLICY_MAX; i++) {
+			ret = i40e_fill_coalesce_for_policy(&ec, i);
+			if (ret)
+				return ret;
+			list_for_each_entry(obj, &dev->netpolicy->obj_list[NETPOLICY_RX][i], list) {
+				i40e_set_itr_per_queue(vsi, &ec, obj->queue);
+			}
+		}
+	} else {
+		policy_param[NET_POLICY_BULK][NETPOLICY_RX] = NET_POLICY_BULK_RX;
+		policy_param[NET_POLICY_BULK][NETPOLICY_TX] = NET_POLICY_BULK_TX;
+		ret = i40e_fill_coalesce_for_policy(&ec, name);
+		if (ret)
+			return ret;
+		list_for_each_entry(obj, &dev->netpolicy->obj_list[NETPOLICY_RX][name], list) {
+			i40e_set_itr_per_queue(vsi, &ec, obj->queue);
+		}
 	}
 
 	return 0;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 15/30] i40e/netpolicy: add MIX policy support
@ 2016-07-18  6:56   ` kan.liang
  0 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: intel-wired-lan

From: Kan Liang <kan.liang@intel.com>

Enable i40e MIX policy support. Based on the test, the MIX policy has
better performance if increasing rx interrupt moderation a little bit.

For evaluating the MIX policy performance, mixed workloads are tested.
The mixed workloads are combination of throughput-first workload and
latency-first workload. Five different types of combinations are
evaluated.
(pure throughput-first workload, pure latency-first workloads,
 2/3 throughput-first workload + 1/3 latency-first workloads,
 1/3 throughput-first workload + 2/3 latency-first workloads and
 1/2 throughput-first workload + 1/2 latency-first workloads).

For caculating the performance of mixed workloads, a weighted sum system
is also introduced. Here is the formula.

Score = normalized_latency * Weight + normalized_throughput * (1 -
Weight).

If we assume that the user has an equal interest in latency and
throughput performance, the Score for "MIX" policy is on average ~1.52X
than baseline.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 71 +++++++++++++++++++++--------
 1 file changed, 51 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 11b921b..d3f087d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -8966,6 +8966,8 @@ static netdev_features_t i40e_features_check(struct sk_buff *skb,
 #define NET_POLICY_CPU_TX	250
 #define NET_POLICY_BULK_RX	50
 #define NET_POLICY_BULK_TX	125
+#define NET_POLICY_MIX_BULK_RX	62
+#define NET_POLICY_MIX_BULK_TX	122
 #define NET_POLICY_LATENCY_RX	5
 #define NET_POLICY_LATENCY_TX	10
 
@@ -9004,6 +9006,9 @@ static int i40e_ndo_netpolicy_init(struct net_device *dev,
 			set_bit(i, info->avail_policy);
 	}
 
+	/* support MIX policy */
+	info->has_mix_policy = true;
+
 	return 0;
 }
 
@@ -9046,6 +9051,30 @@ static int i40e_ndo_get_irq_info(struct net_device *dev,
 	return 0;
 }
 
+static int i40e_fill_coalesce_for_policy(struct ethtool_coalesce *ec,
+					 enum netpolicy_name name)
+{
+	if (policy_param[name][NETPOLICY_RX] > 0) {
+		ec->rx_coalesce_usecs = policy_param[name][NETPOLICY_RX];
+		ec->use_adaptive_rx_coalesce = 0;
+	} else if (policy_param[name][NETPOLICY_RX] == 0) {
+		ec->use_adaptive_rx_coalesce = 1;
+	} else {
+		return -EINVAL;
+	}
+
+	if (policy_param[name][NETPOLICY_TX] > 0) {
+		ec->tx_coalesce_usecs = policy_param[name][NETPOLICY_TX];
+		ec->use_adaptive_tx_coalesce = 0;
+	} else if (policy_param[name][NETPOLICY_TX] == 0) {
+		ec->use_adaptive_tx_coalesce = 1;
+	} else {
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 /**
  * i40e_set_net_policy
  * @dev: the net device pointer
@@ -9061,28 +9090,30 @@ static int i40e_set_net_policy(struct net_device *dev,
 	struct i40e_vsi *vsi = np->vsi;
 	struct netpolicy_object *obj;
 	struct ethtool_coalesce ec;
-
-	if (policy_param[name][NETPOLICY_RX] > 0) {
-		ec.rx_coalesce_usecs = policy_param[name][NETPOLICY_RX];
-		ec.use_adaptive_rx_coalesce = 0;
-	} else if (policy_param[name][NETPOLICY_RX] == 0) {
-		ec.use_adaptive_rx_coalesce = 1;
-	} else {
-		return -EINVAL;
-	}
-
-	if (policy_param[name][NETPOLICY_TX] > 0) {
-		ec.tx_coalesce_usecs = policy_param[name][NETPOLICY_TX];
-		ec.use_adaptive_tx_coalesce = 0;
-	} else if (policy_param[name][NETPOLICY_TX] == 0) {
-		ec.use_adaptive_tx_coalesce = 1;
-	} else {
-		return -EINVAL;
-	}
+	int i, ret;
 
 	/*For i40e driver, tx and rx are always in pair */
-	list_for_each_entry(obj, &dev->netpolicy->obj_list[NETPOLICY_RX][name], list) {
-		i40e_set_itr_per_queue(vsi, &ec, obj->queue);
+	if (name == NET_POLICY_MIX) {
+		/* Under MIX policy, the paramers for BULK object are different */
+		policy_param[NET_POLICY_BULK][NETPOLICY_RX] = NET_POLICY_MIX_BULK_RX;
+		policy_param[NET_POLICY_BULK][NETPOLICY_TX] = NET_POLICY_MIX_BULK_TX;
+		for (i = NET_POLICY_NONE; i < NET_POLICY_MAX; i++) {
+			ret = i40e_fill_coalesce_for_policy(&ec, i);
+			if (ret)
+				return ret;
+			list_for_each_entry(obj, &dev->netpolicy->obj_list[NETPOLICY_RX][i], list) {
+				i40e_set_itr_per_queue(vsi, &ec, obj->queue);
+			}
+		}
+	} else {
+		policy_param[NET_POLICY_BULK][NETPOLICY_RX] = NET_POLICY_BULK_RX;
+		policy_param[NET_POLICY_BULK][NETPOLICY_TX] = NET_POLICY_BULK_TX;
+		ret = i40e_fill_coalesce_for_policy(&ec, name);
+		if (ret)
+			return ret;
+		list_for_each_entry(obj, &dev->netpolicy->obj_list[NETPOLICY_RX][name], list) {
+			i40e_set_itr_per_queue(vsi, &ec, obj->queue);
+		}
 	}
 
 	return 0;
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [RFC PATCH 16/30] net/netpolicy: net device hotplug
  2016-07-18  6:55 ` [Intel-wired-lan] " kan.liang
@ 2016-07-18  6:56   ` kan.liang
  -1 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: davem, linux-kernel, intel-wired-lan, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

Support net device up/down/namechange in the netpolicy code.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 net/core/netpolicy.c | 66 +++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 58 insertions(+), 8 deletions(-)

diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 8336106..2a04fcf 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -684,6 +684,9 @@ static const struct file_operations proc_net_policy_operations = {
 
 static int netpolicy_proc_dev_init(struct net *net, struct net_device *dev)
 {
+	if (dev->proc_dev)
+		proc_remove(dev->proc_dev);
+
 	dev->proc_dev = proc_net_mkdir(net, dev->name, net->proc_netpolicy);
 	if (!dev->proc_dev)
 		return -ENOMEM;
@@ -750,6 +753,19 @@ void uninit_netpolicy(struct net_device *dev)
 	spin_unlock(&dev->np_lock);
 }
 
+static void netpolicy_dev_init(struct net *net,
+			       struct net_device *dev)
+{
+	if (!init_netpolicy(dev)) {
+#ifdef CONFIG_PROC_FS
+		if (netpolicy_proc_dev_init(net, dev))
+			uninit_netpolicy(dev);
+		else
+#endif /* CONFIG_PROC_FS */
+		pr_info("NETPOLICY: Init net policy for %s\n", dev->name);
+	}
+}
+
 static int __net_init netpolicy_net_init(struct net *net)
 {
 	struct net_device *dev, *aux;
@@ -762,14 +778,7 @@ static int __net_init netpolicy_net_init(struct net *net)
 #endif /* CONFIG_PROC_FS */
 
 	for_each_netdev_safe(net, dev, aux) {
-		if (!init_netpolicy(dev)) {
-#ifdef CONFIG_PROC_FS
-			if (netpolicy_proc_dev_init(net, dev))
-				uninit_netpolicy(dev);
-			else
-#endif /* CONFIG_PROC_FS */
-			pr_info("NETPOLICY: Init net policy for %s\n", dev->name);
-		}
+		netpolicy_dev_init(net, dev);
 	}
 
 	return 0;
@@ -791,17 +800,58 @@ static struct pernet_operations netpolicy_net_ops = {
 	.exit = netpolicy_net_exit,
 };
 
+static int netpolicy_notify(struct notifier_block *this,
+			    unsigned long event,
+			    void *ptr)
+{
+	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+
+	switch (event) {
+	case NETDEV_CHANGENAME:
+#ifdef CONFIG_PROC_FS
+		if (dev->proc_dev) {
+			proc_remove(dev->proc_dev);
+			if ((netpolicy_proc_dev_init(dev_net(dev), dev) < 0) &&
+			    dev->proc_dev) {
+				proc_remove(dev->proc_dev);
+				dev->proc_dev = NULL;
+			}
+		}
+#endif
+		break;
+	case NETDEV_UP:
+		netpolicy_dev_init(dev_net(dev), dev);
+		break;
+	case NETDEV_GOING_DOWN:
+		uninit_netpolicy(dev);
+#ifdef CONFIG_PROC_FS
+		proc_remove(dev->proc_dev);
+		dev->proc_dev = NULL;
+#endif
+		break;
+	}
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block netpolicy_dev_notf = {
+	.notifier_call = netpolicy_notify,
+};
+
 static int __init netpolicy_init(void)
 {
 	int ret;
 
 	ret = register_pernet_subsys(&netpolicy_net_ops);
+	if (!ret)
+		register_netdevice_notifier(&netpolicy_dev_notf);
 
 	return ret;
 }
 
 static void __exit netpolicy_exit(void)
 {
+	unregister_netdevice_notifier(&netpolicy_dev_notf);
 	unregister_pernet_subsys(&netpolicy_net_ops);
 }
 
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 16/30] net/netpolicy: net device hotplug
@ 2016-07-18  6:56   ` kan.liang
  0 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: intel-wired-lan

From: Kan Liang <kan.liang@intel.com>

Support net device up/down/namechange in the netpolicy code.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 net/core/netpolicy.c | 66 +++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 58 insertions(+), 8 deletions(-)

diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 8336106..2a04fcf 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -684,6 +684,9 @@ static const struct file_operations proc_net_policy_operations = {
 
 static int netpolicy_proc_dev_init(struct net *net, struct net_device *dev)
 {
+	if (dev->proc_dev)
+		proc_remove(dev->proc_dev);
+
 	dev->proc_dev = proc_net_mkdir(net, dev->name, net->proc_netpolicy);
 	if (!dev->proc_dev)
 		return -ENOMEM;
@@ -750,6 +753,19 @@ void uninit_netpolicy(struct net_device *dev)
 	spin_unlock(&dev->np_lock);
 }
 
+static void netpolicy_dev_init(struct net *net,
+			       struct net_device *dev)
+{
+	if (!init_netpolicy(dev)) {
+#ifdef CONFIG_PROC_FS
+		if (netpolicy_proc_dev_init(net, dev))
+			uninit_netpolicy(dev);
+		else
+#endif /* CONFIG_PROC_FS */
+		pr_info("NETPOLICY: Init net policy for %s\n", dev->name);
+	}
+}
+
 static int __net_init netpolicy_net_init(struct net *net)
 {
 	struct net_device *dev, *aux;
@@ -762,14 +778,7 @@ static int __net_init netpolicy_net_init(struct net *net)
 #endif /* CONFIG_PROC_FS */
 
 	for_each_netdev_safe(net, dev, aux) {
-		if (!init_netpolicy(dev)) {
-#ifdef CONFIG_PROC_FS
-			if (netpolicy_proc_dev_init(net, dev))
-				uninit_netpolicy(dev);
-			else
-#endif /* CONFIG_PROC_FS */
-			pr_info("NETPOLICY: Init net policy for %s\n", dev->name);
-		}
+		netpolicy_dev_init(net, dev);
 	}
 
 	return 0;
@@ -791,17 +800,58 @@ static struct pernet_operations netpolicy_net_ops = {
 	.exit = netpolicy_net_exit,
 };
 
+static int netpolicy_notify(struct notifier_block *this,
+			    unsigned long event,
+			    void *ptr)
+{
+	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+
+	switch (event) {
+	case NETDEV_CHANGENAME:
+#ifdef CONFIG_PROC_FS
+		if (dev->proc_dev) {
+			proc_remove(dev->proc_dev);
+			if ((netpolicy_proc_dev_init(dev_net(dev), dev) < 0) &&
+			    dev->proc_dev) {
+				proc_remove(dev->proc_dev);
+				dev->proc_dev = NULL;
+			}
+		}
+#endif
+		break;
+	case NETDEV_UP:
+		netpolicy_dev_init(dev_net(dev), dev);
+		break;
+	case NETDEV_GOING_DOWN:
+		uninit_netpolicy(dev);
+#ifdef CONFIG_PROC_FS
+		proc_remove(dev->proc_dev);
+		dev->proc_dev = NULL;
+#endif
+		break;
+	}
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block netpolicy_dev_notf = {
+	.notifier_call = netpolicy_notify,
+};
+
 static int __init netpolicy_init(void)
 {
 	int ret;
 
 	ret = register_pernet_subsys(&netpolicy_net_ops);
+	if (!ret)
+		register_netdevice_notifier(&netpolicy_dev_notf);
 
 	return ret;
 }
 
 static void __exit netpolicy_exit(void)
 {
+	unregister_netdevice_notifier(&netpolicy_dev_notf);
 	unregister_pernet_subsys(&netpolicy_net_ops);
 }
 
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [RFC PATCH 17/30] net/netpolicy: support CPU hotplug
  2016-07-18  6:55 ` [Intel-wired-lan] " kan.liang
@ 2016-07-18  6:56   ` kan.liang
  -1 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: davem, linux-kernel, intel-wired-lan, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

For CPU hotplug, the net policy module will rebuild the sys map and
object list.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 net/core/netpolicy.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)

diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 2a04fcf..46af407 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -37,6 +37,7 @@
 #include <net/net_namespace.h>
 #include <linux/sort.h>
 #include <linux/ctype.h>
+#include <linux/cpu.h>
 
 static int netpolicy_get_dev_info(struct net_device *dev,
 				  struct netpolicy_dev_info *d_info)
@@ -838,6 +839,75 @@ static struct notifier_block netpolicy_dev_notf = {
 	.notifier_call = netpolicy_notify,
 };
 
+/**
+ * update_netpolicy_sys_map() - rebuild the sys map and object list
+ *
+ * This function go through all the available net policy supported device,
+ * and rebuild sys map and object list.
+ *
+ */
+void update_netpolicy_sys_map(void)
+{
+	struct net *net;
+	struct net_device *dev, *aux;
+	enum netpolicy_name cur_policy;
+
+	for_each_net(net) {
+		for_each_netdev_safe(net, dev, aux) {
+			spin_lock(&dev->np_lock);
+			if (!dev->netpolicy)
+				goto unlock;
+			cur_policy = dev->netpolicy->cur_policy;
+			if (cur_policy == NET_POLICY_NONE)
+				goto unlock;
+
+			dev->netpolicy->cur_policy = NET_POLICY_NONE;
+
+			/* rebuild everything */
+			netpolicy_disable(dev);
+			netpolicy_enable(dev);
+			if (netpolicy_gen_obj_list(dev, cur_policy)) {
+				pr_warn("NETPOLICY: Failed to generate "
+					"netpolicy object list for dev %s\n",
+					dev->name);
+				netpolicy_disable(dev);
+				goto unlock;
+			}
+			if (dev->netdev_ops->ndo_set_net_policy(dev, cur_policy)) {
+				pr_warn("NETPOLICY: Failed to set "
+					"netpolicy for dev %s\n",
+					dev->name);
+				netpolicy_disable(dev);
+				goto unlock;
+			}
+
+			dev->netpolicy->cur_policy = cur_policy;
+unlock:
+			spin_unlock(&dev->np_lock);
+		}
+	}
+}
+
+static int netpolicy_cpu_callback(struct notifier_block *nfb,
+				  unsigned long action, void *hcpu)
+{
+	switch (action & ~CPU_TASKS_FROZEN) {
+	case CPU_ONLINE:
+		update_netpolicy_sys_map();
+		break;
+	case CPU_DYING:
+		update_netpolicy_sys_map();
+		break;
+	}
+	return NOTIFY_OK;
+}
+
+static struct notifier_block netpolicy_cpu_notifier = {
+	&netpolicy_cpu_callback,
+	NULL,
+	0
+};
+
 static int __init netpolicy_init(void)
 {
 	int ret;
@@ -846,6 +916,10 @@ static int __init netpolicy_init(void)
 	if (!ret)
 		register_netdevice_notifier(&netpolicy_dev_notf);
 
+	cpu_notifier_register_begin();
+	__register_cpu_notifier(&netpolicy_cpu_notifier);
+	cpu_notifier_register_done();
+
 	return ret;
 }
 
@@ -853,6 +927,10 @@ static void __exit netpolicy_exit(void)
 {
 	unregister_netdevice_notifier(&netpolicy_dev_notf);
 	unregister_pernet_subsys(&netpolicy_net_ops);
+
+	cpu_notifier_register_begin();
+	__unregister_cpu_notifier(&netpolicy_cpu_notifier);
+	cpu_notifier_register_done();
 }
 
 subsys_initcall(netpolicy_init);
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 17/30] net/netpolicy: support CPU hotplug
@ 2016-07-18  6:56   ` kan.liang
  0 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: intel-wired-lan

From: Kan Liang <kan.liang@intel.com>

For CPU hotplug, the net policy module will rebuild the sys map and
object list.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 net/core/netpolicy.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)

diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 2a04fcf..46af407 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -37,6 +37,7 @@
 #include <net/net_namespace.h>
 #include <linux/sort.h>
 #include <linux/ctype.h>
+#include <linux/cpu.h>
 
 static int netpolicy_get_dev_info(struct net_device *dev,
 				  struct netpolicy_dev_info *d_info)
@@ -838,6 +839,75 @@ static struct notifier_block netpolicy_dev_notf = {
 	.notifier_call = netpolicy_notify,
 };
 
+/**
+ * update_netpolicy_sys_map() - rebuild the sys map and object list
+ *
+ * This function go through all the available net policy supported device,
+ * and rebuild sys map and object list.
+ *
+ */
+void update_netpolicy_sys_map(void)
+{
+	struct net *net;
+	struct net_device *dev, *aux;
+	enum netpolicy_name cur_policy;
+
+	for_each_net(net) {
+		for_each_netdev_safe(net, dev, aux) {
+			spin_lock(&dev->np_lock);
+			if (!dev->netpolicy)
+				goto unlock;
+			cur_policy = dev->netpolicy->cur_policy;
+			if (cur_policy == NET_POLICY_NONE)
+				goto unlock;
+
+			dev->netpolicy->cur_policy = NET_POLICY_NONE;
+
+			/* rebuild everything */
+			netpolicy_disable(dev);
+			netpolicy_enable(dev);
+			if (netpolicy_gen_obj_list(dev, cur_policy)) {
+				pr_warn("NETPOLICY: Failed to generate "
+					"netpolicy object list for dev %s\n",
+					dev->name);
+				netpolicy_disable(dev);
+				goto unlock;
+			}
+			if (dev->netdev_ops->ndo_set_net_policy(dev, cur_policy)) {
+				pr_warn("NETPOLICY: Failed to set "
+					"netpolicy for dev %s\n",
+					dev->name);
+				netpolicy_disable(dev);
+				goto unlock;
+			}
+
+			dev->netpolicy->cur_policy = cur_policy;
+unlock:
+			spin_unlock(&dev->np_lock);
+		}
+	}
+}
+
+static int netpolicy_cpu_callback(struct notifier_block *nfb,
+				  unsigned long action, void *hcpu)
+{
+	switch (action & ~CPU_TASKS_FROZEN) {
+	case CPU_ONLINE:
+		update_netpolicy_sys_map();
+		break;
+	case CPU_DYING:
+		update_netpolicy_sys_map();
+		break;
+	}
+	return NOTIFY_OK;
+}
+
+static struct notifier_block netpolicy_cpu_notifier = {
+	&netpolicy_cpu_callback,
+	NULL,
+	0
+};
+
 static int __init netpolicy_init(void)
 {
 	int ret;
@@ -846,6 +916,10 @@ static int __init netpolicy_init(void)
 	if (!ret)
 		register_netdevice_notifier(&netpolicy_dev_notf);
 
+	cpu_notifier_register_begin();
+	__register_cpu_notifier(&netpolicy_cpu_notifier);
+	cpu_notifier_register_done();
+
 	return ret;
 }
 
@@ -853,6 +927,10 @@ static void __exit netpolicy_exit(void)
 {
 	unregister_netdevice_notifier(&netpolicy_dev_notf);
 	unregister_pernet_subsys(&netpolicy_net_ops);
+
+	cpu_notifier_register_begin();
+	__unregister_cpu_notifier(&netpolicy_cpu_notifier);
+	cpu_notifier_register_done();
 }
 
 subsys_initcall(netpolicy_init);
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [RFC PATCH 18/30] net/netpolicy: handle channel changes
  2016-07-18  6:55 ` [Intel-wired-lan] " kan.liang
@ 2016-07-18  6:56   ` kan.liang
  -1 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: davem, linux-kernel, intel-wired-lan, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

User can uses ethtool to set the channel number. This patch handles the
channel changes by rebuilding the object list.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netpolicy.h | 8 ++++++++
 net/core/ethtool.c        | 8 +++++++-
 net/core/netpolicy.c      | 1 +
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index 579ff98..cc75e3c 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -79,4 +79,12 @@ struct netpolicy_info {
 	struct list_head		obj_list[NETPOLICY_RXTX][NET_POLICY_MAX];
 };
 
+#ifdef CONFIG_NETPOLICY
+extern void update_netpolicy_sys_map(void);
+#else
+static inline void update_netpolicy_sys_map(void)
+{
+}
+#endif
+
 #endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 9774898..e1f8bd0 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -1703,6 +1703,7 @@ static noinline_for_stack int ethtool_set_channels(struct net_device *dev,
 {
 	struct ethtool_channels channels, max;
 	u32 max_rx_in_use = 0;
+	int ret;
 
 	if (!dev->ethtool_ops->set_channels || !dev->ethtool_ops->get_channels)
 		return -EOPNOTSUPP;
@@ -1726,7 +1727,12 @@ static noinline_for_stack int ethtool_set_channels(struct net_device *dev,
 	    (channels.combined_count + channels.rx_count) <= max_rx_in_use)
 	    return -EINVAL;
 
-	return dev->ethtool_ops->set_channels(dev, &channels);
+	ret = dev->ethtool_ops->set_channels(dev, &channels);
+#ifdef CONFIG_NETPOLICY
+	if (!ret)
+		update_netpolicy_sys_map();
+#endif
+	return ret;
 }
 
 static int ethtool_get_pauseparam(struct net_device *dev, void __user *useraddr)
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 46af407..da7d9f1 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -887,6 +887,7 @@ unlock:
 		}
 	}
 }
+EXPORT_SYMBOL(update_netpolicy_sys_map);
 
 static int netpolicy_cpu_callback(struct notifier_block *nfb,
 				  unsigned long action, void *hcpu)
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 18/30] net/netpolicy: handle channel changes
@ 2016-07-18  6:56   ` kan.liang
  0 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: intel-wired-lan

From: Kan Liang <kan.liang@intel.com>

User can uses ethtool to set the channel number. This patch handles the
channel changes by rebuilding the object list.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netpolicy.h | 8 ++++++++
 net/core/ethtool.c        | 8 +++++++-
 net/core/netpolicy.c      | 1 +
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index 579ff98..cc75e3c 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -79,4 +79,12 @@ struct netpolicy_info {
 	struct list_head		obj_list[NETPOLICY_RXTX][NET_POLICY_MAX];
 };
 
+#ifdef CONFIG_NETPOLICY
+extern void update_netpolicy_sys_map(void);
+#else
+static inline void update_netpolicy_sys_map(void)
+{
+}
+#endif
+
 #endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 9774898..e1f8bd0 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -1703,6 +1703,7 @@ static noinline_for_stack int ethtool_set_channels(struct net_device *dev,
 {
 	struct ethtool_channels channels, max;
 	u32 max_rx_in_use = 0;
+	int ret;
 
 	if (!dev->ethtool_ops->set_channels || !dev->ethtool_ops->get_channels)
 		return -EOPNOTSUPP;
@@ -1726,7 +1727,12 @@ static noinline_for_stack int ethtool_set_channels(struct net_device *dev,
 	    (channels.combined_count + channels.rx_count) <= max_rx_in_use)
 	    return -EINVAL;
 
-	return dev->ethtool_ops->set_channels(dev, &channels);
+	ret = dev->ethtool_ops->set_channels(dev, &channels);
+#ifdef CONFIG_NETPOLICY
+	if (!ret)
+		update_netpolicy_sys_map();
+#endif
+	return ret;
 }
 
 static int ethtool_get_pauseparam(struct net_device *dev, void __user *useraddr)
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 46af407..da7d9f1 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -887,6 +887,7 @@ unlock:
 		}
 	}
 }
+EXPORT_SYMBOL(update_netpolicy_sys_map);
 
 static int netpolicy_cpu_callback(struct notifier_block *nfb,
 				  unsigned long action, void *hcpu)
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [RFC PATCH 19/30] net/netpolicy: implement netpolicy register
  2016-07-18  6:55 ` [Intel-wired-lan] " kan.liang
@ 2016-07-18  6:56   ` kan.liang
  -1 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: davem, linux-kernel, intel-wired-lan, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

User can register itself in netpolicy module with specific policy.
If it's the first time to register, an record will be created and
inserted into RCU hash table. The record includes ptr, policy and object
information. ptr is assigned by the user which is used as key to search
the record in hash table. Object will be assigned by netpolicy later.

If CPU/device are removed(hotplug), the assigned object will be clear.
This patch also introduces a new type NET_POLICY_INVALID, which
indicates that the task/socket are not registered.
np_hashtable_lock is introduced to protect the hash table.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netpolicy.h |  26 ++++++++
 net/core/netpolicy.c      | 150 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 176 insertions(+)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index cc75e3c..89361d9 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -17,6 +17,7 @@
 #define __LINUX_NETPOLICY_H
 
 enum netpolicy_name {
+	NET_POLICY_INVALID	= -1,
 	NET_POLICY_NONE		= 0,
 	NET_POLICY_CPU,
 	NET_POLICY_BULK,
@@ -79,12 +80,37 @@ struct netpolicy_info {
 	struct list_head		obj_list[NETPOLICY_RXTX][NET_POLICY_MAX];
 };
 
+struct netpolicy_reg {
+	struct net_device	*dev;
+	enum netpolicy_name	policy; /* required policy */
+	void			*ptr;   /* pointers */
+};
+
+/* check if policy is valid */
+static inline int is_net_policy_valid(enum netpolicy_name policy)
+{
+	return ((policy < NET_POLICY_MAX) && (policy > NET_POLICY_INVALID));
+}
+
 #ifdef CONFIG_NETPOLICY
 extern void update_netpolicy_sys_map(void);
+extern int netpolicy_register(struct netpolicy_reg *reg,
+			      enum netpolicy_name policy);
+extern void netpolicy_unregister(struct netpolicy_reg *reg);
 #else
 static inline void update_netpolicy_sys_map(void)
 {
 }
+
+static inline int netpolicy_register(struct netpolicy_reg *reg,
+				     enum netpolicy_name policy)
+{	return 0;
+}
+
+static inline void netpolicy_unregister(struct netpolicy_reg *reg)
+{
+}
+
 #endif
 
 #endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index da7d9f1..13ab5e1 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -38,6 +38,19 @@
 #include <linux/sort.h>
 #include <linux/ctype.h>
 #include <linux/cpu.h>
+#include <linux/hashtable.h>
+
+struct netpolicy_record {
+	struct hlist_node	hash_node;
+	unsigned long		ptr_id;
+	enum netpolicy_name	policy;
+	struct net_device	*dev;
+	struct netpolicy_object	*rx_obj;
+	struct netpolicy_object	*tx_obj;
+};
+
+static DEFINE_HASHTABLE(np_record_hash, 10);
+static DEFINE_SPINLOCK(np_hashtable_lock);
 
 static int netpolicy_get_dev_info(struct net_device *dev,
 				  struct netpolicy_dev_info *d_info)
@@ -223,6 +236,140 @@ static int netpolicy_enable(struct net_device *dev)
 	return 0;
 }
 
+static struct netpolicy_record *netpolicy_record_search(unsigned long ptr_id)
+{
+	struct netpolicy_record *rec = NULL;
+
+	hash_for_each_possible_rcu(np_record_hash, rec, hash_node, ptr_id) {
+		if (rec->ptr_id == ptr_id)
+			break;
+	}
+
+	return rec;
+}
+
+static void netpolicy_record_clear_obj(void)
+{
+	struct netpolicy_record *rec;
+	int i;
+
+	spin_lock_bh(&np_hashtable_lock);
+	hash_for_each_rcu(np_record_hash, i, rec, hash_node) {
+		rec->rx_obj = NULL;
+		rec->tx_obj = NULL;
+	}
+	spin_unlock_bh(&np_hashtable_lock);
+}
+
+static void netpolicy_record_clear_dev_node(struct net_device *dev)
+{
+	struct netpolicy_record *rec;
+	int i;
+
+	spin_lock_bh(&np_hashtable_lock);
+	hash_for_each_rcu(np_record_hash, i, rec, hash_node) {
+		if (rec->dev == dev) {
+			hash_del_rcu(&rec->hash_node);
+			kfree(rec);
+		}
+	}
+	spin_unlock_bh(&np_hashtable_lock);
+}
+
+static void put_queue(struct net_device *dev,
+		      struct netpolicy_object *rx_obj,
+		      struct netpolicy_object *tx_obj)
+{
+	if (!dev || !dev->netpolicy)
+		return;
+
+	if (rx_obj)
+		atomic_dec(&rx_obj->refcnt);
+	if (tx_obj)
+		atomic_dec(&tx_obj->refcnt);
+}
+
+/**
+ * netpolicy_register() - Register per socket/task policy request
+ * @reg:	NET policy register info
+ * @policy:	request NET policy
+ *
+ * This function intends to register per socket/task policy request.
+ * If it's the first time to register, an record will be created and
+ * inserted into RCU hash table.
+ *
+ * The record includes ptr, policy and object info. ptr of the socket/task
+ * is the key to search the record in hash table. Object will be assigned
+ * until the first packet is received/transmitted.
+ *
+ * Return: 0 on success, others on failure
+ */
+int netpolicy_register(struct netpolicy_reg *reg,
+		       enum netpolicy_name policy)
+{
+	unsigned long ptr_id = (uintptr_t)reg->ptr;
+	struct netpolicy_record *new, *old;
+
+	if (!is_net_policy_valid(policy)) {
+		reg->policy = NET_POLICY_INVALID;
+		return -EINVAL;
+	}
+
+	new = kzalloc(sizeof(*new), GFP_KERNEL);
+	if (!new) {
+		reg->policy = NET_POLICY_INVALID;
+		return -ENOMEM;
+	}
+
+	spin_lock_bh(&np_hashtable_lock);
+	/* Check it in mapping table */
+	old = netpolicy_record_search(ptr_id);
+	if (old) {
+		if (old->policy != policy) {
+			put_queue(old->dev, old->rx_obj, old->tx_obj);
+			old->policy = policy;
+		}
+		kfree(new);
+	} else {
+		new->ptr_id = ptr_id;
+		new->dev = reg->dev;
+		new->policy = policy;
+		hash_add_rcu(np_record_hash, &new->hash_node, ptr_id);
+	}
+	reg->policy = policy;
+	spin_unlock_bh(&np_hashtable_lock);
+
+	return 0;
+}
+EXPORT_SYMBOL(netpolicy_register);
+
+/**
+ * netpolicy_unregister() - Unregister per socket/task policy request
+ * @reg:	NET policy register info
+ *
+ * This function intends to unregister policy request by del related record
+ * from hash table.
+ *
+ */
+void netpolicy_unregister(struct netpolicy_reg *reg)
+{
+	struct netpolicy_record *record;
+	unsigned long ptr_id = (uintptr_t)reg->ptr;
+
+	spin_lock_bh(&np_hashtable_lock);
+	/* del from hash table */
+	record = netpolicy_record_search(ptr_id);
+	if (record) {
+		hash_del_rcu(&record->hash_node);
+		/* The record cannot be share. It can be safely free. */
+		put_queue(record->dev, record->rx_obj, record->tx_obj);
+		kfree(record);
+	}
+	reg->policy = NET_POLICY_INVALID;
+	spin_unlock_bh(&np_hashtable_lock);
+}
+EXPORT_SYMBOL(netpolicy_unregister);
+
 const char *policy_name[NET_POLICY_MAX] = {
 	"NONE",
 	"CPU",
@@ -825,6 +972,7 @@ static int netpolicy_notify(struct notifier_block *this,
 		break;
 	case NETDEV_GOING_DOWN:
 		uninit_netpolicy(dev);
+		netpolicy_record_clear_dev_node(dev);
 #ifdef CONFIG_PROC_FS
 		proc_remove(dev->proc_dev);
 		dev->proc_dev = NULL;
@@ -863,6 +1011,8 @@ void update_netpolicy_sys_map(void)
 
 			dev->netpolicy->cur_policy = NET_POLICY_NONE;
 
+			/* clear mapping table */
+			netpolicy_record_clear_obj();
 			/* rebuild everything */
 			netpolicy_disable(dev);
 			netpolicy_enable(dev);
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 19/30] net/netpolicy: implement netpolicy register
@ 2016-07-18  6:56   ` kan.liang
  0 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: intel-wired-lan

From: Kan Liang <kan.liang@intel.com>

User can register itself in netpolicy module with specific policy.
If it's the first time to register, an record will be created and
inserted into RCU hash table. The record includes ptr, policy and object
information. ptr is assigned by the user which is used as key to search
the record in hash table. Object will be assigned by netpolicy later.

If CPU/device are removed(hotplug), the assigned object will be clear.
This patch also introduces a new type NET_POLICY_INVALID, which
indicates that the task/socket are not registered.
np_hashtable_lock is introduced to protect the hash table.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netpolicy.h |  26 ++++++++
 net/core/netpolicy.c      | 150 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 176 insertions(+)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index cc75e3c..89361d9 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -17,6 +17,7 @@
 #define __LINUX_NETPOLICY_H
 
 enum netpolicy_name {
+	NET_POLICY_INVALID	= -1,
 	NET_POLICY_NONE		= 0,
 	NET_POLICY_CPU,
 	NET_POLICY_BULK,
@@ -79,12 +80,37 @@ struct netpolicy_info {
 	struct list_head		obj_list[NETPOLICY_RXTX][NET_POLICY_MAX];
 };
 
+struct netpolicy_reg {
+	struct net_device	*dev;
+	enum netpolicy_name	policy; /* required policy */
+	void			*ptr;   /* pointers */
+};
+
+/* check if policy is valid */
+static inline int is_net_policy_valid(enum netpolicy_name policy)
+{
+	return ((policy < NET_POLICY_MAX) && (policy > NET_POLICY_INVALID));
+}
+
 #ifdef CONFIG_NETPOLICY
 extern void update_netpolicy_sys_map(void);
+extern int netpolicy_register(struct netpolicy_reg *reg,
+			      enum netpolicy_name policy);
+extern void netpolicy_unregister(struct netpolicy_reg *reg);
 #else
 static inline void update_netpolicy_sys_map(void)
 {
 }
+
+static inline int netpolicy_register(struct netpolicy_reg *reg,
+				     enum netpolicy_name policy)
+{	return 0;
+}
+
+static inline void netpolicy_unregister(struct netpolicy_reg *reg)
+{
+}
+
 #endif
 
 #endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index da7d9f1..13ab5e1 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -38,6 +38,19 @@
 #include <linux/sort.h>
 #include <linux/ctype.h>
 #include <linux/cpu.h>
+#include <linux/hashtable.h>
+
+struct netpolicy_record {
+	struct hlist_node	hash_node;
+	unsigned long		ptr_id;
+	enum netpolicy_name	policy;
+	struct net_device	*dev;
+	struct netpolicy_object	*rx_obj;
+	struct netpolicy_object	*tx_obj;
+};
+
+static DEFINE_HASHTABLE(np_record_hash, 10);
+static DEFINE_SPINLOCK(np_hashtable_lock);
 
 static int netpolicy_get_dev_info(struct net_device *dev,
 				  struct netpolicy_dev_info *d_info)
@@ -223,6 +236,140 @@ static int netpolicy_enable(struct net_device *dev)
 	return 0;
 }
 
+static struct netpolicy_record *netpolicy_record_search(unsigned long ptr_id)
+{
+	struct netpolicy_record *rec = NULL;
+
+	hash_for_each_possible_rcu(np_record_hash, rec, hash_node, ptr_id) {
+		if (rec->ptr_id == ptr_id)
+			break;
+	}
+
+	return rec;
+}
+
+static void netpolicy_record_clear_obj(void)
+{
+	struct netpolicy_record *rec;
+	int i;
+
+	spin_lock_bh(&np_hashtable_lock);
+	hash_for_each_rcu(np_record_hash, i, rec, hash_node) {
+		rec->rx_obj = NULL;
+		rec->tx_obj = NULL;
+	}
+	spin_unlock_bh(&np_hashtable_lock);
+}
+
+static void netpolicy_record_clear_dev_node(struct net_device *dev)
+{
+	struct netpolicy_record *rec;
+	int i;
+
+	spin_lock_bh(&np_hashtable_lock);
+	hash_for_each_rcu(np_record_hash, i, rec, hash_node) {
+		if (rec->dev == dev) {
+			hash_del_rcu(&rec->hash_node);
+			kfree(rec);
+		}
+	}
+	spin_unlock_bh(&np_hashtable_lock);
+}
+
+static void put_queue(struct net_device *dev,
+		      struct netpolicy_object *rx_obj,
+		      struct netpolicy_object *tx_obj)
+{
+	if (!dev || !dev->netpolicy)
+		return;
+
+	if (rx_obj)
+		atomic_dec(&rx_obj->refcnt);
+	if (tx_obj)
+		atomic_dec(&tx_obj->refcnt);
+}
+
+/**
+ * netpolicy_register() - Register per socket/task policy request
+ * @reg:	NET policy register info
+ * @policy:	request NET policy
+ *
+ * This function intends to register per socket/task policy request.
+ * If it's the first time to register, an record will be created and
+ * inserted into RCU hash table.
+ *
+ * The record includes ptr, policy and object info. ptr of the socket/task
+ * is the key to search the record in hash table. Object will be assigned
+ * until the first packet is received/transmitted.
+ *
+ * Return: 0 on success, others on failure
+ */
+int netpolicy_register(struct netpolicy_reg *reg,
+		       enum netpolicy_name policy)
+{
+	unsigned long ptr_id = (uintptr_t)reg->ptr;
+	struct netpolicy_record *new, *old;
+
+	if (!is_net_policy_valid(policy)) {
+		reg->policy = NET_POLICY_INVALID;
+		return -EINVAL;
+	}
+
+	new = kzalloc(sizeof(*new), GFP_KERNEL);
+	if (!new) {
+		reg->policy = NET_POLICY_INVALID;
+		return -ENOMEM;
+	}
+
+	spin_lock_bh(&np_hashtable_lock);
+	/* Check it in mapping table */
+	old = netpolicy_record_search(ptr_id);
+	if (old) {
+		if (old->policy != policy) {
+			put_queue(old->dev, old->rx_obj, old->tx_obj);
+			old->policy = policy;
+		}
+		kfree(new);
+	} else {
+		new->ptr_id = ptr_id;
+		new->dev = reg->dev;
+		new->policy = policy;
+		hash_add_rcu(np_record_hash, &new->hash_node, ptr_id);
+	}
+	reg->policy = policy;
+	spin_unlock_bh(&np_hashtable_lock);
+
+	return 0;
+}
+EXPORT_SYMBOL(netpolicy_register);
+
+/**
+ * netpolicy_unregister() - Unregister per socket/task policy request
+ * @reg:	NET policy register info
+ *
+ * This function intends to unregister policy request by del related record
+ * from hash table.
+ *
+ */
+void netpolicy_unregister(struct netpolicy_reg *reg)
+{
+	struct netpolicy_record *record;
+	unsigned long ptr_id = (uintptr_t)reg->ptr;
+
+	spin_lock_bh(&np_hashtable_lock);
+	/* del from hash table */
+	record = netpolicy_record_search(ptr_id);
+	if (record) {
+		hash_del_rcu(&record->hash_node);
+		/* The record cannot be share. It can be safely free. */
+		put_queue(record->dev, record->rx_obj, record->tx_obj);
+		kfree(record);
+	}
+	reg->policy = NET_POLICY_INVALID;
+	spin_unlock_bh(&np_hashtable_lock);
+}
+EXPORT_SYMBOL(netpolicy_unregister);
+
 const char *policy_name[NET_POLICY_MAX] = {
 	"NONE",
 	"CPU",
@@ -825,6 +972,7 @@ static int netpolicy_notify(struct notifier_block *this,
 		break;
 	case NETDEV_GOING_DOWN:
 		uninit_netpolicy(dev);
+		netpolicy_record_clear_dev_node(dev);
 #ifdef CONFIG_PROC_FS
 		proc_remove(dev->proc_dev);
 		dev->proc_dev = NULL;
@@ -863,6 +1011,8 @@ void update_netpolicy_sys_map(void)
 
 			dev->netpolicy->cur_policy = NET_POLICY_NONE;
 
+			/* clear mapping table */
+			netpolicy_record_clear_obj();
 			/* rebuild everything */
 			netpolicy_disable(dev);
 			netpolicy_enable(dev);
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [RFC PATCH 20/30] net/netpolicy: introduce per socket netpolicy
  2016-07-18  6:55 ` [Intel-wired-lan] " kan.liang
@ 2016-07-18  6:56   ` kan.liang
  -1 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: davem, linux-kernel, intel-wired-lan, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

The network socket is the most basic unit which control the network
traffic. This patch introduces a new socket option SO_NETPOLICY to
set/get net policy for socket. so that the application can set its own
policy on socket to improve the network performance.
Per socket net policy can also be inherited by new socket.

The usage of SO_NETPOLICY socket option is as below.
setsockopt(sockfd,SOL_SOCKET,SO_NETPOLICY,&policy,sizeof(int))
getsockopt(sockfd,SOL_SOCKET,SO_NETPOLICY,&policy,sizeof(int))
The policy set by SO_NETPOLICY socket option must be valid and
compatible with current device policy. Othrewise, it will error out. The
socket policy will be set to NET_POLICY_INVALID.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 arch/alpha/include/uapi/asm/socket.h   |  2 ++
 arch/avr32/include/uapi/asm/socket.h   |  2 ++
 arch/frv/include/uapi/asm/socket.h     |  2 ++
 arch/ia64/include/uapi/asm/socket.h    |  2 ++
 arch/m32r/include/uapi/asm/socket.h    |  2 ++
 arch/mips/include/uapi/asm/socket.h    |  2 ++
 arch/mn10300/include/uapi/asm/socket.h |  2 ++
 arch/parisc/include/uapi/asm/socket.h  |  2 ++
 arch/powerpc/include/uapi/asm/socket.h |  2 ++
 arch/s390/include/uapi/asm/socket.h    |  2 ++
 arch/sparc/include/uapi/asm/socket.h   |  2 ++
 arch/xtensa/include/uapi/asm/socket.h  |  2 ++
 include/net/request_sock.h             |  4 +++-
 include/net/sock.h                     |  9 +++++++++
 include/uapi/asm-generic/socket.h      |  2 ++
 net/core/sock.c                        | 28 ++++++++++++++++++++++++++++
 16 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index 9e46d6e..06b2ef9 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -97,4 +97,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SO_NETPOLICY		54
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/avr32/include/uapi/asm/socket.h b/arch/avr32/include/uapi/asm/socket.h
index 1fd147f..24f85f0 100644
--- a/arch/avr32/include/uapi/asm/socket.h
+++ b/arch/avr32/include/uapi/asm/socket.h
@@ -90,4 +90,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SO_NETPOLICY		54
+
 #endif /* _UAPI__ASM_AVR32_SOCKET_H */
diff --git a/arch/frv/include/uapi/asm/socket.h b/arch/frv/include/uapi/asm/socket.h
index afbc98f0..82c8d44 100644
--- a/arch/frv/include/uapi/asm/socket.h
+++ b/arch/frv/include/uapi/asm/socket.h
@@ -90,5 +90,7 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SO_NETPOLICY		54
+
 #endif /* _ASM_SOCKET_H */
 
diff --git a/arch/ia64/include/uapi/asm/socket.h b/arch/ia64/include/uapi/asm/socket.h
index 0018fad..b99c1df 100644
--- a/arch/ia64/include/uapi/asm/socket.h
+++ b/arch/ia64/include/uapi/asm/socket.h
@@ -99,4 +99,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SO_NETPOLICY		54
+
 #endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/m32r/include/uapi/asm/socket.h b/arch/m32r/include/uapi/asm/socket.h
index 5fe42fc..71a43ed 100644
--- a/arch/m32r/include/uapi/asm/socket.h
+++ b/arch/m32r/include/uapi/asm/socket.h
@@ -90,4 +90,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SO_NETPOLICY		54
+
 #endif /* _ASM_M32R_SOCKET_H */
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index 2027240a..ce8b9ba 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -108,4 +108,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SO_NETPOLICY		54
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/mn10300/include/uapi/asm/socket.h b/arch/mn10300/include/uapi/asm/socket.h
index 5129f23..c041265 100644
--- a/arch/mn10300/include/uapi/asm/socket.h
+++ b/arch/mn10300/include/uapi/asm/socket.h
@@ -90,4 +90,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SO_NETPOLICY		54
+
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index 9c935d7..2639dcd 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -89,4 +89,6 @@
 
 #define SO_CNX_ADVICE		0x402E
 
+#define SO_NETPOLICY		0x402F
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/powerpc/include/uapi/asm/socket.h b/arch/powerpc/include/uapi/asm/socket.h
index 1672e33..e04e3b6 100644
--- a/arch/powerpc/include/uapi/asm/socket.h
+++ b/arch/powerpc/include/uapi/asm/socket.h
@@ -97,4 +97,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SO_NETPOLICY		54
+
 #endif	/* _ASM_POWERPC_SOCKET_H */
diff --git a/arch/s390/include/uapi/asm/socket.h b/arch/s390/include/uapi/asm/socket.h
index 41b51c2..d43b854 100644
--- a/arch/s390/include/uapi/asm/socket.h
+++ b/arch/s390/include/uapi/asm/socket.h
@@ -96,4 +96,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SO_NETPOLICY		54
+
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index 31aede3..94a2cdf 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -86,6 +86,8 @@
 
 #define SO_CNX_ADVICE		0x0037
 
+#define SO_NETPOLICY		0x0038
+
 /* Security levels - as per NRL IPv6 - don't actually do anything */
 #define SO_SECURITY_AUTHENTICATION		0x5001
 #define SO_SECURITY_ENCRYPTION_TRANSPORT	0x5002
diff --git a/arch/xtensa/include/uapi/asm/socket.h b/arch/xtensa/include/uapi/asm/socket.h
index 81435d9..97f1691 100644
--- a/arch/xtensa/include/uapi/asm/socket.h
+++ b/arch/xtensa/include/uapi/asm/socket.h
@@ -101,4 +101,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SO_NETPOLICY		54
+
 #endif	/* _XTENSA_SOCKET_H */
diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index 6ebe13e..1fa2d0e 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -101,7 +101,9 @@ reqsk_alloc(const struct request_sock_ops *ops, struct sock *sk_listener,
 	sk_tx_queue_clear(req_to_sk(req));
 	req->saved_syn = NULL;
 	atomic_set(&req->rsk_refcnt, 0);
-
+#ifdef CONFIG_NETPOLICY
+	memcpy(&req_to_sk(req)->sk_netpolicy, &sk_listener->sk_netpolicy, sizeof(sk_listener->sk_netpolicy));
+#endif
 	return req;
 }
 
diff --git a/include/net/sock.h b/include/net/sock.h
index 649d2a8..e4721de 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -70,6 +70,7 @@
 #include <net/checksum.h>
 #include <net/tcp_states.h>
 #include <linux/net_tstamp.h>
+#include <linux/netpolicy.h>
 
 /*
  * This structure really needs to be cleaned up.
@@ -141,6 +142,7 @@ typedef __u64 __bitwise __addrpair;
  *		%SO_OOBINLINE settings, %SO_TIMESTAMPING settings
  *	@skc_incoming_cpu: record/match cpu processing incoming packets
  *	@skc_refcnt: reference count
+ *	@skc_netpolicy: per socket net policy
  *
  *	This is the minimal network layer representation of sockets, the header
  *	for struct sock and struct inet_timewait_sock.
@@ -200,6 +202,10 @@ struct sock_common {
 		struct sock	*skc_listener; /* request_sock */
 		struct inet_timewait_death_row *skc_tw_dr; /* inet_timewait_sock */
 	};
+
+#ifdef CONFIG_NETPOLICY
+	struct netpolicy_reg    skc_netpolicy;
+#endif
 	/*
 	 * fields between dontcopy_begin/dontcopy_end
 	 * are not copied in sock_copy()
@@ -339,6 +345,9 @@ struct sock {
 #define sk_incoming_cpu		__sk_common.skc_incoming_cpu
 #define sk_flags		__sk_common.skc_flags
 #define sk_rxhash		__sk_common.skc_rxhash
+#ifdef CONFIG_NETPOLICY
+#define sk_netpolicy		__sk_common.skc_netpolicy
+#endif
 
 	socket_lock_t		sk_lock;
 	struct sk_buff_head	sk_receive_queue;
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index 67d632f..d2a5aeb 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -92,4 +92,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SO_NETPOLICY		54
+
 #endif /* __ASM_GENERIC_SOCKET_H */
diff --git a/net/core/sock.c b/net/core/sock.c
index 08bf97e..6eaaa08 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1002,6 +1002,12 @@ set_rcvbuf:
 		if (val == 1)
 			dst_negative_advice(sk);
 		break;
+
+#ifdef CONFIG_NETPOLICY
+	case SO_NETPOLICY:
+		ret = netpolicy_register(&sk->sk_netpolicy, val);
+		break;
+#endif
 	default:
 		ret = -ENOPROTOOPT;
 		break;
@@ -1262,6 +1268,11 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 		v.val = sk->sk_incoming_cpu;
 		break;
 
+#ifdef CONFIG_NETPOLICY
+	case SO_NETPOLICY:
+		v.val = sk->sk_netpolicy.policy;
+		break;
+#endif
 	default:
 		/* We implement the SO_SNDLOWAT etc to not be settable
 		 * (1003.1g 7).
@@ -1423,6 +1434,12 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 
 		sock_update_classid(&sk->sk_cgrp_data);
 		sock_update_netprioidx(&sk->sk_cgrp_data);
+
+#ifdef CONFIG_NETPOLICY
+		sk->sk_netpolicy.dev = NULL;
+		sk->sk_netpolicy.ptr = (void *)sk;
+		sk->sk_netpolicy.policy = NET_POLICY_INVALID;
+#endif
 	}
 
 	return sk;
@@ -1460,6 +1477,10 @@ static void __sk_destruct(struct rcu_head *head)
 	put_pid(sk->sk_peer_pid);
 	if (likely(sk->sk_net_refcnt))
 		put_net(sock_net(sk));
+#ifdef CONFIG_NETPOLICY
+	if (is_net_policy_valid(sk->sk_netpolicy.policy))
+		netpolicy_unregister(&sk->sk_netpolicy);
+#endif
 	sk_prot_free(sk->sk_prot_creator, sk);
 }
 
@@ -1596,6 +1617,13 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 		if (sock_needs_netstamp(sk) &&
 		    newsk->sk_flags & SK_FLAGS_TIMESTAMP)
 			net_enable_timestamp();
+
+#ifdef CONFIG_NETPOLICY
+		newsk->sk_netpolicy.ptr = (void *)newsk;
+		if (is_net_policy_valid(newsk->sk_netpolicy.policy))
+			netpolicy_register(&newsk->sk_netpolicy, newsk->sk_netpolicy.policy);
+
+#endif
 	}
 out:
 	return newsk;
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 20/30] net/netpolicy: introduce per socket netpolicy
@ 2016-07-18  6:56   ` kan.liang
  0 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: intel-wired-lan

From: Kan Liang <kan.liang@intel.com>

The network socket is the most basic unit which control the network
traffic. This patch introduces a new socket option SO_NETPOLICY to
set/get net policy for socket. so that the application can set its own
policy on socket to improve the network performance.
Per socket net policy can also be inherited by new socket.

The usage of SO_NETPOLICY socket option is as below.
setsockopt(sockfd,SOL_SOCKET,SO_NETPOLICY,&policy,sizeof(int))
getsockopt(sockfd,SOL_SOCKET,SO_NETPOLICY,&policy,sizeof(int))
The policy set by SO_NETPOLICY socket option must be valid and
compatible with current device policy. Othrewise, it will error out. The
socket policy will be set to NET_POLICY_INVALID.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 arch/alpha/include/uapi/asm/socket.h   |  2 ++
 arch/avr32/include/uapi/asm/socket.h   |  2 ++
 arch/frv/include/uapi/asm/socket.h     |  2 ++
 arch/ia64/include/uapi/asm/socket.h    |  2 ++
 arch/m32r/include/uapi/asm/socket.h    |  2 ++
 arch/mips/include/uapi/asm/socket.h    |  2 ++
 arch/mn10300/include/uapi/asm/socket.h |  2 ++
 arch/parisc/include/uapi/asm/socket.h  |  2 ++
 arch/powerpc/include/uapi/asm/socket.h |  2 ++
 arch/s390/include/uapi/asm/socket.h    |  2 ++
 arch/sparc/include/uapi/asm/socket.h   |  2 ++
 arch/xtensa/include/uapi/asm/socket.h  |  2 ++
 include/net/request_sock.h             |  4 +++-
 include/net/sock.h                     |  9 +++++++++
 include/uapi/asm-generic/socket.h      |  2 ++
 net/core/sock.c                        | 28 ++++++++++++++++++++++++++++
 16 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/arch/alpha/include/uapi/asm/socket.h b/arch/alpha/include/uapi/asm/socket.h
index 9e46d6e..06b2ef9 100644
--- a/arch/alpha/include/uapi/asm/socket.h
+++ b/arch/alpha/include/uapi/asm/socket.h
@@ -97,4 +97,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SO_NETPOLICY		54
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/avr32/include/uapi/asm/socket.h b/arch/avr32/include/uapi/asm/socket.h
index 1fd147f..24f85f0 100644
--- a/arch/avr32/include/uapi/asm/socket.h
+++ b/arch/avr32/include/uapi/asm/socket.h
@@ -90,4 +90,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SO_NETPOLICY		54
+
 #endif /* _UAPI__ASM_AVR32_SOCKET_H */
diff --git a/arch/frv/include/uapi/asm/socket.h b/arch/frv/include/uapi/asm/socket.h
index afbc98f0..82c8d44 100644
--- a/arch/frv/include/uapi/asm/socket.h
+++ b/arch/frv/include/uapi/asm/socket.h
@@ -90,5 +90,7 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SO_NETPOLICY		54
+
 #endif /* _ASM_SOCKET_H */
 
diff --git a/arch/ia64/include/uapi/asm/socket.h b/arch/ia64/include/uapi/asm/socket.h
index 0018fad..b99c1df 100644
--- a/arch/ia64/include/uapi/asm/socket.h
+++ b/arch/ia64/include/uapi/asm/socket.h
@@ -99,4 +99,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SO_NETPOLICY		54
+
 #endif /* _ASM_IA64_SOCKET_H */
diff --git a/arch/m32r/include/uapi/asm/socket.h b/arch/m32r/include/uapi/asm/socket.h
index 5fe42fc..71a43ed 100644
--- a/arch/m32r/include/uapi/asm/socket.h
+++ b/arch/m32r/include/uapi/asm/socket.h
@@ -90,4 +90,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SO_NETPOLICY		54
+
 #endif /* _ASM_M32R_SOCKET_H */
diff --git a/arch/mips/include/uapi/asm/socket.h b/arch/mips/include/uapi/asm/socket.h
index 2027240a..ce8b9ba 100644
--- a/arch/mips/include/uapi/asm/socket.h
+++ b/arch/mips/include/uapi/asm/socket.h
@@ -108,4 +108,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SO_NETPOLICY		54
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/mn10300/include/uapi/asm/socket.h b/arch/mn10300/include/uapi/asm/socket.h
index 5129f23..c041265 100644
--- a/arch/mn10300/include/uapi/asm/socket.h
+++ b/arch/mn10300/include/uapi/asm/socket.h
@@ -90,4 +90,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SO_NETPOLICY		54
+
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/parisc/include/uapi/asm/socket.h b/arch/parisc/include/uapi/asm/socket.h
index 9c935d7..2639dcd 100644
--- a/arch/parisc/include/uapi/asm/socket.h
+++ b/arch/parisc/include/uapi/asm/socket.h
@@ -89,4 +89,6 @@
 
 #define SO_CNX_ADVICE		0x402E
 
+#define SO_NETPOLICY		0x402F
+
 #endif /* _UAPI_ASM_SOCKET_H */
diff --git a/arch/powerpc/include/uapi/asm/socket.h b/arch/powerpc/include/uapi/asm/socket.h
index 1672e33..e04e3b6 100644
--- a/arch/powerpc/include/uapi/asm/socket.h
+++ b/arch/powerpc/include/uapi/asm/socket.h
@@ -97,4 +97,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SO_NETPOLICY		54
+
 #endif	/* _ASM_POWERPC_SOCKET_H */
diff --git a/arch/s390/include/uapi/asm/socket.h b/arch/s390/include/uapi/asm/socket.h
index 41b51c2..d43b854 100644
--- a/arch/s390/include/uapi/asm/socket.h
+++ b/arch/s390/include/uapi/asm/socket.h
@@ -96,4 +96,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SO_NETPOLICY		54
+
 #endif /* _ASM_SOCKET_H */
diff --git a/arch/sparc/include/uapi/asm/socket.h b/arch/sparc/include/uapi/asm/socket.h
index 31aede3..94a2cdf 100644
--- a/arch/sparc/include/uapi/asm/socket.h
+++ b/arch/sparc/include/uapi/asm/socket.h
@@ -86,6 +86,8 @@
 
 #define SO_CNX_ADVICE		0x0037
 
+#define SO_NETPOLICY		0x0038
+
 /* Security levels - as per NRL IPv6 - don't actually do anything */
 #define SO_SECURITY_AUTHENTICATION		0x5001
 #define SO_SECURITY_ENCRYPTION_TRANSPORT	0x5002
diff --git a/arch/xtensa/include/uapi/asm/socket.h b/arch/xtensa/include/uapi/asm/socket.h
index 81435d9..97f1691 100644
--- a/arch/xtensa/include/uapi/asm/socket.h
+++ b/arch/xtensa/include/uapi/asm/socket.h
@@ -101,4 +101,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SO_NETPOLICY		54
+
 #endif	/* _XTENSA_SOCKET_H */
diff --git a/include/net/request_sock.h b/include/net/request_sock.h
index 6ebe13e..1fa2d0e 100644
--- a/include/net/request_sock.h
+++ b/include/net/request_sock.h
@@ -101,7 +101,9 @@ reqsk_alloc(const struct request_sock_ops *ops, struct sock *sk_listener,
 	sk_tx_queue_clear(req_to_sk(req));
 	req->saved_syn = NULL;
 	atomic_set(&req->rsk_refcnt, 0);
-
+#ifdef CONFIG_NETPOLICY
+	memcpy(&req_to_sk(req)->sk_netpolicy, &sk_listener->sk_netpolicy, sizeof(sk_listener->sk_netpolicy));
+#endif
 	return req;
 }
 
diff --git a/include/net/sock.h b/include/net/sock.h
index 649d2a8..e4721de 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -70,6 +70,7 @@
 #include <net/checksum.h>
 #include <net/tcp_states.h>
 #include <linux/net_tstamp.h>
+#include <linux/netpolicy.h>
 
 /*
  * This structure really needs to be cleaned up.
@@ -141,6 +142,7 @@ typedef __u64 __bitwise __addrpair;
  *		%SO_OOBINLINE settings, %SO_TIMESTAMPING settings
  *	@skc_incoming_cpu: record/match cpu processing incoming packets
  *	@skc_refcnt: reference count
+ *	@skc_netpolicy: per socket net policy
  *
  *	This is the minimal network layer representation of sockets, the header
  *	for struct sock and struct inet_timewait_sock.
@@ -200,6 +202,10 @@ struct sock_common {
 		struct sock	*skc_listener; /* request_sock */
 		struct inet_timewait_death_row *skc_tw_dr; /* inet_timewait_sock */
 	};
+
+#ifdef CONFIG_NETPOLICY
+	struct netpolicy_reg    skc_netpolicy;
+#endif
 	/*
 	 * fields between dontcopy_begin/dontcopy_end
 	 * are not copied in sock_copy()
@@ -339,6 +345,9 @@ struct sock {
 #define sk_incoming_cpu		__sk_common.skc_incoming_cpu
 #define sk_flags		__sk_common.skc_flags
 #define sk_rxhash		__sk_common.skc_rxhash
+#ifdef CONFIG_NETPOLICY
+#define sk_netpolicy		__sk_common.skc_netpolicy
+#endif
 
 	socket_lock_t		sk_lock;
 	struct sk_buff_head	sk_receive_queue;
diff --git a/include/uapi/asm-generic/socket.h b/include/uapi/asm-generic/socket.h
index 67d632f..d2a5aeb 100644
--- a/include/uapi/asm-generic/socket.h
+++ b/include/uapi/asm-generic/socket.h
@@ -92,4 +92,6 @@
 
 #define SO_CNX_ADVICE		53
 
+#define SO_NETPOLICY		54
+
 #endif /* __ASM_GENERIC_SOCKET_H */
diff --git a/net/core/sock.c b/net/core/sock.c
index 08bf97e..6eaaa08 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1002,6 +1002,12 @@ set_rcvbuf:
 		if (val == 1)
 			dst_negative_advice(sk);
 		break;
+
+#ifdef CONFIG_NETPOLICY
+	case SO_NETPOLICY:
+		ret = netpolicy_register(&sk->sk_netpolicy, val);
+		break;
+#endif
 	default:
 		ret = -ENOPROTOOPT;
 		break;
@@ -1262,6 +1268,11 @@ int sock_getsockopt(struct socket *sock, int level, int optname,
 		v.val = sk->sk_incoming_cpu;
 		break;
 
+#ifdef CONFIG_NETPOLICY
+	case SO_NETPOLICY:
+		v.val = sk->sk_netpolicy.policy;
+		break;
+#endif
 	default:
 		/* We implement the SO_SNDLOWAT etc to not be settable
 		 * (1003.1g 7).
@@ -1423,6 +1434,12 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 
 		sock_update_classid(&sk->sk_cgrp_data);
 		sock_update_netprioidx(&sk->sk_cgrp_data);
+
+#ifdef CONFIG_NETPOLICY
+		sk->sk_netpolicy.dev = NULL;
+		sk->sk_netpolicy.ptr = (void *)sk;
+		sk->sk_netpolicy.policy = NET_POLICY_INVALID;
+#endif
 	}
 
 	return sk;
@@ -1460,6 +1477,10 @@ static void __sk_destruct(struct rcu_head *head)
 	put_pid(sk->sk_peer_pid);
 	if (likely(sk->sk_net_refcnt))
 		put_net(sock_net(sk));
+#ifdef CONFIG_NETPOLICY
+	if (is_net_policy_valid(sk->sk_netpolicy.policy))
+		netpolicy_unregister(&sk->sk_netpolicy);
+#endif
 	sk_prot_free(sk->sk_prot_creator, sk);
 }
 
@@ -1596,6 +1617,13 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 		if (sock_needs_netstamp(sk) &&
 		    newsk->sk_flags & SK_FLAGS_TIMESTAMP)
 			net_enable_timestamp();
+
+#ifdef CONFIG_NETPOLICY
+		newsk->sk_netpolicy.ptr = (void *)newsk;
+		if (is_net_policy_valid(newsk->sk_netpolicy.policy))
+			netpolicy_register(&newsk->sk_netpolicy, newsk->sk_netpolicy.policy);
+
+#endif
 	}
 out:
 	return newsk;
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [RFC PATCH 21/30] net/policy: introduce netpolicy_pick_queue
  2016-07-18  6:55 ` [Intel-wired-lan] " kan.liang
@ 2016-07-18  6:56   ` kan.liang
  -1 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: davem, linux-kernel, intel-wired-lan, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

This function will be used to get assigned queues by policy and ptr.
If it's first time, get_avail_queue will be called to find the available
object from the given policy object list.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netpolicy.h |   5 ++
 net/core/netpolicy.c      | 119 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 124 insertions(+)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index 89361d9..e20820d 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -97,6 +97,7 @@ extern void update_netpolicy_sys_map(void);
 extern int netpolicy_register(struct netpolicy_reg *reg,
 			      enum netpolicy_name policy);
 extern void netpolicy_unregister(struct netpolicy_reg *reg);
+extern int netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx);
 #else
 static inline void update_netpolicy_sys_map(void)
 {
@@ -111,6 +112,10 @@ static inline void netpolicy_unregister(struct netpolicy_reg *reg)
 {
 }
 
+static inline int netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx)
+{
+	return 0;
+}
 #endif
 
 #endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 13ab5e1..6992d08 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -289,6 +289,125 @@ static void put_queue(struct net_device *dev,
 		atomic_dec(&tx_obj->refcnt);
 }
 
+static struct netpolicy_object *get_avail_queue(struct net_device *dev,
+						enum netpolicy_name policy,
+						bool is_rx)
+{
+	int dir = is_rx ? NETPOLICY_RX : NETPOLICY_TX;
+	struct netpolicy_object *tmp, *obj = NULL;
+	int val = -1;
+
+	/* Check if net policy is supported */
+	if (!dev || !dev->netpolicy)
+		return NULL;
+
+	/* The system should have queues which support the request policy. */
+	if ((policy != dev->netpolicy->cur_policy) &&
+	    (dev->netpolicy->cur_policy != NET_POLICY_MIX))
+		return NULL;
+
+	spin_lock(&dev->np_ob_list_lock);
+	list_for_each_entry(tmp, &dev->netpolicy->obj_list[dir][policy], list) {
+		if ((val > atomic_read(&tmp->refcnt)) ||
+		    (val == -1)) {
+			val = atomic_read(&tmp->refcnt);
+			obj = tmp;
+		}
+	}
+	spin_unlock(&dev->np_ob_list_lock);
+
+	if (WARN_ON(!obj))
+		return NULL;
+	atomic_inc(&obj->refcnt);
+
+	return obj;
+}
+
+/**
+ * netpolicy_pick_queue() - Find assigned queue
+ * @reg:	NET policy register info
+ * @is_rx:	RX queue or TX queue
+ *
+ * This function intends to find the assigned queue according to policy and
+ * ptr. If it's first time, get_avail_queue will be called to find the
+ * available object from the given policy object list. Then the object info
+ * will be updated in the hash table.
+ *
+ * Return: negative on failure, otherwise on the assigned queue
+ */
+int netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx)
+{
+	struct netpolicy_record *old_record, *new_record;
+	struct net_device *dev = reg->dev;
+	enum netpolicy_name cur_policy;
+	unsigned long ptr_id = (uintptr_t)reg->ptr;
+	int queue = -1;
+
+	if (!dev || !dev->netpolicy)
+		goto err;
+
+	cur_policy = dev->netpolicy->cur_policy;
+	if ((reg->policy == NET_POLICY_NONE) ||
+	    (cur_policy == NET_POLICY_NONE))
+		return queue;
+
+	if (((cur_policy != NET_POLICY_MIX) && (cur_policy != reg->policy)) ||
+	    ((cur_policy == NET_POLICY_MIX) && (reg->policy == NET_POLICY_CPU))) {
+		pr_warn("NETPOLICY: %s current device policy %s doesn't support required policy %s! Remove net policy settings!\n",
+			dev->name, policy_name[cur_policy],
+			policy_name[reg->policy]);
+		goto err;
+	}
+
+	old_record = netpolicy_record_search(ptr_id);
+	if (!old_record) {
+		pr_warn("NETPOLICY: doesn't registered. Remove net policy settings!\n");
+		goto err;
+	}
+
+	new_record = kzalloc(sizeof(*new_record), GFP_KERNEL);
+	if (!new_record)
+		return -ENOMEM;
+	memcpy(new_record, old_record, sizeof(*new_record));
+
+	if (is_rx) {
+		if (!new_record->rx_obj) {
+			new_record->rx_obj = get_avail_queue(dev, new_record->policy, is_rx);
+			if (!new_record->dev)
+				new_record->dev = dev;
+			if (!new_record->rx_obj) {
+				kfree(new_record);
+				return -ENOTSUPP;
+			}
+		}
+		queue = new_record->rx_obj->queue;
+	} else {
+		if (!new_record->tx_obj) {
+			new_record->tx_obj = get_avail_queue(dev, new_record->policy, is_rx);
+			if (!new_record->dev)
+				new_record->dev = dev;
+			if (!new_record->tx_obj) {
+				kfree(new_record);
+				return -ENOTSUPP;
+			}
+		}
+		queue = new_record->tx_obj->queue;
+	}
+
+	/* update record */
+	spin_lock_bh(&np_hashtable_lock);
+	hlist_replace_rcu(&old_record->hash_node, &new_record->hash_node);
+	spin_unlock_bh(&np_hashtable_lock);
+	kfree(old_record);
+
+	return queue;
+
+err:
+	netpolicy_unregister(reg);
+	return -EINVAL;
+}
+EXPORT_SYMBOL(netpolicy_pick_queue);
+
 /**
  * netpolicy_register() - Register per socket/task policy request
  * @reg:	NET policy register info
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 21/30] net/policy: introduce netpolicy_pick_queue
@ 2016-07-18  6:56   ` kan.liang
  0 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: intel-wired-lan

From: Kan Liang <kan.liang@intel.com>

This function will be used to get assigned queues by policy and ptr.
If it's first time, get_avail_queue will be called to find the available
object from the given policy object list.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netpolicy.h |   5 ++
 net/core/netpolicy.c      | 119 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 124 insertions(+)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index 89361d9..e20820d 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -97,6 +97,7 @@ extern void update_netpolicy_sys_map(void);
 extern int netpolicy_register(struct netpolicy_reg *reg,
 			      enum netpolicy_name policy);
 extern void netpolicy_unregister(struct netpolicy_reg *reg);
+extern int netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx);
 #else
 static inline void update_netpolicy_sys_map(void)
 {
@@ -111,6 +112,10 @@ static inline void netpolicy_unregister(struct netpolicy_reg *reg)
 {
 }
 
+static inline int netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx)
+{
+	return 0;
+}
 #endif
 
 #endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 13ab5e1..6992d08 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -289,6 +289,125 @@ static void put_queue(struct net_device *dev,
 		atomic_dec(&tx_obj->refcnt);
 }
 
+static struct netpolicy_object *get_avail_queue(struct net_device *dev,
+						enum netpolicy_name policy,
+						bool is_rx)
+{
+	int dir = is_rx ? NETPOLICY_RX : NETPOLICY_TX;
+	struct netpolicy_object *tmp, *obj = NULL;
+	int val = -1;
+
+	/* Check if net policy is supported */
+	if (!dev || !dev->netpolicy)
+		return NULL;
+
+	/* The system should have queues which support the request policy. */
+	if ((policy != dev->netpolicy->cur_policy) &&
+	    (dev->netpolicy->cur_policy != NET_POLICY_MIX))
+		return NULL;
+
+	spin_lock(&dev->np_ob_list_lock);
+	list_for_each_entry(tmp, &dev->netpolicy->obj_list[dir][policy], list) {
+		if ((val > atomic_read(&tmp->refcnt)) ||
+		    (val == -1)) {
+			val = atomic_read(&tmp->refcnt);
+			obj = tmp;
+		}
+	}
+	spin_unlock(&dev->np_ob_list_lock);
+
+	if (WARN_ON(!obj))
+		return NULL;
+	atomic_inc(&obj->refcnt);
+
+	return obj;
+}
+
+/**
+ * netpolicy_pick_queue() - Find assigned queue
+ * @reg:	NET policy register info
+ * @is_rx:	RX queue or TX queue
+ *
+ * This function intends to find the assigned queue according to policy and
+ * ptr. If it's first time, get_avail_queue will be called to find the
+ * available object from the given policy object list. Then the object info
+ * will be updated in the hash table.
+ *
+ * Return: negative on failure, otherwise on the assigned queue
+ */
+int netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx)
+{
+	struct netpolicy_record *old_record, *new_record;
+	struct net_device *dev = reg->dev;
+	enum netpolicy_name cur_policy;
+	unsigned long ptr_id = (uintptr_t)reg->ptr;
+	int queue = -1;
+
+	if (!dev || !dev->netpolicy)
+		goto err;
+
+	cur_policy = dev->netpolicy->cur_policy;
+	if ((reg->policy == NET_POLICY_NONE) ||
+	    (cur_policy == NET_POLICY_NONE))
+		return queue;
+
+	if (((cur_policy != NET_POLICY_MIX) && (cur_policy != reg->policy)) ||
+	    ((cur_policy == NET_POLICY_MIX) && (reg->policy == NET_POLICY_CPU))) {
+		pr_warn("NETPOLICY: %s current device policy %s doesn't support required policy %s! Remove net policy settings!\n",
+			dev->name, policy_name[cur_policy],
+			policy_name[reg->policy]);
+		goto err;
+	}
+
+	old_record = netpolicy_record_search(ptr_id);
+	if (!old_record) {
+		pr_warn("NETPOLICY: doesn't registered. Remove net policy settings!\n");
+		goto err;
+	}
+
+	new_record = kzalloc(sizeof(*new_record), GFP_KERNEL);
+	if (!new_record)
+		return -ENOMEM;
+	memcpy(new_record, old_record, sizeof(*new_record));
+
+	if (is_rx) {
+		if (!new_record->rx_obj) {
+			new_record->rx_obj = get_avail_queue(dev, new_record->policy, is_rx);
+			if (!new_record->dev)
+				new_record->dev = dev;
+			if (!new_record->rx_obj) {
+				kfree(new_record);
+				return -ENOTSUPP;
+			}
+		}
+		queue = new_record->rx_obj->queue;
+	} else {
+		if (!new_record->tx_obj) {
+			new_record->tx_obj = get_avail_queue(dev, new_record->policy, is_rx);
+			if (!new_record->dev)
+				new_record->dev = dev;
+			if (!new_record->tx_obj) {
+				kfree(new_record);
+				return -ENOTSUPP;
+			}
+		}
+		queue = new_record->tx_obj->queue;
+	}
+
+	/* update record */
+	spin_lock_bh(&np_hashtable_lock);
+	hlist_replace_rcu(&old_record->hash_node, &new_record->hash_node);
+	spin_unlock_bh(&np_hashtable_lock);
+	kfree(old_record);
+
+	return queue;
+
+err:
+	netpolicy_unregister(reg);
+	return -EINVAL;
+}
+EXPORT_SYMBOL(netpolicy_pick_queue);
+
 /**
  * netpolicy_register() - Register per socket/task policy request
  * @reg:	NET policy register info
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [RFC PATCH 22/30] net/netpolicy: set tx queues according to policy
  2016-07-18  6:55 ` [Intel-wired-lan] " kan.liang
@ 2016-07-18  6:56   ` kan.liang
  -1 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: davem, linux-kernel, intel-wired-lan, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

When the device tries to transmit a buffer, netdev_pick_tx is called to
find the available tx queues. This patch checks the per socket net
policy of the binding socket of the buffer. If net policy is set, it
picks up the assigned tx queue from net policy module, and redirect the
traffic to the assigned queue.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 net/core/dev.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 7894e40..6108e3b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3266,6 +3266,7 @@ struct netdev_queue *netdev_pick_tx(struct net_device *dev,
 				    void *accel_priv)
 {
 	int queue_index = 0;
+	struct sock *sk = skb->sk;
 
 #ifdef CONFIG_XPS
 	u32 sender_cpu = skb->sender_cpu - 1;
@@ -3279,8 +3280,21 @@ struct netdev_queue *netdev_pick_tx(struct net_device *dev,
 		if (ops->ndo_select_queue)
 			queue_index = ops->ndo_select_queue(dev, skb, accel_priv,
 							    __netdev_pick_tx);
-		else
-			queue_index = __netdev_pick_tx(dev, skb);
+		else {
+#ifdef CONFIG_NETPOLICY
+			queue_index = -1;
+			if (sk && (sk->sk_netpolicy.policy > NET_POLICY_NONE)) {
+				/* There is no device bind to socket when setting policy
+				 * Assign the dev now.
+				 */
+				if (!sk->sk_netpolicy.dev)
+					sk->sk_netpolicy.dev = dev;
+				queue_index = netpolicy_pick_queue(&sk->sk_netpolicy, false);
+			}
+			if (queue_index < 0)
+#endif
+				queue_index = __netdev_pick_tx(dev, skb);
+		}
 
 		if (!accel_priv)
 			queue_index = netdev_cap_txqueue(dev, queue_index);
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 22/30] net/netpolicy: set tx queues according to policy
@ 2016-07-18  6:56   ` kan.liang
  0 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: intel-wired-lan

From: Kan Liang <kan.liang@intel.com>

When the device tries to transmit a buffer, netdev_pick_tx is called to
find the available tx queues. This patch checks the per socket net
policy of the binding socket of the buffer. If net policy is set, it
picks up the assigned tx queue from net policy module, and redirect the
traffic to the assigned queue.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 net/core/dev.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 7894e40..6108e3b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3266,6 +3266,7 @@ struct netdev_queue *netdev_pick_tx(struct net_device *dev,
 				    void *accel_priv)
 {
 	int queue_index = 0;
+	struct sock *sk = skb->sk;
 
 #ifdef CONFIG_XPS
 	u32 sender_cpu = skb->sender_cpu - 1;
@@ -3279,8 +3280,21 @@ struct netdev_queue *netdev_pick_tx(struct net_device *dev,
 		if (ops->ndo_select_queue)
 			queue_index = ops->ndo_select_queue(dev, skb, accel_priv,
 							    __netdev_pick_tx);
-		else
-			queue_index = __netdev_pick_tx(dev, skb);
+		else {
+#ifdef CONFIG_NETPOLICY
+			queue_index = -1;
+			if (sk && (sk->sk_netpolicy.policy > NET_POLICY_NONE)) {
+				/* There is no device bind to socket when setting policy
+				 * Assign the dev now.
+				 */
+				if (!sk->sk_netpolicy.dev)
+					sk->sk_netpolicy.dev = dev;
+				queue_index = netpolicy_pick_queue(&sk->sk_netpolicy, false);
+			}
+			if (queue_index < 0)
+#endif
+				queue_index = __netdev_pick_tx(dev, skb);
+		}
 
 		if (!accel_priv)
 			queue_index = netdev_cap_txqueue(dev, queue_index);
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [RFC PATCH 23/30] i40e/ethtool: support RX_CLS_LOC_ANY
  2016-07-18  6:55 ` [Intel-wired-lan] " kan.liang
@ 2016-07-18  6:56   ` kan.liang
  -1 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: davem, linux-kernel, intel-wired-lan, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

The existing special location RX_CLS_LOC_ANY flag is designed for the
case which the caller does not know/care about the location. Now, this
flag is only handled in ethtool user space. If the kernel directly calls
the ETHTOOL_SRXCLSRLINS interface with RX_CLS_LOC_ANY flag set, it will
error out.
This patch implements the RX_CLS_LOC_ANY support for i40e driver. It
finds the available location from the end of the list.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 38 ++++++++++++++++++++++++--
 1 file changed, 35 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 1f3537e..4276ed7 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -2552,6 +2552,32 @@ static int i40e_del_fdir_entry(struct i40e_vsi *vsi,
 	return ret;
 }
 
+static int find_empty_slot(struct i40e_pf *pf)
+{
+	struct i40e_fdir_filter *rule;
+	struct hlist_node *node2;
+	__u32 data = i40e_get_fd_cnt_all(pf);
+	unsigned long *slot;
+	int i;
+
+	slot = kzalloc(BITS_TO_LONGS(data) * sizeof(long), GFP_KERNEL);
+	if (!slot)
+		return -ENOMEM;
+
+	hlist_for_each_entry_safe(rule, node2,
+				  &pf->fdir_filter_list, fdir_node) {
+		set_bit(rule->fd_id, slot);
+	}
+
+	for (i = data - 1; i > 0; i--) {
+		if (!test_bit(i, slot))
+			break;
+	}
+	kfree(slot);
+
+	return i;
+}
+
 /**
  * i40e_add_fdir_ethtool - Add/Remove Flow Director filters
  * @vsi: pointer to the targeted VSI
@@ -2588,9 +2614,15 @@ static int i40e_add_fdir_ethtool(struct i40e_vsi *vsi,
 
 	fsp = (struct ethtool_rx_flow_spec *)&cmd->fs;
 
-	if (fsp->location >= (pf->hw.func_caps.fd_filters_best_effort +
-			      pf->hw.func_caps.fd_filters_guaranteed)) {
-		return -EINVAL;
+	if (fsp->location != RX_CLS_LOC_ANY) {
+		if (fsp->location >= (pf->hw.func_caps.fd_filters_best_effort +
+				      pf->hw.func_caps.fd_filters_guaranteed)) {
+			return -EINVAL;
+		}
+	} else {
+		fsp->location = find_empty_slot(pf);
+		if (fsp->location < 0)
+			return -ENOSPC;
 	}
 
 	if ((fsp->ring_cookie != RX_CLS_FLOW_DISC) &&
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 23/30] i40e/ethtool: support RX_CLS_LOC_ANY
@ 2016-07-18  6:56   ` kan.liang
  0 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: intel-wired-lan

From: Kan Liang <kan.liang@intel.com>

The existing special location RX_CLS_LOC_ANY flag is designed for the
case which the caller does not know/care about the location. Now, this
flag is only handled in ethtool user space. If the kernel directly calls
the ETHTOOL_SRXCLSRLINS interface with RX_CLS_LOC_ANY flag set, it will
error out.
This patch implements the RX_CLS_LOC_ANY support for i40e driver. It
finds the available location from the end of the list.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 38 ++++++++++++++++++++++++--
 1 file changed, 35 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 1f3537e..4276ed7 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -2552,6 +2552,32 @@ static int i40e_del_fdir_entry(struct i40e_vsi *vsi,
 	return ret;
 }
 
+static int find_empty_slot(struct i40e_pf *pf)
+{
+	struct i40e_fdir_filter *rule;
+	struct hlist_node *node2;
+	__u32 data = i40e_get_fd_cnt_all(pf);
+	unsigned long *slot;
+	int i;
+
+	slot = kzalloc(BITS_TO_LONGS(data) * sizeof(long), GFP_KERNEL);
+	if (!slot)
+		return -ENOMEM;
+
+	hlist_for_each_entry_safe(rule, node2,
+				  &pf->fdir_filter_list, fdir_node) {
+		set_bit(rule->fd_id, slot);
+	}
+
+	for (i = data - 1; i > 0; i--) {
+		if (!test_bit(i, slot))
+			break;
+	}
+	kfree(slot);
+
+	return i;
+}
+
 /**
  * i40e_add_fdir_ethtool - Add/Remove Flow Director filters
  * @vsi: pointer to the targeted VSI
@@ -2588,9 +2614,15 @@ static int i40e_add_fdir_ethtool(struct i40e_vsi *vsi,
 
 	fsp = (struct ethtool_rx_flow_spec *)&cmd->fs;
 
-	if (fsp->location >= (pf->hw.func_caps.fd_filters_best_effort +
-			      pf->hw.func_caps.fd_filters_guaranteed)) {
-		return -EINVAL;
+	if (fsp->location != RX_CLS_LOC_ANY) {
+		if (fsp->location >= (pf->hw.func_caps.fd_filters_best_effort +
+				      pf->hw.func_caps.fd_filters_guaranteed)) {
+			return -EINVAL;
+		}
+	} else {
+		fsp->location = find_empty_slot(pf);
+		if (fsp->location < 0)
+			return -ENOSPC;
 	}
 
 	if ((fsp->ring_cookie != RX_CLS_FLOW_DISC) &&
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [RFC PATCH 24/30] net/netpolicy: set rx queues according to policy
  2016-07-18  6:55 ` [Intel-wired-lan] " kan.liang
@ 2016-07-18  6:56   ` kan.liang
  -1 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: davem, linux-kernel, intel-wired-lan, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

For setting rx queues, this patch add rules for Flow Director filters.
Since we may not get all the information required for rule until the
first package arrived, it will add the rule after recvmsg. The first
several packages may not use the assigned queue.
The dev information will be discarded in udp_queue_rcv_skb, so we record
it in netpolicy struct in advance.
This patch only support INET tcp4 and udp4. It can be extend to other
socket type and V6 later shortly.
For each sk, it only supports one rule. If the port/address changed, the
previos rule will be replaced.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netpolicy.h | 33 ++++++++++++++++--
 net/core/netpolicy.c      | 89 +++++++++++++++++++++++++++++++++++++++++++++++
 net/core/sock.c           |  4 +++
 net/ipv4/af_inet.c        | 55 +++++++++++++++++++++++++++++
 net/ipv4/udp.c            |  4 +++
 5 files changed, 183 insertions(+), 2 deletions(-)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index e20820d..1cd5ac4 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -82,8 +82,27 @@ struct netpolicy_info {
 
 struct netpolicy_reg {
 	struct net_device	*dev;
-	enum netpolicy_name	policy; /* required policy */
-	void			*ptr;   /* pointers */
+	enum netpolicy_name	policy;		/* required policy */
+	void			*ptr;		/* pointers */
+	u32			location;	/* rule location */
+	u32			rule_queue;	/* queue set by rule */
+};
+
+struct netpolicy_tcpudpip4_spec {
+	/* source and Destination host and port */
+	__be32	ip4src;
+	__be32	ip4dst;
+	__be16	psrc;
+	__be16	pdst;
+};
+
+union netpolicy_flow_union {
+	struct netpolicy_tcpudpip4_spec		tcp_udp_ip4_spec;
+};
+
+struct netpolicy_flow_spec {
+	__u32	flow_type;
+	union netpolicy_flow_union	spec;
 };
 
 /* check if policy is valid */
@@ -98,6 +117,9 @@ extern int netpolicy_register(struct netpolicy_reg *reg,
 			      enum netpolicy_name policy);
 extern void netpolicy_unregister(struct netpolicy_reg *reg);
 extern int netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx);
+extern int netpolicy_set_rules(struct netpolicy_reg *reg,
+			       u32 queue_index,
+			       struct netpolicy_flow_spec *flow);
 #else
 static inline void update_netpolicy_sys_map(void)
 {
@@ -116,6 +138,13 @@ static inline int netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx)
 {
 	return 0;
 }
+
+static inline int netpolicy_set_rules(struct netpolicy_reg *reg,
+				      u32 queue_index,
+				      struct netpolicy_flow_spec *flow)
+{
+	return 0;
+}
 #endif
 
 #endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 6992d08..0ed3080 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -39,6 +39,7 @@
 #include <linux/ctype.h>
 #include <linux/cpu.h>
 #include <linux/hashtable.h>
+#include <net/rtnetlink.h>
 
 struct netpolicy_record {
 	struct hlist_node	hash_node;
@@ -474,6 +475,20 @@ void netpolicy_unregister(struct netpolicy_reg *reg)
 {
 	struct netpolicy_record *record;
 	unsigned long ptr_id = (uintptr_t)reg->ptr;
+	struct net_device *dev = reg->dev;
+
+	/* remove FD rules */
+	if (dev && reg->location != ~0) {
+		struct ethtool_rxnfc del_cmd;
+
+		del_cmd.cmd = ETHTOOL_SRXCLSRLDEL;
+		del_cmd.fs.location = reg->location;
+		rtnl_lock();
+		dev->ethtool_ops->set_rxnfc(dev, &del_cmd);
+		rtnl_unlock();
+		reg->location = ~0;
+		reg->rule_queue = ~0;
+	}
 
 	spin_lock_bh(&np_hashtable_lock);
 	/* del from hash table */
@@ -489,6 +504,80 @@ void netpolicy_unregister(struct netpolicy_reg *reg)
 }
 EXPORT_SYMBOL(netpolicy_unregister);
 
+/**
+ * netpolicy_set_rules() - Configure Rx network flow classification rules
+ * @reg:		NET policy register info
+ * @queue_index:	Rx queue which want to set rules
+ * @flow:		Target flow to apply rules
+ *
+ * This function intends to configure Rx network flow classification rules
+ * according to ip and port information.
+ *
+ * Currently, it only supports TCP and UDP V4. Other protocols will be
+ * supported later.
+ *
+ * Return: 0 on success, others on failure
+ */
+int netpolicy_set_rules(struct netpolicy_reg *reg,
+			u32 queue_index,
+			struct netpolicy_flow_spec *flow)
+{
+	int ret;
+	struct ethtool_rxnfc cmd;
+	struct net_device *dev = reg->dev;
+
+	if (!dev)
+		return -EINVAL;
+
+	/* Check if ntuple is supported */
+	if (!dev->ethtool_ops->set_rxnfc)
+		return -EOPNOTSUPP;
+
+	/* Only support TCP/UDP V4 by now */
+	if ((flow->flow_type != TCP_V4_FLOW) &&
+	    (flow->flow_type != UDP_V4_FLOW))
+		return -EOPNOTSUPP;
+
+	/* using flow-type (Flow Director filters) */
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.cmd = ETHTOOL_SRXCLSRLINS;
+	cmd.fs.flow_type = flow->flow_type;
+	cmd.fs.h_u.tcp_ip4_spec.ip4src = flow->spec.tcp_udp_ip4_spec.ip4src;
+	cmd.fs.h_u.tcp_ip4_spec.psrc = flow->spec.tcp_udp_ip4_spec.psrc;
+	cmd.fs.h_u.tcp_ip4_spec.ip4dst = flow->spec.tcp_udp_ip4_spec.ip4dst;
+	cmd.fs.h_u.tcp_ip4_spec.pdst = flow->spec.tcp_udp_ip4_spec.pdst;
+	cmd.fs.ring_cookie = queue_index;
+	cmd.fs.location = RX_CLS_LOC_ANY;
+	rtnl_lock();
+	ret = dev->ethtool_ops->set_rxnfc(dev, &cmd);
+	rtnl_unlock();
+	if (ret < 0) {
+		pr_warn("Failed to set rules ret %d\n", ret);
+		return ret;
+	}
+
+	/* TODO: now one sk only has one rule */
+	if (reg->location != ~0) {
+		/* delete the old rule */
+		struct ethtool_rxnfc del_cmd;
+
+		del_cmd.cmd = ETHTOOL_SRXCLSRLDEL;
+		del_cmd.fs.location = reg->location;
+		rtnl_lock();
+		ret = dev->ethtool_ops->set_rxnfc(dev, &del_cmd);
+		rtnl_unlock();
+		if (ret < 0)
+			pr_warn("Failed to delete rules ret %d\n", ret);
+	}
+
+	/* record rule location */
+	reg->location = cmd.fs.location;
+	reg->rule_queue = queue_index;
+
+	return ret;
+}
+EXPORT_SYMBOL(netpolicy_set_rules);
+
 const char *policy_name[NET_POLICY_MAX] = {
 	"NONE",
 	"CPU",
diff --git a/net/core/sock.c b/net/core/sock.c
index 6eaaa08..849274a 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1439,6 +1439,8 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 		sk->sk_netpolicy.dev = NULL;
 		sk->sk_netpolicy.ptr = (void *)sk;
 		sk->sk_netpolicy.policy = NET_POLICY_INVALID;
+		sk->sk_netpolicy.location = ~0;
+		sk->sk_netpolicy.rule_queue = ~0;
 #endif
 	}
 
@@ -1620,6 +1622,8 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 
 #ifdef CONFIG_NETPOLICY
 		newsk->sk_netpolicy.ptr = (void *)newsk;
+		newsk->sk_netpolicy.location = ~0;
+		newsk->sk_netpolicy.rule_queue = ~0;
 		if (is_net_policy_valid(newsk->sk_netpolicy.policy))
 			netpolicy_register(&newsk->sk_netpolicy, newsk->sk_netpolicy.policy);
 
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 55513e6..889ffdc 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -759,6 +759,55 @@ ssize_t inet_sendpage(struct socket *sock, struct page *page, int offset,
 }
 EXPORT_SYMBOL(inet_sendpage);
 
+static void sock_netpolicy_manage_flow(struct sock *sk, struct msghdr *msg)
+{
+#ifdef CONFIG_NETPOLICY
+	int queue_index;
+	struct netpolicy_flow_spec flow;
+
+	if (!sk->sk_netpolicy.dev)
+		return;
+
+	if (sk->sk_netpolicy.policy <= NET_POLICY_NONE)
+		return;
+
+	queue_index = netpolicy_pick_queue(&sk->sk_netpolicy, true);
+	if ((queue_index < 0) ||
+	    (queue_index == sk->sk_netpolicy.rule_queue))
+		return;
+
+	memset(&flow, 0, sizeof(flow));
+	/* TODO: need to change here and add more protocol support */
+	if (sk->sk_protocol == IPPROTO_TCP &&
+	    sk->sk_type == SOCK_STREAM) {
+		flow.flow_type = TCP_V4_FLOW;
+		flow.spec.tcp_udp_ip4_spec.ip4src = sk->sk_daddr;
+		flow.spec.tcp_udp_ip4_spec.psrc = sk->sk_dport;
+		flow.spec.tcp_udp_ip4_spec.ip4dst = sk->sk_rcv_saddr;
+		flow.spec.tcp_udp_ip4_spec.pdst = htons(sk->sk_num);
+	} else if (sk->sk_protocol == IPPROTO_UDP &&
+		   sk->sk_type == SOCK_DGRAM) {
+		DECLARE_SOCKADDR(struct sockaddr_in *, sin, msg->msg_name);
+
+		flow.flow_type = UDP_V4_FLOW;
+		if (sin && sin->sin_addr.s_addr)
+			flow.spec.tcp_udp_ip4_spec.ip4src = sin->sin_addr.s_addr;
+		else
+			return;
+		if (sin && sin->sin_port)
+			flow.spec.tcp_udp_ip4_spec.psrc = sin->sin_port;
+		else
+			return;
+		flow.spec.tcp_udp_ip4_spec.ip4dst = sk->sk_rcv_saddr;
+		flow.spec.tcp_udp_ip4_spec.pdst = htons(sk->sk_num);
+	} else {
+		return;
+	}
+	netpolicy_set_rules(&sk->sk_netpolicy, queue_index, &flow);
+
+#endif
+}
+
 int inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
 		 int flags)
 {
@@ -772,6 +821,12 @@ int inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
 				   flags & ~MSG_DONTWAIT, &addr_len);
 	if (err >= 0)
 		msg->msg_namelen = addr_len;
+
+	/* The dev info, src address and port information for UDP
+	 * can only be retrieved after processing the msg.
+	 */
+	sock_netpolicy_manage_flow(sk, msg);
+
 	return err;
 }
 EXPORT_SYMBOL(inet_recvmsg);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index ca5e8ea..13181c8 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1785,6 +1785,10 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	if (sk) {
 		int ret;
 
+#ifdef CONFIG_NETPOLICY
+		/* Record dev info before it's discarded in udp_queue_rcv_skb */
+		sk->sk_netpolicy.dev = skb->dev;
+#endif
 		if (inet_get_convert_csum(sk) && uh->check && !IS_UDPLITE(sk))
 			skb_checksum_try_convert(skb, IPPROTO_UDP, uh->check,
 						 inet_compute_pseudo);
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 24/30] net/netpolicy: set rx queues according to policy
@ 2016-07-18  6:56   ` kan.liang
  0 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: intel-wired-lan

From: Kan Liang <kan.liang@intel.com>

For setting rx queues, this patch add rules for Flow Director filters.
Since we may not get all the information required for rule until the
first package arrived, it will add the rule after recvmsg. The first
several packages may not use the assigned queue.
The dev information will be discarded in udp_queue_rcv_skb, so we record
it in netpolicy struct in advance.
This patch only support INET tcp4 and udp4. It can be extend to other
socket type and V6 later shortly.
For each sk, it only supports one rule. If the port/address changed, the
previos rule will be replaced.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/netpolicy.h | 33 ++++++++++++++++--
 net/core/netpolicy.c      | 89 +++++++++++++++++++++++++++++++++++++++++++++++
 net/core/sock.c           |  4 +++
 net/ipv4/af_inet.c        | 55 +++++++++++++++++++++++++++++
 net/ipv4/udp.c            |  4 +++
 5 files changed, 183 insertions(+), 2 deletions(-)

diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index e20820d..1cd5ac4 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -82,8 +82,27 @@ struct netpolicy_info {
 
 struct netpolicy_reg {
 	struct net_device	*dev;
-	enum netpolicy_name	policy; /* required policy */
-	void			*ptr;   /* pointers */
+	enum netpolicy_name	policy;		/* required policy */
+	void			*ptr;		/* pointers */
+	u32			location;	/* rule location */
+	u32			rule_queue;	/* queue set by rule */
+};
+
+struct netpolicy_tcpudpip4_spec {
+	/* source and Destination host and port */
+	__be32	ip4src;
+	__be32	ip4dst;
+	__be16	psrc;
+	__be16	pdst;
+};
+
+union netpolicy_flow_union {
+	struct netpolicy_tcpudpip4_spec		tcp_udp_ip4_spec;
+};
+
+struct netpolicy_flow_spec {
+	__u32	flow_type;
+	union netpolicy_flow_union	spec;
 };
 
 /* check if policy is valid */
@@ -98,6 +117,9 @@ extern int netpolicy_register(struct netpolicy_reg *reg,
 			      enum netpolicy_name policy);
 extern void netpolicy_unregister(struct netpolicy_reg *reg);
 extern int netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx);
+extern int netpolicy_set_rules(struct netpolicy_reg *reg,
+			       u32 queue_index,
+			       struct netpolicy_flow_spec *flow);
 #else
 static inline void update_netpolicy_sys_map(void)
 {
@@ -116,6 +138,13 @@ static inline int netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx)
 {
 	return 0;
 }
+
+static inline int netpolicy_set_rules(struct netpolicy_reg *reg,
+				      u32 queue_index,
+				      struct netpolicy_flow_spec *flow)
+{
+	return 0;
+}
 #endif
 
 #endif /*__LINUX_NETPOLICY_H*/
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 6992d08..0ed3080 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -39,6 +39,7 @@
 #include <linux/ctype.h>
 #include <linux/cpu.h>
 #include <linux/hashtable.h>
+#include <net/rtnetlink.h>
 
 struct netpolicy_record {
 	struct hlist_node	hash_node;
@@ -474,6 +475,20 @@ void netpolicy_unregister(struct netpolicy_reg *reg)
 {
 	struct netpolicy_record *record;
 	unsigned long ptr_id = (uintptr_t)reg->ptr;
+	struct net_device *dev = reg->dev;
+
+	/* remove FD rules */
+	if (dev && reg->location != ~0) {
+		struct ethtool_rxnfc del_cmd;
+
+		del_cmd.cmd = ETHTOOL_SRXCLSRLDEL;
+		del_cmd.fs.location = reg->location;
+		rtnl_lock();
+		dev->ethtool_ops->set_rxnfc(dev, &del_cmd);
+		rtnl_unlock();
+		reg->location = ~0;
+		reg->rule_queue = ~0;
+	}
 
 	spin_lock_bh(&np_hashtable_lock);
 	/* del from hash table */
@@ -489,6 +504,80 @@ void netpolicy_unregister(struct netpolicy_reg *reg)
 }
 EXPORT_SYMBOL(netpolicy_unregister);
 
+/**
+ * netpolicy_set_rules() - Configure Rx network flow classification rules
+ * @reg:		NET policy register info
+ * @queue_index:	Rx queue which want to set rules
+ * @flow:		Target flow to apply rules
+ *
+ * This function intends to configure Rx network flow classification rules
+ * according to ip and port information.
+ *
+ * Currently, it only supports TCP and UDP V4. Other protocols will be
+ * supported later.
+ *
+ * Return: 0 on success, others on failure
+ */
+int netpolicy_set_rules(struct netpolicy_reg *reg,
+			u32 queue_index,
+			struct netpolicy_flow_spec *flow)
+{
+	int ret;
+	struct ethtool_rxnfc cmd;
+	struct net_device *dev = reg->dev;
+
+	if (!dev)
+		return -EINVAL;
+
+	/* Check if ntuple is supported */
+	if (!dev->ethtool_ops->set_rxnfc)
+		return -EOPNOTSUPP;
+
+	/* Only support TCP/UDP V4 by now */
+	if ((flow->flow_type != TCP_V4_FLOW) &&
+	    (flow->flow_type != UDP_V4_FLOW))
+		return -EOPNOTSUPP;
+
+	/* using flow-type (Flow Director filters) */
+	memset(&cmd, 0, sizeof(cmd));
+	cmd.cmd = ETHTOOL_SRXCLSRLINS;
+	cmd.fs.flow_type = flow->flow_type;
+	cmd.fs.h_u.tcp_ip4_spec.ip4src = flow->spec.tcp_udp_ip4_spec.ip4src;
+	cmd.fs.h_u.tcp_ip4_spec.psrc = flow->spec.tcp_udp_ip4_spec.psrc;
+	cmd.fs.h_u.tcp_ip4_spec.ip4dst = flow->spec.tcp_udp_ip4_spec.ip4dst;
+	cmd.fs.h_u.tcp_ip4_spec.pdst = flow->spec.tcp_udp_ip4_spec.pdst;
+	cmd.fs.ring_cookie = queue_index;
+	cmd.fs.location = RX_CLS_LOC_ANY;
+	rtnl_lock();
+	ret = dev->ethtool_ops->set_rxnfc(dev, &cmd);
+	rtnl_unlock();
+	if (ret < 0) {
+		pr_warn("Failed to set rules ret %d\n", ret);
+		return ret;
+	}
+
+	/* TODO: now one sk only has one rule */
+	if (reg->location != ~0) {
+		/* delete the old rule */
+		struct ethtool_rxnfc del_cmd;
+
+		del_cmd.cmd = ETHTOOL_SRXCLSRLDEL;
+		del_cmd.fs.location = reg->location;
+		rtnl_lock();
+		ret = dev->ethtool_ops->set_rxnfc(dev, &del_cmd);
+		rtnl_unlock();
+		if (ret < 0)
+			pr_warn("Failed to delete rules ret %d\n", ret);
+	}
+
+	/* record rule location */
+	reg->location = cmd.fs.location;
+	reg->rule_queue = queue_index;
+
+	return ret;
+}
+EXPORT_SYMBOL(netpolicy_set_rules);
+
 const char *policy_name[NET_POLICY_MAX] = {
 	"NONE",
 	"CPU",
diff --git a/net/core/sock.c b/net/core/sock.c
index 6eaaa08..849274a 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1439,6 +1439,8 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 		sk->sk_netpolicy.dev = NULL;
 		sk->sk_netpolicy.ptr = (void *)sk;
 		sk->sk_netpolicy.policy = NET_POLICY_INVALID;
+		sk->sk_netpolicy.location = ~0;
+		sk->sk_netpolicy.rule_queue = ~0;
 #endif
 	}
 
@@ -1620,6 +1622,8 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 
 #ifdef CONFIG_NETPOLICY
 		newsk->sk_netpolicy.ptr = (void *)newsk;
+		newsk->sk_netpolicy.location = ~0;
+		newsk->sk_netpolicy.rule_queue = ~0;
 		if (is_net_policy_valid(newsk->sk_netpolicy.policy))
 			netpolicy_register(&newsk->sk_netpolicy, newsk->sk_netpolicy.policy);
 
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 55513e6..889ffdc 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -759,6 +759,55 @@ ssize_t inet_sendpage(struct socket *sock, struct page *page, int offset,
 }
 EXPORT_SYMBOL(inet_sendpage);
 
+static void sock_netpolicy_manage_flow(struct sock *sk, struct msghdr *msg)
+{
+#ifdef CONFIG_NETPOLICY
+	int queue_index;
+	struct netpolicy_flow_spec flow;
+
+	if (!sk->sk_netpolicy.dev)
+		return;
+
+	if (sk->sk_netpolicy.policy <= NET_POLICY_NONE)
+		return;
+
+	queue_index = netpolicy_pick_queue(&sk->sk_netpolicy, true);
+	if ((queue_index < 0) ||
+	    (queue_index == sk->sk_netpolicy.rule_queue))
+		return;
+
+	memset(&flow, 0, sizeof(flow));
+	/* TODO: need to change here and add more protocol support */
+	if (sk->sk_protocol == IPPROTO_TCP &&
+	    sk->sk_type == SOCK_STREAM) {
+		flow.flow_type = TCP_V4_FLOW;
+		flow.spec.tcp_udp_ip4_spec.ip4src = sk->sk_daddr;
+		flow.spec.tcp_udp_ip4_spec.psrc = sk->sk_dport;
+		flow.spec.tcp_udp_ip4_spec.ip4dst = sk->sk_rcv_saddr;
+		flow.spec.tcp_udp_ip4_spec.pdst = htons(sk->sk_num);
+	} else if (sk->sk_protocol == IPPROTO_UDP &&
+		   sk->sk_type == SOCK_DGRAM) {
+		DECLARE_SOCKADDR(struct sockaddr_in *, sin, msg->msg_name);
+
+		flow.flow_type = UDP_V4_FLOW;
+		if (sin && sin->sin_addr.s_addr)
+			flow.spec.tcp_udp_ip4_spec.ip4src = sin->sin_addr.s_addr;
+		else
+			return;
+		if (sin && sin->sin_port)
+			flow.spec.tcp_udp_ip4_spec.psrc = sin->sin_port;
+		else
+			return;
+		flow.spec.tcp_udp_ip4_spec.ip4dst = sk->sk_rcv_saddr;
+		flow.spec.tcp_udp_ip4_spec.pdst = htons(sk->sk_num);
+	} else {
+		return;
+	}
+	netpolicy_set_rules(&sk->sk_netpolicy, queue_index, &flow);
+
+#endif
+}
+
 int inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
 		 int flags)
 {
@@ -772,6 +821,12 @@ int inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
 				   flags & ~MSG_DONTWAIT, &addr_len);
 	if (err >= 0)
 		msg->msg_namelen = addr_len;
+
+	/* The dev info, src address and port information for UDP
+	 * can only be retrieved after processing the msg.
+	 */
+	sock_netpolicy_manage_flow(sk, msg);
+
 	return err;
 }
 EXPORT_SYMBOL(inet_recvmsg);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index ca5e8ea..13181c8 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1785,6 +1785,10 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	if (sk) {
 		int ret;
 
+#ifdef CONFIG_NETPOLICY
+		/* Record dev info before it's discarded in udp_queue_rcv_skb */
+		sk->sk_netpolicy.dev = skb->dev;
+#endif
 		if (inet_get_convert_csum(sk) && uh->check && !IS_UDPLITE(sk))
 			skb_checksum_try_convert(skb, IPPROTO_UDP, uh->check,
 						 inet_compute_pseudo);
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [RFC PATCH 25/30] net/netpolicy: introduce per task net policy
  2016-07-18  6:55 ` [Intel-wired-lan] " kan.liang
@ 2016-07-18  6:56   ` kan.liang
  -1 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: davem, linux-kernel, intel-wired-lan, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

Usually, application as a whole has specific requirement. Applying the
net policy to all sockets one by one in the application is too complex.
This patch introduces per task net policy to address this case.
Once the per task net policy is applied, all the sockets in the
application will apply the same net policy. Also, per task net policy
can be inherited by all children.

The usage of PR_SET_NETPOLICY option is as below.
prctl(PR_SET_NETPOLICY, POLICY_NAME, NULL, NULL, NULL).
It applies per task policy. The policy name must be valid and compatible
with current device policy. Othrewise, it will error out. The task
policy will be set to NET_POLICY_INVALID.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/init_task.h  | 11 +++++++++++
 include/linux/sched.h      |  5 +++++
 include/net/sock.h         |  1 +
 include/uapi/linux/prctl.h |  4 ++++
 kernel/exit.c              |  4 ++++
 kernel/fork.c              |  8 ++++++++
 kernel/sys.c               | 31 +++++++++++++++++++++++++++++++
 net/core/dev.c             | 26 +++++++++++++++++++-------
 net/core/netpolicy.c       | 34 ++++++++++++++++++++++++++++++++++
 net/core/sock.c            | 10 +++++++++-
 net/ipv4/af_inet.c         | 38 +++++++++++++++++++++++++++++---------
 11 files changed, 155 insertions(+), 17 deletions(-)

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index f8834f8..eda7ffc 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -183,6 +183,16 @@ extern struct task_group root_task_group;
 # define INIT_KASAN(tsk)
 #endif
 
+#ifdef CONFIG_NETPOLICY
+#define INIT_NETPOLICY(tsk)						\
+	.task_netpolicy.policy = NET_POLICY_INVALID,			\
+	.task_netpolicy.dev = NULL,					\
+	.task_netpolicy.location = ~0,					\
+	.task_netpolicy.rule_queue = ~0,				\
+	.task_netpolicy.ptr = (void *)&tsk,
+#else
+#define INIT_NETPOLICY(tsk)
+#endif
 /*
  *  INIT_TASK is used to set up the first task table, touch at
  * your own risk!. Base=0, limit=0x1fffff (=2MB)
@@ -260,6 +270,7 @@ extern struct task_group root_task_group;
 	INIT_VTIME(tsk)							\
 	INIT_NUMA_BALANCING(tsk)					\
 	INIT_KASAN(tsk)							\
+	INIT_NETPOLICY(tsk)						\
 }
 
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 253538f..2f37989 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -62,6 +62,8 @@ struct sched_param {
 
 #include <asm/processor.h>
 
+#include <linux/netpolicy.h>
+
 #define SCHED_ATTR_SIZE_VER0	48	/* sizeof first published struct */
 
 /*
@@ -1918,6 +1920,9 @@ struct task_struct {
 #ifdef CONFIG_MMU
 	struct task_struct *oom_reaper_list;
 #endif
+#ifdef CONFIG_NETPOLICY
+	struct netpolicy_reg task_netpolicy;
+#endif
 /* CPU-specific state of this task */
 	struct thread_struct thread;
 /*
diff --git a/include/net/sock.h b/include/net/sock.h
index e4721de..c7cc055 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1477,6 +1477,7 @@ void sock_edemux(struct sk_buff *skb);
 #define sock_edemux(skb) sock_efree(skb)
 #endif
 
+void sock_setnetpolicy(struct socket *sock);
 int sock_setsockopt(struct socket *sock, int level, int op,
 		    char __user *optval, unsigned int optlen);
 
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index a8d0759..bc182d2 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -197,4 +197,8 @@ struct prctl_mm_map {
 # define PR_CAP_AMBIENT_LOWER		3
 # define PR_CAP_AMBIENT_CLEAR_ALL	4
 
+/* Control net policy */
+#define PR_SET_NETPOLICY		48
+#define PR_GET_NETPOLICY		49
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/exit.c b/kernel/exit.c
index 9e6e135..8995ec7 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -778,6 +778,10 @@ void do_exit(long code)
 	if (unlikely(current->pi_state_cache))
 		kfree(current->pi_state_cache);
 #endif
+#ifdef CONFIG_NETPOLICY
+	if (is_net_policy_valid(current->task_netpolicy.policy))
+		netpolicy_unregister(&current->task_netpolicy);
+#endif
 	/*
 	 * Make sure we are holding no locks:
 	 */
diff --git a/kernel/fork.c b/kernel/fork.c
index 4a7ec0c..31262d2 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1453,6 +1453,14 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	p->sequential_io_avg	= 0;
 #endif
 
+#ifdef CONFIG_NETPOLICY
+	p->task_netpolicy.location = ~0;
+	p->task_netpolicy.rule_queue = ~0;
+	p->task_netpolicy.ptr = (void *)p;
+	if (is_net_policy_valid(p->task_netpolicy.policy))
+		netpolicy_register(&p->task_netpolicy, p->task_netpolicy.policy);
+#endif
+
 	/* Perform scheduler related setup. Assign this task to a CPU. */
 	retval = sched_fork(clone_flags, p);
 	if (retval)
diff --git a/kernel/sys.c b/kernel/sys.c
index 89d5be4..b481a64 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2072,6 +2072,31 @@ static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr)
 }
 #endif
 
+#ifdef CONFIG_NETPOLICY
+static int prctl_set_netpolicy(struct task_struct *me, int policy)
+{
+	return netpolicy_register(&me->task_netpolicy, policy);
+}
+
+static int prctl_get_netpolicy(struct task_struct *me, unsigned long adr)
+{
+	return put_user(me->task_netpolicy.policy, (int __user *)adr);
+}
+
+#else /* CONFIG_NETPOLICY */
+
+static int prctl_set_netpolicy(struct task_struct *me, int policy)
+{
+	return -EINVAL;
+}
+
+static int prctl_get_netpolicy(struct task_struct *me, unsigned long adr)
+{
+	return -EINVAL;
+}
+
+#endif /* CONFIG_NETPOLICY */
+
 SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		unsigned long, arg4, unsigned long, arg5)
 {
@@ -2270,6 +2295,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 	case PR_GET_FP_MODE:
 		error = GET_FP_MODE(me);
 		break;
+	case PR_SET_NETPOLICY:
+		error = prctl_set_netpolicy(me, arg2);
+		break;
+	case PR_GET_NETPOLICY:
+		error = prctl_get_netpolicy(me, arg2);
+		break;
 	default:
 		error = -EINVAL;
 		break;
diff --git a/net/core/dev.c b/net/core/dev.c
index 6108e3b..f8213d2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3283,13 +3283,25 @@ struct netdev_queue *netdev_pick_tx(struct net_device *dev,
 		else {
 #ifdef CONFIG_NETPOLICY
 			queue_index = -1;
-			if (sk && (sk->sk_netpolicy.policy > NET_POLICY_NONE)) {
-				/* There is no device bind to socket when setting policy
-				 * Assign the dev now.
-				 */
-				if (!sk->sk_netpolicy.dev)
-					sk->sk_netpolicy.dev = dev;
-				queue_index = netpolicy_pick_queue(&sk->sk_netpolicy, false);
+			if (dev->netpolicy && sk) {
+				if (is_net_policy_valid(current->task_netpolicy.policy)) {
+					if (!current->task_netpolicy.dev)
+						current->task_netpolicy.dev = dev;
+					if (is_net_policy_valid(sk->sk_netpolicy.policy))
+						netpolicy_unregister(&sk->sk_netpolicy);
+
+					if (current->task_netpolicy.policy > NET_POLICY_NONE)
+						queue_index = netpolicy_pick_queue(&current->task_netpolicy, false);
+				} else {
+					if (sk->sk_netpolicy.policy > NET_POLICY_NONE) {
+						/* There is no device bind to socket when setting policy
+						 * Assign the dev now.
+						 */
+						if (!sk->sk_netpolicy.dev)
+							sk->sk_netpolicy.dev = dev;
+						queue_index = netpolicy_pick_queue(&sk->sk_netpolicy, false);
+					}
+				}
 			}
 			if (queue_index < 0)
 #endif
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 0ed3080..9e14137 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -24,6 +24,35 @@
  *	  is too difficult for users.
  * 	So, it is a big challenge to get good network performance.
  *
+ * NET policy supports four policies per device, and three policies per task
+ * and per socket. For using NET policy, the device policy must be set in
+ * advance. The task policy or socket policy must be compatible with device
+ * policy.
+ *
+ * BULK policy		This policy is designed for high throughput. It can be
+ *			applied to either device policy or task/socket policy.
+ *			If it is applied to device policy, the only compatible
+ *			task/socket policy is BULK policy itself.
+ * CPU policy		This policy is designed for high throughput and lower
+ *			CPU utilization. It can be applied to either device
+ *			policy or task/socket policy. If it is applied to
+ *			device policy, the only compatible task/socket policy
+ *			is CPU policy itself.
+ * LATENCY policy	This policy is designed for low latency. It can be
+ *			applied to either device policy or task/socket policy.
+ *			If it is applied to device policy, the only compatible
+ *			task/socket policy is LATENCY policy itself.
+ * MIX policy		This policy can only be applied to device policy. It
+ *			is compatible with BULK and LATENCY policy. This
+ *			policy is designed for the case which miscellaneous
+ *			types of workload running on the device.
+ *
+ * The device policy changes the system configuration and reorganize the
+ * resource on the device, but it does not change the packets behavior.
+ * The task policy and socket policy redirect the packets to get good
+ * performance. If both task policy and socket policy are set in the same
+ * task, task policy will be applied. The task policy can also be inherited by
+ * children.
  */
 #include <linux/module.h>
 #include <linux/kernel.h>
@@ -360,6 +389,11 @@ int netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx)
 		goto err;
 	}
 
+	/* task policy should be the same as socket policy */
+	if (is_net_policy_valid(current->task_netpolicy.policy) &&
+	    (current->task_netpolicy.policy != reg->policy))
+		return -EINVAL;
+
 	old_record = netpolicy_record_search(ptr_id);
 	if (!old_record) {
 		pr_warn("NETPOLICY: doesn't registered. Remove net policy settings!\n");
diff --git a/net/core/sock.c b/net/core/sock.c
index 849274a..4d47a89 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1005,7 +1005,13 @@ set_rcvbuf:
 
 #ifdef CONFIG_NETPOLICY
 	case SO_NETPOLICY:
-		ret = netpolicy_register(&sk->sk_netpolicy, val);
+		if (is_net_policy_valid(current->task_netpolicy.policy) &&
+		    (current->task_netpolicy.policy != val)) {
+			printk_ratelimited(KERN_WARNING "NETPOLICY: new policy is not compatible with task netpolicy\n");
+			ret = -EINVAL;
+		} else {
+			ret = netpolicy_register(&sk->sk_netpolicy, val);
+		}
 		break;
 #endif
 	default:
@@ -1624,6 +1630,8 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 		newsk->sk_netpolicy.ptr = (void *)newsk;
 		newsk->sk_netpolicy.location = ~0;
 		newsk->sk_netpolicy.rule_queue = ~0;
+		if (is_net_policy_valid(current->task_netpolicy.policy))
+			newsk->sk_netpolicy.policy = NET_POLICY_INVALID;
 		if (is_net_policy_valid(newsk->sk_netpolicy.policy))
 			netpolicy_register(&newsk->sk_netpolicy, newsk->sk_netpolicy.policy);
 
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 889ffdc..3727240 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -765,16 +765,33 @@ static void sock_netpolicy_manage_flow(struct sock *sk, struct msghdr *msg)
 	int queue_index;
 	struct netpolicy_flow_spec flow;
 
-	if (!sk->sk_netpolicy.dev)
-		return;
+	if (is_net_policy_valid(current->task_netpolicy.policy)) {
+		if (current->task_netpolicy.policy == NET_POLICY_NONE)
+			return;
 
-	if (sk->sk_netpolicy.policy <= NET_POLICY_NONE)
-		return;
+		if ((!sk->sk_netpolicy.dev) && (!current->task_netpolicy.dev))
+			return;
 
-	queue_index = netpolicy_pick_queue(&sk->sk_netpolicy, true);
-	if ((queue_index < 0) ||
-	    (queue_index == sk->sk_netpolicy.rule_queue))
-		return;
+		if (!current->task_netpolicy.dev)
+			current->task_netpolicy.dev = sk->sk_netpolicy.dev;
+		if (is_net_policy_valid(sk->sk_netpolicy.policy))
+			netpolicy_unregister(&sk->sk_netpolicy);
+		queue_index = netpolicy_pick_queue(&current->task_netpolicy, true);
+		if ((queue_index < 0) ||
+		    (queue_index == current->task_netpolicy.rule_queue))
+			return;
+	} else {
+		if (!sk->sk_netpolicy.dev)
+			return;
+
+		if (sk->sk_netpolicy.policy <= NET_POLICY_NONE)
+			return;
+
+		queue_index = netpolicy_pick_queue(&sk->sk_netpolicy, true);
+		if ((queue_index < 0) ||
+		    (queue_index == sk->sk_netpolicy.rule_queue))
+			return;
+	}
 
 	memset(&flow, 0, sizeof(flow));
 	/* TODO: need to change here and add more protocol support */
@@ -803,7 +820,10 @@ static void sock_netpolicy_manage_flow(struct sock *sk, struct msghdr *msg)
 	} else {
 		return;
 	}
-	netpolicy_set_rules(&sk->sk_netpolicy, queue_index, &flow);
+	if (current->task_netpolicy.policy > NET_POLICY_NONE)
+		netpolicy_set_rules(&current->task_netpolicy, queue_index, &flow);
+	else
+		netpolicy_set_rules(&sk->sk_netpolicy, queue_index, &flow);
 
 #endif
 }
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 25/30] net/netpolicy: introduce per task net policy
@ 2016-07-18  6:56   ` kan.liang
  0 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: intel-wired-lan

From: Kan Liang <kan.liang@intel.com>

Usually, application as a whole has specific requirement. Applying the
net policy to all sockets one by one in the application is too complex.
This patch introduces per task net policy to address this case.
Once the per task net policy is applied, all the sockets in the
application will apply the same net policy. Also, per task net policy
can be inherited by all children.

The usage of PR_SET_NETPOLICY option is as below.
prctl(PR_SET_NETPOLICY, POLICY_NAME, NULL, NULL, NULL).
It applies per task policy. The policy name must be valid and compatible
with current device policy. Othrewise, it will error out. The task
policy will be set to NET_POLICY_INVALID.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/init_task.h  | 11 +++++++++++
 include/linux/sched.h      |  5 +++++
 include/net/sock.h         |  1 +
 include/uapi/linux/prctl.h |  4 ++++
 kernel/exit.c              |  4 ++++
 kernel/fork.c              |  8 ++++++++
 kernel/sys.c               | 31 +++++++++++++++++++++++++++++++
 net/core/dev.c             | 26 +++++++++++++++++++-------
 net/core/netpolicy.c       | 34 ++++++++++++++++++++++++++++++++++
 net/core/sock.c            | 10 +++++++++-
 net/ipv4/af_inet.c         | 38 +++++++++++++++++++++++++++++---------
 11 files changed, 155 insertions(+), 17 deletions(-)

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index f8834f8..eda7ffc 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -183,6 +183,16 @@ extern struct task_group root_task_group;
 # define INIT_KASAN(tsk)
 #endif
 
+#ifdef CONFIG_NETPOLICY
+#define INIT_NETPOLICY(tsk)						\
+	.task_netpolicy.policy = NET_POLICY_INVALID,			\
+	.task_netpolicy.dev = NULL,					\
+	.task_netpolicy.location = ~0,					\
+	.task_netpolicy.rule_queue = ~0,				\
+	.task_netpolicy.ptr = (void *)&tsk,
+#else
+#define INIT_NETPOLICY(tsk)
+#endif
 /*
  *  INIT_TASK is used to set up the first task table, touch at
  * your own risk!. Base=0, limit=0x1fffff (=2MB)
@@ -260,6 +270,7 @@ extern struct task_group root_task_group;
 	INIT_VTIME(tsk)							\
 	INIT_NUMA_BALANCING(tsk)					\
 	INIT_KASAN(tsk)							\
+	INIT_NETPOLICY(tsk)						\
 }
 
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 253538f..2f37989 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -62,6 +62,8 @@ struct sched_param {
 
 #include <asm/processor.h>
 
+#include <linux/netpolicy.h>
+
 #define SCHED_ATTR_SIZE_VER0	48	/* sizeof first published struct */
 
 /*
@@ -1918,6 +1920,9 @@ struct task_struct {
 #ifdef CONFIG_MMU
 	struct task_struct *oom_reaper_list;
 #endif
+#ifdef CONFIG_NETPOLICY
+	struct netpolicy_reg task_netpolicy;
+#endif
 /* CPU-specific state of this task */
 	struct thread_struct thread;
 /*
diff --git a/include/net/sock.h b/include/net/sock.h
index e4721de..c7cc055 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1477,6 +1477,7 @@ void sock_edemux(struct sk_buff *skb);
 #define sock_edemux(skb) sock_efree(skb)
 #endif
 
+void sock_setnetpolicy(struct socket *sock);
 int sock_setsockopt(struct socket *sock, int level, int op,
 		    char __user *optval, unsigned int optlen);
 
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index a8d0759..bc182d2 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -197,4 +197,8 @@ struct prctl_mm_map {
 # define PR_CAP_AMBIENT_LOWER		3
 # define PR_CAP_AMBIENT_CLEAR_ALL	4
 
+/* Control net policy */
+#define PR_SET_NETPOLICY		48
+#define PR_GET_NETPOLICY		49
+
 #endif /* _LINUX_PRCTL_H */
diff --git a/kernel/exit.c b/kernel/exit.c
index 9e6e135..8995ec7 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -778,6 +778,10 @@ void do_exit(long code)
 	if (unlikely(current->pi_state_cache))
 		kfree(current->pi_state_cache);
 #endif
+#ifdef CONFIG_NETPOLICY
+	if (is_net_policy_valid(current->task_netpolicy.policy))
+		netpolicy_unregister(&current->task_netpolicy);
+#endif
 	/*
 	 * Make sure we are holding no locks:
 	 */
diff --git a/kernel/fork.c b/kernel/fork.c
index 4a7ec0c..31262d2 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1453,6 +1453,14 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 	p->sequential_io_avg	= 0;
 #endif
 
+#ifdef CONFIG_NETPOLICY
+	p->task_netpolicy.location = ~0;
+	p->task_netpolicy.rule_queue = ~0;
+	p->task_netpolicy.ptr = (void *)p;
+	if (is_net_policy_valid(p->task_netpolicy.policy))
+		netpolicy_register(&p->task_netpolicy, p->task_netpolicy.policy);
+#endif
+
 	/* Perform scheduler related setup. Assign this task to a CPU. */
 	retval = sched_fork(clone_flags, p);
 	if (retval)
diff --git a/kernel/sys.c b/kernel/sys.c
index 89d5be4..b481a64 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -2072,6 +2072,31 @@ static int prctl_get_tid_address(struct task_struct *me, int __user **tid_addr)
 }
 #endif
 
+#ifdef CONFIG_NETPOLICY
+static int prctl_set_netpolicy(struct task_struct *me, int policy)
+{
+	return netpolicy_register(&me->task_netpolicy, policy);
+}
+
+static int prctl_get_netpolicy(struct task_struct *me, unsigned long adr)
+{
+	return put_user(me->task_netpolicy.policy, (int __user *)adr);
+}
+
+#else /* CONFIG_NETPOLICY */
+
+static int prctl_set_netpolicy(struct task_struct *me, int policy)
+{
+	return -EINVAL;
+}
+
+static int prctl_get_netpolicy(struct task_struct *me, unsigned long adr)
+{
+	return -EINVAL;
+}
+
+#endif /* CONFIG_NETPOLICY */
+
 SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 		unsigned long, arg4, unsigned long, arg5)
 {
@@ -2270,6 +2295,12 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3,
 	case PR_GET_FP_MODE:
 		error = GET_FP_MODE(me);
 		break;
+	case PR_SET_NETPOLICY:
+		error = prctl_set_netpolicy(me, arg2);
+		break;
+	case PR_GET_NETPOLICY:
+		error = prctl_get_netpolicy(me, arg2);
+		break;
 	default:
 		error = -EINVAL;
 		break;
diff --git a/net/core/dev.c b/net/core/dev.c
index 6108e3b..f8213d2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3283,13 +3283,25 @@ struct netdev_queue *netdev_pick_tx(struct net_device *dev,
 		else {
 #ifdef CONFIG_NETPOLICY
 			queue_index = -1;
-			if (sk && (sk->sk_netpolicy.policy > NET_POLICY_NONE)) {
-				/* There is no device bind to socket when setting policy
-				 * Assign the dev now.
-				 */
-				if (!sk->sk_netpolicy.dev)
-					sk->sk_netpolicy.dev = dev;
-				queue_index = netpolicy_pick_queue(&sk->sk_netpolicy, false);
+			if (dev->netpolicy && sk) {
+				if (is_net_policy_valid(current->task_netpolicy.policy)) {
+					if (!current->task_netpolicy.dev)
+						current->task_netpolicy.dev = dev;
+					if (is_net_policy_valid(sk->sk_netpolicy.policy))
+						netpolicy_unregister(&sk->sk_netpolicy);
+
+					if (current->task_netpolicy.policy > NET_POLICY_NONE)
+						queue_index = netpolicy_pick_queue(&current->task_netpolicy, false);
+				} else {
+					if (sk->sk_netpolicy.policy > NET_POLICY_NONE) {
+						/* There is no device bind to socket when setting policy
+						 * Assign the dev now.
+						 */
+						if (!sk->sk_netpolicy.dev)
+							sk->sk_netpolicy.dev = dev;
+						queue_index = netpolicy_pick_queue(&sk->sk_netpolicy, false);
+					}
+				}
 			}
 			if (queue_index < 0)
 #endif
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 0ed3080..9e14137 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -24,6 +24,35 @@
  *	  is too difficult for users.
  * 	So, it is a big challenge to get good network performance.
  *
+ * NET policy supports four policies per device, and three policies per task
+ * and per socket. For using NET policy, the device policy must be set in
+ * advance. The task policy or socket policy must be compatible with device
+ * policy.
+ *
+ * BULK policy		This policy is designed for high throughput. It can be
+ *			applied to either device policy or task/socket policy.
+ *			If it is applied to device policy, the only compatible
+ *			task/socket policy is BULK policy itself.
+ * CPU policy		This policy is designed for high throughput and lower
+ *			CPU utilization. It can be applied to either device
+ *			policy or task/socket policy. If it is applied to
+ *			device policy, the only compatible task/socket policy
+ *			is CPU policy itself.
+ * LATENCY policy	This policy is designed for low latency. It can be
+ *			applied to either device policy or task/socket policy.
+ *			If it is applied to device policy, the only compatible
+ *			task/socket policy is LATENCY policy itself.
+ * MIX policy		This policy can only be applied to device policy. It
+ *			is compatible with BULK and LATENCY policy. This
+ *			policy is designed for the case which miscellaneous
+ *			types of workload running on the device.
+ *
+ * The device policy changes the system configuration and reorganize the
+ * resource on the device, but it does not change the packets behavior.
+ * The task policy and socket policy redirect the packets to get good
+ * performance. If both task policy and socket policy are set in the same
+ * task, task policy will be applied. The task policy can also be inherited by
+ * children.
  */
 #include <linux/module.h>
 #include <linux/kernel.h>
@@ -360,6 +389,11 @@ int netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx)
 		goto err;
 	}
 
+	/* task policy should be the same as socket policy */
+	if (is_net_policy_valid(current->task_netpolicy.policy) &&
+	    (current->task_netpolicy.policy != reg->policy))
+		return -EINVAL;
+
 	old_record = netpolicy_record_search(ptr_id);
 	if (!old_record) {
 		pr_warn("NETPOLICY: doesn't registered. Remove net policy settings!\n");
diff --git a/net/core/sock.c b/net/core/sock.c
index 849274a..4d47a89 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1005,7 +1005,13 @@ set_rcvbuf:
 
 #ifdef CONFIG_NETPOLICY
 	case SO_NETPOLICY:
-		ret = netpolicy_register(&sk->sk_netpolicy, val);
+		if (is_net_policy_valid(current->task_netpolicy.policy) &&
+		    (current->task_netpolicy.policy != val)) {
+			printk_ratelimited(KERN_WARNING "NETPOLICY: new policy is not compatible with task netpolicy\n");
+			ret = -EINVAL;
+		} else {
+			ret = netpolicy_register(&sk->sk_netpolicy, val);
+		}
 		break;
 #endif
 	default:
@@ -1624,6 +1630,8 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 		newsk->sk_netpolicy.ptr = (void *)newsk;
 		newsk->sk_netpolicy.location = ~0;
 		newsk->sk_netpolicy.rule_queue = ~0;
+		if (is_net_policy_valid(current->task_netpolicy.policy))
+			newsk->sk_netpolicy.policy = NET_POLICY_INVALID;
 		if (is_net_policy_valid(newsk->sk_netpolicy.policy))
 			netpolicy_register(&newsk->sk_netpolicy, newsk->sk_netpolicy.policy);
 
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 889ffdc..3727240 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -765,16 +765,33 @@ static void sock_netpolicy_manage_flow(struct sock *sk, struct msghdr *msg)
 	int queue_index;
 	struct netpolicy_flow_spec flow;
 
-	if (!sk->sk_netpolicy.dev)
-		return;
+	if (is_net_policy_valid(current->task_netpolicy.policy)) {
+		if (current->task_netpolicy.policy == NET_POLICY_NONE)
+			return;
 
-	if (sk->sk_netpolicy.policy <= NET_POLICY_NONE)
-		return;
+		if ((!sk->sk_netpolicy.dev) && (!current->task_netpolicy.dev))
+			return;
 
-	queue_index = netpolicy_pick_queue(&sk->sk_netpolicy, true);
-	if ((queue_index < 0) ||
-	    (queue_index == sk->sk_netpolicy.rule_queue))
-		return;
+		if (!current->task_netpolicy.dev)
+			current->task_netpolicy.dev = sk->sk_netpolicy.dev;
+		if (is_net_policy_valid(sk->sk_netpolicy.policy))
+			netpolicy_unregister(&sk->sk_netpolicy);
+		queue_index = netpolicy_pick_queue(&current->task_netpolicy, true);
+		if ((queue_index < 0) ||
+		    (queue_index == current->task_netpolicy.rule_queue))
+			return;
+	} else {
+		if (!sk->sk_netpolicy.dev)
+			return;
+
+		if (sk->sk_netpolicy.policy <= NET_POLICY_NONE)
+			return;
+
+		queue_index = netpolicy_pick_queue(&sk->sk_netpolicy, true);
+		if ((queue_index < 0) ||
+		    (queue_index == sk->sk_netpolicy.rule_queue))
+			return;
+	}
 
 	memset(&flow, 0, sizeof(flow));
 	/* TODO: need to change here and add more protocol support */
@@ -803,7 +820,10 @@ static void sock_netpolicy_manage_flow(struct sock *sk, struct msghdr *msg)
 	} else {
 		return;
 	}
-	netpolicy_set_rules(&sk->sk_netpolicy, queue_index, &flow);
+	if (current->task_netpolicy.policy > NET_POLICY_NONE)
+		netpolicy_set_rules(&current->task_netpolicy, queue_index, &flow);
+	else
+		netpolicy_set_rules(&sk->sk_netpolicy, queue_index, &flow);
 
 #endif
 }
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [RFC PATCH 26/30] net/netpolicy: set per task policy by proc
  2016-07-18  6:55 ` [Intel-wired-lan] " kan.liang
@ 2016-07-18  6:56   ` kan.liang
  -1 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: davem, linux-kernel, intel-wired-lan, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

Users may not want to change the source code to add per task net polic
support. Or they may want to change a running task's net policy. prctl
does not work for both cases.

This patch adds an interface in /proc, which can be used to set and
retrieve policy of already running tasks. User can write the policy name
into /proc/$PID/net_policy to set per task net policy.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 fs/proc/base.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index a11eb71..7679785 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -91,6 +91,8 @@
 #include <asm/hardwall.h>
 #endif
 #include <trace/events/oom.h>
+#include <linux/netpolicy.h>
+#include <linux/ctype.h>
 #include "internal.h"
 #include "fd.h"
 
@@ -2807,6 +2809,65 @@ static int proc_pid_personality(struct seq_file *m, struct pid_namespace *ns,
 	return err;
 }
 
+#ifdef CONFIG_NETPOLICY
+static int proc_net_policy_show(struct seq_file *m, void *v)
+{
+	struct inode *inode = m->private;
+	struct task_struct *task = get_proc_task(inode);
+
+	if (is_net_policy_valid(task->task_netpolicy.policy))
+		seq_printf(m, "%s\n", policy_name[task->task_netpolicy.policy]);
+
+	return 0;
+}
+
+static int proc_net_policy_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, proc_net_policy_show, inode);
+}
+
+static ssize_t proc_net_policy_write(struct file *file, const char __user *buf,
+				     size_t count, loff_t *ppos)
+{
+	struct inode *inode = file_inode(file);
+	struct task_struct *task = get_proc_task(inode);
+	char name[POLICY_NAME_LEN_MAX];
+	int i, ret;
+
+	if (count >= POLICY_NAME_LEN_MAX)
+		return -EINVAL;
+
+	if (copy_from_user(name, buf, count))
+		return -EINVAL;
+
+	for (i = 0; i < count - 1; i++)
+		name[i] = toupper(name[i]);
+	name[POLICY_NAME_LEN_MAX - 1] = 0;
+
+	for (i = 0; i < NET_POLICY_MAX; i++) {
+		if (!strncmp(name, policy_name[i], strlen(policy_name[i]))) {
+			ret = netpolicy_register(&task->task_netpolicy, i);
+			if (ret)
+				return ret;
+			break;
+		}
+	}
+
+	if (i == NET_POLICY_MAX)
+		return -EINVAL;
+
+	return count;
+}
+
+static const struct file_operations proc_net_policy_operations = {
+	.open		= proc_net_policy_open,
+	.write		= proc_net_policy_write,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= seq_release,
+};
+#endif /* CONFIG_NETPOLICY */
+
 /*
  * Thread groups
  */
@@ -2906,6 +2967,9 @@ static const struct pid_entry tgid_base_stuff[] = {
 	REG("timers",	  S_IRUGO, proc_timers_operations),
 #endif
 	REG("timerslack_ns", S_IRUGO|S_IWUGO, proc_pid_set_timerslack_ns_operations),
+#if IS_ENABLED(CONFIG_NETPOLICY)
+	REG("net_policy", S_IRUSR|S_IWUSR, proc_net_policy_operations),
+#endif
 };
 
 static int proc_tgid_base_readdir(struct file *file, struct dir_context *ctx)
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 26/30] net/netpolicy: set per task policy by proc
@ 2016-07-18  6:56   ` kan.liang
  0 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: intel-wired-lan

From: Kan Liang <kan.liang@intel.com>

Users may not want to change the source code to add per task net polic
support. Or they may want to change a running task's net policy. prctl
does not work for both cases.

This patch adds an interface in /proc, which can be used to set and
retrieve policy of already running tasks. User can write the policy name
into /proc/$PID/net_policy to set per task net policy.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 fs/proc/base.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index a11eb71..7679785 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -91,6 +91,8 @@
 #include <asm/hardwall.h>
 #endif
 #include <trace/events/oom.h>
+#include <linux/netpolicy.h>
+#include <linux/ctype.h>
 #include "internal.h"
 #include "fd.h"
 
@@ -2807,6 +2809,65 @@ static int proc_pid_personality(struct seq_file *m, struct pid_namespace *ns,
 	return err;
 }
 
+#ifdef CONFIG_NETPOLICY
+static int proc_net_policy_show(struct seq_file *m, void *v)
+{
+	struct inode *inode = m->private;
+	struct task_struct *task = get_proc_task(inode);
+
+	if (is_net_policy_valid(task->task_netpolicy.policy))
+		seq_printf(m, "%s\n", policy_name[task->task_netpolicy.policy]);
+
+	return 0;
+}
+
+static int proc_net_policy_open(struct inode *inode, struct file *file)
+{
+	return single_open(file, proc_net_policy_show, inode);
+}
+
+static ssize_t proc_net_policy_write(struct file *file, const char __user *buf,
+				     size_t count, loff_t *ppos)
+{
+	struct inode *inode = file_inode(file);
+	struct task_struct *task = get_proc_task(inode);
+	char name[POLICY_NAME_LEN_MAX];
+	int i, ret;
+
+	if (count >= POLICY_NAME_LEN_MAX)
+		return -EINVAL;
+
+	if (copy_from_user(name, buf, count))
+		return -EINVAL;
+
+	for (i = 0; i < count - 1; i++)
+		name[i] = toupper(name[i]);
+	name[POLICY_NAME_LEN_MAX - 1] = 0;
+
+	for (i = 0; i < NET_POLICY_MAX; i++) {
+		if (!strncmp(name, policy_name[i], strlen(policy_name[i]))) {
+			ret = netpolicy_register(&task->task_netpolicy, i);
+			if (ret)
+				return ret;
+			break;
+		}
+	}
+
+	if (i == NET_POLICY_MAX)
+		return -EINVAL;
+
+	return count;
+}
+
+static const struct file_operations proc_net_policy_operations = {
+	.open		= proc_net_policy_open,
+	.write		= proc_net_policy_write,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= seq_release,
+};
+#endif /* CONFIG_NETPOLICY */
+
 /*
  * Thread groups
  */
@@ -2906,6 +2967,9 @@ static const struct pid_entry tgid_base_stuff[] = {
 	REG("timers",	  S_IRUGO, proc_timers_operations),
 #endif
 	REG("timerslack_ns", S_IRUGO|S_IWUGO, proc_pid_set_timerslack_ns_operations),
+#if IS_ENABLED(CONFIG_NETPOLICY)
+	REG("net_policy", S_IRUSR|S_IWUSR, proc_net_policy_operations),
+#endif
 };
 
 static int proc_tgid_base_readdir(struct file *file, struct dir_context *ctx)
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [RFC PATCH 27/30] net/netpolicy: fast path for finding the queues
  2016-07-18  6:55 ` [Intel-wired-lan] " kan.liang
@ 2016-07-18  6:56   ` kan.liang
  -1 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: davem, linux-kernel, intel-wired-lan, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

Current implementation searches the hash table to get assigned object
for each transmit/receive packet. It's not necessory, because the
assigned object usually remain unchanged.

This patch store the assigned queue into netpolicy_reg struct. So it
doesnot need to search the hash table everytime unless the system cpu
and queue mapping changed.

netpolicy_sys_map_version is used to track the system cpu and queue
mapping changes. It's protected by a rw lock (TODO: will replace by RCU
shortly).

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/init_task.h |  3 +++
 include/linux/netpolicy.h |  5 +++++
 kernel/fork.c             |  3 +++
 net/core/netpolicy.c      | 36 ++++++++++++++++++++++++++++++++++++
 net/core/sock.c           |  6 ++++++
 5 files changed, 53 insertions(+)

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index eda7ffc..06ea231 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -189,6 +189,9 @@ extern struct task_group root_task_group;
 	.task_netpolicy.dev = NULL,					\
 	.task_netpolicy.location = ~0,					\
 	.task_netpolicy.rule_queue = ~0,				\
+	.task_netpolicy.rx_queue = ~0,					\
+	.task_netpolicy.tx_queue = ~0,					\
+	.task_netpolicy.sys_map_version = 0,				\
 	.task_netpolicy.ptr = (void *)&tsk,
 #else
 #define INIT_NETPOLICY(tsk)
diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index 1cd5ac4..fa740b5 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -39,6 +39,7 @@ enum netpolicy_traffic {
 
 #define POLICY_NAME_LEN_MAX	64
 extern const char *policy_name[];
+extern int netpolicy_sys_map_version __read_mostly;
 
 struct netpolicy_dev_info {
 	u32	rx_num;
@@ -86,6 +87,10 @@ struct netpolicy_reg {
 	void			*ptr;		/* pointers */
 	u32			location;	/* rule location */
 	u32			rule_queue;	/* queue set by rule */
+	/* Info for fast path */
+	u32			rx_queue;
+	u32			tx_queue;
+	int			sys_map_version;
 };
 
 struct netpolicy_tcpudpip4_spec {
diff --git a/kernel/fork.c b/kernel/fork.c
index 31262d2..fcb856b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1456,6 +1456,9 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 #ifdef CONFIG_NETPOLICY
 	p->task_netpolicy.location = ~0;
 	p->task_netpolicy.rule_queue = ~0;
+	p->task_netpolicy.rx_queue = ~0;
+	p->task_netpolicy.tx_queue = ~0;
+	p->task_netpolicy.sys_map_version = 0;
 	p->task_netpolicy.ptr = (void *)p;
 	if (is_net_policy_valid(p->task_netpolicy.policy))
 		netpolicy_register(&p->task_netpolicy, p->task_netpolicy.policy);
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 9e14137..a63ccd4 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -82,6 +82,10 @@ struct netpolicy_record {
 static DEFINE_HASHTABLE(np_record_hash, 10);
 static DEFINE_SPINLOCK(np_hashtable_lock);
 
+int netpolicy_sys_map_version;
+/* read write lock to protect sys map version */
+static DEFINE_RWLOCK(np_sys_map_lock);
+
 static int netpolicy_get_dev_info(struct net_device *dev,
 				  struct netpolicy_dev_info *d_info)
 {
@@ -394,6 +398,24 @@ int netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx)
 	    (current->task_netpolicy.policy != reg->policy))
 		return -EINVAL;
 
+	/* fast path */
+	read_lock(&np_sys_map_lock);
+	if (netpolicy_sys_map_version == reg->sys_map_version) {
+		if (is_rx && (reg->rx_queue != ~0)) {
+			read_unlock(&np_sys_map_lock);
+			return reg->rx_queue;
+		}
+		if (!is_rx && (reg->tx_queue != ~0)) {
+			read_unlock(&np_sys_map_lock);
+			return reg->tx_queue;
+		}
+	} else {
+		reg->rx_queue = ~0;
+		reg->tx_queue = ~0;
+		reg->sys_map_version = netpolicy_sys_map_version;
+	}
+	read_unlock(&np_sys_map_lock);
+
 	old_record = netpolicy_record_search(ptr_id);
 	if (!old_record) {
 		pr_warn("NETPOLICY: doesn't registered. Remove net policy settings!\n");
@@ -435,6 +457,11 @@ int netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx)
 	spin_unlock_bh(&np_hashtable_lock);
 	kfree(old_record);
 
+	if (is_rx)
+		reg->rx_queue  = queue;
+	else
+		reg->tx_queue  = queue;
+
 	return queue;
 
 err:
@@ -522,6 +549,9 @@ void netpolicy_unregister(struct netpolicy_reg *reg)
 		rtnl_unlock();
 		reg->location = ~0;
 		reg->rule_queue = ~0;
+		reg->rx_queue = ~0;
+		reg->tx_queue = ~0;
+		reg->sys_map_version = 0;
 	}
 
 	spin_lock_bh(&np_hashtable_lock);
@@ -1272,6 +1302,10 @@ void update_netpolicy_sys_map(void)
 				netpolicy_disable(dev);
 				goto unlock;
 			}
+			write_lock(&np_sys_map_lock);
+			if (netpolicy_sys_map_version++ < 0)
+				netpolicy_sys_map_version = 0;
+			write_unlock(&np_sys_map_lock);
 
 			dev->netpolicy->cur_policy = cur_policy;
 unlock:
@@ -1305,6 +1339,8 @@ static int __init netpolicy_init(void)
 {
 	int ret;
 
+	netpolicy_sys_map_version = 0;
+
 	ret = register_pernet_subsys(&netpolicy_net_ops);
 	if (!ret)
 		register_netdevice_notifier(&netpolicy_dev_notf);
diff --git a/net/core/sock.c b/net/core/sock.c
index 4d47a89..284aafd 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1447,6 +1447,9 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 		sk->sk_netpolicy.policy = NET_POLICY_INVALID;
 		sk->sk_netpolicy.location = ~0;
 		sk->sk_netpolicy.rule_queue = ~0;
+		sk->sk_netpolicy.rx_queue = ~0;
+		sk->sk_netpolicy.tx_queue = ~0;
+		sk->sk_netpolicy.sys_map_version = 0;
 #endif
 	}
 
@@ -1630,6 +1633,9 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 		newsk->sk_netpolicy.ptr = (void *)newsk;
 		newsk->sk_netpolicy.location = ~0;
 		newsk->sk_netpolicy.rule_queue = ~0;
+		newsk->sk_netpolicy.rx_queue = ~0;
+		newsk->sk_netpolicy.tx_queue = ~0;
+		newsk->sk_netpolicy.sys_map_version = 0;
 		if (is_net_policy_valid(current->task_netpolicy.policy))
 			newsk->sk_netpolicy.policy = NET_POLICY_INVALID;
 		if (is_net_policy_valid(newsk->sk_netpolicy.policy))
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 27/30] net/netpolicy: fast path for finding the queues
@ 2016-07-18  6:56   ` kan.liang
  0 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: intel-wired-lan

From: Kan Liang <kan.liang@intel.com>

Current implementation searches the hash table to get assigned object
for each transmit/receive packet. It's not necessory, because the
assigned object usually remain unchanged.

This patch store the assigned queue into netpolicy_reg struct. So it
doesnot need to search the hash table everytime unless the system cpu
and queue mapping changed.

netpolicy_sys_map_version is used to track the system cpu and queue
mapping changes. It's protected by a rw lock (TODO: will replace by RCU
shortly).

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 include/linux/init_task.h |  3 +++
 include/linux/netpolicy.h |  5 +++++
 kernel/fork.c             |  3 +++
 net/core/netpolicy.c      | 36 ++++++++++++++++++++++++++++++++++++
 net/core/sock.c           |  6 ++++++
 5 files changed, 53 insertions(+)

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index eda7ffc..06ea231 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -189,6 +189,9 @@ extern struct task_group root_task_group;
 	.task_netpolicy.dev = NULL,					\
 	.task_netpolicy.location = ~0,					\
 	.task_netpolicy.rule_queue = ~0,				\
+	.task_netpolicy.rx_queue = ~0,					\
+	.task_netpolicy.tx_queue = ~0,					\
+	.task_netpolicy.sys_map_version = 0,				\
 	.task_netpolicy.ptr = (void *)&tsk,
 #else
 #define INIT_NETPOLICY(tsk)
diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index 1cd5ac4..fa740b5 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -39,6 +39,7 @@ enum netpolicy_traffic {
 
 #define POLICY_NAME_LEN_MAX	64
 extern const char *policy_name[];
+extern int netpolicy_sys_map_version __read_mostly;
 
 struct netpolicy_dev_info {
 	u32	rx_num;
@@ -86,6 +87,10 @@ struct netpolicy_reg {
 	void			*ptr;		/* pointers */
 	u32			location;	/* rule location */
 	u32			rule_queue;	/* queue set by rule */
+	/* Info for fast path */
+	u32			rx_queue;
+	u32			tx_queue;
+	int			sys_map_version;
 };
 
 struct netpolicy_tcpudpip4_spec {
diff --git a/kernel/fork.c b/kernel/fork.c
index 31262d2..fcb856b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1456,6 +1456,9 @@ static struct task_struct *copy_process(unsigned long clone_flags,
 #ifdef CONFIG_NETPOLICY
 	p->task_netpolicy.location = ~0;
 	p->task_netpolicy.rule_queue = ~0;
+	p->task_netpolicy.rx_queue = ~0;
+	p->task_netpolicy.tx_queue = ~0;
+	p->task_netpolicy.sys_map_version = 0;
 	p->task_netpolicy.ptr = (void *)p;
 	if (is_net_policy_valid(p->task_netpolicy.policy))
 		netpolicy_register(&p->task_netpolicy, p->task_netpolicy.policy);
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 9e14137..a63ccd4 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -82,6 +82,10 @@ struct netpolicy_record {
 static DEFINE_HASHTABLE(np_record_hash, 10);
 static DEFINE_SPINLOCK(np_hashtable_lock);
 
+int netpolicy_sys_map_version;
+/* read write lock to protect sys map version */
+static DEFINE_RWLOCK(np_sys_map_lock);
+
 static int netpolicy_get_dev_info(struct net_device *dev,
 				  struct netpolicy_dev_info *d_info)
 {
@@ -394,6 +398,24 @@ int netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx)
 	    (current->task_netpolicy.policy != reg->policy))
 		return -EINVAL;
 
+	/* fast path */
+	read_lock(&np_sys_map_lock);
+	if (netpolicy_sys_map_version == reg->sys_map_version) {
+		if (is_rx && (reg->rx_queue != ~0)) {
+			read_unlock(&np_sys_map_lock);
+			return reg->rx_queue;
+		}
+		if (!is_rx && (reg->tx_queue != ~0)) {
+			read_unlock(&np_sys_map_lock);
+			return reg->tx_queue;
+		}
+	} else {
+		reg->rx_queue = ~0;
+		reg->tx_queue = ~0;
+		reg->sys_map_version = netpolicy_sys_map_version;
+	}
+	read_unlock(&np_sys_map_lock);
+
 	old_record = netpolicy_record_search(ptr_id);
 	if (!old_record) {
 		pr_warn("NETPOLICY: doesn't registered. Remove net policy settings!\n");
@@ -435,6 +457,11 @@ int netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx)
 	spin_unlock_bh(&np_hashtable_lock);
 	kfree(old_record);
 
+	if (is_rx)
+		reg->rx_queue  = queue;
+	else
+		reg->tx_queue  = queue;
+
 	return queue;
 
 err:
@@ -522,6 +549,9 @@ void netpolicy_unregister(struct netpolicy_reg *reg)
 		rtnl_unlock();
 		reg->location = ~0;
 		reg->rule_queue = ~0;
+		reg->rx_queue = ~0;
+		reg->tx_queue = ~0;
+		reg->sys_map_version = 0;
 	}
 
 	spin_lock_bh(&np_hashtable_lock);
@@ -1272,6 +1302,10 @@ void update_netpolicy_sys_map(void)
 				netpolicy_disable(dev);
 				goto unlock;
 			}
+			write_lock(&np_sys_map_lock);
+			if (netpolicy_sys_map_version++ < 0)
+				netpolicy_sys_map_version = 0;
+			write_unlock(&np_sys_map_lock);
 
 			dev->netpolicy->cur_policy = cur_policy;
 unlock:
@@ -1305,6 +1339,8 @@ static int __init netpolicy_init(void)
 {
 	int ret;
 
+	netpolicy_sys_map_version = 0;
+
 	ret = register_pernet_subsys(&netpolicy_net_ops);
 	if (!ret)
 		register_netdevice_notifier(&netpolicy_dev_notf);
diff --git a/net/core/sock.c b/net/core/sock.c
index 4d47a89..284aafd 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1447,6 +1447,9 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 		sk->sk_netpolicy.policy = NET_POLICY_INVALID;
 		sk->sk_netpolicy.location = ~0;
 		sk->sk_netpolicy.rule_queue = ~0;
+		sk->sk_netpolicy.rx_queue = ~0;
+		sk->sk_netpolicy.tx_queue = ~0;
+		sk->sk_netpolicy.sys_map_version = 0;
 #endif
 	}
 
@@ -1630,6 +1633,9 @@ struct sock *sk_clone_lock(const struct sock *sk, const gfp_t priority)
 		newsk->sk_netpolicy.ptr = (void *)newsk;
 		newsk->sk_netpolicy.location = ~0;
 		newsk->sk_netpolicy.rule_queue = ~0;
+		newsk->sk_netpolicy.rx_queue = ~0;
+		newsk->sk_netpolicy.tx_queue = ~0;
+		newsk->sk_netpolicy.sys_map_version = 0;
 		if (is_net_policy_valid(current->task_netpolicy.policy))
 			newsk->sk_netpolicy.policy = NET_POLICY_INVALID;
 		if (is_net_policy_valid(newsk->sk_netpolicy.policy))
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [RFC PATCH 28/30] net/netpolicy: optimize for queue pair
  2016-07-18  6:55 ` [Intel-wired-lan] " kan.liang
@ 2016-07-18  6:56   ` kan.liang
  -1 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: davem, linux-kernel, intel-wired-lan, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

Some drivers like i40e driver does not support separate tx and rx queues
as channels. Using rx queue to stand for the channels, if queue_pair is
set by driver.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 3 +++
 include/linux/netpolicy.h                   | 1 +
 net/core/netpolicy.c                        | 3 +++
 3 files changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index d3f087d..f03d9f6 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -9009,6 +9009,9 @@ static int i40e_ndo_netpolicy_init(struct net_device *dev,
 	/* support MIX policy */
 	info->has_mix_policy = true;
 
+	/* support queue pair */
+	info->queue_pair = true;
+
 	return 0;
 }
 
diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index fa740b5..2de59a6 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -75,6 +75,7 @@ struct netpolicy_info {
 	enum netpolicy_name	cur_policy;
 	unsigned long avail_policy[BITS_TO_LONGS(NET_POLICY_MAX)];
 	bool	has_mix_policy;
+	bool	queue_pair;
 	/* cpu and queue mapping information */
 	struct netpolicy_sys_info	sys_info;
 	/* List of policy objects 0 rx 1 tx */
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index a63ccd4..83242d3 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -398,6 +398,9 @@ int netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx)
 	    (current->task_netpolicy.policy != reg->policy))
 		return -EINVAL;
 
+	if (dev->netpolicy->queue_pair)
+		is_rx = true;
+
 	/* fast path */
 	read_lock(&np_sys_map_lock);
 	if (netpolicy_sys_map_version == reg->sys_map_version) {
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 28/30] net/netpolicy: optimize for queue pair
@ 2016-07-18  6:56   ` kan.liang
  0 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: intel-wired-lan

From: Kan Liang <kan.liang@intel.com>

Some drivers like i40e driver does not support separate tx and rx queues
as channels. Using rx queue to stand for the channels, if queue_pair is
set by driver.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 3 +++
 include/linux/netpolicy.h                   | 1 +
 net/core/netpolicy.c                        | 3 +++
 3 files changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index d3f087d..f03d9f6 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -9009,6 +9009,9 @@ static int i40e_ndo_netpolicy_init(struct net_device *dev,
 	/* support MIX policy */
 	info->has_mix_policy = true;
 
+	/* support queue pair */
+	info->queue_pair = true;
+
 	return 0;
 }
 
diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index fa740b5..2de59a6 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -75,6 +75,7 @@ struct netpolicy_info {
 	enum netpolicy_name	cur_policy;
 	unsigned long avail_policy[BITS_TO_LONGS(NET_POLICY_MAX)];
 	bool	has_mix_policy;
+	bool	queue_pair;
 	/* cpu and queue mapping information */
 	struct netpolicy_sys_info	sys_info;
 	/* List of policy objects 0 rx 1 tx */
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index a63ccd4..83242d3 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -398,6 +398,9 @@ int netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx)
 	    (current->task_netpolicy.policy != reg->policy))
 		return -EINVAL;
 
+	if (dev->netpolicy->queue_pair)
+		is_rx = true;
+
 	/* fast path */
 	read_lock(&np_sys_map_lock);
 	if (netpolicy_sys_map_version == reg->sys_map_version) {
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [RFC PATCH 29/30] net/netpolicy: limit the total record number
  2016-07-18  6:55 ` [Intel-wired-lan] " kan.liang
@ 2016-07-18  6:56   ` kan.liang
  -1 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: davem, linux-kernel, intel-wired-lan, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

NET policy can not fulfill users request without limit, because of the
security consideration and device limitation. For security
consideration, the attacker may fake millions of per task/socket request
to crash the system. For device limitation, the flow director rules
number is limited on i40e driver. NET policy should not run out the
rules, otherwise it cannot guarantee the good performance.

This patch limits the total record number in RCU hash table to fix the
cases as above. The max total record number could vary for different
device. For i40e driver, it limits the record number according to flow
director rules number. If it exceeds the limitation, the registeration
and new object request will be denied.

Since the dev may not be aware in registeration, the cur_rec_num may not
be updated on time. So the actual registered record may exceeds the
max_rec_num. But it will not bring any problems. Because the patch also
check the limitation on object request. It guarantees that the device
resource will not run out.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c |  6 ++++++
 include/linux/netpolicy.h                   |  4 ++++
 net/core/netpolicy.c                        | 22 ++++++++++++++++++++--
 3 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index f03d9f6..db03f5a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -8994,6 +8994,9 @@ static int policy_param[NET_POLICY_MAX + 1][2] = {
 static int i40e_ndo_netpolicy_init(struct net_device *dev,
 				   struct netpolicy_info *info)
 {
+	struct i40e_netdev_priv *np = netdev_priv(dev);
+	struct i40e_vsi *vsi = np->vsi;
+	struct i40e_pf *pf = vsi->back;
 	int i;
 
 	for (i = 0; i < NET_POLICY_MAX; i++) {
@@ -9012,6 +9015,9 @@ static int i40e_ndo_netpolicy_init(struct net_device *dev,
 	/* support queue pair */
 	info->queue_pair = true;
 
+	/* limit the record number to flow director rules number */
+	info->max_rec_num = i40e_get_fd_cnt_all(pf);
+
 	return 0;
 }
 
diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index 2de59a6..1307363 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -38,6 +38,7 @@ enum netpolicy_traffic {
 };
 
 #define POLICY_NAME_LEN_MAX	64
+#define NETPOLICY_MAX_RECORD_NUM	7000
 extern const char *policy_name[];
 extern int netpolicy_sys_map_version __read_mostly;
 
@@ -80,6 +81,9 @@ struct netpolicy_info {
 	struct netpolicy_sys_info	sys_info;
 	/* List of policy objects 0 rx 1 tx */
 	struct list_head		obj_list[NETPOLICY_RXTX][NET_POLICY_MAX];
+	/* for record number limitation */
+	int				max_rec_num;
+	atomic_t			cur_rec_num;
 };
 
 struct netpolicy_reg {
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 83242d3..5e9c9b8 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -380,6 +380,9 @@ int netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx)
 	if (!dev || !dev->netpolicy)
 		goto err;
 
+	if (atomic_read(&dev->netpolicy->cur_rec_num) > dev->netpolicy->max_rec_num)
+		goto err;
+
 	cur_policy = dev->netpolicy->cur_policy;
 	if ((reg->policy == NET_POLICY_NONE) ||
 	    (cur_policy == NET_POLICY_NONE))
@@ -433,8 +436,10 @@ int netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx)
 	if (is_rx) {
 		if (!new_record->rx_obj) {
 			new_record->rx_obj = get_avail_queue(dev, new_record->policy, is_rx);
-			if (!new_record->dev)
+			if (!new_record->dev) {
 				new_record->dev = dev;
+				atomic_inc(&dev->netpolicy->cur_rec_num);
+			}
 			if (!new_record->rx_obj) {
 				kfree(new_record);
 				return -ENOTSUPP;
@@ -444,8 +449,10 @@ int netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx)
 	} else {
 		if (!new_record->tx_obj) {
 			new_record->tx_obj = get_avail_queue(dev, new_record->policy, is_rx);
-			if (!new_record->dev)
+			if (!new_record->dev) {
 				new_record->dev = dev;
+				atomic_inc(&dev->netpolicy->cur_rec_num);
+			}
 			if (!new_record->tx_obj) {
 				kfree(new_record);
 				return -ENOTSUPP;
@@ -493,12 +500,17 @@ int netpolicy_register(struct netpolicy_reg *reg,
 {
 	unsigned long ptr_id = (uintptr_t)reg->ptr;
 	struct netpolicy_record *new, *old;
+	struct net_device *dev = reg->dev;
 
 	if (!is_net_policy_valid(policy)) {
 		reg->policy = NET_POLICY_INVALID;
 		return -EINVAL;
 	}
 
+	if (dev && dev->netpolicy &&
+	    (atomic_read(&dev->netpolicy->cur_rec_num) > dev->netpolicy->max_rec_num))
+		return -ENOSPC;
+
 	new = kzalloc(sizeof(*new), GFP_KERNEL);
 	if (!new) {
 		reg->policy = NET_POLICY_INVALID;
@@ -519,6 +531,8 @@ int netpolicy_register(struct netpolicy_reg *reg,
 		new->dev = reg->dev;
 		new->policy = policy;
 		hash_add_rcu(np_record_hash, &new->hash_node, ptr_id);
+		if (dev && dev->netpolicy)
+			atomic_inc(&dev->netpolicy->cur_rec_num);
 	}
 	reg->policy = policy;
 	spin_unlock_bh(&np_hashtable_lock);
@@ -565,6 +579,7 @@ void netpolicy_unregister(struct netpolicy_reg *reg)
 		/* The record cannot be share. It can be safely free. */
 		put_queue(record->dev, record->rx_obj, record->tx_obj);
 		kfree(record);
+		atomic_dec(&dev->netpolicy->cur_rec_num);
 	}
 	reg->policy = NET_POLICY_INVALID;
 	spin_unlock_bh(&np_hashtable_lock);
@@ -1152,6 +1167,9 @@ int init_netpolicy(struct net_device *dev)
 		goto unlock;
 	}
 
+	if (!dev->netpolicy->max_rec_num)
+		dev->netpolicy->max_rec_num = NETPOLICY_MAX_RECORD_NUM;
+
 	spin_lock(&dev->np_ob_list_lock);
 	for (i = 0; i < NETPOLICY_RXTX; i++) {
 		for (j = NET_POLICY_NONE; j < NET_POLICY_MAX; j++)
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 29/30] net/netpolicy: limit the total record number
@ 2016-07-18  6:56   ` kan.liang
  0 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: intel-wired-lan

From: Kan Liang <kan.liang@intel.com>

NET policy can not fulfill users request without limit, because of the
security consideration and device limitation. For security
consideration, the attacker may fake millions of per task/socket request
to crash the system. For device limitation, the flow director rules
number is limited on i40e driver. NET policy should not run out the
rules, otherwise it cannot guarantee the good performance.

This patch limits the total record number in RCU hash table to fix the
cases as above. The max total record number could vary for different
device. For i40e driver, it limits the record number according to flow
director rules number. If it exceeds the limitation, the registeration
and new object request will be denied.

Since the dev may not be aware in registeration, the cur_rec_num may not
be updated on time. So the actual registered record may exceeds the
max_rec_num. But it will not bring any problems. Because the patch also
check the limitation on object request. It guarantees that the device
resource will not run out.

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c |  6 ++++++
 include/linux/netpolicy.h                   |  4 ++++
 net/core/netpolicy.c                        | 22 ++++++++++++++++++++--
 3 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index f03d9f6..db03f5a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -8994,6 +8994,9 @@ static int policy_param[NET_POLICY_MAX + 1][2] = {
 static int i40e_ndo_netpolicy_init(struct net_device *dev,
 				   struct netpolicy_info *info)
 {
+	struct i40e_netdev_priv *np = netdev_priv(dev);
+	struct i40e_vsi *vsi = np->vsi;
+	struct i40e_pf *pf = vsi->back;
 	int i;
 
 	for (i = 0; i < NET_POLICY_MAX; i++) {
@@ -9012,6 +9015,9 @@ static int i40e_ndo_netpolicy_init(struct net_device *dev,
 	/* support queue pair */
 	info->queue_pair = true;
 
+	/* limit the record number to flow director rules number */
+	info->max_rec_num = i40e_get_fd_cnt_all(pf);
+
 	return 0;
 }
 
diff --git a/include/linux/netpolicy.h b/include/linux/netpolicy.h
index 2de59a6..1307363 100644
--- a/include/linux/netpolicy.h
+++ b/include/linux/netpolicy.h
@@ -38,6 +38,7 @@ enum netpolicy_traffic {
 };
 
 #define POLICY_NAME_LEN_MAX	64
+#define NETPOLICY_MAX_RECORD_NUM	7000
 extern const char *policy_name[];
 extern int netpolicy_sys_map_version __read_mostly;
 
@@ -80,6 +81,9 @@ struct netpolicy_info {
 	struct netpolicy_sys_info	sys_info;
 	/* List of policy objects 0 rx 1 tx */
 	struct list_head		obj_list[NETPOLICY_RXTX][NET_POLICY_MAX];
+	/* for record number limitation */
+	int				max_rec_num;
+	atomic_t			cur_rec_num;
 };
 
 struct netpolicy_reg {
diff --git a/net/core/netpolicy.c b/net/core/netpolicy.c
index 83242d3..5e9c9b8 100644
--- a/net/core/netpolicy.c
+++ b/net/core/netpolicy.c
@@ -380,6 +380,9 @@ int netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx)
 	if (!dev || !dev->netpolicy)
 		goto err;
 
+	if (atomic_read(&dev->netpolicy->cur_rec_num) > dev->netpolicy->max_rec_num)
+		goto err;
+
 	cur_policy = dev->netpolicy->cur_policy;
 	if ((reg->policy == NET_POLICY_NONE) ||
 	    (cur_policy == NET_POLICY_NONE))
@@ -433,8 +436,10 @@ int netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx)
 	if (is_rx) {
 		if (!new_record->rx_obj) {
 			new_record->rx_obj = get_avail_queue(dev, new_record->policy, is_rx);
-			if (!new_record->dev)
+			if (!new_record->dev) {
 				new_record->dev = dev;
+				atomic_inc(&dev->netpolicy->cur_rec_num);
+			}
 			if (!new_record->rx_obj) {
 				kfree(new_record);
 				return -ENOTSUPP;
@@ -444,8 +449,10 @@ int netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx)
 	} else {
 		if (!new_record->tx_obj) {
 			new_record->tx_obj = get_avail_queue(dev, new_record->policy, is_rx);
-			if (!new_record->dev)
+			if (!new_record->dev) {
 				new_record->dev = dev;
+				atomic_inc(&dev->netpolicy->cur_rec_num);
+			}
 			if (!new_record->tx_obj) {
 				kfree(new_record);
 				return -ENOTSUPP;
@@ -493,12 +500,17 @@ int netpolicy_register(struct netpolicy_reg *reg,
 {
 	unsigned long ptr_id = (uintptr_t)reg->ptr;
 	struct netpolicy_record *new, *old;
+	struct net_device *dev = reg->dev;
 
 	if (!is_net_policy_valid(policy)) {
 		reg->policy = NET_POLICY_INVALID;
 		return -EINVAL;
 	}
 
+	if (dev && dev->netpolicy &&
+	    (atomic_read(&dev->netpolicy->cur_rec_num) > dev->netpolicy->max_rec_num))
+		return -ENOSPC;
+
 	new = kzalloc(sizeof(*new), GFP_KERNEL);
 	if (!new) {
 		reg->policy = NET_POLICY_INVALID;
@@ -519,6 +531,8 @@ int netpolicy_register(struct netpolicy_reg *reg,
 		new->dev = reg->dev;
 		new->policy = policy;
 		hash_add_rcu(np_record_hash, &new->hash_node, ptr_id);
+		if (dev && dev->netpolicy)
+			atomic_inc(&dev->netpolicy->cur_rec_num);
 	}
 	reg->policy = policy;
 	spin_unlock_bh(&np_hashtable_lock);
@@ -565,6 +579,7 @@ void netpolicy_unregister(struct netpolicy_reg *reg)
 		/* The record cannot be share. It can be safely free. */
 		put_queue(record->dev, record->rx_obj, record->tx_obj);
 		kfree(record);
+		atomic_dec(&dev->netpolicy->cur_rec_num);
 	}
 	reg->policy = NET_POLICY_INVALID;
 	spin_unlock_bh(&np_hashtable_lock);
@@ -1152,6 +1167,9 @@ int init_netpolicy(struct net_device *dev)
 		goto unlock;
 	}
 
+	if (!dev->netpolicy->max_rec_num)
+		dev->netpolicy->max_rec_num = NETPOLICY_MAX_RECORD_NUM;
+
 	spin_lock(&dev->np_ob_list_lock);
 	for (i = 0; i < NETPOLICY_RXTX; i++) {
 		for (j = NET_POLICY_NONE; j < NET_POLICY_MAX; j++)
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [RFC PATCH 30/30] Documentation/networking: Document net policy
  2016-07-18  6:55 ` [Intel-wired-lan] " kan.liang
@ 2016-07-18  6:56   ` kan.liang
  -1 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: davem, linux-kernel, intel-wired-lan, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg, andi, Kan Liang

From: Kan Liang <kan.liang@intel.com>

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 Documentation/networking/netpolicy.txt | 158 +++++++++++++++++++++++++++++++++
 1 file changed, 158 insertions(+)
 create mode 100644 Documentation/networking/netpolicy.txt

diff --git a/Documentation/networking/netpolicy.txt b/Documentation/networking/netpolicy.txt
new file mode 100644
index 0000000..2ce938e
--- /dev/null
+++ b/Documentation/networking/netpolicy.txt
@@ -0,0 +1,158 @@
+What is Linux Net Policy?
+
+It is a big challenge to get good network performance. First, the network
+performance is not good with default system settings. Second, it is too
+difficult to do automatic tuning for all possible workloads, since workloads
+have different requirements. Some workloads may want high throughput. Some may
+need low latency. Last but not least, there are lots of manual configurations.
+Fine grained configuration is too difficult for users.
+
+"NET policy" intends to simplify the network configuration and get a
+good network performance according to the hints(policy) which is applied by
+user. It provides some typical "policies" for user which can be set
+per-socket, per-task or per-device. The kernel will automatically figures out
+how to merge different requests to get good network performance.
+
+"Net policy" is designed for multiqueue network devices. This document
+describes the concepts and APIs of "net policy" support.
+
+NET POLICY CONCEPTS
+
+Scope of Net Policies
+
+    Device net policy: this policy applies to the whole device. Once the
+    device net policy is set, it automatically configures the system
+    according to the applied policy. The configuration usually includes irq
+    affinity, irq balance disable, interrupt moderation, and so on. But the
+    device net policy does not change the packet direction.
+
+    Task net policy: this is a per-task policy. When it is applied to specific
+    task, all packets transmissions of the task will be redirect to the
+    assigned queues accordingly. If a task does not define a task policy,
+    it "falls back" to the system default way to direct the packets. The
+    per-task policy must be compatible with device net policy.
+
+    Socket net policy: this is a per-socket policy. When it is applied to
+    specific socket, all packets transmissions of the socket will be redirect
+    to the assigned queues accordingly. If a socket does not define a socket
+    policy, it "falls back" to the system default way to direct the packets.
+    The per-socket policy must be compatible with both device net policy and
+    per-task policy.
+
+Components of Net Policies
+
+    Net policy object: it is a combination of cpu and queue. The queue irq has
+    to set affinity with the cpu. It can be shared between sockets and tasks.
+    A reference counter is used to track the sharing number.
+
+    Net policy object list: each device policy has an object list. Once the
+    device policy is determined, the net policy object will be inserted into
+    the net policy object list. The net policy object list does not change
+    unless the cpu/queue number is changed, the netpolicy is disabled or
+    the device policy is changed.
+    The network performance for objects could be different because of the
+    queue/cpu topology and dev location. The objects which can bring high
+    performance are in the front of the list.
+
+    RCU hash table: a RCU hash table to maintain the relationship between
+    the task/socket and the assigned object. The task/socket can get the
+    assigned object by searching the table.
+    If it is the first time, there is no assigned object in the table. It will
+    go through the object list to find the available object based on position
+    and reference number.
+    If the net policy object list changes, all the assigned object will become
+    invalid.
+
+NET POLICY APIs
+
+Interfaces between net policy and device driver
+
+    int (*ndo_netpolicy_init)(struct net_device *dev,
+                              struct netpolicy_info *info);
+
+    The device driver who has NET policy support must implement this interface.
+    In this interface, the device driver do necessory initialization, and fill
+    the info for net policy module. The information could inlcude supported
+    policy, MIX policy support, queue pair support and so on.
+
+    int (*ndo_get_irq_info)(struct net_device *dev,
+                            struct netpolicy_dev_info *info);
+
+    This interface is used to get more accurate device irq information.
+
+    int (*ndo_set_net_policy)(struct net_device *dev,
+                              enum netpolicy_name name);
+
+    This interface is used to set device net policy by name
+
+Interfaces between net policy and kernel
+
+    int netpolicy_register(struct netpolicy_reg *reg);
+    void netpolicy_unregister(struct netpolicy_reg *reg);
+
+    This interface is used to register per task/socket net policy.
+    If it's the first time to register, an record will be created and inserted
+    into RCU hash table. The record includes ptr, policy and object
+    information. There is only one user for each record. The record cannot be
+    share.
+
+
+    int netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx);
+
+    This interface is used to find the proper queue(object) for packet
+    receiving and transmitting. The proper queue is picked from object list
+    according to policy, reference, location and so on.
+
+
+    int netpolicy_set_rules(struct netpolicy_reg *reg,
+                            u32 queue_index,
+                            struct netpolicy_flow_spec *flow);
+
+    This interface is used to add device specific rules. Once the rule is
+    applied, the packet from specific IP and port will be redirect to the
+    given queue. This interface usually be used in receive side.
+
+NET POLICY INTERFACE
+
+Device net policy setting
+
+    /proc/net/netpolicy/$DEV/policy
+
+    Concatenating(cat) the "policy" file can show the available device
+    policies, if there is no device policy applied. Otherwise, the device
+    policy name will be print out. If it is MIX policy, the policy for each
+    queue will also be print out.
+    User can set device net policy by writing policy name.
+
+Task policy setting
+
+    /proc/$PID/net_policy
+
+    Concatenating(cat) the "net_policy" file can show the applied per task
+    policy.
+    User can set per task net policy by writing policy name.
+
+    OR
+
+    prctl(PR_SET_NETPOLICY, POLICY_NAME, NULL, NULL, NULL)
+
+    "prctl" is an alternative way to set/get per task policy.
+
+Socket policy setting
+
+    setsockopt(sockfd,SOL_SOCKET,SO_NETPOLICY,&policy,sizeof(int))
+
+    The socket net policy can be set by option SO_NETPOLICY of setsockopt
+
+AVAILABLE NET POLICIES
+
+    The available net policies are defined as below:
+    - CPU: intends to get higher throughput and lower CPU%. This policy can be
+           applied as either device net policy or task/socket net policy.
+    - BULK: intends to get highest throughput. This policy can be applied as
+            either device net policy or task/socket net policy.
+    - LATENCY: intends to get lowest latency. This policy can be applied as
+               either device net policy or task/socket net policy.
+    - MIX: combination of other policies, which allows each queue has
+           different policy. This policy can only be set as device net policy.
+
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 30/30] Documentation/networking: Document net policy
@ 2016-07-18  6:56   ` kan.liang
  0 siblings, 0 replies; 123+ messages in thread
From: kan.liang @ 2016-07-18  6:56 UTC (permalink / raw)
  To: intel-wired-lan

From: Kan Liang <kan.liang@intel.com>

Signed-off-by: Kan Liang <kan.liang@intel.com>
---
 Documentation/networking/netpolicy.txt | 158 +++++++++++++++++++++++++++++++++
 1 file changed, 158 insertions(+)
 create mode 100644 Documentation/networking/netpolicy.txt

diff --git a/Documentation/networking/netpolicy.txt b/Documentation/networking/netpolicy.txt
new file mode 100644
index 0000000..2ce938e
--- /dev/null
+++ b/Documentation/networking/netpolicy.txt
@@ -0,0 +1,158 @@
+What is Linux Net Policy?
+
+It is a big challenge to get good network performance. First, the network
+performance is not good with default system settings. Second, it is too
+difficult to do automatic tuning for all possible workloads, since workloads
+have different requirements. Some workloads may want high throughput. Some may
+need low latency. Last but not least, there are lots of manual configurations.
+Fine grained configuration is too difficult for users.
+
+"NET policy" intends to simplify the network configuration and get a
+good network performance according to the hints(policy) which is applied by
+user. It provides some typical "policies" for user which can be set
+per-socket, per-task or per-device. The kernel will automatically figures out
+how to merge different requests to get good network performance.
+
+"Net policy" is designed for multiqueue network devices. This document
+describes the concepts and APIs of "net policy" support.
+
+NET POLICY CONCEPTS
+
+Scope of Net Policies
+
+    Device net policy: this policy applies to the whole device. Once the
+    device net policy is set, it automatically configures the system
+    according to the applied policy. The configuration usually includes irq
+    affinity, irq balance disable, interrupt moderation, and so on. But the
+    device net policy does not change the packet direction.
+
+    Task net policy: this is a per-task policy. When it is applied to specific
+    task, all packets transmissions of the task will be redirect to the
+    assigned queues accordingly. If a task does not define a task policy,
+    it "falls back" to the system default way to direct the packets. The
+    per-task policy must be compatible with device net policy.
+
+    Socket net policy: this is a per-socket policy. When it is applied to
+    specific socket, all packets transmissions of the socket will be redirect
+    to the assigned queues accordingly. If a socket does not define a socket
+    policy, it "falls back" to the system default way to direct the packets.
+    The per-socket policy must be compatible with both device net policy and
+    per-task policy.
+
+Components of Net Policies
+
+    Net policy object: it is a combination of cpu and queue. The queue irq has
+    to set affinity with the cpu. It can be shared between sockets and tasks.
+    A reference counter is used to track the sharing number.
+
+    Net policy object list: each device policy has an object list. Once the
+    device policy is determined, the net policy object will be inserted into
+    the net policy object list. The net policy object list does not change
+    unless the cpu/queue number is changed, the netpolicy is disabled or
+    the device policy is changed.
+    The network performance for objects could be different because of the
+    queue/cpu topology and dev location. The objects which can bring high
+    performance are in the front of the list.
+
+    RCU hash table: a RCU hash table to maintain the relationship between
+    the task/socket and the assigned object. The task/socket can get the
+    assigned object by searching the table.
+    If it is the first time, there is no assigned object in the table. It will
+    go through the object list to find the available object based on position
+    and reference number.
+    If the net policy object list changes, all the assigned object will become
+    invalid.
+
+NET POLICY APIs
+
+Interfaces between net policy and device driver
+
+    int (*ndo_netpolicy_init)(struct net_device *dev,
+                              struct netpolicy_info *info);
+
+    The device driver who has NET policy support must implement this interface.
+    In this interface, the device driver do necessory initialization, and fill
+    the info for net policy module. The information could inlcude supported
+    policy, MIX policy support, queue pair support and so on.
+
+    int (*ndo_get_irq_info)(struct net_device *dev,
+                            struct netpolicy_dev_info *info);
+
+    This interface is used to get more accurate device irq information.
+
+    int (*ndo_set_net_policy)(struct net_device *dev,
+                              enum netpolicy_name name);
+
+    This interface is used to set device net policy by name
+
+Interfaces between net policy and kernel
+
+    int netpolicy_register(struct netpolicy_reg *reg);
+    void netpolicy_unregister(struct netpolicy_reg *reg);
+
+    This interface is used to register per task/socket net policy.
+    If it's the first time to register, an record will be created and inserted
+    into RCU hash table. The record includes ptr, policy and object
+    information. There is only one user for each record. The record cannot be
+    share.
+
+
+    int netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx);
+
+    This interface is used to find the proper queue(object) for packet
+    receiving and transmitting. The proper queue is picked from object list
+    according to policy, reference, location and so on.
+
+
+    int netpolicy_set_rules(struct netpolicy_reg *reg,
+                            u32 queue_index,
+                            struct netpolicy_flow_spec *flow);
+
+    This interface is used to add device specific rules. Once the rule is
+    applied, the packet from specific IP and port will be redirect to the
+    given queue. This interface usually be used in receive side.
+
+NET POLICY INTERFACE
+
+Device net policy setting
+
+    /proc/net/netpolicy/$DEV/policy
+
+    Concatenating(cat) the "policy" file can show the available device
+    policies, if there is no device policy applied. Otherwise, the device
+    policy name will be print out. If it is MIX policy, the policy for each
+    queue will also be print out.
+    User can set device net policy by writing policy name.
+
+Task policy setting
+
+    /proc/$PID/net_policy
+
+    Concatenating(cat) the "net_policy" file can show the applied per task
+    policy.
+    User can set per task net policy by writing policy name.
+
+    OR
+
+    prctl(PR_SET_NETPOLICY, POLICY_NAME, NULL, NULL, NULL)
+
+    "prctl" is an alternative way to set/get per task policy.
+
+Socket policy setting
+
+    setsockopt(sockfd,SOL_SOCKET,SO_NETPOLICY,&policy,sizeof(int))
+
+    The socket net policy can be set by option SO_NETPOLICY of setsockopt
+
+AVAILABLE NET POLICIES
+
+    The available net policies are defined as below:
+    - CPU: intends to get higher throughput and lower CPU%. This policy can be
+           applied as either device net policy or task/socket net policy.
+    - BULK: intends to get highest throughput. This policy can be applied as
+            either device net policy or task/socket net policy.
+    - LATENCY: intends to get lowest latency. This policy can be applied as
+               either device net policy or task/socket net policy.
+    - MIX: combination of other policies, which allows each queue has
+           different policy. This policy can only be set as device net policy.
+
-- 
2.5.5


^ permalink raw reply related	[flat|nested] 123+ messages in thread

* Re: [RFC PATCH 00/30] Kernel NET policy
  2016-07-18  6:55 ` [Intel-wired-lan] " kan.liang
@ 2016-07-18 15:18   ` Florian Westphal
  -1 siblings, 0 replies; 123+ messages in thread
From: Florian Westphal @ 2016-07-18 15:18 UTC (permalink / raw)
  To: kan.liang
  Cc: davem, linux-kernel, intel-wired-lan, netdev, jeffrey.t.kirsher,
	mingo, peterz, kuznet, jmorris, yoshfuji, kaber, akpm, keescook,
	viro, gorcunov, john.stultz, aduyck, ben, decot,
	jesse.brandeburg, andi

kan.liang@intel.com <kan.liang@intel.com> wrote:
> From: Kan Liang <kan.liang@intel.com>
> 
> It is a big challenge to get good network performance. First, the network
> performance is not good with default system settings. Second, it is too
> difficult to do automatic tuning for all possible workloads, since workloads
> have different requirements. Some workloads may want high throughput.

Seems you did lots of tests to find optimal settings for a given base
policy.

What is missing in the kernel UAPI so userspace could do these settings
on its own, without adding this policy stuff to the kernel?

It seems strange to me to add such policies to the kernel.
Addmittingly, documentation of some settings is non-existent and one needs
various different tools to set this (sysctl, procfs, sysfs, ethtool, etc).

But all of these details could be hidden from user.
Have you looked at tuna for instance?

^ permalink raw reply	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-18 15:18   ` Florian Westphal
  0 siblings, 0 replies; 123+ messages in thread
From: Florian Westphal @ 2016-07-18 15:18 UTC (permalink / raw)
  To: intel-wired-lan

kan.liang at intel.com <kan.liang@intel.com> wrote:
> From: Kan Liang <kan.liang@intel.com>
> 
> It is a big challenge to get good network performance. First, the network
> performance is not good with default system settings. Second, it is too
> difficult to do automatic tuning for all possible workloads, since workloads
> have different requirements. Some workloads may want high throughput.

Seems you did lots of tests to find optimal settings for a given base
policy.

What is missing in the kernel UAPI so userspace could do these settings
on its own, without adding this policy stuff to the kernel?

It seems strange to me to add such policies to the kernel.
Addmittingly, documentation of some settings is non-existent and one needs
various different tools to set this (sysctl, procfs, sysfs, ethtool, etc).

But all of these details could be hidden from user.
Have you looked at tuna for instance?

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [RFC PATCH 00/30] Kernel NET policy
  2016-07-18 15:18   ` [Intel-wired-lan] " Florian Westphal
@ 2016-07-18 15:45     ` Andi Kleen
  -1 siblings, 0 replies; 123+ messages in thread
From: Andi Kleen @ 2016-07-18 15:45 UTC (permalink / raw)
  To: Florian Westphal
  Cc: kan.liang, davem, linux-kernel, intel-wired-lan, netdev,
	jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg, andi

> It seems strange to me to add such policies to the kernel.
> Addmittingly, documentation of some settings is non-existent and one needs
> various different tools to set this (sysctl, procfs, sysfs, ethtool, etc).

The problem is that different applications need different policies.

The only entity which can efficiently negotiate between different
applications' conflicting requests is the kernel. And that is pretty 
much the basic job description of a kernel: multiplex hardware
efficiently between different users.

So yes the user space tuning approach works for simple cases
("only run workloads that require the same tuning"), but is ultimately not
very interesting nor scalable.

-Andi

^ permalink raw reply	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-18 15:45     ` Andi Kleen
  0 siblings, 0 replies; 123+ messages in thread
From: Andi Kleen @ 2016-07-18 15:45 UTC (permalink / raw)
  To: intel-wired-lan

> It seems strange to me to add such policies to the kernel.
> Addmittingly, documentation of some settings is non-existent and one needs
> various different tools to set this (sysctl, procfs, sysfs, ethtool, etc).

The problem is that different applications need different policies.

The only entity which can efficiently negotiate between different
applications' conflicting requests is the kernel. And that is pretty 
much the basic job description of a kernel: multiplex hardware
efficiently between different users.

So yes the user space tuning approach works for simple cases
("only run workloads that require the same tuning"), but is ultimately not
very interesting nor scalable.

-Andi

^ permalink raw reply	[flat|nested] 123+ messages in thread

* RE: [RFC PATCH 00/30] Kernel NET policy
  2016-07-18 15:18   ` [Intel-wired-lan] " Florian Westphal
  (?)
@ 2016-07-18 15:51     ` Liang, Kan
  -1 siblings, 0 replies; 123+ messages in thread
From: Liang, Kan @ 2016-07-18 15:51 UTC (permalink / raw)
  To: Florian Westphal
  Cc: davem, linux-kernel, intel-wired-lan, netdev, Kirsher, Jeffrey T,
	mingo, peterz, kuznet, jmorris, yoshfuji, kaber, akpm, keescook,
	viro, gorcunov, john.stultz, aduyck, ben, decot, Brandeburg,
	Jesse, andi



> >
> > It is a big challenge to get good network performance. First, the
> > network performance is not good with default system settings. Second,
> > it is too difficult to do automatic tuning for all possible workloads,
> > since workloads have different requirements. Some workloads may want
> high throughput.
> 
> Seems you did lots of tests to find optimal settings for a given base policy.
> 
Yes. Current test only base on Intel i40e driver. The optimal settings should
vary for other devices. But adding settings for new device is not hard.

> What is missing in the kernel UAPI so userspace could do these settings on its
> own, without adding this policy stuff to the kernel?

The main purpose of the proposal is to simplify the configuration. Too many
options will let them confuse. 
For normal users, they just need to tell the kernel that they want high throughput
for the application. The kernel will take care of the rest.
So, I don't think we need an interface for user to set their own policy settings.

> 
> It seems strange to me to add such policies to the kernel.

But kernel is the only place which can merge all user's requests.

> Addmittingly, documentation of some settings is non-existent and one needs
> various different tools to set this (sysctl, procfs, sysfs, ethtool, etc).
> 
> But all of these details could be hidden from user.
> Have you looked at tuna for instance?

Not yet. Is there similar settings for network?

Thanks,
Kan

^ permalink raw reply	[flat|nested] 123+ messages in thread

* RE: [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-18 15:51     ` Liang, Kan
  0 siblings, 0 replies; 123+ messages in thread
From: Liang, Kan @ 2016-07-18 15:51 UTC (permalink / raw)
  To: Florian Westphal
  Cc: davem, linux-kernel, intel-wired-lan, netdev, Kirsher, Jeffrey T,
	mingo, peterz, kuznet, jmorris, yoshfuji, kaber, akpm, keescook,
	viro, gorcunov, john.stultz, aduyck, ben@decadent.org.uk



> >
> > It is a big challenge to get good network performance. First, the
> > network performance is not good with default system settings. Second,
> > it is too difficult to do automatic tuning for all possible workloads,
> > since workloads have different requirements. Some workloads may want
> high throughput.
> 
> Seems you did lots of tests to find optimal settings for a given base policy.
> 
Yes. Current test only base on Intel i40e driver. The optimal settings should
vary for other devices. But adding settings for new device is not hard.

> What is missing in the kernel UAPI so userspace could do these settings on its
> own, without adding this policy stuff to the kernel?

The main purpose of the proposal is to simplify the configuration. Too many
options will let them confuse. 
For normal users, they just need to tell the kernel that they want high throughput
for the application. The kernel will take care of the rest.
So, I don't think we need an interface for user to set their own policy settings.

> 
> It seems strange to me to add such policies to the kernel.

But kernel is the only place which can merge all user's requests.

> Addmittingly, documentation of some settings is non-existent and one needs
> various different tools to set this (sysctl, procfs, sysfs, ethtool, etc).
> 
> But all of these details could be hidden from user.
> Have you looked at tuna for instance?

Not yet. Is there similar settings for network?

Thanks,
Kan

^ permalink raw reply	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-18 15:51     ` Liang, Kan
  0 siblings, 0 replies; 123+ messages in thread
From: Liang, Kan @ 2016-07-18 15:51 UTC (permalink / raw)
  To: intel-wired-lan



> >
> > It is a big challenge to get good network performance. First, the
> > network performance is not good with default system settings. Second,
> > it is too difficult to do automatic tuning for all possible workloads,
> > since workloads have different requirements. Some workloads may want
> high throughput.
> 
> Seems you did lots of tests to find optimal settings for a given base policy.
> 
Yes. Current test only base on Intel i40e driver. The optimal settings should
vary for other devices. But adding settings for new device is not hard.

> What is missing in the kernel UAPI so userspace could do these settings on its
> own, without adding this policy stuff to the kernel?

The main purpose of the proposal is to simplify the configuration. Too many
options will let them confuse. 
For normal users, they just need to tell the kernel that they want high throughput
for the application. The kernel will take care of the rest.
So, I don't think we need an interface for user to set their own policy settings.

> 
> It seems strange to me to add such policies to the kernel.

But kernel is the only place which can merge all user's requests.

> Addmittingly, documentation of some settings is non-existent and one needs
> various different tools to set this (sysctl, procfs, sysfs, ethtool, etc).
> 
> But all of these details could be hidden from user.
> Have you looked at tuna for instance?

Not yet. Is there similar settings for network?

Thanks,
Kan

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [RFC PATCH 00/30] Kernel NET policy
  2016-07-18 15:51     ` Liang, Kan
  (?)
@ 2016-07-18 16:17       ` Florian Westphal
  -1 siblings, 0 replies; 123+ messages in thread
From: Florian Westphal @ 2016-07-18 16:17 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Florian Westphal, davem, linux-kernel, intel-wired-lan, netdev,
	Kirsher, Jeffrey T, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, Brandeburg, Jesse, andi

Liang, Kan <kan.liang@intel.com> wrote:
> > What is missing in the kernel UAPI so userspace could do these settings on its
> > own, without adding this policy stuff to the kernel?
> 
> The main purpose of the proposal is to simplify the configuration. Too many
> options will let them confuse. 
> For normal users, they just need to tell the kernel that they want high throughput
> for the application. The kernel will take care of the rest.
> So, I don't think we need an interface for user to set their own policy settings.

I don't (yet) agree that the kernel is the right place for this.
I agree that current (bare) kernel config interface(s) for this are
hard to use.

> > It seems strange to me to add such policies to the kernel.
> 
> But kernel is the only place which can merge all user's requests.

I don't think so.

If different requests conflict in a way that is possible to do something
meaningful the I don't see why userspace tool cannot do the same
thing...

> > Addmittingly, documentation of some settings is non-existent and one needs
> > various different tools to set this (sysctl, procfs, sysfs, ethtool, etc).
> > 
> > But all of these details could be hidden from user.
> > Have you looked at tuna for instance?
> 
> Not yet. Is there similar settings for network?

Last time I checked tuna could only set a few network-related sysctls
and handle irq settings/affinity, but not e.g. tune irq coalescening
or any other network interface specific settings.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-18 16:17       ` Florian Westphal
  0 siblings, 0 replies; 123+ messages in thread
From: Florian Westphal @ 2016-07-18 16:17 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Florian Westphal, davem, linux-kernel, intel-wired-lan, netdev,
	Kirsher, Jeffrey T, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck

Liang, Kan <kan.liang@intel.com> wrote:
> > What is missing in the kernel UAPI so userspace could do these settings on its
> > own, without adding this policy stuff to the kernel?
> 
> The main purpose of the proposal is to simplify the configuration. Too many
> options will let them confuse. 
> For normal users, they just need to tell the kernel that they want high throughput
> for the application. The kernel will take care of the rest.
> So, I don't think we need an interface for user to set their own policy settings.

I don't (yet) agree that the kernel is the right place for this.
I agree that current (bare) kernel config interface(s) for this are
hard to use.

> > It seems strange to me to add such policies to the kernel.
> 
> But kernel is the only place which can merge all user's requests.

I don't think so.

If different requests conflict in a way that is possible to do something
meaningful the I don't see why userspace tool cannot do the same
thing...

> > Addmittingly, documentation of some settings is non-existent and one needs
> > various different tools to set this (sysctl, procfs, sysfs, ethtool, etc).
> > 
> > But all of these details could be hidden from user.
> > Have you looked at tuna for instance?
> 
> Not yet. Is there similar settings for network?

Last time I checked tuna could only set a few network-related sysctls
and handle irq settings/affinity, but not e.g. tune irq coalescening
or any other network interface specific settings.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-18 16:17       ` Florian Westphal
  0 siblings, 0 replies; 123+ messages in thread
From: Florian Westphal @ 2016-07-18 16:17 UTC (permalink / raw)
  To: intel-wired-lan

Liang, Kan <kan.liang@intel.com> wrote:
> > What is missing in the kernel UAPI so userspace could do these settings on its
> > own, without adding this policy stuff to the kernel?
> 
> The main purpose of the proposal is to simplify the configuration. Too many
> options will let them confuse. 
> For normal users, they just need to tell the kernel that they want high throughput
> for the application. The kernel will take care of the rest.
> So, I don't think we need an interface for user to set their own policy settings.

I don't (yet) agree that the kernel is the right place for this.
I agree that current (bare) kernel config interface(s) for this are
hard to use.

> > It seems strange to me to add such policies to the kernel.
> 
> But kernel is the only place which can merge all user's requests.

I don't think so.

If different requests conflict in a way that is possible to do something
meaningful the I don't see why userspace tool cannot do the same
thing...

> > Addmittingly, documentation of some settings is non-existent and one needs
> > various different tools to set this (sysctl, procfs, sysfs, ethtool, etc).
> > 
> > But all of these details could be hidden from user.
> > Have you looked at tuna for instance?
> 
> Not yet. Is there similar settings for network?

Last time I checked tuna could only set a few network-related sysctls
and handle irq settings/affinity, but not e.g. tune irq coalescening
or any other network interface specific settings.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [RFC PATCH 23/30] i40e/ethtool: support RX_CLS_LOC_ANY
  2016-07-18  6:56   ` [Intel-wired-lan] " kan.liang
@ 2016-07-18 16:21     ` Alexander Duyck
  -1 siblings, 0 replies; 123+ messages in thread
From: Alexander Duyck @ 2016-07-18 16:21 UTC (permalink / raw)
  To: kan.liang
  Cc: David Miller, linux-kernel, intel-wired-lan, Netdev,
	Jeff Kirsher, Ingo Molnar, Peter Zijlstra, Alexey Kuznetsov,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, Andrew Morton,
	keescook, viro, gorcunov, john.stultz, Alex Duyck, Ben Hutchings,
	decot, Brandeburg, Jesse, Andi Kleen

On Sun, Jul 17, 2016 at 11:56 PM,  <kan.liang@intel.com> wrote:
> From: Kan Liang <kan.liang@intel.com>
>
> The existing special location RX_CLS_LOC_ANY flag is designed for the
> case which the caller does not know/care about the location. Now, this
> flag is only handled in ethtool user space. If the kernel directly calls
> the ETHTOOL_SRXCLSRLINS interface with RX_CLS_LOC_ANY flag set, it will
> error out.
> This patch implements the RX_CLS_LOC_ANY support for i40e driver. It
> finds the available location from the end of the list.
>
> Signed-off-by: Kan Liang <kan.liang@intel.com>

Instead of reinventing the wheel you may wan to take a look at using
ndo_rx_flow_steer instead.  It was basically meant to be used for
kernel space applications to be able to add flow director rules.

> ---
>  drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 38 ++++++++++++++++++++++++--
>  1 file changed, 35 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> index 1f3537e..4276ed7 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> @@ -2552,6 +2552,32 @@ static int i40e_del_fdir_entry(struct i40e_vsi *vsi,
>         return ret;
>  }
>
> +static int find_empty_slot(struct i40e_pf *pf)
> +{
> +       struct i40e_fdir_filter *rule;
> +       struct hlist_node *node2;
> +       __u32 data = i40e_get_fd_cnt_all(pf);
> +       unsigned long *slot;
> +       int i;
> +
> +       slot = kzalloc(BITS_TO_LONGS(data) * sizeof(long), GFP_KERNEL);
> +       if (!slot)
> +               return -ENOMEM;
> +
> +       hlist_for_each_entry_safe(rule, node2,
> +                                 &pf->fdir_filter_list, fdir_node) {
> +               set_bit(rule->fd_id, slot);
> +       }
> +
> +       for (i = data - 1; i > 0; i--) {
> +               if (!test_bit(i, slot))
> +                       break;
> +       }
> +       kfree(slot);
> +
> +       return i;
> +}
> +

This doesn't seem like a very efficient way to find free slots.  If
you are wanting to make this efficient you might just want to keep the
bitmap always allocated.  In addition if you rewrite this so that it
keeps a variable that you can do a simple increment and test with you
will probably find that more often then not you will be able to find a
free slot on your first try.

>  /**
>   * i40e_add_fdir_ethtool - Add/Remove Flow Director filters
>   * @vsi: pointer to the targeted VSI
> @@ -2588,9 +2614,15 @@ static int i40e_add_fdir_ethtool(struct i40e_vsi *vsi,
>
>         fsp = (struct ethtool_rx_flow_spec *)&cmd->fs;
>
> -       if (fsp->location >= (pf->hw.func_caps.fd_filters_best_effort +
> -                             pf->hw.func_caps.fd_filters_guaranteed)) {
> -               return -EINVAL;
> +       if (fsp->location != RX_CLS_LOC_ANY) {
> +               if (fsp->location >= (pf->hw.func_caps.fd_filters_best_effort +
> +                                     pf->hw.func_caps.fd_filters_guaranteed)) {
> +                       return -EINVAL;
> +               }
> +       } else {
> +               fsp->location = find_empty_slot(pf);
> +               if (fsp->location < 0)
> +                       return -ENOSPC;
>         }
>
>         if ((fsp->ring_cookie != RX_CLS_FLOW_DISC) &&

The ethtool interface isn't really meant to be used for writing rules
from kernel space.  You would likely be much better off just using
ndo_rx_flow_steer instead.  Then it will even give you information
back on where the rule you created now resides.

- Alex

^ permalink raw reply	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 23/30] i40e/ethtool: support RX_CLS_LOC_ANY
@ 2016-07-18 16:21     ` Alexander Duyck
  0 siblings, 0 replies; 123+ messages in thread
From: Alexander Duyck @ 2016-07-18 16:21 UTC (permalink / raw)
  To: intel-wired-lan

On Sun, Jul 17, 2016 at 11:56 PM,  <kan.liang@intel.com> wrote:
> From: Kan Liang <kan.liang@intel.com>
>
> The existing special location RX_CLS_LOC_ANY flag is designed for the
> case which the caller does not know/care about the location. Now, this
> flag is only handled in ethtool user space. If the kernel directly calls
> the ETHTOOL_SRXCLSRLINS interface with RX_CLS_LOC_ANY flag set, it will
> error out.
> This patch implements the RX_CLS_LOC_ANY support for i40e driver. It
> finds the available location from the end of the list.
>
> Signed-off-by: Kan Liang <kan.liang@intel.com>

Instead of reinventing the wheel you may wan to take a look at using
ndo_rx_flow_steer instead.  It was basically meant to be used for
kernel space applications to be able to add flow director rules.

> ---
>  drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 38 ++++++++++++++++++++++++--
>  1 file changed, 35 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> index 1f3537e..4276ed7 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
> @@ -2552,6 +2552,32 @@ static int i40e_del_fdir_entry(struct i40e_vsi *vsi,
>         return ret;
>  }
>
> +static int find_empty_slot(struct i40e_pf *pf)
> +{
> +       struct i40e_fdir_filter *rule;
> +       struct hlist_node *node2;
> +       __u32 data = i40e_get_fd_cnt_all(pf);
> +       unsigned long *slot;
> +       int i;
> +
> +       slot = kzalloc(BITS_TO_LONGS(data) * sizeof(long), GFP_KERNEL);
> +       if (!slot)
> +               return -ENOMEM;
> +
> +       hlist_for_each_entry_safe(rule, node2,
> +                                 &pf->fdir_filter_list, fdir_node) {
> +               set_bit(rule->fd_id, slot);
> +       }
> +
> +       for (i = data - 1; i > 0; i--) {
> +               if (!test_bit(i, slot))
> +                       break;
> +       }
> +       kfree(slot);
> +
> +       return i;
> +}
> +

This doesn't seem like a very efficient way to find free slots.  If
you are wanting to make this efficient you might just want to keep the
bitmap always allocated.  In addition if you rewrite this so that it
keeps a variable that you can do a simple increment and test with you
will probably find that more often then not you will be able to find a
free slot on your first try.

>  /**
>   * i40e_add_fdir_ethtool - Add/Remove Flow Director filters
>   * @vsi: pointer to the targeted VSI
> @@ -2588,9 +2614,15 @@ static int i40e_add_fdir_ethtool(struct i40e_vsi *vsi,
>
>         fsp = (struct ethtool_rx_flow_spec *)&cmd->fs;
>
> -       if (fsp->location >= (pf->hw.func_caps.fd_filters_best_effort +
> -                             pf->hw.func_caps.fd_filters_guaranteed)) {
> -               return -EINVAL;
> +       if (fsp->location != RX_CLS_LOC_ANY) {
> +               if (fsp->location >= (pf->hw.func_caps.fd_filters_best_effort +
> +                                     pf->hw.func_caps.fd_filters_guaranteed)) {
> +                       return -EINVAL;
> +               }
> +       } else {
> +               fsp->location = find_empty_slot(pf);
> +               if (fsp->location < 0)
> +                       return -ENOSPC;
>         }
>
>         if ((fsp->ring_cookie != RX_CLS_FLOW_DISC) &&

The ethtool interface isn't really meant to be used for writing rules
from kernel space.  You would likely be much better off just using
ndo_rx_flow_steer instead.  Then it will even give you information
back on where the rule you created now resides.

- Alex

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [RFC PATCH 00/30] Kernel NET policy
  2016-07-18  6:55 ` [Intel-wired-lan] " kan.liang
@ 2016-07-18 16:22   ` Daniel Borkmann
  -1 siblings, 0 replies; 123+ messages in thread
From: Daniel Borkmann @ 2016-07-18 16:22 UTC (permalink / raw)
  To: kan.liang, davem, linux-kernel, intel-wired-lan, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg, andi, tj

Hi Kan,

On 07/18/2016 08:55 AM, kan.liang@intel.com wrote:
> From: Kan Liang <kan.liang@intel.com>
>
> It is a big challenge to get good network performance. First, the network
> performance is not good with default system settings. Second, it is too
> difficult to do automatic tuning for all possible workloads, since workloads
> have different requirements. Some workloads may want high throughput. Some may
> need low latency. Last but not least, there are lots of manual configurations.
> Fine grained configuration is too difficult for users.
>
> NET policy intends to simplify the network configuration and get a good network
> performance according to the hints(policy) which is applied by user. It
> provides some typical "policies" for user which can be set per-socket, per-task
> or per-device. The kernel will automatically figures out how to merge different
> requests to get good network performance.
> Net policy is designed for multiqueue network devices. This implementation is
> only for Intel NICs using i40e driver. But the concepts and generic code should
> apply to other multiqueue NICs too.
> Net policy is also a combination of generic policy manager code and some
> ethtool callbacks (per queue coalesce setting, flow classification rules) to
> configure the driver.
> This series also supports CPU hotplug and device hotplug.
>
> Here are some key Interfaces/APIs for NET policy.
>
>     /proc/net/netpolicy/$DEV/policy
>     User can set/get per device policy from /proc
>
>     /proc/$PID/net_policy
>     User can set/get per task policy from /proc
>     prctl(PR_SET_NETPOLICY, POLICY_NAME, NULL, NULL, NULL)
>     An alternative way to set/get per task policy is from prctl.
>
>     setsockopt(sockfd,SOL_SOCKET,SO_NETPOLICY,&policy,sizeof(int))
>     User can set/get per socket policy by setsockopt
>
>
>     int (*ndo_netpolicy_init)(struct net_device *dev,
>                               struct netpolicy_info *info);
>     Initialize device driver for NET policy
>
>     int (*ndo_get_irq_info)(struct net_device *dev,
>                             struct netpolicy_dev_info *info);
>     Collect device irq information
>
>     int (*ndo_set_net_policy)(struct net_device *dev,
>                               enum netpolicy_name name);
>     Configure device according to policy name
>
>     netpolicy_register(struct netpolicy_reg *reg);
>     netpolicy_unregister(struct netpolicy_reg *reg);
>     NET policy API to register/unregister per task/socket net policy.
>     For each task/socket, an record will be created and inserted into an RCU
>     hash table.
>
>     netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx);
>     NET policy API to find the proper queue for packet receiving and
>     transmitting.
>
>     netpolicy_set_rules(struct netpolicy_reg *reg, u32 queue_index,
>                          struct netpolicy_flow_spec *flow);
>     NET policy API to add flow director rules.
>
> For using NET policy, the per-device policy must be set in advance. It will
> automatically configure the system and re-organize the resource of the system
> accordingly. For system configuration, in this series, it will disable irq
> balance, set device queue irq affinity, and modify interrupt moderation. For
> re-organizing the resource, current implementation forces that CPU and queue
> irq are 1:1 mapping. An 1:1 mapping group is also called net policy object.
> For each device policy, it maintains a policy list. Once the device policy is
> applied, the objects will be insert and tracked in that device policy list. The
> policy list only be updated when cpu/device hotplug, queue number changes or
> device policy changes.
> The user can use /proc, prctl and setsockopt to set per-task and per-socket
> net policy. Once the policy is set, an related record will be inserted into RCU
> hash table. The record includes ptr, policy and net policy object. The ptr is
> the pointer address of task/socket. The object will not be assigned until the
> first package receive/transmit. The object is picked by round-robin from object
> list. Once the object is determined, the following packets will be set to
> redirect to the queue(object).
> The object can be shared. The per-task or per-socket policy can be inherited.
>
> Now NET policy supports four per device policies and three per task/socket
> policies.
>      - BULK policy: This policy is designed for high throughput. It can be
>        applied to either per device policy or per task/socket policy.
>      - CPU policy: This policy is designed for high throughput but lower CPU
>        utilization. It can be applied to either per device policy or
>        per task/socket policy.
>      - LATENCY policy: This policy is designed for low latency. It can be
>        applied to either per device policy or per task/socket policy.
>      - MIX policy: This policy can only be applied to per device policy. This
>        is designed for the case which miscellaneous types of workload running
>        on the device.

I'm missing a bit of discussion on the existing facilities there are under
networking and why they cannot be adapted to support these kind of hints?

On a higher level picture, why for example, a new cgroup in combination with
tc shouldn't be the ones resolving these policies on resource usage?

If sockets want to provide specific hints that may or may not be granted,
then this could be via SO_MARK, maybe SO_PRIORITY with above semantics or
some new marker perhaps that can be accessed from lower layers.

> Kan Liang (30):
>    net: introduce NET policy
>    net/netpolicy: init NET policy
>    i40e/netpolicy: Implement ndo_netpolicy_init
>    net/netpolicy: get driver information
>    i40e/netpolicy: implement ndo_get_irq_info
>    net/netpolicy: get CPU information
>    net/netpolicy: create CPU and queue mapping
>    net/netpolicy: set and remove irq affinity
>    net/netpolicy: enable and disable net policy
>    net/netpolicy: introduce netpolicy object
>    net/netpolicy: set net policy by policy name
>    i40e/netpolicy: implement ndo_set_net_policy
>    i40e/netpolicy: add three new net policies
>    net/netpolicy: add MIX policy
>    i40e/netpolicy: add MIX policy support
>    net/netpolicy: net device hotplug
>    net/netpolicy: support CPU hotplug
>    net/netpolicy: handle channel changes
>    net/netpolicy: implement netpolicy register
>    net/netpolicy: introduce per socket netpolicy
>    net/policy: introduce netpolicy_pick_queue
>    net/netpolicy: set tx queues according to policy
>    i40e/ethtool: support RX_CLS_LOC_ANY
>    net/netpolicy: set rx queues according to policy
>    net/netpolicy: introduce per task net policy
>    net/netpolicy: set per task policy by proc
>    net/netpolicy: fast path for finding the queues
>    net/netpolicy: optimize for queue pair
>    net/netpolicy: limit the total record number
>    Documentation/networking: Document net policy
>
>   Documentation/networking/netpolicy.txt         |  158 +++
>   arch/alpha/include/uapi/asm/socket.h           |    2 +
>   arch/avr32/include/uapi/asm/socket.h           |    2 +
>   arch/frv/include/uapi/asm/socket.h             |    2 +
>   arch/ia64/include/uapi/asm/socket.h            |    2 +
>   arch/m32r/include/uapi/asm/socket.h            |    2 +
>   arch/mips/include/uapi/asm/socket.h            |    2 +
>   arch/mn10300/include/uapi/asm/socket.h         |    2 +
>   arch/parisc/include/uapi/asm/socket.h          |    2 +
>   arch/powerpc/include/uapi/asm/socket.h         |    2 +
>   arch/s390/include/uapi/asm/socket.h            |    2 +
>   arch/sparc/include/uapi/asm/socket.h           |    2 +
>   arch/xtensa/include/uapi/asm/socket.h          |    2 +
>   drivers/net/ethernet/intel/i40e/i40e.h         |    3 +
>   drivers/net/ethernet/intel/i40e/i40e_ethtool.c |   44 +-
>   drivers/net/ethernet/intel/i40e/i40e_main.c    |  174 +++
>   fs/proc/base.c                                 |   64 ++
>   include/linux/init_task.h                      |   14 +
>   include/linux/netdevice.h                      |   31 +
>   include/linux/netpolicy.h                      |  160 +++
>   include/linux/sched.h                          |    5 +
>   include/net/net_namespace.h                    |    3 +
>   include/net/request_sock.h                     |    4 +-
>   include/net/sock.h                             |   10 +
>   include/uapi/asm-generic/socket.h              |    2 +
>   include/uapi/linux/prctl.h                     |    4 +
>   kernel/exit.c                                  |    4 +
>   kernel/fork.c                                  |   11 +
>   kernel/sys.c                                   |   31 +
>   net/Kconfig                                    |    7 +
>   net/core/Makefile                              |    1 +
>   net/core/dev.c                                 |   30 +-
>   net/core/ethtool.c                             |    8 +-
>   net/core/netpolicy.c                           | 1387 ++++++++++++++++++++++++
>   net/core/sock.c                                |   46 +
>   net/ipv4/af_inet.c                             |   75 ++
>   net/ipv4/udp.c                                 |    4 +
>   37 files changed, 2294 insertions(+), 10 deletions(-)
>   create mode 100644 Documentation/networking/netpolicy.txt
>   create mode 100644 include/linux/netpolicy.h
>   create mode 100644 net/core/netpolicy.c
>

^ permalink raw reply	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-18 16:22   ` Daniel Borkmann
  0 siblings, 0 replies; 123+ messages in thread
From: Daniel Borkmann @ 2016-07-18 16:22 UTC (permalink / raw)
  To: intel-wired-lan

Hi Kan,

On 07/18/2016 08:55 AM, kan.liang at intel.com wrote:
> From: Kan Liang <kan.liang@intel.com>
>
> It is a big challenge to get good network performance. First, the network
> performance is not good with default system settings. Second, it is too
> difficult to do automatic tuning for all possible workloads, since workloads
> have different requirements. Some workloads may want high throughput. Some may
> need low latency. Last but not least, there are lots of manual configurations.
> Fine grained configuration is too difficult for users.
>
> NET policy intends to simplify the network configuration and get a good network
> performance according to the hints(policy) which is applied by user. It
> provides some typical "policies" for user which can be set per-socket, per-task
> or per-device. The kernel will automatically figures out how to merge different
> requests to get good network performance.
> Net policy is designed for multiqueue network devices. This implementation is
> only for Intel NICs using i40e driver. But the concepts and generic code should
> apply to other multiqueue NICs too.
> Net policy is also a combination of generic policy manager code and some
> ethtool callbacks (per queue coalesce setting, flow classification rules) to
> configure the driver.
> This series also supports CPU hotplug and device hotplug.
>
> Here are some key Interfaces/APIs for NET policy.
>
>     /proc/net/netpolicy/$DEV/policy
>     User can set/get per device policy from /proc
>
>     /proc/$PID/net_policy
>     User can set/get per task policy from /proc
>     prctl(PR_SET_NETPOLICY, POLICY_NAME, NULL, NULL, NULL)
>     An alternative way to set/get per task policy is from prctl.
>
>     setsockopt(sockfd,SOL_SOCKET,SO_NETPOLICY,&policy,sizeof(int))
>     User can set/get per socket policy by setsockopt
>
>
>     int (*ndo_netpolicy_init)(struct net_device *dev,
>                               struct netpolicy_info *info);
>     Initialize device driver for NET policy
>
>     int (*ndo_get_irq_info)(struct net_device *dev,
>                             struct netpolicy_dev_info *info);
>     Collect device irq information
>
>     int (*ndo_set_net_policy)(struct net_device *dev,
>                               enum netpolicy_name name);
>     Configure device according to policy name
>
>     netpolicy_register(struct netpolicy_reg *reg);
>     netpolicy_unregister(struct netpolicy_reg *reg);
>     NET policy API to register/unregister per task/socket net policy.
>     For each task/socket, an record will be created and inserted into an RCU
>     hash table.
>
>     netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx);
>     NET policy API to find the proper queue for packet receiving and
>     transmitting.
>
>     netpolicy_set_rules(struct netpolicy_reg *reg, u32 queue_index,
>                          struct netpolicy_flow_spec *flow);
>     NET policy API to add flow director rules.
>
> For using NET policy, the per-device policy must be set in advance. It will
> automatically configure the system and re-organize the resource of the system
> accordingly. For system configuration, in this series, it will disable irq
> balance, set device queue irq affinity, and modify interrupt moderation. For
> re-organizing the resource, current implementation forces that CPU and queue
> irq are 1:1 mapping. An 1:1 mapping group is also called net policy object.
> For each device policy, it maintains a policy list. Once the device policy is
> applied, the objects will be insert and tracked in that device policy list. The
> policy list only be updated when cpu/device hotplug, queue number changes or
> device policy changes.
> The user can use /proc, prctl and setsockopt to set per-task and per-socket
> net policy. Once the policy is set, an related record will be inserted into RCU
> hash table. The record includes ptr, policy and net policy object. The ptr is
> the pointer address of task/socket. The object will not be assigned until the
> first package receive/transmit. The object is picked by round-robin from object
> list. Once the object is determined, the following packets will be set to
> redirect to the queue(object).
> The object can be shared. The per-task or per-socket policy can be inherited.
>
> Now NET policy supports four per device policies and three per task/socket
> policies.
>      - BULK policy: This policy is designed for high throughput. It can be
>        applied to either per device policy or per task/socket policy.
>      - CPU policy: This policy is designed for high throughput but lower CPU
>        utilization. It can be applied to either per device policy or
>        per task/socket policy.
>      - LATENCY policy: This policy is designed for low latency. It can be
>        applied to either per device policy or per task/socket policy.
>      - MIX policy: This policy can only be applied to per device policy. This
>        is designed for the case which miscellaneous types of workload running
>        on the device.

I'm missing a bit of discussion on the existing facilities there are under
networking and why they cannot be adapted to support these kind of hints?

On a higher level picture, why for example, a new cgroup in combination with
tc shouldn't be the ones resolving these policies on resource usage?

If sockets want to provide specific hints that may or may not be granted,
then this could be via SO_MARK, maybe SO_PRIORITY with above semantics or
some new marker perhaps that can be accessed from lower layers.

> Kan Liang (30):
>    net: introduce NET policy
>    net/netpolicy: init NET policy
>    i40e/netpolicy: Implement ndo_netpolicy_init
>    net/netpolicy: get driver information
>    i40e/netpolicy: implement ndo_get_irq_info
>    net/netpolicy: get CPU information
>    net/netpolicy: create CPU and queue mapping
>    net/netpolicy: set and remove irq affinity
>    net/netpolicy: enable and disable net policy
>    net/netpolicy: introduce netpolicy object
>    net/netpolicy: set net policy by policy name
>    i40e/netpolicy: implement ndo_set_net_policy
>    i40e/netpolicy: add three new net policies
>    net/netpolicy: add MIX policy
>    i40e/netpolicy: add MIX policy support
>    net/netpolicy: net device hotplug
>    net/netpolicy: support CPU hotplug
>    net/netpolicy: handle channel changes
>    net/netpolicy: implement netpolicy register
>    net/netpolicy: introduce per socket netpolicy
>    net/policy: introduce netpolicy_pick_queue
>    net/netpolicy: set tx queues according to policy
>    i40e/ethtool: support RX_CLS_LOC_ANY
>    net/netpolicy: set rx queues according to policy
>    net/netpolicy: introduce per task net policy
>    net/netpolicy: set per task policy by proc
>    net/netpolicy: fast path for finding the queues
>    net/netpolicy: optimize for queue pair
>    net/netpolicy: limit the total record number
>    Documentation/networking: Document net policy
>
>   Documentation/networking/netpolicy.txt         |  158 +++
>   arch/alpha/include/uapi/asm/socket.h           |    2 +
>   arch/avr32/include/uapi/asm/socket.h           |    2 +
>   arch/frv/include/uapi/asm/socket.h             |    2 +
>   arch/ia64/include/uapi/asm/socket.h            |    2 +
>   arch/m32r/include/uapi/asm/socket.h            |    2 +
>   arch/mips/include/uapi/asm/socket.h            |    2 +
>   arch/mn10300/include/uapi/asm/socket.h         |    2 +
>   arch/parisc/include/uapi/asm/socket.h          |    2 +
>   arch/powerpc/include/uapi/asm/socket.h         |    2 +
>   arch/s390/include/uapi/asm/socket.h            |    2 +
>   arch/sparc/include/uapi/asm/socket.h           |    2 +
>   arch/xtensa/include/uapi/asm/socket.h          |    2 +
>   drivers/net/ethernet/intel/i40e/i40e.h         |    3 +
>   drivers/net/ethernet/intel/i40e/i40e_ethtool.c |   44 +-
>   drivers/net/ethernet/intel/i40e/i40e_main.c    |  174 +++
>   fs/proc/base.c                                 |   64 ++
>   include/linux/init_task.h                      |   14 +
>   include/linux/netdevice.h                      |   31 +
>   include/linux/netpolicy.h                      |  160 +++
>   include/linux/sched.h                          |    5 +
>   include/net/net_namespace.h                    |    3 +
>   include/net/request_sock.h                     |    4 +-
>   include/net/sock.h                             |   10 +
>   include/uapi/asm-generic/socket.h              |    2 +
>   include/uapi/linux/prctl.h                     |    4 +
>   kernel/exit.c                                  |    4 +
>   kernel/fork.c                                  |   11 +
>   kernel/sys.c                                   |   31 +
>   net/Kconfig                                    |    7 +
>   net/core/Makefile                              |    1 +
>   net/core/dev.c                                 |   30 +-
>   net/core/ethtool.c                             |    8 +-
>   net/core/netpolicy.c                           | 1387 ++++++++++++++++++++++++
>   net/core/sock.c                                |   46 +
>   net/ipv4/af_inet.c                             |   75 ++
>   net/ipv4/udp.c                                 |    4 +
>   37 files changed, 2294 insertions(+), 10 deletions(-)
>   create mode 100644 Documentation/networking/netpolicy.txt
>   create mode 100644 include/linux/netpolicy.h
>   create mode 100644 net/core/netpolicy.c
>


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [RFC PATCH 00/30] Kernel NET policy
  2016-07-18 15:51     ` Liang, Kan
  (?)
@ 2016-07-18 16:34       ` Tom Herbert
  -1 siblings, 0 replies; 123+ messages in thread
From: Tom Herbert @ 2016-07-18 16:34 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Florian Westphal, davem, linux-kernel, intel-wired-lan, netdev,
	Kirsher, Jeffrey T, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, Brandeburg, Jesse, andi

On Mon, Jul 18, 2016 at 5:51 PM, Liang, Kan <kan.liang@intel.com> wrote:
>
>
>> >
>> > It is a big challenge to get good network performance. First, the
>> > network performance is not good with default system settings. Second,
>> > it is too difficult to do automatic tuning for all possible workloads,
>> > since workloads have different requirements. Some workloads may want
>> high throughput.
>>
>> Seems you did lots of tests to find optimal settings for a given base policy.
>>
> Yes. Current test only base on Intel i40e driver. The optimal settings should
> vary for other devices. But adding settings for new device is not hard.
>
The optimal settings are very dependent on system architecture (NUMA
config, #cpus, memory, etc.) and sometimes kernel version as well. A
database that provides best configurations across different devices,
architectures, and kernel version might be interesting; but beware
that that is a whole bunch of work to maintain, Either way policy like
this really should be handled in userspace.

Tom

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-18 16:34       ` Tom Herbert
  0 siblings, 0 replies; 123+ messages in thread
From: Tom Herbert @ 2016-07-18 16:34 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Florian Westphal, davem, linux-kernel, intel-wired-lan, netdev,
	Kirsher, Jeffrey T, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck

On Mon, Jul 18, 2016 at 5:51 PM, Liang, Kan <kan.liang@intel.com> wrote:
>
>
>> >
>> > It is a big challenge to get good network performance. First, the
>> > network performance is not good with default system settings. Second,
>> > it is too difficult to do automatic tuning for all possible workloads,
>> > since workloads have different requirements. Some workloads may want
>> high throughput.
>>
>> Seems you did lots of tests to find optimal settings for a given base policy.
>>
> Yes. Current test only base on Intel i40e driver. The optimal settings should
> vary for other devices. But adding settings for new device is not hard.
>
The optimal settings are very dependent on system architecture (NUMA
config, #cpus, memory, etc.) and sometimes kernel version as well. A
database that provides best configurations across different devices,
architectures, and kernel version might be interesting; but beware
that that is a whole bunch of work to maintain, Either way policy like
this really should be handled in userspace.

Tom

^ permalink raw reply	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-18 16:34       ` Tom Herbert
  0 siblings, 0 replies; 123+ messages in thread
From: Tom Herbert @ 2016-07-18 16:34 UTC (permalink / raw)
  To: intel-wired-lan

On Mon, Jul 18, 2016 at 5:51 PM, Liang, Kan <kan.liang@intel.com> wrote:
>
>
>> >
>> > It is a big challenge to get good network performance. First, the
>> > network performance is not good with default system settings. Second,
>> > it is too difficult to do automatic tuning for all possible workloads,
>> > since workloads have different requirements. Some workloads may want
>> high throughput.
>>
>> Seems you did lots of tests to find optimal settings for a given base policy.
>>
> Yes. Current test only base on Intel i40e driver. The optimal settings should
> vary for other devices. But adding settings for new device is not hard.
>
The optimal settings are very dependent on system architecture (NUMA
config, #cpus, memory, etc.) and sometimes kernel version as well. A
database that provides best configurations across different devices,
architectures, and kernel version might be interesting; but beware
that that is a whole bunch of work to maintain, Either way policy like
this really should be handled in userspace.

Tom

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [RFC PATCH 30/30] Documentation/networking: Document net policy
  2016-07-18  6:56   ` [Intel-wired-lan] " kan.liang
@ 2016-07-18 16:58     ` Randy Dunlap
  -1 siblings, 0 replies; 123+ messages in thread
From: Randy Dunlap @ 2016-07-18 16:58 UTC (permalink / raw)
  To: kan.liang, davem, linux-kernel, intel-wired-lan, netdev
  Cc: jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg, andi

On 07/17/16 23:56, kan.liang@intel.com wrote:
> From: Kan Liang <kan.liang@intel.com>
> 
> Signed-off-by: Kan Liang <kan.liang@intel.com>
> ---
>  Documentation/networking/netpolicy.txt | 158 +++++++++++++++++++++++++++++++++
>  1 file changed, 158 insertions(+)
>  create mode 100644 Documentation/networking/netpolicy.txt
> 
> diff --git a/Documentation/networking/netpolicy.txt b/Documentation/networking/netpolicy.txt
> new file mode 100644
> index 0000000..2ce938e
> --- /dev/null
> +++ b/Documentation/networking/netpolicy.txt
> @@ -0,0 +1,158 @@
> +What is Linux Net Policy?
> +
> +It is a big challenge to get good network performance. First, the network
> +performance is not good with default system settings. Second, it is too
> +difficult to do automatic tuning for all possible workloads, since workloads
> +have different requirements. Some workloads may want high throughput. Some may
> +need low latency. Last but not least, there are lots of manual configurations.
> +Fine grained configuration is too difficult for users.
> +
> +"NET policy" intends to simplify the network configuration and get a
> +good network performance according to the hints(policy) which is applied by
> +user. It provides some typical "policies" for user which can be set
> +per-socket, per-task or per-device. The kernel will automatically figures out

                                      drop:       will

> +how to merge different requests to get good network performance.
> +
> +"Net policy" is designed for multiqueue network devices. This document
> +describes the concepts and APIs of "net policy" support.
> +
> +NET POLICY CONCEPTS
> +
> +Scope of Net Policies
> +
> +    Device net policy: this policy applies to the whole device. Once the
> +    device net policy is set, it automatically configures the system
> +    according to the applied policy. The configuration usually includes irq
> +    affinity, irq balance disable, interrupt moderation, and so on. But the
> +    device net policy does not change the packet direction.
> +
> +    Task net policy: this is a per-task policy. When it is applied to specific
> +    task, all packets transmissions of the task will be redirect to the

                 packet                                    redirected

> +    assigned queues accordingly. If a task does not define a task policy,
> +    it "falls back" to the system default way to direct the packets. The
> +    per-task policy must be compatible with device net policy.
> +
> +    Socket net policy: this is a per-socket policy. When it is applied to
> +    specific socket, all packets transmissions of the socket will be redirect

                            packet                                      redirected

> +    to the assigned queues accordingly. If a socket does not define a socket
> +    policy, it "falls back" to the system default way to direct the packets.
> +    The per-socket policy must be compatible with both device net policy and
> +    per-task policy.
> +
> +Components of Net Policies
> +
> +    Net policy object: it is a combination of cpu and queue. The queue irq has
> +    to set affinity with the cpu. It can be shared between sockets and tasks.
> +    A reference counter is used to track the sharing number.

I would prefer to see CPU instead of cpu and IRQ instead of irq throughout the file.

> +
> +    Net policy object list: each device policy has an object list. Once the
> +    device policy is determined, the net policy object will be inserted into
> +    the net policy object list. The net policy object list does not change
> +    unless the cpu/queue number is changed, the netpolicy is disabled or
> +    the device policy is changed.
> +    The network performance for objects could be different because of the
> +    queue/cpu topology and dev location. The objects which can bring high
> +    performance are in the front of the list.
> +
> +    RCU hash table: a RCU hash table to maintain the relationship between

                       an RCU

> +    the task/socket and the assigned object. The task/socket can get the
> +    assigned object by searching the table.
> +    If it is the first time, there is no assigned object in the table. It will
> +    go through the object list to find the available object based on position
> +    and reference number.
> +    If the net policy object list changes, all the assigned object will become

                                                               objects

> +    invalid.
> +
> +NET POLICY APIs
> +
> +Interfaces between net policy and device driver
> +
> +    int (*ndo_netpolicy_init)(struct net_device *dev,
> +                              struct netpolicy_info *info);
> +
> +    The device driver who has NET policy support must implement this interface.
> +    In this interface, the device driver do necessory initialization, and fill

                                            does necessary

> +    the info for net policy module. The information could inlcude supported

                                                             include

> +    policy, MIX policy support, queue pair support and so on.
> +
> +    int (*ndo_get_irq_info)(struct net_device *dev,
> +                            struct netpolicy_dev_info *info);
> +
> +    This interface is used to get more accurate device irq information.
> +
> +    int (*ndo_set_net_policy)(struct net_device *dev,
> +                              enum netpolicy_name name);
> +
> +    This interface is used to set device net policy by name

                                                          name.

> +
> +Interfaces between net policy and kernel
> +
> +    int netpolicy_register(struct netpolicy_reg *reg);
> +    void netpolicy_unregister(struct netpolicy_reg *reg);
> +
> +    This interface is used to register per task/socket net policy.
> +    If it's the first time to register, an record will be created and inserted

                                           a record

> +    into RCU hash table. The record includes ptr, policy and object
> +    information. There is only one user for each record. The record cannot be
> +    share.

       shared.

> +
> +
> +    int netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx);
> +
> +    This interface is used to find the proper queue(object) for packet
> +    receiving and transmitting. The proper queue is picked from object list
> +    according to policy, reference, location and so on.
> +
> +
> +    int netpolicy_set_rules(struct netpolicy_reg *reg,
> +                            u32 queue_index,
> +                            struct netpolicy_flow_spec *flow);
> +
> +    This interface is used to add device specific rules. Once the rule is
> +    applied, the packet from specific IP and port will be redirect to the

                                                             redirected

> +    given queue. This interface usually be used in receive side.

                                   is usually used on the receive side.

> +
> +NET POLICY INTERFACE
> +
> +Device net policy setting
> +
> +    /proc/net/netpolicy/$DEV/policy
> +
> +    Concatenating(cat) the "policy" file can show the available device
> +    policies, if there is no device policy applied. Otherwise, the device
> +    policy name will be print out. If it is MIX policy, the policy for each

                           printed

> +    queue will also be print out.

                          printed

> +    User can set device net policy by writing policy name.
> +
> +Task policy setting
> +
> +    /proc/$PID/net_policy
> +
> +    Concatenating(cat) the "net_policy" file can show the applied per task
> +    policy.
> +    User can set per task net policy by writing policy name.
> +
> +    OR
> +
> +    prctl(PR_SET_NETPOLICY, POLICY_NAME, NULL, NULL, NULL)
> +
> +    "prctl" is an alternative way to set/get per task policy.
> +
> +Socket policy setting
> +
> +    setsockopt(sockfd,SOL_SOCKET,SO_NETPOLICY,&policy,sizeof(int))
> +
> +    The socket net policy can be set by option SO_NETPOLICY of setsockopt

                                                                  setsockopt.

> +
> +AVAILABLE NET POLICIES
> +
> +    The available net policies are defined as below:
> +    - CPU: intends to get higher throughput and lower CPU%. This policy can be
> +           applied as either device net policy or task/socket net policy.
> +    - BULK: intends to get highest throughput. This policy can be applied as
> +            either device net policy or task/socket net policy.
> +    - LATENCY: intends to get lowest latency. This policy can be applied as
> +               either device net policy or task/socket net policy.
> +    - MIX: combination of other policies, which allows each queue has

                                                                     to have a

> +           different policy. This policy can only be set as device net policy.
> +
> 


-- 
~Randy

^ permalink raw reply	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 30/30] Documentation/networking: Document net policy
@ 2016-07-18 16:58     ` Randy Dunlap
  0 siblings, 0 replies; 123+ messages in thread
From: Randy Dunlap @ 2016-07-18 16:58 UTC (permalink / raw)
  To: intel-wired-lan

On 07/17/16 23:56, kan.liang at intel.com wrote:
> From: Kan Liang <kan.liang@intel.com>
> 
> Signed-off-by: Kan Liang <kan.liang@intel.com>
> ---
>  Documentation/networking/netpolicy.txt | 158 +++++++++++++++++++++++++++++++++
>  1 file changed, 158 insertions(+)
>  create mode 100644 Documentation/networking/netpolicy.txt
> 
> diff --git a/Documentation/networking/netpolicy.txt b/Documentation/networking/netpolicy.txt
> new file mode 100644
> index 0000000..2ce938e
> --- /dev/null
> +++ b/Documentation/networking/netpolicy.txt
> @@ -0,0 +1,158 @@
> +What is Linux Net Policy?
> +
> +It is a big challenge to get good network performance. First, the network
> +performance is not good with default system settings. Second, it is too
> +difficult to do automatic tuning for all possible workloads, since workloads
> +have different requirements. Some workloads may want high throughput. Some may
> +need low latency. Last but not least, there are lots of manual configurations.
> +Fine grained configuration is too difficult for users.
> +
> +"NET policy" intends to simplify the network configuration and get a
> +good network performance according to the hints(policy) which is applied by
> +user. It provides some typical "policies" for user which can be set
> +per-socket, per-task or per-device. The kernel will automatically figures out

                                      drop:       will

> +how to merge different requests to get good network performance.
> +
> +"Net policy" is designed for multiqueue network devices. This document
> +describes the concepts and APIs of "net policy" support.
> +
> +NET POLICY CONCEPTS
> +
> +Scope of Net Policies
> +
> +    Device net policy: this policy applies to the whole device. Once the
> +    device net policy is set, it automatically configures the system
> +    according to the applied policy. The configuration usually includes irq
> +    affinity, irq balance disable, interrupt moderation, and so on. But the
> +    device net policy does not change the packet direction.
> +
> +    Task net policy: this is a per-task policy. When it is applied to specific
> +    task, all packets transmissions of the task will be redirect to the

                 packet                                    redirected

> +    assigned queues accordingly. If a task does not define a task policy,
> +    it "falls back" to the system default way to direct the packets. The
> +    per-task policy must be compatible with device net policy.
> +
> +    Socket net policy: this is a per-socket policy. When it is applied to
> +    specific socket, all packets transmissions of the socket will be redirect

                            packet                                      redirected

> +    to the assigned queues accordingly. If a socket does not define a socket
> +    policy, it "falls back" to the system default way to direct the packets.
> +    The per-socket policy must be compatible with both device net policy and
> +    per-task policy.
> +
> +Components of Net Policies
> +
> +    Net policy object: it is a combination of cpu and queue. The queue irq has
> +    to set affinity with the cpu. It can be shared between sockets and tasks.
> +    A reference counter is used to track the sharing number.

I would prefer to see CPU instead of cpu and IRQ instead of irq throughout the file.

> +
> +    Net policy object list: each device policy has an object list. Once the
> +    device policy is determined, the net policy object will be inserted into
> +    the net policy object list. The net policy object list does not change
> +    unless the cpu/queue number is changed, the netpolicy is disabled or
> +    the device policy is changed.
> +    The network performance for objects could be different because of the
> +    queue/cpu topology and dev location. The objects which can bring high
> +    performance are in the front of the list.
> +
> +    RCU hash table: a RCU hash table to maintain the relationship between

                       an RCU

> +    the task/socket and the assigned object. The task/socket can get the
> +    assigned object by searching the table.
> +    If it is the first time, there is no assigned object in the table. It will
> +    go through the object list to find the available object based on position
> +    and reference number.
> +    If the net policy object list changes, all the assigned object will become

                                                               objects

> +    invalid.
> +
> +NET POLICY APIs
> +
> +Interfaces between net policy and device driver
> +
> +    int (*ndo_netpolicy_init)(struct net_device *dev,
> +                              struct netpolicy_info *info);
> +
> +    The device driver who has NET policy support must implement this interface.
> +    In this interface, the device driver do necessory initialization, and fill

                                            does necessary

> +    the info for net policy module. The information could inlcude supported

                                                             include

> +    policy, MIX policy support, queue pair support and so on.
> +
> +    int (*ndo_get_irq_info)(struct net_device *dev,
> +                            struct netpolicy_dev_info *info);
> +
> +    This interface is used to get more accurate device irq information.
> +
> +    int (*ndo_set_net_policy)(struct net_device *dev,
> +                              enum netpolicy_name name);
> +
> +    This interface is used to set device net policy by name

                                                          name.

> +
> +Interfaces between net policy and kernel
> +
> +    int netpolicy_register(struct netpolicy_reg *reg);
> +    void netpolicy_unregister(struct netpolicy_reg *reg);
> +
> +    This interface is used to register per task/socket net policy.
> +    If it's the first time to register, an record will be created and inserted

                                           a record

> +    into RCU hash table. The record includes ptr, policy and object
> +    information. There is only one user for each record. The record cannot be
> +    share.

       shared.

> +
> +
> +    int netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx);
> +
> +    This interface is used to find the proper queue(object) for packet
> +    receiving and transmitting. The proper queue is picked from object list
> +    according to policy, reference, location and so on.
> +
> +
> +    int netpolicy_set_rules(struct netpolicy_reg *reg,
> +                            u32 queue_index,
> +                            struct netpolicy_flow_spec *flow);
> +
> +    This interface is used to add device specific rules. Once the rule is
> +    applied, the packet from specific IP and port will be redirect to the

                                                             redirected

> +    given queue. This interface usually be used in receive side.

                                   is usually used on the receive side.

> +
> +NET POLICY INTERFACE
> +
> +Device net policy setting
> +
> +    /proc/net/netpolicy/$DEV/policy
> +
> +    Concatenating(cat) the "policy" file can show the available device
> +    policies, if there is no device policy applied. Otherwise, the device
> +    policy name will be print out. If it is MIX policy, the policy for each

                           printed

> +    queue will also be print out.

                          printed

> +    User can set device net policy by writing policy name.
> +
> +Task policy setting
> +
> +    /proc/$PID/net_policy
> +
> +    Concatenating(cat) the "net_policy" file can show the applied per task
> +    policy.
> +    User can set per task net policy by writing policy name.
> +
> +    OR
> +
> +    prctl(PR_SET_NETPOLICY, POLICY_NAME, NULL, NULL, NULL)
> +
> +    "prctl" is an alternative way to set/get per task policy.
> +
> +Socket policy setting
> +
> +    setsockopt(sockfd,SOL_SOCKET,SO_NETPOLICY,&policy,sizeof(int))
> +
> +    The socket net policy can be set by option SO_NETPOLICY of setsockopt

                                                                  setsockopt.

> +
> +AVAILABLE NET POLICIES
> +
> +    The available net policies are defined as below:
> +    - CPU: intends to get higher throughput and lower CPU%. This policy can be
> +           applied as either device net policy or task/socket net policy.
> +    - BULK: intends to get highest throughput. This policy can be applied as
> +            either device net policy or task/socket net policy.
> +    - LATENCY: intends to get lowest latency. This policy can be applied as
> +               either device net policy or task/socket net policy.
> +    - MIX: combination of other policies, which allows each queue has

                                                                     to have a

> +           different policy. This policy can only be set as device net policy.
> +
> 


-- 
~Randy

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [RFC PATCH 00/30] Kernel NET policy
  2016-07-18  6:55 ` [Intel-wired-lan] " kan.liang
@ 2016-07-18 17:00   ` Alexander Duyck
  -1 siblings, 0 replies; 123+ messages in thread
From: Alexander Duyck @ 2016-07-18 17:00 UTC (permalink / raw)
  To: kan.liang
  Cc: David Miller, linux-kernel, intel-wired-lan, Netdev,
	Jeff Kirsher, Ingo Molnar, Peter Zijlstra, Alexey Kuznetsov,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, Andrew Morton,
	keescook, viro, gorcunov, john.stultz, Alex Duyck, Ben Hutchings,
	decot, Brandeburg, Jesse, Andi Kleen

On Sun, Jul 17, 2016 at 11:55 PM,  <kan.liang@intel.com> wrote:
> From: Kan Liang <kan.liang@intel.com>
>
> It is a big challenge to get good network performance. First, the network
> performance is not good with default system settings. Second, it is too
> difficult to do automatic tuning for all possible workloads, since workloads
> have different requirements. Some workloads may want high throughput. Some may
> need low latency. Last but not least, there are lots of manual configurations.
> Fine grained configuration is too difficult for users.

The problem as I see it is that this is just going to end up likely
being an even more intrusive version of irqbalance.  I really don't
like the way that turned out as it did a number of really dumb things
that usually result in it being disabled as soon as you actually want
to do anything that will actually involve any kind of performance
tuning.  If this stuff is pushed into the kernel it will be even
harder to get rid of and that is definitely a bad thing.

> NET policy intends to simplify the network configuration and get a good network
> performance according to the hints(policy) which is applied by user. It
> provides some typical "policies" for user which can be set per-socket, per-task
> or per-device. The kernel will automatically figures out how to merge different
> requests to get good network performance.

So where is your policy for power saving?  From past experience I can
tell you that while performance tuning is a good thing, doing so at
the expense of power management is bad.  In addition you seem to be
making a lot of assumptions here that the end users are going to
rewrite their applications to use the new socket options you added in
order to try and tune the performance.  I have a hard time believing
most developers are going to go to all that trouble.  In addition I
suspect that even if they do go to that trouble they will probably
still screw it up and you will end up with applications advertising
latency as a goal when they should have specified CPU and so on.

> Net policy is designed for multiqueue network devices. This implementation is
> only for Intel NICs using i40e driver. But the concepts and generic code should
> apply to other multiqueue NICs too.

I would argue that your code is not very generic.  The fact that it is
relying on flow director already greatly limits what you can do.  If
you want to make this truly generic I would say you need to find ways
to make this work on everything all the way down to things like i40evf
and igb which don't have support for Flow Director.

> Net policy is also a combination of generic policy manager code and some
> ethtool callbacks (per queue coalesce setting, flow classification rules) to
> configure the driver.
> This series also supports CPU hotplug and device hotplug.
>
> Here are some key Interfaces/APIs for NET policy.
>
>    /proc/net/netpolicy/$DEV/policy
>    User can set/get per device policy from /proc
>
>    /proc/$PID/net_policy
>    User can set/get per task policy from /proc
>    prctl(PR_SET_NETPOLICY, POLICY_NAME, NULL, NULL, NULL)
>    An alternative way to set/get per task policy is from prctl.
>
>    setsockopt(sockfd,SOL_SOCKET,SO_NETPOLICY,&policy,sizeof(int))
>    User can set/get per socket policy by setsockopt
>
>
>    int (*ndo_netpolicy_init)(struct net_device *dev,
>                              struct netpolicy_info *info);
>    Initialize device driver for NET policy
>
>    int (*ndo_get_irq_info)(struct net_device *dev,
>                            struct netpolicy_dev_info *info);
>    Collect device irq information

Instead of making the irq info a part of the ndo ops it might make
more sense to make it part of an ethtool op.  Maybe you could make it
so that you could specify a single queue at a time and get things like
statistics, IRQ, and ring information.

>    int (*ndo_set_net_policy)(struct net_device *dev,
>                              enum netpolicy_name name);
>    Configure device according to policy name

I really don't like this piece of it.  I really think we shouldn't be
leaving so much up to the driver to determine how to handle things.
In addition just passing one of 4 different types doesn't do much for
actual configuration because the actual configuration of the device is
much more complex then that.  Essentially all this does is provide a
benchmark tuning interface.

>    netpolicy_register(struct netpolicy_reg *reg);
>    netpolicy_unregister(struct netpolicy_reg *reg);
>    NET policy API to register/unregister per task/socket net policy.
>    For each task/socket, an record will be created and inserted into an RCU
>    hash table.

This piece will take a significant amount of time before it could ever
catch on.  Once again this just looks like a benchmark tuning
interface.  It isn't of much value.

>    netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx);
>    NET policy API to find the proper queue for packet receiving and
>    transmitting.
>
>    netpolicy_set_rules(struct netpolicy_reg *reg, u32 queue_index,
>                         struct netpolicy_flow_spec *flow);
>    NET policy API to add flow director rules.

So Flow Director is a very Intel-centric approach.  I would suggest
taking a now at NTUPLE and RXNFC rules as that is what is actually
implemented in the kernel.  In addition I would recommend exploring
RPS and ndo_rx_flow_steer as those are existing interfaces for
configuring a specific flow to be delivered to a specific CPU.

> For using NET policy, the per-device policy must be set in advance. It will
> automatically configure the system and re-organize the resource of the system
> accordingly. For system configuration, in this series, it will disable irq
> balance, set device queue irq affinity, and modify interrupt moderation. For
> re-organizing the resource, current implementation forces that CPU and queue
> irq are 1:1 mapping. An 1:1 mapping group is also called net policy object.
> For each device policy, it maintains a policy list. Once the device policy is
> applied, the objects will be insert and tracked in that device policy list. The
> policy list only be updated when cpu/device hotplug, queue number changes or
> device policy changes.

So as a beginning step it might make more sense to try and fix
irqbalance instead of disabling it.  That is a huge red flag for me.
You are just implementing something that is more intrusive than
irqbalance and my concern here is we can't just disable it and
reconfigure things like we can with the current irqbalance.  If
irqbalance never got it right then why should we trust this?

Also how will you code handle a non 1:1 mapping.  For example I know
one thing I have been looking at trying out was implementing a setup
that would allocate 1 Tx queue per logical CPU, and 1 Rx queue per
physical CPU.  The reason for that being that from past experience on
ixgbe I have found that more Rx queues does not equal more performance
when you start stacking active queues on SMT pairs.  If you don't have
enough queues for the number of CPUs in a case such as this how would
your code handle it?

> The user can use /proc, prctl and setsockopt to set per-task and per-socket
> net policy. Once the policy is set, an related record will be inserted into RCU
> hash table. The record includes ptr, policy and net policy object. The ptr is
> the pointer address of task/socket. The object will not be assigned until the
> first package receive/transmit. The object is picked by round-robin from object
> list. Once the object is determined, the following packets will be set to
> redirect to the queue(object).
> The object can be shared. The per-task or per-socket policy can be inherited.
>
> Now NET policy supports four per device policies and three per task/socket
> policies.
>     - BULK policy: This policy is designed for high throughput. It can be
>       applied to either per device policy or per task/socket policy.
>     - CPU policy: This policy is designed for high throughput but lower CPU
>       utilization. It can be applied to either per device policy or
>       per task/socket policy.
>     - LATENCY policy: This policy is designed for low latency. It can be
>       applied to either per device policy or per task/socket policy.
>     - MIX policy: This policy can only be applied to per device policy. This
>       is designed for the case which miscellaneous types of workload running
>       on the device.

This is a rather sparse list of policies.  I know most organizations
with large data centers care about power savings AND latency.  What
you have here is a rather simplistic set of targets.  I think actual
configuration is much more complex then that.

> Lots of tests are done for net policy on platforms with Intel Xeon E5 V2
> and XL710 40G NIC. The baseline test is with Linux 4.6.0 kernel.

So I assume you are saying you applied your patches on top of a 4.6.0
kernel then for testing correct?  I'm just wanting to verify we aren't
looking 4.6.0 versus the current net-next or Linus's 4.7-RCX tree.

> Netperf is used to evaluate the throughput and latency performance.
>   - "netperf -f m -t TCP_RR -H server_IP -c -C -l 60 -- -r buffersize
>     -b burst -D" is used to evaluate throughput performance, which is
>     called throughput-first workload.

While this is okay for testing performance you might be better off
using a TCP_STREAM, TCP_MAERTS, and perhaps UDP_STREAM test.  There
aren't too many real-world applications that will give you the kind of
traffic pattern you see with TCP_RR being used for a bulk throughput
test.

>   - "netperf -t TCP_RR -H server_IP -c -C -l 60 -- -r buffersize" is
>     used to evaluate latency performance, which is called latency-first
>     workload.
>   - Different loads are also evaluated by running 1, 12, 24, 48 or 96
>     throughput-first workloads/latency-first workload simultaneously.
>
> For "BULK" policy, the throughput performance is on average ~1.26X than
> baseline.
> For "CPU" policy, the throughput performance is on average ~1.20X than
> baseline, and has lower CPU% (on average ~5% lower than "BULK" policy).
> For "LATENCY" policy, the latency is on average 53.5% less than the baseline.

I have misgivings about just throwing out random numbers with no
actual data to back it up.  What kind of throughput and CPU
utilization were you actually seeing?  The idea is that we should be
able to take your patches and apply them on our own system to see
similar values and I'm suspecting that in many cases you might be
focusing on the wrong things.  For example I could get good "LATENCY"
numbers by just disabling interrupt throttling.  That would look
really good for latency, but my CPU utilization would be through the
roof.  It might be useful if you could provide throughput, CPU
utilization, and latency numbers for your baseline versus each of
these settings.

> For "MIX" policy, mixed workloads performance is evaluated.
> The mixed workloads are combination of throughput-first workload and
> latency-first workload. Five different types of combinations are evaluated
> (pure throughput-first workload, pure latency-first workloads,
>  2/3 throughput-first workload + 1/3 latency-first workloads,
>  1/3 throughput-first workload + 2/3 latency-first workloads and
>  1/2 throughput-first workload + 1/2 latency-first workloads).
> For caculating the performance of mixed workloads, a weighted sum system
> is introduced.
> Score = normalized_latency * Weight + normalized_throughput * (1 - Weight).
> If we assume that the user has an equal interest in latency and throughput
> performance, the Score for "MIX" policy is on average ~1.52X than baseline.

This scoring system of yours makes no sense.  Just give us the numbers
on what the average latency did versus your "baseline" and the same
for the throughput.

> Kan Liang (30):
>   net: introduce NET policy
>   net/netpolicy: init NET policy
>   i40e/netpolicy: Implement ndo_netpolicy_init
>   net/netpolicy: get driver information
>   i40e/netpolicy: implement ndo_get_irq_info
>   net/netpolicy: get CPU information
>   net/netpolicy: create CPU and queue mapping
>   net/netpolicy: set and remove irq affinity
>   net/netpolicy: enable and disable net policy
>   net/netpolicy: introduce netpolicy object
>   net/netpolicy: set net policy by policy name
>   i40e/netpolicy: implement ndo_set_net_policy
>   i40e/netpolicy: add three new net policies
>   net/netpolicy: add MIX policy
>   i40e/netpolicy: add MIX policy support
>   net/netpolicy: net device hotplug
>   net/netpolicy: support CPU hotplug
>   net/netpolicy: handle channel changes
>   net/netpolicy: implement netpolicy register
>   net/netpolicy: introduce per socket netpolicy
>   net/policy: introduce netpolicy_pick_queue
>   net/netpolicy: set tx queues according to policy
>   i40e/ethtool: support RX_CLS_LOC_ANY
>   net/netpolicy: set rx queues according to policy
>   net/netpolicy: introduce per task net policy
>   net/netpolicy: set per task policy by proc
>   net/netpolicy: fast path for finding the queues
>   net/netpolicy: optimize for queue pair
>   net/netpolicy: limit the total record number
>   Documentation/networking: Document net policy

30 patches is quite a bit to review.  You might have better luck
getting review and/or feedback if you could split this up into at
least 2 patch sets of 15 or so patches when you try to actually submit
this.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-18 17:00   ` Alexander Duyck
  0 siblings, 0 replies; 123+ messages in thread
From: Alexander Duyck @ 2016-07-18 17:00 UTC (permalink / raw)
  To: intel-wired-lan

On Sun, Jul 17, 2016 at 11:55 PM,  <kan.liang@intel.com> wrote:
> From: Kan Liang <kan.liang@intel.com>
>
> It is a big challenge to get good network performance. First, the network
> performance is not good with default system settings. Second, it is too
> difficult to do automatic tuning for all possible workloads, since workloads
> have different requirements. Some workloads may want high throughput. Some may
> need low latency. Last but not least, there are lots of manual configurations.
> Fine grained configuration is too difficult for users.

The problem as I see it is that this is just going to end up likely
being an even more intrusive version of irqbalance.  I really don't
like the way that turned out as it did a number of really dumb things
that usually result in it being disabled as soon as you actually want
to do anything that will actually involve any kind of performance
tuning.  If this stuff is pushed into the kernel it will be even
harder to get rid of and that is definitely a bad thing.

> NET policy intends to simplify the network configuration and get a good network
> performance according to the hints(policy) which is applied by user. It
> provides some typical "policies" for user which can be set per-socket, per-task
> or per-device. The kernel will automatically figures out how to merge different
> requests to get good network performance.

So where is your policy for power saving?  From past experience I can
tell you that while performance tuning is a good thing, doing so at
the expense of power management is bad.  In addition you seem to be
making a lot of assumptions here that the end users are going to
rewrite their applications to use the new socket options you added in
order to try and tune the performance.  I have a hard time believing
most developers are going to go to all that trouble.  In addition I
suspect that even if they do go to that trouble they will probably
still screw it up and you will end up with applications advertising
latency as a goal when they should have specified CPU and so on.

> Net policy is designed for multiqueue network devices. This implementation is
> only for Intel NICs using i40e driver. But the concepts and generic code should
> apply to other multiqueue NICs too.

I would argue that your code is not very generic.  The fact that it is
relying on flow director already greatly limits what you can do.  If
you want to make this truly generic I would say you need to find ways
to make this work on everything all the way down to things like i40evf
and igb which don't have support for Flow Director.

> Net policy is also a combination of generic policy manager code and some
> ethtool callbacks (per queue coalesce setting, flow classification rules) to
> configure the driver.
> This series also supports CPU hotplug and device hotplug.
>
> Here are some key Interfaces/APIs for NET policy.
>
>    /proc/net/netpolicy/$DEV/policy
>    User can set/get per device policy from /proc
>
>    /proc/$PID/net_policy
>    User can set/get per task policy from /proc
>    prctl(PR_SET_NETPOLICY, POLICY_NAME, NULL, NULL, NULL)
>    An alternative way to set/get per task policy is from prctl.
>
>    setsockopt(sockfd,SOL_SOCKET,SO_NETPOLICY,&policy,sizeof(int))
>    User can set/get per socket policy by setsockopt
>
>
>    int (*ndo_netpolicy_init)(struct net_device *dev,
>                              struct netpolicy_info *info);
>    Initialize device driver for NET policy
>
>    int (*ndo_get_irq_info)(struct net_device *dev,
>                            struct netpolicy_dev_info *info);
>    Collect device irq information

Instead of making the irq info a part of the ndo ops it might make
more sense to make it part of an ethtool op.  Maybe you could make it
so that you could specify a single queue at a time and get things like
statistics, IRQ, and ring information.

>    int (*ndo_set_net_policy)(struct net_device *dev,
>                              enum netpolicy_name name);
>    Configure device according to policy name

I really don't like this piece of it.  I really think we shouldn't be
leaving so much up to the driver to determine how to handle things.
In addition just passing one of 4 different types doesn't do much for
actual configuration because the actual configuration of the device is
much more complex then that.  Essentially all this does is provide a
benchmark tuning interface.

>    netpolicy_register(struct netpolicy_reg *reg);
>    netpolicy_unregister(struct netpolicy_reg *reg);
>    NET policy API to register/unregister per task/socket net policy.
>    For each task/socket, an record will be created and inserted into an RCU
>    hash table.

This piece will take a significant amount of time before it could ever
catch on.  Once again this just looks like a benchmark tuning
interface.  It isn't of much value.

>    netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx);
>    NET policy API to find the proper queue for packet receiving and
>    transmitting.
>
>    netpolicy_set_rules(struct netpolicy_reg *reg, u32 queue_index,
>                         struct netpolicy_flow_spec *flow);
>    NET policy API to add flow director rules.

So Flow Director is a very Intel-centric approach.  I would suggest
taking a now at NTUPLE and RXNFC rules as that is what is actually
implemented in the kernel.  In addition I would recommend exploring
RPS and ndo_rx_flow_steer as those are existing interfaces for
configuring a specific flow to be delivered to a specific CPU.

> For using NET policy, the per-device policy must be set in advance. It will
> automatically configure the system and re-organize the resource of the system
> accordingly. For system configuration, in this series, it will disable irq
> balance, set device queue irq affinity, and modify interrupt moderation. For
> re-organizing the resource, current implementation forces that CPU and queue
> irq are 1:1 mapping. An 1:1 mapping group is also called net policy object.
> For each device policy, it maintains a policy list. Once the device policy is
> applied, the objects will be insert and tracked in that device policy list. The
> policy list only be updated when cpu/device hotplug, queue number changes or
> device policy changes.

So as a beginning step it might make more sense to try and fix
irqbalance instead of disabling it.  That is a huge red flag for me.
You are just implementing something that is more intrusive than
irqbalance and my concern here is we can't just disable it and
reconfigure things like we can with the current irqbalance.  If
irqbalance never got it right then why should we trust this?

Also how will you code handle a non 1:1 mapping.  For example I know
one thing I have been looking at trying out was implementing a setup
that would allocate 1 Tx queue per logical CPU, and 1 Rx queue per
physical CPU.  The reason for that being that from past experience on
ixgbe I have found that more Rx queues does not equal more performance
when you start stacking active queues on SMT pairs.  If you don't have
enough queues for the number of CPUs in a case such as this how would
your code handle it?

> The user can use /proc, prctl and setsockopt to set per-task and per-socket
> net policy. Once the policy is set, an related record will be inserted into RCU
> hash table. The record includes ptr, policy and net policy object. The ptr is
> the pointer address of task/socket. The object will not be assigned until the
> first package receive/transmit. The object is picked by round-robin from object
> list. Once the object is determined, the following packets will be set to
> redirect to the queue(object).
> The object can be shared. The per-task or per-socket policy can be inherited.
>
> Now NET policy supports four per device policies and three per task/socket
> policies.
>     - BULK policy: This policy is designed for high throughput. It can be
>       applied to either per device policy or per task/socket policy.
>     - CPU policy: This policy is designed for high throughput but lower CPU
>       utilization. It can be applied to either per device policy or
>       per task/socket policy.
>     - LATENCY policy: This policy is designed for low latency. It can be
>       applied to either per device policy or per task/socket policy.
>     - MIX policy: This policy can only be applied to per device policy. This
>       is designed for the case which miscellaneous types of workload running
>       on the device.

This is a rather sparse list of policies.  I know most organizations
with large data centers care about power savings AND latency.  What
you have here is a rather simplistic set of targets.  I think actual
configuration is much more complex then that.

> Lots of tests are done for net policy on platforms with Intel Xeon E5 V2
> and XL710 40G NIC. The baseline test is with Linux 4.6.0 kernel.

So I assume you are saying you applied your patches on top of a 4.6.0
kernel then for testing correct?  I'm just wanting to verify we aren't
looking 4.6.0 versus the current net-next or Linus's 4.7-RCX tree.

> Netperf is used to evaluate the throughput and latency performance.
>   - "netperf -f m -t TCP_RR -H server_IP -c -C -l 60 -- -r buffersize
>     -b burst -D" is used to evaluate throughput performance, which is
>     called throughput-first workload.

While this is okay for testing performance you might be better off
using a TCP_STREAM, TCP_MAERTS, and perhaps UDP_STREAM test.  There
aren't too many real-world applications that will give you the kind of
traffic pattern you see with TCP_RR being used for a bulk throughput
test.

>   - "netperf -t TCP_RR -H server_IP -c -C -l 60 -- -r buffersize" is
>     used to evaluate latency performance, which is called latency-first
>     workload.
>   - Different loads are also evaluated by running 1, 12, 24, 48 or 96
>     throughput-first workloads/latency-first workload simultaneously.
>
> For "BULK" policy, the throughput performance is on average ~1.26X than
> baseline.
> For "CPU" policy, the throughput performance is on average ~1.20X than
> baseline, and has lower CPU% (on average ~5% lower than "BULK" policy).
> For "LATENCY" policy, the latency is on average 53.5% less than the baseline.

I have misgivings about just throwing out random numbers with no
actual data to back it up.  What kind of throughput and CPU
utilization were you actually seeing?  The idea is that we should be
able to take your patches and apply them on our own system to see
similar values and I'm suspecting that in many cases you might be
focusing on the wrong things.  For example I could get good "LATENCY"
numbers by just disabling interrupt throttling.  That would look
really good for latency, but my CPU utilization would be through the
roof.  It might be useful if you could provide throughput, CPU
utilization, and latency numbers for your baseline versus each of
these settings.

> For "MIX" policy, mixed workloads performance is evaluated.
> The mixed workloads are combination of throughput-first workload and
> latency-first workload. Five different types of combinations are evaluated
> (pure throughput-first workload, pure latency-first workloads,
>  2/3 throughput-first workload + 1/3 latency-first workloads,
>  1/3 throughput-first workload + 2/3 latency-first workloads and
>  1/2 throughput-first workload + 1/2 latency-first workloads).
> For caculating the performance of mixed workloads, a weighted sum system
> is introduced.
> Score = normalized_latency * Weight + normalized_throughput * (1 - Weight).
> If we assume that the user has an equal interest in latency and throughput
> performance, the Score for "MIX" policy is on average ~1.52X than baseline.

This scoring system of yours makes no sense.  Just give us the numbers
on what the average latency did versus your "baseline" and the same
for the throughput.

> Kan Liang (30):
>   net: introduce NET policy
>   net/netpolicy: init NET policy
>   i40e/netpolicy: Implement ndo_netpolicy_init
>   net/netpolicy: get driver information
>   i40e/netpolicy: implement ndo_get_irq_info
>   net/netpolicy: get CPU information
>   net/netpolicy: create CPU and queue mapping
>   net/netpolicy: set and remove irq affinity
>   net/netpolicy: enable and disable net policy
>   net/netpolicy: introduce netpolicy object
>   net/netpolicy: set net policy by policy name
>   i40e/netpolicy: implement ndo_set_net_policy
>   i40e/netpolicy: add three new net policies
>   net/netpolicy: add MIX policy
>   i40e/netpolicy: add MIX policy support
>   net/netpolicy: net device hotplug
>   net/netpolicy: support CPU hotplug
>   net/netpolicy: handle channel changes
>   net/netpolicy: implement netpolicy register
>   net/netpolicy: introduce per socket netpolicy
>   net/policy: introduce netpolicy_pick_queue
>   net/netpolicy: set tx queues according to policy
>   i40e/ethtool: support RX_CLS_LOC_ANY
>   net/netpolicy: set rx queues according to policy
>   net/netpolicy: introduce per task net policy
>   net/netpolicy: set per task policy by proc
>   net/netpolicy: fast path for finding the queues
>   net/netpolicy: optimize for queue pair
>   net/netpolicy: limit the total record number
>   Documentation/networking: Document net policy

30 patches is quite a bit to review.  You might have better luck
getting review and/or feedback if you could split this up into at
least 2 patch sets of 15 or so patches when you try to actually submit
this.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* RE: [RFC PATCH 00/30] Kernel NET policy
  2016-07-18 16:17       ` Florian Westphal
  (?)
@ 2016-07-18 17:40         ` Liang, Kan
  -1 siblings, 0 replies; 123+ messages in thread
From: Liang, Kan @ 2016-07-18 17:40 UTC (permalink / raw)
  To: Florian Westphal
  Cc: davem, linux-kernel, intel-wired-lan, netdev, Kirsher, Jeffrey T,
	mingo, peterz, kuznet, jmorris, yoshfuji, kaber, akpm, keescook,
	viro, gorcunov, john.stultz, aduyck, ben, decot, Brandeburg,
	Jesse, andi



> 
> > > It seems strange to me to add such policies to the kernel.
> >
> > But kernel is the only place which can merge all user's requests.
> 
> I don't think so.
> 
> If different requests conflict in a way that is possible to dosomething
> meaningful the I don't see why userspace tool cannot do the same thing...
> 

Yes, I should correct my expression.
I think kernel should be a better place to do those things.
Kernel should be more efficient to coordinate those requests to get good
performance.


Thanks,
Kan

^ permalink raw reply	[flat|nested] 123+ messages in thread

* RE: [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-18 17:40         ` Liang, Kan
  0 siblings, 0 replies; 123+ messages in thread
From: Liang, Kan @ 2016-07-18 17:40 UTC (permalink / raw)
  To: Florian Westphal
  Cc: davem, linux-kernel, intel-wired-lan, netdev, Kirsher, Jeffrey T,
	mingo, peterz, kuznet, jmorris, yoshfuji, kaber, akpm, keescook,
	viro, gorcunov, john.stultz, aduyck, ben@decadent.org.uk



> 
> > > It seems strange to me to add such policies to the kernel.
> >
> > But kernel is the only place which can merge all user's requests.
> 
> I don't think so.
> 
> If different requests conflict in a way that is possible to dosomething
> meaningful the I don't see why userspace tool cannot do the same thing...
> 

Yes, I should correct my expression.
I think kernel should be a better place to do those things.
Kernel should be more efficient to coordinate those requests to get good
performance.


Thanks,
Kan

^ permalink raw reply	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-18 17:40         ` Liang, Kan
  0 siblings, 0 replies; 123+ messages in thread
From: Liang, Kan @ 2016-07-18 17:40 UTC (permalink / raw)
  To: intel-wired-lan



> 
> > > It seems strange to me to add such policies to the kernel.
> >
> > But kernel is the only place which can merge all user's requests.
> 
> I don't think so.
> 
> If different requests conflict in a way that is possible to dosomething
> meaningful the I don't see why userspace tool cannot do the same thing...
> 

Yes, I should correct my expression.
I think kernel should be a better place to do those things.
Kernel should be more efficient to coordinate those requests to get good
performance.


Thanks,
Kan

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [RFC PATCH 00/30] Kernel NET policy
  2016-07-18 15:45     ` [Intel-wired-lan] " Andi Kleen
@ 2016-07-18 17:52       ` Cong Wang
  -1 siblings, 0 replies; 123+ messages in thread
From: Cong Wang @ 2016-07-18 17:52 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Florian Westphal, kan.liang, David Miller, LKML, intel-wired-lan,
	Linux Kernel Network Developers, Jeff Kirsher, Ingo Molnar,
	Peter Zijlstra, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Andrew Morton, Kees Cook,
	Al Viro, Cyrill Gorcunov, John Stultz, Alex Duyck, ben, decot,
	Jesse Brandeburg

On Mon, Jul 18, 2016 at 8:45 AM, Andi Kleen <andi@firstfloor.org> wrote:
>> It seems strange to me to add such policies to the kernel.
>> Addmittingly, documentation of some settings is non-existent and one needs
>> various different tools to set this (sysctl, procfs, sysfs, ethtool, etc).
>
> The problem is that different applications need different policies.
>
> The only entity which can efficiently negotiate between different
> applications' conflicting requests is the kernel. And that is pretty
> much the basic job description of a kernel: multiplex hardware
> efficiently between different users.
>
> So yes the user space tuning approach works for simple cases
> ("only run workloads that require the same tuning"), but is ultimately not
> very interesting nor scalable.

I don't read the code yet, just the cover letter.

We have global tunings, per-network-namespace tunings, per-socket
tunings. It is still unclear why you can't just put different applications
into different namespaces/containers to get different policies.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-18 17:52       ` Cong Wang
  0 siblings, 0 replies; 123+ messages in thread
From: Cong Wang @ 2016-07-18 17:52 UTC (permalink / raw)
  To: intel-wired-lan

On Mon, Jul 18, 2016 at 8:45 AM, Andi Kleen <andi@firstfloor.org> wrote:
>> It seems strange to me to add such policies to the kernel.
>> Addmittingly, documentation of some settings is non-existent and one needs
>> various different tools to set this (sysctl, procfs, sysfs, ethtool, etc).
>
> The problem is that different applications need different policies.
>
> The only entity which can efficiently negotiate between different
> applications' conflicting requests is the kernel. And that is pretty
> much the basic job description of a kernel: multiplex hardware
> efficiently between different users.
>
> So yes the user space tuning approach works for simple cases
> ("only run workloads that require the same tuning"), but is ultimately not
> very interesting nor scalable.

I don't read the code yet, just the cover letter.

We have global tunings, per-network-namespace tunings, per-socket
tunings. It is still unclear why you can't just put different applications
into different namespaces/containers to get different policies.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* RE: [RFC PATCH 00/30] Kernel NET policy
  2016-07-18 16:34       ` Tom Herbert
  (?)
@ 2016-07-18 17:58         ` Liang, Kan
  -1 siblings, 0 replies; 123+ messages in thread
From: Liang, Kan @ 2016-07-18 17:58 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Florian Westphal, davem, linux-kernel, intel-wired-lan, netdev,
	Kirsher, Jeffrey T, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, Brandeburg, Jesse, andi



> On Mon, Jul 18, 2016 at 5:51 PM, Liang, Kan <kan.liang@intel.com> wrote:
> >
> >
> >> >
> >> > It is a big challenge to get good network performance. First, the
> >> > network performance is not good with default system settings.
> >> > Second, it is too difficult to do automatic tuning for all possible
> >> > workloads, since workloads have different requirements. Some
> >> > workloads may want
> >> high throughput.
> >>
> >> Seems you did lots of tests to find optimal settings for a given base policy.
> >>
> > Yes. Current test only base on Intel i40e driver. The optimal settings
> > should vary for other devices. But adding settings for new device is not
> hard.
> >
> The optimal settings are very dependent on system architecture (NUMA
> config, #cpus, memory, etc.) and sometimes kernel version as well. A
> database that provides best configurations across different devices,
> architectures, and kernel version might be interesting; but beware that that is
> a whole bunch of work to maintain, Either way policy like this really should
> be handled in userspace.

The expression "optimal" I used here is not accurate. Sorry for that.
The NET policy tries to get good (near optimal) performance by very
simple configuration.
I agree that there are lots of dependencies for the optimal settings.
But most of the settings should be very similar. The near optimal performance
by applying those common settings are good enough for most users. 
We don't need to maintain a database for configurations across
devices/architectures/kernel version...

Thanks,
Kan

^ permalink raw reply	[flat|nested] 123+ messages in thread

* RE: [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-18 17:58         ` Liang, Kan
  0 siblings, 0 replies; 123+ messages in thread
From: Liang, Kan @ 2016-07-18 17:58 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Florian Westphal, davem, linux-kernel, intel-wired-lan, netdev,
	Kirsher, Jeffrey T, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck



> On Mon, Jul 18, 2016 at 5:51 PM, Liang, Kan <kan.liang@intel.com> wrote:
> >
> >
> >> >
> >> > It is a big challenge to get good network performance. First, the
> >> > network performance is not good with default system settings.
> >> > Second, it is too difficult to do automatic tuning for all possible
> >> > workloads, since workloads have different requirements. Some
> >> > workloads may want
> >> high throughput.
> >>
> >> Seems you did lots of tests to find optimal settings for a given base policy.
> >>
> > Yes. Current test only base on Intel i40e driver. The optimal settings
> > should vary for other devices. But adding settings for new device is not
> hard.
> >
> The optimal settings are very dependent on system architecture (NUMA
> config, #cpus, memory, etc.) and sometimes kernel version as well. A
> database that provides best configurations across different devices,
> architectures, and kernel version might be interesting; but beware that that is
> a whole bunch of work to maintain, Either way policy like this really should
> be handled in userspace.

The expression "optimal" I used here is not accurate. Sorry for that.
The NET policy tries to get good (near optimal) performance by very
simple configuration.
I agree that there are lots of dependencies for the optimal settings.
But most of the settings should be very similar. The near optimal performance
by applying those common settings are good enough for most users. 
We don't need to maintain a database for configurations across
devices/architectures/kernel version...

Thanks,
Kan

^ permalink raw reply	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-18 17:58         ` Liang, Kan
  0 siblings, 0 replies; 123+ messages in thread
From: Liang, Kan @ 2016-07-18 17:58 UTC (permalink / raw)
  To: intel-wired-lan



> On Mon, Jul 18, 2016 at 5:51 PM, Liang, Kan <kan.liang@intel.com> wrote:
> >
> >
> >> >
> >> > It is a big challenge to get good network performance. First, the
> >> > network performance is not good with default system settings.
> >> > Second, it is too difficult to do automatic tuning for all possible
> >> > workloads, since workloads have different requirements. Some
> >> > workloads may want
> >> high throughput.
> >>
> >> Seems you did lots of tests to find optimal settings for a given base policy.
> >>
> > Yes. Current test only base on Intel i40e driver. The optimal settings
> > should vary for other devices. But adding settings for new device is not
> hard.
> >
> The optimal settings are very dependent on system architecture (NUMA
> config, #cpus, memory, etc.) and sometimes kernel version as well. A
> database that provides best configurations across different devices,
> architectures, and kernel version might be interesting; but beware that that is
> a whole bunch of work to maintain, Either way policy like this really should
> be handled in userspace.

The expression "optimal" I used here is not accurate. Sorry for that.
The NET policy tries to get good (near optimal) performance by very
simple configuration.
I agree that there are lots of dependencies for the optimal settings.
But most of the settings should be very similar. The near optimal performance
by applying those common settings are good enough for most users. 
We don't need to maintain a database for configurations across
devices/architectures/kernel version...

Thanks,
Kan

^ permalink raw reply	[flat|nested] 123+ messages in thread

* RE: [RFC PATCH 00/30] Kernel NET policy
  2016-07-18 16:22   ` [Intel-wired-lan] " Daniel Borkmann
@ 2016-07-18 18:30     ` Liang, Kan
  -1 siblings, 0 replies; 123+ messages in thread
From: Liang, Kan @ 2016-07-18 18:30 UTC (permalink / raw)
  To: Daniel Borkmann, davem, linux-kernel, intel-wired-lan, netdev
  Cc: Kirsher, Jeffrey T, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, Brandeburg, Jesse, andi, tj



> 
> Hi Kan,
> 
> On 07/18/2016 08:55 AM, kan.liang@intel.com wrote:
> > From: Kan Liang <kan.liang@intel.com>
> >
> > It is a big challenge to get good network performance. First, the
> > network performance is not good with default system settings. Second,
> > it is too difficult to do automatic tuning for all possible workloads,
> > since workloads have different requirements. Some workloads may want
> > high throughput. Some may need low latency. Last but not least, there are
> lots of manual configurations.
> > Fine grained configuration is too difficult for users.
> >
> > NET policy intends to simplify the network configuration and get a
> > good network performance according to the hints(policy) which is
> > applied by user. It provides some typical "policies" for user which
> > can be set per-socket, per-task or per-device. The kernel will
> > automatically figures out how to merge different requests to get good
> network performance.
> > Net policy is designed for multiqueue network devices. This
> > implementation is only for Intel NICs using i40e driver. But the
> > concepts and generic code should apply to other multiqueue NICs too.
> > Net policy is also a combination of generic policy manager code and
> > some ethtool callbacks (per queue coalesce setting, flow
> > classification rules) to configure the driver.
> > This series also supports CPU hotplug and device hotplug.
> >
> > Here are some key Interfaces/APIs for NET policy.
> >
> >     /proc/net/netpolicy/$DEV/policy
> >     User can set/get per device policy from /proc
> >
> >     /proc/$PID/net_policy
> >     User can set/get per task policy from /proc
> >     prctl(PR_SET_NETPOLICY, POLICY_NAME, NULL, NULL, NULL)
> >     An alternative way to set/get per task policy is from prctl.
> >
> >     setsockopt(sockfd,SOL_SOCKET,SO_NETPOLICY,&policy,sizeof(int))
> >     User can set/get per socket policy by setsockopt
> >
> >
> >     int (*ndo_netpolicy_init)(struct net_device *dev,
> >                               struct netpolicy_info *info);
> >     Initialize device driver for NET policy
> >
> >     int (*ndo_get_irq_info)(struct net_device *dev,
> >                             struct netpolicy_dev_info *info);
> >     Collect device irq information
> >
> >     int (*ndo_set_net_policy)(struct net_device *dev,
> >                               enum netpolicy_name name);
> >     Configure device according to policy name
> >
> >     netpolicy_register(struct netpolicy_reg *reg);
> >     netpolicy_unregister(struct netpolicy_reg *reg);
> >     NET policy API to register/unregister per task/socket net policy.
> >     For each task/socket, an record will be created and inserted into an RCU
> >     hash table.
> >
> >     netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx);
> >     NET policy API to find the proper queue for packet receiving and
> >     transmitting.
> >
> >     netpolicy_set_rules(struct netpolicy_reg *reg, u32 queue_index,
> >                          struct netpolicy_flow_spec *flow);
> >     NET policy API to add flow director rules.
> >
> > For using NET policy, the per-device policy must be set in advance. It
> > will automatically configure the system and re-organize the resource
> > of the system accordingly. For system configuration, in this series,
> > it will disable irq balance, set device queue irq affinity, and modify
> > interrupt moderation. For re-organizing the resource, current
> > implementation forces that CPU and queue irq are 1:1 mapping. An 1:1
> mapping group is also called net policy object.
> > For each device policy, it maintains a policy list. Once the device
> > policy is applied, the objects will be insert and tracked in that
> > device policy list. The policy list only be updated when cpu/device
> > hotplug, queue number changes or device policy changes.
> > The user can use /proc, prctl and setsockopt to set per-task and
> > per-socket net policy. Once the policy is set, an related record will
> > be inserted into RCU hash table. The record includes ptr, policy and
> > net policy object. The ptr is the pointer address of task/socket. The
> > object will not be assigned until the first package receive/transmit.
> > The object is picked by round-robin from object list. Once the object
> > is determined, the following packets will be set to redirect to the
> queue(object).
> > The object can be shared. The per-task or per-socket policy can be
> inherited.
> >
> > Now NET policy supports four per device policies and three per
> > task/socket policies.
> >      - BULK policy: This policy is designed for high throughput. It can be
> >        applied to either per device policy or per task/socket policy.
> >      - CPU policy: This policy is designed for high throughput but lower CPU
> >        utilization. It can be applied to either per device policy or
> >        per task/socket policy.
> >      - LATENCY policy: This policy is designed for low latency. It can be
> >        applied to either per device policy or per task/socket policy.
> >      - MIX policy: This policy can only be applied to per device policy. This
> >        is designed for the case which miscellaneous types of workload running
> >        on the device.
> 
> I'm missing a bit of discussion on the existing facilities there are under
> networking and why they cannot be adapted to support these kind of hints?
>

Currently, I use existing ethtool interfaces to configure the device.
There could be more later.
 
> On a higher level picture, why for example, a new cgroup in combination
> with tc shouldn't be the ones resolving these policies on resource usage?
>

The NET policy doesn't support cgroup yet, but it's on my todo list.
The granularity for the device resource is per queue. The packet will be
redirected to the specific queue.
I'm not sure if cgroup with tc can do that.

 
> If sockets want to provide specific hints that may or may not be granted, then
> this could be via SO_MARK, maybe SO_PRIORITY with above semantics or
> some new marker perhaps that can be accessed from lower layers.
>
 
I think SO_MARK tries to filter the packet for connections.
We need an interface to filter the packet per device queue.
There is no such options as far as I know.

Thanks,
Kan

^ permalink raw reply	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-18 18:30     ` Liang, Kan
  0 siblings, 0 replies; 123+ messages in thread
From: Liang, Kan @ 2016-07-18 18:30 UTC (permalink / raw)
  To: intel-wired-lan



> 
> Hi Kan,
> 
> On 07/18/2016 08:55 AM, kan.liang at intel.com wrote:
> > From: Kan Liang <kan.liang@intel.com>
> >
> > It is a big challenge to get good network performance. First, the
> > network performance is not good with default system settings. Second,
> > it is too difficult to do automatic tuning for all possible workloads,
> > since workloads have different requirements. Some workloads may want
> > high throughput. Some may need low latency. Last but not least, there are
> lots of manual configurations.
> > Fine grained configuration is too difficult for users.
> >
> > NET policy intends to simplify the network configuration and get a
> > good network performance according to the hints(policy) which is
> > applied by user. It provides some typical "policies" for user which
> > can be set per-socket, per-task or per-device. The kernel will
> > automatically figures out how to merge different requests to get good
> network performance.
> > Net policy is designed for multiqueue network devices. This
> > implementation is only for Intel NICs using i40e driver. But the
> > concepts and generic code should apply to other multiqueue NICs too.
> > Net policy is also a combination of generic policy manager code and
> > some ethtool callbacks (per queue coalesce setting, flow
> > classification rules) to configure the driver.
> > This series also supports CPU hotplug and device hotplug.
> >
> > Here are some key Interfaces/APIs for NET policy.
> >
> >     /proc/net/netpolicy/$DEV/policy
> >     User can set/get per device policy from /proc
> >
> >     /proc/$PID/net_policy
> >     User can set/get per task policy from /proc
> >     prctl(PR_SET_NETPOLICY, POLICY_NAME, NULL, NULL, NULL)
> >     An alternative way to set/get per task policy is from prctl.
> >
> >     setsockopt(sockfd,SOL_SOCKET,SO_NETPOLICY,&policy,sizeof(int))
> >     User can set/get per socket policy by setsockopt
> >
> >
> >     int (*ndo_netpolicy_init)(struct net_device *dev,
> >                               struct netpolicy_info *info);
> >     Initialize device driver for NET policy
> >
> >     int (*ndo_get_irq_info)(struct net_device *dev,
> >                             struct netpolicy_dev_info *info);
> >     Collect device irq information
> >
> >     int (*ndo_set_net_policy)(struct net_device *dev,
> >                               enum netpolicy_name name);
> >     Configure device according to policy name
> >
> >     netpolicy_register(struct netpolicy_reg *reg);
> >     netpolicy_unregister(struct netpolicy_reg *reg);
> >     NET policy API to register/unregister per task/socket net policy.
> >     For each task/socket, an record will be created and inserted into an RCU
> >     hash table.
> >
> >     netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx);
> >     NET policy API to find the proper queue for packet receiving and
> >     transmitting.
> >
> >     netpolicy_set_rules(struct netpolicy_reg *reg, u32 queue_index,
> >                          struct netpolicy_flow_spec *flow);
> >     NET policy API to add flow director rules.
> >
> > For using NET policy, the per-device policy must be set in advance. It
> > will automatically configure the system and re-organize the resource
> > of the system accordingly. For system configuration, in this series,
> > it will disable irq balance, set device queue irq affinity, and modify
> > interrupt moderation. For re-organizing the resource, current
> > implementation forces that CPU and queue irq are 1:1 mapping. An 1:1
> mapping group is also called net policy object.
> > For each device policy, it maintains a policy list. Once the device
> > policy is applied, the objects will be insert and tracked in that
> > device policy list. The policy list only be updated when cpu/device
> > hotplug, queue number changes or device policy changes.
> > The user can use /proc, prctl and setsockopt to set per-task and
> > per-socket net policy. Once the policy is set, an related record will
> > be inserted into RCU hash table. The record includes ptr, policy and
> > net policy object. The ptr is the pointer address of task/socket. The
> > object will not be assigned until the first package receive/transmit.
> > The object is picked by round-robin from object list. Once the object
> > is determined, the following packets will be set to redirect to the
> queue(object).
> > The object can be shared. The per-task or per-socket policy can be
> inherited.
> >
> > Now NET policy supports four per device policies and three per
> > task/socket policies.
> >      - BULK policy: This policy is designed for high throughput. It can be
> >        applied to either per device policy or per task/socket policy.
> >      - CPU policy: This policy is designed for high throughput but lower CPU
> >        utilization. It can be applied to either per device policy or
> >        per task/socket policy.
> >      - LATENCY policy: This policy is designed for low latency. It can be
> >        applied to either per device policy or per task/socket policy.
> >      - MIX policy: This policy can only be applied to per device policy. This
> >        is designed for the case which miscellaneous types of workload running
> >        on the device.
> 
> I'm missing a bit of discussion on the existing facilities there are under
> networking and why they cannot be adapted to support these kind of hints?
>

Currently, I use existing ethtool interfaces to configure the device.
There could be more later.
 
> On a higher level picture, why for example, a new cgroup in combination
> with tc shouldn't be the ones resolving these policies on resource usage?
>

The NET policy doesn't support cgroup yet, but it's on my todo list.
The granularity for the device resource is per queue. The packet will be
redirected to the specific queue.
I'm not sure if cgroup with tc can do that.

 
> If sockets want to provide specific hints that may or may not be granted, then
> this could be via SO_MARK, maybe SO_PRIORITY with above semantics or
> some new marker perhaps that can be accessed from lower layers.
>
 
I think SO_MARK tries to filter the packet for connections.
We need an interface to filter the packet per device queue.
There is no such options as far as I know.

Thanks,
Kan

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [RFC PATCH 00/30] Kernel NET policy
  2016-07-18 15:45     ` [Intel-wired-lan] " Andi Kleen
@ 2016-07-18 19:04       ` Hannes Frederic Sowa
  -1 siblings, 0 replies; 123+ messages in thread
From: Hannes Frederic Sowa @ 2016-07-18 19:04 UTC (permalink / raw)
  To: Andi Kleen, Florian Westphal
  Cc: kan.liang, davem, linux-kernel, intel-wired-lan, netdev,
	jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg

On 18.07.2016 17:45, Andi Kleen wrote:
>> It seems strange to me to add such policies to the kernel.
>> Addmittingly, documentation of some settings is non-existent and one needs
>> various different tools to set this (sysctl, procfs, sysfs, ethtool, etc).
> 
> The problem is that different applications need different policies.

I fear that if those policies get changed in future, people will rely on
some of their side-effects, causing us to add more and more policies
which basically just differ in those side-effects.

If you compare your policies to madvise or fadvise options, they seem a
have a much more strict and narrower effects, which can be reasoned much
more easily about.

> The only entity which can efficiently negotiate between different
> applications' conflicting requests is the kernel. And that is pretty 
> much the basic job description of a kernel: multiplex hardware
> efficiently between different users.

The multiplexing part seems to be not really relevant for the per-device
settings, thus being controllable from current user space just fine.
Per-task setting could be conflicting with per-socket settings which
could lead to non-deterministic behavior. Probably semantically it
should be made clear what overrides what here (here == cover letter).
Things like indeterminate allocation of sockets in a threaded
environment come to my mind. Also allocation strategy could very much
depend on the installed rss key.

> So yes the user space tuning approach works for simple cases
> ("only run workloads that require the same tuning"), but is ultimately not
> very interesting nor scalable.

I wonder if this can be attacked from a different angle. What would be
missing to add support for this in user space? The first possibility
that came to my mind is to just multiplex those hints in the kernel.
Implement a generic way to add metadata to sockets and allow tuning
daemons to retrieve them via sockdiag? I could imagine that if the
SO_INCOMING_CPU information would be visible in sockdiag, one could
already do more automatic tuning and basically allow to implement your
policy in user space.

Bye,
Hannes

^ permalink raw reply	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-18 19:04       ` Hannes Frederic Sowa
  0 siblings, 0 replies; 123+ messages in thread
From: Hannes Frederic Sowa @ 2016-07-18 19:04 UTC (permalink / raw)
  To: intel-wired-lan

On 18.07.2016 17:45, Andi Kleen wrote:
>> It seems strange to me to add such policies to the kernel.
>> Addmittingly, documentation of some settings is non-existent and one needs
>> various different tools to set this (sysctl, procfs, sysfs, ethtool, etc).
> 
> The problem is that different applications need different policies.

I fear that if those policies get changed in future, people will rely on
some of their side-effects, causing us to add more and more policies
which basically just differ in those side-effects.

If you compare your policies to madvise or fadvise options, they seem a
have a much more strict and narrower effects, which can be reasoned much
more easily about.

> The only entity which can efficiently negotiate between different
> applications' conflicting requests is the kernel. And that is pretty 
> much the basic job description of a kernel: multiplex hardware
> efficiently between different users.

The multiplexing part seems to be not really relevant for the per-device
settings, thus being controllable from current user space just fine.
Per-task setting could be conflicting with per-socket settings which
could lead to non-deterministic behavior. Probably semantically it
should be made clear what overrides what here (here == cover letter).
Things like indeterminate allocation of sockets in a threaded
environment come to my mind. Also allocation strategy could very much
depend on the installed rss key.

> So yes the user space tuning approach works for simple cases
> ("only run workloads that require the same tuning"), but is ultimately not
> very interesting nor scalable.

I wonder if this can be attacked from a different angle. What would be
missing to add support for this in user space? The first possibility
that came to my mind is to just multiplex those hints in the kernel.
Implement a generic way to add metadata to sockets and allow tuning
daemons to retrieve them via sockdiag? I could imagine that if the
SO_INCOMING_CPU information would be visible in sockdiag, one could
already do more automatic tuning and basically allow to implement your
policy in user space.

Bye,
Hannes


^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [RFC PATCH 00/30] Kernel NET policy
  2016-07-18 19:04       ` [Intel-wired-lan] " Hannes Frederic Sowa
@ 2016-07-18 19:43         ` Andi Kleen
  -1 siblings, 0 replies; 123+ messages in thread
From: Andi Kleen @ 2016-07-18 19:43 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Andi Kleen, Florian Westphal, kan.liang, davem, linux-kernel,
	intel-wired-lan, netdev, jeffrey.t.kirsher, mingo, peterz,
	kuznet, jmorris, yoshfuji, kaber, akpm, keescook, viro, gorcunov,
	john.stultz, aduyck, ben, decot, jesse.brandeburg

> I wonder if this can be attacked from a different angle. What would be
> missing to add support for this in user space? The first possibility
> that came to my mind is to just multiplex those hints in the kernel.

"just" is the handwaving part here -- you're proposing a micro kernel
approach where part of the multiplexing job that the kernel is doing
is farmed out to a message passing user space component.

I suspect this would be far more complicated to get right and
perform well than a straight forward monolithic kernel subsystem --
which is traditionally how Linux has approached things.

The daemon would always need to work with out of date state
compared to the latest, because it cannot do any locking with the
kernel state.  So you end up with a complex distributed system with multiple
agents "fighting" with each other, and the tuning agent
never being able to keep up with the actual work.

Also of course it would be fundamentally less efficient than
kernel code doing that, just because of the additional context
switches needed.

-Andi

^ permalink raw reply	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-18 19:43         ` Andi Kleen
  0 siblings, 0 replies; 123+ messages in thread
From: Andi Kleen @ 2016-07-18 19:43 UTC (permalink / raw)
  To: intel-wired-lan

> I wonder if this can be attacked from a different angle. What would be
> missing to add support for this in user space? The first possibility
> that came to my mind is to just multiplex those hints in the kernel.

"just" is the handwaving part here -- you're proposing a micro kernel
approach where part of the multiplexing job that the kernel is doing
is farmed out to a message passing user space component.

I suspect this would be far more complicated to get right and
perform well than a straight forward monolithic kernel subsystem --
which is traditionally how Linux has approached things.

The daemon would always need to work with out of date state
compared to the latest, because it cannot do any locking with the
kernel state.  So you end up with a complex distributed system with multiple
agents "fighting" with each other, and the tuning agent
never being able to keep up with the actual work.

Also of course it would be fundamentally less efficient than
kernel code doing that, just because of the additional context
switches needed.

-Andi

^ permalink raw reply	[flat|nested] 123+ messages in thread

* RE: [RFC PATCH 00/30] Kernel NET policy
  2016-07-18 17:00   ` [Intel-wired-lan] " Alexander Duyck
  (?)
@ 2016-07-18 19:45     ` Liang, Kan
  -1 siblings, 0 replies; 123+ messages in thread
From: Liang, Kan @ 2016-07-18 19:45 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: David Miller, linux-kernel, intel-wired-lan, Netdev, Kirsher,
	Jeffrey T, Ingo Molnar, Peter Zijlstra, Alexey Kuznetsov,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, Andrew Morton,
	keescook, viro, gorcunov, john.stultz, Alex Duyck, Ben Hutchings,
	decot, Brandeburg, Jesse, Andi Kleen



> On Sun, Jul 17, 2016 at 11:55 PM,  <kan.liang@intel.com> wrote:
> > From: Kan Liang <kan.liang@intel.com>
> >
> > It is a big challenge to get good network performance. First, the
> > network performance is not good with default system settings. Second,
> > it is too difficult to do automatic tuning for all possible workloads,
> > since workloads have different requirements. Some workloads may want
> > high throughput. Some may need low latency. Last but not least, there are
> lots of manual configurations.
> > Fine grained configuration is too difficult for users.
> 
> The problem as I see it is that this is just going to end up likely being an even
> more intrusive version of irqbalance.  I really don't like the way that turned
> out as it did a number of really dumb things that usually result in it being
> disabled as soon as you actually want to do anything that will actually involve
> any kind of performance tuning.  If this stuff is pushed into the kernel it will
> be even harder to get rid of and that is definitely a bad thing.
> 
> > NET policy intends to simplify the network configuration and get a
> > good network performance according to the hints(policy) which is
> > applied by user. It provides some typical "policies" for user which
> > can be set per-socket, per-task or per-device. The kernel will
> > automatically figures out how to merge different requests to get good
> network performance.
> 
> So where is your policy for power saving?  From past experience I can tell you

There is no policy for power saving yet. I will add it to my todo list.

> that while performance tuning is a good thing, doing so at the expense of
> power management is bad.  In addition you seem to be making a lot of
> assumptions here that the end users are going to rewrite their applications to
> use the new socket options you added in order to try and tune the

Currently, they can set per task policy by proc to get good performance without
code changes.

> performance.  I have a hard time believing most developers are going to go
> to all that trouble.  In addition I suspect that even if they do go to that
> trouble they will probably still screw it up and you will end up with
> applications advertising latency as a goal when they should have specified
> CPU and so on.
> 
> > Net policy is designed for multiqueue network devices. This
> > implementation is only for Intel NICs using i40e driver. But the
> > concepts and generic code should apply to other multiqueue NICs too.
> 
> I would argue that your code is not very generic.  The fact that it is relying on
> flow director already greatly limits what you can do.  If you want to make this
> truly generic I would say you need to find ways to make this work on
> everything all the way down to things like i40evf and igb which don't have
> support for Flow Director.

Actually the NET policy codes employ ethtool's interface set_rxnfc to set rules.
It should be generic.
I guess I emphasize Flow Director too much in the document which make
you confuse.

> 
> > Net policy is also a combination of generic policy manager code and
> > some ethtool callbacks (per queue coalesce setting, flow
> > classification rules) to configure the driver.
> > This series also supports CPU hotplug and device hotplug.
> >
> > Here are some key Interfaces/APIs for NET policy.
> >
> >    /proc/net/netpolicy/$DEV/policy
> >    User can set/get per device policy from /proc
> >
> >    /proc/$PID/net_policy
> >    User can set/get per task policy from /proc
> >    prctl(PR_SET_NETPOLICY, POLICY_NAME, NULL, NULL, NULL)
> >    An alternative way to set/get per task policy is from prctl.
> >
> >    setsockopt(sockfd,SOL_SOCKET,SO_NETPOLICY,&policy,sizeof(int))
> >    User can set/get per socket policy by setsockopt
> >
> >
> >    int (*ndo_netpolicy_init)(struct net_device *dev,
> >                              struct netpolicy_info *info);
> >    Initialize device driver for NET policy
> >
> >    int (*ndo_get_irq_info)(struct net_device *dev,
> >                            struct netpolicy_dev_info *info);
> >    Collect device irq information
> 
> Instead of making the irq info a part of the ndo ops it might make more
> sense to make it part of an ethtool op.  Maybe you could make it so that you
> could specify a single queue at a time and get things like statistics, IRQ, and
> ring information.

I will think about it. Thanks.

> 
> >    int (*ndo_set_net_policy)(struct net_device *dev,
> >                              enum netpolicy_name name);
> >    Configure device according to policy name
> 
> I really don't like this piece of it.  I really think we shouldn't be leaving so
> much up to the driver to determine how to handle things.

There are some settings are device specific. For example, the interrupt
moderation for i40e for BULK policy is (50, 125). For other device, the number
could be different. For other device, only tunning interrupt moderation may
not be enough. So we need an interface for driver specific setting.

> In addition just passing one of 4 different types doesn't do much for actual
> configuration because the actual configuration of the device is much more
> complex then that.  Essentially all this does is provide a benchmark tuning
> interface.

The actual configuration is too complex for the user.
The different types at least provide them a good performance and
a good baseline to start.

> 
> >    netpolicy_register(struct netpolicy_reg *reg);
> >    netpolicy_unregister(struct netpolicy_reg *reg);
> >    NET policy API to register/unregister per task/socket net policy.
> >    For each task/socket, an record will be created and inserted into an RCU
> >    hash table.
> 
> This piece will take a significant amount of time before it could ever catch on.
> Once again this just looks like a benchmark tuning interface.  It isn't of much
> value.
> 
> >    netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx);
> >    NET policy API to find the proper queue for packet receiving and
> >    transmitting.
> >
> >    netpolicy_set_rules(struct netpolicy_reg *reg, u32 queue_index,
> >                         struct netpolicy_flow_spec *flow);
> >    NET policy API to add flow director rules.
> 
> So Flow Director is a very Intel-centric approach.  I would suggest taking a
> now at NTUPLE and RXNFC rules as that is what is actually implemented in
> the kernel.  In addition I would recommend exploring RPS and
> ndo_rx_flow_steer as those are existing interfaces for configuring a specific
> flow to be delivered to a specific CPU.

Current codes use set_rxnfc.
Sorry for the confusion for Flow Director.

> 
> > For using NET policy, the per-device policy must be set in advance. It
> > will automatically configure the system and re-organize the resource
> > of the system accordingly. For system configuration, in this series,
> > it will disable irq balance, set device queue irq affinity, and modify
> > interrupt moderation. For re-organizing the resource, current
> > implementation forces that CPU and queue irq are 1:1 mapping. An 1:1
> mapping group is also called net policy object.
> > For each device policy, it maintains a policy list. Once the device
> > policy is applied, the objects will be insert and tracked in that
> > device policy list. The policy list only be updated when cpu/device
> > hotplug, queue number changes or device policy changes.
> 
> So as a beginning step it might make more sense to try and fix irqbalance
> instead of disabling it.  That is a huge red flag for me.
> You are just implementing something that is more intrusive than irqbalance
> and my concern here is we can't just disable it and reconfigure things like we
> can with the current irqbalance.  If irqbalance never got it right then why
> should we trust this?
> 
> Also how will you code handle a non 1:1 mapping.  For example I know one
> thing I have been looking at trying out was implementing a setup that would
> allocate 1 Tx queue per logical CPU, and 1 Rx queue per physical CPU.  The
> reason for that being that from past experience on ixgbe I have found that
> more Rx queues does not equal more performance when you start stacking
> active queues on SMT pairs.  If you don't have enough queues for the
> number of CPUs in a case such as this how would your code handle it?

The basic schedule unit for NET policy is object. Currently, one object includes
1 CPU and 1 queue. CPU and queue are 1:1 mapping.
For your case, we can define object as 2 TX queue, 1 Rx queue and 2 logical CPU
in the same physical core.
The object generation code is not hard to change.

> 
> > The user can use /proc, prctl and setsockopt to set per-task and
> > per-socket net policy. Once the policy is set, an related record will
> > be inserted into RCU hash table. The record includes ptr, policy and
> > net policy object. The ptr is the pointer address of task/socket. The
> > object will not be assigned until the first package receive/transmit.
> > The object is picked by round-robin from object list. Once the object
> > is determined, the following packets will be set to redirect to the
> queue(object).
> > The object can be shared. The per-task or per-socket policy can be
> inherited.
> >
> > Now NET policy supports four per device policies and three per
> > task/socket policies.
> >     - BULK policy: This policy is designed for high throughput. It can be
> >       applied to either per device policy or per task/socket policy.
> >     - CPU policy: This policy is designed for high throughput but lower CPU
> >       utilization. It can be applied to either per device policy or
> >       per task/socket policy.
> >     - LATENCY policy: This policy is designed for low latency. It can be
> >       applied to either per device policy or per task/socket policy.
> >     - MIX policy: This policy can only be applied to per device policy. This
> >       is designed for the case which miscellaneous types of workload running
> >       on the device.
> 
> This is a rather sparse list of policies.  I know most organizations with large
> data centers care about power savings AND latency.  What you have here is a
> rather simplistic set of targets.  I think actual configuration is much more
> complex then that.

There will be more policies are added in the future.

> 
> > Lots of tests are done for net policy on platforms with Intel Xeon E5
> > V2 and XL710 40G NIC. The baseline test is with Linux 4.6.0 kernel.
> 
> So I assume you are saying you applied your patches on top of a 4.6.0 kernel
> then for testing correct?  I'm just wanting to verify we aren't looking 4.6.0
> versus the current net-next or Linus's 4.7-RCX tree.

Yes, the patches on top of 4.6.0 kernel.

> 
> > Netperf is used to evaluate the throughput and latency performance.
> >   - "netperf -f m -t TCP_RR -H server_IP -c -C -l 60 -- -r buffersize
> >     -b burst -D" is used to evaluate throughput performance, which is
> >     called throughput-first workload.
> 
> While this is okay for testing performance you might be better off using a
> TCP_STREAM, TCP_MAERTS, and perhaps UDP_STREAM test.  There aren't
> too many real-world applications that will give you the kind of traffic pattern
> you see with TCP_RR being used for a bulk throughput test.
> 
> >   - "netperf -t TCP_RR -H server_IP -c -C -l 60 -- -r buffersize" is
> >     used to evaluate latency performance, which is called latency-first
> >     workload.
> >   - Different loads are also evaluated by running 1, 12, 24, 48 or 96
> >     throughput-first workloads/latency-first workload simultaneously.
> >
> > For "BULK" policy, the throughput performance is on average ~1.26X
> > than baseline.
> > For "CPU" policy, the throughput performance is on average ~1.20X than
> > baseline, and has lower CPU% (on average ~5% lower than "BULK" policy).
> > For "LATENCY" policy, the latency is on average 53.5% less than the
> baseline.
> 
> I have misgivings about just throwing out random numbers with no actual
> data to back it up.  What kind of throughput and CPU utilization were you
> actually seeing?  The idea is that we should be able to take your patches and
> apply them on our own system to see similar values and I'm suspecting that
> in many cases you might be focusing on the wrong things.  For example I
> could get good "LATENCY"
> numbers by just disabling interrupt throttling.  That would look really good
> for latency, but my CPU utilization would be through the roof.  It might be
> useful if you could provide throughput, CPU utilization, and latency numbers
> for your baseline versus each of these settings.

I'm not sure if I can share the absolute number. But here is a white paper which may
include the number you want.

The per-queue interrupt moderation solution mentioned in the white paper is what I
use in kernel NET policy.
The baseline is not an out of box Linux. The baseline did simple configuration which
people usually do (disable irqbalance, set queue irq affinity, set CPU affinity) for
better performance.
https://01.org/improve-network-performance-setting-queue-interrupt-moderation-linux

You may find that the performance data for NET policy is not as good as the data in
white paper. That's because the kernel implementation doesn't set CPU affinity for
application. There is also a little bit overhead for current kernel implementation.

Thanks,
Kan  
> 
> > For "MIX" policy, mixed workloads performance is evaluated. per-queue interrupt moderation solution
> > The mixed workloads are combination of throughput-first workload and
> > latency-first workload. Five different types of combinations are
> > evaluated (pure throughput-first workload, pure latency-first
> > workloads,
> >  2/3 throughput-first workload + 1/3 latency-first workloads,
> >  1/3 throughput-first workload + 2/3 latency-first workloads and
> >  1/2 throughput-first workload + 1/2 latency-first workloads).
> > For caculating the performance of mixed workloads, a weighted sum
> > system is introduced.
> > Score = normalized_latency * Weight + normalized_throughput * (1 -
> Weight).
> > If we assume that the user has an equal interest in latency and
> > throughput performance, the Score for "MIX" policy is on average ~1.52X
> than baseline.
> 
> This scoring system of yours makes no sense.  Just give us the numbers on
> what the average latency did versus your "baseline" and the same for the
> throughput.
> 
> > Kan Liang (30):
> >   net: introduce NET policy
> >   net/netpolicy: init NET policy
> >   i40e/netpolicy: Implement ndo_netpolicy_init
> >   net/netpolicy: get driver information
> >   i40e/netpolicy: implement ndo_get_irq_info
> >   net/netpolicy: get CPU information
> >   net/netpolicy: create CPU and queue mapping
> >   net/netpolicy: set and remove irq affinity
> >   net/netpolicy: enable and disable net policy
> >   net/netpolicy: introduce netpolicy object
> >   net/netpolicy: set net policy by policy name
> >   i40e/netpolicy: implement ndo_set_net_policy
> >   i40e/netpolicy: add three new net policies
> >   net/netpolicy: add MIX policy
> >   i40e/netpolicy: add MIX policy support
> >   net/netpolicy: net device hotplug
> >   net/netpolicy: support CPU hotplug
> >   net/netpolicy: handle channel changes
> >   net/netpolicy: implement netpolicy register
> >   net/netpolicy: introduce per socket netpolicy
> >   net/policy: introduce netpolicy_pick_queue
> >   net/netpolicy: set tx queues according to policy
> >   i40e/ethtool: support RX_CLS_LOC_ANY
> >   net/netpolicy: set rx queues according to policy
> >   net/netpolicy: introduce per task net policy
> >   net/netpolicy: set per task policy by proc
> >   net/netpolicy: fast path for finding the queues
> >   net/netpolicy: optimize for queue pair
> >   net/netpolicy: limit the total record number
> >   Documentation/networking: Document net policy
> 
> 30 patches is quite a bit to review.  You might have better luck getting review
> and/or feedback if you could split this up into at least 2 patch sets of 15 or so
> patches when you try to actually submit this.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* RE: [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-18 19:45     ` Liang, Kan
  0 siblings, 0 replies; 123+ messages in thread
From: Liang, Kan @ 2016-07-18 19:45 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: David Miller, linux-kernel, intel-wired-lan, Netdev, Kirsher,
	Jeffrey T, Ingo Molnar, Peter Zijlstra, Alexey Kuznetsov,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, Andrew Morton,
	keescook, viro, gorcunov, john.stultz, Alex Duyck, Ben Hutchings,
	decot, Brandeburg, Jesse, Andi



> On Sun, Jul 17, 2016 at 11:55 PM,  <kan.liang@intel.com> wrote:
> > From: Kan Liang <kan.liang@intel.com>
> >
> > It is a big challenge to get good network performance. First, the
> > network performance is not good with default system settings. Second,
> > it is too difficult to do automatic tuning for all possible workloads,
> > since workloads have different requirements. Some workloads may want
> > high throughput. Some may need low latency. Last but not least, there are
> lots of manual configurations.
> > Fine grained configuration is too difficult for users.
> 
> The problem as I see it is that this is just going to end up likely being an even
> more intrusive version of irqbalance.  I really don't like the way that turned
> out as it did a number of really dumb things that usually result in it being
> disabled as soon as you actually want to do anything that will actually involve
> any kind of performance tuning.  If this stuff is pushed into the kernel it will
> be even harder to get rid of and that is definitely a bad thing.
> 
> > NET policy intends to simplify the network configuration and get a
> > good network performance according to the hints(policy) which is
> > applied by user. It provides some typical "policies" for user which
> > can be set per-socket, per-task or per-device. The kernel will
> > automatically figures out how to merge different requests to get good
> network performance.
> 
> So where is your policy for power saving?  From past experience I can tell you

There is no policy for power saving yet. I will add it to my todo list.

> that while performance tuning is a good thing, doing so at the expense of
> power management is bad.  In addition you seem to be making a lot of
> assumptions here that the end users are going to rewrite their applications to
> use the new socket options you added in order to try and tune the

Currently, they can set per task policy by proc to get good performance without
code changes.

> performance.  I have a hard time believing most developers are going to go
> to all that trouble.  In addition I suspect that even if they do go to that
> trouble they will probably still screw it up and you will end up with
> applications advertising latency as a goal when they should have specified
> CPU and so on.
> 
> > Net policy is designed for multiqueue network devices. This
> > implementation is only for Intel NICs using i40e driver. But the
> > concepts and generic code should apply to other multiqueue NICs too.
> 
> I would argue that your code is not very generic.  The fact that it is relying on
> flow director already greatly limits what you can do.  If you want to make this
> truly generic I would say you need to find ways to make this work on
> everything all the way down to things like i40evf and igb which don't have
> support for Flow Director.

Actually the NET policy codes employ ethtool's interface set_rxnfc to set rules.
It should be generic.
I guess I emphasize Flow Director too much in the document which make
you confuse.

> 
> > Net policy is also a combination of generic policy manager code and
> > some ethtool callbacks (per queue coalesce setting, flow
> > classification rules) to configure the driver.
> > This series also supports CPU hotplug and device hotplug.
> >
> > Here are some key Interfaces/APIs for NET policy.
> >
> >    /proc/net/netpolicy/$DEV/policy
> >    User can set/get per device policy from /proc
> >
> >    /proc/$PID/net_policy
> >    User can set/get per task policy from /proc
> >    prctl(PR_SET_NETPOLICY, POLICY_NAME, NULL, NULL, NULL)
> >    An alternative way to set/get per task policy is from prctl.
> >
> >    setsockopt(sockfd,SOL_SOCKET,SO_NETPOLICY,&policy,sizeof(int))
> >    User can set/get per socket policy by setsockopt
> >
> >
> >    int (*ndo_netpolicy_init)(struct net_device *dev,
> >                              struct netpolicy_info *info);
> >    Initialize device driver for NET policy
> >
> >    int (*ndo_get_irq_info)(struct net_device *dev,
> >                            struct netpolicy_dev_info *info);
> >    Collect device irq information
> 
> Instead of making the irq info a part of the ndo ops it might make more
> sense to make it part of an ethtool op.  Maybe you could make it so that you
> could specify a single queue at a time and get things like statistics, IRQ, and
> ring information.

I will think about it. Thanks.

> 
> >    int (*ndo_set_net_policy)(struct net_device *dev,
> >                              enum netpolicy_name name);
> >    Configure device according to policy name
> 
> I really don't like this piece of it.  I really think we shouldn't be leaving so
> much up to the driver to determine how to handle things.

There are some settings are device specific. For example, the interrupt
moderation for i40e for BULK policy is (50, 125). For other device, the number
could be different. For other device, only tunning interrupt moderation may
not be enough. So we need an interface for driver specific setting.

> In addition just passing one of 4 different types doesn't do much for actual
> configuration because the actual configuration of the device is much more
> complex then that.  Essentially all this does is provide a benchmark tuning
> interface.

The actual configuration is too complex for the user.
The different types at least provide them a good performance and
a good baseline to start.

> 
> >    netpolicy_register(struct netpolicy_reg *reg);
> >    netpolicy_unregister(struct netpolicy_reg *reg);
> >    NET policy API to register/unregister per task/socket net policy.
> >    For each task/socket, an record will be created and inserted into an RCU
> >    hash table.
> 
> This piece will take a significant amount of time before it could ever catch on.
> Once again this just looks like a benchmark tuning interface.  It isn't of much
> value.
> 
> >    netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx);
> >    NET policy API to find the proper queue for packet receiving and
> >    transmitting.
> >
> >    netpolicy_set_rules(struct netpolicy_reg *reg, u32 queue_index,
> >                         struct netpolicy_flow_spec *flow);
> >    NET policy API to add flow director rules.
> 
> So Flow Director is a very Intel-centric approach.  I would suggest taking a
> now at NTUPLE and RXNFC rules as that is what is actually implemented in
> the kernel.  In addition I would recommend exploring RPS and
> ndo_rx_flow_steer as those are existing interfaces for configuring a specific
> flow to be delivered to a specific CPU.

Current codes use set_rxnfc.
Sorry for the confusion for Flow Director.

> 
> > For using NET policy, the per-device policy must be set in advance. It
> > will automatically configure the system and re-organize the resource
> > of the system accordingly. For system configuration, in this series,
> > it will disable irq balance, set device queue irq affinity, and modify
> > interrupt moderation. For re-organizing the resource, current
> > implementation forces that CPU and queue irq are 1:1 mapping. An 1:1
> mapping group is also called net policy object.
> > For each device policy, it maintains a policy list. Once the device
> > policy is applied, the objects will be insert and tracked in that
> > device policy list. The policy list only be updated when cpu/device
> > hotplug, queue number changes or device policy changes.
> 
> So as a beginning step it might make more sense to try and fix irqbalance
> instead of disabling it.  That is a huge red flag for me.
> You are just implementing something that is more intrusive than irqbalance
> and my concern here is we can't just disable it and reconfigure things like we
> can with the current irqbalance.  If irqbalance never got it right then why
> should we trust this?
> 
> Also how will you code handle a non 1:1 mapping.  For example I know one
> thing I have been looking at trying out was implementing a setup that would
> allocate 1 Tx queue per logical CPU, and 1 Rx queue per physical CPU.  The
> reason for that being that from past experience on ixgbe I have found that
> more Rx queues does not equal more performance when you start stacking
> active queues on SMT pairs.  If you don't have enough queues for the
> number of CPUs in a case such as this how would your code handle it?

The basic schedule unit for NET policy is object. Currently, one object includes
1 CPU and 1 queue. CPU and queue are 1:1 mapping.
For your case, we can define object as 2 TX queue, 1 Rx queue and 2 logical CPU
in the same physical core.
The object generation code is not hard to change.

> 
> > The user can use /proc, prctl and setsockopt to set per-task and
> > per-socket net policy. Once the policy is set, an related record will
> > be inserted into RCU hash table. The record includes ptr, policy and
> > net policy object. The ptr is the pointer address of task/socket. The
> > object will not be assigned until the first package receive/transmit.
> > The object is picked by round-robin from object list. Once the object
> > is determined, the following packets will be set to redirect to the
> queue(object).
> > The object can be shared. The per-task or per-socket policy can be
> inherited.
> >
> > Now NET policy supports four per device policies and three per
> > task/socket policies.
> >     - BULK policy: This policy is designed for high throughput. It can be
> >       applied to either per device policy or per task/socket policy.
> >     - CPU policy: This policy is designed for high throughput but lower CPU
> >       utilization. It can be applied to either per device policy or
> >       per task/socket policy.
> >     - LATENCY policy: This policy is designed for low latency. It can be
> >       applied to either per device policy or per task/socket policy.
> >     - MIX policy: This policy can only be applied to per device policy. This
> >       is designed for the case which miscellaneous types of workload running
> >       on the device.
> 
> This is a rather sparse list of policies.  I know most organizations with large
> data centers care about power savings AND latency.  What you have here is a
> rather simplistic set of targets.  I think actual configuration is much more
> complex then that.

There will be more policies are added in the future.

> 
> > Lots of tests are done for net policy on platforms with Intel Xeon E5
> > V2 and XL710 40G NIC. The baseline test is with Linux 4.6.0 kernel.
> 
> So I assume you are saying you applied your patches on top of a 4.6.0 kernel
> then for testing correct?  I'm just wanting to verify we aren't looking 4.6.0
> versus the current net-next or Linus's 4.7-RCX tree.

Yes, the patches on top of 4.6.0 kernel.

> 
> > Netperf is used to evaluate the throughput and latency performance.
> >   - "netperf -f m -t TCP_RR -H server_IP -c -C -l 60 -- -r buffersize
> >     -b burst -D" is used to evaluate throughput performance, which is
> >     called throughput-first workload.
> 
> While this is okay for testing performance you might be better off using a
> TCP_STREAM, TCP_MAERTS, and perhaps UDP_STREAM test.  There aren't
> too many real-world applications that will give you the kind of traffic pattern
> you see with TCP_RR being used for a bulk throughput test.
> 
> >   - "netperf -t TCP_RR -H server_IP -c -C -l 60 -- -r buffersize" is
> >     used to evaluate latency performance, which is called latency-first
> >     workload.
> >   - Different loads are also evaluated by running 1, 12, 24, 48 or 96
> >     throughput-first workloads/latency-first workload simultaneously.
> >
> > For "BULK" policy, the throughput performance is on average ~1.26X
> > than baseline.
> > For "CPU" policy, the throughput performance is on average ~1.20X than
> > baseline, and has lower CPU% (on average ~5% lower than "BULK" policy).
> > For "LATENCY" policy, the latency is on average 53.5% less than the
> baseline.
> 
> I have misgivings about just throwing out random numbers with no actual
> data to back it up.  What kind of throughput and CPU utilization were you
> actually seeing?  The idea is that we should be able to take your patches and
> apply them on our own system to see similar values and I'm suspecting that
> in many cases you might be focusing on the wrong things.  For example I
> could get good "LATENCY"
> numbers by just disabling interrupt throttling.  That would look really good
> for latency, but my CPU utilization would be through the roof.  It might be
> useful if you could provide throughput, CPU utilization, and latency numbers
> for your baseline versus each of these settings.

I'm not sure if I can share the absolute number. But here is a white paper which may
include the number you want.

The per-queue interrupt moderation solution mentioned in the white paper is what I
use in kernel NET policy.
The baseline is not an out of box Linux. The baseline did simple configuration which
people usually do (disable irqbalance, set queue irq affinity, set CPU affinity) for
better performance.
https://01.org/improve-network-performance-setting-queue-interrupt-moderation-linux

You may find that the performance data for NET policy is not as good as the data in
white paper. That's because the kernel implementation doesn't set CPU affinity for
application. There is also a little bit overhead for current kernel implementation.

Thanks,
Kan  
> 
> > For "MIX" policy, mixed workloads performance is evaluated. per-queue interrupt moderation solution
> > The mixed workloads are combination of throughput-first workload and
> > latency-first workload. Five different types of combinations are
> > evaluated (pure throughput-first workload, pure latency-first
> > workloads,
> >  2/3 throughput-first workload + 1/3 latency-first workloads,
> >  1/3 throughput-first workload + 2/3 latency-first workloads and
> >  1/2 throughput-first workload + 1/2 latency-first workloads).
> > For caculating the performance of mixed workloads, a weighted sum
> > system is introduced.
> > Score = normalized_latency * Weight + normalized_throughput * (1 -
> Weight).
> > If we assume that the user has an equal interest in latency and
> > throughput performance, the Score for "MIX" policy is on average ~1.52X
> than baseline.
> 
> This scoring system of yours makes no sense.  Just give us the numbers on
> what the average latency did versus your "baseline" and the same for the
> throughput.
> 
> > Kan Liang (30):
> >   net: introduce NET policy
> >   net/netpolicy: init NET policy
> >   i40e/netpolicy: Implement ndo_netpolicy_init
> >   net/netpolicy: get driver information
> >   i40e/netpolicy: implement ndo_get_irq_info
> >   net/netpolicy: get CPU information
> >   net/netpolicy: create CPU and queue mapping
> >   net/netpolicy: set and remove irq affinity
> >   net/netpolicy: enable and disable net policy
> >   net/netpolicy: introduce netpolicy object
> >   net/netpolicy: set net policy by policy name
> >   i40e/netpolicy: implement ndo_set_net_policy
> >   i40e/netpolicy: add three new net policies
> >   net/netpolicy: add MIX policy
> >   i40e/netpolicy: add MIX policy support
> >   net/netpolicy: net device hotplug
> >   net/netpolicy: support CPU hotplug
> >   net/netpolicy: handle channel changes
> >   net/netpolicy: implement netpolicy register
> >   net/netpolicy: introduce per socket netpolicy
> >   net/policy: introduce netpolicy_pick_queue
> >   net/netpolicy: set tx queues according to policy
> >   i40e/ethtool: support RX_CLS_LOC_ANY
> >   net/netpolicy: set rx queues according to policy
> >   net/netpolicy: introduce per task net policy
> >   net/netpolicy: set per task policy by proc
> >   net/netpolicy: fast path for finding the queues
> >   net/netpolicy: optimize for queue pair
> >   net/netpolicy: limit the total record number
> >   Documentation/networking: Document net policy
> 
> 30 patches is quite a bit to review.  You might have better luck getting review
> and/or feedback if you could split this up into at least 2 patch sets of 15 or so
> patches when you try to actually submit this.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-18 19:45     ` Liang, Kan
  0 siblings, 0 replies; 123+ messages in thread
From: Liang, Kan @ 2016-07-18 19:45 UTC (permalink / raw)
  To: intel-wired-lan



> On Sun, Jul 17, 2016 at 11:55 PM,  <kan.liang@intel.com> wrote:
> > From: Kan Liang <kan.liang@intel.com>
> >
> > It is a big challenge to get good network performance. First, the
> > network performance is not good with default system settings. Second,
> > it is too difficult to do automatic tuning for all possible workloads,
> > since workloads have different requirements. Some workloads may want
> > high throughput. Some may need low latency. Last but not least, there are
> lots of manual configurations.
> > Fine grained configuration is too difficult for users.
> 
> The problem as I see it is that this is just going to end up likely being an even
> more intrusive version of irqbalance.  I really don't like the way that turned
> out as it did a number of really dumb things that usually result in it being
> disabled as soon as you actually want to do anything that will actually involve
> any kind of performance tuning.  If this stuff is pushed into the kernel it will
> be even harder to get rid of and that is definitely a bad thing.
> 
> > NET policy intends to simplify the network configuration and get a
> > good network performance according to the hints(policy) which is
> > applied by user. It provides some typical "policies" for user which
> > can be set per-socket, per-task or per-device. The kernel will
> > automatically figures out how to merge different requests to get good
> network performance.
> 
> So where is your policy for power saving?  From past experience I can tell you

There is no policy for power saving yet. I will add it to my todo list.

> that while performance tuning is a good thing, doing so at the expense of
> power management is bad.  In addition you seem to be making a lot of
> assumptions here that the end users are going to rewrite their applications to
> use the new socket options you added in order to try and tune the

Currently, they can set per task policy by proc to get good performance without
code changes.

> performance.  I have a hard time believing most developers are going to go
> to all that trouble.  In addition I suspect that even if they do go to that
> trouble they will probably still screw it up and you will end up with
> applications advertising latency as a goal when they should have specified
> CPU and so on.
> 
> > Net policy is designed for multiqueue network devices. This
> > implementation is only for Intel NICs using i40e driver. But the
> > concepts and generic code should apply to other multiqueue NICs too.
> 
> I would argue that your code is not very generic.  The fact that it is relying on
> flow director already greatly limits what you can do.  If you want to make this
> truly generic I would say you need to find ways to make this work on
> everything all the way down to things like i40evf and igb which don't have
> support for Flow Director.

Actually the NET policy codes employ ethtool's interface set_rxnfc to set rules.
It should be generic.
I guess I emphasize Flow Director too much in the document which make
you confuse.

> 
> > Net policy is also a combination of generic policy manager code and
> > some ethtool callbacks (per queue coalesce setting, flow
> > classification rules) to configure the driver.
> > This series also supports CPU hotplug and device hotplug.
> >
> > Here are some key Interfaces/APIs for NET policy.
> >
> >    /proc/net/netpolicy/$DEV/policy
> >    User can set/get per device policy from /proc
> >
> >    /proc/$PID/net_policy
> >    User can set/get per task policy from /proc
> >    prctl(PR_SET_NETPOLICY, POLICY_NAME, NULL, NULL, NULL)
> >    An alternative way to set/get per task policy is from prctl.
> >
> >    setsockopt(sockfd,SOL_SOCKET,SO_NETPOLICY,&policy,sizeof(int))
> >    User can set/get per socket policy by setsockopt
> >
> >
> >    int (*ndo_netpolicy_init)(struct net_device *dev,
> >                              struct netpolicy_info *info);
> >    Initialize device driver for NET policy
> >
> >    int (*ndo_get_irq_info)(struct net_device *dev,
> >                            struct netpolicy_dev_info *info);
> >    Collect device irq information
> 
> Instead of making the irq info a part of the ndo ops it might make more
> sense to make it part of an ethtool op.  Maybe you could make it so that you
> could specify a single queue at a time and get things like statistics, IRQ, and
> ring information.

I will think about it. Thanks.

> 
> >    int (*ndo_set_net_policy)(struct net_device *dev,
> >                              enum netpolicy_name name);
> >    Configure device according to policy name
> 
> I really don't like this piece of it.  I really think we shouldn't be leaving so
> much up to the driver to determine how to handle things.

There are some settings are device specific. For example, the interrupt
moderation for i40e for BULK policy is (50, 125). For other device, the number
could be different. For other device, only tunning interrupt moderation may
not be enough. So we need an interface for driver specific setting.

> In addition just passing one of 4 different types doesn't do much for actual
> configuration because the actual configuration of the device is much more
> complex then that.  Essentially all this does is provide a benchmark tuning
> interface.

The actual configuration is too complex for the user.
The different types at least provide them a good performance and
a good baseline to start.

> 
> >    netpolicy_register(struct netpolicy_reg *reg);
> >    netpolicy_unregister(struct netpolicy_reg *reg);
> >    NET policy API to register/unregister per task/socket net policy.
> >    For each task/socket, an record will be created and inserted into an RCU
> >    hash table.
> 
> This piece will take a significant amount of time before it could ever catch on.
> Once again this just looks like a benchmark tuning interface.  It isn't of much
> value.
> 
> >    netpolicy_pick_queue(struct netpolicy_reg *reg, bool is_rx);
> >    NET policy API to find the proper queue for packet receiving and
> >    transmitting.
> >
> >    netpolicy_set_rules(struct netpolicy_reg *reg, u32 queue_index,
> >                         struct netpolicy_flow_spec *flow);
> >    NET policy API to add flow director rules.
> 
> So Flow Director is a very Intel-centric approach.  I would suggest taking a
> now at NTUPLE and RXNFC rules as that is what is actually implemented in
> the kernel.  In addition I would recommend exploring RPS and
> ndo_rx_flow_steer as those are existing interfaces for configuring a specific
> flow to be delivered to a specific CPU.

Current codes use set_rxnfc.
Sorry for the confusion for Flow Director.

> 
> > For using NET policy, the per-device policy must be set in advance. It
> > will automatically configure the system and re-organize the resource
> > of the system accordingly. For system configuration, in this series,
> > it will disable irq balance, set device queue irq affinity, and modify
> > interrupt moderation. For re-organizing the resource, current
> > implementation forces that CPU and queue irq are 1:1 mapping. An 1:1
> mapping group is also called net policy object.
> > For each device policy, it maintains a policy list. Once the device
> > policy is applied, the objects will be insert and tracked in that
> > device policy list. The policy list only be updated when cpu/device
> > hotplug, queue number changes or device policy changes.
> 
> So as a beginning step it might make more sense to try and fix irqbalance
> instead of disabling it.  That is a huge red flag for me.
> You are just implementing something that is more intrusive than irqbalance
> and my concern here is we can't just disable it and reconfigure things like we
> can with the current irqbalance.  If irqbalance never got it right then why
> should we trust this?
> 
> Also how will you code handle a non 1:1 mapping.  For example I know one
> thing I have been looking at trying out was implementing a setup that would
> allocate 1 Tx queue per logical CPU, and 1 Rx queue per physical CPU.  The
> reason for that being that from past experience on ixgbe I have found that
> more Rx queues does not equal more performance when you start stacking
> active queues on SMT pairs.  If you don't have enough queues for the
> number of CPUs in a case such as this how would your code handle it?

The basic schedule unit for NET policy is object. Currently, one object includes
1 CPU and 1 queue. CPU and queue are 1:1 mapping.
For your case, we can define object as 2 TX queue, 1 Rx queue and 2 logical CPU
in the same physical core.
The object generation code is not hard to change.

> 
> > The user can use /proc, prctl and setsockopt to set per-task and
> > per-socket net policy. Once the policy is set, an related record will
> > be inserted into RCU hash table. The record includes ptr, policy and
> > net policy object. The ptr is the pointer address of task/socket. The
> > object will not be assigned until the first package receive/transmit.
> > The object is picked by round-robin from object list. Once the object
> > is determined, the following packets will be set to redirect to the
> queue(object).
> > The object can be shared. The per-task or per-socket policy can be
> inherited.
> >
> > Now NET policy supports four per device policies and three per
> > task/socket policies.
> >     - BULK policy: This policy is designed for high throughput. It can be
> >       applied to either per device policy or per task/socket policy.
> >     - CPU policy: This policy is designed for high throughput but lower CPU
> >       utilization. It can be applied to either per device policy or
> >       per task/socket policy.
> >     - LATENCY policy: This policy is designed for low latency. It can be
> >       applied to either per device policy or per task/socket policy.
> >     - MIX policy: This policy can only be applied to per device policy. This
> >       is designed for the case which miscellaneous types of workload running
> >       on the device.
> 
> This is a rather sparse list of policies.  I know most organizations with large
> data centers care about power savings AND latency.  What you have here is a
> rather simplistic set of targets.  I think actual configuration is much more
> complex then that.

There will be more policies are added in the future.

> 
> > Lots of tests are done for net policy on platforms with Intel Xeon E5
> > V2 and XL710 40G NIC. The baseline test is with Linux 4.6.0 kernel.
> 
> So I assume you are saying you applied your patches on top of a 4.6.0 kernel
> then for testing correct?  I'm just wanting to verify we aren't looking 4.6.0
> versus the current net-next or Linus's 4.7-RCX tree.

Yes, the patches on top of 4.6.0 kernel.

> 
> > Netperf is used to evaluate the throughput and latency performance.
> >   - "netperf -f m -t TCP_RR -H server_IP -c -C -l 60 -- -r buffersize
> >     -b burst -D" is used to evaluate throughput performance, which is
> >     called throughput-first workload.
> 
> While this is okay for testing performance you might be better off using a
> TCP_STREAM, TCP_MAERTS, and perhaps UDP_STREAM test.  There aren't
> too many real-world applications that will give you the kind of traffic pattern
> you see with TCP_RR being used for a bulk throughput test.
> 
> >   - "netperf -t TCP_RR -H server_IP -c -C -l 60 -- -r buffersize" is
> >     used to evaluate latency performance, which is called latency-first
> >     workload.
> >   - Different loads are also evaluated by running 1, 12, 24, 48 or 96
> >     throughput-first workloads/latency-first workload simultaneously.
> >
> > For "BULK" policy, the throughput performance is on average ~1.26X
> > than baseline.
> > For "CPU" policy, the throughput performance is on average ~1.20X than
> > baseline, and has lower CPU% (on average ~5% lower than "BULK" policy).
> > For "LATENCY" policy, the latency is on average 53.5% less than the
> baseline.
> 
> I have misgivings about just throwing out random numbers with no actual
> data to back it up.  What kind of throughput and CPU utilization were you
> actually seeing?  The idea is that we should be able to take your patches and
> apply them on our own system to see similar values and I'm suspecting that
> in many cases you might be focusing on the wrong things.  For example I
> could get good "LATENCY"
> numbers by just disabling interrupt throttling.  That would look really good
> for latency, but my CPU utilization would be through the roof.  It might be
> useful if you could provide throughput, CPU utilization, and latency numbers
> for your baseline versus each of these settings.

I'm not sure if I can share the absolute number. But here is a white paper which may
include the number you want.

The per-queue interrupt moderation solution mentioned in the white paper is what I
use in kernel NET policy.
The baseline is not an out of box Linux. The baseline did simple configuration which
people usually do (disable irqbalance, set queue irq affinity, set CPU affinity) for
better performance.
https://01.org/improve-network-performance-setting-queue-interrupt-moderation-linux

You may find that the performance data for NET policy is not as good as the data in
white paper. That's because the kernel implementation doesn't set CPU affinity for
application. There is also a little bit overhead for current kernel implementation.

Thanks,
Kan  
> 
> > For "MIX" policy, mixed workloads performance is evaluated. per-queue interrupt moderation solution
> > The mixed workloads are combination of throughput-first workload and
> > latency-first workload. Five different types of combinations are
> > evaluated (pure throughput-first workload, pure latency-first
> > workloads,
> >  2/3 throughput-first workload + 1/3 latency-first workloads,
> >  1/3 throughput-first workload + 2/3 latency-first workloads and
> >  1/2 throughput-first workload + 1/2 latency-first workloads).
> > For caculating the performance of mixed workloads, a weighted sum
> > system is introduced.
> > Score = normalized_latency * Weight + normalized_throughput * (1 -
> Weight).
> > If we assume that the user has an equal interest in latency and
> > throughput performance, the Score for "MIX" policy is on average ~1.52X
> than baseline.
> 
> This scoring system of yours makes no sense.  Just give us the numbers on
> what the average latency did versus your "baseline" and the same for the
> throughput.
> 
> > Kan Liang (30):
> >   net: introduce NET policy
> >   net/netpolicy: init NET policy
> >   i40e/netpolicy: Implement ndo_netpolicy_init
> >   net/netpolicy: get driver information
> >   i40e/netpolicy: implement ndo_get_irq_info
> >   net/netpolicy: get CPU information
> >   net/netpolicy: create CPU and queue mapping
> >   net/netpolicy: set and remove irq affinity
> >   net/netpolicy: enable and disable net policy
> >   net/netpolicy: introduce netpolicy object
> >   net/netpolicy: set net policy by policy name
> >   i40e/netpolicy: implement ndo_set_net_policy
> >   i40e/netpolicy: add three new net policies
> >   net/netpolicy: add MIX policy
> >   i40e/netpolicy: add MIX policy support
> >   net/netpolicy: net device hotplug
> >   net/netpolicy: support CPU hotplug
> >   net/netpolicy: handle channel changes
> >   net/netpolicy: implement netpolicy register
> >   net/netpolicy: introduce per socket netpolicy
> >   net/policy: introduce netpolicy_pick_queue
> >   net/netpolicy: set tx queues according to policy
> >   i40e/ethtool: support RX_CLS_LOC_ANY
> >   net/netpolicy: set rx queues according to policy
> >   net/netpolicy: introduce per task net policy
> >   net/netpolicy: set per task policy by proc
> >   net/netpolicy: fast path for finding the queues
> >   net/netpolicy: optimize for queue pair
> >   net/netpolicy: limit the total record number
> >   Documentation/networking: Document net policy
> 
> 30 patches is quite a bit to review.  You might have better luck getting review
> and/or feedback if you could split this up into at least 2 patch sets of 15 or so
> patches when you try to actually submit this.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [RFC PATCH 00/30] Kernel NET policy
  2016-07-18 19:45     ` Liang, Kan
  (?)
@ 2016-07-18 19:49       ` Andi Kleen
  -1 siblings, 0 replies; 123+ messages in thread
From: Andi Kleen @ 2016-07-18 19:49 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Alexander Duyck, David Miller, linux-kernel, intel-wired-lan,
	Netdev, Kirsher, Jeffrey T, Ingo Molnar, Peter Zijlstra,
	Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy, Andrew Morton, keescook, viro, gorcunov,
	john.stultz, Alex Duyck, Ben Hutchings, decot, Brandeburg, Jesse,
	Andi Kleen

> > So where is your policy for power saving?  From past experience I can tell you
> 
> There is no policy for power saving yet. I will add it to my todo list.

Yes it's interesting to consider. The main goal here is to maximize CPU
idle residency? I wonder if that's that much different from the CPU policy.

-Andi

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-18 19:49       ` Andi Kleen
  0 siblings, 0 replies; 123+ messages in thread
From: Andi Kleen @ 2016-07-18 19:49 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Alexander Duyck, David Miller, linux-kernel, intel-wired-lan,
	Netdev, Kirsher, Jeffrey T, Ingo Molnar, Peter Zijlstra,
	Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI,
	Patrick McHardy, Andrew Morton, keescook, viro, gorcunov,
	john.stultz, Alex Duyck, Ben Hutchings, decot,

> > So where is your policy for power saving?  From past experience I can tell you
> 
> There is no policy for power saving yet. I will add it to my todo list.

Yes it's interesting to consider. The main goal here is to maximize CPU
idle residency? I wonder if that's that much different from the CPU policy.

-Andi

^ permalink raw reply	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-18 19:49       ` Andi Kleen
  0 siblings, 0 replies; 123+ messages in thread
From: Andi Kleen @ 2016-07-18 19:49 UTC (permalink / raw)
  To: intel-wired-lan

> > So where is your policy for power saving?  From past experience I can tell you
> 
> There is no policy for power saving yet. I will add it to my todo list.

Yes it's interesting to consider. The main goal here is to maximize CPU
idle residency? I wonder if that's that much different from the CPU policy.

-Andi

^ permalink raw reply	[flat|nested] 123+ messages in thread

* RE: [RFC PATCH 00/30] Kernel NET policy
  2016-07-18 17:52       ` [Intel-wired-lan] " Cong Wang
@ 2016-07-18 20:14         ` Liang, Kan
  -1 siblings, 0 replies; 123+ messages in thread
From: Liang, Kan @ 2016-07-18 20:14 UTC (permalink / raw)
  To: Cong Wang, Andi Kleen
  Cc: Florian Westphal, David Miller, LKML, intel-wired-lan,
	Linux Kernel Network Developers, Kirsher, Jeffrey T, Ingo Molnar,
	Peter Zijlstra, Alexey Kuznetsov, James Morris,
	Hideaki YOSHIFUJI, Patrick McHardy, Andrew Morton, Kees Cook,
	Al Viro, Cyrill Gorcunov, John Stultz, Alex Duyck, ben, decot,
	Brandeburg, Jesse



> 
> On Mon, Jul 18, 2016 at 8:45 AM, Andi Kleen <andi@firstfloor.org> wrote:
> >> It seems strange to me to add such policies to the kernel.
> >> Addmittingly, documentation of some settings is non-existent and one
> >> needs various different tools to set this (sysctl, procfs, sysfs, ethtool, etc).
> >
> > The problem is that different applications need different policies.
> >
> > The only entity which can efficiently negotiate between different
> > applications' conflicting requests is the kernel. And that is pretty
> > much the basic job description of a kernel: multiplex hardware
> > efficiently between different users.
> >
> > So yes the user space tuning approach works for simple cases ("only
> > run workloads that require the same tuning"), but is ultimately not
> > very interesting nor scalable.
> 
> I don't read the code yet, just the cover letter.
> 
> We have global tunings, per-network-namespace tunings, per-socket tunings.
> It is still unclear why you can't just put different applications into different
> namespaces/containers to get different policies.

In NET policy, we do per queue tunings.


Thanks,
Kan

^ permalink raw reply	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-18 20:14         ` Liang, Kan
  0 siblings, 0 replies; 123+ messages in thread
From: Liang, Kan @ 2016-07-18 20:14 UTC (permalink / raw)
  To: intel-wired-lan



> 
> On Mon, Jul 18, 2016 at 8:45 AM, Andi Kleen <andi@firstfloor.org> wrote:
> >> It seems strange to me to add such policies to the kernel.
> >> Addmittingly, documentation of some settings is non-existent and one
> >> needs various different tools to set this (sysctl, procfs, sysfs, ethtool, etc).
> >
> > The problem is that different applications need different policies.
> >
> > The only entity which can efficiently negotiate between different
> > applications' conflicting requests is the kernel. And that is pretty
> > much the basic job description of a kernel: multiplex hardware
> > efficiently between different users.
> >
> > So yes the user space tuning approach works for simple cases ("only
> > run workloads that require the same tuning"), but is ultimately not
> > very interesting nor scalable.
> 
> I don't read the code yet, just the cover letter.
> 
> We have global tunings, per-network-namespace tunings, per-socket tunings.
> It is still unclear why you can't just put different applications into different
> namespaces/containers to get different policies.

In NET policy, we do per queue tunings.


Thanks,
Kan

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [RFC PATCH 00/30] Kernel NET policy
  2016-07-18 20:14         ` [Intel-wired-lan] " Liang, Kan
  (?)
@ 2016-07-18 20:19           ` Cong Wang
  -1 siblings, 0 replies; 123+ messages in thread
From: Cong Wang @ 2016-07-18 20:19 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Andi Kleen, Florian Westphal, David Miller, LKML,
	intel-wired-lan, Linux Kernel Network Developers, Kirsher,
	Jeffrey T, Ingo Molnar, Peter Zijlstra, Alexey Kuznetsov,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, Andrew Morton,
	Kees Cook, Al Viro, Cyrill Gorcunov, John Stultz, Alex Duyck,
	ben, decot, Brandeburg, Jesse

On Mon, Jul 18, 2016 at 1:14 PM, Liang, Kan <kan.liang@intel.com> wrote:
>
>
>>
>> On Mon, Jul 18, 2016 at 8:45 AM, Andi Kleen <andi@firstfloor.org> wrote:
>> >> It seems strange to me to add such policies to the kernel.
>> >> Addmittingly, documentation of some settings is non-existent and one
>> >> needs various different tools to set this (sysctl, procfs, sysfs, ethtool, etc).
>> >
>> > The problem is that different applications need different policies.
>> >
>> > The only entity which can efficiently negotiate between different
>> > applications' conflicting requests is the kernel. And that is pretty
>> > much the basic job description of a kernel: multiplex hardware
>> > efficiently between different users.
>> >
>> > So yes the user space tuning approach works for simple cases ("only
>> > run workloads that require the same tuning"), but is ultimately not
>> > very interesting nor scalable.
>>
>> I don't read the code yet, just the cover letter.
>>
>> We have global tunings, per-network-namespace tunings, per-socket tunings.
>> It is still unclear why you can't just put different applications into different
>> namespaces/containers to get different policies.
>
> In NET policy, we do per queue tunings.

Is it possible to isolate NIC queues for containers?

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-18 20:19           ` Cong Wang
  0 siblings, 0 replies; 123+ messages in thread
From: Cong Wang @ 2016-07-18 20:19 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Andi Kleen, Florian Westphal, David Miller, LKML,
	intel-wired-lan, Linux Kernel Network Developers, Kirsher,
	Jeffrey T, Ingo Molnar, Peter Zijlstra, Alexey Kuznetsov,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, Andrew Morton,
	Kees Cook, Al Viro, Cyrill Gorcunov, John Stultz, Alex Duyck,
	ben, decot, Brandeburg, Jesse

On Mon, Jul 18, 2016 at 1:14 PM, Liang, Kan <kan.liang@intel.com> wrote:
>
>
>>
>> On Mon, Jul 18, 2016 at 8:45 AM, Andi Kleen <andi@firstfloor.org> wrote:
>> >> It seems strange to me to add such policies to the kernel.
>> >> Addmittingly, documentation of some settings is non-existent and one
>> >> needs various different tools to set this (sysctl, procfs, sysfs, ethtool, etc).
>> >
>> > The problem is that different applications need different policies.
>> >
>> > The only entity which can efficiently negotiate between different
>> > applications' conflicting requests is the kernel. And that is pretty
>> > much the basic job description of a kernel: multiplex hardware
>> > efficiently between different users.
>> >
>> > So yes the user space tuning approach works for simple cases ("only
>> > run workloads that require the same tuning"), but is ultimately not
>> > very interesting nor scalable.
>>
>> I don't read the code yet, just the cover letter.
>>
>> We have global tunings, per-network-namespace tunings, per-socket tunings.
>> It is still unclear why you can't just put different applications into different
>> namespaces/containers to get different policies.
>
> In NET policy, we do per queue tunings.

Is it possible to isolate NIC queues for containers?

^ permalink raw reply	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-18 20:19           ` Cong Wang
  0 siblings, 0 replies; 123+ messages in thread
From: Cong Wang @ 2016-07-18 20:19 UTC (permalink / raw)
  To: intel-wired-lan

On Mon, Jul 18, 2016 at 1:14 PM, Liang, Kan <kan.liang@intel.com> wrote:
>
>
>>
>> On Mon, Jul 18, 2016 at 8:45 AM, Andi Kleen <andi@firstfloor.org> wrote:
>> >> It seems strange to me to add such policies to the kernel.
>> >> Addmittingly, documentation of some settings is non-existent and one
>> >> needs various different tools to set this (sysctl, procfs, sysfs, ethtool, etc).
>> >
>> > The problem is that different applications need different policies.
>> >
>> > The only entity which can efficiently negotiate between different
>> > applications' conflicting requests is the kernel. And that is pretty
>> > much the basic job description of a kernel: multiplex hardware
>> > efficiently between different users.
>> >
>> > So yes the user space tuning approach works for simple cases ("only
>> > run workloads that require the same tuning"), but is ultimately not
>> > very interesting nor scalable.
>>
>> I don't read the code yet, just the cover letter.
>>
>> We have global tunings, per-network-namespace tunings, per-socket tunings.
>> It is still unclear why you can't just put different applications into different
>> namespaces/containers to get different policies.
>
> In NET policy, we do per queue tunings.

Is it possible to isolate NIC queues for containers?

^ permalink raw reply	[flat|nested] 123+ messages in thread

* RE: [RFC PATCH 00/30] Kernel NET policy
  2016-07-18 20:19           ` Cong Wang
  (?)
@ 2016-07-18 20:24             ` Liang, Kan
  -1 siblings, 0 replies; 123+ messages in thread
From: Liang, Kan @ 2016-07-18 20:24 UTC (permalink / raw)
  To: Cong Wang
  Cc: Andi Kleen, Florian Westphal, David Miller, LKML,
	intel-wired-lan, Linux Kernel Network Developers, Kirsher,
	Jeffrey T, Ingo Molnar, Peter Zijlstra, Alexey Kuznetsov,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, Andrew Morton,
	Kees Cook, Al Viro, Cyrill Gorcunov, John Stultz, Alex Duyck,
	ben, decot, Brandeburg, Jesse



> >>
> >> On Mon, Jul 18, 2016 at 8:45 AM, Andi Kleen <andi@firstfloor.org> wrote:
> >> >> It seems strange to me to add such policies to the kernel.
> >> >> Addmittingly, documentation of some settings is non-existent and
> >> >> one needs various different tools to set this (sysctl, procfs, sysfs,
> ethtool, etc).
> >> >
> >> > The problem is that different applications need different policies.
> >> >
> >> > The only entity which can efficiently negotiate between different
> >> > applications' conflicting requests is the kernel. And that is
> >> > pretty much the basic job description of a kernel: multiplex
> >> > hardware efficiently between different users.
> >> >
> >> > So yes the user space tuning approach works for simple cases ("only
> >> > run workloads that require the same tuning"), but is ultimately not
> >> > very interesting nor scalable.
> >>
> >> I don't read the code yet, just the cover letter.
> >>
> >> We have global tunings, per-network-namespace tunings, per-socket
> tunings.
> >> It is still unclear why you can't just put different applications
> >> into different namespaces/containers to get different policies.
> >
> > In NET policy, we do per queue tunings.
> 
> Is it possible to isolate NIC queues for containers?

Yes, but we don't  have containers support yet. 

^ permalink raw reply	[flat|nested] 123+ messages in thread

* RE: [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-18 20:24             ` Liang, Kan
  0 siblings, 0 replies; 123+ messages in thread
From: Liang, Kan @ 2016-07-18 20:24 UTC (permalink / raw)
  To: Cong Wang
  Cc: Andi Kleen, Florian Westphal, David Miller, LKML,
	intel-wired-lan, Linux Kernel Network Developers, Kirsher,
	Jeffrey T, Ingo Molnar, Peter Zijlstra, Alexey Kuznetsov,
	James Morris, Hideaki YOSHIFUJI, Patrick McHardy, Andrew Morton,
	Kees Cook, Al Viro, Cyrill Gorcunov, John Stultz, Alex Duyck,
	ben, decot, Brandeburg, Jesse



> >>
> >> On Mon, Jul 18, 2016 at 8:45 AM, Andi Kleen <andi@firstfloor.org> wrote:
> >> >> It seems strange to me to add such policies to the kernel.
> >> >> Addmittingly, documentation of some settings is non-existent and
> >> >> one needs various different tools to set this (sysctl, procfs, sysfs,
> ethtool, etc).
> >> >
> >> > The problem is that different applications need different policies.
> >> >
> >> > The only entity which can efficiently negotiate between different
> >> > applications' conflicting requests is the kernel. And that is
> >> > pretty much the basic job description of a kernel: multiplex
> >> > hardware efficiently between different users.
> >> >
> >> > So yes the user space tuning approach works for simple cases ("only
> >> > run workloads that require the same tuning"), but is ultimately not
> >> > very interesting nor scalable.
> >>
> >> I don't read the code yet, just the cover letter.
> >>
> >> We have global tunings, per-network-namespace tunings, per-socket
> tunings.
> >> It is still unclear why you can't just put different applications
> >> into different namespaces/containers to get different policies.
> >
> > In NET policy, we do per queue tunings.
> 
> Is it possible to isolate NIC queues for containers?

Yes, but we don't  have containers support yet. 

^ permalink raw reply	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-18 20:24             ` Liang, Kan
  0 siblings, 0 replies; 123+ messages in thread
From: Liang, Kan @ 2016-07-18 20:24 UTC (permalink / raw)
  To: intel-wired-lan



> >>
> >> On Mon, Jul 18, 2016 at 8:45 AM, Andi Kleen <andi@firstfloor.org> wrote:
> >> >> It seems strange to me to add such policies to the kernel.
> >> >> Addmittingly, documentation of some settings is non-existent and
> >> >> one needs various different tools to set this (sysctl, procfs, sysfs,
> ethtool, etc).
> >> >
> >> > The problem is that different applications need different policies.
> >> >
> >> > The only entity which can efficiently negotiate between different
> >> > applications' conflicting requests is the kernel. And that is
> >> > pretty much the basic job description of a kernel: multiplex
> >> > hardware efficiently between different users.
> >> >
> >> > So yes the user space tuning approach works for simple cases ("only
> >> > run workloads that require the same tuning"), but is ultimately not
> >> > very interesting nor scalable.
> >>
> >> I don't read the code yet, just the cover letter.
> >>
> >> We have global tunings, per-network-namespace tunings, per-socket
> tunings.
> >> It is still unclear why you can't just put different applications
> >> into different namespaces/containers to get different policies.
> >
> > In NET policy, we do per queue tunings.
> 
> Is it possible to isolate NIC queues for containers?

Yes, but we don't  have containers support yet. 

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [RFC PATCH 00/30] Kernel NET policy
  2016-07-18 18:30     ` [Intel-wired-lan] " Liang, Kan
@ 2016-07-18 20:51       ` Daniel Borkmann
  -1 siblings, 0 replies; 123+ messages in thread
From: Daniel Borkmann @ 2016-07-18 20:51 UTC (permalink / raw)
  To: Liang, Kan, davem, linux-kernel, intel-wired-lan, netdev
  Cc: Kirsher, Jeffrey T, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, Brandeburg, Jesse, andi, tj

On 07/18/2016 08:30 PM, Liang, Kan wrote:
>> On 07/18/2016 08:55 AM, kan.liang@intel.com wrote:
[...]
>> On a higher level picture, why for example, a new cgroup in combination
>> with tc shouldn't be the ones resolving these policies on resource usage?
>
> The NET policy doesn't support cgroup yet, but it's on my todo list.
> The granularity for the device resource is per queue. The packet will be
> redirected to the specific queue.
> I'm not sure if cgroup with tc can do that.

Did you have a look at sch_mqprio, which can be used along with either
netprio cgroup or netcls cgroup plus tc on clsact's egress side to set
the priority for mqprio mappings from application side? At leats ixgbe,
i40e, fm10k have offload support for it and a number of other nics. You
could also use cls_bpf for making the prio assignment if you need to
involve also other meta data from the skb (like mark or prio derived from
sockets, etc). Maybe it doesn't cover all of what you need, but could be
a start to extend upon?

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-18 20:51       ` Daniel Borkmann
  0 siblings, 0 replies; 123+ messages in thread
From: Daniel Borkmann @ 2016-07-18 20:51 UTC (permalink / raw)
  To: intel-wired-lan

On 07/18/2016 08:30 PM, Liang, Kan wrote:
>> On 07/18/2016 08:55 AM, kan.liang at intel.com wrote:
[...]
>> On a higher level picture, why for example, a new cgroup in combination
>> with tc shouldn't be the ones resolving these policies on resource usage?
>
> The NET policy doesn't support cgroup yet, but it's on my todo list.
> The granularity for the device resource is per queue. The packet will be
> redirected to the specific queue.
> I'm not sure if cgroup with tc can do that.

Did you have a look at sch_mqprio, which can be used along with either
netprio cgroup or netcls cgroup plus tc on clsact's egress side to set
the priority for mqprio mappings from application side? At leats ixgbe,
i40e, fm10k have offload support for it and a number of other nics. You
could also use cls_bpf for making the prio assignment if you need to
involve also other meta data from the skb (like mark or prio derived from
sockets, etc). Maybe it doesn't cover all of what you need, but could be
a start to extend upon?

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [RFC PATCH 00/30] Kernel NET policy
  2016-07-18 19:43         ` [Intel-wired-lan] " Andi Kleen
@ 2016-07-18 21:51           ` Hannes Frederic Sowa
  -1 siblings, 0 replies; 123+ messages in thread
From: Hannes Frederic Sowa @ 2016-07-18 21:51 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Florian Westphal, kan.liang, davem, linux-kernel,
	intel-wired-lan, netdev, jeffrey.t.kirsher, mingo, peterz,
	kuznet, jmorris, yoshfuji, kaber, akpm, keescook, viro, gorcunov,
	john.stultz, aduyck, ben, decot, jesse.brandeburg

Hello,

On Mon, Jul 18, 2016, at 21:43, Andi Kleen wrote:
> > I wonder if this can be attacked from a different angle. What would be
> > missing to add support for this in user space? The first possibility
> > that came to my mind is to just multiplex those hints in the kernel.
> 
> "just" is the handwaving part here -- you're proposing a micro kernel
> approach where part of the multiplexing job that the kernel is doing
> is farmed out to a message passing user space component.
> 
> I suspect this would be far more complicated to get right and
> perform well than a straight forward monolithic kernel subsystem --
> which is traditionally how Linux has approached things.

At the same time having any kind of policy in the kernel was also always
avoided.

> The daemon would always need to work with out of date state
> compared to the latest, because it cannot do any locking with the
> kernel state.  So you end up with a complex distributed system with
> multiple
> agents "fighting" with each other, and the tuning agent
> never being able to keep up with the actual work.

But you don't want to have the tuning agents in the fast path? If you
really try to synchronously update all queue mappings/irqs during socket
creation or connect time this would add rtnl lock to basically socket
creation, as drivers require that. This would slow down basic socket
operations a lot and synchronize them with the management interface.

Even dst_entries are not synchronously updated anymore nowadays as that
would require too much locking overhead in the kernel.

> Also of course it would be fundamentally less efficient than
> kernel code doing that, just because of the additional context
> switches needed.

Synchronizing or configuring any kind of queues already requires
rtnl_mutex. I didn't test it but acquiring rtnl mutex in inet_recvmsg is
unlikely to fly performance wise and might even be very dangerous under
DoS attacks (like I see in 24/30).

Bye,
Hannes

^ permalink raw reply	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-18 21:51           ` Hannes Frederic Sowa
  0 siblings, 0 replies; 123+ messages in thread
From: Hannes Frederic Sowa @ 2016-07-18 21:51 UTC (permalink / raw)
  To: intel-wired-lan

Hello,

On Mon, Jul 18, 2016, at 21:43, Andi Kleen wrote:
> > I wonder if this can be attacked from a different angle. What would be
> > missing to add support for this in user space? The first possibility
> > that came to my mind is to just multiplex those hints in the kernel.
> 
> "just" is the handwaving part here -- you're proposing a micro kernel
> approach where part of the multiplexing job that the kernel is doing
> is farmed out to a message passing user space component.
> 
> I suspect this would be far more complicated to get right and
> perform well than a straight forward monolithic kernel subsystem --
> which is traditionally how Linux has approached things.

At the same time having any kind of policy in the kernel was also always
avoided.

> The daemon would always need to work with out of date state
> compared to the latest, because it cannot do any locking with the
> kernel state.  So you end up with a complex distributed system with
> multiple
> agents "fighting" with each other, and the tuning agent
> never being able to keep up with the actual work.

But you don't want to have the tuning agents in the fast path? If you
really try to synchronously update all queue mappings/irqs during socket
creation or connect time this would add rtnl lock to basically socket
creation, as drivers require that. This would slow down basic socket
operations a lot and synchronize them with the management interface.

Even dst_entries are not synchronously updated anymore nowadays as that
would require too much locking overhead in the kernel.

> Also of course it would be fundamentally less efficient than
> kernel code doing that, just because of the additional context
> switches needed.

Synchronizing or configuring any kind of queues already requires
rtnl_mutex. I didn't test it but acquiring rtnl mutex in inet_recvmsg is
unlikely to fly performance wise and might even be very dangerous under
DoS attacks (like I see in 24/30).

Bye,
Hannes

^ permalink raw reply	[flat|nested] 123+ messages in thread

* RE: [RFC PATCH 00/30] Kernel NET policy
  2016-07-18 21:51           ` [Intel-wired-lan] " Hannes Frederic Sowa
  (?)
@ 2016-07-19  1:49             ` Liang, Kan
  -1 siblings, 0 replies; 123+ messages in thread
From: Liang, Kan @ 2016-07-19  1:49 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Andi Kleen
  Cc: Florian Westphal, davem, linux-kernel, intel-wired-lan, netdev,
	Kirsher, Jeffrey T, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, Brandeburg, Jesse



> 
> > Also of course it would be fundamentally less efficient than kernel
> > code doing that, just because of the additional context switches
> > needed.
> 
> Synchronizing or configuring any kind of queues already requires rtnl_mutex.
> I didn't test it but acquiring rtnl mutex in inet_recvmsg is unlikely to fly
> performance wise and

Yes, rtnl will bring some overheads. But the configuration is one time thing for
application or socket. It only happens on receiving first packet.
Unless the application/socket only transmit few packets, the overhead
could be ignored. If they only transmit few packets, why they care about
performance?

> might even be very dangerous under DoS attacks (like
> I see in 24/30).
> 
Patch 29/30 tries to prevent such case.

Thanks,
Kan

^ permalink raw reply	[flat|nested] 123+ messages in thread

* RE: [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-19  1:49             ` Liang, Kan
  0 siblings, 0 replies; 123+ messages in thread
From: Liang, Kan @ 2016-07-19  1:49 UTC (permalink / raw)
  To: Hannes Frederic Sowa, Andi Kleen
  Cc: Florian Westphal, davem, linux-kernel, intel-wired-lan, netdev,
	Kirsher, Jeffrey T, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck



> 
> > Also of course it would be fundamentally less efficient than kernel
> > code doing that, just because of the additional context switches
> > needed.
> 
> Synchronizing or configuring any kind of queues already requires rtnl_mutex.
> I didn't test it but acquiring rtnl mutex in inet_recvmsg is unlikely to fly
> performance wise and

Yes, rtnl will bring some overheads. But the configuration is one time thing for
application or socket. It only happens on receiving first packet.
Unless the application/socket only transmit few packets, the overhead
could be ignored. If they only transmit few packets, why they care about
performance?

> might even be very dangerous under DoS attacks (like
> I see in 24/30).
> 
Patch 29/30 tries to prevent such case.

Thanks,
Kan

^ permalink raw reply	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-19  1:49             ` Liang, Kan
  0 siblings, 0 replies; 123+ messages in thread
From: Liang, Kan @ 2016-07-19  1:49 UTC (permalink / raw)
  To: intel-wired-lan



> 
> > Also of course it would be fundamentally less efficient than kernel
> > code doing that, just because of the additional context switches
> > needed.
> 
> Synchronizing or configuring any kind of queues already requires rtnl_mutex.
> I didn't test it but acquiring rtnl mutex in inet_recvmsg is unlikely to fly
> performance wise and

Yes, rtnl will bring some overheads. But the configuration is one time thing for
application or socket. It only happens on receiving first packet.
Unless the application/socket only transmit few packets, the overhead
could be ignored. If they only transmit few packets, why they care about
performance?

> might even be very dangerous under DoS attacks (like
> I see in 24/30).
> 
Patch 29/30 tries to prevent such case.

Thanks,
Kan

^ permalink raw reply	[flat|nested] 123+ messages in thread

* Re: [RFC PATCH 00/30] Kernel NET policy
  2016-07-19  1:49             ` Liang, Kan
@ 2016-07-19  5:03               ` David Miller
  -1 siblings, 0 replies; 123+ messages in thread
From: David Miller @ 2016-07-19  5:03 UTC (permalink / raw)
  To: kan.liang
  Cc: hannes, andi, fw, linux-kernel, intel-wired-lan, netdev,
	jeffrey.t.kirsher, mingo, peterz, kuznet, jmorris, yoshfuji,
	kaber, akpm, keescook, viro, gorcunov, john.stultz, aduyck, ben,
	decot, jesse.brandeburg

From: "Liang, Kan" <kan.liang@intel.com>
Date: Tue, 19 Jul 2016 01:49:41 +0000

> Yes, rtnl will bring some overheads. But the configuration is one
> time thing for application or socket. It only happens on receiving
> first packet.

Thanks for destroying our connection rates.

This kind of overhead is simply unacceptable.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-19  5:03               ` David Miller
  0 siblings, 0 replies; 123+ messages in thread
From: David Miller @ 2016-07-19  5:03 UTC (permalink / raw)
  To: intel-wired-lan

From: "Liang, Kan" <kan.liang@intel.com>
Date: Tue, 19 Jul 2016 01:49:41 +0000

> Yes, rtnl will bring some overheads. But the configuration is one
> time thing for application or socket. It only happens on receiving
> first packet.

Thanks for destroying our connection rates.

This kind of overhead is simply unacceptable.

^ permalink raw reply	[flat|nested] 123+ messages in thread

* RE: [RFC PATCH 00/30] Kernel NET policy
  2016-07-19  5:03               ` [Intel-wired-lan] " David Miller
  (?)
@ 2016-07-19 13:43                 ` Liang, Kan
  -1 siblings, 0 replies; 123+ messages in thread
From: Liang, Kan @ 2016-07-19 13:43 UTC (permalink / raw)
  To: David Miller
  Cc: hannes, andi, fw, linux-kernel, intel-wired-lan, netdev, Kirsher,
	Jeffrey T, mingo, peterz, kuznet, jmorris, yoshfuji, kaber, akpm,
	keescook, viro, gorcunov, john.stultz, aduyck, ben, decot,
	Brandeburg, Jesse



> > Yes, rtnl will bring some overheads. But the configuration is one time
> > thing for application or socket. It only happens on receiving first
> > packet.
> 
> Thanks for destroying our connection rates.
> 
> This kind of overhead is simply unacceptable.

If so, I think I can make the configuration asynchronized for next
version. The connection rate should not be destroyed.

Thanks,
Kan

^ permalink raw reply	[flat|nested] 123+ messages in thread

* RE: [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-19 13:43                 ` Liang, Kan
  0 siblings, 0 replies; 123+ messages in thread
From: Liang, Kan @ 2016-07-19 13:43 UTC (permalink / raw)
  To: David Miller
  Cc: hannes, andi, fw, linux-kernel, intel-wired-lan, netdev, Kirsher,
	Jeffrey T, mingo, peterz, kuznet, jmorris, yoshfuji, kaber, akpm,
	keescook, viro, gorcunov, john.stultz@linaro.org



> > Yes, rtnl will bring some overheads. But the configuration is one time
> > thing for application or socket. It only happens on receiving first
> > packet.
> 
> Thanks for destroying our connection rates.
> 
> This kind of overhead is simply unacceptable.

If so, I think I can make the configuration asynchronized for next
version. The connection rate should not be destroyed.

Thanks,
Kan

^ permalink raw reply	[flat|nested] 123+ messages in thread

* [Intel-wired-lan] [RFC PATCH 00/30] Kernel NET policy
@ 2016-07-19 13:43                 ` Liang, Kan
  0 siblings, 0 replies; 123+ messages in thread
From: Liang, Kan @ 2016-07-19 13:43 UTC (permalink / raw)
  To: intel-wired-lan



> > Yes, rtnl will bring some overheads. But the configuration is one time
> > thing for application or socket. It only happens on receiving first
> > packet.
> 
> Thanks for destroying our connection rates.
> 
> This kind of overhead is simply unacceptable.

If so, I think I can make the configuration asynchronized for next
version. The connection rate should not be destroyed.

Thanks,
Kan


^ permalink raw reply	[flat|nested] 123+ messages in thread

end of thread, other threads:[~2016-07-19 13:43 UTC | newest]

Thread overview: 123+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-18  6:55 [RFC PATCH 00/30] Kernel NET policy kan.liang
2016-07-18  6:55 ` [Intel-wired-lan] " kan.liang
2016-07-18  6:55 ` [RFC PATCH 01/30] net: introduce " kan.liang
2016-07-18  6:55   ` [Intel-wired-lan] " kan.liang
2016-07-18  6:55 ` [RFC PATCH 02/30] net/netpolicy: init " kan.liang
2016-07-18  6:55   ` [Intel-wired-lan] " kan.liang
2016-07-18  6:55 ` [RFC PATCH 03/30] i40e/netpolicy: Implement ndo_netpolicy_init kan.liang
2016-07-18  6:55   ` [Intel-wired-lan] " kan.liang
2016-07-18  6:55 ` [RFC PATCH 04/30] net/netpolicy: get driver information kan.liang
2016-07-18  6:55   ` [Intel-wired-lan] " kan.liang
2016-07-18  6:55 ` [RFC PATCH 05/30] i40e/netpolicy: implement ndo_get_irq_info kan.liang
2016-07-18  6:55   ` [Intel-wired-lan] " kan.liang
2016-07-18  6:56 ` [RFC PATCH 06/30] net/netpolicy: get CPU information kan.liang
2016-07-18  6:56   ` [Intel-wired-lan] " kan.liang
2016-07-18  6:56 ` [RFC PATCH 07/30] net/netpolicy: create CPU and queue mapping kan.liang
2016-07-18  6:56   ` [Intel-wired-lan] " kan.liang
2016-07-18  6:56 ` [RFC PATCH 08/30] net/netpolicy: set and remove irq affinity kan.liang
2016-07-18  6:56   ` [Intel-wired-lan] " kan.liang
2016-07-18  6:56 ` [RFC PATCH 09/30] net/netpolicy: enable and disable net policy kan.liang
2016-07-18  6:56   ` [Intel-wired-lan] " kan.liang
2016-07-18  6:56 ` [RFC PATCH 10/30] net/netpolicy: introduce netpolicy object kan.liang
2016-07-18  6:56   ` [Intel-wired-lan] " kan.liang
2016-07-18  6:56 ` [RFC PATCH 11/30] net/netpolicy: set net policy by policy name kan.liang
2016-07-18  6:56   ` [Intel-wired-lan] " kan.liang
2016-07-18  6:56 ` [RFC PATCH 12/30] i40e/netpolicy: implement ndo_set_net_policy kan.liang
2016-07-18  6:56   ` [Intel-wired-lan] " kan.liang
2016-07-18  6:56 ` [RFC PATCH 13/30] i40e/netpolicy: add three new net policies kan.liang
2016-07-18  6:56   ` [Intel-wired-lan] " kan.liang
2016-07-18  6:56 ` [RFC PATCH 14/30] net/netpolicy: add MIX policy kan.liang
2016-07-18  6:56   ` [Intel-wired-lan] " kan.liang
2016-07-18  6:56 ` [RFC PATCH 15/30] i40e/netpolicy: add MIX policy support kan.liang
2016-07-18  6:56   ` [Intel-wired-lan] " kan.liang
2016-07-18  6:56 ` [RFC PATCH 16/30] net/netpolicy: net device hotplug kan.liang
2016-07-18  6:56   ` [Intel-wired-lan] " kan.liang
2016-07-18  6:56 ` [RFC PATCH 17/30] net/netpolicy: support CPU hotplug kan.liang
2016-07-18  6:56   ` [Intel-wired-lan] " kan.liang
2016-07-18  6:56 ` [RFC PATCH 18/30] net/netpolicy: handle channel changes kan.liang
2016-07-18  6:56   ` [Intel-wired-lan] " kan.liang
2016-07-18  6:56 ` [RFC PATCH 19/30] net/netpolicy: implement netpolicy register kan.liang
2016-07-18  6:56   ` [Intel-wired-lan] " kan.liang
2016-07-18  6:56 ` [RFC PATCH 20/30] net/netpolicy: introduce per socket netpolicy kan.liang
2016-07-18  6:56   ` [Intel-wired-lan] " kan.liang
2016-07-18  6:56 ` [RFC PATCH 21/30] net/policy: introduce netpolicy_pick_queue kan.liang
2016-07-18  6:56   ` [Intel-wired-lan] " kan.liang
2016-07-18  6:56 ` [RFC PATCH 22/30] net/netpolicy: set tx queues according to policy kan.liang
2016-07-18  6:56   ` [Intel-wired-lan] " kan.liang
2016-07-18  6:56 ` [RFC PATCH 23/30] i40e/ethtool: support RX_CLS_LOC_ANY kan.liang
2016-07-18  6:56   ` [Intel-wired-lan] " kan.liang
2016-07-18 16:21   ` Alexander Duyck
2016-07-18 16:21     ` [Intel-wired-lan] " Alexander Duyck
2016-07-18  6:56 ` [RFC PATCH 24/30] net/netpolicy: set rx queues according to policy kan.liang
2016-07-18  6:56   ` [Intel-wired-lan] " kan.liang
2016-07-18  6:56 ` [RFC PATCH 25/30] net/netpolicy: introduce per task net policy kan.liang
2016-07-18  6:56   ` [Intel-wired-lan] " kan.liang
2016-07-18  6:56 ` [RFC PATCH 26/30] net/netpolicy: set per task policy by proc kan.liang
2016-07-18  6:56   ` [Intel-wired-lan] " kan.liang
2016-07-18  6:56 ` [RFC PATCH 27/30] net/netpolicy: fast path for finding the queues kan.liang
2016-07-18  6:56   ` [Intel-wired-lan] " kan.liang
2016-07-18  6:56 ` [RFC PATCH 28/30] net/netpolicy: optimize for queue pair kan.liang
2016-07-18  6:56   ` [Intel-wired-lan] " kan.liang
2016-07-18  6:56 ` [RFC PATCH 29/30] net/netpolicy: limit the total record number kan.liang
2016-07-18  6:56   ` [Intel-wired-lan] " kan.liang
2016-07-18  6:56 ` [RFC PATCH 30/30] Documentation/networking: Document net policy kan.liang
2016-07-18  6:56   ` [Intel-wired-lan] " kan.liang
2016-07-18 16:58   ` Randy Dunlap
2016-07-18 16:58     ` [Intel-wired-lan] " Randy Dunlap
2016-07-18 15:18 ` [RFC PATCH 00/30] Kernel NET policy Florian Westphal
2016-07-18 15:18   ` [Intel-wired-lan] " Florian Westphal
2016-07-18 15:45   ` Andi Kleen
2016-07-18 15:45     ` [Intel-wired-lan] " Andi Kleen
2016-07-18 17:52     ` Cong Wang
2016-07-18 17:52       ` [Intel-wired-lan] " Cong Wang
2016-07-18 20:14       ` Liang, Kan
2016-07-18 20:14         ` [Intel-wired-lan] " Liang, Kan
2016-07-18 20:19         ` Cong Wang
2016-07-18 20:19           ` [Intel-wired-lan] " Cong Wang
2016-07-18 20:19           ` Cong Wang
2016-07-18 20:24           ` Liang, Kan
2016-07-18 20:24             ` [Intel-wired-lan] " Liang, Kan
2016-07-18 20:24             ` Liang, Kan
2016-07-18 19:04     ` Hannes Frederic Sowa
2016-07-18 19:04       ` [Intel-wired-lan] " Hannes Frederic Sowa
2016-07-18 19:43       ` Andi Kleen
2016-07-18 19:43         ` [Intel-wired-lan] " Andi Kleen
2016-07-18 21:51         ` Hannes Frederic Sowa
2016-07-18 21:51           ` [Intel-wired-lan] " Hannes Frederic Sowa
2016-07-19  1:49           ` Liang, Kan
2016-07-19  1:49             ` [Intel-wired-lan] " Liang, Kan
2016-07-19  1:49             ` Liang, Kan
2016-07-19  5:03             ` David Miller
2016-07-19  5:03               ` [Intel-wired-lan] " David Miller
2016-07-19 13:43               ` Liang, Kan
2016-07-19 13:43                 ` [Intel-wired-lan] " Liang, Kan
2016-07-19 13:43                 ` Liang, Kan
2016-07-18 15:51   ` Liang, Kan
2016-07-18 15:51     ` [Intel-wired-lan] " Liang, Kan
2016-07-18 15:51     ` Liang, Kan
2016-07-18 16:17     ` Florian Westphal
2016-07-18 16:17       ` [Intel-wired-lan] " Florian Westphal
2016-07-18 16:17       ` Florian Westphal
2016-07-18 17:40       ` Liang, Kan
2016-07-18 17:40         ` [Intel-wired-lan] " Liang, Kan
2016-07-18 17:40         ` Liang, Kan
2016-07-18 16:34     ` Tom Herbert
2016-07-18 16:34       ` [Intel-wired-lan] " Tom Herbert
2016-07-18 16:34       ` Tom Herbert
2016-07-18 17:58       ` Liang, Kan
2016-07-18 17:58         ` [Intel-wired-lan] " Liang, Kan
2016-07-18 17:58         ` Liang, Kan
2016-07-18 16:22 ` Daniel Borkmann
2016-07-18 16:22   ` [Intel-wired-lan] " Daniel Borkmann
2016-07-18 18:30   ` Liang, Kan
2016-07-18 18:30     ` [Intel-wired-lan] " Liang, Kan
2016-07-18 20:51     ` Daniel Borkmann
2016-07-18 20:51       ` [Intel-wired-lan] " Daniel Borkmann
2016-07-18 17:00 ` Alexander Duyck
2016-07-18 17:00   ` [Intel-wired-lan] " Alexander Duyck
2016-07-18 19:45   ` Liang, Kan
2016-07-18 19:45     ` [Intel-wired-lan] " Liang, Kan
2016-07-18 19:45     ` Liang, Kan
2016-07-18 19:49     ` Andi Kleen
2016-07-18 19:49       ` [Intel-wired-lan] " Andi Kleen
2016-07-18 19:49       ` Andi Kleen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.