* [RFC PATCH 00/11] ptq: Per Thread Queues
@ 2020-06-24 17:17 Tom Herbert
  2020-06-24 17:17 ` [RFC PATCH 01/11] cgroup: Export cgroup_{procs,threads}_start and cgroup_procs_next Tom Herbert
                   ` (10 more replies)
  0 siblings, 11 replies; 26+ messages in thread
From: Tom Herbert @ 2020-06-24 17:17 UTC (permalink / raw)
  To: netdev; +Cc: Tom Herbert

Per Thread Queues allows application threads to be assigned dedicated
hardware network queues for both transmit and receive. This facility
provides a high degree of traffic isolation between applications and
can also help facilitate high performance due to fine-grained packet
steering. An overview and design considerations of Per Thread Queues
have been added to Documentation/networking/scaling.rst.

This patch set provides a basic implementation of Per Thread Queues.
The patch set includes:

	- Minor infrastructure changes to cgroups (just export a
	  couple of functions)
	- netqueue.h to hold generic definitions for network queues
	- Minor infrastructure in aRFS and net-sysfs to accommodate
	  PTQ
	- Introduce the concept of "global queues". These are used
	  in cgroup configuration of PTQ. Global queues can be
	  mapped to real device queues. A per device queue sysfs
	  parameter is added to configure the mapping of device
	  queue to a global queue
	- Creation of a new cgroup controller, "net_queues", that
	  is used to configure Per Thread Queues
	- Hook up the transmit path. This has two parts: 1) In
	  send socket operations record the transmit queue
	  associated with a task in the sock structure, 2) In
	  netdev_pick_tx, check if the sock structure of the skb
	  has a valid transmit global queue set. If so, convert the
	  queue identifier to a device queue identifier based on the per
	  device mapping table. This selection precedes XPS
	- Hook up the receive path. This has two parts: 1) In
	  rps_record_sock_flow check if a receive global queue is
	  assigned to the running task, if so then set it in the
	  sock_flow_table entry for the flow. Note this is in lieu of
	  setting the running CPU in the entry. 2) Change get_rps_cpu to
	  query the sock_flow_table to see if a queue index has been
	  stored (as opposed to a CPU number). If a queue index is
	  present, use it for steering including for it to be the
	  target of ndo_rx_flow_steer.
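To make the transmit side hookup above concrete, here is a minimal
userspace sketch of the global-queue-to-device-queue resolution. All
identifiers here (ptq_resolve_tx_queue, struct gq_map, struct
sock_model) are hypothetical stand-ins for illustration, not the
patch's actual symbols; NO_QUEUE models the sentinel the cover letter
says is defined in netqueue.h.

```c
#include <stdint.h>
#include <stddef.h>

#define NO_QUEUE 0xFFFFU  /* sentinel: no global queue assigned/mapped */

/* Per device table mapping global queue ids to device tx queue ids */
struct gq_map {
	uint16_t dev_queue[64];	/* indexed by global queue id */
	size_t len;		/* number of valid global queue ids */
};

/* Model of the field recorded in the sock structure at send time */
struct sock_model {
	uint16_t tx_global_queue;
};

/* Return the device queue for the sock's global queue, or -1 to fall
 * through to the normal selection path (XPS etc.). This models the
 * check in netdev_pick_tx described above: PTQ selection precedes XPS.
 */
static int ptq_resolve_tx_queue(const struct sock_model *sk,
				const struct gq_map *map)
{
	uint16_t gq = sk->tx_global_queue;

	if (gq == NO_QUEUE || gq >= map->len)
		return -1;		/* no valid global queue set */
	if (map->dev_queue[gq] == NO_QUEUE)
		return -1;		/* global queue unmapped on device */
	return map->dev_queue[gq];
}
```

The receive side is the mirror image: instead of a CPU number, the
queue identifier is stored in the sock_flow_table entry and resolved
in get_rps_cpu, as described in the last bullet above.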

Related features and concepts:

	- netprio and prio_tc_map: Similar to those, PTQ allows control,
	  via cgroups and per device maps, over mapping applications'
	  packets to transmit queues. However, PTQ is intended to
	  perform fine grained per application mapping to queues such
	  that each application thread, possibly thousands of them, can
	  have its own dedicated transmit queue.
	- aRFS: On the transmit side PTQ extends aRFS to steer packets
	  for a flow based on assigned global queue as opposed to only
	  running CPU for the processing thread. In PTQ, the queue
	  "follows" the thread so that when threads are scheduled to
	  run on a different CPU, the packets for flows of the thread
	  continue to be received on the right queue. This addresses
	  a problem in aRFS where, when a thread is rescheduled, all
	  of its aRFS-steered flows may be moved to a different queue
	  (i.e. ndo_rx_flow_steer needs to be called for each flow).
	- Busy polling: PTQ silos an application's packets into
	  queues, and busy polling of those queues can then be
	  applied for high performance. It is unlikely that the
	  first instantiation of PTQ would be combined with busy
	  polling (moving interrupts for those queues as threads are
	  scheduled is most likely prohibitive). Busy polling is only
	  practical with a few queues, perhaps at most one per CPU,
	  and won't scale to thousands of per thread queues in use
	  (to address that, sleeping-busy-poll with completion
	  queues is suggested below).
	- Making Networking Queues a First Class Citizen in the Kernel
	  https://linuxplumbersconf.org/event/4/contributions/462/
	  attachments/241/422/LPC_2019_kernel_queue_manager.pdf:
	  The concept of "global queues" should be a good complement
	  to this proposal. Global queues provide an abstract
	  representation of device queues. The abstraction is resolved
	  when a global queue is mapped to a real hardware queue. This
	  layering allows exposing queues to the user and configuration
	  which might be associated with general attributes (like high
	  priority, QoS characteristics, etc.). The mapping to a
	  specific device queue gives the low level queue that satisfies
	  the implied service of the global queue. Any attributes and
	  associations are configured and in no way hardcoded, so that
	  the use of queues in this manner is fully extensible and can
	  be driven by arbitrary user defined policy. Since global
	  queues are device agnostic, they can be managed not just as
	  a local system resource, but also across the distributed
	  tasks for a job in the datacenter, for example as a property
	  of a container in Kubernetes (similar to how we might manage
	  network priority as a global DC resource, but global queues
	  provide much more granularity and richness in what they can
	  convey).
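As a concrete, purely illustrative sketch of how the per device sysfs
parameter and the net_queues cgroup controller described above might be
exercised: note that the file names and paths below are assumptions
made up for this example, not the interface actually defined by the
patches.

```shell
# Hypothetical paths for illustration only.

# Map device tx queue 0 of eth0 to global queue 100 via the per
# device queue sysfs parameter.
echo 100 > /sys/class/net/eth0/queues/tx-0/global_queue_mapping

# Assign a range of global queues to an application's threads via
# the net_queues cgroup controller.
echo "100-163" > /sys/fs/cgroup/myapp/net_queues.tx_queues
```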

There are a number of possible extensions to this work:

	- Queue selection could be done on a per process basis
	  or a per socket basis as well as a per thread basis. (A per
	  packet basis probably makes little sense due to out-of-order
	  delivery.)
	- The mechanism for selecting a queue to assign to a thread
	  could be programmed. For instance, an eBPF hook could be
	  added that would allow very fine grained policies to do
	  queue selection.
	- "Global queue groups" could be created where a global queue
	  identifier maps to some group of device queues and there is
	  a selection algorithm, possibly another eBPF hook, that
	  maps to a specific device queue for use.
	- Another attribute in the cgroup could be added to enable
	  or disable aRFS on a per thread basis.
	- Extend the net_queues cgroup to allow control over
	  busy-polling on a per cgroup basis. This could further
	  be enhanced by eBPF hooks to control busy-polling for
	  individual sockets of the cgroup per some arbitrary policy
	  (similar to the eBPF hook for SO_REUSEPORT).
	- Elasticity in listener sockets. As described in the
	  documentation, we expect that a filter can be installed to
	  direct an application's packets to the set of queues for
	  the application. The problem is that the application may
	  create threads on demand, so we don't know a priori
	  how many queues the application needs. Optimally, we
	  want a mechanism to dynamically enable/disable a
	  queue in the filter set so that at any given time the
	  application is receiving packets only on queues it is
	  actively using. This may entail a new ndo function.
	- The sleeping-busy-poll with completion queue model
	  described in the documentation could be integrated. This
	  would mostly entail creating a reverse mapping from queue
	  to threads, and then allowing the thread processing a
	  device completion queue to schedule the threads of interest.
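The "global queue groups" extension above can be sketched in a few
lines of illustrative userspace C. The names (struct gq_group,
gq_group_select) and the hash-modulo selection are hypothetical; the
point is only that a group resolves to one device queue through a
pluggable selection function, which could just as well be an eBPF hook.

```c
#include <stdint.h>
#include <stddef.h>

/* A global queue identifier resolving to a group of device queues */
struct gq_group {
	const uint16_t *dev_queues;	/* device queues in the group */
	size_t count;			/* number of queues in the group */
};

/* Stand-in for a pluggable selection algorithm (e.g. an eBPF hook):
 * here, a trivial flow-hash modulo the group size. */
static uint16_t gq_group_select(const struct gq_group *grp,
				uint32_t flow_hash)
{
	return grp->dev_queues[flow_hash % grp->count];
}
```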


Tom Herbert (11):
  cgroup: Export cgroup_{procs,threads}_start and cgroup_procs_next
  net: Create netqueue.h and define NO_QUEUE
  arfs: Create set_arfs_queue
  net-sysfs: Create rps_create_sock_flow_table
  net: Infrastructure for per queue aRFS
  net: Function to check against maximum number for RPS queues
  net: Introduce global queues
  ptq: Per Thread Queues
  ptq: Hook up transmit side of Per Queue Threads
  ptq: Hook up receive side of Per Queue Threads
  doc: Documentation for Per Thread Queues

 Documentation/networking/scaling.rst | 195 +++++++-
 include/linux/cgroup.h               |   3 +
 include/linux/cgroup_subsys.h        |   4 +
 include/linux/netdevice.h            | 204 +++++++-
 include/linux/netqueue.h             |  25 +
 include/linux/sched.h                |   4 +
 include/net/ptq.h                    |  45 ++
 include/net/sock.h                   |  75 ++-
 kernel/cgroup/cgroup.c               |   9 +-
 kernel/fork.c                        |   4 +
 net/Kconfig                          |  18 +
 net/core/Makefile                    |   1 +
 net/core/dev.c                       | 177 +++++--
 net/core/filter.c                    |   4 +-
 net/core/net-sysfs.c                 | 201 +++++++-
 net/core/ptq.c                       | 688 +++++++++++++++++++++++++++
 net/core/sysctl_net_core.c           | 152 ++++--
 net/ipv4/af_inet.c                   |   6 +
 18 files changed, 1693 insertions(+), 122 deletions(-)
 create mode 100644 include/linux/netqueue.h
 create mode 100644 include/net/ptq.h
 create mode 100644 net/core/ptq.c

-- 
2.25.1


* Re: [RFC PATCH 05/11] net: Infrastructure for per queue aRFS
@ 2020-06-24 21:50 kernel test robot
  2020-06-28  8:56 ` Rong Chen
  0 siblings, 1 reply; 26+ messages in thread
From: kernel test robot @ 2020-06-24 21:50 UTC (permalink / raw)
  To: kbuild

[-- Attachment #1: Type: text/plain, Size: 7829 bytes --]

CC: kbuild-all(a)lists.01.org
In-Reply-To: <20200624171749.11927-6-tom@herbertland.com>
References: <20200624171749.11927-6-tom@herbertland.com>
TO: Tom Herbert <tom@herbertland.com>

Hi Tom,

[FYI, it's a private test report for your RFC patch.]
[auto build test WARNING on net/master]
[also build test WARNING on ipvs/master net-next/master linus/master v5.8-rc2 next-20200624]
[cannot apply to cgroup/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use  as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Tom-Herbert/ptq-Per-Thread-Queues/20200625-012135
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 0275875530f692c725c6f993aced2eca2d6ac50c
:::::: branch date: 4 hours ago
:::::: commit date: 4 hours ago
config: s390-randconfig-s031-20200624 (attached as .config)
compiler: s390-linux-gcc (GCC) 9.3.0
reproduce:
        # apt-get install sparse
        # sparse version: v0.6.2-dirty
        # save the attached .config to linux build tree
        make W=1 C=1 ARCH=s390 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__'

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


sparse warnings: (new ones prefixed by >>)

   net/core/dev.c:3264:23: sparse: sparse: incorrect type in argument 4 (different base types) @@     expected restricted __wsum [usertype] csum @@     got unsigned int @@
   net/core/dev.c:3264:23: sparse:     expected restricted __wsum [usertype] csum
   net/core/dev.c:3264:23: sparse:     got unsigned int
   net/core/dev.c:3264:23: sparse: sparse: cast from restricted __wsum
   net/core/dev.c:3264:23: sparse: sparse: incorrect type in argument 4 (different base types) @@     expected restricted __wsum [usertype] csum @@     got unsigned int @@
   net/core/dev.c:3264:23: sparse:     expected restricted __wsum [usertype] csum
   net/core/dev.c:3264:23: sparse:     got unsigned int
   net/core/dev.c:3264:23: sparse: sparse: incorrect type in argument 1 (different base types) @@     expected unsigned int [usertype] val @@     got restricted __wsum @@
   net/core/dev.c:3264:23: sparse:     expected unsigned int [usertype] val
   net/core/dev.c:3264:23: sparse:     got restricted __wsum
   net/core/dev.c:3264:23: sparse: sparse: incorrect type in argument 4 (different base types) @@     expected restricted __wsum [usertype] csum @@     got unsigned int @@
   net/core/dev.c:3264:23: sparse:     expected restricted __wsum [usertype] csum
   net/core/dev.c:3264:23: sparse:     got unsigned int
   net/core/dev.c:3264:23: sparse: sparse: cast from restricted __wsum
   net/core/dev.c:3264:23: sparse: sparse: incorrect type in argument 4 (different base types) @@     expected restricted __wsum [usertype] csum @@     got unsigned int @@
   net/core/dev.c:3264:23: sparse:     expected restricted __wsum [usertype] csum
   net/core/dev.c:3264:23: sparse:     got unsigned int
   net/core/dev.c:3264:23: sparse: sparse: cast from restricted __wsum
   net/core/dev.c:3264:23: sparse: sparse: incorrect type in argument 4 (different base types) @@     expected restricted __wsum [usertype] csum @@     got unsigned int @@
   net/core/dev.c:3264:23: sparse:     expected restricted __wsum [usertype] csum
   net/core/dev.c:3264:23: sparse:     got unsigned int
   net/core/dev.c:3264:23: sparse: sparse: cast from restricted __wsum
   net/core/dev.c:3264:23: sparse: sparse: incorrect type in argument 4 (different base types) @@     expected restricted __wsum [usertype] csum @@     got unsigned int @@
   net/core/dev.c:3264:23: sparse:     expected restricted __wsum [usertype] csum
   net/core/dev.c:3264:23: sparse:     got unsigned int
   net/core/dev.c:3264:23: sparse: sparse: cast from restricted __wsum
>> net/core/dev.c:4451:27: sparse: sparse: cast to non-scalar
>> net/core/dev.c:4451:27: sparse: sparse: cast from non-scalar
   net/core/dev.c:5614:1: sparse: sparse: symbol '__pcpu_scope_flush_works' was not declared. Should it be static?
   net/core/dev.c:3747:26: sparse: sparse: context imbalance in '__dev_queue_xmit' - different lock contexts for basic block
   net/core/dev.c:4922:44: sparse: sparse: context imbalance in 'net_tx_action' - unexpected unlock

# https://github.com/0day-ci/linux/commit/8cf630e2a48d7b6e18be2f46f90cebf8ec5d506c
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout 8cf630e2a48d7b6e18be2f46f90cebf8ec5d506c
vim +4451 net/core/dev.c

c445477d74ab37 Ben Hutchings 2011-01-19  4426  
c445477d74ab37 Ben Hutchings 2011-01-19  4427  /**
c445477d74ab37 Ben Hutchings 2011-01-19  4428   * rps_may_expire_flow - check whether an RFS hardware filter may be removed
c445477d74ab37 Ben Hutchings 2011-01-19  4429   * @dev: Device on which the filter was set
c445477d74ab37 Ben Hutchings 2011-01-19  4430   * @rxq_index: RX queue index
c445477d74ab37 Ben Hutchings 2011-01-19  4431   * @flow_id: Flow ID passed to ndo_rx_flow_steer()
c445477d74ab37 Ben Hutchings 2011-01-19  4432   * @filter_id: Filter ID returned by ndo_rx_flow_steer()
c445477d74ab37 Ben Hutchings 2011-01-19  4433   *
c445477d74ab37 Ben Hutchings 2011-01-19  4434   * Drivers that implement ndo_rx_flow_steer() should periodically call
c445477d74ab37 Ben Hutchings 2011-01-19  4435   * this function for each installed filter and remove the filters for
c445477d74ab37 Ben Hutchings 2011-01-19  4436   * which it returns %true.
c445477d74ab37 Ben Hutchings 2011-01-19  4437   */
c445477d74ab37 Ben Hutchings 2011-01-19  4438  bool rps_may_expire_flow(struct net_device *dev, u16 rxq_index,
c445477d74ab37 Ben Hutchings 2011-01-19  4439  			 u32 flow_id, u16 filter_id)
c445477d74ab37 Ben Hutchings 2011-01-19  4440  {
c445477d74ab37 Ben Hutchings 2011-01-19  4441  	struct netdev_rx_queue *rxqueue = dev->_rx + rxq_index;
c445477d74ab37 Ben Hutchings 2011-01-19  4442  	struct rps_dev_flow_table *flow_table;
8cf630e2a48d7b Tom Herbert   2020-06-24  4443  	struct rps_cpu_qid cpu_qid;
c445477d74ab37 Ben Hutchings 2011-01-19  4444  	struct rps_dev_flow *rflow;
c445477d74ab37 Ben Hutchings 2011-01-19  4445  	bool expire = true;
c445477d74ab37 Ben Hutchings 2011-01-19  4446  
c445477d74ab37 Ben Hutchings 2011-01-19  4447  	rcu_read_lock();
c445477d74ab37 Ben Hutchings 2011-01-19  4448  	flow_table = rcu_dereference(rxqueue->rps_flow_table);
c445477d74ab37 Ben Hutchings 2011-01-19  4449  	if (flow_table && flow_id <= flow_table->mask) {
c445477d74ab37 Ben Hutchings 2011-01-19  4450  		rflow = &flow_table->flows[flow_id];
8cf630e2a48d7b Tom Herbert   2020-06-24 @4451  		cpu_qid = READ_ONCE(rflow->cpu_qid);
8cf630e2a48d7b Tom Herbert   2020-06-24  4452  		if (rflow->filter == filter_id && !cpu_qid.use_qid &&
8cf630e2a48d7b Tom Herbert   2020-06-24  4453  		    cpu_qid.cpu < nr_cpu_ids &&
8cf630e2a48d7b Tom Herbert   2020-06-24  4454  		    ((int)(per_cpu(softnet_data, cpu_qid.cpu).input_queue_head -
c445477d74ab37 Ben Hutchings 2011-01-19  4455  			   rflow->last_qtail) <
c445477d74ab37 Ben Hutchings 2011-01-19  4456  		     (int)(10 * flow_table->mask)))
c445477d74ab37 Ben Hutchings 2011-01-19  4457  			expire = false;
c445477d74ab37 Ben Hutchings 2011-01-19  4458  	}
c445477d74ab37 Ben Hutchings 2011-01-19  4459  	rcu_read_unlock();
c445477d74ab37 Ben Hutchings 2011-01-19  4460  	return expire;
c445477d74ab37 Ben Hutchings 2011-01-19  4461  }
c445477d74ab37 Ben Hutchings 2011-01-19  4462  EXPORT_SYMBOL(rps_may_expire_flow);
c445477d74ab37 Ben Hutchings 2011-01-19  4463  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 19564 bytes --]


end of thread, other threads:[~2020-06-30 21:06 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-24 17:17 [RFC PATCH 00/11] ptq: Per Thread Queues Tom Herbert
2020-06-24 17:17 ` [RFC PATCH 01/11] cgroup: Export cgroup_{procs,threads}_start and cgroup_procs_next Tom Herbert
2020-06-24 17:17 ` [RFC PATCH 02/11] net: Create netqueue.h and define NO_QUEUE Tom Herbert
2020-06-24 17:17 ` [RFC PATCH 03/11] arfs: Create set_arfs_queue Tom Herbert
2020-06-24 17:17 ` [RFC PATCH 04/11] net-sysfs: Create rps_create_sock_flow_table Tom Herbert
2020-06-24 17:17 ` [RFC PATCH 05/11] net: Infrastructure for per queue aRFS Tom Herbert
2020-06-28  8:55   ` kernel test robot
2020-06-24 17:17 ` [RFC PATCH 06/11] net: Function to check against maximum number for RPS queues Tom Herbert
2020-06-24 17:17 ` [RFC PATCH 07/11] net: Introduce global queues Tom Herbert
2020-06-24 23:00   ` kernel test robot
2020-06-24 23:58   ` kernel test robot
2020-06-25  0:23   ` kernel test robot
2020-06-30 21:06   ` Jonathan Lemon
2020-06-24 17:17 ` [RFC PATCH 08/11] ptq: Per Thread Queues Tom Herbert
2020-06-24 21:20   ` kernel test robot
2020-06-25  1:50   ` [RFC PATCH] ptq: null_pcdesc can be static kernel test robot
2020-06-25  7:26   ` [RFC PATCH 08/11] ptq: Per Thread Queues kernel test robot
2020-06-24 17:17 ` [RFC PATCH 09/11] ptq: Hook up transmit side of Per Queue Threads Tom Herbert
2020-06-24 17:17 ` [RFC PATCH 10/11] ptq: Hook up receive " Tom Herbert
2020-06-24 17:17 ` [RFC PATCH 11/11] doc: Documentation for Per Thread Queues Tom Herbert
2020-06-25  2:20   ` kernel test robot
2020-06-25 23:00   ` Jacob Keller
2020-06-29  6:28   ` Saeed Mahameed
2020-06-29 15:10     ` Tom Herbert
2020-06-24 21:50 [RFC PATCH 05/11] net: Infrastructure for per queue aRFS kernel test robot
2020-06-28  8:56 ` Rong Chen
