* Kernel interface to configure queue-group parameters
@ 2023-02-07 0:15 Nambiar, Amritha
2023-02-07 16:28 ` Alexander H Duyck
0 siblings, 1 reply; 10+ messages in thread
From: Nambiar, Amritha @ 2023-02-07 0:15 UTC (permalink / raw)
To: netdev
Cc: davem, kuba, edumazet, pabeni, Saeed Mahameed, alexander.duyck,
Samudrala, Sridhar
Hello,
We are looking for feedback on the kernel interface to configure
queue-group level parameters.
Queues are primary residents in the kernel and there are multiple
interfaces to configure queue-level parameters. For example, tx_maxrate
for a transmit queue can be controlled via the sysfs interface. Ethtool
is another option to change the RX/TX ring parameters of the specified
network device (example, rx-buf-len, tx-push etc.).
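To make the existing queue-level interfaces concrete, here is roughly what they look like today (the device and queue names below are placeholders; rx-buf-len and tx-push need a recent ethtool and driver support):

```shell
# Cap a single Tx queue's rate via sysfs; the value is in Mbps,
# 0 means unlimited.
echo 1000 > /sys/class/net/eth0/queues/tx-0/tx_maxrate

# Change ring parameters for the whole device via ethtool.
ethtool -G eth0 rx-buf-len 2048
ethtool -G eth0 tx-push on
```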
A queue_group is a set of queues grouped together into a single object.
For example, tx_queue_group-0 is a transmit queue_group with index 0 and
may contain transmit queues, say, 0-31; similarly, rx_queue_group-0 is a
receive queue_group with index 0 and may contain receive queues 0-31,
while tx/rx_queue_group-1 may consist of TX and RX queues, say, 32-127
respectively. Currently, upstream drivers for both ice and mlx5 support
creating TX and RX queue groups via the tc-mqprio and ethtool interfaces.
At this point, the kernel does not have an abstraction for queue_group.
A close equivalent in the kernel is a 'traffic class' which consists of
a set of transmit queues. Today, traffic classes are created using TC's
mqprio scheduler. Only a limited set of parameters can be configured on
each traffic class via mqprio, example priority per traffic class, min
and max bandwidth rates per traffic class etc. Mqprio also supports
offload of these parameters to the hardware. The parameters set for the
traffic class (tx queue_group) are applicable to all transmit queues
belonging to the queue_group. However, introducing additional parameters
for queue_groups and configuring them via mqprio makes the interface
less user-friendly (as the command line gets cumbersome due to the
number of qdisc parameters). Although mqprio is the interface to create
transmit queue_groups, and is also the interface to configure and
offload certain transmit queue_group parameters, due to these
limitations we are wondering if it is worth considering other interface
options for configuring queue_group parameters.
Likewise, receive queue_groups can be created using the ethtool
interface as RSS contexts. Next step would be to configure
per-rx_queue_group parameters. Based on the discussion in
https://lore.kernel.org/netdev/20221114091559.7e24c7de@kernel.org/,
it looks like ethtool may not be the right interface to configure
rx_queue_group parameters (that are unrelated to flow<->queue
assignment), example NAPI configurations on the queue_group.
The key gaps in the kernel to support queue-group parameters are:
1. 'queue_group' abstraction in the kernel for both TX and RX distinctly
2. Offload hooks for TX/RX queue_group parameters depending on the
chosen interface.
Following are the options we have investigated:
1. tc-mqprio:
Pros:
- Already supports creating queue_groups, offload of certain parameters
Cons:
- Introducing new parameters makes the interface less user-friendly.
TC qdisc parameters are specified at qdisc creation; the larger the
number of traffic classes and their respective parameters, the lower the
usability.
2. Ethtool:
Pros:
- Already creates RX queue_groups as RSS contexts
Cons:
- May not be the right interface for non-RSS related parameters
Example for configuring number of napi pollers for a queue group:
ethtool -X <iface> context <context_num> num_pollers <n>
3. sysfs:
Pros:
- Ideal to configure parameters such as NAPI/IRQ for Rx queue_group.
- Makes it possible to support some existing per-netdev napi
parameters like 'threaded' and 'napi_defer_hard_irqs' etc. to be
per-queue-group parameters.
Cons:
- Requires introducing new queue_group structures for TX and RX
queue groups and references for it, kset references for queue_group in
struct net_device
- Additional ndo ops in net_device_ops for each parameter for
hardware offload.
Examples :
/sys/class/net/<iface>/queue_groups/rxqg-<0-n>/num_pollers
/sys/class/net/<iface>/queue_groups/txqg-<0-n>/min_rate
4. Devlink:
Pros:
- New parameters can be added without any changes to the kernel or
userspace.
Cons:
- Queue/Queue_group is a function-wide entity, Devlink is for
device-wide stuff. Devlink being device centric is not suitable for
queue parameters such as rates, NAPI etc.
Thanks,
Amritha
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Kernel interface to configure queue-group parameters
2023-02-07 0:15 Kernel interface to configure queue-group parameters Nambiar, Amritha
@ 2023-02-07 16:28 ` Alexander H Duyck
2023-02-09 0:36 ` Jakub Kicinski
2023-02-16 10:35 ` Nambiar, Amritha
0 siblings, 2 replies; 10+ messages in thread
From: Alexander H Duyck @ 2023-02-07 16:28 UTC (permalink / raw)
To: Nambiar, Amritha, netdev
Cc: davem, kuba, edumazet, pabeni, Saeed Mahameed, Samudrala, Sridhar
On Mon, 2023-02-06 at 16:15 -0800, Nambiar, Amritha wrote:
> Hello,
>
> We are looking for feedback on the kernel interface to configure
> queue-group level parameters.
>
> Queues are primary residents in the kernel and there are multiple
> interfaces to configure queue-level parameters. For example, tx_maxrate
> for a transmit queue can be controlled via the sysfs interface. Ethtool
> is another option to change the RX/TX ring parameters of the specified
> network device (example, rx-buf-len, tx-push etc.).
>
> Queue_groups are a set of queues grouped together into a single object.
> For example, tx_queue_group-0 is a transmit queue_group with index 0 and
> can have transmit queues say 0-31, similarly rx_queue_group-0 is a
> receive queue_group with index 0 and can have receive queues 0-31,
> tx/rx_queue_group_1 may consist of TX and RX queues say 32-127
> respectively. Currently, upstream drivers for both ice and mlx5 support
> creating TX and RX queue groups via the tc-mqprio and ethtool interfaces.
>
> At this point, the kernel does not have an abstraction for queue_group.
> A close equivalent in the kernel is a 'traffic class' which consists of
> a set of transmit queues. Today, traffic classes are created using TC's
> mqprio scheduler. Only a limited set of parameters can be configured on
> each traffic class via mqprio, example priority per traffic class, min
> and max bandwidth rates per traffic class etc. Mqprio also supports
> offload of these parameters to the hardware. The parameters set for the
> traffic class (tx queue_group) are applicable to all transmit queues
> belonging to the queue_group. However, introducing additional parameters
> for queue_groups and configuring them via mqprio makes the interface
> less user-friendly (as the command line gets cumbersome due to the
> number of qdisc parameters). Although, mqprio is the interface to create
> transmit queue_groups, and is also the interface to configure and
> offload certain transmit queue_group parameters, due to these
> limitations we are wondering if it is worth considering other interface
> options for configuring queue_group parameters.
>
I think much of this depends on exactly what functionality we are
talking about. The problem is the Intel use case conflates interrupts
w/ queues w/ the applications themselves since what it is trying to do
is a poor imitation of RDMA being implemented using something akin to
VMDq last I knew.
So for example one of the things you are asking about below is
establishing a minimum rate for outgoing Tx packets. In my mind we
would probably want to use something like mqprio to set that up since
it is Tx rate limiting and if we were to configure it to happen in
software it would need to be handled in the Qdisc layer.
As far as the NAPI pollers attribute that seems like something that
needs further clarification. Are you limiting the number of busy poll
instances that can run on a single queue group? Is there a reason for
doing it per queue group instead of this being something that could be
enabled on a specific set of queues within the group?
> Likewise, receive queue_groups can be created using the ethtool
> interface as RSS contexts. Next step would be to configure
> per-rx_queue_group parameters. Based on the discussion in
> https://lore.kernel.org/netdev/20221114091559.7e24c7de@kernel.org/,
> it looks like ethtool may not be the right interface to configure
> rx_queue_group parameters (that are unrelated to flow<->queue
> assignment), example NAPI configurations on the queue_group.
>
> The key gaps in the kernel to support queue-group parameters are:
> 1. 'queue_group' abstraction in the kernel for both TX and RX distinctly
> 2. Offload hooks for TX/RX queue_group parameters depending on the
> chosen interface.
>
> Following are the options we have investigated:
>
> 1. tc-mqprio:
> Pros:
> - Already supports creating queue_groups, offload of certain parameters
>
> Cons:
> - Introducing new parameters makes the interface less user-friendly.
> TC qdisc parameters are specified at qdisc creation; the larger the
> number of traffic classes and their respective parameters, the lower the
> usability.
Yes and no. The TC layer is mostly meant for handling the Tx side of
things. For something like the rate limiting it might make sense since
there is already logic there to do it in mqprio. But if you are trying
to pull in NAPI or RSS attributes then I agree it would hurt usability.
> 2. Ethtool:
> Pros:
> - Already creates RX queue_groups as RSS contexts
>
> Cons:
> - May not be the right interface for non-RSS related parameters
>
> Example for configuring number of napi pollers for a queue group:
> ethtool -X <iface> context <context_num> num_pollers <n>
One thing that might make sense would be to look at adding a possible
alias for context that could work with something like DCB or the queue
groups use case. I believe that for DCB there is a similar issue where
the various priorities could have separate RSS contexts so it might
make sense to look at applying a similar logic. Also there has been
talk about trying to do the round robin on SYN type logic. That
might make sense to expose as a hfunc type since it would be overriding
RSS for TCP flows.
The num_pollers can be problematic though as we don't really have
anything like that in ethtool currently. Probably the closest thing I
can think of is interrupt moderation. It depends on if it has to be a
per-queue-group attribute or if it could be a per-queue attribute.
Specifically I am referring to the -Q option that is currently applied
to the coalescing functions in ethtool.
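For reference, the -Q/--per-queue mechanism mentioned here takes a hex bitmap of queue indices, roughly as below (eth0 and the mask are placeholders; driver support is required):

```shell
# Show coalescing settings for queues 0-3 only (queue_mask is a hex
# bitmap of queue indices).
ethtool --per-queue eth0 queue_mask 0x0f --show-coalesce

# Apply interrupt moderation to the same set of queues.
ethtool --per-queue eth0 queue_mask 0x0f --coalesce rx-usecs 50 tx-usecs 50
```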
> 3. sysfs:
> Pros:
> - Ideal to configure parameters such as NAPI/IRQ for Rx queue_group.
> - Makes it possible to support some existing per-netdev napi
> parameters like 'threaded' and 'napi_defer_hard_irqs' etc. to be
> per-queue-group parameters.
>
> Cons:
> - Requires introducing new queue_group structures for TX and RX
> queue groups and references for it, kset references for queue_group in
> struct net_device
> - Additional ndo ops in net_device_ops for each parameter for
> hardware offload.
>
> Examples :
> /sys/class/net/<iface>/queue_groups/rxqg-<0-n>/num_pollers
> /sys/class/net/<iface>/queue_groups/txqg-<0-n>/min_rate
So min_rate is something already handled in mqprio since it is so DCB
like. You are essentially guaranteeing bandwidth aren't you? Couldn't
you just define a bw_rlimit shaper for mqprio and then use the existing
bw_rlimit values to define the min_rate?
As far as adding the queue_groups interface one ugly bit would be that
we would probably need to have links between the queues and these
groups which would start to turn the sysfs into a tangled mess.
The biggest issue I see is that there isn't any sort of sysfs interface
exposed for NAPI which is what you would essentially need to justify
something like this since that is what you are modifying.
> 4. Devlink:
> Pros:
> - New parameters can be added without any changes to the kernel or
> userspace.
>
> Cons:
> - Queue/Queue_group is a function-wide entity, Devlink is for
> device-wide stuff. Devlink being device centric is not suitable for
> queue parameters such as rates, NAPI etc.
Yeah, I wouldn't expect something like this to be a good fit.
* Re: Kernel interface to configure queue-group parameters
2023-02-07 16:28 ` Alexander H Duyck
@ 2023-02-09 0:36 ` Jakub Kicinski
2023-02-16 10:34 ` Nambiar, Amritha
2023-02-16 10:35 ` Nambiar, Amritha
1 sibling, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2023-02-09 0:36 UTC (permalink / raw)
To: Alexander H Duyck
Cc: Nambiar, Amritha, netdev, davem, edumazet, pabeni,
Saeed Mahameed, Samudrala, Sridhar
On Tue, 07 Feb 2023 08:28:56 -0800 Alexander H Duyck wrote:
> I think much of this depends on exactly what functionality we are
> talking about.
Right, maybe we need to take a page out of the container's book and
concede that the best we can do is provide targeted APIs for slices of
the problem. Which someone in user space would have to combine.
> > 4. Devlink:
> > Pros:
> > - New parameters can be added without any changes to the kernel or
> > userspace.
> >
> > Cons:
> > - Queue/Queue_group is a function-wide entity, Devlink is for
> > device-wide stuff. Devlink being device centric is not suitable for
> > queue parameters such as rates, NAPI etc.
>
> Yeah, I wouldn't expect something like this to be a good fit.
Devlink has the hierarchical rate API for example.
Maybe we should (re)consider adding top level nodes for RSS contexts
there?
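For context, the devlink hierarchical rate API mentioned above groups rate leaves under user-created nodes, roughly as below (the device address, group name, and values are placeholders; exact syntax depends on the devlink and driver versions):

```shell
# Create a rate group node with shared and max rates.
devlink port function rate add pci/0000:03:00.0/group1 \
    tx_share 1gbit tx_max 5gbit

# Attach a port function's rate leaf to that group.
devlink port function rate set pci/0000:03:00.0/1 parent group1
```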
* Re: Kernel interface to configure queue-group parameters
2023-02-09 0:36 ` Jakub Kicinski
@ 2023-02-16 10:34 ` Nambiar, Amritha
0 siblings, 0 replies; 10+ messages in thread
From: Nambiar, Amritha @ 2023-02-16 10:34 UTC (permalink / raw)
To: Jakub Kicinski, Alexander H Duyck
Cc: netdev, davem, edumazet, pabeni, Saeed Mahameed, Samudrala, Sridhar
On 2/8/2023 4:36 PM, Jakub Kicinski wrote:
> On Tue, 07 Feb 2023 08:28:56 -0800 Alexander H Duyck wrote:
>> I think much of this depends on exactly what functionality we are
>> talking about.
>
> Right, maybe we need to take a page out of the container's book and
> concede that the best we can do is provide targeted APIs for slices of
> the problem. Which someone in user space would have to combine.
>
Agree, a common interface for various parameters for the queue-group
does not seem like a practical approach and the interface to use is
largely driven by the functionality itself.
>>> 4. Devlink:
>>> Pros:
>>> - New parameters can be added without any changes to the kernel or
>>> userspace.
>>>
>>> Cons:
>>> - Queue/Queue_group is a function-wide entity, Devlink is for
>>> device-wide stuff. Devlink being device centric is not suitable for
>>> queue parameters such as rates, NAPI etc.
>>
>> Yeah, I wouldn't expect something like this to be a good fit.
>
> Devlink has the hierarchical rate API for example.
> Maybe we should (re)consider adding top level nodes for RSS contexts
> there?
* Re: Kernel interface to configure queue-group parameters
2023-02-07 16:28 ` Alexander H Duyck
2023-02-09 0:36 ` Jakub Kicinski
@ 2023-02-16 10:35 ` Nambiar, Amritha
2023-02-16 17:32 ` Jakub Kicinski
2023-02-19 17:39 ` Alexander H Duyck
1 sibling, 2 replies; 10+ messages in thread
From: Nambiar, Amritha @ 2023-02-16 10:35 UTC (permalink / raw)
To: Alexander H Duyck, netdev
Cc: davem, kuba, edumazet, pabeni, Saeed Mahameed, Samudrala, Sridhar
On 2/7/2023 8:28 AM, Alexander H Duyck wrote:
> On Mon, 2023-02-06 at 16:15 -0800, Nambiar, Amritha wrote:
>> Hello,
>>
>> We are looking for feedback on the kernel interface to configure
>> queue-group level parameters.
>>
>> Queues are primary residents in the kernel and there are multiple
>> interfaces to configure queue-level parameters. For example, tx_maxrate
>> for a transmit queue can be controlled via the sysfs interface. Ethtool
>> is another option to change the RX/TX ring parameters of the specified
>> network device (example, rx-buf-len, tx-push etc.).
>>
>> Queue_groups are a set of queues grouped together into a single object.
>> For example, tx_queue_group-0 is a transmit queue_group with index 0 and
>> can have transmit queues say 0-31, similarly rx_queue_group-0 is a
>> receive queue_group with index 0 and can have receive queues 0-31,
>> tx/rx_queue_group_1 may consist of TX and RX queues say 32-127
>> respectively. Currently, upstream drivers for both ice and mlx5 support
>> creating TX and RX queue groups via the tc-mqprio and ethtool interfaces.
>>
>> At this point, the kernel does not have an abstraction for queue_group.
>> A close equivalent in the kernel is a 'traffic class' which consists of
>> a set of transmit queues. Today, traffic classes are created using TC's
>> mqprio scheduler. Only a limited set of parameters can be configured on
>> each traffic class via mqprio, example priority per traffic class, min
>> and max bandwidth rates per traffic class etc. Mqprio also supports
>> offload of these parameters to the hardware. The parameters set for the
>> traffic class (tx queue_group) are applicable to all transmit queues
>> belonging to the queue_group. However, introducing additional parameters
>> for queue_groups and configuring them via mqprio makes the interface
>> less user-friendly (as the command line gets cumbersome due to the
>> number of qdisc parameters). Although, mqprio is the interface to create
>> transmit queue_groups, and is also the interface to configure and
>> offload certain transmit queue_group parameters, due to these
>> limitations we are wondering if it is worth considering other interface
>> options for configuring queue_group parameters.
>>
>
> I think much of this depends on exactly what functionality we are
> talking about. The problem is the Intel use case conflates interrupts
> w/ queues w/ the applications themselves since what it is trying to do
> is a poor imitation of RDMA being implemented using something akin to
> VMDq last I knew.
>
> So for example one of the things you are asking about below is
> establishing a minimum rate for outgoing Tx packets. In my mind we
> would probably want to use something like mqprio to set that up since
> it is Tx rate limiting and if we were to configure it to happen in
> software it would need to be handled in the Qdisc layer.
>
Configuring min and max rates for outgoing TX packets is already
supported in the ice driver using mqprio. The issue is that dynamically
changing the rates per traffic class/queue_group via mqprio is not
straightforward, the "tc qdisc change" command will need all the rates
for traffic classes again, even for the tcs where rates are not being
changed.
For example, here's the sample command to configure min and max rates on
4 TX queue groups:
# tc qdisc add dev $iface root mqprio \
num_tc 4 \
map 0 1 2 3 \
queues 2@0 4@2 8@6 16@14 \
hw 1 mode channel \
shaper bw_rlimit \
min_rate 1Gbit 2Gbit 2Gbit 1Gbit \
max_rate 4Gbit 5Gbit 5Gbit 10Gbit
Now, changing TX min_rate for TC1 to 20 Gbit:
# tc qdisc change dev $iface root mqprio \
shaper bw_rlimit min_rate 1Gbit 20Gbit 2Gbit 1Gbit
Although this is not a major concern, I was looking for the simplicity
that something like sysfs provides with tx_maxrate for a queue, so that
when there is a large number of TCs, only the ones being changed need
to be touched (if we were to have sysfs rates per queue_group).
> As far as the NAPI pollers attribute that seems like something that
> needs further clarification. Are you limiting the number of busy poll
> instances that can run on a single queue group? Is there a reason for
> doing it per queue group instead of this being something that could be
> enabled on a specific set of queues within the group?
>
Yes, we are trying to limit the number of napi instances for the queues
within a queue-group. Some options we could use:
1. A 'num_pollers' attribute at the queue_group level - The initial idea
was to configure the number of poller threads that would be handling the
queues within the queue_group, as an example, a num_poller value of 4 on
a queue_group consisting of 4 queues would imply that there is a poller
per queue. This could also be changed to something like a single poller
for all 4 queues within the group.
2. A poller bitmap for each queue (both TX and RX) - The main concern
with the queue-level maps is that it would still be nice to have a
higher level queue-group isolation, so that a poller is not shared among
queues belonging to distinct queue-groups. Also, a queue-group level
config would consolidate the mapping of queues and vectors in the driver
in batches, instead of the driver having to update the queue<->vector
mapping in response to each queue's poller configuration.
But we could do away with having these at queue-group level, and instead
use a different method as the third option below:
3. A queues bitmap per napi instance - So the default arrangement today
is 1:1 mapping between queues and interrupt vectors and hence 1:1
queue<->poller association. If the user could configure one interrupt
vector to serve different queues, these queues can be serviced by the
poller/napi instance for the vector.
One way to do this is to have a bitmap of queues for each IRQ allocated
for the device (similar to smp_affinity CPUs bitmap for the given IRQ).
So, /sys/class/net/<iface>/device/msi_irqs/ lists all the IRQs
associated with the network interface. If the IRQ can take an additional
attribute like queues_affinity for the IRQs on the network device (use
/sys/class/net/<iface>/device/msi_irqs/N/ since queues_affinity would be
specific to the network subsystem), this would enable multiple queues
<-> single vector association configurable by the user. The driver would
validate that a queue is not mapped to multiple interrupts. This way an
interrupt can be shared among different queues as configured by the user.
Another approach is to expose the napi-ids via sysfs and support
per-napi attributes.
/sys/class/net/<iface>/napis/napi<0-N>
Each numbered sub-directory N contains attributes of that napi. A
'napi_queues' attribute would be a bitmap of queues associated with the
napi-N enabling many queues <-> single napi association. Example,
/sys/class/net/<iface>/napis/napi-N/napi_queues
We also plan to introduce an additional napi attribute for each napi
instance called 'poller_timeout' indicating the timeout in jiffies.
Exposing the napi-ids would also enable moving some existing napi
attributes such as 'threaded' and 'napi_defer_hard_irqs' etc. (which are
currently per netdev) to be per napi instance.
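To make the proposal concrete, the hypothetical per-napi sysfs layout sketched above might be exercised as below (every path and attribute name here is part of the proposal, not an existing kernel interface):

```shell
# Hypothetical: associate queues 0 and 1 (bitmap 0x3) with napi-0.
echo 3 > /sys/class/net/$IFACE/napis/napi-0/napi_queues

# Hypothetical: set the poller timeout (in jiffies) for napi-0.
echo 200 > /sys/class/net/$IFACE/napis/napi-0/poller_timeout
```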
>> Likewise, receive queue_groups can be created using the ethtool
>> interface as RSS contexts. Next step would be to configure
>> per-rx_queue_group parameters. Based on the discussion in
>> https://lore.kernel.org/netdev/20221114091559.7e24c7de@kernel.org/,
>> it looks like ethtool may not be the right interface to configure
>> rx_queue_group parameters (that are unrelated to flow<->queue
>> assignment), example NAPI configurations on the queue_group.
>>
>> The key gaps in the kernel to support queue-group parameters are:
>> 1. 'queue_group' abstraction in the kernel for both TX and RX distinctly
>> 2. Offload hooks for TX/RX queue_group parameters depending on the
>> chosen interface.
>>
>> Following are the options we have investigated:
>>
>> 1. tc-mqprio:
>> Pros:
>> - Already supports creating queue_groups, offload of certain parameters
>>
>> Cons:
>> - Introducing new parameters makes the interface less user-friendly.
>> TC qdisc parameters are specified at qdisc creation; the larger the
>> number of traffic classes and their respective parameters, the lower the
>> usability.
>
> Yes and no. The TC layer is mostly meant for handling the Tx side of
> things. For something like the rate limiting it might make sense since
> there is already logic there to do it in mqprio. But if you are trying
> to pull in NAPI or RSS attributes then I agree it would hurt usability.
>
The TX queue-group parameters supported via mqprio are limited to
priority, min and max rates. I think extending mqprio for a larger set
of TX parameters beyond just rates (say max burst) would bloat up the
command line. But yes, I agree, the TC layer is not the place for NAPI
attributes on TX queues.
>> 2. Ethtool:
>> Pros:
>> - Already creates RX queue_groups as RSS contexts
>>
>> Cons:
>> - May not be the right interface for non-RSS related parameters
>>
>> Example for configuring number of napi pollers for a queue group:
>> ethtool -X <iface> context <context_num> num_pollers <n>
>
> One thing that might make sense would be to look at adding a possible
> alias for context that could work with something like DCB or the queue
> groups use case. I believe that for DCB there is a similar issue where
> the various priorities could have separate RSS contexts so it might
> make sense to look at applying a similar logic. Also there has been
> talk about trying to do the round robin on SYN type logic. That
> might make sense to expose as a hfunc type since it would be overriding
> RSS for TCP flows.
>
For the round robin flow steering of TCP flows (on SYN by overriding RSS
hash), the plan was to add a new 'inline_fd' parameter to ethtool rss
context. Will look into your suggestion for using hfunc type.
> The num_pollers can be problematic though as we don't really have
> anything like that in ethtool currently. Probably the closest thing I
> can think of is interrupt moderation. It depends on if it has to be a
> per-queue-group attribute or if it could be a per-queue attribute.
> Specifically I am referring to the -Q option that is currently applied
> to the coalescing functions in ethtool.
>
>> 3. sysfs:
>> Pros:
>> - Ideal to configure parameters such as NAPI/IRQ for Rx queue_group.
>> - Makes it possible to support some existing per-netdev napi
>> parameters like 'threaded' and 'napi_defer_hard_irqs' etc. to be
>> per-queue-group parameters.
>>
>> Cons:
>> - Requires introducing new queue_group structures for TX and RX
>> queue groups and references for it, kset references for queue_group in
>> struct net_device
>> - Additional ndo ops in net_device_ops for each parameter for
>> hardware offload.
>>
>> Examples :
>> /sys/class/net/<iface>/queue_groups/rxqg-<0-n>/num_pollers
>> /sys/class/net/<iface>/queue_groups/txqg-<0-n>/min_rate
>
> So min_rate is something already handled in mqprio since it is so DCB
> like. You are essentially guaranteeing bandwidth aren't you? Couldn't
> you just define a bw_rlimit shaper for mqprio and then use the existing
> bw_rlimit values to define the min_rate?
>
The ice driver already supports min_rate per queue_group using mqprio. I
was suggesting this in case we happen to have a TX queue_group object,
since dynamically changing rates via mqprio was not that convenient, as I
mentioned above.
> As far as adding the queue_groups interface one ugly bit would be that
> we would probably need to have links between the queues and these
> groups which would start to turn the sysfs into a tangled mess.
>
Agree, maintaining the links between queues and groups is not trivial.
> The biggest issue I see is that there isn't any sort of sysfs interface
> exposed for NAPI which is what you would essentially need to justify
> something like this since that is what you are modifying.
>
Right. Something like /sys/class/net/<iface>/napis/napi<0-N>
Maybe initially there would be as many napis as queues due to the 1:1
association, but as the queue bitmap is tuned per napi, only those
napis that have queue[s] associated with them would be exposed.
>> 4. Devlink:
>> Pros:
>> - New parameters can be added without any changes to the kernel or
>> userspace.
>>
>> Cons:
>> - Queue/Queue_group is a function-wide entity, Devlink is for
>> device-wide stuff. Devlink being device centric is not suitable for
>> queue parameters such as rates, NAPI etc.
>
> Yeah, I wouldn't expect something like this to be a good fit.
* Re: Kernel interface to configure queue-group parameters
2023-02-16 10:35 ` Nambiar, Amritha
@ 2023-02-16 17:32 ` Jakub Kicinski
2023-02-24 9:14 ` Nambiar, Amritha
2023-02-19 17:39 ` Alexander H Duyck
1 sibling, 1 reply; 10+ messages in thread
From: Jakub Kicinski @ 2023-02-16 17:32 UTC (permalink / raw)
To: Nambiar, Amritha
Cc: Alexander H Duyck, netdev, davem, edumazet, pabeni,
Saeed Mahameed, Samudrala, Sridhar
On Thu, 16 Feb 2023 02:35:35 -0800 Nambiar, Amritha wrote:
> > The biggest issue I see is that there isn't any sort of sysfs interface
> > exposed for NAPI which is what you would essentially need to justify
> > something like this since that is what you are modifying.
>
> Right. Something like /sys/class/net/<iface>/napis/napi<0-N>
> Maybe, initially there would be as many napis as queues due to 1:1
> association, but as the queues bitmap is tuned for the napi, only those
> napis that have queue[s] associated with it would be exposed.
Forget about using sysfs, please. We've been talking about making
"queues first class citizen", mapping to pollers is part of that
problem space. And it's complex enough to be better suited for netlink.
* Re: Kernel interface to configure queue-group parameters
2023-02-16 10:35 ` Nambiar, Amritha
2023-02-16 17:32 ` Jakub Kicinski
@ 2023-02-19 17:39 ` Alexander H Duyck
2023-02-24 9:17 ` Nambiar, Amritha
1 sibling, 1 reply; 10+ messages in thread
From: Alexander H Duyck @ 2023-02-19 17:39 UTC (permalink / raw)
To: Nambiar, Amritha, netdev
Cc: davem, kuba, edumazet, pabeni, Saeed Mahameed, Samudrala, Sridhar
On Thu, 2023-02-16 at 02:35 -0800, Nambiar, Amritha wrote:
> On 2/7/2023 8:28 AM, Alexander H Duyck wrote:
> > On Mon, 2023-02-06 at 16:15 -0800, Nambiar, Amritha wrote:
> > > Hello,
> > >
> > > We are looking for feedback on the kernel interface to configure
> > > queue-group level parameters.
> > >
> > > Queues are primary residents in the kernel and there are multiple
> > > interfaces to configure queue-level parameters. For example, tx_maxrate
> > > for a transmit queue can be controlled via the sysfs interface. Ethtool
> > > is another option to change the RX/TX ring parameters of the specified
> > > network device (example, rx-buf-len, tx-push etc.).
> > >
> > > Queue_groups are a set of queues grouped together into a single object.
> > > For example, tx_queue_group-0 is a transmit queue_group with index 0 and
> > > can have transmit queues say 0-31, similarly rx_queue_group-0 is a
> > > receive queue_group with index 0 and can have receive queues 0-31,
> > > tx/rx_queue_group_1 may consist of TX and RX queues say 32-127
> > > respectively. Currently, upstream drivers for both ice and mlx5 support
> > > creating TX and RX queue groups via the tc-mqprio and ethtool interfaces.
> > >
> > > At this point, the kernel does not have an abstraction for queue_group.
> > > A close equivalent in the kernel is a 'traffic class' which consists of
> > > a set of transmit queues. Today, traffic classes are created using TC's
> > > mqprio scheduler. Only a limited set of parameters can be configured on
> > > each traffic class via mqprio, example priority per traffic class, min
> > > and max bandwidth rates per traffic class etc. Mqprio also supports
> > > offload of these parameters to the hardware. The parameters set for the
> > > traffic class (tx queue_group) are applicable to all transmit queues
> > > belonging to the queue_group. However, introducing additional parameters
> > > for queue_groups and configuring them via mqprio makes the interface
> > > less user-friendly (as the command line gets cumbersome due to the
> > > number of qdisc parameters). Although mqprio is the interface to create
> > > transmit queue_groups, and also the interface to configure and offload
> > > certain transmit queue_group parameters, these limitations make us
> > > wonder whether it is worth considering other interface options for
> > > configuring queue_group parameters.
> > >
> >
> > I think much of this depends on exactly what functionality we are
> > talking about. The problem is the Intel use case conflates interrupts
> > w/ queues w/ the applications themselves since what it is trying to do
> > is a poor imitation of RDMA being implemented using something akin to
> > VMDq last I knew.
> >
> > So for example one of the things you are asking about below is
> > establishing a minimum rate for outgoing Tx packets. In my mind we
> > would probably want to use something like mqprio to set that up since
> > it is Tx rate limiting and if we were to configure it to happen in
> > software it would need to be handled in the Qdisc layer.
> >
>
> Configuring min and max rates for outgoing TX packets is already
> supported in the ice driver using mqprio. The issue is that dynamically
> changing the rates per traffic class/queue_group via mqprio is not
> straightforward: the "tc qdisc change" command needs all the rates for
> the traffic classes again, even for the tcs whose rates are not being
> changed.
> For example, here's the sample command to configure min and max rates on
> 4 TX queue groups:
>
> # tc qdisc add dev $iface root mqprio \
> num_tc 4 \
> map 0 1 2 3 \
> queues 2@0 4@2 8@6 16@14 \
> hw 1 mode channel \
> shaper bw_rlimit \
> min_rate 1Gbit 2Gbit 2Gbit 1Gbit \
> max_rate 4Gbit 5Gbit 5Gbit 10Gbit
>
> Now, changing TX min_rate for TC1 to 20 Gbit:
>
> # tc qdisc change dev $iface root mqprio \
> shaper bw_rlimit min_rate 1Gbit 20Gbit 2Gbit 1Gbit
>
> Although this is not a major concern, I was looking for the simplicity
> that something like sysfs provides with tx_maxrate for a queue, so that
> when there is a large number of tcs, only the ones being changed need
> to be dealt with (if we were to have sysfs rates per queue_group).
So it sounds like there is an interface already, you may just not like
having to work with it due to the userspace tooling. Perhaps the
solution would be to look at fixing things up so that the tooling would
allow you to make changes to individual values. I haven't looked into
the interface much but is there any way to retrieve the current
settings from the Qdisc? If so you might be able to just update tc so
that it would allow incremental updates and fill in the gaps with the
config it already has.
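The incremental-update idea suggested here can be pictured with a short sketch: if tc could read back the current bw_rlimit rates, the tooling could fill in the unchanged values itself and regenerate the full command mqprio requires. The helper below is purely hypothetical (it is not existing tc code; the function name and calling convention are invented for illustration):

```python
# Hypothetical read-modify-write tooling for mqprio bw_rlimit rates:
# take the current per-TC rates, apply only the requested changes, and
# re-emit the complete rate list that "tc qdisc change" insists on.

def build_change_cmd(iface, current_min_rates, updates):
    """current_min_rates: list like ["1Gbit", "2Gbit", "2Gbit", "1Gbit"].
    updates: dict mapping TC index -> new rate string."""
    rates = list(current_min_rates)
    for tc, rate in updates.items():
        rates[tc] = rate  # only the requested TCs change
    return (f"tc qdisc change dev {iface} root mqprio "
            f"shaper bw_rlimit min_rate {' '.join(rates)}")

# Changing only TC1 to 20Gbit still regenerates the full rate list:
cmd = build_change_cmd("eth0",
                       ["1Gbit", "2Gbit", "2Gbit", "1Gbit"],
                       {1: "20Gbit"})
print(cmd)
```

The user would then only name the TC being changed; the tool supplies the rest from the retrieved config.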
> > As far as the NAPI pollers attribute that seems like something that
> > needs further clarification. Are you limiting the number of busy poll
> > instances that can run on a single queue group? Is there a reason for
> > doing it per queue group instead of this being something that could be
> > enabled on a specific set of queues within the group?
> >
>
> Yes, we are trying to limit the number of napi instances for the queues
> within a queue-group. Some options we could use:
>
> 1. A 'num_poller' attribute on a queue_group level - The initial idea
> was to configure the number of poller threads that would be handling the
> queues within the queue_group, as an example, a num_poller value of 4 on
> a queue_group consisting of 4 queues would imply that there is a poller
> per queue. This could also be changed to something like a single poller
> for all 4 queues within the group.
>
> 2. A poller bitmap for each queue (both TX and RX) - The main concern
> with the queue-level maps is that it would still be nice to have a
> higher level queue-group isolation, so that a poller is not shared among
> queues belonging to distinct queue-groups. Also, a queue-group level
> config would consolidate the mapping of queues and vectors in the driver
> in batches, instead of the driver having to update the queue<->vector
> mapping in response to each queue's poller configuration.
>
> But we could do away with having these at queue-group level, and instead
> use a different method as the third option below:
> 3. A queues bitmap per napi instance - So the default arrangement today
> is 1:1 mapping between queues and interrupt vectors and hence 1:1
> queue<->poller association. If the user could configure one interrupt
> vector to serve different queues, these queues can be serviced by the
> poller/napi instance for the vector.
> One way to do this is to have a bitmap of queues for each IRQ allocated
> for the device (similar to smp_affinity CPUs bitmap for the given IRQ).
> So, /sys/class/net/<iface>/device/msi_irqs/ lists all the IRQs
> associated with the network interface. If the IRQ can take an additional
> attribute like queues_affinity for the IRQs on the network device (use
> /sys/class/net/<iface>/device/msi_irqs/N/ since queues_affinity would be
> specific to the network subsystem), this would enable multiple queues
> <-> single vector association configurable by the user. The driver would
> validate that a queue is not mapped to multiple interrupts. This way an
> interrupt can be shared among different queues as configured by the user.
> Another approach is to expose the napi-ids via sysfs and support
> per-napi attributes.
> /sys/class/net/<iface>/napis/napi<0-N>
> Each numbered sub-directory N contains attributes of that napi. A
> 'napi_queues' attribute would be a bitmap of queues associated with the
> napi-N enabling many queues <-> single napi association. Example,
> /sys/class/net/<iface>/napis/napi-N/napi_queues
>
> We also plan to introduce an additional napi attribute for each napi
> instance called 'poller_timeout' indicating the timeout in jiffies.
> Exposing the napi-ids would also enable moving some existing napi
> attributes such as 'threaded' and 'napi_defer_hard_irqs' etc. (which are
> currently per netdev) to be per napi instance.
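The driver-side validation described in option 3 above (many queues may share one interrupt/napi, but each queue must belong to exactly one) can be modeled with a small sketch. This is a toy model only, not driver code; the function name and data shapes are invented:

```python
# Toy model of the per-napi queue bitmap from option 3: check that no
# queue ends up mapped to more than one napi/interrupt, while allowing
# many queues to share a single napi.

def validate_napi_queue_maps(napi_queues):
    """napi_queues: dict of napi_id -> set of queue ids.
    Returns the resulting queue -> napi association, or raises if a
    queue appears under two napis."""
    seen = {}
    for napi_id, queues in napi_queues.items():
        for q in queues:
            if q in seen:
                raise ValueError(
                    f"queue {q} mapped to both napi {seen[q]} and {napi_id}")
            seen[q] = napi_id
    return seen

# Queues 1, 2, 5 share napi 1477; queues 0, 3, 4 share napi 1478:
mapping = validate_napi_queue_maps({1477: {1, 2, 5}, 1478: {0, 3, 4}})
```

A real implementation would do this check when the queues bitmap is written, before reprogramming the queue<->vector mapping.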
This one will require more thought and discussion as the NAPI instances
themselves have been something that was largely hidden and not exposed
to userspace up until now.
However with that said I am pretty certain sysfs isn't the way to go.
> > > Likewise, receive queue_groups can be created using the ethtool
> > > interface as RSS contexts. Next step would be to configure
> > > per-rx_queue_group parameters. Based on the discussion in
> > > https://lore.kernel.org/netdev/20221114091559.7e24c7de@kernel.org/,
> > > it looks like ethtool may not be the right interface to configure
> > > rx_queue_group parameters (that are unrelated to flow<->queue
> > > assignment), example NAPI configurations on the queue_group.
> > >
> > > The key gaps in the kernel to support queue-group parameters are:
> > > 1. 'queue_group' abstraction in the kernel for both TX and RX distinctly
> > > 2. Offload hooks for TX/RX queue_group parameters depending on the
> > > chosen interface.
> > >
> > > Following are the options we have investigated:
> > >
> > > 1. tc-mqprio:
> > > Pros:
> > > - Already supports creating queue_groups, offload of certain parameters
> > >
> > > Cons:
> > > - Introducing new parameters makes the interface less user-friendly.
> > > TC qdisc parameters are specified at the qdisc creation, larger the
> > > number of traffic classes and their respective parameters, lesser the
> > > usability.
> >
> > Yes and no. The TC layer is mostly meant for handling the Tx side of
> > things. For something like the rate limiting it might make sense since
> > there is already logic there to do it in mqprio. But if you are trying
> > to pull in NAPI or RSS attributes then I agree it would hurt usability.
> >
>
> The TX queue-group parameters supported via mqprio are limited to
> priority, min and max rates. I think extending mqprio for a larger set
> of TX parameters beyond just rates (say max burst) would bloat up the
> command line. But yes, I agree, the TC layer is not the place for NAPI
> attributes on TX queues.
The problem is you are reinventing the wheel. It sounds like this
mostly does what you are looking for. If you are going to look at
extending it then you should do so. Otherwise maybe you need to look at
putting together a new Qdisc instead of creating an entirely new
infrastructure since Qdisc is how we would deal with implementing
something like this in software. We shouldn't be bypassing that; we
should be implementing an equivalent in software of what we want to do
in hardware.
> > > 2. Ethtool:
> > > Pros:
> > > - Already creates RX queue_groups as RSS contexts
> > >
> > > Cons:
> > > - May not be the right interface for non-RSS related parameters
> > >
> > > Example for configuring number of napi pollers for a queue group:
> > > ethtool -X <iface> context <context_num> num_pollers <n>
> >
> > One thing that might make sense would be to look at adding a possible
> > alias for context that could work with something like DCB or the queue
> > groups use case. I believe that for DCB there is a similar issue where
> > the various priorities could have separate RSS contexts so it might
> > make sense to look at applying a similar logic. Also there has been
> > talk about trying to do the round-robin-on-SYN type logic. That
> > might make sense to expose as a hfunc type since it would be overriding
> > RSS for TCP flows.
> >
>
> For the round robin flow steering of TCP flows (on SYN by overriding RSS
> hash), the plan was to add a new 'inline_fd' parameter to ethtool rss
> context. Will look into your suggestion for using hfunc type.
It sounds like we are generally thinking in the same area so that is a
good start there.
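To make the "round robin on SYN" steering discussed above concrete, here is a toy model: new TCP flows (identified at SYN) are assigned the next RX queue in the group in round-robin order instead of by RSS hash, and established flows keep their queue. This is purely illustrative; the real mechanism would be a hardware/driver RSS override, and every name here is invented:

```python
# Illustrative model of round-robin-on-SYN flow steering: SYN packets
# pick the next queue in the group; later packets of the same flow are
# steered to the queue already recorded for that flow.
import itertools

def make_syn_steerer(queue_ids):
    cycle = itertools.cycle(queue_ids)
    table = {}  # flow tuple -> assigned queue

    def steer(flow, is_syn):
        if is_syn or flow not in table:
            table[flow] = next(cycle)  # new flow: round-robin assignment
        return table[flow]

    return steer

steer = make_syn_steerer([4, 5, 6])
q = steer(("10.0.0.1", 1111), True)   # first flow lands on queue 4
```

Exposing this as an hfunc type, as suggested, would signal cleanly that the normal RSS hash is being overridden for TCP.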
> >
> > > 3. sysfs:
> > > Pros:
> > > - Ideal to configure parameters such as NAPI/IRQ for Rx queue_group.
> > > - Makes it possible to support some existing per-netdev napi
> > > parameters like 'threaded' and 'napi_defer_hard_irqs' etc. to be
> > > per-queue-group parameters.
> > >
> > > Cons:
> > > - Requires introducing new queue_group structures for TX and RX
> > > queue groups and references for it, kset references for queue_group in
> > > struct net_device
> > > - Additional ndo ops in net_device_ops for each parameter for
> > > hardware offload.
> > >
> > > Examples :
> > > /sys/class/net/<iface>/queue_groups/rxqg-<0-n>/num_pollers
> > > /sys/class/net/<iface>/queue_groups/txqg-<0-n>/min_rate
> >
> > So min_rate is something already handled in mqprio since it is so DCB
> > like. You are essentially guaranteeing bandwidth aren't you? Couldn't
> > you just define a bw_rlimit shaper for mqprio and then use the existing
> > bw_rlimit values to define the min_rate?
> >
>
> The ice driver already supports min_rate per queue_group using mqprio. I
> was suggesting this in case we happen to have a TX queue_group object,
> since dynamically changing rates via mqprio was not handy enough as I
> mentioned above.
Yeah, but based on the description you are rewriting the kernel side
because you don't like dealing with the userspace tools. Again maybe
the solution here would be to look at cleaning up the userspace
interface to add support for reading/retrieving the existing values and
then updating instead of requiring a complete update every time.
What we want to avoid is creating new overhead in the kernel where we
now have yet another way to control Tx rates as each redundant
interface added is that much more overhead that has to be dealt with
throughout the Tx path. If we already have a way to do this with mqprio
let's just support offloading that into hardware rather than adding yet
another Tx rate control.
> > As far as adding the queue_groups interface one ugly bit would be that
> > we would probably need to have links between the queues and these
> > groups which would start to turn the sysfs into a tangled mess.
> >
> Agree, maintaining the links between queues and groups is not trivial.
>
> > The biggest issue I see is that there isn't any sort of sysfs interface
> > exposed for NAPI which is what you would essentially need to justify
> > something like this since that is what you are modifying.
> >
>
> Right. Something like /sys/class/net/<iface>/napis/napi<0-N>
> Maybe, initially there would be as many napis as queues due to 1:1
> association, but as the queues bitmap is tuned for the napi, only those
> napis that have queue[s] associated with it would be exposed.
As Jakub already pointed out adding more sysfs is generally frowned
upon.
* Re: Kernel interface to configure queue-group parameters
2023-02-16 17:32 ` Jakub Kicinski
@ 2023-02-24 9:14 ` Nambiar, Amritha
2023-02-24 19:22 ` Jakub Kicinski
0 siblings, 1 reply; 10+ messages in thread
From: Nambiar, Amritha @ 2023-02-24 9:14 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Alexander H Duyck, netdev, davem, edumazet, pabeni,
Saeed Mahameed, Samudrala, Sridhar
On 2/16/2023 9:32 AM, Jakub Kicinski wrote:
> On Thu, 16 Feb 2023 02:35:35 -0800 Nambiar, Amritha wrote:
>>> The biggest issue I see is that there isn't any sort of sysfs interface
>>> exposed for NAPI which is what you would essentially need to justify
>>> something like this since that is what you are modifying.
>>
>> Right. Something like /sys/class/net/<iface>/napis/napi<0-N>
>> Maybe, initially there would be as many napis as queues due to 1:1
>> association, but as the queues bitmap is tuned for the napi, only those
>> napis that have queue[s] associated with it would be exposed.
>
> Forget about using sysfs, please. We've been talking about making
> "queues first class citizen", mapping to pollers is part of that
> problem space. And it's complex enough to be better suited for netlink.
Okay. Can ethtool netlink be an option for this? For example,
ethtool --show-napis
Lists all the napi instances and associated queue[s] list for each napi
for the specified network device.
ethtool --set-napi
Configure the attributes (say, queue[s] list) for each napi
napi <napi_id>
The napi instance to configure
queues <q_id1, q_id2, ...>
The queue[s] that are to be serviced by the napi instance.
Example:
ethtool --set-napi eth0 napi 1477 queues 1,2,5
The 'set-napi' command for the napi<->queue[s] association would have
the following effect:
1. If multiple napis are impacted by an update, remove the queue[s]
from the existing napi instance[s] they are associated with.
2. Driver updates queue[s]<->vector mapping and associates with new napi
instance.
3. Report the impacted napis and their new queue[s] lists back to the stack.
The 'show-napi' command should now list all the napis and the updated
queue[s] list.
This could also be extended for other napi attributes beyond queue[s] list.
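The proposed 'set-napi' semantics (detach from the old napi, attach to the new one, report every impacted napi) can be sketched as a small model. Note the interface itself is only a proposal in this thread; nothing below is existing ethtool or kernel behavior:

```python
# Toy model of the proposed set-napi operation: move queues to a napi,
# detaching them from whichever napi currently owns them, and return
# the set of napi ids whose queue lists changed.

def set_napi(napi_queues, napi_id, queues):
    """napi_queues: dict of napi_id -> set of queue ids (mutated in place)."""
    impacted = {napi_id}
    wanted = set(queues)
    for other_id, qs in napi_queues.items():
        if other_id != napi_id and qs & wanted:
            qs -= wanted           # step 1: remove from the previous napi
            impacted.add(other_id)
    napi_queues.setdefault(napi_id, set()).update(wanted)  # step 2: attach
    return impacted                # step 3: report impacted napis

# "ethtool --set-napi eth0 napi 1477 queues 1,2,5" would roughly do:
state = {1477: {0, 3}, 1478: {1, 2, 5}}
impacted = set_napi(state, 1477, [1, 2, 5])
# napi 1478 loses queues 1, 2, 5; napi 1477 now serves 0, 1, 2, 3, 5
```

A subsequent 'show-napi' would then list the updated queue sets for both impacted napis.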
* Re: Kernel interface to configure queue-group parameters
2023-02-19 17:39 ` Alexander H Duyck
@ 2023-02-24 9:17 ` Nambiar, Amritha
0 siblings, 0 replies; 10+ messages in thread
From: Nambiar, Amritha @ 2023-02-24 9:17 UTC (permalink / raw)
To: Alexander H Duyck, netdev
Cc: davem, kuba, edumazet, pabeni, Saeed Mahameed, Samudrala, Sridhar
On 2/19/2023 9:39 AM, Alexander H Duyck wrote:
> On Thu, 2023-02-16 at 02:35 -0800, Nambiar, Amritha wrote:
>> On 2/7/2023 8:28 AM, Alexander H Duyck wrote:
>>> On Mon, 2023-02-06 at 16:15 -0800, Nambiar, Amritha wrote:
>>>> Hello,
>>>>
>>>> We are looking for feedback on the kernel interface to configure
>>>> queue-group level parameters.
>>>>
>>>> Queues are primary residents in the kernel and there are multiple
>>>> interfaces to configure queue-level parameters. For example, tx_maxrate
>>>> for a transmit queue can be controlled via the sysfs interface. Ethtool
>>>> is another option to change the RX/TX ring parameters of the specified
>>>> network device (example, rx-buf-len, tx-push etc.).
>>>>
>>>> Queue_groups are a set of queues grouped together into a single object.
>>>> For example, tx_queue_group-0 is a transmit queue_group with index 0 and
>>>> can have transmit queues say 0-31, similarly rx_queue_group-0 is a
>>>> receive queue_group with index 0 and can have receive queues 0-31,
>>>> tx/rx_queue_group_1 may consist of TX and RX queues say 32-127
>>>> respectively. Currently, upstream drivers for both ice and mlx5 support
>>>> creating TX and RX queue groups via the tc-mqprio and ethtool interfaces.
>>>>
>>>> At this point, the kernel does not have an abstraction for queue_group.
>>>> A close equivalent in the kernel is a 'traffic class' which consists of
>>>> a set of transmit queues. Today, traffic classes are created using TC's
>>>> mqprio scheduler. Only a limited set of parameters can be configured on
>>>> each traffic class via mqprio, example priority per traffic class, min
>>>> and max bandwidth rates per traffic class etc. Mqprio also supports
>>>> offload of these parameters to the hardware. The parameters set for the
>>>> traffic class (tx queue_group) are applicable to all transmit queues
>>>> belonging to the queue_group. However, introducing additional parameters
>>>> for queue_groups and configuring them via mqprio makes the interface
>>>> less user-friendly (as the command line gets cumbersome due to the
>>>> number of qdisc parameters). Although mqprio is the interface to create
>>>> transmit queue_groups, and also the interface to configure and offload
>>>> certain transmit queue_group parameters, these limitations make us
>>>> wonder whether it is worth considering other interface options for
>>>> configuring queue_group parameters.
>>>>
>>>
>>> I think much of this depends on exactly what functionality we are
>>> talking about. The problem is the Intel use case conflates interrupts
>>> w/ queues w/ the applications themselves since what it is trying to do
>>> is a poor imitation of RDMA being implemented using something akin to
>>> VMDq last I knew.
>>>
>>> So for example one of the things you are asking about below is
>>> establishing a minimum rate for outgoing Tx packets. In my mind we
>>> would probably want to use something like mqprio to set that up since
>>> it is Tx rate limiting and if we were to configure it to happen in
>>> software it would need to be handled in the Qdisc layer.
>>>
>>
>> Configuring min and max rates for outgoing TX packets is already
>> supported in the ice driver using mqprio. The issue is that dynamically
>> changing the rates per traffic class/queue_group via mqprio is not
>> straightforward: the "tc qdisc change" command needs all the rates for
>> the traffic classes again, even for the tcs whose rates are not being
>> changed.
>> For example, here's the sample command to configure min and max rates on
>> 4 TX queue groups:
>>
>> # tc qdisc add dev $iface root mqprio \
>> num_tc 4 \
>> map 0 1 2 3 \
>> queues 2@0 4@2 8@6 16@14 \
>> hw 1 mode channel \
>> shaper bw_rlimit \
>> min_rate 1Gbit 2Gbit 2Gbit 1Gbit \
>> max_rate 4Gbit 5Gbit 5Gbit 10Gbit
>>
>> Now, changing TX min_rate for TC1 to 20 Gbit:
>>
>> # tc qdisc change dev $iface root mqprio \
>> shaper bw_rlimit min_rate 1Gbit 20Gbit 2Gbit 1Gbit
>>
>> Although this is not a major concern, I was looking for the simplicity
>> that something like sysfs provides with tx_maxrate for a queue, so that
>> when there is a large number of tcs, only the ones being changed need
>> to be dealt with (if we were to have sysfs rates per queue_group).
>
> So it sounds like there is an interface already, you may just not like
> having to work with it due to the userspace tooling. Perhaps the
> solution would be to look at fixing things up so that the tooling would
> allow you to make changes to individual values. I haven't looked into
> the interface much but is there any way to retrieve the current
> settings from the Qdisc? If so you might be able to just update tc so
> that it would allow incremental updates and fill in the gaps with the
> config it already has.
>
>>> As far as the NAPI pollers attribute that seems like something that
>>> needs further clarification. Are you limiting the number of busy poll
>>> instances that can run on a single queue group? Is there a reason for
>>> doing it per queue group instead of this being something that could be
>>> enabled on a specific set of queues within the group?
>>>
>>
>> Yes, we are trying to limit the number of napi instances for the queues
>> within a queue-group. Some options we could use:
>>
>> 1. A 'num_poller' attribute on a queue_group level - The initial idea
>> was to configure the number of poller threads that would be handling the
>> queues within the queue_group, as an example, a num_poller value of 4 on
>> a queue_group consisting of 4 queues would imply that there is a poller
>> per queue. This could also be changed to something like a single poller
>> for all 4 queues within the group.
>>
>> 2. A poller bitmap for each queue (both TX and RX) - The main concern
>> with the queue-level maps is that it would still be nice to have a
>> higher level queue-group isolation, so that a poller is not shared among
>> queues belonging to distinct queue-groups. Also, a queue-group level
>> config would consolidate the mapping of queues and vectors in the driver
>> in batches, instead of the driver having to update the queue<->vector
>> mapping in response to each queue's poller configuration.
>>
>> But we could do away with having these at queue-group level, and instead
>> use a different method as the third option below:
>> 3. A queues bitmap per napi instance - So the default arrangement today
>> is 1:1 mapping between queues and interrupt vectors and hence 1:1
>> queue<->poller association. If the user could configure one interrupt
>> vector to serve different queues, these queues can be serviced by the
>> poller/napi instance for the vector.
>> One way to do this is to have a bitmap of queues for each IRQ allocated
>> for the device (similar to smp_affinity CPUs bitmap for the given IRQ).
>> So, /sys/class/net/<iface>/device/msi_irqs/ lists all the IRQs
>> associated with the network interface. If the IRQ can take an additional
>> attribute like queues_affinity for the IRQs on the network device (use
>> /sys/class/net/<iface>/device/msi_irqs/N/ since queues_affinity would be
>> specific to the network subsystem), this would enable multiple queues
>> <-> single vector association configurable by the user. The driver would
>> validate that a queue is not mapped to multiple interrupts. This way an
>> interrupt can be shared among different queues as configured by the user.
>> Another approach is to expose the napi-ids via sysfs and support
>> per-napi attributes.
>> /sys/class/net/<iface>/napis/napi<0-N>
>> Each numbered sub-directory N contains attributes of that napi. A
>> 'napi_queues' attribute would be a bitmap of queues associated with the
>> napi-N enabling many queues <-> single napi association. Example,
>> /sys/class/net/<iface>/napis/napi-N/napi_queues
>>
>> We also plan to introduce an additional napi attribute for each napi
>> instance called 'poller_timeout' indicating the timeout in jiffies.
>> Exposing the napi-ids would also enable moving some existing napi
>> attributes such as 'threaded' and 'napi_defer_hard_irqs' etc. (which are
>> currently per netdev) to be per napi instance.
>
> This one will require more thought and discussion as the NAPI instances
> themselves have been something that was largely hidden and not exposed
> to userspace up until now.
>
> However with that said I am pretty certain sysfs isn't the way to go.
>
>>>> Likewise, receive queue_groups can be created using the ethtool
>>>> interface as RSS contexts. Next step would be to configure
>>>> per-rx_queue_group parameters. Based on the discussion in
>>>> https://lore.kernel.org/netdev/20221114091559.7e24c7de@kernel.org/,
>>>> it looks like ethtool may not be the right interface to configure
>>>> rx_queue_group parameters (that are unrelated to flow<->queue
>>>> assignment), example NAPI configurations on the queue_group.
>>>>
>>>> The key gaps in the kernel to support queue-group parameters are:
>>>> 1. 'queue_group' abstraction in the kernel for both TX and RX distinctly
>>>> 2. Offload hooks for TX/RX queue_group parameters depending on the
>>>> chosen interface.
>>>>
>>>> Following are the options we have investigated:
>>>>
>>>> 1. tc-mqprio:
>>>> Pros:
>>>> - Already supports creating queue_groups, offload of certain parameters
>>>>
>>>> Cons:
>>>> - Introducing new parameters makes the interface less user-friendly.
>>>> TC qdisc parameters are specified at the qdisc creation, larger the
>>>> number of traffic classes and their respective parameters, lesser the
>>>> usability.
>>>
>>> Yes and no. The TC layer is mostly meant for handling the Tx side of
>>> things. For something like the rate limiting it might make sense since
>>> there is already logic there to do it in mqprio. But if you are trying
>>> to pull in NAPI or RSS attributes then I agree it would hurt usability.
>>>
>>
>> The TX queue-group parameters supported via mqprio are limited to
>> priority, min and max rates. I think extending mqprio for a larger set
>> of TX parameters beyond just rates (say max burst) would bloat up the
>> command line. But yes, I agree, the TC layer is not the place for NAPI
>> attributes on TX queues.
>
> The problem is you are reinventing the wheel. It sounds like this
> mostly does what you are looking for. If you are going to look at
> extending it then you should do so. Otherwise maybe you need to look at
> putting together a new Qdisc instead of creating an entirely new
> infrastructure since Qdisc is how we would deal with implementing
> something like this in software. We shouldn't be bypassing that; we
> should be implementing an equivalent in software of what we want to do
> in hardware.
>
>>>> 2. Ethtool:
>>>> Pros:
>>>> - Already creates RX queue_groups as RSS contexts
>>>>
>>>> Cons:
>>>> - May not be the right interface for non-RSS related parameters
>>>>
>>>> Example for configuring number of napi pollers for a queue group:
>>>> ethtool -X <iface> context <context_num> num_pollers <n>
>>>
>>> One thing that might make sense would be to look at adding a possible
>>> alias for context that could work with something like DCB or the queue
>>> groups use case. I believe that for DCB there is a similar issue where
>>> the various priorities could have separate RSS contexts so it might
>>> make sense to look at applying a similar logic. Also there has been
>>> talk about trying to do the round-robin-on-SYN type logic. That
>>> might make sense to expose as a hfunc type since it would be overriding
>>> RSS for TCP flows.
>>>
>>
>> For the round robin flow steering of TCP flows (on SYN by overriding RSS
>> hash), the plan was to add a new 'inline_fd' parameter to ethtool rss
>> context. Will look into your suggestion for using hfunc type.
>
> It sounds like we are generally thinking in the same area so that is a
> good start there.
>
>>>
>>>> 3. sysfs:
>>>> Pros:
>>>> - Ideal to configure parameters such as NAPI/IRQ for Rx queue_group.
>>>> - Makes it possible to support some existing per-netdev napi
>>>> parameters like 'threaded' and 'napi_defer_hard_irqs' etc. to be
>>>> per-queue-group parameters.
>>>>
>>>> Cons:
>>>> - Requires introducing new queue_group structures for TX and RX
>>>> queue groups and references for it, kset references for queue_group in
>>>> struct net_device
>>>> - Additional ndo ops in net_device_ops for each parameter for
>>>> hardware offload.
>>>>
>>>> Examples :
>>>> /sys/class/net/<iface>/queue_groups/rxqg-<0-n>/num_pollers
>>>> /sys/class/net/<iface>/queue_groups/txqg-<0-n>/min_rate
>>>
>>> So min_rate is something already handled in mqprio since it is so DCB
>>> like. You are essentially guaranteeing bandwidth aren't you? Couldn't
>>> you just define a bw_rlimit shaper for mqprio and then use the existing
>>> bw_rlimit values to define the min_rate?
>>>
>>
>> The ice driver already supports min_rate per queue_group using mqprio. I
>> was suggesting this in case we happen to have a TX queue_group object,
>> since dynamically changing rates via mqprio was not handy enough as I
>> mentioned above.
>
> Yeah, but based on the description you are rewriting the kernel side
> because you don't like dealing with the userspace tools. Again maybe
> the solution here would be to look at cleaning up the userspace
> interface to add support for reading/retrieving the existing values and
> then updating instead of requiring a complete update every time.
>
> What we want to avoid is creating new overhead in the kernel where we
> now have yet another way to control Tx rates as each redundant
> interface added is that much more overhead that has to be dealt with
> throughout the Tx path. If we already have a way to do this with mqprio
> let's just support offloading that into hardware rather than adding yet
> another Tx rate control.
>
>>> As far as adding the queue_groups interface one ugly bit would be that
>>> we would probably need to have links between the queues and these
>>> groups which would start to turn the sysfs into a tangled mess.
>>>
>> Agree, maintaining the links between queues and groups is not trivial.
>>
>>> The biggest issue I see is that there isn't any sort of sysfs interface
>>> exposed for NAPI which is what you would essentially need to justify
>>> something like this since that is what you are modifying.
>>>
>>
>> Right. Something like /sys/class/net/<iface>/napis/napi<0-N>
>> Maybe, initially there would be as many napis as queues due to 1:1
>> association, but as the queues bitmap is tuned for the napi, only those
>> napis that have queue[s] associated with it would be exposed.
>
> As Jakub already pointed out adding more sysfs is generally frowned
> upon.
Agreed. We do not wish to add another interface to control rates; that
overhead can be avoided. We can continue to use mqprio and dynamically
change the rates using the "tc qdisc change" command.
For exposing napis, can ethtool netlink be an option as I detailed in
the reply to Jakub?
* Re: Kernel interface to configure queue-group parameters
2023-02-24 9:14 ` Nambiar, Amritha
@ 2023-02-24 19:22 ` Jakub Kicinski
0 siblings, 0 replies; 10+ messages in thread
From: Jakub Kicinski @ 2023-02-24 19:22 UTC (permalink / raw)
To: Nambiar, Amritha
Cc: Alexander H Duyck, netdev, davem, edumazet, pabeni,
Saeed Mahameed, Samudrala, Sridhar
On Fri, 24 Feb 2023 01:14:15 -0800 Nambiar, Amritha wrote:
> On 2/16/2023 9:32 AM, Jakub Kicinski wrote:
> > On Thu, 16 Feb 2023 02:35:35 -0800 Nambiar, Amritha wrote:
> >> Right. Something like /sys/class/net/<iface>/napis/napi<0-N>
> >> Maybe, initially there would be as many napis as queues due to 1:1
> >> association, but as the queues bitmap is tuned for the napi, only those
> >> napis that have queue[s] associated with it would be exposed.
> >
> > Forget about using sysfs, please. We've been talking about making
> > "queues first class citizen", mapping to pollers is part of that
> > problem space. And it's complex enough to be better suited for netlink.
>
> Okay. Can ethtool netlink be an option for this? For example,
>
> ethtool --show-napis
> Lists all the napi instances and associated queue[s] list for each napi
> for the specified network device.
>
> ethtool --set-napi
> Configure the attributes (say, queue[s] list) for each napi
>
> napi <napi_id>
> The napi instance to configure
>
> queues <q_id1, q_id2, ...>
> The queue[s] that are to be serviced by the napi instance.
The netdev-genl family is a better target.
But the work is doing the refactoring within the kernel to abstract
all this stuff away from the drivers, so that the kernel has
a stronger model of queues. If we just expose the calls to the drivers
directly we'll end up with a lot of code duplication and not-so-subtle
differences between vendors :(
end of thread, other threads:[~2023-02-24 19:22 UTC | newest]
Thread overview: 10+ messages
2023-02-07 0:15 Kernel interface to configure queue-group parameters Nambiar, Amritha
2023-02-07 16:28 ` Alexander H Duyck
2023-02-09 0:36 ` Jakub Kicinski
2023-02-16 10:34 ` Nambiar, Amritha
2023-02-16 10:35 ` Nambiar, Amritha
2023-02-16 17:32 ` Jakub Kicinski
2023-02-24 9:14 ` Nambiar, Amritha
2023-02-24 19:22 ` Jakub Kicinski
2023-02-19 17:39 ` Alexander H Duyck
2023-02-24 9:17 ` Nambiar, Amritha