* [pull request][net-next 00/10] mlx5 updates 2020-06-23
From: Saeed Mahameed @ 2020-06-23 19:52 UTC
To: David S. Miller, kuba; +Cc: netdev, Saeed Mahameed

Hi Dave, Jakub

This series adds misc updates and one small feature, Relaxed ordering,
to mlx5 driver.

For more information please see tag log below.

Please pull and let me know if there is any problem.

Thanks,
Saeed.

---
The following changes since commit 8af7b4525acf5012b2f111a8b168b8647f2c8d60:

  Merge branch 'net-atlantic-additional-A2-features' (2020-06-22 21:10:22 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-updates-2020-06-23

for you to fetch changes up to 378dd789c6335191ca38f57b71c88a8ff4387335:

  net/mlx5e: Add support for PCI relaxed ordering (2020-06-23 12:49:14 -0700)

----------------------------------------------------------------
mlx5-updates-2020-06-23

1) Misc updates and cleanup
2) Use RCU instead of spinlock for vxlan table
3) Support for PCI relaxed ordering
   On some systems, especially ARM and AMD systems, with relaxed ordering
   set, traffic on the remote-numa is at the same level as when on the
   local numa.
   Running TCP single stream over ConnectX-4 LX, ARM CPU on remote-numa
   has 300% improvement in the bandwidth.
   With relaxed ordering turned off: BW:10 [GB/s]
   With relaxed ordering turned on: BW:40 [GB/s]

----------------------------------------------------------------
Alaa Hleihel (1):
      net/mlx5e: Move including net/arp.h from en_rep.c to rep/neigh.c

Aya Levin (1):
      net/mlx5e: Add support for PCI relaxed ordering

Denis Efremov (1):
      net/mlx5: Use kfree(ft->g) in arfs_create_groups()

Hu Haowen (2):
      net/mlx5: FWTrace: Add missing space
      net/mlx5: Add a missing macro undefinition

Maxim Mikityanskiy (1):
      net/mlx5e: Remove unused mlx5e_xsk_first_unused_channel

Parav Pandit (1):
      net/mlx5: Avoid eswitch header inclusion in fs core layer

Saeed Mahameed (2):
      net/mlx5e: vxlan: Use RCU for vxlan table lookup
      net/mlx5e: vxlan: Return bool instead of opaque ptr in port_lookup()

Vlad Buslov (1):
      net/mlx5e: Move TC-specific function definitions into MLX5_CLS_ACT

 .../ethernet/mellanox/mlx5/core/diag/fw_tracer.c   |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |  3 +
 .../net/ethernet/mellanox/mlx5/core/en/rep/neigh.c |  1 +
 .../net/ethernet/mellanox/mlx5/core/en/xsk/umem.c  | 13 -----
 .../net/ethernet/mellanox/mlx5/core/en/xsk/umem.h  |  2 -
 drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c  |  2 +-
 .../net/ethernet/mellanox/mlx5/core/en_common.c    | 67 ++++++++++++++++++++--
 .../net/ethernet/mellanox/mlx5/core/en_ethtool.c   | 46 +++++++++++++++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 29 ++++++++--
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c   |  1 -
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.h    | 16 +++---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h  | 10 ----
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  |  1 -
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h  | 10 ++++
 .../net/ethernet/mellanox/mlx5/core/lib/vxlan.c    | 64 +++++++++------------
 .../net/ethernet/mellanox/mlx5/core/lib/vxlan.h    |  5 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c     |  2 +
 include/linux/mlx5/driver.h                        | 10 +++-
 18 files changed, 195 insertions(+), 89 deletions(-)

* [net-next 01/10] net/mlx5: Avoid eswitch header inclusion in fs core layer
From: Saeed Mahameed @ 2020-06-23 19:52 UTC
To: David S. Miller, kuba; +Cc: netdev, Parav Pandit, Saeed Mahameed

From: Parav Pandit <parav@mellanox.com>

Flow steering core layer is independent of the eswitch layer.
Hence avoid fs_core dependency on eswitch.

Fixes: 328edb499f99 ("net/mlx5: Split FDB fast path prio to multiple namespaces")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/eswitch.h | 10 ----------
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.c |  1 -
 drivers/net/ethernet/mellanox/mlx5/core/fs_core.h | 10 ++++++++++
 3 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
index 67e09902bd88b..522cadc09149a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
@@ -44,16 +44,6 @@
 #include "lib/mpfs.h"
 #include "en/tc_ct.h"
 
-#define FDB_TC_MAX_CHAIN 3
-#define FDB_FT_CHAIN (FDB_TC_MAX_CHAIN + 1)
-#define FDB_TC_SLOW_PATH_CHAIN (FDB_FT_CHAIN + 1)
-
-/* The index of the last real chain (FT) + 1 as chain zero is valid as well */
-#define FDB_NUM_CHAINS (FDB_FT_CHAIN + 1)
-
-#define FDB_TC_MAX_PRIO 16
-#define FDB_TC_LEVELS_PER_PRIO 2
-
 #ifdef CONFIG_MLX5_ESWITCH
 
 #define ESW_OFFLOADS_DEFAULT_NUM_GROUPS 15
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
index 13e2fb79c21ae..e47a669839356 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.c
@@ -41,7 +41,6 @@
 #include "diag/fs_tracepoint.h"
 #include "accel/ipsec.h"
 #include "fpga/ipsec.h"
-#include "eswitch.h"
 
 #define INIT_TREE_NODE_ARRAY_SIZE(...)	(sizeof((struct init_tree_node[]){__VA_ARGS__}) /\
 					 sizeof(struct init_tree_node))
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
index 825b662f809b4..afe7f0bffb939 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_core.h
@@ -39,6 +39,16 @@
 #include <linux/llist.h>
 #include <steering/fs_dr.h>
 
+#define FDB_TC_MAX_CHAIN 3
+#define FDB_FT_CHAIN (FDB_TC_MAX_CHAIN + 1)
+#define FDB_TC_SLOW_PATH_CHAIN (FDB_FT_CHAIN + 1)
+
+/* The index of the last real chain (FT) + 1 as chain zero is valid as well */
+#define FDB_NUM_CHAINS (FDB_FT_CHAIN + 1)
+
+#define FDB_TC_MAX_PRIO 16
+#define FDB_TC_LEVELS_PER_PRIO 2
+
 struct mlx5_modify_hdr {
 	enum mlx5_flow_namespace_type ns_type;
 	union {
-- 
2.26.2

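A note on the macros moved by patch 01: they are defined in terms of one another, so the concrete values are easy to misread. The user-space sketch below copies the six #define lines verbatim from the fs_core.h hunk and prints what they expand to. The function fdb_print_chain_layout() is a hypothetical helper added only for illustration and is not part of the driver.

```c
#include <stdio.h>

/* Copied verbatim from the hunk added to fs_core.h in patch 01. */
#define FDB_TC_MAX_CHAIN 3
#define FDB_FT_CHAIN (FDB_TC_MAX_CHAIN + 1)
#define FDB_TC_SLOW_PATH_CHAIN (FDB_FT_CHAIN + 1)

/* The index of the last real chain (FT) + 1 as chain zero is valid as well */
#define FDB_NUM_CHAINS (FDB_FT_CHAIN + 1)

#define FDB_TC_MAX_PRIO 16
#define FDB_TC_LEVELS_PER_PRIO 2

/* Hypothetical helper (not in the driver): print the derived values. */
void fdb_print_chain_layout(void)
{
	printf("FDB_TC_MAX_CHAIN       = %d\n", FDB_TC_MAX_CHAIN);
	printf("FDB_FT_CHAIN           = %d\n", FDB_FT_CHAIN);
	printf("FDB_TC_SLOW_PATH_CHAIN = %d\n", FDB_TC_SLOW_PATH_CHAIN);
	printf("FDB_NUM_CHAINS         = %d\n", FDB_NUM_CHAINS);
	printf("FDB_TC_MAX_PRIO        = %d\n", FDB_TC_MAX_PRIO);
	printf("FDB_TC_LEVELS_PER_PRIO = %d\n", FDB_TC_LEVELS_PER_PRIO);
}
```

With these definitions the chain indices 0 through 4 are counted by FDB_NUM_CHAINS, and FDB_TC_SLOW_PATH_CHAIN evaluates to the same number as FDB_NUM_CHAINS, that is, one past the last counted index.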
* Re: [net-next 01/10] net/mlx5: Avoid eswitch header inclusion in fs core layer
From: Jakub Kicinski @ 2020-06-23 21:00 UTC
To: Saeed Mahameed; +Cc: David S. Miller, netdev, Parav Pandit

On Tue, 23 Jun 2020 12:52:20 -0700 Saeed Mahameed wrote:
> From: Parav Pandit <parav@mellanox.com>
>
> Flow steering core layer is independent of the eswitch layer.
> Hence avoid fs_core dependency on eswitch.
>
> Fixes: 328edb499f99 ("net/mlx5: Split FDB fast path prio to multiple namespaces")

A little liberal on the use of fixes tag here...

> Signed-off-by: Parav Pandit <parav@mellanox.com>
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

* [net-next 02/10] net/mlx5: FWTrace: Add missing space
From: Saeed Mahameed @ 2020-06-23 19:52 UTC
To: David S. Miller, kuba; +Cc: netdev, Hu Haowen, Saeed Mahameed

From: Hu Haowen <xianfengting221@163.com>

Missing space at the end of a comment line, add it.

Signed-off-by: Hu Haowen <xianfengting221@163.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
index a7551274be58a..ad3594c4afcb5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
@@ -676,7 +676,7 @@ static void mlx5_fw_tracer_handle_traces(struct work_struct *work)
 	block_count = tracer->buff.size / TRACER_BLOCK_SIZE_BYTE;
 	start_offset = tracer->buff.consumer_index * TRACER_BLOCK_SIZE_BYTE;
 
-	/* Copy the block to local buffer to avoid HW override while being processed*/
+	/* Copy the block to local buffer to avoid HW override while being processed */
 	memcpy(tmp_trace_block, tracer->buff.log_buf + start_offset,
 	       TRACER_BLOCK_SIZE_BYTE);
 
-- 
2.26.2

* [net-next 03/10] net/mlx5: Add a missing macro undefinition
From: Saeed Mahameed @ 2020-06-23 19:52 UTC
To: David S. Miller, kuba; +Cc: netdev, Hu Haowen, Leon Romanovsky, Saeed Mahameed

From: Hu Haowen <xianfengting221@163.com>

The macro ODP_CAP_SET_MAX is only used in function handle_hca_cap_odp()
in file main.c, so it should be undefined when there are no more uses
of it.

Signed-off-by: Hu Haowen <xianfengting221@163.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 8b658908f0442..be038ed1658b8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -489,6 +489,8 @@ static int handle_hca_cap_odp(struct mlx5_core_dev *dev, void *set_ctx)
 	ODP_CAP_SET_MAX(dev, dc_odp_caps.read);
 	ODP_CAP_SET_MAX(dev, dc_odp_caps.atomic);
 
+#undef ODP_CAP_SET_MAX
+
 	if (!do_set)
 		return 0;
 
-- 
2.26.2

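The idiom in patch 03, a helper macro that is defined inside one function and undefined right after its last use, can be sketched in plain user-space C. Everything below is hypothetical (struct caps, caps_copy_max(), and CAP_SET_MAX are illustrative names, not the driver's ODP_CAP_SET_MAX or its real body); it only shows why the #undef keeps the short name from leaking into the rest of the file.

```c
#include <stddef.h>

/* Hypothetical capability record; the field names are illustrative only. */
struct caps {
	int read;
	int write;
	int atomic;
};

/* Raise every field of *cur to the value in *max; return 1 if anything changed. */
int caps_copy_max(struct caps *cur, const struct caps *max)
{
	int changed = 0;

/* Helper that only makes sense inside this function body. */
#define CAP_SET_MAX(field)				\
	do {						\
		if (cur->field != max->field) {		\
			cur->field = max->field;	\
			changed = 1;			\
		}					\
	} while (0)

	CAP_SET_MAX(read);
	CAP_SET_MAX(write);
	CAP_SET_MAX(atomic);

/* Last use is above: drop the name so later code cannot pick it up by accident. */
#undef CAP_SET_MAX

	return changed;
}

/* Compile-time check that the helper really is gone past this point. */
#ifdef CAP_SET_MAX
#error "CAP_SET_MAX should not be visible here"
#endif
```

Without the #undef the file would still compile; the benefit is only that a later, unrelated use of the same short name fails loudly instead of silently expanding to this function's helper.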
* [net-next 04/10] net/mlx5: Use kfree(ft->g) in arfs_create_groups()
From: Saeed Mahameed @ 2020-06-23 19:52 UTC
To: David S. Miller, kuba; +Cc: netdev, Denis Efremov, Saeed Mahameed

From: Denis Efremov <efremov@linux.com>

Use kfree() instead of kvfree() on ft->g in arfs_create_groups() because
the memory is allocated with kcalloc().

Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
index 014639ea06e34..c4c9d6cda7e62 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c
@@ -220,7 +220,7 @@ static int arfs_create_groups(struct mlx5e_flow_table *ft,
 			sizeof(*ft->g), GFP_KERNEL);
 	in = kvzalloc(inlen, GFP_KERNEL);
 	if (!in || !ft->g) {
-		kvfree(ft->g);
+		kfree(ft->g);
 		kvfree(in);
 		return -ENOMEM;
 	}
-- 
2.26.2

* [net-next 05/10] net/mlx5e: Remove unused mlx5e_xsk_first_unused_channel
From: Saeed Mahameed @ 2020-06-23 19:52 UTC
To: David S. Miller, kuba; +Cc: netdev, Maxim Mikityanskiy, Saeed Mahameed

From: Maxim Mikityanskiy <maximmi@mellanox.com>

mlx5e_xsk_first_unused_channel is a leftover from old versions of the
first XSK commit, and it was never used. Remove it.

Fixes: db05815b36cb ("net/mlx5e: Add XSK zero-copy support")
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 .../net/ethernet/mellanox/mlx5/core/en/xsk/umem.c | 13 -------------
 .../net/ethernet/mellanox/mlx5/core/en/xsk/umem.h |  2 --
 2 files changed, 15 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.c
index 7b17fcd0a56d7..331ca2b0f8a4a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.c
@@ -215,16 +215,3 @@ int mlx5e_xsk_setup_umem(struct net_device *dev, struct xdp_umem *umem, u16 qid)
 	return umem ? mlx5e_xsk_enable_umem(priv, umem, ix) :
 		      mlx5e_xsk_disable_umem(priv, ix);
 }
-
-u16 mlx5e_xsk_first_unused_channel(struct mlx5e_params *params, struct mlx5e_xsk *xsk)
-{
-	u16 res = xsk->refcnt ? params->num_channels : 0;
-
-	while (res) {
-		if (mlx5e_xsk_get_umem(params, xsk, res - 1))
-			break;
-		--res;
-	}
-
-	return res;
-}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.h b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.h
index 25b4cbe58b540..bada949735867 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xsk/umem.h
@@ -26,6 +26,4 @@ int mlx5e_xsk_setup_umem(struct net_device *dev, struct xdp_umem *umem, u16 qid)
 
 int mlx5e_xsk_resize_reuseq(struct xdp_umem *umem, u32 nentries);
 
-u16 mlx5e_xsk_first_unused_channel(struct mlx5e_params *params, struct mlx5e_xsk *xsk);
-
 #endif /* __MLX5_EN_XSK_UMEM_H__ */
-- 
2.26.2

* [net-next 06/10] net/mlx5e: Move including net/arp.h from en_rep.c to rep/neigh.c
From: Saeed Mahameed @ 2020-06-23 19:52 UTC
To: David S. Miller, kuba; +Cc: netdev, Alaa Hleihel, Vlad Buslov, Saeed Mahameed

From: Alaa Hleihel <alaa@mellanox.com>

After the cited commit, the header net/arp.h is no longer used in en_rep.c.
So, move it to the new file rep/neigh.c that uses it now.

Fixes: 549c243e4e01 ("net/mlx5e: Extract neigh-specific code from en_rep.c to rep/neigh.c")
Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en/rep/neigh.c | 1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_rep.c       | 1 -
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/neigh.c b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/neigh.c
index baa162432e75e..a0913836c973f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/rep/neigh.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/rep/neigh.c
@@ -10,6 +10,7 @@
 #include <linux/spinlock.h>
 #include <linux/notifier.h>
 #include <net/netevent.h>
+#include <net/arp.h>
 #include "neigh.h"
 #include "tc.h"
 #include "en_rep.h"
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
index 20ff8526d2126..ed2430677b129 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rep.c
@@ -35,7 +35,6 @@
 #include <net/switchdev.h>
 #include <net/pkt_cls.h>
 #include <net/act_api.h>
-#include <net/arp.h>
 #include <net/devlink.h>
 #include <net/ipv6_stubs.h>
 
-- 
2.26.2

* Re: [net-next 06/10] net/mlx5e: Move including net/arp.h from en_rep.c to rep/neigh.c
From: Jakub Kicinski @ 2020-06-23 21:02 UTC
To: Saeed Mahameed; +Cc: David S. Miller, netdev, Alaa Hleihel, Vlad Buslov

On Tue, 23 Jun 2020 12:52:25 -0700 Saeed Mahameed wrote:
> From: Alaa Hleihel <alaa@mellanox.com>
>
> After the cited commit, the header net/arp.h is no longer used in en_rep.c.
> So, move it to the new file rep/neigh.c that uses it now.
>
> Fixes: 549c243e4e01 ("net/mlx5e: Extract neigh-specific code from en_rep.c to rep/neigh.c")

ditto

> Signed-off-by: Alaa Hleihel <alaa@mellanox.com>
> Reviewed-by: Vlad Buslov <vladbu@mellanox.com>
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

* [net-next 07/10] net/mlx5e: Move TC-specific function definitions into MLX5_CLS_ACT
From: Saeed Mahameed @ 2020-06-23 19:52 UTC
To: David S. Miller, kuba
Cc: netdev, Vlad Buslov, Roi Dayan, Maor Dickman, Saeed Mahameed

From: Vlad Buslov <vladbu@mellanox.com>

en_tc.h header file declares several TC-specific functions in
CONFIG_MLX5_ESWITCH block even though those functions are only compiled
when CONFIG_MLX5_CLS_ACT is set, which is a recent change. Move them to
proper block.

Fixes: d956873f908c ("net/mlx5e: Introduce kconfig var for TC support")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Maor Dickman <maord@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.h | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
index 5c330b0cae213..1561eaa89ffd2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h
@@ -40,6 +40,14 @@
 
 #ifdef CONFIG_MLX5_ESWITCH
 
+int mlx5e_tc_num_filters(struct mlx5e_priv *priv, unsigned long flags);
+
+struct mlx5e_tc_update_priv {
+	struct net_device *tun_dev;
+};
+
+#if IS_ENABLED(CONFIG_MLX5_CLS_ACT)
+
 struct tunnel_match_key {
 	struct flow_dissector_key_control enc_control;
 	struct flow_dissector_key_keyid enc_key_id;
@@ -114,8 +122,6 @@ void mlx5e_put_encap_flow_list(struct mlx5e_priv *priv, struct list_head *flow_l
 struct mlx5e_neigh_hash_entry;
 void mlx5e_tc_update_neigh_used_value(struct mlx5e_neigh_hash_entry *nhe);
 
-int mlx5e_tc_num_filters(struct mlx5e_priv *priv, unsigned long flags);
-
 void mlx5e_tc_reoffload_flows_work(struct work_struct *work);
 
 enum mlx5e_tc_attr_to_reg {
@@ -142,10 +148,6 @@ extern struct mlx5e_tc_attr_to_reg_mapping mlx5e_tc_attr_to_reg_mappings[];
 bool mlx5e_is_valid_eswitch_fwd_dev(struct mlx5e_priv *priv,
 				    struct net_device *out_dev);
 
-struct mlx5e_tc_update_priv {
-	struct net_device *tun_dev;
-};
-
 struct mlx5e_tc_mod_hdr_acts {
 	int num_actions;
 	int max_actions;
@@ -174,8 +176,6 @@ void mlx5e_tc_set_ethertype(struct mlx5_core_dev *mdev,
 			    struct flow_match_basic *match, bool outer,
 			    void *headers_c, void *headers_v);
 
-#if IS_ENABLED(CONFIG_MLX5_CLS_ACT)
-
 int mlx5e_tc_nic_init(struct mlx5e_priv *priv);
 void mlx5e_tc_nic_cleanup(struct mlx5e_priv *priv);
 
-- 
2.26.2

* Re: [net-next 07/10] net/mlx5e: Move TC-specific function definitions into MLX5_CLS_ACT
From: Jakub Kicinski @ 2020-06-23 21:03 UTC
To: Saeed Mahameed
Cc: David S. Miller, netdev, Vlad Buslov, Roi Dayan, Maor Dickman

On Tue, 23 Jun 2020 12:52:26 -0700 Saeed Mahameed wrote:
> From: Vlad Buslov <vladbu@mellanox.com>
>
> en_tc.h header file declares several TC-specific functions in
> CONFIG_MLX5_ESWITCH block even though those functions are only compiled
> when CONFIG_MLX5_CLS_ACT is set, which is a recent change. Move them to
> proper block.
>
> Fixes: d956873f908c ("net/mlx5e: Introduce kconfig var for TC support")

and here... do those break build or something?

> Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
> Reviewed-by: Roi Dayan <roid@mellanox.com>
> Reviewed-by: Maor Dickman <maord@mellanox.com>
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

* Re: [net-next 07/10] net/mlx5e: Move TC-specific function definitions into MLX5_CLS_ACT
From: Saeed Mahameed @ 2020-06-23 21:26 UTC
To: kuba; +Cc: Roi Dayan, Maor Dickman, davem, netdev, Vlad Buslov

On Tue, 2020-06-23 at 14:03 -0700, Jakub Kicinski wrote:
> On Tue, 23 Jun 2020 12:52:26 -0700 Saeed Mahameed wrote:
> > From: Vlad Buslov <vladbu@mellanox.com>
> >
> > en_tc.h header file declares several TC-specific functions in
> > CONFIG_MLX5_ESWITCH block even though those functions are only
> > compiled
> > when CONFIG_MLX5_CLS_ACT is set, which is a recent change. Move
> > them to
> > proper block.
> >
> > Fixes: d956873f908c ("net/mlx5e: Introduce kconfig var for TC
> > support")
>
> and here... do those break build or something?

No, just redundant exposure and leftovers.
Do you want me to remove the Fixes Tags ?
Personally I don't mind fixes tags for something this basic,
but your call..

>
> > Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
> > Reviewed-by: Roi Dayan <roid@mellanox.com>
> > Reviewed-by: Maor Dickman <maord@mellanox.com>
> > Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

* Re: [net-next 07/10] net/mlx5e: Move TC-specific function definitions into MLX5_CLS_ACT
From: Jakub Kicinski @ 2020-06-23 21:33 UTC
To: Saeed Mahameed; +Cc: Roi Dayan, Maor Dickman, davem, netdev, Vlad Buslov

On Tue, 23 Jun 2020 21:26:02 +0000 Saeed Mahameed wrote:
> On Tue, 2020-06-23 at 14:03 -0700, Jakub Kicinski wrote:
> > On Tue, 23 Jun 2020 12:52:26 -0700 Saeed Mahameed wrote:
> > > From: Vlad Buslov <vladbu@mellanox.com>
> > >
> > > en_tc.h header file declares several TC-specific functions in
> > > CONFIG_MLX5_ESWITCH block even though those functions are only
> > > compiled
> > > when CONFIG_MLX5_CLS_ACT is set, which is a recent change. Move
> > > them to
> > > proper block.
> > >
> > > Fixes: d956873f908c ("net/mlx5e: Introduce kconfig var for TC
> > > support")
> >
> > and here... do those break build or something?
>
> No, just redundant exposure and leftovers.
> Do you want me to remove the Fixes Tags ?
> Personally I don't mind fixes tags for something this basic,
> but your call..

If you don't mind - please remove them, IMHO frivolous use of Fixes
tags removes half of their value.

* [net-next 08/10] net/mlx5e: vxlan: Use RCU for vxlan table lookup
From: Saeed Mahameed @ 2020-06-23 19:52 UTC
To: David S. Miller, kuba; +Cc: netdev, Saeed Mahameed, Maxim Mikityanskiy

Remove the spinlock protecting the vxlan table and use RCU instead.
This will improve performance as it will eliminate contention on data
path cores.

Fixes: b3f63c3d5e2c ("net/mlx5e: Add netdev support for VXLAN tunneling")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Maxim Mikityanskiy <maximmi@mellanox.com>
---
 .../ethernet/mellanox/mlx5/core/lib/vxlan.c | 65 ++++++++-----------
 1 file changed, 27 insertions(+), 38 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/vxlan.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/vxlan.c
index 82c766a951656..85cbc42955859 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/vxlan.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/vxlan.c
@@ -40,7 +40,6 @@
 
 struct mlx5_vxlan {
 	struct mlx5_core_dev		*mdev;
-	spinlock_t			lock; /* protect vxlan table */
 	/* max_num_ports is usuallly 4, 16 buckets is more than enough */
 	DECLARE_HASHTABLE(htable, 4);
 	int				num_ports;
@@ -78,45 +77,46 @@ static int mlx5_vxlan_core_del_port_cmd(struct mlx5_core_dev *mdev, u16 port)
 	return mlx5_cmd_exec_in(mdev, delete_vxlan_udp_dport, in);
 }
 
-static struct mlx5_vxlan_port*
-mlx5_vxlan_lookup_port_locked(struct mlx5_vxlan *vxlan, u16 port)
+struct mlx5_vxlan_port *mlx5_vxlan_lookup_port(struct mlx5_vxlan *vxlan, u16 port)
 {
-	struct mlx5_vxlan_port *vxlanp;
+	struct mlx5_vxlan_port *retptr = NULL, *vxlanp;
 
-	hash_for_each_possible(vxlan->htable, vxlanp, hlist, port) {
-		if (vxlanp->udp_port == port)
-			return vxlanp;
-	}
+	if (!mlx5_vxlan_allowed(vxlan))
+		return NULL;
 
-	return NULL;
+	rcu_read_lock();
+	hash_for_each_possible_rcu(vxlan->htable, vxlanp, hlist, port)
+		if (vxlanp->udp_port == port) {
+			retptr = vxlanp;
+			break;
+		}
+	rcu_read_unlock();
+
+	return retptr;
 }
 
-struct mlx5_vxlan_port *mlx5_vxlan_lookup_port(struct mlx5_vxlan *vxlan, u16 port)
+static struct mlx5_vxlan_port *vxlan_lookup_port(struct mlx5_vxlan *vxlan, u16 port)
 {
 	struct mlx5_vxlan_port *vxlanp;
 
-	if (!mlx5_vxlan_allowed(vxlan))
-		return NULL;
-
-	spin_lock_bh(&vxlan->lock);
-	vxlanp = mlx5_vxlan_lookup_port_locked(vxlan, port);
-	spin_unlock_bh(&vxlan->lock);
-
-	return vxlanp;
+	hash_for_each_possible(vxlan->htable, vxlanp, hlist, port)
+		if (vxlanp->udp_port == port)
+			return vxlanp;
+	return NULL;
 }
 
 int mlx5_vxlan_add_port(struct mlx5_vxlan *vxlan, u16 port)
 {
 	struct mlx5_vxlan_port *vxlanp;
-	int ret = -ENOSPC;
+	int ret = 0;
 
-	vxlanp = mlx5_vxlan_lookup_port(vxlan, port);
+	mutex_lock(&vxlan->sync_lock);
+	vxlanp = vxlan_lookup_port(vxlan, port);
 	if (vxlanp) {
 		refcount_inc(&vxlanp->refcount);
-		return 0;
+		goto unlock;
 	}
 
-	mutex_lock(&vxlan->sync_lock);
 	if (vxlan->num_ports >= mlx5_vxlan_max_udp_ports(vxlan->mdev)) {
 		mlx5_core_info(vxlan->mdev,
 			       "UDP port (%d) not offloaded, max number of UDP ports (%d) are already offloaded\n",
@@ -138,9 +138,7 @@ int mlx5_vxlan_add_port(struct mlx5_vxlan *vxlan, u16 port)
 	vxlanp->udp_port = port;
 	refcount_set(&vxlanp->refcount, 1);
 
-	spin_lock_bh(&vxlan->lock);
-	hash_add(vxlan->htable, &vxlanp->hlist, port);
-	spin_unlock_bh(&vxlan->lock);
+	hash_add_rcu(vxlan->htable, &vxlanp->hlist, port);
 
 	vxlan->num_ports++;
 	mutex_unlock(&vxlan->sync_lock);
@@ -157,34 +155,26 @@ int mlx5_vxlan_add_port(struct mlx5_vxlan *vxlan, u16 port)
 int mlx5_vxlan_del_port(struct mlx5_vxlan *vxlan, u16 port)
 {
 	struct mlx5_vxlan_port *vxlanp;
-	bool remove = false;
 	int ret = 0;
 
 	mutex_lock(&vxlan->sync_lock);
 
-	spin_lock_bh(&vxlan->lock);
-	vxlanp = mlx5_vxlan_lookup_port_locked(vxlan, port);
+	vxlanp = vxlan_lookup_port(vxlan, port);
 	if (!vxlanp) {
 		ret = -ENOENT;
 		goto out_unlock;
 	}
 
 	if (refcount_dec_and_test(&vxlanp->refcount)) {
-		hash_del(&vxlanp->hlist);
-		remove = true;
-	}
-
-out_unlock:
-	spin_unlock_bh(&vxlan->lock);
-
-	if (remove) {
+		hash_del_rcu(&vxlanp->hlist);
+		synchronize_rcu();
 		mlx5_vxlan_core_del_port_cmd(vxlan->mdev, port);
 		kfree(vxlanp);
 		vxlan->num_ports--;
 	}
 
+out_unlock:
 	mutex_unlock(&vxlan->sync_lock);
-
 	return ret;
 }
 
@@ -201,7 +191,6 @@ struct mlx5_vxlan *mlx5_vxlan_create(struct mlx5_core_dev *mdev)
 
 	vxlan->mdev = mdev;
 	mutex_init(&vxlan->sync_lock);
-	spin_lock_init(&vxlan->lock);
 	hash_init(vxlan->htable);
 
 	/* Hardware adds 4789 (IANA_VXLAN_UDP_PORT) by default */
-- 
2.26.2

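The add/del bookkeeping that patch 08 leaves in place (bump a reference count when the port is already known, free the entry only when the last reference is dropped) can be modelled in a few lines of user-space C. This is a single-threaded sketch under stated assumptions: it deliberately leaves out the RCU read side, the sync_lock mutex, and the firmware commands, and every name in it (port_entry, port_table, table_add_port, table_del_port, MAX_PORTS, the -ENOSPC return on a full table) is hypothetical rather than taken from the driver.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_PORTS 4	/* hypothetical limit, stands in for the device maximum */

struct port_entry {
	struct port_entry *next;
	uint16_t udp_port;
	unsigned int refcount;
};

struct port_table {
	struct port_entry *head;
	int num_ports;
};

/* Writer-side lookup: in the driver this would run with the mutex held. */
static struct port_entry *table_find(struct port_table *t, uint16_t port)
{
	for (struct port_entry *e = t->head; e; e = e->next)
		if (e->udp_port == port)
			return e;
	return NULL;
}

/* Add a port, or take another reference if it is already present. */
int table_add_port(struct port_table *t, uint16_t port)
{
	struct port_entry *e = table_find(t, port);

	if (e) {
		e->refcount++;
		return 0;
	}
	if (t->num_ports >= MAX_PORTS)
		return -ENOSPC;

	e = calloc(1, sizeof(*e));
	if (!e)
		return -ENOMEM;

	e->udp_port = port;
	e->refcount = 1;
	e->next = t->head;	/* publish the new entry at the list head */
	t->head = e;
	t->num_ports++;
	return 0;
}

/* Drop one reference; unlink and free the entry when it was the last one. */
int table_del_port(struct port_table *t, uint16_t port)
{
	for (struct port_entry **pp = &t->head; *pp; pp = &(*pp)->next) {
		struct port_entry *e = *pp;

		if (e->udp_port != port)
			continue;
		if (--e->refcount == 0) {
			*pp = e->next;	/* unlink first ... */
			free(e);	/* ... then free (the patch waits for readers in between) */
			t->num_ports--;
		}
		return 0;
	}
	return -ENOENT;
}
```

In the patch the step between unlinking and freeing is where synchronize_rcu() sits, so that a reader still walking the bucket never touches freed memory; the sketch has no concurrent readers and therefore frees immediately.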
* [net-next 09/10] net/mlx5e: vxlan: Return bool instead of opaque ptr in port_lookup() 2020-06-23 19:52 [pull request][net-next 00/10] mlx5 updates 2020-06-23 Saeed Mahameed ` (7 preceding siblings ...) 2020-06-23 19:52 ` [net-next 08/10] net/mlx5e: vxlan: Use RCU for vxlan table lookup Saeed Mahameed @ 2020-06-23 19:52 ` Saeed Mahameed 2020-06-23 19:52 ` [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering Saeed Mahameed 9 siblings, 0 replies; 42+ messages in thread From: Saeed Mahameed @ 2020-06-23 19:52 UTC (permalink / raw) To: David S. Miller, kuba; +Cc: netdev, Saeed Mahameed struct mlx5_vxlan_port is not exposed to the outside callers, it is redundant to return a pointer to it from mlx5_vxlan_port_lookup(), to be only used as a boolean, so just return a boolean. Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/lib/vxlan.c | 9 +++++---- drivers/net/ethernet/mellanox/mlx5/core/lib/vxlan.h | 5 ++--- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/vxlan.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/vxlan.c index 85cbc42955859..be34330d89cc4 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/vxlan.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/vxlan.c @@ -77,9 +77,10 @@ static int mlx5_vxlan_core_del_port_cmd(struct mlx5_core_dev *mdev, u16 port) return mlx5_cmd_exec_in(mdev, delete_vxlan_udp_dport, in); } -struct mlx5_vxlan_port *mlx5_vxlan_lookup_port(struct mlx5_vxlan *vxlan, u16 port) +bool mlx5_vxlan_lookup_port(struct mlx5_vxlan *vxlan, u16 port) { - struct mlx5_vxlan_port *retptr = NULL, *vxlanp; + struct mlx5_vxlan_port *vxlanp; + bool found = false; if (!mlx5_vxlan_allowed(vxlan)) return NULL; @@ -87,12 +88,12 @@ struct mlx5_vxlan_port *mlx5_vxlan_lookup_port(struct mlx5_vxlan *vxlan, u16 por rcu_read_lock(); hash_for_each_possible_rcu(vxlan->htable, vxlanp, hlist, port) if (vxlanp->udp_port == port) { - retptr = vxlanp; + 
found = true; break; } rcu_read_unlock(); - return retptr; + return found; } static struct mlx5_vxlan_port *vxlan_lookup_port(struct mlx5_vxlan *vxlan, u16 port) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/vxlan.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/vxlan.h index 8fb0eb08fa6d2..6d599f4a8acdf 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/vxlan.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/vxlan.h @@ -50,15 +50,14 @@ struct mlx5_vxlan *mlx5_vxlan_create(struct mlx5_core_dev *mdev); void mlx5_vxlan_destroy(struct mlx5_vxlan *vxlan); int mlx5_vxlan_add_port(struct mlx5_vxlan *vxlan, u16 port); int mlx5_vxlan_del_port(struct mlx5_vxlan *vxlan, u16 port); -struct mlx5_vxlan_port *mlx5_vxlan_lookup_port(struct mlx5_vxlan *vxlan, u16 port); +bool mlx5_vxlan_lookup_port(struct mlx5_vxlan *vxlan, u16 port); #else static inline struct mlx5_vxlan* mlx5_vxlan_create(struct mlx5_core_dev *mdev) { return ERR_PTR(-EOPNOTSUPP); } static inline void mlx5_vxlan_destroy(struct mlx5_vxlan *vxlan) { return; } static inline int mlx5_vxlan_add_port(struct mlx5_vxlan *vxlan, u16 port) { return -EOPNOTSUPP; } static inline int mlx5_vxlan_del_port(struct mlx5_vxlan *vxlan, u16 port) { return -EOPNOTSUPP; } -static inline struct mx5_vxlan_port* -mlx5_vxlan_lookup_port(struct mlx5_vxlan *vxlan, u16 port) { return NULL; } +static inline bool mlx5_vxlan_lookup_port(struct mlx5_vxlan *vxlan, u16 port) { return false; } #endif #endif /* __MLX5_VXLAN_H__ */ -- 2.26.2 ^ permalink raw reply related [flat|nested] 42+ messages in thread
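The interface change in this patch (report presence as a boolean instead of handing out a pointer to a private structure) can be illustrated with a small userspace sketch. This is not the driver code: the fixed-size array stands in for the RCU-protected hash table, and the names `vxlan_table`, `vxlan_port_entry`, `vxlan_table_add_port` and `vxlan_table_lookup_port` are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VXLAN_TABLE_SIZE 8 /* hypothetical capacity, for illustration only */

/* Private entry type: callers only ever learn "present" or "absent",
 * much like struct mlx5_vxlan_port is not exposed to outside callers. */
struct vxlan_port_entry {
	uint16_t udp_port;
	bool in_use;
};

struct vxlan_table {
	struct vxlan_port_entry entries[VXLAN_TABLE_SIZE];
};

/* Store a port in the first free slot; returns false if the table is full. */
static bool vxlan_table_add_port(struct vxlan_table *t, uint16_t port)
{
	assert(t != NULL);

	for (size_t i = 0; i < VXLAN_TABLE_SIZE; i++) {
		if (!t->entries[i].in_use) {
			t->entries[i].udp_port = port;
			t->entries[i].in_use = true;
			return true;
		}
	}
	return false;
}

/* Boolean lookup: every return path yields true or false, never a pointer. */
static bool vxlan_table_lookup_port(const struct vxlan_table *t, uint16_t port)
{
	bool found = false;

	if (!t)
		return false;

	for (size_t i = 0; i < VXLAN_TABLE_SIZE; i++) {
		if (t->entries[i].in_use && t->entries[i].udp_port == port) {
			found = true;
			break;
		}
	}
	return found;
}
```

With a boolean return type, a caller cannot keep a reference to an entry after the lookup finishes, which is presumably part of the appeal once the table is protected by RCU.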
* [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering 2020-06-23 19:52 [pull request][net-next 00/10] mlx5 updates 2020-06-23 Saeed Mahameed ` (8 preceding siblings ...) 2020-06-23 19:52 ` [net-next 09/10] net/mlx5e: vxlan: Return bool instead of opaque ptr in port_lookup() Saeed Mahameed @ 2020-06-23 19:52 ` Saeed Mahameed 2020-06-23 21:31 ` Jakub Kicinski 9 siblings, 1 reply; 42+ messages in thread From: Saeed Mahameed @ 2020-06-23 19:52 UTC (permalink / raw) To: David S. Miller, kuba; +Cc: netdev, Aya Levin, Tariq Toukan, Saeed Mahameed From: Aya Levin <ayal@mellanox.com> The concept of Relaxed Ordering in the PCI Express environment allows switches in the path between the Requester and Completer to reorder some transactions just received before others that were previously enqueued. In ETH driver, there is no question of write integrity since each memory segment is written only once per cycle. In addition, the driver doesn't access the memory shared with the hardware until the corresponding CQE arrives indicating all PCI transactions are done. With relaxed ordering set, traffic on the remote-numa is at the same level as when on the local numa. Running TCP single stream over ConnectX-4 LX, ARM CPU on remote-numa has 300% improvement in the bandwidth. With relaxed ordering turned off: BW:10 [GB/s] With relaxed ordering turned on: BW:40 [GB/s] The driver turns relaxed ordering off by default. It exposes 2 boolean private-flags in ethtool: pci_ro_read and pci_ro_write for user control. $ ethtool --show-priv-flags eth2 Private flags for eth2: ... 
pci_ro_read : off pci_ro_write : off $ ethtool --set-priv-flags eth2 pci_ro_write on $ ethtool --set-priv-flags eth2 pci_ro_read on Signed-off-by: Aya Levin <ayal@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 3 + .../ethernet/mellanox/mlx5/core/en_common.c | 67 +++++++++++++++++-- .../ethernet/mellanox/mlx5/core/en_ethtool.c | 46 +++++++++++++ .../net/ethernet/mellanox/mlx5/core/en_main.c | 29 ++++++-- include/linux/mlx5/driver.h | 10 ++- 5 files changed, 143 insertions(+), 12 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 842db20493df6..32b1d41d36347 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -228,6 +228,8 @@ enum mlx5e_priv_flag { MLX5E_PFLAG_RX_STRIDING_RQ, MLX5E_PFLAG_RX_NO_CSUM_COMPLETE, MLX5E_PFLAG_XDP_TX_MPWQE, + MLX5E_PFLAG_PCI_RO_READ, + MLX5E_PFLAG_PCI_RO_WRITE, MLX5E_NUM_PFLAGS, /* Keep last */ }; @@ -1033,6 +1035,7 @@ int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev); void mlx5e_destroy_mdev_resources(struct mlx5_core_dev *mdev); int mlx5e_refresh_tirs(struct mlx5e_priv *priv, bool enable_uc_lb, bool enable_mc_lb); +__be32 mlx5e_mkey_ro_get(struct mlx5e_resources *res, u8 mkey_idx); /* common netdev helpers */ void mlx5e_create_q_counters(struct mlx5e_priv *priv); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c index 1e42c7ae621b9..a3a6a16c774d0 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_common.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_common.c @@ -61,9 +61,10 @@ void mlx5e_destroy_tir(struct mlx5_core_dev *mdev, } static int mlx5e_create_mkey(struct mlx5_core_dev *mdev, u32 pdn, - struct mlx5_core_mkey *mkey) + struct mlx5_core_mkey *mkey, u8 ro_state) { int inlen = 
MLX5_ST_SZ_BYTES(create_mkey_in); + static const u8 mkey_variant = 0x5e; void *mkc; u32 *in; int err; @@ -76,10 +77,13 @@ static int mlx5e_create_mkey(struct mlx5_core_dev *mdev, u32 pdn, MLX5_SET(mkc, mkc, access_mode_1_0, MLX5_MKC_ACCESS_MODE_PA); MLX5_SET(mkc, mkc, lw, 1); MLX5_SET(mkc, mkc, lr, 1); - + MLX5_SET(mkc, mkc, relaxed_ordering_read, ro_state & MLX5E_MKEY_RO_READ); + MLX5_SET(mkc, mkc, relaxed_ordering_write, ro_state & MLX5E_MKEY_RO_WRITE); MLX5_SET(mkc, mkc, pd, pdn); MLX5_SET(mkc, mkc, length64, 1); MLX5_SET(mkc, mkc, qpn, 0xffffff); + MLX5_SET(mkc, mkc, mkey_7_0, mkey_variant); + mkey->key = mkey_variant; err = mlx5_core_create_mkey(mdev, mkey, in, inlen); @@ -87,6 +91,57 @@ static int mlx5e_create_mkey(struct mlx5_core_dev *mdev, u32 pdn, return err; } +static bool mlx5e_rx_mkey_supported(struct mlx5_core_dev *mdev, u8 mkey_idx) +{ + if ((mkey_idx & MLX5E_MKEY_RO_READ) && + !MLX5_CAP_GEN(mdev, relaxed_ordering_read)) + return false; + if ((mkey_idx & MLX5E_MKEY_RO_WRITE) && + !MLX5_CAP_GEN(mdev, relaxed_ordering_write)) + return false; + return true; +} + +static int mlx5e_create_mkeys(struct mlx5_core_dev *mdev, u32 pdn, + struct mlx5_core_mkey mkey_arr[]) +{ + int i, err; + + for (i = 0; i < MLX5E_MKEY_RO_NUM; i++) { + if (!mlx5e_rx_mkey_supported(mdev, i)) + continue; + err = mlx5e_create_mkey(mdev, pdn, &mkey_arr[i], i); + if (err) + goto destroy; + } + return err; + +destroy: + while (--i >= 0) { + if (!mkey_arr[i].key) + continue; + mlx5_core_destroy_mkey(mdev, &mkey_arr[i]); + } + return err; +} + +static void mlx5e_destroy_mkeys(struct mlx5_core_dev *mdev, + struct mlx5_core_mkey mkey_arr[]) +{ + int i; + + for (i = 0; i < MLX5E_MKEY_RO_NUM; i++) { + if (!mkey_arr[i].key) + continue; + mlx5_core_destroy_mkey(mdev, &mkey_arr[i]); + } +} + +__be32 mlx5e_mkey_ro_get(struct mlx5e_resources *res, u8 mkey_idx) +{ + return cpu_to_be32(res->mkey_ro[mkey_idx].key); +} + int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev) { struct 
mlx5e_resources *res = &mdev->mlx5e_res; @@ -104,9 +159,9 @@ int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev) goto err_dealloc_pd; } - err = mlx5e_create_mkey(mdev, res->pdn, &res->mkey); + err = mlx5e_create_mkeys(mdev, res->pdn, res->mkey_ro); if (err) { - mlx5_core_err(mdev, "create mkey failed, %d\n", err); + mlx5_core_err(mdev, "create mkeys failed, %d\n", err); goto err_dealloc_transport_domain; } @@ -122,7 +177,7 @@ int mlx5e_create_mdev_resources(struct mlx5_core_dev *mdev) return 0; err_destroy_mkey: - mlx5_core_destroy_mkey(mdev, &res->mkey); + mlx5e_destroy_mkeys(mdev, res->mkey_ro); err_dealloc_transport_domain: mlx5_core_dealloc_transport_domain(mdev, res->td.tdn); err_dealloc_pd: @@ -135,7 +190,7 @@ void mlx5e_destroy_mdev_resources(struct mlx5_core_dev *mdev) struct mlx5e_resources *res = &mdev->mlx5e_res; mlx5_free_bfreg(mdev, &res->bfreg); - mlx5_core_destroy_mkey(mdev, &res->mkey); + mlx5e_destroy_mkeys(mdev, res->mkey_ro); mlx5_core_dealloc_transport_domain(mdev, res->td.tdn); mlx5_core_dealloc_pd(mdev, res->pdn); memset(res, 0, sizeof(*res)); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c index ec5658bbe3c57..4e61f7f87118f 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c @@ -1905,6 +1905,50 @@ static int set_pflag_xdp_tx_mpwqe(struct net_device *netdev, bool enable) return err; } +static int set_pflag_pci_ro_read(struct net_device *netdev, bool enable) +{ + struct mlx5e_priv *priv = netdev_priv(netdev); + struct mlx5e_channels new_channels = {}; + struct mlx5e_resources *res; + + res = &priv->mdev->mlx5e_res; + if (enable && !mlx5e_mkey_ro_get(res, MLX5E_MKEY_RO_READ)) + return -EOPNOTSUPP; + + new_channels.params = priv->channels.params; + + MLX5E_SET_PFLAG(&new_channels.params, MLX5E_PFLAG_PCI_RO_READ, enable); + + if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) { + 
priv->channels.params = new_channels.params; + return 0; + } + + return mlx5e_safe_switch_channels(priv, &new_channels, NULL, NULL); +} + +static int set_pflag_pci_ro_write(struct net_device *netdev, bool enable) +{ + struct mlx5e_priv *priv = netdev_priv(netdev); + struct mlx5e_channels new_channels = {}; + struct mlx5e_resources *res; + + res = &priv->mdev->mlx5e_res; + if (enable && !mlx5e_mkey_ro_get(res, MLX5E_MKEY_RO_WRITE)) + return -EOPNOTSUPP; + + new_channels.params = priv->channels.params; + + MLX5E_SET_PFLAG(&new_channels.params, MLX5E_PFLAG_PCI_RO_WRITE, enable); + + if (!test_bit(MLX5E_STATE_OPENED, &priv->state)) { + priv->channels.params = new_channels.params; + return 0; + } + + return mlx5e_safe_switch_channels(priv, &new_channels, NULL, NULL); +} + static const struct pflag_desc mlx5e_priv_flags[MLX5E_NUM_PFLAGS] = { { "rx_cqe_moder", set_pflag_rx_cqe_based_moder }, { "tx_cqe_moder", set_pflag_tx_cqe_based_moder }, @@ -1912,6 +1956,8 @@ static const struct pflag_desc mlx5e_priv_flags[MLX5E_NUM_PFLAGS] = { { "rx_striding_rq", set_pflag_rx_striding_rq }, { "rx_no_csum_complete", set_pflag_rx_no_csum_complete }, { "xdp_tx_mpwqe", set_pflag_xdp_tx_mpwqe }, + { "pci_ro_read", set_pflag_pci_ro_read }, + { "pci_ro_write", set_pflag_pci_ro_write }, }; static int mlx5e_handle_pflag(struct net_device *netdev, diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index a836a02a21166..80d1d940a78a6 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -258,8 +258,11 @@ static int mlx5e_rq_alloc_mpwqe_info(struct mlx5e_rq *rq, static int mlx5e_create_umr_mkey(struct mlx5_core_dev *mdev, u64 npages, u8 page_shift, + struct mlx5e_params *params, struct mlx5_core_mkey *umr_mkey) { + bool ro_write = MLX5E_GET_PFLAG(params, MLX5E_PFLAG_PCI_RO_WRITE); + bool ro_read = MLX5E_GET_PFLAG(params, MLX5E_PFLAG_PCI_RO_READ); int inlen = 
MLX5_ST_SZ_BYTES(create_mkey_in); void *mkc; u32 *in; @@ -276,7 +279,8 @@ static int mlx5e_create_umr_mkey(struct mlx5_core_dev *mdev, MLX5_SET(mkc, mkc, lw, 1); MLX5_SET(mkc, mkc, lr, 1); MLX5_SET(mkc, mkc, access_mode_1_0, MLX5_MKC_ACCESS_MODE_MTT); - + MLX5_SET(mkc, mkc, relaxed_ordering_write, ro_write); + MLX5_SET(mkc, mkc, relaxed_ordering_read, ro_read); MLX5_SET(mkc, mkc, qpn, 0xffffff); MLX5_SET(mkc, mkc, pd, mdev->mlx5e_res.pdn); MLX5_SET64(mkc, mkc, len, npages << page_shift); @@ -290,11 +294,12 @@ static int mlx5e_create_umr_mkey(struct mlx5_core_dev *mdev, return err; } -static int mlx5e_create_rq_umr_mkey(struct mlx5_core_dev *mdev, struct mlx5e_rq *rq) +static int mlx5e_create_rq_umr_mkey(struct mlx5_core_dev *mdev, struct mlx5e_rq *rq, + struct mlx5e_params *params) { u64 num_mtts = MLX5E_REQUIRED_MTTS(mlx5_wq_ll_get_size(&rq->mpwqe.wq)); - return mlx5e_create_umr_mkey(mdev, num_mtts, PAGE_SHIFT, &rq->umr_mkey); + return mlx5e_create_umr_mkey(mdev, num_mtts, PAGE_SHIFT, params, &rq->umr_mkey); } static inline u64 mlx5e_get_mpwqe_offset(struct mlx5e_rq *rq, u16 wqe_ix) @@ -457,7 +462,7 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c, rq->buff.frame0_sz = (1 << rq->mpwqe.log_stride_sz); - err = mlx5e_create_rq_umr_mkey(mdev, rq); + err = mlx5e_create_rq_umr_mkey(mdev, rq, params); if (err) goto err_rq_wq_destroy; rq->mkey_be = cpu_to_be32(rq->umr_mkey.key); @@ -1924,6 +1929,18 @@ static u8 mlx5e_enumerate_lag_port(struct mlx5_core_dev *mdev, int ix) return (ix + port_aff_bias) % mlx5e_get_num_lag_ports(mdev); } +static __be32 mlx5e_choose_ro_mkey(struct mlx5e_resources *res, struct mlx5e_params *params) +{ + u8 mkey_idx = 0; + + if (MLX5E_GET_PFLAG(params, MLX5E_PFLAG_PCI_RO_READ)) + mkey_idx |= MLX5E_MKEY_RO_READ; + if (MLX5E_GET_PFLAG(params, MLX5E_PFLAG_PCI_RO_WRITE)) + mkey_idx |= MLX5E_MKEY_RO_WRITE; + + return mlx5e_mkey_ro_get(res, mkey_idx); +} + static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix, struct mlx5e_params *params, 
struct mlx5e_channel_param *cparam, @@ -1953,12 +1970,14 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix, c->cpu = cpu; c->pdev = priv->mdev->device; c->netdev = priv->netdev; - c->mkey_be = cpu_to_be32(priv->mdev->mlx5e_res.mkey.key); c->num_tc = params->num_tc; c->xdp = !!params->xdp_prog; c->stats = &priv->channel_stats[ix].ch; c->irq_desc = irq_to_desc(irq); c->lag_port = mlx5e_enumerate_lag_port(priv->mdev, ix); + c->mkey_be = mlx5e_choose_ro_mkey(&priv->mdev->mlx5e_res, params); + if (WARN_ON_ONCE(!c->mkey_be)) + return -EINVAL; netif_napi_add(netdev, &c->napi, mlx5e_napi_poll, 64); diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 13c0e4556eda9..f3e97c3606705 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -613,10 +613,18 @@ struct mlx5_td { u32 tdn; }; +enum mlx5e_mkey_ro { + MLX5E_MKEY_RO_NONE = 0, + MLX5E_MKEY_RO_READ = 1, + MLX5E_MKEY_RO_WRITE = 2, + MLX5E_MKEY_RO_RW = 3, + MLX5E_MKEY_RO_NUM +}; + struct mlx5e_resources { u32 pdn; struct mlx5_td td; - struct mlx5_core_mkey mkey; + struct mlx5_core_mkey mkey_ro[MLX5E_MKEY_RO_NUM]; struct mlx5_sq_bfreg bfreg; }; -- 2.26.2 ^ permalink raw reply related [flat|nested] 42+ messages in thread
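As a reading aid for the patch above, here is a minimal userspace sketch of how the two private flags map to an index into the mkey array, and how a capability check can reject unsupported combinations. The enum values mirror `enum mlx5e_mkey_ro` from the patch; `struct dev_caps`, `choose_ro_mkey_idx` and `mkey_idx_supported` are hypothetical names that only model `MLX5_CAP_GEN()`, `mlx5e_choose_ro_mkey()` and `mlx5e_rx_mkey_supported()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same values as enum mlx5e_mkey_ro in the patch:
 * bit 0 selects relaxed-ordering read, bit 1 selects relaxed-ordering write. */
enum mkey_ro {
	MKEY_RO_NONE  = 0,
	MKEY_RO_READ  = 1,
	MKEY_RO_WRITE = 2,
	MKEY_RO_RW    = 3,
	MKEY_RO_NUM
};

/* Hypothetical stand-in for the device capability bits. */
struct dev_caps {
	bool ro_read;
	bool ro_write;
};

/* Combine the two user-visible flags into an index into the mkey array. */
static uint8_t choose_ro_mkey_idx(bool pflag_ro_read, bool pflag_ro_write)
{
	uint8_t idx = MKEY_RO_NONE;

	if (pflag_ro_read)
		idx |= MKEY_RO_READ;
	if (pflag_ro_write)
		idx |= MKEY_RO_WRITE;

	assert(idx < MKEY_RO_NUM);
	return idx;
}

/* An index is usable only if the device supports every RO mode it requests. */
static bool mkey_idx_supported(const struct dev_caps *caps, uint8_t idx)
{
	if ((idx & MKEY_RO_READ) && !caps->ro_read)
		return false;
	if ((idx & MKEY_RO_WRITE) && !caps->ro_write)
		return false;
	return true;
}
```

Because the enum values are bit flags, the four combinations of the two ethtool flags cover exactly the indices 0 through 3, so one pre-created mkey per index is enough.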
* Re: [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering 2020-06-23 19:52 ` [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering Saeed Mahameed @ 2020-06-23 21:31 ` Jakub Kicinski 2020-06-24 6:56 ` Saeed Mahameed 0 siblings, 1 reply; 42+ messages in thread From: Jakub Kicinski @ 2020-06-23 21:31 UTC (permalink / raw) To: Saeed Mahameed Cc: David S. Miller, netdev, Aya Levin, Tariq Toukan, Michal Kubecek On Tue, 23 Jun 2020 12:52:29 -0700 Saeed Mahameed wrote: > From: Aya Levin <ayal@mellanox.com> > > The concept of Relaxed Ordering in the PCI Express environment allows > switches in the path between the Requester and Completer to reorder some > transactions just received before others that were previously enqueued. > > In ETH driver, there is no question of write integrity since each memory > segment is written only once per cycle. In addition, the driver doesn't > access the memory shared with the hardware until the corresponding CQE > arrives indicating all PCI transactions are done. Assuming the device sets the RO bits appropriately, right? Otherwise CQE write could theoretically surpass the data write, no? > With relaxed ordering set, traffic on the remote-numa is at the same > level as when on the local numa. Same level of? Achievable bandwidth? > Running TCP single stream over ConnectX-4 LX, ARM CPU on remote-numa > has 300% improvement in the bandwidth. > With relaxed ordering turned off: BW:10 [GB/s] > With relaxed ordering turned on: BW:40 [GB/s] > > The driver turns relaxed ordering off by default. It exposes 2 boolean > private-flags in ethtool: pci_ro_read and pci_ro_write for user > control. > > $ ethtool --show-priv-flags eth2 > Private flags for eth2: > ... > pci_ro_read : off > pci_ro_write : off > > $ ethtool --set-priv-flags eth2 pci_ro_write on > $ ethtool --set-priv-flags eth2 pci_ro_read on I think Michal will rightly complain that this does not belong in private flags any more. As (/if?) 
ARM deployments take a foothold in DC this will become a common setting for most NICs. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering 2020-06-23 21:31 ` Jakub Kicinski @ 2020-06-24 6:56 ` Saeed Mahameed 2020-06-24 7:34 ` Aya Levin 0 siblings, 1 reply; 42+ messages in thread From: Saeed Mahameed @ 2020-06-24 6:56 UTC (permalink / raw) To: kuba; +Cc: mkubecek, Aya Levin, davem, netdev, Tariq Toukan On Tue, 2020-06-23 at 14:31 -0700, Jakub Kicinski wrote: > On Tue, 23 Jun 2020 12:52:29 -0700 Saeed Mahameed wrote: > > From: Aya Levin <ayal@mellanox.com> > > > > The concept of Relaxed Ordering in the PCI Express environment > > allows > > switches in the path between the Requester and Completer to reorder > > some > > transactions just received before others that were previously > > enqueued. > > > > In ETH driver, there is no question of write integrity since each > > memory > > segment is written only once per cycle. In addition, the driver > > doesn't > > access the memory shared with the hardware until the corresponding > > CQE > > arrives indicating all PCI transactions are done. > Hi Jakub, sorry I missed your comments on this patch. > Assuming the device sets the RO bits appropriately, right? Otherwise > CQE write could theoretically surpass the data write, no? > Yes, the HW guarantees correctness of correlated queues and transactions. > > With relaxed ordering set, traffic on the remote-numa is at the > > same > > level as when on the local numa. > > Same level of? Achievable bandwidth? > Yes, bandwidth. Judging by the explanation below, I see that the commit message needs improvement. > > Running TCP single stream over ConnectX-4 LX, ARM CPU on remote- > > numa > > has 300% improvement in the bandwidth. > > With relaxed ordering turned off: BW:10 [GB/s] > > With relaxed ordering turned on: BW:40 [GB/s] > > > > The driver turns relaxed ordering off by default. It exposes 2 > > boolean > > private-flags in ethtool: pci_ro_read and pci_ro_write for user > > control.
> > > > $ ethtool --show-priv-flags eth2 > > Private flags for eth2: > > ... > > pci_ro_read : off > > pci_ro_write : off > > > > $ ethtool --set-priv-flags eth2 pci_ro_write on > > $ ethtool --set-priv-flags eth2 pci_ro_read on > > I think Michal will rightly complain that this does not belong in > private flags any more. As (/if?) ARM deployments take a foothold > in DC this will become a common setting for most NICs. Initially we used pcie_relaxed_ordering_enabled() to programmatically enable this on/off on boot but this seems to introduce some degradation on some Intel CPUs since the Intel Faulty CPUs list is not up to date. Aya is discussing this with Bjorn. So until we figure this out, will keep this off by default. for the private flags we want to keep them for performance analysis as we do with all other mlx5 special performance features and flags. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering 2020-06-24 6:56 ` Saeed Mahameed @ 2020-06-24 7:34 ` Aya Levin 2020-06-24 17:22 ` Jakub Kicinski 0 siblings, 1 reply; 42+ messages in thread From: Aya Levin @ 2020-06-24 7:34 UTC (permalink / raw) To: Saeed Mahameed, kuba, Bjorn Helgaas; +Cc: mkubecek, davem, netdev, Tariq Toukan On 6/24/2020 9:56 AM, Saeed Mahameed wrote: > On Tue, 2020-06-23 at 14:31 -0700, Jakub Kicinski wrote: >> On Tue, 23 Jun 2020 12:52:29 -0700 Saeed Mahameed wrote: >>> From: Aya Levin <ayal@mellanox.com> >>> >>> The concept of Relaxed Ordering in the PCI Express environment >>> allows >>> switches in the path between the Requester and Completer to reorder >>> some >>> transactions just received before others that were previously >>> enqueued. >>> >>> In ETH driver, there is no question of write integrity since each >>> memory >>> segment is written only once per cycle. In addition, the driver >>> doesn't >>> access the memory shared with the hardware until the corresponding >>> CQE >>> arrives indicating all PCI transactions are done. >> > > Hi Jakub, sorry i missed your comments on this patch. > >> Assuming the device sets the RO bits appropriately, right? Otherwise >> CQE write could theoretically surpass the data write, no? >> > > Yes HW guarantees correctness of correlated queues and transactions. > >>> With relaxed ordering set, traffic on the remote-numa is at the >>> same >>> level as when on the local numa. >> >> Same level of? Achievable bandwidth? >> > > Yes, Bandwidth, according the below explanation, i see that the message > needs improvements. > >>> Running TCP single stream over ConnectX-4 LX, ARM CPU on remote- >>> numa >>> has 300% improvement in the bandwidth. >>> With relaxed ordering turned off: BW:10 [GB/s] >>> With relaxed ordering turned on: BW:40 [GB/s] >>> >>> The driver turns relaxed ordering off by default. 
It exposes 2 >>> boolean >>> private-flags in ethtool: pci_ro_read and pci_ro_write for user >>> control. >>> >>> $ ethtool --show-priv-flags eth2 >>> Private flags for eth2: >>> ... >>> pci_ro_read : off >>> pci_ro_write : off >>> >>> $ ethtool --set-priv-flags eth2 pci_ro_write on >>> $ ethtool --set-priv-flags eth2 pci_ro_read on >> >> I think Michal will rightly complain that this does not belong in >> private flags any more. As (/if?) ARM deployments take a foothold >> in DC this will become a common setting for most NICs. > > Initially we used pcie_relaxed_ordering_enabled() to > programmatically enable this on/off on boot but this seems to > introduce some degradation on some Intel CPUs since the Intel Faulty > CPUs list is not up to date. Aya is discussing this with Bjorn. Adding Bjorn Helgaas > > So until we figure this out, will keep this off by default. > > for the private flags we want to keep them for performance analysis as > we do with all other mlx5 special performance features and flags. > ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering 2020-06-24 7:34 ` Aya Levin @ 2020-06-24 17:22 ` Jakub Kicinski 2020-06-24 20:15 ` Saeed Mahameed 2020-06-26 20:12 ` Bjorn Helgaas 0 siblings, 2 replies; 42+ messages in thread From: Jakub Kicinski @ 2020-06-24 17:22 UTC (permalink / raw) To: Aya Levin Cc: Saeed Mahameed, Bjorn Helgaas, mkubecek, davem, netdev, Tariq Toukan, linux-pci, Alexander Duyck On Wed, 24 Jun 2020 10:34:40 +0300 Aya Levin wrote: > >> I think Michal will rightly complain that this does not belong in > >> private flags any more. As (/if?) ARM deployments take a foothold > >> in DC this will become a common setting for most NICs. > > > > Initially we used pcie_relaxed_ordering_enabled() to > > programmatically enable this on/off on boot but this seems to > > introduce some degradation on some Intel CPUs since the Intel Faulty > > CPUs list is not up to date. Aya is discussing this with Bjorn. > Adding Bjorn Helgaas I see. Simply using pcie_relaxed_ordering_enabled() and blacklisting bad CPUs seems far nicer from operational perspective. Perhaps Bjorn will chime in. Pushing the validation out to the user is not a great solution IMHO. > > So until we figure this out, will keep this off by default. > > > > for the private flags we want to keep them for performance analysis as > > we do with all other mlx5 special performance features and flags. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering 2020-06-24 17:22 ` Jakub Kicinski @ 2020-06-24 20:15 ` Saeed Mahameed [not found] ` <20200624133018.5a4d238b@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com> 2020-06-26 20:12 ` Bjorn Helgaas 1 sibling, 1 reply; 42+ messages in thread From: Saeed Mahameed @ 2020-06-24 20:15 UTC (permalink / raw) To: Aya Levin, kuba Cc: mkubecek, linux-pci, helgaas, davem, netdev, Tariq Toukan, alexander.h.duyck On Wed, 2020-06-24 at 10:22 -0700, Jakub Kicinski wrote: > On Wed, 24 Jun 2020 10:34:40 +0300 Aya Levin wrote: > > > > I think Michal will rightly complain that this does not belong > > > > in > > > > private flags any more. As (/if?) ARM deployments take a > > > > foothold > > > > in DC this will become a common setting for most NICs. > > > > > > Initially we used pcie_relaxed_ordering_enabled() to > > > programmatically enable this on/off on boot but this seems to > > > introduce some degradation on some Intel CPUs since the Intel > > > Faulty > > > CPUs list is not up to date. Aya is discussing this with Bjorn. > > Adding Bjorn Helgaas > > I see. Simply using pcie_relaxed_ordering_enabled() and blacklisting > bad CPUs seems far nicer from operational perspective. Perhaps Bjorn > will chime in. Pushing the validation out to the user is not a great > solution IMHO. > Can we move on with this patch for now? Since we are going to keep the user knob anyway, what is missing is setting the default value automatically, but this can't be done until we fix pcie_relaxed_ordering_enabled(). ^ permalink raw reply [flat|nested] 42+ messages in thread
[parent not found: <20200624133018.5a4d238b@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>]
* Re: [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering [not found] ` <20200624133018.5a4d238b@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com> @ 2020-07-06 13:00 ` Aya Levin 2020-07-06 16:52 ` Jakub Kicinski 2020-07-06 19:49 ` David Miller 0 siblings, 2 replies; 42+ messages in thread From: Aya Levin @ 2020-07-06 13:00 UTC (permalink / raw) To: Jakub Kicinski, Saeed Mahameed, David S. Miller Cc: mkubecek, linux-pci, helgaas, davem, netdev, Tariq Toukan, alexander.h.duyck@linux.intel.com" On 6/24/2020 11:30 PM, Jakub Kicinski wrote: > On Wed, 24 Jun 2020 20:15:14 +0000 Saeed Mahameed wrote: >> On Wed, 2020-06-24 at 10:22 -0700, Jakub Kicinski wrote: >>> On Wed, 24 Jun 2020 10:34:40 +0300 Aya Levin wrote: >>>>>> I think Michal will rightly complain that this does not belong >>>>>> in >>>>>> private flags any more. As (/if?) ARM deployments take a >>>>>> foothold >>>>>> in DC this will become a common setting for most NICs. >>>>> >>>>> Initially we used pcie_relaxed_ordering_enabled() to >>>>> programmatically enable this on/off on boot but this seems to >>>>> introduce some degradation on some Intel CPUs since the Intel >>>>> Faulty >>>>> CPUs list is not up to date. Aya is discussing this with Bjorn. >>>> Adding Bjorn Helgaas >>> >>> I see. Simply using pcie_relaxed_ordering_enabled() and blacklisting >>> bad CPUs seems far nicer from operational perspective. Perhaps Bjorn >>> will chime in. Pushing the validation out to the user is not a great >>> solution IMHO. >> >> Can we move on with this patch for now ? since we are going to keep the >> user knob anyway, what is missing is setting the default value >> automatically but this can't be done until we >> fix pcie_relaxed_ordering_enabled() > > If this patch was just adding a chicken bit that'd be fine, but opt in > I'm not hugely comfortable with. Seems like Bjorn has provided some > assistance already on the defaults but there doesn't appear to be much > progress being made. 
Hi Jakub, Dave,

Assuming the discussions with Bjorn conclude in a well-trusted API that ensures relaxed ordering is enabled, I'd still like a method to turn off relaxed ordering for the sake of performance debugging. Bjorn highlighted the fact that the PCIe subsystem can only offer a query method. Even if, theoretically, a set API were provided, it would not fit netdev debugging; I wonder whether CPU vendors even support setting and unsetting relaxed ordering...

On the driver's side, relaxed ordering is an attribute of the mkey and should be available for configuration (similar to the number of CPUs vs. the number of channels).

Based on the above, and binding the driver's default relaxed ordering to the return value of pcie_relaxed_ordering_enabled(), may I continue with the previous direction of a private flag to control the client side (my driver)?

Aya.

> ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering 2020-07-06 13:00 ` Aya Levin @ 2020-07-06 16:52 ` Jakub Kicinski 2020-07-06 19:49 ` David Miller 1 sibling, 0 replies; 42+ messages in thread From: Jakub Kicinski @ 2020-07-06 16:52 UTC (permalink / raw) To: Aya Levin Cc: Saeed Mahameed, David S. Miller, mkubecek, linux-pci, helgaas, netdev, Tariq Toukan, alexander.h.duyck@linux.intel.com On Mon, 6 Jul 2020 16:00:59 +0300 Aya Levin wrote: > Assuming the discussions with Bjorn will conclude in a well-trusted API > that ensures relaxed ordering in enabled, I'd still like a method to > turn off relaxed ordering for performance debugging sake. > Bjorn highlighted the fact that the PCIe sub system can only offer a > query method. Even if theoretically a set API will be provided, this > will not fit a netdev debugging - I wonder if CPU vendors even support > relaxed ordering set/unset... > On the driver's side relaxed ordering is an attribute of the mkey and > should be available for configuration (similar to number of CPU vs. > number of channels). > Based on the above, and binding the driver's default relaxed ordering to > the return value from pcie_relaxed_ordering_enabled(), may I continue > with previous direction of a private-flag to control the client side (my > driver) ? That's fine with me, chicken bit seems reasonable as long as the default is dictated by the PCI subsystem. I have no particularly strong feeling on the API used for the chicken bit, but others may. ^ permalink raw reply [flat|nested] 42+ messages in thread
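The policy discussed in this exchange (the default is dictated by the PCI subsystem, with a driver-side "chicken bit" that can only turn the feature off) can be sketched as a small decision function. This is an assumption-laden illustration, not code from the series: `enum ro_override` and `effective_relaxed_ordering` are hypothetical names, and the `pci_ro_enabled` parameter merely stands in for whatever pcie_relaxed_ordering_enabled() would report for the device.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user override: the chicken bit can only force RO off. */
enum ro_override {
	RO_OVERRIDE_NONE,      /* follow the default from the PCI subsystem */
	RO_OVERRIDE_FORCE_OFF  /* user disabled RO, e.g. for performance debugging */
};

/*
 * pci_ro_enabled:  models the answer of the PCI subsystem query.
 * dev_supports_ro: models the device capability bit.
 */
static bool effective_relaxed_ordering(bool pci_ro_enabled,
				       bool dev_supports_ro,
				       enum ro_override override)
{
	assert(override == RO_OVERRIDE_NONE ||
	       override == RO_OVERRIDE_FORCE_OFF);

	if (override == RO_OVERRIDE_FORCE_OFF)
		return false;

	return pci_ro_enabled && dev_supports_ro;
}
```

In this model the user can never enable relaxed ordering on a platform where the PCI subsystem reports it as disabled, which is the difference between a chicken bit and the opt-in knob that was objected to earlier in the thread.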
* Re: [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering 2020-07-06 13:00 ` Aya Levin 2020-07-06 16:52 ` Jakub Kicinski @ 2020-07-06 19:49 ` David Miller 2040-07-08 8:22 ` Aya Levin 1 sibling, 1 reply; 42+ messages in thread From: David Miller @ 2020-07-06 19:49 UTC (permalink / raw) To: ayal Cc: kuba, saeedm, mkubecek, linux-pci, helgaas, netdev, tariqt, alexander.h.duyck From: Aya Levin <ayal@mellanox.com> Date: Mon, 6 Jul 2020 16:00:59 +0300 > Assuming the discussions with Bjorn will conclude in a well-trusted > API that ensures relaxed ordering in enabled, I'd still like a method > to turn off relaxed ordering for performance debugging sake. > Bjorn highlighted the fact that the PCIe sub system can only offer a > query method. Even if theoretically a set API will be provided, this > will not fit a netdev debugging - I wonder if CPU vendors even support > relaxed ordering set/unset... > On the driver's side relaxed ordering is an attribute of the mkey and > should be available for configuration (similar to number of CPU > vs. number of channels). > Based on the above, and binding the driver's default relaxed ordering > to the return value from pcie_relaxed_ordering_enabled(), may I > continue with previous direction of a private-flag to control the > client side (my driver) ? I don't like this situation at all. If RO is so dodgy that it potentially needs to be disabled, that is going to be an issue not just with networking devices but also with storage and other device types as well. Will every device type have a custom way to disable RO, thus inconsistently, in order to accomodate this? That makes no sense and is a terrible user experience. That's why the knob belongs generically in PCI or similar. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering 2020-07-06 19:49 ` David Miller @ 2040-07-08 8:22 ` Aya Levin 2020-07-08 23:16 ` Bjorn Helgaas 2020-07-23 21:03 ` Alexander Duyck 0 siblings, 2 replies; 42+ messages in thread From: Aya Levin @ 2040-07-08 8:22 UTC (permalink / raw) To: David Miller, helgaas Cc: kuba, saeedm, mkubecek, linux-pci, netdev, tariqt, alexander.h.duyck, Jason Gunthorpe On 7/6/2020 10:49 PM, David Miller wrote: > From: Aya Levin <ayal@mellanox.com> > Date: Mon, 6 Jul 2020 16:00:59 +0300 > >> Assuming the discussions with Bjorn will conclude in a well-trusted >> API that ensures relaxed ordering in enabled, I'd still like a method >> to turn off relaxed ordering for performance debugging sake. >> Bjorn highlighted the fact that the PCIe sub system can only offer a >> query method. Even if theoretically a set API will be provided, this >> will not fit a netdev debugging - I wonder if CPU vendors even support >> relaxed ordering set/unset... >> On the driver's side relaxed ordering is an attribute of the mkey and >> should be available for configuration (similar to number of CPU >> vs. number of channels). >> Based on the above, and binding the driver's default relaxed ordering >> to the return value from pcie_relaxed_ordering_enabled(), may I >> continue with previous direction of a private-flag to control the >> client side (my driver) ? > > I don't like this situation at all. > > If RO is so dodgy that it potentially needs to be disabled, that is > going to be an issue not just with networking devices but also with > storage and other device types as well. > > Will every device type have a custom way to disable RO, thus > inconsistently, in order to accomodate this? > > That makes no sense and is a terrible user experience. > > That's why the knob belongs generically in PCI or similar. > Hi Bjorn, Mellanox NIC supports relaxed ordering operation over DMA buffers. 
However, for debugging purposes we must have a chicken bit to disable relaxed ordering on a specific system without affecting others at run time. In order to meet this requirement, I added a netdev private-flag to ethtool for a set RO API. Dave raised a concern regarding embedding a relaxed ordering set API per system (networking, storage and others). We need the ability to manage relaxed ordering in a unified manner. Could you please define a PCI sub-system solution to meet this requirement? Aya. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering 2040-07-08 8:22 ` Aya Levin @ 2020-07-08 23:16 ` Bjorn Helgaas 2020-07-08 23:26 ` Jason Gunthorpe 2020-07-14 10:47 ` Aya Levin 2020-07-23 21:03 ` Alexander Duyck 1 sibling, 2 replies; 42+ messages in thread From: Bjorn Helgaas @ 2020-07-08 23:16 UTC (permalink / raw) To: Aya Levin Cc: David Miller, kuba, saeedm, mkubecek, linux-pci, netdev, tariqt, alexander.h.duyck, Jason Gunthorpe On Sun, Jul 08, 2040 at 11:22:12AM +0300, Aya Levin wrote: > On 7/6/2020 10:49 PM, David Miller wrote: > > From: Aya Levin <ayal@mellanox.com> > > Date: Mon, 6 Jul 2020 16:00:59 +0300 > > > > > Assuming the discussions with Bjorn will conclude in a well-trusted > > > API that ensures relaxed ordering in enabled, I'd still like a method > > > to turn off relaxed ordering for performance debugging sake. > > > Bjorn highlighted the fact that the PCIe sub system can only offer a > > > query method. Even if theoretically a set API will be provided, this > > > will not fit a netdev debugging - I wonder if CPU vendors even support > > > relaxed ordering set/unset... > > > On the driver's side relaxed ordering is an attribute of the mkey and > > > should be available for configuration (similar to number of CPU > > > vs. number of channels). > > > Based on the above, and binding the driver's default relaxed ordering > > > to the return value from pcie_relaxed_ordering_enabled(), may I > > > continue with previous direction of a private-flag to control the > > > client side (my driver) ? > > > > I don't like this situation at all. > > > > If RO is so dodgy that it potentially needs to be disabled, that is > > going to be an issue not just with networking devices but also with > > storage and other device types as well. > > > > Will every device type have a custom way to disable RO, thus > > inconsistently, in order to accomodate this? > > > > That makes no sense and is a terrible user experience. 
> > > > That's why the knob belongs generically in PCI or similar. > > > Hi Bjorn, > > Mellanox NIC supports relaxed ordering operation over DMA buffers. > However for debug prepossess we must have a chicken bit to disable > relaxed ordering on a specific system without effecting others in > run-time. In order to meet this requirement, I added a netdev > private-flag to ethtool for set RO API. > > Dave raised a concern regarding embedding relaxed ordering set API > per system (networking, storage and others). We need the ability to > manage relaxed ordering in a unify manner. Could you please define a > PCI sub-system solution to meet this requirement? I agree, this is definitely a mess. Let me just outline what I think we have today and what we're missing. - On the hardware side, device use of Relaxed Ordering is controlled by the Enable Relaxed Ordering bit in the PCIe Device Control register (or the PCI-X Command register). If set, the device is allowed but not required to set the Relaxed Ordering bit in transactions it initiates (PCIe r5.0, sec 7.5.3.4; PCI-X 2.0, sec 7.2.3). I suspect there may be device-specific controls, too, because [1] claims to enable/disable Relaxed Ordering but doesn't touch the PCIe Device Control register. Device-specific controls are certainly allowed, but of course it would be up to the driver, and the device cannot generate TLPs with Relaxed Ordering unless the architected PCIe Enable Relaxed Ordering bit is *also* set. - Platform firmware can enable Relaxed Ordering for a device either before handoff to the OS or via the _HPX ACPI method. - The PCI core never enables Relaxed Ordering itself except when applying _HPX. - At enumeration-time, the PCI core disables Relaxed Ordering in pci_configure_relaxed_ordering() if the device is below a Root Port that has a quirk indicating an erratum. This quirk currently includes many Intel Root Ports, but not all, and is an ongoing maintenance problem. 
- The PCI core provides pcie_relaxed_ordering_enabled() which tells you whether Relaxed Ordering is enabled. Only used by cxgb4 and csio, which use that information to fill in Ingress Queue Commands. - The PCI core does not provide a driver interface to enable or disable Relaxed Ordering. - Some drivers disable Relaxed Ordering themselves: mtip32xx, netup_unidvb, tg3, myri10ge (oddly, only if CONFIG_MYRI10GE_DCA), tsi721, kp2000_pcie. - Some drivers enable Relaxed Ordering themselves: niu, tegra. What are we missing and what should the PCI core do? - Currently the Enable Relaxed Ordering bit depends on what firmware did. Maybe the PCI core should always clear it during enumeration? - The PCI core should probably have a driver interface like pci_set_relaxed_ordering(dev, enable) that can set or clear the architected PCI-X or PCIe Enable Relaxed Ordering bit. - Maybe there should be a kernel command-line parameter like "pci=norelax" that disables Relaxed Ordering for every device and prevents pci_set_relaxed_ordering() from enabling it. I'm mixed on this because these tend to become folklore about how to "fix" problems and we end up with systems that don't work unless you happen to find the option on the web. For debugging issues, it might be enough to disable Relaxed Ordering using setpci, e.g., "setpci -s02:00.0 CAP_EXP+8.w=0" [1] https://lore.kernel.org/netdev/20200623195229.26411-11-saeedm@mellanox.com/ ^ permalink raw reply [flat|nested] 42+ messages in thread
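To make the register-level picture in the outline above concrete, here is a minimal userspace sketch, not kernel code, of testing and toggling the architected Enable Relaxed Ordering bit in a 16-bit Device Control value. The offset and mask mirror PCI_EXP_DEVCTL and PCI_EXP_DEVCTL_RELAX_EN from the kernel's pci_regs.h. The helper names devctl_ro_enabled() and devctl_set_ro() are made up for illustration and are not an existing PCI core interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in the Linux header pci_regs.h. */
#define PCI_EXP_DEVCTL          0x08    /* Device Control, offset within the PCIe capability */
#define PCI_EXP_DEVCTL_RELAX_EN 0x0010  /* Enable Relaxed Ordering */

/* Hypothetical helper: is Enable Relaxed Ordering set in this Device Control value? */
static bool devctl_ro_enabled(uint16_t devctl)
{
    return (devctl & PCI_EXP_DEVCTL_RELAX_EN) != 0;
}

/* Hypothetical helper: return devctl with only the RO enable bit changed. */
static uint16_t devctl_set_ro(uint16_t devctl, bool enable)
{
    if (enable)
        return (uint16_t)(devctl | PCI_EXP_DEVCTL_RELAX_EN);
    return (uint16_t)(devctl & ~PCI_EXP_DEVCTL_RELAX_EN);
}
```

A driver-facing pci_set_relaxed_ordering(dev, enable), as proposed above, would presumably do the same read-modify-write against the real register (the kernel already has pcie_capability_clear_and_set_word() for that kind of update), while the query side corresponds to what pcie_relaxed_ordering_enabled() reports today.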
* Re: [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering 2020-07-08 23:16 ` Bjorn Helgaas @ 2020-07-08 23:26 ` Jason Gunthorpe 2020-07-09 17:35 ` Jonathan Lemon 2020-07-14 10:47 ` Aya Levin 1 sibling, 1 reply; 42+ messages in thread From: Jason Gunthorpe @ 2020-07-08 23:26 UTC (permalink / raw) To: Bjorn Helgaas Cc: Aya Levin, David Miller, kuba, saeedm, mkubecek, linux-pci, netdev, tariqt, alexander.h.duyck On Wed, Jul 08, 2020 at 06:16:30PM -0500, Bjorn Helgaas wrote: > I suspect there may be device-specific controls, too, because [1] > claims to enable/disable Relaxed Ordering but doesn't touch the > PCIe Device Control register. Device-specific controls are > certainly allowed, but of course it would be up to the driver, and > the device cannot generate TLPs with Relaxed Ordering unless the > architected PCIe Enable Relaxed Ordering bit is *also* set. Yes, at least on RDMA relaxed ordering can be set on a per transaction basis and is something userspace can choose to use or not at a fine granularity. This is because we have to support historical applications that make assumptions that data arrives in certain orders. I've been thinking of doing the same as this patch but for RDMA kernel ULPs and just globally turn it on if the PCI CAP is enabled as none of our in-kernel uses have the legacy data ordering problem. There are reports that using relaxed ordering is a *huge* speed up in certain platforms/configurations/etc. > issues, it might be enough to disable Relaxed Ordering using > setpci, e.g., "setpci -s02:00.0 CAP_EXP+8.w=0" For the purposes of occasional performance testing I think this should be good enough? Aya? Jason ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering 2020-07-08 23:26 ` Jason Gunthorpe @ 2020-07-09 17:35 ` Jonathan Lemon 2020-07-09 18:20 ` Jason Gunthorpe 0 siblings, 1 reply; 42+ messages in thread From: Jonathan Lemon @ 2020-07-09 17:35 UTC (permalink / raw) To: Jason Gunthorpe Cc: Bjorn Helgaas, Aya Levin, David Miller, kuba, saeedm, mkubecek, linux-pci, netdev, tariqt, alexander.h.duyck On Wed, Jul 08, 2020 at 08:26:02PM -0300, Jason Gunthorpe wrote: > On Wed, Jul 08, 2020 at 06:16:30PM -0500, Bjorn Helgaas wrote: > > I suspect there may be device-specific controls, too, because [1] > > claims to enable/disable Relaxed Ordering but doesn't touch the > > PCIe Device Control register. Device-specific controls are > > certainly allowed, but of course it would be up to the driver, and > > the device cannot generate TLPs with Relaxed Ordering unless the > > architected PCIe Enable Relaxed Ordering bit is *also* set. > > Yes, at least on RDMA relaxed ordering can be set on a per transaction > basis and is something userspace can choose to use or not at a fine > granularity. This is because we have to support historical > applications that make assumptions that data arrives in certain > orders. > > I've been thinking of doing the same as this patch but for RDMA kernel > ULPs and just globally turn it on if the PCI CAP is enabled as none of > our in-kernel uses have the legacy data ordering problem. If I'm following this correctly - there are two different controls being discussed here: 1) having the driver request PCI relaxed ordering, which may or may not be granted, based on other system settings, and 2) having the driver set RO on the transactions it initiates, which are honored iff the PCI bit is set. It seems that in addition to the PCI core changes, there still is a need for driver controls? Unless the driver always enables RO if it's capable? -- Jonathan ^ permalink raw reply [flat|nested] 42+ messages in thread
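The two controls described above, together with the Root Port quirk from Bjorn's outline, can be modeled as a simple gating chain. This is only a sketch of the reasoning in this thread under stated assumptions: struct ro_model and both functions are hypothetical names, and real hardware behaviour is device specific.

```c
#include <stdbool.h>

/* Hypothetical model of the state discussed in this thread. */
struct ro_model {
    bool root_port_quirk;  /* platform erratum: PCI core clears the bit at enumeration */
    bool devctl_relax_en;  /* control 1: architected Enable Relaxed Ordering bit */
    bool driver_wants_ro;  /* control 2: device-specific choice, e.g. a per-mkey attribute */
};

/* Effective state of the config space bit once the quirk has been applied. */
static bool ro_config_bit(const struct ro_model *m)
{
    return m->devctl_relax_en && !m->root_port_quirk;
}

/* A TLP may carry the RO attribute only if both levels agree. */
static bool ro_tlp_allowed(const struct ro_model *m)
{
    return ro_config_bit(m) && m->driver_wants_ro;
}
```

In this model the driver-level control can only narrow what the config space bit permits and can never widen it, which is why clearing the architected bit is enough to switch RO off for the whole device.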
* Re: [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering 2020-07-09 17:35 ` Jonathan Lemon @ 2020-07-09 18:20 ` Jason Gunthorpe 2020-07-09 19:47 ` Jakub Kicinski 2020-07-09 20:33 ` Jonathan Lemon 0 siblings, 2 replies; 42+ messages in thread From: Jason Gunthorpe @ 2020-07-09 18:20 UTC (permalink / raw) To: Jonathan Lemon Cc: Bjorn Helgaas, Aya Levin, David Miller, kuba, saeedm, mkubecek, linux-pci, netdev, tariqt, alexander.h.duyck On Thu, Jul 09, 2020 at 10:35:50AM -0700, Jonathan Lemon wrote: > On Wed, Jul 08, 2020 at 08:26:02PM -0300, Jason Gunthorpe wrote: > > On Wed, Jul 08, 2020 at 06:16:30PM -0500, Bjorn Helgaas wrote: > > > I suspect there may be device-specific controls, too, because [1] > > > claims to enable/disable Relaxed Ordering but doesn't touch the > > > PCIe Device Control register. Device-specific controls are > > > certainly allowed, but of course it would be up to the driver, and > > > the device cannot generate TLPs with Relaxed Ordering unless the > > > architected PCIe Enable Relaxed Ordering bit is *also* set. > > > > Yes, at least on RDMA relaxed ordering can be set on a per transaction > > basis and is something userspace can choose to use or not at a fine > > granularity. This is because we have to support historical > > applications that make assumptions that data arrives in certain > > orders. > > > > I've been thinking of doing the same as this patch but for RDMA kernel > > ULPs and just globally turn it on if the PCI CAP is enabled as none of > > our in-kernel uses have the legacy data ordering problem. > > If I'm following this correctly - there are two different controls being > discussed here: > > 1) having the driver request PCI relaxed ordering, which may or may > not be granted, based on other system settings, and This is what Bjorn was thinking about, yes, it is some PCI layer function to control the global config space bit. 
> 2) having the driver set RO on the transactions it initiates, which > are honored iff the PCI bit is set. > > It seems that in addition to the PCI core changes, there still is a need > for driver controls? Unless the driver always enables RO if it's capable? I think the PCI spec imagined that when the config space RO bit was enabled the PCI device would just start using RO packets, in an appropriate and device specific way. So the fine grained control in #2 is something done extra by some devices. IMHO if the driver knows it is functionally correct with RO then it should enable it fully on the device when the config space bit is set. I'm not sure there is a reason to allow users to finely tune RO, at least I haven't heard of cases where RO is a degredation depending on workload. If some platform doesn't work when RO is turned on then it should be globally black listed like is already done in some cases. If the devices has bugs and uses RO wrong, or the driver has bugs and is only stable with !RO and Intel, then the driver shouldn't turn it on at all. In all of these cases it is not a user tunable. Development and testing reasons, like 'is my crash from a RO bug?' to tune should be met by the device global setpci, I think. Jason ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering 2020-07-09 18:20 ` Jason Gunthorpe @ 2020-07-09 19:47 ` Jakub Kicinski 2020-07-10 2:18 ` Saeed Mahameed 2020-07-09 20:33 ` Jonathan Lemon 1 sibling, 1 reply; 42+ messages in thread From: Jakub Kicinski @ 2020-07-09 19:47 UTC (permalink / raw) To: Jason Gunthorpe Cc: Jonathan Lemon, Bjorn Helgaas, Aya Levin, David Miller, saeedm, mkubecek, linux-pci, netdev, tariqt, alexander.h.duyck On Thu, 9 Jul 2020 15:20:11 -0300 Jason Gunthorpe wrote: > > 2) having the driver set RO on the transactions it initiates, which > > are honored iff the PCI bit is set. > > > > It seems that in addition to the PCI core changes, there still is a need > > for driver controls? Unless the driver always enables RO if it's capable? > > I think the PCI spec imagined that when the config space RO bit was > enabled the PCI device would just start using RO packets, in an > appropriate and device specific way. > > So the fine grained control in #2 is something done extra by some > devices. > > IMHO if the driver knows it is functionally correct with RO then it > should enable it fully on the device when the config space bit is set. > > I'm not sure there is a reason to allow users to finely tune RO, at > least I haven't heard of cases where RO is a degredation depending on > workload. > > If some platform doesn't work when RO is turned on then it should be > globally black listed like is already done in some cases. > > If the devices has bugs and uses RO wrong, or the driver has bugs and > is only stable with !RO and Intel, then the driver shouldn't turn it > on at all. > > In all of these cases it is not a user tunable. > > Development and testing reasons, like 'is my crash from a RO bug?' to > tune should be met by the device global setpci, I think. +1 ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering 2020-07-09 19:47 ` Jakub Kicinski @ 2020-07-10 2:18 ` Saeed Mahameed 2020-07-10 12:21 ` Jason Gunthorpe 0 siblings, 1 reply; 42+ messages in thread From: Saeed Mahameed @ 2020-07-10 2:18 UTC (permalink / raw) To: jgg, kuba Cc: mkubecek, Aya Levin, davem, Tariq Toukan, alexander.h.duyck, jonathan.lemon, helgaas, linux-pci, netdev On Thu, 2020-07-09 at 12:47 -0700, Jakub Kicinski wrote: > On Thu, 9 Jul 2020 15:20:11 -0300 Jason Gunthorpe wrote: > > > 2) having the driver set RO on the transactions it initiates, > > > which > > > are honored iff the PCI bit is set. > > > > > > It seems that in addition to the PCI core changes, there still is > > > a need > > > for driver controls? Unless the driver always enables RO if it's > > > capable? > > > > I think the PCI spec imagined that when the config space RO bit was > > enabled the PCI device would just start using RO packets, in an > > appropriate and device specific way. > > > > So the fine grained control in #2 is something done extra by some > > devices. > > > > IMHO if the driver knows it is functionally correct with RO then it > > should enable it fully on the device when the config space bit is > > set. > > > > I'm not sure there is a reason to allow users to finely tune RO, at > > least I haven't heard of cases where RO is a degredation depending > > on > > workload. > > > > If some platform doesn't work when RO is turned on then it should > > be > > globally black listed like is already done in some cases. > > > > If the devices has bugs and uses RO wrong, or the driver has bugs > > and > > is only stable with !RO and Intel, then the driver shouldn't turn > > it > > on at all. > > > > In all of these cases it is not a user tunable. > > > > Development and testing reasons, like 'is my crash from a RO bug?' > > to > > tune should be met by the device global setpci, I think. 
> > +1 Be careful though to load driver with RO on and then setpci RO off.. not sure what the side effects are, unstable driver maybe ? And not sure what should be the procedure then ? reload driver ? FW will get a notification from PCI ? ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering 2020-07-10 2:18 ` Saeed Mahameed @ 2020-07-10 12:21 ` Jason Gunthorpe 0 siblings, 0 replies; 42+ messages in thread From: Jason Gunthorpe @ 2020-07-10 12:21 UTC (permalink / raw) To: Saeed Mahameed Cc: kuba, mkubecek, Aya Levin, davem, Tariq Toukan, alexander.h.duyck, jonathan.lemon, helgaas, linux-pci, netdev On Fri, Jul 10, 2020 at 02:18:02AM +0000, Saeed Mahameed wrote: > Be careful though to load driver with RO on and then setpci RO off.. > not sure what the side effects are, unstable driver maybe ? According to the PCI spec HW should stop doing RO immediately once the config space bit is cleared. In any event continuing to issue RO won't harm anything. > And not sure what should be the procedure then ? reload driver ? FW > will get a notification from PCI ? At worst you'd have to reload the driver - continuing to use RO if the driver starts with RO off is seriously broken and probably won't work with the quirks to disable RO on buggy platforms. But as above, the RO config space bit should have immediate effect on the device and it should stop using RO. The device HW itself has to enforce this to be spec compliant. Jason ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering 2020-07-09 18:20 ` Jason Gunthorpe 2020-07-09 19:47 ` Jakub Kicinski @ 2020-07-09 20:33 ` Jonathan Lemon 1 sibling, 0 replies; 42+ messages in thread From: Jonathan Lemon @ 2020-07-09 20:33 UTC (permalink / raw) To: Jason Gunthorpe Cc: Bjorn Helgaas, Aya Levin, David Miller, kuba, saeedm, mkubecek, linux-pci, netdev, tariqt, alexander.h.duyck On Thu, Jul 09, 2020 at 03:20:11PM -0300, Jason Gunthorpe wrote: > On Thu, Jul 09, 2020 at 10:35:50AM -0700, Jonathan Lemon wrote: > > On Wed, Jul 08, 2020 at 08:26:02PM -0300, Jason Gunthorpe wrote: > > > On Wed, Jul 08, 2020 at 06:16:30PM -0500, Bjorn Helgaas wrote: > > > > I suspect there may be device-specific controls, too, because [1] > > > > claims to enable/disable Relaxed Ordering but doesn't touch the > > > > PCIe Device Control register. Device-specific controls are > > > > certainly allowed, but of course it would be up to the driver, and > > > > the device cannot generate TLPs with Relaxed Ordering unless the > > > > architected PCIe Enable Relaxed Ordering bit is *also* set. > > > > > > Yes, at least on RDMA relaxed ordering can be set on a per transaction > > > basis and is something userspace can choose to use or not at a fine > > > granularity. This is because we have to support historical > > > applications that make assumptions that data arrives in certain > > > orders. > > > > > > I've been thinking of doing the same as this patch but for RDMA kernel > > > ULPs and just globally turn it on if the PCI CAP is enabled as none of > > > our in-kernel uses have the legacy data ordering problem. > > > > If I'm following this correctly - there are two different controls being > > discussed here: > > > > 1) having the driver request PCI relaxed ordering, which may or may > > not be granted, based on other system settings, and > > This is what Bjorn was thinking about, yes, it is some PCI layer > function to control the global config space bit. 
> > > 2) having the driver set RO on the transactions it initiates, which > > are honored iff the PCI bit is set. > > > > It seems that in addition to the PCI core changes, there still is a need > > for driver controls? Unless the driver always enables RO if it's capable? > > I think the PCI spec imagined that when the config space RO bit was > enabled the PCI device would just start using RO packets, in an > appropriate and device specific way. > > So the fine grained control in #2 is something done extra by some > devices. > > IMHO if the driver knows it is functionally correct with RO then it > should enable it fully on the device when the config space bit is set. Sounds reasonable to me. -- Jonathan ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering 2020-07-08 23:16 ` Bjorn Helgaas 2020-07-08 23:26 ` Jason Gunthorpe @ 2020-07-14 10:47 ` Aya Levin 1 sibling, 0 replies; 42+ messages in thread From: Aya Levin @ 2020-07-14 10:47 UTC (permalink / raw) To: Bjorn Helgaas Cc: David Miller, kuba, saeedm, mkubecek, linux-pci, netdev, tariqt, alexander.h.duyck, Jason Gunthorpe On 7/9/2020 2:16 AM, Bjorn Helgaas wrote: > On Sun, Jul 08, 2040 at 11:22:12AM +0300, Aya Levin wrote: >> On 7/6/2020 10:49 PM, David Miller wrote: >>> From: Aya Levin <ayal@mellanox.com> >>> Date: Mon, 6 Jul 2020 16:00:59 +0300 >>> >>>> Assuming the discussions with Bjorn will conclude in a well-trusted >>>> API that ensures relaxed ordering in enabled, I'd still like a method >>>> to turn off relaxed ordering for performance debugging sake. >>>> Bjorn highlighted the fact that the PCIe sub system can only offer a >>>> query method. Even if theoretically a set API will be provided, this >>>> will not fit a netdev debugging - I wonder if CPU vendors even support >>>> relaxed ordering set/unset... >>>> On the driver's side relaxed ordering is an attribute of the mkey and >>>> should be available for configuration (similar to number of CPU >>>> vs. number of channels). >>>> Based on the above, and binding the driver's default relaxed ordering >>>> to the return value from pcie_relaxed_ordering_enabled(), may I >>>> continue with previous direction of a private-flag to control the >>>> client side (my driver) ? >>> >>> I don't like this situation at all. >>> >>> If RO is so dodgy that it potentially needs to be disabled, that is >>> going to be an issue not just with networking devices but also with >>> storage and other device types as well. >>> >>> Will every device type have a custom way to disable RO, thus >>> inconsistently, in order to accomodate this? >>> >>> That makes no sense and is a terrible user experience. 
>>> >>> That's why the knob belongs generically in PCI or similar. >>> >> Hi Bjorn, >> >> Mellanox NIC supports relaxed ordering operation over DMA buffers. >> However for debug prepossess we must have a chicken bit to disable >> relaxed ordering on a specific system without effecting others in >> run-time. In order to meet this requirement, I added a netdev >> private-flag to ethtool for set RO API. >> >> Dave raised a concern regarding embedding relaxed ordering set API >> per system (networking, storage and others). We need the ability to >> manage relaxed ordering in a unify manner. Could you please define a >> PCI sub-system solution to meet this requirement? > > I agree, this is definitely a mess. Let me just outline what I think > we have today and what we're missing. > > - On the hardware side, device use of Relaxed Ordering is controlled > by the Enable Relaxed Ordering bit in the PCIe Device Control > register (or the PCI-X Command register). If set, the device is > allowed but not required to set the Relaxed Ordering bit in > transactions it initiates (PCIe r5.0, sec 7.5.3.4; PCI-X 2.0, sec > 7.2.3). > > I suspect there may be device-specific controls, too, because [1] > claims to enable/disable Relaxed Ordering but doesn't touch the > PCIe Device Control register. Device-specific controls are > certainly allowed, but of course it would be up to the driver, and > the device cannot generate TLPs with Relaxed Ordering unless the > architected PCIe Enable Relaxed Ordering bit is *also* set. > > - Platform firmware can enable Relaxed Ordering for a device either > before handoff to the OS or via the _HPX ACPI method. > > - The PCI core never enables Relaxed Ordering itself except when > applying _HPX. > > - At enumeration-time, the PCI core disables Relaxed Ordering in > pci_configure_relaxed_ordering() if the device is below a Root > Port that has a quirk indicating an erratum. 
This quirk currently > includes many Intel Root Ports, but not all, and is an ongoing > maintenance problem. > > - The PCI core provides pcie_relaxed_ordering_enabled() which tells > you whether Relaxed Ordering is enabled. Only used by cxgb4 and > csio, which use that information to fill in Ingress Queue > Commands. > > - The PCI core does not provide a driver interface to enable or > disable Relaxed Ordering. > > - Some drivers disable Relaxed Ordering themselves: mtip32xx, > netup_unidvb, tg3, myri10ge (oddly, only if CONFIG_MYRI10GE_DCA), > tsi721, kp2000_pcie. > > - Some drivers enable Relaxed Ordering themselves: niu, tegra. > > What are we missing and what should the PCI core do? > > - Currently the Enable Relaxed Ordering bit depends on what firmware > did. Maybe the PCI core should always clear it during > enumeration? > > - The PCI core should probably have a driver interface like > pci_set_relaxed_ordering(dev, enable) that can set or clear the > architected PCI-X or PCIe Enable Relaxed Ordering bit. > > - Maybe there should be a kernel command-line parameter like > "pci=norelax" that disables Relaxed Ordering for every device and > prevents pci_set_relaxed_ordering() from enabling it. > > I'm mixed on this because these tend to become folklore about how > to "fix" problems and we end up with systems that don't work > unless you happen to find the option on the web. For debugging > issues, it might be enough to disable Relaxed Ordering using > setpci, e.g., "setpci -s02:00.0 CAP_EXP+8.w=0" > > [1] https://lore.kernel.org/netdev/20200623195229.26411-11-saeedm@mellanox.com/ > Hi Bjorn, Thanks for the detailed reply. From initial testing I can say that turning off relaxed ordering at the PCI level (setpci -s02:00.0 CAP_EXP+8.w=0) is the chicken bit I was looking for. This lowers the risk of depending on pcie_relaxed_ordering_enabled(). I will update my patch and resubmit. Thanks, Aya ^ permalink raw reply [flat|nested] 42+ messages in thread
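One caveat on the setpci invocation quoted above: as written, CAP_EXP+8.w=0 writes zero to the whole Device Control word, so it clears every other field in that register along with Enable Relaxed Ordering. setpci also accepts a value:mask form that changes only the masked bits (for example CAP_EXP+8.w=0:10, an untested variant of the command above). The sketch below models the difference between the two kinds of write. The function names and the sample register value 0x2810 are illustrative assumptions, not taken from the thread.

```c
#include <stdint.h>

#define PCI_EXP_DEVCTL_RELAX_EN 0x0010  /* Enable Relaxed Ordering, as in pci_regs.h */

/* Model of a plain word write: every bit of the register is replaced. */
static uint16_t write_word(uint16_t old, uint16_t value)
{
    (void)old;  /* the previous contents are discarded */
    return value;
}

/* Model of a masked write: only bits set in mask take their state from value. */
static uint16_t write_word_masked(uint16_t old, uint16_t value, uint16_t mask)
{
    return (uint16_t)((old & ~mask) | (value & mask));
}
```

Starting from the made-up value 0x2810, the plain write leaves 0x0000 while the masked write leaves 0x2800, so the unrelated settings survive. For a quick debugging session the plain write may be acceptable, but the masked form is the narrower experiment.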
* Re: [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering 2040-07-08 8:22 ` Aya Levin 2020-07-08 23:16 ` Bjorn Helgaas @ 2020-07-23 21:03 ` Alexander Duyck 1 sibling, 0 replies; 42+ messages in thread From: Alexander Duyck @ 2020-07-23 21:03 UTC (permalink / raw) To: Aya Levin, David Miller, helgaas Cc: kuba, saeedm, mkubecek, linux-pci, netdev, tariqt, Jason Gunthorpe On 7/8/2040 1:22 AM, Aya Levin wrote: > > > On 7/6/2020 10:49 PM, David Miller wrote: >> From: Aya Levin <ayal@mellanox.com> >> Date: Mon, 6 Jul 2020 16:00:59 +0300 >> >>> Assuming the discussions with Bjorn will conclude in a well-trusted >>> API that ensures relaxed ordering in enabled, I'd still like a method >>> to turn off relaxed ordering for performance debugging sake. >>> Bjorn highlighted the fact that the PCIe sub system can only offer a >>> query method. Even if theoretically a set API will be provided, this >>> will not fit a netdev debugging - I wonder if CPU vendors even support >>> relaxed ordering set/unset... >>> On the driver's side relaxed ordering is an attribute of the mkey and >>> should be available for configuration (similar to number of CPU >>> vs. number of channels). >>> Based on the above, and binding the driver's default relaxed ordering >>> to the return value from pcie_relaxed_ordering_enabled(), may I >>> continue with previous direction of a private-flag to control the >>> client side (my driver) ? >> >> I don't like this situation at all. >> >> If RO is so dodgy that it potentially needs to be disabled, that is >> going to be an issue not just with networking devices but also with >> storage and other device types as well. >> >> Will every device type have a custom way to disable RO, thus >> inconsistently, in order to accomodate this? >> >> That makes no sense and is a terrible user experience. >> >> That's why the knob belongs generically in PCI or similar. >> > Hi Bjorn, > > Mellanox NIC supports relaxed ordering operation over DMA buffers. 
> However for debug prepossess we must have a chicken bit to disable > relaxed ordering on a specific system without effecting others in > run-time. In order to meet this requirement, I added a netdev > private-flag to ethtool for set RO API. > > Dave raised a concern regarding embedding relaxed ordering set API per > system (networking, storage and others). We need the ability to manage > relaxed ordering in a unify manner. Could you please define a PCI > sub-system solution to meet this requirement? > > Aya. Isn't there a relaxed ordering bit in the PCIe configuration space? Couldn't you use that as a global indication of if you can support relaxed ordering or not? Reading through the spec it seems like that is kind of the point of the config space bit in the Device Control register. If the bit is not set there then you shouldn't be able to use relaxed ordering in the device. Then it is just a matter of using setpci to enable/disable it. Thanks. - Alex ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering 2020-06-24 17:22 ` Jakub Kicinski 2020-06-24 20:15 ` Saeed Mahameed @ 2020-06-26 20:12 ` Bjorn Helgaas 2020-06-26 20:24 ` David Miller 2020-06-29 9:32 ` Aya Levin 1 sibling, 2 replies; 42+ messages in thread From: Bjorn Helgaas @ 2020-06-26 20:12 UTC (permalink / raw) To: Jakub Kicinski Cc: Aya Levin, Saeed Mahameed, mkubecek, davem, netdev, Tariq Toukan, linux-pci, Alexander Duyck On Wed, Jun 24, 2020 at 10:22:58AM -0700, Jakub Kicinski wrote: > On Wed, 24 Jun 2020 10:34:40 +0300 Aya Levin wrote: > > >> I think Michal will rightly complain that this does not belong in > > >> private flags any more. As (/if?) ARM deployments take a foothold > > >> in DC this will become a common setting for most NICs. > > > > > > Initially we used pcie_relaxed_ordering_enabled() to > > > programmatically enable this on/off on boot but this seems to > > > introduce some degradation on some Intel CPUs since the Intel Faulty > > > CPUs list is not up to date. Aya is discussing this with Bjorn. > > Adding Bjorn Helgaas > > I see. Simply using pcie_relaxed_ordering_enabled() and blacklisting > bad CPUs seems far nicer from operational perspective. Perhaps Bjorn > will chime in. Pushing the validation out to the user is not a great > solution IMHO. I'm totally lost, but maybe it doesn't matter because it looks like David has pulled this series already. There probably *should* be a PCI core interface to enable RO, but there isn't one today. pcie_relaxed_ordering_enabled() doesn't *enable* anything. All it does is tell you whether RO is already enabled. This patch ([net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering) apparently adds a knob to control RO, but I can't connect the dots. It doesn't touch PCI_EXP_DEVCTL_RELAX_EN, and that symbol doesn't occur anywhere in drivers/net except tg3, myri10ge, and niu. And this whole series doesn't contain PCI_EXP_DEVCTL_RELAX_EN or pcie_relaxed_ordering_enabled(). 
I do have a couple emails from Aya, but they didn't include a patch and I haven't quite figured out what the question was. > > > So until we figure this out, will keep this off by default. > > > > > > for the private flags we want to keep them for performance analysis as > > > we do with all other mlx5 special performance features and flags. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering 2020-06-26 20:12 ` Bjorn Helgaas @ 2020-06-26 20:24 ` David Miller 2020-06-29 9:32 ` Aya Levin 1 sibling, 0 replies; 42+ messages in thread From: David Miller @ 2020-06-26 20:24 UTC (permalink / raw) To: helgaas Cc: kuba, ayal, saeedm, mkubecek, netdev, tariqt, linux-pci, alexander.h.duyck From: Bjorn Helgaas <helgaas@kernel.org> Date: Fri, 26 Jun 2020 15:12:54 -0500 > I'm totally lost, but maybe it doesn't matter because it looks like > David has pulled this series already. I pulled an updated version of this series with this patch removed. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering 2020-06-26 20:12 ` Bjorn Helgaas 2020-06-26 20:24 ` David Miller @ 2020-06-29 9:32 ` Aya Levin 2020-06-29 19:33 ` Bjorn Helgaas 1 sibling, 1 reply; 42+ messages in thread From: Aya Levin @ 2020-06-29 9:32 UTC (permalink / raw) To: Bjorn Helgaas, Jakub Kicinski Cc: Saeed Mahameed, mkubecek, davem, netdev, Tariq Toukan, linux-pci, Alexander Duyck On 6/26/2020 11:12 PM, Bjorn Helgaas wrote: > On Wed, Jun 24, 2020 at 10:22:58AM -0700, Jakub Kicinski wrote: >> On Wed, 24 Jun 2020 10:34:40 +0300 Aya Levin wrote: >>>>> I think Michal will rightly complain that this does not belong in >>>>> private flags any more. As (/if?) ARM deployments take a foothold >>>>> in DC this will become a common setting for most NICs. >>>> >>>> Initially we used pcie_relaxed_ordering_enabled() to >>>> programmatically enable this on/off on boot but this seems to >>>> introduce some degradation on some Intel CPUs since the Intel Faulty >>>> CPUs list is not up to date. Aya is discussing this with Bjorn. >>> Adding Bjorn Helgaas >> >> I see. Simply using pcie_relaxed_ordering_enabled() and blacklisting >> bad CPUs seems far nicer from operational perspective. Perhaps Bjorn >> will chime in. Pushing the validation out to the user is not a great >> solution IMHO. > > I'm totally lost, but maybe it doesn't matter because it looks like > David has pulled this series already. > > There probably *should* be a PCI core interface to enable RO, but > there isn't one today. > > pcie_relaxed_ordering_enabled() doesn't *enable* anything. All it > does is tell you whether RO is already enabled. > > This patch ([net-next 10/10] net/mlx5e: Add support for PCI relaxed > ordering) apparently adds a knob to control RO, but I can't connect > the dots. It doesn't touch PCI_EXP_DEVCTL_RELAX_EN, and that symbol > doesn't occur anywhere in drivers/net except tg3, myri10ge, and niu. 
> > And this whole series doesn't contain PCI_EXP_DEVCTL_RELAX_EN or > pcie_relaxed_ordering_enabled().

I wanted to turn on RO on the ETH driver based on pcie_relaxed_ordering_enabled(). From my experiments I see that pcie_relaxed_ordering_enabled() returns true on Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz. This CPU is from the Haswell series, which is known to have a bug in its RO implementation. In this case, I expected pcie_relaxed_ordering_enabled() to return false, shouldn't it?

In addition, we are worried about future bugs in new CPUs which may result in performance degradation while using RO, as long as the function pcie_relaxed_ordering_enabled() returns true for these CPUs. That's why we thought of adding the feature on our card, off by default, and letting the user enable it.

> > I do have a couple emails from Aya, but they didn't include a patch > and I haven't quite figured out what the question was. > >>>> So until we figure this out, will keep this off by default. >>>> >>>> for the private flags we want to keep them for performance analysis as >>>> we do with all other mlx5 special performance features and flags. ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering 2020-06-29 9:32 ` Aya Levin @ 2020-06-29 19:33 ` Bjorn Helgaas 2020-06-29 19:57 ` Raj, Ashok 0 siblings, 1 reply; 42+ messages in thread From: Bjorn Helgaas @ 2020-06-29 19:33 UTC (permalink / raw) To: Aya Levin Cc: Jakub Kicinski, Saeed Mahameed, mkubecek, davem, netdev, Tariq Toukan, linux-pci, Alexander Duyck, Ashok Raj, Ding Tianhong, Casey Leedom

[+cc Ashok, Ding, Casey]

On Mon, Jun 29, 2020 at 12:32:44PM +0300, Aya Levin wrote: > I wanted to turn on RO on the ETH driver based on > pcie_relaxed_ordering_enabled(). > From my experiments I see that pcie_relaxed_ordering_enabled() returns true > on Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz. This CPU is from the Haswell > series, which is known to have a bug in its RO implementation. In this case, I > expected pcie_relaxed_ordering_enabled() to return false, shouldn't it?

Is there an erratum for this? How do we know this device has a bug in relaxed ordering?

> In addition, we are worried about future bugs in new CPUs which may result > in performance degradation while using RO, as long as the function > pcie_relaxed_ordering_enabled() returns true for these CPUs.

I'm worried about this too. I do not want to add a Device ID to the quirk_relaxedordering_disable() list for every new Intel CPU. That's a huge hassle and creates a real problem for old kernels running on those new CPUs, because things might work "most of the time" but not always.

Maybe we need to prevent the use of relaxed ordering for *all* Intel CPUs.

> That's why > we thought of adding the feature on our card, off by default, and letting the > user enable it.

Bjorn ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering 2020-06-29 19:33 ` Bjorn Helgaas @ 2020-06-29 19:57 ` Raj, Ashok 2020-06-30 7:32 ` Ding Tianhong 0 siblings, 1 reply; 42+ messages in thread From: Raj, Ashok @ 2020-06-29 19:57 UTC (permalink / raw) To: Bjorn Helgaas Cc: Aya Levin, Jakub Kicinski, Saeed Mahameed, mkubecek, davem, netdev, Tariq Toukan, linux-pci, Alexander Duyck, Ding Tianhong, Casey Leedom, Ashok Raj

Hi Bjorn

On Mon, Jun 29, 2020 at 02:33:16PM -0500, Bjorn Helgaas wrote: > [+cc Ashok, Ding, Casey] > > On Mon, Jun 29, 2020 at 12:32:44PM +0300, Aya Levin wrote: > > I wanted to turn on RO on the ETH driver based on > > pcie_relaxed_ordering_enabled(). > > From my experiments I see that pcie_relaxed_ordering_enabled() returns true > > on Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz. This CPU is from the Haswell > > series, which is known to have a bug in its RO implementation. In this case, I > > expected pcie_relaxed_ordering_enabled() to return false, shouldn't it? > > Is there an erratum for this? How do we know this device has a bug > in relaxed ordering?

https://software.intel.com/content/www/us/en/develop/download/intel-64-and-ia-32-architectures-optimization-reference-manual.html

For some reason they weren't documented in the errata, but in the Optimization manual :-)

Table 3-7. Intel Processor CPU RP Device IDs for Processors Optimizing PCIe Performance

Processor                                                     CPU RP Device IDs
Intel Xeon processors based on Broadwell microarchitecture    6F01H-6F0EH
Intel Xeon processors based on Haswell microarchitecture      2F01H-2F0EH

These are the two that were listed in the manual. drivers/pci/quirks.c also has an elaborate list of root ports where relaxed_ordering is disabled. Did you check if it's not already covered here?

Send lspci if it's not already covered by this table.
> > > In addition, we are worried about future bugs in new CPUs which may result > > in performance degradation while using RO, as long as the function > > pcie_relaxed_ordering_enabled() returns true for these CPUs. > > I'm worried about this too. I do not want to add a Device ID to the > quirk_relaxedordering_disable() list for every new Intel CPU. That's > a huge hassle and creates a real problem for old kernels running on > those new CPUs, because things might work "most of the time" but not > always.

I'll check when this is fixed; I was told newer ones should work properly. But I'll confirm.

^ permalink raw reply [flat|nested] 42+ messages in thread
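To make the device-ID ranges from Table 3-7 concrete, here is a small C sketch of a range check. It is an illustration only: it encodes just the two ranges quoted from the Optimization manual above plus the Intel vendor ID 0x8086, and it makes no claim about what the list in drivers/pci/quirks.c actually contains. The function name in_table_3_7() is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_VENDOR_INTEL 0x8086u

/*
 * Returns true if vendor:device falls in one of the two root-port ranges
 * quoted from Table 3-7 in this thread (hypothetical helper):
 *   Broadwell: 6F01H-6F0EH
 *   Haswell:   2F01H-2F0EH
 */
bool in_table_3_7(uint16_t vendor, uint16_t device)
{
	if (vendor != PCI_VENDOR_INTEL)
		return false;

	return (device >= 0x6f01 && device <= 0x6f0e) ||
	       (device >= 0x2f01 && device <= 0x2f0e);
}
```

Applied to IDs of the form vendor:device as printed by lspci, a root port reported as 8086:2f04 would fall inside the Haswell range, while 8086:2f00 or a non-Intel device such as 15b3:101d would not.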
* Re: [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering 2020-06-29 19:57 ` Raj, Ashok @ 2020-06-30 7:32 ` Ding Tianhong 2020-07-05 11:15 ` Aya Levin 0 siblings, 1 reply; 42+ messages in thread From: Ding Tianhong @ 2020-06-30 7:32 UTC (permalink / raw) To: Raj, Ashok, Bjorn Helgaas Cc: Aya Levin, Jakub Kicinski, Saeed Mahameed, mkubecek, davem, netdev, Tariq Toukan, linux-pci, Alexander Duyck, Casey Leedom

On 2020/6/30 3:57, Raj, Ashok wrote: > Hi Bjorn > > > On Mon, Jun 29, 2020 at 02:33:16PM -0500, Bjorn Helgaas wrote: >> [+cc Ashok, Ding, Casey] >> >> On Mon, Jun 29, 2020 at 12:32:44PM +0300, Aya Levin wrote: >>> I wanted to turn on RO on the ETH driver based on >>> pcie_relaxed_ordering_enabled(). >>> From my experiments I see that pcie_relaxed_ordering_enabled() returns true >>> on Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz. This CPU is from the Haswell >>> series, which is known to have a bug in its RO implementation. In this case, I >>> expected pcie_relaxed_ordering_enabled() to return false, shouldn't it? >> >> Is there an erratum for this? How do we know this device has a bug >> in relaxed ordering? > > https://software.intel.com/content/www/us/en/develop/download/intel-64-and-ia-32-architectures-optimization-reference-manual.html > > For some reason they weren't documented in the errata, but in the > Optimization manual :-) > > Table 3-7. Intel Processor CPU RP Device IDs for Processors Optimizing PCIe > Performance > Processor CPU RP Device IDs > Intel Xeon processors based on Broadwell microarchitecture 6F01H-6F0EH > Intel Xeon processors based on Haswell microarchitecture 2F01H-2F0EH > > These are the two that were listed in the manual. drivers/pci/quirks.c also > has an elaborate list of root ports where relaxed_ordering is disabled. Did > you check if it's not already covered here? > > Send lspci if it's not already covered by this table. >

Looks like the chip series is not in the errata list, but it is really difficult to distinguish and test.
> >> >>> In addition, we are worried about future bugs in new CPUs which may result >>> in performance degradation while using RO, as long as the function >>> pcie_relaxed_ordering_enabled() returns true for these CPUs. >> >> I'm worried about this too. I do not want to add a Device ID to the >> quirk_relaxedordering_disable() list for every new Intel CPU. That's >> a huge hassle and creates a real problem for old kernels running on >> those new CPUs, because things might work "most of the time" but not >> always. > > I'll check when this is fixed; I was told newer ones should work properly. > But I'll confirm. >

Maybe preventing Relaxed Ordering for all Intel CPUs is a better solution; it looks like it will not break anything.

Ding

> > . > ^ permalink raw reply [flat|nested] 42+ messages in thread
* Re: [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering 2020-06-30 7:32 ` Ding Tianhong @ 2020-07-05 11:15 ` Aya Levin 0 siblings, 0 replies; 42+ messages in thread From: Aya Levin @ 2020-07-05 11:15 UTC (permalink / raw) To: Ding Tianhong, Raj, Ashok, Bjorn Helgaas Cc: Jakub Kicinski, Saeed Mahameed, mkubecek, davem, netdev, Tariq Toukan, linux-pci, Alexander Duyck, Casey Leedom [-- Attachment #1: Type: text/plain, Size: 2667 bytes --]

On 6/30/2020 10:32 AM, Ding Tianhong wrote: > > > On 2020/6/30 3:57, Raj, Ashok wrote: >> Hi Bjorn >> >> >> On Mon, Jun 29, 2020 at 02:33:16PM -0500, Bjorn Helgaas wrote: >>> [+cc Ashok, Ding, Casey] >>> >>> On Mon, Jun 29, 2020 at 12:32:44PM +0300, Aya Levin wrote: >>>> I wanted to turn on RO on the ETH driver based on >>>> pcie_relaxed_ordering_enabled(). >>>> From my experiments I see that pcie_relaxed_ordering_enabled() returns true >>>> on Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz. This CPU is from the Haswell >>>> series, which is known to have a bug in its RO implementation. In this case, I >>>> expected pcie_relaxed_ordering_enabled() to return false, shouldn't it? >>> >>> Is there an erratum for this? How do we know this device has a bug >>> in relaxed ordering? >> >> https://software.intel.com/content/www/us/en/develop/download/intel-64-and-ia-32-architectures-optimization-reference-manual.html >> >> For some reason they weren't documented in the errata, but in the >> Optimization manual :-) >> >> Table 3-7. Intel Processor CPU RP Device IDs for Processors Optimizing PCIe >> Performance >> Processor CPU RP Device IDs >> Intel Xeon processors based on Broadwell microarchitecture 6F01H-6F0EH >> Intel Xeon processors based on Haswell microarchitecture 2F01H-2F0EH >> >> These are the two that were listed in the manual. drivers/pci/quirks.c also >> has an elaborate list of root ports where relaxed_ordering is disabled. Did >> you check if it's not already covered here? >> >> Send lspci if it's not already covered by this table.

Attached is the lspci -vm output.

>> > > Looks like the chip series is not in the errata list, but it is really difficult to distinguish and test.

Does Intel plan to send a fix that will go to -stable?

> >> >>> >>>> In addition, we are worried about future bugs in new CPUs which may result >>>> in performance degradation while using RO, as long as the function >>>> pcie_relaxed_ordering_enabled() returns true for these CPUs. >>> >>> I'm worried about this too. I do not want to add a Device ID to the >>> quirk_relaxedordering_disable() list for every new Intel CPU. That's >>> a huge hassle and creates a real problem for old kernels running on >>> those new CPUs, because things might work "most of the time" but not >>> always.

Please advise how to move forward.

>> >> I'll check when this is fixed; I was told newer ones should work properly. >> But I'll confirm.

Any updates? We need this information to proceed.

>> > > Maybe preventing Relaxed Ordering for all Intel CPUs is a better solution; it looks like > it will not break anything.

Should I provide this patch?

Aya.

> > Ding >> >> .
>> > [-- Attachment #2: lspci_vm.txt --] [-- Type: text/plain, Size: 43207 bytes --] 00:00.0 0600: 8086:2f00 (rev 02) Subsystem: 8086:0000 Flags: fast devsel Capabilities: [90] Express Root Port (Slot-), MSI 00 Capabilities: [e0] Power Management version 3 Capabilities: [100] Vendor Specific Information: ID=0002 Rev=0 Len=00c <?> Capabilities: [144] Vendor Specific Information: ID=0004 Rev=1 Len=03c <?> Capabilities: [1d0] Vendor Specific Information: ID=0003 Rev=1 Len=00a <?> Capabilities: [280] Vendor Specific Information: ID=0005 Rev=3 Len=018 <?> Capabilities: [300] Vendor Specific Information: ID=0008 Rev=0 Len=038 <?> 00:01.0 0604: 8086:2f02 (rev 02) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=03, subordinate=03, sec-latency=0 I/O behind bridge: 00002000-00002fff Memory behind bridge: 93c00000-93dfffff Capabilities: [40] Subsystem: 8086:0000 Capabilities: [60] MSI: Enable+ Count=1/2 Maskable+ 64bit- Capabilities: [90] Express Root Port (Slot-), MSI 00 Capabilities: [e0] Power Management version 3 Capabilities: [100] Vendor Specific Information: ID=0002 Rev=0 Len=00c <?> Capabilities: [110] Access Control Services Capabilities: [148] Advanced Error Reporting Capabilities: [1d0] Vendor Specific Information: ID=0003 Rev=1 Len=00a <?> Capabilities: [250] #19 Capabilities: [280] Vendor Specific Information: ID=0005 Rev=3 Len=018 <?> Capabilities: [300] Vendor Specific Information: ID=0008 Rev=0 Len=038 <?> Kernel driver in use: pcieport 00:02.0 0604: 8086:2f04 (rev 02) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=04, subordinate=04, sec-latency=0 Memory behind bridge: 93f00000-947fffff Prefetchable memory behind bridge: 0000000090000000-0000000091ffffff Capabilities: [40] Subsystem: 8086:0000 Capabilities: [60] MSI: Enable+ Count=1/2 Maskable+ 64bit- Capabilities: [90] Express Root Port (Slot+), MSI 00 Capabilities: [e0] Power Management version 3 
Capabilities: [100] Vendor Specific Information: ID=0002 Rev=0 Len=00c <?> Capabilities: [110] Access Control Services Capabilities: [148] Advanced Error Reporting Capabilities: [1d0] Vendor Specific Information: ID=0003 Rev=1 Len=00a <?> Capabilities: [250] #19 Capabilities: [280] Vendor Specific Information: ID=0005 Rev=3 Len=018 <?> Capabilities: [300] Vendor Specific Information: ID=0008 Rev=0 Len=038 <?> Kernel driver in use: pcieport 00:03.0 0604: 8086:2f08 (rev 02) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=02, subordinate=02, sec-latency=0 Memory behind bridge: 94800000-948fffff Prefetchable memory behind bridge: 0000000093a00000-0000000093afffff Capabilities: [40] Subsystem: 8086:0000 Capabilities: [60] MSI: Enable+ Count=1/2 Maskable+ 64bit- Capabilities: [90] Express Root Port (Slot-), MSI 00 Capabilities: [e0] Power Management version 3 Capabilities: [100] Vendor Specific Information: ID=0002 Rev=0 Len=00c <?> Capabilities: [110] Access Control Services Capabilities: [148] Advanced Error Reporting Capabilities: [1d0] Vendor Specific Information: ID=0003 Rev=1 Len=00a <?> Capabilities: [250] #19 Capabilities: [280] Vendor Specific Information: ID=0005 Rev=3 Len=018 <?> Capabilities: [300] Vendor Specific Information: ID=0008 Rev=0 Len=038 <?> Kernel driver in use: pcieport 00:03.1 0604: 8086:2f09 (rev 02) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=01, subordinate=01, sec-latency=0 Memory behind bridge: 94900000-949fffff Prefetchable memory behind bridge: 0000000093b00000-0000000093bfffff Capabilities: [40] Subsystem: 8086:0000 Capabilities: [60] MSI: Enable+ Count=1/2 Maskable+ 64bit- Capabilities: [90] Express Root Port (Slot-), MSI 00 Capabilities: [e0] Power Management version 3 Capabilities: [100] Vendor Specific Information: ID=0002 Rev=0 Len=00c <?> Capabilities: [110] Access Control Services Capabilities: [148] Advanced Error Reporting 
Capabilities: [1d0] Vendor Specific Information: ID=0003 Rev=1 Len=00a <?> Capabilities: [250] #19 Capabilities: [280] Vendor Specific Information: ID=0005 Rev=3 Len=018 <?> Capabilities: [300] Vendor Specific Information: ID=0008 Rev=0 Len=038 <?> Kernel driver in use: pcieport 00:03.2 0604: 8086:2f0a (rev 02) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=05, subordinate=05, sec-latency=0 Memory behind bridge: 94a00000-95bfffff Prefetchable memory behind bridge: 0000033ffc000000-0000033fffffffff Capabilities: [40] Subsystem: 8086:0000 Capabilities: [60] MSI: Enable+ Count=1/2 Maskable+ 64bit- Capabilities: [90] Express Root Port (Slot+), MSI 00 Capabilities: [e0] Power Management version 3 Capabilities: [100] Vendor Specific Information: ID=0002 Rev=0 Len=00c <?> Capabilities: [110] Access Control Services Capabilities: [148] Advanced Error Reporting Capabilities: [1d0] Vendor Specific Information: ID=0003 Rev=1 Len=00a <?> Capabilities: [250] #19 Capabilities: [280] Vendor Specific Information: ID=0005 Rev=3 Len=018 <?> Capabilities: [300] Vendor Specific Information: ID=0008 Rev=0 Len=038 <?> Kernel driver in use: pcieport 00:05.0 0880: 8086:2f28 (rev 02) Flags: fast devsel Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00 00:05.1 0880: 8086:2f29 (rev 02) Flags: fast devsel Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00 Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit+ Capabilities: [100] Vendor Specific Information: ID=0006 Rev=1 Len=010 <?> Capabilities: [110] Vendor Specific Information: ID=0006 Rev=1 Len=010 <?> Capabilities: [120] Vendor Specific Information: ID=0006 Rev=1 Len=010 <?> Capabilities: [130] Vendor Specific Information: ID=0006 Rev=1 Len=010 <?> 00:05.2 0880: 8086:2f2a (rev 02) Flags: fast devsel Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00 00:05.4 0800: 8086:2f2c (rev 02) (prog-if 20 [IO(X)-APIC]) Subsystem: 8086:0000 
Flags: bus master, fast devsel, latency 0 Memory at 93e08000 (32-bit, non-prefetchable) [size=4K] Capabilities: [44] Express Root Complex Integrated Endpoint, MSI 00 Capabilities: [e0] Power Management version 3 00:05.6 1101: 8086:2f39 (rev 02) Flags: fast devsel Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00 Kernel driver in use: hswep_uncore 00:06.0 0880: 8086:2f10 (rev 02) Subsystem: 8086:0000 Flags: fast devsel Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00 Capabilities: [100] Vendor Specific Information: ID=0001 Rev=0 Len=0b8 <?> 00:06.1 0880: 8086:2f11 (rev 02) Subsystem: 8086:0000 Flags: fast devsel Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00 Capabilities: [100] Vendor Specific Information: ID=0001 Rev=0 Len=0b8 <?> 00:06.2 0880: 8086:2f12 (rev 02) Subsystem: 8086:0000 Flags: fast devsel Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00 00:06.3 0880: 8086:2f13 (rev 02) Subsystem: 8086:0000 Flags: fast devsel Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00 Capabilities: [100] Vendor Specific Information: ID=0001 Rev=0 Len=0b8 <?> 00:06.4 0880: 8086:2f14 (rev 02) Subsystem: 8086:0000 Flags: fast devsel Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00 00:06.5 0880: 8086:2f15 (rev 02) Subsystem: 8086:0000 Flags: fast devsel Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00 00:06.6 0880: 8086:2f16 (rev 02) Subsystem: 8086:0000 Flags: fast devsel Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00 00:06.7 0880: 8086:2f17 (rev 02) Subsystem: 8086:0000 Flags: fast devsel Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00 00:07.0 0880: 8086:2f18 (rev 02) Subsystem: 8086:0000 Flags: fast devsel Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00 Capabilities: [100] Vendor Specific Information: ID=0001 Rev=0 Len=0b8 <?> 00:07.1 0880: 8086:2f19 (rev 02) Subsystem: 8086:0000 Flags: 
fast devsel Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00 00:07.2 0880: 8086:2f1a (rev 02) Subsystem: 8086:0000 Flags: fast devsel Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00 00:07.3 0880: 8086:2f1b (rev 02) Subsystem: 8086:0000 Flags: fast devsel Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00 00:07.4 0880: 8086:2f1c (rev 02) Subsystem: 8086:0000 Flags: fast devsel Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00 00:11.0 ff00: 8086:8d7c (rev 05) Subsystem: 1028:0600 Flags: bus master, fast devsel, latency 0 Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00 Capabilities: [80] Power Management version 3 00:11.4 0106: 8086:8d62 (rev 05) (prog-if 01 [AHCI 1.0]) Subsystem: 1028:0600 Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 80 I/O ports at 3078 [size=8] I/O ports at 308c [size=4] I/O ports at 3070 [size=8] I/O ports at 3088 [size=4] I/O ports at 3040 [size=32] Memory at 93e02000 (32-bit, non-prefetchable) [size=2K] Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit- Capabilities: [70] Power Management version 3 Capabilities: [a8] SATA HBA v1.0 Kernel driver in use: ahci 00:16.0 0780: 8086:8d3a (rev 05) Subsystem: 1028:0600 Flags: bus master, fast devsel, latency 0, IRQ 255 Memory at 93e07000 (64-bit, non-prefetchable) [size=16] Capabilities: [50] Power Management version 3 Capabilities: [8c] MSI: Enable- Count=1/1 Maskable- 64bit+ 00:16.1 0780: 8086:8d3b (rev 05) Subsystem: 1028:0600 Flags: bus master, fast devsel, latency 0, IRQ 255 Memory at 93e06000 (64-bit, non-prefetchable) [size=16] Capabilities: [50] Power Management version 3 Capabilities: [8c] MSI: Enable- Count=1/1 Maskable- 64bit+ 00:1a.0 0c03: 8086:8d2d (rev 05) (prog-if 20 [EHCI]) Subsystem: 1028:0600 Flags: bus master, medium devsel, latency 0, IRQ 18 Memory at 93e04000 (32-bit, non-prefetchable) [size=1K] Capabilities: [50] Power Management version 2 Capabilities: [58] Debug port: 
BAR=1 offset=00a0 Capabilities: [98] PCI Advanced Features Kernel driver in use: ehci-pci 00:1c.0 0604: 8086:8d10 (rev d5) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=06, subordinate=06, sec-latency=0 Capabilities: [40] Express Root Port (Slot-), MSI 00 Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit- Capabilities: [90] Subsystem: 1028:0600 Capabilities: [a0] Power Management version 3 Kernel driver in use: pcieport 00:1c.7 0604: 8086:8d1e (rev d5) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=00, secondary=07, subordinate=0b, sec-latency=0 Memory behind bridge: 93000000-939fffff Prefetchable memory behind bridge: 0000000092000000-0000000092ffffff Capabilities: [40] Express Root Port (Slot+), MSI 00 Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit- Capabilities: [90] Subsystem: 1028:0600 Capabilities: [a0] Power Management version 3 Capabilities: [100] Advanced Error Reporting Kernel driver in use: pcieport 00:1d.0 0c03: 8086:8d26 (rev 05) (prog-if 20 [EHCI]) Subsystem: 1028:0600 Flags: bus master, medium devsel, latency 0, IRQ 18 Memory at 93e03000 (32-bit, non-prefetchable) [size=1K] Capabilities: [50] Power Management version 2 Capabilities: [58] Debug port: BAR=1 offset=00a0 Capabilities: [98] PCI Advanced Features Kernel driver in use: ehci-pci 00:1f.0 0601: 8086:8d44 (rev 05) Subsystem: 1028:0600 Flags: bus master, medium devsel, latency 0 Capabilities: [e0] Vendor Specific Information: Len=0c <?> Kernel driver in use: lpc_ich 00:1f.2 0106: 8086:8d02 (rev 05) (prog-if 01 [AHCI 1.0]) Subsystem: 1028:0600 Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 81 I/O ports at 3068 [size=8] I/O ports at 3084 [size=4] I/O ports at 3060 [size=8] I/O ports at 3080 [size=4] I/O ports at 3020 [size=32] Memory at 93e01000 (32-bit, non-prefetchable) [size=2K] Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit- Capabilities: [70] Power Management version 
3 Capabilities: [a8] SATA HBA v1.0 Kernel driver in use: ahci 01:00.0 0200: 14e4:165f Subsystem: 1028:1f5b Flags: bus master, fast devsel, latency 0, IRQ 82 Memory at 93b30000 (64-bit, prefetchable) [size=64K] Memory at 93b40000 (64-bit, prefetchable) [size=64K] Memory at 93b50000 (64-bit, prefetchable) [size=64K] Expansion ROM at 94900000 [disabled] [size=256K] Capabilities: [48] Power Management version 3 Capabilities: [50] Vital Product Data Capabilities: [58] MSI: Enable- Count=1/8 Maskable- 64bit+ Capabilities: [a0] MSI-X: Enable+ Count=17 Masked- Capabilities: [ac] Express Endpoint, MSI 00 Capabilities: [100] Advanced Error Reporting Capabilities: [13c] Device Serial Number 00-00-b0-83-fe-cf-d4-05 Capabilities: [150] Power Budgeting <?> Capabilities: [160] Virtual Channel Kernel driver in use: tg3 01:00.1 0200: 14e4:165f Subsystem: 1028:1f5b Flags: bus master, fast devsel, latency 0, IRQ 83 Memory at 93b00000 (64-bit, prefetchable) [size=64K] Memory at 93b10000 (64-bit, prefetchable) [size=64K] Memory at 93b20000 (64-bit, prefetchable) [size=64K] Expansion ROM at 94940000 [disabled] [size=256K] Capabilities: [48] Power Management version 3 Capabilities: [50] Vital Product Data Capabilities: [58] MSI: Enable- Count=1/8 Maskable- 64bit+ Capabilities: [a0] MSI-X: Enable- Count=17 Masked- Capabilities: [ac] Express Endpoint, MSI 00 Capabilities: [100] Advanced Error Reporting Capabilities: [13c] Device Serial Number 00-00-b0-83-fe-cf-d4-06 Capabilities: [150] Power Budgeting <?> Capabilities: [160] Virtual Channel Kernel driver in use: tg3 02:00.0 0200: 14e4:165f Subsystem: 1028:1f5b Flags: bus master, fast devsel, latency 0, IRQ 84 Memory at 93a30000 (64-bit, prefetchable) [size=64K] Memory at 93a40000 (64-bit, prefetchable) [size=64K] Memory at 93a50000 (64-bit, prefetchable) [size=64K] Expansion ROM at 94800000 [disabled] [size=256K] Capabilities: [48] Power Management version 3 Capabilities: [50] Vital Product Data Capabilities: [58] MSI: Enable- Count=1/8 
Maskable- 64bit+ Capabilities: [a0] MSI-X: Enable- Count=17 Masked- Capabilities: [ac] Express Endpoint, MSI 00 Capabilities: [100] Advanced Error Reporting Capabilities: [13c] Device Serial Number 00-00-b0-83-fe-cf-d4-07 Capabilities: [150] Power Budgeting <?> Capabilities: [160] Virtual Channel Kernel driver in use: tg3 02:00.1 0200: 14e4:165f Subsystem: 1028:1f5b Flags: bus master, fast devsel, latency 0, IRQ 85 Memory at 93a00000 (64-bit, prefetchable) [size=64K] Memory at 93a10000 (64-bit, prefetchable) [size=64K] Memory at 93a20000 (64-bit, prefetchable) [size=64K] Expansion ROM at 94840000 [disabled] [size=256K] Capabilities: [48] Power Management version 3 Capabilities: [50] Vital Product Data Capabilities: [58] MSI: Enable- Count=1/8 Maskable- 64bit+ Capabilities: [a0] MSI-X: Enable- Count=17 Masked- Capabilities: [ac] Express Endpoint, MSI 00 Capabilities: [100] Advanced Error Reporting Capabilities: [13c] Device Serial Number 00-00-b0-83-fe-cf-d4-08 Capabilities: [150] Power Budgeting <?> Capabilities: [160] Virtual Channel Kernel driver in use: tg3 03:00.0 0104: 1000:005d (rev 02) Subsystem: 1028:1f49 Flags: bus master, fast devsel, latency 0, IRQ 37 I/O ports at 2000 [size=256] Memory at 93d00000 (64-bit, non-prefetchable) [size=64K] Memory at 93c00000 (64-bit, non-prefetchable) [size=1M] Expansion ROM at <ignored> [disabled] Capabilities: [50] Power Management version 3 Capabilities: [68] Express Endpoint, MSI 00 Capabilities: [d0] Vital Product Data Capabilities: [a8] MSI: Enable- Count=1/1 Maskable+ 64bit+ Capabilities: [c0] MSI-X: Enable+ Count=97 Masked- Capabilities: [100] Advanced Error Reporting Capabilities: [1e0] #19 Capabilities: [1c0] Power Budgeting <?> Capabilities: [190] #16 Capabilities: [148] Alternative Routing-ID Interpretation (ARI) Kernel driver in use: megaraid_sas 04:00.0 0200: 15b3:101d Subsystem: 15b3:0047 Flags: bus master, fast devsel, latency 0, IRQ 91 Memory at 90000000 (64-bit, prefetchable) [size=32M] Expansion ROM at 
93f00000 [disabled] [size=1M] Capabilities: [60] Express Endpoint, MSI 00 Capabilities: [48] Vital Product Data Capabilities: [9c] MSI-X: Enable+ Count=64 Masked- Capabilities: [c0] Vendor Specific Information: Len=18 <?> Capabilities: [40] Power Management version 3 Capabilities: [100] Advanced Error Reporting Capabilities: [150] Alternative Routing-ID Interpretation (ARI) Capabilities: [180] Single Root I/O Virtualization (SR-IOV) Capabilities: [1c0] #19 Capabilities: [320] #27 Capabilities: [370] #26 Capabilities: [420] #25 Kernel driver in use: mlx5_core 05:00.0 0200: 15b3:1017 Subsystem: 15b3:0020 Flags: bus master, fast devsel, latency 0, IRQ 133 Memory at 33ffe000000 (64-bit, prefetchable) [size=32M] Expansion ROM at 94a00000 [disabled] [size=1M] Capabilities: [60] Express Endpoint, MSI 00 Capabilities: [48] Vital Product Data Capabilities: [9c] MSI-X: Enable+ Count=64 Masked- Capabilities: [c0] Vendor Specific Information: Len=18 <?> Capabilities: [40] Power Management version 3 Capabilities: [100] Advanced Error Reporting Capabilities: [150] Alternative Routing-ID Interpretation (ARI) Capabilities: [180] Single Root I/O Virtualization (SR-IOV) Capabilities: [1c0] #19 Capabilities: [230] Access Control Services Kernel driver in use: mlx5_core 05:00.1 0200: 15b3:1017 Subsystem: 15b3:0020 Flags: bus master, fast devsel, latency 0, IRQ 83 Memory at 33ffc000000 (64-bit, prefetchable) [size=32M] Expansion ROM at 95300000 [disabled] [size=1M] Capabilities: [60] Express Endpoint, MSI 00 Capabilities: [48] Vital Product Data Capabilities: [9c] MSI-X: Enable+ Count=64 Masked- Capabilities: [c0] Vendor Specific Information: Len=18 <?> Capabilities: [40] Power Management version 3 Capabilities: [100] Advanced Error Reporting Capabilities: [150] Alternative Routing-ID Interpretation (ARI) Capabilities: [180] Single Root I/O Virtualization (SR-IOV) Capabilities: [230] Access Control Services Kernel driver in use: mlx5_core 07:00.0 0604: 1912:001d (prog-if 00 [Normal 
decode]) Flags: bus master, fast devsel, latency 0 BIST result: 00 Bus: primary=07, secondary=08, subordinate=0b, sec-latency=0 Memory behind bridge: 93000000-939fffff Prefetchable memory behind bridge: 0000000092000000-0000000092ffffff Capabilities: [40] Power Management version 3 Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+ Capabilities: [70] Express Upstream Port, MSI 00 Capabilities: [b0] Subsystem: 1912:001d Capabilities: [100] Advanced Error Reporting Kernel driver in use: pcieport 08:00.0 0604: 1912:001d (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 BIST result: 00 Bus: primary=08, secondary=09, subordinate=0a, sec-latency=0 Memory behind bridge: 93000000-938fffff Prefetchable memory behind bridge: 0000000092000000-0000000092ffffff Capabilities: [40] Power Management version 3 Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+ Capabilities: [70] Express Downstream Port (Slot-), MSI 00 Capabilities: [b0] Subsystem: 1912:001d Capabilities: [100] Advanced Error Reporting Kernel driver in use: pcieport 09:00.0 0604: 1912:001a (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 BIST result: 00 Bus: primary=09, secondary=0a, subordinate=0a, sec-latency=0 Memory behind bridge: 93000000-938fffff Prefetchable memory behind bridge: 0000000092000000-0000000092ffffff Capabilities: [40] Power Management version 3 Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+ Capabilities: [70] Express PCI-Express to PCI/PCI-X Bridge, MSI 00 Capabilities: [b0] Subsystem: 1912:001a Capabilities: [100] Advanced Error Reporting 0a:00.0 0300: 102b:0534 (rev 01) (prog-if 00 [VGA controller]) Subsystem: 1028:0600 Flags: bus master, medium devsel, latency 0, IRQ 19 Memory at 92000000 (32-bit, prefetchable) [size=16M] Memory at 93800000 (32-bit, non-prefetchable) [size=16K] Memory at 93000000 (32-bit, non-prefetchable) [size=8M] Expansion ROM at <unassigned> [disabled] Capabilities: [dc] Power Management version 1 
Kernel driver in use: mgag200 7f:08.0 0880: 8086:2f80 (rev 02) Subsystem: 8086:2f80 Flags: fast devsel 7f:08.2 1101: 8086:2f32 (rev 02) Subsystem: 8086:2f32 Flags: fast devsel Kernel driver in use: hswep_uncore 7f:08.3 0880: 8086:2f83 (rev 02) Subsystem: 8086:2f83 Flags: fast devsel 7f:08.5 0880: 8086:2f85 (rev 02) Subsystem: 8086:2f85 Flags: fast devsel 7f:08.6 0880: 8086:2f86 (rev 02) Subsystem: 8086:2f86 Flags: fast devsel Kernel driver in use: hswep_uncore 7f:08.7 0880: 8086:2f87 (rev 02) Subsystem: 8086:2f87 Flags: fast devsel 7f:09.0 0880: 8086:2f90 (rev 02) Subsystem: 8086:2f90 Flags: fast devsel 7f:09.2 1101: 8086:2f33 (rev 02) Subsystem: 8086:2f33 Flags: fast devsel Kernel driver in use: hswep_uncore 7f:09.3 0880: 8086:2f93 (rev 02) Subsystem: 8086:2f93 Flags: fast devsel 7f:09.5 0880: 8086:2f95 (rev 02) Subsystem: 8086:2f95 Flags: fast devsel 7f:09.6 0880: 8086:2f96 (rev 02) Subsystem: 8086:2f96 Flags: fast devsel Kernel driver in use: hswep_uncore 7f:0b.0 0880: 8086:2f81 (rev 02) Subsystem: 8086:2f81 Flags: fast devsel 7f:0b.1 1101: 8086:2f36 (rev 02) Subsystem: 8086:2f36 Flags: fast devsel Kernel driver in use: hswep_uncore 7f:0b.2 1101: 8086:2f37 (rev 02) Subsystem: 8086:2f37 Flags: fast devsel Kernel driver in use: hswep_uncore 7f:0b.4 0880: 8086:2f41 (rev 02) Subsystem: 8086:2f41 Flags: fast devsel 7f:0b.5 1101: 8086:2f3e (rev 02) Subsystem: 8086:2f3e Flags: fast devsel Kernel driver in use: hswep_uncore 7f:0b.6 1101: 8086:2f3f (rev 02) Subsystem: 8086:2f3f Flags: fast devsel 7f:0c.0 0880: 8086:2fe0 (rev 02) Subsystem: 8086:2fe0 Flags: fast devsel 7f:0c.1 0880: 8086:2fe1 (rev 02) Subsystem: 8086:2fe1 Flags: fast devsel 7f:0c.2 0880: 8086:2fe2 (rev 02) Subsystem: 8086:2fe2 Flags: fast devsel 7f:0c.3 0880: 8086:2fe3 (rev 02) Subsystem: 8086:2fe3 Flags: fast devsel 7f:0c.4 0880: 8086:2fe4 (rev 02) Subsystem: 8086:2fe4 Flags: fast devsel 7f:0c.5 0880: 8086:2fe5 (rev 02) Subsystem: 8086:2fe5 Flags: fast devsel 7f:0c.6 0880: 8086:2fe6 (rev 02) Subsystem: 
8086:2fe6 Flags: fast devsel 7f:0c.7 0880: 8086:2fe7 (rev 02) Subsystem: 8086:2fe7 Flags: fast devsel 7f:0d.0 0880: 8086:2fe8 (rev 02) Subsystem: 8086:2fe8 Flags: fast devsel 7f:0d.1 0880: 8086:2fe9 (rev 02) Subsystem: 8086:2fe9 Flags: fast devsel 7f:0f.0 0880: 8086:2ff8 (rev 02) Flags: fast devsel 7f:0f.1 0880: 8086:2ff9 (rev 02) Flags: fast devsel 7f:0f.2 0880: 8086:2ffa (rev 02) Flags: fast devsel 7f:0f.3 0880: 8086:2ffb (rev 02) Flags: fast devsel 7f:0f.4 0880: 8086:2ffc (rev 02) Subsystem: 8086:2fe0 Flags: fast devsel 7f:0f.5 0880: 8086:2ffd (rev 02) Subsystem: 8086:2fe0 Flags: fast devsel 7f:0f.6 0880: 8086:2ffe (rev 02) Subsystem: 8086:2fe0 Flags: fast devsel 7f:10.0 0880: 8086:2f1d (rev 02) Subsystem: 8086:2f1d Flags: fast devsel 7f:10.1 1101: 8086:2f34 (rev 02) Subsystem: 8086:2f34 Flags: fast devsel Kernel driver in use: hswep_uncore 7f:10.5 0880: 8086:2f1e (rev 02) Subsystem: 8086:2f1e Flags: fast devsel 7f:10.6 1101: 8086:2f7d (rev 02) Subsystem: 8086:2f7d Flags: fast devsel 7f:10.7 0880: 8086:2f1f (rev 02) Subsystem: 8086:2f1f Flags: fast devsel 7f:12.0 0880: 8086:2fa0 (rev 02) Subsystem: 8086:2fa0 Flags: fast devsel Kernel driver in use: sbridge_edac 7f:12.1 1101: 8086:2f30 (rev 02) Subsystem: 8086:2f30 Flags: fast devsel Kernel driver in use: hswep_uncore 7f:12.2 0880: 8086:2f70 (rev 02) Subsystem: 8086:2f70 Flags: fast devsel 7f:12.4 0880: 8086:2f60 (rev 02) Subsystem: 8086:2f60 Flags: fast devsel 7f:12.5 1101: 8086:2f38 (rev 02) Subsystem: 8086:2f38 Flags: fast devsel Kernel driver in use: hswep_uncore 7f:12.6 0880: 8086:2f78 (rev 02) Subsystem: 8086:2f78 Flags: fast devsel 7f:13.0 0880: 8086:2fa8 (rev 02) Subsystem: 8086:2fa8 Flags: fast devsel 7f:13.1 0880: 8086:2f71 (rev 02) Subsystem: 8086:2f71 Flags: fast devsel 7f:13.2 0880: 8086:2faa (rev 02) Subsystem: 8086:2faa Flags: fast devsel 7f:13.3 0880: 8086:2fab (rev 02) Subsystem: 8086:2fab Flags: fast devsel 7f:13.4 0880: 8086:2fac (rev 02) Subsystem: 8086:2fac Flags: fast devsel 7f:13.5 0880: 
8086:2fad (rev 02) Subsystem: 8086:2fad Flags: fast devsel 7f:13.6 0880: 8086:2fae (rev 02) Flags: fast devsel 7f:13.7 0880: 8086:2faf (rev 02) Flags: fast devsel 7f:14.0 0880: 8086:2fb0 (rev 02) Subsystem: 8086:2fb0 Flags: fast devsel Kernel driver in use: hswep_uncore 7f:14.1 0880: 8086:2fb1 (rev 02) Subsystem: 8086:2fb1 Flags: fast devsel Kernel driver in use: hswep_uncore 7f:14.2 0880: 8086:2fb2 (rev 02) Subsystem: 8086:2fb2 Flags: fast devsel 7f:14.3 0880: 8086:2fb3 (rev 02) Subsystem: 8086:2fb3 Flags: fast devsel 7f:14.4 0880: 8086:2fbc (rev 02) Flags: fast devsel 7f:14.5 0880: 8086:2fbd (rev 02) Flags: fast devsel 7f:14.6 0880: 8086:2fbe (rev 02) Flags: fast devsel 7f:14.7 0880: 8086:2fbf (rev 02) Flags: fast devsel 7f:15.0 0880: 8086:2fb4 (rev 02) Subsystem: 8086:2fb4 Flags: fast devsel Kernel driver in use: hswep_uncore 7f:15.1 0880: 8086:2fb5 (rev 02) Subsystem: 8086:2fb5 Flags: fast devsel Kernel driver in use: hswep_uncore 7f:15.2 0880: 8086:2fb6 (rev 02) Subsystem: 8086:2fb6 Flags: fast devsel 7f:15.3 0880: 8086:2fb7 (rev 02) Subsystem: 8086:2fb7 Flags: fast devsel 7f:16.0 0880: 8086:2f68 (rev 02) Subsystem: 8086:2f68 Flags: fast devsel 7f:16.1 0880: 8086:2f79 (rev 02) Subsystem: 8086:2f79 Flags: fast devsel 7f:16.2 0880: 8086:2f6a (rev 02) Subsystem: 8086:2f6a Flags: fast devsel 7f:16.3 0880: 8086:2f6b (rev 02) Subsystem: 8086:2f6b Flags: fast devsel 7f:16.4 0880: 8086:2f6c (rev 02) Subsystem: 8086:2f6c Flags: fast devsel 7f:16.5 0880: 8086:2f6d (rev 02) Subsystem: 8086:2f6d Flags: fast devsel 7f:16.6 0880: 8086:2f6e (rev 02) Flags: fast devsel 7f:16.7 0880: 8086:2f6f (rev 02) Flags: fast devsel 7f:17.0 0880: 8086:2fd0 (rev 02) Subsystem: 8086:2fd0 Flags: fast devsel Kernel driver in use: hswep_uncore 7f:17.1 0880: 8086:2fd1 (rev 02) Subsystem: 8086:2fd1 Flags: fast devsel Kernel driver in use: hswep_uncore 7f:17.2 0880: 8086:2fd2 (rev 02) Subsystem: 8086:2fd2 Flags: fast devsel 7f:17.3 0880: 8086:2fd3 (rev 02) Subsystem: 8086:2fd3 Flags: fast devsel 
7f:17.4 0880: 8086:2fb8 (rev 02) Flags: fast devsel 7f:17.5 0880: 8086:2fb9 (rev 02) Flags: fast devsel 7f:17.6 0880: 8086:2fba (rev 02) Flags: fast devsel 7f:17.7 0880: 8086:2fbb (rev 02) Flags: fast devsel 7f:18.0 0880: 8086:2fd4 (rev 02) Subsystem: 8086:2fd4 Flags: fast devsel Kernel driver in use: hswep_uncore 7f:18.1 0880: 8086:2fd5 (rev 02) Subsystem: 8086:2fd5 Flags: fast devsel Kernel driver in use: hswep_uncore 7f:18.2 0880: 8086:2fd6 (rev 02) Subsystem: 8086:2fd6 Flags: fast devsel 7f:18.3 0880: 8086:2fd7 (rev 02) Subsystem: 8086:2fd7 Flags: fast devsel 7f:1e.0 0880: 8086:2f98 (rev 02) Subsystem: 8086:2f98 Flags: fast devsel 7f:1e.1 0880: 8086:2f99 (rev 02) Subsystem: 8086:2f99 Flags: fast devsel 7f:1e.2 0880: 8086:2f9a (rev 02) Subsystem: 8086:2f9a Flags: fast devsel 7f:1e.3 0880: 8086:2fc0 (rev 02) Subsystem: 8086:2fc0 Flags: fast devsel I/O ports at <ignored> [disabled] Kernel driver in use: hswep_uncore 7f:1e.4 0880: 8086:2f9c (rev 02) Subsystem: 8086:2f9c Flags: fast devsel 7f:1f.0 0880: 8086:2f88 (rev 02) Flags: fast devsel 7f:1f.2 0880: 8086:2f8a (rev 02) Flags: fast devsel 80:01.0 0604: 8086:2f02 (rev 02) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=80, secondary=81, subordinate=81, sec-latency=0 Capabilities: [40] Subsystem: 8086:0000 Capabilities: [60] MSI: Enable+ Count=1/2 Maskable+ 64bit- Capabilities: [90] Express Root Port (Slot+), MSI 00 Capabilities: [e0] Power Management version 3 Capabilities: [100] Vendor Specific Information: ID=0002 Rev=0 Len=00c <?> Capabilities: [110] Access Control Services Capabilities: [148] Advanced Error Reporting Capabilities: [1d0] Vendor Specific Information: ID=0003 Rev=1 Len=00a <?> Capabilities: [250] #19 Capabilities: [280] Vendor Specific Information: ID=0005 Rev=3 Len=018 <?> Capabilities: [300] Vendor Specific Information: ID=0008 Rev=0 Len=038 <?> Kernel driver in use: pcieport 80:02.0 0604: 8086:2f04 (rev 02) (prog-if 00 [Normal decode]) Flags: bus master, 
fast devsel, latency 0 Bus: primary=80, secondary=82, subordinate=82, sec-latency=0 Memory behind bridge: c8100000-c90fffff Prefetchable memory behind bridge: 0000037ffc000000-0000037fffffffff Capabilities: [40] Subsystem: 8086:0000 Capabilities: [60] MSI: Enable+ Count=1/2 Maskable+ 64bit- Capabilities: [90] Express Root Port (Slot+), MSI 00 Capabilities: [e0] Power Management version 3 Capabilities: [100] Vendor Specific Information: ID=0002 Rev=0 Len=00c <?> Capabilities: [110] Access Control Services Capabilities: [148] Advanced Error Reporting Capabilities: [1d0] Vendor Specific Information: ID=0003 Rev=1 Len=00a <?> Capabilities: [250] #19 Capabilities: [280] Vendor Specific Information: ID=0005 Rev=3 Len=018 <?> Capabilities: [300] Vendor Specific Information: ID=0008 Rev=0 Len=038 <?> Kernel driver in use: pcieport 80:03.0 0604: 8086:2f08 (rev 02) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=80, secondary=83, subordinate=83, sec-latency=0 Capabilities: [40] Subsystem: 8086:0000 Capabilities: [60] MSI: Enable+ Count=1/2 Maskable+ 64bit- Capabilities: [90] Express Root Port (Slot+), MSI 00 Capabilities: [e0] Power Management version 3 Capabilities: [100] Vendor Specific Information: ID=0002 Rev=0 Len=00c <?> Capabilities: [110] Access Control Services Capabilities: [148] Advanced Error Reporting Capabilities: [1d0] Vendor Specific Information: ID=0003 Rev=1 Len=00a <?> Capabilities: [250] #19 Capabilities: [280] Vendor Specific Information: ID=0005 Rev=3 Len=018 <?> Capabilities: [300] Vendor Specific Information: ID=0008 Rev=0 Len=038 <?> Kernel driver in use: pcieport 80:03.2 0604: 8086:2f0a (rev 02) (prog-if 00 [Normal decode]) Flags: bus master, fast devsel, latency 0 Bus: primary=80, secondary=84, subordinate=84, sec-latency=0 Capabilities: [40] Subsystem: 8086:0000 Capabilities: [60] MSI: Enable+ Count=1/2 Maskable+ 64bit- Capabilities: [90] Express Root Port (Slot+), MSI 00 Capabilities: [e0] Power Management 
version 3 Capabilities: [100] Vendor Specific Information: ID=0002 Rev=0 Len=00c <?> Capabilities: [110] Access Control Services Capabilities: [148] Advanced Error Reporting Capabilities: [1d0] Vendor Specific Information: ID=0003 Rev=1 Len=00a <?> Capabilities: [250] #19 Capabilities: [280] Vendor Specific Information: ID=0005 Rev=3 Len=018 <?> Capabilities: [300] Vendor Specific Information: ID=0008 Rev=0 Len=038 <?> Kernel driver in use: pcieport 80:05.0 0880: 8086:2f28 (rev 02) Flags: fast devsel Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00 80:05.1 0880: 8086:2f29 (rev 02) Flags: fast devsel Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00 Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit+ Capabilities: [100] Vendor Specific Information: ID=0006 Rev=1 Len=010 <?> Capabilities: [110] Vendor Specific Information: ID=0006 Rev=1 Len=010 <?> Capabilities: [120] Vendor Specific Information: ID=0006 Rev=1 Len=010 <?> Capabilities: [130] Vendor Specific Information: ID=0006 Rev=1 Len=010 <?> 80:05.2 0880: 8086:2f2a (rev 02) Flags: fast devsel Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00 80:05.4 0800: 8086:2f2c (rev 02) (prog-if 20 [IO(X)-APIC]) Subsystem: 8086:0000 Flags: bus master, fast devsel, latency 0 Memory at c8000000 (32-bit, non-prefetchable) [size=4K] Capabilities: [44] Express Root Complex Integrated Endpoint, MSI 00 Capabilities: [e0] Power Management version 3 80:05.6 1101: 8086:2f39 (rev 02) Flags: fast devsel Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00 Kernel driver in use: hswep_uncore 80:06.0 0880: 8086:2f10 (rev 02) Subsystem: 8086:0000 Flags: fast devsel Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00 Capabilities: [100] Vendor Specific Information: ID=0001 Rev=0 Len=0b8 <?> 80:06.1 0880: 8086:2f11 (rev 02) Subsystem: 8086:0000 Flags: fast devsel Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00 Capabilities: [100] 
Vendor Specific Information: ID=0001 Rev=0 Len=0b8 <?> 80:06.2 0880: 8086:2f12 (rev 02) Subsystem: 8086:0000 Flags: fast devsel Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00 80:06.3 0880: 8086:2f13 (rev 02) Subsystem: 8086:0000 Flags: fast devsel Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00 Capabilities: [100] Vendor Specific Information: ID=0001 Rev=0 Len=0b8 <?> 80:06.4 0880: 8086:2f14 (rev 02) Subsystem: 8086:0000 Flags: fast devsel Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00 80:06.5 0880: 8086:2f15 (rev 02) Subsystem: 8086:0000 Flags: fast devsel Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00 80:06.6 0880: 8086:2f16 (rev 02) Subsystem: 8086:0000 Flags: fast devsel Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00 80:06.7 0880: 8086:2f17 (rev 02) Subsystem: 8086:0000 Flags: fast devsel Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00 80:07.0 0880: 8086:2f18 (rev 02) Subsystem: 8086:0000 Flags: fast devsel Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00 Capabilities: [100] Vendor Specific Information: ID=0001 Rev=0 Len=0b8 <?> 80:07.1 0880: 8086:2f19 (rev 02) Subsystem: 8086:0000 Flags: fast devsel Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00 80:07.2 0880: 8086:2f1a (rev 02) Subsystem: 8086:0000 Flags: fast devsel Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00 80:07.3 0880: 8086:2f1b (rev 02) Subsystem: 8086:0000 Flags: fast devsel Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00 80:07.4 0880: 8086:2f1c (rev 02) Subsystem: 8086:0000 Flags: fast devsel Capabilities: [40] Express Root Complex Integrated Endpoint, MSI 00 82:00.0 0200: 15b3:101d Subsystem: 15b3:0043 Flags: bus master, fast devsel, latency 0, IRQ 216 Memory at 37ffe000000 (64-bit, prefetchable) [size=32M] Capabilities: [60] Express Endpoint, MSI 00 Capabilities: [48] Vital Product Data 
Capabilities: [9c] MSI-X: Enable+ Count=64 Masked- Capabilities: [c0] Vendor Specific Information: Len=18 <?> Capabilities: [40] Power Management version 3 Capabilities: [100] Advanced Error Reporting Capabilities: [150] Alternative Routing-ID Interpretation (ARI) Capabilities: [180] Single Root I/O Virtualization (SR-IOV) Capabilities: [1c0] #19 Capabilities: [230] Access Control Services Capabilities: [320] #27 Capabilities: [370] #26 Capabilities: [420] #25 Kernel driver in use: mlx5_core 82:00.1 0200: 15b3:101d Subsystem: 15b3:0043 Flags: bus master, fast devsel, latency 0, IRQ 258 Memory at 37ffc000000 (64-bit, prefetchable) [size=32M] Capabilities: [60] Express Endpoint, MSI 00 Capabilities: [48] Vital Product Data Capabilities: [9c] MSI-X: Enable+ Count=64 Masked- Capabilities: [c0] Vendor Specific Information: Len=18 <?> Capabilities: [40] Power Management version 3 Capabilities: [100] Advanced Error Reporting Capabilities: [150] Alternative Routing-ID Interpretation (ARI) Capabilities: [180] Single Root I/O Virtualization (SR-IOV) Capabilities: [230] Access Control Services Capabilities: [420] #25 Kernel driver in use: mlx5_core ff:08.0 0880: 8086:2f80 (rev 02) Subsystem: 8086:2f80 Flags: fast devsel ff:08.2 1101: 8086:2f32 (rev 02) Subsystem: 8086:2f32 Flags: fast devsel Kernel driver in use: hswep_uncore ff:08.3 0880: 8086:2f83 (rev 02) Subsystem: 8086:2f83 Flags: fast devsel ff:08.5 0880: 8086:2f85 (rev 02) Subsystem: 8086:2f85 Flags: fast devsel ff:08.6 0880: 8086:2f86 (rev 02) Subsystem: 8086:2f86 Flags: fast devsel Kernel driver in use: hswep_uncore ff:08.7 0880: 8086:2f87 (rev 02) Subsystem: 8086:2f87 Flags: fast devsel ff:09.0 0880: 8086:2f90 (rev 02) Subsystem: 8086:2f90 Flags: fast devsel ff:09.2 1101: 8086:2f33 (rev 02) Subsystem: 8086:2f33 Flags: fast devsel Kernel driver in use: hswep_uncore ff:09.3 0880: 8086:2f93 (rev 02) Subsystem: 8086:2f93 Flags: fast devsel ff:09.5 0880: 8086:2f95 (rev 02) Subsystem: 8086:2f95 Flags: fast devsel ff:09.6 
0880: 8086:2f96 (rev 02) Subsystem: 8086:2f96 Flags: fast devsel Kernel driver in use: hswep_uncore ff:0b.0 0880: 8086:2f81 (rev 02) Subsystem: 8086:2f81 Flags: fast devsel ff:0b.1 1101: 8086:2f36 (rev 02) Subsystem: 8086:2f36 Flags: fast devsel Kernel driver in use: hswep_uncore ff:0b.2 1101: 8086:2f37 (rev 02) Subsystem: 8086:2f37 Flags: fast devsel Kernel driver in use: hswep_uncore ff:0b.4 0880: 8086:2f41 (rev 02) Subsystem: 8086:2f41 Flags: fast devsel ff:0b.5 1101: 8086:2f3e (rev 02) Subsystem: 8086:2f3e Flags: fast devsel Kernel driver in use: hswep_uncore ff:0b.6 1101: 8086:2f3f (rev 02) Subsystem: 8086:2f3f Flags: fast devsel ff:0c.0 0880: 8086:2fe0 (rev 02) Subsystem: 8086:2fe0 Flags: fast devsel ff:0c.1 0880: 8086:2fe1 (rev 02) Subsystem: 8086:2fe1 Flags: fast devsel ff:0c.2 0880: 8086:2fe2 (rev 02) Subsystem: 8086:2fe2 Flags: fast devsel ff:0c.3 0880: 8086:2fe3 (rev 02) Subsystem: 8086:2fe3 Flags: fast devsel ff:0c.4 0880: 8086:2fe4 (rev 02) Subsystem: 8086:2fe4 Flags: fast devsel ff:0c.5 0880: 8086:2fe5 (rev 02) Subsystem: 8086:2fe5 Flags: fast devsel ff:0c.6 0880: 8086:2fe6 (rev 02) Subsystem: 8086:2fe6 Flags: fast devsel ff:0c.7 0880: 8086:2fe7 (rev 02) Subsystem: 8086:2fe7 Flags: fast devsel ff:0d.0 0880: 8086:2fe8 (rev 02) Subsystem: 8086:2fe8 Flags: fast devsel ff:0d.1 0880: 8086:2fe9 (rev 02) Subsystem: 8086:2fe9 Flags: fast devsel ff:0f.0 0880: 8086:2ff8 (rev 02) Flags: fast devsel ff:0f.1 0880: 8086:2ff9 (rev 02) Flags: fast devsel ff:0f.2 0880: 8086:2ffa (rev 02) Flags: fast devsel ff:0f.3 0880: 8086:2ffb (rev 02) Flags: fast devsel ff:0f.4 0880: 8086:2ffc (rev 02) Subsystem: 8086:2fe0 Flags: fast devsel ff:0f.5 0880: 8086:2ffd (rev 02) Subsystem: 8086:2fe0 Flags: fast devsel ff:0f.6 0880: 8086:2ffe (rev 02) Subsystem: 8086:2fe0 Flags: fast devsel ff:10.0 0880: 8086:2f1d (rev 02) Subsystem: 8086:2f1d Flags: fast devsel ff:10.1 1101: 8086:2f34 (rev 02) Subsystem: 8086:2f34 Flags: fast devsel Kernel driver in use: hswep_uncore ff:10.5 0880: 
8086:2f1e (rev 02) Subsystem: 8086:2f1e Flags: fast devsel ff:10.6 1101: 8086:2f7d (rev 02) Subsystem: 8086:2f7d Flags: fast devsel ff:10.7 0880: 8086:2f1f (rev 02) Subsystem: 8086:2f1f Flags: fast devsel ff:12.0 0880: 8086:2fa0 (rev 02) Subsystem: 8086:2fa0 Flags: fast devsel ff:12.1 1101: 8086:2f30 (rev 02) Subsystem: 8086:2f30 Flags: fast devsel Kernel driver in use: hswep_uncore ff:12.2 0880: 8086:2f70 (rev 02) Subsystem: 8086:2f70 Flags: fast devsel ff:12.4 0880: 8086:2f60 (rev 02) Subsystem: 8086:2f60 Flags: fast devsel ff:12.5 1101: 8086:2f38 (rev 02) Subsystem: 8086:2f38 Flags: fast devsel Kernel driver in use: hswep_uncore ff:12.6 0880: 8086:2f78 (rev 02) Subsystem: 8086:2f78 Flags: fast devsel ff:13.0 0880: 8086:2fa8 (rev 02) Subsystem: 8086:2fa8 Flags: fast devsel ff:13.1 0880: 8086:2f71 (rev 02) Subsystem: 8086:2f71 Flags: fast devsel ff:13.2 0880: 8086:2faa (rev 02) Subsystem: 8086:2faa Flags: fast devsel ff:13.3 0880: 8086:2fab (rev 02) Subsystem: 8086:2fab Flags: fast devsel ff:13.4 0880: 8086:2fac (rev 02) Subsystem: 8086:2fac Flags: fast devsel ff:13.5 0880: 8086:2fad (rev 02) Subsystem: 8086:2fad Flags: fast devsel ff:13.6 0880: 8086:2fae (rev 02) Flags: fast devsel ff:13.7 0880: 8086:2faf (rev 02) Flags: fast devsel ff:14.0 0880: 8086:2fb0 (rev 02) Subsystem: 8086:2fb0 Flags: fast devsel Kernel driver in use: hswep_uncore ff:14.1 0880: 8086:2fb1 (rev 02) Subsystem: 8086:2fb1 Flags: fast devsel Kernel driver in use: hswep_uncore ff:14.2 0880: 8086:2fb2 (rev 02) Subsystem: 8086:2fb2 Flags: fast devsel ff:14.3 0880: 8086:2fb3 (rev 02) Subsystem: 8086:2fb3 Flags: fast devsel ff:14.4 0880: 8086:2fbc (rev 02) Flags: fast devsel ff:14.5 0880: 8086:2fbd (rev 02) Flags: fast devsel ff:14.6 0880: 8086:2fbe (rev 02) Flags: fast devsel ff:14.7 0880: 8086:2fbf (rev 02) Flags: fast devsel ff:15.0 0880: 8086:2fb4 (rev 02) Subsystem: 8086:2fb4 Flags: fast devsel Kernel driver in use: hswep_uncore ff:15.1 0880: 8086:2fb5 (rev 02) Subsystem: 8086:2fb5 Flags: fast 
devsel Kernel driver in use: hswep_uncore ff:15.2 0880: 8086:2fb6 (rev 02) Subsystem: 8086:2fb6 Flags: fast devsel ff:15.3 0880: 8086:2fb7 (rev 02) Subsystem: 8086:2fb7 Flags: fast devsel ff:16.0 0880: 8086:2f68 (rev 02) Subsystem: 8086:2f68 Flags: fast devsel ff:16.1 0880: 8086:2f79 (rev 02) Subsystem: 8086:2f79 Flags: fast devsel ff:16.2 0880: 8086:2f6a (rev 02) Subsystem: 8086:2f6a Flags: fast devsel ff:16.3 0880: 8086:2f6b (rev 02) Subsystem: 8086:2f6b Flags: fast devsel ff:16.4 0880: 8086:2f6c (rev 02) Subsystem: 8086:2f6c Flags: fast devsel ff:16.5 0880: 8086:2f6d (rev 02) Subsystem: 8086:2f6d Flags: fast devsel ff:16.6 0880: 8086:2f6e (rev 02) Flags: fast devsel ff:16.7 0880: 8086:2f6f (rev 02) Flags: fast devsel ff:17.0 0880: 8086:2fd0 (rev 02) Subsystem: 8086:2fd0 Flags: fast devsel Kernel driver in use: hswep_uncore ff:17.1 0880: 8086:2fd1 (rev 02) Subsystem: 8086:2fd1 Flags: fast devsel Kernel driver in use: hswep_uncore ff:17.2 0880: 8086:2fd2 (rev 02) Subsystem: 8086:2fd2 Flags: fast devsel ff:17.3 0880: 8086:2fd3 (rev 02) Subsystem: 8086:2fd3 Flags: fast devsel ff:17.4 0880: 8086:2fb8 (rev 02) Flags: fast devsel ff:17.5 0880: 8086:2fb9 (rev 02) Flags: fast devsel ff:17.6 0880: 8086:2fba (rev 02) Flags: fast devsel ff:17.7 0880: 8086:2fbb (rev 02) Flags: fast devsel ff:18.0 0880: 8086:2fd4 (rev 02) Subsystem: 8086:2fd4 Flags: fast devsel Kernel driver in use: hswep_uncore ff:18.1 0880: 8086:2fd5 (rev 02) Subsystem: 8086:2fd5 Flags: fast devsel Kernel driver in use: hswep_uncore ff:18.2 0880: 8086:2fd6 (rev 02) Subsystem: 8086:2fd6 Flags: fast devsel ff:18.3 0880: 8086:2fd7 (rev 02) Subsystem: 8086:2fd7 Flags: fast devsel ff:1e.0 0880: 8086:2f98 (rev 02) Subsystem: 8086:2f98 Flags: fast devsel ff:1e.1 0880: 8086:2f99 (rev 02) Subsystem: 8086:2f99 Flags: fast devsel ff:1e.2 0880: 8086:2f9a (rev 02) Subsystem: 8086:2f9a Flags: fast devsel ff:1e.3 0880: 8086:2fc0 (rev 02) Subsystem: 8086:2fc0 Flags: fast devsel I/O ports at <ignored> [disabled] Kernel 
driver in use: hswep_uncore ff:1e.4 0880: 8086:2f9c (rev 02) Subsystem: 8086:2f9c Flags: fast devsel ff:1f.0 0880: 8086:2f88 (rev 02) Flags: fast devsel ff:1f.2 0880: 8086:2f8a (rev 02) Flags: fast devsel
end of thread, other threads:[~2020-07-23 21:03 UTC | newest]

Thread overview: 42+ messages
2020-06-23 19:52 [pull request][net-next 00/10] mlx5 updates 2020-06-23 Saeed Mahameed
2020-06-23 19:52 ` [net-next 01/10] net/mlx5: Avoid eswitch header inclusion in fs core layer Saeed Mahameed
2020-06-23 21:00 ` Jakub Kicinski
2020-06-23 19:52 ` [net-next 02/10] net/mlx5: FWTrace: Add missing space Saeed Mahameed
2020-06-23 19:52 ` [net-next 03/10] net/mlx5: Add a missing macro undefinition Saeed Mahameed
2020-06-23 19:52 ` [net-next 04/10] net/mlx5: Use kfree(ft->g) in arfs_create_groups() Saeed Mahameed
2020-06-23 19:52 ` [net-next 05/10] net/mlx5e: Remove unused mlx5e_xsk_first_unused_channel Saeed Mahameed
2020-06-23 19:52 ` [net-next 06/10] net/mlx5e: Move including net/arp.h from en_rep.c to rep/neigh.c Saeed Mahameed
2020-06-23 21:02 ` Jakub Kicinski
2020-06-23 19:52 ` [net-next 07/10] net/mlx5e: Move TC-specific function definitions into MLX5_CLS_ACT Saeed Mahameed
2020-06-23 21:03 ` Jakub Kicinski
2020-06-23 21:26 ` Saeed Mahameed
2020-06-23 21:33 ` Jakub Kicinski
2020-06-23 19:52 ` [net-next 08/10] net/mlx5e: vxlan: Use RCU for vxlan table lookup Saeed Mahameed
2020-06-23 19:52 ` [net-next 09/10] net/mlx5e: vxlan: Return bool instead of opaque ptr in port_lookup() Saeed Mahameed
2020-06-23 19:52 ` [net-next 10/10] net/mlx5e: Add support for PCI relaxed ordering Saeed Mahameed
2020-06-23 21:31 ` Jakub Kicinski
2020-06-24 6:56 ` Saeed Mahameed
2020-06-24 7:34 ` Aya Levin
2020-06-24 17:22 ` Jakub Kicinski
2020-06-24 20:15 ` Saeed Mahameed
[not found] ` <20200624133018.5a4d238b@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>
2020-07-06 13:00 ` Aya Levin
2020-07-06 16:52 ` Jakub Kicinski
2020-07-06 19:49 ` David Miller
2020-07-08 8:22 ` Aya Levin
2020-07-08 23:16 ` Bjorn Helgaas
2020-07-08 23:26 ` Jason Gunthorpe
2020-07-09 17:35 ` Jonathan Lemon
2020-07-09 18:20 ` Jason Gunthorpe
2020-07-09 19:47 ` Jakub Kicinski
2020-07-10 2:18 ` Saeed Mahameed
2020-07-10 12:21 ` Jason Gunthorpe
2020-07-09 20:33 ` Jonathan Lemon
2020-07-14 10:47 ` Aya Levin
2020-07-23 21:03 ` Alexander Duyck
2020-06-26 20:12 ` Bjorn Helgaas
2020-06-26 20:24 ` David Miller
2020-06-29 9:32 ` Aya Levin
2020-06-29 19:33 ` Bjorn Helgaas
2020-06-29 19:57 ` Raj, Ashok
2020-06-30 7:32 ` Ding Tianhong
2020-07-05 11:15 ` Aya Levin