* [PATCH mlx5-next v1 1/3] net/mlx5: Expose optimal performance scatter entries capability
2019-10-07 11:58 [PATCH rdma-next v1 0/3] Optimize SGL registration Leon Romanovsky
@ 2019-10-07 11:58 ` Leon Romanovsky
2019-10-07 11:58 ` [PATCH rdma-next v1 2/3] RDMA/rw: Support threshold for registration vs scattering to local pages Leon Romanovsky
2019-10-07 11:58 ` [PATCH rdma-next v1 3/3] RDMA/mlx5: Add capability for max sge to get optimized performance Leon Romanovsky
2 siblings, 0 replies; 8+ messages in thread
From: Leon Romanovsky @ 2019-10-07 11:58 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe, Christoph Hellwig
Cc: Leon Romanovsky, RDMA mailing list, Or Gerlitz, Yamin Friedman,
Saeed Mahameed, linux-netdev
From: Yamin Friedman <yaminf@mellanox.com>
Expose maximum scatter entries per RDMA READ for optimal performance.
Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
include/linux/mlx5/mlx5_ifc.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 138c50d5a353..c0bfb1d90dd2 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1153,7 +1153,7 @@ struct mlx5_ifc_cmd_hca_cap_bits {
u8 log_max_srq[0x5];
u8 reserved_at_b0[0x10];
- u8 reserved_at_c0[0x8];
+ u8 max_sgl_for_optimized_performance[0x8];
u8 log_max_cq_sz[0x8];
u8 reserved_at_d0[0xb];
u8 log_max_cq[0x5];
--
2.20.1
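As an illustration of what the renamed field means at the byte level: in the mlx5_ifc structures the bracketed number is a field width in bits, and the old name reserved_at_c0 records the bit offset 0xc0, with fields packed into big-endian 32-bit words. The sketch below is a userspace model, not the driver code (the kernel reads this capability through MLX5_CAP_GEN()). The helper name ifc_get_field is hypothetical, and the sketch assumes a field never straddles a 32-bit word.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): read a `width`-bit field that
 * starts `bit_off` bits into a buffer of big-endian 32-bit words.
 * Bit offset 0 is the most significant bit of the first word.
 * Assumes 1 <= width <= 32 and that the field stays inside one word.
 */
static uint32_t ifc_get_field(const uint8_t *buf, unsigned int bit_off,
			      unsigned int width)
{
	/* Locate the 32-bit word holding the field and load it big-endian. */
	const uint8_t *p = buf + (size_t)(bit_off / 32) * 4;
	uint32_t word = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
			((uint32_t)p[2] << 8) | (uint32_t)p[3];
	/* Fields are counted from the most significant bit of the word. */
	unsigned int shift = 32 - width - (bit_off % 32);
	uint32_t mask = (width == 32) ? 0xffffffffu : ((1u << width) - 1u);

	return (word >> shift) & mask;
}
```

Under this layout, max_sgl_for_optimized_performance (offset 0xc0, width 8) is byte 24 of the capability buffer, and log_max_cq_sz follows in byte 25.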
* [PATCH rdma-next v1 2/3] RDMA/rw: Support threshold for registration vs scattering to local pages
2019-10-07 11:58 [PATCH rdma-next v1 0/3] Optimize SGL registration Leon Romanovsky
2019-10-07 11:58 ` [PATCH mlx5-next v1 1/3] net/mlx5: Expose optimal performance scatter entries capability Leon Romanovsky
@ 2019-10-07 11:58 ` Leon Romanovsky
2019-10-07 12:12 ` Christoph Hellwig
2019-10-07 11:58 ` [PATCH rdma-next v1 3/3] RDMA/mlx5: Add capability for max sge to get optimized performance Leon Romanovsky
2 siblings, 1 reply; 8+ messages in thread
From: Leon Romanovsky @ 2019-10-07 11:58 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe, Christoph Hellwig
Cc: Leon Romanovsky, RDMA mailing list, Or Gerlitz, Yamin Friedman,
Saeed Mahameed, linux-netdev
From: Yamin Friedman <yaminf@mellanox.com>
If there are more scatter entries than the recommended limit provided by
the ib device, UMR registration is used. This will provide optimal
performance when performing large RDMA READs over devices that advertise
the threshold capability.
With ConnectX-5 running NVMeoF RDMA with FIO single QP 128KB writes:
Without use of cap: 70Gb/sec
With use of cap: 84Gb/sec
Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/infiniband/core/rw.c | 14 ++++++++------
include/rdma/ib_verbs.h | 2 ++
2 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
index 5337393d4dfe..8739bd28232b 100644
--- a/drivers/infiniband/core/rw.c
+++ b/drivers/infiniband/core/rw.c
@@ -20,14 +20,16 @@ module_param_named(force_mr, rdma_rw_force_mr, bool, 0);
MODULE_PARM_DESC(force_mr, "Force usage of MRs for RDMA READ/WRITE operations");
/*
- * Check if the device might use memory registration. This is currently only
- * true for iWarp devices. In the future we can hopefully fine tune this based
- * on HCA driver input.
+ * Check if the device might use memory registration. This is currently
+ * true for iWarp devices and devices that have optimized SGL registration
+ * logic.
*/
static inline bool rdma_rw_can_use_mr(struct ib_device *dev, u8 port_num)
{
if (rdma_protocol_iwarp(dev, port_num))
return true;
+ if (dev->attrs.max_sgl_rd)
+ return true;
if (unlikely(rdma_rw_force_mr))
return true;
return false;
@@ -37,15 +39,15 @@ static inline bool rdma_rw_can_use_mr(struct ib_device *dev, u8 port_num)
* Check if the device will use memory registration for this RW operation.
* We currently always use memory registrations for iWarp RDMA READs, and
* have a debug option to force usage of MRs.
- *
- * XXX: In the future we can hopefully fine tune this based on HCA driver
- * input.
*/
static inline bool rdma_rw_io_needs_mr(struct ib_device *dev, u8 port_num,
enum dma_data_direction dir, int dma_nents)
{
if (rdma_protocol_iwarp(dev, port_num) && dir == DMA_FROM_DEVICE)
return true;
+ if (dev->attrs.max_sgl_rd && dir == DMA_FROM_DEVICE &&
+ dma_nents > dev->attrs.max_sgl_rd)
+ return true;
if (unlikely(rdma_rw_force_mr))
return true;
return false;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 4f671378dbfc..60fd98a9b7e8 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -445,6 +445,8 @@ struct ib_device_attr {
struct ib_tm_caps tm_caps;
struct ib_cq_caps cq_caps;
u64 max_dm_size;
+ /* Max entries for sgl for optimized performance per READ */
+ u32 max_sgl_rd;
};
enum ib_mtu {
--
2.20.1
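The threshold logic in this patch can be modelled outside the kernel as follows. This is a minimal sketch under stated assumptions: struct model_dev, enum model_dir and model_io_needs_mr() are hypothetical stand-ins for the kernel definitions, and the threshold of 16 used in the note afterwards is an invented value, not what any real device reports.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types used by rdma_rw_io_needs_mr(). */
enum model_dir { MODEL_TO_DEVICE, MODEL_FROM_DEVICE };

struct model_dev {
	bool is_iwarp;		/* models rdma_protocol_iwarp() */
	uint32_t max_sgl_rd;	/* 0 means no threshold is advertised */
	bool force_mr;		/* models the force_mr module parameter */
};

/* Mirrors the checks in the patch as posted, in the same order. */
static bool model_io_needs_mr(const struct model_dev *dev,
			      enum model_dir dir, uint32_t dma_nents)
{
	if (dev->is_iwarp && dir == MODEL_FROM_DEVICE)
		return true;
	/* New check: a READ with more entries than the threshold uses an MR. */
	if (dev->max_sgl_rd && dir == MODEL_FROM_DEVICE &&
	    dma_nents > dev->max_sgl_rd)
		return true;
	if (dev->force_mr)
		return true;
	return false;
}
```

For instance, with an assumed threshold of 16, a READ mapped to 32 scatter entries would go through memory registration, while a READ with 16 or fewer entries, or any WRITE, would not. A device that leaves max_sgl_rd at 0 keeps the previous behaviour.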
* Re: [PATCH rdma-next v1 2/3] RDMA/rw: Support threshold for registration vs scattering to local pages
2019-10-07 11:58 ` [PATCH rdma-next v1 2/3] RDMA/rw: Support threshold for registration vs scattering to local pages Leon Romanovsky
@ 2019-10-07 12:12 ` Christoph Hellwig
2019-10-07 12:36 ` Leon Romanovsky
0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2019-10-07 12:12 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Doug Ledford, Jason Gunthorpe, Christoph Hellwig,
Leon Romanovsky, RDMA mailing list, Or Gerlitz, Yamin Friedman,
Saeed Mahameed, linux-netdev
Sorry for nitpicking again, but..
On Mon, Oct 07, 2019 at 02:58:18PM +0300, Leon Romanovsky wrote:
> @@ -37,15 +39,15 @@ static inline bool rdma_rw_can_use_mr(struct ib_device *dev, u8 port_num)
> * Check if the device will use memory registration for this RW operation.
> * We currently always use memory registrations for iWarp RDMA READs, and
> * have a debug option to force usage of MRs.
> - *
> - * XXX: In the future we can hopefully fine tune this based on HCA driver
> - * input.
The above comment needs an update, a la:
* Check if the device will use memory registration for this RW operation.
* For RDMA READs we must use MRs on iWarp and can optionally use them as an
* optimization otherwise. Additionally we have a debug option to force usage
* of MRs to help testing this code path.
> if (rdma_protocol_iwarp(dev, port_num) && dir == DMA_FROM_DEVICE)
> return true;
> + if (dev->attrs.max_sgl_rd && dir == DMA_FROM_DEVICE &&
> + dma_nents > dev->attrs.max_sgl_rd)
> + return true;
This can be simplified to:
	if (dir == DMA_FROM_DEVICE &&
	    (rdma_protocol_iwarp(dev, port_num) ||
	     (dev->attrs.max_sgl_rd && dma_nents > dev->attrs.max_sgl_rd)))
		return true;
* Re: [PATCH rdma-next v1 2/3] RDMA/rw: Support threshold for registration vs scattering to local pages
2019-10-07 12:12 ` Christoph Hellwig
@ 2019-10-07 12:36 ` Leon Romanovsky
2019-10-07 12:48 ` Christoph Hellwig
0 siblings, 1 reply; 8+ messages in thread
From: Leon Romanovsky @ 2019-10-07 12:36 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Doug Ledford, Jason Gunthorpe, RDMA mailing list, Or Gerlitz,
Yamin Friedman, Saeed Mahameed, linux-netdev
On Mon, Oct 07, 2019 at 05:12:44AM -0700, Christoph Hellwig wrote:
> Sorry for nitpicking again, but..
>
> On Mon, Oct 07, 2019 at 02:58:18PM +0300, Leon Romanovsky wrote:
> > @@ -37,15 +39,15 @@ static inline bool rdma_rw_can_use_mr(struct ib_device *dev, u8 port_num)
> > * Check if the device will use memory registration for this RW operation.
> > * We currently always use memory registrations for iWarp RDMA READs, and
> > * have a debug option to force usage of MRs.
> > - *
> > - * XXX: In the future we can hopefully fine tune this based on HCA driver
> > - * input.
>
> The above comment needs an update, a la:
>
> * Check if the device will use memory registration for this RW operation.
> * For RDMA READs we must use MRs on iWarp and can optionally use them as an
> * optimization otherwise. Additionally we have a debug option to force usage
> * of MRs to help testing this code path.
>
>
> > if (rdma_protocol_iwarp(dev, port_num) && dir == DMA_FROM_DEVICE)
> > return true;
> > + if (dev->attrs.max_sgl_rd && dir == DMA_FROM_DEVICE &&
> > + dma_nents > dev->attrs.max_sgl_rd)
> > + return true;
>
> This can be simplified to:
>
> if (dir == DMA_FROM_DEVICE &&
> (rdma_protocol_iwarp(dev, port_num) ||
> (dev->attrs.max_sgl_rd && dma_nents > dev->attrs.max_sgl_rd)))
> return true;
I don't think that it simplifies anything; I wanted the separate checks to
stay separate. For example, rdma_protocol_iwarp() has nothing to do with
attrs.max_sgl_rd.
I'll fix the comment.
Thanks
* Re: [PATCH rdma-next v1 2/3] RDMA/rw: Support threshold for registration vs scattering to local pages
2019-10-07 12:36 ` Leon Romanovsky
@ 2019-10-07 12:48 ` Christoph Hellwig
2019-10-07 13:17 ` Leon Romanovsky
0 siblings, 1 reply; 8+ messages in thread
From: Christoph Hellwig @ 2019-10-07 12:48 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Christoph Hellwig, Doug Ledford, Jason Gunthorpe,
RDMA mailing list, Or Gerlitz, Yamin Friedman, Saeed Mahameed,
linux-netdev
On Mon, Oct 07, 2019 at 03:36:56PM +0300, Leon Romanovsky wrote:
> > > if (rdma_protocol_iwarp(dev, port_num) && dir == DMA_FROM_DEVICE)
> > > return true;
> > > + if (dev->attrs.max_sgl_rd && dir == DMA_FROM_DEVICE &&
> > > + dma_nents > dev->attrs.max_sgl_rd)
> > > + return true;
> >
> > This can be simplified to:
> >
> > if (dir == DMA_FROM_DEVICE &&
> > (rdma_protocol_iwarp(dev, port_num) ||
> > (dev->attrs.max_sgl_rd && dma_nents > dev->attrs.max_sgl_rd)))
> > return true;
>
> I don't think that it simplifies anything; I wanted the separate checks to
> stay separate. For example, rdma_protocol_iwarp() has nothing to do with
> attrs.max_sgl_rd.
The important bit is to have the DMA_FROM_DEVICE check only once, as
we only do the registration for reads with either parameter. So if
you want it more verbose the way would be:
	if (dir == DMA_FROM_DEVICE) {
		if (rdma_protocol_iwarp(dev, port_num))
			return true;
		if (dev->attrs.max_sgl_rd && dma_nents > dev->attrs.max_sgl_rd)
			return true;
	}
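The three spellings discussed in this thread should be behaviourally identical; they differ only in how often the direction is tested. A small userspace check can confirm that, assuming plain booleans and integers in place of the kernel types. All function names below are hypothetical, and the force_mr check is left out because it is the same in every variant.

```c
#include <stdbool.h>
#include <stdint.h>

/* As posted: the READ direction is tested in both conditions. */
static bool needs_mr_posted(bool iwarp, bool is_read, uint32_t max_sgl_rd,
			    uint32_t nents)
{
	if (iwarp && is_read)
		return true;
	if (max_sgl_rd && is_read && nents > max_sgl_rd)
		return true;
	return false;
}

/* Single combined condition. */
static bool needs_mr_combined(bool iwarp, bool is_read, uint32_t max_sgl_rd,
			      uint32_t nents)
{
	if (is_read && (iwarp || (max_sgl_rd && nents > max_sgl_rd)))
		return true;
	return false;
}

/* Nested form: direction tested once, the two reasons kept apart. */
static bool needs_mr_nested(bool iwarp, bool is_read, uint32_t max_sgl_rd,
			    uint32_t nents)
{
	if (is_read) {
		if (iwarp)
			return true;
		if (max_sgl_rd && nents > max_sgl_rd)
			return true;
	}
	return false;
}

/* Count the input combinations on which the three forms disagree. */
static unsigned int count_mismatches(void)
{
	unsigned int bad = 0;

	for (int iwarp = 0; iwarp <= 1; iwarp++)
		for (int is_read = 0; is_read <= 1; is_read++)
			for (uint32_t max = 0; max <= 4; max++)
				for (uint32_t nents = 0; nents <= 6; nents++) {
					bool a = needs_mr_posted(iwarp, is_read, max, nents);
					bool b = needs_mr_combined(iwarp, is_read, max, nents);
					bool c = needs_mr_nested(iwarp, is_read, max, nents);

					if (a != b || a != c)
						bad++;
				}
	return bad;
}
```

The loop covers thresholds of 0 (capability absent) through 4 and entry counts on both sides of each threshold, which exercises every branch of all three variants.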
* Re: [PATCH rdma-next v1 2/3] RDMA/rw: Support threshold for registration vs scattering to local pages
2019-10-07 12:48 ` Christoph Hellwig
@ 2019-10-07 13:17 ` Leon Romanovsky
0 siblings, 0 replies; 8+ messages in thread
From: Leon Romanovsky @ 2019-10-07 13:17 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Doug Ledford, Jason Gunthorpe, RDMA mailing list, Or Gerlitz,
Yamin Friedman, Saeed Mahameed, linux-netdev
On Mon, Oct 07, 2019 at 05:48:31AM -0700, Christoph Hellwig wrote:
> On Mon, Oct 07, 2019 at 03:36:56PM +0300, Leon Romanovsky wrote:
> > > > if (rdma_protocol_iwarp(dev, port_num) && dir == DMA_FROM_DEVICE)
> > > > return true;
> > > > + if (dev->attrs.max_sgl_rd && dir == DMA_FROM_DEVICE &&
> > > > + dma_nents > dev->attrs.max_sgl_rd)
> > > > + return true;
> > >
> > > This can be simplified to:
> > >
> > > if (dir == DMA_FROM_DEVICE &&
> > > (rdma_protocol_iwarp(dev, port_num) ||
> > > (dev->attrs.max_sgl_rd && dma_nents > dev->attrs.max_sgl_rd)))
> > > return true;
> >
> > I don't think that it simplifies anything; I wanted the separate checks to
> > stay separate. For example, rdma_protocol_iwarp() has nothing to do with
> > attrs.max_sgl_rd.
>
> The important bit is to have the DMA_FROM_DEVICE check only once, as
> we only do the registration for reads with either parameter. So if
> you want it more verbose the way would be:
>
> if (dir == DMA_FROM_DEVICE) {
> if (rdma_protocol_iwarp(dev, port_num))
> return true;
> if (dev->attrs.max_sgl_rd && dma_nents > dev->attrs.max_sgl_rd)
> return true;
> }
I'm doing it now. Thank you for taking the time to explain.
* [PATCH rdma-next v1 3/3] RDMA/mlx5: Add capability for max sge to get optimized performance
2019-10-07 11:58 [PATCH rdma-next v1 0/3] Optimize SGL registration Leon Romanovsky
2019-10-07 11:58 ` [PATCH mlx5-next v1 1/3] net/mlx5: Expose optimal performance scatter entries capability Leon Romanovsky
2019-10-07 11:58 ` [PATCH rdma-next v1 2/3] RDMA/rw: Support threshold for registration vs scattering to local pages Leon Romanovsky
@ 2019-10-07 11:58 ` Leon Romanovsky
2 siblings, 0 replies; 8+ messages in thread
From: Leon Romanovsky @ 2019-10-07 11:58 UTC (permalink / raw)
To: Doug Ledford, Jason Gunthorpe, Christoph Hellwig
Cc: Leon Romanovsky, RDMA mailing list, Or Gerlitz, Yamin Friedman,
Saeed Mahameed, linux-netdev
From: Yamin Friedman <yaminf@mellanox.com>
Allows the IB device to provide a value of maximum scatter gather entries
per RDMA READ.
In certain cases it may be preferable for a device to perform UMR memory
registration rather than have many scatter entries in a single RDMA READ.
This provides a significant performance increase in devices capable of
using different memory registration schemes based on the number of scatter
gather entries. This general capability allows each device vendor to fine
tune when it is better to use memory registration.
Signed-off-by: Yamin Friedman <yaminf@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
drivers/infiniband/hw/mlx5/main.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index fa23c8e7043b..39d54e285ae9 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1012,6 +1012,8 @@ static int mlx5_ib_query_device(struct ib_device *ibdev,
1 << MLX5_CAP_GEN(mdev, log_max_klm_list_size);
props->max_pi_fast_reg_page_list_len =
props->max_fast_reg_page_list_len / 2;
+ props->max_sgl_rd =
+ MLX5_CAP_GEN(mdev, max_sgl_for_optimized_performance);
get_atomic_caps_qp(dev, props);
props->masked_atomic_cap = IB_ATOMIC_NONE;
props->max_mcast_grp = 1 << MLX5_CAP_GEN(mdev, log_max_mcg);
--
2.20.1