* [net-next][PATCH 0/5] rds: add tos support
@ 2019-02-05  0:04 Santosh Shilimkar
  2019-02-05  0:04 ` [net-next][PATCH 1/5] rds: make v3.1 as compat version Santosh Shilimkar
                   ` (5 more replies)
  0 siblings, 6 replies; 22+ messages in thread
From: Santosh Shilimkar @ 2019-02-05  0:04 UTC
  To: netdev, davem; +Cc: yanjun.zhu, santosh.shilimkar

RDS applications use tos (type of service) to classify database traffic.
The feature has shipped in products since 2.6.32-based kernels. It is
tied to the RDS v4.1 protocol version, and compatibility is negotiated
as part of connection setup.

The patchset keeps full backward compatibility by using the existing
connection negotiation scheme. Currently only the RDMA transport makes
use of the feature; for the TCP transport, all user tos values are
mapped to the same default class (0).

For RDMA transports, the RDMA CM service type API is used to set up
different SLs (service levels), and the IB fabric is configured for tos
mapping through the Subnet Manager (SL-to-VL mappings). Similarly, for
a RoCE fabric, the user priority is mapped to different DSCP code
points, which are associated with different switch queues in the
fabric.

The original code was developed by Bang Nguyen in a downstream kernel back
in the 2.6.32 days, and it has evolved significantly over time.

Thanks to Yanjun for testing various host combinations such as
v3.1<->v4.1, v4.1<->v3.1, and upstream v4.1 to shipping v4.1.

The patchset is also available in the git tree below.

The following changes since commit cc7335786f7278d66bdcf96d3d411edfcb01be51:

  socket: fix for Add SO_TIMESTAMP[NS]_NEW (2019-02-03 20:36:11 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux.git for_net-next-5.1/rds-tos-v4

for you to fetch changes up to fd261ce6a30e01ad67c416e2c67e263024b3a6f9:

  rds: rdma: update rdma transport for tos (2019-02-04 14:59:13 -0800)

----------------------------------------------------------------

Santosh Shilimkar (5):
  rds: make v3.1 as compat version
  rds: rdma: add consumer reject
  rds: add type of service(tos) infrastructure
  rds: add transport specific tos_map hook
  rds: rdma: update rdma transport for tos

 include/uapi/linux/rds.h | 11 ++++++++
 net/rds/af_rds.c         | 37 ++++++++++++++++++++++++-
 net/rds/connection.c     | 21 ++++++++------
 net/rds/ib.c             | 11 ++++++++
 net/rds/ib.h             |  4 ++-
 net/rds/ib_cm.c          | 72 +++++++++++++++++++++++++++---------------------
 net/rds/ib_recv.c        |  4 +--
 net/rds/ib_send.c        |  5 ++--
 net/rds/rdma_transport.c | 14 ++++++++++
 net/rds/rdma_transport.h |  6 ++++
 net/rds/rds.h            | 14 ++++++++--
 net/rds/recv.c           |  1 +
 net/rds/send.c           |  7 +++--
 net/rds/tcp.c            |  8 ++++++
 net/rds/tcp_listen.c     |  2 +-
 net/rds/threads.c        |  1 +
 16 files changed, 166 insertions(+), 52 deletions(-)

-- 
1.9.1



* [net-next][PATCH 1/5] rds: make v3.1 as compat version
  2019-02-05  0:04 [net-next][PATCH 0/5] rds: add tos support Santosh Shilimkar
@ 2019-02-05  0:04 ` Santosh Shilimkar
  2019-02-05  0:04 ` [net-next][PATCH 2/5] rds: rdma: add consumer reject Santosh Shilimkar
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Santosh Shilimkar @ 2019-02-05  0:04 UTC
  To: netdev, davem; +Cc: yanjun.zhu, santosh.shilimkar

Mark RDSv3.1 as the compat version and add the v4.0 and v4.1 version
macros. Subsequent patches enable the TOS (Type of Service) feature,
which is tied to v4.1 for the RDMA transport.
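
To illustrate the version encoding, here is a minimal user-space sketch.
The macros mirror net/rds/rds.h after this patch (with `min`
parenthesized here). rds_version_ok() is a hypothetical helper, not part
of the patch; it only restates the check that
rds_ib_cm_connect_complete() performs.

```c
#include <stdbool.h>

/* Version macros as in net/rds/rds.h after this patch. */
#define RDS_PROTOCOL_3_0		0x0300
#define RDS_PROTOCOL_3_1		0x0301
#define RDS_PROTOCOL_4_0		0x0400
#define RDS_PROTOCOL_4_1		0x0401
#define RDS_PROTOCOL_VERSION		RDS_PROTOCOL_3_1
#define RDS_PROTOCOL_COMPAT_VERSION	RDS_PROTOCOL_3_1
#define RDS_PROTOCOL_MAJOR(v)		((v) >> 8)
#define RDS_PROTOCOL_MINOR(v)		((v) & 255)
#define RDS_PROTOCOL(maj, min)		(((maj) << 8) | (min))

/* Hypothetical helper: a negotiated version is refused only if it is
 * older than the current version and is not the compat version.
 */
static bool rds_version_ok(unsigned int version)
{
	if (version < RDS_PROTOCOL_VERSION &&
	    version != RDS_PROTOCOL_COMPAT_VERSION)
		return false;
	return true;
}
```

With both macros set to 3.1, as in this patch, a v3.0 peer is refused
and a v3.1 peer connects.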

Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
[yanjun.zhu@oracle.com: Adapted original patch with ipv6 changes]
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
---
 net/rds/connection.c |  1 +
 net/rds/ib_cm.c      | 40 +++++++++++++++++++++++-----------------
 net/rds/rds.h        |  4 ++++
 net/rds/threads.c    |  1 +
 4 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/net/rds/connection.c b/net/rds/connection.c
index 3bd2f4a..1ab14b6 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -139,6 +139,7 @@ static void __rds_conn_path_init(struct rds_connection *conn,
 	atomic_set(&cp->cp_state, RDS_CONN_DOWN);
 	cp->cp_send_gen = 0;
 	cp->cp_reconnect_jiffies = 0;
+	cp->cp_conn->c_proposed_version = RDS_PROTOCOL_VERSION;
 	INIT_DELAYED_WORK(&cp->cp_send_w, rds_send_worker);
 	INIT_DELAYED_WORK(&cp->cp_recv_w, rds_recv_worker);
 	INIT_DELAYED_WORK(&cp->cp_conn_w, rds_connect_worker);
diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c
index bfbb31f..0eeae09 100644
--- a/net/rds/ib_cm.c
+++ b/net/rds/ib_cm.c
@@ -133,23 +133,24 @@ void rds_ib_cm_connect_complete(struct rds_connection *conn, struct rdma_cm_even
 		rds_ib_set_flow_control(conn, be32_to_cpu(credit));
 	}
 
-	if (conn->c_version < RDS_PROTOCOL(3, 1)) {
-		pr_notice("RDS/IB: Connection <%pI6c,%pI6c> version %u.%u no longer supported\n",
-			  &conn->c_laddr, &conn->c_faddr,
-			  RDS_PROTOCOL_MAJOR(conn->c_version),
-			  RDS_PROTOCOL_MINOR(conn->c_version));
-		set_bit(RDS_DESTROY_PENDING, &conn->c_path[0].cp_flags);
-		rds_conn_destroy(conn);
-		return;
-	} else {
-		pr_notice("RDS/IB: %s conn connected <%pI6c,%pI6c> version %u.%u%s\n",
-			  ic->i_active_side ? "Active" : "Passive",
-			  &conn->c_laddr, &conn->c_faddr,
-			  RDS_PROTOCOL_MAJOR(conn->c_version),
-			  RDS_PROTOCOL_MINOR(conn->c_version),
-			  ic->i_flowctl ? ", flow control" : "");
+	if (conn->c_version < RDS_PROTOCOL_VERSION) {
+		if (conn->c_version != RDS_PROTOCOL_COMPAT_VERSION) {
+			pr_notice("RDS/IB: Connection <%pI6c,%pI6c> version %u.%u no longer supported\n",
+				  &conn->c_laddr, &conn->c_faddr,
+				  RDS_PROTOCOL_MAJOR(conn->c_version),
+				  RDS_PROTOCOL_MINOR(conn->c_version));
+			rds_conn_destroy(conn);
+			return;
+		}
 	}
 
+	pr_notice("RDS/IB: %s conn connected <%pI6c,%pI6c> version %u.%u%s\n",
+		  ic->i_active_side ? "Active" : "Passive",
+		  &conn->c_laddr, &conn->c_faddr,
+		  RDS_PROTOCOL_MAJOR(conn->c_version),
+		  RDS_PROTOCOL_MINOR(conn->c_version),
+		  ic->i_flowctl ? ", flow control" : "");
+
 	atomic_set(&ic->i_cq_quiesce, 0);
 
 	/* Init rings and fill recv. this needs to wait until protocol
@@ -184,6 +185,7 @@ void rds_ib_cm_connect_complete(struct rds_connection *conn, struct rdma_cm_even
 					    NULL);
 	}
 
+	conn->c_proposed_version = conn->c_version;
 	rds_connect_complete(conn);
 }
 
@@ -667,6 +669,9 @@ static u32 rds_ib_protocol_compatible(struct rdma_cm_event *event, bool isv6)
 		version = RDS_PROTOCOL_3_0;
 		while ((common >>= 1) != 0)
 			version++;
+	} else if (RDS_PROTOCOL_COMPAT_VERSION ==
+		   RDS_PROTOCOL(major, minor)) {
+		version = RDS_PROTOCOL_COMPAT_VERSION;
 	} else {
 		if (isv6)
 			printk_ratelimited(KERN_NOTICE "RDS: Connection from %pI6c using incompatible protocol version %u.%u\n",
@@ -861,7 +866,7 @@ int rds_ib_cm_initiate_connect(struct rdma_cm_id *cm_id, bool isv6)
 
 	/* If the peer doesn't do protocol negotiation, we must
 	 * default to RDSv3.0 */
-	rds_ib_set_protocol(conn, RDS_PROTOCOL_3_0);
+	rds_ib_set_protocol(conn, RDS_PROTOCOL_VERSION);
 	ic->i_flowctl = rds_ib_sysctl_flow_control;	/* advertise flow control */
 
 	ret = rds_ib_setup_qp(conn);
@@ -870,7 +875,8 @@ int rds_ib_cm_initiate_connect(struct rdma_cm_id *cm_id, bool isv6)
 		goto out;
 	}
 
-	rds_ib_cm_fill_conn_param(conn, &conn_param, &dp, RDS_PROTOCOL_VERSION,
+	rds_ib_cm_fill_conn_param(conn, &conn_param, &dp,
+				  conn->c_proposed_version,
 				  UINT_MAX, UINT_MAX, isv6);
 	ret = rdma_connect(cm_id, &conn_param);
 	if (ret)
diff --git a/net/rds/rds.h b/net/rds/rds.h
index 4ffe100..660023f 100644
--- a/net/rds/rds.h
+++ b/net/rds/rds.h
@@ -19,10 +19,13 @@
  */
 #define RDS_PROTOCOL_3_0	0x0300
 #define RDS_PROTOCOL_3_1	0x0301
+#define RDS_PROTOCOL_4_0	0x0400
+#define RDS_PROTOCOL_4_1	0x0401
 #define RDS_PROTOCOL_VERSION	RDS_PROTOCOL_3_1
 #define RDS_PROTOCOL_MAJOR(v)	((v) >> 8)
 #define RDS_PROTOCOL_MINOR(v)	((v) & 255)
 #define RDS_PROTOCOL(maj, min)	(((maj) << 8) | min)
+#define RDS_PROTOCOL_COMPAT_VERSION	RDS_PROTOCOL_3_1
 
 /* The following ports, 16385, 18634, 18635, are registered with IANA as
  * the ports to be used for RDS over TCP and UDP.  Currently, only RDS over
@@ -151,6 +154,7 @@ struct rds_connection {
 	struct rds_cong_map	*c_fcong;
 
 	/* Protocol version */
+	unsigned int		c_proposed_version;
 	unsigned int		c_version;
 	possible_net_t		c_net;
 
diff --git a/net/rds/threads.c b/net/rds/threads.c
index e64f9e4..32dc50f 100644
--- a/net/rds/threads.c
+++ b/net/rds/threads.c
@@ -93,6 +93,7 @@ void rds_connect_path_complete(struct rds_conn_path *cp, int curr)
 		queue_delayed_work(rds_wq, &cp->cp_recv_w, 0);
 	}
 	rcu_read_unlock();
+	cp->cp_conn->c_proposed_version = RDS_PROTOCOL_VERSION;
 }
 EXPORT_SYMBOL_GPL(rds_connect_path_complete);
 
-- 
1.9.1



* [net-next][PATCH 2/5] rds: rdma: add consumer reject
  2019-02-05  0:04 [net-next][PATCH 0/5] rds: add tos support Santosh Shilimkar
  2019-02-05  0:04 ` [net-next][PATCH 1/5] rds: make v3.1 as compat version Santosh Shilimkar
@ 2019-02-05  0:04 ` Santosh Shilimkar
  2019-02-05  0:04 ` [net-next][PATCH 3/5] rds: add type of service(tos) infrastructure Santosh Shilimkar
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Santosh Shilimkar @ 2019-02-05  0:04 UTC
  To: netdev, davem; +Cc: yanjun.zhu, santosh.shilimkar

For legacy protocol version incompatibility with non-Linux RDS, the
consumer reject reason is used to convey the incompatibility to the
peer. The choice of '1' as the reject reason value was really poor.

Nevertheless, it needs to be supported for interoperability with
shipping products. For any future versions, a properly encoded reject
reason should be used.
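
The fallback decision taken on RDMA_CM_EVENT_REJECTED can be sketched in
user space as below. rds_version_after_reject() is a hypothetical
helper, not part of the patch, and it leaves out the rds_conn_drop()
call that the real handler makes. It only shows which version gets
proposed on the next attempt.

```c
#include <stddef.h>

#define RDS_PROTOCOL_3_1		0x0301
#define RDS_PROTOCOL_COMPAT_VERSION	RDS_PROTOCOL_3_1
#define RDS_RDMA_REJ_INCOMPAT		1

/* Hypothetical helper: choose the version to propose after a reject.
 * `reason` points at the consumer reject data, or is NULL when the
 * peer supplied none. Both cases fall back to the compat version.
 */
static unsigned int rds_version_after_reject(const int *reason,
					      unsigned int proposed)
{
	if (!reason || *reason == RDS_RDMA_REJ_INCOMPAT)
		return RDS_PROTOCOL_COMPAT_VERSION;
	return proposed;
}
```

Any other reject reason leaves the proposed version untouched in this
sketch.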

Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
[yanjun.zhu@oracle.com: Adapted original patch with ipv6 changes]
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
---
 net/rds/ib_cm.c          |  6 ++++--
 net/rds/rdma_transport.c | 12 ++++++++++++
 net/rds/rdma_transport.h |  6 ++++++
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c
index 0eeae09..a1c3ad3 100644
--- a/net/rds/ib_cm.c
+++ b/net/rds/ib_cm.c
@@ -734,8 +734,10 @@ int rds_ib_cm_handle_connect(struct rdma_cm_id *cm_id,
 
 	/* Check whether the remote protocol version matches ours. */
 	version = rds_ib_protocol_compatible(event, isv6);
-	if (!version)
+	if (!version) {
+		err = RDS_RDMA_REJ_INCOMPAT;
 		goto out;
+	}
 
 	dp = event->param.conn.private_data;
 	if (isv6) {
@@ -851,7 +853,7 @@ int rds_ib_cm_handle_connect(struct rdma_cm_id *cm_id,
 	if (conn)
 		mutex_unlock(&conn->c_cm_lock);
 	if (err)
-		rdma_reject(cm_id, NULL, 0);
+		rdma_reject(cm_id, &err, sizeof(int));
 	return destroy;
 }
 
diff --git a/net/rds/rdma_transport.c b/net/rds/rdma_transport.c
index 6b0f57c..63cbc6b 100644
--- a/net/rds/rdma_transport.c
+++ b/net/rds/rdma_transport.c
@@ -51,6 +51,8 @@ static int rds_rdma_cm_event_handler_cmn(struct rdma_cm_id *cm_id,
 	struct rds_connection *conn = cm_id->context;
 	struct rds_transport *trans;
 	int ret = 0;
+	int *err;
+	u8 len;
 
 	rdsdebug("conn %p id %p handling event %u (%s)\n", conn, cm_id,
 		 event->event, rdma_event_msg(event->event));
@@ -106,8 +108,18 @@ static int rds_rdma_cm_event_handler_cmn(struct rdma_cm_id *cm_id,
 		break;
 
 	case RDMA_CM_EVENT_REJECTED:
+		if (!conn)
+			break;
+		err = (int *)rdma_consumer_reject_data(cm_id, event, &len);
+		if (!err || (err && ((*err) == RDS_RDMA_REJ_INCOMPAT))) {
+			pr_warn("RDS/RDMA: conn <%pI6c, %pI6c> rejected, dropping connection\n",
+				&conn->c_laddr, &conn->c_faddr);
+			conn->c_proposed_version = RDS_PROTOCOL_COMPAT_VERSION;
+			rds_conn_drop(conn);
+		}
 		rdsdebug("Connection rejected: %s\n",
 			 rdma_reject_msg(cm_id, event->status));
+		break;
 		/* FALLTHROUGH */
 	case RDMA_CM_EVENT_ADDR_ERROR:
 	case RDMA_CM_EVENT_ROUTE_ERROR:
diff --git a/net/rds/rdma_transport.h b/net/rds/rdma_transport.h
index 200d313..bfafd4a 100644
--- a/net/rds/rdma_transport.h
+++ b/net/rds/rdma_transport.h
@@ -11,6 +11,12 @@
 
 #define RDS_RDMA_RESOLVE_TIMEOUT_MS     5000
 
+/* Below reject reason is for legacy interoperability issue with non-linux
+ * RDS endpoints where older version incompatibility is conveyed via value 1.
+ * For future version(s), proper encoded reject reason should be be used.
+ */
+#define RDS_RDMA_REJ_INCOMPAT		1
+
 int rds_rdma_conn_connect(struct rds_connection *conn);
 int rds_rdma_cm_event_handler(struct rdma_cm_id *cm_id,
 			      struct rdma_cm_event *event);
-- 
1.9.1



* [net-next][PATCH 3/5] rds: add type of service(tos) infrastructure
  2019-02-05  0:04 [net-next][PATCH 0/5] rds: add tos support Santosh Shilimkar
  2019-02-05  0:04 ` [net-next][PATCH 1/5] rds: make v3.1 as compat version Santosh Shilimkar
  2019-02-05  0:04 ` [net-next][PATCH 2/5] rds: rdma: add consumer reject Santosh Shilimkar
@ 2019-02-05  0:04 ` Santosh Shilimkar
       [not found]   ` <20190307220106.9099-1-gerd.rausch@oracle.com>
  2019-02-05  0:04 ` [net-next][PATCH 4/5] rds: add transport specific tos_map hook Santosh Shilimkar
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: Santosh Shilimkar @ 2019-02-05  0:04 UTC
  To: netdev, davem; +Cc: yanjun.zhu, santosh.shilimkar

The RDS service type (TOS) is user-defined and is configured through
the RDS ioctl interface. It must be set before any traffic is
initiated, and once set it cannot be changed. All outgoing traffic
from the socket is associated with its TOS.
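
From user space, the new ioctls could be wrapped roughly as follows.
This is a sketch only: rds_set_tos() and rds_get_tos() are hypothetical
wrapper names, and the two command macros are repeated from this patch
in case the installed headers predate it.

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/sockios.h>	/* SIOCPROTOPRIVATE */

/* Added to include/uapi/linux/rds.h by this patch. */
#ifndef SIOCRDSSETTOS
#define SIOCRDSSETTOS	(SIOCPROTOPRIVATE)
#define SIOCRDSGETTOS	(SIOCPROTOPRIVATE + 1)
#endif

/* Hypothetical wrappers: return 0 on success, -1 with errno set. */
static int rds_set_tos(int fd, uint8_t tos)
{
	return ioctl(fd, SIOCRDSSETTOS, &tos) < 0 ? -1 : 0;
}

static int rds_get_tos(int fd, uint8_t *tos)
{
	return ioctl(fd, SIOCRDSGETTOS, tos) < 0 ? -1 : 0;
}
```

Here fd is assumed to be an AF_RDS socket. Per the ioctl handler in this
patch, SIOCRDSSETTOS fails with EINVAL once the socket already has a
non-zero TOS or a cached connection.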

Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
[yanjun.zhu@oracle.com: Adapted original patch with ipv6 changes]
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
---
 include/uapi/linux/rds.h | 11 +++++++++++
 net/rds/af_rds.c         | 35 ++++++++++++++++++++++++++++++++++-
 net/rds/connection.c     | 20 +++++++++++---------
 net/rds/ib.c             |  1 +
 net/rds/ib_cm.c          |  2 +-
 net/rds/rdma_transport.c |  1 +
 net/rds/rds.h            |  9 +++++++--
 net/rds/recv.c           |  1 +
 net/rds/send.c           |  6 +++---
 net/rds/tcp.c            |  1 +
 net/rds/tcp_listen.c     |  2 +-
 11 files changed, 72 insertions(+), 17 deletions(-)

diff --git a/include/uapi/linux/rds.h b/include/uapi/linux/rds.h
index 8b73cb6..5d0f76c 100644
--- a/include/uapi/linux/rds.h
+++ b/include/uapi/linux/rds.h
@@ -69,6 +69,12 @@
 #define RDS_TRANS_COUNT	3
 #define	RDS_TRANS_NONE	(~0)
 
+/* IOCTLS commands for SOL_RDS */
+#define SIOCRDSSETTOS		(SIOCPROTOPRIVATE)
+#define SIOCRDSGETTOS		(SIOCPROTOPRIVATE + 1)
+
+typedef __u8	rds_tos_t;
+
 /*
  * Control message types for SOL_RDS.
  *
@@ -149,6 +155,7 @@ struct rds_info_connection {
 	__be32		faddr;
 	__u8		transport[TRANSNAMSIZ];		/* null term ascii */
 	__u8		flags;
+	__u8		tos;
 } __attribute__((packed));
 
 struct rds6_info_connection {
@@ -171,6 +178,7 @@ struct rds_info_message {
 	__be16		lport;
 	__be16		fport;
 	__u8		flags;
+	__u8		tos;
 } __attribute__((packed));
 
 struct rds6_info_message {
@@ -214,6 +222,7 @@ struct rds_info_tcp_socket {
 	__u32           last_sent_nxt;
 	__u32           last_expected_una;
 	__u32           last_seen_una;
+	__u8		tos;
 } __attribute__((packed));
 
 struct rds6_info_tcp_socket {
@@ -240,6 +249,7 @@ struct rds_info_rdma_connection {
 	__u32		max_send_sge;
 	__u32		rdma_mr_max;
 	__u32		rdma_mr_size;
+	__u8		tos;
 };
 
 struct rds6_info_rdma_connection {
@@ -253,6 +263,7 @@ struct rds6_info_rdma_connection {
 	__u32		max_send_sge;
 	__u32		rdma_mr_max;
 	__u32		rdma_mr_size;
+	__u8		tos;
 };
 
 /* RDS message Receive Path Latency points */
diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
index 65571a6..9045158 100644
--- a/net/rds/af_rds.c
+++ b/net/rds/af_rds.c
@@ -254,7 +254,38 @@ static __poll_t rds_poll(struct file *file, struct socket *sock,
 
 static int rds_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
 {
-	return -ENOIOCTLCMD;
+	struct rds_sock *rs = rds_sk_to_rs(sock->sk);
+	rds_tos_t tos;
+
+	switch (cmd) {
+	case SIOCRDSSETTOS:
+		if (get_user(tos, (rds_tos_t __user *)arg))
+			return -EFAULT;
+
+		if (rs->rs_transport &&
+		    rs->rs_transport->t_type == RDS_TRANS_TCP)
+			tos = 0;
+
+		spin_lock_bh(&rds_sock_lock);
+		if (rs->rs_tos || rs->rs_conn) {
+			spin_unlock_bh(&rds_sock_lock);
+			return -EINVAL;
+		}
+		rs->rs_tos = tos;
+		spin_unlock_bh(&rds_sock_lock);
+		break;
+	case SIOCRDSGETTOS:
+		spin_lock_bh(&rds_sock_lock);
+		tos = rs->rs_tos;
+		spin_unlock_bh(&rds_sock_lock);
+		if (put_user(tos, (rds_tos_t __user *)arg))
+			return -EFAULT;
+		break;
+	default:
+		return -ENOIOCTLCMD;
+	}
+
+	return 0;
 }
 
 static int rds_cancel_sent_to(struct rds_sock *rs, char __user *optval,
@@ -650,6 +681,8 @@ static int __rds_create(struct socket *sock, struct sock *sk, int protocol)
 	spin_lock_init(&rs->rs_rdma_lock);
 	rs->rs_rdma_keys = RB_ROOT;
 	rs->rs_rx_traces = 0;
+	rs->rs_tos = 0;
+	rs->rs_conn = NULL;
 
 	spin_lock_bh(&rds_sock_lock);
 	list_add_tail(&rs->rs_item, &rds_sock_list);
diff --git a/net/rds/connection.c b/net/rds/connection.c
index 1ab14b6..7ea134f 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -84,7 +84,7 @@ static struct rds_connection *rds_conn_lookup(struct net *net,
 					      const struct in6_addr *laddr,
 					      const struct in6_addr *faddr,
 					      struct rds_transport *trans,
-					      int dev_if)
+					      u8 tos, int dev_if)
 {
 	struct rds_connection *conn, *ret = NULL;
 
@@ -92,6 +92,7 @@ static struct rds_connection *rds_conn_lookup(struct net *net,
 		if (ipv6_addr_equal(&conn->c_faddr, faddr) &&
 		    ipv6_addr_equal(&conn->c_laddr, laddr) &&
 		    conn->c_trans == trans &&
+		    conn->c_tos == tos &&
 		    net == rds_conn_net(conn) &&
 		    conn->c_dev_if == dev_if) {
 			ret = conn;
@@ -160,7 +161,7 @@ static struct rds_connection *__rds_conn_create(struct net *net,
 						const struct in6_addr *laddr,
 						const struct in6_addr *faddr,
 						struct rds_transport *trans,
-						gfp_t gfp,
+						gfp_t gfp, u8 tos,
 						int is_outgoing,
 						int dev_if)
 {
@@ -172,7 +173,7 @@ static struct rds_connection *__rds_conn_create(struct net *net,
 	int npaths = (trans->t_mp_capable ? RDS_MPATH_WORKERS : 1);
 
 	rcu_read_lock();
-	conn = rds_conn_lookup(net, head, laddr, faddr, trans, dev_if);
+	conn = rds_conn_lookup(net, head, laddr, faddr, trans, tos, dev_if);
 	if (conn &&
 	    conn->c_loopback &&
 	    conn->c_trans != &rds_loop_transport &&
@@ -206,6 +207,7 @@ static struct rds_connection *__rds_conn_create(struct net *net,
 	conn->c_isv6 = !ipv6_addr_v4mapped(laddr);
 	conn->c_faddr = *faddr;
 	conn->c_dev_if = dev_if;
+	conn->c_tos = tos;
 
 #if IS_ENABLED(CONFIG_IPV6)
 	/* If the local address is link local, set c_bound_if to be the
@@ -298,7 +300,7 @@ static struct rds_connection *__rds_conn_create(struct net *net,
 		struct rds_connection *found;
 
 		found = rds_conn_lookup(net, head, laddr, faddr, trans,
-					dev_if);
+					tos, dev_if);
 		if (found) {
 			struct rds_conn_path *cp;
 			int i;
@@ -333,10 +335,10 @@ static struct rds_connection *__rds_conn_create(struct net *net,
 struct rds_connection *rds_conn_create(struct net *net,
 				       const struct in6_addr *laddr,
 				       const struct in6_addr *faddr,
-				       struct rds_transport *trans, gfp_t gfp,
-				       int dev_if)
+				       struct rds_transport *trans, u8 tos,
+				       gfp_t gfp, int dev_if)
 {
-	return __rds_conn_create(net, laddr, faddr, trans, gfp, 0, dev_if);
+	return __rds_conn_create(net, laddr, faddr, trans, gfp, tos, 0, dev_if);
 }
 EXPORT_SYMBOL_GPL(rds_conn_create);
 
@@ -344,9 +346,9 @@ struct rds_connection *rds_conn_create_outgoing(struct net *net,
 						const struct in6_addr *laddr,
 						const struct in6_addr *faddr,
 						struct rds_transport *trans,
-						gfp_t gfp, int dev_if)
+						u8 tos, gfp_t gfp, int dev_if)
 {
-	return __rds_conn_create(net, laddr, faddr, trans, gfp, 1, dev_if);
+	return __rds_conn_create(net, laddr, faddr, trans, gfp, tos, 1, dev_if);
 }
 EXPORT_SYMBOL_GPL(rds_conn_create_outgoing);
 
diff --git a/net/rds/ib.c b/net/rds/ib.c
index 9d7b758..21b6588 100644
--- a/net/rds/ib.c
+++ b/net/rds/ib.c
@@ -301,6 +301,7 @@ static int rds_ib_conn_info_visitor(struct rds_connection *conn,
 
 	iinfo->src_addr = conn->c_laddr.s6_addr32[3];
 	iinfo->dst_addr = conn->c_faddr.s6_addr32[3];
+	iinfo->tos = conn->c_tos;
 
 	memset(&iinfo->src_gid, 0, sizeof(iinfo->src_gid));
 	memset(&iinfo->dst_gid, 0, sizeof(iinfo->dst_gid));
diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c
index a1c3ad3..70518e3 100644
--- a/net/rds/ib_cm.c
+++ b/net/rds/ib_cm.c
@@ -786,7 +786,7 @@ int rds_ib_cm_handle_connect(struct rdma_cm_id *cm_id,
 
 	/* RDS/IB is not currently netns aware, thus init_net */
 	conn = rds_conn_create(&init_net, daddr6, saddr6,
-			       &rds_ib_transport, GFP_KERNEL, ifindex);
+			       &rds_ib_transport, 0, GFP_KERNEL, ifindex);
 	if (IS_ERR(conn)) {
 		rdsdebug("rds_conn_create failed (%ld)\n", PTR_ERR(conn));
 		conn = NULL;
diff --git a/net/rds/rdma_transport.c b/net/rds/rdma_transport.c
index 63cbc6b..e37f915 100644
--- a/net/rds/rdma_transport.c
+++ b/net/rds/rdma_transport.c
@@ -115,6 +115,7 @@ static int rds_rdma_cm_event_handler_cmn(struct rdma_cm_id *cm_id,
 			pr_warn("RDS/RDMA: conn <%pI6c, %pI6c> rejected, dropping connection\n",
 				&conn->c_laddr, &conn->c_faddr);
 			conn->c_proposed_version = RDS_PROTOCOL_COMPAT_VERSION;
+			conn->c_tos = 0;
 			rds_conn_drop(conn);
 		}
 		rdsdebug("Connection rejected: %s\n",
diff --git a/net/rds/rds.h b/net/rds/rds.h
index 660023f..7e52b92 100644
--- a/net/rds/rds.h
+++ b/net/rds/rds.h
@@ -158,6 +158,9 @@ struct rds_connection {
 	unsigned int		c_version;
 	possible_net_t		c_net;
 
+	/* TOS */
+	u8			c_tos;
+
 	struct list_head	c_map_item;
 	unsigned long		c_map_queued;
 
@@ -652,6 +655,7 @@ struct rds_sock {
 	u8			rs_rx_traces;
 	u8			rs_rx_trace[RDS_MSG_RX_DGRAM_TRACE_MAX];
 	struct rds_msg_zcopy_queue rs_zcookie_queue;
+	u8			rs_tos;
 };
 
 static inline struct rds_sock *rds_sk_to_rs(const struct sock *sk)
@@ -760,13 +764,14 @@ struct rds_sock *rds_find_bound(const struct in6_addr *addr, __be16 port,
 struct rds_connection *rds_conn_create(struct net *net,
 				       const struct in6_addr *laddr,
 				       const struct in6_addr *faddr,
-				       struct rds_transport *trans, gfp_t gfp,
+				       struct rds_transport *trans,
+				       u8 tos, gfp_t gfp,
 				       int dev_if);
 struct rds_connection *rds_conn_create_outgoing(struct net *net,
 						const struct in6_addr *laddr,
 						const struct in6_addr *faddr,
 						struct rds_transport *trans,
-						gfp_t gfp, int dev_if);
+						u8 tos, gfp_t gfp, int dev_if);
 void rds_conn_shutdown(struct rds_conn_path *cpath);
 void rds_conn_destroy(struct rds_connection *conn);
 void rds_conn_drop(struct rds_connection *conn);
diff --git a/net/rds/recv.c b/net/rds/recv.c
index 6bb6b16..853de48 100644
--- a/net/rds/recv.c
+++ b/net/rds/recv.c
@@ -782,6 +782,7 @@ void rds_inc_info_copy(struct rds_incoming *inc,
 
 	minfo.seq = be64_to_cpu(inc->i_hdr.h_sequence);
 	minfo.len = be32_to_cpu(inc->i_hdr.h_len);
+	minfo.tos = inc->i_conn->c_tos;
 
 	if (flip) {
 		minfo.laddr = daddr;
diff --git a/net/rds/send.c b/net/rds/send.c
index fd8b687..c555e12 100644
--- a/net/rds/send.c
+++ b/net/rds/send.c
@@ -1277,12 +1277,12 @@ int rds_sendmsg(struct socket *sock, struct msghdr *msg, size_t payload_len)
 
 	/* rds_conn_create has a spinlock that runs with IRQ off.
 	 * Caching the conn in the socket helps a lot. */
-	if (rs->rs_conn && ipv6_addr_equal(&rs->rs_conn->c_faddr, &daddr))
+	if (rs->rs_conn && ipv6_addr_equal(&rs->rs_conn->c_faddr, &daddr)) {
 		conn = rs->rs_conn;
-	else {
+	} else {
 		conn = rds_conn_create_outgoing(sock_net(sock->sk),
 						&rs->rs_bound_addr, &daddr,
-						rs->rs_transport,
+						rs->rs_transport, 0,
 						sock->sk->sk_allocation,
 						scope_id);
 		if (IS_ERR(conn)) {
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index c16f0a3..eb68519 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -267,6 +267,7 @@ static void rds_tcp_tc_info(struct socket *rds_sock, unsigned int len,
 		tsinfo.last_sent_nxt = tc->t_last_sent_nxt;
 		tsinfo.last_expected_una = tc->t_last_expected_una;
 		tsinfo.last_seen_una = tc->t_last_seen_una;
+		tsinfo.tos = tc->t_cpath->cp_conn->c_tos;
 
 		rds_info_copy(iter, &tsinfo, sizeof(tsinfo));
 	}
diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c
index c12203f..810a3a4 100644
--- a/net/rds/tcp_listen.c
+++ b/net/rds/tcp_listen.c
@@ -200,7 +200,7 @@ int rds_tcp_accept_one(struct socket *sock)
 
 	conn = rds_conn_create(sock_net(sock->sk),
 			       my_addr, peer_addr,
-			       &rds_tcp_transport, GFP_KERNEL, dev_if);
+			       &rds_tcp_transport, 0, GFP_KERNEL, dev_if);
 
 	if (IS_ERR(conn)) {
 		ret = PTR_ERR(conn);
-- 
1.9.1



* [net-next][PATCH 4/5] rds: add transport specific tos_map hook
  2019-02-05  0:04 [net-next][PATCH 0/5] rds: add tos support Santosh Shilimkar
                   ` (2 preceding siblings ...)
  2019-02-05  0:04 ` [net-next][PATCH 3/5] rds: add type of service(tos) infrastructure Santosh Shilimkar
@ 2019-02-05  0:04 ` Santosh Shilimkar
  2019-02-05  0:04 ` [net-next][PATCH 5/5] rds: rdma: update rdma transport for tos Santosh Shilimkar
  2019-02-07  1:01 ` [net-next][PATCH 0/5] rds: add tos support David Miller
  5 siblings, 0 replies; 22+ messages in thread
From: Santosh Shilimkar @ 2019-02-05  0:04 UTC
  To: netdev, davem; +Cc: yanjun.zhu, santosh.shilimkar

The RDMA transport maps user tos to the underlying virtual lanes (VLs)
for IB, or to DSCP values for RoCE. The RDMA CM abstracts that for
RDS. The TCP transport uses the default priority 0 and maps all user
tos values to it.
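
The hook dispatch can be sketched in user space as below. struct
transport is a cut-down stand-in for struct rds_transport, and
map_user_tos() is a hypothetical name for the logic that this patch
places in rds_ioctl(). It returns -1 where the kernel code returns
-ENOIOCTLCMD.

```c
#include <stddef.h>
#include <stdint.h>

/* Cut-down stand-in for struct rds_transport: only the new hook. */
struct transport {
	const char *name;
	uint8_t (*get_tos_map)(uint8_t tos);
};

/* 1:1 user-to-transport map, as in rds_ib_get_tos_map(). */
static uint8_t ib_get_tos_map(uint8_t tos)
{
	return tos;
}

/* All user tos values map to 0, as in rds_tcp_get_tos_map(). */
static uint8_t tcp_get_tos_map(uint8_t tos)
{
	(void)tos;
	return 0;
}

static const struct transport ib_transport = { "infiniband", ib_get_tos_map };
static const struct transport tcp_transport = { "tcp", tcp_get_tos_map };

/* Hypothetical dispatcher: 0 on success, -1 when no transport is
 * bound or the transport has no hook.
 */
static int map_user_tos(const struct transport *t, uint8_t utos, uint8_t *tos)
{
	if (!t || !t->get_tos_map)
		return -1;
	*tos = t->get_tos_map(utos);
	return 0;
}
```

Keeping the mapping behind a per-transport function pointer removes the
explicit RDS_TRANS_TCP check from the generic ioctl path.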

Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
[yanjun.zhu@oracle.com: Adapted original patch with ipv6 changes]
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
---
 net/rds/af_rds.c | 10 ++++++----
 net/rds/ib.c     | 10 ++++++++++
 net/rds/rds.h    |  1 +
 net/rds/tcp.c    |  7 +++++++
 4 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
index 9045158..d6cc97f 100644
--- a/net/rds/af_rds.c
+++ b/net/rds/af_rds.c
@@ -255,16 +255,18 @@ static __poll_t rds_poll(struct file *file, struct socket *sock,
 static int rds_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg)
 {
 	struct rds_sock *rs = rds_sk_to_rs(sock->sk);
-	rds_tos_t tos;
+	rds_tos_t utos, tos = 0;
 
 	switch (cmd) {
 	case SIOCRDSSETTOS:
-		if (get_user(tos, (rds_tos_t __user *)arg))
+		if (get_user(utos, (rds_tos_t __user *)arg))
 			return -EFAULT;
 
 		if (rs->rs_transport &&
-		    rs->rs_transport->t_type == RDS_TRANS_TCP)
-			tos = 0;
+		    rs->rs_transport->get_tos_map)
+			tos = rs->rs_transport->get_tos_map(utos);
+		else
+			return -ENOIOCTLCMD;
 
 		spin_lock_bh(&rds_sock_lock);
 		if (rs->rs_tos || rs->rs_conn) {
diff --git a/net/rds/ib.c b/net/rds/ib.c
index 21b6588..2da9b75 100644
--- a/net/rds/ib.c
+++ b/net/rds/ib.c
@@ -515,6 +515,15 @@ void rds_ib_exit(void)
 	rds_ib_mr_exit();
 }
 
+static u8 rds_ib_get_tos_map(u8 tos)
+{
+	/* 1:1 user to transport map for RDMA transport.
+	 * In future, if custom map is desired, hook can export
+	 * user configurable map.
+	 */
+	return tos;
+}
+
 struct rds_transport rds_ib_transport = {
 	.laddr_check		= rds_ib_laddr_check,
 	.xmit_path_complete	= rds_ib_xmit_path_complete,
@@ -537,6 +546,7 @@ struct rds_transport rds_ib_transport = {
 	.sync_mr		= rds_ib_sync_mr,
 	.free_mr		= rds_ib_free_mr,
 	.flush_mrs		= rds_ib_flush_mrs,
+	.get_tos_map		= rds_ib_get_tos_map,
 	.t_owner		= THIS_MODULE,
 	.t_name			= "infiniband",
 	.t_unloading		= rds_ib_is_unloading,
diff --git a/net/rds/rds.h b/net/rds/rds.h
index 7e52b92..0d8f67c 100644
--- a/net/rds/rds.h
+++ b/net/rds/rds.h
@@ -574,6 +574,7 @@ struct rds_transport {
 	void (*free_mr)(void *trans_private, int invalidate);
 	void (*flush_mrs)(void);
 	bool (*t_unloading)(struct rds_connection *conn);
+	u8 (*get_tos_map)(u8 tos);
 };
 
 /* Bind hash table key length.  It is the sum of the size of a struct
diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index eb68519..fd26941 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -453,6 +453,12 @@ static void rds_tcp_destroy_conns(void)
 
 static void rds_tcp_exit(void);
 
+static u8 rds_tcp_get_tos_map(u8 tos)
+{
+	/* all user tos mapped to default 0 for TCP transport */
+	return 0;
+}
+
 struct rds_transport rds_tcp_transport = {
 	.laddr_check		= rds_tcp_laddr_check,
 	.xmit_path_prepare	= rds_tcp_xmit_path_prepare,
@@ -467,6 +473,7 @@ struct rds_transport rds_tcp_transport = {
 	.inc_free		= rds_tcp_inc_free,
 	.stats_info_copy	= rds_tcp_stats_info_copy,
 	.exit			= rds_tcp_exit,
+	.get_tos_map		= rds_tcp_get_tos_map,
 	.t_owner		= THIS_MODULE,
 	.t_name			= "tcp",
 	.t_type			= RDS_TRANS_TCP,
-- 
1.9.1



* [net-next][PATCH 5/5] rds: rdma: update rdma transport for tos
  2019-02-05  0:04 [net-next][PATCH 0/5] rds: add tos support Santosh Shilimkar
                   ` (3 preceding siblings ...)
  2019-02-05  0:04 ` [net-next][PATCH 4/5] rds: add transport specific tos_map hook Santosh Shilimkar
@ 2019-02-05  0:04 ` Santosh Shilimkar
  2019-03-05 16:33   ` Gerd Rausch
  2019-02-07  1:01 ` [net-next][PATCH 0/5] rds: add tos support David Miller
  5 siblings, 1 reply; 22+ messages in thread
From: Santosh Shilimkar @ 2019-02-05  0:04 UTC
  To: netdev, davem; +Cc: yanjun.zhu, santosh.shilimkar

For RDMA transports, RDS TOS is an extension of IB QoS (Annex A13) that
lets clients segregate traffic flows for different types of data. The
RDMA CM abstracts it for ULPs through rdma_set_service_type().
Internally, each traffic flow is represented by a connection with its
own independent resources, like a normal connection, and is
differentiated by service type. In other words, there can be multiple
QP connections between an IP pair, each supporting a unique service
type.
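
The idea of several connections per IP pair, told apart by tos, can be
sketched as a lookup keyed on (laddr, faddr, tos). All names here are
hypothetical. The real rds_conn_lookup() also matches on transport,
network namespace and device index, which this sketch omits.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical connection key: address pair plus service type. */
struct conn_key {
	uint8_t laddr[16];
	uint8_t faddr[16];
	uint8_t tos;
};

struct conn {
	struct conn_key key;
	int id;
};

/* Linear search: a connection matches only if both addresses and the
 * tos value are equal.
 */
static const struct conn *conn_lookup(const struct conn *tbl, size_t n,
				      const struct conn_key *key)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!memcmp(tbl[i].key.laddr, key->laddr, 16) &&
		    !memcmp(tbl[i].key.faddr, key->faddr, 16) &&
		    tbl[i].key.tos == key->tos)
			return &tbl[i];
	}
	return NULL;
}
```

Two entries with the same address pair but different tos values resolve
to different connections, each with its own resources.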

The feature is available from RDSv4.1 onwards and supports rolling
upgrades. The RDMA connection metadata also carries the tos
information, so that the SL can be set up end to end. The original
code was developed by Bang Nguyen in a downstream kernel back in the
2.6.32 days, and it has evolved over time.
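
One way to read the ib.h change is that the tos byte is carved out of a
previously reserved 32-bit field, so the private data keeps its size
and field offsets. The sketch below checks that with fixed-width
stand-ins for the kernel types. The struct and member names are
illustrative, not the kernel's.

```c
#include <stddef.h>
#include <stdint.h>

/* Common private-data header before the patch. */
struct priv_cmn_old {
	uint8_t  protocol_major;
	uint8_t  protocol_minor;
	uint16_t protocol_minor_mask;
	uint32_t reserved1;
	uint64_t ack_seq;
	uint32_t credit;
};

/* After the patch: the reserved word is split into tos + reserved. */
struct priv_cmn_new {
	uint8_t  protocol_major;
	uint8_t  protocol_minor;
	uint16_t protocol_minor_mask;
	uint8_t  tos;
	uint8_t  reserved1;
	uint16_t reserved2;
	uint64_t ack_seq;
	uint32_t credit;
};

_Static_assert(sizeof(struct priv_cmn_old) == sizeof(struct priv_cmn_new),
	       "private data size must not change");
_Static_assert(offsetof(struct priv_cmn_old, ack_seq) ==
	       offsetof(struct priv_cmn_new, ack_seq),
	       "ack_seq must stay at the same offset");
_Static_assert(offsetof(struct priv_cmn_new, tos) ==
	       offsetof(struct priv_cmn_old, reserved1),
	       "tos occupies the first byte of the old reserved word");
```

If the layout assumption holds, a peer that still treats those four
bytes as reserved parses the rest of the header unchanged.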

Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
[yanjun.zhu@oracle.com: Adapted original patch with ipv6 changes]
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
---
 net/rds/ib.h             |  4 +++-
 net/rds/ib_cm.c          | 32 +++++++++++++++++---------------
 net/rds/ib_recv.c        |  4 ++--
 net/rds/ib_send.c        |  5 +++--
 net/rds/rdma_transport.c |  1 +
 net/rds/send.c           |  5 +++--
 6 files changed, 29 insertions(+), 22 deletions(-)

diff --git a/net/rds/ib.h b/net/rds/ib.h
index 71ff356..752f922 100644
--- a/net/rds/ib.h
+++ b/net/rds/ib.h
@@ -67,7 +67,9 @@ struct rds_ib_conn_priv_cmn {
 	u8			ricpc_protocol_major;
 	u8			ricpc_protocol_minor;
 	__be16			ricpc_protocol_minor_mask;	/* bitmask */
-	__be32			ricpc_reserved1;
+	u8			ricpc_dp_toss;
+	u8			ripc_reserved1;
+	__be16			ripc_reserved2;
 	__be64			ricpc_ack_seq;
 	__be32			ricpc_credit;	/* non-zero enables flow ctl */
 };
diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c
index 70518e3..66c6eb5 100644
--- a/net/rds/ib_cm.c
+++ b/net/rds/ib_cm.c
@@ -144,9 +144,9 @@ void rds_ib_cm_connect_complete(struct rds_connection *conn, struct rdma_cm_even
 		}
 	}
 
-	pr_notice("RDS/IB: %s conn connected <%pI6c,%pI6c> version %u.%u%s\n",
+	pr_notice("RDS/IB: %s conn connected <%pI6c,%pI6c,%d> version %u.%u%s\n",
 		  ic->i_active_side ? "Active" : "Passive",
-		  &conn->c_laddr, &conn->c_faddr,
+		  &conn->c_laddr, &conn->c_faddr, conn->c_tos,
 		  RDS_PROTOCOL_MAJOR(conn->c_version),
 		  RDS_PROTOCOL_MINOR(conn->c_version),
 		  ic->i_flowctl ? ", flow control" : "");
@@ -222,6 +222,7 @@ static void rds_ib_cm_fill_conn_param(struct rds_connection *conn,
 			    cpu_to_be16(RDS_IB_SUPPORTED_PROTOCOLS);
 			dp->ricp_v6.dp_ack_seq =
 			    cpu_to_be64(rds_ib_piggyb_ack(ic));
+			dp->ricp_v6.dp_cmn.ricpc_dp_toss = conn->c_tos;
 
 			conn_param->private_data = &dp->ricp_v6;
 			conn_param->private_data_len = sizeof(dp->ricp_v6);
@@ -236,6 +237,7 @@ static void rds_ib_cm_fill_conn_param(struct rds_connection *conn,
 			    cpu_to_be16(RDS_IB_SUPPORTED_PROTOCOLS);
 			dp->ricp_v4.dp_ack_seq =
 			    cpu_to_be64(rds_ib_piggyb_ack(ic));
+			dp->ricp_v4.dp_cmn.ricpc_dp_toss = conn->c_tos;
 
 			conn_param->private_data = &dp->ricp_v4;
 			conn_param->private_data_len = sizeof(dp->ricp_v4);
@@ -391,10 +393,9 @@ static void rds_ib_qp_event_handler(struct ib_event *event, void *data)
 		rdma_notify(ic->i_cm_id, IB_EVENT_COMM_EST);
 		break;
 	default:
-		rdsdebug("Fatal QP Event %u (%s) "
-			"- connection %pI6c->%pI6c, reconnecting\n",
-			event->event, ib_event_msg(event->event),
-			&conn->c_laddr, &conn->c_faddr);
+		rdsdebug("Fatal QP Event %u (%s) - connection %pI6c->%pI6c, reconnecting\n",
+			 event->event, ib_event_msg(event->event),
+			 &conn->c_laddr, &conn->c_faddr);
 		rds_conn_drop(conn);
 		break;
 	}
@@ -662,11 +663,11 @@ static u32 rds_ib_protocol_compatible(struct rdma_cm_event *event, bool isv6)
 
 	/* Even if len is crap *now* I still want to check it. -ASG */
 	if (event->param.conn.private_data_len < data_len || major == 0)
-		return RDS_PROTOCOL_3_0;
+		return RDS_PROTOCOL_4_0;
 
 	common = be16_to_cpu(mask) & RDS_IB_SUPPORTED_PROTOCOLS;
-	if (major == 3 && common) {
-		version = RDS_PROTOCOL_3_0;
+	if (major == 4 && common) {
+		version = RDS_PROTOCOL_4_0;
 		while ((common >>= 1) != 0)
 			version++;
 	} else if (RDS_PROTOCOL_COMPAT_VERSION ==
@@ -778,15 +779,16 @@ int rds_ib_cm_handle_connect(struct rdma_cm_id *cm_id,
 		daddr6 = &d_mapped_addr;
 	}
 
-	rdsdebug("saddr %pI6c daddr %pI6c RDSv%u.%u lguid 0x%llx fguid "
-		 "0x%llx\n", saddr6, daddr6,
-		 RDS_PROTOCOL_MAJOR(version), RDS_PROTOCOL_MINOR(version),
+	rdsdebug("saddr %pI6c daddr %pI6c RDSv%u.%u lguid 0x%llx fguid 0x%llx, tos:%d\n",
+		 saddr6, daddr6, RDS_PROTOCOL_MAJOR(version),
+		 RDS_PROTOCOL_MINOR(version),
 		 (unsigned long long)be64_to_cpu(lguid),
-		 (unsigned long long)be64_to_cpu(fguid));
+		 (unsigned long long)be64_to_cpu(fguid), dp_cmn->ricpc_dp_toss);
 
 	/* RDS/IB is not currently netns aware, thus init_net */
 	conn = rds_conn_create(&init_net, daddr6, saddr6,
-			       &rds_ib_transport, 0, GFP_KERNEL, ifindex);
+			       &rds_ib_transport, dp_cmn->ricpc_dp_toss,
+			       GFP_KERNEL, ifindex);
 	if (IS_ERR(conn)) {
 		rdsdebug("rds_conn_create failed (%ld)\n", PTR_ERR(conn));
 		conn = NULL;
@@ -868,7 +870,7 @@ int rds_ib_cm_initiate_connect(struct rdma_cm_id *cm_id, bool isv6)
 
 	/* If the peer doesn't do protocol negotiation, we must
 	 * default to RDSv3.0 */
-	rds_ib_set_protocol(conn, RDS_PROTOCOL_VERSION);
+	rds_ib_set_protocol(conn, RDS_PROTOCOL_4_1);
 	ic->i_flowctl = rds_ib_sysctl_flow_control;	/* advertise flow control */
 
 	ret = rds_ib_setup_qp(conn);
diff --git a/net/rds/ib_recv.c b/net/rds/ib_recv.c
index 2f16146..d395eec 100644
--- a/net/rds/ib_recv.c
+++ b/net/rds/ib_recv.c
@@ -986,9 +986,9 @@ void rds_ib_recv_cqe_handler(struct rds_ib_connection *ic,
 	} else {
 		/* We expect errors as the qp is drained during shutdown */
 		if (rds_conn_up(conn) || rds_conn_connecting(conn))
-			rds_ib_conn_error(conn, "recv completion on <%pI6c,%pI6c> had status %u (%s), disconnecting and reconnecting\n",
+			rds_ib_conn_error(conn, "recv completion on <%pI6c,%pI6c, %d> had status %u (%s), disconnecting and reconnecting\n",
 					  &conn->c_laddr, &conn->c_faddr,
-					  wc->status,
+					  conn->c_tos, wc->status,
 					  ib_wc_status_msg(wc->status));
 	}
 
diff --git a/net/rds/ib_send.c b/net/rds/ib_send.c
index 4e0c36a..09c46f2 100644
--- a/net/rds/ib_send.c
+++ b/net/rds/ib_send.c
@@ -305,8 +305,9 @@ void rds_ib_send_cqe_handler(struct rds_ib_connection *ic, struct ib_wc *wc)
 
 	/* We expect errors as the qp is drained during shutdown */
 	if (wc->status != IB_WC_SUCCESS && rds_conn_up(conn)) {
-		rds_ib_conn_error(conn, "send completion on <%pI6c,%pI6c> had status %u (%s), disconnecting and reconnecting\n",
-				  &conn->c_laddr, &conn->c_faddr, wc->status,
+		rds_ib_conn_error(conn, "send completion on <%pI6c,%pI6c,%d> had status %u (%s), disconnecting and reconnecting\n",
+				  &conn->c_laddr, &conn->c_faddr,
+				  conn->c_tos, wc->status,
 				  ib_wc_status_msg(wc->status));
 	}
 }
diff --git a/net/rds/rdma_transport.c b/net/rds/rdma_transport.c
index e37f915..46bce83 100644
--- a/net/rds/rdma_transport.c
+++ b/net/rds/rdma_transport.c
@@ -83,6 +83,7 @@ static int rds_rdma_cm_event_handler_cmn(struct rdma_cm_id *cm_id,
 		break;
 
 	case RDMA_CM_EVENT_ADDR_RESOLVED:
+		rdma_set_service_type(cm_id, conn->c_tos);
 		/* XXX do we need to clean up if this fails? */
 		ret = rdma_resolve_route(cm_id,
 					 RDS_RDMA_RESOLVE_TIMEOUT_MS);
diff --git a/net/rds/send.c b/net/rds/send.c
index c555e12..166dd57 100644
--- a/net/rds/send.c
+++ b/net/rds/send.c
@@ -1277,12 +1277,13 @@ int rds_sendmsg(struct socket *sock, struct msghdr *msg, size_t payload_len)
 
 	/* rds_conn_create has a spinlock that runs with IRQ off.
 	 * Caching the conn in the socket helps a lot. */
-	if (rs->rs_conn && ipv6_addr_equal(&rs->rs_conn->c_faddr, &daddr)) {
+	if (rs->rs_conn && ipv6_addr_equal(&rs->rs_conn->c_faddr, &daddr) &&
+	    rs->rs_tos == rs->rs_conn->c_tos) {
 		conn = rs->rs_conn;
 	} else {
 		conn = rds_conn_create_outgoing(sock_net(sock->sk),
 						&rs->rs_bound_addr, &daddr,
-						rs->rs_transport, 0,
+						rs->rs_transport, rs->rs_tos,
 						sock->sk->sk_allocation,
 						scope_id);
 		if (IS_ERR(conn)) {
-- 
1.9.1
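
A note on the net/rds/ib.h hunk above: the new tos byte is carved out of
the former 32-bit reserved word, so the size of the private data and the
offsets of the fields after it should stay the same. Below is a minimal
userspace sketch of that layout argument. It assumes plain fixed-width
types in place of the kernel's u8/__be16/__be32/__be64, and the struct
and field names (priv_cmn_old, priv_cmn_new, tos, ...) are hypothetical
stand-ins, not the kernel's.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout before the patch: a single 32-bit reserved word. */
struct priv_cmn_old {
	uint8_t  protocol_major;
	uint8_t  protocol_minor;
	uint16_t protocol_minor_mask;	/* big-endian on the wire */
	uint32_t reserved1;
	uint64_t ack_seq;
	uint32_t credit;
};

/* Layout after the patch: the reserved word is split into tos + padding. */
struct priv_cmn_new {
	uint8_t  protocol_major;
	uint8_t  protocol_minor;
	uint16_t protocol_minor_mask;	/* big-endian on the wire */
	uint8_t  tos;			/* takes the first reserved byte */
	uint8_t  reserved1;
	uint16_t reserved2;
	uint64_t ack_seq;
	uint32_t credit;
};

/* The split must not change the overall size ... */
_Static_assert(sizeof(struct priv_cmn_old) == sizeof(struct priv_cmn_new),
	       "private data size changed");
/* ... the tos byte must sit where the reserved word used to start ... */
_Static_assert(offsetof(struct priv_cmn_new, tos) ==
	       offsetof(struct priv_cmn_old, reserved1),
	       "tos is not at the old reserved offset");
/* ... and the fields that follow must not move. */
_Static_assert(offsetof(struct priv_cmn_new, ack_seq) ==
	       offsetof(struct priv_cmn_old, ack_seq),
	       "ack_seq moved");
```

Under these assumptions, a peer that still zeroes the old reserved word
would be read as tos 0 by a peer using the new layout.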


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [net-next][PATCH 0/5] rds: add tos support
  2019-02-05  0:04 [net-next][PATCH 0/5] rds: add tos support Santosh Shilimkar
                   ` (4 preceding siblings ...)
  2019-02-05  0:04 ` [net-next][PATCH 5/5] rds: rdma: update rdma transport for tos Santosh Shilimkar
@ 2019-02-07  1:01 ` David Miller
  5 siblings, 0 replies; 22+ messages in thread
From: David Miller @ 2019-02-07  1:01 UTC (permalink / raw)
  To: santosh.shilimkar; +Cc: netdev, yanjun.zhu

From: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Date: Mon,  4 Feb 2019 16:04:44 -0800

> RDS applications make use of tos to classify database traffic.
> This feature has been used in shipping products from 2.6.32 based
> kernels. Its tied with RDS v4.1 protocol version and the compatibility
> gets negotiated as part of connections setup.
 ...
> Patchset is also available on below git tree.
> 
> The following changes since commit cc7335786f7278d66bdcf96d3d411edfcb01be51:
> 
>   socket: fix for Add SO_TIMESTAMP[NS]_NEW (2019-02-03 20:36:11 -0800)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/ssantosh/linux.git for_net-next-5.1/rds-tos-v4

Pulled, thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [net-next][PATCH 5/5] rds: rdma: update rdma transport for tos
  2019-02-05  0:04 ` [net-next][PATCH 5/5] rds: rdma: update rdma transport for tos Santosh Shilimkar
@ 2019-03-05 16:33   ` Gerd Rausch
  2019-03-05 16:41     ` Santosh Shilimkar
  0 siblings, 1 reply; 22+ messages in thread
From: Gerd Rausch @ 2019-03-05 16:33 UTC (permalink / raw)
  To: Santosh Shilimkar, netdev, davem; +Cc: yanjun.zhu

Hi,

This patchset breaks compatibility...

On 04/02/2019 16.04, Santosh Shilimkar wrote:
> --- a/net/rds/ib_cm.c
> +++ b/net/rds/ib_cm.c
> @@ -868,7 +870,7 @@ int rds_ib_cm_initiate_connect(struct rdma_cm_id *cm_id, bool isv6)
>  
>  	/* If the peer doesn't do protocol negotiation, we must
>  	 * default to RDSv3.0 */
> -	rds_ib_set_protocol(conn, RDS_PROTOCOL_VERSION);
> +	rds_ib_set_protocol(conn, RDS_PROTOCOL_4_1);
>  	ic->i_flowctl = rds_ib_sysctl_flow_control;	/* advertise flow control */
>  
>  	ret = rds_ib_setup_qp(conn);

The comment says to fall back to RDSv3.0, but the code assumes that v4.1 is
the new common standard.

If there's a mechanism that ensures compatibility with older (pre-4.1) versions
of RDS I am not seeing it.
The inconsistency in comment vs. code doesn't help in that regard.

And tests illustrated this incompatibility:
2 peers with this patchset can talk to each other.
Peers with a mix of post-this-patchset and pre-this-patchset can no longer talk
to each other.

Thanks,

  Gerd
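
For reference, the version selection that the patch changes in
rds_ib_protocol_compatible() can be sketched in userspace as follows.
This is a sketch under assumptions, not the kernel code: the numeric
values (a version encoded as major << 8 | minor, a supported-minor
bitmask of 0x3, v3.1 as the compat version) are assumed to mirror
net/rds, and negotiate() is a hypothetical helper that takes the peer's
mask already converted to host byte order and returns 0 for
"incompatible".

```c
#include <stdint.h>

/* Assumed encoding: major in the high byte, minor in the low byte. */
#define PROTO(maj, min)		(((maj) << 8) | (min))
#define PROTO_4_0		PROTO(4, 0)
#define PROTO_COMPAT		PROTO(3, 1)	/* v3.1 as compat version */
/* Assumed: bit n set means minor version n is supported (4.0 and 4.1). */
#define SUPPORTED_MINORS	0x0003u

/*
 * Pick the highest minor version both sides support when the peer
 * speaks major version 4; otherwise accept only the exact compat
 * version. Returns 0 when nothing matches.
 */
static unsigned int negotiate(unsigned int major, unsigned int minor,
			      uint16_t peer_mask)
{
	unsigned int common = peer_mask & SUPPORTED_MINORS;
	unsigned int version = 0;

	if (major == 4 && common) {
		version = PROTO_4_0;
		/* Each remaining higher bit bumps the minor version by one. */
		while ((common >>= 1) != 0)
			version++;
	} else if (PROTO(major, minor) == PROTO_COMPAT) {
		version = PROTO_COMPAT;
	}

	return version;
}
```

In this sketch a v3.0-only peer gets no common version at all, so
interoperability with such a peer would have to come from the reject
and retry path rather than from this function.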

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [net-next][PATCH 5/5] rds: rdma: update rdma transport for tos
  2019-03-05 16:33   ` Gerd Rausch
@ 2019-03-05 16:41     ` Santosh Shilimkar
  2019-03-05 16:48       ` Gerd Rausch
  0 siblings, 1 reply; 22+ messages in thread
From: Santosh Shilimkar @ 2019-03-05 16:41 UTC (permalink / raw)
  To: Gerd Rausch, netdev, davem; +Cc: yanjun.zhu

On 3/5/2019 8:33 AM, Gerd Rausch wrote:
> Hi,
> 
> This patchset breaks compatibility...
> 
> On 04/02/2019 16.04, Santosh Shilimkar wrote:
>> --- a/net/rds/ib_cm.c
>> +++ b/net/rds/ib_cm.c
>> @@ -868,7 +870,7 @@ int rds_ib_cm_initiate_connect(struct rdma_cm_id *cm_id, bool isv6)
>>   
>>   	/* If the peer doesn't do protocol negotiation, we must
>>   	 * default to RDSv3.0 */
>> -	rds_ib_set_protocol(conn, RDS_PROTOCOL_VERSION);
>> +	rds_ib_set_protocol(conn, RDS_PROTOCOL_4_1);
>>   	ic->i_flowctl = rds_ib_sysctl_flow_control;	/* advertise flow control */
>>   
>>   	ret = rds_ib_setup_qp(conn);
> 
> The comment says to fall back to RDSv3.0, but the code assumes that v4.1 is
> the new common standard.
> 
> If there's a mechanism that ensures compatibility with older (pre-4.1) versions
> of RDS I am not seeing it.
That's handled by the connection reject handler as part of
negotiation.

> The inconsistency in comment vs. code doesn't help in that regard.
>
Yeah the comment should have been updated.

> And tests illustrated this incompatibility:
> 2 peers with this patchset can talk to each other.
> Peers with a mix of post-this-patchset and pre-this-patchset can no longer talk
> to each other.
> 
They can talk to each other as per Yanjun's tests. He is working on
setting it up on net-next to see if something got missed. Stay tuned.
Will update with the results.

Regards,
Santosh

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [net-next][PATCH 5/5] rds: rdma: update rdma transport for tos
  2019-03-05 16:41     ` Santosh Shilimkar
@ 2019-03-05 16:48       ` Gerd Rausch
  2019-03-05 17:02         ` Santosh Shilimkar
                           ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Gerd Rausch @ 2019-03-05 16:48 UTC (permalink / raw)
  To: Santosh Shilimkar, netdev, davem; +Cc: yanjun.zhu

Hi Santosh,

On 05/03/2019 08.41, Santosh Shilimkar wrote:
> On 3/5/2019 8:33 AM, Gerd Rausch wrote:
>> If there's a mechanism that ensures compatibility with older (pre-4.1) versions
>> of RDS I am not seeing it.
> That's handled by the connection reject handler as part of negotiation.
> 

Evidently, that mechanism isn't working properly.

>> The inconsistency in comment vs. code doesn't help in that regard.
>>
> Yeah the comment should have been updated.
> 

>> And tests illustrated this incompatibility:
>> 2 peers with this patchset can talk to each other.
>> Peers with a mix of post-this-patchset and pre-this-patchset can no longer talk
>> to each other.
>>
> They can talk to each other as per Yanjun's tests.

I am happy to hear it worked for him, but is it possible that his tests may have been incomplete?

In a unicast e-mail conversation with Yanjun, he acknowledged:
"Now I found another hosts and I can reproduce Gerd's bug on the hosts. Now I am working on the hosts to find the root cause."

So we have non-working code in David's repository, which I consider not to be a good thing.

> He is working on setting it up on net-next to see if something got missed.
> Stay tuned. Will update with the results.
> 

Can we back this patchset out again until we have working & compatible code?

Thanks,

  Gerd



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [net-next][PATCH 5/5] rds: rdma: update rdma transport for tos
  2019-03-05 16:48       ` Gerd Rausch
@ 2019-03-05 17:02         ` Santosh Shilimkar
  2019-03-06  5:28         ` Yanjun Zhu
       [not found]         ` <20190306070409.26840-1-gerd.rausch@oracle.com>
  2 siblings, 0 replies; 22+ messages in thread
From: Santosh Shilimkar @ 2019-03-05 17:02 UTC (permalink / raw)
  To: Gerd Rausch, netdev, davem; +Cc: yanjun.zhu

On 3/5/2019 8:48 AM, Gerd Rausch wrote:
> Hi Santosh,
> 
> On 05/03/2019 08.41, Santosh Shilimkar wrote:
>> On 3/5/2019 8:33 AM, Gerd Rausch wrote:
>>> If there's a mechanism that ensures compatibility with older (pre-4.1) versions
>>> of RDS I am not seeing it.
>> That's handled by the connection reject handler as part of negotiation.
>>
> 
> Evidently, that mechanism isn't working properly.
> 
>>> The inconsistency in comment vs. code doesn't help in that regard.
>>>
>> Yeah the comment should have been updated.
>>
> 
>>> And tests illustrated this incompatibility:
>>> 2 peers with this patchset can talk to each other.
>>> Peers with a mix of post-this-patchset and pre-this-patchset can no longer talk
>>> to each other.
>>>
>> They can talk to each other as per Yanjun's tests.
> 
> I am happy to hear it worked for him, but is it possible that his tests may have been incomplete?
> 
> In a unicast e-mail conversation with Yanjun, he acknowledged:
> "Now I found another hosts and I can reproduce Gerd's bug on the hosts. Now I am working on the hosts to find the root cause."
>
Haven't heard that yet from him. Let me find out. Thanks for the heads up.

> So we have non-working code in David's repository, which I consider not to be a good thing.
> 
>> He is working on setting it up on net-next to see if something got missed.
>> Stay tuned. Will update with the results.
>>
> 
> Can we back this patchset out again until we have working & compatible code?
We can fix this for v5.1-rcX and also send a fix for v5.0 stable, since
only that version has the issue. Will send out a fix or a backout patch.

Many thanks for reporting it Gerd.

Regards,
Santosh


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [net-next][PATCH 5/5] rds: rdma: update rdma transport for tos
  2019-03-05 16:48       ` Gerd Rausch
  2019-03-05 17:02         ` Santosh Shilimkar
@ 2019-03-06  5:28         ` Yanjun Zhu
       [not found]         ` <20190306070409.26840-1-gerd.rausch@oracle.com>
  2 siblings, 0 replies; 22+ messages in thread
From: Yanjun Zhu @ 2019-03-06  5:28 UTC (permalink / raw)
  To: Gerd Rausch, Santosh Shilimkar, netdev, davem


On 2019/3/6 0:48, Gerd Rausch wrote:
> Hi Santosh,
>
> On 05/03/2019 08.41, Santosh Shilimkar wrote:
>> On 3/5/2019 8:33 AM, Gerd Rausch wrote:
>>> If there's a mechanism that ensures compatibility with older (pre-4.1) versions
>>> of RDS I am not seeing it.
>> That's handled by the connection reject handler as part of negotiation.
>>
> Evidently, that mechanism isn't working properly.
>
>>> The inconsistency in comment vs. code doesn't help in that regard.
>>>
>> Yeah the comment should have been updated.
>>
>>> And tests illustrated this incompatibility:
>>> 2 peers with this patchset can talk to each other.
>>> Peers with a mix of post-this-patchset and pre-this-patchset can no longer talk
>>> to each other.
>>>
>> They can talk to each other as per Yanjun's tests.
> I am happy to hear it worked for him, but is it possible that his tests may have been incomplete?
>
> In a unicast e-mail conversation with Yanjun, he acknowledged:
> "Now I found another hosts and I can reproduce Gerd's bug on the hosts. Now I am working on the hosts to find the root cause."

Sorry for the late reply. I explained it in detail. Just now I ran
tests with 4.20.0 (without QoS) and 5.0.0-rc8 (with QoS).

When rds-stress is run without RDMA, it works well.

With RDMA, there is a problem with rds-stress. This is a known
problem. Gerd has some commits to fix it.

Zhu Yanjun

>
> So we have non-working code in David's repository, which I consider not to be a good thing.
>
>> He is working on setting it up on net-next to see if something got missed.
>> Stay tuned. Will update with the results.
>>
> Can we back this patchset out again until we have working & compatible code?
>
> Thanks,
>
>    Gerd
>
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] net/rds: Accept peer connection reject messages due to incompatible version
       [not found]         ` <20190306070409.26840-1-gerd.rausch@oracle.com>
@ 2019-03-06  8:41           ` Yanjun Zhu
       [not found]             ` <20190307014920.24257-1-gerd.rausch@oracle.com>
  2019-03-06 17:55           ` [PATCH] " Santosh Shilimkar
  1 sibling, 1 reply; 22+ messages in thread
From: Yanjun Zhu @ 2019-03-06  8:41 UTC (permalink / raw)
  To: Gerd Rausch, Santosh Shilimkar, netdev; +Cc: David Miller


On 2019/3/6 15:04, Gerd Rausch wrote:
> Prior to
> commit d021fabf525ff ("rds: rdma: add consumer reject")
>
> function "rds_rdma_cm_event_handler_cmn" would always honor a rejected
> connection attempt by issuing a "rds_conn_drop".
>
> The commit mentioned above added a "break", eliminating
> the "fallthrough" case and made the "rds_conn_drop" rather conditional:
>
> Now it only happens if a "consumer defined" reject (i.e. "rdma_reject")
> carries an integer-value of "1" inside "private_data":
>
>>                 if (!conn)
>> +                       break;
>> +               err = (int *)rdma_consumer_reject_data(cm_id, event, &len);
>> +               if (!err || (err && ((*err) == RDS_RDMA_REJ_INCOMPAT))) {
>> +                       pr_warn("RDS/RDMA: conn <%pI6c, %pI6c> rejected, dropping connection\n",
>> +                               &conn->c_laddr, &conn->c_faddr);
>> +                       conn->c_proposed_version = RDS_PROTOCOL_COMPAT_VERSION;
>> +                       rds_conn_drop(conn);
>> +               }
>>                  rdsdebug("Connection rejected: %s\n",
>>                           rdma_reject_msg(cm_id, event->status));
>> +               break;
>>                  /* FALLTHROUGH */
> A number of issues are worth mentioning here:
>    #1) Previous versions of the RDS code simply rejected a connection
>        by calling "rdma_reject(cm_id, NULL, 0);"
>        So the value of the payload in "private_data" will not be "1",
>        but "0".
>
>    #2) Now the code has become dependent on host byte order and sizing.
>        If one peer is big-endian, the other is little-endian,
>        or there's a difference in sizeof(int) (e.g. ILP64 vs LP64),
>        the *err check does not work as intended.
>
>    #3) There is no check for "len" to see if the data behind *err is even valid.
>        Luckily, it appears that the "rdma_reject(cm_id, NULL, 0)" will always
>        carry 148 bytes of zeroized payload.
>        But that should probably not be relied upon here.
>
>    #4) With the added "break;",
>        we might as well drop the misleading "/* FALLTHROUGH */" comment.
>
> This commit does _not_ address issue #2, as the sender would have to
> agree on a byte order as well.
>
> Here is the sequence of messages in this observed error-scenario:
>    Host-A is pre-QoS changes (excluding the commit mentioned above)
>    Host-B is post-QoS changes (including the commit mentioned above)
>
>    #1 Host-B
>       issues a connection request via function "rds_conn_path_transition"
>       connection state transitions to "RDS_CONN_CONNECTING"
>
>    #2 Host-A
>       rejects the incompatible connection request (from #1)
>       It does so by calling "rdma_reject(cm_id, NULL, 0);"
>
>    #3 Host-B
>       receives an "RDMA_CM_EVENT_REJECTED" event (from #2)
>       But since the code is changed in the way described above,
>       it won't drop the connection here, simply because "*err == 0".
>
>    #4 Host-A
>       issues a connection request
>
>    #5 Host-B
>       receives an "RDMA_CM_EVENT_CONNECT_REQUEST" event
>       and ends up calling "rds_ib_cm_handle_connect".
>       But since the state is already in "RDS_CONN_CONNECTING"
>       (as of #1) it will end up issuing a "rdma_reject" without
>       dropping the connection:
>          if (rds_conn_state(conn) == RDS_CONN_CONNECTING) {
>              /* Wait and see - our connect may still be succeeding */
>              rds_ib_stats_inc(s_ib_connect_raced);
>          }
>          goto out;
>
>    #6 Host-A
>       receives an "RDMA_CM_EVENT_REJECTED" event (from #5),
>       drops the connection and tries again (goto #4) until it gives up.
>
> Orabug: 29444532

This is an internal bug reference. It should be removed.

Zhu Yanjun

>
> Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
> ---
>   net/rds/rdma_transport.c | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/net/rds/rdma_transport.c b/net/rds/rdma_transport.c
> index 46bce8389066..f628e7fda66d 100644
> --- a/net/rds/rdma_transport.c
> +++ b/net/rds/rdma_transport.c
> @@ -112,7 +112,7 @@ static int rds_rdma_cm_event_handler_cmn(struct rdma_cm_id *cm_id,
>   		if (!conn)
>   			break;
>   		err = (int *)rdma_consumer_reject_data(cm_id, event, &len);
> -		if (!err || (err && ((*err) == RDS_RDMA_REJ_INCOMPAT))) {
> +		if (!err || (err && len >= sizeof(*err) && ((*err) <= RDS_RDMA_REJ_INCOMPAT))) {
>   			pr_warn("RDS/RDMA: conn <%pI6c, %pI6c> rejected, dropping connection\n",
>   				&conn->c_laddr, &conn->c_faddr);
>   			conn->c_proposed_version = RDS_PROTOCOL_COMPAT_VERSION;
> @@ -122,7 +122,6 @@ static int rds_rdma_cm_event_handler_cmn(struct rdma_cm_id *cm_id,
>   		rdsdebug("Connection rejected: %s\n",
>   			 rdma_reject_msg(cm_id, event->status));
>   		break;
> -		/* FALLTHROUGH */
>   	case RDMA_CM_EVENT_ADDR_ERROR:
>   	case RDMA_CM_EVENT_ROUTE_ERROR:
>   	case RDMA_CM_EVENT_CONNECT_ERROR:

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] net/rds: Accept peer connection reject messages due to incompatible version
       [not found]         ` <20190306070409.26840-1-gerd.rausch@oracle.com>
  2019-03-06  8:41           ` [PATCH] net/rds: Accept peer connection reject messages due to incompatible version Yanjun Zhu
@ 2019-03-06 17:55           ` Santosh Shilimkar
  1 sibling, 0 replies; 22+ messages in thread
From: Santosh Shilimkar @ 2019-03-06 17:55 UTC (permalink / raw)
  To: Gerd Rausch, netdev, Yanjun Zhu; +Cc: David Miller

On 3/5/2019 11:04 PM, Gerd Rausch wrote:
> Prior to
> commit d021fabf525ff ("rds: rdma: add consumer reject")
> 
> function "rds_rdma_cm_event_handler_cmn" would always honor a rejected
> connection attempt by issuing a "rds_conn_drop".
> 
> The commit mentioned above added a "break", eliminating
> the "fallthrough" case and made the "rds_conn_drop" rather conditional:
> 
> Now it only happens if a "consumer defined" reject (i.e. "rdma_reject")
> carries an integer-value of "1" inside "private_data":
> 
>>                 if (!conn)
>> +                       break;
>> +               err = (int *)rdma_consumer_reject_data(cm_id, event, &len);
>> +               if (!err || (err && ((*err) == RDS_RDMA_REJ_INCOMPAT))) {
>> +                       pr_warn("RDS/RDMA: conn <%pI6c, %pI6c> rejected, dropping connection\n",
>> +                               &conn->c_laddr, &conn->c_faddr);
>> +                       conn->c_proposed_version = RDS_PROTOCOL_COMPAT_VERSION;
>> +                       rds_conn_drop(conn);
>> +               }
>>                  rdsdebug("Connection rejected: %s\n",
>>                           rdma_reject_msg(cm_id, event->status));
>> +               break;
>>                  /* FALLTHROUGH */
> 
> A number of issues are worth mentioning here:
>    #1) Previous versions of the RDS code simply rejected a connection
>        by calling "rdma_reject(cm_id, NULL, 0);"
>        So the value of the payload in "private_data" will not be "1",
>        but "0".
> 
>    #2) Now the code has become dependent on host byte order and sizing.
>        If one peer is big-endian, the other is little-endian,
>        or there's a difference in sizeof(int) (e.g. ILP64 vs LP64),
>        the *err check does not work as intended.
> 
>    #3) There is no check for "len" to see if the data behind *err is even valid.
>        Luckily, it appears that the "rdma_reject(cm_id, NULL, 0)" will always
>        carry 148 bytes of zeroized payload.
>        But that should probably not be relied upon here.
> 
>    #4) With the added "break;",
>        we might as well drop the misleading "/* FALLTHROUGH */" comment.
> 
> This commit does _not_ address issue #2, as the sender would have to
> agree on a byte order as well.
> 
> Here is the sequence of messages in this observed error-scenario:
>    Host-A is pre-QoS changes (excluding the commit mentioned above)
>    Host-B is post-QoS changes (including the commit mentioned above)
> 
>    #1 Host-B
>       issues a connection request via function "rds_conn_path_transition"
>       connection state transitions to "RDS_CONN_CONNECTING"
> 
>    #2 Host-A
>       rejects the incompatible connection request (from #1)
>       It does so by calling "rdma_reject(cm_id, NULL, 0);"
> 
>    #3 Host-B
>       receives an "RDMA_CM_EVENT_REJECTED" event (from #2)
>       But since the code is changed in the way described above,
>       it won't drop the connection here, simply because "*err == 0".
> 
>    #4 Host-A
>       issues a connection request
> 
>    #5 Host-B
>       receives an "RDMA_CM_EVENT_CONNECT_REQUEST" event
>       and ends up calling "rds_ib_cm_handle_connect".
>       But since the state is already in "RDS_CONN_CONNECTING"
>       (as of #1) it will end up issuing a "rdma_reject" without
>       dropping the connection:
>          if (rds_conn_state(conn) == RDS_CONN_CONNECTING) {
>              /* Wait and see - our connect may still be succeeding */
>              rds_ib_stats_inc(s_ib_connect_raced);
>          }
>          goto out;
> 
>    #6 Host-A
>       receives an "RDMA_CM_EVENT_REJECTED" event (from #5),
>       drops the connection and tries again (goto #4) until it gives up.
> 
> Orabug: 29444532
> 
> Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
> ---
>   net/rds/rdma_transport.c | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/net/rds/rdma_transport.c b/net/rds/rdma_transport.c
> index 46bce8389066..f628e7fda66d 100644
> --- a/net/rds/rdma_transport.c
> +++ b/net/rds/rdma_transport.c
> @@ -112,7 +112,7 @@ static int rds_rdma_cm_event_handler_cmn(struct rdma_cm_id *cm_id,
>   		if (!conn)
>   			break;
>   		err = (int *)rdma_consumer_reject_data(cm_id, event, &len);
> -		if (!err || (err && ((*err) == RDS_RDMA_REJ_INCOMPAT))) {
> +		if (!err || (err && len >= sizeof(*err) && ((*err) <= RDS_RDMA_REJ_INCOMPAT))) {
>   			pr_warn("RDS/RDMA: conn <%pI6c, %pI6c> rejected, dropping connection\n",
>   				&conn->c_laddr, &conn->c_faddr);
>   			conn->c_proposed_version = RDS_PROTOCOL_COMPAT_VERSION;
> @@ -122,7 +122,6 @@ static int rds_rdma_cm_event_handler_cmn(struct rdma_cm_id *cm_id,
>   		rdsdebug("Connection rejected: %s\n",
>   			 rdma_reject_msg(cm_id, event->status));
>   		break;
> -		/* FALLTHROUGH */
>   	case RDMA_CM_EVENT_ADDR_ERROR:
>   	case RDMA_CM_EVENT_ROUTE_ERROR:
>   	case RDMA_CM_EVENT_CONNECT_ERROR:
> 
I sent Yanjun a very similar test diff [1] to test yesterday... Thanks for
checking. Will submit a cleaned-up fix, Gerd.

From 82c638f11137e29220051c704b63e3d7ed7d44d0 Mon Sep 17 00:00:00 2001
From: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Date: Tue, 5 Mar 2019 18:19:52 -0800
Subject: [PATCH] TEST debug patch

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
---
  net/rds/rdma_transport.c |   16 ++++++++++++++++
  1 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/net/rds/rdma_transport.c b/net/rds/rdma_transport.c
index 46bce83..c7b1fff 100644
--- a/net/rds/rdma_transport.c
+++ b/net/rds/rdma_transport.c
@@ -109,8 +109,10 @@ static int rds_rdma_cm_event_handler_cmn(struct rdma_cm_id *cm_id,
  		break;

  	case RDMA_CM_EVENT_REJECTED:
+		err = (int *)event->param.conn.private_data;
  		if (!conn)
  			break;
+#if 0
  		err = (int *)rdma_consumer_reject_data(cm_id, event, &len);
  		if (!err || (err && ((*err) == RDS_RDMA_REJ_INCOMPAT))) {
  			pr_warn("RDS/RDMA: conn <%pI6c, %pI6c> rejected, dropping connection\n",
@@ -119,6 +121,20 @@ static int rds_rdma_cm_event_handler_cmn(struct rdma_cm_id *cm_id,
  			conn->c_tos = 0;
  			rds_conn_drop(conn);
  		}
+#endif
+		if (event->status == RDS_REJ_CONSUMER_DEFINED &&
+				(*err) == RDS_RDMA_REJ_INCOMPAT) {
+			
+			pr_warn("RDS/RDMA: conn <%pI6c, %pI6c> rejected, dropping connection\n",
+				&conn->c_laddr, &conn->c_faddr);
+			conn->c_proposed_version = RDS_PROTOCOL_COMPAT_VERSION;
+			conn->c_tos = 0;
+			rds_conn_drop(conn);
+		} else {
+			pr_debug("RDS/RDMA: conn <%pI6c, %pI6c> rejected, dropping connection\n",
+				&conn->c_laddr, &conn->c_faddr);
+			rds_conn_drop(conn);
+		}
  		rdsdebug("Connection rejected: %s\n",
  			 rdma_reject_msg(cm_id, event->status));
  		break;
-- 
1.7.1
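
The difference between the reject check added by the consumer-reject
commit and the one proposed in Gerd's patch can be compared in a small
userspace sketch. It assumes RDS_RDMA_REJ_INCOMPAT is 1 (the thread says
the check looks for an integer value of "1" in private_data).
drop_old() and drop_new() are hypothetical helper names, and len is
modelled as size_t rather than the kernel's type.

```c
#include <stdbool.h>
#include <stddef.h>

#define REJ_INCOMPAT 1	/* assumed value of RDS_RDMA_REJ_INCOMPAT */

/* Check from the consumer-reject commit: drop only on an exact match. */
static bool drop_old(const int *err, size_t len)
{
	(void)len;	/* the original check never looked at the length */
	return !err || *err == REJ_INCOMPAT;
}

/*
 * Check from the proposed fix: require enough payload to hold an int
 * and also accept a zero code, which is what a peer calling
 * rdma_reject(cm_id, NULL, 0) is observed to deliver.
 */
static bool drop_new(const int *err, size_t len)
{
	return !err || (len >= sizeof(*err) && *err <= REJ_INCOMPAT);
}
```

With a zeroed 148-byte payload from an older peer, drop_old() returns
false (the connection stays in CONNECTING), while drop_new() returns
true, so the connection is dropped and retried with the compat version.
Like the patch, the sketch does not address the byte-order issue (#2).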


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [net-next PATCH v2] net/rds: Accept peer connection reject messages due to incompatible version
       [not found]             ` <20190307014920.24257-1-gerd.rausch@oracle.com>
@ 2019-03-07  1:55               ` Santosh Shilimkar
  2019-03-07  2:09                 ` Yanjun Zhu
  0 siblings, 1 reply; 22+ messages in thread
From: Santosh Shilimkar @ 2019-03-07  1:55 UTC (permalink / raw)
  To: Gerd Rausch, Yanjun Zhu, netdev; +Cc: David Miller



On 3/6/2019 5:49 PM, Gerd Rausch wrote:
> Prior to
> commit d021fabf525ff ("rds: rdma: add consumer reject")
> 
> function "rds_rdma_cm_event_handler_cmn" would always honor a rejected
> connection attempt by issuing a "rds_conn_drop".
> 
> The commit mentioned above added a "break", eliminating
> the "fallthrough" case and made the "rds_conn_drop" rather conditional:
> 
> Now it only happens if a "consumer defined" reject (i.e. "rdma_reject")
> carries an integer-value of "1" inside "private_data":
> 
>>                 if (!conn)
>> +                       break;
>> +               err = (int *)rdma_consumer_reject_data(cm_id, event, &len);
>> +               if (!err || (err && ((*err) == RDS_RDMA_REJ_INCOMPAT))) {
>> +                       pr_warn("RDS/RDMA: conn <%pI6c, %pI6c> rejected, dropping connection\n",
>> +                               &conn->c_laddr, &conn->c_faddr);
>> +                       conn->c_proposed_version = RDS_PROTOCOL_COMPAT_VERSION;
>> +                       rds_conn_drop(conn);
>> +               }
>>                  rdsdebug("Connection rejected: %s\n",
>>                           rdma_reject_msg(cm_id, event->status));
>> +               break;
>>                  /* FALLTHROUGH */
> 
> A number of issues are worth mentioning here:
>    #1) Previous versions of the RDS code simply rejected a connection
>        by calling "rdma_reject(cm_id, NULL, 0);"
>        So the value of the payload in "private_data" will not be "1",
>        but "0".
> 
>    #2) Now the code has become dependent on host byte order and sizing.
>        If one peer is big-endian, the other is little-endian,
>        or there's a difference in sizeof(int) (e.g. ILP64 vs LP64),
>        the *err check does not work as intended.
> 
>    #3) There is no check for "len" to see if the data behind *err is even valid.
>        Luckily, it appears that the "rdma_reject(cm_id, NULL, 0)" will always
>        carry 148 bytes of zeroized payload.
>        But that should probably not be relied upon here.
> 
>    #4) With the added "break;",
>        we might as well drop the misleading "/* FALLTHROUGH */" comment.
> 
> This commit does _not_ address issue #2, as the sender would have to
> agree on a byte order as well.
> 
> Here is the sequence of messages in this observed error-scenario:
>    Host-A is pre-QoS changes (excluding the commit mentioned above)
>    Host-B is post-QoS changes (including the commit mentioned above)
> 
>    #1 Host-B
>       issues a connection request via function "rds_conn_path_transition"
>       connection state transitions to "RDS_CONN_CONNECTING"
> 
>    #2 Host-A
>       rejects the incompatible connection request (from #1)
>       It does so by calling "rdma_reject(cm_id, NULL, 0);"
> 
>    #3 Host-B
>       receives an "RDMA_CM_EVENT_REJECTED" event (from #2)
>       But since the code is changed in the way described above,
>       it won't drop the connection here, simply because "*err == 0".
> 
>    #4 Host-A
>       issues a connection request
> 
>    #5 Host-B
>       receives an "RDMA_CM_EVENT_CONNECT_REQUEST" event
>       and ends up calling "rds_ib_cm_handle_connect".
>       But since the state is already in "RDS_CONN_CONNECTING"
>       (as of #1) it will end up issuing a "rdma_reject" without
>       dropping the connection:
>          if (rds_conn_state(conn) == RDS_CONN_CONNECTING) {
>              /* Wait and see - our connect may still be succeeding */
>              rds_ib_stats_inc(s_ib_connect_raced);
>          }
>          goto out;
> 
>    #6 Host-A
>       receives an "RDMA_CM_EVENT_REJECTED" event (from #5),
>       drops the connection and tries again (goto #4) until it gives up.
> 
> Fixes: d021fabf525ff ("rds: rdma: add consumer reject")
> Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
> ---
>   net/rds/rdma_transport.c | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
> 
> Changes in submitted patch v2:
> * Dropped the "Orabug:" line from the commit-log message (as requested)
> * Added a "Fixes:" line to the commit-log-message
> 
Thanks Gerd for posting an update. The fix looks correct as already
mentioned in earlier post.
FWIW,
Acked-by: Santosh Shilimkar<santosh.shilimkar@oracle.com>

Hi Yanjun,
Please provide your tested-by since you mentioned offlist that
so far you are unable to reproduce the issue on net-next. Thanks.

Regards,
Santosh
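
Issues #2 and #3 quoted above (byte-order dependence and the missing
"len" check) can be illustrated with a small user-space sketch. This is
not the posted patch, which explicitly leaves issue #2 alone. The names
reject_code_decode() and should_drop_on_reject(), the fixed 32-bit
big-endian wire format, and the drop policy are assumptions made for
illustration; only the values 0 (old peers) and 1 (incompatible) come
from the discussion.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reject codes; the thread only mentions the values 0 and 1. */
enum { REJ_NONE = 0, REJ_INCOMPAT = 1 };

/*
 * Decode a reject code from a private_data payload.
 * Assumption: the sender encodes it as a 32-bit big-endian value, so the
 * result does not depend on host byte order or sizeof(int) (issue #2).
 * Returns false if the payload is missing or too short (issue #3).
 */
static bool reject_code_decode(const void *data, size_t len, uint32_t *code)
{
	const uint8_t *p = data;

	if (!p || len < sizeof(uint32_t))
		return false;

	*code = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
		((uint32_t)p[2] << 8) | (uint32_t)p[3];
	return true;
}

/*
 * One possible drop policy (an assumption, not the kernel's code):
 * drop when there is no usable payload, when an older peer sent an
 * all-zero payload (issue #1), or when the peer reported "incompatible".
 */
static bool should_drop_on_reject(const void *data, size_t len)
{
	uint32_t code;

	if (!reject_code_decode(data, len, &code))
		return true;

	return code == REJ_NONE || code == REJ_INCOMPAT;
}
```

With such a policy, the zeroed payload sent by an older peer in step #2
of the scenario would lead to a drop in step #3, so the connection would
not stay in the connecting state.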


* Re: [net-next PATCH v2] net/rds: Accept peer connection reject messages due to incompatible version
  2019-03-07  1:55               ` [net-next PATCH v2] " Santosh Shilimkar
@ 2019-03-07  2:09                 ` Yanjun Zhu
  2019-03-07  3:28                   ` Yanjun Zhu
  0 siblings, 1 reply; 22+ messages in thread
From: Yanjun Zhu @ 2019-03-07  2:09 UTC (permalink / raw)
  To: Santosh Shilimkar, Gerd Rausch, netdev; +Cc: David Miller


On 2019/3/7 9:55, Santosh Shilimkar wrote:
>
>
> On 3/6/2019 5:49 PM, Gerd Rausch wrote:
>> Prior to
>> commit d021fabf525ff ("rds: rdma: add consumer reject")
>>
>> function "rds_rdma_cm_event_handler_cmn" would always honor a rejected
>> connection attempt by issuing a "rds_conn_drop".
>>
>> The commit mentioned above added a "break", eliminating
>> the "fallthrough" case and made the "rds_conn_drop" rather conditional:
>>
>> Now it only happens if a "consumer defined" reject (i.e. "rdma_reject")
>> carries an integer-value of "1" inside "private_data":
>>
>>>                 if (!conn)
>>> +                       break;
>>> +               err = (int *)rdma_consumer_reject_data(cm_id, event, &len);
>>> +               if (!err || (err && ((*err) == RDS_RDMA_REJ_INCOMPAT))) {
>>> +                       pr_warn("RDS/RDMA: conn <%pI6c, %pI6c> rejected, dropping connection\n",
>>> +                               &conn->c_laddr, &conn->c_faddr);
>>> +                       conn->c_proposed_version = RDS_PROTOCOL_COMPAT_VERSION;
>>> +                       rds_conn_drop(conn);
>>> +               }
>>>                  rdsdebug("Connection rejected: %s\n",
>>>                           rdma_reject_msg(cm_id, event->status));
>>> +               break;
>>>                  /* FALLTHROUGH */
>>
>> A number of issues are worth mentioning here:
>>    #1) Previous versions of the RDS code simply rejected a connection
>>        by calling "rdma_reject(cm_id, NULL, 0);"
>>        So the value of the payload in "private_data" will not be "1",
>>        but "0".
>>
>>    #2) Now the code has become dependent on host byte order and sizing.
>>        If one peer is big-endian, the other is little-endian,
>>        or there's a difference in sizeof(int) (e.g. ILP64 vs LP64),
>>        the *err check does not work as intended.
>>
>>    #3) There is no check for "len" to see if the data behind *err is even valid.
>>        Luckily, it appears that the "rdma_reject(cm_id, NULL, 0)" will always
>>        carry 148 bytes of zeroized payload.
>>        But that should probably not be relied upon here.
>>
>>    #4) With the added "break;",
>>        we might as well drop the misleading "/* FALLTHROUGH */" comment.
>>
>> This commit does _not_ address issue #2, as the sender would have to
>> agree on a byte order as well.
>>
>> Here is the sequence of messages in this observed error-scenario:
>>    Host-A is pre-QoS changes (excluding the commit mentioned above)
>>    Host-B is post-QoS changes (including the commit mentioned above)
>>
>>    #1 Host-B
>>       issues a connection request via function "rds_conn_path_transition"
>>       connection state transitions to "RDS_CONN_CONNECTING"
>>
>>    #2 Host-A
>>       rejects the incompatible connection request (from #1)
>>       It does so by calling "rdma_reject(cm_id, NULL, 0);"
>>
>>    #3 Host-B
>>       receives an "RDMA_CM_EVENT_REJECTED" event (from #2)
>>       But since the code is changed in the way described above,
>>       it won't drop the connection here, simply because "*err == 0".
>>
>>    #4 Host-A
>>       issues a connection request
>>
>>    #5 Host-B
>>       receives an "RDMA_CM_EVENT_CONNECT_REQUEST" event
>>       and ends up calling "rds_ib_cm_handle_connect".
>>       But since the state is already in "RDS_CONN_CONNECTING"
>>       (as of #1) it will end up issuing a "rdma_reject" without
>>       dropping the connection:
>>          if (rds_conn_state(conn) == RDS_CONN_CONNECTING) {
>>              /* Wait and see - our connect may still be succeeding */
>>              rds_ib_stats_inc(s_ib_connect_raced);
>>          }
>>          goto out;
>>
>>    #6 Host-A
>>       receives an "RDMA_CM_EVENT_REJECTED" event (from #5),
>>       drops the connection and tries again (goto #4) until it gives up.
>>
>> Fixes: d021fabf525ff ("rds: rdma: add consumer reject")
>> Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
>> ---
>>   net/rds/rdma_transport.c | 3 +--
>>   1 file changed, 1 insertion(+), 2 deletions(-)
>>
>> Changes in submitted patch v2:
>> * Dropped the "Orabug:" line from the commit-log message (as requested)
>> * Added a "Fixes:" line to the commit-log-message
>>
> Thanks Gerd for posting an update. The fix looks correct as already
> mentioned in earlier post.
> FWIW,
> Acked-by: Santosh Shilimkar<santosh.shilimkar@oracle.com>
>
> Hi Yanjun,
> Please provide your tested-by since you mentioned offlist that

OK. I am now working with Gerd to reproduce this bug.

I think this problem can be reproduced, since Gerd is confident about it.

So I am sending my tested-by in advance.

Reviewed-and-tested-by: Zhu Yanjun <yanjun.zhu@oracle.com>

Thanks,

Zhu Yanjun

> so far you are unable to reproduce the issue on net-next. Thanks.
>
> Regards,
> Santosh


* Re: [net-next PATCH v2] net/rds: Accept peer connection reject messages due to incompatible version
  2019-03-07  2:09                 ` Yanjun Zhu
@ 2019-03-07  3:28                   ` Yanjun Zhu
  0 siblings, 0 replies; 22+ messages in thread
From: Yanjun Zhu @ 2019-03-07  3:28 UTC (permalink / raw)
  To: Santosh Shilimkar, Gerd Rausch, netdev; +Cc: David Miller


On 2019/3/7 10:09, Yanjun Zhu wrote:
>
> On 2019/3/7 9:55, Santosh Shilimkar wrote:
>>
>>
>> On 3/6/2019 5:49 PM, Gerd Rausch wrote:
>>> Prior to
>>> commit d021fabf525ff ("rds: rdma: add consumer reject")
>>>
>>> function "rds_rdma_cm_event_handler_cmn" would always honor a rejected
>>> connection attempt by issuing a "rds_conn_drop".
>>>
>>> The commit mentioned above added a "break", eliminating
>>> the "fallthrough" case and made the "rds_conn_drop" rather conditional:
>>>
>>> Now it only happens if a "consumer defined" reject (i.e. "rdma_reject")
>>> carries an integer-value of "1" inside "private_data":
>>>
>>>>                 if (!conn)
>>>> +                       break;
>>>> +               err = (int *)rdma_consumer_reject_data(cm_id, event, &len);
>>>> +               if (!err || (err && ((*err) == RDS_RDMA_REJ_INCOMPAT))) {
>>>> +                       pr_warn("RDS/RDMA: conn <%pI6c, %pI6c> rejected, dropping connection\n",
>>>> +                               &conn->c_laddr, &conn->c_faddr);
>>>> +                       conn->c_proposed_version = RDS_PROTOCOL_COMPAT_VERSION;
>>>> +                       rds_conn_drop(conn);
>>>> +               }
>>>>                  rdsdebug("Connection rejected: %s\n",
>>>>                           rdma_reject_msg(cm_id, event->status));
>>>> +               break;
>>>>                  /* FALLTHROUGH */
>>>
>>> A number of issues are worth mentioning here:
>>>    #1) Previous versions of the RDS code simply rejected a connection
>>>        by calling "rdma_reject(cm_id, NULL, 0);"
>>>        So the value of the payload in "private_data" will not be "1",
>>>        but "0".
>>>
>>>    #2) Now the code has become dependent on host byte order and sizing.
>>>        If one peer is big-endian, the other is little-endian,
>>>        or there's a difference in sizeof(int) (e.g. ILP64 vs LP64),
>>>        the *err check does not work as intended.
>>>
>>>    #3) There is no check for "len" to see if the data behind *err is even valid.
>>>        Luckily, it appears that the "rdma_reject(cm_id, NULL, 0)" will always
>>>        carry 148 bytes of zeroized payload.
>>>        But that should probably not be relied upon here.
>>>
>>>    #4) With the added "break;",
>>>        we might as well drop the misleading "/* FALLTHROUGH */" comment.
>>>
>>> This commit does _not_ address issue #2, as the sender would have to
>>> agree on a byte order as well.
>>>
>>> Here is the sequence of messages in this observed error-scenario:
>>>    Host-A is pre-QoS changes (excluding the commit mentioned above)
>>>    Host-B is post-QoS changes (including the commit mentioned above)
>>>
>>>    #1 Host-B
>>>       issues a connection request via function "rds_conn_path_transition"
>>>       connection state transitions to "RDS_CONN_CONNECTING"
>>>
>>>    #2 Host-A
>>>       rejects the incompatible connection request (from #1)
>>>       It does so by calling "rdma_reject(cm_id, NULL, 0);"
>>>
>>>    #3 Host-B
>>>       receives an "RDMA_CM_EVENT_REJECTED" event (from #2)
>>>       But since the code is changed in the way described above,
>>>       it won't drop the connection here, simply because "*err == 0".
>>>
>>>    #4 Host-A
>>>       issues a connection request
>>>
>>>    #5 Host-B
>>>       receives an "RDMA_CM_EVENT_CONNECT_REQUEST" event
>>>       and ends up calling "rds_ib_cm_handle_connect".
>>>       But since the state is already in "RDS_CONN_CONNECTING"
>>>       (as of #1) it will end up issuing a "rdma_reject" without
>>>       dropping the connection:
>>>          if (rds_conn_state(conn) == RDS_CONN_CONNECTING) {
>>>              /* Wait and see - our connect may still be succeeding */
>>>              rds_ib_stats_inc(s_ib_connect_raced);
>>>          }
>>>          goto out;
>>>
>>>    #6 Host-A
>>>       receives an "RDMA_CM_EVENT_REJECTED" event (from #5),
>>>       drops the connection and tries again (goto #4) until it gives up.
>>>
>>> Fixes: d021fabf525ff ("rds: rdma: add consumer reject")
>>> Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
>>> ---
>>>   net/rds/rdma_transport.c | 3 +--
>>>   1 file changed, 1 insertion(+), 2 deletions(-)
>>>
>>> Changes in submitted patch v2:
>>> * Dropped the "Orabug:" line from the commit-log message (as requested)
>>> * Added a "Fixes:" line to the commit-log-message
>>>
>> Thanks Gerd for posting an update. The fix looks correct as already
>> mentioned in earlier post.
>> FWIW,
>> Acked-by: Santosh Shilimkar<santosh.shilimkar@oracle.com>
>>
>> Hi Yanjun,
>> Please provide your tested-by since you mentioned offlist that
>
> OK. I am now working with Gerd to reproduce this bug.
>
> I think this problem can be reproduced, since Gerd is confident about it.

Sorry. The HCA device in my test env is

CA 'mlx4_0'
         CA type: MT26428
         Number of ports: 2
         Firmware version: 2.11.2010
         Hardware version: b0
         Node GUID: 0x0002c903000a7a30
         System image GUID: 0x0002c903000a7a33
         Port 1:
                 State: Active
                 Physical state: LinkUp
                 Rate: 40
                 Base lid: 50
                 LMC: 0
                 SM lid: 26
                 Capability mask: 0x02590868
                 Port GUID: 0x0002c903000a7a31
                 Link layer: InfiniBand
         Port 2:
                 State: Active
                 Physical state: LinkUp
                 Rate: 40
                 Base lid: 51
                 LMC: 0
                 SM lid: 26
                 Capability mask: 0x02590868
                 Port GUID: 0x0002c903000a7a32
                 Link layer: InfiniBand
And from Gerd

"

The setup I use that ran into the issue right away is comprised of:
2 CX4 HCAs wired up back to back in RoCE mode (LINK_TYPE=ETH)

"

Perhaps the hardware causes this problem. Since I cannot reproduce this bug
and so cannot test this patch, but I did review it, I am changing my tag to this:

Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>

Thanks, Gerd.

Zhu Yanjun

>
> So I am sending my tested-by in advance.
>
> Reviewed-and-tested-by: Zhu Yanjun <yanjun.zhu@oracle.com>
>
> Thanks,
>
> Zhu Yanjun
>
>> so far you are unable to reproduce the issue on net-next. Thanks.
>>
>> Regards,
>> Santosh


* Re: [net-next PATCH] net/rds: Return proper "tos" value to user-space
       [not found]   ` <20190307220106.9099-1-gerd.rausch@oracle.com>
@ 2019-03-08  1:16     ` Yanjun Zhu
  2019-03-08  1:37     ` santosh.shilimkar
  1 sibling, 0 replies; 22+ messages in thread
From: Yanjun Zhu @ 2019-03-08  1:16 UTC (permalink / raw)
  To: Gerd Rausch, Santosh Shilimkar, netdev; +Cc: David Miller


On 2019/3/8 6:01, Gerd Rausch wrote:
> The proper "tos" value needs to be returned
> to user-space (sockopt RDS_INFO_CONNECTIONS).
>
> Fixes: 3eb450367d08 ("rds: add type of service(tos) infrastructure")
> Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>

In RDS/IB, tos is set in this function. Are you still using a RoCE device?

static int rds_ib_conn_info_visitor(struct rds_connection *conn,
                                     void *buffer)
{
         struct rds_info_rdma_connection *iinfo = buffer;
         struct rds_ib_connection *ic;

         /* We will only ever look at IB transports */
         if (conn->c_trans != &rds_ib_transport)
                 return 0;
         if (conn->c_isv6)
                 return 0;

         iinfo->src_addr = conn->c_laddr.s6_addr32[3];
         iinfo->dst_addr = conn->c_faddr.s6_addr32[3];
         iinfo->tos = conn->c_tos;

         memset(&iinfo->src_gid, 0, sizeof(iinfo->src_gid));
         memset(&iinfo->dst_gid, 0, sizeof(iinfo->dst_gid));
         if (rds_conn_state(conn) == RDS_CONN_UP) {
                 struct rds_ib_device *rds_ibdev;

                 ic = conn->c_transport_data;

                 rdma_read_gids(ic->i_cm_id, (union ib_gid *)&iinfo->src_gid,
                                (union ib_gid *)&iinfo->dst_gid);

                 rds_ibdev = ic->rds_ibdev;
                 iinfo->max_send_wr = ic->i_send_ring.w_nr;
                 iinfo->max_recv_wr = ic->i_recv_ring.w_nr;
                 iinfo->max_send_sge = rds_ibdev->max_sge;
                 rds_ib_get_mr_info(rds_ibdev, iinfo);
         }
         return 1;
}

Run rds-info, and you can see the tos reported by the above function.
> ---
>   net/rds/connection.c | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/net/rds/connection.c b/net/rds/connection.c
> index 7ea134f9a825..ed7f2133acc2 100644
> --- a/net/rds/connection.c
> +++ b/net/rds/connection.c
> @@ -736,6 +736,7 @@ static int rds_conn_info_visitor(struct rds_conn_path *cp, void *buffer)
>   	cinfo->next_rx_seq = cp->cp_next_rx_seq;
>   	cinfo->laddr = conn->c_laddr.s6_addr32[3];
>   	cinfo->faddr = conn->c_faddr.s6_addr32[3];
> +	cinfo->tos = conn->c_tos;

Without this commit, what will happen?

Zhu Yanjun

>   	strncpy(cinfo->transport, conn->c_trans->t_name,
>   		sizeof(cinfo->transport));
>   	cinfo->flags = 0;


* Re: [net-next PATCH] net/rds: Return proper "tos" value to user-space
       [not found]   ` <20190307220106.9099-1-gerd.rausch@oracle.com>
  2019-03-08  1:16     ` [net-next PATCH] net/rds: Return proper "tos" value to user-space Yanjun Zhu
@ 2019-03-08  1:37     ` santosh.shilimkar
  2019-03-08 22:37       ` Gerd Rausch
  1 sibling, 1 reply; 22+ messages in thread
From: santosh.shilimkar @ 2019-03-08  1:37 UTC (permalink / raw)
  To: Gerd Rausch, Yanjun Zhu, netdev; +Cc: David Miller

On 3/7/19 2:01 PM, Gerd Rausch wrote:
> The proper "tos" value needs to be returned
> to user-space (sockopt RDS_INFO_CONNECTIONS).
> 
> Fixes: 3eb450367d08 ("rds: add type of service(tos) infrastructure")
> Signed-off-by: Gerd Rausch <gerd.rausch@oracle.com>
> ---
>   net/rds/connection.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/net/rds/connection.c b/net/rds/connection.c
> index 7ea134f9a825..ed7f2133acc2 100644
> --- a/net/rds/connection.c
> +++ b/net/rds/connection.c
> @@ -736,6 +736,7 @@ static int rds_conn_info_visitor(struct rds_conn_path *cp, void *buffer)
>   	cinfo->next_rx_seq = cp->cp_next_rx_seq;
>   	cinfo->laddr = conn->c_laddr.s6_addr32[3];
>   	cinfo->faddr = conn->c_faddr.s6_addr32[3];
> +	cinfo->tos = conn->c_tos;
>   	strncpy(cinfo->transport, conn->c_trans->t_name,
>   		sizeof(cinfo->transport));
>   	cinfo->flags = 0;
> 
The transport function populates "iinfo->tos", so 'rds-info -I'
should already be showing the correct output, but we should populate
it here too for the socket option, so this looks good.

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
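
The point made above, that the transport report and the generic
connection report are filled in by separate visitors, can be pictured
with a small user-space sketch. The structures and function names below
are simplified, hypothetical stand-ins for the kernel ones, and zeroing
the output buffer first is an assumption of this sketch rather than
something stated in the thread.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct conn {
	uint32_t laddr;
	uint32_t faddr;
	uint8_t tos;
};

/* Generic per-connection report (the "rds-info -n" style path). */
struct conn_info {
	uint32_t laddr;
	uint32_t faddr;
	uint8_t tos;
};

/* Transport-specific report (the "rds-info -I" style path). */
struct ib_conn_info {
	uint32_t src_addr;
	uint32_t dst_addr;
	uint8_t tos;
};

/* Each visitor must copy tos itself; neither path inherits it from the other. */
static void conn_info_visitor(const struct conn *c, struct conn_info *out)
{
	memset(out, 0, sizeof(*out));
	out->laddr = c->laddr;
	out->faddr = c->faddr;
	out->tos = c->tos;	/* corresponds to the line added by the patch */
}

static void ib_conn_info_visitor(const struct conn *c, struct ib_conn_info *out)
{
	memset(out, 0, sizeof(*out));
	out->src_addr = c->laddr;
	out->dst_addr = c->faddr;
	out->tos = c->tos;	/* already present in the transport path */
}
```

If the tos assignment were missing from conn_info_visitor(), only the
transport-specific report would carry the connection's tos value.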



* Re: [net-next PATCH] net/rds: Return proper "tos" value to user-space
  2019-03-08  1:37     ` santosh.shilimkar
@ 2019-03-08 22:37       ` Gerd Rausch
  2019-03-08 22:54         ` Santosh Shilimkar
  2019-03-08 23:57         ` Zhu Yanjun
  0 siblings, 2 replies; 22+ messages in thread
From: Gerd Rausch @ 2019-03-08 22:37 UTC (permalink / raw)
  To: santosh.shilimkar, Yanjun Zhu, netdev; +Cc: David Miller

On 07/03/2019 17.37, santosh.shilimkar@oracle.com wrote:
>> --- a/net/rds/connection.c
>> +++ b/net/rds/connection.c
>> @@ -736,6 +736,7 @@ static int rds_conn_info_visitor(struct rds_conn_path *cp, void *buffer)
>>       cinfo->next_rx_seq = cp->cp_next_rx_seq;
>>       cinfo->laddr = conn->c_laddr.s6_addr32[3];
>>       cinfo->faddr = conn->c_faddr.s6_addr32[3];
>> +    cinfo->tos = conn->c_tos;
>>       strncpy(cinfo->transport, conn->c_trans->t_name,
>>           sizeof(cinfo->transport));
>>       cinfo->flags = 0;
>>
> The transport function populates "iinfo->tos", so 'rds-info -I'
> should already be showing the correct output, but we should populate
> it here too for the socket option, so this looks good.
> 

"rds-info -I" did show the correct output, but

"rds-info -n" did not:

% rds-info -n
RDS Connections:
      LocalAddr      RemoteAddr  Tos           NextTX           NextRX Flgs
  192.168.253.2   192.168.253.1  159            46940            46264 --C-

> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
> 

Thanks,

  Gerd


* Re: [net-next PATCH] net/rds: Return proper "tos" value to user-space
  2019-03-08 22:37       ` Gerd Rausch
@ 2019-03-08 22:54         ` Santosh Shilimkar
  2019-03-08 23:57         ` Zhu Yanjun
  1 sibling, 0 replies; 22+ messages in thread
From: Santosh Shilimkar @ 2019-03-08 22:54 UTC (permalink / raw)
  To: Gerd Rausch, Yanjun Zhu, netdev; +Cc: David Miller

On 3/8/2019 2:37 PM, Gerd Rausch wrote:
> On 07/03/2019 17.37, santosh.shilimkar@oracle.com wrote:
>>> --- a/net/rds/connection.c
>>> +++ b/net/rds/connection.c
>>> @@ -736,6 +736,7 @@ static int rds_conn_info_visitor(struct rds_conn_path *cp, void *buffer)
>>>        cinfo->next_rx_seq = cp->cp_next_rx_seq;
>>>        cinfo->laddr = conn->c_laddr.s6_addr32[3];
>>>        cinfo->faddr = conn->c_faddr.s6_addr32[3];
>>> +    cinfo->tos = conn->c_tos;
>>>        strncpy(cinfo->transport, conn->c_trans->t_name,
>>>            sizeof(cinfo->transport));
>>>        cinfo->flags = 0;
>>>
>> The transport function populates "iinfo->tos", so 'rds-info -I'
>> should already be showing the correct output, but we should populate
>> it here too for the socket option, so this looks good.
>>
> 
> "rds-info -I" did show the correct output, but
> 
> "rds-info -n" did not:
>
That's what I said. Thanks for confirming it :-)

Regards,
Santosh


* Re: [net-next PATCH] net/rds: Return proper "tos" value to user-space
  2019-03-08 22:37       ` Gerd Rausch
  2019-03-08 22:54         ` Santosh Shilimkar
@ 2019-03-08 23:57         ` Zhu Yanjun
  1 sibling, 0 replies; 22+ messages in thread
From: Zhu Yanjun @ 2019-03-08 23:57 UTC (permalink / raw)
  To: Gerd Rausch, santosh.shilimkar, netdev; +Cc: David Miller


On 2019/3/9 6:37, Gerd Rausch wrote:
> On 07/03/2019 17.37, santosh.shilimkar@oracle.com wrote:
>>> --- a/net/rds/connection.c
>>> +++ b/net/rds/connection.c
>>> @@ -736,6 +736,7 @@ static int rds_conn_info_visitor(struct rds_conn_path *cp, void *buffer)
>>>        cinfo->next_rx_seq = cp->cp_next_rx_seq;
>>>        cinfo->laddr = conn->c_laddr.s6_addr32[3];
>>>        cinfo->faddr = conn->c_faddr.s6_addr32[3];
>>> +    cinfo->tos = conn->c_tos;
>>>        strncpy(cinfo->transport, conn->c_trans->t_name,
>>>            sizeof(cinfo->transport));
>>>        cinfo->flags = 0;
>>>
>> The transport function populates "iinfo->tos", so 'rds-info -I'
>> should already be showing the correct output, but we should populate
>> it here too for the socket option, so this looks good.
>>
> "rds-info -I" did show the correct output, but
>
> "rds-info -n" did not:

Thanks.

Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>


>
> % rds-info -n
> RDS Connections:
>        LocalAddr      RemoteAddr  Tos           NextTX           NextRX Flgs
>    192.168.253.2   192.168.253.1  159            46940            46264 --C-
>
>> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
>>
> Thanks,
>
>    Gerd


end of thread, other threads:[~2019-03-08 23:57 UTC | newest]

Thread overview: 22+ messages
2019-02-05  0:04 [net-next][PATCH 0/5] rds: add tos support Santosh Shilimkar
2019-02-05  0:04 ` [net-next][PATCH 1/5] rds: make v3.1 as compat version Santosh Shilimkar
2019-02-05  0:04 ` [net-next][PATCH 2/5] rds: rdma: add consumer reject Santosh Shilimkar
2019-02-05  0:04 ` [net-next][PATCH 3/5] rds: add type of service(tos) infrastructure Santosh Shilimkar
     [not found]   ` <20190307220106.9099-1-gerd.rausch@oracle.com>
2019-03-08  1:16     ` [net-next PATCH] net/rds: Return proper "tos" value to user-space Yanjun Zhu
2019-03-08  1:37     ` santosh.shilimkar
2019-03-08 22:37       ` Gerd Rausch
2019-03-08 22:54         ` Santosh Shilimkar
2019-03-08 23:57         ` Zhu Yanjun
2019-02-05  0:04 ` [net-next][PATCH 4/5] rds: add transport specific tos_map hook Santosh Shilimkar
2019-02-05  0:04 ` [net-next][PATCH 5/5] rds: rdma: update rdma transport for tos Santosh Shilimkar
2019-03-05 16:33   ` Gerd Rausch
2019-03-05 16:41     ` Santosh Shilimkar
2019-03-05 16:48       ` Gerd Rausch
2019-03-05 17:02         ` Santosh Shilimkar
2019-03-06  5:28         ` Yanjun Zhu
     [not found]         ` <20190306070409.26840-1-gerd.rausch@oracle.com>
2019-03-06  8:41           ` [PATCH] net/rds: Accept peer connection reject messages due to incompatible version Yanjun Zhu
     [not found]             ` <20190307014920.24257-1-gerd.rausch@oracle.com>
2019-03-07  1:55               ` [net-next PATCH v2] " Santosh Shilimkar
2019-03-07  2:09                 ` Yanjun Zhu
2019-03-07  3:28                   ` Yanjun Zhu
2019-03-06 17:55           ` [PATCH] " Santosh Shilimkar
2019-02-07  1:01 ` [net-next][PATCH 0/5] rds: add tos support David Miller
