netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [net-next 1/3] tipc: support broadcast/replicast configurable for bc-link
@ 2019-02-19 11:30 Hoang Le
  2019-02-19 11:30 ` [net-next 2/3] tipc: introduce new capability flag for cluster Hoang Le
  2019-02-19 11:30 ` [net-next 3/3] tipc: smooth change between replicast and broadcast Hoang Le
  0 siblings, 2 replies; 4+ messages in thread
From: Hoang Le @ 2019-02-19 11:30 UTC (permalink / raw)
  To: jon.maloy, maloy, ying.xue, netdev, tipc-discussion

Currently, a multicast stream uses either broadcast or replicast as
transmission method, based on the ratio between number of actual
destinations nodes and cluster size.

However, when an L2 interface (e.g., VXLAN) provides pseudo
broadcast support, this becomes very inefficient, as it blindly
replicates multicast packets to all cluster/subnet nodes,
irrespective of whether they host actual target sockets or not.

The TIPC multicast algorithm is able to distinguish real destination
nodes from other nodes, and hence provides a smarter and more
efficient method for transferring multicast messages than
pseudo broadcast can do.

Because of this, we now make it possible for users to force
the broadcast link to permanently switch to using replicast,
irrespective of which capabilities the bearer provides,
or pretend to provide.
Conversely, we also make it possible to force the broadcast link
to always use true broadcast. While maybe less useful in
deployed systems, this may at least be useful for testing the
broadcast algorithm in small clusters.

We retain the current AUTOSELECT ability, i.e., to let the broadcast link
automatically select which algorithm to use, and to switch back and forth
between broadcast and replicast as the ratio between destination
node number and cluster size changes. This remains the default method.

Furthermore, we make it possible to configure the threshold ratio for
such switches. The default ratio is now set to 10%, down from 25% in the
earlier implementation.

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
---
 include/uapi/linux/tipc_netlink.h |   2 +
 net/tipc/bcast.c                  | 104 ++++++++++++++++++++++++++++--
 net/tipc/bcast.h                  |   7 ++
 net/tipc/link.c                   |   8 +++
 net/tipc/netlink.c                |   4 +-
 5 files changed, 120 insertions(+), 5 deletions(-)

diff --git a/include/uapi/linux/tipc_netlink.h b/include/uapi/linux/tipc_netlink.h
index 0ebe02ef1a86..efb958fd167d 100644
--- a/include/uapi/linux/tipc_netlink.h
+++ b/include/uapi/linux/tipc_netlink.h
@@ -281,6 +281,8 @@ enum {
 	TIPC_NLA_PROP_TOL,		/* u32 */
 	TIPC_NLA_PROP_WIN,		/* u32 */
 	TIPC_NLA_PROP_MTU,		/* u32 */
+	TIPC_NLA_PROP_BROADCAST,	/* u32 */
+	TIPC_NLA_PROP_BROADCAST_RATIO,	/* u32 */
 
 	__TIPC_NLA_PROP_MAX,
 	TIPC_NLA_PROP_MAX = __TIPC_NLA_PROP_MAX - 1
diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index d8026543bf4c..12b59268bdd6 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -54,7 +54,9 @@ const char tipc_bclink_name[] = "broadcast-link";
  * @dests: array keeping number of reachable destinations per bearer
  * @primary_bearer: a bearer having links to all broadcast destinations, if any
  * @bcast_support: indicates if primary bearer, if any, supports broadcast
+ * @force_bcast: forces broadcast for multicast traffic
  * @rcast_support: indicates if all peer nodes support replicast
+ * @force_rcast: forces replicast for multicast traffic
  * @rc_ratio: dest count as percentage of cluster size where send method changes
  * @bc_threshold: calculated from rc_ratio; if dests > threshold use broadcast
  */
@@ -64,7 +66,9 @@ struct tipc_bc_base {
 	int dests[MAX_BEARERS];
 	int primary_bearer;
 	bool bcast_support;
+	bool force_bcast;
 	bool rcast_support;
+	bool force_rcast;
 	int rc_ratio;
 	int bc_threshold;
 };
@@ -485,10 +489,63 @@ static int tipc_bc_link_set_queue_limits(struct net *net, u32 limit)
 	return 0;
 }
 
+static int tipc_bc_link_set_broadcast_mode(struct net *net, u32 bc_mode)
+{
+	struct tipc_bc_base *bb = tipc_bc_base(net);
+
+	switch (bc_mode) {
+	case BCLINK_MODE_BCAST:
+		if (!bb->bcast_support)
+			return -ENOPROTOOPT;
+
+		bb->force_bcast = true;
+		bb->force_rcast = false;
+		break;
+	case BCLINK_MODE_RCAST:
+		if (!bb->rcast_support)
+			return -ENOPROTOOPT;
+
+		bb->force_bcast = false;
+		bb->force_rcast = true;
+		break;
+	case BCLINK_MODE_SEL:
+		if (!bb->bcast_support || !bb->rcast_support)
+			return -ENOPROTOOPT;
+
+		bb->force_bcast = false;
+		bb->force_rcast = false;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int tipc_bc_link_set_broadcast_ratio(struct net *net, u32 bc_ratio)
+{
+	struct tipc_bc_base *bb = tipc_bc_base(net);
+
+	if (!bb->bcast_support || !bb->rcast_support)
+		return -ENOPROTOOPT;
+
+	if (bc_ratio > 100 || bc_ratio <= 0)
+		return -EINVAL;
+
+	bb->rc_ratio = bc_ratio;
+	tipc_bcast_lock(net);
+	tipc_bcbase_calc_bc_threshold(net);
+	tipc_bcast_unlock(net);
+
+	return 0;
+}
+
 int tipc_nl_bc_link_set(struct net *net, struct nlattr *attrs[])
 {
 	int err;
 	u32 win;
+	u32 bc_mode;
+	u32 bc_ratio;
 	struct nlattr *props[TIPC_NLA_PROP_MAX + 1];
 
 	if (!attrs[TIPC_NLA_LINK_PROP])
@@ -498,12 +555,28 @@ int tipc_nl_bc_link_set(struct net *net, struct nlattr *attrs[])
 	if (err)
 		return err;
 
-	if (!props[TIPC_NLA_PROP_WIN])
+	if (!props[TIPC_NLA_PROP_WIN] &&
+	    !props[TIPC_NLA_PROP_BROADCAST] &&
+	    !props[TIPC_NLA_PROP_BROADCAST_RATIO]) {
 		return -EOPNOTSUPP;
+	}
+
+	if (props[TIPC_NLA_PROP_BROADCAST]) {
+		bc_mode = nla_get_u32(props[TIPC_NLA_PROP_BROADCAST]);
+		err = tipc_bc_link_set_broadcast_mode(net, bc_mode);
+	}
 
-	win = nla_get_u32(props[TIPC_NLA_PROP_WIN]);
+	if (!err && props[TIPC_NLA_PROP_BROADCAST_RATIO]) {
+		bc_ratio = nla_get_u32(props[TIPC_NLA_PROP_BROADCAST_RATIO]);
+		err = tipc_bc_link_set_broadcast_ratio(net, bc_ratio);
+	}
 
-	return tipc_bc_link_set_queue_limits(net, win);
+	if (!err && props[TIPC_NLA_PROP_WIN]) {
+		win = nla_get_u32(props[TIPC_NLA_PROP_WIN]);
+		err = tipc_bc_link_set_queue_limits(net, win);
+	}
+
+	return err;
 }
 
 int tipc_bcast_init(struct net *net)
@@ -529,7 +602,7 @@ int tipc_bcast_init(struct net *net)
 		goto enomem;
 	bb->link = l;
 	tn->bcl = l;
-	bb->rc_ratio = 25;
+	bb->rc_ratio = 10;
 	bb->rcast_support = true;
 	return 0;
 enomem:
@@ -576,3 +649,26 @@ void tipc_nlist_purge(struct tipc_nlist *nl)
 	nl->remote = 0;
 	nl->local = false;
 }
+
+u32 tipc_bcast_get_broadcast_mode(struct net *net)
+{
+	struct tipc_bc_base *bb = tipc_bc_base(net);
+
+	if (bb->force_bcast)
+		return BCLINK_MODE_BCAST;
+
+	if (bb->force_rcast)
+		return BCLINK_MODE_RCAST;
+
+	if (bb->bcast_support && bb->rcast_support)
+		return BCLINK_MODE_SEL;
+
+	return 0;
+}
+
+u32 tipc_bcast_get_broadcast_ratio(struct net *net)
+{
+	struct tipc_bc_base *bb = tipc_bc_base(net);
+
+	return bb->rc_ratio;
+}
diff --git a/net/tipc/bcast.h b/net/tipc/bcast.h
index 751530ab0c49..37c55e7347a5 100644
--- a/net/tipc/bcast.h
+++ b/net/tipc/bcast.h
@@ -48,6 +48,10 @@ extern const char tipc_bclink_name[];
 
 #define TIPC_METHOD_EXPIRE msecs_to_jiffies(5000)
 
+#define BCLINK_MODE_BCAST  0x1
+#define BCLINK_MODE_RCAST  0x2
+#define BCLINK_MODE_SEL    0x4
+
 struct tipc_nlist {
 	struct list_head list;
 	u32 self;
@@ -92,6 +96,9 @@ int tipc_nl_add_bc_link(struct net *net, struct tipc_nl_msg *msg);
 int tipc_nl_bc_link_set(struct net *net, struct nlattr *attrs[]);
 int tipc_bclink_reset_stats(struct net *net);
 
+u32 tipc_bcast_get_broadcast_mode(struct net *net);
+u32 tipc_bcast_get_broadcast_ratio(struct net *net);
+
 static inline void tipc_bcast_lock(struct net *net)
 {
 	spin_lock_bh(&tipc_net(net)->bclock);
diff --git a/net/tipc/link.c b/net/tipc/link.c
index 341ecd796aa4..52d23b3ffaf5 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -2197,6 +2197,8 @@ int tipc_nl_add_bc_link(struct net *net, struct tipc_nl_msg *msg)
 	struct nlattr *attrs;
 	struct nlattr *prop;
 	struct tipc_net *tn = net_generic(net, tipc_net_id);
+	u32 bc_mode = tipc_bcast_get_broadcast_mode(net);
+	u32 bc_ratio = tipc_bcast_get_broadcast_ratio(net);
 	struct tipc_link *bcl = tn->bcl;
 
 	if (!bcl)
@@ -2233,6 +2235,12 @@ int tipc_nl_add_bc_link(struct net *net, struct tipc_nl_msg *msg)
 		goto attr_msg_full;
 	if (nla_put_u32(msg->skb, TIPC_NLA_PROP_WIN, bcl->window))
 		goto prop_msg_full;
+	if (nla_put_u32(msg->skb, TIPC_NLA_PROP_BROADCAST, bc_mode))
+		goto prop_msg_full;
+	if (bc_mode & BCLINK_MODE_SEL)
+		if (nla_put_u32(msg->skb, TIPC_NLA_PROP_BROADCAST_RATIO,
+				bc_ratio))
+			goto prop_msg_full;
 	nla_nest_end(msg->skb, prop);
 
 	err = __tipc_nl_add_bc_link_stat(msg->skb, &bcl->stats);
diff --git a/net/tipc/netlink.c b/net/tipc/netlink.c
index 99ee419210ba..5240f64e8ccc 100644
--- a/net/tipc/netlink.c
+++ b/net/tipc/netlink.c
@@ -110,7 +110,9 @@ const struct nla_policy tipc_nl_prop_policy[TIPC_NLA_PROP_MAX + 1] = {
 	[TIPC_NLA_PROP_UNSPEC]		= { .type = NLA_UNSPEC },
 	[TIPC_NLA_PROP_PRIO]		= { .type = NLA_U32 },
 	[TIPC_NLA_PROP_TOL]		= { .type = NLA_U32 },
-	[TIPC_NLA_PROP_WIN]		= { .type = NLA_U32 }
+	[TIPC_NLA_PROP_WIN]		= { .type = NLA_U32 },
+	[TIPC_NLA_PROP_BROADCAST]	= { .type = NLA_U32 },
+	[TIPC_NLA_PROP_BROADCAST_RATIO]	= { .type = NLA_U32 }
 };
 
 const struct nla_policy tipc_nl_bearer_policy[TIPC_NLA_BEARER_MAX + 1]	= {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [net-next 2/3] tipc: introduce new capability flag for cluster
  2019-02-19 11:30 [net-next 1/3] tipc: support broadcast/replicast configurable for bc-link Hoang Le
@ 2019-02-19 11:30 ` Hoang Le
  2019-02-19 11:30 ` [net-next 3/3] tipc: smooth change between replicast and broadcast Hoang Le
  1 sibling, 0 replies; 4+ messages in thread
From: Hoang Le @ 2019-02-19 11:30 UTC (permalink / raw)
  To: jon.maloy, maloy, ying.xue, netdev, tipc-discussion

As a preparation for introducing a moothly switching between replicast
and broadcast method for multicast message. We have to introduce a new
capability flag TIPC_MCAST_RBCTL to handle this new feature because of
compatibility reasons.
When a cluster upgrade a node can come back with this new capabilities
which also must be reflected in the cluster capabilities field and new
feature only applicable if the cluster supports this new capability.

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
---
 net/tipc/core.c |  2 ++
 net/tipc/core.h |  3 +++
 net/tipc/node.c | 18 ++++++++++++++++++
 net/tipc/node.h |  6 ++++--
 4 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/net/tipc/core.c b/net/tipc/core.c
index 5b38f5164281..27cccd101ef6 100644
--- a/net/tipc/core.c
+++ b/net/tipc/core.c
@@ -43,6 +43,7 @@
 #include "net.h"
 #include "socket.h"
 #include "bcast.h"
+#include "node.h"
 
 #include <linux/module.h>
 
@@ -59,6 +60,7 @@ static int __net_init tipc_init_net(struct net *net)
 	tn->node_addr = 0;
 	tn->trial_addr = 0;
 	tn->addr_trial_end = 0;
+	tn->capabilities = TIPC_NODE_CAPABILITIES;
 	memset(tn->node_id, 0, sizeof(tn->node_id));
 	memset(tn->node_id_string, 0, sizeof(tn->node_id_string));
 	tn->mon_threshold = TIPC_DEF_MON_THRESHOLD;
diff --git a/net/tipc/core.h b/net/tipc/core.h
index 8020a6c360ff..7a68e1b6a066 100644
--- a/net/tipc/core.h
+++ b/net/tipc/core.h
@@ -122,6 +122,9 @@ struct tipc_net {
 	/* Topology subscription server */
 	struct tipc_topsrv *topsrv;
 	atomic_t subscription_count;
+
+	/* Cluster capabilities */
+	u16 capabilities;
 };
 
 static inline struct tipc_net *tipc_net(struct net *net)
diff --git a/net/tipc/node.c b/net/tipc/node.c
index 2dc4919ab23c..2717893e9dbe 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -383,6 +383,11 @@ static struct tipc_node *tipc_node_create(struct net *net, u32 addr,
 				tipc_link_update_caps(l, capabilities);
 		}
 		write_unlock_bh(&n->lock);
+		/* Calculate cluster capabilities */
+		tn->capabilities = TIPC_NODE_CAPABILITIES;
+		list_for_each_entry_rcu(temp_node, &tn->node_list, list) {
+			tn->capabilities &= temp_node->capabilities;
+		}
 		goto exit;
 	}
 	n = kzalloc(sizeof(*n), GFP_ATOMIC);
@@ -433,6 +438,11 @@ static struct tipc_node *tipc_node_create(struct net *net, u32 addr,
 			break;
 	}
 	list_add_tail_rcu(&n->list, &temp_node->list);
+	/* Calculate cluster capabilities */
+	tn->capabilities = TIPC_NODE_CAPABILITIES;
+	list_for_each_entry_rcu(temp_node, &tn->node_list, list) {
+		tn->capabilities &= temp_node->capabilities;
+	}
 	trace_tipc_node_create(n, true, " ");
 exit:
 	spin_unlock_bh(&tn->node_list_lock);
@@ -589,6 +599,7 @@ static void  tipc_node_clear_links(struct tipc_node *node)
  */
 static bool tipc_node_cleanup(struct tipc_node *peer)
 {
+	struct tipc_node *temp_node;
 	struct tipc_net *tn = tipc_net(peer->net);
 	bool deleted = false;
 
@@ -604,6 +615,13 @@ static bool tipc_node_cleanup(struct tipc_node *peer)
 		deleted = true;
 	}
 	tipc_node_write_unlock(peer);
+
+	/* Calculate cluster capabilities */
+	tn->capabilities = TIPC_NODE_CAPABILITIES;
+	list_for_each_entry_rcu(temp_node, &tn->node_list, list) {
+		tn->capabilities &= temp_node->capabilities;
+	}
+
 	spin_unlock_bh(&tn->node_list_lock);
 	return deleted;
 }
diff --git a/net/tipc/node.h b/net/tipc/node.h
index 4f59a30e989a..2404225c5d58 100644
--- a/net/tipc/node.h
+++ b/net/tipc/node.h
@@ -51,7 +51,8 @@ enum {
 	TIPC_BLOCK_FLOWCTL    = (1 << 3),
 	TIPC_BCAST_RCAST      = (1 << 4),
 	TIPC_NODE_ID128       = (1 << 5),
-	TIPC_LINK_PROTO_SEQNO = (1 << 6)
+	TIPC_LINK_PROTO_SEQNO = (1 << 6),
+	TIPC_MCAST_RBCTL      = (1 << 7)
 };
 
 #define TIPC_NODE_CAPABILITIES (TIPC_SYN_BIT           |  \
@@ -60,7 +61,8 @@ enum {
 				TIPC_BCAST_RCAST       |   \
 				TIPC_BLOCK_FLOWCTL     |   \
 				TIPC_NODE_ID128        |   \
-				TIPC_LINK_PROTO_SEQNO)
+				TIPC_LINK_PROTO_SEQNO  |   \
+				TIPC_MCAST_RBCTL)
 #define INVALID_BEARER_ID -1
 
 void tipc_node_stop(struct net *net);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [net-next 3/3] tipc: smooth change between replicast and broadcast
  2019-02-19 11:30 [net-next 1/3] tipc: support broadcast/replicast configurable for bc-link Hoang Le
  2019-02-19 11:30 ` [net-next 2/3] tipc: introduce new capability flag for cluster Hoang Le
@ 2019-02-19 11:30 ` Hoang Le
  2019-02-22  0:00   ` David Miller
  1 sibling, 1 reply; 4+ messages in thread
From: Hoang Le @ 2019-02-19 11:30 UTC (permalink / raw)
  To: jon.maloy, maloy, ying.xue, netdev, tipc-discussion

Currently, a multicast stream may start out using replicast, because
there are few destinations, and then it should ideally switch to
L2/broadcast IGMP/multicast when the number of destinations grows beyond
a certain limit. The opposite should happen when the number decreases
below the limit.

To eliminate the risk of message reordering caused by method change,
a sending socket must stick to a previously selected method until it
enters an idle period of 5 seconds. Means there is a 5 seconds pause
in the traffic from the sender socket.

If the sender never makes such a pause, the method will never change,
and transmission may become very inefficient as the cluster grows.

With this commit, we allow such a switch between replicast and
broadcast without any need for a traffic pause.

Solution is to send a dummy message with only the header, also with
the SYN bit set, via broadcast or replicast. For the data message,
the SYN bit is set and sending via replicast or broadcast (inverse
method with dummy).

Then, at receiving side any messages follow first SYN bit message
(data or dummy message), they will be held in deferred queue until
another pair (dummy or data message) arrived in other link.

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Hoang Le <hoang.h.le@dektech.com.au>
---
 net/tipc/bcast.c  | 165 +++++++++++++++++++++++++++++++++++++++++++++-
 net/tipc/bcast.h  |   5 ++
 net/tipc/msg.h    |  10 +++
 net/tipc/socket.c |   5 ++
 4 files changed, 184 insertions(+), 1 deletion(-)

diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
index 12b59268bdd6..de11a036821e 100644
--- a/net/tipc/bcast.c
+++ b/net/tipc/bcast.c
@@ -220,9 +220,24 @@ static void tipc_bcast_select_xmit_method(struct net *net, int dests,
 	}
 	/* Can current method be changed ? */
 	method->expires = jiffies + TIPC_METHOD_EXPIRE;
-	if (method->mandatory || time_before(jiffies, exp))
+	if (method->mandatory)
 		return;
 
+	if (!(tipc_net(net)->capabilities & TIPC_MCAST_RBCTL) &&
+	    time_before(jiffies, exp))
+		return;
+
+	/* Configuration as force 'broadcast' method */
+	if (bb->force_bcast) {
+		method->rcast = false;
+		return;
+	}
+	/* Configuration as force 'replicast' method */
+	if (bb->force_rcast) {
+		method->rcast = true;
+		return;
+	}
+	/* Configuration as 'autoselect' or default method */
 	/* Determine method to use now */
 	method->rcast = dests <= bb->bc_threshold;
 }
@@ -285,6 +300,63 @@ static int tipc_rcast_xmit(struct net *net, struct sk_buff_head *pkts,
 	return 0;
 }
 
+/* tipc_mcast_send_sync - deliver a dummy message with SYN bit
+ * @net: the applicable net namespace
+ * @skb: socket buffer to copy
+ * @method: send method to be used
+ * @dests: destination nodes for message.
+ * @cong_link_cnt: returns number of encountered congested destination links
+ * Returns 0 if success, otherwise errno
+ */
+static int tipc_mcast_send_sync(struct net *net, struct sk_buff *skb,
+				struct tipc_mc_method *method,
+				struct tipc_nlist *dests,
+				u16 *cong_link_cnt)
+{
+	struct sk_buff_head tmpq;
+	struct sk_buff *_skb;
+	struct tipc_msg *hdr, *_hdr;
+
+	/* Is a cluster supporting with new capabilities ? */
+	if (!(tipc_net(net)->capabilities & TIPC_MCAST_RBCTL))
+		return 0;
+
+	hdr = buf_msg(skb);
+	if (msg_user(hdr) == MSG_FRAGMENTER)
+		hdr = msg_get_wrapped(hdr);
+	if (msg_type(hdr) != TIPC_MCAST_MSG)
+		return 0;
+
+	/* Allocate dummy message */
+	_skb = tipc_buf_acquire(MCAST_H_SIZE, GFP_KERNEL);
+	if (!skb)
+		return -ENOMEM;
+
+	/* Preparing for 'synching' header */
+	msg_set_syn(hdr, 1);
+
+	/* Copy skb's header into a dummy header */
+	skb_copy_to_linear_data(_skb, hdr, MCAST_H_SIZE);
+	skb_orphan(_skb);
+
+	/* Reverse method for dummy message */
+	_hdr = buf_msg(_skb);
+	msg_set_size(_hdr, MCAST_H_SIZE);
+	msg_set_is_rcast(_hdr, !msg_is_rcast(hdr));
+
+	skb_queue_head_init(&tmpq);
+	__skb_queue_tail(&tmpq, _skb);
+	if (method->rcast)
+		tipc_bcast_xmit(net, &tmpq, cong_link_cnt);
+	else
+		tipc_rcast_xmit(net, &tmpq, dests, cong_link_cnt);
+
+	/* This queue should normally be empty by now */
+	__skb_queue_purge(&tmpq);
+
+	return 0;
+}
+
 /* tipc_mcast_xmit - deliver message to indicated destination nodes
  *                   and to identified node local sockets
  * @net: the applicable net namespace
@@ -300,6 +372,9 @@ int tipc_mcast_xmit(struct net *net, struct sk_buff_head *pkts,
 		    u16 *cong_link_cnt)
 {
 	struct sk_buff_head inputq, localq;
+	struct sk_buff *skb;
+	struct tipc_msg *hdr;
+	bool rcast = method->rcast;
 	int rc = 0;
 
 	skb_queue_head_init(&inputq);
@@ -313,6 +388,18 @@ int tipc_mcast_xmit(struct net *net, struct sk_buff_head *pkts,
 	/* Send according to determined transmit method */
 	if (dests->remote) {
 		tipc_bcast_select_xmit_method(net, dests->remote, method);
+
+		skb = skb_peek(pkts);
+		hdr = buf_msg(skb);
+		if (msg_user(hdr) == MSG_FRAGMENTER)
+			hdr = msg_get_wrapped(hdr);
+		msg_set_is_rcast(hdr, method->rcast);
+
+		/* Switch method ? */
+		if (rcast != method->rcast)
+			tipc_mcast_send_sync(net, skb, method,
+					     dests, cong_link_cnt);
+
 		if (method->rcast)
 			rc = tipc_rcast_xmit(net, pkts, dests, cong_link_cnt);
 		else
@@ -672,3 +759,79 @@ u32 tipc_bcast_get_broadcast_ratio(struct net *net)
 
 	return bb->rc_ratio;
 }
+
+void tipc_mcast_filter_msg(struct sk_buff_head *defq,
+			   struct sk_buff_head *inputq)
+{
+	struct sk_buff *skb, *_skb, *tmp;
+	struct tipc_msg *hdr, *_hdr;
+	bool match = false;
+	u32 node, port;
+
+	skb = skb_peek(inputq);
+	hdr = buf_msg(skb);
+
+	if (likely(!msg_is_syn(hdr) && skb_queue_empty(defq)))
+		return;
+
+	node = msg_orignode(hdr);
+	port = msg_origport(hdr);
+
+	/* Has the twin SYN message already arrived ? */
+	skb_queue_walk(defq, _skb) {
+		_hdr = buf_msg(_skb);
+		if (msg_orignode(_hdr) != node)
+			continue;
+		if (msg_origport(_hdr) != port)
+			continue;
+		match = true;
+		break;
+	}
+
+	if (!match) {
+		if (!msg_is_syn(hdr))
+			return;
+		__skb_dequeue(inputq);
+		__skb_queue_tail(defq, skb);
+		return;
+	}
+
+	/* Deliver non-SYN message from other link, otherwise queue it */
+	if (!msg_is_syn(hdr)) {
+		if (msg_is_rcast(hdr) != msg_is_rcast(_hdr))
+			return;
+		__skb_dequeue(inputq);
+		__skb_queue_tail(defq, skb);
+		return;
+	}
+
+	/* Queue non-SYN/SYN message from same link */
+	if (msg_is_rcast(hdr) == msg_is_rcast(_hdr)) {
+		__skb_dequeue(inputq);
+		__skb_queue_tail(defq, skb);
+		return;
+	}
+
+	/* Matching SYN messages => return the one with data, if any */
+	__skb_unlink(_skb, defq);
+	if (msg_data_sz(hdr)) {
+		kfree_skb(_skb);
+	} else {
+		__skb_dequeue(inputq);
+		kfree_skb(skb);
+		__skb_queue_tail(inputq, _skb);
+	}
+
+	/* Deliver subsequent non-SYN messages from same peer */
+	skb_queue_walk_safe(defq, _skb, tmp) {
+		_hdr = buf_msg(_skb);
+		if (msg_orignode(_hdr) != node)
+			continue;
+		if (msg_origport(_hdr) != port)
+			continue;
+		if (msg_is_syn(_hdr))
+			break;
+		__skb_unlink(_skb, defq);
+		__skb_queue_tail(inputq, _skb);
+	}
+}
diff --git a/net/tipc/bcast.h b/net/tipc/bcast.h
index 37c55e7347a5..484bde289d3a 100644
--- a/net/tipc/bcast.h
+++ b/net/tipc/bcast.h
@@ -67,11 +67,13 @@ void tipc_nlist_del(struct tipc_nlist *nl, u32 node);
 /* Cookie to be used between socket and broadcast layer
  * @rcast: replicast (instead of broadcast) was used at previous xmit
  * @mandatory: broadcast/replicast indication was set by user
+ * @deferredq: defer queue to make message in order
  * @expires: re-evaluate non-mandatory transmit method if we are past this
  */
 struct tipc_mc_method {
 	bool rcast;
 	bool mandatory;
+	struct sk_buff_head deferredq;
 	unsigned long expires;
 };
 
@@ -99,6 +101,9 @@ int tipc_bclink_reset_stats(struct net *net);
 u32 tipc_bcast_get_broadcast_mode(struct net *net);
 u32 tipc_bcast_get_broadcast_ratio(struct net *net);
 
+void tipc_mcast_filter_msg(struct sk_buff_head *defq,
+			   struct sk_buff_head *inputq);
+
 static inline void tipc_bcast_lock(struct net *net)
 {
 	spin_lock_bh(&tipc_net(net)->bclock);
diff --git a/net/tipc/msg.h b/net/tipc/msg.h
index d7e4b8b93f9d..528ba9241acc 100644
--- a/net/tipc/msg.h
+++ b/net/tipc/msg.h
@@ -257,6 +257,16 @@ static inline void msg_set_src_droppable(struct tipc_msg *m, u32 d)
 	msg_set_bits(m, 0, 18, 1, d);
 }
 
+static inline bool msg_is_rcast(struct tipc_msg *m)
+{
+	return msg_bits(m, 0, 18, 0x1);
+}
+
+static inline void msg_set_is_rcast(struct tipc_msg *m, bool d)
+{
+	msg_set_bits(m, 0, 18, 0x1, d);
+}
+
 static inline void msg_set_size(struct tipc_msg *m, u32 sz)
 {
 	m->hdr[0] = htonl((msg_word(m, 0) & ~0x1ffff) | sz);
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 8fc5acd4820d..de83eb1e718e 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -483,6 +483,7 @@ static int tipc_sk_create(struct net *net, struct socket *sock,
 		tsk_set_unreturnable(tsk, true);
 		if (sock->type == SOCK_DGRAM)
 			tsk_set_unreliable(tsk, true);
+		__skb_queue_head_init(&tsk->mc_method.deferredq);
 	}
 
 	trace_tipc_sk_create(sk, NULL, TIPC_DUMP_NONE, " ");
@@ -580,6 +581,7 @@ static int tipc_release(struct socket *sock)
 	sk->sk_shutdown = SHUTDOWN_MASK;
 	tipc_sk_leave(tsk);
 	tipc_sk_withdraw(tsk, 0, NULL);
+	__skb_queue_purge(&tsk->mc_method.deferredq);
 	sk_stop_timer(sk, &sk->sk_timer);
 	tipc_sk_remove(tsk);
 
@@ -2157,6 +2159,9 @@ static void tipc_sk_filter_rcv(struct sock *sk, struct sk_buff *skb,
 	if (unlikely(grp))
 		tipc_group_filter_msg(grp, &inputq, xmitq);
 
+	if (msg_type(hdr) == TIPC_MCAST_MSG)
+		tipc_mcast_filter_msg(&tsk->mc_method.deferredq, &inputq);
+
 	/* Validate and add to receive buffer if there is space */
 	while ((skb = __skb_dequeue(&inputq))) {
 		hdr = buf_msg(skb);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [net-next 3/3] tipc: smooth change between replicast and broadcast
  2019-02-19 11:30 ` [net-next 3/3] tipc: smooth change between replicast and broadcast Hoang Le
@ 2019-02-22  0:00   ` David Miller
  0 siblings, 0 replies; 4+ messages in thread
From: David Miller @ 2019-02-22  0:00 UTC (permalink / raw)
  To: hoang.h.le; +Cc: jon.maloy, maloy, ying.xue, netdev, tipc-discussion

From: Hoang Le <hoang.h.le@dektech.com.au>
Date: Tue, 19 Feb 2019 18:30:54 +0700

> +static int tipc_mcast_send_sync(struct net *net, struct sk_buff *skb,
> +				struct tipc_mc_method *method,
> +				struct tipc_nlist *dests,
> +				u16 *cong_link_cnt)
> +{
> +	struct sk_buff_head tmpq;
> +	struct sk_buff *_skb;
> +	struct tipc_msg *hdr, *_hdr;

Reverse christmas tree please.

> @@ -300,6 +372,9 @@ int tipc_mcast_xmit(struct net *net, struct sk_buff_head *pkts,
>  		    u16 *cong_link_cnt)
>  {
>  	struct sk_buff_head inputq, localq;
> +	struct sk_buff *skb;
> +	struct tipc_msg *hdr;
> +	bool rcast = method->rcast;
>  	int rc = 0;

Likewise.

Thanks.

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2019-02-22  0:00 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-19 11:30 [net-next 1/3] tipc: support broadcast/replicast configurable for bc-link Hoang Le
2019-02-19 11:30 ` [net-next 2/3] tipc: introduce new capability flag for cluster Hoang Le
2019-02-19 11:30 ` [net-next 3/3] tipc: smooth change between replicast and broadcast Hoang Le
2019-02-22  0:00   ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).