All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] llc enhancements
@ 2009-12-03 22:31 Octavian Purdila
  2009-12-03 22:31 ` [PATCH 1/4] llc: use dev_hard_header Octavian Purdila
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Octavian Purdila @ 2009-12-03 22:31 UTC (permalink / raw)
  To: opurdila, netdev; +Cc: Arnaldo Carvalho de Melo


This patch series introduces a couple of general enhancements to LLC
socket code as well as changes to scale the LLC socket lookup code for
a large number of interfaces needed for our uses cases (STP traffic
generation from a large number of ports, via virtual network
interfaces).

Octavian Purdila (4):
      llc: use dev_hard_header
      llc: add support for LLC_OPT_PKTINFO
      llc: use a device based hash table to speed up multicast delivery
      llc: replace the socket list with a local address based hash

 drivers/net/myri_sbus.c |    6 ++--
 include/linux/llc.h     |    7 ++++
 include/net/llc.h       |   39 ++++++++++++++++++++--
 include/net/llc_conn.h  |    2 +
 net/8021q/vlan_dev.c    |    7 ++--
 net/ethernet/eth.c      |    7 ++--
 net/llc/af_llc.c        |   29 ++++++++++++++++
 net/llc/llc_conn.c      |   82 ++++++++++++++++++++++++++++++++---------------
 net/llc/llc_core.c      |    7 +++-
 net/llc/llc_output.c    |   45 ++++---------------------
 net/llc/llc_proc.c      |   67 ++++++++++++++++++++++++--------------
 net/llc/llc_sap.c       |   18 ++++++----
 12 files changed, 205 insertions(+), 111 deletions(-)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/4] llc: use dev_hard_header
  2009-12-03 22:31 [PATCH 0/4] llc enhancements Octavian Purdila
@ 2009-12-03 22:31 ` Octavian Purdila
  2009-12-03 22:31 ` [PATCH 2/4] llc: add support for LLC_OPT_PKTINFO Octavian Purdila
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 22+ messages in thread
From: Octavian Purdila @ 2009-12-03 22:31 UTC (permalink / raw)
  To: opurdila, netdev; +Cc: Arnaldo Carvalho de Melo

Using dev_hard_header allows us to use LLC with VLANs and potentially
other Ethernet/TokernRing specific encapsulations. It also removes code
duplication between LLC and Ethernet/TokenRing core code.

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
---
 drivers/net/myri_sbus.c |    6 +++---
 net/8021q/vlan_dev.c    |    7 +++----
 net/ethernet/eth.c      |    7 ++++---
 net/llc/llc_output.c    |   45 ++++++++-------------------------------------
 4 files changed, 18 insertions(+), 47 deletions(-)

diff --git a/drivers/net/myri_sbus.c b/drivers/net/myri_sbus.c
index b3513ad..8b43130 100644
--- a/drivers/net/myri_sbus.c
+++ b/drivers/net/myri_sbus.c
@@ -716,10 +716,10 @@ static int myri_header(struct sk_buff *skb, struct net_device *dev,
 	pad[0] = MYRI_PAD_LEN;
 	pad[1] = 0xab;
 
-	/* Set the protocol type. For a packet of type ETH_P_802_3 we put the length
-	 * in here instead. It is up to the 802.2 layer to carry protocol information.
+	/* Set the protocol type. For a packet of type ETH_P_802_3/2 we put the
+	 * length in here instead.
 	 */
-	if (type != ETH_P_802_3)
+	if (type != ETH_P_802_3 && type != ETH_P_802_2)
 		eth->h_proto = htons(type);
 	else
 		eth->h_proto = htons(len);
diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
index de0dc6b..1bc9be5 100644
--- a/net/8021q/vlan_dev.c
+++ b/net/8021q/vlan_dev.c
@@ -263,11 +263,10 @@ static int vlan_dev_hard_header(struct sk_buff *skb, struct net_device *dev,
 		vhdr->h_vlan_TCI = htons(vlan_tci);
 
 		/*
-		 *  Set the protocol type. For a packet of type ETH_P_802_3 we
-		 *  put the length in here instead. It is up to the 802.2
-		 *  layer to carry protocol information.
+		 *  Set the protocol type. For a packet of type ETH_P_802_3/2 we
+		 *  put the length in here instead.
 		 */
-		if (type != ETH_P_802_3)
+		if (type != ETH_P_802_3 && type != ETH_P_802_2)
 			vhdr->h_vlan_encapsulated_proto = htons(type);
 		else
 			vhdr->h_vlan_encapsulated_proto = htons(len);
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index dd3db88..c380da6 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -1,3 +1,4 @@
+]
 /*
  * INET		An implementation of the TCP/IP protocol suite for the LINUX
  *		operating system.  INET is implemented using the  BSD Socket
@@ -73,8 +74,8 @@ __setup("ether=", netdev_boot_setup);
  * @len:   packet length (<= skb->len)
  *
  *
- * Set the protocol type. For a packet of type ETH_P_802_3 we put the length
- * in here instead. It is up to the 802.2 layer to carry protocol information.
+ * Set the protocol type. For a packet of type ETH_P_802_3/2 we put the length
+ * in here instead.
  */
 int eth_header(struct sk_buff *skb, struct net_device *dev,
 	       unsigned short type,
@@ -82,7 +83,7 @@ int eth_header(struct sk_buff *skb, struct net_device *dev,
 {
 	struct ethhdr *eth = (struct ethhdr *)skb_push(skb, ETH_HLEN);
 
-	if (type != ETH_P_802_3)
+	if (type != ETH_P_802_3 && type != ETH_P_802_2)
 		eth->h_proto = htons(type);
 	else
 		eth->h_proto = htons(len);
diff --git a/net/llc/llc_output.c b/net/llc/llc_output.c
index 754f4fe..b38a107 100644
--- a/net/llc/llc_output.c
+++ b/net/llc/llc_output.c
@@ -33,48 +33,19 @@
 int llc_mac_hdr_init(struct sk_buff *skb,
 		     const unsigned char *sa, const unsigned char *da)
 {
-	int rc = 0;
+	int rc = -EINVAL;
 
 	switch (skb->dev->type) {
-#ifdef CONFIG_TR
-	case ARPHRD_IEEE802_TR: {
-		struct net_device *dev = skb->dev;
-		struct trh_hdr *trh;
-
-		skb_push(skb, sizeof(*trh));
-		skb_reset_mac_header(skb);
-		trh = tr_hdr(skb);
-		trh->ac = AC;
-		trh->fc = LLC_FRAME;
-		if (sa)
-			memcpy(trh->saddr, sa, dev->addr_len);
-		else
-			memset(trh->saddr, 0, dev->addr_len);
-		if (da) {
-			memcpy(trh->daddr, da, dev->addr_len);
-			tr_source_route(skb, trh, dev);
-			skb_reset_mac_header(skb);
-		}
-		break;
-	}
-#endif
+	case ARPHRD_IEEE802_TR:
 	case ARPHRD_ETHER:
-	case ARPHRD_LOOPBACK: {
-		unsigned short len = skb->len;
-		struct ethhdr *eth;
-
-		skb_push(skb, sizeof(*eth));
-		skb_reset_mac_header(skb);
-		eth = eth_hdr(skb);
-		eth->h_proto = htons(len);
-		memcpy(eth->h_dest, da, ETH_ALEN);
-		memcpy(eth->h_source, sa, ETH_ALEN);
+	case ARPHRD_LOOPBACK:
+		rc = dev_hard_header(skb, skb->dev, ETH_P_802_2, da, sa,
+				     skb->len);
+		if (rc > 0)
+			rc = 0;
 		break;
-	}
 	default:
-		printk(KERN_WARNING "device type not supported: %d\n",
-		       skb->dev->type);
-		rc = -EINVAL;
+		WARN(1, "device type not supported: %d\n", skb->dev->type);
 	}
 	return rc;
 }
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/4] llc: add support for LLC_OPT_PKTINFO
  2009-12-03 22:31 [PATCH 0/4] llc enhancements Octavian Purdila
  2009-12-03 22:31 ` [PATCH 1/4] llc: use dev_hard_header Octavian Purdila
@ 2009-12-03 22:31 ` Octavian Purdila
  2009-12-03 22:31 ` [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery Octavian Purdila
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 22+ messages in thread
From: Octavian Purdila @ 2009-12-03 22:31 UTC (permalink / raw)
  To: opurdila, netdev; +Cc: Arnaldo Carvalho de Melo

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
---
 include/linux/llc.h    |    7 +++++++
 include/net/llc_conn.h |    1 +
 net/llc/af_llc.c       |   29 +++++++++++++++++++++++++++++
 3 files changed, 37 insertions(+), 0 deletions(-)

diff --git a/include/linux/llc.h b/include/linux/llc.h
index 7733585..ad7074b 100644
--- a/include/linux/llc.h
+++ b/include/linux/llc.h
@@ -36,6 +36,7 @@ enum llc_sockopts {
 	LLC_OPT_BUSY_TMR_EXP,	/* busy state expire time (secs). */
 	LLC_OPT_TX_WIN,		/* tx window size. */
 	LLC_OPT_RX_WIN,		/* rx window size. */
+	LLC_OPT_PKTINFO,	/* ancillary packet information. */
 	LLC_OPT_MAX
 };
 
@@ -70,6 +71,12 @@ enum llc_sockopts {
 #define LLC_SAP_RM	0xD4		/* Resource Management 		*/
 #define LLC_SAP_GLOBAL	0xFF		/* Global SAP. 			*/
 
+struct llc_pktinfo {
+	int lpi_ifindex;
+	unsigned char lpi_sap;
+	unsigned char lpi_mac[IFHWADDRLEN];
+};
+
 #ifdef __KERNEL__
 #define LLC_SAP_DYN_START	0xC0
 #define LLC_SAP_DYN_STOP	0xDE
diff --git a/include/net/llc_conn.h b/include/net/llc_conn.h
index e2374e3..fe982fd 100644
--- a/include/net/llc_conn.h
+++ b/include/net/llc_conn.h
@@ -76,6 +76,7 @@ struct llc_sock {
 	u32		    rx_pdu_hdr;	   /* used for saving header of last pdu
 					      received and caused sending FRMR.
 					      Used for resending FRMR */
+	u32		    cmsg_flags;
 };
 
 static inline struct llc_sock *llc_sk(const struct sock *sk)
diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index 5266c28..ffc96d3 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -47,6 +47,10 @@ static int llc_ui_wait_for_busy_core(struct sock *sk, long timeout);
 #define dprintk(args...)
 #endif
 
+/* Maybe we'll add some more in the future. */
+#define LLC_CMSG_PKTINFO	1
+
+
 /**
  *	llc_ui_next_link_no - return the next unused link number for a sap
  *	@sap: Address of sap to get link number from.
@@ -591,6 +595,20 @@ static int llc_wait_data(struct sock *sk, long timeo)
 	return rc;
 }
 
+static void llc_cmsg_rcv(struct msghdr *msg, struct sk_buff *skb)
+{
+	struct llc_sock *llc = llc_sk(skb->sk);
+
+	if (llc->cmsg_flags & LLC_CMSG_PKTINFO) {
+		struct llc_pktinfo info;
+
+		info.lpi_ifindex = llc_sk(skb->sk)->dev->ifindex;
+		llc_pdu_decode_dsap(skb, &info.lpi_sap);
+		llc_pdu_decode_da(skb, info.lpi_mac);
+		put_cmsg(msg, SOL_LLC, LLC_OPT_PKTINFO, sizeof(info), &info);
+	}
+}
+
 /**
  *	llc_ui_accept - accept a new incoming connection.
  *	@sock: Socket which connections arrive on.
@@ -812,6 +830,8 @@ copy_uaddr:
 		memcpy(uaddr, llc_ui_skb_cb(skb), sizeof(*uaddr));
 		msg->msg_namelen = sizeof(*uaddr);
 	}
+	if (llc_sk(sk)->cmsg_flags)
+		llc_cmsg_rcv(msg, skb);
 	goto out;
 }
 
@@ -1030,6 +1050,12 @@ static int llc_ui_setsockopt(struct socket *sock, int level, int optname,
 			goto out;
 		llc->rw = opt;
 		break;
+	case LLC_OPT_PKTINFO:
+		if (opt)
+			llc->cmsg_flags |= LLC_CMSG_PKTINFO;
+		else
+			llc->cmsg_flags &= ~LLC_CMSG_PKTINFO;
+		break;
 	default:
 		rc = -ENOPROTOOPT;
 		goto out;
@@ -1083,6 +1109,9 @@ static int llc_ui_getsockopt(struct socket *sock, int level, int optname,
 		val = llc->k;				break;
 	case LLC_OPT_RX_WIN:
 		val = llc->rw;				break;
+	case LLC_OPT_PKTINFO:
+		val = (llc->cmsg_flags & LLC_CMSG_PKTINFO) != 0;
+		break;
 	default:
 		rc = -ENOPROTOOPT;
 		goto out;
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery
  2009-12-03 22:31 [PATCH 0/4] llc enhancements Octavian Purdila
  2009-12-03 22:31 ` [PATCH 1/4] llc: use dev_hard_header Octavian Purdila
  2009-12-03 22:31 ` [PATCH 2/4] llc: add support for LLC_OPT_PKTINFO Octavian Purdila
@ 2009-12-03 22:31 ` Octavian Purdila
  2009-12-03 22:59   ` Eric Dumazet
  2009-12-03 23:25   ` [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery Stephen Hemminger
  2009-12-03 22:31 ` [PATCH 4/4] llc: replace the socket list with a local address based hash Octavian Purdila
  2009-12-03 23:55 ` [PATCH 0/4] llc enhancements David Miller
  4 siblings, 2 replies; 22+ messages in thread
From: Octavian Purdila @ 2009-12-03 22:31 UTC (permalink / raw)
  To: opurdila, netdev; +Cc: Arnaldo Carvalho de Melo

This patch adds a per SAP device based hash table to solve the
multicast delivery scalability issues for the case where the are a
large number of interfaces and a large number of sockets (bound to the
same SAP) are used.

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
---
 include/net/llc.h      |   20 ++++++++++++++++----
 include/net/llc_conn.h |    1 +
 net/llc/llc_conn.c     |   18 +++++++++++++++++-
 net/llc/llc_core.c     |    3 +++
 net/llc/llc_sap.c      |   11 ++++++-----
 5 files changed, 43 insertions(+), 10 deletions(-)

diff --git a/include/net/llc.h b/include/net/llc.h
index 7940da1..31e9902 100644
--- a/include/net/llc.h
+++ b/include/net/llc.h
@@ -31,6 +31,14 @@ struct llc_addr {
 #define LLC_SAP_STATE_INACTIVE	1
 #define LLC_SAP_STATE_ACTIVE	2
 
+#define LLC_SK_DEV_HASH_BITS 10
+#define LLC_SK_DEV_HASH_ENTRIES (1<<LLC_SK_DEV_HASH_BITS)
+
+struct llc_sk_list {
+	rwlock_t lock;
+	struct hlist_head list;
+};
+
 /**
  * struct llc_sap - Defines the SAP component
  *
@@ -53,12 +61,16 @@ struct llc_sap {
 				     struct net_device *orig_dev);
 	struct llc_addr	 laddr;
 	struct list_head node;
-	struct {
-		rwlock_t	  lock;
-		struct hlist_head list;
-	} sk_list;
+	struct llc_sk_list sk_list;
+	struct llc_sk_list sk_dev_hash[LLC_SK_DEV_HASH_ENTRIES];
 };
 
+static inline
+struct llc_sk_list *llc_sk_dev_hash(struct llc_sap *sap, int ifindex)
+{
+	return &sap->sk_dev_hash[ifindex % LLC_SK_DEV_HASH_ENTRIES];
+}
+
 #define LLC_DEST_INVALID         0      /* Invalid LLC PDU type */
 #define LLC_DEST_SAP             1      /* Type 1 goes here */
 #define LLC_DEST_CONN            2      /* Type 2 goes here */
diff --git a/include/net/llc_conn.h b/include/net/llc_conn.h
index fe982fd..2f97d8d 100644
--- a/include/net/llc_conn.h
+++ b/include/net/llc_conn.h
@@ -77,6 +77,7 @@ struct llc_sock {
 					      received and caused sending FRMR.
 					      Used for resending FRMR */
 	u32		    cmsg_flags;
+	struct hlist_node   dev_hash_node;
 };
 
 static inline struct llc_sock *llc_sk(const struct sock *sk)
diff --git a/net/llc/llc_conn.c b/net/llc/llc_conn.c
index c6bab39..c3f47ec 100644
--- a/net/llc/llc_conn.c
+++ b/net/llc/llc_conn.c
@@ -651,11 +651,19 @@ static int llc_find_offset(int state, int ev_type)
  */
 void llc_sap_add_socket(struct llc_sap *sap, struct sock *sk)
 {
+	struct llc_sock *llc = llc_sk(sk);
+	struct llc_sk_list *dev_hb = llc_sk_dev_hash(sap, llc->dev->ifindex);
+
 	llc_sap_hold(sap);
+	llc->sap = sap;
+
 	write_lock_bh(&sap->sk_list.lock);
-	llc_sk(sk)->sap = sap;
 	sk_add_node(sk, &sap->sk_list.list);
 	write_unlock_bh(&sap->sk_list.lock);
+
+	write_lock_bh(&dev_hb->lock);
+	hlist_add_head(&llc->dev_hash_node, &dev_hb->list);
+	write_unlock_bh(&dev_hb->lock);
 }
 
 /**
@@ -668,9 +676,17 @@ void llc_sap_add_socket(struct llc_sap *sap, struct sock *sk)
  */
 void llc_sap_remove_socket(struct llc_sap *sap, struct sock *sk)
 {
+	struct llc_sock *llc = llc_sk(sk);
+	struct llc_sk_list *dev_hb = llc_sk_dev_hash(sap, llc->dev->ifindex);
+
 	write_lock_bh(&sap->sk_list.lock);
 	sk_del_node_init(sk);
 	write_unlock_bh(&sap->sk_list.lock);
+
+	write_lock_bh(&dev_hb->lock);
+	hlist_del(&llc->dev_hash_node);
+	write_unlock_bh(&dev_hb->lock);
+
 	llc_sap_put(sap);
 }
 
diff --git a/net/llc/llc_core.c b/net/llc/llc_core.c
index ff4c0ab..1d79cda 100644
--- a/net/llc/llc_core.c
+++ b/net/llc/llc_core.c
@@ -33,12 +33,15 @@ DEFINE_RWLOCK(llc_sap_list_lock);
 static struct llc_sap *llc_sap_alloc(void)
 {
 	struct llc_sap *sap = kzalloc(sizeof(*sap), GFP_ATOMIC);
+	int i;
 
 	if (sap) {
 		/* sap->laddr.mac - leave as a null, it's filled by bind */
 		sap->state = LLC_SAP_STATE_ACTIVE;
 		rwlock_init(&sap->sk_list.lock);
 		atomic_set(&sap->refcnt, 1);
+		for (i = 0; i < LLC_SK_DEV_HASH_ENTRIES; i++)
+			rwlock_init(&sap->sk_dev_hash[i].lock);
 	}
 	return sap;
 }
diff --git a/net/llc/llc_sap.c b/net/llc/llc_sap.c
index 008de1f..11bad55 100644
--- a/net/llc/llc_sap.c
+++ b/net/llc/llc_sap.c
@@ -340,12 +340,13 @@ static void llc_sap_mcast(struct llc_sap *sap,
 			  const struct llc_addr *laddr,
 			  struct sk_buff *skb)
 {
-	struct sock *sk;
+	struct llc_sock *llc;
 	struct hlist_node *node;
+	struct llc_sk_list *dev_hb = llc_sk_dev_hash(sap, skb->dev->ifindex);
 
-	read_lock_bh(&sap->sk_list.lock);
-	sk_for_each(sk, node, &sap->sk_list.list) {
-		struct llc_sock *llc = llc_sk(sk);
+	read_lock_bh(&dev_hb->lock);
+	hlist_for_each_entry(llc, node, &dev_hb->list, dev_hash_node) {
+		struct sock *sk = &llc->sk;
 		struct sk_buff *skb1;
 
 		if (sk->sk_type != SOCK_DGRAM)
@@ -365,7 +366,7 @@ static void llc_sap_mcast(struct llc_sap *sap,
 		llc_sap_rcv(sap, skb1, sk);
 		sock_put(sk);
 	}
-	read_unlock_bh(&sap->sk_list.lock);
+	read_unlock_bh(&dev_hb->lock);
 }
 
 
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 4/4] llc: replace the socket list with a local address based hash
  2009-12-03 22:31 [PATCH 0/4] llc enhancements Octavian Purdila
                   ` (2 preceding siblings ...)
  2009-12-03 22:31 ` [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery Octavian Purdila
@ 2009-12-03 22:31 ` Octavian Purdila
  2009-12-03 23:55 ` [PATCH 0/4] llc enhancements David Miller
  4 siblings, 0 replies; 22+ messages in thread
From: Octavian Purdila @ 2009-12-03 22:31 UTC (permalink / raw)
  To: opurdila, netdev; +Cc: Arnaldo Carvalho de Melo

For the cases where a lot of interfaces are used in conjunction with a
lot of LLC sockets bound to the same SAP, the iteration of the socket
list becomes prohibitively expensive.

Replacing the list with a a local address based hash significantly
improves the bind and listener lookup operations as well as the
datagram delivery.

Connected sockets delivery is also improved, but this patch does not
address the case where we have lots of sockets with the same local
address connected to different remote addresses.

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
---
 include/net/llc.h  |   21 +++++++++++++++-
 net/llc/llc_conn.c |   64 ++++++++++++++++++++++++++++++-------------------
 net/llc/llc_core.c |    4 +-
 net/llc/llc_proc.c |   67 +++++++++++++++++++++++++++++++++------------------
 net/llc/llc_sap.c  |    7 +++--
 5 files changed, 108 insertions(+), 55 deletions(-)

diff --git a/include/net/llc.h b/include/net/llc.h
index 31e9902..51cb761 100644
--- a/include/net/llc.h
+++ b/include/net/llc.h
@@ -16,6 +16,8 @@
 #include <linux/if_ether.h>
 #include <linux/list.h>
 #include <linux/spinlock.h>
+#include <linux/hash.h>
+#include <linux/jhash.h>
 
 #include <asm/atomic.h>
 
@@ -34,6 +36,9 @@ struct llc_addr {
 #define LLC_SK_DEV_HASH_BITS 10
 #define LLC_SK_DEV_HASH_ENTRIES (1<<LLC_SK_DEV_HASH_BITS)
 
+#define LLC_SK_LADDR_HASH_BITS 10
+#define LLC_SK_LADDR_HASH_ENTRIES (1<<LLC_SK_LADDR_HASH_BITS)
+
 struct llc_sk_list {
 	rwlock_t lock;
 	struct hlist_head list;
@@ -61,7 +66,7 @@ struct llc_sap {
 				     struct net_device *orig_dev);
 	struct llc_addr	 laddr;
 	struct list_head node;
-	struct llc_sk_list sk_list;
+	struct llc_sk_list sk_laddr_hash[LLC_SK_LADDR_HASH_ENTRIES];
 	struct llc_sk_list sk_dev_hash[LLC_SK_DEV_HASH_ENTRIES];
 };
 
@@ -71,6 +76,20 @@ struct llc_sk_list *llc_sk_dev_hash(struct llc_sap *sap, int ifindex)
 	return &sap->sk_dev_hash[ifindex % LLC_SK_DEV_HASH_ENTRIES];
 }
 
+static inline
+u32 llc_sk_laddr_hashfn(struct llc_sap *sap, const struct llc_addr *laddr)
+{
+	return hash_32(jhash(laddr->mac, sizeof(laddr->mac), 0),
+		       LLC_SK_LADDR_HASH_BITS);
+}
+
+static inline
+struct llc_sk_list *llc_sk_laddr_hash(struct llc_sap *sap,
+				      const struct llc_addr *laddr)
+{
+	return &sap->sk_laddr_hash[llc_sk_laddr_hashfn(sap, laddr)];
+}
+
 #define LLC_DEST_INVALID         0      /* Invalid LLC PDU type */
 #define LLC_DEST_SAP             1      /* Type 1 goes here */
 #define LLC_DEST_CONN            2      /* Type 2 goes here */
diff --git a/net/llc/llc_conn.c b/net/llc/llc_conn.c
index c3f47ec..14a2f34 100644
--- a/net/llc/llc_conn.c
+++ b/net/llc/llc_conn.c
@@ -485,9 +485,10 @@ static struct sock *__llc_lookup_established(struct llc_sap *sap,
 {
 	struct sock *rc;
 	struct hlist_node *node;
+	struct llc_sk_list *laddr_hb = llc_sk_laddr_hash(sap, laddr);
 
-	read_lock(&sap->sk_list.lock);
-	sk_for_each(rc, node, &sap->sk_list.list) {
+	read_lock(&laddr_hb->lock);
+	sk_for_each(rc, node, &laddr_hb->list) {
 		struct llc_sock *llc = llc_sk(rc);
 
 		if (llc->laddr.lsap == laddr->lsap &&
@@ -500,7 +501,7 @@ static struct sock *__llc_lookup_established(struct llc_sap *sap,
 	}
 	rc = NULL;
 found:
-	read_unlock(&sap->sk_list.lock);
+	read_unlock(&laddr_hb->lock);
 	return rc;
 }
 
@@ -516,6 +517,28 @@ struct sock *llc_lookup_established(struct llc_sap *sap,
 	return sk;
 }
 
+static struct sock *__llc_lookup_listener(struct llc_sap *sap,
+					  struct llc_addr *laddr)
+{
+	struct sock *rc = NULL;
+	struct hlist_node *node;
+	struct llc_sk_list *laddr_hb = llc_sk_laddr_hash(sap, laddr);
+
+	read_lock(&laddr_hb->lock);
+	sk_for_each(rc, node, &laddr_hb->list) {
+		struct llc_sock *llc = llc_sk(rc);
+
+		if (rc->sk_type == SOCK_STREAM && rc->sk_state == TCP_LISTEN &&
+		    llc->laddr.lsap == laddr->lsap &&
+		    llc_mac_match(llc->laddr.mac, laddr->mac))
+			goto found;
+	}
+
+found:
+	read_unlock(&laddr_hb->lock);
+	return rc;
+}
+
 /**
  *	llc_lookup_listener - Finds listener for local MAC + SAP
  *	@sap: SAP
@@ -529,24 +552,13 @@ struct sock *llc_lookup_established(struct llc_sap *sap,
 static struct sock *llc_lookup_listener(struct llc_sap *sap,
 					struct llc_addr *laddr)
 {
+	static struct llc_addr null_addr;
 	struct sock *rc;
-	struct hlist_node *node;
 
-	read_lock(&sap->sk_list.lock);
-	sk_for_each(rc, node, &sap->sk_list.list) {
-		struct llc_sock *llc = llc_sk(rc);
+	rc = __llc_lookup_listener(sap, laddr);
+	if (!rc)
+		rc = __llc_lookup_listener(sap, &null_addr);
 
-		if (rc->sk_type == SOCK_STREAM && rc->sk_state == TCP_LISTEN &&
-		    llc->laddr.lsap == laddr->lsap &&
-		    (llc_mac_match(llc->laddr.mac, laddr->mac) ||
-		     llc_mac_null(llc->laddr.mac))) {
-			sock_hold(rc);
-			goto found;
-		}
-	}
-	rc = NULL;
-found:
-	read_unlock(&sap->sk_list.lock);
 	return rc;
 }
 
@@ -647,19 +659,20 @@ static int llc_find_offset(int state, int ev_type)
  *	@sap: SAP
  *	@sk: socket
  *
- *	This function adds a socket to sk_list of a SAP.
+ *	This function adds a socket to the hash tables of a SAP.
  */
 void llc_sap_add_socket(struct llc_sap *sap, struct sock *sk)
 {
 	struct llc_sock *llc = llc_sk(sk);
 	struct llc_sk_list *dev_hb = llc_sk_dev_hash(sap, llc->dev->ifindex);
+	struct llc_sk_list *laddr_hb = llc_sk_laddr_hash(sap, &llc->laddr);
 
 	llc_sap_hold(sap);
 	llc->sap = sap;
 
-	write_lock_bh(&sap->sk_list.lock);
-	sk_add_node(sk, &sap->sk_list.list);
-	write_unlock_bh(&sap->sk_list.lock);
+	write_lock_bh(&laddr_hb->lock);
+	sk_add_node(sk, &laddr_hb->list);
+	write_unlock_bh(&laddr_hb->lock);
 
 	write_lock_bh(&dev_hb->lock);
 	hlist_add_head(&llc->dev_hash_node, &dev_hb->list);
@@ -671,17 +684,18 @@ void llc_sap_add_socket(struct llc_sap *sap, struct sock *sk)
  *	@sap: SAP
  *	@sk: socket
  *
- *	This function removes a connection from sk_list.list of a SAP if
+ *	This function removes a connection from the hash tables of a SAP if
  *	the connection was in this list.
  */
 void llc_sap_remove_socket(struct llc_sap *sap, struct sock *sk)
 {
 	struct llc_sock *llc = llc_sk(sk);
 	struct llc_sk_list *dev_hb = llc_sk_dev_hash(sap, llc->dev->ifindex);
+	struct llc_sk_list *laddr_hb = llc_sk_laddr_hash(sap, &llc->laddr);
 
-	write_lock_bh(&sap->sk_list.lock);
+	write_lock_bh(&laddr_hb->lock);
 	sk_del_node_init(sk);
-	write_unlock_bh(&sap->sk_list.lock);
+	write_unlock_bh(&laddr_hb->lock);
 
 	write_lock_bh(&dev_hb->lock);
 	hlist_del(&llc->dev_hash_node);
diff --git a/net/llc/llc_core.c b/net/llc/llc_core.c
index 1d79cda..dccc5e6 100644
--- a/net/llc/llc_core.c
+++ b/net/llc/llc_core.c
@@ -38,10 +38,11 @@ static struct llc_sap *llc_sap_alloc(void)
 	if (sap) {
 		/* sap->laddr.mac - leave as a null, it's filled by bind */
 		sap->state = LLC_SAP_STATE_ACTIVE;
-		rwlock_init(&sap->sk_list.lock);
 		atomic_set(&sap->refcnt, 1);
 		for (i = 0; i < LLC_SK_DEV_HASH_ENTRIES; i++)
 			rwlock_init(&sap->sk_dev_hash[i].lock);
+		for (i = 0; i < LLC_SK_LADDR_HASH_ENTRIES; i++)
+			rwlock_init(&sap->sk_laddr_hash[i].lock);
 	}
 	return sap;
 }
@@ -145,7 +146,6 @@ out:
  */
 void llc_sap_close(struct llc_sap *sap)
 {
-	WARN_ON(!hlist_empty(&sap->sk_list.list));
 	llc_del_sap(sap);
 	kfree(sap);
 }
diff --git a/net/llc/llc_proc.c b/net/llc/llc_proc.c
index be47ac4..58613c6 100644
--- a/net/llc/llc_proc.c
+++ b/net/llc/llc_proc.c
@@ -32,21 +32,23 @@ static void llc_ui_format_mac(struct seq_file *seq, u8 *addr)
 
 static struct sock *llc_get_sk_idx(loff_t pos)
 {
-	struct list_head *sap_entry;
 	struct llc_sap *sap;
-	struct hlist_node *node;
 	struct sock *sk = NULL;
-
-	list_for_each(sap_entry, &llc_sap_list) {
-		sap = list_entry(sap_entry, struct llc_sap, node);
-
-		read_lock_bh(&sap->sk_list.lock);
-		sk_for_each(sk, node, &sap->sk_list.list) {
-			if (!pos)
-				goto found;
-			--pos;
+	int i;
+
+	list_for_each_entry(sap, &llc_sap_list, node) {
+		for (i = 0; i < LLC_SK_LADDR_HASH_ENTRIES; i++) {
+			struct llc_sk_list *laddr_hb = &sap->sk_laddr_hash[i];
+			struct hlist_node *node;
+
+			read_lock_bh(&laddr_hb->lock);
+			sk_for_each(sk, node, &laddr_hb->list) {
+				if (!pos)
+					goto found; /* keep the lock */
+				--pos;
+			}
+			read_unlock_bh(&laddr_hb->lock);
 		}
-		read_unlock_bh(&sap->sk_list.lock);
 	}
 	sk = NULL;
 found:
@@ -61,6 +63,28 @@ static void *llc_seq_start(struct seq_file *seq, loff_t *pos)
 	return l ? llc_get_sk_idx(--l) : SEQ_START_TOKEN;
 }
 
+static struct sock *laddr_hash_next(struct llc_sap *sap, int bucket)
+{
+	struct llc_sk_list *laddr_hb;
+	struct hlist_node *node;
+	struct sock *sk = NULL;
+
+	/* release the lock previously taken below or in llc_get_sk_idx */
+	if (bucket >= 0)
+		read_unlock_bh(&sap->sk_laddr_hash[bucket].lock);
+
+	while (++bucket < LLC_SK_LADDR_HASH_ENTRIES) {
+		laddr_hb = &sap->sk_laddr_hash[bucket];
+		read_lock_bh(&laddr_hb->lock);
+		sk_for_each(sk, node, &laddr_hb->list)
+			goto out; /* keep the lock */
+		read_unlock_bh(&laddr_hb->lock);
+	}
+
+out:
+	return sk;
+}
+
 static void *llc_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 {
 	struct sock* sk, *next;
@@ -80,18 +104,13 @@ static void *llc_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 	}
 	llc = llc_sk(sk);
 	sap = llc->sap;
-	read_unlock_bh(&sap->sk_list.lock);
-	sk = NULL;
-	for (;;) {
-		if (sap->node.next == &llc_sap_list)
-			break;
-		sap = list_entry(sap->node.next, struct llc_sap, node);
-		read_lock_bh(&sap->sk_list.lock);
-		if (!hlist_empty(&sap->sk_list.list)) {
-			sk = sk_head(&sap->sk_list.list);
+	sk = laddr_hash_next(sap, llc_sk_laddr_hashfn(sap, &llc->laddr));
+	if (sk)
+		goto out;
+	list_for_each_entry_continue(sap, &llc_sap_list, node) {
+		sk = laddr_hash_next(sap, -1);
+		if (sk)
 			break;
-		}
-		read_unlock_bh(&sap->sk_list.lock);
 	}
 out:
 	return sk;
@@ -104,7 +123,7 @@ static void llc_seq_stop(struct seq_file *seq, void *v)
 		struct llc_sock *llc = llc_sk(sk);
 		struct llc_sap *sap = llc->sap;
 
-		read_unlock_bh(&sap->sk_list.lock);
+		read_unlock_bh(&llc_sk_laddr_hash(sap, &llc->laddr)->lock);
 	}
 	read_unlock_bh(&llc_sap_list_lock);
 }
diff --git a/net/llc/llc_sap.c b/net/llc/llc_sap.c
index 11bad55..2622c3f 100644
--- a/net/llc/llc_sap.c
+++ b/net/llc/llc_sap.c
@@ -310,9 +310,10 @@ static struct sock *llc_lookup_dgram(struct llc_sap *sap,
 {
 	struct sock *rc;
 	struct hlist_node *node;
+	struct llc_sk_list *laddr_hb = llc_sk_laddr_hash(sap, laddr);
 
-	read_lock_bh(&sap->sk_list.lock);
-	sk_for_each(rc, node, &sap->sk_list.list) {
+	read_lock_bh(&laddr_hb->lock);
+	sk_for_each(rc, node, &laddr_hb->list) {
 		struct llc_sock *llc = llc_sk(rc);
 
 		if (rc->sk_type == SOCK_DGRAM &&
@@ -324,7 +325,7 @@ static struct sock *llc_lookup_dgram(struct llc_sap *sap,
 	}
 	rc = NULL;
 found:
-	read_unlock_bh(&sap->sk_list.lock);
+	read_unlock_bh(&laddr_hb->lock);
 	return rc;
 }
 
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery
  2009-12-03 22:31 ` [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery Octavian Purdila
@ 2009-12-03 22:59   ` Eric Dumazet
  2009-12-03 23:30     ` Octavian Purdila
  2009-12-03 23:25   ` [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery Stephen Hemminger
  1 sibling, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2009-12-03 22:59 UTC (permalink / raw)
  To: Octavian Purdila; +Cc: netdev, Arnaldo Carvalho de Melo

Octavian Purdila a écrit :
> This patch adds a per SAP device based hash table to solve the
> multicast delivery scalability issues for the case where the are a
> large number of interfaces and a large number of sockets (bound to the
> same SAP) are used.
> 
> Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
> ---
>  include/net/llc.h      |   20 ++++++++++++++++----
>  include/net/llc_conn.h |    1 +
>  net/llc/llc_conn.c     |   18 +++++++++++++++++-
>  net/llc/llc_core.c     |    3 +++
>  net/llc/llc_sap.c      |   11 ++++++-----
>  5 files changed, 43 insertions(+), 10 deletions(-)
> 
> diff --git a/include/net/llc.h b/include/net/llc.h
> index 7940da1..31e9902 100644
> --- a/include/net/llc.h
> +++ b/include/net/llc.h
> @@ -31,6 +31,14 @@ struct llc_addr {
>  #define LLC_SAP_STATE_INACTIVE	1
>  #define LLC_SAP_STATE_ACTIVE	2
>  
> +#define LLC_SK_DEV_HASH_BITS 10
> +#define LLC_SK_DEV_HASH_ENTRIES (1<<LLC_SK_DEV_HASH_BITS)
> +
> +struct llc_sk_list {
> +	rwlock_t lock;
> +	struct hlist_head list;
> +};
> +

This patch introduces a big hash table, and 1024 rwlocks, for IXIACOM need.

Is the problem both on lock contention and lookup ?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery
  2009-12-03 22:31 ` [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery Octavian Purdila
  2009-12-03 22:59   ` Eric Dumazet
@ 2009-12-03 23:25   ` Stephen Hemminger
  2009-12-03 23:53     ` Octavian Purdila
  1 sibling, 1 reply; 22+ messages in thread
From: Stephen Hemminger @ 2009-12-03 23:25 UTC (permalink / raw)
  To: Octavian Purdila; +Cc: opurdila, netdev, Arnaldo Carvalho de Melo

On Fri,  4 Dec 2009 00:31:37 +0200
Octavian Purdila <opurdila@ixiacom.com> wrote:

> This patch adds a per SAP device based hash table to solve the
> multicast delivery scalability issues for the case where the are a
> large number of interfaces and a large number of sockets (bound to the
> same SAP) are used.

Rather than adding hash table and rwlock, why not hash list RCU
and a single spin lock

-- 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery
  2009-12-03 22:59   ` Eric Dumazet
@ 2009-12-03 23:30     ` Octavian Purdila
  2009-12-03 23:52       ` Eric Dumazet
  0 siblings, 1 reply; 22+ messages in thread
From: Octavian Purdila @ 2009-12-03 23:30 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Arnaldo Carvalho de Melo

On Friday 04 December 2009 00:59:58 you wrote:
> Octavian Purdila a écrit :
> > This patch adds a per SAP device based hash table to solve the
> > multicast delivery scalability issues for the case where the are a
> > large number of interfaces and a large number of sockets (bound to the
> > same SAP) are used.
> >
> > Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
> > ---
> >  include/net/llc.h      |   20 ++++++++++++++++----
> >  include/net/llc_conn.h |    1 +
> >  net/llc/llc_conn.c     |   18 +++++++++++++++++-
> >  net/llc/llc_core.c     |    3 +++
> >  net/llc/llc_sap.c      |   11 ++++++-----
> >  5 files changed, 43 insertions(+), 10 deletions(-)
> >
> > diff --git a/include/net/llc.h b/include/net/llc.h
> > index 7940da1..31e9902 100644
> > --- a/include/net/llc.h
> > +++ b/include/net/llc.h
> > @@ -31,6 +31,14 @@ struct llc_addr {
> >  #define LLC_SAP_STATE_INACTIVE	1
> >  #define LLC_SAP_STATE_ACTIVE	2
> >
> > +#define LLC_SK_DEV_HASH_BITS 10
> > +#define LLC_SK_DEV_HASH_ENTRIES (1<<LLC_SK_DEV_HASH_BITS)
> > +
> > +struct llc_sk_list {
> > +	rwlock_t lock;
> > +	struct hlist_head list;
> > +};
> > +
> 
> This patch introduces a big hash table, and 1024 rwlocks, for IXIACOM need.
> 

Yes, that is probably not appropriate for upstream. What would be a good 
value?

> Is the problem both on lock contention and lookup ?

Since at this point we are using UP ports contention is not really an issue 
for us. I've extrapolated this (lock per hash bucket) based on how locking is 
done in other places, like UDP. 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery
  2009-12-03 23:30     ` Octavian Purdila
@ 2009-12-03 23:52       ` Eric Dumazet
  2009-12-04  0:15         ` Octavian Purdila
  2009-12-08 21:10         ` [RFC PATCH] llc: convert the socket list to RCU locking (was Re: [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery) Octavian Purdila
  0 siblings, 2 replies; 22+ messages in thread
From: Eric Dumazet @ 2009-12-03 23:52 UTC (permalink / raw)
  To: Octavian Purdila; +Cc: netdev, Arnaldo Carvalho de Melo

Octavian Purdila a écrit :
> 
> Yes, that is probably not appropriate for upstream. What would be a good 
> value?
> 

A small one to begin (say 64).

> Since at this point we are using UP ports contention is not really an issue 
> for us. I've extrapolated this (lock per hash bucket) based on how locking is 
> done in other places, like UDP. 

Yes but you know we want to remove those locks per UDP hash bucket, since we dont
really need them anymore. ;)


If you remember, we had in the past one rwlock for the whole UDP table.

Then this was converted to one spinlock per hash slot (128 slots) + RCU lookups for unicast RX

Then we dynamically sized udp table at boot (up to 65536 slots)

multicast optimization (holding lock for small duration + double hashing)

bind optimization (thanks to double hashing)

To be done :

1) multicast RX can be done without taking any lock, and RCU lookups
2) zap all locks and use one lock, or a small array of hashed spinlocks



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery
  2009-12-03 23:25   ` [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery Stephen Hemminger
@ 2009-12-03 23:53     ` Octavian Purdila
  2009-12-04  0:37       ` Stephen Hemminger
  0 siblings, 1 reply; 22+ messages in thread
From: Octavian Purdila @ 2009-12-03 23:53 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Arnaldo Carvalho de Melo

On Friday 04 December 2009 01:25:13 you wrote:
> On Fri,  4 Dec 2009 00:31:37 +0200
> 
> Octavian Purdila <opurdila@ixiacom.com> wrote:
> > This patch adds a per SAP device based hash table to solve the
> > multicast delivery scalability issues for the case where the are a
> > large number of interfaces and a large number of sockets (bound to the
> > same SAP) are used.
> 
> Rather than adding hash table and rwlock, why not hash list RCU
> and a single spin lock
> 

I have a partial version with RCU and single spinlock, but then I ran into a 
(Eric's I think) patch which moved the UDP lock per bucket. And since RCU 
can't help on the write side (in this instance each time we bound or delete 
the socket) it was not clear to me what is the best approach.

Should I go ahead with the RCU and single lock?


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/4] llc enhancements
  2009-12-03 22:31 [PATCH 0/4] llc enhancements Octavian Purdila
                   ` (3 preceding siblings ...)
  2009-12-03 22:31 ` [PATCH 4/4] llc: replace the socket list with a local address based hash Octavian Purdila
@ 2009-12-03 23:55 ` David Miller
  2009-12-04  0:20   ` Octavian Purdila
  4 siblings, 1 reply; 22+ messages in thread
From: David Miller @ 2009-12-03 23:55 UTC (permalink / raw)
  To: opurdila; +Cc: netdev, acme

From: Octavian Purdila <opurdila@ixiacom.com>
Date: Fri,  4 Dec 2009 00:31:34 +0200

> 
> This patch series introduces a couple of general enhancements to LLC
> socket code as well as changes to scale the LLC socket lookup code for
> a large number of interfaces needed for our uses cases (STP traffic
> generation from a large number of ports, via virtual network
> interfaces).

These patches are still in flux, in particular the hash table
bits need changes, so this stuff will have to wait until the
next merge window, it's too late for the current one sorry.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery
  2009-12-03 23:52       ` Eric Dumazet
@ 2009-12-04  0:15         ` Octavian Purdila
  2009-12-04  0:28           ` Eric Dumazet
  2009-12-08 21:10         ` [RFC PATCH] llc: convert the socket list to RCU locking (was Re: [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery) Octavian Purdila
  1 sibling, 1 reply; 22+ messages in thread
From: Octavian Purdila @ 2009-12-04  0:15 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Arnaldo Carvalho de Melo

On Friday 04 December 2009 01:52:44 you wrote:
> Octavian Purdila a écrit :

> > Since at this point we are using UP ports contention is not really an
> > issue for us. I've extrapolated this (lock per hash bucket) based on how
> > locking is done in other places, like UDP.
> 
> Yes but you know we want to remove those locks per UDP hash bucket, since
>  we dont really need them anymore. ;)
> 
> If you remember, we had in the past one rwlock for the whole UDP table.
> 
> Then this was converted to one spinlock per hash slot (128 slots) + RCU
>  lookups for unicast RX
> 
> Then we dynamically sized udp table at boot (up to 65536 slots)
> 
> multicast optimization (holding lock for small duration + double hashing)
> 
> bind optimization (thanks to double hashing)
> 
> To be done :
> 
> 1) multicast RX can be done without taking any lock, and RCU lookups
> 2) zap all locks and use one lock, or a small array of hashed spinlocks
> 

Thanks for the nice summary Eric !

I still have one doubt related to this: we still need locking for creating and 
destroying sockets to insert/remove them into/from the hash, RCU can't help us 
here, right? 

In that case wouldn't spinlock contention become an issue for short lived 
connections? Probably not for UDP (or LLC), but for TCP I certainly can think 
of a few usecases for short lived connections.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/4] llc enhancements
  2009-12-03 23:55 ` [PATCH 0/4] llc enhancements David Miller
@ 2009-12-04  0:20   ` Octavian Purdila
  0 siblings, 0 replies; 22+ messages in thread
From: Octavian Purdila @ 2009-12-04  0:20 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, acme

On Friday 04 December 2009 01:55:53 you wrote:
> From: Octavian Purdila <opurdila@ixiacom.com>
> Date: Fri,  4 Dec 2009 00:31:34 +0200
> 
> > This patch series introduces a couple of general enhancements to LLC
> > socket code as well as changes to scale the LLC socket lookup code for
> > a large number of interfaces needed for our uses cases (STP traffic
> > generation from a large number of ports, via virtual network
> > interfaces).
> 
> These patches are still in flux, in particular the hash table
> bits need changes, so this stuff will have to wait until the
> next merge window, it's too late for the current one sorry.
> 

Sure, not problem, I didn't intended these for this merge window. I've should 
have marked them as [RFC PATCH], sorry for the confusion.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery
  2009-12-04  0:15         ` Octavian Purdila
@ 2009-12-04  0:28           ` Eric Dumazet
  0 siblings, 0 replies; 22+ messages in thread
From: Eric Dumazet @ 2009-12-04  0:28 UTC (permalink / raw)
  To: Octavian Purdila; +Cc: netdev, Arnaldo Carvalho de Melo

Octavian Purdila a écrit :
> On Friday 04 December 2009 01:52:44 you wrote:
>> Octavian Purdila a écrit :
> 
>>> Since at this point we are using UP ports contention is not really an
>>> issue for us. I've extrapolated this (lock per hash bucket) based on how
>>> locking is done in other places, like UDP.
>> Yes but you know we want to remove those locks per UDP hash bucket, since
>>  we dont really need them anymore. ;)
>>
>> If you remember, we had in the past one rwlock for the whole UDP table.
>>
>> Then this was converted to one spinlock per hash slot (128 slots) + RCU
>>  lookups for unicast RX
>>
>> Then we dynamically sized udp table at boot (up to 65536 slots)
>>
>> multicast optimization (holding lock for small duration + double hashing)
>>
>> bind optimization (thanks to double hashing)
>>
>> To be done :
>>
>> 1) multicast RX can be done without taking any lock, and RCU lookups
>> 2) zap all locks and use one lock, or a small array of hashed spinlocks
>>
> 
> Thanks for the nice summary Eric !
> 
> I still have one doubt related to this: we still need locking for creating and 
> destroying sockets to insert/remove them into/from the hash, RCU can't help us 
> here, right? 

Sure. RCU is used for readers only. We still need locks to protect writers
against them.

> 
> In that case wouldn't spinlock contention become an issue for short lived 
> connections? Probably not for UDP (or LLC), but for TCP I certainly can think 
> of a few usecases for short lived connections.

Yes, this is why an array of hashed spinlocks would be good, (as already done with TCP
for example, or IP route cache)

(Say you have a table of 65536 UDP slots on your high performance server,
handling millions of udp sockets, you dont _need_ 65536 spinlocks, but some number
related to number of cpus)



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery
  2009-12-03 23:53     ` Octavian Purdila
@ 2009-12-04  0:37       ` Stephen Hemminger
  0 siblings, 0 replies; 22+ messages in thread
From: Stephen Hemminger @ 2009-12-04  0:37 UTC (permalink / raw)
  To: Octavian Purdila; +Cc: netdev, Arnaldo Carvalho de Melo

On Fri, 4 Dec 2009 01:53:11 +0200
Octavian Purdila <opurdila@ixiacom.com> wrote:

> On Friday 04 December 2009 01:25:13 you wrote:
> > On Fri,  4 Dec 2009 00:31:37 +0200
> > 
> > Octavian Purdila <opurdila@ixiacom.com> wrote:
> > > This patch adds a per SAP device based hash table to solve the
> > > multicast delivery scalability issues for the case where the are a
> > > large number of interfaces and a large number of sockets (bound to the
> > > same SAP) are used.
> > 
> > Rather than adding hash table and rwlock, why not hash list RCU
> > and a single spin lock
> > 
> 
> I have a partial version with RCU and single spinlock, but then I ran into a 
> (Eric's I think) patch which moved the UDP lock per bucket. And since RCU 
> can't help on the write side (in this instance each time we bound or delete 
> the socket) it was not clear to me what is the best approach.

The lock is held for such a brief period on connection setup that a single
spinlock is probably ok.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [RFC PATCH] llc: convert the socket list to RCU locking (was Re: [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery)
  2009-12-03 23:52       ` Eric Dumazet
  2009-12-04  0:15         ` Octavian Purdila
@ 2009-12-08 21:10         ` Octavian Purdila
  2009-12-08 21:26           ` Eric Dumazet
                             ` (2 more replies)
  1 sibling, 3 replies; 22+ messages in thread
From: Octavian Purdila @ 2009-12-08 21:10 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Arnaldo Carvalho de Melo

On Friday 04 December 2009 01:52:44 you wrote:

> > Since at this point we are using UP ports contention is not really an
> > issue for us. I've extrapolated this (lock per hash bucket) based on how
> > locking is done in other places, like UDP.
> 
> Yes but you know we want to remove those locks per UDP hash bucket, since
>  we dont really need them anymore. ;)
> 
> 
> If you remember, we had in the past one rwlock for the whole UDP table.
> 
> Then this was converted to one spinlock per hash slot (128 slots) + RCU
>  lookups for unicast RX
> 
> Then we dynamically sized udp table at boot (up to 65536 slots)
> 
> multicast optimization (holding lock for small duration + double hashing)
> 
> bind optimization (thanks to double hashing)
> 
> To be done :
> 
> 1) multicast RX can be done without taking any lock, and RCU lookups
> 2) zap all locks and use one lock, or a small array of hashed spinlocks
> 

OK, here is my first try at llc RCU-fication. 

One doubt before pasting the code: In slab.h comment and in udp.c I see the lookup is restarted if an improper object is returned. Is that really required?

Thanks!
tavi
 
[RFC PATCH] llc: convert the socket list to RCU locking
    
For the reclamation phase we use the SLAB_DESTROY_BY_RCU mechanism,
which require some extra checks in the lookup code:
    
a) Since socket can be free while looking it up, or getting a
reference to it, we need to check the socket reference counter to make
sure the reference we got points to a live socket
    
b) It can also happen that the reference we got during lookup will be
freed and the memory reused by another socket. Thus, after getting the
reference, we must check again that we got the right socket.
    
Note that the /proc code was not yet converted to RCU, it still uses
spinlocks for protection.
    
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
---
 include/net/llc.h  |    7 ++--
 net/llc/af_llc.c   |    1 +
 net/llc/llc_conn.c |   89 ++++++++++++++++++++++++++++++++++------------------
 net/llc/llc_core.c |    5 ++-
 net/llc/llc_proc.c |   20 ++++++------
 net/llc/llc_sap.c  |   72 ++++++++++++++++++++++++++++-------------
 6 files changed, 124 insertions(+), 70 deletions(-)

diff --git a/include/net/llc.h b/include/net/llc.h
index 7940da1..1559cf1 100644
--- a/include/net/llc.h
+++ b/include/net/llc.h
@@ -16,6 +16,7 @@
 #include <linux/if_ether.h>
 #include <linux/list.h>
 #include <linux/spinlock.h>
+#include <linux/rculist_nulls.h>
 
 #include <asm/atomic.h>
 
@@ -53,10 +54,8 @@ struct llc_sap {
 				     struct net_device *orig_dev);
 	struct llc_addr	 laddr;
 	struct list_head node;
-	struct {
-		rwlock_t	  lock;
-		struct hlist_head list;
-	} sk_list;
+	spinlock_t sk_lock;
+	struct hlist_nulls_head sk_list;
 };
 
 #define LLC_DEST_INVALID         0      /* Invalid LLC PDU type */
diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index ffc96d3..baefb4f 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -140,6 +140,7 @@ static struct proto llc_proto = {
 	.name	  = "LLC",
 	.owner	  = THIS_MODULE,
 	.obj_size = sizeof(struct llc_sock),
+	.slab_flags = SLAB_DESTROY_BY_RCU,
 };
 
 /**
diff --git a/net/llc/llc_conn.c b/net/llc/llc_conn.c
index c6bab39..8331fc8 100644
--- a/net/llc/llc_conn.c
+++ b/net/llc/llc_conn.c
@@ -468,6 +468,19 @@ static int llc_exec_conn_trans_actions(struct sock *sk,
 	return rc;
 }
 
+static inline bool llc_estab_match(const struct llc_sap *sap,
+				   const struct llc_addr *daddr,
+				   const struct llc_addr *laddr,
+				   const struct sock *sk)
+{
+	struct llc_sock *llc = llc_sk(sk);
+
+	return llc->laddr.lsap == laddr->lsap &&
+		llc->daddr.lsap == daddr->lsap &&
+		llc_mac_match(llc->laddr.mac, laddr->mac) &&
+		llc_mac_match(llc->daddr.mac, daddr->mac);
+}
+
 /**
  *	__llc_lookup_established - Finds connection for the remote/local sap/mac
  *	@sap: SAP
@@ -484,23 +497,24 @@ static struct sock *__llc_lookup_established(struct llc_sap *sap,
 					     struct llc_addr *laddr)
 {
 	struct sock *rc;
-	struct hlist_node *node;
-
-	read_lock(&sap->sk_list.lock);
-	sk_for_each(rc, node, &sap->sk_list.list) {
-		struct llc_sock *llc = llc_sk(rc);
-
-		if (llc->laddr.lsap == laddr->lsap &&
-		    llc->daddr.lsap == daddr->lsap &&
-		    llc_mac_match(llc->laddr.mac, laddr->mac) &&
-		    llc_mac_match(llc->daddr.mac, daddr->mac)) {
-			sock_hold(rc);
+	struct hlist_nulls_node *node;
+
+	rcu_read_lock();
+	sk_nulls_for_each_rcu(rc, node, &sap->sk_list) {
+		if (llc_estab_match(sap, daddr, laddr, rc)) {
+			/* Extra checks required by SLAB_DESTROY_BY_RCU */
+			if (unlikely(!atomic_inc_not_zero(&rc->sk_refcnt)))
+				continue;
+			if (unlikely(!llc_estab_match(sap, daddr, laddr, rc))) {
+				sock_put(rc);
+				continue;
+			}
 			goto found;
 		}
 	}
 	rc = NULL;
 found:
-	read_unlock(&sap->sk_list.lock);
+	rcu_read_unlock();
 	return rc;
 }
 
@@ -516,6 +530,18 @@ struct sock *llc_lookup_established(struct llc_sap *sap,
 	return sk;
 }
 
+static inline bool llc_listener_match(const struct llc_sap *sap,
+				      const struct llc_addr *laddr,
+				      const struct sock *sk)
+{
+	struct llc_sock *llc = llc_sk(sk);
+
+	return sk->sk_type == SOCK_STREAM && sk->sk_state == TCP_LISTEN &&
+		llc->laddr.lsap == laddr->lsap &&
+		(llc_mac_match(llc->laddr.mac, laddr->mac) ||
+		 llc_mac_null(llc->laddr.mac));
+}
+
 /**
  *	llc_lookup_listener - Finds listener for local MAC + SAP
  *	@sap: SAP
@@ -530,23 +556,24 @@ static struct sock *llc_lookup_listener(struct llc_sap *sap,
 					struct llc_addr *laddr)
 {
 	struct sock *rc;
-	struct hlist_node *node;
-
-	read_lock(&sap->sk_list.lock);
-	sk_for_each(rc, node, &sap->sk_list.list) {
-		struct llc_sock *llc = llc_sk(rc);
-
-		if (rc->sk_type == SOCK_STREAM && rc->sk_state == TCP_LISTEN &&
-		    llc->laddr.lsap == laddr->lsap &&
-		    (llc_mac_match(llc->laddr.mac, laddr->mac) ||
-		     llc_mac_null(llc->laddr.mac))) {
-			sock_hold(rc);
+	struct hlist_nulls_node *node;
+
+	rcu_read_lock();
+	sk_nulls_for_each_rcu(rc, node, &sap->sk_list) {
+		if (llc_listener_match(sap, laddr, rc)) {
+			/* Extra checks required by SLAB_DESTROY_BY_RCU */
+			if (unlikely(!atomic_inc_not_zero(&rc->sk_refcnt)))
+				continue;
+			if (unlikely(!llc_listener_match(sap, laddr, rc))) {
+				sock_put(rc);
+				continue;
+			}
 			goto found;
 		}
 	}
 	rc = NULL;
 found:
-	read_unlock(&sap->sk_list.lock);
+	rcu_read_unlock();
 	return rc;
 }
 
@@ -652,10 +679,10 @@ static int llc_find_offset(int state, int ev_type)
 void llc_sap_add_socket(struct llc_sap *sap, struct sock *sk)
 {
 	llc_sap_hold(sap);
-	write_lock_bh(&sap->sk_list.lock);
+	spin_lock_bh(&sap->sk_lock);
 	llc_sk(sk)->sap = sap;
-	sk_add_node(sk, &sap->sk_list.list);
-	write_unlock_bh(&sap->sk_list.lock);
+	sk_nulls_add_node_rcu(sk, &sap->sk_list);
+	spin_lock_bh(&sap->sk_lock);
 }
 
 /**
@@ -663,14 +690,14 @@ void llc_sap_add_socket(struct llc_sap *sap, struct sock *sk)
  *	@sap: SAP
  *	@sk: socket
  *
- *	This function removes a connection from sk_list.list of a SAP if
+ *	This function removes a connection from sk_list of a SAP if
  *	the connection was in this list.
  */
 void llc_sap_remove_socket(struct llc_sap *sap, struct sock *sk)
 {
-	write_lock_bh(&sap->sk_list.lock);
-	sk_del_node_init(sk);
-	write_unlock_bh(&sap->sk_list.lock);
+	spin_lock_bh(&sap->sk_lock);
+	sk_nulls_del_node_init_rcu(sk);
+	spin_unlock_bh(&sap->sk_lock);
 	llc_sap_put(sap);
 }
 
diff --git a/net/llc/llc_core.c b/net/llc/llc_core.c
index ff4c0ab..5276b97 100644
--- a/net/llc/llc_core.c
+++ b/net/llc/llc_core.c
@@ -37,7 +37,8 @@ static struct llc_sap *llc_sap_alloc(void)
 	if (sap) {
 		/* sap->laddr.mac - leave as a null, it's filled by bind */
 		sap->state = LLC_SAP_STATE_ACTIVE;
-		rwlock_init(&sap->sk_list.lock);
+		spin_lock_init(&sap->sk_lock);
+		INIT_HLIST_NULLS_HEAD(&sap->sk_list, 0);
 		atomic_set(&sap->refcnt, 1);
 	}
 	return sap;
@@ -142,7 +143,7 @@ out:
  */
 void llc_sap_close(struct llc_sap *sap)
 {
-	WARN_ON(!hlist_empty(&sap->sk_list.list));
+	WARN_ON(!hlist_nulls_empty(&sap->sk_list));
 	llc_del_sap(sap);
 	kfree(sap);
 }
diff --git a/net/llc/llc_proc.c b/net/llc/llc_proc.c
index be47ac4..6546093 100644
--- a/net/llc/llc_proc.c
+++ b/net/llc/llc_proc.c
@@ -34,19 +34,19 @@ static struct sock *llc_get_sk_idx(loff_t pos)
 {
 	struct list_head *sap_entry;
 	struct llc_sap *sap;
-	struct hlist_node *node;
+	struct hlist_nulls_node *node;
 	struct sock *sk = NULL;
 
 	list_for_each(sap_entry, &llc_sap_list) {
 		sap = list_entry(sap_entry, struct llc_sap, node);
 
-		read_lock_bh(&sap->sk_list.lock);
-		sk_for_each(sk, node, &sap->sk_list.list) {
+		spin_lock_bh(&sap->sk_lock);
+		sk_nulls_for_each(sk, node, &sap->sk_list) {
 			if (!pos)
 				goto found;
 			--pos;
 		}
-		read_unlock_bh(&sap->sk_list.lock);
+		spin_unlock_bh(&sap->sk_lock);
 	}
 	sk = NULL;
 found:
@@ -80,18 +80,18 @@ static void *llc_seq_next(struct seq_file *seq, void *v, loff_t *pos)
 	}
 	llc = llc_sk(sk);
 	sap = llc->sap;
-	read_unlock_bh(&sap->sk_list.lock);
+	spin_unlock_bh(&sap->sk_lock);
 	sk = NULL;
 	for (;;) {
 		if (sap->node.next == &llc_sap_list)
 			break;
 		sap = list_entry(sap->node.next, struct llc_sap, node);
-		read_lock_bh(&sap->sk_list.lock);
-		if (!hlist_empty(&sap->sk_list.list)) {
-			sk = sk_head(&sap->sk_list.list);
+		spin_lock_bh(&sap->sk_lock);
+		if (!hlist_nulls_empty(&sap->sk_list)) {
+			sk = sk_nulls_head(&sap->sk_list);
 			break;
 		}
-		read_unlock_bh(&sap->sk_list.lock);
+		spin_unlock_bh(&sap->sk_lock);
 	}
 out:
 	return sk;
@@ -104,7 +104,7 @@ static void llc_seq_stop(struct seq_file *seq, void *v)
 		struct llc_sock *llc = llc_sk(sk);
 		struct llc_sap *sap = llc->sap;
 
-		read_unlock_bh(&sap->sk_list.lock);
+		spin_unlock_bh(&sap->sk_lock);
 	}
 	read_unlock_bh(&llc_sap_list_lock);
 }
diff --git a/net/llc/llc_sap.c b/net/llc/llc_sap.c
index 008de1f..655bbf8 100644
--- a/net/llc/llc_sap.c
+++ b/net/llc/llc_sap.c
@@ -297,6 +297,17 @@ static void llc_sap_rcv(struct llc_sap *sap, struct sk_buff *skb,
 	llc_sap_state_process(sap, skb);
 }
 
+static inline bool llc_dgram_match(const struct llc_sap *sap,
+				   const struct llc_addr *laddr,
+				   const struct sock *sk)
+{
+     struct llc_sock *llc = llc_sk(sk);
+
+     return sk->sk_type == SOCK_DGRAM &&
+	  llc->laddr.lsap == laddr->lsap &&
+	  llc_mac_match(llc->laddr.mac, laddr->mac);
+}
+
 /**
  *	llc_lookup_dgram - Finds dgram socket for the local sap/mac
  *	@sap: SAP
@@ -309,25 +320,39 @@ static struct sock *llc_lookup_dgram(struct llc_sap *sap,
 				     const struct llc_addr *laddr)
 {
 	struct sock *rc;
-	struct hlist_node *node;
-
-	read_lock_bh(&sap->sk_list.lock);
-	sk_for_each(rc, node, &sap->sk_list.list) {
-		struct llc_sock *llc = llc_sk(rc);
-
-		if (rc->sk_type == SOCK_DGRAM &&
-		    llc->laddr.lsap == laddr->lsap &&
-		    llc_mac_match(llc->laddr.mac, laddr->mac)) {
-			sock_hold(rc);
+	struct hlist_nulls_node *node;
+
+	rcu_read_lock_bh();
+	sk_nulls_for_each_rcu(rc, node, &sap->sk_list) {
+		if (llc_dgram_match(sap, laddr, rc)) {
+			/* Extra checks required by SLAB_DESTROY_BY_RCU */
+			if (unlikely(!atomic_inc_not_zero(&rc->sk_refcnt)))
+				continue;
+			if (unlikely(!llc_dgram_match(sap, laddr, rc))) {
+				sock_put(rc);
+				continue;
+			}
 			goto found;
 		}
 	}
 	rc = NULL;
 found:
-	read_unlock_bh(&sap->sk_list.lock);
+	rcu_read_unlock_bh();
 	return rc;
 }
 
+static inline bool llc_mcast_match(const struct llc_sap *sap,
+				   const struct llc_addr *laddr,
+				   const struct sk_buff *skb,
+				   const struct sock *sk)
+{
+     struct llc_sock *llc = llc_sk(sk);
+
+     return sk->sk_type == SOCK_DGRAM &&
+	  llc->laddr.lsap == laddr->lsap &&
+	  llc->dev == skb->dev;
+}
+
 /**
  * 	llc_sap_mcast - Deliver multicast PDU's to all matching datagram sockets.
  *	@sap: SAP
@@ -341,31 +366,32 @@ static void llc_sap_mcast(struct llc_sap *sap,
 			  struct sk_buff *skb)
 {
 	struct sock *sk;
-	struct hlist_node *node;
+	struct hlist_nulls_node *node;
 
-	read_lock_bh(&sap->sk_list.lock);
-	sk_for_each(sk, node, &sap->sk_list.list) {
-		struct llc_sock *llc = llc_sk(sk);
+	rcu_read_lock_bh();
+	sk_nulls_for_each_rcu(sk, node, &sap->sk_list) {
 		struct sk_buff *skb1;
 
-		if (sk->sk_type != SOCK_DGRAM)
+		if (!llc_mcast_match(sap, laddr, skb, sk))
 			continue;
-
-		if (llc->laddr.lsap != laddr->lsap)
+		/* Extra checks required by SLAB_DESTROY_BY_RCU */
+		if (unlikely(!atomic_inc_not_zero(&sk->sk_refcnt)))
 			continue;
-
-		if (llc->dev != skb->dev)
+		if (unlikely(!llc_mcast_match(sap, laddr, skb, sk))) {
+			sock_put(sk);
 			continue;
+		}
 
 		skb1 = skb_clone(skb, GFP_ATOMIC);
-		if (!skb1)
+		if (!skb1) {
+			sock_put(sk);
 			break;
+		}
 
-		sock_hold(sk);
 		llc_sap_rcv(sap, skb1, sk);
 		sock_put(sk);
 	}
-	read_unlock_bh(&sap->sk_list.lock);
+	rcu_read_unlock_bh();
 }
 
 


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH] llc: convert the socket list to RCU locking (was Re: [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery)
  2009-12-08 21:10         ` [RFC PATCH] llc: convert the socket list to RCU locking (was Re: [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery) Octavian Purdila
@ 2009-12-08 21:26           ` Eric Dumazet
  2009-12-09 20:19           ` Eric Dumazet
  2009-12-09 20:52           ` Jarek Poplawski
  2 siblings, 0 replies; 22+ messages in thread
From: Eric Dumazet @ 2009-12-08 21:26 UTC (permalink / raw)
  To: Octavian Purdila; +Cc: netdev, Arnaldo Carvalho de Melo

Octavian Purdila a écrit :
>>
> 
> OK, here is my first try at llc RCU-fication. 

Nice work !

> 
> One doubt before pasting the code: In slab.h comment and in udp.c I see the lookup is restarted if an improper object is returned. Is that really required?
> 


Its needed only if you convert to a hash table (more than one chain),
and I guess you definitly want a fanout of your XXXX items ?

Check Documentation/RCU/rculist_nulls.txt for details :)


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH] llc: convert the socket list to RCU locking (was Re: [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery)
  2009-12-08 21:10         ` [RFC PATCH] llc: convert the socket list to RCU locking (was Re: [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery) Octavian Purdila
  2009-12-08 21:26           ` Eric Dumazet
@ 2009-12-09 20:19           ` Eric Dumazet
  2009-12-09 20:36             ` Octavian Purdila
  2009-12-09 20:52           ` Jarek Poplawski
  2 siblings, 1 reply; 22+ messages in thread
From: Eric Dumazet @ 2009-12-09 20:19 UTC (permalink / raw)
  To: Octavian Purdila; +Cc: netdev, Arnaldo Carvalho de Melo

Le 08/12/2009 22:10, Octavian Purdila a écrit :
> On Friday 04 December 2009 01:52:44 you wrote:
> 
>>> Since at this point we are using UP ports contention is not really an
>>> issue for us. I've extrapolated this (lock per hash bucket) based on how
>>> locking is done in other places, like UDP.
>>
>> Yes but you know we want to remove those locks per UDP hash bucket, since
>>  we dont really need them anymore. ;)
>>
>>
>> If you remember, we had in the past one rwlock for the whole UDP table.
>>
>> Then this was converted to one spinlock per hash slot (128 slots) + RCU
>>  lookups for unicast RX
>>
>> Then we dynamically sized udp table at boot (up to 65536 slots)
>>
>> multicast optimization (holding lock for small duration + double hashing)
>>
>> bind optimization (thanks to double hashing)
>>
>> To be done :
>>
>> 1) multicast RX can be done without taking any lock, and RCU lookups
>> 2) zap all locks and use one lock, or a small array of hashed spinlocks
>>
> 
> OK, here is my first try at llc RCU-fication. 
> 
> One doubt before pasting the code: In slab.h comment and in udp.c I see the lookup is restarted if an improper object is returned. Is that really required?
> 
> Thanks!
> tavi
>  
> [RFC PATCH] llc: convert the socket list to RCU locking
>     
> For the reclamation phase we use the SLAB_DESTROY_BY_RCU mechanism,
> which require some extra checks in the lookup code:
>     
> a) Since socket can be free while looking it up, or getting a
> reference to it, we need to check the socket reference counter to make
> sure the reference we got points to a live socket
>     
> b) It can also happen that the reference we got during lookup will be
> freed and the memory reused by another socket. Thus, after getting the
> reference, we must check again that we got the right socket.
>     
> Note that the /proc code was not yet converted to RCU, it still uses
> spinlocks for protection.
>     
> Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
> ---
>  include/net/llc.h  |    7 ++--
>  net/llc/af_llc.c   |    1 +
>  net/llc/llc_conn.c |   89 ++++++++++++++++++++++++++++++++++------------------
>  net/llc/llc_core.c |    5 ++-
>  net/llc/llc_proc.c |   20 ++++++------
>  net/llc/llc_sap.c  |   72 ++++++++++++++++++++++++++++-------------
>  6 files changed, 124 insertions(+), 70 deletions(-)
> 
> diff --git a/include/net/llc.h b/include/net/llc.h
> index 7940da1..1559cf1 100644
> --- a/include/net/llc.h
> +++ b/include/net/llc.h
> @@ -16,6 +16,7 @@
>  #include <linux/if_ether.h>
>  #include <linux/list.h>
>  #include <linux/spinlock.h>
> +#include <linux/rculist_nulls.h>
>  
>  #include <asm/atomic.h>
>  
> @@ -53,10 +54,8 @@ struct llc_sap {
>  				     struct net_device *orig_dev);
>  	struct llc_addr	 laddr;
>  	struct list_head node;
> -	struct {
> -		rwlock_t	  lock;
> -		struct hlist_head list;
> -	} sk_list;
> +	spinlock_t sk_lock;
> +	struct hlist_nulls_head sk_list;
>  };
>  
>  #define LLC_DEST_INVALID         0      /* Invalid LLC PDU type */
> diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
> index ffc96d3..baefb4f 100644
> --- a/net/llc/af_llc.c
> +++ b/net/llc/af_llc.c
> @@ -140,6 +140,7 @@ static struct proto llc_proto = {
>  	.name	  = "LLC",
>  	.owner	  = THIS_MODULE,
>  	.obj_size = sizeof(struct llc_sock),
> +	.slab_flags = SLAB_DESTROY_BY_RCU,
>  };
>  
>  /**
> diff --git a/net/llc/llc_conn.c b/net/llc/llc_conn.c
> index c6bab39..8331fc8 100644
> --- a/net/llc/llc_conn.c
> +++ b/net/llc/llc_conn.c
> @@ -468,6 +468,19 @@ static int llc_exec_conn_trans_actions(struct sock *sk,
>  	return rc;
>  }
>  
> +static inline bool llc_estab_match(const struct llc_sap *sap,
> +				   const struct llc_addr *daddr,
> +				   const struct llc_addr *laddr,
> +				   const struct sock *sk)
> +{
> +	struct llc_sock *llc = llc_sk(sk);
> +
> +	return llc->laddr.lsap == laddr->lsap &&
> +		llc->daddr.lsap == daddr->lsap &&
> +		llc_mac_match(llc->laddr.mac, laddr->mac) &&
> +		llc_mac_match(llc->daddr.mac, daddr->mac);
> +}
> +
>  /**
>   *	__llc_lookup_established - Finds connection for the remote/local sap/mac
>   *	@sap: SAP
> @@ -484,23 +497,24 @@ static struct sock *__llc_lookup_established(struct llc_sap *sap,
>  					     struct llc_addr *laddr)
>  {
>  	struct sock *rc;
> -	struct hlist_node *node;
> -
> -	read_lock(&sap->sk_list.lock);
> -	sk_for_each(rc, node, &sap->sk_list.list) {
> -		struct llc_sock *llc = llc_sk(rc);
> -
> -		if (llc->laddr.lsap == laddr->lsap &&
> -		    llc->daddr.lsap == daddr->lsap &&
> -		    llc_mac_match(llc->laddr.mac, laddr->mac) &&
> -		    llc_mac_match(llc->daddr.mac, daddr->mac)) {
> -			sock_hold(rc);
> +	struct hlist_nulls_node *node;
> +
> +	rcu_read_lock();
> +	sk_nulls_for_each_rcu(rc, node, &sap->sk_list) {
> +		if (llc_estab_match(sap, daddr, laddr, rc)) {
> +			/* Extra checks required by SLAB_DESTROY_BY_RCU */
> +			if (unlikely(!atomic_inc_not_zero(&rc->sk_refcnt)))
> +				continue;

Hmm, this wont work in fact, because if we have several llc_sap allocated on machine,
we have no guarantee a freed socket wont be reused and inserted in another llc_sap list.

So before calling llc_estab_match() (and/or llc_listener_match()) you should
check if 'rc' really belong to current sap.

If not, you must restart the lookup (you hit a socket that was moved to another sap)


> +			if (unlikely(!llc_estab_match(sap, daddr, laddr, rc))) {
> +				sock_put(rc);
> +				continue;
> +			}
>  			goto found;
>  		}
>  	}
>  	rc = NULL;
>  found:
> -	read_unlock(&sap->sk_list.lock);
> +	rcu_read_unlock();
>  	return rc;
>  }
>  


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH] llc: convert the socket list to RCU locking (was Re: [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery)
  2009-12-09 20:19           ` Eric Dumazet
@ 2009-12-09 20:36             ` Octavian Purdila
  2009-12-09 21:49               ` Octavian Purdila
  0 siblings, 1 reply; 22+ messages in thread
From: Octavian Purdila @ 2009-12-09 20:36 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Arnaldo Carvalho de Melo

On Wednesday 09 December 2009 22:19:24 you wrote:

> > +     rcu_read_lock();
> > +     sk_nulls_for_each_rcu(rc, node, &sap->sk_list) {
> > +             if (llc_estab_match(sap, daddr, laddr, rc)) {
> > +                     /* Extra checks required by SLAB_DESTROY_BY_RCU */
> > +                     if (unlikely(!atomic_inc_not_zero(&rc->sk_refcnt)))
> > +                             continue;
> 
> Hmm, this wont work in fact, because if we have several llc_sap allocated
>  on machine, we have no guarantee a freed socket wont be reused and
>  inserted in another llc_sap list.
> 
> So before calling llc_estab_match() (and/or llc_listener_match()) you
>  should check if 'rc' really belong to current sap.
> 
> If not, you must restart the lookup (you hit a socket that was moved to
>  another sap)
> 

Right ! 

>> 
>> One doubt before pasting the code: In slab.h comment and in udp.c I see the 
lookup is restarted if an improper object is returned. Is that really 
required?
>> 
>
>
>Its needed only if you convert to a hash table (more than one chain),
>and I guess you definitly want a fanout of your XXXX items ?
>
>Check Documentation/RCU/rculist_nulls.txt for details :)

And of course I'll still want to add the hash table ;) I was just trying to 
get my head around something simpler first.

I think I have everything  (in my head :) ) that I need now to come back with 
a v2 llc patch set now. Thanks for your help !


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH] llc: convert the socket list to RCU locking (was Re: [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery)
  2009-12-08 21:10         ` [RFC PATCH] llc: convert the socket list to RCU locking (was Re: [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery) Octavian Purdila
  2009-12-08 21:26           ` Eric Dumazet
  2009-12-09 20:19           ` Eric Dumazet
@ 2009-12-09 20:52           ` Jarek Poplawski
  2 siblings, 0 replies; 22+ messages in thread
From: Jarek Poplawski @ 2009-12-09 20:52 UTC (permalink / raw)
  To: Octavian Purdila; +Cc: Eric Dumazet, netdev, Arnaldo Carvalho de Melo

Octavian Purdila wrote, On 12/08/2009 10:10 PM:

...
> @@ -652,10 +679,10 @@ static int llc_find_offset(int state, int ev_type)
>  void llc_sap_add_socket(struct llc_sap *sap, struct sock *sk)
>  {
>  	llc_sap_hold(sap);
> -	write_lock_bh(&sap->sk_list.lock);
> +	spin_lock_bh(&sap->sk_lock);
>  	llc_sk(sk)->sap = sap;
> -	sk_add_node(sk, &sap->sk_list.list);
> -	write_unlock_bh(&sap->sk_list.lock);
> +	sk_nulls_add_node_rcu(sk, &sap->sk_list);
> +	spin_lock_bh(&sap->sk_lock);
------->^^^^^^^^^^


Jarek P.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH] llc: convert the socket list to RCU locking (was Re: [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery)
  2009-12-09 20:36             ` Octavian Purdila
@ 2009-12-09 21:49               ` Octavian Purdila
  2009-12-09 22:34                 ` Eric Dumazet
  0 siblings, 1 reply; 22+ messages in thread
From: Octavian Purdila @ 2009-12-09 21:49 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, Arnaldo Carvalho de Melo

On Wednesday 09 December 2009 22:36:06 you wrote:
 
> I think I have everything  (in my head :) ) that I need now to come back
>  with a v2 llc patch set now. 

Hmm, not really :) I still can't see how we can do multicast delivery only with RCU when we need to restart the lookup.

I think you mentioned that it is actually possible to do this for UDP, but I didn't found out how :-?

Also, I think we need to restart the lookup even when only one list is used, when we found out that the object is either free or not the one we were looking for, since the current object might have been added to the end of the 
list, thus short circuiting our search. E.g.:

@@ -323,14 +323,16 @@ static struct sock *llc_lookup_dgram(struct llc_sap *sap,
        struct hlist_nulls_node *node;
 
        rcu_read_lock_bh();
+again:
        sk_nulls_for_each_rcu(rc, node, &sap->sk_list) {
+               if (llc_sk(rc)->sap != sap)
+                       goto again;
                if (llc_dgram_match(sap, laddr, rc)) {
                        /* Extra checks required by SLAB_DESTROY_BY_RCU */
                        if (unlikely(!atomic_inc_not_zero(&rc->sk_refcnt)))
-                               continue;
+                               goto again;
                        if (unlikely(!llc_dgram_match(sap, laddr, rc))) {
                                sock_put(rc);
-                               continue;
+                               goto again;
                        }
                        goto found;
                }

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [RFC PATCH] llc: convert the socket list to RCU locking (was Re: [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery)
  2009-12-09 21:49               ` Octavian Purdila
@ 2009-12-09 22:34                 ` Eric Dumazet
  0 siblings, 0 replies; 22+ messages in thread
From: Eric Dumazet @ 2009-12-09 22:34 UTC (permalink / raw)
  To: Octavian Purdila; +Cc: netdev, Arnaldo Carvalho de Melo

Le 09/12/2009 22:49, Octavian Purdila a écrit :
> On Wednesday 09 December 2009 22:36:06 you wrote:
>  
>> I think I have everything  (in my head :) ) that I need now to come back
>>  with a v2 llc patch set now. 
> 
> Hmm, not really :) I still can't see how we can do multicast delivery only with RCU when we need to restart the lookup.
> 
> I think you mentioned that it is actually possible to do this for UDP, but I didn't found out how :-?

A patch is coming for this, it might need a seqlock (instead of a spinlock).

Do the list lookup and take a refcount on all matching sockets, store pointers in a 'stack'.

If lookup was sucessful, then multicast packet to all sockets present in stack.

> 
> Also, I think we need to restart the lookup even when only one list is used, when we found out that the object is either free or not the one we were looking for, since the current object might have been added to the end of the 
> list, thus short circuiting our search. E.g.:
> 

Not necessary, because a socket is always added at front of the list.

You have a guarantee that you examined all sockets at least once.

But it doesnt matter so much anyway, because you dont want a single list :)


^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2009-12-09 22:34 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-12-03 22:31 [PATCH 0/4] llc enhancements Octavian Purdila
2009-12-03 22:31 ` [PATCH 1/4] llc: use dev_hard_header Octavian Purdila
2009-12-03 22:31 ` [PATCH 2/4] llc: add support for LLC_OPT_PKTINFO Octavian Purdila
2009-12-03 22:31 ` [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery Octavian Purdila
2009-12-03 22:59   ` Eric Dumazet
2009-12-03 23:30     ` Octavian Purdila
2009-12-03 23:52       ` Eric Dumazet
2009-12-04  0:15         ` Octavian Purdila
2009-12-04  0:28           ` Eric Dumazet
2009-12-08 21:10         ` [RFC PATCH] llc: convert the socket list to RCU locking (was Re: [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery) Octavian Purdila
2009-12-08 21:26           ` Eric Dumazet
2009-12-09 20:19           ` Eric Dumazet
2009-12-09 20:36             ` Octavian Purdila
2009-12-09 21:49               ` Octavian Purdila
2009-12-09 22:34                 ` Eric Dumazet
2009-12-09 20:52           ` Jarek Poplawski
2009-12-03 23:25   ` [PATCH 3/4] llc: use a device based hash table to speed up multicast delivery Stephen Hemminger
2009-12-03 23:53     ` Octavian Purdila
2009-12-04  0:37       ` Stephen Hemminger
2009-12-03 22:31 ` [PATCH 4/4] llc: replace the socket list with a local address based hash Octavian Purdila
2009-12-03 23:55 ` [PATCH 0/4] llc enhancements David Miller
2009-12-04  0:20   ` Octavian Purdila

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.