linux-kernel.vger.kernel.org archive mirror
* [Patch v8 0/3] net: reserve ports for applications using fixed port numbers
@ 2010-04-12 10:03 Amerigo Wang
  2010-04-12 10:04 ` [Patch 1/3] sysctl: refactor integer handling proc code Amerigo Wang
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Amerigo Wang @ 2010-04-12 10:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Octavian Purdila, ebiederm, Eric Dumazet, penguin-kernel, netdev,
	Neil Horman, Amerigo Wang, David Miller

Changes from the previous version:
- Rename proc_{get,put}_ulong to proc_{get,put}_long();
- Fix potential infinite-loop problems in the cma code.

------------->

This patch introduces /proc/sys/net/ipv4/ip_local_reserved_ports which
allows users to reserve ports for third-party applications.

The reserved ports will not be used by automatic port assignments
(e.g. when calling connect() or bind() with port number 0). Explicit
port allocation behavior is unchanged.
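
From userspace, "automatic port assignment" is what happens in a sketch
like the one below. It is only an illustration: auto_assigned_port() is a
hypothetical helper name, and it simply binds a TCP socket to the loopback
address with port 0 and reports which port the kernel picked. With this
series applied, the port returned should never be one listed in
ip_local_reserved_ports.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a TCP socket to 127.0.0.1 with port 0 and
 * return the port chosen by the kernel, or -1 on any failure. */
static int auto_assigned_port(void)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	int port = -1;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;	/* 0 asks the kernel to choose a port */

	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
	    getsockname(fd, (struct sockaddr *)&addr, &len) == 0)
		port = ntohs(addr.sin_port);

	close(fd);
	return port;
}
```

A bind() to an explicit, non-zero port does not go through this selection,
which is why explicit allocation is unaffected.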

There is still some misbehavior in proc parsing for odd invalid
input (for "40000\0-40001" everything is acknowledged but only 40000
is accepted), but it is not easy to fix without changing the current
"acknowledge how much we accepted" behavior.

Because of that, and because the same issues are present in the
existing proc_dointvec code as well, I don't think it's worth holding
back the actual feature (port reservation) over such petty error
recovery issues.




* [Patch 1/3] sysctl: refactor integer handling proc code
  2010-04-12 10:03 [Patch v8 0/3] net: reserve ports for applications using fixed port numbers Amerigo Wang
@ 2010-04-12 10:04 ` Amerigo Wang
  2010-04-13 11:18   ` Alexey Dobriyan
  2010-04-12 10:04 ` [Patch 2/3] sysctl: add proc_do_large_bitmap Amerigo Wang
  2010-04-12 10:04 ` [Patch 3/3] net: reserve ports for applications using fixed port numbers Amerigo Wang
  2 siblings, 1 reply; 22+ messages in thread
From: Amerigo Wang @ 2010-04-12 10:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: Octavian Purdila, Eric Dumazet, penguin-kernel, netdev,
	Neil Horman, ebiederm, David Miller, Amerigo Wang


From: Octavian Purdila <opurdila@ixiacom.com>

As we are about to add another integer handling proc function a little
bit of cleanup is in order: add a few helper functions to improve code
readability and decrease code duplication.

In the process a bug is also fixed: if the user specifies a number
with more than 20 digits it will be interpreted as two integers
(e.g. 10000...13 will be interpreted as 100.... and 13).
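
The effect can be modelled in userspace. The sketch below is not the
kernel code: parse_chunked() is a hypothetical helper that mimics the old
loop by copying at most 20 bytes into a temporary buffer, parsing that
buffer with strtoul(), and then resuming where the buffer ended. The
trailer check sees the terminating NUL of the temporary buffer rather than
the next input character, so it cannot notice that the number continues.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define CHUNK 20	/* old TMPBUFLEN was 21: 20 characters plus NUL */

/* Hypothetical model of the old parsing loop. Stores up to max parsed
 * values in out and returns how many integers were found in s. */
static int parse_chunked(const char *s, unsigned long *out, int max)
{
	size_t left = strlen(s);
	int n = 0;

	while (left && n < max) {
		char tmp[CHUNK + 1], *end;
		size_t len;

		while (left && isspace((unsigned char)*s)) {
			s++;
			left--;
		}
		if (!left)
			break;

		len = left > CHUNK ? CHUNK : left;
		memcpy(tmp, s, len);
		tmp[len] = '\0';
		if (!isdigit((unsigned char)tmp[0]))
			break;

		out[n] = strtoul(tmp, &end, 10);
		len = end - tmp;
		/* Same trailer check as the old code: at the end of a full
		 * chunk *end is NUL, so an over-long number is accepted. */
		if (len < left && *end && !isspace((unsigned char)*end))
			break;
		n++;
		s += len;
		left -= len;
	}
	return n;
}
```

Feeding this a 22-digit string yields two values, the second being the
last two digits, which is the behavior described above.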

Behavior for EFAULT handling was changed as well. Before this
patch, when an EFAULT error occurred in the middle of a write
operation, some of the elements had already been set, but that was
not acknowledged to the user (by shortening the write and returning
the number of bytes accepted). EFAULT is now treated just like any
other error: the number of bytes accepted is acknowledged.

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---

Index: linux-2.6/kernel/sysctl.c
===================================================================
--- linux-2.6.orig/kernel/sysctl.c
+++ linux-2.6/kernel/sysctl.c
@@ -2040,8 +2040,148 @@ int proc_dostring(struct ctl_table *tabl
 			       buffer, lenp, ppos);
 }
 
+static int proc_skip_wspace(char __user **buf, size_t *size)
+{
+	char c;
+
+	while (*size) {
+		if (get_user(c, *buf))
+			return -EFAULT;
+		if (!isspace(c))
+			break;
+		(*size)--;
+		(*buf)++;
+	}
+
+	return 0;
+}
+
+static bool isanyof(char c, const char *v, unsigned len)
+{
+	int i;
+
+	if (!len)
+		return false;
+
+	for (i = 0; i < len; i++)
+		if (c == v[i])
+			break;
+	if (i == len)
+		return false;
+
+	return true;
+}
+
+#define TMPBUFLEN 22
+/**
+ * proc_get_long - reads an ASCII formated integer from a user buffer
+ *
+ * @buf - user buffer
+ * @size - size of the user buffer
+ * @val - this is where the number will be stored
+ * @neg - set to %TRUE if number is negative
+ * @perm_tr - a vector which contains the allowed trailers
+ * @perm_tr_len - size of the perm_tr vector
+ * @tr - pointer to store the trailer character
+ *
+ * In case of success 0 is returned and buf and size are updated with
+ * the amount of bytes read. If tr is non NULL and a trailing
+ * character exist (size is non zero after returning from this
+ * function) tr is updated with the trailing character.
+ */
+static int proc_get_long(char __user **buf, size_t *size,
+			  unsigned long *val, bool *neg,
+			  const char *perm_tr, unsigned perm_tr_len, char *tr)
+{
+	int len;
+	char *p, tmp[TMPBUFLEN];
+
+	if (!*size)
+		return -EINVAL;
+
+	len = *size;
+	if (len > TMPBUFLEN-1)
+		len = TMPBUFLEN-1;
+
+	if (copy_from_user(tmp, *buf, len))
+		return -EFAULT;
+
+	tmp[len] = 0;
+	p = tmp;
+	if (*p == '-' && *size > 1) {
+		*neg = 1;
+		p++;
+	} else
+		*neg = 0;
+	if (!isdigit(*p))
+		return -EINVAL;
+
+	*val = simple_strtoul(p, &p, 0);
+
+	len = p - tmp;
+
+	/* We don't know if the next char is whitespace thus we may accept
+	 * invalid integers (e.g. 1234...a) or two integers instead of one
+	 * (e.g. 123...1). So lets not allow such large numbers. */
+	if (len == TMPBUFLEN - 1)
+		return -EINVAL;
+
+	if (len < *size && perm_tr_len && !isanyof(*p, perm_tr, perm_tr_len))
+		return -EINVAL;
+
+	if (tr && (len < *size))
+		*tr = *p;
+
+	*buf += len;
+	*size -= len;
+
+	return 0;
+}
+
+/**
+ * proc_put_long - coverts an integer to a decimal ASCII formated string
+ *
+ * @buf - the user buffer
+ * @size - the size of the user buffer
+ * @val - the integer to be converted
+ * @neg - sign of the number, %TRUE for negative
+ * @first - if %FALSE will insert a separator character before the number
+ * @separator - the separator character
+ *
+ * In case of success 0 is returned and buf and size are updated with
+ * the amount of bytes read.
+ */
+static int proc_put_long(char __user **buf, size_t *size, unsigned long val,
+			  bool neg, bool first, char separator)
+{
+	int len;
+	char tmp[TMPBUFLEN], *p = tmp;
+
+	if (!first)
+		*p++ = separator;
+	sprintf(p, "%s%lu", neg ? "-" : "", val);
+	len = strlen(tmp);
+	if (len > *size)
+		len = *size;
+	if (copy_to_user(*buf, tmp, len))
+		return -EFAULT;
+	*size -= len;
+	*buf += len;
+	return 0;
+}
+#undef TMPBUFLEN
+
+static int proc_put_char(char __user **buf, size_t *size, char c)
+{
+	if (*size) {
+		if (put_user(c, *buf))
+			return -EFAULT;
+		(*size)--, (*buf)++;
+	}
+	return 0;
+}
 
-static int do_proc_dointvec_conv(int *negp, unsigned long *lvalp,
+static int do_proc_dointvec_conv(bool *negp, unsigned long *lvalp,
 				 int *valp,
 				 int write, void *data)
 {
@@ -2050,7 +2190,7 @@ static int do_proc_dointvec_conv(int *ne
 	} else {
 		int val = *valp;
 		if (val < 0) {
-			*negp = -1;
+			*negp = 1;
 			*lvalp = (unsigned long)-val;
 		} else {
 			*negp = 0;
@@ -2060,20 +2200,18 @@ static int do_proc_dointvec_conv(int *ne
 	return 0;
 }
 
+static const char proc_wspace_sep[] = { ' ', '\t', '\n', 0 };
+
 static int __do_proc_dointvec(void *tbl_data, struct ctl_table *table,
-		  int write, void __user *buffer,
+		  int write, void __user *_buffer,
 		  size_t *lenp, loff_t *ppos,
-		  int (*conv)(int *negp, unsigned long *lvalp, int *valp,
+		  int (*conv)(bool *negp, unsigned long *lvalp, int *valp,
 			      int write, void *data),
 		  void *data)
 {
-#define TMPBUFLEN 21
-	int *i, vleft, first = 1, neg;
-	unsigned long lval;
-	size_t left, len;
-	
-	char buf[TMPBUFLEN], *p;
-	char __user *s = buffer;
+	int *i, vleft, first = 1, err = 0;
+	size_t left;
+	char __user *buffer = (char __user *) _buffer;
 	
 	if (!tbl_data || !table->maxlen || !*lenp ||
 	    (*ppos && !write)) {
@@ -2089,88 +2227,48 @@ static int __do_proc_dointvec(void *tbl_
 		conv = do_proc_dointvec_conv;
 
 	for (; left && vleft--; i++, first=0) {
-		if (write) {
-			while (left) {
-				char c;
-				if (get_user(c, s))
-					return -EFAULT;
-				if (!isspace(c))
-					break;
-				left--;
-				s++;
-			}
-			if (!left)
-				break;
-			neg = 0;
-			len = left;
-			if (len > sizeof(buf) - 1)
-				len = sizeof(buf) - 1;
-			if (copy_from_user(buf, s, len))
-				return -EFAULT;
-			buf[len] = 0;
-			p = buf;
-			if (*p == '-' && left > 1) {
-				neg = 1;
-				p++;
-			}
-			if (*p < '0' || *p > '9')
-				break;
-
-			lval = simple_strtoul(p, &p, 0);
+		unsigned long lval;
+		bool neg;
 
-			len = p-buf;
-			if ((len < left) && *p && !isspace(*p))
+		if (write) {
+			err = proc_skip_wspace(&buffer, &left);
+			if (err)
+				return err;
+			err = proc_get_long(&buffer, &left, &lval, &neg,
+					     proc_wspace_sep,
+					     sizeof(proc_wspace_sep), NULL);
+			if (err)
 				break;
-			s += len;
-			left -= len;
-
-			if (conv(&neg, &lval, i, 1, data))
+			if (conv(&neg, &lval, i, 1, data)) {
+				err = -EINVAL;
 				break;
+			}
 		} else {
-			p = buf;
-			if (!first)
-				*p++ = '\t';
-	
-			if (conv(&neg, &lval, i, 0, data))
+			if (conv(&neg, &lval, i, 0, data)) {
+				err = -EINVAL;
 				break;
-
-			sprintf(p, "%s%lu", neg ? "-" : "", lval);
-			len = strlen(buf);
-			if (len > left)
-				len = left;
-			if(copy_to_user(s, buf, len))
-				return -EFAULT;
-			left -= len;
-			s += len;
-		}
-	}
-
-	if (!write && !first && left) {
-		if(put_user('\n', s))
-			return -EFAULT;
-		left--, s++;
-	}
-	if (write) {
-		while (left) {
-			char c;
-			if (get_user(c, s++))
-				return -EFAULT;
-			if (!isspace(c))
+			}
+			err = proc_put_long(&buffer, &left, lval, neg, first,
+					     '\t');
+			if (err)
 				break;
-			left--;
 		}
 	}
+
+	if (!write && !first && left && !err)
+		err = proc_put_char(&buffer, &left, '\n');
+	if (write && !err)
+		err = proc_skip_wspace(&buffer, &left);
 	if (write && first)
-		return -EINVAL;
+		return err ? : -EINVAL;
 	*lenp -= left;
 	*ppos += *lenp;
 	return 0;
-#undef TMPBUFLEN
 }
 
 static int do_proc_dointvec(struct ctl_table *table, int write,
 		  void __user *buffer, size_t *lenp, loff_t *ppos,
-		  int (*conv)(int *negp, unsigned long *lvalp, int *valp,
+		  int (*conv)(bool *negp, unsigned long *lvalp, int *valp,
 			      int write, void *data),
 		  void *data)
 {
@@ -2238,8 +2336,8 @@ struct do_proc_dointvec_minmax_conv_para
 	int *max;
 };
 
-static int do_proc_dointvec_minmax_conv(int *negp, unsigned long *lvalp, 
-					int *valp, 
+static int do_proc_dointvec_minmax_conv(bool *negp, unsigned long *lvalp,
+					int *valp,
 					int write, void *data)
 {
 	struct do_proc_dointvec_minmax_conv_param *param = data;
@@ -2252,7 +2350,7 @@ static int do_proc_dointvec_minmax_conv(
 	} else {
 		int val = *valp;
 		if (val < 0) {
-			*negp = -1;
+			*negp = 1;
 			*lvalp = (unsigned long)-val;
 		} else {
 			*negp = 0;
@@ -2290,17 +2388,15 @@ int proc_dointvec_minmax(struct ctl_tabl
 }
 
 static int __do_proc_doulongvec_minmax(void *data, struct ctl_table *table, int write,
-				     void __user *buffer,
+				     void __user *_buffer,
 				     size_t *lenp, loff_t *ppos,
 				     unsigned long convmul,
 				     unsigned long convdiv)
 {
-#define TMPBUFLEN 21
-	unsigned long *i, *min, *max, val;
-	int vleft, first=1, neg;
-	size_t len, left;
-	char buf[TMPBUFLEN], *p;
-	char __user *s = buffer;
+	unsigned long *i, *min, *max;
+	int vleft, first = 1, err = 0;
+	size_t left;
+	char __user *buffer = (char __user *) _buffer;
 	
 	if (!data || !table->maxlen || !*lenp ||
 	    (*ppos && !write)) {
@@ -2315,82 +2411,42 @@ static int __do_proc_doulongvec_minmax(v
 	left = *lenp;
 	
 	for (; left && vleft--; i++, min++, max++, first=0) {
+		unsigned long val;
+
 		if (write) {
-			while (left) {
-				char c;
-				if (get_user(c, s))
-					return -EFAULT;
-				if (!isspace(c))
-					break;
-				left--;
-				s++;
-			}
-			if (!left)
-				break;
-			neg = 0;
-			len = left;
-			if (len > TMPBUFLEN-1)
-				len = TMPBUFLEN-1;
-			if (copy_from_user(buf, s, len))
-				return -EFAULT;
-			buf[len] = 0;
-			p = buf;
-			if (*p == '-' && left > 1) {
-				neg = 1;
-				p++;
-			}
-			if (*p < '0' || *p > '9')
-				break;
-			val = simple_strtoul(p, &p, 0) * convmul / convdiv ;
-			len = p-buf;
-			if ((len < left) && *p && !isspace(*p))
+			bool neg;
+
+			err = proc_skip_wspace(&buffer, &left);
+			if (err)
+				return err;
+			err = proc_get_long(&buffer, &left, &val, &neg,
+					     proc_wspace_sep,
+					     sizeof(proc_wspace_sep), NULL);
+			if (err)
 				break;
 			if (neg)
-				val = -val;
-			s += len;
-			left -= len;
-
-			if(neg)
 				continue;
 			if ((min && val < *min) || (max && val > *max))
 				continue;
 			*i = val;
 		} else {
-			p = buf;
-			if (!first)
-				*p++ = '\t';
-			sprintf(p, "%lu", convdiv * (*i) / convmul);
-			len = strlen(buf);
-			if (len > left)
-				len = left;
-			if(copy_to_user(s, buf, len))
-				return -EFAULT;
-			left -= len;
-			s += len;
-		}
-	}
-
-	if (!write && !first && left) {
-		if(put_user('\n', s))
-			return -EFAULT;
-		left--, s++;
-	}
-	if (write) {
-		while (left) {
-			char c;
-			if (get_user(c, s++))
-				return -EFAULT;
-			if (!isspace(c))
+			val = convdiv * (*i) / convmul;
+			err = proc_put_long(&buffer, &left, val, 0, first,
+					     '\t');
+			if (err)
 				break;
-			left--;
 		}
 	}
+
+	if (!write && !first && left && !err)
+		err = proc_put_char(&buffer, &left, '\n');
+	if (write && !err)
+		err = proc_skip_wspace(&buffer, &left);
 	if (write && first)
-		return -EINVAL;
+		return err ? : -EINVAL;
 	*lenp -= left;
 	*ppos += *lenp;
 	return 0;
-#undef TMPBUFLEN
 }
 
 static int do_proc_doulongvec_minmax(struct ctl_table *table, int write,
@@ -2451,7 +2507,7 @@ int proc_doulongvec_ms_jiffies_minmax(st
 }
 
 
-static int do_proc_dointvec_jiffies_conv(int *negp, unsigned long *lvalp,
+static int do_proc_dointvec_jiffies_conv(bool *negp, unsigned long *lvalp,
 					 int *valp,
 					 int write, void *data)
 {
@@ -2463,7 +2519,7 @@ static int do_proc_dointvec_jiffies_conv
 		int val = *valp;
 		unsigned long lval;
 		if (val < 0) {
-			*negp = -1;
+			*negp = 1;
 			lval = (unsigned long)-val;
 		} else {
 			*negp = 0;
@@ -2474,7 +2530,7 @@ static int do_proc_dointvec_jiffies_conv
 	return 0;
 }
 
-static int do_proc_dointvec_userhz_jiffies_conv(int *negp, unsigned long *lvalp,
+static int do_proc_dointvec_userhz_jiffies_conv(bool *negp, unsigned long *lvalp,
 						int *valp,
 						int write, void *data)
 {
@@ -2486,7 +2542,7 @@ static int do_proc_dointvec_userhz_jiffi
 		int val = *valp;
 		unsigned long lval;
 		if (val < 0) {
-			*negp = -1;
+			*negp = 1;
 			lval = (unsigned long)-val;
 		} else {
 			*negp = 0;
@@ -2497,7 +2553,7 @@ static int do_proc_dointvec_userhz_jiffi
 	return 0;
 }
 
-static int do_proc_dointvec_ms_jiffies_conv(int *negp, unsigned long *lvalp,
+static int do_proc_dointvec_ms_jiffies_conv(bool *negp, unsigned long *lvalp,
 					    int *valp,
 					    int write, void *data)
 {
@@ -2507,7 +2563,7 @@ static int do_proc_dointvec_ms_jiffies_c
 		int val = *valp;
 		unsigned long lval;
 		if (val < 0) {
-			*negp = -1;
+			*negp = 1;
 			lval = (unsigned long)-val;
 		} else {
 			*negp = 0;


* [Patch 2/3] sysctl: add proc_do_large_bitmap
  2010-04-12 10:03 [Patch v8 0/3] net: reserve ports for applications using fixed port numbers Amerigo Wang
  2010-04-12 10:04 ` [Patch 1/3] sysctl: refactor integer handling proc code Amerigo Wang
@ 2010-04-12 10:04 ` Amerigo Wang
  2010-04-12 10:04 ` [Patch 3/3] net: reserve ports for applications using fixed port numbers Amerigo Wang
  2 siblings, 0 replies; 22+ messages in thread
From: Amerigo Wang @ 2010-04-12 10:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: Octavian Purdila, Eric Dumazet, penguin-kernel, netdev,
	Neil Horman, Amerigo Wang, ebiederm, David Miller

From: Octavian Purdila <opurdila@ixiacom.com>

The new function can be used to read/write large bitmaps via /proc. A
comma separated range format is used for compact output and input
(e.g. 1,3-4,10-10).

Writing into the file will first reset the bitmap then update it
based on the given input.
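
For readers who want to see the format in action, here is a userspace
sketch of the write side. It is not the kernel implementation: the names
parse_ranges() and bit_is_set() and the 64-bit bitmap size are assumptions
made for the example. Like the description above, it builds the new bitmap
from scratch and only replaces the old one once the whole input has been
parsed successfully.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define NBITS 64	/* assumed bitmap size for this sketch */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define NWORDS ((NBITS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical helper: parse a list such as "1,3-4,10-10" into bitmap.
 * Returns 0 on success, -1 on malformed input; on error the caller's
 * bitmap is left untouched. */
static int parse_ranges(const char *s, unsigned long *bitmap)
{
	unsigned long tmp[NWORDS] = { 0 };

	while (*s && *s != '\n') {
		char *end;
		unsigned long a, b;

		if (*s < '0' || *s > '9')
			return -1;
		a = b = strtoul(s, &end, 10);
		if (*end == '-') {
			s = end + 1;
			if (*s < '0' || *s > '9')
				return -1;
			b = strtoul(s, &end, 10);
		}
		if (a > b || b >= NBITS)
			return -1;
		for (; a <= b; a++)
			tmp[a / BITS_PER_WORD] |= 1UL << (a % BITS_PER_WORD);

		if (*end == ',')
			end++;
		else if (*end && *end != '\n')
			return -1;
		s = end;
	}

	/* Writing replaces the previous contents entirely. */
	memcpy(bitmap, tmp, sizeof(tmp));
	return 0;
}

static int bit_is_set(const unsigned long *bitmap, unsigned long n)
{
	return (bitmap[n / BITS_PER_WORD] >> (n % BITS_PER_WORD)) & 1;
}
```

Parsing "1,3-4,10-10" this way sets bits 1, 3, 4 and 10 and clears
everything else; a reversed range such as "7-5" is rejected.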

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---

Index: linux-2.6/include/linux/sysctl.h
===================================================================
--- linux-2.6.orig/include/linux/sysctl.h
+++ linux-2.6/include/linux/sysctl.h
@@ -980,6 +980,8 @@ extern int proc_doulongvec_minmax(struct
 				  void __user *, size_t *, loff_t *);
 extern int proc_doulongvec_ms_jiffies_minmax(struct ctl_table *table, int,
 				      void __user *, size_t *, loff_t *);
+extern int proc_do_large_bitmap(struct ctl_table *, int,
+				void __user *, size_t *, loff_t *);
 
 /*
  * Register a set of sysctl names by calling register_sysctl_table
Index: linux-2.6/kernel/sysctl.c
===================================================================
--- linux-2.6.orig/kernel/sysctl.c
+++ linux-2.6/kernel/sysctl.c
@@ -2072,6 +2072,23 @@ static bool isanyof(char c, const char *
 	return true;
 }
 
+static int proc_skip_anyof(char __user **buf, size_t *size,
+			   const char *v, unsigned len)
+{
+	char c;
+
+	while (*size) {
+		if (get_user(c, *buf))
+			return -EFAULT;
+		if (!isanyof(c, v, len))
+			break;
+		(*size)--;
+		(*buf)++;
+	}
+
+	return 0;
+}
+
 #define TMPBUFLEN 22
 /**
  * proc_get_long - reads an ASCII formated integer from a user buffer
@@ -2663,6 +2680,135 @@ static int proc_do_cad_pid(struct ctl_ta
 	return 0;
 }
 
+/**
+ * proc_do_large_bitmap - read/write from/to a large bitmap
+ * @table: the sysctl table
+ * @write: %TRUE if this is a write to the sysctl file
+ * @buffer: the user buffer
+ * @lenp: the size of the user buffer
+ * @ppos: file position
+ *
+ * The bitmap is stored at table->data and the bitmap length (in bits)
+ * in table->maxlen.
+ *
+ * We use a range comma separated format (e.g. 1,3-4,10-10) so that
+ * large bitmaps may be represented in a compact manner. Writing into
+ * the file will clear the bitmap then update it with the given input.
+ *
+ * Returns 0 on success.
+ */
+int proc_do_large_bitmap(struct ctl_table *table, int write,
+			 void __user *_buffer, size_t *lenp, loff_t *ppos)
+{
+	int err = 0;
+	bool first = 1;
+	size_t left = *lenp;
+	unsigned long bitmap_len = table->maxlen;
+	char __user *buffer = (char __user *) _buffer;
+	unsigned long *bitmap = (unsigned long *) table->data;
+	unsigned long *tmp_bitmap = NULL;
+	char tr_a[] = { '-', ',', '\n', 0 }, tr_b[] = { ',', '\n', 0 }, c;
+	char tr_end[] = { '\n', 0 };
+
+
+	if (!bitmap_len || !left || (*ppos && !write)) {
+		*lenp = 0;
+		return 0;
+	}
+
+	if (write) {
+		tmp_bitmap = kzalloc(BITS_TO_LONGS(bitmap_len) * sizeof(unsigned long),
+				     GFP_KERNEL);
+		if (!tmp_bitmap)
+			return -ENOMEM;
+		err = proc_skip_anyof(&buffer, &left, tr_end, sizeof(tr_end));
+		while (!err && left) {
+			unsigned long val_a, val_b;
+			bool neg;
+
+			err = proc_get_long(&buffer, &left, &val_a, &neg, tr_a,
+					     sizeof(tr_a), &c);
+			if (err)
+				break;
+			if (val_a >= bitmap_len || neg) {
+				err = -EINVAL;
+				break;
+			}
+
+			val_b = val_a;
+			if (left) {
+				buffer++;
+				left--;
+			}
+
+			if (c == '-') {
+				err = proc_get_long(&buffer, &left, &val_b,
+						     &neg, tr_b, sizeof(tr_b),
+						     &c);
+				if (err)
+					break;
+				if (val_b >= bitmap_len || neg ||
+				    val_a > val_b) {
+					err = -EINVAL;
+					break;
+				}
+				if (left) {
+					buffer++;
+					left--;
+				}
+			}
+
+			while (val_a <= val_b)
+				set_bit(val_a++, tmp_bitmap);
+
+			first = 0;
+			err = proc_skip_anyof(&buffer, &left, tr_end,
+					      sizeof(tr_end));
+		}
+	} else {
+		unsigned long bit_a, bit_b = 0;
+
+		while (left) {
+			bit_a = find_next_bit(bitmap, bitmap_len, bit_b);
+			if (bit_a >= bitmap_len)
+				break;
+			bit_b = find_next_zero_bit(bitmap, bitmap_len,
+						   bit_a + 1) - 1;
+
+			err = proc_put_long(&buffer, &left, bit_a, 0, first,
+					     ',');
+			if (err)
+				break;
+			if (bit_a != bit_b) {
+				err = proc_put_char(&buffer, &left, '-');
+				if (err)
+					break;
+				err = proc_put_long(&buffer, &left, bit_b, 0,
+						     1, 0);
+				if (err)
+					break;
+			}
+
+			first = 0; bit_b++;
+		}
+		if (!err)
+			err = proc_put_char(&buffer, &left, '\n');
+	}
+
+	if (!err) {
+		if (write)
+			memcpy(bitmap, tmp_bitmap,
+			       BITS_TO_LONGS(bitmap_len) * sizeof(unsigned long));
+		kfree(tmp_bitmap);
+		*lenp -= left;
+		*ppos += *lenp;
+		return 0;
+	} else {
+		kfree(tmp_bitmap);
+		return err;
+	}
+}
+
 #else /* CONFIG_PROC_FS */
 
 int proc_dostring(struct ctl_table *table, int write,


* [Patch 3/3] net: reserve ports for applications using fixed port numbers
  2010-04-12 10:03 [Patch v8 0/3] net: reserve ports for applications using fixed port numbers Amerigo Wang
  2010-04-12 10:04 ` [Patch 1/3] sysctl: refactor integer handling proc code Amerigo Wang
  2010-04-12 10:04 ` [Patch 2/3] sysctl: add proc_do_large_bitmap Amerigo Wang
@ 2010-04-12 10:04 ` Amerigo Wang
  2010-04-13  1:21   ` Tetsuo Handa
  2 siblings, 1 reply; 22+ messages in thread
From: Amerigo Wang @ 2010-04-12 10:04 UTC (permalink / raw)
  To: linux-kernel
  Cc: Octavian Purdila, Eric Dumazet, penguin-kernel, netdev,
	Neil Horman, Amerigo Wang, David Miller, ebiederm

From: Octavian Purdila <opurdila@ixiacom.com>

This patch introduces /proc/sys/net/ipv4/ip_local_reserved_ports which
allows users to reserve ports for third-party applications.

The reserved ports will not be used by automatic port assignments
(e.g. when calling connect() or bind() with port number 0). Explicit
port allocation behavior is unchanged.
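
The selection logic added to each protocol follows the same pattern, which
can be sketched in userspace as below. This is a simplified model under
stated assumptions, not the code from the diff: pick_port(), test_port()
and set_port() are hypothetical names, and the in_use bitmap stands in for
the per-protocol hash table lookups.

```c
#include <limits.h>

#define PORTS 65536
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* One bit per port, as with the 65536-bit bitmap in the patch. */
static unsigned long reserved[PORTS / BITS_PER_WORD];
static unsigned long in_use[PORTS / BITS_PER_WORD];	/* model only */

static int test_port(const unsigned long *map, int port)
{
	return (map[port / BITS_PER_WORD] >> (port % BITS_PER_WORD)) & 1;
}

static void set_port(unsigned long *map, int port)
{
	map[port / BITS_PER_WORD] |= 1UL << (port % BITS_PER_WORD);
}

/* Walk [low, high] once, starting at start and wrapping around, and
 * return the first port that is neither reserved nor in use.
 * Returns -1 if every port in the range is unavailable. */
static int pick_port(int low, int high, int start)
{
	int remaining = high - low + 1;
	int rover = start;

	while (remaining-- > 0) {
		if (!test_port(reserved, rover) && !test_port(in_use, rover))
			return rover;
		if (++rover > high)
			rover = low;
	}
	return -1;
}
```

Because the reserved check is a single bit test done before any lock is
taken, skipping a reserved port is cheap compared with probing the hash
bucket for it.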

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
---

Index: linux-2.6/Documentation/networking/ip-sysctl.txt
===================================================================
--- linux-2.6.orig/Documentation/networking/ip-sysctl.txt
+++ linux-2.6/Documentation/networking/ip-sysctl.txt
@@ -588,6 +588,37 @@ ip_local_port_range - 2 INTEGERS
 	(i.e. by default) range 1024-4999 is enough to issue up to
 	2000 connections per second to systems supporting timestamps.
 
+ip_local_reserved_ports - list of comma separated ranges
+	Specify the ports which are reserved for known third-party
+	applications. These ports will not be used by automatic port
+	assignments (e.g. when calling connect() or bind() with port
+	number 0). Explicit port allocation behavior is unchanged.
+
+	The format used for both input and output is a comma separated
+	list of ranges (e.g. "1,2-4,10-10" for ports 1, 2, 3, 4 and
+	10). Writing to the file will clear all previously reserved
+	ports and update the current list with the one given in the
+	input.
+
+	Note that ip_local_port_range and ip_local_reserved_ports
+	settings are independent and both are considered by the kernel
+	when determining which ports are available for automatic port
+	assignments.
+
+	You can reserve ports which are not in the current
+	ip_local_port_range, e.g.:
+
+	$ cat /proc/sys/net/ipv4/ip_local_port_range
+	32000	61000
+	$ cat /proc/sys/net/ipv4/ip_local_reserved_ports
+	8080,9148
+
+	although this is redundant. However such a setting is useful
+	if later the port range is changed to a value that will
+	include the reserved ports.
+
+	Default: Empty
+
 ip_nonlocal_bind - BOOLEAN
 	If set, allows processes to bind() to non-local IP addresses,
 	which can be quite useful - but may break some applications.
Index: linux-2.6/drivers/infiniband/core/cma.c
===================================================================
--- linux-2.6.orig/drivers/infiniband/core/cma.c
+++ linux-2.6/drivers/infiniband/core/cma.c
@@ -1980,6 +1980,8 @@ retry:
 	/* FIXME: add proper port randomization per like inet_csk_get_port */
 	do {
 		ret = idr_get_new_above(ps, bind_list, next_port, &port);
+		if (!ret && inet_is_reserved_local_port(port))
+			ret = -EAGAIN;
 	} while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL));
 
 	if (ret)
@@ -2995,11 +2997,19 @@ static void cma_remove_one(struct ib_dev
 static int __init cma_init(void)
 {
 	int ret, low, high, remaining;
+	int tries = 10;
 
-	get_random_bytes(&next_port, sizeof next_port);
 	inet_get_local_port_range(&low, &high);
+again:
+	get_random_bytes(&next_port, sizeof next_port);
 	remaining = (high - low) + 1;
 	next_port = ((unsigned int) next_port % remaining) + low;
+	if (inet_is_reserved_local_port(next_port)) {
+		if (tries--)
+			goto again;
+		else
+			return -EBUSY;
+	}
 
 	cma_wq = create_singlethread_workqueue("rdma_cm");
 	if (!cma_wq)
Index: linux-2.6/include/net/ip.h
===================================================================
--- linux-2.6.orig/include/net/ip.h
+++ linux-2.6/include/net/ip.h
@@ -184,6 +184,12 @@ extern struct local_ports {
 } sysctl_local_ports;
 extern void inet_get_local_port_range(int *low, int *high);
 
+extern unsigned long *sysctl_local_reserved_ports;
+static inline int inet_is_reserved_local_port(int port)
+{
+	return test_bit(port, sysctl_local_reserved_ports);
+}
+
 extern int sysctl_ip_default_ttl;
 extern int sysctl_ip_nonlocal_bind;
 
Index: linux-2.6/net/ipv4/af_inet.c
===================================================================
--- linux-2.6.orig/net/ipv4/af_inet.c
+++ linux-2.6/net/ipv4/af_inet.c
@@ -1552,9 +1552,13 @@ static int __init inet_init(void)
 
 	BUILD_BUG_ON(sizeof(struct inet_skb_parm) > sizeof(dummy_skb->cb));
 
+	sysctl_local_reserved_ports = kzalloc(65536 / 8, GFP_KERNEL);
+	if (!sysctl_local_reserved_ports)
+		goto out;
+
 	rc = proto_register(&tcp_prot, 1);
 	if (rc)
-		goto out;
+		goto out_free_reserved_ports;
 
 	rc = proto_register(&udp_prot, 1);
 	if (rc)
@@ -1653,6 +1657,8 @@ out_unregister_udp_proto:
 	proto_unregister(&udp_prot);
 out_unregister_tcp_proto:
 	proto_unregister(&tcp_prot);
+out_free_reserved_ports:
+	kfree(sysctl_local_reserved_ports);
 	goto out;
 }
 
Index: linux-2.6/net/ipv4/inet_connection_sock.c
===================================================================
--- linux-2.6.orig/net/ipv4/inet_connection_sock.c
+++ linux-2.6/net/ipv4/inet_connection_sock.c
@@ -37,6 +37,9 @@ struct local_ports sysctl_local_ports __
 	.range = { 32768, 61000 },
 };
 
+unsigned long *sysctl_local_reserved_ports;
+EXPORT_SYMBOL(sysctl_local_reserved_ports);
+
 void inet_get_local_port_range(int *low, int *high)
 {
 	unsigned seq;
@@ -108,6 +111,8 @@ again:
 
 		smallest_size = -1;
 		do {
+			if (inet_is_reserved_local_port(rover))
+				goto next_nolock;
 			head = &hashinfo->bhash[inet_bhashfn(net, rover,
 					hashinfo->bhash_size)];
 			spin_lock(&head->lock);
@@ -130,6 +135,7 @@ again:
 			break;
 		next:
 			spin_unlock(&head->lock);
+		next_nolock:
 			if (++rover > high)
 				rover = low;
 		} while (--remaining > 0);
Index: linux-2.6/net/ipv4/inet_hashtables.c
===================================================================
--- linux-2.6.orig/net/ipv4/inet_hashtables.c
+++ linux-2.6/net/ipv4/inet_hashtables.c
@@ -456,6 +456,8 @@ int __inet_hash_connect(struct inet_time
 		local_bh_disable();
 		for (i = 1; i <= remaining; i++) {
 			port = low + (i + offset) % remaining;
+			if (inet_is_reserved_local_port(port))
+				continue;
 			head = &hinfo->bhash[inet_bhashfn(net, port,
 					hinfo->bhash_size)];
 			spin_lock(&head->lock);
Index: linux-2.6/net/ipv4/sysctl_net_ipv4.c
===================================================================
--- linux-2.6.orig/net/ipv4/sysctl_net_ipv4.c
+++ linux-2.6/net/ipv4/sysctl_net_ipv4.c
@@ -299,6 +299,13 @@ static struct ctl_table ipv4_table[] = {
 		.mode		= 0644,
 		.proc_handler	= ipv4_local_port_range,
 	},
+	{
+		.procname	= "ip_local_reserved_ports",
+		.data		= NULL, /* initialized in sysctl_ipv4_init */
+		.maxlen		= 65536,
+		.mode		= 0644,
+		.proc_handler	= proc_do_large_bitmap,
+	},
 #ifdef CONFIG_IP_MULTICAST
 	{
 		.procname	= "igmp_max_memberships",
@@ -736,6 +743,16 @@ static __net_initdata struct pernet_oper
 static __init int sysctl_ipv4_init(void)
 {
 	struct ctl_table_header *hdr;
+	struct ctl_table *i;
+
+	for (i = ipv4_table; i->procname; i++) {
+		if (strcmp(i->procname, "ip_local_reserved_ports") == 0) {
+			i->data = sysctl_local_reserved_ports;
+			break;
+		}
+	}
+	if (!i->procname)
+		return -EINVAL;
 
 	hdr = register_sysctl_paths(net_ipv4_ctl_path, ipv4_table);
 	if (hdr == NULL)
Index: linux-2.6/net/ipv4/udp.c
===================================================================
--- linux-2.6.orig/net/ipv4/udp.c
+++ linux-2.6/net/ipv4/udp.c
@@ -233,7 +233,8 @@ int udp_lib_get_port(struct sock *sk, un
 			 */
 			do {
 				if (low <= snum && snum <= high &&
-				    !test_bit(snum >> udptable->log, bitmap))
+				    !test_bit(snum >> udptable->log, bitmap) &&
+				    !inet_is_reserved_local_port(snum))
 					goto found;
 				snum += rand;
 			} while (snum != first);
Index: linux-2.6/net/sctp/socket.c
===================================================================
--- linux-2.6.orig/net/sctp/socket.c
+++ linux-2.6/net/sctp/socket.c
@@ -5436,6 +5436,8 @@ static long sctp_get_port_local(struct s
 			rover++;
 			if ((rover < low) || (rover > high))
 				rover = low;
+			if (inet_is_reserved_local_port(rover))
+				continue;
 			index = sctp_phashfn(rover);
 			head = &sctp_port_hashtable[index];
 			sctp_spin_lock(&head->lock);


* Re: [Patch 3/3] net: reserve ports for applications using fixed port numbers
  2010-04-12 10:04 ` [Patch 3/3] net: reserve ports for applications using fixed port numbers Amerigo Wang
@ 2010-04-13  1:21   ` Tetsuo Handa
  2010-04-13  7:13     ` Cong Wang
  0 siblings, 1 reply; 22+ messages in thread
From: Tetsuo Handa @ 2010-04-13  1:21 UTC (permalink / raw)
  To: amwang
  Cc: opurdila, eric.dumazet, netdev, nhorman, davem, ebiederm, linux-kernel

Hello.

> --- linux-2.6.orig/drivers/infiniband/core/cma.c
> +++ linux-2.6/drivers/infiniband/core/cma.c
> @@ -1980,6 +1980,8 @@ retry:
>  	/* FIXME: add proper port randomization per like inet_csk_get_port */
>  	do {
>  		ret = idr_get_new_above(ps, bind_list, next_port, &port);
> +		if (!ret && inet_is_reserved_local_port(port))
> +			ret = -EAGAIN;
>  	} while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL));
>  
>  	if (ret)
> 
I think the above part is wrong. The program below
--------------------
#include <linux/module.h>
#include <linux/sched.h>
#include <linux/delay.h>	/* msleep() */
#include <linux/idr.h>

static DEFINE_IDR(idr);
static int idr_demo_init(void)
{
	int next_port = 65530;
	int port = 0;
	int ret = -EINTR;
	while (!signal_pending(current)) {
		msleep(1000);
		ret = idr_get_new_above(&idr, NULL, next_port, &port);
		printk(KERN_INFO "idr_get_new_above() = %d\n", ret);
		if (!ret) {
			/* Emulate inet_is_reserved_local_port(port) = true */
			printk(KERN_INFO "Port %u is reserved.\n", port);
			ret = -EAGAIN;
		}
		if (ret == -EAGAIN) {
			if (idr_pre_get(&idr, GFP_KERNEL)) {
				printk(KERN_INFO "idr_pre_get() succeeded.\n");
				continue;
			}
			printk(KERN_INFO "idr_pre_get() failed.\n");
			break;
		} else {
			printk(KERN_INFO "next_port=%u port=%u\n",
			       next_port, port);
			break;
		}
	}
	if (!ret)
		idr_remove(&idr, port);
	idr_destroy(&idr);
	return -EINVAL;
}
module_init(idr_demo_init);
MODULE_LICENSE("GPL");
--------------------
generated the output below.

idr_get_new_above() = -11
idr_pre_get() succeeded.
idr_get_new_above() = 0
Port 65530 is reserved.
idr_pre_get() succeeded.
idr_get_new_above() = 0
Port 65531 is reserved.
idr_pre_get() succeeded.
idr_get_new_above() = 0
Port 65532 is reserved.
idr_pre_get() succeeded.
idr_get_new_above() = 0
Port 65533 is reserved.
idr_pre_get() succeeded.
idr_get_new_above() = 0
Port 65534 is reserved.
idr_pre_get() succeeded.
idr_get_new_above() = 0
Port 65535 is reserved.
idr_pre_get() succeeded.
idr_get_new_above() = 0
Port 65536 is reserved.
idr_pre_get() succeeded.
idr_get_new_above() = 0
Port 65537 is reserved.
idr_pre_get() succeeded.
idr_get_new_above() = 0
(...snipped...)

This result suggests that, if all ports are reserved, the above loop will
continue until idr_pre_get() fails because the system is out of memory.

Also, if idr_get_new_above() returned 0, bind_list (which is a kmalloc()ed
pointer) is already installed into a free slot (see comment on
idr_get_new_above_int()). Thus, simply calling idr_get_new_above() again will
install the same pointer into multiple slots. I guess it will malfunction later.
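
To make the second problem concrete, here is a minimal userspace sketch. It is
not the kernel IDR: struct toy_idr, toy_get_new_above(), retry_without_remove()
and count_slots() are hypothetical names, and the 16-entry table only stands in
for the real ID space. It assumes, as described above, that a successful
allocation keeps its slot until it is explicitly removed.

```c
#include <stddef.h>

#define TOY_IDS 16

/* Toy stand-in for an IDR: slot[i] holds the pointer registered under ID i. */
struct toy_idr {
	void *slot[TOY_IDS];
};

/*
 * Modelled on idr_get_new_above(): store ptr in the lowest free ID >= start.
 * Returns 0 and sets *id on success, -1 when no ID is left.
 */
static int toy_get_new_above(struct toy_idr *idr, void *ptr, int start, int *id)
{
	int i;

	for (i = start; i < TOY_IDS; i++) {
		if (!idr->slot[i]) {
			idr->slot[i] = ptr;
			*id = i;
			return 0;
		}
	}
	return -1;
}

/*
 * Retry the way the patched loop does: every returned ID is treated as
 * reserved and the allocation is simply repeated, without a remove.
 * Returns how many slots were filled before the table ran out.
 */
static int retry_without_remove(struct toy_idr *idr, void *ptr, int start)
{
	int id, installed = 0;

	while (!toy_get_new_above(idr, ptr, start, &id))
		installed++;	/* "reserved": ask again, the slot stays taken */
	return installed;
}

/* Count the slots that currently point at ptr. */
static int count_slots(const struct toy_idr *idr, const void *ptr)
{
	int i, n = 0;

	for (i = 0; i < TOY_IDS; i++)
		if (idr->slot[i] == ptr)
			n++;
	return n;
}
```

With start = 10, the retry loop fills IDs 10 through 15 with the same pointer
before the toy table runs out. The real ID space is far larger, which matches
the test output above, where the IDs keep growing past 65535.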


* Re: [Patch 3/3] net: reserve ports for applications using fixed port numbers
  2010-04-13  1:21   ` Tetsuo Handa
@ 2010-04-13  7:13     ` Cong Wang
  2010-04-13  8:48       ` Cong Wang
  0 siblings, 1 reply; 22+ messages in thread
From: Cong Wang @ 2010-04-13  7:13 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: opurdila, eric.dumazet, netdev, nhorman, davem, ebiederm, linux-kernel

Tetsuo Handa wrote:
> Hello.
> 
>> --- linux-2.6.orig/drivers/infiniband/core/cma.c
>> +++ linux-2.6/drivers/infiniband/core/cma.c
>> @@ -1980,6 +1980,8 @@ retry:
>>  	/* FIXME: add proper port randomization per like inet_csk_get_port */
>>  	do {
>>  		ret = idr_get_new_above(ps, bind_list, next_port, &port);
>> +		if (!ret && inet_is_reserved_local_port(port))
>> +			ret = -EAGAIN;
>>  	} while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL));
>>  
>>  	if (ret)
>>
> I think above part is wrong. Below program
...
> 
> This result suggests that above loop will continue until idr_pre_get() fails
> due to out of memory if all ports were reserved.
> 
> Also, if idr_get_new_above() returned 0, bind_list (which is a kmalloc()ed
> pointer) is already installed into a free slot (see comment on
> idr_get_new_above_int()). Thus, simply calling idr_get_new_above() again will
> install the same pointer into multiple slots. I guess it will malfunction later.

Thanks for testing!

How about:

+		if (!ret && inet_is_reserved_local_port(port))
+			ret = -EBUSY;

? So that it will break the loop and return error.


* Re: [Patch 1/3] sysctl: refactor integer handling proc code
  2010-04-13 11:18   ` Alexey Dobriyan
@ 2010-04-13  7:35     ` Cong Wang
  0 siblings, 0 replies; 22+ messages in thread
From: Cong Wang @ 2010-04-13  7:35 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: linux-kernel, Octavian Purdila, Eric Dumazet, penguin-kernel,
	netdev, Neil Horman, ebiederm, David Miller

Alexey Dobriyan wrote:
> On Mon, Apr 12, 2010 at 06:04:04AM -0400, Amerigo Wang wrote:
>> As we are about to add another integer handling proc function a little
>> bit of cleanup is in order: add a few helper functions to improve code
>> readability and decrease code duplication.
>>
>> In the process a bug is also fixed: if the user specifies a number
>> with more than 20 digits it will be interpreted as two integers
>> (e.g. 10000...13 will be interpreted as 100.... and 13).
> 
> ULONG_MAX is not always 22 digits.
> 
> The fix is to not rely on simple_strtoul()
> 
> I guess it's time to finally remove it. :-(


Or use strict_strtoul()?

> 
> Also, it's better to copy_from_user() the data once.
> Without looking at non-trivial users, one page should be enough.

It seems that all proc code assumes that the input buffer will
not exceed one page size.


> 
>> Behavior for EFAULT handling was changed as well. Previous to this
>> patch, when an EFAULT error occurred in the middle of a write
>> operation, although some of the elements were set, that was not
>> acknowledged to the user (by shorting the write and returning the
>> number of bytes accepted). EFAULT is now treated just like any other
>> errors by acknowledging the amount of bytes accepted.
> 
>> +static int proc_skip_wspace(char __user **buf, size_t *size)
>> +{
>> +	char c;
>> +
>> +	while (*size) {
>> +		if (get_user(c, *buf))
>> +			return -EFAULT;
>> +		if (!isspace(c))
>> +			break;
>> +		(*size)--;
>> +		(*buf)++;
>> +	}
>> +
>> +	return 0;
>> +}
> 
> yeah, copy_from_user once, so we won't have this.

Ok.

> 
>> +static bool isanyof(char c, const char *v, unsigned len)
> 
> A what?
> this is memchr()
> 

Hmm, right, it should be memchr(v, c, len).

Thanks!
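
For reference, a small userspace sketch of that replacement. The function names
isanyof_loop() and isanyof_memchr() are hypothetical, chosen only to put the
two forms side by side; the standard memchr() from <string.h> is assumed to
behave like the kernel's version for this purpose.

```c
#include <stdbool.h>
#include <string.h>

/* Open-coded search, shaped like the isanyof() helper in the patch. */
static bool isanyof_loop(char c, const char *v, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (c == v[i])
			return true;
	return false;
}

/* Same test via memchr(): non-NULL means c occurs in the first len bytes. */
static bool isanyof_memchr(char c, const char *v, size_t len)
{
	return memchr(v, c, len) != NULL;
}
```

Note the argument order: memchr() takes the buffer first and the character
second, and a length of 0 simply yields "not found".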



* Re: [Patch 3/3] net: reserve ports for applications using fixed port numbers
  2010-04-13  7:13     ` Cong Wang
@ 2010-04-13  8:48       ` Cong Wang
  2010-04-13 13:07         ` Tetsuo Handa
  0 siblings, 1 reply; 22+ messages in thread
From: Cong Wang @ 2010-04-13  8:48 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: opurdila, eric.dumazet, netdev, nhorman, davem, ebiederm, linux-kernel

Cong Wang wrote:
> Tetsuo Handa wrote:
>> Hello.
>>
>>> --- linux-2.6.orig/drivers/infiniband/core/cma.c
>>> +++ linux-2.6/drivers/infiniband/core/cma.c
>>> @@ -1980,6 +1980,8 @@ retry:
>>>  	/* FIXME: add proper port randomization per like inet_csk_get_port */
>>>  	do {
>>>  		ret = idr_get_new_above(ps, bind_list, next_port, &port);
>>> +		if (!ret && inet_is_reserved_local_port(port))
>>> +			ret = -EAGAIN;
>>>  	} while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL));
>>>  
>>>  	if (ret)
>>>
>> I think above part is wrong. Below program
> ...
>> This result suggests that above loop will continue until idr_pre_get() fails
>> due to out of memory if all ports were reserved.
>>
>> Also, if idr_get_new_above() returned 0, bind_list (which is a kmalloc()ed
>> pointer) is already installed into a free slot (see comment on
>> idr_get_new_above_int()). Thus, simply calling idr_get_new_above() again will
>> install the same pointer into multiple slots. I guess it will malfunction later.
> 
> Thanks for testing!
> 
> How about:
> 
> +		if (!ret && inet_is_reserved_local_port(port))
> +			ret = -EBUSY;
> 
> ? So that it will break the loop and return error.
> 

Or use a similar trick:

 int tries = 10;
...

 if(!ret && inet_is_reserved_local_port(port)) {
   if (tries--)
     ret = -EAGAIN;
   else
     ret = -EBUSY;
 }

Any comments?


* Re: [Patch 1/3] sysctl: refactor integer handling proc code
  2010-04-12 10:04 ` [Patch 1/3] sysctl: refactor integer handling proc code Amerigo Wang
@ 2010-04-13 11:18   ` Alexey Dobriyan
  2010-04-13  7:35     ` Cong Wang
  0 siblings, 1 reply; 22+ messages in thread
From: Alexey Dobriyan @ 2010-04-13 11:18 UTC (permalink / raw)
  To: Amerigo Wang
  Cc: linux-kernel, Octavian Purdila, Eric Dumazet, penguin-kernel,
	netdev, Neil Horman, ebiederm, David Miller

On Mon, Apr 12, 2010 at 06:04:04AM -0400, Amerigo Wang wrote:
> As we are about to add another integer handling proc function a little
> bit of cleanup is in order: add a few helper functions to improve code
> readability and decrease code duplication.
> 
> In the process a bug is also fixed: if the user specifies a number
> with more than 20 digits it will be interpreted as two integers
> (e.g. 10000...13 will be interpreted as 100.... and 13).

ULONG_MAX is not always 22 digits.

The fix is to not rely on simple_strtoul()

I guess it's time to finally remove it. :-(
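
A userspace sketch of what "not relying on a fixed digit count" can look like.
This is only an analogue: it uses the C library strtoul(), not a kernel
interface, and parse_ulong() is a hypothetical helper. The point is that
overflow is reported by the conversion itself, so the code does not need to
know whether ULONG_MAX has 10 digits (32-bit long) or 20 digits (64-bit long).

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Parse one unsigned decimal number from s (hypothetical helper).
 * Returns 0 and stores the value and the first unparsed character on
 * success, -EINVAL if s does not start with a digit, -ERANGE on overflow.
 */
static int parse_ulong(const char *s, unsigned long *val, const char **end)
{
	char *p;

	if (!isdigit((unsigned char)*s))
		return -EINVAL;

	errno = 0;
	*val = strtoul(s, &p, 10);
	if (errno == ERANGE)
		return -ERANGE;	/* too large for unsigned long on this machine */

	*end = p;
	return 0;
}
```

An over-long input such as 1000...13 is then rejected as a whole instead of
being split into two numbers, and the caller can inspect *end to check the
trailing character.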

Also, it's better to copy_from_user() the data once.
Without looking at non-trivial users, one page should be enough.

> Behavior for EFAULT handling was changed as well. Previous to this
> patch, when an EFAULT error occurred in the middle of a write
> operation, although some of the elements were set, that was not
> acknowledged to the user (by shorting the write and returning the
> number of bytes accepted). EFAULT is now treated just like any other
> errors by acknowledging the amount of bytes accepted.

> +static int proc_skip_wspace(char __user **buf, size_t *size)
> +{
> +	char c;
> +
> +	while (*size) {
> +		if (get_user(c, *buf))
> +			return -EFAULT;
> +		if (!isspace(c))
> +			break;
> +		(*size)--;
> +		(*buf)++;
> +	}
> +
> +	return 0;
> +}

yeah, copy_from_user once, so we won't have this.
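
A sketch of the same helper once the input already sits in a local buffer.
It is written as plain userspace C and skip_wspace() is a hypothetical name;
the assumption is that the caller has copied the user data in a single step
beforehand, so nothing in the helper can fault and it needs no error return.

```c
#include <ctype.h>
#include <stddef.h>

/* Advance *buf past leading whitespace; works on an already-copied buffer. */
static void skip_wspace(const char **buf, size_t *size)
{
	while (*size && isspace((unsigned char)**buf)) {
		(*size)--;
		(*buf)++;
	}
}
```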

> +static bool isanyof(char c, const char *v, unsigned len)

A what?
this is memchr()

> +{
> +	int i;
> +
> +	if (!len)
> +		return false;
> +
> +	for (i = 0; i < len; i++)
> +		if (c == v[i])
> +			break;
> +	if (i == len)
> +		return false;
> +
> +	return true;
> +}
> +
> +#define TMPBUFLEN 22
> +/**
> + * proc_get_long - reads an ASCII formated integer from a user buffer
> + *
> + * @buf - user buffer
> + * @size - size of the user buffer
> + * @val - this is where the number will be stored
> + * @neg - set to %TRUE if number is negative
> + * @perm_tr - a vector which contains the allowed trailers
> + * @perm_tr_len - size of the perm_tr vector
> + * @tr - pointer to store the trailer character
> + *
> + * In case of success 0 is returned and buf and size are updated with
> + * the amount of bytes read. If tr is non NULL and a trailing
> + * character exist (size is non zero after returning from this
> + * function) tr is updated with the trailing character.
> + */
> +static int proc_get_long(char __user **buf, size_t *size,
> +			  unsigned long *val, bool *neg,
> +			  const char *perm_tr, unsigned perm_tr_len, char *tr)
> +{
> +	int len;
> +	char *p, tmp[TMPBUFLEN];
> +
> +	if (!*size)
> +		return -EINVAL;
> +
> +	len = *size;
> +	if (len > TMPBUFLEN-1)
> +		len = TMPBUFLEN-1;
> +
> +	if (copy_from_user(tmp, *buf, len))
> +		return -EFAULT;
> +
> +	tmp[len] = 0;
> +	p = tmp;
> +	if (*p == '-' && *size > 1) {
> +		*neg = 1;
> +		p++;
> +	} else
> +		*neg = 0;
> +	if (!isdigit(*p))
> +		return -EINVAL;
> +
> +	*val = simple_strtoul(p, &p, 0);
> +
> +	len = p - tmp;
> +
> +	/* We don't know if the next char is whitespace thus we may accept
> +	 * invalid integers (e.g. 1234...a) or two integers instead of one
> +	 * (e.g. 123...1). So lets not allow such large numbers. */
> +	if (len == TMPBUFLEN - 1)
> +		return -EINVAL;
> +
> +	if (len < *size && perm_tr_len && !isanyof(*p, perm_tr, perm_tr_len))
> +		return -EINVAL;
> +
> +	if (tr && (len < *size))
> +		*tr = *p;
> +
> +	*buf += len;
> +	*size -= len;
> +
> +	return 0;
> +}


* Re: [Patch 3/3] net: reserve ports for applications using fixed port numbers
  2010-04-13  8:48       ` Cong Wang
@ 2010-04-13 13:07         ` Tetsuo Handa
  2010-04-13 16:32           ` Sean Hefty
  0 siblings, 1 reply; 22+ messages in thread
From: Tetsuo Handa @ 2010-04-13 13:07 UTC (permalink / raw)
  To: amwang, sean.hefty, rolandd
  Cc: opurdila, eric.dumazet, netdev, nhorman, davem, ebiederm, linux-kernel

Hello.

Adding Sean Hefty and Roland Dreier as drivers/infiniband/core/cma.c maintainers.

Cong Wang wrote:
> Cong Wang wrote:
> > Tetsuo Handa wrote:
> >> Hello.
> >>
> >>> --- linux-2.6.orig/drivers/infiniband/core/cma.c
> >>> +++ linux-2.6/drivers/infiniband/core/cma.c
> >>> @@ -1980,6 +1980,8 @@ retry:
> >>>  	/* FIXME: add proper port randomization per like inet_csk_get_port */
> >>>  	do {
> >>>  		ret = idr_get_new_above(ps, bind_list, next_port, &port);
> >>> +		if (!ret && inet_is_reserved_local_port(port))
> >>> +			ret = -EAGAIN;
> >>>  	} while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL));
> >>>  
> >>>  	if (ret)
> >>>
> >> I think above part is wrong. Below program
> > ...
> >> This result suggests that above loop will continue until idr_pre_get() fails
> >> due to out of memory if all ports were reserved.
> >>
> >> Also, if idr_get_new_above() returned 0, bind_list (which is a kmalloc()ed
> >> pointer) is already installed into a free slot (see comment on
> >> idr_get_new_above_int()). Thus, simply calling idr_get_new_above() again will
> >> install the same pointer into multiple slots. I guess it will malfunction later.
> > 
> > Thanks for testing!
> > 
> > How about:
> > 
> > +		if (!ret && inet_is_reserved_local_port(port))
> > +			ret = -EBUSY;
> > 
> > ? So that it will break the loop and return error.
> > 
> 
> Or use a similar trick:
> 
>  int tries = 10;
> ...
> 
>  if(!ret && inet_is_reserved_local_port(port)) {
>    if (tries--)
>      ret = -EAGAIN;
>    else
>      ret = -EBUSY;
>  }
> 
> Any comments?
> 
I don't like the above change. It turns local port assignment from
"likely to succeed" (succeeds if any one of thousands of ports is available) into
"unlikely to succeed" (fails if the randomly chosen port is already in use).
We should try every port in the range specified in
/proc/sys/net/ipv4/ip_local_port_range .

cma_alloc_any_port() and cma_alloc_port() are almost identical.
Thus, I think we can call cma_alloc_port() from cma_alloc_any_port().
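
The idea of walking the whole range can be sketched in userspace as follows.
This is an illustration only, not the cma.c code: pick_port(), port_busy_fn and
the two example predicates are hypothetical, rand() stands in for the kernel's
random source, and "busy" covers both reserved and already bound ports.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical predicate type: true if the port must be skipped. */
typedef bool (*port_busy_fn)(unsigned int port);

/*
 * Try every port in [low, high] exactly once, starting at a random
 * position and wrapping around. Returns the port, or -1 if none is free.
 */
static int pick_port(unsigned int low, unsigned int high, port_busy_fn busy)
{
	unsigned int remaining = high - low + 1;
	unsigned int rover = low + (unsigned int)rand() % remaining;

	while (remaining--) {
		if (!busy(rover))
			return (int)rover;
		if (++rover > high)	/* wrap to the start of the range */
			rover = low;
	}
	return -1;	/* whole range is reserved or in use */
}

/* Example predicates: one free port in the range, and none at all. */
static bool only_40007_free(unsigned int port)
{
	return port != 40007;
}

static bool everything_busy(unsigned int port)
{
	(void)port;
	return true;
}
```

Wherever the random start lands, the scan finds the single free port, and it
terminates with an error after one full pass when nothing is available.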

Sean and Roland, is below patch correct?
inet_is_reserved_local_port() is the new function proposed in this patchset.

---
 drivers/infiniband/core/cma.c |   68 ++++++++++++++----------------------------
 1 file changed, 23 insertions(+), 45 deletions(-)

--- linux-2.6.34-rc4.orig/drivers/infiniband/core/cma.c
+++ linux-2.6.34-rc4/drivers/infiniband/core/cma.c
@@ -79,7 +79,6 @@ static DEFINE_IDR(sdp_ps);
 static DEFINE_IDR(tcp_ps);
 static DEFINE_IDR(udp_ps);
 static DEFINE_IDR(ipoib_ps);
-static int next_port;
 
 struct cma_device {
 	struct list_head	list;
@@ -1970,47 +1969,31 @@ err1:
 
 static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv)
 {
-	struct rdma_bind_list *bind_list;
-	int port, ret, low, high;
-
-	bind_list = kzalloc(sizeof *bind_list, GFP_KERNEL);
-	if (!bind_list)
-		return -ENOMEM;
-
-retry:
-	/* FIXME: add proper port randomization per like inet_csk_get_port */
-	do {
-		ret = idr_get_new_above(ps, bind_list, next_port, &port);
-	} while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL));
-
-	if (ret)
-		goto err1;
+	static unsigned int last_used_port;
+	int low, high, remaining;
+	unsigned int rover;
 
 	inet_get_local_port_range(&low, &high);
-	if (port > high) {
-		if (next_port != low) {
-			idr_remove(ps, port);
-			next_port = low;
-			goto retry;
+	remaining = (high - low) + 1;
+	rover = net_random() % remaining + low;
+	do {
+		rover++;
+		if ((rover < low) || (rover > high))
+			rover = low;
+		if (last_used_port != rover &&
+		    !inet_is_reserved_local_port(rover) &&
+		    !idr_find(ps, (unsigned short) rover) &&
+		    !cma_alloc_port(ps, id_priv, rover)) {
+			/*
+			 * Remember previously used port number in order to
+			 * avoid re-using same port immediately after it is
+			 * closed.
+			 */
+			last_used_port = rover;
+			return 0;
 		}
-		ret = -EADDRNOTAVAIL;
-		goto err2;
-	}
-
-	if (port == high)
-		next_port = low;
-	else
-		next_port = port + 1;
-
-	bind_list->ps = ps;
-	bind_list->port = (unsigned short) port;
-	cma_bind_port(bind_list, id_priv);
-	return 0;
-err2:
-	idr_remove(ps, port);
-err1:
-	kfree(bind_list);
-	return ret;
+	} while (--remaining > 0);
+	return -EADDRNOTAVAIL;
 }
 
 static int cma_use_port(struct idr *ps, struct rdma_id_private *id_priv)
@@ -2995,12 +2978,7 @@ static void cma_remove_one(struct ib_dev
 
 static int __init cma_init(void)
 {
-	int ret, low, high, remaining;
-
-	get_random_bytes(&next_port, sizeof next_port);
-	inet_get_local_port_range(&low, &high);
-	remaining = (high - low) + 1;
-	next_port = ((unsigned int) next_port % remaining) + low;
+	int ret;
 
 	cma_wq = create_singlethread_workqueue("rdma_cm");
 	if (!cma_wq)


* RE: [Patch 3/3] net: reserve ports for applications using fixed port numbers
  2010-04-13 13:07         ` Tetsuo Handa
@ 2010-04-13 16:32           ` Sean Hefty
  2010-04-14  2:01             ` [PATCH] Infiniband: Randomize local port allocation penguin-kernel
  0 siblings, 1 reply; 22+ messages in thread
From: Sean Hefty @ 2010-04-13 16:32 UTC (permalink / raw)
  To: 'Tetsuo Handa', amwang, rolandd
  Cc: opurdila, eric.dumazet, netdev, nhorman, davem, ebiederm, linux-kernel

>Sean and Roland, is below patch correct?
>inet_is_reserved_local_port() is the new function proposed in this patchset.

It looks correct to me.  I didn't test the patch series, but with the call to
inet_is_reserved_local_port() in the patch below commented out, the changes
worked fine for me.

Acked-by: Sean Hefty <sean.hefty@intel.com>



* [PATCH] Infiniband: Randomize local port allocation.
  2010-04-13 16:32           ` Sean Hefty
@ 2010-04-14  2:01             ` penguin-kernel
  2010-04-14  4:38               ` Cong Wang
  2010-04-15  0:01               ` Sean Hefty
  0 siblings, 2 replies; 22+ messages in thread
From: penguin-kernel @ 2010-04-14  2:01 UTC (permalink / raw)
  To: rolandd, sean.hefty
  Cc: amwang, opurdila, eric.dumazet, netdev, nhorman, davem, ebiederm,
	linux-kernel

Sean Hefty wrote:
> >Sean and Roland, is below patch correct?
> >inet_is_reserved_local_port() is the new function proposed in this patchset.
> 
> It looks correct to me.  I didn't test the patch series, but with the call to
> inet_is_reserved_local_port() in the patch below commented out, the changes
> worked fine for me.
> 
> Acked-by: Sean Hefty <sean.hefty@intel.com>
> 
Thank you for testing.

I think it is better to split this patch into

Part 1: Make cma_alloc_any_port() use cma_alloc_port().

Part 2: Insert "!inet_is_reserved_local_port(rover) &&" line.

for future "git bisect".

Roland, will you review below patch for part 1?
--------------------
[PATCH] Infiniband: Randomize local port allocation.

Randomize local port allocation in a way sctp_get_port_local() does.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 drivers/infiniband/core/cma.c |   69 ++++++++++++++----------------------------
 1 file changed, 24 insertions(+), 45 deletions(-)

--- linux-2.6.34-rc4.orig/drivers/infiniband/core/cma.c
+++ linux-2.6.34-rc4/drivers/infiniband/core/cma.c
@@ -79,7 +79,6 @@ static DEFINE_IDR(sdp_ps);
 static DEFINE_IDR(tcp_ps);
 static DEFINE_IDR(udp_ps);
 static DEFINE_IDR(ipoib_ps);
-static int next_port;
 
 struct cma_device {
 	struct list_head	list;
@@ -1970,47 +1969,32 @@ err1:
 
 static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv)
 {
-	struct rdma_bind_list *bind_list;
-	int port, ret, low, high;
-
-	bind_list = kzalloc(sizeof *bind_list, GFP_KERNEL);
-	if (!bind_list)
-		return -ENOMEM;
-
-retry:
-	/* FIXME: add proper port randomization per like inet_csk_get_port */
-	do {
-		ret = idr_get_new_above(ps, bind_list, next_port, &port);
-	} while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL));
-
-	if (ret)
-		goto err1;
+	static unsigned int last_used_port;
+	int low, high, remaining;
+	unsigned int rover;
 
 	inet_get_local_port_range(&low, &high);
-	if (port > high) {
-		if (next_port != low) {
-			idr_remove(ps, port);
-			next_port = low;
-			goto retry;
+	remaining = (high - low) + 1;
+	rover = net_random() % remaining + low;
+	do {
+		rover++;
+		if ((rover < low) || (rover > high))
+			rover = low;
+		if (last_used_port != rover &&
+		    !idr_find(ps, (unsigned short) rover)) {
+			int ret = cma_alloc_port(ps, id_priv, rover);
+			/*
+			 * Remember previously used port number in order to
+			 * avoid re-using same port immediately after it is
+			 * closed.
+			 */
+			if (!ret)
+				last_used_port = rover;
+			if (ret != -EADDRNOTAVAIL)
+				return ret;
 		}
-		ret = -EADDRNOTAVAIL;
-		goto err2;
-	}
-
-	if (port == high)
-		next_port = low;
-	else
-		next_port = port + 1;
-
-	bind_list->ps = ps;
-	bind_list->port = (unsigned short) port;
-	cma_bind_port(bind_list, id_priv);
-	return 0;
-err2:
-	idr_remove(ps, port);
-err1:
-	kfree(bind_list);
-	return ret;
+	} while (--remaining > 0);
+	return -EADDRNOTAVAIL;
 }
 
 static int cma_use_port(struct idr *ps, struct rdma_id_private *id_priv)
@@ -2995,12 +2979,7 @@ static void cma_remove_one(struct ib_dev
 
 static int __init cma_init(void)
 {
-	int ret, low, high, remaining;
-
-	get_random_bytes(&next_port, sizeof next_port);
-	inet_get_local_port_range(&low, &high);
-	remaining = (high - low) + 1;
-	next_port = ((unsigned int) next_port % remaining) + low;
+	int ret;
 
 	cma_wq = create_singlethread_workqueue("rdma_cm");
 	if (!cma_wq)


* Re: [PATCH] Infiniband: Randomize local port allocation.
  2010-04-14  2:01             ` [PATCH] Infiniband: Randomize local port allocation penguin-kernel
@ 2010-04-14  4:38               ` Cong Wang
  2010-04-15  0:01               ` Sean Hefty
  1 sibling, 0 replies; 22+ messages in thread
From: Cong Wang @ 2010-04-14  4:38 UTC (permalink / raw)
  To: penguin-kernel
  Cc: rolandd, sean.hefty, opurdila, eric.dumazet, netdev, nhorman,
	davem, ebiederm, linux-kernel

penguin-kernel@i-love.sakura.ne.jp wrote:
> Sean Hefty wrote:
>>> Sean and Roland, is below patch correct?
>>> inet_is_reserved_local_port() is the new function proposed in this patchset.
>> It looks correct to me.  I didn't test the patch series, but with the call to
>> inet_is_reserved_local_port() in the patch below commented out, the changes
>> worked fine for me.
>>
>> Acked-by: Sean Hefty <sean.hefty@intel.com>
>>
> Thank you for testing.
> 
> I think it is better to split this patch into
> 
> Part 1: Make cma_alloc_any_port() use cma_alloc_port().
> 
> Part 2: Insert "!inet_is_reserved_local_port(rover) &&" line.
> 
> for future "git bisect".
> 

Right, thanks a lot for your work!

So, I will rebase my patch 3/3 on top of this patch. I hope someone
could take this one asap.



* RE: [PATCH] Infiniband: Randomize local port allocation.
  2010-04-14  2:01             ` [PATCH] Infiniband: Randomize local port allocation penguin-kernel
  2010-04-14  4:38               ` Cong Wang
@ 2010-04-15  0:01               ` Sean Hefty
  2010-04-15  2:29                 ` Tetsuo Handa
  1 sibling, 1 reply; 22+ messages in thread
From: Sean Hefty @ 2010-04-15  0:01 UTC (permalink / raw)
  To: penguin-kernel, rolandd
  Cc: amwang, opurdila, eric.dumazet, netdev, nhorman, davem, ebiederm,
	linux-kernel

>[PATCH] Infiniband: Randomize local port allocation.
>
>Randomize local port allocation in a way sctp_get_port_local() does.
>
>Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

Thanks for fixing this long outstanding issue.  :)  The latest patch looks
correct and passed some simple tests that I ran against it.  One comment below,
which I didn't catch before:

>---
> drivers/infiniband/core/cma.c |   69 ++++++++++++++----------------------------
> 1 file changed, 24 insertions(+), 45 deletions(-)
>
>--- linux-2.6.34-rc4.orig/drivers/infiniband/core/cma.c
>+++ linux-2.6.34-rc4/drivers/infiniband/core/cma.c
>@@ -79,7 +79,6 @@ static DEFINE_IDR(sdp_ps);
> static DEFINE_IDR(tcp_ps);
> static DEFINE_IDR(udp_ps);
> static DEFINE_IDR(ipoib_ps);
>-static int next_port;
>
> struct cma_device {
> 	struct list_head	list;
>@@ -1970,47 +1969,32 @@ err1:
>
> static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv)
> {
>-	struct rdma_bind_list *bind_list;
>-	int port, ret, low, high;
>-
>-	bind_list = kzalloc(sizeof *bind_list, GFP_KERNEL);
>-	if (!bind_list)
>-		return -ENOMEM;
>-
>-retry:
>-	/* FIXME: add proper port randomization per like inet_csk_get_port */
>-	do {
>-		ret = idr_get_new_above(ps, bind_list, next_port, &port);
>-	} while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL));
>-
>-	if (ret)
>-		goto err1;
>+	static unsigned int last_used_port;
>+	int low, high, remaining;
>+	unsigned int rover;
>
> 	inet_get_local_port_range(&low, &high);
>-	if (port > high) {
>-		if (next_port != low) {
>-			idr_remove(ps, port);
>-			next_port = low;
>-			goto retry;
>+	remaining = (high - low) + 1;
>+	rover = net_random() % remaining + low;
>+	do {
>+		rover++;
>+		if ((rover < low) || (rover > high))
>+			rover = low;

Assuming that we're likely to pick a valid port on the first try, it would be
more efficient to move the above 3 lines to the end of the while loop.
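
A rough userspace comparison of the two loop shapes, to show what the move
changes. Everything here is hypothetical scaffolding: scan_inc_first() and
scan_inc_last() are made-up names, a single free_port stands in for the
idr_find()/cma_alloc_port() checks, *steps counts rover updates, and a return
value of 0 means no port was found.

```c
/* Update rover before each attempt, as in the patch under review. */
static unsigned int scan_inc_first(unsigned int low, unsigned int high,
				   unsigned int start, unsigned int free_port,
				   unsigned int *steps)
{
	unsigned int remaining = high - low + 1, rover = start;

	*steps = 0;
	do {
		rover++;
		(*steps)++;
		if (rover < low || rover > high)
			rover = low;
		if (rover == free_port)
			return rover;
	} while (--remaining > 0);
	return 0;
}

/* Update rover only after a failed attempt: the start is tried first. */
static unsigned int scan_inc_last(unsigned int low, unsigned int high,
				  unsigned int start, unsigned int free_port,
				  unsigned int *steps)
{
	unsigned int remaining = high - low + 1, rover = start;

	*steps = 0;
	for (;;) {
		if (rover == free_port)
			return rover;
		if (!--remaining)
			return 0;
		rover++;
		(*steps)++;
		if (rover < low || rover > high)
			rover = low;
	}
}
```

Both variants visit every port in the range once. When the randomly chosen
start is itself usable, the second form returns without touching rover, while
the first form reaches that port only at the end of a full pass.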

>+		if (last_used_port != rover &&
>+		    !idr_find(ps, (unsigned short) rover)) {
>+			int ret = cma_alloc_port(ps, id_priv, rover);
>+			/*
>+			 * Remember previously used port number in order to
>+			 * avoid re-using same port immediately after it is
>+			 * closed.
>+			 */
>+			if (!ret)
>+				last_used_port = rover;
>+			if (ret != -EADDRNOTAVAIL)
>+				return ret;
> 		}
>-		ret = -EADDRNOTAVAIL;
>-		goto err2;
>-	}
>-
>-	if (port == high)
>-		next_port = low;
>-	else
>-		next_port = port + 1;
>-
>-	bind_list->ps = ps;
>-	bind_list->port = (unsigned short) port;
>-	cma_bind_port(bind_list, id_priv);
>-	return 0;
>-err2:
>-	idr_remove(ps, port);
>-err1:
>-	kfree(bind_list);
>-	return ret;
>+	} while (--remaining > 0);
>+	return -EADDRNOTAVAIL;
> }




* RE: [PATCH] Infiniband: Randomize local port allocation.
  2010-04-15  0:01               ` Sean Hefty
@ 2010-04-15  2:29                 ` Tetsuo Handa
  2010-04-15 19:55                   ` [PATCH] rdma/cm: " Sean Hefty
  2010-04-21 23:19                   ` [PATCH] Infiniband: " Roland Dreier
  0 siblings, 2 replies; 22+ messages in thread
From: Tetsuo Handa @ 2010-04-15  2:29 UTC (permalink / raw)
  To: sean.hefty
  Cc: amwang, opurdila, eric.dumazet, netdev, nhorman, davem, ebiederm,
	linux-kernel, rolandd

Sean Hefty wrote:
> >+	remaining = (high - low) + 1;
> >+	rover = net_random() % remaining + low;
> >+	do {
> >+		rover++;
> >+		if ((rover < low) || (rover > high))
> >+			rover = low;
> 
> Assuming that we're likely to pick a valid port on the first try, it would be
> more efficient to move the above 3 lines to the end of the while loop.
> 
Indeed. I moved these lines to "if (--remaining) { ... }" block.
--------------------
[PATCH] Infiniband: Randomize local port allocation.

Randomize local port allocation in a way sctp_get_port_local() does.
Update rover at the end of loop since we're likely to pick a valid port
on the first try.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 drivers/infiniband/core/cma.c |   70 +++++++++++++++---------------------------
 1 file changed, 25 insertions(+), 45 deletions(-)

--- linux-2.6.34-rc4.orig/drivers/infiniband/core/cma.c
+++ linux-2.6.34-rc4/drivers/infiniband/core/cma.c
@@ -79,7 +79,6 @@ static DEFINE_IDR(sdp_ps);
 static DEFINE_IDR(tcp_ps);
 static DEFINE_IDR(udp_ps);
 static DEFINE_IDR(ipoib_ps);
-static int next_port;
 
 struct cma_device {
 	struct list_head	list;
@@ -1970,47 +1969,33 @@ err1:
 
 static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv)
 {
-	struct rdma_bind_list *bind_list;
-	int port, ret, low, high;
-
-	bind_list = kzalloc(sizeof *bind_list, GFP_KERNEL);
-	if (!bind_list)
-		return -ENOMEM;
-
-retry:
-	/* FIXME: add proper port randomization per like inet_csk_get_port */
-	do {
-		ret = idr_get_new_above(ps, bind_list, next_port, &port);
-	} while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL));
-
-	if (ret)
-		goto err1;
+	static unsigned int last_used_port;
+	int low, high, remaining;
+	unsigned int rover;
 
 	inet_get_local_port_range(&low, &high);
-	if (port > high) {
-		if (next_port != low) {
-			idr_remove(ps, port);
-			next_port = low;
-			goto retry;
-		}
-		ret = -EADDRNOTAVAIL;
-		goto err2;
+	remaining = (high - low) + 1;
+	rover = net_random() % remaining + low;
+retry:
+	if (last_used_port != rover &&
+	    !idr_find(ps, (unsigned short) rover)) {
+		int ret = cma_alloc_port(ps, id_priv, rover);
+		/*
+		 * Remember previously used port number in order to avoid
+		 * re-using same port immediately after it is closed.
+		 */
+		if (!ret)
+			last_used_port = rover;
+		if (ret != -EADDRNOTAVAIL)
+			return ret;
 	}
-
-	if (port == high)
-		next_port = low;
-	else
-		next_port = port + 1;
-
-	bind_list->ps = ps;
-	bind_list->port = (unsigned short) port;
-	cma_bind_port(bind_list, id_priv);
-	return 0;
-err2:
-	idr_remove(ps, port);
-err1:
-	kfree(bind_list);
-	return ret;
+	if (--remaining) {
+		rover++;
+		if ((rover < low) || (rover > high))
+			rover = low;
+		goto retry;
+	}
+	return -EADDRNOTAVAIL;
 }
 
 static int cma_use_port(struct idr *ps, struct rdma_id_private *id_priv)
@@ -2995,12 +2980,7 @@ static void cma_remove_one(struct ib_dev
 
 static int __init cma_init(void)
 {
-	int ret, low, high, remaining;
-
-	get_random_bytes(&next_port, sizeof next_port);
-	inet_get_local_port_range(&low, &high);
-	remaining = (high - low) + 1;
-	next_port = ((unsigned int) next_port % remaining) + low;
+	int ret;
 
 	cma_wq = create_singlethread_workqueue("rdma_cm");
 	if (!cma_wq)


* [PATCH] rdma/cm: Randomize local port allocation.
  2010-04-15  2:29                 ` Tetsuo Handa
@ 2010-04-15 19:55                   ` Sean Hefty
  2010-04-16  2:22                     ` Cong Wang
  2010-04-21 23:19                   ` [PATCH] Infiniband: " Roland Dreier
  1 sibling, 1 reply; 22+ messages in thread
From: Sean Hefty @ 2010-04-15 19:55 UTC (permalink / raw)
  To: 'Tetsuo Handa'
  Cc: amwang, opurdila, eric.dumazet, netdev, nhorman, davem, ebiederm,
	linux-kernel, rolandd, linux-rdma

From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>

>Randomize local port allocation in a way sctp_get_port_local() does.
>Update rover at the end of loop since we're likely to pick a valid port
>on the first try.
>
>Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>

>---

I like this version, thanks!  I'm not sure which tree to merge it through.
Are you needing this for 2.6.34, or is 2.6.35 okay?

> drivers/infiniband/core/cma.c |   70 +++++++++++++++---------------------------
> 1 file changed, 25 insertions(+), 45 deletions(-)
>
>--- linux-2.6.34-rc4.orig/drivers/infiniband/core/cma.c
>+++ linux-2.6.34-rc4/drivers/infiniband/core/cma.c
>@@ -79,7 +79,6 @@ static DEFINE_IDR(sdp_ps);
> static DEFINE_IDR(tcp_ps);
> static DEFINE_IDR(udp_ps);
> static DEFINE_IDR(ipoib_ps);
>-static int next_port;
>
> struct cma_device {
> 	struct list_head	list;
>@@ -1970,47 +1969,33 @@ err1:
>
> static int cma_alloc_any_port(struct idr *ps, struct rdma_id_private *id_priv)
> {
>-	struct rdma_bind_list *bind_list;
>-	int port, ret, low, high;
>-
>-	bind_list = kzalloc(sizeof *bind_list, GFP_KERNEL);
>-	if (!bind_list)
>-		return -ENOMEM;
>-
>-retry:
>-	/* FIXME: add proper port randomization per like inet_csk_get_port */
>-	do {
>-		ret = idr_get_new_above(ps, bind_list, next_port, &port);
>-	} while ((ret == -EAGAIN) && idr_pre_get(ps, GFP_KERNEL));
>-
>-	if (ret)
>-		goto err1;
>+	static unsigned int last_used_port;
>+	int low, high, remaining;
>+	unsigned int rover;
>
> 	inet_get_local_port_range(&low, &high);
>-	if (port > high) {
>-		if (next_port != low) {
>-			idr_remove(ps, port);
>-			next_port = low;
>-			goto retry;
>-		}
>-		ret = -EADDRNOTAVAIL;
>-		goto err2;
>+	remaining = (high - low) + 1;
>+	rover = net_random() % remaining + low;
>+retry:
>+	if (last_used_port != rover &&
>+	    !idr_find(ps, (unsigned short) rover)) {
>+		int ret = cma_alloc_port(ps, id_priv, rover);
>+		/*
>+		 * Remember previously used port number in order to avoid
>+		 * re-using same port immediately after it is closed.
>+		 */
>+		if (!ret)
>+			last_used_port = rover;
>+		if (ret != -EADDRNOTAVAIL)
>+			return ret;
> 	}
>-
>-	if (port == high)
>-		next_port = low;
>-	else
>-		next_port = port + 1;
>-
>-	bind_list->ps = ps;
>-	bind_list->port = (unsigned short) port;
>-	cma_bind_port(bind_list, id_priv);
>-	return 0;
>-err2:
>-	idr_remove(ps, port);
>-err1:
>-	kfree(bind_list);
>-	return ret;
>+	if (--remaining) {
>+		rover++;
>+		if ((rover < low) || (rover > high))
>+			rover = low;
>+		goto retry;
>+	}
>+	return -EADDRNOTAVAIL;
> }
>
> static int cma_use_port(struct idr *ps, struct rdma_id_private *id_priv)
>@@ -2995,12 +2980,7 @@ static void cma_remove_one(struct ib_dev
>
> static int __init cma_init(void)
> {
>-	int ret, low, high, remaining;
>-
>-	get_random_bytes(&next_port, sizeof next_port);
>-	inet_get_local_port_range(&low, &high);
>-	remaining = (high - low) + 1;
>-	next_port = ((unsigned int) next_port % remaining) + low;
>+	int ret;
>
> 	cma_wq = create_singlethread_workqueue("rdma_cm");
> 	if (!cma_wq)
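
For readers following the diff above, the rover-based search it introduces can be sketched in user-space C. This is only an illustration under stated assumptions, not the kernel code: the names alloc_any_port(), release_port() and in_use[] are hypothetical, a plain boolean array stands in for the idr, rand() stands in for net_random(), and the caller is assumed to pass 0 < low <= high < 65536.

```c
#include <stdbool.h>
#include <stdlib.h>

#define MAX_PORTS 65536

/* Hypothetical stand-in for the kernel's idr: true means the port is bound. */
static bool in_use[MAX_PORTS];

/* Most recently allocated port; skipped so it is not reused immediately. */
static unsigned int last_used_port;

/* Try to bind a port in [low, high]; returns the port, or -1 if none is free. */
static int alloc_any_port(int low, int high)
{
	int remaining = (high - low) + 1;
	/* Start at a random position inside the range. */
	unsigned int rover = (unsigned int)rand() % remaining + low;

	/* Examine each port in the range at most once. */
	while (remaining-- > 0) {
		if (rover != last_used_port && !in_use[rover]) {
			in_use[rover] = true;
			last_used_port = rover;
			return (int)rover;
		}
		/* Advance only after a failed try, wrapping back to low. */
		rover++;
		if (rover < (unsigned int)low || rover > (unsigned int)high)
			rover = low;
	}
	return -1;
}

/* Mark a port as free again. */
static void release_port(int port)
{
	in_use[port] = false;
}
```

Because the rover moves only after a failed try, the common case costs a single lookup. A port that was just allocated and then released is skipped by the next call, which mirrors the last_used_port check in the patch.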


* Re: [PATCH] rdma/cm: Randomize local port allocation.
  2010-04-15 19:55                   ` [PATCH] rdma/cm: " Sean Hefty
@ 2010-04-16  2:22                     ` Cong Wang
  2010-04-16 13:54                       ` Tetsuo Handa
  0 siblings, 1 reply; 22+ messages in thread
From: Cong Wang @ 2010-04-16  2:22 UTC (permalink / raw)
  To: Sean Hefty
  Cc: 'Tetsuo Handa',
	opurdila, eric.dumazet, netdev, nhorman, davem, ebiederm,
	linux-kernel, rolandd, linux-rdma

Sean Hefty wrote:
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> 
>> Randomize local port allocation in the way sctp_get_port_local() does.
>> Update rover at the end of loop since we're likely to pick a valid port
>> on the first try.
>>
>> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Reviewed-by: Sean Hefty <sean.hefty@intel.com>
> 

Thanks, everyone!

> 
> I like this version, thanks!  I'm not sure which tree to merge it through.
> Are you needing this for 2.6.34, or is 2.6.35 okay?
> 

As soon as possible, so 2.6.34. :)

* Re: [PATCH] rdma/cm: Randomize local port allocation.
  2010-04-16  2:22                     ` Cong Wang
@ 2010-04-16 13:54                       ` Tetsuo Handa
  2010-04-16 20:30                         ` David Miller
  0 siblings, 1 reply; 22+ messages in thread
From: Tetsuo Handa @ 2010-04-16 13:54 UTC (permalink / raw)
  To: amwang, sean.hefty
  Cc: opurdila, eric.dumazet, netdev, nhorman, davem, ebiederm,
	linux-kernel, rolandd, linux-rdma

Cong Wang wrote:
> Sean Hefty wrote:
> > I like this version, thanks!  I'm not sure which tree to merge it through.
> > Are you needing this for 2.6.34, or is 2.6.35 okay?
> > 
> 
> As soon as possible, so 2.6.34. :)
> 
Cong, the merge window for 2.6.34 has already closed.
You need to target your patchset at 2.6.35 (using the net-next-2.6 tree)
rather than 2.6.34 (using the linux-2.6 tree). Therefore, having this patch
queued for 2.6.35 (through the net-next-2.6 tree) should be okay for you.

* Re: [PATCH] rdma/cm: Randomize local port allocation.
  2010-04-16 13:54                       ` Tetsuo Handa
@ 2010-04-16 20:30                         ` David Miller
  2010-04-20  4:34                           ` Cong Wang
  0 siblings, 1 reply; 22+ messages in thread
From: David Miller @ 2010-04-16 20:30 UTC (permalink / raw)
  To: penguin-kernel
  Cc: amwang, sean.hefty, opurdila, eric.dumazet, netdev, nhorman,
	ebiederm, linux-kernel, rolandd, linux-rdma

From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Fri, 16 Apr 2010 22:54:22 +0900

> Cong Wang wrote:
>> Sean Hefty wrote:
>> > I like this version, thanks!  I'm not sure which tree to merge it through.
>> > Are you needing this for 2.6.34, or is 2.6.35 okay?
>> > 
>> 
>> As soon as possible, so 2.6.34. :)
>> 
> Cong, the merge window for 2.6.34 has already closed.
> You need to target your patchset at 2.6.35 (using the net-next-2.6 tree)
> rather than 2.6.34 (using the linux-2.6 tree). Therefore, having this patch
> queued for 2.6.35 (through the net-next-2.6 tree) should be okay for you.

I don't take RDMA patches into net-next-2.6. The less I touch this
stack-avoiding stuff the better, and Roland has been taking this stuff
into his own tree for some time now.

* Re: [PATCH] rdma/cm: Randomize local port allocation.
  2010-04-16 20:30                         ` David Miller
@ 2010-04-20  4:34                           ` Cong Wang
  0 siblings, 0 replies; 22+ messages in thread
From: Cong Wang @ 2010-04-20  4:34 UTC (permalink / raw)
  To: David Miller
  Cc: penguin-kernel, sean.hefty, opurdila, eric.dumazet, netdev,
	nhorman, ebiederm, linux-kernel, rolandd, linux-rdma

David Miller wrote:
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Fri, 16 Apr 2010 22:54:22 +0900
> 
>> Cong Wang wrote:
>>> Sean Hefty wrote:
>>>> I like this version, thanks!  I'm not sure which tree to merge it through.
>>>> Are you needing this for 2.6.34, or is 2.6.35 okay?
>>>>
>>> As soon as possible, so 2.6.34. :)
>>>
>> Cong, the merge window for 2.6.34 has already closed.
>> You need to target your patchset at 2.6.35 (using the net-next-2.6 tree)
>> rather than 2.6.34 (using the linux-2.6 tree). Therefore, having this patch
>> queued for 2.6.35 (through the net-next-2.6 tree) should be okay for you.
> 
> I don't take RDMA patches into net-next-2.6. The less I touch this
> stack-avoiding stuff the better, and Roland has been taking this stuff
> into his own tree for some time now.

I was away for a few days.

Ok, so I will wait for this to be merged.

Thanks, David and Tetsuo!


* Re: [PATCH] Infiniband: Randomize local port allocation.
  2010-04-15  2:29                 ` Tetsuo Handa
  2010-04-15 19:55                   ` [PATCH] rdma/cm: " Sean Hefty
@ 2010-04-21 23:19                   ` Roland Dreier
  2010-04-21 23:22                     ` Roland Dreier
  1 sibling, 1 reply; 22+ messages in thread
From: Roland Dreier @ 2010-04-21 23:19 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: sean.hefty, amwang, opurdila, eric.dumazet, netdev, nhorman,
	davem, ebiederm, linux-kernel, rolandd

Thanks, applied this part of the patch -- I preferred this one since the
goto into the middle of a loop seemed worse than a goto out of the loop...
-- 
Roland Dreier <rolandd@cisco.com> || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html

* Re: [PATCH] Infiniband: Randomize local port allocation.
  2010-04-21 23:19                   ` [PATCH] Infiniband: " Roland Dreier
@ 2010-04-21 23:22                     ` Roland Dreier
  0 siblings, 0 replies; 22+ messages in thread
From: Roland Dreier @ 2010-04-21 23:22 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: sean.hefty, amwang, opurdila, eric.dumazet, netdev, nhorman,
	davem, ebiederm, linux-kernel, rolandd

 > Thanks, applied this part of the patch -- I preferred this one since the

err, not "part of the patch" -- I meant "this version of the patch".
-- 
Roland Dreier <rolandd@cisco.com> || For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/index.html

Thread overview: 22+ messages
2010-04-12 10:03 [Patch v8 0/3] net: reserve ports for applications using fixed port numbers Amerigo Wang
2010-04-12 10:04 ` [Patch 1/3] sysctl: refactor integer handling proc code Amerigo Wang
2010-04-13 11:18   ` Alexey Dobriyan
2010-04-13  7:35     ` Cong Wang
2010-04-12 10:04 ` [Patch 2/3] sysctl: add proc_do_large_bitmap Amerigo Wang
2010-04-12 10:04 ` [Patch 3/3] net: reserve ports for applications using fixed port numbers Amerigo Wang
2010-04-13  1:21   ` Tetsuo Handa
2010-04-13  7:13     ` Cong Wang
2010-04-13  8:48       ` Cong Wang
2010-04-13 13:07         ` Tetsuo Handa
2010-04-13 16:32           ` Sean Hefty
2010-04-14  2:01             ` [PATCH] Infiniband: Randomize local port allocation penguin-kernel
2010-04-14  4:38               ` Cong Wang
2010-04-15  0:01               ` Sean Hefty
2010-04-15  2:29                 ` Tetsuo Handa
2010-04-15 19:55                   ` [PATCH] rdma/cm: " Sean Hefty
2010-04-16  2:22                     ` Cong Wang
2010-04-16 13:54                       ` Tetsuo Handa
2010-04-16 20:30                         ` David Miller
2010-04-20  4:34                           ` Cong Wang
2010-04-21 23:19                   ` [PATCH] Infiniband: " Roland Dreier
2010-04-21 23:22                     ` Roland Dreier
