All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] enhance usability of /proc/sys/net/ipv4/ip_local_reserved_ports
@ 2012-03-10 23:36 Helge Deller
  2012-03-11 22:55 ` David Miller
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Helge Deller @ 2012-03-10 23:36 UTC (permalink / raw)
  To: Octavian Purdila, WANG Cong, Linux Kernel Development,
	Andrew Morton, Eric W. Biederman, Frank Danapfel, Laszlo Ersek,
	Linus

When writing to the ip_local_reserved_ports proc file it will currently clear
all previously reserved ports and update the current list with the one given 
in the input.

This behaviour makes it's usage quite hard, for example:
a) The generic proc filesystem limitation of only handle up to PAGE_SIZE-1
   characters at maximum may not be sufficient to provide all your wished-to-
   be-reserved ports at once.
b) There is no easy way to disable specific given ports, you always need to
   give the full port list at once. This makes shell scripting hard, since
   you need to parse everything yourself.
c) There is no easy way to just add specific ports or port ranges. Again,
   this would be useful for shell scripts.

The following patch solves this problem by simply extending the parser
in proc_do_large_bitmap() to accept the keywords "add" and "release" in front
of given ports or port ranges and to either add or drop the given ports
from the already existing list.

Here is an example:
$ echo "1000-2000,5000" > /proc/sys/net/ipv4/ip_local_reserved_ports
$ cat /proc/sys/net/ipv4/ip_local_reserved_ports
1000-2000,5000   (works as before, current port list is replaced by new one)

$ echo "add 3000-4000" > /proc/sys/net/ipv4/ip_local_reserved_ports
$ cat /proc/sys/net/ipv4/ip_local_reserved_ports
1000-2000,3000-4000,5000   (new ports added)

$ echo "release 1500-3500" > /proc/sys/net/ipv4/ip_local_reserved_ports
$ cat /proc/sys/net/ipv4/ip_local_reserved_ports
1000-1499,3501-4000,5000   (given ports were dropped from the list)

My main motivation for this patch is a huge commercial application which
by default may use lots of ports. The full port list which I would have needed
to echo to the /proc/sys/net/ipv4/ip_local_reserved_ports file was around
30K, and in this case all ports were already combined to regions where possible.
With this patch it's now easy to split up the port ranges into single pieces
and to implement everything in simple bootup shell scripts. Furthermore
adding new or removing unneeded ports dynamically at runtime is now easily
possible.


Signed-off-by: Helge Deller <deller@gmx.de>


 Documentation/networking/ip-sysctl.txt |   10 ++++++
 kernel/sysctl.c                        |   48 +++++++++++++++++++++++++++------
 2 files changed, 49 insertions(+), 9 deletions(-)



diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index ad3e80e..2ca93ab 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -624,7 +624,15 @@ ip_local_reserved_ports - list of comma separated ranges
 	list of ranges (e.g. "1,2-4,10-10" for ports 1, 2, 3, 4 and
 	10). Writing to the file will clear all previously reserved
 	ports and update the current list with the one given in the
-	input.
+	input. When writing to the file the keywords "add" or "release"
+	may be used to add or release the given ports or port ranges
+	to/from the already existing port list.
+	Example:
+	$ echo "1000-2000" > /proc/sys/net/ipv4/ip_local_reserved_ports
+	$ echo "add 3000-4000,5000" > /proc/sys/net/ipv4/ip_local_reserved_ports
+	$ echo "release 1500-3500" > /proc/sys/net/ipv4/ip_local_reserved_ports
+	$ cat /proc/sys/net/ipv4/ip_local_reserved_ports
+	1000-1499,3501-4000,5000
 
 	Note that ip_local_port_range and ip_local_reserved_ports
 	settings are independent and both are considered by the kernel
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index f487f25..241fc5a 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2805,6 +2805,8 @@ static int proc_do_cad_pid(struct ctl_table *table, int write,
  * We use a range comma separated format (e.g. 1,3-4,10-10) so that
  * large bitmaps may be represented in a compact manner. Writing into
  * the file will clear the bitmap then update it with the given input.
+ * If "add" or "release" is written in front of numbers or number ranges,
+ * the given bits will be added to or released from the existing bitmap.
  *
  * Returns 0 on success.
  */
@@ -2813,11 +2815,13 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
 {
 	int err = 0;
 	bool first = 1;
+	bool add_or_release = 0, xadd = 0, xrelease = 0;
 	size_t left = *lenp;
 	unsigned long bitmap_len = table->maxlen;
 	unsigned long *bitmap = (unsigned long *) table->data;
-	unsigned long *tmp_bitmap = NULL;
-	char tr_a[] = { '-', ',', '\n' }, tr_b[] = { ',', '\n', 0 }, c;
+	unsigned long *tmp_bitmap = NULL, *release_bitmap = NULL;
+	char tr_a[] = { '-', ',', ' ', '\n' },
+	     tr_b[] = { ',', ' ', '\n', 0 }, c;
 
 	if (!bitmap_len || !left || (*ppos && !write)) {
 		*lenp = 0;
@@ -2841,8 +2845,9 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
                 }
 		kbuf[left] = 0;
 
-		tmp_bitmap = kzalloc(BITS_TO_LONGS(bitmap_len) * sizeof(unsigned long),
-				     GFP_KERNEL);
+		tmp_bitmap = kzalloc(2 * BITS_TO_LONGS(bitmap_len) *
+					sizeof(unsigned long), GFP_KERNEL);
+		release_bitmap = &tmp_bitmap[BITS_TO_LONGS(bitmap_len)];
 		if (!tmp_bitmap) {
 			free_page(page);
 			return -ENOMEM;
@@ -2850,7 +2855,30 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
 		proc_skip_char(&kbuf, &left, '\n');
 		while (!err && left) {
 			unsigned long val_a, val_b;
-			bool neg;
+			bool neg, found;
+
+			left -= proc_skip_spaces(&kbuf);
+			if (!left)
+				continue;
+
+			found = (0 == strnicmp(kbuf, "add ", 4));
+			if (found) {
+				xadd = 1; xrelease = 0;
+				add_or_release = 1;
+				kbuf += 4;
+				left -= 4;
+			}
+			found = (0 == strnicmp(kbuf, "release ", 8));
+			if (found) {
+				xadd = 0; xrelease = 1;
+				add_or_release = 1;
+				kbuf += 8;
+				left -= 8;
+			}
+
+			left -= proc_skip_spaces(&kbuf);
+			if (!left)
+				continue;
 
 			err = proc_get_long(&kbuf, &left, &val_a, &neg, tr_a,
 					     sizeof(tr_a), &c);
@@ -2885,7 +2913,10 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
 			}
 
 			while (val_a <= val_b)
-				set_bit(val_a++, tmp_bitmap);
+				if (xrelease)
+					set_bit(val_a++, release_bitmap);
+				else
+					set_bit(val_a++, tmp_bitmap);
 
 			first = 0;
 			proc_skip_char(&kbuf, &left, '\n');
@@ -2926,9 +2957,10 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
 
 	if (!err) {
 		if (write) {
-			if (*ppos)
+			if (*ppos || add_or_release) {
 				bitmap_or(bitmap, bitmap, tmp_bitmap, bitmap_len);
-			else
+				bitmap_andnot(bitmap, bitmap, release_bitmap, bitmap_len);
+			} else
 				memcpy(bitmap, tmp_bitmap,
 					BITS_TO_LONGS(bitmap_len) * sizeof(unsigned long));
 		}

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] enhance usability of /proc/sys/net/ipv4/ip_local_reserved_ports
  2012-03-10 23:36 [PATCH] enhance usability of /proc/sys/net/ipv4/ip_local_reserved_ports Helge Deller
@ 2012-03-11 22:55 ` David Miller
  2012-03-12  3:42 ` Cong Wang
  2012-03-13 20:33 ` [PATCH] enhance usability of /proc/sys/net/ipv4/ip_local_reserved_ports (v2) Helge Deller
  2 siblings, 0 replies; 17+ messages in thread
From: David Miller @ 2012-03-11 22:55 UTC (permalink / raw)
  To: deller
  Cc: opurdila, amwang, linux-kernel, akpm, ebiederm, fdanapfe, lersek,
	torvalds


Networking patches should be submitted to netdev@vger.kernel.org, most
of the networking developers don't even read linux-kernel


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] enhance usability of /proc/sys/net/ipv4/ip_local_reserved_ports
  2012-03-10 23:36 [PATCH] enhance usability of /proc/sys/net/ipv4/ip_local_reserved_ports Helge Deller
  2012-03-11 22:55 ` David Miller
@ 2012-03-12  3:42 ` Cong Wang
  2012-03-12 21:09   ` Helge Deller
  2012-03-13 20:33 ` [PATCH] enhance usability of /proc/sys/net/ipv4/ip_local_reserved_ports (v2) Helge Deller
  2 siblings, 1 reply; 17+ messages in thread
From: Cong Wang @ 2012-03-12  3:42 UTC (permalink / raw)
  To: Helge Deller
  Cc: Octavian Purdila, Linux Kernel Development, Andrew Morton,
	Eric W. Biederman, Frank Danapfel, Laszlo Ersek, Linus

Hello, Helge,

On Sun, 2012-03-11 at 00:36 +0100, Helge Deller wrote:
> When writing to the ip_local_reserved_ports proc file it will currently clear
> all previously reserved ports and update the current list with the one given 
> in the input.
> 
> This behaviour makes it's usage quite hard, for example:
> a) The generic proc filesystem limitation of only handle up to PAGE_SIZE-1
>    characters at maximum may not be sufficient to provide all your wished-to-
>    be-reserved ports at once.

Yes, this should be extended IMHO.

> b) There is no easy way to disable specific given ports, you always need to
>    give the full port list at once. This makes shell scripting hard, since
>    you need to parse everything yourself.
> c) There is no easy way to just add specific ports or port ranges. Again,
>    this would be useful for shell scripts.
> 

These could be calculated in user-space, although it maybe not as easy
as you want.

> The following patch solves this problem by simply extending the parser
> in proc_do_large_bitmap() to accept the keywords "add" and "release" in front
> of given ports or port ranges and to either add or drop the given ports
> from the already existing list.

This looks a little odd, because we do "magic" things with a sysctl
file, which is supposed to be plain text file. Do we have existing
examples?

BTW, as David mentioned, please Cc netdev next time.

Thanks.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] enhance usability of /proc/sys/net/ipv4/ip_local_reserved_ports
  2012-03-12  3:42 ` Cong Wang
@ 2012-03-12 21:09   ` Helge Deller
  0 siblings, 0 replies; 17+ messages in thread
From: Helge Deller @ 2012-03-12 21:09 UTC (permalink / raw)
  To: Cong Wang
  Cc: Octavian Purdila, Linux Kernel Development, Andrew Morton,
	Eric W. Biederman, Frank Danapfel, Laszlo Ersek, Linus

On 03/12/2012 04:42 AM, Cong Wang wrote:
> On Sun, 2012-03-11 at 00:36 +0100, Helge Deller wrote:
>> When writing to the ip_local_reserved_ports proc file it will currently clear
>> all previously reserved ports and update the current list with the one given
>> in the input.
>>
>> This behaviour makes it's usage quite hard, for example:
>> a) The generic proc filesystem limitation of only handle up to PAGE_SIZE-1
>>     characters at maximum may not be sufficient to provide all your wished-to-
>>     be-reserved ports at once.
>
> Yes, this should be extended IMHO.

Yes, known problem and not easy to fix.

>> b) There is no easy way to disable specific given ports, you always need to
>>     give the full port list at once. This makes shell scripting hard, since
>>     you need to parse everything yourself.
>> c) There is no easy way to just add specific ports or port ranges. Again,
>>     this would be useful for shell scripts.
>>
>
> These could be calculated in user-space, although it maybe not as easy
> as you want.

Right.

>> The following patch solves this problem by simply extending the parser
>> in proc_do_large_bitmap() to accept the keywords "add" and "release" in front
>> of given ports or port ranges and to either add or drop the given ports
>> from the already existing list.
>
> This looks a little odd, because we do "magic" things with a sysctl
> file, which is supposed to be plain text file. Do we have existing
> examples?

I don't think the networking sysctl has such tunables.
Overall, ip_local_reserved_ports is the only user of the
large_bitmap function and as such you can't compare the
input/output of this file with other sysctl files which
operate on a limited number of integers/booleans/strings only.
Furthermore my patch does not remove the "plain text"
behaviour of this file. You can still echo plain ports or
port lists into it and a "cat" gives you the same output
as before. It just extends the interface to add/remove
ports more easily if you want.

One somewhat comparable proc file which comes to my mind in this
regard is /proc/scsi/scsi where you can echo commands like
"scsi report-devs 1" and which then reacts. A user of this
interface is e.g. the known rescan-scsi-bus shell script:
http://www.garloff.de/kurt/linux/rescan-scsi-bus.sh-1.25

> BTW, as David mentioned, please Cc netdev next time.

Sure. I'll send an updated patch tomorrow and will CC netdev.

Thanks,
Helge

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] enhance usability of /proc/sys/net/ipv4/ip_local_reserved_ports (v2)
  2012-03-10 23:36 [PATCH] enhance usability of /proc/sys/net/ipv4/ip_local_reserved_ports Helge Deller
  2012-03-11 22:55 ` David Miller
  2012-03-12  3:42 ` Cong Wang
@ 2012-03-13 20:33 ` Helge Deller
  2012-03-14  7:43   ` Cong Wang
  2012-03-14 22:14   ` [PATCH] enhance usability of /proc/sys/net/ipv4/ip_local_reserved_ports (v3) Helge Deller
  2 siblings, 2 replies; 17+ messages in thread
From: Helge Deller @ 2012-03-13 20:33 UTC (permalink / raw)
  To: Octavian Purdila, netdev, David Miller, Cong Wang, Andrew Morton,
	Eric W. Biederman, Frank Danapfel, Laszlo Ersek

When writing to the ip_local_reserved_ports proc file it will currently clear
all previously reserved ports and update the current list with the one given 
in the input.

This behaviour makes it's usage quite hard, for example:
a) The generic proc filesystem limitation of only handle up to PAGE_SIZE-1
   characters at maximum may not be sufficient to provide all your wished-to-
   be-reserved ports at once.
b) There is no easy way to disable specific given ports, you always need to
   give the full port list at once. This makes shell scripting hard, since
   you need to parse everything yourself.
c) There is no easy way to just add specific ports or port ranges. Again,
   this would be useful for shell scripts.

The following patch solves this problem by simply extending the parser
in proc_do_large_bitmap() to accept the keywords "add" and "release" in front
of given ports or port ranges and to either add or drop the given ports
from the already existing list.

Here is an example:
$ echo "1000-2000,5000" > /proc/sys/net/ipv4/ip_local_reserved_ports
$ cat /proc/sys/net/ipv4/ip_local_reserved_ports
1000-2000,5000   (works as before, current port list is replaced by new one)

$ echo "add 3000-4000" > /proc/sys/net/ipv4/ip_local_reserved_ports
$ cat /proc/sys/net/ipv4/ip_local_reserved_ports
1000-2000,3000-4000,5000   (new ports added)

$ echo "release 1500-3500" > /proc/sys/net/ipv4/ip_local_reserved_ports
$ cat /proc/sys/net/ipv4/ip_local_reserved_ports
1000-1499,3501-4000,5000   (given ports were dropped from the list)

My main motivation for this patch is because of a huge commercial application
which by default may use lots of ports. The full port list which I would have
needed to echo to the /proc/sys/net/ipv4/ip_local_reserved_ports file was
around 30K, and in this case all ports were already combined to regions where
possible. With this patch it's now easy to split up the port ranges into
single pieces and to implement everything in simple bootup shell scripts.
Furthermore adding new or removing unneeded ports dynamically at runtime is
now easily possible.

Signed-off-by: Helge Deller <deller@gmx.de>
CC: Octavian Purdila <octavian.purdila@intel.com>
CC: netdev@vger.kernel.org
CC: David Miller <davem@davemloft.net>
CC: Cong Wang <amwang@redhat.com>
CC: "Eric W. Biederman" <ebiederm@xmission.com>

 Documentation/networking/ip-sysctl.txt |   10 +++++-
 kernel/sysctl.c                        |   50 +++++++++++++++++++++++++++------
 2 files changed, 51 insertions(+), 9 deletions(-)

diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index ad3e80e..fc52546 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -624,7 +624,15 @@ ip_local_reserved_ports - list of comma separated ranges
 	list of ranges (e.g. "1,2-4,10-10" for ports 1, 2, 3, 4 and
 	10). Writing to the file will clear all previously reserved
 	ports and update the current list with the one given in the
-	input.
+	input unless one of the keywords "add" or "release" is used
+	in front of the ports in which case the given ports are added
+	to or released from the currently existing port list.
+	Example:
+	$ echo "1000-2000" > /proc/sys/net/ipv4/ip_local_reserved_ports
+	$ echo "add 3000-4000,5000" > /proc/sys/net/ipv4/ip_local_reserved_ports
+	$ echo "release 1500-3500" > /proc/sys/net/ipv4/ip_local_reserved_ports
+	$ cat /proc/sys/net/ipv4/ip_local_reserved_ports
+	1000-1499,3501-4000,5000
 
 	Note that ip_local_port_range and ip_local_reserved_ports
 	settings are independent and both are considered by the kernel
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index f487f25..1f60398 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2805,6 +2805,8 @@ static int proc_do_cad_pid(struct ctl_table *table, int write,
  * We use a range comma separated format (e.g. 1,3-4,10-10) so that
  * large bitmaps may be represented in a compact manner. Writing into
  * the file will clear the bitmap then update it with the given input.
+ * If "add" or "release" is written in front of numbers or number ranges,
+ * the given bits will be added to or released from the existing bitmap.
  *
  * Returns 0 on success.
  */
@@ -2813,11 +2815,13 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
 {
 	int err = 0;
 	bool first = 1;
+	bool add_or_release = 0, xrelease = 0;
 	size_t left = *lenp;
 	unsigned long bitmap_len = table->maxlen;
 	unsigned long *bitmap = (unsigned long *) table->data;
-	unsigned long *tmp_bitmap = NULL;
-	char tr_a[] = { '-', ',', '\n' }, tr_b[] = { ',', '\n', 0 }, c;
+	unsigned long *tmp_bitmap = NULL, *release_bitmap = NULL;
+	char tr_a[] = { '-', ',', ' ', '\n' },
+	     tr_b[] = { ',', ' ', '\n', 0 }, c;
 
 	if (!bitmap_len || !left || (*ppos && !write)) {
 		*lenp = 0;
@@ -2841,8 +2845,9 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
                 }
 		kbuf[left] = 0;
 
-		tmp_bitmap = kzalloc(BITS_TO_LONGS(bitmap_len) * sizeof(unsigned long),
-				     GFP_KERNEL);
+		tmp_bitmap = kzalloc(2 * BITS_TO_LONGS(bitmap_len) *
+					sizeof(unsigned long), GFP_KERNEL);
+		release_bitmap = &tmp_bitmap[BITS_TO_LONGS(bitmap_len)];
 		if (!tmp_bitmap) {
 			free_page(page);
 			return -ENOMEM;
@@ -2850,7 +2855,32 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
 		proc_skip_char(&kbuf, &left, '\n');
 		while (!err && left) {
 			unsigned long val_a, val_b;
-			bool neg;
+			bool neg, found;
+
+			left -= proc_skip_spaces(&kbuf);
+			if (!left)
+				continue;
+
+			if (first || add_or_release) {
+				found = (0 == strnicmp(kbuf, "add ", 4));
+				if (found) {
+					xrelease = 0;
+					add_or_release = 1;
+					kbuf += 4;
+					left -= 4;
+				}
+				found = (0 == strnicmp(kbuf, "release ", 8));
+				if (found) {
+					xrelease = 1;
+					add_or_release = 1;
+					kbuf += 8;
+					left -= 8;
+				}
+
+				left -= proc_skip_spaces(&kbuf);
+				if (!left)
+					continue;
+			}
 
 			err = proc_get_long(&kbuf, &left, &val_a, &neg, tr_a,
 					     sizeof(tr_a), &c);
@@ -2885,7 +2915,10 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
 			}
 
 			while (val_a <= val_b)
-				set_bit(val_a++, tmp_bitmap);
+				if (xrelease)
+					set_bit(val_a++, release_bitmap);
+				else
+					set_bit(val_a++, tmp_bitmap);
 
 			first = 0;
 			proc_skip_char(&kbuf, &left, '\n');
@@ -2926,9 +2959,10 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
 
 	if (!err) {
 		if (write) {
-			if (*ppos)
+			if (*ppos || add_or_release) {
 				bitmap_or(bitmap, bitmap, tmp_bitmap, bitmap_len);
-			else
+				bitmap_andnot(bitmap, bitmap, release_bitmap, bitmap_len);
+			} else
 				memcpy(bitmap, tmp_bitmap,
 					BITS_TO_LONGS(bitmap_len) * sizeof(unsigned long));
 		}

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] enhance usability of /proc/sys/net/ipv4/ip_local_reserved_ports (v2)
  2012-03-13 20:33 ` [PATCH] enhance usability of /proc/sys/net/ipv4/ip_local_reserved_ports (v2) Helge Deller
@ 2012-03-14  7:43   ` Cong Wang
  2012-03-14 22:06     ` Helge Deller
  2012-03-14 22:14   ` [PATCH] enhance usability of /proc/sys/net/ipv4/ip_local_reserved_ports (v3) Helge Deller
  1 sibling, 1 reply; 17+ messages in thread
From: Cong Wang @ 2012-03-14  7:43 UTC (permalink / raw)
  To: Helge Deller
  Cc: Octavian Purdila, netdev, David Miller, Andrew Morton,
	Eric W. Biederman, Frank Danapfel, Laszlo Ersek

On Tue, 2012-03-13 at 21:33 +0100, Helge Deller wrote:
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index f487f25..1f60398 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -2805,6 +2805,8 @@ static int proc_do_cad_pid(struct ctl_table *table, int write,
>   * We use a range comma separated format (e.g. 1,3-4,10-10) so that
>   * large bitmaps may be represented in a compact manner. Writing into
>   * the file will clear the bitmap then update it with the given input.
> + * If "add" or "release" is written in front of numbers or number ranges,
> + * the given bits will be added to or released from the existing bitmap.
>   *

What if I only write "add" or "release" ("add ", "release " too) into
this file? Make sure you have tested this corner case.

>   * Returns 0 on success.
>   */
> @@ -2813,11 +2815,13 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
>  {
>  	int err = 0;
>  	bool first = 1;
> +	bool add_or_release = 0, xrelease = 0;
>  	size_t left = *lenp;
>  	unsigned long bitmap_len = table->maxlen;
>  	unsigned long *bitmap = (unsigned long *) table->data;
> -	unsigned long *tmp_bitmap = NULL;
> -	char tr_a[] = { '-', ',', '\n' }, tr_b[] = { ',', '\n', 0 }, c;
> +	unsigned long *tmp_bitmap = NULL, *release_bitmap = NULL;
> +	char tr_a[] = { '-', ',', ' ', '\n' },
> +	     tr_b[] = { ',', ' ', '\n', 0 }, c;
>  
>  	if (!bitmap_len || !left || (*ppos && !write)) {
>  		*lenp = 0;
> @@ -2841,8 +2845,9 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
>                  }
>  		kbuf[left] = 0;
>  
> -		tmp_bitmap = kzalloc(BITS_TO_LONGS(bitmap_len) * sizeof(unsigned long),
> -				     GFP_KERNEL);
> +		tmp_bitmap = kzalloc(2 * BITS_TO_LONGS(bitmap_len) *
> +					sizeof(unsigned long), GFP_KERNEL);
> +		release_bitmap = &tmp_bitmap[BITS_TO_LONGS(bitmap_len)];

So you double the size, and give the second half to 'release_bitmap',
this will waste spaces when release_bitmap is short, right? *I think* we
can check if we want to release any bitmaps first, and then only
allocate one of tmp_bitmap and release_bitmap.

>  		if (!tmp_bitmap) {
>  			free_page(page);
>  			return -ENOMEM;
> @@ -2850,7 +2855,32 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
>  		proc_skip_char(&kbuf, &left, '\n');
>  		while (!err && left) {
>  			unsigned long val_a, val_b;
> -			bool neg;
> +			bool neg, found;
> +
> +			left -= proc_skip_spaces(&kbuf);
> +			if (!left)
> +				continue;
> +
> +			if (first || add_or_release) {
> +				found = (0 == strnicmp(kbuf, "add ", 4));
> +				if (found) {

I think we don't need an extra variable 'found' here.

> +					xrelease = 0;
> +					add_or_release = 1;
> +					kbuf += 4;
> +					left -= 4;
> +				}
> +				found = (0 == strnicmp(kbuf, "release ", 8));
> +				if (found) {
> +					xrelease = 1;
> +					add_or_release = 1;
> +					kbuf += 8;
> +					left -= 8;
> +				}
> +
> +				left -= proc_skip_spaces(&kbuf);
> +				if (!left)
> +					continue;
> +			}
>  
>  			err = proc_get_long(&kbuf, &left, &val_a, &neg, tr_a,
>  					     sizeof(tr_a), &c);
> @@ -2885,7 +2915,10 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
>  			}
>  
>  			while (val_a <= val_b)
> -				set_bit(val_a++, tmp_bitmap);
> +				if (xrelease)
> +					set_bit(val_a++, release_bitmap);
> +				else
> +					set_bit(val_a++, tmp_bitmap);
>  
>  			first = 0;
>  			proc_skip_char(&kbuf, &left, '\n');
> @@ -2926,9 +2959,10 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
>  
>  	if (!err) {
>  		if (write) {
> -			if (*ppos)
> +			if (*ppos || add_or_release) {
>  				bitmap_or(bitmap, bitmap, tmp_bitmap, bitmap_len);
> -			else
> +				bitmap_andnot(bitmap, bitmap, release_bitmap, bitmap_len);
> +			} else


Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] enhance usability of /proc/sys/net/ipv4/ip_local_reserved_ports (v2)
  2012-03-14  7:43   ` Cong Wang
@ 2012-03-14 22:06     ` Helge Deller
  2012-03-14 22:20       ` Stephen Hemminger
  0 siblings, 1 reply; 17+ messages in thread
From: Helge Deller @ 2012-03-14 22:06 UTC (permalink / raw)
  To: Cong Wang
  Cc: Octavian Purdila, netdev, David Miller, Andrew Morton,
	Eric W. Biederman, Frank Danapfel, Laszlo Ersek

On 03/14/2012 08:43 AM, Cong Wang wrote:
> On Tue, 2012-03-13 at 21:33 +0100, Helge Deller wrote:
>> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
>> index f487f25..1f60398 100644
>> --- a/kernel/sysctl.c
>> +++ b/kernel/sysctl.c
>> @@ -2805,6 +2805,8 @@ static int proc_do_cad_pid(struct ctl_table *table, int write,
>>   * We use a range comma separated format (e.g. 1,3-4,10-10) so that
>>   * large bitmaps may be represented in a compact manner. Writing into
>>   * the file will clear the bitmap then update it with the given input.
>> + * If "add" or "release" is written in front of numbers or number ranges,
>> + * the given bits will be added to or released from the existing bitmap.
>>   *
> 
> What if I only write "add" or "release" ("add ", "release " too) into
> this file? Make sure you have tested this corner case.

Sure, I tested this case. It will not modify the the current port list.

But there were other cases which I initially didn't took care of, mostly 
because I wanted to keep the parser simple.
Now, in the next version of the patch the following cases will be handled
correctly: 
- "add release 100"		(->syntax error)
- "release 100 add 100"		(->with all in one line the result will not be
		as expected because first bitmap_or() and then bitmap_andnot()
		will be executed, so that the 100th bit becomes released instead
		of added. Users will need to split this into two echo commands
		otherwise -EINVAL will be returned.)

>>   * Returns 0 on success.
>>   */
>> @@ -2813,11 +2815,13 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
>>  {
>>  	int err = 0;
>>  	bool first = 1;
>> +	bool add_or_release = 0, xrelease = 0;
>>  	size_t left = *lenp;
>>  	unsigned long bitmap_len = table->maxlen;
>>  	unsigned long *bitmap = (unsigned long *) table->data;
>> -	unsigned long *tmp_bitmap = NULL;
>> -	char tr_a[] = { '-', ',', '\n' }, tr_b[] = { ',', '\n', 0 }, c;
>> +	unsigned long *tmp_bitmap = NULL, *release_bitmap = NULL;
>> +	char tr_a[] = { '-', ',', ' ', '\n' },
>> +	     tr_b[] = { ',', ' ', '\n', 0 }, c;
>>  
>>  	if (!bitmap_len || !left || (*ppos && !write)) {
>>  		*lenp = 0;
>> @@ -2841,8 +2845,9 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
>>                  }
>>  		kbuf[left] = 0;
>>  
>> -		tmp_bitmap = kzalloc(BITS_TO_LONGS(bitmap_len) * sizeof(unsigned long),
>> -				     GFP_KERNEL);
>> +		tmp_bitmap = kzalloc(2 * BITS_TO_LONGS(bitmap_len) *
>> +					sizeof(unsigned long), GFP_KERNEL);
>> +		release_bitmap = &tmp_bitmap[BITS_TO_LONGS(bitmap_len)];
> 
> So you double the size, and give the second half to 'release_bitmap',
> this will waste spaces when release_bitmap is short, right?

Yes.

> *I think* we
> can check if we want to release any bitmaps first, and then only
> allocate one of tmp_bitmap and release_bitmap.

The simpliest solution would be to use strcasestr(kbuf,"release") but
this function isn't available in the kernel.
Alternatively I could just search for e.g. upper- and lowercase 
"release", but I don't like that either. 
Maybe you have a better idea?

Overall, 65536 bits (ports) for the ip_local_reserved_ports bitfield
occupies 8K. With my patch this now becomes 16K. For desktop/server
usages I think this is OK, esp. since it's only used temporarily and
freed directly after usage again.

>>  		if (!tmp_bitmap) {
>>  			free_page(page);
>>  			return -ENOMEM;
>> @@ -2850,7 +2855,32 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
>>  		proc_skip_char(&kbuf, &left, '\n');
>>  		while (!err && left) {
>>  			unsigned long val_a, val_b;
>> -			bool neg;
>> +			bool neg, found;
>> +
>> +			left -= proc_skip_spaces(&kbuf);
>> +			if (!left)
>> +				continue;
>> +
>> +			if (first || add_or_release) {
>> +				found = (0 == strnicmp(kbuf, "add ", 4));
>> +				if (found) {
> 
> I think we don't need an extra variable 'found' here.

Yes, thanks. Fixed in next version.

Thanks,
Helge

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] enhance usability of /proc/sys/net/ipv4/ip_local_reserved_ports (v3)
  2012-03-13 20:33 ` [PATCH] enhance usability of /proc/sys/net/ipv4/ip_local_reserved_ports (v2) Helge Deller
  2012-03-14  7:43   ` Cong Wang
@ 2012-03-14 22:14   ` Helge Deller
  2012-03-14 22:34     ` Eric W. Biederman
  2012-04-04 20:24     ` [RFC] API to modify /proc/sys/net/ipv4/ip_local_reserved_ports Helge Deller
  1 sibling, 2 replies; 17+ messages in thread
From: Helge Deller @ 2012-03-14 22:14 UTC (permalink / raw)
  To: Octavian Purdila, netdev, David Miller, Cong Wang, Andrew Morton,
	Eric W. Biederman, Frank Danapfel, Laszlo Ersek

When writing to the ip_local_reserved_ports proc file it will currently clear
all previously reserved ports and update the current list with the one given 
in the input.

This behaviour makes it's usage quite hard, for example:
a) The generic proc filesystem limitation of only handle up to PAGE_SIZE-1
   characters at maximum may not be sufficient to provide all your wished-to-
   be-reserved ports at once.
b) There is no easy way to disable specific given ports, you always need to
   give the full port list at once. This makes shell scripting hard, since
   you need to parse everything yourself.
c) There is no easy way to just add specific ports or port ranges. Again,
   this would be useful for shell scripts.

The following patch solves this problem by simply extending the parser
in proc_do_large_bitmap() to accept the keywords "add" and "release" in front
of given ports or port ranges and to either add or drop the given ports
from the already existing list.

Here is an example:
$ echo "1000-2000,5000" > /proc/sys/net/ipv4/ip_local_reserved_ports
$ cat /proc/sys/net/ipv4/ip_local_reserved_ports
1000-2000,5000   (works as before, current port list is replaced by new one)

$ echo "add 3000-4000" > /proc/sys/net/ipv4/ip_local_reserved_ports
$ cat /proc/sys/net/ipv4/ip_local_reserved_ports
1000-2000,3000-4000,5000   (new ports added)

$ echo "release 1500-3500" > /proc/sys/net/ipv4/ip_local_reserved_ports
$ cat /proc/sys/net/ipv4/ip_local_reserved_ports
1000-1499,3501-4000,5000   (given ports were dropped from the list)

My main motivation for this patch is because of a huge commercial application
which by default may use lots of ports. The full port list which I would have
needed to echo to the /proc/sys/net/ipv4/ip_local_reserved_ports file was
around 30K, and in this case all ports were already combined to regions where
possible. With this patch it's now easy to split up the port ranges into
single pieces and to implement everything in simple bootup shell scripts.
Furthermore adding new or removing unneeded ports dynamically at runtime is
now easily possible.

Signed-off-by: Helge Deller <deller@gmx.de>
CC: Octavian Purdila <octavian.purdila@intel.com>
CC: netdev@vger.kernel.org
CC: David Miller <davem@davemloft.net>
CC: Cong Wang <amwang@redhat.com>
CC: "Eric W. Biederman" <ebiederm@xmission.com>

 Documentation/networking/ip-sysctl.txt |   10 +++++-
 kernel/sysctl.c                        |   51 ++++++++++++++++++++++++++++-----
 2 files changed, 53 insertions(+), 8 deletions(-)


diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index ad3e80e..fc52546 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -624,7 +624,15 @@ ip_local_reserved_ports - list of comma separated ranges
 	list of ranges (e.g. "1,2-4,10-10" for ports 1, 2, 3, 4 and
 	10). Writing to the file will clear all previously reserved
 	ports and update the current list with the one given in the
-	input.
+	input unless one of the keywords "add" or "release" is used
+	in front of the ports in which case the given ports are added
+	to or released from the currently existing port list.
+	Example:
+	$ echo "1000-2000" > /proc/sys/net/ipv4/ip_local_reserved_ports
+	$ echo "add 3000-4000,5000" > /proc/sys/net/ipv4/ip_local_reserved_ports
+	$ echo "release 1500-3500" > /proc/sys/net/ipv4/ip_local_reserved_ports
+	$ cat /proc/sys/net/ipv4/ip_local_reserved_ports
+	1000-1499,3501-4000,5000
 
 	Note that ip_local_port_range and ip_local_reserved_ports
 	settings are independent and both are considered by the kernel
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index f487f25..f9b1930 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2805,6 +2805,8 @@ static int proc_do_cad_pid(struct ctl_table *table, int write,
  * We use a range comma separated format (e.g. 1,3-4,10-10) so that
  * large bitmaps may be represented in a compact manner. Writing into
  * the file will clear the bitmap then update it with the given input.
+ * If "add" or "release" is written in front of numbers or number ranges,
+ * the given bits will be added to or released from the existing bitmap.
  *
  * Returns 0 on success.
  */
@@ -2813,11 +2815,13 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
 {
 	int err = 0;
 	bool first = 1;
+	bool add_or_release = 0, xrelease = 0;
 	size_t left = *lenp;
 	unsigned long bitmap_len = table->maxlen;
 	unsigned long *bitmap = (unsigned long *) table->data;
-	unsigned long *tmp_bitmap = NULL;
-	char tr_a[] = { '-', ',', '\n' }, tr_b[] = { ',', '\n', 0 }, c;
+	unsigned long *tmp_bitmap = NULL, *release_bitmap = NULL;
+	char tr_a[] = { '-', ',', ' ', '\n' },
+	     tr_b[] = { ',', ' ', '\n', 0 }, c;
 
 	if (!bitmap_len || !left || (*ppos && !write)) {
 		*lenp = 0;
@@ -2841,8 +2845,9 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
                 }
 		kbuf[left] = 0;
 
-		tmp_bitmap = kzalloc(BITS_TO_LONGS(bitmap_len) * sizeof(unsigned long),
-				     GFP_KERNEL);
+		tmp_bitmap = kzalloc(2 * BITS_TO_LONGS(bitmap_len) *
+					sizeof(unsigned long), GFP_KERNEL);
+		release_bitmap = &tmp_bitmap[BITS_TO_LONGS(bitmap_len)];
 		if (!tmp_bitmap) {
 			free_page(page);
 			return -ENOMEM;
@@ -2852,6 +2857,29 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
 			unsigned long val_a, val_b;
 			bool neg;
 
+			left -= proc_skip_spaces(&kbuf);
+			if (!left)
+				continue;
+
+			if (first || add_or_release) {
+				if (!strnicmp(kbuf, "add ", 4)) {
+					xrelease = 0;
+					add_or_release = 1;
+					kbuf += 4;
+					left -= 4;
+				} else
+				if (!strnicmp(kbuf, "release ", 8)) {
+					xrelease = 1;
+					add_or_release = 1;
+					kbuf += 8;
+					left -= 8;
+				}
+
+				left -= proc_skip_spaces(&kbuf);
+				if (!left)
+					continue;
+			}
+
 			err = proc_get_long(&kbuf, &left, &val_a, &neg, tr_a,
 					     sizeof(tr_a), &c);
 			if (err)
@@ -2885,12 +2913,20 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
 			}
 
 			while (val_a <= val_b)
-				set_bit(val_a++, tmp_bitmap);
+				if (xrelease)
+					set_bit(val_a++, release_bitmap);
+				else
+					set_bit(val_a++, tmp_bitmap);
 
 			first = 0;
 			proc_skip_char(&kbuf, &left, '\n');
 		}
 		free_page(page);
+
+		/* Do not allow adding and releasing same bits in one step. */
+		if (!err && add_or_release &&
+		    bitmap_intersects(tmp_bitmap, release_bitmap, bitmap_len))
+			err = -EINVAL;
 	} else {
 		unsigned long bit_a, bit_b = 0;
 
@@ -2926,9 +2962,10 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
 
 	if (!err) {
 		if (write) {
-			if (*ppos)
+			if (*ppos || add_or_release) {
 				bitmap_or(bitmap, bitmap, tmp_bitmap, bitmap_len);
-			else
+				bitmap_andnot(bitmap, bitmap, release_bitmap, bitmap_len);
+			} else
 				memcpy(bitmap, tmp_bitmap,
 					BITS_TO_LONGS(bitmap_len) * sizeof(unsigned long));
 		}

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] enhance usability of /proc/sys/net/ipv4/ip_local_reserved_ports (v2)
  2012-03-14 22:06     ` Helge Deller
@ 2012-03-14 22:20       ` Stephen Hemminger
  0 siblings, 0 replies; 17+ messages in thread
From: Stephen Hemminger @ 2012-03-14 22:20 UTC (permalink / raw)
  To: Helge Deller
  Cc: Cong Wang, Octavian Purdila, netdev, David Miller, Andrew Morton,
	Eric W. Biederman, Frank Danapfel, Laszlo Ersek

On Wed, 14 Mar 2012 23:06:33 +0100
Helge Deller <deller@gmx.de> wrote:

> On 03/14/2012 08:43 AM, Cong Wang wrote:
> > On Tue, 2012-03-13 at 21:33 +0100, Helge Deller wrote:
> >> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> >> index f487f25..1f60398 100644
> >> --- a/kernel/sysctl.c
> >> +++ b/kernel/sysctl.c
> >> @@ -2805,6 +2805,8 @@ static int proc_do_cad_pid(struct ctl_table *table, int write,
> >>   * We use a range comma separated format (e.g. 1,3-4,10-10) so that
> >>   * large bitmaps may be represented in a compact manner. Writing into
> >>   * the file will clear the bitmap then update it with the given input.
> >> + * If "add" or "release" is written in front of numbers or number ranges,
> >> + * the given bits will be added to or released from the existing bitmap.
> >>   *
> > 
> > What if I only write "add" or "release" ("add ", "release " too) into
> > this file? Make sure you have tested this corner case.
> 
> Sure, I tested this case. It will not modify the the current port list.
> 
> But there were other cases which I initially didn't took care of, mostly 
> because I wanted to keep the parser simple.
> Now, in the next version of the patch the following cases will be handled
> correctly: 
> - "add release 100"		(->syntax error)
> - "release 100 add 100"		(->with all in one line the result will not be
> 		as expected because first bitmap_or() and then bitmap_andnot()
> 		will be executed, so that the 100th bit becomes released instead
> 		of added. Users will need to split this into two echo commands
> 		otherwise -EINVAL will be returned.)
> 
> >>   * Returns 0 on success.
> >>   */
> >> @@ -2813,11 +2815,13 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
> >>  {
> >>  	int err = 0;
> >>  	bool first = 1;
> >> +	bool add_or_release = 0, xrelease = 0;
> >>  	size_t left = *lenp;
> >>  	unsigned long bitmap_len = table->maxlen;
> >>  	unsigned long *bitmap = (unsigned long *) table->data;
> >> -	unsigned long *tmp_bitmap = NULL;
> >> -	char tr_a[] = { '-', ',', '\n' }, tr_b[] = { ',', '\n', 0 }, c;
> >> +	unsigned long *tmp_bitmap = NULL, *release_bitmap = NULL;
> >> +	char tr_a[] = { '-', ',', ' ', '\n' },
> >> +	     tr_b[] = { ',', ' ', '\n', 0 }, c;
> >>  
> >>  	if (!bitmap_len || !left || (*ppos && !write)) {
> >>  		*lenp = 0;
> >> @@ -2841,8 +2845,9 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
> >>                  }
> >>  		kbuf[left] = 0;
> >>  
> >> -		tmp_bitmap = kzalloc(BITS_TO_LONGS(bitmap_len) * sizeof(unsigned long),
> >> -				     GFP_KERNEL);
> >> +		tmp_bitmap = kzalloc(2 * BITS_TO_LONGS(bitmap_len) *
> >> +					sizeof(unsigned long), GFP_KERNEL);
> >> +		release_bitmap = &tmp_bitmap[BITS_TO_LONGS(bitmap_len)];
> > 
> > So you double the size, and give the second half to 'release_bitmap',
> > this will waste spaces when release_bitmap is short, right?
> 
> Yes.
> 
> > *I think* we
> > can check if we want to release any bitmaps first, and then only
> > allocate one of tmp_bitmap and release_bitmap.
> 
> The simpliest solution would be to use strcasestr(kbuf,"release") but
> this function isn't available in the kernel.
> Alternatively I could just search for e.g. upper- and lowercase 
> "release", but I don't like that either. 
> Maybe you have a better idea?
> 
> Overall, 65536 bits (ports) for the ip_local_reserved_ports bitfield
> occupies 8K. With my patch this now becomes 16K. For desktop/server
> usages I think this is OK, esp. since it's only used temporarily and
> freed directly after usage again.
> 
> >>  		if (!tmp_bitmap) {
> >>  			free_page(page);
> >>  			return -ENOMEM;
> >> @@ -2850,7 +2855,32 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
> >>  		proc_skip_char(&kbuf, &left, '\n');
> >>  		while (!err && left) {
> >>  			unsigned long val_a, val_b;
> >> -			bool neg;
> >> +			bool neg, found;
> >> +
> >> +			left -= proc_skip_spaces(&kbuf);
> >> +			if (!left)
> >> +				continue;
> >> +
> >> +			if (first || add_or_release) {
> >> +				found = (0 == strnicmp(kbuf, "add ", 4));
> >> +				if (found) {
> > 
> > I think we don't need an extra variable 'found' here.
> 

This is getting to be a text book example of why /proc is ugly
as a general purpose API.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] enhance usability of /proc/sys/net/ipv4/ip_local_reserved_ports (v3)
  2012-03-14 22:14   ` [PATCH] enhance usability of /proc/sys/net/ipv4/ip_local_reserved_ports (v3) Helge Deller
@ 2012-03-14 22:34     ` Eric W. Biederman
  2012-03-15 23:35       ` Helge Deller
  2012-04-04 20:24     ` [RFC] API to modify /proc/sys/net/ipv4/ip_local_reserved_ports Helge Deller
  1 sibling, 1 reply; 17+ messages in thread
From: Eric W. Biederman @ 2012-03-14 22:34 UTC (permalink / raw)
  To: Helge Deller
  Cc: Octavian Purdila, netdev, David Miller, Cong Wang, Andrew Morton,
	Frank Danapfel, Laszlo Ersek

Helge Deller <deller@gmx.de> writes:

> When writing to the ip_local_reserved_ports proc file it will currently clear
> all previously reserved ports and update the current list with the one given 
> in the input.
>
> This behaviour makes it's usage quite hard, for example:
> a) The generic proc filesystem limitation of only handle up to PAGE_SIZE-1
>    characters at maximum may not be sufficient to provide all your wished-to-
>    be-reserved ports at once.

Fundamentally this need to be fixed first or else you will not be able
to display the bitmap through sysctl.

> b) There is no easy way to disable specific given ports, you always need to
>    give the full port list at once. This makes shell scripting hard, since
>    you need to parse everything yourself.

> c) There is no easy way to just add specific ports or port ranges. Again,
>    this would be useful for shell scripts.

Arguably b and c call for user space tools for better tools for dealing
with text ranges not a magic parser in /proc.  We already have tools
like seq, dshbak, and pdsh.  What is the difficulty of writing a little
shell utility instead of modifying /proc?

> The following patch solves this problem by simply extending the parser
> in proc_do_large_bitmap() to accept the keywords "add" and "release" in front
> of given ports or port ranges and to either add or drop the given ports
> from the already existing list.
>
> Here is an example:
> $ echo "1000-2000,5000" > /proc/sys/net/ipv4/ip_local_reserved_ports
> $ cat /proc/sys/net/ipv4/ip_local_reserved_ports
> 1000-2000,5000   (works as before, current port list is replaced by new one)
>
> $ echo "add 3000-4000" > /proc/sys/net/ipv4/ip_local_reserved_ports
> $ cat /proc/sys/net/ipv4/ip_local_reserved_ports
> 1000-2000,3000-4000,5000   (new ports added)
>
> $ echo "release 1500-3500" > /proc/sys/net/ipv4/ip_local_reserved_ports
> $ cat /proc/sys/net/ipv4/ip_local_reserved_ports
> 1000-1499,3501-4000,5000   (given ports were dropped from the list)
>
> My main motivation for this patch is because of a huge commercial application
> which by default may use lots of ports. The full port list which I would have
> needed to echo to the /proc/sys/net/ipv4/ip_local_reserved_ports file was
> around 30K, and in this case all ports were already combined to regions where
> possible. With this patch it's now easy to split up the port ranges into
> single pieces and to implement everything in simple bootup shell scripts.
> Furthermore adding new or removing unneeded ports dynamically at runtime is
> now easily possible.

You are breaking concept that the bitmap is a single value in /proc/sys
which I don't like at all.

And ultimately this is a lot of code to avoid fixing a 4K limit of
sysctl reads and writes.  If I understand this correctly after this
patch is applied you can not read the result you write in with your
commercial application.  That just seems wrong.

Can you please attack the fundamental issue first?

Eric

> Signed-off-by: Helge Deller <deller@gmx.de>
> CC: Octavian Purdila <octavian.purdila@intel.com>
> CC: netdev@vger.kernel.org
> CC: David Miller <davem@davemloft.net>
> CC: Cong Wang <amwang@redhat.com>
> CC: "Eric W. Biederman" <ebiederm@xmission.com>
>
>  Documentation/networking/ip-sysctl.txt |   10 +++++-
>  kernel/sysctl.c                        |   51 ++++++++++++++++++++++++++++-----
>  2 files changed, 53 insertions(+), 8 deletions(-)
>
>
> diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
> index ad3e80e..fc52546 100644
> --- a/Documentation/networking/ip-sysctl.txt
> +++ b/Documentation/networking/ip-sysctl.txt
> @@ -624,7 +624,15 @@ ip_local_reserved_ports - list of comma separated ranges
>  	list of ranges (e.g. "1,2-4,10-10" for ports 1, 2, 3, 4 and
>  	10). Writing to the file will clear all previously reserved
>  	ports and update the current list with the one given in the
> -	input.
> +	input unless one of the keywords "add" or "release" is used
> +	in front of the ports in which case the given ports are added
> +	to or released from the currently existing port list.
> +	Example:
> +	$ echo "1000-2000" > /proc/sys/net/ipv4/ip_local_reserved_ports
> +	$ echo "add 3000-4000,5000" > /proc/sys/net/ipv4/ip_local_reserved_ports
> +	$ echo "release 1500-3500" > /proc/sys/net/ipv4/ip_local_reserved_ports
> +	$ cat /proc/sys/net/ipv4/ip_local_reserved_ports
> +	1000-1499,3501-4000,5000
>  
>  	Note that ip_local_port_range and ip_local_reserved_ports
>  	settings are independent and both are considered by the kernel
> diff --git a/kernel/sysctl.c b/kernel/sysctl.c
> index f487f25..f9b1930 100644
> --- a/kernel/sysctl.c
> +++ b/kernel/sysctl.c
> @@ -2805,6 +2805,8 @@ static int proc_do_cad_pid(struct ctl_table *table, int write,
>   * We use a range comma separated format (e.g. 1,3-4,10-10) so that
>   * large bitmaps may be represented in a compact manner. Writing into
>   * the file will clear the bitmap then update it with the given input.
> + * If "add" or "release" is written in front of numbers or number ranges,
> + * the given bits will be added to or released from the existing bitmap.
>   *
>   * Returns 0 on success.
>   */
> @@ -2813,11 +2815,13 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
>  {
>  	int err = 0;
>  	bool first = 1;
> +	bool add_or_release = 0, xrelease = 0;
>  	size_t left = *lenp;
>  	unsigned long bitmap_len = table->maxlen;
>  	unsigned long *bitmap = (unsigned long *) table->data;
> -	unsigned long *tmp_bitmap = NULL;
> -	char tr_a[] = { '-', ',', '\n' }, tr_b[] = { ',', '\n', 0 }, c;
> +	unsigned long *tmp_bitmap = NULL, *release_bitmap = NULL;
> +	char tr_a[] = { '-', ',', ' ', '\n' },
> +	     tr_b[] = { ',', ' ', '\n', 0 }, c;
>  
>  	if (!bitmap_len || !left || (*ppos && !write)) {
>  		*lenp = 0;
> @@ -2841,8 +2845,9 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
>                  }
>  		kbuf[left] = 0;
>  
> -		tmp_bitmap = kzalloc(BITS_TO_LONGS(bitmap_len) * sizeof(unsigned long),
> -				     GFP_KERNEL);
> +		tmp_bitmap = kzalloc(2 * BITS_TO_LONGS(bitmap_len) *
> +					sizeof(unsigned long), GFP_KERNEL);
> +		release_bitmap = &tmp_bitmap[BITS_TO_LONGS(bitmap_len)];
>  		if (!tmp_bitmap) {
>  			free_page(page);
>  			return -ENOMEM;
> @@ -2852,6 +2857,29 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
>  			unsigned long val_a, val_b;
>  			bool neg;
>  
> +			left -= proc_skip_spaces(&kbuf);
> +			if (!left)
> +				continue;
> +
> +			if (first || add_or_release) {
> +				if (!strnicmp(kbuf, "add ", 4)) {
> +					xrelease = 0;
> +					add_or_release = 1;
> +					kbuf += 4;
> +					left -= 4;
> +				} else
> +				if (!strnicmp(kbuf, "release ", 8)) {
> +					xrelease = 1;
> +					add_or_release = 1;
> +					kbuf += 8;
> +					left -= 8;
> +				}
> +
> +				left -= proc_skip_spaces(&kbuf);
> +				if (!left)
> +					continue;
> +			}
> +
>  			err = proc_get_long(&kbuf, &left, &val_a, &neg, tr_a,
>  					     sizeof(tr_a), &c);
>  			if (err)
> @@ -2885,12 +2913,20 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
>  			}
>  
>  			while (val_a <= val_b)
> -				set_bit(val_a++, tmp_bitmap);
> +				if (xrelease)
> +					set_bit(val_a++, release_bitmap);
> +				else
> +					set_bit(val_a++, tmp_bitmap);
>  
>  			first = 0;
>  			proc_skip_char(&kbuf, &left, '\n');
>  		}
>  		free_page(page);
> +
> +		/* Do not allow adding and releasing same bits in one step. */
> +		if (!err && add_or_release &&
> +		    bitmap_intersects(tmp_bitmap, release_bitmap, bitmap_len))
> +			err = -EINVAL;
>  	} else {
>  		unsigned long bit_a, bit_b = 0;
>  
> @@ -2926,9 +2962,10 @@ int proc_do_large_bitmap(struct ctl_table *table, int write,
>  
>  	if (!err) {
>  		if (write) {
> -			if (*ppos)
> +			if (*ppos || add_or_release) {
>  				bitmap_or(bitmap, bitmap, tmp_bitmap, bitmap_len);
> -			else
> +				bitmap_andnot(bitmap, bitmap, release_bitmap, bitmap_len);
> +			} else
>  				memcpy(bitmap, tmp_bitmap,
>  					BITS_TO_LONGS(bitmap_len) * sizeof(unsigned long));
>  		}

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] enhance usability of /proc/sys/net/ipv4/ip_local_reserved_ports (v3)
  2012-03-14 22:34     ` Eric W. Biederman
@ 2012-03-15 23:35       ` Helge Deller
  0 siblings, 0 replies; 17+ messages in thread
From: Helge Deller @ 2012-03-15 23:35 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Octavian Purdila, netdev, David Miller, Cong Wang, Andrew Morton,
	Frank Danapfel, Laszlo Ersek

On 03/14/2012 11:34 PM, Eric W. Biederman wrote:
> Helge Deller<deller@gmx.de>  writes:
>
>> When writing to the ip_local_reserved_ports proc file it will currently clear
>> all previously reserved ports and update the current list with the one given
>> in the input.
>>
>> This behaviour makes it's usage quite hard, for example:
>> a) The generic proc filesystem limitation of only handle up to PAGE_SIZE-1
>>     characters at maximum may not be sufficient to provide all your wished-to-
>>     be-reserved ports at once.
>
> Fundamentally this need to be fixed first or else you will not be able
> to display the bitmap through sysctl.

Not necessarily.
Reading more than 4K from /proc/sys/net/ipv4/ip_local_reserved_ports
is possible today already (everything depends on how much chars you
read-at-once). Just try using "dd bs=1M". Writing is the major limiting factor.

>> b) There is no easy way to disable specific given ports, you always need to
>>     give the full port list at once. This makes shell scripting hard, since
>>     you need to parse everything yourself.
>
>> c) There is no easy way to just add specific ports or port ranges. Again,
>>     this would be useful for shell scripts.
>
> Arguably b and c call for user space tools for better tools for dealing
> with text ranges not a magic parser in /proc.  We already have tools
> like seq, dshbak, and pdsh.  What is the difficulty of writing a little
> shell utility instead of modifying /proc?

I'm not modifying /proc. I'm just extending it to provide an alternative
method to modify values for the large bitmap case only.
The basic existing behaviour to just read/write values is still functional.

If for example the current value of ip_local_reserved_ports is:
300,400-500,600....
then I like the idea of just
	echo "release 420-480" > ...
much more than
	cat ip_local_reserved_ports | do_complicated_shell_string_scripting_to_remove_420-480_without_getting_above_4K > ip_local_reserved_ports
BTW, in this case the range 400-500 gets split up and the output should become
300,400-419,481-500,600....

>> The following patch solves this problem by simply extending the parser
>> in proc_do_large_bitmap() to accept the keywords "add" and "release" in front
>> of given ports or port ranges and to either add or drop the given ports
>> from the already existing list.
>>
>> Here is an example:
>> $ echo "1000-2000,5000">  /proc/sys/net/ipv4/ip_local_reserved_ports
>> $ cat /proc/sys/net/ipv4/ip_local_reserved_ports
>> 1000-2000,5000   (works as before, current port list is replaced by new one)
>>
>> $ echo "add 3000-4000">  /proc/sys/net/ipv4/ip_local_reserved_ports
>> $ cat /proc/sys/net/ipv4/ip_local_reserved_ports
>> 1000-2000,3000-4000,5000   (new ports added)
>>
>> $ echo "release 1500-3500">  /proc/sys/net/ipv4/ip_local_reserved_ports
>> $ cat /proc/sys/net/ipv4/ip_local_reserved_ports
>> 1000-1499,3501-4000,5000   (given ports were dropped from the list)
>>
>> My main motivation for this patch is because of a huge commercial application
>> which by default may use lots of ports. The full port list which I would have
>> needed to echo to the /proc/sys/net/ipv4/ip_local_reserved_ports file was
>> around 30K, and in this case all ports were already combined to regions where
>> possible. With this patch it's now easy to split up the port ranges into
>> single pieces and to implement everything in simple bootup shell scripts.
>> Furthermore adding new or removing unneeded ports dynamically at runtime is
>> now easily possible.
>
> You are breaking concept that the bitmap is a single value in /proc/sys
> which I don't like at all.

Please keep in mind that this is a 65535 bit (port) bitmap, not just a "simple"
single value.
And the basic concept is still there.

> And ultimately this is a lot of code to avoid fixing a 4K limit of
> sysctl reads and writes.

I would love if the 4K problem gets fixed.
But it seems that this will lead to a major redesign and rewrite of
the whole /proc interface which I won't be able to drive.
I'm not even sure if this could be done in short time without
breaking existing compatibility. Other people here on the list can
surely comment on that better than me.

So, my patch is at least a "balanced" solution which can make the
current ip_local_reserved_ports interface useable to existing
applications. In it's current form, the usage of ip_local_reserved_ports
isfull of limitations.

>If I understand this correctly after this
> patch is applied you can not read the result you write in with your
> commercial application.  That just seems wrong.

As mentioned above: reading is possible. And with my patch
applied you usually probably won't need to read the values since you
don't care which other ports have been configured and just add/release
the ones you are interested in.

Regarding the commercial application: The idea is to provide an additional
shell-based system initscript which fills the ip_local_reserved_ports
list with initial values based on the product which the customer
installed. After the values have been set during bootup, further (dynamic)
changes of this port list won't happen unless the administrator
manually wants to add/release other ports. So, no magic in here...

> Can you please attack the fundamental issue first?

You mean the 4K/PAGE_SIZE limitation? As said above, that's probably
above my possibilities / time capatibilities. Sorry.

But of course I'm open for all kind of other ideas how and API for
the large bitmap case could be done better. Sadly the seq_file interface
doesn't seem to support reading yet...?

Helge

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [RFC] API to modify /proc/sys/net/ipv4/ip_local_reserved_ports
  2012-03-14 22:14   ` [PATCH] enhance usability of /proc/sys/net/ipv4/ip_local_reserved_ports (v3) Helge Deller
  2012-03-14 22:34     ` Eric W. Biederman
@ 2012-04-04 20:24     ` Helge Deller
  2012-04-09  8:43       ` Cong Wang
  1 sibling, 1 reply; 17+ messages in thread
From: Helge Deller @ 2012-04-04 20:24 UTC (permalink / raw)
  To: Octavian Purdila, netdev, David Miller, Cong Wang, Andrew Morton,
	Eric W. Biederman, Frank Danapfel, Laszlo Ersek, shemminger

I would like to follow up on my last patch series to be able to modify
the contents of the /proc/sys/net/ipv4/ip_local_reserved_ports port list
from userspace.

My last patch (https://lkml.org/lkml/2012/3/10/187) was based on
modifications to the proc interface, which - based on the feedback here
on the list - seemed to not be the right way to go (although I personally
still like the idea very much :-)).

Anyway, with this RFC I would like to get feedback about a new proposed
API and attached kernel patch.

The idea is to introduce a new <optname> value for get/setsockopt()
named SO_RESERVED_PORTS to get/set the ip_local_reserved_ports
bitmap via standard get/setsockopt() syscalls.
As far as I understand this seems to be similiar to how iptables works.

An untested kernel patch for review and feedback is attached below.

In userspace it then would be possible to write a new tool or to extend
for example the "ip" tool to accept commands like:
$> ip reserved_ports add 100-2000
$> ip reserved_ports remove 50-60
$> ip reserved_ports list     (to show current reserved port list)

This userspace tool could then read the port bitmap from kernel via
a) socket(PF_INET, SOCK_RAW, IPPROTO_RAW)
b) getsockopt(3, SOL_SOCKET, SO_RESERVED_PORTS, <bitmaplist>)
and write back the results after modification via
c) setsockopt(3, SOL_SOCKET, SO_RESERVED_PORTS, <bitmaplist>)

Would that be an acceptable solution?
If yes, I will of course write the kernel patch and the userspace 
implementation and
post it here for review again...

Thanks,
Helge


diff --git a/include/asm-generic/socket.h b/include/asm-generic/socket.h
--- a/include/asm-generic/socket.h
+++ b/include/asm-generic/socket.h
@@ -67,4 +67,6 @@

  #define SO_WIFI_STATUS        41
  #define SCM_WIFI_STATUS    SO_WIFI_STATUS
+
+#define SO_RESERVED_PORTS    42
  #endif /* __ASM_GENERIC_SOCKET_H */
diff --git a/include/net/ip.h b/include/net/ip.h
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -214,6 +214,7 @@ extern struct local_ports {
  } sysctl_local_ports;
  extern void inet_get_local_port_range(int *low, int *high);

+#define SYSCTL_LOCAL_RESERVED_PORTS_BYTES (65536 / 8)
  extern unsigned long *sysctl_local_reserved_ports;
  static inline int inet_is_reserved_local_port(int port)
  {
diff --git a/net/core/sock.c b/net/core/sock.c
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -793,6 +793,13 @@ set_rcvbuf:
          sock_valbool_flag(sk, SOCK_WIFI_STATUS, valbool);
          break;

+    case SO_RESERVED_PORTS:
+        ret = -EINVAL;
+        if (optlen == SYSCTL_LOCAL_RESERVED_PORTS_BYTES)
+            ret = copy_from_user(sysctl_local_reserved_ports, optval, 
optlen);
+        break;
+
+
      default:
          ret = -ENOPROTOOPT;
          break;
@@ -1018,6 +1025,13 @@ int sock_getsockopt(struct socket *sock, int 
level, int optname,
          v.val = !!sock_flag(sk, SOCK_WIFI_STATUS);
          break;

+    case SO_RESERVED_PORTS:
+        if (len != SYSCTL_LOCAL_RESERVED_PORTS_BYTES)
+            return -EINVAL;
+        if (copy_to_user(optval, sysctl_local_reserved_ports, len))
+            return -EFAULT;
+        goto lenout;
+
      default:
          return -ENOPROTOOPT;
      }
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1642,7 +1642,7 @@ static int __init inet_init(void)

      BUILD_BUG_ON(sizeof(struct inet_skb_parm) > sizeof(dummy_skb->cb));

-    sysctl_local_reserved_ports = kzalloc(65536 / 8, GFP_KERNEL);
+    sysctl_local_reserved_ports = 
kzalloc(SYSCTL_LOCAL_RESERVED_PORTS_BYTES, GFP_KERNEL);
      if (!sysctl_local_reserved_ports)
          goto out;

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC] API to modify /proc/sys/net/ipv4/ip_local_reserved_ports
  2012-04-04 20:24     ` [RFC] API to modify /proc/sys/net/ipv4/ip_local_reserved_ports Helge Deller
@ 2012-04-09  8:43       ` Cong Wang
  2012-04-10 21:04         ` Helge Deller
  0 siblings, 1 reply; 17+ messages in thread
From: Cong Wang @ 2012-04-09  8:43 UTC (permalink / raw)
  To: Helge Deller
  Cc: Octavian Purdila, netdev, David Miller, Andrew Morton,
	Eric W. Biederman, Frank Danapfel, Laszlo Ersek, shemminger

On Wed, 2012-04-04 at 22:24 +0200, Helge Deller wrote:
> I would like to follow up on my last patch series to be able to modify
> the contents of the /proc/sys/net/ipv4/ip_local_reserved_ports port list
> from userspace.
> 
> My last patch (https://lkml.org/lkml/2012/3/10/187) was based on
> modifications to the proc interface, which - based on the feedback here
> on the list - seemed to not be the right way to go (although I personally
> still like the idea very much :-)).
> 
> Anyway, with this RFC I would like to get feedback about a new proposed
> API and attached kernel patch.
> 
> The idea is to introduce a new <optname> value for get/setsockopt()
> named SO_RESERVED_PORTS to get/set the ip_local_reserved_ports
> bitmap via standard get/setsockopt() syscalls.
> As far as I understand this seems to be similiar to how iptables works.
> 
> An untested kernel patch for review and feedback is attached below.
> 
> In userspace it then would be possible to write a new tool or to extend
> for example the "ip" tool to accept commands like:
> $> ip reserved_ports add 100-2000
> $> ip reserved_ports remove 50-60
> $> ip reserved_ports list     (to show current reserved port list)
> 
> This userspace tool could then read the port bitmap from kernel via
> a) socket(PF_INET, SOCK_RAW, IPPROTO_RAW)
> b) getsockopt(3, SOL_SOCKET, SO_RESERVED_PORTS, <bitmaplist>)
> and write back the results after modification via
> c) setsockopt(3, SOL_SOCKET, SO_RESERVED_PORTS, <bitmaplist>)
> 
> Would that be an acceptable solution?

Hmm, it is indeed that bitmap fits for syscall rather than /proc file.

But it seems that using getsockopt()/setsockopt() makes it like it is a
per-socket setting, actually it is a system-wide setting. So I am
wondering if exporting a binary /proc file for this is a better
solution.

Thanks.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC] API to modify /proc/sys/net/ipv4/ip_local_reserved_ports
  2012-04-09  8:43       ` Cong Wang
@ 2012-04-10 21:04         ` Helge Deller
  2012-04-10 22:13           ` Eric W. Biederman
  0 siblings, 1 reply; 17+ messages in thread
From: Helge Deller @ 2012-04-10 21:04 UTC (permalink / raw)
  To: Cong Wang
  Cc: Octavian Purdila, netdev, David Miller, Andrew Morton,
	Eric W. Biederman, Frank Danapfel, Laszlo Ersek, shemminger

On 04/09/2012 10:43 AM, Cong Wang wrote:
> On Wed, 2012-04-04 at 22:24 +0200, Helge Deller wrote:
>> I would like to follow up on my last patch series to be able to modify
>> the contents of the /proc/sys/net/ipv4/ip_local_reserved_ports port list
>> from userspace.
>>
>> My last patch (https://lkml.org/lkml/2012/3/10/187) was based on
>> modifications to the proc interface, which - based on the feedback here
>> on the list - seemed to not be the right way to go (although I personally
>> still like the idea very much :-)).
>>
>> Anyway, with this RFC I would like to get feedback about a new proposed
>> API and attached kernel patch.
>>
>> The idea is to introduce a new<optname>  value for get/setsockopt()
>> named SO_RESERVED_PORTS to get/set the ip_local_reserved_ports
>> bitmap via standard get/setsockopt() syscalls.
>> As far as I understand this seems to be similiar to how iptables works.
>>
>> An untested kernel patch for review and feedback is attached below.
>>
>> In userspace it then would be possible to write a new tool or to extend
>> for example the "ip" tool to accept commands like:
>> $>  ip reserved_ports add 100-2000
>> $>  ip reserved_ports remove 50-60
>> $>  ip reserved_ports list     (to show current reserved port list)
>>
>> This userspace tool could then read the port bitmap from kernel via
>> a) socket(PF_INET, SOCK_RAW, IPPROTO_RAW)
>> b) getsockopt(3, SOL_SOCKET, SO_RESERVED_PORTS,<bitmaplist>)
>> and write back the results after modification via
>> c) setsockopt(3, SOL_SOCKET, SO_RESERVED_PORTS,<bitmaplist>)
>>
>> Would that be an acceptable solution?
> Hmm, it is indeed that bitmap fits for syscall rather than /proc file.
>
> But it seems that using getsockopt()/setsockopt() makes it like it is a
> per-socket setting, actually it is a system-wide setting.
Yes, that's the reason why I used SOL_SOCKET which configures at least
a few system-wide settings too.

> So I am
> wondering if exporting a binary /proc file for this is a better
> solution.
Yeah - that's another solution, but (65536 ports)/(8 bits per byte) = 8 
KByte, so we
may again hit the 4k limit of /proc (unless you do binary reads which should
be done with a binary /proc-entry anyway).

Again, I'm open to develop any kind of solution which would get an OK here.

Helge

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC] API to modify /proc/sys/net/ipv4/ip_local_reserved_ports
  2012-04-10 21:04         ` Helge Deller
@ 2012-04-10 22:13           ` Eric W. Biederman
  2012-05-17 21:18             ` Helge Deller
  0 siblings, 1 reply; 17+ messages in thread
From: Eric W. Biederman @ 2012-04-10 22:13 UTC (permalink / raw)
  To: Helge Deller
  Cc: Cong Wang, Octavian Purdila, netdev, David Miller, Andrew Morton,
	Frank Danapfel, Laszlo Ersek, shemminger

Helge Deller <deller@gmx.de> writes:

> On 04/09/2012 10:43 AM, Cong Wang wrote:
>> On Wed, 2012-04-04 at 22:24 +0200, Helge Deller wrote:
>>> I would like to follow up on my last patch series to be able to modify
>>> the contents of the /proc/sys/net/ipv4/ip_local_reserved_ports port list
>>> from userspace.
>>>
>>> My last patch (https://lkml.org/lkml/2012/3/10/187) was based on
>>> modifications to the proc interface, which - based on the feedback here
>>> on the list - seemed to not be the right way to go (although I personally
>>> still like the idea very much :-)).
>>>
>>> Anyway, with this RFC I would like to get feedback about a new proposed
>>> API and attached kernel patch.
>>>
>>> The idea is to introduce a new<optname>  value for get/setsockopt()
>>> named SO_RESERVED_PORTS to get/set the ip_local_reserved_ports
>>> bitmap via standard get/setsockopt() syscalls.
>>> As far as I understand this seems to be similiar to how iptables works.
>>>
>>> An untested kernel patch for review and feedback is attached below.
>>>
>>> In userspace it then would be possible to write a new tool or to extend
>>> for example the "ip" tool to accept commands like:
>>> $>  ip reserved_ports add 100-2000
>>> $>  ip reserved_ports remove 50-60
>>> $>  ip reserved_ports list     (to show current reserved port list)
>>>
>>> This userspace tool could then read the port bitmap from kernel via
>>> a) socket(PF_INET, SOCK_RAW, IPPROTO_RAW)
>>> b) getsockopt(3, SOL_SOCKET, SO_RESERVED_PORTS,<bitmaplist>)
>>> and write back the results after modification via
>>> c) setsockopt(3, SOL_SOCKET, SO_RESERVED_PORTS,<bitmaplist>)
>>>
>>> Would that be an acceptable solution?
>> Hmm, it is indeed that bitmap fits for syscall rather than /proc file.
>>
>> But it seems that using getsockopt()/setsockopt() makes it like it is a
>> per-socket setting, actually it is a system-wide setting.
> Yes, that's the reason why I used SOL_SOCKET which configures at least
> a few system-wide settings too.
>
>> So I am
>> wondering if exporting a binary /proc file for this is a better
>> solution.
> Yeah - that's another solution, but (65536 ports)/(8 bits per byte) = 8 KByte,
> so we
> may again hit the 4k limit of /proc (unless you do binary reads which should
> be done with a binary /proc-entry anyway).
>
> Again, I'm open to develop any kind of solution which would get an OK
> here.

Just looking at proc_do_large_bitmap, it does appear that there is a
very local 4k limit on writes.

Can you please just modify proc_do_large_bitmap so that there is not a
4k limit on writes.  Ideally the code would just read another 4k from
userspace when it is getting close to the end of it's 4k buffer, or
perhaps we just read everything directly from userspace and run slowly.

The bitmap is installed atomically at the end so any weird partial
states should not be a problem..

Eric

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC] API to modify /proc/sys/net/ipv4/ip_local_reserved_ports
  2012-04-10 22:13           ` Eric W. Biederman
@ 2012-05-17 21:18             ` Helge Deller
  2012-05-17 21:22               ` Stephen Hemminger
  0 siblings, 1 reply; 17+ messages in thread
From: Helge Deller @ 2012-05-17 21:18 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Cong Wang, Octavian Purdila, netdev, David Miller, Andrew Morton,
	Frank Danapfel, Laszlo Ersek, shemminger

On 04/11/2012 12:13 AM, Eric W. Biederman wrote:
> Helge Deller <deller@gmx.de> writes:
> 
>> On 04/09/2012 10:43 AM, Cong Wang wrote:
>>> On Wed, 2012-04-04 at 22:24 +0200, Helge Deller wrote:
>>>> I would like to follow up on my last patch series to be able to modify
>>>> the contents of the /proc/sys/net/ipv4/ip_local_reserved_ports port list
>>>> from userspace.
>>>>
>>>> My last patch (https://lkml.org/lkml/2012/3/10/187) was based on
>>>> modifications to the proc interface, which - based on the feedback here
>>>> on the list - seemed to not be the right way to go (although I personally
>>>> still like the idea very much :-)).
>>>>
>>>> Anyway, with this RFC I would like to get feedback about a new proposed
>>>> API and attached kernel patch.
>>>>
>>>> The idea is to introduce a new<optname>  value for get/setsockopt()
>>>> named SO_RESERVED_PORTS to get/set the ip_local_reserved_ports
>>>> bitmap via standard get/setsockopt() syscalls.
>>>> As far as I understand this seems to be similiar to how iptables works.
>>>>
>>>> An untested kernel patch for review and feedback is attached below.
>>>>
>>>> In userspace it then would be possible to write a new tool or to extend
>>>> for example the "ip" tool to accept commands like:
>>>> $>  ip reserved_ports add 100-2000
>>>> $>  ip reserved_ports remove 50-60
>>>> $>  ip reserved_ports list     (to show current reserved port list)
>>>>
>>>> This userspace tool could then read the port bitmap from kernel via
>>>> a) socket(PF_INET, SOCK_RAW, IPPROTO_RAW)
>>>> b) getsockopt(3, SOL_SOCKET, SO_RESERVED_PORTS,<bitmaplist>)
>>>> and write back the results after modification via
>>>> c) setsockopt(3, SOL_SOCKET, SO_RESERVED_PORTS,<bitmaplist>)
>>>>
>>>> Would that be an acceptable solution?
>>> Hmm, it is indeed that bitmap fits for syscall rather than /proc file.
>>>
>>> But it seems that using getsockopt()/setsockopt() makes it like it is a
>>> per-socket setting, actually it is a system-wide setting.
>> Yes, that's the reason why I used SOL_SOCKET which configures at least
>> a few system-wide settings too.
>>
>>> So I am
>>> wondering if exporting a binary /proc file for this is a better
>>> solution.
>> Yeah - that's another solution, but (65536 ports)/(8 bits per byte) = 8 KByte,
>> so we
>> may again hit the 4k limit of /proc (unless you do binary reads which should
>> be done with a binary /proc-entry anyway).
>>
>> Again, I'm open to develop any kind of solution which would get an OK
>> here.
> 
> Just looking at proc_do_large_bitmap, it does appear that there is a
> very local 4k limit on writes.
> 
> Can you please just modify proc_do_large_bitmap so that there is not a
> 4k limit on writes.  Ideally the code would just read another 4k from
> userspace when it is getting close to the end of it's 4k buffer, or
> perhaps we just read everything directly from userspace and run slowly.

Hi Eric,

sorry for the very late reply.
Yes, you are right- this is only a local 4K limit. Increasing it allowed me 
to write more ports at once.

With your tips I was now able to build a simple solution which fits my needs.
Based on standard tools like echo and dd (with the seek option) I can
block all ports which I need.

Nevertheless, the current kernel interface is not very flexible.
So, my proposal for a new interface (with tools) still stands. I just need
and advise what would be acceptable. Without any advise I will just leave
everything as is (since I'm now fine with it).

Helge

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC] API to modify /proc/sys/net/ipv4/ip_local_reserved_ports
  2012-05-17 21:18             ` Helge Deller
@ 2012-05-17 21:22               ` Stephen Hemminger
  0 siblings, 0 replies; 17+ messages in thread
From: Stephen Hemminger @ 2012-05-17 21:22 UTC (permalink / raw)
  To: Helge Deller
  Cc: Eric W. Biederman, Cong Wang, Octavian Purdila, netdev,
	David Miller, Andrew Morton, Frank Danapfel, Laszlo Ersek

On Thu, 17 May 2012 23:18:18 +0200
Helge Deller <deller@gmx.de> wrote:

> On 04/11/2012 12:13 AM, Eric W. Biederman wrote:
> > Helge Deller <deller@gmx.de> writes:
> > 
> >> On 04/09/2012 10:43 AM, Cong Wang wrote:
> >>> On Wed, 2012-04-04 at 22:24 +0200, Helge Deller wrote:
> >>>> I would like to follow up on my last patch series to be able to modify
> >>>> the contents of the /proc/sys/net/ipv4/ip_local_reserved_ports port list
> >>>> from userspace.
> >>>>
> >>>> My last patch (https://lkml.org/lkml/2012/3/10/187) was based on
> >>>> modifications to the proc interface, which - based on the feedback here
> >>>> on the list - seemed to not be the right way to go (although I personally
> >>>> still like the idea very much :-)).
> >>>>
> >>>> Anyway, with this RFC I would like to get feedback about a new proposed
> >>>> API and attached kernel patch.
> >>>>
> >>>> The idea is to introduce a new<optname>  value for get/setsockopt()
> >>>> named SO_RESERVED_PORTS to get/set the ip_local_reserved_ports
> >>>> bitmap via standard get/setsockopt() syscalls.
> >>>> As far as I understand this seems to be similiar to how iptables works.
> >>>>
> >>>> An untested kernel patch for review and feedback is attached below.
> >>>>
> >>>> In userspace it then would be possible to write a new tool or to extend
> >>>> for example the "ip" tool to accept commands like:
> >>>> $>  ip reserved_ports add 100-2000
> >>>> $>  ip reserved_ports remove 50-60
> >>>> $>  ip reserved_ports list     (to show current reserved port list)
> >>>>
> >>>> This userspace tool could then read the port bitmap from kernel via
> >>>> a) socket(PF_INET, SOCK_RAW, IPPROTO_RAW)
> >>>> b) getsockopt(3, SOL_SOCKET, SO_RESERVED_PORTS,<bitmaplist>)
> >>>> and write back the results after modification via
> >>>> c) setsockopt(3, SOL_SOCKET, SO_RESERVED_PORTS,<bitmaplist>)
> >>>>
> >>>> Would that be an acceptable solution?
> >>> Hmm, it is indeed that bitmap fits for syscall rather than /proc file.
> >>>
> >>> But it seems that using getsockopt()/setsockopt() makes it like it is a
> >>> per-socket setting, actually it is a system-wide setting.
> >> Yes, that's the reason why I used SOL_SOCKET which configures at least
> >> a few system-wide settings too.
> >>
> >>> So I am
> >>> wondering if exporting a binary /proc file for this is a better
> >>> solution.
> >> Yeah - that's another solution, but (65536 ports)/(8 bits per byte) = 8 KByte,
> >> so we
> >> may again hit the 4k limit of /proc (unless you do binary reads which should
> >> be done with a binary /proc-entry anyway).
> >>
> >> Again, I'm open to develop any kind of solution which would get an OK
> >> here.
> > 
> > Just looking at proc_do_large_bitmap, it does appear that there is a
> > very local 4k limit on writes.
> > 
> > Can you please just modify proc_do_large_bitmap so that there is not a
> > 4k limit on writes.  Ideally the code would just read another 4k from
> > userspace when it is getting close to the end of it's 4k buffer, or
> > perhaps we just read everything directly from userspace and run slowly.
> 
> Hi Eric,
> 
> sorry for the very late reply.
> Yes, you are right- this is only a local 4K limit. Increasing it allowed me 
> to write more ports at once.
> 
> With your tips I was now able to build a simple solution which fits my needs.
> Based on standard tools like echo and dd (with the seek option) I can
> block all ports which I need.
> 
> Nevertheless, the current kernel interface is not very flexible.
> So, my proposal for a new interface (with tools) still stands. I just need
> and advise what would be acceptable. Without any advise I will just leave
> everything as is (since I'm now fine with it).
> 
> Helge

Sounds like an ideal case for providing an mmap interface to the
bitmap?

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2012-05-17 21:22 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-10 23:36 [PATCH] enhance usability of /proc/sys/net/ipv4/ip_local_reserved_ports Helge Deller
2012-03-11 22:55 ` David Miller
2012-03-12  3:42 ` Cong Wang
2012-03-12 21:09   ` Helge Deller
2012-03-13 20:33 ` [PATCH] enhance usability of /proc/sys/net/ipv4/ip_local_reserved_ports (v2) Helge Deller
2012-03-14  7:43   ` Cong Wang
2012-03-14 22:06     ` Helge Deller
2012-03-14 22:20       ` Stephen Hemminger
2012-03-14 22:14   ` [PATCH] enhance usability of /proc/sys/net/ipv4/ip_local_reserved_ports (v3) Helge Deller
2012-03-14 22:34     ` Eric W. Biederman
2012-03-15 23:35       ` Helge Deller
2012-04-04 20:24     ` [RFC] API to modify /proc/sys/net/ipv4/ip_local_reserved_ports Helge Deller
2012-04-09  8:43       ` Cong Wang
2012-04-10 21:04         ` Helge Deller
2012-04-10 22:13           ` Eric W. Biederman
2012-05-17 21:18             ` Helge Deller
2012-05-17 21:22               ` Stephen Hemminger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.