linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] kernel: automatically split user namespace extent
@ 2020-11-26 10:08 Giuseppe Scrivano
  2020-12-01 17:53 ` Eric W. Biederman
  0 siblings, 1 reply; 7+ messages in thread
From: Giuseppe Scrivano @ 2020-11-26 10:08 UTC (permalink / raw)
  To: linux-kernel; +Cc: christian.brauner, serge, ebiederm

writing to the id map fails when an extent overlaps multiple mappings
in the parent user namespace, e.g.:

$ cat /proc/self/uid_map
         0       1000          1
         1     100000      65536
$ unshare -U sleep 100 &
[1] 1029703
$ printf "0 0 100\n" | tee /proc/$!/uid_map
0 0 100
tee: /proc/1029703/uid_map: Operation not permitted

To prevent it from happening, automatically split an extent so that
each portion fits in one extent in the parent user namespace.

$ cat /proc/self/uid_map
         0       1000          1
         1     110000      65536
$ unshare -U sleep 100 &
[1] 1552
$ printf "0 0 100\n" | tee /proc/$!/uid_map
0 0 100
$ cat /proc/$!/uid_map
         0          0          1
         1          1         99

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
---
 kernel/user_namespace.c | 62 ++++++++++++++++++++++++++++++++++-------
 1 file changed, 52 insertions(+), 10 deletions(-)

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index 87804e0371fe..b5542be2bd0a 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -706,6 +706,41 @@ const struct seq_operations proc_projid_seq_operations = {
 	.show = projid_m_show,
 };
 
+static void split_overlapping_mappings(struct uid_gid_map *parent_map,
+				       struct uid_gid_extent *extent,
+				       struct uid_gid_extent *overflow_extent)
+{
+	unsigned int idx;
+
+	overflow_extent->first = (u32) -1;
+
+	/* Split extent if it not fully contained in an extent from parent_map.  */
+	for (idx = 0; idx < parent_map->nr_extents; idx++) {
+		struct uid_gid_extent *prev;
+		u32 first, last, prev_last, size;
+
+		if (parent_map->nr_extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
+			prev = &parent_map->extent[idx];
+		else
+			prev = &parent_map->forward[idx];
+
+		first = extent->lower_first;
+		last = extent->lower_first + extent->count - 1;
+		prev_last = prev->first + prev->count - 1;
+
+		if ((first <= prev_last) && (last > prev_last)) {
+			size = prev_last - first + 1;
+
+			overflow_extent->first = extent->first + size;
+			overflow_extent->lower_first = extent->lower_first + size;
+			overflow_extent->count = extent->count - size;
+
+			extent->count = size;
+			return;
+		}
+	}
+}
+
 static bool mappings_overlap(struct uid_gid_map *new_map,
 			     struct uid_gid_extent *extent)
 {
@@ -852,6 +887,7 @@ static ssize_t map_write(struct file *file, const char __user *buf,
 	struct uid_gid_map new_map;
 	unsigned idx;
 	struct uid_gid_extent extent;
+	struct uid_gid_extent overflow_extent;
 	char *kbuf = NULL, *pos, *next_line;
 	ssize_t ret;
 
@@ -946,18 +982,24 @@ static ssize_t map_write(struct file *file, const char __user *buf,
 		     extent.lower_first)
 			goto out;
 
-		/* Do the ranges in extent overlap any previous extents? */
-		if (mappings_overlap(&new_map, &extent))
-			goto out;
+		do {
+			/* Do the ranges in extent overlap any previous extents? */
+			if (mappings_overlap(&new_map, &extent))
+				goto out;
 
-		if ((new_map.nr_extents + 1) == UID_GID_MAP_MAX_EXTENTS &&
-		    (next_line != NULL))
-			goto out;
+			if ((new_map.nr_extents + 1) == UID_GID_MAP_MAX_EXTENTS &&
+			    (next_line != NULL))
+				goto out;
 
-		ret = insert_extent(&new_map, &extent);
-		if (ret < 0)
-			goto out;
-		ret = -EINVAL;
+			split_overlapping_mappings(parent_map, &extent, &overflow_extent);
+
+			ret = insert_extent(&new_map, &extent);
+			if (ret < 0)
+				goto out;
+			ret = -EINVAL;
+
+			extent = overflow_extent;
+		} while (overflow_extent.first != (u32) -1);
 	}
 	/* Be very certaint the new map actually exists */
 	if (new_map.nr_extents == 0)
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] kernel: automatically split user namespace extent
  2020-11-26 10:08 [PATCH] kernel: automatically split user namespace extent Giuseppe Scrivano
@ 2020-12-01 17:53 ` Eric W. Biederman
  2020-12-02 16:12   ` Giuseppe Scrivano
  0 siblings, 1 reply; 7+ messages in thread
From: Eric W. Biederman @ 2020-12-01 17:53 UTC (permalink / raw)
  To: Giuseppe Scrivano
  Cc: linux-kernel, christian.brauner, serge, Linux Containers


Nit: The tag should have been "userns:" rather than kernel.

Giuseppe Scrivano <gscrivan@redhat.com> writes:

> writing to the id map fails when an extent overlaps multiple mappings
> in the parent user namespace, e.g.:
>
> $ cat /proc/self/uid_map
>          0       1000          1
>          1     100000      65536
> $ unshare -U sleep 100 &
> [1] 1029703
> $ printf "0 0 100\n" | tee /proc/$!/uid_map
> 0 0 100
> tee: /proc/1029703/uid_map: Operation not permitted
>
> To prevent it from happening, automatically split an extent so that
> each portion fits in one extent in the parent user namespace.

I don't see anything fundamentally wrong with relaxing this
restriction, but more code does have more room for bugs to hide.

What is the advantage of relaxing this restriction?

> $ cat /proc/self/uid_map
>          0       1000          1
>          1     110000      65536
> $ unshare -U sleep 100 &
> [1] 1552
> $ printf "0 0 100\n" | tee /proc/$!/uid_map
> 0 0 100
> $ cat /proc/$!/uid_map
>          0          0          1
>          1          1         99
>
> Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
> ---
>  kernel/user_namespace.c | 62 ++++++++++++++++++++++++++++++++++-------
>  1 file changed, 52 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> index 87804e0371fe..b5542be2bd0a 100644
> --- a/kernel/user_namespace.c
> +++ b/kernel/user_namespace.c
> @@ -706,6 +706,41 @@ const struct seq_operations proc_projid_seq_operations = {
>  	.show = projid_m_show,
>  };
>  
> +static void split_overlapping_mappings(struct uid_gid_map *parent_map,
> +				       struct uid_gid_extent *extent,
> +				       struct uid_gid_extent *overflow_extent)
> +{
> +	unsigned int idx;
> +
> +	overflow_extent->first = (u32) -1;
> +
> +	/* Split extent if it not fully contained in an extent from parent_map.  */
> +	for (idx = 0; idx < parent_map->nr_extents; idx++) {

Ouch!

For the larger tree we perform binary searches typically and
here you are walking every entry unconditionally.

It looks like this makes the write O(N^2) from O(NlogN)
which for a user facing function is not desirable.

I think something like insert_and_split_extent may be ok.
Incorporating your loop and the part that inserts an element.

As written this almost doubles the complexity of the code,
as well as making it perform much worse.  Which is a problem.


> +		struct uid_gid_extent *prev;
> +		u32 first, last, prev_last, size;
> +
> +		if (parent_map->nr_extents <= UID_GID_MAP_MAX_BASE_EXTENTS)
> +			prev = &parent_map->extent[idx];
> +		else
> +			prev = &parent_map->forward[idx];
> +
> +		first = extent->lower_first;
> +		last = extent->lower_first + extent->count - 1;
> +		prev_last = prev->first + prev->count - 1;
> +
> +		if ((first <= prev_last) && (last > prev_last)) {
> +			size = prev_last - first + 1;
> +
> +			overflow_extent->first = extent->first + size;
> +			overflow_extent->lower_first = extent->lower_first + size;
> +			overflow_extent->count = extent->count - size;
> +
> +			extent->count = size;
> +			return;
> +		}
> +	}
> +}
> +
>  static bool mappings_overlap(struct uid_gid_map *new_map,
>  			     struct uid_gid_extent *extent)
>  {
> @@ -852,6 +887,7 @@ static ssize_t map_write(struct file *file, const char __user *buf,
>  	struct uid_gid_map new_map;
>  	unsigned idx;
>  	struct uid_gid_extent extent;
> +	struct uid_gid_extent overflow_extent;
>  	char *kbuf = NULL, *pos, *next_line;
>  	ssize_t ret;
>  
> @@ -946,18 +982,24 @@ static ssize_t map_write(struct file *file, const char __user *buf,
>  		     extent.lower_first)
>  			goto out;
>  
> -		/* Do the ranges in extent overlap any previous extents? */
> -		if (mappings_overlap(&new_map, &extent))
> -			goto out;
> +		do {
> +			/* Do the ranges in extent overlap any previous extents? */
> +			if (mappings_overlap(&new_map, &extent))
> +				goto out;

Why should mappings_overlap be called in the loop?   Will splitting an
extent create the possibility for creating overlapping mappings?

> -		if ((new_map.nr_extents + 1) == UID_GID_MAP_MAX_EXTENTS &&
> -		    (next_line != NULL))
> -			goto out;
> +			if ((new_map.nr_extents + 1) == UID_GID_MAP_MAX_EXTENTS &&
> +			    (next_line != NULL))
> +				goto out;
>  
> -		ret = insert_extent(&new_map, &extent);
> -		if (ret < 0)
> -			goto out;
> -		ret = -EINVAL;
> +			split_overlapping_mappings(parent_map, &extent, &overflow_extent);
> +
> +			ret = insert_extent(&new_map, &extent);
> +			if (ret < 0)
> +				goto out;
> +			ret = -EINVAL;
> +
> +			extent = overflow_extent;
> +		} while (overflow_extent.first != (u32) -1);
>  	}
>  	/* Be very certaint the new map actually exists */
>  	if (new_map.nr_extents == 0)

Eric

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] kernel: automatically split user namespace extent
  2020-12-01 17:53 ` Eric W. Biederman
@ 2020-12-02 16:12   ` Giuseppe Scrivano
  2021-04-02 14:32     ` Serge E. Hallyn
  0 siblings, 1 reply; 7+ messages in thread
From: Giuseppe Scrivano @ 2020-12-02 16:12 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: linux-kernel, christian.brauner, serge, Linux Containers

Hi Eric,

ebiederm@xmission.com (Eric W. Biederman) writes:

> Nit: The tag should have been "userns:" rather than kernel.
>
> Giuseppe Scrivano <gscrivan@redhat.com> writes:
>
>> writing to the id map fails when an extent overlaps multiple mappings
>> in the parent user namespace, e.g.:
>>
>> $ cat /proc/self/uid_map
>>          0       1000          1
>>          1     100000      65536
>> $ unshare -U sleep 100 &
>> [1] 1029703
>> $ printf "0 0 100\n" | tee /proc/$!/uid_map
>> 0 0 100
>> tee: /proc/1029703/uid_map: Operation not permitted
>>
>> To prevent it from happening, automatically split an extent so that
>> each portion fits in one extent in the parent user namespace.
>
> I don't see anything fundamentally wrong with relaxing this
> restriction, but more code does have more room for bugs to hide.
>
> What is the advantage of relaxing this restriction?

we are running rootless containers in a namespace created with
newuidmap/newgidmap where the mappings look like:

$ cat /proc/self/uid_map
0       1000          1
1     110000      65536

users are allowed to create child user namespaces and specify the
mappings to use.  Doing so, they often hit the issue that the mappings
cannot overlap multiple extents in the parent user namespace.

The issue could be completely addressed in user space, but to me it
looks like an implementation detail that user space should not know
about.
In addition, it would also be slower (additional read of the current
uid_map and gid_map files) and must be implemented separately in each
container runtime.

>> $ cat /proc/self/uid_map
>>          0       1000          1
>>          1     110000      65536
>> $ unshare -U sleep 100 &
>> [1] 1552
>> $ printf "0 0 100\n" | tee /proc/$!/uid_map
>> 0 0 100
>> $ cat /proc/$!/uid_map
>>          0          0          1
>>          1          1         99
>>
>> Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
>> ---
>>  kernel/user_namespace.c | 62 ++++++++++++++++++++++++++++++++++-------
>>  1 file changed, 52 insertions(+), 10 deletions(-)
>>
>> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
>> index 87804e0371fe..b5542be2bd0a 100644
>> --- a/kernel/user_namespace.c
>> +++ b/kernel/user_namespace.c
>> @@ -706,6 +706,41 @@ const struct seq_operations proc_projid_seq_operations = {
>>  	.show = projid_m_show,
>>  };
>>  
>> +static void split_overlapping_mappings(struct uid_gid_map *parent_map,
>> +				       struct uid_gid_extent *extent,
>> +				       struct uid_gid_extent *overflow_extent)
>> +{
>> +	unsigned int idx;
>> +
>> +	overflow_extent->first = (u32) -1;
>> +
>> +	/* Split extent if it not fully contained in an extent from parent_map.  */
>> +	for (idx = 0; idx < parent_map->nr_extents; idx++) {
>
> Ouch!
>
> For the larger tree we perform binary searches typically and
> here you are walking every entry unconditionally.
>
> It looks like this makes the write O(N^2) from O(NlogN)
> which for a user facing function is not desirable.
>
> I think something like insert_and_split_extent may be ok.
> Incorporating your loop and the part that inserts an element.
>
> As written this almost doubles the complexity of the code,
> as well as making it perform much worse.  Which is a problem.

I've attempted to implement the new functionality at input validation
time to not touch the existing security checks.

I've thought the pattern for iterating the extents was fine as I've
taken it from mappings_overlap (even if it is used differently on an
unsorted array).

Thanks for the hint, I'll move the new logic when map_id_range_down() is
used and I'll send a v2.

Thanks,
Giuseppe


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] kernel: automatically split user namespace extent
  2020-12-02 16:12   ` Giuseppe Scrivano
@ 2021-04-02 14:32     ` Serge E. Hallyn
  2021-04-02 14:46       ` Giuseppe Scrivano
  0 siblings, 1 reply; 7+ messages in thread
From: Serge E. Hallyn @ 2021-04-02 14:32 UTC (permalink / raw)
  To: Giuseppe Scrivano
  Cc: Eric W. Biederman, linux-kernel, christian.brauner, serge,
	Linux Containers

On Wed, Dec 02, 2020 at 05:12:27PM +0100, Giuseppe Scrivano wrote:
> Hi Eric,
> 
> ebiederm@xmission.com (Eric W. Biederman) writes:
> 
> > Nit: The tag should have been "userns:" rather than kernel.
> >
> > Giuseppe Scrivano <gscrivan@redhat.com> writes:
> >
> >> writing to the id map fails when an extent overlaps multiple mappings
> >> in the parent user namespace, e.g.:
> >>
> >> $ cat /proc/self/uid_map
> >>          0       1000          1
> >>          1     100000      65536
> >> $ unshare -U sleep 100 &
> >> [1] 1029703
> >> $ printf "0 0 100\n" | tee /proc/$!/uid_map
> >> 0 0 100
> >> tee: /proc/1029703/uid_map: Operation not permitted
> >>
> >> To prevent it from happening, automatically split an extent so that
> >> each portion fits in one extent in the parent user namespace.
> >
> > I don't see anything fundamentally wrong with relaxing this
> > restriction, but more code does have more room for bugs to hide.
> >
> > What is the advantage of relaxing this restriction?
> 
> we are running rootless containers in a namespace created with
> newuidmap/newgidmap where the mappings look like:
> 
> $ cat /proc/self/uid_map
> 0       1000          1
> 1     110000      65536
> 
> users are allowed to create child user namespaces and specify the
> mappings to use.  Doing so, they often hit the issue that the mappings
> cannot overlap multiple extents in the parent user namespace.
> 
> The issue could be completely addressed in user space, but to me it
> looks like an implementation detail that user space should not know
> about.
> In addition, it would also be slower (additional read of the current
> uid_map and gid_map files) and must be implemented separately in each
> container runtime.
> 
> >> $ cat /proc/self/uid_map
> >>          0       1000          1
> >>          1     110000      65536
> >> $ unshare -U sleep 100 &
> >> [1] 1552
> >> $ printf "0 0 100\n" | tee /proc/$!/uid_map
> >> 0 0 100
> >> $ cat /proc/$!/uid_map
> >>          0          0          1
> >>          1          1         99
> >>
> >> Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
> >> ---
> >>  kernel/user_namespace.c | 62 ++++++++++++++++++++++++++++++++++-------
> >>  1 file changed, 52 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> >> index 87804e0371fe..b5542be2bd0a 100644
> >> --- a/kernel/user_namespace.c
> >> +++ b/kernel/user_namespace.c
> >> @@ -706,6 +706,41 @@ const struct seq_operations proc_projid_seq_operations = {
> >>  	.show = projid_m_show,
> >>  };
> >>  
> >> +static void split_overlapping_mappings(struct uid_gid_map *parent_map,
> >> +				       struct uid_gid_extent *extent,
> >> +				       struct uid_gid_extent *overflow_extent)
> >> +{
> >> +	unsigned int idx;
> >> +
> >> +	overflow_extent->first = (u32) -1;
> >> +
> >> +	/* Split extent if it not fully contained in an extent from parent_map.  */
> >> +	for (idx = 0; idx < parent_map->nr_extents; idx++) {
> >
> > Ouch!
> >
> > For the larger tree we perform binary searches typically and
> > here you are walking every entry unconditionally.
> >
> > It looks like this makes the write O(N^2) from O(NlogN)
> > which for a user facing function is not desirable.
> >
> > I think something like insert_and_split_extent may be ok.
> > Incorporating your loop and the part that inserts an element.
> >
> > As written this almost doubles the complexity of the code,
> > as well as making it perform much worse.  Which is a problem.
> 
> I've attempted to implement the new functionality at input validation
> time to not touch the existing security checks.
> 
> I've thought the pattern for iterating the extents was fine as I've
> taken it from mappings_overlap (even if it is used differently on an
> unsorted array).
> 
> Thanks for the hint, I'll move the new logic when map_id_range_down() is
> used and I'll send a v2.

Hi,

sorry if I miseed it.  Did you ever send a v2?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] kernel: automatically split user namespace extent
  2021-04-02 14:32     ` Serge E. Hallyn
@ 2021-04-02 14:46       ` Giuseppe Scrivano
  2021-05-05 15:09         ` Giuseppe Scrivano
  0 siblings, 1 reply; 7+ messages in thread
From: Giuseppe Scrivano @ 2021-04-02 14:46 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Eric W. Biederman, linux-kernel, christian.brauner, Linux Containers

Hi Serge,

"Serge E. Hallyn" <serge@hallyn.com> writes:

> On Wed, Dec 02, 2020 at 05:12:27PM +0100, Giuseppe Scrivano wrote:
>> Hi Eric,
>> 
>> ebiederm@xmission.com (Eric W. Biederman) writes:
>> 
>> > Nit: The tag should have been "userns:" rather than kernel.
>> >
>> > Giuseppe Scrivano <gscrivan@redhat.com> writes:
>> >
>> >> writing to the id map fails when an extent overlaps multiple mappings
>> >> in the parent user namespace, e.g.:
>> >>
>> >> $ cat /proc/self/uid_map
>> >>          0       1000          1
>> >>          1     100000      65536
>> >> $ unshare -U sleep 100 &
>> >> [1] 1029703
>> >> $ printf "0 0 100\n" | tee /proc/$!/uid_map
>> >> 0 0 100
>> >> tee: /proc/1029703/uid_map: Operation not permitted
>> >>
>> >> To prevent it from happening, automatically split an extent so that
>> >> each portion fits in one extent in the parent user namespace.
>> >
>> > I don't see anything fundamentally wrong with relaxing this
>> > restriction, but more code does have more room for bugs to hide.
>> >
>> > What is the advantage of relaxing this restriction?
>> 
>> we are running rootless containers in a namespace created with
>> newuidmap/newgidmap where the mappings look like:
>> 
>> $ cat /proc/self/uid_map
>> 0       1000          1
>> 1     110000      65536
>> 
>> users are allowed to create child user namespaces and specify the
>> mappings to use.  Doing so, they often hit the issue that the mappings
>> cannot overlap multiple extents in the parent user namespace.
>> 
>> The issue could be completely addressed in user space, but to me it
>> looks like an implementation detail that user space should not know
>> about.
>> In addition, it would also be slower (additional read of the current
>> uid_map and gid_map files) and must be implemented separately in each
>> container runtime.
>> 
>> >> $ cat /proc/self/uid_map
>> >>          0       1000          1
>> >>          1     110000      65536
>> >> $ unshare -U sleep 100 &
>> >> [1] 1552
>> >> $ printf "0 0 100\n" | tee /proc/$!/uid_map
>> >> 0 0 100
>> >> $ cat /proc/$!/uid_map
>> >>          0          0          1
>> >>          1          1         99
>> >>
>> >> Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
>> >> ---
>> >>  kernel/user_namespace.c | 62 ++++++++++++++++++++++++++++++++++-------
>> >>  1 file changed, 52 insertions(+), 10 deletions(-)
>> >>
>> >> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
>> >> index 87804e0371fe..b5542be2bd0a 100644
>> >> --- a/kernel/user_namespace.c
>> >> +++ b/kernel/user_namespace.c
>> >> @@ -706,6 +706,41 @@ const struct seq_operations proc_projid_seq_operations = {
>> >>  	.show = projid_m_show,
>> >>  };
>> >>  
>> >> +static void split_overlapping_mappings(struct uid_gid_map *parent_map,
>> >> +				       struct uid_gid_extent *extent,
>> >> +				       struct uid_gid_extent *overflow_extent)
>> >> +{
>> >> +	unsigned int idx;
>> >> +
>> >> +	overflow_extent->first = (u32) -1;
>> >> +
>> >> +	/* Split extent if it not fully contained in an extent from parent_map.  */
>> >> +	for (idx = 0; idx < parent_map->nr_extents; idx++) {
>> >
>> > Ouch!
>> >
>> > For the larger tree we perform binary searches typically and
>> > here you are walking every entry unconditionally.
>> >
>> > It looks like this makes the write O(N^2) from O(NlogN)
>> > which for a user facing function is not desirable.
>> >
>> > I think something like insert_and_split_extent may be ok.
>> > Incorporating your loop and the part that inserts an element.
>> >
>> > As written this almost doubles the complexity of the code,
>> > as well as making it perform much worse.  Which is a problem.
>> 
>> I've attempted to implement the new functionality at input validation
>> time to not touch the existing security checks.
>> 
>> I've thought the pattern for iterating the extents was fine as I've
>> taken it from mappings_overlap (even if it is used differently on an
>> unsorted array).
>> 
>> Thanks for the hint, I'll move the new logic when map_id_range_down() is
>> used and I'll send a v2.
>
> Hi,
>
> sorry if I miseed it.  Did you ever send a v2?

no worries, the v2 is here:

https://lkml.kernel.org/lkml/20201203150252.1229077-1-gscrivan@redhat.com/

Regards,
Giuseppe


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] kernel: automatically split user namespace extent
  2021-04-02 14:46       ` Giuseppe Scrivano
@ 2021-05-05 15:09         ` Giuseppe Scrivano
  2021-05-05 16:06           ` Serge E. Hallyn
  0 siblings, 1 reply; 7+ messages in thread
From: Giuseppe Scrivano @ 2021-05-05 15:09 UTC (permalink / raw)
  To: Serge E. Hallyn
  Cc: Eric W. Biederman, linux-kernel, christian.brauner, Linux Containers

Hi Serge,

Giuseppe Scrivano <gscrivan@redhat.com> writes:

> Hi Serge,
>
> "Serge E. Hallyn" <serge@hallyn.com> writes:
>
>> On Wed, Dec 02, 2020 at 05:12:27PM +0100, Giuseppe Scrivano wrote:
>>> Hi Eric,
>>> 
>>> ebiederm@xmission.com (Eric W. Biederman) writes:
>>> 
>>> > Nit: The tag should have been "userns:" rather than kernel.
>>> >
>>> > Giuseppe Scrivano <gscrivan@redhat.com> writes:
>>> >
>>> >> writing to the id map fails when an extent overlaps multiple mappings
>>> >> in the parent user namespace, e.g.:
>>> >>
>>> >> $ cat /proc/self/uid_map
>>> >>          0       1000          1
>>> >>          1     100000      65536
>>> >> $ unshare -U sleep 100 &
>>> >> [1] 1029703
>>> >> $ printf "0 0 100\n" | tee /proc/$!/uid_map
>>> >> 0 0 100
>>> >> tee: /proc/1029703/uid_map: Operation not permitted
>>> >>
>>> >> To prevent it from happening, automatically split an extent so that
>>> >> each portion fits in one extent in the parent user namespace.
>>> >
>>> > I don't see anything fundamentally wrong with relaxing this
>>> > restriction, but more code does have more room for bugs to hide.
>>> >
>>> > What is the advantage of relaxing this restriction?
>>> 
>>> we are running rootless containers in a namespace created with
>>> newuidmap/newgidmap where the mappings look like:
>>> 
>>> $ cat /proc/self/uid_map
>>> 0       1000          1
>>> 1     110000      65536
>>> 
>>> users are allowed to create child user namespaces and specify the
>>> mappings to use.  Doing so, they often hit the issue that the mappings
>>> cannot overlap multiple extents in the parent user namespace.
>>> 
>>> The issue could be completely addressed in user space, but to me it
>>> looks like an implementation detail that user space should not know
>>> about.
>>> In addition, it would also be slower (additional read of the current
>>> uid_map and gid_map files) and must be implemented separately in each
>>> container runtime.
>>> 
>>> >> $ cat /proc/self/uid_map
>>> >>          0       1000          1
>>> >>          1     110000      65536
>>> >> $ unshare -U sleep 100 &
>>> >> [1] 1552
>>> >> $ printf "0 0 100\n" | tee /proc/$!/uid_map
>>> >> 0 0 100
>>> >> $ cat /proc/$!/uid_map
>>> >>          0          0          1
>>> >>          1          1         99
>>> >>
>>> >> Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
>>> >> ---
>>> >>  kernel/user_namespace.c | 62 ++++++++++++++++++++++++++++++++++-------
>>> >>  1 file changed, 52 insertions(+), 10 deletions(-)
>>> >>
>>> >> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
>>> >> index 87804e0371fe..b5542be2bd0a 100644
>>> >> --- a/kernel/user_namespace.c
>>> >> +++ b/kernel/user_namespace.c
>>> >> @@ -706,6 +706,41 @@ const struct seq_operations proc_projid_seq_operations = {
>>> >>  	.show = projid_m_show,
>>> >>  };
>>> >>  
>>> >> +static void split_overlapping_mappings(struct uid_gid_map *parent_map,
>>> >> +				       struct uid_gid_extent *extent,
>>> >> +				       struct uid_gid_extent *overflow_extent)
>>> >> +{
>>> >> +	unsigned int idx;
>>> >> +
>>> >> +	overflow_extent->first = (u32) -1;
>>> >> +
>>> >> +	/* Split extent if it not fully contained in an extent from parent_map.  */
>>> >> +	for (idx = 0; idx < parent_map->nr_extents; idx++) {
>>> >
>>> > Ouch!
>>> >
>>> > For the larger tree we perform binary searches typically and
>>> > here you are walking every entry unconditionally.
>>> >
>>> > It looks like this makes the write O(N^2) from O(NlogN)
>>> > which for a user facing function is not desirable.
>>> >
>>> > I think something like insert_and_split_extent may be ok.
>>> > Incorporating your loop and the part that inserts an element.
>>> >
>>> > As written this almost doubles the complexity of the code,
>>> > as well as making it perform much worse.  Which is a problem.
>>> 
>>> I've attempted to implement the new functionality at input validation
>>> time to not touch the existing security checks.
>>> 
>>> I've thought the pattern for iterating the extents was fine as I've
>>> taken it from mappings_overlap (even if it is used differently on an
>>> unsorted array).
>>> 
>>> Thanks for the hint, I'll move the new logic when map_id_range_down() is
>>> used and I'll send a v2.
>>
>> Hi,
>>
>> sorry if I miseed it.  Did you ever send a v2?
>
> no worries, the v2 is here:
>
> https://lkml.kernel.org/lkml/20201203150252.1229077-1-gscrivan@redhat.com/

have you had a chance to look at the patch?

Thanks,
Giuseppe


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] kernel: automatically split user namespace extent
  2021-05-05 15:09         ` Giuseppe Scrivano
@ 2021-05-05 16:06           ` Serge E. Hallyn
  0 siblings, 0 replies; 7+ messages in thread
From: Serge E. Hallyn @ 2021-05-05 16:06 UTC (permalink / raw)
  To: Giuseppe Scrivano
  Cc: Serge E. Hallyn, Eric W. Biederman, linux-kernel,
	christian.brauner, Linux Containers

No.  Moving it to the top of my queue for tonight.

On Wed, May 05, 2021 at 05:09:08PM +0200, Giuseppe Scrivano wrote:
> Hi Serge,
> 
> Giuseppe Scrivano <gscrivan@redhat.com> writes:
> 
> > Hi Serge,
> >
> > "Serge E. Hallyn" <serge@hallyn.com> writes:
> >
> >> On Wed, Dec 02, 2020 at 05:12:27PM +0100, Giuseppe Scrivano wrote:
> >>> Hi Eric,
> >>> 
> >>> ebiederm@xmission.com (Eric W. Biederman) writes:
> >>> 
> >>> > Nit: The tag should have been "userns:" rather than kernel.
> >>> >
> >>> > Giuseppe Scrivano <gscrivan@redhat.com> writes:
> >>> >
> >>> >> writing to the id map fails when an extent overlaps multiple mappings
> >>> >> in the parent user namespace, e.g.:
> >>> >>
> >>> >> $ cat /proc/self/uid_map
> >>> >>          0       1000          1
> >>> >>          1     100000      65536
> >>> >> $ unshare -U sleep 100 &
> >>> >> [1] 1029703
> >>> >> $ printf "0 0 100\n" | tee /proc/$!/uid_map
> >>> >> 0 0 100
> >>> >> tee: /proc/1029703/uid_map: Operation not permitted
> >>> >>
> >>> >> To prevent it from happening, automatically split an extent so that
> >>> >> each portion fits in one extent in the parent user namespace.
> >>> >
> >>> > I don't see anything fundamentally wrong with relaxing this
> >>> > restriction, but more code does have more room for bugs to hide.
> >>> >
> >>> > What is the advantage of relaxing this restriction?
> >>> 
> >>> we are running rootless containers in a namespace created with
> >>> newuidmap/newgidmap where the mappings look like:
> >>> 
> >>> $ cat /proc/self/uid_map
> >>> 0       1000          1
> >>> 1     110000      65536
> >>> 
> >>> users are allowed to create child user namespaces and specify the
> >>> mappings to use.  Doing so, they often hit the issue that the mappings
> >>> cannot overlap multiple extents in the parent user namespace.
> >>> 
> >>> The issue could be completely addressed in user space, but to me it
> >>> looks like an implementation detail that user space should not know
> >>> about.
> >>> In addition, it would also be slower (additional read of the current
> >>> uid_map and gid_map files) and must be implemented separately in each
> >>> container runtime.
> >>> 
> >>> >> $ cat /proc/self/uid_map
> >>> >>          0       1000          1
> >>> >>          1     110000      65536
> >>> >> $ unshare -U sleep 100 &
> >>> >> [1] 1552
> >>> >> $ printf "0 0 100\n" | tee /proc/$!/uid_map
> >>> >> 0 0 100
> >>> >> $ cat /proc/$!/uid_map
> >>> >>          0          0          1
> >>> >>          1          1         99
> >>> >>
> >>> >> Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
> >>> >> ---
> >>> >>  kernel/user_namespace.c | 62 ++++++++++++++++++++++++++++++++++-------
> >>> >>  1 file changed, 52 insertions(+), 10 deletions(-)
> >>> >>
> >>> >> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> >>> >> index 87804e0371fe..b5542be2bd0a 100644
> >>> >> --- a/kernel/user_namespace.c
> >>> >> +++ b/kernel/user_namespace.c
> >>> >> @@ -706,6 +706,41 @@ const struct seq_operations proc_projid_seq_operations = {
> >>> >>  	.show = projid_m_show,
> >>> >>  };
> >>> >>  
> >>> >> +static void split_overlapping_mappings(struct uid_gid_map *parent_map,
> >>> >> +				       struct uid_gid_extent *extent,
> >>> >> +				       struct uid_gid_extent *overflow_extent)
> >>> >> +{
> >>> >> +	unsigned int idx;
> >>> >> +
> >>> >> +	overflow_extent->first = (u32) -1;
> >>> >> +
> >>> >> +	/* Split extent if it not fully contained in an extent from parent_map.  */
> >>> >> +	for (idx = 0; idx < parent_map->nr_extents; idx++) {
> >>> >
> >>> > Ouch!
> >>> >
> >>> > For the larger tree we perform binary searches typically and
> >>> > here you are walking every entry unconditionally.
> >>> >
> >>> > It looks like this makes the write O(N^2) from O(NlogN)
> >>> > which for a user facing function is not desirable.
> >>> >
> >>> > I think something like insert_and_split_extent may be ok.
> >>> > Incorporating your loop and the part that inserts an element.
> >>> >
> >>> > As written this almost doubles the complexity of the code,
> >>> > as well as making it perform much worse.  Which is a problem.
> >>> 
> >>> I've attempted to implement the new functionality at input validation
> >>> time to not touch the existing security checks.
> >>> 
> >>> I've thought the pattern for iterating the extents was fine as I've
> >>> taken it from mappings_overlap (even if it is used differently on an
> >>> unsorted array).
> >>> 
> >>> Thanks for the hint, I'll move the new logic when map_id_range_down() is
> >>> used and I'll send a v2.
> >>
> >> Hi,
> >>
> >> sorry if I miseed it.  Did you ever send a v2?
> >
> > no worries, the v2 is here:
> >
> > https://lkml.kernel.org/lkml/20201203150252.1229077-1-gscrivan@redhat.com/
> 
> have you had a chance to look at the patch?
> 
> Thanks,
> Giuseppe

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-05-05 16:06 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-26 10:08 [PATCH] kernel: automatically split user namespace extent Giuseppe Scrivano
2020-12-01 17:53 ` Eric W. Biederman
2020-12-02 16:12   ` Giuseppe Scrivano
2021-04-02 14:32     ` Serge E. Hallyn
2021-04-02 14:46       ` Giuseppe Scrivano
2021-05-05 15:09         ` Giuseppe Scrivano
2021-05-05 16:06           ` Serge E. Hallyn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).