From: Peter Zijlstra <peterz@infradead.org>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Nikolay Borisov <nborisov@suse.com>,
Christian Brauner <christian.brauner@ubuntu.com>,
Linux Containers <containers@lists.linux-foundation.org>,
tycho@tycho.ws, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/5] userns: Don't read extents twice in m_start
Date: Wed, 1 Nov 2017 14:05:39 +0100 [thread overview]
Message-ID: <20171101130539.j5bxmhs2trqurrr2@hirez.programming.kicks-ass.net> (raw)
In-Reply-To: <87a806ntn0.fsf@xmission.com>
On Wed, Nov 01, 2017 at 06:08:35AM -0500, Eric W. Biederman wrote:
> Nikolay Borisov <nborisov@suse.com> writes:
>
> > On 1.11.2017 01:48, Eric W. Biederman wrote:
> >>
> >> This is important so reading /proc/<pid>/{uid_map,gid_map,projid_map} while
> >> the map is being written does not do strange things.
> >>
> >> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
> >> ---
> >> kernel/user_namespace.c | 6 ++++--
> >> 1 file changed, 4 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
> >> index 563a2981d7c7..4f7e357ac1e2 100644
> >> --- a/kernel/user_namespace.c
> >> +++ b/kernel/user_namespace.c
> >> @@ -683,11 +683,13 @@ static void *m_start(struct seq_file *seq, loff_t *ppos,
> >> struct uid_gid_map *map)
> >> {
> >> loff_t pos = *ppos;
> >> + unsigned extents = map->nr_extents;
> >> + smp_rmb();
> >
> > Barriers need to be paired to work correctly as well as have explicit
> > comments describing the pairing as per kernel coding style. Checkpatch
> > will actually produce warning for that particular memory barrier.
>
> So please look at the code and read the comment.
What comment, there isn't any, which is what he's complaining about.
> The fact the barrier was not in m_start earlier is strictly speaking a
> bug.
Sure; doesn't excuse you for not writing sensible comments to go with
it.
> In practice except for a very narrow window when this data is changing
> the one time it can, this code does not matter at all.
>
> As for checkpatch I have sympathy for it, checkpatch has a hard job,
> but I won't listen to checkpatch when it is wrong.
No, undocumented barriers are a royal pain. Memory barriers should come
with a comment that describes the desired ordering and points to the
pairing barrier(s).
Also, you probably want READ_ONCE() here and WRITE_ONCE() in
map_write(), the compiler is free to do unordered byte loads/stores
without it.
And finally, did you want to use smp_store_release() and
smp_load_acquire() instead?
Something like so perhaps?
---
kernel/user_namespace.c | 73 ++++++++++++++++++++++++++++++-------------------
1 file changed, 45 insertions(+), 28 deletions(-)
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index c490f1e4313b..f758911cabd5 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -25,8 +25,47 @@
#include <linux/fs_struct.h>
static struct kmem_cache *user_ns_cachep __read_mostly;
+
+/*
+ * The userns_state_mutex serializes all writes to any given map.
+ *
+ * Any map is only ever written once.
+ *
+ * An id map fits within 1 cache line on most architectures.
+ */
static DEFINE_MUTEX(userns_state_mutex);
+/*
+ *
+ * There is a one time data dependency between reading the count of the extents
+ * and the values of the extents. The desired behavior is to see the values of
+ * the extents that were written before the count of the extents.
+ *
+ * To achieve this smp_store_release() is used on guarantee the write order and
+ * smp_load_acquire() is guaranteed that we don't have weakly ordered
+ * architectures returning stale data.
+ */
+static inline void map_store_extents(struct uid_gid_map *map, unsigned int extents)
+{
+ /*
+ * Ensure the map->extent[] stores happen-before we grow map->nr_extents
+ * to cover it. Matches the load_acquire in map_load_extents().
+ */
+ smp_store_release(&map->nr_extents, extents);
+}
+
+static inline unsigned int map_load_extents(struct uid_gid_map *map)
+{
+ /*
+ * Ensure the map->nr_extents load happens-before we try and access
+ * map->extent[], such that we guarantee the data is in fact there.
+ *
+ * Matches the store-relese in map_store_extents().
+ */
+ return smp_load_acquire(&map->nr_extents);
+}
+
+
static bool new_idmap_permitted(const struct file *file,
struct user_namespace *ns, int cap_setid,
struct uid_gid_map *map);
@@ -206,8 +245,7 @@ static u32 map_id_range_down(struct uid_gid_map *map, u32 id, u32 count)
id2 = id + count - 1;
/* Find the matching extent */
- extents = map->nr_extents;
- smp_rmb();
+ extents = map_load_extents(map);
for (idx = 0; idx < extents; idx++) {
first = map->extent[idx].first;
last = first + map->extent[idx].count - 1;
@@ -230,8 +268,7 @@ static u32 map_id_down(struct uid_gid_map *map, u32 id)
u32 first, last;
/* Find the matching extent */
- extents = map->nr_extents;
- smp_rmb();
+ extents = map_load_extents(map);
for (idx = 0; idx < extents; idx++) {
first = map->extent[idx].first;
last = first + map->extent[idx].count - 1;
@@ -253,8 +290,7 @@ static u32 map_id_up(struct uid_gid_map *map, u32 id)
u32 first, last;
/* Find the matching extent */
- extents = map->nr_extents;
- smp_rmb();
+ extents = map_load_extents(map);
for (idx = 0; idx < extents; idx++) {
first = map->extent[idx].lower_first;
last = first + map->extent[idx].count - 1;
@@ -543,7 +579,7 @@ static void *m_start(struct seq_file *seq, loff_t *ppos,
struct uid_gid_extent *extent = NULL;
loff_t pos = *ppos;
- if (pos < map->nr_extents)
+ if (pos < map_load_extents(map))
extent = &map->extent[pos];
return extent;
@@ -652,25 +688,6 @@ static ssize_t map_write(struct file *file, const char __user *buf,
char *kbuf = NULL, *pos, *next_line;
ssize_t ret = -EINVAL;
- /*
- * The userns_state_mutex serializes all writes to any given map.
- *
- * Any map is only ever written once.
- *
- * An id map fits within 1 cache line on most architectures.
- *
- * On read nothing needs to be done unless you are on an
- * architecture with a crazy cache coherency model like alpha.
- *
- * There is a one time data dependency between reading the
- * count of the extents and the values of the extents. The
- * desired behavior is to see the values of the extents that
- * were written before the count of the extents.
- *
- * To achieve this smp_wmb() is used on guarantee the write
- * order and smp_rmb() is guaranteed that we don't have crazy
- * architectures returning stale data.
- */
mutex_lock(&userns_state_mutex);
ret = -EPERM;
@@ -790,8 +807,8 @@ static ssize_t map_write(struct file *file, const char __user *buf,
/* Install the map */
memcpy(map->extent, new_map.extent,
new_map.nr_extents*sizeof(new_map.extent[0]));
- smp_wmb();
- map->nr_extents = new_map.nr_extents;
+
+ map_store_extents(map, new_map.nr_extents);
*ppos = count;
ret = count;
next prev parent reply other threads:[~2017-11-01 13:05 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-10-24 22:04 [PATCH 1/2 v6] user namespace: use union in {g,u}idmap struct Christian Brauner
2017-10-24 22:04 ` [PATCH 2/2 v6] user namespaces: bump idmap limits to 340 Christian Brauner
2017-10-31 23:46 ` [PATCH 0/5] userns: bump idmap limits, fixes & tweaks Eric W. Biederman
2017-10-31 23:47 ` [PATCH 1/5] userns: Don't special case a count of 0 Eric W. Biederman
2017-10-31 23:47 ` [PATCH 2/5] userns: Simplify the user and group mapping functions Eric W. Biederman
2017-10-31 23:48 ` [PATCH 3/5] userns: Don't read extents twice in m_start Eric W. Biederman
2017-11-01 8:31 ` Nikolay Borisov
2017-11-01 11:08 ` Eric W. Biederman
2017-11-01 13:05 ` Nikolay Borisov
2017-11-01 13:05 ` Peter Zijlstra [this message]
2017-11-01 14:01 ` Christian Brauner
2017-11-01 14:16 ` Peter Zijlstra
2017-11-01 16:29 ` Christian Brauner
2017-11-01 16:31 ` Christian Brauner
2017-11-01 17:00 ` Joe Perches
2017-11-01 17:20 ` Eric W. Biederman
2017-11-01 18:15 ` Peter Zijlstra
2017-10-31 23:48 ` [PATCH 4/5] userns: Make map_id_down a wrapper for map_id_range_down Eric W. Biederman
2017-10-31 23:49 ` [PATCH 5/5] userns: Simplify insert_extent Eric W. Biederman
2017-11-01 10:51 ` [PATCH 0/5] userns: bump idmap limits, fixes & tweaks Christian Brauner
2017-11-01 11:15 ` Eric W. Biederman
2017-11-01 13:31 ` Christian Brauner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171101130539.j5bxmhs2trqurrr2@hirez.programming.kicks-ass.net \
--to=peterz@infradead.org \
--cc=christian.brauner@ubuntu.com \
--cc=containers@lists.linux-foundation.org \
--cc=ebiederm@xmission.com \
--cc=linux-kernel@vger.kernel.org \
--cc=nborisov@suse.com \
--cc=tycho@tycho.ws \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).