linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] cgroup: pids: use {READ,WRITE}_ONCE for pids->limit operations
@ 2019-10-12  1:05 Aleksa Sarai
  2019-10-14 15:41 ` Tejun Heo
  0 siblings, 1 reply; 10+ messages in thread
From: Aleksa Sarai @ 2019-10-12  1:05 UTC (permalink / raw)
  To: Tejun Heo, Li Zefan, Johannes Weiner
  Cc: cgroups, linux-kernel, Aleksa Sarai, stable

Because pids->limit can be changed concurrently (but we don't want to
take a lock because it would be needlessly expensive), use the
appropriate memory barriers.

Fixes: commit 49b786ea146f ("cgroup: implement the PIDs subsystem")
Cc: stable@vger.kernel.org # v4.3+
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
---
 kernel/cgroup/pids.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/cgroup/pids.c b/kernel/cgroup/pids.c
index 8e513a573fe9..a726e4a20177 100644
--- a/kernel/cgroup/pids.c
+++ b/kernel/cgroup/pids.c
@@ -152,7 +152,7 @@ static int pids_try_charge(struct pids_cgroup *pids, int num)
 		 * p->limit is %PIDS_MAX then we know that this test will never
 		 * fail.
 		 */
-		if (new > p->limit)
+		if (new > READ_ONCE(p->limit))
 			goto revert;
 	}
 
@@ -277,7 +277,7 @@ static ssize_t pids_max_write(struct kernfs_open_file *of, char *buf,
 	 * Limit updates don't need to be mutex'd, since it isn't
 	 * critical that any racing fork()s follow the new limit.
 	 */
-	pids->limit = limit;
+	WRITE_ONCE(pids->limit, limit);
 	return nbytes;
 }
 
@@ -285,7 +285,7 @@ static int pids_max_show(struct seq_file *sf, void *v)
 {
 	struct cgroup_subsys_state *css = seq_css(sf);
 	struct pids_cgroup *pids = css_pids(css);
-	int64_t limit = pids->limit;
+	int64_t limit = READ_ONCE(pids->limit);
 
 	if (limit >= PIDS_MAX)
 		seq_printf(sf, "%s\n", PIDS_MAX_STR);
-- 
2.23.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] cgroup: pids: use {READ,WRITE}_ONCE for pids->limit operations
  2019-10-12  1:05 [PATCH] cgroup: pids: use {READ,WRITE}_ONCE for pids->limit operations Aleksa Sarai
@ 2019-10-14 15:41 ` Tejun Heo
  2019-10-14 15:59   ` Aleksa Sarai
  0 siblings, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2019-10-14 15:41 UTC (permalink / raw)
  To: Aleksa Sarai; +Cc: Li Zefan, Johannes Weiner, cgroups, linux-kernel, stable

On Sat, Oct 12, 2019 at 12:05:39PM +1100, Aleksa Sarai wrote:
> Because pids->limit can be changed concurrently (but we don't want to
> take a lock because it would be needlessly expensive), use the
> appropriate memory barriers.

I can't quite tell what problem it's fixing.  Can you elaborate a
scenario where the current code would break that your patch fixes?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] cgroup: pids: use {READ,WRITE}_ONCE for pids->limit operations
  2019-10-14 15:41 ` Tejun Heo
@ 2019-10-14 15:59   ` Aleksa Sarai
  2019-10-14 16:33     ` Tejun Heo
  0 siblings, 1 reply; 10+ messages in thread
From: Aleksa Sarai @ 2019-10-14 15:59 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Li Zefan, Johannes Weiner, cgroups, linux-kernel, stable

[-- Attachment #1: Type: text/plain, Size: 1147 bytes --]

On 2019-10-14, Tejun Heo <tj@kernel.org> wrote:
> On Sat, Oct 12, 2019 at 12:05:39PM +1100, Aleksa Sarai wrote:
> > Because pids->limit can be changed concurrently (but we don't want to
> > take a lock because it would be needlessly expensive), use the
> > appropriate memory barriers.
> 
> I can't quite tell what problem it's fixing.  Can you elaborate a
> scenario where the current code would break that your patch fixes?

As far as I can tell, not using *_ONCE() here means that if you had a
process changing pids->limit from A to B, a process might be able to
temporarily exceed pids->limit -- because pids->limit accesses are not
protected by mutexes and the C compiler can produce confusing
intermediate values for pids->limit[1].

But this is more of a correctness fix than one fixing an actually
exploitable bug -- given the kernel memory model work, it seems like a
good idea to just use READ_ONCE() and WRITE_ONCE() for shared memory
access.

[1]: https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] cgroup: pids: use {READ,WRITE}_ONCE for pids->limit operations
  2019-10-14 15:59   ` Aleksa Sarai
@ 2019-10-14 16:33     ` Tejun Heo
  2019-10-16  8:32       ` Aleksa Sarai
  0 siblings, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2019-10-14 16:33 UTC (permalink / raw)
  To: Aleksa Sarai; +Cc: Li Zefan, Johannes Weiner, cgroups, linux-kernel, stable

Hello, Aleksa.

On Tue, Oct 15, 2019 at 02:59:31AM +1100, Aleksa Sarai wrote:
> On 2019-10-14, Tejun Heo <tj@kernel.org> wrote:
> > On Sat, Oct 12, 2019 at 12:05:39PM +1100, Aleksa Sarai wrote:
> > > Because pids->limit can be changed concurrently (but we don't want to
> > > take a lock because it would be needlessly expensive), use the
> > > appropriate memory barriers.
> > 
> > I can't quite tell what problem it's fixing.  Can you elaborate a
> > scenario where the current code would break that your patch fixes?
> 
> As far as I can tell, not using *_ONCE() here means that if you had a
> process changing pids->limit from A to B, a process might be able to
> temporarily exceed pids->limit -- because pids->limit accesses are not
> protected by mutexes and the C compiler can produce confusing
> intermediate values for pids->limit[1].
>
> But this is more of a correctness fix than one fixing an actually
> exploitable bug -- given the kernel memory model work, it seems like a
> good idea to just use READ_ONCE() and WRITE_ONCE() for shared memory
> access.

READ/WRITE_ONCE provides protection against compiler generating
multiple accesses for a single operation.  It won't prevent split
writes / reads of 64bit variables on 32bit machines.  For that, you'd
have to switch them to atomic64_t's.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] cgroup: pids: use {READ,WRITE}_ONCE for pids->limit operations
  2019-10-14 16:33     ` Tejun Heo
@ 2019-10-16  8:32       ` Aleksa Sarai
  2019-10-16 14:27         ` Tejun Heo
  0 siblings, 1 reply; 10+ messages in thread
From: Aleksa Sarai @ 2019-10-16  8:32 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Li Zefan, Johannes Weiner, cgroups, linux-kernel, stable

[-- Attachment #1: Type: text/plain, Size: 2288 bytes --]

On 2019-10-14, Tejun Heo <tj@kernel.org> wrote:
> Hello, Aleksa.
> 
> On Tue, Oct 15, 2019 at 02:59:31AM +1100, Aleksa Sarai wrote:
> > On 2019-10-14, Tejun Heo <tj@kernel.org> wrote:
> > > On Sat, Oct 12, 2019 at 12:05:39PM +1100, Aleksa Sarai wrote:
> > > > Because pids->limit can be changed concurrently (but we don't want to
> > > > take a lock because it would be needlessly expensive), use the
> > > > appropriate memory barriers.
> > > 
> > > I can't quite tell what problem it's fixing.  Can you elaborate a
> > > scenario where the current code would break that your patch fixes?
> > 
> > As far as I can tell, not using *_ONCE() here means that if you had a
> > process changing pids->limit from A to B, a process might be able to
> > temporarily exceed pids->limit -- because pids->limit accesses are not
> > protected by mutexes and the C compiler can produce confusing
> > intermediate values for pids->limit[1].
> >
> > But this is more of a correctness fix than one fixing an actually
> > exploitable bug -- given the kernel memory model work, it seems like a
> > good idea to just use READ_ONCE() and WRITE_ONCE() for shared memory
> > access.
> 
> READ/WRITE_ONCE provides protection against compiler generating
> multiple accesses for a single operation.  It won't prevent split
> writes / reads of 64bit variables on 32bit machines.  For that, you'd
> have to switch them to atomic64_t's.

Maybe I'm misunderstanding Documentation/atomic_t.txt, but it looks to
me like it's explicitly saying that I shouldn't use atomic64_t if I'm
just using it for fetching and assignment.

> The non-RMW ops are (typically) regular LOADs and STOREs and are
> canonically implemented using READ_ONCE(), WRITE_ONCE(),
> smp_load_acquire() and smp_store_release() respectively. Therefore, if
> you find yourself only using the Non-RMW operations of atomic_t, you
> do not in fact need atomic_t at all and are doing it wrong.

As for 64-bit on 32-bit machines -- that is a separate issue, but from
[1] it seems to me like there are more problems that *_ONCE() fixes than
just split reads and writes.

[1]: https://lwn.net/Articles/793253/

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] cgroup: pids: use {READ,WRITE}_ONCE for pids->limit operations
  2019-10-16  8:32       ` Aleksa Sarai
@ 2019-10-16 14:27         ` Tejun Heo
  2019-10-16 15:29           ` Aleksa Sarai
  0 siblings, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2019-10-16 14:27 UTC (permalink / raw)
  To: Aleksa Sarai; +Cc: Li Zefan, Johannes Weiner, cgroups, linux-kernel, stable

Hello, Aleksa.

On Wed, Oct 16, 2019 at 07:32:19PM +1100, Aleksa Sarai wrote:
> Maybe I'm misunderstanding Documentation/atomic_t.txt, but it looks to
> me like it's explicitly saying that I shouldn't use atomic64_t if I'm
> just using it for fetching and assignment.

Hah, where is it saying that?  The alternative would be seqlock or
u64_stats or straight-up locking but idk for this atomic64_t should be
fine.

> > The non-RMW ops are (typically) regular LOADs and STOREs and are
> > canonically implemented using READ_ONCE(), WRITE_ONCE(),
> > smp_load_acquire() and smp_store_release() respectively. Therefore, if
> > you find yourself only using the Non-RMW operations of atomic_t, you
> > do not in fact need atomic_t at all and are doing it wrong.
> 
> As for 64-bit on 32-bit machines -- that is a separate issue, but from
> [1] it seems to me like there are more problems that *_ONCE() fixes than
> just split reads and writes.

Your explanations are too wishy washy.  If you wanna fix it, please do
it correctly.  R/W ONCE isn't the right solution here.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] cgroup: pids: use {READ,WRITE}_ONCE for pids->limit operations
  2019-10-16 14:27         ` Tejun Heo
@ 2019-10-16 15:29           ` Aleksa Sarai
  2019-10-16 15:32             ` Tejun Heo
  2019-10-16 15:35             ` Aleksa Sarai
  0 siblings, 2 replies; 10+ messages in thread
From: Aleksa Sarai @ 2019-10-16 15:29 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Li Zefan, Johannes Weiner, cgroups, linux-kernel, stable

[-- Attachment #1: Type: text/plain, Size: 1407 bytes --]

On 2019-10-16, Tejun Heo <tj@kernel.org> wrote:
> Hello, Aleksa.
> 
> On Wed, Oct 16, 2019 at 07:32:19PM +1100, Aleksa Sarai wrote:
> > Maybe I'm misunderstanding Documentation/atomic_t.txt, but it looks to
> > me like it's explicitly saying that I shouldn't use atomic64_t if I'm
> > just using it for fetching and assignment.
> 
> Hah, where is it saying that?

Isn't that what this says:

> Therefore, if you find yourself only using the Non-RMW operations of
> atomic_t, you do not in fact need atomic_t at all and are doing it
> wrong.

Doesn't using just atomic64_read() and atomic64_set() fall under "only
using the non-RMW operations of atomic_t"? But yes, I agree that any
locking is overkill.

> > As for 64-bit on 32-bit machines -- that is a separate issue, but from
> > [1] it seems to me like there are more problems that *_ONCE() fixes than
> > just split reads and writes.
> 
> Your explanations are too wishy washy.  If you wanna fix it, please do
> it correctly.  R/W ONCE isn't the right solution here.

Sure, I will switch it to use atomic64_read() and atomic64_set() instead
if that's what you'd prefer. Though I will mention that on quite a few
architectures atomic64_read() is defined as:

  #define atomic64_read(v)        READ_ONCE((v)->counter)

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] cgroup: pids: use {READ,WRITE}_ONCE for pids->limit operations
  2019-10-16 15:29           ` Aleksa Sarai
@ 2019-10-16 15:32             ` Tejun Heo
  2019-10-16 15:35             ` Aleksa Sarai
  1 sibling, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2019-10-16 15:32 UTC (permalink / raw)
  To: Aleksa Sarai; +Cc: Li Zefan, Johannes Weiner, cgroups, linux-kernel, stable

Hello,

On Thu, Oct 17, 2019 at 02:29:46AM +1100, Aleksa Sarai wrote:
> > Hah, where is it saying that?
> 
> Isn't that what this says:
> 
> > Therefore, if you find yourself only using the Non-RMW operations of
> > atomic_t, you do not in fact need atomic_t at all and are doing it
> > wrong.
> 
> Doesn't using just atomic64_read() and atomic64_set() fall under "only
> using the non-RMW operations of atomic_t"? But yes, I agree that any
> locking is overkill.

Yeah, I mean, it's an overkill.  We can use seqlock or u64_stat here
but it doesn't matter that much.

> > > As for 64-bit on 32-bit machines -- that is a separate issue, but from
> > > [1] it seems to me like there are more problems that *_ONCE() fixes than
> > > just split reads and writes.
> > 
> > Your explanations are too wishy washy.  If you wanna fix it, please do
> > it correctly.  R/W ONCE isn't the right solution here.
> 
> Sure, I will switch it to use atomic64_read() and atomic64_set() instead
> if that's what you'd prefer. Though I will mention that on quite a few
> architectures atomic64_read() is defined as:
> 
>   #define atomic64_read(v)        READ_ONCE((v)->counter)

Yeah, on archs which don't have split access on 64bits.  On the ones
which do, it does something else.  The generic implementation is
straight-up locking, I think.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] cgroup: pids: use {READ,WRITE}_ONCE for pids->limit operations
  2019-10-16 15:29           ` Aleksa Sarai
  2019-10-16 15:32             ` Tejun Heo
@ 2019-10-16 15:35             ` Aleksa Sarai
  2019-10-16 15:54               ` Tejun Heo
  1 sibling, 1 reply; 10+ messages in thread
From: Aleksa Sarai @ 2019-10-16 15:35 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Li Zefan, Johannes Weiner, cgroups, linux-kernel, stable

[-- Attachment #1: Type: text/plain, Size: 1632 bytes --]

On 2019-10-17, Aleksa Sarai <cyphar@cyphar.com> wrote:
> On 2019-10-16, Tejun Heo <tj@kernel.org> wrote:
> > Hello, Aleksa.
> > 
> > On Wed, Oct 16, 2019 at 07:32:19PM +1100, Aleksa Sarai wrote:
> > > Maybe I'm misunderstanding Documentation/atomic_t.txt, but it looks to
> > > me like it's explicitly saying that I shouldn't use atomic64_t if I'm
> > > just using it for fetching and assignment.
> > 
> > Hah, where is it saying that?
> 
> Isn't that what this says:
> 
> > Therefore, if you find yourself only using the Non-RMW operations of
> > atomic_t, you do not in fact need atomic_t at all and are doing it
> > wrong.
> 
> Doesn't using just atomic64_read() and atomic64_set() fall under "only
> using the non-RMW operations of atomic_t"? But yes, I agree that any
> locking is overkill.
> 
> > > As for 64-bit on 32-bit machines -- that is a separate issue, but from
> > > [1] it seems to me like there are more problems that *_ONCE() fixes than
> > > just split reads and writes.
> > 
> > Your explanations are too wishy washy.  If you wanna fix it, please do
> > it correctly.  R/W ONCE isn't the right solution here.
> 
> Sure, I will switch it to use atomic64_read() and atomic64_set() instead
> if that's what you'd prefer. Though I will mention that on quite a few
> architectures atomic64_read() is defined as:
> 
>   #define atomic64_read(v)        READ_ONCE((v)->counter)

Though I guess that's because on those architectures it turns out that
READ_ONCE is properly atomic?

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] cgroup: pids: use {READ,WRITE}_ONCE for pids->limit operations
  2019-10-16 15:35             ` Aleksa Sarai
@ 2019-10-16 15:54               ` Tejun Heo
  0 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2019-10-16 15:54 UTC (permalink / raw)
  To: Aleksa Sarai; +Cc: Li Zefan, Johannes Weiner, cgroups, linux-kernel, stable

On Thu, Oct 17, 2019 at 02:35:20AM +1100, Aleksa Sarai wrote:
> > Sure, I will switch it to use atomic64_read() and atomic64_set() instead
> > if that's what you'd prefer. Though I will mention that on quite a few
> > architectures atomic64_read() is defined as:
> > 
> >   #define atomic64_read(v)        READ_ONCE((v)->counter)
> 
> Though I guess that's because on those architectures it turns out that
> READ_ONCE is properly atomic?

Oh yeah, on archs where 64bit accesses are atomic, READ_ONCE() /
WRITE_ONCE() would work here.  If the limit variable were ulong
instead of an explicit 64bit variable, RW ONCE would work too as ulong
accesses are atomic on all archs IIRC.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-10-16 15:54 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-10-12  1:05 [PATCH] cgroup: pids: use {READ,WRITE}_ONCE for pids->limit operations Aleksa Sarai
2019-10-14 15:41 ` Tejun Heo
2019-10-14 15:59   ` Aleksa Sarai
2019-10-14 16:33     ` Tejun Heo
2019-10-16  8:32       ` Aleksa Sarai
2019-10-16 14:27         ` Tejun Heo
2019-10-16 15:29           ` Aleksa Sarai
2019-10-16 15:32             ` Tejun Heo
2019-10-16 15:35             ` Aleksa Sarai
2019-10-16 15:54               ` Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).