All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] cgroup/cpuset: Fix a race condition when reading cpuset.*
@ 2020-02-11 14:15 ` Qais Yousef
  0 siblings, 0 replies; 8+ messages in thread
From: Qais Yousef @ 2020-02-11 14:15 UTC (permalink / raw)
  To: Li Zefan, Tejun Heo, Johannes Weiner; +Cc: cgroups, linux-kernel, Qais Yousef

LTP cpuset_hotplug_test.sh was failing with the following error message

	cpuset_hotplug 1 TFAIL: root group's cpus isn't expected(Result: 0-5, Expect: 0,2-5).

Which is due to a race condition between cpu hotplug operation and
reading cpuset.cpus file.

When a cpu is onlined/offlined, cpuset schedules a workqueue to sync its
internal data structures with the new values. If a read happens during
this window, the user will read a stale value, hence triggering the
failure above.

To fix the issue make sure cpuset_wait_for_hotplug() is called before
allowing any value to be read, hence forcing the synchronization to
happen before the read.

I ran 500 iterations with this fix applied with no failure triggered.

Signed-off-by: Qais Yousef <qais.yousef@arm.com>
---

I think it's okay to flush the workqueue from the read context? We do it on the
write, so I assumed it's okay on the read too. But it'd be good to confirm it
doesn't break any rule I'm not aware of.

 kernel/cgroup/cpuset.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 58f5073acff7..593055522626 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2405,6 +2405,9 @@ static int cpuset_common_seq_show(struct seq_file *sf, void *v)
 	cpuset_filetype_t type = seq_cft(sf)->private;
 	int ret = 0;
 
+	/* Ensure all hotplug ops were done before reading any value */
+	cpuset_wait_for_hotplug();
+
 	spin_lock_irq(&callback_lock);
 
 	switch (type) {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH] cgroup/cpuset: Fix a race condition when reading cpuset.*
@ 2020-02-11 14:15 ` Qais Yousef
  0 siblings, 0 replies; 8+ messages in thread
From: Qais Yousef @ 2020-02-11 14:15 UTC (permalink / raw)
  To: Li Zefan, Tejun Heo, Johannes Weiner
  Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Qais Yousef

LTP cpuset_hotplug_test.sh was failing with the following error message

	cpuset_hotplug 1 TFAIL: root group's cpus isn't expected(Result: 0-5, Expect: 0,2-5).

Which is due to a race condition between cpu hotplug operation and
reading cpuset.cpus file.

When a cpu is onlined/offlined, cpuset schedules a workqueue to sync its
internal data structures with the new values. If a read happens during
this window, the user will read a stale value, hence triggering the
failure above.

To fix the issue make sure cpuset_wait_for_hotplug() is called before
allowing any value to be read, hence forcing the synchronization to
happen before the read.

I ran 500 iterations with this fix applied with no failure triggered.

Signed-off-by: Qais Yousef <qais.yousef-5wv7dgnIgG8@public.gmane.org>
---

I think it's okay to flush the workqueue from the read context? We do it on the
write, so I assumed it's okay on the read too. But it'd be good to confirm it
doesn't break any rule I'm not aware of.

 kernel/cgroup/cpuset.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 58f5073acff7..593055522626 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2405,6 +2405,9 @@ static int cpuset_common_seq_show(struct seq_file *sf, void *v)
 	cpuset_filetype_t type = seq_cft(sf)->private;
 	int ret = 0;
 
+	/* Ensure all hotplug ops were done before reading any value */
+	cpuset_wait_for_hotplug();
+
 	spin_lock_irq(&callback_lock);
 
 	switch (type) {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] cgroup/cpuset: Fix a race condition when reading cpuset.*
@ 2020-02-12 22:15   ` Tejun Heo
  0 siblings, 0 replies; 8+ messages in thread
From: Tejun Heo @ 2020-02-12 22:15 UTC (permalink / raw)
  To: Qais Yousef; +Cc: Li Zefan, Johannes Weiner, cgroups, linux-kernel

On Tue, Feb 11, 2020 at 02:15:54PM +0000, Qais Yousef wrote:
> LTP cpuset_hotplug_test.sh was failing with the following error message
> 
> 	cpuset_hotplug 1 TFAIL: root group's cpus isn't expected(Result: 0-5, Expect: 0,2-5).
> 
> Which is due to a race condition between cpu hotplug operation and
> reading cpuset.cpus file.
> 
> When a cpu is onlined/offlined, cpuset schedules a workqueue to sync its
> internal data structures with the new values. If a read happens during
> this window, the user will read a stale value, hence triggering the
> failure above.
> 
> To fix the issue make sure cpuset_wait_for_hotplug() is called before
> allowing any value to be read, hence forcing the synchronization to
> happen before the read.
> 
> I ran 500 iterations with this fix applied with no failure triggered.
> 
> Signed-off-by: Qais Yousef <qais.yousef@arm.com>

Hello, Qais. I just applied a patch which makes the operation
synchronous. Can you see whether the problem is gone on the
cgroup/for-next branch?

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-next

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] cgroup/cpuset: Fix a race condition when reading cpuset.*
@ 2020-02-12 22:15   ` Tejun Heo
  0 siblings, 0 replies; 8+ messages in thread
From: Tejun Heo @ 2020-02-12 22:15 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Li Zefan, Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Tue, Feb 11, 2020 at 02:15:54PM +0000, Qais Yousef wrote:
> LTP cpuset_hotplug_test.sh was failing with the following error message
> 
> 	cpuset_hotplug 1 TFAIL: root group's cpus isn't expected(Result: 0-5, Expect: 0,2-5).
> 
> Which is due to a race condition between cpu hotplug operation and
> reading cpuset.cpus file.
> 
> When a cpu is onlined/offlined, cpuset schedules a workqueue to sync its
> internal data structures with the new values. If a read happens during
> this window, the user will read a stale value, hence triggering the
> failure above.
> 
> To fix the issue make sure cpuset_wait_for_hotplug() is called before
> allowing any value to be read, hence forcing the synchronization to
> happen before the read.
> 
> I ran 500 iterations with this fix applied with no failure triggered.
> 
> Signed-off-by: Qais Yousef <qais.yousef-5wv7dgnIgG8@public.gmane.org>

Hello, Qais. I just applied a patch which makes the operation
synchronous. Can you see whether the problem is gone on the
cgroup/for-next branch?

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-next

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] cgroup/cpuset: Fix a race condition when reading cpuset.*
@ 2020-02-13 11:50     ` Qais Yousef
  0 siblings, 0 replies; 8+ messages in thread
From: Qais Yousef @ 2020-02-13 11:50 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Li Zefan, Johannes Weiner, cgroups, linux-kernel

Hi Tejun

On 02/12/20 17:15, Tejun Heo wrote:
> On Tue, Feb 11, 2020 at 02:15:54PM +0000, Qais Yousef wrote:
> > LTP cpuset_hotplug_test.sh was failing with the following error message
> > 
> > 	cpuset_hotplug 1 TFAIL: root group's cpus isn't expected(Result: 0-5, Expect: 0,2-5).
> > 
> > Which is due to a race condition between cpu hotplug operation and
> > reading cpuset.cpus file.
> > 
> > When a cpu is onlined/offlined, cpuset schedules a workqueue to sync its
> > internal data structures with the new values. If a read happens during
> > this window, the user will read a stale value, hence triggering the
> > failure above.
> > 
> > To fix the issue make sure cpuset_wait_for_hotplug() is called before
> > allowing any value to be read, hence forcing the synchronization to
> > happen before the read.
> > 
> > I ran 500 iterations with this fix applied with no failure triggered.
> > 
> > Signed-off-by: Qais Yousef <qais.yousef@arm.com>
> 
> Hello, Qais. I just applied a patch which makes the operation
> synchronous. Can you see whether the problem is gone on the
> cgroup/for-next branch?
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-next

I ran 500 iterations of cpuset_hotplug_test.sh on the branch, it passed.

I also cherry-picked commit 6426bfb1d5f0 ("cpuset: Make cpuset hotplug synchronous")
into v5.6-rc1 and ran 100 iterations and it passed too.

While investigating the problem, I could reproduce it all the way back to v5.0.
Stopped there so earlier versions could still have the problem.

Do you think it's worth porting the change to stable trees? Admittedly the
problem should be benign, but it did trigger an LTP failure.

I can check 4.19 and 4.14 stable trees (which at least in Android world are
still relevant) if you agree it makes sense to put a fix in stable.

Thanks

--
Qais Yousef

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] cgroup/cpuset: Fix a race condition when reading cpuset.*
@ 2020-02-13 11:50     ` Qais Yousef
  0 siblings, 0 replies; 8+ messages in thread
From: Qais Yousef @ 2020-02-13 11:50 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Li Zefan, Johannes Weiner, cgroups-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Hi Tejun

On 02/12/20 17:15, Tejun Heo wrote:
> On Tue, Feb 11, 2020 at 02:15:54PM +0000, Qais Yousef wrote:
> > LTP cpuset_hotplug_test.sh was failing with the following error message
> > 
> > 	cpuset_hotplug 1 TFAIL: root group's cpus isn't expected(Result: 0-5, Expect: 0,2-5).
> > 
> > Which is due to a race condition between cpu hotplug operation and
> > reading cpuset.cpus file.
> > 
> > When a cpu is onlined/offlined, cpuset schedules a workqueue to sync its
> > internal data structures with the new values. If a read happens during
> > this window, the user will read a stale value, hence triggering the
> > failure above.
> > 
> > To fix the issue make sure cpuset_wait_for_hotplug() is called before
> > allowing any value to be read, hence forcing the synchronization to
> > happen before the read.
> > 
> > I ran 500 iterations with this fix applied with no failure triggered.
> > 
> > Signed-off-by: Qais Yousef <qais.yousef-5wv7dgnIgG8@public.gmane.org>
> 
> Hello, Qais. I just applied a patch which makes the operation
> synchronous. Can you see whether the problem is gone on the
> cgroup/for-next branch?
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git for-next

I ran 500 iterations of cpuset_hotplug_test.sh on the branch, it passed.

I also cherry-picked commit 6426bfb1d5f0 ("cpuset: Make cpuset hotplug synchronous")
into v5.6-rc1 and ran 100 iterations and it passed too.

While investigating the problem, I could reproduce it all the way back to v5.0.
Stopped there so earlier versions could still have the problem.

Do you think it's worth porting the change to stable trees? Admittedly the
problem should be benign, but it did trigger an LTP failure.

I can check 4.19 and 4.14 stable trees (which at least in Android world are
still relevant) if you agree it makes sense to put a fix in stable.

Thanks

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] cgroup/cpuset: Fix a race condition when reading cpuset.*
  2020-02-13 11:50     ` Qais Yousef
  (?)
@ 2020-02-13 13:56     ` Tejun Heo
  2020-02-13 14:36       ` Qais Yousef
  -1 siblings, 1 reply; 8+ messages in thread
From: Tejun Heo @ 2020-02-13 13:56 UTC (permalink / raw)
  To: Qais Yousef; +Cc: Li Zefan, Johannes Weiner, cgroups, linux-kernel

Hello,

On Thu, Feb 13, 2020 at 11:50:16AM +0000, Qais Yousef wrote:
> I ran 500 iterations of cpuset_hotplug_test.sh on the branch, it passed.
> 
> I also cherry-picked commit 6426bfb1d5f0 ("cpuset: Make cpuset hotplug synchronous")
> into v5.6-rc1 and ran 100 iterations and it passed too.

Awesome, thanks for verifying.

> While investigating the problem, I could reproduce it all the way back to v5.0.
> Stopped there so earlier versions could still have the problem.
> 
> Do you think it's worth porting the change to stable trees? Admittedly the
> problem should be benign, but it did trigger an LTP failure.

I'm afraid not. It's not an issue which would affect actual use cases
and there's (as always) some risks involved with backporting it, so
the benefit just doesn't seem justifiable here.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] cgroup/cpuset: Fix a race condition when reading cpuset.*
  2020-02-13 13:56     ` Tejun Heo
@ 2020-02-13 14:36       ` Qais Yousef
  0 siblings, 0 replies; 8+ messages in thread
From: Qais Yousef @ 2020-02-13 14:36 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Li Zefan, Johannes Weiner, cgroups, linux-kernel

On 02/13/20 08:56, Tejun Heo wrote:
> Hello,
> 
> On Thu, Feb 13, 2020 at 11:50:16AM +0000, Qais Yousef wrote:
> > I ran 500 iterations of cpuset_hotplug_test.sh on the branch, it passed.
> > 
> > I also cherry-picked commit 6426bfb1d5f0 ("cpuset: Make cpuset hotplug synchronous")
> > into v5.6-rc1 and ran 100 iterations and it passed too.
> 
> Awesome, thanks for verifying.
> 
> > While investigating the problem, I could reproduce it all the way back to v5.0.
> > Stopped there so earlier versions could still have the problem.
> > 
> > Do you think it's worth porting the change to stable trees? Admittedly the
> > problem should be benign, but it did trigger an LTP failure.
> 
> I'm afraid not. It's not an issue which would affect actual use cases
> and there's (as always) some risks involved with backporting it, so
> the benefit just doesn't seem justifiable here.

Yeah I can't see how a real application would depend on this functionality
other than for informational reasons. Or testing like in this case.

Thanks

--
Qais Yousef

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-02-13 14:36 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-11 14:15 [PATCH] cgroup/cpuset: Fix a race condition when reading cpuset.* Qais Yousef
2020-02-11 14:15 ` Qais Yousef
2020-02-12 22:15 ` Tejun Heo
2020-02-12 22:15   ` Tejun Heo
2020-02-13 11:50   ` Qais Yousef
2020-02-13 11:50     ` Qais Yousef
2020-02-13 13:56     ` Tejun Heo
2020-02-13 14:36       ` Qais Yousef

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.