linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] fix unsigned pcp adjustments
@ 2013-10-27  7:44 Greg Thelen
  2013-10-27  7:44 ` [PATCH 1/3] percpu counter: test module Greg Thelen
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Greg Thelen @ 2013-10-27  7:44 UTC (permalink / raw)
  To: Tejun Heo, Christoph Lameter, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Johannes Weiner, Michal Hocko, Balbir Singh,
	KAMEZAWA Hiroyuki, handai.szj, Andrew Morton
  Cc: x86, linux-kernel, cgroups, linux-mm, Greg Thelen

As of v3.11-9444-g3ea67d0 "memcg: add per cgroup writeback pages accounting"
memcg use of __this_cpu_add(counter, -nr_pages) leads to incorrect statistic
values because the negated nr_pages is not sign extended (counter is long,
nr_pages is unsigned int).  The memcg fix is __this_cpu_sub(counter, nr_pages).
But that doesn't simply work because __this_cpu_sub(counter, nr_pages) was
implemented as __this_cpu_add(counter, -nr_pages) which suffers the same
problem.  Example:
  unsigned int delta = 1
  preempt_disable()
  this_cpu_write(long_counter, 0)
  this_cpu_sub(long_counter, delta)
  preempt_enable()
    
Before this change long_counter on a 64 bit machine ends with value 0xffffffff,
rather than 0xffffffffffffffff.  This is because this_cpu_sub(pcp, delta) boils
down to:
  long_counter = 0 + 0xffffffff

Patch 1 creates a test module for percpu counters operations which demonstrates
the __this_cpu_sub() problems.  This patch is independent can be discarded if
there is no interest.

Patch 2 fixes __this_cpu_sub() to work with unsigned adjustments.

Patch 3 uses __this_cpu_sub() in memcg.

An alternative smaller solution is for memcg to use:
  __this_cpu_add(counter, -(int)nr_pages)
admitting that __this_cpu_add/sub() doesn't work with unsigned adjustments.  But
I felt like fixing the core services to prevent this in the future.

Greg Thelen (3):
  percpu counter: test module
  percpu counter: cast this_cpu_sub() adjustment
  memcg: use __this_cpu_sub to decrement stats

 arch/x86/include/asm/percpu.h |   3 +-
 include/linux/percpu.h        |   8 +--
 lib/Kconfig.debug             |   9 +++
 lib/Makefile                  |   2 +
 lib/percpu_test.c             | 138 ++++++++++++++++++++++++++++++++++++++++++
 mm/memcontrol.c               |   2 +-
 6 files changed, 156 insertions(+), 6 deletions(-)
 create mode 100644 lib/percpu_test.c

-- 
1.8.4.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/3] percpu counter: test module
  2013-10-27  7:44 [PATCH 0/3] fix unsigned pcp adjustments Greg Thelen
@ 2013-10-27  7:44 ` Greg Thelen
  2013-10-27 11:18   ` Tejun Heo
  2013-10-27  7:44 ` [PATCH 2/3] percpu counter: cast this_cpu_sub() adjustment Greg Thelen
  2013-10-27  7:44 ` [PATCH 3/3] memcg: use __this_cpu_sub to decrement stats Greg Thelen
  2 siblings, 1 reply; 11+ messages in thread
From: Greg Thelen @ 2013-10-27  7:44 UTC (permalink / raw)
  To: Tejun Heo, Christoph Lameter, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Johannes Weiner, Michal Hocko, Balbir Singh,
	KAMEZAWA Hiroyuki, handai.szj, Andrew Morton
  Cc: x86, linux-kernel, cgroups, linux-mm, Greg Thelen

Tests various percpu operations.

Enable with CONFIG_PERCPU_TEST=m.

Signed-off-by: Greg Thelen <gthelen@google.com>
---
 lib/Kconfig.debug |   9 ++++
 lib/Makefile      |   2 +
 lib/percpu_test.c | 138 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 149 insertions(+)
 create mode 100644 lib/percpu_test.c

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 06344d9..cee589d 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1472,6 +1472,15 @@ config INTERVAL_TREE_TEST
 	help
 	  A benchmark measuring the performance of the interval tree library
 
+config PERCPU_TEST
+	tristate "Per cpu counter test"
+	depends on m && DEBUG_KERNEL
+	help
+	  Enable this option to build test modules with validates per-cpu
+	  counter operations.
+
+	  If unsure, say N.
+
 config ATOMIC64_SELFTEST
 	bool "Perform an atomic64_t self-test at boot"
 	help
diff --git a/lib/Makefile b/lib/Makefile
index f3bb2cb..bb016e1 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -157,6 +157,8 @@ obj-$(CONFIG_INTERVAL_TREE_TEST) += interval_tree_test.o
 
 interval_tree_test-objs := interval_tree_test_main.o interval_tree.o
 
+obj-$(CONFIG_PERCPU_TEST) += percpu_test.o
+
 obj-$(CONFIG_ASN1) += asn1_decoder.o
 
 obj-$(CONFIG_FONT_SUPPORT) += fonts/
diff --git a/lib/percpu_test.c b/lib/percpu_test.c
new file mode 100644
index 0000000..1ebeb44
--- /dev/null
+++ b/lib/percpu_test.c
@@ -0,0 +1,138 @@
+#include <linux/module.h>
+
+/* validate @native and @pcp counter values match @expected */
+#define CHECK(native, pcp, expected)                                    \
+	do {                                                            \
+		WARN((native) != (expected),                            \
+		     "raw %ld (0x%lx) != expected %ld (0x%lx)",         \
+		     (long)(native), (long)(native),                    \
+		     (long)(expected), (long)(expected));               \
+		WARN(__this_cpu_read(pcp) != (expected),                \
+		     "pcp %ld (0x%lx) != expected %ld (0x%lx)",         \
+		     (long)__this_cpu_read(pcp), (long)__this_cpu_read(pcp), \
+		     (long)(expected), (long)(expected));               \
+	} while (0)
+
+static DEFINE_PER_CPU(long, long_counter);
+static DEFINE_PER_CPU(unsigned long, ulong_counter);
+
+static int __init percpu_test_init(void)
+{
+	/*
+	 * volatile prevents compiler from optimizing it uses, otherwise the
+	 * +ul_one and -ul_one below would replace with inc/dec instructions.
+	 */
+	volatile unsigned int ui_one = 1;
+	long l = 0;
+	unsigned long ul = 0;
+
+	pr_info("percpu test start\n");
+
+	preempt_disable();
+
+	l += -1;
+	__this_cpu_add(long_counter, -1);
+	CHECK(l, long_counter, -1);
+
+	l += 1;
+	__this_cpu_add(long_counter, 1);
+	CHECK(l, long_counter, 0);
+
+	ul = 0;
+	__this_cpu_write(ulong_counter, 0);
+
+	ul += 1UL;
+	__this_cpu_add(ulong_counter, 1UL);
+	CHECK(ul, ulong_counter, 1);
+
+	ul += -1UL;
+	__this_cpu_add(ulong_counter, -1UL);
+	CHECK(ul, ulong_counter, 0);
+
+	ul += -(unsigned long)1;
+	__this_cpu_add(ulong_counter, -(unsigned long)1);
+	CHECK(ul, ulong_counter, -1);
+
+	ul = 0;
+	__this_cpu_write(ulong_counter, 0);
+
+	ul -= 1;
+	__this_cpu_dec(ulong_counter);
+	CHECK(ul, ulong_counter, 0xffffffffffffffff);
+	CHECK(ul, ulong_counter, -1);
+
+	l += -ui_one;
+	__this_cpu_add(long_counter, -ui_one);
+	CHECK(l, long_counter, 0xffffffff);
+
+	l += ui_one;
+	__this_cpu_add(long_counter, ui_one);
+	CHECK(l, long_counter, 0x100000000);
+
+
+	l = 0;
+	__this_cpu_write(long_counter, 0);
+
+	l -= ui_one;
+	__this_cpu_sub(long_counter, ui_one);
+	CHECK(l, long_counter, -1);
+
+	l = 0;
+	__this_cpu_write(long_counter, 0);
+
+	l += ui_one;
+	__this_cpu_add(long_counter, ui_one);
+	CHECK(l, long_counter, 1);
+
+	l += -ui_one;
+	__this_cpu_add(long_counter, -ui_one);
+	CHECK(l, long_counter, 0x100000000);
+
+	l = 0;
+	__this_cpu_write(long_counter, 0);
+
+	l -= ui_one;
+	this_cpu_sub(long_counter, ui_one);
+	CHECK(l, long_counter, -1);
+	CHECK(l, long_counter, 0xffffffffffffffff);
+
+	ul = 0;
+	__this_cpu_write(ulong_counter, 0);
+
+	ul += ui_one;
+	__this_cpu_add(ulong_counter, ui_one);
+	CHECK(ul, ulong_counter, 1);
+
+	ul = 0;
+	__this_cpu_write(ulong_counter, 0);
+
+	ul -= ui_one;
+	__this_cpu_sub(ulong_counter, ui_one);
+	CHECK(ul, ulong_counter, -1);
+	CHECK(ul, ulong_counter, 0xffffffffffffffff);
+
+	ul = 3;
+	__this_cpu_write(ulong_counter, 3);
+
+	ul = this_cpu_sub_return(ulong_counter, ui_one);
+	CHECK(ul, ulong_counter, 2);
+
+	ul = __this_cpu_sub_return(ulong_counter, ui_one);
+	CHECK(ul, ulong_counter, 0);
+
+	preempt_enable();
+
+	pr_info("percpu test done\n");
+	return -EAGAIN;  /* Fail will directly unload the module */
+}
+
+static void __exit percpu_test_exit(void)
+{
+}
+
+module_init(percpu_test_init)
+module_exit(percpu_test_exit)
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Greg Thelen");
+MODULE_DESCRIPTION("percpu variables test");
-- 
1.8.4.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/3] percpu counter: cast this_cpu_sub() adjustment
  2013-10-27  7:44 [PATCH 0/3] fix unsigned pcp adjustments Greg Thelen
  2013-10-27  7:44 ` [PATCH 1/3] percpu counter: test module Greg Thelen
@ 2013-10-27  7:44 ` Greg Thelen
  2013-10-27 11:22   ` Tejun Heo
  2013-10-27 17:12   ` Greg Thelen
  2013-10-27  7:44 ` [PATCH 3/3] memcg: use __this_cpu_sub to decrement stats Greg Thelen
  2 siblings, 2 replies; 11+ messages in thread
From: Greg Thelen @ 2013-10-27  7:44 UTC (permalink / raw)
  To: Tejun Heo, Christoph Lameter, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Johannes Weiner, Michal Hocko, Balbir Singh,
	KAMEZAWA Hiroyuki, handai.szj, Andrew Morton
  Cc: x86, linux-kernel, cgroups, linux-mm, Greg Thelen

this_cpu_sub() is implemented as negation and addition.

This patch casts the adjustment to the counter type before negation to
sign extend the adjustment.  This helps in cases where the counter
type is wider than an unsigned adjustment.  An alternative to this
patch is to declare such operations unsupported, but it seemed useful
to avoid surprises.

This patch specifically helps the following example:
  unsigned int delta = 1
  preempt_disable()
  this_cpu_write(long_counter, 0)
  this_cpu_sub(long_counter, delta)
  preempt_enable()

Before this change long_counter on a 64 bit machine ends with value
0xffffffff, rather than 0xffffffffffffffff.  This is because
this_cpu_sub(pcp, delta) boils down to this_cpu_add(pcp, -delta),
which is basically:
  long_counter = 0 + 0xffffffff

Also apply the same cast to:
  __this_cpu_sub()
  this_cpu_sub_return()
  and __this_cpu_sub_return()

All percpu_test.ko passes, especially the following cases which
previously failed:

  l -= ui_one;
  __this_cpu_sub(long_counter, ui_one);
  CHECK(l, long_counter, -1);

  l -= ui_one;
  this_cpu_sub(long_counter, ui_one);
  CHECK(l, long_counter, -1);
  CHECK(l, long_counter, 0xffffffffffffffff);

  ul -= ui_one;
  __this_cpu_sub(ulong_counter, ui_one);
  CHECK(ul, ulong_counter, -1);
  CHECK(ul, ulong_counter, 0xffffffffffffffff);

  ul = this_cpu_sub_return(ulong_counter, ui_one);
  CHECK(ul, ulong_counter, 2);

  ul = __this_cpu_sub_return(ulong_counter, ui_one);
  CHECK(ul, ulong_counter, 1);

Signed-off-by: Greg Thelen <gthelen@google.com>
---
 arch/x86/include/asm/percpu.h | 3 ++-
 include/linux/percpu.h        | 8 ++++----
 lib/percpu_test.c             | 2 +-
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
index 0da5200..b3e18f8 100644
--- a/arch/x86/include/asm/percpu.h
+++ b/arch/x86/include/asm/percpu.h
@@ -128,7 +128,8 @@ do {							\
 do {									\
 	typedef typeof(var) pao_T__;					\
 	const int pao_ID__ = (__builtin_constant_p(val) &&		\
-			      ((val) == 1 || (val) == -1)) ? (val) : 0;	\
+			      ((val) == 1 || (val) == -1)) ?		\
+				(int)(val) : 0;				\
 	if (0) {							\
 		pao_T__ pao_tmp__;					\
 		pao_tmp__ = (val);					\
diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index cc88172..c74088a 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -332,7 +332,7 @@ do {									\
 #endif
 
 #ifndef this_cpu_sub
-# define this_cpu_sub(pcp, val)		this_cpu_add((pcp), -(val))
+# define this_cpu_sub(pcp, val)		this_cpu_add((pcp), -(typeof(pcp))(val))
 #endif
 
 #ifndef this_cpu_inc
@@ -418,7 +418,7 @@ do {									\
 # define this_cpu_add_return(pcp, val)	__pcpu_size_call_return2(this_cpu_add_return_, pcp, val)
 #endif
 
-#define this_cpu_sub_return(pcp, val)	this_cpu_add_return(pcp, -(val))
+#define this_cpu_sub_return(pcp, val)	this_cpu_add_return(pcp, -(typeof(pcp))(val))
 #define this_cpu_inc_return(pcp)	this_cpu_add_return(pcp, 1)
 #define this_cpu_dec_return(pcp)	this_cpu_add_return(pcp, -1)
 
@@ -586,7 +586,7 @@ do {									\
 #endif
 
 #ifndef __this_cpu_sub
-# define __this_cpu_sub(pcp, val)	__this_cpu_add((pcp), -(val))
+# define __this_cpu_sub(pcp, val)	__this_cpu_add((pcp), -(typeof(pcp))(val))
 #endif
 
 #ifndef __this_cpu_inc
@@ -668,7 +668,7 @@ do {									\
 	__pcpu_size_call_return2(__this_cpu_add_return_, pcp, val)
 #endif
 
-#define __this_cpu_sub_return(pcp, val)	__this_cpu_add_return(pcp, -(val))
+#define __this_cpu_sub_return(pcp, val)	__this_cpu_add_return(pcp, -(typeof(pcp))(val))
 #define __this_cpu_inc_return(pcp)	__this_cpu_add_return(pcp, 1)
 #define __this_cpu_dec_return(pcp)	__this_cpu_add_return(pcp, -1)
 
diff --git a/lib/percpu_test.c b/lib/percpu_test.c
index 1ebeb44..8ab4231 100644
--- a/lib/percpu_test.c
+++ b/lib/percpu_test.c
@@ -118,7 +118,7 @@ static int __init percpu_test_init(void)
 	CHECK(ul, ulong_counter, 2);
 
 	ul = __this_cpu_sub_return(ulong_counter, ui_one);
-	CHECK(ul, ulong_counter, 0);
+	CHECK(ul, ulong_counter, 1);
 
 	preempt_enable();
 
-- 
1.8.4.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/3] memcg: use __this_cpu_sub to decrement stats
  2013-10-27  7:44 [PATCH 0/3] fix unsigned pcp adjustments Greg Thelen
  2013-10-27  7:44 ` [PATCH 1/3] percpu counter: test module Greg Thelen
  2013-10-27  7:44 ` [PATCH 2/3] percpu counter: cast this_cpu_sub() adjustment Greg Thelen
@ 2013-10-27  7:44 ` Greg Thelen
  2013-10-27 11:24   ` Tejun Heo
  2 siblings, 1 reply; 11+ messages in thread
From: Greg Thelen @ 2013-10-27  7:44 UTC (permalink / raw)
  To: Tejun Heo, Christoph Lameter, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Johannes Weiner, Michal Hocko, Balbir Singh,
	KAMEZAWA Hiroyuki, handai.szj, Andrew Morton
  Cc: x86, linux-kernel, cgroups, linux-mm, Greg Thelen

As of v3.11-9444-g3ea67d0 "memcg: add per cgroup writeback pages
accounting" memcg counter errors are possible when moving charged
memory to a different memcg.  Charge movement occurs when processing
writes to memory.force_empty, moving tasks to a memcg with
memcg.move_charge_at_immigrate=1, or memcg deletion.  An example
showing error after memory.force_empty:
  $ cd /sys/fs/cgroup/memory
  $ mkdir x
  $ rm /data/tmp/file
  $ (echo $BASHPID >> x/tasks && exec mmap_writer /data/tmp/file 1M) &
  [1] 13600
  $ grep ^mapped x/memory.stat
  mapped_file 1048576
  $ echo 13600 > tasks
  $ echo 1 > x/memory.force_empty
  $ grep ^mapped x/memory.stat
  mapped_file 4503599627370496

mapped_file should end with 0.
  4503599627370496 == 0x10,0000,0000,0000 == 0x100,0000,0000 pages
  1048576          == 0x10,0000           == 0x100 pages

This issue only affects the source memcg on 64 bit machines; the
destination memcg counters are correct.  So the rmdir case is not too
important because such counters are soon disappearing with the entire
memcg.  But the memcg.force_empty and
memory.move_charge_at_immigrate=1 cases are larger problems as the
bogus counters are visible for the (possibly long) remaining life of
the source memcg.

The problem is due to memcg use of __this_cpu_from(.., -nr_pages),
which is subtly wrong because it subtracts the unsigned int nr_pages
(either -1 or -512 for THP) from a signed long percpu counter.  When
nr_pages=-1, -nr_pages=0xffffffff.  On 64 bit machines
stat->count[idx] is signed 64 bit.  So memcg's attempt to simply
decrement a count (e.g. from 1 to 0) boils down to:
  long count = 1
  unsigned int nr_pages = 1
  count += -nr_pages  /* -nr_pages == 0xffff,ffff */
  count is now 0x1,0000,0000 instead of 0

The fix is to subtract the unsigned page count rather than adding its
negation.  This only works with the "percpu counter: cast
this_cpu_sub() adjustment" patch which fixes this_cpu_sub().

Signed-off-by: Greg Thelen <gthelen@google.com>
---
 mm/memcontrol.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index aa8185c..b7ace0f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3773,7 +3773,7 @@ void mem_cgroup_move_account_page_stat(struct mem_cgroup *from,
 {
 	/* Update stat data for mem_cgroup */
 	preempt_disable();
-	__this_cpu_add(from->stat->count[idx], -nr_pages);
+	__this_cpu_sub(from->stat->count[idx], nr_pages);
 	__this_cpu_add(to->stat->count[idx], nr_pages);
 	preempt_enable();
 }
-- 
1.8.4.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] percpu counter: test module
  2013-10-27  7:44 ` [PATCH 1/3] percpu counter: test module Greg Thelen
@ 2013-10-27 11:18   ` Tejun Heo
  0 siblings, 0 replies; 11+ messages in thread
From: Tejun Heo @ 2013-10-27 11:18 UTC (permalink / raw)
  To: Greg Thelen
  Cc: Christoph Lameter, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Johannes Weiner, Michal Hocko, Balbir Singh, KAMEZAWA Hiroyuki,
	handai.szj, Andrew Morton, x86, linux-kernel, cgroups, linux-mm

On Sun, Oct 27, 2013 at 12:44:34AM -0700, Greg Thelen wrote:
> Tests various percpu operations.
> 
> Enable with CONFIG_PERCPU_TEST=m.
> 
> Signed-off-by: Greg Thelen <gthelen@google.com>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] percpu counter: cast this_cpu_sub() adjustment
  2013-10-27  7:44 ` [PATCH 2/3] percpu counter: cast this_cpu_sub() adjustment Greg Thelen
@ 2013-10-27 11:22   ` Tejun Heo
  2013-10-27 12:04     ` Andrew Morton
  2013-10-27 17:12   ` Greg Thelen
  1 sibling, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2013-10-27 11:22 UTC (permalink / raw)
  To: Greg Thelen
  Cc: Christoph Lameter, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Johannes Weiner, Michal Hocko, Balbir Singh, KAMEZAWA Hiroyuki,
	handai.szj, Andrew Morton, x86, linux-kernel, cgroups, linux-mm

On Sun, Oct 27, 2013 at 12:44:35AM -0700, Greg Thelen wrote:
> this_cpu_sub() is implemented as negation and addition.
> 
> This patch casts the adjustment to the counter type before negation to
> sign extend the adjustment.  This helps in cases where the counter
> type is wider than an unsigned adjustment.  An alternative to this
> patch is to declare such operations unsupported, but it seemed useful
> to avoid surprises.
> 
> This patch specifically helps the following example:
>   unsigned int delta = 1
>   preempt_disable()
>   this_cpu_write(long_counter, 0)
>   this_cpu_sub(long_counter, delta)
>   preempt_enable()
> 
> Before this change long_counter on a 64 bit machine ends with value
> 0xffffffff, rather than 0xffffffffffffffff.  This is because
> this_cpu_sub(pcp, delta) boils down to this_cpu_add(pcp, -delta),
> which is basically:
>   long_counter = 0 + 0xffffffff
> 
> Also apply the same cast to:
>   __this_cpu_sub()
>   this_cpu_sub_return()
>   and __this_cpu_sub_return()
> 
> All percpu_test.ko passes, especially the following cases which
> previously failed:
> 
>   l -= ui_one;
>   __this_cpu_sub(long_counter, ui_one);
>   CHECK(l, long_counter, -1);
> 
>   l -= ui_one;
>   this_cpu_sub(long_counter, ui_one);
>   CHECK(l, long_counter, -1);
>   CHECK(l, long_counter, 0xffffffffffffffff);
> 
>   ul -= ui_one;
>   __this_cpu_sub(ulong_counter, ui_one);
>   CHECK(ul, ulong_counter, -1);
>   CHECK(ul, ulong_counter, 0xffffffffffffffff);
> 
>   ul = this_cpu_sub_return(ulong_counter, ui_one);
>   CHECK(ul, ulong_counter, 2);
> 
>   ul = __this_cpu_sub_return(ulong_counter, ui_one);
>   CHECK(ul, ulong_counter, 1);
> 
> Signed-off-by: Greg Thelen <gthelen@google.com>

Ouch, nice catch.

 Acked-by: Tejun Heo <tj@kernel.org>

We probably want to cc stable for this and the next one.  How should
these be routed?  I can take these through percpu tree or mm works
too.  Either way, it'd be best to route them together.

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] memcg: use __this_cpu_sub to decrement stats
  2013-10-27  7:44 ` [PATCH 3/3] memcg: use __this_cpu_sub to decrement stats Greg Thelen
@ 2013-10-27 11:24   ` Tejun Heo
  0 siblings, 0 replies; 11+ messages in thread
From: Tejun Heo @ 2013-10-27 11:24 UTC (permalink / raw)
  To: Greg Thelen
  Cc: Christoph Lameter, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Johannes Weiner, Michal Hocko, Balbir Singh, KAMEZAWA Hiroyuki,
	handai.szj, Andrew Morton, x86, linux-kernel, cgroups, linux-mm

On Sun, Oct 27, 2013 at 12:44:36AM -0700, Greg Thelen wrote:
> As of v3.11-9444-g3ea67d0 "memcg: add per cgroup writeback pages
> accounting" memcg counter errors are possible when moving charged
> memory to a different memcg.  Charge movement occurs when processing
> writes to memory.force_empty, moving tasks to a memcg with
> memcg.move_charge_at_immigrate=1, or memcg deletion.  An example
> showing error after memory.force_empty:
>   $ cd /sys/fs/cgroup/memory
>   $ mkdir x
>   $ rm /data/tmp/file
>   $ (echo $BASHPID >> x/tasks && exec mmap_writer /data/tmp/file 1M) &
>   [1] 13600
>   $ grep ^mapped x/memory.stat
>   mapped_file 1048576
>   $ echo 13600 > tasks
>   $ echo 1 > x/memory.force_empty
>   $ grep ^mapped x/memory.stat
>   mapped_file 4503599627370496
> 
> mapped_file should end with 0.
>   4503599627370496 == 0x10,0000,0000,0000 == 0x100,0000,0000 pages
>   1048576          == 0x10,0000           == 0x100 pages
> 
> This issue only affects the source memcg on 64 bit machines; the
> destination memcg counters are correct.  So the rmdir case is not too
> important because such counters are soon disappearing with the entire
> memcg.  But the memcg.force_empty and
> memory.move_charge_at_immigrate=1 cases are larger problems as the
> bogus counters are visible for the (possibly long) remaining life of
> the source memcg.
> 
> The problem is due to memcg use of __this_cpu_from(.., -nr_pages),
> which is subtly wrong because it subtracts the unsigned int nr_pages
> (either -1 or -512 for THP) from a signed long percpu counter.  When
> nr_pages=-1, -nr_pages=0xffffffff.  On 64 bit machines
> stat->count[idx] is signed 64 bit.  So memcg's attempt to simply
> decrement a count (e.g. from 1 to 0) boils down to:
>   long count = 1
>   unsigned int nr_pages = 1
>   count += -nr_pages  /* -nr_pages == 0xffff,ffff */
>   count is now 0x1,0000,0000 instead of 0
> 
> The fix is to subtract the unsigned page count rather than adding its
> negation.  This only works with the "percpu counter: cast
> this_cpu_sub() adjustment" patch which fixes this_cpu_sub().
> 
> Signed-off-by: Greg Thelen <gthelen@google.com>

 Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] percpu counter: cast this_cpu_sub() adjustment
  2013-10-27 11:22   ` Tejun Heo
@ 2013-10-27 12:04     ` Andrew Morton
  2013-10-27 13:00       ` Tejun Heo
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2013-10-27 12:04 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Greg Thelen, Christoph Lameter, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Johannes Weiner, Michal Hocko, Balbir Singh,
	KAMEZAWA Hiroyuki, handai.szj, x86, linux-kernel, cgroups,
	linux-mm

On Sun, 27 Oct 2013 07:22:55 -0400 Tejun Heo <tj@kernel.org> wrote:

> We probably want to cc stable for this and the next one.  How should
> these be routed?  I can take these through percpu tree or mm works
> too.  Either way, it'd be best to route them together.

Yes, all three look like -stable material to me.  I'll grab them later
in the week if you haven't ;)

The names of the first two patches distress me.  They rather clearly
assert that the code affects percpu_counter.[ch], but that is not the case. 
Massaging is needed to fix that up.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] percpu counter: cast this_cpu_sub() adjustment
  2013-10-27 12:04     ` Andrew Morton
@ 2013-10-27 13:00       ` Tejun Heo
  2013-10-27 16:13         ` Greg Thelen
  0 siblings, 1 reply; 11+ messages in thread
From: Tejun Heo @ 2013-10-27 13:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Greg Thelen, Christoph Lameter, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Johannes Weiner, Michal Hocko, Balbir Singh,
	KAMEZAWA Hiroyuki, handai.szj, x86, linux-kernel, cgroups,
	linux-mm

On Sun, Oct 27, 2013 at 05:04:29AM -0700, Andrew Morton wrote:
> On Sun, 27 Oct 2013 07:22:55 -0400 Tejun Heo <tj@kernel.org> wrote:
> 
> > We probably want to cc stable for this and the next one.  How should
> > these be routed?  I can take these through percpu tree or mm works
> > too.  Either way, it'd be best to route them together.
> 
> Yes, all three look like -stable material to me.  I'll grab them later
> in the week if you haven't ;)

Tried to apply to percpu but the third one is a fix for a patch which
was added to -mm during v3.12-rc1, so these are yours. :)

> The names of the first two patches distress me.  They rather clearly
> assert that the code affects percpu_counter.[ch], but that is not the case. 
> Massaging is needed to fix that up.

Yeah, something like the following would be better

 percpu: add test module for various percpu operations
 percpu: fix this_cpu_sub() subtrahend casting for unsigneds
 memcg: use __this_cpu_sub() to dec stats to avoid incorrect subtrahend casting

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] percpu counter: cast this_cpu_sub() adjustment
  2013-10-27 13:00       ` Tejun Heo
@ 2013-10-27 16:13         ` Greg Thelen
  0 siblings, 0 replies; 11+ messages in thread
From: Greg Thelen @ 2013-10-27 16:13 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Andrew Morton, Christoph Lameter, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Johannes Weiner, Michal Hocko, Balbir Singh,
	KAMEZAWA Hiroyuki, handai.szj, x86, linux-kernel, cgroups,
	linux-mm

On Sun, Oct 27 2013, Tejun Heo wrote:

> On Sun, Oct 27, 2013 at 05:04:29AM -0700, Andrew Morton wrote:
>> On Sun, 27 Oct 2013 07:22:55 -0400 Tejun Heo <tj@kernel.org> wrote:
>> 
>> > We probably want to cc stable for this and the next one.  How should
>> > these be routed?  I can take these through percpu tree or mm works
>> > too.  Either way, it'd be best to route them together.
>> 
>> Yes, all three look like -stable material to me.  I'll grab them later
>> in the week if you haven't ;)
>
> Tried to apply to percpu but the third one is a fix for a patch which
> was added to -mm during v3.12-rc1, so these are yours. :)

I don't object to stable for the first two non-memcg patches, but it's
probably unnecessary.  I should have made it more clear, but an audit of
v3.12-rc6 shows that only new memcg code is affected - the new
mem_cgroup_move_account_page_stat() is the only place where an unsigned
adjustment is used.  All other callers (e.g. shrink_dcache_sb) already
use a signed adjustment, so no problems before v3.12.  Though I did not
audit the stable kernel trees, so there could be something hiding in
there.

>> The names of the first two patches distress me.  They rather clearly
>> assert that the code affects percpu_counter.[ch], but that is not the case. 
>> Massaging is needed to fix that up.
>
> Yeah, something like the following would be better
>
>  percpu: add test module for various percpu operations
>  percpu: fix this_cpu_sub() subtrahend casting for unsigneds
>  memcg: use __this_cpu_sub() to dec stats to avoid incorrect subtrahend casting

No objection to renaming.  Let me know if you want these reposed with
updated titles.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] percpu counter: cast this_cpu_sub() adjustment
  2013-10-27  7:44 ` [PATCH 2/3] percpu counter: cast this_cpu_sub() adjustment Greg Thelen
  2013-10-27 11:22   ` Tejun Heo
@ 2013-10-27 17:12   ` Greg Thelen
  1 sibling, 0 replies; 11+ messages in thread
From: Greg Thelen @ 2013-10-27 17:12 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Christoph Lameter, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Johannes Weiner, Michal Hocko, Balbir Singh, KAMEZAWA Hiroyuki,
	handai.szj, Andrew Morton, x86, linux-kernel, cgroups, linux-mm

On Sun, Oct 27 2013, Greg Thelen wrote:

> this_cpu_sub() is implemented as negation and addition.
>
> This patch casts the adjustment to the counter type before negation to
> sign extend the adjustment.  This helps in cases where the counter
> type is wider than an unsigned adjustment.  An alternative to this
> patch is to declare such operations unsupported, but it seemed useful
> to avoid surprises.
>
> This patch specifically helps the following example:
>   unsigned int delta = 1
>   preempt_disable()
>   this_cpu_write(long_counter, 0)
>   this_cpu_sub(long_counter, delta)
>   preempt_enable()
>
> Before this change long_counter on a 64 bit machine ends with value
> 0xffffffff, rather than 0xffffffffffffffff.  This is because
> this_cpu_sub(pcp, delta) boils down to this_cpu_add(pcp, -delta),
> which is basically:
>   long_counter = 0 + 0xffffffff
>
> Also apply the same cast to:
>   __this_cpu_sub()
>   this_cpu_sub_return()
>   and __this_cpu_sub_return()
>
> All percpu_test.ko passes, especially the following cases which
> previously failed:
>
>   l -= ui_one;
>   __this_cpu_sub(long_counter, ui_one);
>   CHECK(l, long_counter, -1);
>
>   l -= ui_one;
>   this_cpu_sub(long_counter, ui_one);
>   CHECK(l, long_counter, -1);
>   CHECK(l, long_counter, 0xffffffffffffffff);
>
>   ul -= ui_one;
>   __this_cpu_sub(ulong_counter, ui_one);
>   CHECK(ul, ulong_counter, -1);
>   CHECK(ul, ulong_counter, 0xffffffffffffffff);
>
>   ul = this_cpu_sub_return(ulong_counter, ui_one);
>   CHECK(ul, ulong_counter, 2);
>
>   ul = __this_cpu_sub_return(ulong_counter, ui_one);
>   CHECK(ul, ulong_counter, 1);
>
> Signed-off-by: Greg Thelen <gthelen@google.com>
> ---
>  arch/x86/include/asm/percpu.h | 3 ++-
>  include/linux/percpu.h        | 8 ++++----
>  lib/percpu_test.c             | 2 +-
>  3 files changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h
> index 0da5200..b3e18f8 100644
> --- a/arch/x86/include/asm/percpu.h
> +++ b/arch/x86/include/asm/percpu.h
> @@ -128,7 +128,8 @@ do {							\
>  do {									\
>  	typedef typeof(var) pao_T__;					\
>  	const int pao_ID__ = (__builtin_constant_p(val) &&		\
> -			      ((val) == 1 || (val) == -1)) ? (val) : 0;	\
> +			      ((val) == 1 || (val) == -1)) ?		\
> +				(int)(val) : 0;				\
>  	if (0) {							\
>  		pao_T__ pao_tmp__;					\
>  		pao_tmp__ = (val);					\
> diff --git a/include/linux/percpu.h b/include/linux/percpu.h
> index cc88172..c74088a 100644
> --- a/include/linux/percpu.h
> +++ b/include/linux/percpu.h
> @@ -332,7 +332,7 @@ do {									\
>  #endif
>  
>  #ifndef this_cpu_sub
> -# define this_cpu_sub(pcp, val)		this_cpu_add((pcp), -(val))
> +# define this_cpu_sub(pcp, val)		this_cpu_add((pcp), -(typeof(pcp))(val))
>  #endif
>  
>  #ifndef this_cpu_inc
> @@ -418,7 +418,7 @@ do {									\
>  # define this_cpu_add_return(pcp, val)	__pcpu_size_call_return2(this_cpu_add_return_, pcp, val)
>  #endif
>  
> -#define this_cpu_sub_return(pcp, val)	this_cpu_add_return(pcp, -(val))
> +#define this_cpu_sub_return(pcp, val)	this_cpu_add_return(pcp, -(typeof(pcp))(val))
>  #define this_cpu_inc_return(pcp)	this_cpu_add_return(pcp, 1)
>  #define this_cpu_dec_return(pcp)	this_cpu_add_return(pcp, -1)
>  
> @@ -586,7 +586,7 @@ do {									\
>  #endif
>  
>  #ifndef __this_cpu_sub
> -# define __this_cpu_sub(pcp, val)	__this_cpu_add((pcp), -(val))
> +# define __this_cpu_sub(pcp, val)	__this_cpu_add((pcp), -(typeof(pcp))(val))
>  #endif
>  
>  #ifndef __this_cpu_inc
> @@ -668,7 +668,7 @@ do {									\
>  	__pcpu_size_call_return2(__this_cpu_add_return_, pcp, val)
>  #endif
>  
> -#define __this_cpu_sub_return(pcp, val)	__this_cpu_add_return(pcp, -(val))
> +#define __this_cpu_sub_return(pcp, val)	__this_cpu_add_return(pcp, -(typeof(pcp))(val))
>  #define __this_cpu_inc_return(pcp)	__this_cpu_add_return(pcp, 1)
>  #define __this_cpu_dec_return(pcp)	__this_cpu_add_return(pcp, -1)
>  
> diff --git a/lib/percpu_test.c b/lib/percpu_test.c
> index 1ebeb44..8ab4231 100644
> --- a/lib/percpu_test.c
> +++ b/lib/percpu_test.c
> @@ -118,7 +118,7 @@ static int __init percpu_test_init(void)
>  	CHECK(ul, ulong_counter, 2);
>  
>  	ul = __this_cpu_sub_return(ulong_counter, ui_one);
> -	CHECK(ul, ulong_counter, 0);
> +	CHECK(ul, ulong_counter, 1);
>  
>  	preempt_enable();

Oops.  This update to percpu_test.c doesn't belong in this patch.  It
should be moved to the earlier patch 1/3.  I'll repost the series with
this update and the suggested changes to patch titles.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2013-10-27 17:12 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-10-27  7:44 [PATCH 0/3] fix unsigned pcp adjustments Greg Thelen
2013-10-27  7:44 ` [PATCH 1/3] percpu counter: test module Greg Thelen
2013-10-27 11:18   ` Tejun Heo
2013-10-27  7:44 ` [PATCH 2/3] percpu counter: cast this_cpu_sub() adjustment Greg Thelen
2013-10-27 11:22   ` Tejun Heo
2013-10-27 12:04     ` Andrew Morton
2013-10-27 13:00       ` Tejun Heo
2013-10-27 16:13         ` Greg Thelen
2013-10-27 17:12   ` Greg Thelen
2013-10-27  7:44 ` [PATCH 3/3] memcg: use __this_cpu_sub to decrement stats Greg Thelen
2013-10-27 11:24   ` Tejun Heo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).