linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] pcpctr: fix percpu_counter_sum vs cpu offline race
@ 2023-03-15  8:49 Dave Chinner
  2023-03-15  8:49 ` [PATCH 1/4] cpumask: introduce for_each_cpu_or Dave Chinner
                   ` (5 more replies)
  0 siblings, 6 replies; 12+ messages in thread
From: Dave Chinner @ 2023-03-15  8:49 UTC (permalink / raw)
  To: linux-kernel, linux-xfs; +Cc: linux-mm, linux-fsdevel, yebin10

Ye Bin reported an XFS assert failure when testing CPU hotplug
recently. During unmount, XFs was asserting that a percpu counter
value should be zero because at that point in time a non-zero value
indicates a space accounting leak which is a bug. The details of
that failure can be found here:

https://lore.kernel.org/linux-kernel/20230314090649.326642-1-yebin@huaweicloud.com/

Ye Bin then proposed changing the XFS code to use
percpu_counter_sum_all(), which surprised me because I didn't know
that function existed at all. Indeed, it was only merged in the
recent 6.3-rc1 merge window because someone else had noticed a
pcpctr sum race with hotplug.

commit f689054aace2 ("percpu_counter: add percpu_counter_sum_all
interface") was introduced via the mm tree. Nobody outside that
scope knew about this, because who bothers to even try to read LKML
these days? There was little list discussion, and I don't see
anything other than a cursory review done on the patch.

At minimum, linux-fsdevel should have been cc'd because multiple
filesystems use percpu counters for both threshold and ENOSPC
accounting in filesystems.  Hence if there is a problem with
percpu_counter_sum() leaking, filesystem developers kinda need to
know about it because leaks like this (as per the XFS bug report)
can actually result in on-disk corruption occurring.

So, now I know that there is an accuracy problem with
percpu_counter_sum(), I will assert that we need to fix it properly
rathern than hack around it by adding a new variant. Leaving people
who know nothing about cpu hotplug to try to work out if they have a
hotplug related issue with their use of percpu_counter_sum() is just
bad code; percpu_counter_sum() should just Do The Right Thing.

Use of the cpu_dying_mask should effectively close this race
condition.  That is, when we take a CPU offline we effectively do:

	mark cpu dying
	clear cpu from cpu_online_mask
	run cpu dead callbacks
	  ....
	  <lock counter>
	  fold pcp count into fbc->count
	  clear pcp count
	  <unlock counter>
	  ...
	mark CPU dead
	clear cpu dying

The race condition occurs because we can run a _sum operation
between the "clear cpu online" mask update and the "pcpctr cpu dead"
notification runs and fold the pcp counter values back into the
global count.  The sum sees that the CPU is not online, so it skips
that CPU even though the count is not zero and hasn't been folded by
the CPU dead notifier. Hence it skips something that it shouldn't.

However, that race condition doesn't exist if we take cpu_dying_mask
into account during the sum.  i.e. percpu_counter_sum() should
iterate every CPU set in either the cpu_online_mask and the
cpu_dying_mask to capture CPUs that are being taken offline.

If the cpu is not set in the dying mask, then the online or offline
state of the CPU is correct an there is no notifier pending over
running and we will skip/sum it correctly.

If the CPU is set in the dying mask, then we need to sum it
regardless of the online mask state or even whether the cpu dead
notifier has run.  If the sum wins the race to the pcp counter on
the dying CPU, it is included in the local sum from the pcp
variable. If the notifier wins the race, it gets folded back into
the global count and zeroed before the sum runs. Then the sum
includes the count in the local sum from the global counter sum
rather than the percpu counter.

Either way, we get the same correct sum value from
percpu_counter_sum() regardless of how it races with a CPU being
removed from the system. And there is no need for
percpu_count_sum_all() anymore.

This series introduces bitmap operations for finding bits set in
either of two bitmasks and adds the for_each_cpu_or() wrapper to
iterate CPUs set in either of the two supplied cpu masks. It then
converts __percpu_counter_sum_mask() to use this, and have
__percpu_counter_sum() pass the cpu_dying_mask as the second mask.
This fixes the race condition with CPUs dying.

It then converts the only user of percpu_counter_sum_all() to use
percpu_counter_sum() as percpu_counter_sum_all() is now redundant,
then it removes percpu_counter_sum_all() and recombines
__percpu_counter_sum_mask() and __percpu_counter_sum().

This effectively undoes all the changes in commit f689054aace2
except for the small change to use for_each_cpu_or() to fold in the
cpu_dying_mask made in this patch set to avoid the problematic race
condition. Hence the cpu unplug race condition is now correctly
handled by percpu_counter_sum(), and people can go back to being
blissfully ignorant of how pcpctrs interact with CPU hotplug (which
is how it should be!).

This has spent the last siz hours running generic/650 on XFS on a
couple of VMs (on 4p, the other 16p) which stresses the filesystem
by running a multi-process fsstress invocation whilst randomly
onlining and offlining CPUs. Hence it's exercising all the percpu
counter cpu dead paths whilst the filesystem is randomly modifying,
reading and summing summing the critical counters that XFS needs for
accurate accounting of resource consumption within the filesystem.

Thoughts, comments and testing welcome!

-Dave.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH 1/4] cpumask: introduce for_each_cpu_or
  2023-03-15  8:49 [PATCH 0/4] pcpctr: fix percpu_counter_sum vs cpu offline race Dave Chinner
@ 2023-03-15  8:49 ` Dave Chinner
  2023-03-15  8:49 ` [PATCH 2/4] pcpcntrs: fix dying cpu summation race Dave Chinner
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Dave Chinner @ 2023-03-15  8:49 UTC (permalink / raw)
  To: linux-kernel, linux-xfs; +Cc: linux-mm, linux-fsdevel, yebin10

From: Dave Chinner <dchinner@redhat.com>

Equivalent of for_each_cpu_and, except it ORs the two masks together
so it iterates all the CPUs present in either mask.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 include/linux/cpumask.h | 17 +++++++++++++++++
 include/linux/find.h    | 37 +++++++++++++++++++++++++++++++++++++
 lib/find_bit.c          |  9 +++++++++
 3 files changed, 63 insertions(+)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index d4901ca8883c..ca736b05ec7b 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -350,6 +350,23 @@ unsigned int __pure cpumask_next_wrap(int n, const struct cpumask *mask, int sta
 #define for_each_cpu_andnot(cpu, mask1, mask2)				\
 	for_each_andnot_bit(cpu, cpumask_bits(mask1), cpumask_bits(mask2), small_cpumask_bits)
 
+/**
+ * for_each_cpu_or - iterate over every cpu present in either mask
+ * @cpu: the (optionally unsigned) integer iterator
+ * @mask1: the first cpumask pointer
+ * @mask2: the second cpumask pointer
+ *
+ * This saves a temporary CPU mask in many places.  It is equivalent to:
+ *	struct cpumask tmp;
+ *	cpumask_or(&tmp, &mask1, &mask2);
+ *	for_each_cpu(cpu, &tmp)
+ *		...
+ *
+ * After the loop, cpu is >= nr_cpu_ids.
+ */
+#define for_each_cpu_or(cpu, mask1, mask2)				\
+	for_each_or_bit(cpu, cpumask_bits(mask1), cpumask_bits(mask2), small_cpumask_bits)
+
 /**
  * cpumask_any_but - return a "random" in a cpumask, but not this one.
  * @mask: the cpumask to search
diff --git a/include/linux/find.h b/include/linux/find.h
index 4647864a5ffd..5e4f39ef2e72 100644
--- a/include/linux/find.h
+++ b/include/linux/find.h
@@ -14,6 +14,8 @@ unsigned long _find_next_and_bit(const unsigned long *addr1, const unsigned long
 					unsigned long nbits, unsigned long start);
 unsigned long _find_next_andnot_bit(const unsigned long *addr1, const unsigned long *addr2,
 					unsigned long nbits, unsigned long start);
+unsigned long _find_next_or_bit(const unsigned long *addr1, const unsigned long *addr2,
+					unsigned long nbits, unsigned long start);
 unsigned long _find_next_zero_bit(const unsigned long *addr, unsigned long nbits,
 					 unsigned long start);
 extern unsigned long _find_first_bit(const unsigned long *addr, unsigned long size);
@@ -127,6 +129,36 @@ unsigned long find_next_andnot_bit(const unsigned long *addr1,
 }
 #endif
 
+#ifndef find_next_or_bit
+/**
+ * find_next_or_bit - find the next set bit in either memory regions
+ * @addr1: The first address to base the search on
+ * @addr2: The second address to base the search on
+ * @size: The bitmap size in bits
+ * @offset: The bitnumber to start searching at
+ *
+ * Returns the bit number for the next set bit
+ * If no bits are set, returns @size.
+ */
+static inline
+unsigned long find_next_or_bit(const unsigned long *addr1,
+		const unsigned long *addr2, unsigned long size,
+		unsigned long offset)
+{
+	if (small_const_nbits(size)) {
+		unsigned long val;
+
+		if (unlikely(offset >= size))
+			return size;
+
+		val = (*addr1 | *addr2) & GENMASK(size - 1, offset);
+		return val ? __ffs(val) : size;
+	}
+
+	return _find_next_or_bit(addr1, addr2, size, offset);
+}
+#endif
+
 #ifndef find_next_zero_bit
 /**
  * find_next_zero_bit - find the next cleared bit in a memory region
@@ -536,6 +568,11 @@ unsigned long find_next_bit_le(const void *addr, unsigned
 	     (bit) = find_next_andnot_bit((addr1), (addr2), (size), (bit)), (bit) < (size);\
 	     (bit)++)
 
+#define for_each_or_bit(bit, addr1, addr2, size) \
+	for ((bit) = 0;									\
+	     (bit) = find_next_or_bit((addr1), (addr2), (size), (bit)), (bit) < (size);\
+	     (bit)++)
+
 /* same as for_each_set_bit() but use bit as value to start with */
 #define for_each_set_bit_from(bit, addr, size) \
 	for (; (bit) = find_next_bit((addr), (size), (bit)), (bit) < (size); (bit)++)
diff --git a/lib/find_bit.c b/lib/find_bit.c
index c10920e66788..32f99e9a670e 100644
--- a/lib/find_bit.c
+++ b/lib/find_bit.c
@@ -182,6 +182,15 @@ unsigned long _find_next_andnot_bit(const unsigned long *addr1, const unsigned l
 EXPORT_SYMBOL(_find_next_andnot_bit);
 #endif
 
+#ifndef find_next_or_bit
+unsigned long _find_next_or_bit(const unsigned long *addr1, const unsigned long *addr2,
+					unsigned long nbits, unsigned long start)
+{
+	return FIND_NEXT_BIT(addr1[idx] | addr2[idx], /* nop */, nbits, start);
+}
+EXPORT_SYMBOL(_find_next_or_bit);
+#endif
+
 #ifndef find_next_zero_bit
 unsigned long _find_next_zero_bit(const unsigned long *addr, unsigned long nbits,
 					 unsigned long start)
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 2/4] pcpcntrs: fix dying cpu summation race
  2023-03-15  8:49 [PATCH 0/4] pcpctr: fix percpu_counter_sum vs cpu offline race Dave Chinner
  2023-03-15  8:49 ` [PATCH 1/4] cpumask: introduce for_each_cpu_or Dave Chinner
@ 2023-03-15  8:49 ` Dave Chinner
  2023-03-15  8:49 ` [PATCH 3/4] fork: remove use of percpu_counter_sum_all Dave Chinner
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Dave Chinner @ 2023-03-15  8:49 UTC (permalink / raw)
  To: linux-kernel, linux-xfs; +Cc: linux-mm, linux-fsdevel, yebin10

From: Dave Chinner <dchinner@redhat.com>

In commit f689054aace2 ("percpu_counter: add percpu_counter_sum_all
interface") a race condition between a cpu dying and
percpu_counter_sum() iterating online CPUs was identified. The
solution was to iterate all possible CPUs for summation via
percpu_counter_sum_all().

We recently had a percpu_counter_sum() call in XFS trip over this
same race condition and it fired a debug assert because the
filesystem was unmounting and the counter *should* be zero just
before we destroy it. That was reported here:

https://lore.kernel.org/linux-kernel/20230314090649.326642-1-yebin@huaweicloud.com/

likely as a result of running generic/648 which exercises
filesystems in the presence of CPU online/offline events.

The solution to use percpu_counter_sum_all() is an awful one. We
use percpu counters and percpu_counter_sum() for accurate and
reliable threshold detection for space management, so a summation
race condition during these operations can result in overcommit of
available space and that may result in filesystem shutdowns.

As percpu_counter_sum_all() iterates all possible CPUs rather than
just those online or even those present, the mask can include CPUs
that aren't even installed in the machine, or in the case of
machines that can hot-plug CPU capable nodes, even have physical
sockets present in the machine.

Fundamentally, this race condition is caused by the CPU being
offlined being removed from the cpu_online_mask before the notifier
that cleans up per-cpu state is run. Hence percpu_counter_sum() will
not sum the count for a cpu currently being taken offline,
regardless of whether the notifier has run or not. This is
the root cause of the bug.

The percpu counter notifier iterates all the registered counters,
locks the counter and moves the percpu count to the global sum.
This is serialised against other operations that move the percpu
counter to the global sum as well as percpu_counter_sum() operations
that sum the percpu counts while holding the counter lock.

Hence the notifier is safe to run concurrently with sum operations,
and the only thing we actually need to care about is that
percpu_counter_sum() iterates dying CPUs. That's trivial to do,
and when there are no CPUs dying, it has no addition overhead except
for a cpumask_or() operation.

This change makes percpu_counter_sum() always do the right thing in
the presence of CPU hot unplug events and makes
percpu_counter_sum_all() unnecessary. This, in turn, means that
filesystems like XFS, ext4, and btrfs don't have to work out when
they should use percpu_counter_sum() vs percpu_counter_sum_all() in
their space accounting algorithms

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 lib/percpu_counter.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index dba56c5c1837..0e096311e0c0 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -131,7 +131,7 @@ static s64 __percpu_counter_sum_mask(struct percpu_counter *fbc,
 
 	raw_spin_lock_irqsave(&fbc->lock, flags);
 	ret = fbc->count;
-	for_each_cpu(cpu, cpu_mask) {
+	for_each_cpu_or(cpu, cpu_online_mask, cpu_mask) {
 		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
 		ret += *pcount;
 	}
@@ -141,11 +141,20 @@ static s64 __percpu_counter_sum_mask(struct percpu_counter *fbc,
 
 /*
  * Add up all the per-cpu counts, return the result.  This is a more accurate
- * but much slower version of percpu_counter_read_positive()
+ * but much slower version of percpu_counter_read_positive().
+ *
+ * We use the cpu mask of (cpu_online_mask | cpu_dying_mask) to capture sums
+ * from CPUs that are in the process of being taken offline. Dying cpus have
+ * been removed from the online mask, but may not have had the hotplug dead
+ * notifier called to fold the percpu count back into the global counter sum.
+ * By including dying CPUs in the iteration mask, we avoid this race condition
+ * so __percpu_counter_sum() just does the right thing when CPUs are being taken
+ * offline.
  */
 s64 __percpu_counter_sum(struct percpu_counter *fbc)
 {
-	return __percpu_counter_sum_mask(fbc, cpu_online_mask);
+
+	return __percpu_counter_sum_mask(fbc, cpu_dying_mask);
 }
 EXPORT_SYMBOL(__percpu_counter_sum);
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 3/4] fork: remove use of percpu_counter_sum_all
  2023-03-15  8:49 [PATCH 0/4] pcpctr: fix percpu_counter_sum vs cpu offline race Dave Chinner
  2023-03-15  8:49 ` [PATCH 1/4] cpumask: introduce for_each_cpu_or Dave Chinner
  2023-03-15  8:49 ` [PATCH 2/4] pcpcntrs: fix dying cpu summation race Dave Chinner
@ 2023-03-15  8:49 ` Dave Chinner
  2023-03-15  8:49 ` [PATCH 4/4] pcpcntr: remove percpu_counter_sum_all() Dave Chinner
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 12+ messages in thread
From: Dave Chinner @ 2023-03-15  8:49 UTC (permalink / raw)
  To: linux-kernel, linux-xfs; +Cc: linux-mm, linux-fsdevel, yebin10

From: Dave Chinner <dchinner@redhat.com>

This effectively reverts the change made in commit f689054aace2
("percpu_counter: add percpu_counter_sum_all interface") as the
race condition percpu_counter_sum_all() was invented to avoid is
now handled directly in percpu_counter_sum() and nobody needs to
care about summing racing with cpu unplug anymore.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 kernel/fork.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index d8cda4c6de6c..c0257cbee093 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -755,11 +755,6 @@ static void check_mm(struct mm_struct *mm)
 	for (i = 0; i < NR_MM_COUNTERS; i++) {
 		long x = percpu_counter_sum(&mm->rss_stat[i]);
 
-		if (likely(!x))
-			continue;
-
-		/* Making sure this is not due to race with CPU offlining. */
-		x = percpu_counter_sum_all(&mm->rss_stat[i]);
 		if (unlikely(x))
 			pr_alert("BUG: Bad rss-counter state mm:%p type:%s val:%ld\n",
 				 mm, resident_page_types[i], x);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 4/4] pcpcntr: remove percpu_counter_sum_all()
  2023-03-15  8:49 [PATCH 0/4] pcpctr: fix percpu_counter_sum vs cpu offline race Dave Chinner
                   ` (2 preceding siblings ...)
  2023-03-15  8:49 ` [PATCH 3/4] fork: remove use of percpu_counter_sum_all Dave Chinner
@ 2023-03-15  8:49 ` Dave Chinner
  2023-03-15 19:22   ` kernel test robot
  2023-03-15 20:14   ` kernel test robot
       [not found] ` <20230315233618.2168-1-hdanton@sina.com>
  2023-03-18  0:41 ` [PATCH 0/4] pcpctr: fix percpu_counter_sum vs cpu offline race Darrick J. Wong
  5 siblings, 2 replies; 12+ messages in thread
From: Dave Chinner @ 2023-03-15  8:49 UTC (permalink / raw)
  To: linux-kernel, linux-xfs; +Cc: linux-mm, linux-fsdevel, yebin10

From: Dave Chinner <dchinner@redhat.com>

percpu_counter_sum_all() is now redundant as the race condition it
was invented to handle is now dealt with by percpu_counter_sum()
directly and all users of percpu_counter_sum_all() have been
removed.

Remove it.

This effectively reverts the changes made in f689054aace2
("percpu_counter: add percpu_counter_sum_all interface") except for
the cpumask iteration that fixes percpu_counter_sum() made earlier
in this series.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 include/linux/percpu_counter.h |  6 -----
 lib/percpu_counter.c           | 40 ++++++++++------------------------
 2 files changed, 11 insertions(+), 35 deletions(-)

diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h
index 521a733e21a9..75b73c83bc9d 100644
--- a/include/linux/percpu_counter.h
+++ b/include/linux/percpu_counter.h
@@ -45,7 +45,6 @@ void percpu_counter_set(struct percpu_counter *fbc, s64 amount);
 void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount,
 			      s32 batch);
 s64 __percpu_counter_sum(struct percpu_counter *fbc);
-s64 percpu_counter_sum_all(struct percpu_counter *fbc);
 int __percpu_counter_compare(struct percpu_counter *fbc, s64 rhs, s32 batch);
 void percpu_counter_sync(struct percpu_counter *fbc);
 
@@ -196,11 +195,6 @@ static inline s64 percpu_counter_sum(struct percpu_counter *fbc)
 	return percpu_counter_read(fbc);
 }
 
-static inline s64 percpu_counter_sum_all(struct percpu_counter *fbc)
-{
-	return percpu_counter_read(fbc);
-}
-
 static inline bool percpu_counter_initialized(struct percpu_counter *fbc)
 {
 	return true;
diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index 0e096311e0c0..5004463c4f9f 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -122,23 +122,6 @@ void percpu_counter_sync(struct percpu_counter *fbc)
 }
 EXPORT_SYMBOL(percpu_counter_sync);
 
-static s64 __percpu_counter_sum_mask(struct percpu_counter *fbc,
-			      const struct cpumask *cpu_mask)
-{
-	s64 ret;
-	int cpu;
-	unsigned long flags;
-
-	raw_spin_lock_irqsave(&fbc->lock, flags);
-	ret = fbc->count;
-	for_each_cpu_or(cpu, cpu_online_mask, cpu_mask) {
-		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
-		ret += *pcount;
-	}
-	raw_spin_unlock_irqrestore(&fbc->lock, flags);
-	return ret;
-}
-
 /*
  * Add up all the per-cpu counts, return the result.  This is a more accurate
  * but much slower version of percpu_counter_read_positive().
@@ -153,22 +136,21 @@ static s64 __percpu_counter_sum_mask(struct percpu_counter *fbc,
  */
 s64 __percpu_counter_sum(struct percpu_counter *fbc)
 {
+	s64 ret;
+	int cpu;
+	unsigned long flags;
 
-	return __percpu_counter_sum_mask(fbc, cpu_dying_mask);
+	raw_spin_lock_irqsave(&fbc->lock, flags);
+	ret = fbc->count;
+	for_each_cpu_or(cpu, cpu_online_mask, cpu_dying_mask) {
+		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
+		ret += *pcount;
+	}
+	raw_spin_unlock_irqrestore(&fbc->lock, flags);
+	return ret;
 }
 EXPORT_SYMBOL(__percpu_counter_sum);
 
-/*
- * This is slower version of percpu_counter_sum as it traverses all possible
- * cpus. Use this only in the cases where accurate data is needed in the
- * presense of CPUs getting offlined.
- */
-s64 percpu_counter_sum_all(struct percpu_counter *fbc)
-{
-	return __percpu_counter_sum_mask(fbc, cpu_possible_mask);
-}
-EXPORT_SYMBOL(percpu_counter_sum_all);
-
 int __percpu_counter_init(struct percpu_counter *fbc, s64 amount, gfp_t gfp,
 			  struct lock_class_key *key)
 {
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 4/4] pcpcntr: remove percpu_counter_sum_all()
  2023-03-15  8:49 ` [PATCH 4/4] pcpcntr: remove percpu_counter_sum_all() Dave Chinner
@ 2023-03-15 19:22   ` kernel test robot
  2023-03-15 20:55     ` Dave Chinner
  2023-03-15 20:14   ` kernel test robot
  1 sibling, 1 reply; 12+ messages in thread
From: kernel test robot @ 2023-03-15 19:22 UTC (permalink / raw)
  To: Dave Chinner, linux-kernel, linux-xfs
  Cc: oe-kbuild-all, linux-mm, linux-fsdevel, yebin10

Hi Dave,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v6.3-rc2 next-20230315]
[cannot apply to dennis-percpu/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Dave-Chinner/cpumask-introduce-for_each_cpu_or/20230315-165202
patch link:    https://lore.kernel.org/r/20230315084938.2544737-5-david%40fromorbit.com
patch subject: [PATCH 4/4] pcpcntr: remove percpu_counter_sum_all()
config: i386-randconfig-a005 (https://download.01.org/0day-ci/archive/20230316/202303160333.XqIRz3JU-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/8360dcb55f1eb08fe7a1f457f3b99bef8e306c8b
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Dave-Chinner/cpumask-introduce-for_each_cpu_or/20230315-165202
        git checkout 8360dcb55f1eb08fe7a1f457f3b99bef8e306c8b
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=i386 olddefconfig
        make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash drivers/hwmon/ fs/xfs/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303160333.XqIRz3JU-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from include/linux/string.h:5,
                    from include/linux/uuid.h:11,
                    from fs/xfs/xfs_linux.h:10,
                    from fs/xfs/xfs.h:22,
                    from fs/xfs/xfs_super.c:7:
   fs/xfs/xfs_super.c: In function 'xfs_destroy_percpu_counters':
>> fs/xfs/xfs_super.c:1079:16: error: implicit declaration of function 'percpu_counter_sum_all'; did you mean 'percpu_counter_sum'? [-Werror=implicit-function-declaration]
    1079 |                percpu_counter_sum_all(&mp->m_delalloc_blks) == 0);
         |                ^~~~~~~~~~~~~~~~~~~~~~
   include/linux/compiler.h:77:45: note: in definition of macro 'likely'
      77 | # define likely(x)      __builtin_expect(!!(x), 1)
         |                                             ^
   fs/xfs/xfs_super.c:1078:9: note: in expansion of macro 'ASSERT'
    1078 |         ASSERT(xfs_is_shutdown(mp) ||
         |         ^~~~~~
   cc1: some warnings being treated as errors


vim +1079 fs/xfs/xfs_super.c

8757c38f2cf6e5 Ian Kent        2019-11-04  1070  
8757c38f2cf6e5 Ian Kent        2019-11-04  1071  static void
8757c38f2cf6e5 Ian Kent        2019-11-04  1072  xfs_destroy_percpu_counters(
8757c38f2cf6e5 Ian Kent        2019-11-04  1073  	struct xfs_mount	*mp)
8757c38f2cf6e5 Ian Kent        2019-11-04  1074  {
8757c38f2cf6e5 Ian Kent        2019-11-04  1075  	percpu_counter_destroy(&mp->m_icount);
8757c38f2cf6e5 Ian Kent        2019-11-04  1076  	percpu_counter_destroy(&mp->m_ifree);
8757c38f2cf6e5 Ian Kent        2019-11-04  1077  	percpu_counter_destroy(&mp->m_fdblocks);
75c8c50fa16a23 Dave Chinner    2021-08-18  1078  	ASSERT(xfs_is_shutdown(mp) ||
c35278f526edf1 Ye Bin          2023-03-14 @1079  	       percpu_counter_sum_all(&mp->m_delalloc_blks) == 0);
8757c38f2cf6e5 Ian Kent        2019-11-04  1080  	percpu_counter_destroy(&mp->m_delalloc_blks);
2229276c528326 Darrick J. Wong 2022-04-12  1081  	percpu_counter_destroy(&mp->m_frextents);
8757c38f2cf6e5 Ian Kent        2019-11-04  1082  }
8757c38f2cf6e5 Ian Kent        2019-11-04  1083  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 4/4] pcpcntr: remove percpu_counter_sum_all()
  2023-03-15  8:49 ` [PATCH 4/4] pcpcntr: remove percpu_counter_sum_all() Dave Chinner
  2023-03-15 19:22   ` kernel test robot
@ 2023-03-15 20:14   ` kernel test robot
  2023-03-15 21:21     ` Darrick J. Wong
  1 sibling, 1 reply; 12+ messages in thread
From: kernel test robot @ 2023-03-15 20:14 UTC (permalink / raw)
  To: Dave Chinner, linux-kernel, linux-xfs
  Cc: llvm, oe-kbuild-all, linux-mm, linux-fsdevel, yebin10

Hi Dave,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v6.3-rc2 next-20230315]
[cannot apply to dennis-percpu/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Dave-Chinner/cpumask-introduce-for_each_cpu_or/20230315-165202
patch link:    https://lore.kernel.org/r/20230315084938.2544737-5-david%40fromorbit.com
patch subject: [PATCH 4/4] pcpcntr: remove percpu_counter_sum_all()
config: riscv-randconfig-r042-20230313 (https://download.01.org/0day-ci/archive/20230316/202303160421.bnmiVRCM-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project 67409911353323ca5edf2049ef0df54132fa1ca7)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install riscv cross compiling tool for clang build
        # apt-get install binutils-riscv64-linux-gnu
        # https://github.com/intel-lab-lkp/linux/commit/8360dcb55f1eb08fe7a1f457f3b99bef8e306c8b
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Dave-Chinner/cpumask-introduce-for_each_cpu_or/20230315-165202
        git checkout 8360dcb55f1eb08fe7a1f457f3b99bef8e306c8b
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=riscv olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=riscv SHELL=/bin/bash fs/xfs/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303160421.bnmiVRCM-lkp@intel.com/

All errors (new ones prefixed by >>):

>> fs/xfs/xfs_super.c:1079:9: error: call to undeclared function 'percpu_counter_sum_all'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
                  percpu_counter_sum_all(&mp->m_delalloc_blks) == 0);
                  ^
   fs/xfs/xfs_super.c:1079:9: note: did you mean 'percpu_counter_sum'?
   include/linux/percpu_counter.h:193:19: note: 'percpu_counter_sum' declared here
   static inline s64 percpu_counter_sum(struct percpu_counter *fbc)
                     ^
   1 error generated.


vim +/percpu_counter_sum_all +1079 fs/xfs/xfs_super.c

8757c38f2cf6e5 Ian Kent        2019-11-04  1070  
8757c38f2cf6e5 Ian Kent        2019-11-04  1071  static void
8757c38f2cf6e5 Ian Kent        2019-11-04  1072  xfs_destroy_percpu_counters(
8757c38f2cf6e5 Ian Kent        2019-11-04  1073  	struct xfs_mount	*mp)
8757c38f2cf6e5 Ian Kent        2019-11-04  1074  {
8757c38f2cf6e5 Ian Kent        2019-11-04  1075  	percpu_counter_destroy(&mp->m_icount);
8757c38f2cf6e5 Ian Kent        2019-11-04  1076  	percpu_counter_destroy(&mp->m_ifree);
8757c38f2cf6e5 Ian Kent        2019-11-04  1077  	percpu_counter_destroy(&mp->m_fdblocks);
75c8c50fa16a23 Dave Chinner    2021-08-18  1078  	ASSERT(xfs_is_shutdown(mp) ||
c35278f526edf1 Ye Bin          2023-03-14 @1079  	       percpu_counter_sum_all(&mp->m_delalloc_blks) == 0);
8757c38f2cf6e5 Ian Kent        2019-11-04  1080  	percpu_counter_destroy(&mp->m_delalloc_blks);
2229276c528326 Darrick J. Wong 2022-04-12  1081  	percpu_counter_destroy(&mp->m_frextents);
8757c38f2cf6e5 Ian Kent        2019-11-04  1082  }
8757c38f2cf6e5 Ian Kent        2019-11-04  1083  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 4/4] pcpcntr: remove percpu_counter_sum_all()
  2023-03-15 19:22   ` kernel test robot
@ 2023-03-15 20:55     ` Dave Chinner
  2023-03-17  2:22       ` Yujie Liu
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Chinner @ 2023-03-15 20:55 UTC (permalink / raw)
  To: kernel test robot
  Cc: linux-kernel, linux-xfs, oe-kbuild-all, linux-mm, linux-fsdevel, yebin10

On Thu, Mar 16, 2023 at 03:22:31AM +0800, kernel test robot wrote:
> Hi Dave,
> 
> Thank you for the patch! Yet something to improve:

No, ithere is nothing wrong with my patch series, this is something
for _you_ to improve.

> [auto build test ERROR on linus/master]
> [also build test ERROR on v6.3-rc2 next-20230315]
> [cannot apply to dennis-percpu/for-next]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
> 
> url:    https://github.com/intel-lab-lkp/linux/commits/Dave-Chinner/cpumask-introduce-for_each_cpu_or/20230315-165202
> patch link:    https://lore.kernel.org/r/20230315084938.2544737-5-david%40fromorbit.com
> patch subject: [PATCH 4/4] pcpcntr: remove percpu_counter_sum_all()
> config: i386-randconfig-a005 (https://download.01.org/0day-ci/archive/20230316/202303160333.XqIRz3JU-lkp@intel.com/config)
> compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
> reproduce (this is a W=1 build):
>         # https://github.com/intel-lab-lkp/linux/commit/8360dcb55f1eb08fe7a1f457f3b99bef8e306c8b
>         git remote add linux-review https://github.com/intel-lab-lkp/linux
>         git fetch --no-tags linux-review Dave-Chinner/cpumask-introduce-for_each_cpu_or/20230315-165202
>         git checkout 8360dcb55f1eb08fe7a1f457f3b99bef8e306c8b
>         # save the config file
>         mkdir build_dir && cp config build_dir/.config
>         make W=1 O=build_dir ARCH=i386 olddefconfig
>         make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash drivers/hwmon/ fs/xfs/
> 
> If you fix the issue, kindly add following tag where applicable
> | Reported-by: kernel test robot <lkp@intel.com>
> | Link: https://lore.kernel.org/oe-kbuild-all/202303160333.XqIRz3JU-lkp@intel.com/
> 
> All errors (new ones prefixed by >>):
> 
>    In file included from include/linux/string.h:5,
>                     from include/linux/uuid.h:11,
>                     from fs/xfs/xfs_linux.h:10,
>                     from fs/xfs/xfs.h:22,
>                     from fs/xfs/xfs_super.c:7:
>    fs/xfs/xfs_super.c: In function 'xfs_destroy_percpu_counters':
> >> fs/xfs/xfs_super.c:1079:16: error: implicit declaration of function 'percpu_counter_sum_all'; did you mean 'percpu_counter_sum'? [-Werror=implicit-function-declaration]
>     1079 |                percpu_counter_sum_all(&mp->m_delalloc_blks) == 0);
>          |                ^~~~~~~~~~~~~~~~~~~~~~
>    include/linux/compiler.h:77:45: note: in definition of macro 'likely'
>       77 | # define likely(x)      __builtin_expect(!!(x), 1)
>          |                                             ^
>    fs/xfs/xfs_super.c:1078:9: note: in expansion of macro 'ASSERT'
>     1078 |         ASSERT(xfs_is_shutdown(mp) ||
>          |         ^~~~~~
>    cc1: some warnings being treated as errors
> 
> 
> vim +1079 fs/xfs/xfs_super.c
> 
> 8757c38f2cf6e5 Ian Kent        2019-11-04  1070  
> 8757c38f2cf6e5 Ian Kent        2019-11-04  1071  static void
> 8757c38f2cf6e5 Ian Kent        2019-11-04  1072  xfs_destroy_percpu_counters(
> 8757c38f2cf6e5 Ian Kent        2019-11-04  1073  	struct xfs_mount	*mp)
> 8757c38f2cf6e5 Ian Kent        2019-11-04  1074  {
> 8757c38f2cf6e5 Ian Kent        2019-11-04  1075  	percpu_counter_destroy(&mp->m_icount);
> 8757c38f2cf6e5 Ian Kent        2019-11-04  1076  	percpu_counter_destroy(&mp->m_ifree);
> 8757c38f2cf6e5 Ian Kent        2019-11-04  1077  	percpu_counter_destroy(&mp->m_fdblocks);
> 75c8c50fa16a23 Dave Chinner    2021-08-18  1078  	ASSERT(xfs_is_shutdown(mp) ||
> c35278f526edf1 Ye Bin          2023-03-14 @1079  	       percpu_counter_sum_all(&mp->m_delalloc_blks) == 0);

This change has not been committed to any tree that I am aware of.
It was only posted to the XFS list yesterday, and I effectively
NACK'd it and wrote this patchset instead to fix the issue.

IOWs, if -anyone- has actually committed this change to add
percpu_counter_sum_all() to XFS, they've done the wrong thing.
Hence this build failure is a robot issue, not a problem with my
patch series.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 4/4] pcpcntr: remove percpu_counter_sum_all()
  2023-03-15 20:14   ` kernel test robot
@ 2023-03-15 21:21     ` Darrick J. Wong
  0 siblings, 0 replies; 12+ messages in thread
From: Darrick J. Wong @ 2023-03-15 21:21 UTC (permalink / raw)
  To: kernel test robot
  Cc: Dave Chinner, linux-kernel, linux-xfs, llvm, oe-kbuild-all,
	linux-mm, linux-fsdevel, yebin10

On Thu, Mar 16, 2023 at 04:14:04AM +0800, kernel test robot wrote:
> Hi Dave,
> 
> Thank you for the patch! Yet something to improve:
> 
> [auto build test ERROR on linus/master]
> [also build test ERROR on v6.3-rc2 next-20230315]
> [cannot apply to dennis-percpu/for-next]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
> 
> url:    https://github.com/intel-lab-lkp/linux/commits/Dave-Chinner/cpumask-introduce-for_each_cpu_or/20230315-165202
> patch link:    https://lore.kernel.org/r/20230315084938.2544737-5-david%40fromorbit.com
> patch subject: [PATCH 4/4] pcpcntr: remove percpu_counter_sum_all()
> config: riscv-randconfig-r042-20230313 (https://download.01.org/0day-ci/archive/20230316/202303160421.bnmiVRCM-lkp@intel.com/config)
> compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project 67409911353323ca5edf2049ef0df54132fa1ca7)
> reproduce (this is a W=1 build):
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # install riscv cross compiling tool for clang build
>         # apt-get install binutils-riscv64-linux-gnu
>         # https://github.com/intel-lab-lkp/linux/commit/8360dcb55f1eb08fe7a1f457f3b99bef8e306c8b
>         git remote add linux-review https://github.com/intel-lab-lkp/linux
>         git fetch --no-tags linux-review Dave-Chinner/cpumask-introduce-for_each_cpu_or/20230315-165202
>         git checkout 8360dcb55f1eb08fe7a1f457f3b99bef8e306c8b
>         # save the config file
>         mkdir build_dir && cp config build_dir/.config
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=riscv olddefconfig
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=riscv SHELL=/bin/bash fs/xfs/
> 
> If you fix the issue, kindly add following tag where applicable
> | Reported-by: kernel test robot <lkp@intel.com>
> | Link: https://lore.kernel.org/oe-kbuild-all/202303160421.bnmiVRCM-lkp@intel.com/
> 
> All errors (new ones prefixed by >>):
> 
> >> fs/xfs/xfs_super.c:1079:9: error: call to undeclared function 'percpu_counter_sum_all'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
>                   percpu_counter_sum_all(&mp->m_delalloc_blks) == 0);
>                   ^
>    fs/xfs/xfs_super.c:1079:9: note: did you mean 'percpu_counter_sum'?
>    include/linux/percpu_counter.h:193:19: note: 'percpu_counter_sum' declared here
>    static inline s64 percpu_counter_sum(struct percpu_counter *fbc)
>                      ^
>    1 error generated.
> 
> 
> vim +/percpu_counter_sum_all +1079 fs/xfs/xfs_super.c
> 
> 8757c38f2cf6e5 Ian Kent        2019-11-04  1070  
> 8757c38f2cf6e5 Ian Kent        2019-11-04  1071  static void
> 8757c38f2cf6e5 Ian Kent        2019-11-04  1072  xfs_destroy_percpu_counters(
> 8757c38f2cf6e5 Ian Kent        2019-11-04  1073  	struct xfs_mount	*mp)
> 8757c38f2cf6e5 Ian Kent        2019-11-04  1074  {
> 8757c38f2cf6e5 Ian Kent        2019-11-04  1075  	percpu_counter_destroy(&mp->m_icount);
> 8757c38f2cf6e5 Ian Kent        2019-11-04  1076  	percpu_counter_destroy(&mp->m_ifree);
> 8757c38f2cf6e5 Ian Kent        2019-11-04  1077  	percpu_counter_destroy(&mp->m_fdblocks);
> 75c8c50fa16a23 Dave Chinner    2021-08-18  1078  	ASSERT(xfs_is_shutdown(mp) ||
> c35278f526edf1 Ye Bin          2023-03-14 @1079  	       percpu_counter_sum_all(&mp->m_delalloc_blks) == 0);

Ah, yes, ChatGPT-4 hired someone via Mechanical Turk to log in to my
kernel.org account, answer the CAPTCHA, and push Ye's patch into
for-next.

(No it didn't.  Is your bot merging random patches from the XFS list and
whining when the result doesn't build?  WTH is going on here??)

--D

> 8757c38f2cf6e5 Ian Kent        2019-11-04  1080  	percpu_counter_destroy(&mp->m_delalloc_blks);
> 2229276c528326 Darrick J. Wong 2022-04-12  1081  	percpu_counter_destroy(&mp->m_frextents);
> 8757c38f2cf6e5 Ian Kent        2019-11-04  1082  }
> 8757c38f2cf6e5 Ian Kent        2019-11-04  1083  
> 
> -- 
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 4/4] pcpcntr: remove percpu_counter_sum_all()
  2023-03-15 20:55     ` Dave Chinner
@ 2023-03-17  2:22       ` Yujie Liu
  0 siblings, 0 replies; 12+ messages in thread
From: Yujie Liu @ 2023-03-17  2:22 UTC (permalink / raw)
  To: Dave Chinner, Darrick J. Wong
  Cc: kernel test robot, linux-kernel, linux-xfs, oe-kbuild-all,
	linux-mm, linux-fsdevel, yebin10

On Thu, Mar 16, 2023 at 07:55:50AM +1100, Dave Chinner wrote:
> On Thu, Mar 16, 2023 at 03:22:31AM +0800, kernel test robot wrote:
> > Hi Dave,
> > 
> > Thank you for the patch! Yet something to improve:
> 
> No, ithere is nothing wrong with my patch series, this is something
> for _you_ to improve.
> 
> > [auto build test ERROR on linus/master]
> > [also build test ERROR on v6.3-rc2 next-20230315]
> > [cannot apply to dennis-percpu/for-next]
> > [If your patch is applied to the wrong git tree, kindly drop us a note.
> > And when submitting patch, we suggest to use '--base' as documented in
> > https://git-scm.com/docs/git-format-patch#_base_tree_information]
> > 
> > url:    https://github.com/intel-lab-lkp/linux/commits/Dave-Chinner/cpumask-introduce-for_each_cpu_or/20230315-165202
> > patch link:    https://lore.kernel.org/r/20230315084938.2544737-5-david%40fromorbit.com
> > patch subject: [PATCH 4/4] pcpcntr: remove percpu_counter_sum_all()
> > config: i386-randconfig-a005 (https://download.01.org/0day-ci/archive/20230316/202303160333.XqIRz3JU-lkp@intel.com/config)
> > compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
> > reproduce (this is a W=1 build):
> >         # https://github.com/intel-lab-lkp/linux/commit/8360dcb55f1eb08fe7a1f457f3b99bef8e306c8b
> >         git remote add linux-review https://github.com/intel-lab-lkp/linux
> >         git fetch --no-tags linux-review Dave-Chinner/cpumask-introduce-for_each_cpu_or/20230315-165202
> >         git checkout 8360dcb55f1eb08fe7a1f457f3b99bef8e306c8b
> >         # save the config file
> >         mkdir build_dir && cp config build_dir/.config
> >         make W=1 O=build_dir ARCH=i386 olddefconfig
> >         make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash drivers/hwmon/ fs/xfs/
> > 
> > If you fix the issue, kindly add following tag where applicable
> > | Reported-by: kernel test robot <lkp@intel.com>
> > | Link: https://lore.kernel.org/oe-kbuild-all/202303160333.XqIRz3JU-lkp@intel.com/
> > 
> > All errors (new ones prefixed by >>):
> > 
> >    In file included from include/linux/string.h:5,
> >                     from include/linux/uuid.h:11,
> >                     from fs/xfs/xfs_linux.h:10,
> >                     from fs/xfs/xfs.h:22,
> >                     from fs/xfs/xfs_super.c:7:
> >    fs/xfs/xfs_super.c: In function 'xfs_destroy_percpu_counters':
> > >> fs/xfs/xfs_super.c:1079:16: error: implicit declaration of function 'percpu_counter_sum_all'; did you mean 'percpu_counter_sum'? [-Werror=implicit-function-declaration]
> >     1079 |                percpu_counter_sum_all(&mp->m_delalloc_blks) == 0);
> >          |                ^~~~~~~~~~~~~~~~~~~~~~
> >    include/linux/compiler.h:77:45: note: in definition of macro 'likely'
> >       77 | # define likely(x)      __builtin_expect(!!(x), 1)
> >          |                                             ^
> >    fs/xfs/xfs_super.c:1078:9: note: in expansion of macro 'ASSERT'
> >     1078 |         ASSERT(xfs_is_shutdown(mp) ||
> >          |         ^~~~~~
> >    cc1: some warnings being treated as errors
> > 
> > 
> > vim +1079 fs/xfs/xfs_super.c
> > 
> > 8757c38f2cf6e5 Ian Kent        2019-11-04  1070  
> > 8757c38f2cf6e5 Ian Kent        2019-11-04  1071  static void
> > 8757c38f2cf6e5 Ian Kent        2019-11-04  1072  xfs_destroy_percpu_counters(
> > 8757c38f2cf6e5 Ian Kent        2019-11-04  1073  	struct xfs_mount	*mp)
> > 8757c38f2cf6e5 Ian Kent        2019-11-04  1074  {
> > 8757c38f2cf6e5 Ian Kent        2019-11-04  1075  	percpu_counter_destroy(&mp->m_icount);
> > 8757c38f2cf6e5 Ian Kent        2019-11-04  1076  	percpu_counter_destroy(&mp->m_ifree);
> > 8757c38f2cf6e5 Ian Kent        2019-11-04  1077  	percpu_counter_destroy(&mp->m_fdblocks);
> > 75c8c50fa16a23 Dave Chinner    2021-08-18  1078  	ASSERT(xfs_is_shutdown(mp) ||
> > c35278f526edf1 Ye Bin          2023-03-14 @1079  	       percpu_counter_sum_all(&mp->m_delalloc_blks) == 0);
> 
> This change has not been committed to any tree that I am aware of.
> It was only posted to the XFS list yesterday, and I effectively
> NACK'd it and wrote this patchset instead to fix the issue.
> 
> IOWs, if -anyone- has actually committed this change to add
> percpu_counter_sum_all() to XFS, they've done the wrong thing.
> Hence this build failure is a robot issue, not a problem with my
> patch series.

Sorry about this false positive report.

The robot misinterpreted the link in the cover letter, and wrongly
thought it was a prerequisite patch for this patch series, leading to
this false report.

We will improve the robot and increase the accuracy.

--
Best Regards,
Yujie

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/4] pcpcntrs: fix dying cpu summation race
       [not found] ` <20230315233618.2168-1-hdanton@sina.com>
@ 2023-03-18  0:37   ` Dave Chinner
  0 siblings, 0 replies; 12+ messages in thread
From: Dave Chinner @ 2023-03-18  0:37 UTC (permalink / raw)
  To: Hillf Danton; +Cc: linux-kernel, linux-xfs, linux-mm, linux-fsdevel, yebin10

On Thu, Mar 16, 2023 at 07:36:18AM +0800, Hillf Danton wrote:
> On 15 Mar 2023 19:49:36 +1100 Dave Chinner <dchinner@redhat.com>
> > @@ -141,11 +141,20 @@ static s64 __percpu_counter_sum_mask(struct percpu_counter *fbc,
> >  
> >  /*
> >   * Add up all the per-cpu counts, return the result.  This is a more accurate
> > - * but much slower version of percpu_counter_read_positive()
> > + * but much slower version of percpu_counter_read_positive().
> > + *
> > + * We use the cpu mask of (cpu_online_mask | cpu_dying_mask) to capture sums
> > + * from CPUs that are in the process of being taken offline. Dying cpus have
> > + * been removed from the online mask, but may not have had the hotplug dead
> > + * notifier called to fold the percpu count back into the global counter sum.
> > + * By including dying CPUs in the iteration mask, we avoid this race condition
> > + * so __percpu_counter_sum() just does the right thing when CPUs are being taken
> > + * offline.
> >   */
> >  s64 __percpu_counter_sum(struct percpu_counter *fbc)
> >  {
> > -	return __percpu_counter_sum_mask(fbc, cpu_online_mask);
> > +
> > +	return __percpu_counter_sum_mask(fbc, cpu_dying_mask);
> >  }
> >  EXPORT_SYMBOL(__percpu_counter_sum);
> >  
> > -- 
> > 2.39.2
> 
> Hm... the window of the race between a dying cpu and the sum of percpu counter
> spotted in commit f689054aace2 is stil open after a text-book log message.
> 
> 	cpu 0			cpu 2
> 	---			---
> 	percpu_counter_sum() 	percpu_counter_cpu_dead()
> 
> 	raw_spin_lock_irqsave(&fbc->lock, flags);
> 	ret = fbc->count;
> 	for_each_cpu_or(cpu, cpu_online_mask, cpu_dying_mask) {
> 		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
> 		ret += *pcount;
> 	}
> 	raw_spin_unlock_irqrestore(&fbc->lock, flags);
> 
> 				raw_spin_lock(&fbc->lock);
> 				pcount = per_cpu_ptr(fbc->counters, cpu);
> 				fbc->count += *pcount;
> 				*pcount = 0;
> 				raw_spin_unlock(&fbc->lock);

Their is no race condition updating fbc->count here - I explained
this in the cover letter. i.e. the sum in percpu_counter_sum() is to
a private counter and does not change fbc->count. Therefore we only
need/want to fold the dying cpu percpu count into fbc->count in the
CPU_DEAD callback.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 0/4] pcpctr: fix percpu_counter_sum vs cpu offline race
  2023-03-15  8:49 [PATCH 0/4] pcpctr: fix percpu_counter_sum vs cpu offline race Dave Chinner
                   ` (4 preceding siblings ...)
       [not found] ` <20230315233618.2168-1-hdanton@sina.com>
@ 2023-03-18  0:41 ` Darrick J. Wong
  5 siblings, 0 replies; 12+ messages in thread
From: Darrick J. Wong @ 2023-03-18  0:41 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-kernel, linux-xfs, linux-mm, linux-fsdevel, yebin10

On Wed, Mar 15, 2023 at 07:49:34PM +1100, Dave Chinner wrote:
> Ye Bin reported an XFS assert failure when testing CPU hotplug
> recently. During unmount, XFs was asserting that a percpu counter
> value should be zero because at that point in time a non-zero value
> indicates a space accounting leak which is a bug. The details of
> that failure can be found here:
> 
> https://lore.kernel.org/linux-kernel/20230314090649.326642-1-yebin@huaweicloud.com/
> 
> Ye Bin then proposed changing the XFS code to use
> percpu_counter_sum_all(), which surprised me because I didn't know
> that function existed at all. Indeed, it was only merged in the
> recent 6.3-rc1 merge window because someone else had noticed a
> pcpctr sum race with hotplug.
> 
> commit f689054aace2 ("percpu_counter: add percpu_counter_sum_all
> interface") was introduced via the mm tree. Nobody outside that
> scope knew about this, because who bothers to even try to read LKML
> these days? There was little list discussion, and I don't see
> anything other than a cursory review done on the patch.
> 
> At minimum, linux-fsdevel should have been cc'd because multiple
> filesystems use percpu counters for both threshold and ENOSPC
> accounting in filesystems.  Hence if there is a problem with
> percpu_counter_sum() leaking, filesystem developers kinda need to
> know about it because leaks like this (as per the XFS bug report)
> can actually result in on-disk corruption occurring.
> 
> So, now I know that there is an accuracy problem with
> percpu_counter_sum(), I will assert that we need to fix it properly
> rathern than hack around it by adding a new variant. Leaving people
> who know nothing about cpu hotplug to try to work out if they have a
> hotplug related issue with their use of percpu_counter_sum() is just
> bad code; percpu_counter_sum() should just Do The Right Thing.
> 
> Use of the cpu_dying_mask should effectively close this race
> condition.  That is, when we take a CPU offline we effectively do:
> 
> 	mark cpu dying
> 	clear cpu from cpu_online_mask
> 	run cpu dead callbacks
> 	  ....
> 	  <lock counter>
> 	  fold pcp count into fbc->count
> 	  clear pcp count
> 	  <unlock counter>
> 	  ...
> 	mark CPU dead
> 	clear cpu dying
> 
> The race condition occurs because we can run a _sum operation
> between the "clear cpu online" mask update and the "pcpctr cpu dead"
> notification runs and fold the pcp counter values back into the
> global count.  The sum sees that the CPU is not online, so it skips
> that CPU even though the count is not zero and hasn't been folded by
> the CPU dead notifier. Hence it skips something that it shouldn't.
> 
> However, that race condition doesn't exist if we take cpu_dying_mask
> into account during the sum.  i.e. percpu_counter_sum() should
> iterate every CPU set in either the cpu_online_mask and the
> cpu_dying_mask to capture CPUs that are being taken offline.
> 
> If the cpu is not set in the dying mask, then the online or offline
> state of the CPU is correct an there is no notifier pending over
> running and we will skip/sum it correctly.
> 
> If the CPU is set in the dying mask, then we need to sum it
> regardless of the online mask state or even whether the cpu dead
> notifier has run.  If the sum wins the race to the pcp counter on
> the dying CPU, it is included in the local sum from the pcp
> variable. If the notifier wins the race, it gets folded back into
> the global count and zeroed before the sum runs. Then the sum
> includes the count in the local sum from the global counter sum
> rather than the percpu counter.
> 
> Either way, we get the same correct sum value from
> percpu_counter_sum() regardless of how it races with a CPU being
> removed from the system. And there is no need for
> percpu_count_sum_all() anymore.
> 
> This series introduces bitmap operations for finding bits set in
> either of two bitmasks and adds the for_each_cpu_or() wrapper to
> iterate CPUs set in either of the two supplied cpu masks. It then
> converts __percpu_counter_sum_mask() to use this, and have
> __percpu_counter_sum() pass the cpu_dying_mask as the second mask.
> This fixes the race condition with CPUs dying.
> 
> It then converts the only user of percpu_counter_sum_all() to use
> percpu_counter_sum() as percpu_counter_sum_all() is now redundant,
> then it removes percpu_counter_sum_all() and recombines
> __percpu_counter_sum_mask() and __percpu_counter_sum().
> 
> This effectively undoes all the changes in commit f689054aace2
> except for the small change to use for_each_cpu_or() to fold in the
> cpu_dying_mask made in this patch set to avoid the problematic race
> condition. Hence the cpu unplug race condition is now correctly
> handled by percpu_counter_sum(), and people can go back to being
> blissfully ignorant of how pcpctrs interact with CPU hotplug (which
> is how it should be!).
> 
> This has spent the last siz hours running generic/650 on XFS on a
> couple of VMs (on 4p, the other 16p) which stresses the filesystem
> by running a multi-process fsstress invocation whilst randomly
> onlining and offlining CPUs. Hence it's exercising all the percpu
> counter cpu dead paths whilst the filesystem is randomly modifying,
> reading and summing summing the critical counters that XFS needs for
> accurate accounting of resource consumption within the filesystem.
> 
> Thoughts, comments and testing welcome!

I've been testing this since I saw it, and AFAICT it looks good to me.
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

That said -- do any of the people who know percpu counters better than
me have any thoughts?  The justification and the code changes make sense
to me, but keeping up with xfs metadata is enough for me.

--D


> 
> -Dave.
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-03-18  0:41 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-15  8:49 [PATCH 0/4] pcpctr: fix percpu_counter_sum vs cpu offline race Dave Chinner
2023-03-15  8:49 ` [PATCH 1/4] cpumask: introduce for_each_cpu_or Dave Chinner
2023-03-15  8:49 ` [PATCH 2/4] pcpcntrs: fix dying cpu summation race Dave Chinner
2023-03-15  8:49 ` [PATCH 3/4] fork: remove use of percpu_counter_sum_all Dave Chinner
2023-03-15  8:49 ` [PATCH 4/4] pcpcntr: remove percpu_counter_sum_all() Dave Chinner
2023-03-15 19:22   ` kernel test robot
2023-03-15 20:55     ` Dave Chinner
2023-03-17  2:22       ` Yujie Liu
2023-03-15 20:14   ` kernel test robot
2023-03-15 21:21     ` Darrick J. Wong
     [not found] ` <20230315233618.2168-1-hdanton@sina.com>
2023-03-18  0:37   ` [PATCH 2/4] pcpcntrs: fix dying cpu summation race Dave Chinner
2023-03-18  0:41 ` [PATCH 0/4] pcpctr: fix percpu_counter_sum vs cpu offline race Darrick J. Wong

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).