stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] ring-buffer: Prevent inconsistent operation on cpu_buffer->resize_disabled
@ 2023-04-08  5:22 Tze-nan Wu
  2023-04-08  8:35 ` kernel test robot
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Tze-nan Wu @ 2023-04-08  5:22 UTC (permalink / raw)
  To: rostedt, mhiramat
  Cc: bobule.chang, wsd_upstream, Tze-nan.Wu, stable,
	AngeloGioacchino Del Regno, linux-kernel, linux-trace-kernel,
	linux-arm-kernel, linux-mediatek

Sometimes, write to buffer_size_kb can be permanently failure if we change
the cpu_online_mask between two for_each_online_buffer_cpu loops in
function ring_buffer_reset_online_cpus.

The number of increasing and decreasing on cpu_buffer->resize_disable
may be inconsistent, leading the resize_disabled in some CPUs becoming
none zero after ring_buffer_reset_online_cpus return.

This issue can be reproduced by "echo 0 > trace" and hotplug cpu at the
same time. After reproducing succeess, we can find out the attempt to
write to buffer_size_kb node failure every time.

This patch prevent the inconsistent increasing and decreasing on
cpu_buffer->resize_disabled by copying the cpu_online_mask at the
beginning of the function.

But I wonder if there's any side-effect of this patch,
since the behavior changed, if we turn on a cpu between the two loops,
reset_disabled_cpu_buffer() of that cpu won't be run as before,
meaning the cpu_buffer on that cpu just awake will not be cleaned up.

Cc: stable@vger.kernel.org
Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
---
 kernel/trace/ring_buffer.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 76a2d91eecad..468f46bba71e 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -288,9 +288,6 @@ EXPORT_SYMBOL_GPL(ring_buffer_event_data);
 #define for_each_buffer_cpu(buffer, cpu)		\
 	for_each_cpu(cpu, buffer->cpumask)
 
-#define for_each_online_buffer_cpu(buffer, cpu)		\
-	for_each_cpu_and(cpu, buffer->cpumask, cpu_online_mask)
-
 #define TS_SHIFT	27
 #define TS_MASK		((1ULL << TS_SHIFT) - 1)
 #define TS_DELTA_TEST	(~TS_MASK)
@@ -5353,12 +5350,15 @@ EXPORT_SYMBOL_GPL(ring_buffer_reset_cpu);
 void ring_buffer_reset_online_cpus(struct trace_buffer *buffer)
 {
 	struct ring_buffer_per_cpu *cpu_buffer;
+	cpumask_var_t reset_online_mask;
 	int cpu;
 
 	/* prevent another thread from changing buffer sizes */
 	mutex_lock(&buffer->mutex);
 
-	for_each_online_buffer_cpu(buffer, cpu) {
+	cpumask_copy(reset_online_mask, cpu_online_mask);
+
+	for_each_cpu_and(cpu, buffer->cpumask, reset_online_mask) {
 		cpu_buffer = buffer->buffers[cpu];
 
 		atomic_inc(&cpu_buffer->resize_disabled);
@@ -5368,7 +5368,7 @@ void ring_buffer_reset_online_cpus(struct trace_buffer *buffer)
 	/* Make sure all commits have finished */
 	synchronize_rcu();
 
-	for_each_online_buffer_cpu(buffer, cpu) {
+	for_each_cpu_and(cpu, buffer->cpumask, reset_online_mask) {
 		cpu_buffer = buffer->buffers[cpu];
 
 		reset_disabled_cpu_buffer(cpu_buffer);
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] ring-buffer: Prevent inconsistent operation on cpu_buffer->resize_disabled
  2023-04-08  5:22 [PATCH] ring-buffer: Prevent inconsistent operation on cpu_buffer->resize_disabled Tze-nan Wu
@ 2023-04-08  8:35 ` kernel test robot
  2023-04-08 12:20 ` kernel test robot
  2023-04-09  2:46 ` [PATCH v2] " Tze-nan Wu
  2 siblings, 0 replies; 12+ messages in thread
From: kernel test robot @ 2023-04-08  8:35 UTC (permalink / raw)
  To: Tze-nan Wu, rostedt, mhiramat
  Cc: llvm, oe-kbuild-all, bobule.chang, wsd_upstream, Tze-nan.Wu,
	stable, AngeloGioacchino Del Regno, linux-kernel,
	linux-trace-kernel, linux-arm-kernel, linux-mediatek

Hi Tze-nan,

kernel test robot noticed the following build warnings:

[auto build test WARNING on linus/master]
[also build test WARNING on rostedt-trace/for-next v6.3-rc5 next-20230406]
[cannot apply to rostedt-trace/for-next-urgent]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Tze-nan-Wu/ring-buffer-Prevent-inconsistent-operation-on-cpu_buffer-resize_disabled/20230408-132502
patch link:    https://lore.kernel.org/r/20230408052226.25268-1-Tze-nan.Wu%40mediatek.com
patch subject: [PATCH] ring-buffer: Prevent inconsistent operation on cpu_buffer->resize_disabled
config: x86_64-randconfig-a002-20230403 (https://download.01.org/0day-ci/archive/20230408/202304081615.eiaqpbV8-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/d404bc0af0a4bde3aa20704642d69a78bdc154f8
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Tze-nan-Wu/ring-buffer-Prevent-inconsistent-operation-on-cpu_buffer-resize_disabled/20230408-132502
        git checkout d404bc0af0a4bde3aa20704642d69a78bdc154f8
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash kernel/trace/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202304081615.eiaqpbV8-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> kernel/trace/ring_buffer.c:5359:15: warning: variable 'reset_online_mask' is uninitialized when used here [-Wuninitialized]
           cpumask_copy(reset_online_mask, cpu_online_mask);
                        ^~~~~~~~~~~~~~~~~
   kernel/trace/ring_buffer.c:5353:33: note: initialize the variable 'reset_online_mask' to silence this warning
           cpumask_var_t reset_online_mask;
                                          ^
                                           = NULL
   1 warning generated.


vim +/reset_online_mask +5359 kernel/trace/ring_buffer.c

  5344	
  5345	/**
  5346	 * ring_buffer_reset_online_cpus - reset a ring buffer per CPU buffer
  5347	 * @buffer: The ring buffer to reset a per cpu buffer of
  5348	 * @cpu: The CPU buffer to be reset
  5349	 */
  5350	void ring_buffer_reset_online_cpus(struct trace_buffer *buffer)
  5351	{
  5352		struct ring_buffer_per_cpu *cpu_buffer;
  5353		cpumask_var_t reset_online_mask;
  5354		int cpu;
  5355	
  5356		/* prevent another thread from changing buffer sizes */
  5357		mutex_lock(&buffer->mutex);
  5358	
> 5359		cpumask_copy(reset_online_mask, cpu_online_mask);
  5360	
  5361		for_each_cpu_and(cpu, buffer->cpumask, reset_online_mask) {
  5362			cpu_buffer = buffer->buffers[cpu];
  5363	
  5364			atomic_inc(&cpu_buffer->resize_disabled);
  5365			atomic_inc(&cpu_buffer->record_disabled);
  5366		}
  5367	
  5368		/* Make sure all commits have finished */
  5369		synchronize_rcu();
  5370	
  5371		for_each_cpu_and(cpu, buffer->cpumask, reset_online_mask) {
  5372			cpu_buffer = buffer->buffers[cpu];
  5373	
  5374			reset_disabled_cpu_buffer(cpu_buffer);
  5375	
  5376			atomic_dec(&cpu_buffer->record_disabled);
  5377			atomic_dec(&cpu_buffer->resize_disabled);
  5378		}
  5379	
  5380		mutex_unlock(&buffer->mutex);
  5381	}
  5382	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] ring-buffer: Prevent inconsistent operation on cpu_buffer->resize_disabled
  2023-04-08  5:22 [PATCH] ring-buffer: Prevent inconsistent operation on cpu_buffer->resize_disabled Tze-nan Wu
  2023-04-08  8:35 ` kernel test robot
@ 2023-04-08 12:20 ` kernel test robot
  2023-04-09  2:46 ` [PATCH v2] " Tze-nan Wu
  2 siblings, 0 replies; 12+ messages in thread
From: kernel test robot @ 2023-04-08 12:20 UTC (permalink / raw)
  To: Tze-nan Wu, rostedt, mhiramat
  Cc: oe-kbuild-all, bobule.chang, wsd_upstream, Tze-nan.Wu, stable,
	AngeloGioacchino Del Regno, linux-kernel, linux-trace-kernel,
	linux-arm-kernel, linux-mediatek

Hi Tze-nan,

kernel test robot noticed the following build warnings:

[auto build test WARNING on linus/master]
[cannot apply to rostedt-trace/for-next-urgent]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Tze-nan-Wu/ring-buffer-Prevent-inconsistent-operation-on-cpu_buffer-resize_disabled/20230408-132502
patch link:    https://lore.kernel.org/r/20230408052226.25268-1-Tze-nan.Wu%40mediatek.com
patch subject: [PATCH] ring-buffer: Prevent inconsistent operation on cpu_buffer->resize_disabled
config: s390-allyesconfig (https://download.01.org/0day-ci/archive/20230408/202304082051.Dp50upfS-lkp@intel.com/config)
compiler: s390-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/d404bc0af0a4bde3aa20704642d69a78bdc154f8
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Tze-nan-Wu/ring-buffer-Prevent-inconsistent-operation-on-cpu_buffer-resize_disabled/20230408-132502
        git checkout d404bc0af0a4bde3aa20704642d69a78bdc154f8
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=s390 olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=s390 SHELL=/bin/bash kernel/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202304082051.Dp50upfS-lkp@intel.com/

All warnings (new ones prefixed by >>):

   kernel/trace/ring_buffer.c: In function 'ring_buffer_reset_online_cpus':
>> kernel/trace/ring_buffer.c:5359:9: warning: 'reset_online_mask' is used uninitialized [-Wuninitialized]
    5359 |         cpumask_copy(reset_online_mask, cpu_online_mask);
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   kernel/trace/ring_buffer.c:5353:23: note: 'reset_online_mask' was declared here
    5353 |         cpumask_var_t reset_online_mask;
         |                       ^~~~~~~~~~~~~~~~~


vim +/reset_online_mask +5359 kernel/trace/ring_buffer.c

  5344	
  5345	/**
  5346	 * ring_buffer_reset_online_cpus - reset a ring buffer per CPU buffer
  5347	 * @buffer: The ring buffer to reset a per cpu buffer of
  5348	 * @cpu: The CPU buffer to be reset
  5349	 */
  5350	void ring_buffer_reset_online_cpus(struct trace_buffer *buffer)
  5351	{
  5352		struct ring_buffer_per_cpu *cpu_buffer;
  5353		cpumask_var_t reset_online_mask;
  5354		int cpu;
  5355	
  5356		/* prevent another thread from changing buffer sizes */
  5357		mutex_lock(&buffer->mutex);
  5358	
> 5359		cpumask_copy(reset_online_mask, cpu_online_mask);
  5360	
  5361		for_each_cpu_and(cpu, buffer->cpumask, reset_online_mask) {
  5362			cpu_buffer = buffer->buffers[cpu];
  5363	
  5364			atomic_inc(&cpu_buffer->resize_disabled);
  5365			atomic_inc(&cpu_buffer->record_disabled);
  5366		}
  5367	
  5368		/* Make sure all commits have finished */
  5369		synchronize_rcu();
  5370	
  5371		for_each_cpu_and(cpu, buffer->cpumask, reset_online_mask) {
  5372			cpu_buffer = buffer->buffers[cpu];
  5373	
  5374			reset_disabled_cpu_buffer(cpu_buffer);
  5375	
  5376			atomic_dec(&cpu_buffer->record_disabled);
  5377			atomic_dec(&cpu_buffer->resize_disabled);
  5378		}
  5379	
  5380		mutex_unlock(&buffer->mutex);
  5381	}
  5382	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2] ring-buffer: Prevent inconsistent operation on cpu_buffer->resize_disabled
  2023-04-08  5:22 [PATCH] ring-buffer: Prevent inconsistent operation on cpu_buffer->resize_disabled Tze-nan Wu
  2023-04-08  8:35 ` kernel test robot
  2023-04-08 12:20 ` kernel test robot
@ 2023-04-09  2:46 ` Tze-nan Wu
  2023-04-09 12:30   ` Bagas Sanjaya
  2023-04-10  7:35   ` [PATCH v3] " Tze-nan Wu
  2 siblings, 2 replies; 12+ messages in thread
From: Tze-nan Wu @ 2023-04-09  2:46 UTC (permalink / raw)
  To: rostedt, mhiramat
  Cc: bobule.chang, wsd_upstream, Tze-nan.Wu, stable,
	AngeloGioacchino Del Regno, linux-kernel, linux-trace-kernel,
	linux-arm-kernel, linux-mediatek

Write to buffer_size_kb can permanently fail due to cpu_online_mask changed
between two for_each_online_buffer_cpu loops.
The number of increasing and decreasing on cpu_buffer->resize_disable
may be inconsistent, leading that the resize_disabled in some CPUs
becoming none zero after ring_buffer_reset_online_cpus return.

This issue can be reproduced by "echo 0 > trace" and hotplug cpu at the
same time. After reproducing success, we can find out buffer_size_kb
will not be functional anymore.

This patch uses cpus_read_lock() to prevent cpu_online_mask being changed
between two different "for_each_online_buffer_cpu".

Changes in v2:
  Use cpus_read_lock() instead of copying cpu_online_mask at the entry of
  function

Link:
  https://lore.kernel.org/lkml/20230408052226.25268-1-Tze-nan.Wu@mediatek.com/

Cc: stable@vger.kernel.org
Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
---
 kernel/trace/ring_buffer.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 76a2d91eecad..44d833252fb0 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -5357,6 +5357,7 @@ void ring_buffer_reset_online_cpus(struct trace_buffer *buffer)
 
 	/* prevent another thread from changing buffer sizes */
 	mutex_lock(&buffer->mutex);
+	cpus_read_lock();
 
 	for_each_online_buffer_cpu(buffer, cpu) {
 		cpu_buffer = buffer->buffers[cpu];
@@ -5377,6 +5378,7 @@ void ring_buffer_reset_online_cpus(struct trace_buffer *buffer)
 		atomic_dec(&cpu_buffer->resize_disabled);
 	}
 
+	cpus_read_unlock();
 	mutex_unlock(&buffer->mutex);
 }
 
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] ring-buffer: Prevent inconsistent operation on cpu_buffer->resize_disabled
  2023-04-09  2:46 ` [PATCH v2] " Tze-nan Wu
@ 2023-04-09 12:30   ` Bagas Sanjaya
  2023-04-10  1:35     ` Tze-nan Wu (吳澤南)
  2023-04-10  7:35   ` [PATCH v3] " Tze-nan Wu
  1 sibling, 1 reply; 12+ messages in thread
From: Bagas Sanjaya @ 2023-04-09 12:30 UTC (permalink / raw)
  To: Tze-nan Wu, rostedt, mhiramat
  Cc: bobule.chang, wsd_upstream, stable, AngeloGioacchino Del Regno,
	linux-kernel, linux-trace-kernel, linux-arm-kernel,
	linux-mediatek

On 4/9/23 09:46, Tze-nan Wu wrote:
> This issue can be reproduced by "echo 0 > trace" and hotplug cpu at the
> same time. After reproducing success, we can find out buffer_size_kb
> will not be functional anymore.
> 

Do you mean disabling tracing while hotplugging CPU? Or disabling both
tracing and hotplug CPU?

> This patch uses cpus_read_lock() to prevent cpu_online_mask being changed
> between two different "for_each_online_buffer_cpu".
> 

"Use cpu_read_lock() to prevent ..."

> Changes in v2:
>   Use cpus_read_lock() instead of copying cpu_online_mask at the entry of
>   function
> 

To resolve kernel test robot warnings ([1] and [2])? Or have they been fixed?

[1]: https://lore.kernel.org/stable/202304081615.eiaqpbV8-lkp@intel.com/
[2]: https://lore.kernel.org/stable/202304082051.Dp50upfS-lkp@intel.com/

Thanks.

-- 
An old man doll... just what I always wanted! - Clara


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2] ring-buffer: Prevent inconsistent operation on cpu_buffer->resize_disabled
  2023-04-09 12:30   ` Bagas Sanjaya
@ 2023-04-10  1:35     ` Tze-nan Wu (吳澤南)
  0 siblings, 0 replies; 12+ messages in thread
From: Tze-nan Wu (吳澤南) @ 2023-04-10  1:35 UTC (permalink / raw)
  To: mhiramat, rostedt, bagasdotme
  Cc: linux-mediatek, linux-trace-kernel, linux-kernel,
	Cheng-Jui Wang (王正睿),
	wsd_upstream, Bobule Chang (張弘義),
	stable, linux-arm-kernel, angelogioacchino.delregno

Hi Bagas,

On Sun, 2023-04-09 at 19:30 +0700, Bagas Sanjaya wrote:
> 
> On 4/9/23 09:46, Tze-nan Wu wrote:
> > This issue can be reproduced by "echo 0 > trace" and hotplug cpu at
> > the
> > same time. After reproducing success, we can find out
> > buffer_size_kb
> > will not be functional anymore.
> > 
> 
> Do you mean disabling tracing while hotplugging CPU? Or disabling
> both
> tracing and hotplug CPU?
> 
  I mean disabling tracing while hotplugging CPU, sorry for the
confusion here.

> > This patch uses cpus_read_lock() to prevent cpu_online_mask being
> > changed
> > between two different "for_each_online_buffer_cpu".
> > 
> 
> "Use cpu_read_lock() to prevent ..."
> 
  Thanks for the suggestion 

> > Changes in v2:
> >   Use cpus_read_lock() instead of copying cpu_online_mask at the
> > entry of
> >   function
> > 
> 
  Since I change to fix the issue by using cpus_read_lock(), we don't
need a copy of cpu_online_mask anymore, hence the two warnings will not
meet in the v2 patch.

> To resolve kernel test robot warnings ([1] and [2])? Or have they
> been fixed?
> 
> [1]: 
> https://lore.kernel.org/stable/202304081615.eiaqpbV8-lkp@intel.com/
> [2]: 
> https://lore.kernel.org/stable/202304082051.Dp50upfS-lkp@intel.com/
> 
> Thanks.
> 
> --
> An old man doll... just what I always wanted! - Clara
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v3] ring-buffer: Prevent inconsistent operation on cpu_buffer->resize_disabled
  2023-04-09  2:46 ` [PATCH v2] " Tze-nan Wu
  2023-04-09 12:30   ` Bagas Sanjaya
@ 2023-04-10  7:35   ` Tze-nan Wu
  2023-04-11 16:44     ` Steven Rostedt
  2023-04-14  1:16     ` Tze-nan Wu (吳澤南)
  1 sibling, 2 replies; 12+ messages in thread
From: Tze-nan Wu @ 2023-04-10  7:35 UTC (permalink / raw)
  To: rostedt, mhiramat
  Cc: bobule.chang, wsd_upstream, Tze-nan.Wu, cheng-jui.wang, npiggin,
	stable, AngeloGioacchino Del Regno, Paul E. McKenney,
	linux-kernel, linux-trace-kernel, linux-arm-kernel,
	linux-mediatek

Write to buffer_size_kb can permanently fail, due to cpu_online_mask may
changed between two for_each_online_buffer_cpu loops.
The number of increasing and decreasing on cpu_buffer->resize_disable
may be inconsistent, leading that the resize_disabled in some CPUs
becoming none zero after ring_buffer_reset_online_cpus return.

This issue can be reproduced by "echo 0 > trace" while hotplugging cpu.
After reproducing success, we can find out buffer_size_kb will not be
functional anymore.

Prevent the two "loops" in this function from iterating through different
online cpus by copying cpu_online_mask at the entry of the function.

Changes from v1 to v3:
  Declare the cpumask variable statically rather than dynamically.

Changes from v2 to v3:
  Considering holding cpu_hotplug_lock too long because of the
  synchronize_rcu(), maybe it's better to prevent the issue by copying
  cpu_online_mask at the entry of the function as V1 does, instead of
  using cpus_read_lock().

Link: https://lore.kernel.org/lkml/20230408052226.25268-1-Tze-nan.Wu@mediatek.com/
Link: https://lore.kernel.org/oe-kbuild-all/202304082051.Dp50upfS-lkp@intel.com/
Link: https://lore.kernel.org/oe-kbuild-all/202304081615.eiaqpbV8-lkp@intel.com/

Cc: stable@vger.kernel.org
Cc: npiggin@gmail.com
Fixes: b23d7a5f4a07 ("ring-buffer: speed up buffer resets by avoiding synchronize_rcu for each CPU")
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com>
Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
---
 kernel/trace/ring_buffer.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 76a2d91eecad..dc758930dacb 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -288,9 +288,6 @@ EXPORT_SYMBOL_GPL(ring_buffer_event_data);
 #define for_each_buffer_cpu(buffer, cpu)		\
 	for_each_cpu(cpu, buffer->cpumask)
 
-#define for_each_online_buffer_cpu(buffer, cpu)		\
-	for_each_cpu_and(cpu, buffer->cpumask, cpu_online_mask)
-
 #define TS_SHIFT	27
 #define TS_MASK		((1ULL << TS_SHIFT) - 1)
 #define TS_DELTA_TEST	(~TS_MASK)
@@ -5353,12 +5350,19 @@ EXPORT_SYMBOL_GPL(ring_buffer_reset_cpu);
 void ring_buffer_reset_online_cpus(struct trace_buffer *buffer)
 {
 	struct ring_buffer_per_cpu *cpu_buffer;
+	cpumask_t reset_online_cpumask;
 	int cpu;
 
+	/*
+	 * Record cpu_online_mask here to make sure we iterate through the same
+	 * online CPUs in the following two loops.
+	 */
+	cpumask_copy(&reset_online_cpumask, cpu_online_mask);
+
 	/* prevent another thread from changing buffer sizes */
 	mutex_lock(&buffer->mutex);
 
-	for_each_online_buffer_cpu(buffer, cpu) {
+	for_each_cpu_and(cpu, buffer->cpumask, &reset_online_cpumask) {
 		cpu_buffer = buffer->buffers[cpu];
 
 		atomic_inc(&cpu_buffer->resize_disabled);
@@ -5368,7 +5372,7 @@ void ring_buffer_reset_online_cpus(struct trace_buffer *buffer)
 	/* Make sure all commits have finished */
 	synchronize_rcu();
 
-	for_each_online_buffer_cpu(buffer, cpu) {
+	for_each_cpu_and(cpu, buffer->cpumask, &reset_online_cpumask) {
 		cpu_buffer = buffer->buffers[cpu];
 
 		reset_disabled_cpu_buffer(cpu_buffer);
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v3] ring-buffer: Prevent inconsistent operation on cpu_buffer->resize_disabled
  2023-04-10  7:35   ` [PATCH v3] " Tze-nan Wu
@ 2023-04-11 16:44     ` Steven Rostedt
  2023-04-11 16:48       ` Steven Rostedt
  2023-04-12  2:27       ` Tze-nan Wu (吳澤南)
  2023-04-14  1:16     ` Tze-nan Wu (吳澤南)
  1 sibling, 2 replies; 12+ messages in thread
From: Steven Rostedt @ 2023-04-11 16:44 UTC (permalink / raw)
  To: Tze-nan Wu
  Cc: mhiramat, bobule.chang, wsd_upstream, cheng-jui.wang, npiggin,
	stable, AngeloGioacchino Del Regno, Paul E. McKenney,
	linux-kernel, linux-trace-kernel, linux-arm-kernel,
	linux-mediatek


Please have each new patch be a new thread, and not a Cc to the previous
version of the patch. As it makes it hard to find in INBOXs.

On Mon, 10 Apr 2023 15:35:08 +0800
Tze-nan Wu <Tze-nan.Wu@mediatek.com> wrote:

> Write to buffer_size_kb can permanently fail, due to cpu_online_mask may
> changed between two for_each_online_buffer_cpu loops.
> The number of increasing and decreasing on cpu_buffer->resize_disable
> may be inconsistent, leading that the resize_disabled in some CPUs
> becoming none zero after ring_buffer_reset_online_cpus return.
> 
> This issue can be reproduced by "echo 0 > trace" while hotplugging cpu.
> After reproducing success, we can find out buffer_size_kb will not be
> functional anymore.
> 
> Prevent the two "loops" in this function from iterating through different
> online cpus by copying cpu_online_mask at the entry of the function.
> 

The "Changes from" need to go below  the '---', otherwise they are added to
the git commit (we don't want it there).

> Changes from v1 to v3:
>   Declare the cpumask variable statically rather than dynamically.
> 
> Changes from v2 to v3:
>   Considering holding cpu_hotplug_lock too long because of the
>   synchronize_rcu(), maybe it's better to prevent the issue by copying
>   cpu_online_mask at the entry of the function as V1 does, instead of
>   using cpus_read_lock().
> 
> Link: https://lore.kernel.org/lkml/20230408052226.25268-1-Tze-nan.Wu@mediatek.com/
> Link: https://lore.kernel.org/oe-kbuild-all/202304082051.Dp50upfS-lkp@intel.com/
> Link: https://lore.kernel.org/oe-kbuild-all/202304081615.eiaqpbV8-lkp@intel.com/
> 
> Cc: stable@vger.kernel.org
> Cc: npiggin@gmail.com
> Fixes: b23d7a5f4a07 ("ring-buffer: speed up buffer resets by avoiding synchronize_rcu for each CPU")
> Reported-by: kernel test robot <lkp@intel.com>
> Reviewed-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com>
> Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
> ---

This is where the "Changes from" go. And since this patch is not suppose to
be a Cc. But since it's still good to have a link to it. You could do:

Changes from v2 to v3: https://lore.kernel.org/linux-trace-kernel/20230409024616.31099-1-Tze-nan.Wu@mediatek.com/
  Considering holding cpu_hotplug_lock too long because of the
  synchronize_rcu(), maybe it's better to prevent the issue by copying
  cpu_online_mask at the entry of the function as V1 does, instead of
  using cpus_read_lock().


Where the previous version changes has the lore link to the previous patch,
in case someone wants to look at it.


>  kernel/trace/ring_buffer.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> index 76a2d91eecad..dc758930dacb 100644
> --- a/kernel/trace/ring_buffer.c
> +++ b/kernel/trace/ring_buffer.c
> @@ -288,9 +288,6 @@ EXPORT_SYMBOL_GPL(ring_buffer_event_data);
>  #define for_each_buffer_cpu(buffer, cpu)		\
>  	for_each_cpu(cpu, buffer->cpumask)
>  
> -#define for_each_online_buffer_cpu(buffer, cpu)		\
> -	for_each_cpu_and(cpu, buffer->cpumask, cpu_online_mask)
> -
>  #define TS_SHIFT	27
>  #define TS_MASK		((1ULL << TS_SHIFT) - 1)
>  #define TS_DELTA_TEST	(~TS_MASK)
> @@ -5353,12 +5350,19 @@ EXPORT_SYMBOL_GPL(ring_buffer_reset_cpu);
>  void ring_buffer_reset_online_cpus(struct trace_buffer *buffer)
>  {
>  	struct ring_buffer_per_cpu *cpu_buffer;
> +	cpumask_t reset_online_cpumask;

It's usually considered bad form to put a cpumask on the stack. As it can
be 128 bytes for a machine with 1024 CPUs (and yes they do exist). Also,
the mask size is set to NR_CPUS not the actual size, so you do not even
need to have it that big.


>  	int cpu;
>  
> +	/*
> +	 * Record cpu_online_mask here to make sure we iterate through the same
> +	 * online CPUs in the following two loops.
> +	 */
> +	cpumask_copy(&reset_online_cpumask, cpu_online_mask);
> +
>  	/* prevent another thread from changing buffer sizes */
>  	mutex_lock(&buffer->mutex);
>  
> -	for_each_online_buffer_cpu(buffer, cpu) {
> +	for_each_cpu_and(cpu, buffer->cpumask, &reset_online_cpumask) {
>  		cpu_buffer = buffer->buffers[cpu];
>  
>  		atomic_inc(&cpu_buffer->resize_disabled);

Anyway, we don't need to modify any of the above, and just do the following
instead of atomic_inc():

#define RESET_BIT	(1 << 30)

		atomic_add(&cpu_buffer->resize_disabled, RESET_BIT);


> @@ -5368,7 +5372,7 @@ void ring_buffer_reset_online_cpus(struct trace_buffer *buffer)
>  	/* Make sure all commits have finished */
>  	synchronize_rcu();
>  
> -	for_each_online_buffer_cpu(buffer, cpu) {
> +	for_each_cpu_and(cpu, buffer->cpumask, &reset_online_cpumask) {
>  		cpu_buffer = buffer->buffers[cpu];

Then here we can do:

		/*
		 * If a CPU came online during the synchronize_rcu(), then
		 * ignore it.
		 */
		if (!atomic_read(&cpu_buffer->resize_disabled) & RESET_BIT))
			continue;

		atomic_sub(&cpu_buffer->resize_disabled, RESET_BIT);


As the resize_disabled only needs to be set to something to make it
disabled.

-- Steve

>  
>  		reset_disabled_cpu_buffer(cpu_buffer);


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3] ring-buffer: Prevent inconsistent operation on cpu_buffer->resize_disabled
  2023-04-11 16:44     ` Steven Rostedt
@ 2023-04-11 16:48       ` Steven Rostedt
  2023-04-12  2:27       ` Tze-nan Wu (吳澤南)
  1 sibling, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2023-04-11 16:48 UTC (permalink / raw)
  To: Tze-nan Wu
  Cc: mhiramat, bobule.chang, wsd_upstream, cheng-jui.wang, npiggin,
	stable, AngeloGioacchino Del Regno, Paul E. McKenney,
	linux-kernel, linux-trace-kernel, linux-arm-kernel,
	linux-mediatek

On Tue, 11 Apr 2023 12:44:03 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> Please have each new patch be a new thread, and not a Cc to the previous

Should have been "reply to the previous" :-p

-- Steve

> version of the patch. As it makes it hard to find in INBOXs.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3] ring-buffer: Prevent inconsistent operation on cpu_buffer->resize_disabled
  2023-04-11 16:44     ` Steven Rostedt
  2023-04-11 16:48       ` Steven Rostedt
@ 2023-04-12  2:27       ` Tze-nan Wu (吳澤南)
  2023-04-12  2:32         ` Steven Rostedt
  1 sibling, 1 reply; 12+ messages in thread
From: Tze-nan Wu (吳澤南) @ 2023-04-12  2:27 UTC (permalink / raw)
  To: rostedt
  Cc: linux-kernel, linux-trace-kernel, linux-mediatek,
	Cheng-Jui Wang (王正睿),
	paulmck, wsd_upstream, Bobule Chang (張弘義),
	stable, linux-arm-kernel, mhiramat, angelogioacchino.delregno,
	npiggin

Hi Steve,

On Tue, 2023-04-11 at 12:44 -0400, Steven Rostedt wrote:
> External email : Please do not click links or open attachments until
> you have verified the sender or the content.
> 
> 
> Please have each new patch be a new thread, and not a Cc to the
> previous
> version of the patch. As it makes it hard to find in INBOXs.
> 

No problem, got it.

> On Mon, 10 Apr 2023 15:35:08 +0800
> Tze-nan Wu <Tze-nan.Wu@mediatek.com> wrote:
> 
> > Write to buffer_size_kb can permanently fail, due to
> > cpu_online_mask may
> > changed between two for_each_online_buffer_cpu loops.
> > The number of increasing and decreasing on cpu_buffer-
> > >resize_disable
> > may be inconsistent, leading that the resize_disabled in some CPUs
> > becoming none zero after ring_buffer_reset_online_cpus return.
> > 
> > This issue can be reproduced by "echo 0 > trace" while hotplugging
> > cpu.
> > After reproducing success, we can find out buffer_size_kb will not
> > be
> > functional anymore.
> > 
> > Prevent the two "loops" in this function from iterating through
> > different
> > online cpus by copying cpu_online_mask at the entry of the
> > function.
> > 
> 
> The "Changes from" need to go below  the '---', otherwise they are
> added to
> the git commit (we don't want it there).
> 

Will remember this, won't happened next time :)

> > Changes from v1 to v3:
> >   Declare the cpumask variable statically rather than dynamically.
> > 
> > Changes from v2 to v3:
> >   Considering holding cpu_hotplug_lock too long because of the
> >   synchronize_rcu(), maybe it's better to prevent the issue by
> > copying
> >   cpu_online_mask at the entry of the function as V1 does, instead
> > of
> >   using cpus_read_lock().
> > 
> > Link: 
> > https://lore.kernel.org/lkml/20230408052226.25268-1-Tze-nan.Wu@mediatek.com/
> > Link: 
> > https://lore.kernel.org/oe-kbuild-all/202304082051.Dp50upfS-lkp@intel.com/
> > Link: 
> > https://lore.kernel.org/oe-kbuild-all/202304081615.eiaqpbV8-lkp@intel.com/
> > 
> > Cc: stable@vger.kernel.org
> > Cc: npiggin@gmail.com
> > Fixes: b23d7a5f4a07 ("ring-buffer: speed up buffer resets by
> > avoiding synchronize_rcu for each CPU")
> > Reported-by: kernel test robot <lkp@intel.com>
> > Reviewed-by: Cheng-Jui Wang <cheng-jui.wang@mediatek.com>
> > Signed-off-by: Tze-nan Wu <Tze-nan.Wu@mediatek.com>
> > ---
> 
> This is where the "Changes from" go. And since this patch is not
> suppose to
> be a Cc. But since it's still good to have a link to it. You could
> do:
> 
> Changes from v2 to v3: 
> https://lore.kernel.org/linux-trace-kernel/20230409024616.31099-1-Tze-nan.Wu@mediatek.com/
>   Considering holding cpu_hotplug_lock too long because of the
>   synchronize_rcu(), maybe it's better to prevent the issue by
> copying
>   cpu_online_mask at the entry of the function as V1 does, instead of
>   using cpus_read_lock().
> 
> 
> Where the previous version changes has the lore link to the previous
> patch,
> in case someone wants to look at it.
> 

Sure, a link here is really helpful.
Will follow this format in the future.

> 
> >  kernel/trace/ring_buffer.c | 14 +++++++++-----
> >  1 file changed, 9 insertions(+), 5 deletions(-)
> > 
> > diff --git a/kernel/trace/ring_buffer.c
> > b/kernel/trace/ring_buffer.c
> > index 76a2d91eecad..dc758930dacb 100644
> > --- a/kernel/trace/ring_buffer.c
> > +++ b/kernel/trace/ring_buffer.c
> > @@ -288,9 +288,6 @@ EXPORT_SYMBOL_GPL(ring_buffer_event_data);
> >  #define for_each_buffer_cpu(buffer, cpu)             \
> >       for_each_cpu(cpu, buffer->cpumask)
> > 
> > -#define for_each_online_buffer_cpu(buffer, cpu)              \
> > -     for_each_cpu_and(cpu, buffer->cpumask, cpu_online_mask)
> > -
> >  #define TS_SHIFT     27
> >  #define TS_MASK              ((1ULL << TS_SHIFT) - 1)
> >  #define TS_DELTA_TEST        (~TS_MASK)
> > @@ -5353,12 +5350,19 @@ EXPORT_SYMBOL_GPL(ring_buffer_reset_cpu);
> >  void ring_buffer_reset_online_cpus(struct trace_buffer *buffer)
> >  {
> >       struct ring_buffer_per_cpu *cpu_buffer;
> > +     cpumask_t reset_online_cpumask;
> 
> It's usually considered bad form to put a cpumask on the stack. As it
> can
> be 128 bytes for a machine with 1024 CPUs (and yes they do exist).
> Also,
> the mask size is set to NR_CPUS not the actual size, so you do not
> even
> need to have it that big.
> 

Never thought about that until you told me,
I will keep it in mind before declare a cpumask next time.

> 
> >       int cpu;
> > 
> > +     /*
> > +      * Record cpu_online_mask here to make sure we iterate
> > through the same
> > +      * online CPUs in the following two loops.
> > +      */
> > +     cpumask_copy(&reset_online_cpumask, cpu_online_mask);
> > +
> >       /* prevent another thread from changing buffer sizes */
> >       mutex_lock(&buffer->mutex);
> > 
> > -     for_each_online_buffer_cpu(buffer, cpu) {
> > +     for_each_cpu_and(cpu, buffer->cpumask, &reset_online_cpumask)
> > {
> >               cpu_buffer = buffer->buffers[cpu];
> > 
> >               atomic_inc(&cpu_buffer->resize_disabled);
> 
> Anyway, we don't need to modify any of the above, and just do the
> following
> instead of atomic_inc():
> 
> #define RESET_BIT       (1 << 30)
> 
>                 atomic_add(&cpu_buffer->resize_disabled, RESET_BIT);
> 
> 
> > @@ -5368,7 +5372,7 @@ void ring_buffer_reset_online_cpus(struct
> > trace_buffer *buffer)
> >       /* Make sure all commits have finished */
> >       synchronize_rcu();
> > 
> > -     for_each_online_buffer_cpu(buffer, cpu) {
> > +     for_each_cpu_and(cpu, buffer->cpumask, &reset_online_cpumask)

Maybe we should use for_each_buffer_cpu(buffer, cpu) here?
since a CPU may also came offline during synchronize_rcu().

> > {
> >               cpu_buffer = buffer->buffers[cpu];
> 
> Then here we can do:
> 
>                 /*
>                  * If a CPU came online during the synchronize_rcu(),
> then
>                  * ignore it.
>                  */
>                 if (!atomic_read(&cpu_buffer->resize_disabled) &
> RESET_BIT))
>                         continue;
> 
>                 atomic_sub(&cpu_buffer->resize_disabled, RESET_BIT);
> 
> 
> As the resize_disabled only needs to be set to something to make it
> disabled.
> 
> -- Steve
> 

Thanks for all your suggestions, learn a lot from here, really
appriciate :).
I will upload a v4 patch in new thread as soon as the new patch pass my
test.

-- Tzenan


> > 
> >               reset_disabled_cpu_buffer(cpu_buffer);
> 
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3] ring-buffer: Prevent inconsistent operation on cpu_buffer->resize_disabled
  2023-04-12  2:27       ` Tze-nan Wu (吳澤南)
@ 2023-04-12  2:32         ` Steven Rostedt
  0 siblings, 0 replies; 12+ messages in thread
From: Steven Rostedt @ 2023-04-12  2:32 UTC (permalink / raw)
  To: Tze-nan Wu (吳澤南)
  Cc: linux-kernel, linux-trace-kernel, linux-mediatek,
	Cheng-Jui Wang (王正睿),
	paulmck, wsd_upstream, Bobule Chang (張弘義),
	stable, linux-arm-kernel, mhiramat, angelogioacchino.delregno,
	npiggin

On Wed, 12 Apr 2023 02:27:53 +0000
Tze-nan Wu (吳澤南) <Tze-nan.Wu@mediatek.com> wrote:

> > > @@ -5368,7 +5372,7 @@ void ring_buffer_reset_online_cpus(struct
> > > trace_buffer *buffer)
> > >       /* Make sure all commits have finished */
> > >       synchronize_rcu();
> > > 
> > > -     for_each_online_buffer_cpu(buffer, cpu) {
> > > +     for_each_cpu_and(cpu, buffer->cpumask, &reset_online_cpumask)  
> 
> Maybe we should use for_each_buffer_cpu(buffer, cpu) here?
> since a CPU may also came offline during synchronize_rcu().

Yeah, I guess that works too (not looking at the code at the moment though).

-- Steve

> 
> > > {
> > >               cpu_buffer = buffer->buffers[cpu];  
> > 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3] ring-buffer: Prevent inconsistent operation on cpu_buffer->resize_disabled
  2023-04-10  7:35   ` [PATCH v3] " Tze-nan Wu
  2023-04-11 16:44     ` Steven Rostedt
@ 2023-04-14  1:16     ` Tze-nan Wu (吳澤南)
  1 sibling, 0 replies; 12+ messages in thread
From: Tze-nan Wu (吳澤南) @ 2023-04-14  1:16 UTC (permalink / raw)
  To: mhiramat, rostedt
  Cc: linux-kernel, linux-trace-kernel, linux-mediatek,
	Cheng-Jui Wang (王正睿),
	wsd_upstream, paulmck, Bobule Chang (張弘義),
	stable, linux-arm-kernel, angelogioacchino.delregno, npiggin

In case those who are following this thread but are not included in the
cc list, the v4 patch cannot be easily found by them due to the tile of
the newer patch had been changed.

Let my put a link here:
  
https://lore.kernel.org/lkml/20230412112401.25081-1-Tze-nan.Wu@mediatek.com/

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2023-04-14  1:17 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-08  5:22 [PATCH] ring-buffer: Prevent inconsistent operation on cpu_buffer->resize_disabled Tze-nan Wu
2023-04-08  8:35 ` kernel test robot
2023-04-08 12:20 ` kernel test robot
2023-04-09  2:46 ` [PATCH v2] " Tze-nan Wu
2023-04-09 12:30   ` Bagas Sanjaya
2023-04-10  1:35     ` Tze-nan Wu (吳澤南)
2023-04-10  7:35   ` [PATCH v3] " Tze-nan Wu
2023-04-11 16:44     ` Steven Rostedt
2023-04-11 16:48       ` Steven Rostedt
2023-04-12  2:27       ` Tze-nan Wu (吳澤南)
2023-04-12  2:32         ` Steven Rostedt
2023-04-14  1:16     ` Tze-nan Wu (吳澤南)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).