linuxppc-dev.lists.ozlabs.org archive mirror
* [V3] powerpc/perf: Enable PMU counters post partition migration if PMU is active
@ 2021-10-29  3:05 Athira Rajeev
  2021-10-29  6:16 ` Nicholas Piggin
  2021-10-29 10:20 ` kernel test robot
  0 siblings, 2 replies; 10+ messages in thread
From: Athira Rajeev @ 2021-10-29  3:05 UTC (permalink / raw)
  To: mpe; +Cc: nathanl, maddy, rnsastry, kjain, npiggin, linuxppc-dev

During Live Partition Migration (LPM), it is observed that perf
counter values report zero after migration completes. However,
'perf stat' with a workload continues to show counts post migration
since the PMU gets disabled/enabled during sched switches. But in
case of system/CPU-wide monitoring, zero counts were reported with
'perf stat' after migration completed.

Example:
 ./perf stat -e r1001e -I 1000
           time             counts unit events
     1.001010437         22,137,414      r1001e
     2.002495447         15,455,821      r1001e
<<>> As seen in the logs below, the counter values show zero
        after migration completes.
<<>>
    86.142535370    129,392,333,440      r1001e
    87.144714617                  0      r1001e
    88.146526636                  0      r1001e
    89.148085029                  0      r1001e

Here the PMU is enabled at the start of the perf session and counter
values are read at intervals. Counters are only disabled at the
end of the session. The powerpc mobility code presently does not
handle disabling and re-enabling the PMU counters during partition
migration. Also, since the PMU register values are not saved/restored
during migration, PMU registers like Monitor Mode Control Register 0
(MMCR0) and Monitor Mode Control Register 1 (MMCR1) will not contain
the values they were programmed with. Hence PMU counters will not be
enabled correctly post migration.

Fix this in the mobility code by disabling and re-enabling the
PMU on all CPUs before and after migration. The patch introduces
two functions, 'mobility_pmu_disable' and 'mobility_pmu_enable'.
mobility_pmu_disable() is called before the processor threads go
to suspend state so as to disable the PMU counters, and disabling
is done only if there are any active events running on that CPU.
mobility_pmu_enable() is called after the migration is done to
re-enable the PMU counters.
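
For reference, an editorial sketch of the intended call sequence; it
mirrors the mobility.c hunk in the diff below:

	on_each_cpu(&mobility_pmu_disable, NULL, 0);	/* before suspend */
	ret = pseries_suspend(handle);			/* LPM happens here */
	on_each_cpu(&mobility_pmu_enable, NULL, 0);	/* after resume */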

Since the Performance Monitor Counters (PMCs) are not
saved/restored during LPM, the PMC value reads as zero after
migration while 'event->hw.prev_count' still holds a non-zero
value. This causes a problem when updating event->count, since we
always accumulate (PMC value - event->hw.prev_count) into
event->count. If event->hw.prev_count is greater than the PMC
value, the delta, and hence event->count, goes negative. To fix
this, 'prev_count' also needs to be re-initialised for all events
while re-enabling them. Hence read off the existing events and
clear the PMC index (stored in event->hw.idx) for all events in
mobility_pmu_disable(). This way, the event count settings get
re-initialised correctly in power_pmu_enable().
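
An editorial, simplified sketch of the book3s count accumulation (the
real code goes through power_pmu_read() and check_and_compute_delta(),
which also handles 32-bit counter wrap, elided here):

	static void sketch_pmu_read(struct perf_event *event)
	{
		u64 prev = local64_read(&event->hw.prev_count);
		u64 val = read_pmc(event->hw.idx);	/* reads 0 right after LPM */
		s64 delta = val - prev;			/* negative when the PMC was reset */

		local64_add(delta, &event->count);
		local64_set(&event->hw.prev_count, val);
	}

Clearing event->hw.idx in mobility_pmu_disable() makes
power_pmu_enable() treat the event as newly added, so prev_count is
re-initialised from the freshly programmed PMC value.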

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
[ Fixed compilation error reported by kernel test robot ]
Reported-by: kernel test robot <lkp@intel.com>
---
Changelog:
Change from v2 -> v3:
Addressed review comments from Nicholas Piggin.
 - Removed the "migrate" field which was added in the initial
   patch to update the event count settings correctly
   in power_pmu_enable. Instead, read off existing events in
   mobility_pmu_disable before power_pmu_enable.
 - Moved the mobility_pmu_disable/enable declaration from
   rtas.h to perf event header file.

Addressed review comments from Nathan.
 - Moved the mobility function calls from the stop_machine
   context out to pseries_migrate_partition. Also, this is
   now a per-cpu invocation.

Change from v1 -> v2:
 - Moved the mobility_pmu_enable and mobility_pmu_disable
   declarations under CONFIG_PPC_PERF_CTRS in rtas.h.
   Also included 'asm/rtas.h' in core-book3s to fix the
   compilation warning reported by kernel test robot.

 arch/powerpc/include/asm/perf_event.h     |  8 +++++
 arch/powerpc/perf/core-book3s.c           | 39 +++++++++++++++++++++++
 arch/powerpc/platforms/pseries/mobility.c |  7 ++++
 3 files changed, 54 insertions(+)

diff --git a/arch/powerpc/include/asm/perf_event.h b/arch/powerpc/include/asm/perf_event.h
index 164e910bf654..88aab6cf840c 100644
--- a/arch/powerpc/include/asm/perf_event.h
+++ b/arch/powerpc/include/asm/perf_event.h
@@ -17,6 +17,14 @@ static inline bool is_sier_available(void) { return false; }
 static inline unsigned long get_pmcs_ext_regs(int idx) { return 0; }
 #endif
 
+#ifdef CONFIG_PPC_PERF_CTRS
+void mobility_pmu_disable(void *unused);
+void mobility_pmu_enable(void *unused);
+#else
+static inline void mobility_pmu_disable(void *unused) { }
+static inline void mobility_pmu_enable(void *unused) { }
+#endif
+
 #ifdef CONFIG_FSL_EMB_PERF_EVENT
 #include <asm/perf_event_fsl_emb.h>
 #endif
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 73e62e9b179b..2e8c4c668fa3 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -1343,6 +1343,33 @@ static void power_pmu_disable(struct pmu *pmu)
 	local_irq_restore(flags);
 }
 
+/*
+ * Called from pseries_migrate_partition() function
+ * before migration, from powerpc/mobility code.
+ */
+void mobility_pmu_disable(void *unused)
+{
+	struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
+	struct perf_event *event;
+
+	if (cpuhw->n_events != 0) {
+		int i;
+
+		power_pmu_disable(NULL);
+		/*
+		 * Read off any pre-existing events because the register
+		 * values may not be migrated.
+		 */
+		for (i = 0; i < cpuhw->n_events; ++i) {
+			event = cpuhw->event[i];
+			if (event->hw.idx) {
+				power_pmu_read(event);
+				event->hw.idx = 0;
+			}
+		}
+	}
+}
+
 /*
  * Re-enable all events if disable == 0.
  * If we were previously disabled and events were added, then
@@ -1515,6 +1542,18 @@ static void power_pmu_enable(struct pmu *pmu)
 	local_irq_restore(flags);
 }
 
+/*
+ * Called from pseries_migrate_partition() function
+ * after migration, from powerpc/mobility code.
+ */
+void mobility_pmu_enable(void *unused)
+{
+	struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
+
+	cpuhw->n_added = cpuhw->n_events;
+	power_pmu_enable(NULL);
+}
+
 static int collect_events(struct perf_event *group, int max_count,
 			  struct perf_event *ctrs[], u64 *events,
 			  unsigned int *flags)
diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
index e83e0891272d..3e96485ccba4 100644
--- a/arch/powerpc/platforms/pseries/mobility.c
+++ b/arch/powerpc/platforms/pseries/mobility.c
@@ -22,6 +22,7 @@
 #include <linux/delay.h>
 #include <linux/slab.h>
 #include <linux/stringify.h>
+#include <linux/perf_event.h>
 
 #include <asm/machdep.h>
 #include <asm/rtas.h>
@@ -631,12 +632,18 @@ static int pseries_migrate_partition(u64 handle)
 	if (ret)
 		return ret;
 
+	/* Disable PMU before suspend */
+	on_each_cpu(&mobility_pmu_disable, NULL, 0);
+
 	ret = pseries_suspend(handle);
 	if (ret == 0)
 		post_mobility_fixup();
 	else
 		pseries_cancel_migration(handle, ret);
 
+	/* Enable PMU after resuming */
+	on_each_cpu(&mobility_pmu_enable, NULL, 0);
+
 	return ret;
 }
 
-- 
2.33.0



* Re: [V3] powerpc/perf: Enable PMU counters post partition migration if PMU is active
  2021-10-29  3:05 [V3] powerpc/perf: Enable PMU counters post partition migration if PMU is active Athira Rajeev
@ 2021-10-29  6:16 ` Nicholas Piggin
  2021-10-29 13:15   ` Michael Ellerman
  2021-11-02  6:13   ` Athira Rajeev
  2021-10-29 10:20 ` kernel test robot
  1 sibling, 2 replies; 10+ messages in thread
From: Nicholas Piggin @ 2021-10-29  6:16 UTC (permalink / raw)
  To: Athira Rajeev, mpe; +Cc: nathanl, kjain, maddy, linuxppc-dev, rnsastry

Excerpts from Athira Rajeev's message of October 29, 2021 1:05 pm:
> During Live Partition Migration (LPM), it is observed that perf
> counter values report zero after migration completes. However,
> 'perf stat' with a workload continues to show counts post migration
> since the PMU gets disabled/enabled during sched switches. But in
> case of system/CPU-wide monitoring, zero counts were reported with
> 'perf stat' after migration completed.
> 
> Example:
>  ./perf stat -e r1001e -I 1000
>            time             counts unit events
>      1.001010437         22,137,414      r1001e
>      2.002495447         15,455,821      r1001e
> <<>> As seen in the logs below, the counter values show zero
>         after migration completes.
> <<>>
>     86.142535370    129,392,333,440      r1001e
>     87.144714617                  0      r1001e
>     88.146526636                  0      r1001e
>     89.148085029                  0      r1001e

This is the output without the patch? After the patch it keeps counting 
I suppose? And does the very large count go away too?

> [... remainder of commit message and patch trimmed; quoted in full in the original posting above ...]
> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
> index e83e0891272d..3e96485ccba4 100644
> --- a/arch/powerpc/platforms/pseries/mobility.c
> +++ b/arch/powerpc/platforms/pseries/mobility.c
> @@ -22,6 +22,7 @@
>  #include <linux/delay.h>
>  #include <linux/slab.h>
>  #include <linux/stringify.h>
> +#include <linux/perf_event.h>
>  
>  #include <asm/machdep.h>
>  #include <asm/rtas.h>
> @@ -631,12 +632,18 @@ static int pseries_migrate_partition(u64 handle)
>  	if (ret)
>  		return ret;
>  
> +	/* Disable PMU before suspend */
> +	on_each_cpu(&mobility_pmu_disable, NULL, 0);

Why was this moved out of stop machine and to an IPI?

My concern would be, what are the other CPUs doing at this time? Is it 
possible they could take interrupts and schedule? Could that mess up the
perf state here?
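
(Editorial aside: the primitive in question has the shape

	void on_each_cpu(smp_call_func_t func, void *info, int wait);

and with wait == 0 the caller does not wait for func to complete on
the other CPUs, so pseries_migrate_partition() can move on toward the
suspend before every CPU has actually run mobility_pmu_disable().)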

Thanks,
Nick


* Re: [V3] powerpc/perf: Enable PMU counters post partition migration if PMU is active
  2021-10-29  3:05 [V3] powerpc/perf: Enable PMU counters post partition migration if PMU is active Athira Rajeev
  2021-10-29  6:16 ` Nicholas Piggin
@ 2021-10-29 10:20 ` kernel test robot
  1 sibling, 0 replies; 10+ messages in thread
From: kernel test robot @ 2021-10-29 10:20 UTC (permalink / raw)
  To: Athira Rajeev, mpe
  Cc: nathanl, maddy, kbuild-all, rnsastry, kjain, npiggin, linuxppc-dev


Hi Athira,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on tip/perf/core v5.15-rc7 next-20211028]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Athira-Rajeev/powerpc-perf-Enable-PMU-counters-post-partition-migration-if-PMU-is-active/20211029-110804
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc64-randconfig-r003-20211028 (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/caeb6bb8d2f36e6b15151587193c1e0a9e62ab16
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Athira-Rajeev/powerpc-perf-Enable-PMU-counters-post-partition-migration-if-PMU-is-active/20211029-110804
        git checkout caeb6bb8d2f36e6b15151587193c1e0a9e62ab16
        # save the attached .config to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=powerpc SHELL=/bin/bash arch/powerpc/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   arch/powerpc/platforms/pseries/mobility.c: In function 'pseries_migrate_partition':
>> arch/powerpc/platforms/pseries/mobility.c:670:22: error: 'mobility_pmu_disable' undeclared (first use in this function)
     670 |         on_each_cpu(&mobility_pmu_disable, NULL, 0);
         |                      ^~~~~~~~~~~~~~~~~~~~
   arch/powerpc/platforms/pseries/mobility.c:670:22: note: each undeclared identifier is reported only once for each function it appears in
>> arch/powerpc/platforms/pseries/mobility.c:679:22: error: 'mobility_pmu_enable' undeclared (first use in this function)
     679 |         on_each_cpu(&mobility_pmu_enable, NULL, 0);
         |                      ^~~~~~~~~~~~~~~~~~~


vim +/mobility_pmu_disable +670 arch/powerpc/platforms/pseries/mobility.c

   660	
   661	static int pseries_migrate_partition(u64 handle)
   662	{
   663		int ret;
   664	
   665		ret = wait_for_vasi_session_suspending(handle);
   666		if (ret)
   667			return ret;
   668	
   669		/* Disable PMU before suspend */
 > 670		on_each_cpu(&mobility_pmu_disable, NULL, 0);
   671	
   672		ret = pseries_suspend(handle);
   673		if (ret == 0)
   674			post_mobility_fixup();
   675		else
   676			pseries_cancel_migration(handle, ret);
   677	
   678		/* Enable PMU after resuming */
 > 679		on_each_cpu(&mobility_pmu_enable, NULL, 0);
   680	
   681		return ret;
   682	}
   683	
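
(Editorial note on the likely cause, assuming this randconfig has
CONFIG_PERF_EVENTS=n: linux/perf_event.h only pulls in the arch
header when perf is built in,

	/* include/linux/perf_event.h (upstream) */
	#ifdef CONFIG_PERF_EVENTS
	# include <asm/perf_event.h>
	# include <asm/local64.h>
	#endif

so the mobility_pmu_*() declarations and stubs in asm/perf_event.h
are never seen by mobility.c, leaving the calls undeclared.)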

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 36333 bytes --]


* Re: [V3] powerpc/perf: Enable PMU counters post partition migration if PMU is active
  2021-10-29  6:16 ` Nicholas Piggin
@ 2021-10-29 13:15   ` Michael Ellerman
  2021-11-02 11:00     ` Athira Rajeev
  2021-11-02 11:35     ` Nicholas Piggin
  2021-11-02  6:13   ` Athira Rajeev
  1 sibling, 2 replies; 10+ messages in thread
From: Michael Ellerman @ 2021-10-29 13:15 UTC (permalink / raw)
  To: Nicholas Piggin, Athira Rajeev
  Cc: nathanl, kjain, maddy, linuxppc-dev, rnsastry

Nicholas Piggin <npiggin@gmail.com> writes:
> Excerpts from Athira Rajeev's message of October 29, 2021 1:05 pm:
>> During Live Partition Migration (LPM), it is observed that perf
>> counter values report zero after migration completes. However,
>> 'perf stat' with a workload continues to show counts post migration
>> since the PMU gets disabled/enabled during sched switches. But in
>> case of system/CPU-wide monitoring, zero counts were reported with
>> 'perf stat' after migration completed.
>> 
>> Example:
>>  ./perf stat -e r1001e -I 1000
>>            time             counts unit events
>>      1.001010437         22,137,414      r1001e
>>      2.002495447         15,455,821      r1001e
>> <<>> As seen in the logs below, the counter values show zero
>>         after migration completes.
>> <<>>
>>     86.142535370    129,392,333,440      r1001e
>>     87.144714617                  0      r1001e
>>     88.146526636                  0      r1001e
>>     89.148085029                  0      r1001e
>
> This is the output without the patch? After the patch it keeps counting 
> I suppose? And does the very large count go away too?
>
>> 
>> [... remainder of commit message, changelog and perf_event.h hunk trimmed ...]
>> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
>> index 73e62e9b179b..2e8c4c668fa3 100644
>> --- a/arch/powerpc/perf/core-book3s.c
>> +++ b/arch/powerpc/perf/core-book3s.c
>> @@ -1343,6 +1343,33 @@ static void power_pmu_disable(struct pmu *pmu)
>>  	local_irq_restore(flags);
>>  }
>>  
>> +/*
>> + * Called from pseries_migrate_partition() function
>> + * before migration, from powerpc/mobility code.
>> + */

These are only needed if pseries is built, so should be inside a PSERIES
ifdef.

This function should handle iterating over CPUs, that shouldn't be left
up to the mobility.c code.

And the names should be something like pmu_start_migration(),
pmu_finish_migration().
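
Editorial sketch of the suggested shape; the names and the split are
illustrative, not a final patch:

	#ifdef CONFIG_PPC_PSERIES
	static void __pmu_start_migration(void *unused)
	{
		/* per-CPU disable and read-off, as in mobility_pmu_disable() */
	}

	void pmu_start_migration(void)
	{
		/* the perf code, not mobility.c, iterates over the CPUs */
		on_each_cpu(__pmu_start_migration, NULL, 1);
	}
	#endif /* CONFIG_PPC_PSERIES */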

>> +void mobility_pmu_disable(void *unused)
>> +{
>> +	struct cpu_hw_events *cpuhw = this_cpu_ptr(&cpu_hw_events);
>> +	struct perf_event *event;
>> +
>> +	if (cpuhw->n_events != 0) {
>> +		int i;
>> +
>> +		power_pmu_disable(NULL);
>> +		/*
>> +		 * Read off any pre-existing events because the register
>> +		 * values may not be migrated.
>> +		 */
>> +		for (i = 0; i < cpuhw->n_events; ++i) {
>> +			event = cpuhw->event[i];
>> +			if (event->hw.idx) {
>> +				power_pmu_read(event);
>> +				event->hw.idx = 0;
>> +			}
>> +		}
>> +	}
>> +}
>> +
>> [... power_pmu_enable() and mobility_pmu_enable() hunks trimmed ...]
>> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
>> index e83e0891272d..3e96485ccba4 100644
>> --- a/arch/powerpc/platforms/pseries/mobility.c
>> +++ b/arch/powerpc/platforms/pseries/mobility.c
>> @@ -22,6 +22,7 @@
>>  #include <linux/delay.h>
>>  #include <linux/slab.h>
>>  #include <linux/stringify.h>
>> +#include <linux/perf_event.h>
>>  
>>  #include <asm/machdep.h>
>>  #include <asm/rtas.h>
>> @@ -631,12 +632,18 @@ static int pseries_migrate_partition(u64 handle)
>>  	if (ret)
>>  		return ret;
>>  
>> +	/* Disable PMU before suspend */
>> +	on_each_cpu(&mobility_pmu_disable, NULL, 0);
>
> Why was this moved out of stop machine and to an IPI?
>
> My concern would be, what are the other CPUs doing at this time? Is it 
> possible they could take interrupts and schedule? Could that mess up the
> perf state here?

pseries_migrate_partition() is called directly from migration_store(),
which is the sysfs store function, which can be called concurrently by
different CPUs.

It's also potentially called from rtas_syscall_dispatch_ibm_suspend_me(),
from sys_rtas(), again with no locking.

So we could have two CPUs calling into here at the same time, which
might not crash, but is unlikely to work well.

I think the lack of locking might have been OK in the past because only
one CPU will successfully get the other CPUs to call do_join() in
pseries_suspend(). But I could be wrong.

Anyway, now that we're mutating the PMU state before suspending we need
to be more careful. So I think we need a lock around the whole sequence.

cheers


* Re: [V3] powerpc/perf: Enable PMU counters post partition migration if PMU is active
  2021-10-29  6:16 ` Nicholas Piggin
  2021-10-29 13:15   ` Michael Ellerman
@ 2021-11-02  6:13   ` Athira Rajeev
  1 sibling, 0 replies; 10+ messages in thread
From: Athira Rajeev @ 2021-11-02  6:13 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: Nathan Lynch, maddy, rnsastry, kjain, linuxppc-dev




> On 29-Oct-2021, at 11:46 AM, Nicholas Piggin <npiggin@gmail.com> wrote:
> 
> Excerpts from Athira Rajeev's message of October 29, 2021 1:05 pm:
>> During Live Partition Migration (LPM), it is observed that perf
>> counter values report zero after migration completes. However,
>> 'perf stat' with a workload continues to show counts post migration
>> since the PMU gets disabled/enabled during sched switches. But in
>> case of system/CPU-wide monitoring, zero counts were reported with
>> 'perf stat' after migration completed.
>> 
>> Example:
>> ./perf stat -e r1001e -I 1000
>>           time             counts unit events
>>     1.001010437         22,137,414      r1001e
>>     2.002495447         15,455,821      r1001e
>> <<>> As seen in the logs below, the counter values show zero
>>        after migration completes.
>> <<>>
>>    86.142535370    129,392,333,440      r1001e
>>    87.144714617                  0      r1001e
>>    88.146526636                  0      r1001e
>>    89.148085029                  0      r1001e
> 
> This is the output without the patch? After the patch it keeps counting 
> I suppose? And does the very large count go away too?

Hi Nick,

Thanks for the review comments.
The output is without the patch. After the patch it keeps counting and the negative count goes away.
My bad, I missed keeping both results in the present version. I will add the before- and after-patch results in the next version.


> [... remainder of quoted message trimmed ...]




* Re: [V3] powerpc/perf: Enable PMU counters post partition migration if PMU is active
  2021-10-29 13:15   ` Michael Ellerman
@ 2021-11-02 11:00     ` Athira Rajeev
  2021-11-02 11:35     ` Nicholas Piggin
  1 sibling, 0 replies; 10+ messages in thread
From: Athira Rajeev @ 2021-11-02 11:00 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, maddy, rnsastry, kjain, Nicholas Piggin, linuxppc-dev




> On 29-Oct-2021, at 6:45 PM, Michael Ellerman <mpe@ellerman.id.au> wrote:
> 
Nicholas Piggin <npiggin@gmail.com> writes:
>> Excerpts from Athira Rajeev's message of October 29, 2021 1:05 pm:
>>> During Live Partition Migration (LPM), it is observed that perf
>>> counter values report zero after migration completes. However,
>>> 'perf stat' with a workload continues to show counts post migration
>>> since the PMU gets disabled/enabled during sched switches. But in
>>> case of system/CPU-wide monitoring, zero counts were reported with
>>> 'perf stat' after migration completed.
>>> 
>>> Example:
>>> ./perf stat -e r1001e -I 1000
>>>           time             counts unit events
>>>     1.001010437         22,137,414      r1001e
>>>     2.002495447         15,455,821      r1001e
>>> <<>> As seen in the logs below, the counter values show zero
>>>        after migration completes.
>>> <<>>
>>>    86.142535370    129,392,333,440      r1001e
>>>    87.144714617                  0      r1001e
>>>    88.146526636                  0      r1001e
>>>    89.148085029                  0      r1001e
>> 
>> This is the output without the patch? After the patch it keeps counting 
>> I suppose? And does the very large count go away too?
>> 
>>> 
>>> [... remainder of commit message and patch trimmed ...]
>>> +/*
>>> + * Called from pseries_migrate_partition() function
>>> + * before migration, from powerpc/mobility code.
>>> + */
> 
> These are only needed if pseries is built, so should be inside a PSERIES
> ifdef.

Sure mpe, will address this change in the next version.
> 
> This function should handle iterating over CPUs, that shouldn't be left
> up to the mobility.c code.
> 
> And the names should be something like pmu_start_migration(),
> pmu_finish_migration().

Ok, will change
> 
>>> [... quoted function bodies and earlier hunks trimmed ...]
>>> @@ -631,12 +632,18 @@ static int pseries_migrate_partition(u64 handle)
>>> 	if (ret)
>>> 		return ret;
>>> 
>>> +	/* Disable PMU before suspend */
>>> +	on_each_cpu(&mobility_pmu_disable, NULL, 0);
>> 
>> Why was this moved out of stop machine and to an IPI?
>> 
>> My concern would be, what are the other CPUs doing at this time? Is it 
>> possible they could take interrupts and schedule? Could that mess up the
>> perf state here?
> 
> pseries_migrate_partition() is called directly from migration_store(),
> which is the sysfs store function, which can be called concurrently by
> different CPUs.
> 
> It's also potentially called from rtas_syscall_dispatch_ibm_suspend_me(),
> from sys_rtas(), again with no locking.
> 
> So we could have two CPUs calling into here at the same time, which
> might not crash, but is unlikely to work well.
> 
> I think the lack of locking might have been OK in the past because only
> one CPU will successfully get the other CPUs to call do_join() in
> pseries_suspend(). But I could be wrong.
> 
> Anyway, now that we're mutating the PMU state before suspending we need
> to be more careful. So I think we need a lock around the whole sequence.

Thanks for the review comments.

The PMU callback invocations were from inside stop machine (inside “do_join”) in the initial version.
Moved these out to pseries_migrate_partition as an IPI so as to minimise calls into other subsystems from the “do_join” context, as pointed out by Nathan.
But if it is not safe to have a per-cpu invocation from pseries_migrate_partition, should we handle this under the interrupt-disable path in do_join itself (as in the initial version)?

Please share your thoughts/suggestions.

Thanks
Athira

> 
> cheers




* Re: [V3] powerpc/perf: Enable PMU counters post partition migration if PMU is active
  2021-10-29 13:15   ` Michael Ellerman
  2021-11-02 11:00     ` Athira Rajeev
@ 2021-11-02 11:35     ` Nicholas Piggin
  2021-11-03 21:11       ` Nathan Lynch
  1 sibling, 1 reply; 10+ messages in thread
From: Nicholas Piggin @ 2021-11-02 11:35 UTC (permalink / raw)
  To: Athira Rajeev, Michael Ellerman
  Cc: nathanl, kjain, maddy, linuxppc-dev, rnsastry

Excerpts from Michael Ellerman's message of October 29, 2021 11:15 pm:
> Nicholas Piggin <npiggin@gmail.com> writes:
>> Excerpts from Athira Rajeev's message of October 29, 2021 1:05 pm:
>>> @@ -631,12 +632,18 @@ static int pseries_migrate_partition(u64 handle)
>>>  	if (ret)
>>>  		return ret;
>>>  
>>> +	/* Disable PMU before suspend */
>>> +	on_each_cpu(&mobility_pmu_disable, NULL, 0);
>>
>> Why was this moved out of stop machine and to an IPI?
>>
>> My concern would be, what are the other CPUs doing at this time? Is it 
>> possible they could take interrupts and schedule? Could that mess up the
>> perf state here?
> 
> pseries_migrate_partition() is called directly from migration_store(),
> which is the sysfs store function, which can be called concurrently by
> different CPUs.
> 
> It's also potentially called from rtas_syscall_dispatch_ibm_suspend_me(),
> from sys_rtas(), again with no locking.
> 
> So we could have two CPUs calling into here at the same time, which
> might not crash, but is unlikely to work well.
> 
> I think the lack of locking might have been OK in the past because only
> one CPU will successfully get the other CPUs to call do_join() in
> pseries_suspend(). But I could be wrong.
> 
> Anyway, now that we're mutating the PMU state before suspending we need
> to be more careful. So I think we need a lock around the whole sequence.

My concern is still that we wouldn't necessarily have the other CPUs 
under control at that point even if we serialize the migrate path.
They could take interrupts, and possibly call into the perf subsystem
after mobility_pmu_disable() (e.g., via syscall or context switch),
which might mess things up.

I think stop machine is a reasonable place for the code in this
case. It's low-level disabling of a hardware facility and saving
off of registers.
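
An editorial sketch of that placement (roughly what v1 of this patch
did; do_suspend_join() is a hypothetical stand-in for the existing
join/suspend logic, and do_join() runs on every CPU under
stop_machine):

	static int do_join(void *arg)
	{
		int ret;

		mobility_pmu_disable(NULL);	/* every CPU is held in stop_machine */
		ret = do_suspend_join(arg);
		mobility_pmu_enable(NULL);

		return ret;
	}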

Thanks,
Nick



* Re: [V3] powerpc/perf: Enable PMU counters post partition migration if PMU is active
  2021-11-02 11:35     ` Nicholas Piggin
@ 2021-11-03 21:11       ` Nathan Lynch
  2021-11-04  5:55         ` Michael Ellerman
  2021-11-05  1:46         ` Nicholas Piggin
  0 siblings, 2 replies; 10+ messages in thread
From: Nathan Lynch @ 2021-11-03 21:11 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: Athira Rajeev, rnsastry, kjain, maddy, linuxppc-dev

Nicholas Piggin <npiggin@gmail.com> writes:
> Excerpts from Michael Ellerman's message of October 29, 2021 11:15 pm:
>> Nicholas Piggin <npiggin@gmail.com> writes:
>>> Excerpts from Athira Rajeev's message of October 29, 2021 1:05 pm:
>>>> @@ -631,12 +632,18 @@ static int pseries_migrate_partition(u64 handle)
>>>>  	if (ret)
>>>>  		return ret;
>>>>  
>>>> +	/* Disable PMU before suspend */
>>>> +	on_each_cpu(&mobility_pmu_disable, NULL, 0);
>>>
>>> Why was this moved out of stop machine and to an IPI?
>>>
>>> My concern would be, what are the other CPUs doing at this time? Is it 
>>> possible they could take interrupts and schedule? Could that mess up the
>>> perf state here?
>> 
>> pseries_migrate_partition() is called directly from migration_store(),
>> which is the sysfs store function, which can be called concurrently by
>> different CPUs.
>> 
>> It's also potentially called from rtas_syscall_dispatch_ibm_suspend_me(),
>> from sys_rtas(), again with no locking.
>> 
>> So we could have two CPUs calling into here at the same time, which
>> might not crash, but is unlikely to work well.
>> 
>> I think the lack of locking might have been OK in the past because only
>> one CPU will successfully get the other CPUs to call do_join() in
>> pseries_suspend(). But I could be wrong.
>> 
>> Anyway, now that we're mutating the PMU state before suspending we need
>> to be more careful. So I think we need a lock around the whole
>> sequence.

Regardless of the outcome here, generally agreed that some serialization
should be imposed in this path. The way the platform works (and some
extra measures by the drmgr utility) makes it so that this code isn't
entered concurrently in usual operation, but it's possible to make it
happen if you are root.

A file-static mutex should be OK.
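
An editorial sketch of what that could look like in mobility.c;
do_migrate_partition() is a hypothetical stand-in for the existing
function body:

	static DEFINE_MUTEX(pseries_migration_mutex);

	static int pseries_migrate_partition(u64 handle)
	{
		int ret;

		mutex_lock(&pseries_migration_mutex);
		ret = do_migrate_partition(handle);
		mutex_unlock(&pseries_migration_mutex);

		return ret;
	}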

> My concern is still that we wouldn't necessarily have the other CPUs 
> under control at that point even if we serialize the migrate path.
> They could take interrupts, possibly call into perf subsystem after
> the mobility_pmu_disable (e.g., via syscall or context switch) which 
> might mess things up.
>
> I think the stop machine is a reasonable place for the code in this 
> case. It's a low level disabling of hardware facility and saving off 
> registers.

That makes sense, but I can't help feeling concerned still. For this to
be safe, power_pmu_enable() and power_pmu_disable() must never sleep or
re-enable interrupts or send IPIs. I don't see anything obviously unsafe
right now, but is that already part of their contract? Is there much
risk they could change in the future to violate those constraints?

That aside, the proposed change seems like we would be hacking around a
more generic perf/pmu limitation in a powerpc-specific way. I see the
same behavior on x86 across suspend/resume.

# perf stat -a -e cache-misses -I 1000 & sleep 2 ; systemctl suspend ; sleep 20 ; kill $(jobs -p)
[1] 189806
#           time             counts unit events
     1.000501710          9,983,649      cache-misses
     2.002620321         14,131,072      cache-misses
     3.004579071         23,010,971      cache-misses
     9.971854783 140,737,491,680,853      cache-misses
    10.982669250                  0      cache-misses
    11.984660498                  0      cache-misses
    12.986648392                  0      cache-misses
    13.988561766                  0      cache-misses
    14.992670615                  0      cache-misses
    15.994938111                  0      cache-misses
    16.996703952                  0      cache-misses
    17.999092812                  0      cache-misses
    19.000602677                  0      cache-misses
    20.003272216                  0      cache-misses
    21.004770295                  0      cache-misses
# uname -r
5.13.19-100.fc33.x86_64




* Re: [V3] powerpc/perf: Enable PMU counters post partition migration if PMU is active
  2021-11-03 21:11       ` Nathan Lynch
@ 2021-11-04  5:55         ` Michael Ellerman
  2021-11-05  1:46         ` Nicholas Piggin
  1 sibling, 0 replies; 10+ messages in thread
From: Michael Ellerman @ 2021-11-04  5:55 UTC (permalink / raw)
  To: Nathan Lynch, Nicholas Piggin
  Cc: kjain, Athira Rajeev, maddy, linuxppc-dev, rnsastry

Nathan Lynch <nathanl@linux.ibm.com> writes:
> Nicholas Piggin <npiggin@gmail.com> writes:
>> Excerpts from Michael Ellerman's message of October 29, 2021 11:15 pm:
>>> Nicholas Piggin <npiggin@gmail.com> writes:
>>>> Excerpts from Athira Rajeev's message of October 29, 2021 1:05 pm:
>>>>> @@ -631,12 +632,18 @@ static int pseries_migrate_partition(u64 handle)
>>>>>  	if (ret)
>>>>>  		return ret;
>>>>>  
>>>>> +	/* Disable PMU before suspend */
>>>>> +	on_each_cpu(&mobility_pmu_disable, NULL, 0);
>>>>
>>>> Why was this moved out of stop machine and to an IPI?
>>>>
>>>> My concern would be, what are the other CPUs doing at this time? Is it 
>>>> possible they could take interrupts and schedule? Could that mess up the
>>>> perf state here?
>>> 
>>> pseries_migrate_partition() is called directly from migration_store(),
>>> which is the sysfs store function, which can be called concurrently by
>>> different CPUs.
>>> 
>>> It's also potentially called from rtas_syscall_dispatch_ibm_suspend_me(),
>>> from sys_rtas(), again with no locking.
>>> 
>>> So we could have two CPUs calling into here at the same time, which
>>> might not crash, but is unlikely to work well.
>>> 
>>> I think the lack of locking might have been OK in the past because only
>>> one CPU will successfully get the other CPUs to call do_join() in
>>> pseries_suspend(). But I could be wrong.
>>> 
>>> Anyway, now that we're mutating the PMU state before suspending we need
>>> to be more careful. So I think we need a lock around the whole
>>> sequence.
>
> Regardless of the outcome here, generally agreed that some serialization
> should be imposed in this path. The way the platform works (and some
> extra measures by the drmgr utility) makes it so that this code isn't
> entered concurrently in usual operation, but it's possible to make it
> happen if you are root.

Yeah I agree it's unlikely to be a problem in practice.

> A file-static mutex should be OK.

Ack.

>> My concern is still that we wouldn't necessarily have the other CPUs 
>> under control at that point even if we serialize the migrate path.
>> They could take interrupts, possibly call into perf subsystem after
>> the mobility_pmu_disable (e.g., via syscall or context switch) which 
>> might mess things up.
>>
>> I think the stop machine is a reasonable place for the code in this 
>> case. It's a low level disabling of hardware facility and saving off 
>> registers.
>
> That makes sense, but I can't help feeling concerned still. For this to
> be safe, power_pmu_enable() and power_pmu_disable() must never sleep or
> re-enable interrupts or send IPIs. I don't see anything obviously unsafe
> right now, but is that already part of their contract? Is there much
> risk they could change in the future to violate those constraints?
>
> That aside, the proposed change seems like we would be hacking around a
> more generic perf/pmu limitation in a powerpc-specific way. I see the
> same behavior on x86 across suspend/resume.
>
> # perf stat -a -e cache-misses -I 1000 & sleep 2 ; systemctl suspend ; sleep 20 ; kill $(jobs -p)
> [1] 189806
> #           time             counts unit events
>      1.000501710          9,983,649      cache-misses
>      2.002620321         14,131,072      cache-misses
>      3.004579071         23,010,971      cache-misses
>      9.971854783 140,737,491,680,853      cache-misses
>     10.982669250                  0      cache-misses
>     11.984660498                  0      cache-misses
>     12.986648392                  0      cache-misses
>     13.988561766                  0      cache-misses
>     14.992670615                  0      cache-misses
>     15.994938111                  0      cache-misses
>     16.996703952                  0      cache-misses
>     17.999092812                  0      cache-misses
>     19.000602677                  0      cache-misses
>     20.003272216                  0      cache-misses
>     21.004770295                  0      cache-misses
> # uname -r
> 5.13.19-100.fc33.x86_64

That is interesting.

Athira, I guess we should bring that to the perf maintainers and see if
there's any interest in solving the issue in a generic fashion.

cheers


* Re: [V3] powerpc/perf: Enable PMU counters post partition migration if PMU is active
  2021-11-03 21:11       ` Nathan Lynch
  2021-11-04  5:55         ` Michael Ellerman
@ 2021-11-05  1:46         ` Nicholas Piggin
  1 sibling, 0 replies; 10+ messages in thread
From: Nicholas Piggin @ 2021-11-05  1:46 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: Athira Rajeev, rnsastry, kjain, maddy, linuxppc-dev

Excerpts from Nathan Lynch's message of November 4, 2021 7:11 am:
> Nicholas Piggin <npiggin@gmail.com> writes:
>> Excerpts from Michael Ellerman's message of October 29, 2021 11:15 pm:
>>> Nicholas Piggin <npiggin@gmail.com> writes:
>>>> Excerpts from Athira Rajeev's message of October 29, 2021 1:05 pm:
>>>>> @@ -631,12 +632,18 @@ static int pseries_migrate_partition(u64 handle)
>>>>>  	if (ret)
>>>>>  		return ret;
>>>>>  
>>>>> +	/* Disable PMU before suspend */
>>>>> +	on_each_cpu(&mobility_pmu_disable, NULL, 0);
>>>>
>>>> Why was this moved out of stop machine and to an IPI?
>>>>
>>>> My concern would be, what are the other CPUs doing at this time? Is it 
>>>> possible they could take interrupts and schedule? Could that mess up the
>>>> perf state here?
>>> 
>>> pseries_migrate_partition() is called directly from migration_store(),
>>> which is the sysfs store function, which can be called concurrently by
>>> different CPUs.
>>> 
>>> It's also potentially called from rtas_syscall_dispatch_ibm_suspend_me(),
>>> from sys_rtas(), again with no locking.
>>> 
>>> So we could have two CPUs calling into here at the same time, which
>>> might not crash, but is unlikely to work well.
>>> 
>>> I think the lack of locking might have been OK in the past because only
>>> one CPU will successfully get the other CPUs to call do_join() in
>>> pseries_suspend(). But I could be wrong.
>>> 
>>> Anyway, now that we're mutating the PMU state before suspending we need
>>> to be more careful. So I think we need a lock around the whole
>>> sequence.
> 
> Regardless of the outcome here, generally agreed that some serialization
> should be imposed in this path. The way the platform works (and some
> extra measures by the drmgr utility) makes it so that this code isn't
> entered concurrently in usual operation, but it's possible to make it
> happen if you are root.
> 
> A file-static mutex should be OK.
> 
>> My concern is still that we wouldn't necessarily have the other CPUs 
>> under control at that point even if we serialize the migrate path.
>> They could take interrupts, possibly call into perf subsystem after
>> the mobility_pmu_disable (e.g., via syscall or context switch) which 
>> might mess things up.
>>
>> I think the stop machine is a reasonable place for the code in this 
>> case. It's a low level disabling of hardware facility and saving off 
>> registers.
> 
> That makes sense, but I can't help feeling concerned still. For this to
> be safe, power_pmu_enable() and power_pmu_disable() must never sleep or
> re-enable interrupts or send IPIs. I don't see anything obviously unsafe
> right now, but is that already part of their contract?

Yes, that would have to be. That's much the same as an IPI handler. Maybe
stop machine has a few other things to worry about, but I'm not sure they
would get in the way. Just have a note in the implementation or name as a
reminder that it is stop-machine context.

> Is there much
> risk they could change in the future to violate those constraints?

My guess is if it needed to do anything much more complicated then you
would want to shut down perf more completely in a way that other APIs
know about.

> 
> That aside, the proposed change seems like we would be hacking around a
> more generic perf/pmu limitation in a powerpc-specific way. I see the
> same behavior on x86 across suspend/resume.

Arguably it's fixing up an x86 suspend/resume bug that might be fixed
in the same way as this -- presumably x86 is not restoring PMU registers
at resume.

But if they did something generic in perf core, this code could be
changed over to use it quite easily I would hope.  But pinging the
other perf devs to check would be a good idea.

Thanks,
Nick


end of thread, other threads:[~2021-11-05  1:46 UTC | newest]

Thread overview: 10+ messages
2021-10-29  3:05 [V3] powerpc/perf: Enable PMU counters post partition migration if PMU is active Athira Rajeev
2021-10-29  6:16 ` Nicholas Piggin
2021-10-29 13:15   ` Michael Ellerman
2021-11-02 11:00     ` Athira Rajeev
2021-11-02 11:35     ` Nicholas Piggin
2021-11-03 21:11       ` Nathan Lynch
2021-11-04  5:55         ` Michael Ellerman
2021-11-05  1:46         ` Nicholas Piggin
2021-11-02  6:13   ` Athira Rajeev
2021-10-29 10:20 ` kernel test robot
