linuxppc-dev.lists.ozlabs.org archive mirror
* [PATCH v1 0/5] Add dtl_entry tracepoint
@ 2018-10-25 20:25 Naveen N. Rao
  2018-10-25 20:25 ` [PATCH v1 1/5] powerpc/pseries: Use macros for referring to the DTL enable mask Naveen N. Rao
                   ` (4 more replies)
  0 siblings, 5 replies; 8+ messages in thread
From: Naveen N. Rao @ 2018-10-25 20:25 UTC
  To: Michael Ellerman, Paul Mackerras, Nathan Fontenot, Jeremy Kerr,
	Steven Rostedt
  Cc: linuxppc-dev

This is v1 of the patches for providing a tracepoint for processing the 
dispatch trace log entries from the hypervisor in a shared processor 
LPAR. The previous RFC can be found here:
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=66340

Since the RFC, this series has been expanded/generalized to support 
!CONFIG_VIRT_CPU_ACCOUNTING_NATIVE and has been tested in different 
configurations. The dispatch distance calculation has also been updated
to make better use of the platform-provided information.

Also, patch 3 is new and fixes an issue with stolen time accounting when 
the dtl debugfs interface is in use.

- Naveen


Naveen N. Rao (5):
  powerpc/pseries: Use macros for referring to the DTL enable mask
  powerpc/pseries: Do not save the previous DTL mask value
  powerpc/pseries: Fix stolen time accounting when dtl debugfs is used
  powerpc/pseries: Factor out DTL buffer allocation and registration
    routines
  powerpc/pseries: Introduce dtl_entry tracepoint

 arch/powerpc/include/asm/lppaca.h         |  11 +
 arch/powerpc/include/asm/plpar_wrappers.h |   9 +
 arch/powerpc/include/asm/trace.h          |  55 +++++
 arch/powerpc/kernel/entry_64.S            |  39 ++++
 arch/powerpc/kernel/time.c                |   7 +-
 arch/powerpc/mm/numa.c                    | 144 ++++++++++++-
 arch/powerpc/platforms/pseries/dtl.c      |  22 +-
 arch/powerpc/platforms/pseries/lpar.c     | 249 ++++++++++++++++++++--
 arch/powerpc/platforms/pseries/setup.c    |  34 +--
 9 files changed, 502 insertions(+), 68 deletions(-)

-- 
2.19.1



* [PATCH v1 1/5] powerpc/pseries: Use macros for referring to the DTL enable mask
  2018-10-25 20:25 [PATCH v1 0/5] Add dtl_entry tracepoint Naveen N. Rao
@ 2018-10-25 20:25 ` Naveen N. Rao
  2018-10-25 20:25 ` [PATCH v1 2/5] powerpc/pseries: Do not save the previous DTL mask value Naveen N. Rao
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Naveen N. Rao @ 2018-10-25 20:25 UTC
  To: Michael Ellerman, Paul Mackerras, Nathan Fontenot, Jeremy Kerr,
	Steven Rostedt
  Cc: linuxppc-dev

Introduce macros to encode the DTL enable mask fields and use those
instead of hardcoding numbers.
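
As a rough illustration (not part of this patch), the macro values can
be exercised from user space. The helper names dtl_mask_to_str() and
append_name() below are hypothetical; only the DTL_LOG_* values come
from the lppaca.h hunk in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Values copied from the lppaca.h hunk in this patch. */
#define DTL_LOG_CEDE	0x1
#define DTL_LOG_PREEMPT	0x2
#define DTL_LOG_FAULT	0x4
#define DTL_LOG_ALL	(DTL_LOG_CEDE | DTL_LOG_PREEMPT | DTL_LOG_FAULT)

/* Hypothetical helper: append name to buf, separated by '|' if needed. */
static void append_name(char *buf, size_t len, const char *name)
{
	if (buf[0] != '\0')
		strncat(buf, "|", len - strlen(buf) - 1);
	strncat(buf, name, len - strlen(buf) - 1);
}

/*
 * Hypothetical helper: describe which event classes are enabled in
 * mask. Writes a NUL-terminated string into buf and returns buf.
 */
static const char *dtl_mask_to_str(unsigned char mask, char *buf, size_t len)
{
	assert(len > 0);
	buf[0] = '\0';

	if (mask & DTL_LOG_CEDE)
		append_name(buf, len, "cede");
	if (mask & DTL_LOG_PREEMPT)
		append_name(buf, len, "preempt");
	if (mask & DTL_LOG_FAULT)
		append_name(buf, len, "fault");
	if (buf[0] == '\0')
		append_name(buf, len, "none");

	return buf;
}
```

With these definitions, DTL_LOG_ALL evaluates to the 0x7 that dtl.c
used to hardcode, and DTL_LOG_PREEMPT to the 2 that was hardcoded in
lpar.c and setup.c.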

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/lppaca.h      | 11 +++++++++++
 arch/powerpc/platforms/pseries/dtl.c   |  8 +-------
 arch/powerpc/platforms/pseries/lpar.c  |  2 +-
 arch/powerpc/platforms/pseries/setup.c |  2 +-
 4 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/lppaca.h b/arch/powerpc/include/asm/lppaca.h
index 7c23ce8a5a4c..2c7e31187726 100644
--- a/arch/powerpc/include/asm/lppaca.h
+++ b/arch/powerpc/include/asm/lppaca.h
@@ -154,6 +154,17 @@ struct dtl_entry {
 #define DISPATCH_LOG_BYTES	4096	/* bytes per cpu */
 #define N_DISPATCH_LOG		(DISPATCH_LOG_BYTES / sizeof(struct dtl_entry))
 
+/*
+ * Dispatch trace log event enable mask:
+ *   0x1: voluntary virtual processor waits
+ *   0x2: time-slice preempts
+ *   0x4: virtual partition memory page faults
+ */
+#define DTL_LOG_CEDE		0x1
+#define DTL_LOG_PREEMPT		0x2
+#define DTL_LOG_FAULT		0x4
+#define DTL_LOG_ALL		(DTL_LOG_CEDE | DTL_LOG_PREEMPT | DTL_LOG_FAULT)
+
 extern struct kmem_cache *dtl_cache;
 
 /*
diff --git a/arch/powerpc/platforms/pseries/dtl.c b/arch/powerpc/platforms/pseries/dtl.c
index ef6595153642..051ea2de1e1a 100644
--- a/arch/powerpc/platforms/pseries/dtl.c
+++ b/arch/powerpc/platforms/pseries/dtl.c
@@ -40,13 +40,7 @@ struct dtl {
 };
 static DEFINE_PER_CPU(struct dtl, cpu_dtl);
 
-/*
- * Dispatch trace log event mask:
- * 0x7: 0x1: voluntary virtual processor waits
- *      0x2: time-slice preempts
- *      0x4: virtual partition memory page faults
- */
-static u8 dtl_event_mask = 0x7;
+static u8 dtl_event_mask = DTL_LOG_ALL;
 
 
 /*
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index 0b5081085a44..ad194420e8ae 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -125,7 +125,7 @@ void vpa_init(int cpu)
 			pr_err("WARNING: DTL registration of cpu %d (hw %d) "
 			       "failed with %ld\n", smp_processor_id(),
 			       hwcpu, ret);
-		lppaca_of(cpu).dtl_enable_mask = 2;
+		lppaca_of(cpu).dtl_enable_mask = DTL_LOG_PREEMPT;
 	}
 }
 
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index 0f553dcfa548..f3b5822e88c6 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -306,7 +306,7 @@ static int alloc_dispatch_logs(void)
 		pr_err("WARNING: DTL registration of cpu %d (hw %d) failed "
 		       "with %d\n", smp_processor_id(),
 		       hard_smp_processor_id(), ret);
-	get_paca()->lppaca_ptr->dtl_enable_mask = 2;
+	get_paca()->lppaca_ptr->dtl_enable_mask = DTL_LOG_PREEMPT;
 
 	return 0;
 }
-- 
2.19.1



* [PATCH v1 2/5] powerpc/pseries: Do not save the previous DTL mask value
  2018-10-25 20:25 [PATCH v1 0/5] Add dtl_entry tracepoint Naveen N. Rao
  2018-10-25 20:25 ` [PATCH v1 1/5] powerpc/pseries: Use macros for referring to the DTL enable mask Naveen N. Rao
@ 2018-10-25 20:25 ` Naveen N. Rao
  2018-10-25 20:25 ` [PATCH v1 3/5] powerpc/pseries: Fix stolen time accounting when dtl debugfs is used Naveen N. Rao
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 8+ messages in thread
From: Naveen N. Rao @ 2018-10-25 20:25 UTC
  To: Michael Ellerman, Paul Mackerras, Nathan Fontenot, Jeremy Kerr,
	Steven Rostedt
  Cc: linuxppc-dev

When CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is enabled, we always initialize
the DTL enable mask to DTL_LOG_PREEMPT (0x2). There are no other places
where the mask is changed. As such, when reading the DTL log buffer
through debugfs, there is no need to save and restore the previous mask
value.

We also don't need to save and restore the earlier mask value if
CONFIG_VIRT_CPU_ACCOUNTING_NATIVE is not enabled. So, remove the field
from the structure as well.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/pseries/dtl.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/dtl.c b/arch/powerpc/platforms/pseries/dtl.c
index 051ea2de1e1a..fb05804adb2f 100644
--- a/arch/powerpc/platforms/pseries/dtl.c
+++ b/arch/powerpc/platforms/pseries/dtl.c
@@ -55,7 +55,6 @@ struct dtl_ring {
 	struct dtl_entry *write_ptr;
 	struct dtl_entry *buf;
 	struct dtl_entry *buf_end;
-	u8	saved_dtl_mask;
 };
 
 static DEFINE_PER_CPU(struct dtl_ring, dtl_rings);
@@ -105,7 +104,6 @@ static int dtl_start(struct dtl *dtl)
 	dtlr->write_ptr = dtl->buf;
 
 	/* enable event logging */
-	dtlr->saved_dtl_mask = lppaca_of(dtl->cpu).dtl_enable_mask;
 	lppaca_of(dtl->cpu).dtl_enable_mask |= dtl_event_mask;
 
 	dtl_consumer = consume_dtle;
@@ -123,7 +121,7 @@ static void dtl_stop(struct dtl *dtl)
 	dtlr->buf = NULL;
 
 	/* restore dtl_enable_mask */
-	lppaca_of(dtl->cpu).dtl_enable_mask = dtlr->saved_dtl_mask;
+	lppaca_of(dtl->cpu).dtl_enable_mask = DTL_LOG_PREEMPT;
 
 	if (atomic_dec_and_test(&dtl_count))
 		dtl_consumer = NULL;
-- 
2.19.1



* [PATCH v1 3/5] powerpc/pseries: Fix stolen time accounting when dtl debugfs is used
  2018-10-25 20:25 [PATCH v1 0/5] Add dtl_entry tracepoint Naveen N. Rao
  2018-10-25 20:25 ` [PATCH v1 1/5] powerpc/pseries: Use macros for referring to the DTL enable mask Naveen N. Rao
  2018-10-25 20:25 ` [PATCH v1 2/5] powerpc/pseries: Do not save the previous DTL mask value Naveen N. Rao
@ 2018-10-25 20:25 ` Naveen N. Rao
  2018-10-25 21:08   ` Paul Mackerras
  2018-10-25 20:25 ` [PATCH v1 4/5] powerpc/pseries: Factor out DTL buffer allocation and registration routines Naveen N. Rao
  2018-10-25 20:25 ` [PATCH v1 5/5] powerpc/pseries: Introduce dtl_entry tracepoint Naveen N. Rao
  4 siblings, 1 reply; 8+ messages in thread
From: Naveen N. Rao @ 2018-10-25 20:25 UTC
  To: Michael Ellerman, Paul Mackerras, Nathan Fontenot, Jeremy Kerr,
	Steven Rostedt
  Cc: linuxppc-dev

When the dtl debugfs interface is used, we usually set the
dtl_enable_mask to 0x7 (DTL_LOG_ALL). When this happens, we start seeing
DTL entries for all preempt reasons, including CEDE. In
scan_dispatch_log(), we add up the times from all entries and account
those towards stolen time. However, we should only be accounting stolen
time when the preemption was due to HDEC at the end of our time slice.

Fix this by checking for the dispatch reason in the DTL entry before
adding to the stolen time.
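
The effect of the check can be sketched in user space as below. This is
a simplified model and not the kernel code: struct dtl_sample, the
macro name DISPATCH_REASON_TIMESLICE_PREEMPT and sum_stolen() are
hypothetical, the fields are host-endian, and the ring-buffer handling
of scan_dispatch_log() is left out. Only the value 7 is taken from the
comment added by this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; the value 7 is from the comment in this patch. */
#define DISPATCH_REASON_TIMESLICE_PREEMPT	7

/* Host-endian stand-in for the DTL fields that the scan loop reads. */
struct dtl_sample {
	uint8_t  dispatch_reason;
	uint32_t enqueue_to_dispatch_time;
	uint32_t ready_to_enqueue_time;
};

/*
 * Sum the wait times of all entries, but only count those whose
 * dispatch followed a time slice preempt as stolen time.
 */
static uint64_t sum_stolen(const struct dtl_sample *log, size_t n)
{
	uint64_t stolen = 0;
	size_t i;

	assert(log != NULL || n == 0);

	for (i = 0; i < n; i++) {
		uint64_t tb_delta = (uint64_t)log[i].enqueue_to_dispatch_time +
				    log[i].ready_to_enqueue_time;

		if (log[i].dispatch_reason == DISPATCH_REASON_TIMESLICE_PREEMPT)
			stolen += tb_delta;
	}

	return stolen;
}
```

In this model, an entry with any other dispatch reason (for instance,
one logged because CEDE logging was enabled through debugfs) contributes
nothing to the total, no matter how long the vcpu waited.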

Fixes: cf9efce0ce313 ("powerpc: Account time using timebase rather than PURR")
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/time.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 40868f3ee113..923abc3e555d 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -199,7 +199,7 @@ static u64 scan_dispatch_log(u64 stop_tb)
 	struct lppaca *vpa = local_paca->lppaca_ptr;
 	u64 tb_delta;
 	u64 stolen = 0;
-	u64 dtb;
+	u64 dtb, dispatch_reason;
 
 	if (!dtl)
 		return 0;
@@ -210,6 +210,7 @@ static u64 scan_dispatch_log(u64 stop_tb)
 		dtb = be64_to_cpu(dtl->timebase);
 		tb_delta = be32_to_cpu(dtl->enqueue_to_dispatch_time) +
 			be32_to_cpu(dtl->ready_to_enqueue_time);
+		dispatch_reason = dtl->dispatch_reason;
 		barrier();
 		if (i + N_DISPATCH_LOG < be64_to_cpu(vpa->dtl_idx)) {
 			/* buffer has overflowed */
@@ -221,7 +222,9 @@ static u64 scan_dispatch_log(u64 stop_tb)
 			break;
 		if (dtl_consumer)
 			dtl_consumer(dtl, i);
-		stolen += tb_delta;
+		/* 7 indicates that this dispatch follows a time slice preempt */
+		if (dispatch_reason == 7)
+			stolen += tb_delta;
 		++i;
 		++dtl;
 		if (dtl == dtl_end)
-- 
2.19.1



* [PATCH v1 4/5] powerpc/pseries: Factor out DTL buffer allocation and registration routines
  2018-10-25 20:25 [PATCH v1 0/5] Add dtl_entry tracepoint Naveen N. Rao
                   ` (2 preceding siblings ...)
  2018-10-25 20:25 ` [PATCH v1 3/5] powerpc/pseries: Fix stolen time accounting when dtl debugfs is used Naveen N. Rao
@ 2018-10-25 20:25 ` Naveen N. Rao
  2018-10-25 20:25 ` [PATCH v1 5/5] powerpc/pseries: Introduce dtl_entry tracepoint Naveen N. Rao
  4 siblings, 0 replies; 8+ messages in thread
From: Naveen N. Rao @ 2018-10-25 20:25 UTC
  To: Michael Ellerman, Paul Mackerras, Nathan Fontenot, Jeremy Kerr,
	Steven Rostedt
  Cc: linuxppc-dev

Introduce new helpers for DTL buffer allocation and registration and
have the existing code use those.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/plpar_wrappers.h |  2 +
 arch/powerpc/platforms/pseries/lpar.c     | 66 ++++++++++++++++-------
 arch/powerpc/platforms/pseries/setup.c    | 34 +-----------
 3 files changed, 52 insertions(+), 50 deletions(-)

diff --git a/arch/powerpc/include/asm/plpar_wrappers.h b/arch/powerpc/include/asm/plpar_wrappers.h
index cff5a411e595..7dcbf42e9e11 100644
--- a/arch/powerpc/include/asm/plpar_wrappers.h
+++ b/arch/powerpc/include/asm/plpar_wrappers.h
@@ -88,6 +88,8 @@ static inline long register_dtl(unsigned long cpu, unsigned long vpa)
 	return vpa_call(H_VPA_REG_DTL, cpu, vpa);
 }
 
+extern void alloc_dtl_buffers(void);
+extern void register_dtl_buffer(int cpu);
 extern void vpa_init(int cpu);
 
 static inline long plpar_pte_enter(unsigned long flags,
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index ad194420e8ae..d83bb3db6767 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -64,13 +64,58 @@ EXPORT_SYMBOL(plpar_hcall);
 EXPORT_SYMBOL(plpar_hcall9);
 EXPORT_SYMBOL(plpar_hcall_norets);
 
+void alloc_dtl_buffers(void)
+{
+	int cpu;
+	struct paca_struct *pp;
+	struct dtl_entry *dtl;
+
+	for_each_possible_cpu(cpu) {
+		pp = paca_ptrs[cpu];
+		dtl = kmem_cache_alloc(dtl_cache, GFP_KERNEL);
+		if (!dtl) {
+			pr_warn("Failed to allocate dispatch trace log for cpu %d\n",
+				cpu);
+			pr_warn("Stolen time statistics will be unreliable\n");
+			break;
+		}
+
+		pp->dtl_ridx = 0;
+		pp->dispatch_log = dtl;
+		pp->dispatch_log_end = dtl + N_DISPATCH_LOG;
+		pp->dtl_curr = dtl;
+	}
+}
+
+void register_dtl_buffer(int cpu)
+{
+	long ret;
+	struct paca_struct *pp;
+	struct dtl_entry *dtl;
+	int hwcpu = get_hard_smp_processor_id(cpu);
+
+	pp = paca_ptrs[cpu];
+	dtl = pp->dispatch_log;
+	if (dtl) {
+		pp->dtl_ridx = 0;
+		pp->dtl_curr = dtl;
+		lppaca_of(cpu).dtl_idx = 0;
+
+		/* hypervisor reads buffer length from this field */
+		dtl->enqueue_to_dispatch_time = cpu_to_be32(DISPATCH_LOG_BYTES);
+		ret = register_dtl(hwcpu, __pa(dtl));
+		if (ret)
+			pr_err("WARNING: DTL registration of cpu %d (hw %d) "
+			       "failed with %ld\n", cpu, hwcpu, ret);
+		lppaca_of(cpu).dtl_enable_mask = DTL_LOG_PREEMPT;
+	}
+}
+
 void vpa_init(int cpu)
 {
 	int hwcpu = get_hard_smp_processor_id(cpu);
 	unsigned long addr;
 	long ret;
-	struct paca_struct *pp;
-	struct dtl_entry *dtl;
 
 	/*
 	 * The spec says it "may be problematic" if CPU x registers the VPA of
@@ -111,22 +156,7 @@ void vpa_init(int cpu)
 	/*
 	 * Register dispatch trace log, if one has been allocated.
 	 */
-	pp = paca_ptrs[cpu];
-	dtl = pp->dispatch_log;
-	if (dtl) {
-		pp->dtl_ridx = 0;
-		pp->dtl_curr = dtl;
-		lppaca_of(cpu).dtl_idx = 0;
-
-		/* hypervisor reads buffer length from this field */
-		dtl->enqueue_to_dispatch_time = cpu_to_be32(DISPATCH_LOG_BYTES);
-		ret = register_dtl(hwcpu, __pa(dtl));
-		if (ret)
-			pr_err("WARNING: DTL registration of cpu %d (hw %d) "
-			       "failed with %ld\n", smp_processor_id(),
-			       hwcpu, ret);
-		lppaca_of(cpu).dtl_enable_mask = DTL_LOG_PREEMPT;
-	}
+	register_dtl_buffer(cpu);
 }
 
 #ifdef CONFIG_PPC_BOOK3S_64
diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c
index f3b5822e88c6..be6a3845b7ea 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -267,46 +267,16 @@ struct kmem_cache *dtl_cache;
  */
 static int alloc_dispatch_logs(void)
 {
-	int cpu, ret;
-	struct paca_struct *pp;
-	struct dtl_entry *dtl;
-
 	if (!firmware_has_feature(FW_FEATURE_SPLPAR))
 		return 0;
 
 	if (!dtl_cache)
 		return 0;
 
-	for_each_possible_cpu(cpu) {
-		pp = paca_ptrs[cpu];
-		dtl = kmem_cache_alloc(dtl_cache, GFP_KERNEL);
-		if (!dtl) {
-			pr_warn("Failed to allocate dispatch trace log for cpu %d\n",
-				cpu);
-			pr_warn("Stolen time statistics will be unreliable\n");
-			break;
-		}
-
-		pp->dtl_ridx = 0;
-		pp->dispatch_log = dtl;
-		pp->dispatch_log_end = dtl + N_DISPATCH_LOG;
-		pp->dtl_curr = dtl;
-	}
+	alloc_dtl_buffers();
 
 	/* Register the DTL for the current (boot) cpu */
-	dtl = get_paca()->dispatch_log;
-	get_paca()->dtl_ridx = 0;
-	get_paca()->dtl_curr = dtl;
-	get_paca()->lppaca_ptr->dtl_idx = 0;
-
-	/* hypervisor reads buffer length from this field */
-	dtl->enqueue_to_dispatch_time = cpu_to_be32(DISPATCH_LOG_BYTES);
-	ret = register_dtl(hard_smp_processor_id(), __pa(dtl));
-	if (ret)
-		pr_err("WARNING: DTL registration of cpu %d (hw %d) failed "
-		       "with %d\n", smp_processor_id(),
-		       hard_smp_processor_id(), ret);
-	get_paca()->lppaca_ptr->dtl_enable_mask = DTL_LOG_PREEMPT;
+	register_dtl_buffer(smp_processor_id());
 
 	return 0;
 }
-- 
2.19.1



* [PATCH v1 5/5] powerpc/pseries: Introduce dtl_entry tracepoint
  2018-10-25 20:25 [PATCH v1 0/5] Add dtl_entry tracepoint Naveen N. Rao
                   ` (3 preceding siblings ...)
  2018-10-25 20:25 ` [PATCH v1 4/5] powerpc/pseries: Factor out DTL buffer allocation and registration routines Naveen N. Rao
@ 2018-10-25 20:25 ` Naveen N. Rao
  4 siblings, 0 replies; 8+ messages in thread
From: Naveen N. Rao @ 2018-10-25 20:25 UTC
  To: Michael Ellerman, Paul Mackerras, Nathan Fontenot, Jeremy Kerr,
	Steven Rostedt
  Cc: linuxppc-dev

This tracepoint provides access to the fields of each DTL entry in the
Dispatch Trace Log buffer. Since the buffer is populated by the
hypervisor and since we allocate just a 4k area per cpu for the buffer,
we need to process the entries on a regular basis before they are
overwritten by the hypervisor. We do this by using a static branch (or a
reference counter if we don't have jump labels) in ret_from_except
similar to how the hcall/opal tracepoints do.
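
To give a feel for how little room a 4k buffer provides, here is a
user-space sketch. The layout of struct dtl_entry_model is an
assumption that mirrors the field list of the tracepoint (the kernel
structure uses big-endian types), and dtl_entries_lost() is a
hypothetical helper modelled on the overflow check in
scan_dispatch_log().

```c
#include <assert.h>
#include <stdint.h>

#define DISPATCH_LOG_BYTES	4096	/* bytes per cpu, as in lppaca.h */

/* Assumed layout, mirroring the fields exposed by the tracepoint. */
struct dtl_entry_model {
	uint8_t  dispatch_reason;
	uint8_t  preempt_reason;
	uint16_t processor_id;
	uint32_t enqueue_to_dispatch_time;
	uint32_t ready_to_enqueue_time;
	uint32_t waiting_to_ready_time;
	uint64_t timebase;
	uint64_t fault_addr;
	uint64_t srr0;
	uint64_t srr1;
};

#define N_DISPATCH_LOG	(DISPATCH_LOG_BYTES / sizeof(struct dtl_entry_model))

/*
 * read_idx is the next entry the consumer wants, write_idx is the
 * running count of entries written by the hypervisor (dtl_idx).
 * Returns how many unread entries have already been overwritten.
 */
static uint64_t dtl_entries_lost(uint64_t read_idx, uint64_t write_idx)
{
	assert(read_idx <= write_idx);

	if (read_idx + N_DISPATCH_LOG < write_idx)
		return write_idx - N_DISPATCH_LOG - read_idx;

	return 0;
}
```

Under these assumptions each entry takes 48 bytes, so the buffer holds
85 entries and anything the consumer has not read by the time the
hypervisor wraps around is lost.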

Apart from making the DTL entries available for processing through the
usual trace interface, this tracepoint also adds a new field 'distance'
to each DTL entry, enabling enhanced statistics around the vcpu dispatch
behavior of the hypervisor.

For Shared Processor LPARs, the POWER Hypervisor maintains a relatively
static mapping of LPAR vcpus to physical processor cores and tries to
always dispatch vcpus on their associated physical processor core. The
LPAR can discover this through the H_VPHN(flags=1) hcall to obtain the
associativity of the LPAR vcpus.

However, under certain scenarios, vcpus may be dispatched on a different
processor core. The actual physical processor number on which a certain
vcpu is dispatched is available to the LPAR in the 'processor_id' field
of each DTL entry. The LPAR can then discover the associativity of that
physical processor through the H_VPHN(flags=2) hcall. This can then be
compared to the home node associativity for that specific vcpu to
determine if the vcpu was dispatched on the same core or not.  If the
vcpu was not dispatched on the home node, it is possible to determine if
the vcpu was dispatched in a different chip, socket or drawer.

The tracepoint field 'distance' encodes this information. If distance is
0, then the vcpu was dispatched on its home node/chip. If not,
increasing values of 'distance' indicate a dispatch on a different chip
in an MCM, a different socket or a different drawer.
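
The comparison can be sketched in user space as follows. This is a
model of the logic in compute_dispatch_distance() under simplifying
assumptions: the associativity arrays and reference points are passed
in as host-endian arrays, and the names dispatch_distance() and
DISTANCE_UNKNOWN are hypothetical. The value 255 for "unknown" is the
one used by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DISTANCE_UNKNOWN	255	/* distance could not be determined */

/*
 * ref_points[] holds indices into the associativity arrays, ordered
 * from the nearest domain (index 0) to the farthest (index depth - 1).
 * Walk from the farthest domain inwards and stop at the first domain
 * in which the vcpu's home and the physical cpu differ.
 */
static uint8_t dispatch_distance(const uint32_t *vcpu_assoc,
				 const uint32_t *pcpu_assoc,
				 const uint32_t *ref_points, int depth)
{
	int i, distance = depth;

	if (!vcpu_assoc || !pcpu_assoc || !ref_points || depth < 1)
		return DISTANCE_UNKNOWN;

	/* The sentinel must not collide with a real distance. */
	assert(depth < DISTANCE_UNKNOWN);

	for (i = depth - 1; i >= 0; i--) {
		uint32_t index = ref_points[i];

		if (vcpu_assoc[index] != pcpu_assoc[index])
			break;
		distance--;
	}

	return (uint8_t)distance;
}
```

For example, with two reference points, a match in both domains gives a
distance of 0, a match only in the farther domain gives 1, and a
mismatch in the farther domain gives 2.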

In terms of the implementation, we update our numa code to retain the
vcpu associativity that is retrieved while discovering our numa
topology. In addition, on tracepoint registration, we discover the
physical cpu associativity. This information is only retrieved during
the tracepoint registration and is not expected to change for the
duration of the trace.

To support configurations with/without CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
selected, we generalize and extend helpers for DTL buffer allocation,
freeing and registration. We also introduce a global variable 'dtl_mask'
to encode the DTL enable mask to be set for all cpus. This helps ensure
that cpus that come online honor the global enable mask.
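
A small user-space model of this idea is shown below. All names in it
(struct dtl_mask_model, the model_*() functions, MODEL_NR_CPUS) are
hypothetical, the config option is replaced by a runtime flag, and
"present" and "online" cpus are treated as the same thing. It also
assumes that a cpu coming online copies the global mask, which is the
behaviour described above rather than something shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DTL_LOG_PREEMPT	0x2
#define DTL_LOG_ALL	0x7
#define MODEL_NR_CPUS	4	/* hypothetical, for illustration only */

struct dtl_mask_model {
	bool accounting_native;	/* stands in for the config option */
	uint8_t dtl_mask;	/* global enable mask */
	bool present[MODEL_NR_CPUS];
	uint8_t cpu_mask[MODEL_NR_CPUS];	/* per-cpu dtl_enable_mask */
};

/* Default mask: preempt logging only if stolen time is accounted. */
static uint8_t model_default_mask(const struct dtl_mask_model *m)
{
	return m->accounting_native ? DTL_LOG_PREEMPT : 0;
}

/* Update the global mask and push it to every present cpu. */
static void model_set_dtl_mask(struct dtl_mask_model *m, uint8_t mask)
{
	int cpu;

	m->dtl_mask = mask;
	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (m->present[cpu])
			m->cpu_mask[cpu] = mask;
}

static void model_reset_dtl_mask(struct dtl_mask_model *m)
{
	model_set_dtl_mask(m, model_default_mask(m));
}

/* A cpu that comes up later picks up whatever the global mask is. */
static void model_cpu_online(struct dtl_mask_model *m, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	m->present[cpu] = true;
	m->cpu_mask[cpu] = m->dtl_mask;
}
```

In this model, a cpu brought up while the tracepoint is active ends up
with DTL_LOG_ALL, just like the cpus that were already there when the
mask was set.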

Finally, to ensure that the new dtl_entry tracepoint usage does not
interfere with the dtl debugfs interface, we introduce helpers to ensure
only one of the two interfaces is used at any point in time.
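
The arbitration rule can be modelled in user space as below. The names
struct dtl_access, dtl_access_get() and dtl_access_put() are
hypothetical, and the model is single-threaded, so the spinlock that
protects the counters in the patch is left out.

```c
#include <assert.h>
#include <stdbool.h>

struct dtl_access {
	unsigned int global_refs;	/* tracepoint: all cpus at once */
	unsigned int percpu_refs;	/* debugfs: one reference per cpu */
};

/*
 * A global user needs the buffers to be completely unused. A per-cpu
 * user only needs the global user to be absent.
 * Returns 0 on success and -1 if access cannot be granted.
 */
static int dtl_access_get(struct dtl_access *a, bool global)
{
	if (global && (a->global_refs || a->percpu_refs))
		return -1;
	if (!global && a->global_refs)
		return -1;

	if (global)
		a->global_refs++;
	else
		a->percpu_refs++;

	return 0;
}

static void dtl_access_put(struct dtl_access *a, bool global)
{
	if (global) {
		assert(a->global_refs > 0);
		a->global_refs--;
	} else {
		assert(a->percpu_refs > 0);
		a->percpu_refs--;
	}
}
```

In this model, any number of debugfs readers can coexist, but the
tracepoint can only be registered once all of them have gone away, and
debugfs access is refused for as long as the tracepoint is registered.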

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/plpar_wrappers.h |   7 +
 arch/powerpc/include/asm/trace.h          |  55 +++++++
 arch/powerpc/kernel/entry_64.S            |  39 +++++
 arch/powerpc/mm/numa.c                    | 144 ++++++++++++++++-
 arch/powerpc/platforms/pseries/dtl.c      |  10 +-
 arch/powerpc/platforms/pseries/lpar.c     | 187 +++++++++++++++++++++-
 6 files changed, 434 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/include/asm/plpar_wrappers.h b/arch/powerpc/include/asm/plpar_wrappers.h
index 7dcbf42e9e11..029f019ddfb6 100644
--- a/arch/powerpc/include/asm/plpar_wrappers.h
+++ b/arch/powerpc/include/asm/plpar_wrappers.h
@@ -88,7 +88,14 @@ static inline long register_dtl(unsigned long cpu, unsigned long vpa)
 	return vpa_call(H_VPA_REG_DTL, cpu, vpa);
 }
 
+extern void dtl_entry_tracepoint_enable(void);
+extern void dtl_entry_tracepoint_disable(void);
+extern int register_dtl_buffer_access(int global);
+extern void unregister_dtl_buffer_access(int global);
+extern void set_dtl_mask(u8 mask);
+extern void reset_dtl_mask(void);
 extern void alloc_dtl_buffers(void);
+extern void free_dtl_buffers(void);
 extern void register_dtl_buffer(int cpu);
 extern void vpa_init(int cpu);
 
diff --git a/arch/powerpc/include/asm/trace.h b/arch/powerpc/include/asm/trace.h
index d018e8602694..bcb8d66d3232 100644
--- a/arch/powerpc/include/asm/trace.h
+++ b/arch/powerpc/include/asm/trace.h
@@ -101,6 +101,61 @@ TRACE_EVENT_FN_COND(hcall_exit,
 
 	hcall_tracepoint_regfunc, hcall_tracepoint_unregfunc
 );
+
+#ifdef CONFIG_PPC_SPLPAR
+extern int dtl_entry_tracepoint_regfunc(void);
+extern void dtl_entry_tracepoint_unregfunc(void);
+extern u8 compute_dispatch_distance(unsigned int pcpu);
+
+TRACE_EVENT_FN(dtl_entry,
+
+	TP_PROTO(u8 dispatch_reason, u8 preempt_reason, u16 processor_id,
+		u32 enqueue_to_dispatch_time, u32 ready_to_enqueue_time,
+		u32 waiting_to_ready_time, u64 timebase, u64 fault_addr,
+		u64 srr0, u64 srr1),
+
+	TP_ARGS(dispatch_reason, preempt_reason, processor_id,
+		enqueue_to_dispatch_time, ready_to_enqueue_time,
+		waiting_to_ready_time, timebase, fault_addr,
+		srr0, srr1),
+
+	TP_STRUCT__entry(
+		__field(u8, dispatch_reason)
+		__field(u8, preempt_reason)
+		__field(u16, processor_id)
+		__field(u32, enqueue_to_dispatch_time)
+		__field(u32, ready_to_enqueue_time)
+		__field(u32, waiting_to_ready_time)
+		__field(u64, timebase)
+		__field(u64, fault_addr)
+		__field(u64, srr0)
+		__field(u64, srr1)
+		__field(u8, distance)
+	),
+
+	TP_fast_assign(
+		__entry->dispatch_reason = dispatch_reason;
+		__entry->preempt_reason = preempt_reason;
+		__entry->processor_id = processor_id;
+		__entry->enqueue_to_dispatch_time = enqueue_to_dispatch_time;
+		__entry->ready_to_enqueue_time = ready_to_enqueue_time;
+		__entry->waiting_to_ready_time = waiting_to_ready_time;
+		__entry->timebase = timebase;
+		__entry->fault_addr = fault_addr;
+		__entry->srr0 = srr0;
+		__entry->srr1 = srr1;
+		__entry->distance = compute_dispatch_distance(processor_id);
+	),
+
+	TP_printk("dispatch_reason=%u preempt_reason=%u processor_id=%u enq_to_disp=%u ready_to_enq=%u wait_to_ready=%u tb=%llu fault_addr=0x%llx srr0=0x%llx srr1=0x%llx distance=%u",
+		__entry->dispatch_reason, __entry->preempt_reason, __entry->processor_id,
+		__entry->enqueue_to_dispatch_time, __entry->ready_to_enqueue_time,
+		__entry->waiting_to_ready_time, __entry->timebase, __entry->fault_addr,
+		__entry->srr0, __entry->srr1, __entry->distance),
+
+	dtl_entry_tracepoint_regfunc, dtl_entry_tracepoint_unregfunc
+);
+#endif
 #endif
 
 #ifdef CONFIG_PPC_POWERNV
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 7b1693adff2a..9c3b922dda77 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -46,6 +46,7 @@
 #include <asm/exception-64e.h>
 #endif
 #include <asm/feature-fixups.h>
+#include <linux/jump_label.h>
 
 /*
  * System calls.
@@ -714,6 +715,38 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
 	addi	r1,r1,SWITCH_FRAME_SIZE
 	blr
 
+#if defined(CONFIG_TRACEPOINTS) && defined(CONFIG_PPC_SPLPAR)
+#ifdef HAVE_JUMP_LABEL
+#define TRACE_DTL_ENTRY_BRANCH(LABEL)					\
+	ARCH_STATIC_BRANCH(LABEL, dtl_entry_tracepoint_key)
+#else
+
+	.pushsection	".toc","aw"
+	.globl dtl_entry_tracepoint_refcount
+dtl_entry_tracepoint_refcount:
+	.8byte	0
+	.popsection
+
+/*
+ * We branch around this in early init (eg when populating the MMU
+ * hashtable) by using an unconditional cpu feature.
+ */
+#define TRACE_DTL_ENTRY_BRANCH(LABEL)				\
+BEGIN_FTR_SECTION;						\
+	b	1f;						\
+END_FTR_SECTION(0, 1);						\
+	ld	r12,dtl_entry_tracepoint_refcount@toc(r2);	\
+	cmpdi	r12,0;						\
+	bne-	LABEL;						\
+1:
+#endif
+
+do_trace_dtl_entry:
+	bl	__trace_dtl_entry
+	b	.Lpost_trace_dtl_entry
+_ASM_NOKPROBE_SYMBOL(do_trace_dtl_entry);
+#endif
+
 	.align	7
 _GLOBAL(ret_from_except)
 	ld	r11,_TRAP(r1)
@@ -734,6 +767,11 @@ _GLOBAL(ret_from_except_lite)
 	mtmsrd	r10,1		  /* Update machine state */
 #endif /* CONFIG_PPC_BOOK3E */
 
+#if defined(CONFIG_TRACEPOINTS) && defined(CONFIG_PPC_SPLPAR)
+	TRACE_DTL_ENTRY_BRANCH(do_trace_dtl_entry)
+.Lpost_trace_dtl_entry:
+#endif
+
 	CURRENT_THREAD_INFO(r9, r1)
 	ld	r3,_MSR(r1)
 #ifdef CONFIG_PPC_BOOK3E
@@ -768,6 +806,7 @@ _GLOBAL(ret_from_except_lite)
 	bl	restore_math
 	b	restore
 #endif
+
 1:	andi.	r0,r4,_TIF_NEED_RESCHED
 	beq	2f
 	bl	restore_interrupts
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 693ae1c1acba..641fb12e9e55 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -41,6 +41,7 @@
 #include <asm/setup.h>
 #include <asm/vdso.h>
 #include <asm/drmem.h>
+#include <asm/plpar_wrappers.h>
 
 static int numa_enabled = 1;
 
@@ -1078,6 +1079,9 @@ static int prrn_enabled;
 static void reset_topology_timer(void);
 static int topology_timer_secs = 1;
 static int topology_inited;
+static __be32 vcpu_associativity[NR_CPUS][VPHN_ASSOC_BUFSIZE];
+static __be32 pcpu_associativity[NR_CPUS][VPHN_ASSOC_BUFSIZE];
+static int no_distance_info;
 
 /*
  * Change polling interval for associativity changes.
@@ -1157,12 +1161,10 @@ static int update_cpu_associativity_changes_mask(void)
  * Retrieve the new associativity information for a virtual processor's
  * home node.
  */
-static long hcall_vphn(unsigned long cpu, __be32 *associativity)
+static long hcall_vphn(unsigned long hwcpu, unsigned long flags, __be32 *associativity)
 {
 	long rc;
 	long retbuf[PLPAR_HCALL9_BUFSIZE] = {0};
-	u64 flags = 1;
-	int hwcpu = get_hard_smp_processor_id(cpu);
 
 	rc = plpar_hcall9(H_HOME_NODE_ASSOCIATIVITY, retbuf, flags, hwcpu);
 	vphn_unpack_associativity(retbuf, associativity);
@@ -1175,7 +1177,7 @@ static long vphn_get_associativity(unsigned long cpu,
 {
 	long rc;
 
-	rc = hcall_vphn(cpu, associativity);
+	rc = hcall_vphn(get_hard_smp_processor_id(cpu), 1, associativity);
 
 	switch (rc) {
 	case H_FUNCTION:
@@ -1200,7 +1202,7 @@ static long vphn_get_associativity(unsigned long cpu,
 
 int find_and_online_cpu_nid(int cpu)
 {
-	__be32 associativity[VPHN_ASSOC_BUFSIZE] = {0};
+	__be32 *associativity = vcpu_associativity[cpu];
 	int new_nid;
 
 	/* Use associativity from first thread for all siblings */
@@ -1237,6 +1239,138 @@ int find_and_online_cpu_nid(int cpu)
 	return new_nid;
 }
 
+static unsigned int find_possible_pcpus(void)
+{
+	struct device_node *rtas;
+	unsigned int max_depth, num_pcpus = 0;
+
+	rtas = of_find_node_by_path("/rtas");
+	if (min_common_depth <= 0 || !rtas)
+		return 0;
+
+	if (!of_property_read_u32(rtas,
+				"ibm,max-associativity-domains",
+				&max_depth))
+		of_property_read_u32_index(rtas,
+					   "ibm,max-associativity-domains",
+					   max_depth, &num_pcpus);
+
+	of_node_put(rtas);
+
+	/*
+	 * The OF property reports the maximum cpu number.
+	 * We instead want the maximum number of cpus.
+	 */
+	return num_pcpus + 1;
+}
+
+DECLARE_PER_CPU(u64, dtl_entry_ridx);
+
+/* This is only called on first registration */
+int dtl_entry_tracepoint_regfunc(void)
+{
+	unsigned int i, cpu, num_pcpus;
+	long rc;
+
+	if (register_dtl_buffer_access(1))
+		return -EBUSY;
+
+	if (!vphn_enabled || distance_ref_points_depth < 1 ||
+			     !distance_ref_points ||
+			     !firmware_has_feature(FW_FEATURE_TYPE1_AFFINITY))
+		no_distance_info = 1;
+	else {
+		no_distance_info = 0;
+		num_pcpus = find_possible_pcpus();
+		if (!num_pcpus)
+			num_pcpus = NR_CPUS;
+
+		/*
+		 * Phyp numbers physical processors from 0 and threads per core
+		 * remains the same across the hypervisor and the LPAR, so we can
+		 * skip retrieving associativity for the sibling threads.
+		 * NOTE: This is only retrieved during tracepoint registration
+		 * and is not updated thereafter.
+		 */
+		for (i = 0; i < NR_CPUS; i++) {
+			if (i != cpu_first_thread_sibling(i))
+				continue;
+			if (i >= num_pcpus) {
+				pcpu_associativity[i][0] = cpu_to_be32(NR_CPUS);
+				continue;
+			}
+			rc = hcall_vphn(i, 2, pcpu_associativity[i]);
+			if (rc != H_SUCCESS) {
+				pcpu_associativity[i][0] = cpu_to_be32(NR_CPUS);
+				pr_debug_ratelimited("pcpu_associativity could not be discovered for cpu %d\n", i);
+			}
+		}
+	}
+
+	set_dtl_mask(DTL_LOG_ALL);
+
+#ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
+	/* Setup dtl buffers and register those */
+	alloc_dtl_buffers();
+
+	for_each_online_cpu(cpu)
+		register_dtl_buffer(cpu);
+#endif
+
+	for_each_online_cpu(cpu)
+		per_cpu(dtl_entry_ridx, cpu) = be64_to_cpu(lppaca_of(cpu).dtl_idx);
+
+	dtl_entry_tracepoint_enable();
+
+	return 0;
+}
+
+void dtl_entry_tracepoint_unregfunc(void)
+{
+#ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
+	int cpu;
+#endif
+
+	dtl_entry_tracepoint_disable();
+
+	reset_dtl_mask();
+
+#ifndef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
+	for_each_possible_cpu(cpu)
+		unregister_dtl(get_hard_smp_processor_id(cpu));
+
+	free_dtl_buffers();
+#endif
+
+	unregister_dtl_buffer_access(1);
+}
+
+/* We return 255 to indicate that we couldn't determine the dispatch distance */
+u8 compute_dispatch_distance(unsigned int pcpu)
+{
+	__be32 *pcpu_assoc, *vcpu_assoc;
+	int i, index, distance = distance_ref_points_depth;
+
+	if (!vphn_enabled || no_distance_info)
+		return 255;
+
+	vcpu_assoc = vcpu_associativity[cpu_first_thread_sibling(smp_processor_id())];
+	pcpu_assoc = pcpu_associativity[cpu_first_thread_sibling(pcpu)];
+
+	if (be32_to_cpu(pcpu_assoc[0]) == NR_CPUS)
+		return 255;
+
+	for (i = distance_ref_points_depth - 1; i >= 0; i--) {
+		index = be32_to_cpu(distance_ref_points[i]);
+		if (be32_to_cpu(vcpu_assoc[index]) == be32_to_cpu(pcpu_assoc[index]))
+			distance--;
+		else
+			break;
+	}
+
+	return distance;
+}
+
 /*
  * Update the CPU maps and sysfs entries for a single CPU when its NUMA
  * characteristics change. This function doesn't perform any locking and is
diff --git a/arch/powerpc/platforms/pseries/dtl.c b/arch/powerpc/platforms/pseries/dtl.c
index fb05804adb2f..ec25d3e6cdbb 100644
--- a/arch/powerpc/platforms/pseries/dtl.c
+++ b/arch/powerpc/platforms/pseries/dtl.c
@@ -193,11 +193,15 @@ static int dtl_enable(struct dtl *dtl)
 	if (dtl->buf)
 		return -EBUSY;
 
+	if (register_dtl_buffer_access(0))
+		return -EBUSY;
+
 	n_entries = dtl_buf_entries;
 	buf = kmem_cache_alloc_node(dtl_cache, GFP_KERNEL, cpu_to_node(dtl->cpu));
 	if (!buf) {
 		printk(KERN_WARNING "%s: buffer alloc failed for cpu %d\n",
 				__func__, dtl->cpu);
+		unregister_dtl_buffer_access(0);
 		return -ENOMEM;
 	}
 
@@ -214,8 +218,11 @@ static int dtl_enable(struct dtl *dtl)
 	}
 	spin_unlock(&dtl->lock);
 
-	if (rc)
+	if (rc) {
+		unregister_dtl_buffer_access(0);
 		kmem_cache_free(dtl_cache, buf);
+	}
+
 	return rc;
 }
 
@@ -227,6 +234,7 @@ static void dtl_disable(struct dtl *dtl)
 	dtl->buf = NULL;
 	dtl->buf_entries = 0;
 	spin_unlock(&dtl->lock);
+	unregister_dtl_buffer_access(0);
 }
 
 /* file interface */
diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c
index d83bb3db6767..67b2c59c10b1 100644
--- a/arch/powerpc/platforms/pseries/lpar.c
+++ b/arch/powerpc/platforms/pseries/lpar.c
@@ -30,6 +30,7 @@
 #include <linux/jump_label.h>
 #include <linux/delay.h>
 #include <linux/stop_machine.h>
+#include <linux/spinlock.h>
 #include <asm/processor.h>
 #include <asm/mmu.h>
 #include <asm/page.h>
@@ -64,6 +65,167 @@ EXPORT_SYMBOL(plpar_hcall);
 EXPORT_SYMBOL(plpar_hcall9);
 EXPORT_SYMBOL(plpar_hcall_norets);
 
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
+u8 dtl_mask = DTL_LOG_PREEMPT;
+#else
+u8 dtl_mask = 0;
+#endif
+
+static DEFINE_SPINLOCK(dtl_buffer_refctr_lock);
+static unsigned int dtl_buffer_global_refctr, dtl_buffer_percpu_refctr;
+
+int register_dtl_buffer_access(int global)
+{
+	int rc = 0;
+
+	spin_lock(&dtl_buffer_refctr_lock);
+
+	if ((global && (dtl_buffer_global_refctr || dtl_buffer_percpu_refctr)) ||
+			(!global && dtl_buffer_global_refctr)) {
+		rc = -1;
+	} else {
+		if (global)
+			dtl_buffer_global_refctr++;
+		else
+			dtl_buffer_percpu_refctr++;
+	}
+
+	spin_unlock(&dtl_buffer_refctr_lock);
+
+	return rc;
+}
+
+void unregister_dtl_buffer_access(int global)
+{
+	spin_lock(&dtl_buffer_refctr_lock);
+
+	if (global)
+		dtl_buffer_global_refctr--;
+	else
+		dtl_buffer_percpu_refctr--;
+
+	spin_unlock(&dtl_buffer_refctr_lock);
+}
+
+void set_dtl_mask(u8 mask)
+{
+	int cpu;
+
+	dtl_mask = mask;
+	for_each_present_cpu(cpu)
+		lppaca_of(cpu).dtl_enable_mask = dtl_mask;
+}
+
+void reset_dtl_mask(void)
+{
+	int cpu;
+
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
+	dtl_mask = DTL_LOG_PREEMPT;
+#else
+	dtl_mask = 0;
+#endif
+	for_each_present_cpu(cpu)
+		lppaca_of(cpu).dtl_enable_mask = dtl_mask;
+}
+
+#if defined(CONFIG_TRACEPOINTS) && defined(CONFIG_PPC_SPLPAR)
+#ifdef HAVE_JUMP_LABEL
+struct static_key dtl_entry_tracepoint_key = STATIC_KEY_INIT;
+
+void dtl_entry_tracepoint_enable(void)
+{
+	static_key_slow_inc(&dtl_entry_tracepoint_key);
+}
+
+void dtl_entry_tracepoint_disable(void)
+{
+	static_key_slow_dec(&dtl_entry_tracepoint_key);
+}
+#else
+/* NB: reg/unreg are called while guarded with the tracepoints_mutex */
+extern long dtl_entry_tracepoint_refcount;
+
+void dtl_entry_tracepoint_enable(void)
+{
+	dtl_entry_tracepoint_refcount++;
+}
+
+void dtl_entry_tracepoint_disable(void)
+{
+	dtl_entry_tracepoint_refcount--;
+}
+#endif
+
+/*
+ * Since the tracing code might execute hcalls we need to guard against
+ * recursion. One example of this is spinlocks calling H_YIELD on
+ * shared processor partitions.
+ */
+static DEFINE_PER_CPU(unsigned int, dtl_entry_trace_depth);
+DEFINE_PER_CPU(u64, dtl_entry_ridx);
+
+static void __process_dtl_buffer(void)
+{
+	struct dtl_entry dtle;
+	u64 i = __this_cpu_read(dtl_entry_ridx);
+	struct dtl_entry *dtl = local_paca->dispatch_log + (i % N_DISPATCH_LOG);
+	struct dtl_entry *dtl_end = local_paca->dispatch_log_end;
+	struct lppaca *vpa = local_paca->lppaca_ptr;
+
+	if (!local_paca->dispatch_log || i == be64_to_cpu(vpa->dtl_idx))
+		return;
+
+	while (i < be64_to_cpu(vpa->dtl_idx)) {
+		dtle = *dtl;
+		barrier();
+		if (i + N_DISPATCH_LOG < be64_to_cpu(vpa->dtl_idx)) {
+			/* buffer has overflowed */
+			i = be64_to_cpu(vpa->dtl_idx) - N_DISPATCH_LOG;
+			dtl = local_paca->dispatch_log + (i % N_DISPATCH_LOG);
+			continue;
+		}
+		trace_dtl_entry(dtle.dispatch_reason, dtle.preempt_reason,
+				be16_to_cpu(dtle.processor_id),
+				be32_to_cpu(dtle.enqueue_to_dispatch_time),
+				be32_to_cpu(dtle.ready_to_enqueue_time),
+				be32_to_cpu(dtle.waiting_to_ready_time),
+				be64_to_cpu(dtle.timebase),
+				be64_to_cpu(dtle.fault_addr),
+				be64_to_cpu(dtle.srr0),
+				be64_to_cpu(dtle.srr1));
+		++i;
+		++dtl;
+		if (dtl == dtl_end)
+			dtl = local_paca->dispatch_log;
+	}
+
+	__this_cpu_write(dtl_entry_ridx, i);
+}
+
+void __trace_dtl_entry(void)
+{
+	unsigned long flags;
+	unsigned int *depth;
+
+	local_irq_save(flags);
+	preempt_disable();
+
+	depth = this_cpu_ptr(&dtl_entry_trace_depth);
+
+	if (*depth)
+		goto out;
+
+	(*depth)++;
+	__process_dtl_buffer();
+	(*depth)--;
+
+out:
+	preempt_enable();
+	local_irq_restore(flags);
+}
+#endif
+
 void alloc_dtl_buffers(void)
 {
 	int cpu;
@@ -72,11 +234,15 @@ void alloc_dtl_buffers(void)
 
 	for_each_possible_cpu(cpu) {
 		pp = paca_ptrs[cpu];
+		if (pp->dispatch_log)
+			continue;
 		dtl = kmem_cache_alloc(dtl_cache, GFP_KERNEL);
 		if (!dtl) {
 			pr_warn("Failed to allocate dispatch trace log for cpu %d\n",
 				cpu);
+#ifdef CONFIG_VIRT_CPU_ACCOUNTING_NATIVE
 			pr_warn("Stolen time statistics will be unreliable\n");
+#endif
 			break;
 		}
 
@@ -87,6 +253,23 @@ void alloc_dtl_buffers(void)
 	}
 }
 
+void free_dtl_buffers(void)
+{
+	int cpu;
+	struct paca_struct *pp;
+
+	for_each_possible_cpu(cpu) {
+		pp = paca_ptrs[cpu];
+		if (!pp->dispatch_log)
+			continue;
+		kmem_cache_free(dtl_cache, pp->dispatch_log);
+		pp->dtl_ridx = 0;
+		pp->dispatch_log = NULL;
+		pp->dispatch_log_end = NULL;
+		pp->dtl_curr = NULL;
+	}
+}
+
 void register_dtl_buffer(int cpu)
 {
 	long ret;
@@ -96,7 +279,7 @@ void register_dtl_buffer(int cpu)
 
 	pp = paca_ptrs[cpu];
 	dtl = pp->dispatch_log;
-	if (dtl) {
+	if (dtl && dtl_mask) {
 		pp->dtl_ridx = 0;
 		pp->dtl_curr = dtl;
 		lppaca_of(cpu).dtl_idx = 0;
@@ -107,7 +290,7 @@ void register_dtl_buffer(int cpu)
 		if (ret)
 			pr_err("WARNING: DTL registration of cpu %d (hw %d) "
 			       "failed with %ld\n", cpu, hwcpu, ret);
-		lppaca_of(cpu).dtl_enable_mask = DTL_LOG_PREEMPT;
+		lppaca_of(cpu).dtl_enable_mask = dtl_mask;
 	}
 }
 
-- 
2.19.1
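
For readers following the reference counting in register_dtl_buffer_access() and unregister_dtl_buffer_access(), here is a minimal userspace sketch of the same exclusion rule: one global user excludes everyone else, while any number of per-cpu users may coexist as long as no global user is present. The names register_access(), unregister_access() and the atomic_flag spinlock are stand-ins chosen for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel spinlock guarding the counters. */
static atomic_flag refctr_lock = ATOMIC_FLAG_INIT;
static unsigned int global_refctr, percpu_refctr;

static void refctr_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&refctr_lock,
						 memory_order_acquire))
		;	/* spin until the flag is released */
}

static void refctr_lock_release(void)
{
	atomic_flag_clear_explicit(&refctr_lock, memory_order_release);
}

/* Returns 0 on success, -1 if the request conflicts with current users. */
int register_access(bool global)
{
	int rc = 0;

	refctr_lock_acquire();
	if (global && (global_refctr || percpu_refctr))
		rc = -1;	/* a global user needs exclusive access */
	else if (!global && global_refctr)
		rc = -1;	/* per-cpu users are locked out by a global user */
	else if (global)
		global_refctr++;
	else
		percpu_refctr++;
	refctr_lock_release();

	return rc;
}

void unregister_access(bool global)
{
	refctr_lock_acquire();
	if (global) {
		assert(global_refctr > 0);	/* catch unbalanced calls */
		global_refctr--;
	} else {
		assert(percpu_refctr > 0);
		percpu_refctr--;
	}
	refctr_lock_release();
}
```

In this sketch, two per-cpu registrations succeed back to back, a global registration is refused until both are released, and a held global registration refuses every further request of either kind.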

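The overflow handling in __process_dtl_buffer() can be illustrated with a simplified ring-buffer reader. This is only a sketch under stated assumptions: LOG_ENTRIES, struct log_reader and drain_log() are hypothetical names, entries are plain integers rather than struct dtl_entry, and the write index is passed in as a snapshot, whereas the patch re-reads vpa->dtl_idx on every iteration because the hypervisor may still be writing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_ENTRIES 8	/* hypothetical size; stands in for N_DISPATCH_LOG */

struct log_reader {
	const uint32_t *buf;	/* ring of LOG_ENTRIES slots */
	uint64_t ridx;		/* next index the reader will consume */
};

/*
 * Copy every entry between the reader index and the write index widx
 * into out[], oldest first, and return how many entries were copied.
 * Both indices count entries since the log was registered and are
 * never wrapped; only the slot lookup uses the modulo.
 */
size_t drain_log(struct log_reader *r, uint64_t widx,
		 uint32_t out[LOG_ENTRIES])
{
	size_t n = 0;

	assert(r && r->buf && out);

	/*
	 * The writer has lapped the reader: entries older than
	 * widx - LOG_ENTRIES were overwritten, so skip ahead to the
	 * oldest entry that still exists.
	 */
	if (widx > r->ridx + LOG_ENTRIES)
		r->ridx = widx - LOG_ENTRIES;

	while (r->ridx < widx) {
		out[n++] = r->buf[r->ridx % LOG_ENTRIES];
		r->ridx++;
	}

	return n;
}
```

With an 8-slot ring, a reader parked at index 3 while the writer advances to index 20 skips ahead to index 12 and gets entries 12 through 19; entries 3 through 11 are lost, which is the case the "buffer has overflowed" branch handles.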


* Re: [PATCH v1 3/5] powerpc/pseries: Fix stolen time accounting when dtl debugfs is used
  2018-10-25 20:25 ` [PATCH v1 3/5] powerpc/pseries: Fix stolen time accounting when dtl debugfs is used Naveen N. Rao
@ 2018-10-25 21:08   ` Paul Mackerras
  2018-10-26  7:40     ` Naveen N. Rao
  0 siblings, 1 reply; 8+ messages in thread
From: Paul Mackerras @ 2018-10-25 21:08 UTC (permalink / raw)
  To: Naveen N. Rao; +Cc: linuxppc-dev, Jeremy Kerr, Steven Rostedt, Nathan Fontenot

On Fri, Oct 26, 2018 at 01:55:44AM +0530, Naveen N. Rao wrote:
> When the dtl debugfs interface is used, we usually set the
> dtl_enable_mask to 0x7 (DTL_LOG_ALL). When this happens, we start seeing
> DTL entries for all preempt reasons, including CEDE. In
> scan_dispatch_log(), we add up the times from all entries and account
> those towards stolen time. However, we should only be accounting stolen
> time when the preemption was due to HDEC at the end of our time slice.

It's always been the case that stolen time when idle has been
accounted as idle time, not stolen time.  That's why we didn't check
for this in the past.

Do you have a test that shows different results (as in reported idle
and stolen times) with this patch compared to without?

Paul.


* Re: [PATCH v1 3/5] powerpc/pseries: Fix stolen time accounting when dtl debugfs is used
  2018-10-25 21:08   ` Paul Mackerras
@ 2018-10-26  7:40     ` Naveen N. Rao
  0 siblings, 0 replies; 8+ messages in thread
From: Naveen N. Rao @ 2018-10-26  7:40 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-dev, Jeremy Kerr, Steven Rostedt, Nathan Fontenot

Paul Mackerras wrote:
> On Fri, Oct 26, 2018 at 01:55:44AM +0530, Naveen N. Rao wrote:
>> When the dtl debugfs interface is used, we usually set the
>> dtl_enable_mask to 0x7 (DTL_LOG_ALL). When this happens, we start seeing
>> DTL entries for all preempt reasons, including CEDE. In
>> scan_dispatch_log(), we add up the times from all entries and account
>> those towards stolen time. However, we should only be accounting stolen
>> time when the preemption was due to HDEC at the end of our time slice.
> 
> It's always been the case that stolen time when idle has been
> accounted as idle time, not stolen time.  That's why we didn't check
> for this in the past.
> 
> Do you have a test that shows different results (as in reported idle
> and stolen times) with this patch compared to without?

Ah ok, that makes sense now and explains why I couldn't observe much of 
a difference in practice. However, I also went by the fact that there 
are 7 other preemption reasons, which could impact our calculation.  
Looking at the list again, it looks like H_CONFER/H_PROD and some faults 
can also have an impact here, though they may be rare?

Thanks,
Naveen
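
To make the question in this sub-thread concrete, the sketch below counts a DTL entry towards stolen time only when the preemption reason is the end of the time slice. The reason codes, struct entry and stolen_time() are hypothetical and chosen for illustration; the real encoding comes from the platform. Summing the two wait fields is likewise an assumption about how scan_dispatch_log() derives its delta.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reason codes; the real values are defined by the platform. */
enum preempt_reason {
	REASON_CEDE = 1,
	REASON_TIME_SLICE_END = 2,
	REASON_OTHER = 3,
};

/* Reduced stand-in for struct dtl_entry, already in CPU byte order. */
struct entry {
	uint8_t  preempt_reason;
	uint32_t enqueue_to_dispatch_time;
	uint32_t ready_to_enqueue_time;
};

/* Sum wait times, counting only preemptions at the end of a time slice. */
uint64_t stolen_time(const struct entry *log, size_t n)
{
	uint64_t stolen = 0;

	assert(log || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (log[i].preempt_reason != REASON_TIME_SLICE_END)
			continue;	/* e.g. CEDE: the vcpu gave up the cpu itself */
		stolen += (uint64_t)log[i].enqueue_to_dispatch_time +
			  log[i].ready_to_enqueue_time;
	}

	return stolen;
}
```

Without the filter, a single long CEDE entry would dominate the sum. Whether that matters in practice depends on how idle time is accounted elsewhere, which is the point Paul raises above.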




end of thread, other threads:[~2018-10-26  7:43 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-25 20:25 [PATCH v1 0/5] Add dtl_entry tracepoint Naveen N. Rao
2018-10-25 20:25 ` [PATCH v1 1/5] powerpc/pseries: Use macros for referring to the DTL enable mask Naveen N. Rao
2018-10-25 20:25 ` [PATCH v1 2/5] powerpc/pseries: Do not save the previous DTL mask value Naveen N. Rao
2018-10-25 20:25 ` [PATCH v1 3/5] powerpc/pseries: Fix stolen time accounting when dtl debugfs is used Naveen N. Rao
2018-10-25 21:08   ` Paul Mackerras
2018-10-26  7:40     ` Naveen N. Rao
2018-10-25 20:25 ` [PATCH v1 4/5] powerpc/pseries: Factor out DTL buffer allocation and registration routines Naveen N. Rao
2018-10-25 20:25 ` [PATCH v1 5/5] powerpc/pseries: Introduce dtl_entry tracepoint Naveen N. Rao
