linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity
@ 2020-08-28  9:00 vincent.donnefort
  2020-08-28 10:27 ` Qais Yousef
  2020-10-05  7:43 ` [tip: sched/core] " tip-bot2 for Vincent Donnefort
  0 siblings, 2 replies; 26+ messages in thread
From: vincent.donnefort @ 2020-08-28  9:00 UTC (permalink / raw)
  To: mingo, peterz, vincent.guittot
  Cc: linux-kernel, dietmar.eggemann, valentin.schneider, qais.yousef,
	Vincent Donnefort

From: Vincent Donnefort <vincent.donnefort@arm.com>

rq->cpu_capacity is a key element in several scheduler parts, such as EAS
task placement and load balancing. Tracking this value enables testing
and/or debugging by a toolkit.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 021ad7b..7e19d59 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2055,6 +2055,7 @@ const struct sched_avg *sched_trace_rq_avg_dl(struct rq *rq);
 const struct sched_avg *sched_trace_rq_avg_irq(struct rq *rq);
 
 int sched_trace_rq_cpu(struct rq *rq);
+int sched_trace_rq_cpu_capacity(struct rq *rq);
 int sched_trace_rq_nr_running(struct rq *rq);
 
 const struct cpumask *sched_trace_rd_span(struct root_domain *rd);
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 8ab48b3..f94ddd1 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -630,6 +630,10 @@ DECLARE_TRACE(pelt_se_tp,
 	TP_PROTO(struct sched_entity *se),
 	TP_ARGS(se));
 
+DECLARE_TRACE(sched_cpu_capacity_tp,
+	TP_PROTO(struct rq *rq),
+	TP_ARGS(rq));
+
 DECLARE_TRACE(sched_overutilized_tp,
 	TP_PROTO(struct root_domain *rd, bool overutilized),
 	TP_ARGS(rd, overutilized));
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 06b0a40..e468271 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -36,6 +36,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(pelt_rt_tp);
 EXPORT_TRACEPOINT_SYMBOL_GPL(pelt_dl_tp);
 EXPORT_TRACEPOINT_SYMBOL_GPL(pelt_irq_tp);
 EXPORT_TRACEPOINT_SYMBOL_GPL(pelt_se_tp);
+EXPORT_TRACEPOINT_SYMBOL_GPL(sched_cpu_capacity_tp);
 EXPORT_TRACEPOINT_SYMBOL_GPL(sched_overutilized_tp);
 EXPORT_TRACEPOINT_SYMBOL_GPL(sched_util_est_cfs_tp);
 EXPORT_TRACEPOINT_SYMBOL_GPL(sched_util_est_se_tp);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 44f7a0b..27f4392 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8116,6 +8116,8 @@ static void update_cpu_capacity(struct sched_domain *sd, int cpu)
 		capacity = 1;
 
 	cpu_rq(cpu)->cpu_capacity = capacity;
+	trace_sched_cpu_capacity_tp(cpu_rq(cpu));
+
 	sdg->sgc->capacity = capacity;
 	sdg->sgc->min_capacity = capacity;
 	sdg->sgc->max_capacity = capacity;
@@ -11318,6 +11320,18 @@ int sched_trace_rq_cpu(struct rq *rq)
 }
 EXPORT_SYMBOL_GPL(sched_trace_rq_cpu);
 
+int sched_trace_rq_cpu_capacity(struct rq *rq)
+{
+	return rq ?
+#ifdef CONFIG_SMP
+		rq->cpu_capacity
+#else
+		SCHED_CAPACITY_SCALE
+#endif
+		: -1;
+}
+EXPORT_SYMBOL_GPL(sched_trace_rq_cpu_capacity);
+
 const struct cpumask *sched_trace_rd_span(struct root_domain *rd)
 {
 #ifdef CONFIG_SMP
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity
  2020-08-28  9:00 [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity vincent.donnefort
@ 2020-08-28 10:27 ` Qais Yousef
  2020-08-28 17:10   ` Dietmar Eggemann
  2020-10-05  7:43 ` [tip: sched/core] " tip-bot2 for Vincent Donnefort
  1 sibling, 1 reply; 26+ messages in thread
From: Qais Yousef @ 2020-08-28 10:27 UTC (permalink / raw)
  To: vincent.donnefort
  Cc: mingo, peterz, vincent.guittot, linux-kernel, dietmar.eggemann,
	valentin.schneider

On 08/28/20 10:00, vincent.donnefort@arm.com wrote:
> From: Vincent Donnefort <vincent.donnefort@arm.com>
> 
> rq->cpu_capacity is a key element in several scheduler parts, such as EAS
> task placement and load balancing. Tracking this value enables testing
> and/or debugging by a toolkit.
> 
> Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h

[...]

> +int sched_trace_rq_cpu_capacity(struct rq *rq)
> +{
> +	return rq ?
> +#ifdef CONFIG_SMP
> +		rq->cpu_capacity
> +#else
> +		SCHED_CAPACITY_SCALE
> +#endif
> +		: -1;
> +}
> +EXPORT_SYMBOL_GPL(sched_trace_rq_cpu_capacity);
> +

The placement of this #ifdef looks odd to me. But FWIW

Reviewed-by: Qais Yousef <qais.yousef@arm.com>

Cheers

--
Qais Yousef

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity
  2020-08-28 10:27 ` Qais Yousef
@ 2020-08-28 17:10   ` Dietmar Eggemann
  2020-08-28 17:26     ` Qais Yousef
  0 siblings, 1 reply; 26+ messages in thread
From: Dietmar Eggemann @ 2020-08-28 17:10 UTC (permalink / raw)
  To: Qais Yousef, vincent.donnefort
  Cc: mingo, peterz, vincent.guittot, linux-kernel, valentin.schneider

On 28/08/2020 12:27, Qais Yousef wrote:
> On 08/28/20 10:00, vincent.donnefort@arm.com wrote:
>> From: Vincent Donnefort <vincent.donnefort@arm.com>
>>
>> rq->cpu_capacity is a key element in several scheduler parts, such as EAS
>> task placement and load balancing. Tracking this value enables testing
>> and/or debugging by a toolkit.
>>
>> Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
>>
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
> 
> [...]
> 
>> +int sched_trace_rq_cpu_capacity(struct rq *rq)
>> +{
>> +	return rq ?
>> +#ifdef CONFIG_SMP
>> +		rq->cpu_capacity
>> +#else
>> +		SCHED_CAPACITY_SCALE
>> +#endif
>> +		: -1;
>> +}
>> +EXPORT_SYMBOL_GPL(sched_trace_rq_cpu_capacity);
>> +
> 
> The placement of this #ifdef looks odd to me. But FWIW
> 
> Reviewed-by: Qais Yousef <qais.yousef@arm.com>

Returning -1 for cpu_capacity? It makes sense for sched_trace_rq_cpu()
but for cpu_capacity?

Can you remind me why we have all these helper functions like
sched_trace_rq_cpu_capacity?

In case we would let the extra code (which transforms trace points into
trace events) know the internals of struct rq we could handle those
things in the TRACE_EVENT and/or the register_trace_##name(void
(*probe)(data_proto), void *data) thing.
We always said when the internal things will change this extra code will
break. So that's not an issue.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity
  2020-08-28 17:10   ` Dietmar Eggemann
@ 2020-08-28 17:26     ` Qais Yousef
  2020-09-02 10:44       ` Dietmar Eggemann
  0 siblings, 1 reply; 26+ messages in thread
From: Qais Yousef @ 2020-08-28 17:26 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: vincent.donnefort, mingo, peterz, vincent.guittot, linux-kernel,
	valentin.schneider

On 08/28/20 19:10, Dietmar Eggemann wrote:
> On 28/08/2020 12:27, Qais Yousef wrote:
> > On 08/28/20 10:00, vincent.donnefort@arm.com wrote:
> >> From: Vincent Donnefort <vincent.donnefort@arm.com>
> >>
> >> rq->cpu_capacity is a key element in several scheduler parts, such as EAS
> >> task placement and load balancing. Tracking this value enables testing
> >> and/or debugging by a toolkit.
> >>
> >> Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
> >>
> >> diff --git a/include/linux/sched.h b/include/linux/sched.h
> > 
> > [...]
> > 
> >> +int sched_trace_rq_cpu_capacity(struct rq *rq)
> >> +{
> >> +	return rq ?
> >> +#ifdef CONFIG_SMP
> >> +		rq->cpu_capacity
> >> +#else
> >> +		SCHED_CAPACITY_SCALE
> >> +#endif
> >> +		: -1;
> >> +}
> >> +EXPORT_SYMBOL_GPL(sched_trace_rq_cpu_capacity);
> >> +
> > 
> > The placement of this #ifdef looks odd to me. But FWIW
> > 
> > Reviewed-by: Qais Yousef <qais.yousef@arm.com>
> 
> Returning -1 for cpu_capacity? It makes sense for sched_trace_rq_cpu()
> but for cpu_capacity?

If rq is NULL you return -1, an error the way I read it. rq is passed as an
argument, so better ensure we handle NULL and not blindly dereference rq and
crash.

> 
> Can you remind me why we have all these helper functions like
> sched_trace_rq_cpu_capacity?

struct rq is defined in kernel/sched/sched.h. It's not exported. Exporting
these helper functions was the agreement to help modules trace internal info.
By passing generic info you decouple the tracepoint from giving specific info
and allow the modules to extract all the info they need from the same
tracepoint. IE: if you need more than just cpu_capacity from this tracepoint,
you can get that without having to continuously add extra arguments everytime
you need an extra piece of info. Unless this info is not in the rq of course.

> 
> In case we would let the extra code (which transforms trace points into
> trace events) know the internals of struct rq we could handle those
> things in the TRACE_EVENT and/or the register_trace_##name(void
> (*probe)(data_proto), void *data) thing.
> We always said when the internal things will change this extra code will
> break. So that's not an issue.

The problem is that you need to export struct rq in a public header. Which we
don't want to do. I have been trying to find out how to use BTF so we can
remove these functions. Haven't gotten far away yet - but it should be doable
and it's a question of me finding enough time to understand what was currently
done and if I can re-use something or need to come up with extra infrastructure
first.

Thanks

--
Qais Yousef

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity
  2020-08-28 17:26     ` Qais Yousef
@ 2020-09-02 10:44       ` Dietmar Eggemann
  2020-09-02 13:54         ` Phil Auld
  2020-09-07 10:48         ` Qais Yousef
  0 siblings, 2 replies; 26+ messages in thread
From: Dietmar Eggemann @ 2020-09-02 10:44 UTC (permalink / raw)
  To: Qais Yousef
  Cc: vincent.donnefort, mingo, peterz, vincent.guittot, linux-kernel,
	valentin.schneider, Phil Auld

+ Phil Auld <pauld@redhat.com>

On 28/08/2020 19:26, Qais Yousef wrote:
> On 08/28/20 19:10, Dietmar Eggemann wrote:
>> On 28/08/2020 12:27, Qais Yousef wrote:
>>> On 08/28/20 10:00, vincent.donnefort@arm.com wrote:
>>>> From: Vincent Donnefort <vincent.donnefort@arm.com>

[...]

>> Can you remind me why we have all these helper functions like
>> sched_trace_rq_cpu_capacity?
> 
> struct rq is defined in kernel/sched/sched.h. It's not exported. Exporting
> these helper functions was the agreement to help modules trace internal info.
> By passing generic info you decouple the tracepoint from giving specific info
> and allow the modules to extract all the info they need from the same
> tracepoint. IE: if you need more than just cpu_capacity from this tracepoint,
> you can get that without having to continuously add extra arguments everytime
> you need an extra piece of info. Unless this info is not in the rq of course.

I think this decoupling is not necessary. The natural place for those
scheduler trace_event based on trace_points extension files is
kernel/sched/ and here the internal sched.h can just be included.

If someone really wants to build this as an out-of-tree module there is
an easy way to make kernel/sched/sched.h visible.

CFLAGS_sched_tp.o := -I$KERNEL_SRC/kernel/sched

all:
    make -C $KERNEL_SRC M=$(PWD) modules

This allowed me to build our trace_event extension module (sched_tp.c,
sched_events.h) out-of-tree and I was able to get rid of all the
sched_trace_foo() functions (in fair.c, include/linux/sched.h) and code
there content directly in foo.c

There are two things we would need exported from the kernel:

(1) cfs_rq_tg_path() to print the path of a taskgroup cfs_rq or se.

(2) sched_uclamp_used so uclamp_rq_util_with() can be used in
    sched_events.h.

I put Phil Auld on cc because of his trace_point
sched_update_nr_running_tp. I think Phil was using sched_tp as a base so
I can't see an issue why we can't also remove sched_trace_rq_nr_running().

>> In case we would let the extra code (which transforms trace points into
>> trace events) know the internals of struct rq we could handle those
>> things in the TRACE_EVENT and/or the register_trace_##name(void
>> (*probe)(data_proto), void *data) thing.
>> We always said when the internal things will change this extra code will
>> break. So that's not an issue.
> 
> The problem is that you need to export struct rq in a public header. Which we
> don't want to do. I have been trying to find out how to use BTF so we can
> remove these functions. Haven't gotten far away yet - but it should be doable
> and it's a question of me finding enough time to understand what was currently
> done and if I can re-use something or need to come up with extra infrastructure
> first.

Let's keep the footprint of these trace points as small as possible in
the scheduler code.

I'm putting the changes I described above in our monthly EAS integration
right now and when this worked out nicely I will share the patches on lkml.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity
  2020-09-02 10:44       ` Dietmar Eggemann
@ 2020-09-02 13:54         ` Phil Auld
  2020-09-07 11:02           ` Qais Yousef
  2020-09-07 10:48         ` Qais Yousef
  1 sibling, 1 reply; 26+ messages in thread
From: Phil Auld @ 2020-09-02 13:54 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: Qais Yousef, vincent.donnefort, mingo, peterz, vincent.guittot,
	linux-kernel, valentin.schneider

On Wed, Sep 02, 2020 at 12:44:42PM +0200 Dietmar Eggemann wrote:
> + Phil Auld <pauld@redhat.com>
>

Thanks Dietmar.


> On 28/08/2020 19:26, Qais Yousef wrote:
> > On 08/28/20 19:10, Dietmar Eggemann wrote:
> >> On 28/08/2020 12:27, Qais Yousef wrote:
> >>> On 08/28/20 10:00, vincent.donnefort@arm.com wrote:
> >>>> From: Vincent Donnefort <vincent.donnefort@arm.com>
> 
> [...]
> 
> >> Can you remind me why we have all these helper functions like
> >> sched_trace_rq_cpu_capacity?
> > 
> > struct rq is defined in kernel/sched/sched.h. It's not exported. Exporting
> > these helper functions was the agreement to help modules trace internal info.
> > By passing generic info you decouple the tracepoint from giving specific info
> > and allow the modules to extract all the info they need from the same
> > tracepoint. IE: if you need more than just cpu_capacity from this tracepoint,
> > you can get that without having to continuously add extra arguments everytime
> > you need an extra piece of info. Unless this info is not in the rq of course.
> 
> I think this decoupling is not necessary. The natural place for those
> scheduler trace_event based on trace_points extension files is
> kernel/sched/ and here the internal sched.h can just be included.
> 
> If someone really wants to build this as an out-of-tree module there is
> an easy way to make kernel/sched/sched.h visible.
>

It's not so much that we really _want_ to do this in an external module.
But we aren't adding more trace events and my (limited) knowledge of
BPF let me to the conclusion that its raw tracepoint functionality
requires full events.  I didn't see any other way to do it.

We could put sched_tp in the tree under a debug CONFIG :)

> CFLAGS_sched_tp.o := -I$KERNEL_SRC/kernel/sched
> 
> all:
>     make -C $KERNEL_SRC M=$(PWD) modules
> 
> This allowed me to build our trace_event extension module (sched_tp.c,
> sched_events.h) out-of-tree and I was able to get rid of all the
> sched_trace_foo() functions (in fair.c, include/linux/sched.h) and code
> there content directly in foo.c
> 
> There are two things we would need exported from the kernel:
> 
> (1) cfs_rq_tg_path() to print the path of a taskgroup cfs_rq or se.
> 
> (2) sched_uclamp_used so uclamp_rq_util_with() can be used in
>     sched_events.h.
> 
> I put Phil Auld on cc because of his trace_point
> sched_update_nr_running_tp. I think Phil was using sched_tp as a base so
> I can't see an issue why we can't also remove sched_trace_rq_nr_running().
>

Our Perf team is now actively using this in downstream, using sched_tp, and
finding it very useful.

But I have no problem if this is all simpler in the kernel tree.

> >> In case we would let the extra code (which transforms trace points into
> >> trace events) know the internals of struct rq we could handle those
> >> things in the TRACE_EVENT and/or the register_trace_##name(void
> >> (*probe)(data_proto), void *data) thing.
> >> We always said when the internal things will change this extra code will
> >> break. So that's not an issue.
> > 
> > The problem is that you need to export struct rq in a public header. Which we
> > don't want to do. I have been trying to find out how to use BTF so we can
> > remove these functions. Haven't gotten far away yet - but it should be doable
> > and it's a question of me finding enough time to understand what was currently
> > done and if I can re-use something or need to come up with extra infrastructure
> > first.
> 
> Let's keep the footprint of these trace points as small as possible in
> the scheduler code.
> 
> I'm putting the changes I described above in our monthly EAS integration
> right now and when this worked out nicely I will share the patches on lkml.
> 

Sounds good, thanks!


Cheers,
Phil

-- 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity
  2020-09-02 10:44       ` Dietmar Eggemann
  2020-09-02 13:54         ` Phil Auld
@ 2020-09-07 10:48         ` Qais Yousef
  2020-09-07 11:13           ` peterz
  1 sibling, 1 reply; 26+ messages in thread
From: Qais Yousef @ 2020-09-07 10:48 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: vincent.donnefort, mingo, peterz, vincent.guittot, linux-kernel,
	valentin.schneider, Phil Auld

Hi Dietmar

On 09/02/20 12:44, Dietmar Eggemann wrote:
> + Phil Auld <pauld@redhat.com>
> 
> On 28/08/2020 19:26, Qais Yousef wrote:
> > On 08/28/20 19:10, Dietmar Eggemann wrote:
> >> On 28/08/2020 12:27, Qais Yousef wrote:
> >>> On 08/28/20 10:00, vincent.donnefort@arm.com wrote:
> >>>> From: Vincent Donnefort <vincent.donnefort@arm.com>
> 
> [...]
> 
> >> Can you remind me why we have all these helper functions like
> >> sched_trace_rq_cpu_capacity?
> > 
> > struct rq is defined in kernel/sched/sched.h. It's not exported. Exporting
> > these helper functions was the agreement to help modules trace internal info.
> > By passing generic info you decouple the tracepoint from giving specific info
> > and allow the modules to extract all the info they need from the same
> > tracepoint. IE: if you need more than just cpu_capacity from this tracepoint,
> > you can get that without having to continuously add extra arguments everytime
> > you need an extra piece of info. Unless this info is not in the rq of course.
> 
> I think this decoupling is not necessary. The natural place for those
> scheduler trace_event based on trace_points extension files is
> kernel/sched/ and here the internal sched.h can just be included.
> 
> If someone really wants to build this as an out-of-tree module there is
> an easy way to make kernel/sched/sched.h visible.
> 
> CFLAGS_sched_tp.o := -I$KERNEL_SRC/kernel/sched
> 
> all:
>     make -C $KERNEL_SRC M=$(PWD) modules

Sorry for the late response. Was on holiday.

IMHO the above is a hack. Out-of-tree modules should rely on public headers and
exported functions only. What you propose means that people who want to use
these tracepoints in meaningful way must have a prebuilt kernel handy. Which is
maybe true for us who work in the embedded world. But users who run normal
distro kernels (desktop/servers) will fail to build against
`/lib/modules/$(uname -r)/build` where that internal header file is not
exported. IOW, we're putting extra hoops for a large class of users here to be
able to access these internal data. They have to maintain their out-of-tree
definition of these structures.

FWIW, I did raise this concern with Peter in 2019 OSPM and he was okay with the
exports as it's still not a contract and they can disappear anytime we want.
Migrating to using BTF is the right way forward IMO. I don't think what we have
here is out-of-control yet. Though I agree they're annoying.

Cheers

--
Qais Yousef

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity
  2020-09-02 13:54         ` Phil Auld
@ 2020-09-07 11:02           ` Qais Yousef
  2020-09-08 13:19             ` Phil Auld
  0 siblings, 1 reply; 26+ messages in thread
From: Qais Yousef @ 2020-09-07 11:02 UTC (permalink / raw)
  To: Phil Auld
  Cc: Dietmar Eggemann, vincent.donnefort, mingo, peterz,
	vincent.guittot, linux-kernel, valentin.schneider

On 09/02/20 09:54, Phil Auld wrote:
> > 
> > I think this decoupling is not necessary. The natural place for those
> > scheduler trace_event based on trace_points extension files is
> > kernel/sched/ and here the internal sched.h can just be included.
> > 
> > If someone really wants to build this as an out-of-tree module there is
> > an easy way to make kernel/sched/sched.h visible.
> >
> 
> It's not so much that we really _want_ to do this in an external module.
> But we aren't adding more trace events and my (limited) knowledge of
> BPF let me to the conclusion that its raw tracepoint functionality
> requires full events.  I didn't see any other way to do it.

I did have a patch that allowed that. It might be worth trying to upstream it.
It just required a new macro which could be problematic.

https://github.com/qais-yousef/linux/commit/fb9fea29edb8af327e6b2bf3bc41469a8e66df8b

With the above I could attach using bpf::RAW_TRACEPOINT mechanism.

Cheers

--
Qais Yousef

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity
  2020-09-07 10:48         ` Qais Yousef
@ 2020-09-07 11:13           ` peterz
  2020-09-07 14:51             ` Qais Yousef
  2021-01-04 15:18             ` Qais Yousef
  0 siblings, 2 replies; 26+ messages in thread
From: peterz @ 2020-09-07 11:13 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Dietmar Eggemann, vincent.donnefort, mingo, vincent.guittot,
	linux-kernel, valentin.schneider, Phil Auld

On Mon, Sep 07, 2020 at 11:48:45AM +0100, Qais Yousef wrote:
> IMHO the above is a hack. Out-of-tree modules should rely on public headers and
> exported functions only. What you propose means that people who want to use
> these tracepoints in meaningful way must have a prebuilt kernel handy. Which is
> maybe true for us who work in the embedded world. But users who run normal
> distro kernels (desktop/servers) will fail to build against

But this isn't really aimed at regular users. We're aiming this at
developers (IIUC) so I dont really see this as a problem.

> FWIW, I did raise this concern with Peter in 2019 OSPM and he was okay with the
> exports as it's still not a contract and they can disappear anytime we want.
> Migrating to using BTF is the right way forward IMO. I don't think what we have
> here is out-of-control yet. Though I agree they're annoying.

Right, we're hiding behind the explicit lack of ABI for modules.

Anyway, CTF/BTF/random other crap that isn't DWARFs should work fine to
replace all this muck. Just no idea what the state of any of that is.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity
  2020-09-07 11:13           ` peterz
@ 2020-09-07 14:51             ` Qais Yousef
  2020-09-08 11:17               ` Dietmar Eggemann
  2021-01-04 15:18             ` Qais Yousef
  1 sibling, 1 reply; 26+ messages in thread
From: Qais Yousef @ 2020-09-07 14:51 UTC (permalink / raw)
  To: peterz
  Cc: Dietmar Eggemann, vincent.donnefort, mingo, vincent.guittot,
	linux-kernel, valentin.schneider, Phil Auld

On 09/07/20 13:13, peterz@infradead.org wrote:
> On Mon, Sep 07, 2020 at 11:48:45AM +0100, Qais Yousef wrote:
> > IMHO the above is a hack. Out-of-tree modules should rely on public headers and
> > exported functions only. What you propose means that people who want to use
> > these tracepoints in meaningful way must have a prebuilt kernel handy. Which is
> > maybe true for us who work in the embedded world. But users who run normal
> > distro kernels (desktop/servers) will fail to build against
> 
> But this isn't really aimed at regular users. We're aiming this at
> developers (IIUC) so I dont really see this as a problem.
> 
> > FWIW, I did raise this concern with Peter in 2019 OSPM and he was okay with the
> > exports as it's still not a contract and they can disappear anytime we want.
> > Migrating to using BTF is the right way forward IMO. I don't think what we have
> > here is out-of-control yet. Though I agree they're annoying.
> 
> Right, we're hiding behind the explicit lack of ABI for modules.
> 
> Anyway, CTF/BTF/random other crap that isn't DWARFs should work fine to
> replace all this muck. Just no idea what the state of any of that is.

So I was thinking of having a function that allows a module to read member of
struct rq (or any struct for that matters), but I think that's the harder
(though neater) way around.

Just compiled a kernel with CONFIG_DEBUG_INFO_BTF_INFO; and doing

	$ pahole rq
	struct rq {
        raw_spinlock_t             lock;                 /*     0     4 */
        unsigned int               nr_running;           /*     4     4 */
        long unsigned int          last_blocked_load_update_tick; /*     8     8 */
        unsigned int               has_blocked_load;     /*    16     4 */

        /* XXX 12 bytes hole, try to pack */

        call_single_data_t         nohz_csd;             /*    32    32 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        unsigned int               nohz_tick_stopped;    /*    64     4 */
        atomic_t                   nohz_flags;           /*    68     4 */
        unsigned int               ttwu_pending;         /*    72     4 */

        /* XXX 4 bytes hole, try to pack */

        u64                        nr_switches;          /*    80     8 */
	.
	.
	.
	}

dumps the struct rq {...}; which means one can easily use that to autogenerate
a header containing the structs they care about accessing for their running
kernel.

pahole automatically knows how to find /sys/kernel/btf/vmlinux to parse the
debug info btw.

The only caveat is that one has to recompile the module for each running
kernel; but that's acceptable I think. Not sure how many allow loading a module
that's not compiled for that particular kernel version anyway.

Note to try this you'll need pahole v1.16 or newer. And compiling pahole on
Ubuntu is a pain. I had to create a fedora docker image to compile it in.

So I think we have this already solved. Though not sure how to document it..

Thanks

--
Qais Yousef

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity
  2020-09-07 14:51             ` Qais Yousef
@ 2020-09-08 11:17               ` Dietmar Eggemann
  2020-09-08 15:17                 ` Qais Yousef
  0 siblings, 1 reply; 26+ messages in thread
From: Dietmar Eggemann @ 2020-09-08 11:17 UTC (permalink / raw)
  To: Qais Yousef, peterz
  Cc: vincent.donnefort, mingo, vincent.guittot, linux-kernel,
	valentin.schneider, Phil Auld

On 07/09/2020 16:51, Qais Yousef wrote:
> On 09/07/20 13:13, peterz@infradead.org wrote:
>> On Mon, Sep 07, 2020 at 11:48:45AM +0100, Qais Yousef wrote:
>>> IMHO the above is a hack. Out-of-tree modules should rely on public headers and
>>> exported functions only. What you propose means that people who want to use
>>> these tracepoints in meaningful way must have a prebuilt kernel handy. Which is
>>> maybe true for us who work in the embedded world. But users who run normal
>>> distro kernels (desktop/servers) will fail to build against
>>
>> But this isn't really aimed at regular users. We're aiming this at
>> developers (IIUC) so I dont really see this as a problem.

This is what I thought as well. All these helpers can be coded directly
in these tracepoint-2-traceevent (tp-2-te) converters. As long as they
are build from within kernel/sched/ there is no issue with the export
via kernel/sched/sched.h. Otherwise this little trick would be necessary.
But since it is a tool for developers I guess we can assume that they
can build it from within kernel/sched/.

I tested:

https://lkml.kernel.org/r/20200907091717.26116-1-dietmar.eggemann@arm.com

with our EAS integration which provides one of these tp-2-t2 converter
(sched_tp.c).

http://linux-arm.org/git?p=linux-power.git;a=shortlog;h=refs/heads/topic/tracepoints

[...]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity
  2020-09-07 11:02           ` Qais Yousef
@ 2020-09-08 13:19             ` Phil Auld
  2020-09-08 15:22               ` Qais Yousef
  2021-01-04 18:26               ` Qais Yousef
  0 siblings, 2 replies; 26+ messages in thread
From: Phil Auld @ 2020-09-08 13:19 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Dietmar Eggemann, vincent.donnefort, mingo, peterz,
	vincent.guittot, linux-kernel, valentin.schneider

Hi Quais,

On Mon, Sep 07, 2020 at 12:02:24PM +0100 Qais Yousef wrote:
> On 09/02/20 09:54, Phil Auld wrote:
> > > 
> > > I think this decoupling is not necessary. The natural place for those
> > > scheduler trace_event based on trace_points extension files is
> > > kernel/sched/ and here the internal sched.h can just be included.
> > > 
> > > If someone really wants to build this as an out-of-tree module there is
> > > an easy way to make kernel/sched/sched.h visible.
> > >
> > 
> > It's not so much that we really _want_ to do this in an external module.
> > But we aren't adding more trace events and my (limited) knowledge of
> > BPF let me to the conclusion that its raw tracepoint functionality
> > requires full events.  I didn't see any other way to do it.
> 
> I did have a patch that allowed that. It might be worth trying to upstream it.
> It just required a new macro which could be problematic.
> 
> https://github.com/qais-yousef/linux/commit/fb9fea29edb8af327e6b2bf3bc41469a8e66df8b
> 
> With the above I could attach using bpf::RAW_TRACEPOINT mechanism.
>

Yeah, that could work. I meant there was no way to do it with what was there :)

In our initial attempts at using BPF to get at nr_running (which I was not
involved in and don't have all the details...) there were issues being able to
keep up and losing events.  That may have been an implementation issue, but
using the module and trace-cmd doesn't have that problem. Hopefully you don't
see that using RAW_TRACEPOINTs.


Fwiw, I don't think these little helper routines are all that hard to maintain.
If something changes in those fields, which seems moderately unlikely at least
for many of them, the compiler will complain.

And I agree with you about preferring to use the public headers for the module.
I think we can work around it though, if needed.


Cheers,
Phil

> Cheers
> 
> --
> Qais Yousef
> 

-- 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity
  2020-09-08 11:17               ` Dietmar Eggemann
@ 2020-09-08 15:17                 ` Qais Yousef
  2020-09-08 16:17                   ` Dietmar Eggemann
  0 siblings, 1 reply; 26+ messages in thread
From: Qais Yousef @ 2020-09-08 15:17 UTC (permalink / raw)
  To: Dietmar Eggemann
  Cc: peterz, vincent.donnefort, mingo, vincent.guittot, linux-kernel,
	valentin.schneider, Phil Auld

On 09/08/20 13:17, Dietmar Eggemann wrote:
> On 07/09/2020 16:51, Qais Yousef wrote:
> > On 09/07/20 13:13, peterz@infradead.org wrote:
> >> On Mon, Sep 07, 2020 at 11:48:45AM +0100, Qais Yousef wrote:
> >>> IMHO the above is a hack. Out-of-tree modules should rely on public headers and
> >>> exported functions only. What you propose means that people who want to use
> >>> these tracepoints in meaningful way must have a prebuilt kernel handy. Which is
> >>> maybe true for us who work in the embedded world. But users who run normal
> >>> distro kernels (desktop/servers) will fail to build against
> >>
> >> But this isn't really aimed at regular users. We're aiming this at
> >> developers (IIUC) so I dont really see this as a problem.
> 
> This is what I thought as well. All these helpers can be coded directly
> in these tracepoint-2-traceevent (tp-2-te) converters. As long as they
> are build from within kernel/sched/ there is no issue with the export
> via kernel/sched/sched.h. Otherwise this little trick would be necessary.
> But since it is a tool for developers I guess we can assume that they
> can build it from within kernel/sched/.

I think this will reduce the usefulness of these tracepoints. But if you really
want to remove them, I am certainly not strongly attached to them and they were
meant to be removable anyway. So fine by me :-)

Cheers

--
Qais Yousef

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity
  2020-09-08 13:19             ` Phil Auld
@ 2020-09-08 15:22               ` Qais Yousef
  2021-01-04 18:26               ` Qais Yousef
  1 sibling, 0 replies; 26+ messages in thread
From: Qais Yousef @ 2020-09-08 15:22 UTC (permalink / raw)
  To: Phil Auld
  Cc: Dietmar Eggemann, vincent.donnefort, mingo, peterz,
	vincent.guittot, linux-kernel, valentin.schneider

On 09/08/20 09:19, Phil Auld wrote:
> Hi Quais,
> 
> On Mon, Sep 07, 2020 at 12:02:24PM +0100 Qais Yousef wrote:
> > On 09/02/20 09:54, Phil Auld wrote:
> > > > 
> > > > I think this decoupling is not necessary. The natural place for those
> > > > scheduler trace_event based on trace_points extension files is
> > > > kernel/sched/ and here the internal sched.h can just be included.
> > > > 
> > > > If someone really wants to build this as an out-of-tree module there is
> > > > an easy way to make kernel/sched/sched.h visible.
> > > >
> > > 
> > > It's not so much that we really _want_ to do this in an external module.
> > > But we aren't adding more trace events and my (limited) knowledge of
> > > BPF let me to the conclusion that its raw tracepoint functionality
> > > requires full events.  I didn't see any other way to do it.
> > 
> > I did have a patch that allowed that. It might be worth trying to upstream it.
> > It just required a new macro which could be problematic.
> > 
> > https://github.com/qais-yousef/linux/commit/fb9fea29edb8af327e6b2bf3bc41469a8e66df8b
> > 
> > With the above I could attach using bpf::RAW_TRACEPOINT mechanism.
> >
> 
> Yeah, that could work. I meant there was no way to do it with what was there :)
> 
> In our initial attempts at using BPF to get at nr_running (which I was not
> involved in and don't have all the details...) there were issues being able to
> keep up and losing events.  That may have been an implementation issue, but
> using the module and trace-cmd doesn't have that problem. Hopefully you don't
> see that using RAW_TRACEPOINTs.

I haven't played with that since then tbh. I use BPF every now and then, but
rather simplistically. So if I encoutnered such a problem I wouldn't have
noticed.

> Fwiw, I don't think these little helper routines are all that hard to maintain.
> If something changes in those fields, which seems moderately unlikely at least
> for many of them, the compiler will complain.
> 
> And I agree with you about preferring to use the public headers for the module.
> I think we can work around it though, if needed.

+1

Cheers

--
Qais Yousef

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity
  2020-09-08 15:17                 ` Qais Yousef
@ 2020-09-08 16:17                   ` Dietmar Eggemann
  0 siblings, 0 replies; 26+ messages in thread
From: Dietmar Eggemann @ 2020-09-08 16:17 UTC (permalink / raw)
  To: Qais Yousef
  Cc: peterz, vincent.donnefort, mingo, vincent.guittot, linux-kernel,
	valentin.schneider, Phil Auld

On 08/09/2020 17:17, Qais Yousef wrote:
> On 09/08/20 13:17, Dietmar Eggemann wrote:
>> On 07/09/2020 16:51, Qais Yousef wrote:
>>> On 09/07/20 13:13, peterz@infradead.org wrote:
>>>> On Mon, Sep 07, 2020 at 11:48:45AM +0100, Qais Yousef wrote:
>>>>> IMHO the above is a hack. Out-of-tree modules should rely on public headers and
>>>>> exported functions only. What you propose means that people who want to use
>>>>> these tracepoints in meaningful way must have a prebuilt kernel handy. Which is
>>>>> maybe true for us who work in the embedded world. But users who run normal
>>>>> distro kernels (desktop/servers) will fail to build against
>>>>
>>>> But this isn't really aimed at regular users. We're aiming this at
>>>> developers (IIUC) so I dont really see this as a problem.
>>
>> This is what I thought as well. All these helpers can be coded directly
>> in these tracepoint-2-traceevent (tp-2-te) converters. As long as they
>> are build from within kernel/sched/ there is no issue with the export
>> via kernel/sched/sched.h. Otherwise this little trick would be necessary.
>> But since it is a tool for developers I guess we can assume that they
>> can build it from within kernel/sched/.
> 
> I think this will reduce the usefulness of these tracepoints. But if you really
> want to remove them, I am certainly not strongly attached to them and they were
> meant to be removable anyway. So fine by me :-)

I would like to see them go. Less stuff to maintain. And as we see with
the new cpu_capacity tp there are always more helper functions coming.

IMHO, the ability to build those modules via public headers is less
important since they are meant for developers.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [tip: sched/core] sched/debug: Add new tracepoint to track cpu_capacity
  2020-08-28  9:00 [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity vincent.donnefort
  2020-08-28 10:27 ` Qais Yousef
@ 2020-10-05  7:43 ` tip-bot2 for Vincent Donnefort
  1 sibling, 0 replies; 26+ messages in thread
From: tip-bot2 for Vincent Donnefort @ 2020-10-05  7:43 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Vincent Donnefort, Peter Zijlstra (Intel), x86, LKML

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     51cf18c90ca1b51d1cb4af3064e85fcf8610b5d2
Gitweb:        https://git.kernel.org/tip/51cf18c90ca1b51d1cb4af3064e85fcf8610b5d2
Author:        Vincent Donnefort <vincent.donnefort@arm.com>
AuthorDate:    Fri, 28 Aug 2020 10:00:49 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Sat, 03 Oct 2020 16:30:52 +02:00

sched/debug: Add new tracepoint to track cpu_capacity

rq->cpu_capacity is a key element in several scheduler parts, such as EAS
task placement and load balancing. Tracking this value enables testing
and/or debugging by a toolkit.

Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1598605249-72651-1-git-send-email-vincent.donnefort@arm.com
---
 include/linux/sched.h        |  1 +
 include/trace/events/sched.h |  4 ++++
 kernel/sched/core.c          |  1 +
 kernel/sched/fair.c          | 14 ++++++++++++++
 4 files changed, 20 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 2bf0af1..f516c18 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2044,6 +2044,7 @@ const struct sched_avg *sched_trace_rq_avg_dl(struct rq *rq);
 const struct sched_avg *sched_trace_rq_avg_irq(struct rq *rq);
 
 int sched_trace_rq_cpu(struct rq *rq);
+int sched_trace_rq_cpu_capacity(struct rq *rq);
 int sched_trace_rq_nr_running(struct rq *rq);
 
 const struct cpumask *sched_trace_rd_span(struct root_domain *rd);
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index fec25b9..c96a433 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -630,6 +630,10 @@ DECLARE_TRACE(pelt_se_tp,
 	TP_PROTO(struct sched_entity *se),
 	TP_ARGS(se));
 
+DECLARE_TRACE(sched_cpu_capacity_tp,
+	TP_PROTO(struct rq *rq),
+	TP_ARGS(rq));
+
 DECLARE_TRACE(sched_overutilized_tp,
 	TP_PROTO(struct root_domain *rd, bool overutilized),
 	TP_ARGS(rd, overutilized));
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index dd32d85..3dc415f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -36,6 +36,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(pelt_rt_tp);
 EXPORT_TRACEPOINT_SYMBOL_GPL(pelt_dl_tp);
 EXPORT_TRACEPOINT_SYMBOL_GPL(pelt_irq_tp);
 EXPORT_TRACEPOINT_SYMBOL_GPL(pelt_se_tp);
+EXPORT_TRACEPOINT_SYMBOL_GPL(sched_cpu_capacity_tp);
 EXPORT_TRACEPOINT_SYMBOL_GPL(sched_overutilized_tp);
 EXPORT_TRACEPOINT_SYMBOL_GPL(sched_util_est_cfs_tp);
 EXPORT_TRACEPOINT_SYMBOL_GPL(sched_util_est_se_tp);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cec6cf9..aa4c622 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8108,6 +8108,8 @@ static void update_cpu_capacity(struct sched_domain *sd, int cpu)
 		capacity = 1;
 
 	cpu_rq(cpu)->cpu_capacity = capacity;
+	trace_sched_cpu_capacity_tp(cpu_rq(cpu));
+
 	sdg->sgc->capacity = capacity;
 	sdg->sgc->min_capacity = capacity;
 	sdg->sgc->max_capacity = capacity;
@@ -11321,6 +11323,18 @@ int sched_trace_rq_cpu(struct rq *rq)
 }
 EXPORT_SYMBOL_GPL(sched_trace_rq_cpu);
 
+int sched_trace_rq_cpu_capacity(struct rq *rq)
+{
+	return rq ?
+#ifdef CONFIG_SMP
+		rq->cpu_capacity
+#else
+		SCHED_CAPACITY_SCALE
+#endif
+		: -1;
+}
+EXPORT_SYMBOL_GPL(sched_trace_rq_cpu_capacity);
+
 const struct cpumask *sched_trace_rd_span(struct root_domain *rd)
 {
 #ifdef CONFIG_SMP

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity
  2020-09-07 11:13           ` peterz
  2020-09-07 14:51             ` Qais Yousef
@ 2021-01-04 15:18             ` Qais Yousef
  1 sibling, 0 replies; 26+ messages in thread
From: Qais Yousef @ 2021-01-04 15:18 UTC (permalink / raw)
  To: peterz
  Cc: Dietmar Eggemann, vincent.donnefort, mingo, vincent.guittot,
	linux-kernel, valentin.schneider, Phil Auld,
	Arnaldo Carvalho de Melo

On 09/07/20 13:13, peterz@infradead.org wrote:
> On Mon, Sep 07, 2020 at 11:48:45AM +0100, Qais Yousef wrote:
> > IMHO the above is a hack. Out-of-tree modules should rely on public headers and
> > exported functions only. What you propose means that people who want to use
> > these tracepoints in meaningful way must have a prebuilt kernel handy. Which is
> > maybe true for us who work in the embedded world. But users who run normal
> > distro kernels (desktop/servers) will fail to build against
> 
> But this isn't really aimed at regular users. We're aiming this at
> developers (IIUC) so I dont really see this as a problem.
> 
> > FWIW, I did raise this concern with Peter in 2019 OSPM and he was okay with the
> > exports as it's still not a contract and they can disappear anytime we want.
> > Migrating to using BTF is the right way forward IMO. I don't think what we have
> > here is out-of-control yet. Though I agree they're annoying.
> 
> Right, we're hiding behind the explicit lack of ABI for modules.
> 
> Anyway, CTF/BTF/random other crap that isn't DWARFs should work fine to
> replace all this muck. Just no idea what the state of any of that is.

So I have updated my reference module to remove the dependency on those
functions:

	https://github.com/qais-yousef/tracepoints-helpers/tree/pelt-tps-v3-create-events/sched_tp

When building against a prebuilt kernel tree, I use pahole + vmlinux to generate
vmlinux.h with all the internal struct we depend on. The kernel must be
compiled with CONFIG_DEBUG_INFO for pahole to work.

When building against a running kernel (ie: exported headers only available),
we try to use /sys/kernel/btf/vmlinux to generate vmlinux.h.

One outstanding issue is that pahole + BTF don't generate alignment info so
while the module compiles but I get a crash when I load the module and enable
one of the tracepoints. bpftool generates alignment info with BTF, but it dumps
everything which leads to another set of problems.

Barring fixing the BTF issue which I'm talking with Arnaldo about, I think we
should be in good position to remove these exported functions soon.

In the module we have to maintain helper functions (see sched_tp_helpers.h) but
I think that's much easier than maintaining out of tree copy of the structs.

It also enabled me to add uclamp trace events which had dependency on internal
details that I wasn't keen on exporting in mainline.

Is this a good/better compromise?

Thanks

--
Qais Yousef

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity
  2020-09-08 13:19             ` Phil Auld
  2020-09-08 15:22               ` Qais Yousef
@ 2021-01-04 18:26               ` Qais Yousef
  2021-01-04 18:59                 ` Alexei Starovoitov
  2021-01-11 14:04                 ` Peter Zijlstra
  1 sibling, 2 replies; 26+ messages in thread
From: Qais Yousef @ 2021-01-04 18:26 UTC (permalink / raw)
  To: Phil Auld, Peter Zijlstra (Intel)
  Cc: Dietmar Eggemann, vincent.donnefort, mingo, vincent.guittot,
	linux-kernel, valentin.schneider

On 09/08/20 09:19, Phil Auld wrote:
> Hi Quais,
> 
> On Mon, Sep 07, 2020 at 12:02:24PM +0100 Qais Yousef wrote:
> > On 09/02/20 09:54, Phil Auld wrote:
> > > > 
> > > > I think this decoupling is not necessary. The natural place for those
> > > > scheduler trace_event based on trace_points extension files is
> > > > kernel/sched/ and here the internal sched.h can just be included.
> > > > 
> > > > If someone really wants to build this as an out-of-tree module there is
> > > > an easy way to make kernel/sched/sched.h visible.
> > > >
> > > 
> > > It's not so much that we really _want_ to do this in an external module.
> > > But we aren't adding more trace events and my (limited) knowledge of
> > > BPF let me to the conclusion that its raw tracepoint functionality
> > > requires full events.  I didn't see any other way to do it.
> > 
> > I did have a patch that allowed that. It might be worth trying to upstream it.
> > It just required a new macro which could be problematic.
> > 
> > https://github.com/qais-yousef/linux/commit/fb9fea29edb8af327e6b2bf3bc41469a8e66df8b
> > 
> > With the above I could attach using bpf::RAW_TRACEPOINT mechanism.
> >
> 
> Yeah, that could work. I meant there was no way to do it with what was there :)
> 
> In our initial attempts at using BPF to get at nr_running (which I was not
> involved in and don't have all the details...) there were issues being able to
> keep up and losing events.  That may have been an implementation issue, but
> using the module and trace-cmd doesn't have that problem. Hopefully you don't
> see that using RAW_TRACEPOINTs.

So I have a proper patch for that now, that actually turned out to be really
tiny once you untangle exactly what is missing.

Peter, bpf programs aren't considered ABIs AFAIK, do you have concerns about
that?

Thanks

--
Qais Yousef

-->8--

From cf81de8c7db03d62730939aa902579339e2fc859 Mon Sep 17 00:00:00 2001
From: Qais Yousef <qais.yousef@arm.com>
Date: Wed, 30 Dec 2020 22:20:34 +0000
Subject: [PATCH] trace: bpf: Allow bpf to attach to bare tracepoints

Some subsystems only have bare tracepoints (a tracepoint with no
associated trace event) to avoid the problem of trace events being an
ABI that can't be changed.

From bpf presepective, bare tracepoints are what it calls
RAW_TRACEPOINT().

Since bpf assumed there's 1:1 mapping, it relied on hooking to
DEFINE_EVENT() macro to create bpf mapping of the tracepoints. Since
bare tracepoints use DECLARE_TRACE() to create the tracepoint, bpf had
no knowledge about their existence.

By teaching bpf_probe.h to parse DECLARE_TRACE() in a similar fashion to
DEFINE_EVENT(), bpf can find and attach to the new raw tracepoints.

Enabling that comes with the contract that changes to raw tracepoints
don't constitute a regression if they break existing bpf programs.
We need the ability to continue to morph and modify these raw
tracepoints without worrying about any ABI.

Signed-off-by: Qais Yousef <qais.yousef@arm.com>
---
 include/trace/bpf_probe.h | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
index cd74bffed5c6..a23be89119aa 100644
--- a/include/trace/bpf_probe.h
+++ b/include/trace/bpf_probe.h
@@ -55,8 +55,7 @@
 /* tracepoints with more than 12 arguments will hit build error */
 #define CAST_TO_U64(...) CONCATENATE(__CAST, COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)
 
-#undef DECLARE_EVENT_CLASS
-#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print)	\
+#define __BPF_DECLARE_TRACE(call, proto, args)				\
 static notrace void							\
 __bpf_trace_##call(void *__data, proto)					\
 {									\
@@ -64,6 +63,10 @@ __bpf_trace_##call(void *__data, proto)					\
 	CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args));	\
 }
 
+#undef DECLARE_EVENT_CLASS
+#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print)	\
+	__BPF_DECLARE_TRACE(call, PARAMS(proto), PARAMS(args))
+
 /*
  * This part is compiled out, it is only here as a build time check
  * to make sure that if the tracepoint handling changes, the
@@ -111,6 +114,11 @@ __DEFINE_EVENT(template, call, PARAMS(proto), PARAMS(args), size)
 #define DEFINE_EVENT_PRINT(template, name, proto, args, print)	\
 	DEFINE_EVENT(template, name, PARAMS(proto), PARAMS(args))
 
+#undef DECLARE_TRACE
+#define DECLARE_TRACE(call, proto, args)				\
+	__BPF_DECLARE_TRACE(call, PARAMS(proto), PARAMS(args))		\
+	__DEFINE_EVENT(call, call, PARAMS(proto), PARAMS(args), 0)
+
 #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
 
 #undef DEFINE_EVENT_WRITABLE
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity
  2021-01-04 18:26               ` Qais Yousef
@ 2021-01-04 18:59                 ` Alexei Starovoitov
  2021-01-05 11:38                   ` Qais Yousef
  2021-01-11 14:04                 ` Peter Zijlstra
  1 sibling, 1 reply; 26+ messages in thread
From: Alexei Starovoitov @ 2021-01-04 18:59 UTC (permalink / raw)
  To: Qais Yousef, bpf, Daniel Borkmann, Andrii Nakryiko
  Cc: Phil Auld, Peter Zijlstra (Intel),
	Dietmar Eggemann, vincent.donnefort, Ingo Molnar,
	vincent.guittot, LKML, Valentin Schneider

On Mon, Jan 4, 2021 at 10:29 AM Qais Yousef <qais.yousef@arm.com> wrote:
>
> On 09/08/20 09:19, Phil Auld wrote:
> > Hi Quais,
> >
> > On Mon, Sep 07, 2020 at 12:02:24PM +0100 Qais Yousef wrote:
> > > On 09/02/20 09:54, Phil Auld wrote:
> > > > >
> > > > > I think this decoupling is not necessary. The natural place for those
> > > > > scheduler trace_event based on trace_points extension files is
> > > > > kernel/sched/ and here the internal sched.h can just be included.
> > > > >
> > > > > If someone really wants to build this as an out-of-tree module there is
> > > > > an easy way to make kernel/sched/sched.h visible.
> > > > >
> > > >
> > > > It's not so much that we really _want_ to do this in an external module.
> > > > But we aren't adding more trace events and my (limited) knowledge of
> > > > BPF let me to the conclusion that its raw tracepoint functionality
> > > > requires full events.  I didn't see any other way to do it.
> > >
> > > I did have a patch that allowed that. It might be worth trying to upstream it.
> > > It just required a new macro which could be problematic.
> > >
> > > https://github.com/qais-yousef/linux/commit/fb9fea29edb8af327e6b2bf3bc41469a8e66df8b
> > >
> > > With the above I could attach using bpf::RAW_TRACEPOINT mechanism.
> > >
> >
> > Yeah, that could work. I meant there was no way to do it with what was there :)
> >
> > In our initial attempts at using BPF to get at nr_running (which I was not
> > involved in and don't have all the details...) there were issues being able to
> > keep up and losing events.  That may have been an implementation issue, but
> > using the module and trace-cmd doesn't have that problem. Hopefully you don't
> > see that using RAW_TRACEPOINTs.
>
> So I have a proper patch for that now, that actually turned out to be really
> tiny once you untangle exactly what is missing.
>
> Peter, bpf programs aren't considered ABIs AFAIK, do you have concerns about
> that?
>
> Thanks
>
> --
> Qais Yousef
>
> -->8--
>
> From cf81de8c7db03d62730939aa902579339e2fc859 Mon Sep 17 00:00:00 2001
> From: Qais Yousef <qais.yousef@arm.com>
> Date: Wed, 30 Dec 2020 22:20:34 +0000
> Subject: [PATCH] trace: bpf: Allow bpf to attach to bare tracepoints
>
> Some subsystems only have bare tracepoints (a tracepoint with no
> associated trace event) to avoid the problem of trace events being an
> ABI that can't be changed.
>
> From bpf presepective, bare tracepoints are what it calls
> RAW_TRACEPOINT().
>
> Since bpf assumed there's 1:1 mapping, it relied on hooking to
> DEFINE_EVENT() macro to create bpf mapping of the tracepoints. Since
> bare tracepoints use DECLARE_TRACE() to create the tracepoint, bpf had
> no knowledge about their existence.
>
> By teaching bpf_probe.h to parse DECLARE_TRACE() in a similar fashion to
> DEFINE_EVENT(), bpf can find and attach to the new raw tracepoints.
>
> Enabling that comes with the contract that changes to raw tracepoints
> don't constitute a regression if they break existing bpf programs.
> We need the ability to continue to morph and modify these raw
> tracepoints without worrying about any ABI.
>
> Signed-off-by: Qais Yousef <qais.yousef@arm.com>
> ---
>  include/trace/bpf_probe.h | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
> index cd74bffed5c6..a23be89119aa 100644
> --- a/include/trace/bpf_probe.h
> +++ b/include/trace/bpf_probe.h
> @@ -55,8 +55,7 @@
>  /* tracepoints with more than 12 arguments will hit build error */
>  #define CAST_TO_U64(...) CONCATENATE(__CAST, COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)
>
> -#undef DECLARE_EVENT_CLASS
> -#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
> +#define __BPF_DECLARE_TRACE(call, proto, args)                         \
>  static notrace void                                                    \
>  __bpf_trace_##call(void *__data, proto)                                        \
>  {                                                                      \
> @@ -64,6 +63,10 @@ __bpf_trace_##call(void *__data, proto)                                      \
>         CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args));  \
>  }
>
> +#undef DECLARE_EVENT_CLASS
> +#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
> +       __BPF_DECLARE_TRACE(call, PARAMS(proto), PARAMS(args))
> +
>  /*
>   * This part is compiled out, it is only here as a build time check
>   * to make sure that if the tracepoint handling changes, the
> @@ -111,6 +114,11 @@ __DEFINE_EVENT(template, call, PARAMS(proto), PARAMS(args), size)
>  #define DEFINE_EVENT_PRINT(template, name, proto, args, print) \
>         DEFINE_EVENT(template, name, PARAMS(proto), PARAMS(args))
>
> +#undef DECLARE_TRACE
> +#define DECLARE_TRACE(call, proto, args)                               \
> +       __BPF_DECLARE_TRACE(call, PARAMS(proto), PARAMS(args))          \
> +       __DEFINE_EVENT(call, call, PARAMS(proto), PARAMS(args), 0)
> +
>  #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)

The patch looks fine to me.
Please add a few things:
- selftests to make sure it gets routinely tested with bpf CI.
- add a doc with contents from commit log.
The "Does bpf make things into an abi ?" question keeps coming back
over and over again.
Everytime we have the same answer that No, bpf cannot bake things into abi.
I think once it's spelled out somewhere in Documentation/ it would be easier to
repeat this message.
Also please tag future patches to bpf-next tree to make sure things
keep being tested.

Thanks

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity
  2021-01-04 18:59                 ` Alexei Starovoitov
@ 2021-01-05 11:38                   ` Qais Yousef
  2021-01-05 16:44                     ` Alexei Starovoitov
  0 siblings, 1 reply; 26+ messages in thread
From: Qais Yousef @ 2021-01-05 11:38 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Daniel Borkmann, Andrii Nakryiko, Phil Auld,
	Peter Zijlstra (Intel),
	Dietmar Eggemann, vincent.donnefort, Ingo Molnar,
	vincent.guittot, LKML, Valentin Schneider

On 01/04/21 10:59, Alexei Starovoitov wrote:
> > > > I did have a patch that allowed that. It might be worth trying to upstream it.
> > > > It just required a new macro which could be problematic.
> > > >
> > > > https://github.com/qais-yousef/linux/commit/fb9fea29edb8af327e6b2bf3bc41469a8e66df8b
> > > >
> > > > With the above I could attach using bpf::RAW_TRACEPOINT mechanism.
> > > >
> > >
> > > Yeah, that could work. I meant there was no way to do it with what was there :)
> > >
> > > In our initial attempts at using BPF to get at nr_running (which I was not
> > > involved in and don't have all the details...) there were issues being able to
> > > keep up and losing events.  That may have been an implementation issue, but
> > > using the module and trace-cmd doesn't have that problem. Hopefully you don't
> > > see that using RAW_TRACEPOINTs.
> >
> > So I have a proper patch for that now, that actually turned out to be really
> > tiny once you untangle exactly what is missing.
> >
> > Peter, bpf programs aren't considered ABIs AFAIK, do you have concerns about
> > that?
> >
> > Thanks
> >
> > --
> > Qais Yousef
> >
> > -->8--
> >
> > From cf81de8c7db03d62730939aa902579339e2fc859 Mon Sep 17 00:00:00 2001
> > From: Qais Yousef <qais.yousef@arm.com>
> > Date: Wed, 30 Dec 2020 22:20:34 +0000
> > Subject: [PATCH] trace: bpf: Allow bpf to attach to bare tracepoints
> >
> > Some subsystems only have bare tracepoints (a tracepoint with no
> > associated trace event) to avoid the problem of trace events being an
> > ABI that can't be changed.
> >
> > From bpf presepective, bare tracepoints are what it calls
> > RAW_TRACEPOINT().
> >
> > Since bpf assumed there's 1:1 mapping, it relied on hooking to
> > DEFINE_EVENT() macro to create bpf mapping of the tracepoints. Since
> > bare tracepoints use DECLARE_TRACE() to create the tracepoint, bpf had
> > no knowledge about their existence.
> >
> > By teaching bpf_probe.h to parse DECLARE_TRACE() in a similar fashion to
> > DEFINE_EVENT(), bpf can find and attach to the new raw tracepoints.
> >
> > Enabling that comes with the contract that changes to raw tracepoints
> > don't constitute a regression if they break existing bpf programs.
> > We need the ability to continue to morph and modify these raw
> > tracepoints without worrying about any ABI.
> >
> > Signed-off-by: Qais Yousef <qais.yousef@arm.com>
> > ---
> >  include/trace/bpf_probe.h | 12 ++++++++++--
> >  1 file changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
> > index cd74bffed5c6..a23be89119aa 100644
> > --- a/include/trace/bpf_probe.h
> > +++ b/include/trace/bpf_probe.h
> > @@ -55,8 +55,7 @@
> >  /* tracepoints with more than 12 arguments will hit build error */
> >  #define CAST_TO_U64(...) CONCATENATE(__CAST, COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)
> >
> > -#undef DECLARE_EVENT_CLASS
> > -#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
> > +#define __BPF_DECLARE_TRACE(call, proto, args)                         \
> >  static notrace void                                                    \
> >  __bpf_trace_##call(void *__data, proto)                                        \
> >  {                                                                      \
> > @@ -64,6 +63,10 @@ __bpf_trace_##call(void *__data, proto)                                      \
> >         CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args));  \
> >  }
> >
> > +#undef DECLARE_EVENT_CLASS
> > +#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
> > +       __BPF_DECLARE_TRACE(call, PARAMS(proto), PARAMS(args))
> > +
> >  /*
> >   * This part is compiled out, it is only here as a build time check
> >   * to make sure that if the tracepoint handling changes, the
> > @@ -111,6 +114,11 @@ __DEFINE_EVENT(template, call, PARAMS(proto), PARAMS(args), size)
> >  #define DEFINE_EVENT_PRINT(template, name, proto, args, print) \
> >         DEFINE_EVENT(template, name, PARAMS(proto), PARAMS(args))
> >
> > +#undef DECLARE_TRACE
> > +#define DECLARE_TRACE(call, proto, args)                               \
> > +       __BPF_DECLARE_TRACE(call, PARAMS(proto), PARAMS(args))          \
> > +       __DEFINE_EVENT(call, call, PARAMS(proto), PARAMS(args), 0)
> > +
> >  #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
> 
> The patch looks fine to me.
> Please add a few things:
> - selftests to make sure it gets routinely tested with bpf CI.

Any pointer to an example test I could base this on?

> - add a doc with contents from commit log.

You're referring to the ABI part of the changelog, right?

> The "Does bpf make things into an abi ?" question keeps coming back
> over and over again.
> Everytime we have the same answer that No, bpf cannot bake things into abi.
> I think once it's spelled out somewhere in Documentation/ it would be easier to
> repeat this message.

How about a new Documentation/bpf/ABI.rst? I can write something up initially
for us to discuss in detail when I post.

We have Documentation/ABI directory but I don't think it's suitable for what we
want.

> Also please tag future patches to bpf-next tree to make sure things
> keep being tested.

Sure. I understood this means adding a [PATCH bpf-next ...] in the subject
line.

Thanks

--
Qais Yousef

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity
  2021-01-05 11:38                   ` Qais Yousef
@ 2021-01-05 16:44                     ` Alexei Starovoitov
  2021-01-06 11:27                       ` Qais Yousef
  0 siblings, 1 reply; 26+ messages in thread
From: Alexei Starovoitov @ 2021-01-05 16:44 UTC (permalink / raw)
  To: Qais Yousef
  Cc: bpf, Daniel Borkmann, Andrii Nakryiko, Phil Auld,
	Peter Zijlstra (Intel),
	Dietmar Eggemann, vincent.donnefort, Ingo Molnar,
	vincent.guittot, LKML, Valentin Schneider

On Tue, Jan 5, 2021 at 3:39 AM Qais Yousef <qais.yousef@arm.com> wrote:
>
> On 01/04/21 10:59, Alexei Starovoitov wrote:
> > > > > I did have a patch that allowed that. It might be worth trying to upstream it.
> > > > > It just required a new macro which could be problematic.
> > > > >
> > > > > https://github.com/qais-yousef/linux/commit/fb9fea29edb8af327e6b2bf3bc41469a8e66df8b
> > > > >
> > > > > With the above I could attach using bpf::RAW_TRACEPOINT mechanism.
> > > > >
> > > >
> > > > Yeah, that could work. I meant there was no way to do it with what was there :)
> > > >
> > > > In our initial attempts at using BPF to get at nr_running (which I was not
> > > > involved in and don't have all the details...) there were issues being able to
> > > > keep up and losing events.  That may have been an implementation issue, but
> > > > using the module and trace-cmd doesn't have that problem. Hopefully you don't
> > > > see that using RAW_TRACEPOINTs.
> > >
> > > So I have a proper patch for that now, that actually turned out to be really
> > > tiny once you untangle exactly what is missing.
> > >
> > > Peter, bpf programs aren't considered ABIs AFAIK, do you have concerns about
> > > that?
> > >
> > > Thanks
> > >
> > > --
> > > Qais Yousef
> > >
> > > -->8--
> > >
> > > From cf81de8c7db03d62730939aa902579339e2fc859 Mon Sep 17 00:00:00 2001
> > > From: Qais Yousef <qais.yousef@arm.com>
> > > Date: Wed, 30 Dec 2020 22:20:34 +0000
> > > Subject: [PATCH] trace: bpf: Allow bpf to attach to bare tracepoints
> > >
> > > Some subsystems only have bare tracepoints (a tracepoint with no
> > > associated trace event) to avoid the problem of trace events being an
> > > ABI that can't be changed.
> > >
> > > From bpf presepective, bare tracepoints are what it calls
> > > RAW_TRACEPOINT().
> > >
> > > Since bpf assumed there's 1:1 mapping, it relied on hooking to
> > > DEFINE_EVENT() macro to create bpf mapping of the tracepoints. Since
> > > bare tracepoints use DECLARE_TRACE() to create the tracepoint, bpf had
> > > no knowledge about their existence.
> > >
> > > By teaching bpf_probe.h to parse DECLARE_TRACE() in a similar fashion to
> > > DEFINE_EVENT(), bpf can find and attach to the new raw tracepoints.
> > >
> > > Enabling that comes with the contract that changes to raw tracepoints
> > > don't constitute a regression if they break existing bpf programs.
> > > We need the ability to continue to morph and modify these raw
> > > tracepoints without worrying about any ABI.
> > >
> > > Signed-off-by: Qais Yousef <qais.yousef@arm.com>
> > > ---
> > >  include/trace/bpf_probe.h | 12 ++++++++++--
> > >  1 file changed, 10 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/trace/bpf_probe.h b/include/trace/bpf_probe.h
> > > index cd74bffed5c6..a23be89119aa 100644
> > > --- a/include/trace/bpf_probe.h
> > > +++ b/include/trace/bpf_probe.h
> > > @@ -55,8 +55,7 @@
> > >  /* tracepoints with more than 12 arguments will hit build error */
> > >  #define CAST_TO_U64(...) CONCATENATE(__CAST, COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)
> > >
> > > -#undef DECLARE_EVENT_CLASS
> > > -#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
> > > +#define __BPF_DECLARE_TRACE(call, proto, args)                         \
> > >  static notrace void                                                    \
> > >  __bpf_trace_##call(void *__data, proto)                                        \
> > >  {                                                                      \
> > > @@ -64,6 +63,10 @@ __bpf_trace_##call(void *__data, proto)                                      \
> > >         CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(prog, CAST_TO_U64(args));  \
> > >  }
> > >
> > > +#undef DECLARE_EVENT_CLASS
> > > +#define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
> > > +       __BPF_DECLARE_TRACE(call, PARAMS(proto), PARAMS(args))
> > > +
> > >  /*
> > >   * This part is compiled out, it is only here as a build time check
> > >   * to make sure that if the tracepoint handling changes, the
> > > @@ -111,6 +114,11 @@ __DEFINE_EVENT(template, call, PARAMS(proto), PARAMS(args), size)
> > >  #define DEFINE_EVENT_PRINT(template, name, proto, args, print) \
> > >         DEFINE_EVENT(template, name, PARAMS(proto), PARAMS(args))
> > >
> > > +#undef DECLARE_TRACE
> > > +#define DECLARE_TRACE(call, proto, args)                               \
> > > +       __BPF_DECLARE_TRACE(call, PARAMS(proto), PARAMS(args))          \
> > > +       __DEFINE_EVENT(call, call, PARAMS(proto), PARAMS(args), 0)
> > > +
> > >  #include TRACE_INCLUDE(TRACE_INCLUDE_FILE)
> >
> > The patch looks fine to me.
> > Please add a few things:
> > - selftests to make sure it gets routinely tested with bpf CI.
>
> Any pointer to an example test I could base this on?

selftests/bpf/

> > - add a doc with contents from commit log.
>
> You're referring to the ABI part of the changelog, right?
>
> > The "Does bpf make things into an abi ?" question keeps coming back
> > over and over again.
> > Everytime we have the same answer that No, bpf cannot bake things into abi.
> > I think once it's spelled out somewhere in Documentation/ it would be easier to
> > repeat this message.
>
> How about a new Documentation/bpf/ABI.rst? I can write something up initially
> for us to discuss in detail when I post.

There is Documentation/bpf/bpf_design_QA.rst
and we already have this text in there that was added back in 2017:

Q: Does BPF have a stable ABI?
------------------------------
A: YES. BPF instructions, arguments to BPF programs, set of helper
functions and their arguments, recognized return codes are all part
of ABI. However there is one specific exception to tracing programs
which are using helpers like bpf_probe_read() to walk kernel internal
data structures and compile with kernel internal headers. Both of these
kernel internals are subject to change and can break with newer kernels
such that the program needs to be adapted accordingly.

I'm suggesting to add an additional section to this Q/A doc to include
more or less
the same text you had in the commit log.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity
  2021-01-05 16:44                     ` Alexei Starovoitov
@ 2021-01-06 11:27                       ` Qais Yousef
  2021-01-06 23:42                         ` Andrii Nakryiko
  0 siblings, 1 reply; 26+ messages in thread
From: Qais Yousef @ 2021-01-06 11:27 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Daniel Borkmann, Andrii Nakryiko, Phil Auld,
	Peter Zijlstra (Intel),
	Dietmar Eggemann, vincent.donnefort, Ingo Molnar,
	vincent.guittot, LKML, Valentin Schneider

On 01/05/21 08:44, Alexei Starovoitov wrote:
> > Any pointer to an example test I could base this on?
> 
> selftests/bpf/

I was hoping for something more elaborate. I thought there's something already
there that do some verification for raw tracepoint that I could either extend
or replicate. Otherwise this could end up being a time sink for me and I'm not
keen on jumping down this rabbit hole.

> > > - add a doc with contents from commit log.
> >
> > You're referring to the ABI part of the changelog, right?
> >
> > > The "Does bpf make things into an abi ?" question keeps coming back
> > > over and over again.
> > > Everytime we have the same answer that No, bpf cannot bake things into abi.
> > > I think once it's spelled out somewhere in Documentation/ it would be easier to
> > > repeat this message.
> >
> > How about a new Documentation/bpf/ABI.rst? I can write something up initially
> > for us to discuss in detail when I post.
> 
> There is Documentation/bpf/bpf_design_QA.rst
> and we already have this text in there that was added back in 2017:
> 
> Q: Does BPF have a stable ABI?
> ------------------------------
> A: YES. BPF instructions, arguments to BPF programs, set of helper
> functions and their arguments, recognized return codes are all part
> of ABI. However there is one specific exception to tracing programs
> which are using helpers like bpf_probe_read() to walk kernel internal
> data structures and compile with kernel internal headers. Both of these
> kernel internals are subject to change and can break with newer kernels
> such that the program needs to be adapted accordingly.
> 
> I'm suggesting to add an additional section to this Q/A doc to include
> more or less
> the same text you had in the commit log.

Works for me.

Thanks

--
Qais Yousef

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity
  2021-01-06 11:27                       ` Qais Yousef
@ 2021-01-06 23:42                         ` Andrii Nakryiko
  2021-01-07 11:23                           ` Qais Yousef
  0 siblings, 1 reply; 26+ messages in thread
From: Andrii Nakryiko @ 2021-01-06 23:42 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Alexei Starovoitov, bpf, Daniel Borkmann, Andrii Nakryiko,
	Phil Auld, Peter Zijlstra (Intel),
	Dietmar Eggemann, vincent.donnefort, Ingo Molnar,
	vincent.guittot, LKML, Valentin Schneider

On Wed, Jan 6, 2021 at 3:27 AM Qais Yousef <qais.yousef@arm.com> wrote:
>
> On 01/05/21 08:44, Alexei Starovoitov wrote:
> > > Any pointer to an example test I could base this on?
> >
> > selftests/bpf/
>
> I was hoping for something more elaborate. I thought there's something already
> there that do some verification for raw tracepoint that I could either extend
> or replicate. Otherwise this could end up being a time sink for me and I'm not
> keen on jumping down this rabbit hole.

One way would be to add either another custom tracepoint definition to
a test module or modify the existing one to be a bare tracepoint. See
links below.

If it's easy to trigger those tracepoints from user-space on demand,
writing a similar (to module_attach) selftest for in-kernel tracepoint
is trivial.

  [0] https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod-events.h
  [1] https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/progs/test_module_attach.c#L12-L18
  [2] https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/prog_tests/module_attach.c

>
> > > > - add a doc with contents from commit log.
> > >
> > > You're referring to the ABI part of the changelog, right?
> > >
> > > > The "Does bpf make things into an abi ?" question keeps coming back
> > > > over and over again.
> > > > Everytime we have the same answer that No, bpf cannot bake things into abi.
> > > > I think once it's spelled out somewhere in Documentation/ it would be easier to
> > > > repeat this message.
> > >
> > > How about a new Documentation/bpf/ABI.rst? I can write something up initially
> > > for us to discuss in detail when I post.
> >
> > There is Documentation/bpf/bpf_design_QA.rst
> > and we already have this text in there that was added back in 2017:
> >
> > Q: Does BPF have a stable ABI?
> > ------------------------------
> > A: YES. BPF instructions, arguments to BPF programs, set of helper
> > functions and their arguments, recognized return codes are all part
> > of ABI. However there is one specific exception to tracing programs
> > which are using helpers like bpf_probe_read() to walk kernel internal
> > data structures and compile with kernel internal headers. Both of these
> > kernel internals are subject to change and can break with newer kernels
> > such that the program needs to be adapted accordingly.
> >
> > I'm suggesting to add an additional section to this Q/A doc to include
> > more or less
> > the same text you had in the commit log.
>
> Works for me.
>
> Thanks
>
> --
> Qais Yousef

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity
  2021-01-06 23:42                         ` Andrii Nakryiko
@ 2021-01-07 11:23                           ` Qais Yousef
  0 siblings, 0 replies; 26+ messages in thread
From: Qais Yousef @ 2021-01-07 11:23 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, bpf, Daniel Borkmann, Andrii Nakryiko,
	Phil Auld, Peter Zijlstra (Intel),
	Dietmar Eggemann, vincent.donnefort, Ingo Molnar,
	vincent.guittot, LKML, Valentin Schneider

On 01/06/21 15:42, Andrii Nakryiko wrote:
> On Wed, Jan 6, 2021 at 3:27 AM Qais Yousef <qais.yousef@arm.com> wrote:
> >
> > On 01/05/21 08:44, Alexei Starovoitov wrote:
> > > > Any pointer to an example test I could base this on?
> > >
> > > selftests/bpf/
> >
> > I was hoping for something more elaborate. I thought there's something already
> > there that do some verification for raw tracepoint that I could either extend
> > or replicate. Otherwise this could end up being a time sink for me and I'm not
> > keen on jumping down this rabbit hole.
> 
> One way would be to add either another custom tracepoint definition to
> a test module or modify the existing one to be a bare tracepoint. See
> links below.
> 
> If it's easy to trigger those tracepoints from user-space on demand,
> writing a similar (to module_attach) selftest for in-kernel tracepoint
> is trivial.
> 
>   [0] https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/bpf_testmod/bpf_testmod-events.h
>   [1] https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/progs/test_module_attach.c#L12-L18
>   [2] https://github.com/torvalds/linux/blob/master/tools/testing/selftests/bpf/prog_tests/module_attach.c

Thanks a lot Andrii. That will make it much easier to figure out something.

Cheers

--
Qais Yousef

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity
  2021-01-04 18:26               ` Qais Yousef
  2021-01-04 18:59                 ` Alexei Starovoitov
@ 2021-01-11 14:04                 ` Peter Zijlstra
  2021-01-11 14:08                   ` Qais Yousef
  1 sibling, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2021-01-11 14:04 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Phil Auld, Dietmar Eggemann, vincent.donnefort, mingo,
	vincent.guittot, linux-kernel, valentin.schneider

On Mon, Jan 04, 2021 at 06:26:42PM +0000, Qais Yousef wrote:

> So I have a proper patch for that now, that actually turned out to be really
> tiny once you untangle exactly what is missing.
> 
> Peter, bpf programs aren't considered ABIs AFAIK, do you have concerns about
> that?

In order to use these you need to rely on BTF to get anything useful
done right? And anything that relies on BTF cannot be ABI.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity
  2021-01-11 14:04                 ` Peter Zijlstra
@ 2021-01-11 14:08                   ` Qais Yousef
  0 siblings, 0 replies; 26+ messages in thread
From: Qais Yousef @ 2021-01-11 14:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Phil Auld, Dietmar Eggemann, vincent.donnefort, mingo,
	vincent.guittot, linux-kernel, valentin.schneider

On 01/11/21 15:04, Peter Zijlstra wrote:
> On Mon, Jan 04, 2021 at 06:26:42PM +0000, Qais Yousef wrote:
> 
> > So I have a proper patch for that now, that actually turned out to be really
> > tiny once you untangle exactly what is missing.
> > 
> > Peter, bpf programs aren't considered ABIs AFAIK, do you have concerns about
> > that?
> 
> In order to use these you need to rely on BTF to get anything useful
> done right? And anything that relies on BTF cannot be ABI.

Yes. To decode struct rq for instance one has to either hardcode it in their
program or use BTF to get the definition dynamically.

The worry is if we modify the function signature of the tracepoint. Alexei did
confirm this can't be an ABI and I'm adding additional documentation to make
this crystal clear.

Thanks

--
Qais Yousef

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2021-01-11 14:09 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-28  9:00 [PATCH v2] sched/debug: Add new tracepoint to track cpu_capacity vincent.donnefort
2020-08-28 10:27 ` Qais Yousef
2020-08-28 17:10   ` Dietmar Eggemann
2020-08-28 17:26     ` Qais Yousef
2020-09-02 10:44       ` Dietmar Eggemann
2020-09-02 13:54         ` Phil Auld
2020-09-07 11:02           ` Qais Yousef
2020-09-08 13:19             ` Phil Auld
2020-09-08 15:22               ` Qais Yousef
2021-01-04 18:26               ` Qais Yousef
2021-01-04 18:59                 ` Alexei Starovoitov
2021-01-05 11:38                   ` Qais Yousef
2021-01-05 16:44                     ` Alexei Starovoitov
2021-01-06 11:27                       ` Qais Yousef
2021-01-06 23:42                         ` Andrii Nakryiko
2021-01-07 11:23                           ` Qais Yousef
2021-01-11 14:04                 ` Peter Zijlstra
2021-01-11 14:08                   ` Qais Yousef
2020-09-07 10:48         ` Qais Yousef
2020-09-07 11:13           ` peterz
2020-09-07 14:51             ` Qais Yousef
2020-09-08 11:17               ` Dietmar Eggemann
2020-09-08 15:17                 ` Qais Yousef
2020-09-08 16:17                   ` Dietmar Eggemann
2021-01-04 15:18             ` Qais Yousef
2020-10-05  7:43 ` [tip: sched/core] " tip-bot2 for Vincent Donnefort

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).