[Ksummit-discuss] [topic proposal] tracepoints and ABI stability warranties

* [Ksummit-discuss] [topic proposal] tracepoints and ABI stability warranties
@ 2016-09-06 18:51 Al Viro
  2016-09-06 19:22 ` Steven Rostedt
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Al Viro @ 2016-09-06 18:51 UTC (permalink / raw)
  To: ksummit-discuss

	Right now there is no mechanism for saying "if this tracepoint
breaks, it's Not Our Problem(tm)".  All of them are parts of userland
ABI, potentially casting in stone all kinds of kernel internals.  E.g.
just today a patch series adding tracepoints to kobject primitives
had been posted; if _that_ becomes a part of stable ABI, we get the
lifetime rules for anything with an embedded kobject exposed to userland
and potentially impossible to change - all it takes is a single piece of
software making non-trivial use of those.

	Exposing kernel internals to debugging scripts et al. is fine,
but only as long as we are not promising to keep those scripts working.
Otherwise we end up promising that arseloads of code we can't even see,
sticking its fingers very deep into the kernel internals, will be kept
functional through the kernel changes.  This is absolutely insane, of
course, and I doubt that anybody would be willing to make such promises,
no matter how much value would that add.

	The tracepoints are an undifferentiated mass, with no way for userland
to tell whether it's using something stable or an equivalent of someone's
debugging printks with no promise of stability.  And folks, there *are*
people out there with commit privileges on assorted userland projects who'd
made it perfectly clear that _any_ kernel interface they find is fair game -
as in "if you didn't want it used, why did you put it in the kernel?".
No matter how much I dislike that bunch for other things, I can't blame them
for being less than upfront regarding that policy.  It had been expressed
very clearly and we'd better take the warning seriously.

	I think this is something that needs to be discussed at KS; IMO we
need at least some way to express the degree of stability promises made
wrt individual tracepoints and some mechanisms for preventing silent creep
towards full stability; something along the lines of "unstable tracepoint $FOO
used by $PROGRAM, kernel tainted", at least.
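
To make the proposal above concrete, here is a minimal user-space sketch in C of what a per-tracepoint stability marker plus a taint-style warning could look like. Everything in it is an assumption made for illustration: the `tp_stability` enum, the `tp_desc` structure, the `kernel_tainted` flag and the `tp_attach()` helper are hypothetical names and do not correspond to any existing kernel interface.

```c
#include <stdio.h>

/* Hypothetical stability levels; nothing like this exists in the tracing code. */
enum tp_stability { TP_STABLE, TP_UNSTABLE };

/* Hypothetical descriptor pairing a tracepoint name with its promise. */
struct tp_desc {
    const char *name;
    enum tp_stability stability;
};

static int kernel_tainted;   /* stand-in for a taint flag */

/*
 * Model of "$PROGRAM attaches to tracepoint $FOO".  Attaching to an
 * unstable tracepoint sets the taint flag and formats the warning into
 * msg; attaching to a stable one leaves msg empty.
 * Returns 1 if this call tainted, 0 otherwise.
 */
static int tp_attach(const struct tp_desc *tp, const char *program,
                     char *msg, size_t msg_len)
{
    if (msg_len)
        msg[0] = '\0';
    if (tp->stability == TP_STABLE)
        return 0;

    kernel_tainted = 1;
    snprintf(msg, msg_len, "unstable tracepoint %s used by %s, kernel tainted",
             tp->name, program);
    return 1;
}
```

The point of the sketch is only that the stability level travels with the tracepoint and is visible at the moment a program attaches, so the use is recorded before anything comes to depend on it silently.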


* Re: [Ksummit-discuss] [topic proposal] tracepoints and ABI stability warranties
  2016-09-06 18:51 [Ksummit-discuss] [topic proposal] tracepoints and ABI stability warranties Al Viro
@ 2016-09-06 19:22 ` Steven Rostedt
  2016-09-06 21:36   ` Alexey Dobriyan
  2016-09-06 21:05 ` Shuah Khan
  2016-09-07 23:17 ` Masami Hiramatsu
  2 siblings, 1 reply; 21+ messages in thread
From: Steven Rostedt @ 2016-09-06 19:22 UTC (permalink / raw)
  To: Al Viro; +Cc: ksummit-discuss

On Tue, 6 Sep 2016 19:51:43 +0100
Al Viro <viro@ZenIV.linux.org.uk> wrote:

> 	I think this is something that needs to be discussed at KS; IMO we
> need at least some way to express the degree of stability promises made
> wrt individual tracepoints and some mechanisms for preventing silent creep
> towards full stability; something along the lines of "unstable tracepoint $FOO
> used by $PROGRAM, kernel tainted", at least.

What about having a set of tracepoints that are only enabled if one
adds to the kernel command line "this-kernel-is-broken" and a big
printk banner saying something like:

*****************************************************************
*****************************************************************
**        WARNING WARNING WARNING WARNING WARNING WARNING      **
**                                                             **
**        This kernel is BROKEN! Do not use in production      **
*****************************************************************
*****************************************************************

Then the tracepoints in vfs will appear.

This has worked so far for my trace_printk() usage.

-- Steve


* Re: [Ksummit-discuss] [topic proposal] tracepoints and ABI stability warranties
  2016-09-06 18:51 [Ksummit-discuss] [topic proposal] tracepoints and ABI stability warranties Al Viro
  2016-09-06 19:22 ` Steven Rostedt
@ 2016-09-06 21:05 ` Shuah Khan
  2016-09-08  3:13   ` Masami Hiramatsu
  2016-09-07 23:17 ` Masami Hiramatsu
  2 siblings, 1 reply; 21+ messages in thread
From: Shuah Khan @ 2016-09-06 21:05 UTC (permalink / raw)
  To: Al Viro; +Cc: ksummit-discuss

On Tue, Sep 6, 2016 at 12:51 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>         Right now there is no mechanism for saying "if this tracepoint
> breaks, it's Not Our Problem(tm)".  All of them are parts of userland
> ABI, potentially casting in stone all kinds of kernel internals.  E.g.
> just today a patch series adding tracepoints to kobject primitives
> had been posted; if _that_ becomes a part of stable ABI, we get the
> lifetime rules for anything with an embedded kobject exposed to userland
> and potentially impossible to change - all it takes is a single piece of
> software making non-trivial use of those.

I honestly didn't think that my patch series would result in a special
KS topic :)
However, I think it is a good idea to discuss, as a general topic, what
kind of kernel information should or should not be made visible via
tracepoints or other debug mechanisms.

We do support a wide range of tracepoints and events in various subsystems
(skb.h, pagemap.h, and so on). Maybe it would be helpful to agree
on some sort of guidelines for exposure.

I guess I never considered tracepoints to be userspace API, any more than
debug messages. They are part of debug and only visible to root.

In my mind they are mainly debug information that would only make sense to
kernel developers. That is why I didn't think about needing to keep the
tracepoints identical. That said, it is a good idea for us to discuss this
at the KS.

thanks,
-- Shuah

>
>         Exposing kernel internals to debugging scripts et al. is fine,
> but only as long as we are not promising to keep those scripts working.
> Otherwise we end up promising that arseloads of code we can't even see,
> sticking its fingers very deep into the kernel internals, will be kept
> functional through the kernel changes.  This is absolutely insane, of
> course, and I doubt that anybody would be willing to make such promises,
> no matter how much value would that add.
>
>         The tracepoints are an undifferentiated mass, with no way for userland
> to tell whether it's using something stable or an equivalent of someone's
> debugging printks with no promise of stability.  And folks, there *are*
> people out there with commit privileges on assorted userland projects who'd
> made it perfectly clear that _any_ kernel interface they find is fair game -
> as in "if you didn't want it used, why did you put it in the kernel?".
> No matter how much I dislike that bunch for other things, I can't blame them
> for being less than upfront regarding that policy.  It had been expressed
> very clearly and we'd better take the warning seriously.
>
>         I think this is something that needs to be discussed at KS; IMO we
> need at least some way to express the degree of stability promises made
> wrt individual tracepoints and some mechanisms for preventing silent creep
> towards full stability; something along the lines of "unstable tracepoint $FOO
> used by $PROGRAM, kernel tainted", at least.


* Re: [Ksummit-discuss] [topic proposal] tracepoints and ABI stability warranties
  2016-09-06 19:22 ` Steven Rostedt
@ 2016-09-06 21:36   ` Alexey Dobriyan
  2016-09-06 21:53     ` Steven Rostedt
  2016-09-06 22:02     ` Alexey Dobriyan
  0 siblings, 2 replies; 21+ messages in thread
From: Alexey Dobriyan @ 2016-09-06 21:36 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: ksummit-discuss

On Tue, Sep 06, 2016 at 03:22:43PM -0400, Steven Rostedt wrote:
> On Tue, 6 Sep 2016 19:51:43 +0100
> Al Viro <viro@ZenIV.linux.org.uk> wrote:
> 
> > 	I think this is something that needs to be discussed at KS; IMO we
> > need at least some way to express the degree of stability promises made
> > wrt individual tracepoints and some mechanisms for preventing silent creep
> > towards full stability; something along the lines of "unstable tracepoint $FOO
> > used by $PROGRAM, kernel tainted", at least.
> 
> What about having a set of tracepoints that are only enabled if one
> adds to the kernel command line "this-kernel-is-broken" and a big
> printk banner saying something like:

The solution was out there for quite some time :-)

	Scope of Compatibility
	Packages in Red Hat Enterprise Linux are classified under one of
	the following four compatibility levels:

	[ ] Compatibility level 1: APIs and ABIs are stable across three
	    major releases;

	[ ] Compatibility level 2: APIs and ABIs are stable within one major
	    release.

	[ ] Compatibility level 3: Reserved for future use.

	[X] Compatibility level 4: No compatibility is provided.

The winning move is to not play and let distros sort it out.

P.S.: technically, every kernel release almost certainly breaks the crash(1)
program, a program many people on this list should be familiar with.
It is unclear why the rules should be different for tracepoints.


* Re: [Ksummit-discuss] [topic proposal] tracepoints and ABI stability warranties
  2016-09-06 21:36   ` Alexey Dobriyan
@ 2016-09-06 21:53     ` Steven Rostedt
  2016-09-06 22:41       ` Alexey Dobriyan
  2016-09-06 22:02     ` Alexey Dobriyan
  1 sibling, 1 reply; 21+ messages in thread
From: Steven Rostedt @ 2016-09-06 21:53 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: ksummit-discuss

On Wed, 7 Sep 2016 00:36:44 +0300
Alexey Dobriyan <adobriyan@gmail.com> wrote:

> The solution was out there for quite some time :-)
> 
> 	Scope of Compatibility
> 	Packages in Red Hat Enterprise Linux are classified under one of
> 	the following four compatibility levels:
> 
> 	[ ] Compatibility level 1: APIs and ABIs are stable across three
> 	    major releases;
> 
> 	[ ] Compatibility level 2: APIs and ABIs are stable within one major
> 	    release.
> 
> 	[ ] Compatibility level 3: Reserved for future use.
> 
> 	[X] Compatibility level 4: No compatibility is provided.
> 
> The winning move is to not play and let distros sort it out.

Except that Linus has a hard rule for this. See the reason behind his
infamous rant:

   https://lkml.org/lkml/2012/12/23/75

Specifically:

 "If a change results in user programs breaking, it's a bug in the
  kernel. We never EVER blame the user programs."

> 
> P.S.: technically, every kernel release almost certainly breaks the crash(1)
> program, a program many people on this list should be familiar with.
> It is unclear why the rules should be different for tracepoints.

Well, crash(1) isn't a userspace tool that runs on top of Linux. Well,
it does, but only the input from a core dump of a Linux kernel breaks
it. It will always run fine on all Linux versions as long as it uses
the same input.

Tracepoints are runtime visible. This isn't a postmortem analysis. We
already had an issue when powertop read the tracepoints directly
without using the tracepoint format file parsing, and we ended up
having 4 bytes of useless data in *every* tracepoint. Luckily, that got
fixed because this hard coding broke when running powertop from a 32
bit userspace on top of a 64 bit kernel. I worked to get powertop to
use the tracepoint format parsing that perf and trace-cmd use.
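
As an illustration of the difference between hard-coding offsets and parsing the format description, here is a minimal sketch in C. It assumes the event's format text has already been read into a string and that each field is described by a line of the form `field:<type> <name>;  offset:<n>;  size:<n>;`. The helper name `find_field()` is hypothetical and is not how perf, trace-cmd or powertop actually implement their parsing.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: look up one field by name in the text of a
 * tracepoint format description and report its offset and size.
 * Returns 0 on success, -1 if the field is not described.
 */
static int find_field(const char *format_text, const char *name,
                      int *offset, int *size)
{
    const char *line = format_text;
    size_t name_len = strlen(name);

    while (line && *line) {
        const char *eol = strchr(line, '\n');
        const char *field = strstr(line, "field:");

        /* Only consider a "field:" that sits on the current line. */
        if (field && (!eol || field < eol)) {
            const char *semi = strchr(field, ';');

            if (semi && (!eol || semi < eol)) {
                /* The name is the last identifier before ';',
                 * ignoring an array suffix such as "[16]". */
                const char *bracket = memchr(field, '[', (size_t)(semi - field));
                const char *end = bracket ? bracket : semi;
                const char *start = end;

                while (start > field &&
                       (isalnum((unsigned char)start[-1]) || start[-1] == '_'))
                    start--;

                if ((size_t)(end - start) == name_len &&
                    strncmp(start, name, name_len) == 0) {
                    /* Whitespace in the scanf format skips tabs and spaces. */
                    if (sscanf(semi, "; offset:%d; size:%d", offset, size) == 2)
                        return 0;
                    return -1;
                }
            }
        }
        line = eol ? eol + 1 : NULL;
    }
    return -1;
}
```

A tool written this way asks for a field such as `prev_pid` by name and takes whatever offset and size the running kernel reports, so padding or reordering in the record does not break it. A tool that compiles in fixed offsets breaks as soon as the layout shifts.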

But if something depends on event fields, we need to maintain that. For
now, we have fake fields in the sched_wakeup tracepoint, because of
this.

It's a balance that we need to figure out. One is that tracepoints are
really helpful for in the field debugging to see what is happening. The
other is that they are becoming an ABI and if a useful tool (like
powertop) hooks into them, whatever they hooked into becomes set in
stone.

This is a real issue, and has been brought up in past kernel summits
without a resolution.

-- Steve


* Re: [Ksummit-discuss] [topic proposal] tracepoints and ABI stability warranties
  2016-09-06 21:36   ` Alexey Dobriyan
  2016-09-06 21:53     ` Steven Rostedt
@ 2016-09-06 22:02     ` Alexey Dobriyan
  2016-09-06 22:15       ` Steven Rostedt
  1 sibling, 1 reply; 21+ messages in thread
From: Alexey Dobriyan @ 2016-09-06 22:02 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: ksummit-discuss

On Wed, Sep 07, 2016 at 12:36:44AM +0300, Alexey Dobriyan wrote:
> P.S.: technically, every kernel release almost certainly breaks the crash(1)
> program, a program many people on this list should be familiar with.
> It is unclear why the rules should be different for tracepoints.

Concrete example:

	TRACE_EVENT(sched_switch,
        TP_STRUCT__entry(
                __array(        char,   prev_comm,      TASK_COMM_LEN   )
                __field(        pid_t,  prev_pid                        )
                __field(        int,    prev_prio                       )
                __field(        long,   prev_state
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

struct task_struct::state is long, but that looks like a historical
artifact; it could be just an int. In this case it is easy to widen the value and
preserve compatibility.
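
A minimal sketch of that "easy case", in user-space C: the internal value is narrowed to int, while the record that tools see keeps its long field, so field offsets and sizes do not move. The structure and function names here (`sched_switch_record`, `fill_record()`) are hypothetical stand-ins, not the kernel's generated types.

```c
#include <string.h>

/* Hypothetical stand-in for the record layout that tools depend on. */
struct sched_switch_record {
    char prev_comm[16];
    int  prev_pid;
    int  prev_prio;
    long prev_state;    /* width unchanged, so the layout stays put */
};

/* The caller now holds the state as an int (the assumed internal change). */
static void fill_record(struct sched_switch_record *rec, const char *comm,
                        int pid, int prio, int state)
{
    memset(rec, 0, sizeof(*rec));
    strncpy(rec->prev_comm, comm, sizeof(rec->prev_comm) - 1);
    rec->prev_pid = pid;
    rec->prev_prio = prio;
    /* Implicit int -> long conversion preserves the value, sign included. */
    rec->prev_state = state;
}
```

The conversion in the last assignment is the whole compatibility shim, which is why this kind of change is cheap to keep compatible.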


* Re: [Ksummit-discuss] [topic proposal] tracepoints and ABI stability warranties
  2016-09-06 22:02     ` Alexey Dobriyan
@ 2016-09-06 22:15       ` Steven Rostedt
  0 siblings, 0 replies; 21+ messages in thread
From: Steven Rostedt @ 2016-09-06 22:15 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: ksummit-discuss

On Wed, 7 Sep 2016 01:02:13 +0300
Alexey Dobriyan <adobriyan@gmail.com> wrote:

> On Wed, Sep 07, 2016 at 12:36:44AM +0300, Alexey Dobriyan wrote:
> > P.S.: technically, every kernel release almost certainly breaks the crash(1)
> > program, a program many people on this list should be familiar with.
> > It is unclear why the rules should be different for tracepoints.
> 
> Concrete example:
> 
> 	TRACE_EVENT(sched_switch,
>         TP_STRUCT__entry(
>                 __array(        char,   prev_comm,      TASK_COMM_LEN   )
>                 __field(        pid_t,  prev_pid                        )
>                 __field(        int,    prev_prio                       )
>                 __field(        long,   prev_state
> 		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> struct task_struct::state is long, but that looks like a historical
> artifact; it could be just an int. In this case it is easy to widen the value and
> preserve compatibility.

Oh, tracepoints can change, and they do all the time. It's only a
problem if that change breaks existing tools.

The sched_wakeup tracepoint has a "success" field that tools use to know
if the wake up was successful or not. That's because the old code where the
tracepoint was located was hit even if the task was already awake and
nothing happened. Today the tracepoint is only called when a wake up
actually happens. That is, "success" is always true. But we need to
keep that there, because existing tools depend on it.
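
The "success" case can be sketched in a few lines of user-space C. The record layout and both helpers below are hypothetical stand-ins chosen for illustration, not the kernel's definitions. The point is that the producer keeps filling in a constant so that an old consumer keeps producing the same answer.

```c
#include <string.h>

/* Hypothetical stand-in for the record a tool reads for each wakeup. */
struct wakeup_record {
    char comm[16];
    int  pid;
    int  prio;
    int  success;       /* kept only because existing tools read it */
    int  target_cpu;
};

/* Producer side: the event now fires only for real wakeups. */
static void record_wakeup(struct wakeup_record *rec, const char *comm,
                          int pid, int prio, int cpu)
{
    memset(rec, 0, sizeof(*rec));
    strncpy(rec->comm, comm, sizeof(rec->comm) - 1);
    rec->pid = pid;
    rec->prio = prio;
    rec->success = 1;   /* always true, retained for compatibility */
    rec->target_cpu = cpu;
}

/* Consumer side: what an old tool might do with the field. */
static int count_successful(const struct wakeup_record *recs, int n)
{
    int count = 0;

    for (int i = 0; i < n; i++)
        if (recs[i].success)
            count++;
    return count;
}
```

If the field were dropped, or left at zero, a consumer like `count_successful()` would silently report that no wakeups happened. That is the kind of breakage that pins the field in place.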

-- Steve


* Re: [Ksummit-discuss] [topic proposal] tracepoints and ABI stability warranties
  2016-09-06 21:53     ` Steven Rostedt
@ 2016-09-06 22:41       ` Alexey Dobriyan
  2016-09-06 23:12         ` Steven Rostedt
  2016-09-07  5:10         ` Al Viro
  0 siblings, 2 replies; 21+ messages in thread
From: Alexey Dobriyan @ 2016-09-06 22:41 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: ksummit-discuss

On Tue, Sep 06, 2016 at 05:53:43PM -0400, Steven Rostedt wrote:
> On Wed, 7 Sep 2016 00:36:44 +0300
> Alexey Dobriyan <adobriyan@gmail.com> wrote:
> 
> 
> > The solution was out there for quite some time :-)
> > 
> > 	Scope of Compatibility
> > 	Packages in Red Hat Enterprise Linux are classified under one of
> > 	the following four compatibility levels:
> > 
> > 	[ ] Compatibility level 1: APIs and ABIs are stable across three
> > 	    major releases;
> > 
> > 	[ ] Compatibility level 2: APIs and ABIs are stable within one major
> > 	    release.
> > 
> > 	[ ] Compatibility level 3: Reserved for future use.
> > 
> > 	[X] Compatibility level 4: No compatibility is provided.
> > 
> > The winning move is to not play and let distros sort it out.
> 
> Except that Linus has a hard rule for this. See the reason behind his
> infamous rant:
> 
>    https://lkml.org/lkml/2012/12/23/75
> 
> Specifically:
> 
>  "If a change results in user programs breaking, it's a bug in the
>   kernel. We never EVER blame the user programs."

Linus has said many things. I've personally had Python compilation busted
when Linux 4 appeared but somehow the digit 4 is still with us. By that logic,
the major version should have been reverted back to 3 long ago.

> > P.S.: technically, every kernel release almost certainly breaks the crash(1)
> > program, a program many people on this list should be familiar with.
> > It is unclear why the rules should be different for tracepoints.
> 
> Well, crash(1) isn't a userspace tool that runs on top of Linux. Well,
> it does, but only the input from a core dump of a Linux kernel breaks
> it. It will always run fine on all Linux versions as long as it uses
> the same input.

It can act on a live kernel.

> Tracepoints are runtime visible. This isn't a postmortem analysis. We
> already had an issue when powertop read the tracepoints directly
> without using the tracepoint format file parsing, and we ended up
> having 4 bytes of useless data in *every* tracepoint. Luckily, that got
> fixed because this hard coding broke when running powertop from a 32
> bit userspace on top of a 64 bit kernel. I worked to get powertop to
> use the tracepoint format parsing that perf and trace-cmd use.
> 
> But if something depends on event fields, we need to maintain that. For
> now, we have fake fields in the sched_wakeup tracepoint, because of
> this.
> 
> It's a balance that we need to figure out. One is that tracepoints are
> really helpful for in the field debugging to see what is happening. The
> other is that they are becoming an ABI and if a useful tool (like
> powertop) hooks into them, whatever they hooked into becomes set in
> stone.

There is no balance. One can't even reorder gfp_t flags:

	DECLARE_EVENT_CLASS(kmem_alloc,
	TP_STRUCT__entry(
                __field(        unsigned long,  call_site       )
                __field(        const void *,   ptr             )
                __field(        size_t,         bytes_req       )
                __field(        size_t,         bytes_alloc     )
                __field(        gfp_t,          gfp_flags       )
        ),
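
To show why reordering flag bits matters to a consumer, here is a small sketch in C. It is purely illustrative: the flag names, the two bit layouts in the usage note and the `decode_flags()` helper are hypothetical, and nothing here establishes that any real tool hard-codes gfp_t bit positions.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical mapping from a bit mask to a symbolic name. */
struct flag_name {
    unsigned long mask;
    const char *name;
};

/*
 * Decode flags into "A|B" form using the supplied table.
 * Returns the number of characters written (excluding the NUL).
 * A name that does not fit is dropped rather than written partially.
 */
static size_t decode_flags(unsigned long flags, const struct flag_name *table,
                           size_t n, char *out, size_t out_len)
{
    size_t used = 0;

    if (out_len == 0)
        return 0;
    out[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        int w;

        if (!(flags & table[i].mask))
            continue;
        w = snprintf(out + used, out_len - used, "%s%s",
                     used ? "|" : "", table[i].name);
        if (w < 0 || (size_t)w >= out_len - used) {
            out[used] = '\0';   /* undo the partial write */
            break;
        }
        used += (size_t)w;
    }
    return used;
}
```

With one table that maps bit 0 to `FLAG_WAIT` and bit 1 to `FLAG_IO`, and a second table with the two bits swapped, the raw values 0x1 and 0x2 decode to the same name as long as the consumer uses the table that matches the producer. A consumer that compiled in the first table keeps printing the wrong name after the swap, and nothing tells it so.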


> This is a real issue, and has been brought up in past kernel summits
> without a resolution.

Gentlemen's agreement then:
* kernel developers don't break tracepoints on purpose and maintain
  compatibility in simple cases (long => int, deleted field, etc),
* real, justified tracepoint breakage doesn't count.


* Re: [Ksummit-discuss] [topic proposal] tracepoints and ABI stability warranties
  2016-09-06 22:41       ` Alexey Dobriyan
@ 2016-09-06 23:12         ` Steven Rostedt
  2016-09-08 11:43           ` Alexey Dobriyan
  2016-09-07  5:10         ` Al Viro
  1 sibling, 1 reply; 21+ messages in thread
From: Steven Rostedt @ 2016-09-06 23:12 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: ksummit-discuss

On Wed, 7 Sep 2016 01:41:00 +0300
Alexey Dobriyan <adobriyan@gmail.com> wrote:

>
> > Specifically:
> > 
> >  "If a change results in user programs breaking, it's a bug in the
> >   kernel. We never EVER blame the user programs."  
> 
> Linus has said many things. I've personally had Python compilation busted
> when Linux 4 appeared but somehow the digit 4 is still with us. By that logic,
> the major version should have been reverted back to 3 long ago.

There is a limit to the insanity. If a userspace tool depends on a
kernel version number, then it pinned itself to that version. If Python
never expected a 4 to appear, then its compilation will be left to 3.x
kernels.

> 
> > > P.S.: technically, every kernel release almost certainly breaks the crash(1)
> > > program, a program many people on this list should be familiar with.
> > > It is unclear why the rules should be different for tracepoints.
> > 
> > Well, crash(1) isn't a userspace tool that runs on top of Linux. Well,
> > it does, but only the input from a core dump of a Linux kernel breaks
> > it. It will always run fine on all Linux versions as long as it uses
> > the same input.
> 
> It can act on a live kernel.

Again, there's a limit to the insanity ;-)

> 
> > Tracepoints are runtime visible. This isn't a postmortem analysis. We
> > already had an issue when powertop read the tracepoints directly
> > without using the tracepoint format file parsing, and we ended up
> > having 4 bytes of useless data in *every* tracepoint. Luckily, that got
> > fixed because this hard coding broke when running powertop from a 32
> > bit userspace on top of a 64 bit kernel. I worked to get powertop to
> > use the tracepoint format parsing that perf and trace-cmd use.
> > 
> > But if something depends on event fields, we need to maintain that. For
> > now, we have fake fields in the sched_wakeup tracepoint, because of
> > this.
> > 
> > It's a balance that we need to figure out. One is that tracepoints are
> > really helpful for in the field debugging to see what is happening. The
> > other is that they are becoming an ABI and if a useful tool (like
> > powertop) hooks into them, whatever they hooked into becomes set in
> > stone.  
> 
> There is no balance. One can't even reorder gfp_t flags:
> 
> 	DECLARE_EVENT_CLASS(kmem_alloc,
> 	TP_STRUCT__entry(
>                 __field(        unsigned long,  call_site       )
>                 __field(        const void *,   ptr             )
>                 __field(        size_t,         bytes_req       )
>                 __field(        size_t,         bytes_alloc     )
>                 __field(        gfp_t,          gfp_flags       )
>         ),

You mean if a tool depends on the order of bits set? I guess the
question is, is there such a tool, and have people complained when
things break? Or has anything broken yet?

> 
> 
> > This is a real issue, and has been brought up in past kernel summits
> > without a resolution.  
> 
> Gentlemen's agreement then:
> * kernel developers don't break tracepoints on purpose and maintain
>   compatibility in simple cases (long => int, deleted field, etc),
> * real, justified tracepoint breakage doesn't count.

According to Linus's rule, it really comes down to a tool that depends
on some feature in a tracepoint; later we change it, break the
tool, and a user complains. Then we are stuck with that tracepoint as it
was.

We can change any user space ABI as long as nobody notices ;-) If a user
space ABI breaks in the forest and nobody is around to complain about
it, is it still breakage? According to Linus, the answer is no.

Tracepoints are no different than any other ABI. The problem is, we
don't know if something is using it until we change it and that
something breaks.

-- Steve


* Re: [Ksummit-discuss] [topic proposal] tracepoints and ABI stability warranties
  2016-09-06 22:41       ` Alexey Dobriyan
  2016-09-06 23:12         ` Steven Rostedt
@ 2016-09-07  5:10         ` Al Viro
  2016-09-07  5:30           ` Andy Lutomirski
  1 sibling, 1 reply; 21+ messages in thread
From: Al Viro @ 2016-09-07  5:10 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: ksummit-discuss

On Wed, Sep 07, 2016 at 01:41:00AM +0300, Alexey Dobriyan wrote:

> Gentlemen's agreement then:
> * kernel developers don't break tracepoints on purpose and maintain
>   compatibility in simple cases (long => int, deleted field, etc),
> * real, justified tracepoint breakage doesn't count.

No go.  Scenario:

1) piss-poor API is added in form of tracehook.  It exports some information
that can be used to derive something genuinely interesting.  Most of the
time.  Corner cases are unsolvable, even though it might be possible to
provide the interesting part sanely.  Just not in that form.  Moreover,
faking the bits used to derive that information so that existing userland
logics would yield the right result is bloody hard and restricts what we
can do kernel-side, even though the real thing userland wants would not have
such problems.

2) userland code of matching quality is added to a certain daemon.  Tons
of boxen become dependent on that thing _in_ _that_ _shitty_ _form_.

3) you notice the problems kernel-side and make changes that break the damn
thing.

4) daemon authors come complaining.  You refer them to the "gentlemen's
agreement".  They refer you to the Figure 1 and inform you that if you didn't
want that API to be used, you shouldn't have allowed it into the kernel.
Correctly on both counts, at that.  Incidentally, that agreement of yours is
not something they signed upon, and what's that "gentleman" thing anyway?
Linus informs you that you are a particularly obtuse idiot and haven't you
learnt that WE DON'T BREAK USERLAND CODE, DAMNIT by now?

5) you are stuck keeping that turd, no matter the cost.  Adding a replacement
API (sanely designed this time) that would allow getting the same information
is welcome, but it doesn't relieve you from keeping both for at least 5-6
years.

Tracepoints are easy to add for quick debugging/instrumentation/etc. and
that's what makes them useful; OTOH, for a stable API you want more time
going into design and review, or you'll pay a _lot_ more maintaining it.
The thing is, an API that started with no intention of becoming a stable one
might end up becoming just that.  Your agreement is basically that tracepoints
should never become stable APIs, but that train left years ago.

One way to deal with that is to refuse to accept any tracepoints in the area
you maintain.  Another might be to differentiate between stable and unstable
ones, with much more review for the former and a very explicit "if you
ever want to use that for more than one kernel version, you get to keep it
working yourself" about the latter.  But that only works if it's visible to
userland *and* Linus agrees that anything of that sort will not get the
normal stable API treatment, i.e. "real code uses it, you get to keep it
working".


* Re: [Ksummit-discuss] [topic proposal] tracepoints and ABI stability warranties
  2016-09-07  5:10         ` Al Viro
@ 2016-09-07  5:30           ` Andy Lutomirski
  2016-09-07  6:41             ` Vlastimil Babka
                               ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Andy Lutomirski @ 2016-09-07  5:30 UTC (permalink / raw)
  To: Al Viro; +Cc: ksummit-discuss


On Sep 6, 2016 10:10 PM, "Al Viro" <viro@zeniv.linux.org.uk> wrote:
>
> On Wed, Sep 07, 2016 at 01:41:00AM +0300, Alexey Dobriyan wrote:
>
> > Gentlemen's agreement then:
> > * kernel developers don't break tracepoints on purpose and maintain
> >   compatibility in simple cases (long => int, deleted field, etc),
> > * real, justified tracepoint breakage doesn't count.
>
> No go.  Scenario:
>
> 1) piss-poor API is added in form of tracehook.  It exports some information
> that can be used to derive something genuinely interesting.  Most of the
> time.  Corner cases are unsolvable, even though it might be possible to
> provide the interesting part sanely.  Just not in that form.  Moreover,
> faking the bits used to derive that information so that existing userland
> logics would yield the right result is bloody hard and restricts what we
> can do kernel-side, even though the real thing userland wants would not have
> such problems.
>

Agreed.

I wouldn't mind a policy that tracepoints are simply never stable.  Maybe
we should even deliberately change them periodically to drive the point
home.

The kernel should be able to have a debug API that is genuinely for
*debugging* and doesn't freeze the underlying implementation.

Windows, AFAICT, works like this.  If you write a production program that
invokes WinDbg or similar, it's going to break down the road.



* Re: [Ksummit-discuss] [topic proposal] tracepoints and ABI stability warranties
  2016-09-07  5:30           ` Andy Lutomirski
@ 2016-09-07  6:41             ` Vlastimil Babka
  2016-09-19 12:51               ` Michal Hocko
  2016-09-07 13:15             ` Christian Borntraeger
  2016-09-07 15:30             ` Shuah Khan
  2 siblings, 1 reply; 21+ messages in thread
From: Vlastimil Babka @ 2016-09-07  6:41 UTC (permalink / raw)
  To: ksummit-discuss

On 09/07/2016 07:30 AM, Andy Lutomirski wrote:
> On Sep 6, 2016 10:10 PM, "Al Viro" <viro@zeniv.linux.org.uk> wrote:
>> 1) piss-poor API is added in form of tracehook.  It exports some information
>> that can be used to derive something genuinely interesting.  Most of the
>> time.  Corner cases are unsolvable, even though it might be possible to
>> provide the interesting part sanely.  Just not in that form.  Moreover,
>> faking the bits used to derive that information so that existing userland
>> logics would yield the right result is bloody hard and restricts what we
>> can do kernel-side, even though the real thing userland wants would not have
>> such problems.
>>
>
> Agreed.
>
> I wouldn't mind a policy that tracepoints are simply never stable.
> Maybe we should even deliberately change them periodically to drive the
> point home.

I would like that as well. Tracepoints sometimes must expose
implementation details to be useful for debugging *the implementation*.
The tools that use them to build some extra interesting metrics on top
of the tracepoints may often need to know even more about the
implementation for correct interpretation of the data.

Even if the kernel implementation changes *without* touching the
tracepoint format, the changed behavior might still destroy these
metrics. (I hope it's understandable what I'm trying to say, I can't
think of a good example right now). Do we want to set the implementation
in stone to such an extent? That would suck. If not, then the only way
would be to indeed not mainline any tracepoints, which would also suck.

> The kernel should be able to have a debug API that is genuinely for
> *debugging* and doesn't freeze the underlying implementation.
>
> Windows, AFAICT, works like this.  If you write a production program
> that invokes WinDbg or similar, it's going to break down the road.


* Re: [Ksummit-discuss] [topic proposal] tracepoints and ABI stability warranties
  2016-09-07  5:30           ` Andy Lutomirski
  2016-09-07  6:41             ` Vlastimil Babka
@ 2016-09-07 13:15             ` Christian Borntraeger
  2016-09-07 15:30             ` Shuah Khan
  2 siblings, 0 replies; 21+ messages in thread
From: Christian Borntraeger @ 2016-09-07 13:15 UTC (permalink / raw)
  To: Andy Lutomirski, Al Viro; +Cc: ksummit-discuss

On 09/07/2016 07:30 AM, Andy Lutomirski wrote:
> On Sep 6, 2016 10:10 PM, "Al Viro" <viro@zeniv.linux.org.uk> wrote:
>>
>> On Wed, Sep 07, 2016 at 01:41:00AM +0300, Alexey Dobriyan wrote:
>>
>> > Gentlemen's agreement then:
>> > * kernel developers don't break tracepoints on purpose and maintain
>> >   compatibility in simple cases (long => int, deleted field, etc),
>> > * real, justified tracepoint breakage doesn't count.
>>
>> No go.  Scenario:
>>
>> 1) piss-poor API is added in form of tracehook.  It exports some information
>> that can be used to derive something genuinely interesting.  Most of the
>> time.  Corner cases are unsolvable, even though it might be possible to
>> provide the interesting part sanely.  Just not in that form.  Moreover,
>> faking the bits used to derive that information so that existing userland
>> logics would yield the right result is bloody hard and restricts what we
>> can do kernel-side, even though the real thing userland wants would not have
>> such problems.
>>
> 
> Agreed.
> 
> I wouldn't mind a policy that tracepoints are simply never stable.  Maybe we should even deliberately change them periodically to drive the point home.

If I am invited, this is certainly a topic that I am interested in.

We faced this challenge as well for our kvm/s390 trace points. Back then
somebody (Avi?) suggested providing two classes of trace points (kvm and kvm-s390),
where kvm provides trace points that are a given (HW events) and kvm-s390 is
implementation specific. The idea was that we can change kvm-s390 and keep kvm.
Now there is a user of these trace points (perf kvm), which actually limits us to
only extending existing trace points (but not changing existing fields), also for
the kvm trace points.
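
The "extend only, never change existing fields" constraint can be sketched as a
small check. This is only an illustration: the field names and layouts below are
invented, and nothing here is taken from perf kvm or the real kvm events.

```python
def is_extension_only(old_fields, new_fields):
    """Return True if every old field is still present and unchanged.

    Both arguments map a field name to a (type, offset, size) tuple.
    New fields may be added; existing ones may not move, shrink or change type.
    """
    return all(new_fields.get(name) == layout for name, layout in old_fields.items())


# Hypothetical event layouts, invented for this example.
OLD = {"vcpu_id": ("unsigned int", 8, 4), "exit_reason": ("unsigned int", 12, 4)}
APPENDED = dict(OLD, guest_rip=("unsigned long", 16, 8))
RETYPED = dict(OLD, exit_reason=("unsigned long", 16, 8))
```

Under this rule APPENDED would be accepted and RETYPED rejected, which is the
restriction an existing consumer effectively imposes on the trace point.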

Do we consider userspace under tools as part of the kernel (so that we can change
trace points that are only used by these tools)?

Christian

> 
> The kernel should be able to have a debug API that is genuinely for *debugging* and doesn't freeze the underlying implementation.
> 
> Windows, AFAICT, works like this.  If you write a production program that invokes WinDbg or similar, it's going to break down the road.


* Re: [Ksummit-discuss] [topic proposal] tracepoints and ABI stability warranties
  2016-09-07  5:30           ` Andy Lutomirski
  2016-09-07  6:41             ` Vlastimil Babka
  2016-09-07 13:15             ` Christian Borntraeger
@ 2016-09-07 15:30             ` Shuah Khan
  2016-09-07 16:10               ` Rik van Riel
  2 siblings, 1 reply; 21+ messages in thread
From: Shuah Khan @ 2016-09-07 15:30 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: ksummit-discuss

On Tue, Sep 6, 2016 at 11:30 PM, Andy Lutomirski <luto@kernel.org> wrote:
> On Sep 6, 2016 10:10 PM, "Al Viro" <viro@zeniv.linux.org.uk> wrote:
>>
>> On Wed, Sep 07, 2016 at 01:41:00AM +0300, Alexey Dobriyan wrote:
>>
>> > Gentlemen's agreement then:
>> > * kernel developers don't break tracepoints on purpose and maintain
>> >   compatibility in simple cases (long => int, deleted field, etc),
>> > * real, justified tracepoint breakage doesn't count.
>>
>> No go.  Scenario:
>>
>> 1) piss-poor API is added in form of tracehook.  It exports some information
>> that can be used to derive something genuinely interesting.  Most of the
>> time.  Corner cases are unsolvable, even though it might be possible to
>> provide the interesting part sanely.  Just not in that form.  Moreover,
>> faking the bits used to derive that information so that existing userland
>> logics would yield the right result is bloody hard and restricts what we
>> can do kernel-side, even though the real thing userland wants would not have
>> such problems.
>>
>
> Agreed.
>
> I wouldn't mind a policy that tracepoints are simply never stable.  Maybe we
> should even deliberately change them periodically to drive the point home.
>
> The kernel should be able to have a debug API that is genuinely for
> *debugging* and doesn't freeze the underlying implementation.

Agreed. Tracepoints and events provide a powerful tool for debugging certain
classes of problems (races and performance problems) where traditional debug
methods such as CONFIG_DEBUG_FOO aren't effective. Trace information includes
important information on thread status, which is helpful in debugging.

What I find useful is being able to enable and disable events at run-time
without worrying about whether or not CONFIG_DEBUG_FOO is enabled on the kernel
I am debugging. In some cases, the problem disappears when debug is enabled.
In my mind, the kernel itself is a different beast and all bets are off for
certain classes of problems.

I am by no means discounting the userspace angle of this issue. I am looking
to see if we can find a way to use this feature without handicapping ourselves.

>
> Windows, AFAICT, works like this.  If you write a production program that
> invokes WinDbg or similar, it's going to break down the road.

We can make it clear that tracepoints and events can change with no notice.

thanks,
-- Shuah


* Re: [Ksummit-discuss] [topic proposal] tracepoints and ABI stability warranties
  2016-09-07 15:30             ` Shuah Khan
@ 2016-09-07 16:10               ` Rik van Riel
  2016-09-08  3:24                 ` Masami Hiramatsu
  2016-09-15 19:23                 ` Mark Brown
  0 siblings, 2 replies; 21+ messages in thread
From: Rik van Riel @ 2016-09-07 16:10 UTC (permalink / raw)
  To: Shuah Khan, Andy Lutomirski; +Cc: ksummit-discuss


On Wed, 2016-09-07 at 09:30 -0600, Shuah Khan wrote:
> On Tue, Sep 6, 2016 at 11:30 PM, Andy Lutomirski <luto@kernel.org>
> wrote:
> > 
> > I wouldn't mind a policy that tracepoints are simply never
> > stable.  Maybe we
> > should even deliberately change them periodically to drive the
> > point home.
> > 
> > The kernel should be able to have a debug API that is genuinely for
> > *debugging* and doesn't freeze the underlying implementation.
> Agreed. Tracepoints and events provide a powerful tool in debug
> certain class
> of problems (races and performance problems) where traditional debug
> methods
> such as CONFIG_DEBUG_FOO aren't effective. Trace information includes
> important
> status information on thread status which is helpful in debugging.

From an enterprise distro (and user) point of view,
it is important to be able to debug a kernel that
is running on a production system (and developed
some problem after a month of running), without
having to reboot into a special "debug kernel".

Being able to just fire up a tracer debugging
script that can identify intermittent problems
is an invaluable tool in making the kernel better
for our users.

Hamstringing our ability to make the kernel better,
in order to keep the debugging ABI stable, is
shooting ourselves (and our users) in the foot.

-- 
All rights reversed


* Re: [Ksummit-discuss] [topic proposal] tracepoints and ABI stability warranties
  2016-09-06 18:51 [Ksummit-discuss] [topic proposal] tracepoints and ABI stability warranties Al Viro
  2016-09-06 19:22 ` Steven Rostedt
  2016-09-06 21:05 ` Shuah Khan
@ 2016-09-07 23:17 ` Masami Hiramatsu
  2 siblings, 0 replies; 21+ messages in thread
From: Masami Hiramatsu @ 2016-09-07 23:17 UTC (permalink / raw)
  To: Al Viro; +Cc: ksummit-discuss

On Tue, 6 Sep 2016 19:51:43 +0100
Al Viro <viro@ZenIV.linux.org.uk> wrote:

> 	Right now there is no mechanism for saying "if this tracepoints
> breaks, it's Not Our Problem(tm)".  All of them are parts of userland
> ABI, potentially casting in stone all kinds of kernel internals.  E.g.
> just today a patch series adding tracepoints to kobject primitives
> had been posted; if _that_ becomes a part of stable ABI, we get the
> lifetime rules for anything with an embedded kobject exposed to userland
> and potentially impossible to change - all it takes is a single piece of
> software making non-trivial use of those.

I have heard that the tracepoint (event) interface should not be considered
a stable API, since it strongly depends on kernel internals.
Actually, we already export the "format" of the event for each tracepoint, and
provide libtraceevent for user-space programs. Of course there is no
"semantics" information, since that depends on the kernel implementation.
So IMHO, if someone wants to make a tool that uses tracepoints/trace events,
they should either keep it updated out-of-tree, or contribute it to the Linux
kernel as a tool under tools/.
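
As a rough sketch of what relying on the exported "format" can look like, the
snippet below looks fields up by name instead of hard-coding offsets. It is not
libtraceevent and does not use its API; the sample text only imitates the layout
of a format file, and its field list and offsets are made up for the example.

```python
import re
import sys

# Illustrative sample in the layout of an event "format" file; the field list
# and offsets here are made up for the example, not copied from a real kernel.
SAMPLE_FORMAT = """\
name: sched_wakeup
ID: 316
format:
\tfield:unsigned short common_type;\toffset:0;\tsize:2;\tsigned:0;
\tfield:int common_pid;\toffset:4;\tsize:4;\tsigned:1;

\tfield:char comm[16];\toffset:8;\tsize:16;\tsigned:1;
\tfield:pid_t pid;\toffset:24;\tsize:4;\tsigned:1;
\tfield:int prio;\toffset:28;\tsize:4;\tsigned:1;
"""

FIELD_RE = re.compile(
    r"field:(?P<decl>[^;]+);\s*offset:(?P<offset>\d+);"
    r"\s*size:(?P<size>\d+);\s*signed:(?P<signed>[01]);"
)


def parse_format(text):
    """Map field name -> (offset, size, signed) from a format description."""
    fields = {}
    for m in FIELD_RE.finditer(text):
        # The declaration ends with the field name, e.g. "char comm[16]".
        name = m.group("decl").split()[-1].split("[")[0]
        fields[name] = (
            int(m.group("offset")),
            int(m.group("size")),
            m.group("signed") == "1",
        )
    return fields


def read_int(record, fields, name, byteorder=sys.byteorder):
    """Read an integer field by name instead of by a hard-coded offset."""
    offset, size, signed = fields[name]
    return int.from_bytes(record[offset:offset + size], byteorder, signed=signed)
```

A tool written this way survives a field moving within the record, but it still
breaks if the field is renamed or its meaning changes, which is the "semantics"
gap described above.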


> 	Exposing kernel internals to debugging scripts et.al. is fine,
> but only as long as we are promising to keep those scripts working.

For in-tree scripts, yes, but we can't promise it for out-of-tree scripts.

> Otherwise we end up promising that arseloads of code we can't even see,
> sticking its fingers very deep into the kernel internals, will be kept
> functional through the kernel changes.  This is absolutely insane, of
> course, and I doubt that anybody would be willing to make such promises,
> no matter how much value would that add.
> 
> 	The tracepoints are undiffirentiated mass, with no way for userland
> to tell whether it's using something stable or an equivalent of someone's
> debugging printks with no promise of stability.  And folks, there *are*
> people out there with commit priveleges on assorted userland projects who'd
> made it perfectly clear that _any_ kernel interface they find is fair game -
> as in "if you didn't want it used, why did you put it in the kernel?".
> No matter how much I dislike that bunch for other things, I can't blame them
> for being less than upfront regarding that policy.  It had been expressed
> very clearly and we'd better take the warning seriously.

Hmm, at least we have to state that the tracepoint/event interface is not
stable in Documentation/trace/tracepoints.txt or somewhere else in the kernel
tree.

Thank you,

> 
> 	I think this is something that needs to be discussed at KS; IMO we
> need at least some way to express the degree of stability promises made
> wrt individual tracepoints and some mechanisms for preventing silent creep
> towards full stability; something along the lines of "unstable tracepoint $FOO
> used by $PROGRAM, kernel tainted", at least.


-- 
Masami Hiramatsu <mhiramat@kernel.org>


* Re: [Ksummit-discuss] [topic proposal] tracepoints and ABI stability warranties
  2016-09-06 21:05 ` Shuah Khan
@ 2016-09-08  3:13   ` Masami Hiramatsu
  0 siblings, 0 replies; 21+ messages in thread
From: Masami Hiramatsu @ 2016-09-08  3:13 UTC (permalink / raw)
  To: Shuah Khan; +Cc: ksummit-discuss

On Tue, 6 Sep 2016 15:05:04 -0600
Shuah Khan <shuahkhan@gmail.com> wrote:

> On Tue, Sep 6, 2016 at 12:51 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> >         Right now there is no mechanism for saying "if this tracepoints
> > breaks, it's Not Our Problem(tm)".  All of them are parts of userland
> > ABI, potentially casting in stone all kinds of kernel internals.  E.g.
> > just today a patch series adding tracepoints to kobject primitives
> > had been posted; if _that_ becomes a part of stable ABI, we get the
> > lifetime rules for anything with an embedded kobject exposed to userland
> > and potentially impossible to change - all it takes is a single piece of
> > software making non-trivial use of those.
> 
> I honestly didn't think that my patch series will result in a special
> KS topic :)
> However, I think it is a good idea to discuss it as general topic for
> what kind of
> kernel information should/should not be made visible via tracepoints or other
> debug mechanisms.

Hmm, what does "information" mean here? I think there are two levels of
information: "formal" information and "semantic" information. What tracepoints,
debuginfo, etc. provide by technical means is formal, and nobody can give the
semantic one.

For example, if we see an "unsigned long flags" in the code, we can understand
that it is an unsigned integer value and that its size will be 32 or 64 bits
depending on the CPU architecture. However, we cannot know what the value means
in that context except by reading the code or comments. It can be used for
storing irq-flags or conditional flags, or, rarely, for counting something.
Also, we may not know what will happen if the value is changed, unless there
are precise documents etc.
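
A minimal sketch of the two levels, under stated assumptions: the decoder covers
only the formal part (width and signedness), and the INTERPRETATIONS table is a
hypothetical stand-in for knowledge that has to come from reading the code.

```python
def decode_unsigned_long(raw, size, byteorder="little"):
    """Formal step: interpret `size` bytes as an unsigned integer."""
    if size not in (4, 8):
        raise ValueError("this sketch assumes a 4- or 8-byte unsigned long")
    return int.from_bytes(raw[:size], byteorder, signed=False)


# Semantic step: hypothetical interpretations that a tool author would have
# to supply after reading the kernel source; the formal data cannot pick one.
INTERPRETATIONS = {
    "irq_flags": lambda value: "saved irq state {:#x}".format(value),
    "counter": lambda value: "{} occurrences".format(value),
}

RAW = bytes([1, 0, 0, 0, 1, 0, 0, 0])
```

The same bytes decode to different numbers for a 4-byte and an 8-byte
unsigned long, and even with the right number in hand, nothing in the formal
description says which interpretation applies.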

One possible way to solve this is adding a kerneldoc entry for each tracepoint
so that we can understand what it means. However, that is still not enough to
keep userspace programs working, because the "meaning" may not be machine readable.

Another possible solution is keeping the meaning of the information fixed along
with its context (iow, making it stable), and writing a manpage. But IMHO, that
leads to hardening of the arteries of the kernel design.

So, I would recommend that anyone who uses this kind of "information" keep
their code updated for newer kernels, or contribute it (including test code)
to the kernel tree, which makes it easy to fix/update it (or abandon it ... if
the context the tool depends on has totally changed).

> We do support a wide range of tracepoints and events in various sub-systems
> skb.h, pagemap.h, and pagemap.h so on. Maybe it would be helpful to agree
> on some sort of guidelines for exposure.

How about adding a kerneldoc comment for each event, as I described above?
I would not like to expose it via debugfs or tracefs, but maybe we can distribute
it as documentation with the kernel.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>


* Re: [Ksummit-discuss] [topic proposal] tracepoints and ABI stability warranties
  2016-09-07 16:10               ` Rik van Riel
@ 2016-09-08  3:24                 ` Masami Hiramatsu
  2016-09-15 19:23                 ` Mark Brown
  1 sibling, 0 replies; 21+ messages in thread
From: Masami Hiramatsu @ 2016-09-08  3:24 UTC (permalink / raw)
  To: Rik van Riel; +Cc: ksummit-discuss, Andy Lutomirski

On Wed, 07 Sep 2016 12:10:49 -0400
Rik van Riel <riel@redhat.com> wrote:

> On Wed, 2016-09-07 at 09:30 -0600, Shuah Khan wrote:
> > On Tue, Sep 6, 2016 at 11:30 PM, Andy Lutomirski <luto@kernel.org>
> > wrote:
> > > 
> > > I wouldn't mind a policy that tracepoints are simply never
> > > stable.  Maybe we
> > > should even deliberately change them periodically to drive the
> > > point home.
> > > 
> > > The kernel should be able to have a debug API that is genuinely for
> > > *debugging* and doesn't freeze the underlying implementation.
> > Agreed. Tracepoints and events provide a powerful tool in debug
> > certain class
> > of problems (races and performance problems) where traditional debug
> > methods
> > such as CONFIG_DEBUG_FOO aren't effective. Trace information includes
> > important
> > status information on thread status which is helpful in debugging.
> 
> From an enterprise distro (and user) point of view,
> it is important to be able to debug a kernel that
> is running on a production system (and developed
> some problem after a month of running), without
> having to reboot into a special "debug kernel".
> 
> Being able to just fire up a tracer debugging
> script that can identify intermittent problems
> is an invaluable tool in making the kernel better
> for our users.

I made a systemtap flight-recorder mode for the same reason; it records
events in the background and the user can dump them if something
happens. I know that ftrace also has a feature to dump the trace buffer
when the kernel panics (and crash can dump the ftrace buffer from a
kernel crashdump).

> Hamstringing our ability to make the kernel better,
> in order to keep the debugging ABI stable, is
> shooting ourselves (and our users) in the foot.

Agreed. Even if we are able to stabilize tracepoints, the code context
around the tracepoints will have to change, not only through code evolution
but also through bug fixes.

Thank you,

-- 
Masami Hiramatsu <mhiramat@kernel.org>


* Re: [Ksummit-discuss] [topic proposal] tracepoints and ABI stability warranties
  2016-09-06 23:12         ` Steven Rostedt
@ 2016-09-08 11:43           ` Alexey Dobriyan
  0 siblings, 0 replies; 21+ messages in thread
From: Alexey Dobriyan @ 2016-09-08 11:43 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: ksummit-discuss

On Wed, Sep 7, 2016 at 2:12 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Wed, 7 Sep 2016 01:41:00 +0300
> Alexey Dobriyan <adobriyan@gmail.com> wrote:
>
>>
>> > Specifically:
>> >
>> >  "If a change results in user programs breaking, it's a bug in the
>> >   kernel. We never EVER blame the user programs."
>>
>> Linus has said many things. I've personally had Python compilation busted
>> when Linux 4 appeared but somehow digit 4 is still with us. By that logic,
>> major version should have been reverted back to 3 long ago.
>
> There is a limit to the insanity. If a userspace tool depends on a
> kernel version number, then it pinned itself to that version. If Python
> never expected a 4 to appear, then it's compiling will be left to 3.x
> kernels.

No, no, no. Python compiled fine on 2.6 (it was the 2.6 => 3 transition,
of course), and then it stopped compiling on 3.

How fast people forget:

    F15 has now moved to the 2.6.40 kernel.  If you haven't
    been paying attention lately, you'll probably be saying
    "wait... there is no 2.6.40 upstream" and you would be right.
    So Fedora's 2.6.40 is really the 3.0 upstream kernel,
    "rebranded" to follow the 2.6.x numbering scheme.
    This was done to avoid userspace incompatibilities with
    the 3.x numbering scheme for packages that were either
    tightly coupled to kernel version and/or, uh, doing things
    a bit wrongly.  Most of those packages have been fixed in f16
    at this point.

So much stuff broke that it warranted a non-existent kernel version.
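
To illustrate the kind of coupling involved, here is a hypothetical pair of
version checks. It is not the code that broke in Python or in any Fedora
package; both functions are invented to show how a "2.6." assumption fails
once the major number moves.

```python
import re


def parse_release(release):
    """Parse 'major.minor[.patch][suffix]' into a tuple of three integers."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError("unrecognised release string: %r" % release)
    major, minor, patch = m.groups()
    return int(major), int(minor), int(patch or 0)


def is_recent_brittle(release):
    # Hypothetical check that treats "2.6." as a synonym for "recent kernel".
    return release.startswith("2.6.")


def is_recent(release, minimum=(2, 6, 0)):
    # Compare parsed tuples, so 3.x and later still count as recent.
    return parse_release(release) >= minimum
```

The brittle check accepts "2.6.40" and rejects "3.0.0", which is the sort of
behaviour a rebranded 2.6.40 kernel works around without fixing the tool.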

>> > > P.S.: techically every kernel release almost certainly breaks crash(1)
>> > > program, program many people on this list should be familiar with.
>> > > It is unclear why rules should be different for tracepoints.
>> >
>> > Well, crash() isn't a userspace tool that runs on top of Linux. Well,
>> > it does, but only the input from a core dump of a Linux kernel breaks
>> > it. It will always run fine on all Linux versions as long as it uses
>> > the same input.
>>
>> It can act on live kernel.
>
> Again, there's a limit to the insanity ;-)

Of course. There is no question about crash because it so
obviously depends on kernel internals.


>> > Tracepoints are runtime visible. This isn't a postmortem analysis. We
>> > already had an issues when powertop read the tracepoints directly
>> > without using the tracepoint format file parsing, and we ended up
>> > having 4 bytes of useless data in *every* tracepoint. Luckily, that got
>> > fixed because this hard coding broke when running powertop from a 32
>> > bit userspace on top of a 64 bit kernel. I worked to get powertop to
>> > use the tracepoint format parsing that perf and trace-cmd uses.
>> >
>> > But if something depends on event fields, we need to maintain that. For
>> > now, we have fake fields in the sched_wakeup tracepoint, because of
>> > this.
>> >
>> > It's a balance that we need to figure out. One is that tracepoints are
>> > really helpful for in the field debugging to see what is happening. The
>> > other is that they are becoming an ABI and if a useful tool (like
>> > powertop) hooks into them, whatever they hooked into becomes set in
>> > stone.
>>
>> There is no balance. One can't even reorder gfp_t flags:
>>
>>       DECLARE_EVENT_CLASS(kmem_alloc,
>>       TP_STRUCT__entry(
>>                 __field(        unsigned long,  call_site       )
>>                 __field(        const void *,   ptr             )
>>                 __field(        size_t,         bytes_req       )
>>                 __field(        size_t,         bytes_alloc     )
>>                 __field(        gfp_t,          gfp_flags       )
>>         ),
>
> You mean if a tool depends on the order of bits set? I guess the
> question is, is there such a tool, and have people complained when
> things break? Or has anything broken yet?

How on earth could I know what is broken?
It is obvious to anyone who has grasped the concept of ABI
that gfp_t flags can not be changed anymore.

Here is something I don't understand.

When /proc/*/pagemap exported raw page flags, the pagemap authors
got flamed and ridiculed for doing it. pagemap now abstracts the flags
to at least maintain a stable ordering, and everything has been quiet since then.
But when tracepoints ship gfp_t directly it is "umm, ohh, lets discuss it,
because, you know, much useful interface, enterprise distros mmmkay"
when it clearly should not get past even a brief review.
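
The difference between shipping raw flags and abstracting them can be sketched
as a translation table. Everything below is hypothetical: the flag names, the
stable numbering and both internal layouts are invented, and this is not how
pagemap or the gfp_t tracepoint fields are actually implemented.

```python
# Hypothetical stable numbering promised to userland.
STABLE_BITS = {"locked": 0, "dirty": 1, "lru": 2}


def export_flags(raw, internal_layout):
    """Translate internal bit positions into the stable exported ones."""
    exported = 0
    for name, internal_bit in internal_layout.items():
        if raw & (1 << internal_bit):
            exported |= 1 << STABLE_BITS[name]
    return exported


# Two hypothetical kernel versions that number the same flags differently.
LAYOUT_V1 = {"locked": 0, "dirty": 4, "lru": 5}
LAYOUT_V2 = {"locked": 3, "dirty": 0, "lru": 1}


def raw_flags(names, internal_layout):
    """Build the internal value a given kernel version would hold."""
    value = 0
    for name in names:
        value |= 1 << internal_layout[name]
    return value
```

With the translation in place, the internal layout can be reordered between
versions while the exported value for "dirty and lru" stays the same; exporting
the raw value directly would expose the reordering to every consumer.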


* Re: [Ksummit-discuss] [topic proposal] tracepoints and ABI stability warranties
  2016-09-07 16:10               ` Rik van Riel
  2016-09-08  3:24                 ` Masami Hiramatsu
@ 2016-09-15 19:23                 ` Mark Brown
  1 sibling, 0 replies; 21+ messages in thread
From: Mark Brown @ 2016-09-15 19:23 UTC (permalink / raw)
  To: Rik van Riel; +Cc: ksummit-discuss, Andy Lutomirski


On Wed, Sep 07, 2016 at 12:10:49PM -0400, Rik van Riel wrote:

> From an enterprise distro (and user) point of view,
> it is important to be able to debug a kernel that
> is running on a production system (and developed
> some problem after a month of running), without
> having to reboot into a special "debug kernel".
> 
> Being able to just fire up a tracer debugging
> script that can identify intermittent problems
> is an invaluable tool in making the kernel better
> for our users.
> 
> Hamstringing our ability to make the kernel better,
> in order to keep the debugging ABI stable, is
> shooting ourselves (and our users) in the foot.

This also applies very much to embedded systems development.  The
flight recorder aspect and the ability to turn on trace without
requiring a custom build to be distributed and flashed are massively
helpful; I've seen people get really excited when they see tracepoints
for the first time.


* Re: [Ksummit-discuss] [topic proposal] tracepoints and ABI stability warranties
  2016-09-07  6:41             ` Vlastimil Babka
@ 2016-09-19 12:51               ` Michal Hocko
  0 siblings, 0 replies; 21+ messages in thread
From: Michal Hocko @ 2016-09-19 12:51 UTC (permalink / raw)
  To: Vlastimil Babka; +Cc: ksummit-discuss

On Wed 07-09-16 08:41:23, Vlastimil Babka wrote:
> On 09/07/2016 07:30 AM, Andy Lutomirski wrote:
> > On Sep 6, 2016 10:10 PM, "Al Viro" <viro@zeniv.linux.org.uk> wrote:
> > > 1) piss-poor API is added in form of tracehook.  It exports some information
> > > that can be used to derive something genuinely interesting.  Most of the
> > > time.  Corner cases are unsolvable, even though it might be possible to
> > > provide the interesting part sanely.  Just not in that form.  Moreover,
> > > faking the bits used to derive that information so that existing userland
> > > logics would yield the right result is bloody hard and restricts what we
> > > can do kernel-side, even though the real thing userland wants would not have
> > > such problems.
> > > 
> > 
> > Agreed.
> > 
> > I wouldn't mind a policy that tracepoints are simply never stable.
> > Maybe we should even deliberately change them periodically to drive the
> > point home.
> 
> I would wish that as well. Tracepoints sometimes must expose implementation
> details to be useful for debugging *the implementation*. The tools that use
> them to build some extra interesting metrics on top the tracepoints may
> often need to know even more about the implementation for correct
> interpretation of the data.
> 
> Even if the kernel implementation changes *without* touching the tracepoint
> format, the changed behavior might still destroy these metrics. (I hope it's
> understandable what I'm trying to say, I can't think of a good example right
> now). Do we want to set the implementation in stone to such extent? That
> would suck. If not, then the only way would be to indeed not mainline any
> tracepoints, which would also suck.

Absolutely agreed. As an example just look at the recent reclaim
changes. We no longer do per-zone reclaim; it has been replaced by per-node
reclaim. This required changes to the tracepoints as well. Just look at how
599d0c954f91d0689c9bb421b5bc04ea02437a41 changed
mm_vmscan_lru_shrink_inactive. We definitely do not want to cast our
implementation details into stone.

My recollection is that tracepoints will never be considered a stable
ABI. If we want something stable then we have proper ways to use...

-- 
Michal Hocko
SUSE Labs


Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
2016-09-06 18:51 [Ksummit-discuss] [topic proposal] tracepoints and ABI stability warranties Al Viro
2016-09-06 19:22 ` Steven Rostedt
2016-09-06 21:36   ` Alexey Dobriyan
2016-09-06 21:53     ` Steven Rostedt
2016-09-06 22:41       ` Alexey Dobriyan
2016-09-06 23:12         ` Steven Rostedt
2016-09-08 11:43           ` Alexey Dobriyan
2016-09-07  5:10         ` Al Viro
2016-09-07  5:30           ` Andy Lutomirski
2016-09-07  6:41             ` Vlastimil Babka
2016-09-19 12:51               ` Michal Hocko
2016-09-07 13:15             ` Christian Borntraeger
2016-09-07 15:30             ` Shuah Khan
2016-09-07 16:10               ` Rik van Riel
2016-09-08  3:24                 ` Masami Hiramatsu
2016-09-15 19:23                 ` Mark Brown
2016-09-06 22:02     ` Alexey Dobriyan
2016-09-06 22:15       ` Steven Rostedt
2016-09-06 21:05 ` Shuah Khan
2016-09-08  3:13   ` Masami Hiramatsu
2016-09-07 23:17 ` Masami Hiramatsu
