* [PATCH 01/11] mm: export vmalloc_sync_all symbol to GPL modules
[not found] <1322775683-8741-1-git-send-email-mathieu.desnoyers@efficios.com>
@ 2011-12-01 21:41 ` Mathieu Desnoyers
2011-12-01 21:57 ` Christoph Hellwig
2011-12-01 21:41 ` [PATCH 03/11] fs/splice: export splice_to_pipe " Mathieu Desnoyers
2011-12-01 21:41 ` [PATCH 09/11] sched: export task_prio " Mathieu Desnoyers
2 siblings, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2011-12-01 21:41 UTC (permalink / raw)
To: Greg KH, Mathieu Desnoyers
Cc: devel, lttng-dev, Mathieu Desnoyers, Linus Torvalds,
Christoph Hellwig, Christoph Lameter, Tejun Heo, David Howells,
David McCullough, D Jeff Dionne, Greg Ungerer, Paul Mundt,
linux-mm, linux-kernel, Greg KH
LTTng needs this symbol exported. It calls it to ensure its tracing
buffers and allocated data structures never trigger a page fault. This
is required to handle page fault handler tracing and NMI tracing
gracefully.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Christoph Hellwig <hch@infradead.org>
CC: Christoph Lameter <cl@linux-foundation.org>
CC: Tejun Heo <tj@kernel.org>
CC: David Howells <dhowells@redhat.com>
CC: David McCullough <davidm@snapgear.com>
CC: D Jeff Dionne <jeff@uClinux.org>
CC: Greg Ungerer <gerg@snapgear.com>
CC: Paul Mundt <lethal@linux-sh.org>
CC: linux-mm@kvack.org
CC: linux-kernel@vger.kernel.org
CC: Greg KH <greg@kroah.com>
---
mm/nommu.c | 1 +
mm/vmalloc.c | 1 +
2 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/mm/nommu.c b/mm/nommu.c
index b982290..b22a0d9 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -441,6 +441,7 @@ EXPORT_SYMBOL_GPL(vm_unmap_aliases);
void __attribute__((weak)) vmalloc_sync_all(void)
{
}
+EXPORT_SYMBOL_GPL(vmalloc_sync_all);
/**
* alloc_vm_area - allocate a range of kernel address space
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 3231bf3..37ddce5 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2137,6 +2137,7 @@ EXPORT_SYMBOL(remap_vmalloc_range);
void __attribute__((weak)) vmalloc_sync_all(void)
{
}
+EXPORT_SYMBOL_GPL(vmalloc_sync_all);
static int f(pte_t *pte, pgtable_t table, unsigned long addr, void *data)
--
1.7.5.4
^ permalink raw reply related [flat|nested] 51+ messages in thread
* [PATCH 03/11] fs/splice: export splice_to_pipe to GPL modules
[not found] <1322775683-8741-1-git-send-email-mathieu.desnoyers@efficios.com>
2011-12-01 21:41 ` [PATCH 01/11] mm: export vmalloc_sync_all symbol to GPL modules Mathieu Desnoyers
@ 2011-12-01 21:41 ` Mathieu Desnoyers
2011-12-02 7:19 ` Jens Axboe
2011-12-01 21:41 ` [PATCH 09/11] sched: export task_prio " Mathieu Desnoyers
2 siblings, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2011-12-01 21:41 UTC (permalink / raw)
To: Greg KH, Mathieu Desnoyers
Cc: devel, lttng-dev, Mathieu Desnoyers, Linus Torvalds, Ingo Molnar,
Jens Axboe, linux-kernel, Greg KH
The LTTng driver needs this symbol exported because it implements its
own splice actor.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Ingo Molnar <mingo@elte.hu>
CC: Jens Axboe <axboe@kernel.dk>
CC: linux-kernel@vger.kernel.org
CC: Greg KH <greg@kroah.com>
---
fs/splice.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/fs/splice.c b/fs/splice.c
index fa2defa..9eb15b5 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -263,6 +263,7 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
return ret;
}
+EXPORT_SYMBOL_GPL(splice_to_pipe);
void spd_release_page(struct splice_pipe_desc *spd, unsigned int i)
{
--
1.7.5.4
^ permalink raw reply related [flat|nested] 51+ messages in thread
* [PATCH 09/11] sched: export task_prio to GPL modules
[not found] <1322775683-8741-1-git-send-email-mathieu.desnoyers@efficios.com>
2011-12-01 21:41 ` [PATCH 01/11] mm: export vmalloc_sync_all symbol to GPL modules Mathieu Desnoyers
2011-12-01 21:41 ` [PATCH 03/11] fs/splice: export splice_to_pipe " Mathieu Desnoyers
@ 2011-12-01 21:41 ` Mathieu Desnoyers
2011-12-01 21:56 ` Peter Zijlstra
2 siblings, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2011-12-01 21:41 UTC (permalink / raw)
To: Greg KH, Mathieu Desnoyers
Cc: devel, lttng-dev, Mathieu Desnoyers, Ingo Molnar, Peter Zijlstra,
linux-kernel, Greg KH
LTTng needs this symbol to prepend the current task dynamic priority
value to events (optional context information).
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Peter Zijlstra <peterz@infradead.org>
CC: linux-kernel@vger.kernel.org
CC: Greg KH <greg@kroah.com>
---
kernel/sched.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/kernel/sched.c b/kernel/sched.c
index 0e9344a..80dbb09 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -5142,6 +5142,7 @@ int task_prio(const struct task_struct *p)
{
return p->prio - MAX_RT_PRIO;
}
+EXPORT_SYMBOL_GPL(task_prio);
/**
* task_nice - return the nice value of a given task.
--
1.7.5.4
^ permalink raw reply related [flat|nested] 51+ messages in thread
* Re: [PATCH 09/11] sched: export task_prio to GPL modules
2011-12-01 21:41 ` [PATCH 09/11] sched: export task_prio " Mathieu Desnoyers
@ 2011-12-01 21:56 ` Peter Zijlstra
2011-12-01 22:04 ` Mathieu Desnoyers
2011-12-01 22:14 ` Greg KH
0 siblings, 2 replies; 51+ messages in thread
From: Peter Zijlstra @ 2011-12-01 21:56 UTC (permalink / raw)
To: Mathieu Desnoyers; +Cc: Greg KH, devel, lttng-dev, Ingo Molnar, linux-kernel
On Thu, 2011-12-01 at 16:41 -0500, Mathieu Desnoyers wrote:
> LTTng needs this symbol to prepend the current task dynamic priority
> value to events (optional context information).
I absolutely detest exporting such stuff. It propagates the idea that
task prio actually means something. Also, modules really shouldn't care.
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 01/11] mm: export vmalloc_sync_all symbol to GPL modules
2011-12-01 21:41 ` [PATCH 01/11] mm: export vmalloc_sync_all symbol to GPL modules Mathieu Desnoyers
@ 2011-12-01 21:57 ` Christoph Hellwig
2011-12-01 22:13 ` Greg KH
0 siblings, 1 reply; 51+ messages in thread
From: Christoph Hellwig @ 2011-12-01 21:57 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Greg KH, devel, lttng-dev, Linus Torvalds, Christoph Hellwig,
Christoph Lameter, Tejun Heo, David Howells, David McCullough,
D Jeff Dionne, Greg Ungerer, Paul Mundt, linux-mm, linux-kernel
On Thu, Dec 01, 2011 at 04:41:13PM -0500, Mathieu Desnoyers wrote:
> LTTng needs this symbol exported. It calls it to ensure its tracing
> buffers and allocated data structures never trigger a page fault. This
> is required to handle page fault handler tracing and NMI tracing
> gracefully.
We:
a) don't export symbols unless they have an intree-user
b) especially don't export something as lowlevel as this one.
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 09/11] sched: export task_prio to GPL modules
2011-12-01 21:56 ` Peter Zijlstra
@ 2011-12-01 22:04 ` Mathieu Desnoyers
2011-12-01 22:10 ` Peter Zijlstra
2011-12-01 22:14 ` Greg KH
1 sibling, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2011-12-01 22:04 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Greg KH, devel, lttng-dev, Ingo Molnar, linux-kernel, Darren Hart
* Peter Zijlstra (peterz@infradead.org) wrote:
> On Thu, 2011-12-01 at 16:41 -0500, Mathieu Desnoyers wrote:
> > LTTng needs this symbol to prepend the current task dynamic priority
> > value to events (optional context information).
>
> I absolutely detest exporting such stuff. It propagates the idea that
> task prio actually means something. Also, modules really shouldn't care.
People debugging their SCHED_FIFO/SCHED_RR applications, as well as
users of priority-inheritance futexes, may happen to find this
information extremely useful.
Just saying...
Mathieu
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 09/11] sched: export task_prio to GPL modules
2011-12-01 22:04 ` Mathieu Desnoyers
@ 2011-12-01 22:10 ` Peter Zijlstra
2011-12-01 22:15 ` Mathieu Desnoyers
0 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2011-12-01 22:10 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Greg KH, devel, lttng-dev, Ingo Molnar, linux-kernel, Darren Hart
On Thu, 2011-12-01 at 17:04 -0500, Mathieu Desnoyers wrote:
> * Peter Zijlstra (peterz@infradead.org) wrote:
> > On Thu, 2011-12-01 at 16:41 -0500, Mathieu Desnoyers wrote:
> > > LTTng needs this symbol to prepend the current task dynamic priority
> > > value to events (optional context information).
> >
> > I absolutely detest exporting such stuff. It propagates the idea that
> > task prio actually means something. Also, modules really shouldn't care.
>
> People debugging their SCHED_FIFO/SCHED_RR applications, as well as
> users of priority-inheritance futexes, may happen to find this
> information extremely useful.
>
> Just saying...
Right until the moment we go do deadlines.. Anyway, it still doesn't
make sense, your sched_switch() tracepoint handler gets this
information, why do you need this export at all?
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 01/11] mm: export vmalloc_sync_all symbol to GPL modules
2011-12-01 21:57 ` Christoph Hellwig
@ 2011-12-01 22:13 ` Greg KH
2011-12-01 22:19 ` Mathieu Desnoyers
2011-12-01 22:28 ` Christoph Hellwig
0 siblings, 2 replies; 51+ messages in thread
From: Greg KH @ 2011-12-01 22:13 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Mathieu Desnoyers, devel, lttng-dev, Linus Torvalds,
Christoph Lameter, Tejun Heo, David Howells, David McCullough,
D Jeff Dionne, Greg Ungerer, Paul Mundt, linux-mm, linux-kernel
On Thu, Dec 01, 2011 at 04:57:00PM -0500, Christoph Hellwig wrote:
> On Thu, Dec 01, 2011 at 04:41:13PM -0500, Mathieu Desnoyers wrote:
> > LTTng needs this symbol exported. It calls it to ensure its tracing
> > buffers and allocated data structures never trigger a page fault. This
> > is required to handle page fault handler tracing and NMI tracing
> > gracefully.
>
> We:
>
> a) don't export symbols unless they have an intree-user
lttng is now in-tree in the drivers/staging/ area. See linux-next for
details if you are curious.
> b) especially don't export something as lowlevel as this one.
Mathieu, there's nothing else you can do to get this information? Or
does lttng really want such lowlevel data?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 09/11] sched: export task_prio to GPL modules
2011-12-01 21:56 ` Peter Zijlstra
2011-12-01 22:04 ` Mathieu Desnoyers
@ 2011-12-01 22:14 ` Greg KH
2011-12-01 22:20 ` Mathieu Desnoyers
2011-12-01 23:07 ` Peter Zijlstra
1 sibling, 2 replies; 51+ messages in thread
From: Greg KH @ 2011-12-01 22:14 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Mathieu Desnoyers, devel, lttng-dev, Ingo Molnar, linux-kernel
On Thu, Dec 01, 2011 at 10:56:08PM +0100, Peter Zijlstra wrote:
> On Thu, 2011-12-01 at 16:41 -0500, Mathieu Desnoyers wrote:
> > LTTng needs this symbol to prepend the current task dynamic priority
> > value to events (optional context information).
>
> I absolutely detest exporting such stuff. It propagates the idea that
> task prio actually means something. Also, modules really shouldn't care.
Mathieu, if you don't have this information, does anything really care?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 09/11] sched: export task_prio to GPL modules
2011-12-01 22:10 ` Peter Zijlstra
@ 2011-12-01 22:15 ` Mathieu Desnoyers
2011-12-01 22:36 ` Mathieu Desnoyers
2011-12-01 23:06 ` Peter Zijlstra
0 siblings, 2 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2011-12-01 22:15 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Greg KH, devel, lttng-dev, Ingo Molnar, linux-kernel, Darren Hart
* Peter Zijlstra (peterz@infradead.org) wrote:
> On Thu, 2011-12-01 at 17:04 -0500, Mathieu Desnoyers wrote:
> > * Peter Zijlstra (peterz@infradead.org) wrote:
> > > On Thu, 2011-12-01 at 16:41 -0500, Mathieu Desnoyers wrote:
> > > > LTTng needs this symbol to prepend the current task dynamic priority
> > > > value to events (optional context information).
> > >
> > > I absolutely detest exporting such stuff. It propagates the idea that
> > > task prio actually means something. Also, modules really shouldn't care.
> >
> > People debugging their SCHED_FIFO/SCHED_RR applications, as well as
> > users of priority-inheritance futexes, may happen to find this
> > information extremely useful.
> >
> > Just saying...
>
> Right until the moment we go do deadlines.. Anyway, it still doesn't
> make sense, your sched_switch() tracepoint handler gets this
> information, why do you need this export at all?
If you don't want to trace sched_switch, but just conveniently prepend
this information to all your events, then lttng lets you dynamically
target this extra bit of information. Note that it's not a mandatory
event field: I call those "context" fields that the tracer prepends to
events, as requested by the user.
Thanks,
Mathieu
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 01/11] mm: export vmalloc_sync_all symbol to GPL modules
2011-12-01 22:13 ` Greg KH
@ 2011-12-01 22:19 ` Mathieu Desnoyers
2011-12-01 22:41 ` Greg KH
2011-12-01 22:28 ` Christoph Hellwig
1 sibling, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2011-12-01 22:19 UTC (permalink / raw)
To: Greg KH
Cc: Christoph Hellwig, devel, lttng-dev, Linus Torvalds,
Christoph Lameter, Tejun Heo, David Howells, David McCullough,
D Jeff Dionne, Greg Ungerer, Paul Mundt, linux-mm, linux-kernel
* Greg KH (greg@kroah.com) wrote:
> On Thu, Dec 01, 2011 at 04:57:00PM -0500, Christoph Hellwig wrote:
> > On Thu, Dec 01, 2011 at 04:41:13PM -0500, Mathieu Desnoyers wrote:
> > > LTTng needs this symbol exported. It calls it to ensure its tracing
> > > buffers and allocated data structures never trigger a page fault. This
> > > is required to handle page fault handler tracing and NMI tracing
> > > gracefully.
> >
> > We:
> >
> > a) don't export symbols unless they have an intree-user
>
> lttng is now in-tree in the drivers/staging/ area. See linux-next for
> details if you are curious.
>
> > b) especially don't export something as lowlevel as this one.
>
> Mathieu, there's nothing else you can do to get this information? Or
> does lttng really want such lowlevel data?
LTTng calls vmalloc_sync_all() to make sure it won't crash the system
(due to recursive page fault) when hooking on the page fault handler and
on any hook that would happen to sit in a function hit by NMI context.
So it really goes beyond just extracting information for this one I'm
afraid: it's a matter of execution correctness.
This is a point I'm really anal about: the tracer should _never_ crash
the traced system, _ever_, in any foreseeable condition.
Thanks,
Mathieu
>
> thanks,
>
> greg k-h
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 09/11] sched: export task_prio to GPL modules
2011-12-01 22:14 ` Greg KH
@ 2011-12-01 22:20 ` Mathieu Desnoyers
2011-12-01 23:07 ` Peter Zijlstra
1 sibling, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2011-12-01 22:20 UTC (permalink / raw)
To: Greg KH; +Cc: Peter Zijlstra, devel, lttng-dev, Ingo Molnar, linux-kernel
* Greg KH (greg@kroah.com) wrote:
> On Thu, Dec 01, 2011 at 10:56:08PM +0100, Peter Zijlstra wrote:
> > On Thu, 2011-12-01 at 16:41 -0500, Mathieu Desnoyers wrote:
> > > LTTng needs this symbol to prepend the current task dynamic priority
> > > value to events (optional context information).
> >
> > I absolutely detest exporting such stuff. It propagates the idea that
> > task prio actually means something. Also, modules really shouldn't care.
>
> Mathieu, if you don't have this information, does anything really care?
I can just remove this specific context module, nothing else will care
except the end users, but it's a shame to lose this option.
Thanks,
Mathieu
>
> thanks,
>
> greg k-h
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 01/11] mm: export vmalloc_sync_all symbol to GPL modules
2011-12-01 22:13 ` Greg KH
2011-12-01 22:19 ` Mathieu Desnoyers
@ 2011-12-01 22:28 ` Christoph Hellwig
2011-12-01 23:00 ` Greg KH
1 sibling, 1 reply; 51+ messages in thread
From: Christoph Hellwig @ 2011-12-01 22:28 UTC (permalink / raw)
To: Greg KH
Cc: Christoph Hellwig, Mathieu Desnoyers, devel, lttng-dev,
Linus Torvalds, Christoph Lameter, Tejun Heo, David Howells,
David McCullough, D Jeff Dionne, Greg Ungerer, Paul Mundt,
linux-mm, linux-kernel
On Thu, Dec 01, 2011 at 02:13:37PM -0800, Greg KH wrote:
> On Thu, Dec 01, 2011 at 04:57:00PM -0500, Christoph Hellwig wrote:
> > On Thu, Dec 01, 2011 at 04:41:13PM -0500, Mathieu Desnoyers wrote:
> > > LTTng needs this symbol exported. It calls it to ensure its tracing
> > > buffers and allocated data structures never trigger a page fault. This
> > > is required to handle page fault handler tracing and NMI tracing
> > > gracefully.
> >
> > We:
> >
> > a) don't export symbols unless they have an intree-user
>
> lttng is now in-tree in the drivers/staging/ area. See linux-next for
> details if you are curious.
Eww - merging stuff without discussion on lkml is more than evil.
Either way, it was guaranteed that drivers/staging is considered out of
tree for core code. I'm definitively dead set against exporting anything
for staging and opening that slippery slope.
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 09/11] sched: export task_prio to GPL modules
2011-12-01 22:15 ` Mathieu Desnoyers
@ 2011-12-01 22:36 ` Mathieu Desnoyers
2011-12-01 23:05 ` Peter Zijlstra
2011-12-01 23:06 ` Peter Zijlstra
1 sibling, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2011-12-01 22:36 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Greg KH, devel, lttng-dev, Ingo Molnar, linux-kernel, Darren Hart
* Mathieu Desnoyers (mathieu.desnoyers@efficios.com) wrote:
> * Peter Zijlstra (peterz@infradead.org) wrote:
> > On Thu, 2011-12-01 at 17:04 -0500, Mathieu Desnoyers wrote:
> > > * Peter Zijlstra (peterz@infradead.org) wrote:
> > > > On Thu, 2011-12-01 at 16:41 -0500, Mathieu Desnoyers wrote:
> > > > > LTTng needs this symbol to prepend the current task dynamic priority
> > > > > value to events (optional context information).
> > > >
> > > > I absolutely detest exporting such stuff. It propagates the idea that
> > > > task prio actually means something. Also, modules really shouldn't care.
> > >
> > > People debugging their SCHED_FIFO/SCHED_RR applications, as well as
> > > users of priority-inheritance futexes, may happen to find this
> > > information extremely useful.
> > >
> > > Just saying...
> >
> > Right until the moment we go do deadlines.. Anyway, it still doesn't
> > make sense, your sched_switch() tracepoint handler gets this
> > information, why do you need this export at all?
>
> If you don't want to trace sched_switch, but just conveniently prepend
> this information to all your events, then lttng lets you dynamically
> target this extra bit of information. Note that it's not a mandatory
> event field: I call those "context" fields that the tracer prepends to
> events, as requested by the user.
One more point:
compudj@thinkos:/proc/204$ cat sched
khubd (204, #threads: 1)
---------------------------------------------------------
se.exec_start : 3355267.749529
se.vruntime : 113843.899081
se.sum_exec_runtime : 12.820702
nr_switches : 386
nr_voluntary_switches : 385
nr_involuntary_switches : 1
se.load.weight : 1024
policy : 0
prio : 120
clock-delta : 130
So what you are saying is that it is fine to export task_prio to
_userspace_, thus making it part of the ABI, but it's not OK to export
it to GPL modules ?
Weird huh ?
Mathieu
>
> Thanks,
>
> Mathieu
>
>
> --
> Mathieu Desnoyers
> Operating System Efficiency R&D Consultant
> EfficiOS Inc.
> http://www.efficios.com
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 01/11] mm: export vmalloc_sync_all symbol to GPL modules
2011-12-01 22:19 ` Mathieu Desnoyers
@ 2011-12-01 22:41 ` Greg KH
0 siblings, 0 replies; 51+ messages in thread
From: Greg KH @ 2011-12-01 22:41 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Christoph Hellwig, devel, lttng-dev, Linus Torvalds,
Christoph Lameter, Tejun Heo, David Howells, David McCullough,
D Jeff Dionne, Greg Ungerer, Paul Mundt, linux-mm, linux-kernel
On Thu, Dec 01, 2011 at 05:19:40PM -0500, Mathieu Desnoyers wrote:
> * Greg KH (greg@kroah.com) wrote:
> > On Thu, Dec 01, 2011 at 04:57:00PM -0500, Christoph Hellwig wrote:
> > > On Thu, Dec 01, 2011 at 04:41:13PM -0500, Mathieu Desnoyers wrote:
> > > > LTTng needs this symbol exported. It calls it to ensure its tracing
> > > > buffers and allocated data structures never trigger a page fault. This
> > > > is required to handle page fault handler tracing and NMI tracing
> > > > gracefully.
> > >
> > > We:
> > >
> > > a) don't export symbols unless they have an intree-user
> >
> > lttng is now in-tree in the drivers/staging/ area. See linux-next for
> > details if you are curious.
> >
> > > b) especially don't export something as lowlevel as this one.
> >
> > Mathieu, there's nothing else you can do to get this information? Or
> > does lttng really want such lowlevel data?
>
> LTTng calls vmalloc_sync_all() to make sure it won't crash the system
> (due to recursive page fault) when hooking on the page fault handler and
> on any hook that would happen to sit in a function hit by NMI context.
> So it really goes beyond just extracting information for this one I'm
> afraid: it's a matter of execution correctness.
Ok, fair enough.
Christoph, is there any other way to achieve something like this without
this symbol being exported that you know of?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 01/11] mm: export vmalloc_sync_all symbol to GPL modules
2011-12-01 22:28 ` Christoph Hellwig
@ 2011-12-01 23:00 ` Greg KH
0 siblings, 0 replies; 51+ messages in thread
From: Greg KH @ 2011-12-01 23:00 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Mathieu Desnoyers, devel, lttng-dev, Linus Torvalds,
Christoph Lameter, Tejun Heo, David Howells, David McCullough,
D Jeff Dionne, Greg Ungerer, Paul Mundt, linux-mm, linux-kernel
On Thu, Dec 01, 2011 at 05:28:03PM -0500, Christoph Hellwig wrote:
> On Thu, Dec 01, 2011 at 02:13:37PM -0800, Greg KH wrote:
> > On Thu, Dec 01, 2011 at 04:57:00PM -0500, Christoph Hellwig wrote:
> > > On Thu, Dec 01, 2011 at 04:41:13PM -0500, Mathieu Desnoyers wrote:
> > > > LTTng needs this symbol exported. It calls it to ensure its tracing
> > > > buffers and allocated data structures never trigger a page fault. This
> > > > is required to handle page fault handler tracing and NMI tracing
> > > > gracefully.
> > >
> > > We:
> > >
> > > a) don't export symbols unless they have an intree-user
> >
> > lttng is now in-tree in the drivers/staging/ area. See linux-next for
> > details if you are curious.
>
> Eww - merging stuff without discussion on lkml is more than evil.
Do you really want to discuss all staging driver crap on lkml?
Core changes, like this one, for stuff in staging should be done on
lkml, which is what this conversation is :)
> Either way, it was guaranteed that drivers/staging is considered out of
> tree for core code.
The zram and zcache code would tend to disagree with you there :)
> I'm definitively dead set against exporting anything for staging and
> opening that slippery slope.
How else should we handle something like this then? Some code, this one
specifically, is trying to get merged, so taking it slowly, through
staging, and getting it reviewed and cleaned up better before it can go
into the "real" part of the kernel, is the whole goal here.
Here's a real need for a symbol that an existing, shipping, useful
kernel module is wanting to use.
If you can provide a way that this can be handled without such an
export, that does not require digging through the symbol table (which is
what it was doing and I rightfully objected to that), then please let us
know.
Otherwise, what are our alternatives here, to just forbid this code from
ever being merged?
thanks,
greg k-h
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 09/11] sched: export task_prio to GPL modules
2011-12-01 22:36 ` Mathieu Desnoyers
@ 2011-12-01 23:05 ` Peter Zijlstra
2011-12-02 13:51 ` Mathieu Desnoyers
0 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2011-12-01 23:05 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Greg KH, devel, lttng-dev, Ingo Molnar, linux-kernel, Darren Hart
On Thu, 2011-12-01 at 17:36 -0500, Mathieu Desnoyers wrote:
> So what you are saying is that it is fine to export task_prio to
> _userspace_, thus making it part of the ABI, but it's not OK to export
> it to GPL modules ?
that's a SCHED_DEBUG proc file.
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 09/11] sched: export task_prio to GPL modules
2011-12-01 22:15 ` Mathieu Desnoyers
2011-12-01 22:36 ` Mathieu Desnoyers
@ 2011-12-01 23:06 ` Peter Zijlstra
2011-12-01 23:18 ` Greg KH
1 sibling, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2011-12-01 23:06 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Greg KH, devel, lttng-dev, Ingo Molnar, linux-kernel, Darren Hart
On Thu, 2011-12-01 at 17:15 -0500, Mathieu Desnoyers wrote:
>
> If you don't want to trace sched_switch, but just conveniently prepend
> this information to all your events
Oh so you want to debug a scheduler issue but don't want to use the
scheduler tracepoint, I guess that makes perfect sense for clueless
people.
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 09/11] sched: export task_prio to GPL modules
2011-12-01 22:14 ` Greg KH
2011-12-01 22:20 ` Mathieu Desnoyers
@ 2011-12-01 23:07 ` Peter Zijlstra
2011-12-01 23:17 ` Greg KH
1 sibling, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2011-12-01 23:07 UTC (permalink / raw)
To: Greg KH; +Cc: Mathieu Desnoyers, devel, lttng-dev, Ingo Molnar, linux-kernel
On Thu, 2011-12-01 at 14:14 -0800, Greg KH wrote:
> greg k-h
Greg, why are you merging this crap anyway? Aren't there enough tracer
thingies around already?
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 09/11] sched: export task_prio to GPL modules
2011-12-01 23:07 ` Peter Zijlstra
@ 2011-12-01 23:17 ` Greg KH
2011-12-05 14:17 ` Ingo Molnar
0 siblings, 1 reply; 51+ messages in thread
From: Greg KH @ 2011-12-01 23:17 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Mathieu Desnoyers, devel, lttng-dev, Ingo Molnar, linux-kernel
On Fri, Dec 02, 2011 at 12:07:10AM +0100, Peter Zijlstra wrote:
> On Thu, 2011-12-01 at 14:14 -0800, Greg KH wrote:
> > greg k-h
>
> Greg, why are you merging this crap anyway? Aren't there enough tracer
> thingies around already?
I don't know, are there?
There's some reason the distros, and users, still use lttng, so I'm
guessing that it fits the needs of quite a few people.
That's why I'm merging it. If the in-kernel stuff obsoletes lttng,
great, let me and the distros know.
thanks,
greg k-h
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 09/11] sched: export task_prio to GPL modules
2011-12-01 23:06 ` Peter Zijlstra
@ 2011-12-01 23:18 ` Greg KH
2011-12-01 23:47 ` Mathieu Desnoyers
0 siblings, 1 reply; 51+ messages in thread
From: Greg KH @ 2011-12-01 23:18 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Mathieu Desnoyers, devel, lttng-dev, Ingo Molnar, linux-kernel,
Darren Hart
On Fri, Dec 02, 2011 at 12:06:37AM +0100, Peter Zijlstra wrote:
> On Thu, 2011-12-01 at 17:15 -0500, Mathieu Desnoyers wrote:
> >
> > If you don't want to trace sched_switch, but just conveniently prepend
> > this information to all your events
>
> Oh so you want to debug a scheduler issue but don't want to use the
> scheduler tracepoint, I guess that makes perfect sense for clueless
> people.
Mathieu, can't lttng use the scheduler tracepoint for this information?
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 09/11] sched: export task_prio to GPL modules
2011-12-01 23:18 ` Greg KH
@ 2011-12-01 23:47 ` Mathieu Desnoyers
0 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2011-12-01 23:47 UTC (permalink / raw)
To: Greg KH
Cc: Peter Zijlstra, devel, lttng-dev, Ingo Molnar, linux-kernel, Darren Hart
* Greg KH (greg@kroah.com) wrote:
> On Fri, Dec 02, 2011 at 12:06:37AM +0100, Peter Zijlstra wrote:
> > On Thu, 2011-12-01 at 17:15 -0500, Mathieu Desnoyers wrote:
> > >
> > > If you don't want to trace sched_switch, but just conveniently prepend
> > > this information to all your events
> >
> > Oh so you want to debug a scheduler issue but don't want to use the
> > scheduler tracepoint, I guess that makes perfect sense for clueless
> > people.
>
> Mathieu, can't lttng use the scheduler tracepoint for this information?
LTTng allows users to choose between two methods, each one being suited
to a particular use of the tracer:
A) Extraction through the scheduler tracepoint:
LTTng viewers have a full-fledged current state reconstruction of the
traced OS (for any point in time during the trace) performed as one
of the bottom layers of our trace analysis tools. This makes sense
for use-cases where the data needs to be transported, and/or stored,
and where the amount of data throughput needs to be minimized. We use
this technique a lot, of course. This state-tracking requires
CPU/memory resource usage by the viewer.
B) Extraction through "optional" event context information:
We have, in development, a new "enhanced top" called lttngtop that
uses tracing information, directly read from mmap'd buffers, to
provide second-by-second profile information of the system. It is
not as sensitive to data compactness as the transport/disk storage
use-case, mainly because no data copy is ever required -- the buffers
simply get overwritten after lttngtop has finished aggregating the
information. This has less performance overhead than the big hammer
"top" that periodically reads all files in /proc, and can provide
much more detailed profiles.
This use-case favors sending additional data from kernel to
user-space rather than recomputing the OS state within lttngtop, due
to the very low overhead of direct mmap data transport, over
recomputing state needlessly.
We could very well "cheat" and use a scheduler tracepoint to keep a
duplicate of the current priority value for each CPU within the tracer
kernel module. Let me know if you want me to do this.
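As a rough illustration of that approach, here is a user-space sketch,
not LTTng code. All names in it (NR_CPUS_SKETCH, struct task_sketch,
on_sched_switch(), context_prio()) are hypothetical, and MAX_RT_PRIO is
assumed to be 100. A real module would register a probe on the
sched_switch tracepoint and use per-CPU variables rather than a plain
array.

```c
#include <assert.h>

#define NR_CPUS_SKETCH 4   /* hypothetical CPU count for the sketch */
#define MAX_RT_PRIO    100 /* assumed value of the kernel constant */

/* Hypothetical stand-in for the fields of task_struct we care about. */
struct task_sketch {
	int pid;
	int prio; /* dynamic priority, as stored in p->prio */
};

/* One cached value per CPU, owned by the tracer module. */
static int cached_prio[NR_CPUS_SKETCH];

/*
 * Would be called from the sched_switch probe: remember the priority
 * of the task being switched in, applying the same offset that
 * task_prio() applies.
 */
void on_sched_switch(int cpu, const struct task_sketch *prev,
		     const struct task_sketch *next)
{
	(void)prev; /* the outgoing task is not needed here */
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	cached_prio[cpu] = next->prio - MAX_RT_PRIO;
}

/* Context-field callback: read the cached value, no export needed. */
int context_prio(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	return cached_prio[cpu];
}
```

The trade-off is that the cached value is only as fresh as the last
context switch on that CPU, so a priority change of the running task
(for instance through priority inheritance) would not show up until
the next switch.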
Also, as a matter of fact, the "prio" information exported from the
sched_switch event in mainline trace events does not match the prio
shown in /proc stat files. The "MAX_RT_PRIO" offset is missing.
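To make the offset concrete, here is a minimal user-space sketch. It
assumes MAX_RT_PRIO is 100; task_prio_sketch() is a hypothetical
stand-in that mirrors the one-line body of task_prio() from patch
09/11, taking the raw prio value instead of a task_struct pointer.

```c
#include <assert.h>

#define MAX_RT_PRIO 100 /* assumed value of the kernel constant */

/*
 * Mirrors "return p->prio - MAX_RT_PRIO;" from task_prio().
 * raw_prio is the value a sched_switch event would carry.
 */
int task_prio_sketch(int raw_prio)
{
	assert(raw_prio >= 0);
	return raw_prio - MAX_RT_PRIO;
}
```

With the raw value 120 shown in the /proc/204/sched output earlier in
this thread, the function returns 20, which is the form the /proc stat
files use. A consumer comparing the two sources has to apply this
offset itself.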
Thanks,
Mathieu
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH 03/11] fs/splice: export splice_to_pipe to GPL modules
2011-12-01 21:41 ` [PATCH 03/11] fs/splice: export splice_to_pipe " Mathieu Desnoyers
@ 2011-12-02 7:19 ` Jens Axboe
2011-12-02 12:32 ` Mathieu Desnoyers
0 siblings, 1 reply; 51+ messages in thread
From: Jens Axboe @ 2011-12-02 7:19 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Greg KH, devel, lttng-dev, Linus Torvalds, Ingo Molnar,
Jens Axboe, linux-kernel
On 2011-12-01 22:41, Mathieu Desnoyers wrote:
> The LTTng driver needs this symbol exported because it implements its
> own splice actor.
>
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> CC: Linus Torvalds <torvalds@linux-foundation.org>
> CC: Ingo Molnar <mingo@elte.hu>
> CC: Jens Axboe <axboe@kernel.dk>
> CC: linux-kernel@vger.kernel.org
> CC: Greg KH <greg@kroah.com>
> ---
> fs/splice.c | 1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/fs/splice.c b/fs/splice.c
> index fa2defa..9eb15b5 100644
> --- a/fs/splice.c
> +++ b/fs/splice.c
> @@ -263,6 +263,7 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
>
> return ret;
> }
> +EXPORT_SYMBOL_GPL(splice_to_pipe);
The rest of the splice symbols are regular exports, please do the same
for this one. Thanks.
--
Jens Axboe
* Re: [PATCH 03/11] fs/splice: export splice_to_pipe to GPL modules
2011-12-02 7:19 ` Jens Axboe
@ 2011-12-02 12:32 ` Mathieu Desnoyers
0 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2011-12-02 12:32 UTC (permalink / raw)
To: Jens Axboe
Cc: Greg KH, devel, lttng-dev, Linus Torvalds, Ingo Molnar,
Jens Axboe, linux-kernel
* Jens Axboe (jens@axboe.dk) wrote:
> On 2011-12-01 22:41, Mathieu Desnoyers wrote:
> > The LTTng driver needs this symbol exported because it implements its
> > own splice actor.
> >
> > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > CC: Linus Torvalds <torvalds@linux-foundation.org>
> > CC: Ingo Molnar <mingo@elte.hu>
> > CC: Jens Axboe <axboe@kernel.dk>
> > CC: linux-kernel@vger.kernel.org
> > CC: Greg KH <greg@kroah.com>
> > ---
> > fs/splice.c | 1 +
> > 1 files changed, 1 insertions(+), 0 deletions(-)
> >
> > diff --git a/fs/splice.c b/fs/splice.c
> > index fa2defa..9eb15b5 100644
> > --- a/fs/splice.c
> > +++ b/fs/splice.c
> > @@ -263,6 +263,7 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
> >
> > return ret;
> > }
> > +EXPORT_SYMBOL_GPL(splice_to_pipe);
>
> The rest of the splice symbols are regular exports, please do the same
> for this one. Thanks.
I've been wondering about this one, but thought it would be better to
let you decide on opening up the symbol more than with _GPL. Will do!
Thanks,
Mathieu
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
* Re: [PATCH 09/11] sched: export task_prio to GPL modules
2011-12-01 23:05 ` Peter Zijlstra
@ 2011-12-02 13:51 ` Mathieu Desnoyers
0 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2011-12-02 13:51 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Greg KH, devel, lttng-dev, Ingo Molnar, linux-kernel, Darren Hart
* Peter Zijlstra (peterz@infradead.org) wrote:
> On Thu, 2011-12-01 at 17:36 -0500, Mathieu Desnoyers wrote:
> > So what you are saying is that it is fine to export task_prio to
> > _userspace_, thus making it part of the ABI, but it's not OK to export
> > it to GPL modules ?
>
> that's a SCHED_DEBUG proc file.
Fair point. You'll then notice that /proc/<pid>/stat (18th field)
exports it too, and it's not under SCHED_DEBUG:
ok:/proc/20# cat stat
20 (migration/5) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 -100 0 1 0 70 0
0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744071579371389 0
0 17 5 99 1 0 0 0
(see -100 above)
as defined in Documentation/filesystems/proc.txt:
"Table 1-4: Contents of the stat files (as of 2.6.30-rc7)
[...]
priority priority level"
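As a rough illustration of reading that field from user-space, the
sketch below pulls the 18th field out of one stat line. The function
name stat_line_priority is hypothetical, and it assumes the field layout
of Table 1-4 quoted above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Extract the "priority" value (18th field) from one /proc/<pid>/stat line.
 * Returns 0 on success and stores the value in *prio, -1 on malformed input.
 * The comm field (2nd) may itself contain spaces or parentheses, so scanning
 * resumes after the last ')' instead of counting spaces from the start.
 */
int stat_line_priority(const char *line, long *prio)
{
	const char *p;
	char *end;
	int field;

	assert(line != NULL && prio != NULL);

	p = strrchr(line, ')');
	if (p == NULL)
		return -1;
	p++;	/* now just past the comm field, before field 3 (state) */

	/* Skip fields 3..17; each one is preceded by whitespace. */
	for (field = 3; field < 18; field++) {
		p += strspn(p, " \n");
		if (*p == '\0')
			return -1;
		p += strcspn(p, " \n");
	}

	p += strspn(p, " \n");
	*prio = strtol(p, &end, 10);
	return end == p ? -1 : 0;
}
```

Fed the migration/5 line shown above, it stores -100 in *prio.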
Thanks,
Mathieu
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
* Re: [PATCH 09/11] sched: export task_prio to GPL modules
2011-12-01 23:17 ` Greg KH
@ 2011-12-05 14:17 ` Ingo Molnar
2011-12-06 21:44 ` Greg KH
2011-12-07 22:57 ` [PATCH 09/11] sched: export task_prio to GPL modules Mathieu Desnoyers
0 siblings, 2 replies; 51+ messages in thread
From: Ingo Molnar @ 2011-12-05 14:17 UTC (permalink / raw)
To: Greg KH
Cc: Peter Zijlstra, Mathieu Desnoyers, devel, lttng-dev,
linux-kernel, Linus Torvalds, Andrew Morton
* Greg KH <greg@kroah.com> wrote:
> On Fri, Dec 02, 2011 at 12:07:10AM +0100, Peter Zijlstra wrote:
> > On Thu, 2011-12-01 at 14:14 -0800, Greg KH wrote:
> > > greg k-h
> >
> > Greg, why are you merging this crap anyway? Aren't there enough tracer
> > thingies around already?
>
> I don't know, is there?
>
> There's some reason the distros, and users, still use lttng,
> so I'm guessing that it fits the needs of quite a few people.
Same goes for a whole lot of other crap that distros are
carrying. Would we want to merge a different CPU scheduler or
the 4g:4g patch or a completely new networking stack into
drivers/staging/? I don't think so.
I.e. putting LTTNG into drivers/staging/ will not really solve
anything - and it may in fact delay any sane technical
resolution:
There's a difference between a driver that has to go into
drivers/staging/ because nobody cares enough [and the driver
isnt high quality enough yet], and a core kernel feature that we
DO care about and which HAS BEEN REJECTED IN ITS FORM.
> That's why I'm merging it, if the in-kernel stuff
> obsoletes lttng, great, let me, and the distros know.
I'm NAK-ing the LTTNG driver really, as it's a workaround for a
core kernel NAK.
Mathieu, please work with the tracing folks who DO care about
this stuff. It's not like there's a lack of interest in this
area, nor is there a lack of willingness to take patches. What
there is a lack of is your willingness to actually work on
getting something unified, integrated to users...
LTTNG has been going on for how many years? I havent seen many
steps towards actually *merging* its functionality - you insist
on doing your own random thing, which is different in random
ways. Yes, some of those random ways may in fact be better than
what we have upstream - would you be interested in filtering
those out and pushing them upstream? I certainly would like to
see that happen.
We want to pick the best features, and throw away current
upstream code in favor of superior out of tree code - this
concept of letting crap sit alongside each other when people do
care i cannot agree with.
Thanks,
Ingo
* Re: [PATCH 09/11] sched: export task_prio to GPL modules
2011-12-05 14:17 ` Ingo Molnar
@ 2011-12-06 21:44 ` Greg KH
2011-12-08 5:23 ` Ingo Molnar
2011-12-07 22:57 ` [PATCH 09/11] sched: export task_prio to GPL modules Mathieu Desnoyers
1 sibling, 1 reply; 51+ messages in thread
From: Greg KH @ 2011-12-06 21:44 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, Mathieu Desnoyers, devel, lttng-dev,
linux-kernel, Linus Torvalds, Andrew Morton
On Mon, Dec 05, 2011 at 03:17:49PM +0100, Ingo Molnar wrote:
>
> * Greg KH <greg@kroah.com> wrote:
>
> > On Fri, Dec 02, 2011 at 12:07:10AM +0100, Peter Zijlstra wrote:
> > > On Thu, 2011-12-01 at 14:14 -0800, Greg KH wrote:
> > > > greg k-h
> > >
> > > Greg, why are you merging this crap anyway? Aren't there enough tracer
> > > thingies around already?
> >
> > I don't know, is there?
> >
> > There's some reason the distros, and users, still use lttng,
> > so I'm guessing that it fits the needs of quite a few people.
>
> Same goes for a whole lot of other crap that distros are
> carrying. Would we want to merge a different CPU scheduler or
> the 4g:4g patch or a completely new networking stack into
> drivers/staging/? I don't think so.
Distros have new CPU schedulers and are still dragging the 4g split
around? A whole new networking stack would be interesting, and if
self-contained, possible :)
> I.e. putting LTTNG into drivers/staging/ will not really solve
> anything - and it may in fact delay any sane technical
> resolution:
>
> There's a difference between a driver that has to go into
> drivers/staging/ because nobody cares enough [and the driver
> isnt high quality enough yet], and a core kernel feature that we
> DO care about and which HAS BEEN REJECTED IN ITS FORM.
I didn't realize that lttng was rejected, when was that done? I
couldn't find it in the archives anywhere.
That's why I took this. It's a way for the code to get cleaned up, and
into "mergable" state, much easier, with more help than if it was
out-of-tree. The fact that distros have been shipping and relying on it
for years shows that it is something that is needed, and it being
self-contained, makes it eligible for the staging tree.
> > That's why I'm merging it, if the in-kernel stuff
> > obsoletes lttng, great, let me, and the distros know.
>
> I'm NAK-ing the LTTNG driver really, as it's a workaround for a
> core kernel NAK.
Huh?
> Mathieu, please work with the tracing folks who DO care about
> this stuff. It's not like there's a lack of interest in this
> area, nor is there a lack of willingness to take patches. What
> there is a lack of is your willingness to actually work on
> getting something unified, integrated to users...
>
> LTTNG has been going on for how many years? I havent seen many
> steps towards actually *merging* its functionality - you insist
> on doing your own random thing, which is different in random
> ways. Yes, some of those random ways may in fact be better than
> what we have upstream - would you be interested in filtering
> those out and pushing them upstream? I certainly would like to
> see that happen.
>
> We want to pick the best features, and throw away current
> upstream code in favor of superior out of tree code - this
> concept of letting crap sit alongside each other when people do
> care i cannot agree with.
Mathieu, a good explanation of what lttng has that the in-kernel
tracing and perf don't have would be a good place to start.
thanks,
greg k-h
* Re: [PATCH 09/11] sched: export task_prio to GPL modules
2011-12-05 14:17 ` Ingo Molnar
2011-12-06 21:44 ` Greg KH
@ 2011-12-07 22:57 ` Mathieu Desnoyers
2011-12-08 5:40 ` Ingo Molnar
1 sibling, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2011-12-07 22:57 UTC (permalink / raw)
To: Ingo Molnar
Cc: Greg KH, Peter Zijlstra, devel, lttng-dev, linux-kernel,
Linus Torvalds, Andrew Morton, Thomas Gleixner, Steven Rostedt,
Frederic Weisbecker
Hi Ingo,
* Ingo Molnar (mingo@elte.hu) wrote:
[...]
> Mathieu, please work with the tracing folks who DO care about
> this stuff. It's not like there's a lack of interest in this
> area, nor is there a lack of willingness to take patches. What
> there is a lack of is your willingness to actually work on
> getting something unified, integrated to users...
>
> LTTNG has been going on for how many years? I havent seen many
> steps towards actually *merging* its functionality - you insist
> on doing your own random thing, which is different in random
> ways. Yes, some of those random ways may in fact be better than
> what we have upstream - would you be interested in filtering
> those out and pushing them upstream? I certainly would like to
> see that happen.
>
> We want to pick the best features, and throw away current
> upstream code in favor of superior out of tree code - this
> concept of letting crap sit alongside each other when people do
> care i cannot agree with.
LTTng 2.0, today, offers a unified interface for kernel and userspace
tracing, in the form of libraries and git-alike command line user
interface. It produces a trace format (CTF) that has been developed in
collaboration with hardware vendors and reviewed by tracing developers
of the Linux community, which allows analyzing correlated traces across
the software and hardware stacks, and supports being streamed over the
network with zero-copy in both TCP and UDP formats, with optional
encryption, checksum, and more. It supports multiple concurrent users,
and hooks with tracepoints, Perf PMU counters, kprobes, kretprobes, and
system calls, with the ability to attach "context" information prepended
before each event record as selected by the user when setting up a
tracing session.
It is currently self-contained: it's been designed to be shipped as a
stand-alone set of self-contained modules, but I recently received the
offer to get it pulled into staging, which I accepted.
In my opinion, tracers need to be split into three distinct parts:
1) core tracing infrastructure that _needs to_ be shared. This mainly
targets instrumentation, and I've done my share of contribution to
mainline on this front already. I think the infrastructure we have
today is in pretty good shape.
2) tracing infrastructure that _could_ be shared. I'm mostly targeting ring
buffers and trace clocks there. It could be a nice-to-have to share the
implementation, as long as it does not get in the way of what each
project is trying to achieve. So far, what I noticed is that each
project is lacking understanding of the intent and constraints of the
other projects, thus either considering what the others are doing
as over- or under-engineering, depending on the context. Therefore,
as long as there is no agreement on the right amount of care that
needs to be put in the design of these components, it might be best
to duplicate the implementation and slowly converge as each project
gets to understand the other project's constraints. To make progress
on this front, you need to have both code-bases into mainline.
3) interfaces to user-space: very much like filesystems, these ABIs
don't need to be shared across projects that have different
use-cases. Having multiple tracer ABIs, if self-contained, should
not hurt anybody and just increase the rate of innovation. Sadly,
the ABIs exposed by perf/ftrace do not seem to be a good fit for
LTTng use-cases. Since the perf/ftrace ABIs, as well as the LTTng
ABI, are all already used by many tools, it will likely be really
difficult to change them overnight.
As an example of where we could benefit from working together, LTTng is
currently using a shadow copy of the TRACE_EVENT macros, because
the upstream version is quite limiting with respect to generating
compact probe code. It could be good to integrate those changes
upstream, and I think the best way to achieve this is if the perf and
ftrace developers can have a look at the approach taken by LTTng to
achieve this -- which is better done if LTTng is merged into staging.
Another example is how LTTng extracts system call argument types, which
is performed by generating a TRACE_EVENT description of the system call
table with a script. We could definitely help each other out in this
area.
There are certainly many other areas where we could eventually benefit
from working together, listed above as #2 "tracing infrastructure that
_could_ be shared", but I think it is better to first focus on the core
infrastructure that we need to share before getting into the territory
of the infrastructure we could share if we took the time to understand each
other's requirements fully first. Meanwhile, having a duplicated
implementation of these parts that "could" be shared should not hurt
anyone -- it would even help understanding each other --, as long as
they stay self-contained.
In summary, I'm really open to helping out on common pieces of
infrastructure, but for that they need to take into account both the
current perf/ftrace use-cases and the LTTng use-cases.
Best regards,
Mathieu
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
* Re: [PATCH 09/11] sched: export task_prio to GPL modules
2011-12-06 21:44 ` Greg KH
@ 2011-12-08 5:23 ` Ingo Molnar
2011-12-08 23:27 ` Greg KH
0 siblings, 1 reply; 51+ messages in thread
From: Ingo Molnar @ 2011-12-08 5:23 UTC (permalink / raw)
To: Greg KH
Cc: Peter Zijlstra, Mathieu Desnoyers, devel, lttng-dev,
linux-kernel, Linus Torvalds, Andrew Morton, Thomas Gleixner
* Greg KH <greg@kroah.com> wrote:
> > Same goes for a whole lot of other crap that distros are
> > carrying. Would we want to merge a different CPU scheduler
> > or the 4g:4g patch or a completely new networking stack into
> > drivers/staging/? I don't think so.
>
> Distros have new CPU schedulers and are still dragging the 4g
> split around? A whole new networking stack would be
> interesting, and if self-contained, possible :)
The point being, there's legitimate reasons to refuse crap to an
area that *people care about* in a constructive manner.
There's no rejection of LTTNG in the "hey, go away, you are
doing it wrong" fashion - we are not holding a monopoly on how
instrumentation is supposed to be done and we've been wrong
before.
There's a highly constructive, open attitude towards LTTNG and
has been for years:
" Mathieu, please split it up and integrate/unify it with the
existing instrumentation features of Linux - and if it
replaces existing stuff because an LTTNG component is
superior then so be it. "
Let me repeat it: there's no lack of willingness of cooperation
from the kernel instrumentation subsystem side. There's a lack
of movement from Mathieu - *he* is keeping LTTNG fragmented for
barely justifiable technological reasons.
Thus there's absolutely no forward movement from having this in
drivers/staging/ - in fact there's backwards movement: yet
another instrumentation gadget with its own separate ABI and
highly overlapping functionality, plus even less incentive for
it to cooperate...
It is not the typical drivers/staging/ situation where there's
either lack of work on a piece of code or some fundamental
disagreement about the right model. LTTNG has been
*intentionally* kept a separate entity, a separate brand, for
whatever non-technical reasons. How will drivers/staging/ change
that? It won't. It's a bit like VirtualBox really.
In short: this move only *increases* the incentive for LTTNG to
stay fragmented and/or force modularization crap like the highly
unfortunate situation of security modules ...
> > I.e. putting LTTNG into drivers/staging/ will not really
> > solve anything - and it may in fact delay any sane technical
> > resolution:
> >
> > There's a difference between a driver that has to go into
> > drivers/staging/ because nobody cares enough [and the driver
> > isnt high quality enough yet], and a core kernel feature
> > that we DO care about and which HAS BEEN REJECTED IN ITS
> > FORM.
>
> I didn't realize that lttng was rejected, when was that done?
> I couldn't find it in the archives anywhere.
It wasnt resubmitted for years - see the pattern and see the
problem? :-)
Merging it will cause even *less* cooperation, because of the
reasons above and because LTTNG adds a parallel ABI.
> The fact that distros have been shipping and relying on it for
> years shows that it is something that is needed, and it being
> self-contained, makes it eligible for the staging tree.
LTT(NG) was simply the historically first tracing toolkit that
embedded people got used to and there's still some inertia - and
distros add a lot of crap that people find marginally useful
which perpetuates the fork if there's at least one active
developer behind it. Most of its functionality is available via
existing upstream functionality - and where not we are more than
willing to accommodate patches!
drivers/staging/ is a tool that i support in many (in fact most)
cases - but i don't support it if it does harm.
I'm supposed to say 'no' to extra complexity more often, and
this is definitely one of those cases:
Nacked-by: Ingo Molnar <mingo@elte.hu>
Also obviously NAK to the scheduler symbol export - that alone
should tell you that it's not just a "driver" - it deeply hooks
into the core kernel...
Please respect the NAK.
Thanks,
Ingo
* Re: [PATCH 09/11] sched: export task_prio to GPL modules
2011-12-07 22:57 ` [PATCH 09/11] sched: export task_prio to GPL modules Mathieu Desnoyers
@ 2011-12-08 5:40 ` Ingo Molnar
0 siblings, 0 replies; 51+ messages in thread
From: Ingo Molnar @ 2011-12-08 5:40 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Greg KH, Peter Zijlstra, devel, lttng-dev, linux-kernel,
Linus Torvalds, Andrew Morton, Thomas Gleixner, Steven Rostedt,
Frederic Weisbecker
* Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> Hi Ingo,
>
> * Ingo Molnar (mingo@elte.hu) wrote:
> [...]
> > Mathieu, please work with the tracing folks who DO care about
> > this stuff. It's not like there's a lack of interest in this
> > area, nor is there a lack of willingness to take patches. What
> > there is a lack of is your willingness to actually work on
> > getting something unified, integrated to users...
> >
> > LTTNG has been going on for how many years? I havent seen many
> > steps towards actually *merging* its functionality - you insist
> > on doing your own random thing, which is different in random
> > ways. Yes, some of those random ways may in fact be better than
> > what we have upstream - would you be interested in filtering
> > those out and pushing them upstream? I certainly would like to
> > see that happen.
> >
> > We want to pick the best features, and throw away current
> > upstream code in favor of superior out of tree code - this
> > concept of letting crap sit alongside each other when people do
> > care i cannot agree with.
>
> LTTng 2.0, today, offers a unified interface for kernel and
> userspace tracing, in the form of libraries and git-alike
> command line user interface. [...]
Note that Arnaldo is working on such a perf-alike tracing tool
workflow with the new 'trace' utility that we announced and
prototyped a couple of months ago.
The perf.data data format is now extensible as well and
tightened for transportability. Tools such as PowerTop or
sysprof have standardized around the perf ABI.
So there's a *lot* of overlap with existing upstream efforts and
the last thing we need is the parallel LTTNG ABI.
Are you willing to merge LTTNG into our existing kernel and
userspace infrastructure and ABIs, with the possible end result
that LTTNG ceases to be a separately named entity?
Mind hooking up with Arnaldo and with Steve regarding how we
could best split up the LTTNG bits and move them upstream?
Frankly, i've seen a *lot* of talk from you but unfortunately
*very* little action on that front, so i think my healthy
scepticism is justified.
Thanks,
Ingo
* Re: [PATCH 09/11] sched: export task_prio to GPL modules
2011-12-08 5:23 ` Ingo Molnar
@ 2011-12-08 23:27 ` Greg KH
2011-12-19 10:49 ` Ingo Molnar
0 siblings, 1 reply; 51+ messages in thread
From: Greg KH @ 2011-12-08 23:27 UTC (permalink / raw)
To: Ingo Molnar
Cc: Peter Zijlstra, Mathieu Desnoyers, devel, lttng-dev,
linux-kernel, Linus Torvalds, Andrew Morton, Thomas Gleixner
On Thu, Dec 08, 2011 at 06:23:54AM +0100, Ingo Molnar wrote:
>
> * Greg KH <greg@kroah.com> wrote:
>
> > > Same goes for a whole lot of other crap that distros are
> > > carrying. Would we want to merge a different CPU scheduler
> > > or the 4g:4g patch or a completely new networking stack into
> > > drivers/staging/? I don't think so.
> >
> > Distros have new CPU schedulers and are still dragging the 4g
> > split around? A whole new networking stack would be
> > interesting, and if self-contained, possible :)
>
> The point being, there's legitimate reasons to refuse crap to an
> area that *people care about* in a constructive manner.
>
> There's no rejection of LTTNG in the "hey, go away, you are
> doing it wrong" fashion - we are not holding a monopoly on how
> instrumentation is supposed to be done and we've been wrong
> before.
>
> There's a highly constructive, open attitude towards LTTNG and
> has been for years:
>
> " Mathieu, please split it up and integrate/unify it with the
> existing instrumentation features of Linux - and if it
> replaces existing stuff because an LTTNG component is
> superior then so be it. "
Ok, that's fair enough.
Mathieu, will you please work on this? Or is there some reason you
don't feel this is possible?
> drivers/staging/ is a tool that i support in many (in fact most)
> cases - but i don't support it if it does harm.
>
> I'm supposed to say 'no' to extra complexity more often, and
> this is definitely one of those cases:
>
> Nacked-by: Ingo Molnar <mingo@elte.hu>
>
> Also obviously NAK to the scheduler symbol export - that alone
> should tell you that it's not just a "driver" - it deeply hooks
> into the core kernel...
>
> Please respect the NAK.
Will do, I'll go delete it from the staging-next tree now.
greg k-h
* Re: [PATCH 09/11] sched: export task_prio to GPL modules
2011-12-08 23:27 ` Greg KH
@ 2011-12-19 10:49 ` Ingo Molnar
2011-12-19 15:30 ` [lttng-dev] " Mathieu Desnoyers
0 siblings, 1 reply; 51+ messages in thread
From: Ingo Molnar @ 2011-12-19 10:49 UTC (permalink / raw)
To: Greg KH
Cc: Peter Zijlstra, Mathieu Desnoyers, devel, lttng-dev,
linux-kernel, Linus Torvalds, Andrew Morton, Thomas Gleixner
* Greg KH <greg@kroah.com> wrote:
> On Thu, Dec 08, 2011 at 06:23:54AM +0100, Ingo Molnar wrote:
> >
> > * Greg KH <greg@kroah.com> wrote:
> >
> > > > Same goes for a whole lot of other crap that distros are
> > > > carrying. Would we want to merge a different CPU scheduler
> > > > or the 4g:4g patch or a completely new networking stack into
> > > > drivers/staging/? I don't think so.
> > >
> > > Distros have new CPU schedulers and are still dragging the 4g
> > > split around? A whole new networking stack would be
> > > interesting, and if self-contained, possible :)
> >
> > The point being, there's legitimate reasons to refuse crap to an
> > area that *people care about* in a constructive manner.
> >
> > There's no rejection of LTTNG in the "hey, go away, you are
> > doing it wrong" fashion - we are not holding a monopoly on how
> > instrumentation is supposed to be done and we've been wrong
> > before.
> >
> > There's a highly constructive, open attitude towards LTTNG and
> > has been for years:
> >
> > " Mathieu, please split it up and integrate/unify it with the
> > existing instrumentation features of Linux - and if it
> > replaces existing stuff because an LTTNG component is
> > superior then so be it. "
>
> Ok, that's fair enough.
>
> Mathieu, will you please work on this? Or is there some
> reason you don't feel this is possible?
Mathieu, any update on this? I don't want the LTTNG goodies to
drop on the floor - we just have to integrate them properly.
If you 100% disagree with how specific things are done upstream
right now then don't hold back: just replace existing mechanisms
- that gives a starting point to discuss what the best way is
forward.
> > drivers/staging/ is a tool that i support in many (in fact most)
> > cases - but i don't support it if it does harm.
> >
> > I'm supposed to say 'no' to extra complexity more often, and
> > this is definitely one of those cases:
> >
> > Nacked-by: Ingo Molnar <mingo@elte.hu>
> >
> > Also obviously NAK to the scheduler symbol export - that alone
> > should tell you that it's not just a "driver" - it deeply hooks
> > into the core kernel...
> >
> > Please respect the NAK.
>
> Will do, I'll go delete it from the staging-next tree now.
Thanks Greg!
Ingo
* Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules
2011-12-19 10:49 ` Ingo Molnar
@ 2011-12-19 15:30 ` Mathieu Desnoyers
2011-12-20 11:08 ` Ingo Molnar
0 siblings, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2011-12-19 15:30 UTC (permalink / raw)
To: Ingo Molnar
Cc: Greg KH, devel, Peter Zijlstra, linux-kernel, lttng-dev,
Mathieu Desnoyers, Andrew Morton, Linus Torvalds,
Thomas Gleixner, Steven Rostedt
* Ingo Molnar (mingo@elte.hu) wrote:
>
> * Greg KH <greg@kroah.com> wrote:
>
> > On Thu, Dec 08, 2011 at 06:23:54AM +0100, Ingo Molnar wrote:
[...]
> > > There's a highly constructive, open attitude towards LTTNG and
> > > has been for years:
> > >
> > > " Mathieu, please split it up and integrate/unify it with the
> > > existing instrumentation features of Linux - and if it
> > > replaces existing stuff because an LTTNG component is
> > > superior then so be it. "
> >
> > Ok, that's fair enough.
> >
> > Mathieu, will you please work on this? Or is there some
> > reason you don't feel this is possible?
>
> Mathieu, any update on this? I don't want the LTTNG goodies to
> drop on the floor - we just have to integrate them properly.
>
> If you 100% disagree with how specific things are done upstream
> right now then don't hold back: just replace existing mechanisms
> - that gives a starting point to discuss what the best way is
> forward.
I'm bringing a tough question then: what should we do if I strongly
think that the current ABIs should be replaced ? To support this, let's
note that the current perf ABI:
- lacks versioning information to handle change. I think shipping the tracer
tools within the Linux tools/ directory made sense for an initial
phase that made tracer solutions more popular for kernel developers
(and it did a great job at that), but if we want to move on to build
tools that target a wider audience, we should leave the tools/ sandbox
and create separate projects, with clearly defined ABIs, using ABI
versioning to manage changes. At this point, I think that perf tool
shipped within tools/ is more than anything a pain for
non-kernel-developer users, and favors design of sloppy ABIs.
- makes it impossible to move to CTF (Common Trace Format) and benefit
from the added features it allows,
- makes it needlessly hard, if not impossible, for perf to move to
something that would have the benefits brought by the fast unified
ring buffer code I created 2 years ago,
- makes it impossible to benefit from the LTTng fast trace clocks.
Also, it should be noted that I am finding that the way perf evolved
into a large monolithic binary blob that needs to be all enabled or all
disabled makes it quite hard to extend and re-use. As a matter of fact,
there are various cases where Steven and I tried to create performance
tests for the perf ring buffer and just could not do it without hacking
the perf code. I would definitely prefer to go for a modular approach for
the in-kernel code, and an approach based on user-level libraries for
low-level tracer interaction, with applications depending on those
libraries, again all handled with ABI versioning and library versioning.
I have to give recognition to perf: it's a fantastic performance counter
management/sampling tool, but it has clearly never been geared towards
low-overhead tracing, and this shows.
One possible way for moving things forward is to leave the current
perf/ftrace implementation and ABIs in place along with the existing
tools. We could create a new ABI merging perf, ftrace and LTTng best
features into one (e.g. kstrace for Kernel System Trace -- just made it
up, better ideas are welcome), and gradually move the user-space part of
the 3 tools to the new ABI. It is worth noting that the need for a new
ABI is something many people involved in tracing -- by that I mean those
doing most of the actual upstream tracer implementation work -- agreed
upon in the last 2 years when meeting at conferences. This would allow
a deprecation phase to take place, and would allow removal of the
maintenance burden of the duplicated Perf/Ftrace ABIs, all that while
also bringing in an ABI that allows handling of change and innovation,
which is, IMHO, the key limiting factor of the current ABIs.
By doing so, perf could become the set of tools targeting what it does
best: performance counters management and sampling, ftrace could keep on
targeting function tracing, and lttng could be used for all-system
tracing, everyone sharing the same kernel-level implementation and ABIs
(kstrace ABI).
Thoughts ?
Thanks,
Mathieu
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules
2011-12-19 15:30 ` [lttng-dev] " Mathieu Desnoyers
@ 2011-12-20 11:08 ` Ingo Molnar
2011-12-20 21:46 ` Frank Rowand
` (2 more replies)
0 siblings, 3 replies; 51+ messages in thread
From: Ingo Molnar @ 2011-12-20 11:08 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Greg KH, devel, Peter Zijlstra, linux-kernel, lttng-dev,
Mathieu Desnoyers, Andrew Morton, Linus Torvalds,
Thomas Gleixner, Steven Rostedt, Arnaldo Carvalho de Melo
(Cc:-ing Arnaldo on this as well.)
* Mathieu Desnoyers <compudj@krystal.dyndns.org> wrote:
> > Mathieu, any update on this? I don't want the LTTNG goodies
> > to drop on the floor - we just have to integrate them
> > properly.
> >
> > If you 100% disagree with how specific things are done
> > upstream right now then don't hold back: just replace
> > existing mechanisms - that gives a starting point to discuss
> > what the best way is forward.
>
> I'm bringing a tough question then: what should we do if I
> strongly think that the current ABIs should be replaced ? To
> support this, let's note that the current perf ABI:
>
> - lacks versioning information to handle change. [...]
That's not actually true on *any* level: we are changing,
evolving and extending the perf ABIs all the time.
There's two main API/ABI components:
1) the perf syscall which is part of the Linux syscall ABI.
Individual versions of the ABI have (monotonically increasing)
sizes for "struct perf_event_attr" - you can consider these
natural ABI versioning.
So the 'versioning' is not done via some inflexible and ugly,
Windows-alike 'explicit ABI version' field, but done via
structure sizes and -ENOSYS.
We've iterated and versioned it numerous times in the past 10
kernel releases, in a backwards compatible manner.
2) the perf.data file
The versioning there is capability bitmask based - modelled
after ext2/ext3/ext4 capability bitmasks. It's extensible as
well.
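
To make the size-based scheme in point 1 concrete, here is a minimal
sketch of how a receiver might accept a size-prefixed structure from
callers built against older or newer revisions. Everything in it is a
hypothetical illustration: struct demo_attr, demo_copy_attr() and the
error codes chosen are assumptions, not the actual handling of
"struct perf_event_attr".

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute layout; NOT the real struct perf_event_attr. */
struct demo_attr {
	uint32_t type;
	uint32_t size;      /* structure size as seen by the caller */
	uint64_t config;
	uint64_t new_field; /* appended in a later revision */
};

#define DEMO_ATTR_SIZE_VER0 16u /* type + size + config */

/*
 * Copy a caller-supplied attribute of 'usize' bytes into 'out'.
 * Returns 0 on success, -EINVAL if the caller's structure is smaller
 * than the first revision, and -E2BIG if a newer caller set bytes this
 * build does not know about.
 */
static int demo_copy_attr(const void *uattr, size_t usize,
			  struct demo_attr *out)
{
	const unsigned char *src = uattr;
	size_t ksize = sizeof(*out);
	size_t i;

	if (usize < DEMO_ATTR_SIZE_VER0)
		return -EINVAL;

	/* Fields an older caller does not have read as zero. */
	memset(out, 0, ksize);

	if (usize > ksize) {
		/* Newer caller: the unknown tail must be unused (all zero). */
		for (i = ksize; i < usize; i++)
			if (src[i] != 0)
				return -E2BIG;
		usize = ksize;
	}

	memcpy(out, src, usize);
	return 0;
}
```

The design choice illustrated is that the structure size doubles as the
version: an old caller is extended with zeroes, and a new caller is
accepted as long as it does not rely on fields the receiver lacks.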
I think your concentration on ABIs is missing a very fundamental
property of instrumentation:
the life-time and persistence of instrumentation data is
typically very short ('days' is already an exception - typical
is minutes, at most hours), and for that reason we haven't been
getting much pressure from users to maintain a perf.data ABI -
but we are doing it nevertheless.
Instrumentation is fundamentally about the 'here and now' and so
it fundamentally differs from things like backup formats and
database formats. An ABI does not hurt and we are maintaining
it, but you are overrating its importance significantly.
> [...] I think shipping the tracer tools within the Linux
> tools/ directory made sense for an initial phase that made
> tracer solutions more popular for kernel developers (and it
> did a great job at that), but if we want to move on to build
> tools that target a wider audience, we should leave the
> tools/ sandbox and create separate projects, with clearly
> defined ABIs, using ABI versioning to manage changes. At
> this point, I think that perf tool shipped within tools/ is
> more than anything a pain for non-kernel-developer users,
> and favors design of sloppy ABIs.
I think you've thoroughly misunderstood the upstream ABI
versioning status quo, which makes your argument out of this
world.
The perf ABIs are well-defined and well-maintained. See an
ad-hoc ABI and tool compatibility experiment i made here:
[F.A.Q.] perf ABI backwards and forwards compatibility
https://lkml.org/lkml/2011/11/8/77
> - makes it impossible to move to CTF (Common Trace Format)
> and benefit from the added features it allows,
"CTF" was mainly written by yourself, right?
If there's any tool worth caring about that wants to deal in CTF
then it can be converted just fine. I don't think it matters
nearly as much as you seem to imply, see my reply further below.
> - makes it needlessly hard, if not impossible, for perf to
> move to something that would have the benefits brought by
> the fast unified ring buffer code I created 2 years ago,
The current upstream code actually has a fast unified
ring-buffer, mmap()-ed to user-space, so you'd have to be a bit
more specific about that point.
> - makes it impossible to benefit from the LTTng fast trace
> clocks.
We have various trace clocks upstream as well - so you'd have to
outline it specifically why it's "impossible".
> Also, it should be noted that I am finding that the way perf
> evolved into a large monolithic binary blob that needs to be
> all enabled or all disabled makes it quite hard to extend and
> re-use. [...]
There's a (very) healthy in-flux of features - it's one of the
most active kernel and userspace projects we have.
So *others* don't find it hard to work with. If you have
specific observations i'm sure Arnaldo will appreciate them.
[ I snipped the rest of your reply - you seem to have deep
rooted misconceptions about what the current upstream
principles and practices are in this area: you are banging on
open doors! ]
Anyway, my prior request+offer stands: please split LTTNG up
into individual feature blocks done to extend or replace
existing instrumentation features and offer them as changes to
existing upstream instrumentation code. We want every
conceivable useful feature, but we *really* don't want
schizophrenic duplication in this area.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules
2011-12-20 11:08 ` Ingo Molnar
@ 2011-12-20 21:46 ` Frank Rowand
2011-12-23 10:51 ` Ingo Molnar
2011-12-21 18:47 ` Aaron Spear
2011-12-23 16:46 ` Perf ABI (was: Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules) Mathieu Desnoyers
2 siblings, 1 reply; 51+ messages in thread
From: Frank Rowand @ 2011-12-20 21:46 UTC (permalink / raw)
To: Ingo Molnar
Cc: Mathieu Desnoyers, Greg KH, devel, Peter Zijlstra, linux-kernel,
lttng-dev, Mathieu Desnoyers, Andrew Morton, Linus Torvalds,
Thomas Gleixner, Steven Rostedt, Arnaldo Carvalho de Melo
On 12/20/11 03:08, Ingo Molnar wrote:
>
> (Cc:-ing Arnaldo on this as well.)
>
> * Mathieu Desnoyers <compudj@krystal.dyndns.org> wrote:
>
< snip >
> I think your concentration on ABIs is missing a very fundamental
> property of instrumentation:
>
> the life-time and persistence of instrumentation data is
> typically very short ('days' is already an exception - typical
> is minutes, at most hours), and for that reason we haven't been
> getting much pressure from users to maintain a perf.data ABI -
> but we are doing it nevertheless.
>
> Instrumentation is fundamentally about the 'here and now' and so
> it fundamentally differs from things like backup formats and
> database formats. An ABI does not hurt and we are maintaining
> it, but you are overrating its importance significantly.
Just to provide visibility to a different use case...
The life time of my data is typically weeks, months, or years
(though I am not likely to re-process year old raw data).
< snip >
-Frank
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules
2011-12-20 11:08 ` Ingo Molnar
2011-12-20 21:46 ` Frank Rowand
@ 2011-12-21 18:47 ` Aaron Spear
2011-12-21 18:58 ` Christoph Hellwig
2011-12-23 16:46 ` Perf ABI (was: Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules) Mathieu Desnoyers
2 siblings, 1 reply; 51+ messages in thread
From: Aaron Spear @ 2011-12-21 18:47 UTC (permalink / raw)
To: Ingo Molnar
Cc: devel, Peter Zijlstra, Greg KH, linux-kernel, Steven Rostedt,
Arnaldo Carvalho de Melo, lttng-dev, Mathieu Desnoyers,
Andrew Morton, Linus Torvalds, Thomas Gleixner,
Mathieu Desnoyers
* Ingo Molnar <mingo@elte.hu> wrote:
> "CTF" was mainly written by yourself, right?
>
> If there's any tool worth caring about that wants to deal in CTF
> then it can be converted just fine. I don't think it matters
> nearly as much as you seem to imply, see my reply further below.
Hi Ingo,
I thought it might be a useful point of reference to mention that there is a commitment to CTF for more than just LTTng. The Multicore Association and member companies including TI, Freescale, Samsung, Mentor Graphics, Wind River Systems, VMware and others intend to use CTF as a lingua franca for correlation of traces taken from different tracing technologies in heterogeneous multi-core systems. Linux is pivotal here of course, but we are also aggregating various types of hardware traces as well as instrumentation traces from bare metal, RTOSes, and other OSes. Many of the requirements that went into the draft CTF specification were driven by this working group's experience in the embedded industry with many different legacy tracing technologies. While Mathieu has been instrumental in creating CTF, he is certainly not the only one with a vested interest in its future.
respectfully,
Aaron Spear - VMware
Chairman, Multicore Association Tools Infrastructure Working Group
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules
2011-12-21 18:47 ` Aaron Spear
@ 2011-12-21 18:58 ` Christoph Hellwig
0 siblings, 0 replies; 51+ messages in thread
From: Christoph Hellwig @ 2011-12-21 18:58 UTC (permalink / raw)
To: Aaron Spear
Cc: Ingo Molnar, devel, Peter Zijlstra, Greg KH, linux-kernel,
Steven Rostedt, Arnaldo Carvalho de Melo, lttng-dev,
Mathieu Desnoyers, Andrew Morton, Linus Torvalds,
Thomas Gleixner, Mathieu Desnoyers
VMware using it is more a reason to avoid it than to use it.. :)
And most certainly not a reason to export internal kernel details.
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules
2011-12-20 21:46 ` Frank Rowand
@ 2011-12-23 10:51 ` Ingo Molnar
0 siblings, 0 replies; 51+ messages in thread
From: Ingo Molnar @ 2011-12-23 10:51 UTC (permalink / raw)
To: Frank Rowand
Cc: Mathieu Desnoyers, Greg KH, devel, Peter Zijlstra, linux-kernel,
lttng-dev, Mathieu Desnoyers, Andrew Morton, Linus Torvalds,
Thomas Gleixner, Steven Rostedt, Arnaldo Carvalho de Melo
* Frank Rowand <frank.rowand@am.sony.com> wrote:
> On 12/20/11 03:08, Ingo Molnar wrote:
> >
> > (Cc:-ing Arnaldo on this as well.)
> >
> > * Mathieu Desnoyers <compudj@krystal.dyndns.org> wrote:
> >
>
> < snip >
>
> > I think your concentration on ABIs is missing a very fundamental
> > property of instrumentation:
> >
> > the life-time and persistence of instrumentation data is
> > typically very short ('days' is already an exception - typical
> > is minutes, at most hours), and for that reason we haven't been
> > getting much pressure from users to maintain a perf.data ABI -
> > but we are doing it nevertheless.
> >
> > Instrumentation is fundamentally about the 'here and now' and so
> > it fundamentally differs from things like backup formats and
> > database formats. An ABI does not hurt and we are maintaining
> > it, but you are overrating its importance significantly.
>
> Just to provide visibility to a different use case...
>
> The life time of my data is typically weeks, months, or years
> (though I am not likely to re-process year old raw data).
I'm not saying that it's absolutely never done: for example
monitoring/logging on a production box and evaluating events
only once per month would certainly qualify.
I just say that the overwhelming majority of usecases utilize
traces on a short time-span and that we must keep the common
usecase in mind when supporting not so common usecases.
It's the same deal as with -rt: compared to the 'normal' usage
of Linux -rt is somewhat of a special case - yet it's still
something very much worth doing, as long as the main usecase is
always kept in mind.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 51+ messages in thread
* Perf ABI (was: Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules)
2011-12-20 11:08 ` Ingo Molnar
2011-12-20 21:46 ` Frank Rowand
2011-12-21 18:47 ` Aaron Spear
@ 2011-12-23 16:46 ` Mathieu Desnoyers
2011-12-23 17:21 ` Ted Ts'o
2 siblings, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2011-12-23 16:46 UTC (permalink / raw)
To: Ingo Molnar
Cc: Greg KH, devel, Peter Zijlstra, linux-kernel, lttng-dev,
Andrew Morton, Linus Torvalds, Thomas Gleixner, Steven Rostedt,
Arnaldo Carvalho de Melo
Hi Ingo,
I'll break down my reply in various sub-topics, and address them
separately in the following weeks. Let's start with the ABIs.
* Ingo Molnar (mingo@elte.hu) wrote:
>
> (Cc:-ing Arnaldo on this as well.)
>
> * Mathieu Desnoyers <compudj@krystal.dyndns.org> wrote:
>
> > > Mathieu, any update on this? I don't want the LTTNG goodies
> > > to drop on the floor - we just have to integrate them
> > > properly.
> > >
> > > If you 100% disagree with how specific things are done
> > > upstream right now then don't hold back: just replace
> > > existing mechanisms - that gives a starting point to discuss
> > > what the best way is forward.
> >
> > I'm bringing a tough question then: what should we do if I
> > strongly think that the current ABIs should be replaced ? To
> > support this, let's note that the current perf ABI:
> >
> > - lacks versioning information to handle change. [...]
>
> That's not actually true on *any* level: we are changing,
> evolving and extending the perf ABIs all the time.
You may be able to evolve and extend the Perf ABI, but the way this ABI
is designed does not allow you to change it in ways that would introduce
ABI incompatibility between versions (the equivalent of a major version
number change).
You're therefore gradually painting yourself into a corner without any
ability to go back and revisit previous decisions, and this is bad
because revisiting those past decisions will be needed to bring in some
LTTng features, because those decisions were taken without having those
features in mind. Supporting a new feature is not always as easy as
"extending a structure" as you seem to imply.
> There's two main API/ABI components:
>
> 1) the perf syscall which is part of the Linux syscall ABI.
>
> Individual versions of the ABI have (monotonically increasing)
> sizes for "struct perf_event_attr" - you can consider these
> natural ABI versioning.
>
> So the 'versioning' is not done via some inflexible and ugly,
> Windows-alike 'explicit ABI version' field, but done via
> structure sizes and -ENOSYS.
Judging versions as inflexible and ugly is merely a matter of taste.
However, the inability to do any kind of major change due to the way the
Perf ABI is made has a clear direct impact on the ability to innovate
within this project.
> We've iterated and versioned it numerous times in the past 10
> kernel releases, in a backwards compatible manner.
>
> 2) the perf.data file
>
> The versioning there is capability bitmask based - modelled
> after ext2/ext3/ext4 capability bitmasks. It's extensible as
> well.
AFAIU, filesystems have very strict compatibility requirements because
they sit on hard drives for years on live systems that cannot always
easily permit migration between incompatible layouts. Traces don't have
the same constraints (see below).
>
> I think your concentration on ABIs is missing a very fundamental
> property of instrumentation:
>
> the life-time and persistence of instrumentation data is
> typically very short ('days' is already an exception - typical
> is minutes, at most hours), and for that reason we haven't been
> getting much pressure from users to maintain a perf.data ABI -
> but we are doing it nevertheless.
>
> Instrumentation is fundamentally about the 'here and now' and so
> it fundamentally differs from things like backup formats and
> database formats. An ABI does not hurt and we are maintaining
> it, but you are overrating its importance significantly.
I think you are really focusing on a developer use-case, which might be
why you are missing the big picture. How many Linux developers are out
there ? How many Linux system administrators are out there ? Many, many
more. With all due respect, I'm afraid your definition of "typically" is
limited by your developer-centric vision. So far, I came up with the
following breakdown of use-cases in terms of trace data life-span:
- Long-persistence traces (old traces): for this use-case, a conversion
phase is usually OK. These long-persistence traces are useful in
production system monitoring scenarios, and for finding delta in
execution between different runs of a test suite (for instance). This
use-case allows format breakage if the old format can be identified by
a trace converter.
- Short-lived traces (debugging use-case): pretty much anything
would do, as long as the user-level tool can detect if it understands
the layout.
- Live traces: we want to minimize the overhead, both on the trace
producer and on the machine performing the data analysis (which can be
either the traced machine or a separate host), while still providing a
live stream of data. This is useful for applications like lttngtop
(showing a live report of the system) and for production system
monitoring. In this case, we want the tools to be able to find out if
they can read the trace format (or report an error, asking for
upgrade if they can't). Trace conversion is not appropriate in this
scenario due to the added timing complexity and overhead.
As you will notice, none of these use-cases require a filesystem-alike
bitmask-based compatibility ABI at the trace format level.
Using explicit versioning allows drastic changes to be done when they
are required, in the process allowing a trace converter to be used to
deal with "old" legacy traces, and allowing a live trace
aggregator/analyzer to detect if it can support the live trace stream.
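
As an illustration of how explicit versioning could serve the three
use-cases above, here is a minimal sketch of the decision a reader
might take from a version header. The header layout, the magic value
and all names are hypothetical assumptions made for this example; they
do not describe any existing trace format.

```c
#include <stdint.h>

#define DEMO_TRACE_MAGIC  0x54524345u /* hypothetical magic value */
#define DEMO_READER_MAJOR 2u          /* major version this reader implements */

/* Hypothetical trace header carrying an explicit version. */
struct demo_trace_header {
	uint32_t magic;
	uint16_t major; /* bumped on incompatible layout changes */
	uint16_t minor; /* bumped on compatible additions */
};

enum demo_action { DEMO_READ, DEMO_CONVERT, DEMO_REJECT };

/*
 * Decide what to do with a trace. 'live' is non-zero for a live
 * stream, where offline conversion is not an option.
 */
static enum demo_action demo_classify(const struct demo_trace_header *h,
				      int live)
{
	if (h->magic != DEMO_TRACE_MAGIC)
		return DEMO_REJECT;
	if (h->major == DEMO_READER_MAJOR)
		return DEMO_READ;
	if (h->major < DEMO_READER_MAJOR && !live)
		return DEMO_CONVERT; /* stored legacy trace: convert first */
	return DEMO_REJECT;          /* newer format or live stream: ask for upgrade */
}
```

A stored trace with an older major version is routed to a converter,
while a live stream with a mismatching major version is refused so the
user can be asked to upgrade.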
> > [...] I think shipping the tracer tools within the Linux
> > tools/ directory made sense for an initial phase that made
> > tracer solutions more popular for kernel developers (and it
> > did a great job a that), but if we want to move on to build
> > tools that target a wider audience, we should leave the
> > tools/ sandbox and create separate projects, with clearly
> > defined ABIs, using ABI versioning to manage changes. At
> > this point, I think that perf tool shipped within tools/ is
> > more than anything a pain for non-kernel-developer users,
> > and favors design of sloppy ABIs.
>
> I think you've thoroughly misunderstood the upstream ABI
> versioning status quo, which makes your argument out of this
> world.
>
> The perf ABIs are well-defined and well-maintained. See an
> ad-hoc ABI and tool compatibility experiment i made here:
>
> [F.A.Q.] perf ABI backwards and forwards compatibility
> https://lkml.org/lkml/2011/11/8/77
I hope my answer above explains why I think the way perf handles ABI
changes is a terrible choice. In summary:
- Perf is painting itself into a corner, not allowing any ABI breakage,
only "extensions", which limits integration of features that require
core changes,
- It's doing so without even needing it: Perf is using an ABI versioning
scheme designed for filesystems, when it is not in fact driven by the
same constraints.
Best regards,
Mathieu
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Perf ABI (was: Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules)
2011-12-23 16:46 ` Perf ABI (was: Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules) Mathieu Desnoyers
@ 2011-12-23 17:21 ` Ted Ts'o
2011-12-23 18:16 ` Mathieu Desnoyers
0 siblings, 1 reply; 51+ messages in thread
From: Ted Ts'o @ 2011-12-23 17:21 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Ingo Molnar, Greg KH, devel, Peter Zijlstra, linux-kernel,
lttng-dev, Andrew Morton, Linus Torvalds, Thomas Gleixner,
Steven Rostedt, Arnaldo Carvalho de Melo
On Fri, Dec 23, 2011 at 11:46:29AM -0500, Mathieu Desnoyers wrote:
> - It's doing so without even needing it: Perf is using an ABI versioning
> scheme designed for filesystems, when it is not in fact driven by the
> same constraints.
Well, there are *some* constraints. I've been assured that despite
the fact that the perf client is in the kernel sources (something
which I still think is a bad idea, since it's leading to other bad
choices like kvm-tool wanting to be bundled with kernel sources), that
it is *not* a license to jerk the format around wildly --- that people
will have installed userspace binaries that shouldn't randomly break
when they boot a new kernel.
So I'm *glad* that Perf is using an ABI versioning scheme that accepts
the same restraints as file systems. It means we don't randomly break
userspace tools.
So Mathieu, if you think the current standards of backwards
compatibility are too rigid, what level of tool breakage do you think
is acceptable? It's not just about the backwards compatibility of the
trace files, it's also about compatibility of userspace utilities.
For example, systemtap, where you had to recompile from source at
each kernel revision, and pray it would still build goes too far in
the other direction, wouldn't you agree? What is the correct level of
kernel developer annoyance you think is appropriate to inflict on
ourselves?
Regards,
- Ted
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Perf ABI (was: Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules)
2011-12-23 17:21 ` Ted Ts'o
@ 2011-12-23 18:16 ` Mathieu Desnoyers
2011-12-25 17:46 ` Ted Ts'o
0 siblings, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2011-12-23 18:16 UTC (permalink / raw)
To: Ted Ts'o, Ingo Molnar, Greg KH, devel, Peter Zijlstra,
linux-kernel, lttng-dev, Andrew Morton, Linus Torvalds,
Thomas Gleixner, Steven Rostedt, Arnaldo Carvalho de Melo
Hi Ted,
* Ted Ts'o (tytso@mit.edu) wrote:
> On Fri, Dec 23, 2011 at 11:46:29AM -0500, Mathieu Desnoyers wrote:
> > - It's doing so without even needing it: Perf is using an ABI versioning
> > scheme designed for filesystems, when it is not in fact driven by the
> > same constraints.
>
> Well, there are *some* constraints. I've been assured that despite
> the fact that the perf client is in the kernel sources (something
> which I still think is a bad idea, since it's leading to other bad
> choices like kvm-tool wanting to be bundled with kernel sources), that
> it is *not* a license to jerk the format around wildly --- that people
> will have installed userspace binaries that shouldn't randomly break
> when they boot a new kernel.
>
> So I'm *glad* that Perf is using an ABI versioning scheme that accepts
> the same restraints as file systems. It means we don't randomly break
> userspace tools.
>
> So Mathieu, if you think the current standards of backwards
> compatibility are too rigid, what level of tool breakage do you think
> is acceptable? It's not just about the backwards compatibility of the
> trace files, it's also about compatibility of userspace utilities.
>
> For example, systemtap, where you had to recompile from source at
> each kernel revision, and pray it would still build goes too far in
> the other direction, wouldn't you agree? What is the correct level of
> kernel developer annoyance you think is appropriate to inflict on
> ourselves?
I completely agree that systemtap did not have the right level of
compatibility towards changes. It clearly does not make sense to require
the tools to be updated whenever the kernel version and instrumentation
changes. What makes sense to me, though, is to allow breakage when a
newly introduced tracer feature requires the ABI to break.
What I currently see as a tradeoff sweet-spot between compatibility
burden and ability to innovate is to split the ABI and handle
compatibility as follows:
- ABIs to control the tracer
- Versioned, ideally always incrementally adding features, but still
keeping room for major changes if needed. We should expect breakages
on this front to be very rare. Tracer control tools need to be
updated when the ABI is broken.
- ABIs to transport tracing data
- Versioned, can and should change when a feature or transport
performance enhancement requires breaking compatibility. Trace
data consumer tools need to be updated when compatibility is
broken.
(note that ABI to control the tracer and ABI to transport data could
share the same version numbering if the control tools and transport
tools happen to reside in the same user-level packages)
- The trace data format
- Both versioned _and_ self-described.
Self-description of the event/field layout allows the same tools to
understand traces gathered on different kernel versions, on different
architectures, with different tracer configurations.
Versioning on top of the self-described trace format allows changes
to what the trace self-description can express.
So the breakages would happen only when required by tracer tool
capability enhancements, not randomly when a kernel instrumentation
source happens to change.
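
To illustrate what self-description buys, here is a minimal sketch in
which the reader locates a field by name through a layout table
shipped with the trace, instead of hard-coding offsets. The descriptor
structure, the little-endian encoding and all names are hypothetical
assumptions made for this example; this is not the CTF metadata
format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-event field descriptor shipped alongside the trace. */
struct demo_field {
	const char *name;
	size_t offset; /* byte offset within the record */
	size_t size;   /* field size in bytes, 1 to 8 */
};

/*
 * Look up 'name' in the layout and decode it from 'record' as a
 * little-endian unsigned integer. Returns 0 on success, -1 if the
 * field is unknown or does not fit in the record.
 */
static int demo_read_field(const struct demo_field *fields, size_t nfields,
			   const unsigned char *record, size_t record_len,
			   const char *name, uint64_t *value)
{
	size_t i, b;

	for (i = 0; i < nfields; i++) {
		const struct demo_field *f = &fields[i];

		if (strcmp(f->name, name) != 0)
			continue;
		if (f->size == 0 || f->size > 8 || f->offset > record_len ||
		    f->size > record_len - f->offset)
			return -1;

		*value = 0;
		for (b = 0; b < f->size; b++)
			*value |= (uint64_t)record[f->offset + b] << (8 * b);
		return 0;
	}
	return -1;
}
```

The same reader code can then decode the "pid" field from two records
whose layouts differ, as long as each trace carries its own layout
table.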
Best regards,
Mathieu
P.S.: my next replies will be slightly delayed, due to Christmas
holidays.
>
> Regards,
>
>
> - Ted
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Perf ABI (was: Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules)
2011-12-23 18:16 ` Mathieu Desnoyers
@ 2011-12-25 17:46 ` Ted Ts'o
2012-01-12 14:09 ` Mathieu Desnoyers
0 siblings, 1 reply; 51+ messages in thread
From: Ted Ts'o @ 2011-12-25 17:46 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Ingo Molnar, Greg KH, devel, Peter Zijlstra, linux-kernel,
lttng-dev, Andrew Morton, Linus Torvalds, Thomas Gleixner,
Steven Rostedt, Arnaldo Carvalho de Melo
On Fri, Dec 23, 2011 at 01:16:41PM -0500, Mathieu Desnoyers wrote:
>
> (note that ABI to control the tracer and ABI to transport data could
> share the same version numbering if the control tools and transport
> tools happen to reside in the same user-level packages)
Being able to control the tracer but then not being able to look at
the trace output is useless. So they might as well be the same
thing....
> - The trace data format
> - Both versioned _and_ self-described.
> Self-description of the event/field layout allows the same tools to
> understand traces gathered on different kernel versions, on different
> architectures, with different tracer configurations.
> Versioning on top of the self-described trace format allows changes
> to what the trace self-description can express.
So there are two ways to do this. One is to make changes be backwards
compatible, so that the trace data format only breaks if you use the
new feature; if you don't, you encode things the old-fashioned way.
The other way of doing things is to randomly break users whenever the
tracing developers decide to add some random new feature, regardless
of whether or not a particular user finds that new feature to be
useful.
The first is acceptable. The second, IMHO, is not. Linus has said
quite strongly that WE DO NOT BREAK USERSPACE. Period.
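
A minimal sketch of the first approach, assuming a feature bitmask in
the trace header: the writer sets a bit only when the corresponding
feature is actually used, so a reader only fails on traces that need
something it does not understand. The feature names and bit values
are hypothetical and are not taken from perf.data or from any
filesystem format.

```c
#include <stdint.h>

/* Hypothetical feature bits for this example only. */
#define DEMO_FEAT_COMPRESSED (1u << 0)
#define DEMO_FEAT_WIDE_IDS   (1u << 1)

/* Features this (older) reader understands. */
#define DEMO_READER_KNOWN    (DEMO_FEAT_COMPRESSED)

/* Writer: a bit is set only when the feature is actually used. */
static uint32_t demo_writer_flags(int use_compression, int use_wide_ids)
{
	uint32_t flags = 0;

	if (use_compression)
		flags |= DEMO_FEAT_COMPRESSED;
	if (use_wide_ids)
		flags |= DEMO_FEAT_WIDE_IDS;
	return flags;
}

/* Reader: returns 1 if every feature used by the trace is understood. */
static int demo_reader_can_parse(uint32_t trace_flags)
{
	return (trace_flags & ~DEMO_READER_KNOWN) == 0;
}
```

With this scheme an old reader keeps working on traces from a new
writer until the user opts into the new feature.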
Regards,
- Ted
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Perf ABI (was: Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules)
2011-12-25 17:46 ` Ted Ts'o
@ 2012-01-12 14:09 ` Mathieu Desnoyers
2012-01-12 14:54 ` Steven Rostedt
0 siblings, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2012-01-12 14:09 UTC (permalink / raw)
To: Ted Ts'o, Ingo Molnar, Greg KH, devel, Peter Zijlstra,
linux-kernel, lttng-dev, Andrew Morton, Linus Torvalds,
Thomas Gleixner, Steven Rostedt, Arnaldo Carvalho de Melo
* Ted Ts'o (tytso@mit.edu) wrote:
> On Fri, Dec 23, 2011 at 01:16:41PM -0500, Mathieu Desnoyers wrote:
[...]
> > - The trace data format
> > - Both versioned _and_ self-described.
> > Self-description of the event/field layout allows the same tools to
> > understand traces gathered on different kernel versions, on different
> > architectures, with different tracer configurations.
> > Versioning on top of the self-described trace format allows changes
> > to what the trace self-description can express.
>
> So there are two ways to do this. One is to make changes be backwards
> compatible, so that the trace data format only breaks if you use the
> new feature; if you don't, you encode things the old-fashioned way.
> The other way of doing things is to randomly break users whenever the
> tracing developers decide to add some random new feature, regardless
> of whether or not a particular user finds that new feature to be
> useful.
>
> The first is acceptable. The second, IMHO, is not. Linus has said
> quite strongly that WE DO NOT BREAK USERSPACE. Period.
Please allow me to look into what needs to be kept compatible for a good
user experience (for both Linux end users and kernel developers) in the
case of tracing:
Let's first describe what we really utterly don't want: random breakages
between the kernel and user-level tracing control/transport/analysis
tools. Consequently, I think we could say that it would be unacceptable
for userspace tools to break for every slight change of kernel code. If
that were the case (as it was with the approach SystemTap was taking
before they started hooking into the kernel with tracepoints), then we'd
need to regenerate the tools for pretty much every -rc kernel, and for
each local development tree, which would make those tools useless to
kernel developers.
It is important to clarify that tracing is, in my opinion, not part of
the runtime support, which makes it very different by nature from
filesystems and kernel runtime support. So I agree with Linus' argument
about not breaking userspace when applied to runtime support, because
being unable to even boot a system due to an ABI breakage is very much
unwanted. However, I think it should not be applied as-is to tracing,
because you cannot make a system unusable due to a tracer ABI breakage:
if a tracer can be packaged in a set of standalone modules, that clearly
shows it is not part of the system runtime support.
That being said, ABI versioning could still handle ABI changes without
significantly impacting the users: when an ABI breakage is needed, we
can keep the old code around for a while and expose both the old and new
ABIs. This would ensure that the user-level tools can query for the
specific ABI major version(s) they support. That should improve the user
experience by providing "deprecated" console warnings for a few kernel
releases before the old code ends up being removed.
So, in summary:
* Old kernels vs new tools:
New tools can query for the latest ABI they know, and fall back on older
ABIs, with limited features.
* New kernels vs old tools:
Keeping around the old ABI for a deprecation phase lets old tools work on
a bleeding edge kernel while the ABI change is being introduced, which
should satisfy the kernel developer use-case.
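
The negotiation summarized above could look like the following
minimal sketch, where the tool picks the highest ABI major version
that both it and the kernel support. How the kernel would actually
advertise its supported versions is left open; the function name and
the plain integer lists are hypothetical assumptions for this example.

```c
#include <stddef.h>

/*
 * Return the highest ABI major version present in both lists,
 * or -1 if the tool and the kernel have no version in common.
 */
static int demo_pick_abi(const int *tool, size_t ntool,
			 const int *kernel, size_t nkernel)
{
	int best = -1;
	size_t i, j;

	for (i = 0; i < ntool; i++)
		for (j = 0; j < nkernel; j++)
			if (tool[i] == kernel[j] && tool[i] > best)
				best = tool[i];
	return best;
}
```

A new tool on an old kernel falls back to an older major version, and
an old tool keeps working on a new kernel for as long as the
deprecated version stays exposed.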
Best regards,
Mathieu
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: Perf ABI (was: Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules)
2012-01-12 14:09 ` Mathieu Desnoyers
@ 2012-01-12 14:54 ` Steven Rostedt
2012-01-12 15:39 ` [lttng-dev] Perf ABI (was: " Mathieu Desnoyers
0 siblings, 1 reply; 51+ messages in thread
From: Steven Rostedt @ 2012-01-12 14:54 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Ted Ts'o, Ingo Molnar, Greg KH, devel, Peter Zijlstra,
linux-kernel, lttng-dev, Andrew Morton, Linus Torvalds,
Thomas Gleixner, Arnaldo Carvalho de Melo
On Thu, 2012-01-12 at 09:09 -0500, Mathieu Desnoyers wrote:
> It is important to clarify that tracing is, in my opinion, not part of
> the runtime support, which makes it very different by nature from
> filesystems and kernel runtime support. So I agree with Linus' argument
> about not breaking userspace when applied to runtime support, because
> being unable to even boot a system due to an ABI breakage is very much
> unwanted. However, I think it should not be applied as-is to tracing,
> because you cannot make a system unusable due to a tracer ABI breakage:
> if a tracer can be packaged in a set of standalone modules, that clearly
> shows it is not part of the system runtime support.
Correct, tracing is not something that is needed to make the system run,
but that's still no excuse to treat ABI changes any differently. Note, we
don't change things within /proc/stat or /proc/*/stat, and those are not
required to make the system run. We can add onto those files, but we
can't change what the current numbers mean.
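
The append-only rule is what lets a parser written against an older kernel
keep working. A minimal sketch, assuming the documented order of the first
four counters on the aggregate "cpu" line of /proc/stat (user, nice, system,
idle); the function name parse_cpu_line is hypothetical:

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse user, nice, system and idle from the aggregate "cpu" line.
 * Fields appended by newer kernels are simply not consumed.
 * Returns 0 on success, -1 if the line is not the aggregate line
 * or does not carry at least four counters.
 */
static int parse_cpu_line(const char *line, unsigned long long out[4])
{
    /* Reject per-CPU lines such as "cpu0 ..." */
    if (strncmp(line, "cpu ", 4) != 0)
        return -1;

    int n = sscanf(line + 4, "%llu %llu %llu %llu",
                   &out[0], &out[1], &out[2], &out[3]);
    return n == 4 ? 0 : -1;
}
```

The same code gives the same four numbers whether the line carries four
fields or ten, which only holds as long as existing fields are never
reordered or redefined.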
>
> That being said, ABI versioning could still handle ABI changes without
> significantly impacting the users: when an ABI breakage is needed, we
> can keep the old code around for a while and expose both the old and new
> ABIs. This would ensure that the user-level tools can query for the
> specific ABI major version(s) they support. That should improve the user
> experience by providing "deprecated" console warnings for a few kernel
> releases before the old code ends up being removed.
ABI version numbers are meaningless, and prone to be broken. The version
bump would have to go in with the same commit that makes the ABI change,
otherwise git bisecting can get screwed up too.
The way ABI changes in the kernel have always been handled is to look at
the file itself and have the tool determine what version of the ABI is
there based on what files exist, or what exists in the file.
I've done this with trace-cmd and ftrace. The debugfs system has changed
a lot, and trace-cmd can handle each change. I never had a need for a
version number to do this. I simply have trace-cmd look at what is
available and what isn't.
If you need to know if a syscall exists, you try it and if you get
-ENOSYS, then you know it doesn't exist. We have no need for an
arbitrary version number that is meaningless. The existence (or lack)
of the syscall tells us all we need to know.
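
A minimal sketch of such a probe from user space, assuming the syscall is
harmless when called with all-zero arguments (true for something like pipe2
with a NULL pointer, which fails with EFAULT; not true for every syscall).
The helper name syscall_present is hypothetical:

```c
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Returns 0 if the kernel answers ENOSYS for syscall number nr,
 * 1 otherwise. Any other outcome (success, EFAULT, EINVAL, ...)
 * means the entry point exists. Only use this on syscalls that
 * are safe to invoke with zeroed arguments.
 */
static int syscall_present(long nr)
{
    errno = 0;
    long ret = syscall(nr, 0L, 0L, 0L, 0L, 0L, 0L);

    return !(ret == -1 && errno == ENOSYS);
}
```

For example, syscall_present(SYS_pipe2) tells a tool whether it can rely on
pipe2 without consulting any version number.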
>
> So, in summary:
>
> * Old kernels vs new tools:
>
> New tools can query for the latest ABI they know, and fall back on older
> ABIs, with limited features.
>
> * New kernels vs old tools:
>
> Keeping around the old ABI for a deprecation phase lets old tools work on
> a bleeding edge kernel while the ABI change is being introduced, which
> should satisfy the kernel developer use-case.
We've done this without version numbers. Just look at all the udev
changes.
-- Steve
* Re: [lttng-dev] Perf ABI (was: Re: [PATCH 09/11] sched: export task_prio to GPL modules)
2012-01-12 14:54 ` Steven Rostedt
@ 2012-01-12 15:39 ` Mathieu Desnoyers
2012-01-12 15:53 ` Steven Rostedt
2012-01-12 20:00 ` Greg KH
0 siblings, 2 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2012-01-12 15:39 UTC (permalink / raw)
To: Steven Rostedt
Cc: Mathieu Desnoyers, devel, Ted Ts'o, Peter Zijlstra, Greg KH,
linux-kernel, Arnaldo Carvalho de Melo, lttng-dev,
Thomas Gleixner, Ingo Molnar, Linus Torvalds, Andrew Morton
* Steven Rostedt (rostedt@goodmis.org) wrote:
> On Thu, 2012-01-12 at 09:09 -0500, Mathieu Desnoyers wrote:
>
>
> > It is important to clarify that tracing is, in my opinion, not part of
> > the runtime support, which makes it very different by nature from
> > filesystems and kernel runtime support. So I agree with Linus' argument
> > about not breaking userspace when applied to runtime support, because
> > being unable to even boot a system due to an ABI breakage is very much
> > unwanted. However, I think it should not be applied as-is to tracing,
> > because you cannot make a system unusable due to a tracer ABI breakage:
> > if a tracer can be packaged in a set of standalone modules, that clearly
> > shows it is not part of the system runtime support.
>
> Correct, tracing is not something that is needed to make the system run,
> but that's still no excuse to treat ABI changes any differently. Note, we
> don't change things within /proc/stat or /proc/*/stat, and those are not
> required to make the system run. We can add onto those files, but we
> can't change what the current numbers mean.
This is because this stat ABI is voluntarily exposed like this. It does
not mean that this is the case everywhere else in the kernel. And it
might not be the right way to expose it: I bet that PeterZ would really
like to get the thread priority value removed from /proc/*/stat, because
it exposes something "internal" to the scheduler from his point of view,
but this particular ABI has chosen to evolve without ever retiring a
value previously exported.
>
> >
> > That being said, ABI versioning could still handle ABI changes without
> > significantly impacting the users: when an ABI breakage is needed, we
> > can keep the old code around for a while and expose both the old and new
> > ABIs. This would ensure that the user-level tools can query for the
> > specific ABI major version(s) they support. That should improve the user
> > experience by providing "deprecated" console warnings for a few kernel
> > releases before the old code ends up being removed.
>
> ABI version numbers are meaningless, and prone to be broken. The version
> bump would have to go in with the same commit that makes the ABI change,
> otherwise git bisecting can get screwed up too.
Of course, the commit that updates the code would "fork" to a new ABI if
it ever needs to diverge from the old one.
> The way ABI changes in the kernel have always been handled is to look at
> the file itself and have the tool determine what version of the ABI is
> there based on what files exist, or what exists in the file.
> I've done this with trace-cmd and ftrace. The debugfs system has changed
> a lot, and trace-cmd can handle each change. I never had a need for a
> version number to do this. I simply have trace-cmd look at what is
> available and what isn't.
>
> If you need to know if a syscall exists, you try it and if you get
> -ENOSYS, then you know it doesn't exist. We have no need for an
> arbitrary version number that is meaningless. The existence (or lack)
> of the syscall tells us all we need to know.
pipe()/pipe2()
dup()/dup2()/dup3()
umount()/umount2()
mmap()/mmap2()
madvise()/madvise1()
eventfd()/eventfd2()
Those look very much like major version numbers to me. And these are
entirely compatible with your statement above about using -ENOSYS to
detect if the major version number is implemented or not.
If your only concern is that the major version number should be part of
the ABI name (as in the examples above), that can be arranged.
>
> >
> > So, in summary:
> >
> > * Old kernels vs new tools:
> >
> > New tools can query for the latest ABI they know, and fall back on older
> > ABIs, with limited features.
> >
> > * New kernels vs old tools:
> >
> > Keeping around the old ABI for a deprecation phase lets old tools work on
> > a bleeding edge kernel while the ABI change is being introduced, which
> > should satisfy the kernel developer use-case.
>
> We've done this without version numbers. Just look at all the udev
> changes.
Are you seriously referring to udev as an example of how to handle
changes, or as one of the worst ABI breakage messes that happened in
Linux kernel history? My own experience as a Linux user (in the
era around 2.6.12 kernels, if my memory serves me right) leads me to think
it's the latter. And because udev is part of the runtime support, that
indeed led to non-bootable systems and lots of frustrated users.
Thanks,
Mathieu
>
> -- Steve
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
* Re: [lttng-dev] Perf ABI (was: Re: [PATCH 09/11] sched: export task_prio to GPL modules)
2012-01-12 15:39 ` [lttng-dev] Perf ABI (was: " Mathieu Desnoyers
@ 2012-01-12 15:53 ` Steven Rostedt
2012-01-12 15:59 ` Steven Rostedt
2012-01-12 16:27 ` Mathieu Desnoyers
2012-01-12 20:00 ` Greg KH
1 sibling, 2 replies; 51+ messages in thread
From: Steven Rostedt @ 2012-01-12 15:53 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Mathieu Desnoyers, devel, Ted Ts'o, Peter Zijlstra, Greg KH,
linux-kernel, Arnaldo Carvalho de Melo, lttng-dev,
Thomas Gleixner, Ingo Molnar, Linus Torvalds, Andrew Morton
On Thu, 2012-01-12 at 10:39 -0500, Mathieu Desnoyers wrote:
> pipe()/pipe2()
> dup()/dup2()/dup3()
> umount()/umount2()
> mmap()/mmap2()
> madvise()/madvise1()
> eventfd()/eventfd2()
>
> Those look very much like major version numbers to me. And these are
> entirely compatible with your statement above about using -ENOSYS to
> detect if the major version number is implemented or not.
It's a stretch to call those version numbers. All but the madvise case
above reflect how many parameters the call takes, not really a "version"
number. It's adding a new syscall, not updating a version and then
deprecating the old one, as I believe all of the above are still supported.
>
> If your only concern is that the major version number should be part of
> the ABI name (as in the examples above), that can be arranged.
> >
> > We've done this without version numbers. Just look at all the udev
> > changes.
>
> Are you seriously referring to udev as an example of how to handle
> changes, or as one of the worst ABI breakage messes that happened in
> Linux kernel history? My own experience as a Linux user (in the
> era around 2.6.12 kernels, if my memory serves me right) leads me to think
> it's the latter. And because udev is part of the runtime support, that
> indeed led to non-bootable systems and lots of frustrated users.
Yeah, I know it sucked, as I got burned by it too. But having "version"
numbers wouldn't have helped at all. In fact, it should have kept both
ways working much longer, or at least had the new udev support both.
What udev did is more like what you want to do than what I did with
trace-cmd.
-- Steve
* Re: [lttng-dev] Perf ABI (was: Re: [PATCH 09/11] sched: export task_prio to GPL modules)
2012-01-12 15:53 ` Steven Rostedt
@ 2012-01-12 15:59 ` Steven Rostedt
2012-01-12 16:27 ` Mathieu Desnoyers
1 sibling, 0 replies; 51+ messages in thread
From: Steven Rostedt @ 2012-01-12 15:59 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Mathieu Desnoyers, devel, Ted Ts'o, Peter Zijlstra, Greg KH,
linux-kernel, Arnaldo Carvalho de Melo, lttng-dev,
Thomas Gleixner, Ingo Molnar, Linus Torvalds, Andrew Morton
On Thu, 2012-01-12 at 10:53 -0500, Steven Rostedt wrote:
> It's a stretch to call those version numbers. All but the madvise case
> above reflect how many parameters the call takes, not really a "version"
> number.
>
> It's adding a new syscall, not updating a version and then deprecating
> the old one, as I believe all of the above are still supported.
>
Actually, madvise1() isn't supported. But this just shows that it
has nothing to do with a version number. What version is madvise()?
-- Steve
* Re: [lttng-dev] Perf ABI (was: Re: [PATCH 09/11] sched: export task_prio to GPL modules)
2012-01-12 15:53 ` Steven Rostedt
2012-01-12 15:59 ` Steven Rostedt
@ 2012-01-12 16:27 ` Mathieu Desnoyers
2012-01-12 16:34 ` Steven Rostedt
1 sibling, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2012-01-12 16:27 UTC (permalink / raw)
To: Steven Rostedt
Cc: devel, Ted Ts'o, Peter Zijlstra, Greg KH, linux-kernel,
Arnaldo Carvalho de Melo, lttng-dev, Thomas Gleixner,
Ingo Molnar, Linus Torvalds, Andrew Morton
* Steven Rostedt (rostedt@goodmis.org) wrote:
> On Thu, 2012-01-12 at 10:39 -0500, Mathieu Desnoyers wrote:
>
> > pipe()/pipe2()
> > dup()/dup2()/dup3()
> > umount()/umount2()
> > mmap()/mmap2()
> > madvise()/madvise1()
> > eventfd()/eventfd2()
> >
> > Those look very much like major version numbers to me. And these are
> > entirely compatible with your statement above about using -ENOSYS to
> > detect if the major version number is implemented or not.
>
> It's a stretch to call those version numbers. All but the madvise case
> above reflect how many parameters the call takes, not really a "version"
> number.
>
> It's adding a new syscall, not updating a version and then deprecating
> the old one, as I believe all of the above are still supported.
>
> >
> > If your only concern is that the major version number should be part of
> > the ABI name (as in the examples above), that can be arranged.
>
> > >
> > > We've done this without version numbers. Just look at all the udev
> > > changes.
> >
> > Are you seriously referring to udev as an example of how to handle
> > changes, or as one of the worst ABI breakage messes that happened in
> > Linux kernel history? My own experience as a Linux user (in the
> > era around 2.6.12 kernels, if my memory serves me right) leads me to think
> > it's the latter. And because udev is part of the runtime support, that
> > indeed led to non-bootable systems and lots of frustrated users.
>
> Yeah, I know it sucked, as I got burned by it too. But having "version"
> numbers wouldn't have helped at all. In fact, it should have kept both
> ways working much longer, or at least had the new udev support both.
>
> What udev did is more like what you want to do than what I did with
> trace-cmd.
OK. Then how can trace-cmd support the LTTng features ?
Thanks,
Mathieu
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
* Re: [lttng-dev] Perf ABI (was: Re: [PATCH 09/11] sched: export task_prio to GPL modules)
2012-01-12 16:27 ` Mathieu Desnoyers
@ 2012-01-12 16:34 ` Steven Rostedt
0 siblings, 0 replies; 51+ messages in thread
From: Steven Rostedt @ 2012-01-12 16:34 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: devel, Ted Ts'o, Peter Zijlstra, Greg KH, linux-kernel,
Arnaldo Carvalho de Melo, lttng-dev, Thomas Gleixner,
Ingo Molnar, Linus Torvalds, Andrew Morton
On Thu, 2012-01-12 at 11:27 -0500, Mathieu Desnoyers wrote:
> > What udev did is more like what you want to do than what I did with
> > trace-cmd.
>
> OK. Then how can trace-cmd support the LTTng features ?
New syscalls, or new files, and simply check if they exist.
New features should not break old ones.
-- Steve
* Re: [lttng-dev] Perf ABI (was: Re: [PATCH 09/11] sched: export task_prio to GPL modules)
2012-01-12 15:39 ` [lttng-dev] Perf ABI (was: " Mathieu Desnoyers
2012-01-12 15:53 ` Steven Rostedt
@ 2012-01-12 20:00 ` Greg KH
2012-01-16 8:55 ` Ingo Molnar
1 sibling, 1 reply; 51+ messages in thread
From: Greg KH @ 2012-01-12 20:00 UTC (permalink / raw)
To: Mathieu Desnoyers
Cc: Steven Rostedt, Mathieu Desnoyers, devel, Ted Ts'o,
Peter Zijlstra, linux-kernel, Arnaldo Carvalho de Melo,
lttng-dev, Thomas Gleixner, Ingo Molnar, Linus Torvalds,
Andrew Morton
On Thu, Jan 12, 2012 at 10:39:57AM -0500, Mathieu Desnoyers wrote:
> > We've done this without version numbers. Just look at all the udev
> > changes.
>
> Are you seriously referring to udev as an example of how to handle
> changes, or as one of the worst ABI breakage messes that happened in
> Linux kernel history? My own experience as a Linux user (in the
> era around 2.6.12 kernels, if my memory serves me right) leads me to think
> it's the latter. And because udev is part of the runtime support, that
> indeed led to non-bootable systems and lots of frustrated users.
Really? You fail to remember the fact that we _fixed_ those
non-bootable systems by putting the userspace bits back, and symlinks,
and all other sorts of gyrations in order to prevent userspace from
breaking again.
And it worked, and people's machines worked again, and no one since then
has reported a problem.
So I think udev actually is a good example of how to do it right, we
provide proper backwards compatibility in the kernel to keep userspace
working.
thanks,
greg k-h
* Re: [lttng-dev] Perf ABI (was: Re: [PATCH 09/11] sched: export task_prio to GPL modules)
2012-01-12 20:00 ` Greg KH
@ 2012-01-16 8:55 ` Ingo Molnar
0 siblings, 0 replies; 51+ messages in thread
From: Ingo Molnar @ 2012-01-16 8:55 UTC (permalink / raw)
To: Greg KH
Cc: Mathieu Desnoyers, Steven Rostedt, Mathieu Desnoyers, devel,
Ted Ts'o, Peter Zijlstra, linux-kernel,
Arnaldo Carvalho de Melo, lttng-dev, Thomas Gleixner,
Linus Torvalds, Andrew Morton
* Greg KH <greg@kroah.com> wrote:
> So I think udev actually is a good example of how to do it
> right, we provide proper backwards compatibility in the kernel
> to keep userspace working.
I agree, i still have a udev system that i installed 5 years
ago, and it's working mostly fine with current kernels.
Compatibility is a desirable property, it is something that
preserves our users - and if done right it's almost never a big
issue technically. If it is hindering someone then there must be
other problems.
Of course to developers the simplest approach is always to just
develop without regard for compatibility. The simplest form of
that is that people write patches that work fine on their own
systems but crash the kernel on other systems. We fix those
bugs. Another, subtler form is when the patches work fine on
their systems but break apps on other systems. We fix those bugs
too.
That's why we have testing, regression tracking and maintainers,
to control that - compatibility is just another dimension to
'correctness', in the typical case with no inherent restrictions
on future features and possibilities.
Thanks,
Ingo
Thread overview: 51+ messages
[not found] <1322775683-8741-1-git-send-email-mathieu.desnoyers@efficios.com>
2011-12-01 21:41 ` [PATCH 01/11] mm: export vmalloc_sync_all symbol to GPL modules Mathieu Desnoyers
2011-12-01 21:57 ` Christoph Hellwig
2011-12-01 22:13 ` Greg KH
2011-12-01 22:19 ` Mathieu Desnoyers
2011-12-01 22:41 ` Greg KH
2011-12-01 22:28 ` Christoph Hellwig
2011-12-01 23:00 ` Greg KH
2011-12-01 21:41 ` [PATCH 03/11] fs/splice: export splice_to_pipe " Mathieu Desnoyers
2011-12-02 7:19 ` Jens Axboe
2011-12-02 12:32 ` Mathieu Desnoyers
2011-12-01 21:41 ` [PATCH 09/11] sched: export task_prio " Mathieu Desnoyers
2011-12-01 21:56 ` Peter Zijlstra
2011-12-01 22:04 ` Mathieu Desnoyers
2011-12-01 22:10 ` Peter Zijlstra
2011-12-01 22:15 ` Mathieu Desnoyers
2011-12-01 22:36 ` Mathieu Desnoyers
2011-12-01 23:05 ` Peter Zijlstra
2011-12-02 13:51 ` Mathieu Desnoyers
2011-12-01 23:06 ` Peter Zijlstra
2011-12-01 23:18 ` Greg KH
2011-12-01 23:47 ` Mathieu Desnoyers
2011-12-01 22:14 ` Greg KH
2011-12-01 22:20 ` Mathieu Desnoyers
2011-12-01 23:07 ` Peter Zijlstra
2011-12-01 23:17 ` Greg KH
2011-12-05 14:17 ` Ingo Molnar
2011-12-06 21:44 ` Greg KH
2011-12-08 5:23 ` Ingo Molnar
2011-12-08 23:27 ` Greg KH
2011-12-19 10:49 ` Ingo Molnar
2011-12-19 15:30 ` [lttng-dev] " Mathieu Desnoyers
2011-12-20 11:08 ` Ingo Molnar
2011-12-20 21:46 ` Frank Rowand
2011-12-23 10:51 ` Ingo Molnar
2011-12-21 18:47 ` Aaron Spear
2011-12-21 18:58 ` Christoph Hellwig
2011-12-23 16:46 ` Perf ABI (was: Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules) Mathieu Desnoyers
2011-12-23 17:21 ` Ted Ts'o
2011-12-23 18:16 ` Mathieu Desnoyers
2011-12-25 17:46 ` Ted Ts'o
2012-01-12 14:09 ` Mathieu Desnoyers
2012-01-12 14:54 ` Steven Rostedt
2012-01-12 15:39 ` [lttng-dev] Perf ABI (was: " Mathieu Desnoyers
2012-01-12 15:53 ` Steven Rostedt
2012-01-12 15:59 ` Steven Rostedt
2012-01-12 16:27 ` Mathieu Desnoyers
2012-01-12 16:34 ` Steven Rostedt
2012-01-12 20:00 ` Greg KH
2012-01-16 8:55 ` Ingo Molnar
2011-12-07 22:57 ` [PATCH 09/11] sched: export task_prio to GPL modules Mathieu Desnoyers
2011-12-08 5:40 ` Ingo Molnar