linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 01/11] mm: export vmalloc_sync_all symbol to GPL modules
       [not found] <1322775683-8741-1-git-send-email-mathieu.desnoyers@efficios.com>
@ 2011-12-01 21:41 ` Mathieu Desnoyers
  2011-12-01 21:57   ` Christoph Hellwig
  2011-12-01 21:41 ` [PATCH 03/11] fs/splice: export splice_to_pipe " Mathieu Desnoyers
  2011-12-01 21:41 ` [PATCH 09/11] sched: export task_prio " Mathieu Desnoyers
  2 siblings, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2011-12-01 21:41 UTC (permalink / raw)
  To: Greg KH, Mathieu Desnoyers
  Cc: devel, lttng-dev, Mathieu Desnoyers, Linus Torvalds,
	Christoph Hellwig, Christoph Lameter, Tejun Heo, David Howells,
	David McCullough, D Jeff Dionne, Greg Ungerer, Paul Mundt,
	linux-mm, linux-kernel, Greg KH

LTTng needs this symbol exported. It calls it to ensure its tracing
buffers and allocated data structures never trigger a page fault. This
is required to handle page fault handler tracing and NMI tracing
gracefully.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Christoph Hellwig <hch@infradead.org>
CC: Christoph Lameter <cl@linux-foundation.org>
CC: Tejun Heo <tj@kernel.org>
CC: David Howells <dhowells@redhat.com>
CC: David McCullough <davidm@snapgear.com>
CC: D Jeff Dionne <jeff@uClinux.org>
CC: Greg Ungerer <gerg@snapgear.com>
CC: Paul Mundt <lethal@linux-sh.org>
CC: linux-mm@kvack.org
CC: linux-kernel@vger.kernel.org
CC: Greg KH <greg@kroah.com>
---
 mm/nommu.c   |    1 +
 mm/vmalloc.c |    1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/mm/nommu.c b/mm/nommu.c
index b982290..b22a0d9 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -441,6 +441,7 @@ EXPORT_SYMBOL_GPL(vm_unmap_aliases);
 void  __attribute__((weak)) vmalloc_sync_all(void)
 {
 }
+EXPORT_SYMBOL_GPL(vmalloc_sync_all);
 
 /**
  *	alloc_vm_area - allocate a range of kernel address space
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 3231bf3..37ddce5 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2137,6 +2137,7 @@ EXPORT_SYMBOL(remap_vmalloc_range);
 void  __attribute__((weak)) vmalloc_sync_all(void)
 {
 }
+EXPORT_SYMBOL_GPL(vmalloc_sync_all);
 
 
 static int f(pte_t *pte, pgtable_t table, unsigned long addr, void *data)
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 03/11] fs/splice: export splice_to_pipe to GPL modules
       [not found] <1322775683-8741-1-git-send-email-mathieu.desnoyers@efficios.com>
  2011-12-01 21:41 ` [PATCH 01/11] mm: export vmalloc_sync_all symbol to GPL modules Mathieu Desnoyers
@ 2011-12-01 21:41 ` Mathieu Desnoyers
  2011-12-02  7:19   ` Jens Axboe
  2011-12-01 21:41 ` [PATCH 09/11] sched: export task_prio " Mathieu Desnoyers
  2 siblings, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2011-12-01 21:41 UTC (permalink / raw)
  To: Greg KH, Mathieu Desnoyers
  Cc: devel, lttng-dev, Mathieu Desnoyers, Linus Torvalds, Ingo Molnar,
	Jens Axboe, linux-kernel, Greg KH

The LTTng driver needs this symbol exported because it implements its
own splice actor.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Linus Torvalds <torvalds@linux-foundation.org>
CC: Ingo Molnar <mingo@elte.hu>
CC: Jens Axboe <axboe@kernel.dk>
CC: linux-kernel@vger.kernel.org
CC: Greg KH <greg@kroah.com>
---
 fs/splice.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index fa2defa..9eb15b5 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -263,6 +263,7 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
 
 	return ret;
 }
+EXPORT_SYMBOL_GPL(splice_to_pipe);
 
 void spd_release_page(struct splice_pipe_desc *spd, unsigned int i)
 {
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [PATCH 09/11] sched: export task_prio to GPL modules
       [not found] <1322775683-8741-1-git-send-email-mathieu.desnoyers@efficios.com>
  2011-12-01 21:41 ` [PATCH 01/11] mm: export vmalloc_sync_all symbol to GPL modules Mathieu Desnoyers
  2011-12-01 21:41 ` [PATCH 03/11] fs/splice: export splice_to_pipe " Mathieu Desnoyers
@ 2011-12-01 21:41 ` Mathieu Desnoyers
  2011-12-01 21:56   ` Peter Zijlstra
  2 siblings, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2011-12-01 21:41 UTC (permalink / raw)
  To: Greg KH, Mathieu Desnoyers
  Cc: devel, lttng-dev, Mathieu Desnoyers, Ingo Molnar, Peter Zijlstra,
	linux-kernel, Greg KH

LTTng needs this symbol to prepend the current task dynamic priority
value to events (optional context information).

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Ingo Molnar <mingo@elte.hu>
CC: Peter Zijlstra <peterz@infradead.org>
CC: linux-kernel@vger.kernel.org
CC: Greg KH <greg@kroah.com>
---
 kernel/sched.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 0e9344a..80dbb09 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -5142,6 +5142,7 @@ int task_prio(const struct task_struct *p)
 {
 	return p->prio - MAX_RT_PRIO;
 }
+EXPORT_SYMBOL_GPL(task_prio);
 
 /**
  * task_nice - return the nice value of a given task.
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH 09/11] sched: export task_prio to GPL modules
  2011-12-01 21:41 ` [PATCH 09/11] sched: export task_prio " Mathieu Desnoyers
@ 2011-12-01 21:56   ` Peter Zijlstra
  2011-12-01 22:04     ` Mathieu Desnoyers
  2011-12-01 22:14     ` Greg KH
  0 siblings, 2 replies; 51+ messages in thread
From: Peter Zijlstra @ 2011-12-01 21:56 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: Greg KH, devel, lttng-dev, Ingo Molnar, linux-kernel

On Thu, 2011-12-01 at 16:41 -0500, Mathieu Desnoyers wrote:
> LTTng needs this symbol to prepend the current task dynamic priority
> value to events (optional context information).

I absolutely detest exporting such stuff. It propagates the idea that
task prio actually means something. Also, modules really shouldn't care.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 01/11] mm: export vmalloc_sync_all symbol to GPL modules
  2011-12-01 21:41 ` [PATCH 01/11] mm: export vmalloc_sync_all symbol to GPL modules Mathieu Desnoyers
@ 2011-12-01 21:57   ` Christoph Hellwig
  2011-12-01 22:13     ` Greg KH
  0 siblings, 1 reply; 51+ messages in thread
From: Christoph Hellwig @ 2011-12-01 21:57 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Greg KH, devel, lttng-dev, Linus Torvalds, Christoph Hellwig,
	Christoph Lameter, Tejun Heo, David Howells, David McCullough,
	D Jeff Dionne, Greg Ungerer, Paul Mundt, linux-mm, linux-kernel

On Thu, Dec 01, 2011 at 04:41:13PM -0500, Mathieu Desnoyers wrote:
> LTTng needs this symbol exported. It calls it to ensure its tracing
> buffers and allocated data structures never trigger a page fault. This
> is required to handle page fault handler tracing and NMI tracing
> gracefully.

We:

 a) don't export symbols unless they have an intree-user
 b) especially don't export something as lowlevel as this one.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 09/11] sched: export task_prio to GPL modules
  2011-12-01 21:56   ` Peter Zijlstra
@ 2011-12-01 22:04     ` Mathieu Desnoyers
  2011-12-01 22:10       ` Peter Zijlstra
  2011-12-01 22:14     ` Greg KH
  1 sibling, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2011-12-01 22:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Greg KH, devel, lttng-dev, Ingo Molnar, linux-kernel, Darren Hart

* Peter Zijlstra (peterz@infradead.org) wrote:
> On Thu, 2011-12-01 at 16:41 -0500, Mathieu Desnoyers wrote:
> > LTTng needs this symbol to prepend the current task dynamic priority
> > value to events (optional context information).
> 
> I absolutely detest exporting such stuff. It propagates the idea that
> task prio actually means something. Also, modules really shouldn't care.

People debugging their SCHED_FIFO/SCHED_RR applications, as well as
users of priority-inheritance futexes, may happen to find this
information extremely useful.

Just saying...

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 09/11] sched: export task_prio to GPL modules
  2011-12-01 22:04     ` Mathieu Desnoyers
@ 2011-12-01 22:10       ` Peter Zijlstra
  2011-12-01 22:15         ` Mathieu Desnoyers
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2011-12-01 22:10 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Greg KH, devel, lttng-dev, Ingo Molnar, linux-kernel, Darren Hart

On Thu, 2011-12-01 at 17:04 -0500, Mathieu Desnoyers wrote:
> * Peter Zijlstra (peterz@infradead.org) wrote:
> > On Thu, 2011-12-01 at 16:41 -0500, Mathieu Desnoyers wrote:
> > > LTTng needs this symbol to prepend the current task dynamic priority
> > > value to events (optional context information).
> > 
> > I absolutely detest exporting such stuff. It propagates the idea that
> > task prio actually means something. Also, modules really shouldn't care.
> 
> People debugging their SCHED_FIFO/SCHED_RR applications, as well as
> users of priority-inheritance futexes, may happen to find this
> information extremely useful.
> 
> Just saying...

Right until the moment we go do deadlines.. Anyway, it still doesn't
make sense, your sched_switch() tracepoint handler gets this
information, why do you need this export at all?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 01/11] mm: export vmalloc_sync_all symbol to GPL modules
  2011-12-01 21:57   ` Christoph Hellwig
@ 2011-12-01 22:13     ` Greg KH
  2011-12-01 22:19       ` Mathieu Desnoyers
  2011-12-01 22:28       ` Christoph Hellwig
  0 siblings, 2 replies; 51+ messages in thread
From: Greg KH @ 2011-12-01 22:13 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Mathieu Desnoyers, devel, lttng-dev, Linus Torvalds,
	Christoph Lameter, Tejun Heo, David Howells, David McCullough,
	D Jeff Dionne, Greg Ungerer, Paul Mundt, linux-mm, linux-kernel

On Thu, Dec 01, 2011 at 04:57:00PM -0500, Christoph Hellwig wrote:
> On Thu, Dec 01, 2011 at 04:41:13PM -0500, Mathieu Desnoyers wrote:
> > LTTng needs this symbol exported. It calls it to ensure its tracing
> > buffers and allocated data structures never trigger a page fault. This
> > is required to handle page fault handler tracing and NMI tracing
> > gracefully.
> 
> We:
> 
>  a) don't export symbols unless they have an intree-user

lttng is now in-tree in the drivers/staging/ area.  See linux-next for
details if you are curious.

>  b) especially don't export something as lowlevel as this one.

Mathieu, there's nothing else you can do to get this information?  Or
does lttng really want such lowlevel data?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 09/11] sched: export task_prio to GPL modules
  2011-12-01 21:56   ` Peter Zijlstra
  2011-12-01 22:04     ` Mathieu Desnoyers
@ 2011-12-01 22:14     ` Greg KH
  2011-12-01 22:20       ` Mathieu Desnoyers
  2011-12-01 23:07       ` Peter Zijlstra
  1 sibling, 2 replies; 51+ messages in thread
From: Greg KH @ 2011-12-01 22:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mathieu Desnoyers, devel, lttng-dev, Ingo Molnar, linux-kernel

On Thu, Dec 01, 2011 at 10:56:08PM +0100, Peter Zijlstra wrote:
> On Thu, 2011-12-01 at 16:41 -0500, Mathieu Desnoyers wrote:
> > LTTng needs this symbol to prepend the current task dynamic priority
> > value to events (optional context information).
> 
> I absolutely detest exporting such stuff. It propagates the idea that
> task prio actually means something. Also, modules really shouldn't care.

Mathieu, if you don't have this information, does anything really care?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 09/11] sched: export task_prio to GPL modules
  2011-12-01 22:10       ` Peter Zijlstra
@ 2011-12-01 22:15         ` Mathieu Desnoyers
  2011-12-01 22:36           ` Mathieu Desnoyers
  2011-12-01 23:06           ` Peter Zijlstra
  0 siblings, 2 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2011-12-01 22:15 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Greg KH, devel, lttng-dev, Ingo Molnar, linux-kernel, Darren Hart

* Peter Zijlstra (peterz@infradead.org) wrote:
> On Thu, 2011-12-01 at 17:04 -0500, Mathieu Desnoyers wrote:
> > * Peter Zijlstra (peterz@infradead.org) wrote:
> > > On Thu, 2011-12-01 at 16:41 -0500, Mathieu Desnoyers wrote:
> > > > LTTng needs this symbol to prepend the current task dynamic priority
> > > > value to events (optional context information).
> > > 
> > > I absolutely detest exporting such stuff. It propagates the idea that
> > > task prio actually means something. Also, modules really shouldn't care.
> > 
> > People debugging their SCHED_FIFO/SCHED_RR applications, as well as
> > users of priority-inheritance futexes, may happen to find this
> > information extremely useful.
> > 
> > Just saying...
> 
> Right until the moment we go do deadlines.. Anyway, it still doesn't
> make sense, your sched_switch() tracepoint handler gets this
> information, why do you need this export at all?

If you don't want to trace sched_switch, but just conveniently prepend
this information to all your events, then lttng lets you dynamically
target this extra bit of information. Note that it's not a mandatory
event field: I call those "context" fields that the tracer prepends to
events, as requested by the user.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 01/11] mm: export vmalloc_sync_all symbol to GPL modules
  2011-12-01 22:13     ` Greg KH
@ 2011-12-01 22:19       ` Mathieu Desnoyers
  2011-12-01 22:41         ` Greg KH
  2011-12-01 22:28       ` Christoph Hellwig
  1 sibling, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2011-12-01 22:19 UTC (permalink / raw)
  To: Greg KH
  Cc: Christoph Hellwig, devel, lttng-dev, Linus Torvalds,
	Christoph Lameter, Tejun Heo, David Howells, David McCullough,
	D Jeff Dionne, Greg Ungerer, Paul Mundt, linux-mm, linux-kernel

* Greg KH (greg@kroah.com) wrote:
> On Thu, Dec 01, 2011 at 04:57:00PM -0500, Christoph Hellwig wrote:
> > On Thu, Dec 01, 2011 at 04:41:13PM -0500, Mathieu Desnoyers wrote:
> > > LTTng needs this symbol exported. It calls it to ensure its tracing
> > > buffers and allocated data structures never trigger a page fault. This
> > > is required to handle page fault handler tracing and NMI tracing
> > > gracefully.
> > 
> > We:
> > 
> >  a) don't export symbols unless they have an intree-user
> 
> lttng is now in-tree in the drivers/staging/ area.  See linux-next for
> details if you are curious.
> 
> >  b) especially don't export something as lowlevel as this one.
> 
> Mathieu, there's nothing else you can do to get this information?  Or
> does lttng really want such lowlevel data?

LTTng calls vmalloc_sync_all() to make sure it won't crash the system
(due to recursive page fault) when hooking on the page fault handler and
on any hook that would happen to sit in a function hit by NMI context.
So it really goes beyond just extracting information for this one I'm
afraid: it's a matter of execution correctness.

This is a point I'm really anal about: the tracer should _never_ crash
the traced system, _ever_, in any foreseeable condition.

Thanks,

Mathieu

> 
> thanks,
> 
> greg k-h

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 09/11] sched: export task_prio to GPL modules
  2011-12-01 22:14     ` Greg KH
@ 2011-12-01 22:20       ` Mathieu Desnoyers
  2011-12-01 23:07       ` Peter Zijlstra
  1 sibling, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2011-12-01 22:20 UTC (permalink / raw)
  To: Greg KH; +Cc: Peter Zijlstra, devel, lttng-dev, Ingo Molnar, linux-kernel

* Greg KH (greg@kroah.com) wrote:
> On Thu, Dec 01, 2011 at 10:56:08PM +0100, Peter Zijlstra wrote:
> > On Thu, 2011-12-01 at 16:41 -0500, Mathieu Desnoyers wrote:
> > > LTTng needs this symbol to prepend the current task dynamic priority
> > > value to events (optional context information).
> > 
> > I absolutely detest exporting such stuff. It propagates the idea that
> > task prio actually means something. Also, modules really shouldn't care.
> 
> Mathieu, if you don't have this information, does anything really care?

I can just remove this specific context module, nothing else will care
except the end users, but it's a shame to lose this option.

Thanks,

Mathieu

> 
> thanks,
> 
> greg k-h

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 01/11] mm: export vmalloc_sync_all symbol to GPL modules
  2011-12-01 22:13     ` Greg KH
  2011-12-01 22:19       ` Mathieu Desnoyers
@ 2011-12-01 22:28       ` Christoph Hellwig
  2011-12-01 23:00         ` Greg KH
  1 sibling, 1 reply; 51+ messages in thread
From: Christoph Hellwig @ 2011-12-01 22:28 UTC (permalink / raw)
  To: Greg KH
  Cc: Christoph Hellwig, Mathieu Desnoyers, devel, lttng-dev,
	Linus Torvalds, Christoph Lameter, Tejun Heo, David Howells,
	David McCullough, D Jeff Dionne, Greg Ungerer, Paul Mundt,
	linux-mm, linux-kernel

On Thu, Dec 01, 2011 at 02:13:37PM -0800, Greg KH wrote:
> On Thu, Dec 01, 2011 at 04:57:00PM -0500, Christoph Hellwig wrote:
> > On Thu, Dec 01, 2011 at 04:41:13PM -0500, Mathieu Desnoyers wrote:
> > > LTTng needs this symbol exported. It calls it to ensure its tracing
> > > buffers and allocated data structures never trigger a page fault. This
> > > is required to handle page fault handler tracing and NMI tracing
> > > gracefully.
> > 
> > We:
> > 
> >  a) don't export symbols unless they have an intree-user
> 
> lttng is now in-tree in the drivers/staging/ area.  See linux-next for
> details if you are curious.

Eww - merging stuff without discussion on lkml is more than evil.

Either way, it was guaranteed that drivers/staging is considered out of
tree for core code.  I'm defintively dead set against exporting anything
for staging and opening that slippery slope.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 09/11] sched: export task_prio to GPL modules
  2011-12-01 22:15         ` Mathieu Desnoyers
@ 2011-12-01 22:36           ` Mathieu Desnoyers
  2011-12-01 23:05             ` Peter Zijlstra
  2011-12-01 23:06           ` Peter Zijlstra
  1 sibling, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2011-12-01 22:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Greg KH, devel, lttng-dev, Ingo Molnar, linux-kernel, Darren Hart

* Mathieu Desnoyers (mathieu.desnoyers@efficios.com) wrote:
> * Peter Zijlstra (peterz@infradead.org) wrote:
> > On Thu, 2011-12-01 at 17:04 -0500, Mathieu Desnoyers wrote:
> > > * Peter Zijlstra (peterz@infradead.org) wrote:
> > > > On Thu, 2011-12-01 at 16:41 -0500, Mathieu Desnoyers wrote:
> > > > > LTTng needs this symbol to prepend the current task dynamic priority
> > > > > value to events (optional context information).
> > > > 
> > > > I absolutely detest exporting such stuff. It propagates the idea that
> > > > task prio actually means something. Also, modules really shouldn't care.
> > > 
> > > People debugging their SCHED_FIFO/SCHED_RR applications, as well as
> > > users of priority-inheritance futexes, may happen to find this
> > > information extremely useful.
> > > 
> > > Just saying...
> > 
> > Right until the moment we go do deadlines.. Anyway, it still doesn't
> > make sense, your sched_switch() tracepoint handler gets this
> > information, why do you need this export at all?
> 
> If you don't want to trace sched_switch, but just conveniently prepend
> this information to all your events, then lttng lets you dynamically
> target this extra bit of information. Note that it's not a mandatory
> event field: I call those "context" fields that the tracer prepends to
> events, as requested by the user.

One more point:

compudj@thinkos:/proc/204$ cat sched
khubd (204, #threads: 1)
---------------------------------------------------------
se.exec_start                      :       3355267.749529
se.vruntime                        :        113843.899081
se.sum_exec_runtime                :            12.820702
nr_switches                        :                  386
nr_voluntary_switches              :                  385
nr_involuntary_switches            :                    1
se.load.weight                     :                 1024
policy                             :                    0
prio                               :                  120
clock-delta                        :                  130

So what you are saying is that it is fine to export task_prio to
_userspace_, thus making it part of the ABI, but it's not OK to export
it to GPL modules ?

Weird huh ?

Mathieu

> 
> Thanks,
> 
> Mathieu
> 
> 
> -- 
> Mathieu Desnoyers
> Operating System Efficiency R&D Consultant
> EfficiOS Inc.
> http://www.efficios.com

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 01/11] mm: export vmalloc_sync_all symbol to GPL modules
  2011-12-01 22:19       ` Mathieu Desnoyers
@ 2011-12-01 22:41         ` Greg KH
  0 siblings, 0 replies; 51+ messages in thread
From: Greg KH @ 2011-12-01 22:41 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Christoph Hellwig, devel, lttng-dev, Linus Torvalds,
	Christoph Lameter, Tejun Heo, David Howells, David McCullough,
	D Jeff Dionne, Greg Ungerer, Paul Mundt, linux-mm, linux-kernel

On Thu, Dec 01, 2011 at 05:19:40PM -0500, Mathieu Desnoyers wrote:
> * Greg KH (greg@kroah.com) wrote:
> > On Thu, Dec 01, 2011 at 04:57:00PM -0500, Christoph Hellwig wrote:
> > > On Thu, Dec 01, 2011 at 04:41:13PM -0500, Mathieu Desnoyers wrote:
> > > > LTTng needs this symbol exported. It calls it to ensure its tracing
> > > > buffers and allocated data structures never trigger a page fault. This
> > > > is required to handle page fault handler tracing and NMI tracing
> > > > gracefully.
> > > 
> > > We:
> > > 
> > >  a) don't export symbols unless they have an intree-user
> > 
> > lttng is now in-tree in the drivers/staging/ area.  See linux-next for
> > details if you are curious.
> > 
> > >  b) especially don't export something as lowlevel as this one.
> > 
> > Mathieu, there's nothing else you can do to get this information?  Or
> > does lttng really want such lowlevel data?
> 
> LTTng calls vmalloc_sync_all() to make sure it won't crash the system
> (due to recursive page fault) when hooking on the page fault handler and
> on any hook that would happen to sit in a function hit by NMI context.
> So it really goes beyond just extracting information for this one I'm
> afraid: it's a matter of execution correctness.

Ok, fair enough.

Christoph, is there any other way to achive something like this without
this symbol being exported that you know of?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 01/11] mm: export vmalloc_sync_all symbol to GPL modules
  2011-12-01 22:28       ` Christoph Hellwig
@ 2011-12-01 23:00         ` Greg KH
  0 siblings, 0 replies; 51+ messages in thread
From: Greg KH @ 2011-12-01 23:00 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Mathieu Desnoyers, devel, lttng-dev, Linus Torvalds,
	Christoph Lameter, Tejun Heo, David Howells, David McCullough,
	D Jeff Dionne, Greg Ungerer, Paul Mundt, linux-mm, linux-kernel

On Thu, Dec 01, 2011 at 05:28:03PM -0500, Christoph Hellwig wrote:
> On Thu, Dec 01, 2011 at 02:13:37PM -0800, Greg KH wrote:
> > On Thu, Dec 01, 2011 at 04:57:00PM -0500, Christoph Hellwig wrote:
> > > On Thu, Dec 01, 2011 at 04:41:13PM -0500, Mathieu Desnoyers wrote:
> > > > LTTng needs this symbol exported. It calls it to ensure its tracing
> > > > buffers and allocated data structures never trigger a page fault. This
> > > > is required to handle page fault handler tracing and NMI tracing
> > > > gracefully.
> > > 
> > > We:
> > > 
> > >  a) don't export symbols unless they have an intree-user
> > 
> > lttng is now in-tree in the drivers/staging/ area.  See linux-next for
> > details if you are curious.
> 
> Eww - merging stuff without discussion on lkml is more than evil.

Do you really want discussing all staging driver crap on lkml?

Core changes, like this one, for stuff in staging should be done on
lkml, which is what this conversation is :)

> Either way, it was guaranteed that drivers/staging is considered out of
> tree for core code.

The zram and zcache code would tend to disagree with you there :)

> I'm defintively dead set against exporting anything for staging and
> opening that slippery slope.

How else should we handle something like this then?  Some code, this one
specifically, is trying to get merged, so taking it slowly, through
staging, and getting it reviewed and cleaned up better before it can go
into the "real" part of the kernel, is the whole goal here.

Here's a real need for a symbol that an existing, shipping, useful
kernel module is wanting to use.

If you can provide a way that this can be handled without such an
export, that does not require digging through the symbol table (which is
what it was doing and I rightfully objected to that), then please let us
know.

Otherwise, what are our alternatives here, to just forbid this code from
ever being merged?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 09/11] sched: export task_prio to GPL modules
  2011-12-01 22:36           ` Mathieu Desnoyers
@ 2011-12-01 23:05             ` Peter Zijlstra
  2011-12-02 13:51               ` Mathieu Desnoyers
  0 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2011-12-01 23:05 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Greg KH, devel, lttng-dev, Ingo Molnar, linux-kernel, Darren Hart

On Thu, 2011-12-01 at 17:36 -0500, Mathieu Desnoyers wrote:
> So what you are saying is that it is fine to export task_prio to
> _userspace_, thus making it part of the ABI, but it's not OK to export
> it to GPL modules ? 

that's a SCHED_DEBUG proc file.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 09/11] sched: export task_prio to GPL modules
  2011-12-01 22:15         ` Mathieu Desnoyers
  2011-12-01 22:36           ` Mathieu Desnoyers
@ 2011-12-01 23:06           ` Peter Zijlstra
  2011-12-01 23:18             ` Greg KH
  1 sibling, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2011-12-01 23:06 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Greg KH, devel, lttng-dev, Ingo Molnar, linux-kernel, Darren Hart

On Thu, 2011-12-01 at 17:15 -0500, Mathieu Desnoyers wrote:
> 
> If you don't want to trace sched_switch, but just conveniently prepend
> this information to all your events 

Oh so you want to debug a scheduler issue but don't want to use the
scheduler tracepoint, I guess that makes perfect sense for clueless
people.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 09/11] sched: export task_prio to GPL modules
  2011-12-01 22:14     ` Greg KH
  2011-12-01 22:20       ` Mathieu Desnoyers
@ 2011-12-01 23:07       ` Peter Zijlstra
  2011-12-01 23:17         ` Greg KH
  1 sibling, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2011-12-01 23:07 UTC (permalink / raw)
  To: Greg KH; +Cc: Mathieu Desnoyers, devel, lttng-dev, Ingo Molnar, linux-kernel

On Thu, 2011-12-01 at 14:14 -0800, Greg KH wrote:
> greg k-h

Greg, why are you merging this crap anyway? Aren't there enough tracer
thingies around already?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 09/11] sched: export task_prio to GPL modules
  2011-12-01 23:07       ` Peter Zijlstra
@ 2011-12-01 23:17         ` Greg KH
  2011-12-05 14:17           ` Ingo Molnar
  0 siblings, 1 reply; 51+ messages in thread
From: Greg KH @ 2011-12-01 23:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mathieu Desnoyers, devel, lttng-dev, Ingo Molnar, linux-kernel

On Fri, Dec 02, 2011 at 12:07:10AM +0100, Peter Zijlstra wrote:
> On Thu, 2011-12-01 at 14:14 -0800, Greg KH wrote:
> > greg k-h
> 
> Greg, why are you merging this crap anyway? Aren't there enough tracer
> thingies around already?

I don't know, is there?

There's some reason the distros, and users, still use lttng, so I'm
guessing that it fits the needs of quite a few people.

That's why I'm merging it, if that the in-kernel stuff obsoletes lttng,
great, let me, and the distros know.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 09/11] sched: export task_prio to GPL modules
  2011-12-01 23:06           ` Peter Zijlstra
@ 2011-12-01 23:18             ` Greg KH
  2011-12-01 23:47               ` Mathieu Desnoyers
  0 siblings, 1 reply; 51+ messages in thread
From: Greg KH @ 2011-12-01 23:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mathieu Desnoyers, devel, lttng-dev, Ingo Molnar, linux-kernel,
	Darren Hart

On Fri, Dec 02, 2011 at 12:06:37AM +0100, Peter Zijlstra wrote:
> On Thu, 2011-12-01 at 17:15 -0500, Mathieu Desnoyers wrote:
> > 
> > If you don't want to trace sched_switch, but just conveniently prepend
> > this information to all your events 
> 
> Oh so you want to debug a scheduler issue but don't want to use the
> scheduler tracepoint, I guess that makes perfect sense for clueless
> people.

Matheiu, can't lttng use the scheduler tracepoint for this information?



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 09/11] sched: export task_prio to GPL modules
  2011-12-01 23:18             ` Greg KH
@ 2011-12-01 23:47               ` Mathieu Desnoyers
  0 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2011-12-01 23:47 UTC (permalink / raw)
  To: Greg KH
  Cc: Peter Zijlstra, devel, lttng-dev, Ingo Molnar, linux-kernel, Darren Hart

* Greg KH (greg@kroah.com) wrote:
> On Fri, Dec 02, 2011 at 12:06:37AM +0100, Peter Zijlstra wrote:
> > On Thu, 2011-12-01 at 17:15 -0500, Mathieu Desnoyers wrote:
> > > 
> > > If you don't want to trace sched_switch, but just conveniently prepend
> > > this information to all your events 
> > 
> > Oh so you want to debug a scheduler issue but don't want to use the
> > scheduler tracepoint, I guess that makes perfect sense for clueless
> > people.
> 
> Matheiu, can't lttng use the scheduler tracepoint for this information?

LTTng allows user to choose between both methods, each one being suited
to a particular use of the tracer:

A) Extraction through the scheduler tracepoint:

   LTTng viewers have a full-fledged current state reconstruction of the
   traced OS (for any point in time during the trace) performed as one
   of the bottom layers of our trace analysis tools. This makes sense
   for use-cases where the data needs to be transported, and/or stored,
   and where the amount of data throughput needs to be minimized. We use
   this technique a lot, of course. This state-tracking requires
   CPU/memory resource usage by the viewer.

B) Extraction through "optional" event context information:

   We have, in development, a new "enhanced top" called lttngtop that
   uses tracing information, directly read from mmap'd buffers, to
   provide second-by-second profile information of the system. It is
   not as sensitive to data compactness as the transport/disk storage
   use-case, mainly because no data copy is ever required -- the buffers
   simply get overwritten after lttngtop has finished aggregating the
   information. This has less performance overhead that the big hammer
   "top" that periodically reads all files in /proc, and can provide
   much more detailed profiles.

   This use-case favors sending additional data from kernel to
   user-space rather than recomputing the OS state within lttngtop, due
   to the very low overhead of direct mmap data transport, over
   recomputing state needlessly.

We could very well "cheat" and use a scheduler tracepoint to keep a
duplicate of the current priority value for each CPU within the tracer
kernel module. Let me know if you want me to do this.

Also, as a matter of fact, the "prio" information exported from the
sched_switch event in mainline trace events does not match the prio
shown in /proc stat files. The "MAX_RT_PRIO" offset is missing.

Thanks,

Mathieu


-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 03/11] fs/splice: export splice_to_pipe to GPL modules
  2011-12-01 21:41 ` [PATCH 03/11] fs/splice: export splice_to_pipe " Mathieu Desnoyers
@ 2011-12-02  7:19   ` Jens Axboe
  2011-12-02 12:32     ` Mathieu Desnoyers
  0 siblings, 1 reply; 51+ messages in thread
From: Jens Axboe @ 2011-12-02  7:19 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Greg KH, devel, lttng-dev, Linus Torvalds, Ingo Molnar,
	Jens Axboe, linux-kernel

On 2011-12-01 22:41, Mathieu Desnoyers wrote:
> The LTTng driver needs this symbol exported because it implements its
> own splice actor.
> 
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> CC: Linus Torvalds <torvalds@linux-foundation.org>
> CC: Ingo Molnar <mingo@elte.hu>
> CC: Jens Axboe <axboe@kernel.dk>
> CC: linux-kernel@vger.kernel.org
> CC: Greg KH <greg@kroah.com>
> ---
>  fs/splice.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/splice.c b/fs/splice.c
> index fa2defa..9eb15b5 100644
> --- a/fs/splice.c
> +++ b/fs/splice.c
> @@ -263,6 +263,7 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
>  
>  	return ret;
>  }
> +EXPORT_SYMBOL_GPL(splice_to_pipe);

The rest of the splice symbols are regular exports, please do the same
for this one. Thanks.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 03/11] fs/splice: export splice_to_pipe to GPL modules
  2011-12-02  7:19   ` Jens Axboe
@ 2011-12-02 12:32     ` Mathieu Desnoyers
  0 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2011-12-02 12:32 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Greg KH, devel, lttng-dev, Linus Torvalds, Ingo Molnar,
	Jens Axboe, linux-kernel

* Jens Axboe (jens@axboe.dk) wrote:
> On 2011-12-01 22:41, Mathieu Desnoyers wrote:
> > The LTTng driver needs this symbol exported because it implements its
> > own splice actor.
> > 
> > Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> > CC: Linus Torvalds <torvalds@linux-foundation.org>
> > CC: Ingo Molnar <mingo@elte.hu>
> > CC: Jens Axboe <axboe@kernel.dk>
> > CC: linux-kernel@vger.kernel.org
> > CC: Greg KH <greg@kroah.com>
> > ---
> >  fs/splice.c |    1 +
> >  1 files changed, 1 insertions(+), 0 deletions(-)
> > 
> > diff --git a/fs/splice.c b/fs/splice.c
> > index fa2defa..9eb15b5 100644
> > --- a/fs/splice.c
> > +++ b/fs/splice.c
> > @@ -263,6 +263,7 @@ ssize_t splice_to_pipe(struct pipe_inode_info *pipe,
> >  
> >  	return ret;
> >  }
> > +EXPORT_SYMBOL_GPL(splice_to_pipe);
> 
> The rest of the splice symbols are regular exports, please do the same
> for this one. Thanks.

I've been wondering about this one, but thought it would be better to
let you decide on opening up the symbol more than with _GPL. Will do!

Thanks,

Mathieu


-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 09/11] sched: export task_prio to GPL modules
  2011-12-01 23:05             ` Peter Zijlstra
@ 2011-12-02 13:51               ` Mathieu Desnoyers
  0 siblings, 0 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2011-12-02 13:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Greg KH, devel, lttng-dev, Ingo Molnar, linux-kernel, Darren Hart

* Peter Zijlstra (peterz@infradead.org) wrote:
> On Thu, 2011-12-01 at 17:36 -0500, Mathieu Desnoyers wrote:
> > So what you are saying is that it is fine to export task_prio to
> > _userspace_, thus making it part of the ABI, but it's not OK to export
> > it to GPL modules ? 
> 
> that's a SCHED_DEBUG proc file.

Fair point. You'll then notice that /proc/<pid>/stat (18th field)
exports it too, and it's not under SCHED_DEBUG:

ok:/proc/20# cat stat
20 (migration/5) S 2 0 0 0 -1 2216722496 0 0 0 0 0 0 0 0 -100 0 1 0 70 0
0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 18446744071579371389 0
0 17 5 99 1 0 0 0

(see -100 above)

as defined in Documentation/filesystems/proc.txt:

"Table 1-4: Contents of the stat files (as of 2.6.30-rc7)
[...]
priority      priority level"

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 09/11] sched: export task_prio to GPL modules
  2011-12-01 23:17         ` Greg KH
@ 2011-12-05 14:17           ` Ingo Molnar
  2011-12-06 21:44             ` Greg KH
  2011-12-07 22:57             ` [PATCH 09/11] sched: export task_prio to GPL modules Mathieu Desnoyers
  0 siblings, 2 replies; 51+ messages in thread
From: Ingo Molnar @ 2011-12-05 14:17 UTC (permalink / raw)
  To: Greg KH
  Cc: Peter Zijlstra, Mathieu Desnoyers, devel, lttng-dev,
	linux-kernel, Linus Torvalds, Andrew Morton


* Greg KH <greg@kroah.com> wrote:

> On Fri, Dec 02, 2011 at 12:07:10AM +0100, Peter Zijlstra wrote:
> > On Thu, 2011-12-01 at 14:14 -0800, Greg KH wrote:
> > > greg k-h
> > 
> > Greg, why are you merging this crap anyway? Aren't there enough tracer
> > thingies around already?
> 
> I don't know, is there?
> 
> There's some reason the distros, and users, still use lttng, 
> so I'm guessing that it fits the needs of quite a few people.

Same goes for a whole lot of other crap that distros are 
carrying. Would we want to merge a different CPU scheduler or 
the 4g:4g patch or a completely new networking stack into 
drivers/staging/? I don't think so.

I.e. putting LTTNG into drivers/staging/ will not really solve 
anything - and in may in fact delay any sane technical 
resolution:

There's a difference between a driver that has to go into 
drivers/staging/ because nobody cares enough [and the driver 
isnt high quality enough yet], and a core kernel feature that we 
DO care about and which HAS BEEN REJECTED IN ITS FORM.

> That's why I'm merging it, if that the in-kernel stuff 
> obsoletes lttng, great, let me, and the distros know.

I'm NAK-ing the LTTNG driver really, as it's a workaround for a 
core kernel NAK.

Mathieu, please work with the tracing folks who DO care about 
this stuff. It's not like there's a lack of interest in this 
area, nor is there a lack of willingness to take patches. What 
there is a lack of is your willingness to actually work on 
getting something unified, integrated to users...

LTTNG has been going on for how many years? I havent seen many 
steps towards actually *merging* its functionality - you insist 
on doing your own random thing, which is different in random 
ways. Yes, some of those random ways may in fact be better than 
what we have upstream - would you be interested in filtering 
those out and pushing them upstream? I certainly would like to 
see that happen.

We want to pick the best features, and throw away current 
upstream code in favor of superior out of tree code - this 
concept of letting crap sit alongside each other when people do 
care i cannot agree with.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 09/11] sched: export task_prio to GPL modules
  2011-12-05 14:17           ` Ingo Molnar
@ 2011-12-06 21:44             ` Greg KH
  2011-12-08  5:23               ` Ingo Molnar
  2011-12-07 22:57             ` [PATCH 09/11] sched: export task_prio to GPL modules Mathieu Desnoyers
  1 sibling, 1 reply; 51+ messages in thread
From: Greg KH @ 2011-12-06 21:44 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Mathieu Desnoyers, devel, lttng-dev,
	linux-kernel, Linus Torvalds, Andrew Morton

On Mon, Dec 05, 2011 at 03:17:49PM +0100, Ingo Molnar wrote:
> 
> * Greg KH <greg@kroah.com> wrote:
> 
> > On Fri, Dec 02, 2011 at 12:07:10AM +0100, Peter Zijlstra wrote:
> > > On Thu, 2011-12-01 at 14:14 -0800, Greg KH wrote:
> > > > greg k-h
> > > 
> > > Greg, why are you merging this crap anyway? Aren't there enough tracer
> > > thingies around already?
> > 
> > I don't know, is there?
> > 
> > There's some reason the distros, and users, still use lttng, 
> > so I'm guessing that it fits the needs of quite a few people.
> 
> Same goes for a whole lot of other crap that distros are 
> carrying. Would we want to merge a different CPU scheduler or 
> the 4g:4g patch or a completely new networking stack into 
> drivers/staging/? I don't think so.

Distros have new CPU schedulers and are still dragging the 4g split
around?  A whole new networking stack would be interesting, and if
self-contained, possible :)

> I.e. putting LTTNG into drivers/staging/ will not really solve 
> anything - and in may in fact delay any sane technical 
> resolution:
> 
> There's a difference between a driver that has to go into 
> drivers/staging/ because nobody cares enough [and the driver 
> isnt high quality enough yet], and a core kernel feature that we 
> DO care about and which HAS BEEN REJECTED IN ITS FORM.

I didn't realize that lttng was rejected, when was that done?  I
couldn't find it in the archives anywhere.

That's why I took this.  It's a way for the code to get cleaned up, and
into "mergable" state, much easier, with more help than if it was
out-of-tree.  The fact that distros have been shipping and relying on it
for years shows that it is something that is needed, and it being
self-contained, makes it eligible for the staging tree.

> > That's why I'm merging it, if that the in-kernel stuff 
> > obsoletes lttng, great, let me, and the distros know.
> 
> I'm NAK-ing the LTTNG driver really, as it's a workaround for a 
> core kernel NAK.

Huh?

> Mathieu, please work with the tracing folks who DO care about 
> this stuff. It's not like there's a lack of interest in this 
> area, nor is there a lack of willingness to take patches. What 
> there is a lack of is your willingness to actually work on 
> getting something unified, integrated to users...
> 
> LTTNG has been going on for how many years? I havent seen many 
> steps towards actually *merging* its functionality - you insist 
> on doing your own random thing, which is different in random 
> ways. Yes, some of those random ways may in fact be better than 
> what we have upstream - would you be interested in filtering 
> those out and pushing them upstream? I certainly would like to 
> see that happen.
> 
> We want to pick the best features, and throw away current 
> upstream code in favor of superior out of tree code - this 
> concept of letting crap sit alongside each other when people do 
> care i cannot agree with.

Mathieu, a good explaination of what lttng has that the in-kernel
tracing and perf doesn't have would be a good place to start.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 09/11] sched: export task_prio to GPL modules
  2011-12-05 14:17           ` Ingo Molnar
  2011-12-06 21:44             ` Greg KH
@ 2011-12-07 22:57             ` Mathieu Desnoyers
  2011-12-08  5:40               ` Ingo Molnar
  1 sibling, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2011-12-07 22:57 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Greg KH, Peter Zijlstra, devel, lttng-dev, linux-kernel,
	Linus Torvalds, Andrew Morton, Thomas Gleixner, Steven Rostedt,
	Frederic Weisbecker

Hi Ingo,

* Ingo Molnar (mingo@elte.hu) wrote:
[...]
> Mathieu, please work with the tracing folks who DO care about 
> this stuff. It's not like there's a lack of interest in this 
> area, nor is there a lack of willingness to take patches. What 
> there is a lack of is your willingness to actually work on 
> getting something unified, integrated to users...
> 
> LTTNG has been going on for how many years? I havent seen many 
> steps towards actually *merging* its functionality - you insist 
> on doing your own random thing, which is different in random 
> ways. Yes, some of those random ways may in fact be better than 
> what we have upstream - would you be interested in filtering 
> those out and pushing them upstream? I certainly would like to 
> see that happen.
>
> We want to pick the best features, and throw away current 
> upstream code in favor of superior out of tree code - this 
> concept of letting crap sit alongside each other when people do 
> care i cannot agree with.

LTTng 2.0, today, offers a unified interface for kernel and userspace
tracing, in the form of libraries and git-alike command line user
interface. It produces a trace format (CTF) that has been developed in
collaboration with hardware vendors and reviewed by tracing developers
of the Linux community, which allows analyzing correlated traces across
the software and hardware stacks, and supports being streamed over the
network with zero-copy both in TCP, UDP format, with optional
encryption, checksum, and more. It supports multiple concurrent users,
and hooks with tracepoints, Perf PMU counters, kprobes, kretprobes, and
system calls, with the ability to attach "context" information prepended
before each event record as selected by the user when setting up a
tracing session.

It is currently self-contained: it's been designed to be shipped as a
stand-alone set of self-contained modules, but I recently received the
offer to get it pulled into staging, which I accepted.

In my opinion, tracers need to be split into three distinct parts:

1) core tracing infrastructure that _needs to_ be shared. This mainly
   targets instrumentation, and I've done my share of contribution to
   mainline on this front already. I think the infrastructure we have
   today is in pretty good shape.

2) tracing infrastructure that _could_ be shared. I'm mostly targeting ring
   buffers and trace clocks there. It could be a nice-to-have to share the
   implementation, as long as it does not get in the way of what each
   project is trying to achieve. So far, what I noticed is that each
   project is lacking understanding of the intent and constraints of the
   other projects, thus either considering what the others are doing
   as over- or under- engineering, depending on the context. Therefore,
   as long as there is no agreement on the right amount of care that
   needs to be put in the design of these components, it might be best
   to duplicate the implementation and slowly converge as each project
   gets to understand the other project's constraints. To make progress
   on this front, you need to have both code-bases into mainline.
   
3) interfaces to user-space: very much like filesystems, these ABIs
   don't need to be shared across projects that have different
   use-cases. Having multiple tracer ABIs, if self-contained, should
   not hurt anybody and just increase the rate of innovation. Sadly,
   the ABIs exposed by perf/ftrace do not seem to be a good fit for
   LTTng use-cases. Since the perf/ftrace ABIs, as well as the LTTng
   ABI, are all already used by many tools, it will likely be really
   difficult to change them overnight.

As an example of where we could benefit from working together, LTTng is
currently using a shadow copy of the TRACE_EVENT macros, because
the upstream version is quite limiting with respect to generating
compact probe code. It could be good to integrate those changes
upstream, and I think the best way to achieve this is if the perf and
ftrace developers can have a look at the approach taken by LTTng to
achieve this -- which is better done if LTTng is merged into staging.

Another example is how LTTng extracts system call arguments types, which
is performed by generating TRACE_EVENT description of the system call
table with a script. We could definitely help out each other in this
area.

There are certainly many other areas where we could eventually benefit
from working together, listed above as #2 "tracing infrastructure that
_could_ be shared", but I think it is better to first focus on the core
infrastructure that we need to share before getting into the territory
of the infrastructure we could share if took the time to understand each
other's requirements fully first. Meanwhile, having a duplicated
implementation of these parts that "could" be shared should not hurt
anyone -- it would even help understanding each other --, as long as
they stay self-contained.

In summary, I'm really open to help out on working on common pieces of
infrastructures, but for that they need to take into account both the
current perf/ftrace use-cases and the LTTng use-cases.

Best regards,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 09/11] sched: export task_prio to GPL modules
  2011-12-06 21:44             ` Greg KH
@ 2011-12-08  5:23               ` Ingo Molnar
  2011-12-08 23:27                 ` Greg KH
  0 siblings, 1 reply; 51+ messages in thread
From: Ingo Molnar @ 2011-12-08  5:23 UTC (permalink / raw)
  To: Greg KH
  Cc: Peter Zijlstra, Mathieu Desnoyers, devel, lttng-dev,
	linux-kernel, Linus Torvalds, Andrew Morton, Thomas Gleixner


* Greg KH <greg@kroah.com> wrote:

> > Same goes for a whole lot of other crap that distros are 
> > carrying. Would we want to merge a different CPU scheduler 
> > or the 4g:4g patch or a completely new networking stack into 
> > drivers/staging/? I don't think so.
> 
> Distros have new CPU schedulers and are still dragging the 4g 
> split around?  A whole new networking stack would be 
> interesting, and if self-contained, possible :)

The point being, there's legitimate reasons to refuse crap to an 
area that *people care about* in a constructive manner.

There's no rejection of LTTNG in the "hey, go away, you are 
doing it wrong" fashion - we are not holding a monopoly on how 
instrumentation is supposed to be done and we've been wrong 
before.

There's a highly constructive, open attitude towards LTTNG and 
has been for years:

 " Mathieu, please split it up and integrate/unify it with the 
   existing instrumentation features of Linux - and if it 
   replaces existing stuff because an LTTNG component is 
   superior then so be it. "

Let me repeat it: there's no lack of willingness of cooperation 
from the kernel instrumentation subsystem side. There's a lack 
of movement from Mathieu - *he* is keeping LTTNG fragmented for 
barely justifyable technological reasons.

Thus there's absolutely no forward movement from having this in 
drivers/staging/ - in fact there's backwards movement: yet 
another instrumentation gadget with its own separate ABI and 
highly overlapping functionality, plus even less incentive for 
it to cooperate...

It is not the typical drivers/staging/ situation where there's 
either lack of work on a piece of code or some fundamental 
disagreement about the right model. LTTNG has been 
*intentionally* kept a separate entity, a separate brand, for 
whatever non-technical reasons. How will drivers/staging/ change 
that? It won't. It's a bit like VirtualBox really.

In short: this move only *increases* the incentive for LTTNG to 
stay fragmented and/or force modularization crap like the highly 
unfortunate situation of security modules ...

> > I.e. putting LTTNG into drivers/staging/ will not really 
> > solve anything - and in may in fact delay any sane technical 
> > resolution:
> > 
> > There's a difference between a driver that has to go into 
> > drivers/staging/ because nobody cares enough [and the driver 
> > isnt high quality enough yet], and a core kernel feature 
> > that we DO care about and which HAS BEEN REJECTED IN ITS 
> > FORM.
> 
> I didn't realize that lttng was rejected, when was that done?  
> I couldn't find it in the archives anywhere.

It wasnt resubmitted for years - see the pattern and see the 
problem? :-)

Merging it will cause even *less* cooperation, because of the 
reasons above and because LTTNG adds a parallel ABI.

> The fact that distros have been shipping and relying on it for 
> years shows that it is something that is needed, and it being 
> self-contained, makes it eligible for the staging tree.

LTT(NG) was simply the historically first tracing toolkit that 
embedded people got used to and there's still some inertia - and 
distros add a lot of crap that people find marginally useful
which perpetuates the fork if there's at least one active
developer behind it. Most of its functionality is available via
existing upstream functionality - and where not we are more than
willing to accomodate patches!

drivers/staging/ is a tool that i support in many (in fact most) 
cases - but i don't support it if it does harm.

I'm supposed to say 'no' to extra complexity more often, and 
this is definitely one of those cases:

Nacked-by: Ingo Molnar <mingo@elte.hu>

Also obviously NAK to the scheduler symbol export - that alone 
should tell you that it's not just a "driver" - it deeply hooks 
into the core kernel...

Please respect the NAK.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 09/11] sched: export task_prio to GPL modules
  2011-12-07 22:57             ` [PATCH 09/11] sched: export task_prio to GPL modules Mathieu Desnoyers
@ 2011-12-08  5:40               ` Ingo Molnar
  0 siblings, 0 replies; 51+ messages in thread
From: Ingo Molnar @ 2011-12-08  5:40 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Greg KH, Peter Zijlstra, devel, lttng-dev, linux-kernel,
	Linus Torvalds, Andrew Morton, Thomas Gleixner, Steven Rostedt,
	Frederic Weisbecker


* Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:

> Hi Ingo,
> 
> * Ingo Molnar (mingo@elte.hu) wrote:
> [...]
> > Mathieu, please work with the tracing folks who DO care about 
> > this stuff. It's not like there's a lack of interest in this 
> > area, nor is there a lack of willingness to take patches. What 
> > there is a lack of is your willingness to actually work on 
> > getting something unified, integrated to users...
> > 
> > LTTNG has been going on for how many years? I havent seen many 
> > steps towards actually *merging* its functionality - you insist 
> > on doing your own random thing, which is different in random 
> > ways. Yes, some of those random ways may in fact be better than 
> > what we have upstream - would you be interested in filtering 
> > those out and pushing them upstream? I certainly would like to 
> > see that happen.
> >
> > We want to pick the best features, and throw away current 
> > upstream code in favor of superior out of tree code - this 
> > concept of letting crap sit alongside each other when people do 
> > care i cannot agree with.
> 
> LTTng 2.0, today, offers a unified interface for kernel and 
> userspace tracing, in the form of libraries and git-alike 
> command line user interface. [...]

Note that Arnaldo is working on such a perf-alike tracing tool 
workflow with the new 'trace' utility that we announced and 
prototyped a couple of months ago.

The perf.data data format is now extensible as well and 
tightened for transportability. Tools such as PowerTop or 
sysprof have standardized around the perf ABI.

So there's a *lot* of overlap with existing upstream efforts and 
the last thing we need is the parallel LTTNG ABI.

Are you willing to merge LTTNG into our existing kernel and 
userspace infrastructure and ABIs, with the possible end result 
that LTTNG ceases to be a separately named entity?

Mind hooking up with Arnaldo and with Steve regarding how we 
could best split up the LTTNG bits and move them upstream?

Frankly, i've seen a *lot* of talk from you but unfortunately 
*very* little action on that front, so i think my healthy 
scepticism is justified.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 09/11] sched: export task_prio to GPL modules
  2011-12-08  5:23               ` Ingo Molnar
@ 2011-12-08 23:27                 ` Greg KH
  2011-12-19 10:49                   ` Ingo Molnar
  0 siblings, 1 reply; 51+ messages in thread
From: Greg KH @ 2011-12-08 23:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Mathieu Desnoyers, devel, lttng-dev,
	linux-kernel, Linus Torvalds, Andrew Morton, Thomas Gleixner

On Thu, Dec 08, 2011 at 06:23:54AM +0100, Ingo Molnar wrote:
> 
> * Greg KH <greg@kroah.com> wrote:
> 
> > > Same goes for a whole lot of other crap that distros are 
> > > carrying. Would we want to merge a different CPU scheduler 
> > > or the 4g:4g patch or a completely new networking stack into 
> > > drivers/staging/? I don't think so.
> > 
> > Distros have new CPU schedulers and are still dragging the 4g 
> > split around?  A whole new networking stack would be 
> > interesting, and if self-contained, possible :)
> 
> The point being, there's legitimate reasons to refuse crap to an 
> area that *people care about* in a constructive manner.
> 
> There's no rejection of LTTNG in the "hey, go away, you are 
> doing it wrong" fashion - we are not holding a monopoly on how 
> instrumentation is supposed to be done and we've been wrong 
> before.
> 
> There's a highly constructive, open attitude towards LTTNG and 
> has been for years:
> 
>  " Mathieu, please split it up and integrate/unify it with the 
>    existing instrumentation features of Linux - and if it 
>    replaces existing stuff because an LTTNG component is 
>    superior then so be it. "

Ok, that's fair enough.

Mathieu, will you please work on this?  Or is there some reason you
don't feel this is possible?

> drivers/staging/ is a tool that i support in many (in fact most) 
> cases - but i don't support it if it does harm.
> 
> I'm supposed to say 'no' to extra complexity more often, and 
> this is definitely one of those cases:
> 
> Nacked-by: Ingo Molnar <mingo@elte.hu>
> 
> Also obviously NAK to the scheduler symbol export - that alone 
> should tell you that it's not just a "driver" - it deeply hooks 
> into the core kernel...
> 
> Please respect the NAK.

Will do, I'll go delete it from the staging-next tree now.

greg k-h

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH 09/11] sched: export task_prio to GPL modules
  2011-12-08 23:27                 ` Greg KH
@ 2011-12-19 10:49                   ` Ingo Molnar
  2011-12-19 15:30                     ` [lttng-dev] " Mathieu Desnoyers
  0 siblings, 1 reply; 51+ messages in thread
From: Ingo Molnar @ 2011-12-19 10:49 UTC (permalink / raw)
  To: Greg KH
  Cc: Peter Zijlstra, Mathieu Desnoyers, devel, lttng-dev,
	linux-kernel, Linus Torvalds, Andrew Morton, Thomas Gleixner


* Greg KH <greg@kroah.com> wrote:

> On Thu, Dec 08, 2011 at 06:23:54AM +0100, Ingo Molnar wrote:
> > 
> > * Greg KH <greg@kroah.com> wrote:
> > 
> > > > Same goes for a whole lot of other crap that distros are 
> > > > carrying. Would we want to merge a different CPU scheduler 
> > > > or the 4g:4g patch or a completely new networking stack into 
> > > > drivers/staging/? I don't think so.
> > > 
> > > Distros have new CPU schedulers and are still dragging the 4g 
> > > split around?  A whole new networking stack would be 
> > > interesting, and if self-contained, possible :)
> > 
> > The point being, there's legitimate reasons to refuse crap to an 
> > area that *people care about* in a constructive manner.
> > 
> > There's no rejection of LTTNG in the "hey, go away, you are 
> > doing it wrong" fashion - we are not holding a monopoly on how 
> > instrumentation is supposed to be done and we've been wrong 
> > before.
> > 
> > There's a highly constructive, open attitude towards LTTNG and 
> > has been for years:
> > 
> >  " Mathieu, please split it up and integrate/unify it with the 
> >    existing instrumentation features of Linux - and if it 
> >    replaces existing stuff because an LTTNG component is 
> >    superior then so be it. "
> 
> Ok, that's fair enough.
> 
> Mathieu, will you please work on this?  Or is there some 
> reason you don't feel this is possible?

Mathieu, any update on this? I don't want the LTTNG goodies to 
drop on the floor - we just have to integrate them properly.

If you 100% disagree with how specific things are done upstream 
right now then don't hold back: just replace existing mechanisms 
- that gives a starting point to discuss what the best way is 
forward.

> > drivers/staging/ is a tool that i support in many (in fact most) 
> > cases - but i don't support it if it does harm.
> > 
> > I'm supposed to say 'no' to extra complexity more often, and 
> > this is definitely one of those cases:
> > 
> > Nacked-by: Ingo Molnar <mingo@elte.hu>
> > 
> > Also obviously NAK to the scheduler symbol export - that alone 
> > should tell you that it's not just a "driver" - it deeply hooks 
> > into the core kernel...
> > 
> > Please respect the NAK.
> 
> Will do, I'll go delete it from the staging-next tree now.

Thanks Greg!

	Ingo

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules
  2011-12-19 10:49                   ` Ingo Molnar
@ 2011-12-19 15:30                     ` Mathieu Desnoyers
  2011-12-20 11:08                       ` Ingo Molnar
  0 siblings, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2011-12-19 15:30 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Greg KH, devel, Peter Zijlstra, linux-kernel, lttng-dev,
	Mathieu Desnoyers, Andrew Morton, Linus Torvalds,
	Thomas Gleixner, Steven Rostedt

* Ingo Molnar (mingo@elte.hu) wrote:
> 
> * Greg KH <greg@kroah.com> wrote:
> 
> > On Thu, Dec 08, 2011 at 06:23:54AM +0100, Ingo Molnar wrote:
[...]
> > > There's a highly constructive, open attitude towards LTTNG and 
> > > has been for years:
> > > 
> > >  " Mathieu, please split it up and integrate/unify it with the 
> > >    existing instrumentation features of Linux - and if it 
> > >    replaces existing stuff because an LTTNG component is 
> > >    superior then so be it. "
> > 
> > Ok, that's fair enough.
> > 
> > Mathieu, will you please work on this?  Or is there some 
> > reason you don't feel this is possible?
> 
> Mathieu, any update on this? I don't want the LTTNG goodies to 
> drop on the floor - we just have to integrate them properly.
> 
> If you 100% disagree with how specific things are done upstream 
> right now then don't hold back: just replace existing mechanisms 
> - that gives a starting point to discuss what the best way is 
> forward.

I'm bringing a though question then: what should we do if I strongly
think that the current ABIs should be replaced ?  To support this, let's
note that the current perf ABI:

 - lacks versioning information to handle change. I think shipping the tracer
   tools within the Linux tools/ directory made sense for an initial
   phase that made tracer solutions more popular for kernel developers
   (and it did a great job a that), but if we want to move on to build
   tools that target a wider audience, we should leave the tools/ sandbox
   and create separate projects, with clearly defined ABIs, using ABI
   versioning to manage changes. At this point, I think that perf tool
   shipped within tools/ is more than anything a pain for
   non-kernel-developer users, and favors design of sloppy ABIs.

 - makes it impossible to move to CTF (Common Trace Format) and benefit
   from the added features it allows,

 - makes it needlessly hard, if not impossible, for perf to move to
   something that would have the benefits brought by the fast unified
   ring buffer code I created 2 years ago,

 - makes it impossible to benefit from the LTTng fast trace clocks.

Also, it should be noted that I am finding that the way perf evolved
into a large monolithic binary blob that needs to be all enabled or all
disabled makes it quite hard to extend and re-use. As a matter of fact,
there are various cases where Steven and I tried to create performance
tests for the perf ring buffer and just could not do it without hacking
the perf code. I would definitely prefer to go for a modular approach for
the in-kernel code, and an approach based on user-level libraries for
low-level tracer interaction, with applications depending on those
libraries, again all handled with ABI versioning and library versioning.

I have to give recognition to perf: it's a fantastic performance counter
management/sampling tool, but it has clearly never been geared towards
low-overhead tracing, and this shows.

One possible way for moving things forward is to leave the current
perf/ftrace implementation and ABIs in place along with the existing
tools. We could create a new ABI merging perf, ftrace and LTTng best
features into one (e.g. kstrace for Kernel System Trace -- just made it
up, better ideas are welcome), and gradually move the user-space part of
the 3 tools to the new ABI. It is worth noting that the need for a new
ABI is something many people involved in tracing -- by that I mean those
doing most of the actual upstream tracer implementation work -- agreed
upon in the last 2 years when meetings at conferences. This would allow
a deprecation phase to take place, and would allow removal of the
maintenance burden of the duplicated Perf/Ftrace ABIs, all that while
also bringing in an ABI that allows handling of change and innovation,
which is, IMHO, the key limiting factor of the current ABIs.

By doing so, perf could become the set of tools targeting what it does
best: performance counters management and sampling, ftrace could keep on
targeting function tracing, and lttng could be used for all-system
tracing, everyone sharing the same kernel-level implementation and ABIs
(kstrace ABI).

Thoughts ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules
  2011-12-19 15:30                     ` [lttng-dev] " Mathieu Desnoyers
@ 2011-12-20 11:08                       ` Ingo Molnar
  2011-12-20 21:46                         ` Frank Rowand
                                           ` (2 more replies)
  0 siblings, 3 replies; 51+ messages in thread
From: Ingo Molnar @ 2011-12-20 11:08 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Greg KH, devel, Peter Zijlstra, linux-kernel, lttng-dev,
	Mathieu Desnoyers, Andrew Morton, Linus Torvalds,
	Thomas Gleixner, Steven Rostedt, Arnaldo Carvalho de Melo


(Cc:-ing Arnaldo on this as well.)

* Mathieu Desnoyers <compudj@krystal.dyndns.org> wrote:

> > Mathieu, any update on this? I don't want the LTTNG goodies 
> > to drop on the floor - we just have to integrate them 
> > properly.
> > 
> > If you 100% disagree with how specific things are done 
> > upstream right now then don't hold back: just replace 
> > existing mechanisms - that gives a starting point to discuss 
> > what the best way is forward.
> 
> I'm bringing a though question then: what should we do if I 
> strongly think that the current ABIs should be replaced ?  To 
> support this, let's note that the current perf ABI:
> 
>  - lacks versioning information to handle change. [...]

That's not actually true on *any* level: we are changing, 
evolving and extending the perf ABIs all the time.

There's two main API/ABI components:

1) the perf syscall which is part of the Linux syscall ABI.

Individual versions of the ABI have (monotonically increasing) 
sizes for "struct perf_event_attr" - you can consider these 
natural ABI versioning.

So the 'versioning' is not done via some inflexible and ugly, 
Windows-alike 'explicit ABI version' field, but done via 
structure sizes and -ENOSYS.

We've iterated and versioned it numerous times in the past 10 
kernel releases, in a backwards compatible manner.

2) the perf.data file

The versioning there is capability bitmask based - modelled 
after ext2/ext3/ext4 capability bitmasks. It's extensible as 
well.

I think your concentration on ABIs is missing a very fundamental 
property of instrumentation:

  the life-time and persistence of instrumentation data is 
  typically very short ('days' is already an exception - typical 
  is minutes, at most hours), and for that reason we havent been 
  getting much pressure from users to maintain a perf.data ABI - 
  but we are doing it nevertheless.

Instrumentation is fundamentally about the 'here and now' and so 
it fundamentally differs from things like backup formats and 
database formats. An ABI does not hurt and we are maintaining 
it, but you are overrating its importance significantly.


>    [...] I think shipping the tracer tools within the Linux 
>    tools/ directory made sense for an initial phase that made 
>    tracer solutions more popular for kernel developers (and it 
>    did a great job a that), but if we want to move on to build 
>    tools that target a wider audience, we should leave the 
>    tools/ sandbox and create separate projects, with clearly 
>    defined ABIs, using ABI versioning to manage changes. At 
>    this point, I think that perf tool shipped within tools/ is 
>    more than anything a pain for non-kernel-developer users, 
>    and favors design of sloppy ABIs.

I think you've thoroughly misunderstood the upstream ABI 
versioning status quo, which makes your argument out of this 
world.

The perf ABIs are well-defined and well-maintained. See an 
ad-hoc ABI and tool compatibility experiment i made here:

   [F.A.Q.] perf ABI backwards and forwards compatibility
   https://lkml.org/lkml/2011/11/8/77

>  - makes it impossible to move to CTF (Common Trace Format) 
>    and benefit from the added features it allows,

"CTF" was mainly written by yourself, right?

If there's any tool worth caring about that wants to deal in CTF 
then it can be converted just fine. I don't think it matters 
nearly as much as you seem to imply, see my reply further below.

>  - makes it needlessly hard, if not impossible, for perf to
>    move to something that would have the benefits brought by 
>    the fast unified ring buffer code I created 2 years ago,

The current upstream code actually has a fast unified 
ring-buffer, mmap()-ed to user-space, so you'd have to be a bit 
more specific about that point.

>  - makes it impossible to benefit from the LTTng fast trace 
>  clocks.

We have various trace clocks upstream as well - so you'd have to 
outline it specifically why it's "impossible".

> Also, it should be noted that I am finding that the way perf 
> evolved into a large monolithic binary blob that needs to be 
> all enabled or all disabled makes it quite hard to extend and 
> re-use. [...]

There's a (very) healthy in-flux of features - it's one of the 
most active kernel and userpace projects we have.

So *others* don't find it hard to work with. If you have 
specific observations i'm sure Arnaldo will appreciate them.

[ I snipped the rest of your reply - you seem to have deep 
  rooted misconceptions about what the current upstream 
  principles and practices are in this area: you are banging on 
  open doors! ]

Anyway, my prior request+offer stands: please split LTTNG up 
into individual feature blocks done to extend or replace 
existing instrumentation features and offer them as changes to 
existing upstream instrumentation code. We want every 
conceivable useful feature, but we *really* don't want 
schizophrenic duplication in this area.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules
  2011-12-20 11:08                       ` Ingo Molnar
@ 2011-12-20 21:46                         ` Frank Rowand
  2011-12-23 10:51                           ` Ingo Molnar
  2011-12-21 18:47                         ` Aaron Spear
  2011-12-23 16:46                         ` Perf ABI (was: Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules) Mathieu Desnoyers
  2 siblings, 1 reply; 51+ messages in thread
From: Frank Rowand @ 2011-12-20 21:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mathieu Desnoyers, Greg KH, devel, Peter Zijlstra, linux-kernel,
	lttng-dev, Mathieu Desnoyers, Andrew Morton, Linus Torvalds,
	Thomas Gleixner, Steven Rostedt, Arnaldo Carvalho de Melo

On 12/20/11 03:08, Ingo Molnar wrote:
> 
> (Cc:-ing Arnaldo on this as well.)
> 
> * Mathieu Desnoyers <compudj@krystal.dyndns.org> wrote:
> 

< snip >

> I think your concentration on ABIs is missing a very fundamental 
> property of instrumentation:
> 
>   the life-time and persistence of instrumentation data is 
>   typically very short ('days' is already an exception - typical 
>   is minutes, at most hours), and for that reason we havent been 
>   getting much pressure from users to maintain a perf.data ABI - 
>   but we are doing it nevertheless.
> 
> Instrumentation is fundamentally about the 'here and now' and so 
> it fundamentally differs from things like backup formats and 
> database formats. An ABI does not hurt and we are maintaining 
> it, but you are overrating its importance significantly.

Just to provide visibility to a different use case...

The life time of my data is typically weeks, months, or years
(though I am not likely to re-process year old raw data).

< snip >

-Frank


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules
  2011-12-20 11:08                       ` Ingo Molnar
  2011-12-20 21:46                         ` Frank Rowand
@ 2011-12-21 18:47                         ` Aaron Spear
  2011-12-21 18:58                           ` Christoph Hellwig
  2011-12-23 16:46                         ` Perf ABI (was: Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules) Mathieu Desnoyers
  2 siblings, 1 reply; 51+ messages in thread
From: Aaron Spear @ 2011-12-21 18:47 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: devel, Peter Zijlstra, Greg KH, linux-kernel, Steven Rostedt,
	Arnaldo Carvalho de Melo, lttng-dev, Mathieu Desnoyers,
	Andrew Morton, Linus Torvalds, Thomas Gleixner,
	Mathieu Desnoyers

* Ingo Molnar <mingo@elte.hu> wrote:

> "CTF" was mainly written by yourself, right?
> 
> If there's any tool worth caring about that wants to deal in CTF
> then it can be converted just fine. I don't think it matters
> nearly as much as you seem to imply, see my reply further below.

Hi Ingo,

I thought it might be a useful point of reference to mention that there is a commitment to CTF for more than just LTTng.  The Multicore Association and member companies including TI, Freescale, Samsung, Mentor Graphics, Wind River Systems, VMware and others intend to use CTF as a lingua franca for correlation of traces taken from different tracing technologies in heterogeneous multi-core systems.  Linux is pivotal here of course, but we are also aggregating various types of hardware traces as well as instrumentation trace from bare metal, RTOS's, and other OS's.  Many of the requirements that went into the draft CTF specification were driven by this working groups experience in the embedded industry and many different legacy tracing technologies.  While Mathieu has been instrumental in creating CTF, he is certainly not the only one with a vested interest in its future.

respectfully,
Aaron Spear - VMware
Chairman, Multicore Association Tools Infrastructure Working Group

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules
  2011-12-21 18:47                         ` Aaron Spear
@ 2011-12-21 18:58                           ` Christoph Hellwig
  0 siblings, 0 replies; 51+ messages in thread
From: Christoph Hellwig @ 2011-12-21 18:58 UTC (permalink / raw)
  To: Aaron Spear
  Cc: Ingo Molnar, devel, Peter Zijlstra, Greg KH, linux-kernel,
	Steven Rostedt, Arnaldo Carvalho de Melo, lttng-dev,
	Mathieu Desnoyers, Andrew Morton, Linus Torvalds,
	Thomas Gleixner, Mathieu Desnoyers

Vmware using it is more a reason to avoid it than using it.. :)

And most certainly not a reason to export internal kernel details.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules
  2011-12-20 21:46                         ` Frank Rowand
@ 2011-12-23 10:51                           ` Ingo Molnar
  0 siblings, 0 replies; 51+ messages in thread
From: Ingo Molnar @ 2011-12-23 10:51 UTC (permalink / raw)
  To: Frank Rowand
  Cc: Mathieu Desnoyers, Greg KH, devel, Peter Zijlstra, linux-kernel,
	lttng-dev, Mathieu Desnoyers, Andrew Morton, Linus Torvalds,
	Thomas Gleixner, Steven Rostedt, Arnaldo Carvalho de Melo


* Frank Rowand <frank.rowand@am.sony.com> wrote:

> On 12/20/11 03:08, Ingo Molnar wrote:
> > 
> > (Cc:-ing Arnaldo on this as well.)
> > 
> > * Mathieu Desnoyers <compudj@krystal.dyndns.org> wrote:
> > 
> 
> < snip >
> 
> > I think your concentration on ABIs is missing a very fundamental 
> > property of instrumentation:
> > 
> >   the life-time and persistence of instrumentation data is 
> >   typically very short ('days' is already an exception - typical 
> >   is minutes, at most hours), and for that reason we havent been 
> >   getting much pressure from users to maintain a perf.data ABI - 
> >   but we are doing it nevertheless.
> > 
> > Instrumentation is fundamentally about the 'here and now' and so 
> > it fundamentally differs from things like backup formats and 
> > database formats. An ABI does not hurt and we are maintaining 
> > it, but you are overrating its importance significantly.
> 
> Just to provide visibility to a different use case...
> 
> The life time of my data is typically weeks, months, or years 
> (though I am not likely to re-process year old raw data).

I'm not saying that it's absolutely never done: for example 
monitoring/logging on a production box and evaluating events 
only once per month would certainly qualify.

I just say that the overwhelming majority of usecases utilize 
traces on a short time-span and that we must keep the common 
usecase in mind when supporting not so common usecases.

It's the same deal as with -rt: compared to the 'normal' usage 
of Linux -rt is somewhat of a special case - yet it's still 
something very much worth doing, as long as the main usecase is 
always kept in mind.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Perf ABI (was: Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules)
  2011-12-20 11:08                       ` Ingo Molnar
  2011-12-20 21:46                         ` Frank Rowand
  2011-12-21 18:47                         ` Aaron Spear
@ 2011-12-23 16:46                         ` Mathieu Desnoyers
  2011-12-23 17:21                           ` Ted Ts'o
  2 siblings, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2011-12-23 16:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Greg KH, devel, Peter Zijlstra, linux-kernel, lttng-dev,
	Andrew Morton, Linus Torvalds, Thomas Gleixner, Steven Rostedt,
	Arnaldo Carvalho de Melo

Hi Ingo,

I'll break down my reply in various sub-topics, and address them
separately in the following weeks. Let's start with the ABIs.

* Ingo Molnar (mingo@elte.hu) wrote:
> 
> (Cc:-ing Arnaldo on this as well.)
> 
> * Mathieu Desnoyers <compudj@krystal.dyndns.org> wrote:
> 
> > > Mathieu, any update on this? I don't want the LTTNG goodies 
> > > to drop on the floor - we just have to integrate them 
> > > properly.
> > > 
> > > If you 100% disagree with how specific things are done 
> > > upstream right now then don't hold back: just replace 
> > > existing mechanisms - that gives a starting point to discuss 
> > > what the best way is forward.
> > 
> > I'm bringing a though question then: what should we do if I 
> > strongly think that the current ABIs should be replaced ?  To 
> > support this, let's note that the current perf ABI:
> > 
> >  - lacks versioning information to handle change. [...]
> 
> That's not actually true on *any* level: we are changing, 
> evolving and extending the perf ABIs all the time.

You may be able to evolve and extend the Perf ABI, but the way this ABI
is designed does not allow you to change it in ways that would introduce
ABI incompatibility between versions (the equivalent of a major version
number change).

You're therefore gradually painting yourself in a corner without any
ability to go back and revisit previous decisions, and this is bad
because revisiting those past decisions will be needed to bring in some
LTTng features, because those decisions were taken without having those
features in mind. Supporting a new feature is not always as easy as
"extending a structure" as you seem to imply.

> There's two main API/ABI components:
> 
> 1) the perf syscall which is part of the Linux syscall ABI.
> 
> Individual versions of the ABI have (monotonically increasing) 
> sizes for "struct perf_event_attr" - you can consider these 
> natural ABI versioning.
> 
> So the 'versioning' is not done via some inflexible and ugly, 
> Windows-alike 'explicit ABI version' field, but done via 
> structure sizes and -ENOSYS.

Judging versions as inflexibile and ugly is merely a matter of taste.
However, the inability to do any kind of major change due to the way the
Perf ABI is made has a clear direct impact on the ability to innovate
within this project.

> We've iterated and versioned it numerous times in the past 10 
> kernel releases, in a backwards compatible manner.
> 
> 2) the perf.data file
> 
> The versioning there is capability bitmask based - modelled 
> after ext2/ext3/ext4 capability bitmasks. It's extensible as 
> well.

AFAIU, filesystems have very strict compatibility requirements because
they sit on hard drives for years on live systems that cannot always
easily permit migration between incompatible layouts. Traces don't have
the same constraints (see below),

> 
> I think your concentration on ABIs is missing a very fundamental 
> property of instrumentation:
> 
>   the life-time and persistence of instrumentation data is 
>   typically very short ('days' is already an exception - typical 
>   is minutes, at most hours), and for that reason we havent been 
>   getting much pressure from users to maintain a perf.data ABI - 
>   but we are doing it nevertheless.
> 
> Instrumentation is fundamentally about the 'here and now' and so 
> it fundamentally differs from things like backup formats and 
> database formats. An ABI does not hurt and we are maintaining 
> it, but you are overrating its importance significantly.

I think you are really focusing on a developer use-case, which might be
why you are missing the big picture. How many Linux developers are out
there ? How many Linux system administrators are out there ?  Many, many
more. With all due respect, I'm afraid your definition of "typically" is
limited by your developer-centric vision. So far, I came up with the
following breakdown of use-cases in terms of trace data life-span:

- Long-persistence traces (old traces): for this use-case, a conversion
  phase is usually OK. These long-persistance traces are useful in
  production system monitoring scenarios, and for finding delta in
  execution between different runs of a test suite (for instance). This
  use-case allows format breakage if the old format can be identified by
  a trace converter.
- Short-lived traces (debugging use-case): pretty much anything
  would do, as long as the user-level tool can detect if it understands
  the layout.
- Live traces: we want to minimize the overhead, both on the trace
  producer and on the machine performing the data analysis (which can be
  either the traced machine or a separate host), while still providing a
  live stream of data. This is useful for applications like lttngtop
  (showing a live report of the system) and for production system
  monitoring. In this case, we want the tools to be able to find out if
  they can read the trace format (or report an error, asking for
  upgrade if they can't). Trace conversion is not appropriate in this
  scenario due to the added timing complexity and overhead.

As you will notice, none of these use-cases require a filesystem-alike
bitmask-based compatibility ABI at the trace format level.

Using explicit versioning allows drastic changes to be done when they
are required, in the process allowing a trace converter to be used to
deal with "old" legacy traces, and allowing a live trace
aggregator/analyzer to detect if it can support the live trace stream.


> >    [...] I think shipping the tracer tools within the Linux 
> >    tools/ directory made sense for an initial phase that made 
> >    tracer solutions more popular for kernel developers (and it 
> >    did a great job a that), but if we want to move on to build 
> >    tools that target a wider audience, we should leave the 
> >    tools/ sandbox and create separate projects, with clearly 
> >    defined ABIs, using ABI versioning to manage changes. At 
> >    this point, I think that perf tool shipped within tools/ is 
> >    more than anything a pain for non-kernel-developer users, 
> >    and favors design of sloppy ABIs.
> 
> I think you've thoroughly misunderstood the upstream ABI 
> versioning status quo, which makes your argument out of this 
> world.
> 
> The perf ABIs are well-defined and well-maintained. See an 
> ad-hoc ABI and tool compatibility experiment i made here:
> 
>    [F.A.Q.] perf ABI backwards and forwards compatibility
>    https://lkml.org/lkml/2011/11/8/77

I hope my answer above explains why I think the what perf handles ABI
changes is a terrible choice. In summary:

- Perf is painting itself in a corner, not allowing any ABI breakage,
  only "extensions", which limits integration of features that require
  core changes,
- It's doing so without even needing it: Perf is using an ABI versioning
  scheme designed for filesystems, when it is not in fact driven by the
  same constraints.

Best regards,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Perf ABI (was: Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules)
  2011-12-23 16:46                         ` Perf ABI (was: Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules) Mathieu Desnoyers
@ 2011-12-23 17:21                           ` Ted Ts'o
  2011-12-23 18:16                             ` Mathieu Desnoyers
  0 siblings, 1 reply; 51+ messages in thread
From: Ted Ts'o @ 2011-12-23 17:21 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Ingo Molnar, Greg KH, devel, Peter Zijlstra, linux-kernel,
	lttng-dev, Andrew Morton, Linus Torvalds, Thomas Gleixner,
	Steven Rostedt, Arnaldo Carvalho de Melo

On Fri, Dec 23, 2011 at 11:46:29AM -0500, Mathieu Desnoyers wrote:
> - It's doing so without even needing it: Perf is using an ABI versioning
>   scheme designed for filesystems, when it is not in fact driven by the
>   same constraints.

Well, there are *some* constraints.  I've been assured that despite
the fact that the perf client is in the kernel sources (something
which I still think is a bad idea, since it's leading to other bad
choices like kvm-tool wanting to be bundled with kernel sources), that
it is *not* a license to jerk the format around wildly --- that people
will have installed userspace binaries that shouldn't randomly break
they boot a new kernel.

So I'm *glad* that Perf is using an ABI versioning scheme that accepts
the same restraints as file systems.  It means we don't randomly break
userspace tools.

So Mathieu, if you think it is the current standards of backwards
compatibility are too rigid, what level of tool breakage do you think
is acceptable?  It's not just about the backwards compatibility of the
trace files, it's also about compatibility of userspace utilities.

For example, systemtap, where you had to recompile from source at
each kernel revision, and pray it would still build goes too far in
the other direction, wouldn't you agree?  What is the correct level of
kernel developer annoyance you think is appropriate to inflict on
ourselves?

Regards,


						- Ted

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Perf ABI (was: Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules)
  2011-12-23 17:21                           ` Ted Ts'o
@ 2011-12-23 18:16                             ` Mathieu Desnoyers
  2011-12-25 17:46                               ` Ted Ts'o
  0 siblings, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2011-12-23 18:16 UTC (permalink / raw)
  To: Ted Ts'o, Ingo Molnar, Greg KH, devel, Peter Zijlstra,
	linux-kernel, lttng-dev, Andrew Morton, Linus Torvalds,
	Thomas Gleixner, Steven Rostedt, Arnaldo Carvalho de Melo

Hi Ted,

* Ted Ts'o (tytso@mit.edu) wrote:
> On Fri, Dec 23, 2011 at 11:46:29AM -0500, Mathieu Desnoyers wrote:
> > - It's doing so without even needing it: Perf is using an ABI versioning
> >   scheme designed for filesystems, when it is not in fact driven by the
> >   same constraints.
> 
> Well, there are *some* constraints.  I've been assured that despite
> the fact that the perf client is in the kernel sources (something
> which I still think is a bad idea, since it's leading to other bad
> choices like kvm-tool wanting to be bundled with kernel sources), that
> it is *not* a license to jerk the format around wildly --- that people
> will have installed userspace binaries that shouldn't randomly break
> they boot a new kernel.
> 
> So I'm *glad* that Perf is using an ABI versioning scheme that accepts
> the same restraints as file systems.  It means we don't randomly break
> userspace tools.
> 
> So Mathieu, if you think it is the current standards of backwards
> compatibility are too rigid, what level of tool breakage do you think
> is acceptable?  It's not just about the backwards compatibility of the
> trace files, it's also about compatibility of userspace utilities.
> 
> For example, systemtap, where you had to recompile from source at
> each kernel revision, and pray it would still build goes too far in
> the other direction, wouldn't you agree?  What is the correct level of
> kernel developer annoyance you think is appropriate to inflict on
> ourselves?

I completely agree that systemtap did not have the right level of
compatibility towards changes. It clearly does not make sense to require
the tools to be updated whenever the kernel version and instrumentation
changes. What makes sense to me, though, is to allow breakage when a
newly introduced tracer feature requires the ABI to break.

What I currently see as a tradeoff sweet-spot between compatibility
burden and ability to innovate is to split the ABI and handle
compatibility as follows:

- ABIs to control the tracer
  - Versioned, ideally always incrementally adding features, but still
    keeping room for major changes if needed. We should expect very,
    very seldom breakages on this front. This requires update of tracer
    control tools when the ABI is broken.

- ABIs to transport tracing data
  - Versioned, can and should change when a feature or transport
    performance enhancement require to break compatibility. This
    requires update of trace data consumer tools when compability is
    broken.

(note that ABI to control the tracer and ABI to transport data could
share the same version numbering if the control tools and transport
tools happen to reside in the same user-level packages)

- The trace data format
  - Both versioned _and_ self-described.
  Self-description of the event/field layout allows the same tools to
  understand traces gathered on different kernel versions, on different
  architectures, with different tracer configurations.
  Versioning on top of the self-described trace format allows changes
  to what the trace self-description can express.

So the breakages would happen only when required by tracer tool
capability enhancements, not randomly when a kernel instrumentation
source happens to change.

Best regards,

Mathieu

P.S.: my next replies will be slightly delayed, due to Christmas
holidays.

> 
> Regards,
> 
> 
> 						- Ted

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Perf ABI (was: Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules)
  2011-12-23 18:16                             ` Mathieu Desnoyers
@ 2011-12-25 17:46                               ` Ted Ts'o
  2012-01-12 14:09                                 ` Mathieu Desnoyers
  0 siblings, 1 reply; 51+ messages in thread
From: Ted Ts'o @ 2011-12-25 17:46 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Ingo Molnar, Greg KH, devel, Peter Zijlstra, linux-kernel,
	lttng-dev, Andrew Morton, Linus Torvalds, Thomas Gleixner,
	Steven Rostedt, Arnaldo Carvalho de Melo

On Fri, Dec 23, 2011 at 01:16:41PM -0500, Mathieu Desnoyers wrote:
> 
> (note that ABI to control the tracer and ABI to transport data could
> share the same version numbering if the control tools and transport
> tools happen to reside in the same user-level packages)

Being able to control the tracer but then not being able to look at
the trace output is useless.  So they might as well be the same
thing....

> - The trace data format
>   - Both versioned _and_ self-described.
>   Self-description of the event/field layout allows the same tools to
>   understand traces gathered on different kernel versions, on different
>   architectures, with different tracer configurations.
>   Versioning on top of the self-described trace format allows changes
>   to what the trace self-description can express.

So there are two ways to do this.  One is to make changes be backwards
compatible, so that the trace data format only breaks if you use the
new feature; if it doesn't you encode things the old fashioned way.
The other way of doing things is to randomly break users whenever the
tracing developers decide to add some random new feature, regardless
of whether or not a partiuclar user finds that new feature to be
useful.

The first is acceptable.  The second, IMHO, is not.  Linus has said
quite strongly that WE DO NOT BREAK USERSPACE.   Period.

Regards,

					- Ted

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Perf ABI (was: Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules)
  2011-12-25 17:46                               ` Ted Ts'o
@ 2012-01-12 14:09                                 ` Mathieu Desnoyers
  2012-01-12 14:54                                   ` Steven Rostedt
  0 siblings, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2012-01-12 14:09 UTC (permalink / raw)
  To: Ted Ts'o, Ingo Molnar, Greg KH, devel, Peter Zijlstra,
	linux-kernel, lttng-dev, Andrew Morton, Linus Torvalds,
	Thomas Gleixner, Steven Rostedt, Arnaldo Carvalho de Melo

* Ted Ts'o (tytso@mit.edu) wrote:
> On Fri, Dec 23, 2011 at 01:16:41PM -0500, Mathieu Desnoyers wrote:
[...]
> > - The trace data format
> >   - Both versioned _and_ self-described.
> >   Self-description of the event/field layout allows the same tools to
> >   understand traces gathered on different kernel versions, on different
> >   architectures, with different tracer configurations.
> >   Versioning on top of the self-described trace format allows changes
> >   to what the trace self-description can express.
> 
> So there are two ways to do this.  One is to make changes be backwards
> compatible, so that the trace data format only breaks if you use the
> new feature; if it doesn't you encode things the old fashioned way.
> The other way of doing things is to randomly break users whenever the
> tracing developers decide to add some random new feature, regardless
> of whether or not a partiuclar user finds that new feature to be
> useful.
> 
> The first is acceptable.  The second, IMHO, is not.  Linus has said
> quite strongly that WE DO NOT BREAK USERSPACE.   Period.

Please allow me to look into what needs to be kept compatible for a good
user experience (for both Linux end users and kernel developers) in the
case of tracing:

Let's first describe what we really utterly don't want: random breakages
between the kernel and user-level tracing control/transport/analysis
tools. Consequently, I think we could say that it would be unacceptable
for userspace tools to break for every slight change of kernel code. If
that would be the case (as it was with the approach SystemTap was taking
before they started hooking into the kernel with tracepoints), then we'd
need to regenerate the tools for pretty much every -rc kernel, and for
each local development tree, which would make those tools useless to
kernel developers.

It is important to clarify that tracing is, in my opinion, not part of
the runtime support, which makes it very different by nature from
filesystems and kernel runtime support. So I agree with Linus' argument
about not breaking userspace when applied to runtime support, because
being unable to even boot a system due to an ABI breakage is very much
unwanted. However, I think it should not be applied as-is to tracing,
because you cannot make a system unusable due to a tracer ABI breakage:
if a tracer can be packaged in a set of standalone modules, that clearly
shows it is not part of the system runtime support.

That being said, ABI versioning could still handle ABI changes without
significantly impacting the users: when an ABI breakage is needed, we
can keep the old code around for a while and expose both the old and new
ABIs. This would ensure that the user-level tools can query for the
specific ABI major version(s) they support. That should improve the user
experience by providing "deprecated" console warnings for a few kernel
releases before the old code ends up being removed.

So, in summary:

  * Old kernels vs new tools:

New tools can query for the latest ABI they know, and fall-back on older
ABIs, with limited features.

  * New kernels vs old tools:

Keeping around the old ABI for a deprecation phase lets old tools work on
a bleeding edge kernel while the ABI change is being introduced, which
should satisfy the kernel developer use-case.

Best regards,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Perf ABI (was: Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules)
  2012-01-12 14:09                                 ` Mathieu Desnoyers
@ 2012-01-12 14:54                                   ` Steven Rostedt
  2012-01-12 15:39                                     ` [lttng-dev] Perf ABI (was: " Mathieu Desnoyers
  0 siblings, 1 reply; 51+ messages in thread
From: Steven Rostedt @ 2012-01-12 14:54 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Ted Ts'o, Ingo Molnar, Greg KH, devel, Peter Zijlstra,
	linux-kernel, lttng-dev, Andrew Morton, Linus Torvalds,
	Thomas Gleixner, Arnaldo Carvalho de Melo

On Thu, 2012-01-12 at 09:09 -0500, Mathieu Desnoyers wrote:


> It is important to clarify that tracing is, in my opinion, not part of
> the runtime support, which makes it very different by nature from
> filesystems and kernel runtime support. So I agree with Linus' argument
> about not breaking userspace when applied to runtime support, because
> being unable to even boot a system due to an ABI breakage is very much
> unwanted. However, I think it should not be applied as-is to tracing,
> because you cannot make a system unusable due to a tracer ABI breakage:
> if a tracer can be packaged in a set of standalone modules, that clearly
> shows it is not part of the system runtime support.

Correct that tracing is not something that needs to make the system run,
but that's still no excuse to make ABI changes any different. Note, we
don't change things within the /proc/stat or /proc/*/stat and that's not
required to make the system run. We can add onto those files, but we
can't change what the current numbers mean.

> 
> That being said, ABI versioning could still handle ABI changes without
> significantly impacting the users: when an ABI breakage is needed, we
> can keep the old code around for a while and expose both the old and new
> ABIs. This would ensure that the user-level tools can query for the
> specific ABI major version(s) they support. That should improve the user
> experience by providing "deprecated" console warnings for a few kernel
> releases before the old code ends up being removed.

ABI version numbers are meaningless, and prone to be broken. The change
would have to be added with the commit that updates the change otherwise
git bisecting can get screwed up too.

The way ABI changes in the kernel have always been was to look at the
file itself and have the tool be able to determine what version of the
ABI is there based on what files exists, or what exists in the file.
I've done this with trace-cmd and ftrace. The debugfs system has changed
a lot, and trace-cmd can handle each change. I never had a need for a
version number to do this. I simply have trace-cmd look at what is
available and what isn't.

If you need to know if a syscall exists, you try it and if you get
-ENOSYS, then you know it doesn't exist. We have no need for an
arbitrary version number that is meaningless. The existence of (or lack
of) tells us all we need to know.


> 
> So, in summary:
> 
>   * Old kernels vs new tools:
> 
> New tools can query for the latest ABI they know, and fall-back on older
> ABIs, with limited features.
> 
>   * New kernels vs old tools:
> 
> Keeping around the old ABI for a deprecation phase lets old tools work on
> a bleeding edge kernel while the ABI change is being introduced, which
> should satisfy the kernel developer use-case.

We've done this without version numbers. Just look at all the udev
changes.

-- Steve



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [lttng-dev] Perf ABI (was: Re: [PATCH 09/11] sched: export task_prio to GPL modules)
  2012-01-12 14:54                                   ` Steven Rostedt
@ 2012-01-12 15:39                                     ` Mathieu Desnoyers
  2012-01-12 15:53                                       ` Steven Rostedt
  2012-01-12 20:00                                       ` Greg KH
  0 siblings, 2 replies; 51+ messages in thread
From: Mathieu Desnoyers @ 2012-01-12 15:39 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Mathieu Desnoyers, devel, Ted Ts'o, Peter Zijlstra, Greg KH,
	linux-kernel, Arnaldo Carvalho de Melo, lttng-dev,
	Thomas Gleixner, Ingo Molnar, Linus Torvalds, Andrew Morton

* Steven Rostedt (rostedt@goodmis.org) wrote:
> On Thu, 2012-01-12 at 09:09 -0500, Mathieu Desnoyers wrote:
> 
> 
> > It is important to clarify that tracing is, in my opinion, not part of
> > the runtime support, which makes it very different by nature from
> > filesystems and kernel runtime support. So I agree with Linus' argument
> > about not breaking userspace when applied to runtime support, because
> > being unable to even boot a system due to an ABI breakage is very much
> > unwanted. However, I think it should not be applied as-is to tracing,
> > because you cannot make a system unusable due to a tracer ABI breakage:
> > if a tracer can be packaged in a set of standalone modules, that clearly
> > shows it is not part of the system runtime support.
> 
> Correct that tracing is not something that needs to make the system run,
> but that's still no excuse to make ABI changes any different. Note, we
> don't change things within the /proc/stat or /proc/*/stat and that's not
> required to make the system run. We can add onto those files, but we
> can't change what the current numbers mean.

This is because this stat ABI is volountarily exposed like this. It does
not mean that this is the case everywhere else in the kernel. And it
might not be the right way to expose it: I bet that PeterZ would really
like to get the thread priority value removed from /proc/*/stat, because
it exposes something "internal" to the scheduler from his point of view,
but this particular ABI has chosen to evolve without ever retiring a
value previously exported.

> 
> > 
> > That being said, ABI versioning could still handle ABI changes without
> > significantly impacting the users: when an ABI breakage is needed, we
> > can keep the old code around for a while and expose both the old and new
> > ABIs. This would ensure that the user-level tools can query for the
> > specific ABI major version(s) they support. That should improve the user
> > experience by providing "deprecated" console warnings for a few kernel
> > releases before the old code ends up being removed.
> 
> ABI version numbers are meaningless, and prone to be broken. The change
> would have to be added with the commit that updates the change otherwise
> git bisecting can get screwed up too.

Of course, the commit that updates the code would "fork" to a new ABI if
it ever need to diverge from the old one.

> The way ABI changes in the kernel have always been was to look at the
> file itself and have the tool be able to determine what version of the
> ABI is there based on what files exists, or what exists in the file.
> I've done this with trace-cmd and ftrace. The debugfs system has changed
> a lot, and trace-cmd can handle each change. I never had a need for a
> version number to do this. I simply have trace-cmd look at what is
> available and what isn't.
> 
> If you need to know if a syscall exists, you try it and if you get
> -ENOSYS, then you know it doesn't exist. We have no need for an
> arbitrary version number that is meaningless. The existence of (or lack
> of) tells us all we need to know.

pipe()/pipe2()
dup()/dup2()/dup3()
umount()/umount2()
mmap()/mmap2()
madvise()/madvise1()
eventfd()/eventfd2()

Those look very much like major version numbers to me. And these are
entirely compatible with your statement above about using -ENOSYS to
detect if the major version number is implemented or not.

If your only concern is that the major version number should be part of
the ABI name (as in the examples above), that can be arranged.

> 
> > 
> > So, in summary:
> > 
> >   * Old kernels vs new tools:
> > 
> > New tools can query for the latest ABI they know, and fall-back on older
> > ABIs, with limited features.
> > 
> >   * New kernels vs old tools:
> > 
> > Keeping around the old ABI for a deprecation phase lets old tools work on
> > a bleeding edge kernel while the ABI change is being introduced, which
> > should satisfy the kernel developer use-case.
> 
> We've done this without version numbers. Just look at all the udev
> changes.

Are you seriously refering to udev as an example of how to handle
changes, or as one of the worse ABI breakage mess that happened in the
Linux kernel history ? My own experience as a Linux users (in the
era around 2.6.12 kernels if my memory serves me right) lead me to think
it's the latter. And because udev is part of the runtime support, that
indeed led to non-bootable systems and lots of frustrated users.

Thanks,

Mathieu

> 
> -- Steve
> 
> 
> 
> _______________________________________________
> lttng-dev mailing list
> lttng-dev@lists.lttng.org
> http://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev
> 

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [lttng-dev] Perf ABI (was: Re: [PATCH 09/11] sched: export task_prio to GPL modules)
  2012-01-12 15:39                                     ` [lttng-dev] Perf ABI (was: " Mathieu Desnoyers
@ 2012-01-12 15:53                                       ` Steven Rostedt
  2012-01-12 15:59                                         ` Steven Rostedt
  2012-01-12 16:27                                         ` Mathieu Desnoyers
  2012-01-12 20:00                                       ` Greg KH
  1 sibling, 2 replies; 51+ messages in thread
From: Steven Rostedt @ 2012-01-12 15:53 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Mathieu Desnoyers, devel, Ted Ts'o, Peter Zijlstra, Greg KH,
	linux-kernel, Arnaldo Carvalho de Melo, lttng-dev,
	Thomas Gleixner, Ingo Molnar, Linus Torvalds, Andrew Morton

On Thu, 2012-01-12 at 10:39 -0500, Mathieu Desnoyers wrote:

> pipe()/pipe2()
> dup()/dup2()/dup3()
> umount()/umount2()
> mmap()/mmap2()
> madvise()/madvise1()
> eventfd()/eventfd2()
> 
> Those look very much like major version numbers to me. And these are
> entirely compatible with your statement above about using -ENOSYS to
> detect if the major version number is implemented or not.

That's a stretch in calling version numbers. All but the madvise case
above are how many parameters it takes, not really a "version" number.

It's adding a new syscall, not updating a version and then deprecating
the old one. As I believe all the above are still supported.

> 
> If your only concern is that the major version number should be part of
> the ABI name (as in the examples above), that can be arranged.

> > 
> > We've done this without version numbers. Just look at all the udev
> > changes.
> 
> Are you seriously refering to udev as an example of how to handle
> changes, or as one of the worse ABI breakage mess that happened in the
> Linux kernel history ? My own experience as a Linux users (in the
> era around 2.6.12 kernels if my memory serves me right) lead me to think
> it's the latter. And because udev is part of the runtime support, that
> indeed led to non-bootable systems and lots of frustrated users.

Yeah, I know it sucked, as I got burned by it too. But having "version"
numbers wouldn't have helped at all. In fact, it should have kept both
ways working much longer, or at least had the new udev support both. 

What udev did is more like what you want to do than what I did with
trace-cmd.

-- Steve



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [lttng-dev] Perf ABI (was: Re: [PATCH 09/11] sched: export task_prio to GPL modules)
  2012-01-12 15:53                                       ` Steven Rostedt
@ 2012-01-12 15:59                                         ` Steven Rostedt
  2012-01-12 16:27                                         ` Mathieu Desnoyers
  1 sibling, 0 replies; 51+ messages in thread
From: Steven Rostedt @ 2012-01-12 15:59 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Mathieu Desnoyers, devel, Ted Ts'o, Peter Zijlstra, Greg KH,
	linux-kernel, Arnaldo Carvalho de Melo, lttng-dev,
	Thomas Gleixner, Ingo Molnar, Linus Torvalds, Andrew Morton

On Thu, 2012-01-12 at 10:53 -0500, Steven Rostedt wrote:

> That's a stretch in calling version numbers. All but the madvise case
> above are how many parameters it takes, not really a "version" number.
> 
> It's adding a new syscall, not updating a version and then deprecating
> the old one. As I believe all the above are still supported.
> 

Actually, the madvise1() isn't supported. But this just shows that it
has nothing to do with a version number. What version is madvise()?

-- Steve



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [lttng-dev] Perf ABI (was: Re: [PATCH 09/11] sched: export task_prio to GPL modules)
  2012-01-12 15:53                                       ` Steven Rostedt
  2012-01-12 15:59                                         ` Steven Rostedt
@ 2012-01-12 16:27                                         ` Mathieu Desnoyers
  2012-01-12 16:34                                           ` Steven Rostedt
  1 sibling, 1 reply; 51+ messages in thread
From: Mathieu Desnoyers @ 2012-01-12 16:27 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: devel, Ted Ts'o, Peter Zijlstra, Greg KH, linux-kernel,
	Arnaldo Carvalho de Melo, lttng-dev, Thomas Gleixner,
	Ingo Molnar, Linus Torvalds, Andrew Morton

* Steven Rostedt (rostedt@goodmis.org) wrote:
> On Thu, 2012-01-12 at 10:39 -0500, Mathieu Desnoyers wrote:
> 
> > pipe()/pipe2()
> > dup()/dup2()/dup3()
> > umount()/umount2()
> > mmap()/mmap2()
> > madvise()/madvise1()
> > eventfd()/eventfd2()
> > 
> > Those look very much like major version numbers to me. And these are
> > entirely compatible with your statement above about using -ENOSYS to
> > detect if the major version number is implemented or not.
> 
> That's a stretch in calling version numbers. All but the madvise case
> above are how many parameters it takes, not really a "version" number.
> 
> It's adding a new syscall, not updating a version and then deprecating
> the old one. As I believe all the above are still supported.
> 
> > 
> > If your only concern is that the major version number should be part of
> > the ABI name (as in the examples above), that can be arranged.
> 
> > > 
> > > We've done this without version numbers. Just look at all the udev
> > > changes.
> > 
> > Are you seriously refering to udev as an example of how to handle
> > changes, or as one of the worse ABI breakage mess that happened in the
> > Linux kernel history ? My own experience as a Linux users (in the
> > era around 2.6.12 kernels if my memory serves me right) lead me to think
> > it's the latter. And because udev is part of the runtime support, that
> > indeed led to non-bootable systems and lots of frustrated users.
> 
> Yeah, I know it sucked, as I got burned by it too. But having "version"
> numbers wouldn't have helped at all. In fact, it should have kept both
> ways working much longer, or at least had the new udev support both. 
> 
> What udev did is more like what you want to do than what I did with
> trace-cmd.

OK. Then how can trace-cmd support the LTTng features ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [lttng-dev] Perf ABI (was: Re: [PATCH 09/11] sched: export task_prio to GPL modules)
  2012-01-12 16:27                                         ` Mathieu Desnoyers
@ 2012-01-12 16:34                                           ` Steven Rostedt
  0 siblings, 0 replies; 51+ messages in thread
From: Steven Rostedt @ 2012-01-12 16:34 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: devel, Ted Ts'o, Peter Zijlstra, Greg KH, linux-kernel,
	Arnaldo Carvalho de Melo, lttng-dev, Thomas Gleixner,
	Ingo Molnar, Linus Torvalds, Andrew Morton

On Thu, 2012-01-12 at 11:27 -0500, Mathieu Desnoyers wrote:

> > What udev did is more like what you want to do than what I did with
> > trace-cmd.
> 
> OK. Then how can trace-cmd support the LTTng features ?

New syscalls, or new files, and simply check if they exist.

New features should not break old ones.

-- Steve



^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [lttng-dev] Perf ABI (was: Re: [PATCH 09/11] sched: export task_prio to GPL modules)
  2012-01-12 15:39                                     ` [lttng-dev] Perf ABI (was: " Mathieu Desnoyers
  2012-01-12 15:53                                       ` Steven Rostedt
@ 2012-01-12 20:00                                       ` Greg KH
  2012-01-16  8:55                                         ` Ingo Molnar
  1 sibling, 1 reply; 51+ messages in thread
From: Greg KH @ 2012-01-12 20:00 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Steven Rostedt, Mathieu Desnoyers, devel, Ted Ts'o,
	Peter Zijlstra, linux-kernel, Arnaldo Carvalho de Melo,
	lttng-dev, Thomas Gleixner, Ingo Molnar, Linus Torvalds,
	Andrew Morton

On Thu, Jan 12, 2012 at 10:39:57AM -0500, Mathieu Desnoyers wrote:
> > We've done this without version numbers. Just look at all the udev
> > changes.
> 
> Are you seriously refering to udev as an example of how to handle
> changes, or as one of the worse ABI breakage mess that happened in the
> Linux kernel history ? My own experience as a Linux users (in the
> era around 2.6.12 kernels if my memory serves me right) lead me to think
> it's the latter. And because udev is part of the runtime support, that
> indeed led to non-bootable systems and lots of frustrated users.

Really?  You fail to remember the fact that we _fixed_ those
non-bootable systems by putting the userspace bits back, and symlinks,
and all other sorts of gyrations in order to prevent userspace from
breaking again.

And it worked, and people's machines worked again, and no one since then
has reported a problem.

So I think udev actually is a good example of how to do it right, we
provide proper backwards compatibility in the kernel to keep userspace
working.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [lttng-dev] Perf ABI (was: Re: [PATCH 09/11] sched: export task_prio to GPL modules)
  2012-01-12 20:00                                       ` Greg KH
@ 2012-01-16  8:55                                         ` Ingo Molnar
  0 siblings, 0 replies; 51+ messages in thread
From: Ingo Molnar @ 2012-01-16  8:55 UTC (permalink / raw)
  To: Greg KH
  Cc: Mathieu Desnoyers, Steven Rostedt, Mathieu Desnoyers, devel,
	Ted Ts'o, Peter Zijlstra, linux-kernel,
	Arnaldo Carvalho de Melo, lttng-dev, Thomas Gleixner,
	Linus Torvalds, Andrew Morton


* Greg KH <greg@kroah.com> wrote:

> So I think udev actually is a good example of how to do it 
> right, we provide proper backwards compatibility in the kernel 
> to keep userspace working.

I agree, i still have a udev system that i installed 5 years 
ago, and it's working mostly fine with current kernels.

Compatibility is a desirable property, it is something that 
preserves our users - and if done right it's almost never a big 
issue technically. If it is hindering someone then there must be 
other problems.

Of course to developers the simplest approach is always to just 
develop without regard for compatibility. The simplest form of 
that is that people write patches that work fine on their own 
systems but crash the kernel on other systems. We fix those 
bugs. Another, subtler form is when the patches work fine on 
their systems but break apps on other systems. We fix those bugs 
too.

That's why we have testing, regression tracking and maintainers, 
to control that - compatibility is just another dimension to 
'correctness', in the typical case with no inherent restrictions 
on future features and possibilities.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2012-01-16  8:55 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1322775683-8741-1-git-send-email-mathieu.desnoyers@efficios.com>
2011-12-01 21:41 ` [PATCH 01/11] mm: export vmalloc_sync_all symbol to GPL modules Mathieu Desnoyers
2011-12-01 21:57   ` Christoph Hellwig
2011-12-01 22:13     ` Greg KH
2011-12-01 22:19       ` Mathieu Desnoyers
2011-12-01 22:41         ` Greg KH
2011-12-01 22:28       ` Christoph Hellwig
2011-12-01 23:00         ` Greg KH
2011-12-01 21:41 ` [PATCH 03/11] fs/splice: export splice_to_pipe " Mathieu Desnoyers
2011-12-02  7:19   ` Jens Axboe
2011-12-02 12:32     ` Mathieu Desnoyers
2011-12-01 21:41 ` [PATCH 09/11] sched: export task_prio " Mathieu Desnoyers
2011-12-01 21:56   ` Peter Zijlstra
2011-12-01 22:04     ` Mathieu Desnoyers
2011-12-01 22:10       ` Peter Zijlstra
2011-12-01 22:15         ` Mathieu Desnoyers
2011-12-01 22:36           ` Mathieu Desnoyers
2011-12-01 23:05             ` Peter Zijlstra
2011-12-02 13:51               ` Mathieu Desnoyers
2011-12-01 23:06           ` Peter Zijlstra
2011-12-01 23:18             ` Greg KH
2011-12-01 23:47               ` Mathieu Desnoyers
2011-12-01 22:14     ` Greg KH
2011-12-01 22:20       ` Mathieu Desnoyers
2011-12-01 23:07       ` Peter Zijlstra
2011-12-01 23:17         ` Greg KH
2011-12-05 14:17           ` Ingo Molnar
2011-12-06 21:44             ` Greg KH
2011-12-08  5:23               ` Ingo Molnar
2011-12-08 23:27                 ` Greg KH
2011-12-19 10:49                   ` Ingo Molnar
2011-12-19 15:30                     ` [lttng-dev] " Mathieu Desnoyers
2011-12-20 11:08                       ` Ingo Molnar
2011-12-20 21:46                         ` Frank Rowand
2011-12-23 10:51                           ` Ingo Molnar
2011-12-21 18:47                         ` Aaron Spear
2011-12-21 18:58                           ` Christoph Hellwig
2011-12-23 16:46                         ` Perf ABI (was: Re: [lttng-dev] [PATCH 09/11] sched: export task_prio to GPL modules) Mathieu Desnoyers
2011-12-23 17:21                           ` Ted Ts'o
2011-12-23 18:16                             ` Mathieu Desnoyers
2011-12-25 17:46                               ` Ted Ts'o
2012-01-12 14:09                                 ` Mathieu Desnoyers
2012-01-12 14:54                                   ` Steven Rostedt
2012-01-12 15:39                                     ` [lttng-dev] Perf ABI (was: " Mathieu Desnoyers
2012-01-12 15:53                                       ` Steven Rostedt
2012-01-12 15:59                                         ` Steven Rostedt
2012-01-12 16:27                                         ` Mathieu Desnoyers
2012-01-12 16:34                                           ` Steven Rostedt
2012-01-12 20:00                                       ` Greg KH
2012-01-16  8:55                                         ` Ingo Molnar
2011-12-07 22:57             ` [PATCH 09/11] sched: export task_prio to GPL modules Mathieu Desnoyers
2011-12-08  5:40               ` Ingo Molnar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).