linux-kernel.vger.kernel.org archive mirror
* RE: [patch] printk subsystems
@ 2003-04-21 18:23 Perez-Gonzalez, Inaky
  2003-04-21 18:30 ` H. Peter Anvin
  0 siblings, 1 reply; 52+ messages in thread
From: Perez-Gonzalez, Inaky @ 2003-04-21 18:23 UTC (permalink / raw)
  To: 'Greg KH', Perez-Gonzalez, Inaky
  Cc: 'karim@opersys.com', 'Martin Hicks',
	'Daniel Stekloff', 'Patrick Mochel',
	'Randy.Dunlap', 'hpa@zytor.com',
	'pavel@ucw.cz', 'jes@wildopensource.com',
	'linux-kernel@vger.kernel.org', 'wildos@sgi.com',
	'Tom Zanussi'



> From: Greg KH [mailto:greg@kroah.com]
> 
> > Yep, that is the point, and it is small enough (5 ulongs) that
> > it can be embedded anywhere without being of high impact and
> > having to allocate it [first example that comes to mind is
> > for sending a device connection message; you can embed a short
> > message in the device structure and query that for delivery;
> > no buffer, no nothing, the data straight from the source].
> 
> And the device is removed from the system, the memory for that device is
> freed, and then a user comes along and tries to read that message.
> 
> oops...  :)

Hey! Come on! You don't think I am that lame, do you? Man, what
a reputation I have!

Before the device vaporizes, it recalls the message, so there is 
no message to read - the same way you take away the sysfs data from
the sysfs tree ...

Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own
(and my fault)

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch] printk subsystems
  2003-04-21 18:23 [patch] printk subsystems Perez-Gonzalez, Inaky
@ 2003-04-21 18:30 ` H. Peter Anvin
  0 siblings, 0 replies; 52+ messages in thread
From: H. Peter Anvin @ 2003-04-21 18:30 UTC (permalink / raw)
  To: Perez-Gonzalez, Inaky
  Cc: 'Greg KH', 'karim@opersys.com',
	'Martin Hicks', 'Daniel Stekloff',
	'Patrick Mochel', 'Randy.Dunlap',
	'pavel@ucw.cz', 'jes@wildopensource.com',
	'linux-kernel@vger.kernel.org', 'wildos@sgi.com',
	'Tom Zanussi'

Perez-Gonzalez, Inaky wrote:
> 
> Hey! Come on! You don't think I am that lame, do you? Man, what
> a reputation I have!
> 
> Before the device vaporizes, it recalls the message, so there is 
> no message to read - the same way you take away the sysfs data from
> the sysfs tree ...
> 

If you think that will happen with printk(), then, quite frankly, you're
seriously deluded.

	-hpa




* RE: [patch] printk subsystems
  2003-04-24 18:56 Manfred Spraul
@ 2003-04-24 19:10 ` bob
  0 siblings, 0 replies; 52+ messages in thread
From: bob @ 2003-04-24 19:10 UTC (permalink / raw)
  To: Manfred Spraul; +Cc: bob, linux-kernel

Manfred Spraul writes:
 > Robert wrote:
 > 
 > >There is both a qualitative difference and quantitative difference in a
 > >lockless algorithm as described versus one that uses locking.  Most
 > >importantly for Linux, these algorithms in practice have better performance
 > >characteristics.
 > >
 > Do you have benchmark numbers that compare "lockless" and locking 
 > algorithms on large MP systems?
 > 
 > For example, how much faster is one 'lock;cmpxchg' compared to 
 > 'spin_lock();if (x==var) var = y;spin_unlock();'.
 > 
 > So far I assumed that for spinlocks that are only held for a few cycles,
 > the cacheline thrashing dominates, and not the spinning.
 > I've avoided replacing spin_lock+inc+spin_unlock with atomic_inc().
 > (Just look at the needed memory barriers: smp_mb__after_clear_bit & friends)
 > 
 > RCU uses per-cpu queues that are really lockless and avoid the cache
 > thrashing; that is a real win.
 > 
 > --
 >     Manfred
 > 

You're right in the common case - cache thrashing is definitely dominant
(though in K42 we've tried to be very careful to design code and data so
the last acquisition is almost always on the same processor).  The problem
arises if the process ever gets interrupted after spin_lock.  Then
performance falls off a cliff because everyone backs up for the lock.
That's what I had meant by "in practice it works better".  From my experience
the OS likes to interrupt you in the place you least want :-).  I certainly
could point to lots of preemption numbers (which motivated the comment),
and though I'm sure the other kind of numbers exist too, I don't know where
offhand.

In some specific places it's probably all right to go with a spin lock, but
for the logging/tracing code (which started this thread) that will be used
generically throughout the kernel by many callers, lockless is the way to
go.

-bob

Robert Wisniewski
The K42 MP OS Project
Advanced Operating Systems
Scalable Parallel Systems
IBM T.J. Watson Research Center
914-945-3181
http://www.research.ibm.com/K42/
bob@watson.ibm.com


* RE: [patch] printk subsystems
@ 2003-04-24 18:56 Manfred Spraul
  2003-04-24 19:10 ` bob
  0 siblings, 1 reply; 52+ messages in thread
From: Manfred Spraul @ 2003-04-24 18:56 UTC (permalink / raw)
  To: bob; +Cc: linux-kernel

Robert wrote:

>There is both a qualitative difference and quantitative difference in a
>lockless algorithm as described versus one that uses locking.  Most
>importantly for Linux, these algorithms in practice have better performance
>characteristics.
>
Do you have benchmark numbers that compare "lockless" and locking 
algorithms on large MP systems?

For example, how much faster is one 'lock;cmpxchg' compared to 
'spin_lock();if (x==var) var = y;spin_unlock();'.

So far I assumed that for spinlocks that are only held for a few cycles,
the cacheline thrashing dominates, and not the spinning.
I've avoided replacing spin_lock+inc+spin_unlock with atomic_inc().
(Just look at the needed memory barriers: smp_mb__after_clear_bit & friends)

RCU uses per-cpu queues that are really lockless and avoid the cache
thrashing; that is a real win.

--
    Manfred
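
For reference, the two variants being compared above can be sketched in
user-space C11. This is an illustration only, not kernel code:
atomic_compare_exchange_strong() stands in for 'lock;cmpxchg', a
test-and-set loop on an atomic_flag stands in for spin_lock()/spin_unlock(),
and the names set_if_equal_cas(), set_if_equal_locked() and struct spin_int
are made up for this example. It shows the shape of the two code paths and
says nothing about which one is faster.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Variant 1: one atomic compare-and-swap, the analogue of 'lock;cmpxchg'. */
static bool set_if_equal_cas(atomic_int *var, int x, int y)
{
	int expected = x;

	/* Stores y only if *var still holds x; returns whether it did. */
	return atomic_compare_exchange_strong(var, &expected, y);
}

/* Variant 2: the same update under a (hypothetical, user-space) spinlock. */
struct spin_int {
	atomic_flag lock;
	int value;
};

static bool set_if_equal_locked(struct spin_int *v, int x, int y)
{
	bool done = false;

	/* Acquire: spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&v->lock, memory_order_acquire))
		;	/* if the holder is preempted, everyone waits here */
	if (v->value == x) {
		v->value = y;
		done = true;
	}
	/* Release the lock. */
	atomic_flag_clear_explicit(&v->lock, memory_order_release);
	return done;
}
```

Both variants touch a shared cache line, which is the cost being argued
about; the difference is that only the second one has a window in which an
interrupted holder blocks every other caller.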



* RE: [patch] printk subsystems
  2003-04-22  5:09 Perez-Gonzalez, Inaky
@ 2003-04-24 18:22 ` bob
  0 siblings, 0 replies; 52+ messages in thread
From: bob @ 2003-04-24 18:22 UTC (permalink / raw)
  To: Perez-Gonzalez, Inaky
  Cc: 'karim@opersys.com', 'Tom Zanussi',
	'Martin Hicks', 'Daniel Stekloff',
	'Patrick Mochel', 'Randy.Dunlap',
	'hpa@zytor.com', 'pavel@ucw.cz',
	'jes@wildopensource.com',
	'linux-kernel@vger.kernel.org', 'wildos@sgi.com',
	'Robert Wisniewski'

There is both a qualitative difference and quantitative difference in a
lockless algorithm as described versus one that uses locking.  Most
importantly for Linux, these algorithms in practice have better performance
characteristics.  There is a whole body of literature on lock free
algorithms (see Maurice Herlihy's 1988 PODC Wait-Free Synchronization
paper).  When a process holds a lock nobody else can make progress.  If
that process is interrupted everybody waits.  Furthermore, when designing
for scalability, queue locks are used, which considerably exacerbates the
problem (see Kontothanassis TOCS Feb 1997).  Locking reduces concurrency;
lockfree and lockless algorithms allow increased concurrency (both
processes can simultaneously log their events once they've reserved space).

The lockless tag is indeed correct, accurate, and helpful in identifying
the characteristics of the algorithm.  More of these algorithms, such as
the recent RCU work, will need to be placed into Linux for it to perform
well on multiprocessors.
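
The reserve-then-write idea can be sketched in user-space C11 as follows.
This is an illustration under stated assumptions and not the relayfs code:
struct log_buf, log_reserve(), log_event() and LOG_BUF_SIZE are hypothetical
names, the buffer does not wrap, and atomic_compare_exchange_strong() plays
the role of cmpxchg.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define LOG_BUF_SIZE 64	/* hypothetical size, for illustration only */

struct log_buf {
	atomic_size_t head;		/* offset of the next free byte */
	char data[LOG_BUF_SIZE];
};

/*
 * Reserve len bytes without taking a lock. Returns the start offset of the
 * reserved region, or (size_t)-1 if the request does not fit.
 */
static size_t log_reserve(struct log_buf *b, size_t len)
{
	size_t old = atomic_load(&b->head);

	do {
		if (len > LOG_BUF_SIZE - old)
			return (size_t)-1;
		/* On failure, old is reloaded with the current head; retry. */
	} while (!atomic_compare_exchange_strong(&b->head, &old, old + len));

	return old;
}

/* Reserve, then copy: concurrent writers fill disjoint regions. */
static int log_event(struct log_buf *b, const void *ev, size_t len)
{
	size_t off = log_reserve(b, len);

	if (off == (size_t)-1)
		return -1;
	memcpy(b->data + off, ev, len);
	return 0;
}
```

In this sketch a writer retries in log_reserve() only when another writer
has moved head in the meantime, so some writer always makes progress, and a
writer that is interrupted after reserving does not stop the others from
reserving and filling their own regions.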

Robert Wisniewski
The K42 MP OS Project
Advanced Operating Systems
Scalable Parallel Systems
IBM T.J. Watson Research Center
914-945-3181
http://www.research.ibm.com/K42/
bob@watson.ibm.com

Perez-Gonzalez, Inaky writes:
 > 
 > > From: Karim Yaghmour [mailto:karim@opersys.com]
 > >
 > > relayfs actually uses 2 mutually-exclusive schemes internally -
 > > 'lockless' and 'locking', depending on the availability of a cmpxchg
 > > instruction (lockless needs cmpxchg).  If the lockless scheme is being
 > > used, relay_lock_channel() does no locking or irq disabling of any
 > > kind i.e. it's basically a no-op in that case.
 > 
 > So that means you are using cmpxchg to do the locking. I mean, not the
 > "locking" itself, but a similar process to that of locking. I see.
 > 
 > However, isn't it almost the same as spinlocking? You are basically
 > trying to "allocate" a channel idx with atomic cmpxchg; if it fails, you
 > are retrying, spinning on the retry code until successful.
 > 
 > Not meaning to be a smartass here, but I don't buy the "lockless" tag,
 > I would agree it is an optimized-lock scheme [assuming it works better
 > than the spinlock case, that I am sure it does because if not you guys
 > would have not gone through the process of implementing it], but it is
 > not lockless.
 > 
 > Don't get me wrong - I don't mean the actual difference is not important;
 > what I mean is not important is me buying the "lockless" tag or not. I
 > actually think that the method you guys use is really sharp.



* RE: [patch] printk subsystems
  2003-04-22 22:53 Perez-Gonzalez, Inaky
@ 2003-04-23  3:58 ` Tom Zanussi
  0 siblings, 0 replies; 52+ messages in thread
From: Tom Zanussi @ 2003-04-23  3:58 UTC (permalink / raw)
  To: Perez-Gonzalez, Inaky
  Cc: 'Tom Zanussi', 'karim@opersys.com',
	'linux-kernel@vger.kernel.org'

Perez-Gonzalez, Inaky writes:
 > 
 > > From: Tom Zanussi [mailto:zanussi@us.ibm.com]
 > > Perez-Gonzalez, Inaky writes:
 > >  > > From: Tom Zanussi [mailto:zanussi@us.ibm.com]
 > >  > >
 > >  > > In relayfs, the event can be generated directly into the space
 > >  > > reserved for it - in fact this is exactly what LTT does.  There aren't
 > >  > > two separate steps, one 'generating' the event and another copying it
 > >  > > to the relayfs buffer, if that's what you mean.
 > >  >
 > >  > In this case, what happens if the user space, through mmap, copies
 > >  > while the message is half-baked (ie, from another CPU) ... won't it
 > >  > be inconsistent?
 > > 
 > > There's a count kept, per sub-buffer, that's updated after each write.
 > > If this count doesn't match the expected size of the sub-buffer, the
 > > reader can ignore the incomplete buffer and come back to it later.
 > > The count is maintained automatically by relay_write(); if you're
 > > writing directly into the channel as LTT does though, part of the task
 > > is to call relay_commit() after the write, which updates the count and
 > > maintains consistency.
 > 
 > Hmmm, scratch, scratch ... there is something I still don't get here. 
 > I am in lockless_commit() - from what you say, and what I read, I would
 > then expect the length of the sub-buffer would be mapped to user space, 
 > so I can memcpy out of the mmaped area and then take only the part that
 > is guaranteed to be consistent. But the atomic_add() is done on the 
 > rchan->scheme.lockless.fillcount[buffer_number]. So, I don't see how
 > that count pops out to user space, as rchan->buf to rchan->buf + rchan->
 > alloc_size is what is mapped, and rchan is a kernel-only struct that
 > is not exposed through mmap().
 > 

It 'pops' out to user space through some protocol defined between a
relayfs kernel client and the user space program.  relayfs doesn't say
anything about the protocol, but does provide the kernel client enough
information about the state of the channel via relay_info(), which
supplies among other things (these aren't the real names, they're
changed here to make things maybe a little clearer):

n_subbufs - number of sub-buffers
subbuf_size - size of each sub-buffer
subbuf_complete[] - array of booleans basically the result of 
		  fillcount[subbuf_no] == subbuf_size
subbufs_produced - by the channel
subbufs_consumed - by userspace client, maybe

This is enough information for a userspace client to figure out what
to log.  How it gets there and when is up to the client.  For
instance, the kernel client could send a signal whenever a sub-buffer
is full (which it's notified of, if it chooses to be, by a delivery
callback).  The user client could then do something like

buf = mmap(fd, n_subbufs * subbuf_size); /* whole channel is mapped */

sighandler()
{
	get_channel_info_from_kernel(&info); /* ioctl/procfs/sysfs/... */
	subbufs_ready = subbufs_produced - subbufs_consumed;
	for(i=0; i<subbufs_ready; i++) {
		 subbuf_no = (subbufs_consumed + i) % n_subbufs;
		 if(!subbuf_complete[subbuf_no])
			break; /* Try again next sig */
		 write(log_fd, buf + subbuf_no * subbuf_size, subbuf_size);
	 }
}

 > And then, once I have this, next time I read I don't want to read
 > what I already did; I guess I can advance my buf pointer to 
 > buf+real_size, but then how do I wrap around - meaning, how do I
 > detect when I have to wrap?
 > 

Wrapping is taken care of automatically in the above code.
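
To make the wrap-around explicit, here is a self-contained C sketch of the
same reader loop. It is only an illustration: struct chan_info,
drain_channel() and the N_SUBBUFS/SUBBUF_SIZE geometry are hypothetical, the
channel state is passed in directly instead of being fetched from the
kernel, and the data is copied into a plain memory buffer rather than
written to a log file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define N_SUBBUFS   4	/* hypothetical channel geometry */
#define SUBBUF_SIZE 8

struct chan_info {
	bool subbuf_complete[N_SUBBUFS];
	unsigned long subbufs_produced;
	unsigned long subbufs_consumed;
};

/*
 * Copy every complete, not-yet-consumed sub-buffer from the mapped channel
 * buf into out (capacity out_len bytes). Returns the number of sub-buffers
 * copied and advances info->subbufs_consumed by that amount.
 */
static size_t drain_channel(const char *buf, struct chan_info *info,
			    char *out, size_t out_len)
{
	unsigned long ready = info->subbufs_produced - info->subbufs_consumed;
	size_t copied = 0;

	while (copied < ready && (copied + 1) * SUBBUF_SIZE <= out_len) {
		/* The modulo is what wraps the read position. */
		size_t subbuf_no = (info->subbufs_consumed + copied) % N_SUBBUFS;

		if (!info->subbuf_complete[subbuf_no])
			break;	/* writer still busy; try again later */
		memcpy(out + copied * SUBBUF_SIZE,
		       buf + subbuf_no * SUBBUF_SIZE, SUBBUF_SIZE);
		copied++;
	}
	info->subbufs_consumed += copied;
	return copied;
}
```

Because produced and consumed are running totals and only the index is
reduced modulo the number of sub-buffers, the reader never has to detect
the wrap itself.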

-- 
Regards,

Tom Zanussi <zanussi@us.ibm.com>
IBM Linux Technology Center/RAS



* RE: [patch] printk subsystems
@ 2003-04-23  0:28 Perez-Gonzalez, Inaky
  0 siblings, 0 replies; 52+ messages in thread
From: Perez-Gonzalez, Inaky @ 2003-04-23  0:28 UTC (permalink / raw)
  To: 'karim@opersys.com'
  Cc: 'linux-kernel@vger.kernel.org', 'Tom Zanussi'


> From: Karim Yaghmour [mailto:karim@opersys.com]
> 
> The difference between copy_to_user() and memcpy() is not a
> minor detail. There is a reason why relayfs does its things the
> way it does.

Well, then I would like to ask you to help me understand what
is so radically different (aside from the might-sleep), because
I simply don't get it, and I am always willing to learn ..

> > That is a good point, that brought me yesterday night to the following
> > doubt. How do you guarantee integrity of the data when reading with
> > mmap. In other words, if I am just copying the mmap region, how do
> > I know that what I am copying is safe enough, that it is not being
> > modified by CPU #2, for example?
> [snip]
> 
> "Use the source, Luke."

Yep, see my last message to Tom, I am doing that - however, it is
pretty difficult to grasp things at the first try [especially when
you are kind of in a rush for many different things :)].

> OK, so you are suggesting we start making a difference in the kernel
> between those printks which are "optional" and those that are
> "compulsory"? Interested kernel developers are free to voice their
> interest at any time now ...

Where did I say that? I am not talking about printk() anywhere, 
btw, although if someone wants to do that, hey, it's their decision, 
isn't it? I am nobody's Mum to tell them what to do and what not
to do.

OTOH, if someone would consider plugging printk into kue, I'd
consider it kind of stupid to printk recallable messages ... if it
can be recalled, why print it at all?

I don't know why, but I have a feeling that you are taking this
whole conversation too personally. It is not my intention to stomp
over all your work and criticize it. I am learning what you guys
have done, and yes, looking for defects or problems, because I
am concerned about things that don't match in my head and how
to solve them.

> > Now, if you want to make it resizable, that understands Japanese and
> > does double loops followed by a nose dive and a vertical climb up,
> > well, that's up to the client of the API. And I didn't want to
> > constrain the gymnastics that the client could do to handle a buffer.
> 
> Well, if we're talking about "double loops followed by a nose dive"
> we're certainly not going anywhere. I'll leave it to other LKML
> subscribers to decide whether automatically resizeable buffers are
> of interest.

Hey, where is your sense of humor?! Are you German, like my 
girlfriend? ;)

Automatically resizeable buffers are of *much* interest, to me at
least - remember that the only point I was stressing is I am leaving
that design/implementation issue out of the kue code, up to the 
client, while in relayfs it is inside of it.

I consider that a gain in kue's flexibility, while it is a gain 
in performance and space optimization on relayfs's hand (this is
my interpretation of Tom's words in a previous mail and the code
I have had time to grasp; I think it makes full sense).

> > > I'm sorry, but the way I see printk() is that once I send something
> > > to it, it really ought to show up somewhere. Heck, I'm printk'ing
> > > it. If I plugged a device and the driver said "Your chair is
> > > on fire", I want to know about it whether the device has been
> > > unplugged later or not.
> >
> > I would say this case, printk(), would fit in my second example,
> > doesn't it? ... this is one message you want delivered, not recalled.
> 
> What I've been trying to say here is that there are no two kinds of
> printk. printk is printk and it ought to behave the same in all
> instances.

And I never said anywhere that it should not; in fact I have
stated clearly that a printk is something you shall not recall.
Twice, already ...

Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own
(and my fault)


* Re: [patch] printk subsystems
  2003-04-22 18:46 Perez-Gonzalez, Inaky
@ 2003-04-22 23:28 ` Karim Yaghmour
  0 siblings, 0 replies; 52+ messages in thread
From: Karim Yaghmour @ 2003-04-22 23:28 UTC (permalink / raw)
  To: Perez-Gonzalez, Inaky
  Cc: 'linux-kernel@vger.kernel.org', 'Tom Zanussi'


[The CC list has been trimmed]

"Perez-Gonzalez, Inaky" wrote:
> copy_to_user() has to do some more gymnastics in the process,
> but basically, the bulk is the same [at least by reading the
> asm of __copy_user() in usercopy.c and __memcpy() in string.h
> -- it is kind of different, but in function is/should
> be the same - bar that copy_to_user() might sleep due to
> paging-in and preemption and who knows what else].

The difference between copy_to_user() and memcpy() is not a
minor detail. There is a reason why relayfs does its things the
way it does.

> That is a good point, that brought me yesterday night to the following
> doubt. How do you guarantee integrity of the data when reading with
> mmap. In other words, if I am just copying the mmap region, how do
> I know that what I am copying is safe enough, that it is not being
> modified by CPU #2, for example?
[snip]

"Use the source, Luke."

> As I explained below, you don't _have_to_ drop it; however, in some
> cases, it makes sense to drop it because it is meaningless anyway (ie
> the device-plugged message - why would I want the userspace to check
> it if I know there is no device - so I recall it). Errors are another
> matter, and you don't want to recall those.

OK, so you are suggesting we start making a difference in the kernel
between those printks which are "optional" and those that are
"compulsory"? Interested kernel developers are free to voice their
interest at any time now ...

> This is different from running out of space. Like it or not, if you
> have a circular buffer with limited space and you run out ... moc!
> you lose, drop something somewhere to make space for it. This is not
> a kue limitation, this is a property of buffers: they fill up.

I wasn't interested in teaching anyone about buffering basics. The point
I was making is that different buffering schemes have different behaviors
with regard to overflow. IMNSHO the drop-the-oldest behavior
is preferable to the drop-whatever-has-gone-away behavior. But maybe
that's just me.

> Now, if you want to make it resizable, that understands Japanese and
> does double loops followed by a nose dive and a vertical climb up,
> well, that's up to the client of the API. And I didn't want to
> constrain the gymnastics that the client could do to handle a buffer.

Well, if we're talking about "double loops followed by a nose dive"
we're certainly not going anywhere. I'll leave it to other LKML
subscribers to decide whether automatically resizeable buffers are
of interest.

> > I'm sorry, but the way I see printk() is that once I send something
> > to it, it really ought to show up somewhere. Heck, I'm printk'ing
> > it. If I plugged a device and the driver said "Your chair is
> > on fire", I want to know about it whether the device has been
> > unplugged later or not.
> 
> I would say this case, printk(), would fit in my second example,
> doesn't it? ... this is one message you want delivered, not recalled.

What I've been trying to say here is that there are no two kinds of
printk. printk is printk and it ought to behave the same in all
instances.

Karim

===================================================
                 Karim Yaghmour
               karim@opersys.com
      Embedded and Real-Time Linux Expert
===================================================


* RE: [patch] printk subsystems
@ 2003-04-22 22:53 Perez-Gonzalez, Inaky
  2003-04-23  3:58 ` Tom Zanussi
  0 siblings, 1 reply; 52+ messages in thread
From: Perez-Gonzalez, Inaky @ 2003-04-22 22:53 UTC (permalink / raw)
  To: 'Tom Zanussi'
  Cc: 'karim@opersys.com', 'linux-kernel@vger.kernel.org'


> From: Tom Zanussi [mailto:zanussi@us.ibm.com]
> Perez-Gonzalez, Inaky writes:
>  > > From: Tom Zanussi [mailto:zanussi@us.ibm.com]
>  > >
>  > > In relayfs, the event can be generated directly into the space
>  > > reserved for it - in fact this is exactly what LTT does.  There aren't
>  > > two separate steps, one 'generating' the event and another copying it
>  > > to the relayfs buffer, if that's what you mean.
>  >
>  > In this case, what happens if the user space, through mmap, copies
>  > while the message is half-baked (ie, from another CPU) ... won't it
>  > be inconsistent?
> 
> There's a count kept, per sub-buffer, that's updated after each write.
> If this count doesn't match the expected size of the sub-buffer, the
> reader can ignore the incomplete buffer and come back to it later.
> The count is maintained automatically by relay_write(); if you're
> writing directly into the channel as LTT does though, part of the task
> is to call relay_commit() after the write, which updates the count and
> maintains consistency.

Hmmm, scratch, scratch ... there is something I still don't get here. 
I am in lockless_commit() - from what you say, and what I read, I would
then expect the length of the sub-buffer would be mapped to user space, 
so I can memcpy out of the mmaped area and then take only the part that
is guaranteed to be consistent. But the atomic_add() is done on the 
rchan->scheme.lockless.fillcount[buffer_number]. So, I don't see how
that count pops out to user space, as rchan->buf to rchan->buf + rchan->
alloc_size is what is mapped, and rchan is a kernel-only struct that
is not exposed through mmap().

Where is the Easter bunny I am missing? Would you mind giving a little bit
of pseudocode? I am trying to understand how to do this:

/* in userspace */

char *buf;
int fd; /* for the channel */
int log_fd; /* for permanent storage */

/* open, blah ... */

buf = mmap (... fd ... 1M ...);

if (new stuff is ready /* how to detect, select() on fd? */) {
	/* bring the data to safety, copy only what is consistent */
	real_size = /* or where do I get this from? */
	write (log_fd, buf, real_size);
}

And then, once I have this, next time I read I don't want to read
what I already did; I guess I can advance my buf pointer to 
buf+real_size, but then how do I wrap around - meaning, how do I
detect when I have to wrap?

>  > Yes, you have to guarantee the existence of the event data structures
>  > (the 'struct kue', the embedded 'struct kue_user' and the event data
>  > itself); if they are embedded into another structure that will
>  > disappear, you can choose to:
>  > ...
> 
> Well, kmalloc() seems like the most straightforward and convenient way
> of managing space for all these individual events, if not the most
> efficient.  Are you thinking that sub-allocating them out of a larger
> buffer might make more sense, for instance?  If so, I'd suggest
> relayfs for that. ;-) Just kidding, ...

Good try :) As I said somewhere else, that'd be up to the client. Wanna
use kmalloc()? or kmem_cache_alloc()? or something else? I guess it'd 
be convenient to provide a pre-implemented circular buffer thingie ready
to use.

I guess the suballocation makes sense when you have a fixed message size
and you want to optimize the allocation; for that matter, kmalloc is no
different to any other pool, as they are just pools of base-2 sizes. In
some other sense, you are doing the same in relayfs, managing kind of
an allocation pool, but not as flexible (and thus probably faster) because
the usage model doesn't pose as many requirements as the memory pools have.

Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own
(and my fault)



* RE: [patch] printk subsystems
  2003-04-22 19:02 Perez-Gonzalez, Inaky
  2003-04-22 19:03 ` H. Peter Anvin
@ 2003-04-22 21:52 ` Tom Zanussi
  1 sibling, 0 replies; 52+ messages in thread
From: Tom Zanussi @ 2003-04-22 21:52 UTC (permalink / raw)
  To: Perez-Gonzalez, Inaky
  Cc: 'Tom Zanussi', 'karim@opersys.com',
	'linux-kernel@vger.kernel.org'

Trimmed the cc list to those of us still here...

Perez-Gonzalez, Inaky writes:
 > 
 > 
 > > From: Tom Zanussi [mailto:zanussi@us.ibm.com]
 > > 
 > > In relayfs, the event can be generated directly into the space
 > > reserved for it - in fact this is exactly what LTT does.  There aren't
 > > two separate steps, one 'generating' the event and another copying it
 > > to the relayfs buffer, if that's what you mean.
 > 
 > In this case, what happens if the user space, through mmap, copies
 > while the message is half-baked (ie, from another CPU) ... won't it
 > be inconsistent?
 > 

There's a count kept, per sub-buffer, that's updated after each write.
If this count doesn't match the expected size of the sub-buffer, the
reader can ignore the incomplete buffer and come back to it later.
The count is maintained automatically by relay_write(); if you're
writing directly into the channel as LTT does though, part of the task
is to call relay_commit() after the write, which updates the count and
maintains consistency.
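
As a rough user-space sketch of the reserve/write/commit sequence described
above (C11 atomics; all names here, including struct subbuf,
subbuf_reserve(), subbuf_commit() and subbuf_complete(), are hypothetical
and are not the relayfs API; end-of-buffer padding is ignored):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SUBBUF_SIZE 16	/* hypothetical sub-buffer size */

struct subbuf {
	atomic_size_t reserved;		/* bytes handed out to writers */
	atomic_size_t fillcount;	/* bytes written and committed */
	char data[SUBBUF_SIZE];
};

/* Step 1: claim len bytes; returns NULL if they do not fit. */
static char *subbuf_reserve(struct subbuf *sb, size_t len)
{
	size_t off = atomic_load(&sb->reserved);

	do {
		if (len > SUBBUF_SIZE - off)
			return NULL;
	} while (!atomic_compare_exchange_strong(&sb->reserved, &off, off + len));
	return sb->data + off;
}

/* Step 2, after the event has been generated in place: publish it. */
static void subbuf_commit(struct subbuf *sb, size_t len)
{
	atomic_fetch_add(&sb->fillcount, len);
}

/* Reader side: only a fully committed sub-buffer is safe to copy out. */
static bool subbuf_complete(struct subbuf *sb)
{
	return atomic_load(&sb->fillcount) == SUBBUF_SIZE;
}
```

The count only reaches the sub-buffer size once every writer that reserved
space has also committed, so a half-written event keeps the whole
sub-buffer marked incomplete, regardless of the order in which writers
finish.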

 > > Well, I'm not sure I understand the details of kue all that well, so
 > > let me know if I'm missing something, but for kue events to really be
 > > self-contained, wouldn't the data need to be copied into the event
 > > unless the data structure containing them was guaranteed to exist
 > > until the event was disposed of one way or another?
 > 
 > Yes, you have to guarantee the existence of the event data structures
 > (the 'struct kue', the embedded 'struct kue_user' and the event data
 > itself); if they are embedded into another structure that will
 > disappear, you can choose to:
 > 
 > (a) recall the event [if it is recallable or makes sense to do so]
 > (b) dynamically allocate the event header and data, generate it 
 >     into that dynamic space.
 > (c) dynamically allocate and copy [slow]
 > 
 > (this works now; however, once I finish the destructor code, it
 > will give you the flexibility to use other stuff than just kmalloc()).
 > 
 > You can play many tricks here, but that depends on your needs,
 > requirements and similar stuff.
 > 

Well, kmalloc() seems like the most straightforward and convenient way
of managing space for all these individual events, if not the most
efficient.  Are you thinking that sub-allocating them out of a larger
buffer might make more sense, for instance?  If so, I'd suggest
relayfs for that. ;-) Just kidding, but it does seem you'll have a
certain amount of bookkeeping overhead and will need to deal with
things like fragmentation if you're going to manage a private memory
pool for everything.

-- 
Regards,

Tom Zanussi <zanussi@us.ibm.com>
IBM Linux Technology Center/RAS



* Re: [patch] printk subsystems
  2003-04-22 19:02 Perez-Gonzalez, Inaky
@ 2003-04-22 19:03 ` H. Peter Anvin
  2003-04-22 21:52 ` Tom Zanussi
  1 sibling, 0 replies; 52+ messages in thread
From: H. Peter Anvin @ 2003-04-22 19:03 UTC (permalink / raw)
  To: Perez-Gonzalez, Inaky
  Cc: 'Tom Zanussi', 'karim@opersys.com',
	'Martin Hicks', 'Daniel Stekloff',
	'Patrick Mochel', 'Randy.Dunlap',
	'pavel@ucw.cz', 'jes@wildopensource.com',
	'linux-kernel@vger.kernel.org', 'wildos@sgi.com'

Please take me off this Cc: list.

	-hpa



* RE: [patch] printk subsystems
@ 2003-04-22 19:02 Perez-Gonzalez, Inaky
  2003-04-22 19:03 ` H. Peter Anvin
  2003-04-22 21:52 ` Tom Zanussi
  0 siblings, 2 replies; 52+ messages in thread
From: Perez-Gonzalez, Inaky @ 2003-04-22 19:02 UTC (permalink / raw)
  To: 'Tom Zanussi', Perez-Gonzalez, Inaky
  Cc: 'karim@opersys.com', 'Martin Hicks',
	'Daniel Stekloff', 'Patrick Mochel',
	'Randy.Dunlap', 'hpa@zytor.com',
	'pavel@ucw.cz', 'jes@wildopensource.com',
	'linux-kernel@vger.kernel.org', 'wildos@sgi.com'



> From: Tom Zanussi [mailto:zanussi@us.ibm.com]
> 
> In relayfs, the event can be generated directly into the space
> reserved for it - in fact this is exactly what LTT does.  There aren't
> two separate steps, one 'generating' the event and another copying it
> to the relayfs buffer, if that's what you mean.

In this case, what happens if the user space, through mmap, copies
while the message is half-baked (ie, from another CPU) ... won't it
be inconsistent?

> Well, I'm not sure I understand the details of kue all that well, so
> let me know if I'm missing something, but for kue events to really be
> self-contained, wouldn't the data need to be copied into the event
> unless the data structure containing them was guaranteed to exist
> until the event was disposed of one way or another?

Yes, you have to guarantee the existence of the event data structures
(the 'struct kue', the embedded 'struct kue_user' and the event data
itself); if they are embedded into another structure that will
disappear, you can choose to:

(a) recall the event [if it is recallable or makes sense to do so]
(b) dynamically allocate the event header and data, generate it 
    into that dynamic space.
(c) dynamically allocate and copy [slow]

(this works now; however, once I finish the destructor code, it
will give you the flexibility to use other stuff than just kmalloc()).

You can play many tricks here, but that depends on your needs,
requirements and similar stuff.
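
Options (a) and (b) might look roughly like this in plain C. This is a
sketch with made-up names (struct event, struct device, event_alloc(),
event_put() and so on), with malloc()/free() standing in for kernel
allocators; it does not reproduce the real 'struct kue' or its destructor
interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical event header; NOT the real 'struct kue' layout. */
struct event {
	void (*destroy)(struct event *ev);	/* NULL: storage owned elsewhere */
	size_t len;
	char *data;
};

/* Option (a): the event is embedded in a longer-lived object. No allocation,
 * but the owner must recall the event before the object goes away. */
struct device {
	struct event plug_event;
	char plug_msg[32];
};

static void device_init_event(struct device *dev, const char *msg)
{
	strncpy(dev->plug_msg, msg, sizeof(dev->plug_msg) - 1);
	dev->plug_msg[sizeof(dev->plug_msg) - 1] = '\0';
	dev->plug_event.destroy = NULL;
	dev->plug_event.len = strlen(dev->plug_msg) + 1;
	dev->plug_event.data = dev->plug_msg;
}

/* Option (b): header and payload share one dynamic allocation, and the
 * payload is generated straight into it (no second copy). */
static void event_free(struct event *ev)
{
	free(ev);
}

static struct event *event_alloc(size_t len)
{
	struct event *ev = malloc(sizeof(*ev) + len);

	if (!ev)
		return NULL;
	ev->destroy = event_free;
	ev->len = len;
	ev->data = (char *)(ev + 1);	/* payload follows the header */
	return ev;
}

/* Called once the event has been delivered or recalled. */
static void event_put(struct event *ev)
{
	if (ev->destroy)
		ev->destroy(ev);
}
```

Option (c) would be event_alloc() followed by a memcpy() of data that was
generated somewhere else first, which is where the extra copy comes from.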

> backing data structure was guaranteed to exist, wouldn't it need to be
> static (unchanging) for the data to mean the same thing when it was
> delivered as when it was logged?  If the data needs to be copied to

Of course; I am assuming the client knows that (this is a must for
static allocation, similar in problem to when you give out a string
from inside your code). If the client is not willing to do that, it
has to dynamically allocate and provide a destructor.

> the event to make it self-contained, then you're actually doing 2
> copies per event - the first to the event, the second the
> copy_to_user().

That I do in the (c) case; not in the (b) case. In most situations
you shall be able to choose how to do it [and I guess most sensible
people would choose (b)].

> By contrast, relayfs is worst-case (and best-case) single copy
> into the relayfs buffer, which is allocated/freed once during its
> lifetime.

Sure; I'd just like to know how you are maintaining consistency for
the mmap stuff.


Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own
(and my fault)


* RE: [patch] printk subsystems
@ 2003-04-22 18:46 Perez-Gonzalez, Inaky
  2003-04-22 23:28 ` Karim Yaghmour
  0 siblings, 1 reply; 52+ messages in thread
From: Perez-Gonzalez, Inaky @ 2003-04-22 18:46 UTC (permalink / raw)
  To: 'karim@opersys.com', Perez-Gonzalez, Inaky
  Cc: 'Martin Hicks', 'Daniel Stekloff',
	'Patrick Mochel', 'Randy.Dunlap',
	'hpa@zytor.com', 'pavel@ucw.cz',
	'jes@wildopensource.com',
	'linux-kernel@vger.kernel.org', 'wildos@sgi.com',
	'Tom Zanussi'

> From: Karim Yaghmour [mailto:karim@opersys.com]
> 
> "Perez-Gonzalez, Inaky" wrote:
> > However, in relayfs that problem is shifted, unless I am missing
> > something. For what I know, so far, is that you have to copy the
> > message to the relayfs buffer, right? So you have to generate the
> > message and then copy it to the channel with relay_write(). So
> > here is kue's copy_to_user() counterpart.
> 
> OK, so you are claiming that memcpy() == copy_to_user()?

Not ==, although you cannot deny that they do basically the same thing:
copy memory.

copy_to_user() has to do some more gymnastics in the process, 
but basically, the bulk is the same [at least by reading the 
asm of __copy_user() in usercopy.c and __memcpy() in string.h 
-- it is kind of different, but in function is/should
be the same - bar that copy_to_user() might sleep due to 
paging-in and preemption and who knows what else].
 
> [If nothing else, please keep in mind that the memcpy() in question
> is to an rvmalloc'ed buffer.]

Good issue for caching ...

> Maybe I'm just old fashioned, but I usually want to provide a
> logging function with a pointer and a size parameter, and I want
> whatever I'm passing to it be placed in a secure repository where
> my own code couldn't touch it even if it went berserk.

That is a good point, and it brought me last night to the following
question: how do you guarantee the integrity of the data when reading
with mmap? In other words, if I am just copying the mmap region, how do
I know that what I am copying is safe enough, that it is not being
modified by CPU #2, for example? (because user space and kernel space
will not share the locks; at most, user space can look at a couple of
markers that identify the bottom and the top of the "safe" buffer,
but there is no way to get both of them atomically). Also, if it
is a circular buffer, is there a way for user space to know
when it wrapped around? I still don't get how mmap works all this
out (or is the buffer being moved under user space's feet?)
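
For illustration only: one generic way a reader can notice that a
circular buffer wrapped underneath it is to have the writer publish a
single, ever-increasing byte count and to re-check that count after
copying. This is a sketch of that general technique, not a description
of what relayfs does; 'struct ring', ring_put() and ring_read() are
made-up names, and a real concurrent version would also need the data
accesses themselves to tolerate racing with the writer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define RING_SIZE 8u   /* power of two so the index mask works */

struct ring {
    _Atomic size_t written;        /* total bytes ever written */
    unsigned char buf[RING_SIZE];
};

/* Writer: store one byte, then publish the new total. */
static void ring_put(struct ring *r, unsigned char c)
{
    size_t w = atomic_load_explicit(&r->written, memory_order_relaxed);

    r->buf[w & (RING_SIZE - 1)] = c;
    atomic_store_explicit(&r->written, w + 1, memory_order_release);
}

/*
 * Reader: copy up to len bytes starting at absolute position *pos.
 * Returns the number of bytes copied, or -1 if the writer may have
 * reused our slots in the meantime (the copy must then be discarded).
 */
static long ring_read(struct ring *r, size_t *pos, unsigned char *out,
                      size_t len)
{
    size_t end = atomic_load_explicit(&r->written, memory_order_acquire);
    size_t n = end - *pos;
    size_t i;

    if (n > len)
        n = len;
    for (i = 0; i < n; i++)
        out[i] = r->buf[(*pos + i) & (RING_SIZE - 1)];

    /* The slot for *pos is reused once written reaches *pos + RING_SIZE. */
    end = atomic_load_explicit(&r->written, memory_order_acquire);
    if (end - *pos >= RING_SIZE) {
        *pos = end - (RING_SIZE - 1);   /* resync to the oldest safe byte */
        return -1;
    }
    *pos += n;
    return (long)n;
}
```

The check is deliberately conservative: an exactly full buffer is
treated as overrun, because the writer could already be storing the
next byte into the oldest slot.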

> Again, you are making assumptions regarding the usage of your mechanism.
> With relayfs, dropping a channel (even one that has millions upon millions
> of events) requires one rvfree().

Well, we all have to make certain assumptions; otherwise we'd be
having philosophical discussions about the squareness of the circle
for ever and ever in a for(;;) loop.

> Sorry, you don't get to see b, c, g, h, and i because something
> changed in the system and whatever wanted to send those over isn't
> running anymore. Maybe I'm just off the chart, but I'd rather see
> the first list of events.

As I explained below, you don't _have_to_ drop it; however, in some
cases, it makes sense to drop it because it is meaningless anyway (ie
the device-plugged message - why would I want the userspace to check
it if I know there is no device - so I recall it). Errors are another
matter, and you don't want to recall those.

This is different from running out of space. Like it or not, if you 
have a circular buffer with limited space and you run out ... moc!
you lose, drop something somewhere to make space for it. This is not
a kue limitation, this is a property of buffers: they fill up.

Now, if you want to make it resizable, that understands Japanese and
does double loops followed by a nose dive and a vertical climb up, 
well, that's up to the client of the API. And I didn't want to
constrain the gymnastics that the client could do to handle a buffer.

> > However, there are two different concepts here. One is the event
> > that you want to send and recall it if not delivered by the time
> > it does not make sense anymore (think plug a device, then remove
> > it). The other one is the event you want delivered really badly
> > (think a "message" like "the temperature of the nuclear reactor's
> > core has reached the high watermark").
> 
> I'm sorry, but the way I see printk() is that once I send something
> to it, it really ought to show up somewhere. Heck, I'm printk'ing
> it. If I plugged a device and the driver said "Your chair is
> on fire", I want to know about it whether the device has been
> unplugged later or not.

I would say this case, printk(), would fit in my second example,
wouldn't it? ... this is one message you want delivered, not recalled.

> > As I mentioned before, this kind-of-compensates-but-not-really
> > with the fact of having to generate the message and then copy
> > it to the channel.
> 
> That's the memcpy() == copy_to_user() again.

Please note the "kind-of-compensates-but-not-really"; then refer
to my first paragraph. 

> Nevertheless, if you want to measure scalability alone, try
> porting LTT to kue, and try running LMbench and co. LTT is very
> demanding in terms of buffering (indeed, I'll go a step further and
> claim that it is the most demanding application in terms of
> buffering) and already runs on relayfs.

Got it, thanks :)

Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own
(and my fault)

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [patch] printk subsystems
  2003-04-22  4:02 Perez-Gonzalez, Inaky
  2003-04-22  5:52 ` Karim Yaghmour
@ 2003-04-22  6:04 ` Tom Zanussi
  1 sibling, 0 replies; 52+ messages in thread
From: Tom Zanussi @ 2003-04-22  6:04 UTC (permalink / raw)
  To: Perez-Gonzalez, Inaky
  Cc: 'karim@opersys.com', 'Martin Hicks',
	'Daniel Stekloff', 'Patrick Mochel',
	'Randy.Dunlap', 'hpa@zytor.com',
	'pavel@ucw.cz', 'jes@wildopensource.com',
	'linux-kernel@vger.kernel.org', 'wildos@sgi.com',
	'Tom Zanussi'

Perez-Gonzalez, Inaky writes:
 > 
 > > From: Karim Yaghmour [mailto:karim@opersys.com]
 > > 
 > > Consider the following:
 > > 1) kue_read() has a while(1) which loops around and delivers messages
 > > one-by-one (to the best of my understanding of the code you posted).
 > 
 > That's right.
 > 
 > > Hence, delivery time increases with the number of events. In contrast,
 > > relayfs can deliver tens of thousands of events in a single shot.
 > 
 > Sure, I agree with that - it is the price to pay for less code and
 > not having to copy on the kernel side. As I mentioned before to Tom,
 > I don't think that in the long run the overhead will be that much,
 > although it is true it will be very dependent on the average size of 
 > the messages and how many you deliver. If we are delivering two-byte
 > messages at a 1M/second rate, the efficiency goes down, and with
 > it the scalability.
 > 
 > However, in relayfs that problem is shifted, unless I am missing 
 > something. For what I know, so far, is that you have to copy the
 > message to the relayfs buffer, right? So you have to generate the
 > message and then copy it to the channel with relay_write(). So
 > here is kue's copy_to_user() counterpart.
 > 
 > If there were a way to reserve the space in relayfs, so that then
 > I can generate the message straight over there, that scalability
 > problem would be gone.
 > 

In relayfs, the event can be generated directly into the space
reserved for it - in fact this is exactly what LTT does.  There aren't
two separate steps, one 'generating' the event and another copying it
to the relayfs buffer, if that's what you mean.
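
A minimal sketch of that reserve-then-write pattern in plain C. The
names here ('struct chan', chan_reserve(), log_irq_event()) are
hypothetical and are not the relayfs API; the point is only that the
message is formatted directly into the reserved slot, with no staging
buffer and no second copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical channel: one flat buffer and a write offset. */
struct chan {
    char buf[256];
    size_t offset;
};

/* Reserve len bytes in the channel; returns NULL when it does not fit. */
static char *chan_reserve(struct chan *c, size_t len)
{
    char *slot;

    assert(c->offset <= sizeof(c->buf));
    if (len > sizeof(c->buf) - c->offset)
        return NULL;
    slot = c->buf + c->offset;
    c->offset += len;
    return slot;
}

/* Format the event straight into the reserved slot. */
static int log_irq_event(struct chan *c, int irq)
{
    /* First pass only measures the formatted length (plus the NUL). */
    size_t len = (size_t)snprintf(NULL, 0, "irq %d", irq) + 1;
    char *slot = chan_reserve(c, len);

    if (!slot)
        return -1;
    snprintf(slot, len, "irq %d", irq);
    return 0;
}
```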

[snip]
 > 
 > > > That's the difference. I don't intend to have that. The data
 > > > storage can be reused or not, that is up to the client of the
 > > > kernel API. They still can reuse it if needed by reclaiming the
 > > > event (recall_event), refilling the data and re-sending it.
 > > 
 > > Right, but by reusing the event, older data is thereby destroyed
 > > (undelivered). Which comes back to what I (and others) have been
 > > saying: kue requires the sender's data structures to exist until
 > > their content is delivered.
 > 
 > That's it, that is the principle of a circular buffer. AFAIK, you
 > have the same in relayfs. Old stuff gets dropped. Period. And the
 > sender's data structures don't really need to exist forever,
 > because the event is self-contained. The only part that might
 > be a problem is the destructor [if for example, you were a module,
 > the destructor code was in the module and the module unloaded 
 > without recalling pending events first - the kind of thing
 > you should never do; either use a non-modular destructor or make
 > sure the module count is not decreased until all the events have
 > been delivered or recalled ... simple to do too]
 > 

Well, I'm not sure I understand the details of kue all that well, so
let me know if I'm missing something, but for kue events to really be
self-contained, wouldn't the data need to be copied into the event
unless the data structure containing them was guaranteed to exist
until the event was disposed of one way or another?  And even if the
backing data structure was guaranteed to exist, wouldn't it need to be
static (unchanging) for the data to mean the same thing when it was
delivered as when it was logged?  If the data needs to be copied to
the event to make it self-contained, then you're actually doing 2
copies per event - the first to the event, the second the
copy_to_user().  I'm not sure how much would be necessary in normal
usage, but if you're doing a lot of kmalloc()/kfree() to contain the
events, that's another potentially large source of overhead.

By contrast, relayfs is worst-case (and best-case) single copy
into the relayfs buffer, which is allocated/freed once during its
lifetime.

-- 
Regards,

Tom Zanussi <zanussi@us.ibm.com>
IBM Linux Technology Center/RAS


^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [patch] printk subsystems
  2003-04-22  3:04 Perez-Gonzalez, Inaky
@ 2003-04-22  6:00 ` Tom Zanussi
  0 siblings, 0 replies; 52+ messages in thread
From: Tom Zanussi @ 2003-04-22  6:00 UTC (permalink / raw)
  To: Perez-Gonzalez, Inaky
  Cc: 'Tom Zanussi', 'karim@opersys.com',
	'Martin Hicks', 'Daniel Stekloff',
	'Patrick Mochel', 'Randy.Dunlap',
	'hpa@zytor.com', 'pavel@ucw.cz',
	'jes@wildopensource.com',
	'linux-kernel@vger.kernel.org', 'wildos@sgi.com'

Perez-Gonzalez, Inaky writes:
 > 
 > > From: Tom Zanussi [mailto:zanussi@us.ibm.com]
 > > 
 > > It seems to me that when comparing apples to apples, namely
 > > considering the complete lifecycle of an event, ... <snip>
 > > 
 > > While kue_send_event() in itself is very simple and efficient, it's
 > > only part of the story, the other parts being the copy_to_user() ...
 > 
 > Agreed - my mistake here in the comparison for leaving out that stuff.
 > 
 > > event.  While kue can avoid this kernel-side copy, it's not possible
 > > for it to avoid the copy_to_user() since its design precludes mmapping
 > > the kernel data.  Again, six of one, half dozen of another.  kue looks
 > 
 > Sure - those things, I would say, they compensate one another, 
 > except for that mmap() detail that pushes the balance towards relayfs
 > regarding effectiveness when delivering the messages; I think that
 > at the end the difference should not be too big as the copying of
 > the data in kue to user space should roughly compensate by the copying
 > of the data to the relayfs buffer; after all, a copy is a copy.
 > No data to back this claim though, I am just thinking a mental 
 > schematic of the lifetime of a bit in both systems out loud.
 > 

Right.  This is what I meant when I said the two were very similar
when considering the lifetime of a single event, ignoring everything
else such as bulk processing via mmap() vs. iterating through a list,
as discussed elsewhere.

 > Or, again, I am missing something ...
 > 
 > Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own
 > (and my fault)
 > 

-- 
Regards,

Tom Zanussi <zanussi@us.ibm.com>
IBM Linux Technology Center/RAS


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch] printk subsystems
  2003-04-22  4:02 Perez-Gonzalez, Inaky
@ 2003-04-22  5:52 ` Karim Yaghmour
  2003-04-22  6:04 ` Tom Zanussi
  1 sibling, 0 replies; 52+ messages in thread
From: Karim Yaghmour @ 2003-04-22  5:52 UTC (permalink / raw)
  To: Perez-Gonzalez, Inaky
  Cc: 'Martin Hicks', 'Daniel Stekloff',
	'Patrick Mochel', 'Randy.Dunlap',
	'hpa@zytor.com', 'pavel@ucw.cz',
	'jes@wildopensource.com',
	'linux-kernel@vger.kernel.org', 'wildos@sgi.com',
	'Tom Zanussi'


"Perez-Gonzalez, Inaky" wrote:
> However, in relayfs that problem is shifted, unless I am missing
> something. For what I know, so far, is that you have to copy the
> message to the relayfs buffer, right? So you have to generate the
> message and then copy it to the channel with relay_write(). So
> here is kue's copy_to_user() counterpart.

OK, so you are claiming that memcpy() == copy_to_user()?

[If nothing else, please keep in mind that the memcpy() in question
is to an rvmalloc'ed buffer.]

> If there were a way to reserve the space in relayfs, so that then
> I can generate the message straight over there, that scalability
> problem would be gone.

I don't see a scalability problem with relayfs.

Maybe I'm just old fashioned, but I usually want to provide a
logging function with a pointer and a size parameter, and I want
whatever I'm passing to it be placed in a secure repository where
my own code couldn't touch it even if it went berserk.

In other words, if I'm coding a driver and my code isn't stable
yet, I want the printks generated by my sane code to survive,
even if my insane code came later and destroyed all the data
structures I depend on. If printk still has to rely on my insane
code for its data, then I'm probably going to waste some time
finding my problem.

> > 2) by having to maintain next and prev pointers, kue consumes more
> > memory than relayfs (at least 8 bytes/message more actually, on a
> > 32-bit machine.) For large messages, the impact is negligible, but
> > the smaller the messages the bigger the overhead.
> 
> True; I would say most messages are going to be at least 30
> something bytes in length. I don't think there is like an
> estimated average of the messages size, right?

relayfs doesn't make any assumptions on this issue.

> > 3) by having to go through the next/prev pointers, accessing message
> > X requires reading all messages before it. This can be simplified
> 
> Not really, because access is sequential, one shot. Once you have
> read it, there it goes. So you always keep a pointer to the next
> msg that you are going to read.

You are assuming sequential reading, relayfs doesn't.

> > are used. [Other kue calls are also handicapped by similar problems,
> > such as the deletion of the entire list.]
> 
> Yes, this is hopelessly O(N), but creation or deletion of an entire
> list is not such a common thing (I would say that otherwise would
> imply some design considerations that should be revised in the
> system).

Again, you are making assumptions regarding the usage of your mechanism.
With relayfs, dropping a channel (even one that has millions upon millions
of events) requires one rvfree().

> That's it, that is the principle of a circular buffer. AFAIK, you
> have the same in relayfs. Old stuff gets dropped. Period. And the
> sender's data structures don't really need to exist forever,
> because the event is self-contained.

Sorry, but you're missing the point. There's a world of a difference
between a mechanism that has to drop the oldest data because there
are too many events occurring and a mechanism that has to drop messages
according to the changes to the system. You are not comparing the
same thing.

So say I have 10 events: a, b, c, d, e, f, g, h, i, j. Say that event
"j" is the one marking the occurrence of a catastrophic physical event.

With a circular buffer scheme, you may get something like:
f, g, h, i, j.

With kue, you may get:
a, d, e, f, j.

Sorry, you don't get to see b, c, g, h, and i because something
changed in the system and whatever wanted to send those over isn't
running anymore. Maybe I'm just off the chart, but I'd rather see
the first list of events.
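
To make the first list concrete, here is a small sketch of a
drop-oldest ring with room for five events. It only models the
"circular buffer scheme" outcome above (f, g, h, i, j); the names
'struct ring', ring_log() and ring_dump() are invented for the example
and do not correspond to relayfs or kue code.

```c
#include <assert.h>
#include <stddef.h>

#define CAP 5

/* Fixed-size ring that overwrites the oldest entry when full. */
struct ring {
    char slot[CAP];
    size_t total;   /* number of events ever logged */
};

static void ring_log(struct ring *r, char ev)
{
    r->slot[r->total % CAP] = ev;
    r->total++;
}

/* Copy the surviving events, oldest first; returns how many there are. */
static size_t ring_dump(const struct ring *r, char *out)
{
    size_t n = r->total < CAP ? r->total : CAP;
    size_t first = r->total - n;
    size_t i;

    assert(out != NULL);
    for (i = 0; i < n; i++)
        out[i] = r->slot[(first + i) % CAP];
    return n;
}
```

Logging 'a' through 'j' into this ring leaves the contiguous tail of
the sequence, which is the property being argued for: what is lost
depends only on volume, not on what happened to the sender.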

Plus, remember that relayfs has buffer-boundary notification callbacks.
If the subsystem being notified doesn't care enough to do something
about the data, there's really nothing relayfs can do about it. And
if there's just too much data for the channel, then:
1) Allocate a bigger channel.
2) Use the dynamic growth/shrinking code being worked on.

> However, there are two different concepts here. One is the event
> that you want to send and recall it if not delivered by the time
> it does not make sense anymore (think plug a device, then remove
> it). The other one is the event you want delivered really badly
> (think a "message" like "the temperature of the nuclear reactor's
> core has reached the high watermark").

I'm sorry, but the way I see printk() is that once I send something
to it, it really ought to show up somewhere. Heck, I'm printk'ing
it. If I plugged a device and the driver said "Your chair is
on fire", I want to know about it whether the device has been
unplugged later or not.

> > Right, but kue has to loop through the queue to deliver the messages
> > one-by-one. The more messages there are, the longer the delivery time.
> > Not to mention that you first have to copy it to user-space before
> > the reader can do write() to put it to permanent storage. With relayfs,
> > you just do write() and you're done.
> 
> As I mentioned before, this kind-of-compensates-but-not-really
> with the fact of having to generate the message and then copy
> it to the channel.

That's the memcpy() == copy_to_user() again.

> I think that at this point it'd be interesting to run something
> like a benchmark [once I finish the destructor code], however,
> it is going to be fairly difficult to test both implementations
> in similar grounds. Any ideas?

IMHO I don't think you can truly compare relayfs to kue, because
the basic concepts implemented in kue do not correspond to what is
needed of a generalized buffering mechanism for the kernel.
Nevertheless, if you want to measure scalability alone, try
porting LTT to kue, and try running LMbench and co. LTT is very
demanding in terms of buffering (indeed, I'll go a step further and
claim that it is the most demanding application in terms of
buffering) and already runs on relayfs.

Here are some benchmarks we've already run with the current buffering
code being used in relayfs:
http://marc.theaimsgroup.com/?l=linux-kernel&m=103573710926859&w=2

Karim

===================================================
                 Karim Yaghmour
               karim@opersys.com
      Embedded and Real-Time Linux Expert
===================================================

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [patch] printk subsystems
@ 2003-04-22  5:09 Perez-Gonzalez, Inaky
  2003-04-24 18:22 ` bob
  0 siblings, 1 reply; 52+ messages in thread
From: Perez-Gonzalez, Inaky @ 2003-04-22  5:09 UTC (permalink / raw)
  To: 'karim@opersys.com'
  Cc: 'Tom Zanussi', 'Martin Hicks',
	'Daniel Stekloff', 'Patrick Mochel',
	'Randy.Dunlap', 'hpa@zytor.com',
	'pavel@ucw.cz', 'jes@wildopensource.com',
	'linux-kernel@vger.kernel.org', 'wildos@sgi.com',
	'Robert Wisniewski'


> From: Karim Yaghmour [mailto:karim@opersys.com]
> 
> "Perez-Gonzalez, Inaky" wrote:
>
> > Not meaning to be a smartass here, but I don't buy the "lockless" tag,
> > I would agree it is an optimized-lock scheme ....
> >
> > Although it is not that important, no need to make a fuss out of that :)
> 
> I actually think this is important.

Don't get me wrong - I don't mean the actual difference is not important;
what I mean is not important is me buying the "lockless" tag or not. I 
actually think that the method you guys use is really sharp.

Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own
(and my fault)

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch] printk subsystems
  2003-04-22  2:49 Perez-Gonzalez, Inaky
@ 2003-04-22  4:34 ` Karim Yaghmour
  0 siblings, 0 replies; 52+ messages in thread
From: Karim Yaghmour @ 2003-04-22  4:34 UTC (permalink / raw)
  To: Perez-Gonzalez, Inaky
  Cc: 'Tom Zanussi', 'Martin Hicks',
	'Daniel Stekloff', 'Patrick Mochel',
	'Randy.Dunlap', 'hpa@zytor.com',
	'pavel@ucw.cz', 'jes@wildopensource.com',
	'linux-kernel@vger.kernel.org', 'wildos@sgi.com',
	Robert Wisniewski


"Perez-Gonzalez, Inaky" wrote:
> > From: Tom Zanussi
> > relayfs actually uses 2 mutually-exclusive schemes internally -
> > 'lockless' and 'locking', depending on the availability of a cmpxchg
> > instruction (lockless needs cmpxchg).  If the lockless scheme is being
> > used, relay_lock_channel() does no locking or irq disabling of any
> > kind i.e. it's basically a no-op in that case.
>
> So that means you are using cmpxchg to do the locking. I mean, not the
> "locking" itself, but a similar process to that of locking. I see.
> 
> However, isn't it almost the same as spinlocking? You are basically
> trying to "allocate" a channel idx with atomic cmpxchg; if it fails, you
> are retrying, spinning on the retry code until successful.
> 
> Not meaning to be a smartass here, but I don't buy the "lockless" tag,
> I would agree it is an optimized-lock scheme [assuming it works better
> than the spinlock case, that I am sure it does because if not you guys
> would have not gone through the process of implementing it], but it is
> not lockless.
> 
> Although it is not that important, no need to make a fuss out of that :)

I actually think this is important.

The meaning of "lockless" becomes quite clear when both relayfs logging
schemes are compared. In the locking scheme, one of the following must
be used:
local_irq_save()
spin_lock_irqsave()

[They "must" be used because the relay_write() function could be called
from within an interrupt handler and the only safe way to manipulate
buffers that are accessible in read-write both to interrupt handlers
and other code is to disable interrupts in one way or another.]

Both of these disable interrupts on the local processor (actually,
spin_lock_irqsave() has a local_irq_save() inside it.)

With the cmpxchg, there is no interrupt disabling whatsoever. The code tries
to allocate some space, and retries if it fails. The most likely reason
it may fail is in the case when an interrupt occurs and that interrupt's
handler tries and succeeds in allocating space in the buffer instead of
the interrupted code. To the best of my memory, the tests we've done show
that the code very rarely has to try more than two or three times.

While the code does the loop once or twice, however, the processor is
free to continue handling interrupts. None of the code instances is
actively spinning while waiting for another instance to relinquish a lock.
There is, indeed, no lock here to be acquired or to be waited on.
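
A userspace sketch of the retry loop described above, using C11
atomics in place of the kernel's cmpxchg. 'struct lbuf' and
lbuf_reserve() are hypothetical names, not relayfs code; the sketch
only shows the shape of the technique: read the index, try to swing it
forward, and retry if someone else (for instance an interrupt handler)
got there first.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define BUF_SIZE 4096u

/* Hypothetical buffer state: only the write index is shared. */
struct lbuf {
    _Atomic size_t index;
    char data[BUF_SIZE];
};

/*
 * Reserve len bytes without taking a lock or disabling interrupts.
 * Returns the start offset of the reservation, or (size_t)-1 if full.
 */
static size_t lbuf_reserve(struct lbuf *b, size_t len)
{
    size_t old = atomic_load(&b->index);

    do {
        assert(old <= BUF_SIZE);
        if (len > BUF_SIZE - old)
            return (size_t)-1;
        /* On failure, old is refreshed with the current index; retry. */
    } while (!atomic_compare_exchange_weak(&b->index, &old, old + len));

    return old;
}
```

A failed compare-exchange means another writer made progress, so
nobody is ever blocked waiting for a lock holder, which is the
distinction from a spinlock being drawn here.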

Karim

===================================================
                 Karim Yaghmour
               karim@opersys.com
      Embedded and Real-Time Linux Expert
===================================================

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [patch] printk subsystems
@ 2003-04-22  4:02 Perez-Gonzalez, Inaky
  2003-04-22  5:52 ` Karim Yaghmour
  2003-04-22  6:04 ` Tom Zanussi
  0 siblings, 2 replies; 52+ messages in thread
From: Perez-Gonzalez, Inaky @ 2003-04-22  4:02 UTC (permalink / raw)
  To: 'karim@opersys.com', Perez-Gonzalez, Inaky
  Cc: 'Martin Hicks', 'Daniel Stekloff',
	'Patrick Mochel', 'Randy.Dunlap',
	'hpa@zytor.com', 'pavel@ucw.cz',
	'jes@wildopensource.com',
	'linux-kernel@vger.kernel.org', 'wildos@sgi.com',
	'Tom Zanussi'


> From: Karim Yaghmour [mailto:karim@opersys.com]
> 
> Consider the following:
> 1) kue_read() has a while(1) which loops around and delivers messages
> one-by-one (to the best of my understanding of the code you posted).

That's right.

> Hence, delivery time increases with the number of events. In contrast,
> relayfs can deliver tens of thousands of events in a single shot.

Sure, I agree with that - it is the price to pay for less code and
not having to copy on the kernel side. As I mentioned before to Tom,
I don't think that in the long run the overhead will be that much,
although it is true it will be very dependent on the average size of 
the messages and how many you deliver. If we are delivering two-byte
messages at a 1M/second rate, the efficiency goes down, and with
it the scalability.

However, in relayfs that problem is shifted, unless I am missing 
something. For what I know, so far, is that you have to copy the
message to the relayfs buffer, right? So you have to generate the
message and then copy it to the channel with relay_write(). So
here is kue's copy_to_user() counterpart.

If there were a way to reserve the space in relayfs, so that then
I can generate the message straight over there, that scalability
problem would be gone.

> 2) by having to maintain next and prev pointers, kue consumes more
> memory than relayfs (at least 8 bytes/message more actually, on a
> 32-bit machine.) For large messages, the impact is negligible, but
> the smaller the messages the bigger the overhead.

True; I would say most messages are going to be at least 30 
something bytes in length. I don't think there is like an 
estimated average of the messages size, right?
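
As a back-of-the-envelope check of point 2, this small helper computes
what share of each queued message the two list pointers take up. The
8-byte figure is the one quoted above for a 32-bit machine; the helper
name link_overhead_pct() is made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Two 4-byte list pointers per message on a 32-bit machine. */
#define LINK_BYTES 8u

/* Share of each queued message taken by the links, in whole percent. */
static unsigned link_overhead_pct(size_t payload_bytes)
{
    size_t total = LINK_BYTES + payload_bytes;

    assert(total >= LINK_BYTES);   /* guard against size_t overflow */
    return (unsigned)(100u * LINK_BYTES / total);
}
```

With integer division this gives 80% for a 2-byte message, 21% for a
30-byte message and 0% (under 1%) for a 1024-byte message.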

> 3) by having to go through the next/prev pointers, accessing message
> X requires reading all messages before it. This can be simplified

Not really, because access is sequential, one shot. Once you have
read it, there it goes. So you always keep a pointer to the next
msg that you are going to read.

> are used. [Other kue calls are also handicapped by similar problems,
> such as the deletion of the entire list.]

Yes, this is hopelessly O(N), but creation or deletion of an entire
list is not such a common thing (I would say that otherwise would
imply some design considerations that should be revised in the
system). 

Same thing goes for the closure of a file descriptor, that has
to go over the list deciding who to send to the gallows. Need to
work on that though.

> > That's the difference. I don't intend to have that. The data
> > storage can be reused or not, that is up to the client of the
> > kernel API. They still can reuse it if needed by reclaiming the
> > event (recall_event), refilling the data and re-sending it.
> 
> Right, but by reusing the event, older data is thereby destroyed
> (undelivered). Which comes back to what I (and others) have been
> saying: kue requires the sender's data structures to exist until
> their content is delivered.

That's it, that is the principle of a circular buffer. AFAIK, you
have the same in relayfs. Old stuff gets dropped. Period. And the
sender's data structures don't really need to exist forever,
because the event is self-contained. The only part that might
be a problem is the destructor [if for example, you were a module,
the destructor code was in the module and the module unloaded 
without recalling pending events first - the kind of thing
you should never do; either use a non-modular destructor or make
sure the module count is not decreased until all the events have
been delivered or recalled ... simple to do too]

However, there are two different concepts here. One is the event
that you want to send and recall it if not delivered by the time
it does not make sense anymore (think plug a device, then remove
it). The other one is the event you want delivered really badly
(think a "message" like "the temperature of the nuclear reactor's
core has reached the high watermark").

First one is the kind of event you would want to plug into the
data structures (or not, you might also kmalloc() it) and then,
when it stops making sense, recall it.

Second one, you don't care what happens after you are not there.
The code just had to set it on its way home.

The key is that the client of the API can control that behavior.

> Right, but then you have 2 layers of buffering/queueing instead
> of a single one.

No, I just have one buffering/queuing, because I still had to 
generate the message to put it somewhere. And I generate it 
directly to what is going to be delivered.

> Right, but kue has to loop through the queue to deliver the messages
> one-by-one. The more messages there are, the longer the delivery time.
> Not to mention that you first have to copy it to user-space before
> the reader can do write() to put it to permanent storage. With relayfs,
> you just do write() and you're done.

As I mentioned before, this kind-of-compensates-but-not-really
with the fact of having to generate the message and then copy
it to the channel.

I think that at this point it'd be interesting to run something
like a benchmark [once I finish the destructor code], however,
it is going to be fairly difficult to test both implementations
in similar grounds. Any ideas?

Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own
(and my fault)

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [patch] printk subsystems
@ 2003-04-22  3:04 Perez-Gonzalez, Inaky
  2003-04-22  6:00 ` Tom Zanussi
  0 siblings, 1 reply; 52+ messages in thread
From: Perez-Gonzalez, Inaky @ 2003-04-22  3:04 UTC (permalink / raw)
  To: 'Tom Zanussi'
  Cc: 'karim@opersys.com', 'Martin Hicks',
	'Daniel Stekloff', 'Patrick Mochel',
	'Randy.Dunlap', 'hpa@zytor.com',
	'pavel@ucw.cz', 'jes@wildopensource.com',
	'linux-kernel@vger.kernel.org', 'wildos@sgi.com',
	'Tom Zanussi'


> From: Tom Zanussi [mailto:zanussi@us.ibm.com]
> 
> It seems to me that when comparing apples to apples, namely
> considering the complete lifecycle of an event, ... <snip>
> 
> While kue_send_event() in itself is very simple and efficient, it's
> only part of the story, the other parts being the copy_to_user() ...

Agreed - my mistake here in the comparison for leaving out that stuff.

> event.  While kue can avoid this kernel-side copy, it's not possible
> for it to avoid the copy_to_user() since its design precludes mmapping
> the kernel data.  Again, six of one, half dozen of another.  kue looks

Sure - those things, I would say, they compensate one another, 
except for that mmap() detail that pushes the balance towards relayfs
regarding effectiveness when delivering the messages; I think that
at the end the difference should not be too big as the copying of
the data in kue to user space should be roughly compensated by the copying
of the data to the relayfs buffer; after all, a copy is a copy.
No data to back this claim though, I am just thinking a mental 
schematic of the lifetime of a bit in both systems out loud.

Or, again, I am missing something ...

Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own
(and my fault)

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [patch] printk subsystems
@ 2003-04-22  2:49 Perez-Gonzalez, Inaky
  2003-04-22  4:34 ` Karim Yaghmour
  0 siblings, 1 reply; 52+ messages in thread
From: Perez-Gonzalez, Inaky @ 2003-04-22  2:49 UTC (permalink / raw)
  To: 'Tom Zanussi'
  Cc: 'karim@opersys.com', 'Martin Hicks',
	'Daniel Stekloff', 'Patrick Mochel',
	'Randy.Dunlap', 'hpa@zytor.com',
	'pavel@ucw.cz', 'jes@wildopensource.com',
	'linux-kernel@vger.kernel.org', 'wildos@sgi.com'

 
> From: Tom Zanussi
>
> relayfs actually uses 2 mutually-exclusive schemes internally -
> 'lockless' and 'locking', depending on the availability of a cmpxchg
> instruction (lockless needs cmpxchg).  If the lockless scheme is being
> used, relay_lock_channel() does no locking or irq disabling of any
> kind i.e. it's basically a no-op in that case.  

So that means you are using cmpxchg to do the locking. I mean, not the
"locking" itself, but a similar process to that of locking. I see. 

However, isn't it almost the same as spinlocking? You are basically
trying to "allocate" a channel idx with atomic cmpxchg; if it fails, you
are retrying, spinning on the retry code until successful.

Not meaning to be a smartass here, but I don't buy the "lockless" tag.
I would agree it is an optimized-lock scheme [assuming it works better
than the spinlock case, which I am sure it does, because otherwise you guys
would not have gone through the process of implementing it], but it is
not lockless.

Although it is not that important, no need to make a fuss out of that :)
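
For what it's worth, the reserve-and-retry pattern being discussed can
be sketched in userspace with C11 atomics. This is only an illustration
and not the relayfs code: struct channel, reserve_slot() and BUF_SIZE
are made-up names. The one real difference from a spinlock is that no
writer ever owns anything, so a writer that gets interrupted half-way
cannot hold the others up; a failed compare-exchange only means that
some other writer made progress.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define BUF_SIZE 4096u

/* Hypothetical channel: a flat buffer plus an atomically updated index. */
struct channel {
	_Atomic size_t idx;            /* next free byte offset */
	unsigned char buf[BUF_SIZE];
};

/*
 * Reserve len bytes in the channel buffer.
 * Returns the start offset of the reservation, or (size_t)-1 if it
 * does not fit. No lock is taken: if the compare-exchange fails, another
 * writer moved the index first and we retry from the value it left.
 */
static size_t reserve_slot(struct channel *ch, size_t len)
{
	size_t old;

	assert(ch != NULL);
	old = atomic_load(&ch->idx);
	for (;;) {
		if (len > BUF_SIZE - old)
			return (size_t)-1;
		/* On failure, old is refreshed with the current index. */
		if (atomic_compare_exchange_weak(&ch->idx, &old, old + len))
			return old;
	}
}
```

Each successful call hands back a private [offset, offset + len) range
that the caller can fill with memcpy() without further coordination.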

Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own
(and my fault)

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [patch] printk subsystems
@ 2003-04-21 18:42 Perez-Gonzalez, Inaky
  0 siblings, 0 replies; 52+ messages in thread
From: Perez-Gonzalez, Inaky @ 2003-04-21 18:42 UTC (permalink / raw)
  To: 'H. Peter Anvin', Perez-Gonzalez, Inaky
  Cc: 'Greg KH', 'karim@opersys.com',
	'Martin Hicks', 'Daniel Stekloff',
	'Patrick Mochel', 'Randy.Dunlap',
	'pavel@ucw.cz', 'jes@wildopensource.com',
	'linux-kernel@vger.kernel.org', 'wildos@sgi.com',
	'Tom Zanussi'

> From: H. Peter Anvin [mailto:hpa@zytor.com]
> 
> Perez-Gonzalez, Inaky wrote:
> >
> > Hey! Come on! You don't think I am that lame, do you? Man what
> > a fame I do have!
> >
> > Before the device vaporizes, it recalls the message, so there is
> > no message to read - the same way you take away the sysfs data from
> > the sysfs tree ...
> 
> If you think that will happen with printk(), then, quite frankly, you're
> seriously deluded.

I am kind of deluded, that's for sure. And sore too, but that's another one.

I tend to agree with you; however, it can be done. You would need to adapt
a circular buffer to work with kue. Not a big deal though - just a space
allocator (that would recall the oldest messages if in need of space) and
the 'destructor' would just clear the space.

If I get to modify the code to make the destructor thing work (any of these
days), then it will be possible to do it without modifying kue at all.
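
A rough userspace sketch of that idea, under the assumption of
fixed-size slots (the kue patch itself is not shown here, so
struct msg_ring, ring_alloc() and ring_release_oldest() are
hypothetical names): the allocator hands out the next slot and, when
the ring is full, recalls the oldest message to make room; the
'destructor' is nothing more than releasing the oldest slot.

```c
#include <assert.h>
#include <string.h>

#define SLOTS    4
#define SLOT_LEN 32

/* Hypothetical ring of fixed-size message slots. */
struct msg_ring {
	char slot[SLOTS][SLOT_LEN];
	unsigned int head;      /* index of the oldest live message */
	unsigned int count;     /* number of live messages */
	unsigned int recalled;  /* messages dropped to make room */
};

/* 'Destructor': the oldest message is done with, free its slot. */
static void ring_release_oldest(struct msg_ring *r)
{
	assert(r->count > 0);
	r->head = (r->head + 1) % SLOTS;
	r->count--;
}

/* Allocate a slot for a new message, recalling the oldest one if full. */
static char *ring_alloc(struct msg_ring *r, const char *text)
{
	char *s;

	if (r->count == SLOTS) {
		ring_release_oldest(r);
		r->recalled++;
	}
	s = r->slot[(r->head + r->count) % SLOTS];
	strncpy(s, text, SLOT_LEN - 1);
	s[SLOT_LEN - 1] = '\0';
	r->count++;
	return s;
}
```

The per-message bookkeeping is small here only because the slots are
fixed-size; variable-length messages would need a real space allocator.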

Now that I think about it, it would work - but I don't think it'd be
really worth it (the per-message overhead would be big for printk,
I'd say). For the record, I really think relayfs could be a better
answer [from the limited reading I have done on it so far].

Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own
(and my fault)

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch] printk subsystems
  2003-04-17 21:03   ` Perez-Gonzalez, Inaky
                       ` (2 preceding siblings ...)
  2003-04-18  7:42     ` Greg KH
@ 2003-04-21 15:56     ` Karim Yaghmour
  3 siblings, 0 replies; 52+ messages in thread
From: Karim Yaghmour @ 2003-04-21 15:56 UTC (permalink / raw)
  To: Perez-Gonzalez, Inaky
  Cc: 'Martin Hicks', 'Daniel Stekloff',
	'Patrick Mochel', 'Randy.Dunlap',
	'hpa@zytor.com', 'pavel@ucw.cz',
	'jes@wildopensource.com',
	'linux-kernel@vger.kernel.org', 'wildos@sgi.com',
	'Tom Zanussi'


Others have addressed several points already, I just want to come
back to the scalability issues to make my point clear:

"Perez-Gonzalez, Inaky" wrote:
> Well, the total overhead for queuing an event is strictly O(1),
> bar the acquisition of the queue's semaphore in the middle [I
> still haven't had time to finish this and post it, btw]. I think it
> is pretty scalable assuming you don't have the whole system
> delivering to a single queue.

Consider the following:
1) kue_read() has a while(1) which loops around and delivers messages
one-by-one (to the best of my understanding of the code you posted).
Hence, delivery time increases with the number of events. In contrast,
relayfs can deliver tens of thousands of events in a single shot.

2) by having to maintain next and prev pointers, kue consumes more
memory than relayfs (at least 8 bytes/message more actually, on a
32-bit machine.) For large messages, the impact is negligible, but
the smaller the messages the bigger the overhead.

3) by having to go through the next/prev pointers, accessing message
X requires reading all messages before it. This can be simplified
with relayfs if: a) equal-sized messages are used, b) sub-buffers
are used. [Other kue calls are also handicapped by similar problems,
such as the deletion of the entire list.]
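
Points 2 and 3 can be made concrete with a small userspace sketch.
The names below (struct list_msg, list_nth(), buf_nth(),
link_overhead()) are made up for illustration and are not taken from
either patch; the flat buffer assumes equal-sized records, as in
case a) above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list-queued message: payload plus the two link pointers. */
struct list_msg {
	struct list_msg *next, *prev;
	unsigned int payload;
};

/* Reaching message x in a list means following x links from the head. */
static struct list_msg *list_nth(struct list_msg *head, size_t x)
{
	while (x-- > 0 && head)
		head = head->next;
	return head;
}

/* With equal-sized records in a flat buffer, message x is arithmetic. */
static unsigned int *buf_nth(unsigned int *buf, size_t x)
{
	assert(buf != NULL);
	return buf + x;
}

/* Bytes spent on links alone per message: 8 on a 32-bit machine. */
static size_t link_overhead(void)
{
	return 2 * sizeof(struct list_msg *);
}
```

For a 4-byte payload the two pointers alone are twice the size of the
data on a 32-bit machine, which is the small-message case described in
point 2.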

> > Also, at that rate, you simply can't wait on the reader to read
> > events one-by-one until you can reuse the structure where you
> > stored the data to be read.
> 
> That's the difference. I don't intend to have that. The data
> storage can be reused or not, that is up to the client of the
> kernel API. They still can reuse it if needed by reclaiming the
> event (recall_event), refilling the data and re-sending it.

Right, but by reusing the event, older data is thereby destroyed
(undelivered). Which comes back to what I (and others) have been
saying: kue requires the sender's data structures to exist until
their content is delivered.

> That's where the send-and-forget method helps: provide a
> destructor [will replace the 'flags' field - have it cooking
> on my CVS] that will be called once the event is delivered
> to all parties [if not NULL]. Then you can implement your
> own recovery system using a circular buffer, or kmalloc or
> whatever you wish.

Right, but then you have 2 layers of buffering/queuing instead
of a single one.

> > relayfs) and the reader has to read events by the thousand every
> > time.
> 
> The reader can do that, in user space; as many events as
> fit into the reader-provided buffer will be delivered.

Right, but kue has to loop through the queue to deliver the messages
one-by-one. The more messages there are, the longer the delivery time.
Not to mention that you first have to copy it to user-space before
the reader can do write() to put it to permanent storage. With relayfs,
you just do write() and you're done.

Cheers,

Karim

===================================================
                 Karim Yaghmour
               karim@opersys.com
      Embedded and Real-Time Linux Expert
===================================================

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch] printk subsystems
  2003-04-17 21:03   ` Perez-Gonzalez, Inaky
  2003-04-17 21:37     ` Tom Zanussi
  2003-04-18  7:21     ` Tom Zanussi
@ 2003-04-18  7:42     ` Greg KH
  2003-04-21 15:56     ` Karim Yaghmour
  3 siblings, 0 replies; 52+ messages in thread
From: Greg KH @ 2003-04-18  7:42 UTC (permalink / raw)
  To: Perez-Gonzalez, Inaky
  Cc: 'karim@opersys.com', 'Martin Hicks',
	'Daniel Stekloff', 'Patrick Mochel',
	'Randy.Dunlap', 'hpa@zytor.com',
	'pavel@ucw.cz', 'jes@wildopensource.com',
	'linux-kernel@vger.kernel.org', 'wildos@sgi.com',
	'Tom Zanussi'

On Thu, Apr 17, 2003 at 02:03:47PM -0700, Perez-Gonzalez, Inaky wrote:
> 
> Yep, that is the point, and it is small enough (5 ulongs) that 
> it can be embedded anywhere without being of high impact and 
> having to allocate it [first example that comes to mind is
> for sending a device connection message; you can embed a short
> message in the device structure and query that for delivery;
> no buffer, no nothing, the data straight from the source].

And the device is removed from the system, the memory for that device is
freed, and then a user comes along and tries to read that message.

oops...  :)

greg k-h

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [patch] printk subsystems
  2003-04-17 21:03   ` Perez-Gonzalez, Inaky
  2003-04-17 21:37     ` Tom Zanussi
@ 2003-04-18  7:21     ` Tom Zanussi
  2003-04-18  7:42     ` Greg KH
  2003-04-21 15:56     ` Karim Yaghmour
  3 siblings, 0 replies; 52+ messages in thread
From: Tom Zanussi @ 2003-04-18  7:21 UTC (permalink / raw)
  To: Perez-Gonzalez, Inaky
  Cc: 'karim@opersys.com', 'Martin Hicks',
	'Daniel Stekloff', 'Patrick Mochel',
	'Randy.Dunlap', 'hpa@zytor.com',
	'pavel@ucw.cz', 'jes@wildopensource.com',
	'linux-kernel@vger.kernel.org', 'wildos@sgi.com',
	'Tom Zanussi'

Perez-Gonzalez, Inaky writes:
 > 
 > 
 > 
 > Well, the total overhead for queuing an event is strictly O(1),
 > bar the acquisition of the queue's semaphore in the middle [I
 > still haven't had time to finish this and post it, btw]. I think it
 > is pretty scalable assuming you don't have the whole system 
 > delivering to a single queue.
 > 
 > Total is four lines if I unfold __kue_queue(), and the list_add_tail()
 > is not that complex. That's versus relay_write(), which I think is the
 > equivalent function [bar the extra goodies] and is more complex
 > [disclaimer: this is just looking over the 030317 patch's shoulder,
 > I am in kind of a rush - feel free to correct me here].
 > 

It seems to me that when comparing apples to apples, namely
considering the complete lifecycle of an event, kue and relayfs are
very similar wrt performance and memory usage; whether kue is
scalable or not I couldn't say, but we've previously published
benchmarks for LTT on this list showing that the relayfs logging code
(the same as that used by LTT) scales very well to logging millions
upon millions of events with low overhead.  

While kue_send_event() in itself is very simple and efficient, it's
only part of the story, the other parts being the copy_to_user() that
must be done to get each event to user space and the subsequent
bookkeeping necessary to remove it from the queue and make destructor
calls.  Only if we include all of the above is relayfs' relay_write()
equivalent - once relay_write() returns, that's the end of the story
as far as that event is concerned - at that point the data is directly
available to a client that has the buffer mmapped, and nothing more
remains to be done.  So yes, relay_write() is more complex code-wise
because it's doing more.  As far as algorithmic complexity goes, the
time to log an event via relay_write() is also pretty much constant,
the only variables being that it may take more than one iteration to
reserve a slot in case of a reserve collision with another writer,
which should happen fairly rarely, and the fact that if a given event
is the last event in a buffer, the end-of-buffer slow path is
triggered, which is also relatively speaking a rare occurrence.
Actually, the time it takes to memcpy the event into the relayfs
buffer should also be factored in, as it depends on the size of the
event.  While kue can avoid this kernel-side copy, it's not possible
for it to avoid the copy_to_user() since its design precludes mmapping
the kernel data.  Again, six of one, half dozen of another.  kue looks
like a nice elegant way of logging small bits of data and I'm sure it
has its advantages, though I think the same thing could be
accomplished in a slightly different way with a relayfs channel.

Anyway, to address the original topic, I'm working on a drop-in
replacement of printk that replaces the static printk buffer with a
dynamically resizeable relayfs channel (a new relayfs capability that
will be available to all relayfs clients).  In addition to being
resizeable manually (probably via commands to the syslog system call),
it will also have an 'auto-resize' capability that allows the printk
channel to adapt to printk traffic levels - increase as necessary when
an overflow condition is detected, and fall back to a more reasonable
level when the excess capacity is no longer needed.  Init-time printks
will still use the static printk buffer, but because the static buffer
is marked as __initdata, it can be made large enough to handle lots of
init-time data, all of which is atomically copied over to the
dynamic relayfs channel before init data is discarded.  Once klogd has
logged all the init data then present in the temporarily enlarged
relay channel, the channel would then resize itself to a normal
working size.  Hopefully this will solve the problem of lost printks
both at boot-time and during normal operation and isn't a stopgap
measure.
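
As a sketch only, an auto-resize policy of this kind could look like
the function below. Nothing here is taken from the relayfs patch:
next_size(), the 3/4 and 1/8 water marks and the MIN_SIZE/MAX_SIZE
bounds are all assumptions picked for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_SIZE (16u * 1024u)
#define MAX_SIZE (1024u * 1024u)

/*
 * Hypothetical policy: double the buffer when usage crosses the
 * high-water mark (3/4 full), halve it when usage drops below the
 * low-water mark (1/8 full). Returns the size to use next.
 */
static size_t next_size(size_t size, size_t used)
{
	assert(used <= size);
	if (used > size / 4 * 3 && size < MAX_SIZE)
		return size * 2;
	if (used < size / 8 && size > MIN_SIZE)
		return size / 2;
	return size;
}
```

The gap between the two marks is deliberate: after halving, usage is
below 1/4 of the new size, so the buffer does not immediately grow
again.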

-- 
Regards,

Tom Zanussi <zanussi@us.ibm.com>
IBM Linux Technology Center/RAS


^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [patch] printk subsystems
  2003-04-17 21:03   ` Perez-Gonzalez, Inaky
@ 2003-04-17 21:37     ` Tom Zanussi
  2003-04-18  7:21     ` Tom Zanussi
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 52+ messages in thread
From: Tom Zanussi @ 2003-04-17 21:37 UTC (permalink / raw)
  To: Perez-Gonzalez, Inaky
  Cc: 'karim@opersys.com', 'Martin Hicks',
	'Daniel Stekloff', 'Patrick Mochel',
	'Randy.Dunlap', 'hpa@zytor.com',
	'pavel@ucw.cz', 'jes@wildopensource.com',
	'linux-kernel@vger.kernel.org', 'wildos@sgi.com',
	'Tom Zanussi'

Hi,

Perez-Gonzalez, Inaky writes:
 > > 
 > > relayfs is there to solve the data transfer problems for the most
 > > demanding of applications. Sending a few messages here and there
 > > isn't really a problem. Sending messages/events/what-you-want-to-call-it
 > > by the thousand every second, while using as little locking as possible
 > > (lockless-logging is implemented in the case of relayfs' buffer handling
 > > routines), and providing per-cpu buffering requires a different beast.
 > 
 > Well, you are doing an IRQ lock (relay_lock_channel()), so it is not
 > lockless. Or am I missing anything here? Please let me know, I am
 > really interested in how to reduce locking for logging to the
 > minimum.

relayfs actually uses 2 mutually-exclusive schemes internally -
'lockless' and 'locking', depending on the availability of a cmpxchg
instruction (lockless needs cmpxchg).  If the lockless scheme is being
used, relay_lock_channel() does no locking or irq disabling of any
kind i.e. it's basically a no-op in that case.  It's only when the
'locking' scheme is in use that relay_lock_channel() does locking/irq
disabling.  Normally the lockless scheme would be in use - the locking
scheme is there mainly as a fallback, so normally relay_lock_channel()
would indeed cause no locking.

-- 
Regards,

Tom Zanussi <zanussi@us.ibm.com>
IBM Linux Technology Center/RAS


^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [patch] printk subsystems
@ 2003-04-17 21:03   ` Perez-Gonzalez, Inaky
  2003-04-17 21:37     ` Tom Zanussi
                       ` (3 more replies)
  0 siblings, 4 replies; 52+ messages in thread
From: Perez-Gonzalez, Inaky @ 2003-04-17 21:03 UTC (permalink / raw)
  To: 'karim@opersys.com'
  Cc: 'Martin Hicks', 'Daniel Stekloff',
	'Patrick Mochel', 'Randy.Dunlap',
	'hpa@zytor.com', 'pavel@ucw.cz',
	'jes@wildopensource.com',
	'linux-kernel@vger.kernel.org', 'wildos@sgi.com',
	'Tom Zanussi'



> From: Karim Yaghmour [mailto:karim@opersys.com]
> 
> "Perez-Gonzalez, Inaky" wrote:
> > But you don't need to provide buffers, because normally the data
> > is already in the kernel, so why need to copy it to another buffer
> > for delivery?
> 
> There is no copying going on. As with kue, you have to have a
> packaged structure somewhere to send to the recipient. As per
> your code:
> +       _m4 = kmalloc (sizeof (*_m4), GFP_KERNEL);
> +       memcpy (_m4, &m4, sizeof (m4));
> +       _m4->kue.flags = KUE_KFREE;
> +       kue_send_event (&_m4->kue);
> 
> _m4 and m4 are placeholders that must exist before being queued,
> there's just no way around that. 

Yep, that is the point, and it is small enough (5 ulongs) that 
it can be embedded anywhere without being of high impact and 
having to allocate it [first example that comes to mind is
for sending a device connection message; you can embed a short
message in the device structure and query that for delivery;
no buffer, no nothing, the data straight from the source].

I didn't want to use buffers for all the reasons people have
laid out. They involve allocation of space, somehow [inside
of the buffer for example] and there is a time when you
have to start dropping things. With kue you can avoid that
when your messages are embedded in your data structs [provided
you keep them small; if they are huge, well, you lose -
that is a conceptual limitation].

> When the channel buffer is mmap'ed in the user-process' address space,
> all that is needed is a write() with a pointer to the buffer for it
> to go to storage. There is zero-copying going on here.

That's a nice thing of your approach; kue cannot do mmap().

> Plus, kue uses lists with next & prev pointers. That simply won't
> scale if you have a buffer filling at the rate of 10,000 events/s.

Well, the total overhead for queuing an event is strictly O(1),
bar the acquisition of the queue's semaphore in the middle [I
still haven't had time to finish this and post it, btw]. I think it
is pretty scalable assuming you don't have the whole system 
delivering to a single queue.

Total is four lines if I unfold __kue_queue(), and the list_add_tail()
is not that complex. That's versus relay_write(), which I think is the
equivalent function [bar the extra goodies] and is more complex
[disclaimer: this is just looking over the 030317 patch's shoulder,
I am in kind of a rush - feel free to correct me here].

> Also, at that rate, you simply can't wait on the reader to read
> events one-by-one until you can reuse the structure where you
> stored the data to be read.

That's the difference. I don't intend to have that. The data 
storage can be reused or not, that is up to the client of the
kernel API. They still can reuse it if needed by reclaiming the
event (recall_event), refilling the data and re-sending it.

That's where the send-and-forget method helps: provide a 
destructor [will replace the 'flags' field - have it cooking
on my CVS] that will be called once the event is delivered 
to all parties [if not NULL]. Then you can implement your 
own recovery system using a circular buffer, or kmalloc or
whatever you wish.
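
To illustrate the send-and-forget idea, here is a minimal userspace
sketch. It is not the kue code (the destructor field is still in the
works, as said above); struct event, struct queue, queue_send(),
queue_deliver_all() and free_event() are hypothetical names, and
"delivery" is reduced to dequeuing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical event header, embedded in the sender's own structure. */
struct event {
	struct event *next;
	void (*destructor)(struct event *);   /* may be NULL */
};

struct queue {
	struct event *head, *tail;
};

/* Queue an event for delivery; O(1), no allocation. */
static void queue_send(struct queue *q, struct event *ev)
{
	assert(ev != NULL);
	ev->next = NULL;
	if (q->tail)
		q->tail->next = ev;
	else
		q->head = ev;
	q->tail = ev;
}

/* Deliver (here: just dequeue) every event; returns how many. */
static int queue_deliver_all(struct queue *q)
{
	struct event *ev;
	int n = 0;

	while ((ev = q->head) != NULL) {
		q->head = ev->next;
		if (!q->head)
			q->tail = NULL;
		if (ev->destructor)
			ev->destructor(ev);   /* ev is gone after this */
		n++;
	}
	return n;
}

/* Destructor for heap-allocated events: the kfree() case. */
static void free_event(struct event *ev)
{
	free(ev);
}
```

An event embedded in a longer-lived structure leaves the destructor
NULL and stays owned by its sender; a heap-allocated one sets it to
free_event() and can be forgotten after queue_send().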

> relayfs) and the reader has to read events by the thousand every
> time.

The reader can do that, in user space; as many events as
fit into the reader-provided buffer will be delivered.

> > This is where I think relayfs is doing too much, and that is the
> > reason why I implemented the kue stuff. It is very lightweight
> > and does almost the same [of course, it is not bidirectional, but
> > still nobody asked for that].
> 
> relayfs is there to solve the data transfer problems for the most
> demanding of applications. Sending a few messages here and there
> isn't really a problem. Sending messages/events/what-you-want-to-call-it
> by the thousand every second, while using as little locking as possible
> (lockless-logging is implemented in the case of relayfs' buffer handling
> routines), and providing per-cpu buffering requires a different beast.

Well, you are doing an IRQ lock (relay_lock_channel()), so it is not
lockless. Or am I missing anything here? Please let me know, I am
really interested in how to reduce locking for logging to the
minimum.

Thanks,

BTW: I am going to be out of town from five minutes from now until
Monday ... not that I don't want to keep reading :)

Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own
(and my fault)

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch] printk subsystems
  2003-04-17 19:58 Perez-Gonzalez, Inaky
@ 2003-04-17 20:34 ` Karim Yaghmour
  2003-04-17 21:03   ` Perez-Gonzalez, Inaky
  0 siblings, 1 reply; 52+ messages in thread
From: Karim Yaghmour @ 2003-04-17 20:34 UTC (permalink / raw)
  To: Perez-Gonzalez, Inaky
  Cc: 'Martin Hicks', 'Daniel Stekloff',
	'Patrick Mochel', 'Randy.Dunlap',
	'hpa@zytor.com', 'pavel@ucw.cz',
	'jes@wildopensource.com',
	'linux-kernel@vger.kernel.org', 'wildos@sgi.com',
	'Tom Zanussi'


"Perez-Gonzalez, Inaky" wrote:
> But you don't need to provide buffers, because normally the data
> is already in the kernel, so why need to copy it to another buffer
> for delivery?

There is no copying going on. As with kue, you have to have a
packaged structure somewhere to send to the recipient. As per
your code:
+       _m4 = kmalloc (sizeof (*_m4), GFP_KERNEL);
+       memcpy (_m4, &m4, sizeof (m4));
+       _m4->kue.flags = KUE_KFREE;
+       kue_send_event (&_m4->kue);

_m4 and m4 are placeholders that must exist before being queued,
there's just no way around that. With relayfs you would do:
relay_write(channel_id, &m4, , time_delta_offset);

When the channel buffer is mmap'ed in the user-process' address space,
all that is needed is a write() with a pointer to the buffer for it
to go to storage. There is zero-copying going on here.

Plus, kue uses lists with next & prev pointers. That simply won't
scale if you have a buffer filling at the rate of 10,000 events/s.
Also, at that rate, you simply can't wait on the reader to read
events one-by-one until you can reuse the structure where you
stored the data to be read. The data has to be secured in the buffer
at the return of the logging function (relay_write() in the case of
relayfs) and the reader has to read events by the thousand every
time.

> This is where I think relayfs is doing too much, and that is the
> reason why I implemented the kue stuff. It is very lightweight
> and does almost the same [of course, it is not bidirectional, but
> still nobody asked for that].

relayfs is there to solve the data transfer problems for the most
demanding of applications. Sending a few messages here and there
isn't really a problem. Sending messages/events/what-you-want-to-call-it
by the thousand every second, while using as little locking as possible
(lockless-logging is implemented in the case of relayfs' buffer handling
routines), and providing per-cpu buffering requires a different beast.

Karim

===================================================
                 Karim Yaghmour
               karim@opersys.com
      Embedded and Real-Time Linux Expert
===================================================

^ permalink raw reply	[flat|nested] 52+ messages in thread

* RE: [patch] printk subsystems
@ 2003-04-17 19:58 Perez-Gonzalez, Inaky
  2003-04-17 20:34 ` Karim Yaghmour
  0 siblings, 1 reply; 52+ messages in thread
From: Perez-Gonzalez, Inaky @ 2003-04-17 19:58 UTC (permalink / raw)
  To: 'karim@opersys.com', 'Martin Hicks'
  Cc: 'Daniel Stekloff', 'Patrick Mochel',
	'Randy.Dunlap', 'hpa@zytor.com',
	'pavel@ucw.cz', 'jes@wildopensource.com',
	'linux-kernel@vger.kernel.org', 'wildos@sgi.com',
	'Tom Zanussi'


> From: Karim Yaghmour [mailto:karim@opersys.com]
>
> I beg to differ. There's a point where we've got to stop saying "oh,
> this buffering mechanism is special and it requires its own code."
> relayfs is there to provide a unified light-weight mechanism for
> transferring large amounts of data from the kernel to user space.

But you don't need to provide buffers, because normally the data
is already in the kernel, so why need to copy it to another buffer
for delivery?

That's the point I tried to address with the kue patches I posted 
last week - once you have the data, wherever, you just queue it
for delivery, and provide the delivery subsystem for means to 
destroy it when it is delivered (and thus, not needed anymore)
[currently I only support kfree(), but I plan to add a destructor
function that at the same time can work as a callback for delivery].

This is where I think relayfs is doing too much, and that is the
reason why I implemented the kue stuff. It is very lightweight
and does almost the same [of course, it is not bidirectional, but
still nobody asked for that].

Cheers,

Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own
(and my fault)

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch] printk subsystems
  2003-04-16 12:43                         ` Daniel Stekloff
@ 2003-04-17 15:56                           ` Martin Hicks
  2003-04-17 13:58                             ` Karim Yaghmour
  0 siblings, 1 reply; 52+ messages in thread
From: Martin Hicks @ 2003-04-17 15:56 UTC (permalink / raw)
  To: Daniel Stekloff
  Cc: Martin Hicks, Patrick Mochel, Randy.Dunlap, hpa, pavel, jes,
	linux-kernel, wildos



On Wed, Apr 16, 2003 at 12:43:58PM +0000, Daniel Stekloff wrote:
> On Wednesday 16 April 2003 07:16 pm, Martin Hicks wrote:
> > On Wed, Apr 16, 2003 at 11:42:59AM -0700, Patrick Mochel wrote:
> > > > I like the idea of having logging levels, which include debug, defined
> > > > by subsystem. Each subsystem will have separate requirements for
> > > > logging. Networking, for instance, already has the NETIF_MSG* levels
> > > > defined in netdevice.h that can be set with Ethtool. I can see, for
> > > > example, having the msg_enable not in the private data as it is now but
> > > > in the subsystem or class structure for that device, such as in struct
> > > > net_device. This could easily be exported through sysfs.
> > >
> > > It would be nice. Unfortunately, it's only a nifty pipe-dream at the
> > > moment, unless some lucky volunteer would like to step forward. ;)
> >
> > I guess my question is this:
> >
> > Is the patch I posted useful enough to go into the kernel?  I think it
> > is.  It introduces very little overhead, and provides most of the
> > functionality that you guys are discussing.  It does use sysctl, and not
> > sysfs but does that really matter?
> 
> 
> I would rather not see the filtering applied to printk specifically like 
> you've done it. I think this is still another stop gap measure for buffer 
> overruns. I would like to see for:
> 
> 1) Buffer overruns - a mechanism that wouldn't hit a buffer overrun, say a 
> relayfs implementation of printk that could be easily configured in, or a 
> mechanism that knows/reports when an overrun has happened like the Linux event 
> logging project.

I don't think relayfs solves the problem either.  This just adds an
extra dependency for yet another pseudo-filesystem.  printk is something
that needs to "just work" even if the kernel is in the midst of
crashing.  Adding the extra complexity of all printk going out through a
filesystem/buffer layer is not desirable, IMHO.

It seems that the relayfs solution for buffer overflows in the printk 
buffer is to just make lots of buffers.  I really want to be able to
turn off printk logging for stuff I don't care about, without the
complexity of having fifteen different logs to look in and changing
how we get log info from the kernel to syslog.

> 
> 2) Message filtering - a mechanism above printk that allows filtering on the 
> fly and built into the new device model. Such a mechanism as Patrick 
> described that could be put into the dev_* macros in device.h. 

I haven't looked into these features too much.  Is every piece of
hardware in a machine considered a device?
i.e., can messages from CPU probing, Memory, NUMA nodes, etc. be
filtered separately while changing the logging level on these devices at
runtime?

The dev_* printk macros are all, of course, resolved at compile time.  How
does one control these printk's at runtime?

mh

-- 
Wild Open Source Inc.                  mort@wildopensource.com

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch] printk subsystems
  2003-04-17 15:56                           ` Martin Hicks
@ 2003-04-17 13:58                             ` Karim Yaghmour
  0 siblings, 0 replies; 52+ messages in thread
From: Karim Yaghmour @ 2003-04-17 13:58 UTC (permalink / raw)
  To: Martin Hicks
  Cc: Daniel Stekloff, Patrick Mochel, Randy.Dunlap, hpa, pavel, jes,
	linux-kernel, wildos, Tom Zanussi


Martin Hicks wrote:
> I don't think relayfs solves the problem either.  This just adds an
> extra dependency for yet another pseudo-filesystem.  printk is something
> that needs to "just work" even if the kernel is in the midst of
> crashing.  Adding the extra complexity of all printk going out through a
> filesystem/buffer layer is not desirable, IMHO.

I beg to differ. There's a point where we've got to stop saying "oh,
this buffering mechanism is special and it requires its own code."
relayfs is there to provide a unified light-weight mechanism for
transferring large amounts of data from the kernel to user space.

> It seems that the relayfs solution for buffer overflows in the printk
> buffer is to just make lots of buffers.  I really want to be able to
> turn off printk logging for stuff I don't care about, without the
> complexity of having fifteen different logs to look in and changing
> how we get log info from the kernel to syslog.

Again, as I said earlier, relayfs doesn't care about filtering. That's
for the upper layers to take care of. It so happens that relayfs simplifies
filtering by allowing the upper layers to mux their data using separate
channels. In no way is anyone forced to do that, though. It's there if
you need it, and if you simply need an is_this_message_logged()
function, then so be it, but that's yours to implement.

As for buffer overflows and printk, automatically resizeable log buffers
using a water-mark scheme are on the relayfs to-do list.

Karim

===================================================
                 Karim Yaghmour
               karim@opersys.com
      Embedded and Real-Time Linux Expert
===================================================

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch] printk subsystems
  2003-04-16 18:42                     ` Patrick Mochel
  2003-04-16 12:35                       ` Daniel Stekloff
@ 2003-04-16 19:16                       ` Martin Hicks
  2003-04-16 12:43                         ` Daniel Stekloff
  1 sibling, 1 reply; 52+ messages in thread
From: Martin Hicks @ 2003-04-16 19:16 UTC (permalink / raw)
  To: Patrick Mochel
  Cc: Daniel Stekloff, Randy.Dunlap, Martin Hicks, hpa, pavel, jes,
	linux-kernel, wildos



On Wed, Apr 16, 2003 at 11:42:59AM -0700, Patrick Mochel wrote:
> 
> > I like the idea of having logging levels, which include debug, defined by 
> > subsystem. Each subsystem will have separate requirements for logging. 
> > Networking, for instance, already has the NETIF_MSG* levels defined in 
> > netdevice.h that can be set with Ethtool. I can see, for example, having the 
> > msg_enable not in the private data as it is now but in the subsystem or class 
> > structure for that device, such as in struct net_device. This could easily be 
> > exported through sysfs. 
> 
> It would be nice. Unfortunately, it's only a nifty pipe-dream at the 
> moment, unless some lucky volunteer would like to step forward. ;)
> 


I guess my question is this:

Is the patch I posted useful enough to go into the kernel?  I think it
is.  It introduces very little overhead, and provides most of the
functionality that you guys are discussing.  It does use sysctl, and not
sysfs but does that really matter?

mh

-- 
Wild Open Source Inc.                  mort@wildopensource.com

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch] printk subsystems
  2003-04-14 22:33                   ` Daniel Stekloff
@ 2003-04-16 18:42                     ` Patrick Mochel
  2003-04-16 12:35                       ` Daniel Stekloff
  2003-04-16 19:16                       ` Martin Hicks
  0 siblings, 2 replies; 52+ messages in thread
From: Patrick Mochel @ 2003-04-16 18:42 UTC (permalink / raw)
  To: Daniel Stekloff
  Cc: Randy.Dunlap, Martin Hicks, hpa, pavel, jes, linux-kernel, wildos


> Would the debug level be for the entire subsystem? Do you think people would 
> like to be able to set debug/logging level per driver or device, and not just 
> subsystem? 

I can see a use for doing per-object debug levels, but I'd rather not add 
the overhead to every object, especially when it would be used by a small 
minority of the populace. 

Such a flag could easily be placed in the subsystem-specific object, and 
accessed through the logging/debugging wrappers.
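
A minimal userspace sketch of such a wrapper, assuming the threshold
lives in a subsystem-specific object. struct subsys and subsys_log()
are made-up names, printf() stands in for printk(), and the numeric
levels follow the kernel's KERN_* ordering (lower is more severe).

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Same ordering as the kernel's KERN_* levels: lower is more severe. */
enum { LVL_ERR = 3, LVL_WARNING = 4, LVL_INFO = 6, LVL_DEBUG = 7 };

/* Hypothetical subsystem-specific object carrying its own threshold. */
struct subsys {
	const char *name;
	int loglevel;   /* messages with level <= loglevel are emitted */
};

/* Wrapper: returns 1 if the message was emitted, 0 if filtered out. */
static int subsys_log(const struct subsys *s, int level,
		      const char *fmt, ...)
{
	va_list ap;

	assert(s != NULL);
	if (level > s->loglevel)
		return 0;
	printf("<%d>%s: ", level, s->name);
	va_start(ap, fmt);
	vprintf(fmt, ap);
	va_end(ap);
	return 1;
}
```

The cost when a message is filtered out is one comparison, and the
threshold is a single int per subsystem rather than per object.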

> Is debugging level here the same as logging level? 

Yes. 

> I like the idea of having logging levels, which include debug, defined by 
> subsystem. Each subsystem will have separate requirements for logging. 
> Networking, for instance, already has the NETIF_MSG* levels defined in 
> netdevice.h that can be set with Ethtool. I can see, for example, having the 
> msg_enable not in the private data as it is now but in the subsystem or class 
> structure for that device, such as in struct net_device. This could easily be 
> exported through sysfs. 

It would be nice. Unfortunately, it's only a nifty pipe-dream at the 
moment, unless some lucky volunteer would like to step forward. ;)


	-pat


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch] printk subsystems
  2003-04-16 19:16                       ` Martin Hicks
@ 2003-04-16 12:43                         ` Daniel Stekloff
  2003-04-17 15:56                           ` Martin Hicks
  0 siblings, 1 reply; 52+ messages in thread
From: Daniel Stekloff @ 2003-04-16 12:43 UTC (permalink / raw)
  To: Martin Hicks, Patrick Mochel
  Cc: Randy.Dunlap, Martin Hicks, hpa, pavel, jes, linux-kernel, wildos

On Wednesday 16 April 2003 07:16 pm, Martin Hicks wrote:
> On Wed, Apr 16, 2003 at 11:42:59AM -0700, Patrick Mochel wrote:
> > > I like the idea of having logging levels, which include debug, defined
> > > by subsystem. Each subsystem will have separate requirements for
> > > logging. Networking, for instance, already has the NETIF_MSG* levels
> > > defined in netdevice.h that can be set with Ethtool. I can see, for
> > > example, having the msg_enable not in the private data as it is now but
> > > in the subsystem or class structure for that device, such as in struct
> > > net_device. This could easily be exported through sysfs.
> >
> > It would be nice. Unfortunately, it's only a nifty pipe-dream at the
> > moment, unless some lucky volunteer would like to step forward. ;)
>
> I guess my question is this:
>
> Is the patch I posted useful enough to go into the kernel?  I think it
> is.  It introduces very little overhead, and provides most of the
> functionality that you guys are discussing.  It does use sysctl, and not
> sysfs but does that really matter?


I would rather not see the filtering applied to printk specifically, the way
you've done it. I think this is still another stop-gap measure for buffer
overruns. Instead, I would like to see:

1) Buffer overruns - a mechanism that wouldn't hit a buffer overrun, say a 
relayfs implementation of printk that could be easily configured in, or a 
mechanism that knows/reports when an overrun has happened, like the Linux event
logging project.

For relayfs, please see: http://www.opersys.com/relayfs/index.html
For event logging, please see: http://evlog.sourceforge.net/

2) Message filtering - a mechanism above printk that allows filtering on the
fly and is built into the new device model, such as the one Patrick
described, which could be put into the dev_* macros in device.h.
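
To make point 1 a bit more concrete, here is a minimal user-space sketch of
a log buffer that knows when it has overrun. It is not the evlog or relayfs
code; struct log_ring, ring_put(), ring_get() and the 8-byte size are all
assumptions made for illustration. The writer never blocks: when the buffer
is full it overwrites the oldest byte and bumps a counter that a reader
could report.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 8	/* hypothetical size; must be a power of two */

struct log_ring {
	char buf[RING_SIZE];
	unsigned long head;	/* total bytes ever written */
	unsigned long tail;	/* total bytes ever read or discarded */
	unsigned long dropped;	/* bytes overwritten before being read */
};

/* Append one byte; on overflow, discard the oldest byte and count it. */
static void ring_put(struct log_ring *r, char c)
{
	assert(r != NULL);
	if (r->head - r->tail == RING_SIZE) {
		r->tail++;
		r->dropped++;
	}
	r->buf[r->head++ & (RING_SIZE - 1)] = c;
}

/* Read one byte; returns 0 when the ring is empty, 1 otherwise. */
static int ring_get(struct log_ring *r, char *c)
{
	assert(r != NULL && c != NULL);
	if (r->head == r->tail)
		return 0;
	*c = r->buf[r->tail++ & (RING_SIZE - 1)];
	return 1;
}
```

A reader that sees a non-zero dropped count can emit an "N bytes lost"
marker instead of silently presenting a truncated log.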

Thanks,

Dan



> mh


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch] printk subsystems
  2003-04-16 18:42                     ` Patrick Mochel
@ 2003-04-16 12:35                       ` Daniel Stekloff
  2003-04-16 19:16                       ` Martin Hicks
  1 sibling, 0 replies; 52+ messages in thread
From: Daniel Stekloff @ 2003-04-16 12:35 UTC (permalink / raw)
  To: Patrick Mochel
  Cc: Randy.Dunlap, Martin Hicks, hpa, pavel, jes, linux-kernel, wildos

On Wednesday 16 April 2003 06:42 pm, Patrick Mochel wrote:
> > Would the debug level be for the entire subsystem? Do you think people
> > would like to be able to set debug/logging level per driver or device,
> > and not just subsystem?
>
> I can see a use for doing per-object debug levels, but I'd rather not add
> the overhead to every object, especially when it would be used by a small
> minority of the populace.
>
> Such a flag could easily be placed in the subsystem-specific object, and
> accessed through the logging/debugging wrappers.


I was thinking this as well: have the dev_* macros make the check for the
current logging level. That way each caller of the macros wouldn't have to
check the flag itself; the check would be part of the added value the macros
give us.
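
A rough sketch of that idea, in user-space C so it can be compiled on its
own. struct fake_device, its msg_level field and the dev_log() macro are
hypothetical stand-ins, not the real struct device or the dev_* macros in
device.h. The only point is that the level check lives inside the wrapper,
so callers never test the flag themselves.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a device that carries its own logging level. */
struct fake_device {
	const char *name;
	int msg_level;	/* messages with level < msg_level are emitted */
};

static int emitted;	/* counts messages that passed the check */

/*
 * The wrapper performs the level check once, in one place.
 * Lower numbers are more severe, as with the KERN_* levels (0..7).
 */
#define dev_log(dev, level, ...)					\
	do {								\
		assert((level) >= 0 && (level) <= 7);			\
		if ((level) < (dev)->msg_level) {			\
			emitted++;					\
			printf("<%d>%s: ", (level), (dev)->name);	\
			printf(__VA_ARGS__);				\
		}							\
	} while (0)
```

Raising msg_level at run time (through whatever interface ends up exporting
it) makes the same call sites noisier without recompiling anything.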



> > Is debugging level here the same as logging level?
>
> Yes.
>
> > I like the idea of having logging levels, which include debug, defined by
> > subsystem. Each subsystem will have separate requirements for logging.
> > Networking, for instance, already has the NETIF_MSG* levels defined in
> > netdevice.h that can be set with Ethtool. I can see, for example, having
> > the msg_enable not in the private data as it is now but in the subsystem
> > or class structure for that device, such as in struct net_device. This
> > could easily be exported through sysfs.
>
> It would be nice. Unfortunately, it's only a nifty pipe-dream at the
> moment, unless some lucky volunteer would like to step forward. ;)


This is where I was going back when I sent you the patch for the network 
device class. Unfortunately, I haven't returned to looking at it. Too busy 
banging my head against other things... <grin>



^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch] printk subsystems
  2003-04-15 13:27                   ` Martin Hicks
@ 2003-04-15 14:40                     ` Karim Yaghmour
  0 siblings, 0 replies; 52+ messages in thread
From: Karim Yaghmour @ 2003-04-15 14:40 UTC (permalink / raw)
  To: Martin Hicks
  Cc: Patrick Mochel, Randy.Dunlap, hpa, pavel, jes, linux-kernel,
	wildos, Tom Zanussi


Martin Hicks wrote:
> I'm not sure that this addresses the core problem that I'm trying to
> deal with.  The problem is that machines with certain configurations
> (large number of CPUs, Nodes, or a bunch of SCSI and disks) display far
> too many messages to the console, resulting in the log buffer being
> overflowed.  The method that I'm proposing simply allows you to decide
> what gets logged when a printk() happens, depending on the message's
> priority and which subsystem it originated from.

I'm not going to address the "filtering" aspect of the problem, but
I would like to point out that this issue of printk overflowing and
having multiple streams of printk is already solved by relayfs:
http://www.opersys.com/relayfs/

With relayfs, one could easily have multi-channel printks (e.g. one
for each "subsystem" and a main one for important messages of all
subsystems.) The advantages of relayfs are obvious:
- No more lost printks.
- A unified buffering scheme for all subsystems that need to send
data to user-space.
- Lockless per-cpu buffering.
- etc.

We've already started playing around with printk on relayfs, though
we don't have code to offer at this time.

In terms of init-time printk'ing with relayfs, this is the scheme
I suggest:
- Change all statically allocated printk buffers to __initdata.
- Add a registration function in kernel/printk.c which is called
during initialization.
- Said function:
	1) Allocates relayfs channel(s)
	2) Atomically copies existing init-time data to channel
	3) Starts using relayfs channel for all future transfers

That's it. Thereafter, all statically allocated printk buffers are
dropped and all buffer management is left to relayfs.
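
The hand-off in steps 1-3 can be sketched in user space as follows. This
does not use the relayfs API: struct channel, log_write() and
log_switch_to_channel() are hypothetical names, malloc/realloc stand in for
channel allocation, and the atomicity of the copy (which in the kernel
would need a lock or disabled interrupts) is not modelled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define EARLY_BUF_SIZE 64	/* hypothetical size of the static boot buffer */

/* Hypothetical stand-in for a relayfs channel: a growable byte sink. */
struct channel {
	char *data;
	size_t len, cap;
};

static char early_buf[EARLY_BUF_SIZE];	/* would be __initdata in the kernel */
static size_t early_len;
static struct channel *active;		/* NULL until the channel is registered */

static void channel_write(struct channel *ch, const char *s, size_t n)
{
	if (n == 0)
		return;
	if (ch->len + n > ch->cap) {
		size_t cap = (ch->len + n) * 2;
		char *p = realloc(ch->data, cap);

		assert(p != NULL);
		ch->data = p;
		ch->cap = cap;
	}
	memcpy(ch->data + ch->len, s, n);
	ch->len += n;
}

/* Before registration, log into the static buffer; afterwards, the channel. */
static void log_write(const char *s)
{
	size_t n = strlen(s);

	if (active) {
		channel_write(active, s, n);
		return;
	}
	if (n > EARLY_BUF_SIZE - early_len)
		n = EARLY_BUF_SIZE - early_len;	/* early buffer is fixed: truncate */
	memcpy(early_buf + early_len, s, n);
	early_len += n;
}

/* Steps 1-3: allocate the channel, copy init-time data, switch over. */
static void log_switch_to_channel(void)
{
	struct channel *ch = calloc(1, sizeof(*ch));

	assert(ch != NULL);
	channel_write(ch, early_buf, early_len);
	early_len = 0;
	active = ch;
}
```

After the switch nothing touches early_buf again, which is what would allow
it to be discarded with the rest of the init data.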

[The filtering aspect is not taken care of by relayfs because it
is not part of its "mandate". relayfs only aims at providing a
very reliable lightweight high-speed data transfer mechanism for
providing kernel data to user space. Higher-level mechanisms can
easily use different relayfs channels to filter/mux data.]

Karim

===================================================

                 Karim Yaghmour
               karim@opersys.com
      Embedded and Real-Time Linux Expert
===================================================

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch] printk subsystems
  2003-04-14 18:33                 ` Patrick Mochel
  2003-04-14 22:33                   ` Daniel Stekloff
@ 2003-04-15 13:27                   ` Martin Hicks
  2003-04-15 14:40                     ` Karim Yaghmour
  1 sibling, 1 reply; 52+ messages in thread
From: Martin Hicks @ 2003-04-15 13:27 UTC (permalink / raw)
  To: Patrick Mochel
  Cc: Randy.Dunlap, Martin Hicks, hpa, pavel, jes, linux-kernel, wildos



On Mon, Apr 14, 2003 at 01:33:54PM -0500, Patrick Mochel wrote:
> 
> > I don't like the #define DEBUG approach.  It's useless for users; it's a
> > developer debug tool.  It won't allow some support staff to ask users to
> > enable module debugging (or subsystem debugging) and see what gets printed.
> 
> Agreed. Having a runtime-tweakable field would be very handy, and 
> something that's been requested many times over. 
> 
> > Martin, you are ahead of my schedule, but I was planning to use sysfs
> > to add a 'debug' flag/file that could be dynamically altered on a per-module
> > basis.
> 
> Something I've pondered in the past is a per-subsystem (as in struct 
> subsystem) debug field and log buffer. When the subsystem is registered, a 
> sysfs 'debug' file is created, from which the user can set the noisiness 
> level. 
> 
> From there, each subsystem can specify the size of a log buffer, which
> would be allocated also when the subsystem is registered. Messages from 
> the subsystem, and kobjects belonging to it, would be copied into the 
> local log buffer. 
> 
> Wrapper functions can be created, similar to the dev_* functions, which 
> take a kobject as the first parameter. From this, the subsystem and log 
> buffer, can be derived (or rather, passed to a lower-level helper). 
> 
> This all falls under the 'gee-whiz-this-might-be-neat' category, and may
> inherently suck; I haven't tried it. Doing the core code is < 1 day's
> work, though there would be nothing that actually used it..

I'm not sure that this addresses the core problem that I'm trying to
deal with.  The problem is that machines with certain configurations
(a large number of CPUs or nodes, or a bunch of SCSI controllers and disks) display far
too many messages to the console, resulting in the log buffer being
overflowed.  The method that I'm proposing simply allows you to decide
what gets logged when a printk() happens, depending on the message's
priority and which subsystem it originated from.
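
As a rough user-space sketch of that decision (not the patch's actual code
path): strip up to two leading tags, fall back to the subsystem's default
message level when no priority is given, and log only if the message level
is numerically lower than the subsystem's threshold. The tag letters follow
the patch posted in this thread (<A> unassigned through <E> USB);
should_log(), the table layout and the initial values 8 and 4 are
assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_SUBSYS 5	/* <A> unassigned, <B> core, <C> SCSI, <D> net, <E> USB */

/* [i][0] = subsystem loglevel (threshold), [i][1] = default message loglevel */
static int subsys[NUM_SUBSYS][2] = {
	{ 8, 4 }, { 8, 4 }, { 8, 4 }, { 8, 4 }, { 8, 4 },
};

/*
 * Parse up to two "<x>" tags at the start of msg and decide whether the
 * message should be logged.  On return *body points past the tags.
 */
static int should_log(const char *msg, const char **body)
{
	int level = -1, sub = 0, i;

	assert(msg != NULL && body != NULL);
	for (i = 0; i < 2; i++) {
		/* test msg[1] first so we never read past the terminator */
		if (msg[0] != '<' || msg[1] == '\0' || msg[2] != '>')
			break;
		if (msg[1] >= '0' && msg[1] <= '7')
			level = msg[1] - '0';
		else if (msg[1] >= 'A' && msg[1] < 'A' + NUM_SUBSYS)
			sub = msg[1] - 'A';
		else
			break;	/* not one of our tags: treat as message text */
		msg += 3;
	}
	if (level < 0)
		level = subsys[sub][1];
	*body = msg;
	return level < subsys[sub][0];
}
```

With the net threshold lowered to 5, a "<D><6>" message is dropped while a
"<D><3>" message is still logged; untagged messages behave as they do today.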

mh

-- 
Wild Open Source Inc.                  mort@wildopensource.com

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch] printk subsystems
  2003-04-14 18:33                 ` Patrick Mochel
@ 2003-04-14 22:33                   ` Daniel Stekloff
  2003-04-16 18:42                     ` Patrick Mochel
  2003-04-15 13:27                   ` Martin Hicks
  1 sibling, 1 reply; 52+ messages in thread
From: Daniel Stekloff @ 2003-04-14 22:33 UTC (permalink / raw)
  To: Patrick Mochel, Randy.Dunlap
  Cc: Martin Hicks, hpa, pavel, jes, linux-kernel, wildos

On Monday 14 April 2003 11:33 am, Patrick Mochel wrote:
> > I don't like the #define DEBUG approach.  It's useless for users; it's a
> > developer debug tool.  It won't allow some support staff to ask users to
> > enable module debugging (or subsystem debugging) and see what gets
> > printed.
>
> Agreed. Having a runtime-tweakable field would be very handy, and
> something that's been requested many times over.
>
> > Martin, you are ahead of my schedule, but I was planning to use sysfs
> > to add a 'debug' flag/file that could be dynamically altered on a
> > per-module basis.
>
> Something I've pondered in the past is a per-subsystem (as in struct
> subsystem) debug field and log buffer. When the subsystem is registered, a
> sysfs 'debug' file is created, from which the user can set the noisiness
> level.


Would the debug level be for the entire subsystem? Do you think people would 
like to be able to set debug/logging level per driver or device, and not just 
subsystem? 

Is debugging level here the same as logging level? 

I like the idea of having logging levels, which include debug, defined by 
subsystem. Each subsystem will have separate requirements for logging. 
Networking, for instance, already has the NETIF_MSG* levels defined in 
netdevice.h that can be set with Ethtool. I can see, for example, having the 
msg_enable not in the private data as it is now but in the subsystem or class 
structure for that device, such as in struct net_device. This could easily be 
exported through sysfs. 

Thanks,

Dan


> From there, each subsystem can specify the size of a log buffer, which
> would be allocated also when the subsystem is registered. Messages from
> the subsystem, and kobjects belonging to it, would be copied into the
> local log buffer.
>
> Wrapper functions can be created, similar to the dev_* functions, which
> take a kobject as the first parameter. From this, the subsystem and log
> buffer, can be derived (or rather, passed to a lower-level helper).
>
> This all falls under the 'gee-whiz-this-might-be-neat' category, and may
> inherently suck; I haven't tried it. Doing the core code is < 1 day's
> work, though there would be nothing that actually used it..
>
>
> 	-pat


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch] printk subsystems
  2003-04-08 23:10               ` Randy.Dunlap
@ 2003-04-14 18:33                 ` Patrick Mochel
  2003-04-14 22:33                   ` Daniel Stekloff
  2003-04-15 13:27                   ` Martin Hicks
  0 siblings, 2 replies; 52+ messages in thread
From: Patrick Mochel @ 2003-04-14 18:33 UTC (permalink / raw)
  To: Randy.Dunlap; +Cc: Martin Hicks, hpa, pavel, jes, linux-kernel, wildos


> I don't like the #define DEBUG approach.  It's useless for users; it's a
> developer debug tool.  It won't allow some support staff to ask users to
> enable module debugging (or subsystem debugging) and see what gets printed.

Agreed. Having a runtime-tweakable field would be very handy, and 
something that's been requested many times over. 

> Martin, you are ahead of my schedule, but I was planning to use sysfs
> to add a 'debug' flag/file that could be dynamically altered on a per-module
> basis.

Something I've pondered in the past is a per-subsystem (as in struct 
subsystem) debug field and log buffer. When the subsystem is registered, a 
sysfs 'debug' file is created, from which the user can set the noisiness 
level. 

From there, each subsystem can specify the size of a log buffer, which 
would be allocated also when the subsystem is registered. Messages from 
the subsystem, and kobjects belonging to it, would be copied into the 
local log buffer. 

Wrapper functions can be created, similar to the dev_* functions, which 
take a kobject as the first parameter. From this, the subsystem and log 
buffer, can be derived (or rather, passed to a lower-level helper). 

This all falls under the 'gee-whiz-this-might-be-neat' category, and may
inherently suck; I haven't tried it. Doing the core code is < 1 day's
work, though there would be nothing that actually used it..
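
For what it's worth, the shape of it might look something like the
user-space sketch below. Nothing here is the real struct subsystem or
struct kobject; struct fake_subsys, struct fake_kobj, subsys_register(),
subsys_vlog() and kobj_log() are hypothetical names, and the 'debug' field
simply stands in for whatever the sysfs file would set.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; not the real struct subsystem / struct kobject. */
struct fake_subsys {
	const char *name;
	int debug;	/* noisiness level a sysfs 'debug' file would set */
	char *log;	/* per-subsystem log buffer, always NUL-terminated */
	size_t log_size, log_len;
};

struct fake_kobj {
	const char *name;
	struct fake_subsys *subsys;
};

/* Registration allocates the buffer at the size the subsystem asked for. */
static int subsys_register(struct fake_subsys *s, size_t log_size)
{
	assert(s != NULL && log_size > 0);
	s->log = calloc(1, log_size);
	if (!s->log)
		return -1;
	s->log_size = log_size;
	s->log_len = 0;
	return 0;
}

/* Lower-level helper: append to the subsystem's buffer if the level permits. */
static void subsys_vlog(struct fake_subsys *s, int level, const char *prefix,
			const char *fmt, va_list args)
{
	char tmp[256];
	size_t len, room;
	int n;

	if (level > s->debug)
		return;
	n = snprintf(tmp, sizeof(tmp), "%s: ", prefix);
	if (n < 0 || (size_t)n >= sizeof(tmp))
		return;
	if (vsnprintf(tmp + n, sizeof(tmp) - (size_t)n, fmt, args) < 0)
		return;
	len = strlen(tmp);
	room = s->log_size - 1 - s->log_len;	/* keep the trailing NUL */
	if (len > room)
		len = room;			/* buffer full: truncate */
	memcpy(s->log + s->log_len, tmp, len);
	s->log_len += len;
}

/* Wrapper in the spirit of dev_*: the object comes first. */
static void kobj_log(struct fake_kobj *k, int level, const char *fmt, ...)
{
	va_list args;

	va_start(args, fmt);
	subsys_vlog(k->subsys, level, k->name, fmt, args);
	va_end(args);
}
```

The caller only ever names the object; which buffer the text lands in, and
whether it lands at all, is decided by the subsystem the object belongs to.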


	-pat


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch] printk subsystems
  2003-04-07 20:13 Martin Hicks
  2003-04-08 18:41 ` Pavel Machek
@ 2003-04-11 19:21 ` Martin Hicks
  1 sibling, 0 replies; 52+ messages in thread
From: Martin Hicks @ 2003-04-11 19:21 UTC (permalink / raw)
  To: linux-kernel; +Cc: hpa, wildos


Hello,

Here is the next iteration of this patch.  This time it includes some
documentation as well as a sysctl interface and a kernel command line
option.

The patch is against 2.5-bk.

Any comments?
mh

-- 
Wild Open Source Inc.                  mort@wildopensource.com


# This is a BitKeeper generated patch for the following project:
# Project Name: Linux kernel tree
# This patch format is intended for GNU patch command version 2.5 or higher.
# This patch includes the following deltas:
#	           ChangeSet	1.1184  -> 1.1186 
#	include/linux/kernel.h	1.35    -> 1.36   
#	     kernel/sysctl.c	1.41    -> 1.42   
#	include/linux/sysctl.h	1.42    -> 1.43   
#	     kernel/printk.c	1.24    -> 1.25   
#	               (new)	        -> 1.2     Documentation/printksubsystems.txt
#
# The following is the BitKeeper ChangeSet Log
# --------------------------------------------
# 03/04/11	mort@socrates.bork.org	1.1185
# Add printk subsystems.
# --------------------------------------------
# 03/04/11	mort@socrates.bork.org	1.1186
# Documentation updates.
# --------------------------------------------
#
diff -Nru a/Documentation/printksubsystems.txt b/Documentation/printksubsystems.txt
--- /dev/null	Wed Dec 31 16:00:00 1969
+++ b/Documentation/printksubsystems.txt	Fri Apr 11 15:14:14 2003
@@ -0,0 +1,131 @@
+
+Printk Subsystems
+=================
+
+What and Why
+------------
+
+Printk subsystems were introduced to provide a mechanism to control
+which messages are actually logged into the fixed length printk buffer.
+As the Linux Kernel has been made to work on larger and larger 
+machines, the number of messages that are displayed on the console
+during bootup has also increased.  Certain subsystems are extremely
+verbose and are easily able to overflow the fixed length printk buffer.
+
+Although simply making the printk buffer larger is possible, this is 
+just a stop gap solution.  It was decided that there should be a 
+method to partition printk calls into different categories based
+on which subsystem they originate from so they can be filtered at 
+run time.  Please note that this depends on people using the KERN_*
+printk priority system.
+
+Printk subsystems are a benefit to anyone but are particularly useful 
+for those who maintain or have customers who maintain (for example)  
+large SMP machines, large NUMA machines, or machines with many SCSI 
+controllers and disks. They allow you to control how verbose each 
+subsystem is during normal operation.  If you run into a problem more 
+messages can be logged by increasing the loglevel for that particular 
+subsystem.  The main point is that all of the printk strings are still
+in the kernel, they just aren't placed into the printk log if they
+aren't high enough priority.
+
+
+
+How to use it
+-------------
+
+The way everyone currently calls printk is something like this:
+
+printk(KERN_NOTICE "My message.  Value = %d\n", foo);
+
+Another set of flags has been added that assign the message to a
+particular printk subsystem.  Currently these are:
+
+PRINTK_UNASS     -- The default if no identifier is provided.
+PRINTK_CORE      -- For core messages (e.g., cpu messages, memory, etc.)
+PRINTK_SCSI      -- Messages related to SCSI.
+PRINTK_NET       -- Messages related to networking.
+PRINTK_USB       -- Messages related to USB.
+
+See include/linux/kernel.h for the latest list (just in case the above 
+isn't kept up-to-date).
+
+If the above printk was originating from somewhere in the network 
+hierarchy then the author should use:
+
+printk(PRINTK_NET KERN_NOTICE "My message.  Value = %d\n", foo);
+
+
+
+Configuration Parameters
+------------------------
+
+Each of the printk subsystems has a set of parameters associated with it.
+These are the same values that are associated with the console loglevel
+(/proc/sys/kernel/printk).  There are 4 integer parameters:
+
+-Subsystem loglevel
+-Default message loglevel
+-Minimum console loglevel
+-Default console loglevel
+
+The filtering is very simple.  If the message that comes in is not assigned
+to a printk subsystem it is assigned to PRINTK_UNASS.  Then, if there
+is no priority (KERN_*) assigned to the message, it is given the
+"default message loglevel" priority for the subsystem that the message
+originated from.  Finally, if the message loglevel value is less than the 
+subsystem loglevel value then the message is placed in the printk buffer.  
+It then makes its way to other locations such as the console or syslog.  
+
+Note that the console_printk's "default message loglevel" is no longer used 
+because if a message has no KERN_* flag prepended to the message then it is 
+assigned the printk subsystem's default message loglevel, not the 
+console_printk's default message loglevel.
+
+These printk subsystem values are configurable through the sysctl interface.  
+The sysctl files associated with this are located in 
+/proc/sys/kernel/printk_subsystem/
+
+The subsystem loglevel is also configurable through a command line option.
+The latter three values are only configurable through sysctl.  If you 
+require a different initial value for any of the latter three values 
+you must recompile the kernel, changing the values of the printk_subsystem 
+array in kernel/printk.c
+
+To change the subsystem loglevel you simply provide a comma separated 
+list of values to the "printk_subsys" kernel command line option.  To
+use the default loglevel for a particular subsystem, assign the special value
+"-1".
+
+E.g., To set the threshold for unassigned, core and scsi to 6, 5, 4 
+(respectively) add the following to the kernel command line:
+
+printk_subsys=6,5,4
+
+E.g., To set the loglevel of core and net to 5 add the following:
+
+printk_subsys=-1,5,-1,5
+
+
+
+Adding a new printk subsystem
+-----------------------------
+
+1)  In include/linux/kernel.h:
+  
+    - Add a new PRINTK_ define
+    - Modify LAST_PRINTK_SUBSYS
+    - Modify NUM_PRINTK_SUBSYSTEMS
+
+2)  In kernel/printk.c modify the printk_subsystem initializer if you
+    would like different defaults for the new printk subsystem.
+
+3)  In include/linux/sysctl.h add a new element to the enum with names
+    like PRINTK_SUBSYS_* that describes your new printk subsystem.
+
+4)  In kernel/sysctl.c add a new entry to the printk_subsys_table.
+
+Recompile and you should have a new printk subsystem available for use.
+
+--
+Martin Hicks <mort@wildopensource.com>  --  April 10, 2003
diff -Nru a/include/linux/kernel.h b/include/linux/kernel.h
--- a/include/linux/kernel.h	Fri Apr 11 15:14:14 2003
+++ b/include/linux/kernel.h	Fri Apr 11 15:14:14 2003
@@ -47,6 +47,17 @@
 #define minimum_console_loglevel (console_printk[2])
 #define default_console_loglevel (console_printk[3])
 
+/* Printk subsystem identifiers */
+#define PRINTK_UNASS    "<A>"   /* unassigned printk subsystem          */
+#define PRINTK_CORE     "<B>"   /* from the core kernel                 */
+#define PRINTK_SCSI     "<C>"   /* from the SCSI subsystem              */
+#define PRINTK_NET      "<D>"   /* from the Net subsystem               */
+#define PRINTK_USB      "<E>"   /* from the USB subsystem               */
+
+#define FIRST_PRINTK_SUBSYS PRINTK_UNASS[1]
+#define LAST_PRINTK_SUBSYS PRINTK_USB[1]
+#define NUM_PRINTK_SUBSYSTEMS 5
+
 struct completion;
 
 #ifdef CONFIG_DEBUG_SPINLOCK_SLEEP
diff -Nru a/include/linux/sysctl.h b/include/linux/sysctl.h
--- a/include/linux/sysctl.h	Fri Apr 11 15:14:14 2003
+++ b/include/linux/sysctl.h	Fri Apr 11 15:14:14 2003
@@ -130,6 +130,7 @@
 	KERN_PIDMAX=55,		/* int: PID # limit */
   	KERN_CORE_PATTERN=56,	/* string: pattern for core-file names */
 	KERN_PANIC_ON_OOPS=57,  /* int: whether we will panic on an oops */
+        KERN_PRINTK_SUBSYS=58,  /* intvec: controls printk subsystem log levels */
 };
 
 
@@ -190,6 +191,16 @@
 	RANDOM_WRITE_THRESH=4,
 	RANDOM_BOOT_ID=5,
 	RANDOM_UUID=6
+};
+
+/* /proc/sys/kernel/printk_subsystem */
+enum
+{
+        PRINTK_SUBSYS_UNASS=1,
+        PRINTK_SUBSYS_CORE=2,
+        PRINTK_SUBSYS_SCSI=3,
+        PRINTK_SUBSYS_NET=4,
+        PRINTK_SUBSYS_USB=5,
 };
 
 /* /proc/sys/bus/isa */
diff -Nru a/kernel/printk.c b/kernel/printk.c
--- a/kernel/printk.c	Fri Apr 11 15:14:14 2003
+++ b/kernel/printk.c	Fri Apr 11 15:14:14 2003
@@ -42,6 +42,9 @@
 #define MINIMUM_CONSOLE_LOGLEVEL 1 /* Minimum loglevel we let people use */
 #define DEFAULT_CONSOLE_LOGLEVEL 7 /* anything MORE serious than KERN_DEBUG */
 
+#define MINIMUM_SUBSYS_LOGLEVEL 1
+#define DEFAULT_SUBSYS_LOGLEVEL 8
+
 DECLARE_WAIT_QUEUE_HEAD(log_wait);
 
 int console_printk[4] = {
@@ -51,6 +54,19 @@
 	DEFAULT_CONSOLE_LOGLEVEL,	/* default_console_loglevel */
 };
 
+/*  [][0] == subsystem log level
+ *  [][1] == default message loglevel
+ *  [][2] == minimum subsystem loglevel
+ *  [][3] == default subsystem loglevel */
+int printk_subsystem[NUM_PRINTK_SUBSYSTEMS][4] = {  
+        [0 ... NUM_PRINTK_SUBSYSTEMS-1] = { 
+                DEFAULT_SUBSYS_LOGLEVEL,
+                DEFAULT_MESSAGE_LOGLEVEL,
+                MINIMUM_SUBSYS_LOGLEVEL,
+                DEFAULT_SUBSYS_LOGLEVEL
+        }
+};
+
 int oops_in_progress;
 
 /*
@@ -141,6 +157,27 @@
 
 __setup("console=", console_setup);
 
+
+/*
+ *   Process the command line arguments for the printk subsystems.
+ */
+static int __init printk_subsys_setup(char *str)
+{
+        int i, ret, val;
+
+        for (i = 0; i < NUM_PRINTK_SUBSYSTEMS; i++) {
+                ret = get_option(&str, &val);
+                if (!ret)
+                        break;
+                if (val >= 0 && val <= 8)
+                        printk_subsystem[i][0] = val;
+        }
+
+        return 1;
+}
+
+__setup("printk_subsys=", printk_subsys_setup);
+
 /*
  * Commands to do_syslog:
  *
@@ -390,10 +427,11 @@
 {
 	va_list args;
 	unsigned long flags;
-	int printed_len;
+	int printed_len, msg_log_level, msg_subsystem, i;
 	char *p;
 	static char printk_buf[1024];
-	static int log_level_unknown = 1;
+	static int begin_message = 1;
+        
 
 	if (oops_in_progress) {
 		/* If a crash is occurring, make sure we can't deadlock */
@@ -409,23 +447,45 @@
 	va_start(args, fmt);
 	printed_len = vsnprintf(printk_buf, sizeof(printk_buf), fmt, args);
 	va_end(args);
-
+        
 	/*
-	 * Copy the output into log_buf.  If the caller didn't provide
-	 * appropriate log level tags, we insert them here
+	 * Copy the output into log_buf.
 	 */
-	for (p = printk_buf; *p; p++) {
-		if (log_level_unknown) {
-			if (p[0] != '<' || p[1] < '0' || p[1] > '7' || p[2] != '>') {
+        p = printk_buf;
+	while (*p) {
+		if (begin_message) {
+                        /* Figure out if there is zero, one or two flags */
+                        msg_log_level = -1;
+                        msg_subsystem = 0;  /* A - Unassigned */
+                        for (i = 0; i < 2; i++) {
+				if (p[0] == '<' && p[2] == '>') {
+                                	if (p[1] >= '0' && p[1] <= '7')
+                                        	msg_log_level = p[1] - '0';
+                                	if (p[1] >= FIRST_PRINTK_SUBSYS && 
+                                            p[1] <= LAST_PRINTK_SUBSYS)
+                                        	msg_subsystem = p[1] - FIRST_PRINTK_SUBSYS;
+				} else 
+					break;
+				p+=3;
+			}
+
+                        /* Decide if we print this message at all */
+                        if (msg_log_level == -1)
+                                msg_log_level = printk_subsystem[msg_subsystem][1];
+                                
+                        if (msg_log_level < printk_subsystem[msg_subsystem][0]) {
+                                begin_message = 0;
 				emit_log_char('<');
-				emit_log_char(default_message_loglevel + '0');
+                                emit_log_char(msg_log_level + '0');
 				emit_log_char('>');
+                        } else { // Get out of this loop.  Don't log anything.
+                                break;
 			}
-			log_level_unknown = 0;
 		}
 		emit_log_char(*p);
 		if (*p == '\n')
-			log_level_unknown = 1;
+			begin_message = 1;
+                p++;
 	}
 
 	if (!cpu_online(smp_processor_id())) {
diff -Nru a/kernel/sysctl.c b/kernel/sysctl.c
--- a/kernel/sysctl.c	Fri Apr 11 15:14:14 2003
+++ b/kernel/sysctl.c	Fri Apr 11 15:14:14 2003
@@ -57,6 +57,7 @@
 extern int cad_pid;
 extern int pid_max;
 extern int sysctl_lower_zone_protection;
+extern int printk_subsystem[][4];
 
 /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */
 static int maxolduid = 65535;
@@ -122,6 +123,7 @@
 static ctl_table debug_table[];
 static ctl_table dev_table[];
 extern ctl_table random_table[];
+static ctl_table printk_subsys_table[];
 
 /* /proc declarations: */
 
@@ -265,6 +267,7 @@
 	 0600, NULL, &proc_dointvec},
 	{KERN_PANIC_ON_OOPS,"panic_on_oops",
 	 &panic_on_oops,sizeof(int),0644,NULL,&proc_dointvec},
+        {KERN_PRINTK_SUBSYS, "printk_subsystem", NULL, 0, 0555, printk_subsys_table},
 	{0}
 };
 
@@ -363,6 +366,20 @@
 static ctl_table dev_table[] = {
 	{0}
 };  
+
+static ctl_table printk_subsys_table[] = {
+        {PRINTK_SUBSYS_UNASS, "unassigned", printk_subsystem[0], 4*sizeof(int), 
+         0644, NULL, &proc_dointvec},
+        {PRINTK_SUBSYS_CORE, "core", printk_subsystem[1], 4*sizeof(int),
+         0644, NULL, &proc_dointvec},
+        {PRINTK_SUBSYS_SCSI, "scsi", printk_subsystem[2], 4*sizeof(int),
+         0644, NULL, &proc_dointvec},
+        {PRINTK_SUBSYS_NET, "net", printk_subsystem[3], 4*sizeof(int),
+         0644, NULL, &proc_dointvec},
+        {PRINTK_SUBSYS_USB, "usb", printk_subsystem[4], 4*sizeof(int),
+         0644, NULL, &proc_dointvec},
+        {0}
+};
 
 extern void init_irq_proc (void);
 

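How the printk_subsys= list in the patch above maps onto the per-subsystem
thresholds can be sketched in user space. The kernel's get_option() is
replaced here by strtol(), and parse_printk_subsys() and subsys_loglevel[]
are hypothetical names. As in the patch, values outside 0..8 (including the
documented -1) leave the entry at its default.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_SUBSYS 5
#define DEFAULT_SUBSYS_LOGLEVEL 8

static int subsys_loglevel[NUM_SUBSYS] = {
	DEFAULT_SUBSYS_LOGLEVEL, DEFAULT_SUBSYS_LOGLEVEL,
	DEFAULT_SUBSYS_LOGLEVEL, DEFAULT_SUBSYS_LOGLEVEL,
	DEFAULT_SUBSYS_LOGLEVEL,
};

/* Parse "a,b,c,..." into subsys_loglevel[]; returns entries consumed. */
static int parse_printk_subsys(const char *str)
{
	int i;

	assert(str != NULL);
	for (i = 0; i < NUM_SUBSYS; i++) {
		char *end;
		long val = strtol(str, &end, 10);

		if (end == str)		/* no number here: stop */
			break;
		if (val >= 0 && val <= 8)
			subsys_loglevel[i] = (int)val;
		if (*end != ',') {	/* last value in the list */
			i++;
			break;
		}
		str = end + 1;
	}
	return i;
}
```

For example, "-1,5,-1,5" consumes four entries and changes only core and
net, leaving unassigned, scsi and usb at the default of 8.
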
^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch] printk subsystems
@ 2003-04-08 23:15 Chuck Ebbert
  0 siblings, 0 replies; 52+ messages in thread
From: Chuck Ebbert @ 2003-04-08 23:15 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-kernel

Pavel Machek wrote:

> Well, I think we should first kill all crappy messages -- that
> benefits everyone.


 ...and _I_ want a bootverbose option like FreeBSD has.

 The default should be for each driver to print one line when
it initializes so you know it's there...





--
 Chuck
 I am not a number!

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch] printk subsystems
  2003-04-08 22:55             ` Martin Hicks
@ 2003-04-08 23:10               ` Randy.Dunlap
  2003-04-14 18:33                 ` Patrick Mochel
  0 siblings, 1 reply; 52+ messages in thread
From: Randy.Dunlap @ 2003-04-08 23:10 UTC (permalink / raw)
  To: Martin Hicks; +Cc: hpa, pavel, jes, linux-kernel, wildos

On Tue, 8 Apr 2003 18:55:23 -0400 Martin Hicks <mort@wildopensource.com> wrote:

| On Tue, Apr 08, 2003 at 03:05:07PM -0700, H. Peter Anvin wrote:
| > Pavel Machek wrote:
| > > 
| > > Well, #define DEBUG in the driver seems like the way to go. I do not
| > > like "subsystem ID" idea, because subsystems are not really well
| > > defined etc.
| > >
| > 
| > I think that's a non-issue, because it's largely self-defining.  It's
| > basically whatever the developers want them to be, because they're the
| > ones who it needs to make sense to.
| 
| Exactly right.  The worst cases are: 1) developers  assign messages
| to a completely wrong subsystem or 2) don't assign the printk to any
| subsystem, in which case we're in exactly the same situation as we are
| in now.
| 
| > It should, however, be an open set, not a closed set like in syslog.
| 
| I agree.  I'll try to make it as easy as possible to add another
| subsystem.
| 
| I'm going to work on the sysctl interface for this next.

Eek, I have some opinions on this too.

I don't like the #define DEBUG approach.  It's useless for users; it's a
developer debug tool.  It won't allow some support staff to ask users to
enable module debugging (or subsystem debugging) and see what gets printed.

Martin, you are ahead of my schedule, but I was planning to use sysfs
to add a 'debug' flag/file that could be dynamically altered on a per-module
basis.

--
~Randy

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [patch] printk subsystems
  2003-04-08 22:05           ` H. Peter Anvin
@ 2003-04-08 22:55             ` Martin Hicks
  2003-04-08 23:10               ` Randy.Dunlap
  0 siblings, 1 reply; 52+ messages in thread
From: Martin Hicks @ 2003-04-08 22:55 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Pavel Machek, Jes Sorensen, linux-kernel, wildos



On Tue, Apr 08, 2003 at 03:05:07PM -0700, H. Peter Anvin wrote:
> Pavel Machek wrote:
> > 
> > Well, #define DEBUG in the driver seems like the way to go. I do not
> > like "subsystem ID" idea, because subsystems are not really well
> > defined etc.
> >
> 
> I think that's a non-issue, because it's largely self-defining.  It's
> basically whatever the developers want them to be, because they're the
> ones who it needs to make sense to.

Exactly right.  The worst cases are: 1) developers  assign messages
to a completely wrong subsystem or 2) don't assign the printk to any
subsystem, in which case we're in exactly the same situation as we are
in now.

> It should, however, be an open set, not a closed set like in syslog.

I agree.  I'll try to make it as easy as possible to add another
subsystem.

I'm going to work on the sysctl interface for this next.

mh

-- 
Wild Open Source Inc.                  mort@wildopensource.com


* Re: [patch] printk subsystems
  2003-04-08 21:57         ` Pavel Machek
  2003-04-08 22:02           ` Jes Sorensen
@ 2003-04-08 22:05           ` H. Peter Anvin
  2003-04-08 22:55             ` Martin Hicks
  1 sibling, 1 reply; 52+ messages in thread
From: H. Peter Anvin @ 2003-04-08 22:05 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Jes Sorensen, Martin Hicks, linux-kernel, wildos

Pavel Machek wrote:
> 
> Well, #define DEBUG in the driver seems like the way to go. I do not
> like "subsystem ID" idea, because subsystems are not really well
> defined etc.
>

I think that's a non-issue, because it's largely self-defining.  It's
basically whatever the developers want them to be, because they're the
ones who it needs to make sense to.

It should, however, be an open set, not a closed set like in syslog.

	-hpa



* Re: [patch] printk subsystems
  2003-04-08 21:57         ` Pavel Machek
@ 2003-04-08 22:02           ` Jes Sorensen
  2003-04-08 22:05           ` H. Peter Anvin
  1 sibling, 0 replies; 52+ messages in thread
From: Jes Sorensen @ 2003-04-08 22:02 UTC (permalink / raw)
  To: Pavel Machek; +Cc: H. Peter Anvin, Martin Hicks, linux-kernel, wildos

>>>>> "Pavel" == Pavel Machek <pavel@ucw.cz> writes:

Pavel> Well, #define DEBUG in the driver seems like the way to go. I
Pavel> do not like "subsystem ID" idea, because subsystems are not
Pavel> really well defined etc.

Which doesn't solve the problem as this means the end user will have
to recompile his/her kernel to debug things. When Joe Random is
sitting with his favorite distro CD trying to install it on a brand
new motherboard doing funky things in its ACPI routing or something
like that, it's very useful for SuSE/Red Hat/Mandrake/Debian etc. to
be able to instruct him to set this debug flag and tell them what
happens.

Cheers,
Jes


* Re: [patch] printk subsystems
  2003-04-08 21:02     ` Pavel Machek
  2003-04-08 21:10       ` H. Peter Anvin
@ 2003-04-08 22:00       ` Jes Sorensen
  1 sibling, 0 replies; 52+ messages in thread
From: Jes Sorensen @ 2003-04-08 22:00 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Martin Hicks, linux-kernel, hpa, wildos

>>>>> "Pavel" == Pavel Machek <pavel@ucw.cz> writes:

Pavel> Hi!
>> Killing the printk's means they are not around if you have an end
>> user who is running into problems at boot time. Having a feature
>> like this means they can default to 'off' then if a problem arises,
>> whoever is doing the support can ask the user to try and enable
>> printk's for say SCSI and get the input, without having to rebuild
>> the kernel from scratch.

Pavel> Well, I think we should first kill all crappy messages -- that
Pavel> benefits everyone. I believe that if we kill all unnecessary
Pavel> (carrying no information except perhaps copyright or
Pavel> advertising) will help current problem a lot.

I agree that some messages can be eliminated, but not all of
them. Even some of the ones you suggested might be valuable to keep,
like the CPU flags. Generally this isn't a problem on a small box with
2 CPUs and 2 disks, but if you have 32 CPUs and 64 SCSI disks, the
amount of data being printed becomes quite substantial.

So while I agree that it wouldn't hurt for us to eliminate some
unnecessary printk's, I still think Martin's patch has a lot of
merit.

Cheers,
Jes


* Re: [patch] printk subsystems
  2003-04-08 21:10       ` H. Peter Anvin
@ 2003-04-08 21:57         ` Pavel Machek
  2003-04-08 22:02           ` Jes Sorensen
  2003-04-08 22:05           ` H. Peter Anvin
  0 siblings, 2 replies; 52+ messages in thread
From: Pavel Machek @ 2003-04-08 21:57 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Pavel Machek, Jes Sorensen, Martin Hicks, linux-kernel, wildos

Hi!

> > Well, I think we should first kill all crappy messages -- that
> > benefits everyone. I believe that if we kill all unnecessary
> > (carrying no information  except perhaps copyright or advertising)
> > will help current problem a lot.
> > 
> 
> That may sometimes be true, but a few things may be useful to be able to
> turn back on.

Well, #define DEBUG in the driver seems like the way to go. I do not
like "subsystem ID" idea, because subsystems are not really well
defined etc.
-- 
Horseback riding is like software...
...vgf orggre jura vgf serr.


* Re: [patch] printk subsystems
  2003-04-08 21:02     ` Pavel Machek
@ 2003-04-08 21:10       ` H. Peter Anvin
  2003-04-08 21:57         ` Pavel Machek
  2003-04-08 22:00       ` Jes Sorensen
  1 sibling, 1 reply; 52+ messages in thread
From: H. Peter Anvin @ 2003-04-08 21:10 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Jes Sorensen, Martin Hicks, linux-kernel, wildos

Pavel Machek wrote:
> 
> Well, I think we should first kill all crappy messages -- that
> benefits everyone. I believe that if we kill all unnecessary
> (carrying no information  except perhaps copyright or advertising)
> will help current problem a lot.
> 

That may sometimes be true, but a few things may be useful to be able to
turn back on.

	-hpa




* Re: [patch] printk subsystems
  2003-04-08 20:02   ` Jes Sorensen
@ 2003-04-08 21:02     ` Pavel Machek
  2003-04-08 21:10       ` H. Peter Anvin
  2003-04-08 22:00       ` Jes Sorensen
  0 siblings, 2 replies; 52+ messages in thread
From: Pavel Machek @ 2003-04-08 21:02 UTC (permalink / raw)
  To: Jes Sorensen; +Cc: Pavel Machek, Martin Hicks, linux-kernel, hpa, wildos

Hi!

> >> Basically, each printk is assigned to a subsystem and that
> >> subsystem has the same set of values that the console_printk array
> >> has.  The difference is that the console_printk loglevel decides if
> >> the message goes to the console whereas the subsystem loglevel
> >> decides if that message goes to the log at all.
> 
> Pavel> Well, I consider this stop gap too... Right solution is to kill
> Pavel> printk()s from too verbose part so that it does not
> Pavel> overflow....
> 
> Hi Pavel,
> 
> Killing the printk's means they are not around if you have an end user
> who is running into problems at boot time. Having a feature like this
> means they can default to 'off' then if a problem arises, whoever is
> doing the support can ask the user to try and enable printk's for say
> SCSI and get the input, without having to rebuild the kernel from
> scratch.

Well, I think we should first kill all crappy messages -- that
benefits everyone. I believe that if we kill all unnecessary
(carrying no information  except perhaps copyright or advertising)
will help current problem a lot.


See:
Redundant:
Mar 31 21:38:54 amd kernel: Local APIC disabled by BIOS -- reenabling.
Mar 31 21:38:54 amd kernel: Found and enabled local APIC!
Strange:
Mar 31 21:38:54 amd kernel: -> /dev
Mar 31 21:38:54 amd kernel: -> /dev/console
Mar 31 21:38:54 amd kernel: -> /root
Could not those be made into a single line? Or dropped completely?
Mar 31 21:38:54 amd kernel: enabled ExtINT on CPU#0
Mar 31 21:38:54 amd kernel: ESR value before enabling vector: 00000000
Mar 31 21:38:54 amd kernel: ESR value after enabling vector: 00000000
Useless:
Mar 31 21:38:54 amd kernel: Initializing RT netlink socket
Mar 31 21:38:54 amd kernel: mtrr: v2.0 (20020519)
I do not believe bio's are *that* important:
Mar 31 21:38:54 amd kernel: BIO: pool of 256 setup, 15Kb (60
bytes/bio)
Mar 31 21:38:54 amd kernel: biovec pool[0]:   1 bvecs: 256 entries (12
bytes)
Mar 31 21:38:54 amd kernel: biovec pool[1]:   4 bvecs: 256 entries (48
bytes)
Mar 31 21:38:54 amd kernel: biovec pool[2]:  16 bvecs: 256 entries
(192 bytes)
Mar 31 21:38:54 amd kernel: biovec pool[3]:  64 bvecs: 256 entries
(768 bytes)
Mar 31 21:38:54 amd kernel: biovec pool[4]: 128 bvecs: 256 entries
(1536 bytes)
Mar 31 21:38:55 amd kernel: biovec pool[5]: 256 bvecs: 256 entries
(3072 bytes)
Mar 31 21:38:55 amd kernel: block request queues:
Mar 31 21:38:55 amd kernel:  128 requests per read queue
Mar 31 21:38:55 amd kernel:  128 requests per write queue
Mar 31 21:38:55 amd kernel:  8 requests per batch
Mar 31 21:38:55 amd kernel:  enter congestion at 15
Mar 31 21:38:55 amd kernel:  exit congestion at 17
More useless stuff (could be useful when CONFIG_MODULE, but when
in kernel?)
Mar 31 21:38:55 amd kernel: Linux Kernel Card Services 3.1.22
Mar 31 21:38:55 amd kernel: drivers/usb/core/usb.c: registered new
driver usbfs
Mar 31 21:38:55 amd kernel: drivers/usb/core/usb.c: registered new
driver hub
Mar 31 21:38:55 amd kernel: Coda Kernel/Venus communications, v5.3.15,
coda@cs.cmu.edu
Mar 31 21:38:55 amd kernel: Installing knfsd (copyright (C) 1996
okir@monad.swb.de).
Mar 31 21:38:55 amd kernel: NTFS driver 2.1.0 [Flags: R/O DEBUG].
Mar 31 21:38:55 amd kernel: udf: registering filesystem
Mar 31 21:38:55 amd kernel: Linux agpgart interface v0.100 (c) Dave
Jones
Mar 31 21:38:55 amd kernel: loop: loaded (max 8 devices)
Mar 31 21:38:55 amd kernel: pcnet32.c:v1.27b 01.10.2002
tsbogend@alpha.franken.de
Mar 31 21:38:55 amd kernel: PPP generic driver version 2.4.2
Mar 31 21:38:55 amd kernel: PPP Deflate Compression module registered
Mar 31 21:38:55 amd kernel: PPP BSD Compression module registered
Mar 31 21:38:55 amd kernel: Linux Tulip driver version 1.1.13 (May 11,
2002)
Mar 31 21:38:55 amd kernel: drivers/usb/host/uhci-hcd.c: USB Universal
Host Controller Interface driver v2.0
Mar 31 21:38:55 amd kernel: drivers/usb/core/usb.c: registered new
driver acm
Mar 31 21:38:55 amd kernel: drivers/usb/class/cdc-acm.c: v0.21:USB
Abstract Control Model driver for USB modems and ISDN adapte$
Mar 31 21:38:55 amd kernel: drivers/usb/core/usb.c: registered new
driver usblp
Mar 31 21:38:55 amd kernel: drivers/usb/class/usblp.c: v0.13: USB
Printer Device Class driver
Mar 31 21:38:55 amd kernel: Initializing USB Mass Storage driver...
Mar 31 21:38:55 amd kernel: drivers/usb/core/usb.c: registered new
driver usb-storage
Mar 31 21:38:55 amd kernel: USB Mass Storage support registered.
Mar 31 21:38:55 amd kernel: drivers/usb/core/usb.c: registered new
driver hid
Mar 31 21:38:55 amd kernel: drivers/usb/input/hid-core.c: v2.0:USB HID
core driver
Mar 31 21:38:55 amd kernel: drivers/usb/core/usb.c: registered new
driver vicam
Mar 31 21:38:55 amd kernel: drivers/usb/net/cdc-ether.c:
drivers/usb/net/cdc-ether.c: v0.98.5 22 Sep 2001 Brad Hards and
anothe$
Mar 31 21:38:55 amd kernel: drivers/usb/core/usb.c: registered new
driver CDCEther
Mar 31 21:38:55 amd kernel: drivers/usb/core/usb.c: registered new
driver usbnet

...and so on and similar.
							Pavel
-- 
Horseback riding is like software...
...vgf orggre jura vgf serr.


* Re: [patch] printk subsystems
  2003-04-08 18:41 ` Pavel Machek
@ 2003-04-08 20:02   ` Jes Sorensen
  2003-04-08 21:02     ` Pavel Machek
  0 siblings, 1 reply; 52+ messages in thread
From: Jes Sorensen @ 2003-04-08 20:02 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Martin Hicks, linux-kernel, hpa, wildos

>>>>> "Pavel" == Pavel Machek <pavel@ucw.cz> writes:

>> Basically, each printk is assigned to a subsystem and that
>> subsystem has the same set of values that the console_printk array
>> has.  The difference is that the console_printk loglevel decides if
>> the message goes to the console whereas the subsystem loglevel
>> decides if that message goes to the log at all.

Pavel> Well, I consider this stop gap too... Right solution is to kill
Pavel> printk()s from too verbose part so that it does not
Pavel> overflow....

Hi Pavel,

Killing the printk's means they are not around if you have an end user
who is running into problems at boot time. Having a feature like this
means they can default to 'off' then if a problem arises, whoever is
doing the support can ask the user to try and enable printk's for say
SCSI and get the input, without having to rebuild the kernel from
scratch.

For people supporting large numbers of users (like all the
distributions) this seems like a good win to me.

Cheers,
Jes


* Re: [patch] printk subsystems
  2003-04-07 20:13 Martin Hicks
@ 2003-04-08 18:41 ` Pavel Machek
  2003-04-08 20:02   ` Jes Sorensen
  2003-04-11 19:21 ` Martin Hicks
  1 sibling, 1 reply; 52+ messages in thread
From: Pavel Machek @ 2003-04-08 18:41 UTC (permalink / raw)
  To: Martin Hicks; +Cc: linux-kernel, hpa, wildos

Hi!

> In an effort to get greater control over which printk()'s are logged
> during boot and after, I've put together this patch that introduces the
> concept of printk subsystems.  The problem that some are beginning to
> face with larger machines is that certain subsystems are overly verbose
> (SCSI, USB, cpu related messages on large NUMA or SMP machines)
> and they overflow the buffer.  Making the logbuffer bigger is a stop gap
> solution but I think this is a more elegant solution.

> Basically, each printk is assigned to a subsystem and that subsystem has
> the same set of values that the console_printk array has.  The
> difference is that the console_printk loglevel decides if the message
> goes to the console whereas the subsystem loglevel decides if that
> message goes to the log at all.

Well, I consider this stop gap too... Right solution is to kill
printk()s from too verbose part so that it does not overflow....

								Pavel
-- 
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]


* [patch] printk subsystems
@ 2003-04-07 20:13 Martin Hicks
  2003-04-08 18:41 ` Pavel Machek
  2003-04-11 19:21 ` Martin Hicks
  0 siblings, 2 replies; 52+ messages in thread
From: Martin Hicks @ 2003-04-07 20:13 UTC (permalink / raw)
  To: linux-kernel; +Cc: hpa, wildos


Hello,

In an effort to get greater control over which printk()'s are logged
during boot and after, I've put together this patch that introduces the
concept of printk subsystems.  The problem that some are beginning to
face with larger machines is that certain subsystems are overly verbose
(SCSI, USB, cpu related messages on large NUMA or SMP machines)
and they overflow the buffer.  Making the logbuffer bigger is a stop gap
solution but I think this is a more elegant solution.

Basically, each printk is assigned to a subsystem and that subsystem has
the same set of values that the console_printk array has.  The
difference is that the console_printk loglevel decides if the message
goes to the console whereas the subsystem loglevel decides if that
message goes to the log at all.

This patch implements the core, but I haven't yet put in the facilities
to change the default values that are used at compile-time.  I'm looking
for opinions on the architecture, not the completeness.  I plan to add
configuration through sysctl.

To use the feature you simply add the subsystem identifier to the printk
call, much the same way that you add the priority tag:

printk(PRINTK_NET KERN_INFO "This is a printk from the net subsys\n");

Each subsystem has a default KERN_* priority, and if no subsystem is
given then the printk is put into the PRINTK_UNASS queue (which is set up
by default to log all messages).
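
The tagging and filtering rule described above can be modelled in userspace roughly as follows. This is an illustrative sketch, not the patch itself: the names `struct subsys_level`, `levels`, `parse_tags` and `should_log` are hypothetical, and the default message level of 4 is an assumption standing in for the kernel's DEFAULT_MESSAGE_LOGLEVEL.

```c
#include <stddef.h>

/* Hypothetical model of the five subsystems, tagged "<A>" to "<E>". */
enum { SUBSYS_UNASS, SUBSYS_CORE, SUBSYS_SCSI, SUBSYS_NET, SUBSYS_USB,
       SUBSYS_COUNT };

struct subsys_level {
	int threshold;		/* a message is logged if its level < threshold */
	int default_msg;	/* level assumed when no <0>..<7> tag is given */
};

/* Threshold 8 logs everything; default message level 4 is an assumption. */
static struct subsys_level levels[SUBSYS_COUNT] = {
	{ 8, 4 }, { 8, 4 }, { 8, 4 }, { 8, 4 }, { 8, 4 },
};

/*
 * Consume up to two leading "<X>" tags (subsystem and priority, in either
 * order).  Returns a pointer to the text that follows the tags.
 */
static const char *parse_tags(const char *msg, int *subsys, int *level)
{
	int i;

	*subsys = SUBSYS_UNASS;
	*level = -1;
	for (i = 0; i < 2; i++) {
		char c;

		if (msg[0] != '<' || msg[1] == '\0' || msg[2] != '>')
			break;
		c = msg[1];
		if (c >= '0' && c <= '7')
			*level = c - '0';
		else if (c >= 'A' && c < 'A' + SUBSYS_COUNT)
			*subsys = c - 'A';
		else
			break;	/* unknown tag: leave it as message text */
		msg += 3;
	}
	if (*level == -1)
		*level = levels[*subsys].default_msg;
	return msg;
}

/* Non-zero if the message would reach the log at all. */
static int should_log(const char *msg)
{
	int subsys, level;

	parse_tags(msg, &subsys, &level);
	return level < levels[subsys].threshold;
}
```

In this model, lowering `levels[SUBSYS_NET].threshold` to 4 drops a "<D><6>" message before it reaches the log, while a "<D><3>" message still gets through. Bounding the subsystem letter by the table size keeps an unknown tag from indexing past the end of the array.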

The patch is against 2.5.66.

Opinions and comments welcome
mh

-- 
Martin Hicks  ||  mort@bork.org  || PGP/GnuPG: 0x4C7F2BEE
plato up 6 days,  6:33, 14 users,  load average: 0.09, 0.11, 0.09
Beer: So much more than just a breakfast drink.



diff -X /home/mort/diff-exclude -uEr linux-2.5.66.pristine/include/linux/kernel.h linux-2.5.66/include/linux/kernel.h
--- linux-2.5.66.pristine/include/linux/kernel.h	2003-03-17 16:43:37.000000000 -0500
+++ linux-2.5.66/include/linux/kernel.h	2003-04-07 14:33:05.000000000 -0400
@@ -47,6 +47,18 @@
 #define minimum_console_loglevel (console_printk[2])
 #define default_console_loglevel (console_printk[3])
 
+/* Printk subsystem identifiers */
+#define PRINTK_UNASS    "<A>"   /* unassigned printk subsystem          */
+#define PRINTK_CORE     "<B>"   /* from the core kernel                 */
+#define PRINTK_SCSI     "<C>"   /* from the SCSI subsystem              */
+#define PRINTK_NET      "<D>"   /* from the Net subsystem               */
+#define PRINTK_USB      "<E>"   /* from the USB subsystem               */
+
+#define FIRST_PRINTK_SUBSYS PRINTK_UNASS[1]
+#define LAST_PRINTK_SUBSYS PRINTK_USB[1]
+
+extern int printk_subsystem[5][4];
+
 struct completion;
 
 #ifdef CONFIG_DEBUG_SPINLOCK_SLEEP
@@ -102,6 +114,62 @@
 		console_loglevel = 15;
 }
 
+static inline int decode_subsys(char *subsys) 
+{
+        if (subsys[1] >= FIRST_PRINTK_SUBSYS &&
+            subsys[1] <= LAST_PRINTK_SUBSYS)
+                return subsys[1] - FIRST_PRINTK_SUBSYS;
+        return -1;
+}
+
+/* returns the log threshold for a given subsystem.  -1 on error.  */
+static inline int get_subsys_loglevel(char *subsys)
+{
+        int index;
+        if ((index = decode_subsys(subsys)) != -1)
+                return printk_subsystem[index][0];
+        return -1;
+}
+
+/* sets the log threshold for a given subsystem.  
+ * returns 0 if everything is okay, -1 if an error is encountered. */
+static inline int set_subsys_loglevel(char *subsys, int level)
+{
+        int index;
+        if (level < 0 || level > 8)
+                return -1;
+        if ((index = decode_subsys(subsys)) != -1) {
+                if (level < printk_subsystem[index][2])
+                        level = printk_subsystem[index][3];
+                printk_subsystem[index][0] = level;
+                return 0;
+        }
+        return -1;
+}
+
+/* returns the default message level for a given subsystem.  -1 on error */
+static inline int get_subsys_msglevel(char *subsys)
+{
+        int index;
+        if ((index = decode_subsys(subsys)) != -1)
+                return printk_subsystem[index][1];
+        return -1;
+}
+
+/* sets the default message level for a given subsystem.
+ * return 0 if everything is okay, - 1 if an error is encountered */
+static inline int set_subsys_msglevel(char *subsys, int level)
+{
+        int index;
+        if (level < 0 || level > 7)
+                return -1;
+        if ((index = decode_subsys(subsys)) != -1) {
+                printk_subsystem[index][1] = level;
+                return 0;
+        }
+        return -1;
+}
+
 extern void bust_spinlocks(int yes);
 extern int oops_in_progress;		/* If set, an oops, panic(), BUG() or die() is in progress */
 
Only in linux-2.5.66.pristine/include/sound: pcm_sgbuf.h
diff -X /home/mort/diff-exclude -uEr linux-2.5.66.pristine/kernel/printk.c linux-2.5.66/kernel/printk.c
--- linux-2.5.66.pristine/kernel/printk.c	2003-03-17 16:44:50.000000000 -0500
+++ linux-2.5.66/kernel/printk.c	2003-04-07 14:54:33.356776808 -0400
@@ -42,6 +42,9 @@
 #define MINIMUM_CONSOLE_LOGLEVEL 1 /* Minimum loglevel we let people use */
 #define DEFAULT_CONSOLE_LOGLEVEL 7 /* anything MORE serious than KERN_DEBUG */
 
+#define MINIMUM_SUBSYS_LOGLEVEL 1
+#define DEFAULT_SUBSYS_LOGLEVEL 8
+
 DECLARE_WAIT_QUEUE_HEAD(log_wait);
 
 int console_printk[4] = {
@@ -51,6 +54,18 @@
 	DEFAULT_CONSOLE_LOGLEVEL,	/* default_console_loglevel */
 };
 
+int printk_subsystem[5][4] = {  /*  [][0] == subsystem log level
+                                 *  [][1] == default message loglevel
+                                 *  [][2] == minimum subsystem loglevel
+                                 *  [][3] == default subsystem loglevel */
+        [0 ... 4] = { 
+                DEFAULT_SUBSYS_LOGLEVEL,
+                DEFAULT_MESSAGE_LOGLEVEL,
+                MINIMUM_SUBSYS_LOGLEVEL,
+                DEFAULT_SUBSYS_LOGLEVEL
+        }
+};
+
 int oops_in_progress;
 
 /*
@@ -390,10 +405,11 @@
 {
 	va_list args;
 	unsigned long flags;
-	int printed_len;
+	int printed_len, msg_log_level, msg_subsystem, i;
 	char *p;
 	static char printk_buf[1024];
-	static int log_level_unknown = 1;
+	static int begin_message = 1;
+        
 
 	if (oops_in_progress) {
 		/* If a crash is occurring, make sure we can't deadlock */
@@ -409,23 +425,44 @@
 	va_start(args, fmt);
 	printed_len = vsnprintf(printk_buf, sizeof(printk_buf), fmt, args);
 	va_end(args);
-
+        
 	/*
-	 * Copy the output into log_buf.  If the caller didn't provide
-	 * appropriate log level tags, we insert them here
+	 * Copy the output into log_buf.
 	 */
-	for (p = printk_buf; *p; p++) {
-		if (log_level_unknown) {
-			if (p[0] != '<' || p[1] < '0' || p[1] > '7' || p[2] != '>') {
+        p = printk_buf;
+	while (*p) {
+		if (begin_message) {
+                        /* Figure out if there is zero, one or two flags */
+                        msg_log_level = -1;
+                        msg_subsystem = 0;  /* A - Unassigned */
+                        for (i = 0; i < 2; i++) {
+				if (p[0] == '<' && p[2] == '>') {
+                                	if (p[1] >= '0' && p[1] <= '7')
+                                        	msg_log_level = p[1] - '0';
+                                	if (p[1] >= 'A' && p[1] <= 'E')
+                                        	msg_subsystem = p[1] - 'A';
+				} else 
+					break;
+				p+=3;
+			}
+
+                        /* Decide if we print this message at all */
+                        if (msg_log_level == -1)
+                                msg_log_level = printk_subsystem[msg_subsystem][1];
+                                
+                        if (msg_log_level < printk_subsystem[msg_subsystem][0]) {
+                                begin_message = 0;
 				emit_log_char('<');
-				emit_log_char(default_message_loglevel + '0');
+                                emit_log_char(msg_log_level + '0');
 				emit_log_char('>');
+                        } else { // Get out of this loop.  Don't log anything.
+                                break;
 			}
-			log_level_unknown = 0;
 		}
 		emit_log_char(*p);
 		if (*p == '\n')
-			log_level_unknown = 1;
+			begin_message = 1;
+                p++;
 	}
 
 	if (!cpu_online(smp_processor_id())) {


end of thread, other threads:[~2003-04-24 18:58 UTC | newest]

Thread overview: 52+ messages
2003-04-21 18:23 [patch] printk subsystems Perez-Gonzalez, Inaky
2003-04-21 18:30 ` H. Peter Anvin
  -- strict thread matches above, loose matches on Subject: below --
2003-04-24 18:56 Manfred Spraul
2003-04-24 19:10 ` bob
2003-04-23  0:28 Perez-Gonzalez, Inaky
2003-04-22 22:53 Perez-Gonzalez, Inaky
2003-04-23  3:58 ` Tom Zanussi
2003-04-22 19:02 Perez-Gonzalez, Inaky
2003-04-22 19:03 ` H. Peter Anvin
2003-04-22 21:52 ` Tom Zanussi
2003-04-22 18:46 Perez-Gonzalez, Inaky
2003-04-22 23:28 ` Karim Yaghmour
2003-04-22  5:09 Perez-Gonzalez, Inaky
2003-04-24 18:22 ` bob
2003-04-22  4:02 Perez-Gonzalez, Inaky
2003-04-22  5:52 ` Karim Yaghmour
2003-04-22  6:04 ` Tom Zanussi
2003-04-22  3:04 Perez-Gonzalez, Inaky
2003-04-22  6:00 ` Tom Zanussi
2003-04-22  2:49 Perez-Gonzalez, Inaky
2003-04-22  4:34 ` Karim Yaghmour
2003-04-21 18:42 Perez-Gonzalez, Inaky
2003-04-17 19:58 Perez-Gonzalez, Inaky
2003-04-17 20:34 ` Karim Yaghmour
2003-04-17 21:03   ` Perez-Gonzalez, Inaky
2003-04-17 21:37     ` Tom Zanussi
2003-04-18  7:21     ` Tom Zanussi
2003-04-18  7:42     ` Greg KH
2003-04-21 15:56     ` Karim Yaghmour
2003-04-08 23:15 Chuck Ebbert
2003-04-07 20:13 Martin Hicks
2003-04-08 18:41 ` Pavel Machek
2003-04-08 20:02   ` Jes Sorensen
2003-04-08 21:02     ` Pavel Machek
2003-04-08 21:10       ` H. Peter Anvin
2003-04-08 21:57         ` Pavel Machek
2003-04-08 22:02           ` Jes Sorensen
2003-04-08 22:05           ` H. Peter Anvin
2003-04-08 22:55             ` Martin Hicks
2003-04-08 23:10               ` Randy.Dunlap
2003-04-14 18:33                 ` Patrick Mochel
2003-04-14 22:33                   ` Daniel Stekloff
2003-04-16 18:42                     ` Patrick Mochel
2003-04-16 12:35                       ` Daniel Stekloff
2003-04-16 19:16                       ` Martin Hicks
2003-04-16 12:43                         ` Daniel Stekloff
2003-04-17 15:56                           ` Martin Hicks
2003-04-17 13:58                             ` Karim Yaghmour
2003-04-15 13:27                   ` Martin Hicks
2003-04-15 14:40                     ` Karim Yaghmour
2003-04-08 22:00       ` Jes Sorensen
2003-04-11 19:21 ` Martin Hicks
