linux-kernel.vger.kernel.org archive mirror
* RE: [patch] printk subsystems
@ 2003-04-22  5:09 Perez-Gonzalez, Inaky
  2003-04-24 18:22 ` bob
  0 siblings, 1 reply; 52+ messages in thread
From: Perez-Gonzalez, Inaky @ 2003-04-22  5:09 UTC (permalink / raw)
  To: 'karim@opersys.com'
  Cc: 'Tom Zanussi', 'Martin Hicks',
	'Daniel Stekloff', 'Patrick Mochel',
	'Randy.Dunlap', 'hpa@zytor.com',
	'pavel@ucw.cz', 'jes@wildopensource.com',
	'linux-kernel@vger.kernel.org', 'wildos@sgi.com',
	'Robert Wisniewski'


> From: Karim Yaghmour [mailto:karim@opersys.com]
> 
> "Perez-Gonzalez, Inaky" wrote:
>
> > Not meaning to be a smartass here, but I don't buy the "lockless" tag;
> > I would agree it is an optimized-lock scheme ....
> >
> > Although it is not that important, no need to make a fuss out of that :)
> 
> I actually think this is important.

Don't get me wrong - I don't mean the actual difference is not important;
what I mean is that whether I buy the "lockless" tag or not is what's
unimportant. I actually think that the method you guys use is really sharp.

Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own
(and my fault)

^ permalink raw reply	[flat|nested] 52+ messages in thread
* RE: [patch] printk subsystems
@ 2003-04-24 18:56 Manfred Spraul
  2003-04-24 19:10 ` bob
  0 siblings, 1 reply; 52+ messages in thread
From: Manfred Spraul @ 2003-04-24 18:56 UTC (permalink / raw)
  To: bob; +Cc: linux-kernel

Robert wrote:

>There is both a qualitative difference and quantitative difference in a
>lockless algorithm as described versus one that uses locking.  Most
>importantly for Linux, these algorithms in practice have better performance
>characteristics.
>
Do you have benchmark numbers that compare "lockless" and locking 
algorithms on large MP systems?

For example, how much faster is one 'lock;cmpxchg' compared to 
'spin_lock();if (x==var) var = y;spin_unlock();'.

So far I have assumed that for spinlocks that are only held for a few
cycles, the cacheline thrashing dominates, not the spinning.
I've avoided replacing spin_lock+inc+spin_unlock with atomic_inc().
(Just look at the memory barriers needed: smp_mb__after_clear_bit & friends.)

RCU uses per-cpu queues that are really lockless and avoid the cache
thrashing; that is a real win.

--
    Manfred


* RE: [patch] printk subsystems
@ 2003-04-23  0:28 Perez-Gonzalez, Inaky
  0 siblings, 0 replies; 52+ messages in thread
From: Perez-Gonzalez, Inaky @ 2003-04-23  0:28 UTC (permalink / raw)
  To: 'karim@opersys.com'
  Cc: 'linux-kernel@vger.kernel.org', 'Tom Zanussi'


> From: Karim Yaghmour [mailto:karim@opersys.com]
> 
> The difference between copy_to_user() and memcpy() is not a
> minor detail. There is a reason why relayfs does its things the
> way it does.

Well, then I would like to ask you to help me understand what
is so radically different (aside from the might-sleep), because
I simply don't get it, and I am always willing to learn ..

> > That is a good point, that brought me yesterday night to the following
> > doubt. How do you guarantee integrity of the data when reading with
> > mmap. In other words, if I am just copying the mmap region, how do
> > I know that what I am copying is safe enough, that it is not being
> > modified by CPU #2, for example?
> [snip]
> 
> "Use the source, Luke."

Yep, see my last message to Tom, I am doing that - however, it is
pretty difficult to grasp things on the first try [especially when
you are kind of in a rush for many different things :)].

> OK, so you are suggesting we start making a difference in the kernel
> between those printks which are "optional" and those that are
> "compulsory"? Interested kernel developers are free to voice their
> interest at any time now ...

Where did I say that? I am not talking about printk() anywhere,
btw, although if someone wants to do that, hey, it's their decision,
isn't it? I am not anybody's Mum, to tell them what to do and what
not to do.

OTOH, if someone were to consider plugging printk into kue, I'd
consider it kind of stupid to printk recallable messages ... if it
can be recalled, why print it at all?

I don't know why, but I have a feeling you are taking this
conversation too personally. It is not my intention to stomp
all over your work and criticize it. I am learning what you guys
have done, and yes, looking for defects or problems, because I
am concerned about things that don't match in my head and how
to solve them.

> > Now, if you want to make it resizable, that understands Japanese and
> > does double loops followed by a nose dive and a vertical climb up,
> > well, that's up to the client of the API. And I didn't want to
> > constraint the gymnastics that the client could do to handle a buffer.
> 
> Well, if we're talking about "double loops followed by a nose dive"
> we're certainly not going anywhere. I'll leave it to other LKML
> subscribers to decided wether automatically resizeable buffers are
> of interest.

Hey, where is your sense of humor?! Are you German, like my 
girlfriend? ;)

Automatically resizeable buffers are of *much* interest, to me at
least - remember that the only point I was stressing is that I am
leaving that design/implementation issue out of the kue code, up to
the client, while in relayfs it is inside of it.

I consider that a gain in kue's flexibility, while on relayfs's side
it is a gain in performance and space optimization (this is my
interpretation of Tom's words in a previous mail and of the code I
have had time to grasp; I think it makes full sense).

> > > I'm sorry, but the way I see printk() is that once I send something
> > > to it, it really ought to show up somewhere. Heck, I'm printk'ing
> > > it. If I plugged a device and the driver said "Your chair is
> > > on fire", I want to know about it whether the device has been
> > > unplugged later or not.
> >
> > I would say this case, printk(), would fit in my second example,
> > wouldn't it? ... this is one message you want delivered, not recalled.
> 
> What I've been trying to say here is that there are no two kinds of
> printk. printk is printk and it ought to behave the same in all
> instances.

And I never said anywhere that it should not; in fact I have
stated clearly that printks are something you shall not recall.
Twice, already ...

Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own
(and my fault)

* RE: [patch] printk subsystems
@ 2003-04-22 22:53 Perez-Gonzalez, Inaky
  2003-04-23  3:58 ` Tom Zanussi
  0 siblings, 1 reply; 52+ messages in thread
From: Perez-Gonzalez, Inaky @ 2003-04-22 22:53 UTC (permalink / raw)
  To: 'Tom Zanussi'
  Cc: 'karim@opersys.com', 'linux-kernel@vger.kernel.org'


> From: Tom Zanussi [mailto:zanussi@us.ibm.com]
> Perez-Gonzalez, Inaky writes:
>  > > From: Tom Zanussi [mailto:zanussi@us.ibm.com]
>  > >
>  > > In relayfs, the event can be generated directly into the space
>  > > reserved for it - in fact this is exactly what LTT does.  There aren't
>  > > two separate steps, one 'generating' the event and another copying it
>  > > to the relayfs buffer, if that's what you mean.
>  >
>  > In this case, what happens if the user space, through mmap, copies
>  > while the message is half-baked (ie, from another CPU) ... won't it
>  > be inconsistent?
> 
> There's a count kept, per sub-buffer, that's updated after each write.
> If this count doesn't match the expected size of the sub-buffer, the
> reader can ignore the incomplete buffer and come back to it later.
> The count is maintained automatically by relay_write(); if you're
> writing directly into the channel as LTT does though, part of the task
> is to call relay_commit() after the write, which updates the count and
> maintains consistency.

Hmmm, scratch, scratch ... there is something I still don't get here.
I am in lockless_commit() - from what you say, and what I read, I would
expect the length of the sub-buffer to be mapped to user space,
so I can memcpy out of the mmapped area and then take only the part that
is guaranteed to be consistent. But the atomic_add() is done on
rchan->scheme.lockless.fillcount[buffer_number]. So I don't see how
that count pops out to user space, as rchan->buf to rchan->buf + rchan->
alloc_size is what is mapped, and rchan is a kernel-only struct that
is not exposed through mmap().

Where is the Easter bunny I am missing? Would you mind giving a little
bit of pseudocode? I am trying to understand how to do this:

/* in userspace */

char *buf;
int fd; /* for the channel */
int log_fd; /* for permanent storage */

/* open, blah ... */

buf = mmap (... fd ... 1M ...);

if (new stuff is ready /* how to detect, select() on fd? */) {
	/* bring the data to safety, copy only what is consistent */
	real_size = /* or where do I get this from? */
	write (log_fd, buf, real_size);
}

And then, once I have this, next time I read I don't want to read
what I already did; I guess I can advance my buf pointer to
buf+real_size, but then how do I wrap around - meaning, how do I
detect when I have to wrap?

>  > Yes, you have to guarantee the existence of the event data structures
>  > (the 'struct kue', the embedded 'struct kue_user' and the event data
>  > itself); if they are embedded into another structure that will dissa-
>  > pear, you can choose to:
>  > ...
> 
> Well, kmalloc() seems like the most straightforward and convenient way
> of managing space for all these individual events, if not the most
> efficient.  Are you thinking that sub-allocating them out of a larger
> buffer might make more sense, for instance?  If so, I'd suggest
> relayfs for that. ;-) Just kidding, ...

Good try :) As I said somewhere else, that'd be up to the client. Wanna
use kmalloc()? or kmem_cache_alloc()? or something else? I guess it'd 
be convenient to provide a pre-implemented circular buffer thingie ready
to use.

I guess the suballocation makes sense when you have a fixed message size
and you want to optimize the allocation; for that matter, kmalloc is no
different from any other pool, as it is just a set of pools of
power-of-two sizes. In some sense, you are doing the same in relayfs,
managing a kind of allocation pool, but one not as flexible (and thus
probably faster), because the usage model doesn't pose as many
requirements as the memory pools have.

Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own
(and my fault)


* RE: [patch] printk subsystems
@ 2003-04-22 19:02 Perez-Gonzalez, Inaky
  2003-04-22 19:03 ` H. Peter Anvin
  2003-04-22 21:52 ` Tom Zanussi
  0 siblings, 2 replies; 52+ messages in thread
From: Perez-Gonzalez, Inaky @ 2003-04-22 19:02 UTC (permalink / raw)
  To: 'Tom Zanussi', Perez-Gonzalez, Inaky
  Cc: 'karim@opersys.com', 'Martin Hicks',
	'Daniel Stekloff', 'Patrick Mochel',
	'Randy.Dunlap', 'hpa@zytor.com',
	'pavel@ucw.cz', 'jes@wildopensource.com',
	'linux-kernel@vger.kernel.org', 'wildos@sgi.com'



> From: Tom Zanussi [mailto:zanussi@us.ibm.com]
> 
> In relayfs, the event can be generated directly into the space
> reserved for it - in fact this is exactly what LTT does.  There aren't
> two separate steps, one 'generating' the event and another copying it
> to the relayfs buffer, if that's what you mean.

In this case, what happens if user space, through mmap, copies
while the message is half-baked (ie, still being written from another
CPU) ... won't it be inconsistent?

> Well, I'm not sure I understand the details of kue all that well, so
> let me know if I'm missing something, but for kue events to really be
> self-contained, wouldn't the data need to be copied into the event
> unless the data structure containing them was guaranteed to exist
> until the event was disposed of one way or another?

Yes, you have to guarantee the existence of the event data structures
(the 'struct kue', the embedded 'struct kue_user' and the event data
itself); if they are embedded into another structure that will
disappear, you can choose to:

(a) recall the event [if it is recallable or makes sense to do so]
(b) dynamically allocate the event header and data, generate it 
    into that dynamic space.
(c) dynamically allocate and copy [slow]

(this works now; however, once I finish the destructor code, it
will give you the flexibility to use other stuff than just kmalloc()).

You can play many tricks here, but that depends on your needs,
requirements and similar stuff.

> backing data structure was guaranteed to exist, wouldn't it need to be
> static (unchanging) for the data to mean the same thing when it was
> delivered as when it was logged?  If the data needs to be copied to

Of course; I am assuming the client knows that (this is a must for
static allocation, similar in problem to giving out a string
from inside your code). If the client is not willing to do that, it
has to dynamically allocate and provide a destructor.

> the event to make it self-contained, then you're actually doing 2
> copies per event - the first to the event, the second the
> copy_to_user().

That I do in case (c), not in case (b). In most situations
you should be able to choose how to do it [and I guess most sensible
people would choose (b)].

> By contrast, relayfs is worst-case (and best-case) single copy
> into the relayfs buffer, which is allocated/freed once during its
> lifetime.

Sure; I'd just like to know how you are maintaining consistency for
the mmap stuff.


Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own
(and my fault)

* RE: [patch] printk subsystems
@ 2003-04-22 18:46 Perez-Gonzalez, Inaky
  2003-04-22 23:28 ` Karim Yaghmour
  0 siblings, 1 reply; 52+ messages in thread
From: Perez-Gonzalez, Inaky @ 2003-04-22 18:46 UTC (permalink / raw)
  To: 'karim@opersys.com', Perez-Gonzalez, Inaky
  Cc: 'Martin Hicks', 'Daniel Stekloff',
	'Patrick Mochel', 'Randy.Dunlap',
	'hpa@zytor.com', 'pavel@ucw.cz',
	'jes@wildopensource.com',
	'linux-kernel@vger.kernel.org', 'wildos@sgi.com',
	'Tom Zanussi'

> From: Karim Yaghmour [mailto:karim@opersys.com]
> 
> "Perez-Gonzalez, Inaky" wrote:
> > However, in relayfs that problem is shifted, unless I am missing
> > something. From what I know so far, you have to copy the
> > message to the relayfs buffer, right? So you have to generate the
> > message and then copy it to the channel with relay_write(). So
> > here is kue's copy_to_user() counterpart.
> 
> OK, so you are claiming that memcpy() == copy_to_user()?

Not ==, although you cannot deny that they do basically the same
thing: copy memory.

copy_to_user() has to do some more gymnastics in the process,
but basically the bulk is the same [at least judging by the asm
of __copy_user() in usercopy.c and __memcpy() in string.h
-- they are somewhat different, but in function they are/should
be the same - bar that copy_to_user() might sleep due to
paging-in and preemption and who knows what else].
 
> [If nothing else, please keep in mind that the memcpy() in question
> is to an rvmalloc'ed buffer.]

Good issue for caching ...

> Maybe I'm just old fashioned, but I usually want to provide a
> logging function with a pointer and a size parameter, and I want
> whatever I'm passing to it be placed in a secure repository where
> my own code couldn't touch it even if it went berserk.

That is a good point, which brought me last night to the following
doubt: how do you guarantee integrity of the data when reading with
mmap? In other words, if I am just copying the mmap region, how do
I know that what I am copying is safe enough, that it is not being
modified by CPU #2, for example? (Because user space and kernel space
will not share the locks; at most, user space can look at a couple of
markers that identify the bottom and the top of the "safe" buffer,
but there is no way to get both of them atomically.) Also, if it
is a circular buffer, is there a way for user space to know
when it wrapped around? I still don't get how mmap works all this
out (or is the buffer being moved under user space's feet?)

> Again, you are making assumptions regarding the usage of your mechanism.
> With relayfs, dropping a channel (even one that has millions upon millions
> of events) requires one rvfree().

Well, we all have to make certain assumptions, otherwise we'd be
having philosophical discussions about the squareness of the circle
for ever and ever in a for(;;) loop.

> Sorry, you don't get to see b, c, g, h, and i because something
> changed in the system and whatever wanted to send those over isn't
> running anymore. Maybe I'm just off the chart, but I'd rather see
> the first list of events.

As I explained below, you don't _have_ to drop it; however, in some
cases it makes sense to drop it because it is meaningless anyway (ie
the device-plugged message - why would I want user space to check
it if I know there is no device? - so I recall it). Errors are another
matter, and you don't want to recall those.

This is different from running out of space. Like it or not, if you
have a circular buffer with limited space and you run out ... moc!
you lose; drop something somewhere to make space for it. This is not
a kue limitation, this is a property of buffers: they fill up.

Now, if you want to make it resizable, that understands Japanese and
does double loops followed by a nose dive and a vertical climb up, 
well, that's up to the client of the API. And I didn't want to 
constraint the gymnastics that the client could do to handle a buffer.

> > However, there are two different concepts here. One is the event
> > that you want to send and recall it if not delivered by the time
> > it does not make sense anymore (think plug a device, then remove
> > it). The other kind is the event you want delivered really badly
> > (think a "message" like "the temperature of the nuclear reactor's
> > core has reached the high watermark").
> 
> I'm sorry, but the way I see printk() is that once I send something
> to it, it really ought to show up somewhere. Heck, I'm printk'ing
> it. If I plugged a device and the driver said "Your chair is
> on fire", I want to know about it whether the device has been
> unplugged later or not.

I would say this case, printk(), would fit in my second example,
wouldn't it? ... this is one message you want delivered, not recalled.

> > As I mentioned before, this kind-of-compensates-but-not-really
> > with the fact of having to generate the message and then copy
> > it to the channel.
> 
> That's the memcpy() == copy_to_user() again.

Please note the "kind-of-compensates-but-not-really"; then refer
to my first paragraph. 

> Nevertheless, if you want to measure scalability alone, try
> porting LTT to kue, and try running LMbench and co. LTT is very
> demanding in terms of buffering (indeed, I'll go a step further and
> claim that it is the most demanding application in terms of
> buffering) and already runs on relayfs.

Got it, thanks :)

Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own
(and my fault)

* RE: [patch] printk subsystems
@ 2003-04-22  4:02 Perez-Gonzalez, Inaky
  2003-04-22  5:52 ` Karim Yaghmour
  2003-04-22  6:04 ` Tom Zanussi
  0 siblings, 2 replies; 52+ messages in thread
From: Perez-Gonzalez, Inaky @ 2003-04-22  4:02 UTC (permalink / raw)
  To: 'karim@opersys.com', Perez-Gonzalez, Inaky
  Cc: 'Martin Hicks', 'Daniel Stekloff',
	'Patrick Mochel', 'Randy.Dunlap',
	'hpa@zytor.com', 'pavel@ucw.cz',
	'jes@wildopensource.com',
	'linux-kernel@vger.kernel.org', 'wildos@sgi.com',
	'Tom Zanussi'


> From: Karim Yaghmour [mailto:karim@opersys.com]
> 
> Consider the following:
> 1) kue_read() has a while(1) which loops around and delivers messages
> one-by-one (to the best of my understanding of the code you posted).

That's right.

> Hence, delivery time increases with the number of events. In contrast,
> relayfs can deliver tens of thousands of events in a single shot.

Sure, I agree with that - it is the price to pay for less code and
not having to copy on the kernel side. As I mentioned before to Tom,
I don't think that in the long run the overhead will be that much,
although it is true that it will depend a lot on the average size of
the messages and how many you deliver. If we are delivering two-byte
messages at a 1M/second rate, the efficiency goes down, and with
it the scalability.

However, in relayfs that problem is shifted, unless I am missing
something. From what I know so far, you have to copy the
message to the relayfs buffer, right? So you have to generate the
message and then copy it to the channel with relay_write(). So
here is kue's copy_to_user() counterpart.

If there were a way to reserve the space in relayfs, so that then
I can generate the message straight over there, that scalability
problem would be gone.

> 2) by having to maintain next and prev pointers, kue consumes more
> memory than relayfs (at least 8 bytes/message more actually, on a
> 32-bit machine.) For large messages, the impact is negligible, but
> the smaller the messages the bigger the overhead.

True; I would say most messages are going to be at least
thirty-something bytes in length. I don't think there is an
estimated average message size, right?

> 3) by having to go through the next/prev pointers, accessing message
> X requires reading all messages before it. This can be simplified

Not really, because access is sequential, one shot. Once you have
read it, there it goes. So you always keep a pointer to the next
msg that you are going to read.

> are used. [Other kue calls are also handicapped by similar problems,
> such as the deletion of the entire list.]

Yes, this is hopelessly O(N), but creation or deletion of an entire
list is not such a common thing (I would say that otherwise it would
imply some design considerations in the system that should be
revised).

Same thing goes for the closure of a file descriptor, which has
to go over the list deciding whom to send to the gallows. Need to
work on that though.

> > That's the difference. I don't intend to have that. The data
> > storage can be reused or not, that is up to the client of the
> > kernel API. They still can reuse it if needed by reclaiming the
> > event (recall_event), refilling the data and re-sending it.
> 
> Right, but by reusing the event, older data is thereby destroyed
> (undelivered). Which comes back to what I (and others) have been
> saying: kue requires the sender's data structures to exist until
> their content is delivered.

That's it, that is the principle of a circular buffer. AFAIK, you
have the same in relayfs: old stuff gets dropped. Period. And the
sender's data structures don't really need to exist forever,
because the event is self-contained. The only part that might
be a problem is the destructor [if, for example, you were a module,
the destructor code was in the module, and the module unloaded
without recalling pending events first - the kind of thing
you should never do; either use a non-modular destructor or make
sure the module count is not decreased until all the events have
been delivered or recalled ... simple to do too].

However, there are two different concepts here. One is the event
that you want to send and recall it if not delivered by the time
it does not make sense anymore (think plug a device, then remove
it). The other kind is the event you want delivered really badly
(think a "message" like "the temperature of the nuclear reactor's
core has reached the high watermark").

First one is the kind of event you would want to plug into the
data structures (or not, you might also kmalloc() it) and then,
when it stops making sense, recall it.

Second one, you don't care what happens after you are not there.
The code just had to set it on its way home.

The key is that the client of the API can control that behavior.

> Right, but then you have 2 layers of buffering/queing instead
> of a single one.

No, I just have one layer of buffering/queuing, because I still have
to generate the message and put it somewhere - and I generate it
directly into what is going to be delivered.

> Right, but kue has to loop through the queue to deliver the messages
> one-by-one. The more messages there are, the longer the delivery time.
> Not to mention that you first have to copy it to user-space before
> the reader can do write() to put it to permanent storage. With relafys,
> you just do write() and you're done.

As I mentioned before, this kind-of-compensates-but-not-really
with the fact of having to generate the message and then copy
it to the channel.

I think that at this point it'd be interesting to run something
like a benchmark [once I finish the destructor code]; however,
it is going to be fairly difficult to test both implementations
on equal grounds. Any ideas?

Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own
(and my fault)

* RE: [patch] printk subsystems
@ 2003-04-22  3:04 Perez-Gonzalez, Inaky
  2003-04-22  6:00 ` Tom Zanussi
  0 siblings, 1 reply; 52+ messages in thread
From: Perez-Gonzalez, Inaky @ 2003-04-22  3:04 UTC (permalink / raw)
  To: 'Tom Zanussi'
  Cc: 'karim@opersys.com', 'Martin Hicks',
	'Daniel Stekloff', 'Patrick Mochel',
	'Randy.Dunlap', 'hpa@zytor.com',
	'pavel@ucw.cz', 'jes@wildopensource.com',
	'linux-kernel@vger.kernel.org', 'wildos@sgi.com',
	'Tom Zanussi'


> From: Tom Zanussi [mailto:zanussi@us.ibm.com]
> 
> It seems to me that when comparing apples to apples, namely
> considering the complete lifecycle of an event, ... <snip>
> 
> While kue_send_event() in itself is very simple and efficient, it's
> only part of the story, the other parts being the copy_to_user() ...

Agreed - my mistake here in the comparison for leaving out that stuff.

> event.  While kue can avoid this kernel-side copy, it's not possible
> for it to avoid the copy_to_user() since its design precludes mmapping
> the kernel data.  Again, six of one, half dozen of another.  kue looks

Sure - those things, I would say, compensate for one another,
except for that mmap() detail, which pushes the balance towards relayfs
regarding effectiveness when delivering the messages; I think that
in the end the difference should not be too big, as the copying of
the data to user space in kue should be roughly compensated by the
copying of the data to the relayfs buffer; after all, a copy is a copy.
No data to back this claim though, I am just thinking out loud through
a mental schematic of the lifetime of a bit in both systems.

Or, again, I am missing something ...

Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own
(and my fault)

* RE: [patch] printk subsystems
@ 2003-04-22  2:49 Perez-Gonzalez, Inaky
  2003-04-22  4:34 ` Karim Yaghmour
  0 siblings, 1 reply; 52+ messages in thread
From: Perez-Gonzalez, Inaky @ 2003-04-22  2:49 UTC (permalink / raw)
  To: 'Tom Zanussi'
  Cc: 'karim@opersys.com', 'Martin Hicks',
	'Daniel Stekloff', 'Patrick Mochel',
	'Randy.Dunlap', 'hpa@zytor.com',
	'pavel@ucw.cz', 'jes@wildopensource.com',
	'linux-kernel@vger.kernel.org', 'wildos@sgi.com'

 
> From: Tom Zanussi
>
> relayfs actually uses 2 mutually-exclusive schemes internally -
> 'lockless' and 'locking', depending on the availability of a cmpxchg
> instruction (lockless needs cmpxchg).  If the lockless scheme is being
> used, relay_lock_channel() does no locking or irq disabling of any
> kind i.e. it's basically a no-op in that case.  

So that means you are using cmpxchg to do the locking. I mean, not the
"locking" itself, but a similar process to that of locking. I see. 

However, isn't it almost the same as spinlocking? You are basically
trying to "allocate" a channel idx with an atomic cmpxchg; if it fails,
you retry, spinning on the retry code until successful.

Not meaning to be a smartass here, but I don't buy the "lockless" tag;
I would agree it is an optimized-lock scheme [assuming it works better
than the spinlock case, which I am sure it does, because otherwise you
guys would not have gone through the process of implementing it], but
it is not lockless.

Although it is not that important, no need to make a fuss out of that :)

Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own
(and my fault)

* RE: [patch] printk subsystems
@ 2003-04-21 18:42 Perez-Gonzalez, Inaky
  0 siblings, 0 replies; 52+ messages in thread
From: Perez-Gonzalez, Inaky @ 2003-04-21 18:42 UTC (permalink / raw)
  To: 'H. Peter Anvin', Perez-Gonzalez, Inaky
  Cc: 'Greg KH', 'karim@opersys.com',
	'Martin Hicks', 'Daniel Stekloff',
	'Patrick Mochel', 'Randy.Dunlap',
	'pavel@ucw.cz', 'jes@wildopensource.com',
	'linux-kernel@vger.kernel.org', 'wildos@sgi.com',
	'Tom Zanussi'

> From: H. Peter Anvin [mailto:hpa@zytor.com]
> 
> Perez-Gonzalez, Inaky wrote:
> >
> > Hey! Come on! You don't think I am that lame, do you? Man,
> > what a reputation I have!
> >
> > Before the device vaporizes, it recalls the message, so there is
> > no message to read - the same way you take away the sysfs data from
> > the sysfs tree ...
> 
> If you think that will happen with printk(), then, quite frankly, you're
> seriously deluded.

I am kind of deluded, that's for sure. And sore too, but that's another
story.

I tend to agree with you; however, it can be done. You would need to
adapt a circular buffer to work with kue. Not a big deal though - just
a space allocator (that would recall the oldest messages when in need
of space), and the 'destructor' would just clear the space.

If I get to modify the code to make the destructor thing work (any of these
days), then it will be possible to do it without modifying kue at all.

Now that I think about it, it would work - but I don't think it'd be
really worth it (the per-message overhead would be big for printk,
I'd say). For the record, I really think relayfs could be a better
answer [with the limited reading I have done about it so far].

Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own
(and my fault)

* RE: [patch] printk subsystems
@ 2003-04-21 18:23 Perez-Gonzalez, Inaky
  2003-04-21 18:30 ` H. Peter Anvin
  0 siblings, 1 reply; 52+ messages in thread
From: Perez-Gonzalez, Inaky @ 2003-04-21 18:23 UTC (permalink / raw)
  To: 'Greg KH', Perez-Gonzalez, Inaky
  Cc: 'karim@opersys.com', 'Martin Hicks',
	'Daniel Stekloff', 'Patrick Mochel',
	'Randy.Dunlap', 'hpa@zytor.com',
	'pavel@ucw.cz', 'jes@wildopensource.com',
	'linux-kernel@vger.kernel.org', 'wildos@sgi.com',
	'Tom Zanussi'



> From: Greg KH [mailto:greg@kroah.com]
> 
> > Yep, that is the point, and it is small enough (5 ulongs) that
> > it can be embedded anywhere without being of high impact and
> > having to allocate it [first example that comes to mind is
> > for sending a device connection message; you can embed a short
> > message in the device structure and query that for delivery;
> > no buffer, no nothing, the data straight from the source].
> 
> And the device is removed from the system, the memory for that device is
> freed, and then a user comes along and trys to read that message.
> 
> oops...  :)

Hey! Come on! You don't think I am that lame, do you? Man,
what a reputation I have!

Before the device vaporizes, it recalls the message, so there is 
no message to read - the same way you take away the sysfs data from
the sysfs tree ...

Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own
(and my fault)

* RE: [patch] printk subsystems
@ 2003-04-17 19:58 Perez-Gonzalez, Inaky
  2003-04-17 20:34 ` Karim Yaghmour
  0 siblings, 1 reply; 52+ messages in thread
From: Perez-Gonzalez, Inaky @ 2003-04-17 19:58 UTC (permalink / raw)
  To: 'karim@opersys.com', 'Martin Hicks'
  Cc: 'Daniel Stekloff', 'Patrick Mochel',
	'Randy.Dunlap', 'hpa@zytor.com',
	'pavel@ucw.cz', 'jes@wildopensource.com',
	'linux-kernel@vger.kernel.org', 'wildos@sgi.com',
	'Tom Zanussi'


> From: Karim Yaghmour [mailto:karim@opersys.com]
>
> I beg to differ. There's a point where we've got to stop saying "oh,
> this buffering mechanism is special and it requires its own code."
> relayfs is there to provide a unified light-weight mechanism for
> transferring large amounts of data from the kernel to user space.

But you don't need to provide buffers, because normally the data
is already in the kernel - so why copy it to another buffer
for delivery?

That's the point I tried to address with the kue patches I posted
last week - once you have the data, wherever it lives, you just
queue it for delivery, and give the delivery subsystem a means to
destroy it once it has been delivered (and thus is not needed
anymore) [currently I only support kfree(), but I plan to add a
destructor function that can double as a delivery callback].

This is where I think relayfs is doing too much, and that is the
reason why I implemented the kue stuff. It is very lightweight
and does almost the same thing [of course, it is not bidirectional,
but then nobody has asked for that].
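The shape of that scheme can be sketched in a few lines. This is a
userspace model of the idea only, not the posted kue code - all names
(kue_event, kue_post, kue_deliver) are hypothetical, and the destructor
stands in for kfree() or the planned callback:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace model of the idea: a queued event carries a pointer to
 * data that already lives somewhere, plus a destructor invoked once
 * the event has been delivered.  No intermediate buffer is allocated
 * for the payload itself. */
struct kue_event {
	struct kue_event *next;
	const void *data;		/* points at existing data */
	size_t len;
	void (*destroy)(void *);	/* e.g. kfree, or a callback */
};

struct kue_queue {
	struct kue_event *head, **tail;
};

static void kue_init(struct kue_queue *q)
{
	q->head = NULL;
	q->tail = &q->head;
}

static void kue_post(struct kue_queue *q, struct kue_event *ev)
{
	ev->next = NULL;
	*q->tail = ev;
	q->tail = &ev->next;
}

/* Deliver one event: copy the payload out, then let the owner
 * reclaim it; delivery doubles as the destructor callback. */
static size_t kue_deliver(struct kue_queue *q, char *buf, size_t buflen)
{
	struct kue_event *ev = q->head;
	size_t n;

	if (!ev)
		return 0;
	q->head = ev->next;
	if (!q->head)
		q->tail = &q->head;
	n = ev->len < buflen ? ev->len : buflen;
	memcpy(buf, ev->data, n);
	if (ev->destroy)
		ev->destroy(ev);
	return n;
}
```

The point of the singly linked queue with a tail pointer is that posting
never copies or allocates: the event structure can be embedded in the
object that owns the data, as in the device-structure example elsewhere
in this thread.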

Cheers,

Iñaky Pérez-González -- Not speaking for Intel -- all opinions are my own
(and my fault)

^ permalink raw reply	[flat|nested] 52+ messages in thread
* Re: [patch] printk subsystems
@ 2003-04-08 23:15 Chuck Ebbert
  0 siblings, 0 replies; 52+ messages in thread
From: Chuck Ebbert @ 2003-04-08 23:15 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-kernel

Pavel Machek wrote:

> Well, I think we should first kill all crappy messages -- that
> benefits everyone.


 ...and _I_ want a bootverbose option like FreeBSD has.

 The default should be for each driver to print one line when
it initializes so you know it's there...

--
 Chuck
 I am not a number!

^ permalink raw reply	[flat|nested] 52+ messages in thread
* [patch] printk subsystems
@ 2003-04-07 20:13 Martin Hicks
  2003-04-08 18:41 ` Pavel Machek
  2003-04-11 19:21 ` Martin Hicks
  0 siblings, 2 replies; 52+ messages in thread
From: Martin Hicks @ 2003-04-07 20:13 UTC (permalink / raw)
  To: linux-kernel; +Cc: hpa, wildos


Hello,

In an effort to get greater control over which printk()s are logged
during boot and after, I've put together this patch that introduces the
concept of printk subsystems.  The problem that some are beginning to
face with larger machines is that certain subsystems are overly verbose
(SCSI, USB, CPU-related messages on large NUMA or SMP machines)
and they overflow the buffer.  Making the log buffer bigger is a
stopgap; I think this is a more elegant solution.

Basically, each printk is assigned to a subsystem and that subsystem has
the same set of values that the console_printk array has.  The
difference is that the console_printk loglevel decides if the message
goes to the console whereas the subsystem loglevel decides if that
message goes to the log at all.
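The two gates can be modeled in a few lines. This is only a sketch of
the filtering rule described above, not the patch code; the threshold
semantics follow the patch's `level < printk_subsystem[i][0]` test and
the usual console_loglevel comparison:

```c
#include <assert.h>

/* A message with priority `level` (0 = most severe, 7 = KERN_DEBUG)
 * must pass the subsystem gate to enter the log at all, and
 * additionally the console gate to reach the console. */
static int message_is_logged(int level, int subsys_loglevel)
{
	return level < subsys_loglevel;
}

static int message_hits_console(int level, int subsys_loglevel,
				int console_loglevel)
{
	return message_is_logged(level, subsys_loglevel)
	    && level < console_loglevel;
}
```

With the patch's DEFAULT_SUBSYS_LOGLEVEL of 8, every message still
enters the log, so behavior is unchanged until an admin lowers a
subsystem's threshold.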

This patch implements the core, but I haven't yet put in the facilities
to change the default values that are used at compile-time.  I'm looking
for opinions on the architecture, not the completeness.  I plan to add
configuration through sysctl.

To use the feature you simply add the subsystem identifier to the printk
call, much the same way that you add the priority tag:

printk(PRINTK_NET KERN_INFO "This is a printk from the net subsys\n");

Each subsystem has a default KERN_* priority, and if no subsystem is
given then the printk is put into the PRINTK_UNASS queue (which is set
up by default to log all messages).
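Since PRINTK_NET and KERN_INFO are both string literals, the compiler
concatenates them into a single "<D><6>..." prefix. A userspace sketch
of the tag parsing (mirroring the two-iteration loop in the patched
vprintk, with the letter range tightened to the five subsystems the
patch defines; `parse_tags` itself is a hypothetical name):

```c
#include <assert.h>
#include <string.h>

#define PRINTK_NET "<D>"	/* as defined in the patch */
#define KERN_INFO  "<6>"

/* Parse up to two leading "<X>" tags: digits 0-7 set the message
 * level, letters A-E select a subsystem.  Returns a pointer past the
 * tags; -1 / 0 are reported when a tag is absent, so the caller can
 * substitute the subsystem's default message loglevel. */
static const char *parse_tags(const char *p, int *level, int *subsys)
{
	int i;

	*level = -1;
	*subsys = 0;			/* 'A' - unassigned */
	for (i = 0; i < 2; i++) {
		if (p[0] != '<' || p[1] == '\0' || p[2] != '>')
			break;
		if (p[1] >= '0' && p[1] <= '7')
			*level = p[1] - '0';
		else if (p[1] >= 'A' && p[1] <= 'E')
			*subsys = p[1] - 'A';
		else
			break;
		p += 3;
	}
	return p;
}
```

Either tag may appear alone and in either order, which is why the loop
runs twice rather than expecting a fixed prefix layout.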

The patch is against 2.5.66.

Opinions and comments are welcome.
mh

-- 
Martin Hicks  ||  mort@bork.org  || PGP/GnuPG: 0x4C7F2BEE
plato up 6 days,  6:33, 14 users,  load average: 0.09, 0.11, 0.09
Beer: So much more than just a breakfast drink.



diff -X /home/mort/diff-exclude -uEr linux-2.5.66.pristine/include/linux/kernel.h linux-2.5.66/include/linux/kernel.h
--- linux-2.5.66.pristine/include/linux/kernel.h	2003-03-17 16:43:37.000000000 -0500
+++ linux-2.5.66/include/linux/kernel.h	2003-04-07 14:33:05.000000000 -0400
@@ -47,6 +47,18 @@
 #define minimum_console_loglevel (console_printk[2])
 #define default_console_loglevel (console_printk[3])
 
+/* Printk subsystem identifiers */
+#define PRINTK_UNASS    "<A>"   /* unassigned printk subsystem          */
+#define PRINTK_CORE     "<B>"   /* from the core kernel                 */
+#define PRINTK_SCSI     "<C>"   /* from the SCSI subsystem              */
+#define PRINTK_NET      "<D>"   /* from the Net subsystem               */
+#define PRINTK_USB      "<E>"   /* from the USB subsystem               */
+
+#define FIRST_PRINTK_SUBSYS PRINTK_UNASS[1]
+#define LAST_PRINTK_SUBSYS PRINTK_USB[1]
+
+extern int printk_subsystem[5][4];
+
 struct completion;
 
 #ifdef CONFIG_DEBUG_SPINLOCK_SLEEP
@@ -102,6 +114,62 @@
 		console_loglevel = 15;
 }
 
+static inline int decode_subsys(char *subsys) 
+{
+        if (subsys[1] >= FIRST_PRINTK_SUBSYS &&
+            subsys[1] <= LAST_PRINTK_SUBSYS)
+                return subsys[1] - FIRST_PRINTK_SUBSYS;
+        return -1;
+}
+
+/* returns the log threshold for a given subsystem.  -1 on error.  */
+static inline int get_subsys_loglevel(char *subsys)
+{
+        int index;
+        if ((index = decode_subsys(subsys)) != -1)
+                return printk_subsystem[index][0];
+        return -1;
+}
+
+/* sets the log threshold for a given subsystem.  
+ * returns 0 if everything is okay, -1 if an error is encountered. */
+static inline int set_subsys_loglevel(char *subsys, int level)
+{
+        int index;
+        if (level < 0 || level > 8)
+                return -1;
+        if ((index = decode_subsys(subsys)) != -1) {
+                if (level < printk_subsystem[index][2])
+                        level = printk_subsystem[index][3];
+                printk_subsystem[index][0] = level;
+                return 0;
+        }
+        return -1;
+}
+
+/* returns the default message level for a given subsystem.  -1 on error */
+static inline int get_subsys_msglevel(char *subsys)
+{
+        int index;
+        if ((index = decode_subsys(subsys)) != -1)
+                return printk_subsystem[index][1];
+        return -1;
+}
+
+/* sets the default message level for a given subsystem.
+ * returns 0 if everything is okay, -1 if an error is encountered */
+static inline int set_subsys_msglevel(char *subsys, int level)
+{
+        int index;
+        if (level < 0 || level > 7)
+                return -1;
+        if ((index = decode_subsys(subsys)) != -1) {
+                printk_subsystem[index][1] = level;
+                return 0;
+        }
+        return -1;
+}
+
 extern void bust_spinlocks(int yes);
 extern int oops_in_progress;		/* If set, an oops, panic(), BUG() or die() is in progress */
 
Only in linux-2.5.66.pristine/include/sound: pcm_sgbuf.h
diff -X /home/mort/diff-exclude -uEr linux-2.5.66.pristine/kernel/printk.c linux-2.5.66/kernel/printk.c
--- linux-2.5.66.pristine/kernel/printk.c	2003-03-17 16:44:50.000000000 -0500
+++ linux-2.5.66/kernel/printk.c	2003-04-07 14:54:33.356776808 -0400
@@ -42,6 +42,9 @@
 #define MINIMUM_CONSOLE_LOGLEVEL 1 /* Minimum loglevel we let people use */
 #define DEFAULT_CONSOLE_LOGLEVEL 7 /* anything MORE serious than KERN_DEBUG */
 
+#define MINIMUM_SUBSYS_LOGLEVEL 1
+#define DEFAULT_SUBSYS_LOGLEVEL 8
+
 DECLARE_WAIT_QUEUE_HEAD(log_wait);
 
 int console_printk[4] = {
@@ -51,6 +54,18 @@
 	DEFAULT_CONSOLE_LOGLEVEL,	/* default_console_loglevel */
 };
 
+int printk_subsystem[5][4] = {  /*  [][0] == subsystem log level
+                                 *  [][1] == default message loglevel
+                                 *  [][2] == minimum subsystem loglevel
+                                 *  [][3] == default subsystem loglevel */
+        [0 ... 4] = { 
+                DEFAULT_SUBSYS_LOGLEVEL,
+                DEFAULT_MESSAGE_LOGLEVEL,
+                MINIMUM_SUBSYS_LOGLEVEL,
+                DEFAULT_SUBSYS_LOGLEVEL
+        }
+};
+
 int oops_in_progress;
 
 /*
@@ -390,10 +405,11 @@
 {
 	va_list args;
 	unsigned long flags;
-	int printed_len;
+	int printed_len, msg_log_level, msg_subsystem, i;
 	char *p;
 	static char printk_buf[1024];
-	static int log_level_unknown = 1;
+	static int begin_message = 1;
+        
 
 	if (oops_in_progress) {
 		/* If a crash is occurring, make sure we can't deadlock */
@@ -409,23 +425,44 @@
 	va_start(args, fmt);
 	printed_len = vsnprintf(printk_buf, sizeof(printk_buf), fmt, args);
 	va_end(args);
-
+        
 	/*
-	 * Copy the output into log_buf.  If the caller didn't provide
-	 * appropriate log level tags, we insert them here
+	 * Copy the output into log_buf.
 	 */
-	for (p = printk_buf; *p; p++) {
-		if (log_level_unknown) {
-			if (p[0] != '<' || p[1] < '0' || p[1] > '7' || p[2] != '>') {
+        p = printk_buf;
+	while (*p) {
+		if (begin_message) {
+                        /* Figure out if there is zero, one or two flags */
+                        msg_log_level = -1;
+                        msg_subsystem = 0;  /* A - Unassigned */
+                        for (i = 0; i < 2; i++) {
+				if (p[0] == '<' && p[2] == '>') {
+                                	if (p[1] >= '0' && p[1] <= '7')
+                                        	msg_log_level = p[1] - '0';
+                                	if (p[1] >= 'A' && p[1] <= 'G')
+                                        	msg_subsystem = p[1] - 'A';
+				} else 
+					break;
+				p+=3;
+			}
+
+                        /* Decide if we print this message at all */
+                        if (msg_log_level == -1)
+                                msg_log_level = printk_subsystem[msg_subsystem][1];
+                                
+                        if (msg_log_level < printk_subsystem[msg_subsystem][0]) {
+                                begin_message = 0;
 				emit_log_char('<');
-				emit_log_char(default_message_loglevel + '0');
+                                emit_log_char(msg_log_level + '0');
 				emit_log_char('>');
+                        } else { // Get out of this loop.  Don't log anything.
+                                break;
 			}
-			log_level_unknown = 0;
 		}
 		emit_log_char(*p);
 		if (*p == '\n')
-			log_level_unknown = 1;
+			begin_message = 1;
+                p++;
 	}
 
 	if (!cpu_online(smp_processor_id())) {

^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2003-04-24 18:58 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-04-22  5:09 [patch] printk subsystems Perez-Gonzalez, Inaky
2003-04-24 18:22 ` bob
  -- strict thread matches above, loose matches on Subject: below --
2003-04-24 18:56 Manfred Spraul
2003-04-24 19:10 ` bob
2003-04-23  0:28 Perez-Gonzalez, Inaky
2003-04-22 22:53 Perez-Gonzalez, Inaky
2003-04-23  3:58 ` Tom Zanussi
2003-04-22 19:02 Perez-Gonzalez, Inaky
2003-04-22 19:03 ` H. Peter Anvin
2003-04-22 21:52 ` Tom Zanussi
2003-04-22 18:46 Perez-Gonzalez, Inaky
2003-04-22 23:28 ` Karim Yaghmour
2003-04-22  4:02 Perez-Gonzalez, Inaky
2003-04-22  5:52 ` Karim Yaghmour
2003-04-22  6:04 ` Tom Zanussi
2003-04-22  3:04 Perez-Gonzalez, Inaky
2003-04-22  6:00 ` Tom Zanussi
2003-04-22  2:49 Perez-Gonzalez, Inaky
2003-04-22  4:34 ` Karim Yaghmour
2003-04-21 18:42 Perez-Gonzalez, Inaky
2003-04-21 18:23 Perez-Gonzalez, Inaky
2003-04-21 18:30 ` H. Peter Anvin
2003-04-17 19:58 Perez-Gonzalez, Inaky
2003-04-17 20:34 ` Karim Yaghmour
2003-04-17 21:03   ` Perez-Gonzalez, Inaky
2003-04-17 21:37     ` Tom Zanussi
2003-04-18  7:21     ` Tom Zanussi
2003-04-18  7:42     ` Greg KH
2003-04-21 15:56     ` Karim Yaghmour
2003-04-08 23:15 Chuck Ebbert
2003-04-07 20:13 Martin Hicks
2003-04-08 18:41 ` Pavel Machek
2003-04-08 20:02   ` Jes Sorensen
2003-04-08 21:02     ` Pavel Machek
2003-04-08 21:10       ` H. Peter Anvin
2003-04-08 21:57         ` Pavel Machek
2003-04-08 22:02           ` Jes Sorensen
2003-04-08 22:05           ` H. Peter Anvin
2003-04-08 22:55             ` Martin Hicks
2003-04-08 23:10               ` Randy.Dunlap
2003-04-14 18:33                 ` Patrick Mochel
2003-04-14 22:33                   ` Daniel Stekloff
2003-04-16 18:42                     ` Patrick Mochel
2003-04-16 12:35                       ` Daniel Stekloff
2003-04-16 19:16                       ` Martin Hicks
2003-04-16 12:43                         ` Daniel Stekloff
2003-04-17 15:56                           ` Martin Hicks
2003-04-17 13:58                             ` Karim Yaghmour
2003-04-15 13:27                   ` Martin Hicks
2003-04-15 14:40                     ` Karim Yaghmour
2003-04-08 22:00       ` Jes Sorensen
2003-04-11 19:21 ` Martin Hicks

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).