linux-kernel.vger.kernel.org archive mirror
From: Karim Yaghmour <karim@opersys.com>
To: "Perez-Gonzalez, Inaky" <inaky.perez-gonzalez@intel.com>
Cc: "'Martin Hicks'" <mort@wildopensource.com>,
	"'Daniel Stekloff'" <dsteklof@us.ibm.com>,
	"'Patrick Mochel'" <mochel@osdl.org>,
	"'Randy.Dunlap'" <rddunlap@osdl.org>,
	"'hpa@zytor.com'" <hpa@zytor.com>,
	"'pavel@ucw.cz'" <pavel@ucw.cz>,
	"'jes@wildopensource.com'" <jes@wildopensource.com>,
	"'linux-kernel@vger.kernel.org'" <linux-kernel@vger.kernel.org>,
	"'wildos@sgi.com'" <wildos@sgi.com>,
	"'Tom Zanussi'" <zanussi@us.ibm.com>
Subject: Re: [patch] printk subsystems
Date: Tue, 22 Apr 2003 01:52:44 -0400	[thread overview]
Message-ID: <3EA4D8AC.BF01743A@opersys.com>
In-Reply-To: A46BBDB345A7D5118EC90002A5072C780C263841@orsmsx116.jf.intel.com


"Perez-Gonzalez, Inaky" wrote:
> However, in relayfs that problem is shifted, unless I am missing
> something. For what I know, so far, is that you have to copy the
> message to the relayfs buffer, right? So you have to generate the
> message and then copy it to the channel with relay_write(). So
> here is kue's copy_to_user() counterpart.

OK, so you are claiming that memcpy() == copy_to_user()?

[If nothing else, please keep in mind that the memcpy() in question
is to an rvmalloc'ed buffer.]

> If there were a way to reserve the space in relayfs, so that then
> I can generate the message straight over there, that scalability
> problem would be gone.

I don't see a scalability problem with relayfs.

Maybe I'm just old fashioned, but I usually want to provide a
logging function with a pointer and a size parameter, and I want
whatever I'm passing to it to be placed in a secure repository
where my own code couldn't touch it even if it went berserk.

In other words, if I'm coding a driver and my code isn't stable
yet, I want the printks generated by my sane code to survive,
even if my insane code came later and destroyed all the data
structures I depend on. If printk still has to rely on my insane
code for its data, then I'm probably going to waste some time
finding my problem.

> > 2) by having to maintain next and prev pointers, kue consumes more
> > memory than relayfs (at least 8 bytes/message more actually, on a
> > 32-bit machine.) For large messages, the impact is negligible, but
> > the smaller the messages the bigger the overhead.
> 
> True; I would say most messages are going to be at least 30
> something bytes in length. I don't think there is an
> estimated average message size, right?

relayfs doesn't make any assumptions on this issue.

> > 3) by having to go through the next/prev pointers, accessing message
> > X requires reading all messages before it. This can be simplified
> 
> Not really, because access is sequential, one shot. Once you have
> read it, there it goes. So you always keep a pointer to the next
> msg that you are going to read.

You are assuming sequential reading, relayfs doesn't.

> > are used. [Other kue calls are also handicapped by similar problems,
> > such as the deletion of the entire list.]
> 
> Yes, this is hopelessly O(N), but creation or deletion of an entire
> list is not such a common thing (I would say that otherwise would
> imply some design considerations that should be revised in the
> system).

Again, you are making assumptions regarding the usage of your mechanism.
With relayfs, dropping a channel (even one that has millions upon millions
of events) requires one rvfree().

> That's it, that is the principle of a circular buffer. AFAIK, you
> have the same in relayfs. Old stuff gets dropped. Period. And the
> sender's data structures don't really need to exist forever,
> because the event is self-contained.

Sorry, but you're missing the point. There's a world of difference
between a mechanism that drops the oldest data because too many
events are occurring and a mechanism that drops messages according
to changes in the system. You are not comparing the same thing.

So say I have 10 events: a, b, c, d, e, f, g, h, i, j. Say that event
"j" is the one marking the occurrence of a catastrophic physical event.

With a circular buffer scheme, you may get something like:
f, g, h, i, j.

With kue, you may get:
a, d, e, f, j.

Sorry, you don't get to see b, c, g, h, and i because something
changed in the system and whatever wanted to send those over isn't
running anymore. Maybe I'm just off the chart, but I'd rather see
the first list of events.

Plus, remember that relayfs has buffer-boundary notification callbacks.
If the subsystem being notified doesn't care enough to do something
about the data, there's really nothing relayfs can do about it. And
if there's just too much data for the channel, then:
1) Allocate a bigger channel.
2) Use the dynamic growth/shrinking code being worked on.

> However, there are two different concepts here. One is the event
> that you want to send and recall it if not delivered by the time
> it does not make sense anymore (think plug a device, then remove
> it). The other is the event you want delivered really badly
> (think a "message" like "the temperature of the nuclear reactor's
> core has reached the high watermark").

I'm sorry, but the way I see printk() is that once I send something
to it, it really ought to show up somewhere. Heck, I'm printk'ing
it. If I plugged a device and the driver said "Your chair is
on fire", I want to know about it whether the device has been
unplugged later or not.

> > Right, but kue has to loop through the queue to deliver the messages
> > one-by-one. The more messages there are, the longer the delivery time.
> > Not to mention that you first have to copy it to user-space before
> > the reader can do write() to put it to permanent storage. With relayfs,
> > you just do write() and you're done.
> 
> As I mentioned before, this kind-of-compensates-but-not-really
> with the fact of having to generate the message and then copy
> it to the channel.

That's the memcpy() == copy_to_user() again.

> I think that at this point it'd be interesting to run something
> like a benchmark [once I finish the destructor code], however,
> it is going to be fairly difficult to test both implementations
> in similar grounds. Any ideas?

IMHO you can't truly compare relayfs to kue, because the basic
concepts implemented in kue do not correspond to what is needed
of a generalized buffering mechanism for the kernel.
Nevertheless, if you want to measure scalability alone, try
porting LTT to kue, and try running LMbench and co. LTT is very
demanding in terms of buffering (indeed, I'll go a step further and
claim that it is the most demanding application in terms of
buffering) and already runs on relayfs.

Here are some benchmarks we've already run with the current buffering
code being used in relayfs:
http://marc.theaimsgroup.com/?l=linux-kernel&m=103573710926859&w=2

Karim

===================================================
                 Karim Yaghmour
               karim@opersys.com
      Embedded and Real-Time Linux Expert
===================================================
