linux-kernel.vger.kernel.org archive mirror
* [ANNOUNCE] Linux Hardened Device Drivers Project
@ 2002-09-21  0:26 Rhoads, Rob
  2002-09-21  0:54 ` [ANNOUNCE] Linux [add-more-silly-APIs] " Jeff Garzik
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Rhoads, Rob @ 2002-09-21  0:26 UTC (permalink / raw)
  To: 'linux-kernel@vger.kernel.org'; +Cc: Rhoads, Rob


Project Announcement:
--------------------
We've started a new project on sourceforge.net w/ focus 
on hardening Linux device drivers for highly available 
systems. This project is being worked on with folks from 
OSDL's CGL and DCL projects as well.

Initially we've created a specification, a few kernel modules
that implement a set of driver programming interfaces, and
a sample device driver that demonstrates those interfaces.

We are actively soliciting involvement with others in the 
Linux developer community. We need your help to make this 
project relevant and useful. 

Below I've included an overview of the hardened driver project. 
By no means is this complete or final. It's just our initial
attempt at defining what is meant by the term hardened driver
and the areas we want to focus on.

For additional info, please check out the links at the bottom 
of this message and the Hardened Drivers web site at 
http://hardeneddrivers.sf.net.


Hardened Driver Project Overview:
--------------------------------
Device drivers have traditionally been a significant source 
of software faults. For this reason, they are of key concern
in improving the availability and stability of the operating
system. A critical element in creating a Highly Available (HA)
environment is to reduce the likelihood of faults in key 
drivers, a methodology called driver hardening. 

A device driver is typically implemented with emphasis on 
the proper operation of the hardware. Attention to how it 
will function in the event of hardware faults is often 
minimal. Hardened drivers, on the other hand, are designed
with the assumption that the underlying hardware that they
control will fail. They need to respond to such failures by
handling faults gracefully, limiting the impact on the overall
system. Hardened device drivers must continue to operate when 
the hardware has failed (e.g. allow device fail-over), and 
must not allow the propagation of corrupt data from a failed 
device to other components of the system.

Hardened device drivers must also be active participants in 
the recovery of detected faults, by locally recovering them or
by reporting them to higher-level system management software 
that subsequently instructs the driver to take a specific 
action.

The goal of a hardened driver is to provide an environment 
in which hardware and software failures are transparent to 
the applications using their services, where possible. The 
way to effectively achieve this goal is to analyze a 
driver's software design and implement appropriate changes
to improve stability, reliability and availability, and 
to provide instrumentation for management middleware.

We believe that improving driver stability and reliability 
includes such measures as ensuring that all wait loops are
limited with a timeout, validating input and output data and
structuring the driver to anticipate hardware errors. 
Improving availability includes adding support for device
hot swapping and validating the driver with fault injection.
Instrumentation for management middleware includes functions
such as reporting of statistical indicators and logging of 
pertinent events to enable postmortem analysis in the event
of a failure.
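
As an illustration of the first measure above, here is a minimal
userspace sketch of a wait loop bounded by a timeout. Everything in it
is hypothetical (struct fake_dev, fake_read_status, wait_ready); the
bound is a poll count so the sketch stays deterministic, whereas kernel
code would more likely bound the loop in time, for example with jiffies
and time_before().

```c
#include <assert.h>
#include <stdint.h>

#define STATUS_READY 0x1u

/* Hypothetical device: read_status stands in for a hardware register read. */
struct fake_dev {
	uint32_t (*read_status)(struct fake_dev *dev);
	unsigned int polls;		/* status reads performed so far */
	unsigned int ready_after;	/* ready once polls reaches this; 0 = never */
};

/* Simulated status register. */
static uint32_t fake_read_status(struct fake_dev *dev)
{
	dev->polls++;
	if (dev->ready_after && dev->polls >= dev->ready_after)
		return STATUS_READY;
	return 0;
}

/*
 * Poll until the device reports ready, but give up after max_polls reads.
 * Returns 0 on success, -1 on timeout.
 */
static int wait_ready(struct fake_dev *dev, unsigned int max_polls)
{
	unsigned int i;

	assert(dev && dev->read_status);
	for (i = 0; i < max_polls; i++) {
		if (dev->read_status(dev) & STATUS_READY)
			return 0;
	}
	return -1;	/* hardware never answered: report it, do not hang */
}
```

The point of the sketch is only that the loop has an exit that does not
depend on the hardware behaving; what the caller does with the timeout
(retry, fail over, report upward) is a separate decision.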

To minimize instability contributed by device drivers and to 
enhance the availability of HA systems, we've attempted to 
define a set of requirements that a device driver should 
adhere to in order to be considered a hardened driver. We 
then define different hardening traits and the required 
programming interfaces to support these hardening traits.

We've identified four areas in which drivers can be hardened:
o Hardening with code robustness
o Hardening with event logging
o Hardening with diagnostics
o Hardening with resource monitoring and statistics

We've also identified some key areas we feel are most critical
to overall system stability and plan to focus initial hardening 
efforts on drivers for network interface cards, physical storage, 
and logical storage.

Project Links:
-------------
o The Driver Hardening website:  
  http://hardeneddrivers.sourceforge.net

o The SourceForge project related info:
  http://sourceforge.net/projects/hardeneddrivers

o Hardened Drivers Mailing List Info (subscribe here):
  http://lists.sourceforge.net/mailman/listinfo/hardeneddrivers-discuss


+=+=+
Rob Rhoads                     mailto:rob.rhoads@intel.com
Staff Software Engineer        office: 503-677-5498
Telecom Software Platforms
Intel Communications Group

This email message solely contains my own personal views, and not
necessarily those of my employer.



* Re: [ANNOUNCE] Linux [add-more-silly-APIs] Device Drivers Project
  2002-09-21  0:26 [ANNOUNCE] Linux Hardened Device Drivers Project Rhoads, Rob
@ 2002-09-21  0:54 ` Jeff Garzik
  2002-09-21  1:06 ` [ANNOUNCE] Linux Hardened " Andre Hedrick
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Jeff Garzik @ 2002-09-21  0:54 UTC (permalink / raw)
  To: Rhoads, Rob; +Cc: 'linux-kernel@vger.kernel.org'

Rhoads, Rob wrote:
> Project Announcement:
> --------------------
> We've started a new project on sourceforge.net w/ focus 
> on hardening Linux device drivers for highly available 
> systems. This project is being worked on with folks from 
> OSDL's CGL and DCL projects as well.

[...]
> Hardened Driver Project Overview:
> --------------------------------
> Device drivers have traditionally been a significant source 
> of software faults. For this reason, they are of key concern
> in improving the availability and stability of the operating
> system. A critical element in creating a Highly Available (HA)
> environment is to reduce the likelihood of faults in key 
> drivers, a methodology called driver hardening. 
[...]
> To minimize instability contributed by device drivers and to 
> enhance the availability of HA systems, we've attempted to 
> define a set of requirements that a device driver should 
> adhere to in order to be considered a hardened driver. We 
> then define different hardening traits and the required 
> programming interfaces to support these hardening traits.
> 
> We've identified four areas in which drivers can be hardened:
> o Hardening with code robustness
> o Hardening with event logging
> o Hardening with diagnostics
> o Hardening with resource monitoring and statistics
> 
> We've also identified some key areas we feel are most critical
> to overall system stability and plan to focus initial hardening 
> efforts on drivers for network interface cards, physical storage, 
> and logical storage.


Sigh.

While the goal is certainly good and true, the implementation really stinks.

You simply cannot "harden" drivers by adding additional statistics nor 
by printing diagnostic messages via printk().  Further, centralizing 
--domain-specific-- diagnostics and statistics is just plain moving in 
the wrong direction.

Hardening drivers is a __human__ problem.  People use existing APIs and 
fuck up, thus creating bugs.  You are attempting to paper it over with 
buzzword-compliant features, but not actually addressing the real 
problem.  Adding silly APIs does not fix this.  You can't avoid getting 
down and dirty and actually fixing the drivers, and fixing up the 
_existing_ APIs so that humans create fewer bugs.

I fully support hardening, and "carrier grade linux."  This just ain't 
the way to do it.

[no offense intended.  if you think my comments harsh, wait until Al 
Viro sees your sample driver...]

	Jeff





* Re: [ANNOUNCE] Linux Hardened Device Drivers Project
  2002-09-21  0:26 [ANNOUNCE] Linux Hardened Device Drivers Project Rhoads, Rob
  2002-09-21  0:54 ` [ANNOUNCE] Linux [add-more-silly-APIs] " Jeff Garzik
@ 2002-09-21  1:06 ` Andre Hedrick
  2002-09-21 10:41   ` Bernd Eckenfels
  2002-09-21  1:40 ` Greg KH
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Andre Hedrick @ 2002-09-21  1:06 UTC (permalink / raw)
  To: Rhoads, Rob; +Cc: 'linux-kernel@vger.kernel.org'


Hi Rob,

My opinion only, and you may think it "stinks" ... oh well.

Obviously this is a way for the telecom folks to get something for free that
really should be paid for by funding the project with CASH.  Or funding
(a) startup(s) related to generating such support.

Regardless, it takes (fill in the blank) to boldly ask people to add APIs
for an industry who is only interested in using and not contributing.
Prove that all the stuff which is going to be plugged into these
security-hole^Wbug-generators^Wfeatures will be scheduled for open source.
Or is this another attempt to try and take over the license and shove BSD
down the piles?

Pointed Blunt Raw, but nice.

Regards,

Andre Hedrick
LAD Storage Consulting Group

PS: I see a lot of "wants", are there any "gives" ?


On Fri, 20 Sep 2002, Rhoads, Rob wrote:

> 
> Project Announcement:
> --------------------
> We've started a new project on sourceforge.net w/ focus 
> on hardening Linux device drivers for highly available 
> systems. This project is being worked on with folks from 
> OSDL's CGL and DCL projects as well.
> 
> Initially we've created a specification, a few kernel modules
> that implement a set of driver programming interfaces, and
> a sample device driver that demonstrates those interfaces.
> 
> We are actively soliciting involvement with others in the 
> Linux developer community. We need your help to make this 
> project relevant and useful. 

We need your CAPITAL to pay for our TIME.

> Below I've included an overview of the hardened driver project. 
> By no means is this complete or final. It's just our initial
> attempt at defining what is meant by the term hardened driver
> and the areas we want to focus on.

Great, do they serve the needs of more than "INTEL"?

> For additional info, please check out the links at the bottom 
> of this message and the Hardened Drivers web site at 
> http://hardeneddrivers.sf.net.




* Re: [ANNOUNCE] Linux Hardened Device Drivers Project
  2002-09-21  0:26 [ANNOUNCE] Linux Hardened Device Drivers Project Rhoads, Rob
  2002-09-21  0:54 ` [ANNOUNCE] Linux [add-more-silly-APIs] " Jeff Garzik
  2002-09-21  1:06 ` [ANNOUNCE] Linux Hardened " Andre Hedrick
@ 2002-09-21  1:40 ` Greg KH
  2002-09-21  5:34   ` my review of the Device Driver Hardening Design Spec Greg KH
  2002-09-23  6:13 ` [ANNOUNCE] Linux Hardened Device Drivers Project Randy.Dunlap
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Greg KH @ 2002-09-21  1:40 UTC (permalink / raw)
  To: Rhoads, Rob; +Cc: 'linux-kernel@vger.kernel.org'

Hi,

I've just started to read over the published spec, and will reserve
comment on it, and on the example code you've created, until I'm done
reading it.  But I'll make a few comments right now on your
announcement:


On Fri, Sep 20, 2002 at 05:26:47PM -0700, Rhoads, Rob wrote:
> 
> Project Announcement:
> --------------------
> We've started a new project on sourceforge.net w/ focus 
> on hardening Linux device drivers for highly available 
> systems. This project is being worked on with folks from 
> OSDL's CGL and DCL projects as well.

Who is "we"?

> Hardened Driver Project Overview:
> --------------------------------
> Device drivers have traditionally been a significant source 
> of software faults. For this reason, they are of key concern
> in improving the availability and stability of the operating
> system. A critical element in creating a Highly Available (HA)
> environment is to reduce the likelihood of faults in key 
> drivers, a methodology called driver hardening. 

Or in simpler terms, making drivers that work, right?
Do you have any statistics that show that existing Linux drivers are a
problem with HA systems?  If so, which drivers?

> A device driver is typically implemented with emphasis on 
> the proper operation of the hardware. Attention to how it 
> will function in the event of hardware faults is often 
> minimal.

Ah, a broad generalization, very nice to set up for the reasoning behind
your project.  But is this really true?  Lots of existing kernel drivers
can handle a wide range of hardware faults, and user faults.  Again, do
people have any specific problems with existing drivers, or driver
subsystems?

> The goal of a hardened driver is to provide an environment 
> in which hardware and software failures are transparent to 
> the applications using their services, where possible. The 
> way to effectively achieve this goal is to analyze a 
> driver's software design and implement appropriate changes
> to improve stability, reliability and availability, and 
> to provide instrumentation for management middleware.

So in order to achieve reliable drivers, we want to add more lines of
code to the driver to allow for instrumentation?  What happens when the
fault happens in the instrumentation interface?  And what is watching
this interface for problems in its handling of data?

> We believe that improving driver stability and reliability 
> includes such measures as ensuring that all wait loops are
> limited with a timeout, validating input and output data and
> structuring the driver to anticipate hardware errors. 

All good things to achieve.  Have you looked at the kernel-janitors
project?  There are lots of places where you all can jump in to achieve
this right now in the existing code.  Patches for these items are always
welcome, a spec is not needed :)

> Improving availability includes adding support for device
> hot swapping and validating the driver with fault injection.

Hot-swap needs to have hardware that can support this.  Linux currently
supports these kinds of hardware configurations (USB, IEEE 1394, PCI
Hotplug, cPCI Hotplug, hotplug CPU, etc.)  Are there existing types of
hardware present in your systems that do not have support on
Linux?  If so, creating drivers for this hardware would be greatly
appreciated.

As for "fault injection", this traditionally requires hardware test
setups that are beyond the means of most kernel programmers.  Will your
group be providing access to this kind of hardware for kernel developers
to test their drivers with?

> Instrumentation for management middleware includes functions
> such as reporting of statistical indicators and logging of 
> pertinent events to enable postmortem analysis in the event
> of a failure.

Um, about this middleware management layer, are you talking about
RAS-style kernel logging?  If so, please see the archives about why the
current implementation of this has been rejected by the kernel
community.

> We've identified four areas in which drivers can be hardened:
> o Hardening with code robustness

You mean the driver core?  That should be a requirement of any Linux
kernel driver today, hardened or not.  So all Linux drivers already meet
this, right?  If not, please let us know and they will be fixed.

> o Hardening with event logging

See the above comment about RAS.

> o Hardening with diagnostics

Ah, but most hardware does not support diagnostics.  What do you
suggest be done for this?

> o Hardening with resource monitoring and statistics

The middle management layer, right?  I'll get into my response of this
once I've gone over the spec.

> We've also identified some key areas we feel are most critical
> to overall system stability and plan to focus initial hardening 
> efforts on drivers for network interface cards, physical storage, 
> and logical storage.

In a quick look at your example code and documentation, this is all for
the 2.4 kernel.  As the 2.5 deadline is almost a month away, do you have
any intention of trying to get these features and layers into the 2.5
kernel?  And if not, are you willing to wait until the 2.7 kernel is
opened up?

That's probably enough questions for now :)

thanks,

greg k-h


* my review of the Device Driver Hardening Design Spec
  2002-09-21  1:40 ` Greg KH
@ 2002-09-21  5:34   ` Greg KH
  2002-09-21 15:21     ` Martin J. Bligh
  0 siblings, 1 reply; 13+ messages in thread
From: Greg KH @ 2002-09-21  5:34 UTC (permalink / raw)
  To: Rhoads, Rob, linux-kernel, hardeneddrivers-discuss, cgl_discussion

<sorry for the mangled header on the first send of this>

On Fri, Sep 20, 2002 at 06:40:54PM -0700, Greg KH wrote:
> Hi,
> 
> I've just started to read over the published spec, and will reserve
> comment on it, and on the example code you've created, until I'm done
> reading it.

Ok, here's some comments on the 0.5h release of the Device Driver
Hardening Design Specification:

(I'll skip the intro, and feel good sections and get into the details
that you lay out, starting in section 2)

Section 2:
2.1:
	- do NOT use /proc for driver info.  Use driverfs.
	- If you are using a kernel version that does not have driverfs,
	  put all /proc driver info under /proc/drivers, which is where
	  it belongs.
	- Only have 1 value per file, and no binary data in the files.
	- Do not put the "kernel version for which the driver was
	  compiled", as that _always_ must match the kernel version that
	  is running, so is redundant.
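
To make the "1 value per file, no binary data" rule concrete, here is a
minimal userspace sketch of a show-style helper that formats exactly one
counter as text. The struct and function names (example_stats,
show_rx_errors) are made up for illustration and are not taken from the
spec or from any kernel interface.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-device counters; the names are illustrative only. */
struct example_stats {
	unsigned long rx_packets;
	unsigned long rx_errors;
};

/*
 * Format exactly one counter as a decimal number followed by a newline,
 * the way a single-value text file would present it.
 * Returns the number of bytes written, or -1 if buf is too small.
 */
static int show_rx_errors(const struct example_stats *st, char *buf, size_t len)
{
	int n;

	assert(st && buf);
	n = snprintf(buf, len, "%lu\n", st->rx_errors);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}
```

A second counter would get its own helper and its own file rather than a
second field in this one, which keeps every file trivially parseable
from a shell script.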

2.2:
	- do NOT use typedef

2.5.5:
	- you do not have to always check data returned from functions,
	  if you wrote the functions in the first place.  Redundant
	  checking of all data within the kernel slows things down.
	  Sure, some checking is good, but do not say that it is a
	  requirement, or no one will want to use your driver.

The majority of section 2 is very nice, it's a good list of things that
drivers should do.


Section 3:

Wow, where to start...

The Common Statistic Manager:
	- why does this have to live in the kernel?  It should be in
	  userspace, grabbing all of the data from the /proc files you
	  just specified in section 2.1.
	  
POSIX event logging:
	- wow, not much I can say here, that hasn't already been said
	  before :(

Diagnostics:
	- now these are a good idea.  A common subsystem that drivers
	  can register what kind of diagnostics they can run on their
	  hardware, nice.

3.1.1:
	- UUIDs!!!???  You have got to be kidding.  Here, for the
	  benefit of those who have not read this, I'll quote:
	  	"Each subsystem, and each resource contained within each
		subsystem, needs to be uniquely identified.  In order to
		do this a hardened driver developer shall pre-assign a
		Universally Unique Identifier (UUID) as the Subsystem ID
		for each subsystem, and shall provide a means to assign
		a unique Resource ID string for each resource within a
		subsystem."
	
	 So for every resource, a string shall be associated with it.
	 But that means for most resources, the string will take up more
	 memory than the resource itself does.  Does that make sense?

	 It's also up to the driver to create these resource ids at
	 runtime and guarantee their uniqueness over the lifetime of the
	 kernel.  How in the world can you expect every driver author to
	 do this?  Any example code out there?

	 And what are these UUIDs going to be used for?  Ah, event
	 logging.  Enough said.

3.2 Statistics:
	You actually want every driver to support SNMP compliant
	statistics groups within themselves?  Why?  What a bloat of a
	kernel.

	All of this should be done (if at all) from userspace.
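
For what "from userspace" could look like, here is a minimal sketch of a
reader that pulls one value out of a single-value text file. The helper
name read_stat is hypothetical, and so is any path passed to it; a
collector would simply call it once per exported file.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read one unsigned decimal value from a single-value text file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
static int read_stat(const char *path, unsigned long *value)
{
	FILE *f;
	int ok;

	assert(path && value);
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = (fscanf(f, "%lu", value) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

Aggregation, SNMP export and the like can then be layered on top of
this in a daemon, without any of it living in the kernel.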
	

3.2.5.2:
(I'm not condoning ANY of these functions or code, just trying to point out how
they should, if they were to be in the kernel, be done properly.)
	- do not use typedef
	- struct stat_info does not need *unit, as that is already
	  specified in the scale field, right?
	- the stat_value_t union is just a horrible abomination, don't
	  do that.

3.3 Diagnostics:
	- not a bad idea, but some work could be done on the
	  implementation.  Would fit in nicely with the device driver
	  model in 2.5.  For 2.4, it would be another subsystem a driver
	  would register with.

3.3.3.2:
	- no typedefs
	- run() is horrible, you are trying to fit all kinds of possible
	  diagnosis into one function callback.  Not a good idea.
	  Break the different kinds of callbacks out into different
	  functions.  That ensures type safety, right now you are just
	  creating another ioctl() type mess.
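
A minimal sketch of what splitting run() into separate callbacks could
look like, written as plain userspace C. All names here
(example_diag_ops, link_test, memory_test, run_all_diags) are
hypothetical and not from the spec; the only point is that each kind of
test gets its own function pointer with its own typed arguments, so the
compiler checks every call.

```c
#include <assert.h>
#include <stddef.h>

struct example_dev {
	int link_up;		/* simulated link state */
	unsigned int bad_cells;	/* simulated count of failing memory cells */
};

/* One callback per kind of test, each with its own typed arguments. */
struct example_diag_ops {
	int (*link_test)(struct example_dev *dev);
	int (*memory_test)(struct example_dev *dev, unsigned int *bad_cells);
};

static int example_link_test(struct example_dev *dev)
{
	return dev->link_up ? 0 : -1;
}

static int example_memory_test(struct example_dev *dev, unsigned int *bad_cells)
{
	*bad_cells = dev->bad_cells;
	return dev->bad_cells ? -1 : 0;
}

static const struct example_diag_ops example_ops = {
	.link_test = example_link_test,
	.memory_test = example_memory_test,
};

/* Run every test the driver provides; returns the number of failures. */
static int run_all_diags(const struct example_diag_ops *ops,
			 struct example_dev *dev)
{
	unsigned int bad = 0;
	int failures = 0;

	assert(ops && dev);
	if (ops->link_test && ops->link_test(dev) != 0)
		failures++;
	if (ops->memory_test && ops->memory_test(dev, &bad) != 0)
		failures++;
	return failures;
}
```

A driver that cannot run a given test leaves that pointer NULL, and
nothing has to be cast through a void pointer or decoded from a command
number.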

3.4 Event logging:
	- I'm not even going to touch this, sorry.

4: High Availability
	- are you all working with the existing HA group?

4.1:
	- um, what are you trying to say here.  This section is
	  pointless.  Yes we all think Hot Swap is a good idea, that's
	  why Linux currently supports it.

4.2:
	- RAID and ethernet bonding are nice. Again, Linux already has
	  projects and support for these things.  Why mention them?


The rest of this section is fine, and I welcome any test harnesses that
are created to do this kind of fault injection for driver testing.
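
As a rough idea of the smallest possible harness of this kind, here is a
userspace sketch that injects faults purely in software. This is a
stand-in for real fault injection hardware, not a replacement for it,
and every name in it (fault_injector, dev_read, read_with_retry) is
hypothetical.

```c
#include <assert.h>

/* Hypothetical fault injector: fails a chosen number of upcoming reads. */
struct fault_injector {
	unsigned int fail_next;	/* how many upcoming reads should fail */
	unsigned int injected;	/* how many faults have been injected */
};

/* Simulated device read that the injector can force to fail. */
static int dev_read(struct fault_injector *fi, int *data)
{
	if (fi->fail_next) {
		fi->fail_next--;
		fi->injected++;
		return -1;	/* injected I/O error */
	}
	*data = 0x5a;
	return 0;
}

/* Code under test: retry a failed read a bounded number of times. */
static int read_with_retry(struct fault_injector *fi, int *data,
			   unsigned int max_tries)
{
	unsigned int i;

	assert(fi && data);
	for (i = 0; i < max_tries; i++) {
		if (dev_read(fi, data) == 0)
			return 0;
	}
	return -1;	/* give up instead of looping forever */
}
```

Setting fail_next higher than max_tries exercises the give-up path,
which is exactly the path that rarely gets run against healthy
hardware.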

5:
	- Here you back-pedal on everything you said up till now.  Let
	  me summarize what is said in these 3 paragraphs in 1 sentence:
	  	"Yes, all these things are well and good, but don't let
		them affect the currently great performance Linux has
		today."
	  Sorry, but you can't have it both ways.

5.1:
	- do NOT use #ifdef in the .c files.  Only in .h files.
	- why is CONFIG_DRIVER_HOTSWAP an option?  What does it do that
	  CONFIG_HOTPLUG does not do today?
	- actually, what do any of these CONFIG_ options do, and why
	  would someone not want the CONFIG_DRIVER_ROBUST to be always
	  enabled?
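
For the "#ifdef only in .h files" point, a minimal sketch of the usual
pattern: the header supplies either the real prototype or an empty
inline stub, and the .c file calls it unconditionally. The option name
CONFIG_EXAMPLE_DIAG and both function names are made up for
illustration; the two halves are shown in one block only so that it
compiles on its own.

```c
#include <assert.h>

/* ---- would live in a header, e.g. a hypothetical example_diag.h ---- */
#ifdef CONFIG_EXAMPLE_DIAG
int example_diag_register(const char *name);
#else
/* Feature compiled out: the stub keeps callers free of #ifdef. */
static inline int example_diag_register(const char *name)
{
	(void)name;
	return 0;
}
#endif

/* ---- would live in the driver's .c file: no #ifdef needed ---- */
static int example_probe(const char *name)
{
	assert(name);
	return example_diag_register(name);
}
```

With the option off the stub compiles away to nothing, so the driver
source reads the same either way.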


In summary, I think that a lot of people have spent a lot of time in
creating this document, and the surrounding code that matches this
document.  I really wish that a tiny bit of that effort had gone into
contacting the Linux kernel development community, and asking to work
with them on a project like this.  Due to that not happening, and by
looking at the resultant spec and code, I'm really afraid the majority
of that time and effort will have been wasted.

What do I think can be salvaged?  Diagnostics are a good idea, and I
think they fit into the driver model in 2.5 pretty well.  A lot of
kernel janitoring work could be done by the CG team to clean up, and
harden (by applying the things in section 2) the existing kernel
drivers.  That effort alone would go a long way in helping the stability
of Linux, and also introduce the CG developers into the kernel community
as active, helping developers.  It would allow the CG developers to
learn from the existing developers, as we must be doing something right
for Linux to be working as well as it does :)

Also, open specs for the hardware the CG members produce, to allow
existing kernel drivers to be enhanced (instead of having to be reverse
engineered), and new kernel drivers to be created, would also go a long
way in helping out both the CG's members and the entire Linux
community's cause of having a robust, stable kernel be achieved more easily.
Closed specs, and closed drivers do not help anyone.


thanks for reading this far,

greg k-h


* Re: [ANNOUNCE] Linux Hardened Device Drivers Project
  2002-09-21  1:06 ` [ANNOUNCE] Linux Hardened " Andre Hedrick
@ 2002-09-21 10:41   ` Bernd Eckenfels
  2002-09-21 11:20     ` Russell King
  0 siblings, 1 reply; 13+ messages in thread
From: Bernd Eckenfels @ 2002-09-21 10:41 UTC (permalink / raw)
  To: linux-kernel

In article <Pine.LNX.4.10.10209201753310.25090-100000@master.linux-ide.org> you wrote:
> Regardless, it takes (fill in the blank) to boldly ask people to add APIs
> for an industry who is only interested in using and not contributing.

There is more than one industry interested in it. It simply sucks if your
kernel panics only because you remove a SCSI cable. It also sucks if your
kernel panics only because you have a bad block on a disk.

Companies which build carrier grade Linux systems (like HP, IBM and SGI) _do_
contribute to making Linux an enterprise system.

So personally I find this project good, and adding the Linux testing community
is needed. But I don't think that a lot of new APIs are needed in the first
place. (Well, possibly for things like path failover/md somebody needs to
define actual error handling, like it is done currently.) But "debugging"
all drivers by review is needed. On the other hand, the fact that this has not
happened just shows us that it is not trivial to find a second person who
understands the hardware's error behaviour.

Greetings
Bernd


* Re: [ANNOUNCE] Linux Hardened Device Drivers Project
  2002-09-21 10:41   ` Bernd Eckenfels
@ 2002-09-21 11:20     ` Russell King
  0 siblings, 0 replies; 13+ messages in thread
From: Russell King @ 2002-09-21 11:20 UTC (permalink / raw)
  To: Bernd Eckenfels; +Cc: linux-kernel

On Sat, Sep 21, 2002 at 12:41:59PM +0200, Bernd Eckenfels wrote:
> In article <Pine.LNX.4.10.10209201753310.25090-100000@master.linux-ide.org> you wrote:
> > Regardless, it takes (fill in the blank) to boldly ask people to add APIs
> > for an industry who is only interested in using and not contributing.
> 
> There is more than one industry interested in it. It simply sucks if your
> kernel panics only because you remove a SCSI cable. It also sucks if your
> kernel panics only because you have a bad block on a disk.

Both of which I'd classify as bugs.  I recently submitted a few patches
that fix some of the idiotic or bad error handling in the 2.4 SCSI
layer.  Although they didn't completely fix some of the problems, it
did highlight some of the problem areas.

> On the other hand, the fact that this has not
> happened just shows us that it is not trivial to find a second person who
> understands the hardware's error behaviour.

Or people with broken hardware don't report that the error paths are
broken; they just fix their hardware.

I have a Syquest 270MB drive here.  Bought from new, but it has never
worked 100% properly.  It mostly complains about media errors and the
like.  After several rounds with Syquest, I lost faith in it.  However,
I still have it.  Why?

I keep test filesystems on the cartridges.  Perfect when I want to run
some tests that could well take out a filesystem, or when I want to test
out the SCSI error handling.  That's how I found that the 2.4 SCSI error
handling code has the possibility to eat disks alive when it encounters
an error.

Would extra APIs have helped find this?  Would it have made the driver
more stable?  Would it have caught the bug in my SCSI driver that caused
it not to request sense on error and therefore throw the SCSI subsystem
into a never-ending loop?  The answers are: no, no, no.

Would testing with broken hardware have found this?  Would it make the
driver more stable?  Yes, and yes.

IMO, driver stability comes with testing and review by people who know
both the hardware _and_ who know the kernel API inside out.  There seems
to be a lack of the latter, and a lack of people with broken hardware for
the former.

So next time when your hard disk develops media errors, or your network
card starts corrupting data, think about whether it would be a useful
test device to someone.  (Obviously not if it's completely 100% dead.)

-- 
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html



* Re: my review of the Device Driver Hardening Design Spec
  2002-09-21  5:34   ` my review of the Device Driver Hardening Design Spec Greg KH
@ 2002-09-21 15:21     ` Martin J. Bligh
  0 siblings, 0 replies; 13+ messages in thread
From: Martin J. Bligh @ 2002-09-21 15:21 UTC (permalink / raw)
  To: Greg KH, Rhoads, Rob, linux-kernel, hardeneddrivers-discuss,
	cgl_discussion

> What do I think can be salvaged?  Diagnostics are a good idea, and I
> think they fit into the driver model in 2.5 pretty well.  A lot of
> kernel janitoring work could be done by the CG team to clean up, and
> harden (by applying the things in section 2) the existing kernel
> drivers.  That effort alone would go a long way in helping the stability
> of Linux, and also introduce the CG developers into the kernel community
> as active, helping developers.  It would allow the CG developers to
> learn from the existing developers, as we must be doing something right
> for Linux to be working as well as it does :)

People with fault injection hardware are also extremely helpful 
(assuming they do something useful with it). That's not something most 
of the community would have access to, but the CG-type people probably 
do. A couple of people who spent their full time kicking the hell out
of Sequent's fibrechannel system made a massive difference to its
quality and reliability. 

That's definitely something this project could help by doing ... 
whatever people feel about some of the more theoretical aspects of 
their work being discussed, I think few would object to some real-world 
help from people tracking down and fixing existing bugs, especially in
the error handling.
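
Where fault injection hardware is not available, a rough software stand-in
can still exercise error paths.  The following is a minimal user-space
sketch, not a kernel facility: `fault_plan`, `should_fail` and `copy_block`
are hypothetical names, and failing every Nth call is an assumed policy
chosen purely for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical injection plan: fail every `interval`-th call (0 = never). */
struct fault_plan {
    unsigned int interval;
    unsigned int calls;
    unsigned int injected;
};

static int should_fail(struct fault_plan *plan)
{
    assert(plan != NULL);
    plan->calls++;
    if (plan->interval != 0 && plan->calls % plan->interval == 0) {
        plan->injected++;
        return 1;
    }
    return 0;
}

/* Code under test: a transfer that must leave dst untouched on error. */
static int copy_block(struct fault_plan *plan, char *dst,
                      const char *src, size_t len)
{
    if (should_fail(plan))
        return -EIO;            /* simulated hardware fault */
    memcpy(dst, src, len);
    return 0;
}
```

A test can then drive `copy_block` repeatedly and check that every injected
`-EIO` leaves the destination buffer exactly as it was before the call.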

M.



* Re: [ANNOUNCE] Linux Hardened Device Drivers Project
  2002-09-21  0:26 [ANNOUNCE] Linux Hardened Device Drivers Project Rhoads, Rob
                   ` (2 preceding siblings ...)
  2002-09-21  1:40 ` Greg KH
@ 2002-09-23  6:13 ` Randy.Dunlap
  2002-09-23 12:31 ` Lars Marowsky-Bree
  2002-09-23 22:38 ` Rhoads, Rob
  5 siblings, 0 replies; 13+ messages in thread
From: Randy.Dunlap @ 2002-09-23  6:13 UTC (permalink / raw)
  To: Rhoads, Rob
  Cc: 'linux-kernel@vger.kernel.org',
	cgl_discussion, hardeneddrivers-discuss

On Fri, 20 Sep 2002, Rhoads, Rob wrote:

| Project Announcement:
| --------------------
|
| Initially we've created a specification, a few kernel modules
| that implement a set of driver programming interfaces, and
| a sample device driver that demonstrates those interfaces.
| -

Only addressing spec bugs for now.
More comments tomorrow when I'm more awake.

section
3.1.1.1	"Table 2 takes a closer look at the fields...."
	No, it doesn't.

3.2.3	"The CONFIG_DRIVER_STATISTICS flag...."
	but section 5.1 calls it CONFIG_DRIVER_STATS.

3.3.2	"The CONFIG_DRIVER_STATISTICS build configuration option"
	should be CONFIG_DRIVER_DIAGNOSTICS
	and change "statistics support" to "diagnostics support"

3.4.2.7.3, example 2:  missing final '|' after "%s"

3.4.3.8.1, Comments on #defines:
	aren't several of these backwards?

3.4.3.8.3, last #define:  bad font change.

-- 
~Randy



* Re: [ANNOUNCE] Linux Hardened Device Drivers Project
  2002-09-21  0:26 [ANNOUNCE] Linux Hardened Device Drivers Project Rhoads, Rob
                   ` (3 preceding siblings ...)
  2002-09-23  6:13 ` [ANNOUNCE] Linux Hardened Device Drivers Project Randy.Dunlap
@ 2002-09-23 12:31 ` Lars Marowsky-Bree
  2002-09-23 22:38 ` Rhoads, Rob
  5 siblings, 0 replies; 13+ messages in thread
From: Lars Marowsky-Bree @ 2002-09-23 12:31 UTC (permalink / raw)
  To: Rhoads, Rob, 'linux-kernel@vger.kernel.org'

On 2002-09-20T17:26:47,
   "Rhoads, Rob" <rob.rhoads@intel.com> said:

Hi Rob,

I fully support the idea to audit the Linux device drivers - using guidelines,
hardware fault injection, stress testing etc - and fixing any potential bugs.
This is obviously a very important task, because the drivers are some of the
most ugly code I've seen in the kernel.

"Pro-active monitoring", ie by basically gathering whatever statistics are
available and feeding them to some sort of user-space application and then
trying to deduce a potential failure is also a very valuable goal; so exposing
more statistics seems definitely good, too. As long as that doesn't introduce
even more errors...

Any help you can offer on the above is surely appreciated by all involved and
will have a direct, positive impact on Linux.

That said, and the fluff in your specification aside (which was very likely
necessary for management ;-), your spec certainly contains some good points on
how to write stable and robust code. (Aside from the comments the others have
raised already regarding event logging and that of course all recommendations
need to be thoughtfully applied to the case in question)

The statistics can best be exposed via driverfs or /proc (for kernels which
don't have driverfs); however, neither the statistics analyser nor the SNMP
agent pre-processing belongs in the kernel itself. Keep the drivers as lean as
possible; that will introduce fewer errors at this level. I object to the CSM
being in kernel space. Having a more or less common API for the statistics to
be gathered and exposed by the drivers would be highly valuable indeed though.
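
One way to picture that split is a driver that only exports raw counters as
text while a user-space tool interprets them.  The sketch below assumes a
hypothetical "name value" line format and hypothetical counter names
(`rx_packets`, `rx_errors`); neither comes from the specification or from
any real driverfs or /proc file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical counters a driver might export, one "name value" per line. */
struct drv_stats {
    unsigned long rx_packets;
    unsigned long rx_errors;
};

/* Parse the exported text; returns the number of counters recognised. */
static int parse_stats(const char *text, struct drv_stats *out)
{
    char name[32];
    unsigned long value;
    int consumed, found = 0;

    assert(text != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    while (sscanf(text, " %31s %lu%n", name, &value, &consumed) == 2) {
        if (strcmp(name, "rx_packets") == 0) {
            out->rx_packets = value;
            found++;
        } else if (strcmp(name, "rx_errors") == 0) {
            out->rx_errors = value;
            found++;
        }
        text += consumed;       /* unknown counters are skipped */
    }
    return found;
}

/* Policy lives in user space: warn when errors exceed `pct` percent. */
static int error_rate_exceeds(const struct drv_stats *s, unsigned long pct)
{
    if (s->rx_packets == 0)
        return s->rx_errors > 0;
    return s->rx_errors * 100 > s->rx_packets * pct;
}
```

In this arrangement the threshold, and any SNMP reporting built on it, can
change without touching the driver; the kernel side would only need to keep
the counters accurate.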

What are your further timelines?

A lot of the above - ie, audit and test current drivers - can be done with
little or no further planning; I'm always rather amazed at
how much effort Intel, IBM and their child OSDL spent on pretty specifications
which could also be applied to real work ;-)



Sincerely,
    Lars Marowsky-Brée <lmb@suse.de>

-- 
Principal Squirrel
Research and Development, SuSE Linux AG
 
``Immortality is an adequate definition of high availability for me.''
	--- Gregory F. Pfister



* RE: [ANNOUNCE] Linux Hardened Device Drivers Project
@ 2002-09-23 22:38 ` Rhoads, Rob
  2002-09-24  0:08   ` Greg KH
  2002-09-24 17:12   ` Greg KH
  0 siblings, 2 replies; 13+ messages in thread
From: Rhoads, Rob @ 2002-09-23 22:38 UTC (permalink / raw)
  To: 'linux-kernel@vger.kernel.org'

I appreciate all the feedback. Based on the wide variety 
of ideas/comments, it looks like I need to go back and 
incorporate these ideas into the document, potentially 
changing areas in major ways where appropriate.    

Rather than bog down this mailing list with exchanges, 
I would like to move this discussion to the hardened 
driver mailing list.  Please don't feel like I'm 
ignoring your feedback--just moving the forum.

An underlying theme tends to revolve around the binding
of the concepts of 'hardening' and RAS features being 
added to drivers.  We will be looking into splitting 
these two different approaches out from this singular 
document and into their appropriate locations.

If you are interested (even if you aren't) please go 
to http://lists.sourceforge.net/lists/listinfo/hardeneddrivers-discuss 
and subscribe to the mailing list.

+=+=+
Rob Rhoads                     mailto:rob.rhoads@intel.com
Staff Software Engineer        office: 503-677-5498
Telecom Software Platforms
Intel Communications Group

This email message solely contains my own personal views, and not
necessarily those of my employer.



* Re: [ANNOUNCE] Linux Hardened Device Drivers Project
  2002-09-23 22:38 ` Rhoads, Rob
@ 2002-09-24  0:08   ` Greg KH
  2002-09-24 17:12   ` Greg KH
  1 sibling, 0 replies; 13+ messages in thread
From: Greg KH @ 2002-09-24  0:08 UTC (permalink / raw)
  To: Rhoads, Rob; +Cc: 'linux-kernel@vger.kernel.org'

On Mon, Sep 23, 2002 at 03:38:32PM -0700, Rhoads, Rob wrote:
> 
> Rather than bog down this mailing list with exchanges, 
> I would like to move this discussion to the hardened 
> driver mailing list.  Please don't feel like I'm 
> ignoring your feedback--just moving the forum.

No, please don't move this off to another mailing list.  This is where
the majority of all kernel programmers are, don't try to make us move to
yet-another-mailing-list just to discuss your project.  If you want our
contributions, and want our input, use this list.

If you stay on smaller mailing lists, like cg-discuss and
hardened-drivers, you do not reach the widest group of people, which is
what you will have to do if you want to have a chance for your
contributions to become part of the main kernel.

> An underlying theme tends to revolve around the binding
> of the concepts of 'hardening' and RAS features being 
> added to drivers.  We will be looking into splitting 
> these two different approaches out from this singular 
> document and into their appropriate locations.

Where would these locations be?

> If you are interested (even if you aren't) please go 
> to http://lists.sourceforge.net/lists/listinfo/hardeneddrivers-discuss 
> and subscribe to the mailing list.

Sorry, but major kernel driver discussions should occur on lkml.

thanks,

greg k-h


* Re: [ANNOUNCE] Linux Hardened Device Drivers Project
  2002-09-23 22:38 ` Rhoads, Rob
  2002-09-24  0:08   ` Greg KH
@ 2002-09-24 17:12   ` Greg KH
  1 sibling, 0 replies; 13+ messages in thread
From: Greg KH @ 2002-09-24 17:12 UTC (permalink / raw)
  To: Rhoads, Rob; +Cc: linux-kernel, hardeneddrivers-discuss, cgl_discussion

On Mon, Sep 23, 2002 at 03:38:32PM -0700, Rhoads, Rob wrote:
> I appreciate all the feedback. Based on the wide variety 
> of ideas/comments, it looks like I need to go back and 
> incorporate these ideas into the document, potentially 
> changing areas in major ways where appropriate.    

Not to be a pest, but I, and a lot of other people, posted some very
specific questions in response to both your original posting, and in
response to the published specification and published code.  It would be
considered proper etiquette if you would at least try to respond to
_some_ of these questions, as you did ask for them, rather than stating
that you are going to go mull over everything and come back with a
modified document.

If you don't, any expectations of people reviewing future specs, or
proposals from this project should be kept quite low.

thanks,

greg k-h


Thread overview: 13+ messages
2002-09-21  0:26 [ANNOUNCE] Linux Hardened Device Drivers Project Rhoads, Rob
2002-09-21  0:54 ` [ANNOUNCE] Linux [add-more-silly-APIs] " Jeff Garzik
2002-09-21  1:06 ` [ANNOUNCE] Linux Hardened " Andre Hedrick
2002-09-21 10:41   ` Bernd Eckenfels
2002-09-21 11:20     ` Russell King
2002-09-21  1:40 ` Greg KH
2002-09-21  5:34   ` my review of the Device Driver Hardening Design Spec Greg KH
2002-09-21 15:21     ` Martin J. Bligh
2002-09-23  6:13 ` [ANNOUNCE] Linux Hardened Device Drivers Project Randy.Dunlap
2002-09-23 12:31 ` Lars Marowsky-Bree
2002-09-23 22:38 ` Rhoads, Rob
2002-09-24  0:08   ` Greg KH
2002-09-24 17:12   ` Greg KH
