* Re: [RFC] New Driver Model for 2.5
@ 2001-10-19 23:33 Benjamin Herrenschmidt
  2001-10-20  0:09 ` Linus Torvalds
  0 siblings, 1 reply; 120+ messages in thread
From: Benjamin Herrenschmidt @ 2001-10-19 23:33 UTC (permalink / raw)
  To: Patrick Mochel; +Cc: Jeff Garzik, linux-kernel, Linus Torvalds

Reading about the suspend-to-disk issue, and thinking about
some of the pmac needs, I tend to still think we have overlooked
that ordering issue.

We should probably add a couple of list_heads to define a second
tree in parallel to the device tree: the power tree.
A device is by default inserted in both trees as a child of its bus
controller.
But the arch must be able to move it elsewhere. I believe we have
a way around the VM-related ordering issues, but we do have other
kinds of ordering constraints that have to be dealt with when
we start broadcasting the callbacks.

Also, I think you didn't state that io_bus is a superset of device.
In fact, it's just a device that has children, and this should
probably be more generically viewed in struct device itself.

Any device should be able to have children, so we really have 2
interleaved trees of devices, the bus tree and the power tree.
In fact, to be complete, we could even define the interrupt tree
with one more set of links, as it's really not related to the bus
tree on many archs/machines, and having a tree definition is really
useful when you deal with cascaded controllers.
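
A rough sketch of what "one more set of links per tree" could look like. The names here (struct tree_node, tree_move, device_register_under, the TREE_* indices) are made up for illustration and are not from the posted driver model; plain pointers stand in for the kernel's list_heads so that the fragment compiles on its own:

```c
#include <stddef.h>

/* Hypothetical link set: a device carries one of these per tree. */
struct tree_node {
	struct device *parent;
	struct device *first_child;
	struct device *next_sibling;
};

enum { TREE_BUS, TREE_POWER, TREE_IRQ, TREE_COUNT };

struct device {
	const char *name;
	struct tree_node node[TREE_COUNT];
};

/* Unlink dev from its current parent in tree 'which', if it has one. */
static void tree_unlink(struct device *dev, int which)
{
	struct device *parent = dev->node[which].parent;
	struct device **link;

	if (!parent)
		return;
	for (link = &parent->node[which].first_child; *link;
	     link = &(*link)->node[which].next_sibling) {
		if (*link == dev) {
			*link = dev->node[which].next_sibling;
			break;
		}
	}
	dev->node[which].parent = NULL;
	dev->node[which].next_sibling = NULL;
}

/* Make dev a child of parent in one tree only; arch code could use
 * this to re-parent a device in the power tree. */
static void tree_move(struct device *dev, struct device *parent, int which)
{
	tree_unlink(dev, which);
	dev->node[which].parent = parent;
	dev->node[which].next_sibling = parent->node[which].first_child;
	parent->node[which].first_child = dev;
}

/* Default: same parent (the bus controller) in every tree. */
static void device_register_under(struct device *dev, struct device *bus)
{
	int which;

	for (which = 0; which < TREE_COUNT; which++)
		tree_move(dev, bus, which);
}
```

With this layout, arch code could call tree_move(dev, asic, TREE_POWER) to hang a device under the ASIC that controls its clock, without touching its position in the bus tree.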

What do you think ?

Ben.




* Re: [RFC] New Driver Model for 2.5
  2001-10-19 23:33 [RFC] New Driver Model for 2.5 Benjamin Herrenschmidt
@ 2001-10-20  0:09 ` Linus Torvalds
  2001-10-20  9:28   ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 120+ messages in thread
From: Linus Torvalds @ 2001-10-20  0:09 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Patrick Mochel, Jeff Garzik, linux-kernel


On Sat, 20 Oct 2001, Benjamin Herrenschmidt wrote:
>
> Reading about the suspend to disk issue, and thinking about
> some of the pmac needs, I tend to still think we have overlooked
> that ordering issue.

Why?

If there is some ordering inherent in the bus, that has to be shown in the
bus structure. Why would you EVER care about order between devices that
are independent?

		Linus



* Re: [RFC] New Driver Model for 2.5
  2001-10-20  0:09 ` Linus Torvalds
@ 2001-10-20  9:28   ` Benjamin Herrenschmidt
  2001-10-21 17:09     ` Pavel Machek
  0 siblings, 1 reply; 120+ messages in thread
From: Benjamin Herrenschmidt @ 2001-10-20  9:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Patrick Mochel, Jeff Garzik, linux-kernel

>Why?
>
>If there is some ordering inherent in the bus, that has to be shown in the
>bus structure. Why would you EVER care about order between devices that
>are independent?

The power tree layout isn't necessarily identical to the bus tree layout.
On some macs, for example, we have some ASICs that can control some other
chip's clock and power lines, without having a direct parent relationship.

So I like having the ability to reorder the power tree layout from
arch code. But I can work around it if this ability is not provided by
the struct device. In fact, my main issue here is with Apple's big
"mac-io" ASIC (combo of several devices along with various IO lines
and clocks), and I believe I will have to handle it as a special
case anyway for other reasons (it must really be shut down last as
once down, I can't even talk to the power manager chip ;). I think
all other devices I have to deal with follow the physical bus
ordering.

The problem of suspend-to-disk, which requires, I believe, that the
device used for the memory backup be state-saved last, is still
a problem I don't know how to solve. Maybe using flags in the device
structure indicating it's deferred. That would cause its parents
to be deferred as well. The presence of the flag would prevent the
actual "suspend" state from being entered during step 3. Once all other
devices are suspended, we then know that the bus path to the disk used for
suspend-to-disk is still powered, and we can perform the actual
suspend-to-disk operation.

However, I believe that requires using non-generic IO functions (as
IO queues for the controller have already been blocked by step 2, and
the driver would have to deal with its own saved state carefully as
it obviously can't save state to RAM after it has been used to back up
the RAM itself). Maybe that could be a separate message (suspend_to_disk)
sent instead of step 3 (suspend) to this device.

That would give us the following scenario:

 - The device for suspend-to-disk is identified and a flag is set
   in its device structure. This flag (or a different one to make
   things clear eventually) is "broadcast" all the way up the tree
   so its parent bridges/controllers are marked as well.
 - All devices get "suspend_prepare".
 - All devices get "suspend_save_state" and block normal IOs.
 - All devices not marked above get "suspend".
 - Last housecleaning is done by the kernel.
 - The device marked above gets a special "suspend_to_disk" message
   during which it can perform the actual memory backup and suspend
   itself.
 - The machine is put to sleep.
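
The scenario above can be sketched as follows. This is only an illustration: struct device, the deferred flag, send_msg and suspend_to_disk are hypothetical names, the device array is assumed to be ordered children before parents, and messages are recorded in a string instead of being delivered to drivers. The scenario does not say what the marked parent bridges receive once the backup is done; the sketch assumes they get a plain "suspend", child first.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define LOG_SIZE 512

struct device {
	const char *name;
	struct device *parent;	/* bus parent, NULL for the root */
	int deferred;		/* on the path to the suspend-to-disk device */
};

static char msg_log[LOG_SIZE];

/* Stand-in for calling into a driver: append "name:message " to the log. */
static void send_msg(const struct device *dev, const char *msg)
{
	size_t used = strlen(msg_log);

	snprintf(msg_log + used, LOG_SIZE - used, "%s:%s ", dev->name, msg);
}

/* Flag the backup device and every bridge/controller above it. */
static void mark_suspend_path(struct device *disk)
{
	struct device *dev;

	for (dev = disk; dev; dev = dev->parent)
		dev->deferred = 1;
}

static void suspend_to_disk(struct device **devs, size_t n,
			    struct device *disk)
{
	struct device *dev;
	size_t i;

	mark_suspend_path(disk);
	for (i = 0; i < n; i++)
		send_msg(devs[i], "suspend_prepare");
	for (i = 0; i < n; i++)
		send_msg(devs[i], "suspend_save_state");
	for (i = 0; i < n; i++)
		if (!devs[i]->deferred)
			send_msg(devs[i], "suspend");
	/* last housecleaning by the kernel would happen here */
	send_msg(disk, "suspend_to_disk");
	/* assumption: the still-powered parents are suspended afterwards */
	for (dev = disk->parent; dev; dev = dev->parent)
		send_msg(dev, "suspend");
	/* the machine would be put to sleep here */
}
```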

I currently don't implement suspend-to-disk on Mac, so I may have
overlooked something. Also, I'm not too sure about the requirements
of x86 laptops regarding those features. I'm lucky, Mac laptops
keep the RAM content during suspend ;)

Note also that if not doing suspend-to-disk, I think we should also
make sure to sync all buffers after suspend_prepare and before sending
the suspend_save_state messages. I noticed that recent 2.4 versions
are more sensitive to power failure during suspend (that is, the battery
running empty, or whatever else causes loss of RAM content). I used to
call fsync_devs(0) between those 2 steps in the Mac PM scheme, but
it appears that with recent 2.4's, this doesn't prevent fsck from
finding inconsistencies.

Ben.





* Re: [RFC] New Driver Model for 2.5
  2001-10-20  9:28   ` Benjamin Herrenschmidt
@ 2001-10-21 17:09     ` Pavel Machek
  2001-10-23  0:19       ` Patrick Mochel
  0 siblings, 1 reply; 120+ messages in thread
From: Pavel Machek @ 2001-10-21 17:09 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Linus Torvalds, Patrick Mochel, Jeff Garzik, linux-kernel

Hi!

> The problem of suspend-to-disk, which requires, I believe, that the
> device used for the memory backup be state-saved last, is still
> a problem I don't know how to solve. Maybe using flags in the device

Don't worry about it. It's easy.

> That would give us the following scenario:
> 
>  - The device for suspend-to-disk is identified and a flag is set
>    in its device structure. This flag (or a different one to make
>    things clear eventually) is "broadcast" all the way up the tree
>    so its parent bridges/controllers are marked as well.

You don't need this.

>  - All devices get "suspend_prepare".
>  - All devices get "suspend_save_state" and block normal IOs
>  - All devices not marked above get "suspend"

... not needed. You are going to power down (suspend-to-disk ends in
powerdown, right?), so you don't care what state the devices are in.
You don't need to suspend them.

You just write the state to disk and power down at this point.
								Pavel
-- 
I'm pavel@ucw.cz. "In my country we have almost anarchy and I don't care."
Panos Katsaloulis describing me w.r.t. patents at discuss@linmodems.org


* Re: [RFC] New Driver Model for 2.5
  2001-10-21 17:09     ` Pavel Machek
@ 2001-10-23  0:19       ` Patrick Mochel
  2001-10-23  0:31         ` Alan Cox
  0 siblings, 1 reply; 120+ messages in thread
From: Patrick Mochel @ 2001-10-23  0:19 UTC (permalink / raw)
  To: Pavel Machek; +Cc: Benjamin Herrenschmidt, Jeff Garzik, linux-kernel


> > That would give us the following scenario:
> >
> >  - The device for suspend-to-disk is identified and a flag is set
> >    in its device structure. This flag (or a different one to make
> >    things clear eventually) is "broadcast" all the way up the tree
> >    so its parent bridges/controllers are marked as well.
>
> You don't need this.

Correct. Suspend-to-disk is a process, not a discrete action. It goes much
like Ben described:

From the system's point of view (slightly different from the driver's point
of view):

- notify devices of pending suspension; check for failures
- tell devices to suspend state; write this state to disk
- turn all devices off
- power off (aka suspend system)

> >  - All devices get "suspend_prepare".
> >  - All devices get "suspend_save_state" and block normal IOs
> >  - All devices not marked above get "suspend"
>
> ... not needed. You are going to power down (suspend-to-disk ends in
> powerdown, right?), so you don't care what state the devices are in.
> You don't need to suspend them.
>
> You just write the state to disk and power down at this point.

Uhm...That would probably work. But, I would rather explicitly turn off
all the devices. In the world of ACPI, S4 and S5 (Suspend to disk and
soft-off respectively) don't power the system completely off; some power
remains to capture wake events.

[ One point to note is that in an ACPI-enabled system, we use S5 to power
the system off. But, there are things still running in the system.
Everything is not shut off until you perform a mechanical off (unplug it).
This means that you can get wake events and cause the system to boot after
you think you've turned it off. This is why some people are experiencing
reboots when the system should be powering down. ]

By explicitly turning off as many devices as possible, we're playing more
on the safe side of things, and possibly reducing the amount of power that
is being consumed.


Btw, I updated the model to support an n-stage suspend process, with 3
stages explicitly defined, as per some discussion about it. Those stages
are:

        SUSPEND_NOTIFY
        SUSPEND_SAVE_STATE
        SUSPEND_POWER_DOWN

To suspend the device tree, one would do something like:

	/* Tell all the devices we're going to sleep.
	 * This also allows them to allocate memory before the swap
	 * device stops taking orders.
	 */
	device_suspend(3, SUSPEND_NOTIFY);

	/* if someone failed, get out now */

	/* Now tell them to stop I/O and save their state */
	device_suspend(3, SUSPEND_SAVE_STATE);

	/* Write the state to disk... */

	/* Finally, turn all of the devices off. */
	device_suspend(3, SUSPEND_POWER_DOWN);

	/* Now, put the system to sleep. */
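
For illustration, here is one way such a staged walk might behave on a small device tree, visiting children before their parent and bailing out on the first failure. Everything except the three stage names is hypothetical (the struct device fields, suspend_subtree, suspend_tree and the two sample callbacks), and the first argument of device_suspend() is not modeled at all:

```c
#include <stddef.h>

enum suspend_stage {
	SUSPEND_NOTIFY,
	SUSPEND_SAVE_STATE,
	SUSPEND_POWER_DOWN
};

struct device {
	const char *name;
	struct device *first_child;
	struct device *next_sibling;
	/* hypothetical driver callback, returns 0 on success */
	int (*suspend)(struct device *dev, enum suspend_stage stage);
	int stages_done;	/* stages this device has completed */
};

/* Sample callback: always succeeds. */
static int ok_suspend(struct device *dev, enum suspend_stage stage)
{
	(void)dev;
	(void)stage;
	return 0;
}

/* Sample callback: refuses the notification, e.g. because it is busy. */
static int busy_suspend(struct device *dev, enum suspend_stage stage)
{
	(void)dev;
	return stage == SUSPEND_NOTIFY ? -1 : 0;
}

/* Depth-first, children before parent; stop at the first failure. */
static int suspend_subtree(struct device *dev, enum suspend_stage stage)
{
	struct device *child;
	int err;

	for (child = dev->first_child; child; child = child->next_sibling) {
		err = suspend_subtree(child, stage);
		if (err)
			return err;
	}
	if (!dev->suspend)
		return 0;	/* nothing to do for this device */
	err = dev->suspend(dev, stage);
	if (!err)
		dev->stages_done++;
	return err;
}

static int suspend_tree(struct device *root)
{
	int stage, err;

	for (stage = SUSPEND_NOTIFY; stage <= SUSPEND_POWER_DOWN; stage++) {
		err = suspend_subtree(root, (enum suspend_stage)stage);
		if (err)
			return err;	/* if someone failed, get out now */
		/* after SUSPEND_SAVE_STATE the state would go to disk */
	}
	return 0;
}
```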


I also updated the docs. It can all be found at:

http://kernel.org/pub/linux/kernel/people/mochel/device/


	-pat



* Re: [RFC] New Driver Model for 2.5
  2001-10-23  0:31         ` Alan Cox
@ 2001-10-23  0:29           ` Patrick Mochel
  2001-10-23  7:53             ` Alan Cox
  2001-10-23  9:44             ` Pavel Machek
  2001-10-23 10:54           ` Benjamin Herrenschmidt
  1 sibling, 2 replies; 120+ messages in thread
From: Patrick Mochel @ 2001-10-23  0:29 UTC (permalink / raw)
  To: Alan Cox; +Cc: Pavel Machek, Benjamin Herrenschmidt, Jeff Garzik, linux-kernel


On Tue, 23 Oct 2001, Alan Cox wrote:

> > 	/* Now tell them to stop I/O and save their state */
> > 	device_suspend(3, SUSPEND_SAVE_STATE);
>
> I'd very much like this one to be two pass, with the second pass occurring
> after interrupts are disabled. There are some horrible cases to try and
> handle otherwise (like devices that like to jam the irq line high).

I forgot to mention to disable interrupts after the SUSPEND_NOTIFY call.
The idea is to allocate all memory in the first pass, disable interrupts,
then save state. Would that work? Or, should some of the state saving take
place with interrupts enabled?


> Ditto on return from suspend where some devices also like to float the irq
> high as you take them over (eg USB on my Palmax). From comments Ben made
> ages back I believe ppc has similar issues if not worse

Yes, the resume sequence is broken into two stages:

	device_resume(RESUME_POWER_ON);

	/* enable interrupts */

	device_resume(RESUME_RESTORE_STATE);

Do you see a need to break it up further?

	-pat




* Re: [RFC] New Driver Model for 2.5
  2001-10-23  0:19       ` Patrick Mochel
@ 2001-10-23  0:31         ` Alan Cox
  2001-10-23  0:29           ` Patrick Mochel
  2001-10-23 10:54           ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 120+ messages in thread
From: Alan Cox @ 2001-10-23  0:31 UTC (permalink / raw)
  To: Patrick Mochel
  Cc: Pavel Machek, Benjamin Herrenschmidt, Jeff Garzik, linux-kernel

> 	/* Now tell them to stop I/O and save their state */
> 	device_suspend(3, SUSPEND_SAVE_STATE);

I'd very much like this one to be two pass, with the second pass occurring
after interrupts are disabled. There are some horrible cases to try and
handle otherwise (like devices that like to jam the irq line high).

Ditto on return from suspend where some devices also like to float the irq
high as you take them over (eg USB on my Palmax). From comments Ben made
ages back I believe ppc has similar issues if not worse


Alan


* Re: [RFC] New Driver Model for 2.5
  2001-10-23  0:29           ` Patrick Mochel
@ 2001-10-23  7:53             ` Alan Cox
  2001-10-23 15:10               ` Jonathan Lundell
  2001-10-23  9:44             ` Pavel Machek
  1 sibling, 1 reply; 120+ messages in thread
From: Alan Cox @ 2001-10-23  7:53 UTC (permalink / raw)
  To: Patrick Mochel
  Cc: Alan Cox, Pavel Machek, Benjamin Herrenschmidt, Jeff Garzik,
	linux-kernel

> The idea is to allocate all memory in the first pass, disable interrupts,
> then save state. Would that work? Or, should some of the state saving take
> place with interrupts enabled?

Imagine the state saving done on a USB device. There you need interrupts
on while retrieving the state from say a USB scanner, and in some cases
off while killing the USB controller.

> > Ditto on return from suspend where some devices also like to float the irq
> > high as you take them over (eg USB on my Palmax). From comments Ben made
> > ages back I believe ppc has similar issues if not worse
> 
> Yes, the resume sequence is broken into two stages:
> 
> 	device_resume(RESUME_POWER_ON);
> 
> 	/* enable interrupts */
> 
> 	device_resume(RESUME_RESTORE_STATE);
> 
> Do you see a need to break it up further?

Nope.


* Re: [RFC] New Driver Model for 2.5
  2001-10-23  0:29           ` Patrick Mochel
  2001-10-23  7:53             ` Alan Cox
@ 2001-10-23  9:44             ` Pavel Machek
  2001-10-23 11:03               ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 120+ messages in thread
From: Pavel Machek @ 2001-10-23  9:44 UTC (permalink / raw)
  To: Patrick Mochel
  Cc: Alan Cox, Benjamin Herrenschmidt, Jeff Garzik, linux-kernel

Hi!

> > > 	/* Now tell them to stop I/O and save their state */
> > > 	device_suspend(3, SUSPEND_SAVE_STATE);
> >
> > I'd very much like this one to be two pass, with the second pass occurring
> > after interrupts are disabled. There are some horrible cases to try and
> > handle otherwise (like devices that like to jam the irq line high).
> 
> I forgot to mention to disable interrupts after the SUSPEND_NOTIFY call.
> The idea is to allocate all memory in the first pass, disable interrupts,
> then save state. Would that work? Or, should some of the state saving take
> place with interrupts enabled?

That looks ugly, because you'd need to add DONT_SUSPEND_NOTIFY, called
when SUSPEND_NOTIFY fails.
								Pavel
-- 
Casualities in World Trade Center: 6453 dead inside the building,
cryptography in U.S.A. and free speech in Czech Republic.


* Re: [RFC] New Driver Model for 2.5
  2001-10-23  0:31         ` Alan Cox
  2001-10-23  0:29           ` Patrick Mochel
@ 2001-10-23 10:54           ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 120+ messages in thread
From: Benjamin Herrenschmidt @ 2001-10-23 10:54 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel, Patrick Mochel

>I'd very much like this one to be two pass, with the second pass occurring
>after interrupts are disabled. There are some horrible cases to try and
>handle otherwise (like devices that like to jam the irq line high).
>
>Ditto on return from suspend where some devices also like to float the irq
>high as you take them over (eg USB on my Palmax). From comments Ben made
>ages back I believe ppc has similar issues if not worse

Well, the idea here was to disable them between the second and third
pass. Devices that can completely suspend with interrupts enabled can
do it at the end of step 2, while more broken devices can do it at
step 3. It might be semantically clearer to actually consider step
2 as exclusively "block IO & save state", in which case breaking up
the "suspend" state into 2 separate states, with and without interrupts,
makes sense. We just didn't feel that was necessary.

Ben.




* Re: [RFC] New Driver Model for 2.5
  2001-10-23  9:44             ` Pavel Machek
@ 2001-10-23 11:03               ` Benjamin Herrenschmidt
  2001-10-23 11:49                 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 120+ messages in thread
From: Benjamin Herrenschmidt @ 2001-10-23 11:03 UTC (permalink / raw)
  To: Pavel Machek; +Cc: linux-kernel, Alan Cox, Patrick Mochel

>> I forgot to mention to disable interrupts after the SUSPEND_NOTIFY call.
>> The idea is to allocate all memory in the first pass, disable interrupts,
>> then save state. Would that work? Or, should some of the state saving take
>> place with interrupts enabled?
>
>That looks ugly, because you'd need to add DONT_SUSPEND_NOTIFY, called
>when SUSPEND_NOTIFY fails.
>								Pavel

No, interrupts have to be shut down between SUSPEND_SAVE_STATE and
SUSPEND_POWER_DOWN, I believe.

SUSPEND_SAVE_STATE must run with interrupts enabled, as it's supposed
to both block new incoming IOs and wait for pending ones to complete (*).
It would be inefficient to force drivers to implement polled IOs for
this case.

SUSPEND_POWER_DOWN itself should be perfectly able to run with interrupts
disabled, I believe, as most of the actual suspend sequence can be done
in SUSPEND_SAVE_STATE on most chips.

There is no problem with failure there. Just call RESUME_POWER_ON if
SUSPEND_POWER_DOWN failed (or later), then RESUME_RESTORE_STATE in
all cases. The driver knows which state it comes from anyway,
and we don't have, I believe, that strict VM need of separating
suspend from frees. Well... let's think more about it... we might
actually need to allocate memory in RESUME_RESTORE_STATE to create
new requests or whatever the driver needs... but we can also do that
earlier, inside SUSPEND_NOTIFY (or just use whatever memory we
pre-allocated to save state). So it might make sense to have the
resume process be an exact mirror of the suspend one, or not; maybe
it's just a matter of taste.

In any case, keep in mind that most drivers won't need to implement
all of these.
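
To make the ordering concrete, here is a small user-space sketch of where the interrupt switch and the failure unwinding would sit under the scheme described above. It is a model only: broadcast() stands in for walking the device tree, irqs_enabled for the real interrupt state, and fail_at, trace, system_suspend and system_resume are invented for the illustration. Each stage is recorded with '+' (interrupts on) or '-' (interrupts off).

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

enum pm_stage {
	SUSPEND_NOTIFY,
	SUSPEND_SAVE_STATE,
	SUSPEND_POWER_DOWN,
	RESUME_POWER_ON,
	RESUME_RESTORE_STATE
};

static const char *const stage_name[] = {
	"NOTIFY", "SAVE_STATE", "POWER_DOWN", "POWER_ON", "RESTORE_STATE"
};

static char trace[256];
static int irqs_enabled = 1;
static int fail_at = -1;	/* stage that should fail, -1 for none */

/* Stand-in for sending one stage to every device. */
static int broadcast(enum pm_stage stage)
{
	size_t used = strlen(trace);

	snprintf(trace + used, sizeof(trace) - used, "%s%c ",
		 stage_name[stage], irqs_enabled ? '+' : '-');
	return (int)stage == fail_at ? -1 : 0;
}

static int system_suspend(void)
{
	int err;

	err = broadcast(SUSPEND_NOTIFY);
	if (err)
		goto restore;
	/* interrupts stay on: pending IO has to complete here */
	err = broadcast(SUSPEND_SAVE_STATE);
	if (err)
		goto restore;
	irqs_enabled = 0;
	err = broadcast(SUSPEND_POWER_DOWN);
	if (err) {
		broadcast(RESUME_POWER_ON);
		irqs_enabled = 1;
		goto restore;
	}
	return 0;		/* the machine would go to sleep now */
restore:
	/* RESUME_RESTORE_STATE in all failure cases */
	broadcast(RESUME_RESTORE_STATE);
	return err;
}

static void system_resume(void)
{
	broadcast(RESUME_POWER_ON);
	irqs_enabled = 1;
	broadcast(RESUME_RESTORE_STATE);
}
```

A failure in SUSPEND_POWER_DOWN and a normal suspend followed by a resume produce the same sequence of stages in this model, which is the "mirror" property discussed above.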

Ben.






* Re: [RFC] New Driver Model for 2.5
  2001-10-23 11:03               ` Benjamin Herrenschmidt
@ 2001-10-23 11:49                 ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 120+ messages in thread
From: Benjamin Herrenschmidt @ 2001-10-23 11:49 UTC (permalink / raw)
  To: Patrick Mochel; +Cc: Pavel Machek, linux-kernel, Alan Cox

>SUSPEND_SAVE_STATE must run with interrupts enabled, as it's supposed
>to both block new incoming IOs and wait for pending ones to complete (*).
>It would be inefficient to force drivers to implement polled IOs for
>this case.

I forgot...

(*) Did you decide whether to allow that "call me again later" result code
from SUSPEND_SAVE_STATE? If yes, that would mean you must loop, notifying
all drivers that have not ack'ed it until they all do, before going to
SUSPEND_POWER_DOWN. It's probably not much bloat to leave the feature in;
as usual, it doesn't have to be used by drivers, but for those drivers
that know it will take some time for pending async requests to complete,
it makes sense to let others perform their job. It would slightly speed
up the suspend process, which is not critical, but still nice ;)
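
A minimal sketch of such a loop, assuming a hypothetical PM_AGAIN result code and a simulated driver in which one pending request completes per call (real completions would of course arrive asynchronously). The names struct device, save_state and save_state_all are invented for the example:

```c
#include <stddef.h>

#define PM_DONE		0
#define PM_AGAIN	1	/* hypothetical "call me again later" code */

struct device {
	const char *name;
	int pending_io;		/* outstanding async requests (simulated) */
	int acked;		/* device has finished SUSPEND_SAVE_STATE */
	int calls;		/* how often the callback was invoked */
};

/* Simulated driver callback: one pending request drains per call. */
static int save_state(struct device *dev)
{
	dev->calls++;
	if (dev->pending_io > 0) {
		dev->pending_io--;
		return PM_AGAIN;
	}
	return PM_DONE;
}

/*
 * Keep notifying the devices that have not acked yet, so that slow
 * devices do not hold up the others. Returns the number of passes
 * needed, or -1 if max_passes was not enough.
 */
static int save_state_all(struct device *devs, size_t n, int max_passes)
{
	size_t i, remaining = n;
	int pass;

	for (pass = 1; pass <= max_passes; pass++) {
		for (i = 0; i < n; i++) {
			if (devs[i].acked)
				continue;
			if (save_state(&devs[i]) == PM_DONE) {
				devs[i].acked = 1;
				remaining--;
			}
		}
		if (remaining == 0)
			return pass;
	}
	return -1;
}
```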

Ben.






* Re: [RFC] New Driver Model for 2.5
  2001-10-23  7:53             ` Alan Cox
@ 2001-10-23 15:10               ` Jonathan Lundell
  2001-10-23 15:49                 ` Alan Cox
  2001-10-23 20:22                 ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 120+ messages in thread
From: Jonathan Lundell @ 2001-10-23 15:10 UTC (permalink / raw)
  To: Alan Cox, Patrick Mochel
  Cc: Alan Cox, Pavel Machek, Benjamin Herrenschmidt, Jeff Garzik,
	linux-kernel

At 8:53 AM +0100 10/23/01, Alan Cox wrote:
>> The idea is to allocate all memory in the first pass, disable interrupts,
>> then save state. Would that work? Or, should some of the state saving take
>> place with interrupts enabled?
>
>Imagine the state saving done on a USB device. There you need interrupts
>on while retrieving the state from say a USB scanner, and in some cases
>off while killing the USB controller.

Is this a realistic example? That is, is a kernel-side driver likely 
to be able to meaningfully extract state information from a scanner? 
And is it necessary?

And for a scanner, if the current operation is a scan generating a GB 
of data, what happens if the disk subsystem is no longer accepting 
requests?

As Jeff Garzik pointed out, NIC drivers typically don't need to save
any state at all; it's all re-creatable from software structures.
Perhaps that characteristic can and should be generalized to other
devices.

In that case, SUSPEND_SAVE_STATE becomes more like SUSPEND_QUIESCE: 
stop accepting new requests, and complete current requests.

"Stop accepting new requests" is nontrivial as well, in the general 
case. New requests that can't be discarded need to be queued 
somewhere. Whose responsibility is that? Ideally at some point where 
a queue already exists, possibly in the requester.
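
As a sketch of the queueing question, here is a toy driver that parks requests arriving while it is quiesced and replays them on resume. The names (struct driver, submit, quiesce, resume, QUEUE_LEN) are hypothetical, and the fixed-size array simply makes the "whose responsibility" problem visible: once it is full, the requester has to hold on to the request itself.

```c
#include <stddef.h>

#define QUEUE_LEN 4

struct driver {
	int quiesced;
	int held[QUEUE_LEN];	/* request ids parked while quiesced */
	size_t nheld;
	int started;		/* requests handed to the hardware */
};

/* Returns 0 if started, 1 if parked, -1 if the driver cannot take it. */
static int submit(struct driver *drv, int req)
{
	if (!drv->quiesced) {
		drv->started++;	/* stand-in for starting the IO */
		return 0;
	}
	if (drv->nheld == QUEUE_LEN)
		return -1;	/* the requester must queue it instead */
	drv->held[drv->nheld++] = req;
	return 1;
}

/* Stop accepting new requests; current ones are assumed complete. */
static void quiesce(struct driver *drv)
{
	drv->quiesced = 1;
}

/* Accept requests again and replay the parked ones in arrival order. */
static void resume(struct driver *drv)
{
	size_t i;

	drv->quiesced = 0;
	for (i = 0; i < drv->nheld; i++)
		submit(drv, drv->held[i]);
	drv->nheld = 0;
}
```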
-- 
/Jonathan Lundell.


* Re: [RFC] New Driver Model for 2.5
  2001-10-23 15:10               ` Jonathan Lundell
@ 2001-10-23 15:49                 ` Alan Cox
  2001-10-23 20:22                 ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 120+ messages in thread
From: Alan Cox @ 2001-10-23 15:49 UTC (permalink / raw)
  To: Jonathan Lundell
  Cc: Alan Cox, Patrick Mochel, Pavel Machek, Benjamin Herrenschmidt,
	Jeff Garzik, linux-kernel

> Is this a realistic example? That is, is a kernel-side driver likely 
> to be able to meaningfully extract state information from a scanner? 
> And is it necessary?

It may be a bad example - but think about things like page settings. Do you
want a resume to scan in colour when you set black and white just before
suspend ?


> And for a scanner, if the current operation is a scan generating a GB 
> of data, what happens if the disk subsystem is no longer accepting 
> requests?

It should have refused to suspend because it was active

> In that case, SUSPEND_SAVE_STATE becomes more like SUSPEND_QUIESCE: 
> stop accepting new requests, and complete current requests.

Maybe. That sounds like nice design and horrible implementation


* Re: [RFC] New Driver Model for 2.5
  2001-10-23 15:10               ` Jonathan Lundell
  2001-10-23 15:49                 ` Alan Cox
@ 2001-10-23 20:22                 ` Benjamin Herrenschmidt
  2001-10-23 20:54                   ` Alan Cox
  1 sibling, 1 reply; 120+ messages in thread
From: Benjamin Herrenschmidt @ 2001-10-23 20:22 UTC (permalink / raw)
  To: Jonathan Lundell; +Cc: Alan Cox, Patrick Mochel, linux-kernel, Linus Torvalds

>"Stop accepting new requests" is nontrivial as well, in the general 
>case. New requests that can't be discarded need to be queued 
>somewhere. Whose responsibility is that? Ideally at some point where 
>a queue already exists, possibly in the requester.

Some drivers already handle queues. In the case of a network driver, just
stop your network queue and stop accepting incoming packets. If your
driver is too simple to have queues, a simple semaphore on entry points
can often be enough. You shouldn't deadlock as you are not supposed to
re-enter a sleeping driver in step 2.

The above is ensured by the tree layout, which does the dependency
ordering. You might have slightly off-tree dependencies, like I have
in a couple of cases on macs. But I figured that all of them could be
handled as special cases in some parent nodes without being that
dirty (in most cases, those are Apple-specific ASICs containing devices
with inter-deps, and the workaround is to move some devices' sleep code
to the node of the ASIC itself).

Ben.




* Re: [RFC] New Driver Model for 2.5
  2001-10-23 20:22                 ` Benjamin Herrenschmidt
@ 2001-10-23 20:54                   ` Alan Cox
  2001-10-24  0:26                     ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 120+ messages in thread
From: Alan Cox @ 2001-10-23 20:54 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Jonathan Lundell, Alan Cox, Patrick Mochel, linux-kernel, Linus Torvalds

> >"Stop accepting new requests" is nontrivial as well, in the general 
> >case. New requests that can't be discarded need to be queued 
> >somewhere. Whose responsibility is that? Ideally at some point where 
> >a queue already exists, possibly in the requester.
> 
> Some drivers already handle queues. In the case of a network driver, just
> stop your network queue and stop accepting incoming packets. If your
> driver is too simple to have queues, a simple semaphore on entry points
> can often be enough. You shouldn't deadlock as you are not supposed to
> re-enter a sleeping driver in step 2.

Stop accepting new requests is not simple. To complete existing requests you
might need an arbitrary other module to complete a new request you submit
as part of your shutdown.

Alan


* Re: [RFC] New Driver Model for 2.5
  2001-10-23 20:54                   ` Alan Cox
@ 2001-10-24  0:26                     ` Benjamin Herrenschmidt
  2001-10-24  9:57                       ` Alan Cox
  0 siblings, 1 reply; 120+ messages in thread
From: Benjamin Herrenschmidt @ 2001-10-24  0:26 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linus Torvalds, linux-kernel, Patrick Mochel, Jonathan Lundell

>> Some drivers already handle queues. In the case of a network driver, just
>> stop your network queue and stop accepting incoming packets. If your
>> driver is too simple to have queues, a simple semaphore on entry points
>> can often be enough. You shouldn't deadlock as you are not supposed to
>> re-enter a sleeping driver in step 2.
>
>Stop accepting new requests is not simple. To complete existing requests you
>might need an arbitrary other module to complete a new request you submit
>as part of your shutdown.

That means you have an ordering dependency: the driver you rely
upon must be stopped after you. That's the point of having a
tree here. Patrick and Linus feel the bus tree is enough to handle
that dependency, which might well be the case for 99% of drivers.

I have a couple of cases where that's not completely true on pmacs,
but nothing that can't be worked around simply, I believe.

If you feel more drivers will be affected, then we will probably
need to separate the power-tree from the device-tree and provide
some hooks so that ordering can be tweaked.

All this assumes you don't have circular dependencies of course ;)

I see a lot of cases where this "block IOs" is easily dealt with
in the drivers I maintain on pmac, that might not be that easy on
other archs, I can't tell.

Basically, simple drivers can just use a semaphore. I do that for
our sound driver, for example: I block any app doing an ioctl while
the driver is sleeping. (This happens late enough in the sleep
process that userland using /dev/apm_bios has already been
notified and has acked the suspend, so properly written apps
have stopped themselves already.)

Drivers using a request queue usually already have a way to mark
themselves busy (they use that to decide if they have to kick
the HW or not when getting a new request). In cases where a mid-layer
enters the scene, like SCSI, that wants to do timeouts, then well...
we can let it timeout (just stop processing requests), or we can
have the midlayer go to sleep as well :) That latter solution
may cause some interesting ordering issues however...

Network drivers can stop their queue or just drop packets... I'd
like it if they made sure that packets received from the network stack
before the callback is called get sent first. Those packets
may contain the request to a server to send a wake-on-lan magic
packet to your machine ;) For now, I just block the output
queue and flush the rings on pmac, but I also don't support WOL yet.

For fbdevs, I simply switch them to dummy functions when asleep.
This appears to work well. (Well, I do some additional state save
and PM, but all I do for "blocking IOs" is to drop them...)
Any printk done after they are suspended isn't displayed, but that's
not a real issue.

So yes, "blocking IOs" can actually mean "dropping new IOs",
that depends very much on the driver.
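
The "switch to dummy functions" trick for fbdevs can be sketched with a plain function-pointer table. This is not the real fbdev interface: struct fb_ops_sketch, struct fb_dev and the functions below are invented for the illustration, and a single write_pixel hook stands in for the whole set of drawing operations.

```c
#include <stddef.h>

struct fb_ops_sketch {
	int (*write_pixel)(int x, int y, unsigned color);
};

static unsigned last_color;	/* stand-in for the hardware framebuffer */

/* Normal operation: touches the (simulated) hardware. */
static int hw_write_pixel(int x, int y, unsigned color)
{
	(void)x;
	(void)y;
	last_color = color;
	return 0;
}

/* Asleep: accept the call and silently drop it. */
static int dummy_write_pixel(int x, int y, unsigned color)
{
	(void)x;
	(void)y;
	(void)color;
	return 0;
}

static const struct fb_ops_sketch hw_ops = { hw_write_pixel };
static const struct fb_ops_sketch dummy_ops = { dummy_write_pixel };

struct fb_dev {
	const struct fb_ops_sketch *ops;
};

/* "Blocking IOs" here just means swapping in the dummy table. */
static void fb_suspend(struct fb_dev *fb)
{
	fb->ops = &dummy_ops;
}

static void fb_resume(struct fb_dev *fb)
{
	fb->ops = &hw_ops;
}
```

Callers keep going through fb->ops without any extra checks, so anything drawn while asleep (a printk, for instance) is simply lost.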

For USB, for example, we can consider that once a device driver's
(not a controller driver's) suspend has been done, any URB it submits
can just be dropped (returned immediately with an error). We don't
need blocking here either. Of course, that means we have the
framework to call devices' suspend/resume callbacks when the
controller is about to go to sleep.

There might be other examples. I agree it's not a two-line fix
per driver, but that's the best I could imagine so far to have
something reliable.

If you have other ideas, please share.

Ben.




* Re: [RFC] New Driver Model for 2.5
  2001-10-24  0:26                     ` Benjamin Herrenschmidt
@ 2001-10-24  9:57                       ` Alan Cox
  2001-10-24 10:34                         ` Benjamin Herrenschmidt
                                           ` (2 more replies)
  0 siblings, 3 replies; 120+ messages in thread
From: Alan Cox @ 2001-10-24  9:57 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Alan Cox, Linus Torvalds, linux-kernel, Patrick Mochel, Jonathan Lundell

> upon must be stopped after you. That's the point of having a
> tree here. Patrick and Linus feel the bus tree is enough to handle
> that dependency, which might well be the case for 99% of drivers.

The two trees are certainly closely related - USB devices before USB hub,
USB hub before PCI etc. The scanner example works fine there, providing that
we are careful about memory issues - remember the USB layer allocates memory
to do any transaction, so the scanner has to complete its state save before 
we do any interrupt disabling/memory alloc freezing.

That's still just ordering and maybe two passes

> the HW or not when getting a new request). In cases where a mid-layer
> enters the scene, like SCSI, that wants to do timeouts, then well...
> we can let it timeout (just stop processing requests), or we can
> have the midlayer go to sleep as well :) That latter solution
> may cause some interesting ordering issues however...

For scsi you have to complete the pending commands, you don't know what the
transaction granularity is in some cases and half completing the sequence
won't help you. In addition the upper layers have to queue additional scsi
commands to do stuff like cd drawer locking and to ask the drive firmware
to enter powerdown modes

> For USB, for example, we can consider that when a device driver
> (not a controller driver) suspend has been done, any URB it submits
> can just be dropped (returned immediately with an error). We don't
> need blocking here either. Of course, that means we have the
> framework to call devices' suspend/resume callbacks when the
> controller is about to go to sleep.

That will scramble large numbers of devices. Randomly erroring pending block
writes is -not- civilised.

Alan


* Re: [RFC] New Driver Model for 2.5
  2001-10-24  9:57                       ` Alan Cox
@ 2001-10-24 10:34                         ` Benjamin Herrenschmidt
  2001-10-24 10:54                           ` Alan Cox
  2001-10-24 15:18                         ` Jonathan Lundell
  2001-10-24 15:41                         ` Linus Torvalds
  2 siblings, 1 reply; 120+ messages in thread
From: Benjamin Herrenschmidt @ 2001-10-24 10:34 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linus Torvalds, linux-kernel, Patrick Mochel, Jonathan Lundell

>
>The two trees are certainly closely related - USB devices before USB hub,
>USB hub before PCI etc. The scanner example works fine there, providing that
>we are careful about memory issues - remember the USB layer allocates memory
>to do any transaction, so the scanner has to complete its state save before 
>we do any interrupt disabling/memory alloc freezing.

That's why we have that first step that is run before any device is blocked
and with interrupts still flowing. Devices that need memory for either state
save or for requests used during power down are supposed to allocate them
at this point.

>That's still just ordering and maybe two passes

Well, 3 passes actually ;) I'd suggest you re-read Patrick Mochel's latest
document, which describes the 3 step process. The first pass gives devices
a chance to "prepare" for sleep, that is, allocate anything they will need
without actually blocking or suspending anything. On the second pass, devices
are asked to suspend IOs, complete pending ones, and save state. This is
done with interrupts still enabled. The 3rd pass is called with interrupts
disabled and is where the actual shutdown of the device is supposed to happen.

In practice, a lot of drivers will only need to implement 1 or 2 of these
3 passes. But the flexibility has to be there for a device that needs both
to allocate memory and to suspend with interrupts disabled.
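
As a rough illustration of the three passes described above, the core could
walk an already-ordered device list once per pass and skip the passes a
driver does not implement. This is only a sketch: struct pm_device,
suspend_three_pass() and the PASS_* names are made up for the example and
are not taken from Patrick's document.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pass names; the real interface may look quite different. */
enum suspend_pass {
	PASS_PREPARE,     /* allocate memory; nothing is blocked yet */
	PASS_SAVE_STATE,  /* block new IO, drain pending IO, save state; irqs on */
	PASS_POWER_DOWN,  /* actual shutdown; would run with irqs disabled */
	PASS_COUNT
};

struct pm_device {
	const char *name;
	/* One optional callback per pass; NULL means "nothing to do". */
	int (*pass[PASS_COUNT])(struct pm_device *dev);
	int ran[PASS_COUNT];  /* how many times each pass ran successfully */
};

/* Example callbacks: one that succeeds, one that refuses. */
int pm_ok(struct pm_device *dev)   { (void)dev; return 0; }
int pm_fail(struct pm_device *dev) { (void)dev; return -1; }

/*
 * Run all three passes over devs[0..n-1], which the caller has already
 * ordered children-first. Every device sees pass N before any device
 * sees pass N+1. Returns 0, or the first non-zero callback result.
 */
int suspend_three_pass(struct pm_device **devs, size_t n)
{
	assert(devs != NULL || n == 0);

	for (int p = 0; p < PASS_COUNT; p++) {
		for (size_t i = 0; i < n; i++) {
			struct pm_device *dev = devs[i];
			int err;

			if (!dev->pass[p])
				continue;
			err = dev->pass[p](dev);
			if (err)
				return err;
			dev->ran[p]++;
		}
	}
	return 0;
}
```

A sound driver would typically fill in only the PASS_POWER_DOWN slot, while
a disk driver would fill in the first two.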

An additional idea we had was to make pass 2 somewhat asynchronous by
allowing some kind of "call me later" result code (basically, a device
would block its queue, and return "call me later" while it still has
pending IOs). This is an optional "feature".
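
The "call me later" idea could look roughly like the polling loop below.
Again a sketch under assumptions: the result codes, struct io_device and
run_pass2() are invented names, and the pending_ios-- line merely stands in
for a request completing between two polls.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for a pass-2 callback. */
enum { SUSPEND_DONE = 0, SUSPEND_CALL_ME_LATER = 1 };

struct io_device {
	int queue_blocked;  /* set once new requests are refused */
	int pending_ios;    /* requests already handed to the hardware */
	int suspended;      /* set once pass 2 has completed */
};

/* Pass-2 callback: block the queue, then report whether IO is pending. */
int io_save_state(struct io_device *dev)
{
	dev->queue_blocked = 1;
	if (dev->pending_ios > 0) {
		/* Stand-in for one request completing before the next poll. */
		dev->pending_ios--;
		return SUSPEND_CALL_ME_LATER;
	}
	return SUSPEND_DONE;
}

/*
 * Poll every unfinished device once per round, for at most max_rounds
 * rounds. Returns 0 when all devices are done, -1 if some device still
 * asked to be called later in the last round.
 */
int run_pass2(struct io_device *devs, size_t n, int max_rounds)
{
	assert(devs != NULL || n == 0);

	for (int round = 0; round < max_rounds; round++) {
		size_t busy = 0;

		for (size_t i = 0; i < n; i++) {
			if (devs[i].suspended)
				continue;
			if (io_save_state(&devs[i]) == SUSPEND_DONE)
				devs[i].suspended = 1;
			else
				busy++;
		}
		if (busy == 0)
			return 0;
	}
	return -1;
}
```

The point of the loop is that a slow device does not hold up the others:
they all get their queues blocked in the first round.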

>> the HW or not when getting a new request). In cases where a mid-layer
>> enters the scene, like SCSI, that wants to do timeouts, then well...
>> we can let it timeout (just stop processing requests), or we can
>> have the midlayer go to sleep as well :) That latter solution
>> may cause some interesting ordering issues however...
>
>For scsi you have to complete the pending commands, you don't know what the
>transaction granularity is in some cases and half completing the sequence
>won't help you. In addition the upper layers have to queue additional scsi
>commands to do stuff like cd drawer locking and to ask the drive firmware
>to enter powerdown modes

Yes. SCSI is a problem as the current SCSI layer is tricky enough to
make this difficult. I believe that if the SCSI devices are children of
the SCSI controller, then they can take care of suspending the device
(that is sending whatever command to lock the drawer or stop the disk
spinning) before the controller is actually going to sleep. In that
case, there's not much left to the controller, it isn't supposed to
have any command in queue nor receive any new one once all its child
drivers have suspended.

>> For USB, for example, we can consider that when a device driver
>> (not a controller driver) suspend has been done, any URB it submits
>> can just be dropped (returned immediately with an error). We don't
>> need blocking here either. Of course, that means we have the
>> framework to call devices' suspend/resume callbacks when the
>> controller is about to go to sleep.
>
>That will scramble large numbers of devices. Randomly erroring pending block
>writes is -not- civilised.

The drivers will have to be adapted for PM, whatever scheme we use. The
USB case is very similar to SCSI. The controller is a parent of all
devices. Devices will get the suspend before the controller, they will
have a chance to suspend requests and wait for pending ones to complete
(for example, the USB storage will have a chance to block new incoming
requests in the queue and wait for pending ones to complete) before
the USB controller is put to suspend.
That way, once we reach the real USB controller suspend, we can safely
discard URBs as we are not supposed to get any until devices have been
resumed.
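
The "devices before their controller" ordering is just a post-order walk of
the tree. A minimal sketch, assuming a made-up struct bus_node with a fixed
child array (nothing here is the proposed struct device):

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 8

struct bus_node {
	const char *name;
	struct bus_node *children[MAX_CHILDREN];
	size_t nchildren;
	int suspended;
};

/*
 * Suspend everything below `node` first, then `node` itself, recording
 * the order in out[]. `count` is the number of entries already used;
 * the new count is returned. `cap` is the capacity of out[].
 */
size_t suspend_subtree(struct bus_node *node, const char **out,
		       size_t count, size_t cap)
{
	assert(node != NULL && node->nchildren <= MAX_CHILDREN);

	for (size_t i = 0; i < node->nchildren; i++)
		count = suspend_subtree(node->children[i], out, count, cap);

	/* By now no child can submit new requests to this node. */
	node->suspended = 1;
	if (count < cap)
		out[count++] = node->name;
	return count;
}
```

With an OHCI controller that has a storage device and a scanner below it,
the recorded order is storage, scanner, controller.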

I currently don't do that on pmac (I just let URBs be handled by OHCI
and let OHCI fill the TD queues in memory, I only prevent the controller
from actually handling those queues). It's not good as some drivers will
get error messages due to their underlying device being suspended,
and won't understand why the URBs are going away.

In the case of USB devices that don't support suspend state, it's slightly
more tricky as on some HW, I may have to actually turn them off. That means
the driver must deal with re-doing configuration & set-interface on
wakeup. That is again a driver matter, and a bit like the PCI case we
mentioned previously, we need some way to know if the driver for a given
device can handle the requested power state or not. If not, we should
probably abort the suspend sequence, and find a clean interface to tell
userland about which device caused the failure so the user can deal with
it (rmmod the driver for example).
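
Aborting the sequence and naming the culprit could be sketched as follows.
The struct pm_dev fields and suspend_or_abort() are hypothetical; how a
driver would really advertise the states it supports is left open here.

```c
#include <assert.h>
#include <stddef.h>

struct pm_dev {
	const char *name;
	int deepest_state;  /* deepest power state the driver claims to support */
	int state;          /* current state; 0 means fully on */
};

/*
 * Try to put devs[0..n-1] into `target`. If a driver cannot handle that
 * state, resume the devices already suspended (in reverse order) and
 * return the offender so that it can be reported to userland.
 * Returns NULL when every device was suspended.
 */
const struct pm_dev *suspend_or_abort(struct pm_dev *devs, size_t n, int target)
{
	assert(target > 0);

	for (size_t i = 0; i < n; i++) {
		if (devs[i].deepest_state < target) {
			for (size_t j = i; j > 0; j--)
				devs[j - 1].state = 0;
			return &devs[i];
		}
		devs[i].state = target;
	}
	return NULL;
}
```

The caller would print the returned device's name, and the user could then
rmmod that driver and try again.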


Ben.




* Re: [RFC] New Driver Model for 2.5
  2001-10-24 10:34                         ` Benjamin Herrenschmidt
@ 2001-10-24 10:54                           ` Alan Cox
  2001-10-24 13:04                             ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 120+ messages in thread
From: Alan Cox @ 2001-10-24 10:54 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Alan Cox, Linus Torvalds, linux-kernel, Patrick Mochel, Jonathan Lundell

> case, there's not much left to the controller, it isn't supposed to
> have any command in queue nor receive any new one once all its child
> drivers have suspended.

scsi devices are children of the scsi subsystem (sd, sg, sr, st, osst) not
of the controller. That is how the state flows anyway. Only sr/sd etc know
what the state is for a given device on power off as they may issue 
multiple requests per action true transaction. sg would have to simply
refuse any suspend if open (think about cd-burning or even worse firmware
download)

So the scsi devices hang off sd, sr etc which in turn hang off scsi and 
the controllers hang off scsi (and or the bus layers)

This one at least I think I do understand


* Re: [RFC] New Driver Model for 2.5
  2001-10-24 10:54                           ` Alan Cox
@ 2001-10-24 13:04                             ` Benjamin Herrenschmidt
  2001-10-24 13:25                               ` Alan Cox
                                                 ` (3 more replies)
  0 siblings, 4 replies; 120+ messages in thread
From: Benjamin Herrenschmidt @ 2001-10-24 13:04 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linus Torvalds, linux-kernel, Patrick Mochel, Jonathan Lundell

>> case, there's not much left to the controller, it isn't supposed to
>> have any command in queue nor receive any new one once all its child
>> drivers have suspended.
>
>scsi devices are children of the scsi subsystem (sd, sg, sr, st, osst) not
>of the controller. That is how the state flows anyway. Only sr/sd etc know
>what the state is for a given device on power off as they may issue 
>multiple requests per action true transaction. sg would have to simply
>refuse any suspend if open (think about cd-burning or even worse firmware
>download)
>
>So the scsi devices hang off sd, sr etc which in turn hang off scsi and 
>the controllers hang off scsi (and or the bus layers)
>
>This one at least I think I do understand

The problem with subsystems is that they don't fit well in the
power tree. They aren't "devices" in the sense that they are
not exposing a struct device, and they span several controllers
which means the dependency can quickly become unmanageable, especially
when SCSI starts being layered on top of USB or FireWire.

Also, the dependency issue is made worse if you let RAID enter into
the dance as I believe ultimately, nothing would prevent a volume from
spanning several devices on different controllers or even different
controller types.

So let's see if I properly understand what is needed in the SCSI case:

The parent is the controller. We can't do much about this since we need
that relationship for ordering. The controller can be a real SCSI host,
but it can also be a virtual host exposed by a USB storage device or
a FireWire SBP2 device.

The child of this controller has to be a struct device for each physical
device on the bus (just one in the case of USB storage). The struct
device for this child is +/- generic, possibly created by the generic
SCSI probe code.

This device might (must ?) have children instantiated by whatever "client"
attaches to a given SCSI device. Clients like sg would effectively refuse
suspend, while clients like sd would do standard disk spindown commands.
That means there is not "one" PM node for the SCSI subsystem, but one
per instance of a given subsystem module.

Now, I'm not sure what would happen with RAID. If we need to have logical
volumes be children of the sd "client", then we have to face the fact that
a given child may have multiple parents... welcome to the power graph !
But do we really need logical volumes to be part of the PM tree or
can blocking of requests at the sd layer be enough ? Remember we are
in pass2, we have already done memory allocation, we are supposed to
no longer swap nor do any disk/storage related activity.
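
If a node can have several parents, "children first" can still be computed
by counting, for each node, how many of its children are not yet suspended.
A sketch with made-up names (struct pg_node, power_graph_order()) and small
fixed-size arrays, just to show that a graph is not much harder than a tree:

```c
#include <assert.h>

#define MAX_NODES   16
#define MAX_PARENTS 4

struct pg_node {
	const char *name;
	int parents[MAX_PARENTS];  /* indices of the nodes this one sits on */
	int nparents;
};

/*
 * Fill order[] with node indices such that every node appears before
 * all of its parents. Returns the number of nodes placed; a result
 * smaller than n means the graph has a cycle.
 */
int power_graph_order(const struct pg_node *nodes, int n, int *order)
{
	int live_children[MAX_NODES] = { 0 };
	int done[MAX_NODES] = { 0 };
	int count = 0;
	int progress = 1;

	assert(n >= 0 && n <= MAX_NODES);

	for (int i = 0; i < n; i++)
		for (int p = 0; p < nodes[i].nparents; p++)
			live_children[nodes[i].parents[p]]++;

	while (progress) {
		progress = 0;
		for (int i = 0; i < n; i++) {
			if (done[i] || live_children[i] != 0)
				continue;
			/* No child still running: safe to suspend. */
			done[i] = 1;
			order[count++] = i;
			for (int p = 0; p < nodes[i].nparents; p++)
				live_children[nodes[i].parents[p]]--;
			progress = 1;
		}
	}
	return count;
}
```

For a logical volume sitting on two sd instances on two different
controllers, the volume comes out first, then both sd nodes, then the
controllers.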

A tricky issue indeed...


Ben.




* Re: [RFC] New Driver Model for 2.5
  2001-10-24 13:04                             ` Benjamin Herrenschmidt
@ 2001-10-24 13:25                               ` Alan Cox
  2001-10-24 16:19                                 ` Linus Torvalds
  2001-10-24 16:15                               ` Linus Torvalds
                                                 ` (2 subsequent siblings)
  3 siblings, 1 reply; 120+ messages in thread
From: Alan Cox @ 2001-10-24 13:25 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Alan Cox, Linus Torvalds, linux-kernel, Patrick Mochel, Jonathan Lundell

> Now, I'm not sure what would happen with RAID. If we need to have logical
> volumes be children of the sd "client", then we have to face the fact that
> a given child may have multiple parents... welcome to the power graph !
> But do we really need logical volumes to be part of the PM tree or
> can blocking of requests at the sd layer be enough ? Remember we are
> in pass2, we have already done memory allocation, we are supposed to
> no longer swap nor do any disk/storage related activity.

Assuming you want to synchronize the raid before suspend - a reasonable
policy but not essential - then you'd have to shut down the raid before
sd, then sd would let the devices shut down, which lets the controller
shut down.


* Re: [RFC] New Driver Model for 2.5
  2001-10-24  9:57                       ` Alan Cox
  2001-10-24 10:34                         ` Benjamin Herrenschmidt
@ 2001-10-24 15:18                         ` Jonathan Lundell
  2001-10-24 15:41                         ` Linus Torvalds
  2 siblings, 0 replies; 120+ messages in thread
From: Jonathan Lundell @ 2001-10-24 15:18 UTC (permalink / raw)
  To: Alan Cox, Benjamin Herrenschmidt
  Cc: Alan Cox, Linus Torvalds, linux-kernel, Patrick Mochel

At 10:57 AM +0100 10/24/01, Alan Cox wrote:
>  > the HW or not when getting a new request). In cases where a mid-layer
>>  enters the scene, like SCSI, that wants to do timeouts, then well...
>>  we can let it timeout (just stop processing requests), or we can
>>  have the midlayer go to sleep as well :) That latter solution
>>  may cause some interesting ordering issues however...
>
>For scsi you have to complete the pending commands, you don't know what the
>transaction granularity is in some cases and half completing the sequence
>won't help you. In addition the upper layers have to queue additional scsi
>commands to do stuff like cd drawer locking and to ask the drive firmware
>to enter powerdown modes
>
>>  For USB, for example, we can consider that when a device driver
>>  (not a controller driver) suspend has been done, any URB it submits
>>  can just be dropped (returned immediately with an error). We don't
>>  need blocking here either. Of course, that means we have the
>>  framework to call devices' suspend/resume callbacks when the
>  > controller is about to go to sleep.
>
>That will scramble large numbers of devices. Randomly erroring pending block
>writes is -not- civilised.

In our "extreme prejudice" suspend (this is in the context of masking 
& recovering from a fault in a fault-tolerant machine) we have cases 
in which completion of pending commands isn't possible. Our solution 
is to issue a SCSI bus reset, and terminate all outstanding commands 
with an appropriate (retryable) error. This is especially easy to 
implement in drivers that use SCSI bus reset as a routine (though 
last resort) error recovery mechanism, since the requisite logic is 
already in place. Not pretty, I suppose, but effective.

One model we've considered (but haven't implemented yet) is to make 
parents in the device tree responsible for suspending their children, 
so the suspend propagates down the tree and each node "knows" how to 
suspend its children, assuming any special action is required. So a 
SCSI HBA, for example, would be asked by its bus parent to suspend, 
and in turn would suspend its SCSI device children before suspending 
itself. I'm not quite sure how virtual device layers like md would 
fit into this scheme, since they can cut across device and power 
hierarchies.

At 11:54 AM +0100 10/24/01, Alan Cox wrote:
>So the scsi devices hang off sd, sr etc which in turn hang off scsi and
>the controllers hang off scsi (and or the bus layers)

Our first implementation was under Solaris 2.x (SPARC) in which the 
parent->child relationship is bus->hba->sd. scsi isn't in the tree; 
it's more of an interface layer between hba & sd. fwiw.
-- 
/Jonathan Lundell.


* Re: [RFC] New Driver Model for 2.5
  2001-10-24  9:57                       ` Alan Cox
  2001-10-24 10:34                         ` Benjamin Herrenschmidt
  2001-10-24 15:18                         ` Jonathan Lundell
@ 2001-10-24 15:41                         ` Linus Torvalds
  2001-10-24 15:59                           ` Alan Cox
  2 siblings, 1 reply; 120+ messages in thread
From: Linus Torvalds @ 2001-10-24 15:41 UTC (permalink / raw)
  To: Alan Cox
  Cc: Benjamin Herrenschmidt, linux-kernel, Patrick Mochel, Jonathan Lundell


On Wed, 24 Oct 2001, Alan Cox wrote:
>
> That will scramble large numbers of devices. Randomly erroring pending block
> writes is -not- civilised.

Note that one thing in suspending the machine that has _nothing_ to do
with the actual device tree is that higher layers have to suspend whatever
it is they are doing anyway.

Ie part of the suspend action (which is unrelated to the driver model) is
to stop all regularly scheduled activity - not necessarily flushing all
dirty buffers, but certainly waiting for all pending IO. That's a much
higher level thing than the device though - the devices themselves should
never ever see this (except in the sense that they don't see new requests
coming in).

There are other "higher-level" issues: while a device "prepare to suspend"
call might block for some device information, that does not mean that it
can allocate memory with GFP_KERNEL, for example: when we shut off device
X, the disk may have been prepared for shutdown already, and the VM layer
cannot do any IO. So the suspend (and resume) functions have to use
GFP_NOIO for their allocations - _regardless_ of any other device issues.

So sure, there are tons of issues here, but none of them have, in my
opinion, anything to do with the device model itself. More just normal
implementation details.

		Linus



* Re: [RFC] New Driver Model for 2.5
  2001-10-24 15:59                           ` Alan Cox
@ 2001-10-24 15:56                             ` Linus Torvalds
  0 siblings, 0 replies; 120+ messages in thread
From: Linus Torvalds @ 2001-10-24 15:56 UTC (permalink / raw)
  To: Alan Cox
  Cc: Benjamin Herrenschmidt, linux-kernel, Patrick Mochel, Jonathan Lundell


On Wed, 24 Oct 2001, Alan Cox wrote:
>
> > call might block for some device information, that does not mean that it
> > can allocate memory with GFP_KERNEL, for example: when we shut off device
> > X, the disk may have been prepared for shutdown already, and the VM layer
> > cannot do any IO. So the suspend (and resume) functions have to use
> > GFP_NOIO for their allocations - _regardless_ of any other device issues.
>
> So I have to write a whole extra set of code paths to duplicate normal
> functionality during power off

If that ends up being a problem, we can just make alloc_pages turn off the
IO bits on suspend. Easy enough..
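
Roughly, that could be a global mask applied to every allocation's flags.
The sketch below uses made-up flag bits and function names (the SK_* and
sk_* identifiers are not the kernel's GFP definitions or any existing
allocator interface); it only shows the masking idea:

```c
#include <assert.h>

/* Made-up flag bits for the sketch; the kernel's real GFP bits differ. */
#define SK_WAIT   0x1u  /* allocation may sleep */
#define SK_IO     0x2u  /* allocation may start disk IO to free memory */
#define SK_KERNEL (SK_WAIT | SK_IO)
#define SK_NOIO   (SK_WAIT)

/* Bits the allocator currently honours; narrowed while suspending. */
static unsigned int sk_allowed = SK_KERNEL;

void sk_suspend_begin(void) { sk_allowed = SK_KERNEL & ~SK_IO; }
void sk_resume_done(void)   { sk_allowed = SK_KERNEL; }

/* What the allocator would actually act on for a given request. */
unsigned int sk_effective_flags(unsigned int flags)
{
	assert((flags & ~SK_KERNEL) == 0);  /* only known bits */
	return flags & sk_allowed;
}
```

With this, a driver that keeps asking for the "normal" flags in its suspend
path is silently downgraded to the no-IO variant instead of needing a
second set of code paths.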

Although I think you're making the problem bigger than it is. Most of the
suspend stuff should not need any "normal functionality" at all.

		Linus



* Re: [RFC] New Driver Model for 2.5
  2001-10-24 15:41                         ` Linus Torvalds
@ 2001-10-24 15:59                           ` Alan Cox
  2001-10-24 15:56                             ` Linus Torvalds
  0 siblings, 1 reply; 120+ messages in thread
From: Alan Cox @ 2001-10-24 15:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, Benjamin Herrenschmidt, linux-kernel, Patrick Mochel,
	Jonathan Lundell

> call might block for some device information, that does not mean that it
> can allocate memory with GFP_KERNEL, for example: when we shut off device
> X, the disk may have been prepared for shutdown already, and the VM layer
> cannot do any IO. So the suspend (and resume) functions have to use
> GFP_NOIO for their allocations - _regardless_ of any other device issues.

So I have to write a whole extra set of code paths to duplicate normal
functionality during power off

> So sure, there are tons of issues here, but none of them have, in my
> opinion, anything to do with the device model itself. More just normal
> implementation details.

My concern is that we need to make the implementation details simple, e.g.
so that simple things like "save state" can be done before we get into
"no this, no that, no the other" situations. Also so that the many
drivers for which freezing is easier once we have irqs off (a lot
of sound, for example, is easiest done by disable irq, disable dma engine,
copy registers, return) can do it late and with small amounts of code.



* Re: [RFC] New Driver Model for 2.5
  2001-10-24 13:04                             ` Benjamin Herrenschmidt
  2001-10-24 13:25                               ` Alan Cox
@ 2001-10-24 16:15                               ` Linus Torvalds
  2001-10-24 16:46                                 ` Xavier Bestel
                                                   ` (3 more replies)
  2001-10-24 17:01                               ` Mike Anderson
  2001-10-25  9:02                               ` Eric W. Biederman
  3 siblings, 4 replies; 120+ messages in thread
From: Linus Torvalds @ 2001-10-24 16:15 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Alan Cox, linux-kernel, Patrick Mochel, Jonathan Lundell


On Wed, 24 Oct 2001, Benjamin Herrenschmidt wrote:
> >
> >So the scsi devices hang off sd, sr etc which in turn hang off scsi and
> >the controllers hang off scsi (and or the bus layers)
> >
> >This one at least I think I do understand
>
> The problem with subsystems is that they don't fit well in the
> power tree. They aren't "devices" in the sense that they are
> not exposing a struct device, and they span several controllers
> which means the dependency can quickly become unmanageable, especially
> when SCSI starts being layered on top of USB or FireWire.

Why would you _ever_ get "sg.c" and other crap involved in the suspend
process?

The device tree is for _device_ suspend, not for "subsystem suspend". The
SCSI subsystem is a piece of cr*p, but even if it was perfect it should
never get involved with the act of suspension.

We should not have pending IO, but that's for a totally different reason:
the first thing the much much MUCH higher levels of suspend should be
doing is to make sure that user apps are "quiescent". And that isn't done
by getting involved with sg.c or anything similar, but by basically
stopping all user apps (think of the equivalent of a "kill -STOP -1", but
done internally in the kernel without actually using a signal).

> Also, the dependency issue is made worse if you let RAID enter into
> the dance as I believe ultimately, nothing would prevent a volume from
> spanning several devices on different controllers or even different
> controller types.

Why would you get RAID involved? There is no _IO_ involved in suspending:
we just stop doing what we're doing, and leave it at that. We don't try to
flush state, we just freeze the machine.

The act of "suspend" should basically be: shut off the SCSI controller,
screw all devices, reset the bus on resume.

The act of suspend on USB should be to turn off the host controller and
remove power from devices. End of story. Nothing fancy.

If somebody removes a disk or equivalent while we're suspended, that's
_his_ problem, and is exactly the same as removing a disk while the disk
is running. Either the subsystem (like USB) already handles it, or it
doesn't. Suspend is _not_ an excuse to do anything that isn't done at
run-time.

So suspend is _not_ supposed to be the equivalent of a full clean shutdown
with just users not seeing it.  That's way too expensive to be practical.
Remember: the main point of suspend is to have a laptop go to sleep, and
come back up on the order of a few _seconds_.

And if there are desktops which would like to suspend but cannot because
they aren't strictly designed for it, then tough - we should not try to
design a heavy suspend for hardware that doesn't live with it well.

Also, realize that the act of suspension is STARTED BY THE USER. Which
means that before the kernel suspends, you _can_ have user programs that
basically take disk arrays off-line etc if that is what you want. But
that's not a kernel suspend issue.

			Linus



* Re: [RFC] New Driver Model for 2.5
  2001-10-24 13:25                               ` Alan Cox
@ 2001-10-24 16:19                                 ` Linus Torvalds
  2001-10-24 16:36                                   ` Michael H. Warfield
  2001-10-24 22:48                                   ` Alan Cox
  0 siblings, 2 replies; 120+ messages in thread
From: Linus Torvalds @ 2001-10-24 16:19 UTC (permalink / raw)
  To: Alan Cox
  Cc: Benjamin Herrenschmidt, linux-kernel, Patrick Mochel, Jonathan Lundell


On Wed, 24 Oct 2001, Alan Cox wrote:
>
> Assuming you want to synchronize the raid before suspend - a reasonable
> policy but not essential then you'd have to shut down the raid before
> sd, then sd would let the devices shut down which lets the controller
> shutdown

I will _refuse_ to have a kernel suspend that synchronizes the raid etc.
That would make suspend/resume potentially take a _loong_ time.

If you want to synchronize your raid thing, make the user-level thing that
triggers the suspend do it. Same goes for things like "sync network
filesystems" etc. This is not a kernel level issue, and the kernel
shouldn't even try to do it.

If somebody has pending stuff over NFS and suspends, and when it comes
back it's not on the network any more, that is 100% equivalent to removing
a PCMCIA network card while running. It's supposed to work - but if you
lose data that's YOUR problem, not the kernel's.

		Linus



* Re: [RFC] New Driver Model for 2.5
  2001-10-24 16:19                                 ` Linus Torvalds
@ 2001-10-24 16:36                                   ` Michael H. Warfield
  2001-10-24 16:45                                     ` Linus Torvalds
  2001-10-24 22:48                                   ` Alan Cox
  1 sibling, 1 reply; 120+ messages in thread
From: Michael H. Warfield @ 2001-10-24 16:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, Benjamin Herrenschmidt, linux-kernel, Patrick Mochel,
	Jonathan Lundell

On Wed, Oct 24, 2001 at 09:19:45AM -0700, Linus Torvalds wrote:

> On Wed, 24 Oct 2001, Alan Cox wrote:

> > Assuming you want to synchronize the raid before suspend - a reasonable
> > policy but not essential then you'd have to shut down the raid before
> > sd, then sd would let the devices shut down which lets the controller
> > shutdown

> I will _refuse_ to have a kernel suspend that synchronizes the raid etc.
> That would make suspend/resume potentially take a _loong_ time.

	If you have Magic SysRq enabled, would that do the job prior
to suspend?  Typically with Pavel's swsusp package, I hit the Alt-SysRq-s
before hitting Alt-SysRq-d to suspend him.  Does Alt-SysRq-s synchronize
a raid?  Of course, at that point, the choice to take the "_loong_ time"
is in user space - meat space, user space - since I chose to hit that
key combination.

> If you want to synchronize your raid thing, make the user-level thing that
> triggers the suspend do it. Same goes for things like "sync network
> filesystems" etc. This is not a kernel level issue, and the kernel
> shouldn't even try to do it.

	What does the Alt-SysRq-s combination do about networks then?

> If somebody has pending stuff over NFS and suspends, and when it comes
> back it's not on the network any more, that is 100% equivalent to removing
> a PCMCIA network card while running. It's supposed to work - but if you
> lose data that's YOUR problem, not the kernel's.

> 		Linus

	Mike
-- 
 Michael H. Warfield    |  (770) 985-6132   |  mhw@WittsEnd.com
  /\/\|=mhw=|\/\/       |  (678) 463-0932   |  http://www.wittsend.com/mhw/
  NIC whois:  MHW9      |  An optimist believes we live in the best of all
 PGP Key: 0xDF1DD471    |  possible worlds.  A pessimist is sure of it!



* Re: [RFC] New Driver Model for 2.5
  2001-10-24 16:36                                   ` Michael H. Warfield
@ 2001-10-24 16:45                                     ` Linus Torvalds
  0 siblings, 0 replies; 120+ messages in thread
From: Linus Torvalds @ 2001-10-24 16:45 UTC (permalink / raw)
  To: Michael H. Warfield
  Cc: Alan Cox, Benjamin Herrenschmidt, linux-kernel, Patrick Mochel,
	Jonathan Lundell


On Wed, 24 Oct 2001, Michael H. Warfield wrote:
>
> > I will _refuse_ to have a kernel suspend that synchronizes the raid etc.
> > That would make suspend/resume potentially take a _loong_ time.
>
> 	If you have Magic SysRq enabled, would that do the job prior
> to suspend?  Typically with Pavel's swsusp package, I hit the Alt-SysRq-s
> before hitting Alt-SysRq-d to suspend him.  Does Alt-SysRq-s synchronize
> a raid?  Of course, at that point, the choice to take the "_loong_ time"
> is in user space - meat space, user space - since I chose to hit that
> key combination.

Sure. I only refuse to have it be "integrated" into the suspend - but it's
certainly perfectly fine to have "combination events", whether by having
special keystrokes that start them or by having scripts or programs that
first do the sync and then do "echo 3 > /proc/acpi/sleep" or whatever.
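
Such a program can be tiny. A minimal sketch, assuming only that writing a
state string to a control file triggers the suspend; sync_then_sleep() is a
made-up helper and the path is passed in so that nothing is hard-wired:

```c
#define _XOPEN_SOURCE 600  /* for sync() */
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Flush dirty data, then write `state` followed by a newline to `path`.
 * Returns 0 on success, -1 if the file could not be written.
 */
int sync_then_sleep(const char *path, const char *state)
{
	FILE *f;
	int ok;

	assert(path != NULL && state != NULL);

	sync();  /* the potentially slow part, chosen by the user */

	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%s\n", state) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Calling sync_then_sleep("/proc/acpi/sleep", "3") would be the "sync first"
policy; a user who does not want to wait simply writes the state without
the sync.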

> 	What does the Alt-SysRq-s combination do about networks then?

I think it just does a "fsync_dev()", which will do the right thing for
network filesystems too.

But let's make an example: let's assume that I'm working on my laptop, and
the NFS server goes down so I decide to take a break. Should I not be able
to suspend, only because the sync won't finish?

That's the wrong answer. By _default_ I should just suspend, and when I
come back it will continue to try to write back the data (not by any
magical suspend/resume means, but just because that's what NFS does anyway
when the server hasn't answered)

			Linus



* Re: [RFC] New Driver Model for 2.5
  2001-10-24 16:15                               ` Linus Torvalds
@ 2001-10-24 16:46                                 ` Xavier Bestel
  2001-10-24 16:54                                   ` Patrick Mochel
  2001-10-24 16:55                                   ` Linus Torvalds
  2001-10-24 17:33                                 ` Benjamin Herrenschmidt
                                                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 120+ messages in thread
From: Xavier Bestel @ 2001-10-24 16:46 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Benjamin Herrenschmidt, Alan Cox, Linux Kernel Mailing List,
	Patrick Mochel, Jonathan Lundell

On Wed, 24-10-2001 at 18:15, Linus Torvalds wrote:
> Also, realize that the act of suspension is STARTED BY THE USER. Which

... or triggered by some kind of inactivity timer, or low battery
condition.

	Xav



* Re: [RFC] New Driver Model for 2.5
  2001-10-24 16:46                                 ` Xavier Bestel
@ 2001-10-24 16:54                                   ` Patrick Mochel
  2001-10-24 16:55                                   ` Linus Torvalds
  1 sibling, 0 replies; 120+ messages in thread
From: Patrick Mochel @ 2001-10-24 16:54 UTC (permalink / raw)
  To: Xavier Bestel
  Cc: Linus Torvalds, Benjamin Herrenschmidt, Alan Cox,
	Linux Kernel Mailing List, Jonathan Lundell


On 24 Oct 2001, Xavier Bestel wrote:

> On Wed, 24-10-2001 at 18:15, Linus Torvalds wrote:
> > Also, realize that the act of suspension is STARTED BY THE USER. Which
>
> ... or triggered by some kind of inactivity timer, or low battery
> condition.

...which should be done in userspace.

	-pat



* Re: [RFC] New Driver Model for 2.5
  2001-10-24 16:46                                 ` Xavier Bestel
  2001-10-24 16:54                                   ` Patrick Mochel
@ 2001-10-24 16:55                                   ` Linus Torvalds
  2001-10-24 22:45                                     ` Alan Cox
  1 sibling, 1 reply; 120+ messages in thread
From: Linus Torvalds @ 2001-10-24 16:55 UTC (permalink / raw)
  To: Xavier Bestel
  Cc: Benjamin Herrenschmidt, Alan Cox, Linux Kernel Mailing List,
	Patrick Mochel, Jonathan Lundell


On 24 Oct 2001, Xavier Bestel wrote:
>
> On Wed, 24-10-2001 at 18:15, Linus Torvalds wrote:
> > Also, realize that the act of suspension is STARTED BY THE USER. Which
>
> ... or triggered by some kind of inactivity timer, or low battery
> condition.

Note that even when that happens, it's not supposed to be the kernel that
_starts_ the activity of suspension.

An inactivity timer or low battery notification will just notify the
proper daemon, and the policy on what to do should be in user space.  For
example, on low battery you might want to put up an X window warning the
user that the machine _will_ suspend in five seconds. And the kernel
certainly won't do that.

So as far as the kernel is concerned, a suspend is _always_ started by
"the user". Of course, the whole point with computers is that many things
can be automated, and "the user" may not be a human sitting at the
machine.

		Linus


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-24 13:04                             ` Benjamin Herrenschmidt
  2001-10-24 13:25                               ` Alan Cox
  2001-10-24 16:15                               ` Linus Torvalds
@ 2001-10-24 17:01                               ` Mike Anderson
  2001-10-25  9:02                               ` Eric W. Biederman
  3 siblings, 0 replies; 120+ messages in thread
From: Mike Anderson @ 2001-10-24 17:01 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Alan Cox, Linus Torvalds, linux-kernel, Patrick Mochel, Jonathan Lundell

Benjamin Herrenschmidt [benh@kernel.crashing.org] wrote:
> Now, I'm not sure what would happen with RAID. If we need to have logical
> volumes be child of the sd "client", then we have to face the fact that
> a given child may have multiple parents... welcome to the power graph !

You do not have to add RAID to need to worry about multiple parents. If we
want to correctly represent devices that have multiple paths (e.g. twin-tailed
SCSI, fibre channel, multi-ported devices, etc.) we should have a
solution to handle this.  Some O/S's have moved to directed graphs to
address the multiple parent issue. Exposing only one block / character
device per real physical device would reduce O/S resources (major /
minors, structs) and provide a single request queue.

The current model of a scsi_device having a single parent and being
attached to the scsi_host host_queue has made adding multi-path support
to Linux below the SCSI lower level driver difficult.
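
A minimal sketch of the multiple-parent case, assuming a hypothetical
"struct pnode" and a suspend_graph() helper (neither exists in the kernel;
they are made up for illustration). A dual-pathed disk hangs below two host
adapters, and a visited flag keeps the shared child from being suspended
twice while still guaranteeing it goes down before either parent:

```c
#include <stddef.h>

#define MAX_LINKS 4

/* Hypothetical node in a power graph; a node may have several parents. */
struct pnode {
    struct pnode *children[MAX_LINKS];
    size_t nchildren;
    int suspended;
    int order;   /* position in the suspend sequence, for illustration */
};

/*
 * Depth-first walk: suspend all children first, then the node itself.
 * The suspended flag stops a shared child from being visited twice.
 * Assumes the graph has no cycles.
 */
void suspend_graph(struct pnode *n, int *counter)
{
    if (n->suspended)
        return;
    for (size_t i = 0; i < n->nchildren; i++)
        suspend_graph(n->children[i], counter);
    n->suspended = 1;
    n->order = (*counter)++;
}
```

Calling suspend_graph() once per top-level node is enough; the second host
adapter finds the shared disk already suspended and only suspends itself.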

> But do we really need logical volumes to be part of the PM tree or
> can blocking of requests at the sd layer be enough ? Remember we are
> in pass2, we have already done memory allocation, we are supposed to
> no longer swap nor do any disk/storage related activity.
> 
> A tricky issue indeed...
> 
> 
> Ben.

-Mike
-- 
Michael Anderson
andmike@us.ibm.com


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-24 16:15                               ` Linus Torvalds
  2001-10-24 16:46                                 ` Xavier Bestel
@ 2001-10-24 17:33                                 ` Benjamin Herrenschmidt
  2001-10-24 22:41                                   ` Alan Cox
  2001-10-25 21:47                                   ` Pavel Machek
  2001-10-24 22:50                                 ` Alan Cox
  2001-10-25  8:27                                 ` Rob Turk
  3 siblings, 2 replies; 120+ messages in thread
From: Benjamin Herrenschmidt @ 2001-10-24 17:33 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Alan Cox, linux-kernel, Patrick Mochel, Jonathan Lundell

>Why would you _ever_ get "sg.c" and other crap involved in the suspend
>process?

Ahhh... ;)

>The device tree is for _device_ suspend, not for "subsystem suspend". The
>SCSI subsystem is a piece of cr*p, but even if it was perfect it should
>never get involved with the act of suspension.

I agree I'd like subsystems to avoid polluting the PM tree (or device tree).
If there are a few cases where a subsystem needs to know a driver it's using
is asleep, it's probably up to the interface of this subsystem to provide
a function to be called by the driver when it's going to suspend mode.

I can't really tell how it would be done for SCSI. I admit I quickly got
lost in the current drivers/scsi midlayer trying to figure out how
it layers things, but Andre told me it should be rewritten. I believe
the proper way here would be to have an _interface_ (no mid-layer)
exposed by drivers that can receive SCSI commands (whether they are USB,
ATAPI, FireWire or real SCSI devices).

However, Alan makes a point about state information. For example, the
"generic" CD-ROM driver for the SCSI protocol (those CD-ROMs being either
ATAPI, real SCSI, ...) would be the only one to know some state information
like rotation speed or whatever configuration info a given device may
support (placing ourselves in an ideal world of course, where all CDs
using the SCSI protocol share the same driver). So it must be involved
in the PM process in some way.

I don't know what people plan for the SCSI & ATAPI overhaul in 2.5, but
how that scheme is done will impact the PM process.

>We should not have pending IO, but that's for a totally different reason:
>the first thing the much much MUCH higher levels of suspend should be
>doing is to make sure that user apps are "quiescent". And that isn't done
>by getting involved with sg.c or anything similar, but by basically
>stopping all user apps (think of the equivalent of a "kill -STOP -1", but
>done internally in the kernel without actually using a signal).

Is this necessary? That would definitely make things easier to implement,
since we could forget about incoming requests, but I'm not sure it's the
right way. In fact, does it really work? You could well have in-kernel
threads triggering IOs or launching new userland processes (what happens
if a not yet suspended driver for, let's say, USB sees a new device coming
in and starts /sbin/hotplug?). Some filesystems have garbage collector
threads that can eventually do IOs. There are various kinds of IO that can
in fact be triggered entirely within the kernel.

Also, if you don't stop userland, wakeup is amazingly fast since
userland is up very soon: a sound app (an mp3 player for example) will
be blocked until the driver it's write()'ing to is woken up,
a swapping app is blocked until the IO request for that page is
completed (which means, if it was blocked in a block driver queue,
once the block driver has resumed operations), etc...
Currently, on pmacs, I don't have the tree but I do have some
kind of priority mechanism. Userland is woken up as soon as the
disk is ready. I think scheduling only has to be disabled between
steps 2 and 3 of the sleep process, that is after all IOs have been
blocked and before shutting down devices (that is before the step
that runs with IRQs off).

I really don't think it's _that_ difficult to properly do this blocking.
For things like sound drivers, a simple semaphore is plenty enough. For
network drivers, stopping the output queue is a one-line thing,
for block devices, it's usually a matter of marking ourselves busy and
letting requests pile up on our queue. There should be no risk of blocking
the sleep process itself this way since we made sure we did all allocations
needed by drivers prior to starting the actual blocking of requests, so we
won't block swap out.
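
As a rough illustration of the "mark ourselves busy and let requests pile
up" idea for block devices, here is a small userspace C sketch. The names
(struct blk_queue, submit_request, queue_suspend, queue_resume) are
hypothetical and are not any existing kernel API; a real driver would also
need locking around the queue:

```c
#include <stddef.h>

#define MAX_PENDING 16

/* Hypothetical request queue, not a real kernel structure. */
struct blk_queue {
    int suspended;              /* set while the device is asleep */
    int pending[MAX_PENDING];   /* request ids parked during suspend */
    size_t npending;
    int issued;                 /* requests actually sent to hardware */
};

/* Returns 1 if sent to hardware, 0 if parked, -1 if the queue is full. */
int submit_request(struct blk_queue *q, int id)
{
    if (!q->suspended) {
        q->issued++;
        return 1;
    }
    if (q->npending == MAX_PENDING)
        return -1;
    q->pending[q->npending++] = id;
    return 0;
}

/* From now on requests are parked instead of touching the hardware. */
void queue_suspend(struct blk_queue *q)
{
    q->suspended = 1;
}

/* Issues every parked request and returns how many there were. */
size_t queue_resume(struct blk_queue *q)
{
    size_t n = q->npending;

    q->suspended = 0;
    q->issued += (int)n;
    q->npending = 0;
    return n;
}
```

Callers never see an error while the device sleeps; their requests simply
complete later, which is what lets userland keep running across the suspend.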

Userland apps that hack on hardware directly, like X, will have been
suspended earlier (possibly via the existing /dev/apm_bios interface,
but we should rather define a new, cleaner way to have PM control
from userland).

I have all of this more or less working on pmac laptops. I don't have
the new device model, so I handle dependencies manually with a priority
mechanism, but it's already good enough to let me resume userland before
ADB and sound, and possibly other things. (Which is nice since my sound chips
usually need one or two seconds to recalibrate and ADB needs a few seconds to
probe the bus; all this happens asynchronously.)

>> Also, the dependency issue is made worst if you let RAID enter into
>> the dance as I beleive ultimately, nothing would prevent a volume to
>> spawn over several devices from different controllers or even different
>> controller types.
>
>Why would you get RAID involved? There is no _IO_ involved in suspending:
>we just stop doing what we're doing, and leave it at that. We don't try to
>flush state, we just freeze the machine.

Ok.

>The act of "suspend" should basically be: shut off the SCSI controller,
>screw all devices, reset the bus on resume.

Well, some devices will need some state saving. You may need to save and
restore some speed setting for example. You want pending requests to
be properly terminated before you shut down, etc...

>The act of suspend on USB should be to turn off the host controller and
>remove power from devices. End of story. Nothing fancy.

Ehh... no, you want to put the USB bus into suspend state, which isn't
the same ;) Or you lose the ability to wake up the machine from the
USB keyboard, which on some iMacs is, I think, the only way.

So before doing so, you need to iterate over all children of the controller
(that's fine, they have struct device entries with PM functions in them),
and tell them to suspend. Some devices need to be sent a command for
enabling remote wakeup. Others may need other housekeeping before the
bus goes to suspend.

But there's nothing magic there, nor anything fancy.
Just USB drivers have a struct device for each USB device, and are
notified of suspend normally as part of the device tree walk.
They are children of the USB controller, so we are guaranteed the controller
won't be down before devices.
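
The ordering guarantee described above amounts to a post-order walk of the
tree. A minimal sketch, assuming a hypothetical "struct dev_node" (the real
struct device layout was still under discussion) and a log used only to
make the order visible:

```c
#include <stddef.h>

#define MAX_CHILDREN 4
#define MAX_LOG 16

/* Hypothetical device node, made up for this example. */
struct dev_node {
    const char *name;
    struct dev_node *children[MAX_CHILDREN];
    size_t nchildren;
    int suspended;
};

/* Records the order in which nodes were suspended. */
struct suspend_log {
    const char *names[MAX_LOG];
    size_t count;
};

/* Post-order walk: suspend every child before the node itself. */
void suspend_tree(struct dev_node *dev, struct suspend_log *log)
{
    for (size_t i = 0; i < dev->nchildren; i++)
        suspend_tree(dev->children[i], log);
    dev->suspended = 1;
    if (log->count < MAX_LOG)
        log->names[log->count++] = dev->name;
}
```

With a controller that has a keyboard and a mouse as children, the log ends
with the controller, so it never goes down while a child is still awake.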

I agree that if this is properly done, then we know the controller
won't have to deal with pending or new incoming IOs once it's its turn
to go to sleep, and can just enter its suspend state immediately.

>If somebody removes a disk or equivalent while we're suspended, that's
>_his_ problem, and is exactly the same as removing a disk while the disk
>is running. Either the subsystem (like USB) already handles it, or it
>doesn't. Suspend is _not_ an excuse to do anything that isn't done at
>run-time.

Yup.

>So suspend is _not_ supposed to be equivalent of a full clean shutdown
>with just users not seeing it.  That's way too expensive to be practical.
>Remember: the main point of suspend is to have a laptop go to sleep, and
>come back up on the order of a few _seconds_.

Yup.

>And if there are desktops which would like to suspend but cannot because
>they aren't strictly designed for it, then tough - we should not try to
>design a heavy suspend for hardware that doesn't live with it well.
>
>Also, realize that the act of suspension is STARTED BY THE USER. Which
>means that before the kernel suspends, you _can_ have user programs that
>basically take disk arrays off-line etc if that is what you want. But
>that's not ae kernel suspend issue.

Yes. We do that on pmac as well. The lid of the laptop is monitored by
a userland daemon, which runs scripts before and after sending the real
suspend ioctl to our PM driver.

There is also /dev/apm_bios, which allows the suspend process to
notify (and wait for an ack from) userland apps that requested it (in our
case X, so that it stops banging the HW, properly saves its own state and
properly resumes the card on wakeup).

However, that interface could as well be completely userland (X
requesting notifications from the PM daemon). 

Ben.


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-24 17:33                                 ` Benjamin Herrenschmidt
@ 2001-10-24 22:41                                   ` Alan Cox
  2001-10-24 22:41                                     ` Linus Torvalds
                                                       ` (2 more replies)
  2001-10-25 21:47                                   ` Pavel Machek
  1 sibling, 3 replies; 120+ messages in thread
From: Alan Cox @ 2001-10-24 22:41 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Linus Torvalds, Alan Cox, linux-kernel, Patrick Mochel, Jonathan Lundell

> >The device tree is for _device_ suspend, not for "subsystem suspend". The
> >SCSI subsystem is a piece of cr*p, but even if it was perfect it should
> >never get involved with the act of suspension.
> 
> I agree I'd like subsystems to avoid polluting the PM tree (or device tree).
> If there are a few cases where a subsystem needs to know a driver it's using
> is asleep, it's probably up to the interface of this susbystem to provide
> a function to be called by the driver when it's going to suspend mode.

I don't think it is a big problem. We can add virtual nodes. The way I
see it we either
	a) put in grungy subsystem hacks
	b) register virtual device nodes for subsystems when needed

b feels cleaner

> I really don't think it's _that_ difficult to properly do this blocking.
> For things like sound drivers, a simple semaphore is plenty enough. For

Sound is more easily handled by not blocking user space but waiting until
the final IRQ off moment and grabbing the registers. That avoids a lot
of ugly locking gunge. It literally comes down to

	case suspending
		kmalloc buffer
		done
	case final suspend point
		turn off DMA
		readl
		readl
		readl
		readl
		...
		done
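
The two cases above can be sketched in plain userspace C. Everything here
is an assumption made for illustration: "struct snd_chip" is hypothetical,
an array stands in for the memory-mapped registers, malloc() stands in for
kmalloc(), and the resume step is added only to show the saved state being
used:

```c
#include <stdint.h>
#include <stdlib.h>

#define NREGS 4

/* Hypothetical sound chip; regs[] stands in for memory-mapped registers. */
struct snd_chip {
    uint32_t regs[NREGS];
    int dma_on;
    uint32_t *saved;   /* allocated in phase one, filled in phase two */
};

/* Phase one ("suspending"): allocation may sleep, so do it here. */
int snd_suspend_prepare(struct snd_chip *c)
{
    c->saved = malloc(NREGS * sizeof(*c->saved));
    return c->saved ? 0 : -1;
}

/* Phase two (final point, IRQs off): no allocation, stop DMA and copy. */
void snd_suspend_final(struct snd_chip *c)
{
    c->dma_on = 0;
    for (int i = 0; i < NREGS; i++)
        c->saved[i] = c->regs[i];   /* stands in for readl() */
}

/* Resume: write the registers back, restart DMA, release the buffer. */
void snd_resume(struct snd_chip *c)
{
    for (int i = 0; i < NREGS; i++)
        c->regs[i] = c->saved[i];
    c->dma_on = 1;
    free(c->saved);
    c->saved = NULL;
}
```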


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-24 22:41                                   ` Alan Cox
@ 2001-10-24 22:41                                     ` Linus Torvalds
  2001-10-25  7:58                                     ` Benjamin Herrenschmidt
  2001-10-25  8:03                                     ` Benjamin Herrenschmidt
  2 siblings, 0 replies; 120+ messages in thread
From: Linus Torvalds @ 2001-10-24 22:41 UTC (permalink / raw)
  To: Alan Cox
  Cc: Benjamin Herrenschmidt, linux-kernel, Patrick Mochel, Jonathan Lundell


On Wed, 24 Oct 2001, Alan Cox wrote:
>
> I don't think it is a big problem. We can add virtual nodes. They way I
> see it we either
> 	a) put in grungy subsystem hacks
> 	b) register virtual device nodes for subsystems when needed
>
> b feels cleaner

I agree. I would personally see us using _more_ "virtual device node"
things already: right now we have things like SuperIO chips that contain
both a serial line and a parallel port (and...), and some drivers do
really ugly things with them - keep them as one "struct pci_dev", and then
have two drivers sharing the device.

It would be much cleaner to have _one_ driver for such SuperIO chips (a
"multinode" driver), which just creates two virtual pci_dev structures,
and lets the regular serial driver handle the "virtual serial device" etc.

That has the advantage of:
 - not needing special hacks in various serial/parallel drivers
 - the devices show up naturally and logically in whatever user mode
   "device manager" tree

So the device nodes do not have to match the physical tree. The physical
device tree only sets up the initial physical scanning, and obviously
limits _reality_ ;)
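
A rough sketch of the "multinode" idea, using made-up types (struct
virt_dev, struct superio_chip) rather than the real pci_dev: the SuperIO
driver only splits the chip into per-function nodes, and the ordinary
serial and parallel drivers bind to those nodes without knowing they share
a chip. The driver names "serial" and "parport" are placeholders:

```c
#include <stddef.h>

/* Hypothetical function types found on a combined SuperIO chip. */
enum vdev_kind { VDEV_SERIAL, VDEV_PARALLEL };

/* Hypothetical virtual device node created by the multinode driver. */
struct virt_dev {
    enum vdev_kind kind;
    unsigned io_base;            /* I/O port base for this function */
    const char *bound_driver;    /* name of the driver that claimed it */
};

struct superio_chip {
    unsigned serial_base;
    unsigned parallel_base;
};

/* The multinode driver only splits the chip into per-function nodes. */
size_t superio_split(const struct superio_chip *chip, struct virt_dev out[2])
{
    out[0] = (struct virt_dev){ VDEV_SERIAL, chip->serial_base, NULL };
    out[1] = (struct virt_dev){ VDEV_PARALLEL, chip->parallel_base, NULL };
    return 2;
}

/* Ordinary drivers then bind by kind, unaware of the shared chip. */
void bind_drivers(struct virt_dev *devs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        devs[i].bound_driver =
            devs[i].kind == VDEV_SERIAL ? "serial" : "parport";
}
```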

		Linus


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-24 16:55                                   ` Linus Torvalds
@ 2001-10-24 22:45                                     ` Alan Cox
  0 siblings, 0 replies; 120+ messages in thread
From: Alan Cox @ 2001-10-24 22:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Xavier Bestel, Benjamin Herrenschmidt, Alan Cox,
	Linux Kernel Mailing List, Patrick Mochel, Jonathan Lundell

> So as far as the kernel is concerned, a suspend is _always_ started by
> "the user". Of course, the whole point with computers is that many things
> can be automated, and "the user" may not be a human sitting at the
> machine.

How does that apply to the equivalent of an APM critical shutdown - do we
still vector that via userspace ?

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-24 16:19                                 ` Linus Torvalds
  2001-10-24 16:36                                   ` Michael H. Warfield
@ 2001-10-24 22:48                                   ` Alan Cox
  1 sibling, 0 replies; 120+ messages in thread
From: Alan Cox @ 2001-10-24 22:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, Benjamin Herrenschmidt, linux-kernel, Patrick Mochel,
	Jonathan Lundell

> If you want to synchronize your raid thing, make the user-level thing that
> triggers the suspend do it. Same goes for things like "sync network
> filesystems" etc. This is not a kernel level issue, and the kernel
> shouldn't even try to do it.

Makes good sense - I agree

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-24 16:15                               ` Linus Torvalds
  2001-10-24 16:46                                 ` Xavier Bestel
  2001-10-24 17:33                                 ` Benjamin Herrenschmidt
@ 2001-10-24 22:50                                 ` Alan Cox
  2001-10-25  4:14                                   ` Linus Torvalds
  2001-10-25  8:27                                 ` Rob Turk
  3 siblings, 1 reply; 120+ messages in thread
From: Alan Cox @ 2001-10-24 22:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Benjamin Herrenschmidt, Alan Cox, linux-kernel, Patrick Mochel,
	Jonathan Lundell

> Why would you _ever_ get "sg.c" and other crap involved in the suspend
> process?
> 
> The device tree is for _device_ suspend, not for "subsystem suspend". The
> SCSI subsystem is a piece of cr*p, but even if it was perfect it should
> never get involved with the act of suspension.

Well I don't want my laptop to suspend during a CD burn or firmware update.
The device itself doesn't know anything about how busy it is since it's
just sending packets; only the subsystem driver controlling it does.

> by getting involved with sg.c or anything similar, but by basically
> stopping all user apps (think of the equivalent of a "kill -STOP -1", but
> done internally in the kernel without actually using a signal).

Stopping all user apps really tends to ruin the cd and the firmware
update.

> Remember: the main point of suspend is to have a laptop go to sleep, and
> come back up on the order of a few _seconds_.

It also has to avoid unpleasant situations

> Also, realize that the act of suspension is STARTED BY THE USER. Which
> means that before the kernel suspends, you _can_ have user programs that
> basically take disk arrays off-line etc if that is what you want. But
> that's not ae kernel suspend issue.

There are certain practicalities here with trying to make user space dig
around in fuser innards or patching every cd burner. The sg layer is one
that has to get involved (be it as a driver call back or a virtual driver)

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-24 22:50                                 ` Alan Cox
@ 2001-10-25  4:14                                   ` Linus Torvalds
  2001-10-25 12:42                                     ` Alan Cox
  0 siblings, 1 reply; 120+ messages in thread
From: Linus Torvalds @ 2001-10-25  4:14 UTC (permalink / raw)
  To: Alan Cox
  Cc: Benjamin Herrenschmidt, linux-kernel, Patrick Mochel, Jonathan Lundell


On Wed, 24 Oct 2001, Alan Cox wrote:
>
> Well I don't want my laptop to suspend during a CD burn or firmware update.
> The device itself doesn't know anything about how busy it is since its
> just sending packets, only the subsystem driver controller it does

But that's _your_ problem. Not the kernel's.

If you have an acpi daemon that decides to make the machine go to sleep
while burning a CD, that has nothing to do with the kernel at all.

It has nothing to do with sg.c either, for that matter.

> > Remember: the main point of suspend is to have a laptop go to sleep, and
> > come back up on the order of a few _seconds_.
>
> It also has to avoid unpleasant situations

Absolutely NOT.

The kernel does not set policy. If the user says "suspend now", then we
suspend now. Whether a CD burn or anything else is going on is totally
irrelevant.

> There are certain practicalities here with trying to make user space dig
> around in fuser innards or patching every cd burner. The sg layer is one
> that has to get involved (be it as a driver call back or a virtual driver)

Not a way in hell. If the sg layer wants to export a "/proc/sgbusy",
that's its problem.

But if I say "suspend", and the kernel refuses, I will kill the offending
piece of crap from sg.c before you can blink an eye.

		Linus


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-24 22:41                                   ` Alan Cox
  2001-10-24 22:41                                     ` Linus Torvalds
@ 2001-10-25  7:58                                     ` Benjamin Herrenschmidt
  2001-10-25 12:22                                       ` Alan Cox
  2001-10-25  8:03                                     ` Benjamin Herrenschmidt
  2 siblings, 1 reply; 120+ messages in thread
From: Benjamin Herrenschmidt @ 2001-10-25  7:58 UTC (permalink / raw)
  To: Alan Cox, linux-kernel

>> I really don't think it's _that_ difficult to properly do this blocking.
>> For things like sound drivers, a simple semaphore is plenty enough. For
>
>Sound is more easily handled by not blocking user space but waiting until
>the final IRQ off moment and grabbing the registers. That avoids a lot
>of ugly locking gunge. It literally comes down to

My point about using a semaphore was to avoid getting mixer ioctls
banging the HW while it is shut down.

Ben.



^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-24 22:41                                   ` Alan Cox
  2001-10-24 22:41                                     ` Linus Torvalds
  2001-10-25  7:58                                     ` Benjamin Herrenschmidt
@ 2001-10-25  8:03                                     ` Benjamin Herrenschmidt
  2001-10-25  8:09                                       ` Benjamin Herrenschmidt
  2001-10-25 12:20                                       ` Alan Cox
  2 siblings, 2 replies; 120+ messages in thread
From: Benjamin Herrenschmidt @ 2001-10-25  8:03 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linus Torvalds, Patrick Mochel, Jonathan Lundell, linux-kernel

>
>I don't think it is a big problem. We can add virtual nodes. They way I
>see it we either
>	a) put in grungy subsystem hacks
>	b) register virtual device nodes for subsystems when needed
>
>b feels cleaner

Ok, provided that there is not one big "SCSI subsystem" virtual node,
which would screw up the entire dependency hierarchy, but rather virtual nodes
created by SCSI "clients" on the fly as children of their devices. That
looks fine to me ;)

An example would be:

  * SCSI host
  |
  |
  * SCSI disk device
  |
  |
  * "sd" node

In this case, "sg" could add itself when opened, and eventually cause
sleep requests to be rejected for example.

Well, in fact, I don't think there is real need for the "SCSI disk device"
node, but that depends pretty much on the new SCSI architecture. 

Ben.



^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-25  8:03                                     ` Benjamin Herrenschmidt
@ 2001-10-25  8:09                                       ` Benjamin Herrenschmidt
  2001-10-25 12:20                                       ` Alan Cox
  1 sibling, 0 replies; 120+ messages in thread
From: Benjamin Herrenschmidt @ 2001-10-25  8:09 UTC (permalink / raw)
  To: Alan Cox; +Cc: Linus Torvalds, Patrick Mochel, Jonathan Lundell, linux-kernel

>In this case, "sg" could add itself when opened, and eventually cause
>sleep requests to be rejected for example.

Well, looks like Linus won't let this one pass ;) A /proc/sgbusy would
eventually be ok, but I'd rather start defining a proper interface
to the PM daemon in userland for apps to request that the machine
doesn't go to sleep. That would be used, among other things, by
CD burners & firmware updaters. No need to hack thousands of apps,
I believe if we get a patch implementing support for that in cdrecord,
then all burner software will magically start getting it ;)
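
To make the idea concrete, here is a minimal sketch of the bookkeeping such
a userland PM daemon might do. No such interface exists yet; struct
pm_daemon and the pm_* functions are purely hypothetical. It only covers
automatic triggers (lid, idle timer); an explicit "suspend now" from the
user would bypass it, since the kernel itself sets no policy:

```c
/* Hypothetical state kept by a userland PM daemon. */
struct pm_daemon {
    int inhibitors;   /* number of apps currently asking us not to sleep */
};

/* Called when an app (a CD burner, say) starts a critical operation. */
void pm_inhibit(struct pm_daemon *d)
{
    d->inhibitors++;
}

/* Called when the app finishes; extra calls are ignored. */
void pm_uninhibit(struct pm_daemon *d)
{
    if (d->inhibitors > 0)
        d->inhibitors--;
}

/*
 * Policy for automatic triggers: skip the suspend while anyone holds
 * an inhibitor. Returns 1 if the suspend should proceed.
 */
int pm_auto_suspend_allowed(const struct pm_daemon *d)
{
    return d->inhibitors == 0;
}
```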

Ben.





^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-24 16:15                               ` Linus Torvalds
                                                   ` (2 preceding siblings ...)
  2001-10-24 22:50                                 ` Alan Cox
@ 2001-10-25  8:27                                 ` Rob Turk
  2001-10-25 10:01                                   ` Benjamin Herrenschmidt
                                                     ` (3 more replies)
  3 siblings, 4 replies; 120+ messages in thread
From: Rob Turk @ 2001-10-25  8:27 UTC (permalink / raw)
  To: linux-kernel

"Linus Torvalds" <torvalds@transmeta.com> wrote in message
news:cistron.Pine.LNX.4.33.0110240901350.8049-100000@penguin.transmeta.com...
>
> On Wed, 24 Oct 2001, Benjamin Herrenschmidt wrote:
> > >
> > >So the scsi devices hang off sd, sr etc which in turn hang off scsi and
> > >the controllers hang off scsi (and or the bus layers)
> > >
> > >This one at least I think I do understand
> >
> > The problem with subsystems is that they don't fit well in the
> > power tree. They aren't "devices" in that sense that they are
> > not exposing a struct device, and they spawn over several controllers
> > which means the dependency can quickly become unmanageable, especially
> > when SCSI starts beeing layered on top of USB or FireWire.
>
> Why would you _ever_ get "sg.c" and other crap involved in the suspend
> process?
>
> The device tree is for _device_ suspend, not for "subsystem suspend". The
> SCSI subsystem is a piece of cr*p, but even if it was perfect it should
> never get involved with the act of suspension.
>
> We should not have pending IO, but that's for a totally different reason:
> the first thing the much much MUCH higher levels of suspend should be
> doing is to make sure that user apps are "quiescent". And that isn't done
> by getting involved with sg.c or anything similar, but by basically
> stopping all user apps (think of the equivalent of a "kill -STOP -1", but
> done internally in the kernel without actually using a signal).
>
> > Also, the dependency issue is made worst if you let RAID enter into
> > the dance as I beleive ultimately, nothing would prevent a volume to
> > spawn over several devices from different controllers or even different
> > controller types.
>
> Why would you get RAID involved? There is no _IO_ involved in suspending:
> we just stop doing what we're doing, and leave it at that. We don't try to
> flush state, we just freeze the machine.
>
> The act of "suspend" should basically be: shut off the SCSI controller,
> screw all devices, reset the bus on resume.
>

Doing so will create havoc on sequential devices, such as tape drives. If
your system simply suspends, then all is well. Any data that isn't flushed
yet is buffered inside the tape drive. But when the system resumes and resets
the SCSI bus, it will cause all data in the tape drive to be lost, and for
most tape systems it will also re-position them at LBOT. Any running
tar/dump/whatever tape process would not survive such a suspend-resume
cycle.

Another more subtle issue is state information that exists between the SCSI
controller and the target devices. At some point they might have negotiated
synchronous and/or wide transfer parameters. This information must be
preserved, or you'll observe lockups, data corruption and the like. Since
these parameters are maintained at the lowest driver level, they should know
about suspend. The low-level driver must know to re-negotiate these
parameters when it comes back to life.
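
A small sketch of what "knowing to re-negotiate" could look like in a
low-level driver. The structures and function names are hypothetical, and
the counter merely stands in for the actual synchronous/wide negotiation
message exchange with the target:

```c
#define MAX_TARGETS 8

/* Hypothetical per-target transfer agreement held by a low-level driver. */
struct xfer_params {
    int sync_period;   /* 0 means asynchronous */
    int wide;          /* 1 if wide transfers were negotiated */
    int valid;         /* 1 while the agreement is known to hold */
};

struct scsi_host_state {
    struct xfer_params target[MAX_TARGETS];
    int renegotiations;   /* counts negotiation rounds, for illustration */
};

/* After a bus reset on resume, no earlier agreement can be trusted. */
void host_resume_reset(struct scsi_host_state *h)
{
    for (int i = 0; i < MAX_TARGETS; i++)
        h->target[i].valid = 0;
}

/* Before issuing a command, renegotiate if the agreement is stale. */
void host_prepare_command(struct scsi_host_state *h, int id)
{
    struct xfer_params *p = &h->target[id];

    if (!p->valid && (p->sync_period || p->wide)) {
        h->renegotiations++;   /* stands in for the real negotiation */
        p->valid = 1;
    }
}
```

The desired parameters survive the suspend in driver memory; only the
"valid" flag is dropped, so the first command after resume triggers a
fresh negotiation instead of assuming the old one still holds.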

Rob






^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-24 13:04                             ` Benjamin Herrenschmidt
                                                 ` (2 preceding siblings ...)
  2001-10-24 17:01                               ` Mike Anderson
@ 2001-10-25  9:02                               ` Eric W. Biederman
  2001-10-25  9:29                                 ` Linus Torvalds
  3 siblings, 1 reply; 120+ messages in thread
From: Eric W. Biederman @ 2001-10-25  9:02 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Alan Cox, Linus Torvalds, linux-kernel, Patrick Mochel, Jonathan Lundell

Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

> >> case, there's not much left to the controller, it isn't supposed to
> >> have any command in queue nor receive any new one once all it's child
> >> drivers have suspended.
> >
> >scsi devices are children of the scsi subststem (sd, sg, sr, st, osst) not
> >of the controller. That is how the state flows anyway. Only sr/sd etc know
> >what the state is for a given device on power off as they may issue 
> >multiple requests per action true transaction. sg would have to simply
> >refuse any suspend if open (think about cd-burning or even worse firmware
> >download)
> >
> >So the scsi devices hang off sd, sr etc which in turn hang off scsi and 
> >the controllers hang off scsi (and or the bus layers)
> >
> >This one at least I think I do understand
> 
> The problem with subsystems is that they don't fit well in the
> power tree. They aren't "devices" in that sense that they are
> not exposing a struct device, and they spawn over several controllers
> which means the dependency can quickly become unmanageable, especially
> when SCSI starts beeing layered on top of USB or FireWire.
> 
> Also, the dependency issue is made worst if you let RAID enter into
> the dance as I beleive ultimately, nothing would prevent a volume to
> spawn over several devices from different controllers or even different
> controller types. 

On the dependency case for x86 I have a fun common example.
To shut off the cpu, or the whole motherboard I need to talk to the
southbridge.  To talk to the southbridge, I need to talk to the northbridge.

So at least to some extent shutting down busses is a really different
case from shutting down devices.  And only in some cases can a tree
model it at all.

Equally fun are temperature monitors that appear on both the lpc/isa bus
and the i2c bus.

Or another fun common one.  To shut down the interrupt controller, I first
need to shut down every device that thinks it can generate interrupts.
But my interrupt controller is way out on my pci->isa bridge.  So I
can't shut that device down.

Sorry this whole device tree idea for shutdown ordering doesn't seem
to match my idea of reality.

Now I need to take a little time out and see what the code that is
being discussed will actually do about situations like the above.

> A tricky issue indeed...

Agreed.

Eric

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-25  9:02                               ` Eric W. Biederman
@ 2001-10-25  9:29                                 ` Linus Torvalds
  2001-10-25  9:47                                   ` Benjamin Herrenschmidt
  2001-10-25 10:11                                   ` Eric W. Biederman
  0 siblings, 2 replies; 120+ messages in thread
From: Linus Torvalds @ 2001-10-25  9:29 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Benjamin Herrenschmidt, Alan Cox, linux-kernel, Patrick Mochel,
	Jonathan Lundell


On 25 Oct 2001, Eric W. Biederman wrote:
>
> Or another fun common one.  To shut down the interrupt controller, I first
> need to shut down every device that thinks it can generate interrupts.
> But my interrupt controller is way out on my pci->isa bridge.  So I
> can't shut that device down.
>
> Sorry this whole device tree idea for shutdown ordering doesn't seem
> to match my idea of reality.

Your _examples_ do not match any reality.

Don't worry about things like the CPU shutdown: you have to have special
code for it anyway.

Let's face it, the device tree is for _devices_. It's for shutting down a
network card before we shut down the PCI bridge that is in front of it.
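
To make the ordering concrete, here is a minimal sketch in C of a post-order walk that suspends every child before its parent. It is only an illustration under stated assumptions: `struct node`, `node_add_child()`, `suspend_subtree()` and `struct suspend_log` are hypothetical names, not the `struct device` or any API from the RFC.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CHILDREN 4
#define MAX_LOG 16

/* Hypothetical, simplified node; NOT the struct device from the RFC. */
struct node {
    const char *name;
    struct node *children[MAX_CHILDREN];
    size_t nr_children;
    int suspended;
};

/* Records the order in which nodes were suspended, for inspection. */
struct suspend_log {
    const char *names[MAX_LOG];
    size_t count;
};

static void node_add_child(struct node *parent, struct node *child)
{
    assert(parent->nr_children < MAX_CHILDREN);
    parent->children[parent->nr_children++] = child;
}

/* Post-order walk: suspend every child before the node itself. */
static void suspend_subtree(struct node *n, struct suspend_log *log)
{
    for (size_t i = 0; i < n->nr_children; i++)
        suspend_subtree(n->children[i], log);

    n->suspended = 1;
    if (log->count < MAX_LOG)
        log->names[log->count++] = n->name;
}
```

With a bridge node that has a network card as its child, calling `suspend_subtree()` on the bridge logs the card first and the bridge last. Resume would walk the same tree in the opposite (pre-order) direction.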

The issue of "core shutdown" is not covered - and isn't _meant_ to be
covered. That's the problem of the architecture-specific code. There is no
point in having a device tree for that, because it's going to be very much
architecture-specific anyway (ie on x86 we may have to just blindly trust
some silly ACPI table data etc).

		Linus


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-25  9:29                                 ` Linus Torvalds
@ 2001-10-25  9:47                                   ` Benjamin Herrenschmidt
  2001-10-25 10:11                                   ` Eric W. Biederman
  1 sibling, 0 replies; 120+ messages in thread
From: Benjamin Herrenschmidt @ 2001-10-25  9:47 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Eric W. Biederman, Patrick Mochel

>> Or another fun common one.  To shut down the interrupt controller, I first
>> need to shut down every device that thinks it can generate interrupts.
>> But my interrupt controller is way out on my pci->isa bridge.  So I
>> can't shut that device down.
>>
>> Sorry this whole device tree idea for shutdown ordering doesn't seem
>> to match my idea of reality.
>
>Your _examples_ do not match any reality.
>
>Don't worry about things like the CPU shutdown: you have to have special
>code for it anyway.
>
>Let's face it, the device tree is for _devices_. It's for shutting down a
>network card before we shut down the PCI bridge that is in front of it.
>
>The issue of "core shutdown" is not covered - and isn't _meant_ to be
>covered. That's the problem of the architecture-specific code. There is no
>point in having a device tree for that, because it's going to be very much
>architecture-specific anyway (ie on x86 we may have to just blindly trust
>some silly ACPI table data etc).

Definitely. I have similar issues with pmacs: clock generation
and the interrupt controller are in Apple's mac-io ASIC, which is on PCI,
so this ASIC can't be part of the normal PM tree and has to be handled
as part of the "core" PM code. This kind of issue will still happen; the
new scheme won't "magically" make PM work on every single laptop out
there. There will still be some corner cases to deal with, but at least
these will be limited to real corner cases, and most "normal" drivers
will fit in the new, saner mechanism.

Ben.



^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-25  8:27                                 ` Rob Turk
@ 2001-10-25 10:01                                   ` Benjamin Herrenschmidt
  2001-10-25 10:02                                   ` Helge Hafting
                                                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 120+ messages in thread
From: Benjamin Herrenschmidt @ 2001-10-25 10:01 UTC (permalink / raw)
  To: Rob Turk; +Cc: Linus Torvalds, Alan Cox, linux-kernel, Patrick Mochel

>
>Doing so will create havoc on sequential devices, such as tape drives. If
>your system simply suspends, then all is well. Any data that isn't flushed
>yet is buffered inside the tapedrive. But when the system resumes and resets
>the SCSI bus, it will cause all data in the tape drive to be lost, and for
>most tape systems it will also re-position them at LBOT. Any running
>tar/dump/whatever tape process would not survive such a suspend-resume
>cycle.
>
>Another more subtle issue is state information that exists between the SCSI
>controller and the target devices. At some point they might have negotiated
>synchronous and/or wide transfer parameters. This information must be
>preserved, or you'll observe lockups, data corruption and the likes. Since
>these parameters are maintained at the lowest driver level, they should know
>about suspend. The low-level driver must know to re-negotiate these
>parameters when it comes back to life.

This can be handled by having st (or sd, or whatever "client driver" decides
to take over a SCSI device) register a struct device node that is a child
of the actual SCSI device.

In fact, I'm wondering if we need a struct device node at all for the
SCSI device on the bus. The SCSI controller (or USB/storage or
FireWire SBP2) will expose SCSI devices, that is, "interfaces" to
which you can feed SCSI requests, but do those really need to have
a struct device associated? One possibility would be to only do
so once attached to a "client" driver like st, sd, sg, ...
The "client" would then create that structure.

But...

If we still want "unclaimed" devices to have a representation in the
device tree (because, for example, userland wants to know about them,
possibly in order to instantiate an sg driver), then we could have
the SCSI subsystem create a simple skeleton struct device when the
devices are probed, and have the client driver just populate this with
more information and PM hooks once attached to the device.
I don't think there's a need to have 2 struct devices stacked.

But it's mostly a matter of taste ;)

Thinking more about it, I think I prefer the second solution. That is,
to have SCSI create a "standard" struct device for every device probed
on the bus, thus ensuring they are visible in the userland
representation of the device tree, and then possibly have drivers like
sd, st, etc. add entries and PM hooks to those devices if needed when
attached to them. But doing that, or just having them create a virtual
node as a child of the device, is mostly a matter of taste, I believe.
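
A rough sketch of that second solution in C, under loud assumptions: `struct dev_node`, `scsi_probe_node()`, `attach_client()` and the `tape_*` hooks are hypothetical names made up for illustration, not the RFC's `struct device` or any existing SCSI interface. The bus creates a bare node at probe time; a client driver fills in the PM hooks later.

```c
#include <assert.h>
#include <stddef.h>

struct dev_node;
typedef int (*pm_hook)(struct dev_node *dev);

/* Hypothetical stand-in for struct device; field names are assumed. */
struct dev_node {
    const char *name;
    struct dev_node *parent;
    pm_hook suspend;        /* NULL until a client driver attaches */
    pm_hook resume;
    void *client_data;
    int claimed;
};

/* Bus side: create a bare node at probe time so the device is visible. */
static void scsi_probe_node(struct dev_node *dev, const char *name,
                            struct dev_node *host)
{
    dev->name = name;
    dev->parent = host;
    dev->suspend = NULL;
    dev->resume = NULL;
    dev->client_data = NULL;
    dev->claimed = 0;
}

/* Client side (st, sd, ...): populate the existing node with PM hooks. */
static int attach_client(struct dev_node *dev, pm_hook suspend,
                         pm_hook resume, void *data)
{
    if (dev->claimed)
        return -1;          /* another client already owns this node */
    dev->suspend = suspend;
    dev->resume = resume;
    dev->client_data = data;
    dev->claimed = 1;
    return 0;
}

/* PM core side: an unclaimed node has nothing to do on suspend. */
static int dev_suspend(struct dev_node *dev)
{
    return dev->suspend ? dev->suspend(dev) : 0;
}

/* Example client hooks: a tape driver that flushes before suspend. */
struct tape_state {
    int flushed;
};

static int tape_suspend(struct dev_node *dev)
{
    struct tape_state *st = dev->client_data;
    st->flushed = 1;
    return 0;
}

static int tape_resume(struct dev_node *dev)
{
    struct tape_state *st = dev->client_data;
    st->flushed = 0;
    return 0;
}
```

The point of the sketch is that only one node exists per SCSI device: it shows up in the tree as soon as it is probed, and suspending it is a no-op until a client has attached.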

Ben.




^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-25  8:27                                 ` Rob Turk
  2001-10-25 10:01                                   ` Benjamin Herrenschmidt
@ 2001-10-25 10:02                                   ` Helge Hafting
  2001-10-25 14:20                                   ` Victor Yodaiken
  2001-10-25 21:59                                   ` Pavel Machek
  3 siblings, 0 replies; 120+ messages in thread
From: Helge Hafting @ 2001-10-25 10:02 UTC (permalink / raw)
  To: Rob Turk, linux-kernel

Rob Turk wrote:

> Doing so will create havoc on sequential devices, such as tape drives. If
> your system simply suspends, then all is well. Any data that isn't flushed
> yet is buffered inside the tapedrive. But when the system resumes and resets
> the SCSI bus, it will cause all data in the tape drive to be lost, and for
> most tape systems it will also re-position them at LBOT. Any running
> tar/dump/whatever tape process would not survive such a suspend-resume
> cycle.
> 
Well, why reset the scsi bus on resume then?
That seems unnecessary.  At suspend time the devices simply 
don't get more requests. (Except perhaps spin-down 
requests for disks.)  Then nothing much happens.  Eventually
the system wakes up, and requests appear again.  First spin-up
requests, then ordinary io.

Quite a few scsi bioses have an option for not resetting
the bus when booting.  Less delay, and necessary for those
few with a shared scsi bus.  Seems a reset won't be
necessary for suspend/resume either, which is supposed to
be a lighter operation than a reboot.

If your scsi adapter doesn't support this - it isn't
suspend/resume compatible the way I see it.

Helge Hafting

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-25  9:29                                 ` Linus Torvalds
  2001-10-25  9:47                                   ` Benjamin Herrenschmidt
@ 2001-10-25 10:11                                   ` Eric W. Biederman
  2001-10-25 10:59                                     ` Linus Torvalds
  1 sibling, 1 reply; 120+ messages in thread
From: Eric W. Biederman @ 2001-10-25 10:11 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Benjamin Herrenschmidt, Alan Cox, linux-kernel, Patrick Mochel,
	Jonathan Lundell

"Linus Torvalds" <torvalds@transmeta.com> writes:

> On 25 Oct 2001, Eric W. Biederman wrote:
> >
> > Or another fun common one.  To shut down the interrupt controller, I first
> > need to shut down every device that thinks it can generate interrupts.
> > But my interrupt controller is way out on my pci->isa bridge.  So I
> > can't shut that device down.
> >
> > Sorry this whole device tree idea for shutdown ordering doesn't seem
> > to match my idea of reality.
> 
> Your _examples_ do not match any reality.

I'll go as far as agreeing they do not match any _practical_ reality.

> Don't worry about things like the CPU shutdown: you have to have special
> code for it anyway.

But that is the case I plan on coding....
 
> Let's face it, the device tree is for _devices_. It's for shutting down a
> network card before we shut down the PCI bridge that is in front of it.
> 
> The issue of "core shutdown" is not covered - and isn't _meant_ to be
> covered. 

O.k. I'll step back and let you guys handle the normal cases.  I rarely
get past "core startup" and "core shutdown".  

> That's the problem of the architecture-specific code. There is no
> point in having a device tree for that, because it's going to be very much
> architecture-specific anyway (ie on x86 we may have to just blindly trust
> some silly ACPI table data etc).

I'm doing my best to provide a real world alternative to ACPI on some
boards. 

My perspective is coming from linuxBIOS, or in general GPL'd
firmware, so it is a little different.  

But at this point in the conversation it looks like I should just back
off, let the core API get the easy cases correct.  And then come back
and figure out how to handle the truly weird cases.

Eric

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-25 10:11                                   ` Eric W. Biederman
@ 2001-10-25 10:59                                     ` Linus Torvalds
  0 siblings, 0 replies; 120+ messages in thread
From: Linus Torvalds @ 2001-10-25 10:59 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: Benjamin Herrenschmidt, Alan Cox, linux-kernel, Patrick Mochel,
	Jonathan Lundell


On 25 Oct 2001, Eric W. Biederman wrote:
>
> > That's the problem of the architecture-specific code. There is no
> > point in having a device tree for that, because it's going to be very much
> > architecture-specific anyway (ie on x86 we may have to just blindly trust
> > some silly ACPI table data etc).
>
> I'm doing my best to provide a real world alternative to ACPI on some
> boards.

That will be much appreciated. ACPI is not all that wonderful to say the
least. With enough knowledge of the hardware you can mostly do a better
job (the problem is "enough knowledge", especially on most laptops where
most of the GPIO signals etc are pretty much ad-hoc and not defined by the
chipsets but by the board layout person.. And we can't query the board
revision even if they gave us the information ;)

> My perspective is coming from linuxBIOS, or in general GPL'd
> firmware, so it is a little different.

Understood.

		Linus


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-25  8:03                                     ` Benjamin Herrenschmidt
  2001-10-25  8:09                                       ` Benjamin Herrenschmidt
@ 2001-10-25 12:20                                       ` Alan Cox
  1 sibling, 0 replies; 120+ messages in thread
From: Alan Cox @ 2001-10-25 12:20 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Alan Cox, Linus Torvalds, Patrick Mochel, Jonathan Lundell, linux-kernel

> In this case, "sg" could add itself when opened, and eventually cause
> sleep requests to be rejected for example.

I think SG is the only really special case, reading over the code.
Disk and CD might want to issue a couple of things (cache flush, unlock
media type stuff) but nothing tricky.

> Well, in fact, I don't think there is real need for the "SCSI disk device"
> node, but that depends pretty much on the new SCSI architecture. 

Sure

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-25  7:58                                     ` Benjamin Herrenschmidt
@ 2001-10-25 12:22                                       ` Alan Cox
  2001-10-25 14:57                                         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 120+ messages in thread
From: Alan Cox @ 2001-10-25 12:22 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Alan Cox, linux-kernel

> >Sound is more easily handled by not blocking user space but waiting until
> >the final IRQ off moment and grabbing the registers. That avoids a lot
> >of ugly locking gunge. It literally comes down to
> 
> My point about using a semaphore was to avoid getting mixer ioctls
> banging the HW while it is shut down.

Yes I can follow that - you want to avoid the aclink being shut down while
active. That seems to be just part of the ordering. I'd also put the ac97
save/restore in the ac97_codec.c stuff - let's write it once 8)

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-25  4:14                                   ` Linus Torvalds
@ 2001-10-25 12:42                                     ` Alan Cox
  2001-10-25 21:52                                       ` Xavier Bestel
  2001-10-26 11:35                                       ` Helge Hafting
  0 siblings, 2 replies; 120+ messages in thread
From: Alan Cox @ 2001-10-25 12:42 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Alan Cox, Benjamin Herrenschmidt, linux-kernel, Patrick Mochel,
	Jonathan Lundell

> If you have an acpi daemon that decides to make the machine go to sleep
> while burning a CD, that's nothing to do with the kernel at all.

One job kernel drivers have is to say "I can't safely sleep at this moment"
Even windows/XP beta gets this right.

> The kernel does not set policy. If the user says "suspend now", then we
> suspend now. Whether a CD burn or anything else is going on is totally
> irrelevant.

I know what the end user viewpoint on that would be. In a sense I do
agree with you - but that would assume we could re-invent every single
scsi generic driver, figure out how to make /proc/sg/%d/... work and the
like

> But if I say "suspend", and the kernel refuses, I will kill the offending
> piece of crap from sg.c before you can blink an eye.

That's fine by me. Anyone wanting to be able to burn CDs safely can run a
-ac kernel tree.

Alan

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-25  8:27                                 ` Rob Turk
  2001-10-25 10:01                                   ` Benjamin Herrenschmidt
  2001-10-25 10:02                                   ` Helge Hafting
@ 2001-10-25 14:20                                   ` Victor Yodaiken
  2001-10-25 14:44                                     ` Jeff Garzik
                                                       ` (4 more replies)
  2001-10-25 21:59                                   ` Pavel Machek
  3 siblings, 5 replies; 120+ messages in thread
From: Victor Yodaiken @ 2001-10-25 14:20 UTC (permalink / raw)
  To: Rob Turk; +Cc: linux-kernel

On Thu, Oct 25, 2001 at 10:27:11AM +0200, Rob Turk wrote:
> > The act of "suspend" should basically be: shut off the SCSI controller,
> > screw all devices, reset the bus on resume.
> >
> 
> Doing so will create havoc on sequential devices, such as tape drives. If

I'm failing  to imagine a good case for suspending a system that has a
tape drive on it.





^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-25 14:20                                   ` Victor Yodaiken
@ 2001-10-25 14:44                                     ` Jeff Garzik
  2001-10-25 14:45                                     ` Jeff Garzik
                                                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 120+ messages in thread
From: Jeff Garzik @ 2001-10-25 14:44 UTC (permalink / raw)
  To: Victor Yodaiken; +Cc: Rob Turk, linux-kernel

Victor Yodaiken wrote:
> 
> On Thu, Oct 25, 2001 at 10:27:11AM +0200, Rob Turk wrote:
> > > The act of "suspend" should basically be: shut off the SCSI controller,
> > > screw all devices, reset the bus on resume.
> > >
> >
> > Doing so will create havoc on sequential devices, such as tape drives. If
> 
> I'm failing  to imagine a good case for suspending a system that has a
> tape drive on it.

I've often seen user workstations with tape drives.

I fail to see the need to suspend such a system while using the tape
drive, though :)

-- 
Jeff Garzik      | Only so many songs can be sung
Building 1024    | with two lips, two lungs, and one tongue.
MandrakeSoft     |         - nomeansno


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-25 14:20                                   ` Victor Yodaiken
  2001-10-25 14:44                                     ` Jeff Garzik
@ 2001-10-25 14:45                                     ` Jeff Garzik
  2001-10-25 15:22                                     ` Rob Turk
                                                       ` (2 subsequent siblings)
  4 siblings, 0 replies; 120+ messages in thread
From: Jeff Garzik @ 2001-10-25 14:45 UTC (permalink / raw)
  To: Victor Yodaiken; +Cc: Rob Turk, linux-kernel

Victor Yodaiken wrote:
> 
> On Thu, Oct 25, 2001 at 10:27:11AM +0200, Rob Turk wrote:
> > > The act of "suspend" should basically be: shut off the SCSI controller,
> > > screw all devices, reset the bus on resume.
> > >
> >
> > Doing so will create havoc on sequential devices, such as tape drives. If
> 
> I'm failing  to imagine a good case for suspending a system that has a
> tape drive on it.

I've often seen user workstations with tape drives.  Very uncommon these
days, agreed.

I fail to see the need to suspend such a system while using the tape
drive, though :)

-- 
Jeff Garzik      | Only so many songs can be sung
Building 1024    | with two lips, two lungs, and one tongue.
MandrakeSoft     |         - nomeansno


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-25 12:22                                       ` Alan Cox
@ 2001-10-25 14:57                                         ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 120+ messages in thread
From: Benjamin Herrenschmidt @ 2001-10-25 14:57 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-kernel

>> My point about using a semaphore was to avoid getting mixer ioctls
>> banging the HW while it is shut down.
>
>Yes I can follow that - you want to avoid the aclink being shut down while
>active. That seems to be just part of the ordering. I'd also put the ac97
>save/restore in the ac97_codec.c stuff - let's write it once 8)

Not exactly ;) Since sleep/resume on pmac is somewhat asynchronous, things
like sound (which in my case can take 1 to 2 seconds to come back due to
the calibration delay of the chip) are done async. So userland is already
running again, and may be hitting the driver with various mixer ioctls,
while my HW isn't yet ready to get them (nothing fancy here).

I do also have the problem of having the sound chip on an i2c bus on some
machines, and so the problem of dependencies between the i2c controller
and the sound driver (samples use a separate i2s bus), but this is also
easily fixed by either having the sound chip a child of the i2c controller,
or just shutting down the i2c controller as part of the platform code
which is what I do now.

I did various experiments doing CD and/or MP3 playback and putting the
machine to sleep. The "smoothest" result I obtained was using this
semaphore on driver entrypoints. This "cleanly" suspends the sound app
on its next call to the sleeping sound driver and resumes it when the
sound driver is ready. Since the sound chip takes forever to come back,
it's long enough for disk & cd to be fully back up, and the player
not to skip when resumed :)

Ben.




^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-25 14:20                                   ` Victor Yodaiken
  2001-10-25 14:44                                     ` Jeff Garzik
  2001-10-25 14:45                                     ` Jeff Garzik
@ 2001-10-25 15:22                                     ` Rob Turk
  2001-10-25 15:44                                     ` Jonathan Lundell
  2001-10-25 16:26                                     ` David Lang
  4 siblings, 0 replies; 120+ messages in thread
From: Rob Turk @ 2001-10-25 15:22 UTC (permalink / raw)
  To: linux-kernel


"Victor Yodaiken" <yodaiken@fsmlabs.com> wrote in message
news:cistron.20011025082001.B764@hq2...
> On Thu, Oct 25, 2001 at 10:27:11AM +0200, Rob Turk wrote:
> > > The act of "suspend" should basically be: shut off the SCSI controller,
> > > screw all devices, reset the bus on resume.
> > >
> >
> > Doing so will create havoc on sequential devices, such as tape drives. If
>
> I'm failing  to imagine a good case for suspending a system that has a
> tape drive on it.
>

Well, maybe the tape example wasn't all that good. The state information
(wide/sync negotiation) still needs to be retained for all SCSI devices though.

Rob





^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-25 14:20                                   ` Victor Yodaiken
                                                       ` (2 preceding siblings ...)
  2001-10-25 15:22                                     ` Rob Turk
@ 2001-10-25 15:44                                     ` Jonathan Lundell
  2001-10-25 16:26                                     ` David Lang
  4 siblings, 0 replies; 120+ messages in thread
From: Jonathan Lundell @ 2001-10-25 15:44 UTC (permalink / raw)
  To: Rob Turk, linux-kernel

At 5:22 PM +0200 10/25/01, Rob Turk wrote:
>  > I'm failing  to imagine a good case for suspending a system that has a
>>  tape drive on it.
>>
>
>Well, maybe the tape example wasn't all that good. The state information
>(wide/sync negotiation) still needs to be retained for all SCSI 
>devices though.

Any driver that uses SCSI bus reset for last-resort error recovery 
(and I think it's pretty typical) needs to be able to renegotiate the 
connection. Maybe even after a SCSI device reset; I don't recall. So 
initiating that negotiation as part of (or after) resume doesn't seem 
all that burdensome.

You need that anyway for "deep sleep" that powers down devices completely.
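
Roughly, the idea is that resume reuses the path the driver already needs after an error-recovery reset. A minimal sketch in C, assuming hypothetical names throughout (`struct target`, `bus_reset()`, `negotiate()`, `host_resume()`); no real low-level driver is being quoted here, and `negotiate()` merely stands in for the actual message exchange:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-target transfer settings kept by a low-level driver. */
struct xfer_params {
    int sync;   /* 1 = synchronous transfers negotiated */
    int wide;   /* 1 = wide transfers negotiated */
};

struct target {
    struct xfer_params goal;     /* what the driver wants to use */
    struct xfer_params current;  /* what is in effect on the bus */
};

/* A bus reset (or power loss) drops every target back to async, narrow. */
static void bus_reset(struct target *t, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        t[i].current.sync = 0;
        t[i].current.wide = 0;
    }
}

/* Stand-in for the real negotiation exchange with one target. */
static void negotiate(struct target *t)
{
    t->current = t->goal;
}

/* Resume: same steps the driver would take after an error-recovery reset. */
static void host_resume(struct target *t, size_t n)
{
    bus_reset(t, n);
    for (size_t i = 0; i < n; i++)
        negotiate(&t[i]);
}
```

The driver keeps its goal settings across suspend, so nothing negotiated before sleep has to be trusted afterwards.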
-- 
/Jonathan Lundell.

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-25 14:20                                   ` Victor Yodaiken
                                                       ` (3 preceding siblings ...)
  2001-10-25 15:44                                     ` Jonathan Lundell
@ 2001-10-25 16:26                                     ` David Lang
  4 siblings, 0 replies; 120+ messages in thread
From: David Lang @ 2001-10-25 16:26 UTC (permalink / raw)
  To: Victor Yodaiken; +Cc: Rob Turk, linux-kernel

let alone a reason for a suspend to be triggered while running the tape.

David Lang

On Thu, 25 Oct 2001, Victor Yodaiken wrote:

> Date: Thu, 25 Oct 2001 08:20:01 -0600
> From: Victor Yodaiken <yodaiken@fsmlabs.com>
> To: Rob Turk <r.turk@chello.nl>
> Cc: linux-kernel@vger.kernel.org
> Subject: Re: [RFC] New Driver Model for 2.5
>
> On Thu, Oct 25, 2001 at 10:27:11AM +0200, Rob Turk wrote:
> > > The act of "suspend" should basically be: shut off the SCSI controller,
> > > screw all devices, reset the bus on resume.
> > >
> >
> > Doing so will create havoc on sequential devices, such as tape drives. If
>
> I'm failing  to imagine a good case for suspending a system that has a
> tape drive on it.
>
>
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-25 21:59                                   ` Pavel Machek
@ 2001-10-25 21:32                                     ` Rob Turk
  0 siblings, 0 replies; 120+ messages in thread
From: Rob Turk @ 2001-10-25 21:32 UTC (permalink / raw)
  To: linux-kernel


"Pavel Machek" <pavel@suse.cz> wrote in message
news:cistron.20011025235935.B10358@elf.ucw.cz...
> Hi!
>
> > > The act of "suspend" should basically be: shut off the SCSI controller,
> > > screw all devices, reset the bus on resume.
> > >
> >
> > Doing so will create havoc on sequential devices, such as tape drives. If
> > your system simply suspends, then all is well. Any data that isn't flushed
> > yet is buffered inside the tapedrive. But when the system resumes and resets
> > the SCSI bus, it will cause all data in the tape drive to be lost, and for
> > most tape systems it will also re-position them at LBOT. Any running
> > tar/dump/whatever tape process would not survive such a suspend-resume
> > cycle.
>
> Then there's something wrong with st.
>
> Imagine EMI comes and SCSI gets reset. That should not mean tar
> failing, right? So you have st broken in first place.
> Pavel
> --
No, it's an inherent part of the SCSI spec. A SCSI bus reset will cause many, if
not all, tape devices to rewind to beginning of tape. The st driver can detect this
(a SCSI Unit Attention will be returned on the next SCSI command), and try to
re-position the tape to its previous location. Doing so is not easy, and on
many tape drives even impractical. On an almost full tape, a DLT drive would
take up to two hours to get back to where it last was. Too much for most
time-out mechanisms...

On a SCSI bus reset, tape related processes are better off passing the condition
upward to user space (tar, dump, whatever). Intelligent user space programs may
be able to recover, the dumb ones will fail.

By the way, EMI is very unlikely to cause a SCSI bus reset. It may cause
repeated parity errors to the point that a host or device decides to reset the
bus. If there's this much EMI, then you should use a better transport (HVD SCSI,
or fibre). But this part of the discussion is probably better held at
comp.periphs.scsi 8-)

Rob





^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-24 17:33                                 ` Benjamin Herrenschmidt
  2001-10-24 22:41                                   ` Alan Cox
@ 2001-10-25 21:47                                   ` Pavel Machek
  1 sibling, 0 replies; 120+ messages in thread
From: Pavel Machek @ 2001-10-25 21:47 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Linus Torvalds, Alan Cox, linux-kernel, Patrick Mochel, Jonathan Lundell

Hi!

> >We should not have pending IO, but that's for a totally different reason:
> >the first thing the much much MUCH higher levels of suspend should be
> >doing is to make sure that user apps are "quiescent". And that isn't done
> >by getting involved with sg.c or anything similar, but by basically
> >stopping all user apps (think of the equivalent of a "kill -STOP -1", but
> >done internally in the kernel without actually using a signal).
> 
> Is this necessary ? That would definitely make things easier to implement
> to forget about incoming requests, but I'm not sure it's the right way.
> In fact, is it really working ? You could well have in-kernel threads
> triggering IOs or launching new userland stuffs (what happens if a not
> yet suspended driver for, let's say USB, see a new device coming in 
> and starts /sbin/hotplug). Some filesystems have garbage collector threads
> that can do IOs eventually. There are various kind of IOs that can in fact
> be triggered entirely within the kernel.

I need this for suspend to disk. I really need a quiescent system if I
want to write the image to swap, right?

I implemented a "refrigerator" mechanism, and manually marked threads
that should not be frozen. I'm doing my best to stop everyone who
might try to wake userland.
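
In outline, and only as a sketch (this is not the swsusp patch; `struct task`, `TASK_NOFREEZE`, `freeze_tasks()` and `thaw_tasks()` are hypothetical names used for illustration), the marking scheme looks something like this in C:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag: set by hand on threads that must keep running. */
#define TASK_NOFREEZE 0x1u

/* Hypothetical, simplified task record. */
struct task {
    const char *name;
    unsigned int flags;
    int frozen;
};

/* Freeze every task not marked NOFREEZE; return how many were frozen. */
static size_t freeze_tasks(struct task *tasks, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (tasks[i].flags & TASK_NOFREEZE)
            continue;
        tasks[i].frozen = 1;
        count++;
    }
    return count;
}

/* Let everything run again after the image has been written or restored. */
static void thaw_tasks(struct task *tasks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        tasks[i].frozen = 0;
}
```

A real implementation has to get each task to actually stop at a safe point; the sketch only shows the bookkeeping of who is exempt.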

[I just mailed the patch as 2.4.13 - swsusp. Tell me if you want a
copy.]

								Pavel
-- 
STOP THE WAR! Someone killed innocent Americans. That does not give
U.S. right to kill people in Afganistan.



^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-25 12:42                                     ` Alan Cox
@ 2001-10-25 21:52                                       ` Xavier Bestel
  2001-10-25 23:53                                         ` Benjamin Herrenschmidt
  2001-10-26 11:35                                       ` Helge Hafting
  1 sibling, 1 reply; 120+ messages in thread
From: Xavier Bestel @ 2001-10-25 21:52 UTC (permalink / raw)
  To: Alan Cox
  Cc: Linus Torvalds, Benjamin Herrenschmidt,
	Linux Kernel Mailing List, Patrick Mochel, Jonathan Lundell

On Thu, 25-10-2001 at 14:42, Alan Cox wrote:
> > If you have an acpi daemon that decides to make the machine go to sleep
> > while burning a CD, that's nothing to do with the kernel at all.
> 
> One job kernel drivers have is to say "I can't safely sleep at this moment"
> Even windows/XP beta gets this right.

The other solution (let cdrecord tell I-dunno-how the PM daemon that
it's doing something "important") is IMHO better: the PM daemon could
judge if it should honor the suspend request depending on its priority
(inactivity, power button or low battery) and the running "important"
jobs.

	Xav


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-25  8:27                                 ` Rob Turk
                                                     ` (2 preceding siblings ...)
  2001-10-25 14:20                                   ` Victor Yodaiken
@ 2001-10-25 21:59                                   ` Pavel Machek
  2001-10-25 21:32                                     ` Rob Turk
  3 siblings, 1 reply; 120+ messages in thread
From: Pavel Machek @ 2001-10-25 21:59 UTC (permalink / raw)
  To: Rob Turk; +Cc: linux-kernel

Hi!

> > The act of "suspend" should basically be: shut off the SCSI controller,
> > screw all devices, reset the bus on resume.
> >
> 
> Doing so will create havoc on sequential devices, such as tape drives. If
> your system simply suspends, then all is well. Any data that isn't flushed
> yet is buffered inside the tapedrive. But when the system resumes and resets
> the SCSI bus, it will cause all data in the tape drive to be lost, and for
> most tape systems it will also re-position them at LBOT. Any running
> tar/dump/whatever tape process would not survive such a suspend-resume
> cycle.

Then there's something wrong with st.

Imagine EMI comes and SCSI gets reset. That should not mean tar
failing, right? So you have st broken in first place.
								Pavel
-- 
STOP THE WAR! Someone killed innocent Americans. That does not give
U.S. right to kill people in Afganistan.



^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-25 23:53                                         ` Benjamin Herrenschmidt
@ 2001-10-25 23:53                                           ` Alan Cox
  0 siblings, 0 replies; 120+ messages in thread
From: Alan Cox @ 2001-10-25 23:53 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Xavier Bestel, Alan Cox, Linus Torvalds, Patrick Mochel,
	Linux Kernel Mailing List

> The cdrecord case is a high level issue, and scsi is a mess ;)

Grin

> We are not yet at a point where we can be more constructive
> than what was already said. Ultimately we need to move a bit
> forward with the real implementation and see how some problems
> show up. The architecture as it was designed so far is light,
> and most of the debate is not around it, it's around how it
> should be used by drivers & kernel subsystems ;)

I think I understand how to handle this and avoid races. Linus's idea of
/proc files so you can ask "what is busy" solves most of it. Then the
policy daemon can make a choice about suspending or not.

If we make the proc file a large bitmask of events then the policy daemon's
call to the kernel becomes

	"suspend even if [event-mask] set"

This means that a daemon call that races the start of say a CD burn will
fail and the daemon can rescan, rethink and if need be reissue the request
sanely.
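
A minimal userspace sketch of the event-mask check described above. All
names here (the PM_BUSY_* bits, pm_busy_mask, pm_request_suspend) are
hypothetical and only illustrate the idea; the thread does not define an
interface. The request succeeds only if every busy event is one the daemon
said it was willing to ignore, so a burn that starts after the daemon read
the mask makes the request fail instead of racing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical event bits; none of these names come from the thread. */
enum {
	PM_BUSY_CD_BURN  = 1u << 0,
	PM_BUSY_TAPE_IO  = 1u << 1,
	PM_BUSY_FLASH_GC = 1u << 2,
};

/* What the "what is busy" /proc file would report (hypothetical state). */
static uint32_t pm_busy_mask;

/* Drivers set a bit when they start work that must not be interrupted. */
static void pm_mark_busy(uint32_t ev)
{
	assert(ev != 0);
	pm_busy_mask |= ev;
}

/* ...and clear it again when that work is done. */
static void pm_mark_idle(uint32_t ev)
{
	assert(ev != 0);
	pm_busy_mask &= ~ev;
}

/*
 * "suspend even if [ignore_mask] set": succeed only when every event that
 * is currently busy is one the caller agreed to ignore. A real kernel
 * would need to make this check atomic with respect to pm_mark_busy().
 */
static bool pm_request_suspend(uint32_t ignore_mask)
{
	return (pm_busy_mask & ~ignore_mask) == 0;
}
```

With this shape the daemon's loop is: read the mask, decide which events
it accepts, issue the request with that mask, and rescan if it fails.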


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-25 21:52                                       ` Xavier Bestel
@ 2001-10-25 23:53                                         ` Benjamin Herrenschmidt
  2001-10-25 23:53                                           ` Alan Cox
  0 siblings, 1 reply; 120+ messages in thread
From: Benjamin Herrenschmidt @ 2001-10-25 23:53 UTC (permalink / raw)
  To: Xavier Bestel
  Cc: Alan Cox, Linus Torvalds, Patrick Mochel, Linux Kernel Mailing List

>> One job kernel drivers have is to say "I can't safely sleep at this moment"
>> Even windows/XP beta gets this right.
>
>The other solution (let cdrecord tell I-dunno-how the PM daemon that
>it's doing something "important") is IMHO better: the PM daemon could
>judge if it should honor the suspend request depending on its priority
>(inactivity, power button or low battery) and the running "important"
>jobs.

The cdrecord case is a high level issue, and scsi is a mess ;)

We are not yet at a point where we can be more constructive
than what was already said. Ultimately we need to move a bit
forward with the real implementation and see how some problems
show up. The architecture as it was designed so far is light,
and most of the debate is not around it, it's around how it
should be used by drivers & kernel subsystems ;)

Also, there will always be corner cases where a kernel driver will
be able to override whatever policy was set. Drivers can register
with the PM layer and can return errors. It has to be exceptional,
but I believe some critical flash algorithm or whatever would
benefit from it.

I would however vote for making that a bit simpler by providing a
kernel-global sleep rw semaphore. The idea here is that drivers, file
systems, or whatever bits of kernel code that, at a given moment,
cannot afford being put to sleep will take a read lock on it. The
sleep code will take a write lock. The read lockers won't make the
PM code fail. It will just block a bit, waiting for the critical
operations to complete.
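
A rough userspace model of that read/write discipline, using C11 atomics.
This is only a sketch under assumptions: the names (sleep_lock_state and
the sleep_lock_* functions) are hypothetical, and it uses try-lock
variants so it can be exercised without threads, whereas the proposal
above has the sleep code block until the readers are done.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical global "sleep lock" state:
 *   n > 0 : n critical sections hold it for reading
 *   0     : free
 *   -1    : the sleep code holds it for writing
 */
static atomic_int sleep_lock_state;

/* Reader side: enter a section that must not be interrupted by sleep. */
static bool sleep_lock_read_trylock(void)
{
	int cur = atomic_load(&sleep_lock_state);

	while (cur >= 0) {
		/* On failure 'cur' is reloaded and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&sleep_lock_state, &cur, cur + 1))
			return true;
	}
	return false;	/* the sleep code already holds the lock */
}

/* Reader side: leave the critical section. */
static void sleep_lock_read_unlock(void)
{
	int prev = atomic_fetch_sub(&sleep_lock_state, 1);

	assert(prev > 0);
	(void)prev;
}

/* Writer side: succeeds only when no critical section is active. */
static bool sleep_lock_write_trylock(void)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&sleep_lock_state, &expected, -1);
}

/* Writer side: called on resume to let readers in again. */
static void sleep_lock_write_unlock(void)
{
	atomic_store(&sleep_lock_state, 0);
}
```

A blocking version would have the writer wait until the count reaches
zero instead of returning false, which matches the "it will just block a
bit" behaviour described above.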

I'm thinking about things like jffs2 on embedded. That (nice) flash
file system has a background thread that does garbage collecting
of your flash in order to be able to erase unused parts. It would
greatly benefit embedded boards that implement system sleep if
that thread could make sure sleep won't happen in the middle of
moving blocks, or whatever atomicity it may want for its operations.

A firmware flashing facility in a driver may use that mechanism too.

It's a corner case, it's not meant to be used a lot, but it is nice
to have it.

Ben.



^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-25 12:42                                     ` Alan Cox
  2001-10-25 21:52                                       ` Xavier Bestel
@ 2001-10-26 11:35                                       ` Helge Hafting
  2001-10-26 12:38                                         ` Alan Cox
  1 sibling, 1 reply; 120+ messages in thread
From: Helge Hafting @ 2001-10-26 11:35 UTC (permalink / raw)
  To: linux-kernel

Alan Cox wrote:

> 
> > But if I say "suspend", and the kernel refuses, I will kill the offending
> > piece of crap from sg.c before you can blink an eye.
> 
> Thats fine by me. Anyone wanting to be able to burn cds safely can run a
> -ac kernel tree

Telling the kernel to suspend while burning a CD is
on the same level as ejecting the CD while burning.  
It has to go wrong.  Someone explicitly asking for
trouble might as well get it.

The really dumb users are probably using a GUI tool
for either activity, and that one may of course refuse
to ruin the burn.

Helge Hafting

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-26 11:35                                       ` Helge Hafting
@ 2001-10-26 12:38                                         ` Alan Cox
  0 siblings, 0 replies; 120+ messages in thread
From: Alan Cox @ 2001-10-26 12:38 UTC (permalink / raw)
  To: Helge Hafting; +Cc: linux-kernel

> Telling the kernel to suspend while burning a CD is
> on the same level as ejecting the CD while burning.  
> It has to go wrong.  Someone explicitly asking for
> trouble might as well get it.

It need not be someone asking for trouble. It might just be a ten minute
"nothing happened" timeout that starts the decision making.

> The really dumb users are probably using a GUI tool
> for either activity, and that one may of course refuse
> to ruin the burn.

The current GUI tools don't know anything about 2.5 power management, and
in some cases don't know when the driver has done needed work or not.

Alan

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-20 13:52         ` Kai Henningsen
  2001-10-22 11:02           ` Padraig Brady
@ 2001-10-27 11:01           ` Kai Henningsen
  1 sibling, 0 replies; 120+ messages in thread
From: Kai Henningsen @ 2001-10-27 11:01 UTC (permalink / raw)
  To: linux-kernel

padraig@antefacto.com (Padraig Brady)  wrote on 22.10.01 in <3BD3FCD6.5010802@antefacto.com>:

> For ethernet cards (or anything else) on the PCI bus you can use
> the following to specify an order:
> ftp://platan.vc.cvut.cz/pub/linux/pciorder.patch-2.4.12-ac1.gz
> This allows you to pass the following parameter to the kernel:
> pciorder=Bus:Node.Fn,Bus:Node.Fn,... e.g.
> pciorder=0:0d.0,0:0b.0,0:0a.0

That doesn't look useful.

MfG Kai

^ permalink raw reply	[flat|nested] 120+ messages in thread

* RE: [RFC] New Driver Model for 2.5
  2001-10-24 17:56 Grover, Andrew
@ 2001-10-24 18:45 ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 120+ messages in thread
From: Benjamin Herrenschmidt @ 2001-10-24 18:45 UTC (permalink / raw)
  To: Grover, Andrew, linux-kernel

>
>Awesome.
>
>So non i386 archs do not have the problem with the video bios having to run
>on resume, or did you have to handle this somehow?

Fortunately, Mac laptops don't shut the chip down, the PM microcontroller will
just suspend the clock to it. fbdev's are mandatory on macs, and so we use the
fbdev for mach64 or r128 (the 2 types of chips you find on mac laptops, radeon
is coming soon however) to save a few things and put the chip in D2 mode
(or vendor specific suspend mode for mach64).

The problem does exist with Mac desktops as they power off the PCI and AGP
slots. That's the main reason why I don't add support for those in Linux
currently. We need some way to revive the card, which can be done either
with a chip-specific init sequence (in the fbdev), a small Forth emulator
with enough of an Open Firmware environment to run the OF driver for the
card, or a shell to run the MacOS driver for the card. All of these
solutions are tricky however.

Ben.



^ permalink raw reply	[flat|nested] 120+ messages in thread

* RE: [RFC] New Driver Model for 2.5
@ 2001-10-24 17:56 Grover, Andrew
  2001-10-24 18:45 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 120+ messages in thread
From: Grover, Andrew @ 2001-10-24 17:56 UTC (permalink / raw)
  To: 'Benjamin Herrenschmidt'
  Cc: Alan Cox, linux-kernel, Patrick Mochel, Jonathan Lundell, Linus Torvalds

> From: Benjamin Herrenschmidt [mailto:benh@kernel.crashing.org]
> I have all of this more or less working on pmac laptops. I don't have
> the new device model, so I handle dependencies manually with a priority
> mechanism, but it's already good enough to let me resume userland before
> ADB and sound, and possibly other stuff. (Which is nice since my sound
> chips usually need one or two seconds to recalibrate and ADB needs a few
> seconds to probe the bus; all this happens asynchronously.)

Awesome.

So non i386 archs do not have the problem with the video bios having to run
on resume, or did you have to handle this somehow?

Regards -- Andy

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-20 13:52         ` Kai Henningsen
@ 2001-10-22 11:02           ` Padraig Brady
  2001-10-27 11:01           ` Kai Henningsen
  1 sibling, 0 replies; 120+ messages in thread
From: Padraig Brady @ 2001-10-22 11:02 UTC (permalink / raw)
  To: Kai Henningsen; +Cc: linux-kernel

Kai Henningsen wrote:

>mfedyk@matchmail.com (Mike Fedyk)  wrote on 19.10.01 in <20011019122101.G2467@mikef-linux.matchmail.com>:
>
>>On Fri, Oct 19, 2001 at 09:02:09PM +0200, Tim Jansen wrote:
>>
>>>On Friday 19 October 2001 20:26, Patrick Mochel wrote:
>>>
>>>>There are equivalents in USB. But, neither of them are globally unique
>>>>identifiers for the device. That doesn't necessarily mean that one
>>>>couldn't be ascertained from the device; ethernet cards do have MAC
>>>>addresses. But, I don't think that many will have a ID/serial number.
>>>>[...]
>>>>Which leads me to the question: what real benefit does this have? Why
>>>>would you ever want to do a global search in kernel space for a
>>>>particular device?
>>>>
>>>For example for harddisks. You usually want them to be mounted in the same
>>>directory.
>>>
>>When is /etc/fstab going to support this?
>>
>
>Know your tools.
>
>/etc/fstab:
>UUID=eba05cbf-55ff-44d7-846a-7846c6010843 /usr ext2 defaults,nocheck 0 2
>
>I have this mounted right now, on 2.2.19:
>
>/dev/sdb7              3936400   3597588    138852  97% /usr
>
>That's an ext2 partition ID, so even if repartitioning renumbers the  
>partition, mount will still find it - only mkfs forces me to use a new ID.  
>Changing the controller and SCSI id obviously makes no difference  
>whatsoever. I could use labels, too, but they tend to be less unique.
>
>/proc/partitions is necessary to know what partitions to look at, of  
>course.
>
>>>Or for ethernet adapters:
>>>because each is connected to a different network, so you need to assign
>>>different IP addresses to them.
>>>
>>I haven't seen anything assign ethX a certain order, except for
>>ordered module loading, and then if there are multiple devices with the same
>>driver, the order is chosen by bus scanning order, or module option.
>>
>
>Exactly. So you can't use the order if there's any possibility of this, so  
>you need to use the MAC address.
>
Yes this (MAC address) is the only general way of doing it.
For ethernet cards (or anything else) on the PCI bus you can use
the following to specify an order:
ftp://platan.vc.cvut.cz/pub/linux/pciorder.patch-2.4.12-ac1.gz
This allows you to pass the following parameter to the kernel:
pciorder=Bus:Node.Fn,Bus:Node.Fn,... e.g.
pciorder=0:0d.0,0:0b.0,0:0a.0
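
For illustration, a small standalone parser for a string in that
Bus:Node.Fn,... form. This is not taken from the patch; the struct and
function names are hypothetical, and it assumes all three numbers are
hexadecimal, as the 0d/0b/0a values in the example suggest.

```c
#include <assert.h>
#include <stdio.h>

/* One "Bus:Node.Fn" entry; the struct name and fields are hypothetical. */
struct pci_slot {
	unsigned int bus;
	unsigned int node;
	unsigned int fn;
};

/*
 * Parse a comma-separated "Bus:Node.Fn,..." string into 'out'.
 * Returns the number of entries parsed (at most 'max'), or -1 if an
 * entry is malformed. All three numbers are read as hexadecimal.
 */
static int parse_pciorder(const char *s, struct pci_slot *out, int max)
{
	int n = 0;

	assert(s && out && max > 0);

	while (*s && n < max) {
		int used = 0;

		/* %n records how many characters this entry consumed. */
		if (sscanf(s, "%x:%x.%x%n", &out[n].bus, &out[n].node,
			   &out[n].fn, &used) != 3)
			return -1;
		s += used;
		n++;
		if (*s == ',')
			s++;		/* step over the separator */
		else if (*s)
			return -1;	/* junk after an entry */
	}
	return n;
}
```

The entries come back in the order given, which is the order the
parameter is meant to impose on the devices.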

Padraig.



^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-19 18:26       ` Patrick Mochel
  2001-10-19 19:02         ` Tim Jansen
@ 2001-10-20 13:52         ` Kai Henningsen
  2001-10-22 11:02           ` Padraig Brady
  2001-10-27 11:01           ` Kai Henningsen
  1 sibling, 2 replies; 120+ messages in thread
From: Kai Henningsen @ 2001-10-20 13:52 UTC (permalink / raw)
  To: linux-kernel

mfedyk@matchmail.com (Mike Fedyk)  wrote on 19.10.01 in <20011019122101.G2467@mikef-linux.matchmail.com>:

> On Fri, Oct 19, 2001 at 09:02:09PM +0200, Tim Jansen wrote:
> > On Friday 19 October 2001 20:26, Patrick Mochel wrote:
> > > There are equivalents in USB. But, neither of them are globally unique
> > > identifiers for the device. That doesn't necessarily mean that one
> > > couldn't be ascertained from the device; ethernet cards do have MAC
> > > addresses. But, I don't think that many will have a ID/serial number.
> > > [...]
> > > Which leads me to the question: what real benefit does this have? Why
> > > would you ever want to do a global search in kernel space for a
> > > particular device?
> >
> > For example for harddisks. You usually want them to be mounted in the same
> > directory.
>
> When is /etc/fstab going to support this?

Know your tools.

/etc/fstab:
UUID=eba05cbf-55ff-44d7-846a-7846c6010843 /usr ext2 defaults,nocheck 0 2

I have this mounted right now, on 2.2.19:

/dev/sdb7              3936400   3597588    138852  97% /usr

That's an ext2 partition ID, so even if repartitioning renumbers the  
partition, mount will still find it - only mkfs forces me to use a new ID.  
Changing the controller and SCSI id obviously makes no difference  
whatsoever. I could use labels, too, but they tend to be less unique.

/proc/partitions is necessary to know what partitions to look at, of  
course.
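
As a sketch of the first step a mount-like tool has to take with such an
entry: decide whether the first fstab field is a UUID, a label, or a
plain device path. The enum and function names are hypothetical; only the
UUID= and LABEL= prefixes are real fstab syntax. Resolving the value to
an actual partition (by reading superblocks of the candidates listed in
/proc/partitions) is not shown.

```c
#include <assert.h>
#include <string.h>

/* How the first field of an fstab line identifies the filesystem. */
enum spec_kind { SPEC_DEVICE, SPEC_UUID, SPEC_LABEL };

/*
 * Classify an fstab "fs_spec" field and point *value at the part after
 * the prefix (or at the whole string for a plain device path).
 */
static enum spec_kind classify_spec(const char *spec, const char **value)
{
	assert(spec && value);

	if (strncmp(spec, "UUID=", 5) == 0) {
		*value = spec + 5;
		return SPEC_UUID;
	}
	if (strncmp(spec, "LABEL=", 6) == 0) {
		*value = spec + 6;
		return SPEC_LABEL;
	}
	*value = spec;
	return SPEC_DEVICE;
}
```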

> >Or for ethernet adapters:
> > because each is connected to a different network, so you need to assign
> > different IP addresses to them.
> >
>
> I haven't seen anything assign ethX a certain order, except for
> ordered module loading, and then if there are multiple devices with the same
> driver, the order is chosen by bus scanning order, or module option.

Exactly. So you can't use the order if there's any possibility of this, so  
you need to use the MAC address.

MfG Kai

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-19 20:07             ` Tim Jansen
  2001-10-19 20:24               ` Mike Fedyk
@ 2001-10-20 13:47               ` Kai Henningsen
  1 sibling, 0 replies; 120+ messages in thread
From: Kai Henningsen @ 2001-10-20 13:47 UTC (permalink / raw)
  To: linux-kernel

tim@tjansen.de (Tim Jansen)  wrote on 20.10.01 in <15ui2Y-05aCZcC@fmrl05.sul.t-online.com>:

> On Friday 19 October 2001 22:24, you wrote:

> > > Ok, but I think no one doubts that it is a bad idea to assign ethX
> > > semi-randomly. Basically this is the same problem as with device files,
> > > only in a different namespace.
> > So is that in favor of changing the current ethX naming convention or not?
>
> I don't know. You don't need a device file for networking, but if there is
> some mechanism to allow stable names it would certainly be good to use it
> for network, too.

You need stable identifiers, but those identifiers don't need to be the  
usual names, as long as you have a way to find out which identifier goes  
with which name dynamically.
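
A minimal sketch of that kind of dynamic lookup for network interfaces,
with the MAC address as the stable identifier and the ethX name as
whatever this boot happened to assign. The table type and function name
are hypothetical; how the table gets filled in is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One interface: a stable identifier plus its current name. */
struct netif_entry {
	unsigned char mac[6];	/* stable across reboots and bus reordering */
	const char *name;	/* name assigned this boot, e.g. "eth1" */
};

/*
 * Return the current name of the interface with the given MAC address,
 * or NULL if no such interface is present.
 */
static const char *name_for_mac(const struct netif_entry *tbl, size_t n,
				const unsigned char mac[6])
{
	assert(tbl != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
		if (memcmp(tbl[i].mac, mac, 6) == 0)
			return tbl[i].name;
	return NULL;
}
```

Configuration can then be keyed on the MAC address, and the name is
looked up each time it is needed.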


MfG Kai

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-19 19:21           ` Mike Fedyk
  2001-10-19 20:07             ` Tim Jansen
@ 2001-10-20  1:41             ` john slee
  1 sibling, 0 replies; 120+ messages in thread
From: john slee @ 2001-10-20  1:41 UTC (permalink / raw)
  To: Tim Jansen, Patrick Mochel, linux-kernel

On Fri, Oct 19, 2001 at 12:21:01PM -0700, Mike Fedyk wrote:
> When is /etc/fstab going to support this?

it does; at least on my debian system:

# e2label /dev/hda1

# e2label /dev/hda1 foo 
# e2label /dev/hda1
foo
# mount LABEL=foo /mnt
# 

you can use the same LABEL=foo syntax in /etc/fstab...
according to my fstab(5) manpage this also works with xfs, although i've
not tried it.

surely i am not deluded and this is a not-debian-specific feature?
having used nothing but debian for some years now i really can't be
sure...

j.

--
R N G G   "Well, there it goes again... And we just sit 
 I G G G   here without opposable thumbs." -- gary larson

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-19 23:30     ` Benjamin Herrenschmidt
@ 2001-10-19 23:54       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 120+ messages in thread
From: Benjamin Herrenschmidt @ 2001-10-19 23:54 UTC (permalink / raw)
  To: Patrick Mochel, linux-kernel, Jeff Garzik

>Reading about the suspend to disk issue, and thinking about
>some of my needs, I tend to still think we have overlooked
>that issue. We should probably add a couple of list_heads
>to define a second tree in parallel to the device-tree, which
>is the power tree. A device is by default inserted in both
>trees as a child of its bus controller. But the arch must be
>able to move it elsewhere. I believe we have a way around the
>VM related ordering issues, but we do have other kinds of
>ordering constraints. 
>
> .../...

Argh, I'm too tired, I sent the draft ! sorry ;)

Ben.



^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-18 12:13   ` Benjamin Herrenschmidt
                       ` (3 preceding siblings ...)
  2001-10-19 15:21     ` Taral
@ 2001-10-19 23:30     ` Benjamin Herrenschmidt
  2001-10-19 23:54       ` Benjamin Herrenschmidt
  4 siblings, 1 reply; 120+ messages in thread
From: Benjamin Herrenschmidt @ 2001-10-19 23:30 UTC (permalink / raw)
  To: Patrick Mochel; +Cc: linux-kernel, Jeff Garzik

Reading about the suspend to disk issue, and thinking about
some of my needs, I tend to still think we have overlooked
that issue. We should probably add a couple of list_heads
to define a second tree in parallel to the device-tree, which
is the power tree. A device is by default inserted in both
trees as a child of its bus controller. But the arch must be
able to move it elsewhere. I believe we have a way around the
VM related ordering issues, but we do have other kinds of
ordering constraints. 

It may be me not reading well, but I think you didn't define
the fact that io_bus is a superset of device. In fact, it's
just a device that has children, and this should probably be
more generically viewed in struct device itself. Any device
should be able to have children, so we really have 2 interleaved
trees of devices, the bus tree and the power tree. In fact,
to be complete, we could even define the interrupt tree with
one more set of links as it's really not related to the bus
tree on many archs/machines, and having a tree definition
is really useful when you deal with cascaded controllers.

What do you think ?

Ben.



^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-19 20:24               ` Mike Fedyk
@ 2001-10-19 22:25                 ` Tim Jansen
  0 siblings, 0 replies; 120+ messages in thread
From: Tim Jansen @ 2001-10-19 22:25 UTC (permalink / raw)
  To: Mike Fedyk; +Cc: linux-kernel

On Friday 19 October 2001 22:24, you wrote:
> > You could encode that device id in the node's path or use the path as a
> > moniker for the device id (the symlink solution does this), but you need
> > to have more information about the device than its minor number (the X
> > in /dev/lpX).
> What does devfs do now?

Gets the name from the device driver, usually X is the minor number (or the 
minor number + some constant, if several drivers share a major number). The 
names are only constant if the devices are discovered in the same order.


> > Ok, but I think no one doubts that it is a bad idea to assign ethX
> > semi-randomly. Basically this is the same problem as with device files,
> > only in a different namespace.
> So is that in favor of changing the current ethX naming convention or not?

I don't know. You don't need a device file for networking, but if there is 
some mechanism to allow stable names it would certainly be good to use it for 
network, too. 


> > The device registry (www.tjansen.de/devreg) patches devfs to allow the
> > things described above though.
> Everything, with all of the ids?  What about scsi/ide?

Only SCSI/sd, PCI and USB.

bye...


^ permalink raw reply	[flat|nested] 120+ messages in thread

* RE: [RFC] New Driver Model for 2.5
@ 2001-10-19 21:43 Grover, Andrew
  0 siblings, 0 replies; 120+ messages in thread
From: Grover, Andrew @ 2001-10-19 21:43 UTC (permalink / raw)
  To: 'Mike Fedyk', Tim Jansen; +Cc: linux-kernel, Patrick Mochel

My impression was that while long-term having a device tree (with or without
a fs to expose it to userland) may help with the infamous Linux naming
issues, the first go-round should avoid this issue entirely and focus on
just enabling the global device tree itself. (I just want to suspend/wake
my laptop's devices in the right order!)

Everyone pretty much agrees that the device tree and device power management
are good. My hope is we don't let other contentious issues hinder its
implementation.

Regards -- Andy


> From: Mike Fedyk [mailto:mfedyk@matchmail.com]
> Sent: Friday, October 19, 2001 1:24 PM
> To: Tim Jansen
> Cc: linux-kernel@vger.kernel.org; Patrick Mochel
> Subject: Re: [RFC] New Driver Model for 2.5
> 
> 
> On Fri, Oct 19, 2001 at 10:07:39PM +0200, Tim Jansen wrote:
> > On Friday 19 October 2001 21:21, you wrote:
> > > > For example for harddisks. You usually want them to be mounted in the
> > > > same directory.
> > > When is /etc/fstab going to support this?
> >
> > You can use the device ids to provide stable symlinks, then /etc/fstab
> > shouldn't be a problem.
>
> Sounds good.
>
> > Or you rewrite mount to support it. Or you do it in
> > the kernel with a user-space helper: when a new device is connected its ID
> > is sent to some user-space app, and the user-space app then assigns a minor
> > number and devfs name to the node.
>
> Or, just use autofs, it does pretty much what you're describing.
>
> > IMHO using the path of a file in /dev to identify a device node does not
> > work in a hotplugging environment. You need this to support existing apps,
> > but the only way to be sure that you always get the same device is to use
> > device IDs.
>
> Actually, I don't have a hotplug environment, but that's not the only place
> it would be useful.  Does ide/scsi have reliably unique device IDs?  If so,
> once devfs gets rid of those races it would be very useful in a large raid
> setup.  Hmm, I guess that could be hot-pluggable with high end hardware.
>
> > You could encode that device id in the node's path or use the path as a
> > moniker for the device id (the symlink solution does this), but you need
> > to have more information about the device than its minor number (the X in
> > /dev/lpX).
>
> What does devfs do now?
>
> > > > Or for ethernet adapters:
> > > > because each is connected to a different network, so you need to assign
> > > > different IP addresses to them.
> > > I haven't seen anything assign ethX a certain order, except for
> > > ordered module loading, and then if there are multiple devices with the
> > > same driver, the order is chosen by bus scanning order, or module option.
> >
> > Ok, but I think no one doubts that it is a bad idea to assign ethX
> > semi-randomly. Basically this is the same problem as with device files,
> > only in a different namespace.
>
> So is that in favor of changing the current ethX naming convention or not?
>
> > > Does anyone know if devfs will, or has any plans to support any of the
> > > above features?
> >
> > The device registry (www.tjansen.de/devreg) patches devfs to allow the
> > things described above though.
>
> Everything, with all of the ids?  What about scsi/ide?
> 

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-19 20:07             ` Tim Jansen
@ 2001-10-19 20:24               ` Mike Fedyk
  2001-10-19 22:25                 ` Tim Jansen
  2001-10-20 13:47               ` Kai Henningsen
  1 sibling, 1 reply; 120+ messages in thread
From: Mike Fedyk @ 2001-10-19 20:24 UTC (permalink / raw)
  To: Tim Jansen; +Cc: linux-kernel, Patrick Mochel

On Fri, Oct 19, 2001 at 10:07:39PM +0200, Tim Jansen wrote:
> On Friday 19 October 2001 21:21, you wrote:
> > > For example for harddisks. You usually want them to be mounted in the
> > > same directory.
> > When is /etc/fstab going to support this?
> 
> You can use the device ids to provide stable symlinks, then /etc/fstab 
> shouldn't be a problem. 

Sounds good.

>Or you rewrite mount to support it. Or you do it in
> the kernel with a user-space helper: when a new device is connected its ID is 
> sent to some user-space app, and the user-space app then assigns a minor 
> number and devfs name to the node.
>

Or, just use autofs, it does pretty much what you're describing.

> IMHO using the path of a file in /dev to identify a device node does not work 
> in a hotplugging environment. You need this to support existing apps, but the
> only way to be sure that you always get the same device is to use device IDs. 

Actually, I don't have a hotplug environment, but that's not the only place
it would be useful.  Does ide/scsi have reliably unique device IDs?  If so,
once devfs gets rid of those races it would be very useful in a large raid
setup.  Hmm, I guess that could be hot-pluggable with high end hardware.

> You could encode that device id in the node's path or use the path as a 
> moniker for the device id (the symlink solution does this), but you need to 
> have more information about the device than its minor number (the X in 
> /dev/lpX).
>

What does devfs do now?

> 
> > >Or for ethernet adapters:
> > > because each is connected to a different network, so you need to assign
> > > different IP addresses to them.
> > I haven't seen anything assign ethX a certain order, except for
> > ordered module loading, and then if there are multiple devices with the
> > same driver, the order is chosen by bus scanning order, or module option.
> 
> Ok, but I think no one doubts that it is a bad idea to assign ethX 
> semi-randomly. Basically this is the same problem as with device files, only 
> in a different namespace.
>

So is that in favor of changing the current ethX naming convention or not?

> 
> > Does anyone know if devfs will, or has any plans to support any of the
> > above features?
> 
> The device registry (www.tjansen.de/devreg) patches devfs to allow the things 
> described above though.
> 

Everything, with all of the ids?  What about scsi/ide?


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-19 19:21           ` Mike Fedyk
@ 2001-10-19 20:07             ` Tim Jansen
  2001-10-19 20:24               ` Mike Fedyk
  2001-10-20 13:47               ` Kai Henningsen
  2001-10-20  1:41             ` john slee
  1 sibling, 2 replies; 120+ messages in thread
From: Tim Jansen @ 2001-10-19 20:07 UTC (permalink / raw)
  To: Mike Fedyk; +Cc: linux-kernel, Patrick Mochel

On Friday 19 October 2001 21:21, you wrote:
> > For example for harddisks. You usually want them to be mounted in the
> > same directory.
> When is /etc/fstab going to support this?

You can use the device ids to provide stable symlinks, then /etc/fstab 
shouldn't be a problem. Or you rewrite mount to support it. Or you do it in 
the kernel with a user-space helper: when a new device is connected its ID is 
sent to some user-space app, and the user-space app then assigns a minor 
number and devfs name to the node.

IMHO using the path of a file in /dev to identify a device node does not work 
in a hotplugging environment. You need this to support existing apps, but the 
only way to be sure that you always get the same device is to use device IDs. 
You could encode that device id in the node's path or use the path as a 
moniker for the device id (the symlink solution does this), but you need to 
have more information about the device than its minor number (the X in 
/dev/lpX).


> >Or for ethernet adapters:
> > because each is connected to a different network, so you need to assign
> > different IP addresses to them.
> I haven't seen anything assign ethX a certain order, except for
> ordered module loading, and then if there are multiple devices with the
> same driver, the order is chosen by bus scanning order, or module option.

Ok, but I think no one doubts that it is a bad idea to assign ethX 
semi-randomly. Basically this is the same problem as with device files, only 
in a different namespace.


> Does anyone know if devfs will, or has any plans to support any of the
> above features?

The device registry (www.tjansen.de/devreg) patches devfs to allow the things 
described above though.

bye...


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-19 19:02         ` Tim Jansen
@ 2001-10-19 19:21           ` Mike Fedyk
  2001-10-19 20:07             ` Tim Jansen
  2001-10-20  1:41             ` john slee
  0 siblings, 2 replies; 120+ messages in thread
From: Mike Fedyk @ 2001-10-19 19:21 UTC (permalink / raw)
  To: Tim Jansen; +Cc: Patrick Mochel, linux-kernel

On Fri, Oct 19, 2001 at 09:02:09PM +0200, Tim Jansen wrote:
> On Friday 19 October 2001 20:26, Patrick Mochel wrote:
> > There are equivalents in USB. But, neither of them are globally unique
> > identifiers for the device. That doesn't necessarily mean that one
> > couldn't be ascertained from the device; ethernet cards do have MAC
> > addresses. But, I don't think that many will have a ID/serial number.
> > [...]
> > Which leads me to the question: what real benefit does this have? Why
> > would you ever want to do a global search in kernel space for a particular
> > device? 
> 
> For example for harddisks. You usually want them to be mounted in the same 
> directory. 

When is /etc/fstab going to support this?

> Or if you have several printers of the same type connected to your
> computer you need a way of identifying them. 

/dev/ttyS0 and /dev/lp0

>Or for ethernet adapters: 
> because each is connected to a different network, so you need to assign 
> different IP addresses to them.
>

I haven't seen anything assign ethX a certain order, except for
ordered module loading, and then if there are multiple devices with the same
driver, the order is chosen by bus scanning order, or module option.

> Actually most USB harddisks, printers and network adapters have a unique
> serial number (you have to be careful though as some claim to have a serial
> number, but it is not unique).
>

How different do you expect this new driver model to be?

Does anyone know if devfs will, or has any plans to support any of the above
features?

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-19 18:26       ` Patrick Mochel
@ 2001-10-19 19:02         ` Tim Jansen
  2001-10-19 19:21           ` Mike Fedyk
  2001-10-20 13:52         ` Kai Henningsen
  1 sibling, 1 reply; 120+ messages in thread
From: Tim Jansen @ 2001-10-19 19:02 UTC (permalink / raw)
  To: Patrick Mochel; +Cc: linux-kernel

On Friday 19 October 2001 20:26, Patrick Mochel wrote:
> There are equivalents in USB. But, neither of them are globally unique
> identifiers for the device. That doesn't necessarily mean that one
> couldn't be ascertained from the device; ethernet cards do have MAC
> addresses. But, I don't think that many will have a ID/serial number.
> [...]
> Which leads me to the question: what real benefit does this have? Why
> would you ever want to do a global search in kernel space for a particular
> device? 

For example for harddisks. You usually want them to be mounted in the same 
directory. Or if you have several printers of the same type connected to your 
computer you need a way of identifying them. Or for ethernet adapters: 
because each is connected to a different network, so you need to assign 
different IP addresses to them.

Actually most USB harddisks, printers and network adapters have unique serial 
number (you have to be careful though as some claim to have a serial number, 
but it is not unique).

bye...

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-19  7:57     ` Henning P. Schmiedehausen
  2001-10-19  8:09       ` Jeff Garzik
@ 2001-10-19 18:50       ` Tim Jansen
  1 sibling, 0 replies; 120+ messages in thread
From: Tim Jansen @ 2001-10-19 18:50 UTC (permalink / raw)
  To: hps, Henning P. Schmiedehausen, linux-kernel

On Friday 19 October 2001 09:57, Henning P. Schmiedehausen wrote:
> Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
> >>> struct device {
> And a version field! Please add a version field right to the
> beginning. This would make supporting legacy drivers in later versions
> _much_ easier.

IMHO it would be a good idea to create and fill those structs using a 
function, instead of letting the driver code create the struct or using 
versions. 

In the current version of the patch struct device is allocated using 

struct device *device_alloc_dev(void);

and then later registered using 

int device_register_dev(struct device *dev);


In other words the fields are set by the bus driver. The problem is that when 
somebody adds a new, required field then existing code will silently break. 
So I would propose to think about using something like

struct device *device_create_dev(const char *name,
			const char *bus_id,
			struct device_driver *driver, 
			void *driver_data, 
			void *platform_data,
			u32 current_state);

The advantage is that when you add a new field the old code won't compile 
before it has been fixed. It also allows you to do large changes in the 
underlying code without breaking source compatibility. 
The disadvantage is that you cannot add a field that should be specified by 
the caller without either adding a new function or destroying source 
compatibility.
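
A rough userspace sketch of the constructor approach, for illustration only.
The prototype is the one proposed above; the trimmed struct layout, the size
constants, uint32_t standing in for u32, and the calloc-based body are all
assumptions, not code from the patch.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Sizes are placeholders; the patch defines its own values. */
#define DEVICE_NAME_SIZE 80
#define BUS_ID_SIZE      16

struct device_driver;               /* opaque for this sketch */

/* Trimmed copy of struct device: only the caller-supplied fields. */
struct device {
    char                  name[DEVICE_NAME_SIZE];
    char                  bus_id[BUS_ID_SIZE];
    struct device_driver *driver;
    void                 *driver_data;
    void                 *platform_data;
    uint32_t              current_state;
};

/*
 * Every caller-supplied field is a parameter, so adding a required
 * field changes the prototype and old callers stop compiling instead
 * of silently leaving the new field unset.
 */
struct device *device_create_dev(const char *name,
                                 const char *bus_id,
                                 struct device_driver *driver,
                                 void *driver_data,
                                 void *platform_data,
                                 uint32_t current_state)
{
    struct device *dev = calloc(1, sizeof(*dev));

    if (!dev)
        return NULL;

    /* calloc zeroed the buffers, so copying size-1 bytes keeps them terminated. */
    strncpy(dev->name, name, DEVICE_NAME_SIZE - 1);
    strncpy(dev->bus_id, bus_id, BUS_ID_SIZE - 1);
    dev->driver        = driver;
    dev->driver_data   = driver_data;
    dev->platform_data = platform_data;
    dev->current_state = current_state;
    return dev;
}
```

A bus driver would then call something like
device_create_dev("NIC", "00:0b.0", drv, NULL, NULL, 0) and hand the result
to device_register_dev().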

bye...


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-19 17:01 Kevin Easton
@ 2001-10-19 18:40 ` Patrick Mochel
  0 siblings, 0 replies; 120+ messages in thread
From: Patrick Mochel @ 2001-10-19 18:40 UTC (permalink / raw)
  To: linux-kernel


> Am I correct in thinking that the current "state of play" after these recent
> discussions is a 3 step suspend process, following an algorithm similar to:

Yes. After some discussion, I think we need a 3-step process. I will be
updating the docs today.

> If this is approximately the right idea, then how will write_out_state work if
> the device(s) that this operation uses aren't accepting requests anymore
> (because they've done suspend_save_state)?  Is it that "Stop accepting
> requests" is actually "Stop accepting requests that will cause a change in the
> device state"?  In that case, devices that can have the state written out to
> them will be limited to those where the act of writing it out will never cause
> such a request, right?

That's an interesting question, and one that depends on the answer to
several questions.

The mechanism for going to sleep is dependent first on the architecture
and secondly on the power management scheme. It is up to the scheme to work
out the finer details concerning it.

(That's not a copout; we're just not likely to have a generic suspend
routine. Even if every implementation is using the same mechanism, I don't
know if it could ever be consolidated into one singular body of code.)

So then, how do we do suspend to disk? All the progress in that area has
been made by swsusp. I don't know the finer details of how it works, so
I'm not about to comment on how to make it work or modify it to better fit
our needs. Maybe someone from that camp could comment on whether or not
the 3-stage model would completely screw them or not? Or, how to make it
work under this model? Or if it even matters?

	-pat





^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-18 22:10     ` Kai Henningsen
@ 2001-10-19 18:26       ` Patrick Mochel
  2001-10-19 19:02         ` Tim Jansen
  2001-10-20 13:52         ` Kai Henningsen
  0 siblings, 2 replies; 120+ messages in thread
From: Patrick Mochel @ 2001-10-19 18:26 UTC (permalink / raw)
  To: linux-kernel


> > Hmm. So, this would be a device ID, much like the Vendor/Device ID pair in
> > PCI space?
>
> Except for the fact that the Vendor/Device ID pair is a device *class*
> identifier, and the uuid is a device *instance* identifier.

Actually, the Vendor/Device ID pair is a unique identifier for the device
model. There are a Base Class and Subclass IDs, as well as a subsystem
vendor ID.

There are equivalents in USB. But, neither of them are globally unique
identifiers for the device. That doesn't necessarily mean that one
couldn't be ascertained from the device; ethernet cards do have MAC
addresses. But, I don't think that many will have a ID/serial number.

And this leads to inconsistency. You'll have PCI devices that have a
Vendor/Device/Class ID, and some that have a device-specific ID. Then
you'll have USB devices with the same. And what about legacy devices?

Which leads me to the question: what real benefit does this have? Why
would you ever want to do a global search in kernel space for a particular
device? The bus structure can keep (and likely already does keep) this
information. It can export it to userland on its own; the top layer
doesn't need to do that.

Yes, the formats of each file will be different, but they would be anyway.
The names may be different for different buses, but we can encourage the
bus layers to all export a file of the same name ("ID") if we want.

I don't think the UUID belongs in the top level structure. It belongs in
whatever structure dictates it - the bus structure or the class structure.

	-pat



^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-18 18:28     ` Jonathan Lundell
  2001-10-18 19:49       ` Patrick Mochel
@ 2001-10-19 17:12       ` Jonathan Lundell
  1 sibling, 0 replies; 120+ messages in thread
From: Jonathan Lundell @ 2001-10-19 17:12 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Patrick Mochel, linux-kernel

At 4:19 PM -0400 10/18/01, Jeff Garzik wrote:
>  > In this particular case, of course, the driver can keep a soft copy
>>  of the current MAC address and restore from that, but that means
>>  making special cases of special things.
>
>For that specific case, NIC drivers should read a copy of the MAC
>address at probe time, and store it in dev->dev_addr.  Each power-up+if
>open cycle, the MAC address is programmed onto the NIC.  So that is not
>a special case but the normal case, keeping a soft copy of the MAC.
>
>Just being picky :)

No, you're right, and it's especially true of NIC drivers. Partly, I 
assume, because it's SOP in NIC drivers to routinely reinitialize the 
hardware after various errors. And Ethernet makes it easy, 
because we're allowed to silently discard packets.

My own inclination would be to always keep enough information in 
driver structures to reinitialize the device, though I'd hesitate to 
assert that this is always possible, or practical.

WRT the suspend/resume sequence, I'd like to see the process 
extensible. So, for example, a single suspend entry point with an 
argument specifying the current action. The stuff I'm working on 
requires a kind of "suspend with extreme prejudice" in which the 
driver can't decline to suspend, as well as a suspend that comes 
*after* a bus (therefore device) reset (which would explain my urge 
to keep device state in soft structures). This is generally simple 
enough for Ethernet drivers, but a little trickier for other devices.

BTW (and excuse me for not searching this out, if it's available), is 
2.5 intended to have a real device tree? There's a related issue for 
suspend/resume, namely the hierarchical relationship of some devices 
(eg md->sd->adapter or whatever).
-- 
/Jonathan Lundell.

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-18 16:19     ` Patrick Mochel
  2001-10-18 17:38       ` Tim Jansen
  2001-10-18 22:06       ` Benjamin Herrenschmidt
@ 2001-10-19 17:09       ` Kai Henningsen
  2 siblings, 0 replies; 120+ messages in thread
From: Kai Henningsen @ 2001-10-19 17:09 UTC (permalink / raw)
  To: linux-kernel

benh@kernel.crashing.org (Benjamin Herrenschmidt)  wrote on 19.10.01 in <20011018220604.23253@smtp.wanadoo.fr>:

> collisions between uuid's of different devices types. In the case of
> ethernet hardware, the MAC address seems to be the best type of uuid
> available, so it would be something like "ethaddr,xx:xx:xx:xx:xx:xx",
> FireWire has a generic uuid allocation scheme as well, it could be
> "ieee1394,xxxxxx...", etc...

I have no idea what Firewire uses, but there are two generic kinds of  
numbers that the IEEE allocates (actually, they're two different views on  
a single id space).

Those are the MAC-48 address used by ethernet, fddi, and various other  
protocols, and the EUI-64 used by more modern designs (and referenced by  
IPv6; in fact, there's an algorithm that lets you create an EUI-64 from a  
MAC-48 via bit stuffing).

Both of these depend on a 24 bit id called company_id or OUI which you can  
buy from the IEEE for US$1.250,00 (for 16 million MAC-48's or 1 trillion  
EUI-64's).

The list of public OUIs is at <URL:http://standards.ieee.org/regauth/oui/oui.txt>
(there are "unlisted numbers" in that namespace, too).

Ah, I see IEEE 1394 *does* use OUIs. Not at all surprising, of course.

So, the namespace should be used, not the application. In fact, given the  
standard conversion from MAC-48 to EUI-64, we should probably just use one  
namespace for both: my current ethernet card 00:50:FC:0C:63:69 would thus  
be named "eui-64,00:50:fc:ff:ff:0c:63:69".
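
The conversion in that example can be sketched in a few lines of C. This
assumes the mapping shown above (ff:ff inserted after the 3-byte OUI); the
function names are made up for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Expand a 6-byte MAC-48 into an 8-byte EUI-64 by inserting ff:ff
 * after the 3-byte OUI, matching the example in this message.
 */
void mac48_to_eui64(const unsigned char mac[6], unsigned char eui[8])
{
    eui[0] = mac[0];
    eui[1] = mac[1];
    eui[2] = mac[2];
    eui[3] = 0xff;
    eui[4] = 0xff;
    eui[5] = mac[3];
    eui[6] = mac[4];
    eui[7] = mac[5];
}

/*
 * Format an EUI-64 as "eui-64,xx:xx:xx:xx:xx:xx:xx:xx".
 * Returns 0 on success, -1 if the buffer is too small.
 */
int eui64_to_name(const unsigned char eui[8], char *buf, size_t len)
{
    int n = snprintf(buf, len,
                     "eui-64,%02x:%02x:%02x:%02x:%02x:%02x:%02x:%02x",
                     eui[0], eui[1], eui[2], eui[3],
                     eui[4], eui[5], eui[6], eui[7]);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The formatted name needs 31 bytes including the terminator, so a 32-byte
buffer is enough.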

More than you ever wanted to know about this stuff:
<URL:http://standards.ieee.org/regauth/oui/>.

Of course, there *are* other namespaces.

MfG Kai

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
@ 2001-10-19 17:01 Kevin Easton
  2001-10-19 18:40 ` Patrick Mochel
  0 siblings, 1 reply; 120+ messages in thread
From: Kevin Easton @ 2001-10-19 17:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: mochel

Hi,

Am I correct in thinking that the current "state of play" after these recent
discussions is a 3 step suspend process, following an algorithm similar to:

    if (suspend_prepare(device_list) == failed) {
        suspend_cancel(device_list);
        return failed;
    }

    if (suspend_save_state(device_list) == failed) {
        suspend_cancel(device_list);
        return failed;
    }

    write_out_state();
    suspend_now(device_list);

Where these operations on the drivers are defined as:

suspend_prepare:
    Allocate any memory needed for saving of state, suspending & resuming
    device.  LAST CHANCE TO ALLOCATE MEMORY.

suspend_save_state:
    Stop accepting requests.
    Save state of device.

suspend_now:
    Turn off device.

suspend_cancel:
    Free any memory that may have been allocated for saving of state.
    Resume normal operation.

...and write_out_state() somehow stores the saved (in memory) state of the
devices to nonvolatile storage.

If this is approximately the right idea, then how will write_out_state work if
the device(s) that this operation uses aren't accepting requests anymore 
(because they've done suspend_save_state)?  Is it that "Stop accepting 
requests" is actually "Stop accepting requests that will cause a change in the
device state"?  In that case, devices that can have the state written out to 
them will be limited to those where the act of writing it out will never cause
such a request, right?

    - Kevin.


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-18 12:13   ` Benjamin Herrenschmidt
                       ` (2 preceding siblings ...)
  2001-10-19  7:57     ` Henning P. Schmiedehausen
@ 2001-10-19 15:21     ` Taral
  2001-10-19 23:30     ` Benjamin Herrenschmidt
  4 siblings, 0 replies; 120+ messages in thread
From: Taral @ 2001-10-19 15:21 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Patrick Mochel, Jeff Garzik, linux-kernel, Linus Torvalds

On Thu, Oct 18, 2001 at 02:13:18PM +0200, Benjamin Herrenschmidt wrote:
> I would add to the generic structure device, a "uuid" string field.
> This field would contain a "munged" unique identifier composed of
> the bus type followed by whatever bus-specific unique ID is
> provided by the driver. If the driver doesn't provide one, it defaults
> to a copy of the busID.
> 
> What I have in mind here is to have a common place to look for the
> best possible unique identification for a device. Typical example are
> ieee1394 hard disks which do have a unique ID, and so can be properly
> tracked between insertion and removal.

Actually, if this field were to be added, I think it would be far better
to have it be NULL in the case where there is no ID which can be
expected to remain the same on insert/remove. Otherwise we might have
people getting very confused when someone removes device A and adds
device B and they end up with the same "unique id" because neither one
has a real unique id.
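
As a sketch of what that rule means for anyone comparing IDs (the helper name
is hypothetical, and the uuid is assumed to be a plain C string or NULL):

```c
#include <stddef.h>
#include <string.h>

/*
 * Returns 1 only when both devices report a real unique ID and the
 * IDs are equal. A NULL uuid means "no stable ID", so it never
 * matches anything, not even another NULL.
 */
int device_uuid_match(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return 0;
    return strcmp(a, b) == 0;
}
```

With a bus-ID fallback instead of NULL, device A and device B plugged into
the same slot would compare equal here.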

-- 
Taral <taral@taral.net>
This message is digitally signed. Please PGP encrypt mail to me.
"Any technology, no matter how primitive, is magic to those who don't
understand it." -- Florence Ambrose

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-19  8:31         ` Keith Owens
@ 2001-10-19  8:43           ` Jeff Garzik
  0 siblings, 0 replies; 120+ messages in thread
From: Jeff Garzik @ 2001-10-19  8:43 UTC (permalink / raw)
  To: Keith Owens; +Cc: linux-kernel

Keith Owens wrote:
> Will you want modutils support for this new struct?  If so it needs
> a version field.

For struct device?  Um, no, we don't need modutils support for it.

	Jeff


-- 
Jeff Garzik      | Only so many songs can be sung
Building 1024    | with two lips, two lungs, and one tongue.
MandrakeSoft     |         - nomeansno

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-19  8:09       ` Jeff Garzik
@ 2001-10-19  8:31         ` Keith Owens
  2001-10-19  8:43           ` Jeff Garzik
  0 siblings, 1 reply; 120+ messages in thread
From: Keith Owens @ 2001-10-19  8:31 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel

On Fri, 19 Oct 2001 04:09:05 -0400, 
Jeff Garzik <jgarzik@mandrakesoft.com> wrote:
>"Henning P. Schmiedehausen" wrote:
>> And a version field! Please add a version field right to the
>> beginning. This would make supporting legacy drivers in later versions
>> _much_ easier.
>
>This is not a structure that is directly exposed to userspace, so it
>doesn't need to be versioned.

Will you want modutils support for this new struct?  If so it needs
a version field.


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-19  7:57     ` Henning P. Schmiedehausen
@ 2001-10-19  8:09       ` Jeff Garzik
  2001-10-19  8:31         ` Keith Owens
  2001-10-19 18:50       ` Tim Jansen
  1 sibling, 1 reply; 120+ messages in thread
From: Jeff Garzik @ 2001-10-19  8:09 UTC (permalink / raw)
  To: hps; +Cc: linux-kernel

"Henning P. Schmiedehausen" wrote:
> And a version field! Please add a version field right to the
> beginning. This would make supporting legacy drivers in later versions
> _much_ easier.

That's something to be done in a Windows not Linux driver.

This is not a structure that is directly exposed to userspace, so it
doesn't need to be versioned.

We don't really support legacy drivers in the first place, much less
take up space with versions in structs everywhere..

	Jeff


-- 
Jeff Garzik      | Only so many songs can be sung
Building 1024    | with two lips, two lungs, and one tongue.
MandrakeSoft     |         - nomeansno

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-18 12:13   ` Benjamin Herrenschmidt
  2001-10-18 16:19     ` Patrick Mochel
  2001-10-18 22:10     ` Kai Henningsen
@ 2001-10-19  7:57     ` Henning P. Schmiedehausen
  2001-10-19  8:09       ` Jeff Garzik
  2001-10-19 18:50       ` Tim Jansen
  2001-10-19 15:21     ` Taral
  2001-10-19 23:30     ` Benjamin Herrenschmidt
  4 siblings, 2 replies; 120+ messages in thread
From: Henning P. Schmiedehausen @ 2001-10-19  7:57 UTC (permalink / raw)
  To: linux-kernel

Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:

>>> struct device {
>>>         struct list_head        bus_list;
>>>         struct io_bus           *parent;
>>>         struct io_bus           *subordinate;
>>> 
>>>         char                    name[DEVICE_NAME_SIZE];
>>>         char                    bus_id[BUS_ID_SIZE];
>>> 
>>>         struct dentry           *dentry;
>>>         struct list_head        files;
>>> 
>>>         struct  semaphore       lock;
>>> 
>>>         struct device_driver    *driver;
>>>         void                    *driver_data;
>>>         void                    *platform_data;
>>> 
>>>         u32                     current_state;
>>>         unsigned char           *saved_state;
>>> };

>Hi Patrick ! Nice to see this happening ;)

>I would add to the generic structure device, a "uuid" string field.

And a version field! Please add a version field right to the
beginning. This would make supporting legacy drivers in later versions
_much_ easier.

	Ciao
		Henning

-- 
Dipl.-Inf. (Univ.) Henning P. Schmiedehausen       -- Geschaeftsfuehrer
INTERMETA - Gesellschaft fuer Mehrwertdienste mbH     hps@intermeta.de

Am Schwabachgrund 22  Fon.: 09131 / 50654-0   info@intermeta.de
D-91054 Buckenhof     Fax.: 09131 / 50654-20   

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-18 23:44             ` Benjamin Herrenschmidt
@ 2001-10-18 23:52               ` Jeff Garzik
  0 siblings, 0 replies; 120+ messages in thread
From: Jeff Garzik @ 2001-10-18 23:52 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linux-kernel

Benjamin Herrenschmidt wrote:
> Well, you may think it's ok to do it, let's say, for a serial port, in
> step 1. But... what about NFS over PPP over that serial port ? :)

In fact, I have done that to connect a former roommate's Amiga to my
own.  He accessed my files across NFS using SLIP and a null modem
cable...  :)

-- 
Jeff Garzik      | Only so many songs can be sung
Building 1024    | with two lips, two lungs, and one tongue.
MandrakeSoft     |         - nomeansno

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-18 23:30           ` Patrick Mochel
@ 2001-10-18 23:44             ` Benjamin Herrenschmidt
  2001-10-18 23:52               ` Jeff Garzik
  0 siblings, 1 reply; 120+ messages in thread
From: Benjamin Herrenschmidt @ 2001-10-18 23:44 UTC (permalink / raw)
  To: Patrick Mochel, linux-kernel

>
>Ok, so we need another walk before we go to sleep.
>
>But, first a question - does the swap device need to absolutely be the
>last thing to stop taking requests? Or, can it stop after everything is
>done allocating memory?

The problem with VM is that you don't really have one swap device.

You can have swap on files from several devices, you can have mmap'ed
files from any mounted filesystem on any block device, you can have
NFS, etc...

That's why we must completely separate allocation from blocking of
activity. If we do so, we don't need to care about any ordering rule
between drivers (at least not because of this problem, other issues
may require ordering rules, but it's an arch matter).

>> The actual state save can be in step 2 or 3, we don't really care,
>> it depends mostly on what is more convenient for the driver writer.
>
>For most devices, it seems it could happen in the first, as well. They
>should be fine with stopping I/O requests early on. It's only special
>cases like swap and maybe one or two others that need an extra step,
>right?

Well, you may think it's ok to do it, let's say, for a serial port, in
step 1. But... what about NFS over PPP over that serial port ? :)

If a device doesn't need to allocate memory and can do the save_state
and shutdown in one step, then it only needs to respond to step 2. It
will skip step 1 and step 3.

Ben.



^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-18 22:18         ` Benjamin Herrenschmidt
@ 2001-10-18 23:30           ` Patrick Mochel
  2001-10-18 23:44             ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 120+ messages in thread
From: Patrick Mochel @ 2001-10-18 23:30 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linux-kernel


> That's why I prefer more explicit semantics:
>
>  - Prepare sleep: Allocate enough memory to save state. For most
> devices, it will be a fixed quantity. In the case of devices that need
> per-request allocation, like USB or firewire, just allocate a limited
> pool. That means that you will eventually cause serialisation to
> happen when not needed and hurt perfs, but nobody will care at this
> point ;)
>
>  - Suspend activity: There you lock your IO queues, set your busy flag
> or whatever, and wait for any pending IO to be completed. Interrupts
> are enabled, scheduling as well (and other CPUs). Each driver is
> responsible to properly block a process issuing a request (which should
> not be a problem to implement for most of them, a single semaphore
> is enough for simple drivers, drivers with IO queues just need to
> leave requests in the queues, etc...)
>
>  - Set power state: Here you shut your device down for real. Interrupts
> are disabled. Only one CPU is still active (the others can be put in
> whatever state your arch allows, like a sleep loop or whatever...).

Ok, so we need another walk before we go to sleep.

But, first a question - does the swap device need to absolutely be the
last thing to stop taking requests? Or, can it stop after everything is
done allocating memory?

> The actual state save can be in step 2 or 3, we don't really care,
> it depends mostly on what is more convenient for the driver writer.

For most devices, it seems it could happen in the first, as well. They
should be fine with stopping I/O requests early on. It's only special
cases like swap and maybe one or two others that need an extra step,
right?

	-pat


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-18 21:32         ` John Alvord
  2001-10-18 22:23           ` Benjamin Herrenschmidt
@ 2001-10-18 22:26           ` Jeff Garzik
  1 sibling, 0 replies; 120+ messages in thread
From: Jeff Garzik @ 2001-10-18 22:26 UTC (permalink / raw)
  To: John Alvord; +Cc: Patrick Mochel, Jonathan Lundell, linux-kernel

John Alvord wrote:
> 
> On Thu, 18 Oct 2001 12:49:01 -0700 (PDT), Patrick Mochel
> <mochelp@infinity.powertie.org> wrote:
> 
> >
> >> The "state of all the devices in the system". Presumably, while you
> >> walk the tree the first time (to save state) interrupts are enabled,
> >> and devices are active. Operations (including interrupts) on the
> >> device can, presumably, change the state of the device after its
> >> state has been saved.
> >
> >Ya, I'm an idiot sometimes. I realized this just as I was leaving for
> >lunch. I almost turned around to come back and answer..
> >
> >This is what I had in mind; If someone could give me a thumbs-up or
> >thumbs-down on whether or not this would work:
> >
> >When the driver gets a save_state request, that is its notification that
> >it is going to sleep. It should then stop/finish all I/O requests. It
> >should then prevent itself from taking any more - by setting a flag or
> >whatever. Then, device save state.
> >
> >From that point on, it should know not to take any requests, theoretically
> >preserving state.
> >
> >When it gets the restore_state() call, it should first restore device
> >state. Once it does that, it knows that it can take I/O requests again.
> >
> >That should work, right?
> >
> >The only thing that that won't work for is the device to which we're
> >saving state, like the disk. At some point, though we have to accept that
> >the state that we saved was some checkpoint in the past, and it won't
> >reflect the state that changed in the process of writing the system state.
> 
> Maybe each driver could pass back a value indicating
> 
> 1) all done
> 2) N milliseconds more, please

It seems far less complex to simply let the driver do what it needs to
do, in the time it needs to do it.  The probe step in current drivers
for example could take anywhere from less than a second to several
seconds, depending on what needs to be done.

Though like I mentioned in a previous mail, if you have a two-stage
save-state step, a lot of those delays can be parallelized:  in the
first save-state the driver stops the hardware from accepting further
transaction, and initiates I/O request completion (where possible).  The
second save-state cleans up any outstanding transactions and shuts the
rest of the hardware down.

	Jeff


-- 
Jeff Garzik      | Only so many songs can be sung
Building 1024    | with two lips, two lungs, and one tongue.
MandrakeSoft     |         - nomeansno

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-18 21:32         ` John Alvord
@ 2001-10-18 22:23           ` Benjamin Herrenschmidt
  2001-10-18 22:26           ` Jeff Garzik
  1 sibling, 0 replies; 120+ messages in thread
From: Benjamin Herrenschmidt @ 2001-10-18 22:23 UTC (permalink / raw)
  To: John Alvord; +Cc: Patrick Mochel, linux-kernel

>Maybe each driver could pass back a value indicating
>
>1) all done
>2) N milliseconds more, please
>
>and you could keep calling until every driver says all done. The
>all-done drivers would ignore any new interrupts. The Not-Yet drivers
>could get the last few interrupts they need to complete. Of course
>there would need to be an overall timeout. That would leave most of
>the responsibility with the drivers... who know most of the true
>requirements.

Hrm... The interesting thing with this scheme is that it allows
you to first block your queue, then let other driver do the same
while your async IO completes, and then come back. Well... this
could be an option to step "2" of my earlier proposal.
This requires the device structure to keep track of which driver
still wants to be called. It would only go to step 3 once all
drivers have ack'ed step 2.
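
A minimal userspace sketch of that bookkeeping, under stated assumptions: the
struct, the reply enum and the round-based timeout are invented for
illustration, and the "acked" flag stands in for whatever the device
structure would really track.

```c
#include <stddef.h>

enum suspend_reply { SUSPEND_DONE, SUSPEND_MORE_TIME };

struct sketch_dev {
    /* Hypothetical callback: quiesce I/O, report whether more time is needed. */
    enum suspend_reply (*suspend_activity)(struct sketch_dev *dev);
    int acked;      /* set once the driver has reported SUSPEND_DONE */
    int pending_io; /* used only by the example driver below */
};

/*
 * Call step 2 on every device that has not acked yet, repeating until
 * all have acked or max_rounds passes have elapsed (the overall timeout).
 * Returns 0 when every driver is done, -1 on timeout.
 */
int suspend_activity_all(struct sketch_dev *devs, size_t n, int max_rounds)
{
    for (int round = 0; round < max_rounds; round++) {
        size_t waiting = 0;

        for (size_t i = 0; i < n; i++) {
            if (devs[i].acked)
                continue;
            if (devs[i].suspend_activity(&devs[i]) == SUSPEND_DONE)
                devs[i].acked = 1;
            else
                waiting++;
        }
        if (waiting == 0)
            return 0;
    }
    return -1;
}

/* Example driver: completes one outstanding request per call. */
enum suspend_reply example_suspend_activity(struct sketch_dev *dev)
{
    if (dev->pending_io > 0)
        dev->pending_io--;
    return dev->pending_io == 0 ? SUSPEND_DONE : SUSPEND_MORE_TIME;
}
```

Only once suspend_activity_all() returns 0 would the core move on to step 3.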

Ben.



^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-18 19:49       ` Patrick Mochel
  2001-10-18 20:40         ` Jeff Garzik
  2001-10-18 21:32         ` John Alvord
@ 2001-10-18 22:18         ` Benjamin Herrenschmidt
  2001-10-18 23:30           ` Patrick Mochel
  2 siblings, 1 reply; 120+ messages in thread
From: Benjamin Herrenschmidt @ 2001-10-18 22:18 UTC (permalink / raw)
  To: Patrick Mochel; +Cc: linux-kernel, Jonathan Lundell

>
>
>When the driver gets a save_state request, that is its notification that
>it is going to sleep. It should then stop/finish all I/O requests. It
>should then prevent itself from taking any more - by setting a flag or
>whatever. Then, device save state. 

Ok, that's what I call more or less "blocking IO queues"..

>From that point on, it should know not to take any requests, theoretically
>preserving state. 

But that's my problem :) Imagine another driver next in the loop needs
memory and causes swap_out to happen. Problem: you just have blocked the
IO queue of your swap device.

>When it gets the restore_state() call, it should first restore device
>state. Once it does that, it knows that it can take I/O requests again. 
>
>That should work, right?
>
>The only thing that that won't work for is the device to which we're
>saving state, like the disk. At some point, though we have to accept that
>the state that we saved was some checkpoint in the past, and it won't
>reflect the state that changed in the process of writing the system state.

That's why I prefer more explicit semantics:

 - Prepare sleep: Allocate enough memory to save state. For most
devices, it will be a fixed quantity. In the case of devices that need
per-request allocation, like USB or firewire, just allocate a limited
pool. That means that you will eventually cause serialisation to
happen when not needed and hurt perfs, but nobody will care at this
point ;)

 - Suspend activity: There you lock your IO queues, set your busy flag
or whatever, and wait for any pending IO to be completed. Interrupts
are enabled, scheduling as well (and other CPUs). Each driver is
responsible to properly block a process issuing a request (which should
not be a problem to implement for most of them, a single semaphore
is enough for simple drivers, drivers with IO queues just need to
leave requests in the queues, etc...)

 - Set power state: Here you shut your device down for real. Interrupts
are disabled. Only one CPU is still active (the others can be put in
whatever state your arch allows, like a sleep loop or whatever...).

The actual state save can be in step 2 or 3, we don't really care,
it depends mostly on what is more convenient for the driver writer.

The resume process only needs 2 states imho:

 - Set power state: You power back on your device, re-configure the
hardware properly, and make sure it won't send spurious interrupts.
System interrupts are disabled. One CPU is running.

 - Resume activity: System interrupts are back, scheduling too, you
start handling pending requests.
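
To make the phases concrete, here is a userspace sketch of how a core might
walk them. The phase names follow the list above; the struct names, the
suspend_all() helper and the demo driver are assumptions, and interrupt and
CPU handling is left to the caller.

```c
#include <stddef.h>

/* One optional callback per phase; a driver leaves unused ones NULL. */
struct pm_ops {
    int  (*prepare_sleep)(void *drv);     /* step 1: allocate memory, may fail */
    int  (*suspend_activity)(void *drv);  /* step 2: block queues, drain pending I/O */
    void (*power_off)(void *drv);         /* step 3: shut the device down */
    void (*power_on)(void *drv);          /* resume 1: reconfigure hardware */
    void (*resume_activity)(void *drv);   /* resume 2: restart pending requests */
};

struct pm_dev {
    const struct pm_ops *ops;
    void *drv;
};

/* Run each phase over all devices before starting the next phase. */
int suspend_all(struct pm_dev *devs, size_t n)
{
    size_t i;

    /* Step 1 everywhere first, while allocation and swap still work.
     * Freeing already-allocated memory on failure is left out here. */
    for (i = 0; i < n; i++)
        if (devs[i].ops->prepare_sleep &&
            devs[i].ops->prepare_sleep(devs[i].drv) != 0)
            return -1;

    /* Step 2: nobody allocates from here on. */
    for (i = 0; i < n; i++) {
        if (devs[i].ops->suspend_activity &&
            devs[i].ops->suspend_activity(devs[i].drv) != 0) {
            /* Unblock the devices already quiesced, then give up. */
            while (i-- > 0)
                if (devs[i].ops->resume_activity)
                    devs[i].ops->resume_activity(devs[i].drv);
            return -1;
        }
    }

    /* Step 3: the caller is assumed to have disabled interrupts. */
    for (i = 0; i < n; i++)
        if (devs[i].ops->power_off)
            devs[i].ops->power_off(devs[i].drv);
    return 0;
}

/* Demo driver: appends one letter per phase to a log string. */
struct demo_drv { char log[8]; size_t len; };

static void demo_mark(void *drv, char c)
{
    struct demo_drv *d = drv;

    if (d->len + 1 < sizeof(d->log))
        d->log[d->len++] = c;
}

static int  demo_prepare(void *drv) { demo_mark(drv, 'P'); return 0; }
static int  demo_suspend(void *drv) { demo_mark(drv, 'S'); return 0; }
static void demo_off(void *drv)     { demo_mark(drv, 'O'); }

const struct pm_ops demo_ops = {
    .prepare_sleep    = demo_prepare,
    .suspend_activity = demo_suspend,
    .power_off        = demo_off,
};
```

Because every device finishes step 1 before any device starts step 2, no
driver can trigger swap-out after a swap device has blocked its queue.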


Regards,
Ben.



^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-18 12:13   ` Benjamin Herrenschmidt
  2001-10-18 16:19     ` Patrick Mochel
@ 2001-10-18 22:10     ` Kai Henningsen
  2001-10-19 18:26       ` Patrick Mochel
  2001-10-19  7:57     ` Henning P. Schmiedehausen
                       ` (2 subsequent siblings)
  4 siblings, 1 reply; 120+ messages in thread
From: Kai Henningsen @ 2001-10-18 22:10 UTC (permalink / raw)
  To: linux-kernel

mochelp@infinity.powertie.org (Patrick Mochel)  wrote on 18.10.01 in <Pine.LNX.4.21.0110180826240.16868-100000@marty.infinity.powertie.org>:

> > I would add to the generic structure device, a "uuid" string field.
> > This field would contain a "munged" unique identifier composed of
> > the bus type followed by whatever bus-specific unique ID is
> > provided by the driver. If the driver doesn't provide one, it defaults
> > to a copy of the busID.
> >
> > What I have in mind here is to have a common place to look for the
> > best possible unique identification for a device. Typical example are
> > ieee1394 hard disks which do have a unique ID, and so can be properly
> > tracked between insertion and removal.
>
> Hmm. So, this would be a device ID, much like the Vendor/Device ID pair in
> PCI space?

Except for the fact that the Vendor/Device ID pair is a device *class*  
identifier, and the uuid is a device *instance* identifier.


MfG Kai

^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-18 16:19     ` Patrick Mochel
  2001-10-18 17:38       ` Tim Jansen
@ 2001-10-18 22:06       ` Benjamin Herrenschmidt
  2001-10-19 17:09       ` Kai Henningsen
  2 siblings, 0 replies; 120+ messages in thread
From: Benjamin Herrenschmidt @ 2001-10-18 22:06 UTC (permalink / raw)
  To: Patrick Mochel; +Cc: linux-kernel, Jeff Garzik, Linus Torvalds

>Hmm. So, this would be a device ID, much like the Vendor/Device ID pair in
>PCI space? Does this need to happen at the top layer, or can it work at
>the bus layer?

No, the idea is to have a unique identifier for a given device "instance".
A VendorID/DeviceID pair isn't unique, as two PCI cards can perfectly well
have the same one.

Some devices, and I think this is mandatory in the ieee1394 spec, provide
a per-device unique ID, similar to an ethernet address or a serial number.

The goal here is to provide, in a common location for any kind of device,
the best approximation of a unique identifier that the device can provide.

I believe the "bus" driver would setup a "default" one for a given device
(like a munge of pci slot/vendorid/deviceId for a PCI device that doesn't
provide anything better), and let the device driver override that with
something with better "uniqueness" if available.

However, I didn't explain myself correctly when writing about appending
that id to a "bus type". It's not actually a bus type I was thinking
about, but a type related to the uuid itself to avoid name space
collisions between uuids of different device types. In the case of
ethernet hardware, the MAC address seems to be the best type of uuid
available, so it would be something like "ethaddr,xx:xx:xx:xx:xx:xx",
FireWire has a generic uuid allocation scheme as well, it could be
"ieee1394,xxxxxx...", etc...

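A minimal sketch of the typed identifier described above. The helper names
(make_typed_uuid, make_ethaddr_uuid) are hypothetical and do not exist in
any kernel tree; only the "type,id" layout and the "ethaddr" prefix come
from the description. Rejecting a comma inside the type is an added
assumption, made so that the separator stays unambiguous.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "<type>,<id>" so that identifiers from
 * different namespaces (MAC addresses, FireWire IDs, ...) cannot collide.
 * Returns 0 on success, -1 if the type is invalid or buf is too small.
 */
int make_typed_uuid(char *buf, size_t len, const char *type, const char *id)
{
	int n;

	if (strchr(type, ',') != NULL)
		return -1; /* the comma is the separator (assumption) */
	n = snprintf(buf, len, "%s,%s", type, id);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Format a 6-byte Ethernet address as "ethaddr,xx:xx:xx:xx:xx:xx". */
int make_ethaddr_uuid(char *buf, size_t len, const unsigned char mac[6])
{
	char id[18]; /* 17 characters plus the terminating NUL */

	snprintf(id, sizeof(id), "%02x:%02x:%02x:%02x:%02x:%02x",
		 mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
	return make_typed_uuid(buf, len, "ethaddr", id);
}
```

A bus driver could fill in a default with make_typed_uuid() and a device
driver could later overwrite it with something more unique.
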
The goal is to help userland, especially configuration tools, to
keep track of hardware. In the case of block devices like sbp2 disks,
it can help make sure that if a device is unplugged before being
properly unmounted, when plugged back in, it can be identified as being
the same device.

>That shouldn't be too hard. ACPI wants to do something like that as well -
>they will be able to ascertain information about some devices that we
>otherwise wouldn't know, and will want to export that to userspace. 
>
>The idea was to make a call to platform_notify() on each device
>registration, so the platform/firmware/arch could do things like that.

Ok, nice. Maybe provide some room (for a pointer) in the device
structure to be used by the platform.

Thinking a bit more about it, why not simply call the node's
parent with a kind of child_notify() callback? The default
behaviour would be to pass the call up to the next parent until it
ends up at the motherboard/arch level. That way, special bus types
can add bus-related information to devices more easily.
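
A minimal sketch of that upward propagation. Only the child_notify() name
comes from the text above; struct node, its fields, notify_ancestors() and
the example handler are hypothetical, and "nonzero return stops the walk"
is an assumption.

```c
#include <stddef.h>

struct node;

/* Hypothetical callback: `self` is told that `child` was registered.
 * A nonzero return stops the notification from travelling further up. */
typedef int (*child_notify_fn)(struct node *self, struct node *child);

struct node {
	struct node *parent;          /* NULL at the motherboard/arch root */
	child_notify_fn child_notify; /* NULL: default, just pass it up */
	const char *fw_path;          /* example of info a bus may attach */
};

/* Walk from the new device's parent up to the root, letting each
 * ancestor annotate the device. Returns the number of ancestors visited. */
int notify_ancestors(struct node *child)
{
	int visited = 0;
	struct node *p;

	for (p = child->parent; p != NULL; p = p->parent) {
		visited++;
		if (p->child_notify && p->child_notify(p, child))
			break; /* handled, do not propagate further */
	}
	return visited;
}

/* Example handler for a hypothetical firmware-aware bus. */
int example_bus_notify(struct node *self, struct node *child)
{
	(void)self;
	child->fw_path = "/example/path"; /* placeholder, not a real path */
	return 0; /* keep propagating so the root also sees the device */
}
```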

>I think this can be solved in the suspend transition that I described:
>
>- save_state
>- suspend
>- resume
>- restore_state
>
>The save_state() call is the notification that the device will be
>suspended. It is in here that the driver allocates memory to save
>state. But, no devices are actually put to sleep until the entire tree has
>been walked to save state. 

Well, I remember when you first implemented this save state mechanism. I'm
not sure I like the save_state and restore_state semantics much. Since the
device can take additional requests after save_state (typically, it's a
block device and another driver is causing swap out from its save_state
routine), your state gets changed. You are not really saving the device
state at this point, you are allocating room to save state.

>Then, we make a rule that says "Thou shall not allocate memory in
>suspend() or resume()" and let them be damned if they do. 

Ok. That means that things like USB must make sure they pre-allocate
any URBs that may be needed. Sounds fine to me.

>I remember that discussion, and I think the above transition should fix
>that as well - have save_state() and restore_state() operate with
>interrupts enabled, while suspend() and resume() execute with interrupts
>disabled.

I don't agree there. As I said, since the device can still take requests
after save_state, it can't really save its state nor block its IO queues
if any. So that has to happen within suspend itself.

I've turned that problem in every possible direction ;) I think there are
really 3 required suspend steps, even if most drivers will only need to
really implement one or two.

>Yes, I remember these discussions as well. Oh, and what a nightmare that
>is. The bus layer needs to have logic to know what power state to enter
>based on the power state of all its children; a PCI bridge cannot enter a
>state lower than the lowest state of all its children. The PM layer should
>then take this into account and react appropriately.

Ok. So if we implement a toplevel "motherboard" node that is the parent
of all PCI busses (and anything else), we can have the arch handle that.
If the loop over all its devices doesn't return D3 but something less, then
the motherboard won't be put to suspend. That's fine... for me ;) But
what if not being able to set a given PCI bus to D3 is not a real
issue? (For example, the motherboard can be told not to shut down that
specific PCI bus.) Well, I believe in this case, the motherboard can
have more intimate knowledge of its children and which ones are
really "mandatory" for sleep or not.

>Ideally, we want some way to reinit all devices. Most should be possible,
>with one glaring exception: video. In order to reinit video, we need:

Are you sure we know how to reinit all sorts of SCSI cards out there
as well?

>- a framebuffer driver that knows the innards of all the cards it supports
>
>or 
>
>- make something else do it, like X.
>
>The latter seems most plausible, since it knows about most cards. And, for
>initialisation, at least on x86, it can run the BIOS routines. 

Well, both may work. It's a matter of motherboard policy I believe.
We might add a special case to the fbdev layer to be told by X "hey,
don't bother about sleep, I'll handle it", but X is only allowed to
touch hardware when frontmost, and this would require more linux-specific
cruft in X which would be difficult to get accepted.

I believe the way to get back the video card will have to be dealt with on
a per card basis anyway. If we have the infrastructure for drivers to
say "I can't deal with shutdown", it's enough for now. I'm looking
into some ways, on PPC, to re-run the card's firmware with a small
forth interpreter ;) (reminds you of ACPI ? heh ;) For now, it's
not an important issue as putting desktop machines to sleep is not
as important as putting laptops to sleep, and fortunately, so far,
Apple laptops don't power off the AGP slot during sleep (we use D2).


>Of course, that does nothing for you on PPC, but I am hoping something
>similar can be accomplished. Can X run the OFW routines in the video ROM?

It can't (well, it could probably run the BIOS of an x86 card), but
there might be other ways. Re-initing the card completely after figuring
out all register values for a given model may be a working solution for
Macs, as they almost all use a limited range of ATI hardware.
Emulating OF might be a solution as well. A last option would be a wrapper
to run Apple MacOS drivers (which are in the card's ROM most of the time;
this is more or less already what Apple does with MacOS X).

Ben.





* Re: [RFC] New Driver Model for 2.5
  2001-10-18 19:49       ` Patrick Mochel
  2001-10-18 20:40         ` Jeff Garzik
@ 2001-10-18 21:32         ` John Alvord
  2001-10-18 22:23           ` Benjamin Herrenschmidt
  2001-10-18 22:26           ` Jeff Garzik
  2001-10-18 22:18         ` Benjamin Herrenschmidt
  2 siblings, 2 replies; 120+ messages in thread
From: John Alvord @ 2001-10-18 21:32 UTC (permalink / raw)
  To: Patrick Mochel; +Cc: Jonathan Lundell, linux-kernel

On Thu, 18 Oct 2001 12:49:01 -0700 (PDT), Patrick Mochel
<mochelp@infinity.powertie.org> wrote:

>
>> The "state of all the devices in the system". Presumably, while you 
>> walk the tree the first time (to save state) interrupts are enabled, 
>> and devices are active. Operations (including interrupts) on the 
>> device can, presumably, change the state of the device after its 
>> state has been saved.
>
>Ya, I'm an idiot sometimes. I realized this just as I was leaving for
>lunch. I almost turned around to come back and answer..
>
>This is what I had in mind; If someone could give me a thumbs-up or
>thumbs-down on whether or not this would work:
>
>When the driver gets a save_state request, that is its notification that
>it is going to sleep. It should then stop/finish all I/O requests. It
>should then prevent itself from taking any more - by setting a flag or
>whatever. Then, device save state. 
>
>From that point in, it should know not to take any requests, theoretically
>preserving state. 
>
>When it gets the restore_state() call, it should first restore device
>state. Once it does that, it knows that it can take I/O requests again. 
>
>That should work, right?
>
>The only thing that that won't work for is the device to which we're
>saving state, like the disk. At some point, though we have to accept that
>the state that we saved was some checkpoint in the past, and it won't
>reflect the state that changed in the process of writing the system state.

Maybe each driver could pass back a value indicating

1) all done
2) N milliseconds more, please

and you could keep calling until every driver says all done. The
all-done drivers would ignore any new interrupts. The Not-Yet drivers
could get the last few interrupts they need to complete. Of course
there would need to be an overall timeout. That would leave most of
the responsibility with the drivers... who know most of the true
requirements.
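
A minimal sketch of that polling loop, with time simulated by adding up the
requested delays instead of sleeping. All names (struct pollee,
quiesce_all, countdown_poll) are hypothetical; the convention that 0 means
"all done" and a positive value means "N milliseconds more" is an
assumption about how the two return values might be encoded.

```c
#include <stddef.h>

/* Hypothetical per-driver poll: 0 means "all done", a positive value is
 * the number of milliseconds the driver wants before being asked again. */
typedef unsigned int (*quiesce_poll_fn)(void *ctx);

struct pollee {
	quiesce_poll_fn poll;
	void *ctx;
	int done;
};

/* Poll until every driver reports "all done" or the overall budget runs
 * out. Returns 0 on success, -1 on timeout. */
int quiesce_all(struct pollee *p, size_t n, unsigned int budget_ms)
{
	unsigned int elapsed = 0;

	for (;;) {
		unsigned int wait = 0;
		size_t pending = 0;
		size_t i;

		for (i = 0; i < n; i++) {
			unsigned int want;

			if (p[i].done)
				continue; /* finished drivers are not asked again */
			want = p[i].poll(p[i].ctx);
			if (want == 0) {
				p[i].done = 1;
			} else {
				pending++;
				if (want > wait)
					wait = want; /* wait for the longest request */
			}
		}
		if (pending == 0)
			return 0;
		if (wait > budget_ms - elapsed)
			return -1; /* overall timeout */
		elapsed += wait; /* a real implementation would sleep here */
	}
}

/* Example driver: needs *remaining more rounds, asking 10 ms each time. */
unsigned int countdown_poll(void *ctx)
{
	int *remaining = ctx;

	if (*remaining == 0)
		return 0;
	(*remaining)--;
	return 10;
}
```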

john alvord


* Re: [RFC] New Driver Model for 2.5
  2001-10-18 19:49       ` Patrick Mochel
@ 2001-10-18 20:40         ` Jeff Garzik
  2001-10-18 21:32         ` John Alvord
  2001-10-18 22:18         ` Benjamin Herrenschmidt
  2 siblings, 0 replies; 120+ messages in thread
From: Jeff Garzik @ 2001-10-18 20:40 UTC (permalink / raw)
  To: Patrick Mochel; +Cc: Jonathan Lundell, linux-kernel

Patrick Mochel wrote:
> When the driver gets a save_state request, that is its notification that
> it is going to sleep. It should then stop/finish all I/O requests. It
> should then prevent itself from taking any more - by setting a flag or
> whatever. Then, device save state.
> 
> From that point in, it should know not to take any requests, theoretically
> preserving state.
> 
> When it gets the restore_state() call, it should first restore device
> state. Once it does that, it knows that it can take I/O requests again.
> 
> That should work, right?

Seems reasonable.  If a save_state is refused, I assume you
restore_state for all other devices and bring the system from a
half-working state [at the time the suspend was rejected] to a
full-working state?

Consider that it will take some amount of time to stop pending I/O
requests.  You might want to walk the tree, and tell devices "start
saving", and then walk the tree again and say "finish saving."

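A minimal sketch of the rollback case, assuming a flat array of devices
instead of the real tree. struct dev, save_all_or_rollback() and the
trivial callbacks are hypothetical; undoing in reverse order is an
assumption, since the thread does not specify an order.

```c
#include <stddef.h>

struct dev {
	int (*save_state)(struct dev *d);     /* nonzero: refuse the suspend */
	void (*restore_state)(struct dev *d);
	int saved;                            /* set once save_state succeeded */
};

/* Ask every device to save state. If one refuses, undo the devices that
 * already agreed so the system is fully working again.
 * Returns 0 if all saved, -1 if the suspend was rejected. */
int save_all_or_rollback(struct dev *devs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (devs[i].save_state(&devs[i]) != 0) {
			while (i-- > 0) {
				devs[i].restore_state(&devs[i]);
				devs[i].saved = 0;
			}
			return -1;
		}
		devs[i].saved = 1;
	}
	return 0;
}

/* Trivial callbacks for trying it out. */
int save_ok(struct dev *d)       { (void)d; return 0; }
int save_refuse(struct dev *d)   { (void)d; return 1; }
void restore_noop(struct dev *d) { (void)d; }
```
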
	Jeff


-- 
Jeff Garzik      | Only so many songs can be sung
Building 1024    | with two lips, two lungs, and one tongue.
MandrakeSoft     |         - nomeansno


* Re: [RFC] New Driver Model for 2.5
  2001-10-18 18:28     ` Jonathan Lundell
@ 2001-10-18 19:49       ` Patrick Mochel
  2001-10-18 20:40         ` Jeff Garzik
                           ` (2 more replies)
  2001-10-19 17:12       ` Jonathan Lundell
  1 sibling, 3 replies; 120+ messages in thread
From: Patrick Mochel @ 2001-10-18 19:49 UTC (permalink / raw)
  To: Jonathan Lundell; +Cc: linux-kernel


> The "state of all the devices in the system". Presumably, while you 
> walk the tree the first time (to save state) interrupts are enabled, 
> and devices are active. Operations (including interrupts) on the 
> device can, presumably, change the state of the device after its 
> state has been saved.

Ya, I'm an idiot sometimes. I realized this just as I was leaving for
lunch. I almost turned around to come back and answer..

This is what I had in mind; If someone could give me a thumbs-up or
thumbs-down on whether or not this would work:

When the driver gets a save_state request, that is its notification that
it is going to sleep. It should then stop/finish all I/O requests. It
should then prevent itself from taking any more - by setting a flag or
whatever. Then, device save state. 

From that point in, it should know not to take any requests, theoretically
preserving state. 

When it gets the restore_state() call, it should first restore device
state. Once it does that, it knows that it can take I/O requests again. 

That should work, right?

The only thing that that won't work for is the device to which we're
saving state, like the disk. At some point, though we have to accept that
the state that we saved was some checkpoint in the past, and it won't
reflect the state that changed in the process of writing the system state.
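
A minimal sketch of the "set a flag" idea on a toy, single-threaded driver.
Everything here (struct toy_dev, its fields and the toy_* functions) is
hypothetical; a real driver would also need locking and would have to wait
for in-flight I/O rather than clear a counter.

```c
#include <stdbool.h>

/* Toy driver state; `reg` stands in for whatever hardware state matters. */
struct toy_dev {
	bool accepting; /* false between save_state and restore_state */
	int pending;    /* requests queued but not yet completed */
	int reg;        /* live "hardware" register */
	int saved_reg;  /* copy taken by save_state */
};

/* Submit a request; refused (-1) once the device was told to sleep. */
int toy_submit(struct toy_dev *d, int value)
{
	if (!d->accepting)
		return -1;
	d->pending++;
	d->reg = value; /* requests are what change device state */
	return 0;
}

void toy_save_state(struct toy_dev *d)
{
	d->pending = 0;        /* 1. finish what is in flight (instant here) */
	d->accepting = false;  /* 2. stop taking new requests */
	d->saved_reg = d->reg; /* 3. only now is the snapshot stable */
}

void toy_restore_state(struct toy_dev *d)
{
	d->reg = d->saved_reg; /* restore device state first ... */
	d->accepting = true;   /* ... then take I/O requests again */
}
```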

	-pat





* Re: [RFC] New Driver Model for 2.5
  2001-10-18 17:38   ` Patrick Mochel
  2001-10-18 17:41     ` Patrick Mochel
@ 2001-10-18 18:28     ` Jonathan Lundell
  2001-10-18 19:49       ` Patrick Mochel
  2001-10-19 17:12       ` Jonathan Lundell
  1 sibling, 2 replies; 120+ messages in thread
From: Jonathan Lundell @ 2001-10-18 18:28 UTC (permalink / raw)
  To: Patrick Mochel; +Cc: linux-kernel

At 10:38 AM -0700 10/18/01, Patrick Mochel wrote:
>On Thu, 18 Oct 2001, Jonathan Lundell wrote:
>
>>  At 11:08 AM -0500 10/18/01, Taral wrote:
>>  >On Wed, Oct 17, 2001 at 04:52:29PM -0700, Patrick Mochel wrote:
>>  >>  When a suspend transition is triggered, the device tree is walked
>>  >>  first to save the state of all the devices in the system. Once this
>>  >>  is complete, the saved state, now residing in memory, can be written
>>  >>  to some non-volatile location, like a disk partition or network
>>  >>  location.
>>  >>...
>>  What happens to state changes between the first and second traversal
>>  of the device tree?
>
>State changes of what?
>
>After the first walk (save_state), you essentially have a snapshot of the
>system in memory which can be written to disk, memory, etc.
>
>Once that is done, you disable interrupts and walk the tree again to power
>off devices.

The "state of all the devices in the system". Presumably, while you 
walk the tree the first time (to save state) interrupts are enabled, 
and devices are active. Operations (including interrupts) on the 
device can, presumably, change the state of the device after its 
state has been saved.

To take a crude example, suppose you save the state of an Ethernet 
NIC, then change its MAC address, and then suspend the device. The 
saved state now has the wrong MAC address.

In this particular case, of course, the driver can keep a soft copy 
of the current MAC address and restore from that, but that means 
making special cases of special things.

Look at it another way. Why not save the state at the beginning of 
time (say when the device is first initialized) instead of walking 
the tree at suspend time? Presumably because there's some difference 
between the state then, and the state at suspend time. How did that 
difference happen, and why couldn't it happen after the save-state 
tree-walk but before the actual device suspend?
-- 
/Jonathan Lundell.


* Re: [RFC] New Driver Model for 2.5
  2001-10-18 17:38   ` Patrick Mochel
@ 2001-10-18 17:41     ` Patrick Mochel
  2001-10-18 18:28     ` Jonathan Lundell
  1 sibling, 0 replies; 120+ messages in thread
From: Patrick Mochel @ 2001-10-18 17:41 UTC (permalink / raw)
  To: Jonathan Lundell; +Cc: linux-kernel


> After the first walk (save_state), you essentially have a snapshot of the
> system in memory which can be written to disk, memory, etc.

Sorry: written to disk, network, etc. (since it's already in memory ;)

	-pat




* Re: [RFC] New Driver Model for 2.5
  2001-10-18 16:52 ` Jonathan Lundell
@ 2001-10-18 17:38   ` Patrick Mochel
  2001-10-18 17:41     ` Patrick Mochel
  2001-10-18 18:28     ` Jonathan Lundell
  0 siblings, 2 replies; 120+ messages in thread
From: Patrick Mochel @ 2001-10-18 17:38 UTC (permalink / raw)
  To: Jonathan Lundell; +Cc: linux-kernel




On Thu, 18 Oct 2001, Jonathan Lundell wrote:

> At 11:08 AM -0500 10/18/01, Taral wrote:
> >On Wed, Oct 17, 2001 at 04:52:29PM -0700, Patrick Mochel wrote:
> >>  When a suspend transition is triggered, the device tree is walked first to
> >>  save the state of all the devices in the system. Once this is complete, the
> >>  saved state, now residing in memory, can be written to some non-volatile
> >>  location, like a disk partition or network location.
> >>
> >>  The device tree is then walked again to suspend all of the devices. This
> >>  guarantees that the device controlling the location to write the state is
> >>  still powered on while you have a snapshot of the system state.
> >
> >Aha! A much nicer solution to the problem the ACPI people are having
> >with suspend/resume (ordering problems).
> 
> What happens to state changes between the first and second traversal 
> of the device tree?

State changes of what?

After the first walk (save_state), you essentially have a snapshot of the
system in memory which can be written to disk, memory, etc.

Once that is done, you disable interrupts and walk the tree again to power
off devices. 

	-pat



* Re: [RFC] New Driver Model for 2.5
  2001-10-18 16:19     ` Patrick Mochel
@ 2001-10-18 17:38       ` Tim Jansen
  2001-10-18 22:06       ` Benjamin Herrenschmidt
  2001-10-19 17:09       ` Kai Henningsen
  2 siblings, 0 replies; 120+ messages in thread
From: Tim Jansen @ 2001-10-18 17:38 UTC (permalink / raw)
  To: Patrick Mochel; +Cc: linux-kernel

On Thursday 18 October 2001 18:19, Patrick Mochel wrote:
> Hmm. So, this would be a device ID, much like the Vendor/Device ID pair in
> PCI space? Does this need to happen at the top layer, or can it work at
> the bus layer?

Probably both. See this discussion on device ids (from linux-hotplug-devel):

http://www.geocrawler.com/archives/3/9005/2001/9/0/6716219/
http://www.geocrawler.com/mail/thread.php3?subject=IDs+%28was+Re%3A+Hotplugging+for+the+input+subsystem%29&list=9005

bye...


* Re: [RFC] New Driver Model for 2.5
  2001-10-18 17:05 ` Jonathan Corbet
@ 2001-10-18 17:33   ` Patrick Mochel
  0 siblings, 0 replies; 120+ messages in thread
From: Patrick Mochel @ 2001-10-18 17:33 UTC (permalink / raw)
  To: Jonathan Corbet; +Cc: linux-kernel


> > probe:
> > 	Check for device existence and associate driver with it. 
> 
> What, exactly, does "associate driver" mean?  Filling in the struct device
> field, perhaps?  Calling register_chrdev (or register_whatever)?  Creation
> of a ddfs entry?  As a driver writer I can understand that the probe
> routine should check for the existence of some device, and perhaps set up
> an internal data structure.  What else happens?

That's basically it. The bus should have already known about the existence
of the device, filled in the fields of struct device and registered it in
the global tree. 

As Jeff Garzik suggested: 

probe:
        register interface
        sanity check h/w to make sure it's there and alive
        stop DMA/interrupts/etc., just in case
        start timer to powerdown h/w in N seconds

in which interface would be your device node (char dev, devfs node, etc).


	-pat





* Re: [RFC] New Driver Model for 2.5
  2001-10-17 23:52 Patrick Mochel
                   ` (2 preceding siblings ...)
  2001-10-18 16:52 ` Jonathan Lundell
@ 2001-10-18 17:05 ` Jonathan Corbet
  2001-10-18 17:33   ` Patrick Mochel
  3 siblings, 1 reply; 120+ messages in thread
From: Jonathan Corbet @ 2001-10-18 17:05 UTC (permalink / raw)
  To: Patrick Mochel; +Cc: linux-kernel

> The (New) Linux Kernel Driver Model

It looks like a good start - a lot of things will be cleaner afterward.

A question...

In struct device_driver:

> probe:
> 	Check for device existence and associate driver with it. 

What, exactly, does "associate driver" mean?  Filling in the struct device
field, perhaps?  Calling register_chrdev (or register_whatever)?  Creation
of a ddfs entry?  As a driver writer I can understand that the probe
routine should check for the existence of some device, and perhaps set up
an internal data structure.  What else happens?

jon

Jonathan Corbet
Executive editor, LWN.net
corbet@lwn.net


* Re: [RFC] New Driver Model for 2.5
  2001-10-17 23:52 Patrick Mochel
  2001-10-18  6:23 ` Jeff Garzik
  2001-10-18 16:08 ` Taral
@ 2001-10-18 16:52 ` Jonathan Lundell
  2001-10-18 17:38   ` Patrick Mochel
  2001-10-18 17:05 ` Jonathan Corbet
  3 siblings, 1 reply; 120+ messages in thread
From: Jonathan Lundell @ 2001-10-18 16:52 UTC (permalink / raw)
  To: linux-kernel

At 11:08 AM -0500 10/18/01, Taral wrote:
>On Wed, Oct 17, 2001 at 04:52:29PM -0700, Patrick Mochel wrote:
>>  When a suspend transition is triggered, the device tree is walked first to
>>  save the state of all the devices in the system. Once this is complete, the
>>  saved state, now residing in memory, can be written to some non-volatile
>>  location, like a disk partition or network location.
>>
>>  The device tree is then walked again to suspend all of the devices. This
>>  guarantees that the device controlling the location to write the state is
>>  still powered on while you have a snapshot of the system state.
>
>Aha! A much nicer solution to the problem the ACPI people are having
>with suspend/resume (ordering problems).

What happens to state changes between the first and second traversal 
of the device tree?
-- 
/Jonathan Lundell.


* Re: [RFC] New Driver Model for 2.5
  2001-10-18 12:13   ` Benjamin Herrenschmidt
@ 2001-10-18 16:19     ` Patrick Mochel
  2001-10-18 17:38       ` Tim Jansen
                         ` (2 more replies)
  2001-10-18 22:10     ` Kai Henningsen
                       ` (3 subsequent siblings)
  4 siblings, 3 replies; 120+ messages in thread
From: Patrick Mochel @ 2001-10-18 16:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Jeff Garzik, linux-kernel, Linus Torvalds


Hi there. 

> I would add to the generic structure device, a "uuid" string field.
> This field would contain a "munged" unique identifier composed of
> the bus type followed by whatever bus-specific unique ID is
> provided by the driver. If the driver doesn't provide one, it defaults
> to a copy of the busID.
> 
> What I have in mind here is to have a common place to look for the
> best possible unique identification for a device. Typical examples are
> ieee1394 hard disks which do have a unique ID, and so can be properly
> tracked between insertion and removal.

Hmm. So, this would be a device ID, much like the Vendor/Device ID pair in
PCI space? Does this need to happen at the top layer, or can it work at
the bus layer?

> Also, I'd like to see a simple ability for the arch code to add
> entries to the exposed device filesystem nodes. The main reason for
> this is that on machines like PPC with OpenFirmware or Sun with
> OpenBoot, it makes a lot of sense (and is very useful for bootloader
> configuration among others) to be able to know the firmware "path"
> corresponding to a given device. On PPC, the generic PCI code can
> do the conversion between an Open Firmware device node and a PCI
> device in the kernel, but doing so from userland is a lot more
> tricky. The device filesystem is a very good way to fix that problem
> once and for all.

That shouldn't be too hard. ACPI wants to do something like that as well -
they will be able to ascertain information about some devices that we
otherwise wouldn't know, and will want to export that to userspace. 

The idea was to make a call to platform_notify() on each device
registration, so the platform/firmware/arch could do things like that.

> However, there is another important point about power management I
> discovered the "hard way", which is memory allocation vs. turning
> off of swapping devices (that is the swap device itself or any device
> on which you may have mmap'ed files).
> 
> For "transient" power management (that is dynamically putting a
> subsystem to sleep when idle until it gets a new request), there
> is no real problem provided that the driver can do the wakeup without
> allocating memory.
> 
> For system power sleep, where you actually shut down everything,
> the problem happens when you start shutting down those "swap" devices.
> Once done, you may be in a situation where another device, to be shut
> down or to wake up properly, needs to allocate memory (see for example
> USB devices that need to allocate URBs). This may cause requests
> to swap_out which will block indefinitely if trying to swap out
> pages to an already sleeping device.
> 
> I "work around" this in the PowerBook sleep code in a somewhat dumb way
> which works in 99% of cases but is probably broken as well if you
> are really near oom. Basically, instead of calling only the "suspend"
> callbacks of devices, I have an additional "suspend requested"
> one that is sent to every driver using my specific PM scheme _before_
> starting the real round of "suspend" callbacks. Drivers that need 
> a significant amount of backup memory (like some framebuffers) will
> do the necessary allocations from this early callback.

I think this can be solved in the suspend transition that I described:

- save_state
- suspend
- resume
- restore_state

The save_state() call is the notification that the device will be
suspended. It is in here that the driver allocates memory to save
state. But, no devices are actually put to sleep until the entire tree has
been walked to save state. 

Then, we make a rule that says "Thou shall not allocate memory in
suspend() or resume()" and let them be damned if they do. 
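
A minimal sketch of the four-phase transition with the no-allocation rule,
using a flat array in place of the device tree. struct pm_dev and the pm_*
functions are hypothetical. Two choices are assumptions, not taken from the
proposal: the register copy is done in the suspend phase, into the buffer
that save_state allocated, and devices are resumed in reverse order.

```c
#include <stdlib.h>
#include <string.h>

struct pm_dev {
	unsigned char regs[4]; /* stand-in for live hardware state */
	unsigned char *saved;  /* allocated in save_state, never in suspend */
	int powered;
};

/* Phase 1 (interrupts on): may allocate, and may therefore fail. */
int pm_save_state(struct pm_dev *d)
{
	d->saved = malloc(sizeof(d->regs));
	return d->saved ? 0 : -1;
}

/* Phase 2 (interrupts off): no allocation, just copy and power down. */
void pm_suspend(struct pm_dev *d)
{
	memcpy(d->saved, d->regs, sizeof(d->regs));
	memset(d->regs, 0, sizeof(d->regs)); /* power loss wipes registers */
	d->powered = 0;
}

/* Phase 3 (interrupts off): power up and reprogram from the buffer. */
void pm_resume(struct pm_dev *d)
{
	d->powered = 1;
	memcpy(d->regs, d->saved, sizeof(d->regs));
}

/* Phase 4 (interrupts on): release what phase 1 allocated. */
void pm_restore_state(struct pm_dev *d)
{
	free(d->saved);
	d->saved = NULL;
}

/* Every device finishes a phase before any device starts the next one. */
int pm_cycle(struct pm_dev *devs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (pm_save_state(&devs[i]) != 0) {
			while (i-- > 0)
				pm_restore_state(&devs[i]);
			return -1;
		}
	for (i = 0; i < n; i++)
		pm_suspend(&devs[i]);
	for (i = n; i-- > 0; )
		pm_resume(&devs[i]);
	for (i = n; i-- > 0; )
		pm_restore_state(&devs[i]);
	return 0;
}
```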

> Another issue with suspend and resume is with interrupt sharing and
> some bad devices that unconditionally assert their interrupt line
> when put to any PM state. On the contrary, some drivers, in order
> to properly block any new request in its queues and wait for any
> pending one to complete, may need to operate with interrupts still
> running. I discussed that a bit with Alan, and it seems that we really
> need 2 rounds of "suspend" callbacks in this case (at least for
> system suspend), one with interrupts still enabled, one with interrupts
> disabled.

I remember that discussion, and I think the above transition should fix
that as well - have save_state() and restore_state() operate with
interrupts enabled, while suspend() and resume() execute with interrupts
disabled.

> Finally, I have another need for which I'm not sure how to react
> with either the current scheme or the new scheme. On "desktop"
> Apple systems (at least all the recent G4 ones), the PCI bus will
> be effectively powered down during system sleep. That means that
> we must (at least that's what both MacOS and MacOS X do) prevent
> the complete system sleep when at least one PCI slot contains a
> card for which the driver can't properly restore the state after
> a complete shutdown. This frequently happens, for example, with
> video cards that rely on some initial chip & pll configuration
> to be done by the firmware. We may be able to fallback to some 
> kind of "light suspend" where we suspend any device we can but
> not the motherboard, but that means that the "main" PM code has
> to know about the problem and needs some way to know if a given
> node in the device tree can or cannot be revived from a given
> power state (in this case, we might consider being powered down
> as equivalent to D3 state). My current solution is to not allow
> system sleep at all on those desktop machines.

Yes, I remember these discussions as well. Oh, and what a nightmare that
is. The bus layer needs to have logic to know what power state to enter
based on the power state of all its children; a PCI bridge cannot enter a
state lower than the lowest state of all its children. The PM layer should
then take this into account and react appropriately.
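
A minimal sketch of that constraint, reading "lower state" as deeper sleep,
i.e. a higher D-number. The enum and bridge_deepest_state() are
hypothetical names; the D0..D3 numbering follows PCI power management.

```c
#include <stddef.h>

/* PCI-style device power states: D0 is fully on, D3 is the deepest sleep. */
enum dstate { D0 = 0, D1 = 1, D2 = 2, D3 = 3 };

/* Deepest state a bridge may enter: it must stay at least as awake as its
 * most-awake child, i.e. the minimum D-number among the children.
 * A bridge with no children is unconstrained and may go to D3. */
enum dstate bridge_deepest_state(const enum dstate *children, size_t n)
{
	enum dstate deepest = D3;
	size_t i;

	for (i = 0; i < n; i++)
		if (children[i] < deepest)
			deepest = children[i];
	return deepest;
}
```

With children in D3, D2 and D3, the bridge is limited to D2.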

Ideally, we want some way to reinit all devices. Most should be possible,
with one glaring exception: video. In order to reinit video, we need:

- a framebuffer driver that knows the innards of all the cards it supports

or 

- make something else do it, like X.

The latter seems most plausible, since it knows about most cards. And, for
initialisation, at least on x86, it can run the BIOS routines. 

Of course, that does nothing for you on PPC, but I am hoping something
similar can be accomplished. Can X run the OFW routines in the video ROM?

	-pat



* Re: [RFC] New Driver Model for 2.5
  2001-10-17 23:52 Patrick Mochel
  2001-10-18  6:23 ` Jeff Garzik
@ 2001-10-18 16:08 ` Taral
  2001-10-18 16:52 ` Jonathan Lundell
  2001-10-18 17:05 ` Jonathan Corbet
  3 siblings, 0 replies; 120+ messages in thread
From: Taral @ 2001-10-18 16:08 UTC (permalink / raw)
  To: Patrick Mochel; +Cc: linux-kernel


On Wed, Oct 17, 2001 at 04:52:29PM -0700, Patrick Mochel wrote:
> When a suspend transition is triggered, the device tree is walked first to 
> save the state of all the devices in the system. Once this is complete, the 
> saved state, now residing in memory, can be written to some non-volatile 
> location, like a disk partition or network location. 
> 
> The device tree is then walked again to suspend all of the devices. This 
> guarantees that the device controlling the location to write the state is 
> still powered on while you have a snapshot of the system state. 

Aha! A much nicer solution to the problem the ACPI people are having
with suspend/resume (ordering problems).

-- 
Taral <taral@taral.net>
This message is digitally signed. Please PGP encrypt mail to me.
"Any technology, no matter how primitive, is magic to those who don't
understand it." -- Florence Ambrose



* Re: [RFC] New Driver Model for 2.5
  2001-10-18  6:23 ` Jeff Garzik
  2001-10-18 12:13   ` Benjamin Herrenschmidt
@ 2001-10-18 15:17   ` Patrick Mochel
  1 sibling, 0 replies; 120+ messages in thread
From: Patrick Mochel @ 2001-10-18 15:17 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel, Linus Torvalds




> So, remove() might be called without a shutdown(), and then asked to
> perform the duties normally performed by shutdown()?  That sounds like
> API dain bramage.  :)

:) I cannot disagree. I probably shouldn't have actually stated that in
the document, since the cases in which that could happen are very rare -
hotplug devices that can survive a surprise removal. Those drivers are
special and (should they ever exist) will have to know to do
that. Consider it removed. 

> Your proposal sounds ok, my one objection is separating probe/remove
> further into init/shutdown.  Can you give real-life cases where this
> will be useful?  I don't see it causing much except headache.
> 
> The preferred way of doing things (IMHO) is to do some simple sanity
> checking of the h/w device at probe time, and then perform lots of
> initialization and such at device/interface open time.  You ideally want
> a device driver lifecycle to look like
> 
> probe:
> 	register interface
> 	sanity check h/w to make sure it's there and alive
> 	stop DMA/interrupts/etc., just in case
> 	start timer to powerdown h/w in N seconds
> 
> dev_open:
> 	wake up device, if necessary
> 	init device
> 
> dev_close:
> 	stop DMA/interrupts/etc.
> 	start timer to powerdown h/w in N seconds
> 
> With that in mind, init -really- happens at device open, and in
> additional is driven more through normal user interaction via standard
> APIs, than the PCI and PM subsystems.

I agree. My main goal was to change probe() to be a simple answer to the
question "Hey, are you there?", and move the init features out of it.
In devices that support power management, that would happen anyway, so
that resume() could re-init the device.

I will update the code and the document to note that. 

Thanks,

	-pat


^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-18  6:23 ` Jeff Garzik
@ 2001-10-18 12:13   ` Benjamin Herrenschmidt
  2001-10-18 16:19     ` Patrick Mochel
                       ` (4 more replies)
  2001-10-18 15:17   ` Patrick Mochel
  1 sibling, 5 replies; 120+ messages in thread
From: Benjamin Herrenschmidt @ 2001-10-18 12:13 UTC (permalink / raw)
  To: Patrick Mochel; +Cc: Jeff Garzik, linux-kernel, Linus Torvalds

>> 
>> struct device {
>>         struct list_head        bus_list;
>>         struct io_bus           *parent;
>>         struct io_bus           *subordinate;
>> 
>>         char                    name[DEVICE_NAME_SIZE];
>>         char                    bus_id[BUS_ID_SIZE];
>> 
>>         struct dentry           *dentry;
>>         struct list_head        files;
>> 
>>         struct  semaphore       lock;
>> 
>>         struct device_driver    *driver;
>>         void                    *driver_data;
>>         void                    *platform_data;
>> 
>>         u32                     current_state;
>>         unsigned char           *saved_state;
>> };

Hi Patrick ! Nice to see this happening ;)

I would add to the generic structure device, a "uuid" string field.
This field would contain a "munged" unique identifier composed of
the bus type followed by whatever bus-specific unique ID is
provided by the driver. If the driver doesn't provide one, it defaults
to a copy of the bus ID.

What I have in mind here is to have a common place to look for the
best possible unique identification for a device. Typical examples are
ieee1394 hard disks, which do have a unique ID and so can be properly
tracked between insertion and removal.
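
A minimal sketch of how such a field could be filled in, assuming a
"bustype:id" layout. The function name make_uuid, the UUID_SIZE constant
and the ':' separator are illustrative assumptions, not part of the
proposal:

```c
#include <stdio.h>
#include <string.h>

#define UUID_SIZE 64    /* hypothetical buffer size */

/*
 * Build "<bus type>:<unique id>" into out. If the driver supplies no
 * unique ID (NULL or empty string), fall back to a copy of the bus ID.
 * Returns 0 on success, -1 if the result did not fit in the buffer.
 */
int make_uuid(char *out, size_t len, const char *bus_type,
              const char *driver_id, const char *bus_id)
{
        const char *id = (driver_id && driver_id[0]) ? driver_id : bus_id;
        int n = snprintf(out, len, "%s:%s", bus_type, id);

        return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With this layout a device without a driver-provided ID simply ends up
with its bus type and bus ID, so userland always has one place to look.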

Also, I'd like to see a simple ability for the arch code to add
entries to the exposed device filesystem nodes. The main reason for
this is that on machines like PPC with OpenFirmware or Sun with
OpenBoot, it makes a lot of sense (and is very useful for bootloader
configuration among others) to be able to know the firmware "path"
corresponding to a given device. On PPC, the generic PCI code can
do the conversion between an Open Firmware device node and a PCI
device in the kernel, but doing so from userland is a lot more
tricky. The device filesystem is a very good way to fix that problem
once and for all.

>The preferred way of doing things (IMHO) is to do some simply sanity
>checking of the h/w device at probe time, and then perform lots of
>initialization and such at device/interface open time.  You ideally want
>a device driver lifecycle to look like
>
>probe:
>	register interface
>	sanity check h/w to make sure it's there and alive
>	stop DMA/interrupts/etc., just in case
>	start timer to powerdown h/w in N seconds
>
>dev_open:
>	wake up device, if necessary
>	init device
>
>dev_close:
>	stop DMA/interrupts/etc.

I completely agree there as well. In some cases, the suspend (or powerdown)
of the device can even be triggered on an open device with an idle timer.
Good candidates are hard disks and sound (which is often kept open
all the time by the userland mixer).

However, there is another important point about power management I
discovered the "hard way", which is memory allocation vs. turning
off of swapping devices (that is the swap device itself or any device
on which you may have mmap'ed files).

For "transient" power management (that is dynamically putting a
subsystem to sleep when idle until it gets a new request), there
is no real problem provided that the driver can do the wakeup without
allocating memory.

For system power sleep, where you actually shut down everything,
the problem happens when you start shutting down those "swap" devices.
Once done, you may be in a situation where another device, to be shut
down or to wake up properly, needs to allocate memory (see for example
USB devices that need to allocate urb's). This may cause requests
to swap_out which will block indefinitely if trying to swap out
pages to an already sleeping device.

I "work around" this in the PowerBook sleep code in a somewhat dumb way
that works in 99% of cases but is probably broken as well if you
are really near oom. Basically, instead of calling only the "suspend"
callbacks of devices, I have an additional "suspend requested"
one that is sent to every driver using my specific PM scheme _before_
starting the real round of "suspend" callbacks. Drivers that need 
a significant amount of backup memory (like some framebuffers) will
do the necessary allocations from this early callback.
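
The two-pass scheme described above can be sketched roughly as follows.
This is only an illustration: struct pm_client, its hook names and the
demo driver are hypothetical, and a fuller version would also tell
drivers to drop their reservations when the first pass fails.

```c
#include <stddef.h>

/* Hypothetical per-driver hooks; the names are illustrative only. */
struct pm_client {
        int (*suspend_request)(struct pm_client *c); /* may allocate memory */
        int (*suspend)(struct pm_client *c);         /* must not allocate */
        int (*resume)(struct pm_client *c);
};

/*
 * Pass 1 lets every driver reserve what it needs while the swap devices
 * are still awake. Pass 2 performs the real suspend. If a driver fails
 * in pass 2, the ones already suspended are resumed in reverse order.
 */
int system_suspend(struct pm_client **clients, size_t n)
{
        size_t i;

        for (i = 0; i < n; i++)
                if (clients[i]->suspend_request &&
                    clients[i]->suspend_request(clients[i]) != 0)
                        return -1;      /* nothing is suspended yet */

        for (i = 0; i < n; i++) {
                if (clients[i]->suspend(clients[i]) != 0) {
                        while (i-- > 0)
                                clients[i]->resume(clients[i]);
                        return -1;
                }
        }
        return 0;
}

/* Demo driver: it can only sleep if it reserved its backup memory first. */
struct demo_dev {
        struct pm_client pm;    /* first member, so the casts below are valid */
        int reserved;
        int asleep;
};

int demo_request(struct pm_client *c)
{
        ((struct demo_dev *)c)->reserved = 1;
        return 0;
}

int demo_suspend(struct pm_client *c)
{
        struct demo_dev *d = (struct demo_dev *)c;

        if (!d->reserved)
                return -1;
        d->asleep = 1;
        return 0;
}

int demo_resume(struct pm_client *c)
{
        ((struct demo_dev *)c)->asleep = 0;
        return 0;
}
```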

Another issue with suspend and resume is with interrupt sharing and
some bad devices that unconditionally assert their interrupt line
when put into any PM state. Conversely, some drivers, in order
to properly block any new request in their queues and wait for any
pending one to complete, may need to operate with interrupts still
enabled. I discussed that a bit with Alan, and it seems that we really
need 2 rounds of "suspend" callbacks in this case (at least for
system suspend), one with interrupts still enabled, one with interrupts
disabled.

Finally, I have another need that I'm not sure how to handle
with either the current scheme or the new scheme. On "desktop"
Apple systems (at least all the recent G4 ones), the PCI bus will
be effectively powered down during system sleep. That means that
we must (at least that's what both MacOS and MacOS X do) prevent
the complete system sleep when at least one PCI slot contains a
card for which the driver can't properly restore the state after
a complete shutdown. This frequently happens, for example, with
video cards that rely on some initial chip & pll configuration
to be done by the firmware. We may be able to fall back to some
kind of "light suspend" where we suspend any device we can but
not the motherboard, but that means that the "main" PM code has
to know about the problem and needs some way to know if a given
node in the device tree can or cannot be revived from a given
power state (in this case, we might consider being powered down
as equivalent to D3 state). My current solution is to not allow
system sleep at all on those desktop machines.

Regards,
Ben.



^ permalink raw reply	[flat|nested] 120+ messages in thread

* Re: [RFC] New Driver Model for 2.5
  2001-10-17 23:52 Patrick Mochel
@ 2001-10-18  6:23 ` Jeff Garzik
  2001-10-18 12:13   ` Benjamin Herrenschmidt
  2001-10-18 15:17   ` Patrick Mochel
  2001-10-18 16:08 ` Taral
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 120+ messages in thread
From: Jeff Garzik @ 2001-10-18  6:23 UTC (permalink / raw)
  To: Patrick Mochel; +Cc: linux-kernel, Linus Torvalds

Patrick Mochel wrote:
> 
> One July afternoon, while hacking on the pm_dev layer for the purpose of
> system-wide power management support, I decided that I was quite tired of
> trying to make this layer look like a tree and feel like a tree, but not
> have any real integration with the actual device drivers..
> 
> I had read the accounts of what the goals were for 2.5. And, after some
> conversations with Linus and the (gasp) ACPI guys, I realized that I had a
> good chunk of the infrastructural code written; it was a matter of working
> out a few crucial details and massaging it in nicely.
> 
> I have had the chance this week (after moving and vacationing) to update
> the (read: write some) documentation for it. I will not go into details,
> and will let the document speak for itself.
> 
> With all luck, this should go into the early stages of 2.5, and allow a
> significant cleanup of many drivers. Such a model will also allow for neat
> tricks like full device power management support, and Plug N Play
> capabilities.
> 
> In order to support the new driver model, I have written a small in-memory
> filesystem, called ddfs, to export a unified interface to userland. It is
> mentioned in the doc, and is pretty self-explanatory. More information
> will be available soon.
> 
> There is code available for the model and ddfs at:
> 
> http://kernel.org/pub/linux/kernel/people/mochel/device/
> 
> but there are some fairly large caveats concerning it.
> 
> First, I feel comfortable with the device layer code and the ddfs
> code. Though, the PCI code is still work in progress. I am still working
> out some of the finer details concerning it.
> 
> Next is the environment under which I developed it all. It was on an ia32
> box, with only PCI support, and using ACPI. The latter didn't have too
> much of an effect on the development, but there are a few items explicitly
> inspired by it..
> 
> I am hoping both the PCI code, and the structure and in general can be
> further improved based on the input of the driver maintainers.
> 
> This model is not final, and may be way off from what most people actually
> want. It has gotten tentative blessing from all those that have seen it,
> though they number but a few. It's definitely not the only solution...
> 
> That said, enjoy; and have at it.
> 
>         -pat
> 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> The (New) Linux Kernel Driver Model
> 
> Version 0.01
> 
> 17 October 2001
> 
> Overview
> ~~~~~~~~
> 
> This driver model is a unification of all the current, disparate driver models
> that are currently in the kernel. It is intended is to augment the
> bus-specific drivers for bridges and devices by consolidating a set of data
> and operations into globally accessible data structures.
> 
> Current driver models implement some sort of tree-like structure (sometimes
> just a list) for the devices they control. But, there is no linkage between
> the different bus types.
> 
> A common data structure can provide this linkage with little overhead: when a
> bus driver discovers a particular device, it can insert it into the global
> tree as well as its local tree. In fact, the local tree becomes just a subset
> of the global tree.
> 
> Common data fields can also be moved out of the local bus models into the
> global model. Some of the manipulation of these fields can also be
> consolidated. Most likely, manipulation functions will become a set
> of helper functions, which the bus drivers wrap around to include any
> bus-specific items.
> 
> The common device and bridge interface currently reflects the goals of the
> modern PC: namely the ability to do seamless Plug and Play, power management,
> and hot plug. (The model dictated by Intel and Microsoft (read: ACPI) ensures
> us that any device in the system may fit any of these criteria.)
> 
> In reality, not every bus will be able to support such operations. But, most
> buses will support a majority of those operations, and all future buses will.
> In other words, a bus that doesn't support an operation is the exception,
> instead of the other way around.
> 
> Drivers
> ~~~~~~~
> 
> The callbacks for bridges and devices are intended to be singular for a
> particular type of bus. For each type of bus that has support compiled in the
> kernel, there should be one statically allocated structure with the
> appropriate callbacks that each device (or bridge) of that type share.
> 
> Each bus layer should implement the callbacks for these drivers. It then
> forwards the calls on to the device-specific callbacks. This means that
> device-specific drivers must still implement callbacks for each operation.
> But, they are not called from the top level driver layer.
> 
> This does add another layer of indirection for calling one of these functions,
> but there are benefits that are believed to outweigh this slowdown.
> 
> First, it prevents device-specific drivers from having to know about the
> global device layer. This speeds up integration time incredibly. It also
> allows drivers to be more portable across kernel versions. Note that the
> former was intentional, the latter is an added bonus.
> 
> Second, this added indirection allows the bus to perform any additional logic
> necessary for its child devices. A bus layer may add additional information to
> the call, or translate it into something meaningful for its children.
> 
> This could be done in the driver, but if it happens for every object of a
> particular type, it is best done at a higher level.
> 
> Recap
> ~~~~~
> 
> Instances of devices and bridges are allocated dynamically as the system
> discovers their existence. Their fields describe the individual object.
> Drivers - in the global sense - are statically allocated and singular for a
> particular type of bus. They describe a set of operations that every type of
> bus could implement, the implementation following the bus's semantics.
> 
> Downstream Access
> ~~~~~~~~~~~~~~~~~
> 
> Common data fields have been moved out of individual bus layers into a common
> data structure. But, these fields must still be accessed by the bus layers,
> and
> sometimes by the device-specific drivers.
> 
> Other bus layers are encouraged to do what has been done for the PCI layer.
> struct pci_dev now looks like this:
> 
> struct pci_dev {
>         ...
> 
>         struct device device;
> };
> 
> Note first that it is statically allocated. This means only one allocation on
> device discovery. Note also that it is at the _end_ of struct pci_dev. This is
> to make people think about what they're doing when switching between the bus
> driver and the global driver; and to prevent against mindless casts between
> the two.
> 
> The PCI bus layer freely accesses the fields of struct device. It knows about
> the structure of struct pci_dev, and it should know the structure of struct
> device. PCI devices that have been converted generally do not touch the fields
> of struct device. More precisely, device-specific drivers should not touch
> fields of struct device unless there is a strong compelling reason to do so.
> 
> This abstraction is prevention of unnecessary pain during transitional phases.
> If the name of the field changes or is removed, then every downstream driver
> will break. On the other hand, if only the bus layer (and not the device
> layer) accesses struct device, it is only those that need to change.
> 
> User Interface
> ~~~~~~~~~~~~~~
> 
> By virtue of having a complete hierarchical view of all the devices in the
> system, exporting a complete hierarchical view to userspace becomes relatively
> easy. Whenever a device is inserted into the tree, a file or directory can be
> created for it.
> 
> In this model, a directory is created for each bridge and each device. When it
> is created, it is populated with a set of default files, first at the global
> layer, then at the bus layer. The device layer may then add its own files.
> 
> These files export data about the driver and can be used to modify behavior of
> the driver or even device.
> 
> For example, at the global layer, a file named 'status' is created for each
> device. When read, it reports to the user the name of the device, its bus ID,
> its current power state, and the name of the driver its using.
> 
> By writing to this file, you can have control over the device. By writing
> "suspend 3" to this file, one could place the device into power state "3".
> Basically, by writing to this file, the user has access to the operations
> defined in struct device_driver.
> 
> The PCI layer also adds default files. For devices, it adds a "resource" file
> and a "wake" file. The former reports the BAR information for the device; the
> latter reports the wake capabilities of the device.
> 
> The device layer could also add files for device-specific data reporting and
> control.
> 
> The dentry to the device's directory is kept in struct device. It also keeps a
> linked list of all the files in the directory, with pointers to their read and
> write callbacks. This allows the driver layer to maintain full control of its
> destiny. If it desired to override the default behavior of a file, or simply
> remove it, it could easily do so. (It is assumed that the files added upstream
> will always be a known quantity.)
> 
> These features were initially implemented using procfs. However, after one
> conversation with Linus, a new filesystem - ddfs - was created to implement
> these features. It is an in-memory filesystem, based heavily off of ramfs,
> though it uses procfs as inspiration for its callback functionality.
> 
> Device Structures
> ~~~~~~~~~~~~~~~~~
> 
> struct device {
>         struct list_head        bus_list;
>         struct io_bus           *parent;
>         struct io_bus           *subordinate;
> 
>         char                    name[DEVICE_NAME_SIZE];
>         char                    bus_id[BUS_ID_SIZE];
> 
>         struct dentry           *dentry;
>         struct list_head        files;
> 
>         struct  semaphore       lock;
> 
>         struct device_driver    *driver;
>         void                    *driver_data;
>         void                    *platform_data;
> 
>         u32                     current_state;
>         unsigned char           *saved_state;
> };
> 
> bus_list:
>         List of all devices on a particular bus; i.e. the device's siblings
> 
> parent:
>         The parent bridge for the device.
> 
> subordinate:
>         If the device is a bridge itself, this points to the struct io_bus that is
>         created for it.
> 
> name:
>         Human readable (descriptive) name of device. E.g. "Intel EEPro 100"
> 
> bus_id:
>         Parsable (yet ASCII) bus id. E.g. "00:04.00" (PCI Bus 0, Device 4, Function
>         0). It is necessary to have a searchable bus id for each device; making it
>         ASCII allows us to use it for its directory name without translating it.
> 
> dentry:
>         Pointer to driver's ddfs directory.
> 
> files:
>         Linked list of all the files that a driver has in its ddfs directory.
> 
> lock:
>         Driver specific lock.
> 
> driver:
>         Pointer to a struct device_driver, the common operations for each device. See
>         next section.
> 
> driver_data:
>         Private data for the driver.
>         Much like the PCI implementation of this field, this allows device-specific
>         drivers to keep a pointer to a device-specific data.
> 
> platform_data:
>         Data that the platform (firmware) provides about the device.
>         For example, the ACPI BIOS or EFI may have additional information about the
>         device that is not directly mappable to any existing kernel data structure.
>         It also allows the platform driver (e.g. ACPI) to a driver without the driver
>         having to have explicit knowledge of (atrocities like) ACPI.
> 
> current_state:
>         Current power state of the device. For PCI and other modern devices, this is
>         0-3, though it's not necessarily limited to those values.
> 
> saved_state:
>         Pointer to driver-specific set of saved state.
>         Having it here allows modules to be unloaded on system suspend and reloaded
>         on resume and maintain state across transitions.
>         It also allows generic drivers to maintain state across system state
>         transitions.
>         (I've implemented a generic PCI driver for devices that don't have a
>         device-specific driver. Instead of managing some vector of saved state
>         for each device the generic driver supports, it can simply store it here.)
> 
> struct device_driver {
>         int     (*probe)        (struct device *dev);
>         int     (*remove)       (struct device *dev);
> 
>         int     (*init)         (struct device *dev);
>         int     (*shutdown)     (struct device *dev);
> 
>         int     (*save_state)   (struct device *dev, u32 state);
>         int     (*restore_state)(struct device *dev);
> 
>         int     (*suspend)      (struct device *dev, u32 state);
>         int     (*resume)       (struct device *dev);
> }
> 
> probe:
>         Check for device existence and associate driver with it.
> 
> remove:
>         Dissociate driver with device. Releases device so that it could be used by
>         another driver. Also, if it is a hotplug device (hotplug PCI, Cardbus), an
>         ejection event could take place here.
> 
> init:
>         Initialise the device - allocate resources, irqs, etc.
> 
> shutdown:
>         "De-initialise" the device - release resources, free memory, etc.
> 
> save_state:
>         Save current device state before entering suspend state.
> 
> restore_state:
>         Restore device state, after coming back from suspend state.
> 
> suspend:
>         Physically enter suspend state.
> 
> resume:
>         Physically leave suspend state and re-initialise hardware.
> 
> Initially, the probe/remove sequence followed the PCI semantics exactly, but
> have since been broken up into a four-stage process: probe(), remove(),
> init(), and shutdown().
> 
> While it's not entirely necessary in all environments, breaking them up so
> each routine does only one thing makes sense.
> 
> Hot-pluggable devices may also benefit from this model, especially ones that
> can be subjected to suprise removals - only the remove function would be
> called, and the driver could easily know if the there was still hardware there
> to shutdown.
> 
> Drivers that are controlling failing, or buggy, hardware, by allowing the user
> to trigger a removal of the driver from userspace, without trying to shutdown
> down the device.
> 
> In each case that remove() is called without a shutdown(), it's important to
> note that resources will still need to be freed; it's only the hardware that
> cannot be assumed to be present.

So, remove() might be called without a shutdown(), and then asked to
perform the duties normally performed by shutdown()?  That sounds like
API dain bramage.  :)

Your proposal sounds ok, my one objection is separating probe/remove
further into init/shutdown.  Can you give real-life cases where this
will be useful?  I don't see it causing much except headache.

The preferred way of doing things (IMHO) is to do some simple sanity
checking of the h/w device at probe time, and then perform lots of
initialization and such at device/interface open time.  You ideally want
a device driver lifecycle to look like

probe:
	register interface
	sanity check h/w to make sure it's there and alive
	stop DMA/interrupts/etc., just in case
	start timer to powerdown h/w in N seconds

dev_open:
	wake up device, if necessary
	init device

dev_close:
	stop DMA/interrupts/etc.
	start timer to powerdown h/w in N seconds

With that in mind, init -really- happens at device open, and in
addition is driven more through normal user interaction via standard
APIs than through the PCI and PM subsystems.

-- 
Jeff Garzik      | "Mind if I drive?" -Sam
Building 1024    | "Not if you don't mind me clawing at the dash
MandrakeSoft     |  and shrieking like a cheerleader." -Max

^ permalink raw reply	[flat|nested] 120+ messages in thread

* [RFC] New Driver Model for 2.5
@ 2001-10-17 23:52 Patrick Mochel
  2001-10-18  6:23 ` Jeff Garzik
                   ` (3 more replies)
  0 siblings, 4 replies; 120+ messages in thread
From: Patrick Mochel @ 2001-10-17 23:52 UTC (permalink / raw)
  To: linux-kernel


One July afternoon, while hacking on the pm_dev layer for the purpose of
system-wide power management support, I decided that I was quite tired of
trying to make this layer look like a tree and feel like a tree, but not
have any real integration with the actual device drivers.

I had read the accounts of what the goals were for 2.5. And, after some
conversations with Linus and the (gasp) ACPI guys, I realized that I had a
good chunk of the infrastructural code written; it was a matter of working
out a few crucial details and massaging it in nicely.

I have had the chance this week (after moving and vacationing) to update
the (read: write some) documentation for it. I will not go into details,
and will let the document speak for itself. 

With any luck, this should go into the early stages of 2.5, and allow a
significant cleanup of many drivers. Such a model will also allow for neat
tricks like full device power management support, and Plug N Play
capabilities.


In order to support the new driver model, I have written a small in-memory
filesystem, called ddfs, to export a unified interface to userland. It is
mentioned in the doc, and is pretty self-explanatory. More information
will be available soon.


There is code available for the model and ddfs at:

http://kernel.org/pub/linux/kernel/people/mochel/device/

but there are some fairly large caveats concerning it. 

First, I feel comfortable with the device layer code and the ddfs
code, though the PCI code is still a work in progress. I am still working
out some of the finer details concerning it.

Next is the environment under which I developed it all. It was on an ia32
box, with only PCI support, and using ACPI. The latter didn't have too
much of an effect on the development, but there are a few items explicitly
inspired by it.

I am hoping both the PCI code and the structure in general can be
further improved based on the input of the driver maintainers.


This model is not final, and may be way off from what most people actually
want. It has gotten tentative blessing from all those that have seen it,
though they number but a few. It's definitely not the only solution...

That said, enjoy; and have at it.

	-pat


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The (New) Linux Kernel Driver Model

Version 0.01 

17 October 2001 


Overview
~~~~~~~~

This driver model is a unification of all the disparate driver models
that are currently in the kernel. It is intended to augment the
bus-specific drivers for bridges and devices by consolidating a set of data
and operations into globally accessible data structures.

Current driver models implement some sort of tree-like structure (sometimes 
just a list) for the devices they control. But, there is no linkage between 
the different bus types. 

A common data structure can provide this linkage with little overhead: when a 
bus driver discovers a particular device, it can insert it into the global 
tree as well as its local tree. In fact, the local tree becomes just a subset 
of the global tree. 

Common data fields can also be moved out of the local bus models into the 
global model. Some of the manipulation of these fields can also be 
consolidated. Most likely, manipulation functions will become a set 
of helper functions, which the bus drivers wrap around to include any 
bus-specific items.

The common device and bridge interface currently reflects the goals of the 
modern PC: namely the ability to do seamless Plug and Play, power management, 
and hot plug. (The model dictated by Intel and Microsoft (read: ACPI) assures
us that any device in the system may fit any of these criteria.)

In reality, not every bus will be able to support such operations. But, most 
buses will support a majority of those operations, and all future buses will. 
In other words, a bus that doesn't support an operation is the exception, 
instead of the other way around.


Drivers
~~~~~~~

The callbacks for bridges and devices are intended to be singular for a 
particular type of bus. For each type of bus that has support compiled in the 
kernel, there should be one statically allocated structure with the 
appropriate callbacks that each device (or bridge) of that type share. 

Each bus layer should implement the callbacks for these drivers. It then 
forwards the calls on to the device-specific callbacks. This means that 
device-specific drivers must still implement callbacks for each operation. 
But, they are not called from the top level driver layer.

This does add another layer of indirection for calling one of these functions, 
but there are benefits that are believed to outweigh this slowdown.

First, it prevents device-specific drivers from having to know about the 
global device layer. This speeds up integration time incredibly. It also 
allows drivers to be more portable across kernel versions. Note that the 
former was intentional, the latter is an added bonus. 

Second, this added indirection allows the bus to perform any additional logic 
necessary for its child devices. A bus layer may add additional information to 
the call, or translate it into something meaningful for its children. 

This could be done in the driver, but if it happens for every object of a 
particular type, it is best done at a higher level. 

Recap
~~~~~

Instances of devices and bridges are allocated dynamically as the system 
discovers their existence. Their fields describe the individual object. 
Drivers - in the global sense - are statically allocated and singular for a 
particular type of bus. They describe a set of operations that every type of 
bus could implement, the implementation following the bus's semantics.


Downstream Access
~~~~~~~~~~~~~~~~~

Common data fields have been moved out of individual bus layers into a common 
data structure. But, these fields must still be accessed by the bus layers,
and sometimes by the device-specific drivers.

Other bus layers are encouraged to do what has been done for the PCI layer. 
struct pci_dev now looks like this:

struct pci_dev {
	...

	struct device device;
};

Note first that it is embedded in struct pci_dev rather than pointed to. This
means only one allocation on device discovery. Note also that it is at the
_end_ of struct pci_dev. This is to make people think about what they're doing
when switching between the bus driver and the global driver, and to guard
against mindless casts between the two.

The PCI bus layer freely accesses the fields of struct device. It knows about 
the structure of struct pci_dev, and it should know the structure of struct 
device. PCI devices that have been converted generally do not touch the fields 
of struct device. More precisely, device-specific drivers should not touch 
fields of struct device unless there is a strong compelling reason to do so.

This abstraction prevents unnecessary pain during transitional phases.
If a field is renamed or removed, then every downstream driver that touches
it will break. On the other hand, if only the bus layers (and not the
device-specific drivers) access struct device, only the bus layers need to
change.
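
A rough sketch of how a bus layer might get from the global struct device
back to its own structure and forward a global callback to the
device-specific driver. The structures are cut down to a few fields, and
to_pci_dev, pci_bus_suspend, the drv field and the demo driver are
assumptions made for illustration, not code from the patch:

```c
#include <stddef.h>

struct device;

/* Two of the proposed global callbacks; the rest are omitted here. */
struct device_driver {
        int (*suspend)(struct device *dev, unsigned int state);
        int (*resume)(struct device *dev);
};

struct device {
        struct device_driver *driver;
        unsigned int current_state;
};

struct pci_dev;

/* Hypothetical device-specific operations, as the PCI layer sees them. */
struct pci_driver {
        int (*suspend)(struct pci_dev *pdev, unsigned int state);
};

struct pci_dev {
        unsigned short vendor, devfn;   /* stand-ins for the elided fields */
        struct pci_driver *drv;         /* assumed field name */
        struct device device;           /* embedded, at the end */
};

/* Recover the enclosing pci_dev from its embedded struct device. */
#define to_pci_dev(d) \
        ((struct pci_dev *)((char *)(d) - offsetof(struct pci_dev, device)))

/* Bus-layer callback: translate the global call and forward it. */
int pci_bus_suspend(struct device *dev, unsigned int state)
{
        struct pci_dev *pdev = to_pci_dev(dev);
        int err = 0;

        if (pdev->drv && pdev->drv->suspend)
                err = pdev->drv->suspend(pdev, state);
        if (!err)
                dev->current_state = state;     /* only the bus layer writes this */
        return err;
}

/* One statically allocated structure shared by every PCI device. */
struct device_driver pci_device_driver = {
        .suspend = pci_bus_suspend,
};

/* Demo device-specific driver: it never sees struct device at all. */
int demo_nic_suspend(struct pci_dev *pdev, unsigned int state)
{
        (void)pdev;
        return state > 3 ? -1 : 0;      /* accepts power states 0-3 only */
}

struct pci_driver demo_nic_driver = { .suspend = demo_nic_suspend };
```

Because the embedded structure is not the first member, a plain cast from
struct device * to struct pci_dev * would be wrong; the offset-based
conversion makes the switch between the two layers explicit.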


User Interface
~~~~~~~~~~~~~~

By virtue of having a complete hierarchical view of all the devices in the 
system, exporting a complete hierarchical view to userspace becomes relatively 
easy. Whenever a device is inserted into the tree, a file or directory can be 
created for it.

In this model, a directory is created for each bridge and each device. When it 
is created, it is populated with a set of default files, first at the global 
layer, then at the bus layer. The device layer may then add its own files. 

These files export data about the driver and can be used to modify behavior of 
the driver or even device.

For example, at the global layer, a file named 'status' is created for each 
device. When read, it reports to the user the name of the device, its bus ID, 
its current power state, and the name of the driver it's using.

By writing to this file, you can have control over the device. By writing 
"suspend 3" to this file, one could place the device into power state "3". 
Basically, by writing to this file, the user has access to the operations 
defined in struct device_driver. 
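
A minimal sketch of how a write to 'status' might be turned into a driver
operation follows. Only the "suspend 3" command comes from the text above; the
handler name status_write(), the "resume" command, the return values and the
toy callbacks are assumptions for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef unsigned int u32;

struct device;

/* Trimmed to the two operations this example dispatches to. */
struct device_driver {
	int (*suspend)(struct device *dev, u32 state);
	int (*resume)(struct device *dev);
};

struct device {
	struct device_driver *driver;
	u32 current_state;
};

/* Hypothetical write handler for the 'status' file.
 * Returns the driver's result, or -1 for an unknown or malformed command. */
static int status_write(struct device *dev, const char *buf)
{
	char cmd[16];
	unsigned int state = 0;
	int n;

	assert(dev != NULL && buf != NULL);
	if (!dev->driver)
		return -1;

	/* n is the number of fields parsed: the command word, then a state. */
	n = sscanf(buf, "%15s %u", cmd, &state);
	if (n == 2 && strcmp(cmd, "suspend") == 0 && dev->driver->suspend)
		return dev->driver->suspend(dev, state);
	if (n == 1 && strcmp(cmd, "resume") == 0 && dev->driver->resume)
		return dev->driver->resume(dev);
	return -1;
}

/* Toy driver callbacks that only track the power state. */
static int demo_suspend(struct device *dev, u32 state)
{
	dev->current_state = state;
	return 0;
}

static int demo_resume(struct device *dev)
{
	dev->current_state = 0;
	return 0;
}
```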

The PCI layer also adds default files. For devices, it adds a "resource" file 
and a "wake" file. The former reports the BAR information for the device; the 
latter reports the wake capabilities of the device. 

The device layer could also add files for device-specific data reporting and 
control. 

The dentry for the device's directory is kept in struct device. It also keeps 
a linked list of all the files in the directory, with pointers to their read 
and write callbacks. This allows the driver layer to maintain full control of 
its destiny. If it wants to override the default behavior of a file, or simply 
remove it, it can easily do so. (It is assumed that the files added upstream 
will always be a known quantity.)
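
To make the override/remove idea concrete, here is a small sketch of a
per-device file list with read and write callbacks. The struct dev_file record
and the find_file(), override_read() and remove_file() helpers are hypothetical
names, and a singly linked list stands in for the list_head the real structure
uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct device;

/* Hypothetical per-file record; the text above does not name this structure. */
struct dev_file {
	const char *name;
	int (*read)(struct device *dev, char *buf, size_t len);
	int (*write)(struct device *dev, const char *buf, size_t len);
	struct dev_file *next;
};

struct device {
	struct dev_file *files;	/* simplified: singly linked, not a list_head */
};

/* Look a file up by name; returns NULL if the directory has no such file. */
static struct dev_file *find_file(struct device *dev, const char *name)
{
	struct dev_file *f;

	for (f = dev->files; f; f = f->next)
		if (strcmp(f->name, name) == 0)
			return f;
	return NULL;
}

/* Replace the read callback that an upper layer installed. */
static int override_read(struct device *dev, const char *name,
			 int (*read)(struct device *, char *, size_t))
{
	struct dev_file *f = find_file(dev, name);

	if (!f)
		return -1;
	f->read = read;
	return 0;
}

/* Unlink a file from the device's list; returns -1 if it was not there. */
static int remove_file(struct device *dev, const char *name)
{
	struct dev_file **pp;

	for (pp = &dev->files; *pp; pp = &(*pp)->next) {
		if (strcmp((*pp)->name, name) == 0) {
			*pp = (*pp)->next;
			return 0;
		}
	}
	return -1;
}

/* Two toy read callbacks so the override is observable. */
static int default_read(struct device *dev, char *buf, size_t len)
{
	(void)dev;
	return snprintf(buf, len, "default");
}

static int custom_read(struct device *dev, char *buf, size_t len)
{
	(void)dev;
	return snprintf(buf, len, "custom");
}
```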

These features were initially implemented using procfs. However, after one 
conversation with Linus, a new filesystem - ddfs - was created to implement 
these features. It is an in-memory filesystem, based heavily on ramfs, 
though it uses procfs as inspiration for its callback functionality.


Device Structures
~~~~~~~~~~~~~~~~~

struct device {
	struct list_head 	bus_list;
	struct io_bus   	*parent;
	struct io_bus   	*subordinate;

	char    		name[DEVICE_NAME_SIZE];
	char    		bus_id[BUS_ID_SIZE];

	struct dentry   	*dentry;
	struct list_head        files;

	struct 	semaphore       lock;

	struct device_driver 	*driver;
	void            	*driver_data;
	void    		*platform_data;

	u32             	current_state;
	unsigned char 		*saved_state;
};

bus_list: 
	List of all devices on a particular bus; i.e. the device's siblings

parent:
	The parent bridge for the device.

subordinate:
	If the device is a bridge itself, this points to the struct io_bus that is
	created for it.

name:
	Human readable (descriptive) name of device. E.g. "Intel EEPro 100"

bus_id:
	Parsable (yet ASCII) bus id. E.g. "00:04.00" (PCI Bus 0, Device 4, Function
	0). It is necessary to have a searchable bus id for each device; making it
	ASCII allows us to use it for its directory name without translating it.

dentry:
	Pointer to the device's ddfs directory.

files:
	Linked list of all the files that a device has in its ddfs directory.

lock:
	Driver specific lock.

driver:
	Pointer to a struct device_driver, the common operations for each device. See
	next section.

driver_data:
	Private data for the driver.
	Much like the PCI implementation of this field, this allows device-specific
	drivers to keep a pointer to device-specific data.

platform_data:
	Data that the platform (firmware) provides about the device.
	For example, the ACPI BIOS or EFI may have additional information about the
	device that is not directly mappable to any existing kernel data structure.
	It also allows the platform driver (e.g. ACPI) to pass data to a driver
	without the driver having to have explicit knowledge of (atrocities like)
	ACPI.


current_state:
	Current power state of the device. For PCI and other modern devices, this is
	0-3, though it's not necessarily limited to those values.

saved_state:
	Pointer to driver-specific set of saved state. 
	Having it here allows modules to be unloaded on system suspend and reloaded
	on resume and maintain state across transitions.
	It also allows generic drivers to maintain state across system state
	transitions. 
	(I've implemented a generic PCI driver for devices that don't have a
	device-specific driver. Instead of managing some vector of saved state
	for each device the generic driver supports, it can simply store it here.)



struct device_driver {
        int     (*probe)        (struct device *dev);
        int     (*remove)       (struct device *dev);

        int     (*init)		(struct device *dev);
        int     (*shutdown)	(struct device *dev);

        int     (*save_state)   (struct device *dev, u32 state);
        int     (*restore_state)(struct device *dev);

        int     (*suspend)      (struct device *dev, u32 state);
        int     (*resume)       (struct device *dev);
};


probe:
	Check for device existence and associate driver with it. 

remove:
	Dissociate driver from device. Releases device so that it could be used by
	another driver. Also, if it is a hotplug device (hotplug PCI, Cardbus), an 
	ejection event could take place here.

init:
	Initialise the device - allocate resources, irqs, etc. 

shutdown:
	"De-initialise" the device - release resources, free memory, etc.

save_state:
	Save current device state before entering suspend state.

restore_state:
	Restore device state, after coming back from suspend state.

suspend:
	Physically enter suspend state.

resume:
	Physically leave suspend state and re-initialise hardware.


Initially, the probe/remove sequence followed the PCI semantics exactly, but 
it has since been broken up into a four-stage process: probe(), remove(), 
init(), and shutdown().

While it's not entirely necessary in all environments, breaking them up so 
each routine does only one thing makes sense. 

Hot-pluggable devices may also benefit from this model, especially ones that 
can be subjected to surprise removals - only the remove function would be 
called, and the driver could easily know if there was still hardware there 
to shut down. 

The same goes for drivers that are controlling failing, or buggy, hardware: 
the user can trigger a removal of the driver from userspace, without trying 
to shut down the device. 

In each case that remove() is called without a shutdown(), it's important to 
note that resources will still need to be freed; it's only the hardware that 
cannot be assumed to be present.
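
The four-stage split might be driven roughly as below. This is only a sketch:
bind_device(), unbind_device(), the hw_present flag and the demo driver with
its counters are hypothetical, and the real core may order or name things
differently. The point it shows is that remove() runs in both paths and frees
whatever is still held, while shutdown() is skipped when the hardware is gone.

```c
#include <assert.h>
#include <stddef.h>

struct device;

struct device_driver {
	int (*probe)(struct device *dev);
	int (*remove)(struct device *dev);
	int (*init)(struct device *dev);
	int (*shutdown)(struct device *dev);
};

struct device {
	struct device_driver *driver;
	void *driver_data;
};

/* Hypothetical core helper: associate the driver, then bring the device up. */
static int bind_device(struct device *dev, struct device_driver *drv)
{
	int err;

	dev->driver = drv;
	err = drv->probe(dev);
	if (!err) {
		err = drv->init(dev);
		if (err)
			drv->remove(dev);
	}
	if (err)
		dev->driver = NULL;
	return err;
}

/* Hypothetical core helper: hw_present is 0 after a surprise removal. */
static int unbind_device(struct device *dev, int hw_present)
{
	struct device_driver *drv = dev->driver;
	int err = 0;

	if (!drv)
		return -1;
	if (hw_present)
		err = drv->shutdown(dev);
	/* remove() always runs, so software resources are freed either way. */
	drv->remove(dev);
	dev->driver = NULL;
	return err;
}

/* Demo driver: counts hardware accesses and tracks held resources. */
struct demo_state {
	int hw_accesses;
	int resources_held;
};

static int demo_probe(struct device *dev)
{
	(void)dev;
	return 0;
}

static int demo_init(struct device *dev)
{
	struct demo_state *s = dev->driver_data;

	s->hw_accesses++;
	s->resources_held = 1;
	return 0;
}

static int demo_shutdown(struct device *dev)
{
	struct demo_state *s = dev->driver_data;

	s->hw_accesses++;
	s->resources_held = 0;
	return 0;
}

static int demo_remove(struct device *dev)
{
	struct demo_state *s = dev->driver_data;

	s->resources_held = 0;	/* no hardware access here */
	return 0;
}

static struct device_driver demo_driver = {
	.probe = demo_probe,
	.remove = demo_remove,
	.init = demo_init,
	.shutdown = demo_shutdown,
};
```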


Suspend/resume transitions are broken into four stages as well to provide 
graceful recovery from a failed suspend attempt; and to ensure that state gets 
stored in a non-volatile location before the system (and its devices) are 
suspended.

When a suspend transition is triggered, the device tree is walked first to 
save the state of all the devices in the system. Once this is complete, the 
saved state, now residing in memory, can be written to some non-volatile 
location, like a disk partition or network location. 

The device tree is then walked again to suspend all of the devices. This 
guarantees that the device controlling the location to write the state is 
still powered on while you have a snapshot of the system state. 

If a device is in a critical I/O transaction, or for some other reason cannot 
stand to be suspended, it can notify the kernel by failing in the save state 
step. At this point, state can either be restored, or dropped, for all the 
devices that had already been touched, and execution may resume. No 
devices will have been powered off at this point, making it much easier to 
recover.
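
The two-pass suspend with rollback could be sketched as follows. A flat array
stands in for the device tree walk, and suspend_all(), the busy and state_saved
fields and the demo callbacks are assumptions for illustration. What happens
when suspend() itself fails is not described above, so the sketch simply
reports the error.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int u32;

struct device;

struct device_driver {
	int (*save_state)(struct device *dev, u32 state);
	int (*restore_state)(struct device *dev);
	int (*suspend)(struct device *dev, u32 state);
	int (*resume)(struct device *dev);
};

struct device {
	struct device_driver *driver;
	u32 current_state;
	int state_saved;	/* illustration only */
	int busy;		/* illustration only: refuse to save state */
};

/* Hypothetical two-pass suspend over a flat list of devices. */
static int suspend_all(struct device **devs, size_t n, u32 state)
{
	size_t i;
	int err;

	/* Pass 1: save state. Nothing has been powered off yet. */
	for (i = 0; i < n; i++) {
		if (devs[i]->driver->save_state(devs[i], state)) {
			/* Undo the devices already touched, newest first. */
			while (i-- > 0)
				devs[i]->driver->restore_state(devs[i]);
			return -1;
		}
	}

	/* The in-memory snapshot would be written to disk or network here. */

	/* Pass 2: actually power the devices down. */
	for (i = 0; i < n; i++) {
		err = devs[i]->driver->suspend(devs[i], state);
		if (err)
			return err;	/* recovery from this case is not shown */
	}
	return 0;
}

/* Toy callbacks that only flip the bookkeeping fields. */
static int demo_save_state(struct device *dev, u32 state)
{
	(void)state;
	if (dev->busy)
		return -1;
	dev->state_saved = 1;
	return 0;
}

static int demo_restore_state(struct device *dev)
{
	dev->state_saved = 0;
	return 0;
}

static int demo_suspend(struct device *dev, u32 state)
{
	dev->current_state = state;
	return 0;
}

static int demo_resume(struct device *dev)
{
	dev->current_state = 0;
	return 0;
}

static struct device_driver demo_driver = {
	.save_state = demo_save_state,
	.restore_state = demo_restore_state,
	.suspend = demo_suspend,
	.resume = demo_resume,
};
```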

The resume transition is broken up into two steps mainly to stress the 
singularity of each step: resume() powers on the device and reinitialises it; 
restore_state() restores the device and bus-specific registers of the device.
resume() will happen with interrupts disabled; restore_state() with them 
enabled.
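
A small sketch of the resume ordering for a single device is given below. The
irqs_enabled variable is a stand-in for the CPU interrupt state, and
resume_device() and the demo callbacks are hypothetical. Whether the core
handles both steps per device, as here, or walks the whole tree once per step
is an assumption the text above does not settle.

```c
#include <assert.h>
#include <stddef.h>

struct device;

struct device_driver {
	int (*restore_state)(struct device *dev);
	int (*resume)(struct device *dev);
};

struct device {
	struct device_driver *driver;
	int powered;	/* illustration only */
	int restored;	/* illustration only */
};

/* Stand-in for the CPU interrupt flag; real code would use the
 * architecture's interrupt enable/disable primitives instead. */
static int irqs_enabled = 1;

/* Hypothetical helper: power the device on, then restore its registers. */
static int resume_device(struct device *dev)
{
	int err;

	irqs_enabled = 0;
	err = dev->driver->resume(dev);		/* interrupts off */
	irqs_enabled = 1;
	if (err)
		return err;
	return dev->driver->restore_state(dev);	/* interrupts on */
}

/* Toy callbacks that check the interrupt state they are called with. */
static int demo_resume(struct device *dev)
{
	assert(!irqs_enabled);
	dev->powered = 1;
	return 0;
}

static int demo_restore_state(struct device *dev)
{
	assert(irqs_enabled && dev->powered);
	dev->restored = 1;
	return 0;
}

static struct device_driver demo_driver = {
	.restore_state = demo_restore_state,
	.resume = demo_resume,
};
```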


Bus Structures
~~~~~~~~~~~~~~

struct io_bus {
	struct	list_head 	node;
	struct 	io_bus 		*parent;
	struct 	list_head 	children;
	struct 	list_head 	devices;

	struct 	list_head 	bus_list;

	struct 	device 		*self;
	struct 	dentry 		*dentry;
	struct 	list_head 	files;

	char    name[DEVICE_NAME_SIZE];
	char    bus_id[BUS_ID_SIZE];

	struct  bus_driver	*driver;
};

node:
	Bus's node in sibling list (its parent's list of child buses).

parent:
	Pointer to parent bridge.

children:
	List of subordinate buses. 
	In the children, this correlates to their 'node' field.

devices:
	List of devices on the bus this bridge controls.
	This field corresponds to the 'bus_list' field in each child device.

bus_list:
	Each type of bus keeps a list of all bridges that it finds. This is the 
	bridge's entry in that list.

self:
	Pointer to the struct device for this bridge.

dentry:
	Every bus also gets a ddfs directory to which files can be added, as well as
	child device directories. Actually, every bridge will have two directories -
	one for the bridge device, and one for the subordinate device.

files:
	Each bus also gets a list of the files that are in the ddfs directory, for
	the same reasons as the devices - to have explicit control over the behavior
	and easy access to each file that any higher layers may have added.

name:
	Human readable ASCII name of bus.

bus_id:
	Machine readable (though ASCII) description of position on parent bus.

driver:
	Pointer to operations for bus.
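
The way 'node'/'children' and 'bus_list'/'devices' pair up can be sketched
with a depth-first walk over the tree. The list helpers below are minimal local
stand-ins for the kernel's list_head code, and bus_init(), bus_add_device() and
count_devices() are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct io_bus;

struct device {
	struct list_head bus_list;	/* linked into parent->devices */
	struct io_bus *parent;
};

struct io_bus {
	struct list_head node;		/* linked into parent->children */
	struct io_bus *parent;
	struct list_head children;
	struct list_head devices;
};

static void bus_init(struct io_bus *bus, struct io_bus *parent)
{
	list_init(&bus->node);
	list_init(&bus->children);
	list_init(&bus->devices);
	bus->parent = parent;
	if (parent)
		list_add_tail(&bus->node, &parent->children);
}

static void bus_add_device(struct io_bus *bus, struct device *dev)
{
	dev->parent = bus;
	list_add_tail(&dev->bus_list, &bus->devices);
}

/* Depth-first walk: count this bus's devices, then recurse into child buses. */
static int count_devices(struct io_bus *bus)
{
	struct list_head *p;
	int n = 0;

	for (p = bus->devices.next; p != &bus->devices; p = p->next)
		n++;
	for (p = bus->children.next; p != &bus->children; p = p->next)
		n += count_devices(list_entry(p, struct io_bus, node));
	return n;
}
```

The same walk, with a callback in place of the counter, is the shape a
tree-wide save_state or suspend pass would presumably take.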


struct bus_driver {
	char    name[16];
	struct  list_head node;
	int     (*scan)         (struct io_bus*);
	int     (*rescan)       (struct io_bus*);
	int     (*add_device)   (struct io_bus*, char*);
	int     (*remove_device)(struct io_bus*, struct device*);
	int     (*add_bus)      (struct io_bus*, char*);
	int     (*remove_bus)   (struct io_bus*, struct io_bus*);
};

name:
	ASCII name of bus.

node:
	List of buses of this type in system.

scan:
	Search the bus for devices. This is meant to be done only once - when the 
	bridge is initially discovered.

rescan:
	Search the bus again and look for changes. I.e. check for device insertion or 
	removal.

add_device:
	Trigger a device insertion at a particular location.

remove_device:
	Trigger the removal of a particular device.

add_bus:
	Trigger insertion of a new bridge device (and child bus) at a particular
	location on the bus.

remove_bus:
	Remove a particular bridge and subordinate bus.
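
As a toy illustration of the difference between scan and rescan, the sketch
below models a bus as a bitmask of occupied slots. The 'present' and 'known'
fields, the toy_* functions and the meaning of the return values (devices
found, and slots that changed) are all assumptions; the structures above do not
define them.

```c
#include <assert.h>

/* Trimmed io_bus with two hypothetical fields used only by this example. */
struct io_bus {
	unsigned int present;	/* slots that currently hold a device */
	unsigned int known;	/* slots the bus layer has registered */
};

/* Trimmed to the two operations being illustrated. */
struct bus_driver {
	char name[16];
	int (*scan)(struct io_bus *);
	int (*rescan)(struct io_bus *);
};

/* Count the set bits in a slot mask. */
static int count_slots(unsigned int mask)
{
	int n = 0;

	while (mask) {
		n += mask & 1;
		mask >>= 1;
	}
	return n;
}

/* Initial discovery: register everything that is there. */
static int toy_scan(struct io_bus *bus)
{
	bus->known = bus->present;
	return count_slots(bus->known);
}

/* Later pass: report how many slots were inserted into or emptied. */
static int toy_rescan(struct io_bus *bus)
{
	unsigned int changed = bus->known ^ bus->present;

	bus->known = bus->present;
	return count_slots(changed);
}

static struct bus_driver toy_bus_driver = {
	.name = "toy",
	.scan = toy_scan,
	.rescan = toy_rescan,
};
```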






