linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices
@ 2002-10-13 19:24 Adam J. Richter
  2002-10-13 19:51 ` Eric Blade
  2002-10-13 22:52 ` Andries Brouwer
  0 siblings, 2 replies; 45+ messages in thread
From: Adam J. Richter @ 2002-10-13 19:24 UTC (permalink / raw)
  To: linux-kernel

	linux-2.5.42 had an annoying new behavior.  When I would
try to do a warm reboot, it would spin down the hard drives, which
just made the reboot take longer and gave the impression that a
halt or poweroff was in progress.

	At first, I suspected IDE, but I think the new behavior in IDE
of spinning down the hard drives on suspend is correct.  The problem
is that the warm reboot system call is trying to suspend all of the
devices before a warm reboot for no reason.  We already have a reboot
notifier chain that drivers can use to register code that has to be
run in order to safely reboot or halt.  I am not talking about
eliminating that.  I am only talking about the soft reboot putting
devices into a power saving mode that is allowed to take a long
recovery time, especially given that the reboot is likely to want to
talk to every hardware device connected to the system.

	Anyhow, here is the patch.  As far as I can tell, there is no
delegated mainainer for kernel/sys.c, so I am sending this to
linux-kernel and I will resend it to Linus later if nobody points me
to another maintainer to go through and there are no complaints.

Adam J. Richter     __     ______________   575 Oroville Road
adam@yggdrasil.com     \ /                  Milpitas, California 95035
+1 408 309-6081         | g g d r a s i l   United States of America
                         "Free Software For The Rest Of Us."

--- linux-2.5.42/kernel/sys.c	2002-10-11 21:21:31.000000000 -0700
+++ linux/kernel/sys.c	2002-10-13 11:57:45.000000000 -0700
@@ -365,7 +365,6 @@
 	case LINUX_REBOOT_CMD_RESTART:
 		notifier_call_chain(&reboot_notifier_list, SYS_RESTART, NULL);
 		system_running = 0;
-		device_shutdown();
 		printk(KERN_EMERG "Restarting system.\n");
 		machine_restart(NULL);
 		break;

^ permalink raw reply	[flat|nested] 45+ messages in thread
* Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices
@ 2002-10-13 22:14 Adam J. Richter
  2002-10-13 22:31 ` Russell King
                   ` (2 more replies)
  0 siblings, 3 replies; 45+ messages in thread
From: Adam J. Richter @ 2002-10-13 22:14 UTC (permalink / raw)
  To: ebiederm; +Cc: eblade, linux-kernel

Eric W. Biederman writes:
>Eric Blade <eblade@blackmagik.dynup.net> writes:

>> On Sun, 2002-10-13 at 15:24, Adam J. Richter wrote:
>> > 	At first, I suspected IDE, but I think the new behavior in IDE
>> > of spinning down the hard drives on suspend is correct.  The problem
>> > is that the warm reboot system call is trying to suspend all of the
>> > devices before a warm reboot for no reason.


>There is a very good reason for calling the driver->remove() function
>on a reboot so that we place the devices in a consistent state.  And
>stop any on going DMA etc.

	A warm reboot is supposed to do this by the platform dependent
machine_restart(), and whatever processes machine_restart sets off
(e.g., by making a RESET signal active).  If a device driver needs
some special processing prior to that, that is what the reboot
notifier chain is for.

>The reboot notifier call chain, is highly underused, and all of the drivers
>already have the code they need in their remove function.

	What specific drivers are you referring to that meet all three
of the criteria you have stated?:

		1. They need to use reboot_notifier because the
		   machine_restart() is not enough to reset them,

		2. They do not currently use reboot_notifier,

		3. They have a device->remove() that does what the
		   reboot_notifier should do (shuts down ongoing DMA,
		   because they have a some device that ignores a reset
		   signal or something like that).

	In general I would expective that a driver->remove(device)
function would not be able to touch much of the hardware at all,
because it would have to assume that the device has been or is being
physically yanked out of the computer, so it should do software cleanup
and maybe talk to the PCMCIA socket, USB hub port, etc. that previously
held the device.

	If you have a platform where, for example, somehow PCI devices
are able to continue jabbering away after the computer has been reset,
then that could probably be done more consistently for most drivers by
having machine_restart on that platform walk the PCI bus and shut down
everything (drivers that need to do something really special would
still use the reboot notifier).

	I could even see calling device_shutdown from machine_restart
on that platform only, and I could even image
"if(buggy_motherboard) device_shutdown();", but I would be really
surprised to learn that sort of thorough resetting is necessary for
most hardware on ordinary x86 PC's.

	I could also understand something like your system call for
booting Linux from Linux needing to do more thorough shutdowns and
resets, but not something that uses machine_restart().

>> I am not talking about
>> > eliminating that.  I am only talking about the soft reboot putting
>> > devices into a power saving mode that is allowed to take a long
>> > recovery time, especially given that the reboot is likely to want to
>> > talk to every hardware device connected to the system.

>If 
>driver->remove() 
>does that it is an issue with that device driver.

	device_shutdown and device_suspend are for power management.
It is important to turn the device off as soon as possible if a power
management routine has told you that the device is not going to be
used any more.

	If you've identified a bunch of devices that need
reboot_notifier, please list them so we can fix them rather than
taxing the users with unnecessarily slow reboots or poor battery life
(due to device not being shut down when they are no longer being used).

Adam J. Richter     __     ______________   575 Oroville Road
adam@yggdrasil.com     \ /                  Milpitas, California 95035
+1 408 309-6081         | g g d r a s i l   United States of America
                         "Free Software For The Rest Of Us."

^ permalink raw reply	[flat|nested] 45+ messages in thread
* Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices
@ 2002-10-13 23:10 Adam J. Richter
  2002-10-13 23:15 ` Russell King
  2002-10-13 23:54 ` Eric W. Biederman
  0 siblings, 2 replies; 45+ messages in thread
From: Adam J. Richter @ 2002-10-13 23:10 UTC (permalink / raw)
  To: rmk; +Cc: ebiederm, eblade, linux-kernel

Russell King writes:
>x86, I believe, is one example of such a platform that can leave PCI
>devices jabbering over a warm reboot.

	The standards on pcisig.com are apparently proprietary, so I'm
afraid I can only quote a proprietary book I have handy:

	Reset Signal (RST#):

	When asserted, the reset signal forces all PCI configuration
	registers, master and target state machines and output drivers to an
	initialized state.  RST# may be asserted or deasserted asynchronously
	to the PCI CLK edge.  The assertion of RST# also initializes other,
	device-specific functions, but this subject is beyond the scope of the
	PCI specification.  All PCI output signals must be driver to their
	bening states.  In general, this means they must be tri-stated [...]
	
		Chapter 4, page 37
		_PCI System Architecture, 4th Edition_
		Tom Shanley and Don Anderson


	So, you must be talking about a PC that does not ground RST#
during a warm reboot or out of spec (according to this book) PCI devices,
which would not be specific to x86 unless we're talking about motherboard
chipset devices.

	I understand the benefits of being conservative, but let's not
be taken in by urban legend, or, more likely, some quirkly hardware
that we can set a flag for while we can reboot more quickly with most
other hardware.  Anyhow, if you or anyone can give me specifics about
devices jabbering away after reboot, that would be great

	I have no objection to replacing or supplementing the reboot
notifier chain with a method in struct device_driver, but let's not
overload these methods with ambiguous semantics.  I do not want to
call thirty functions that primarily return memory to various memory
allocators, mark a bunch of inodes as invalid, and otherwise arrange
things so that the kernel can smoothly continue to run user level
programs when, in fact, we just want to pull the reset line on the
computer.

Adam J. Richter     __     ______________   575 Oroville Road
adam@yggdrasil.com     \ /                  Milpitas, California 95035
+1 408 309-6081         | g g d r a s i l   United States of America
                         "Free Software For The Rest Of Us."

	

^ permalink raw reply	[flat|nested] 45+ messages in thread
* Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices
@ 2002-10-13 23:59 Adam J. Richter
  2002-10-14  0:07 ` Eric W. Biederman
  0 siblings, 1 reply; 45+ messages in thread
From: Adam J. Richter @ 2002-10-13 23:59 UTC (permalink / raw)
  To: rmk; +Cc: ebiederm, eblade, linux-kernel

Russell King wrote:
>On Sun, Oct 13, 2002 at 04:10:01PM -0700, Adam J. Richter wrote:
>>       I have no objection to replacing or supplementing the reboot
>> notifier chain with a method in struct device_driver, but let's not
>> overload these methods with ambiguous semantics.  I do not want to
>> call thirty functions that primarily return memory to various memory
>> allocators, mark a bunch of inodes as invalid, and otherwise arrange
>> things so that the kernel can smoothly continue to run user level
>> programs when, in fact, we just want to pull the reset line on the
>> computer.
>
>And what about setups where you can't pull the reset line from software.
>I have several machines here like that.  And one of them needs software
>to talk to the cards to put them back into a sane state before rebooting.
>
>"rebooting" in this particular case is "turn MMU off, jump to location 0"

As I send in my response Eric Biederman,

|        If you have a platform where, for example, somehow PCI devices
| are able to continue jabbering away after the computer has been reset,
| then that could probably be done more consistently for most drivers by
| having machine_restart on that platform walk the PCI bus and shut down
| everything (drivers that need to do something really special would
| still use the reboot notifier).
|
|         I could even see calling device_shutdown from machine_restart
| on that platform only, [...]


>And I never said anything about needing to allocate memory to do this.
>I agree with you that suspending devices on reboot _is_ silly.  However,
>that's not what I was proposing.

	Then you've started a new thread of discussion, because
device_shutdown is defined in drivers/base/power.c as:

void device_shutdown(void)
{  
        device_suspend(4, SUSPEND_POWER_DOWN);
}


	Perhaps device_suspend ought to be renamed device_power_down.

	However, I'm not trying to quash what you want to discuss.
I'd be interested in hearing about clarifications and perhaps
extensions of the struct device_driver methods, which I think is what
you're getting at, perhaps here or on linux-hotplug.  It's just that,
for this thread, I'm trying to focus on my patch that eliminates the
software suspend on reboot (pros and cons, alternatives to it, etc.).

Adam J. Richter     __     ______________   575 Oroville Road
adam@yggdrasil.com     \ /                  Milpitas, California 95035
+1 408 309-6081         | g g d r a s i l   United States of America
                         "Free Software For The Rest Of Us."

^ permalink raw reply	[flat|nested] 45+ messages in thread
* Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices
@ 2002-10-14 15:25 Adam J. Richter
  2002-10-14 16:44 ` Eric W. Biederman
  0 siblings, 1 reply; 45+ messages in thread
From: Adam J. Richter @ 2002-10-14 15:25 UTC (permalink / raw)
  To: ebiederm; +Cc: eblade, linux-kernel, rmk

Eric W. Biederman wrote:
>Russell King <rmk@arm.linux.org.uk> writes:

>> "rebooting" in this particular case is "turn MMU off, jump to location 0"
>And for x86 it is turn MMU off, jump to location 0xffff0.


	The default reboot method of linux-2.5.42 on x86 is to have the
keyboard controller do a reset.  If you want to reboot by jumping to ffff00,
you have to pass "reboot=b" unless you are running on one of three specific
models of computers: the Dell PowerEdge 300, 1300 or 2400
(linux-2.5.42/arch/i386/kernel/dmi-scan.c line 522).  Here is the
relevant excerpt from machine_restart in linux-2.5.42/arch/i386/reboot.c:

        if(!reboot_thru_bios) {
                /* rebooting needs to touch the page at absolute addr 0 */
                *((unsigned short *)__va(0x472)) = reboot_mode;
                for (;;) {
                        int i;
                        for (i=0; i<100; i++) {
                                kb_wait();
                                udelay(50);
                                outb(0xfe,0x64);         /* pulse reset low */
                                udelay(50);
                        }
                        /* That didn't work - force a triple fault.. */
                        __asm__ __volatile__("lidt %0": :"m" (no_idt));
                        __asm__ __volatile__("int3");
                }
        }

In another message, you wrote:
>Exactly an in spec, PC does not need to ground RST# on reboot.

	By "as in spec", were you referring to my quotation from _PCI
System Architecture, 4th Ed._ by Shanley and Anderson or are saying
that there is a specification that supports your statement that "a PC
does not need to ground RST# on reboot."  If the latter, I would
appreciate it if you would identify that specification and the
appropriate section with it for verification.

Elsewhere you wrote:
>Additionally it was decided quite a
>while ago that calling device->remove() was the correct way to
>accomplish this.

	Do you remember where this "was decided?"  If it was on an
archived mailing list, do you remember approximately when and what the
subject line might be?  I'd like to look at this discussion both to
verify you statement and to avoid repetition.

In another message you wrote:
>machine_restart returns control to the BIOS.  
>It works this way on both x86 and alpha.  And the BIOS does
>not always toggle the RESET line on the machine.  The frequent case
>of devices not working in Linux after rebooting from windows, and
>visa versa is evidence of this.  On alpha the SRM doesn't even
>pretend to reset the machine.

	Here you are talk first about doing a warm reboot from Windows
to Linux and the device does not work.  That sounds like a quirky
device.  Also, you have failed to show that this is a device where the
Linux driver's ->remove() function solves the problem and that there
are so many of these devices that (where the remove function *does*
solve the problem and there is no reboot notifier) that it would be
impractical change those drivers.

	If the problem is that the warm reboot procedure failed to
"stop any ongoing DMA, etc." as you put it, then failure mode that you
would more typically expect would be memory corruption, manifesting
itself as random of other programs, not getting through the reboot
process half of the time, etc.

	The underlying trade-off is that if I want to reboot fast, I
should not have to call a bunch of hotplug routines that have to do a
lot of book keeping because they have to assume that the computer may
continue to run user level programs indefinitely after they return
(things like deallocating memory, unregistering devfs inodes, etc.).
On the other hand, if I want to halt, I really do want the disks to be
spun down even if my motherboard does not do a master power-off from
software.  Doing that requires that the power management request that
reboot makes differ from the one that halt makes.  So they could
not both be just "device_shtudown();".

Adam J. Richter     __     ______________   575 Oroville Road
adam@yggdrasil.com     \ /                  Milpitas, California 95035
+1 408 309-6081         | g g d r a s i l   United States of America
                         "Free Software For The Rest Of Us."

^ permalink raw reply	[flat|nested] 45+ messages in thread
* Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices
@ 2002-10-14 18:41 Adam J. Richter
  2002-10-14 20:05 ` Eric W. Biederman
  2002-10-16  8:01 ` Pavel Machek
  0 siblings, 2 replies; 45+ messages in thread
From: Adam J. Richter @ 2002-10-14 18:41 UTC (permalink / raw)
  To: ebiederm; +Cc: eblade, linux-kernel, rmk

Eric W. Biederman wrote:
>"Adam J. Richter" <adam@yggdrasil.com> writes:

>> Eric W. Biederman wrote:
>> >Russell King <rmk@arm.linux.org.uk> writes:
>> 
>> >> "rebooting" in this particular case is "turn MMU off, jump to location 0"
>> >And for x86 it is turn MMU off, jump to location 0xffff0.
>> 
>> 	The default reboot method of linux-2.5.42 on x86 is to have the
>> keyboard controller do a reset.  

>Resetting the cpu != resetting the system.  And the keyboard controller
>only does a cpu level reset.  Which is basically a convoluted way to jump
>to: 0xfffffff0.

	I would be quite surprised if that reset was not wired
to eventually ground RST# on the PCI bus.


[...]
>>         if(!reboot_thru_bios) {
>>                 /* rebooting needs to touch the page at absolute addr 0 */
>>                 *((unsigned short *)__va(0x472)) = reboot_mode;
>>                 for (;;) {
>>                         int i;
>>                         for (i=0; i<100; i++) {
>>                                 kb_wait();
>>                                 udelay(50);
>>                                 outb(0xfe,0x64);         /* pulse reset low */
>>                                 udelay(50);
>>                         }
>>                         /* That didn't work - force a triple fault.. */
>>                         __asm__ __volatile__("lidt %0": :"m" (no_idt));
>>                         __asm__ __volatile__("int3");
>>                 }
>>         }

>Please note the write to: 0x40:0x72.  If this was a full machine reset
>this would not make sense as memory would need to be reinitialized
>making it unsafe to keep values in ram.

	The issue is not how you define the term "full machine reset."
The issue is whether the PCI bus is eventually reset as a result of
this procedure before by BIOS Power On Self Test starts executing at
0xffff00.  If it is, then most of the device cleanup is unnecessary.
Even if it does not, it would make a lot more sense to just have
machine_reset assert RST# after those few devices that really need
cleanup are done.

>> In another message, you wrote:
>> >Exactly an in spec, PC does not need to ground RST# on reboot.
>> 
>> 	By "as in spec", were you referring to my quotation from _PCI
>> System Architecture, 4th Ed._ by Shanley and Anderson or are saying
>> that there is a specification that supports your statement that "a PC
>> does not need to ground RST# on reboot."  If the latter, I would
>> appreciate it if you would identify that specification and the
>> appropriate section with it for verification.

>I'd love to but mostly I was referring to the de facto standard for
>pc architecture.  Which I don't know if anyone ever wrote down, all
>in one place.

	Please do not say "as in spec" if you are not actually
referrring to a specification document (electronic form is OK).  It is
extremely misleading as to how reliable your claims are.  Instead,
please indicate what you remember about where you learned this
information.


>> Elsewhere you wrote:
>> >Additionally it was decided quite a
>> >while ago that calling device->remove() was the correct way to
>> >accomplish this.
>> 
>> 	Do you remember where this "was decided?"  If it was on an
>> archived mailing list, do you remember approximately when and what the
>> subject line might be?  I'd like to look at this discussion both to
>> verify you statement and to avoid repetition.

>No.  I was not in on that conversation.  However given the fact it
>is all over the documentation.  And has been in the kernel for quite
>a while now.  I tracked it as early as 2.5.30.

	Making stuff up just to win an argument is unlikely to produce
the best Linux kernel.  If you don't remember things, if you're not
sure of something, please just admit it instead of wasting an
iteration of email by faking authoratitive references.


>This is both device_shutdown in the reboot path,
>and device_shutdown calling ->remove()
> 
>> In another message you wrote:
>> >machine_restart returns control to the BIOS.  
>> >It works this way on both x86 and alpha.  And the BIOS does
>> >not always toggle the RESET line on the machine.  The frequent case
>> >of devices not working in Linux after rebooting from windows, and
>> >visa versa is evidence of this.  On alpha the SRM doesn't even
>> >pretend to reset the machine.
>> 
>> 	Here you are talk first about doing a warm reboot from Windows
>> to Linux and the device does not work.  That sounds like a quirky
>> device.  

>No it sounds like a driver that either can or cannot handle devices
>in an arbitrary state.

>> Also, you have failed to show that this is a device where the
>> Linux driver's ->remove() function solves the problem and that there
>> are so many of these devices that (where the remove function *does*
>> solve the problem and there is no reboot notifier) that it would be
>> impractical change those drivers.

>I do not feel I need to defend the status quo.  It is you who are proposing
>a change that needs to support why using ->remove() is a bad way to go.

	sys_reboot does not call device_shutdown in 2.4.19 and it was
added to 2.5 in 2.5.8, so calling it status quo to the point where you
think there is no need to defend it is pretty iffy.  If most Linux
systems are running 2.4 these days, then it appears that most are
running without the ->remove() call just fine.

	Also, it's not just a question of whether you should feel a
need to defend the status quo.  The reality is that you made some
factual claims ("frequent case of devices not working in Linux after
rebooting from windows") and you seem to be unable to support,
espeically with regard to the question of how many of these devices
actually started to work correctly with ->remove() and which do not
work without it (i.e., do not work in 2.4.19 but do work in 2.5.8+).
I wouldn't have a problem with it if you would just admit that you
don't remember or that you may have accidentally made this up or that
you were repeating something you heard elsewhere but never checked,
and we could weight that information accordingly.

>I will just say that shutting devices in a depth first manner is a lot safer,
>and more reliable than a random walk, you will get with a reboot notifier.

	I already said that I don't have a problem with having a
reboot notifier call in struct device for those devices that need some
sort of specific shutdown code that cannot be handled by resetting
their parent bus or something similar.

>> 	If the problem is that the warm reboot procedure failed to
>> "stop any ongoing DMA, etc." as you put it, then failure mode that you
>> would more typically expect would be memory corruption, manifesting
>> itself as random of other programs, not getting through the reboot
>> process half of the time, etc.

>You would only see it on the boot up side.  Not placing devices in
>a consistent state so another os can talk to them is significant as well.

	I quite don't understand what you're saying here.  Are you
saying people usually cannot boot from linux-2.4.19 into Windows?

>> 	The underlying trade-off is that if I want to reboot fast, I
>> should not have to call a bunch of hotplug routines that have to do a
>> lot of book keeping because they have to assume that the computer may
>> continue to run user level programs indefinitely after they return
>> (things like deallocating memory, unregistering devfs inodes, etc.).

>I would be very surprised if this showed up in practice.  If Russell King
>makes the argument about excess book keeping slowing things down.  I
>will buy it because he runs on some very slow machines.  Personally I
>doubt anything will be user noticeable.

	When you want to optimize for resources like speed and space,
one is generally happy if an optimizations by a percent or two at a
time.  These calls can do IO.  It could add up.  Also, for code
compiled into the kernel (or non-unloadable modules if anyone
implements them, I suppose), it means that otherwise could be compiled
out with __exit, so you have a bigger kernel foot print.  Finally,
from a reliability standpoint, in the case of rebooting because of a
kernel bug, it means that the reboot process wants to execute code and
walk through data structures in many more drivers.


>> On the other hand, if I want to halt, I really do want the disks to be
>> spun down even if my motherboard does not do a master power-off from
>> software.  Doing that requires that the power management request that
>> reboot makes differ from the one that halt makes.  So they could
>> not both be just "device_shtudown();".

>So you want to send device_suspend(?, SUSPEND_POWER_DOWN) on a halt.
>Or do a software shutdown thing.  

>Whoever mapped the SUSPEND_SHUT_DOWN to SUSPEND_POWER_DOWN very much
>had it wrong when that code was merged with 2.5.42.

>Mostly it looks to me that you have a problem with IDE driver spinning
>down disks on reboot.  That change also is new in 2.5.41.

	You do not understand.  The fundamental problem is that
sys_reboot in linux-2.5.42/kernel/sys.c makes exactly the same power
management call for a reboot and for a halt (device_shutdown()).  I
want my IDE, SCSI, USB, and FireWire hard disks to spin down if I do a
halt.  I do not want them to do that when I reboot.  I also do not
want my reboot to wait for an interrupt from a sound card because
remove() needs to determine if it is being called because the device
had just been unplugged.  It is not just about IDE disk drives.

Adam J. Richter     __     ______________   575 Oroville Road
adam@yggdrasil.com     \ /                  Milpitas, California 95035
+1 408 309-6081         | g g d r a s i l   United States of America
                         "Free Software For The Rest Of Us."


^ permalink raw reply	[flat|nested] 45+ messages in thread
* Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices
@ 2002-10-15  2:53 Adam J. Richter
  2002-10-15 16:59 ` Eric W. Biederman
  0 siblings, 1 reply; 45+ messages in thread
From: Adam J. Richter @ 2002-10-15  2:53 UTC (permalink / raw)
  To: ebiederm; +Cc: eblade, linux-kernel, rmk

Eric W. Biederman writes:
>>       You do not understand.  The fundamental problem is that
>> sys_reboot in linux-2.5.42/kernel/sys.c makes exactly the same power
>> management call for a reboot and for a halt (device_shutdown()).  I
>> want my IDE, SCSI, USB, and FireWire hard disks to spin down if I do a
>> halt.  I do not want them to do that when I reboot.  I also do not
>> want my reboot to wait for an interrupt from a sound card because
>> remove() needs to determine if it is being called because the device
>> had just been unplugged.  It is not just about IDE disk drives.
[...]
>Do you seriously have a hot-plug sound card, that waits for an
>interrupt to be certain the card is plugged in during remove?

        Although I was being hypothetical, come to think of it, I do
some USB speakers that I have not been using, and detecting their
removal is done by the host contoller polling the hub for an
"interrupt", typically once per millisecond.

>If there is non-trivial work to detect if a card is present it
>probably makes sense to factor remove into
>->quiet() and ->remove()
>Where quiet would put the device into a quiescent state, and
>remove would simply clean up the driver state.

        Splitting into ->quiet() and ->removed() would be helpful
in any case, where removed() would normally not touch the hardware,
since it is quite possible the device has already been removed,
since the callers of these routines generally know if they are
calling because the device has been removed or because they want
just want to turn it off, while ->remove() currently has to guess,
which not only wastes time but also can be difficult to do safely
when you don't know if the device that you're talking to is even
present anymore.

Adam J. Richter     __     ______________   575 Oroville Road
adam@yggdrasil.com     \ /                  Milpitas, California 95035
+1 408 309-6081         | g g d r a s i l   United States of America

^ permalink raw reply	[flat|nested] 45+ messages in thread
* Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices
@ 2002-10-15 18:54 Adam J. Richter
  0 siblings, 0 replies; 45+ messages in thread
From: Adam J. Richter @ 2002-10-15 18:54 UTC (permalink / raw)
  To: mochel; +Cc: ebiederm, eblade, linux-kernel

Patrick Mochel writes:
>> 	If you've identified a bunch of devices that need
>> reboot_notifier, please list them so we can fix them rather than
>> taxing the users with unnecessarily slow reboots or poor battery life
>> (due to device not being shut down when they are no longer being used).

>No, reboot notifiers are the completely wrong way to go, for one reason 
>alone: ordering. device_shutdown() does a depth-first walk of the tree to 
>shut down children devices before ancestors. You cannot guarantee that 
>with reboot notifiers.

	The question is not whether reboot notifiers should also be
part of struct device for sequencing purposes.  Nobody is arguing
against that.

	The question is whether it is impractical to have a distinct
device->reboot_notifier() for those devices that truely need it so
that we can avoid calling ->remove() for most devices.  To answer that,
I think you need list a large number of devices that match ALL THREE
of the following criteria:

|		1. They need to use reboot_notifier because the
|		   machine_restart() is not enough to reset them,
|
|		2. They do not currently use reboot_notifier,
|
|		3. They have a device->remove() that does what the
|		   reboot_notifier should do (shuts down ongoing DMA,
|		   because they have a some device that ignores a reset
|		   signal or something like that).

>Please don't try and convolute the code because you're worried about a few
>microseconds. It's about correctness first; then we can worry about
>micro-optimizing the hell out of it.

	1. When something has gone wrong with a device driver, you generally
	   gather what information you can and then try a warm reboot,
	   especially when one is working remotely.  Data structures in
	   device drivers can often corrupted under these circumstances.
	   So it is important that warm reboots reboot the system as
	   simply as they can with reliability.  That means only executing
	   the special shutdown code that really needs to be executed.

	2. For small platforms, I can about keeping the __devexit code out
	   of the kernel footprint.  I don't want Linux to lose the
	   competition small embedded devices (grated such decisions do not
	   pivot on __devexit alone, but a cultrue of wastefulness leads
	   creates bloat in steps such like this one).

	3. I do care that reboot goes fast.  Optimizations for speed and
	   space generally come about by making lots of little clean ups.
	   The standard power on self-test takes ages on most PCs, but
	   it doesn't have to be that way, and it isn't necessarily that
	   way on other platforms.  I'd linux not to be an impedement
	   to having systems where you don't even see the screen go black
	   when you reboot.

	Also, while your post is a very mild example, circular
arguments that implicitly attempt to define a term like "correctness"
waste everyone's time at best. At worst, such arguments lead to
software that unnecessarily biggers, less functional, slower to run,
or slower to be developed, harder to maintain or inferior by some
other real metric because someone made a trade-off against a real
benefit in favor of something that sounded beneficial but was actally
just a a circular definition.

	Instead, please show underlying benefits.  Show me how calling
->remove() for every device to do a warm reboot will make the kernel
smaller or will make reboots go faster.  If your change improves
reliability, then you should be able show real examples where reboots
work in 2.5.8+ and fail in 2.4, so that we might try to quantify the
probable trade-off between that scenario (where we still have reboot
notifiers available) and potentially running into problems by calling
clean-up code in each device driver.  In comparison, I frequently get
into trouble with the IDE drivers by inserting and removing
CompactFlash cards (haven't tried in 2.4.42), so I really do not want
any more clean-up code called that is necessarily if the kernel is
confused.  If you cannot show some benefit, it looks like calling all
those remove() routines will be a lose on speed, size and even
reliability.

Adam J. Richter     __     ______________   575 Oroville Road
adam@yggdrasil.com     \ /                  Milpitas, California 95035
+1 408 309-6081         | g g d r a s i l   United States of America
                         "Free Software For The Rest Of Us."

^ permalink raw reply	[flat|nested] 45+ messages in thread
* Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices
@ 2002-10-15 19:52 Adam J. Richter
  2002-10-16 12:13 ` Eric W. Biederman
  0 siblings, 1 reply; 45+ messages in thread
From: Adam J. Richter @ 2002-10-15 19:52 UTC (permalink / raw)
  To: ebiederm; +Cc: eblade, linux-kernel, mochel, rmk

Eric W. Biederman writes:

>> >If there is non-trivial work to detect if a card is present it
>> >probably makes sense to factor remove into
>> >->quiet() and ->remove()
>> >Where quiet would put the device into a quiescent state, and
>> >remove would simply clean up the driver state.
>> 
>>         Splitting into ->quiet() and ->removed() would be helpful
>> in any case, where removed() would normally not touch the hardware,
>> since it is quite possible the device has already been removed,
>> since the callers of these routines generally know if they are
>> calling because the device has been removed or because they want
>> just want to turn it off, while ->remove() currently has to guess,
>> which not only wastes time but also can be difficult to do safely
>> when you don't know if the device that you're talking to is even
>> present anymore.

>Except for the case when a device is physically swapped before ->remove()
>which is really, really, nasty.  But it is quiet unlikely anyone will
>actually be that fast.  Whatever talks to the hardware has to check to
>see if it's device  is present.  But if usb is anything like PCI it 
>should be a very inexpensive check to see if a driver is present.  And
>writes to a non-existent device should be safe.

	I am not aware of any hotplug bus where removing a device and
inserting a new one leads to the new device being in a state where the
driver for the old device could accidentally talk to it without the
kernel actively doing something in the meantime like turning the
affected socket back on.

	For example, for USB, the hub port will the new device will be
plugged in (and the computer will be notified of this by the hub) but
the hub port will not remain connected.  The computer has to turn the
hub port back on.  The whole USB initialization process is documented
extremely well in the USB 2.0 standards which you can download for
free at http://www.usb.org, and it is also documented in the MindShare
books (_Universal Serail Bus System Architecture_ by Don Anderson and
Dave Dzatko, both 1st and 2nd Edition), although it is just as clear in
the standards document.

	For FireWire, both the removal and the insertion cause cause
all addresses of all devices to be reassigned, and the new device will
not have an address before that is complete, at least if I am
correclty reading _FireWire Sytem Architecure, 2nd Edition_ by Don
Anderson, as I haven't read the standard.

	I know that PCMCIA can power down a socket.  I assume that the
interrupt that detects card removal does this or otherwise disconnects
the socket when a card is removed and that the kernel does not turn
the socket back on until the remove routine for the card has
completed.  [From the manaul page for cardctl and looking at the
CardServices interface described in _PCMCIA System Architecture_,
2nd Edition by Don Anderson.]

	For CardBus, I assume that PCMCIA's protections apply, and I
suspect the PCI base address registers are defined to be clear at
insertion, so the device-specific IO ports and memory regions will not
be mapped.  In addition, the kernel does book keeping on which IO
ports and memory regions are currently allocated with
request_region(), so I assume it would assign the newly plugged in
card's IO and memory elsewhere if even if it were, so it may even be
safe for the kernel to map the new card and start its initialization
before the clean up for the old card has completed.

	For hotplug PCI, CompactPCI and PCI Mezzanine Cards, it seems
the user actually has to ask permission to insert and remove cards,
but, even so, a newly inserted card is initially electrically isolated
from the PCI bus, and the computer uses special hardware that allows
it to provide power and assert the PCI RST# signal for that card only
before deasserting RST# and then connecting it to the rest of the bus.
This clears the newly inserted card's PCI base address
registers. [_PCI System Architecture_, 4th edition by Tom Shanley and
Don Anderson, Chapter 22: Hot-Plug PCI, pages 464-465, 753-754.  Alas,
the pcisig.org standards are propritary.]  What I said about the
request_region() bookkeeping for CardBus applies here too.

	Besides, you have not identified a safe way that a combined
->remove() function can detect such situations more reliably than
separate ->quiet() and ->removed(), which at least have the benefit of
knowing what the kernel currently thinks the situation is.  So, you
really have no basis for saying "Splitting ->remove() into quiet() and
->removed() will be racy."

Adam J. Richter     __     ______________   575 Oroville Road
adam@yggdrasil.com     \ /                  Milpitas, California 95035
+1 408 309-6081         | g g d r a s i l   United States of America
                         "Free Software For The Rest Of Us."

^ permalink raw reply	[flat|nested] 45+ messages in thread
* Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices
@ 2002-10-17  1:50 Adam J. Richter
  2002-10-17  9:08 ` Eric W. Biederman
  0 siblings, 1 reply; 45+ messages in thread
From: Adam J. Richter @ 2002-10-17  1:50 UTC (permalink / raw)
  To: ebiederm; +Cc: eblade, linux-kernel, mochel, rmk

Eric Beiderman wrote:
>[snip details on a lot of hot plug code]
>
>No major comment except that I know for compact pci you can just
>walk up and unplug it.  As we have several compact pci boards here
>at work.
>
>But thank you for the clarification that a only on badly designed
>hardware a newly plugged in device will

	From the previously referenced MindShare book, it looks like
CompactPCI has pins of different lengths to ensure that the interrupt
indicating card removal will never later than the other contacts being
broken, to avoid races in code like this:

		control = inb(dev->base + CONTROL_PORT);
		data = inb(dev->base + CONTROL_PORT);
		if (dev->was_removed)
			return -ENODEV;
		...

static void remove_interrupt(struct device *dev)
{
	dev->was_removed = 1;
	unref_hardware(dev);
}
static inline void unref_hardware(struct device *dev)
{
	/* hw_ref_count is:
		- set 1 when driver is attached
		- incremented when a device or network interface is opened
		- decremented when a device or network interface is closed
		- decremented when hardware is removed or driver is detached
	*/
	if (atomic_dec(&dev->hw_ref_count))
		(*dev->driver->removed)(dev);
}


>> 	Besides, you have not identified a safe way that a combined
>> ->remove() function can detect such situations more reliably than
>> separate ->quiet() and ->removed(), which at least have the benefit of
>> knowing what the kernel currently thinks the situation is.  So, you
>> really have no basis for saying "Splitting ->remove() into quiet() and
>> ->removed() will be racy."

>O.k. Let me attempt a clarification, of what I was thinking.
>Currently pseudo code for remove does:

>remove() {
>	if (device_present()) {
		XXXXXXXXXXXX
>		device_be_quiet()
>	}
>	device_free(device_strucutres);
>}

	The device could still be removed where I put the "XXXXXXXXXX"
or in the midst of device_be_quiet().  You appear to be trying to
narrow the window of vulnerability because you apparently don't know
how to eliminate it.  That's fine if you cannot eliminate it, but, for
almost all if not all busses designed for hot plugging I believe you
can eliminate it entirely.

	You should not just be bracketing code that attempts
transactions with a device inside of a test of the device's presence.
Instead, you should look understand the behavior that occurs when
device is gone and program for that.  For example, if removal of a
device that is mapped to IO and memory space (PCMCIA, PCI, etc.) an
initial notification interrupt followed by writes to the previously
mapped areas being ignored and reads returning garbage, then you can
write code like:

static int foodev_send(struct foodev *dev, dma_addr_t ptr, int len)
{
	outl(dev->dma_addr, ptr);
	outl(dev->send_len, len);
	outw(dev->control_port, GET_READY_TO_SEND);
	/* Notice we did not check if the device is present until now. */
	return (dev->device.is_removed) ? -ENODEV : 0;
}

	In general, as long as you know that the addresses that your
driver is using will not be reassigned until your driver releases
them, you can do operations and just be prepared to handle the defined
error that occurs when you send do a transaction to a non-existant
device.  The driver typically only has to check for the device's
presence for things like taking some externally visible action or is
in some kind of wait loop awaiting an event that won't occur if the
device has been removed.  This is true device not mapped to the memory
and IO busses as well, such as USB, FireWire or SCSI (SCSI devices
that support hot plugging have assignable addresses).


>The way I imagined the split up:
>if (device->present()) {
>        device->be_quiet();
>}
>device->free_resources();


	device->present() is usually not specific to a device, but
rather to the parent bus, and that usually involves just checking the
internal information that it already has as to whether it has received
some kind of interrupt.  More importantly, the generators of requests
to power down a device and notification that the device is removed are
usually separate and they usually one want one of those functions.
There is really very little reason to group them into the same
function.  Typically, the callers will look like this:


void
foobus_interrupt(void *arg;)
{
	struct foobus *foobus = arg;
	event = inb(foobus_intr_register) & EVENT_MASK;
	arg = inb(foobus_slow_register);
	switch(event) {
		case FOOBUS_REMOVE:
			slot = arg & SLOT_MASK;
			unref_hardware(foobus->devs[slot]);
			break;
		...
	}
	outb(foobus_intr_acknlowledge, 1);
}

foobus_powerdown(usb_device *dev)
{
	(*dev->device->driver->quiet)(dev);
}

>Doing device_be_quiet() without the device_present() check is racy,
>because devices can be physically removed at arbitrary times.

	Your code definitely is racy.  I can see how to write these
things without races, which is something that I don't think you really
understand.

	Engineers think about these things when they design busses for
hot plugging.  I notice that programmers often have trouble really
grasping that other people might be much better than them in a
particular skill area.  Here I think you are really assuming a too low
of a competence level among those who design these busses.  I'm not
asking you to assume no mistakes have been made in the support of hot
plugging in these busses, but it's clear that you're assuming that
obvious mistakes have been made, and that assumption doesn't check
out.  If you think there is a race in a particular bus, show me a
clear example.


[...]
>I cannot imagine freeing data structures will noticeably slow a reboot
>down, so unless we actually need to tell a device to be quiet for
>other reasons besides driver removal, and machine reboot I do not see
>a point in adding complexity by changing the interface.

	Changing the interface will reduce complexity as only the code
that needs to be executed will be called.  Since your current
->remove() function can potentially do blocking IO to turn off a
device, it can potentially take a long time.  Also, you apparently
did not read this part of my previous message to Eric Blade in this thread:

|	1. When something has gone wrong with a device driver, you generally
|	   gather what information you can and then try a warm reboot,
|	   especially when one is working remotely.  Data structures in
|	   device drivers can often corrupted under these circumstances.
|	   So it is important that warm reboots reboot the system as
|	   simply as they can with reliability.  That means only executing
|	   the special shutdown code that really needs to be executed.
|
|	2. For small platforms, I can about keeping the __devexit code out
|	   of the kernel footprint.  I don't want Linux to lose the
|	   competition small embedded devices (grated such decisions do not
|	   pivot on __devexit alone, but a cultrue of wastefulness leads
|	   creates bloat in steps such like this one).
|
|	3. I do care that reboot goes fast.  Optimizations for speed and
|	   space generally come about by making lots of little clean ups.
|	   The standard power on self-test takes ages on most PCs, but
|	   it doesn't have to be that way, and it isn't necessarily that
|	   way on other platforms.  I'd linux not to be an impedement
|	   to having systems where you don't even see the screen go black
|	   when you reboot.


>There may be
>a valid argument in the suspend to swap case, but I will cross that
>bridge when i come to it. 

>For my thinking on how this should be handled:

>Except in some very select special instances, I do not know
>of a single device where it is safe to assume the hardware is in a
>sane state at any random given moment when we decide to reboot.

	I don't know what you mean by "sane state."  If you mean that
it's in a state that the reboot code can handle, I think almost all
hardware typically is.  If that were not the case, warm reboots from
2.4.x would not work.

[...]
>For 2.5 calling remove on reboot may not fix any issues, but it is
>certainly a correct thing to do.

	Here you go again, trying to make a circular argument by
arguing a definition of "correct" instead of show concrete advantages.
You're just wasting your time and any other reader's when you do that.

>And even if it does not fix issues
>today, it allows bugs to be fixed, as the come up.  And it allows
>the code to be tested by just doing a modular build and inserting
>and removing the module.

	You are confused.  I never said that the module removal code
should not quiet the hardware that it is talking to.

Adam J. Richter     __     ______________   575 Oroville Road
adam@yggdrasil.com     \ /                  Milpitas, California 95035
+1 408 309-6081         | g g d r a s i l   United States of America
                         "Free Software For The Rest Of Us."
`h

^ permalink raw reply	[flat|nested] 45+ messages in thread
* Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices
@ 2002-10-20  7:01 Adam J. Richter
  2002-10-20  9:17 ` Eric W. Biederman
  0 siblings, 1 reply; 45+ messages in thread
From: Adam J. Richter @ 2002-10-20  7:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: ebiederm, eblade, mochel

Eric W. Biederman wrote:
>Hopefully I am not coming off harsh but I am a little cranky this
>morning,  As before this change I thought things on the device side
>were pretty much under control they just needed a little stabilization
>and testing.  And now someone possibly me has to go through every
>driver and write a shutdown method because someone figured calling
>free was expensive.

	I think I've essentially refuted this in parts of previous
messages on this thread that Eric has basically ignored in his
responses.

	Interested readers should check the previous lkml messages
with this subject line.  Of particular relevance to this issue, see my
list of three advantages of the 2.4.x approach of not calling
unnecessary device-specific shutdown sequences in my message at
http://marc.theaimsgroup.com/?l=linux-kernel&m=103481960911183&w=2,
where I pasted in a response on the same issue that I originally
wroted in this thread to Eric Blade (after the paragraph that begins
"Changing the interface will reduce coplexity as only the code that
needs to be executed will be called.").  Also of interest is Richard
B. Johnson's example of BIOS reset code that shuts off the PCI bus
before touching any RAM at
http://marc.theaimsgroup.com/?l=linux-kernel&m=103462697923792&w=2.

	I'm willing to get into this in detail again upon request.
Otherwise, I think it would be a better use of everyone's time not to
subject linux-kernel readers to an infinite loop.

Adam J. Richter     __     ______________   575 Oroville Road
adam@yggdrasil.com     \ /                  Milpitas, California 95035
+1 408 309-6081         | g g d r a s i l   United States of America
                         "Free Software For The Rest Of Us."

^ permalink raw reply	[flat|nested] 45+ messages in thread
* Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices
@ 2002-10-21 20:56 Adam J. Richter
  2002-10-22  4:28 ` Eric W. Biederman
  0 siblings, 1 reply; 45+ messages in thread
From: Adam J. Richter @ 2002-10-21 20:56 UTC (permalink / raw)
  To: ebiederm, mochel; +Cc: eblade, linux-kernel, rmk

Eric Biederman writes:
>My big concern is with getting the shutdown path setup in a manner
>that works, and gets testing.

	Rebooting without traversing the device tree seems to have
essentially worked fine for 2.4.x.

>When booting linux from linux with
>sys_kexec a lot of my problems come back to some device driver not
>getting shutdown.

	kmonte and sys_kexec skip the BIOS reset code and therefore
may need to do more elaborate shutdown, but please do not saddle the
normal reboot case with the reliability risk of calling code in each
driver when a user might be rebooting a remote machine precisely
because of a a confused device driver or the potential slow down
(especially since you want an interface where the function that gets
called before reboot may need to do blocking IO).  For
kmonte/sys_kexec, this high cost might be necessary, but for the
normal reboot the cost is not worth the benefit.

	By way, given the ability to register reboot notifiers in the
device tree, I would be happy to see one registered at the top of the
PCI bus tree (so it would be called last) that would shut off the PCI
bus before reboot, along the lines of what Richard B. Johnson posted.
That would not involve walking a lot of data structures in many
different device drivers and it would be just a few instrutions.

Adam J. Richter     __     ______________   575 Oroville Road
adam@yggdrasil.com     \ /                  Milpitas, California 95035
+1 408 309-6081         | g g d r a s i l   United States of America
                         "Free Software For The Rest Of Us."


^ permalink raw reply	[flat|nested] 45+ messages in thread
* Re: Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices
@ 2002-10-21 22:26 Adam J. Richter
  0 siblings, 0 replies; 45+ messages in thread
From: Adam J. Richter @ 2002-10-21 22:26 UTC (permalink / raw)
  To: ebiederm, mochel; +Cc: eblade, linux-kernel, rmk

I wrote to Eric Biederman:
>        kmonte and sys_kexec skip the BIOS reset code and therefore
>may need to do more elaborate shutdown, but please do not saddle the
>normal reboot case with [that].

	Let me add that I would be quite happy to see the common code
in sys_reboot moved into a shared exported routine that sys_kexec and
kmonte could use, so that they would automatically track changes to
that procedure rather than requiring maintenance of a duplicate copy
of that code.

Adam J. Richter     __     ______________   575 Oroville Road
adam@yggdrasil.com     \ /                  Milpitas, California 95035
+1 408 309-6081         | g g d r a s i l   United States of America
                         "Free Software For The Rest Of Us."



^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2002-10-22  4:24 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-10-13 19:24 Patch: linux-2.5.42/kernel/sys.c - warm reboot should not suspend devices Adam J. Richter
2002-10-13 19:51 ` Eric Blade
2002-10-13 21:27   ` Eric W. Biederman
2002-10-13 22:52 ` Andries Brouwer
2002-10-14  0:30   ` Eric W. Biederman
2002-10-13 22:14 Adam J. Richter
2002-10-13 22:31 ` Russell King
2002-10-13 23:49 ` Eric W. Biederman
2002-10-15 16:35 ` Patrick Mochel
2002-10-15 20:04   ` Mikael Pettersson
2002-10-19 18:30   ` Eric W. Biederman
2002-10-20  9:47     ` Eric W. Biederman
2002-10-13 23:10 Adam J. Richter
2002-10-13 23:15 ` Russell King
2002-10-14  0:03   ` Eric W. Biederman
2002-10-13 23:54 ` Eric W. Biederman
2002-10-13 23:59 Adam J. Richter
2002-10-14  0:07 ` Eric W. Biederman
2002-10-14  5:38   ` Eric Blade
2002-10-14 15:28     ` Eric W. Biederman
2002-10-15  4:34       ` Eric Blade
2002-10-14 15:25 Adam J. Richter
2002-10-14 16:44 ` Eric W. Biederman
2002-10-14 17:48   ` Richard B. Johnson
2002-10-14 19:28     ` Eric W. Biederman
2002-10-14 20:17       ` Richard B. Johnson
2002-10-14 18:41 Adam J. Richter
2002-10-14 20:05 ` Eric W. Biederman
2002-10-15  4:55   ` Eric Blade
2002-10-16  8:01 ` Pavel Machek
2002-10-15  2:53 Adam J. Richter
2002-10-15 16:59 ` Eric W. Biederman
2002-10-15 18:54 Adam J. Richter
2002-10-15 19:52 Adam J. Richter
2002-10-16 12:13 ` Eric W. Biederman
2002-10-17  1:50 Adam J. Richter
2002-10-17  9:08 ` Eric W. Biederman
2002-10-20  7:01 Adam J. Richter
2002-10-20  9:17 ` Eric W. Biederman
2002-10-20 20:43   ` Patrick Mochel
2002-10-20 23:57     ` Eric W. Biederman
2002-10-21 17:13       ` Patrick Mochel
2002-10-21 20:56 Adam J. Richter
2002-10-22  4:28 ` Eric W. Biederman
2002-10-21 22:26 Adam J. Richter

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).