* Re: Back to the future.
From: Bodo Eggert @ 2007-04-28 11:04 UTC
To: Pavel Machek, David Lang, Linus Torvalds, Nigel Cunningham, Pekka Enberg, LKML

Pavel Machek <pavel@ucw.cz> wrote:
>> I also don't like the idea of storing this in the swap partition for a
>> couple of reasons.
>>
>> 1. on many modern linux systems the swap partition is not large enough.
>>
>> for example, on my boxes with 16G of ram I only allocate 2G of swap
>> space
>
> WTF? So allocate larger swap partition. You just told me disks are big
> enough.

1) Repartitioning is sometimes not an option.
2) What happens if the swap space gets used?

I want to be sure I can suspend my {server,laptop} when the power is running out. Using swap is only an option for desktops.

>> 2. it's too easy for other things to stomp on your swap partition.
>>
>> for example: booting from a live CD that finds and uses swap
>> partitions
>
> That's a feature. If you are booting from live CD, you _want_ to erase
> any hibernation image.

NACK. You want to keep all partitions related to the hibernated system read-only. That's completely different from destroying all your unsaved data and possibly long-running tasks.
-- 
Top 100 things you don't want the sysadmin to say:
51. YEEEHA!!! What a CRASH!!!

Eat this, spammer: C@rzlmn.7eggert.dyndns.org D9GLNDg@Zk.7eggert.dyndns.org
* Back to the future.
From: Nigel Cunningham @ 2007-04-26 6:04 UTC
To: Linus Torvalds; +Cc: LKML

Hi again.

So - trying to get back to the original discussion - what (if anything) do you see as the way ahead? The options I can think of are (starting with things I can do):

1) I stop developing Suspend2, thereby pushing however many current Suspend2 users to move to [u]swsusp and seek to get that up to speed.

2) I quit my day job, see if Red Hat will take me full time and give me the time to start trying to merge Suspend2 bit by bit. Alternatively, days suddenly become 8 hours longer and I discover the boundless energy and alertness needed to do this too :). Ok. Not going to happen.

3) Someone else steps up to the plate and tries to merge Suspend2 one bit at a time.

4) uswsusp and swsusp get dropped and Suspend2 goes into mainline.

5) Everything gets dropped and we start from scratch.

6) The status quo - or some small variant of it - stays. Oh... I said "way ahead". I guess that rules this one out, even though I'll be very surprised if it's not the one that wins out.

7) Suspend2 gets merged and people get to choose which they like better. Nearly forgot this as a conceivable possibility. Yeah, I know you said you don't want it. I'm just trying to think of what might possibly happen.

N.
* Re: Back to the future.
From: Pekka Enberg @ 2007-04-26 7:28 UTC
To: nigel; +Cc: Linus Torvalds, LKML

On 4/26/07, Nigel Cunningham <nigel@nigel.suspend2.net> wrote:
> 3) Someone else steps up to the plate and tries to merge Suspend2 one
> bit at a time.

So which bits do we want to merge? For example, Suspend2's kernel/power/ui.c, kernel/power/compression.c, and kernel/power/encryption.c seem pointless now that we have uswsusp.

Furthermore, being the shameless Linus cheerleader that I am, I got the impression that we should fix the snapshot/shutdown logic in the kernel, which Suspend2 doesn't really address?

Pekka
* Re: Back to the future.
From: Nigel Cunningham @ 2007-04-26 7:42 UTC
To: Pekka Enberg; +Cc: Linus Torvalds, LKML

Hi.

On Thu, 2007-04-26 at 10:28 +0300, Pekka Enberg wrote:
> On 4/26/07, Nigel Cunningham <nigel@nigel.suspend2.net> wrote:
> > 3) Someone else steps up to the plate and tries to merge Suspend2 one
> > bit at a time.
>
> So which bits do we want to merge? For example, Suspend2
> kernel/power/ui.c, kernel/power/compression.c, and
> kernel/power/encryption.c seem pointless now that we have uswsusp.
> Furthermore, being the shameless Linus cheerleader that I am, I got
> the impression that we should fix the snapshot/shutdown logic in the
> kernel which Suspend2 doesn't really address?

I agree that the driver logic could be addressed too, but to answer your question...

* Doing things in the right order? (Prepare the image, then do the atomic copy, then save).
* Multithreaded I/O (might as well use multiple cores to compress the image, now that we're hotplugging later).
* Support for > 1 swap device.
* Support for ordinary files.
* Full image option.
* Modular design?

Regards,

Nigel
* Re: Back to the future.
From: Pekka Enberg @ 2007-04-26 8:17 UTC
To: nigel; +Cc: Linus Torvalds, LKML

Hi Nigel,

On 4/26/07, Nigel Cunningham <nigel@nigel.suspend2.net> wrote:
> * Doing things in the right order? (Prepare the image, then do the
> atomic copy, then save).

As I am a total newbie to the power management code, I am unable to spot the conceptual difference between uswsusp's suspend.c:suspend_system() and Suspend2's kernel/power/suspend.c:suspend_main(). How are they different?

On 4/26/07, Nigel Cunningham <nigel@nigel.suspend2.net> wrote:
> * Multithreaded I/O (might as well use multiple cores to compress the
> image, now that we're hotplugging later).

I assume this doesn't affect the kernel at all with uswsusp?

On 4/26/07, Nigel Cunningham <nigel@nigel.suspend2.net> wrote:
> * Modular design?

This is too broad. Please be more specific about the problems the current suspend and snapshot/shutdown code in the kernel has.

Now to add to your list: as far as I can tell, Suspend2 provides better feedback to the user via the netlink mechanism (although the kernel shouldn't be sending messages such as userui_redraw, but should instead let userspace know of the actual events, for example, that tasks have now been frozen). However, I am unsure if this is still relevant, as most of the work (snapshot writing) is being done in userspace, where we explicitly know when processes have been frozen, when the snapshot is finished, and when it's written to disk.

Pekka
* Re: Back to the future.
From: Nigel Cunningham @ 2007-04-26 9:28 UTC
To: Pekka Enberg; +Cc: Linus Torvalds, LKML

Hi.

On Thu, 2007-04-26 at 11:17 +0300, Pekka Enberg wrote:
> Hi Nigel,
>
> On 4/26/07, Nigel Cunningham <nigel@nigel.suspend2.net> wrote:
> > * Doing things in the right order? (Prepare the image, then do the
> > atomic copy, then save).
>
> As I am a total newbie to the power management code, I am unable to
> spot the conceptual difference in uswsusp suspend.c:suspend_system()
> and suspend2 kernel/power/suspend.c:suspend_main(). How are they
> different?

Will discuss in IRC since you've appeared there...

> On 4/26/07, Nigel Cunningham <nigel@nigel.suspend2.net> wrote:
> > * Multithreaded I/O (might as well use multiple cores to compress the
> > image, now that we're hotplugging later).
>
> I assume this doesn't affect the kernel at all with uswsusp?

Well, uswsusp would benefit from using multiple threads - if it can - to do the work. I saw quite an improvement from implementing it.

> On 4/26/07, Nigel Cunningham <nigel@nigel.suspend2.net> wrote:
> > * Modular design?
>
> This is too broad. Please be more specific about the problems the current
> suspend and snapshot/shutdown code in the kernel has.

Did you see the 'Reasons to merge' email I sent? It has more detail on this.

> Now to add to your list, as far as I can tell, suspend2 provides
> better feedback to the user via the netlink mechanism (although the
> kernel shouldn't be sending messages such as userui_redraw but instead
> let the userspace know of the actual events, for example, that tasks
> have now been frozen). However, I am unsure if this is still relevant
> as most of the work (snapshot writing) is being done in userspace
> where we explicitly know when processes have been frozen, when the
> snapshot is finished, and when it's written to disk.

From uswsusp's point of view, yeah. But I'm still coming from the 'doing this in kernelspace makes far more sense' perspective.

Regards,

Nigel
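Nigel's multithreaded I/O point rests on compression being CPU-bound work that splits cleanly into independent chunks. The following is a minimal sketch using POSIX threads. It assumes the image is already in a userspace buffer, as with uswsusp. The toy run-length encoder `rle_compress` stands in for a real compressor such as LZF, and the chunking scheme and helper names are illustrative, not taken from Suspend2 or uswsusp. Each thread compresses its own chunk into a private buffer, so no locking is needed, and the chunks can be written out in order after the joins.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a real compressor (e.g. LZF): emits (count, byte)
 * pairs with count <= 255, so output is at most 2 * len bytes. */
static size_t rle_compress(const unsigned char *in, size_t len,
                           unsigned char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len;) {
        unsigned char b = in[i];
        size_t run = 1;
        while (i + run < len && in[i + run] == b && run < 255)
            run++;
        out[o++] = (unsigned char)run;
        out[o++] = b;
        i += run;
    }
    assert(o <= 2 * len);
    return o;
}

struct job {
    const unsigned char *in;  /* this thread's chunk of the image */
    size_t len;               /* chunk length in bytes */
    unsigned char *out;       /* private output buffer (2 * len + 1) */
    size_t out_len;           /* compressed length, set by the worker */
};

static void *worker(void *arg)
{
    struct job *j = arg;
    j->out_len = rle_compress(j->in, j->len, j->out);
    return NULL;
}

/* Compress `image` as `nthreads` independent chunks. Returns the total
 * compressed size, or 0 if a buffer or thread could not be created. */
static size_t parallel_compress(const unsigned char *image, size_t len,
                                int nthreads)
{
    enum { MAX_THREADS = 64 };
    pthread_t tids[MAX_THREADS];
    struct job jobs[MAX_THREADS];
    size_t chunk, total = 0;
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return 0;
    chunk = (len + (size_t)nthreads - 1) / (size_t)nthreads;

    for (int i = 0; i < nthreads; i++) {
        size_t start = (size_t)i * chunk < len ? (size_t)i * chunk : len;
        size_t end = start + chunk < len ? start + chunk : len;
        jobs[i].in = image + start;
        jobs[i].len = end - start;
        jobs[i].out_len = 0;
        jobs[i].out = malloc(2 * jobs[i].len + 1);
        if (!jobs[i].out ||
            pthread_create(&tids[i], NULL, worker, &jobs[i]) != 0) {
            free(jobs[i].out);
            break;
        }
        started++;
    }

    /* Join in chunk order; a real writer would emit jobs[i].out here. */
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        total += jobs[i].out_len;
        free(jobs[i].out);
    }
    return started == nthreads ? total : 0;
}
```

One cost of this design is that a run crossing a chunk boundary is split in two, so chunked output can be slightly larger than single-threaded output.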
* Re: Back to the future.
From: Luca Tettamanti @ 2007-04-26 17:29 UTC
To: Nigel Cunningham; +Cc: Pekka Enberg, Linus Torvalds, linux-kernel

Nigel Cunningham <nigel@nigel.suspend2.net> wrote:
> On Thu, 2007-04-26 at 11:17 +0300, Pekka Enberg wrote:
>> On 4/26/07, Nigel Cunningham <nigel@nigel.suspend2.net> wrote:
>> > * Multithreaded I/O (might as well use multiple cores to compress the
>> > image, now that we're hotplugging later).
>>
>> I assume this doesn't affect the kernel at all with uswsusp?
>
> Well uswsusp would benefit from using multiple threads - if it can - to
> do the work. I saw quite an improvement from implementing it.

It's doable[1], but I'm not sure that the added complexity is worth it.

I'm surprised that you see a big improvement. I'd expect the image write to be bottlenecked by disk performance. On my PC (Core2, locked at 1.6GHz) lzf can compress 250-280MB/s; even an older CPU that manages 1/3 of that is still faster than the disk can handle.

Luca
[1] We could even use MPI to compress over a Beowulf cluster, it's userspace ;)
-- 
"Always remember that you are unique, just like everyone else."
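Luca's argument is that a compress-then-write pipeline runs at the speed of its slowest stage. The following rough model makes that explicit. It assumes perfect scaling across threads and defines the compression ratio as compressed size divided by original size, so the disk absorbs `disk_mb_s / ratio` MB/s of original image data. The 250 MB/s figure is Luca's. The 40 MB/s disk and the 0.5 and 0.3 ratios used below are made-up values for illustration, not measurements.

```c
#include <assert.h>

/* Rough model of a compress-then-write pipeline whose stages overlap.
 *   cpu_mb_s:  compression speed of one thread (MB/s of input)
 *   threads:   compression threads, assumed to scale perfectly
 *   disk_mb_s: sustained disk write speed (MB/s of compressed output)
 *   ratio:     compressed size / original size, must be > 0
 * Returns throughput in MB/s of original (uncompressed) image data. */
static double pipeline_rate(double cpu_mb_s, int threads, double disk_mb_s,
                            double ratio)
{
    assert(threads > 0 && ratio > 0.0);
    double compress = cpu_mb_s * threads;
    double disk = disk_mb_s / ratio;          /* input the disk can absorb */
    return compress < disk ? compress : disk; /* slowest stage wins */
}
```

Under these assumptions, `pipeline_rate(250.0, 1, 40.0, 0.5)` is 80 MB/s and stays at 80 with more threads, because the disk is the bottleneck (Luca's case). With an older CPU at a third of the speed and a better ratio of 0.3, the disk could absorb about 133 MB/s, so one thread (about 83 MB/s) is the bottleneck and a second thread helps (Nigel's case). Which observation holds depends on these numbers.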
* Re: Back to the future.
From: Linus Torvalds @ 2007-04-26 16:56 UTC
To: Nigel Cunningham; +Cc: Pekka Enberg, LKML

On Thu, 26 Apr 2007, Nigel Cunningham wrote:
>
> * Doing things in the right order? (Prepare the image, then do the
> atomic copy, then save).

I'd actually like to discuss this a bit..

I'm obviously not a huge fan of the whole user/kernel level split and interfaces, but I actually do think that there is *one* split that makes sense:

 - generate the (whole) snapshot image entirely inside the kernel

 - do nothing else (ie no IO at all), and just export it as a single image to user space (literally just mapping the pages into user space). *one* interface. None of the "pretty UI update" crap. Just a single system call:

	void *snapshot_system(u32 *size);

   which will map in the snapshot, return the mapped address and the size (and if you want to support snapshots > 4GB, be my guest, but I suspect you're actually *better* off just admitting that if you cannot shrink the snapshot to less than 32 bits, it's not worth doing)

User space gets a fully running system, with that one process having that one image mapped into its address space. It can then compress/write/do whatever to that snapshot.

You need one other system call, of course, which is

	int resume_snapshot(void *snapshot, u32 size);

and for testing, you should be able to basically do

	u32 size;
	void *buffer = snapshot_system(&size);
	if (buffer != MAP_FAILED)
		resume_snapshot(buffer, size);

and it should obviously work.

And btw, the device model changes are a big part of this. Because I don't think it's even remotely debuggable with the full suspend/resume of the devices being part of generating the image! That freeze/snapshot/unfreeze sequence is likely a lot more debuggable, if only because freeze/unfreeze is actually a no-op for most devices, and snapshotting is trivial too.

Once you have that snapshot image in user space you can do anything you want. And again: you'd have a fully working system: not any degradation *at*all*. If you're in X, then X will continue running etc even after the snapshotting, although obviously the snapshotting will have tried to page a lot of stuff out in order to make the snapshot smaller, so you'll likely be crawling.

> * Multithreaded I/O (might as well use multiple cores to compress the
> image, now that we're hotplugging later).
> * Support for > 1 swap device.
> * Support for ordinary files.
> * Full image option.
> * Modular design?

I'd really suggest _just_ the "full image". Nothing else is probably ever worth supporting.

Your "snapshot to disk" wouldn't be _quite_ as simple as "echo disk > /sys/power/state", but it should not necessarily be much worse than

	snapshot_kernel | gzip -9 > /dev/snapshot

either (and resuming from the snapshot would just be the reverse)!

And if you want to send the snapshot over a TCP connection to another host, be my guest. With pretty images while it's transferring. Whatever.

		Linus
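Linus's proposed split can be prototyped from the userspace side to check the calling convention. The following is a minimal sketch that assumes nothing about kernel internals. `snapshot_system()` and `resume_snapshot()` are the hypothetical system calls proposed in the mail. No such syscalls exist, so they are stubbed here with an anonymous mmap(2) to let the code compile and run. `write_image()` and `snapshot_to_fd()` are illustrative helpers for the `snapshot_kernel | gzip -9 > /dev/snapshot` pipeline. The stub keeps the MAP_FAILED convention from Linus's snippet and the 32-bit size he suggests.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

typedef uint32_t u32;

/* HYPOTHETICAL stand-in for the proposed snapshot_system(): maps a small
 * anonymous region filled with a marker byte. A real one would map the
 * kernel-generated image. */
static void *snapshot_system(u32 *size)
{
    const u32 len = 4096;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return MAP_FAILED;
    memset(p, 0xA5, len);
    *size = len;
    return p;
}

/* HYPOTHETICAL stand-in for resume_snapshot(): a real one would not
 * return on success. This one checks the marker and unmaps the image. */
static int resume_snapshot(void *snapshot, u32 size)
{
    const unsigned char *b = snapshot;
    for (u32 i = 0; i < size; i++)
        if (b[i] != 0xA5)
            return -1;
    return munmap(snapshot, size);
}

/* Write all len bytes to fd, retrying on short writes and EINTR, since a
 * pipe into gzip may accept only part of a large write. */
static int write_image(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Userspace side: snapshot, stream the image to fd, then (as in the
 * test sequence from the mail) resume immediately. */
static int snapshot_to_fd(int fd)
{
    u32 size;
    void *buffer = snapshot_system(&size);
    if (buffer == MAP_FAILED)
        return -1;
    if (write_image(fd, buffer, size) != 0) {
        munmap(buffer, size);
        return -1;
    }
    return resume_snapshot(buffer, size);
}
```

In a real `snapshot_kernel` tool, `snapshot_to_fd(STDOUT_FILENO)` would feed gzip. Compression and the choice of destination then stay entirely in userspace, which is the point of the split.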
* Re: Back to the future.
From: Xavier Bestel @ 2007-04-26 17:03 UTC
To: Linus Torvalds; +Cc: Nigel Cunningham, Pekka Enberg, LKML

On Thu, 2007-04-26 at 09:56 -0700, Linus Torvalds wrote:
> Once you have that snapshot image in user space you can do anything you
> want. And again: you'd have a fully working system: not any degradation
> *at*all*. If you're in X, then X will continue running etc even after the
> snapshotting

Won't there be problems if e.g. X tries to write something to its logfile after the snapshot?

Xav
* Re: Back to the future.
From: Linus Torvalds @ 2007-04-26 17:34 UTC
To: Xavier Bestel; +Cc: Nigel Cunningham, Pekka Enberg, LKML

On Thu, 26 Apr 2007, Xavier Bestel wrote:
>
> Won't there be problems if e.g. X tries to write something to its
> logfile after snapshot ?

Sure. But that's a user-level issue.

You do have to allow writing after snapshotting, since at a minimum, you'd want the snapshot itself to be written. So the kernel has to be fully running, and support full user space. No "degraded mode" like now.

So when I said "fully running user mode", I really meant it from the perspective of the kernel - not necessarily from the perspective of the "user". You do want to limit _what_ user mode does, but you must not limit it by making the kernel less capable.

Remounting mounted filesystems read-only sounds like a good idea, for example. We can do that. We have the technology. But we shouldn't limit user space from doing other things (for example, it might want to actually *mount* a new filesystem for writing the snapshot image).

For example, right now we try to "fix" that with the whole process freezer thing. And that code has *caused* more problems than it fixed, since it tries to freeze all the kernel threads etc, and you simply don't have a truly *working*system*.

I think it's fine to freeze processes if that is what you want to do (for example, send them SIGSTOP), but we freeze them *too*much* right now, and the suspend stuff has taken over policy way too much. We don't actually leave the system in a runnable state. I can almost guarantee that you'd be *better* off having the snapshot taking thing do a

	kill(-1, SIGSTOP);

in user space than our current broken process freezer. At least that wouldn't have screwed up those kernel threads as badly as swsusp did.

And no, I'm not saying that my suggestion is the only way to do it. Go wild. But the *current* situation is just broken. Three different things, none of which people can agree on. I'd *much* rather see a conceptually simpler approach that then required, but even more important is that right now people aren't even discussing alternatives, they're just pushing one of the three existing things, and that's simply not viable. Because I'm not merging another one.

In fact, I personally feel that I shouldn't even have merged userspace-swsusp, but if Andrew thinks it needs to be merged, my personal feelings simply don't matter that much. I have to trust people. But yes, as far as *I* am personally concerned, I think it was a mistake to merge it.

		Linus
* Re: Back to the future.
From: Nigel Cunningham @ 2007-04-26 20:08 UTC
To: Linus Torvalds; +Cc: Xavier Bestel, Pekka Enberg, LKML

Hi.

On Thu, 2007-04-26 at 10:34 -0700, Linus Torvalds wrote:
>
> On Thu, 26 Apr 2007, Xavier Bestel wrote:
> >
> > Won't there be problems if e.g. X tries to write something to its
> > logfile after snapshot ?
>
> Sure. But that's a user-level issue.
>
> You do have to allow writing after snapshotting, since at a minimum, you'd
> want the snapshot itself to be written. So the kernel has to be fully
> running, and support full user space. No "degraded mode" like now.

It doesn't need a fully functional userspace (unless you want to write to a FUSE device, and even then that could be worked around - make it like uswsusp or userui).... Can I diverge for a second and say that, from this point of view, FUSE is the lamest idea ever invented? Guaranteed to break your ability to suspend^Wsnapshot....

Anyhow, if the kernel has bmapped the pages it's going to write to beforehand, it knows where the image needs to go. No need for userspace at all.

> So when I said "fully running user mode", I really meant it from the
> perspective of the kernel - not necessarily from the perspective of the
> "user". You do want to limit _what_ user mode does, but you must not limit
> it by making the kernel less capable.
>
> Remounting mounted filesystems read-only sounds like a good idea, for
> example. We can do that. We have the technology. But we shouldn't limit
> user space from doing other things (for example, it might want to actually
> *mount* a new filesystem for writing the snapshot image).

We tried that. It would need some work. IIRC remounting filesystems read-only makes files become marked read-only. Perfectly sensible, except that if you then remount the filesystem rw at resume time, all those files are still marked ro and userspace crashes and burns. Not unfixable, I'll agree, but there is more work to do there.

As to the example, mounting a new filesystem for writing the snapshot image should probably be done before we do the snapshot. Then it won't be in danger of triggering anything that might require one of the other filesystems to be rw (eg syslog).

> For example, right now we try to "fix" that with the whole process freezer
> thing. And that code has *caused* more problems than it fixed, since it
> tries to freeze all the kernel threads etc, and you simply don't have a
> truly *working*system*.

Yes, it has been difficult. But so is bringing up a child.

> I think it's fine to freeze processes if that is what you want to do (for
> example, send them SIGSTOP), but we freeze them *too*much* right now, and
> the suspend stuff has taken over policy way too much. We don't actually
> leave the system in a runnable state. I can almost guarantee that you'd be
> *better* off having the snapshot taking thing do a
>
> 	kill(-1, SIGSTOP);
>
> in user space than our current broken process freezer. At least that
> wouldn't have screwed up those kernel threads as badly as swsusp did.

I don't think it's fair to blame swsusp there. Maybe CPU hotplugging...

> And no, I'm not saying that my suggestion is the only way to do it. Go
> wild. But the *current* situation is just broken. Three different things,
> none of which people can agree on. I'd *much* rather see a conceptually
> simpler approach that then required, but even more important is that right
> now people aren't even discussing alternatives, they're just pushing one
> of the three existing things, and that's simply not viable. Because I'm
> not merging another one.
>
> In fact, I personally feel that I shouldn't even have merged
> userspace-swsusp, but if Andrew thinks it needs to be merged, my personal
> feelings simply don't matter that much. I have to trust people. But yes,
> as far as *I* am personally concerned, I think it was a mistake to merge
> it.

Perhaps you should try to make an alternative yourself instead of pushing us into making something we don't believe will work (my case) or have already done but in a way you don't like (Rafael). Don't talk about Pavel cutting code. He's just acking/nacking what Rafael sends him.

Nigel
* Re: Back to the future.
From: Linus Torvalds @ 2007-04-26 20:45 UTC
To: Nigel Cunningham; +Cc: Xavier Bestel, Pekka Enberg, LKML

On Fri, 27 Apr 2007, Nigel Cunningham wrote:
>
> Perhaps you should try to make an alternative yourself instead of
> pushing us into making something we don't believe will work (my case) or
> have already done but in a way you don't like (Rafael). Don't talk about
> Pavel cutting code. He's just acking/nacking what Rafael sends him.

I've done that in the past (USB, PCMCIA - screw the maintainers, redo it basically from scratch). But the thing is, I'm totally uninterested personally in the whole disk-snapshotting, so I'm not likely to do it there.

But yes, I'm actually hoping that some new person will come in with a new idea. The current people seem to be too set in "their" corners, and I don't expect that to really change.

Quite honestly, I don't foresee any of the current three approaches really doing something new and obviously better, unless somebody new steps in.

		Linus
* Re: Back to the future.
From: Nigel Cunningham @ 2007-04-26 20:50 UTC
To: Linus Torvalds; +Cc: Xavier Bestel, Pekka Enberg, LKML

Hi.

On Thu, 2007-04-26 at 13:45 -0700, Linus Torvalds wrote:
>
> On Fri, 27 Apr 2007, Nigel Cunningham wrote:
> >
> > Perhaps you should try to make an alternative yourself instead of
> > pushing us into making something we don't believe will work (my case) or
> > have already done but in a way you don't like (Rafael). Don't talk about
> > Pavel cutting code. He's just acking/nacking what Rafael sends him.
>
> I've done that in the past (USB, PCMCIA - screw the maintainers, redo
> it basically from scratch). But the thing is, I'm totally uninterested
> personally in the whole disk-snapshotting, so I'm not likely to do it
> there.
>
> But yes, I'm actually hoping that some new person will come in with a new
> idea. The current people seem to be too set in "their" corners, and I
> don't expect that to really change.
>
> Quite honestly, I don't foresee any of the current three approaches really
> doing something new and obviously better, unless somebody new steps in.

That's because there is no other possibility. Sooner or later you have to do a snapshot, and somehow you have to save it. You're not going to get a new solution, just one that does those basic things in new and better ways.

I'm perfectly willing to think through some alternate approach if you suggest something or prod my thinking in a new direction, but I'm afraid I just can't see right now how we can achieve what you're after.

Nigel
* Re: Back to the future.
From: Olivier Galibert @ 2007-04-27 0:10 UTC
To: Nigel Cunningham; +Cc: Linus Torvalds, Xavier Bestel, Pekka Enberg, LKML

On Fri, Apr 27, 2007 at 06:50:56AM +1000, Nigel Cunningham wrote:
> I'm perfectly willing to think through some alternate approach if you
> suggest something or prod my thinking in a new direction, but I'm afraid
> I just can't see right now how we can achieve what you're after.

Ok, what about this approach I've been mulling about for a while:

Suspend-to-disk is pretty much an exercise in state saving. There are multiple ways to do state saving, but they tend to end up in two categories: implicit and explicit.

In implicit state saving, you try to save the state of the system/application/whatever "under its feet", more or less, and then fix up what is not saved/saveable correctly. A well-known example is the undumping process Emacs goes (went?) through, where it tries to dump the state of the memory as a new executable, with a lot of pleasure with various executable formats and subtleties due to side effects in libc code you don't control.

In explicit state saving, each object saves what is needed from its state to an independently defined format (instead of "whatever the memory organization happens to be at that point"). When reloading the state you have to parse it, and it usually requires rebuilding/relocating all references/pointers/etc. XEmacs currently has a "portable dumper" that pretty much does just that. We don't have any redumping problems anymore, they're over.

Which one is the best depends heavily on the application. The amount of code in the implicit case depends on the amount of fixups to do. In the kernel case it happens to be a lot: pretty much everything that touches hardware has to save the device state to memory and reload it on resume. And bugs in hardware handling can be quite annoying to debug. And if some driver does not do saving/resume correctly, you have no way, outside of playing with modules, to ensure the safety of the suspend cycle.

The amount of code in the explicit case is an interesting variable in the case of the kernel. You have to save what is needed, but how do you define what is needed? It is, pretty much, what running processes can observe from userspace. Now, what can a process observe:
- its application text and anonymous memory pages
- its file handles
- its mapped files
- its mapped whatever else
- its SysV IPC stuff
- futex stuff and friends, namespaces, etc
- its intrinsic characteristics it can reach through syscalls (i.e. the user-visible parts of current, like pid, uid...)
- its currently running system call, if any

So that's what we'd have to explicitly save. Anonymous memory, SysV IPC, futex and current structures, that's easy stuff in practice. The fun parts are pretty much:
- references to files
- references to active networking links
- references to devices and associated visible state
- currently running system call, aka the kernel stack for the process

The last one is the one I'm most afraid of. I hope that the signal stuff and/or the asynchronous syscall stuff that was discussed recently would allow us to "unwind" blocking system calls back to the syscall level and then store the parameters for resume-time restart. The non-blocking calls you can just let finish.

The first one is really interesting. If you value your filesystems, you'd rather have them clean after the suspend. And also you pretty much know that filesystems can move around when you're not looking, be it USB hotplug stuff (discovery order is random-ish, isn't it?), module loading order issues or multithreaded device discovery. So you're way happier *not* caching anything from the filesystem you can avoid.

But what is a file reference, really? With the dcache handy, it's pretty much a path, since inodes don't always exist reliably. And if you have the list of paths used by the processes on a particular filesystem, you can easily get an idea of where, if anywhere, the filesystem is, even if you don't have reliable serials. More interestingly, you cannot, in any case, instantly corrupt your filesystem by having a mismatch between the in-memory cache and the reality. The processes which referenced files you can't find anywhere will end up with EBADF or a segfault depending on whether it was an fd or an mmap, à la revoke(). They'll probably die horribly. I'd rather have processes die than filesystems die, since in any case if the file isn't there anymore, in practice the process could only destroy things.

An interesting thing there: nothing in that touches either the filesystem or the block devices. Everything is done at the VFS level. The devices don't need to care. And the "this filesystem goes there" step can be done in userspace in an initramfs if people want to experiment with kinky strategies. After all, why not allow a sysadmin to regroup two filesystems into one through a suspend? The processes mostly don't need to care (well, tar may, but heh). Deleted files would have to be sillyrenamed or something. Implementation details ;-)

Active networking links, you can consider them dead for a start. The networking guys can play with keepalives and stuff if they want to in a second step. Network seldom survives suspend anyway, too many timeouts involved, especially with dynamic IPs.

That leaves references to devices. null, ptys, random, log are not a problem, they're virtual constructs. As a first approximation you can revoke() the rest brutally. On a "standard" system that will kill X (ouch), GPM and other input-interested devices, and everything with an open sound device. Then you can add explicit state saving support to the devices you want, one by one. It may be possible to handle sound collectively at the ALSA layer level, I don't really know. Input shouldn't be too hard, not much state to save; X will be a pain and will probably need special casing. X is a big special case anyway, no matter what happens. For the less directly used devices you can always add explicit support when you feel like it.

The interesting part is that either the device supports the suspend and says so explicitly, or the process can't access the device anymore using the previous fds/mmaps after resume. No weird half-condition. If (very) resilient, the process can even close, reopen, reconfigure and go on its merry way. And if you design the saving format correctly (attribute name/value pairs as text work beautifully for such a case), you can be resilient to extreme things, including a kernel version change or rsync-ing / and the state file and resuming on another box. And if a device gets something it can't parse as the state to go back to for a given fd/mmap for a process, it can always revoke() that one and go on.

The main point of that kind of state saving is to be trustable by design. For each process, either its environment could be restored correctly or the incorrect parts cannot be accessed anymore. And the stability of the system and its filesystems is ensured pretty much whatever happens.

There are a billion details to take into account in a real implementation, but I'm sure you can get the gist of the idea.

  OG.
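Olivier's scheme saves each open file as a path plus whatever user-visible state goes with it, written as text name/value pairs. On resume, any descriptor whose state cannot be restored is revoked. The following is a minimal userspace sketch of that rule for a plain file. It assumes Linux procfs, where readlink(2) on /proc/self/fd/N yields the path, and a record with two invented attributes, `path` and `pos`. `save_fd` and `restore_fd` are hypothetical helpers, not kernel interfaces. Unknown attributes are skipped so that a newer writer does not break an older reader. A malformed record, or a file that cannot be reopened, yields -1 with errno set to EBADF, standing in for revoke(). A file deleted while open shows up from readlink with a " (deleted)" suffix, so it also lands in the revoke path. That is the case Olivier suggests handling with sillyrename.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Save an open descriptor as text "name=value" lines: the path it refers
 * to (read back from /proc/self/fd) and its current file offset.
 * Returns 0 on success, -1 on failure. Paths containing a newline are
 * not handled by this toy format. */
static int save_fd(int fd, char *out, size_t size)
{
    char link[64], path[4096];
    ssize_t n;
    off_t pos;

    assert(out != NULL && size > 0);
    snprintf(link, sizeof link, "/proc/self/fd/%d", fd);
    n = readlink(link, path, sizeof path - 1);  /* no NUL terminator */
    if (n < 0 || (size_t)n == sizeof path - 1)
        return -1;                              /* error, or maybe truncated */
    path[n] = '\0';
    pos = lseek(fd, 0, SEEK_CUR);
    if (pos < 0)
        return -1;
    int len = snprintf(out, size, "path=%s\npos=%lld\n", path, (long long)pos);
    return (len < 0 || (size_t)len >= size) ? -1 : 0;
}

/* Rebuild a descriptor from a saved record. Unknown attributes are
 * skipped; a malformed record or a file that cannot be reopened gives
 * -1 with errno = EBADF, standing in for a revoke()d descriptor. */
static int restore_fd(const char *rec, int flags)
{
    char line[4200], path[4096] = "";
    long long pos = -1;
    int fd;

    assert(rec != NULL);
    while (*rec) {
        size_t len = strcspn(rec, "\n");
        if (len >= sizeof line)
            goto revoke;
        memcpy(line, rec, len);
        line[len] = '\0';
        rec += len + (rec[len] == '\n');
        if (len == 0)
            continue;                           /* skip blank lines */

        char *eq = strchr(line, '=');
        if (!eq)
            goto revoke;                        /* not a name=value pair */
        *eq = '\0';
        const char *value = eq + 1;
        if (strcmp(line, "path") == 0) {
            size_t plen = strlen(value);
            if (plen >= sizeof path)
                goto revoke;
            memcpy(path, value, plen + 1);
        } else if (strcmp(line, "pos") == 0) {
            char *end;
            pos = strtoll(value, &end, 10);
            if (end == value || *end != '\0' || pos < 0)
                goto revoke;
        }                                       /* other names: ignored */
    }
    if (path[0] == '\0' || pos < 0)
        goto revoke;

    fd = open(path, flags);
    if (fd < 0)
        goto revoke;
    if (lseek(fd, (off_t)pos, SEEK_SET) < 0) {
        close(fd);
        goto revoke;
    }
    return fd;

revoke:
    errno = EBADF;
    return -1;
}
```

Nothing here touches the filesystem or block layer beyond ordinary open(2) and lseek(2). This mirrors Olivier's point that file references can be rebuilt purely at the VFS level.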
* Re: Back to the future. 2007-04-27 0:10 ` Olivier Galibert @ 2007-04-27 10:21 ` Daniel Pittman 2007-04-27 23:19 ` Nigel Cunningham 1 sibling, 0 replies; 135+ messages in thread From: Daniel Pittman @ 2007-04-27 10:21 UTC (permalink / raw) To: Olivier Galibert Cc: Nigel Cunningham, Linus Torvalds, Xavier Bestel, Pekka Enberg, LKML Olivier Galibert <galibert@pobox.com> writes: > On Fri, Apr 27, 2007 at 06:50:56AM +1000, Nigel Cunningham wrote: > >> I'm perfectly willing to think through some alternate approach if you >> suggest something or prod my thinking in a new direction, but I'm >> afraid I just can't see right now how we can achieve what you're >> after. > > Ok, what about this approach I've been mulling about for a while: > > Suspend-to-disk is pretty much an exercise in state saving. There are > multiple ways to do state saving, but they tend to end up in two > categories: implicit and explicit. [...] > In explicit state saving each object saves what is needed from its > state to an independently defined format (instead of "whatever the > memory organization happens to be at that point"). When reloading the > state you have to parse it, and it usually requires > rebuilding/relocating all references/pointers/etc. If you are looking seriously at this you might want to start with the code in the OpenVZ kernel (http://openvz.org) that allows a VE to "checkpoint" to disk and "restore" on the same or a different machine. This is, as far as I can tell, a portable implementation of this that already handles real live userspace applications moving transparently between two machines. It has the advantage that it lives in an orderly world where most devices and the file system are virtual but, hey, it works right now. Regards, Daniel -- Digital Infrastructure Solutions -- making IT simple, stable and secure Phone: 0401 155 707 email: contact@digital-infrastructure.com.au http://digital-infrastructure.com.au/ ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-27 0:10 ` Olivier Galibert 2007-04-27 10:21 ` Daniel Pittman @ 2007-04-27 23:19 ` Nigel Cunningham 1 sibling, 0 replies; 135+ messages in thread From: Nigel Cunningham @ 2007-04-27 23:19 UTC (permalink / raw) To: Olivier Galibert; +Cc: Linus Torvalds, Xavier Bestel, Pekka Enberg, LKML [-- Attachment #1: Type: text/plain, Size: 283 bytes --] Hi. Just to let you know - I'm not ignoring your message. It's just taking some time to think through the issues and try to formulate a good reply. Oh, and of course there are a gazillion other messages flying about at the moment that need attention too. Regards, Nigel [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-26 20:08 ` Nigel Cunningham 2007-04-26 20:45 ` Linus Torvalds @ 2007-04-26 21:38 ` Theodore Tso 2007-04-27 10:10 ` Christoph Hellwig 2007-04-26 22:08 ` Rafael J. Wysocki 2 siblings, 1 reply; 135+ messages in thread From: Theodore Tso @ 2007-04-26 21:38 UTC (permalink / raw) To: Nigel Cunningham; +Cc: Linus Torvalds, Xavier Bestel, Pekka Enberg, LKML On Fri, Apr 27, 2007 at 06:08:01AM +1000, Nigel Cunningham wrote: > We tried that. It would need some work. IIRC remounting filesystems > read-only makes files become marked read-only. Perfectly sensible, > except that if you then remount the filesystem rw at resume time, all > those files are still marked ro and userspace crashes and burns. Not > unfixable, I'll agree, but there is more work to do there. There are other solutions, though. One is that we could export a system call interface which freezes a filesystem and prevents any further I/O. We mostly have something like that right now (via the write_super_lockfs function in the superblock operations structure), but we haven't exported it to userspace. And right now not all filesystems support it, but in theory that could be fixed (or you only support suspend/resume if all filesystems support lockfs). We would also need a similar interface to freeze any block device I/O, in case you have a database running and doing direct I/O to a block device. (Or again, we could simply not support that case; how many people will be running a database accessing a block device on their laptop?) So in order to do this right, we would have to double the number of new interfaces needed from the two proposed by Linus --- which is why I think the userspace suspend solution is fundamentally NOT the right one. Rather the right one is the one which Linux ultimately used for PCMCIA, which is to do it all in the kernel. - Ted ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-26 21:38 ` Theodore Tso @ 2007-04-27 10:10 ` Christoph Hellwig 0 siblings, 0 replies; 135+ messages in thread From: Christoph Hellwig @ 2007-04-27 10:10 UTC (permalink / raw) To: Theodore Tso, Nigel Cunningham, Linus Torvalds, Xavier Bestel, Pekka Enberg, LKML On Thu, Apr 26, 2007 at 05:38:07PM -0400, Theodore Tso wrote: > On Fri, Apr 27, 2007 at 06:08:01AM +1000, Nigel Cunningham wrote: > > We tried that. It would need some work. IIRC remounting filesystems > > read-only makes files become marked read-only. Perfectly sensible, > > except that if you then remount the filesystem rw at resume time, all > > those files are still marked ro and userspace crashes and burns. Not > > unfixable, I'll agree, but there is more work to do there. > > There are other solutions, though. One is that we could export a > system call interface which freezes a filesystem and prevents any > further I/O. We mostly have something like that right now (via the > write_super_lockfs function in the superblock operations > structure), but we haven't exported it to userspace. It is exported on XFS ;-) > We would also need a similar interface to freeze any block device I/O, > in case you have a database running and doing direct I/O to a block > device. (Or again, we could simply not support that case; how many > people will be running a database accessing a block device on > their laptop?) Block device I/O uses generic_file*whateveriscurrenthere*_write, which checks for the freeze flag, so the infrastructure for that is there as well. ^ permalink raw reply [flat|nested] 135+ messages in thread
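The freeze mechanism Ted and Christoph discuss boils down to a flag that every write path checks before touching the filesystem. Below is a single-threaded toy model with hypothetical names, not kernel code. In the kernel the check makes the writer wait until the filesystem is thawed; this sketch returns -EAGAIN instead so that it needs no threads.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy model of a superblock carrying a freeze flag (not kernel code). */
struct toy_sb {
    int frozen;          /* nonzero while the filesystem is frozen */
    char data[64];       /* stand-in for the on-disk contents */
    size_t used;
};

void toy_freeze(struct toy_sb *sb) { sb->frozen = 1; }
void toy_thaw(struct toy_sb *sb)   { sb->frozen = 0; }

/*
 * Every write path consults the flag first, so nothing can reach the
 * "disk" while it is frozen. Returns bytes written or a negative errno.
 */
long toy_write(struct toy_sb *sb, const char *buf, size_t len)
{
    assert(sb != NULL && buf != NULL);
    if (sb->frozen)
        return -EAGAIN;  /* the kernel would sleep here until thaw */
    if (len > sizeof(sb->data) - sb->used)
        return -ENOSPC;
    memcpy(sb->data + sb->used, buf, len);
    sb->used += len;
    return (long)len;
}
```

The design point is that the check sits in the shared write path, which is why Christoph can say block device writes already honour the flag without a separate interface.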
* Re: Back to the future. 2007-04-26 20:08 ` Nigel Cunningham 2007-04-26 20:45 ` Linus Torvalds 2007-04-26 21:38 ` Theodore Tso @ 2007-04-26 22:08 ` Rafael J. Wysocki 2007-04-26 22:20 ` Nigel Cunningham 2007-04-26 23:15 ` Linus Torvalds 2 siblings, 2 replies; 135+ messages in thread From: Rafael J. Wysocki @ 2007-04-26 22:08 UTC (permalink / raw) To: nigel, Linus Torvalds, Andrew Morton Cc: Xavier Bestel, Pekka Enberg, LKML, Pavel Machek On Thursday, 26 April 2007 22:08, Nigel Cunningham wrote: [--snip--] > > And no, I'm not saying that my suggestion is the only way to do it. Go > > wild. But the *current* situation is just broken. Three different things, > > none of which people can agree on. I'd *much* rather see a conceptually > > simpler approach that then required, but even more important is that right > > now people aren't even discussing alternatives, they're just pushing one > > of the three existing things, and that's simply not viable. Because I'm > > not merging another one. > > > > In fact, I personally feel that I shouldn't even have merged > > userspace-swsusp, but if Andrew thinks it needs to be merged, my personal > > feelings simply don't matter that much. I have to trust people. But yes, > > as far as *I* am personally concerned, I think it was a mistake to merge > > it. > > Perhaps you should try to make an alternative yourself instead of > pushing us into making something we don't believe will work (my case) or > have already done but in a way you don't like (Rafael). Don't talk about > Pavel cutting code. He's just acking/nacking what Rafael sends him. Well, I think that much of what Linus is saying indicates that he hasn't tried to write any such thing himself. ;-) Anyway, I'm tired of all this thing. Really. I've just been trying to make things _work_ more-or-less reliably in a way that Pavel liked and I really didn't know that much about the kernel when I started. 
In fact, I started as a user who needed certain functionality from the kernel and that was not there at the time. I've made some mistakes because of that (like the definitions of the ioctl numbers in suspend.h - this was just a rookie mistake, and I'm ashamed of it, but _nobody_ caught it, although I believe many people were looking at the patch). Now that I know much more than before, I can say I agree with Linus on his opinion about the separation of s2ram from the snapshot/restore functionality (I'll call it 'hibernation' for simplicity from now on). It should be done, because it would make things simpler and cleaner. Still, it will be difficult to do without screwing users en masse and that's my main concern here. I don't agree that we don't need the tasks freezer for suspending and hibernation. We need it, because we need to be sure that the (other) tasks will not get us in the way, and that also applies to kernel threads (and I don't think the tasks freezer is 'screwing' them, BTW). I agree that the userland interface for swsusp is not very nice and I'm going to do my best to clean that up. I hope that someone will help me, but if not, then that's fine. OTOH, it's difficult, if not impossible, to do a userland-driven hibernation in a completely clean way. I've tried that and I'm not exactly satisfied with the result, although it works and some distros use it. I wouldn't have done it again, but then I'm going to support the existing users, as I promised. Now, I think that the hibernation should better be done completely in the kernel, because that's just conceptually simpler, although some data exchange with the user land may be acceptable for some optional fancy stuff. I'm also tired of the endless "to merge or not to merge suspend2" discussions that just lead to nowhere.
For these reasons I declare that I'm ready to cooperate with Nigel on integrating as much of suspend2 as reasonably possible into the existing infrastructure, under the following conditions: - we don't remove the existing user-visible interfaces - we work on one piece of code at a time - we avoid code duplication, as much as possible - we avoid using open-coded things, if possible - if we don't agree on something, we ask someone wiser (volunteers welcome ;-)) If that's acceptable, we can start tomorrow. In the process, we can try to separate the hibernation code paths from the s2ram ones, but that will require a lot of knowledge about things that neither me nor Nigel, AFAICT, are very familiar with, like writing device drivers. Greetings, Rafael ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-26 22:08 ` Rafael J. Wysocki @ 2007-04-26 22:20 ` Nigel Cunningham 2007-04-26 23:15 ` Linus Torvalds 1 sibling, 0 replies; 135+ messages in thread From: Nigel Cunningham @ 2007-04-26 22:20 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linus Torvalds, Andrew Morton, Xavier Bestel, Pekka Enberg, LKML, Pavel Machek [-- Attachment #1: Type: text/plain, Size: 5439 bytes --] Hi Rafael. On Fri, 2007-04-27 at 00:08 +0200, Rafael J. Wysocki wrote: > On Thursday, 26 April 2007 22:08, Nigel Cunningham wrote: > [--snip--] > > > And no, I'm not saying that my suggestion is the only way to do it. Go > > > wild. But the *current* situation is just broken. Three different things, > > > none of which people can agree on. I'd *much* rather see a conceptually > > > simpler approach that then required, but even more important is that right > > > now people aren't even discussing alternatives, they're just pushing one > > > of the three existing things, and that's simply not viable. Because I'm > > > not merging another one. > > > > > > In fact, I personally feel that I shouldn't even have merged > > > userspace-swsusp, but if Andrew thinks it needs to be merged, my personal > > > feelings simply don't matter that much. I have to trust people. But yes, > > > as far as *I* am personally concerned, I think it was a mistake to merge > > > it. > > > > Perhaps you should try to make an alternative yourself instead of > > pushing us into making something we don't believe will work (my case) or > > have already done but in a way you don't like (Rafael). Don't talk about > > Pavel cutting code. He's just acking/nacking what Rafael sends him. > > Well, I think that much of what Linus is saying indicates that he hasn't tried > to write any such thing himself. ;-) > > Anyway, I'm tired of all this thing. Really. 
I've just been trying to make > things _work_ more-or-less reliably in a way that Pavel liked and I really > didn't know that much about the kernel when I started. In fact, I started as a > user who needed certain functionality from the kernel and that was not there > at the time. I've made some mistakes because of that (like the definitions of > the ioctl numbers in suspend.h - this was just a rookie mistake, and I'm > ashamed of it, but _nobody_ caught it, although I believe many people were > looking at the patch). > > Now that I know much more than before, I can say I agree with Linus on his > opinion about the separation of s2ram from the snapshot/restore functionality > (I'll call it 'hibernation' for simplicity from now on). It should be done, > because it would make things simpler and cleaner. Still, it will be difficult > to do without screwing users en masse and that's my main concern here. > > I don't agree that we don't need the tasks freezer for suspending and > hibernation. We need it, because we need to be sure that the (other) tasks > will not get us in the way, and that also applies to kernel threads (and I > don't think the tasks freezer is 'screwing' them, BTW). > > I agree that the userland interface for swsusp is not very nice and I'm going > to do my best to clean that up. I hope that someone will help me, but if not, > then that's fine. OTOH, it's difficult, if not impossible, to do a > userland-driven hibernation in a completely clean way. I've tried that and I'm > not exactly satisfied with the result, although it works and some distros use > it. I wouldn't have done it again, but then I'm going to support the existing > users, as I promised. > > Now, I think that the hibernation should better be done completely in the > kernel, because that's just conceptually simpler, although some data exchange > with the user land may be acceptable for some optional fancy stuff.
I'm also > tired of the endless "to merge or not to merge suspend2" discussions that just > lead to nowhere. For these reasons I declare that I'm ready to cooperate with > Nigel on integrating as much of suspend2 as reasonably possible into the > existing infrastructure, under the following conditions: > - we don't remove the existing user-visible interfaces I don't want to remove user-visible interfaces either (I understand that you mean the ioctls by that?). Perhaps we can find a way to make them still usable with a more in-kernel solution (ie some things become noops?). > - we work on one piece of code at a time Sure. We should spend some time discussing and planning beforehand so we don't waste time and effort writing and rewriting. > - we avoid code duplication, as much as possible No problem there. > - we avoid using open-coded things, if possible Regarding open-coded things, I assume you're referring to the extents. I would argue that they're not open-coded because list.h implements doubly linked lists, and extents use a singly linked list. That said, I suppose we could make the extents doubly linked and use list.h, even though that would be a waste of 4/8 bytes per extent. > - if we don't agree on something, we ask someone wiser (volunteers welcome ;-)) Absolutely! > If that's acceptable, we can start tomorrow. In the process, we can try to > separate the hibernation code paths from the s2ram ones, but that will require > a lot of knowledge about things that neither me nor Nigel, AFAICT, are very > familiar with, like writing device drivers. Yes. Thanks for this email. It's really encouraging, and I'm more than glad to work with you. Unfortunately, as you've seen me keep saying already, I have very limited time to work on this. Thankfully you seem to have more, and Pekka has also stepped up to help, so maybe we can make good forward progress despite my limitations.
Regards, Nigel [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 135+ messages in thread
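The trade-off Nigel mentions for extents is one pointer per node. Threading an extent on a doubly linked list (the kernel's struct list_head, mirrored below as a local two-pointer struct) costs one more pointer than a singly linked next field: 4 bytes on 32-bit and 8 bytes on 64-bit. The extent layout here is a hypothetical illustration, not suspend2's actual definition.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for the kernel's struct list_head: two pointers. */
struct list_head {
    struct list_head *next, *prev;
};

/* Singly linked extent: one link pointer per node (illustrative layout). */
struct extent_single {
    unsigned long start, end;          /* inclusive page range */
    struct extent_single *next;
};

/* The same extent threaded on a doubly linked list_head. */
struct extent_double {
    unsigned long start, end;
    struct list_head list;
};

/* Extra bytes per extent paid for the doubly linked form. */
size_t extent_link_overhead(void)
{
    return sizeof(struct extent_double) - sizeof(struct extent_single);
}

/* Walking a singly linked chain forward needs nothing more than ->next. */
unsigned long extent_chain_pages(const struct extent_single *e)
{
    unsigned long total = 0;

    for (; e != NULL; e = e->next) {
        assert(e->end >= e->start);
        total += e->end - e->start + 1;
    }
    return total;
}
```

If extents are only ever walked front to back, the prev pointer buys nothing, which is the substance of Nigel's argument; list.h would mainly bring shared, reviewed helpers.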
* Re: Back to the future. 2007-04-26 22:08 ` Rafael J. Wysocki 2007-04-26 22:20 ` Nigel Cunningham @ 2007-04-26 23:15 ` Linus Torvalds 1 sibling, 0 replies; 135+ messages in thread From: Linus Torvalds @ 2007-04-26 23:15 UTC (permalink / raw) To: Rafael J. Wysocki Cc: nigel, Andrew Morton, Xavier Bestel, Pekka Enberg, LKML, Pavel Machek On Fri, 27 Apr 2007, Rafael J. Wysocki wrote: > > Well, I think that much of what Linus is saying indicates that he hasn't tried > to write any such thing himself. ;-) That's definitely true. The only interaction I ever had with "hibernation" (and yes, we should just call it that) is when I was working on s2ram and cleaning up the PCI device suspend/resume in particular, and trying (_mostly_ successfully - I think I broke it once or twice mainly due to interactions with the console, but on the whole I think it mostly worked) to not break hibernation in the process without actually running it. > Now that I know much more than before, I can say I agree with Linus on his > opinion about the separation of s2ram from the snapshot/restore functionality > (I'll call it 'hibernation' for simplicity from now on). So my strong opinion on it literally comes from the other end (ie _not_ knowing about hibernation, but trying to work with s2ram, and cursing the mixups). > It should be done, because it would make things simpler and cleaner. > Still, it will be difficult to do without screwing users en masse and > that's my main concern here. I do agree. It will inevitably affect a lot of devices. That's always painful. > I don't agree that we don't need the tasks freezer for suspending and > hibernation. We need it, because we need to be sure that the (other) tasks > will not get us in the way, and that also applies to kernel threads (and I > don't think the tasks freezer is 'screwing' them, BTW).
I actually feel much less strongly about that, because just separating out s2ram and hibernate entirely from each other would already really get the thing _I_ care about taken care of - being able to work on one or the other without fear of breaking the other one. And besides, I actually came into the whole discussion because I'm not a huge fan of thinking that user-land is "better". If the thing can sanely be done in kernel, I'm actually all for that. What drives me wild is having three different things, and nobody driving. It needs somebody who (a) cares (b) has good taste and (c) has enough time and personal karma to burn that he can actually take the (obviously) inevitable heat from just doing things right, and convincing people to select *one* implementation. That kind of person is really really hard to find. And if you're it, you're in for some pain ;) Linus ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-26 17:34 ` Linus Torvalds 2007-04-26 20:08 ` Nigel Cunningham @ 2007-04-27 7:51 ` Pekka Enberg 1 sibling, 0 replies; 135+ messages in thread From: Pekka Enberg @ 2007-04-27 7:51 UTC (permalink / raw) To: Linus Torvalds; +Cc: Xavier Bestel, Nigel Cunningham, LKML On 4/26/07, Linus Torvalds <torvalds@linux-foundation.org> wrote: > In fact, I personally feel that I shouldn't even have merged > userspace-swsusp, but if Andrew thinks it needs to be merged, my personal > feelings simply don't matter that much. I have to trust people. But yes, > as far as *I* am personally concerned, I think it was a mistake to merge > it. While the ioctl() interface is horrid, I think it's actually in principle pretty close to your snapshot_system()/resume_snapshot(). The ugliness probably comes from the fact that suspend to RAM and snapshot/shutdown are interleaved there too. ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-26 16:56 ` Linus Torvalds 2007-04-26 17:03 ` Xavier Bestel @ 2007-04-26 17:07 ` Linus Torvalds 2007-04-26 18:22 ` Chase Venters ` (4 subsequent siblings) 6 siblings, 0 replies; 135+ messages in thread From: Linus Torvalds @ 2007-04-26 17:07 UTC (permalink / raw) To: Nigel Cunningham; +Cc: Pekka Enberg, LKML On Thu, 26 Apr 2007, Linus Torvalds wrote: > > Once you have that snapshot image in user space you can do anything you > want. Side note: the exception, of course, is paging out more. The swap device has to be read-only. We actually have support for that mode (it's how "swapoff" works: it marks swap devices as not accepting _new_ entries, even though old entries are still valid). So you can have a fully running system, with 99% of memory swapped out, and still guarantee that you won't swap out anything *more* (which would destroy the swap image, which you don't want, since it's where a lot of the memory may end up being, in order to make the snapshot itself as small as possible)! Anybody who cares can look at the code that messes with the SWP_WRITEOK flag. You'd basically swap out enough to make the snapshot image fit comfortably in memory, and then you'd clear SWP_WRITEOK on all swap devices and return to user space. Or something very close to that. But the point here is that we should actually really be able to have a fully working system, even _after_ we created the snapshot. I don't even think you should need any "initrd only" kind of situation. If somebody can do that, with just those two system calls, I'll remove every other suspend-to-disk wannabe from the kernel in a heartbeat. I may have missed something subtle, of course, but I really *think* it should be doable. Linus ^ permalink raw reply [flat|nested] 135+ messages in thread
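The read-only swap mode Linus describes can be modelled in a few lines: once the write-OK flag is cleared, allocating new swap slots fails, while entries that are already on the device can still be read back. This is a toy model with hypothetical names; TOY_SWP_WRITEOK only borrows its meaning from the kernel's SWP_WRITEOK, and its value here is arbitrary.

```c
#include <assert.h>

#define NSLOTS 8
#define TOY_SWP_WRITEOK 1u   /* modelled on SWP_WRITEOK; value is arbitrary */

struct toy_swap_dev {
    unsigned flags;
    int used[NSLOTS];        /* nonzero: slot holds a swapped-out page */
    int contents[NSLOTS];    /* stand-in for page data */
};

/* Swap a page out: only allowed while the device accepts new entries. */
int toy_swap_out(struct toy_swap_dev *d, int page)
{
    if (!(d->flags & TOY_SWP_WRITEOK))
        return -1;           /* read-only: would overwrite the image */
    for (int i = 0; i < NSLOTS; i++) {
        if (!d->used[i]) {
            d->used[i] = 1;
            d->contents[i] = page;
            return i;
        }
    }
    return -1;               /* device full */
}

/* Swap a page back in: existing entries stay valid regardless of the flag. */
int toy_swap_in(const struct toy_swap_dev *d, int slot, int *page)
{
    if (slot < 0 || slot >= NSLOTS || !d->used[slot])
        return -1;
    *page = d->contents[slot];
    return 0;
}
```

In the sequence described above, the snapshot code would swap out until the image fits, clear the flag on every device, and only then hand control back to user space.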
* Re: Back to the future. 2007-04-26 16:56 ` Linus Torvalds 2007-04-26 17:03 ` Xavier Bestel 2007-04-26 17:07 ` Linus Torvalds @ 2007-04-26 18:22 ` Chase Venters 2007-04-26 18:50 ` David Lang 2007-04-26 19:56 ` Nigel Cunningham ` (3 subsequent siblings) 6 siblings, 1 reply; 135+ messages in thread From: Chase Venters @ 2007-04-26 18:22 UTC (permalink / raw) To: Linus Torvalds; +Cc: Nigel Cunningham, Pekka Enberg, LKML On Thu, 26 Apr 2007, Linus Torvalds wrote: > > Once you have that snapshot image in user space you can do anything you > want. And again: you'd have a fully working system: not any degradation > *at*all*. If you're in X, then X will continue running etc even after the > snapshotting, although obviously the snapshotting will have tried to page > a lot of stuff out in order to make the snapshot smaller, so you'll likely > be crawling. > In fact... If you're just paging out to make a smaller snapshot (ie, not to free up memory), couldn't you just swap it out (if it's not backed by a file) then mark it as "half-released"... ie, the snapshot writing code ignores it knowing that it will be available on disk at resume, but then when the snapshot is complete it's still available in physical RAM, preventing user-space from crawling due to the necessity of paging it all back in? Thanks, Chase ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-26 18:22 ` Chase Venters @ 2007-04-26 18:50 ` David Lang 0 siblings, 0 replies; 135+ messages in thread From: David Lang @ 2007-04-26 18:50 UTC (permalink / raw) To: Chase Venters; +Cc: Linus Torvalds, Nigel Cunningham, Pekka Enberg, LKML On Thu, 26 Apr 2007, Chase Venters wrote: > On Thu, 26 Apr 2007, Linus Torvalds wrote: > >> >> Once you have that snapshot image in user space you can do anything you >> want. And again: you'd have a fully working system: not any degradation >> *at*all*. If you're in X, then X will continue running etc even after the >> snapshotting, although obviously the snapshotting will have tried to page >> a lot of stuff out in order to make the snapshot smaller, so you'll likely >> be crawling. >> > > In fact... If you're just paging out to make a smaller snapshot (ie, not > to free up memory), couldn't you just swap it out (if it's not backed by a > file) then mark it as "half-released"... ie, the snapshot writing code > ignores it knowing that it will be available on disk at resume, but then > when the snapshot is complete it's still available in physical RAM, > preventing user-space from crawling due to the necessity of paging it all > back in? Your swap space may end up being reused before you restore from suspend-to-disk. David Lang ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-26 16:56 ` Linus Torvalds ` (2 preceding siblings ...) 2007-04-26 18:22 ` Chase Venters @ 2007-04-26 19:56 ` Nigel Cunningham 2007-04-27 4:52 ` Pekka J Enberg 2007-04-28 19:09 ` Bill Davidsen 2007-04-26 22:40 ` Pavel Machek ` (2 subsequent siblings) 6 siblings, 2 replies; 135+ messages in thread From: Nigel Cunningham @ 2007-04-26 19:56 UTC (permalink / raw) To: Linus Torvalds; +Cc: Pekka Enberg, LKML [-- Attachment #1: Type: text/plain, Size: 3879 bytes --] Hi. On Thu, 2007-04-26 at 09:56 -0700, Linus Torvalds wrote: > > On Thu, 26 Apr 2007, Nigel Cunningham wrote: > > > > * Doing things in the right order? (Prepare the image, then do the > > atomic copy, then save). > > I'd actually like to discuss this a bit.. > > I'm obviously not a huge fan of the whole user/kernel level split and > interfaces, but I actually do think that there is *one* split that makes > sense: > > - generate the (whole) snapshot image entirely inside the kernel > > - do nothing else (ie no IO at all), and just export it as a single image > to user space (literally just mapping the pages into user space). > *one* interface. None of the "pretty UI update" crap. Just a single > system call: > > void *snapshot_system(u32 *size); > > which will map in the snapshot, return the mapped address and the size > (and if you want to support snapshots > 4GB, be my guest, but I suspect > you're actually *better* off just admitting that if you cannot shrink > the snapshot to less than 32 bits, it's not worth doing) That inherently limits the image to half of available ram (you need somewhere to store the snapshot), so you won't get the full image you express interest in below. > User space gets a fully running system, with that one process having that > one image mapped into its address space. It can then compress/write/do > whatever to that snapshot. You're describing uswsusp! (At least in so far as I understand it!). 
You can't get a fully running system, though, because if anything changes something on disk that was snapshotted (superblocks etc.), your snapshot is invalid and you risk on-disk corruption. > And btw, the device model changes are a big part of this. Because I don't > think it's even remotely debuggable with the full suspend/resume of the > devices being part of generating the image! That freeze/snapshot/unfreeze > sequence is likely a lot more debuggable, if only because freeze/unfreeze > is actually a no-op for most devices, and snapshotting is trivial too. > > Once you have that snapshot image in user space you can do anything you > want. And again: you'd have a fully working system: not any degradation > *at*all*. If you're in X, then X will continue running etc even after the > snapshotting, although obviously the snapshotting will have tried to page > a lot of stuff out in order to make the snapshot smaller, so you'll likely > be crawling. Nooooooo! See above about disk corruption. > > * Multithreaded I/O (might as well use multiple cores to compress the > > image, now that we're hotplugging later). > > * Support for > 1 swap device. > > * Support for ordinary files. > > * Full image option. > > * Modular design? > > I'd really suggest _just_ the "full image". Nothing else is probably ever > worth supporting. Your "snapshot to disk" wouldn't be _quite_ as simple as > "echo disk > /sys/power/state", but it should not necessarily be much > worse than Please, go apply that logic elsewhere, then cut out (or at least stop adding) support for users with less common needs in other areas. I fully acknowledge that most users have only one place to store their image and it's a swap device. But that doesn't mean one size fits all. A full image implies that you need to figure out what's not going to change while you're writing it and save that separately. At the moment, I'm treating most of the LRU contents as that list.
If we're going to start trying to let every man and his dog run while we're trying to snapshot the system, that's not going to work anymore - or the logic will get a lot more complicated. Sorry. I never thought I'd say this, but I think you're being naive about how simple the process of snapshotting a system is. Regards, Nigel [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 135+ messages in thread
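Linus's proposed interface, void *snapshot_system(u32 *size);, leaves all image I/O to user space. A minimal sketch of that user-space side follows, assuming a caller-supplied buffer stands in for the mapped snapshot; the system call itself is only the proposal above, and the helper names are hypothetical. It shows the two practical consequences: a 32-bit size caps the image just below 4 GiB, and the writer has to cope with short writes and EINTR.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/*
 * Proposed (not existing) call: void *snapshot_system(u32 *size);
 * A 32-bit size describes at most UINT32_MAX bytes, just under 4 GiB.
 */
int image_size_fits_u32(uint64_t bytes)
{
    return bytes <= UINT32_MAX;
}

/* Write an in-memory image to fd, retrying short writes and EINTR. */
int write_image(int fd, const void *image, uint32_t size)
{
    const unsigned char *p = image;
    size_t left = size;

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted before writing: retry */
            return -1;           /* real error: caller aborts the suspend */
        }
        p += n;
        left -= (size_t)n;
    }
    return 0;
}
```

Compression, encryption or writing to several devices would all slot in between the mapping and write_image(), which is the flexibility the single-call split is meant to buy.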
* Re: Back to the future. 2007-04-26 19:56 ` Nigel Cunningham @ 2007-04-27 4:52 ` Pekka J Enberg 2007-04-27 6:08 ` Nigel Cunningham 2007-04-27 20:44 ` Rafael J. Wysocki 2007-04-28 19:09 ` Bill Davidsen 1 sibling, 2 replies; 135+ messages in thread From: Pekka J Enberg @ 2007-04-27 4:52 UTC (permalink / raw) To: Nigel Cunningham; +Cc: Linus Torvalds, LKML On Thu, 2007-04-26 at 09:56 -0700, Linus Torvalds wrote: > > which will map in the snapshot, return the mapped address and the size > > (and if you want to support snapshots > 4GB, be my guest, but I suspect > > you're actually *better* off just admitting that if you cannot shrink > > the snapshot to less than 32 bits, it's not worth doing) On Fri, 27 Apr 2007, Nigel Cunningham wrote: > That inherently limits the image to half of available ram (you need > somewhere to store the snapshot), so you won't get the full image you > express interest in below. It doesn't. We can make the userspace mapped pages copy-on-write. As long as the userspace makes sure there's not much activity during snapshot/shutdown, we will be fine. What we probably do need to copy is kernel pages. Pekka ^ permalink raw reply [flat|nested] 135+ messages in thread
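Pekka's copy-on-write idea has a familiar user-space analogue: after fork(), the child keeps seeing memory as it was at the moment of the fork, while the parent's later writes make the kernel copy the touched pages. The sketch below only demonstrates those semantics with fork() and two pipes; it is an analogy, not the snapshot mechanism proposed in this thread, and the function name is hypothetical.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* "Process memory" for the demo; volatile so every access really hits memory. */
static volatile int page[1024];

/*
 * Fork a child (the "snapshot"), modify the parent's copy, then ask the
 * child what it sees. Returns the child's value or -1 on error, and
 * stores the parent's current value in *parent_now.
 */
int cow_snapshot_value(int *parent_now)
{
    int to_child[2], to_parent[2];
    int seen = -1;
    char go = 'g';
    pid_t pid;

    page[0] = 1;                          /* state at snapshot time */
    if (pipe(to_child) != 0)
        return -1;
    if (pipe(to_parent) != 0) {
        close(to_child[0]);
        close(to_child[1]);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(to_child[0]); close(to_child[1]);
        close(to_parent[0]); close(to_parent[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: wait until the parent has written, then report page[0]. */
        char c;
        int v;

        close(to_child[1]);
        close(to_parent[0]);
        if (read(to_child[0], &c, 1) != 1)
            _exit(1);
        v = page[0];
        _exit(write(to_parent[1], &v, sizeof v) == (ssize_t)sizeof v ? 0 : 1);
    }

    close(to_child[0]);
    close(to_parent[1]);
    page[0] = 2;                          /* parent writes after the "snapshot" */
    if (write(to_child[1], &go, 1) != 1 ||
        read(to_parent[0], &seen, sizeof seen) != (ssize_t)sizeof seen)
        seen = -1;
    close(to_child[1]);
    close(to_parent[0]);
    waitpid(pid, NULL, 0);
    *parent_now = page[0];
    return seen;
}
```

The child reports 1 while the parent ends with 2: the write after the fork went to a private copy, leaving the "snapshot" intact without copying memory up front.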
* Re: Back to the future. 2007-04-27 4:52 ` Pekka J Enberg @ 2007-04-27 6:08 ` Nigel Cunningham 2007-04-27 6:18 ` Pekka J Enberg 2007-04-27 20:44 ` Rafael J. Wysocki 1 sibling, 1 reply; 135+ messages in thread From: Nigel Cunningham @ 2007-04-27 6:08 UTC (permalink / raw) To: Pekka J Enberg; +Cc: Linus Torvalds, LKML [-- Attachment #1: Type: text/plain, Size: 1553 bytes --] Hi. On Fri, 2007-04-27 at 07:52 +0300, Pekka J Enberg wrote: > On Thu, 2007-04-26 at 09:56 -0700, Linus Torvalds wrote: > > > which will map in the snapshot, return the mapped address and the size > > > (and if you want to support snapshots > 4GB, be my guest, but I suspect > > > you're actually *better* off just admitting that if you cannot shrink > > > the snapshot to less than 32 bits, it's not worth doing) > > On Fri, 27 Apr 2007, Nigel Cunningham wrote: > > That inherently limits the image to half of available ram (you need > > somewhere to store the snapshot), so you won't get the full image you > > express interest in below. > > It doesn't. We can make the userspace mapped pages copy-on-write. As long > as the userspace makes sure there's not much activity during > snapshot/shutdown, we will be fine. What we probably do need to copy is > kernel pages. COW is a possibility, but I understood (perhaps wrongly) that Linus was thinking of a single syscall or such like to prepare the snapshot. If you're going to start doing things like this, won't that mean you'd then have to update/redo the snapshot or somehow nullify the effect of anything the programs do so that doing it again after the snapshot is restored doesn't cause problems? I was going to leave it at that and press send, but perhaps that wouldn't be wise. I feel I should also ask what you're thinking of as a means of making sure userspace doesn't do much activity. Thanks for your labours!
Regards, Nigel
* Re: Back to the future. 2007-04-27 6:08 ` Nigel Cunningham @ 2007-04-27 6:18 ` Pekka J Enberg 2007-04-27 6:29 ` Pekka J Enberg ` (3 more replies) 0 siblings, 4 replies; 135+ messages in thread From: Pekka J Enberg @ 2007-04-27 6:18 UTC To: Nigel Cunningham; +Cc: Linus Torvalds, LKML On Fri, 27 Apr 2007, Nigel Cunningham wrote: > COW is a possibility, but I understood (perhaps wrongly) that Linus was > thinking of a single syscall or such like to prepare the snapshot. If > you're going to start doing things like this, won't that mean you'd then > have to update/redo the snapshot or somehow nullify the effect of > anything the program does so that doing it again after the snapshot is > restored doesn't cause problems? No. The snapshot is just that. A snapshot in time. From kernel point of view, it doesn't matter one bit when you did it or if the state has changed before you resume. It's up to userspace to make sure the user doesn't do real work while the snapshot is being written to disk and machine is shut down. On Fri, 27 Apr 2007, Nigel Cunningham wrote: > I was going to leave it at that and press send, but perhaps that > wouldn't be wise. I feel I should also ask what you're thinking of as a > means of making sure userspace doesn't do much activity. When the snapshot pages are COW, we will run out of memory if userspace writes to those pages too much. If userspace is blocked, say like displaying a "we are suspending" in X which blocks the user from using other programs that could generate new writes and mounting filesystems read-only, we don't need to worry about running out of memory. Pekka
* Re: Back to the future. 2007-04-27 6:18 ` Pekka J Enberg @ 2007-04-27 6:29 ` Pekka J Enberg 2007-04-27 6:34 ` Nigel Cunningham ` (2 subsequent siblings) 3 siblings, 0 replies; 135+ messages in thread From: Pekka J Enberg @ 2007-04-27 6:29 UTC To: Nigel Cunningham; +Cc: Linus Torvalds, LKML On Fri, 27 Apr 2007, Nigel Cunningham wrote: > > COW is a possibility, but I understood (perhaps wrongly) that Linus was > > thinking of a single syscall or such like to prepare the snapshot. If > > you're going to start doing things like this, won't that mean you'd then > > have to update/redo the snapshot or somehow nullify the effect of > > anything the program does so that doing it again after the snapshot is > > restored doesn't cause problems? On Fri, 27 Apr 2007, Pekka J Enberg wrote: > No. The snapshot is just that. A snapshot in time. From kernel point of > view, it doesn't matter one bit when you did it or if the state has > changed before you resume. It's up to userspace to make sure the user > doesn't do real work while the snapshot is being written to disk and > machine is shut down. Btw, obviously we need to break the COW when resuming and not include the snapshot mapping. However, that should be trivially doable by snapshotting the page mappings before remapping them as COW. Pekka
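Pekka's two claims here, that a copy-on-write snapshot need not reserve half of RAM up front and that heavy userspace writes would eventually exhaust memory, can be illustrated with a toy model. This is a Python sketch, not kernel code: it assumes memory is a list of page buffers and that a fixed pool of spare pages exists for preserving originals. `CowSnapshot`, `take`, `write` and `image` are hypothetical names, and breaking the COW on resume is not modeled.

```python
class CowSnapshot:
    """Toy copy-on-write snapshot over a list of pages (illustrative only)."""

    def __init__(self, pages, spare_pages):
        self.pages = pages              # live pages, modified in place by "tasks"
        self.spare_pages = spare_pages  # free pages available for COW copies
        self.saved = {}                 # page index -> contents at snapshot time
        self.active = False

    def take(self):
        # Mark everything shared; no page is copied at this point.
        self.saved.clear()
        self.active = True

    def write(self, index, data):
        # First write to a shared page: preserve the original before modifying.
        if self.active and index not in self.saved:
            if len(self.saved) >= self.spare_pages:
                raise MemoryError("no spare pages left for COW copies")
            self.saved[index] = self.pages[index]
        self.pages[index] = data

    def image(self):
        # The snapshot as of take(): preserved originals override live pages.
        return [self.saved.get(i, page) for i, page in enumerate(self.pages)]


mem = [b"page0", b"page1", b"page2"]
snap = CowSnapshot(mem, spare_pages=1)
snap.take()
snap.write(0, b"new0")   # uses the single spare page
snap.write(0, b"newer")  # same page again: no extra copy needed
print(snap.image())      # [b'page0', b'page1', b'page2']
```

With `spare_pages=1`, repeated writes to one page cost a single copy, but a write to a second page raises `MemoryError`: the failure mode Nigel objects to when hibernation has to succeed the first time.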
* Re: Back to the future. 2007-04-27 6:18 ` Pekka J Enberg 2007-04-27 6:29 ` Pekka J Enberg @ 2007-04-27 6:34 ` Nigel Cunningham 2007-04-27 6:50 ` Pekka J Enberg 2007-04-27 9:50 ` Oliver Neukum 2007-04-27 21:24 ` Rafael J. Wysocki 3 siblings, 1 reply; 135+ messages in thread From: Nigel Cunningham @ 2007-04-27 6:34 UTC To: Pekka J Enberg; +Cc: Linus Torvalds, LKML Hi. On Fri, 2007-04-27 at 09:18 +0300, Pekka J Enberg wrote: > On Fri, 27 Apr 2007, Nigel Cunningham wrote: > > COW is a possibility, but I understood (perhaps wrongly) that Linus was > > thinking of a single syscall or such like to prepare the snapshot. If > > you're going to start doing things like this, won't that mean you'd then > > have to update/redo the snapshot or somehow nullify the effect of > > anything the program does so that doing it again after the snapshot is > > restored doesn't cause problems? > > No. The snapshot is just that. A snapshot in time. From kernel point of > view, it doesn't matter one bit when you did it or if the state has > changed before you resume. It's up to userspace to make sure the user > doesn't do real work while the snapshot is being written to disk and > machine is shut down. Sorry Pekka, but that's just broken. It implies firstly that we tell all userspace programs "I'm sorry, but I'm suspending at the moment. Can you tiptoe quietly around while I do it?" You can't seriously expect every userspace program to be modified to adjust its behaviour according to whether we're writing a snapshot to disk at the moment or not. It also implies that we can prepare a snapshot and then happily have the contents of the disk change so that they don't match the superblock and other filesystem details we just saved in the snapshot. We can't.
At least not without modifying all the filesystems so that (at a minimum) they know how to throw away all the metadata they have at resume time and reread it from disk. > On Fri, 27 Apr 2007, Nigel Cunningham wrote: > > I was going to leave it at that and press send, but perhaps that > > wouldn't be wise. I feel I should also ask what you're thinking of as a > > means of making sure userspace doesn't do much activity. > > When the snapshot pages are COW, we will run out of memory if userspace > writes to those pages too much. If userspace is blocked, say like > displaying a "we are suspending" in X which blocks the user from using > other programs that could generate new writes and mounting filesystems > read-only, we don't need to worry about running out of memory. This sounds feasible, but it's only really acceptable if you're willing to have hibernation fail or restart multiple times. If your battery is running out or you need to rush to put a lappy in your bag because the train just came early, that's not an option. It's for that very reason that I've put a lot of effort into trying to make it work first time, every time. Not there yet, but it's a priority. By the way, sorry. This email feels like it is pouring a lot of cold water on your ideas. I don't want to be negative! Regards, Nigel
* Re: Back to the future. 2007-04-27 6:34 ` Nigel Cunningham @ 2007-04-27 6:50 ` Pekka J Enberg 2007-04-27 7:03 ` Nigel Cunningham 0 siblings, 1 reply; 135+ messages in thread From: Pekka J Enberg @ 2007-04-27 6:50 UTC To: Nigel Cunningham; +Cc: Linus Torvalds, LKML On Fri, 27 Apr 2007, Nigel Cunningham wrote: > Sorry Pekka, but that's just broken. It certainly isn't. On Fri, 27 Apr 2007, Nigel Cunningham wrote: > It implies firstly that we tell all userspace programs "I'm sorry, but > I'm suspending at the moment. Can you tiptoe quietly around while I do > it?" You can't seriously expect every userspace program to be modified > to adjust its behaviour according to whether we're writing a snapshot > to disk at the moment or not. You don't need to modify other programs. You just need to display the progress bar and block _user input_. I don't even claim to know X, but I would be extremely surprised if you technically can't say "don't let the user touch any other windows except this one." The user couldn't care less whether tasks are frozen or not by the kernel. What matters is that the user can't shoot himself in the foot while snapshotting. Furthermore, we probably do need to do other things to ensure safety, like remounting filesystems read-only, but again, this has nothing to do with snapshotting per se. What the kernel needs to worry about is (1) providing an atomic snapshot that is consistent and (2) resuming to that snapshot safely. If the _user_ loses data that was generated between snapshot + shutdown, it's absolutely no concern for the snapshot operation! On Fri, 27 Apr 2007, Nigel Cunningham wrote: > It also implies that we can prepare a snapshot and then happily have the > contents of the disk change so that they don't match the superblock and > other filesystem details we just saved in the snapshot. We can't.
At > least not without modifying all the filesystems so that (at a minimum) > they know how to throw away all the metadata they have at resume time > and reread it from disk. But you just explained how we can! We shouldn't bend over backwards for snapshotting just because the filesystems don't currently support something we need. On Fri, 27 Apr 2007, Nigel Cunningham wrote: > By the way, sorry. This email feels like it is pouring a lot of cold > water on your ideas. I don't want to be negative! Don't worry, I am used to cold water :-). Pekka
* Re: Back to the future. 2007-04-27 6:50 ` Pekka J Enberg @ 2007-04-27 7:03 ` Nigel Cunningham 2007-04-27 7:24 ` Pekka J Enberg 0 siblings, 1 reply; 135+ messages in thread From: Nigel Cunningham @ 2007-04-27 7:03 UTC To: Pekka J Enberg; +Cc: Linus Torvalds, LKML Hi. On Fri, 2007-04-27 at 09:50 +0300, Pekka J Enberg wrote: > On Fri, 27 Apr 2007, Nigel Cunningham wrote: > > Sorry Pekka, but that's just broken. > > It certainly isn't. > > On Fri, 27 Apr 2007, Nigel Cunningham wrote: > > It implies firstly that we tell all userspace programs "I'm sorry, but > > I'm suspending at the moment. Can you tiptoe quietly around while I do > > it?" You can't seriously expect every userspace program to be modified > > to adjust its behaviour according to whether we're writing a snapshot > > to disk at the moment or not. > > You don't need to modify other programs. You just need to display the > progress bar and block _user input_. I don't even claim to know X, but I > would be extremely surprised if you technically can't say "don't let > the user touch any other windows except this one." The user couldn't care > less whether tasks are frozen or not by the kernel. What matters is that > the user can't shoot himself in the foot while snapshotting. User input doesn't account for all system activity. Think of cron jobs or user-initiated jobs that may have started before the cycle began. > Furthermore, we probably do need to do other things to ensure safety, like > remounting filesystems read-only, but again, this has nothing to do with > snapshotting per se. What the kernel needs to worry about is (1) providing > an atomic snapshot that is consistent and (2) resuming to that snapshot > safely. If the _user_ loses data that was generated between snapshot + > shutdown, it's absolutely no concern for the snapshot operation! Noooo! If the user loses data, the user will be concerned and we should be.
I for one would do my best to avoid using software that loses my data for me. I wouldn't care if you said "Well, it's your fault. You lost the data." From my perspective as a user, I didn't lose the data, some part of the computer's OS did. > On Fri, 27 Apr 2007, Nigel Cunningham wrote: > > It also implies that we can prepare a snapshot and then happily have the > > contents of the disk change so that they don't match the superblock and > > other filesystem details we just saved in the snapshot. We can't. At > > least not without modifying all the filesystems so that (at a minimum) > > they know how to throw away all the metadata they have at resume time > > and reread it from disk. > > But you just explained how we can! We shouldn't bend over backwards for > snapshotting just because the filesystems don't currently support > something we need. Sorry, but I just don't believe filesystems should need to throw away metadata post resume. If we let data be changed after snapshotting (or ourselves cause it to be changed), we're the ones that are broken. Our snapshot is out of date and the expectations of userspace programs that were snapshotted will be out of date. Just imagine, for example, a userspace program that is snapshotted, then reads and deletes a temporary file. After the snapshot restore, it's running again. But wait, we can't read or delete the file again because it's already gone. Life just gets more complicated and confusing this way. > On Fri, 27 Apr 2007, Nigel Cunningham wrote: > > By the way, sorry. This email feels like it is pouring a lot of cold > > water on your ideas. I don't want to be negative! > > Don't worry, I am used to cold water :-). Maybe, but I'd still rather be encouraging! Nigel
* Re: Back to the future. 2007-04-27 7:03 ` Nigel Cunningham @ 2007-04-27 7:24 ` Pekka J Enberg 0 siblings, 0 replies; 135+ messages in thread From: Pekka J Enberg @ 2007-04-27 7:24 UTC To: Nigel Cunningham; +Cc: Linus Torvalds, LKML On Fri, 27 Apr 2007, Nigel Cunningham wrote: > User input doesn't account for all system activity. Think of cron jobs > or user-initiated jobs that may have started before the cycle began. Yes, but the _user_ did not start them, so they didn't lose any work. See, it might or might not be important, but that's something _userspace_ has much more knowledge of than the kernel ever will. On Fri, 27 Apr 2007, Nigel Cunningham wrote: > Noooo! If the user loses data, the user will be concerned and we should > be. I for one would do my best to avoid using software that loses my > data for me. I wouldn't care if you said "Well, it's your fault. You > lost the data." From my perspective as a user, I didn't lose the data, > some part of the computer's OS did. You are looking at snapshot/shutdown from the kernel and user-experience points of view at the same time, which causes confusion here. Let me repeat: it is _absolutely no concern_ of the _kernel_ whether you resume to a snapshot that does not contain all your precious data. The kernel doesn't care one bit! That being said, the _userspace solution_ obviously needs to take this into account by blocking user input, making filesystems read-only, and maybe even blocking certain background processes (cron and beagle indexing come to mind). On Fri, 27 Apr 2007, Nigel Cunningham wrote: > Sorry, but I just don't believe filesystems should need to throw away > metadata post resume. If we let data be changed after snapshotting (or > ourselves cause it to be changed), we're the ones that are broken. Our > snapshot is out of date and the expectations of userspace programs that > were snapshotted will be out of date.
Just imagine, for example, a > userspace program that is snapshotted, then reads and deletes a > temporary file. After the snapshot restore, it's running again. But > wait, we can't read or delete the file again because it's already gone. > Life just gets more complicated and confusing this way. It doesn't. We can either make the filesystem read-only or, surprise, surprise, make a _snapshot_ of the filesystem! And while the points you raised are important for the full end-user solution, they are absolutely irrelevant to snapshot_system(). The only thing it needs to guarantee is a consistent snapshot that we can resume later. On Fri, 27 Apr 2007, Nigel Cunningham wrote: > Maybe, but I'd still rather be encouraging! You are. Perhaps you just don't know it yet. ;-) Pekka
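The temporary-file example in this exchange can be made concrete with a small Python model. Everything in it is hypothetical: the task, its `consume_tmp` step and the dictionary standing in for the disk. Memory is always rolled back to the snapshot on resume; the disk is rolled back only when `snapshot_fs=True`, which corresponds to Pekka's suggestion of snapshotting the filesystem as well.

```python
import copy


def run_step(task, disk):
    """The hypothetical task's next step: read and delete its temp file."""
    if task["step"] == "consume_tmp":
        task["data"] = disk.pop("tmp")  # KeyError if the file is already gone
        task["step"] = "done"


def snapshot_run_resume(task, disk, snapshot_fs):
    mem_image = copy.deepcopy(task)                         # memory snapshot
    fs_image = copy.deepcopy(disk) if snapshot_fs else None
    run_step(task, disk)                                    # task keeps running after the snapshot
    resumed_task = copy.deepcopy(mem_image)                 # resume: memory is rolled back...
    resumed_disk = fs_image if snapshot_fs else disk        # ...the disk only if it was snapshotted
    run_step(resumed_task, resumed_disk)                    # the task repeats its step
    return resumed_task


print(snapshot_run_resume({"step": "consume_tmp"}, {"tmp": b"x"}, snapshot_fs=True))
# {'step': 'done', 'data': b'x'}
```

With `snapshot_fs=False` the same call raises `KeyError`, because the resumed task tries to consume a file that no longer exists on disk: exactly the inconsistency Nigel describes.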
* Re: Back to the future. 2007-04-27 6:18 ` Pekka J Enberg 2007-04-27 6:29 ` Pekka J Enberg 2007-04-27 6:34 ` Nigel Cunningham @ 2007-04-27 9:50 ` Oliver Neukum 2007-04-27 10:12 ` Pekka J Enberg 2007-04-27 21:24 ` Rafael J. Wysocki 3 siblings, 1 reply; 135+ messages in thread From: Oliver Neukum @ 2007-04-27 9:50 UTC To: Pekka J Enberg; +Cc: Nigel Cunningham, Linus Torvalds, LKML On Friday, 27 April 2007 08:18, Pekka J Enberg wrote: > On Fri, 27 Apr 2007, Nigel Cunningham wrote: > > COW is a possibility, but I understood (perhaps wrongly) that Linus was > > thinking of a single syscall or such like to prepare the snapshot. If > > you're going to start doing things like this, won't that mean you'd then > > have to update/redo the snapshot or somehow nullify the effect of > > anything the program does so that doing it again after the snapshot is > > restored doesn't cause problems? > > No. The snapshot is just that. A snapshot in time. From kernel point of > view, it doesn't matter one bit when you did it or if the state has > changed before you resume. It's up to userspace to make sure the user > doesn't do real work while the snapshot is being written to disk and > machine is shut down. And where is the benefit in that? How is such user space freezing logic simpler than having the kernel do the write? What can you do in user space if all filesystems are r/o that is worth the hassle? Regards Oliver
* Re: Back to the future. 2007-04-27 9:50 ` Oliver Neukum @ 2007-04-27 10:12 ` Pekka J Enberg 2007-04-27 19:07 ` Oliver Neukum 2007-04-28 10:35 ` Rafael J. Wysocki 0 siblings, 2 replies; 135+ messages in thread From: Pekka J Enberg @ 2007-04-27 10:12 UTC To: Oliver Neukum; +Cc: Nigel Cunningham, Linus Torvalds, LKML On Friday, 27 April 2007 08:18, Pekka J Enberg wrote: > > No. The snapshot is just that. A snapshot in time. From kernel point of > > view, it doesn't matter one bit when you did it or if the state has > > changed before you resume. It's up to userspace to make sure the user > > doesn't do real work while the snapshot is being written to disk and > > machine is shut down. On Fri, 27 Apr 2007, Oliver Neukum wrote: > And where is the benefit in that? How is such user space freezing logic > simpler than having the kernel do the write? > > What can you do in user space if all filesystems are r/o that is worth the > hassle? I am talking about snapshot_system() here. It's not given that the filesystems need to be read-only (you can snapshot them too). The benefit here is that you can do whatever you want with the snapshot (encrypt, compress, send over the network) and have a clean well-defined interface in the kernel. In addition, aborting the snapshot is simpler, simply munmap() the snapshot. The problem with writing in the kernel is obvious: we need to add new code to the kernel for compression, encryption, and userspace interaction (graphical progress bar) that are important for user experience. Pekka
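Pekka's proposed split, where the kernel only exposes the snapshot as a mapping and userspace decides what to do with it, might look roughly like this from the userspace side. This is a Python sketch under stated assumptions: `fake_snapshot_system()` is a hypothetical stand-in for the proposed `snapshot_system()` call that simply copies an in-memory image into an anonymous `mmap`, and zlib at level 1 is an arbitrary compressor choice (the thread goes on to debate which algorithm is fast enough).

```python
import io
import mmap
import zlib


def fake_snapshot_system(image: bytes) -> mmap.mmap:
    """Hypothetical stand-in for the proposed snapshot_system() call:
    copies a non-empty in-memory image into an anonymous mapping."""
    mapping = mmap.mmap(-1, len(image))
    mapping.write(image)
    mapping.seek(0)
    return mapping


def save_snapshot(mapping: mmap.mmap, out, chunk=64 * 1024):
    """Userspace policy: stream the mapped image through zlib into `out`."""
    comp = zlib.compressobj(level=1)   # fast level; the algorithm is a userspace choice
    for off in range(0, len(mapping), chunk):
        out.write(comp.compress(mapping[off:off + chunk]))
    out.write(comp.flush())
    mapping.close()                    # stands in for munmap(); an abort would just do this


image = bytes(range(256)) * 1024
snap = fake_snapshot_system(image)
disk = io.BytesIO()                    # any writable file object would do
save_snapshot(snap, disk)
assert zlib.decompress(disk.getvalue()) == image
```

Because the destination is just a file object, swapping in encryption or a network socket changes only the userspace side, which is the flexibility Pekka argues for; aborting is simply closing the mapping without writing anything.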
* Re: Back to the future. 2007-04-27 10:12 ` Pekka J Enberg @ 2007-04-27 19:07 ` Oliver Neukum 2007-04-28 9:22 ` Pekka Enberg 2007-04-28 10:35 ` Rafael J. Wysocki 1 sibling, 1 reply; 135+ messages in thread From: Oliver Neukum @ 2007-04-27 19:07 UTC To: Pekka J Enberg; +Cc: Nigel Cunningham, Linus Torvalds, LKML On Friday, 27 April 2007 12:12, Pekka J Enberg wrote: > I am talking about snapshot_system() here. It's not given that the > filesystems need to be read-only (you can snapshot them too). The benefit > here is that you can do whatever you want with the snapshot (encrypt, > compress, send over the network) and have a clean well-defined interface > in the kernel. In addition, aborting the snapshot is simpler, simply > munmap() the snapshot. But is that worth the trade-off? > The problem with writing in the kernel is obvious: we need to add new code > to the kernel for compression, encryption, and userspace interaction > (graphical progress bar) that are important for user experience. The kernel can already do compression and encryption. Regards Oliver
* Re: Back to the future. 2007-04-27 19:07 ` Oliver Neukum @ 2007-04-28 9:22 ` Pekka Enberg 2007-04-28 13:37 ` Oliver Neukum 0 siblings, 1 reply; 135+ messages in thread From: Pekka Enberg @ 2007-04-28 9:22 UTC To: Oliver Neukum; +Cc: Nigel Cunningham, Linus Torvalds, LKML Hi Oliver, On Friday, 27 April 2007 12:12, Pekka J Enberg wrote: > > The problem with writing in the kernel is obvious: we need to add new code > > to the kernel for compression, encryption, and userspace interaction > > (graphical progress bar) that are important for user experience. On 4/27/07, Oliver Neukum <oliver@neukum.org> wrote: > The kernel can already do compression and encryption. Yes, if we all could agree on _which_ compression and encryption algorithm(s) we want to use. It goes beyond that too, where do you want to save the image? In the swap device or a regular file? And don't forget about debuggability either. It's faster to do a snapshot/resume without shutdown/restart in the middle or just do a snapshot, and examine its contents.
* Re: Back to the future. 2007-04-28 9:22 ` Pekka Enberg @ 2007-04-28 13:37 ` Oliver Neukum 2007-05-03 12:06 ` Pavel Machek 0 siblings, 1 reply; 135+ messages in thread From: Oliver Neukum @ 2007-04-28 13:37 UTC To: Pekka Enberg; +Cc: Nigel Cunningham, Linus Torvalds, LKML On Saturday, 28 April 2007 11:22, Pekka Enberg wrote: > Hi Oliver, > > On Friday, 27 April 2007 12:12, Pekka J Enberg wrote: > > > The problem with writing in the kernel is obvious: we need to add new code > > > to the kernel for compression, encryption, and userspace interaction > > > (graphical progress bar) that are important for user experience. > > On 4/27/07, Oliver Neukum <oliver@neukum.org> wrote: > > The kernel can already do compression and encryption. > > Yes, if we all could agree on _which_ compression and encryption Any of those available in the kernel. Where's the problem? > algorithm(s) we want to use. It goes beyond that too, where do you > want to save the image? In the swap device or a regular file? And A swap device is doubtlessly easier. But isn't the problem of using a swap file already fixed? The writeout seems the easiest part of hibernation. > don't forget about debuggability either. It's faster to do a > snapshot/resume without shutdown/restart in the middle or just do a > snapshot, and examine its contents. Then use a "fake reboot" option and save the image to a ramdisk. It isn't that hard. You must be able to survive that, as I/O errors during writeout are possible. Regards Oliver
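Oliver's two requirements, a debug mode that saves the image to a RAM-backed "disk" instead of power-cycling and the ability to survive I/O errors during writeout, can be sketched as follows. Everything here is hypothetical Python for illustration: `FailingDevice` simulates a device that errors past a byte limit, and `try_hibernate` returns instead of powering off when a write fails.

```python
import io


class FailingDevice:
    """Hypothetical block device that raises an I/O error past `limit` bytes."""

    def __init__(self, limit):
        self.buf = io.BytesIO()
        self.limit = limit

    def write(self, data):
        if self.buf.tell() + len(data) > self.limit:
            raise OSError("simulated I/O error during writeout")
        return self.buf.write(data)


def try_hibernate(image, device, chunk=4096):
    """Write the image out; on an I/O error, abort and keep running."""
    try:
        for off in range(0, len(image), chunk):
            device.write(image[off:off + chunk])
    except OSError:
        return False   # abort: a real implementation would thaw tasks and carry on
    return True        # a real implementation would power off here


def fake_reboot_check(image):
    """Debug mode: save to a RAM-backed 'disk', skip the power cycle, verify."""
    ramdisk = io.BytesIO()
    return try_hibernate(image, ramdisk) and ramdisk.getvalue() == image
```

The point of the design is that the error path and the debug path share the same writeout code, so exercising one also exercises the other.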
* Re: Back to the future. 2007-04-28 13:37 ` Oliver Neukum @ 2007-05-03 12:06 ` Pavel Machek 2007-05-04 21:52 ` Indan Zupancic 0 siblings, 1 reply; 135+ messages in thread From: Pavel Machek @ 2007-05-03 12:06 UTC To: Oliver Neukum; +Cc: Pekka Enberg, Nigel Cunningham, Linus Torvalds, LKML Hi! > > > The kernel can already do compression and encryption. > > > > Yes, if we all could agree on _which_ compression and encryption > > Any of those available in the kernel. Where's the problem? gzip is too slow for this. lzf works okay. Oh and swsusp wants rsa crypto. Pavel -- (english) http://www.livejournal.com/~pavelmachek (Czech, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: Back to the future. 2007-05-03 12:06 ` Pavel Machek @ 2007-05-04 21:52 ` Indan Zupancic 2007-05-05 9:16 ` Pavel Machek 0 siblings, 1 reply; 135+ messages in thread From: Indan Zupancic @ 2007-05-04 21:52 UTC To: Pavel Machek Cc: Oliver Neukum, Pekka Enberg, Nigel Cunningham, Linus Torvalds, LKML On Thu, May 3, 2007 14:06, Pavel Machek wrote: >> > > The kernel can already do compression and encryption. >> > >> > Yes, if we all could agree on _which_ compression and encryption >> >> Any of those available in the kernel. Where's the problem? > > gzip is too slow for this. lzf works okay. Oh and swsusp wants rsa > crypto. Then port lzf to the kernel, or help with the lzo port. Swsusp might want RSA crypto, but it doesn't really need it. Currently it only uses it to be able to suspend without asking for a passphrase. So the current sequence is: 1) Generate RSA keys + ask for a passphrase. (Once) ... 2) Suspend. (Encrypt snapshot with public RSA key). ... 3) Ask for the passphrase. 4) Resume. RSA is used so that the passphrase can be thrown away between 1 and 2. But the same functionality can be achieved by doing: 1) Define a user password (e.g. /etc/shadow thing). (Once) 2) When a user logs in: get random data and encrypt it with the password, this becomes the AES key. Store both the data and key in a secure way in memory, e.g. using the existing kernel key infrastructure. ... 3) Suspend. (Encrypt snapshot with the AES key and store the random data.) ... 4) Ask for the passphrase. (To get the AES key, encrypt the stored random data.) 5) Resume. Variants are possible of course, but this is the main idea. This is secure because the key infrastructure is secure, and even if it isn't, the system must be compromised to get the suspend key before the suspend is done. But at that point the attacker already has all information that can be found in the suspend image, and could have done all kinds of things to inflict damage (like installing a key logger).
Advantage of this scheme is that it only needs AES and can be done (mostly) in kernel space. It's also faster and simpler than the current RSA scheme. Disadvantage is that it wastes at least 32 bytes of memory when the system is running, to store the data and key. Only thing that needs to be done in userspace is setting the random data and AES key, but there exists a suitable interface for that (the key system). As user login is already done in user space, this can be integrated with that in a nice way. Greetings, Indan
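Indan's AES-based alternative can be sketched as follows. Two substitutions are assumptions of this sketch, not part of his proposal: PBKDF2-HMAC-SHA256 from Python's `hashlib` stands in for "encrypt the random data with the password", and a stored HMAC check value is added so that resume can reject a wrong password before attempting decryption. The AES encryption of the image itself is omitted because the Python standard library has no AES; the derived 32-byte key is simply AES-256-sized. All function names are hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000          # illustrative work factor, not a recommendation
CHECK_LABEL = b"resume-key-check"


def derive_key(password: str, random_data: bytes) -> bytes:
    # Stand-in for "encrypt the random data with the password": a standard
    # password KDF producing a 32-byte (AES-256-sized) key.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), random_data, ITERATIONS)


def on_login(password: str):
    # Once per boot; in Indan's scheme both values would be kept in the
    # kernel key infrastructure.
    random_data = os.urandom(16)
    return random_data, derive_key(password, random_data)


def on_suspend(random_data: bytes, key: bytes) -> dict:
    # No password prompt: the image header stores the random data plus a
    # check value (an addition of this sketch) to detect wrong passwords.
    check = hmac.new(key, CHECK_LABEL, hashlib.sha256).digest()
    return {"random_data": random_data, "check": check}


def on_resume(password: str, header: dict):
    # Re-derive the key from the typed password and the stored random data.
    key = derive_key(password, header["random_data"])
    check = hmac.new(key, CHECK_LABEL, hashlib.sha256).digest()
    return key if hmac.compare_digest(check, header["check"]) else None


random_data, key = on_login("correct horse")
header = on_suspend(random_data, key)
assert on_resume("correct horse", header) == key
assert on_resume("wrong guess", header) is None
```

While the system runs, the state kept in memory is the 16 bytes of random data plus the 32-byte key, consistent with Indan's estimate of "at least 32 bytes".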
* Re: Back to the future. 2007-05-04 21:52 ` Indan Zupancic @ 2007-05-05 9:16 ` Pavel Machek 2007-05-05 12:02 ` Indan Zupancic 0 siblings, 1 reply; 135+ messages in thread From: Pavel Machek @ 2007-05-05 9:16 UTC To: Indan Zupancic Cc: Oliver Neukum, Pekka Enberg, Nigel Cunningham, Linus Torvalds, LKML Hi! > But the same functionality can be achieved by doing: > > 1) Define a user password (e.g. /etc/shadow thing). (Once) > > 2) When a user logs in: get random data and encrypt it with the password, > this becomes the AES key. Store both the data and key in a secure way in > memory, e.g. using the existing kernel key infrastructure. > Advantage of this scheme is that it only needs AES and can be done (mostly) > in kernel space. It's also faster and simpler than the current RSA scheme. > Disadvantage is that it wastes at least 32 bytes of memory when the system > is running, to store the data and key. Another disadvantage is that you need to hack into PAM infrastructure, that your suspend password needs to be the same as someone's login password, and that it will really only work with a single-user machine. Pavel
* Re: Back to the future. 2007-05-05 9:16 ` Pavel Machek @ 2007-05-05 12:02 ` Indan Zupancic 0 siblings, 0 replies; 135+ messages in thread From: Indan Zupancic @ 2007-05-05 12:02 UTC To: Pavel Machek Cc: Oliver Neukum, Pekka Enberg, Nigel Cunningham, Linus Torvalds, LKML Hello, On Sat, May 5, 2007 11:16, Pavel Machek wrote: >> But the same functionality can be achieved by doing: >> >> 1) Define a user password (e.g. /etc/shadow thing). (Once) >> >> 2) When a user logs in: get random data and encrypt it with the password, >> this becomes the AES key. Store both the data and key in a secure way in >> memory, e.g. using the existing kernel key infrastructure. > > > >> Advantage of this scheme is that it only needs AES and can be done (mostly) >> in kernel space. It's also faster and simpler than the current RSA scheme. >> Disadvantage is that it wastes at least 32 bytes of memory when the system >> is running, to store the data and key. > > Another disadvantage is that you need to hack into PAM infrastructure, > that your suspend password needs to be the same as someone's login > password, and that it will really only work with a single-user machine. The first two are only true if you want to integrate it with user login, so that a user only needs to sign in once, which seems like a convenient thing. But if you don't want to integrate with the existing login infrastructure, then just don't. And those disadvantages are true for any system that wants users to log in once. Then the disadvantage is reduced to a user needing to provide the password at suspend if the system wasn't booted from a snapshot. But no need for users to generate any files, just to choose a resume password. If the resume key is stored per user instead of a single global instance, it will work with a multi-user system too. A more interesting question is what should happen when one user did the suspend and the other wants to resume. Throw away the snapshot? Refuse booting?
Or boot and switch "active user"? If you don't want people to resume each other's suspends then a key per user works. If you want them to, then it becomes a bit tricky, especially if you don't integrate with the login system. You don't want a user to be able to resume someone else's snapshot and have access to everything that other user left open. Nor do you want users to give a password twice. If you want users to be able to resume each other's snapshots, you probably also want the system to switch users after the resume. No matter what scheme is used, this becomes hairy and hard to get watertight. (Perhaps "impossible" is more realistic: how can anyone read the suspend image and copy it to RAM again without having access to all the data within?) But if it's an "us" against "them" case, and you want users to resume each other's snapshots, you're right that the scheme I proposed will fall apart. In that case it needs to be adjusted a bit: Have one global suspend/resume key, and for each user store it on disk, encrypted with that user's password. Also store the key in memory as before. Now, for everyone to be able to suspend without giving a password, any one user needs to have provided his password once. Also everyone can resume, if they have access to the file with the list of encrypted keys and provide the right password. (Notice that this looks more like the current scheme, where the private part of the RSA key is encrypted with a passphrase and all stored in a file.) Though it seems that using suspend to disk on a real multi-user system is always asking for problems, because the suspend image may contain valuable data which shouldn't be thrown away, but easily can be by other users. Nor do you want users to claim the machine, so it's a lose/lose situation. Also with resume every user effectively gets root access, because of all the memory access. So inter-user security is down the drain anyway.
The only sane usage I can see is when the users trust each other, in which case they might as well agree on one resume password. ;-) Greetings, Indan
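Indan's multi-user variant (one global suspend/resume key, stored on disk once per user and encrypted with that user's password) amounts to key wrapping. The Python sketch below is a toy: it wraps the key by XOR with a per-user pad derived via PBKDF2 and authenticates it with HMAC. That is only tolerable here because every wrap uses a fresh random salt; a real design would use a standard scheme such as AES key wrap (RFC 3394). The names `wrap`, `unwrap` and the key-file layout are hypothetical.

```python
import hashlib
import hmac
import os


def _user_keys(password: str, salt: bytes):
    # Derive 64 bytes from the user's password: a 32-byte XOR pad and a
    # separate 32-byte HMAC key, so no key material is used for two purposes.
    material = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000, dklen=64)
    return material[:32], material[32:]


def wrap(global_key: bytes, password: str) -> dict:
    """Toy key wrap: XOR with a fresh per-user pad, plus an HMAC tag."""
    if len(global_key) != 32:
        raise ValueError("toy wrap only handles 32-byte keys")
    salt = os.urandom(16)
    pad, mac_key = _user_keys(password, salt)
    wrapped = bytes(a ^ b for a, b in zip(global_key, pad))
    tag = hmac.new(mac_key, wrapped, hashlib.sha256).digest()
    return {"salt": salt, "wrapped": wrapped, "tag": tag}


def unwrap(entry: dict, password: str):
    pad, mac_key = _user_keys(password, entry["salt"])
    expected = hmac.new(mac_key, entry["wrapped"], hashlib.sha256).digest()
    if not hmac.compare_digest(expected, entry["tag"]):
        return None                      # wrong password for this entry
    return bytes(a ^ b for a, b in zip(entry["wrapped"], pad))


global_key = os.urandom(32)              # the one suspend/resume key
key_file = {"alice": wrap(global_key, "alice-pw"), "bob": wrap(global_key, "bob-pw")}
assert unwrap(key_file["bob"], "bob-pw") == global_key
```

Any user with an entry in `key_file` can recover the same global key, which also illustrates Indan's caveat: whoever can resume effectively sees everything in the image.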
* Re: Back to the future. 2007-04-27 10:12 ` Pekka J Enberg 2007-04-27 19:07 ` Oliver Neukum @ 2007-04-28 10:35 ` Rafael J. Wysocki 2007-04-28 18:43 ` David Lang 1 sibling, 1 reply; 135+ messages in thread From: Rafael J. Wysocki @ 2007-04-28 10:35 UTC To: Pekka J Enberg; +Cc: Oliver Neukum, Nigel Cunningham, Linus Torvalds, LKML On Friday, 27 April 2007 12:12, Pekka J Enberg wrote: > On Friday, 27 April 2007 08:18, Pekka J Enberg wrote: > > > No. The snapshot is just that. A snapshot in time. From kernel point of > > > view, it doesn't matter one bit when you did it or if the state has > > > changed before you resume. It's up to userspace to make sure the user > > > doesn't do real work while the snapshot is being written to disk and > > > machine is shut down. > > On Fri, 27 Apr 2007, Oliver Neukum wrote: > > And where is the benefit in that? How is such user space freezing logic > > simpler than having the kernel do the write? > > > > What can you do in user space if all filesystems are r/o that is worth the > > hassle? > > I am talking about snapshot_system() here. It's not given that the > filesystems need to be read-only (you can snapshot them too). The benefit > here is that you can do whatever you want with the snapshot (encrypt, > compress, send over the network) and have a clean well-defined interface > in the kernel. In addition, aborting the snapshot is simpler, simply > munmap() the snapshot. Well, swsusp currently does almost the same, except that you can read the image from the kernel as a stream of bytes, using read() and, during the restore phase, upload the same image using write(). The advantage of this is that the interface is symmetrical from the user space's point of view. [You're cancelling the hibernation by closing /dev/snapshot, which also is quite natural.] If you look at the interface in user.c, there are only two ioctls really needed for that in there, SNAPSHOT_ATOMIC_SNAPSHOT and SNAPSHOT_ATOMIC_RESTORE.
Two more are handy for freezing tasks, SNAPSHOT_FREEZE and SNAPSHOT_UNFREEZE. The others were added later, to make the user space part simpler or capable of doing some fancy stuff, which I am ready to admit was a mistake. > The problem with writing in the kernel is obvious: we need to add new code > to the kernel for compression, encryption, and userspace interaction > (graphical progress bar) that are important for user experience. Yes, and that's why we wanted to introduce the userland part. The problem with this approach, as it's turned out, is that the userland part must be a very specialized piece of software, really careful of what it's doing, mainly because of the inability to checkpoint filesystems. If we could checkpoint filesystems and were able to unfreeze the user space after creating the snapshot without the risk of corrupting filesystems in the restore phase, the userland part could be much simpler (even as simple as Linus suggested). Greetings, Rafael ^ permalink raw reply [flat|nested] 135+ messages in thread
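The /dev/snapshot flow described above can be pictured as a small state machine. The following is a toy Python model, not the code in user.c: the four ioctl names come from the message, while the class, the helper functions and the ordering checks they enforce are assumptions made for illustration, and no real device is opened. It shows the symmetry Rafael mentions: after SNAPSHOT_ATOMIC_SNAPSHOT the image comes out through read(), during restore the same bytes go back in through write() before SNAPSHOT_ATOMIC_RESTORE, and close() cancels.

```python
class ToySnapshotDevice:
    """Toy stand-in for /dev/snapshot (hypothetical); no real device is touched."""

    def __init__(self, memory: bytes):
        self.memory = memory          # pretend system RAM
        self.frozen = False
        self.image = None             # atomic copy taken by ATOMIC_SNAPSHOT
        self.pos = 0                  # read offset into the image
        self.incoming = bytearray()   # image uploaded during restore

    def ioctl(self, request: str) -> None:
        if request == "SNAPSHOT_FREEZE":
            self.frozen = True
        elif request == "SNAPSHOT_UNFREEZE":
            self.frozen = False
        elif request in ("SNAPSHOT_ATOMIC_SNAPSHOT", "SNAPSHOT_ATOMIC_RESTORE"):
            # Assumed rule for this model: both atomic steps need frozen tasks.
            if not self.frozen:
                raise RuntimeError(f"{request} needs frozen tasks")
            if request == "SNAPSHOT_ATOMIC_SNAPSHOT":
                self.image, self.pos = bytes(self.memory), 0
            else:
                self.memory = bytes(self.incoming)
        else:
            raise ValueError(f"unknown request: {request}")

    def read(self, n: int) -> bytes:
        """Hibernation: the image comes out as a plain byte stream."""
        if self.image is None:
            raise RuntimeError("no image; call SNAPSHOT_ATOMIC_SNAPSHOT first")
        chunk = self.image[self.pos:self.pos + n]
        self.pos += len(chunk)
        return chunk

    def write(self, data: bytes) -> int:
        """Restore: the same byte stream goes back in."""
        self.incoming += data
        return len(data)

    def close(self) -> None:
        """Closing the device cancels: drop any image and thaw tasks."""
        self.image, self.incoming, self.frozen = None, bytearray(), False


def save_image(dev: ToySnapshotDevice, chunk: int = 4096) -> bytes:
    """User-space side of hibernation (hypothetical helper)."""
    dev.ioctl("SNAPSHOT_FREEZE")
    dev.ioctl("SNAPSHOT_ATOMIC_SNAPSHOT")
    parts = []
    while data := dev.read(chunk):
        parts.append(data)            # user space could compress/encrypt here
    return b"".join(parts)


def load_image(dev: ToySnapshotDevice, saved: bytes, chunk: int = 4096) -> None:
    """User-space side of resume (hypothetical helper)."""
    dev.ioctl("SNAPSHOT_FREEZE")
    for i in range(0, len(saved), chunk):
        dev.write(saved[i:i + chunk])
    dev.ioctl("SNAPSHOT_ATOMIC_RESTORE")
```

Because the image is only a byte stream in this model, compression or encryption would sit in user space between read() and the disk, without any kernel changes.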
* Re: Back to the future. 2007-04-28 10:35 ` Rafael J. Wysocki @ 2007-04-28 18:43 ` David Lang 2007-04-28 19:37 ` Rafael J. Wysocki 0 siblings, 1 reply; 135+ messages in thread From: David Lang @ 2007-04-28 18:43 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Pekka J Enberg, Oliver Neukum, Nigel Cunningham, Linus Torvalds, LKML On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: > On Friday, 27 April 2007 12:12, Pekka J Enberg wrote: >> The problem with writing in the kernel is obvious: we need to add new code >> to the kernel for compression, encryption, and userspace interaction >> (graphical progress bar) that are important for user experience. > > Yes, and that's why we wanted to introduce the userland part. The problem > with this approach, as it's turned out, is that the userland part must be a > very specialized piece of software, really careful of what it's doing, mainly > because of the inability to checkpoint filesystems. If we could checkpoint > filesystems and were able to unfreeze the user space after creating the > snapshot without the risk of corrupting filesystems in the restore phase, > the userland part could be much simpler (even as simple as Linus suggested). this sounds like a really good argument for having a usable userspace running. we already have the LVM snapshot code in the kernel, so we have the pieces available to protect the filesystems, we just need to figure out how to put them together. (the simplest way would be to make a new suspend package that required the user to use LVM so that snapshots are available, but this is also the most disruptive approach) David Lang ^ permalink raw reply [flat|nested] 135+ messages in thread
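David's proposal relies on copy-on-write snapshots of the kind LVM provides. Purely as an illustration (a toy Python model with made-up names, not device-mapper's implementation), a snapshot preserves the old contents of each block the first time the origin overwrites it, so the state at snapshot time stays readable. The revert step is an assumption added to fit the hibernation use case, where writes made after the memory image was taken would have to be discarded on resume.

```python
class CowVolume:
    """Toy block device with one copy-on-write snapshot (hypothetical)."""

    def __init__(self, nblocks: int):
        self.blocks = [b"\0"] * nblocks
        self.cow_store = None         # block index -> contents at snapshot time

    def take_snapshot(self) -> None:
        self.cow_store = {}

    def write(self, idx: int, data: bytes) -> None:
        # On the first overwrite after the snapshot, keep the old contents.
        if self.cow_store is not None and idx not in self.cow_store:
            self.cow_store[idx] = self.blocks[idx]
        self.blocks[idx] = data

    def read_snapshot(self, idx: int) -> bytes:
        # Snapshot view: preserved copy if the block changed, else the origin.
        if self.cow_store is None:
            raise RuntimeError("no snapshot")
        return self.cow_store.get(idx, self.blocks[idx])

    def revert_to_snapshot(self) -> None:
        # Assumed rollback: throw away everything written since the snapshot.
        for idx, old in self.cow_store.items():
            self.blocks[idx] = old
        self.cow_store = None

    def drop_snapshot(self) -> None:
        # Keep the new state and forget the preserved copies.
        self.cow_store = None
```

Only blocks touched after the snapshot cost extra space, which is what would make it cheap to let user space keep running while the image is written.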
* Re: Back to the future. 2007-04-28 18:43 ` David Lang @ 2007-04-28 19:37 ` Rafael J. Wysocki 0 siblings, 0 replies; 135+ messages in thread From: Rafael J. Wysocki @ 2007-04-28 19:37 UTC (permalink / raw) To: David Lang Cc: Pekka J Enberg, Oliver Neukum, Nigel Cunningham, Linus Torvalds, LKML On Saturday, 28 April 2007 20:43, David Lang wrote: > On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: > > On Friday, 27 April 2007 12:12, Pekka J Enberg wrote: > >> The problem with writing in the kernel is obvious: we need to add new code > >> to the kernel for compression, encryption, and userspace interaction > >> (graphical progress bar) that are important for user experience. > > > > Yes, and that's why we wanted to introduce the userland part. The problem > > with this approach, as it's turned out, is that the userland part must be a > > very specialized piece of software, really careful of what it's doing, mainly > > because of the inability to checkpoint filesystems. If we could checkpoint > > filesystems and were able to unfreeze the user space after creating the > > snapshot without the risk of corrupting filesystems in the restore phase, > > the userland part could be much simpler (even as simple as Linus suggested). > > this sounds like a really good argument for having a usable userspace running. > we already have the LVM snapshot code in the kernel, so we have the pieces > available to protect the filesystems, we just need to figure out how to put them > together. (the simplest way would be to make a new suspend package that > required the user to use LVM so that snapshots are available, but this is also > the most disruptive approach) Yes. I personally know very little about the LVM snapshot code and I wasn't aware of its capabilities. If we can make it possible to run the user space safely after we've created the memory snapshot, I'm all for it. 
As far as the package is concerned, we can just add the new user space tools to the suspend package containing our existing userland part. Greetings, Rafael ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-27 6:18 ` Pekka J Enberg ` (2 preceding siblings ...) 2007-04-27 9:50 ` Oliver Neukum @ 2007-04-27 21:24 ` Rafael J. Wysocki 2007-04-27 21:44 ` Linus Torvalds 3 siblings, 1 reply; 135+ messages in thread From: Rafael J. Wysocki @ 2007-04-27 21:24 UTC (permalink / raw) To: Pekka J Enberg; +Cc: Nigel Cunningham, Linus Torvalds, LKML On Friday, 27 April 2007 08:18, Pekka J Enberg wrote: > On Fri, 27 Apr 2007, Nigel Cunningham wrote: > > COW is a possibility, but I understood (perhaps wrongly) that Linus was > > thinking of a single syscall or such like to prepare the snapshot. If > > you're going to start doing things like this, won't that mean you'd then > > have to update/redo the snapshot or somehow nullify the effect of > > anything the programs do so that doing it again after the snapshot is > > restored doesn't cause problems? > > No. The snapshot is just that. A snapshot in time. From kernel point of > view, it doesn't matter one bit when you did it or if the state has > changed before you resume. It's up to userspace to make sure the user > doesn't do real work while the snapshot is being written to disk and > machine is shut down. Why do you think that keeping the user space frozen after 'snapshot' is a bad idea? I think that solves many of the problems you're discussing. Greetings, Rafael ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-27 21:24 ` Rafael J. Wysocki @ 2007-04-27 21:44 ` Linus Torvalds 2007-04-27 22:04 ` Rafael J. Wysocki ` (2 more replies) 0 siblings, 3 replies; 135+ messages in thread From: Linus Torvalds @ 2007-04-27 21:44 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: Pekka J Enberg, Nigel Cunningham, LKML On Fri, 27 Apr 2007, Rafael J. Wysocki wrote: > > Why do you think that keeping the user space frozen after 'snapshot' is a bad > idea? I think that solves many of the problems you're discussing. It makes it harder to debug (wouldn't it be *nice* to just ssh in, and do gdb -p <snapshotter> when something goes wrong?) but we also *depend* on user space for various things (the same way we depend on kernel threads, and why it has been such a total disaster to try to freeze the kernel threads too!). For example, if you want to do graphical stuff, just using X would be quite nice, wouldn't it? But I do agree that doing everything in the kernel is likely to just be a hell of a lot simpler for everybody. Linus ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-27 21:44 ` Linus Torvalds @ 2007-04-27 22:04 ` Rafael J. Wysocki 2007-04-27 22:08 ` Linus Torvalds 2007-04-27 22:07 ` Nigel Cunningham 2007-04-28 0:18 ` Jeremy Fitzhardinge 2 siblings, 1 reply; 135+ messages in thread From: Rafael J. Wysocki @ 2007-04-27 22:04 UTC (permalink / raw) To: Linus Torvalds; +Cc: Pekka J Enberg, Nigel Cunningham, LKML On Friday, 27 April 2007 23:44, Linus Torvalds wrote: > > On Fri, 27 Apr 2007, Rafael J. Wysocki wrote: > > > > Why do you think that keeping the user space frozen after 'snapshot' is a bad > > idea? I think that solves many of the problems you're discussing. > > It makes it harder to debug (wouldn't it be *nice* to just ssh in, and do > > gdb -p <snapshotter> > > when something goes wrong?) but we also *depend* on user space for various > things (the same way we depend on kernel threads, and why it has been such > a total disaster to try to freeze the kernel threads too!). We're freezing many of them just fine. ;-) > For example, if you want to do graphical stuff, just using X would be quite > nice, wouldn't it? Yes, it would, but as long as we can't protect mounted filesystems from being touched, it's just dangerous to let the user space run at that point. > But I do agree that doing everything in the kernel is likely to just be a > hell of a lot simpler for everybody. :-) Greetings, Rafael ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-27 22:04 ` Rafael J. Wysocki @ 2007-04-27 22:08 ` Linus Torvalds 2007-04-27 22:41 ` Rafael J. Wysocki 0 siblings, 1 reply; 135+ messages in thread From: Linus Torvalds @ 2007-04-27 22:08 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: Pekka J Enberg, Nigel Cunningham, LKML On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: > > We're freezing many of them just fine. ;-) And can you name a _single_ advantage of doing so? It so happens, that most people wouldn't notice or care that kmirrord got frozen (kernel thread picked at random - it might be one of the threads that has gotten special-cased to not do that), but I have yet to hear a single coherent explanation for why it's actually a good idea in the first place. And it has added totally idiotic code to every single kernel thread main loop. For _no_ reason, except that the concept was broken, and needed more breakage to just make it work. Linus ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-27 22:08 ` Linus Torvalds @ 2007-04-27 22:41 ` Rafael J. Wysocki 2007-04-27 22:26 ` David Lang 2007-04-27 23:17 ` Linus Torvalds 0 siblings, 2 replies; 135+ messages in thread From: Rafael J. Wysocki @ 2007-04-27 22:41 UTC (permalink / raw) To: Linus Torvalds; +Cc: Pekka J Enberg, Nigel Cunningham, LKML On Saturday, 28 April 2007 00:08, Linus Torvalds wrote: > > On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: > > > > We're freezing many of them just fine. ;-) > > And can you name a _single_ advantage of doing so? Yes. We have a lot less interdependencies to worry about during the whole operation. > It so happens, that most people wouldn't notice or care that kmirrord got > frozen (kernel thread picked at random - it might be one of the threads > that has gotten special-cased to not do that), but I have yet to hear a > single coherent explanation for why it's actually a good idea in the first > place. Well, I don't know if that's a 'coherent' explanation from your point of view (probably not), but I'll try nevertheless: 1) if the kernel threads are frozen, we know that they don't hold any locks that could interfere with the freezing of device drivers, 2) if they are frozen, we know, for example, that they won't call user mode helpers or do similar things, 3) if they are frozen, we know that they won't submit I/O to disks and potentially damage filesystems (suspend2 has much more problems with that than swsusp, but still. And yes, there have been bug reports related to it, so it's not just my fantasy). Probably some other people can say more about it. > And it has added totally idiotic code to every single kernel thread main > loop. For _no_ reason, except that the concept was broken, and needed more > breakage to just make it work. 
It is actually useful for some things other than the hibernation/suspend, the code is not idiotic (it's one line of code in the majority of cases) and you should take that "I hate everything even remotely related to hibernation" hat off, really. Greetings, Rafael ^ permalink raw reply [flat|nested] 135+ messages in thread
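The "one line of code" Rafael mentions is a call to the kernel's try_to_freeze() in a thread's main loop, placed where the thread can safely stop. The toy Python model below uses threading. ToyFreezer and worker are hypothetical names, and this is not the kernel freezer. It shows the handshake: the freezer raises a flag and waits until every worker has parked itself at that safe point. Because the safe point sits outside the lock-protected work, none of the workers can be holding the lock once they are all parked.

```python
import threading
import time


class ToyFreezer:
    """Toy freezer handshake (hypothetical); not the kernel implementation."""

    def __init__(self, nthreads: int):
        self.cond = threading.Condition()
        self.freezing = False
        self.nthreads = nthreads
        self.parked = 0

    def try_to_freeze(self) -> bool:
        """Called by a worker at a safe point, with no locks held."""
        with self.cond:
            if not self.freezing:
                return False
            self.parked += 1
            self.cond.notify_all()          # let freeze_all() recount
            while self.freezing:
                self.cond.wait()            # stay parked until thaw_all()
            self.parked -= 1
            return True

    def freeze_all(self, timeout: float = 5.0) -> bool:
        """Request a freeze and wait until every worker is parked."""
        with self.cond:
            self.freezing = True
            return self.cond.wait_for(lambda: self.parked == self.nthreads, timeout)

    def thaw_all(self) -> None:
        with self.cond:
            self.freezing = False
            self.cond.notify_all()


def worker(freezer, data_lock, stop, counter):
    """Hypothetical thread main loop."""
    while not stop.is_set():
        freezer.try_to_freeze()             # the "one line" in the main loop
        with data_lock:                     # real work happens under a lock
            counter[0] += 1
        time.sleep(0.001)
```

This also illustrates Linus's objection: every thread loop has to cooperate, and a thread that never reaches its freeze point stalls the whole freeze. In this model, the timeout in freeze_all() stands in for that failure case.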
* Re: Back to the future. 2007-04-27 22:41 ` Rafael J. Wysocki @ 2007-04-27 22:26 ` David Lang 2007-04-27 23:21 ` Rafael J. Wysocki 2007-04-27 23:17 ` Linus Torvalds 1 sibling, 1 reply; 135+ messages in thread From: David Lang @ 2007-04-27 22:26 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: Linus Torvalds, Pekka J Enberg, Nigel Cunningham, LKML On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: >>> We're freezing many of them just fine. ;-) >> >> And can you name a _single_ advantage of doing so? > > Yes. We have a lot less interdependencies to worry about during the whole > operation. > >> It so happens, that most people wouldn't notice or care that kmirrord got >> frozen (kernel thread picked at random - it might be one of the threads >> that has gotten special-cased to not do that), but I have yet to hear a >> single coherent explanation for why it's actually a good idea in the first >> place. > > Well, I don't know if that's a 'coherent' explanation from your point of view > (probably not), but I'll try nevertheless: > 1) if the kernel threads are frozen, we know that they don't hold any locks > that could interfere with the freezing of device drivers, does the process of freezing really wait until all locks have been released? > 2) if they are frozen, we know, for example, that they won't call user mode > helpers or do similar things, this won't matter unless the user mode helpers are going to do I/O or other permanent changes > 3) if they are frozen, we know that they won't submit I/O to disks and > potentially damage filesystems (suspend2 has much more problems with that > than swsusp, but still. And yes, there have been bug reports related to it, > so it's not just my fantasy). 
if you have the filesystems checkpointed then I/O after the freeze won't matter as you just revert to the checkpoint (and since this is going to be thrown away it can stay in ram) if we are willing to make a break with the past to implement the new snapshot capability, we should be able to use the LVM snapshot code to handle the filesystem David Lang > Probably some other people can say more about it. > >> And it has added totally idiotic code to every single kernel thread main >> loop. For _no_ reason, except that the concept was broken, and needed more >> breakage to just make it work. > > It is actually useful for some things other than the hibernation/suspend, the > code is not idiotic (it's one line of code in the majority of cases) and you > should take that "I hate everything even remotely related to hibernation" hat > off, really. > > Greetings, > Rafael ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-27 22:26 ` David Lang @ 2007-04-27 23:21 ` Rafael J. Wysocki 2007-04-27 23:01 ` David Lang 0 siblings, 1 reply; 135+ messages in thread From: Rafael J. Wysocki @ 2007-04-27 23:21 UTC (permalink / raw) To: David Lang; +Cc: Linus Torvalds, Pekka J Enberg, Nigel Cunningham, LKML On Saturday, 28 April 2007 00:26, David Lang wrote: > On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: > > >>> We're freezing many of them just fine. ;-) > >> > >> And can you name a _single_ advantage of doing so? > > > > Yes. We have a lot less interdependencies to worry about during the whole > > operation. > > > >> It so happens, that most people wouldn't notice or care that kmirrord got > >> frozen (kernel thread picked at random - it might be one of the threads > >> that has gotten special-cased to not do that), but I have yet to hear a > >> single coherent explanation for why it's actually a good idea in the first > >> place. > > > > Well, I don't know if that's a 'coherent' explanation from your point of view > > (probably not), but I'll try nevertheless: > > 1) if the kernel threads are frozen, we know that they don't hold any locks > > that could interfere with the freezing of device drivers, > > does the process of freezing really wait until all locks have been released? Yes, it does. > > 2) if they are frozen, we know, for example, that they won't call user mode > > helpers or do similar things, > > this won't matter unless the user mode helpers are going to do I/O or other > permanent changes Please note that even accessing a file may be a permanent change. > > 3) if they are frozen, we know that they won't submit I/O to disks and > > potentially damage filesystems (suspend2 has much more problems with that > > than swsusp, but still. And yes, there have been bug reports related to it, > > so it's not just my fantasy). 
> > if you have the filesystems checkpointed then I/O after the freeze won't matter > as you just revert to the checkpoint (and since this is going to be thrown away > it can stay in ram) In that case, I would agree. Currently, however, we're not even close to this point. The checkpointing of filesystems would be a very welcome feature, but there's no one working on it right now, AFAICT. > if we are willing to make a break with the past to implement the new snapshot > capability, we should be able to use the LVM snapshot code to handle the > filesystem Yes, we can do that, in principle, and screw all of the current users in the process. And finally we'd end up with something similar to what is done now, IMHO. And no, things are not totally broken, as one might conclude from these discussions. The problem is that the people who are discussing them so viciously have never tried to write anything like the hibernation code. It would be as though I were discussing the design of the CPU schedulers, although I only know how they work on a general level. Actually, the really problematic thing with the hibernation _right_ _now_ is what Linus is so concerned about (and rightfully so) - that we use the same device drivers' callbacks for the hibernation and suspend (aka s2ram). The other things work quite well and are really robust. Greetings, Rafael ^ permalink raw reply [flat|nested] 135+ messages in thread
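To make the shared-callbacks problem concrete: a driver with a single suspend hook cannot tell whether it should merely quiesce the device so that an image can be written (hibernation) or put it into a low-power state (suspend to RAM). Later kernels did split these: struct dev_pm_ops has .freeze/.thaw/.poweroff/.restore alongside .suspend/.resume. The Python sketch below is only a toy illustration of why the split helps. The class, its methods and the hibernation sequence are hypothetical and are not the kernel API.

```python
class ToyDriver:
    """Toy device with separate suspend-to-RAM and hibernation hooks (hypothetical)."""

    def __init__(self):
        self.power = "on"
        self.io_enabled = True
        self.log = []

    # Suspend to RAM: save state and drop into a low-power state.
    def suspend(self):
        self.io_enabled = False
        self.power = "low"
        self.log.append("suspend")

    def resume(self):
        self.power = "on"
        self.io_enabled = True
        self.log.append("resume")

    # Hibernation: only quiesce, because the device is needed again
    # to write the image once the snapshot has been taken.
    def freeze(self):
        self.io_enabled = False
        self.log.append("freeze")

    def thaw(self):
        self.io_enabled = True
        self.log.append("thaw")

    def poweroff(self):
        self.io_enabled = False
        self.power = "off"
        self.log.append("poweroff")


def hibernate_with(dev, save):
    """Assumed ordering: freeze, snapshot, thaw, save image, power off."""
    dev.freeze()
    image = "image"        # placeholder for the atomic snapshot
    dev.thaw()             # the disk must work again to save the image
    save(dev, image)
    dev.poweroff()


def suspend_to_ram(dev):
    dev.suspend()
```

With a single shared hook, the driver would have to guess which of these two behaviours the caller wants.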
* Re: Back to the future. 2007-04-27 23:21 ` Rafael J. Wysocki @ 2007-04-27 23:01 ` David Lang 2007-04-28 0:02 ` Rafael J. Wysocki 0 siblings, 1 reply; 135+ messages in thread From: David Lang @ 2007-04-27 23:01 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: Linus Torvalds, Pekka J Enberg, Nigel Cunningham, LKML On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: > On Saturday, 28 April 2007 00:26, David Lang wrote: >> On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: >> >>>>> We're freezing many of them just fine. ;-) >>>> >>>> And can you name a _single_ advantage of doing so? >>> >>> Yes. We have a lot less interdependencies to worry about during the whole >>> operation. >>> >>>> It so happens, that most people wouldn't notice or care that kmirrord got >>>> frozen (kernel thread picked at random - it might be one of the threads >>>> that has gotten special-cased to not do that), but I have yet to hear a >>>> single coherent explanation for why it's actually a good idea in the first >>>> place. >>> >>> Well, I don't know if that's a 'coherent' explanation from your point of view >>> (probably not), but I'll try nevertheless: >>> 1) if the kernel threads are frozen, we know that they don't hold any locks >>> that could interfere with the freezing of device drivers, >> >> does the process of freezing really wait until all locks have been released? > > Yes, it does. > >>> 2) if they are frozen, we know, for example, that they won't call user mode >>> helpers or do similar things, >> >> this won't matter unless the user mode helpers are going to do I/O or other >> permanent changes > > Please note that even accessing a file may be a permanent change. if accessing a file on a read-only filesystem changes that filesystem it's a bug; see the recent thread about ext3 journal replays when mounting read-only as an example. 
>>> 3) if they are frozen, we know that they won't submit I/O to disks and >>> potentially damage filesystems (suspend2 has much more problems with that >>> than swsusp, but still. And yes, there have been bug reports related to it, >>> so it's not just my fantasy). >> >> if you have the filesystems checkpointed then I/O after the freeze won't matter >> as you just revert to the checkpoint (and since this is going to be thrown away >> it can stay in ram) > > In that case, I would agree. Currently, however, we're not even close to this > point. > > The checkpointing of filesystems would be a very welcome feature, but there's > no one working on it right now, AFAICT. > >> if we are willing to make a break with the past to implement the new snapshot >> capability, we should be able to use the LVM snapshot code to handle the >> filesystem > > Yes, we can do that, in principle, and screw all of the current users in the > process. And finally we'd end up with something similar to what is done now, > IMHO. however, the result may be a lot less 'special case power management' code and a lot more re-use of code that's in place for other uses. if work on the current versions was stopped (other than trying to avoid regressions) and a new version (with new userspace tools) was built in a way that satisfies everyone, the old version could be phased out in a year or two (per the normal feature removal process) > And no, things are not totally broken, as one might conclude from these > discussions. The problem is that the people who are discussing them so > viciously have never tried to write anything like the hibernation code. > > It would be as though I were discussing the design of the CPU schedulers, > although I only know how they work on a general level. 
> > Actually, the really problematic thing with the hibernation _right_ _now_ is > what Linus is so concerned about (and rightfully so) - that we use the > same device drivers' callbacks for the hibernation and suspend (aka s2ram). > The other things work quite well and are really robust. if simply splitting the functions cleans everything up enough to satisfy everyone then we're almost done, right? ;-) however I think that there are other fundamental disagreements here, and neither the 'do absolutely everything in the kernel' nor the 'do almost nothing in the kernel' approach is going to fly in the long run. I think the userspace<->kernel interface is going to be different from what either approach is doing now, and as such it's an opportunity to make more drastic changes if they are appropriate. for example, why should we have LVM snapshot code and hibernate snapshot/filesystem checkpoint code instead of just using the LVM code (which gets exercised and tested far more than the other code ever would be)? saying that if you want to suspend to disk you need to use LVM is a change, but it's a change that people could probably live with. David Lang ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-27 23:01 ` David Lang @ 2007-04-28 0:02 ` Rafael J. Wysocki 0 siblings, 0 replies; 135+ messages in thread From: Rafael J. Wysocki @ 2007-04-28 0:02 UTC (permalink / raw) To: David Lang; +Cc: Linus Torvalds, Pekka J Enberg, Nigel Cunningham, LKML On Saturday, 28 April 2007 01:01, David Lang wrote: > On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: > > > On Saturday, 28 April 2007 00:26, David Lang wrote: > >> On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: > >> > >>>>> We're freezing many of them just fine. ;-) > >>>> > >>>> And can you name a _single_ advantage of doing so? > >>> > >>> Yes. We have a lot less interdependencies to worry about during the whole > >>> operation. > >>> > >>>> It so happens, that most people wouldn't notice or care that kmirrord got > >>>> frozen (kernel thread picked at random - it might be one of the threads > >>>> that has gotten special-cased to not do that), but I have yet to hear a > >>>> single coherent explanation for why it's actually a good idea in the first > >>>> place. > >>> > >>> Well, I don't know if that's a 'coherent' explanation from your point of view > >>> (probably not), but I'll try nevertheless: > >>> 1) if the kernel threads are frozen, we know that they don't hold any locks > >>> that could interfere with the freezing of device drivers, > >> > >> does the process of freezing really wait until all locks have been released? > > > > Yes, it does. > > > >>> 2) if they are frozen, we know, for example, that they won't call user mode > >>> helpers or do similar things, > >> > >> this won't matter unless the user mode helpers are going to do I/O or other > >> permanent changes > > > > Please note that even accessing a file may be a permanent change. > > if accessing a file on a read-only filesystem changes that filesystem it's a bug; > > see the recent thread about ext3 journal replays when mounting read-only as an > example. Oh well. 
Is it really wrong to protect users from such bugs, if we can? > >>> 3) if they are frozen, we know that they won't submit I/O to disks and > >>> potentially damage filesystems (suspend2 has much more problems with that > >>> than swsusp, but still. And yes, there have been bug reports related to it, > >>> so it's not just my fantasy). > >> > >> if you have the filesystems checkpointed then I/O after the freeze won't matter > >> as you just revert to the checkpoint (and since this is going to be thrown away > >> it can stay in ram) > > > > In that case, I would agree. Currently, however, we're not even close to this > > point. > > > > The checkpointing of filesystems would be a very welcome feature, but there's > > no one working on it right now, AFAICT. > > > >> if we are willing to make a break with the past to implement the new snapshot > >> capability, we should be able to use the LVM snapshot code to handle the > >> filesystem > > > > Yes, we can do that, in principle, and screw all of the current users in the > > process. And finally we'd end up with something similar to what is done now, > > IMHO. > > however, the result may be a lot less 'special case power management' code and a Are you referring to some specific code? > lot more re-use of code that's in place for other uses. This already is happening. > if work on the current versions was stopped (other than trying to avoid > regressions) and a new version (with new userspace tools) was built in a way > that satisfies everyone, the old version could be phased out in a year or two > (per the normal feature removal process) May I say it's not realistic? > > And no, things are not totally broken, as one might conclude from these > > discussions. The problem is that the people who are discussing them so > > viciously have never tried to write anything like the hibernation code. 
> > > > It would be as though I were discussing the design of the CPU schedulers, > > although I only know how they work on a general level. > > > > Actually, the really problematic thing with the hibernation _right_ _now_ is > > what Linus is so concerned about (and rightfully so) - that we use the > > same device drivers' callbacks for the hibernation and suspend (aka s2ram). > > The other things work quite well and are really robust. > > if simply splitting the functions cleans everything up enough to satisfy > everyone then we're almost done, right? ;-) Practically, yes. Theoretically, there's no software you can't improve (except, probably, TeX), but that might not be worth the effort. > however I think that there are other fundamental disagreements here, and neither > the 'do absolutely everything in the kernel' nor the 'do almost nothing in the > kernel' approach is going to fly in the long run. I think we'll reach an agreement, though. > I think the userspace<->kernel interface is going to be different from what > either approach is doing now, You're probably right. > and as such it's an opportunity to make more drastic changes if they are > appropriate. Well, maybe. > for example, why should we have LVM snapshot code and hibernate > snapshot/filesystem checkpoint code instead of just using the LVM code (which > gets exercised and tested far more than the other code ever would be)? saying > that if you want to suspend to disk you need to use LVM is a change, but it's > a change that people could probably live with. Well, that's a theory. Probably a good one, but still. :-) The positive aspect of all this is that people have started to pay attention to what we're doing, and gradually they will learn about the problems that they're just not seeing right now. Greetings, Rafael ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-27 22:41 ` Rafael J. Wysocki 2007-04-27 22:26 ` David Lang @ 2007-04-27 23:17 ` Linus Torvalds 2007-04-27 23:45 ` Rafael J. Wysocki 2007-05-03 15:25 ` Pavel Machek 1 sibling, 2 replies; 135+ messages in thread From: Linus Torvalds @ 2007-04-27 23:17 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: Pekka J Enberg, Nigel Cunningham, LKML On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: > > > And can you name a _single_ advantage of doing so? > > Yes. We have a lot less interdependencies to worry about during the whole > operation. That's not an advantage. That's why it has *sucked*. Trying to freeze kernel threads has _caused_ problems. It has _added_ these interdependencies. It hasn't removed a single dependency at any time, it has just added new problems! > 1) if the kernel threads are frozen, we know that they don't hold any locks > that could interfere with the freezing of device drivers, > 2) if they are frozen, we know, for example, that they won't call user mode > helpers or do similar things, > 3) if they are frozen, we know that they won't submit I/O to disks and > potentially damage filesystems (suspend2 has much more problems with that > than swsusp, but still. And yes, there have been bug reports related to it, > so it's not just my fantasy). NONE of these are valid explanations at all. You're listing totally theoretical problems, and ignoring all the _real_ problems that trying to freeze kernel threads has _caused_. If you want to control user-mode helpers, you do that - you do not freeze kernel threads! And no, kernel threads do not submit IO to disks on their own. You just made that up. Yes, they can be involved in that whole disk submission thing, but in a good way - they can be required in order to make disk writing work! The problem that suspend has had is that it's done everything totally the wrong way around. Do kernel threads do disk IO? Sure, if asked to do so. 
For example, kernel threads can be involved in md etc, but that's a *good* thing. The way to shut them up is not to freeze the threads, but to freeze the *disk*. Linus ^ permalink raw reply [flat|nested] 135+ messages in thread
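Linus's alternative is to quiesce the device rather than the threads that feed it. That can be sketched as a request queue with a freeze gate: kernel threads may keep submitting I/O without blocking, but nothing reaches the medium until the queue is thawed. This is a toy Python model under that assumption. FreezableQueue and its methods are hypothetical and say nothing about how the block layer actually does this. The model also ignores the harder case Rafael raises in his reply, namely I/O that starts before the snapshot and completes after it.

```python
from collections import deque


class FreezableQueue:
    """Toy disk request queue that can be frozen (hypothetical)."""

    def __init__(self):
        self.frozen = False
        self.pending = deque()   # requests held back while frozen
        self.medium = []         # what has actually reached the disk

    def submit(self, req) -> None:
        # Submitters never block or need to know about hibernation;
        # the queue alone decides whether the request goes out now.
        if self.frozen:
            self.pending.append(req)
        else:
            self.medium.append(req)

    def freeze(self) -> None:
        self.frozen = True

    def thaw(self) -> None:
        # Release held requests in their original order.
        self.frozen = False
        while self.pending:
            self.medium.append(self.pending.popleft())
```

The design point is where the gate lives: only the device needs to know about the freeze, so thread main loops need no extra freezer call.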
* Re: Back to the future. 2007-04-27 23:17 ` Linus Torvalds @ 2007-04-27 23:45 ` Rafael J. Wysocki 2007-04-27 23:57 ` Nigel Cunningham 2007-04-27 23:59 ` Linus Torvalds 2007-05-03 15:25 ` Pavel Machek 1 sibling, 2 replies; 135+ messages in thread From: Rafael J. Wysocki @ 2007-04-27 23:45 UTC (permalink / raw) To: Linus Torvalds, Nigel Cunningham; +Cc: Pekka J Enberg, LKML On Saturday, 28 April 2007 01:17, Linus Torvalds wrote: > > On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: > > > > > And can you name a _single_ advantage of doing so? > > > > Yes. We have a lot less interdependencies to worry about during the whole > > operation. > > That's not an advantage. That's why it has *sucked*. Actually, the less things happen while we're creating and saving the image, the less sources of potential problems there are and by freezing the kernel threads (not all of them), we cause less things to happen at that time. To make you happy, we could stop doing that, but what actual _advantage_ would that bring? > Trying to freeze kernel threads has _caused_ problems. It has _added_ > these interdependencies. It hasn't removed a single dependency at any > time, it has just added new problems! What problems are you talking about? > > 1) if the kernel threads are frozen, we know that they don't hold any locks > > that could interfere with the freezing of device drivers, > > 2) if they are frozen, we know, for example, that they won't call user mode > > helpers or do similar things, > > 3) if they are frozen, we know that they won't submit I/O to disks and > > potentially damage filesystems (suspend2 has much more problems with that > > than swsusp, but still. And yes, there have been bug reports related to it, > > so it's not just my fantasy). > > NONE of these are valid explanations at all. You're listing totally > theoretical problems, and ignoring all the _real_ problems that trying to > freeze kernel threads has _caused_. Example, please? 
> If you want to control user-mode helpers, you do that - you do not freeze > kernel threads! > > And no, kernel threads do not submit IO to disks on their own. You just > made that up. No, I didn't. Nigel can confirm, I think. > Yes, they can be involved in that whole disk submission thing, but in a good > way - they can be required in order to make disk writing work! Some of them can be, others need not be. We don't need any fs-related kernel threads for saving the image, for example. > The problem that suspend has had is that it's done everything totally the > wrong way around. Do kernel threads do disk IO? Sure, if asked to do so. They can be asked before we do the snapshot and complete the operation afterwards, no? > For example, kernel threads can be involved in md etc, but that's a *good* > thing. We don't freeze these threads. > The way to shut them up is not to freeze the threads, but to freeze the *disk*. In principle, you're right. In practice, go and try it. Anyway, why is it so important that _all_ of the kernel threads be running while the snapshot is created and saved?
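For readers unfamiliar with the freezer being argued about, here is a minimal userspace sketch of a cooperative freezer, with Python threads standing in for kernel threads. It is a toy model, not the kernel's implementation: `Freezer`, `try_to_freeze`, `worker` and `run_demo` are hypothetical names, and a task left out of the `freezable` list plays the role of a PF_NOFREEZE-style exemption. It shows the property both sides are arguing about: a task only freezes when it reaches a safe point on its own, so a task blocked waiting for an event that never arrives (here standing in for I/O that only an already-frozen task could complete) makes the whole freeze time out.

```python
import threading
import time


class Freezer:
    """Toy cooperative freezer: tasks park themselves at safe points."""

    def __init__(self):
        self._freezing = threading.Event()  # set while a freeze is requested
        self._thawed = threading.Event()    # set when parked tasks may resume
        self._thawed.set()
        self._lock = threading.Lock()
        self._frozen = set()

    def try_to_freeze(self, name):
        """Called by a task at a safe point; parks it while a freeze is in effect."""
        if not self._freezing.is_set():
            return
        with self._lock:
            self._frozen.add(name)
        self._thawed.wait()
        with self._lock:
            self._frozen.discard(name)

    def freeze(self, freezable, timeout):
        """Request a freeze; True only if every freezable task parked in time."""
        self._thawed.clear()
        self._freezing.set()
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            with self._lock:
                if set(freezable) <= self._frozen:
                    return True
            time.sleep(0.01)
        return False

    def thaw(self):
        self._freezing.clear()
        self._thawed.set()


def worker(freezer, name, stop, blocker=None):
    """A task loop; if `blocker` is given it must fire before the safe point is reached."""
    while not stop.is_set():
        if blocker is not None:
            blocker.wait()
        freezer.try_to_freeze(name)
        time.sleep(0.005)


def run_demo(stuck):
    freezer = Freezer()
    stop = threading.Event()
    never = threading.Event()  # an "I/O completion" nobody will deliver
    names = ["kthread_a", "kthread_b"]
    threads = []
    for name in names:
        blocker = never if (stuck and name == "kthread_b") else None
        t = threading.Thread(target=worker, args=(freezer, name, stop, blocker), daemon=True)
        t.start()
        threads.append(t)
    ok = freezer.freeze(names, timeout=0.5)
    freezer.thaw()
    stop.set()
    never.set()  # release the stuck task so the demo can exit cleanly
    for t in threads:
        t.join(timeout=1)
    return ok


print(run_demo(stuck=False), run_demo(stuck=True))
```

In this model the failure comes entirely from the dependency between tasks, not from anything either task does wrong, which is the kind of interdependency Linus says freezing kernel threads has added.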
* Re: Back to the future. From: Nigel Cunningham @ 2007-04-27 23:57 UTC To: Rafael J. Wysocki; +Cc: Linus Torvalds, Pekka J Enberg, LKML Hi. On Sat, 2007-04-28 at 01:45 +0200, Rafael J. Wysocki wrote: > On Saturday, 28 April 2007 01:17, Linus Torvalds wrote: > > > > On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: > > > > > > > And can you name a _single_ advantage of doing so? > > > > > > Yes. We have far fewer interdependencies to worry about during the whole > > > operation. > > > > That's not an advantage. That's why it has *sucked*. > > Actually, the fewer things happen while we're creating and saving the image, > the fewer sources of potential problems there are, and by freezing the kernel > threads (not all of them), we cause fewer things to happen at that time. > > To make you happy, we could stop doing that, but what actual _advantage_ > would that bring? A couple of other advantages to freezing other processes: 1) It makes predicting how much memory is available for making and saving the snapshot a tractable problem. It therefore makes hibernation _much_ more reliable. 2) Racing against other processes would also make hibernation slower, increasing the chances of your battery running out before the save is complete. 3) It makes finding potential memory leaks in the code possible. It was ages ago now, but at one stage I could display a table saying exactly how many pages had been allocated and freed by different sections of the process and compare the number of free pages at the start and end of the cycle to ensure there were no memory leaks at all. > > Trying to freeze kernel threads has _caused_ problems. It has _added_ > > these interdependencies.
It hasn't removed a single dependency at any > > time, it has just added new problems! > > What problems are you talking about? > > > > 1) if the kernel threads are frozen, we know that they don't hold any locks > > > that could interfere with the freezing of device drivers, > > > 2) if they are frozen, we know, for example, that they won't call user mode > > > helpers or do similar things, > > > 3) if they are frozen, we know that they won't submit I/O to disks and > > > potentially damage filesystems (suspend2 has many more problems with that > > > than swsusp, but still. And yes, there have been bug reports related to it, > > > so it's not just my fantasy). > > > > NONE of these are valid explanations at all. You're listing totally > > theoretical problems, and ignoring all the _real_ problems that trying to > > freeze kernel threads has _caused_. > > Example, please? I agree with Rafael. Freezing processes greatly helps in ensuring we have a consistent image. He's right, too, in asserting that it's even more important for Suspend2. Freezing processes is essential to being able to know that those LRU pages won't change and therefore being able to save them separately and then reuse them for the atomic copy. > > If you want to control user-mode helpers, you do that - you do not freeze > > kernel threads! > > > > And no, kernel threads do not submit IO to disks on their own. You just > > made that up. > > No, I didn't. Nigel can confirm, I think. I have had problems with MD threads generating I/O that I couldn't account for - after userspace had been frozen, filesystems had been nicely synced and so on. I have to speak with reservations though, because I haven't yet gotten to the bottom of where the I/O is coming from... too many things, too small time slices. > > Yes, they can be involved in that whole disk submission thing, but in a good > > way - they can be required in order to make disk writing work! > > Some of them can be, others need not be.
We don't need any fs-related > kernel threads for saving the image, for example. Yeah, so long as we bmap the storage we want to use beforehand (thinking of swap files and ordinary files). > > The problem that suspend has had is that it's done everything totally the > > wrong way around. Do kernel threads do disk IO? Sure, if asked to do so. > > They can be asked before we do the snapshot and complete the operation > afterwards, no? > > > For example, kernel threads can be involved in md etc, but that's a *good* > > thing. > > We don't freeze these threads. > > > The way to shut them up is not to freeze the threads, but to freeze the *disk*. > > In principle, you're right. In practice, go and try it. I have to disagree here. Freezing the disk instead of the threads is dealing with the symptoms instead of the cause. Regards, Nigel
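Nigel's third point, tracking pages allocated and freed by each section of the cycle and comparing free-page counts at the start and end, can be sketched as a small ledger. This is a hypothetical Python model (`PageLedger`, the section names and the page counts are illustrative, not Suspend2 code). It also touches his first point: the allocation check can only predict what is available because nothing outside the tracked sections is allocating in the meantime, which is the role freezing plays in his argument.

```python
from collections import defaultdict


class PageLedger:
    """Toy per-section page accounting for one hibernation cycle."""

    def __init__(self, free_pages):
        self.free_pages = free_pages
        self.start_free = free_pages
        self.allocated = defaultdict(int)
        self.freed = defaultdict(int)

    def alloc(self, section, n):
        # Predictable only if no untracked allocator runs concurrently.
        if n > self.free_pages:
            raise MemoryError(f"{section}: wants {n} pages, only {self.free_pages} free")
        self.free_pages -= n
        self.allocated[section] += n

    def free(self, section, n):
        self.free_pages += n
        self.freed[section] += n

    def table(self):
        """Rows of (section, allocated, freed, net); a nonzero net marks a leak."""
        sections = sorted(set(self.allocated) | set(self.freed))
        return [(s, self.allocated[s], self.freed[s], self.allocated[s] - self.freed[s])
                for s in sections]

    def leaked(self):
        """Free pages at the start of the cycle minus free pages now."""
        return self.start_free - self.free_pages


ledger = PageLedger(free_pages=10_000)
ledger.alloc("atomic_copy", 4000)
ledger.alloc("io_buffers", 64)
ledger.free("io_buffers", 64)
ledger.free("atomic_copy", 3999)  # one page is never returned
for row in ledger.table():
    print(row)
print("leaked pages:", ledger.leaked())
```

The per-section table points at `atomic_copy` as the section that leaked one page, which is the kind of end-of-cycle comparison Nigel describes.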
* Re: Back to the future. From: David Lang @ 2007-04-27 23:50 UTC To: Nigel Cunningham; +Cc: Rafael J. Wysocki, Linus Torvalds, Pekka J Enberg, LKML On Sat, 28 Apr 2007, Nigel Cunningham wrote: > Hi. > > On Sat, 2007-04-28 at 01:45 +0200, Rafael J. Wysocki wrote: >> On Saturday, 28 April 2007 01:17, Linus Torvalds wrote: >>> >>> On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: >>>> >>>>> And can you name a _single_ advantage of doing so? >>>> >>>> Yes. We have far fewer interdependencies to worry about during the whole >>>> operation. >>> >>> That's not an advantage. That's why it has *sucked*. >> >> Actually, the fewer things happen while we're creating and saving the image, >> the fewer sources of potential problems there are, and by freezing the kernel >> threads (not all of them), we cause fewer things to happen at that time. >> >> To make you happy, we could stop doing that, but what actual _advantage_ >> would that bring? > > A couple of other advantages to freezing other processes: > > 1) It makes predicting how much memory is available for making and > saving the snapshot a tractable problem. It therefore makes hibernation > _much_ more reliable. > 2) Racing against other processes would also make hibernation slower, > increasing the chances of your battery running out before the save is > complete. > 3) It makes finding potential memory leaks in the code possible. It was > ages ago now, but at one stage I could display a table saying exactly > how many pages had been allocated and freed by different sections of the > process and compare the number of free pages at the start and end of the > cycle to ensure there were no memory leaks at all. nobody is suggesting that you leave processes running while you do the snapshot, what is being proposed is 1.
pause userspace (prevent scheduling) 2. make snapshot image of memory 3. make mounted filesystems read-only (possibly with snapshot/checkpoint) 4. unpause 5. save image (with full userspace available, including network) 6. shutdown system (throw away all userspace memory, no need to do graceful shutdown or nice kill signals, revert filesystem to snapshot/checkpoint if needed) >>> NONE of these are valid explanations at all. You're listing totally >>> theoretical problems, and ignoring all the _real_ problems that trying to >>> freeze kernel threads has _caused_. >> >> Example, please? > > I agree with Rafael. Freezing processes greatly helps in ensuring we > have a consistent image. He's right, too, in asserting that it's even > more important for Suspend2. Freezing processes is essential to being > able to know that those LRU pages won't change and therefore being able > to save them separately and then reuse them for the atomic copy. all that's needed for the snapshot is to prevent userspace from scheduling, and prevent media from being written to in a permanent way (writing to an LVM volume after invoking a snapshot doesn't count, just revert to the snapshot) David Lang
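David's proposed ordering can be written down as an explicit sequence, which makes it easy to see what the saved image does and does not contain. The following is a toy Python model, not kernel code: the `Machine` class and its fields are hypothetical, "memory" is a dict, and the snapshot is a deep copy. Because the image is captured in step 2, anything userspace does while the image is being saved (step 5) is not in the image and is discarded in step 6, which is the behaviour Oliver objects to in his reply.

```python
import copy


class Machine:
    """Toy model of the six-step suspend-to-disk sequence proposed above."""

    def __init__(self):
        self.memory = {"log": ["boot"]}
        self.userspace_running = True
        self.fs_writable = True
        self.image = None
        self.saved_image = None

    def userspace_work(self, entry):
        if self.userspace_running:
            self.memory["log"].append(entry)

    def hibernate(self, work_during_save=()):
        self.userspace_running = False                # 1. pause userspace
        self.image = copy.deepcopy(self.memory)       # 2. snapshot memory
        self.fs_writable = False                      # 3. filesystems read-only
        self.userspace_running = True                 # 4. unpause
        for entry in work_during_save:                #    userspace keeps running...
            self.userspace_work(entry)
        self.saved_image = copy.deepcopy(self.image)  # 5. save image
        self.memory = None                            # 6. shut down, RAM discarded
        self.userspace_running = False

    def resume(self):
        self.memory = copy.deepcopy(self.saved_image)
        self.fs_writable = True
        self.userspace_running = True


m = Machine()
m.userspace_work("before")
m.hibernate(work_during_save=["served file to client"])
m.resume()
print(m.memory["log"])
```

After resume the log contains only `boot` and `before`; the entry written during the save is gone, exactly as step 6 implies.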
* Re: Back to the future. From: Linus Torvalds @ 2007-04-28 0:40 UTC To: David Lang; +Cc: Nigel Cunningham, Rafael J. Wysocki, Pekka J Enberg, LKML On Fri, 27 Apr 2007, David Lang wrote: > > all that's needed for the snapshot is to prevent userspace from scheduling, Strictly speaking, all you *really* want to make sure is not so much that user-space isn't scheduling, as the fact that all device IO buffers must be empty. We can trivially snapshot an active user-space, and in fact it would probably be hard to do a snapshot in a way that it could even *know* or care about whether there are user-space processes running at the time of the snapshot. So that's not the real problem. What we obviously *cannot* snapshot is if some particular device is in the middle of being written to or read from, and has outstanding commands on the device itself (as opposed to just queued to the driver). So what we do want to make sure happens is that there are no IO queues that are active. And the best way to make sure that there are no IO queues active is to make sure that there are no new read or write-requests. And *that* you can do two ways: - actually intercepting the read/write requests. Probably not too hard, we could literally do it in the IO scheduler (and probably much more easily than doing it in the process scheduler), but the easy cases will only cover the block device layer, and character devices don't have the same kind of scheduler you can trap IO in. - we also don't want to generate new data that needs to be snapshotted, so we want to trap people who write even just to the page cache and turn pages dirty.
Again, we could probably do it at *that* point (i.e. trapping them when they try to dirty a page), and it would be more logical, but again, there are other cases of people who generate more data (just any memory allocation obviously is a special case of generating more data to be snapshotted), so I do agree that we want to stop producing new data to be snapshotted, and we want to stop producing new read-requests. But kernel threads really do neither: in an idle system, kernel threads are idle too. A kernel thread is not like a user program that actually generates data - they only tend to act on behalf of other processes' needs. So I think that what snapshotting really *wants* to stop is not scheduling per se, but IO. And stopping user processes (as opposed to kernel threads) is probably a good way to get there. In fact, I'd argue that you want to stop user space and then encourage some kernel threads to *start* running, notably things like bdflush should probably be kicked to clean up some dirty stuff as part of the "shrink data to be snapshotted" part. Trying to free memory will do that on its own, of course. Linus
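Linus's first option, trapping new read/write requests at the IO-scheduler level instead of freezing processes, amounts to a queue that stops issuing new submissions while requests already sent to the device drain. A minimal Python sketch under that assumption follows; `QuiescableQueue`, `submit`, `complete_one`, `quiesce` and `resume` are hypothetical names, not the kernel block-layer API. Requests arriving during the quiesce are held rather than failed, and the snapshot point is reached once nothing is in flight.

```python
from collections import deque


class QuiescableQueue:
    """Toy block request queue that can stop issuing new I/O and drain."""

    def __init__(self):
        self.in_flight = deque()  # requests already handed to the "device"
        self.held = deque()       # requests trapped while quiesced
        self.quiesced = False
        self.completed = []

    def submit(self, req):
        if self.quiesced:
            self.held.append(req)  # trap it instead of issuing it
        else:
            self.in_flight.append(req)

    def complete_one(self):
        if self.in_flight:
            self.completed.append(self.in_flight.popleft())

    def quiesce(self):
        """Stop issuing new requests, then drain what the device already has."""
        self.quiesced = True
        while self.in_flight:
            self.complete_one()
        return not self.in_flight  # True: safe to take the snapshot

    def resume(self):
        self.quiesced = False
        while self.held:
            self.in_flight.append(self.held.popleft())


q = QuiescableQueue()
q.submit("write sector 10")
q.submit("read sector 99")
safe = q.quiesce()           # device idle: the snapshot would be taken here
q.submit("write sector 11")  # arrives mid-snapshot and is held
print(safe, q.completed, list(q.held))
q.resume()
print(list(q.in_flight))
```

As Linus notes, a trap like this only covers paths that go through such a queue; character devices and page-cache dirtying would need their own interception points, which this sketch does not model.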
* Re: Back to the future. From: Oliver Neukum @ 2007-04-28 6:58 UTC To: David Lang Cc: Nigel Cunningham, Rafael J. Wysocki, Linus Torvalds, Pekka J Enberg, LKML On Saturday, 28 April 2007 01:50, David Lang wrote: > 3. make mounted filesystems read-only (possibly with snapshot/checkpoint) > 4. unpause > 5. save image (with full userspace available, including network) > 6. shutdown system (throw away all userspace memory, no need to do graceful > shutdown or nice kill signals, revert filesystem to snapshot/checkpoint if > needed) And then you'll have people wonder why the server which sent out all those files has no log entries. You'd have to selectively unfreeze user space, which is a cure worse than the disease. Simply throwing away user space work is a bug. And no, you cannot say that it'll be redone anyway, as you are throwing away accepted input, too. Regards Oliver
* Re: Back to the future. From: Pekka J Enberg @ 2007-04-28 9:16 UTC To: Oliver Neukum Cc: David Lang, Nigel Cunningham, Rafael J. Wysocki, Linus Torvalds, LKML On Sat, 28 Apr 2007, Oliver Neukum wrote: > And then you'll have people wonder why the server which sent out all > those files has no log entries. You'd have to selectively unfreeze user > space, which is a cure worse than the disease. > > Simply throwing away user space work is a bug. And no, you cannot say that > it'll be redone anyway, as you are throwing away accepted input, too. It's not a bug, it's a feature =). While I totally agree with you that for the common case, you probably do want to avoid work in the userspace after taking the snapshot, it is something that should be solved separately. There is absolutely nothing wrong with taking a snapshot, doing some work, and then resuming to the snapshot and thus "losing" some of the work (this is useful for debugging, for example). Pekka
* Re: Back to the future. From: David Lang @ 2007-04-28 18:28 UTC To: Oliver Neukum Cc: Nigel Cunningham, Rafael J. Wysocki, Linus Torvalds, Pekka J Enberg, LKML On Sat, 28 Apr 2007, Oliver Neukum wrote: > On Saturday, 28 April 2007 01:50, David Lang wrote: >> 3. make mounted filesystems read-only (possibly with snapshot/checkpoint) >> 4. unpause >> 5. save image (with full userspace available, including network) >> 6. shutdown system (throw away all userspace memory, no need to do graceful >> shutdown or nice kill signals, revert filesystem to snapshot/checkpoint if >> needed) > > And then you'll have people wonder why the server which sent out all > those files has no log entries. You'd have to selectively unfreeze user > space, which is a cure worse than the disease. > > Simply throwing away user space work is a bug. And no, you cannot say that > it'll be redone anyway, as you are throwing away accepted input, too. when you are doing a suspend-to-disk I disagree with you. whoever is doing the suspend knows what is going on, and they can decide what needs to be done. the only case where you have 'unexpected' work being thrown away is if you are suspending a network server, and the process of suspending it is going to cut all the network connections anyway so it's not a seamless process. In this case it's fair to let the sysadmin choose between losing some logs or doing some other step to prevent this from happening (which could be to shut down the network service, or load an iptables rule to block the service) however, most of the uses of suspend-to-disk are going to be single-user machines and in that case telling the user that anything that they do after issuing the suspend is going to be lost is a perfectly sane thing to do.
and for that matter, if the snapshot is cheap enough, some people may choose to cron the snapshot portion of a suspend-to-disk every few min as a safety net for something going wrong. In this case they really do want all of userspace to keep working after the snapshot. David Lang
* Re: Back to the future. From: Pavel Machek @ 2007-05-03 17:18 UTC To: David Lang Cc: Nigel Cunningham, Rafael J. Wysocki, Linus Torvalds, Pekka J Enberg, LKML Hi! > nobody is suggesting that you leave processes running > while you do the snapshot, what is being proposed is > > 1. pause userspace (prevent scheduling) > 2. make snapshot image of memory > 3. make mounted filesystems read-only (possibly with > snapshot/checkpoint) > 4. unpause > 5. save image (with full userspace available, including > network) Including network? Your tcp peers will be really confused, then, if you ACK packets then claim you did not get them. No, you do not want to start network. -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: Back to the future. From: David Lang @ 2007-05-07 2:13 UTC To: Pavel Machek Cc: Nigel Cunningham, Rafael J. Wysocki, Linus Torvalds, Pekka J Enberg, LKML On Thu, 3 May 2007, Pavel Machek wrote: > Hi! > >> nobody is suggesting that you leave processes running >> while you do the snapshot, what is being proposed is >> >> 1. pause userspace (prevent scheduling) >> 2. make snapshot image of memory >> 3. make mounted filesystems read-only (possibly with >> snapshot/checkpoint) >> 4. unpause >> 5. save image (with full userspace available, including >> network) > > Including network? Your tcp peers will be really confused, then, if > you ACK packets then claim you did not get them. No, you do not want > to start network. anyone who is doing a hibernate or suspend who expects all the network connections to be working afterwards is dreaming or smoking something. this is just another way that the failure can show up. in fact, I would say that it would probably be a nice thing to do for intervening firewalls and external servers if a suspend closed all external TCP connections rather than leaving them dangling (eating up resources until they time out) if your software can't tolerate the network connection going away on you it will have problems in normal operation anyway, let alone when you suspend/hibernate your machine. David Lang
* Re: Back to the future. From: Kyle Moffett @ 2007-05-07 3:33 UTC To: David Lang Cc: Pavel Machek, Nigel Cunningham, Rafael J. Wysocki, Linus Torvalds, Pekka J Enberg, LKML On May 06, 2007, at 22:13:51, David Lang wrote: > anyone who is doing a hibernate or suspend who expects all the > network connections to be working afterwards is dreaming or > smoking something. > > this is just another way that the failure can show up. > > in fact, I would say that it would probably be a nice thing to do > for intervening firewalls and external servers if a suspend closed > all external TCP connections rather than leaving them dangling > (eating up resources until they time out) > > if your software can't tolerate the network connection going away on > you it will have problems in normal operation anyway, let alone > when you suspend/hibernate your machine. Yeah, for suspend-to-ram+resume and for snapshot+restore you probably want userspace to support some kind of initscript-like mechanism which is triggered by the lid-switch or something before calling into the kernel. That way it can close network connections mostly-nicely and down network interfaces before suspending, then re-run DHCP/802.11/whatever configuration after resume/restore. That might not be a bad place to handle NFS mounts and such too. Cheers, Kyle Moffett
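Kyle's suggestion, an initscript-like userspace mechanism that runs before the kernel is asked to suspend and again after resume, can be sketched as an ordered list of hooks that are undone in reverse order. This is a hypothetical Python model: the hook names, the `run_suspend_cycle` helper and the logged strings are illustrative stand-ins, not an existing tool's interface, and the "actions" only append descriptions to a log.

```python
from typing import Callable, List, Tuple

Action = Callable[[List[str]], None]
Hook = Tuple[str, Action, Action]  # (name, before_suspend, after_resume)


def run_suspend_cycle(hooks: List[Hook], enter_kernel: Action) -> List[str]:
    """Run pre-suspend hooks in order, suspend, then undo them in reverse.

    If a hook or the kernel step raises, only hooks that already ran are
    undone, and the exception propagates to the caller.
    """
    log: List[str] = []
    done: List[Hook] = []
    try:
        for hook in hooks:
            _name, before, _after = hook
            before(log)
            done.append(hook)
        enter_kernel(log)
    finally:
        for _name, _before, after in reversed(done):
            after(log)
    return log


hooks: List[Hook] = [
    ("network-apps",
     lambda log: log.append("close TCP connections"),
     lambda log: log.append("let applications reconnect")),
    ("interfaces",
     lambda log: log.append("bring interfaces down"),
     lambda log: log.append("bring interfaces up, rerun DHCP")),
    ("nfs",
     lambda log: log.append("unmount NFS"),
     lambda log: log.append("remount NFS")),
]

result = run_suspend_cycle(hooks, lambda log: log.append("kernel: suspend, then resume"))
print("\n".join(result))
```

Undoing in reverse order means the network is back up before NFS is remounted and before applications are told to reconnect, which matches the dependency order Kyle's description implies.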
* Re: Back to the future. From: Pavel Machek @ 2007-05-07 12:48 UTC To: David Lang Cc: Nigel Cunningham, Rafael J. Wysocki, Linus Torvalds, Pekka J Enberg, LKML Hi! > >>nobody is suggesting that you leave processes running > >>while you do the snapshot, what is being proposed is > >> > >>1. pause userspace (prevent scheduling) > >>2. make snapshot image of memory > >>3. make mounted filesystems read-only (possibly with > >>snapshot/checkpoint) > >>4. unpause > >>5. save image (with full userspace available, including > >>network) > > > >Including network? Your tcp peers will be really confused, then, if > >you ACK packets then claim you did not get them. No, you do not want > >to start network. > > anyone who is doing a hibernate or suspend who expects all the network > connections to be working afterwards is dreaming or smoking >something. Really? It works today... if the suspend is short enough. And that's how it should be. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: Back to the future. From: Oliver Neukum @ 2007-05-07 12:52 UTC To: Pavel Machek Cc: David Lang, Nigel Cunningham, Rafael J. Wysocki, Linus Torvalds, Pekka J Enberg, LKML On Monday, 7 May 2007 14:48, Pavel Machek wrote: > > >Including network? Your tcp peers will be really confused, then, if > > >you ACK packets then claim you did not get them. No, you do not want > > >to start network. > > > > anyone who is doing a hibernate or suspend who expects all the network > > connections to be working afterwards is dreaming or smoking > >something. > > Really? It works today... if the suspend is short enough. And that's > how it should be. If we get very good at Wake-on-LAN it should work for any length of time. Regards Oliver
* Re: Back to the future. From: david @ 2007-05-07 14:37 UTC To: Oliver Neukum Cc: Pavel Machek, David Lang, Nigel Cunningham, Rafael J. Wysocki, Linus Torvalds, Pekka J Enberg, LKML On Mon, 7 May 2007, Oliver Neukum wrote: > On Monday, 7 May 2007 14:48, Pavel Machek wrote: >>>> Including network? Your tcp peers will be really confused, then, if >>>> you ACK packets then claim you did not get them. No, you do not want >>>> to start network. >>> >>> anyone who is doing a hibernate or suspend who expects all the network >>> connections to be working afterwards is dreaming or smoking >>> something. >> >> Really? It works today... if the suspend is short enough. And that's >> how it should be. > > If we get very good at Wake-on-LAN it should work for any length > of time. for suspend-to-ram this would work, I stand corrected. for hibernate this would almost certainly not work, and I don't think that it's worth raising false hopes. David Lang
* Re: Back to the future. From: Pavel Machek @ 2007-05-07 19:51 UTC To: david Cc: Oliver Neukum, David Lang, Nigel Cunningham, Rafael J. Wysocki, Linus Torvalds, Pekka J Enberg, LKML Hi! > >>Really? It works today... if the suspend is short > >>enough. And that's > >>how it should be. > > > >If we get very good at Wake-on-LAN it should work for > >any length > >of time. > > for suspend-to-ram this would work, I stand corrected. > > for hibernate this would almost certainly not work, and I > don't think that it's worth raising false hopes. Check the facts. It used to work, and it should work today. -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: Back to the future. From: david @ 2007-05-07 19:55 UTC To: Pavel Machek Cc: Oliver Neukum, David Lang, Nigel Cunningham, Rafael J. Wysocki, Linus Torvalds, Pekka J Enberg, LKML On Mon, 7 May 2007, Pavel Machek wrote: >>>> Really? It works today... if the suspend is short >>>> enough. And that's >>>> how it should be. >>> >>> If we get very good at Wake-on-LAN it should work for >>> any length >>> of time. >> >> for suspend-to-ram this would work, I stand corrected. >> >> for hibernate this would almost certainly not work, and I >> don't think that it's worth raising false hopes. > > Check the facts. It used to work, and it should work today. I don't dispute that it sometimes works today. what I dispute is that making it work should be a constraint on a cleaner design that happens to cause tcp connections to fail on suspend-to-disk (hibernate). if you are doing suspend-to-disk for such a short period that TCP connections are able to recover (typically <15 min for most firewalls, in some cases <2 min for connections with keep-alive) is it really worth it? and once you pass the timeframes where the connections are still alive then it shouldn't matter, and in fact the server should gracefully close the connections to be nice to other devices and servers on the network. I dispute the idea that doing a suspend-to-disk and expecting that your network connections will recover when you wake up is a sane expectation. David Lang
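The timing argument can be made concrete. Assuming Linux's keepalive semantics (first probe after `TCP_KEEPIDLE` seconds of idleness, then up to `TCP_KEEPCNT` unanswered probes spaced `TCP_KEEPINTVL` seconds apart), a peer gives up on a silent host after roughly idle + count * interval seconds, while a stateful firewall drops the flow after its own idle timeout. The helper names below are hypothetical and the model is deliberately crude: it assumes the connection was idle when the machine went down and ignores retransmission timers for unacknowledged data.

```python
import socket
from typing import Optional, Tuple


def keepalive_give_up_seconds(idle: int, interval: int, count: int) -> int:
    """Approximate time from last traffic until a keepalive-enabled peer drops the connection."""
    return idle + interval * count


def survives_hibernate(downtime: float, firewall_idle_timeout: float,
                       peer_keepalive: Optional[Tuple[int, int, int]] = None) -> bool:
    """True if neither a middlebox idle timer nor peer keepalive kills an idle connection."""
    limit = firewall_idle_timeout
    if peer_keepalive is not None:
        limit = min(limit, keepalive_give_up_seconds(*peer_keepalive))
    return downtime < limit


def enable_keepalive(sock: socket.socket, idle: int, interval: int, count: int) -> None:
    """Enable keepalive; the per-socket timers are Linux options, so set them only if exposed."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for opt, value in (("TCP_KEEPIDLE", idle), ("TCP_KEEPINTVL", interval), ("TCP_KEEPCNT", count)):
        if hasattr(socket, opt):
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, opt), value)


print(keepalive_give_up_seconds(7200, 75, 9))             # Linux sysctl defaults
print(survives_hibernate(10 * 60, 15 * 60))                # 10 min down, 15 min firewall timer
print(survives_hibernate(10 * 60, 15 * 60, (60, 10, 6)))   # peer gives up after 120 s
print(survives_hibernate(3600, float("inf")))              # no middlebox, no keepalive
```

The last case is Pavel's point: with no firewall in the path and no keepalive on the peer, nothing times the connection out, so a long hibernate can still resume it. The middle two cases correspond to David's 15-minute and 2-minute figures.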
* Re: Back to the future. From: Pavel Machek @ 2007-05-07 20:38 UTC To: david Cc: Oliver Neukum, David Lang, Nigel Cunningham, Rafael J. Wysocki, Linus Torvalds, Pekka J Enberg, LKML Hi! > I don't dispute that it sometimes works today. > > what I dispute is that making it work should be a constraint on a cleaner > design that happens to cause tcp connections to fail on suspend-to-disk > (hibernate). > > if you are doing suspend-to-disk for such a short period that TCP > connections are able to recover (typically <15 min for most firewalls, in > some cases <2 min for connections with keep-alive) is it really > worth it? People were using swsusp to move servers from one room to another. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: Back to the future. From: Disconnect @ 2007-05-08 17:36 UTC To: linux-kernel We used it (with great success) to replace bad UPSs on single-PSU database servers under (light) load. No need for scheduled downtime, etc. The whole point of hibernation (or suspend to disk, or whatever you call it) is that the system goes to a zero-power state and then can be brought back to its original state. Closing in-progress network connections has nothing to do with pausing a machine any more than setting IM clients to 'away' would, or locking an X session. That sort of side-effect needs to be handled outside the core of "put state out to disk and read it back". On 5/7/07, Pavel Machek <pavel@ucw.cz> wrote: > Hi! > > > I don't dispute that it sometimes works today. > > > > what I dispute is that making it work should be a constraint on a cleaner > > design that happens to cause tcp connections to fail on suspend-to-disk > > (hibernate). > > > > if you are doing suspend-to-disk for such a short period that TCP > > connections are able to recover (typically <15 min for most firewalls, in > > some cases <2 min for connections with keep-alive) is it really > > worth it? > > People were using swsusp to move servers from one room to another. > Pavel
* Re: Back to the future. From: Linus Torvalds @ 2007-04-27 23:59 UTC To: Rafael J. Wysocki; +Cc: Nigel Cunningham, Pekka J Enberg, LKML On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: > > Actually, the fewer things happen while we're creating and saving the image, > the fewer sources of potential problems there are, and by freezing the kernel > threads (not all of them), we cause fewer things to happen at that time. That makes no sense. You have to create the snapshot image with interrupts disabled *anyway*. I really don't see how you can say that stopping threads etc can make any difference whatsoever. If you don't create the snapshot with interrupts disabled (and just with a single CPU running) you have so many other problems that it's not even remotely funny. So there's *by*definition* nothing at all that can happen while you snapshot the system. Claiming otherwise is just silly. > To make you happy, we could stop doing that, but what actual _advantage_ > would that bring? Like getting rid of all the magic "I don't want you to freeze me" crud? Or getting rid of this horribly idiotic "three times widdershins" kind of black magic mentality! It looks like the main reason for the process freezing has nothing to do with technology, but some irrational fear of other things happening at the same time, even though they CANNOT happen if you do things even half-way sanely. The "let's stop all kernel threads" is superstition. It's the same kind of superstition that made people write "sync" three times before turning off the power in the olden times. It's the kind of superstition that comes from "we don't do things right, so let's be vewy vewy quiet and _pray_ that it works when we are being quiet". That's bad.
It's doubly bad, because that idiocy has also infected s2ram. Again, another thing that really makes no sense at all - and we do it not just for snapshotting, but for s2ram too. Can you tell me *why*? > > Trying to freeze kernel threads has _caused_ problems. It has _added_ > > these interdependencies. It hasn't removed a single dependency at any > > time, it has just added new problems! > > What problems are you talking about? Like you wouldn't know. Look at commit b43376927a that you yourself are credited with, just a month ago. Then, do something as simple as git grep create_freezeable_workthread and ponder the end results of that grep. If you don't see something wrong, you're blind. > > NONE of these are valid explanations at all. You're listing totally > > theoretical problems, and ignoring all the _real_ problems that trying to > > freeze kernel threads has _caused_. > > Example, please? Who do you think you are kidding? See above. And if you think that's an isolated example, look again. And start grepping for PF_NOFREEZE, and other examples. The fact is, there is not a *single* reason to freeze kernel threads. But some rocket scientist decided to, and then screwed everybody else over. Linus ^ permalink raw reply [flat|nested] 135+ messages in thread
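Linus's argument rests on an ordering constraint: by the time the image is copied, interrupts are off and only one CPU is running, so no thread, frozen or not, can execute during the copy. The following is a minimal C model of that constraint, not swsusp code; the phase names, `struct hib_state`, and `hib_advance()` are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical hibernation phases; not the kernel's real names. */
enum hib_phase {
    PHASE_RUNNING,
    PHASE_DEVICES_SUSPENDED,
    PHASE_NONBOOT_CPUS_OFF,
    PHASE_IRQS_OFF,
    PHASE_SNAPSHOT_TAKEN
};

struct hib_state {
    enum hib_phase phase;
    int online_cpus;
    bool irqs_enabled;
};

/* Advance exactly one phase; skipping a step is refused. Returns 0 on success. */
int hib_advance(struct hib_state *s, enum hib_phase next)
{
    if (next != s->phase + 1)
        return -1;

    switch (next) {
    case PHASE_NONBOOT_CPUS_OFF:
        s->online_cpus = 1;
        break;
    case PHASE_IRQS_OFF:
        s->irqs_enabled = false;
        break;
    case PHASE_SNAPSHOT_TAKEN:
        /* The copy is only consistent if nothing else can possibly run. */
        if (s->irqs_enabled || s->online_cpus != 1)
            return -1;
        break;
    default:
        break;
    }
    s->phase = next;
    return 0;
}
```

The guard on the last transition is the part the argument depends on: once it holds, whether other threads were frozen earlier makes no difference to the copy itself.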
* Re: Back to the future. 2007-04-27 23:59 ` Linus Torvalds @ 2007-04-28 0:18 ` Linus Torvalds 2007-05-05 11:42 ` Pavel Machek 2007-04-28 0:50 ` Paul Mackerras 2007-04-28 1:00 ` Rafael J. Wysocki 2 siblings, 1 reply; 135+ messages in thread From: Linus Torvalds @ 2007-04-28 0:18 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: Nigel Cunningham, Pekka J Enberg, LKML On Fri, 27 Apr 2007, Linus Torvalds wrote: > > The "let's stop all kernel threads" is superstition. It's the same kind of > superstition that made people write "sync" three times before turning off > the power in the olden times. It's the kind of superstition that comes > from "we don't do things right, so let's be vewy vewy quiet and _pray_ > that it works when we are being quiet". Side note: while I think things should probably *work* even with user processes going full bore while a snapshot is taken, I'll freely admit that I'll follow that superstition far enough that I think it's probably a good idea to try to quiesce the system to _some_ degree, and that stopping user programs is a good idea. Partly because of the whole memory shrinking thing, and partly just because we should do the snapshot with hw IO queues empty. But I don't think it would necessarily be wrong (and in many ways it would probably be *right*) to do that IO queue stopping at the queue level rather than at a process level. Why stop processes just because you want to clean out IO queues? They are two totally different things! Linus ^ permalink raw reply [flat|nested] 135+ messages in thread
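Quiescing at the request-queue level instead of stopping processes can be sketched as a queue with a stop flag and an in-flight counter. This is a user-space model under stated assumptions: `struct io_queue` and the `ioq_*` functions are hypothetical and are not the block layer's API. New submissions are refused while the queue is stopped, and the queue counts as quiesced once the last in-flight request completes.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical request queue; not the kernel's struct request_queue. */
struct io_queue {
    bool stopped;      /* no new requests accepted while set */
    unsigned inflight; /* requests issued to hardware, not yet completed */
};

/* Submit a request; the caller must retry later if the queue is stopped. */
int ioq_submit(struct io_queue *q)
{
    if (q->stopped)
        return -EAGAIN;
    q->inflight++;
    return 0;
}

/* Called from the (simulated) completion path. */
void ioq_complete(struct io_queue *q)
{
    if (q->inflight > 0)
        q->inflight--;
}

/* Stop accepting work; requests already issued are allowed to drain. */
void ioq_stop(struct io_queue *q)
{
    q->stopped = true;
}

/* Resume accepting work after the snapshot or resume. */
void ioq_start(struct io_queue *q)
{
    q->stopped = false;
}

/* Safe to snapshot or power down only when stopped and fully drained. */
bool ioq_quiesced(const struct io_queue *q)
{
    return q->stopped && q->inflight == 0;
}
```

Under this design the submitting processes keep running; only their new I/O is held back between `ioq_stop()` and `ioq_start()`, which is the separation between "stop processes" and "empty the IO queues" that the message draws.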
* Re: Back to the future. 2007-04-28 0:18 ` Linus Torvalds @ 2007-05-05 11:42 ` Pavel Machek 0 siblings, 0 replies; 135+ messages in thread From: Pavel Machek @ 2007-05-05 11:42 UTC (permalink / raw) To: Linus Torvalds; +Cc: Rafael J. Wysocki, Nigel Cunningham, Pekka J Enberg, LKML Hi! > > The "let's stop all kernel threads" is superstition. It's the same kind of > > superstition that made people write "sync" three times before turning off > > the power in the olden times. It's the kind of superstition that comes > > from "we don't do things right, so let's be vewy vewy quiet and _pray_ > > that it works when we are being quiet". > > Side note: while I think things should probably *work* even with user > processes going full bore while a snapshot is taken, I'll freely admit > that I'll follow that superstition far enough that I think it's probably a > good idea to try to quiesce the system to _some_ degree, and that stopping > user programs is a good idea. Partly because of the whole memory shrinking > thing, and partly just because we should do the snapshot with hw IO queues > empty. > > But I don't think it would necessarily be wrong (and in many ways it would > probably be *right*) to do that IO queue stopping at the queue level > rather than at a process level. Why stop processes just because you want > to clean out IO queues? They are two totally different things! Actually, I'd like to stop I/O queues; if there were an easy way to do that, I'd happily switch. Notice that we'll need to stop 'I/O queues' of the char devices, too... Pavel ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-27 23:59 ` Linus Torvalds 2007-04-28 0:18 ` Linus Torvalds @ 2007-04-28 0:50 ` Paul Mackerras 2007-04-28 1:00 ` Rafael J. Wysocki 2 siblings, 0 replies; 135+ messages in thread From: Paul Mackerras @ 2007-04-28 0:50 UTC (permalink / raw) To: Linus Torvalds; +Cc: Rafael J. Wysocki, Nigel Cunningham, Pekka J Enberg, LKML Linus Torvalds writes: > I really don't see how you can say that stopping threads etc can make any > difference what-so-ever. If you don't create the snapshot with interrupts > disabled (and just with a single CPU running) you have so many other > problems that it's not even remotely funny. I agree. I don't like the freezer. We have had working kernel-controlled suspend to RAM on powerbooks for almost 10 years now, and we never needed to freeze processes. That said, I can see two attractions in freezing processes: 1. It provides a way to stop new I/O requests coming in, and thus somewhat makes up for the lack of a way to freeze device request queues (at least, we didn't have one last time I looked). 2. Systems do sometimes die while suspended (e.g. run out of battery, or the resume process fails), and to make the next boot painless, you want the filesystems on disk to be as clean as possible. Freezing processes and then doing a sync provides one way to achieve that. Of course, you have to make sure you don't freeze any kernel threads that are needed for doing the sync... And if one of your filesystems is using FUSE, it's not going to get very far. Paul. ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-27 23:59 ` Linus Torvalds 2007-04-28 0:18 ` Linus Torvalds 2007-04-28 0:50 ` Paul Mackerras @ 2007-04-28 1:00 ` Rafael J. Wysocki 2007-04-28 1:12 ` Linus Torvalds 2 siblings, 1 reply; 135+ messages in thread From: Rafael J. Wysocki @ 2007-04-28 1:00 UTC (permalink / raw) To: Linus Torvalds; +Cc: Nigel Cunningham, Pekka J Enberg, LKML On Saturday, 28 April 2007 01:59, Linus Torvalds wrote: > > On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: > > > > Actually, the less things happen while we're creating and saving the image, > > the less sources of potential problems there are and by freezing the kernel > > threads (not all of them), we cause less things to happen at that time. > > That makes no sense. > > You have to create the snapshot image with interrupts disabled *anyway*. > > I really don't see how you can say that stopping threads etc can make any > difference what-so-ever. If you don't create the snapshot with interrupts > disabled (and just with a single CPU running) you have so many other > problems that it's not even remotely funny. > > So there's *by*definition* nothing at all that can happen while you > snapshot the system. Claiming otherwise is just silly. For creating the snapshot alone, it doesn't matter. Except that the restore is a bit cleaner (we know exactly what all of these threads will be doing when we restore the image and enable the IRQs after that). Still, I think that kernel threads can potentially hold locks across the freezing of devices and image creation and that is fishy. Also I believe, although I'm not 100% sure, that some of them may cause problems to appear after we've created the image and while we are saving it. > > To make you happy, we could stop doing that, but what actual _advantage_ > > that would bring? > > Like getting rid of all the magic "I don't want you to freeze me" crud? And what exactly is wrong with it?
> Or getting rid of this horribly idiotic "three times widdershins" kind of > black magic mentality! It looks like the main reason for the process > freezing has nothing to do with technology, but some irrational fear of > other things happening at the same time, even though they CANNOT happen if > you do things even half-way sanely. > > The "let's stop all kernel threads" is superstition. It's the same kind of > superstition that made people write "sync" three times before turning off > the power in the olden times. It's the kind of superstition that comes > from "we don't do things right, so let's be vewy vewy quiet and _pray_ > that it works when we are being quiet". > > That's bad. Okay. Coincidentally, I'm working on a freezer patch, so I'll probably drop the freezing of kernel threads from swsusp in it and we'll see what happens. Let's do the experiment, shall we? > It's doubly bad, because that idiocy has also infected s2ram. Again, > another thing that really makes no sense at all - and we do it not just > for snapshotting, but for s2ram too. Can you tell me *why*? Why we freeze tasks at all or why we freeze kernel threads? > > > Trying to freeze kernel threads has _caused_ problems. It has _added_ > > > these interdependencies. It hasn't removed a single dependency at any > > > time, it has just added new problems! > > > > What problems are you talking about? > > Like you wouldn't know. Look at commit b43376927a that you yourself are > credited with, just a month ago. > > Then, do something as simple as > > git grep create_freezeable_workthread s/workthread/workqueue/ > and ponder the end results of that grep. If you don't see something wrong, > you're blind. This was a mistake, quite unrelated to the point you're making. And actually, I was trying to fix a problem with two kernel threads that we thought might submit I/O to disk after the image had been created. Otherwise I wouldn't have thought of making that change.
> > > NONE of these are valid explanations at all. You're listing totally > > > theoretical problems, and ignoring all the _real_ problems that trying to > > > freeze kernel threads has _caused_. > > > > Example, please? > > Who do you think you are kidding? See above. Well, if someone does something in a wrong way, that need not mean the thing he was trying to do was wrong. Somehow, I knew you would point at this ... > And if you think that's an isolated example, look again. And start > grepping for PF_NOFREEZE, and other examples. May I say I'm not convinced? > The fact is, there is not a *single* reason to freeze kernel threads. But > some rocket scientist decided to, and then screwed everybody else over. At least _that_ wasn't me. :-) Greetings, Rafael ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-28 1:00 ` Rafael J. Wysocki @ 2007-04-28 1:12 ` Linus Torvalds 2007-04-28 0:54 ` David Lang 2007-04-28 1:44 ` Rafael J. Wysocki 0 siblings, 2 replies; 135+ messages in thread From: Linus Torvalds @ 2007-04-28 1:12 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: Nigel Cunningham, Pekka J Enberg, LKML On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: > > > It's doubly bad, because that idiocy has also infected s2ram. Again, > > another thing that really makes no sense at all - and we do it not just > > for snapshotting, but for s2ram too. Can you tell me *why*? > > Why we freeze tasks at all or why we freeze kernel threads? In many ways, "at all". I _do_ realize the IO request queue issues, and that we cannot actually do s2ram with some devices in the middle of a DMA. So we want to be able to avoid *that*, there's no question about that. And I suspect that stopping user threads and then waiting for a sync is practically one of the easier ways to do so. So in practice, the "at all" may become a "why freeze kernel threads?" and freezing user threads I don't find really objectionable. But as Paul pointed out, Linux on the old powerpc Mac hardware was actually rather famous for having working (and reliable) suspend long before it worked even remotely reliably on PC's. And they didn't do even that. (They didn't have ACPI, and they had a much more limited set of devices, but the whole process freezer is really about neither of those issues. The wild and wacky PC hardware has its problems, but that's _one_ thing we can't blame PC hardware for ;) > > git grep create_freezeable_workthread > > s/workthread/workqueue/ Yes. > > and ponder the end results of that grep. If you don't see something wrong, > > you're blind. > > This was a mistake, quite unrelated to the point you're making. Did you actually _do_ the "grep" (with the fixed argument)? I had two totally independent points. #1 was that you yourself have been fixing bugs in this area. 
#2 was the result of that grep. It's absolutely _empty_ except for the define to add that interface. NOBODY USES IT! Now, grep for the same interface that creates _non_freezeable workqueues. Put another way: [torvalds@woody linux]$ git grep create_workqueue | wc -l 35 [torvalds@woody linux]$ git grep create_freezeable_workqueue | wc -l 1 and that _one_ hit you get for the "freezeable" case is not actually a user, it's the definition! Ie my point is, nobody wants freezeable kernel threads. Absolutely nobody. Yet we have all this support for freezing them (or rather, we freeze them by default, and then we have all this support for _not_ doing that wrong default thing!) So yes, I think it would be interesting to just stop freezing kernel threads. Totally. Linus ^ permalink raw reply [flat|nested] 135+ messages in thread
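The two policies being argued, freezing every task by default and letting kernel threads opt out (the role `PF_NOFREEZE` plays), versus leaving kernel threads alone entirely, can be compared with a small single-threaded model. The flag names, `struct task`, and `freeze_tasks()` below are hypothetical stand-ins, not the kernel's freezer code, and the task names are made up.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task flags, loosely modeled on kernel task flags. */
#define T_KTHREAD  0x1u /* task is a kernel thread */
#define T_NOFREEZE 0x2u /* task has opted out of freezing */

struct task {
    const char *name;
    unsigned flags;
    bool frozen;
};

/*
 * Mark tasks frozen according to a policy and return how many were frozen.
 * freeze_kthreads == true models "freeze everything unless opted out";
 * freeze_kthreads == false models "only user tasks are frozen".
 */
size_t freeze_tasks(struct task *tasks, size_t n, bool freeze_kthreads)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        struct task *t = &tasks[i];
        bool is_kthread = (t->flags & T_KTHREAD) != 0;

        t->frozen = false;
        if (t->flags & T_NOFREEZE)
            continue; /* explicit opt-out always wins */
        if (is_kthread && !freeze_kthreads)
            continue; /* second policy: kernel threads keep running */
        t->frozen = true;
        count++;
    }
    return count;
}
```

With the first policy, every kernel thread that must keep running needs its own opt-out flag; with the second, that opt-out list is no longer needed for kernel threads at all.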
* Re: Back to the future. 2007-04-28 1:12 ` Linus Torvalds @ 2007-04-28 0:54 ` David Lang 2007-04-28 1:44 ` Rafael J. Wysocki 1 sibling, 0 replies; 135+ messages in thread From: David Lang @ 2007-04-28 0:54 UTC (permalink / raw) To: Linus Torvalds; +Cc: Rafael J. Wysocki, Nigel Cunningham, Pekka J Enberg, LKML On Fri, 27 Apr 2007, Linus Torvalds wrote: > On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: >> >>> It's doubly bad, because that idiocy has also infected s2ram. Again, >>> another thing that really makes no sense at all - and we do it not just >>> for snapshotting, but for s2ram too. Can you tell me *why*? >> >> Why we freeze tasks at all or why we freeze kernel threads? > > In many ways, "at all". > > I _do_ realize the IO request queue issues, and that we cannot actually do > s2ram with some devices in the middle of a DMA. So we want to be able to > avoid *that*, there's no question about that. And I suspect that stopping > user threads and then waiting for a sync is practically one of the easier > ways to do so. > > So in practice, the "at all" may become a "why freeze kernel threads?" and > freezing user threads I don't find really objectionable. there was a thread last week (or so) about splitting up the process list, one list for normal user processes, one for kernel threads, and one for dead processes waiting to be reaped. it almost sounds like what you want to do is to act as if the normal user threads weren't there for a short time (while you make the snapshot) and then recover them to continue and save the snapshot. David Lang ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-28 1:12 ` Linus Torvalds 2007-04-28 0:54 ` David Lang @ 2007-04-28 1:44 ` Rafael J. Wysocki 2007-04-28 2:51 ` Daniel Hazelton 2007-04-28 8:50 ` Pavel Machek 1 sibling, 2 replies; 135+ messages in thread From: Rafael J. Wysocki @ 2007-04-28 1:44 UTC (permalink / raw) To: Linus Torvalds Cc: Nigel Cunningham, Pekka J Enberg, LKML, Oleg Nesterov, Pavel Machek On Saturday, 28 April 2007 03:12, Linus Torvalds wrote: > > On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: > > > > > It's doubly bad, because that idiocy has also infected s2ram. Again, > > > another thing that really makes no sense at all - and we do it not just > > > for snapshotting, but for s2ram too. Can you tell me *why*? > > > > Why we freeze tasks at all or why we freeze kernel threads? > > In many ways, "at all". > > I _do_ realize the IO request queue issues, and that we cannot actually do > s2ram with some devices in the middle of a DMA. So we want to be able to > avoid *that*, there's no question about that. And I suspect that stopping > user threads and then waiting for a sync is practically one of the easier > ways to do so. > > So in practice, the "at all" may become a "why freeze kernel threads?" and > freezing user threads I don't find really objectionable. > > But as Paul pointed out, Linux on the old powerpc Mac hardware was > actually rather famous for having working (and reliable) suspend long > before it worked even remotely reliably on PC's. And they didn't do even > that. > > (They didn't have ACPI, and they had a much more limited set of devices, > but the whole process freezer is really about neither of those issues. The > wild and wacky PC hardware has its problems, but that's _one_ thing we > can't blame PC hardware for ;) We freeze user space processes for the reasons that you have quoted above. Why we freeze kernel threads in there too is a good question, but not for me to answer. I don't know. Pavel should know, I think. 
> > > git grep create_freezeable_workthread > > > > s/workthread/workqueue/ > > Yes. > > > > and ponder the end results of that grep. If you don't see something wrong, > > > you're blind. > > > > This was a mistake, quite unrelated to the point you're making. > > Did you actually _do_ the "grep" (with the fixed argument)? > > I had two totally independent points. #1 was that you yourself have been > fixing bugs in this area. #2 was the result of that grep. It's absolutely > _empty_ except for the define to add that interface. > > NOBODY USES IT! The reason is pretty simple. We wanted to drop that interface altogether, because it was broken (my fault), but Oleg suggested that we keep it so that we could fix and use it in the future (for purposes other than the hibernation, though). > Now, grep for the same interface that creates _non_freezeable workqueues. > > Put another way: > > [torvalds@woody linux]$ git grep create_workqueue | wc -l > 35 > > [torvalds@woody linux]$ git grep create_freezeable_workqueue | wc -l > 1 > > and that _one_ hit you get for the "freezeable" case is not actually a > user, it's the definition! > > Ie my point is, nobody wants freezeable kernel threads. Absolutely nobody. That's freezable workqueues only. :-) > Yet we have all this support for freezing them (or rather, we freeze them > by default, and then we have all this support for _not_ doing that wrong > default thing!) > > So yes, I think it would be interesting to just stop freezing kernel > threads. Totally. Okay, I'll do that. Greetings, Rafael ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-28 1:44 ` Rafael J. Wysocki @ 2007-04-28 2:51 ` Daniel Hazelton 2007-04-28 8:50 ` Pavel Machek 1 sibling, 0 replies; 135+ messages in thread From: Daniel Hazelton @ 2007-04-28 2:51 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linus Torvalds, Nigel Cunningham, Pekka J Enberg, LKML, Oleg Nesterov, Pavel Machek On Friday 27 April 2007 21:44:48 Rafael J. Wysocki wrote: > On Saturday, 28 April 2007 03:12, Linus Torvalds wrote: > > On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: > > > > It's doubly bad, because that idiocy has also infected s2ram. Again, > > > > another thing that really makes no sense at all - and we do it not > > > > just for snapshotting, but for s2ram too. Can you tell me *why*? > > > > > > Why we freeze tasks at all or why we freeze kernel threads? > > > > In many ways, "at all". > > > > I _do_ realize the IO request queue issues, and that we cannot actually > > do s2ram with some devices in the middle of a DMA. So we want to be able > > to avoid *that*, there's no question about that. And I suspect that > > stopping user threads and then waiting for a sync is practically one of > > the easier ways to do so. > > <snip> Apparently I *CANNOT* wrap my head around this - if just because my laptop, running a vendor 2.6.17 kernel does s2ram perfectly, at least, it does when using the "Upstart" init system rather than the classical SysV init system. I have tried it with the classical init and the suspend isn't triggered by the buttons that used to do it. I didn't try 'echo ram > /sys/power/state', but I have a feeling that would have worked as well. I have problems with s2disk, but thats because I keep my swap partition small - I try to keep it at or around 256M when I have more than half a gig of Ram in a system. Perhaps one of these days I'll grab a multi-gig flash disk, set it up as a swap partition and try it again. 
(Every time I've tried s2disk I wind up running out of disk space - and this is with nothing but X running. Any kind of progress meter for when the system is doing s2disk would be nice - every time I've tried it, all I see for the nearly 2 minutes before the s2disk attempt ends is a black screen. I say 2 minutes because that's how long it takes for it to learn that there isn't enough space on the swap partition to save the image.) DRH ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-28 1:44 ` Rafael J. Wysocki 2007-04-28 2:51 ` Daniel Hazelton @ 2007-04-28 8:50 ` Pavel Machek 2007-04-28 9:24 ` Rafael J. Wysocki ` (2 more replies) 1 sibling, 3 replies; 135+ messages in thread From: Pavel Machek @ 2007-04-28 8:50 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Linus Torvalds, Nigel Cunningham, Pekka J Enberg, LKML, Oleg Nesterov Hi! > > In many ways, "at all". > > > > I _do_ realize the IO request queue issues, and that we cannot actually do > > s2ram with some devices in the middle of a DMA. So we want to be able to > > avoid *that*, there's no question about that. And I suspect that stopping > > user threads and then waiting for a sync is practically one of the easier > > ways to do so. > > > > So in practice, the "at all" may become a "why freeze kernel threads?" and > > freezing user threads I don't find really objectionable. > > > > But as Paul pointed out, Linux on the old powerpc Mac hardware was > > actually rather famous for having working (and reliable) suspend long > > before it worked even remotely reliably on PC's. And they didn't do even > > that. > > > > (They didn't have ACPI, and they had a much more limited set of devices, > > but the whole process freezer is really about neither of those issues. The > > wild and wacky PC hardware has its problems, but that's _one_ thing we > > can't blame PC hardware for ;) > > We freeze user space processes for the reasons that you have quoted above. > > Why we freeze kernel threads in there too is a good question, but not for me to > answer. I don't know. Pavel should know, I think. We do not want kernel threads running: a) they may hold some locks and deadlock suspend b) they may do some writes to disk, leading to corruption We could solve a) by carefully auditing suspend lock usage to make sure deadlocks are impossible even with kernel threads running. 
Pavel ^ permalink raw reply [flat|nested] 135+ messages in thread
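Point (a) turns on where a thread can be stopped. The usual rule is that a thread parks only at a point where it holds no locks, which is where the kernel's `try_to_freeze()` is intended to be called from a thread's main loop. The model below illustrates that rule in user space; `struct kthread_model`, `model_request_freeze()`, and `model_try_to_freeze()` are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical model of a kernel thread's freezer-relevant state. */
struct kthread_model {
    int locks_held;        /* locks the thread currently holds */
    bool freeze_requested; /* set by the (simulated) freezer */
    bool frozen;
};

/* Freezer side: ask the thread to stop at its next safe point. */
void model_request_freeze(struct kthread_model *k)
{
    k->freeze_requested = true;
}

/*
 * Thread side, called from its main loop: park only if a freeze was
 * requested and no locks are held. Returns true if the thread froze.
 */
bool model_try_to_freeze(struct kthread_model *k)
{
    if (k->freeze_requested && k->locks_held == 0) {
        k->frozen = true;
        return true;
    }
    return false;
}
```

In this model a thread that never drops its lock simply never freezes, so the freezer has to give up after some timeout; that failure mode is the one Linus argues would deadlock other code too.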
* Re: Back to the future. 2007-04-28 8:50 ` Pavel Machek @ 2007-04-28 9:24 ` Rafael J. Wysocki 2007-04-28 16:28 ` Linus Torvalds 2007-04-28 18:32 ` David Lang 2 siblings, 0 replies; 135+ messages in thread From: Rafael J. Wysocki @ 2007-04-28 9:24 UTC (permalink / raw) To: Pavel Machek Cc: Linus Torvalds, Nigel Cunningham, Pekka J Enberg, LKML, Oleg Nesterov On Saturday, 28 April 2007 10:50, Pavel Machek wrote: > Hi! > > > > In many ways, "at all". > > > > > > I _do_ realize the IO request queue issues, and that we cannot actually do > > > s2ram with some devices in the middle of a DMA. So we want to be able to > > > avoid *that*, there's no question about that. And I suspect that stopping > > > user threads and then waiting for a sync is practically one of the easier > > > ways to do so. > > > > > > So in practice, the "at all" may become a "why freeze kernel threads?" and > > > freezing user threads I don't find really objectionable. > > > > > > But as Paul pointed out, Linux on the old powerpc Mac hardware was > > > actually rather famous for having working (and reliable) suspend long > > > before it worked even remotely reliably on PC's. And they didn't do even > > > that. > > > > > > (They didn't have ACPI, and they had a much more limited set of devices, > > > but the whole process freezer is really about neither of those issues. The > > > wild and wacky PC hardware has its problems, but that's _one_ thing we > > > can't blame PC hardware for ;) > > > > We freeze user space processes for the reasons that you have quoted above. > > > > Why we freeze kernel threads in there too is a good question, but not for me to > > answer. I don't know. Pavel should know, I think. > > We do not want kernel threads running: > > a) they may hold some locks and deadlock suspend Yeah, the same issue as with the hibernation and I do think it's _real_. > b) they may do some writes to disk, leading to corruption Hmm, is that an issue in the suspend (aka s2ram) case? 
> We could solve a) by carefully auditing suspend lock usage to make > sure deadlocks are impossible even with kernel threads running. Yes, we can, but for now it's not been done yet. Greetings, Rafael ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-28 8:50 ` Pavel Machek 2007-04-28 9:24 ` Rafael J. Wysocki @ 2007-04-28 16:28 ` Linus Torvalds 2007-04-28 17:50 ` Rafael J. Wysocki 2007-04-28 18:32 ` David Lang 2 siblings, 1 reply; 135+ messages in thread From: Linus Torvalds @ 2007-04-28 16:28 UTC (permalink / raw) To: Pavel Machek Cc: Rafael J. Wysocki, Nigel Cunningham, Pekka J Enberg, LKML, Oleg Nesterov On Sat, 28 Apr 2007, Pavel Machek wrote: > > We do not want kernel threads running: > > a) they may hold some locks and deadlock suspend > > b) they may do some writes to disk, leading to corruption You're really just making both of those up. If a kernel thread holds a lock and deadlocks suspend, that would deadlock anything else _too_. Suspend isn't *that* special. Everything it does are things other people do too. And no, kernel threads do not write to disk on their own. Name one. They help *others* write to disk, but those disk writes need to happen. The freezer has *caused* those deadlocks (eg by stopping threads that were needed for the suspend writeouts to succeed!), not solved them. So stop making these totally bogus arguments up. Linus ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-28 16:28 ` Linus Torvalds @ 2007-04-28 17:50 ` Rafael J. Wysocki 2007-04-28 21:25 ` Linus Torvalds 0 siblings, 1 reply; 135+ messages in thread From: Rafael J. Wysocki @ 2007-04-28 17:50 UTC (permalink / raw) To: Linus Torvalds Cc: Pavel Machek, Nigel Cunningham, Pekka J Enberg, LKML, Oleg Nesterov On Saturday, 28 April 2007 18:28, Linus Torvalds wrote: > > On Sat, 28 Apr 2007, Pavel Machek wrote: > > > > We do not want kernel threads running: > > > > a) they may hold some locks and deadlock suspend > > > > b) they may do some writes to disk, leading to corruption > > You're really just making both of those up. > > If a kernel thread holds a lock and deadlocks suspend, that would deadlock > anything else _too_. Suspend isn't *that* special. Everything it does are > things other people do too. > > And no, kernel threads do not write to disk on their own. Name one. xfssyncd, or at least it seems so at a quick look. > They help *others* write to disk, but those disk writes need to happen. > > The freezer has *caused* those deadlocks (eg by stopping threads that were > needed for the suspend writeouts to succeed!), not solved them. I can't remember anything like this, but I believe you have a specific test case in mind. > So stop making these totally bogus arguments up. Well, they may be bogus, but there's something else. I have reviewed some kernel threads used by device drivers that currently are frozen to see if it would be safe not to freeze them, and I'm worried. What, for example, if such a thread schedules a timeout and waits for something to happen (eg. the airo driver does something like this), but instead the hibernation/suspend happens and the device is frozen/suspended under it? Shouldn't the thread be notified by the driver's freeze/suspend callback?
Moreover, what if after the restore the device is not present (for example, it may be a pcmcia card that the user has removed) and the thread is scheduled before the device's unfreeze callback has a chance to run? Shouldn't the thread check that the device is present? In that case it would have to be notified by someone that the check is necessary, but who can do that? Greetings, Rafael ^ permalink raw reply [flat|nested] 135+ messages in thread
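The scenario above can be made concrete: a driver thread that wakes from a timeout would check whether its device was suspended or removed before touching it. In this sketch `struct dev_model`, its flags, and `driver_thread_step()` are hypothetical; who sets those flags (for example the driver's suspend and resume callbacks) is exactly the open question raised in the message.

```c
#include <stdbool.h>

/* Hypothetical device state shared between a driver and its thread. */
struct dev_model {
    bool present;   /* cleared if the device vanished across a restore */
    bool suspended; /* set by the driver's freeze/suspend callback */
    int io_done;    /* number of successful device accesses */
};

enum step_result {
    STEP_DID_IO,
    STEP_SKIPPED_SUSPENDED,
    STEP_DEVICE_GONE
};

/* One iteration of the thread's loop, run after its timeout expires. */
enum step_result driver_thread_step(struct dev_model *d)
{
    if (!d->present)
        return STEP_DEVICE_GONE;       /* e.g. card removed while hibernated */
    if (d->suspended)
        return STEP_SKIPPED_SUSPENDED; /* device powered down under us */
    d->io_done++;
    return STEP_DID_IO;
}
```

The ordering of the checks matters in this model: absence is tested first, so a thread scheduled before the device's resume callback runs never touches hardware that is no longer there.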
* Re: Back to the future. 2007-04-28 17:50 ` Rafael J. Wysocki @ 2007-04-28 21:25 ` Linus Torvalds 2007-04-28 23:03 ` Rafael J. Wysocki 2007-04-29 8:23 ` Pavel Machek 0 siblings, 2 replies; 135+ messages in thread From: Linus Torvalds @ 2007-04-28 21:25 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Pavel Machek, Nigel Cunningham, Pekka J Enberg, LKML, Oleg Nesterov On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: > > > > The freezer has *caused* those deadlocks (eg by stopping threads that were > > needed for the suspend writeouts to succeed!), not solved them. > > I can't remember anything like this, but I believe you have a specific test > case in mind. Ehh.. Why do you think we _have_ that PF_NOFREEZE thing in the first place? Rafael, you really don't know what you're talking about, do you? Just _look_ at them. It's the IO threads etc that shouldn't be frozen, exactly *because* they do IO. You claim that kernel threads shouldn't do IO, but that's the point: if you cannot do IO when snapshotting to disk, here's a damn big clue for you: how do you think that snapshot is going to get written? I *guarantee* you that we've had a lot more problems with threads that should *not* have been frozen than with those hypothetical threads that you think should have been frozen. Linus ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-28 21:25 ` Linus Torvalds @ 2007-04-28 23:03 ` Rafael J. Wysocki 2007-04-28 23:45 ` Linus Torvalds 2007-04-29 8:23 ` Pavel Machek 1 sibling, 1 reply; 135+ messages in thread From: Rafael J. Wysocki @ 2007-04-28 23:03 UTC (permalink / raw) To: Linus Torvalds Cc: Pavel Machek, Nigel Cunningham, Pekka J Enberg, LKML, Oleg Nesterov On Saturday, 28 April 2007 23:25, Linus Torvalds wrote: > > On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: > > > > > > The freezer has *caused* those deadlocks (eg by stopping threads that were > > > needed for the suspend writeouts to succeed!), not solved them. > > > > I can't remember anything like this, but I believe you have a specific test > > case in mind. > > Ehh.. Why do you think we _have_ that PF_NOFREEZE thing in the first place? Well, I don't know exactly why it was originally introduced. Currently, it is used by the threads that should be running after the snapshot is done (they are not only I/O threads). > Rafael, you really don't know what you're talking about, do you? I think I know. > Just _look_ at them. It's the IO threads etc that shouldn't be frozen, > exactly *because* they do IO. You claim that kernel threads shouldn't do > IO, but that's the point: if you cannot do IO when snapshotting to disk, > here's a damn big clue for you: how do you think that snapshot is going to > get written? OK, more precisely: fs-related threads should not try to process their queues, etc., after the snapshot is done, because that may cause some fs data to be written at that time and then the fs in question may be corrupted after the restore. Not all of the I/O in general, fs data. Still, that alone probably is not a good enough reason for freezing all kernel threads.
Well, I'm not sure whether or not that still would have been the case if we had stopped to freeze kernel threads for the hibernation/suspend. I just see potential problems that I've mentioned in the previous message and I don't see any evidence that they cannot occur. Greetings, Rafael
* Re: Back to the future. 2007-04-28 23:03 ` Rafael J. Wysocki @ 2007-04-28 23:45 ` Linus Torvalds 2007-04-29 0:01 ` Nigel Cunningham ` (2 more replies) 0 siblings, 3 replies; 135+ messages in thread From: Linus Torvalds @ 2007-04-28 23:45 UTC To: Rafael J. Wysocki Cc: Pavel Machek, Nigel Cunningham, Pekka J Enberg, LKML, Oleg Nesterov On Sun, 29 Apr 2007, Rafael J. Wysocki wrote: > > OK, more precisely: fs-related threads should not try to process their queues, > etc., after the snapshot is done, because that may cause some fs data to be > written at that time and then the fs in question may be corrupted after the > restore. Not all of the I/O in general, fs data. But that's not true _either_. That's only true because right now I think we cannot even suspend to a swapfile (I might be wrong). If you have a swapfile on a filesystem, you'd need those fs queues running! > Well, I'm not sure whether or not that still would have been the case if we had > stopped to freeze kernel threads for the hibernation/suspend. Did you miss the email where Paul pointed out that Mac/PowerPC didn't use to do any of this? And apparently never had any issues with it? And probably worked more reliably several years ago than suspend/hibernation does _today_? Ie we do have history of _not_ freezing things. The freezing came later, and came with the subsystem that had more problems.. Linus
* Re: Back to the future. 2007-04-28 23:45 ` Linus Torvalds @ 2007-04-29 0:01 ` Nigel Cunningham 2007-04-29 5:01 ` Bojan Smojver 2007-04-29 3:43 ` Kyle Moffett 2007-04-29 8:57 ` Rafael J. Wysocki 2 siblings, 1 reply; 135+ messages in thread From: Nigel Cunningham @ 2007-04-29 0:01 UTC To: Linus Torvalds Cc: Rafael J. Wysocki, Pavel Machek, Pekka J Enberg, LKML, Oleg Nesterov Hi. On Sat, 2007-04-28 at 16:45 -0700, Linus Torvalds wrote: > > On Sun, 29 Apr 2007, Rafael J. Wysocki wrote: > > > > OK, more precisely: fs-related threads should not try to process their queues, > > etc., after the snapshot is done, because that may cause some fs data to be > > written at that time and then the fs in question may be corrupted after the > > restore. Not all of the I/O in general, fs data. > > But that's not true _either_. That's only true because right now I think > we cannot even suspend to a swapfile (I might be wrong). > > If you have a swapfile on a filesystem, you'd need those fs queues > running! For Suspend2, and I think for swsusp too, we bmap the locations when allocating the storage, and then submit our own bios. Even if swsusp isn't using this method, I'm pretty sure the swap code does bmapping at swapon time to avoid raciness later. > > Well, I'm not sure whether or not that still would have been the case if we had > > stopped to freeze kernel threads for the hibernation/suspend. > > Did you miss the email where Paul pointed out that Mac/PowerPC didn't use > to do any of this? And apparently never had any issues with it? And > probably worked more reliably several years ago than suspend/hibernation > does _today_? > > Ie we do have history of _not_ freezing things. The freezing came later, > and came with the subsystem that had more problems.. It also came because of problems. Not working perfectly isn't necessarily a sign of a faulty reason for being added in the first place.
I should also add, not freezing things is fine if you're happy with getting half an image at most. If you want a full just-as-if-I'd-never-turned-the-power-off image, you need freezing so that you can have some pages which can be saved before others are atomically copied, to ensure the whole image is consistent. Nigel
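Nigel's point about saving some pages before others are atomically copied can be shown with a toy model. As we read his description, the first set holds pages that frozen tasks cannot touch, so those pages can go to storage before the atomic copy, and the copy itself then needs memory only for the rest. Everything below is hypothetical: the page count, the split point, and the `take_image()` helper. The model simply checks whether the finished image matches memory as it was at the instant of the atomic copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NPAGES 8
#define FIRST_SET 4 /* pages [0, FIRST_SET) are written out first */

struct image {
    int pages[NPAGES];
};

/* Phase 1: write the first set of pages directly to the image. */
void save_first_set(const int *mem, struct image *img)
{
    memcpy(img->pages, mem, FIRST_SET * sizeof(int));
}

/* Phase 2: atomically copy the remaining pages (interrupts would be off). */
void atomic_copy_rest(const int *mem, struct image *img)
{
    memcpy(img->pages + FIRST_SET, mem + FIRST_SET,
           (NPAGES - FIRST_SET) * sizeof(int));
}

/* Run both phases. If tasks are not frozen, one of them dirties an
 * already-saved page in between. Returns true if the image matches memory
 * as it was at the instant of the atomic copy. */
bool take_image(int *mem, struct image *img, bool tasks_frozen)
{
    assert(mem != NULL && img != NULL);
    save_first_set(mem, img);
    if (!tasks_frozen)
        mem[0] += 1; /* an unfrozen task writes to page 0 after it was saved */
    atomic_copy_rest(mem, img);
    return memcmp(mem, img->pages, NPAGES * sizeof(int)) == 0;
}
```

With `tasks_frozen` set, the two-phase image is consistent. Without it, page 0 in the image is stale even though every page was saved.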
* Re: Back to the future. 2007-04-29 0:01 ` Nigel Cunningham @ 2007-04-29 5:01 ` Bojan Smojver 0 siblings, 0 replies; 135+ messages in thread From: Bojan Smojver @ 2007-04-29 5:01 UTC To: linux-kernel Nigel Cunningham <nigel <at> nigel.suspend2.net> writes: > If you want a full > just-as-if-I'd-never-turned-the-power-off image, Which (saving a full image) makes the system most responsive on resume. Coupled with compression and async I/O, it also keeps Suspend2 very, very fast, even with a slow disk and large amounts of RAM (as tested on one of my crappy old notebooks). From my (user) point of view, this is a brilliant feature to have. -- Bojan
* Re: Back to the future. 2007-04-28 23:45 ` Linus Torvalds 2007-04-29 0:01 ` Nigel Cunningham @ 2007-04-29 3:43 ` Kyle Moffett 2007-04-29 8:57 ` Rafael J. Wysocki 2 siblings, 0 replies; 135+ messages in thread From: Kyle Moffett @ 2007-04-29 3:43 UTC To: Linus Torvalds Cc: Rafael J. Wysocki, Pavel Machek, Nigel Cunningham, Pekka J Enberg, LKML, Oleg Nesterov On Apr 28, 2007, at 19:45:01, Linus Torvalds wrote: > On Sun, 29 Apr 2007, Rafael J. Wysocki wrote: >> Well, I'm not sure whether or not that still would have been the >> case if we had stopped to freeze kernel threads for the >> hibernation/suspend. > > Did you miss the email where Paul pointed out that Mac/PowerPC > didn't use to do any of this? And apparently never had any issues > with it? And probably worked more reliably several years ago than > suspend/hibernation > does _today_? Still works pretty reliably; the last time my PowerBook G4 was rebooted was 6 weeks ago. Once every 60 suspends or so the kernel USB driver gets really confused and doesn't wake up the USB controller properly, leading to dead keyboard/mouse, but other than that I never have problems. I wouldn't be surprised if I could comment out 90% of the "suspend" code and still have it work; the hardware in it is incredibly robust. I can even swap batteries while it's in suspend-to-RAM, as long as I do it in less than 45 sec or so; I get around 6-7 days of suspend-to-RAM time on a full charge. Cheers, Kyle Moffett
* Re: Back to the future. 2007-04-28 23:45 ` Linus Torvalds 2007-04-29 0:01 ` Nigel Cunningham 2007-04-29 3:43 ` Kyle Moffett @ 2007-04-29 8:57 ` Rafael J. Wysocki 2007-04-29 8:59 ` Pavel Machek 2 siblings, 1 reply; 135+ messages in thread From: Rafael J. Wysocki @ 2007-04-29 8:57 UTC To: Linus Torvalds Cc: Pavel Machek, Nigel Cunningham, Pekka J Enberg, LKML, Oleg Nesterov On Sunday, 29 April 2007 01:45, Linus Torvalds wrote: > > On Sun, 29 Apr 2007, Rafael J. Wysocki wrote: > > > > OK, more precisely: fs-related threads should not try to process their queues, > > etc., after the snapshot is done, because that may cause some fs data to be > > written at that time and then the fs in question may be corrupted after the > > restore. Not all of the I/O in general, fs data. > > But that's not true _either_. That's only true because right now I think > we cannot even suspend to a swapfile (I might be wrong). You are. > If you have a swapfile on a filesystem, you'd need those fs queues > running! No, I don't. It's done by bmapping the file and writing directly to the underlying blockdev. Otherwise we'd have corrupted filesystems after the restore. Swapfiles are handled this way anyway, so we just use the same code. > > Well, I'm not sure whether or not that still would have been the case if we had > > stopped to freeze kernel threads for the hibernation/suspend. > > Did you miss the email where Paul pointed out that Mac/PowerPC didn't use > to do any of this? No, I didn't. > And apparently never had any issues with it? On one platform with a limited subset of device drivers. > And probably worked more reliably several years ago than suspend/hibernation > does _today_? I have no problems with the hibernation on my test boxes (six of them), except for one network driver that doesn't bother to define a .suspend() callback. There are problems with the suspend (s2ram), but they are _not_ related to the freezing of kernel threads.
Some of them are related to the other issue that you have raised, which is that the same callbacks should not be used for the suspend and hibernation, and which I think is absolutely valid. The remaining ones are related to the fact that graphics card vendors don't care about us at all. > Ie we do have history of _not_ freezing things. The freezing came later, > and came with the subsystem that had more problems.. It doesn't have that many problems as you are trying to suggest. At present, the only problems with it happen if someone tries to "improve" it in the way I did with the workqueues. Anyway, the freezing of tasks, including kernel threads, is one of the few things on which Pavel, Nigel and me completely agree that they should be done, so perhaps you could accept that? Greetings, Rafael
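Rafael and Nigel both describe writing the image to a swapfile without going through the filesystem. The file's block locations are looked up once, and the image is then written straight to the underlying block device at those locations. The kernel has `bmap()` for the lookup, and userspace can query the same mapping with the `FIBMAP` ioctl. Below is a minimal sketch of the offset translation. It assumes a fixed 4096-byte block size and a map captured ahead of time, and `struct block_map` and `file_to_dev_offset()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 4096u /* assumed filesystem block size */

/* Map captured once, e.g. at swapon time: file block i lives at device
 * block dev_block[i]. Hypothetical structure, for illustration only. */
struct block_map {
    const uint64_t *dev_block;
    size_t nblocks;
};

/* Translate a byte offset within the swapfile into a byte offset on the raw
 * block device. Returns UINT64_MAX if the offset lies beyond the mapped file. */
uint64_t file_to_dev_offset(const struct block_map *m, uint64_t file_off)
{
    assert(m != NULL);
    uint64_t idx = file_off / BLOCK_SIZE;
    if (idx >= m->nblocks)
        return UINT64_MAX;
    return m->dev_block[idx] * BLOCK_SIZE + file_off % BLOCK_SIZE;
}
```

Because the map is fixed before the snapshot is taken, the writer never needs the filesystem's threads or metadata to be active. That is the property Rafael relies on when he says the fs queues do not have to run.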
* Re: Back to the future. 2007-04-29 8:57 ` Rafael J. Wysocki @ 2007-04-29 8:59 ` Pavel Machek 2007-04-29 9:32 ` Rafael J. Wysocki 0 siblings, 1 reply; 135+ messages in thread From: Pavel Machek @ 2007-04-29 8:59 UTC To: Rafael J. Wysocki Cc: Linus Torvalds, Nigel Cunningham, Pekka J Enberg, LKML, Oleg Nesterov Hi! > > Ie we do have history of _not_ freezing things. The freezing came later, > > and came with the subsystem that had more problems.. > > It doesn't have that many problems as you are trying to suggest. At present, > the only problems with it happen if someone tries to "improve" it in the way > I did with the workqueues. > > Anyway, the freezing of tasks, including kernel threads, is one of the few > things on which Pavel, Nigel and me completely agree that they should be done, > so perhaps you could accept that? Actually, if we want to support OLPC _nicely_, we'll need to get rid of freezer from suspend-to-RAM. Of course, that _will_ put more pressure at the drivers -- and break few of them... Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: Back to the future. 2007-04-29 8:59 ` Pavel Machek @ 2007-04-29 9:32 ` Rafael J. Wysocki 0 siblings, 0 replies; 135+ messages in thread From: Rafael J. Wysocki @ 2007-04-29 9:32 UTC To: Pavel Machek Cc: Linus Torvalds, Nigel Cunningham, Pekka J Enberg, LKML, Oleg Nesterov On Sunday, 29 April 2007 10:59, Pavel Machek wrote: > Hi! > > > > Ie we do have history of _not_ freezing things. The freezing came later, > > > and came with the subsystem that had more problems.. > > > > It doesn't have that many problems as you are trying to suggest. At present, > > the only problems with it happen if someone tries to "improve" it in the way > > I did with the workqueues. > > > > Anyway, the freezing of tasks, including kernel threads, is one of the few > > things on which Pavel, Nigel and me completely agree that they should be done, > > so perhaps you could accept that? > > Actually, if we want to support OLPC _nicely_, we'll need to get rid > of freezer from suspend-to-RAM. Of course, that _will_ put more > pressure at the drivers -- and break few of them... I think the removal of sys_sync() from freeze_processes() in the s2ram case might help. I'm really afraid of dropping the freezing of kernel threads from the hibernation/suspend altogether before we know we won't break drivers, because we can introduce some very subtle and difficult to debug problems this way. Moreover, apart from speeding up the suspend slightly (kernel threads are frozen very quickly) this won't buy us anything, since kprobes uses the freezer and all of the infrastructure is needed anyway. Greetings, Rafael
* Re: Back to the future. 2007-04-28 21:25 ` Linus Torvalds 2007-04-28 23:03 ` Rafael J. Wysocki @ 2007-04-29 8:23 ` Pavel Machek 2007-04-29 9:22 ` Rafael J. Wysocki 1 sibling, 1 reply; 135+ messages in thread From: Pavel Machek @ 2007-04-29 8:23 UTC To: Linus Torvalds Cc: Rafael J. Wysocki, Nigel Cunningham, Pekka J Enberg, LKML, Oleg Nesterov Hi! > > > The freezer has *caused* those deadlocks (eg by stopping threads that were > > > needed for the suspend writeouts to succeed!), not solved them. > > > > I can't remember anything like this, but I believe you have a specific test > > case in mind. > > Ehh.. Why do you think we _have_ that PF_NOFREEZE thing in the first place? > > Rafael, you really don't know what you're talking about, do you? > > Just _look_ at them. It's the IO threads etc that shouldn't be frozen, > exactly *because* they do IO. You claim that kernel threads shouldn't do > IO, but that's the point: if you cannot do IO when snapshotting to disk, > here's a damn big clue for you: how do you think that snapshot is going to > get written? > > I *guarantee* you that we've had a lot more problems with threads that > should *not* have been frozen than with those hypothetical threads that > you think should have been frozen. Well, we had nasty corruption on XFS, caused by thread that was not frozen and should be. (While the other case leads "only" to deadlocks, so it is easier to debug.) The locking point.. when I added freezing to swsusp, I knew very little about kernel locking, so I "simply" decided to avoid the problem altogether... using the freezer. You may be right that locks are not a big problem for the hibernation after all; I just do not know. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
* Re: Back to the future. 2007-04-29 8:23 ` Pavel Machek @ 2007-04-29 9:22 ` Rafael J. Wysocki 0 siblings, 0 replies; 135+ messages in thread From: Rafael J. Wysocki @ 2007-04-29 9:22 UTC To: Pavel Machek Cc: Linus Torvalds, Nigel Cunningham, Pekka J Enberg, LKML, Oleg Nesterov On Sunday, 29 April 2007 10:23, Pavel Machek wrote: > Hi! > > > > > The freezer has *caused* those deadlocks (eg by stopping threads that were > > > > needed for the suspend writeouts to succeed!), not solved them. > > > > > > I can't remember anything like this, but I believe you have a specific test > > > case in mind. > > > > Ehh.. Why do you think we _have_ that PF_NOFREEZE thing in the first place? > > > > Rafael, you really don't know what you're talking about, do you? > > > > Just _look_ at them. It's the IO threads etc that shouldn't be frozen, > > exactly *because* they do IO. You claim that kernel threads shouldn't do > > IO, but that's the point: if you cannot do IO when snapshotting to disk, > > here's a damn big clue for you: how do you think that snapshot is going to > > get written? > > > > I *guarantee* you that we've had a lot more problems with threads that > > should *not* have been frozen than with those hypothetical threads that > > you think should have been frozen. > > Well, we had nasty corruption on XFS, caused by thread that was not > frozen and should be. (While the other case leads "only" to deadlocks, > so it is easier to debug.) > > The locking point.. when I added freezing to swsusp, I knew very > little about kernel locking, so I "simply" decided to avoid the > problem altogether... using the freezer. > > You may be right that locks are not a big problem for the hibernation > after all; I just do not know. Still, I think, if a kernel thread is a part of a device driver, then _in_ _principle_ it needs _some_ synchronization with the driver's suspend/freeze and resume/thaw callbacks.
For example, it's reasonable to assume that the thread should be quiet between suspend/freeze and resume/thaw. With the freezing of kernel threads we provide a simple means of such synchronization: use try_to_freeze() in a suitable place of your kernel thread and you're done. [Well, there should be a second part for making the thread die if the thaw callback doesn't find the device, but that's in the works.] Without it, there may be race conditions that we are not even aware of and that may trigger in, say, 1 in 10 suspends or so and I wish you good luck with debugging such things. Greetings, Rafael
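The synchronization Rafael describes can be modeled in a few lines. This is a single-threaded simulation, not kernel code. `model_try_to_freeze()` is a hypothetical stand-in for the kernel's `try_to_freeze()`, and instead of sleeping until thawed, the worker simply returns while frozen. What matters is where the check sits. It is at the top of the loop, where the thread holds no locks and has no device request in flight, so a driver's suspend/freeze callback can assume the thread is quiet.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a driver kernel thread and the freezer handshake. */
struct worker {
    bool freeze_requested; /* set by the freezer before the driver's freeze callback */
    bool frozen;           /* true while parked at the safe point */
    int device_ops;        /* how many times the thread touched the device */
};

/* Stand-in for try_to_freeze(): park if a freeze is pending.
 * Returns true if the thread is frozen. */
bool model_try_to_freeze(struct worker *w)
{
    if (w->freeze_requested)
        w->frozen = true;
    return w->frozen;
}

/* One iteration of the thread's main loop. The freeze check sits at the top,
 * a point where the thread holds no locks and has nothing in flight. */
void worker_iteration(struct worker *w)
{
    assert(w != 0);
    if (model_try_to_freeze(w))
        return; /* quiet until thawed: the driver's callbacks can rely on this */
    w->device_ops++;
}

/* Resume/thaw: clear the request and let the loop continue. */
void thaw(struct worker *w)
{
    w->freeze_requested = false;
    w->frozen = false;
}
```

Once a freeze is requested, further iterations do not touch the device until `thaw()` runs. This is the "quiet between suspend/freeze and resume/thaw" guarantee, obtained without any driver-specific locking.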
* Re: Back to the future. 2007-04-28 8:50 ` Pavel Machek 2007-04-28 9:24 ` Rafael J. Wysocki 2007-04-28 16:28 ` Linus Torvalds @ 2007-04-28 18:32 ` David Lang 2007-04-28 19:14 ` Rafael J. Wysocki 2 siblings, 1 reply; 135+ messages in thread From: David Lang @ 2007-04-28 18:32 UTC To: Pavel Machek Cc: Rafael J. Wysocki, Linus Torvalds, Nigel Cunningham, Pekka J Enberg, LKML, Oleg Nesterov On Sat, 28 Apr 2007, Pavel Machek wrote: >> >> We freeze user space processes for the reasons that you have quoted above. >> >> Why we freeze kernel threads in there too is a good question, but not for me to >> answer. I don't know. Pavel should know, I think. > > We do not want kernel threads running: > > a) they may hold some locks and deadlock suspend > > b) they may do some writes to disk, leading to corruption > > We could solve a) by carefully auditing suspend lock usage to make > sure deadlocks are impossible even with kernel threads running. remember that we are doing suspend-to-disk, after we do the snapshot we will be doing a shutdown. that should simplify the locking issues David Lang
* Re: Back to the future. 2007-04-28 18:32 ` David Lang @ 2007-04-28 19:14 ` Rafael J. Wysocki 2007-04-28 18:44 ` David Lang 0 siblings, 1 reply; 135+ messages in thread From: Rafael J. Wysocki @ 2007-04-28 19:14 UTC To: David Lang Cc: Pavel Machek, Linus Torvalds, Nigel Cunningham, Pekka J Enberg, LKML, Oleg Nesterov On Saturday, 28 April 2007 20:32, David Lang wrote: > On Sat, 28 Apr 2007, Pavel Machek wrote: > > >> > >> We freeze user space processes for the reasons that you have quoted above. > >> > >> Why we freeze kernel threads in there too is a good question, but not for me to > >> answer. I don't know. Pavel should know, I think. > > > > We do not want kernel threads running: > > > > a) they may hold some locks and deadlock suspend > > > > b) they may do some writes to disk, leading to corruption > > > > We could solve a) by carefully auditing suspend lock usage to make > > sure deadlocks are impossible even with kernel threads running. > > remember that we are doing suspend-to-disk, after we do the snapshot we will be > doing a shutdown. that should simplify the locking issues That's assuming that we won't need to cancel the hibernation. Greetings, Rafael
* Re: Back to the future. 2007-04-28 19:14 ` Rafael J. Wysocki @ 2007-04-28 18:44 ` David Lang 0 siblings, 0 replies; 135+ messages in thread From: David Lang @ 2007-04-28 18:44 UTC To: Rafael J. Wysocki Cc: Pavel Machek, Linus Torvalds, Nigel Cunningham, Pekka J Enberg, LKML, Oleg Nesterov On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: > On Saturday, 28 April 2007 20:32, David Lang wrote: >> On Sat, 28 Apr 2007, Pavel Machek wrote: >> >>>> >>>> We freeze user space processes for the reasons that you have quoted above. >>>> >>>> Why we freeze kernel threads in there too is a good question, but not for me to >>>> answer. I don't know. Pavel should know, I think. >>> >>> We do not want kernel threads running: >>> >>> a) they may hold some locks and deadlock suspend >>> >>> b) they may do some writes to disk, leading to corruption >>> >>> We could solve a) by carefully auditing suspend lock usage to make >>> sure deadlocks are impossible even with kernel threads running. >> >> remember that we are doing suspend-to-disk, after we do the snapshot we will be >> doing a shutdown. that should simplify the locking issues > > That's assuming that we won't need to cancel the hibernation. true, but if we cancel the hibernation then why are the locks an issue? they are appropriate for the system state. David Lang
* Re: Back to the future. 2007-04-27 23:17 ` Linus Torvalds 2007-04-27 23:45 ` Rafael J. Wysocki @ 2007-05-03 15:25 ` Pavel Machek 1 sibling, 0 replies; 135+ messages in thread From: Pavel Machek @ 2007-05-03 15:25 UTC To: Linus Torvalds; +Cc: Rafael J. Wysocki, Pekka J Enberg, Nigel Cunningham, LKML Hi! > > 1) if the kernel threads are frozen, we know that they don't hold any locks > > that could interfere with the freezing of device drivers, > > 2) if they are frozen, we know, for example, that they won't call user mode > > helpers or do similar things, > > 3) if they are frozen, we know that they won't submit I/O to disks and > > potentially damage filesystems (suspend2 has much more problems with that > > than swsusp, but still. And yes, there have been bug reports related to it, > > so it's not just my fantasy). > > NONE of these are valid explanations at all. You're listing totally > theoretical problems, and ignoring all the _real_ problems that trying to > freeze kernel threads has _caused_. xfs problem was real. And I do not see that many problems caused by freezing kernel threads: at least you get deadlocks, not silent fs corruption. > And no, kernel threads do not submit IO to disks on their own. You just > made that up. Yes, they can be involved in that whole disk submission > thing, but in a good way - they can be required in order to make disk > writing work! Yep, so we have md doing io while we are doing atomic copy. That probably means it will continue when atomic copy is done... getting image out of sync with disk. (Plus we used to have bdflush, doing periodic writes to disk). > The problem that suspend has had is that it's done everything totally the > wrong way around. Do kernel threads do disk IO? Sure, if asked to do so. > For example, kernel threads can be involved in md etc, but that's a *good* > thing. The way to shut them up is not to freeze the threads, but to freeze > the *disk*.
Well, if freezing the disk was available, I'd gladly do it. Is there an easy way to implement that? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
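Linus's alternative, freezing the disk rather than the threads, could look roughly like a gate on the request queue. Submitters keep running, but anything submitted while the gate is closed is held until thaw. This is a hypothetical model: `struct disk_queue`, `submit()`, `freeze_queue()` and `thaw_queue()` are invented for illustration and are not a block-layer interface from this era.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 16

/* Hypothetical model of "freezing the disk": callers keep running, but
 * requests submitted while the queue is frozen are held, not dispatched. */
struct disk_queue {
    bool frozen;
    int held[QUEUE_CAP]; /* block numbers waiting for the thaw */
    size_t nheld;
    size_t dispatched;   /* requests that actually reached the disk */
    int last_block;      /* most recently dispatched block */
};

static void dispatch(struct disk_queue *q, int block)
{
    q->dispatched++;
    q->last_block = block;
}

/* Returns false if the hold buffer is full (a real queue would block instead). */
bool submit(struct disk_queue *q, int block)
{
    assert(q != NULL);
    if (!q->frozen) {
        dispatch(q, block);
        return true;
    }
    if (q->nheld == QUEUE_CAP)
        return false;
    q->held[q->nheld++] = block;
    return true;
}

void freeze_queue(struct disk_queue *q) { q->frozen = true; }

/* Thaw and dispatch held requests in the order they were submitted. */
void thaw_queue(struct disk_queue *q)
{
    q->frozen = false;
    for (size_t i = 0; i < q->nheld; i++)
        dispatch(q, q->held[i]);
    q->nheld = 0;
}
```

Presumably the snapshot writer itself would still need a path around such a gate, since its own I/O must reach the disk while everything else is held.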
* Re: Back to the future. 2007-04-27 21:44 ` Linus Torvalds 2007-04-27 22:04 ` Rafael J. Wysocki @ 2007-04-27 22:07 ` Nigel Cunningham 2007-04-28 1:03 ` Kyle Moffett 2007-04-28 0:18 ` Jeremy Fitzhardinge 2 siblings, 1 reply; 135+ messages in thread From: Nigel Cunningham @ 2007-04-27 22:07 UTC To: Linus Torvalds; +Cc: Rafael J. Wysocki, Pekka J Enberg, LKML Hi. On Fri, 2007-04-27 at 14:44 -0700, Linus Torvalds wrote: > > On Fri, 27 Apr 2007, Rafael J. Wysocki wrote: > > > > Why do you think that keeping the user space frozen after 'snapshot' is a bad > > idea? I think that solves many of the problems you're discussing. > > It makes it harder to debug (wouldn't it be *nice* to just ssh in, and do > > gdb -p <snapshotter> Make the machine being suspended a VM and you can already do that. > when something goes wrong?) but we also *depend* on user space for various > things (the same way we depend on kernel threads, and why it has been such > a total disaster to try to freeze the kernel threads too!). For example, > if you want to do graphical stuff, just using X would be quite nice, > wouldn't it? It would be nice, yes. But in doing so you make the contents of the disk inconsistent with the state you've just snapshotted, leading to filesystem corruption. Even if you modify filesystems to do checkpointing (which is what we're really talking about), you still also have the problem that your snapshot has to be stored somewhere before you write it to disk, so you also have to either 1) write some known static memory to disk before the snapshot and reuse it for the snapshot, 2) ensure up to half the RAM is free for your snapshot, 3) compress the snapshot as you take it, guessing beforehand how much memory the compressed snapshot might take and freeing that much, or 4) reserve memory at boot time for the atomic copy so that 2) or 3) is still done, but without having to free the memory. (Yuk!).
> But I do agree that doing everything in the kernel is likely to just be a > hell of a lot simpler for everybody. Indeed. Nigel
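Nigel's options 2) and 3) come down to simple arithmetic. An atomic copy of P pages needs P free pages, or roughly P * r pages if the copy is compressed on the fly at an expected ratio r that has to be guessed in advance. If every used page has to be copied, the uncompressed case only fits when at most half of RAM is in use. The sketch below expresses r as an integer percentage, and `pages_needed()` and `copy_fits()` are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pages that must be free to hold an atomic copy of copy_pages pages.
 * ratio_pct is the expected compressed size as a percentage of the original
 * (100 = no compression); it is a guess made before the copy is taken. */
uint64_t pages_needed(uint64_t copy_pages, unsigned ratio_pct)
{
    assert(ratio_pct > 0 && ratio_pct <= 100);
    return (copy_pages * ratio_pct + 99) / 100; /* round up */
}

/* Can a copy of all used pages fit in the remaining free pages, without
 * first freeing memory or reserving it at boot time? */
bool copy_fits(uint64_t total_pages, uint64_t used_pages, unsigned ratio_pct)
{
    assert(used_pages <= total_pages);
    return pages_needed(used_pages, ratio_pct) <= total_pages - used_pages;
}
```

For example, with 1000 pages of RAM and 600 in use, an uncompressed copy does not fit (600 > 400 free). A copy expected to compress to 50% does fit (300 <= 400), but only if the guess turns out to be right.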
* Re: Back to the future. 2007-04-27 22:07 ` Nigel Cunningham @ 2007-04-28 1:03 ` Kyle Moffett 2007-04-28 1:15 ` Rafael J. Wysocki 2007-05-03 15:10 ` Pavel Machek 0 siblings, 2 replies; 135+ messages in thread From: Kyle Moffett @ 2007-04-28 1:03 UTC To: nigel; +Cc: Linus Torvalds, Rafael J. Wysocki, Pekka J Enberg, LKML On Apr 27, 2007, at 18:07:46, Nigel Cunningham wrote: > Hi. > > On Fri, 2007-04-27 at 14:44 -0700, Linus Torvalds wrote: >> It makes it harder to debug (wouldn't it be *nice* to just ssh in, >> and do >> gdb -p <snapshotter> > > Make the machine being suspended a VM and you can already do that. >> when something goes wrong?) but we also *depend* on user space for >> various things (the same way we depend on kernel threads, and why >> it has been such a total disaster to try to freeze the kernel >> threads too!). For example, if you want to do graphical stuff, >> just using X would be quite nice, wouldn't it? > > But in doing so you make the contents of the disk inconsistent with > the state you've just snapshotted, leading to filesystem > corruption. Even if you modify filesystems to do checkpointing > (which is what we're really talking about), you still also have the > problem that your snapshot has to be stored somewhere before you > write it to disk, so you also have to either [snip] Actually, it's a lot simpler than that. We can just combine the device-mapper snapshot with a VM+kernel snapshot system call and be almost done: sys_snapshot(dev_t snapblockdev, int __user *snapshotfd); When sys_snapshot is run, the kernel does: 1) Sequentially freeze mounted filesystems using blockdev freezing. If it's an fs that doesn't support freezing then either fail or force-remount-ro that fs and downgrade all its filedescriptors to RO. Doesn't need extra locking since process which try to do IO either succeed before the freeze call returns for that blockdev or sleep on the unfreeze of that blockdev.
Filesystems are synchronized and made clean. 2) Iterate over the userspace process list, freezing each process and remapping all of its pages copy-on-write. Any device-specific pages need to have state saved by that device. 3) All processes (except kernel threads) are now frozen. 4) Kernel should save internal state corresponding to current userspace state. The kernel also swaps out excess pages to free up enough RAM and prepares the snapshot file-descriptor with copies of kernel memory and the original (pre-COW) mapped userspace pages. 5) Kernel substitutes filesystems for either a device-mapper snapshot with snapblockdev as backing storage or union with tmpfs and remounts the underlying filesystems as read-only. 6) Kernel unfreezes all userspace processes and returns the snapshot FD to userspace (where it can be read from). Then userspace can do whatever it wants. Any changes to filesystems mounted at the time of snapshot will be discarded at shutdown. Freshly mounted filesystems won't have the union or COW thing done, and so you can write your snapshot to a compressed encrypted file on a USB key if you want to, you just have to unmount it before the snapshot() syscall and remount it right afterwards. Cheers, Kyle Moffett
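Step 5 of Kyle's proposal relies on copy-on-write semantics. After the snapshot, the original filesystem blocks stay untouched and every new write lands in separate storage, which is why later changes can simply be discarded at shutdown. The toy model below shows roughly the semantics of writing through a device-mapper snapshot device. `struct cow_dev` and its functions are hypothetical, and real dm-snapshot works on chunks tracked in an exception store rather than with an in-memory flag per block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NBLOCKS 8

/* Toy model of step 5: after the snapshot, the origin device is never
 * written; every write lands in a separate copy-on-write store. */
struct cow_dev {
    const int *origin;      /* contents as of the snapshot; never modified */
    int cow[NBLOCKS];       /* post-snapshot data (the snapblockdev) */
    bool remapped[NBLOCKS]; /* true once a block has been written after the snapshot */
};

/* Reads see post-snapshot data if the block was rewritten, else the origin. */
int cow_read(const struct cow_dev *d, size_t blk)
{
    assert(blk < NBLOCKS);
    return d->remapped[blk] ? d->cow[blk] : d->origin[blk];
}

/* Writes go to the COW store only. */
void cow_write(struct cow_dev *d, size_t blk, int value)
{
    assert(blk < NBLOCKS);
    d->cow[blk] = value;
    d->remapped[blk] = true;
}

/* "Discarded at shutdown": forget the COW store; the origin is untouched. */
void cow_discard(struct cow_dev *d)
{
    for (size_t i = 0; i < NBLOCKS; i++)
        d->remapped[i] = false;
}
```

After `cow_discard()`, reads see the snapshot-time contents again. This is the disk-side half of "any changes ... will be discarded at shutdown".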
* Re: Back to the future. 2007-04-28 1:03 ` Kyle Moffett @ 2007-04-28 1:15 ` Rafael J. Wysocki 2007-04-28 0:51 ` David Lang 2007-04-28 1:25 ` Kyle Moffett 2007-05-03 15:10 ` Pavel Machek 1 sibling, 2 replies; 135+ messages in thread From: Rafael J. Wysocki @ 2007-04-28 1:15 UTC To: Kyle Moffett; +Cc: nigel, Linus Torvalds, Pekka J Enberg, LKML On Saturday, 28 April 2007 03:03, Kyle Moffett wrote: > On Apr 27, 2007, at 18:07:46, Nigel Cunningham wrote: > > Hi. > > > > On Fri, 2007-04-27 at 14:44 -0700, Linus Torvalds wrote: > >> It makes it harder to debug (wouldn't it be *nice* to just ssh in, > >> and do > >> gdb -p <snapshotter> > > > > Make the machine being suspended a VM and you can already do that. > > >> when something goes wrong?) but we also *depend* on user space for > >> various things (the same way we depend on kernel threads, and why > >> it has been such a total disaster to try to freeze the kernel > >> threads too!). For example, if you want to do graphical stuff, > >> just using X would be quite nice, wouldn't it? > > > > But in doing so you make the contents of the disk inconsistent with > > the state you've just snapshotted, leading to filesystem > > corruption. Even if you modify filesystems to do checkpointing > > (which is what we're really talking about), you still also have the > > problem that your snapshot has to be stored somewhere before you > > write it to disk, so you also have to either [snip] > > Actually, it's a lot simpler than that. We can just combine the > device-mapper snapshot with a VM+kernel snapshot system call and be > almost done: > > sys_snapshot(dev_t snapblockdev, int __user *snapshotfd); > > When sys_snapshot is run, the kernel does: > > 1) Sequentially freeze mounted filesystems using blockdev freezing. > If it's an fs that doesn't support freezing then either fail or force- > remount-ro that fs and downgrade all its filedescriptors to RO.
> Doesn't need extra locking since process which try to do IO either > succeed before the freeze call returns for that blockdev or sleep on > the unfreeze of that blockdev. Filesystems are synchronized and made > clean. > 2) Iterate over the userspace process list, freezing each process > and remapping all of its pages copy-on-write. Any device-specific > pages need to have state saved by that device. Why do you want to do 2) after 1) and not vice versa? > 3) All processes (except kernel threads) are now frozen. > 4) Kernel should save internal state corresponding to current > userspace state. The kernel also swaps out excess pages to free up > enough RAM and prepares the snapshot file-descriptor with copies of > kernel memory and the original (pre-COW) mapped userspace pages. > 5) Kernel substitutes filesystems for either a device-mapper > snapshot with snapblockdev as backing storage or union with tmpfs and > remounts the underlying filesystems as read-only. > 6) Kernel unfreezes all userspace processes and returns the snapshot > FD to userspace (where it can be read from). Okay, but how do we do the error recovery if, for example, the image cannot be saved? > Then userspace can do whatever it wants. Any changes to filesystems > mounted at the time of snapshot will be discarded at shutdown. > Freshly mounted filesystems won't have the union or COW thing done, > and so you can write your snapshot to a compressed encrypted file on > a USB key if you want to, you just have to unmount it before the > snapshot() syscall and remount it right afterwards. This seems to be a good idea. Greetings, Rafael
* Re: Back to the future. 2007-04-28 1:15 ` Rafael J. Wysocki @ 2007-04-28 0:51 ` David Lang 2007-04-28 1:25 ` Kyle Moffett 1 sibling, 0 replies; 135+ messages in thread From: David Lang @ 2007-04-28 0:51 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Kyle Moffett, nigel, Linus Torvalds, Pekka J Enberg, LKML On Sat, 28 Apr 2007, Rafael J. Wysocki wrote: > On Saturday, 28 April 2007 03:03, Kyle Moffett wrote: >> On Apr 27, 2007, at 18:07:46, Nigel Cunningham wrote: >>> Hi. >>> >>> On Fri, 2007-04-27 at 14:44 -0700, Linus Torvalds wrote: >>>> It makes it harder to debug (wouldn't it be *nice* to just ssh in, >>>> and do >>>> gdb -p <snapshotter> >>> >>> Make the machine being suspended a VM and you can already do that. >> >>>> when something goes wrong?) but we also *depend* on user space for >>>> various things (the same way we depend on kernel threads, and why >>>> it has been such a total disaster to try to freeze the kernel >>>> threads too!). For example, if you want to do graphical stuff, >>>> just using X would be quite nice, wouldn't it? >>> >>> But in doing so you make the contents of the disk inconsistent with >>> the state you've just snapshotted, leading to filesystem >>> corruption. Even if you modify filesystems to do checkpointing >>> (which is what we're really talking about), you still also have the >>> problem that your snapshot has to be stored somewhere before you >>> write it to disk, so you also have to either [snip] >> >> Actually, it's a lot simpler than that. We can just combine the >> device-mapper snapshot with a VM+kernel snapshot system call and be >> almost done: >> >> sys_snapshot(dev_t snapblockdev, int __user *snapshotfd); >> >> When sys_snapshot is run, the kernel does: >> >> 1) Sequentially freeze mounted filesystems using blockdev freezing. >> If it's an fs that doesn't support freezing then either fail or force- >> remount-ro that fs and downgrade all its filedescriptors to RO. 
>> Doesn't need extra locking since process which try to do IO either >> succeed before the freeze call returns for that blockdev or sleep on >> the unfreeze of that blockdev. Filesystems are synchronized and made >> clean. >> 2) Iterate over the userspace process list, freezing each process >> and remapping all of its pages copy-on-write. Any device-specific >> pages need to have state saved by that device. > > Why do you want to do 2) after 1) and not vice versa? it doesn't really need to matter. if you care, just arrange to not schedule user processes while you are doing both steps. >> 3) All processes (except kernel threads) are now frozen. >> 4) Kernel should save internal state corresponding to current >> userspace state. The kernel also swaps out excess pages to free up >> enough RAM and prepares the snapshot file-descriptor with copies of >> kernel memory and the original (pre-COW) mapped userspace pages. >> 5) Kernel substitutes filesystems for either a device-mapper >> snapshot with snapblockdev as backing storage or union with tmpfs and >> remounts the underlying filesystems as read-only. >> 6) Kernel unfreezes all userspace processes and returns the snapshot >> FD to userspace (where it can be read from). > > Okay, but how do we do the error recovery if, for example, the image cannot > be saved? give the user an error message telling him this, wait for confirmation, and then jump directly to the restore step. revert everything to the snapshot image(s), restart it. David Lang ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-28 1:15 ` Rafael J. Wysocki 2007-04-28 0:51 ` David Lang @ 2007-04-28 1:25 ` Kyle Moffett 1 sibling, 0 replies; 135+ messages in thread From: Kyle Moffett @ 2007-04-28 1:25 UTC (permalink / raw) To: Rafael J. Wysocki; +Cc: nigel, Linus Torvalds, Pekka J Enberg, LKML On Apr 27, 2007, at 21:15:28, Rafael J. Wysocki wrote: > On Saturday, 28 April 2007 03:03, Kyle Moffett wrote: >> On Apr 27, 2007, at 18:07:46, Nigel Cunningham wrote: >>> But in doing so you make the contents of the disk inconsistent >>> with the state you've just snapshotted, leading to filesystem >>> corruption. Even if you modify filesystems to do checkpointing >>> (which is what we're really talking about), you still also have >>> the problem that your snapshot has to be stored somewhere before >>> you write it to disk, so you also have to either [snip] >> >> When sys_snapshot is run, the kernel does: >> >> 1) Sequentially freeze mounted filesystems using blockdev >> freezing. If it's an fs that doesn't support freezing then either >> fail or force-remount-ro that fs and downgrade all its >> filedescriptors to RO. Doesn't need extra locking since process >> which try to do IO either succeed before the freeze call returns >> for that blockdev or sleep on the unfreeze of that blockdev. >> Filesystems are synchronized and made clean. >> 2) Iterate over the userspace process list, freezing each process >> and remapping all of its pages copy-on-write. Any device-specific >> pages need to have state saved by that device. > > Why do you want to do 2) after 1) and not vice versa? (1) can be done without extra locking. Device-mapper already has code to freeze filesystems and that makes a natural process-stopping point. Any threads doing IO will very quickly put themselves to sleep at (1) and save us some effort during step 2. >> 6) Kernel unfreezes all userspace processes and returns the >> snapshot FD to userspace (where it can be read from). 
> > Okay, but how do we do the error recovery if, for example, the > image cannot be saved? If the image can't be saved then there are 2 options: (1) Call sys_restore() with the image (2) Pass your snapshot file-descriptor to sys_unsnapshot() In the former case, the system will be restored to the state it was at a few seconds earlier, right as it took the snapshot. In the latter case the modified-in-memory snapshot pages will be synced back to the disk filesystems, the copy-on-write data-structures torn down (think of merging an LVM snapshot back into its base device), and the memory allocated for the snapshot will be freed. Either way the system is properly in sync with disk again, the only difference is whether you want to preserve the userspace state from during the attempted snapshot (IE: any error status). You could also save the error state in case (1) by just auto-posting a bug-report on http://bugs.$VENDOR.com/ of course :-D. Cheers, Kyle Moffett ^ permalink raw reply [flat|nested] 135+ messages in thread
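The two recovery paths Kyle describes behave like discarding or merging a copy-on-write overlay. The sketch below models only the disk side under that assumption. `struct cow_dev`, `cow_discard()` (standing in for the proposed `sys_restore()`) and `cow_merge()` (standing in for the proposed `sys_unsnapshot()`) are hypothetical. A real implementation would also have to restore or free the saved memory image.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NBLOCKS 8

/* Toy snapshotted device: post-snapshot writes go to an overlay and the
 * base contents stay exactly as they were at snapshot time. */
struct cow_dev {
    int  base[NBLOCKS];      /* contents at snapshot time */
    int  overlay[NBLOCKS];   /* writes made after the snapshot */
    bool dirty[NBLOCKS];     /* true if the block has an overlay copy */
};

static void cow_start(struct cow_dev *d)
{
    memset(d->dirty, 0, sizeof(d->dirty));
}

static void cow_write(struct cow_dev *d, int blk, int val)
{
    d->overlay[blk] = val;
    d->dirty[blk] = true;
}

static int cow_read(const struct cow_dev *d, int blk)
{
    return d->dirty[blk] ? d->overlay[blk] : d->base[blk];
}

/* Option 1: drop the overlay; the device is back at the snapshot point. */
static void cow_discard(struct cow_dev *d)
{
    memset(d->dirty, 0, sizeof(d->dirty));
}

/* Option 2: fold the overlay into the base; post-snapshot writes survive. */
static void cow_merge(struct cow_dev *d)
{
    for (int i = 0; i < NBLOCKS; i++) {
        if (d->dirty[i]) {
            d->base[i] = d->overlay[i];
            d->dirty[i] = false;
        }
    }
}
```

Either path ends with no outstanding overlay. That matches Kyle's point that the system is back in sync with disk whichever option is taken.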
* Re: Back to the future. 2007-04-28 1:03 ` Kyle Moffett 2007-04-28 1:15 ` Rafael J. Wysocki @ 2007-05-03 15:10 ` Pavel Machek 2007-05-03 16:53 ` Kyle Moffett 1 sibling, 1 reply; 135+ messages in thread From: Pavel Machek @ 2007-05-03 15:10 UTC (permalink / raw) To: Kyle Moffett Cc: nigel, Linus Torvalds, Rafael J. Wysocki, Pekka J Enberg, LKML Hi! > >>It makes it harder to debug (wouldn't it be *nice* to > >>just ssh in, and do > >> gdb -p <snapshotter> > > > >Make the machine being suspended a VM and you can > >already do that. > > >>when something goes wrong?) but we also *depend* on > >>user space for various things (the same way we depend > >>on kernel threads, and why it has been such a total > >>disaster to try to freeze the kernel threads too!). > >>For example, if you want to do graphical stuff, just > >>using X would be quite nice, wouldn't it? > > > >But in doing so you make the contents of the disk > >inconsistent with the state you've just snapshotted, > >leading to filesystem corruption. Even if you modify > >filesystems to do checkpointing (which is what we're > >really talking about), you still also have the problem > >that your snapshot has to be stored somewhere before > >you write it to disk, so you also have to either [snip] > > Actually, it's a lot simpler than that. We can just > combine the device-mapper snapshot with a VM+kernel > snapshot system call and be almost done: > > sys_snapshot(dev_t snapblockdev, int __user > *snapshotfd); > > When sys_snapshot is run, the kernel does: > > 1) Sequentially freeze mounted filesystems using > blockdev freezing. If it's an fs that doesn't support > freezing then either fail or force- remount-ro that fs > and downgrade all its filedescriptors to RO. Doesn't > need extra locking since process which try to do IO > either succeed before the freeze call returns for that > blockdev or sleep on the unfreeze of that blockdev. > Filesystems are synchronized and made clean. 
How mature is freezing filesystems -- will it work on at least ext2/3 and vfat? What happens if you try to boot and filesystems are frozen from previous run? Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-05-03 15:10 ` Pavel Machek @ 2007-05-03 16:53 ` Kyle Moffett 2007-05-04 7:52 ` David Greaves 0 siblings, 1 reply; 135+ messages in thread From: Kyle Moffett @ 2007-05-03 16:53 UTC (permalink / raw) To: Pavel Machek Cc: nigel, Linus Torvalds, Rafael J. Wysocki, Pekka J Enberg, LKML On May 03, 2007, at 11:10:47, Pavel Machek wrote: > How mature is freezing filesystems -- will it work on at least > ext2/3 and vfat? I'm pretty sure it works on ext2/3 and xfs and possibly others; I don't know either way about VFAT though. Essentially the "freeze" part involves telling the filesystem to sync all data, flush the journal, and mark the filesystem clean. The intent under dm/LVM was to allow you to make snapshots without having to fsck the just-created snapshot before you mounted it. > What happens if you try to boot and filesystems are frozen from > previous run? If you're just doing a fresh boot then the filesystem is already clean due to the dm freeze and so it mounts up normally. All you need to do then is have a little startup script which purges the saved image before you fsck or remount things read-write since either case means the image is no longer safe to resume. If the kernel is later modified to purge all filesystem data (dcache/pagecache) during snapshot and effectively remount and reopen all the files by path during restore then you could remove that requirement. You'd just need to make sure that the restore-from-disk scripts did an fsck or journal-restore before reloading the old kernel data. Cheers, Kyle Moffett ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-05-03 16:53 ` Kyle Moffett @ 2007-05-04 7:52 ` David Greaves 2007-05-04 13:27 ` Kyle Moffett 0 siblings, 1 reply; 135+ messages in thread From: David Greaves @ 2007-05-04 7:52 UTC (permalink / raw) To: Kyle Moffett Cc: Pavel Machek, nigel, Linus Torvalds, Rafael J. Wysocki, Pekka J Enberg, LKML Kyle Moffett wrote: > On May 03, 2007, at 11:10:47, Pavel Machek wrote: >> How mature is freezing filesystems -- will it work on at least ext2/3 >> and vfat? > > I'm pretty sure it works on ext2/3 and xfs and possibly others, I don't > know either way about VFAT though. Essentially the "freeze" part > involves telling the filesystem to sync all data, flush the journal, and > mark the filesystem clean. The intent under dm/LVM was to allow you to > make snapshots without having to fsck the just-created snapshot before > you mounted it. > >> What happens if you try to boot and filesystems are frozen from >> previous run? > > If you're just doing a fresh boot then the filesystem is already clean > due to the dm freeze and so it mounts up normally. All you need to do > then is have a little startup script which purges the saved image before > you fsck or remount things read-write since either case means the image > is no longer safe to resume. Wouldn't it be better if freeze wrote a freeze-ID to the fs and returned it? This would naturally be kept in the image and a UUID mismatch would be detectable - seems safer and more flexible than 'a script'. "This isn't the freeze you're looking for, move along" David ^ permalink raw reply [flat|nested] 135+ messages in thread
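Below is a sketch of David's freeze-ID idea. It assumes a hypothetical on-disk field that stores the ID written at freeze time, and that any read-write mount or fsck clears that field. The structs and functions are illustrative only. The ID is passed in rather than generated so the behaviour is deterministic; a real version would use a random UUID.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define ID_LEN 16

/* Hypothetical superblock field: ID recorded when frozen for suspend. */
struct fs_super {
    unsigned char freeze_id[ID_LEN];   /* all zero = no valid freeze */
};

/* Hypothetical image header: the ID handed back by the freeze call. */
struct image_header {
    unsigned char freeze_id[ID_LEN];
};

static const unsigned char no_id[ID_LEN];   /* all zero */

/* Freeze: store the same ID in the filesystem and in the image. */
static void fs_freeze(struct fs_super *sb, struct image_header *img,
                      const unsigned char id[ID_LEN])
{
    memcpy(sb->freeze_id, id, ID_LEN);
    memcpy(img->freeze_id, id, ID_LEN);
}

/* A read-write mount (or fsck) changes the fs behind the image's back,
 * so it invalidates any outstanding freeze. */
static void fs_mount_rw(struct fs_super *sb)
{
    memset(sb->freeze_id, 0, ID_LEN);
}

/* Resume only if the filesystem still carries the image's freeze ID. */
static bool image_resumable(const struct fs_super *sb,
                            const struct image_header *img)
{
    return memcmp(img->freeze_id, no_id, ID_LEN) != 0 &&
           memcmp(sb->freeze_id, img->freeze_id, ID_LEN) == 0;
}
```

A stale image, or one whose filesystem has since been mounted read-write, fails the comparison instead of relying on a startup script having run.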
* Re: Back to the future. 2007-05-04 7:52 ` David Greaves @ 2007-05-04 13:27 ` Kyle Moffett 0 siblings, 0 replies; 135+ messages in thread From: Kyle Moffett @ 2007-05-04 13:27 UTC (permalink / raw) To: David Greaves Cc: Pavel Machek, nigel, Linus Torvalds, Rafael J. Wysocki, Pekka J Enberg, LKML On May 04, 2007, at 03:52:03, David Greaves wrote: > Kyle Moffett wrote: >> On May 03, 2007, at 11:10:47, Pavel Machek wrote: >>> What happens if you try to boot and filesystems are frozen from >>> previous run? >> >> If you're just doing a fresh boot then the filesystem is already >> clean due to the dm freeze and so it mounts up normally. All you >> need to do then is have a little startup script which purges the >> saved image before you fsck or remount things read-write since >> either case means the image is no longer safe to resume. > > Wouldn't it be better if freeze wrote a freeze-ID to the fs and > returned it? This would naturally be kept in the image and a UUID > mismatch would be detectable - seems safer and more flexible than > 'a script'. > > "This isn't the freeze you're looking for, move along" Possibly, but I was referring to the _current_ behavior of the device-mapper freezing. While perhaps not ideal, it's currently very easily usable. Cheers, Kyle Moffett ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-27 21:44 ` Linus Torvalds 2007-04-27 22:04 ` Rafael J. Wysocki 2007-04-27 22:07 ` Nigel Cunningham @ 2007-04-28 0:18 ` Jeremy Fitzhardinge 2007-04-28 1:00 ` Matthew Garrett 2 siblings, 1 reply; 135+ messages in thread From: Jeremy Fitzhardinge @ 2007-04-28 0:18 UTC (permalink / raw) To: Linus Torvalds; +Cc: Rafael J. Wysocki, Pekka J Enberg, Nigel Cunningham, LKML Linus Torvalds wrote: > On Fri, 27 Apr 2007, Rafael J. Wysocki wrote: > >> Why do you think that keeping the user space frozen after 'snapshot' is a bad >> idea? I think that solves many of the problems you're discussing. >> > > It makes it harder to debug (wouldn't it be *nice* to just ssh in, and do > > gdb -p <snapshotter> > > when something goes wrong?) Yeah, or gdb vmlinux snapshot Then you could use kexec for resume... J ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-28 0:18 ` Jeremy Fitzhardinge @ 2007-04-28 1:00 ` Matthew Garrett 2007-04-28 1:05 ` Jeremy Fitzhardinge 2007-04-28 1:08 ` Rafael J. Wysocki 0 siblings, 2 replies; 135+ messages in thread From: Matthew Garrett @ 2007-04-28 1:00 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: Linus Torvalds, Rafael J. Wysocki, Pekka J Enberg, Nigel Cunningham, LKML On Fri, Apr 27, 2007 at 05:18:16PM -0700, Jeremy Fitzhardinge wrote: > Then you could use kexec for resume... While that would certainly be nifty, I think we're arguably starting from the wrong point here. Why are we booting a kernel, trying to poke the hardware back into some sort of mock-quiescent state, freeing memory and then (finally) overwriting the entire contents of RAM rather than just doing all of this from the bootloader? Given the time spent in kernel setup and unpacking initramfs nowadays, I'm willing to bet it'd still be faster even if you're stuck using int 13 on x86. http://apcmag.com/5873/page14 suggests that Intel is looking into this, but I haven't heard anything more yet. To the best of my knowledge, this is also how Windows manages things. -- Matthew Garrett | mjg59@srcf.ucam.org ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-28 1:00 ` Matthew Garrett @ 2007-04-28 1:05 ` Jeremy Fitzhardinge 2007-05-03 15:14 ` Pavel Machek 2007-04-28 1:08 ` Rafael J. Wysocki 1 sibling, 1 reply; 135+ messages in thread From: Jeremy Fitzhardinge @ 2007-04-28 1:05 UTC (permalink / raw) To: Matthew Garrett Cc: Linus Torvalds, Rafael J. Wysocki, Pekka J Enberg, Nigel Cunningham, LKML Matthew Garrett wrote: > While that would certainly be nifty, I think we're arguably starting > from the wrong point here. Why are we booting a kernel, trying to poke > the hardware back into some sort of mock-quiescent state, freeing memory > and then (finally) overwriting the entire contents of RAM rather than > just doing all of this from the bootloader? Sure, you could make suspend generate a complete bootable kernel image containing all RAM. Doesn't sound too hard to me. You know, from over here on the sidelines. J ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-28 1:05 ` Jeremy Fitzhardinge @ 2007-05-03 15:14 ` Pavel Machek 2007-06-01 19:00 ` Eric W. Biederman 0 siblings, 1 reply; 135+ messages in thread From: Pavel Machek @ 2007-05-03 15:14 UTC (permalink / raw) To: Jeremy Fitzhardinge Cc: Matthew Garrett, Linus Torvalds, Rafael J. Wysocki, Pekka J Enberg, Nigel Cunningham, LKML Hi! > > While that would certainly be nifty, I think we're arguably starting > > from the wrong point here. Why are we booting a kernel, trying to poke > > the hardware back into some sort of mock-quiescent state, freeing memory > > and then (finally) overwriting the entire contents of RAM rather than > > just doing all of this from the bootloader? Doing it from the bootloader sounds attractive... but it is a lot of work. I'm essentially using linux as a bootloader. Patch for grub welcome. > Sure, you could make suspend generate a complete bootable kernel image > containing all RAM. Doesn't sound too hard to me. You know, from over > here on the sidelines. Ah, so we have a volunteer :-). Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-05-03 15:14 ` Pavel Machek @ 2007-06-01 19:00 ` Eric W. Biederman 0 siblings, 0 replies; 135+ messages in thread From: Eric W. Biederman @ 2007-06-01 19:00 UTC (permalink / raw) To: Pavel Machek Cc: Jeremy Fitzhardinge, Matthew Garrett, Linus Torvalds, Rafael J. Wysocki, Pekka J Enberg, Nigel Cunningham, LKML Pavel Machek <pavel@ucw.cz> writes: > Hi! > >> > While that would certainly be nifty, I think we're arguably starting >> > from the wrong point here. Why are we booting a kernel, trying to poke >> > the hardware back into some sort of mock-quiescent state, freeing memory >> > and then (finally) overwriting the entire contents of RAM rather than >> > just doing all of this from the bootloader? > > Doing it from the bootloader sounds attractive... but it is lot of > work. I'm essentially using linux as a bootloader. > > Patch for grub welcome. Well. We actually have first class support for using linux as a bootloader. So you could use linux and do whatever dance you are doing from a bootloader if you felt the desire. That might make the dance a little easier. Eric ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-28 1:00 ` Matthew Garrett 2007-04-28 1:05 ` Jeremy Fitzhardinge @ 2007-04-28 1:08 ` Rafael J. Wysocki 1 sibling, 0 replies; 135+ messages in thread From: Rafael J. Wysocki @ 2007-04-28 1:08 UTC (permalink / raw) To: Matthew Garrett Cc: Jeremy Fitzhardinge, Linus Torvalds, Pekka J Enberg, Nigel Cunningham, LKML On Saturday, 28 April 2007 03:00, Matthew Garrett wrote: > On Fri, Apr 27, 2007 at 05:18:16PM -0700, Jeremy Fitzhardinge wrote: > > > Then you could use kexec for resume... > > While that would certainly be nifty, I think we're arguably starting > from the wrong point here. Why are we booting a kernel, trying to poke > the hardware back into some sort of mock-quiescent state, freeing memory > and then (finally) overwriting the entire contents of RAM rather than > just doing all of this from the bootloader? Given the time spent in > kernel setup and unpacking initramfs nowadays, I'm willing to bet it'd > still be faster even if you're stuck using int 13 on x86. Yes, that would be faster. > http://apcmag.com/5873/page14 suggests that Intel is looking into this, > but I haven't heard anything more yet. To the best of my knowledge, this > is also how Windows manages things. I think you're right. Greetings, Rafael ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-27 4:52 ` Pekka J Enberg 2007-04-27 6:08 ` Nigel Cunningham @ 2007-04-27 20:44 ` Rafael J. Wysocki 1 sibling, 0 replies; 135+ messages in thread From: Rafael J. Wysocki @ 2007-04-27 20:44 UTC (permalink / raw) To: Pekka J Enberg; +Cc: Nigel Cunningham, Linus Torvalds, LKML On Friday, 27 April 2007 06:52, Pekka J Enberg wrote: > On Thu, 2007-04-26 at 09:56 -0700, Linus Torvalds wrote: > > > which will map in the snapshot, return the mapped address and the size > > > (and if you want to support snapshots > 4GB, be my guest, but I suspect > > > you're actually *better* off just admitting that if you cannot shrink > > > the snapshot to less than 32 bits, it's not worth doing) > > On Fri, 27 Apr 2007, Nigel Cunningham wrote: > > That inherently limits the image to half of available ram (you need > > somewhere to store the snapshot), so you won't get the full image you > > express interest in below. > > It doesn't. We can make the userspace mapped pages copy-on-write. As long > as the userspace makes sure there's not much activity during > snapshot/shutdown, we will be fine. What we probably do need to copy is > kernel pages. The user space is (and IMHO should be) frozen way before that and what you're suggesting here is what I wanted to implement some time ago. The problem with this was that the user space pages may be updated, for example, by device drivers as a result of some deferred I/O after we've snapshotted the system. I didn't know how to find out which pages owned by the user space could be updated this way, so I gave up at that time. Greetings, Rafael ^ permalink raw reply [flat|nested] 135+ messages in thread
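A toy model makes Rafael's objection easy to see. Copy-on-write preserves a page only when the CPU writes through a write-protected mapping and takes a fault. A device completing deferred I/O writes memory directly and never triggers that copy. The sketch below is a simplified illustration, not kernel code. `cpu_write()`, `device_write()` and `snapshot_view()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define NPAGES 4

/* Toy copy-on-write snapshot of user pages. */
struct mem {
    int  page[NPAGES];     /* live contents */
    int  saved[NPAGES];    /* original contents, copied on first CPU write */
    bool copied[NPAGES];
    bool wp[NPAGES];       /* write-protected since the snapshot */
};

static void take_snapshot(struct mem *m)
{
    for (int i = 0; i < NPAGES; i++) {
        m->copied[i] = false;
        m->wp[i] = true;
    }
}

/* CPU store: hitting a write-protected page "faults", and the handler
 * saves the original contents before letting the write through. */
static void cpu_write(struct mem *m, int i, int val)
{
    if (m->wp[i] && !m->copied[i]) {
        m->saved[i] = m->page[i];
        m->copied[i] = true;
    }
    m->page[i] = val;
}

/* Device write (e.g. DMA from deferred I/O): no page-table check, no
 * fault, so no copy is taken. */
static void device_write(struct mem *m, int i, int val)
{
    m->page[i] = val;
}

/* What the snapshot believes page i held when it was taken. */
static int snapshot_view(const struct mem *m, int i)
{
    return m->copied[i] ? m->saved[i] : m->page[i];
}
```

After `device_write()`, the snapshot silently reports the new value as if it had been there at snapshot time. Avoiding this requires knowing in advance which pages a driver might still write.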
* Re: Back to the future. 2007-04-26 19:56 ` Nigel Cunningham 2007-04-27 4:52 ` Pekka J Enberg @ 2007-04-28 19:09 ` Bill Davidsen 1 sibling, 0 replies; 135+ messages in thread From: Bill Davidsen @ 2007-04-28 19:09 UTC (permalink / raw) To: nigel; +Cc: Linus Torvalds, Pekka Enberg, LKML Nigel Cunningham wrote: > Please, go apply that logic elsewhere, then cut out (or at least stop > adding) support for users with less common needs in other areas. I fully > acknowledge that most users have only one place to store their image and > it's a swap device. But that doesn't mean one size fits all. > I think to some extent that's part of the problem. Consider for a moment that a /dev/hibernate would be required, and that it must be (a) a disk, or (b) a partition, or (c) other devices in the future, like an nbd, USB flash or DVD. Don't have a device like that, then can't hibernate. Stop trying to be smart and use swap for two different things. Stop trying to have an interface between user space and kernel which does things not required to preserve the system. A progress indicator is not needed, power off is my progress indicator, and should be the sole valid end of a hibernate. > A full image implies that you need to figure out what's not going to > change while you're writing it and save that separately. At the moment, > I'm treating most of the LRU contents as that list. If we're going to > start trying to let every man and his dog run while we're trying to > snapshot the system, that's not going to work anymore - or the logic > will get a lot more complicated. > > Sorry. I never thought I'd say this, but I think you're being naive > about how simple the process of snapshotting a system is. Hibernate is useful to avoid complex boot, it's useful as the UPS gets tired, and putting features in the process beyond saving the snap (possibly compressed and/or encrypted) just adds complexity. Put it all in the kernel and use /sys/power/state as the user interface. 
Stop oversolving the problem. No, that doesn't avoid other hard issues, but for the most part suspend2 has addressed them. -- Bill Davidsen <davidsen@tmr.com> "We have more to fear from the bungling of the incompetent than from the machinations of the wicked." - from Slashdot ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-26 16:56 ` Linus Torvalds ` (3 preceding siblings ...) 2007-04-26 19:56 ` Nigel Cunningham @ 2007-04-26 22:40 ` Pavel Machek 2007-04-27 5:41 ` Pekka Enberg 2007-04-26 22:42 ` Pavel Machek 2007-04-27 12:49 ` Pavel Machek 6 siblings, 1 reply; 135+ messages in thread From: Pavel Machek @ 2007-04-26 22:40 UTC (permalink / raw) To: Linus Torvalds; +Cc: Nigel Cunningham, Pekka Enberg, LKML Hi! > > * Doing things in the right order? (Prepare the image, then do the > > atomic copy, then save). > > I'd actually like to discuss this a bit.. > > I'm obviously not a huge fan of the whole user/kernel level split and > interfaces, but I actually do think that there is *one* split that makes > sense: > > - generate the (whole) snapshot image entirely inside the kernel > > - do nothing else (ie no IO at all), and just export it as a single image > to user space (literally just mapping the pages into user space). > *one* interface. None of the "pretty UI update" crap. Just a single > system call: > > void *snapshot_system(u32 *size); > > which will map in the snapshot, return the mapped address and the size > (and if you want to support snapshots > 4GB, be my guest, but I suspect > you're actually *better* off just admitting that if you cannot shrink > the snapshot to less than 32 bits, it's not worth doing) This is basically how uswsusp is designed. (We do not use a system call; you just read from /dev/snapshot, and you have to make a few ioctls to stop the other tasks.) > and for testing, you should be able to basically do > > u32 size; > void *buffer = snapshot_system(&size); > if (buffer != MAP_FAILED) > resume_snapshot(buffer, size); > > and it should obviously work. Which is what I did a long time ago, during uswsusp development. > Once you have that snapshot image in user space you can do anything you > want. And again: you'd have a fully working system: not any degradation > *at*all*.
If you're in X, then X will continue running etc even after the > snapshotting, although obviously the snapshotting will have tried to page > a lot of stuff out in order to make the snapshot smaller, so you'll likely > be crawling. Well... We decided not to do this in the fully working system. SIGSTOP is just not strong enough, and we want the snapshot atomic. Now, it would be _very_ nice to be able to snapshot system and continue running, but I just don't see how to do it without extensive filesystem support. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 135+ messages in thread
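The image may come from a mapped buffer, as Linus suggests, or from reading /dev/snapshot, as uswsusp does. Either way, the consumer ends up streaming it somewhere in page-sized chunks. Below is a minimal sketch of such a copy loop, assuming a 4096-byte page. `read_full()`, `write_full()` and `copy_image()` are hypothetical helpers. The point they illustrate is that `read()` on pipes and character devices may return less than requested, so a careful reader loops rather than treating a short read as an error.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK 4096   /* assumed page size */

/* Read until len bytes arrive or EOF; returns bytes read, -1 on error. */
static ssize_t read_full(int fd, void *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        ssize_t n = read(fd, (char *)buf + got, len - got);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;            /* EOF */
        got += (size_t)n;
    }
    return (ssize_t)got;
}

/* Write all len bytes; returns 0 on success, -1 on error. */
static int write_full(int fd, const void *buf, size_t len)
{
    size_t put = 0;
    while (put < len) {
        ssize_t n = write(fd, (const char *)buf + put, len - put);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        put += (size_t)n;
    }
    return 0;
}

/* Stream an image from `in` to `out` one chunk at a time.
 * Returns the number of bytes copied, or -1 on error. */
static long copy_image(int in, int out)
{
    char buf[CHUNK];
    long total = 0;
    for (;;) {
        ssize_t n = read_full(in, buf, sizeof(buf));
        if (n < 0)
            return -1;
        if (n > 0 && write_full(out, buf, (size_t)n) < 0)
            return -1;
        total += n;
        if ((size_t)n < sizeof(buf))
            return total;     /* read_full only comes up short at EOF */
    }
}
```

Because `copy_image()` only sees file descriptors, the same loop works whether the destination is a raw partition, a file, or a pipe into a compressor.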
* Re: Back to the future. 2007-04-26 22:40 ` Pavel Machek @ 2007-04-27 5:41 ` Pekka Enberg 2007-04-27 14:55 ` Pavel Machek 0 siblings, 1 reply; 135+ messages in thread From: Pekka Enberg @ 2007-04-27 5:41 UTC (permalink / raw) To: Pavel Machek; +Cc: Linus Torvalds, Nigel Cunningham, LKML On 4/27/07, Pavel Machek <pavel@ucw.cz> wrote: > Now, it would be _very_ nice to be able to snapshot system and > continue running, but I just don't see how to do it without extensive > filesystem support. So what kind of support do we need from the filesystem? Pekka ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-27 5:41 ` Pekka Enberg @ 2007-04-27 14:55 ` Pavel Machek 2007-04-27 21:39 ` Nigel Cunningham 0 siblings, 1 reply; 135+ messages in thread From: Pavel Machek @ 2007-04-27 14:55 UTC (permalink / raw) To: Pekka Enberg; +Cc: Linus Torvalds, Nigel Cunningham, LKML On Fri 2007-04-27 08:41:56, Pekka Enberg wrote: > On 4/27/07, Pavel Machek <pavel@ucw.cz> wrote: > >Now, it would be _very_ nice to be able to snapshot system and > >continue running, but I just don't see how to do it without extensive > >filesystem support. > > So what kind of support do we need from the filesystem? "forcedremount ro, not telling anyone, not killing processes" would do the trick. FS snapshots might do. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-27 14:55 ` Pavel Machek @ 2007-04-27 21:39 ` Nigel Cunningham 0 siblings, 0 replies; 135+ messages in thread From: Nigel Cunningham @ 2007-04-27 21:39 UTC (permalink / raw) To: Pavel Machek; +Cc: Pekka Enberg, Linus Torvalds, LKML [-- Attachment #1: Type: text/plain, Size: 853 bytes --] Hi. On Fri, 2007-04-27 at 16:55 +0200, Pavel Machek wrote: > On Fri 2007-04-27 08:41:56, Pekka Enberg wrote: > > On 4/27/07, Pavel Machek <pavel@ucw.cz> wrote: > > >Now, it would be _very_ nice to be able to snapshot system and > > >continue running, but I just don't see how to do it without extensive > > >filesystem support. > > > > So what kind of support do we need from the filesystem? > > "forcedremount ro, not telling anyone, not killing processes" would do > the trick. FS snapshots might do. It sounds to me more like Pekka is thinking of checkpointing support. If that's the case, then remounting filesystems isn't going to be an option. You want to freeze them for just long enough so that you can determine what needs saving in the checkpoint. You certainly don't want to make rw file handles ro and so on. Nigel [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-26 16:56 ` Linus Torvalds ` (4 preceding siblings ...) 2007-04-26 22:40 ` Pavel Machek @ 2007-04-26 22:42 ` Pavel Machek 2007-04-26 22:24 ` David Lang 2007-04-27 12:49 ` Pavel Machek 6 siblings, 1 reply; 135+ messages in thread From: Pavel Machek @ 2007-04-26 22:42 UTC (permalink / raw) To: Linus Torvalds; +Cc: Nigel Cunningham, Pekka Enberg, LKML Hi! > I'd really suggest _just_ the "full image". Nothing else is probably ever > worth supporting. Your "snapshot to disk" wouldn't be _quite_ as simple as > "echo disk > /sys/power/state", but it should not necessarily be much > worse than > > snapshot_kernel | gzip -9 > /dev/snapshot Yep, we "freeze too much", so we can't just use the shell and pipe it. Too bad.

int write_image(char *resume_dev_name)
{
	static struct swap_map_handle handle;
	struct swsusp_info *header;
	unsigned long start;
	ssize_t n;
	int fd;
	int error;

	fd = open(resume_dev_name, O_RDWR | O_SYNC);
	if (fd < 0) {
		printf("suspend: Could not open resume device\n");
		return -errno;	/* 'error' is not set yet here */
	}
	/* dev (the open /dev/snapshot descriptor) and buffer (one page)
	 * are assumed to be defined elsewhere; the first page read is
	 * the image header. */
	n = read(dev, buffer, PAGE_SIZE);
	if (n != PAGE_SIZE) {
		error = n < 0 ? -errno : -EFAULT;
		goto out;	/* don't leak fd on early exits */
	}
	header = (struct swsusp_info *)buffer;
	if (!enough_swap(header->pages)) {
		printf("suspend: Not enough free swap\n");
		error = -ENOSPC;
		goto out;
	}
	error = init_swap_writer(&handle, fd);
	if (!error) {
		start = handle.cur_swap;
		error = swap_write_page(&handle, header);
	}
	if (!error)
		error = save_image(&handle, header->pages - 1);
	if (!error) {
		flush_swap_writer(&handle);
		printf("S");
		error = mark_swap(fd, start);
		printf("|\n");
	}
out:
	fsync(fd);
	close(fd);
	return error;
}

This is basically the loop above, made complex by the fact that we do not want to have separate partition for snapshot; we just want to reuse free space in swap partition. I think you've just invented uswsusp. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 135+ messages in thread
Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-26 22:42 ` Pavel Machek @ 2007-04-26 22:24 ` David Lang 2007-04-26 23:12 ` Pavel Machek 0 siblings, 1 reply; 135+ messages in thread From: David Lang @ 2007-04-26 22:24 UTC (permalink / raw) To: Pavel Machek; +Cc: Linus Torvalds, Nigel Cunningham, Pekka Enberg, LKML On Fri, 27 Apr 2007, Pavel Machek wrote: > This is basically the loop above, made complex by the fact that we do > not want to have separate partition for snapshot; we just want to > reuse free space in swap partition. with the size of drives today is it really that bad to require a separate partition for this? I also don't like the idea of storing this in the swap partition for a couple of reasons. 1. on many modern linux systems the swap partition is not large enough. for example, on my boxes with 16G of ram I only allocate 2G of swap space 2. it's too easy for other things to stomp on your swap partition. for example: booting from a live CD that finds and uses swap partitions if you are needing space for your freeze, allocate it in an unambiguous way, not by reusing an existing partition. David Lang ^ permalink raw reply [flat|nested] 135+ messages in thread
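David's swap-size point is simple arithmetic. Assume, pessimistically, that an uncompressed image is as large as RAM. A 16 GiB machine then needs 4194304 pages of 4 KiB, or 16 GiB of free swap, eight times the 2 GiB he allocates. Below is a minimal sketch of such a check. `image_fits()` and `free_swap_bytes()` are hypothetical helpers; the latter uses the Linux `sysinfo(2)` call.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/sysinfo.h>

/* Does an image of `pages` pages of `page_size` bytes fit in free swap? */
static bool image_fits(unsigned long long pages, unsigned long page_size,
                       unsigned long long free_bytes)
{
    return pages * page_size <= free_bytes;
}

/* Free swap on the running machine in bytes, via sysinfo(2); 0 on error. */
static unsigned long long free_swap_bytes(void)
{
    struct sysinfo si;
    if (sysinfo(&si) != 0)
        return 0;
    return (unsigned long long)si.freeswap * si.mem_unit;
}
```

The multiplication is done in `unsigned long long` so a 16 GiB product does not overflow on 32-bit systems.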
* Re: Back to the future. 2007-04-26 22:24 ` David Lang @ 2007-04-26 23:12 ` Pavel Machek 2007-04-26 22:49 ` David Lang 0 siblings, 1 reply; 135+ messages in thread From: Pavel Machek @ 2007-04-26 23:12 UTC (permalink / raw) To: David Lang; +Cc: Linus Torvalds, Nigel Cunningham, Pekka Enberg, LKML Hi! > >This is basically the loop above, made complex by the fact that we do > >not want to have a separate partition for the snapshot; we just want to > >reuse free space in the swap partition. > > with the size of drives today is it really that bad to require a separate > partition for this? Yes. You want uswsusp to work in situations where swsusp worked. > I also don't like the idea of storing this in the swap partition for a > couple of reasons. > > 1. on many modern Linux systems the swap partition is not large enough. > > for example, on my boxes with 16G of RAM I only allocate 2G of swap > space. WTF? So allocate a larger swap partition. You just told me disks are big enough. > 2. it's too easy for other things to stomp on your swap partition. > > for example: booting from a live CD that finds and uses swap > partitions. That's a feature. If you are booting from a live CD, you _want_ to erase any hibernation image. > if you need space for your freeze, allocate it in an unambiguous way, > not by reusing an existing partition. Of course you have that option. Writing the image is done in userspace, so you are free to write it to a raw partition (and the first versions indeed did that). Pavel -- (english) http://www.livejournal.com/~pavelmachek (in Czech, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-26 23:12 ` Pavel Machek @ 2007-04-26 22:49 ` David Lang 2007-04-26 23:27 ` Pavel Machek 2007-04-27 0:23 ` Olivier Galibert 0 siblings, 2 replies; 135+ messages in thread From: David Lang @ 2007-04-26 22:49 UTC (permalink / raw) To: Pavel Machek; +Cc: Linus Torvalds, Nigel Cunningham, Pekka Enberg, LKML On Fri, 27 Apr 2007, Pavel Machek wrote: > Hi! > >>> This is basically the loop above, made complex by the fact that we do >>> not want to have a separate partition for the snapshot; we just want to >>> reuse free space in the swap partition. >> >> with the size of drives today is it really that bad to require a separate >> partition for this? > > Yes. You want uswsusp to work in situations where swsusp worked. > >> I also don't like the idea of storing this in the swap partition for a >> couple of reasons. >> >> 1. on many modern Linux systems the swap partition is not large enough. >> >> for example, on my boxes with 16G of RAM I only allocate 2G of swap >> space. > > WTF? So allocate a larger swap partition. You just told me disks are big > enough. swap partitions are limited to 2G (or at least they were a couple of months ago when I last checked). I also don't want to run the risk of having a box try to _use_ 16G worth of swap. I'd rather have the box hit OOM first. >> 2. it's too easy for other things to stomp on your swap partition. >> >> for example: booting from a live CD that finds and uses swap >> partitions. > > That's a feature. If you are booting from a live CD, you _want_ to erase > any hibernation image. why? it's been stated that suspending to disk and booting another OS (including Windows) is a valid and common usage. saying that if you boot another OS you trash your suspended image doesn't sound reasonable. David Lang ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-26 22:49 ` David Lang @ 2007-04-26 23:27 ` Pavel Machek 2007-04-26 22:56 ` David Lang 0 siblings, 1 reply; 135+ messages in thread From: Pavel Machek @ 2007-04-26 23:27 UTC (permalink / raw) To: David Lang; +Cc: Linus Torvalds, Nigel Cunningham, Pekka Enberg, LKML Hi! > >That's a feature. If you are booting from a live CD, you _want_ to erase > >any hibernation image. > > why? > > it's been stated that suspending to disk and booting another OS (including > Windows) is a valid and common usage. saying that if you boot another OS > you trash your suspended image doesn't sound reasonable. If you hibernate your machine, boot from a live CD, and change anything on any filesystem, you are pretty likely to lose that filesystem. Doing that with Windows is okay, as Windows does not usually write to ext3 partitions. Pavel -- (english) http://www.livejournal.com/~pavelmachek (in Czech, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-26 23:27 ` Pavel Machek @ 2007-04-26 22:56 ` David Lang 0 siblings, 0 replies; 135+ messages in thread From: David Lang @ 2007-04-26 22:56 UTC (permalink / raw) To: Pavel Machek; +Cc: Linus Torvalds, Nigel Cunningham, Pekka Enberg, LKML On Fri, 27 Apr 2007, Pavel Machek wrote: > Hi! > >>> That's a feature. If you are booting from a live CD, you _want_ to erase >>> any hibernation image. >> >> why? >> >> it's been stated that suspending to disk and booting another OS (including >> Windows) is a valid and common usage. saying that if you boot another OS >> you trash your suspended image doesn't sound reasonable. > > If you hibernate your machine, boot from a live CD, and change anything > on any filesystem, you are pretty likely to lose that filesystem. booting from a live CD doesn't mean that you are going to mount the filesystem, let alone change it. but swap is not supposed to be this sensitive. David Lang > Doing that with Windows is okay, as Windows does not usually write to > ext3 partitions. > Pavel > ^ permalink raw reply [flat|nested] 135+ messages in thread
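One way a live CD (or any other tool) could avoid stomping on a hibernation image, which is David's worry here, is to inspect the swap signature before running swapon. A normal Linux swap area carries `SWAPSPACE2` (or the old `SWAP-SPACE`) in the last 10 bytes of its first page, and swsusp overwrites that spot with its own marker while an image is stored there. A minimal sketch, assuming 4096-byte pages (the real offset depends on the architecture's page size); `check_swap_signature` and its three-way result are hypothetical names. The sketch deliberately treats any unrecognized signature as "leave alone" instead of trying to list every hibernation format:

```c
#include <stdio.h>
#include <string.h>

/* Assumption: 4096-byte pages; the signature is the last 10 bytes of page 0. */
#define SWAP_PAGE_SIZE 4096
#define SIG_LEN 10

enum swap_state { SWAP_CLEAN, SWAP_SUSPICIOUS, SWAP_UNREADABLE };

static enum swap_state check_swap_signature(const char *path)
{
    char sig[SIG_LEN];
    FILE *f = fopen(path, "rb");

    if (!f)
        return SWAP_UNREADABLE;
    /* Seek to the signature at the end of the first page and read it. */
    if (fseek(f, SWAP_PAGE_SIZE - SIG_LEN, SEEK_SET) != 0 ||
        fread(sig, 1, SIG_LEN, f) != SIG_LEN) {
        fclose(f);
        return SWAP_UNREADABLE;
    }
    fclose(f);

    /* Plain swap (current or old format): safe to swapon. */
    if (memcmp(sig, "SWAPSPACE2", SIG_LEN) == 0 ||
        memcmp(sig, "SWAP-SPACE", SIG_LEN) == 0)
        return SWAP_CLEAN;

    /* Anything else may be a hibernation image: do not use it as swap. */
    return SWAP_SUSPICIOUS;
}
```

A live CD's boot scripts could run such a check on each candidate swap device and skip swapon for anything not reported as SWAP_CLEAN, which keeps the hibernated system's swap read-only without having to mount or understand anything else on the disk.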
* Re: Back to the future. 2007-04-26 22:49 ` David Lang 2007-04-26 23:27 ` Pavel Machek @ 2007-04-27 0:23 ` Olivier Galibert 1 sibling, 0 replies; 135+ messages in thread From: Olivier Galibert @ 2007-04-27 0:23 UTC (permalink / raw) To: David Lang Cc: Pavel Machek, Linus Torvalds, Nigel Cunningham, Pekka Enberg, LKML On Thu, Apr 26, 2007 at 03:49:51PM -0700, David Lang wrote: > swap partitions are limited to 2G (or at least they were a couple of months > ago when I last checked). I also don't want to run the risk of having a box > try to _use_ 16G worth of swap. I'd rather have the box hit OOM first. They aren't limited anymore, I have a number of machines with 20G swap for experiments. OG. ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-26 16:56 ` Linus Torvalds ` (5 preceding siblings ...) 2007-04-26 22:42 ` Pavel Machek @ 2007-04-27 12:49 ` Pavel Machek 2007-04-27 21:26 ` Rafael J. Wysocki 6 siblings, 1 reply; 135+ messages in thread From: Pavel Machek @ 2007-04-27 12:49 UTC (permalink / raw) To: Linus Torvalds; +Cc: Nigel Cunningham, Pekka Enberg, LKML Hi! > > * Doing things in the right order? (Prepare the image, then do the > > atomic copy, then save). > > I'd actually like to discuss this a bit.. > > I'm obviously not a huge fan of the whole user/kernel level split and > interfaces, but I actually do think that there is *one* split that makes > sense: > > - generate the (whole) snapshot image entirely inside the kernel > > - do nothing else (ie no IO at all), and just export it as a single image > to user space (literally just mapping the pages into user space). > *one* interface. None of the "pretty UI update" crap. Just a single > system call: > > void *snapshot_system(u32 *size); > > which will map in the snapshot, return the mapped address and the size > (and if you want to support snapshots > 4GB, be my guest, but I suspect > you're actually *better* off just admitting that if you cannot shrink > the snapshot to less than 32 bits, it's not worth doing) I think this is very similar to the current uswsusp design, except that we are using read on /dev/snapshot to read the snapshot (not memory mapping) and that we freeze the system (because I do not think killall -STOP is enough). Can you confirm that it is indeed a similar design, or tell me why I'm wrong? You had some pretty strong words for uswsusp before, so I'd like to understand your position here. ("Ouch, I do not know, I am out of time" is still a better reply than silence.) Pavel -- (english) http://www.livejournal.com/~pavelmachek (in Czech, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-27 12:49 ` Pavel Machek @ 2007-04-27 21:26 ` Rafael J. Wysocki 2007-04-27 22:12 ` David Lang 0 siblings, 1 reply; 135+ messages in thread From: Rafael J. Wysocki @ 2007-04-27 21:26 UTC (permalink / raw) To: Pavel Machek; +Cc: Linus Torvalds, Nigel Cunningham, Pekka Enberg, LKML On Friday, 27 April 2007 14:49, Pavel Machek wrote: > Hi! > > > > * Doing things in the right order? (Prepare the image, then do the > > > atomic copy, then save). > > > > I'd actually like to discuss this a bit.. > > > > I'm obviously not a huge fan of the whole user/kernel level split and > > interfaces, but I actually do think that there is *one* split that makes > > sense: > > > > - generate the (whole) snapshot image entirely inside the kernel > > > > - do nothing else (ie no IO at all), and just export it as a single image > > to user space (literally just mapping the pages into user space). > > *one* interface. None of the "pretty UI update" crap. Just a single > > system call: > > > > void *snapshot_system(u32 *size); > > > > which will map in the snapshot, return the mapped address and the size > > (and if you want to support snapshots > 4GB, be my guest, but I suspect > > you're actually *better* off just admitting that if you cannot shrink > > the snapshot to less than 32 bits, it's not worth doing) > > I think this is very similar to the current uswsusp design, except that we > are using read on /dev/snapshot to read the snapshot (not memory > mapping) and that we freeze the system Yes, it seems so. > (because I do not think killall -STOP is enough). Agreed. Greetings, Rafael ^ permalink raw reply [flat|nested] 135+ messages in thread
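With the read()-based export that Pavel and Rafael describe, the userspace side that drains the image is essentially a copy loop. Linus's mmap-style snapshot_system() would remove the read() half of this loop but not the write() half. A minimal sketch, assuming the caller already holds an open descriptor for the snapshot source and one for the destination (a raw partition, a swap writer, or a compressor pipe), and assuming 4096-byte pages. `copy_image` is a hypothetical name, and no uswsusp ioctls are used or implied:

```c
#include <errno.h>
#include <unistd.h>

#define CHUNK 4096  /* assumed page size */

/* Copy everything readable from in_fd to out_fd in page-sized chunks.
 * Returns the total number of bytes copied, or -1 on failure (see errno). */
static long long copy_image(int in_fd, int out_fd)
{
    char buf[CHUNK];
    long long total = 0;

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return total;               /* end of image */
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted by a signal: retry */
            return -1;
        }
        /* write() may accept fewer bytes than asked for; finish the chunk. */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0 && errno == EINTR)
                continue;               /* nothing written: retry */
            if (w <= 0) {
                if (w == 0)
                    errno = EIO;        /* no progress: treat as an I/O error */
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```

Pointing out_fd at a pipe into a compressor gives roughly the `snapshot_kernel | gzip -9 > ...` shape Linus sketched, with the difference Pavel notes: the process doing the copying must be one that the freezer has left running.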
* Re: Back to the future. 2007-04-27 21:26 ` Rafael J. Wysocki @ 2007-04-27 22:12 ` David Lang 0 siblings, 0 replies; 135+ messages in thread From: David Lang @ 2007-04-27 22:12 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Pavel Machek, Linus Torvalds, Nigel Cunningham, Pekka Enberg, LKML On Fri, 27 Apr 2007, Rafael J. Wysocki wrote: > On Friday, 27 April 2007 14:49, Pavel Machek wrote: >> >> I think this is very similar to the current uswsusp design, except that we >> are using read on /dev/snapshot to read the snapshot (not memory >> mapping) and that we freeze the system > > Yes, it seems so. > >> (because I do not think killall -STOP is enough). > remember, this is being done inside the kernel. the kernel can do things like saving off the scheduler queue to prevent any userspace from running during the snapshot. it could then move selected PIDs over to a new queue to selectively 'unfreeze' whatever you need (like the X processes, for example) and then proceed normally (allowing processes to be spawned, forked, etc. without activating the rest of userspace, because the rest just won't be available to be scheduled). userspace can tell the kernel the list of PIDs to unfreeze, so the kernel doesn't need to try and guess. David Lang ^ permalink raw reply [flat|nested] 135+ messages in thread
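The policy David proposes is "freeze everything, then let userspace name the PIDs that may keep running, so the kernel never has to guess." A toy userspace model of just that selection step, assuming a flat array of tasks; `struct task` and `freeze_except` are hypothetical names, and this is not kernel scheduler code. It also leaves out the part of the proposal where children forked by thawed tasks stay runnable:

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical task record: just a PID and a frozen flag. */
struct task {
    pid_t pid;
    bool frozen;
};

/* Linear search of the userspace-supplied allowlist. */
static bool in_list(pid_t pid, const pid_t *list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (list[i] == pid)
            return true;
    return false;
}

/* Freeze every task except those whose PID appears in allow[].
 * Returns the number of tasks left runnable. Allowlist entries that
 * match no task are simply ignored. */
static size_t freeze_except(struct task *tasks, size_t ntasks,
                            const pid_t *allow, size_t nallow)
{
    size_t runnable = 0;

    for (size_t i = 0; i < ntasks; i++) {
        tasks[i].frozen = !in_list(tasks[i].pid, allow, nallow);
        if (!tasks[i].frozen)
            runnable++;
    }
    return runnable;
}
```

The point of the model is the division of labour: the kernel applies a list it was given, and choosing the list (for example, the X server's PIDs) stays a userspace decision.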
* Re: Back to the future. 2007-04-26 6:04 Nigel Cunningham 2007-04-26 7:28 ` Pekka Enberg @ 2007-04-26 8:38 ` Jan Engelhardt 2007-04-26 9:33 ` Nigel Cunningham 2007-04-28 0:28 ` Bojan Smojver 2 siblings, 1 reply; 135+ messages in thread From: Jan Engelhardt @ 2007-04-26 8:38 UTC (permalink / raw) To: Nigel Cunningham; +Cc: Linus Torvalds, LKML On Apr 26 2007 16:04, Nigel Cunningham wrote: > >Hi again. > >So - trying to get back to the original discussion - what (if anything) >do you see as the way ahead? > >The options I can think of are (starting with things I can do): > >1) [...] >2) [...] >3) [...] >4) [...] >5) [...] >6) [...] >7) [...] Perhaps do it the EVMS way? Do as much in userspace as possible, and try to have a simple kernel API at the same time. Perhaps (3) would be it, but ask Redhat _first_ before quitting anything :) Jan -- ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-26 8:38 ` Jan Engelhardt @ 2007-04-26 9:33 ` Nigel Cunningham 0 siblings, 0 replies; 135+ messages in thread From: Nigel Cunningham @ 2007-04-26 9:33 UTC (permalink / raw) To: Jan Engelhardt; +Cc: Linus Torvalds, LKML [-- Attachment #1: Type: text/plain, Size: 791 bytes --] Hi. On Thu, 2007-04-26 at 10:38 +0200, Jan Engelhardt wrote: > On Apr 26 2007 16:04, Nigel Cunningham wrote: > > > >Hi again. > > > >So - trying to get back to the original discussion - what (if anything) > >do you see as the way ahead? > > > >The options I can think of are (starting with things I can do): > > > >1) [...] > >2) [...] > >3) [...] > >4) [...] > >5) [...] > >6) [...] > >7) [...] > > Perhaps do it the EVMS way? Do as much in userspace as possible, and > try to have a simple kernel API at the same time. > Perhaps (3) would be it, but ask Redhat _first_ before quitting anything :) :) Well, the EVMS way is swsusp. Personally, I agree with Linus: putting suspend-to-disk code in userspace is just a broken idea. Regards, Nigel [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 135+ messages in thread
* Re: Back to the future. 2007-04-26 6:04 Nigel Cunningham 2007-04-26 7:28 ` Pekka Enberg 2007-04-26 8:38 ` Jan Engelhardt @ 2007-04-28 0:28 ` Bojan Smojver 2 siblings, 0 replies; 135+ messages in thread From: Bojan Smojver @ 2007-04-28 0:28 UTC (permalink / raw) To: linux-kernel Nigel Cunningham <nigel <at> nigel.suspend2.net> writes: > 4) uswsusp and swsusp get dropped and Suspend2 goes into mainline. After reading most of this thread, it seems that Linus is of the view that all three of these suck in one way or another. Suspend2 has the most features and is the fastest of the lot. It can behave like swsusp from the user's point of view (i.e. echo disk > /sys/power/state), so the migration should be seamless for most distros. It isn't complicated to set up. It's been proven in the field. It looks pretty. So, while we're waiting for the next STD technology, why not have the best and develop from there? -- Bojan ^ permalink raw reply [flat|nested] 135+ messages in thread
end of thread, other threads:[~2007-06-01 19:02 UTC | newest] Thread overview: 135+ messages [not found] <8e5l8-7SD-21@gated-at.bofh.it> [not found] ` <8e6Ka-1uR-3@gated-at.bofh.it> [not found] ` <8e6TS-1Id-11@gated-at.bofh.it> [not found] ` <8efu9-6mF-1@gated-at.bofh.it> [not found] ` <8ekWV-6FF-33@gated-at.bofh.it> [not found] ` <8el6y-6Sj-5@gated-at.bofh.it> [not found] ` <8elpT-7wY-21@gated-at.bofh.it> 2007-04-28 11:04 ` Back to the future Bodo Eggert 2007-04-26 6:04 Nigel Cunningham 2007-04-26 7:28 ` Pekka Enberg [not found] ` <1177573348.5025.224.camel@nigel.suspend2.net> 2007-04-26 7:42 ` Nigel Cunningham 2007-04-26 8:17 ` Pekka Enberg 2007-04-26 9:28 ` Nigel Cunningham 2007-04-26 17:29 ` Luca Tettamanti 2007-04-26 16:56 ` Linus Torvalds 2007-04-26 17:03 ` Xavier Bestel 2007-04-26 17:34 ` Linus Torvalds 2007-04-26 20:08 ` Nigel Cunningham 2007-04-26 20:45 ` Linus Torvalds 2007-04-26 20:50 ` Nigel Cunningham 2007-04-27 0:10 ` Olivier Galibert 2007-04-27 10:21 ` Daniel Pittman 2007-04-27 23:19 ` Nigel Cunningham 2007-04-26 21:38 ` Theodore Tso 2007-04-27 10:10 ` Christoph Hellwig 2007-04-26 22:08 ` Rafael J.
Wysocki 2007-04-26 22:20 ` Nigel Cunningham 2007-04-26 23:15 ` Linus Torvalds 2007-04-27 7:51 ` Pekka Enberg 2007-04-26 17:07 ` Linus Torvalds 2007-04-26 18:22 ` Chase Venters 2007-04-26 18:50 ` David Lang 2007-04-26 19:56 ` Nigel Cunningham 2007-04-27 4:52 ` Pekka J Enberg 2007-04-27 6:08 ` Nigel Cunningham 2007-04-27 6:18 ` Pekka J Enberg 2007-04-27 6:29 ` Pekka J Enberg 2007-04-27 6:34 ` Nigel Cunningham 2007-04-27 6:50 ` Pekka J Enberg 2007-04-27 7:03 ` Nigel Cunningham 2007-04-27 7:24 ` Pekka J Enberg 2007-04-27 9:50 ` Oliver Neukum 2007-04-27 10:12 ` Pekka J Enberg 2007-04-27 19:07 ` Oliver Neukum 2007-04-28 9:22 ` Pekka Enberg 2007-04-28 13:37 ` Oliver Neukum 2007-05-03 12:06 ` Pavel Machek 2007-05-04 21:52 ` Indan Zupancic 2007-05-05 9:16 ` Pavel Machek 2007-05-05 12:02 ` Indan Zupancic 2007-04-28 10:35 ` Rafael J. Wysocki 2007-04-28 18:43 ` David Lang 2007-04-28 19:37 ` Rafael J. Wysocki 2007-04-27 21:24 ` Rafael J. Wysocki 2007-04-27 21:44 ` Linus Torvalds 2007-04-27 22:04 ` Rafael J. Wysocki 2007-04-27 22:08 ` Linus Torvalds 2007-04-27 22:41 ` Rafael J. Wysocki 2007-04-27 22:26 ` David Lang 2007-04-27 23:21 ` Rafael J. Wysocki 2007-04-27 23:01 ` David Lang 2007-04-28 0:02 ` Rafael J. Wysocki 2007-04-27 23:17 ` Linus Torvalds 2007-04-27 23:45 ` Rafael J. Wysocki 2007-04-27 23:57 ` Nigel Cunningham 2007-04-27 23:50 ` David Lang 2007-04-28 0:40 ` Linus Torvalds 2007-04-28 6:58 ` Oliver Neukum 2007-04-28 9:16 ` Pekka J Enberg 2007-04-28 18:28 ` David Lang 2007-05-03 17:18 ` Pavel Machek 2007-05-07 2:13 ` David Lang 2007-05-07 3:33 ` Kyle Moffett 2007-05-07 12:48 ` Pavel Machek 2007-05-07 12:52 ` Oliver Neukum 2007-05-07 14:37 ` david 2007-05-07 19:51 ` Pavel Machek 2007-05-07 19:55 ` david 2007-05-07 20:38 ` Pavel Machek 2007-05-08 17:36 ` Disconnect 2007-04-27 23:59 ` Linus Torvalds 2007-04-28 0:18 ` Linus Torvalds 2007-05-05 11:42 ` Pavel Machek 2007-04-28 0:50 ` Paul Mackerras 2007-04-28 1:00 ` Rafael J. 
Wysocki 2007-04-28 1:12 ` Linus Torvalds 2007-04-28 0:54 ` David Lang 2007-04-28 1:44 ` Rafael J. Wysocki 2007-04-28 2:51 ` Daniel Hazelton 2007-04-28 8:50 ` Pavel Machek 2007-04-28 9:24 ` Rafael J. Wysocki 2007-04-28 16:28 ` Linus Torvalds 2007-04-28 17:50 ` Rafael J. Wysocki 2007-04-28 21:25 ` Linus Torvalds 2007-04-28 23:03 ` Rafael J. Wysocki 2007-04-28 23:45 ` Linus Torvalds 2007-04-29 0:01 ` Nigel Cunningham 2007-04-29 5:01 ` Bojan Smojver 2007-04-29 3:43 ` Kyle Moffett 2007-04-29 8:57 ` Rafael J. Wysocki 2007-04-29 8:59 ` Pavel Machek 2007-04-29 9:32 ` Rafael J. Wysocki 2007-04-29 8:23 ` Pavel Machek 2007-04-29 9:22 ` Rafael J. Wysocki 2007-04-28 18:32 ` David Lang 2007-04-28 19:14 ` Rafael J. Wysocki 2007-04-28 18:44 ` David Lang 2007-05-03 15:25 ` Pavel Machek 2007-04-27 22:07 ` Nigel Cunningham 2007-04-28 1:03 ` Kyle Moffett 2007-04-28 1:15 ` Rafael J. Wysocki 2007-04-28 0:51 ` David Lang 2007-04-28 1:25 ` Kyle Moffett 2007-05-03 15:10 ` Pavel Machek 2007-05-03 16:53 ` Kyle Moffett 2007-05-04 7:52 ` David Greaves 2007-05-04 13:27 ` Kyle Moffett 2007-04-28 0:18 ` Jeremy Fitzhardinge 2007-04-28 1:00 ` Matthew Garrett 2007-04-28 1:05 ` Jeremy Fitzhardinge 2007-05-03 15:14 ` Pavel Machek 2007-06-01 19:00 ` Eric W. Biederman 2007-04-28 1:08 ` Rafael J. Wysocki 2007-04-27 20:44 ` Rafael J. Wysocki 2007-04-28 19:09 ` Bill Davidsen 2007-04-26 22:40 ` Pavel Machek 2007-04-27 5:41 ` Pekka Enberg 2007-04-27 14:55 ` Pavel Machek 2007-04-27 21:39 ` Nigel Cunningham 2007-04-26 22:42 ` Pavel Machek 2007-04-26 22:24 ` David Lang 2007-04-26 23:12 ` Pavel Machek 2007-04-26 22:49 ` David Lang 2007-04-26 23:27 ` Pavel Machek 2007-04-26 22:56 ` David Lang 2007-04-27 0:23 ` Olivier Galibert 2007-04-27 12:49 ` Pavel Machek 2007-04-27 21:26 ` Rafael J. Wysocki 2007-04-27 22:12 ` David Lang 2007-04-26 8:38 ` Jan Engelhardt 2007-04-26 9:33 ` Nigel Cunningham 2007-04-28 0:28 ` Bojan Smojver