* [PATCH 00/30] C/R OpenVZ/Virtuozzo style @ 2009-04-10 2:32 Alexey Dobriyan 2009-04-10 2:44 ` Alexey Dobriyan ` (4 more replies) 0 siblings, 5 replies; 37+ messages in thread From: Alexey Dobriyan @ 2009-04-10 2:32 UTC (permalink / raw) To: akpm, containers Cc: xemul, serue, dave, mingo, orenl, hch, torvalds, linux-kernel This is to show how we see C/R and to provoke discussion on number of important issues (mounts, ...). This is small part of long-awaited to be cleanuped code. It's able to restore busyloop on i386 and x86_64 and restore i386 busyloop on x86_64. It wasn't tested much more than that. I'm currently starting formal testsuite, otherwise it's whack-a-mole game and formal TODO list (a huge one). ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style 2009-04-10 2:32 [PATCH 00/30] C/R OpenVZ/Virtuozzo style Alexey Dobriyan @ 2009-04-10 2:44 ` Alexey Dobriyan 2009-04-10 5:07 ` Dave Hansen ` (3 subsequent siblings) 4 siblings, 0 replies; 37+ messages in thread From: Alexey Dobriyan @ 2009-04-10 2:44 UTC (permalink / raw) To: akpm, containers Cc: xemul, serue, dave, mingo, orenl, hch, torvalds, linux-kernel On Fri, Apr 10, 2009 at 06:32:07AM +0400, Alexey Dobriyan wrote: > This is to show how we see C/R and to provoke discussion on number of > important issues (mounts, ...). > > This is small part of long-awaited to be cleanuped code. > > It's able to restore busyloop on i386 and x86_64 and restore i386 > busyloop on x86_64. It wasn't tested much more than that. > > I'm currently starting formal testsuite, otherwise it's whack-a-mole > game and formal TODO list (a huge one). Andrew, patches 1-9 stand by themselves and can be applied at any moment (except 4 and 7 where I made functions needlessly global). ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style 2009-04-10 2:32 [PATCH 00/30] C/R OpenVZ/Virtuozzo style Alexey Dobriyan 2009-04-10 2:44 ` Alexey Dobriyan @ 2009-04-10 5:07 ` Dave Hansen 2009-04-13 9:14 ` Alexey Dobriyan 2009-04-10 8:28 ` [PATCH 00/30] C/R OpenVZ/Virtuozzo style Ingo Molnar ` (2 subsequent siblings) 4 siblings, 1 reply; 37+ messages in thread From: Dave Hansen @ 2009-04-10 5:07 UTC (permalink / raw) To: Alexey Dobriyan Cc: akpm, containers, xemul, serue, mingo, orenl, hch, torvalds, linux-kernel Hey Alexey, I'm curious how you see these fitting in with the work that we've been doing with Oren. Do you mean to just start a discussion or are you really proposing these as an alternative to what Oren has been posting? -- Dave ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style 2009-04-10 5:07 ` Dave Hansen @ 2009-04-13 9:14 ` Alexey Dobriyan 2009-04-13 11:16 ` Dave Hansen ` (2 more replies) 0 siblings, 3 replies; 37+ messages in thread From: Alexey Dobriyan @ 2009-04-13 9:14 UTC (permalink / raw) To: Dave Hansen Cc: akpm, containers, xemul, serue, mingo, orenl, hch, torvalds, linux-kernel On Thu, Apr 09, 2009 at 10:07:11PM -0700, Dave Hansen wrote: > I'm curious how you see these fitting in with the work that we've been > doing with Oren. Do you mean to just start a discussion or are you > really proposing these as an alternative to what Oren has been posting? Yes, this is posted as alternative. Some design decisions are seen as incorrect from here like: * not rejecting checkpoint with possible "leaks" from container * not having CAP_SYS_ADMIN on restart(2) * having small (TASK_COMM_LEN) and bigger (objref[1]) image format misdesigns. * doing fork(2)+restart(2) per restarted task and whole orchestration done from userspace/future init task. * not seeing bigger picture (note, this is not equivalent to supporting everything at once, nobody is asking for everything at once) wrt shared objects and format and code changes because of that (note again, image format will change, but it's easy to design high level structure which won't change) * checking of unsupported features done at wrong place and wrong time and runtime overhead because of that on CR=y kernels. There are also low-level things, but it's cumulative effect. [1] Do I inderstand correctly that cookie for shared object is an address on kernel stack? This is obviously unreliable, if yes :-) int objref; ... /* adding 'file' to the hash will keep a reference to it */ new = cr_obj_add_ptr(ctx, file, &objref, CR_OBJ_FILE, 0); ^^^^^^^ ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style 2009-04-13 9:14 ` Alexey Dobriyan @ 2009-04-13 11:16 ` Dave Hansen 2009-04-13 18:07 ` Dave Hansen 2009-04-14 4:26 ` Oren Laadan 2 siblings, 0 replies; 37+ messages in thread From: Dave Hansen @ 2009-04-13 11:16 UTC (permalink / raw) To: Alexey Dobriyan Cc: xemul, containers, linux-kernel, hch, akpm, torvalds, mingo On Mon, 2009-04-13 at 13:14 +0400, Alexey Dobriyan wrote: > [1] Do I inderstand correctly that cookie for shared object is an > address on kernel stack? This is obviously unreliable, if yes :-) > > int objref; > ... > /* adding 'file' to the hash will keep a reference to it */ > new = cr_obj_add_ptr(ctx, file, &objref, CR_OBJ_FILE, 0); No, that's just Oren's way to get two return variables. He needs 'new' to figure out whether to write out the full file or just an objref record. He also needs 'objref' itself in case of writing either. cr_obj_add_ptr() modifies objref. -- Dave ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style 2009-04-13 9:14 ` Alexey Dobriyan 2009-04-13 11:16 ` Dave Hansen @ 2009-04-13 18:07 ` Dave Hansen 2009-04-14 4:26 ` Oren Laadan 2 siblings, 0 replies; 37+ messages in thread From: Dave Hansen @ 2009-04-13 18:07 UTC (permalink / raw) To: Alexey Dobriyan Cc: akpm, containers, xemul, serue, mingo, orenl, hch, torvalds, linux-kernel On Mon, 2009-04-13 at 13:14 +0400, Alexey Dobriyan wrote: > On Thu, Apr 09, 2009 at 10:07:11PM -0700, Dave Hansen wrote: > > I'm curious how you see these fitting in with the work that we've been > > doing with Oren. Do you mean to just start a discussion or are you > > really proposing these as an alternative to what Oren has been posting? > > Yes, this is posted as alternative. I'm sure that you can understand that we've been working on Oren's patches for a bit and don't want to jump ship on a whim. :) Do you think we can change Oren's patch sufficiently to meet your needs here? Do you think we can change your patch sufficiently to meet Oren's needs? > Some design decisions are seen as incorrect from here like: > * not rejecting checkpoint with possible "leaks" from container Could you elaborate on this a bit? This sounds really important, but I'm having difficulty seeing how your patch addresses this or how Oren's doesn't. > * not having CAP_SYS_ADMIN on restart(2) > * having small (TASK_COMM_LEN) and bigger (objref[1]) image format > misdesigns. The format is certainly still in a huge amount of flux. I'd be really happy to see patches fixing these or clarifying them. I'm planning on going and looking at your patches in detail right now. Would you mind doing a more detailed review of Oren's so that we could use your expertise to close some of these gaps in the format? > * doing fork(2)+restart(2) per restarted task and whole orchestration > done from userspace/future init task. Yeah, we've certainly argued plenty about this one amongst ourselves. I'm personally open to changing this especially if you feel strongly about it. > * not seeing bigger picture (note, this is not equivalent to supporting > everything at once, nobody is asking for everything at once) wrt shared > objects and format and code changes because of that (note again, image > format will change, but it's easy to design high level structure which > won't change) > * checking of unsupported features done at wrong place and wrong time > and runtime overhead because of that on CR=y kernels. Yeah, I'm especially worried with respect to the shared objects right now. I'm pestering Oren to replace a big chunk of that stuff with other garbage that I've dreamed up. I'll go take a look at how you do this in your patches... Perhaps it will work even better. -- Dave ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style 2009-04-13 9:14 ` Alexey Dobriyan 2009-04-13 11:16 ` Dave Hansen 2009-04-13 18:07 ` Dave Hansen @ 2009-04-14 4:26 ` Oren Laadan 2009-04-14 14:58 ` Alexey Dobriyan 2 siblings, 1 reply; 37+ messages in thread From: Oren Laadan @ 2009-04-14 4:26 UTC (permalink / raw) To: Alexey Dobriyan Cc: Dave Hansen, akpm, containers, xemul, serue, mingo, hch, torvalds, linux-kernel Alexey Dobriyan wrote: > On Thu, Apr 09, 2009 at 10:07:11PM -0700, Dave Hansen wrote: >> I'm curious how you see these fitting in with the work that we've been >> doing with Oren. Do you mean to just start a discussion or are you >> really proposing these as an alternative to what Oren has been posting? > > Yes, this is posted as alternative. > > Some design decisions are seen as incorrect from here like: A definition of "design" would help; I find most of your comments below either vague, cryptic, or technical nits... > * not rejecting checkpoint with possible "leaks" from container ...like this, for example. Anything in the current design makes it impossible ? Anything prohibiting your from adding this feature to the current patch-set ? > * not having CAP_SYS_ADMIN on restart(2) Surely you have read already on the containers mailing list that for the *time being* we attempt to get as far as possible without requiring root privileges, to identify security hot-spots. And surely you have read there the observation that for the general case root privileges will probably be inevitable. And surely you don't seriously think that adding this check changes the "design"... > * having small (TASK_COMM_LEN) and bigger (objref[1]) image format > misdesigns. Eh ? > * doing fork(2)+restart(2) per restarted task and whole orchestration > done from userspace/future init task. Why is it "incorrect" ? What makes it "better" to do it in the kernel ? Only because you say so is not convincing. (also see my other post in this matter). > * not seeing bigger picture (note, this is not equivalent to supporting > everything at once, nobody is asking for everything at once) wrt shared > objects and format and code changes because of that (note again, image > format will change, but it's easy to design high level structure which > won't change) Why don't you describe the bigger picture so that the rest of can finally see it, too ?! (what a waste to have spent all this effort in vain...) > * checking of unsupported features done at wrong place and wrong time > and runtime overhead because of that on CR=y kernels. Eh ? Did you follow the code recently ? > > There are also low-level things, but it's cumulative effect. > > [1] Do I inderstand correctly that cookie for shared object is an > address on kernel stack? This is obviously unreliable, if yes :-) Ah... I see... you didn't look at it that hard, not even read the documentation with the code. > > int objref; > ... > /* adding 'file' to the hash will keep a reference to it */ > new = cr_obj_add_ptr(ctx, file, &objref, CR_OBJ_FILE, 0); > ^^^^^^^ That said, there are more similarities than differences between your suggested template and the current patchset. With your expertise you can contribute tremendously if you decide to work together. Oren. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style 2009-04-14 4:26 ` Oren Laadan @ 2009-04-14 14:58 ` Alexey Dobriyan 2009-04-14 18:08 ` Oren Laadan 0 siblings, 1 reply; 37+ messages in thread From: Alexey Dobriyan @ 2009-04-14 14:58 UTC (permalink / raw) To: Oren Laadan Cc: Dave Hansen, akpm, containers, xemul, serue, mingo, hch, torvalds, linux-kernel On Tue, Apr 14, 2009 at 12:26:50AM -0400, Oren Laadan wrote: > Alexey Dobriyan wrote: > > On Thu, Apr 09, 2009 at 10:07:11PM -0700, Dave Hansen wrote: > >> I'm curious how you see these fitting in with the work that we've been > >> doing with Oren. Do you mean to just start a discussion or are you > >> really proposing these as an alternative to what Oren has been posting? > > > > Yes, this is posted as alternative. > > > > Some design decisions are seen as incorrect from here like: > > A definition of "design" would help; I find most of your comments > below either vague, cryptic, or technical nits... > > > * not rejecting checkpoint with possible "leaks" from container > > ...like this, for example. Like checkpointing one process out of many living together. If you allow this you consequently drop checks (e.g. refcount checks) for "somebody else is using structure to be checkpointed". If you drop these checks, you can't decipher legal sutiations like "process genuinely doesn't care about routing table of netns it lives in" from "illegal" situations like "process created shm segment but currently doesn't use it so not checkpointing ipcns will result in breakagenlater". You'll have to move responsibility to user, so user exactly knows what app relies on and on what. And probably add flags like CKPT_SHM, CKPT_NETNS_ROUTE ad infinitum. And user will screw it badly and complain: "after restart my app segfaulted". And user himself is screwed now: old running process is already killed (it was checkpointed on purpose) and new process in image segfaults every time it's restarted. All of this in out opinion results in doing C/R unreliably and badly. We are going to do it well and dig from the other side. If "leak" (any "leak") is detected, C/R is aborted because kernel doesn't know what app relies on and what app doesn't care about. This protected from situations and failure modes described above. This also protects to some extent from in-kernel changes where C/R code should have been updated but wasn't. Person doing incomplete change won't notice e.g refcount checks and won't try to "fix" them. But we'll notice it, e.g. when running testsuite (amen) and update C/R code accordingly. I'm talking about these checks so that everyone understands: for_each_cr_object(ctx, obj, CR_CTX_MM_STRUCT) { struct mm_struct *mm = obj->o_obj; unsigned int cnt = atomic_read(&mm->mm_users); if (obj->o_count != cnt) { printk("%s: mm_struct %p has external references %lu:%u\n", __func__, mm, obj->o_count, cnt); return -EINVAL; } } They are like moving detectors, small, invisible, something moved, you don't know what, but you don't care because you have to investigate anyway. In this scheme, if user wants to checkpoint just one process, he should start it alone in separate container. Right now, in posted patchset as cloned process with CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWUSER|CLONE_NEWPID|CLONE_NEWNET All of this by definition won't create situation "mom, where did my shm segment go?". > Anything in the current design makes it impossible ? > > Anything prohibiting your from adding this feature to the current > patch-set ? > > > * not having CAP_SYS_ADMIN on restart(2) > > Surely you have read already on the containers mailing list that > for the *time being* we attempt to get as far as possible without > requiring root privileges, to identify security hot-spots. More or less everything is hotspot. > And surely you have read there the observation that for the general > case root privileges will probably be inevitable. > > And surely you don't seriously think that adding this check changes > the "design"... This is not about design now, this is about if you allow restart(2) for everyone, you open kernel to very very high number of holes of all types. I wrote about it in reply to your code. Numbers of states described by any valid image as described by header files is very much higher than number of states that can validly end up in an image. Reducing set of states to only valid ones will require many checks. For example, just one structure: struct cr_hdr_cpu. Segments are directly loaded regardless of RPL. Debug registers are directly loaded regardless of to where breakpoint points to. If you don't understand scale of problem and insist on restart(2) not having CAP_SYS_ADMIN, we'll just drop from discussion and write email to bugtraq to warn sysadmins at least. restart(2) for everyone sounds like excellent feature and is excellent feature but simultaneusly is completely unrealistic for Unix-like kernel. > > * having small (TASK_COMM_LEN) and bigger (objref[1]) image format > > misdesigns. > > Eh ? TASK_COMM_LEN is 16 bytes and is not going to change, so you do __u8 cr_comm[16]; and forget. If TASK_COMM_LEN changes you'll just bump size of task_struct image and image version. Saving ->comm as variable-sized string is complication out of nowhere. As for objref, my apoligies, objection dropped, I understood how it can be useful. > > * doing fork(2)+restart(2) per restarted task and whole orchestration > > done from userspace/future init task. > > Why is it "incorrect" ? > What makes it "better" to do it in the kernel ? > Only because you say so is not convincing. > > (also see my other post in this matter). > > > * not seeing bigger picture (note, this is not equivalent to supporting > > everything at once, nobody is asking for everything at once) wrt shared > > objects and format and code changes because of that (note again, image > > format will change, but it's easy to design high level structure which > > won't change) > > Why don't you describe the bigger picture so that the rest of can > finally see it, too ?! > (what a waste to have spent all this effort in vain...) Sit and graph down all structures that you will eventually checkpoint and relations about them. Things like "struct cr_hdr_pids" won't even be on the radar. > > * checking of unsupported features done at wrong place and wrong time > > and runtime overhead because of that on CR=y kernels. > > Eh ? Did you follow the code recently ? Checks during dup_fd() were dropped? Good. > > There are also low-level things, but it's cumulative effect. > > > > [1] Do I inderstand correctly that cookie for shared object is an > > address on kernel stack? This is obviously unreliable, if yes :-) > > Ah... I see... you didn't look at it that hard, not even read the > documentation with the code. > > > > > int objref; > > ... > > /* adding 'file' to the hash will keep a reference to it */ > > new = cr_obj_add_ptr(ctx, file, &objref, CR_OBJ_FILE, 0); > > ^^^^^^^ > > That said, there are more similarities than differences between your > suggested template and the current patchset. With your expertise you > can contribute tremendously if you decide to work together. OK, totally misread code. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style 2009-04-14 14:58 ` Alexey Dobriyan @ 2009-04-14 18:08 ` Oren Laadan 2009-04-14 18:34 ` Alexey Dobriyan 2009-04-14 20:49 ` Alexey Dobriyan 0 siblings, 2 replies; 37+ messages in thread From: Oren Laadan @ 2009-04-14 18:08 UTC (permalink / raw) To: Alexey Dobriyan Cc: Dave Hansen, akpm, containers, xemul, serue, mingo, hch, torvalds, linux-kernel Alexey Dobriyan wrote: > On Tue, Apr 14, 2009 at 12:26:50AM -0400, Oren Laadan wrote: >> Alexey Dobriyan wrote: >>> On Thu, Apr 09, 2009 at 10:07:11PM -0700, Dave Hansen wrote: >>>> I'm curious how you see these fitting in with the work that we've been >>>> doing with Oren. Do you mean to just start a discussion or are you >>>> really proposing these as an alternative to what Oren has been posting? >>> Yes, this is posted as alternative. >>> >>> Some design decisions are seen as incorrect from here like: >> A definition of "design" would help; I find most of your comments >> below either vague, cryptic, or technical nits... >> >>> * not rejecting checkpoint with possible "leaks" from container >> ...like this, for example. > > Like checkpointing one process out of many living together. See the thread on creating tasks in userspace vs. kernel space: the argument here is that is an interesting enough use case for a checkpoint of not-an-entire-container. Of course it will require more logic to it, so the user can choose what she cares or does not care about, and the kernel could alert the user about it. The point is, that it is, IMHO, a desirable capability. > > If you allow this you consequently drop checks (e.g. refcount checks) > for "somebody else is using structure to be checkpointed". > >From this point below, I totally agree with you that for the purpose of a whole-container-checkpoint this is certainly desirable. My point was that it can be easily added the existing patchset (not yours). Why not add it there ? > If you drop these checks, you can't decipher legal sutiations like > "process genuinely doesn't care about routing table of netns it lives in" > from "illegal" situations like "process created shm segment but currently > doesn't use it so not checkpointing ipcns will result in breakagenlater". > > You'll have to move responsibility to user, so user exactly knows what > app relies on and on what. And probably add flags like CKPT_SHM, > CKPT_NETNS_ROUTE ad infinitum. > > And user will screw it badly and complain: "after restart my app > segfaulted". And user himself is screwed now: old running process is > already killed (it was checkpointed on purpose) and new process in image > segfaults every time it's restarted. > > All of this in out opinion results in doing C/R unreliably and badly. > > We are going to do it well and dig from the other side. > > If "leak" (any "leak") is detected, C/R is aborted because kernel > doesn't know what app relies on and what app doesn't care about. > > This protected from situations and failure modes described above. > > This also protects to some extent from in-kernel changes where C/R code > should have been updated but wasn't. Person doing incomplete change won't > notice e.g refcount checks and won't try to "fix" them. But we'll notice it, > e.g. when running testsuite (amen) and update C/R code accordingly. > > I'm talking about these checks so that everyone understands: > > for_each_cr_object(ctx, obj, CR_CTX_MM_STRUCT) { > struct mm_struct *mm = obj->o_obj; > unsigned int cnt = atomic_read(&mm->mm_users); > > if (obj->o_count != cnt) { > printk("%s: mm_struct %p has external references %lu:%u\n", __func__, mm, obj->o_count, cnt); > return -EINVAL; > } > } > > They are like moving detectors, small, invisible, something moved, you don't > know what, but you don't care because you have to investigate anyway. > > In this scheme, if user wants to checkpoint just one process, he should > start it alone in separate container. Right now, in posted patchset > as cloned process with > CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWUSER|CLONE_NEWPID|CLONE_NEWNET So you suggest that to checkpoint a single process, say a cpu job that would run a week, which runs in the topmost pid_ns, I will need to checkpoint the entire topmost pid_ns (as a container, if at all possible - surely there will non-checkpointable tasks there) and then in user-space filter out the data and leave only one task, and then to restart I'll use a container again ? Pffff ... why not just allow subtree checkpoint, not in a container, with its well known limitations -- would work the same, for very little additional implementation cost. > > All of this by definition won't create situation "mom, where did my shm segment go?". > >> Anything in the current design makes it impossible ? >> >> Anything prohibiting your from adding this feature to the current >> patch-set ? >> >>> * not having CAP_SYS_ADMIN on restart(2) >> Surely you have read already on the containers mailing list that >> for the *time being* we attempt to get as far as possible without >> requiring root privileges, to identify security hot-spots. > > More or less everything is hotspot. Going back to my example of a subtree above: what sort of security issues you would have here, assuming all restart operations go via the usual security checks just like syscalls, and we take special care about values we allow as input, e.g. cpu debug registers etc ? BTW, the checks for cpu registers and other values for sanity is needed anyway to ensure that the kernel freak out if wrong input is fed (maliciously or by mistake). > >> And surely you have read there the observation that for the general >> case root privileges will probably be inevitable. >> >> And surely you don't seriously think that adding this check changes >> the "design"... > > This is not about design now, this is about if you allow restart(2) for > everyone, you open kernel to very very high number of holes of all > types. I wrote about it in reply to your code. > And there are many examples where this is not the case. The current patchset (v14-rc3), for example, to the best of my knowledge, does not have such issues (well, need to add checks for validity of regs but there is a FIXME comment :) > Numbers of states described by any valid image as described by header > files is very much higher than number of states that can validly end up > in an image. Reducing set of states to only valid ones will require > many checks. > > For example, just one structure: struct cr_hdr_cpu. > > Segments are directly loaded regardless of RPL. > > Debug registers are directly loaded regardless of to where breakpoint > points to. > > If you don't understand scale of problem and insist on restart(2) not > having CAP_SYS_ADMIN, we'll just drop from discussion and write email > to bugtraq to warn sysadmins at least. I humbly suggest that for many useful use-cases restart(2) can succeed and work well without requiring CAP_SYS_ADMIN. I'm also citing Serge saying that it's a good exercise to identify all those specific points where CAP_SYS_ADMIN is needed and make sure we understand them well. For this purpose, _starting_ with not requiring CAP_SYS_ADMIN is actually a good idea. Then we can isolate where it's strictly needed, what are the security implications, and add the requirement there. > > restart(2) for everyone sounds like excellent feature and is excellent > feature but simultaneusly is completely unrealistic for Unix-like kernel. > >>> * having small (TASK_COMM_LEN) and bigger (objref[1]) image format >>> misdesigns. >> Eh ? > > TASK_COMM_LEN is 16 bytes and is not going to change, so you do > > __u8 cr_comm[16]; > > and forget. If TASK_COMM_LEN changes you'll just bump size of > task_struct image and image version. That's an extra safety check. So that kernel doesn't _crash_ if a new image is fed to an older kernel (and for some reason the image altered). And also needed when you support non-CAP_SYS_ADMIN restart(2) (yeah, yeah, I got the point, limited version only). Oren. > > Saving ->comm as variable-sized string is complication out of nowhere. > > As for objref, my apoligies, objection dropped, I understood how it can be useful. > >>> * doing fork(2)+restart(2) per restarted task and whole orchestration >>> done from userspace/future init task. >> Why is it "incorrect" ? >> What makes it "better" to do it in the kernel ? >> Only because you say so is not convincing. >> >> (also see my other post in this matter). >> >>> * not seeing bigger picture (note, this is not equivalent to supporting >>> everything at once, nobody is asking for everything at once) wrt shared >>> objects and format and code changes because of that (note again, image >>> format will change, but it's easy to design high level structure which >>> won't change) >> Why don't you describe the bigger picture so that the rest of can >> finally see it, too ?! >> (what a waste to have spent all this effort in vain...) > > Sit and graph down all structures that you will eventually checkpoint and > relations about them. Things like "struct cr_hdr_pids" won't even be on > the radar. > >>> * checking of unsupported features done at wrong place and wrong time >>> and runtime overhead because of that on CR=y kernels. >> Eh ? Did you follow the code recently ? > > Checks during dup_fd() were dropped? Good. > >>> There are also low-level things, but it's cumulative effect. >>> >>> [1] Do I inderstand correctly that cookie for shared object is an >>> address on kernel stack? This is obviously unreliable, if yes :-) >> Ah... I see... you didn't look at it that hard, not even read the >> documentation with the code. >> >>> int objref; >>> ... >>> /* adding 'file' to the hash will keep a reference to it */ >>> new = cr_obj_add_ptr(ctx, file, &objref, CR_OBJ_FILE, 0); >>> ^^^^^^^ >> That said, there are more similarities than differences between your >> suggested template and the current patchset. With your expertise you >> can contribute tremendously if you decide to work together. > > OK, totally misread code. > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style 2009-04-14 18:08 ` Oren Laadan @ 2009-04-14 18:34 ` Alexey Dobriyan 2009-04-14 19:31 ` Oren Laadan 2009-04-14 20:49 ` Alexey Dobriyan 1 sibling, 1 reply; 37+ messages in thread From: Alexey Dobriyan @ 2009-04-14 18:34 UTC (permalink / raw) To: Oren Laadan Cc: Dave Hansen, akpm, containers, xemul, serue, mingo, hch, torvalds, linux-kernel On Tue, Apr 14, 2009 at 02:08:21PM -0400, Oren Laadan wrote: > > > Alexey Dobriyan wrote: > > On Tue, Apr 14, 2009 at 12:26:50AM -0400, Oren Laadan wrote: > >> Alexey Dobriyan wrote: > >>> On Thu, Apr 09, 2009 at 10:07:11PM -0700, Dave Hansen wrote: > >>>> I'm curious how you see these fitting in with the work that we've been > >>>> doing with Oren. Do you mean to just start a discussion or are you > >>>> really proposing these as an alternative to what Oren has been posting? > >>> Yes, this is posted as alternative. > >>> > >>> Some design decisions are seen as incorrect from here like: > >> A definition of "design" would help; I find most of your comments > >> below either vague, cryptic, or technical nits... > >> > >>> * not rejecting checkpoint with possible "leaks" from container > >> ...like this, for example. > > > > Like checkpointing one process out of many living together. > > See the thread on creating tasks in userspace vs. kernel space: > the argument here is that is an interesting enough use case for > a checkpoint of not-an-entire-container. > > Of course it will require more logic to it, so the user can choose > what she cares or does not care about, and the kernel could alert > the user about it. > > The point is, that it is, IMHO, a desirable capability. > > > > > If you allow this you consequently drop checks (e.g. refcount checks) > > for "somebody else is using structure to be checkpointed". > > > > From this point below, I totally agree with you that for the purpose > of a whole-container-checkpoint this is certainly desirable. My point > was that it can be easily added the existing patchset (not yours). > Why not add it there ? > > > If you drop these checks, you can't decipher legal sutiations like > > "process genuinely doesn't care about routing table of netns it lives in" > > from "illegal" situations like "process created shm segment but currently > > doesn't use it so not checkpointing ipcns will result in breakagenlater". > > > > You'll have to move responsibility to user, so user exactly knows what > > app relies on and on what. And probably add flags like CKPT_SHM, > > CKPT_NETNS_ROUTE ad infinitum. > > > > And user will screw it badly and complain: "after restart my app > > segfaulted". And user himself is screwed now: old running process is > > already killed (it was checkpointed on purpose) and new process in image > > segfaults every time it's restarted. > > > > All of this in out opinion results in doing C/R unreliably and badly. > > > > We are going to do it well and dig from the other side. > > > > If "leak" (any "leak") is detected, C/R is aborted because kernel > > doesn't know what app relies on and what app doesn't care about. > > > > This protected from situations and failure modes described above. > > > > This also protects to some extent from in-kernel changes where C/R code > > should have been updated but wasn't. Person doing incomplete change won't > > notice e.g refcount checks and won't try to "fix" them. But we'll notice it, > > e.g. when running testsuite (amen) and update C/R code accordingly. > > > > I'm talking about these checks so that everyone understands: > > > > for_each_cr_object(ctx, obj, CR_CTX_MM_STRUCT) { > > struct mm_struct *mm = obj->o_obj; > > unsigned int cnt = atomic_read(&mm->mm_users); > > > > if (obj->o_count != cnt) { > > printk("%s: mm_struct %p has external references %lu:%u\n", __func__, mm, obj->o_count, cnt); > > return -EINVAL; > > } > > } > > > > They are like moving detectors, small, invisible, something moved, you don't > > know what, but you don't care because you have to investigate anyway. > > > > In this scheme, if user wants to checkpoint just one process, he should > > start it alone in separate container. Right now, in posted patchset > > as cloned process with > > CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWUSER|CLONE_NEWPID|CLONE_NEWNET > > So you suggest that to checkpoint a single process, say a cpu job that > would run a week, which runs in the topmost pid_ns, I will need to > checkpoint the entire topmost pid_ns (as a container, if at all possible > - surely there will non-checkpointable tasks there) and then in > user-space filter out the data and leave only one task, and then to > restart I'll use a container again ? No, you do little preparations and start CPU job in container from the very beginning. > Pffff ... why not just allow subtree checkpoint, not in a container, > with its well known limitations -- would work the same, for very little > additional implementation cost. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style 2009-04-14 18:34 ` Alexey Dobriyan @ 2009-04-14 19:31 ` Oren Laadan 2009-04-14 20:08 ` Alexey Dobriyan 0 siblings, 1 reply; 37+ messages in thread From: Oren Laadan @ 2009-04-14 19:31 UTC (permalink / raw) To: Alexey Dobriyan Cc: Dave Hansen, akpm, containers, xemul, serue, mingo, hch, torvalds, linux-kernel Alexey Dobriyan wrote: > On Tue, Apr 14, 2009 at 02:08:21PM -0400, Oren Laadan wrote: >> >> Alexey Dobriyan wrote: >>> On Tue, Apr 14, 2009 at 12:26:50AM -0400, Oren Laadan wrote: >>>> Alexey Dobriyan wrote: >>>>> On Thu, Apr 09, 2009 at 10:07:11PM -0700, Dave Hansen wrote: >>>>>> I'm curious how you see these fitting in with the work that we've been >>>>>> doing with Oren. Do you mean to just start a discussion or are you >>>>>> really proposing these as an alternative to what Oren has been posting? >>>>> Yes, this is posted as alternative. >>>>> >>>>> Some design decisions are seen as incorrect from here like: >>>> A definition of "design" would help; I find most of your comments >>>> below either vague, cryptic, or technical nits... >>>> >>>>> * not rejecting checkpoint with possible "leaks" from container >>>> ...like this, for example. >>> Like checkpointing one process out of many living together. >> See the thread on creating tasks in userspace vs. kernel space: >> the argument here is that is an interesting enough use case for >> a checkpoint of not-an-entire-container. >> >> Of course it will require more logic to it, so the user can choose >> what she cares or does not care about, and the kernel could alert >> the user about it. >> >> The point is, that it is, IMHO, a desirable capability. >> >>> If you allow this you consequently drop checks (e.g. refcount checks) >>> for "somebody else is using structure to be checkpointed". >>> >> From this point below, I totally agree with you that for the purpose >> of a whole-container-checkpoint this is certainly desirable. My point >> was that it can be easily added the existing patchset (not yours). >> Why not add it there ? >> >>> If you drop these checks, you can't decipher legal sutiations like >>> "process genuinely doesn't care about routing table of netns it lives in" >>> from "illegal" situations like "process created shm segment but currently >>> doesn't use it so not checkpointing ipcns will result in breakagenlater". >>> >>> You'll have to move responsibility to user, so user exactly knows what >>> app relies on and on what. And probably add flags like CKPT_SHM, >>> CKPT_NETNS_ROUTE ad infinitum. >>> >>> And user will screw it badly and complain: "after restart my app >>> segfaulted". And user himself is screwed now: old running process is >>> already killed (it was checkpointed on purpose) and new process in image >>> segfaults every time it's restarted. >>> >>> All of this in out opinion results in doing C/R unreliably and badly. >>> >>> We are going to do it well and dig from the other side. >>> >>> If "leak" (any "leak") is detected, C/R is aborted because kernel >>> doesn't know what app relies on and what app doesn't care about. >>> >>> This protected from situations and failure modes described above. >>> >>> This also protects to some extent from in-kernel changes where C/R code >>> should have been updated but wasn't. Person doing incomplete change won't >>> notice e.g refcount checks and won't try to "fix" them. But we'll notice it, >>> e.g. when running testsuite (amen) and update C/R code accordingly. >>> >>> I'm talking about these checks so that everyone understands: >>> >>> for_each_cr_object(ctx, obj, CR_CTX_MM_STRUCT) { >>> struct mm_struct *mm = obj->o_obj; >>> unsigned int cnt = atomic_read(&mm->mm_users); >>> >>> if (obj->o_count != cnt) { >>> printk("%s: mm_struct %p has external references %lu:%u\n", __func__, mm, obj->o_count, cnt); >>> return -EINVAL; >>> } >>> } >>> >>> They are like moving detectors, small, invisible, something moved, you don't >>> know what, but you don't care because you have to investigate anyway. >>> >>> In this scheme, if user wants to checkpoint just one process, he should >>> start it alone in separate container. Right now, in posted patchset >>> as cloned process with >>> CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWUSER|CLONE_NEWPID|CLONE_NEWNET >> So you suggest that to checkpoint a single process, say a cpu job that >> would run a week, which runs in the topmost pid_ns, I will need to >> checkpoint the entire topmost pid_ns (as a container, if at all possible >> - surely there will non-checkpointable tasks there) and then in >> user-space filter out the data and leave only one task, and then to >> restart I'll use a container again ? > > No, you do little preparations and start CPU job in container from the very > beginning. So you are denying all those other users that don't want to do that the joy of checkpointing and restarting their stuff ... :( Or, for users who do run everything in container, but some task is not checkpointable - it is using this electronic microscope device attached to their handheld. Alas, they do want to checkpoint that useful program they are running there that calculates fibonacci numbers ... Or, a nested container that shares something with the parent container, so is not checkpointable by itself... Ok, you probably got the idea. Oren. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style 2009-04-14 19:31 ` Oren Laadan @ 2009-04-14 20:08 ` Alexey Dobriyan 0 siblings, 0 replies; 37+ messages in thread From: Alexey Dobriyan @ 2009-04-14 20:08 UTC (permalink / raw) To: Oren Laadan Cc: Dave Hansen, akpm, containers, xemul, serue, mingo, hch, torvalds, linux-kernel On Tue, Apr 14, 2009 at 03:31:55PM -0400, Oren Laadan wrote: > > > Alexey Dobriyan wrote: > > On Tue, Apr 14, 2009 at 02:08:21PM -0400, Oren Laadan wrote: > >> > >> Alexey Dobriyan wrote: > >>> On Tue, Apr 14, 2009 at 12:26:50AM -0400, Oren Laadan wrote: > >>>> Alexey Dobriyan wrote: > >>>>> On Thu, Apr 09, 2009 at 10:07:11PM -0700, Dave Hansen wrote: > >>>>>> I'm curious how you see these fitting in with the work that we've been > >>>>>> doing with Oren. Do you mean to just start a discussion or are you > >>>>>> really proposing these as an alternative to what Oren has been posting? > >>>>> Yes, this is posted as alternative. > >>>>> > >>>>> Some design decisions are seen as incorrect from here like: > >>>> A definition of "design" would help; I find most of your comments > >>>> below either vague, cryptic, or technical nits... > >>>> > >>>>> * not rejecting checkpoint with possible "leaks" from container > >>>> ...like this, for example. > >>> Like checkpointing one process out of many living together. > >> See the thread on creating tasks in userspace vs. kernel space: > >> the argument here is that is an interesting enough use case for > >> a checkpoint of not-an-entire-container. > >> > >> Of course it will require more logic to it, so the user can choose > >> what she cares or does not care about, and the kernel could alert > >> the user about it. > >> > >> The point is, that it is, IMHO, a desirable capability. > >> > >>> If you allow this you consequently drop checks (e.g. refcount checks) > >>> for "somebody else is using structure to be checkpointed". > >>> > >> From this point below, I totally agree with you that for the purpose > >> of a whole-container-checkpoint this is certainly desirable. My point > >> was that it can be easily added the existing patchset (not yours). > >> Why not add it there ? > >> > >>> If you drop these checks, you can't decipher legal sutiations like > >>> "process genuinely doesn't care about routing table of netns it lives in" > >>> from "illegal" situations like "process created shm segment but currently > >>> doesn't use it so not checkpointing ipcns will result in breakagenlater". > >>> > >>> You'll have to move responsibility to user, so user exactly knows what > >>> app relies on and on what. And probably add flags like CKPT_SHM, > >>> CKPT_NETNS_ROUTE ad infinitum. > >>> > >>> And user will screw it badly and complain: "after restart my app > >>> segfaulted". And user himself is screwed now: old running process is > >>> already killed (it was checkpointed on purpose) and new process in image > >>> segfaults every time it's restarted. > >>> > >>> All of this in out opinion results in doing C/R unreliably and badly. > >>> > >>> We are going to do it well and dig from the other side. > >>> > >>> If "leak" (any "leak") is detected, C/R is aborted because kernel > >>> doesn't know what app relies on and what app doesn't care about. > >>> > >>> This protected from situations and failure modes described above. > >>> > >>> This also protects to some extent from in-kernel changes where C/R code > >>> should have been updated but wasn't. Person doing incomplete change won't > >>> notice e.g refcount checks and won't try to "fix" them. But we'll notice it, > >>> e.g. when running testsuite (amen) and update C/R code accordingly. > >>> > >>> I'm talking about these checks so that everyone understands: > >>> > >>> for_each_cr_object(ctx, obj, CR_CTX_MM_STRUCT) { > >>> struct mm_struct *mm = obj->o_obj; > >>> unsigned int cnt = atomic_read(&mm->mm_users); > >>> > >>> if (obj->o_count != cnt) { > >>> printk("%s: mm_struct %p has external references %lu:%u\n", __func__, mm, obj->o_count, cnt); > >>> return -EINVAL; > >>> } > >>> } > >>> > >>> They are like moving detectors, small, invisible, something moved, you don't > >>> know what, but you don't care because you have to investigate anyway. > >>> > >>> In this scheme, if user wants to checkpoint just one process, he should > >>> start it alone in separate container. Right now, in posted patchset > >>> as cloned process with > >>> CLONE_NEWNS|CLONE_NEWUTS|CLONE_NEWIPC|CLONE_NEWUSER|CLONE_NEWPID|CLONE_NEWNET > >> So you suggest that to checkpoint a single process, say a cpu job that > >> would run a week, which runs in the topmost pid_ns, I will need to > >> checkpoint the entire topmost pid_ns (as a container, if at all possible > >> - surely there will non-checkpointable tasks there) and then in > >> user-space filter out the data and leave only one task, and then to > >> restart I'll use a container again ? > > > > No, you do little preparations and start CPU job in container from the very > > beginning. > > So you are denying all those other users that don't want to do that > the joy of checkpointing and restarting their stuff ... :( That's the price for a feature. In return kernel promises to not create surprises after restart(2). > Or, for users who do run everything in container, but some task is not > checkpointable - it is using this electronic microscope device attached > to their handheld. Why this example? If code for opened file will register checkpoint/restart hooks, everything will be fine. > Alas, they do want to checkpoint that useful program they are running there > that calculates fibonacci numbers ... > > Or, a nested container that shares something with the parent container, > so is not checkpointable by itself... > > Ok, you probably got the idea. You should too. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style 2009-04-14 18:08 ` Oren Laadan 2009-04-14 18:34 ` Alexey Dobriyan @ 2009-04-14 20:49 ` Alexey Dobriyan 2009-04-14 21:11 ` Dave Hansen 2009-04-14 21:39 ` Serge E. Hallyn 1 sibling, 2 replies; 37+ messages in thread From: Alexey Dobriyan @ 2009-04-14 20:49 UTC (permalink / raw) To: Oren Laadan Cc: Dave Hansen, akpm, containers, xemul, serue, mingo, hch, torvalds, linux-kernel > >>> * not having CAP_SYS_ADMIN on restart(2) > >> Surely you have read already on the containers mailing list that > >> for the *time being* we attempt to get as far as possible without > >> requiring root privileges, to identify security hot-spots. > > > > More or less everything is hotspot. > > Going back to my example of a subtree above: what sort of security > issues you would have here, assuming all restart operations go via > the usual security checks just like syscalls, and we take special > care about values we allow as input, e.g. cpu debug registers etc ? This is like asking if everything goes through correct security checks how will you screw something up. Everything won't go via usual security checks. Just for giggles, writing exploit for localhost hole becomes easier -- you very effectively turn off all randomization kernel does because VMA boundaries comes from disk. > BTW, the checks for cpu registers and other values for sanity is > needed anyway to ensure that the kernel freak out if wrong input > is fed (maliciously or by mistake). Out of head, mucking with registers of setuid program is no-no. As well as mucking with dirty data of setuid program. Capabilities will come from disk. Many EPERM checks are suspect. do_arch_prctl(ARCH_SET_GS) has TASK_SIZE check, do you? > >> And surely you have read there the observation that for the general > >> case root privileges will probably be inevitable. > >> > >> And surely you don't seriously think that adding this check changes > >> the "design"... > > > > This is not about design now, this is about if you allow restart(2) for > > everyone, you open kernel to very very high number of holes of all > > types. I wrote about it in reply to your code. > > > > And there are many examples where this is not the case. The current > patchset (v14-rc3), for example, to the best of my knowledge, does > not have such issues (well, need to add checks for validity of regs > but there is a FIXME comment :) And such an obvious issue like segmentt RPL is fixed? :-) ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style 2009-04-14 20:49 ` Alexey Dobriyan @ 2009-04-14 21:11 ` Dave Hansen 2009-04-14 21:39 ` Serge E. Hallyn 1 sibling, 0 replies; 37+ messages in thread From: Dave Hansen @ 2009-04-14 21:11 UTC (permalink / raw) To: Alexey Dobriyan Cc: Oren Laadan, xemul, containers, linux-kernel, hch, akpm, torvalds, mingo On Wed, 2009-04-15 at 00:49 +0400, Alexey Dobriyan wrote: > > Going back to my example of a subtree above: what sort of security > > issues you would have here, assuming all restart operations go via > > the usual security checks just like syscalls, and we take special > > care about values we allow as input, e.g. cpu debug registers etc ? > > This is like asking if everything goes through correct security checks > how will you screw something up. Everything won't go via usual security > checks. > > Just for giggles, writing exploit for localhost hole becomes easier -- > you very effectively turn off all randomization kernel does because VMA > boundaries comes from disk. Alexey, I think there's a bit of a misunderstanding here. We're proposing that whatever gets done during the restart would not be able to elevate above the privilege level of the user doing the restart. For instance, a user would not be able to restart a suid application -- that would require privilege. In the same way, at checkpoint time, we should deny users the ability to checkpoint things that are privileged. We need to pay very close attention to the kernel interfaces that we use to ensure that all the normal security checks are observed, of course. There's no new hole. A restore operation is just a pieced-together set of operations that the user would have already been able to perform. Again, I think this stems from some of us that think that c/r should be used outside exclusively checkpointing whole containers. I think we'll find some expanded use for c/r on things that aren't strict system containers. -- Dave ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style 2009-04-14 20:49 ` Alexey Dobriyan 2009-04-14 21:11 ` Dave Hansen @ 2009-04-14 21:39 ` Serge E. Hallyn 2009-04-15 19:21 ` CAP_SYS_ADMIN on restart(2) (was: Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style) Alexey Dobriyan 1 sibling, 1 reply; 37+ messages in thread From: Serge E. Hallyn @ 2009-04-14 21:39 UTC (permalink / raw) To: Alexey Dobriyan Cc: Oren Laadan, Dave Hansen, akpm, containers, xemul, mingo, hch, torvalds, linux-kernel Quoting Alexey Dobriyan (adobriyan@gmail.com): > > >>> * not having CAP_SYS_ADMIN on restart(2) > > >> Surely you have read already on the containers mailing list that > > >> for the *time being* we attempt to get as far as possible without > > >> requiring root privileges, to identify security hot-spots. > > > > > > More or less everything is hotspot. > > > > Going back to my example of a subtree above: what sort of security > > issues you would have here, assuming all restart operations go via > > the usual security checks just like syscalls, and we take special > > care about values we allow as input, e.g. cpu debug registers etc ? > > This is like asking if everything goes through correct security checks > how will you screw something up. Everything won't go via usual security > checks. (Everything which doesn't go through usual security checks gets special security checks...) You are asking an administrator to give users setuid-root programs when they wouldn't otherwise need it. So now not only is there potential of security holes in kernel code which was written under the assumption that 'hey you have privilege when you run this so are trusted'. You also have more people running probably even less-vetted userspace code with privilege. > Just for giggles, writing exploit for localhost hole becomes easier -- > you very effectively turn off all randomization kernel does because VMA > boundaries comes from disk. Hmm, interesting point. But again you're going on a naive assumption that if restart requires privilege, only trusted users will use it. Rather, in the real world, I claim that unprivileged users will be given more privilege than they need. > > BTW, the checks for cpu registers and other values for sanity is > > needed anyway to ensure that the kernel freak out if wrong input > > is fed (maliciously or by mistake). > > Out of head, mucking with registers of setuid program is no-no. > As well as mucking with dirty data of setuid program. No argument there, no idea why you bring this up. > Capabilities will come from disk. Sure. > Many EPERM checks are suspect. No idea what you're saying. > do_arch_prctl(ARCH_SET_GS) has TASK_SIZE check, do you? We don't support x86_64 yet :) > > >> And surely you have read there the observation that for the general > > >> case root privileges will probably be inevitable. > > >> > > >> And surely you don't seriously think that adding this check changes > > >> the "design"... > > > > > > This is not about design now, this is about if you allow restart(2) for > > > everyone, you open kernel to very very high number of holes of all > > > types. I wrote about it in reply to your code. > > > > > > > And there are many examples where this is not the case. The current > > patchset (v14-rc3), for example, to the best of my knowledge, does > > not have such issues (well, need to add checks for validity of regs > > but there is a FIXME comment :) > > And such an obvious issue like segmentt RPL is fixed? :-) -serge ^ permalink raw reply [flat|nested] 37+ messages in thread
* CAP_SYS_ADMIN on restart(2) (was: Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style) 2009-04-14 21:39 ` Serge E. Hallyn @ 2009-04-15 19:21 ` Alexey Dobriyan 2009-04-15 20:22 ` Serge E. Hallyn 2009-04-15 20:23 ` Dave Hansen 0 siblings, 2 replies; 37+ messages in thread From: Alexey Dobriyan @ 2009-04-15 19:21 UTC (permalink / raw) To: Serge E. Hallyn Cc: Oren Laadan, Dave Hansen, akpm, containers, xemul, mingo, hch, torvalds, linux-kernel Is sysctl to control CAP_SYS_ADMIN on restart(2) OK? ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: CAP_SYS_ADMIN on restart(2) (was: Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style) 2009-04-15 19:21 ` CAP_SYS_ADMIN on restart(2) (was: Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style) Alexey Dobriyan @ 2009-04-15 20:22 ` Serge E. Hallyn 2009-04-15 20:23 ` Dave Hansen 1 sibling, 0 replies; 37+ messages in thread From: Serge E. Hallyn @ 2009-04-15 20:22 UTC (permalink / raw) To: Alexey Dobriyan Cc: Oren Laadan, Dave Hansen, akpm, containers, xemul, mingo, hch, torvalds, linux-kernel Quoting Alexey Dobriyan (adobriyan@gmail.com): > Is sysctl to control CAP_SYS_ADMIN on restart(2) OK? You mean a sysctl to specify whether to require CAP_SYS_ADMIN for restart(2)? Yeah I wouldn't object to that - it certainly seems like something sane for an admin to use depending on their users. Though I think the bigger fish to fry first is whether we only support whole-container checkpoint/restart. If that is the case, then CAP_SYS_ADMIN will always be needed for restart since it will always unshare some namespaces. thanks, -serge ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: CAP_SYS_ADMIN on restart(2) (was: Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style) 2009-04-15 19:21 ` CAP_SYS_ADMIN on restart(2) (was: Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style) Alexey Dobriyan 2009-04-15 20:22 ` Serge E. Hallyn @ 2009-04-15 20:23 ` Dave Hansen 2009-04-15 20:39 ` Serge E. Hallyn 1 sibling, 1 reply; 37+ messages in thread From: Dave Hansen @ 2009-04-15 20:23 UTC (permalink / raw) To: Alexey Dobriyan Cc: Serge E. Hallyn, xemul, containers, linux-kernel, hch, akpm, torvalds, mingo On Wed, 2009-04-15 at 23:21 +0400, Alexey Dobriyan wrote: > Is sysctl to control CAP_SYS_ADMIN on restart(2) OK? If the point is not to let users even *try* restarting things if it *might* not work, then I guess this might be reasonable. If the goal is to increase security by always requiring CAP_SYS_ADMIN for "dangerous" operations, I fear it will be harmful. We may have people adding features that are not considering the security impact of what they're doing just because the cases they care about all require privilege. What would the goal be? -- Dave ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: CAP_SYS_ADMIN on restart(2) (was: Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style) 2009-04-15 20:23 ` Dave Hansen @ 2009-04-15 20:39 ` Serge E. Hallyn 2009-04-15 21:05 ` CAP_SYS_ADMIN on restart(2) Oren Laadan 0 siblings, 1 reply; 37+ messages in thread From: Serge E. Hallyn @ 2009-04-15 20:39 UTC (permalink / raw) To: Dave Hansen Cc: Alexey Dobriyan, xemul, containers, linux-kernel, hch, akpm, torvalds, mingo Quoting Dave Hansen (dave@linux.vnet.ibm.com): > On Wed, 2009-04-15 at 23:21 +0400, Alexey Dobriyan wrote: > > Is sysctl to control CAP_SYS_ADMIN on restart(2) OK? > > If the point is not to let users even *try* restarting things if it > *might* not work, then I guess this might be reasonable. > > If the goal is to increase security by always requiring CAP_SYS_ADMIN > for "dangerous" operations, I fear it will be harmful. We may have > people adding features that are not considering the security impact of > what they're doing just because the cases they care about all require > privilege. Nah, I disagree. (Or put another way, that wouldn't be the goal) There are two administrators we want to satisfy: 1. the one who wants his users to do partial checkpoints, but doesn't want to risk giving away any privilege at all in the process. He'll be satisified by setting restart(2) to not require cap_sys_admin, and his users just won't be able to do a whole container. A lot of users will be happy with that (though no SYSVIPC support, then). 2. the one who may have one or two users who he trusts to do checkpoint/restart, but otherwise doesn't want even the slightest risk of other users using restart, and maybe finding an exploit. That one is probably the more common admin, and he'll be satisified with requiring CAP_SYS_ADMIN for all restart(2)'s, since he's ok risking giving extra privilege to the ones he trusts. And meanwhile, by virtue of leaving (1) supported, we are still more under the gun to make sure that everything restart(2) does is properly checked. > What would the goal be? > > -- Dave -serge ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: CAP_SYS_ADMIN on restart(2) 2009-04-15 20:39 ` Serge E. Hallyn @ 2009-04-15 21:05 ` Oren Laadan 2009-04-15 21:16 ` Serge E. Hallyn 0 siblings, 1 reply; 37+ messages in thread From: Oren Laadan @ 2009-04-15 21:05 UTC (permalink / raw) To: Serge E. Hallyn Cc: Dave Hansen, xemul, containers, mingo, linux-kernel, hch, akpm, torvalds, Alexey Dobriyan Serge E. Hallyn wrote: > Quoting Dave Hansen (dave@linux.vnet.ibm.com): >> On Wed, 2009-04-15 at 23:21 +0400, Alexey Dobriyan wrote: >>> Is sysctl to control CAP_SYS_ADMIN on restart(2) OK? >> If the point is not to let users even *try* restarting things if it >> *might* not work, then I guess this might be reasonable. >> >> If the goal is to increase security by always requiring CAP_SYS_ADMIN >> for "dangerous" operations, I fear it will be harmful. We may have >> people adding features that are not considering the security impact of >> what they're doing just because the cases they care about all require >> privilege. > > Nah, I disagree. (Or put another way, that wouldn't be the goal) > There are two administrators we want to satisfy: > > 1. the one who wants his users to do partial checkpoints, but doesn't > want to risk giving away any privilege at all in the process. He'll > be satisified by setting restart(2) to not require cap_sys_admin, > and his users just won't be able to do a whole container. A lot of > users will be happy with that (though no SYSVIPC support, then). There is also a middle way: use setuid program to allow creation of a new namespace (under your favorite policy), then drop the privileges and continue as unprivileged inside that container. IOW, don't make the initial container-creation a barrier for the entire operation. Oren. > > 2. the one who may have one or two users who he trusts to do > checkpoint/restart, but otherwise doesn't want even the slightest > risk of other users using restart, and maybe finding an exploit. > That one is probably the more common admin, and he'll be satisified > with requiring CAP_SYS_ADMIN for all restart(2)'s, since he's ok > risking giving extra privilege to the ones he trusts. > > And meanwhile, by virtue of leaving (1) supported, we are still > more under the gun to make sure that everything restart(2) does > is properly checked. > >> What would the goal be? >> >> -- Dave > > -serge > _______________________________________________ > Containers mailing list > Containers@lists.linux-foundation.org > https://lists.linux-foundation.org/mailman/listinfo/containers > ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: CAP_SYS_ADMIN on restart(2) 2009-04-15 21:05 ` CAP_SYS_ADMIN on restart(2) Oren Laadan @ 2009-04-15 21:16 ` Serge E. Hallyn 2009-04-16 15:35 ` Alexey Dobriyan 0 siblings, 1 reply; 37+ messages in thread From: Serge E. Hallyn @ 2009-04-15 21:16 UTC (permalink / raw) To: Oren Laadan Cc: Dave Hansen, xemul, containers, mingo, linux-kernel, hch, akpm, torvalds, Alexey Dobriyan Quoting Oren Laadan (orenl@cs.columbia.edu): > > > Serge E. Hallyn wrote: > > Quoting Dave Hansen (dave@linux.vnet.ibm.com): > >> On Wed, 2009-04-15 at 23:21 +0400, Alexey Dobriyan wrote: > >>> Is sysctl to control CAP_SYS_ADMIN on restart(2) OK? > >> If the point is not to let users even *try* restarting things if it > >> *might* not work, then I guess this might be reasonable. > >> > >> If the goal is to increase security by always requiring CAP_SYS_ADMIN > >> for "dangerous" operations, I fear it will be harmful. We may have > >> people adding features that are not considering the security impact of > >> what they're doing just because the cases they care about all require > >> privilege. > > > > Nah, I disagree. (Or put another way, that wouldn't be the goal) > > There are two administrators we want to satisfy: > > > > 1. the one who wants his users to do partial checkpoints, but doesn't > > want to risk giving away any privilege at all in the process. He'll > > be satisified by setting restart(2) to not require cap_sys_admin, > > and his users just won't be able to do a whole container. A lot of > > users will be happy with that (though no SYSVIPC support, then). > > There is also a middle way: use setuid program to allow creation > of a new namespace (under your favorite policy), then drop the > privileges and continue as unprivileged inside that container. > > IOW, don't make the initial container-creation a barrier for the > entire operation. That is still possible here. But I don't think it's relevant. What Alexey wants, I believe, is for users to be able to not have to worry about there being exploitable bugs in restart(2) which unprivileged users can play with. And for the usual distro-kernel reasons, saying use 'CONFIG_CHECKPOINT=n' is not an option. -serge ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: CAP_SYS_ADMIN on restart(2) 2009-04-15 21:16 ` Serge E. Hallyn @ 2009-04-16 15:35 ` Alexey Dobriyan 2009-04-16 16:29 ` Serge E. Hallyn 0 siblings, 1 reply; 37+ messages in thread From: Alexey Dobriyan @ 2009-04-16 15:35 UTC (permalink / raw) To: Serge E. Hallyn Cc: Oren Laadan, Dave Hansen, xemul, containers, mingo, linux-kernel, hch, akpm, torvalds On Wed, Apr 15, 2009 at 04:16:09PM -0500, Serge E. Hallyn wrote: > Quoting Oren Laadan (orenl@cs.columbia.edu): > > > > > > Serge E. Hallyn wrote: > > > Quoting Dave Hansen (dave@linux.vnet.ibm.com): > > >> On Wed, 2009-04-15 at 23:21 +0400, Alexey Dobriyan wrote: > > >>> Is sysctl to control CAP_SYS_ADMIN on restart(2) OK? > > >> If the point is not to let users even *try* restarting things if it > > >> *might* not work, then I guess this might be reasonable. > > >> > > >> If the goal is to increase security by always requiring CAP_SYS_ADMIN > > >> for "dangerous" operations, I fear it will be harmful. We may have > > >> people adding features that are not considering the security impact of > > >> what they're doing just because the cases they care about all require > > >> privilege. > > > > > > Nah, I disagree. (Or put another way, that wouldn't be the goal) > > > There are two administrators we want to satisfy: > > > > > > 1. the one who wants his users to do partial checkpoints, but doesn't > > > want to risk giving away any privilege at all in the process. He'll > > > be satisified by setting restart(2) to not require cap_sys_admin, > > > and his users just won't be able to do a whole container. A lot of > > > users will be happy with that (though no SYSVIPC support, then). > > > > There is also a middle way: use setuid program to allow creation > > of a new namespace (under your favorite policy), then drop the > > privileges and continue as unprivileged inside that container. > > > > IOW, don't make the initial container-creation a barrier for the > > entire operation. > > That is still possible here. But I don't think it's relevant. > > What Alexey wants, I believe, is for users to be able to not have > to worry about there being exploitable bugs in restart(2) which > unprivileged users can play with. And for the usual distro-kernel > reasons, saying use 'CONFIG_CHECKPOINT=n' is not an option. This is correct, yes. If I would be a sysadmin who knows a bit about kernel internals, I'd never trust restart(2) to get it right. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: CAP_SYS_ADMIN on restart(2) 2009-04-16 15:35 ` Alexey Dobriyan @ 2009-04-16 16:29 ` Serge E. Hallyn 0 siblings, 0 replies; 37+ messages in thread From: Serge E. Hallyn @ 2009-04-16 16:29 UTC (permalink / raw) To: Alexey Dobriyan Cc: Oren Laadan, Dave Hansen, xemul, containers, mingo, linux-kernel, hch, akpm, torvalds Quoting Alexey Dobriyan (adobriyan@gmail.com): > > What Alexey wants, I believe, is for users to be able to not have > > to worry about there being exploitable bugs in restart(2) which > > unprivileged users can play with. And for the usual distro-kernel > > reasons, saying use 'CONFIG_CHECKPOINT=n' is not an option. > > This is correct, yes. If I would be a sysadmin who knows a bit about > kernel internals, I'd never trust restart(2) to get it right. Now I suppose what we could do is define a new CAP_SYS_RESTART capability and require that. Then the admin to whom I'm trying to cater could simply 'capset cap_sys_restart=pe /bin/restart'. Then all users could use restart without being granted the extra privilege implied by CAP_SYS_ADMIN. -serge ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style 2009-04-10 2:32 [PATCH 00/30] C/R OpenVZ/Virtuozzo style Alexey Dobriyan 2009-04-10 2:44 ` Alexey Dobriyan 2009-04-10 5:07 ` Dave Hansen @ 2009-04-10 8:28 ` Ingo Molnar 2009-04-10 11:45 ` Alexey Dobriyan 2009-04-10 15:06 ` Linus Torvalds 2009-04-14 5:46 ` Oren Laadan 4 siblings, 1 reply; 37+ messages in thread From: Ingo Molnar @ 2009-04-10 8:28 UTC (permalink / raw) To: Alexey Dobriyan Cc: akpm, containers, xemul, serue, dave, orenl, hch, torvalds, linux-kernel * Alexey Dobriyan <adobriyan@gmail.com> wrote: > This is to show how we see C/R and to provoke discussion on number > of important issues (mounts, ...). > > This is small part of long-awaited to be cleanuped code. > > It's able to restore busyloop on i386 and x86_64 and restore i386 > busyloop on x86_64. It wasn't tested much more than that. > > I'm currently starting formal testsuite, otherwise it's > whack-a-mole game and formal TODO list (a huge one). One simple patch-submission request: if you start sending large series of patches then please post it like others do: by using git-format-patch or some other tool that preserves the threading of the discussion and links all the 1/30..30/30 patches back to the 0/30 mail. This submission has created 31 separate threads on lkml, intermixed with other threads - making it more difficult to review it as a coherent unit. Thanks, Ingo ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style 2009-04-10 8:28 ` [PATCH 00/30] C/R OpenVZ/Virtuozzo style Ingo Molnar @ 2009-04-10 11:45 ` Alexey Dobriyan 0 siblings, 0 replies; 37+ messages in thread From: Alexey Dobriyan @ 2009-04-10 11:45 UTC (permalink / raw) To: Ingo Molnar Cc: akpm, containers, xemul, serue, dave, orenl, hch, torvalds, linux-kernel On Fri, Apr 10, 2009 at 10:28:15AM +0200, Ingo Molnar wrote: > > * Alexey Dobriyan <adobriyan@gmail.com> wrote: > > > This is to show how we see C/R and to provoke discussion on number > > of important issues (mounts, ...). > > > > This is small part of long-awaited to be cleanuped code. > > > > It's able to restore busyloop on i386 and x86_64 and restore i386 > > busyloop on x86_64. It wasn't tested much more than that. > > > > I'm currently starting formal testsuite, otherwise it's > > whack-a-mole game and formal TODO list (a huge one). > > One simple patch-submission request: if you start sending large > series of patches then please post it like others do: by using > git-format-patch or some other tool that preserves the threading of > the discussion and links all the 1/30..30/30 patches back to the > 0/30 mail. > > This submission has created 31 separate threads on lkml, intermixed > with other threads - making it more difficult to review it as a > coherent unit. OK, I thought stgit will screw something, so sent it old proven way :-\ ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style 2009-04-10 2:32 [PATCH 00/30] C/R OpenVZ/Virtuozzo style Alexey Dobriyan ` (2 preceding siblings ...) 2009-04-10 8:28 ` [PATCH 00/30] C/R OpenVZ/Virtuozzo style Ingo Molnar @ 2009-04-10 15:06 ` Linus Torvalds 2009-04-13 7:39 ` Alexey Dobriyan 2009-04-14 5:46 ` Oren Laadan 4 siblings, 1 reply; 37+ messages in thread From: Linus Torvalds @ 2009-04-10 15:06 UTC (permalink / raw) To: Alexey Dobriyan Cc: akpm, containers, xemul, serue, dave, mingo, orenl, hch, linux-kernel On Fri, 10 Apr 2009, Alexey Dobriyan wrote: > > This is to show how we see C/R and to provoke discussion on number of > important issues (mounts, ...). My only initial reaction is that I absolutely hate the naming (not to say I love the code - just to say that I didn't even look at it, because I got hung up on the name). "cr"? It could be anything. I realize that to _you_ that is meaningful, but to somebody less specifically interested in checkpoint-restore 'cr' means 'carriage return' or just doesn't really say anything at all. That goes both for file naming (kernel/cr/xyzzy.c) and to a lesser degree for function naming too. I also don't think it makes sense to have something like kernel/cr/cr-x86_32.c or kernel/cr/cr-tty.c - maybe that is good right now, but I sure hope that the long-term goal is to have these things in the code that will need to change them when the code gets updated (ie arch/x86/kernel and drivers/char/) Linus ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style 2009-04-10 15:06 ` Linus Torvalds @ 2009-04-13 7:39 ` Alexey Dobriyan 2009-04-13 18:39 ` Linus Torvalds 0 siblings, 1 reply; 37+ messages in thread From: Alexey Dobriyan @ 2009-04-13 7:39 UTC (permalink / raw) To: Linus Torvalds Cc: akpm, containers, xemul, serue, dave, mingo, orenl, hch, linux-kernel On Fri, Apr 10, 2009 at 08:06:55AM -0700, Linus Torvalds wrote: > On Fri, 10 Apr 2009, Alexey Dobriyan wrote: > > > > This is to show how we see C/R and to provoke discussion on number of > > important issues (mounts, ...). > > My only initial reaction is that I absolutely hate the naming (not to say > I love the code - just to say that I didn't even look at it, because I got > hung up on the name). > > "cr"? It could be anything. I realize that to _you_ that is meaningful, > but to somebody less specifically interested in checkpoint-restore 'cr' > means 'carriage return' or just doesn't really say anything at all. Well, in OpenVZ everything is in kernel/cpt/ and prefixed with "cpt_" and "rst_". And I think "cr_" is super nice prefix: it's short, it's C-like, it reminds about restart part. Eventually, C/R will become standard in-kernel thing everyone should be at least aware of, so it's like learning what "vma" means. > That goes both for file naming (kernel/cr/xyzzy.c) and to a lesser degree > for function naming too. I also don't think it makes sense to have > something like kernel/cr/cr-x86_32.c or kernel/cr/cr-tty.c - maybe that is > good right now, but I sure hope that the long-term goal is to have these > things in the code that will need to change them when the code gets > updated (ie arch/x86/kernel and drivers/char/) In the long run, yes, C/R should be moved closer to core code it tries to checkpoint. Right now, however, doing "make kernel/cr/" is much quicker and C/R can not do much, so it's unclear how exactly splitting should be done. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style 2009-04-13 7:39 ` Alexey Dobriyan @ 2009-04-13 18:39 ` Linus Torvalds 2009-04-13 19:30 ` Ingo Molnar 2009-04-14 12:29 ` Alexey Dobriyan 0 siblings, 2 replies; 37+ messages in thread From: Linus Torvalds @ 2009-04-13 18:39 UTC (permalink / raw) To: Alexey Dobriyan Cc: akpm, containers, xemul, serue, dave, mingo, orenl, hch, linux-kernel On Mon, 13 Apr 2009, Alexey Dobriyan wrote: > > Well, in OpenVZ everything is in kernel/cpt/ and prefixed with "cpt_" > and "rst_". So? We're not merging OpenVZ code _either_. > And I think "cr_" is super nice prefix: it's short, it's C-like, > it reminds about restart part It does no such thing. THAT'S THE POINT. "cr" means _nothing_ to anybody else than some CR-specific people, and those people don't even need it! Look around you. We try to use nicer names. We spell out "cpufreq", we don't call it "cf". Linus ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style 2009-04-13 18:39 ` Linus Torvalds @ 2009-04-13 19:30 ` Ingo Molnar 2009-04-14 12:29 ` Alexey Dobriyan 1 sibling, 0 replies; 37+ messages in thread From: Ingo Molnar @ 2009-04-13 19:30 UTC (permalink / raw) To: Linus Torvalds Cc: Alexey Dobriyan, akpm, containers, xemul, serue, dave, orenl, hch, linux-kernel * Linus Torvalds <torvalds@linux-foundation.org> wrote: > On Mon, 13 Apr 2009, Alexey Dobriyan wrote: > > > > Well, in OpenVZ everything is in kernel/cpt/ and prefixed with > > "cpt_" and "rst_". > > So? > > We're not merging OpenVZ code _either_. > > > And I think "cr_" is super nice prefix: it's short, it's C-like, > > it reminds about restart part > > It does no such thing. THAT'S THE POINT. "cr" means _nothing_ to > anybody else than some CR-specific people, and those people don't > even need it! > > Look around you. We try to use nicer names. We spell out > "cpufreq", we don't call it "cf". I've done an analysis about what a good kernel-global naming scheme for checkpoint/restore might be (see further below). Here's the current namespace (from a stupid sed+grep script combo done on my mbox - so it combines all the submitted patches): cr_align cr_arch cr_arch_check_image_header cr_arch_check_image_task_struct cr_arch_check_mm_struct cr_arch_check_task_struct cr_arch_dump_mm_struct cr_arch_dump_task_struct cr_arch_len_mm_struct cr_arch_len_task_struct cr_arch_restore_mm_struct cr_arch_restore_task_struct cr_arg_end cr_arg_start cr_attach_get_file cr_bits cr_blocked cr_brk cr_can_checkpoint_file cr_cap_bset cr_cap_effective cr_cap_inheritable cr_cap_permitted cr_c_cc cr_c_cflag cr_check_* cr_check_cred cr_check_file cr_check_file: cr_check_image_header cr_check_mm cr_check_mm_struct cr_check_pid_ns cr_check_sighand cr_check_sighand_struct cr_check_signal cr_check_signal_struct cr_check_sock cr_check_task_struct cr_check_tty cr_check_user_struct cr_check_vma cr_c_iflag cr_c_ispeed cr_c_lflag cr_c_line cr_c_oflag cr_collect cr_collect_all_cred cr_collect_all_file cr_collect_all_files_struct cr_collect_all_fs_struct cr_collect_all_group_info cr_collect_all_ipc_ns cr_collect_all_mm_struct cr_collect_all_mnt_ns cr_collect_all_net_ns cr_collect_all_nsproxy cr_collect_all_pid cr_collect_all_pid_ns cr_collect_all_sighand_struct cr_collect_all_signal_struct cr_collect_all_sock cr_collect_all_task_struct cr_collect_all_tty_struct cr_collect_all_user_ns cr_collect_all_user_struct cr_collect_all_uts_ns cr_collect_cred cr_collect_file cr_collect_files_struct cr_collect_fs_struct cr_collect_group_info cr_collect_ipc_ns cr_collect_mm cr_collect_mm_struct cr_collect_mnt_ns cr_collect_net_ns cr_collect_nsproxy cr_collect_object cr_collect_pid cr_collect_pid_ns cr_collect_sighand cr_collect_sighand_struct cr_collect_signal cr_collect_signal_struct cr_collect_sock cr_collect_tasks cr_collect_task_struct cr_collect_tty cr_collect_user_ns cr_collect_user_struct cr_collect_uts_ns cr_comm cr_context cr_context_create cr_context_destroy cr_context_obj_type cr_context_task_struct cr_core cr_c_ospeed cr_cred cr_cs cr_ct cr_ctrl_status cr_ctx cr_ctx_count cr_ctx_put cr_data cr_debug cr_def_flags cr_domainname cr_dr0 cr_dr1 cr_dr2 cr_dr3 cr_dr6 cr_dr7 cr_driver_flags cr_driver_subtype cr_driver_type cr_ds cr_dump cr_dump_all_cred cr_dump_all_file cr_dump_all_files_struct cr_dump_all_fs_struct cr_dump_all_group_info cr_dump_all_ipc_ns cr_dump_all_mm_struct cr_dump_all_mnt_ns cr_dump_all_net_ns cr_dump_all_nsproxy cr_dump_all_pid cr_dump_all_pid_ns cr_dump_all_sighand_struct cr_dump_all_signal_struct cr_dump_all_sock cr_dump_all_task_struct cr_dump_all_tty cr_dump_all_user_ns cr_dump_all_user_struct cr_dump_all_uts_ns cr_dump_anonvma cr_dump_cred cr_dump_fd cr_dump_file cr_dump_files_struct cr_dump_fs_struct cr_dump_group_info cr_dump_header cr_dump_image_header cr_dump_ipc_ns cr_dump_mm_struct cr_dump_mnt_ns cr_dump_net_ns cr_dump_nsproxy cr_dump_pid cr_dump_pid_ns cr_dump_ptr cr_dump_sighand_struct cr_dump_signal_struct cr_dump_sigset cr_dump_sock cr_dump_task_struct cr_dump_task_struct_x86_32 cr_dump_task_struct_x86_64 cr_dump_terminator cr_dump_tty cr_dump_user_ns cr_dump_user_struct cr_dump_uts_ns cr_dump_vma cr_dump_vma_pages cr_dump_xstate cr_eax cr_ebp cr_ebx cr_ecx cr_edi cr_edx cr_eflags cr_egid cr_eip cr_enabled cr_end_code cr_end_data cr_env_end cr_env_start cr_es cr_esi cr_esp cr_euid cr_explain_file cr_fd cr_fd_flags cr_f_flags cr_file cr_file_checkpoint cr_file_checkpointable cr_file_get_func cr_files_struct cr_file_supported cr_fill_name cr_find_obj_by_pos cr_find_obj_by_ptr cr_flags cr_flow_stopped cr_f_owner cr_f_owner_euid cr_f_owner_pid_type cr_f_owner_signum cr_f_owner_uid cr_f_pos cr_freeze_tasks cr_fs cr_fs_checkpointable cr_fsgid cr_fsindex cr_fsuid cr_gid cr_gs cr_gsindex cr_hbuf_get cr_hbuf_put cr_hdr cr_hdr_cpu cr_hdr_fd cr_hdr_fd_data cr_hdr_fd_ent cr_hdr_files cr_hdr_head cr_hdr_head_arch cr_hdr_mm cr_hdr_mm_context cr_hdr_pgarr cr_hdr_pids cr_hdr_tail cr_hdr_tree cr_hdr_vma cr_header cr_hw_stopped cr_imag cr_image_arch_x86_32 cr_image_arch_x86_64 cr_image_cred cr_image_fd cr_image_file cr_image_files_struct cr_image_fs_struct cr_image_group_info cr_image_header cr_image_header_arch cr_image_ipc_ns cr_image_magic cr_image_mm_struct cr_image_mnt_ns cr_image_net_ns cr_image_nsproxy cr_image_pid cr_image_pid_ns cr_image_sighand_struct cr_image_signal_struct cr_image_sigset cr_image_sock cr_image_task_st cr_image_task_struct cr_image_tty cr_image_user_ns cr_image_user_struct cr_image_uts_ns cr_image_version cr_image_vma cr_image_vma_content cr_image_vma_vdso cr_i_mode cr_index cr_ipc_ns cr_kill_tasks cr_kread cr_last_pid cr_len cr_len_arch cr_len_xstate cr_level cr_link_index cr_low_latency cr_machine cr_mm_struct cr_mnt_ns cr_msg_ctlmax cr_msg_ctlmnb cr_msg_ctlmni cr_name cr_name_len cr_net_ns cr_ngroups cr_nodename cr_nr cr_nr_pages cr_nsproxy cr_obj cr_obj_add_ptr cr_obj_add_ref cr_object cr_object_create cr_object_destroy cr_object_header cr_orig_eax cr_orig_rax cr_packet cr_page_size cr_pgarr cr_pid_ns cr_pid_type cr_pos_cred cr_pos_f_cred cr_pos_file cr_pos_files cr_pos_f_owner_pid cr_pos_fs cr_pos_group_info cr_pos_ipc_ns cr_pos_mm cr_pos_mm_struct cr_pos_mnt_ns cr_pos_net_ns cr_pos_nsproxy cr_pos_parent cr_pos_pgrp cr_pos_pid cr_pos_pid_ns cr_pos_pids cr_pos_real_cred cr_pos_real_parent cr_pos_session cr_pos_sighand cr_pos_signal cr_pos_sk_net cr_pos_t cr_pos_user cr_pos_user_ns cr_pos_uts_ns cr_pos_vm_file cr_pread cr_prepare_image cr_pwd cr_pwd_len cr_quan@163 cr_r10 cr_r11 cr_r12 cr_r13 cr_r14 cr_r15 cr_r8 cr_r9 cr_rax cr_rbp cr_rbx cr_rcx cr_rdi cr_rdx cr_read_buffer cr_read_buf_type cr_read_cpu cr_read_cpu_fpu cr_read_fd_data cr_read_files cr_read_head_arch cr_read_mm cr_read_mm_context cr_read_obj_type cr_read_open_fname cr_read_string cr_read_task_struct cr_read_thread cr_real_blocked cr_release cr_restart cr_restore_all_fd cr_restore_all_vma cr_restore_cred cr_restore_fd cr_restore_file cr_restore_files_struct cr_restore_fs_struct cr_restore_group_info cr_restore_ipc_ns cr_restore_mm_struct cr_restore_mnt_ns cr_restore_net_ns cr_restore_nsproxy cr_restore_pid cr_restore_pid_ns cr_restore_ptr cr_restore_sighand_struct cr_restore_signal_struct cr_restore_sigset cr_restore_task_struct cr_restore_task_struct_x86_32 cr_restore_task_struct_x86_64 cr_restore_uts_ns cr_restore_vma cr_restore_vma_content cr_restore_vma_vdso cr_restore_xstate cr_rflags cr_rip cr_rlim cr_rlim_cur cr_rlim_max cr_root cr_root_len cr_rsi cr_rsp cr_s390_mm cr_s390_regs cr_sa cr_sa_flags cr_sa_handler cr_sa_mask cr_sa_restorer cr_save_cpu_regs cr_saved_auxv cr_saved_sigmask cr_scan_fds cr_securebits cr_sem_ctls cr_sgid cr_shm_ctlall cr_shm_ctlmax cr_shm_ctlmni cr_sighand cr_signal cr_signature cr_signum cr_sk_family cr_ss cr_start_addr cr_start_brk cr_start_code cr_start_data cr_start_stack cr_stopped cr_suid cr_sysctl_max_dgram_qlen cr_sysctl_somaxconn cr_sysname cr_task_struct cr_task_struct_arch cr_termios cr_thaw_tasks cr_tls_array cr_tsk_arch cr_type cr_uid cr_umask cr_unx cr_uts_ns cr_uts_release cr_version cr_vm_end cr_vm_flags cr_vm_page_prot cr_vm_pgoff cr_vm_start cr_winsize cr_write cr_write_buffer cr_write_cpu cr_write_error cr_write_fd_data cr_write_files cr_write_head cr_write_head_arch cr_write_mm_context cr_write_obj cr_write_thread cr_write_vma cr_ws_col cr_ws_row cr_ws_xpixel cr_ws_ypixel cr_xstate The cr_ prefix for internal symbols and fields should be dropped. For global symbols, the following naming scheme looks pretty intuitive, unique and self-descriptive to me: state_checkpoint_XYZ() state_restore_XYZ() state_collect_XYZ() state_dump_XYZ() state_image_XYZ() ... Seeing a state_*() symbol anywhere in the kernel will tell me 'ah, this is kernel object state checkpoint/restore related functionality', at a glance. [ And equally importantly, when seeing a state_*() symbol i can skip it easily, as an uninteresting-to-my-current-activity item. ] I had a quick look at various existing uses of "\<state_.*" symbols in the kernel, and no widely used variant looked particularly confusing to me, so i think this would be a good choice IMHO. Ingo ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style 2009-04-13 18:39 ` Linus Torvalds 2009-04-13 19:30 ` Ingo Molnar @ 2009-04-14 12:29 ` Alexey Dobriyan 2009-04-14 13:44 ` Ingo Molnar 2009-04-14 17:09 ` Linus Torvalds 1 sibling, 2 replies; 37+ messages in thread From: Alexey Dobriyan @ 2009-04-14 12:29 UTC (permalink / raw) To: Linus Torvalds Cc: akpm, containers, xemul, serue, dave, mingo, orenl, hch, linux-kernel On Mon, Apr 13, 2009 at 11:39:51AM -0700, Linus Torvalds wrote: > > > On Mon, 13 Apr 2009, Alexey Dobriyan wrote: > > > > Well, in OpenVZ everything is in kernel/cpt/ and prefixed with "cpt_" > > and "rst_". > > So? > > We're not merging OpenVZ code _either_. This is to give example of other prefixes: cpt_ and rst_ Are they fine? > > And I think "cr_" is super nice prefix: it's short, it's C-like, > > it reminds about restart part > > It does no such thing. THAT'S THE POINT. "cr" means _nothing_ to anybody > else than some CR-specific people, and those people don't even need it! > > Look around you. We try to use nicer names. We spell out "cpufreq", we > don't call it "cf". ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style 2009-04-14 12:29 ` Alexey Dobriyan @ 2009-04-14 13:44 ` Ingo Molnar 2009-04-14 16:53 ` Alexey Dobriyan 2009-04-14 17:09 ` Linus Torvalds 1 sibling, 1 reply; 37+ messages in thread From: Ingo Molnar @ 2009-04-14 13:44 UTC (permalink / raw) To: Alexey Dobriyan Cc: Linus Torvalds, akpm, containers, xemul, serue, dave, orenl, hch, linux-kernel * Alexey Dobriyan <adobriyan@gmail.com> wrote: > On Mon, Apr 13, 2009 at 11:39:51AM -0700, Linus Torvalds wrote: > > > > > > On Mon, 13 Apr 2009, Alexey Dobriyan wrote: > > > > > > Well, in OpenVZ everything is in kernel/cpt/ and prefixed with "cpt_" > > > and "rst_". > > > > So? > > > > We're not merging OpenVZ code _either_. > > This is to give example of other prefixes: cpt_ and rst_ > Are they fine? Not really. 'rst' can be easily mistaken for 'reset' and neither really tells me at a glance what they do. They are also quite tongue-twisters. See my namespace analysis and suggestions from yesterday for a proper naming scheme. The key i believe is to move away from this singular 'the world is all about checkpoint and restore', and move it to a IMHO clearer state_*() type of naming which really isolates all these kernel state save/restore management APIs from other kernel APIs. (See my mail from yesterday for details.) kstate_*() would be another, perhaps even clearer naming scheme. I.e.: kstate_checkpoint_XYZ() kstate_restore_XYZ() kstate_collect_XYZ() kstate_dump_XYZ() kstate_image_XYZ() ... Just _look_ at them - they are expressive at a glance, and reasonably short. That is the kind of first-time impression we need, not a 'wtf?' moment. I just checked, there's zero hits on "git grep \<kstate_" in the kernel, so it's a pristine namespace. IMHO, go wild ... Ingo ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style 2009-04-14 13:44 ` Ingo Molnar @ 2009-04-14 16:53 ` Alexey Dobriyan 0 siblings, 0 replies; 37+ messages in thread From: Alexey Dobriyan @ 2009-04-14 16:53 UTC (permalink / raw) To: Ingo Molnar Cc: Linus Torvalds, akpm, containers, xemul, serue, dave, orenl, hch, linux-kernel On Tue, Apr 14, 2009 at 03:44:20PM +0200, Ingo Molnar wrote: > > * Alexey Dobriyan <adobriyan@gmail.com> wrote: > > > On Mon, Apr 13, 2009 at 11:39:51AM -0700, Linus Torvalds wrote: > > > > > > > > > On Mon, 13 Apr 2009, Alexey Dobriyan wrote: > > > > > > > > Well, in OpenVZ everything is in kernel/cpt/ and prefixed with "cpt_" > > > > and "rst_". > > > > > > So? > > > > > > We're not merging OpenVZ code _either_. > > > > This is to give example of other prefixes: cpt_ and rst_ > > Are they fine? > > Not really. 'rst' can be easily mistaken for 'reset' and neither > really tells me at a glance what they do. They are also quite > tongue-twisters. > > See my namespace analysis and suggestions from yesterday for a > proper naming scheme. > > The key i believe is to move away from this singular 'the world is > all about checkpoint and restore', and move it to a IMHO clearer > state_*() type of naming which really isolates all these kernel > state save/restore management APIs from other kernel APIs. (See my > mail from yesterday for details.) > > kstate_*() would be another, perhaps even clearer naming scheme. > I.e.: > > kstate_checkpoint_XYZ() > kstate_restore_XYZ() > kstate_collect_XYZ() > kstate_dump_XYZ() > kstate_image_XYZ() > ... > > Just _look_ at them - they are expressive at a glance, and > reasonably short. That is the kind of first-time impression > we need, not a 'wtf?' moment. > > I just checked, there's zero hits on "git grep \<kstate_" in the > kernel, so it's a pristine namespace. IMHO, go wild ... Need to try it for real. One minor nit. This kstate_ doesn't include quite a noticable of in-kernel state. For example, task readahead state isn't relevant to C/R at all. It's state but irrelevant state. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style 2009-04-14 12:29 ` Alexey Dobriyan 2009-04-14 13:44 ` Ingo Molnar @ 2009-04-14 17:09 ` Linus Torvalds 2009-04-14 17:19 ` Randy Dunlap 1 sibling, 1 reply; 37+ messages in thread From: Linus Torvalds @ 2009-04-14 17:09 UTC (permalink / raw) To: Alexey Dobriyan Cc: akpm, containers, xemul, serue, dave, mingo, orenl, hch, linux-kernel On Tue, 14 Apr 2009, Alexey Dobriyan wrote: > > > > We're not merging OpenVZ code _either_. > > This is to give example of other prefixes: cpt_ and rst_ > Are they fine? Do you secretly work for IBM? IBM has a well-known disdain for vowels, and basically refuses to use them for mnemonics (they were called on this, and did "eieio" as an instruction just to try to make up for it). But I'm from Finland. In Finnish, about 75% of all letters are vowels. I find this dis-emvoweling to be stupid and impractical. Without vowels, you can't tell Finnish words apart (admittedly, _with_ vowels, you generally cannot pronounce them, so to a non-Finn it doesn't much matter). My point is, let's go for a happy medium - LEAVE THE F*CKING VOWELS IN PLACE ALREADY. So let's call it "checkpoint" and "restore". Ok? Linus ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style 2009-04-14 17:09 ` Linus Torvalds @ 2009-04-14 17:19 ` Randy Dunlap 2009-04-14 17:32 ` Linus Torvalds 0 siblings, 1 reply; 37+ messages in thread From: Randy Dunlap @ 2009-04-14 17:19 UTC (permalink / raw) To: Linus Torvalds Cc: Alexey Dobriyan, akpm, containers, xemul, serue, dave, mingo, orenl, hch, linux-kernel Linus Torvalds wrote: > > On Tue, 14 Apr 2009, Alexey Dobriyan wrote: >>> We're not merging OpenVZ code _either_. >> This is to give example of other prefixes: cpt_ and rst_ >> Are they fine? http://www.rfc-editor.org/rfc/rfc5513.txt > Do you secretly work for IBM? > > IBM has a well-known disdain for vowels, and basically refuses to use them > for mnemonics (they were called on this, and did "eieio" as an instruction > just to try to make up for it). > > But I'm from Finland. In Finnish, about 75% of all letters are vowels. I > find this dis-emvoweling to be stupid and impractical. Without vowels, you > can't tell Finnish words apart (admittedly, _with_ vowels, you generally > cannot pronounce them, so to a non-Finn it doesn't much matter). > > My point is, let's go for a happy medium - LEAVE THE F*CKING VOWELS IN > PLACE ALREADY. > > So let's call it "checkpoint" and "restore". Ok? restart ? -- ~Randy ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style 2009-04-14 17:19 ` Randy Dunlap @ 2009-04-14 17:32 ` Linus Torvalds 0 siblings, 0 replies; 37+ messages in thread From: Linus Torvalds @ 2009-04-14 17:32 UTC (permalink / raw) To: Randy Dunlap Cc: Alexey Dobriyan, akpm, containers, xemul, serue, dave, mingo, orenl, hch, linux-kernel On Tue, 14 Apr 2009, Randy Dunlap wrote: > > > > So let's call it "checkpoint" and "restore". Ok? > > restart ? Even better (and proving my point that trying to use contractions like "rst" can be misleading, although I agree with Ingo that the most common mistake would be "reset") Linus ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style 2009-04-10 2:32 [PATCH 00/30] C/R OpenVZ/Virtuozzo style Alexey Dobriyan ` (3 preceding siblings ...) 2009-04-10 15:06 ` Linus Torvalds @ 2009-04-14 5:46 ` Oren Laadan 2009-04-14 15:19 ` Alexey Dobriyan 4 siblings, 1 reply; 37+ messages in thread From: Oren Laadan @ 2009-04-14 5:46 UTC (permalink / raw) To: Alexey Dobriyan Cc: akpm, containers, xemul, serue, dave, mingo, hch, torvalds, linux-kernel Some meta comments about this patch set: * Patches 1-9 are cleanups, unrelated to checkpoint/restart. They deserve a separate thread. * You barely take locks or reference counts to objects that you later refer to. What if something really bad happens ? * (contd) If you don't take locks, then you at the very least need to rely on the container remaining frozen for the duration of the operation (during checkpoint). * (contd) Still with locks and references, during restart you can't even freeze the container, so need extra logic to prevent bad things (e.g. OOM killer, signal or ptrace from parent container etc). * What is the rationale behind doing the freeze/unfreeze from within sys_checkpoint/sys_restart ? Instead of you let userspace do it (and only verify in kernel) you gain, again, flexibility. For example, you want to also snapshot the filesystem, then userspace will do something like: freeze container -> snapshot filesystem -> checkpoint -> thaw container. * A plethora of "FIXME" comments ... Alexey Dobriyan wrote: > This is to show how we see C/R and to provoke discussion on number of > important issues (mounts, ...). Quoting your patch: --- This is one big FIXME: What to do with overmounted files? What to do with mounts at all, who should restore them? just restore something to not oops on task exit --- > > This is small part of long-awaited to be cleanuped code. > > It's able to restore busyloop on i386 and x86_64 and restore i386 > busyloop on x86_64. It wasn't tested much more than that. Oh .. I really wish you'd sent a x86_64 patch our way, too ;) > > I'm currently starting formal testsuite, otherwise it's whack-a-mole > game and formal TODO list (a huge one). > So I'm still struggling to see the major different in the approaches that would justify throwing away our hard worked, reviewed, tested and functional code, and take this - similar in design, largely incomplete and unreviewed code. Best, Oren. ^ permalink raw reply [flat|nested] 37+ messages in thread
* Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style 2009-04-14 5:46 ` Oren Laadan @ 2009-04-14 15:19 ` Alexey Dobriyan 0 siblings, 0 replies; 37+ messages in thread From: Alexey Dobriyan @ 2009-04-14 15:19 UTC (permalink / raw) To: Oren Laadan Cc: akpm, containers, xemul, serue, dave, mingo, hch, torvalds, linux-kernel On Tue, Apr 14, 2009 at 01:46:36AM -0400, Oren Laadan wrote: > Some meta comments about this patch set: > > * Patches 1-9 are cleanups, unrelated to checkpoint/restart. They > deserve a separate thread. They will be sent separatedly. > * You barely take locks or reference counts to objects that you > later refer to. What if something really bad happens ? I'm thinking right now what is needed and isn't. Tasks are frozen so unmap VMA in the middle of dump can't happen. Refcounts aren't needed because tasks are frozen but alive and pin everything needed. This is during dump part. What is needed is impossibility of a task to dissapear in frozen state. On restart, I probably leak refcount to every object, but haven't checked just it. > * (contd) If you don't take locks, then you at the very least need > to rely on the container remaining frozen for the duration of the > operation (during checkpoint). > * (contd) Still with locks and references, during restart you can't > even freeze the container, so need extra logic to prevent bad things > (e.g. OOM killer, signal or ptrace from parent container etc). > > * What is the rationale behind doing the freeze/unfreeze from within > sys_checkpoint/sys_restart ? Instead of you let userspace do it > (and only verify in kernel) you gain, again, flexibility. For example, > you want to also snapshot the filesystem, then userspace will do > something like: freeze container -> snapshot filesystem -> checkpoint > -> thaw container. This is incomplete part. But, yes, freeze, dump, thaw/kill as separate actions make sense. checkpoint(CR_CPT_FREEZE); [rsync fs] checkpoint(CR_CPT_DUMP|CR_CPT_KILL); with check that CR_CPT_THAW doesn't happen during dump. > * A plethora of "FIXME" comments ... > > Alexey Dobriyan wrote: > > This is to show how we see C/R and to provoke discussion on number of > > important issues (mounts, ...). > > Quoting your patch: > --- > This is one big FIXME: > What to do with overmounted files? > What to do with mounts at all, who should restore them? > > just restore something to not oops on task exit This is just a skeleton for mnt_ns checkpoint, not seriously suggested to merge it. > > This is small part of long-awaited to be cleanuped code. > > > > It's able to restore busyloop on i386 and x86_64 and restore i386 > > busyloop on x86_64. It wasn't tested much more than that. > > Oh .. I really wish you'd sent a x86_64 patch our way, too ;) > > > > > I'm currently starting formal testsuite, otherwise it's whack-a-mole > > game and formal TODO list (a huge one). > > > > So I'm still struggling to see the major different in the approaches > that would justify throwing away our hard worked, reviewed, tested > and functional code, and take this - similar in design, largely > incomplete and unreviewed code. ^ permalink raw reply [flat|nested] 37+ messages in thread
end of thread, other threads:[~2009-04-16 16:29 UTC | newest] Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2009-04-10 2:32 [PATCH 00/30] C/R OpenVZ/Virtuozzo style Alexey Dobriyan 2009-04-10 2:44 ` Alexey Dobriyan 2009-04-10 5:07 ` Dave Hansen 2009-04-13 9:14 ` Alexey Dobriyan 2009-04-13 11:16 ` Dave Hansen 2009-04-13 18:07 ` Dave Hansen 2009-04-14 4:26 ` Oren Laadan 2009-04-14 14:58 ` Alexey Dobriyan 2009-04-14 18:08 ` Oren Laadan 2009-04-14 18:34 ` Alexey Dobriyan 2009-04-14 19:31 ` Oren Laadan 2009-04-14 20:08 ` Alexey Dobriyan 2009-04-14 20:49 ` Alexey Dobriyan 2009-04-14 21:11 ` Dave Hansen 2009-04-14 21:39 ` Serge E. Hallyn 2009-04-15 19:21 ` CAP_SYS_ADMIN on restart(2) (was: Re: [PATCH 00/30] C/R OpenVZ/Virtuozzo style) Alexey Dobriyan 2009-04-15 20:22 ` Serge E. Hallyn 2009-04-15 20:23 ` Dave Hansen 2009-04-15 20:39 ` Serge E. Hallyn 2009-04-15 21:05 ` CAP_SYS_ADMIN on restart(2) Oren Laadan 2009-04-15 21:16 ` Serge E. Hallyn 2009-04-16 15:35 ` Alexey Dobriyan 2009-04-16 16:29 ` Serge E. Hallyn 2009-04-10 8:28 ` [PATCH 00/30] C/R OpenVZ/Virtuozzo style Ingo Molnar 2009-04-10 11:45 ` Alexey Dobriyan 2009-04-10 15:06 ` Linus Torvalds 2009-04-13 7:39 ` Alexey Dobriyan 2009-04-13 18:39 ` Linus Torvalds 2009-04-13 19:30 ` Ingo Molnar 2009-04-14 12:29 ` Alexey Dobriyan 2009-04-14 13:44 ` Ingo Molnar 2009-04-14 16:53 ` Alexey Dobriyan 2009-04-14 17:09 ` Linus Torvalds 2009-04-14 17:19 ` Randy Dunlap 2009-04-14 17:32 ` Linus Torvalds 2009-04-14 5:46 ` Oren Laadan 2009-04-14 15:19 ` Alexey Dobriyan
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).