* [PATCH 0/8] a start to credentials c/r @ 2009-05-26 17:32 Serge E. Hallyn 2009-05-26 17:33 ` [PATCH 1/8] cr: break out new_user_ns() Serge E. Hallyn ` (8 more replies) 0 siblings, 9 replies; 17+ messages in thread From: Serge E. Hallyn @ 2009-05-26 17:32 UTC (permalink / raw) To: Oren Laadan Cc: Linux Containers, David Howells, Alexey Dobriyan, linux-security-module Following is the next version of the credentials c/r patchset, on top of the c/r patchset at git://git.ncl.cs.columbia.edu/pub/git/linux-cr.git It implements checkpoint and restart of user, user namespaces, groups, supplementary groups, and struct cred. There is a question as to what to do about LSM data at restart. Right now I'm ignoring it, which means that prepare_creds() should ensure that the restart tasks get the context of the task calling sys_restart(). I suspect the right thing to do is to add two new LSM hooks, one which checks current's authorization to restart from the checkpoint file, and one which determines the task->cred->security filed based upon any of: 1. current_security() of the task calling sys_restart() 2. the task->cred->security checkpointed in the ckpt file 3. the ->security of the checkpoint file Oren, I think this version has all the changes you asked for except for restoring cred info for sysvipc. thanks, -serge ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 1/8] cr: break out new_user_ns() 2009-05-26 17:32 [PATCH 0/8] a start to credentials c/r Serge E. Hallyn @ 2009-05-26 17:33 ` Serge E. Hallyn 2009-05-26 17:33 ` [PATCH 2/8] cr: split core function out of some set*{u,g}id functions Serge E. Hallyn ` (7 subsequent siblings) 8 siblings, 0 replies; 17+ messages in thread From: Serge E. Hallyn @ 2009-05-26 17:33 UTC (permalink / raw) To: Oren Laadan Cc: Linux Containers, David Howells, Alexey Dobriyan, linux-security-module Break out the core function which checks privilege and (if allowed) creates a new user namespace, with the passed-in creating user_struct. Note that a user_namespace, unlike other namespace pointers, is not stored in the nsproxy. Rather it is purely a property of user_structs. This will let us keep the task restore code simpler. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> --- include/linux/user_namespace.h | 8 ++++++ kernel/user_namespace.c | 53 ++++++++++++++++++++++++++++------------ 2 files changed, 45 insertions(+), 16 deletions(-) diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index cc4f453..a2b82d5 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -20,6 +20,8 @@ extern struct user_namespace init_user_ns; #ifdef CONFIG_USER_NS +struct user_namespace *new_user_ns(struct user_struct *creator, + struct user_struct **newroot); static inline struct user_namespace *get_user_ns(struct user_namespace *ns) { if (ns) @@ -38,6 +40,12 @@ static inline void put_user_ns(struct user_namespace *ns) #else +static inline struct user_namespace *new_user_ns(struct user_struct *creator, + struct user_struct **newroot) +{ + return -EINVAL; +} + static inline struct user_namespace *get_user_ns(struct user_namespace *ns) { return &init_user_ns; diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index 076c7c8..e624b0f 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -11,15 +11,8 @@ #include <linux/user_namespace.h> #include <linux/cred.h> -/* - * Create a new user namespace, deriving the creator from the user in the - * passed credentials, and replacing that user with the new root user for the - * new namespace. - * - * This is called by copy_creds(), which will finish setting the target task's - * credentials. - */ -int create_user_ns(struct cred *new) +static struct user_namespace *_new_user_ns(struct user_struct *creator, + struct user_struct **newroot) { struct user_namespace *ns; struct user_struct *root_user; @@ -27,7 +20,7 @@ int create_user_ns(struct cred *new) ns = kmalloc(sizeof(struct user_namespace), GFP_KERNEL); if (!ns) - return -ENOMEM; + return ERR_PTR(-ENOMEM); kref_init(&ns->kref); @@ -38,12 +31,43 @@ int create_user_ns(struct cred *new) root_user = alloc_uid(ns, 0); if (!root_user) { kfree(ns); - return -ENOMEM; + return ERR_PTR(-ENOMEM); } /* set the new root user in the credentials under preparation */ - ns->creator = new->user; - new->user = root_user; + ns->creator = creator; + + /* alloc_uid() incremented the userns refcount. Just set it to 1 */ + kref_set(&ns->kref, 1); + + *newroot = root_user; + return ns; +} + +struct user_namespace *new_user_ns(struct user_struct *creator, + struct user_struct **newroot) +{ + if (!capable(CAP_SYS_ADMIN)) + return ERR_PTR(-EPERM); + return _new_user_ns(creator, newroot); +} + +/* + * Create a new user namespace, deriving the creator from the user in the + * passed credentials, and replacing that user with the new root user for the + * new namespace. + * + * This is called by copy_creds(), which will finish setting the target task's + * credentials. + */ +int create_user_ns(struct cred *new) +{ + struct user_namespace *ns; + + ns = new_user_ns(new->user, &new->user); + if (IS_ERR(ns)) + return PTR_ERR(ns); + new->uid = new->euid = new->suid = new->fsuid = 0; new->gid = new->egid = new->sgid = new->fsgid = 0; put_group_info(new->group_info); @@ -54,9 +78,6 @@ int create_user_ns(struct cred *new) #endif /* tgcred will be cleared in our caller bc CLONE_THREAD won't be set */ - /* alloc_uid() incremented the userns refcount. Just set it to 1 */ - kref_set(&ns->kref, 1); - return 0; } -- 1.6.1 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 2/8] cr: split core function out of some set*{u,g}id functions 2009-05-26 17:32 [PATCH 0/8] a start to credentials c/r Serge E. Hallyn 2009-05-26 17:33 ` [PATCH 1/8] cr: break out new_user_ns() Serge E. Hallyn @ 2009-05-26 17:33 ` Serge E. Hallyn 2009-05-26 17:33 ` [PATCH 3/8] cr: capabilities: define checkpoint and restore fns Serge E. Hallyn ` (6 subsequent siblings) 8 siblings, 0 replies; 17+ messages in thread From: Serge E. Hallyn @ 2009-05-26 17:33 UTC (permalink / raw) To: Oren Laadan Cc: Linux Containers, David Howells, Alexey Dobriyan, linux-security-module When restarting tasks, we want to be able to change xuid and xgid in a struct cred, and do so with security checks. Break the core functionality of set{fs,res}{u,g}id into cred_setX which performs the access checks based on current_cred(), but performs the requested change on a passed-in cred. This will allow us to securely construct struct creds based on a checkpoint image, constrained by the caller's permissions, and apply them to the caller at the end of sys_restart(). Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> --- include/linux/cred.h | 8 +++ kernel/cred.c | 114 ++++++++++++++++++++++++++++++++++++++++++ kernel/sys.c | 134 ++++++++------------------------------------------ 3 files changed, 143 insertions(+), 113 deletions(-) diff --git a/include/linux/cred.h b/include/linux/cred.h index 3282ee4..bc5ffc2 100644 --- a/include/linux/cred.h +++ b/include/linux/cred.h @@ -20,6 +20,9 @@ struct user_struct; struct cred; struct inode; +/* defined in sys.c, used in cred_setresuid */ +extern int set_user(struct cred *new); + /* * COW Supplementary groups list */ @@ -343,4 +346,9 @@ do { \ *(_fsgid) = __cred->fsgid; \ } while(0) +int cred_setresuid(struct cred *new, uid_t ruid, uid_t euid, uid_t suid); +int cred_setresgid(struct cred *new, gid_t rgid, gid_t egid, gid_t sgid); +int cred_setfsuid(struct cred *new, uid_t uid, uid_t *old_fsuid); +int cred_setfsgid(struct cred *new, gid_t gid, gid_t *old_fsgid); + #endif /* _LINUX_CRED_H */ diff --git a/kernel/cred.c b/kernel/cred.c index 3a03918..a017399 100644 --- a/kernel/cred.c +++ b/kernel/cred.c @@ -589,3 +589,117 @@ int set_create_files_as(struct cred *new, struct inode *inode) return security_kernel_create_files_as(new, inode); } EXPORT_SYMBOL(set_create_files_as); + +int cred_setresuid(struct cred *new, uid_t ruid, uid_t euid, uid_t suid) +{ + int retval; + const struct cred *old; + + retval = security_task_setuid(ruid, euid, suid, LSM_SETID_RES); + if (retval) + return retval; + old = current_cred(); + + if (!capable(CAP_SETUID)) { + if (ruid != (uid_t) -1 && ruid != old->uid && + ruid != old->euid && ruid != old->suid) + return -EPERM; + if (euid != (uid_t) -1 && euid != old->uid && + euid != old->euid && euid != old->suid) + return -EPERM; + if (suid != (uid_t) -1 && suid != old->uid && + suid != old->euid && suid != old->suid) + return -EPERM; + } + + if (ruid != (uid_t) -1) { + new->uid = ruid; + if (ruid != old->uid) { + retval = set_user(new); + if (retval < 0) + return retval; + } + } + if (euid != (uid_t) -1) + new->euid = euid; + if (suid != (uid_t) -1) + new->suid = suid; + new->fsuid = new->euid; + + return security_task_fix_setuid(new, old, LSM_SETID_RES); +} + +int cred_setresgid(struct cred *new, gid_t rgid, gid_t egid, + gid_t sgid) +{ + const struct cred *old = current_cred(); + int retval; + + retval = security_task_setgid(rgid, egid, sgid, LSM_SETID_RES); + if (retval) + return retval; + + if (!capable(CAP_SETGID)) { + if (rgid != (gid_t) -1 && rgid != old->gid && + rgid != old->egid && rgid != old->sgid) + return -EPERM; + if (egid != (gid_t) -1 && egid != old->gid && + egid != old->egid && egid != old->sgid) + return -EPERM; + if (sgid != (gid_t) -1 && sgid != old->gid && + sgid != old->egid && sgid != old->sgid) + return -EPERM; + } + + if (rgid != (gid_t) -1) + new->gid = rgid; + if (egid != (gid_t) -1) + new->egid = egid; + if (sgid != (gid_t) -1) + new->sgid = sgid; + new->fsgid = new->egid; + return 0; +} + +int cred_setfsuid(struct cred *new, uid_t uid, uid_t *old_fsuid) +{ + const struct cred *old; + + old = current_cred(); + *old_fsuid = old->fsuid; + + if (security_task_setuid(uid, (uid_t)-1, (uid_t)-1, LSM_SETID_FS) < 0) + return -EPERM; + + if (uid == old->uid || uid == old->euid || + uid == old->suid || uid == old->fsuid || + capable(CAP_SETUID)) { + if (uid != *old_fsuid) { + new->fsuid = uid; + if (security_task_fix_setuid(new, old, LSM_SETID_FS) == 0) + return 0; + } + } + return -EPERM; +} + +int cred_setfsgid(struct cred *new, gid_t gid, gid_t *old_fsgid) +{ + const struct cred *old; + + old = current_cred(); + *old_fsgid = old->fsgid; + + if (security_task_setgid(gid, (gid_t)-1, (gid_t)-1, LSM_SETID_FS)) + return -EPERM; + + if (gid == old->gid || gid == old->egid || + gid == old->sgid || gid == old->fsgid || + capable(CAP_SETGID)) { + if (gid != *old_fsgid) { + new->fsgid = gid; + return 0; + } + } + return -EPERM; +} diff --git a/kernel/sys.c b/kernel/sys.c index e7998cf..fe5dcfe 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -558,11 +558,12 @@ error: /* * change the user struct in a credentials set to match the new UID */ -static int set_user(struct cred *new) +int set_user(struct cred *new) { struct user_struct *new_user; - new_user = alloc_uid(current_user_ns(), new->uid); + /* is this ok? */ + new_user = alloc_uid(new->user->user_ns, new->uid); if (!new_user) return -EAGAIN; @@ -703,14 +704,12 @@ error: return retval; } - /* * This function implements a generic ability to update ruid, euid, * and suid. This allows you to implement the 4.4 compatible seteuid(). */ SYSCALL_DEFINE3(setresuid, uid_t, ruid, uid_t, euid, uid_t, suid) { - const struct cred *old; struct cred *new; int retval; @@ -718,45 +717,10 @@ SYSCALL_DEFINE3(setresuid, uid_t, ruid, uid_t, euid, uid_t, suid) if (!new) return -ENOMEM; - retval = security_task_setuid(ruid, euid, suid, LSM_SETID_RES); - if (retval) - goto error; - old = current_cred(); - - retval = -EPERM; - if (!capable(CAP_SETUID)) { - if (ruid != (uid_t) -1 && ruid != old->uid && - ruid != old->euid && ruid != old->suid) - goto error; - if (euid != (uid_t) -1 && euid != old->uid && - euid != old->euid && euid != old->suid) - goto error; - if (suid != (uid_t) -1 && suid != old->uid && - suid != old->euid && suid != old->suid) - goto error; - } + retval = cred_setresuid(new, ruid, euid, suid); + if (retval == 0) + return commit_creds(new); - if (ruid != (uid_t) -1) { - new->uid = ruid; - if (ruid != old->uid) { - retval = set_user(new); - if (retval < 0) - goto error; - } - } - if (euid != (uid_t) -1) - new->euid = euid; - if (suid != (uid_t) -1) - new->suid = suid; - new->fsuid = new->euid; - - retval = security_task_fix_setuid(new, old, LSM_SETID_RES); - if (retval < 0) - goto error; - - return commit_creds(new); - -error: abort_creds(new); return retval; } @@ -778,43 +742,17 @@ SYSCALL_DEFINE3(getresuid, uid_t __user *, ruid, uid_t __user *, euid, uid_t __u */ SYSCALL_DEFINE3(setresgid, gid_t, rgid, gid_t, egid, gid_t, sgid) { - const struct cred *old; struct cred *new; int retval; new = prepare_creds(); if (!new) return -ENOMEM; - old = current_cred(); - retval = security_task_setgid(rgid, egid, sgid, LSM_SETID_RES); - if (retval) - goto error; + retval = cred_setresgid(new, rgid, egid, sgid); + if (retval == 0) + return commit_creds(new); - retval = -EPERM; - if (!capable(CAP_SETGID)) { - if (rgid != (gid_t) -1 && rgid != old->gid && - rgid != old->egid && rgid != old->sgid) - goto error; - if (egid != (gid_t) -1 && egid != old->gid && - egid != old->egid && egid != old->sgid) - goto error; - if (sgid != (gid_t) -1 && sgid != old->gid && - sgid != old->egid && sgid != old->sgid) - goto error; - } - - if (rgid != (gid_t) -1) - new->gid = rgid; - if (egid != (gid_t) -1) - new->egid = egid; - if (sgid != (gid_t) -1) - new->sgid = sgid; - new->fsgid = new->egid; - - return commit_creds(new); - -error: abort_creds(new); return retval; } @@ -831,7 +769,6 @@ SYSCALL_DEFINE3(getresgid, gid_t __user *, rgid, gid_t __user *, egid, gid_t __u return retval; } - /* * "setfsuid()" sets the fsuid - the uid used for filesystem checks. This * is used for "access()" and for the NFS daemon (letting nfsd stay at @@ -840,35 +777,20 @@ SYSCALL_DEFINE3(getresgid, gid_t __user *, rgid, gid_t __user *, egid, gid_t __u */ SYSCALL_DEFINE1(setfsuid, uid_t, uid) { - const struct cred *old; struct cred *new; uid_t old_fsuid; + int retval; new = prepare_creds(); if (!new) return current_fsuid(); - old = current_cred(); - old_fsuid = old->fsuid; - - if (security_task_setuid(uid, (uid_t)-1, (uid_t)-1, LSM_SETID_FS) < 0) - goto error; - if (uid == old->uid || uid == old->euid || - uid == old->suid || uid == old->fsuid || - capable(CAP_SETUID)) { - if (uid != old_fsuid) { - new->fsuid = uid; - if (security_task_fix_setuid(new, old, LSM_SETID_FS) == 0) - goto change_okay; - } - } - -error: - abort_creds(new); - return old_fsuid; + retval = cred_setfsuid(new, uid, &old_fsuid); + if (retval == 0) + commit_creds(new); + else + abort_creds(new); -change_okay: - commit_creds(new); return old_fsuid; } @@ -877,34 +799,20 @@ change_okay: */ SYSCALL_DEFINE1(setfsgid, gid_t, gid) { - const struct cred *old; struct cred *new; gid_t old_fsgid; + int retval; new = prepare_creds(); if (!new) return current_fsgid(); - old = current_cred(); - old_fsgid = old->fsgid; - - if (security_task_setgid(gid, (gid_t)-1, (gid_t)-1, LSM_SETID_FS)) - goto error; - if (gid == old->gid || gid == old->egid || - gid == old->sgid || gid == old->fsgid || - capable(CAP_SETGID)) { - if (gid != old_fsgid) { - new->fsgid = gid; - goto change_okay; - } - } - -error: - abort_creds(new); - return old_fsgid; + retval = cred_setfsgid(new, gid, &old_fsgid); + if (retval == 0) + commit_creds(new); + else + abort_creds(new); -change_okay: - commit_creds(new); return old_fsgid; } -- 1.6.1 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 3/8] cr: capabilities: define checkpoint and restore fns 2009-05-26 17:32 [PATCH 0/8] a start to credentials c/r Serge E. Hallyn 2009-05-26 17:33 ` [PATCH 1/8] cr: break out new_user_ns() Serge E. Hallyn 2009-05-26 17:33 ` [PATCH 2/8] cr: split core function out of some set*{u,g}id functions Serge E. Hallyn @ 2009-05-26 17:33 ` Serge E. Hallyn 2009-05-26 17:33 ` [PATCH 4/8] groups: move code to kernel/groups.c Serge E. Hallyn ` (5 subsequent siblings) 8 siblings, 0 replies; 17+ messages in thread From: Serge E. Hallyn @ 2009-05-26 17:33 UTC (permalink / raw) To: Oren Laadan Cc: Linux Containers, David Howells, Alexey Dobriyan, linux-security-module An application checkpoint image will store capability sets (and the bounding set) as __u64s. Define checkpoint and restart functions to translate between those and kernel_cap_t's. Define a common function do_capset_tocred() which applies capability set changes to a passed-in struct cred. The restore function uses do_capset_tocred() to apply the restored capabilities to the struct cred being crafted, subject to the current task's (task executing sys_restart()) permissions. TODO: one day we'll want to c/r the securebits as well. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> --- include/linux/capability.h | 5 ++ kernel/capability.c | 94 +++++++++++++++++++++++++++++++++++++------ 2 files changed, 86 insertions(+), 13 deletions(-) diff --git a/include/linux/capability.h b/include/linux/capability.h index c302110..572b5a0 100644 --- a/include/linux/capability.h +++ b/include/linux/capability.h @@ -536,6 +536,11 @@ extern const kernel_cap_t __cap_empty_set; extern const kernel_cap_t __cap_full_set; extern const kernel_cap_t __cap_init_eff_set; +extern void checkpoint_save_cap(__u64 *dest, kernel_cap_t src); +struct cred; +extern int checkpoint_restore_cap(__u64 e, __u64 i, __u64 p, __u64 x, + struct cred *cred); + /** * has_capability - Determine if a task has a superior capability available * @t: The task in question diff --git a/kernel/capability.c b/kernel/capability.c index 4e17041..d2c9bb3 100644 --- a/kernel/capability.c +++ b/kernel/capability.c @@ -217,6 +217,45 @@ SYSCALL_DEFINE2(capget, cap_user_header_t, header, cap_user_data_t, dataptr) return ret; } +static int do_capset_tocred(kernel_cap_t *effective, kernel_cap_t *inheritable, + kernel_cap_t *permitted, struct cred *new) +{ + int ret; + + ret = security_capset(new, current_cred(), + effective, inheritable, permitted); + if (ret < 0) + return ret; + + /* + * for checkpoint-restart, do we want to wait until end of restart? + * not sure we care */ + audit_log_capset(current->pid, new, current_cred()); + + return 0; +} + +static int do_capset(kernel_cap_t *effective, kernel_cap_t *inheritable, + kernel_cap_t *permitted) +{ + struct cred *new; + int ret; + + new = prepare_creds(); + if (!new) + return -ENOMEM; + + ret = do_capset_tocred(effective, inheritable, permitted, new); + if (ret < 0) + goto error; + + return commit_creds(new); + +error: + abort_creds(new); + return ret; +} + /** * sys_capset - set capabilities for a process or (*) a group of processes * @header: pointer to struct that contains capability version and @@ -240,7 +279,6 @@ SYSCALL_DEFINE2(capset, cap_user_header_t, header, const cap_user_data_t, data) struct __user_cap_data_struct kdata[_KERNEL_CAPABILITY_U32S]; unsigned i, tocopy; kernel_cap_t inheritable, permitted, effective; - struct cred *new; int ret; pid_t pid; @@ -271,22 +309,52 @@ SYSCALL_DEFINE2(capset, cap_user_header_t, header, const cap_user_data_t, data) i++; } - new = prepare_creds(); - if (!new) - return -ENOMEM; + return do_capset(&effective, &inheritable, &permitted); - ret = security_capset(new, current_cred(), - &effective, &inheritable, &permitted); - if (ret < 0) - goto error; +} - audit_log_capset(pid, new, current_cred()); - return commit_creds(new); +void checkpoint_save_cap(__u64 *dest, kernel_cap_t src) +{ + *dest = src.cap[0] | (src.cap[1] << sizeof(__u32)); +} -error: - abort_creds(new); - return ret; +static void do_capbset_drop(struct cred *cred, int cap) +{ + cap_lower(cred->cap_bset, cap); +} + +int checkpoint_restore_cap(__u64 newe, __u64 newi, __u64 newp, __u64 newx, + struct cred *cred) +{ + kernel_cap_t effective, inheritable, permitted, bset; + int may_dropbcap = capable(CAP_SETPCAP); + int ret, i; + + effective.cap[0] = newe; + effective.cap[1] = (newe >> sizeof(__u32)); + inheritable.cap[0] = newi; + inheritable.cap[1] = (newi >> sizeof(__u32)); + permitted.cap[0] = newp; + permitted.cap[1] = (newp >> sizeof(__u32)); + bset.cap[0] = newx; + bset.cap[1] = (newx >> sizeof(__u32)); + + ret = do_capset_tocred(&effective, &inheritable, &permitted, cred); + if (ret < 0) + return ret; + + for (i = 0; i < CAP_LAST_CAP; i++) { + if (cap_raised(bset, i)) + continue; + if (!cap_raised(current_cred()->cap_bset, i)) + continue; + if (!may_dropbcap) + return -EPERM; + do_capbset_drop(cred, i); + } + + return 0; } /** -- 1.6.1 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 4/8] groups: move code to kernel/groups.c 2009-05-26 17:32 [PATCH 0/8] a start to credentials c/r Serge E. Hallyn ` (2 preceding siblings ...) 2009-05-26 17:33 ` [PATCH 3/8] cr: capabilities: define checkpoint and restore fns Serge E. Hallyn @ 2009-05-26 17:33 ` Serge E. Hallyn 2009-05-26 17:33 ` [PATCH 5/8] groups: allow compilation on s390x Serge E. Hallyn ` (4 subsequent siblings) 8 siblings, 0 replies; 17+ messages in thread From: Serge E. Hallyn @ 2009-05-26 17:33 UTC (permalink / raw) To: Oren Laadan Cc: Linux Containers, David Howells, Alexey Dobriyan, linux-security-module Move supplementary groups implementation to kernel/groups.c . kernel/sys.c already accumulated quite a few random stuff. Do strictly copy/paste + add required headers to compile. Compile-tested on many configs and archs. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Serge Hallyn <serue@us.ibm.com> --- kernel/Makefile | 1 + kernel/groups.c | 288 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ kernel/sys.c | 283 ------------------------------------------------------ 3 files changed, 289 insertions(+), 283 deletions(-) create mode 100644 kernel/groups.c diff --git a/kernel/Makefile b/kernel/Makefile index 6bc638d..4d4f741 100644 --- a/kernel/Makefile +++ b/kernel/Makefile @@ -11,6 +11,7 @@ obj-y = sched.o fork.o exec_domain.o panic.o printk.o \ hrtimer.o rwsem.o nsproxy.o srcu.o semaphore.o \ notifier.o ksysfs.o pm_qos_params.o sched_clock.o cred.o \ async.o +obj-y += groups.o ifdef CONFIG_FUNCTION_TRACER # Do not trace debug files and internal ftrace files diff --git a/kernel/groups.c b/kernel/groups.c new file mode 100644 index 0000000..1b95b2f --- /dev/null +++ b/kernel/groups.c @@ -0,0 +1,288 @@ +/* + * Supplementary group IDs + */ +#include <linux/cred.h> +#include <linux/module.h> +#include <linux/slab.h> +#include <linux/security.h> +#include <linux/syscalls.h> +#include <asm/uaccess.h> + +/* init to 2 - one for init_task, one to ensure it is never freed */ +struct group_info init_groups = { .usage = ATOMIC_INIT(2) }; + +struct group_info *groups_alloc(int gidsetsize) +{ + struct group_info *group_info; + int nblocks; + int i; + + nblocks = (gidsetsize + NGROUPS_PER_BLOCK - 1) / NGROUPS_PER_BLOCK; + /* Make sure we always allocate at least one indirect block pointer */ + nblocks = nblocks ? : 1; + group_info = kmalloc(sizeof(*group_info) + nblocks*sizeof(gid_t *), GFP_USER); + if (!group_info) + return NULL; + group_info->ngroups = gidsetsize; + group_info->nblocks = nblocks; + atomic_set(&group_info->usage, 1); + + if (gidsetsize <= NGROUPS_SMALL) + group_info->blocks[0] = group_info->small_block; + else { + for (i = 0; i < nblocks; i++) { + gid_t *b; + b = (void *)__get_free_page(GFP_USER); + if (!b) + goto out_undo_partial_alloc; + group_info->blocks[i] = b; + } + } + return group_info; + +out_undo_partial_alloc: + while (--i >= 0) { + free_page((unsigned long)group_info->blocks[i]); + } + kfree(group_info); + return NULL; +} + +EXPORT_SYMBOL(groups_alloc); + +void groups_free(struct group_info *group_info) +{ + if (group_info->blocks[0] != group_info->small_block) { + int i; + for (i = 0; i < group_info->nblocks; i++) + free_page((unsigned long)group_info->blocks[i]); + } + kfree(group_info); +} + +EXPORT_SYMBOL(groups_free); + +/* export the group_info to a user-space array */ +static int groups_to_user(gid_t __user *grouplist, + const struct group_info *group_info) +{ + int i; + unsigned int count = group_info->ngroups; + + for (i = 0; i < group_info->nblocks; i++) { + unsigned int cp_count = min(NGROUPS_PER_BLOCK, count); + unsigned int len = cp_count * sizeof(*grouplist); + + if (copy_to_user(grouplist, group_info->blocks[i], len)) + return -EFAULT; + + grouplist += NGROUPS_PER_BLOCK; + count -= cp_count; + } + return 0; +} + +/* fill a group_info from a user-space array - it must be allocated already */ +static int groups_from_user(struct group_info *group_info, + gid_t __user *grouplist) +{ + int i; + unsigned int count = group_info->ngroups; + + for (i = 0; i < group_info->nblocks; i++) { + unsigned int cp_count = min(NGROUPS_PER_BLOCK, count); + unsigned int len = cp_count * sizeof(*grouplist); + + if (copy_from_user(group_info->blocks[i], grouplist, len)) + return -EFAULT; + + grouplist += NGROUPS_PER_BLOCK; + count -= cp_count; + } + return 0; +} + +/* a simple Shell sort */ +static void groups_sort(struct group_info *group_info) +{ + int base, max, stride; + int gidsetsize = group_info->ngroups; + + for (stride = 1; stride < gidsetsize; stride = 3 * stride + 1) + ; /* nothing */ + stride /= 3; + + while (stride) { + max = gidsetsize - stride; + for (base = 0; base < max; base++) { + int left = base; + int right = left + stride; + gid_t tmp = GROUP_AT(group_info, right); + + while (left >= 0 && GROUP_AT(group_info, left) > tmp) { + GROUP_AT(group_info, right) = + GROUP_AT(group_info, left); + right = left; + left -= stride; + } + GROUP_AT(group_info, right) = tmp; + } + stride /= 3; + } +} + +/* a simple bsearch */ +int groups_search(const struct group_info *group_info, gid_t grp) +{ + unsigned int left, right; + + if (!group_info) + return 0; + + left = 0; + right = group_info->ngroups; + while (left < right) { + unsigned int mid = (left+right)/2; + int cmp = grp - GROUP_AT(group_info, mid); + if (cmp > 0) + left = mid + 1; + else if (cmp < 0) + right = mid; + else + return 1; + } + return 0; +} + +/** + * set_groups - Change a group subscription in a set of credentials + * @new: The newly prepared set of credentials to alter + * @group_info: The group list to install + * + * Validate a group subscription and, if valid, insert it into a set + * of credentials. + */ +int set_groups(struct cred *new, struct group_info *group_info) +{ + int retval; + + retval = security_task_setgroups(group_info); + if (retval) + return retval; + + put_group_info(new->group_info); + groups_sort(group_info); + get_group_info(group_info); + new->group_info = group_info; + return 0; +} + +EXPORT_SYMBOL(set_groups); + +/** + * set_current_groups - Change current's group subscription + * @group_info: The group list to impose + * + * Validate a group subscription and, if valid, impose it upon current's task + * security record. + */ +int set_current_groups(struct group_info *group_info) +{ + struct cred *new; + int ret; + + new = prepare_creds(); + if (!new) + return -ENOMEM; + + ret = set_groups(new, group_info); + if (ret < 0) { + abort_creds(new); + return ret; + } + + return commit_creds(new); +} + +EXPORT_SYMBOL(set_current_groups); + +SYSCALL_DEFINE2(getgroups, int, gidsetsize, gid_t __user *, grouplist) +{ + const struct cred *cred = current_cred(); + int i; + + if (gidsetsize < 0) + return -EINVAL; + + /* no need to grab task_lock here; it cannot change */ + i = cred->group_info->ngroups; + if (gidsetsize) { + if (i > gidsetsize) { + i = -EINVAL; + goto out; + } + if (groups_to_user(grouplist, cred->group_info)) { + i = -EFAULT; + goto out; + } + } +out: + return i; +} + +/* + * SMP: Our groups are copy-on-write. We can set them safely + * without another task interfering. + */ + +SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist) +{ + struct group_info *group_info; + int retval; + + if (!capable(CAP_SETGID)) + return -EPERM; + if ((unsigned)gidsetsize > NGROUPS_MAX) + return -EINVAL; + + group_info = groups_alloc(gidsetsize); + if (!group_info) + return -ENOMEM; + retval = groups_from_user(group_info, grouplist); + if (retval) { + put_group_info(group_info); + return retval; + } + + retval = set_current_groups(group_info); + put_group_info(group_info); + + return retval; +} + +/* + * Check whether we're fsgid/egid or in the supplemental group.. + */ +int in_group_p(gid_t grp) +{ + const struct cred *cred = current_cred(); + int retval = 1; + + if (grp != cred->fsgid) + retval = groups_search(cred->group_info, grp); + return retval; +} + +EXPORT_SYMBOL(in_group_p); + +int in_egroup_p(gid_t grp) +{ + const struct cred *cred = current_cred(); + int retval = 1; + + if (grp != cred->egid) + retval = groups_search(cred->group_info, grp); + return retval; +} + +EXPORT_SYMBOL(in_egroup_p); diff --git a/kernel/sys.c b/kernel/sys.c index fe5dcfe..0cedec0 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -1020,289 +1020,6 @@ out: return err; } -/* - * Supplementary group IDs - */ - -/* init to 2 - one for init_task, one to ensure it is never freed */ -struct group_info init_groups = { .usage = ATOMIC_INIT(2) }; - -struct group_info *groups_alloc(int gidsetsize) -{ - struct group_info *group_info; - int nblocks; - int i; - - nblocks = (gidsetsize + NGROUPS_PER_BLOCK - 1) / NGROUPS_PER_BLOCK; - /* Make sure we always allocate at least one indirect block pointer */ - nblocks = nblocks ? : 1; - group_info = kmalloc(sizeof(*group_info) + nblocks*sizeof(gid_t *), GFP_USER); - if (!group_info) - return NULL; - group_info->ngroups = gidsetsize; - group_info->nblocks = nblocks; - atomic_set(&group_info->usage, 1); - - if (gidsetsize <= NGROUPS_SMALL) - group_info->blocks[0] = group_info->small_block; - else { - for (i = 0; i < nblocks; i++) { - gid_t *b; - b = (void *)__get_free_page(GFP_USER); - if (!b) - goto out_undo_partial_alloc; - group_info->blocks[i] = b; - } - } - return group_info; - -out_undo_partial_alloc: - while (--i >= 0) { - free_page((unsigned long)group_info->blocks[i]); - } - kfree(group_info); - return NULL; -} - -EXPORT_SYMBOL(groups_alloc); - -void groups_free(struct group_info *group_info) -{ - if (group_info->blocks[0] != group_info->small_block) { - int i; - for (i = 0; i < group_info->nblocks; i++) - free_page((unsigned long)group_info->blocks[i]); - } - kfree(group_info); -} - -EXPORT_SYMBOL(groups_free); - -/* export the group_info to a user-space array */ -static int groups_to_user(gid_t __user *grouplist, - const struct group_info *group_info) -{ - int i; - unsigned int count = group_info->ngroups; - - for (i = 0; i < group_info->nblocks; i++) { - unsigned int cp_count = min(NGROUPS_PER_BLOCK, count); - unsigned int len = cp_count * sizeof(*grouplist); - - if (copy_to_user(grouplist, group_info->blocks[i], len)) - return -EFAULT; - - grouplist += NGROUPS_PER_BLOCK; - count -= cp_count; - } - return 0; -} - -/* fill a group_info from a user-space array - it must be allocated already */ -static int groups_from_user(struct group_info *group_info, - gid_t __user *grouplist) -{ - int i; - unsigned int count = group_info->ngroups; - - for (i = 0; i < group_info->nblocks; i++) { - unsigned int cp_count = min(NGROUPS_PER_BLOCK, count); - unsigned int len = cp_count * sizeof(*grouplist); - - if (copy_from_user(group_info->blocks[i], grouplist, len)) - return -EFAULT; - - grouplist += NGROUPS_PER_BLOCK; - count -= cp_count; - } - return 0; -} - -/* a simple Shell sort */ -static void groups_sort(struct group_info *group_info) -{ - int base, max, stride; - int gidsetsize = group_info->ngroups; - - for (stride = 1; stride < gidsetsize; stride = 3 * stride + 1) - ; /* nothing */ - stride /= 3; - - while (stride) { - max = gidsetsize - stride; - for (base = 0; base < max; base++) { - int left = base; - int right = left + stride; - gid_t tmp = GROUP_AT(group_info, right); - - while (left >= 0 && GROUP_AT(group_info, left) > tmp) { - GROUP_AT(group_info, right) = - GROUP_AT(group_info, left); - right = left; - left -= stride; - } - GROUP_AT(group_info, right) = tmp; - } - stride /= 3; - } -} - -/* a simple bsearch */ -int groups_search(const struct group_info *group_info, gid_t grp) -{ - unsigned int left, right; - - if (!group_info) - return 0; - - left = 0; - right = group_info->ngroups; - while (left < right) { - unsigned int mid = (left+right)/2; - int cmp = grp - GROUP_AT(group_info, mid); - if (cmp > 0) - left = mid + 1; - else if (cmp < 0) - right = mid; - else - return 1; - } - return 0; -} - -/** - * set_groups - Change a group subscription in a set of credentials - * @new: The newly prepared set of credentials to alter - * @group_info: The group list to install - * - * Validate a group subscription and, if valid, insert it into a set - * of credentials. - */ -int set_groups(struct cred *new, struct group_info *group_info) -{ - int retval; - - retval = security_task_setgroups(group_info); - if (retval) - return retval; - - put_group_info(new->group_info); - groups_sort(group_info); - get_group_info(group_info); - new->group_info = group_info; - return 0; -} - -EXPORT_SYMBOL(set_groups); - -/** - * set_current_groups - Change current's group subscription - * @group_info: The group list to impose - * - * Validate a group subscription and, if valid, impose it upon current's task - * security record. - */ -int set_current_groups(struct group_info *group_info) -{ - struct cred *new; - int ret; - - new = prepare_creds(); - if (!new) - return -ENOMEM; - - ret = set_groups(new, group_info); - if (ret < 0) { - abort_creds(new); - return ret; - } - - return commit_creds(new); -} - -EXPORT_SYMBOL(set_current_groups); - -SYSCALL_DEFINE2(getgroups, int, gidsetsize, gid_t __user *, grouplist) -{ - const struct cred *cred = current_cred(); - int i; - - if (gidsetsize < 0) - return -EINVAL; - - /* no need to grab task_lock here; it cannot change */ - i = cred->group_info->ngroups; - if (gidsetsize) { - if (i > gidsetsize) { - i = -EINVAL; - goto out; - } - if (groups_to_user(grouplist, cred->group_info)) { - i = -EFAULT; - goto out; - } - } -out: - return i; -} - -/* - * SMP: Our groups are copy-on-write. We can set them safely - * without another task interfering. - */ - -SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist) -{ - struct group_info *group_info; - int retval; - - if (!capable(CAP_SETGID)) - return -EPERM; - if ((unsigned)gidsetsize > NGROUPS_MAX) - return -EINVAL; - - group_info = groups_alloc(gidsetsize); - if (!group_info) - return -ENOMEM; - retval = groups_from_user(group_info, grouplist); - if (retval) { - put_group_info(group_info); - return retval; - } - - retval = set_current_groups(group_info); - put_group_info(group_info); - - return retval; -} - -/* - * Check whether we're fsgid/egid or in the supplemental group.. - */ -int in_group_p(gid_t grp) -{ - const struct cred *cred = current_cred(); - int retval = 1; - - if (grp != cred->fsgid) - retval = groups_search(cred->group_info, grp); - return retval; -} - -EXPORT_SYMBOL(in_group_p); - -int in_egroup_p(gid_t grp) -{ - const struct cred *cred = current_cred(); - int retval = 1; - - if (grp != cred->egid) - retval = groups_search(cred->group_info, grp); - return retval; -} - -EXPORT_SYMBOL(in_egroup_p); - DECLARE_RWSEM(uts_sem); SYSCALL_DEFINE1(newuname, struct new_utsname __user *, name) -- 1.6.1 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 5/8] groups: allow compilation on s390x 2009-05-26 17:32 [PATCH 0/8] a start to credentials c/r Serge E. Hallyn ` (3 preceding siblings ...) 2009-05-26 17:33 ` [PATCH 4/8] groups: move code to kernel/groups.c Serge E. Hallyn @ 2009-05-26 17:33 ` Serge E. Hallyn 2009-05-26 23:17 ` Serge E. Hallyn [not found] ` <20090526173242.GA13757-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> ` (3 subsequent siblings) 8 siblings, 1 reply; 17+ messages in thread From: Serge E. Hallyn @ 2009-05-26 17:33 UTC (permalink / raw) To: Oren Laadan Cc: Linux Containers, David Howells, Alexey Dobriyan, linux-security-module Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> --- kernel/groups.c | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/kernel/groups.c b/kernel/groups.c index 1b95b2f..14ebc6a 100644 --- a/kernel/groups.c +++ b/kernel/groups.c @@ -1,6 +1,7 @@ /* * Supplementary group IDs */ +#include <linux/init.h> #include <linux/cred.h> #include <linux/module.h> #include <linux/slab.h> -- 1.6.1 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 5/8] groups: allow compilation on s390x 2009-05-26 17:33 ` [PATCH 5/8] groups: allow compilation on s390x Serge E. Hallyn @ 2009-05-26 23:17 ` Serge E. Hallyn 0 siblings, 0 replies; 17+ messages in thread From: Serge E. Hallyn @ 2009-05-26 23:17 UTC (permalink / raw) To: Oren Laadan Cc: Linux Containers, David Howells, Alexey Dobriyan, linux-security-module Quoting Serge E. Hallyn (serue@us.ibm.com): > Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> > --- > kernel/groups.c | 1 + > 1 files changed, 1 insertions(+), 0 deletions(-) > > diff --git a/kernel/groups.c b/kernel/groups.c > index 1b95b2f..14ebc6a 100644 > --- a/kernel/groups.c > +++ b/kernel/groups.c > @@ -1,6 +1,7 @@ > /* > * Supplementary group IDs > */ > +#include <linux/init.h> > #include <linux/cred.h> > #include <linux/module.h> > #include <linux/slab.h> > -- > 1.6.1 As noted by Alexey, this is wrong, and the prob was actually fixed by the following patch he'd also sent last friday, so please replace this patch with the following: From: Alexey Dobriyan <adobriyan@gmail.com> Subject: [PATCH 01/38] cred: #include init.h in cred.h cred.h can't be included as first header because it uses __init and doesn't include init.h which is enough to break compilation on at least ia64. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> --- include/linux/cred.h | 1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/include/linux/cred.h b/include/linux/cred.h index 3282ee4..4fa9996 100644 --- a/include/linux/cred.h +++ b/include/linux/cred.h @@ -13,6 +13,7 @@ #define _LINUX_CRED_H #include <linux/capability.h> +#include <linux/init.h> #include <linux/key.h> #include <asm/atomic.h> -- 1.5.6.5 _______________________________________________ Containers mailing list Containers@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/containers ^ permalink raw reply related [flat|nested] 17+ messages in thread
[parent not found: <20090526173242.GA13757-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>]
* [PATCH 6/8] cr: checkpoint and restore task credentials [not found] ` <20090526173242.GA13757-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> @ 2009-05-26 17:33 ` Serge E. Hallyn 2009-05-27 18:36 ` Alexey Dobriyan 0 siblings, 1 reply; 17+ messages in thread From: Serge E. Hallyn @ 2009-05-26 17:33 UTC (permalink / raw) To: Oren Laadan Cc: Linux Containers, David Howells, linux-security-module-u79uwXL29TY76Z2rM5mHXA, Alexey Dobriyan This patch adds the checkpointing and restart of credentials (uids, gids, and capabilities) to Oren's c/r patchset (on top of v14). It goes to great pains to re-use (and define when needed) common helpers, in order to make sure that as security code is modified, the cr code will be updated. Some of the helpers should still be moved (i.e. _creds() functions should be in kernel/cred.c). When building the credentials for the restarted process, I 1. create a new struct cred as a copy of the running task's cred (using prepare_cred()) 2. always authorize any changes to the new struct cred based on the permissions of current_cred() (not the current transient state of the new cred). While this may mean that certain transient_cred1->transient_cred2 states are allowed which otherwise wouldn't be allowed, the fact remains that current_cred() is allowed to transition to transient_cred2. The reconstructed creds are applied to the task at the very end of the sys_restart call. This ensures that any objects which need to be re-created (file, socket, etc) are re-created using the creds of the task calling sys_restart - preventing an unpriv user from creating a privileged object, and ensuring that a root task can restart a process which had started out privileged, created some privileged objects, then dropped its privilege. With these patches, the root user can restart checkpoint images (created by either hallyn or root) of user hallyn's tasks, resulting in a program owned by hallyn. Plenty of bugs to be found, no doubt. TODO: * fully remove limit on # of groups in groupinfo at restore_read_groupinfo(). * restore uid info on sysvipc objects Changelog: May 26: Move group, user, userns, creds c/r functions out of checkpoint/process.c and into the appropriate files. May 26: Define struct ckpt_hdr_task_creds and move task cred objref c/r into {checkpoint_restore}_task_shared(). May 26: Take cred refs around checkpoint_write_creds() May 20: Remove the limit on number of groups in groupinfo at checkpoint time May 20: Remove the depth limit on empty user namespaces May 20: Better document checkpoint_user May 18: fix more refcounting: if (userns 5, uid 0) had no active tasks or child user_namespaces, then it shouldn't exist at restart or it, its namespace, and its whole chain of creators will be leaked. May 14: fix some refcounting: 1. a new user_ns needs a ref to remain pinned by its root user 2. current_user_ns needs an extra ref bc objhash drops two on restart 3. cred needs a ref for the real credentials bc commit_creds eats one ref. May 13: folded in fix to userns refcounting. Signed-off-by: Serge E. Hallyn <serue-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> --- checkpoint/objhash.c | 118 +++++++++++++++++++++++++++++ checkpoint/process.c | 154 +++++++++++++++++++++++++++++++++++++- include/linux/checkpoint.h | 11 +++ include/linux/checkpoint_hdr.h | 67 ++++++++++++++++ include/linux/checkpoint_types.h | 2 + include/linux/cred.h | 11 +++ include/linux/sched.h | 6 ++ include/linux/user_namespace.h | 6 ++ kernel/cred.c | 120 +++++++++++++++++++++++++++++ kernel/groups.c | 56 ++++++++++++++ kernel/user.c | 147 ++++++++++++++++++++++++++++++++++++ kernel/user_namespace.c | 86 +++++++++++++++++++++ 12 files changed, 782 insertions(+), 2 deletions(-) diff --git a/checkpoint/objhash.c b/checkpoint/objhash.c index 5618fff..44b948a 100644 --- a/checkpoint/objhash.c +++ b/checkpoint/objhash.c @@ -16,6 +16,7 @@ #include <linux/file.h> #include <linux/sched.h> #include <linux/ipc_namespace.h> +#include <linux/user_namespace.h> #include <linux/checkpoint.h> #include <linux/checkpoint_hdr.h> @@ -155,6 +156,71 @@ static int obj_ipc_ns_users(void *ptr) return atomic_read(&((struct ipc_namespace *) ptr)->count); } +static int obj_cred_grab(void *ptr) +{ + get_cred((struct cred *) ptr); + return 0; +} + +static void obj_cred_drop(void *ptr) +{ + put_cred((struct cred *) ptr); +} + +static int obj_cred_users(void *ptr) +{ + return atomic_read(&((struct cred *) ptr)->usage); +} + +static int obj_user_grab(void *ptr) +{ + struct user_struct *u = ptr; + (void) get_uid(u); + return 0; +} + +static void obj_user_drop(void *ptr) +{ + free_uid((struct user_struct *) ptr); +} + +static int obj_user_users(void *ptr) +{ + return atomic_read(&((struct user_struct *) ptr)->__count); +} + +static int obj_userns_grab(void *ptr) +{ + get_user_ns((struct user_namespace *) ptr); + return 0; +} + +static void obj_userns_drop(void *ptr) +{ + put_user_ns((struct user_namespace *) ptr); +} + +static int obj_user_ns_users(void *ptr) +{ + return atomic_read(&((struct user_namespace *) ptr)->kref.refcount); +} + +static int obj_groupinfo_grab(void *ptr) +{ + get_group_info((struct group_info *) ptr); + return 0; +} + +static void obj_groupinfo_drop(void *ptr) +{ + put_group_info((struct group_info *) ptr); +} + +static int obj_groupinfo_users(void *ptr) +{ + return atomic_read(&((struct group_info *) ptr)->usage); +} + static struct ckpt_obj_ops ckpt_obj_ops[] = { /* ignored object */ { @@ -221,6 +287,46 @@ static struct ckpt_obj_ops ckpt_obj_ops[] = { .checkpoint = checkpoint_bad, .restore = restore_bad, }, + /* user_ns object */ + { + .obj_name = "USER_NS", + .obj_type = CKPT_OBJ_USER_NS, + .ref_drop = obj_userns_drop, + .ref_grab = obj_userns_grab, + .ref_users = obj_user_ns_users, + .checkpoint = checkpoint_userns, + .restore = restore_userns, + }, + /* struct cred */ + { + .obj_name = "CRED", + .obj_type = CKPT_OBJ_CRED, + .ref_drop = obj_cred_drop, + .ref_grab = obj_cred_grab, + .ref_users = obj_cred_users, + .checkpoint = checkpoint_cred, + .restore = restore_cred, + }, + /* user object */ + { + .obj_name = "USER", + .obj_type = CKPT_OBJ_USER, + .ref_drop = obj_user_drop, + .ref_grab = obj_user_grab, + .ref_users = obj_user_users, + .checkpoint = checkpoint_user, + .restore = restore_user, + }, + /* struct groupinfo */ + { + .obj_name = "GROUPINFO", + .obj_type = CKPT_OBJ_GROUPINFO, + .ref_drop = obj_groupinfo_drop, + .ref_grab = obj_groupinfo_grab, + .ref_users = obj_groupinfo_users, + .checkpoint = checkpoint_groupinfo, + .restore = restore_groupinfo, + }, }; @@ -290,6 +396,18 @@ static struct ckpt_obj *obj_find_by_ptr(struct ckpt_ctx *ctx, void *ptr) return NULL; } +/* + * look up an obj and return objref if in hash, else + * return 0. Used during checkpoint. + */ +int obj_lookup(struct ckpt_ctx *ctx, void *ptr) +{ + struct ckpt_obj *obj = obj_find_by_ptr(ctx, ptr); + if (obj) + return obj->objref; + return 0; +} + static struct ckpt_obj *obj_find_by_objref(struct ckpt_ctx *ctx, int objref) { struct hlist_head *h; diff --git a/checkpoint/process.c b/checkpoint/process.c index fa166cd..41656e3 100644 --- a/checkpoint/process.c +++ b/checkpoint/process.c @@ -17,6 +17,7 @@ #include <linux/poll.h> #include <linux/nsproxy.h> #include <linux/utsname.h> +#include <linux/user_namespace.h> #include <linux/checkpoint.h> #include <linux/checkpoint_hdr.h> #include <linux/syscalls.h> @@ -27,6 +28,26 @@ * Checkpoint */ +int checkpoint_groupinfo(struct ckpt_ctx *ctx, void *ptr) +{ + return checkpoint_write_groupinfo(ctx, (struct group_info *)ptr); +} + +int checkpoint_userns(struct ckpt_ctx *ctx, void *ptr) +{ + return checkpoint_write_userns(ctx, (struct user_namespace *) ptr); +} + +int checkpoint_user(struct ckpt_ctx *ctx, void *ptr) +{ + return checkpoint_write_user(ctx, (struct user_struct *)ptr); +} + +int checkpoint_cred(struct ckpt_ctx *ctx, void *ptr) +{ + return checkpoint_write_cred(ctx, (struct cred *) ptr); +} + /* dump the task_struct of a given task */ static int checkpoint_task_struct(struct ckpt_ctx *ctx, struct task_struct *t) { @@ -298,6 +319,46 @@ static int checkpoint_task_objs(struct ckpt_ctx *ctx, struct task_struct *t) return ret; } +static int checkpoint_task_creds(struct ckpt_ctx *ctx, struct task_struct *t) +{ + int realcred_ref, ecred_ref; + struct cred *rcred, *ecred; + struct ckpt_hdr_task_creds *h; + int ret; + + rcred = get_cred(t->real_cred); + ecred = get_cred(t->cred); + + realcred_ref = checkpoint_obj(ctx, rcred, CKPT_OBJ_CRED); + if (realcred_ref < 0) { + ret = realcred_ref; + goto error; + } + + ecred_ref = checkpoint_obj(ctx, ecred, CKPT_OBJ_CRED); + if (ecred_ref < 0) { + ret = ecred_ref; + goto error; + } + + h = ckpt_hdr_get_type(ctx, sizeof(*h), CKPT_HDR_TASK_CREDS); + if (!h) { + ret = -ENOMEM; + goto error; + } + + h->cred_ref = realcred_ref; + h->ecred_ref = ecred_ref; + ret = ckpt_write_obj(ctx, (struct ckpt_hdr *) h); + ckpt_hdr_put(ctx, h); + +error: + put_cred(rcred); + put_cred(ecred); + return ret; + +} + /* dump the task's shared state */ static int checkpoint_task_shared(struct ckpt_ctx *ctx, struct task_struct *t) { @@ -311,7 +372,9 @@ static int checkpoint_task_shared(struct ckpt_ctx *ctx, struct task_struct *t) * memory layout, such that during restart a task will already * have its namespaces restored when it gets to restore the mm. */ - ret = checkpoint_task_ns(ctx, t); + ret = checkpoint_task_creds(ctx, t); + if (!ret) + ret = checkpoint_task_ns(ctx, t); if (!ret) ret = checkpoint_task_objs(ctx, t); return ret; @@ -352,6 +415,26 @@ int checkpoint_task(struct ckpt_ctx *ctx, struct task_struct *t) * Restart */ +void *restore_groupinfo(struct ckpt_ctx *ctx) +{ + return (void *) restore_read_groupinfo(ctx); +} + +void *restore_userns(struct ckpt_ctx *ctx) +{ + return (void *) restore_read_userns(ctx); +} + +void *restore_user(struct ckpt_ctx *ctx) +{ + return (void *) restore_read_user(ctx); +} + +void *restore_cred(struct ckpt_ctx *ctx) +{ + return (void *) restore_read_cred(ctx); +} + /* read the task_struct into the current task */ static int restore_task_struct(struct ckpt_ctx *ctx) { @@ -369,8 +452,12 @@ static int restore_task_struct(struct ckpt_ctx *ctx) memset(t->comm, 0, TASK_COMM_LEN); ret = _ckpt_read_string(ctx, t->comm, h->task_comm_len); + if (ret < 0) + goto out; /* FIXME: restore remaining relevant task_struct fields */ + + ret = 0; out: ckpt_hdr_put(ctx, h); return ret; @@ -637,6 +724,34 @@ static int restore_task_objs(struct ckpt_ctx *ctx) return ret; } +static int restore_task_creds(struct ckpt_ctx *ctx) +{ + struct ckpt_hdr_task_creds *h; + struct cred *realcred, *ecred; + int ret = 0; + + h = ckpt_read_obj_type(ctx, sizeof(*h), CKPT_HDR_TASK_CREDS); + if (IS_ERR(h)) + return PTR_ERR(h); + + realcred = ckpt_obj_fetch(ctx, h->cred_ref, CKPT_OBJ_CRED); + if (IS_ERR(realcred)) { + ret = PTR_ERR(realcred); + goto out; + } + ecred = ckpt_obj_fetch(ctx, h->ecred_ref, CKPT_OBJ_CRED); + if (IS_ERR(ecred)) { + ret = PTR_ERR(ecred); + goto out; + } + ctx->realcred = realcred; + ctx->ecred = ecred; + +out: + ckpt_hdr_put(ctx, h); + return ret; +} + static int restore_task_shared(struct ckpt_ctx *ctx) { int ret; @@ -646,17 +761,48 @@ static int restore_task_shared(struct ckpt_ctx *ctx) * shared objects are restored before they are referenced, * ensure that the namespaces are fully restored first. */ - ret = restore_task_ns(ctx); + ret = restore_task_creds(ctx); + if (!ret) + ret = restore_task_ns(ctx); if (!ret) ret = restore_task_objs(ctx); return ret; } +static int restore_creds(struct ckpt_ctx *ctx) +{ + int ret; + const struct cred *old; + struct cred *rcred, *ecred; + + rcred = ctx->realcred; + ecred = ctx->ecred; + + /* commit_creds will take one ref for the eff creds, but + * expects us to hold a ref for the obj creds, so take a + * ref here */ + get_cred(rcred); + ret = commit_creds(rcred); + if (ret) + return ret; + + if (ecred == rcred) + return 0; + + old = override_creds(ecred); /* override_creds otoh takes new ref */ + put_cred(old); + + ctx->realcred = ctx->ecred = NULL; + return 0; +} + /* read the entire state of the current task */ int restore_task(struct ckpt_ctx *ctx) { int ret; + struct cred *realcred, *ecred; + ctx->realcred = ctx->ecred = NULL; ret = restore_task_struct(ctx); ckpt_debug("ret %d\n", ret); if (ret < 0) @@ -679,6 +825,10 @@ int restore_task(struct ckpt_ctx *ctx) goto out; ret = restore_cpu(ctx); ckpt_debug("cpu: ret %d\n", ret); + if (ret < 0) + goto out; + ret = restore_creds(ctx); + ckpt_debug("creds: ret %d\n", ret); out: return ret; } diff --git a/include/linux/checkpoint.h b/include/linux/checkpoint.h index 9660c54..b26b738 100644 --- a/include/linux/checkpoint.h +++ b/include/linux/checkpoint.h @@ -58,6 +58,7 @@ extern void *ckpt_obj_fetch(struct ckpt_ctx *ctx, int objref, enum obj_type type); extern int ckpt_obj_lookup_add(struct ckpt_ctx *ctx, void *ptr, enum obj_type type, int *first); +extern int obj_lookup(struct ckpt_ctx *ctx, void *ptr); extern int ckpt_obj_insert(struct ckpt_ctx *ctx, void *ptr, int objref, enum obj_type type); @@ -95,6 +96,16 @@ static inline int restore_ipc_ns(struct ckpt_ctx *ctx) extern int checkpoint_ipcns(struct ckpt_ctx *ctx, struct ipc_namespace *ipc_ns); extern int restore_ipcns(struct ckpt_ctx *ctx); +/* credentials */ +int checkpoint_groupinfo(struct ckpt_ctx *ctx, void *ptr); +int checkpoint_userns(struct ckpt_ctx *ctx, void *ptr); +int checkpoint_user(struct ckpt_ctx *ctx, void *ptr); +int checkpoint_cred(struct ckpt_ctx *ctx, void *ptr); +void *restore_groupinfo(struct ckpt_ctx *ctx); +void *restore_userns(struct ckpt_ctx *ctx); +void *restore_user(struct ckpt_ctx *ctx); +void *restore_cred(struct ckpt_ctx *ctx); + /* memory */ extern void ckpt_pgarr_free(struct ckpt_ctx *ctx); diff --git a/include/linux/checkpoint_hdr.h b/include/linux/checkpoint_hdr.h index 8dc6438..df35703 100644 --- a/include/linux/checkpoint_hdr.h +++ b/include/linux/checkpoint_hdr.h @@ -38,6 +38,8 @@ struct ckpt_hdr { __u32 len; } __attribute__((aligned(8))); +#define CKPT_CRED_VERSION_1 1 + /* header types */ enum { CKPT_HDR_HEADER = 1, @@ -57,6 +59,11 @@ enum { CKPT_HDR_NS, CKPT_HDR_UTS_NS, CKPT_HDR_IPC_NS, + CKPT_HDR_USER_NS, + CKPT_HDR_CRED, + CKPT_HDR_USER, + CKPT_HDR_GROUPINFO, + CKPT_HDR_TASK_CREDS, CKPT_HDR_MM = 201, CKPT_HDR_VMA, @@ -103,6 +110,10 @@ enum obj_type { CKPT_OBJ_NS, CKPT_OBJ_UTS_NS, CKPT_OBJ_IPC_NS, + CKPT_OBJ_USER_NS, + CKPT_OBJ_CRED, + CKPT_OBJ_USER, + CKPT_OBJ_GROUPINFO, CKPT_OBJ_MAX }; @@ -161,12 +172,68 @@ struct ckpt_hdr_task { __u32 exit_code; __u32 exit_signal; +#ifdef CONFIG_AUDITSYSCALL + /* would audit want to track the checkpointed ids, + or (more likely) who actually restarted? */ +#endif + __u32 task_comm_len; + __u32 padding; +} __attribute__((aligned(8))); + +struct ckpt_hdr_task_creds { + struct ckpt_hdr h; + __s32 cred_ref; + __s32 ecred_ref; +} __attribute__((aligned(8))); + +struct ckpt_hdr_cred { + struct ckpt_hdr h; + __u32 version; /* especially since capability sets might grow */ + __u32 uid, suid, euid, fsuid; + __u32 gid, sgid, egid, fsgid; + __u64 cap_i, cap_p, cap_e; + __u64 cap_x; /* bounding set ('X') */ + __s32 user_ref; + __s32 groupinfo_ref; + __u32 padding; +} __attribute__((aligned(8))); + +struct ckpt_hdr_groupinfo { + struct ckpt_hdr h; + __u32 ngroups; + /* + * This is followed by ngroups __u32s + */ + __u32 groups[0]; +} __attribute__((aligned(8))); + +/* + * todo - keyrings and LSM + * These may be better done with userspace help though + */ +struct ckpt_hdr_user_struct { + struct ckpt_hdr h; + __u32 uid; + __s32 userns_ref; +} __attribute__((aligned(8))); + +/* + * The user-struct mostly tracks system resource usage. + * Most of it's contents therefore will simply be set + * correctly as restart opens resources + */ +#define CKPT_USERNS_INIT 1 +struct ckpt_hdr_user_ns { + struct ckpt_hdr h; + __u32 flags; + __s32 creator_ref; } __attribute__((aligned(8))); struct ckpt_hdr_task_ns { struct ckpt_hdr h; __s32 ns_objref; + __u32 padding; } __attribute__((aligned(8))); struct ckpt_hdr_task_objs { diff --git a/include/linux/checkpoint_types.h b/include/linux/checkpoint_types.h index 3fdc43e..62b8272 100644 --- a/include/linux/checkpoint_types.h +++ b/include/linux/checkpoint_types.h @@ -13,6 +13,7 @@ #define CKPT_VERSION 1 #define CHECKPOINT_SUBTREE 0x4 +#define RESTORE_CREATE_USERNS 0x8 #ifdef __KERNEL__ @@ -67,6 +68,7 @@ struct ckpt_ctx { atomic_t tasks_count; /* sync of tasks: used to coordinate */ struct completion complete; /* container root and other tasks on */ wait_queue_head_t waitq; /* start, end, and restart ordering */ + struct cred *realcred, *ecred; /* tmp storage for cred at restart */ }; diff --git a/include/linux/cred.h b/include/linux/cred.h index bc5ffc2..14d7a10 100644 --- a/include/linux/cred.h +++ b/include/linux/cred.h @@ -76,6 +76,12 @@ extern int groups_search(const struct group_info *, gid_t); extern int in_group_p(gid_t); extern int in_egroup_p(gid_t); +#ifdef CONFIG_CHECKPOINT +struct ckpt_ctx; +int checkpoint_write_groupinfo(struct ckpt_ctx *, struct group_info *); +struct group_info *restore_read_groupinfo(struct ckpt_ctx *); +#endif + /* * The common credentials for a thread group * - shared by CLONE_THREAD @@ -351,4 +357,9 @@ int cred_setresgid(struct cred *new, gid_t rgid, gid_t egid, gid_t sgid); int cred_setfsuid(struct cred *new, uid_t uid, uid_t *old_fsuid); int cred_setfsgid(struct cred *new, gid_t gid, gid_t *old_fsgid); +#ifdef CONFIG_CHECKPOINT +int checkpoint_write_cred(struct ckpt_ctx *, const struct cred *); +struct cred *restore_read_cred(struct ckpt_ctx *); +#endif + #endif /* _LINUX_CRED_H */ diff --git a/include/linux/sched.h b/include/linux/sched.h index d057e7a..06445c8 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1871,6 +1871,12 @@ static inline struct user_struct *get_uid(struct user_struct *u) extern void free_uid(struct user_struct *); extern void release_uids(struct user_namespace *ns); +#ifdef CONFIG_CHECKPOINT +struct ckpt_ctx; +int checkpoint_write_user(struct ckpt_ctx *, struct user_struct *); +struct user_struct *restore_read_user(struct ckpt_ctx *); +#endif + #include <asm/current.h> extern void do_timer(unsigned long ticks); diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index a2b82d5..3eeee40 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -62,4 +62,10 @@ static inline void put_user_ns(struct user_namespace *ns) #endif +#ifdef CONFIG_CHECKPOINT +struct ckpt_ctx; +int checkpoint_write_userns(struct ckpt_ctx *, struct user_namespace *); +struct user_namespace *restore_read_userns(struct ckpt_ctx *); +#endif + #endif /* _LINUX_USER_H */ diff --git a/kernel/cred.c b/kernel/cred.c index a017399..c05192e 100644 --- a/kernel/cred.c +++ b/kernel/cred.c @@ -16,6 +16,7 @@ #include <linux/init_task.h> #include <linux/security.h> #include <linux/cn_proc.h> +#include <linux/checkpoint.h> #include "cred-internals.h" static struct kmem_cache *cred_jar; @@ -703,3 +704,122 @@ int cred_setfsgid(struct cred *new, gid_t gid, gid_t *old_fsgid) } return -EPERM; } + +#ifdef CONFIG_CHECKPOINT +int checkpoint_write_cred(struct ckpt_ctx *ctx, const struct cred *cred) +{ + int ret; + int groupinfo_ref, user_ref; + struct ckpt_hdr_cred *h; + + groupinfo_ref = checkpoint_obj(ctx, cred->group_info, + CKPT_OBJ_GROUPINFO); + if (groupinfo_ref < 0) + return groupinfo_ref; + user_ref = checkpoint_obj(ctx, cred->user, CKPT_OBJ_USER); + if (user_ref < 0) + return user_ref; + + h = ckpt_hdr_get_type(ctx, sizeof(*h), CKPT_HDR_CRED); + if (!h) + return -ENOMEM; + + h->version = CKPT_CRED_VERSION_1; + h->uid = cred->uid; + h->suid = cred->suid; + h->euid = cred->euid; + h->fsuid = cred->fsuid; + + h->gid = cred->gid; + h->sgid = cred->sgid; + h->egid = cred->egid; + h->fsgid = cred->fsgid; + + checkpoint_save_cap(&h->cap_i, cred->cap_inheritable); + checkpoint_save_cap(&h->cap_p, cred->cap_permitted); + checkpoint_save_cap(&h->cap_e, cred->cap_effective); + checkpoint_save_cap(&h->cap_x, cred->cap_bset); + + h->user_ref = user_ref; + h->groupinfo_ref = groupinfo_ref; + + ret = ckpt_write_obj(ctx, (struct ckpt_hdr *) h); + ckpt_hdr_put(ctx, h); + + return ret; +} + +struct cred *restore_read_cred(struct ckpt_ctx *ctx) +{ + struct cred *cred; + struct ckpt_hdr_cred *h; + struct user_struct *user; + struct group_info *groupinfo; + int ret = -EINVAL; + uid_t olduid; + gid_t oldgid; + int i; + + h = ckpt_read_obj_type(ctx, sizeof(*h), CKPT_HDR_CRED); + if (IS_ERR(h)) + return ERR_PTR(PTR_ERR(h)); + if (h->version != CKPT_CRED_VERSION_1) + goto error; + + cred = prepare_creds(); + if (!cred) + goto error; + + + /* Do we care if the target user and target group were compatible? + * Probably. But then, we can't do any setuid without CAP_SETUID, + * so we must have been privileged to abuse it... */ + groupinfo = ckpt_obj_fetch(ctx, h->groupinfo_ref, CKPT_OBJ_GROUPINFO); + if (IS_ERR(groupinfo)) + goto err_putcred; + user = ckpt_obj_fetch(ctx, h->user_ref, CKPT_OBJ_USER); + if (IS_ERR(user)) + goto err_putcred; + + /* + * TODO: this check should go into the common helper in + * kernel/sys.c, and should account for user namespaces + */ + if (!capable(CAP_SETGID)) + for (i = 0; i < groupinfo->ngroups; i++) { + if (!in_egroup_p(GROUP_AT(groupinfo, i))) + goto err_putcred; + } + ret = set_groups(cred, groupinfo); + if (ret < 0) + goto err_putcred; + free_uid(cred->user); + cred->user = get_uid(user); + ret = cred_setresuid(cred, h->uid, h->euid, h->suid); + if (ret < 0) + goto err_putcred; + ret = cred_setfsuid(cred, h->fsuid, &olduid); + if (olduid != h->fsuid && ret < 0) + goto err_putcred; + ret = cred_setresgid(cred, h->gid, h->egid, h->sgid); + if (ret < 0) + goto err_putcred; + ret = cred_setfsgid(cred, h->fsgid, &oldgid); + if (oldgid != h->fsgid && ret < 0) + goto err_putcred; + ret = checkpoint_restore_cap(h->cap_e, h->cap_i, h->cap_p, h->cap_x, + cred); + if (ret) + goto err_putcred; + + ckpt_hdr_put(ctx, h); + return cred; + +err_putcred: + abort_creds(cred); +error: + ckpt_hdr_put(ctx, h); + return ERR_PTR(ret); +} + +#endif diff --git a/kernel/groups.c b/kernel/groups.c index 14ebc6a..46c4a14 100644 --- a/kernel/groups.c +++ b/kernel/groups.c @@ -7,6 +7,7 @@ #include <linux/slab.h> #include <linux/security.h> #include <linux/syscalls.h> +#include <linux/checkpoint.h> #include <asm/uaccess.h> /* init to 2 - one for init_task, one to ensure it is never freed */ @@ -287,3 +288,58 @@ int in_egroup_p(gid_t grp) } EXPORT_SYMBOL(in_egroup_p); + +#ifdef CONFIG_CHECKPOINT +int checkpoint_write_groupinfo(struct ckpt_ctx *ctx, struct group_info *g) +{ + int ret, i, size; + struct ckpt_hdr_groupinfo *h; + + size = sizeof(*h) + g->ngroups * sizeof(__u32); + h = ckpt_hdr_get_type(ctx, size, CKPT_HDR_GROUPINFO); + if (!h) + return -ENOMEM; + + h->ngroups = g->ngroups; + for (i = 0; i < g->ngroups; i++) + h->groups[i] = GROUP_AT(g, i); + + ret = ckpt_write_obj(ctx, (struct ckpt_hdr *) h); + ckpt_hdr_put(ctx, h); + + return ret; +} + +/* + * TODO - switch to reading in blocks, and only return an + * error for truly obscene # groups (like 10000) + */ +#define CKPT_MAXGROUPS 100 +#define MAX_GROUPINFO_SIZE (sizeof(*h)+CKPT_MAXGROUPS*sizeof(gid_t)) +struct group_info *restore_read_groupinfo(struct ckpt_ctx *ctx) +{ + struct group_info *g; + struct ckpt_hdr_groupinfo *h; + int i; + + h = ckpt_read_buf_type(ctx, MAX_GROUPINFO_SIZE, CKPT_HDR_GROUPINFO); + if (IS_ERR(h)) + return ERR_PTR(PTR_ERR(h)); + if (h->ngroups > CKPT_MAXGROUPS) { + g = ERR_PTR(-EINVAL); + goto out; + } + g = groups_alloc(h->ngroups); + if (!g) { + g = ERR_PTR(-ENOMEM); + goto out; + } + for (i = 0; i < h->ngroups; i++) + GROUP_AT(g, i) = h->groups[i]; + +out: + ckpt_hdr_put(ctx, h); + return g; +} + +#endif diff --git a/kernel/user.c b/kernel/user.c index 850e0ba..97f13e2 100644 --- a/kernel/user.c +++ b/kernel/user.c @@ -16,6 +16,7 @@ #include <linux/interrupt.h> #include <linux/module.h> #include <linux/user_namespace.h> +#include <linux/checkpoint.h> #include "cred-internals.h" struct user_namespace init_user_ns = { @@ -497,3 +498,149 @@ static int __init uid_cache_init(void) } module_init(uid_cache_init); + +#ifdef CONFIG_CHECKPOINT +/* + * write the user struct + * TODO keyring will need to be dumped + * + * Here is what we're doing. Remember a task can do clone(CLONE_NEWUSER) + * resulting in a cloned task in a new user namespace, with uid 0 in that + * new user_ns. In that case, the parent's user (uid+user_ns) is the + * 'creator' of the new user_ns. + * Here, we call the user_ns of the ctx->root_task the 'root_ns'. When we + * checkpoint a user-struct, we must store the chain of creators. We + * must not do so recursively, this being the kernel. In + * checkpoint_write_user() we walk and record in memory the list of creators up + * to either the latest user_struct which has already been saved, or the + * root_ns. Then we walk that chain backward, writing out the user_ns and + * user_struct to the checkpoint image. + */ +#define UNSAVED_STRIDE 50 +int checkpoint_write_user(struct ckpt_ctx *ctx, struct user_struct *u) +{ + struct user_namespace *ns, *root_ns; + struct ckpt_hdr_user_struct *h; + int ns_objref; + int ret, i, unsaved_ns_nr = 0; + struct user_struct *save_u; + struct user_struct **unsaved_creators; + int step = 1, size; + + /* if we've already saved the userns, then life is good */ + ns_objref = obj_lookup(ctx, u->user_ns); + if (ns_objref) + goto write_user; + + root_ns = task_cred_xxx(ctx->root_task, user)->user_ns; + + if (u->user_ns == root_ns) + goto save_last_ns; + + size = UNSAVED_STRIDE*sizeof(struct user_struct *); + unsaved_creators = kmalloc(size, GFP_KERNEL); + if (!unsaved_creators) + return -ENOMEM; + save_u = u; + do { + ns = save_u->user_ns; + save_u = ns->creator; + if (obj_lookup(ctx, save_u)) + goto found; + unsaved_creators[unsaved_ns_nr++] = save_u; + if (unsaved_ns_nr == step * UNSAVED_STRIDE) { + step++; + size = step*UNSAVED_STRIDE*sizeof(struct user_struct *); + unsaved_creators = krealloc(unsaved_creators, size, + GFP_KERNEL); + if (!unsaved_creators) + return -ENOMEM; + } + } while (ns != root_ns); + +found: + for (i = unsaved_ns_nr-1; i >= 0; i--) { + ret = checkpoint_obj(ctx, unsaved_creators[i], CKPT_OBJ_USER); + if (ret < 0) { + kfree(unsaved_creators); + return ret; + } + } + kfree(unsaved_creators); + +save_last_ns: + ns_objref = checkpoint_obj(ctx, u->user_ns, CKPT_OBJ_USER_NS); + if (ns_objref < 0) + return ns_objref; + +write_user: + h = ckpt_hdr_get_type(ctx, sizeof(*h), CKPT_HDR_USER); + if (!h) + return -ENOMEM; + + h->uid = u->uid; + h->userns_ref = ns_objref; + + /* write out the user_struct */ + ret = ckpt_write_obj(ctx, (struct ckpt_hdr *) h); + ckpt_hdr_put(ctx, h); + + return ret; +} + +static int may_setuid(struct user_namespace *ns, uid_t uid) +{ + /* + * this next check will one day become + * if capable(CAP_SETUID, ns) return 1; + * followed by uid_equiv(current_userns, current_uid, ns, uid) + * instead of just uids. + */ + if (capable(CAP_SETUID)) + return 1; + + /* + * this may be overly strict, but since we might end up + * restarting a privileged program here, we do not want + * someone with only CAP_SYS_ADMIN but no CAP_SETUID to + * be able to create random userids even in a userns he + * created. + */ + if (current_user()->user_ns != ns) + return 0; + if (current_uid() == uid || + current_euid() == uid || + current_suid() == uid) + return 1; + return 0; +} + +struct user_struct *restore_read_user(struct ckpt_ctx *ctx) +{ + struct user_struct *u; + struct user_namespace *ns; + struct ckpt_hdr_user_struct *h; + + h = ckpt_read_obj_type(ctx, sizeof(*h), CKPT_HDR_USER); + if (IS_ERR(h)) + return ERR_PTR(PTR_ERR(h)); + + ns = ckpt_obj_fetch(ctx, h->userns_ref, CKPT_OBJ_USER_NS); + if (IS_ERR(ns)) { + u = ERR_PTR(PTR_ERR(ns)); + goto out; + } + + if (!may_setuid(ns, h->uid)) { + u = ERR_PTR(-EPERM); + goto out; + } + u = alloc_uid(ns, h->uid); + if (!u) + u = ERR_PTR(-EINVAL); + +out: + ckpt_hdr_put(ctx, h); + return u; +} +#endif diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index e624b0f..857cb3d 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -9,6 +9,7 @@ #include <linux/nsproxy.h> #include <linux/slab.h> #include <linux/user_namespace.h> +#include <linux/checkpoint.h> #include <linux/cred.h> static struct user_namespace *_new_user_ns(struct user_struct *creator, @@ -103,3 +104,88 @@ void free_user_ns(struct kref *kref) schedule_work(&ns->destroyer); } EXPORT_SYMBOL(free_user_ns); + +#ifdef CONFIG_CHECKPOINT +/* + * checkpoint_write_userns() is only called from + * checkpoint_write_user(). When called, we always know that + * either: + * 1. This is the root_ns (user_ns of the ctx->root_task), + * in which case we don't store a creator, but rather + * set the CKPT_USERNS_INIT flag. + * or + * 2. The creator has already been written out to the + * checkpoint image (and saved in the objhash) + */ +int checkpoint_write_userns(struct ckpt_ctx *ctx, + struct user_namespace *ns) +{ + struct ckpt_hdr_user_ns *h; + int creator_ref = 0; + unsigned int flags = 0; + struct user_namespace *root_ns; + int ret; + + root_ns = task_cred_xxx(ctx->root_task, user)->user_ns; + if (ns == root_ns) + flags = CKPT_USERNS_INIT; + else + creator_ref = obj_lookup(ctx, ns->creator); + if (!flags && !creator_ref) + return -EINVAL; + + h = ckpt_hdr_get_type(ctx, sizeof(*h), CKPT_HDR_USER_NS); + if (!h) + return -ENOMEM; + h->creator_ref = creator_ref; + h->flags = flags; + ret = ckpt_write_obj(ctx, (struct ckpt_hdr *) h); + ckpt_hdr_put(ctx, h); + + return ret; +} + +struct user_namespace *restore_read_userns(struct ckpt_ctx *ctx) +{ + struct ckpt_hdr_user_ns *h; + struct user_namespace *ns; + struct user_struct *new_root, *creator; + + h = ckpt_read_obj_type(ctx, sizeof(*h), CKPT_HDR_USER_NS); + if (IS_ERR(h)) + return ERR_PTR(PTR_ERR(h)); + if (h->flags & ~CKPT_USERNS_INIT) /* only 1 valid flag */ + return ERR_PTR(-EINVAL); + if (h->flags & CKPT_USERNS_INIT) { + ckpt_hdr_put(ctx, h); + /* grab an extra ref bc objhash will drop an extra */ + return get_user_ns(current_user_ns()); + } + creator = ckpt_obj_fetch(ctx, h->creator_ref, CKPT_OBJ_USER); + ckpt_hdr_put(ctx, h); + + if (IS_ERR(creator)) + return ERR_PTR(-EINVAL); + ns = new_user_ns(creator, &new_root); + + if (IS_ERR(ns)) + return ns; + + /* new_user_ns() doesn't bump creator's refcount */ + get_uid(creator); + + /* objhash will drop new_ns refcount, but new_root + * should hold a ref */ + get_user_ns(ns); + + /* + * Free the new root user. If we actually needed it, + * then it will show up later in the checkpoint image + * The objhash will keep the userns pinned until then. + */ + free_uid(new_root); + + return ns; +} + +#endif -- 1.6.1 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 6/8] cr: checkpoint and restore task credentials 2009-05-26 17:33 ` [PATCH 6/8] cr: checkpoint and restore task credentials Serge E. Hallyn @ 2009-05-27 18:36 ` Alexey Dobriyan 2009-05-28 14:01 ` Serge E. Hallyn 0 siblings, 1 reply; 17+ messages in thread From: Alexey Dobriyan @ 2009-05-27 18:36 UTC (permalink / raw) To: Serge E. Hallyn Cc: Oren Laadan, Linux Containers, David Howells, linux-security-module On Tue, May 26, 2009 at 12:33:54PM -0500, Serge E. Hallyn wrote: > +struct ckpt_hdr_cred { > + struct ckpt_hdr h; > + __u32 version; /* especially since capability sets might grow */ Oh, no. Image version should be incremented. > + __u32 uid, suid, euid, fsuid; > + __u32 gid, sgid, egid, fsgid; > + __u64 cap_i, cap_p, cap_e; > + __u64 cap_x; /* bounding set ('X') */ > + __s32 user_ref; > + __s32 groupinfo_ref; > + __u32 padding; > +} __attribute__((aligned(8))); > + > +struct ckpt_hdr_groupinfo { > + struct ckpt_hdr h; > + __u32 ngroups; > + /* > + * This is followed by ngroups __u32s > + */ > + __u32 groups[0]; > +} __attribute__((aligned(8))); > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -1871,6 +1871,12 @@ static inline struct user_struct *get_uid(struct user_struct *u) > extern void free_uid(struct user_struct *); > extern void release_uids(struct user_namespace *ns); > > +#ifdef CONFIG_CHECKPOINT > +struct ckpt_ctx; > +int checkpoint_write_user(struct ckpt_ctx *, struct user_struct *); > +struct user_struct *restore_read_user(struct ckpt_ctx *); > +#endif I'll rip credential stuff from sched.h, better not add more. > --- a/kernel/groups.c > +++ b/kernel/groups.c > @@ -287,3 +288,58 @@ int in_egroup_p(gid_t grp) > } > > EXPORT_SYMBOL(in_egroup_p); > + > +#ifdef CONFIG_CHECKPOINT > +int checkpoint_write_groupinfo(struct ckpt_ctx *ctx, struct group_info *g) > +{ > + int ret, i, size; > + struct ckpt_hdr_groupinfo *h; > + > + size = sizeof(*h) + g->ngroups * sizeof(__u32); > + h = ckpt_hdr_get_type(ctx, size, CKPT_HDR_GROUPINFO); > + if (!h) > + return -ENOMEM; > + > + h->ngroups = g->ngroups; > + for (i = 0; i < g->ngroups; i++) > + h->groups[i] = GROUP_AT(g, i); > + > + ret = ckpt_write_obj(ctx, (struct ckpt_hdr *) h); > + ckpt_hdr_put(ctx, h); > + > + return ret; > +} > + > +/* > + * TODO - switch to reading in blocks, and only return an > + * error for truly obscene # groups (like 10000) > + */ > +#define CKPT_MAXGROUPS 100 > +#define MAX_GROUPINFO_SIZE (sizeof(*h)+CKPT_MAXGROUPS*sizeof(gid_t)) > +struct group_info *restore_read_groupinfo(struct ckpt_ctx *ctx) > +{ > + struct group_info *g; > + struct ckpt_hdr_groupinfo *h; > + int i; > + > + h = ckpt_read_buf_type(ctx, MAX_GROUPINFO_SIZE, CKPT_HDR_GROUPINFO); > + if (IS_ERR(h)) > + return ERR_PTR(PTR_ERR(h)); > + if (h->ngroups > CKPT_MAXGROUPS) { > + g = ERR_PTR(-EINVAL); > + goto out; > + } > + g = groups_alloc(h->ngroups); > + if (!g) { > + g = ERR_PTR(-ENOMEM); > + goto out; > + } > + for (i = 0; i < h->ngroups; i++) > + GROUP_AT(g, i) = h->groups[i]; > + > +out: > + ckpt_hdr_put(ctx, h); > + return g; > +} No checks, that groups in image are a) sorted, b) ->ngroups is compatible with object image. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 6/8] cr: checkpoint and restore task credentials 2009-05-27 18:36 ` Alexey Dobriyan @ 2009-05-28 14:01 ` Serge E. Hallyn 2009-05-28 14:36 ` Alexey Dobriyan 0 siblings, 1 reply; 17+ messages in thread From: Serge E. Hallyn @ 2009-05-28 14:01 UTC (permalink / raw) To: Alexey Dobriyan Cc: Serge E. Hallyn, Linux Containers, David Howells, linux-security-module Quoting Alexey Dobriyan (adobriyan@gmail.com): > On Tue, May 26, 2009 at 12:33:54PM -0500, Serge E. Hallyn wrote: > > +struct ckpt_hdr_cred { > > + struct ckpt_hdr h; > > + __u32 version; /* especially since capability sets might grow */ > > Oh, no. Image version should be incremented. Why? The format hasn't changed since my last set I don't think... Oh, I added the padding. Thanks. I have to bump it again for the next set (hopefully out today or tomorrow) as it adds securebits. (And hopefully a first stab at LSM, though it's not looking likely) > > + __u32 uid, suid, euid, fsuid; > > + __u32 gid, sgid, egid, fsgid; > > + __u64 cap_i, cap_p, cap_e; > > + __u64 cap_x; /* bounding set ('X') */ > > + __s32 user_ref; > > + __s32 groupinfo_ref; > > + __u32 padding; > > +} __attribute__((aligned(8))); > > + > > +struct ckpt_hdr_groupinfo { > > + struct ckpt_hdr h; > > + __u32 ngroups; > > + /* > > + * This is followed by ngroups __u32s > > + */ > > + __u32 groups[0]; > > +} __attribute__((aligned(8))); > > > --- a/include/linux/sched.h > > +++ b/include/linux/sched.h > > @@ -1871,6 +1871,12 @@ static inline struct user_struct *get_uid(struct user_struct *u) > > extern void free_uid(struct user_struct *); > > extern void release_uids(struct user_namespace *ns); > > > > +#ifdef CONFIG_CHECKPOINT > > +struct ckpt_ctx; > > +int checkpoint_write_user(struct ckpt_ctx *, struct user_struct *); > > +struct user_struct *restore_read_user(struct ckpt_ctx *); > > +#endif > > I'll rip credential stuff from sched.h, better not add more. Yeah I'll move this in cred.h. ... > > +#define CKPT_MAXGROUPS 100 > > +#define MAX_GROUPINFO_SIZE (sizeof(*h)+CKPT_MAXGROUPS*sizeof(gid_t)) > > +struct group_info *restore_read_groupinfo(struct ckpt_ctx *ctx) > > +{ > > + struct group_info *g; > > + struct ckpt_hdr_groupinfo *h; > > + int i; > > + > > + h = ckpt_read_buf_type(ctx, MAX_GROUPINFO_SIZE, CKPT_HDR_GROUPINFO); > > + if (IS_ERR(h)) > > + return ERR_PTR(PTR_ERR(h)); > > + if (h->ngroups > CKPT_MAXGROUPS) { > > + g = ERR_PTR(-EINVAL); > > + goto out; > > + } > > + g = groups_alloc(h->ngroups); > > + if (!g) { > > + g = ERR_PTR(-ENOMEM); > > + goto out; > > + } > > + for (i = 0; i < h->ngroups; i++) > > + GROUP_AT(g, i) = h->groups[i]; > > + > > +out: > > + ckpt_hdr_put(ctx, h); > > + return g; > > +} > > No checks, that groups in image are a) sorted, b) ->ngroups is compatible > with object image. Thanks, will fix. So I'd like to suggest that we take the pieces that we can both use (the code in groups.c, cred.c, security/security.c, and capabilities) and get it identical between both versions. But we would need to find a way to ignore API differences for reading and writing the checkpoint file. BTW I have some credentials (users, user namespaces, and securebits) testcases under cr_tests/userns/ in git://git.sr71.net/~hallyn/cr_tests.git. Maybe you can reuse some of that for your own testing. thanks, -serge ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 6/8] cr: checkpoint and restore task credentials 2009-05-28 14:01 ` Serge E. Hallyn @ 2009-05-28 14:36 ` Alexey Dobriyan 0 siblings, 0 replies; 17+ messages in thread From: Alexey Dobriyan @ 2009-05-28 14:36 UTC (permalink / raw) To: Serge E. Hallyn Cc: Serge E. Hallyn, Linux Containers, David Howells, linux-security-module On Thu, May 28, 2009 at 09:01:10AM -0500, Serge E. Hallyn wrote: > Quoting Alexey Dobriyan (adobriyan@gmail.com): > > On Tue, May 26, 2009 at 12:33:54PM -0500, Serge E. Hallyn wrote: > > > +struct ckpt_hdr_cred { > > > + struct ckpt_hdr h; > > > + __u32 version; /* especially since capability sets might grow */ > > > > Oh, no. Image version should be incremented. > > Why? The format hasn't changed since my last set I don't think... > > Oh, I added the padding. Thanks. I have to bump it again for the > next set (hopefully out today or tomorrow) as it adds securebits. > (And hopefully a first stab at LSM, though it's not looking > likely) Well, formally format has changed. But the statement is that image version alone is enough so per-object image versions aren't necessary. ^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 7/8] cr: restore file->f_cred 2009-05-26 17:32 [PATCH 0/8] a start to credentials c/r Serge E. Hallyn ` (5 preceding siblings ...) [not found] ` <20090526173242.GA13757-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> @ 2009-05-26 17:34 ` Serge E. Hallyn 2009-05-26 17:34 ` [PATCH 8/8] user namespaces: debug refcounts Serge E. Hallyn 2009-05-27 3:05 ` [PATCH 0/8] a start to credentials c/r Casey Schaufler 8 siblings, 0 replies; 17+ messages in thread From: Serge E. Hallyn @ 2009-05-26 17:34 UTC (permalink / raw) To: Oren Laadan Cc: Linux Containers, David Howells, Alexey Dobriyan, linux-security-module Restore a file's f_cred. This is set to the cred of the task doing the open, so often it will be the same as that of the restarted task. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> --- checkpoint/files.c | 16 ++++++++++++++-- include/linux/checkpoint_hdr.h | 2 +- 2 files changed, 15 insertions(+), 3 deletions(-) diff --git a/checkpoint/files.c b/checkpoint/files.c index 8c7ba9f..3e9ff75 100644 --- a/checkpoint/files.c +++ b/checkpoint/files.c @@ -152,7 +152,11 @@ int checkpoint_file_common(struct ckpt_ctx *ctx, struct file *file, h->f_pos = file->f_pos; h->f_version = file->f_version; - /* FIX: need also file->uid, file->gid, file->f_owner, etc */ + h->f_credref = checkpoint_obj(ctx, file->f_cred, CKPT_OBJ_CRED); + if (h->f_credref < 0) + return h->f_credref; + + /* FIX: need also file->f_owner, etc */ return 0; } @@ -361,8 +365,16 @@ int restore_file_common(struct ckpt_ctx *ctx, struct file *file, struct ckpt_hdr_file *h) { int ret; + struct cred *cred; + + /* FIX: need to restore owner etc */ - /* FIX: need to restore uid, gid, owner etc */ + /* restore the cred */ + cred = ckpt_obj_fetch(ctx, h->f_credref, CKPT_OBJ_CRED); + if (IS_ERR(cred)) + return PTR_ERR(cred); + put_cred(file->f_cred); + file->f_cred = get_cred(cred); /* safe to set 1st arg (fd) to 0, as command is F_SETFL */ ret = vfs_fcntl(0, F_SETFL, h->f_flags & CKPT_SETFL_MASK, file); diff --git a/include/linux/checkpoint_hdr.h b/include/linux/checkpoint_hdr.h index df35703..0bad447 100644 --- a/include/linux/checkpoint_hdr.h +++ b/include/linux/checkpoint_hdr.h @@ -350,7 +350,7 @@ struct ckpt_hdr_file { __u32 f_type; __u32 f_mode; __u32 f_flags; - __u32 _padding; + __s32 f_credref; __u64 f_pos; __u64 f_version; } __attribute__((aligned(8))); -- 1.6.1 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 8/8] user namespaces: debug refcounts 2009-05-26 17:32 [PATCH 0/8] a start to credentials c/r Serge E. Hallyn ` (6 preceding siblings ...) 2009-05-26 17:34 ` [PATCH 7/8] cr: restore file->f_cred Serge E. Hallyn @ 2009-05-26 17:34 ` Serge E. Hallyn 2009-05-27 3:05 ` [PATCH 0/8] a start to credentials c/r Casey Schaufler 8 siblings, 0 replies; 17+ messages in thread From: Serge E. Hallyn @ 2009-05-26 17:34 UTC (permalink / raw) To: Oren Laadan Cc: Linux Containers, David Howells, Alexey Dobriyan, linux-security-module Create /proc/userns, which prints out all user namespaces. It prints the address of the user_ns itself, the uid and userns address of the user who created it, and the reference count. Signed-off-by: Serge E. Hallyn <serue@us.ibm.com> --- checkpoint/process.c | 2 - include/linux/user_namespace.h | 2 + kernel/user.c | 1 + kernel/user_namespace.c | 84 ++++++++++++++++++++++++++++++++++++++++ 4 files changed, 87 insertions(+), 2 deletions(-) diff --git a/checkpoint/process.c b/checkpoint/process.c index 41656e3..c1db231 100644 --- a/checkpoint/process.c +++ b/checkpoint/process.c @@ -800,9 +800,7 @@ static int restore_creds(struct ckpt_ctx *ctx) int restore_task(struct ckpt_ctx *ctx) { int ret; - struct cred *realcred, *ecred; - ctx->realcred = ctx->ecred = NULL; ret = restore_task_struct(ctx); ckpt_debug("ret %d\n", ret); if (ret < 0) diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index 3eeee40..4503224 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -14,8 +14,10 @@ struct user_namespace { struct hlist_head uidhash_table[UIDHASH_SZ]; struct user_struct *creator; struct work_struct destroyer; + struct list_head list; }; +extern spinlock_t usernslist_lock; extern struct user_namespace init_user_ns; #ifdef CONFIG_USER_NS diff --git a/kernel/user.c b/kernel/user.c index 97f13e2..1a9a44f 100644 --- a/kernel/user.c +++ b/kernel/user.c @@ -24,6 +24,7 @@ struct user_namespace init_user_ns = { .refcount = ATOMIC_INIT(2), }, .creator = &root_user, + .list = LIST_HEAD_INIT(init_user_ns.list), }; EXPORT_SYMBOL_GPL(init_user_ns); diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index 857cb3d..e76b38f 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -11,6 +11,11 @@ #include <linux/user_namespace.h> #include <linux/checkpoint.h> #include <linux/cred.h> +#include <linux/proc_fs.h> +#include <linux/seq_file.h> +#include <linux/spinlock.h> + +DEFINE_SPINLOCK(usernslist_lock); static struct user_namespace *_new_user_ns(struct user_struct *creator, struct user_struct **newroot) @@ -41,6 +46,9 @@ static struct user_namespace *_new_user_ns(struct user_struct *creator, /* alloc_uid() incremented the userns refcount. Just set it to 1 */ kref_set(&ns->kref, 1); + spin_lock(&usernslist_lock); + list_add_tail(&ns->list, &init_user_ns.list); + spin_unlock(&usernslist_lock); *newroot = root_user; return ns; } @@ -91,6 +99,9 @@ static void free_user_ns_work(struct work_struct *work) { struct user_namespace *ns = container_of(work, struct user_namespace, destroyer); + spin_lock(&usernslist_lock); + list_del(&ns->list); + spin_unlock(&usernslist_lock); free_uid(ns->creator); kfree(ns); } @@ -105,6 +116,79 @@ void free_user_ns(struct kref *kref) } EXPORT_SYMBOL(free_user_ns); +#ifdef CONFIG_PROC_FS +static int proc_userns_show(struct seq_file *m, void *v) +{ + struct user_namespace *ns = v; + seq_printf(m, "userns %p creator (uid %d ns %p) count %d\n", + (void *)ns, ns->creator->uid, (void *) ns->creator->user_ns, + atomic_read(&ns->kref.refcount)); + return 0; +} + +static void *proc_userns_start(struct seq_file *p, loff_t *_pos) +{ + loff_t pos = *_pos; + struct user_namespace *ns = &init_user_ns; + spin_lock(&usernslist_lock); + while (pos) { + pos--; + ns = list_entry(ns->list.next, struct user_namespace, list); + if (ns == &init_user_ns) + return NULL; + } + return ns; +} + +static void *proc_userns_next(struct seq_file *p, void *v, loff_t *_pos) +{ + struct user_namespace *ns = v; + (*_pos)++; + ns = list_entry(ns->list.next, struct user_namespace, list); + if (ns == &init_user_ns) + return NULL; + return ns; +} + +static void proc_userns_stop(struct seq_file *p, void *v) +{ + spin_unlock(&usernslist_lock); +} + +static const struct seq_operations proc_userns_ops; + +static int proc_userns_open(struct inode *inode, struct file *filp) +{ + return seq_open(filp, &proc_userns_ops); +} + +static const struct seq_operations proc_userns_ops = { + .start = proc_userns_start, + .next = proc_userns_next, + .stop = proc_userns_stop, + .show = proc_userns_show, +}; + +const struct file_operations proc_userns_fops = { + .open = proc_userns_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release, +}; + +static __init int user_ns_debug(void) +{ + struct proc_dir_entry *p; + + p = proc_create("userns", 0, NULL, &proc_userns_fops); + if (!p) + panic("cannot create /proc/userns\n"); + return 0; +} + +__initcall(user_ns_debug); +#endif + #ifdef CONFIG_CHECKPOINT /* * checkpoint_write_userns() is only called from -- 1.6.1 ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 0/8] a start to credentials c/r 2009-05-26 17:32 [PATCH 0/8] a start to credentials c/r Serge E. Hallyn ` (7 preceding siblings ...) 2009-05-26 17:34 ` [PATCH 8/8] user namespaces: debug refcounts Serge E. Hallyn @ 2009-05-27 3:05 ` Casey Schaufler 2009-05-27 12:37 ` Serge E. Hallyn 8 siblings, 1 reply; 17+ messages in thread From: Casey Schaufler @ 2009-05-27 3:05 UTC (permalink / raw) To: Serge E. Hallyn Cc: Oren Laadan, Linux Containers, David Howells, Alexey Dobriyan, linux-security-module Serge E. Hallyn wrote: > Following is the next version of the credentials c/r patchset, > on top of the c/r patchset at > git://git.ncl.cs.columbia.edu/pub/git/linux-cr.git > > It implements checkpoint and restart of user, user namespaces, > groups, supplementary groups, and struct cred. > > There is a question as to what to do about LSM data at > restart. Right now I'm ignoring it, which means that > prepare_creds() should ensure that the restart tasks get > the context of the task calling sys_restart(). I > suspect the right thing to do is to add two new LSM > hooks, one which checks current's authorization to > restart from the checkpoint file, How would that work? Based on information in the file? You have to assume that some number of checkpoint files have been hand written by Elbonian ne'er do wells. > and one which determines > the task->cred->security filed based upon any of: > 1. current_security() of the task calling sys_restart() > 2. the task->cred->security checkpointed in the ckpt file > 3. the ->security of the checkpoint file > For Smack the correct behavior would be: 1. for sys_restart() callers without CAP_MAC_ADMIN 2. for sys_restart() callers with CAP_MAC_ADMIN 3. never sys_restart() callers running with CAP_MAC_ADMIN would have to be very very careful about the files they restart. But that's nothing new in the MAC world. > Oren, I think this version has all the changes you asked > for except for restoring cred info for sysvipc. > > thanks, > -serge > -- > To unsubscribe from this list: send the line "unsubscribe linux-security-module" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/8] a start to credentials c/r 2009-05-27 3:05 ` [PATCH 0/8] a start to credentials c/r Casey Schaufler @ 2009-05-27 12:37 ` Serge E. Hallyn 2009-05-27 16:03 ` Casey Schaufler 0 siblings, 1 reply; 17+ messages in thread From: Serge E. Hallyn @ 2009-05-27 12:37 UTC (permalink / raw) To: Casey Schaufler Cc: Oren Laadan, Linux Containers, David Howells, Alexey Dobriyan, linux-security-module Quoting Casey Schaufler (casey@schaufler-ca.com): > Serge E. Hallyn wrote: > > Following is the next version of the credentials c/r patchset, > > on top of the c/r patchset at > > git://git.ncl.cs.columbia.edu/pub/git/linux-cr.git > > > > It implements checkpoint and restart of user, user namespaces, > > groups, supplementary groups, and struct cred. > > > > There is a question as to what to do about LSM data at > > restart. Right now I'm ignoring it, which means that > > prepare_creds() should ensure that the restart tasks get > > the context of the task calling sys_restart(). I > > suspect the right thing to do is to add two new LSM > > hooks, one which checks current's authorization to > > restart from the checkpoint file, > > How would that work? Based on information in the file? > You have to assume that some number of checkpoint files > have been hand written by Elbonian ne'er do wells. Not based on information in the file, but based on the credentials of the task which created the file, and whether an unprivileged task could have hand-edited the file before feeding it to sys_restart(). So some example decisions in terms of selinux contexts might be, 1. a task of user_u may restart a file of type user_u if the checkpointed context is user_u 2. a task of user_u may NOT restart a file of type user_u if the checkpointed context is root_u 3. a task of root_u may restart a file of type root_u if the checkpointed context is user_u Uh, so yes, bsaed on info in the file as well :) Except of course the LSM would just be fed the checkpointed context and the checkpoint file context (and can deduce current's context). > > and one which determines > > the task->cred->security filed based upon any of: > > 1. current_security() of the task calling sys_restart() > > 2. the task->cred->security checkpointed in the ckpt file > > 3. the ->security of the checkpoint file > > > > For Smack the correct behavior would be: > > 1. for sys_restart() callers without CAP_MAC_ADMIN > 2. for sys_restart() callers with CAP_MAC_ADMIN > 3. never That makes sense, and is basically analagous (if I'm thinking right) to how I'm doing capabilities. So the first (authorization hook) for smack would just always return TRUE? I can hook that up right now... > sys_restart() callers running with CAP_MAC_ADMIN would have to be > very very careful about the files they restart. But that's nothing > new in the MAC world. Yup. Mind you eventually I expect a setup where some privileged program is asked (by privileged or unprivilegd tasks) to create a checkpoint and ask the TPM to sign it. No unprivileged program can sign an image directly, so then a restart of a task with privilege can be restricted to anything with a valid signature. In that case, it may be safe to have the checkpointed task's credentials completely restored, including LSM labels. But that's a ways off. thanks, -serge ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/8] a start to credentials c/r 2009-05-27 12:37 ` Serge E. Hallyn @ 2009-05-27 16:03 ` Casey Schaufler 2009-05-27 18:24 ` Serge E. Hallyn 0 siblings, 1 reply; 17+ messages in thread From: Casey Schaufler @ 2009-05-27 16:03 UTC (permalink / raw) To: Serge E. Hallyn Cc: Oren Laadan, Linux Containers, David Howells, Alexey Dobriyan, linux-security-module Serge E. Hallyn wrote: > Quoting Casey Schaufler (casey@schaufler-ca.com): > >> Serge E. Hallyn wrote: >> >>> Following is the next version of the credentials c/r patchset, >>> on top of the c/r patchset at >>> git://git.ncl.cs.columbia.edu/pub/git/linux-cr.git >>> >>> It implements checkpoint and restart of user, user namespaces, >>> groups, supplementary groups, and struct cred. >>> >>> There is a question as to what to do about LSM data at >>> restart. Right now I'm ignoring it, which means that >>> prepare_creds() should ensure that the restart tasks get >>> the context of the task calling sys_restart(). I >>> suspect the right thing to do is to add two new LSM >>> hooks, one which checks current's authorization to >>> restart from the checkpoint file, >>> >> How would that work? Based on information in the file? >> You have to assume that some number of checkpoint files >> have been hand written by Elbonian ne'er do wells. >> > > Not based on information in the file, but based on the > credentials of the task which created the file, and > whether an unprivileged task could have hand-edited the > file before feeding it to sys_restart(). > > So some example decisions in terms of selinux contexts might be, > 1. a task of user_u may restart a file of type user_u > if the checkpointed context is user_u > 2. a task of user_u may NOT restart a file of type user_u > if the checkpointed context is root_u > 3. a task of root_u may restart a file of type root_u > if the checkpointed context is user_u > > Uh, so yes, bsaed on info in the file as well :) Except > of course the LSM would just be fed the checkpointed context > and the checkpoint file context (and can deduce current's context). > And SELinux can do whatever calculations it likes based on the three contexts and the loaded policy. Are you at all concerned about the possibility that the policy may have changed? I can envision scenarios in which it would be impossible for a process to gain a particular context under current policy, but that a checkpointed process may have stored away. > >>> and one which determines >>> the task->cred->security filed based upon any of: >>> 1. current_security() of the task calling sys_restart() >>> 2. the task->cred->security checkpointed in the ckpt file >>> 3. the ->security of the checkpoint file >>> >>> >> For Smack the correct behavior would be: >> >> 1. for sys_restart() callers without CAP_MAC_ADMIN >> 2. for sys_restart() callers with CAP_MAC_ADMIN >> 3. never >> > > That makes sense, and is basically analagous (if I'm thinking > right) to how I'm doing capabilities. > > So the first (authorization hook) for smack would just always > return TRUE? > I suggest that it needs to check for a valid Smack label. Even though they're just text strings they do have limitations, including size (> 0 < 24) and character set. A call to smk_import() is the right way to do it, as it also makes sure the label is in the internal list. If smk_import() returns NULL something's amiss. > I can hook that up right now... > I bet you could do it even with the call to smk_import. (smiley here) > >> sys_restart() callers running with CAP_MAC_ADMIN would have to be >> very very careful about the files they restart. But that's nothing >> new in the MAC world. >> > > Yup. > > Mind you eventually I expect a setup where some privileged program > is asked (by privileged or unprivilegd tasks) to create a checkpoint > and ask the TPM to sign it. No unprivileged program can sign an > image directly, so then a restart of a task with privilege can be > restricted to anything with a valid signature. In that case, it > may be safe to have the checkpointed task's credentials completely > restored, including LSM labels. > All of the current LSMs share the property that the access control rules (SELinux policy, Smack access rules, TOMOYO policy) may change between the time of checkpoint and the time of restart. If I had a silver bullet answer to the concerns that raises I'd pass it along, but as I don't I'll stick to the answer I have for Smack (The rules of the moment are those that matter, and the architecture of Smack supports that) and leave the other LSMs to their own devices. > But that's a ways off. > It does look like a bit of work. Thank you. > thanks, > -serge > -- > To unsubscribe from this list: send the line "unsubscribe linux-security-module" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/8] a start to credentials c/r 2009-05-27 16:03 ` Casey Schaufler @ 2009-05-27 18:24 ` Serge E. Hallyn 0 siblings, 0 replies; 17+ messages in thread From: Serge E. Hallyn @ 2009-05-27 18:24 UTC (permalink / raw) To: Casey Schaufler Cc: Oren Laadan, Linux Containers, David Howells, Alexey Dobriyan, linux-security-module Quoting Casey Schaufler (casey@schaufler-ca.com): > Serge E. Hallyn wrote: > > Quoting Casey Schaufler (casey@schaufler-ca.com): ... > > Uh, so yes, bsaed on info in the file as well :) Except > > of course the LSM would just be fed the checkpointed context > > and the checkpoint file context (and can deduce current's context). > > > > And SELinux can do whatever calculations it likes based on the > three contexts and the loaded policy. Are you at all concerned > about the possibility that the policy may have changed? I can > envision scenarios in which it would be impossible for a process > to gain a particular context under current policy, but that a > checkpointed process may have stored away. Good point. But on the other hand, if the program were running the whole time, instead of being checkpointed and restarted, then the running program wouldn't be relabeled when the policy changed, right? Now if the domain becomes invalid, then presumably the restart would fail. But if the (source_domain,entry_type)->new_domain set changes from (root_t,x_entry_t)->x_t to (root_t,x_entry_t)->y_t, a task running as x_t won't be relabeled to y_t. So I don't thnk restarting a task which is checkpointed as x_t, under the x_t domain, is wrong. > >>> and one which determines > >>> the task->cred->security filed based upon any of: > >>> 1. current_security() of the task calling sys_restart() > >>> 2. the task->cred->security checkpointed in the ckpt file > >>> 3. the ->security of the checkpoint file > >>> > >>> > >> For Smack the correct behavior would be: > >> > >> 1. for sys_restart() callers without CAP_MAC_ADMIN > >> 2. for sys_restart() callers with CAP_MAC_ADMIN > >> 3. never > >> > > > > That makes sense, and is basically analagous (if I'm thinking > > right) to how I'm doing capabilities. > > > > So the first (authorization hook) for smack would just always > > return TRUE? > > > > I suggest that it needs to check for a valid Smack label. Even though > they're just text strings they do have limitations, including size > (> 0 < 24) and character set. A call to smk_import() is the right > way to do it, as it also makes sure the label is in the internal list. > If smk_import() returns NULL something's amiss. Ok, thanks. -serge ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2009-05-28 14:36 UTC | newest] Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2009-05-26 17:32 [PATCH 0/8] a start to credentials c/r Serge E. Hallyn 2009-05-26 17:33 ` [PATCH 1/8] cr: break out new_user_ns() Serge E. Hallyn 2009-05-26 17:33 ` [PATCH 2/8] cr: split core function out of some set*{u,g}id functions Serge E. Hallyn 2009-05-26 17:33 ` [PATCH 3/8] cr: capabilities: define checkpoint and restore fns Serge E. Hallyn 2009-05-26 17:33 ` [PATCH 4/8] groups: move code to kernel/groups.c Serge E. Hallyn 2009-05-26 17:33 ` [PATCH 5/8] groups: allow compilation on s390x Serge E. Hallyn 2009-05-26 23:17 ` Serge E. Hallyn [not found] ` <20090526173242.GA13757-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> 2009-05-26 17:33 ` [PATCH 6/8] cr: checkpoint and restore task credentials Serge E. Hallyn 2009-05-27 18:36 ` Alexey Dobriyan 2009-05-28 14:01 ` Serge E. Hallyn 2009-05-28 14:36 ` Alexey Dobriyan 2009-05-26 17:34 ` [PATCH 7/8] cr: restore file->f_cred Serge E. Hallyn 2009-05-26 17:34 ` [PATCH 8/8] user namespaces: debug refcounts Serge E. Hallyn 2009-05-27 3:05 ` [PATCH 0/8] a start to credentials c/r Casey Schaufler 2009-05-27 12:37 ` Serge E. Hallyn 2009-05-27 16:03 ` Casey Schaufler 2009-05-27 18:24 ` Serge E. Hallyn
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.