First of all, thank you very much for the review! I am still learning with many things and your feedback is very much appreciated! >First off: On Linux, as far as I know, chroots were never meant to be a security feature, and chroot "jails" break in a number of different ways. A lot of effort went into making bind mounts actually secure with reasonable performance, and I doubt that >something like this can provide anything close to that, at least not without gigantic runtime overhead. Instead of making people believe that it's now okay to use chroot for security, I would very much prefer to keep the "never use this for security >purposes" warning in the chroot() manpage and encourage people to use namespaces with bind mounts instead. I am aware that chroot jail should not be used for security purposes, but reality is that it is *used* in many places at least with some intention of getting a better security. I don't believe there is any person nowadays who believes that chroot is secure, but because it is super light-weight (even compare with namespaces setup) and because of legacy concerns, it is still there. I don't also claim that this LSM make chroot *fully* secure (as you mentioned below some important features are still missing) and it would never be as strong as full namespace-based solution, but practical problem with namespaces is that it isn't that easy to setup them up properly and possibility of making configuration mistake is still quite high. About the performance overhead, I actually don't believe it would be that gigantic (especially when we get proper LSM stacking in place that eliminates the need to store internal list and do searches), but of course any LSM has performance impact, especially with the hooks on the filesystem. And don't take me wrong, I do personally like namespaces very much. Just usecases I am trying to address with this are very different. > I think that a somewhat more formal specification of what this is supposed to be good for might help. Yes, I think this make sense. It would clarify on when you should use this (in cases when you do need to use chroot only but want to make it a degree better). >As far as I can tell, you haven't done anything to restrict the access to unix domain sockets or ptrace, which are probably the main tricks people would actually use to break out of a chroot "jail". Instead, you add support for some weird edgecase things like >"if you are privileged and inside a chroot, you can renice processes outside the chroot". Supporting unix domain sockets is in plans, I was looking into all GRKERNSEC features and working them one by one. I first processed all features that can be easily moved into existing LSM hooks, and nice was one of them. Then I started adding new hooks for the rest of features. But at some point I just exceeded my threshold of confidence with all the changes and hooks I am making (unix domain sockets might need further new hooks), so I wanted to bring this out for review earlier than later. What about ptrace? I am assume that we have Yama in place for this :) > Note: this feature currently prevents rtkit daemon from > normal operation. >And it probably also prevents people from using chroot() for what it's actually intended to do. I would not go as far as that. How many cases apart from rtkit daemon who sets priorities for other processes you know that would suffer from this behaviour? Again, this was an easy feature that fits into existing hook and can be fully removed if people don't see a value in it. It is certainly not the main part of it. > +/** > + * is_process_chrooted - process is inside chroot > + * @task: the task_struct of the process to be checked > + * > + * Returns 1 if task is inside chroot. > + */ > +static int is_process_chrooted(struct task_struct *task) { > + int rc = 0; > + struct hardchroot_info *entry; > + > + rcu_read_lock(); > + list_for_each_entry_rcu(entry, &hardchroot_infos, node) { >Wait, what? Is this really a function that has execution time linear in the number of running processes and that is called for every fchdir()? That seems slow. Did you benchmark this on a system with lots of processes? I haven't done any benchmarking on this yet, but remember that hardchroot_infos will be quite a short list, which equals number of processes that explicitly entered chroot (not their children, only first parent process to do it). In Ubuntu and Fedora for example, after the full boot, I only had 3 such processes in the list, so this gives the size of list I am expecting: 10-20 items absolutely max (normally under 10), not hundreds. So, list iteration is quite fast. So, it doesn't depend on the number of system processes, but it does depend on number of processes that entered chroot. > +/** > + * is_same_root - check if two tasks share the same root > + * @task1: the task_struct of the first task to be checked > + * @task2: the task_struct of the second task to be checked > + * > + * Returns 1 if tasks share the same root. > + */ > +static int is_same_root(struct task_struct *task1, struct task_struct > +*task2) { > + int rc = 0; > + struct hardchroot_info *entry; > + struct dentry *dentry1 = NULL; > + struct dentry *dentry2 = NULL; > + > + rcu_read_lock(); > + list_for_each_entry_rcu(entry, &hardchroot_infos, node) { > + if (entry->invalid) > + continue; > + if (entry->task == task1) > + dentry1 = entry->dentry; > + if (entry->task == task2) > + dentry2 = entry->dentry; > + } > + if (dentry1 && (dentry1 == dentry2)) { > + rc = 1; > + pr_info("HCRT: pids %d and %d have the same root\n", > + task_pid_nr(task1), task_pid_nr(task2)); > + } >You're comparing dentries, not paths? So using the same directory through different bind mounts as root directories will still count as "same root" even though for chroot breakouts, it's like having different roots? Bind mounts are tricky in this case. From one side yes, it is yes like having two different roots, but then how to handle legitimate cases when people bind mount things into chroot to be used there? Blocking them would make bind mounts not supported inside chroot, which is not smth that should be done IMO. > +/** > + * hardchroot_path_chroot - validate chroot entry > + * @path contains the path structure. > + * > + * Returns 0 if chroot is allowed, -ve on error. > + */ > +static int hardchroot_path_chroot(const struct path *path) { > + int rc = 0; > + struct task_struct *myself = current; > + > + pr_info("HCRT, chroot: inode %lu and pid %d\n", > + d_backing_inode(path->dentry)->i_ino, > + task_pid_nr(myself)); > + > + get_task_struct(myself); > + if (is_process_chrooted(myself) && > + !is_inside_chroot(path->dentry, path->mnt)) { > + put_task_struct(myself); > + pr_info("HCRT, chroot denied: for inode %lu and pid %d\n", > + d_backing_inode(path->dentry)->i_ino, > + task_pid_nr(myself)); > + return -EACCES; > + } > + > + if (task_pid_nr(myself) > 1 && > + path->dentry != init_task.fs->root.dentry && > + path->dentry != myself->nsproxy->mnt_ns->root->mnt.mnt_root) { > + /* task is attempting to chroot, add it to the list */ > + rc = hardchroot_info_add(myself, path->dentry); > + pr_info("HCRT, chroot: adding %d to chrooted task list\n", > + task_pid_nr(myself)); > + } > + > + /* set the current working directory of all newly-chrooted > + * processes to the the root directory of the chroot > + */ > + set_fs_pwd(myself->fs, path); >That seems pretty broken, considering the possibility of open file descriptors pointing to the old root or so. Again, the main use case here is that we have a good process that enters chroot to minimize the damage in case it gets exploited some later time in life. A good process is supposed to close its file descriptors pointing to the old root before entering chroot, isn't it? >And again, chroot only works if the current process is privileged. >If it is, there are probably other ways to break out. You need to have privilege to *enter* chroot, but if you are good process, you will drop the privilege after you did your setup. > +/** > + * hardchroot_task_unshare - check if process is > + * allowed to unshare its namespaces > + * @unshare_flags flags > + * @new_fs contains the new fs_struct if created. > + * @new_fd contains the new files_struct if created. > + * @new_creds contains the new cred if created. > + * @new_nsproxy contains the new nsproxy if created. > + * > + * Returns 0 if unshare is allowed, -ve on error. > + */ > +static int hardchroot_task_unshare(unsigned long unshare_flags, > + const struct fs_struct *new_fs, > + const struct files_struct *new_fd, > + const struct cred *new_cred, > + const struct nsproxy *new_nsproxy) > +{ > + int rc = 0; > + struct task_struct *myself = current; > + const struct nsproxy *tnsproxy = new_nsproxy; > + > + pr_info("HCRT, unshare: unshare_flags %lu and pid %d\n", > + unshare_flags, task_pid_nr(myself)); > + if (new_fs) > + pr_info("HCRT, unshare: new_fs->root.dentry inode%lu\n", > + d_backing_inode(new_fs->root.dentry)->i_ino); > + > + if (!tnsproxy) > + tnsproxy = myself->nsproxy; > + > + if (new_fs && task_pid_nr(myself) > 1 && > + new_fs->root.dentry != init_task.fs->root.dentry && >Hm? You're accessing the fs_struct of init_task without any explicit locking? I am quite new to the kernel development and locking. I was reading a lot recently about locking to understand where it is needed and where not, but mistakes are obviously still coming. I guess I would need to do smth like: task_lock(&init_task) before and task_unlock(&init_task) after. > +/** > + * hardchroot_shm_shmat - check if shmat is allowed > + * @shp contains the shared memory structure to be modified. > + * @shmaddr contains the address to attach memory region to. > + * @shmflg contains the operational flags. > + * > + * Return 0 if allowed, -ve on error. > + */ > +int hardchroot_shm_shmat(struct shmid_kernel *shp, char __user *shmaddr, > + int shmflg) > +{ > + int rc = 0; > + struct task_struct *p; > + struct task_struct *myself = current; > + u64 st; > + time_t ct; > + > + pr_info("HCRT, shmat: shp %d, shmflg %d, pid %d\n", > + shp->shm_perm.id, shmflg, > + task_pid_nr(myself)); > + > + if (likely(!is_process_chrooted(myself))) > + return rc; > + > + rcu_read_lock(); > + read_lock(&tasklist_lock); > + > + p = find_task_by_vpid(shp->shm_cprid); >Not exactly your fault, but this is broken when PID namespaces are involved. Yes, this has been bothering me also... Grsecurity has their own version of this function (find_task_by_vpid_unrestricted) instead that tried to do global search, I guess I should think on doing smth similar, but I am wondering if this can be done better. I will look into this for the next version.