From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.7 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8A6AAC433DB for ; Tue, 23 Mar 2021 21:00:26 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 2177F601FE for ; Tue, 23 Mar 2021 21:00:26 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2177F601FE Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 205938D0023; Tue, 23 Mar 2021 17:00:14 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 11A3B8D0022; Tue, 23 Mar 2021 17:00:14 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E858F8D0023; Tue, 23 Mar 2021 17:00:13 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0096.hostedemail.com [216.40.44.96]) by kanga.kvack.org (Postfix) with ESMTP id BACD38D0022 for ; Tue, 23 Mar 2021 17:00:13 -0400 (EDT) Received: from smtpin03.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 6AEBA181AF5D3 for ; Tue, 23 Mar 2021 21:00:13 +0000 (UTC) X-FDA: 77952356706.03.1BB703E Received: from raptor.unsafe.ru (raptor.unsafe.ru [5.9.43.93]) by imf07.hostedemail.com (Postfix) with ESMTP id 49C99A0009F2 for ; Tue, 23 Mar 2021 21:00:12 +0000 (UTC) Received: from comp-core-i7-2640m-0182e6.redhat.com (ip-94-113-225-162.net.upcbroadband.cz [94.113.225.162]) by raptor.unsafe.ru (Postfix) with ESMTPSA id C55A540C96; Tue, 23 Mar 2021 21:00:11 +0000 (UTC) From: Alexey Gladkov To: LKML , Kernel Hardening , Linux Containers , linux-mm@kvack.org Cc: Alexey Gladkov , Andrew Morton , Christian Brauner , "Eric W . Biederman" , Jann Horn , Jens Axboe , Kees Cook , Linus Torvalds , Oleg Nesterov , kernel test robot Subject: [PATCH v9 7/8] Reimplement RLIMIT_MEMLOCK on top of ucounts Date: Tue, 23 Mar 2021 21:59:16 +0100 Message-Id: X-Mailer: git-send-email 2.29.3 In-Reply-To: References: MIME-Version: 1.0 X-Greylist: Sender succeeded SMTP AUTH, not delayed by milter-greylist-4.6.4 (raptor.unsafe.ru [5.9.43.93]); Tue, 23 Mar 2021 21:00:12 +0000 (UTC) X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 49C99A0009F2 X-Stat-Signature: k1tzqo4sounkmxk9t534krpn73cdr99r Received-SPF: none (gmail.com>: No applicable sender policy available) receiver=imf07; identity=mailfrom; envelope-from=""; helo=raptor.unsafe.ru; client-ip=5.9.43.93 X-HE-DKIM-Result: none/none X-HE-Tag: 1616533212-768903 Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The rlimit counter is tied to uid in the user_namespace. This allows rlimit values to be specified in userns even if they are already globally exceeded by the user. However, the value of the previous user_namespaces cannot be exceeded. Changelog v8: * Fix issues found by lkp-tests project. v7: * Keep only ucounts for RLIMIT_MEMLOCK checks instead of struct cred. v6: * Fix bug in hugetlb_file_setup() detected by trinity. Reported-by: kernel test robot Signed-off-by: Alexey Gladkov --- fs/hugetlbfs/inode.c | 16 ++++++++-------- include/linux/hugetlb.h | 4 ++-- include/linux/mm.h | 4 ++-- include/linux/sched/user.h | 1 - include/linux/shmem_fs.h | 2 +- include/linux/user_namespace.h | 1 + ipc/shm.c | 26 +++++++++++++------------- kernel/fork.c | 1 + kernel/ucount.c | 1 + kernel/user.c | 1 - kernel/user_namespace.c | 1 + mm/memfd.c | 4 ++-- mm/mlock.c | 23 +++++++++++++++-------- mm/mmap.c | 4 ++-- mm/shmem.c | 8 ++++---- 15 files changed, 53 insertions(+), 44 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 701c82c36138..be519fc9559a 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -1443,7 +1443,7 @@ static int get_hstate_idx(int page_size_log) * otherwise hugetlb_reserve_pages reserves one less hugepages than inte= nded. */ struct file *hugetlb_file_setup(const char *name, size_t size, - vm_flags_t acctflag, struct user_struct **user, + vm_flags_t acctflag, struct ucounts **ucounts, int creat_flags, int page_size_log) { struct inode *inode; @@ -1455,20 +1455,20 @@ struct file *hugetlb_file_setup(const char *name,= size_t size, if (hstate_idx < 0) return ERR_PTR(-ENODEV); =20 - *user =3D NULL; + *ucounts =3D NULL; mnt =3D hugetlbfs_vfsmount[hstate_idx]; if (!mnt) return ERR_PTR(-ENOENT); =20 if (creat_flags =3D=3D HUGETLB_SHMFS_INODE && !can_do_hugetlb_shm()) { - *user =3D current_user(); - if (user_shm_lock(size, *user)) { + *ucounts =3D current_ucounts(); + if (user_shm_lock(size, *ucounts)) { task_lock(current); pr_warn_once("%s (%d): Using mlock ulimits for SHM_HUGETLB is depreca= ted\n", current->comm, current->pid); task_unlock(current); } else { - *user =3D NULL; + *ucounts =3D NULL; return ERR_PTR(-EPERM); } } @@ -1495,9 +1495,9 @@ struct file *hugetlb_file_setup(const char *name, s= ize_t size, =20 iput(inode); out: - if (*user) { - user_shm_unlock(size, *user); - *user =3D NULL; + if (*ucounts) { + user_shm_unlock(size, *ucounts); + *ucounts =3D NULL; } return file; } diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index cccd1aab69dd..96d63dbdec65 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -434,7 +434,7 @@ static inline struct hugetlbfs_inode_info *HUGETLBFS_= I(struct inode *inode) extern const struct file_operations hugetlbfs_file_operations; extern const struct vm_operations_struct hugetlb_vm_ops; struct file *hugetlb_file_setup(const char *name, size_t size, vm_flags_= t acct, - struct user_struct **user, int creat_flags, + struct ucounts **ucounts, int creat_flags, int page_size_log); =20 static inline bool is_file_hugepages(struct file *file) @@ -454,7 +454,7 @@ static inline struct hstate *hstate_inode(struct inod= e *i) #define is_file_hugepages(file) false static inline struct file * hugetlb_file_setup(const char *name, size_t size, vm_flags_t acctflag, - struct user_struct **user, int creat_flags, + struct ucounts **ucounts, int creat_flags, int page_size_log) { return ERR_PTR(-ENOSYS); diff --git a/include/linux/mm.h b/include/linux/mm.h index 64a71bf20536..7466eab000d0 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1658,8 +1658,8 @@ extern bool can_do_mlock(void); #else static inline bool can_do_mlock(void) { return false; } #endif -extern int user_shm_lock(size_t, struct user_struct *); -extern void user_shm_unlock(size_t, struct user_struct *); +extern int user_shm_lock(size_t, struct ucounts *); +extern void user_shm_unlock(size_t, struct ucounts *); =20 /* * Parameter block passed down to zap_pte_range in exceptional cases. diff --git a/include/linux/sched/user.h b/include/linux/sched/user.h index 8ba9cec4fb99..82bd2532da6b 100644 --- a/include/linux/sched/user.h +++ b/include/linux/sched/user.h @@ -18,7 +18,6 @@ struct user_struct { #ifdef CONFIG_EPOLL atomic_long_t epoll_watches; /* The number of file descriptors currentl= y watched */ #endif - unsigned long locked_shm; /* How many pages of mlocked shm ? */ unsigned long unix_inflight; /* How many files in flight in unix socket= s */ atomic_long_t pipe_bufs; /* how many pages are allocated in pipe buffe= rs */ =20 diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h index d82b6f396588..aa77dcd1646f 100644 --- a/include/linux/shmem_fs.h +++ b/include/linux/shmem_fs.h @@ -65,7 +65,7 @@ extern struct file *shmem_file_setup_with_mnt(struct vf= smount *mnt, extern int shmem_zero_setup(struct vm_area_struct *); extern unsigned long shmem_get_unmapped_area(struct file *, unsigned lon= g addr, unsigned long len, unsigned long pgoff, unsigned long flags); -extern int shmem_lock(struct file *file, int lock, struct user_struct *u= ser); +extern int shmem_lock(struct file *file, int lock, struct ucounts *ucoun= ts); #ifdef CONFIG_SHMEM extern const struct address_space_operations shmem_aops; static inline bool shmem_mapping(struct address_space *mapping) diff --git a/include/linux/user_namespace.h b/include/linux/user_namespac= e.h index 6e8736c7aa29..82851fba7278 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -53,6 +53,7 @@ enum ucount_type { UCOUNT_RLIMIT_NPROC, UCOUNT_RLIMIT_MSGQUEUE, UCOUNT_RLIMIT_SIGPENDING, + UCOUNT_RLIMIT_MEMLOCK, UCOUNT_COUNTS, }; =20 diff --git a/ipc/shm.c b/ipc/shm.c index febd88daba8c..003234fbbd17 100644 --- a/ipc/shm.c +++ b/ipc/shm.c @@ -60,7 +60,7 @@ struct shmid_kernel /* private to the kernel */ time64_t shm_ctim; struct pid *shm_cprid; struct pid *shm_lprid; - struct user_struct *mlock_user; + struct ucounts *mlock_ucounts; =20 /* The task created the shm object. NULL if the task is dead. */ struct task_struct *shm_creator; @@ -286,10 +286,10 @@ static void shm_destroy(struct ipc_namespace *ns, s= truct shmid_kernel *shp) shm_rmid(ns, shp); shm_unlock(shp); if (!is_file_hugepages(shm_file)) - shmem_lock(shm_file, 0, shp->mlock_user); - else if (shp->mlock_user) + shmem_lock(shm_file, 0, shp->mlock_ucounts); + else if (shp->mlock_ucounts) user_shm_unlock(i_size_read(file_inode(shm_file)), - shp->mlock_user); + shp->mlock_ucounts); fput(shm_file); ipc_update_pid(&shp->shm_cprid, NULL); ipc_update_pid(&shp->shm_lprid, NULL); @@ -625,7 +625,7 @@ static int newseg(struct ipc_namespace *ns, struct ip= c_params *params) =20 shp->shm_perm.key =3D key; shp->shm_perm.mode =3D (shmflg & S_IRWXUGO); - shp->mlock_user =3D NULL; + shp->mlock_ucounts =3D NULL; =20 shp->shm_perm.security =3D NULL; error =3D security_shm_alloc(&shp->shm_perm); @@ -650,7 +650,7 @@ static int newseg(struct ipc_namespace *ns, struct ip= c_params *params) if (shmflg & SHM_NORESERVE) acctflag =3D VM_NORESERVE; file =3D hugetlb_file_setup(name, hugesize, acctflag, - &shp->mlock_user, HUGETLB_SHMFS_INODE, + &shp->mlock_ucounts, HUGETLB_SHMFS_INODE, (shmflg >> SHM_HUGE_SHIFT) & SHM_HUGE_MASK); } else { /* @@ -698,8 +698,8 @@ static int newseg(struct ipc_namespace *ns, struct ip= c_params *params) no_id: ipc_update_pid(&shp->shm_cprid, NULL); ipc_update_pid(&shp->shm_lprid, NULL); - if (is_file_hugepages(file) && shp->mlock_user) - user_shm_unlock(size, shp->mlock_user); + if (is_file_hugepages(file) && shp->mlock_ucounts) + user_shm_unlock(size, shp->mlock_ucounts); fput(file); ipc_rcu_putref(&shp->shm_perm, shm_rcu_free); return error; @@ -1105,12 +1105,12 @@ static int shmctl_do_lock(struct ipc_namespace *n= s, int shmid, int cmd) goto out_unlock0; =20 if (cmd =3D=3D SHM_LOCK) { - struct user_struct *user =3D current_user(); + struct ucounts *ucounts =3D current_ucounts(); =20 - err =3D shmem_lock(shm_file, 1, user); + err =3D shmem_lock(shm_file, 1, ucounts); if (!err && !(shp->shm_perm.mode & SHM_LOCKED)) { shp->shm_perm.mode |=3D SHM_LOCKED; - shp->mlock_user =3D user; + shp->mlock_ucounts =3D ucounts; } goto out_unlock0; } @@ -1118,9 +1118,9 @@ static int shmctl_do_lock(struct ipc_namespace *ns,= int shmid, int cmd) /* SHM_UNLOCK */ if (!(shp->shm_perm.mode & SHM_LOCKED)) goto out_unlock0; - shmem_lock(shm_file, 0, shp->mlock_user); + shmem_lock(shm_file, 0, shp->mlock_ucounts); shp->shm_perm.mode &=3D ~SHM_LOCKED; - shp->mlock_user =3D NULL; + shp->mlock_ucounts =3D NULL; get_file(shm_file); ipc_unlock_object(&shp->shm_perm); rcu_read_unlock(); diff --git a/kernel/fork.c b/kernel/fork.c index 741f896c156e..a3a5e317c3c0 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -825,6 +825,7 @@ void __init fork_init(void) init_user_ns.ucount_max[UCOUNT_RLIMIT_NPROC] =3D task_rlimit(&init_task= , RLIMIT_NPROC); init_user_ns.ucount_max[UCOUNT_RLIMIT_MSGQUEUE] =3D task_rlimit(&init_t= ask, RLIMIT_MSGQUEUE); init_user_ns.ucount_max[UCOUNT_RLIMIT_SIGPENDING] =3D task_rlimit(&init= _task, RLIMIT_SIGPENDING); + init_user_ns.ucount_max[UCOUNT_RLIMIT_MEMLOCK] =3D task_rlimit(&init_ta= sk, RLIMIT_MEMLOCK); =20 #ifdef CONFIG_VMAP_STACK cpuhp_setup_state(CPUHP_BP_PREPARE_DYN, "fork:vm_stack_cache", diff --git a/kernel/ucount.c b/kernel/ucount.c index 29976c8942f3..1db2c8744cde 100644 --- a/kernel/ucount.c +++ b/kernel/ucount.c @@ -84,6 +84,7 @@ static struct ctl_table user_table[] =3D { { }, { }, { }, + { }, { } }; #endif /* CONFIG_SYSCTL */ diff --git a/kernel/user.c b/kernel/user.c index 6737327f83be..c82399c1618a 100644 --- a/kernel/user.c +++ b/kernel/user.c @@ -98,7 +98,6 @@ static DEFINE_SPINLOCK(uidhash_lock); /* root_user.__count is 1, for init task cred */ struct user_struct root_user =3D { .__count =3D REFCOUNT_INIT(1), - .locked_shm =3D 0, .uid =3D GLOBAL_ROOT_UID, .ratelimit =3D RATELIMIT_STATE_INIT(root_user.ratelimit, 0, 0), }; diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c index df1bed32dd48..5ef0d4b182ba 100644 --- a/kernel/user_namespace.c +++ b/kernel/user_namespace.c @@ -124,6 +124,7 @@ int create_user_ns(struct cred *new) ns->ucount_max[UCOUNT_RLIMIT_NPROC] =3D rlimit(RLIMIT_NPROC); ns->ucount_max[UCOUNT_RLIMIT_MSGQUEUE] =3D rlimit(RLIMIT_MSGQUEUE); ns->ucount_max[UCOUNT_RLIMIT_SIGPENDING] =3D rlimit(RLIMIT_SIGPENDING); + ns->ucount_max[UCOUNT_RLIMIT_MEMLOCK] =3D rlimit(RLIMIT_MEMLOCK); ns->ucounts =3D ucounts; =20 /* Inherit USERNS_SETGROUPS_ALLOWED from our parent */ diff --git a/mm/memfd.c b/mm/memfd.c index 2647c898990c..081dd33e6a61 100644 --- a/mm/memfd.c +++ b/mm/memfd.c @@ -297,9 +297,9 @@ SYSCALL_DEFINE2(memfd_create, } =20 if (flags & MFD_HUGETLB) { - struct user_struct *user =3D NULL; + struct ucounts *ucounts =3D NULL; =20 - file =3D hugetlb_file_setup(name, 0, VM_NORESERVE, &user, + file =3D hugetlb_file_setup(name, 0, VM_NORESERVE, &ucounts, HUGETLB_ANONHUGE_INODE, (flags >> MFD_HUGE_SHIFT) & MFD_HUGE_MASK); diff --git a/mm/mlock.c b/mm/mlock.c index f8f8cc32d03d..b874d0436976 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -817,9 +817,10 @@ SYSCALL_DEFINE0(munlockall) */ static DEFINE_SPINLOCK(shmlock_user_lock); =20 -int user_shm_lock(size_t size, struct user_struct *user) +int user_shm_lock(size_t size, struct ucounts *ucounts) { unsigned long lock_limit, locked; + bool overlimit; int allowed =3D 0; =20 locked =3D (size + PAGE_SIZE - 1) >> PAGE_SHIFT; @@ -828,21 +829,27 @@ int user_shm_lock(size_t size, struct user_struct *= user) allowed =3D 1; lock_limit >>=3D PAGE_SHIFT; spin_lock(&shmlock_user_lock); - if (!allowed && - locked + user->locked_shm > lock_limit && !capable(CAP_IPC_LOCK)) + overlimit =3D inc_rlimit_ucounts_and_test(ucounts, UCOUNT_RLIMIT_MEMLOC= K, + locked, lock_limit); + + if (!allowed && overlimit && !capable(CAP_IPC_LOCK)) { + dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, locked); + goto out; + } + if (!get_ucounts(ucounts)) { + dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, locked); goto out; - get_uid(user); - user->locked_shm +=3D locked; + } allowed =3D 1; out: spin_unlock(&shmlock_user_lock); return allowed; } =20 -void user_shm_unlock(size_t size, struct user_struct *user) +void user_shm_unlock(size_t size, struct ucounts *ucounts) { spin_lock(&shmlock_user_lock); - user->locked_shm -=3D (size + PAGE_SIZE - 1) >> PAGE_SHIFT; + dec_rlimit_ucounts(ucounts, UCOUNT_RLIMIT_MEMLOCK, (size + PAGE_SIZE - = 1) >> PAGE_SHIFT); spin_unlock(&shmlock_user_lock); - free_uid(user); + put_ucounts(ucounts); } diff --git a/mm/mmap.c b/mm/mmap.c index 3f287599a7a3..99f97d200aa4 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1605,7 +1605,7 @@ unsigned long ksys_mmap_pgoff(unsigned long addr, u= nsigned long len, goto out_fput; } } else if (flags & MAP_HUGETLB) { - struct user_struct *user =3D NULL; + struct ucounts *ucounts =3D NULL; struct hstate *hs; =20 hs =3D hstate_sizelog((flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK); @@ -1621,7 +1621,7 @@ unsigned long ksys_mmap_pgoff(unsigned long addr, u= nsigned long len, */ file =3D hugetlb_file_setup(HUGETLB_ANON_FILE, len, VM_NORESERVE, - &user, HUGETLB_ANONHUGE_INODE, + &ucounts, HUGETLB_ANONHUGE_INODE, (flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK); if (IS_ERR(file)) return PTR_ERR(file); diff --git a/mm/shmem.c b/mm/shmem.c index b2db4ed0fbc7..45b71a002ab1 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -2227,7 +2227,7 @@ static struct mempolicy *shmem_get_policy(struct vm= _area_struct *vma, } #endif =20 -int shmem_lock(struct file *file, int lock, struct user_struct *user) +int shmem_lock(struct file *file, int lock, struct ucounts *ucounts) { struct inode *inode =3D file_inode(file); struct shmem_inode_info *info =3D SHMEM_I(inode); @@ -2239,13 +2239,13 @@ int shmem_lock(struct file *file, int lock, struc= t user_struct *user) * no serialization needed when called from shm_destroy(). */ if (lock && !(info->flags & VM_LOCKED)) { - if (!user_shm_lock(inode->i_size, user)) + if (!user_shm_lock(inode->i_size, ucounts)) goto out_nomem; info->flags |=3D VM_LOCKED; mapping_set_unevictable(file->f_mapping); } - if (!lock && (info->flags & VM_LOCKED) && user) { - user_shm_unlock(inode->i_size, user); + if (!lock && (info->flags & VM_LOCKED) && ucounts) { + user_shm_unlock(inode->i_size, ucounts); info->flags &=3D ~VM_LOCKED; mapping_clear_unevictable(file->f_mapping); } --=20 2.29.3