From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH v30 07/12] landlock: Support filesystem access-control
From: Mickaël Salaün
To: Jann Horn
Cc: James Morris, "Serge E. Hallyn", Al Viro, Andrew Morton,
        Andy Lutomirski, Anton Ivanov, Arnd Bergmann, Casey Schaufler,
        David Howells, Jeff Dike, Jonathan Corbet, Kees Cook,
        Michael Kerrisk, Richard Weinberger, Shuah Khan, Vincent Dagonneau,
        Kernel Hardening, Linux API, linux-arch, "open list:DOCUMENTATION",
        linux-fsdevel, kernel list, "open list:KERNEL SELFTEST FRAMEWORK",
        linux-security-module, the arch/x86 maintainers, Mickaël Salaün
Date: Tue, 23 Mar 2021 20:22:28 +0100
Message-ID: <7e494b74-8d5d-a109-6327-992d7d8fca87@digikod.net>
References: <20210316204252.427806-1-mic@digikod.net>
 <20210316204252.427806-8-mic@digikod.net>
X-Mailing-List: linux-api@vger.kernel.org

On 23/03/2021 18:49, Jann Horn wrote:
> On Tue, Mar 23, 2021 at 4:54 PM Mickaël Salaün wrote:
>> On 23/03/2021 01:13, Jann Horn wrote:
>>> On Tue, Mar 16, 2021 at 9:43 PM Mickaël Salaün wrote:
>>>> Using Landlock objects and ruleset, it is possible to tag inodes
>>>> according to a process's domain.
>>> [...]
>>>> +static void release_inode(struct landlock_object *const object)
>>>> +        __releases(object->lock)
>>>> +{
>>>> +        struct inode *const inode = object->underobj;
>>>> +        struct super_block *sb;
>>>> +
>>>> +        if (!inode) {
>>>> +                spin_unlock(&object->lock);
>>>> +                return;
>>>> +        }
>>>> +
>>>> +        /*
>>>> +         * Protects against concurrent use by hook_sb_delete() of the reference
>>>> +         * to the underlying inode.
>>>> +         */
>>>> +        object->underobj = NULL;
>>>> +        /*
>>>> +         * Makes sure that if the filesystem is concurrently unmounted,
>>>> +         * hook_sb_delete() will wait for us to finish iput().
>>>> +         */
>>>> +        sb = inode->i_sb;
>>>> +        atomic_long_inc(&landlock_superblock(sb)->inode_refs);
>>>> +        spin_unlock(&object->lock);
>>>> +        /*
>>>> +         * Because object->underobj was not NULL, hook_sb_delete() and
>>>> +         * get_inode_object() guarantee that it is safe to reset
>>>> +         * landlock_inode(inode)->object while it is not NULL. It is therefore
>>>> +         * not necessary to lock inode->i_lock.
>>>> +         */
>>>> +        rcu_assign_pointer(landlock_inode(inode)->object, NULL);
>>>> +        /*
>>>> +         * Now, new rules can safely be tied to @inode with get_inode_object().
>>>> +         */
>>>> +
>>>> +        iput(inode);
>>>> +        if (atomic_long_dec_and_test(&landlock_superblock(sb)->inode_refs))
>>>> +                wake_up_var(&landlock_superblock(sb)->inode_refs);
>>>> +}
>>> [...]
>>>> +static struct landlock_object *get_inode_object(struct inode *const inode)
>>>> +{
>>>> +        struct landlock_object *object, *new_object;
>>>> +        struct landlock_inode_security *inode_sec = landlock_inode(inode);
>>>> +
>>>> +        rcu_read_lock();
>>>> +retry:
>>>> +        object = rcu_dereference(inode_sec->object);
>>>> +        if (object) {
>>>> +                if (likely(refcount_inc_not_zero(&object->usage))) {
>>>> +                        rcu_read_unlock();
>>>> +                        return object;
>>>> +                }
>>>> +                /*
>>>> +                 * We are racing with release_inode(), the object is going
>>>> +                 * away. Wait for release_inode(), then retry.
>>>> +                 */
>>>> +                spin_lock(&object->lock);
>>>> +                spin_unlock(&object->lock);
>>>> +                goto retry;
>>>> +        }
>>>> +        rcu_read_unlock();
>>>> +
>>>> +        /*
>>>> +         * If there is no object tied to @inode, then create a new one (without
>>>> +         * holding any locks).
>>>> +         */
>>>> +        new_object = landlock_create_object(&landlock_fs_underops, inode);
>>>> +        if (IS_ERR(new_object))
>>>> +                return new_object;
>>>> +
>>>> +        /* Protects against concurrent get_inode_object() calls. */
>>>> +        spin_lock(&inode->i_lock);
>>>> +        object = rcu_dereference_protected(inode_sec->object,
>>>> +                        lockdep_is_held(&inode->i_lock));
>>>
>>> rcu_dereference_protected() requires that inode_sec->object is not
>>> concurrently changed, but I think another thread could call
>>> get_inode_object() while we're in landlock_create_object(), and then
>>> we could race with the NULL write in release_inode() here? (It
>>> wouldn't actually be a UAF though because we're not actually accessing
>>> `object` here.) Or am I missing a lock that prevents this?
>>>
>>> In v28 this wasn't an issue because release_inode() was holding
>>> inode->i_lock (and object->lock) during the NULL store; but in v29 and
>>> this version the NULL store in release_inode() moved out of the locked
>>> region. I think you could just move the NULL store in release_inode()
>>> back up (and maybe add a comment explaining the locking rules for
>>> landlock_inode(...)->object)?
>>>
>>> (Or alternatively you could use rcu_dereference_raw() with a comment
>>> explaining that the read pointer is only used to check for NULL-ness,
>>> and that it is guaranteed that the pointer can't change if it is NULL
>>> and we're holding the lock. But that'd be needlessly complicated, I
>>> think.)
>>
>> To reach rcu_assign_pointer(landlock_inode(inode)->object, NULL) in
>> release_inode() or in hook_sb_delete(), the
>> landlock_inode(inode)->object needs to be non-NULL,
>
> Yes.
>
>> which implies that a
>> call to get_inode_object(inode) either "retries" (because
>> release_inode() is only called by landlock_put_object(), which sets
>> object->usage to 0) until it creates a new object, or reuses the
>> existing referenced object (and increments object->usage).
>
> But it can be that landlock_inode(inode)->object only becomes non-NULL
> after get_inode_object() has checked
> rcu_dereference(inode_sec->object).
>
>> The worst case would be if
>> get_inode_object(inode) is called just before the
>> rcu_assign_pointer(landlock_inode(inode)->object, NULL) from
>> hook_sb_delete(), which would result in an object with a NULL underobj,
>> which is the expected behavior (and checked by release_inode).
>
> The scenario I'm talking about doesn't involve hook_sb_delete().
>
>> The line rcu_assign_pointer(inode_sec->object, new_object) from
>> get_inode_object() can only be reached if the underlying inode doesn't
>> reference an object,
>
> Yes.
>
>> in which case hook_sb_delete() will not reach the
>> rcu_assign_pointer(landlock_inode(inode)->object, NULL) line for this
>> same inode.
>>
>> This works because get_inode_object(inode) is mutually exclusive with
>> itself for the same inode (i.e. an inode can only point to an object
>> that references this same inode).
>
> To clarify: You can concurrently call get_inode_object() multiple
> times on the same inode, right? There are no locks held on entry to
> that function.
>
>> I tried to explain this with the comment "Protects against concurrent
>> get_inode_object() calls" in get_inode_object(), and the comments just
>> before both rcu_assign_pointer(landlock_inode(inode)->object, NULL).
>
> The scenario I'm talking about is:
>
> Initially the inode does not have an associated landlock_object. There
> are two threads A and B. Thread A is going to execute
> get_inode_object(). Thread B is going to execute get_inode_object()
> followed immediately by landlock_put_object().
>
> thread A: enters get_inode_object()
> thread A: rcu_dereference(inode_sec->object) returns NULL
> thread A: enters landlock_create_object()
> thread B: enters get_inode_object()
> thread B: rcu_dereference(inode_sec->object) returns NULL
> thread B: calls landlock_create_object()
> thread B: sets inode_sec->object while holding inode->i_lock
> thread B: leaves get_inode_object()
> thread B: enters landlock_put_object()
> thread B: object->usage drops to 0, object->lock is taken
> thread B: calls release_inode()
> thread B: drops object->lock
> thread A: returns from landlock_create_object()
> thread A: takes inode->i_lock
>
> At this point, thread B will run:
>
> rcu_assign_pointer(landlock_inode(inode)->object, NULL);
>
> while thread A runs:
>
> rcu_dereference_protected(inode_sec->object,
>                 lockdep_is_held(&inode->i_lock));
>
> meaning there is a (theoretical) data race, since
> rcu_dereference_protected() doesn't use READ_ONCE().

Hmm, I see, that is what I was missing. And that explains why winning
the race has (in practice) no impact. I would prefer to use
rcu_access_pointer() instead of rcu_dereference_protected() to avoid
this pitfall, and it reflects what I was expecting:

--- a/security/landlock/fs.c
+++ b/security/landlock/fs.c
@@ -117,9 +117,7 @@ static struct landlock_object *get_inode_object(struct inode *const inode)
 
         /* Protects against concurrent get_inode_object() calls. */
         spin_lock(&inode->i_lock);
-        object = rcu_dereference_protected(inode_sec->object,
-                        lockdep_is_held(&inode->i_lock));
-        if (unlikely(object)) {
+        if (unlikely(rcu_access_pointer(inode_sec->object))) {
                 /* Someone else just created the object, bail out and retry. */
                 spin_unlock(&inode->i_lock);
                 kfree(new_object);
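
For reference, the difference between the two accessors can be sketched
roughly like this (a simplified picture, not the exact rcupdate.h macro
definitions, which also carry the sparse __rcu annotations):

        /*
         * rcu_dereference_protected(p, c) expands to a plain load: the
         * condition c is only a lockdep claim that nothing can update p
         * concurrently, so a parallel rcu_assign_pointer() is a data
         * race.
         */
        object = inode_sec->object;

        /*
         * rcu_access_pointer(p) expands to a READ_ONCE() load: it is
         * safe against concurrent updates, but the result may only be
         * tested (e.g. for NULL-ness), never dereferenced.
         */
        object = READ_ONCE(inode_sec->object);

Since this path only tests the pointer for NULL-ness under
inode->i_lock and never dereferences it, rcu_access_pointer() gives
exactly the required semantics.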
But I'm not sure about your proposal to move the NULL store in
release_inode() back up. Do you mean to add back the inode lock in
release_inode() like this?

--- a/security/landlock/fs.c
+++ b/security/landlock/fs.c
@@ -59,16 +59,12 @@ static void release_inode(struct landlock_object *const object)
          * Makes sure that if the filesystem is concurrently unmounted,
          * hook_sb_delete() will wait for us to finish iput().
          */
+        spin_lock(&inode->i_lock);
         sb = inode->i_sb;
         atomic_long_inc(&landlock_superblock(sb)->inode_refs);
         spin_unlock(&object->lock);
-        /*
-         * Because object->underobj was not NULL, hook_sb_delete() and
-         * get_inode_object() guarantee that it is safe to reset
-         * landlock_inode(inode)->object while it is not NULL. It is therefore
-         * not necessary to lock inode->i_lock.
-         */
         rcu_assign_pointer(landlock_inode(inode)->object, NULL);
+        spin_unlock(&inode->i_lock);
         /*
          * Now, new rules can safely be tied to @inode with get_inode_object().
          */

I would prefer to avoid nested locks if they are not necessary, though.

>
>>>> +        if (unlikely(object)) {
>>>> +                /* Someone else just created the object, bail out and retry. */
>>>> +                spin_unlock(&inode->i_lock);
>>>> +                kfree(new_object);
>>>> +
>>>> +                rcu_read_lock();
>>>> +                goto retry;
>>>> +        }
>>>> +
>>>> +        rcu_assign_pointer(inode_sec->object, new_object);
>>>> +        /*
>>>> +         * @inode will be released by hook_sb_delete() on its superblock
>>>> +         * shutdown.
>>>> +         */
>>>> +        ihold(inode);
>>>> +        spin_unlock(&inode->i_lock);
>>>> +        return new_object;
>>>> +}
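
For completeness, the race pattern is easy to reproduce outside the
kernel. Below is a minimal userspace analogue (a hypothetical
pthread-based sketch, not the kernel code itself): thread A does a
plain load under a lock while thread B stores without holding it,
which a build with "cc -fsanitize=thread -pthread race.c" reports as
a data race.

#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t i_lock = PTHREAD_MUTEX_INITIALIZER;
static int obj;
/* Stands in for landlock_inode(inode)->object. */
static void *object = &obj;

/* Thread A: like get_inode_object(), a plain load under i_lock only. */
static void *thread_a(void *arg)
{
        void *o;

        pthread_mutex_lock(&i_lock);
        o = object;        /* plain load, as rcu_dereference_protected() */
        pthread_mutex_unlock(&i_lock);
        return o;
}

/* Thread B: like release_inode() in v29/v30, NULL store without i_lock. */
static void *thread_b(void *arg)
{
        object = NULL;        /* concurrent store, no i_lock held */
        return NULL;
}

int main(void)
{
        pthread_t a, b;

        pthread_create(&a, NULL, thread_a, NULL);
        pthread_create(&b, NULL, thread_b, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
}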