From: Aleksandr Mikhalitsyn
Date: Tue, 13 Jun 2023 18:27:18 +0200
Subject: Re: [PATCH v5 00/14] ceph: support idmapped mounts
To: Gregory Farnum
Cc: Xiubo Li, Christian Brauner, stgraber@ubuntu.com, linux-fsdevel@vger.kernel.org, Ilya Dryomov, Jeff Layton, ceph-devel@vger.kernel.org, linux-kernel@vger.kernel.org
References: <20230608154256.562906-1-aleksandr.mikhalitsyn@canonical.com> <20230609-alufolie-gezaubert-f18ef17cda12@brauner>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jun 13, 2023 at 4:54 PM Gregory Farnum wrote:
>
> On Mon, Jun 12, 2023 at 6:43 PM Xiubo Li wrote:
> >
> >
> > On 6/9/23 18:12, Aleksandr Mikhalitsyn wrote:
> > > On Fri, Jun 9, 2023 at 12:00 PM Christian Brauner wrote:
> > >> On Fri, Jun 09, 2023 at 10:59:19AM +0200, Aleksandr Mikhalitsyn wrote:
> > >>> On Fri, Jun 9, 2023 at 3:57 AM Xiubo Li wrote:
> > >>>>
> > >>>> On 6/8/23 23:42, Alexander Mikhalitsyn wrote:
> > >>>>> Dear friends,
> > >>>>>
> > >>>>> This patchset was originally developed by Christian Brauner but I'll continue
> > >>>>> to push it forward. Christian allowed me to do that :)
> > >>>>>
> > >>>>> This feature is already actively used/tested with LXD/LXC project.
> > >>>>>
> > >>>>> Git tree (based on https://github.com/ceph/ceph-client.git master):
> > >>> Hi Xiubo!
> > >>>
> > >>>> Could you rebase these patches to 'testing' branch ?
> > >>> Will do in -v6.
> > >>>
> > >>>> And you still have missed several places, for example the following cases:
> > >>>>
> > >>>>
> > >>>> 1 269 fs/ceph/addr.c <>
> > >>>> req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_GETATTR,
> > >>>> mode);
> > >>> +
> > >>>
> > >>>> 2 389 fs/ceph/dir.c <>
> > >>>> req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
> > >>> +
> > >>>
> > >>>> 3 789 fs/ceph/dir.c <>
> > >>>> req = ceph_mdsc_create_request(mdsc, op, USE_ANY_MDS);
> > >>> We don't have an idmapping passed to lookup from the VFS layer. As I
> > >>> mentioned before, it's just impossible now.
> > >> ->lookup() doesn't deal with idmappings and really can't otherwise you
> > >> risk ending up with inode aliasing which is really not something you
> > >> want. IOW, you can't fill in inode->i_{g,u}id based on a mount's
> > >> idmapping as inode->i_{g,u}id absolutely needs to be a filesystem wide
> > >> value. So better not even risk exposing the idmapping in there at all.
> > > Thanks for adding, Christian!
> > >
> > > I agree, every time when we use an idmapping we need to be careful with
> > > what we map. AFAIU, inode->i_{g,u}id should be based on the filesystem
> > > idmapping (not mount),
> > > but in this case, Xiubo want's current_fs{u,g}id to be mapped
> > > according to an idmapping.
> > > Anyway, it's impossible at now and IMHO, until we don't have any
> > > practical use case where
> > > UID/GID-based path restriction is used in combination with idmapped
> > > mounts it's not worth to
> > > make such big changes in the VFS layer.
> > >
> > > May be I'm not right, but it seems like UID/GID-based path restriction
> > > is not a widespread
> > > feature and I can hardly imagine it to be used with the container
> > > workloads (for instance),
> > > because it will require to always keep in sync MDS permissions
> > > configuration with the
> > > possible UID/GID ranges on the client. It looks like a nightmare for sysadmin.
> > > It is useful when cephfs is used as an external storage on the host, but if you
> > > share cephfs with a few containers with different user namespaces idmapping...
> >
> > Hmm, while this will break the MDS permission check in cephfs then in
> > lookup case. If we really couldn't support it we should make it to
> > escape the check anyway or some OPs may fail and won't work as expected.

Dear Gregory,

Thanks for the fast reply!

>
> I don't pretend to know the details of the VFS (or even our linux
> client implementation), but I'm confused that this is apparently so
> hard. It looks to me like we currently always fill in the "caller_uid"
> with "from_kuid(&init_user_ns, req->r_cred->fsuid))". Is this actually
> valid to begin with? If it is, why can't the uid mapping be applied on
> that?

Applying an idmapping is not hard, it's as simple as replacing
from_kuid(&init_user_ns, req->r_cred->fsuid)
with
from_vfsuid(req->r_mnt_idmap, &init_user_ns, VFSUIDT_INIT(req->r_cred->fsuid))

but the problem is that we don't have req->r_mnt_idmap for all the requests.
For instance, we don't have idmap arguments (that come from the VFS layer)
for the ->lookup operation and many others.
There are some reasons for that (Christian has covered some of them).
So it's not a matter of me being too lazy to implement that. It's a real pain ;-)

>
> As both the client and the server share authority over the inode's
> state (including things like mode bits and owners), and need to do
> permission checking, being able to tell the server the relevant actor
> is inherently necessary. We also let admins restrict keys to
> particular UID/GID combinations as they wish, and it's not the most
> popular feature but it does get deployed. I would really expect a user
> of UID mapping to be one of the *most* likely to employ such a
> facility...maybe not with containers, but certainly end-user homedirs
> and shared spaces.
>
> Disabling the MDS auth checks is really not an option. I guess we
> could require any user employing idmapping to not be uid-restricted,
> and set the anonymous UID (does that work, Xiubo, or was it the broken
> one? In which case we'd have to default to root?). But that seems a
> bit janky to me.

That's an interesting point about the anonymous UID, but at the same time
we use the caller's fs UID/GID values as the owner UID/GID for
newly created inodes. That means we can't use the anonymous UID everywhere,
otherwise all new files/directories would be owned by the anonymous user.

> -Greg

Kind regards,
Alex

>
> > @Greg
> >
> > For the lookup requests the idmapping couldn't get the mapped UID/GID
> > just like all the other requests, which is needed by the MDS permission
> > check. Is that okay to make it disable the check for this case ? I am
> > afraid this will break the MDS permssions logic.
> >
> > Any idea ?
> >
> > Thanks
> >
> > - Xiubo
> >
> >
> > > Kind regards,
> > > Alex
> > >
> >
>
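
For illustration only, here is a minimal userspace sketch of the mapping step
discussed in this thread. It is a toy model, not kernel code: it assumes a
single-range idmapping, and the names toy_idmap, toy_from_vfsuid, toy_demo and
TOY_INVALID_ID are hypothetical. A NULL map stands in for a request that has no
idmapping available (the ->lookup case), so the caller's raw fsuid is passed
through unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an "invalid ID" result; not the kernel's INVALID_UID. */
#define TOY_INVALID_ID UINT32_MAX

/*
 * Hypothetical single-range mount idmapping: IDs in
 * [mnt_first, mnt_first + count) as seen through the mount correspond to
 * [fs_first, fs_first + count) on the filesystem side.
 */
struct toy_idmap {
	uint32_t fs_first;
	uint32_t mnt_first;
	uint32_t count;
};

/*
 * Toy model of translating the caller's ID into the filesystem-wide value.
 * A NULL map models a request that carries no idmapping.
 */
uint32_t toy_from_vfsuid(const struct toy_idmap *map, uint32_t caller_id)
{
	if (!map)
		return caller_id;	/* no idmapping: raw ID is used as-is */
	if (caller_id >= map->mnt_first &&
	    caller_id - map->mnt_first < map->count)
		return map->fs_first + (caller_id - map->mnt_first);
	return TOY_INVALID_ID;		/* caller has no mapping in this mount */
}

/* Usage sketch: the same caller seen by two kinds of request. */
void toy_demo(void)
{
	const struct toy_idmap map = {
		.fs_first = 0, .mnt_first = 100000, .count = 65536,
	};
	uint32_t fsuid = 100005;

	/* Request with an idmapping (create-like): mapped value is sent. */
	assert(toy_from_vfsuid(&map, fsuid) == 5);
	/* Request without an idmapping (lookup-like): raw fsuid is sent. */
	assert(toy_from_vfsuid(NULL, fsuid) == 100005);
	/* Caller outside the mapped range: no valid filesystem-side ID. */
	assert(toy_from_vfsuid(&map, 42) == TOY_INVALID_ID);
}
```

In this model the same caller shows up as 5 on one request and as 100005 on
another, which is the kind of mismatch a UID-restricted permission check on the
server side would notice.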