From: ebiederm@xmission.com (Eric W. Biederman)
To: Al Viro
Cc: Yonghong Song, Carlos Antonio Neira Bustos, netdev@vger.kernel.org, brouer@redhat.com, bpf@vger.kernel.org
Date: Wed, 11 Sep 2019 03:16:16 -0500
Message-ID: <87o8zr8cz3.fsf@x220.int.ebiederm.org>
In-Reply-To: <20190910231506.GL1131@ZenIV.linux.org.uk> (Al Viro's message of "Wed, 11 Sep 2019 00:15:06 +0100")
Subject: Re: [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data from current task New bpf helper bpf_get_current_pidns_info.

Al Viro writes:

> On Tue, Sep 10, 2019 at 10:35:09PM +0000, Yonghong Song wrote:
>>
>> Carlos,
>>
>> Discussed with Eric today for what is the best way to get
>> the device number for a namespace. The following patch seems
>> a reasonable start although Eric would like to see
>> how the helper is used in order to decide whether the
>> interface looks right.
>>
>> commit bb00fc36d5d263047a8bceb3e51e969d7fbce7db (HEAD -> fs2)
>> Author: Yonghong Song
>> Date:   Mon Sep 9 21:50:51 2019 -0700
>>
>>     nsfs: add an interface function ns_get_inum_dev()
>>
>>     This patch added an interface function
>>     ns_get_inum_dev(). Given a ns_common structure,
>>     the function returns the inode and device
>>     numbers. The function will be used later
>>     by a newly added bpf helper.
>>
>>     Signed-off-by: Yonghong Song
>>
>> diff --git a/fs/nsfs.c b/fs/nsfs.c
>> index a0431642c6b5..a603c6fc3f54 100644
>> --- a/fs/nsfs.c
>> +++ b/fs/nsfs.c
>> @@ -245,6 +245,14 @@ struct file *proc_ns_fget(int fd)
>>  	return ERR_PTR(-EINVAL);
>>  }
>>
>> +/* Get the device number for the current task pidns.
>> + */
>> +void ns_get_inum_dev(struct ns_common *ns, u32 *inum, dev_t *dev)
>> +{
>> +	*inum = ns->inum;
>> +	*dev = nsfs_mnt->mnt_sb->s_dev;
>> +}
>
> Umm... Where would it get the device number once we get (hell knows
> what for) multiple nsfs instances? I still don't understand what
> would that be about, TBH... Is it really per-userns? Or something
> else entirely? Eric, could you give some context?

My goal is to avoid painting things into a corner with respect to
future changes.

Right now it is possible to stat a namespace file descriptor, get a
device and inode number, and then compare those. I don't want people
using the inode number in nsfd as some magic namespace id.

We have had times in the past where there was more than one superblock
and thus more than one device number. Further, if userspace ever uses
this heavily, there may be times in the future where, for
checkpoint/restart purposes, we will want multiple nsfd's so we can
preserve the inode number across a migration.

Realistically, there will probably just be some kind of hotplug
notification to userspace saying "we have hotplugged your operating
system" as a migration notification.

The hallway discussion did not quite capture everything I was trying
to say, but it at least got into the right ballpark.
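A minimal userspace sketch of the stat-and-compare approach described
above, assuming a Linux system where namespace files such as
/proc/self/ns/pid are available. The names struct ns_id,
ns_id_from_path() and ns_id_equal() are hypothetical, chosen only for
illustration; stat(2) and its st_dev/st_ino fields are the only real
API used. What it shows is that namespace identity is the
(device, inode) pair, never the inode number alone.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper type: identity of a namespace file as seen
 * through stat(2). Neither field alone is a namespace id. */
struct ns_id {
	dev_t dev;
	ino_t ino;
};

/* Fill *id from a namespace file such as /proc/self/ns/pid.
 * Returns 0 on success, -1 on failure (errno is set by stat). */
static int ns_id_from_path(const char *path, struct ns_id *id)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	id->dev = st.st_dev;
	id->ino = st.st_ino;
	return 0;
}

/* Two namespace files refer to the same namespace only if BOTH the
 * device number and the inode number match. */
static bool ns_id_equal(const struct ns_id *a, const struct ns_id *b)
{
	return a->dev == b->dev && a->ino == b->ino;
}
```

For example, calling ns_id_from_path() on /proc/self/ns/pid and on the
pid namespace file of another process, then ns_id_equal() on the two
results, tells whether both processes share a pid namespace.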
The helper in fs/nsfs.c should be:

bool ns_match(const struct ns_common *ns, dev_t dev, ino_t ino)
{
	return ((ns->inum == ino) && (nsfs_mnt->mnt_sb->s_dev == dev));
}

That way, if/when there are multiple inodes identifying the same
namespace, the bpf programs don't need to change.

Further up the stack it should be something like:

BPF_CALL_2(bpf_current_pidns_match, dev_t *, dev, ino_t *, ino)
{
	return ns_match(&task_active_pid_ns(current)->ns, *dev, *ino);
}

const struct bpf_func_proto bpf_current_pidns_match_proto = {
	.func		= bpf_current_pidns_match,
	.gpl_only	= true,
	.ret_type	= RET_INTEGER,
	.arg1_type	= ARG_PTR_TO_DEVICE_NUMBER,
	.arg2_type	= ARG_PTR_TO_INODE_NUMBER,
};

That allows comparing what the bpf program came up with against
whatever value userspace generated by stat'ing the file descriptor.

That is the least bad suggestion I currently have for that
functionality. It really would be better not to have that filter in
the bpf program itself but in the infrastructure that binds a program
to a set of tasks.

The problem with this approach is that, whatever device/inode pair you
hold, once the namespace it refers to exits, its inode number may be
reused. So your filter will eventually start matching on the wrong
thing.

Eric
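The inode-reuse hazard in the last paragraph can be illustrated with a
toy model. This is hypothetical userspace code, not kernel code:
toy_alloc_inum() stands in for an allocator that always hands out the
lowest free number, and struct toy_filter stands in for a filter that
stores one (dev, ino) pair. Once the first namespace's number is freed
and handed to a new namespace, the stale filter matches the new one.

```c
#include <stdbool.h>

#define TOY_MAX_IDS 8

/* Toy model (hypothetical): which inode numbers are currently in use. */
static bool toy_in_use[TOY_MAX_IDS];

/* Hand out the lowest free number, so freed numbers get reused.
 * Returns -1 when all numbers are taken. */
static int toy_alloc_inum(void)
{
	for (int i = 0; i < TOY_MAX_IDS; i++) {
		if (!toy_in_use[i]) {
			toy_in_use[i] = true;
			return i;
		}
	}
	return -1;
}

/* Release a number when the (toy) namespace exits. */
static void toy_free_inum(int inum)
{
	if (inum >= 0 && inum < TOY_MAX_IDS)
		toy_in_use[inum] = false;
}

/* A filter remembering the (dev, ino) pair of one namespace. */
struct toy_filter {
	unsigned long dev;
	unsigned long ino;
};

/* Match on both fields, as ns_match() does; this still cannot tell a
 * dead namespace from a new one that inherited its number. */
static bool toy_filter_matches(const struct toy_filter *f,
			       unsigned long dev, unsigned long ino)
{
	return f->dev == dev && f->ino == ino;
}
```

Allocating a number for namespace A, recording it in a filter, freeing
it, and allocating again for namespace B gives B the same number, so
toy_filter_matches() reports a match for the wrong namespace.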