BPF Archive on lore.kernel.org
 help / color / Atom feed
From: ebiederm@xmission.com (Eric W. Biederman)
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: Yonghong Song <yhs@fb.com>,
	Carlos Antonio Neira Bustos <cneirabustos@gmail.com>,
	"netdev\@vger.kernel.org" <netdev@vger.kernel.org>,
	"brouer\@redhat.com" <brouer@redhat.com>,
	"bpf\@vger.kernel.org" <bpf@vger.kernel.org>
Subject: Re: [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data from current task New bpf helper bpf_get_current_pidns_info.
Date: Wed, 11 Sep 2019 03:16:16 -0500
Message-ID: <87o8zr8cz3.fsf@x220.int.ebiederm.org> (raw)
In-Reply-To: <20190910231506.GL1131@ZenIV.linux.org.uk> (Al Viro's message of "Wed, 11 Sep 2019 00:15:06 +0100")

Al Viro <viro@zeniv.linux.org.uk> writes:

> On Tue, Sep 10, 2019 at 10:35:09PM +0000, Yonghong Song wrote:
>> 
>> Carlos,
>> 
>> Discussed with Eric today for what is the best way to get
>> the device number for a namespace. The following patch seems
>> a reasonable start although Eric would like to see
>> how the helper is used in order to decide whether the
>> interface looks right.
>> 
>> commit bb00fc36d5d263047a8bceb3e51e969d7fbce7db (HEAD -> fs2)
>> Author: Yonghong Song <yhs@fb.com>
>> Date:   Mon Sep 9 21:50:51 2019 -0700
>> 
>>      nsfs: add an interface function ns_get_inum_dev()
>> 
>>      This patch added an interface function
>>      ns_get_inum_dev(). Given a ns_common structure,
>>      the function returns the inode and device
>>      numbers. The function will be used later
>>      by a newly added bpf helper.
>> 
>>      Signed-off-by: Yonghong Song <yhs@fb.com>
>> 
>> diff --git a/fs/nsfs.c b/fs/nsfs.c
>> index a0431642c6b5..a603c6fc3f54 100644
>> --- a/fs/nsfs.c
>> +++ b/fs/nsfs.c
>> @@ -245,6 +245,14 @@ struct file *proc_ns_fget(int fd)
>>          return ERR_PTR(-EINVAL);
>>   }
>> 
>> +/* Get the device number for the current task pidns.
>> + */
>> +void ns_get_inum_dev(struct ns_common *ns, u32 *inum, dev_t *dev)
>> +{
>> +       *inum = ns->inum;
>> +       *dev = nsfs_mnt->mnt_sb->s_dev;
>> +}
>
> Umm...  Where would it get the device number once we get (hell knows
> what for) multiple nsfs instances?  I still don't understand what
> would that be about, TBH...  Is it really per-userns?  Or something
> else entirely?  Eric, could you give some context?

My goal is not to paint things into a corner, with future changes.
Right now it is possible to stat a namespace file descriptor and
get a device and inode number.  Then compare that. 

I don't want people using the inode number in nsfd as some magic
namespace id.

We have had times in the past where there was more than one superblock
and thus more than one device number.  Further if userspace ever uses
this heavily there may be times in the future where for
checkpoint/restart purposes we will want multiple nsfd's so we can
preserve the inode number accross a migration.

Realistically there will probably just some kind of hotplug notification
to userspace to say we have hotplugged your operatining system as
a migration notification.

Now the halway discussion did not quite capture everything I was trying
to say but it at least got to the right ballpark.

The helper in fs/nsfs.c should be:

bool ns_match(const struct ns_common *ns, dev_t dev, ino_t ino)
{
        return ((ns->inum == ino) && (nsfs_mnt->mnt_sb->s_dev == dev));
}

That way if/when there are multiple inodes identifying the same
namespace the bpf programs don't need to change.

Up farther in the stack it should be something like:

> BPF_CALL_2(bpf_current_pidns_match, dev_t *dev, ino_t *ino)
> {
>         return ns_match(&task_active_pid_ns(current)->ns, *dev, *ino);
> }
> 
> const struct bpf_func_proto bpf_current_pidns_match_proto = {
> 	.func		= bpf_current_pins_match,
> 	.gpl_only	= true,
> 	.ret_type	= RET_INTEGER
> 	.arg1_type	= ARG_PTR_TO_DEVICE_NUMBER,
> 	.arg2_type	= ARG_PTR_TO_INODE_NUMBER,
> };

That allows comparing what the bpf came up with with whatever value
userspace generated by stating the file descriptor.


That is the least bad suggestion I currently have for that
functionality.  It really would be better to not have that filter in the
bpf program itself but in the infrastructure that binds a program to a
set of tasks.

The problem with this approach is whatever device/inode you have when
the namespace they refer to exits there is the possibility that the
inode will be reused.  So your filter will eventually start matching on
the wrong thing.

Eric

  reply index

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-06 15:09 [PATCH bpf-next v10 0/4] BPF: New helper to obtain namespace data from current task Carlos Neira
2019-09-06 15:09 ` [PATCH bpf-next v10 1/4] fs/namei.c: make available filename_lookup() for bpf helpers Carlos Neira
2019-09-06 15:09 ` [PATCH bpf-next v10 2/4] bpf: new helper to obtain namespace data from current task New bpf helper bpf_get_current_pidns_info Carlos Neira
2019-09-06 15:24   ` Al Viro
2019-09-06 15:46     ` Al Viro
2019-09-06 16:00       ` Al Viro
2019-09-06 23:21         ` Yonghong Song
2019-09-07  0:10           ` Al Viro
2019-09-07  6:34             ` Yonghong Song
2019-09-09 17:45               ` Carlos Antonio Neira Bustos
2019-09-10 22:35                 ` Yonghong Song
2019-09-10 23:15                   ` Al Viro
2019-09-11  8:16                     ` ebiederm [this message]
2019-09-12  5:49                       ` Yonghong Song
     [not found]                         ` <CACiB22j9M2gmccnh7XqqFp8g7qKFuiOrSAVJiA2tQHLB0pmoSQ@mail.gmail.com>
2019-09-13  2:56                           ` Yonghong Song
2019-09-13 11:58                             ` Carlos Antonio Neira Bustos
2019-09-13 16:59                         ` ebiederm
2019-09-13 17:28                           ` Yonghong Song
2019-09-11  4:32                   ` Carlos Antonio Neira Bustos
2019-09-11  8:17                     ` ebiederm
2019-09-10 22:46   ` Yonghong Song
2019-09-11  4:33     ` Carlos Antonio Neira Bustos
2019-09-06 15:09 ` [PATCH bpf-next v10 3/4] tools: Added bpf_get_current_pidns_info helper Carlos Neira
2019-09-06 15:09 ` [PATCH bpf-next v10 4/4] tools/testing/selftests/bpf: Add self-tests for helper bpf_get_pidns_info Carlos Neira
2019-09-10 22:55   ` Yonghong Song

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87o8zr8cz3.fsf@x220.int.ebiederm.org \
    --to=ebiederm@xmission.com \
    --cc=bpf@vger.kernel.org \
    --cc=brouer@redhat.com \
    --cc=cneirabustos@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

BPF Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/bpf/0 bpf/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 bpf bpf/ https://lore.kernel.org/bpf \
		bpf@vger.kernel.org
	public-inbox-index bpf

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.bpf


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git