From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=dOwe=AH=vger.kernel.org=linux-fsdevel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-2.5 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS,
	MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_SANE_1
	autolearn=no autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 658D1C433E2
	for <linux-fsdevel@archiver.kernel.org>; Fri, 26 Jun 2020 04:59:35 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by mail.kernel.org (Postfix) with ESMTP id 3E52620768
	for <linux-fsdevel@archiver.kernel.org>; Fri, 26 Jun 2020 04:59:35 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1726847AbgFZE7d (ORCPT
        <rfc822;linux-fsdevel@archiver.kernel.org>);
        Fri, 26 Jun 2020 00:59:33 -0400
Received: from www262.sakura.ne.jp ([202.181.97.72]:62634 "EHLO
        www262.sakura.ne.jp" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1725306AbgFZE7c (ORCPT
        <rfc822;linux-fsdevel@vger.kernel.org>);
        Fri, 26 Jun 2020 00:59:32 -0400
Received: from fsav107.sakura.ne.jp (fsav107.sakura.ne.jp [27.133.134.234])
        by www262.sakura.ne.jp (8.15.2/8.15.2) with ESMTP id 05Q4waSg074045;
        Fri, 26 Jun 2020 13:58:36 +0900 (JST)
        (envelope-from penguin-kernel@i-love.sakura.ne.jp)
Received: from www262.sakura.ne.jp (202.181.97.72)
 by fsav107.sakura.ne.jp (F-Secure/fsigk_smtp/550/fsav107.sakura.ne.jp);
 Fri, 26 Jun 2020 13:58:36 +0900 (JST)
X-Virus-Status: clean(F-Secure/fsigk_smtp/550/fsav107.sakura.ne.jp)
Received: from [192.168.1.9] (M106072142033.v4.enabler.ne.jp [106.72.142.33])
        (authenticated bits=0)
        by www262.sakura.ne.jp (8.15.2/8.15.2) with ESMTPSA id 05Q4wZ1f074034
        (version=TLSv1.2 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO);
        Fri, 26 Jun 2020 13:58:35 +0900 (JST)
        (envelope-from penguin-kernel@i-love.sakura.ne.jp)
Subject: Re: [RFC][PATCH] net/bpfilter: Remove this broken and apparently
 unmantained
To:     Alexei Starovoitov <alexei.starovoitov@gmail.com>,
        Linus Torvalds <torvalds@linux-foundation.org>
Cc:     David Miller <davem@davemloft.net>,
        Greg Kroah-Hartman <greg@kroah.com>,
        "Eric W. Biederman" <ebiederm@xmission.com>,
        Kees Cook <keescook@chromium.org>,
        Andrew Morton <akpm@linux-foundation.org>,
        Alexei Starovoitov <ast@kernel.org>,
        Al Viro <viro@zeniv.linux.org.uk>, bpf <bpf@vger.kernel.org>,
        linux-fsdevel <linux-fsdevel@vger.kernel.org>,
        Daniel Borkmann <daniel@iogearbox.net>,
        Jakub Kicinski <kuba@kernel.org>,
        Masahiro Yamada <yamada.masahiro@socionext.com>,
        Gary Lin <GLin@suse.com>, Bruno Meneguele <bmeneg@redhat.com>,
        LSM List <linux-security-module@vger.kernel.org>,
        Casey Schaufler <casey@schaufler-ca.com>
References: <20200625095725.GA3303921@kroah.com>
 <778297d2-512a-8361-cf05-42d9379e6977@i-love.sakura.ne.jp>
 <20200625120725.GA3493334@kroah.com>
 <20200625.123437.2219826613137938086.davem@davemloft.net>
 <CAHk-=whuTwGHEPjvtbBvneHHXeqJC=q5S09mbPnqb=Q+MSPMag@mail.gmail.com>
 <20200626015121.qpxkdaqtsywe3zqx@ast-mbp.dhcp.thefacebook.com>
From:   Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Message-ID: <eb3bec08-9de4-c708-fb8e-b6a47145eb5e@i-love.sakura.ne.jp>
Date:   Fri, 26 Jun 2020 13:58:35 +0900
User-Agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:68.0) Gecko/20100101
 Thunderbird/68.9.0
MIME-Version: 1.0
In-Reply-To: <20200626015121.qpxkdaqtsywe3zqx@ast-mbp.dhcp.thefacebook.com>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk
List-ID: <linux-fsdevel.vger.kernel.org>
X-Mailing-List: linux-fsdevel@vger.kernel.org

On 2020/06/26 10:51, Alexei Starovoitov wrote:
> On Thu, Jun 25, 2020 at 06:36:34PM -0700, Linus Torvalds wrote:
>> On Thu, Jun 25, 2020 at 12:34 PM David Miller <davem@davemloft.net> wrote:
>>>
>>> It's kernel code executing in userspace.  If you don't trust the
>>> signed code you don't trust the signed code.
>>>
>>> Nothing is magic about a piece of code executing in userspace.
>>
>> Well, there's one real issue: the most likely thing that code is going
>> to do is execute llvm to generate more code.

Wow! Are we going to allow execution of such complicated programs?

I was hoping that fork_usermode_blob() accepts only simple program
like the content of "hello64" generated by

----------
; nasm -f elf64 hello64.asm && ld -s -m elf_x86_64 -o hello64 hello64.o
section .text
global _start

_start:
  mov rax, 1        ; write(
  mov rdi, 1        ;   1,
  mov rsi, msg      ;   "Hello world\n",
  mov rdx, 12       ;   12
  syscall           ; );
  mov rax, 231      ; _exit(
  mov rdi, 0        ;   0
  syscall           ; );

section .rodata
  msg: db "Hello world", 0x0a
----------

which can be contained by mechanisms like seccomp; there is no pathname
resolution, no networking access etc.

>>
>> And that's I think the real security issue here: the context in which
>> the code executes. It may be triggered in one namespace, but what
>> namespaces and what rules should the thing actually then execute in.
>>
>> So no, trying to dismiss this as "there are no security issues" is
>> bogus. There very much are security issues.
> 
> I think you're referring to:
> 
>>>   We might need to invent built-in "protected userspace" because existing
>>>   "unprotected userspace" is not trustworthy enough to run kernel modules.
>>>   That's not just inventing fork_usermode_blob().
> 
> Another root process can modify the memory of usermode_blob process.

I'm not familiar with ptrace(); I'm just using /usr/bin/strace and /usr/bin/ltrace .
What I'm worrying is that some root process tampers with memory which initially
contained "hello64" above in order to let that memory do something different behavior.

For example, a usermode process started by fork_usermode_blob() which was initially
containing

----------
while (read(0, &uid, sizeof(uid)) == sizeof(uid)) {
    if (uid == 0)
        write(1, "OK\n", 3);
    else
        write(1, "NG\n", 3);
}
----------

can be somehow tampered like

----------
while (read(0, &uid, sizeof(uid)) == sizeof(uid)) {
    if (uid != 0)
        write(1, "OK\n", 3);
    else
        write(1, "NG\n", 3);
}
----------

due to interference from the rest of the system, how can we say "we trust kernel
code executing in userspace" ?

My question is: how is the byte array (which was copied from kernel space) kept secure/intact
under "root can poke into kernel or any process memory." environment? It is obvious that
we can't say "we trust kernel code executing in userspace" without some mechanism.

Currently fork_usermode_blob() is not providing security context for the byte array to be
executed. We could modify fork_usermode_blob() to provide security context for LSMs, but
I'll be more happy if we can implement that mechanism without counting on in-tree LSMs, for
SELinux is too complicated to support.

> I think that's Tetsuo's point about lack of LSM hooks is kernel_sock_shutdown().
> Obviously, kernel_sock_shutdown() can be called by kernel only.

I can't catch what you mean. The kernel code executing in userspace uses syscall
interface (e.g. SYSCALL_DEFINE2(shutdown, int, fd, int, how) path), doesn't it?

> I suspect he's imaging a hypothetical situation where kernel bits of kernel module
> interact with userblob bits of kernel module.
> Then another root process tampers with memory of userblob.

Yes, how to protect the memory of userblob is a concern. The memory of userblob can
interfere (or can be interfered by) the rest of the system is a problem.

> Then userblob interaction with kernel module can do kernel_sock_shutdown()
> on something that initial design of kernel+userblob module didn't intend.

I can't catch what you mean.

> I think this is trivially enforceable without creating new features.
> Existing security_ptrace_access_check() LSM hook can prevent tampering with
> memory of userblob.

There is security_ptrace_access_check() LSM hook, but no zero-configuration
method is available.

> 
> As far as userblob calling llvm and other things in sequence.
> That is no different from systemd calling things.

Right.

> security label can carry that execution context.

If files get a chance to be associated with appropriate pathname and
security label.

> 
>> My personally strongest argument for remoiving this kernel code is
>> that it's been there for a couple of years now, and it has never
>> actually done anything useful, and there's no actual sign that it ever
>> will, or that there is a solid plan in place for it.
> 
> you probably missed the detailed plan:
> https://lore.kernel.org/bpf/20200609235631.ukpm3xngbehfqthz@ast-mbp.dhcp.thefacebook.com/
> 
> The project #3 is the above is the one we're working on right now.
> It should be ready to post in a week.

I got a question on project #3. Given that "cat /sys/fs/bpf/my_ipv6_route"
produces the same human output as "cat /proc/net/ipv6_route", how security
checks which are done for "cat /proc/net/ipv6_route" can be enforced for
"cat /sys/fs/bpf/my_ipv6_route" ? Unless same security checks (e.g. permission
to read /proc/net/ipv6_route ) is enforced, such bpf usage sounds like a method
for bypassing existing security mechanisms.