From: Song Liu
To: Andy Lutomirski
CC: Kees Cook, "linux-security@vger.kernel.org", Networking, bpf,
    Alexei Starovoitov, Daniel Borkmann, Kernel Team, Lorenz Bauer,
    Jann Horn, Greg KH, Linux API
Subject: Re: [PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf
Date: Tue, 23 Jul 2019 22:56:23 +0000
Message-ID: <514D5453-0AEE-420F-AEB6-3F4F58C62E7E@fb.com>

> On Jul 23, 2019, at 8:11 AM, Andy Lutomirski wrote:
>
> On Mon, Jul 22, 2019 at 1:54 PM Song Liu wrote:
>>
>> Hi Andy, Lorenz, and all,
>>
>>> On Jul 2, 2019, at 2:32 PM, Andy Lutomirski wrote:
>>>
>>> On Tue, Jul 2, 2019 at 2:04 PM Kees Cook wrote:
>>>>
>>>> On Mon, Jul 01, 2019 at 06:59:13PM -0700, Andy Lutomirski wrote:
>>>>> I think I'm understanding your motivation. You're not trying to make
>>>>> bpf() generically usable without privilege -- you're trying to create
>>>>> a way to allow certain users to access dangerous bpf functionality
>>>>> within some limits.
>>>>>
>>>>> That's a perfectly fine goal, but I think you're reinventing the
>>>>> wheel, and the wheel you're reinventing is quite complicated and
>>>>> already exists. I think you should teach bpftool to be secure when
>>>>> installed setuid root or with fscaps enabled and put your policy in
>>>>> bpftool.
>>>>> If you want to harden this a little bit, it would seem
>>>>> entirely reasonable to add a new CAP_BPF_ADMIN and change some, but
>>>>> not all, of the capable() checks to check CAP_BPF_ADMIN instead of the
>>>>> capabilities that they currently check.
>>>>
>>>> If finer grained controls are wanted, it does seem like the /dev/bpf
>>>> path makes the most sense. open, request abilities, use fd. The open can
>>>> be mediated by DAC and LSM. The request can be mediated by LSM. This
>>>> provides a way to add policy at the LSM level and at the tool level.
>>>> (i.e. For tool-level controls: leave LSM wide open, make /dev/bpf owned
>>>> by "bpfadmin" and bpftool becomes setuid "bpfadmin". For fine-grained
>>>> controls, leave /dev/bpf wide open and add policy to SELinux, etc.)
>>>>
>>>> With only a new CAP, you don't get the fine-grained controls. (The
>>>> "request abilities" part is the key there.)
>>>
>>> Sure you do: the effective set. It has somewhat bizarre defaults, but
>>> I don't think that's a real problem. Also, this wouldn't be like
>>> CAP_DAC_READ_SEARCH -- you can't accidentally use your BPF caps.
>>>
>>> I think that a /dev capability-like object isn't totally nuts, but I
>>> think we should do it well, and this patch doesn't really achieve
>>> that. But I don't think bpf wants fine-grained controls like this at
>>> all -- as I pointed out upthread, a fine-grained solution really wants
>>> different treatment for the different capable() checks, and a bunch of
>>> them won't resemble capabilities or /dev/bpf at all.
>>
>> With 5.3-rc1 out, I am back on this. :)
>>
>> How about we modify the set as:
>> 1. Introduce sys_bpf_with_cap() that takes fd of /dev/bpf.
>
> I'm fine with this in principle, but:
>
>> 2. Better handling of capable() calls through bpf code. I guess the
>> biggest problem here is is_priv in verifier.c:bpf_check().
>
> I think it would be good to understand exactly what /dev/bpf will
> enable one to do.
> Without some care, it would just become the next
> CAP_SYS_ADMIN: if you can open it, sure, you're not root, but you can
> intercept network traffic, modify cgroup behavior, and do plenty of
> other things, any of which can probably be used to completely take
> over the system.

Well, yes. sys_bpf() is pretty powerful.

The goal of /dev/bpf is to enable special users to call sys_bpf(). At
the same time, such users should not be able to take down the whole
system easily by accident, e.g., with rm -rf /.

It is similar to CAP_BPF_ADMIN, without really adding the CAP_.
I think adding a new CAP_ requires much more effort.

>
> It would also be nice to understand why you can't do what you need to
> do entirely in user code using setuid or fscaps.

It is not very easy to achieve the same control, where only certain
users can run certain tools (bpftool, etc.).

The closest approach I can find is:
  1. use libcap (pam_cap) to give CAP_SETUID to certain users;
  2. add setuid(0) to bpftool.

The difference between this approach and /dev/bpf is that those users
would also be able to run other tools that call setuid(). Though I am
not sure how many tools call setuid(), and how risky they are.

>
> Finally, at risk of rehashing some old arguments, I'll point out that
> the bpf() syscall is an unusual design to begin with. As an example,
> consider bpf_prog_attach(). Outside of bpf(), if I want to change the
> behavior of a cgroup, I would write to a file in
> /sys/kernel/cgroup/unified/whatever/, and normal DAC and MAC rules
> apply. With bpf(), however, I just call bpf() to attach a program to
> the cgroup. bpf() says "oh, you are capable(CAP_NET_ADMIN) -- go for
> it!". Unless I missed something major, and I just re-read the code,
> there is no check that the caller has write or LSM permission to
> anything at all in cgroupfs, and the existing API would make it very
> awkward to impose any kind of DAC rules here.
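
For illustration only, here is roughly what a DAC-style ownership test
on a cgroup directory looks like when done from userspace. This is a
minimal sketch, not what the kernel does today: sketch_dir_owned_by()
is a hypothetical helper, it only compares the directory owner with a
given uid, and a real in-kernel check would presumably also have to
consider user namespaces and LSM hooks.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a kernel or libbpf API): report whether the
 * directory at dir_path is owned by uid.
 *
 * Returns 1 if owned by uid, 0 if owned by someone else, and -1 if the
 * directory cannot be opened or inspected.
 */
static int sketch_dir_owned_by(const char *dir_path, uid_t uid)
{
	struct stat st;
	int fd, ret;

	assert(dir_path != NULL);

	/* O_DIRECTORY makes open() fail if the path is not a directory. */
	fd = open(dir_path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
	if (fd < 0)
		return -1;

	/* fstat() on the fd avoids re-resolving the path after the open. */
	ret = fstat(fd, &st);
	close(fd);
	if (ret < 0)
		return -1;

	return st.st_uid == uid;
}
```

A caller would pass something like geteuid() as uid and refuse the
attach unless the result is 1.
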
>
> So I think it might actually be time to repay some technical debt and
> come up with a real fix. As a less intrusive approach, you could see
> about requiring ownership of the cgroup directory instead of
> CAP_NET_ADMIN. As a more intrusive but perhaps better approach, you
> could invert the logic to make it work like everything outside of
> cgroup: add pseudo-files like bpf.inet_ingress to the cgroup
> directories, and require a writable fd to *that* to a new improved
> attach API. If a user could do:
>
> int fd = open("/sys/fs/cgroup/.../bpf.inet_attach", O_RDWR); /* usual
> DAC and MAC policy applies */
> int bpf_fd = setup the bpf stuff; /* no privilege required, unless
> the program is huge or needs is_priv */
> bpf(BPF_IMPROVED_ATTACH, target = fd, program = bpf_fd);
>
> there would be no capabilities or global privilege at all required for
> this. It would just work with cgroup delegation, containers, etc.
>
> I think you could even pull off this type of API change with only
> libbpf changes. In particular, there's this code:
>
> int bpf_prog_attach(int prog_fd, int target_fd, enum bpf_attach_type type,
>                     unsigned int flags)
> {
>         union bpf_attr attr;
>
>         memset(&attr, 0, sizeof(attr));
>         attr.target_fd = target_fd;
>         attr.attach_bpf_fd = prog_fd;
>         attr.attach_type = type;
>         attr.attach_flags = flags;
>
>         return sys_bpf(BPF_PROG_ATTACH, &attr, sizeof(attr));
> }
>
> This would instead do something like:
>
>         int specific_target_fd = openat(target_fd, bpf_type_to_target[type], O_RDWR);
>         attr.target_fd = specific_target_fd;
>         ...
>
>         return sys_bpf(BPF_PROG_IMPROVED_ATTACH, &attr, sizeof(attr));
>
> Would this solve your problem without needing /dev/bpf at all?

This gives fine-grained access control. I think it solves the problem.
But it also requires a lot of rework to sys_bpf(). And it may also
break backward/forward compatibility?
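
To make the userspace half of the quoted proposal concrete, a minimal
sketch follows. Everything in it is an assumption made for
illustration: cgroupfs has no bpf.* pseudo-files today, only
"bpf.inet_ingress" is named in this thread ("bpf.inet_egress" is made
up by analogy), the enum is a stand-in for enum bpf_attach_type, and
BPF_PROG_IMPROVED_ATTACH does not exist, so the sketch stops at
obtaining the writable fd.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Illustrative attach types; stand-ins for enum bpf_attach_type. */
enum sketch_attach_type {
	SKETCH_INET_INGRESS,
	SKETCH_INET_EGRESS,
	SKETCH_ATTACH_TYPE_MAX,
};

/* Hypothetical per-type pseudo-file names inside a cgroup directory. */
static const char *const sketch_type_to_target[SKETCH_ATTACH_TYPE_MAX] = {
	[SKETCH_INET_INGRESS] = "bpf.inet_ingress",
	[SKETCH_INET_EGRESS]  = "bpf.inet_egress",
};

/*
 * Open the per-type target file relative to an open cgroup directory fd.
 * Opening with O_RDWR is what lets the usual DAC/MAC checks on that file
 * decide whether the caller may attach.
 * Returns the new fd, or -1 with errno set.
 */
static int sketch_open_attach_target(int cgroup_dir_fd,
				     enum sketch_attach_type type)
{
	if ((unsigned int)type >= SKETCH_ATTACH_TYPE_MAX) {
		errno = EINVAL;
		return -1;
	}
	/* Catch a table entry that was forgotten when adding a type. */
	assert(sketch_type_to_target[type] != NULL);

	return openat(cgroup_dir_fd, sketch_type_to_target[type],
		      O_RDWR | O_CLOEXEC);
}

/*
 * Probe whether the caller could get a writable fd for an attach point.
 * Returns 0 if so, -1 with errno set otherwise. A real implementation
 * would hand the fd to the (hypothetical) improved attach command
 * instead of closing it.
 */
static int sketch_may_attach(int cgroup_dir_fd, enum sketch_attach_type type)
{
	int fd = sketch_open_attach_target(cgroup_dir_fd, type);

	if (fd < 0)
		return -1;
	close(fd);
	return 0;
}
```

With this shape, the policy lives entirely in the ownership and mode
of the per-type files, so delegating a cgroup subtree would delegate
the attach points along with it.
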

Personally, I think it is overkill for the original motivation:
calling sys_bpf() as a special user instead of root.

Alexei, Daniel: what do you think about this?

Thanks,
Song