From: Gao Xiang <hsiangkao@linux.alibaba.com>
Subject: Re: [RFC PATCH bpf-next v3 00/37] FUSE BPF: A Stacked Filesystem Extension for FUSE
Date: Wed, 17 May 2023 15:05:33 +0800
Message-ID: <7386e858-1026-2924-9df9-22350b1e33a7@linux.alibaba.com>
To: Amir Goldstein
Cc: Daniel Rosenberg, Miklos Szeredi, bpf@vger.kernel.org, Alexei Starovoitov,
 linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-unionfs@vger.kernel.org, Daniel Borkmann, John Fastabend,
 Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
 Stanislav Fomichev, Hao Luo, Jiri Olsa, Shuah Khan, Jonathan Corbet,
 Joanne Koong, Mykola Lysenko, kernel-team@android.com
References: <20230418014037.2412394-1-drosen@google.com> <93e0e991-147f-0021-d635-95e615057273@linux.alibaba.com>

Hi Amir,

On 2023/5/17 23:51, Amir Goldstein wrote:
> On Wed, May 17, 2023 at 5:50 AM Gao Xiang wrote:
>> On 2023/5/2 17:07, Daniel Rosenberg wrote:
>>> On Mon, Apr 24, 2023 at 8:32 AM Miklos Szeredi wrote:
>>>>
>>>> The security model needs to be thought about and documented. Think
>>>> about this: the fuse server now delegates operations it would itself
>>>> perform to the passthrough code in fuse. The permissions that would
>>>> have been checked in the context of the fuse server are now checked
>>>> in the context of the task performing the operation. The server may
>>>> be able to bypass seccomp restrictions. Files that are open on the
>>>> backing filesystem are now hidden (e.g. lsof won't find these),
>>>> which allows the server to obfuscate accesses to backing files. Etc.
>>>>
>>>> These are not particularly worrying if the server is privileged, but
>>>> fuse comes with the history of supporting unprivileged servers, so
>>>> we should look at supporting passthrough with unprivileged servers
>>>> as well.
>>>
>>> This is on my todo list. My current plan is to grab the creds that
>>> the daemon uses to respond to FUSE_INIT. That should keep behavior
>>> fairly similar. I'm not sure if there are cases where the fuse server
>>> is operating under multiple contexts.
>>>
>>> I don't currently have a plan for exposing open files via lsof. Every
>>> such file should relate to one that will show up, though. I haven't
>>> dug into how that's set up, but I'm open to suggestions.
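
If the plan is to capture the daemon's creds when it answers FUSE_INIT
and assume them around each delegated operation, I'd imagine the usual
override_creds() pattern -- just a rough sketch on my side; the
->passthrough_cred field and the backing_file/buf/count/pos variables
below are made up for illustration, not taken from this patchset:

        /* in the FUSE_INIT handler, running in the daemon's context;
         * a real implementation would put_cred() on connection teardown */
        fc->passthrough_cred = get_current_cred();

        /* ...later, around each delegated backing-fs operation */
        const struct cred *old_cred;
        ssize_t err;

        old_cred = override_creds(fc->passthrough_cred);
        err = kernel_read(backing_file, buf, count, &pos);
        revert_creds(old_cred);

That way permission checks on the backing filesystem would run with the
daemon's creds rather than the caller's, though that still leaves the
multiple-contexts case open.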
>>>> My other generic comment is that you should add justification for
>>>> doing this in the first place. I guess it's mainly performance. So
>>>> how can performance be won in real-life cases? It would also be
>>>> good to measure the contribution of individual ops to that win. Is
>>>> there another reason for this besides performance?
>>>>
>>>> Thanks,
>>>> Miklos
>>>
>>> Our main concern with it is performance. We have some preliminary
>>> numbers looking at the pure passthrough case. We've been testing
>>> using a ramdrive on a somewhat slow machine, as that should
>>> highlight differences more. We ran fio for sequential reads, and for
>>> random read/write. For sequential reads, we were seeing libfuse's
>>> passthrough_hp take about a 50% hit, with fuse-bpf not being
>>> detectably slower. For random read/write, we were seeing a roughly
>>> 90% drop in performance from passthrough_hp, while fuse-bpf has
>>> about a 7% drop in read and write speed. When we use a bpf program
>>> that traces every opcode, that performance hit increases to a
>>> roughly 1% drop in sequential read performance, and a 20% drop in
>>> both read and write performance for random read/write. We plan to
>>> make more complex bpf examples, with fuse daemon equivalents to
>>> compare against.
>>>
>>> We have not looked closely at the impact of individual opcodes yet.
>>>
>>> There's also a potential ease-of-use benefit to fuse-bpf. If you're
>>> implementing a fuse daemon that is largely mirroring a backing
>>> filesystem, you only need to write code for the differences in
>>> behavior. For instance, say you want to remove image metadata like
>>> location. You could give bpf information on what range of the data
>>> is metadata, and zero out that section without having to handle any
>>> other operations.
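
For that kind of filter, conceptually the program only has to zero the
overlap between each read and the configured metadata range, roughly
like the plain-C helper below (the struct and function names are made
up just to illustrate the idea; in fuse-bpf this logic would live in
the BPF program itself):

        #include <stdint.h>
        #include <string.h>

        struct meta_range {
                uint64_t start; /* file offset where metadata begins */
                uint64_t len;   /* length of the metadata region */
        };

        /* zero out the part of buf that overlaps the metadata range;
         * buf holds file data for offsets [off, off + nread) */
        static void scrub_metadata(char *buf, uint64_t off, size_t nread,
                                   const struct meta_range *m)
        {
                uint64_t lo = off > m->start ? off : m->start;
                uint64_t end_read = off + nread;
                uint64_t end_meta = m->start + m->len;
                uint64_t hi = end_read < end_meta ? end_read : end_meta;

                if (lo < hi)
                        memset(buf + (lo - off), 0, hi - lo);
        }

Every other opcode could then stay pure passthrough.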
>> A bit off topic (I haven't looked closely into the FUSE BPF
>> internals): after roughly listening to this topic in the FS track
>> last week, I'm not quite sure (at least in the long term) whether it
>> might be better for eBPF-based filter/redirect functionality to land
>> in the VFS, or in some stackable fs, so that we could redirect or
>> filter any sub-fstree in principle. It's just an open question and I
>> have no strong preference either way, but do we really need
>> BPF-filter functionality in each individual fs?
>
> I think that is a valid question, but the answer is that even if it
> makes sense, doing something like this in vfs would be a much bigger
> project with larger consequences for performance, security and
> whatnot, so even if (and a very big if) this ever happens, using
> FUSE-BPF as a playground for this sort of stuff would be a good idea.

My current observation is that the total FUSE-BPF LoC already exceeds
that of FUSE itself. In addition, it hooks almost all fs operations,
which somewhat concerns me.

> This reminds me of union mounts - it made sense to have union mount
> functionality in vfs, but after a long winding road, a stacked fs
> (overlayfs) turned out to be a much more practical solution.

Yeah, I agree. It was just a hint on my side.

>> It sounds much like
>> https://learn.microsoft.com/en-us/windows-hardware/drivers/ifs/about-file-system-filter-drivers
>
> Nice reference.
> I must admit that I found it hard to understand what Windows filter
> drivers can do compared to the FUSE-BPF design.
> It'd be nice to get some comparison with what is planned for FUSE-BPF.

At least doing some investigation/analysis first might be better for
the long-term development.

> Interesting to note that there is a "legacy" Windows filter driver
> API, so Windows didn't get everything right with the first API - that
> is especially interesting to look at, as repeating other people's
> mistakes would be a shame.

I'm not familiar with those details either, but I saw that they have a
filesystem filter subsystem, so I mentioned it here.

Thanks,
Gao Xiang

> Thanks,
> Amir.