From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-0.6 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D1EBBC31E5D for ; Tue, 18 Jun 2019 18:41:31 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id B04ED214AF for ; Tue, 18 Jun 2019 18:41:31 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Yob3O00V" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730097AbfFRSl3 (ORCPT ); Tue, 18 Jun 2019 14:41:29 -0400 Received: from mail-lj1-f193.google.com ([209.85.208.193]:35363 "EHLO mail-lj1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1730060AbfFRSlZ (ORCPT ); Tue, 18 Jun 2019 14:41:25 -0400 Received: by mail-lj1-f193.google.com with SMTP id x25so608330ljh.2; Tue, 18 Jun 2019 11:41:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=DJLrvL9J3hS2txWU4rwvYtfkkJ0VVqVRpiM8b1MdqDs=; b=Yob3O00VdrixZ26sFE8d4GZOnliwvMeAze/Cas41TuPtW7jaOAwS3WADHQElaVjq0b jiNjPyNYBhcsCvANwerwmAhO9rW1gS6LPse7Rq6sMfUPEZG0QzOciuNQHU1M5JtbrKaf pfFTpbTTzUS+095DiYkgk1pS64PnjdBIJ2JoEuyySWW0oPqBmAfvS0BiO4ipcBdV0LGv 0nVPI2yi7pigL9PNbz6KtDq2SfGM/iU1A+x/vTmnCM1hAOqNX8oAoO5RSuDiFztQ67BO rH7kql2k7FqH213BcONqXs/X1bD9WNFDD34jg5UOI6w6ryfplTuZmKqZzNfRmuFT37V+ er8Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=DJLrvL9J3hS2txWU4rwvYtfkkJ0VVqVRpiM8b1MdqDs=; b=HNA6J+nnUcO48R2z3oocjjT/VoOi88moQ0TiCNHQAhV6djaJzWx3BiD//2C0L84h4r 6SIC4SVaTqpXL7nGgeU0nXnruPLlYSQKfFsLuZfau57qzBlKK31fe1r9z084NzRgvdnI ExWq5Nr9OPlej5EKcj665yH53XntwbMBjaQ8LW4orqowwAXjzE5tcBROgDpHEvEY8FNY SfFlyOAlDV50VTqsR9lQPgTjm33V75j34hW6gEM6Eo9/JrSpTD6BhxwsNsSO8mhAtT9D 6Y8FqfMRjiLwA1eC81GVuuSxRBJ8pwSFyRRb/IJbQdaH6JGNqEcZ5IudpA6+aoal3NIo BxWg== X-Gm-Message-State: APjAAAXG4LlMtIL6N1IN7Oksuows1Dlpqj3cHNqjwqMLauxGAi9lNNh0 4CJeLHZL8EKa21njuDdOrF2FLv1+frxVpDXKsQA= X-Google-Smtp-Source: APXvYqyzqjZErueptxEPtLixGhhXjqGYMX/q9Li2tpkzFw/YvGvB3DzUavccTah9T2sj9HcbUYt864/NuaW/fFi+bx4= X-Received: by 2002:a2e:968e:: with SMTP id q14mr21628122lji.195.1560883282935; Tue, 18 Jun 2019 11:41:22 -0700 (PDT) MIME-Version: 1.0 References: <20190617180109.34950-1-sdf@google.com> <20190617180109.34950-2-sdf@google.com> <20190618163117.yuw44b24lo6prsrz@ast-mbp.dhcp.thefacebook.com> <20190618164913.GI9636@mini-arch> <20190618180944.GJ9636@mini-arch> In-Reply-To: <20190618180944.GJ9636@mini-arch> From: Alexei Starovoitov Date: Tue, 18 Jun 2019 11:41:11 -0700 Message-ID: Subject: Re: [PATCH bpf-next v6 1/9] bpf: implement getsockopt and setsockopt hooks To: Stanislav Fomichev Cc: Stanislav Fomichev , Network Development , bpf , "David S. Miller" , Alexei Starovoitov , Daniel Borkmann , Martin Lau Content-Type: text/plain; charset="UTF-8" Sender: bpf-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org On Tue, Jun 18, 2019 at 11:09 AM Stanislav Fomichev wrote: > > On 06/18, Stanislav Fomichev wrote: > > On 06/18, Alexei Starovoitov wrote: > > > On Mon, Jun 17, 2019 at 11:01:01AM -0700, Stanislav Fomichev wrote: > > > > Implement new BPF_PROG_TYPE_CGROUP_SOCKOPT program type and > > > > BPF_CGROUP_{G,S}ETSOCKOPT cgroup hooks. > > > > > > > > BPF_CGROUP_SETSOCKOPT get a read-only view of the setsockopt arguments. > > > > BPF_CGROUP_GETSOCKOPT can modify the supplied buffer. > > > > Both of them reuse existing PTR_TO_PACKET{,_END} infrastructure. > > > > > > > > The buffer memory is pre-allocated (because I don't think there is > > > > a precedent for working with __user memory from bpf). This might be > > > > slow to do for each {s,g}etsockopt call, that's why I've added > > > > __cgroup_bpf_prog_array_is_empty that exits early if there is nothing > > > > attached to a cgroup. Note, however, that there is a race between > > > > __cgroup_bpf_prog_array_is_empty and BPF_PROG_RUN_ARRAY where cgroup > > > > program layout might have changed; this should not be a problem > > > > because in general there is a race between multiple calls to > > > > {s,g}etsocktop and user adding/removing bpf progs from a cgroup. > > > > > > > > The return code of the BPF program is handled as follows: > > > > * 0: EPERM > > > > * 1: success, execute kernel {s,g}etsockopt path after BPF prog exits > > > > * 2: success, do _not_ execute kernel {s,g}etsockopt path after BPF > > > > prog exits > > > > > > > > Note that if 0 or 2 is returned from BPF program, no further BPF program > > > > in the cgroup hierarchy is executed. This is in contrast with any existing > > > > per-cgroup BPF attach_type. > > > > > > This is drastically different from all other cgroup-bpf progs. > > > I think all programs should be executed regardless of return code. > > > It seems to me that 1 vs 2 difference can be expressed via bpf program logic > > > instead of return code. > > > > > > How about we do what all other cgroup-bpf progs do: > > > "any no is no. all yes is yes" > > > Meaning any ret=0 - EPERM back to user. > > > If all are ret=1 - kernel handles get/set. > > > > > > I think the desire to differentiate 1 vs 2 came from ordering issue > > > on getsockopt. > > > How about for setsockopt all progs run first and then kernel. > > > For getsockopt kernel runs first and then all progs. > > > Then progs will have an ability to overwrite anything the kernel returns. > > Good idea, makes sense. For getsockopt we'd also need to pass the return > > value of the kernel getsockopt to let bpf programs override it, but seems > > doable. Let me play with it a bit; I'll send another version if nothing > > major comes up. > > > > Thanks for another round of review! > One clarification: we'd still probably need to have 3 return codes for > setsockopt: > * any 0 - EPERM > * all 1 - continue with the kernel path (i.e. apply this sockopt as is) > * any 2 - return after all BPF hooks are executed (bypass kernel) > (any 0 trumps any 2 -> EPERM) > > The context is readonly for setsockopt, so it shouldn't be an issue. > Let me know if you have better idea how to handle that. I think we don't really need 2. The progs can reduce optlen to zero (or optname to BPF_EMPTY_SOCKOPT) and do ret=1. Then the kernel can see that nothing to be be done and return 0 to user space. Since parent prog in the chain will be able to see that child prog set optlen to zero, it will be able to overwrite if necessary. getsockopt wil be clean as well. all 1s return whatever was produced by progs to user space. and progs will be able to see what kernel wanted to return because the kernel's getsockopt logic ran first. ret=2 doesn't have any meaning for getsockopt, so nice to keep setsockopt symmetrical and don't do it there either.