From: Jann Horn
Date: Tue, 9 Apr 2019 22:41:44 +0200
Subject: Re: [PATCH v3 bpf-next 00/21] bpf: Sysctl hook
To: Andrey Ignatov
Cc: Network Development, Alexei Starovoitov, Daniel Borkmann, guro@fb.com, kernel-team@fb.com, Luis Chamberlain, Kees Cook, Alexey Dobriyan, kernel list, linux-fsdevel, linux-security-module
X-Mailing-List: linux-fsdevel@vger.kernel.org

On Tue, Apr 9, 2019 at 10:26 PM Andrey Ignatov wrote:
> The patch set introduces new BPF hook for sysctl.
>
> It adds new program type BPF_PROG_TYPE_CGROUP_SYSCTL and attach type
> BPF_CGROUP_SYSCTL.
>
> BPF_CGROUP_SYSCTL hook is placed before calling to sysctl's proc_handler so
> that accesses (read/write) to sysctl can be controlled for specific cgroup
> and either allowed or denied, or traced.

Don't look at the credentials of "current" in a read or write handler.
Consider what happens if, for example, someone inside a cgroup opens a
sysctl file and passes the file descriptor to another process outside
the cgroup over a unix domain socket, and that other process then
writes to it. Either do your access check on open, or use the
credentials that were saved during open() in the read/write handler.

> The hook has access to sysctl name, current sysctl value and (on write
> only) to new sysctl value via corresponding helpers. New sysctl value can
> be overridden by program. Both name and values (current/new) are
> represented as strings same way they're visible in /proc/sys/. It is up to
> program to parse these strings.

But even if a filter is installed that prevents all access to a
sysctl, you can still read it by installing your own filter that, when
a read is attempted the next time, dumps the value into a map or
something like that, right?

> To help with parsing the most common kind of sysctl value, vector of
> integers, two new helpers are provided: bpf_strtol and bpf_strtoul with
> semantic similar to user space strtol(3) and strtoul(3).
>
> The hook also provides bpf_sysctl context with two fields:
> * @write indicates whether sysctl is being read (= 0) or written (= 1);
> * @file_pos is sysctl file position to read from or write to, can be
> overridden.
>
> The hook allows to make better isolation for containerized applications
> that are run as root so that one container can't change a sysctl and affect
> all other containers on a host, make changes to allowed sysctl in a safer
> way and simplify sysctl tracing for cgroups.

Why can't you use a user namespace and isolate things properly that
way? That would be much cleaner, wouldn't it?
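
For reference, the file-descriptor-passing scenario from the first
comment above can be reproduced from userspace. The following is only a
rough sketch, assuming Linux and Python 3.9 or later (for
socket.send_fds/recv_fds). A temporary file stands in for a file under
/proc/sys, and the function name pass_fd_and_write is made up for
illustration. It is not kernel code; it only shows that the process
doing the write() need not be the process that did the open().

```python
import os
import socket
import tempfile


def pass_fd_and_write(path: str, payload: bytes) -> tuple[int, int]:
    """Open `path` in a child, pass the fd to the parent, write from the parent.

    Returns (opener_pid, writer_pid). Hypothetical helper for illustration.
    """
    parent_sock, child_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

    pid = os.fork()
    if pid == 0:
        # Child: stands in for the process inside the cgroup that opens the file.
        status = 1
        try:
            parent_sock.close()
            fd = os.open(path, os.O_WRONLY)
            # SCM_RIGHTS transfer; the child's pid travels along as ordinary data.
            socket.send_fds(child_sock, [str(os.getpid()).encode()], [fd])
            status = 0
        finally:
            os._exit(status)

    # Parent: stands in for the process outside the cgroup.
    child_sock.close()
    data, fds, _flags, _addr = socket.recv_fds(parent_sock, 64, 1)
    parent_sock.close()
    os.waitpid(pid, 0)
    if not fds:
        raise RuntimeError("child did not send a file descriptor")

    opener_pid = int(data.decode())
    # The write happens in this process. A check against the task that is
    # current at write time would see the parent, not the opener.
    os.write(fds[0], payload)
    os.close(fds[0])
    return opener_pid, os.getpid()


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as tmp:
        opener, writer = pass_fd_and_write(tmp.name, b"1\n")
        print(f"opened by pid {opener}, written by pid {writer}")
```

The two pids differ, which is why a check based on the writer's
identity can be sidestepped, while a check done at open() (or one
using the credentials saved at open()) cannot be sidestepped this way.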