From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.2 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AB2A1C7618F for ; Tue, 16 Jul 2019 16:26:36 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 84BBB206C2 for ; Tue, 16 Jul 2019 16:26:36 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730452AbfGPQ0b (ORCPT ); Tue, 16 Jul 2019 12:26:31 -0400 Received: from mx1.redhat.com ([209.132.183.28]:34184 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725926AbfGPQ0b (ORCPT ); Tue, 16 Jul 2019 12:26:31 -0400 Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 18AB887620; Tue, 16 Jul 2019 16:26:30 +0000 (UTC) Received: from madcap2.tricolour.ca (ovpn-112-14.phx2.redhat.com [10.3.112.14]) by smtp.corp.redhat.com (Postfix) with ESMTPS id A40625C28D; Tue, 16 Jul 2019 16:26:19 +0000 (UTC) Date: Tue, 16 Jul 2019 12:26:16 -0400 From: Richard Guy Briggs To: Paul Moore Cc: Tycho Andersen , nhorman@tuxdriver.com, linux-api@vger.kernel.org, containers@lists.linux-foundation.org, LKML , dhowells@redhat.com, Linux-Audit Mailing List , netfilter-devel@vger.kernel.org, ebiederm@xmission.com, simo@redhat.com, netdev@vger.kernel.org, linux-fsdevel@vger.kernel.org, Eric Paris , "Serge E. Hallyn" Subject: Re: [PATCH ghak90 V6 02/10] audit: add container id Message-ID: <20190716162616.7kgvqbqxn4icqyb3@madcap2.tricolour.ca> References: <20190529222835.GD8959@cisco> <20190530170913.GA16722@mail.hallyn.com> <20190530212900.GC5739@cisco> <20190708181237.5poheliito7zpvmc@madcap2.tricolour.ca> <20190716153705.xx7dwrhliny5amut@madcap2.tricolour.ca> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: NeoMutt/20180716 X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.26]); Tue, 16 Jul 2019 16:26:30 +0000 (UTC) Sender: netfilter-devel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netfilter-devel@vger.kernel.org On 2019-07-16 12:08, Paul Moore wrote: > On Tue, Jul 16, 2019 at 11:37 AM Richard Guy Briggs wrote: > > On 2019-07-15 17:09, Paul Moore wrote: > > > On Mon, Jul 8, 2019 at 2:12 PM Richard Guy Briggs wrote: > > > > On 2019-05-30 19:26, Paul Moore wrote: > > > > > > ... > > > > > > > > I like the creativity, but I worry that at some point these > > > > > limitations are going to be raised (limits have a funny way of doing > > > > > that over time) and we will be in trouble. I say "trouble" because I > > > > > want to be able to quickly do an audit container ID comparison and > > > > > we're going to pay a penalty for these larger values (we'll need this > > > > > when we add multiple auditd support and the requisite record routing). > > > > > > > > > > Thinking about this makes me also realize we probably need to think a > > > > > bit longer about audit container ID conflicts between orchestrators. > > > > > Right now we just take the value that is given to us by the > > > > > orchestrator, but if we want to allow multiple container orchestrators > > > > > to work without some form of cooperation in userspace (I think we have > > > > > to assume the orchestrators will not talk to each other) we likely > > > > > need to have some way to block reuse of an audit container ID. We > > > > > would either need to prevent the orchestrator from explicitly setting > > > > > an audit container ID to a currently in use value, or instead generate > > > > > the audit container ID in the kernel upon an event triggered by the > > > > > orchestrator (e.g. a write to a /proc file). I suspect we should > > > > > start looking at the idr code, I think we will need to make use of it. > > > > > > > > To address this, I'd suggest that it is enforced to only allow the > > > > setting of descendants and to maintain a master list of audit container > > > > identifiers (with a hash table if necessary later) that includes the > > > > container owner. > > > > > > We're discussing the audit container ID management policy elsewhere in > > > this thread so I won't comment on that here, but I did want to say > > > that we will likely need something better than a simple list of audit > > > container IDs from the start. It's common for systems to have > > > thousands of containers now (or multiple thousands), which tells me > > > that a list is a poor choice. You mentioned a hash table, so I would > > > suggest starting with that over the list for the initial patchset. > > > > I saw that as an internal incremental improvement that did not affect > > the API, so I wanted to keep things a bit simpler (as you've requested > > in the past) to get this going, and add that enhancement later. > > In general a simple approach is a good way to start when the > problem/use-case is not very well understood; in other words, don't > spend a lot of time/effort optimizing something you don't yet > understand. In this case we know that people want to deploy a *lot* > of containers on a single system so we should design the data > structures appropriately. A list is simply not a good fit here, I > believe/hope you know that too. Yes, I knew that, which is why I alluded to a hash table... > > I'll start working on it now. The hash table would simply point to > > lists anyways unless you can recommend a better approach. > > I assume when you say "point to lists" you are talking about using > lists for the hash buckets? If so, yes that should be fine at this > point. In general if the per-bucket lists become a bottleneck we can > look at the size of the table (or make it tunable) or even use a > different approach entirely. Ultimately the data store is an > implementation detail private to the audit subsystem in the kernel so > we should be able to change it as necessary without breaking anything. Yes, this is what I had in mind. It would be tunable either by a macro or a config option, so the exact value isn't a critical implementation detail that can be easily tuned as we gain experience with it. And yes, the intent was that it was a non-user-perceivable implementation choice other than performace metrics. > paul moore - RGB -- Richard Guy Briggs Sr. S/W Engineer, Kernel Security, Base Operating Systems Remote, Ottawa, Red Hat Canada IRC: rgb, SunRaycer Voice: +1.647.777.2635, Internal: (81) 32635