From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=YQHJ=ZE=vger.kernel.org=bpf-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-0.8 required=3.0 tests=DKIM_SIGNED,DKIM_VALID,
	HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS
	autolearn=no autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 6E95FC43331
	for <bpf@archiver.kernel.org>; Tue, 12 Nov 2019 22:38:27 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by mail.kernel.org (Postfix) with ESMTP id 3EA752196E
	for <bpf@archiver.kernel.org>; Tue, 12 Nov 2019 22:38:27 +0000 (UTC)
Authentication-Results: mail.kernel.org;
	dkim=pass (2048-bit key) header.d=netronome-com.20150623.gappssmtp.com header.i=@netronome-com.20150623.gappssmtp.com header.b="Zzevqphf"
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1726953AbfKLWi0 (ORCPT <rfc822;bpf@archiver.kernel.org>);
        Tue, 12 Nov 2019 17:38:26 -0500
Received: from mail-lj1-f194.google.com ([209.85.208.194]:44570 "EHLO
        mail-lj1-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1726932AbfKLWi0 (ORCPT <rfc822;bpf@vger.kernel.org>);
        Tue, 12 Nov 2019 17:38:26 -0500
Received: by mail-lj1-f194.google.com with SMTP id g3so229846ljl.11
        for <bpf@vger.kernel.org>; Tue, 12 Nov 2019 14:38:25 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=netronome-com.20150623.gappssmtp.com; s=20150623;
        h=date:from:to:cc:subject:message-id:in-reply-to:references
         :organization:mime-version:content-transfer-encoding;
        bh=29otD4aR3aEWAxaTCSYvPRcjUiBxLg5jZIiTShtx8AY=;
        b=ZzevqphfV1hsT/pdPQxdqEtMcTW7RpulPX5NWB767wlWVJU8wP0cFFXwTEejgB3y18
         BGQef3e2EyGtx8TTI4pm3zcKsGxt5tlftEymnzya2vmJZ4MgaFeglgbixMKQshyCuI8I
         BxecFVHe6JDfZSFdOJuZn4RHcJvhRbKkzkpHuI10iCvXEx5rz8z/wrWmK9i4XpQlQKTu
         vmHYxjRuKTDAPH3d6l5Ixf4jSkZNtLUdA1DE7bdK5zE0mHnhW1xNeXQ7SUXT+GS1W2mc
         C74U3hKWuIrrp3j6Om1THzpI51tY6dB5DXJ6PUmN1ZnqG6MiNgNKAlNiEXjhOaWEBss6
         gWBQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:date:from:to:cc:subject:message-id:in-reply-to
         :references:organization:mime-version:content-transfer-encoding;
        bh=29otD4aR3aEWAxaTCSYvPRcjUiBxLg5jZIiTShtx8AY=;
        b=Zh3AOk7RJrbgvtoW+3p0beV6fYPEoP/3Kt5tnv52QMky4avlPuGPizGRmJS1dV6KX9
         83JBU6JXgi+Hs33jtOIFVeVIL04o2Mpvb+o4QJxzevrv3e6W/Q3Bvfit5EexwiexWLrR
         zbvsojT4RC3LV8XWMT+6cmcMIFUEiqB5sYt0lLWUhlkjo0pm2Y14WZeN81/4+t8NarKA
         FLGHsWntC6QtWHE5vs943RqHvgdJmYegc17ImfEVtVgtnexGZl1nMOinyGD6QSTJhfiz
         6cIgkocmyf9RiGDL+ulJ+B7EVb2ML0D2SYmjcEL9xIEiKR7RZlP4ZhfmYFzKO5vvQ5mm
         bPjw==
X-Gm-Message-State: APjAAAUnlnsHDieBLVUGJMIhzRW5gLti7dgvJCdwwjetvi1xreIGboXg
        yw2gxHl/7aOERwNca+lHwaQ1pQf5hyk=
X-Google-Smtp-Source: APXvYqxs6xpz6Uyx+8Ynsibh1nQnWQv6mWzukODKje6LtHgBWlJCFb6B7xSCXeIvLIhIMKClYDWUfw==
X-Received: by 2002:a2e:8809:: with SMTP id x9mr121591ljh.82.1573598304546;
        Tue, 12 Nov 2019 14:38:24 -0800 (PST)
Received: from cakuba ([66.60.152.14])
        by smtp.gmail.com with ESMTPSA id m28sm30869ljc.96.2019.11.12.14.38.22
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Tue, 12 Nov 2019 14:38:24 -0800 (PST)
Date:   Tue, 12 Nov 2019 14:38:17 -0800
From:   Jakub Kicinski <jakub.kicinski@netronome.com>
To:     Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc:     Andrii Nakryiko <andriin@fb.com>, bpf <bpf@vger.kernel.org>,
        Networking <netdev@vger.kernel.org>,
        Alexei Starovoitov <ast@fb.com>,
        Daniel Borkmann <daniel@iogearbox.net>,
        Kernel Team <kernel-team@fb.com>,
        Rik van Riel <riel@surriel.com>,
        Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [PATCH v2 bpf-next 1/3] bpf: add mmap() support for
 BPF_MAP_TYPE_ARRAY
Message-ID: <20191112143817.0c0eb768@cakuba>
In-Reply-To: <CAEf4Bzbx0WvgX9uGF4U1HM41m6kfdvWHCeYBSBRnQhR3egGy5w@mail.gmail.com>
References: <20191109080633.2855561-1-andriin@fb.com>
        <20191109080633.2855561-2-andriin@fb.com>
        <20191111103743.1c3a38a3@cakuba>
        <CAEf4Bzay-sCd5+5Y1+toJuEd6vNh+R7pkosYA7V7wDqTdoDxdw@mail.gmail.com>
        <20191112111750.2168b131@cakuba>
        <CAEf4Bzbx0WvgX9uGF4U1HM41m6kfdvWHCeYBSBRnQhR3egGy5w@mail.gmail.com>
Organization: Netronome Systems, Ltd.
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: bpf-owner@vger.kernel.org
Precedence: bulk
List-ID: <bpf.vger.kernel.org>
X-Mailing-List: bpf@vger.kernel.org

On Tue, 12 Nov 2019 14:03:50 -0800, Andrii Nakryiko wrote:
> On Tue, Nov 12, 2019 at 11:17 AM Jakub Kicinski wrote:
> > On Mon, 11 Nov 2019 18:06:42 -0800, Andrii Nakryiko wrote:  
> > > So let's say if sizeof(struct bpf_array) is 300, then I'd have to either:
> > >
> > > - somehow make sure that I allocate 4k (for data) + 300 (for struct
> > > bpf_array) in such a way that those 4k of data are 4k-aligned. Is
> > > there any way to do that?
> > > - assuming there isn't, then another way would be to allocate entire
> > > 4k page for struct bpf_array itself, but put it at the end of that
> > > page, so that 4k of data is 4k-aligned. While wasteful, the bigger
> > > problem is that pointer to bpf_array is not a pointer to allocated
> > > memory anymore, so we'd need to remember that and adjust address
> > > before calling vfree().
> > >
> > > Were you suggesting #2 as a solution? Or am I missing some other way to do this?  
> >
> > I am suggesting #2, that's the way to do it in the kernel.  
> 
> So I'm concerned about this approach, because it feels like a bunch of
> unnecessarily wasted memory. While there is no way around doing
> round_up(PAGE_SIZE) for data itself, it certainly is not necessary to
> waste almost entire page for struct bpf_array. And given this is going
> to be used for BPF maps backing global variables, there most probably
> will be at least 3 (.data, .bss, .rodata) per each program, and could
> be more. Also, while on x86_64 page is 4k, on other architectures it
> can be up to 64KB, so this seems wasteful.

With the extra mutex and int you grew struct bpf_map from 192B to 256B.
That's for every map on the system, unconditionally, and every array map
carries an extra pointer even if it doesn't need it.

Increasing "wasted" space when an opt-in feature is selected doesn't
seem all that terrible to me, especially since the overhead of aligning
the map size up to the page size is already necessary.
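
To put rough numbers on it, here is a back-of-the-envelope sketch. The
300B header size is the hypothetical figure from the quoted example, not
the real sizeof(struct bpf_array), and demo_header_waste() is a made-up
helper for illustration only:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical header size taken from the example quoted above. */
#define DEMO_HDR_SIZE 300u

/*
 * Bytes left unused when each mmap-able map spends one whole page on
 * its header and only DEMO_HDR_SIZE bytes of that page are occupied.
 */
static size_t demo_header_waste(size_t page_size, unsigned int nr_maps)
{
	assert(page_size >= DEMO_HDR_SIZE);
	return (page_size - DEMO_HDR_SIZE) * nr_maps;
}
```

With three maps per program (.data, .bss, .rodata) that is 11388 bytes
on 4k pages and 195708 bytes on 64k pages, paid only by maps that opt in
to mmap().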

> What's your concern exactly with the way it's implemented in this patch?

Judging by other threads we seem to care about performance of BPF
(rightly so). Doing an extra pointer deref for every static data access
seems like an obvious waste.
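
For reference, a minimal userspace sketch of the layout I mean (option
#2 above). struct demo_array, DEMO_PAGE_SIZE and the demo_*() helpers
are all hypothetical stand-ins, aligned_alloc()/free() stand in for the
kernel's vmalloc()/vfree(), and the page size is assumed to be 4k:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096u	/* assumed page size, not queried */

/* Hypothetical stand-in for struct bpf_array, not the kernel layout. */
struct demo_array {
	uint32_t elem_size;
	uint32_t max_entries;
	uint64_t pad;		/* keeps value[] at a 16-byte offset */
	char value[];		/* data follows the header directly */
};

/* The header must stay aligned when it is pushed to the end of a page. */
_Static_assert(offsetof(struct demo_array, value) %
	       _Alignof(struct demo_array) == 0, "header misaligned");

static struct demo_array *demo_array_alloc(uint32_t elem_size,
					   uint32_t max_entries)
{
	size_t data = (size_t)elem_size * max_entries;
	size_t data_pages = (data + DEMO_PAGE_SIZE - 1) /
			    DEMO_PAGE_SIZE * DEMO_PAGE_SIZE;
	size_t total = DEMO_PAGE_SIZE + data_pages;
	struct demo_array *arr;
	char *base;

	base = aligned_alloc(DEMO_PAGE_SIZE, total);
	if (!base)
		return NULL;
	memset(base, 0, total);

	/* Place the header at the tail of the first page. */
	arr = (struct demo_array *)(base + DEMO_PAGE_SIZE -
				    offsetof(struct demo_array, value));
	arr->elem_size = elem_size;
	arr->max_entries = max_entries;
	return arr;
}

/* Element lookup needs no second pointer: data is at a fixed offset. */
static void *demo_array_elem(struct demo_array *arr, uint32_t idx)
{
	if (idx >= arr->max_entries)
		return NULL;
	return arr->value + (size_t)idx * arr->elem_size;
}

/* Step back to the start of the allocation before freeing it. */
static void demo_array_free(struct demo_array *arr)
{
	if (arr)
		free((char *)arr + offsetof(struct demo_array, value) -
		     DEMO_PAGE_SIZE);
}
```

The data starts on a page boundary, so it can be mapped as is, and the
only bookkeeping is the fixed adjustment in demo_array_free().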

But then again, it's just an obvious suggestion, take it or leave it.