From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.0 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE, SPF_PASS,T_DKIMWL_WL_HIGH,URIBL_BLOCKED,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 67AEDC282DE for ; Thu, 23 May 2019 19:46:50 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 41C902133D for ; Thu, 23 May 2019 19:46:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1558640810; bh=3HPlDIKoNiW5KkLhHXh369P9nPblupMd2QgOesjxNKM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:List-ID:From; b=ybExzkVFpivrZoDGy3u+tna/bXJ+7R3k71eVZsrAxpHJcD2K4ajNsUeKlKFqt68zH 8a2B8dF3BaQRhqZOtFjPH/LXjNol/PX0wOBhsv8hS0u4YrzcYJ/VzyPwO01C6WR7YF 653BI//jINniG4OAEomCODaZG5Zx5iZc+5re6Zck= Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2388689AbfEWTOL (ORCPT ); Thu, 23 May 2019 15:14:11 -0400 Received: from mail.kernel.org ([198.145.29.99]:48134 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2388459AbfEWTOK (ORCPT ); Thu, 23 May 2019 15:14:10 -0400 Received: from localhost (83-86-89-107.cable.dynamic.v4.ziggo.nl [83.86.89.107]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 0C79D2133D; Thu, 23 May 2019 19:14:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1558638849; bh=3HPlDIKoNiW5KkLhHXh369P9nPblupMd2QgOesjxNKM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=hwfscTuF/L6/3V1BWU9LC5E2Xe78Q5REQG3laWHav3eQAKama1woIAf41uwJ18IwG 53VaPo5r4tHpamelsW5+TjMKfo/e/YOulO1RxalUVQQqCvK7pYtI/3JLyV7eJLYeVj VpSfgWqqCNt8w4Ixg0j1d69BNRwmMC1SvwpVHZGc= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Daniel Borkmann , Martin KaFai Lau , Alexei Starovoitov Subject: [PATCH 4.14 76/77] bpf, lru: avoid messing with eviction heuristics upon syscall lookup Date: Thu, 23 May 2019 21:06:34 +0200 Message-Id: <20190523181730.447456891@linuxfoundation.org> X-Mailer: git-send-email 2.21.0 In-Reply-To: <20190523181719.982121681@linuxfoundation.org> References: <20190523181719.982121681@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Sender: stable-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: stable@vger.kernel.org From: Daniel Borkmann commit 50b045a8c0ccf44f76640ac3eea8d80ca53979a3 upstream. One of the biggest issues we face right now with picking LRU map over regular hash table is that a map walk out of user space, for example, to just dump the existing entries or to remove certain ones, will completely mess up LRU eviction heuristics and wrong entries such as just created ones will get evicted instead. The reason for this is that we mark an entry as "in use" via bpf_lru_node_set_ref() from system call lookup side as well. Thus upon walk, all entries are being marked, so information of actual least recently used ones are "lost". In case of Cilium where it can be used (besides others) as a BPF based connection tracker, this current behavior causes disruption upon control plane changes that need to walk the map from user space to evict certain entries. Discussion result from bpfconf [0] was that we should simply just remove marking from system call side as no good use case could be found where it's actually needed there. Therefore this patch removes marking for regular LRU and per-CPU flavor. If there ever should be a need in future, the behavior could be selected via map creation flag, but due to mentioned reason we avoid this here. [0] http://vger.kernel.org/bpfconf.html Fixes: 29ba732acbee ("bpf: Add BPF_MAP_TYPE_LRU_HASH") Fixes: 8f8449384ec3 ("bpf: Add BPF_MAP_TYPE_LRU_PERCPU_HASH") Signed-off-by: Daniel Borkmann Acked-by: Martin KaFai Lau Signed-off-by: Alexei Starovoitov Signed-off-by: Greg Kroah-Hartman --- kernel/bpf/hashtab.c | 23 ++++++++++++++++++----- 1 file changed, 18 insertions(+), 5 deletions(-) --- a/kernel/bpf/hashtab.c +++ b/kernel/bpf/hashtab.c @@ -498,18 +498,30 @@ static u32 htab_map_gen_lookup(struct bp return insn - insn_buf; } -static void *htab_lru_map_lookup_elem(struct bpf_map *map, void *key) +static __always_inline void *__htab_lru_map_lookup_elem(struct bpf_map *map, + void *key, const bool mark) { struct htab_elem *l = __htab_map_lookup_elem(map, key); if (l) { - bpf_lru_node_set_ref(&l->lru_node); + if (mark) + bpf_lru_node_set_ref(&l->lru_node); return l->key + round_up(map->key_size, 8); } return NULL; } +static void *htab_lru_map_lookup_elem(struct bpf_map *map, void *key) +{ + return __htab_lru_map_lookup_elem(map, key, true); +} + +static void *htab_lru_map_lookup_elem_sys(struct bpf_map *map, void *key) +{ + return __htab_lru_map_lookup_elem(map, key, false); +} + static u32 htab_lru_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf) { @@ -1160,6 +1172,7 @@ const struct bpf_map_ops htab_lru_map_op .map_free = htab_map_free, .map_get_next_key = htab_map_get_next_key, .map_lookup_elem = htab_lru_map_lookup_elem, + .map_lookup_elem_sys_only = htab_lru_map_lookup_elem_sys, .map_update_elem = htab_lru_map_update_elem, .map_delete_elem = htab_lru_map_delete_elem, .map_gen_lookup = htab_lru_map_gen_lookup, @@ -1190,7 +1203,6 @@ static void *htab_lru_percpu_map_lookup_ int bpf_percpu_hash_copy(struct bpf_map *map, void *key, void *value) { - struct bpf_htab *htab = container_of(map, struct bpf_htab, map); struct htab_elem *l; void __percpu *pptr; int ret = -ENOENT; @@ -1206,8 +1218,9 @@ int bpf_percpu_hash_copy(struct bpf_map l = __htab_map_lookup_elem(map, key); if (!l) goto out; - if (htab_is_lru(htab)) - bpf_lru_node_set_ref(&l->lru_node); + /* We do not mark LRU map element here in order to not mess up + * eviction heuristics when user space does a map walk. + */ pptr = htab_elem_get_ptr(l, map->key_size); for_each_possible_cpu(cpu) { bpf_long_memcpy(value + off,