Date: Tue, 19 Oct 2021 11:14:30 +0800
Subject: Re: [External] Re: [PATCH] bpf: use count for prealloc hashtab too
From: Chengming Zhou
To: Alexei Starovoitov
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Network Development, bpf, LKML
References: <20211015090353.31248-1-zhouchengming@bytedance.com> <6d7246b6-195e-ee08-06b1-2d1ec722e7b2@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2021/10/19 9:57 AM, Alexei Starovoitov wrote:
> On Sun, Oct 17, 2021 at 10:49 PM Chengming Zhou wrote:
>>
>> On 2021/10/16 3:58 AM, Alexei Starovoitov wrote:
>>> On Fri, Oct 15, 2021 at 11:04 AM Chengming Zhou wrote:
>>>>
>>>> We only use count for kmalloc hashtab not for prealloc hashtab, because
>>>> __pcpu_freelist_pop() return NULL when no more elem in pcpu freelist.
>>>>
>>>> But the problem is that __pcpu_freelist_pop() will traverse all CPUs and
>>>> spin_lock for all CPUs to find there is no more elem at last.
>>>>
>>>> We encountered bad case on big system with 96 CPUs that alloc_htab_elem()
>>>> would last for 1ms. This patch use count for prealloc hashtab too,
>>>> avoid traverse and spin_lock for all CPUs in this case.
>>>>
>>>> Signed-off-by: Chengming Zhou
>>>
>>> It's not clear from the commit log what you're solving.
>>> The atomic inc/dec in critical path of prealloc maps hurts performance.
>>> That's why it's not used.
>>>
>> Thanks for the explanation, what I'm solving is when hash table hasn't free
>> elements, we don't need to call __pcpu_freelist_pop() to traverse and
>> spin_lock all CPUs. The ftrace output of this bad case is below:
>>
>> 50)               |  htab_map_update_elem() {
>> 50)   0.329 us    |    _raw_spin_lock_irqsave();
>> 50)   0.063 us    |    lookup_elem_raw();
>> 50)               |    alloc_htab_elem() {
>> 50)               |      pcpu_freelist_pop() {
>> 50)   0.209 us    |        _raw_spin_lock();
>> 50)   0.264 us    |        _raw_spin_lock();
>
> This is LRU map. Not hash map.
> It will grab spin_locks of other cpus
> only if all previous cpus don't have free elements.
> Most likely your map is actually full and doesn't have any free elems.
> Since it's an lru it will force free an elem eventually.
>

Maybe I missed something, but the map_update_elem function of the LRU map is
htab_lru_map_update_elem(), whereas the htab_map_update_elem() in the trace
above is the map_update_elem function of the plain hash map.

Because of how the percpu freelist used by the hash map is implemented, it
takes the spin_lock of every other CPU in turn when there are no free elements.

Thanks.
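
To make the cost concrete, here is a simplified user-space model of the pop path. It is only a sketch under stated assumptions, not the kernel code: struct freelist, freelist_push(), freelist_pop() and the lock_acquisitions counter are hypothetical names, and the counter merely stands in for taking each per-CPU raw spinlock. It shows that a pop on a completely empty freelist visits every CPU before it can return NULL.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 96 /* same CPU count as the machine in the report */

struct node {
	struct node *next;
};

/* Hypothetical model: one list head per CPU, no real locking. */
struct freelist {
	struct node *heads[NR_CPUS];
	unsigned long lock_acquisitions; /* stands in for raw_spin_lock() calls */
};

/* Push a free element onto the list owned by @cpu. */
static void freelist_push(struct freelist *s, int cpu, struct node *n)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	n->next = s->heads[cpu];
	s->heads[cpu] = n;
}

/*
 * Pop a free element, starting at @cpu and walking the other CPUs in
 * order. Every visited CPU costs one (modelled) lock acquisition, so an
 * empty freelist costs NR_CPUS acquisitions before NULL is returned.
 */
static struct node *freelist_pop(struct freelist *s, int cpu)
{
	int cur = cpu;

	assert(cpu >= 0 && cpu < NR_CPUS);
	do {
		struct node *n;

		s->lock_acquisitions++; /* would be raw_spin_lock(&head->lock) */
		n = s->heads[cur];
		if (n) {
			s->heads[cur] = n->next;
			return n;
		}
		cur = (cur + 1) % NR_CPUS;
	} while (cur != cpu);

	return NULL; /* every per-CPU list was empty */
}
```

In this model, a pop on a zero-initialised struct freelist returns NULL only after lock_acquisitions has reached NR_CPUS. That full walk is what a count check before the pop would skip.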