Subject: Re: [RFC PATCH net-next 0/6] implement kthread based napi poll
From: Eric Dumazet
Date: Mon, 28 Sep 2020 19:43:36 +0200
To: Wei Wang
Cc: "David S. Miller", netdev, Jakub Kicinski, Paolo Abeni,
 Hannes Frederic Sowa, Felix Fietkau
In-Reply-To: <20200914172453.1833883-1-weiwan@google.com>
X-Mailing-List: netdev@vger.kernel.org
Miller" , netdev , Jakub Kicinski , Paolo Abeni , Hannes Frederic Sowa , Felix Fietkau Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org On Mon, Sep 14, 2020 at 7:26 PM Wei Wang wrote: > > The idea of moving the napi poll process out of softirq context to a > kernel thread based context is not new. > Paolo Abeni and Hannes Frederic Sowa has proposed patches to move napi > poll to kthread back in 2016. And Felix Fietkau has also proposed > patches of similar ideas to use workqueue to process napi poll just a > few weeks ago. > > The main reason we'd like to push forward with this idea is that the > scheduler has poor visibility into cpu cycles spent in softirq context, > and is not able to make optimal scheduling decisions of the user threads. > For example, we see in one of the application benchmark where network > load is high, the CPUs handling network softirqs has ~80% cpu util. And > user threads are still scheduled on those CPUs, despite other more idle > cpus available in the system. And we see very high tail latencies. In this > case, we have to explicitly pin away user threads from the CPUs handling > network softirqs to ensure good performance. > With napi poll moved to kthread, scheduler is in charge of scheduling both > the kthreads handling network load, and the user threads, and is able to > make better decisions. In the previous benchmark, if we do this and we > pin the kthreads processing napi poll to specific CPUs, scheduler is > able to schedule user threads away from these CPUs automatically. > > And the reason we prefer 1 kthread per napi, instead of 1 workqueue > entity per host, is that kthread is more configurable than workqueue, > and we could leverage existing tuning tools for threads, like taskset, > chrt, etc to tune scheduling class and cpu set, etc. Another reason is > if we eventually want to provide busy poll feature using kernel threads > for napi poll, kthread seems to be more suitable than workqueue. > > In this patch series, I revived Paolo and Hannes's patch in 2016 and > left them as the first 2 patches. Then there are changes proposed by > Felix, Jakub, Paolo and myself on top of those, with suggestions from > Eric Dumazet. > > In terms of performance, I ran tcp_rr tests with 1000 flows with > various request/response sizes, with RFS/RPS disabled, and compared > performance between softirq vs kthread. Host has 56 hyper threads and > 100Gbps nic. > > req/resp QPS 50%tile 90%tile 99%tile 99.9%tile > softirq 1B/1B 2.19M 284us 987us 1.1ms 1.56ms > kthread 1B/1B 2.14M 295us 987us 1.0ms 1.17ms > > softirq 5KB/5KB 1.31M 869us 1.06ms 1.28ms 2.38ms > kthread 5KB/5KB 1.32M 878us 1.06ms 1.26ms 1.66ms > > softirq 1MB/1MB 10.78K 84ms 166ms 234ms 294ms > kthread 1MB/1MB 10.83K 82ms 173ms 262ms 320ms > > I also ran one application benchmark where the user threads have more > work to do. We do see good amount of tail latency reductions with the > kthread model. Wei, this is a very nice work. Please re-send it without the RFC tag, so that we can hopefully merge it ASAP. Thanks !