Subject: Re: [RFC PATCH 1/2] block: add support for redirecting IO completion through eBPF
To: Alexei Starovoitov, Hou Tao
Cc: linux-block@vger.kernel.org, bpf, Network Development, Jens Axboe, Alexei Starovoitov, hare@suse.com, osandov@fb.com, ming.lei@redhat.com, damien.lemoal@wdc.com, bvanassche, Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song
References: <20191014122833.64908-1-houtao1@huawei.com> <20191014122833.64908-2-houtao1@huawei.com>
From: Hannes Reinecke
Message-ID: <113e46d4-2a90-a694-8a24-7a6a3c019e88@suse.de>
Date: Wed, 16 Oct 2019 09:05:26 +0200
X-Mailing-List: linux-block@vger.kernel.org

On 10/15/19 11:04 PM, Alexei Starovoitov wrote:
> On Mon, Oct 14, 2019 at 5:21 AM Hou Tao wrote:
>>
>> For network stack, RPS, namely Receive Packet Steering, is used to
>> distribute network protocol processing from hardware-interrupted CPU
>> to specific CPUs and alleviating soft-irq load of the interrupted CPU.
>>
>> For block layer, soft-irq (for single queue device) or hard-irq
>> (for multiple queue device) is used to handle IO completion, so
>> RPS will be useful when the soft-irq load or the hard-irq load
>> of a specific CPU is too high, or a specific CPU set is required
>> to handle IO completion.
>>
>> Instead of setting the CPU set used for handling IO completion
>> through sysfs or procfs, we can attach an eBPF program to the
>> request-queue, provide some useful info (e.g., the CPU
>> which submits the request) to the program, and let the program
>> decides the proper CPU for IO completion handling.
>>
>> Signed-off-by: Hou Tao
> ...
>>
>> +    rcu_read_lock();
>> +    prog = rcu_dereference_protected(q->prog, 1);
>> +    if (prog)
>> +        bpf_ccpu = BPF_PROG_RUN(q->prog, NULL);
>> +    rcu_read_unlock();
>> +
>>      cpu = get_cpu();
>> -    if (!test_bit(QUEUE_FLAG_SAME_FORCE, &q->queue_flags))
>> -        shared = cpus_share_cache(cpu, ctx->cpu);
>> +    if (bpf_ccpu < 0 || !cpu_online(bpf_ccpu)) {
>> +        ccpu = ctx->cpu;
>> +        if (!test_bit(QUEUE_FLAG_SAME_FORCE, &q->queue_flags))
>> +            shared = cpus_share_cache(cpu, ctx->cpu);
>> +    } else
>> +        ccpu = bpf_ccpu;
>>
>> -    if (cpu != ctx->cpu && !shared && cpu_online(ctx->cpu)) {
>> +    if (cpu != ccpu && !shared && cpu_online(ccpu)) {
>>          rq->csd.func = __blk_mq_complete_request_remote;
>>          rq->csd.info = rq;
>>          rq->csd.flags = 0;
>> -        smp_call_function_single_async(ctx->cpu, &rq->csd);
>> +        smp_call_function_single_async(ccpu, &rq->csd);
>
> Interesting idea.
> Not sure whether such programability makes sense from
> block layer point of view.
>
> From bpf side having a program with NULL input context is
> a bit odd. We never had such things in the past, so this patchset
> won't work as-is.
> Also no-input means that the program choices are quite limited.
> Other than round robin and random I cannot come up with other
> cpu selection ideas.
> I suggest to do writable tracepoint here instead.
> Take a look at trace_nbd_send_request.
> BPF prog can write into 'request'.
> For your use case it will be able to write into 'bpf_ccpu' local variable.
> If you keep it as raw tracepoint and don't add the actual tracepoint
> with TP_STRUCT__entry and TP_fast_assign then it won't be abi
> and you can change it later or remove it altogether.
>

That basically was my idea, too.

Actually I was coming from a different angle, namely trying to figure
out how we could do generic error injection in the block layer.
eBPF would be one way of doing it, kprobes another.

But writable trace events ... I'll have to check if we can leverage that
here, too.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke               Teamlead Storage & Networking
hare@suse.de                                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 247165 (AG München), GF: Felix Imendörffer
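
For readers following the quoted hunk, here is a rough userspace sketch of the completion-CPU decision it introduces. It is not kernel code: `cpu_is_online()`, `share_cache()`, `pick_completion_cpu()`, the `online_mask` bitmask and the pairwise cache topology are all made-up stand-ins for `cpu_online()`, `cpus_share_cache()` and the logic in the patch. Unlike the quoted patch, the stand-in online check also rejects out-of-range CPU numbers.

```c
#include <stdbool.h>

#define NR_CPUS 8

/* Assumption: CPUs 0-3 are online. Stand-in for the kernel's online mask. */
static unsigned int online_mask = 0x0f;

/* Hypothetical stand-in for cpu_online(); also bounds-checks the CPU number. */
static bool cpu_is_online(int cpu)
{
    return cpu >= 0 && cpu < NR_CPUS && (online_mask & (1u << cpu)) != 0;
}

/* Hypothetical stand-in for cpus_share_cache(): CPUs pair up as (0,1), (2,3), ... */
static bool share_cache(int a, int b)
{
    return a / 2 == b / 2;
}

/*
 * Mirrors the decision in the quoted hunk.
 *   cur_cpu    - CPU that took the completion interrupt (get_cpu() in the patch)
 *   submit_cpu - CPU that submitted the request (ctx->cpu in the patch)
 *   bpf_ccpu   - value returned by the attached program, negative if none
 *   same_force - models QUEUE_FLAG_SAME_FORCE being set
 * Returns the CPU on which the completion would run.
 */
static int pick_completion_cpu(int cur_cpu, int submit_cpu, int bpf_ccpu,
                               bool same_force)
{
    bool shared = false;
    int ccpu;

    if (bpf_ccpu < 0 || !cpu_is_online(bpf_ccpu)) {
        /* No usable answer from the program: fall back to the submitter. */
        ccpu = submit_cpu;
        if (!same_force)
            shared = share_cache(cur_cpu, submit_cpu);
    } else {
        /* The program's choice is taken as-is; cache sharing is not checked. */
        ccpu = bpf_ccpu;
    }

    if (cur_cpu != ccpu && !shared && cpu_is_online(ccpu))
        return ccpu;    /* the patch sends an IPI via smp_call_function_single_async() */
    return cur_cpu;     /* complete locally */
}
```

With this model, `pick_completion_cpu(0, 1, -1, false)` stays on CPU 0 because CPUs 0 and 1 share a cache, while `pick_completion_cpu(0, 1, 1, false)` moves to CPU 1: a CPU chosen by the program bypasses the cache-sharing shortcut, which is how the quoted hunk reads as well.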