linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Hannes Reinecke <hare@suse.de>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	Hou Tao <houtao1@huawei.com>
Cc: linux-block@vger.kernel.org, bpf <bpf@vger.kernel.org>,
	Network Development <netdev@vger.kernel.org>,
	Jens Axboe <axboe@kernel.dk>, Alexei Starovoitov <ast@kernel.org>,
	hare@suse.com, osandov@fb.com, ming.lei@redhat.com,
	damien.lemoal@wdc.com, bvanassche <bvanassche@acm.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
	Yonghong Song <yhs@fb.com>
Subject: Re: [RFC PATCH 1/2] block: add support for redirecting IO completion through eBPF
Date: Wed, 16 Oct 2019 09:05:26 +0200	[thread overview]
Message-ID: <113e46d4-2a90-a694-8a24-7a6a3c019e88@suse.de> (raw)
In-Reply-To: <CAADnVQ+UJK41VL-epYGxrRzqL_UsC+X=J8EXEn2i8P+TPGA_jg@mail.gmail.com>

On 10/15/19 11:04 PM, Alexei Starovoitov wrote:
> On Mon, Oct 14, 2019 at 5:21 AM Hou Tao <houtao1@huawei.com> wrote:
>>
>> For network stack, RPS, namely Receive Packet Steering, is used to
>> distribute network protocol processing from hardware-interrupted CPU
>> to specific CPUs and alleviating soft-irq load of the interrupted CPU.
>>
>> For block layer, soft-irq (for single queue device) or hard-irq
>> (for multiple queue device) is used to handle IO completion, so
>> RPS will be useful when the soft-irq load or the hard-irq load
>> of a specific CPU is too high, or a specific CPU set is required
>> to handle IO completion.
>>
>> Instead of setting the CPU set used for handling IO completion
>> through sysfs or procfs, we can attach an eBPF program to the
>> request-queue, provide some useful info (e.g., the CPU
>> which submits the request) to the program, and let the program
>> decides the proper CPU for IO completion handling.
>>
>> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ...
>>
>> +       rcu_read_lock();
>> +       prog = rcu_dereference_protected(q->prog, 1);
>> +       if (prog)
>> +               bpf_ccpu = BPF_PROG_RUN(q->prog, NULL);
>> +       rcu_read_unlock();
>> +
>>         cpu = get_cpu();
>> -       if (!test_bit(QUEUE_FLAG_SAME_FORCE, &q->queue_flags))
>> -               shared = cpus_share_cache(cpu, ctx->cpu);
>> +       if (bpf_ccpu < 0 || !cpu_online(bpf_ccpu)) {
>> +               ccpu = ctx->cpu;
>> +               if (!test_bit(QUEUE_FLAG_SAME_FORCE, &q->queue_flags))
>> +                       shared = cpus_share_cache(cpu, ctx->cpu);
>> +       } else
>> +               ccpu = bpf_ccpu;
>>
>> -       if (cpu != ctx->cpu && !shared && cpu_online(ctx->cpu)) {
>> +       if (cpu != ccpu && !shared && cpu_online(ccpu)) {
>>                 rq->csd.func = __blk_mq_complete_request_remote;
>>                 rq->csd.info = rq;
>>                 rq->csd.flags = 0;
>> -               smp_call_function_single_async(ctx->cpu, &rq->csd);
>> +               smp_call_function_single_async(ccpu, &rq->csd);
> 
> Interesting idea.
> Not sure whether such programability makes sense from
> block layer point of view.
> 
> From bpf side having a program with NULL input context is
> a bit odd. We never had such things in the past, so this patchset
> won't work as-is.
> Also no-input means that the program choices are quite limited.
> Other than round robin and random I cannot come up with other
> cpu selection ideas.
> I suggest to do writable tracepoint here instead.
> Take a look at trace_nbd_send_request.
> BPF prog can write into 'request'.
> For your use case it will be able to write into 'bpf_ccpu' local variable.
> If you keep it as raw tracepoint and don't add the actual tracepoint
> with TP_STRUCT__entry and TP_fast_assign then it won't be abi
> and you can change it later or remove it altogether.
> 
That basically was my idea, too.

Actually I was coming from a different angle, namely trying to figure
out how we could do generic error injection in the block layer.
eBPF would be one way of doing it, kprobes another.

But writable trace events ... I'll have to check if we can leverage that
here, too.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      Teamlead Storage & Networking
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 247165 (AG München), GF: Felix Imendörffer

  reply	other threads:[~2019-10-16  7:05 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-14 12:28 [RFC PATCH 0/2] block: use eBPF to redirect IO completion Hou Tao
2019-10-14 12:28 ` [RFC PATCH 1/2] block: add support for redirecting IO completion through eBPF Hou Tao
2019-10-15 21:04   ` Alexei Starovoitov
2019-10-16  7:05     ` Hannes Reinecke [this message]
2019-10-21 13:42     ` Hou Tao
2019-10-21 13:48       ` Bart Van Assche
2019-10-21 14:45         ` Jens Axboe
2019-10-14 12:28 ` [RFC PATCH 2/2] selftests/bpf: add test program for redirecting IO completion CPU Hou Tao
2019-10-15  1:20 ` [RFC PATCH 0/2] block: use eBPF to redirect IO completion Bob Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=113e46d4-2a90-a694-8a24-7a6a3c019e88@suse.de \
    --to=hare@suse.de \
    --cc=alexei.starovoitov@gmail.com \
    --cc=ast@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=bpf@vger.kernel.org \
    --cc=bvanassche@acm.org \
    --cc=damien.lemoal@wdc.com \
    --cc=daniel@iogearbox.net \
    --cc=hare@suse.com \
    --cc=houtao1@huawei.com \
    --cc=kafai@fb.com \
    --cc=linux-block@vger.kernel.org \
    --cc=ming.lei@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=osandov@fb.com \
    --cc=songliubraving@fb.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).