linux-block.vger.kernel.org archive mirror
From: Hou Tao <houtao1@huawei.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: <linux-block@vger.kernel.org>, bpf <bpf@vger.kernel.org>,
	"Network Development" <netdev@vger.kernel.org>,
	Jens Axboe <axboe@kernel.dk>,
	"Alexei Starovoitov" <ast@kernel.org>, <hare@suse.com>,
	<osandov@fb.com>, <ming.lei@redhat.com>, <damien.lemoal@wdc.com>,
	bvanassche <bvanassche@acm.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	"Martin KaFai Lau" <kafai@fb.com>,
	Song Liu <songliubraving@fb.com>, Yonghong Song <yhs@fb.com>
Subject: Re: [RFC PATCH 1/2] block: add support for redirecting IO completion through eBPF
Date: Mon, 21 Oct 2019 21:42:03 +0800
Message-ID: <84032c64-8e5e-6ad1-63ea-57adee7a2875@huawei.com>
In-Reply-To: <CAADnVQ+UJK41VL-epYGxrRzqL_UsC+X=J8EXEn2i8P+TPGA_jg@mail.gmail.com>

Hi,

On 2019/10/16 5:04, Alexei Starovoitov wrote:
> On Mon, Oct 14, 2019 at 5:21 AM Hou Tao <houtao1@huawei.com> wrote:
>>
>> For the network stack, RPS, namely Receive Packet Steering, is used to
>> distribute network protocol processing from the hardware-interrupted CPU
>> to specific CPUs, alleviating the soft-irq load of the interrupted CPU.
>>
>> For block layer, soft-irq (for single queue device) or hard-irq
>> (for multiple queue device) is used to handle IO completion, so
>> RPS will be useful when the soft-irq load or the hard-irq load
>> of a specific CPU is too high, or a specific CPU set is required
>> to handle IO completion.
>>
>> Instead of setting the CPU set used for handling IO completion
>> through sysfs or procfs, we can attach an eBPF program to the
>> request-queue, provide some useful info (e.g., the CPU
>> which submits the request) to the program, and let the program
>> decide the proper CPU for IO completion handling.
>>
>> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ...
>>
>> +       rcu_read_lock();
>> +       prog = rcu_dereference_protected(q->prog, 1);
>> +       if (prog)
>> +               bpf_ccpu = BPF_PROG_RUN(q->prog, NULL);
>> +       rcu_read_unlock();
>> +
>>         cpu = get_cpu();
>> -       if (!test_bit(QUEUE_FLAG_SAME_FORCE, &q->queue_flags))
>> -               shared = cpus_share_cache(cpu, ctx->cpu);
>> +       if (bpf_ccpu < 0 || !cpu_online(bpf_ccpu)) {
>> +               ccpu = ctx->cpu;
>> +               if (!test_bit(QUEUE_FLAG_SAME_FORCE, &q->queue_flags))
>> +                       shared = cpus_share_cache(cpu, ctx->cpu);
>> +       } else
>> +               ccpu = bpf_ccpu;
>>
>> -       if (cpu != ctx->cpu && !shared && cpu_online(ctx->cpu)) {
>> +       if (cpu != ccpu && !shared && cpu_online(ccpu)) {
>>                 rq->csd.func = __blk_mq_complete_request_remote;
>>                 rq->csd.info = rq;
>>                 rq->csd.flags = 0;
>> -               smp_call_function_single_async(ctx->cpu, &rq->csd);
>> +               smp_call_function_single_async(ccpu, &rq->csd);
> 
> Interesting idea.
> Not sure whether such programability makes sense from
> block layer point of view.
> 
> From bpf side having a program with NULL input context is
> a bit odd. We never had such things in the past, so this patchset
> won't work as-is.
No, it just works.
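
For reference, the CPU selection in the hunk quoted above can be modelled in user space roughly as follows. This is only a sketch, not kernel code: `struct sim_state`, `sim_cpu_online()`, `pick_completion_cpu()`, `NR_SIM_CPUS` and the `llc_id` table are hypothetical stand-ins for `cpu_online()`, `cpus_share_cache()` and the `QUEUE_FLAG_SAME_FORCE` test, and the range check in `sim_cpu_online()` is an addition of the model. A negative `bpf_ccpu` is assumed to mean that no program is attached or that the program has no preference.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_SIM_CPUS 8	/* hypothetical size of the simulated machine */

/* Hypothetical stand-in for the kernel state the hunk consults. */
struct sim_state {
	bool online[NR_SIM_CPUS];	/* models cpu_online() */
	int llc_id[NR_SIM_CPUS];	/* equal ids model cpus_share_cache() */
	bool same_force;		/* models QUEUE_FLAG_SAME_FORCE */
};

static bool sim_cpu_online(const struct sim_state *s, int cpu)
{
	/* The range check is part of the model only. */
	return cpu >= 0 && cpu < NR_SIM_CPUS && s->online[cpu];
}

/*
 * Return the CPU that ends up handling the completion.
 * cur_cpu:    CPU that took the completion interrupt (get_cpu())
 * submit_cpu: CPU that submitted the request (ctx->cpu)
 * bpf_ccpu:   value returned by the program, negative if none
 */
static int pick_completion_cpu(const struct sim_state *s, int cur_cpu,
			       int submit_cpu, int bpf_ccpu)
{
	bool shared = false;
	int ccpu;

	assert(cur_cpu >= 0 && cur_cpu < NR_SIM_CPUS);
	assert(submit_cpu >= 0 && submit_cpu < NR_SIM_CPUS);

	if (bpf_ccpu < 0 || !sim_cpu_online(s, bpf_ccpu)) {
		/* Fall back to the pre-patch behaviour. */
		ccpu = submit_cpu;
		if (!s->same_force)
			shared = s->llc_id[cur_cpu] == s->llc_id[submit_cpu];
	} else {
		/* The program's choice wins; cache sharing is not checked. */
		ccpu = bpf_ccpu;
	}

	if (cur_cpu != ccpu && !shared && sim_cpu_online(s, ccpu))
		return ccpu;	/* completed remotely via IPI */
	return cur_cpu;		/* completed locally */
}
```

In this model an offline or negative `bpf_ccpu` silently falls back to the submitting CPU, which matches the `bpf_ccpu < 0 || !cpu_online(bpf_ccpu)` test in the hunk.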

> Also no-input means that the program choices are quite limited.
> Other than round robin and random I cannot come up with other
> cpu selection idea.
> I suggest to do writable tracepoint here instead.
> Take a look at trace_nbd_send_request.
> BPF prog can write into 'request'.
> For your use case it will be able to write into 'bpf_ccpu' local variable.
> If you keep it as raw tracepoint and don't add the actual tracepoint
> with TP_STRUCT__entry and TP_fast_assign then it won't be abi
> and you can change it later or remove it altogether.
> 
Your suggestion is much simpler: there is no need to add a new program
type. All that needs to be done is to add a raw tracepoint, move bpf_ccpu
into struct request, and let a BPF program modify it.
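
The shape of that plan, as a user-space model only: the hook receives the request and may write the completion CPU into it, so the policy can look at per-request information such as the submitting CPU. Everything here (`struct sim_request`, `sim_attach()`, `sim_run_hook()`, `sim_policy_pair()`) is a hypothetical name for illustration and is not the raw tracepoint or BPF API; the reset of `bpf_ccpu` to -1 before the hook runs is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two fields of struct request that matter. */
struct sim_request {
	int submit_cpu;	/* CPU that submitted the request */
	int bpf_ccpu;	/* completion CPU chosen by the hook, -1 if none */
};

/* Models a program attached to the writable hook. */
typedef void (*sim_complete_hook)(struct sim_request *rq);

static sim_complete_hook sim_hook;	/* NULL while nothing is attached */

static void sim_attach(sim_complete_hook hook)
{
	sim_hook = hook;
}

/* Run the attached hook, if any, and return the CPU it picked. */
static int sim_run_hook(struct sim_request *rq)
{
	assert(rq != NULL);

	rq->bpf_ccpu = -1;	/* assumed default: no preference */
	if (sim_hook)
		sim_hook(rq);
	return rq->bpf_ccpu;
}

/* Example policy: complete on the neighbour of the submitting CPU. */
static void sim_policy_pair(struct sim_request *rq)
{
	rq->bpf_ccpu = rq->submit_cpu ^ 1;
}
```

With nothing attached, `sim_run_hook()` returns -1 and the caller would fall back to the existing selection. After `sim_attach(sim_policy_pair)`, a request submitted on CPU 2 gets completion CPU 3.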

I will give it a try. Thanks for your suggestions.

Regards,
Tao



Thread overview: 9+ messages
2019-10-14 12:28 [RFC PATCH 0/2] block: use eBPF to redirect IO completion Hou Tao
2019-10-14 12:28 ` [RFC PATCH 1/2] block: add support for redirecting IO completion through eBPF Hou Tao
2019-10-15 21:04   ` Alexei Starovoitov
2019-10-16  7:05     ` Hannes Reinecke
2019-10-21 13:42     ` Hou Tao [this message]
2019-10-21 13:48       ` Bart Van Assche
2019-10-21 14:45         ` Jens Axboe
2019-10-14 12:28 ` [RFC PATCH 2/2] selftests/bpf: add test program for redirecting IO completion CPU Hou Tao
2019-10-15  1:20 ` [RFC PATCH 0/2] block: use eBPF to redirect IO completion Bob Liu
