From: Jens Axboe <axboe@kernel.dk>
To: Stefan Roesch <shr@devkernel.io>, kernel-team@fb.com
Cc: olivier@trillion01.com, netdev@vger.kernel.org,
	io-uring@vger.kernel.org, kuba@kernel.org
Subject: Re: [PATCH v5 1/3] io_uring: add napi busy polling support
Date: Mon, 21 Nov 2022 16:59:26 -0700
Message-ID: <74feda24-37fd-11ea-af0e-1eff9ed4941e@kernel.dk>
In-Reply-To: <067a22bc-72ba-9035-05da-93c43ce356f2@kernel.dk>

On 11/21/22 12:45 PM, Jens Axboe wrote:
> On 11/21/22 12:14 PM, Stefan Roesch wrote:
>> +/*
>> + * io_napi_add() - Add napi id to the busy poll list
>> + * @file: file pointer for socket
>> + * @ctx:  io-uring context
>> + *
>> + * Add the napi id of the socket to the napi busy poll list.
>> + */
>> +void io_napi_add(struct file *file, struct io_ring_ctx *ctx)
>> +{
>> +	unsigned int napi_id;
>> +	struct socket *sock;
>> +	struct sock *sk;
>> +	struct io_napi_entry *ne;
>> +
>> +	if (!io_napi_busy_loop_on(ctx))
>> +		return;
>> +
>> +	sock = sock_from_file(file);
>> +	if (!sock)
>> +		return;
>> +
>> +	sk = sock->sk;
>> +	if (!sk)
>> +		return;
>> +
>> +	napi_id = READ_ONCE(sk->sk_napi_id);
>> +
>> +	/* Non-NAPI IDs can be rejected */
>> +	if (napi_id < MIN_NAPI_ID)
>> +		return;
>> +
>> +	spin_lock(&ctx->napi_lock);
>> +	list_for_each_entry(ne, &ctx->napi_list, list) {
>> +		if (ne->napi_id == napi_id) {
>> +			ne->timeout = jiffies + NAPI_TIMEOUT;
>> +			goto out;
>> +		}
>> +	}
>> +
>> +	ne = kmalloc(sizeof(*ne), GFP_NOWAIT);
>> +	if (!ne)
>> +		goto out;
>> +
>> +	ne->napi_id = napi_id;
>> +	ne->timeout = jiffies + NAPI_TIMEOUT;
>> +	list_add_tail(&ne->list, &ctx->napi_list);
>> +
>> +out:
>> +	spin_unlock(&ctx->napi_lock);
>> +}
> 
> I think this all looks good now, just one minor comment on the above. Is
> the expectation here that we'll basically always add to the napi list?
> If so, then I think allocating 'ne' outside the spinlock would be a lot
> saner, and then just kfree() it for the unlikely case where we find a
> duplicate.
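
For illustration, the allocate-outside-the-lock pattern suggested above
could look roughly like the userspace sketch below. This is not the
patch's code: struct entry, struct ctx, ctx_lock(), ctx_unlock() and
add_id() are hypothetical names, a C11 atomic_flag stands in for the
kernel spinlock, malloc()/free() stand in for kmalloc()/kfree(), and new
entries go on the head of a singly linked list for brevity where the
patch uses list_add_tail().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins, not the kernel structures. */
struct entry {
	struct entry *next;
	unsigned int id;
	unsigned long timeout;
};

struct ctx {
	atomic_flag lock;	/* stands in for ctx->napi_lock */
	struct entry *head;	/* stands in for ctx->napi_list */
};

static void ctx_lock(struct ctx *c)
{
	while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
		;	/* spin until the flag is released */
}

static void ctx_unlock(struct ctx *c)
{
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/* Returns true if a new entry was inserted, false otherwise. */
static bool add_id(struct ctx *c, unsigned int id, unsigned long timeout)
{
	struct entry *e, *fresh;
	bool inserted = false;

	assert(c);

	/* Allocate before taking the lock, assuming insertion is the common case. */
	fresh = malloc(sizeof(*fresh));

	ctx_lock(c);
	for (e = c->head; e; e = e->next) {
		if (e->id == id) {
			/* Duplicate: only refresh the timeout. */
			e->timeout = timeout;
			goto out;
		}
	}

	if (fresh) {
		fresh->id = id;
		fresh->timeout = timeout;
		fresh->next = c->head;
		c->head = fresh;
		fresh = NULL;	/* ownership moved to the list */
		inserted = true;
	}
out:
	ctx_unlock(c);
	free(fresh);	/* unused allocation on the duplicate path; free(NULL) is a no-op */
	return inserted;
}
```

The lock is then held only for the list walk and the insert. The cost is
one wasted allocation whenever a duplicate is found.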

After thinking about this a bit more, I don't think this is done in the
best way. If the list is longer than a few entries, this check (or
check-alloc-insert) is pretty expensive, and it'll add substantial
overhead to the poll path for sockets if napi is enabled.

I think we should do something a la:

1) When arming poll AND napi has been enabled for the ring, then
    allocate an io_napi_entry up front and store it in ->async_data.

2) Maintain the state in the io_napi_entry. If we're on the list,
    that can be checked with just list_empty(), for example. If not
    on the list, assign timeout and add.

3) Have regular request cleanup free it.

This could be combined with an alloc cache, I would not do that for the
first iteration though.

This would make io_napi_add() cheap - no more list iteration, and no
more allocations. And that is arguably the most important part, as that
is called every time the poll is woken up. That makes a big difference,
particularly for multishot.

It's also designed much better imho, moving the more expensive bits to
the setup side.
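
To make the three steps above concrete, here is a rough userspace sketch.
It is only a model under stated assumptions, not kernel code: struct
list_head and its helpers are minimal reimplementations of the kernel
list API, and struct napi_entry, struct ring_ctx, struct poll_req,
napi_prep(), napi_add() and napi_cleanup() are hypothetical names.
Locking is left out, and the timeout is refreshed on every wakeup, which
is one possible reading of step 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	INIT_LIST_HEAD(n);
}

/* Hypothetical stand-ins for io_napi_entry, io_ring_ctx and a poll request. */
struct napi_entry {
	struct list_head list;
	unsigned int napi_id;
	unsigned long timeout;
};

struct ring_ctx {
	struct list_head napi_list;
};

struct poll_req {
	struct napi_entry *ne;	/* plays the role of ->async_data */
};

/* Step 1: at poll arm time, allocate the entry up front. */
static int napi_prep(struct poll_req *req, unsigned int napi_id)
{
	struct napi_entry *ne = malloc(sizeof(*ne));

	if (!ne)
		return -1;
	INIT_LIST_HEAD(&ne->list);	/* self-linked means "not on the list" */
	ne->napi_id = napi_id;
	ne->timeout = 0;
	req->ne = ne;
	return 0;
}

/* Step 2: on wakeup, no list walk and no allocation. */
static void napi_add(struct ring_ctx *ctx, struct poll_req *req,
		     unsigned long now, unsigned long ttl)
{
	struct napi_entry *ne = req->ne;

	assert(ne);
	ne->timeout = now + ttl;
	if (list_empty(&ne->list))
		list_add_tail(&ne->list, &ctx->napi_list);
}

/* Step 3: regular request cleanup unlinks and frees the entry. */
static void napi_cleanup(struct poll_req *req)
{
	if (!req->ne)
		return;
	if (!list_empty(&req->ne->list))
		list_del_init(&req->ne->list);
	free(req->ne);
	req->ne = NULL;
}
```

Note that with one entry per request, two requests polling sockets with
the same napi id would each put an entry on the list, so the busy poll
side would need to tolerate duplicates or dedup in some other way.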

-- 
Jens Axboe
