All of lore.kernel.org
 help / color / mirror / Atom feed
From: Stefan Metzmacher <metze@samba.org>
To: Jens Axboe <axboe@kernel.dk>,
	Samba Technical <samba-technical@lists.samba.org>,
	io-uring <io-uring@vger.kernel.org>
Subject: Re: Samba with multichannel and io_uring
Date: Fri, 16 Oct 2020 18:03:47 +0200	[thread overview]
Message-ID: <892e855a-9c4f-ea5b-6728-f02df271c2c8@samba.org> (raw)
In-Reply-To: <18e153db-5ee9-f286-58ae-30065feda737@kernel.dk>


[-- Attachment #1.1: Type: text/plain, Size: 3154 bytes --]

Am 16.10.20 um 17:57 schrieb Jens Axboe:
> On 10/16/20 5:49 AM, Stefan Metzmacher wrote:
>> Hi Jens,
>>
>>> Thanks for sending this, very interesting! As per this email, I took a
>>> look at the NUMA bindings. If you can, please try this one-liner below.
>>> I'd be interested to know if that removes the fluctuations you're seeing
>>> due to bad locality.
>>>
>>> Looks like kthread_create_on_node() doesn't actually do anything (at
>>> least in terms of binding).
>>>
>>>
>>> diff --git a/fs/io-wq.c b/fs/io-wq.c
>>> index 74b84e8562fb..7bebb198b3df 100644
>>> --- a/fs/io-wq.c
>>> +++ b/fs/io-wq.c
>>> @@ -676,6 +676,7 @@ static bool create_io_worker(struct io_wq *wq, struct io_wqe *wqe, int index)
>>>  		kfree(worker);
>>>  		return false;
>>>  	}
>>> +	kthread_bind_mask(worker->task, cpumask_of_node(wqe->node));
>>>  
>>>  	raw_spin_lock_irq(&wqe->lock);
>>>  	hlist_nulls_add_head_rcu(&worker->nulls_node, &wqe->free_list);
>>>
>>
>> I no longer have access to that system, but I guess it will help, thanks!
> 
> I queued up it when I sent it out, and it'll go into stable as well.
> I since verified on NUMA here that it does the right thing, and that
> things weren't affinitized properly before. So pretty confident that it
> will indeed solve this issue!

Great thanks!

>> With this:
>>
>>         worker->task = kthread_create_on_node(io_wqe_worker, worker, wqe->node,
>>                                 "io_wqe_worker-%d/%d", index, wqe->node);
>>
>> I see only "io_wqe_worker-0" and "io_wqe_worker-1" in top, without '/0' or '/1' at the end,
>> this is because set_task_comm() truncates to 15 characters.
>>
>> As developer I think 'io_wqe' is really confusing, just from reading I thought it
>> means "work queue entry", but it's a per numa node worker pool container...
>> 'struct io_wq_node *wqn' would be easier to understand for me...
>>
>> Would it make sense to give each io_wq a unique identifier and use names like this:
>> (fdinfo of the io_uring fd could also include the io_wq id)
>>
>>  "io_wq-%u-%u%c", wq->id, wqn->node, index == IO_WQ_ACCT_BOUND ? 'B' : 'U')
>>
>>  io_wq-500-M
>>  io_wq-500-0B
>>  io_wq-500-0B
>>  io_wq-500-1B
>>  io_wq-500-0U
>>  io_wq-200-M
>>  io_wq-200-0B
>>  io_wq-200-0B
>>  io_wq-200-1B
>>  io_wq-200-0U
>>
>> I'm not sure how this interacts with workers moving between bound and unbound
>> and maybe a worker id might also be useful (or we rely on their pid)
> 
> I don't think that's too important, as it's just a snapshot in time. So
> it'll fluctuate based on the role of the worker.
> 
>> I just found that proc_task_name() handles PF_WQ_WORKER special
>> and cat /proc/$pid/comm can expose something like:
>>   kworker/u17:2-btrfs-worker-high
> 
> Yep, that's how they do fancier names. It's been on my agenda for a while
> to do something about this, I'll try and cook something up for 5.11.

With a function like wq_worker_comm being called by proc_task_name(),
you would capture current IO_WORKER_F_BOUND state and alter the name.

Please CC me on your patches in that direction.

Thanks!
metze




[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2020-10-16 16:03 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-15  9:58 Samba with multichannel and io_uring Stefan Metzmacher
2020-10-15 10:06 ` Ralph Boehme
2020-10-15 15:45 ` Jeremy Allison
2020-10-15 16:11 ` Jens Axboe
2020-10-16 11:49   ` Stefan Metzmacher
2020-10-16 12:28     ` Stefan Metzmacher
2020-10-16 12:40       ` Stefan Metzmacher
2020-10-16 18:56         ` Jens Axboe
2020-10-16 15:57     ` Jens Axboe
2020-10-16 16:03       ` Stefan Metzmacher [this message]
2020-10-16 16:06         ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=892e855a-9c4f-ea5b-6728-f02df271c2c8@samba.org \
    --to=metze@samba.org \
    --cc=axboe@kernel.dk \
    --cc=io-uring@vger.kernel.org \
    --cc=samba-technical@lists.samba.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.