fuse uring / wake_up on the same core

* fuse uring / wake_up on the same core
@ 2023-03-24 19:50 Bernd Schubert
  2023-03-24 22:44 ` Bernd Schubert
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Bernd Schubert @ 2023-03-24 19:50 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra, Miklos Szeredi
  Cc: Dharmendra Singh, linux-fsdevel, Amir Goldstein, linux-kernel

Ingo, Peter,

I would like to ask how to wake up from a waitq on the same core. I have 
tried __wake_up_sync()/WF_SYNC, but I do not see any effect.

I'm currently working on fuse/uring communication patches, besides uring 
communication there is also a queue per core. Basic bonnie++ benchmarks 
with a zero file size to just create/read(0)/delete show a ~3x IOPs 
difference between CPU bound bonnie++ and unbound - i.e. with these 
patches it _not_ fuse-daemon that needs to be bound, but the application 
doing IO to the file system. We basically have

bonnie -> vfs                                (app/vfs)
   fuse_req                                   (app/fuse.ko)
   qid = task_cpu(current)                  (app/fuse.ko)
     ring(qid) / SQE completion (fuse.ko)   (app/fuse.ko/uring)
       wait_event(req->waitq, ...)          (app/fuse.ko)
       [app wait]
         daemon ring / handle CQE           (daemon)
           send-back result as SQE          (daemon/uring)
             fuse_request_end               (daemon/uring/fuse.ko)
           wake_up()  ---> random core      (daemon/uring/fuse.ko)
       [app wakeup/fuse/vfs/syscall return]
bonnie ==> different core

1) bound

[root@imesrv1 ~]# numactl --localalloc --physcpubind=0 bonnie++ -q -x 1 
-s0  -d /scratch/dest/ -n 20:1:1:20 -r 0 -u 0 | bon_csv2txt
                     ------Sequential Create------ --------Random 
Create--------
                     -Create-- --Read--- -Delete-- -Create-- --Read--- 
-Delete--
       files:max:min  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP 
/sec %CP
imesrv1   20:1:1:20  6229  28 11289  41 12785  24  6615  28  7769  40 
10020  25
Latency               411us     824us     816us     298us   10473us 
200ms

2) not bound

[root@imesrv1 ~]# bonnie++ -q -x 1 -s0  -d /scratch/dest/ -n 20:1:1:20 
-r 0 -u 0 | bon_csv2txt
                     ------Sequential Create------ --------Random 
Create--------
                     -Create-- --Read--- -Delete-- -Create-- --Read--- 
-Delete--
       files:max:min  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP 
/sec %CP
imesrv1   20:1:1:20  2064  33  2923  43  4556  28  2061  33  2186  42 
4245  30
Latency               850us    3914us    2496us     738us     758us 
6469us

With less files the difference becomes a bit smaller, but is still very 
visible. Besides cache line bouncing, I'm sure that CPU frequency and 
C-states will matter - I could tune that it in the lab, but in the end I 
want to test what users do (I had recently checked with large HPC center 
- Forschungszentrum Juelich - their HPC compute nodes are not tuned up, 
to save energy).
Also, in order to really tune down latencies, I want want to add a 
struct file_operations::uring_cmd_iopoll thread, which will spin for a 
short time and avoid most of kernel/userspace communication. If 
applications (with n-nthreads < n-cores) then get scheduled on different 
core differnent rings will be used, result in
n-threads-spinning > n-threads-application

There was already a related thread about fuse before

https://lore.kernel.org/lkml/1638780405-38026-1-git-send-email-quic_pragalla@quicinc.com/

With the fuse-uring patches that part is basically solved - the waitq 
that that thread is about is not used anymore. But as per above, 
remaining is the waitq of the incoming workq (not mentioned in the 
thread above). As I wrote, I have tried
__wake_up_sync((x), TASK_NORMAL), but it does not make a difference for 
me - similar to Miklos' testing before. I have also tried struct 
completion / swait - does not make a difference either.
I can see task_struct has wake_cpu, but there doesn't seem to be a good 
interface to set it.

Any ideas?

Thanks,
Bernd

^ permalink raw reply	[flat|nested] 14+ messages in thread