hi,

This issue was found when I tested IORING_FEAT_FAST_POLL feature, with the newest
upstream codes, indeed I find that io_uring's performace improvement is not obvious
compared to epoll in my test environment, most of the time they are similar.
Test cases basically comes from:
   https://github.com/frevib/io_uring-echo-server/blob/io-uring-feat-fast-poll/benchmarks/benchmarks.md.
In above url, the author's test results shows that io_uring will get a big performace
improvement compared to epoll. I'm still looking into why I don't get the big improvement,
currently don't know why, but I find some obvious regression issue.

I wrote a simple tool based io_uring nop operation to evaluate io_uring framework in
v5.1 and 5.7.0-rc4+(jens's io_uring-5.7 branch), I see a obvious performace regression:
v5.1 kernel:
     $sudo taskset -c 60 ./io_uring_nop_stress -r 300 # run 300 seconds
     total ios: 1832524960
     IOPS:      6108416
5.7.0-rc4+
     $sudo taskset -c 60 ./io_uring_nop_stress -r 300
     total ios: 1597672304
     IOPS:      5325574
it's about 12% performance regression.

Using perf can see many performance bottlenecks, for example, io_submit_sqes is one.
For now, I did't make many analysis yet, just have a look at io_submit_sqes(), there
are many assignment operations in io_init_req(), but I'm not sure whether they are all
needed when req is not needed to be punt to io-wq, for example,
     INIT_IO_WORK(&req->work, io_wq_submit_work); # a whole struct assignment
from perf annotate tool, it's an expensive operation, I think reqs that use fast poll
feature use task-work function, so the INIT_IO_WORK maybe not necessary.

Above is just one issue, what I worry is that whether io_uring is becoming more bloated
gradually, and will not that better to aio. In https://kernel.dk/io_uring.pdf, it says
that io_uring will eliminate 104 bytes copy compared to aio, but see currenct
io_init_req(), io_uring maybe copy more, introducing more overhead? Or does we need to
carefully re-design struct io_kiocb, to reduce overhead as soon as possible.


Regards,
Xiaoguang Wang