Hello,

during our research on combining io_uring with parallel runtime systems, one of our test cases runs into situations where an `IORING_OP_ASYNC_CANCEL` request either cannot find (-ENOENT) or cannot cancel (-EALREADY) a previously submitted read of an event file descriptor. However, the previously submitted read also never generates a CQE.

We now wonder whether this is a bug in the kernel or, at least in the case of -EALREADY, works as intended. Our current architecture expects that a request eventually creates a CQE when canceled.

# Reproducer pseudo-code:

    create N eventfds
    create N threads

    thread_function:
        create thread-private io_uring queue pair
        for (i = 0; i < ITERATIONS; i++)
            submit read from eventfd n
            submit read from eventfd (n + 1) % N
            submit write to eventfd (n + 2) % N
            await completions until the write completion was reaped
            submit cancel requests for the two read requests
            await all outstanding requests (minus a possible already completed read request)

Note that:

- Each eventfd is read twice but only written once.
- The read requests are canceled independently of their state.
- There are five io_uring requests per loop iteration.

# Expectation

Each of the five submitted requests should be completed:

* The write is always successful, because writing to an eventfd only blocks if the addition would push the counter past 0xfffffffffffffffe, and we add only 1 in each iteration. Furthermore, a read from the file descriptor resets the counter to 0.
* The cancel requests are always completed, with return values that depend on the state of the read request to cancel.
* The read requests should always be completed, either because some data is available to read or because they are canceled.

# Observation:

Sometimes threads block in io_uring_enter forever because one read request is never completed and the cancel of that read returned -ENOENT or -EALREADY.

A C program to reproduce this situation is attached.
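The eventfd behavior that the expectations above rely on can be checked in isolation, without io_uring. The following is only a minimal sketch under our own assumptions: the function name `check_eventfd_assumptions` is made up for illustration, and `EFD_NONBLOCK` is used solely so that a read on an empty counter fails with EAGAIN instead of blocking (the reproducer itself issues its reads through io_uring).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper, not part of the reproducer.
 * Returns 0 if the eventfd behaves as the expectation section assumes,
 * -1 otherwise. */
int check_eventfd_assumptions(void)
{
    /* EFD_NONBLOCK: illustration only, so an empty read returns EAGAIN. */
    int fd = eventfd(0, EFD_NONBLOCK);
    if (fd < 0)
        return -1;

    uint64_t one = 1, value = 0;
    int ok = 1;

    /* Adding 1 succeeds and reports the 8 bytes written. */
    ok &= write(fd, &one, sizeof(one)) == 8;
    ok &= write(fd, &one, sizeof(one)) == 8;

    /* A single read returns the accumulated count and resets it to 0. */
    ok &= read(fd, &value, sizeof(value)) == 8 && value == 2;

    /* The counter is 0 again: a second read has nothing to consume. */
    errno = 0;
    ok &= read(fd, &value, sizeof(value)) == -1 && errno == EAGAIN;

    close(fd);
    return ok ? 0 : -1;
}
```

Calling `check_eventfd_assumptions()` from a small `main` and checking for 0 is enough to run it. The last step mirrors why the cancel matters in our test: with one write and two reads per eventfd, one of the reads can only ever finish by being canceled. This covers only the eventfd side of the expectation; the io_uring part is exercised by the attached reproducer.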
The attached program contains the essence of the previously mentioned test case, along with instructions on how to compile and execute it.

The following log excerpt was generated using a version of the reproducer where each write adds 0 to the eventfd count and thus never completes a read request. This means all read requests should be canceled, and all cancel requests should return either 0 (the request was found and canceled) or -EALREADY (the read is already in execution and should be interrupted).

    0 Prepared read request (evfd: 0, tag: 1)
    0 Submitted 1 requests -> 1 inflight
    0 Prepared read request (evfd: 1, tag: 2)
    0 Submitted 1 requests -> 2 inflight
    0 Prepared write request (evfd: 2)
    0 Submitted 1 requests -> 3 inflight
    0 Collect write completion: 8
    0 Prepared cancel request for 1
    0 Prepared cancel request for 2
    0 Submitted 2 requests -> 4 inflight
    0 Collect read 1 completion: -125 - Operation canceled
    0 Collect cancel read 1 completion: 0
    0 Collect cancel read 2 completion: -2 - No such file or directory

Thread 0 blocks forever because the second read could not be canceled (-ENOENT in the last line), yet no completion is ever created for it.

The far more common situation, with the reproducer adding 1 to the eventfds in each loop iteration, is that a request is not canceled and the cancel attempt returns -EALREADY. There is no progress because the writer has already finished its loop and the cancel apparently does not really cancel the request.
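The cancel outcomes discussed so far can be told apart by the res field of the cancel CQE. A minimal sketch, assuming the meanings given in io_uring_enter(2); the enum and the functions `classify_cancel_res` and `format_res` are hypothetical helpers written for this mail, not part of liburing or of the attached reproducer:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Meaning of the res field of an IORING_OP_ASYNC_CANCEL CQE,
 * as we read the io_uring_enter(2) man page. Names are hypothetical. */
enum cancel_outcome {
    CANCEL_DONE,      /* 0: target request found and canceled */
    CANCEL_NOT_FOUND, /* -ENOENT: target request not found */
    CANCEL_MAYBE,     /* -EALREADY: target may or may not terminate */
    CANCEL_OTHER      /* anything else, e.g. another error */
};

enum cancel_outcome classify_cancel_res(int res)
{
    switch (res) {
    case 0:
        return CANCEL_DONE;
    case -ENOENT:
        return CANCEL_NOT_FOUND;
    case -EALREADY:
        return CANCEL_MAYBE;
    default:
        return CANCEL_OTHER;
    }
}

/* Formats res the way the log excerpts show it: the raw value,
 * followed by the errno description when res is negative. */
void format_res(int res, char *buf, size_t len)
{
    if (res < 0)
        snprintf(buf, len, "%d - %s", res, strerror(-res));
    else
        snprintf(buf, len, "%d", res);
}
```

Under the invariant we would like to rely on, the targeted read produces exactly one CQE in each of these cases. In the log that follows, thread 1 ends on the -EALREADY case.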
    1 Starting iteration 996
    1 Prepared read request (evfd: 1, tag: 1)
    1 Submitted 1 requests -> 1 inflight
    1 Prepared read request (evfd: 2, tag: 2)
    1 Submitted 1 requests -> 2 inflight
    1 Prepared write request (evfd: 0)
    1 Submitted 1 requests -> 3 inflight
    1 Collect write completion: 8
    1 Prepared cancel request for read 1
    1 Prepared cancel request for read 2
    1 Submitted 2 requests -> 4 inflight
    1 Collect read 1 completion: -125 - Operation canceled
    1 Collect cancel read 1 completion: 0
    1 Collect cancel read 2 completion: -114 - Operation already in progress

According to the io_uring_enter(2) man page, an IORING_OP_ASYNC_CANCEL that returns -EALREADY apparently does not guarantee that the targeted request terminates. At least that is our interpretation of "…res field will contain -EALREADY. In this case, the request may or may not terminate."

I could reliably reproduce the behavior on different hardware, with Linux versions from 5.9 to 5.16, and with liburing versions 0.7 and 2.1.

With Linux 5.6 I was not able to reproduce this cancel miss.

So is the situation we see intended behavior of the API, or is it a faulty race in the io_uring cancel code? If it is intended, it becomes really hard to build reliable abstractions using io_uring's cancellation. We would really like to have the invariant that a canceled io_uring operation eventually generates a CQE, either completed or canceled/interrupted.

---
Florian Fischer & Florian Schmaus
f.fischer@cs.fau.de
flow@cs.fau.de