From: Florian Fischer <florian.fl.fischer@fau.de>
To: Jens Axboe <axboe@kernel.dk>
Cc: Florian Schmaus <schmaus@cs.fau.de>, io-uring@vger.kernel.org
Subject: Re: Canceled read requests never completed
Date: Wed, 19 Jan 2022 14:42:22 +0100	[thread overview]
Message-ID: <20220119134222.5wx7wrnalhrerspu@pasture> (raw)
In-Reply-To: <21588bc9-592c-7793-c72a-498ba0db4937@kernel.dk>

[-- Attachment #1: Type: text/plain, Size: 2822 bytes --]

On 18.01.2022 19:32, Jens Axboe wrote:
> On 1/18/22 4:36 PM, Jens Axboe wrote:
> > On 1/18/22 1:05 PM, Florian Fischer wrote:
> >>>> After reading the io_uring_enter(2) man page, an IORING_OP_ASYNC_CANCEL's return value of -EALREADY apparently
> >>>> may not cause the request to terminate. At least that is our interpretation of "…res field will contain -EALREADY.
> >>>> In this case, the request may or may not terminate."
> >>>
> >>> I took a look at this, and my theory is that the request cancelation
> >>> ends up happening right in between when the work item is moved between
> >>> the work list and to the worker itself. The way the async queue works,
> >>> the work item is sitting in a list until it gets assigned by a worker.
> >>> When that assignment happens, it's removed from the general work list
> >>> and then assigned to the worker itself. There's a small gap there where
> >>> the work cannot be found in the general list, and isn't yet findable in
> >>> the worker itself either.
> >>>
> >>> Do you always see -ENOENT from the cancel when you get the hang
> >>> condition?
> >>
> >> No, we also, and actually more commonly, observe cancel returning
> >> -EALREADY and the canceled read request never gets completed.
> >>
> >> As shown in the log snippet I included below.
> > 
> > I think there are a couple of different cases here. Can you try the
> > below patch? It's against current -git.
> 
> Cleaned it up and split it into functional bits, end result is here:
> 
> https://git.kernel.dk/cgit/linux-block/log/?h=io_uring-5.17

Thanks. I have built and tested your patches.

The most common error we observed (read -> read -> write -> 2x cancel)
is no longer reproducible and our original test case works flawlessly :)

Nor could I reproduce any hangs with cancel returning -ENOENT.

But I can still reliably reproduce stuck threads when the write does not
increment the evfd count, so the reads are never completed by newly available
data.
(read -> read -> write (do not increment evfd count) -> 2x cancel)

I further reduced the attached C program, which reproduces the above problem.
The code is now also available in our GitLab [1].
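
Condensed from the attached reproducer, one loop iteration roughly looks
like this (tags 1 and 2 identify the two reads for the later cancels):

  prep_evfd_read(&ring, thread_id, &rbuf, 1);                  /* read 1 */
  checked_submit_one(&ring);
  prep_evfd_read(&ring, (thread_id + 1) % nthreads, &rbuf, 2); /* read 2 */
  checked_submit_one(&ring);
  prep_evfd_write(&ring, (thread_id + 2) % nthreads, &wbuf);   /* wbuf == 0: no read completes */
  checked_submit_one(&ring);
  collect_n_completions(&ring, 1);                             /* await the write */
  prep_cancel(&ring, 1);
  prep_cancel(&ring, 2);
  checked_submit_n(&ring, 2);
  collect_n_completions(&ring, 4);                             /* 2 cancels + 2 reads */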

The following log output was created with a less 'minimal' version that still
includes the logging:

  75 Collect write completion: 8
  75 Collect cancel read 1 completion: 0
  75 Collect cancel read 2 completion: -114
  75 Collect read 1 completion: -125
  75 Collect read 2 completion: -4

  75 Collect write completion: 8
  75 Collect cancel read 1 completion: 0
  75 Collect cancel read 2 completion: -114
  75 Collect read 1 completion: -125
  thread 75 stuck here
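
(For reference, the res values above are negative errno values: -114 is
-EALREADY, -125 is -ECANCELED and -4 is -EINTR; 8 is the number of bytes
written to the eventfd and 0 means the cancel found and canceled its target.
Below is a minimal sketch of how such a line could be printed; this is a
hypothetical helper, not the exact logging code of the non-minimal version,
and it additionally assumes <stdio.h> and <string.h> are included.)

  static void log_completion(size_t tid, const char* what, int res) {
  	if (res < 0)
  		printf("%zu Collect %s completion: %d (%s)\n", tid, what, res, strerror(-res));
  	else
  		printf("%zu Collect %s completion: %d\n", tid, what, res);
  }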

The scenario seems extremely artificial, but nonetheless I think it should
work regardless of its usefulness.

Flo Fischer

[1]: https://gitlab.cs.fau.de/aj46ezos/io_uring-cancel/-/tree/minimal-write0

[-- Attachment #2: io_uring-cancel.c --]
[-- Type: text/plain, Size: 4862 bytes --]

/**
 * io_uring cancel miss reproducer
 * Copyright 2022 Florian Fischer <f.fischer@cs.fau.de>
 *
 * io_uring's IORING_OP_ASYNC_CANCEL sometimes does not cancel in-flight requests.
 *
 * Tested systems:
 * - Xeon E7-4830 Debian 11
 *   - liburing 2.1
 *   - reproducible on: linux-block/io_uring-5.17
 *
 * Building the reproducer:
 *   $ gcc -Werror -Wall -g -O3 io_uring-cancel.c -o io_uring-cancel -pthread -luring
 *
 * Steps to reproduce on the above system:
 * 1. Compile the reproducer
 * 2. Run the reproducer in a loop
 *   $ for i in $(seq 100); do echo $i; ./io_uring-cancel; done
 *
 *   (I need fewer than ~30 runs to reproduce a stuck thread with 96 threads)
 *
 * Explanation:
 * The reproducer opens #CPUs eventfd file descriptors and starts #CPUs threads.
 * Each thread creates an io_uring and enters a loop.
 * In each iteration the thread submits a read request for evfds[thread_id],
 * then a second read request for evfds[(thread_id + 1) % #CPUs], and finally
 * a write request for evfds[(thread_id + 2) % #CPUs]. By default the write
 * does not increase the eventfd's count and thus will not complete any read.
 * The write request is awaited.
 * Then two cancel requests for the reads are submitted and the outstanding
 * requests are awaited.
 *
 * The reproducer gets stuck when a cancel request returning -EALREADY does
 * not properly cancel the read: the read request is never completed.
 *
 * When passing a non-zero value to the reproducer as write increment
 *   $ ./io_uring-cancel 1
 * I could no longer reproduce stuck threads.
 */
#define _GNU_SOURCE
#include <assert.h>
#include <err.h>
#include <errno.h>
#include <liburing.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/eventfd.h>
#include <sys/sysinfo.h>
#include <unistd.h>

#define IORING_ENTRIES 2
#define ITERATIONS 1000

_Thread_local size_t thread_id;

static pthread_t* threads;
static int* evfds;
static unsigned nthreads;
static unsigned write_increment = 0;

static struct io_uring_sqe* checked_get_sqe(struct io_uring* ring) {
	struct io_uring_sqe* sqe = io_uring_get_sqe(ring);
	assert(sqe);
	return sqe;
}

static void prep_evfd_read(struct io_uring* ring, int evfd, uint64_t* buf, uintptr_t tag) {
	struct io_uring_sqe* sqe = checked_get_sqe(ring);

	io_uring_prep_read(sqe, evfds[evfd], buf, sizeof(*buf), 0);
	io_uring_sqe_set_data(sqe, (void*)tag);
}

static void prep_evfd_write(struct io_uring* ring, int evfd, uint64_t* buf) {
	struct io_uring_sqe* sqe = checked_get_sqe(ring);

	io_uring_prep_write(sqe, evfds[evfd], buf, sizeof(*buf), 0);
}

static void prep_cancel(struct io_uring* ring, uintptr_t tag) {
	struct io_uring_sqe* sqe = checked_get_sqe(ring);

	io_uring_prep_cancel(sqe, (void*)tag, 0);
}

static void checked_submit_n(struct io_uring* ring, int n) {
	int res = io_uring_submit(ring);
	if (res < 0) err(EXIT_FAILURE, "io_uring_submit failed");
	if (res != n) errx(EXIT_FAILURE, "io_uring_submit submitted fewer sqes than prepared");
}

static void checked_submit_one(struct io_uring* ring) {
	checked_submit_n(ring, 1);
}

static void collect_n_completions(struct io_uring* ring, int n) {
	struct io_uring_cqe* cqe;
	for (int i = 0; i < n; ++i) {
		int res = io_uring_wait_cqe(ring, &cqe);
		if (res) err(EXIT_FAILURE, "io_uring_wait_cqe failed");
		io_uring_cqe_seen(ring, cqe);
	}
}

void* thread_func(void* arg) {
	thread_id = (uintptr_t)(arg);

	struct io_uring ring;
	int res = io_uring_queue_init(IORING_ENTRIES, &ring, 0);
	if (res) err(EXIT_FAILURE, "io_uring_queue_init failed");

	for (unsigned i = 0; i < ITERATIONS; ++i) {
		uint64_t rbuf = 0;
		uint64_t wbuf = write_increment;

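		/* Read 1 (tag 1): read from this thread's own eventfd. */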
		prep_evfd_read(&ring, thread_id, &rbuf, 1);
		checked_submit_one(&ring);

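		/* Read 2 (tag 2): read from the next thread's eventfd. */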
		prep_evfd_read(&ring, (thread_id + 1) % nthreads, &rbuf, 2);
		checked_submit_one(&ring);

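		/* Write to a third eventfd; with write_increment == 0 the counter is
		 * not increased, so neither read can complete. Await the write CQE. */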
		prep_evfd_write(&ring, (thread_id + 2) % nthreads, &wbuf);
		checked_submit_one(&ring);
		collect_n_completions(&ring, 1);


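		/* Cancel both reads by tag and await the four outstanding CQEs:
		 * two cancel completions and two read completions. */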
		prep_cancel(&ring, 1);
		prep_cancel(&ring, 2);
		checked_submit_n(&ring, 2);
		collect_n_completions(&ring, 4);
	}

	return NULL;
}

int main(int argc, char* argv[]) {
	if (argc > 1)
		write_increment = atoi(argv[1]);

	nthreads = get_nprocs();
	// nthreads = 10;
	evfds = malloc(sizeof(int) * nthreads);
	if (!evfds) err(EXIT_FAILURE, "malloc failed");

	for (unsigned i = 0; i < nthreads; ++i) {
		evfds[i] = eventfd(0, 0);
		if (evfds[i] == -1) err(EXIT_FAILURE, "eventfd failed");
	}

	threads = malloc(sizeof(pthread_t) * nthreads);
	if (!threads) err(EXIT_FAILURE, "malloc failed");

	for (unsigned i = 0; i < nthreads; ++i) {
		errno = pthread_create(&threads[i], NULL, thread_func, (void*)(uintptr_t)i);
		if (errno) err(EXIT_FAILURE, "pthread_create failed");
	}

	for (unsigned i = 0; i < nthreads; ++i) {
		errno = pthread_join(threads[i], NULL);
		if (errno) err(EXIT_FAILURE, "pthread_join failed");
	}

	exit(EXIT_SUCCESS);
}

Thread overview: 6+ messages
2022-01-18 15:13 Canceled read requests never completed Florian Fischer
2022-01-18 19:07 ` Jens Axboe
2022-01-18 20:05   ` Florian Fischer
2022-01-18 23:36     ` Jens Axboe
2022-01-19  2:32       ` Jens Axboe
2022-01-19 13:42         ` Florian Fischer [this message]
