All of lore.kernel.org
 help / color / mirror / Atom feed
* Odd timeout behavior
@ 2020-04-11 23:00 Hrvoje Zeba
  2020-04-12  2:07 ` Jens Axboe
  0 siblings, 1 reply; 14+ messages in thread
From: Hrvoje Zeba @ 2020-04-11 23:00 UTC (permalink / raw)
  To: io-uring

[-- Attachment #1: Type: text/plain, Size: 885 bytes --]

Hi,

I've been looking at timeouts and found a case I can't wrap my head around.

Basically, If you submit OPs in a certain order, timeout fires before
time elapses where I wouldn't expect it to. The order is as follows:

poll(listen_socket, POLLIN) <- this never fires
nop(async)
timeout(1s, count=X)

If you set X to anything but 0xffffffff/(unsigned)-1, the timeout does
not fire (at least not immediately). This is expected apart from maybe
setting X=1 which would potentially allow the timeout to fire if nop
executes after the timeout is setup.

If you set it to 0xffffffff, it will always fire (at least on my
machine). Test program I'm using is attached.

The funny thing is that, if you remove the poll, timeout will not fire.

I'm using Linus' tree (v5.6-12604-gab6f762f0f53).

Could anybody shine a bit of light here?


Thank you,
Hrvoje

-- 
I doubt, therefore I might be.

[-- Attachment #2: nop-timeout.c --]
[-- Type: text/x-csrc, Size: 2205 bytes --]

#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <time.h>
#include <poll.h>
#include <sys/time.h>
#include <sys/socket.h>
#include <arpa/inet.h>

#include "liburing.h"

struct io_uring_sqe* get_sqe(struct io_uring* ring)
{
	struct io_uring_sqe* sqe = io_uring_get_sqe(ring);
	assert(sqe != NULL);
	return sqe;
}

void enqueue_nop(struct io_uring* ring)
{
	struct io_uring_sqe* sqe = get_sqe(ring);

	io_uring_prep_nop(sqe);
	io_uring_sqe_set_data(sqe, (void*)1);
	io_uring_sqe_set_flags(sqe, IOSQE_ASYNC);
}

void enqueue_timeout(struct io_uring* ring)
{
	struct io_uring_sqe* sqe = get_sqe(ring);
	static struct __kernel_timespec ts;

	ts.tv_sec = 1;
	ts.tv_nsec = 0;

	io_uring_prep_timeout(sqe, &ts, (unsigned)-1, 0);
	io_uring_sqe_set_data(sqe, (void*)2);
}

void enqueue_poll(struct io_uring* ring, int fd)
{
	struct io_uring_sqe* sqe = get_sqe(ring);

	io_uring_prep_poll_add(sqe, fd, POLLIN | POLLERR | POLLHUP);
	io_uring_sqe_set_data(sqe, (void*)3);
}

int create_socket()
{
	int s = socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
	assert(s != -1);

	int flags = fcntl(s, F_GETFL, 0);
	assert(flags != -1);

	flags |= O_NONBLOCK;

	assert(fcntl(s, F_SETFL, flags) != -1);

	struct sockaddr_in addr;

	addr.sin_family = AF_INET;
	addr.sin_port = 0x1236;
	addr.sin_addr.s_addr = 0x0100007fU;

	assert(bind(s, (struct sockaddr*)&addr, sizeof(addr)) != -1);
	assert(listen(s, 1024) != -1);

	return s;
}

int main(int argc, char *argv[])
{
	struct io_uring ring;
	int ret;

	ret = io_uring_queue_init(4, &ring, 0);
	if (ret) {
		fprintf(stderr, "ring setup failed\n");
		return 1;
	}

	int s = create_socket();
	enqueue_poll(&ring, s);

	enqueue_nop(&ring);
	enqueue_timeout(&ring);

	ret = io_uring_submit_and_wait(&ring, 1);
	if (ret == -1) {
		fprintf(stderr, "submit failed\n");
		return 1;
	}

	struct io_uring_cqe* cqe;
	uint32_t head;
	uint32_t count = 0;

	io_uring_for_each_cqe(&ring, head, cqe) {
		if (io_uring_cqe_get_data(cqe) == (void*)2)
			fprintf(stderr, "Timeout triggered!\n");

		count++;
	}

	io_uring_cq_advance(&ring, count);

	io_uring_queue_exit(&ring);
	return 0;
}

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Odd timeout behavior
  2020-04-11 23:00 Odd timeout behavior Hrvoje Zeba
@ 2020-04-12  2:07 ` Jens Axboe
  2020-04-12  9:15   ` Pavel Begunkov
  0 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2020-04-12  2:07 UTC (permalink / raw)
  To: Hrvoje Zeba, io-uring, Pavel Begunkov, zhangyi (F)

On 4/11/20 5:00 PM, Hrvoje Zeba wrote:
> Hi,
> 
> I've been looking at timeouts and found a case I can't wrap my head around.
> 
> Basically, If you submit OPs in a certain order, timeout fires before
> time elapses where I wouldn't expect it to. The order is as follows:
> 
> poll(listen_socket, POLLIN) <- this never fires
> nop(async)
> timeout(1s, count=X)
> 
> If you set X to anything but 0xffffffff/(unsigned)-1, the timeout does
> not fire (at least not immediately). This is expected apart from maybe
> setting X=1 which would potentially allow the timeout to fire if nop
> executes after the timeout is setup.
> 
> If you set it to 0xffffffff, it will always fire (at least on my
> machine). Test program I'm using is attached.
> 
> The funny thing is that, if you remove the poll, timeout will not fire.
> 
> I'm using Linus' tree (v5.6-12604-gab6f762f0f53).
> 
> Could anybody shine a bit of light here?

Thinking about this, I think the mistake here is using the SQ side for
the timeouts. Let's say you queue up N requests that are waiting, like
the poll. Then you arm a timeout, it'll now be at N + count before it
fires. We really should be using the CQ side for the timeouts.

Adding Pavel and Yi Zhang.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Odd timeout behavior
  2020-04-12  2:07 ` Jens Axboe
@ 2020-04-12  9:15   ` Pavel Begunkov
  2020-04-12 14:40     ` Jens Axboe
  2020-04-12 15:14     ` Hrvoje Zeba
  0 siblings, 2 replies; 14+ messages in thread
From: Pavel Begunkov @ 2020-04-12  9:15 UTC (permalink / raw)
  To: Jens Axboe, Hrvoje Zeba, io-uring, zhangyi (F)

On 4/12/2020 5:07 AM, Jens Axboe wrote:
> On 4/11/20 5:00 PM, Hrvoje Zeba wrote:
>> Hi,
>>
>> I've been looking at timeouts and found a case I can't wrap my head around.
>>
>> Basically, If you submit OPs in a certain order, timeout fires before
>> time elapses where I wouldn't expect it to. The order is as follows:
>>
>> poll(listen_socket, POLLIN) <- this never fires
>> nop(async)
>> timeout(1s, count=X)
>>
>> If you set X to anything but 0xffffffff/(unsigned)-1, the timeout does
>> not fire (at least not immediately). This is expected apart from maybe
>> setting X=1 which would potentially allow the timeout to fire if nop
>> executes after the timeout is setup.
>>
>> If you set it to 0xffffffff, it will always fire (at least on my
>> machine). Test program I'm using is attached.
>>
>> The funny thing is that, if you remove the poll, timeout will not fire.
>>
>> I'm using Linus' tree (v5.6-12604-gab6f762f0f53).
>>
>> Could anybody shine a bit of light here?
> 
> Thinking about this, I think the mistake here is using the SQ side for
> the timeouts. Let's say you queue up N requests that are waiting, like
> the poll. Then you arm a timeout, it'll now be at N + count before it
> fires. We really should be using the CQ side for the timeouts.

As I get it, the problem is that timeout(off=0xffffffff, 1s) fires
__immediately__ (i.e. not waiting 1s). Currently, it should work more
like "fire after N events *submitted after the timeout* completed", so
SQ vs CQ is another topic, but IMHO is not related.

And still, the described behaviour is out of the definition. It's sounds
like int overflow. Ok, I'll debug it, rest assured. I already see a
couple of flaws anyway.

BTW, I don't see why this offset feature is there in the first place. It
can be easily done in the userspace on CQ reaping. It won't help
multi-threaded apps as well.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Odd timeout behavior
  2020-04-12  9:15   ` Pavel Begunkov
@ 2020-04-12 14:40     ` Jens Axboe
  2020-04-17  8:39       ` Pavel Begunkov
  2020-04-12 15:14     ` Hrvoje Zeba
  1 sibling, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2020-04-12 14:40 UTC (permalink / raw)
  To: Pavel Begunkov, Hrvoje Zeba, io-uring, zhangyi (F)

On 4/12/20 3:15 AM, Pavel Begunkov wrote:
> On 4/12/2020 5:07 AM, Jens Axboe wrote:
>> On 4/11/20 5:00 PM, Hrvoje Zeba wrote:
>>> Hi,
>>>
>>> I've been looking at timeouts and found a case I can't wrap my head around.
>>>
>>> Basically, If you submit OPs in a certain order, timeout fires before
>>> time elapses where I wouldn't expect it to. The order is as follows:
>>>
>>> poll(listen_socket, POLLIN) <- this never fires
>>> nop(async)
>>> timeout(1s, count=X)
>>>
>>> If you set X to anything but 0xffffffff/(unsigned)-1, the timeout does
>>> not fire (at least not immediately). This is expected apart from maybe
>>> setting X=1 which would potentially allow the timeout to fire if nop
>>> executes after the timeout is setup.
>>>
>>> If you set it to 0xffffffff, it will always fire (at least on my
>>> machine). Test program I'm using is attached.
>>>
>>> The funny thing is that, if you remove the poll, timeout will not fire.
>>>
>>> I'm using Linus' tree (v5.6-12604-gab6f762f0f53).
>>>
>>> Could anybody shine a bit of light here?
>>
>> Thinking about this, I think the mistake here is using the SQ side for
>> the timeouts. Let's say you queue up N requests that are waiting, like
>> the poll. Then you arm a timeout, it'll now be at N + count before it
>> fires. We really should be using the CQ side for the timeouts.
> 
> As I get it, the problem is that timeout(off=0xffffffff, 1s) fires
> __immediately__ (i.e. not waiting 1s). Currently, it should work more
> like "fire after N events *submitted after the timeout* completed", so
> SQ vs CQ is another topic, but IMHO is not related.
> 
> And still, the described behaviour is out of the definition. It's sounds
> like int overflow. Ok, I'll debug it, rest assured. I already see a
> couple of flaws anyway.

Yeah agree it's two separate issues, the -1U must be a simple overflow.
So probably not that tricky to fix. 

Reason I bring up the other part is that Hrvoje's test case had other
cases as well, and the SQ vs CQ trigger is worth looking into. For
example, if we do:

enqueue N polls
enqueue timeout, count == 2, t = 10s
enqueue 2 nops

I'd logically expect the timeout to trigger when nop #2 is completed.
But it won't be, because we still have N polls waiting. What the count
== 2 is really saying (right now) is "trigger timeout when CQ passes SQ
by 2", which seems a bit odd.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Odd timeout behavior
  2020-04-12  9:15   ` Pavel Begunkov
  2020-04-12 14:40     ` Jens Axboe
@ 2020-04-12 15:14     ` Hrvoje Zeba
  2020-04-13  8:21       ` Pavel Begunkov
  1 sibling, 1 reply; 14+ messages in thread
From: Hrvoje Zeba @ 2020-04-12 15:14 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: Jens Axboe, io-uring, zhangyi (F)

On Sun, Apr 12, 2020 at 5:15 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> On 4/12/2020 5:07 AM, Jens Axboe wrote:
> > On 4/11/20 5:00 PM, Hrvoje Zeba wrote:
> >> Hi,
> >>
> >> I've been looking at timeouts and found a case I can't wrap my head around.
> >>
> >> Basically, If you submit OPs in a certain order, timeout fires before
> >> time elapses where I wouldn't expect it to. The order is as follows:
> >>
> >> poll(listen_socket, POLLIN) <- this never fires
> >> nop(async)
> >> timeout(1s, count=X)
> >>
> >> If you set X to anything but 0xffffffff/(unsigned)-1, the timeout does
> >> not fire (at least not immediately). This is expected apart from maybe
> >> setting X=1 which would potentially allow the timeout to fire if nop
> >> executes after the timeout is setup.
> >>
> >> If you set it to 0xffffffff, it will always fire (at least on my
> >> machine). Test program I'm using is attached.
> >>
> >> The funny thing is that, if you remove the poll, timeout will not fire.
> >>
> >> I'm using Linus' tree (v5.6-12604-gab6f762f0f53).
> >>
> >> Could anybody shine a bit of light here?
> >
> > Thinking about this, I think the mistake here is using the SQ side for
> > the timeouts. Let's say you queue up N requests that are waiting, like
> > the poll. Then you arm a timeout, it'll now be at N + count before it
> > fires. We really should be using the CQ side for the timeouts.
>
> As I get it, the problem is that timeout(off=0xffffffff, 1s) fires
> __immediately__ (i.e. not waiting 1s).

Correct.

> And still, the described behaviour is out of the definition. It's sounds
> like int overflow. Ok, I'll debug it, rest assured. I already see a
> couple of flaws anyway.

For this particular case,

req->sequence = ctx->cached_sq_head + count - 1;

ends up being 1 which triggers in __req_need_defer() for nop sq.

-- 
I doubt, therefore I might be.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Odd timeout behavior
  2020-04-12 15:14     ` Hrvoje Zeba
@ 2020-04-13  8:21       ` Pavel Begunkov
  2020-04-13 14:16         ` Jens Axboe
  0 siblings, 1 reply; 14+ messages in thread
From: Pavel Begunkov @ 2020-04-13  8:21 UTC (permalink / raw)
  To: Hrvoje Zeba; +Cc: Jens Axboe, io-uring, zhangyi (F)

On 4/12/2020 6:14 PM, Hrvoje Zeba wrote:
> On Sun, Apr 12, 2020 at 5:15 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>
>> On 4/12/2020 5:07 AM, Jens Axboe wrote:
>>> On 4/11/20 5:00 PM, Hrvoje Zeba wrote:
>>>> Hi,
>>>>
>>>> I've been looking at timeouts and found a case I can't wrap my head around.
>>>>
>>>> Basically, If you submit OPs in a certain order, timeout fires before
>>>> time elapses where I wouldn't expect it to. The order is as follows:
>>>>
>>>> poll(listen_socket, POLLIN) <- this never fires
>>>> nop(async)
>>>> timeout(1s, count=X)
>>>>
>>>> If you set X to anything but 0xffffffff/(unsigned)-1, the timeout does
>>>> not fire (at least not immediately). This is expected apart from maybe
>>>> setting X=1 which would potentially allow the timeout to fire if nop
>>>> executes after the timeout is setup.
>>>>
>>>> If you set it to 0xffffffff, it will always fire (at least on my
>>>> machine). Test program I'm using is attached.
>>>>
>>>> The funny thing is that, if you remove the poll, timeout will not fire.
>>>>
>>>> I'm using Linus' tree (v5.6-12604-gab6f762f0f53).
>>>>
>>>> Could anybody shine a bit of light here?
>>>
>>> Thinking about this, I think the mistake here is using the SQ side for
>>> the timeouts. Let's say you queue up N requests that are waiting, like
>>> the poll. Then you arm a timeout, it'll now be at N + count before it
>>> fires. We really should be using the CQ side for the timeouts.
>>
>> As I get it, the problem is that timeout(off=0xffffffff, 1s) fires
>> __immediately__ (i.e. not waiting 1s).
> 
> Correct.
> 
>> And still, the described behaviour is out of the definition. It's sounds
>> like int overflow. Ok, I'll debug it, rest assured. I already see a
>> couple of flaws anyway.
> 
> For this particular case,
> 
> req->sequence = ctx->cached_sq_head + count - 1;
> 
> ends up being 1 which triggers in __req_need_defer() for nop sq.

Right, that's it. The timeout's seq counter wraps around and triggers on
previously submitted but still inflight requests.

Jens, could you remind, do we limit number of inflight requests? We
discussed it before, but can't find the thread. If we don't, vile stuff
can happen with sequences.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Odd timeout behavior
  2020-04-13  8:21       ` Pavel Begunkov
@ 2020-04-13 14:16         ` Jens Axboe
  2020-04-13 19:09           ` Pavel Begunkov
  0 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2020-04-13 14:16 UTC (permalink / raw)
  To: Pavel Begunkov, Hrvoje Zeba; +Cc: io-uring, zhangyi (F)

On 4/13/20 2:21 AM, Pavel Begunkov wrote:
> On 4/12/2020 6:14 PM, Hrvoje Zeba wrote:
>> On Sun, Apr 12, 2020 at 5:15 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>>
>>> On 4/12/2020 5:07 AM, Jens Axboe wrote:
>>>> On 4/11/20 5:00 PM, Hrvoje Zeba wrote:
>>>>> Hi,
>>>>>
>>>>> I've been looking at timeouts and found a case I can't wrap my head around.
>>>>>
>>>>> Basically, If you submit OPs in a certain order, timeout fires before
>>>>> time elapses where I wouldn't expect it to. The order is as follows:
>>>>>
>>>>> poll(listen_socket, POLLIN) <- this never fires
>>>>> nop(async)
>>>>> timeout(1s, count=X)
>>>>>
>>>>> If you set X to anything but 0xffffffff/(unsigned)-1, the timeout does
>>>>> not fire (at least not immediately). This is expected apart from maybe
>>>>> setting X=1 which would potentially allow the timeout to fire if nop
>>>>> executes after the timeout is setup.
>>>>>
>>>>> If you set it to 0xffffffff, it will always fire (at least on my
>>>>> machine). Test program I'm using is attached.
>>>>>
>>>>> The funny thing is that, if you remove the poll, timeout will not fire.
>>>>>
>>>>> I'm using Linus' tree (v5.6-12604-gab6f762f0f53).
>>>>>
>>>>> Could anybody shine a bit of light here?
>>>>
>>>> Thinking about this, I think the mistake here is using the SQ side for
>>>> the timeouts. Let's say you queue up N requests that are waiting, like
>>>> the poll. Then you arm a timeout, it'll now be at N + count before it
>>>> fires. We really should be using the CQ side for the timeouts.
>>>
>>> As I get it, the problem is that timeout(off=0xffffffff, 1s) fires
>>> __immediately__ (i.e. not waiting 1s).
>>
>> Correct.
>>
>>> And still, the described behaviour is out of the definition. It's sounds
>>> like int overflow. Ok, I'll debug it, rest assured. I already see a
>>> couple of flaws anyway.
>>
>> For this particular case,
>>
>> req->sequence = ctx->cached_sq_head + count - 1;
>>
>> ends up being 1 which triggers in __req_need_defer() for nop sq.
> 
> Right, that's it. The timeout's seq counter wraps around and triggers on
> previously submitted but still inflight requests.
> 
> Jens, could you remind, do we limit number of inflight requests? We
> discussed it before, but can't find the thread. If we don't, vile stuff
> can happen with sequences.

We don't.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Odd timeout behavior
  2020-04-13 14:16         ` Jens Axboe
@ 2020-04-13 19:09           ` Pavel Begunkov
  2020-04-14  0:44             ` Jens Axboe
  0 siblings, 1 reply; 14+ messages in thread
From: Pavel Begunkov @ 2020-04-13 19:09 UTC (permalink / raw)
  To: Jens Axboe, Hrvoje Zeba; +Cc: io-uring, zhangyi (F)

On 13/04/2020 17:16, Jens Axboe wrote:
> On 4/13/20 2:21 AM, Pavel Begunkov wrote:
>> On 4/12/2020 6:14 PM, Hrvoje Zeba wrote:
>>> On Sun, Apr 12, 2020 at 5:15 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>>>
>>>> On 4/12/2020 5:07 AM, Jens Axboe wrote:
>>>>> On 4/11/20 5:00 PM, Hrvoje Zeba wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I've been looking at timeouts and found a case I can't wrap my head around.
>>>>>>
>>>>>> Basically, If you submit OPs in a certain order, timeout fires before
>>>>>> time elapses where I wouldn't expect it to. The order is as follows:
>>>>>>
>>>>>> poll(listen_socket, POLLIN) <- this never fires
>>>>>> nop(async)
>>>>>> timeout(1s, count=X)
>>>>>>
>>>>>> If you set X to anything but 0xffffffff/(unsigned)-1, the timeout does
>>>>>> not fire (at least not immediately). This is expected apart from maybe
>>>>>> setting X=1 which would potentially allow the timeout to fire if nop
>>>>>> executes after the timeout is setup.
>>>>>>
>>>>>> If you set it to 0xffffffff, it will always fire (at least on my
>>>>>> machine). Test program I'm using is attached.
>>>>>>
>>>>>> The funny thing is that, if you remove the poll, timeout will not fire.
>>>>>>
>>>>>> I'm using Linus' tree (v5.6-12604-gab6f762f0f53).
>>>>>>
>>>>>> Could anybody shine a bit of light here?
>>>>>
>>>>> Thinking about this, I think the mistake here is using the SQ side for
>>>>> the timeouts. Let's say you queue up N requests that are waiting, like
>>>>> the poll. Then you arm a timeout, it'll now be at N + count before it
>>>>> fires. We really should be using the CQ side for the timeouts.
>>>>
>>>> As I get it, the problem is that timeout(off=0xffffffff, 1s) fires
>>>> __immediately__ (i.e. not waiting 1s).
>>>
>>> Correct.
>>>
>>>> And still, the described behaviour is out of the definition. It's sounds
>>>> like int overflow. Ok, I'll debug it, rest assured. I already see a
>>>> couple of flaws anyway.
>>>
>>> For this particular case,
>>>
>>> req->sequence = ctx->cached_sq_head + count - 1;
>>>
>>> ends up being 1 which triggers in __req_need_defer() for nop sq.
>>
>> Right, that's it. The timeout's seq counter wraps around and triggers on
>> previously submitted but still inflight requests.
>>
>> Jens, could you remind, do we limit number of inflight requests? We
>> discussed it before, but can't find the thread. If we don't, vile stuff
>> can happen with sequences.
> 
> We don't.

I was too quick to judge, there won't be anything too bad, and only if we throw
2^32 requests (~1TB).

For the issue at hand, how about limiting timeouts' sqe->off by 2^31? This will
solve the issue for now, and I can't imagine anyone waiting for over one billion
requests to pass.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Odd timeout behavior
  2020-04-13 19:09           ` Pavel Begunkov
@ 2020-04-14  0:44             ` Jens Axboe
  2020-04-14 15:46               ` Pavel Begunkov
  0 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2020-04-14  0:44 UTC (permalink / raw)
  To: Pavel Begunkov, Hrvoje Zeba; +Cc: io-uring, zhangyi (F)

On 4/13/20 1:09 PM, Pavel Begunkov wrote:
> On 13/04/2020 17:16, Jens Axboe wrote:
>> On 4/13/20 2:21 AM, Pavel Begunkov wrote:
>>> On 4/12/2020 6:14 PM, Hrvoje Zeba wrote:
>>>> On Sun, Apr 12, 2020 at 5:15 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>>>>
>>>>> On 4/12/2020 5:07 AM, Jens Axboe wrote:
>>>>>> On 4/11/20 5:00 PM, Hrvoje Zeba wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I've been looking at timeouts and found a case I can't wrap my head around.
>>>>>>>
>>>>>>> Basically, If you submit OPs in a certain order, timeout fires before
>>>>>>> time elapses where I wouldn't expect it to. The order is as follows:
>>>>>>>
>>>>>>> poll(listen_socket, POLLIN) <- this never fires
>>>>>>> nop(async)
>>>>>>> timeout(1s, count=X)
>>>>>>>
>>>>>>> If you set X to anything but 0xffffffff/(unsigned)-1, the timeout does
>>>>>>> not fire (at least not immediately). This is expected apart from maybe
>>>>>>> setting X=1 which would potentially allow the timeout to fire if nop
>>>>>>> executes after the timeout is setup.
>>>>>>>
>>>>>>> If you set it to 0xffffffff, it will always fire (at least on my
>>>>>>> machine). Test program I'm using is attached.
>>>>>>>
>>>>>>> The funny thing is that, if you remove the poll, timeout will not fire.
>>>>>>>
>>>>>>> I'm using Linus' tree (v5.6-12604-gab6f762f0f53).
>>>>>>>
>>>>>>> Could anybody shine a bit of light here?
>>>>>>
>>>>>> Thinking about this, I think the mistake here is using the SQ side for
>>>>>> the timeouts. Let's say you queue up N requests that are waiting, like
>>>>>> the poll. Then you arm a timeout, it'll now be at N + count before it
>>>>>> fires. We really should be using the CQ side for the timeouts.
>>>>>
>>>>> As I get it, the problem is that timeout(off=0xffffffff, 1s) fires
>>>>> __immediately__ (i.e. not waiting 1s).
>>>>
>>>> Correct.
>>>>
>>>>> And still, the described behaviour is out of the definition. It's sounds
>>>>> like int overflow. Ok, I'll debug it, rest assured. I already see a
>>>>> couple of flaws anyway.
>>>>
>>>> For this particular case,
>>>>
>>>> req->sequence = ctx->cached_sq_head + count - 1;
>>>>
>>>> ends up being 1 which triggers in __req_need_defer() for nop sq.
>>>
>>> Right, that's it. The timeout's seq counter wraps around and triggers on
>>> previously submitted but still inflight requests.
>>>
>>> Jens, could you remind, do we limit number of inflight requests? We
>>> discussed it before, but can't find the thread. If we don't, vile stuff
>>> can happen with sequences.
>>
>> We don't.
> 
> I was too quick to judge, there won't be anything too bad, and only if we throw
> 2^32 requests (~1TB).
> 
> For the issue at hand, how about limiting timeouts' sqe->off by 2^31? This will
> solve the issue for now, and I can't imagine anyone waiting for over one billion
> requests to pass.

I'm fine with that, but how do we handle someone asking for > INT_MAX?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Odd timeout behavior
  2020-04-14  0:44             ` Jens Axboe
@ 2020-04-14 15:46               ` Pavel Begunkov
  2020-04-14 16:04                 ` Jens Axboe
  0 siblings, 1 reply; 14+ messages in thread
From: Pavel Begunkov @ 2020-04-14 15:46 UTC (permalink / raw)
  To: Jens Axboe, Hrvoje Zeba; +Cc: io-uring, zhangyi (F)

On 14/04/2020 03:44, Jens Axboe wrote:
> On 4/13/20 1:09 PM, Pavel Begunkov wrote:
>> On 13/04/2020 17:16, Jens Axboe wrote:
>>> On 4/13/20 2:21 AM, Pavel Begunkov wrote:
>>>> On 4/12/2020 6:14 PM, Hrvoje Zeba wrote:
>>>>> On Sun, Apr 12, 2020 at 5:15 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>>>>>
>>>>>> On 4/12/2020 5:07 AM, Jens Axboe wrote:
>>>>>>> On 4/11/20 5:00 PM, Hrvoje Zeba wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I've been looking at timeouts and found a case I can't wrap my head around.
>>>>>>>>
>>>>>>>> Basically, If you submit OPs in a certain order, timeout fires before
>>>>>>>> time elapses where I wouldn't expect it to. The order is as follows:
>>>>>>>>
>>>>>>>> poll(listen_socket, POLLIN) <- this never fires
>>>>>>>> nop(async)
>>>>>>>> timeout(1s, count=X)
>>>>>>>>
>>>>>>>> If you set X to anything but 0xffffffff/(unsigned)-1, the timeout does
>>>>>>>> not fire (at least not immediately). This is expected apart from maybe
>>>>>>>> setting X=1 which would potentially allow the timeout to fire if nop
>>>>>>>> executes after the timeout is setup.
>>>>>>>>
>>>>>>>> If you set it to 0xffffffff, it will always fire (at least on my
>>>>>>>> machine). Test program I'm using is attached.
>>>>>>>>
>>>>>>>> The funny thing is that, if you remove the poll, timeout will not fire.
>>>>>>>>
>>>>>>>> I'm using Linus' tree (v5.6-12604-gab6f762f0f53).
>>>>>>>>
>>>>>>>> Could anybody shine a bit of light here?
>>>>>>>
>>>>>>> Thinking about this, I think the mistake here is using the SQ side for
>>>>>>> the timeouts. Let's say you queue up N requests that are waiting, like
>>>>>>> the poll. Then you arm a timeout, it'll now be at N + count before it
>>>>>>> fires. We really should be using the CQ side for the timeouts.
>>>>>>
>>>>>> As I get it, the problem is that timeout(off=0xffffffff, 1s) fires
>>>>>> __immediately__ (i.e. not waiting 1s).
>>>>>
>>>>> Correct.
>>>>>
>>>>>> And still, the described behaviour is out of the definition. It's sounds
>>>>>> like int overflow. Ok, I'll debug it, rest assured. I already see a
>>>>>> couple of flaws anyway.
>>>>>
>>>>> For this particular case,
>>>>>
>>>>> req->sequence = ctx->cached_sq_head + count - 1;
>>>>>
>>>>> ends up being 1 which triggers in __req_need_defer() for nop sq.
>>>>
>>>> Right, that's it. The timeout's seq counter wraps around and triggers on
>>>> previously submitted but still inflight requests.
>>>>
>>>> Jens, could you remind, do we limit number of inflight requests? We
>>>> discussed it before, but can't find the thread. If we don't, vile stuff
>>>> can happen with sequences.
>>>
>>> We don't.
>>
>> I was too quick to judge, there won't be anything too bad, and only if we throw
>> 2^32 requests (~1TB).
>>
>> For the issue at hand, how about limiting timeouts' sqe->off by 2^31? This will
>> solve the issue for now, and I can't imagine anyone waiting for over one billion
>> requests to pass.
> 
> I'm fine with that, but how do we handle someone asking for > INT_MAX?

>INT_MAX is allowed, but I want to return -EINVAL instead.
If you mean UINT_MAX, then sqe->off is u32, so can't happen.


-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Odd timeout behavior
  2020-04-14 15:46               ` Pavel Begunkov
@ 2020-04-14 16:04                 ` Jens Axboe
  2020-04-14 16:31                   ` Pavel Begunkov
  0 siblings, 1 reply; 14+ messages in thread
From: Jens Axboe @ 2020-04-14 16:04 UTC (permalink / raw)
  To: Pavel Begunkov, Hrvoje Zeba; +Cc: io-uring, zhangyi (F)

On 4/14/20 9:46 AM, Pavel Begunkov wrote:
> On 14/04/2020 03:44, Jens Axboe wrote:
>> On 4/13/20 1:09 PM, Pavel Begunkov wrote:
>>> On 13/04/2020 17:16, Jens Axboe wrote:
>>>> On 4/13/20 2:21 AM, Pavel Begunkov wrote:
>>>>> On 4/12/2020 6:14 PM, Hrvoje Zeba wrote:
>>>>>> On Sun, Apr 12, 2020 at 5:15 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>>>>>>
>>>>>>> On 4/12/2020 5:07 AM, Jens Axboe wrote:
>>>>>>>> On 4/11/20 5:00 PM, Hrvoje Zeba wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I've been looking at timeouts and found a case I can't wrap my head around.
>>>>>>>>>
>>>>>>>>> Basically, If you submit OPs in a certain order, timeout fires before
>>>>>>>>> time elapses where I wouldn't expect it to. The order is as follows:
>>>>>>>>>
>>>>>>>>> poll(listen_socket, POLLIN) <- this never fires
>>>>>>>>> nop(async)
>>>>>>>>> timeout(1s, count=X)
>>>>>>>>>
>>>>>>>>> If you set X to anything but 0xffffffff/(unsigned)-1, the timeout does
>>>>>>>>> not fire (at least not immediately). This is expected apart from maybe
>>>>>>>>> setting X=1 which would potentially allow the timeout to fire if nop
>>>>>>>>> executes after the timeout is setup.
>>>>>>>>>
>>>>>>>>> If you set it to 0xffffffff, it will always fire (at least on my
>>>>>>>>> machine). Test program I'm using is attached.
>>>>>>>>>
>>>>>>>>> The funny thing is that, if you remove the poll, timeout will not fire.
>>>>>>>>>
>>>>>>>>> I'm using Linus' tree (v5.6-12604-gab6f762f0f53).
>>>>>>>>>
>>>>>>>>> Could anybody shine a bit of light here?
>>>>>>>>
>>>>>>>> Thinking about this, I think the mistake here is using the SQ side for
>>>>>>>> the timeouts. Let's say you queue up N requests that are waiting, like
>>>>>>>> the poll. Then you arm a timeout, it'll now be at N + count before it
>>>>>>>> fires. We really should be using the CQ side for the timeouts.
>>>>>>>
>>>>>>> As I get it, the problem is that timeout(off=0xffffffff, 1s) fires
>>>>>>> __immediately__ (i.e. not waiting 1s).
>>>>>>
>>>>>> Correct.
>>>>>>
>>>>>>> And still, the described behaviour is out of the definition. It's sounds
>>>>>>> like int overflow. Ok, I'll debug it, rest assured. I already see a
>>>>>>> couple of flaws anyway.
>>>>>>
>>>>>> For this particular case,
>>>>>>
>>>>>> req->sequence = ctx->cached_sq_head + count - 1;
>>>>>>
>>>>>> ends up being 1 which triggers in __req_need_defer() for nop sq.
>>>>>
>>>>> Right, that's it. The timeout's seq counter wraps around and triggers on
>>>>> previously submitted but still inflight requests.
>>>>>
>>>>> Jens, could you remind, do we limit number of inflight requests? We
>>>>> discussed it before, but can't find the thread. If we don't, vile stuff
>>>>> can happen with sequences.
>>>>
>>>> We don't.
>>>
>>> I was too quick to judge, there won't be anything too bad, and only if we throw
>>> 2^32 requests (~1TB).
>>>
>>> For the issue at hand, how about limiting timeouts' sqe->off by 2^31? This will
>>> solve the issue for now, and I can't imagine anyone waiting for over one billion
>>> requests to pass.
>>
>> I'm fine with that, but how do we handle someone asking for > INT_MAX?
> 
>> INT_MAX is allowed, but I want to return -EINVAL instead.
> If you mean UINT_MAX, then sqe->off is u32, so can't happen.

No, I mean count > INT_MAX, what you're suggesting we just don't support.
If there are apps right now using that, how do we handle it?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Odd timeout behavior
  2020-04-14 16:04                 ` Jens Axboe
@ 2020-04-14 16:31                   ` Pavel Begunkov
  0 siblings, 0 replies; 14+ messages in thread
From: Pavel Begunkov @ 2020-04-14 16:31 UTC (permalink / raw)
  To: Jens Axboe, Hrvoje Zeba; +Cc: io-uring, zhangyi (F)

On 14/04/2020 19:04, Jens Axboe wrote:
> On 4/14/20 9:46 AM, Pavel Begunkov wrote:
>> On 14/04/2020 03:44, Jens Axboe wrote:
>>> On 4/13/20 1:09 PM, Pavel Begunkov wrote:
>>>> On 13/04/2020 17:16, Jens Axboe wrote:
>>>>> On 4/13/20 2:21 AM, Pavel Begunkov wrote:
>>>>>> On 4/12/2020 6:14 PM, Hrvoje Zeba wrote:
>>>>>>> On Sun, Apr 12, 2020 at 5:15 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>>>>>>>>
>>>>>>>> On 4/12/2020 5:07 AM, Jens Axboe wrote:
>>>>>>>>> On 4/11/20 5:00 PM, Hrvoje Zeba wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I've been looking at timeouts and found a case I can't wrap my head around.
>>>>>>>>>>
>>>>>>>>>> Basically, If you submit OPs in a certain order, timeout fires before
>>>>>>>>>> time elapses where I wouldn't expect it to. The order is as follows:
>>>>>>>>>>
>>>>>>>>>> poll(listen_socket, POLLIN) <- this never fires
>>>>>>>>>> nop(async)
>>>>>>>>>> timeout(1s, count=X)
>>>>>>>>>>
>>>>>>>>>> If you set X to anything but 0xffffffff/(unsigned)-1, the timeout does
>>>>>>>>>> not fire (at least not immediately). This is expected apart from maybe
>>>>>>>>>> setting X=1 which would potentially allow the timeout to fire if nop
>>>>>>>>>> executes after the timeout is setup.
>>>>>>>>>>
>>>>>>>>>> If you set it to 0xffffffff, it will always fire (at least on my
>>>>>>>>>> machine). Test program I'm using is attached.
>>>>>>>>>>
>>>>>>>>>> The funny thing is that, if you remove the poll, timeout will not fire.
>>>>>>>>>>
>>>>>>>>>> I'm using Linus' tree (v5.6-12604-gab6f762f0f53).
>>>>>>>>>>
>>>>>>>>>> Could anybody shine a bit of light here?
>>>>>>>>>
>>>>>>>>> Thinking about this, I think the mistake here is using the SQ side for
>>>>>>>>> the timeouts. Let's say you queue up N requests that are waiting, like
>>>>>>>>> the poll. Then you arm a timeout, it'll now be at N + count before it
>>>>>>>>> fires. We really should be using the CQ side for the timeouts.
>>>>>>>>
>>>>>>>> As I get it, the problem is that timeout(off=0xffffffff, 1s) fires
>>>>>>>> __immediately__ (i.e. not waiting 1s).
>>>>>>>
>>>>>>> Correct.
>>>>>>>
>>>>>>>> And still, the described behaviour is out of the definition. It's sounds
>>>>>>>> like int overflow. Ok, I'll debug it, rest assured. I already see a
>>>>>>>> couple of flaws anyway.
>>>>>>>
>>>>>>> For this particular case,
>>>>>>>
>>>>>>> req->sequence = ctx->cached_sq_head + count - 1;
>>>>>>>
>>>>>>> ends up being 1 which triggers in __req_need_defer() for nop sq.
>>>>>>
>>>>>> Right, that's it. The timeout's seq counter wraps around and triggers on
>>>>>> previously submitted but still inflight requests.
>>>>>>
>>>>>> Jens, could you remind, do we limit number of inflight requests? We
>>>>>> discussed it before, but can't find the thread. If we don't, vile stuff
>>>>>> can happen with sequences.
>>>>>
>>>>> We don't.
>>>>
>>>> I was too quick to judge, there won't be anything too bad, and only if we throw
>>>> 2^32 requests (~1TB).
>>>>
>>>> For the issue at hand, how about limiting timeouts' sqe->off by 2^31? This will
>>>> solve the issue for now, and I can't imagine anyone waiting for over one billion
>>>> requests to pass.
>>>
>>> I'm fine with that, but how do we handle someone asking for > INT_MAX?
>>
>>> INT_MAX is allowed, but I want to return -EINVAL instead.
>> If you mean UINT_MAX, then sqe->off is u32, so can't happen.
> 
> No, I mean count > INT_MAX, what you're suggesting we just don't support.

Got it. That's what my question was about.

> If there are apps right now using that, how do we handle it?

1. if # of inflight requests is limited (empirically/naturally or not), then we
can extend req->seq to >= 33 bits (leaving sqe->off u32), and it'll just work.

e.g. if req->seq is u64, then we need 2^64 - 2^32 inflight requests to hit the
issue. And I don't expect anybody creating requests worth of 1ZB (2^30 TB).

2. or to think about something

I'll send 1., and then out of curiosity give a thought to 2.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Odd timeout behavior
  2020-04-12 14:40     ` Jens Axboe
@ 2020-04-17  8:39       ` Pavel Begunkov
  2020-04-17 14:37         ` Hrvoje Zeba
  0 siblings, 1 reply; 14+ messages in thread
From: Pavel Begunkov @ 2020-04-17  8:39 UTC (permalink / raw)
  To: Jens Axboe, Hrvoje Zeba, io-uring, zhangyi (F)

On 12/04/2020 17:40, Jens Axboe wrote:
>> On 4/12/2020 5:07 AM, Jens Axboe wrote:
>>> Thinking about this, I think the mistake here is using the SQ side for
>>> the timeouts. Let's say you queue up N requests that are waiting, like
>>> the poll. Then you arm a timeout, it'll now be at N + count before it
>>> fires. We really should be using the CQ side for the timeouts.
...
> Reason I bring up the other part is that Hrvoje's test case had other
> cases as well, and the SQ vs CQ trigger is worth looking into. For
> example, if we do:
> 
> enqueue N polls
> enqueue timeout, count == 2, t = 10s
> enqueue 2 nops
> 
> I'd logically expect the timeout to trigger when nop #2 is completed.
> But it won't be, because we still have N polls waiting. What the count
> == 2 is really saying (right now) is "trigger timeout when CQ passes SQ
> by 2", which seems a bit odd.
> 

time for this:

1. do we really want to change current behaviour? As you said, there may be users.

2. why a timeout can't be triggered by another timeout completion? There are
bits adjusting req->sequence for enqueued timeouts up and down. I understand,
that liburing hides timeouts from users, but handling them inconsistently in
that sense from any other request is IMHO a bad idea. Can we kill it?

3. For your case, should it to fire exactly after those 2 nops? Or it can be
triggered by previously completed requests (e.g. polls)?

e.g. timeline as follows
- enqueue polls
- enqueue timeout
- 2 polls completed
- the timeout triggered by completion of polls
- do nops

The second one is what io_uring pdf says, and I'd prefer to have it.

-- 
Pavel Begunkov

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Odd timeout behavior
  2020-04-17  8:39       ` Pavel Begunkov
@ 2020-04-17 14:37         ` Hrvoje Zeba
  0 siblings, 0 replies; 14+ messages in thread
From: Hrvoje Zeba @ 2020-04-17 14:37 UTC (permalink / raw)
  To: Pavel Begunkov; +Cc: Jens Axboe, io-uring, zhangyi (F)

On Fri, Apr 17, 2020 at 4:40 AM Pavel Begunkov <asml.silence@gmail.com> wrote:
>
> On 12/04/2020 17:40, Jens Axboe wrote:
> >> On 4/12/2020 5:07 AM, Jens Axboe wrote:
> >>> Thinking about this, I think the mistake here is using the SQ side for
> >>> the timeouts. Let's say you queue up N requests that are waiting, like
> >>> the poll. Then you arm a timeout, it'll now be at N + count before it
> >>> fires. We really should be using the CQ side for the timeouts.
> ...
> > Reason I bring up the other part is that Hrvoje's test case had other
> > cases as well, and the SQ vs CQ trigger is worth looking into. For
> > example, if we do:
> >
> > enqueue N polls
> > enqueue timeout, count == 2, t = 10s
> > enqueue 2 nops
> >
> > I'd logically expect the timeout to trigger when nop #2 is completed.
> > But it won't be, because we still have N polls waiting. What the count
> > == 2 is really saying (right now) is "trigger timeout when CQ passes SQ
> > by 2", which seems a bit odd.
> >
>
> time for this:
>
> 1. do we really want to change current behaviour? As you said, there may be users.
>

I still see io_uring as early development. I've had several breakages
when I upgraded the kernel so far. I'm fine with it.

> 2. why a timeout can't be triggered by another timeout completion? There are
> bits adjusting req->sequence for enqueued timeouts up and down. I understand,
> that liburing hides timeouts from users, but handling them inconsistently in
> that sense from any other request is IMHO a bad idea. Can we kill it?
>
> 3. For your case, should it to fire exactly after those 2 nops? Or it can be
> triggered by previously completed requests (e.g. polls)?
>
> e.g. timeline as follows
> - enqueue polls
> - enqueue timeout
> - 2 polls completed
> - the timeout triggered by completion of polls
> - do nops
>

Timeout fires on any cqes is the behavior I expected. I can see the
reasoning behind only triggering for sqes that come after the timeout
(io_uring_submit_and_wait() being the use-case) tho.

-- 
I doubt, therefore I might be.

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-04-17 14:37 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-11 23:00 Odd timeout behavior Hrvoje Zeba
2020-04-12  2:07 ` Jens Axboe
2020-04-12  9:15   ` Pavel Begunkov
2020-04-12 14:40     ` Jens Axboe
2020-04-17  8:39       ` Pavel Begunkov
2020-04-17 14:37         ` Hrvoje Zeba
2020-04-12 15:14     ` Hrvoje Zeba
2020-04-13  8:21       ` Pavel Begunkov
2020-04-13 14:16         ` Jens Axboe
2020-04-13 19:09           ` Pavel Begunkov
2020-04-14  0:44             ` Jens Axboe
2020-04-14 15:46               ` Pavel Begunkov
2020-04-14 16:04                 ` Jens Axboe
2020-04-14 16:31                   ` Pavel Begunkov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.