* [BUG] About "io_uring: add more uring info to fdinfo for debug"
From: Eric Dumazet @ 2021-10-28 21:24 UTC (permalink / raw)
  To: Jens Axboe, Hao Xu; +Cc: LKML, Eric Dumazet

Hi

I was looking at commit 83f84356bc8f2d
("io_uring: add more uring info to fdinfo for debug") after receiving
syzbot reports.

I suspect that the following:

+       for (i = cached_sq_head; i < sq_tail; i++) {
+               unsigned int sq_idx = READ_ONCE(ctx->sq_array[i & sq_mask]);
+
+               if (likely(sq_idx <= sq_mask)) {
+                       struct io_uring_sqe *sqe = &ctx->sq_sqes[sq_idx];
+
+                       seq_printf(m, "%5u: opcode:%d, fd:%d, flags:%x, user_data:%llu\n",
+                                  sq_idx, sqe->opcode, sqe->fd, sqe->flags, sqe->user_data);
+               }
+       }


This can loop ~2^32 times if sq_tail is close to ~0U.

I see various READ_ONCE(), which are probably not good enough.

At the very minimum I would handle wrapping...

diff --git a/fs/io_uring.c b/fs/io_uring.c
index f37de99254bf8e0b8364b267388d1056fce436a4..50493f5c004b70cff103e20bf4b1ba92304eddb9 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -10117,7 +10117,7 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
        struct io_overflow_cqe *ocqe;
        struct io_rings *r = ctx->rings;
        unsigned int sq_mask = ctx->sq_entries - 1, cq_mask = ctx->cq_entries - 1;
-       unsigned int cached_sq_head = ctx->cached_sq_head;
+       unsigned int cached_sq_head = READ_ONCE(ctx->cached_sq_head);
        unsigned int cached_cq_tail = ctx->cached_cq_tail;
        unsigned int sq_head = READ_ONCE(r->sq.head);
        unsigned int sq_tail = READ_ONCE(r->sq.tail);
@@ -10139,7 +10139,7 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
        seq_printf(m, "CqTail:\t%u\n", cq_tail & cq_mask);
        seq_printf(m, "CachedCqTail:\t%u\n", cached_cq_tail & cq_mask);
        seq_printf(m, "SQEs:\t%u\n", sq_tail - cached_sq_head);
-       for (i = cached_sq_head; i < sq_tail; i++) {
+       for (i = cached_sq_head; (int)(i - sq_tail) < 0; i++) {
                unsigned int sq_idx = READ_ONCE(ctx->sq_array[i & sq_mask]);
 
                if (likely(sq_idx <= sq_mask)) {
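The behavior of the two loop conditions can be sketched in plain C (a userspace illustration of the arithmetic only, not the kernel code itself):

```c
#include <assert.h>

/* Wrap-unsafe condition from the original code: if the tail counter has
 * wrapped past zero (so tail < head numerically), `i < tail` is false
 * immediately and the wrapped entries are skipped; with head == 0 and a
 * tail near ~0U it would instead iterate ~2^32 times. */
static unsigned int count_unsafe(unsigned int head, unsigned int tail)
{
	unsigned int n = 0;

	for (unsigned int i = head; i < tail; i++)
		n++;
	return n;
}

/* Wrap-safe condition from the proposed fix: iterate while the signed
 * 32-bit difference is negative.  For a sane distance (below 2^31) this
 * runs exactly tail - head times regardless of where the counters sit;
 * a larger, hostile distance terminates early instead of approaching
 * ~2^32 iterations. */
static unsigned int count_safe(unsigned int head, unsigned int tail)
{
	unsigned int n = 0;

	for (unsigned int i = head; (int)(i - tail) < 0; i++)
		n++;
	return n;
}
```

Note that the signed comparison still allows up to ~2^31 iterations for a large distance, so separately capping the count at sq_entries remains useful.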


* Re: [BUG] About "io_uring: add more uring info to fdinfo for debug"
From: Jens Axboe @ 2021-10-28 21:40 UTC (permalink / raw)
  To: Eric Dumazet, Hao Xu; +Cc: LKML, Eric Dumazet

On 10/28/21 3:24 PM, Eric Dumazet wrote:
> Hi
> 
> I was looking at commit 83f84356bc8f2d
> ("io_uring: add more uring info to fdinfo for debug") after receiving
> syzbot reports.
> 
> I suspect that the following:
> 
> +       for (i = cached_sq_head; i < sq_tail; i++) {
> +               unsigned int sq_idx = READ_ONCE(ctx->sq_array[i & sq_mask]);
> +
> +               if (likely(sq_idx <= sq_mask)) {
> +                       struct io_uring_sqe *sqe = &ctx->sq_sqes[sq_idx];
> +
> +                       seq_printf(m, "%5u: opcode:%d, fd:%d, flags:%x, user_data:%llu\n",
> +                                  sq_idx, sqe->opcode, sqe->fd, sqe->flags, sqe->user_data);
> +               }
> +       }
> 
> 
> This can loop ~2^32 times if sq_tail is close to ~0U.
> 
> I see various READ_ONCE(), which are probably not good enough.
> 
> At the very minimum I would handle wrapping...

Thanks for reporting this. I think on top of wrapping, the loop should
just be capped at sq_entries as well. There's no point dumping more than
that, ever.

I'll take a stab at this.

-- 
Jens Axboe



* Re: [BUG] About "io_uring: add more uring info to fdinfo for debug"
From: Jens Axboe @ 2021-10-29  0:13 UTC (permalink / raw)
  To: Eric Dumazet, Hao Xu; +Cc: LKML, Eric Dumazet

On 10/28/21 3:40 PM, Jens Axboe wrote:
> On 10/28/21 3:24 PM, Eric Dumazet wrote:
>> Hi
>>
>> I was looking at commit 83f84356bc8f2d
>> ("io_uring: add more uring info to fdinfo for debug") after receiving
>> syzbot reports.
>>
>> I suspect that the following:
>>
>> +       for (i = cached_sq_head; i < sq_tail; i++) {
>> +               unsigned int sq_idx = READ_ONCE(ctx->sq_array[i & sq_mask]);
>> +
>> +               if (likely(sq_idx <= sq_mask)) {
>> +                       struct io_uring_sqe *sqe = &ctx->sq_sqes[sq_idx];
>> +
>> +                       seq_printf(m, "%5u: opcode:%d, fd:%d, flags:%x, user_data:%llu\n",
>> +                                  sq_idx, sqe->opcode, sqe->fd, sqe->flags, sqe->user_data);
>> +               }
>> +       }
>>
>>
>> This can loop ~2^32 times if sq_tail is close to ~0U.
>>
>> I see various READ_ONCE(), which are probably not good enough.
>>
>> At the very minimum I would handle wrapping...
> 
> Thanks for reporting this. I think on top of wrapping, the loop should
> just be capped at sq_entries as well. There's no point dumping more than
> that, ever.
> 
> I'll take a stab at this.

I'd probably do something like this - make sure wrap is sane and that we
always cap at the max number of entries we expect. This doesn't quite
hold true for CQEs, but honestly for debugging purposes, we only really
care about the sq ring side in terms of stalls. Or if we have unreaped
CQEs, which we'll still show.

This also removes the masking, as it's better to expose the ring indexes
directly. And just dump the raw ring head/tail for sq/cq. We still
include the cached info, but I think dumping the raw contents is saner
and more useful.


diff --git a/fs/io_uring.c b/fs/io_uring.c
index 17cb0e1b88f0..babd9950ae9f 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -10065,12 +10065,11 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
 	struct io_overflow_cqe *ocqe;
 	struct io_rings *r = ctx->rings;
 	unsigned int sq_mask = ctx->sq_entries - 1, cq_mask = ctx->cq_entries - 1;
-	unsigned int cached_sq_head = ctx->cached_sq_head;
-	unsigned int cached_cq_tail = ctx->cached_cq_tail;
 	unsigned int sq_head = READ_ONCE(r->sq.head);
 	unsigned int sq_tail = READ_ONCE(r->sq.tail);
 	unsigned int cq_head = READ_ONCE(r->cq.head);
 	unsigned int cq_tail = READ_ONCE(r->cq.tail);
+	unsigned int sq_entries, cq_entries;
 	bool has_lock;
 	unsigned int i;
 
@@ -10080,15 +10079,19 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
 	 * and sq_tail and cq_head are changed by userspace. But it's ok since
 	 * we usually use these info when it is stuck.
 	 */
-	seq_printf(m, "SqHead:\t%u\n", sq_head & sq_mask);
-	seq_printf(m, "SqTail:\t%u\n", sq_tail & sq_mask);
-	seq_printf(m, "CachedSqHead:\t%u\n", cached_sq_head & sq_mask);
-	seq_printf(m, "CqHead:\t%u\n", cq_head & cq_mask);
-	seq_printf(m, "CqTail:\t%u\n", cq_tail & cq_mask);
-	seq_printf(m, "CachedCqTail:\t%u\n", cached_cq_tail & cq_mask);
-	seq_printf(m, "SQEs:\t%u\n", sq_tail - cached_sq_head);
-	for (i = cached_sq_head; i < sq_tail; i++) {
-		unsigned int sq_idx = READ_ONCE(ctx->sq_array[i & sq_mask]);
+	seq_printf(m, "SqMask:\t\t0x%x\n", sq_mask);
+	seq_printf(m, "SqHead:\t%u\n", sq_head);
+	seq_printf(m, "SqTail:\t%u\n", sq_tail);
+	seq_printf(m, "CachedSqHead:\t%u\n", ctx->cached_sq_head);
+	seq_printf(m, "CqMask:\t0x%x\n", cq_mask);
+	seq_printf(m, "CqHead:\t%u\n", cq_head);
+	seq_printf(m, "CqTail:\t%u\n", cq_tail);
+	seq_printf(m, "CachedCqTail:\t%u\n", ctx->cached_cq_tail);
+	seq_printf(m, "SQEs:\t%u\n", sq_tail - ctx->cached_sq_head);
+	sq_entries = min(sq_tail - sq_head, ctx->sq_entries);
+	for (i = 0; i < sq_entries; i++) {
+		unsigned int entry = i + sq_head;
+		unsigned int sq_idx = READ_ONCE(ctx->sq_array[entry & sq_mask]);
 
 		if (likely(sq_idx <= sq_mask)) {
 			struct io_uring_sqe *sqe = &ctx->sq_sqes[sq_idx];
@@ -10097,9 +10100,11 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
 				   sq_idx, sqe->opcode, sqe->fd, sqe->flags, sqe->user_data);
 		}
 	}
-	seq_printf(m, "CQEs:\t%u\n", cached_cq_tail - cq_head);
-	for (i = cq_head; i < cached_cq_tail; i++) {
-		struct io_uring_cqe *cqe = &r->cqes[i & cq_mask];
+	seq_printf(m, "CQEs:\t%u\n", cq_tail - cq_head);
+	cq_entries = min(cq_tail - cq_head, ctx->cq_entries);
+	for (i = 0; i < cq_entries; i++) {
+		unsigned int entry = i + cq_head;
+		struct io_uring_cqe *cqe = &r->cqes[entry & cq_mask];
 
 		seq_printf(m, "%5u: user_data:%llu, res:%d, flag:%x\n",
 			   i & cq_mask, cqe->user_data, cqe->res, cqe->flags);
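The core of the patch's defense, the unsigned difference plus a cap, can be expressed as a standalone helper (a userspace sketch with a hypothetical name; the kernel patch uses the `min()` macro inline):

```c
#include <assert.h>

/* How many ring entries to dump: `tail - head` in unsigned arithmetic
 * is wrap-safe (modular), and capping the result at ring_entries bounds
 * the dump loop even if userspace has stored a hostile tail value in
 * the shared ring. */
static unsigned int entries_to_dump(unsigned int head, unsigned int tail,
				    unsigned int ring_entries)
{
	unsigned int pending = tail - head;

	return pending < ring_entries ? pending : ring_entries;
}
```

With this bound, the dump loops iterate at most ring_entries times no matter what head/tail values userspace publishes.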

-- 
Jens Axboe



* Re: [BUG] About "io_uring: add more uring info to fdinfo for debug"
From: Eric Dumazet @ 2021-10-29  0:43 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Eric Dumazet, Hao Xu, LKML

On Thu, Oct 28, 2021 at 5:13 PM Jens Axboe <axboe@kernel.dk> wrote:
>
> On 10/28/21 3:40 PM, Jens Axboe wrote:
> > On 10/28/21 3:24 PM, Eric Dumazet wrote:
> >> Hi
> >>
> >> I was looking at commit 83f84356bc8f2d
> >> ("io_uring: add more uring info to fdinfo for debug") after receiving
> >> syzbot reports.
> >>
> >> I suspect that the following:
> >>
> >> +       for (i = cached_sq_head; i < sq_tail; i++) {
> >> +               unsigned int sq_idx = READ_ONCE(ctx->sq_array[i & sq_mask]);
> >> +
> >> +               if (likely(sq_idx <= sq_mask)) {
> >> +                       struct io_uring_sqe *sqe = &ctx->sq_sqes[sq_idx];
> >> +
> >> +                       seq_printf(m, "%5u: opcode:%d, fd:%d, flags:%x, user_data:%llu\n",
> >> +                                  sq_idx, sqe->opcode, sqe->fd, sqe->flags, sqe->user_data);
> >> +               }
> >> +       }
> >>
> >>
> >> This can loop ~2^32 times if sq_tail is close to ~0U.
> >>
> >> I see various READ_ONCE(), which are probably not good enough.
> >>
> >> At the very minimum I would handle wrapping...
> >
> > Thanks for reporting this. I think on top of wrapping, the loop should
> > just be capped at sq_entries as well. There's no point dumping more than
> > that, ever.
> >
> > I'll take a stab at this.
>
> I'd probably do something like this - make sure wrap is sane and that we
> always cap at the max number of entries we expect. This doesn't quite
> hold true for CQEs, but honestly for debugging purposes, we only really
> care about the sq ring side in terms of stalls. Or if we have unreaped
> CQEs, which we'll still show.
>
> This also removes the masking, as it's better to expose the ring indexes
> directly. And just dump the raw ring head/tail for sq/cq. We still
> include the cached info, but I think dumping the raw contents is saner
> and more useful.
>
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 17cb0e1b88f0..babd9950ae9f 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -10065,12 +10065,11 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
>         struct io_overflow_cqe *ocqe;
>         struct io_rings *r = ctx->rings;
>         unsigned int sq_mask = ctx->sq_entries - 1, cq_mask = ctx->cq_entries - 1;
> -       unsigned int cached_sq_head = ctx->cached_sq_head;
> -       unsigned int cached_cq_tail = ctx->cached_cq_tail;
>         unsigned int sq_head = READ_ONCE(r->sq.head);
>         unsigned int sq_tail = READ_ONCE(r->sq.tail);
>         unsigned int cq_head = READ_ONCE(r->cq.head);
>         unsigned int cq_tail = READ_ONCE(r->cq.tail);
> +       unsigned int sq_entries, cq_entries;
>         bool has_lock;
>         unsigned int i;
>
> @@ -10080,15 +10079,19 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
>          * and sq_tail and cq_head are changed by userspace. But it's ok since
>          * we usually use these info when it is stuck.
>          */
> -       seq_printf(m, "SqHead:\t%u\n", sq_head & sq_mask);
> -       seq_printf(m, "SqTail:\t%u\n", sq_tail & sq_mask);
> -       seq_printf(m, "CachedSqHead:\t%u\n", cached_sq_head & sq_mask);
> -       seq_printf(m, "CqHead:\t%u\n", cq_head & cq_mask);
> -       seq_printf(m, "CqTail:\t%u\n", cq_tail & cq_mask);
> -       seq_printf(m, "CachedCqTail:\t%u\n", cached_cq_tail & cq_mask);
> -       seq_printf(m, "SQEs:\t%u\n", sq_tail - cached_sq_head);
> -       for (i = cached_sq_head; i < sq_tail; i++) {
> -               unsigned int sq_idx = READ_ONCE(ctx->sq_array[i & sq_mask]);
> +       seq_printf(m, "SqMask:\t\t0x%x\n", sq_mask);
> +       seq_printf(m, "SqHead:\t%u\n", sq_head);
> +       seq_printf(m, "SqTail:\t%u\n", sq_tail);
> +       seq_printf(m, "CachedSqHead:\t%u\n", ctx->cached_sq_head);
> +       seq_printf(m, "CqMask:\t0x%x\n", cq_mask);
> +       seq_printf(m, "CqHead:\t%u\n", cq_head);
> +       seq_printf(m, "CqTail:\t%u\n", cq_tail);
> +       seq_printf(m, "CachedCqTail:\t%u\n", ctx->cached_cq_tail);
> +       seq_printf(m, "SQEs:\t%u\n", sq_tail - ctx->cached_sq_head);
> +       sq_entries = min(sq_tail - sq_head, ctx->sq_entries);
> +       for (i = 0; i < sq_entries; i++) {
> +               unsigned int entry = i + sq_head;
> +               unsigned int sq_idx = READ_ONCE(ctx->sq_array[entry & sq_mask]);
>
>                 if (likely(sq_idx <= sq_mask)) {
>                         struct io_uring_sqe *sqe = &ctx->sq_sqes[sq_idx];
> @@ -10097,9 +10100,11 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
>                                    sq_idx, sqe->opcode, sqe->fd, sqe->flags, sqe->user_data);
>                 }
>         }
> -       seq_printf(m, "CQEs:\t%u\n", cached_cq_tail - cq_head);
> -       for (i = cq_head; i < cached_cq_tail; i++) {
> -               struct io_uring_cqe *cqe = &r->cqes[i & cq_mask];
> +       seq_printf(m, "CQEs:\t%u\n", cq_tail - cq_head);
> +       cq_entries = min(cq_tail - cq_head, ctx->cq_entries);
> +       for (i = 0; i < cq_entries; i++) {
> +               unsigned int entry = i + cq_head;
> +               struct io_uring_cqe *cqe = &r->cqes[entry & cq_mask];
>
>                 seq_printf(m, "%5u: user_data:%llu, res:%d, flag:%x\n",
>                            i & cq_mask, cqe->user_data, cqe->res, cqe->flags);

Note: you probably want to replace (i & cq_mask) with (entry & cq_mask) here.

Otherwise, the patch looks good to me.

>
> --
> Jens Axboe
>


* Re: [BUG] About "io_uring: add more uring info to fdinfo for debug"
From: Jens Axboe @ 2021-10-29 12:40 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Eric Dumazet, Hao Xu, LKML

On 10/28/21 6:43 PM, Eric Dumazet wrote:
> On Thu, Oct 28, 2021 at 5:13 PM Jens Axboe <axboe@kernel.dk> wrote:
>>
>> On 10/28/21 3:40 PM, Jens Axboe wrote:
>>> On 10/28/21 3:24 PM, Eric Dumazet wrote:
>>>> Hi
>>>>
>>>> I was looking at commit 83f84356bc8f2d
>>>> ("io_uring: add more uring info to fdinfo for debug") after receiving
>>>> syzbot reports.
>>>>
>>>> I suspect that the following:
>>>>
>>>> +       for (i = cached_sq_head; i < sq_tail; i++) {
>>>> +               unsigned int sq_idx = READ_ONCE(ctx->sq_array[i & sq_mask]);
>>>> +
>>>> +               if (likely(sq_idx <= sq_mask)) {
>>>> +                       struct io_uring_sqe *sqe = &ctx->sq_sqes[sq_idx];
>>>> +
>>>> +                       seq_printf(m, "%5u: opcode:%d, fd:%d, flags:%x, user_data:%llu\n",
>>>> +                                  sq_idx, sqe->opcode, sqe->fd, sqe->flags, sqe->user_data);
>>>> +               }
>>>> +       }
>>>>
>>>>
>>>> This can loop ~2^32 times if sq_tail is close to ~0U.
>>>>
>>>> I see various READ_ONCE(), which are probably not good enough.
>>>>
>>>> At the very minimum I would handle wrapping...
>>>
>>> Thanks for reporting this. I think on top of wrapping, the loop should
>>> just be capped at sq_entries as well. There's no point dumping more than
>>> that, ever.
>>>
>>> I'll take a stab at this.
>>
>> I'd probably do something like this - make sure wrap is sane and that we
>> always cap at the max number of entries we expect. This doesn't quite
>> hold true for CQEs, but honestly for debugging purposes, we only really
>> care about the sq ring side in terms of stalls. Or if we have unreaped
>> CQEs, which we'll still show.
>>
>> This also removes the masking, as it's better to expose the ring indexes
>> directly. And just dump the raw ring head/tail for sq/cq. We still
>> include the cached info, but I think dumping the raw contents is saner
>> and more useful.
>>
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index 17cb0e1b88f0..babd9950ae9f 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -10065,12 +10065,11 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
>>         struct io_overflow_cqe *ocqe;
>>         struct io_rings *r = ctx->rings;
>>         unsigned int sq_mask = ctx->sq_entries - 1, cq_mask = ctx->cq_entries - 1;
>> -       unsigned int cached_sq_head = ctx->cached_sq_head;
>> -       unsigned int cached_cq_tail = ctx->cached_cq_tail;
>>         unsigned int sq_head = READ_ONCE(r->sq.head);
>>         unsigned int sq_tail = READ_ONCE(r->sq.tail);
>>         unsigned int cq_head = READ_ONCE(r->cq.head);
>>         unsigned int cq_tail = READ_ONCE(r->cq.tail);
>> +       unsigned int sq_entries, cq_entries;
>>         bool has_lock;
>>         unsigned int i;
>>
>> @@ -10080,15 +10079,19 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
>>          * and sq_tail and cq_head are changed by userspace. But it's ok since
>>          * we usually use these info when it is stuck.
>>          */
>> -       seq_printf(m, "SqHead:\t%u\n", sq_head & sq_mask);
>> -       seq_printf(m, "SqTail:\t%u\n", sq_tail & sq_mask);
>> -       seq_printf(m, "CachedSqHead:\t%u\n", cached_sq_head & sq_mask);
>> -       seq_printf(m, "CqHead:\t%u\n", cq_head & cq_mask);
>> -       seq_printf(m, "CqTail:\t%u\n", cq_tail & cq_mask);
>> -       seq_printf(m, "CachedCqTail:\t%u\n", cached_cq_tail & cq_mask);
>> -       seq_printf(m, "SQEs:\t%u\n", sq_tail - cached_sq_head);
>> -       for (i = cached_sq_head; i < sq_tail; i++) {
>> -               unsigned int sq_idx = READ_ONCE(ctx->sq_array[i & sq_mask]);
>> +       seq_printf(m, "SqMask:\t\t0x%x\n", sq_mask);
>> +       seq_printf(m, "SqHead:\t%u\n", sq_head);
>> +       seq_printf(m, "SqTail:\t%u\n", sq_tail);
>> +       seq_printf(m, "CachedSqHead:\t%u\n", ctx->cached_sq_head);
>> +       seq_printf(m, "CqMask:\t0x%x\n", cq_mask);
>> +       seq_printf(m, "CqHead:\t%u\n", cq_head);
>> +       seq_printf(m, "CqTail:\t%u\n", cq_tail);
>> +       seq_printf(m, "CachedCqTail:\t%u\n", ctx->cached_cq_tail);
>> +       seq_printf(m, "SQEs:\t%u\n", sq_tail - ctx->cached_sq_head);
>> +       sq_entries = min(sq_tail - sq_head, ctx->sq_entries);
>> +       for (i = 0; i < sq_entries; i++) {
>> +               unsigned int entry = i + sq_head;
>> +               unsigned int sq_idx = READ_ONCE(ctx->sq_array[entry & sq_mask]);
>>
>>                 if (likely(sq_idx <= sq_mask)) {
>>                         struct io_uring_sqe *sqe = &ctx->sq_sqes[sq_idx];
>> @@ -10097,9 +10100,11 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
>>                                    sq_idx, sqe->opcode, sqe->fd, sqe->flags, sqe->user_data);
>>                 }
>>         }
>> -       seq_printf(m, "CQEs:\t%u\n", cached_cq_tail - cq_head);
>> -       for (i = cq_head; i < cached_cq_tail; i++) {
>> -               struct io_uring_cqe *cqe = &r->cqes[i & cq_mask];
>> +       seq_printf(m, "CQEs:\t%u\n", cq_tail - cq_head);
>> +       cq_entries = min(cq_tail - cq_head, ctx->cq_entries);
>> +       for (i = 0; i < cq_entries; i++) {
>> +               unsigned int entry = i + cq_head;
>> +               struct io_uring_cqe *cqe = &r->cqes[entry & cq_mask];
>>
>>                 seq_printf(m, "%5u: user_data:%llu, res:%d, flag:%x\n",
>>                            i & cq_mask, cqe->user_data, cqe->res, cqe->flags);
> 
> Note: you probably want to replace (i & cq_mask) with (entry & cq_mask) here.
> 
> Otherwise, the patch looks good to me.

Thanks, good catch. I've changed it.

-- 
Jens Axboe



* Re: [BUG] About "io_uring: add more uring info to fdinfo for debug"
From: Hao Xu @ 2021-10-29 17:54 UTC (permalink / raw)
  To: Jens Axboe, Eric Dumazet, Eric Dumazet; +Cc: LKML

Hi Eric and Jens,
Sorry for the late reply, I missed this email.
On 2021/10/29 8:40 PM, Jens Axboe wrote:
> On 10/28/21 6:43 PM, Eric Dumazet wrote:
>> On Thu, Oct 28, 2021 at 5:13 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>
>>> On 10/28/21 3:40 PM, Jens Axboe wrote:
>>>> On 10/28/21 3:24 PM, Eric Dumazet wrote:
>>>>> Hi
>>>>>
>>>>> I was looking at commit 83f84356bc8f2d
>>>>> ("io_uring: add more uring info to fdinfo for debug") after receiving
>>>>> syzbot reports.
>>>>>
>>>>> I suspect that the following:
>>>>>
>>>>> +       for (i = cached_sq_head; i < sq_tail; i++) {
>>>>> +               unsigned int sq_idx = READ_ONCE(ctx->sq_array[i & sq_mask]);
>>>>> +
>>>>> +               if (likely(sq_idx <= sq_mask)) {
>>>>> +                       struct io_uring_sqe *sqe = &ctx->sq_sqes[sq_idx];
>>>>> +
>>>>> +                       seq_printf(m, "%5u: opcode:%d, fd:%d, flags:%x, user_data:%llu\n",
>>>>> +                                  sq_idx, sqe->opcode, sqe->fd, sqe->flags, sqe->user_data);
>>>>> +               }
>>>>> +       }
>>>>>
>>>>>
>>>>> This can loop ~2^32 times if sq_tail is close to ~0U.
Hi Eric,
I guess syzbot manually set sq_tail or sq_head to some value rather
than changing them indirectly by pushing sqes into the ring like a
normal program does. But surely a wrong setting of sq_tail/sq_head
could happen if a user uses the raw io_uring API. So my bad, thanks
for reporting!
>>>>>
>>>>> I see various READ_ONCE(), which are probably not good enough.
>>>>>
>>>>> At the very minimum I would handle wrapping...
Sorry for my poor English; do you mean the alignment of the output?
>>>>
>>>> Thanks for reporting this. I think on top of wrapping, the loop should
>>>> just be capped at sq_entries as well. There's no point dumping more than
>>>> that, ever.
>>>>
>>>> I'll take a stab at this.
>>>
>>> I'd probably do something like this - make sure wrap is sane and that we
>>> always cap at the max number of entries we expect. This doesn't quite
>>> hold true for CQEs, but honestly for debugging purposes, we only really
>>> care about the sq ring side in terms of stalls. Or if we have unreaped
>>> CQEs, which we'll still show.
>>>
>>> This also removes the masking, as it's better to expose the ring indexes
>>> directly. And just dump the raw ring head/tail for sq/cq. We still
>>> include the cached info, but I think dumping the raw contents is saner
>>> and more useful.
Hi Jens,
I was thinking that printing all the entries, including the unused
ones, would make the list very long (real products use hundreds or
thousands of entries while there may be only a few useful sqes). And
that may be confusing. An example: if sq_array[0-999] = 0 and only one
sqe, sqes[0], is in use, then we'll print sqes[0] 1000 times, though
we can work out the really useful one from the sq_head and sq_tail
info after some calculation.
And if we only print the useful ones, exposing the original counters
makes the loop condition a bit simpler than exposing the ring index
(since the original counters rely on natural overflow), though we
cannot do that now, having realized issues like the one Eric reports.

Regards,
Hao
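
The duplicate-printing scenario described above can be made concrete with a small sketch (a simplified, hypothetical layout; the real ring uses the shared sq_array indirection with the ring's own mask):

```c
#include <assert.h>

#define RING_ENTRIES 1024u

/* sq_array[slot] holds an index into the sqes[] array.  If userspace
 * leaves every slot at 0 and a dump walks all RING_ENTRIES slots, each
 * iteration resolves to the same sqe index, so the same sqe would be
 * printed RING_ENTRIES times. */
static unsigned int resolve_sqe_index(const unsigned int *sq_array,
				      unsigned int ring_counter)
{
	return sq_array[ring_counter & (RING_ENTRIES - 1)];
}
```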

>>>
>>>
>>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>>> index 17cb0e1b88f0..babd9950ae9f 100644
>>> --- a/fs/io_uring.c
>>> +++ b/fs/io_uring.c
>>> @@ -10065,12 +10065,11 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
>>>          struct io_overflow_cqe *ocqe;
>>>          struct io_rings *r = ctx->rings;
>>>          unsigned int sq_mask = ctx->sq_entries - 1, cq_mask = ctx->cq_entries - 1;
>>> -       unsigned int cached_sq_head = ctx->cached_sq_head;
>>> -       unsigned int cached_cq_tail = ctx->cached_cq_tail;
>>>          unsigned int sq_head = READ_ONCE(r->sq.head);
>>>          unsigned int sq_tail = READ_ONCE(r->sq.tail);
>>>          unsigned int cq_head = READ_ONCE(r->cq.head);
>>>          unsigned int cq_tail = READ_ONCE(r->cq.tail);
>>> +       unsigned int sq_entries, cq_entries;
>>>          bool has_lock;
>>>          unsigned int i;
>>>
>>> @@ -10080,15 +10079,19 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
>>>           * and sq_tail and cq_head are changed by userspace. But it's ok since
>>>           * we usually use these info when it is stuck.
>>>           */
>>> -       seq_printf(m, "SqHead:\t%u\n", sq_head & sq_mask);
>>> -       seq_printf(m, "SqTail:\t%u\n", sq_tail & sq_mask);
>>> -       seq_printf(m, "CachedSqHead:\t%u\n", cached_sq_head & sq_mask);
>>> -       seq_printf(m, "CqHead:\t%u\n", cq_head & cq_mask);
>>> -       seq_printf(m, "CqTail:\t%u\n", cq_tail & cq_mask);
>>> -       seq_printf(m, "CachedCqTail:\t%u\n", cached_cq_tail & cq_mask);
>>> -       seq_printf(m, "SQEs:\t%u\n", sq_tail - cached_sq_head);
>>> -       for (i = cached_sq_head; i < sq_tail; i++) {
>>> -               unsigned int sq_idx = READ_ONCE(ctx->sq_array[i & sq_mask]);
>>> +       seq_printf(m, "SqMask:\t\t0x%x\n", sq_mask);
>>> +       seq_printf(m, "SqHead:\t%u\n", sq_head);
>>> +       seq_printf(m, "SqTail:\t%u\n", sq_tail);
>>> +       seq_printf(m, "CachedSqHead:\t%u\n", ctx->cached_sq_head);
>>> +       seq_printf(m, "CqMask:\t0x%x\n", cq_mask);
>>> +       seq_printf(m, "CqHead:\t%u\n", cq_head);
>>> +       seq_printf(m, "CqTail:\t%u\n", cq_tail);
>>> +       seq_printf(m, "CachedCqTail:\t%u\n", ctx->cached_cq_tail);
>>> +       seq_printf(m, "SQEs:\t%u\n", sq_tail - ctx->cached_sq_head);
>>> +       sq_entries = min(sq_tail - sq_head, ctx->sq_entries);
>>> +       for (i = 0; i < sq_entries; i++) {
>>> +               unsigned int entry = i + sq_head;
>>> +               unsigned int sq_idx = READ_ONCE(ctx->sq_array[entry & sq_mask]);
>>>
>>>                  if (likely(sq_idx <= sq_mask)) {
>>>                          struct io_uring_sqe *sqe = &ctx->sq_sqes[sq_idx];
>>> @@ -10097,9 +10100,11 @@ static __cold void __io_uring_show_fdinfo(struct io_ring_ctx *ctx,
>>>                                     sq_idx, sqe->opcode, sqe->fd, sqe->flags, sqe->user_data);
>>>                  }
>>>          }
>>> -       seq_printf(m, "CQEs:\t%u\n", cached_cq_tail - cq_head);
>>> -       for (i = cq_head; i < cached_cq_tail; i++) {
>>> -               struct io_uring_cqe *cqe = &r->cqes[i & cq_mask];
>>> +       seq_printf(m, "CQEs:\t%u\n", cq_tail - cq_head);
>>> +       cq_entries = min(cq_tail - cq_head, ctx->cq_entries);
>>> +       for (i = 0; i < cq_entries; i++) {
>>> +               unsigned int entry = i + cq_head;
>>> +               struct io_uring_cqe *cqe = &r->cqes[entry & cq_mask];
>>>
>>>                  seq_printf(m, "%5u: user_data:%llu, res:%d, flag:%x\n",
>>>                             i & cq_mask, cqe->user_data, cqe->res, cqe->flags);
>>
>> Note : you probably want to replace  (i & cq_mask) to (entry & cq_mask) here
>>
>> Otherwise, patch looks good to me.
> 
> Thanks, good catch. I've changed it.
> 



* Re: [BUG] About "io_uring: add more uring info to fdinfo for debug"
From: Jens Axboe @ 2021-10-29 17:56 UTC (permalink / raw)
  To: Hao Xu, Eric Dumazet, Eric Dumazet; +Cc: LKML

On 10/29/21 11:54 AM, Hao Xu wrote:
> Hi Jens,
> I was thinking that printing all the entries, including the unused
> ones, would make the list very long (real products use hundreds or
> thousands of entries while there may be only a few useful sqes). And
> that may be confusing. An example: if sq_array[0-999] = 0 and only one
> sqe, sqes[0], is in use, then we'll print sqes[0] 1000 times, though
> we can work out the really useful one from the sq_head and sq_tail
> info after some calculation.
> And if we only print the useful ones, exposing the original counters
> makes the loop condition a bit simpler than exposing the ring index
> (since the original counters rely on natural overflow), though we
> cannot do that now, having realized issues like the one Eric reports.

We can revisit in the merge window; I wrote up the pull requests for
this branch (among others) already. The most important part was fixing
the looping to appropriately cap it.

-- 
Jens Axboe



* Re: [BUG] About "io_uring: add more uring info to fdinfo for debug"
From: Hao Xu @ 2021-10-29 17:58 UTC (permalink / raw)
  To: Jens Axboe, Eric Dumazet, Eric Dumazet; +Cc: LKML

On 2021/10/30 1:56 AM, Jens Axboe wrote:
> On 10/29/21 11:54 AM, Hao Xu wrote:
>> Hi Jens,
>> I was thinking that printing all the entries, including the unused
>> ones, would make the list very long (real products use hundreds or
>> thousands of entries while there may be only a few useful sqes). And
>> that may be confusing. An example: if sq_array[0-999] = 0 and only one
>> sqe, sqes[0], is in use, then we'll print sqes[0] 1000 times, though
>> we can work out the really useful one from the sq_head and sq_tail
>> info after some calculation.
>> And if we only print the useful ones, exposing the original counters
>> makes the loop condition a bit simpler than exposing the ring index
>> (since the original counters rely on natural overflow), though we
>> cannot do that now, having realized issues like the one Eric reports.
> 
> We can revisit in the merge window; I wrote up the pull requests for
> this branch (among others) already. The most important part was fixing
> the looping to appropriately cap it.
> 
Gotcha


