All of lore.kernel.org
 help / color / mirror / Atom feed
* fio assert
@ 2016-11-01  3:12 Jaze Lee
  2016-11-01  6:07 ` Sitsofe Wheeler
  0 siblings, 1 reply; 8+ messages in thread
From: Jaze Lee @ 2016-11-01  3:12 UTC (permalink / raw)
  To: fio

Hello,
   I found that when rate_iops used with write_lat_log, it will be
high probability
that the fio will assert. The below is the core dump.

(gdb) bt
#0  0x00007f3c3ccd25f7 in raise () from /lib64/libc.so.6
#1  0x00007f3c3ccd3ce8 in abort () from /lib64/libc.so.6
#2  0x00007f3c3cccb566 in __assert_fail_base () from /lib64/libc.so.6
#3  0x00007f3c3cccb612 in __assert_fail () from /lib64/libc.so.6
#4  0x000000000042249c in get_cur_log (iolog=0x7f3c390bb8b0) at stat.c:2004
#5  __add_log_sample (iolog=iolog@entry=0x7f3c390bb8b0,
val=val@entry=1, ddir=ddir@entry=DDIR_WRITE, bs=bs@entry=4096, t=1024,
offset=offset@entry=972521472)
    at stat.c:2019
#6  0x0000000000422767 in __add_log_sample (offset=972521472,
t=<optimized out>, bs=4096, ddir=DDIR_WRITE, val=1,
iolog=0x7f3c390bb8b0) at stat.c:2099
#7  add_log_sample (td=td@entry=0x7f3c1de55000, iolog=0x7f3c390bb8b0,
val=1, ddir=<optimized out>, bs=4096, offset=972521472) at stat.c:2131
#8  0x000000000042ef14 in add_log_sample (offset=<optimized out>,
bs=<optimized out>, ddir=<optimized out>, val=<optimized out>,
iolog=<optimized out>,
    td=0x7f3c1de55000) at stat.c:2269
#9  add_slat_sample (td=0x7f3c1de55000, ddir=<optimized out>,
usec=<optimized out>, bs=<optimized out>, offset=<optimized out>) at
stat.c:2270
#10 0x000000000043e976 in io_u_queued (td=<optimized out>,
io_u=<optimized out>) at io_u.c:2002
#11 0x000000000045a6ba in io_queue_event (td=td@entry=0x7f3c1de55000,
io_u=io_u@entry=0x7f3c04046200, ret=ret@entry=0x7f3c0e950c30,
    ddir=ddir@entry=DDIR_WRITE,
bytes_issued=bytes_issued@entry=0x7f3c0e950c60,
from_verify=from_verify@entry=0,
comp_time=comp_time@entry=0x7f3c0e950ce0)
    at backend.c:545
#12 0x000000000045d4af in do_io (bytes_done=<synthetic pointer>,
td=0x7f3c1de55000) at backend.c:1010
#13 thread_main (data=<optimized out>) at backend.c:1714
#14 0x00007f3c3d269dc5 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f3c3cd9321d in clone () from /lib64/libc.so.6



it points to https://github.com/axboe/fio/blob/master/stat.c#L2004
This line will judge whether nr_samples is bigger than max_samples.
I do not have time to dig into deeper.  Can someone explain more about
the bug, and is there a fix to this?

Thanks a lot.




-- 
谦谦君子

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: fio assert
  2016-11-01  3:12 fio assert Jaze Lee
@ 2016-11-01  6:07 ` Sitsofe Wheeler
       [not found]   ` <CAKZcxgc2k=o2_kbYdFmxScxC2Zns+SKi++=rdxvwD2vGj83RZg@mail.gmail.com>
  0 siblings, 1 reply; 8+ messages in thread
From: Sitsofe Wheeler @ 2016-11-01  6:07 UTC (permalink / raw)
  To: Jaze Lee; +Cc: fio

On 1 November 2016 at 03:12, Jaze Lee <jazeltq@gmail.com> wrote:
> Hello,
>    I found that when rate_iops used with write_lat_log, it will be
> high probability
> that the fio will assert. The below is the core dump.
>
> (gdb) bt
> #0  0x00007f3c3ccd25f7 in raise () from /lib64/libc.so.6
> #1  0x00007f3c3ccd3ce8 in abort () from /lib64/libc.so.6
> #2  0x00007f3c3cccb566 in __assert_fail_base () from /lib64/libc.so.6
> #3  0x00007f3c3cccb612 in __assert_fail () from /lib64/libc.so.6
> #4  0x000000000042249c in get_cur_log (iolog=0x7f3c390bb8b0) at stat.c:2004
> #5  __add_log_sample (iolog=iolog@entry=0x7f3c390bb8b0,
> val=val@entry=1, ddir=ddir@entry=DDIR_WRITE, bs=bs@entry=4096, t=1024,
> offset=offset@entry=972521472)
>     at stat.c:2019

Could you include the job file / command line that reproduces the
issue and the version of fio you were using?

-- 
Sitsofe | http://sucs.org/~sits/

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: fio assert
       [not found]       ` <CAKZcxgeXnostqxPAGi2ek1x+Tw7_CxrnjVzuy3QrJhwbERewnQ@mail.gmail.com>
@ 2016-11-03  7:31         ` Sitsofe Wheeler
  2016-12-01 10:03           ` Jaze Lee
  0 siblings, 1 reply; 8+ messages in thread
From: Sitsofe Wheeler @ 2016-11-03  7:31 UTC (permalink / raw)
  To: Jaze Lee; +Cc: fio, Jens Axboe

On 3 November 2016 at 01:58, Jaze Lee <jazeltq@gmail.com> wrote:
> Is there some news for this bug ?
>
> 2016-11-02 11:50 GMT+08:00 Jaze Lee <jazeltq@gmail.com>:
>> I found this is not related with rbd engine. If use libaio, and test
>> raw device. It will also happen.
>> the fio script is
>>
>> [global]
>> ioengine=libaio
>> filename=/dev/sda
>> direct=1
>> group_reporting
>> iodepth=32
>> thread
>>
>> [fio_iops_test]
>> description=fio_iops_test
>> runtime=600
>> bs=4K
>> rw=randread
>> write_lat_log=cpuiops
>> rate_iops=2000

Can you make sure that you keep the fio mailing list CC'd on your
mails rather than just replying to me directly? I'm just another fio
user like you...

I too can reproduce the problem with a slightly modified version of
your job file:
[fio_iops_test]
ioengine=libaio
direct=1
filename=/dev/nullb0
iodepth=2
bs=4K
rw=read
write_lat_log=/dev/null
rate_iops=2000

Within a second or two of running it the following assert is printed:
fio: stat.c:2004: get_cur_log: Assertion `iolog->pending->nr_samples <
iolog->pending->max_samples' failed.

Doing any of the following stops the assertion from triggering:
setting the iodepth to 1, removing the write_lat_log parameter,
removing the rate_iops parameter or adding the io_submit_mode=offload
parameter.

The problem seems to be due to the rate limiting somehow forcing iolog
growth pressure but there is an inability to grow the iolog at the
point we notice we need more room within stats. Normally the iolog
growing can be deferred until next submission but I guess when you
have to average over the samples in it (so as to work out the write
latency for a given period) this is an issue because the logging areas
aren't prepared for the change in size.

You could hack the fio code to force a bigger default iolog size so it
doesn't have to grow it later but a better way would be to use the
io_submit_mode=offload parameter
(https://github.com/axboe/fio/blob/fio-2.15/HOWTO#L888 ) which
decouples the submission from the completion/logging. Perhaps fio
could put up a warning suggesting its use if you use rate_iops and one
of the parameters that force averaging together but there could be
other ways of triggering the same issue...

-- 
Sitsofe | http://sucs.org/~sits/


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: fio assert
  2016-11-03  7:31         ` Sitsofe Wheeler
@ 2016-12-01 10:03           ` Jaze Lee
  2016-12-01 15:12             ` Jens Axboe
  0 siblings, 1 reply; 8+ messages in thread
From: Jaze Lee @ 2016-12-01 10:03 UTC (permalink / raw)
  To: Sitsofe Wheeler; +Cc: fio, Jens Axboe

Hello,
  Dose this issue can be fixed? Or the only method is to use offload
to workaround.


2016-11-03 15:31 GMT+08:00 Sitsofe Wheeler <sitsofe@gmail.com>:
> On 3 November 2016 at 01:58, Jaze Lee <jazeltq@gmail.com> wrote:
>> Is there some news for this bug ?
>>
>> 2016-11-02 11:50 GMT+08:00 Jaze Lee <jazeltq@gmail.com>:
>>> I found this is not related with rbd engine. If use libaio, and test
>>> raw device. It will also happen.
>>> the fio script is
>>>
>>> [global]
>>> ioengine=libaio
>>> filename=/dev/sda
>>> direct=1
>>> group_reporting
>>> iodepth=32
>>> thread
>>>
>>> [fio_iops_test]
>>> description=fio_iops_test
>>> runtime=600
>>> bs=4K
>>> rw=randread
>>> write_lat_log=cpuiops
>>> rate_iops=2000
>
> Can you make sure that you keep the fio mailing list CC'd on your
> mails rather than just replying to me directly? I'm just another fio
> user like you...
>
> I too can reproduce the problem with a slightly modified version of
> your job file:
> [fio_iops_test]
> ioengine=libaio
> direct=1
> filename=/dev/nullb0
> iodepth=2
> bs=4K
> rw=read
> write_lat_log=/dev/null
> rate_iops=2000
>
> Within a second or two of running it the following assert is printed:
> fio: stat.c:2004: get_cur_log: Assertion `iolog->pending->nr_samples <
> iolog->pending->max_samples' failed.
>
> Doing any of the following stops the assertion from triggering:
> setting the iodepth to 1, removing the write_lat_log parameter,
> removing the rate_iops parameter or adding the io_submit_mode=offload
> parameter.
>
> The problem seems to be due to the rate limiting somehow forcing iolog
> growth pressure but there is an inability to grow the iolog at the
> point we notice we need more room within stats. Normally the iolog
> growing can be deferred until next submission but I guess when you
> have to average over the samples in it (so as to work out the write
> latency for a given period) this is an issue because the logging areas
> aren't prepared for the change in size.
>
> You could hack the fio code to force a bigger default iolog size so it
> doesn't have to grow it later but a better way would be to use the
> io_submit_mode=offload parameter
> (https://github.com/axboe/fio/blob/fio-2.15/HOWTO#L888 ) which
> decouples the submission from the completion/logging. Perhaps fio
> could put up a warning suggesting its use if you use rate_iops and one
> of the parameters that force averaging together but there could be
> other ways of triggering the same issue...
>
> --
> Sitsofe | http://sucs.org/~sits/



-- 
谦谦君子


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: fio assert
  2016-12-01 10:03           ` Jaze Lee
@ 2016-12-01 15:12             ` Jens Axboe
  2016-12-02  0:10               ` Jaze Lee
  0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2016-12-01 15:12 UTC (permalink / raw)
  To: Jaze Lee, Sitsofe Wheeler; +Cc: fio

On 12/01/2016 03:03 AM, Jaze Lee wrote:
> Hello,
>   Dose this issue can be fixed? Or the only method is to use offload
> to workaround.

Can you try the below patch? We simply need to make sure that we always
regrow, when we enter an idle state. The logic was a little warped, the
below is cleaner.


diff --git a/backend.c b/backend.c
index 8616fc2..ac71521 100644
--- a/backend.c
+++ b/backend.c
@@ -441,11 +441,8 @@ static int wait_for_completions(struct thread_data *td, struct timeval *time)
 	int min_evts = 0;
 	int ret;
 
-	if (td->flags & TD_F_REGROW_LOGS) {
-		ret = io_u_quiesce(td);
-		regrow_logs(td);
-		return ret;
-	}
+	if (td->flags & TD_F_REGROW_LOGS)
+		return io_u_quiesce(td);
 
 	/*
 	 * if the queue is full, we MUST reap at least 1 event
diff --git a/io_u.c b/io_u.c
index 428d4b7..7420629 100644
--- a/io_u.c
+++ b/io_u.c
@@ -653,6 +653,9 @@ int io_u_quiesce(struct thread_data *td)
 			completed += ret;
 	}
 
+	if (td->flags & TD_F_REGROW_LOGS)
+		regrow_logs(td);
+
 	return completed;
 }
 

-- 
Jens Axboe



^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: fio assert
  2016-12-01 15:12             ` Jens Axboe
@ 2016-12-02  0:10               ` Jaze Lee
  2016-12-02  0:33                 ` Jens Axboe
  0 siblings, 1 reply; 8+ messages in thread
From: Jaze Lee @ 2016-12-02  0:10 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Sitsofe Wheeler, fio

Thanks, it works. But can u explain more about how it works?
I thought we may change stat.c.....

2016-12-01 23:12 GMT+08:00 Jens Axboe <axboe@kernel.dk>:
> On 12/01/2016 03:03 AM, Jaze Lee wrote:
>> Hello,
>>   Dose this issue can be fixed? Or the only method is to use offload
>> to workaround.
>
> Can you try the below patch? We simply need to make sure that we always
> regrow, when we enter an idle state. The logic was a little warped, the
> below is cleaner.
>
>
> diff --git a/backend.c b/backend.c
> index 8616fc2..ac71521 100644
> --- a/backend.c
> +++ b/backend.c
> @@ -441,11 +441,8 @@ static int wait_for_completions(struct thread_data *td, struct timeval *time)
>         int min_evts = 0;
>         int ret;
>
> -       if (td->flags & TD_F_REGROW_LOGS) {
> -               ret = io_u_quiesce(td);
> -               regrow_logs(td);
> -               return ret;
> -       }
> +       if (td->flags & TD_F_REGROW_LOGS)
> +               return io_u_quiesce(td);
>
>         /*
>          * if the queue is full, we MUST reap at least 1 event
> diff --git a/io_u.c b/io_u.c
> index 428d4b7..7420629 100644
> --- a/io_u.c
> +++ b/io_u.c
> @@ -653,6 +653,9 @@ int io_u_quiesce(struct thread_data *td)
>                         completed += ret;
>         }
>
> +       if (td->flags & TD_F_REGROW_LOGS)
> +               regrow_logs(td);
> +
>         return completed;
>  }
>
>
> --
> Jens Axboe
>



-- 
谦谦君子


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: fio assert
  2016-12-02  0:10               ` Jaze Lee
@ 2016-12-02  0:33                 ` Jens Axboe
  2016-12-02  4:10                   ` Jaze Lee
  0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2016-12-02  0:33 UTC (permalink / raw)
  To: Jaze Lee; +Cc: Sitsofe Wheeler, fio

On 12/01/2016 05:10 PM, Jaze Lee wrote:
> Thanks, it works. But can u explain more about how it works?
> I thought we may change stat.c.....

It wasn't a problem in stat.c, that was just the symptom. Fio has to
regrow the log segments, and it needs a call at the right time to do
that. For rated IO, we were missing that. The patch fixes that up.

Thanks for testing, I'll commit the fix.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: fio assert
  2016-12-02  0:33                 ` Jens Axboe
@ 2016-12-02  4:10                   ` Jaze Lee
  0 siblings, 0 replies; 8+ messages in thread
From: Jaze Lee @ 2016-12-02  4:10 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Sitsofe Wheeler, fio

OK. Thanks  Jens.

2016-12-02 8:33 GMT+08:00 Jens Axboe <axboe@kernel.dk>:
> On 12/01/2016 05:10 PM, Jaze Lee wrote:
>> Thanks, it works. But can u explain more about how it works?
>> I thought we may change stat.c.....
>
> It wasn't a problem in stat.c, that was just the symptom. Fio has to
> regrow the log segments, and it needs a call at the right time to do
> that. For rated IO, we were missing that. The patch fixes that up.
>
> Thanks for testing, I'll commit the fix.
>
> --
> Jens Axboe
>



-- 
谦谦君子


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-12-02  4:10 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-01  3:12 fio assert Jaze Lee
2016-11-01  6:07 ` Sitsofe Wheeler
     [not found]   ` <CAKZcxgc2k=o2_kbYdFmxScxC2Zns+SKi++=rdxvwD2vGj83RZg@mail.gmail.com>
     [not found]     ` <CAKZcxgfr_RfHD0XcDV0nMgFhU1nPNEzP0GciE-Unw-i-k3eaSw@mail.gmail.com>
     [not found]       ` <CAKZcxgeXnostqxPAGi2ek1x+Tw7_CxrnjVzuy3QrJhwbERewnQ@mail.gmail.com>
2016-11-03  7:31         ` Sitsofe Wheeler
2016-12-01 10:03           ` Jaze Lee
2016-12-01 15:12             ` Jens Axboe
2016-12-02  0:10               ` Jaze Lee
2016-12-02  0:33                 ` Jens Axboe
2016-12-02  4:10                   ` Jaze Lee

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.