* fio assert @ 2016-11-01 3:12 Jaze Lee 2016-11-01 6:07 ` Sitsofe Wheeler 0 siblings, 1 reply; 8+ messages in thread From: Jaze Lee @ 2016-11-01 3:12 UTC (permalink / raw) To: fio Hello, I found that when rate_iops used with write_lat_log, it will be high probability that the fio will assert. The below is the core dump. (gdb) bt #0 0x00007f3c3ccd25f7 in raise () from /lib64/libc.so.6 #1 0x00007f3c3ccd3ce8 in abort () from /lib64/libc.so.6 #2 0x00007f3c3cccb566 in __assert_fail_base () from /lib64/libc.so.6 #3 0x00007f3c3cccb612 in __assert_fail () from /lib64/libc.so.6 #4 0x000000000042249c in get_cur_log (iolog=0x7f3c390bb8b0) at stat.c:2004 #5 __add_log_sample (iolog=iolog@entry=0x7f3c390bb8b0, val=val@entry=1, ddir=ddir@entry=DDIR_WRITE, bs=bs@entry=4096, t=1024, offset=offset@entry=972521472) at stat.c:2019 #6 0x0000000000422767 in __add_log_sample (offset=972521472, t=<optimized out>, bs=4096, ddir=DDIR_WRITE, val=1, iolog=0x7f3c390bb8b0) at stat.c:2099 #7 add_log_sample (td=td@entry=0x7f3c1de55000, iolog=0x7f3c390bb8b0, val=1, ddir=<optimized out>, bs=4096, offset=972521472) at stat.c:2131 #8 0x000000000042ef14 in add_log_sample (offset=<optimized out>, bs=<optimized out>, ddir=<optimized out>, val=<optimized out>, iolog=<optimized out>, td=0x7f3c1de55000) at stat.c:2269 #9 add_slat_sample (td=0x7f3c1de55000, ddir=<optimized out>, usec=<optimized out>, bs=<optimized out>, offset=<optimized out>) at stat.c:2270 #10 0x000000000043e976 in io_u_queued (td=<optimized out>, io_u=<optimized out>) at io_u.c:2002 #11 0x000000000045a6ba in io_queue_event (td=td@entry=0x7f3c1de55000, io_u=io_u@entry=0x7f3c04046200, ret=ret@entry=0x7f3c0e950c30, ddir=ddir@entry=DDIR_WRITE, bytes_issued=bytes_issued@entry=0x7f3c0e950c60, from_verify=from_verify@entry=0, comp_time=comp_time@entry=0x7f3c0e950ce0) at backend.c:545 #12 0x000000000045d4af in do_io (bytes_done=<synthetic pointer>, td=0x7f3c1de55000) at backend.c:1010 #13 thread_main (data=<optimized out>) at backend.c:1714 #14 0x00007f3c3d269dc5 in start_thread () from /lib64/libpthread.so.0 #15 0x00007f3c3cd9321d in clone () from /lib64/libc.so.6 it points to https://github.com/axboe/fio/blob/master/stat.c#L2004 This line will judge whether nr_samples is bigger than max_samples. I do not have time to dig into deeper. Can someone explain more about the bug, and is there a fix to this? Thanks a lot. -- 谦谦君子 ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: fio assert 2016-11-01 3:12 fio assert Jaze Lee @ 2016-11-01 6:07 ` Sitsofe Wheeler [not found] ` <CAKZcxgc2k=o2_kbYdFmxScxC2Zns+SKi++=rdxvwD2vGj83RZg@mail.gmail.com> 0 siblings, 1 reply; 8+ messages in thread From: Sitsofe Wheeler @ 2016-11-01 6:07 UTC (permalink / raw) To: Jaze Lee; +Cc: fio On 1 November 2016 at 03:12, Jaze Lee <jazeltq@gmail.com> wrote: > Hello, > I found that when rate_iops used with write_lat_log, it will be > high probability > that the fio will assert. The below is the core dump. > > (gdb) bt > #0 0x00007f3c3ccd25f7 in raise () from /lib64/libc.so.6 > #1 0x00007f3c3ccd3ce8 in abort () from /lib64/libc.so.6 > #2 0x00007f3c3cccb566 in __assert_fail_base () from /lib64/libc.so.6 > #3 0x00007f3c3cccb612 in __assert_fail () from /lib64/libc.so.6 > #4 0x000000000042249c in get_cur_log (iolog=0x7f3c390bb8b0) at stat.c:2004 > #5 __add_log_sample (iolog=iolog@entry=0x7f3c390bb8b0, > val=val@entry=1, ddir=ddir@entry=DDIR_WRITE, bs=bs@entry=4096, t=1024, > offset=offset@entry=972521472) > at stat.c:2019 Could you include the job file / command line that reproduces the issue and the version of fio you were using? -- Sitsofe | http://sucs.org/~sits/ ^ permalink raw reply [flat|nested] 8+ messages in thread
[parent not found: <CAKZcxgc2k=o2_kbYdFmxScxC2Zns+SKi++=rdxvwD2vGj83RZg@mail.gmail.com>]
[parent not found: <CAKZcxgfr_RfHD0XcDV0nMgFhU1nPNEzP0GciE-Unw-i-k3eaSw@mail.gmail.com>]
[parent not found: <CAKZcxgeXnostqxPAGi2ek1x+Tw7_CxrnjVzuy3QrJhwbERewnQ@mail.gmail.com>]
* Re: fio assert [not found] ` <CAKZcxgeXnostqxPAGi2ek1x+Tw7_CxrnjVzuy3QrJhwbERewnQ@mail.gmail.com> @ 2016-11-03 7:31 ` Sitsofe Wheeler 2016-12-01 10:03 ` Jaze Lee 0 siblings, 1 reply; 8+ messages in thread From: Sitsofe Wheeler @ 2016-11-03 7:31 UTC (permalink / raw) To: Jaze Lee; +Cc: fio, Jens Axboe On 3 November 2016 at 01:58, Jaze Lee <jazeltq@gmail.com> wrote: > Is there some news for this bug ? > > 2016-11-02 11:50 GMT+08:00 Jaze Lee <jazeltq@gmail.com>: >> I found this is not related with rbd engine. If use libaio, and test >> raw device. It will also happen. >> the fio script is >> >> [global] >> ioengine=libaio >> filename=/dev/sda >> direct=1 >> group_reporting >> iodepth=32 >> thread >> >> [fio_iops_test] >> description=fio_iops_test >> runtime=600 >> bs=4K >> rw=randread >> write_lat_log=cpuiops >> rate_iops=2000 Can you make sure that you keep the fio mailing list CC'd on your mails rather than just replying to me directly? I'm just another fio user like you... I too can reproduce the problem with a slightly modified version of your job file: [fio_iops_test] ioengine=libaio direct=1 filename=/dev/nullb0 iodepth=2 bs=4K rw=read write_lat_log=/dev/null rate_iops=2000 Within a second or two of running it the following assert is printed: fio: stat.c:2004: get_cur_log: Assertion `iolog->pending->nr_samples < iolog->pending->max_samples' failed. Doing any of the following stops the assertion from triggering: setting the iodepth to 1, removing the write_lat_log parameter, removing the rate_iops parameter or adding the io_submit_mode=offload parameter. The problem seems to be due to the rate limiting somehow forcing iolog growth pressure but there is an inability to grow the iolog at the point we notice we need more room within stats. Normally the iolog growing can be deferred until next submission but I guess when you have to average over the samples in it (so as to work out the write latency for a given period) this is an issue because the logging areas aren't prepared for the change in size. You could hack the fio code to force a bigger default iolog size so it doesn't have to grow it later but a better way would be to use the io_submit_mode=offload parameter (https://github.com/axboe/fio/blob/fio-2.15/HOWTO#L888 ) which decouples the submission from the completion/logging. Perhaps fio could put up a warning suggesting its use if you use rate_iops and one of the parameters that force averaging together but there could be other ways of triggering the same issue... -- Sitsofe | http://sucs.org/~sits/ ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: fio assert 2016-11-03 7:31 ` Sitsofe Wheeler @ 2016-12-01 10:03 ` Jaze Lee 2016-12-01 15:12 ` Jens Axboe 0 siblings, 1 reply; 8+ messages in thread From: Jaze Lee @ 2016-12-01 10:03 UTC (permalink / raw) To: Sitsofe Wheeler; +Cc: fio, Jens Axboe Hello, Dose this issue can be fixed? Or the only method is to use offload to workaround. 2016-11-03 15:31 GMT+08:00 Sitsofe Wheeler <sitsofe@gmail.com>: > On 3 November 2016 at 01:58, Jaze Lee <jazeltq@gmail.com> wrote: >> Is there some news for this bug ? >> >> 2016-11-02 11:50 GMT+08:00 Jaze Lee <jazeltq@gmail.com>: >>> I found this is not related with rbd engine. If use libaio, and test >>> raw device. It will also happen. >>> the fio script is >>> >>> [global] >>> ioengine=libaio >>> filename=/dev/sda >>> direct=1 >>> group_reporting >>> iodepth=32 >>> thread >>> >>> [fio_iops_test] >>> description=fio_iops_test >>> runtime=600 >>> bs=4K >>> rw=randread >>> write_lat_log=cpuiops >>> rate_iops=2000 > > Can you make sure that you keep the fio mailing list CC'd on your > mails rather than just replying to me directly? I'm just another fio > user like you... > > I too can reproduce the problem with a slightly modified version of > your job file: > [fio_iops_test] > ioengine=libaio > direct=1 > filename=/dev/nullb0 > iodepth=2 > bs=4K > rw=read > write_lat_log=/dev/null > rate_iops=2000 > > Within a second or two of running it the following assert is printed: > fio: stat.c:2004: get_cur_log: Assertion `iolog->pending->nr_samples < > iolog->pending->max_samples' failed. > > Doing any of the following stops the assertion from triggering: > setting the iodepth to 1, removing the write_lat_log parameter, > removing the rate_iops parameter or adding the io_submit_mode=offload > parameter. > > The problem seems to be due to the rate limiting somehow forcing iolog > growth pressure but there is an inability to grow the iolog at the > point we notice we need more room within stats. Normally the iolog > growing can be deferred until next submission but I guess when you > have to average over the samples in it (so as to work out the write > latency for a given period) this is an issue because the logging areas > aren't prepared for the change in size. > > You could hack the fio code to force a bigger default iolog size so it > doesn't have to grow it later but a better way would be to use the > io_submit_mode=offload parameter > (https://github.com/axboe/fio/blob/fio-2.15/HOWTO#L888 ) which > decouples the submission from the completion/logging. Perhaps fio > could put up a warning suggesting its use if you use rate_iops and one > of the parameters that force averaging together but there could be > other ways of triggering the same issue... > > -- > Sitsofe | http://sucs.org/~sits/ -- 谦谦君子 ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: fio assert 2016-12-01 10:03 ` Jaze Lee @ 2016-12-01 15:12 ` Jens Axboe 2016-12-02 0:10 ` Jaze Lee 0 siblings, 1 reply; 8+ messages in thread From: Jens Axboe @ 2016-12-01 15:12 UTC (permalink / raw) To: Jaze Lee, Sitsofe Wheeler; +Cc: fio On 12/01/2016 03:03 AM, Jaze Lee wrote: > Hello, > Dose this issue can be fixed? Or the only method is to use offload > to workaround. Can you try the below patch? We simply need to make sure that we always regrow, when we enter an idle state. The logic was a little warped, the below is cleaner. diff --git a/backend.c b/backend.c index 8616fc2..ac71521 100644 --- a/backend.c +++ b/backend.c @@ -441,11 +441,8 @@ static int wait_for_completions(struct thread_data *td, struct timeval *time) int min_evts = 0; int ret; - if (td->flags & TD_F_REGROW_LOGS) { - ret = io_u_quiesce(td); - regrow_logs(td); - return ret; - } + if (td->flags & TD_F_REGROW_LOGS) + return io_u_quiesce(td); /* * if the queue is full, we MUST reap at least 1 event diff --git a/io_u.c b/io_u.c index 428d4b7..7420629 100644 --- a/io_u.c +++ b/io_u.c @@ -653,6 +653,9 @@ int io_u_quiesce(struct thread_data *td) completed += ret; } + if (td->flags & TD_F_REGROW_LOGS) + regrow_logs(td); + return completed; } -- Jens Axboe ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: fio assert 2016-12-01 15:12 ` Jens Axboe @ 2016-12-02 0:10 ` Jaze Lee 2016-12-02 0:33 ` Jens Axboe 0 siblings, 1 reply; 8+ messages in thread From: Jaze Lee @ 2016-12-02 0:10 UTC (permalink / raw) To: Jens Axboe; +Cc: Sitsofe Wheeler, fio Thanks, it works. But can u explain more about how it works? I thought we may change stat.c..... 2016-12-01 23:12 GMT+08:00 Jens Axboe <axboe@kernel.dk>: > On 12/01/2016 03:03 AM, Jaze Lee wrote: >> Hello, >> Dose this issue can be fixed? Or the only method is to use offload >> to workaround. > > Can you try the below patch? We simply need to make sure that we always > regrow, when we enter an idle state. The logic was a little warped, the > below is cleaner. > > > diff --git a/backend.c b/backend.c > index 8616fc2..ac71521 100644 > --- a/backend.c > +++ b/backend.c > @@ -441,11 +441,8 @@ static int wait_for_completions(struct thread_data *td, struct timeval *time) > int min_evts = 0; > int ret; > > - if (td->flags & TD_F_REGROW_LOGS) { > - ret = io_u_quiesce(td); > - regrow_logs(td); > - return ret; > - } > + if (td->flags & TD_F_REGROW_LOGS) > + return io_u_quiesce(td); > > /* > * if the queue is full, we MUST reap at least 1 event > diff --git a/io_u.c b/io_u.c > index 428d4b7..7420629 100644 > --- a/io_u.c > +++ b/io_u.c > @@ -653,6 +653,9 @@ int io_u_quiesce(struct thread_data *td) > completed += ret; > } > > + if (td->flags & TD_F_REGROW_LOGS) > + regrow_logs(td); > + > return completed; > } > > > -- > Jens Axboe > -- 谦谦君子 ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: fio assert 2016-12-02 0:10 ` Jaze Lee @ 2016-12-02 0:33 ` Jens Axboe 2016-12-02 4:10 ` Jaze Lee 0 siblings, 1 reply; 8+ messages in thread From: Jens Axboe @ 2016-12-02 0:33 UTC (permalink / raw) To: Jaze Lee; +Cc: Sitsofe Wheeler, fio On 12/01/2016 05:10 PM, Jaze Lee wrote: > Thanks, it works. But can u explain more about how it works? > I thought we may change stat.c..... It wasn't a problem in stat.c, that was just the symptom. Fio has to regrow the log segments, and it needs a call at the right time to do that. For rated IO, we were missing that. The patch fixes that up. Thanks for testing, I'll commit the fix. -- Jens Axboe ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: fio assert 2016-12-02 0:33 ` Jens Axboe @ 2016-12-02 4:10 ` Jaze Lee 0 siblings, 0 replies; 8+ messages in thread From: Jaze Lee @ 2016-12-02 4:10 UTC (permalink / raw) To: Jens Axboe; +Cc: Sitsofe Wheeler, fio OK. Thanks Jens. 2016-12-02 8:33 GMT+08:00 Jens Axboe <axboe@kernel.dk>: > On 12/01/2016 05:10 PM, Jaze Lee wrote: >> Thanks, it works. But can u explain more about how it works? >> I thought we may change stat.c..... > > It wasn't a problem in stat.c, that was just the symptom. Fio has to > regrow the log segments, and it needs a call at the right time to do > that. For rated IO, we were missing that. The patch fixes that up. > > Thanks for testing, I'll commit the fix. > > -- > Jens Axboe > -- 谦谦君子 ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2016-12-02 4:10 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2016-11-01 3:12 fio assert Jaze Lee 2016-11-01 6:07 ` Sitsofe Wheeler [not found] ` <CAKZcxgc2k=o2_kbYdFmxScxC2Zns+SKi++=rdxvwD2vGj83RZg@mail.gmail.com> [not found] ` <CAKZcxgfr_RfHD0XcDV0nMgFhU1nPNEzP0GciE-Unw-i-k3eaSw@mail.gmail.com> [not found] ` <CAKZcxgeXnostqxPAGi2ek1x+Tw7_CxrnjVzuy3QrJhwbERewnQ@mail.gmail.com> 2016-11-03 7:31 ` Sitsofe Wheeler 2016-12-01 10:03 ` Jaze Lee 2016-12-01 15:12 ` Jens Axboe 2016-12-02 0:10 ` Jaze Lee 2016-12-02 0:33 ` Jens Axboe 2016-12-02 4:10 ` Jaze Lee
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.