All of lore.kernel.org
 help / color / mirror / Atom feed
From: Chris Taylor <ctaylor@pivot3.com>
To: Sitsofe Wheeler <sitsofe@gmail.com>
Cc: "fio@vger.kernel.org" <fio@vger.kernel.org>
Subject: RE: FIO log averaging issue with --write_bw_log and --write_iops_log using > 1000 log_avg_msec values
Date: Sat, 1 Apr 2017 01:32:25 +0000	[thread overview]
Message-ID: <CY4PR10MB14942012DEB923224A03FAE28C360@CY4PR10MB1494.namprd10.prod.outlook.com> (raw)
In-Reply-To: <CALjAwxiwMA89bJHs9-_MN9eNNibJTtPUBqrMCN6LbMD+kF_Ddw@mail.gmail.com>

Hi Sitsofe,
Yes, that is correct.


-----Original Message-----
From: Sitsofe Wheeler [mailto:sitsofe@gmail.com] 
Sent: Friday, March 31, 2017 12:35 AM
To: Chris Taylor <ctaylor@pivot3.com>
Cc: fio@vger.kernel.org
Subject: Re: FIO log averaging issue with --write_bw_log and --write_iops_log using > 1000 log_avg_msec values

Hi Chris,

Sorry for the slow reply. So this change is to deal with the case where the time being added is two more seconds in size?

On 2 March 2017 at 21:52, Chris Taylor <ctaylor@pivot3.com> wrote:
> Hi,
> I am less concerned about any possible timing issues as they are probably miniscule and I can't think of the scenario where it would be a problem and cause less accurate results to really be concerned about.
> Tests pretty much prove we are getting close to what we expect:
> fio --eta=never --ioengine=libaio --readwrite=write --time_based=1 
> --runtime=60 --filename=/tmp/fio --size=1M --direct=1 --name=test 
> --write_iops_log=/tmp/test --log_avg_msec=5500 --log_unix_epoch=1 
> 1488489137632, 6413, 1, 0 1488489143131, 5503, 1, 0 1488489148631, 
> 5585, 1, 0 1488489154131, 5883, 1, 0 1488489159631, 5726, 1, 0 
> 1488489165131, 5331, 1, 0 1488489170631, 5772, 1, 0 1488489176131, 
> 5661, 1, 0 1488489181631, 5970, 1, 0 1488489187131, 5776, 1, 0 [5499, 
> 5500, 5500, 5500, 5500, 5500, 5500, 5500, 5500]
>
> However, the problem still exists where in timeval_add_msec gets a 
> value greater than 1000 msecs the tv->tv_usec it will grow beyond 
> usecs (1000000) and causes unexpected behavior.  Possibly this is the 
> better fix to avoid other callers having to adjust for  the overflow 
> in timeval_add_msecs?\ diff --git a/time.c b/time.c index 
> f5dc049..a318ad5 100644
> --- a/time.c
> +++ b/time.c
> @@ -9,10 +9,9 @@ static unsigned long ns_granularity;  void 
> timeval_add_msec(struct timeval *tv, unsigned int msec)  {
>         tv->tv_usec += 1000 * msec;
> -       if (tv->tv_usec >= 1000000) {
> -               tv->tv_usec -= 1000000;
> -               tv->tv_sec++;
> -       }
> +    tv->tv_sec += tv->tv_usec / 1000000;
> +    tv->tv_usec %= 1000000;
> +
>  }
>
>  /*
> Running the same job as above with the change:
> 1488489763099, 5773, 1, 0
> 1488489768599, 5852, 1, 0
> 1488489774099, 5654, 1, 0
> 1488489779599, 5482, 1, 0
> 1488489785099, 5650, 1, 0
> 1488489790599, 5752, 1, 0
> 1488489796099, 5390, 1, 0
> 1488489801599, 5746, 1, 0
> 1488489807099, 5610, 1, 0
> 1488489812599, 5361, 1, 0
> [5500, 5500, 5500, 5500, 5500, 5500, 5500, 5500, 5500]
>
> Thanks,
> Chris
>
> -----Original Message-----
> From: Sitsofe Wheeler [mailto:sitsofe@gmail.com]
> Sent: Tuesday, February 28, 2017 11:50 AM
> To: Chris Taylor <ctaylor@pivot3.com>
> Cc: fio@vger.kernel.org
> Subject: Re: FIO log averaging issue with --write_bw_log and 
> --write_iops_log using > 1000 log_avg_msec values
>
> Hi,
>
> Re timing issues: do you mean because of usec /= 1000; on line 428? As mtime_since() returns milliseconds it would mean all fractional millisecond amounts would be pushed downward so potentially things could be be under by a millisecond. If that's what you're thinking of I guess how bad that is depends on how much accuracy you need and how much jitter there is within the system already. Perhaps you can demonstrate a problem scenario so folks numerate than I will see the problem too...
>
> Re rounding problem - fair enough.
>
> On 28 February 2017 at 00:47, Chris Taylor <ctaylor@pivot3.com> wrote:
>> Hi Sitsofe,
>> Looks like this is handled in commit be6bb2b72608d7efbec13d06c67446e229136afa.
>> I am no longer able to reproduce the issue with latest version of FIO 2.17, with that change as usecs will now be 0.  Could this potentially cause some timing issues?
>>
>> Prior to that commit...
>> The job below would result in bw_logs and iops_log that have skewed 
>> results both the add_iops_samples and add_bw_samples makes a call to 
>> mtime_since with start time as the _sample_time
>> ./stat.c:2435:  spent = mtime_since(&td->bw_sample_time, t)
>> ./stat.c:2510:  spent = mtime_since(&td->iops_sample_time, t);
>>
>> ./stat.c:2473:  timeval_add_msec(&td->bw_sample_time,
>> td->o.bw_avg_time);
>> ./stat.c:2548:  timeval_add_msec(&td->iops_sample_time,
>> td->o.iops_avg_time);
>>
>> stat.c:2480:27: runtime error: unsigned integer overflow: 2000 - 18446744073708449 cannot be represented in type 'unsigned long'
>> stat.c:2555:29: runtime error: unsigned integer overflow: 2000 - 18446744073708449 cannot be represented in type 'unsigned long'
>>
>> The overflow at stat.c:2265 is a different issue possibly a rounding problem with msec and usecs.
>
> --
> Sitsofe | http://sucs.org/~sits/
>
> On 28 February 2017 at 00:47, Chris Taylor <ctaylor@pivot3.com> wrote:
>> Hi Sitsofe,
>> Looks like this is handled in commit be6bb2b72608d7efbec13d06c67446e229136afa.
>> I am no longer able to reproduce the issue with latest version of FIO 2.17, with that change as usecs will now be 0.  Could this potentially cause some timing issues?
>>
>> Prior to that commit...
>> The job below would result in bw_logs and iops_log that have skewed 
>> results both the add_iops_samples and add_bw_samples makes a call to 
>> mtime_since with start time as the _sample_time
>> ./stat.c:2435:  spent = mtime_since(&td->bw_sample_time, t)
>> ./stat.c:2510:  spent = mtime_since(&td->iops_sample_time, t);
>>
>> ./stat.c:2473:  timeval_add_msec(&td->bw_sample_time,
>> td->o.bw_avg_time);
>> ./stat.c:2548:  timeval_add_msec(&td->iops_sample_time,
>> td->o.iops_avg_time);
>>
>> stat.c:2480:27: runtime error: unsigned integer overflow: 2000 - 18446744073708449 cannot be represented in type 'unsigned long'
>> stat.c:2555:29: runtime error: unsigned integer overflow: 2000 - 18446744073708449 cannot be represented in type 'unsigned long'
>>
>> The overflow at stat.c:2265 is a different issue possibly a rounding problem with msec and usecs.
>>
>>
>> Thanks,
>> Chris
>>
>> On 23 February 2017 at 20:55, Chris Taylor <ctaylor@pivot3.com> wrote:
>>> There is an issue in FIO when using the detailed bandwidth and iops logging with averaging over a period of time > 1 second.
>>> It seems that usecs overflows which later causes negative time diff values resulting in skewed toward 0 results.
>>> I have attached a potential fix that should prevent usecs from going beyond 1000000.
>>
>> Is this problem shown by the following job:
>>
>> ./fio --ioengine=null --write_bw_log=/dev/null --log_avg_msec=2000 
>> --size=100M --runtime=10s --time_based --name=overflow
>>
>> Among Clang's undefined behaviour sanitizer output was this for the above:
>> stat.c:2265:43: runtime error: unsigned integer overflow: 1999 - 2000 cannot be represented in type 'unsigned long'
>> stat.c:2265:28: runtime error: unsigned integer overflow: 3999 -
>> 4294967295 cannot be represented in type 'unsigned long'
>>
>>> diff --git a/time.c b/time.c
>>> index f5dc049..c748bee 100644
>>> --- a/time.c
>>> +++ b/time.c
>>> @@ -8,11 +8,18 @@ static unsigned long ns_granularity;
>>>
>>> void timeval_add_msec(struct timeval *tv, unsigned int msec) {
>>> -       tv->tv_usec += 1000 * msec;
>>> -       if (tv->tv_usec >= 1000000) {
>>> -               tv->tv_usec -= 1000000;
>>> -               tv->tv_sec++;
>>> -       }
>>> +       int adj_usec = 1000 * msec;
>>> +       int adj_sec = 0;
>>> +       tv->tv_usec +=  adj_usec;
>>> +       if (adj_usec >= 1000000) {
>>> +               adj_sec = adj_usec / 1000000;
>>> +               tv->tv_usec -=  adj_sec * 1000000;
>>> +               tv->tv_sec += adj_sec;
>>> +    }
>>> +    if (tv->tv_usec >= 1000000){
>>> +            tv->tv_usec -= 1000000;
>>> +            tv->tv_sec++;
>>> +    }
>>> }
>>
>> Is it still safe to use int if you're targeting a 32 bit platform?
>> --
>> Sitsofe | http://sucs.org/~sits/
>
>
>
> --
> Sitsofe | http://sucs.org/~sits/



--
Sitsofe | http://sucs.org/~sits/

  reply	other threads:[~2017-04-01  1:32 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-02-23 20:55 FIO log averaging issue with --write_bw_log and --write_iops_log using > 1000 log_avg_msec values Chris Taylor
2017-02-24  1:24 ` Sitsofe Wheeler
2017-02-28  0:47   ` Chris Taylor
2017-02-28 18:49     ` Sitsofe Wheeler
2017-03-02 21:52       ` Chris Taylor
2017-03-31  6:34         ` Sitsofe Wheeler
2017-04-01  1:32           ` Chris Taylor [this message]
2017-04-02  8:07             ` Sitsofe Wheeler
2017-04-02 21:56               ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CY4PR10MB14942012DEB923224A03FAE28C360@CY4PR10MB1494.namprd10.prod.outlook.com \
    --to=ctaylor@pivot3.com \
    --cc=fio@vger.kernel.org \
    --cc=sitsofe@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.