* fio hangs with --status-interval
@ 2014-07-09 22:56 Michael Mattsson
2014-07-10 8:44 ` Jens Axboe
0 siblings, 1 reply; 25+ messages in thread
From: Michael Mattsson @ 2014-07-09 22:56 UTC (permalink / raw)
To: fio
Hey,
I've got 8 identical CentOS 6.5 clients on which fio randomly hangs
when using --status-interval. I've tried fio 2.1.4 and fio 2.1.10;
they both behave the same. I've also tried piping the output to tee
instead of redirecting to a file, and --output with a specified
output file; still the same problem. My fio command runs through its
tests flawlessly without --status-interval and exits cleanly every
time. Anywhere from 0 to 5 clients get affected on a given run.
Running strace on a process that seems hung yields the following
output:
$ strace -p 31055
Process 31055 attached - interrupt to quit
futex(0x7f346ede802c, FUTEX_WAIT, 1, NULL
It will sit there forever. Excuse the splurge, but this is the
command (run on 8 clients simultaneously, where /fut1 and /fut2 are
on shared storage (NFSv3)):
$ fio --minimal --direct=1 --group_reporting --filesize=64m
--norandommap --blocksize=4k --time_based --iodepth=1 --ramp_time=15
--ioengine=libaio --status-interval=5 --name=mytest-foo --rw=randread
--numjobs=64 --filename=/fut1/6of4_1:/fut2/6of4_1:/fut1/6of4_2:/fut2/6of4_2:/fut1/6of4_3:/fut2/6of4_3:/fut1/6of4_4:/fut2/6of4_4:/fut1/6of4_5:/fut2/6of4_5:/fut1/6of4_6:/fut2/6of4_6:/fut1/6of4_7:/fut2/6of4_7:/fut1/6of4_8:/fut2/6of4_8:/fut1/6of4_9:/fut2/6of4_9:/fut1/6of4_10:/fut2/6of4_10:/fut1/6of4_11:/fut2/6of4_11:/fut1/6of4_12:/fut2/6of4_12:/fut1/6of4_13:/fut2/6of4_13:/fut1/6of4_14:/fut2/6of4_14:/fut1/6of4_15:/fut2/6of4_15:/fut1/6of4_16:/fut2/6of4_16:/fut1/6of4_17:/fut2/6of4_17:/fut1/6of4_18:/fut2/6of4_18:/fut1/6of4_19:/fut2/6of4_19:/fut1/6of4_20:/fut2/6of4_20:/fut1/6of4_21:/fut2/6of4_21:/fut1/6of4_22:/fut2/6of4_22:/fut1/6of4_23:/fut2/6of4_23:/fut1/6of4_24:/fut2/6of4_24:/fut1/6of4_25:/fut2/6of4_25:/fut1/6of4_26:/fut2/6of4_26:/fut1/6of4_27:/fut2/6of4_27:/fut1/6of4_28:/fut2/6of4_28:/fut1/6of4_29:/fut2/6of4_29:/fut1/6of4_30:/fut2/6of4_30:/fut1/6of4_31:/fut2/6of4_31:/fut1/6of4_32:/fut2/6of4_32
--runtime=60
Worth noting is that the output file is redirected to shared storage
(NFSv4) on a different system than the one under test.
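As an aside, a --filename list like the one above doesn't have to be typed by hand. A small sketch (using the /fut1, /fut2 mount points and the count of 32 files per mount from the command above) that builds the colon-separated argument:

```shell
#!/bin/bash
# Build the colon-separated --filename argument used above:
# /fut1/6of4_1:/fut2/6of4_1:...:/fut1/6of4_32:/fut2/6of4_32
files=""
for i in $(seq 1 32); do
    # append a :-separator only after the first pair
    files="${files:+$files:}/fut1/6of4_$i:/fut2/6of4_$i"
done
echo "$files"
```

The result can then be passed as `--filename="$files"`.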
These are the fio 2.1.4 compile options:
$ ./configure
Operating system Linux
CPU x86_64
Big endian no
Compiler gcc
Cross compile no
Wordsize 64
zlib no
Linux AIO support yes
POSIX AIO support yes
POSIX AIO support needs -lrt yes
POSIX AIO fsync yes
Solaris AIO support no
__sync_fetch_and_add yes
libverbs no
rdmacm no
Linux fallocate yes
POSIX fadvise yes
POSIX fallocate yes
sched_setaffinity(3 arg) yes
sched_setaffinity(2 arg) no
clock_gettime yes
CLOCK_MONOTONIC yes
CLOCK_MONOTONIC_PRECISE no
gettimeofday yes
fdatasync yes
sync_file_range yes
EXT4 move extent yes
Linux splice(2) yes
GUASI no
Fusion-io atomic engine no
libnuma no
strsep yes
strcasestr yes
getopt_long_only() yes
inet_aton yes
socklen_t yes
__thread yes
gtk 2.18 or higher no
RUSAGE_THREAD yes
SCHED_IDLE yes
TCP_NODELAY yes
RLIMIT_MEMLOCK yes
pwritev/preadv yes
Regards
Michael
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: fio hangs with --status-interval
2014-07-09 22:56 fio hangs with --status-interval Michael Mattsson
@ 2014-07-10 8:44 ` Jens Axboe
2014-07-10 14:55 ` Michael Mattsson
0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2014-07-10 8:44 UTC (permalink / raw)
To: Michael Mattsson, fio
[-- Attachment #1: Type: text/plain, Size: 988 bytes --]
On 2014-07-10 00:56, Michael Mattsson wrote:
> Hey,
> I've got 8 identical CentOS 6.5 clients on which fio randomly hangs
> when using --status-interval. I've tried fio 2.1.4 and fio 2.1.10;
> they both behave the same. I've also tried piping the output to tee
> instead of redirecting to a file, and --output with a specified
> output file; still the same problem. My fio command runs through its
> tests flawlessly without --status-interval and exits cleanly every
> time. Anywhere from 0 to 5 clients get affected on a given run.
> Running strace on a process that seems hung yields the following
> output:
>
> $ strace -p 31055
> Process 31055 attached - interrupt to quit
> futex(0x7f346ede802c, FUTEX_WAIT, 1, NULL
Strange, it must be stuck on the stat mutex, but I don't immediately see
why that would happen. Does the attached patch make any difference for
you, both in getting rid of the hang but still producing output at the
desired intervals?
--
Jens Axboe
[-- Attachment #2: stat.patch --]
[-- Type: text/x-patch, Size: 1790 bytes --]
diff --git a/stat.c b/stat.c
index 979c8100d378..93316a239f7b 100644
--- a/stat.c
+++ b/stat.c
@@ -1466,11 +1466,12 @@ static void *__show_running_run_stats(void fio_unused *arg)
* in the sig handler, but we should be disturbing the system less by just
* creating a thread to do it.
*/
-void show_running_run_stats(void)
+int show_running_run_stats(void)
{
pthread_t thread;
- fio_mutex_down(stat_mutex);
+ if (fio_mutex_down_trylock(stat_mutex))
+ return 1;
if (!pthread_create(&thread, NULL, __show_running_run_stats, NULL)) {
int err;
@@ -1479,10 +1480,11 @@ void show_running_run_stats(void)
if (err)
log_err("fio: DU thread detach failed: %s\n", strerror(err));
- return;
+ return 0;
}
fio_mutex_up(stat_mutex);
+ return 1;
}
static int status_interval_init;
@@ -1531,8 +1533,8 @@ void check_for_running_stats(void)
fio_gettime(&status_time, NULL);
status_interval_init = 1;
} else if (mtime_since_now(&status_time) >= status_interval) {
- show_running_run_stats();
- fio_gettime(&status_time, NULL);
+ if (!show_running_run_stats())
+ fio_gettime(&status_time, NULL);
return;
}
}
diff --git a/stat.h b/stat.h
index 2e46175053e8..82b8e973e4be 100644
--- a/stat.h
+++ b/stat.h
@@ -218,7 +218,7 @@ extern void show_group_stats(struct group_run_stats *rs);
extern int calc_thread_status(struct jobs_eta *je, int force);
extern void display_thread_status(struct jobs_eta *je);
extern void show_run_stats(void);
-extern void show_running_run_stats(void);
+extern int show_running_run_stats(void);
extern void check_for_running_stats(void);
extern void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src, int nr);
extern void sum_group_stats(struct group_run_stats *dst, struct group_run_stats *src);
* Re: fio hangs with --status-interval
2014-07-10 8:44 ` Jens Axboe
@ 2014-07-10 14:55 ` Michael Mattsson
2014-07-10 20:07 ` Jens Axboe
0 siblings, 1 reply; 25+ messages in thread
From: Michael Mattsson @ 2014-07-10 14:55 UTC (permalink / raw)
To: Jens Axboe; +Cc: fio
Hey,
Thanks for the patch. I got log output but I had two clients hanging
in the same way with the attached patch.
$ patch -p0 < stat.patch
patching file b/stat.c
patching file b/stat.h
Hunk #1 succeeded at 213 (offset -5 lines).
I was using --output <filename>
Worth mentioning here is that the NFS server the output file is
written to issues a stat(1) on that output file once per second.
Could that cause any problems?
Thanks!
Michael
On Thu, Jul 10, 2014 at 1:44 AM, Jens Axboe <axboe@kernel.dk> wrote:
> On 2014-07-10 00:56, Michael Mattsson wrote:
>>
>> Hey,
>> I've got 8 identical CentOS 6.5 clients on which fio randomly hangs
>> when using --status-interval. I've tried fio 2.1.4 and fio 2.1.10;
>> they both behave the same. I've also tried piping the output to tee
>> instead of redirecting to a file, and --output with a specified
>> output file; still the same problem. My fio command runs through its
>> tests flawlessly without --status-interval and exits cleanly every
>> time. Anywhere from 0 to 5 clients get affected on a given run.
>> Running strace on a process that seems hung yields the following
>> output:
>>
>> $ strace -p 31055
>> Process 31055 attached - interrupt to quit
>> futex(0x7f346ede802c, FUTEX_WAIT, 1, NULL
>
>
> Strange, it must be stuck on the stat mutex, but I don't immediately see why
> that would happen. Does the attached patch make any difference for you, both
> in getting rid of the hang but still producing output at the desired
> intervals?
>
> --
> Jens Axboe
>
* Re: fio hangs with --status-interval
2014-07-10 14:55 ` Michael Mattsson
@ 2014-07-10 20:07 ` Jens Axboe
2014-07-10 21:27 ` Michael Mattsson
0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2014-07-10 20:07 UTC (permalink / raw)
To: Michael Mattsson; +Cc: fio
On 2014-07-10 16:55, Michael Mattsson wrote:
> Hey,
> Thanks for the patch. I got log output but I had two clients hanging
> in the same way with the attached patch.
>
> $ patch -p0 < stat.patch
> patching file b/stat.c
> patching file b/stat.h
> Hunk #1 succeeded at 213 (offset -5 lines).
>
> I was using --output <filename>
>
> Worth mentioning here is that the NFS server the output file is
> written to issues a stat(1) on that output file once per second.
> Could that cause any problems?
OK, so next question. If you leave it long enough, do you get "stuck
process" dumps from the kernel? In any case, a:
# echo t > /proc/sysrq-trigger
dump would be handy to see for all the fio processes; this doesn't
smell like a fio issue.
--
Jens Axboe
* Re: fio hangs with --status-interval
2014-07-10 20:07 ` Jens Axboe
@ 2014-07-10 21:27 ` Michael Mattsson
2014-07-11 11:48 ` Jens Axboe
0 siblings, 1 reply; 25+ messages in thread
From: Michael Mattsson @ 2014-07-10 21:27 UTC (permalink / raw)
To: Jens Axboe; +Cc: fio
Hey,
I don't get any kernel messages about stuck processes. echo t >
/proc/sysrq-trigger gives the following related to fio:
fio S 0000000000000004 0 4189 3638 0x00000080
ffff8817a0fd5bf8 0000000000000086 0000000000000000 ffffffff8111f867
ffff8817a0fd5d01 00007fa2f43b6000 ffff8817a0fd5bd8 ffffffff810aec10
ffff88186b867ab8 ffff8817a0fd5fd8 000000000000fbc8 ffff88186b867ab8
Call Trace:
[<ffffffff8111f867>] ? unlock_page+0x27/0x30
[<ffffffff810aec10>] ? get_futex_key+0x180/0x2b0
[<ffffffff810ae559>] futex_wait_queue_me+0xb9/0xf0
[<ffffffff810af668>] futex_wait+0x1f8/0x380
[<ffffffff8100988e>] ? __switch_to+0x26e/0x320
[<ffffffff810b0f31>] do_futex+0x121/0xb50
[<ffffffff8109f491>] ? lock_hrtimer_base+0x31/0x60
[<ffffffff810a010f>] ? hrtimer_try_to_cancel+0x3f/0xd0
[<ffffffff810a01c2>] ? hrtimer_cancel+0x22/0x30
[<ffffffff8152a413>] ? do_nanosleep+0x93/0xc0
[<ffffffff810a0294>] ? hrtimer_nanosleep+0xc4/0x180
[<ffffffff810b19db>] sys_futex+0x7b/0x170
[<ffffffff810e1e87>] ? audit_syscall_entry+0x1d7/0x200
[<ffffffff810e1c7e>] ? __audit_syscall_exit+0x25e/0x290
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
fio S 0000000000000004 0 4191 3638 0x00000080
ffff88186bccde68 0000000000000086 0000000000000000 ffffffff8109fdd3
0000000000000004 0000000100000286 ffff88186bccde08 ffffffff8109f491
ffff8818680c7af8 ffff88186bccdfd8 000000000000fbc8 ffff8818680c7af8
Call Trace:
[<ffffffff8109fdd3>] ? __hrtimer_start_range_ns+0x1a3/0x460
[<ffffffff8109f491>] ? lock_hrtimer_base+0x31/0x60
[<ffffffff8100bb8e>] ? apic_timer_interrupt+0xe/0x20
[<ffffffff8152a40b>] do_nanosleep+0x8b/0xc0
[<ffffffff810a0294>] hrtimer_nanosleep+0xc4/0x180
[<ffffffff8109f0f0>] ? hrtimer_wakeup+0x0/0x30
[<ffffffff810a00c4>] ? hrtimer_start_range_ns+0x14/0x20
[<ffffffff810a03be>] sys_nanosleep+0x6e/0x80
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
fio S 0000000000000002 0 4270 3638 0x00000080
ffff88186a503bf8 0000000000000086 ffff88186a503b68 ffffffff8111f867
ffff88186a503d01 00007fa2f39b2000 ffff88186a503bd8 ffffffff810aec10
ffff88186a747058 ffff88186a503fd8 000000000000fbc8 ffff88186a747058
Call Trace:
[<ffffffff8111f867>] ? unlock_page+0x27/0x30
[<ffffffff810aec10>] ? get_futex_key+0x180/0x2b0
[<ffffffff810ae559>] futex_wait_queue_me+0xb9/0xf0
[<ffffffff810af668>] futex_wait+0x1f8/0x380
[<ffffffff810b0f31>] do_futex+0x121/0xb50
[<ffffffff810b19db>] sys_futex+0x7b/0x170
[<ffffffff810e1e87>] ? audit_syscall_entry+0x1d7/0x200
[<ffffffff810e1c7e>] ? __audit_syscall_exit+0x25e/0x290
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Above output is without the stat.patch, below is with the patch:
fio S 0000000000000004 0 4189 3638 0x00000080
ffff8817a0fd5bf8 0000000000000086 0000000000000000 ffffffff8111f867
ffff8817a0fd5d01 00007fa2f43b6000 ffff8817a0fd5bd8 ffffffff810aec10
ffff88186b867ab8 ffff8817a0fd5fd8 000000000000fbc8 ffff88186b867ab8
Call Trace:
[<ffffffff8111f867>] ? unlock_page+0x27/0x30
[<ffffffff810aec10>] ? get_futex_key+0x180/0x2b0
[<ffffffff810ae559>] futex_wait_queue_me+0xb9/0xf0
[<ffffffff810af668>] futex_wait+0x1f8/0x380
[<ffffffff8100988e>] ? __switch_to+0x26e/0x320
[<ffffffff810b0f31>] do_futex+0x121/0xb50
[<ffffffff8109f491>] ? lock_hrtimer_base+0x31/0x60
[<ffffffff810a010f>] ? hrtimer_try_to_cancel+0x3f/0xd0
[<ffffffff810a01c2>] ? hrtimer_cancel+0x22/0x30
[<ffffffff8152a413>] ? do_nanosleep+0x93/0xc0
[<ffffffff810a0294>] ? hrtimer_nanosleep+0xc4/0x180
[<ffffffff810b19db>] sys_futex+0x7b/0x170
[<ffffffff810e1e87>] ? audit_syscall_entry+0x1d7/0x200
[<ffffffff810e1c7e>] ? __audit_syscall_exit+0x25e/0x290
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
fio S 0000000000000004 0 4191 3638 0x00000080
ffff88186bccde68 0000000000000086 0000000000000000 ffffffff8109fdd3
0000000000000004 0000000100000286 ffff88186bccde08 ffffffff8109f491
ffff8818680c7af8 ffff88186bccdfd8 000000000000fbc8 ffff8818680c7af8
Call Trace:
[<ffffffff8109fdd3>] ? __hrtimer_start_range_ns+0x1a3/0x460
[<ffffffff8109f491>] ? lock_hrtimer_base+0x31/0x60
[<ffffffff8100bb8e>] ? apic_timer_interrupt+0xe/0x20
[<ffffffff8152a40b>] do_nanosleep+0x8b/0xc0
[<ffffffff810a0294>] hrtimer_nanosleep+0xc4/0x180
[<ffffffff8109f0f0>] ? hrtimer_wakeup+0x0/0x30
[<ffffffff810a00c4>] ? hrtimer_start_range_ns+0x14/0x20
[<ffffffff810a03be>] sys_nanosleep+0x6e/0x80
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
fio S 0000000000000002 0 4270 3638 0x00000080
ffff88186a503bf8 0000000000000086 ffff88186a503b68 ffffffff8111f867
ffff88186a503d01 00007fa2f39b2000 ffff88186a503bd8 ffffffff810aec10
ffff88186a747058 ffff88186a503fd8 000000000000fbc8 ffff88186a747058
Call Trace:
[<ffffffff8111f867>] ? unlock_page+0x27/0x30
[<ffffffff810aec10>] ? get_futex_key+0x180/0x2b0
[<ffffffff810ae559>] futex_wait_queue_me+0xb9/0xf0
[<ffffffff810af668>] futex_wait+0x1f8/0x380
[<ffffffff810b0f31>] do_futex+0x121/0xb50
[<ffffffff810b19db>] sys_futex+0x7b/0x170
[<ffffffff810e1e87>] ? audit_syscall_entry+0x1d7/0x200
[<ffffffff810e1c7e>] ? __audit_syscall_exit+0x25e/0x290
[<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Thanks!
Michael
On Thu, Jul 10, 2014 at 1:07 PM, Jens Axboe <axboe@kernel.dk> wrote:
> On 2014-07-10 16:55, Michael Mattsson wrote:
>>
>> Hey,
>> Thanks for the patch. I got log output but I had two clients hanging
>> in the same way with the attached patch.
>>
>> $ patch -p0 < stat.patch
>> patching file b/stat.c
>> patching file b/stat.h
>> Hunk #1 succeeded at 213 (offset -5 lines).
>>
>> I was using --output <filename>
>>
>> Worth mentioning here is that the NFS server the output file is
>> written to issues a stat(1) on that output file once per second.
>> Could that cause any problems?
>
>
> OK, so next question. If you leave it long enough, do you get "stuck
> process" dumps from the kernel? In any case, a:
>
> # echo t > /proc/sysrq-trigger
>
> dump would be handy to see for all the fio processes, this doesn't smell
> like a fio issue.
>
> --
> Jens Axboe
>
* Re: fio hangs with --status-interval
2014-07-10 21:27 ` Michael Mattsson
@ 2014-07-11 11:48 ` Jens Axboe
2014-07-11 16:19 ` Michael Mattsson
0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2014-07-11 11:48 UTC (permalink / raw)
To: Michael Mattsson; +Cc: fio
On 2014-07-10 23:27, Michael Mattsson wrote:
> Hey,
> I don't get any kernel messages about stuck processes. echo t >
> /proc/sysrq-trigger gives the following related to fio:
>
> fio S 0000000000000004 0 4189 3638 0x00000080
> ffff8817a0fd5bf8 0000000000000086 0000000000000000 ffffffff8111f867
> ffff8817a0fd5d01 00007fa2f43b6000 ffff8817a0fd5bd8 ffffffff810aec10
> ffff88186b867ab8 ffff8817a0fd5fd8 000000000000fbc8 ffff88186b867ab8
> Call Trace:
> [<ffffffff8111f867>] ? unlock_page+0x27/0x30
> [<ffffffff810aec10>] ? get_futex_key+0x180/0x2b0
> [<ffffffff810ae559>] futex_wait_queue_me+0xb9/0xf0
> [<ffffffff810af668>] futex_wait+0x1f8/0x380
> [<ffffffff8100988e>] ? __switch_to+0x26e/0x320
> [<ffffffff810b0f31>] do_futex+0x121/0xb50
> [<ffffffff8109f491>] ? lock_hrtimer_base+0x31/0x60
> [<ffffffff810a010f>] ? hrtimer_try_to_cancel+0x3f/0xd0
> [<ffffffff810a01c2>] ? hrtimer_cancel+0x22/0x30
> [<ffffffff8152a413>] ? do_nanosleep+0x93/0xc0
> [<ffffffff810a0294>] ? hrtimer_nanosleep+0xc4/0x180
> [<ffffffff810b19db>] sys_futex+0x7b/0x170
> [<ffffffff810e1e87>] ? audit_syscall_entry+0x1d7/0x200
> [<ffffffff810e1c7e>] ? __audit_syscall_exit+0x25e/0x290
> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
OK, so nothing unexpected there. But it's all still very weird. Could
you try and attach gdb to a fio process that is stuck like this, and
generate 'bt' backtraces?
--
Jens Axboe
* Re: fio hangs with --status-interval
2014-07-11 11:48 ` Jens Axboe
@ 2014-07-11 16:19 ` Michael Mattsson
2014-07-12 8:42 ` Jens Axboe
0 siblings, 1 reply; 25+ messages in thread
From: Michael Mattsson @ 2014-07-11 16:19 UTC (permalink / raw)
To: Jens Axboe; +Cc: fio
Hey,
This system will be busy over the weekend with other work; I'll get
this to you at the next window or try to reproduce it on a different
system. Do you want the trace with or without the patch you sent?
Thanks!
Michael
On Fri, Jul 11, 2014 at 4:48 AM, Jens Axboe <axboe@kernel.dk> wrote:
> On 2014-07-10 23:27, Michael Mattsson wrote:
>>
>> Hey,
>> I don't get any kernel messages about stuck processes. echo t >
>> /proc/sysrq-trigger gives the following related to fio:
>>
>> fio S 0000000000000004 0 4189 3638 0x00000080
>> ffff8817a0fd5bf8 0000000000000086 0000000000000000 ffffffff8111f867
>> ffff8817a0fd5d01 00007fa2f43b6000 ffff8817a0fd5bd8 ffffffff810aec10
>> ffff88186b867ab8 ffff8817a0fd5fd8 000000000000fbc8 ffff88186b867ab8
>> Call Trace:
>> [<ffffffff8111f867>] ? unlock_page+0x27/0x30
>> [<ffffffff810aec10>] ? get_futex_key+0x180/0x2b0
>> [<ffffffff810ae559>] futex_wait_queue_me+0xb9/0xf0
>> [<ffffffff810af668>] futex_wait+0x1f8/0x380
>> [<ffffffff8100988e>] ? __switch_to+0x26e/0x320
>> [<ffffffff810b0f31>] do_futex+0x121/0xb50
>> [<ffffffff8109f491>] ? lock_hrtimer_base+0x31/0x60
>> [<ffffffff810a010f>] ? hrtimer_try_to_cancel+0x3f/0xd0
>> [<ffffffff810a01c2>] ? hrtimer_cancel+0x22/0x30
>> [<ffffffff8152a413>] ? do_nanosleep+0x93/0xc0
>> [<ffffffff810a0294>] ? hrtimer_nanosleep+0xc4/0x180
>> [<ffffffff810b19db>] sys_futex+0x7b/0x170
>> [<ffffffff810e1e87>] ? audit_syscall_entry+0x1d7/0x200
>> [<ffffffff810e1c7e>] ? __audit_syscall_exit+0x25e/0x290
>> [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
>
>
> OK, so nothing unexpected there. But it's all still very weird. Could you
> try and attach gdb to a fio process that is stuck like this, and generate
> 'bt' backtraces?
>
> --
> Jens Axboe
>
* Re: fio hangs with --status-interval
2014-07-11 16:19 ` Michael Mattsson
@ 2014-07-12 8:42 ` Jens Axboe
2014-07-16 16:58 ` Vasily Tarasov
0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2014-07-12 8:42 UTC (permalink / raw)
To: Michael Mattsson; +Cc: fio
On 2014-07-11 18:19, Michael Mattsson wrote:
> Hey,
> This system will be busy over the weekend with other work; I'll get
> this to you at the next window or try to reproduce it on a different
> system. Do you want the trace with or without the patch you sent?
That's fine - and either version can be used, doesn't matter to me.
--
Jens Axboe
* Re: fio hangs with --status-interval
2014-07-12 8:42 ` Jens Axboe
@ 2014-07-16 16:58 ` Vasily Tarasov
2014-07-21 8:08 ` Jens Axboe
0 siblings, 1 reply; 25+ messages in thread
From: Vasily Tarasov @ 2014-07-16 16:58 UTC (permalink / raw)
To: Jens Axboe; +Cc: Michael Mattsson, fio
I started to observe similar behavior on one of my workloads: also
with periodic statistics output, and also on RHEL 6.5. Here is the
gdb output in my case:
# ps axu | grep fio
root 4489 0.0 0.0 322040 52816 pts/1 Sl+ 08:31 0:03 fio
--status-interval 10 --minimal fios/1.fio
root 5547 0.0 0.0 103256 860 pts/0 S+ 09:56 0:00 grep fio
# cat /proc/4489/wchan
futex_wait_queue_me
# gdb
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
(gdb) attach 4489
Attaching to process 4489
Reading symbols from /usr/local/bin/fio...done.
Reading symbols from /usr/lib64/librdmacm.so.1...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib64/librdmacm.so.1
Reading symbols from /usr/lib64/libibverbs.so.1...(no debugging
symbols found)...done.
Loaded symbols for /usr/lib64/libibverbs.so.1
Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/librt.so.1
Reading symbols from /lib64/libaio.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libaio.so.1
Reading symbols from /lib64/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libz.so.1
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols
found)...done.
[New LWP 4768]
[New LWP 4491]
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging
symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install
glibc-2.12-1.132.el6.x86_64 libaio-0.3.107-10.el6.x86_64
libibverbs-1.1.7-1.el6.x86_64 librdmacm-1.0.17-1.el6.x86_64
zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1 0x000000000042ea39 in fio_mutex_down (mutex=0x7f08b4e9f000) at mutex.c:155
#2 0x000000000041b680 in show_run_stats () at stat.c:1409
#3 0x0000000000449c85 in fio_backend () at backend.c:2042
#4 0x000000376ee1ed1d in __libc_start_main () from /lib64/libc.so.6
#5 0x000000000040a4b9 in _start ()
(gdb)
Does it help?
Thanks,
Vasily
On Sat, Jul 12, 2014 at 4:42 AM, Jens Axboe <axboe@kernel.dk> wrote:
> On 2014-07-11 18:19, Michael Mattsson wrote:
>>
>> Hey,
>> This system will be busy over the weekend with other work; I'll get
>> this to you at the next window or try to reproduce it on a different
>> system. Do you want the trace with or without the patch you sent?
>
>
> That's fine - and either version can be used, doesn't matter to me.
>
>
> --
> Jens Axboe
>
> --
> To unsubscribe from this list: send the line "unsubscribe fio" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
* Re: fio hangs with --status-interval
2014-07-16 16:58 ` Vasily Tarasov
@ 2014-07-21 8:08 ` Jens Axboe
2014-07-21 20:25 ` Vasily Tarasov
0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2014-07-21 8:08 UTC (permalink / raw)
To: Vasily Tarasov; +Cc: Michael Mattsson, fio
[-- Attachment #1: Type: text/plain, Size: 3422 bytes --]
On 2014-07-16 18:58, Vasily Tarasov wrote:
> I started to observe similar behavior on one of my workloads: also
> with periodic statistics output, and also on RHEL 6.5. Here is the
> gdb output in my case:
>
>
> # ps axu | grep fio
> root 4489 0.0 0.0 322040 52816 pts/1 Sl+ 08:31 0:03 fio
> --status-interval 10 --minimal fios/1.fio
> root 5547 0.0 0.0 103256 860 pts/0 S+ 09:56 0:00 grep fio
>
> # cat /proc/4489/wchan
> futex_wait_queue_me
>
> # gdb
> GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
> Copyright (C) 2010 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>.
> (gdb) attach 4489
> Attaching to process 4489
> Reading symbols from /usr/local/bin/fio...done.
> Reading symbols from /usr/lib64/librdmacm.so.1...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib64/librdmacm.so.1
> Reading symbols from /usr/lib64/libibverbs.so.1...(no debugging
> symbols found)...done.
> Loaded symbols for /usr/lib64/libibverbs.so.1
> Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib64/librt.so.1
> Reading symbols from /lib64/libaio.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libaio.so.1
> Reading symbols from /lib64/libz.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libz.so.1
> Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libm.so.6
> Reading symbols from /lib64/libpthread.so.0...(no debugging symbols
> found)...done.
> [New LWP 4768]
> [New LWP 4491]
> [Thread debugging using libthread_db enabled]
> Loaded symbols for /lib64/libpthread.so.0
> Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libdl.so.2
> Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libc.so.6
> Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging
> symbols found)...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> 0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> Missing separate debuginfos, use: debuginfo-install
> glibc-2.12-1.132.el6.x86_64 libaio-0.3.107-10.el6.x86_64
> libibverbs-1.1.7-1.el6.x86_64 librdmacm-1.0.17-1.el6.x86_64
> zlib-1.2.3-29.el6.x86_64
> (gdb) bt
> #0 0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1 0x000000000042ea39 in fio_mutex_down (mutex=0x7f08b4e9f000) at mutex.c:155
> #2 0x000000000041b680 in show_run_stats () at stat.c:1409
> #3 0x0000000000449c85 in fio_backend () at backend.c:2042
> #4 0x000000376ee1ed1d in __libc_start_main () from /lib64/libc.so.6
> #5 0x000000000040a4b9 in _start ()
> (gdb)
Are there other threads alive? It would be interesting to see a
backtrace from them. In any case, I think it'd be better to move the
stat mutex grab to the stat thread itself. I can't reproduce this, so
can you check if the attached patch makes a difference?
--
Jens Axboe
[-- Attachment #2: stat.patch --]
[-- Type: text/x-patch, Size: 1325 bytes --]
diff --git a/stat.c b/stat.c
index 979c8100d378..d8365811b25f 100644
--- a/stat.c
+++ b/stat.c
@@ -1411,13 +1411,15 @@ void show_run_stats(void)
fio_mutex_up(stat_mutex);
}
-static void *__show_running_run_stats(void fio_unused *arg)
+static void *__show_running_run_stats(void *arg)
{
struct thread_data *td;
unsigned long long *rt;
struct timeval tv;
int i;
+ fio_mutex_down(stat_mutex);
+
rt = malloc(thread_number * sizeof(unsigned long long));
fio_gettime(&tv, NULL);
@@ -1458,6 +1460,7 @@ static void *__show_running_run_stats(void fio_unused *arg)
free(rt);
fio_mutex_up(stat_mutex);
+ free(arg);
return NULL;
}
@@ -1468,21 +1471,23 @@ static void *__show_running_run_stats(void fio_unused *arg)
*/
void show_running_run_stats(void)
{
- pthread_t thread;
+ pthread_t *thread;
- fio_mutex_down(stat_mutex);
+ thread = calloc(1, sizeof(*thread));
+ if (!thread)
+ return;
- if (!pthread_create(&thread, NULL, __show_running_run_stats, NULL)) {
+ if (!pthread_create(thread, NULL, __show_running_run_stats, thread)) {
int err;
- err = pthread_detach(thread);
+ err = pthread_detach(*thread);
if (err)
log_err("fio: DU thread detach failed: %s\n", strerror(err));
return;
}
- fio_mutex_up(stat_mutex);
+ free(thread);
}
static int status_interval_init;
* Re: fio hangs with --status-interval
2014-07-21 8:08 ` Jens Axboe
@ 2014-07-21 20:25 ` Vasily Tarasov
2014-07-21 20:54 ` Jens Axboe
2014-07-25 7:43 ` Jens Axboe
0 siblings, 2 replies; 25+ messages in thread
From: Vasily Tarasov @ 2014-07-21 20:25 UTC (permalink / raw)
To: Jens Axboe; +Cc: Michael Mattsson, fio
Hi Jens,
I tried your patch, but it didn't help. Interestingly, the number of
threads changes in the end. At first, during the run:
# ps -eLf | grep fio
root 5224 4274 5224 1 2 11:12 pts/1 00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root 5224 4274 5225 0 2 11:12 pts/1 00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root 5231 5224 5231 60 1 11:12 ? 00:00:07 fio
--status-interval 10 --minimal fios/1.fio
root 5260 5237 5260 0 1 11:12 pts/0 00:00:00 grep fio
[root@bison01 vass]# ps -eLf | grep fio
root 5224 4274 5224 0 2 11:12 pts/1 00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root 5224 4274 5225 0 2 11:12 pts/1 00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root 5231 5224 5231 16 1 11:12 ? 00:00:21 fio
--status-interval 10 --minimal fios/1.fio
root 5293 5237 5293 0 1 11:14 pts/0 00:00:00 grep fio
[root@bison01 vass]# ps -eLf | grep fio
root 5224 4274 5224 0 2 11:12 pts/1 00:00:01 fio
--status-interval 10 --minimal fios/1.fio
root 5224 4274 5225 0 2 11:12 pts/1 00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root 5231 5224 5231 12 1 11:12 ? 00:01:13 fio
--status-interval 10 --minimal fios/1.fio
root 5411 5237 5411 0 1 11:22 pts/0 00:00:00 grep fio
Later, when the threads are stuck:
# ps -eLf | grep fio
root 5224 4274 5224 0 16 11:12 pts/1 00:00:02 fio
--status-interval 10 --minimal fios/1.fio
root 5224 4274 5225 0 16 11:12 pts/1 00:00:01 fio
--status-interval 10 --minimal fios/1.fio
root 5224 4274 5458 0 16 11:25 pts/1 00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root 5224 4274 5459 0 16 11:25 pts/1 00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root 5224 4274 5460 0 16 11:25 pts/1 00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root 5224 4274 5461 0 16 11:25 pts/1 00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root 5224 4274 5462 0 16 11:25 pts/1 00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root 5224 4274 5471 0 16 11:25 pts/1 00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root 5224 4274 5472 0 16 11:26 pts/1 00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root 5224 4274 5475 0 16 11:26 pts/1 00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root 5224 4274 5476 0 16 11:26 pts/1 00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root 5224 4274 5477 0 16 11:26 pts/1 00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root 5224 4274 5478 0 16 11:26 pts/1 00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root 5224 4274 5487 0 16 11:26 pts/1 00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root 5224 4274 5488 0 16 11:27 pts/1 00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root 5224 4274 5489 0 16 11:27 pts/1 00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root 6665 5237 6665 0 1 13:21 pts/0 00:00:00 grep fio
Is the number of threads supposed to change?..
Here is the fio file I'm using:
<snip starts>
# cat fios/1.fio
[global]
rw=write
bs=8m
direct=0
[sdaa]
filename=/dev/sdaa
<snip ends>
Command line is
fio --status-interval 10 --minimal fios/1.fio
Thanks,
Vasily
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: fio hangs with --status-interval
2014-07-21 20:25 ` Vasily Tarasov
@ 2014-07-21 20:54 ` Jens Axboe
2014-07-25 7:43 ` Jens Axboe
1 sibling, 0 replies; 25+ messages in thread
From: Jens Axboe @ 2014-07-21 20:54 UTC (permalink / raw)
To: Vasily Tarasov; +Cc: Michael Mattsson, fio
On 2014-07-21 22:25, Vasily Tarasov wrote:
> Hi Jens,
>
> I tried your patch, but it didn't help. Interestingly, the number of
> threads changes in the end. At first, during the run:
>
> # ps -eLf | grep fio
> root 5224 4274 5224 1 2 11:12 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5225 0 2 11:12 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5231 5224 5231 60 1 11:12 ? 00:00:07 fio
> --status-interval 10 --minimal fios/1.fio
> root 5260 5237 5260 0 1 11:12 pts/0 00:00:00 grep fio
> [root@bison01 vass]# ps -eLf | grep fio
> root 5224 4274 5224 0 2 11:12 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5225 0 2 11:12 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5231 5224 5231 16 1 11:12 ? 00:00:21 fio
> --status-interval 10 --minimal fios/1.fio
> root 5293 5237 5293 0 1 11:14 pts/0 00:00:00 grep fio
> [root@bison01 vass]# ps -eLf | grep fio
> root 5224 4274 5224 0 2 11:12 pts/1 00:00:01 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5225 0 2 11:12 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5231 5224 5231 12 1 11:12 ? 00:01:13 fio
> --status-interval 10 --minimal fios/1.fio
> root 5411 5237 5411 0 1 11:22 pts/0 00:00:00 grep fio
>
> Later, when the threads are stuck:
>
> # ps -eLf | grep fio
> root 5224 4274 5224 0 16 11:12 pts/1 00:00:02 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5225 0 16 11:12 pts/1 00:00:01 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5458 0 16 11:25 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5459 0 16 11:25 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5460 0 16 11:25 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5461 0 16 11:25 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5462 0 16 11:25 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5471 0 16 11:25 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5472 0 16 11:26 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5475 0 16 11:26 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5476 0 16 11:26 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5477 0 16 11:26 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5478 0 16 11:26 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5487 0 16 11:26 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5488 0 16 11:27 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5489 0 16 11:27 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 6665 5237 6665 0 1 13:21 pts/0 00:00:00 grep fio
>
> Is the number of threads supposed to change?..
>
> Here is the fio file I'm using:
>
> <snip starts>
> # cat fios/1.fio
> [global]
> rw=write
> bs=8m
> direct=0
>
> [sdaa]
> filename=/dev/sdaa
> <snip ends>
>
> Command line is
>
> fio --status-interval 10 --minimal fios/1.fio
Thanks, I'll try and reproduce it tomorrow.
--
Jens Axboe
* Re: fio hangs with --status-interval
2014-07-21 20:25 ` Vasily Tarasov
2014-07-21 20:54 ` Jens Axboe
@ 2014-07-25 7:43 ` Jens Axboe
2014-07-25 7:56 ` Jens Axboe
1 sibling, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2014-07-25 7:43 UTC (permalink / raw)
To: Vasily Tarasov; +Cc: Michael Mattsson, fio
[-- Attachment #1: Type: text/plain, Size: 3920 bytes --]
On 2014-07-21 22:25, Vasily Tarasov wrote:
> Hi Jens,
>
> I tried your patch, but it didn't help. Interestingly, the number of
> threads changes in the end. At first, during the run:
>
> # ps -eLf | grep fio
> root 5224 4274 5224 1 2 11:12 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5225 0 2 11:12 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5231 5224 5231 60 1 11:12 ? 00:00:07 fio
> --status-interval 10 --minimal fios/1.fio
> root 5260 5237 5260 0 1 11:12 pts/0 00:00:00 grep fio
> [root@bison01 vass]# ps -eLf | grep fio
> root 5224 4274 5224 0 2 11:12 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5225 0 2 11:12 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5231 5224 5231 16 1 11:12 ? 00:00:21 fio
> --status-interval 10 --minimal fios/1.fio
> root 5293 5237 5293 0 1 11:14 pts/0 00:00:00 grep fio
> [root@bison01 vass]# ps -eLf | grep fio
> root 5224 4274 5224 0 2 11:12 pts/1 00:00:01 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5225 0 2 11:12 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5231 5224 5231 12 1 11:12 ? 00:01:13 fio
> --status-interval 10 --minimal fios/1.fio
> root 5411 5237 5411 0 1 11:22 pts/0 00:00:00 grep fio
>
> Later, when the threads are stuck:
>
> # ps -eLf | grep fio
> root 5224 4274 5224 0 16 11:12 pts/1 00:00:02 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5225 0 16 11:12 pts/1 00:00:01 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5458 0 16 11:25 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5459 0 16 11:25 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5460 0 16 11:25 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5461 0 16 11:25 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5462 0 16 11:25 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5471 0 16 11:25 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5472 0 16 11:26 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5475 0 16 11:26 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5476 0 16 11:26 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5477 0 16 11:26 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5478 0 16 11:26 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5487 0 16 11:26 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5488 0 16 11:27 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 5224 4274 5489 0 16 11:27 pts/1 00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root 6665 5237 6665 0 1 13:21 pts/0 00:00:00 grep fio
>
> Is the number of threads supposed to change?..
Never answered this one... Yes, it'll change, since when you run the
job, you'll have one backend process, a number of IO workers, and one
disk util thread typically. When you get stuck, it's the backend that is
left waiting for that mutex.
In any case, I haven't been able to figure this one out yet. But it
should be safe enough to just ignore the stat mutex for the final
output, since the threads otherwise accessing it are gone. Can you see
if this one makes the issue go away?
--
Jens Axboe
[-- Attachment #2: stat-hang.patch --]
[-- Type: text/x-patch, Size: 812 bytes --]
diff --git a/backend.c b/backend.c
index 30f78b72f9d1..981625b61095 100644
--- a/backend.c
+++ b/backend.c
@@ -2068,7 +2068,7 @@ int fio_backend(void)
run_threads();
if (!fio_abort) {
- show_run_stats();
+ __show_run_stats();
if (write_bw_log) {
int i;
diff --git a/stat.h b/stat.h
index 2e46175053e8..90a7fb31a1bc 100644
--- a/stat.h
+++ b/stat.h
@@ -218,6 +218,7 @@ extern void show_group_stats(struct group_run_stats *rs);
extern int calc_thread_status(struct jobs_eta *je, int force);
extern void display_thread_status(struct jobs_eta *je);
extern void show_run_stats(void);
+extern void __show_run_stats(void);
extern void show_running_run_stats(void);
extern void check_for_running_stats(void);
extern void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src, int nr);
* Re: fio hangs with --status-interval
2014-07-25 7:43 ` Jens Axboe
@ 2014-07-25 7:56 ` Jens Axboe
2014-07-25 16:34 ` Vasily Tarasov
0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2014-07-25 7:56 UTC (permalink / raw)
To: Vasily Tarasov; +Cc: Michael Mattsson, fio
On 2014-07-25 09:43, Jens Axboe wrote:
> On 2014-07-21 22:25, Vasily Tarasov wrote:
>> Hi Jens,
>>
>> I tried your patch, but it didn't help. Interestingly, the number of
>> threads changes in the end. At first, during the run:
>>
>> # ps -eLf | grep fio
>> root 5224 4274 5224 1 2 11:12 pts/1 00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root 5224 4274 5225 0 2 11:12 pts/1 00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root 5231 5224 5231 60 1 11:12 ? 00:00:07 fio
>> --status-interval 10 --minimal fios/1.fio
>> root 5260 5237 5260 0 1 11:12 pts/0 00:00:00 grep fio
>> [root@bison01 vass]# ps -eLf | grep fio
>> root 5224 4274 5224 0 2 11:12 pts/1 00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root 5224 4274 5225 0 2 11:12 pts/1 00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root 5231 5224 5231 16 1 11:12 ? 00:00:21 fio
>> --status-interval 10 --minimal fios/1.fio
>> root 5293 5237 5293 0 1 11:14 pts/0 00:00:00 grep fio
>> [root@bison01 vass]# ps -eLf | grep fio
>> root 5224 4274 5224 0 2 11:12 pts/1 00:00:01 fio
>> --status-interval 10 --minimal fios/1.fio
>> root 5224 4274 5225 0 2 11:12 pts/1 00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root 5231 5224 5231 12 1 11:12 ? 00:01:13 fio
>> --status-interval 10 --minimal fios/1.fio
>> root 5411 5237 5411 0 1 11:22 pts/0 00:00:00 grep fio
>>
>> Later, when the threads are stuck:
>>
>> # ps -eLf | grep fio
>> root 5224 4274 5224 0 16 11:12 pts/1 00:00:02 fio
>> --status-interval 10 --minimal fios/1.fio
>> root 5224 4274 5225 0 16 11:12 pts/1 00:00:01 fio
>> --status-interval 10 --minimal fios/1.fio
>> root 5224 4274 5458 0 16 11:25 pts/1 00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root 5224 4274 5459 0 16 11:25 pts/1 00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root 5224 4274 5460 0 16 11:25 pts/1 00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root 5224 4274 5461 0 16 11:25 pts/1 00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root 5224 4274 5462 0 16 11:25 pts/1 00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root 5224 4274 5471 0 16 11:25 pts/1 00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root 5224 4274 5472 0 16 11:26 pts/1 00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root 5224 4274 5475 0 16 11:26 pts/1 00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root 5224 4274 5476 0 16 11:26 pts/1 00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root 5224 4274 5477 0 16 11:26 pts/1 00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root 5224 4274 5478 0 16 11:26 pts/1 00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root 5224 4274 5487 0 16 11:26 pts/1 00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root 5224 4274 5488 0 16 11:27 pts/1 00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root 5224 4274 5489 0 16 11:27 pts/1 00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root 6665 5237 6665 0 1 13:21 pts/0 00:00:00 grep fio
>>
>> Is the number of threads supposed to change?..
>
> Never answered this one... Yes, it'll change, since when you run the
> job, you'll have one backend process, a number of IO workers, and one
> disk util thread typically. When you get stuck, it's the backend that is
> left waiting for that mutex.
>
> In any case, I haven't been able to figure this one out yet. But it
> should be safe enough to just ignore the stat mutex for the final
> output, since the threads otherwise accessing it are gone. Can you see
> if this one makes the issue go away?
That patch was never compiled; it was missing the non-static declaration of
__show_run_stats(). Just pull current -git, I have committed a variant that does compile :-)
--
Jens Axboe
* Re: fio hangs with --status-interval
2014-07-25 7:56 ` Jens Axboe
@ 2014-07-25 16:34 ` Vasily Tarasov
2014-07-28 8:56 ` Jens Axboe
0 siblings, 1 reply; 25+ messages in thread
From: Vasily Tarasov @ 2014-07-25 16:34 UTC (permalink / raw)
To: Jens Axboe; +Cc: Michael Mattsson, fio
Hi Jens,
You'll be surprised, but it did not help :( I used the latest code from
git (fio-2.1.11-10-gae7e, commit ae7e050) and still see the same picture.
I don't know if it helps, but I see this behavior on a machine with
96GB of RAM. So, after the buffered writes are over, fio waits for a long
time until all dirty buffers hit the disk. But even after there is no
more disk activity, fio stays stuck until I kill it.
Regarding the number of threads: I understand where three threads
can come from:
1) Backend thread (sort of a manager)
2) Worker thread(s)
3) Disk stats thread
In my case I defined only one job instance, so I suppose there should
always be only one worker thread. I don't understand how the total
number of threads goes up to 10 in the end.
<snip starts>
$ ps -eLf | grep fio
root 4427 4135 4427 0 15 07:44 pts/1 00:00:02 fio
--minimal --status-interval 10 1.fio
root 4427 4135 4636 0 15 07:56 pts/1 00:00:00 fio
--minimal --status-interval 10 1.fio
root 4427 4135 4637 0 15 07:57 pts/1 00:00:00 fio
--minimal --status-interval 10 1.fio
root 4427 4135 4638 0 15 07:57 pts/1 00:00:00 fio
--minimal --status-interval 10 1.fio
root 4427 4135 4647 0 15 07:57 pts/1 00:00:00 fio
--minimal --status-interval 10 1.fio
root 4427 4135 4650 0 15 07:57 pts/1 00:00:00 fio
--minimal --status-interval 10 1.fio
root 4427 4135 4651 0 15 07:57 pts/1 00:00:00 fio
--minimal --status-interval 10 1.fio
root 4427 4135 4652 0 15 07:57 pts/1 00:00:00 fio
--minimal --status-interval 10 1.fio
root 4427 4135 4653 0 15 07:58 pts/1 00:00:00 fio
--minimal --status-interval 10 1.fio
root 4427 4135 4654 0 15 07:58 pts/1 00:00:00 fio
--minimal --status-interval 10 1.fio
root 4427 4135 4663 0 15 07:58 pts/1 00:00:00 fio
--minimal --status-interval 10 1.fio
root 4427 4135 4664 0 15 07:58 pts/1 00:00:00 fio
--minimal --status-interval 10 1.fio
root 4427 4135 4666 0 15 07:58 pts/1 00:00:00 fio
--minimal --status-interval 10 1.fio
root 4427 4135 4668 0 15 07:58 pts/1 00:00:00 fio
--minimal --status-interval 10 1.fio
root 4427 4135 4669 0 15 07:59 pts/1 00:00:00 fio
--minimal --status-interval 10 1.fio
<snip ends>
Thanks,
Vasily
On Fri, Jul 25, 2014 at 3:56 AM, Jens Axboe <axboe@kernel.dk> wrote:
> On 2014-07-25 09:43, Jens Axboe wrote:
>>
>> On 2014-07-21 22:25, Vasily Tarasov wrote:
>>>
>>> Hi Jens,
>>>
>>> I tried your patch, but it didn't help. Interestingly, the number of
>>> threads changes in the end. At first, during the run:
>>>
>>> # ps -eLf | grep fio
>>> root 5224 4274 5224 1 2 11:12 pts/1 00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root 5224 4274 5225 0 2 11:12 pts/1 00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root 5231 5224 5231 60 1 11:12 ? 00:00:07 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root 5260 5237 5260 0 1 11:12 pts/0 00:00:00 grep fio
>>> [root@bison01 vass]# ps -eLf | grep fio
>>> root 5224 4274 5224 0 2 11:12 pts/1 00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root 5224 4274 5225 0 2 11:12 pts/1 00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root 5231 5224 5231 16 1 11:12 ? 00:00:21 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root 5293 5237 5293 0 1 11:14 pts/0 00:00:00 grep fio
>>> [root@bison01 vass]# ps -eLf | grep fio
>>> root 5224 4274 5224 0 2 11:12 pts/1 00:00:01 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root 5224 4274 5225 0 2 11:12 pts/1 00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root 5231 5224 5231 12 1 11:12 ? 00:01:13 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root 5411 5237 5411 0 1 11:22 pts/0 00:00:00 grep fio
>>>
>>> Later, when the threads are stuck:
>>>
>>> # ps -eLf | grep fio
>>> root 5224 4274 5224 0 16 11:12 pts/1 00:00:02 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root 5224 4274 5225 0 16 11:12 pts/1 00:00:01 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root 5224 4274 5458 0 16 11:25 pts/1 00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root 5224 4274 5459 0 16 11:25 pts/1 00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root 5224 4274 5460 0 16 11:25 pts/1 00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root 5224 4274 5461 0 16 11:25 pts/1 00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root 5224 4274 5462 0 16 11:25 pts/1 00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root 5224 4274 5471 0 16 11:25 pts/1 00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root 5224 4274 5472 0 16 11:26 pts/1 00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root 5224 4274 5475 0 16 11:26 pts/1 00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root 5224 4274 5476 0 16 11:26 pts/1 00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root 5224 4274 5477 0 16 11:26 pts/1 00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root 5224 4274 5478 0 16 11:26 pts/1 00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root 5224 4274 5487 0 16 11:26 pts/1 00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root 5224 4274 5488 0 16 11:27 pts/1 00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root 5224 4274 5489 0 16 11:27 pts/1 00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root 6665 5237 6665 0 1 13:21 pts/0 00:00:00 grep fio
>>>
>>> Is the number of threads supposed to change?..
>>
>>
>> Never answered this one... Yes, it'll change, since when you run the
>> job, you'll have one backend process, a number of IO workers, and one
>> disk util thread typically. When you get stuck, it's the backend that is
>> left waiting for that mutex.
>>
>> In any case, I haven't been able to figure this one out yet. But it
>> should be safe enough to just ignore the stat mutex for the final
>> output, since the threads otherwise accessing it are gone. Can you see
>> if this one makes the issue go away?
>
>
> Patch was not compiled, was missing the non-static __show_run_stats(). But
> just pull current -git, I have committed a variant that does compile :-)
>
> --
> Jens Axboe
>
* Re: fio hangs with --status-interval
2014-07-25 16:34 ` Vasily Tarasov
@ 2014-07-28 8:56 ` Jens Axboe
2014-07-28 16:05 ` Vasily Tarasov
0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2014-07-28 8:56 UTC (permalink / raw)
To: Vasily Tarasov; +Cc: Michael Mattsson, fio
On 2014-07-25 18:34, Vasily Tarasov wrote:
> Hi Jens,
>
> You'll be surprised but it did not help :( I used the latest code from
> git (fio-2.1.11-10-gae7e, commit ae7e050). Still see the same picture.
That's actually good news, since it didn't make a lot of sense. So let's
see if we can't get to the bottom of this...
> I don't know if it helps, but I see this behavior on a machine with
> 96GB of RAM. So, after buffered writes are over, fio waits for a long
> time till all dirty buffers hit the disk. But, even after there is no
> more disk activity, fio is still stuck for as long as I don't kill it.
>
> Regarding the number of threads. I do understand where the 3 threads
> can come from:
>
> 1) Backend thread (sort of a manager)
> 2) Worker thread(s)
> 3) Disk stats thread
>
> In my case I defined only one job instance, so I suppose there should
> always be only one worker thread. I don't understand how the total
> number of threads goes up to 10 in the end.
>
> <snip starts>
> $ ps -eLf | grep fio
> root 4427 4135 4427 0 15 07:44 pts/1 00:00:02 fio
> --minimal --status-interval 10 1.fio
> root 4427 4135 4636 0 15 07:56 pts/1 00:00:00 fio
> --minimal --status-interval 10 1.fio
> root 4427 4135 4637 0 15 07:57 pts/1 00:00:00 fio
> --minimal --status-interval 10 1.fio
> root 4427 4135 4638 0 15 07:57 pts/1 00:00:00 fio
> --minimal --status-interval 10 1.fio
> root 4427 4135 4647 0 15 07:57 pts/1 00:00:00 fio
> --minimal --status-interval 10 1.fio
> root 4427 4135 4650 0 15 07:57 pts/1 00:00:00 fio
> --minimal --status-interval 10 1.fio
> root 4427 4135 4651 0 15 07:57 pts/1 00:00:00 fio
> --minimal --status-interval 10 1.fio
> root 4427 4135 4652 0 15 07:57 pts/1 00:00:00 fio
> --minimal --status-interval 10 1.fio
> root 4427 4135 4653 0 15 07:58 pts/1 00:00:00 fio
> --minimal --status-interval 10 1.fio
> root 4427 4135 4654 0 15 07:58 pts/1 00:00:00 fio
> --minimal --status-interval 10 1.fio
> root 4427 4135 4663 0 15 07:58 pts/1 00:00:00 fio
> --minimal --status-interval 10 1.fio
> root 4427 4135 4664 0 15 07:58 pts/1 00:00:00 fio
> --minimal --status-interval 10 1.fio
> root 4427 4135 4666 0 15 07:58 pts/1 00:00:00 fio
> --minimal --status-interval 10 1.fio
> root 4427 4135 4668 0 15 07:58 pts/1 00:00:00 fio
> --minimal --status-interval 10 1.fio
> root 4427 4135 4669 0 15 07:59 pts/1 00:00:00 fio
> --minimal --status-interval 10 1.fio
> <snip ends>
Can you try attaching gdb to it when it's hung and produce a new
backtrace? It can't be stuck in the final status run; I wonder if it's
stuck in the mutex down and remove instead.
--
Jens Axboe
* Re: fio hangs with --status-interval
2014-07-28 8:56 ` Jens Axboe
@ 2014-07-28 16:05 ` Vasily Tarasov
2014-07-28 19:49 ` Jens Axboe
0 siblings, 1 reply; 25+ messages in thread
From: Vasily Tarasov @ 2014-07-28 16:05 UTC (permalink / raw)
To: Jens Axboe; +Cc: Michael Mattsson, fio
Hi Jens,
Thanks for looking into this. Here is the information you've requested:
1. Workload I'm running:
<snip starts>
[global]
rw=write
bs=8m
direct=0
thread=1
size=4g
[sdaa]
filename=/dev/sdaa
<snip ends>
2. That's what I see on the screen:
<snip starts>
# fio --status-interval 10 1.fio
sdaa: (g=0): rw=write, bs=8M-8M/8M-8M/8M-8M, ioengine=sync, iodepth=1
fio-2.1.11-10-gae7e
Starting 1 thread
Jobs: 1 (f=1): [W(1)] [-.-% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 00m:00s]
sdaa: (groupid=0, jobs=1): err= 0: pid=18849: Mon Jul 28 08:50:47 2014
write: io=4096.0MB, bw=367567KB/s, iops=44, runt= 11411msec
clat (usec): min=2400, max=3657, avg=2677.80, stdev=234.68
lat (usec): min=2674, max=3968, avg=2948.69, stdev=269.71
clat percentiles (usec):
| 1.00th=[ 2448], 5.00th=[ 2480], 10.00th=[ 2512], 20.00th=[ 2544],
| 30.00th=[ 2544], 40.00th=[ 2576], 50.00th=[ 2576], 60.00th=[ 2576],
| 70.00th=[ 2608], 80.00th=[ 3024], 90.00th=[ 3120], 95.00th=[ 3152],
| 99.00th=[ 3248], 99.50th=[ 3312], 99.90th=[ 3664], 99.95th=[ 3664],
| 99.99th=[ 3664]
bw (MB /s): min= 2464, max= 2842, per=100.00%, avg=2712.77, stdev=215.50
lat (msec) : 4=100.00%
cpu : usr=1.31%, sys=13.92%, ctx=161, majf=0, minf=6
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=512/d=0, short=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: io=4096.0MB, aggrb=367566KB/s, minb=367566KB/s,
maxb=367566KB/s, mint=11411msec, maxt=11411msec
Disk stats (read/write):
sdaa: ios=84/0, merge=0/0, ticks=43/0, in_queue=35, util=2.36%
<snip ends>
At this point fio does not exit. Below are the gdb backtraces. There
are three threads in the run. When I attach to one thread, get a
backtrace, and then detach, fio crashes with a segfault. So, to collect
the three backtraces, I ran the experiment three times and attached to a
different thread each time. Below, I changed the PIDs so that they match
the ps output.
# ps -eLf | grep fio
root 18844 10064 18844 0 3 08:50 pts/1 00:00:00 fio
--status-interval 10 1.fio
root 18844 10064 18862 0 3 08:50 pts/1 00:00:00 fio
--status-interval 10 1.fio
root 18844 10064 18863 0 3 08:50 pts/1 00:00:00 fio
--status-interval 10 1.fio
root 18902 18805 18902 0 1 08:53 pts/0 00:00:00 grep fio
# gdb
(gdb) attach 18844
(gdb) bt
#0 0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1 0x0000000000430419 in fio_mutex_down (mutex=0x7ffff7ff6000) at mutex.c:155
#2 0x0000000000411bc0 in stat_exit () at stat.c:1905
#3 0x000000000044c1d6 in fio_backend () at backend.c:2094
#4 0x000000376ee1ed1d in __libc_start_main () from /lib64/libc.so.6
#5 0x000000000040a679 in _start ()
(gdb) attach 18862
(gdb) bt
#0 0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1 0x0000000000430419 in fio_mutex_down (mutex=0x7ffff7ff2000) at mutex.c:155
#2 0x000000000041c717 in __show_running_run_stats (arg=0x72a330) at stat.c:1445
#3 0x000000376f6079d1 in start_thread () from /lib64/libpthread.so.0
#4 0x000000376eee8b6d in clone () from /lib64/libc.so.6
(gdb) attach 18863
(gdb) bt
#0 0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1 0x0000000000430419 in fio_mutex_down (mutex=0x7ffff7ff6000) at mutex.c:155
#2 0x000000000041c5cd in __show_running_run_stats (arg=0x72a490) at stat.c:1421
#3 0x000000376f6079d1 in start_thread () from /lib64/libpthread.so.0
#4 0x000000376eee8b6d in clone () from /lib64/libc.so.6
Let me know if you need any additional information.
Thank you,
Vasily
On Mon, Jul 28, 2014 at 1:56 AM, Jens Axboe <axboe@kernel.dk> wrote:
> On 2014-07-25 18:34, Vasily Tarasov wrote:
>>
>> Hi Jens,
>>
>> You'll be surprised but it did not help :( I used the latest code from
>> git (fio-2.1.11-10-gae7e, commit ae7e050). Still see the same picture.
>
>
> That's actually good news, since it didn't make a lot of sense. So lets see
> if we can't get to the bottom of this...
>
>
>> I don't know if it helps, but I see this behavior on a machine with
>> 96GB of RAM. So, after buffered writes are over, fio waits for a long
>> time till all dirty buffers hit the disk. But, even after there is no
>> more disk activity, fio is still stuck for as long as I don't kill it.
>>
>> Regarding the number of threads. I do understand where the 3 threads
>> can come from:
>>
>> 1) Backend thread (sort of a manager)
>> 2) Worker thread(s)
>> 3) Disk stats thread
>>
>> In my case I defined only one job instance, so I suppose there should
>> always be only one worker thread. I don't understand how the total
>> number of threads goes up to 10 in the end.
>>
>> <snip starts>
>> $ ps -eLf | grep fio
>> root 4427 4135 4427 0 15 07:44 pts/1 00:00:02 fio
>> --minimal --status-interval 10 1.fio
>> root 4427 4135 4636 0 15 07:56 pts/1 00:00:00 fio
>> --minimal --status-interval 10 1.fio
>> root 4427 4135 4637 0 15 07:57 pts/1 00:00:00 fio
>> --minimal --status-interval 10 1.fio
>> root 4427 4135 4638 0 15 07:57 pts/1 00:00:00 fio
>> --minimal --status-interval 10 1.fio
>> root 4427 4135 4647 0 15 07:57 pts/1 00:00:00 fio
>> --minimal --status-interval 10 1.fio
>> root 4427 4135 4650 0 15 07:57 pts/1 00:00:00 fio
>> --minimal --status-interval 10 1.fio
>> root 4427 4135 4651 0 15 07:57 pts/1 00:00:00 fio
>> --minimal --status-interval 10 1.fio
>> root 4427 4135 4652 0 15 07:57 pts/1 00:00:00 fio
>> --minimal --status-interval 10 1.fio
>> root 4427 4135 4653 0 15 07:58 pts/1 00:00:00 fio
>> --minimal --status-interval 10 1.fio
>> root 4427 4135 4654 0 15 07:58 pts/1 00:00:00 fio
>> --minimal --status-interval 10 1.fio
>> root 4427 4135 4663 0 15 07:58 pts/1 00:00:00 fio
>> --minimal --status-interval 10 1.fio
>> root 4427 4135 4664 0 15 07:58 pts/1 00:00:00 fio
>> --minimal --status-interval 10 1.fio
>> root 4427 4135 4666 0 15 07:58 pts/1 00:00:00 fio
>> --minimal --status-interval 10 1.fio
>> root 4427 4135 4668 0 15 07:58 pts/1 00:00:00 fio
>> --minimal --status-interval 10 1.fio
>> root 4427 4135 4669 0 15 07:59 pts/1 00:00:00 fio
>> --minimal --status-interval 10 1.fio
>> <snip ends>
>
>
> Can you try and gdb attach to it when it's hung and produce a new backtrace?
> It can't be off the final status run, I wonder if it's off the mutex down
> and remove instead.
>
> --
> Jens Axboe
>
* Re: fio hangs with --status-interval
2014-07-28 16:05 ` Vasily Tarasov
@ 2014-07-28 19:49 ` Jens Axboe
2014-10-23 15:57 ` Michael Mattsson
0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2014-07-28 19:49 UTC (permalink / raw)
To: Vasily Tarasov; +Cc: Michael Mattsson, fio
On 2014-07-28 18:05, Vasily Tarasov wrote:
> Hi Jens,
>
> Thanks for looking into this. Here is the information you've requested:
>
> 1. Workload I'm running:
>
> <snip starts>
>
> [global]
> rw=write
> bs=8m
> direct=0
> thread=1
> size=4g
>
> [sdaa]
> filename=/dev/sdaa
>
> <snip ends>
>
> 2. That's what I see on the screen:
>
> <snip starts>
>
> # fio --status-interval 10 1.fio
> sdaa: (g=0): rw=write, bs=8M-8M/8M-8M/8M-8M, ioengine=sync, iodepth=1
> fio-2.1.11-10-gae7e
> Starting 1 thread
> Jobs: 1 (f=1): [W(1)] [-.-% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 00m:00s]
> sdaa: (groupid=0, jobs=1): err= 0: pid=18849: Mon Jul 28 08:50:47 2014
> write: io=4096.0MB, bw=367567KB/s, iops=44, runt= 11411msec
> clat (usec): min=2400, max=3657, avg=2677.80, stdev=234.68
> lat (usec): min=2674, max=3968, avg=2948.69, stdev=269.71
> clat percentiles (usec):
> | 1.00th=[ 2448], 5.00th=[ 2480], 10.00th=[ 2512], 20.00th=[ 2544],
> | 30.00th=[ 2544], 40.00th=[ 2576], 50.00th=[ 2576], 60.00th=[ 2576],
> | 70.00th=[ 2608], 80.00th=[ 3024], 90.00th=[ 3120], 95.00th=[ 3152],
> | 99.00th=[ 3248], 99.50th=[ 3312], 99.90th=[ 3664], 99.95th=[ 3664],
> | 99.99th=[ 3664]
> bw (MB /s): min= 2464, max= 2842, per=100.00%, avg=2712.77, stdev=215.50
> lat (msec) : 4=100.00%
> cpu : usr=1.31%, sys=13.92%, ctx=161, majf=0, minf=6
> IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
> issued : total=r=0/w=512/d=0, short=r=0/w=0/d=0
> latency : target=0, window=0, percentile=100.00%, depth=1
>
> Run status group 0 (all jobs):
> WRITE: io=4096.0MB, aggrb=367566KB/s, minb=367566KB/s,
> maxb=367566KB/s, mint=11411msec, maxt=11411msec
>
> Disk stats (read/write):
> sdaa: ios=84/0, merge=0/0, ticks=43/0, in_queue=35, util=2.36%
>
> <snip ends>
>
> At this point fio does not exit. Below are the gdb backtraces. There
> are three threads in the run. When I attach to one thread, get
> backtrace, and then detach fio crashes with Segfault. So, to collect 3
> backtraces, I ran the experiment 3 times and attached to a different
> thread every time. Below, I change the PIDs so that they match the ps
> output.
>
> # ps -eLf | grep fio
> root 18844 10064 18844 0 3 08:50 pts/1 00:00:00 fio
> --status-interval 10 1.fio
> root 18844 10064 18862 0 3 08:50 pts/1 00:00:00 fio
> --status-interval 10 1.fio
> root 18844 10064 18863 0 3 08:50 pts/1 00:00:00 fio
> --status-interval 10 1.fio
> root 18902 18805 18902 0 1 08:53 pts/0 00:00:00 grep fio
> # gdb
> (gdb) attach 18844
> (gdb) bt
> #0 0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1 0x0000000000430419 in fio_mutex_down (mutex=0x7ffff7ff6000) at mutex.c:155
> #2 0x0000000000411bc0 in stat_exit () at stat.c:1905
> #3 0x000000000044c1d6 in fio_backend () at backend.c:2094
> #4 0x000000376ee1ed1d in __libc_start_main () from /lib64/libc.so.6
> #5 0x000000000040a679 in _start ()
> (gdb) attach 18862
> (gdb) bt
> #0 0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1 0x0000000000430419 in fio_mutex_down (mutex=0x7ffff7ff2000) at mutex.c:155
> #2 0x000000000041c717 in __show_running_run_stats (arg=0x72a330) at stat.c:1445
> #3 0x000000376f6079d1 in start_thread () from /lib64/libpthread.so.0
> #4 0x000000376eee8b6d in clone () from /lib64/libc.so.6
> (gdb) attach 18863
> (gdb) bt
> #0 0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1 0x0000000000430419 in fio_mutex_down (mutex=0x7ffff7ff6000) at mutex.c:155
> #2 0x000000000041c5cd in __show_running_run_stats (arg=0x72a490) at stat.c:1421
> #3 0x000000376f6079d1 in start_thread () from /lib64/libpthread.so.0
> #4 0x000000376eee8b6d in clone () from /lib64/libc.so.6
>
> Let me know if you need any additional information.
This is perfect, makes a lot more sense. Progress on this may be a
little slow, I'm on vacation this week... But expect something next
week, at least.
--
Jens Axboe
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: fio hangs with --status-interval
2014-07-28 19:49 ` Jens Axboe
@ 2014-10-23 15:57 ` Michael Mattsson
2014-10-23 16:01 ` Jens Axboe
0 siblings, 1 reply; 25+ messages in thread
From: Michael Mattsson @ 2014-10-23 15:57 UTC (permalink / raw)
To: Jens Axboe; +Cc: Vasily Tarasov, fio
Hey,
Has there been any work on this? This is still an issue for me on fio
2.1.13. It happens less frequently, but I've had two hangs in a sample
of ~40 runs.
Regards
Michael
On Mon, Jul 28, 2014 at 12:49 PM, Jens Axboe <axboe@kernel.dk> wrote:
> On 2014-07-28 18:05, Vasily Tarasov wrote:
>>
>> Hi Jens,
>>
>> Thanks for looking into this. Here is the information you've requested:
>>
>> 1. Workload I'm running:
>>
>> <snip starts>
>>
>> [global]
>> rw=write
>> bs=8m
>> direct=0
>> thread=1
>> size=4g
>>
>> [sdaa]
>> filename=/dev/sdaa
>>
>> <snip ends>
>>
>> 2. That's what I see on the screen:
>>
>> <snip starts>
>>
>> # fio --status-interval 10 1.fio
>> sdaa: (g=0): rw=write, bs=8M-8M/8M-8M/8M-8M, ioengine=sync, iodepth=1
>> fio-2.1.11-10-gae7e
>> Starting 1 thread
>> Jobs: 1 (f=1): [W(1)] [-.-% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta
>> 00m:00s]
>> sdaa: (groupid=0, jobs=1): err= 0: pid=18849: Mon Jul 28 08:50:47 2014
>> write: io=4096.0MB, bw=367567KB/s, iops=44, runt= 11411msec
>> clat (usec): min=2400, max=3657, avg=2677.80, stdev=234.68
>> lat (usec): min=2674, max=3968, avg=2948.69, stdev=269.71
>> clat percentiles (usec):
>> | 1.00th=[ 2448], 5.00th=[ 2480], 10.00th=[ 2512], 20.00th=[
>> 2544],
>> | 30.00th=[ 2544], 40.00th=[ 2576], 50.00th=[ 2576], 60.00th=[
>> 2576],
>> | 70.00th=[ 2608], 80.00th=[ 3024], 90.00th=[ 3120], 95.00th=[
>> 3152],
>> | 99.00th=[ 3248], 99.50th=[ 3312], 99.90th=[ 3664], 99.95th=[
>> 3664],
>> | 99.99th=[ 3664]
>> bw (MB /s): min= 2464, max= 2842, per=100.00%, avg=2712.77,
>> stdev=215.50
>> lat (msec) : 4=100.00%
>> cpu : usr=1.31%, sys=13.92%, ctx=161, majf=0, minf=6
>> IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>> >=64=0.0%
>> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>> >=64=0.0%
>> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>> >=64=0.0%
>> issued : total=r=0/w=512/d=0, short=r=0/w=0/d=0
>> latency : target=0, window=0, percentile=100.00%, depth=1
>>
>> Run status group 0 (all jobs):
>> WRITE: io=4096.0MB, aggrb=367566KB/s, minb=367566KB/s,
>> maxb=367566KB/s, mint=11411msec, maxt=11411msec
>>
>> Disk stats (read/write):
>> sdaa: ios=84/0, merge=0/0, ticks=43/0, in_queue=35, util=2.36%
>>
>> <snip ends>
>>
>> At this point fio does not exit. Below are the gdb backtraces. There
>> are three threads in the run. When I attach to one thread, get
>> backtrace, and then detach fio crashes with Segfault. So, to collect 3
>> backtraces, I ran the experiment 3 times and attached to a different
>> thread every time. Below, I change the PIDs so that they match the ps
>> output.
>>
>> # ps -eLf | grep fio
>> root 18844 10064 18844 0 3 08:50 pts/1 00:00:00 fio
>> --status-interval 10 1.fio
>> root 18844 10064 18862 0 3 08:50 pts/1 00:00:00 fio
>> --status-interval 10 1.fio
>> root 18844 10064 18863 0 3 08:50 pts/1 00:00:00 fio
>> --status-interval 10 1.fio
>> root 18902 18805 18902 0 1 08:53 pts/0 00:00:00 grep fio
>> # gdb
>> (gdb) attach 18844
>> (gdb) bt
>> #0 0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from
>> /lib64/libpthread.so.0
>> #1 0x0000000000430419 in fio_mutex_down (mutex=0x7ffff7ff6000) at
>> mutex.c:155
>> #2 0x0000000000411bc0 in stat_exit () at stat.c:1905
>> #3 0x000000000044c1d6 in fio_backend () at backend.c:2094
>> #4 0x000000376ee1ed1d in __libc_start_main () from /lib64/libc.so.6
>> #5 0x000000000040a679 in _start ()
>> (gdb) attach 18862
>> (gdb) bt
>> #0 0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from
>> /lib64/libpthread.so.0
>> #1 0x0000000000430419 in fio_mutex_down (mutex=0x7ffff7ff2000) at
>> mutex.c:155
>> #2 0x000000000041c717 in __show_running_run_stats (arg=0x72a330) at
>> stat.c:1445
>> #3 0x000000376f6079d1 in start_thread () from /lib64/libpthread.so.0
>> #4 0x000000376eee8b6d in clone () from /lib64/libc.so.6
>> (gdb) attach 18863
>> (gdb) bt
>> #0 0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from
>> /lib64/libpthread.so.0
>> #1 0x0000000000430419 in fio_mutex_down (mutex=0x7ffff7ff6000) at
>> mutex.c:155
>> #2 0x000000000041c5cd in __show_running_run_stats (arg=0x72a490) at
>> stat.c:1421
>> #3 0x000000376f6079d1 in start_thread () from /lib64/libpthread.so.0
>> #4 0x000000376eee8b6d in clone () from /lib64/libc.so.6
>>
>> Let me know if you need any additional information.
>
>
> This is perfect, makes a lot more sense. Progress on this may be a little
> slow, I'm on vacation this week... But expect something next week, at least.
>
>
> --
> Jens Axboe
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: fio hangs with --status-interval
2014-10-23 15:57 ` Michael Mattsson
@ 2014-10-23 16:01 ` Jens Axboe
2014-10-23 21:58 ` Michael Mattsson
0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2014-10-23 16:01 UTC (permalink / raw)
To: Michael Mattsson; +Cc: Vasily Tarasov, fio
On 10/23/2014 09:57 AM, Michael Mattsson wrote:
> Hey,
> Has there been any work on this? This is still an issue for me on fio
> 2.1.13. It happens less frequently, but I've had two hangs in a sample
> of ~40 runs.
Try current -git and see if it works better. If not, let's get some new
traces and we'll get this fixed up.
--
Jens Axboe
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: fio hangs with --status-interval
2014-10-23 16:01 ` Jens Axboe
@ 2014-10-23 21:58 ` Michael Mattsson
2014-10-24 5:17 ` Jens Axboe
0 siblings, 1 reply; 25+ messages in thread
From: Michael Mattsson @ 2014-10-23 21:58 UTC (permalink / raw)
To: Jens Axboe; +Cc: Vasily Tarasov, fio
Hi,
I got some new traces of the stuck fio process.
This is 2.1.13:
(gdb) bt
#0 0x00007f53cbb085bc in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1 0x0000000000430339 in fio_mutex_down (mutex=0x7f53c24ce000) at mutex.c:155
#2 0x0000000000411950 in stat_exit () at stat.c:1905
#3 0x000000000044c756 in fio_backend () at backend.c:2094
#4 0x00007f53cb583d1d in __libc_start_main () from /lib64/libc.so.6
#5 0x0000000000409e49 in _start ()
(gdb)
With fio-2.1.13-88-gb2ee7:
(gdb) bt
#0 0x00007f871e9de22d in pthread_join () from /lib64/libpthread.so.0
#1 0x0000000000411939 in wait_for_status_interval_thread_exit () at stat.c:1939
#2 0x000000000044ce2a in fio_backend () at backend.c:2084
#3 0x00007f871e45cd1d in __libc_start_main () from /lib64/libc.so.6
#4 0x000000000040a529 in _start ()
(gdb)
Regards
Michael
On Thu, Oct 23, 2014 at 9:01 AM, Jens Axboe <axboe@kernel.dk> wrote:
> On 10/23/2014 09:57 AM, Michael Mattsson wrote:
>> Hey,
>> Has there been any work on this? This is still an issue for me on fio
>> 2.1.13. It happens less frequently, but I've had two hangs in a sample
>> of ~40 runs.
>
> Try current -git and see if it works better. If not, let's get some new
> traces and we'll get this fixed up.
>
> --
> Jens Axboe
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: fio hangs with --status-interval
2014-10-23 21:58 ` Michael Mattsson
@ 2014-10-24 5:17 ` Jens Axboe
2014-10-24 15:33 ` Michael Mattsson
0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2014-10-24 5:17 UTC (permalink / raw)
To: Michael Mattsson; +Cc: Vasily Tarasov, fio
On 2014-10-23 15:58, Michael Mattsson wrote:
> Hi,
> I got some new traces of the stuck fio process.
>
> This is 2.1.13:
> (gdb) bt
> #0 0x00007f53cbb085bc in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1 0x0000000000430339 in fio_mutex_down (mutex=0x7f53c24ce000) at mutex.c:155
> #2 0x0000000000411950 in stat_exit () at stat.c:1905
> #3 0x000000000044c756 in fio_backend () at backend.c:2094
> #4 0x00007f53cb583d1d in __libc_start_main () from /lib64/libc.so.6
> #5 0x0000000000409e49 in _start ()
> (gdb)
>
> With fio-2.1.13-88-gb2ee7:
> (gdb) bt
> #0 0x00007f871e9de22d in pthread_join () from /lib64/libpthread.so.0
> #1 0x0000000000411939 in wait_for_status_interval_thread_exit () at stat.c:1939
> #2 0x000000000044ce2a in fio_backend () at backend.c:2084
> #3 0x00007f871e45cd1d in __libc_start_main () from /lib64/libc.so.6
> #4 0x000000000040a529 in _start ()
OK, I think I see what this is. Please update to current -git again and
see if that doesn't fix up this issue.
--
Jens Axboe
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: fio hangs with --status-interval
2014-10-24 5:17 ` Jens Axboe
@ 2014-10-24 15:33 ` Michael Mattsson
2014-10-24 15:50 ` Jens Axboe
0 siblings, 1 reply; 25+ messages in thread
From: Michael Mattsson @ 2014-10-24 15:33 UTC (permalink / raw)
To: Jens Axboe; +Cc: Vasily Tarasov, fio
Hey,
This seems to be fixed now. I successfully finished 1500+ runs
overnight in my rig without issues (previously a fio thread would hang
about every other run or so across the eight clients).
The stumbling-output issue I've seen on the second status-interval
thread still remains, though it's inconsistent.
Regards
Michael
On Thu, Oct 23, 2014 at 10:17 PM, Jens Axboe <axboe@kernel.dk> wrote:
> On 2014-10-23 15:58, Michael Mattsson wrote:
>>
>> Hi,
>> I got some new traces of the stuck fio process.
>>
>> This is 2.1.13:
>> (gdb) bt
>> #0 0x00007f53cbb085bc in pthread_cond_wait@@GLIBC_2.3.2 () from
>> /lib64/libpthread.so.0
>> #1 0x0000000000430339 in fio_mutex_down (mutex=0x7f53c24ce000) at
>> mutex.c:155
>> #2 0x0000000000411950 in stat_exit () at stat.c:1905
>> #3 0x000000000044c756 in fio_backend () at backend.c:2094
>> #4 0x00007f53cb583d1d in __libc_start_main () from /lib64/libc.so.6
>> #5 0x0000000000409e49 in _start ()
>> (gdb)
>>
>> With fio-2.1.13-88-gb2ee7:
>> (gdb) bt
>> #0 0x00007f871e9de22d in pthread_join () from /lib64/libpthread.so.0
>> #1 0x0000000000411939 in wait_for_status_interval_thread_exit () at
>> stat.c:1939
>> #2 0x000000000044ce2a in fio_backend () at backend.c:2084
>> #3 0x00007f871e45cd1d in __libc_start_main () from /lib64/libc.so.6
>> #4 0x000000000040a529 in _start ()
>
>
> OK, I think I see what this is. Please update to current -git again and see
> if that doesn't fix up this issue.
>
> --
> Jens Axboe
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: fio hangs with --status-interval
2014-10-24 15:33 ` Michael Mattsson
@ 2014-10-24 15:50 ` Jens Axboe
2014-11-04 15:46 ` Vasily Tarasov
0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2014-10-24 15:50 UTC (permalink / raw)
To: Michael Mattsson; +Cc: Vasily Tarasov, fio
On 2014-10-24 09:33, Michael Mattsson wrote:
> Hey,
> This seems to be fixed now. I successfully finished 1500+ runs
> overnight in my rig without issues (previously a fio thread would hang
> about every other run or so across the eight clients).
Goodie, I thought that it might.
--
Jens Axboe
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: fio hangs with --status-interval
2014-10-24 15:50 ` Jens Axboe
@ 2014-11-04 15:46 ` Vasily Tarasov
0 siblings, 0 replies; 25+ messages in thread
From: Vasily Tarasov @ 2014-11-04 15:46 UTC (permalink / raw)
To: Jens Axboe; +Cc: Michael Mattsson, fio
The latest code from git does not hang for me either. Thanks.
Vasily
On Fri, Oct 24, 2014 at 11:50 AM, Jens Axboe <axboe@kernel.dk> wrote:
> On 2014-10-24 09:33, Michael Mattsson wrote:
>>
>> Hey,
>> This seems to be fixed now. I successfully finished 1500+ runs
>> overnight in my rig without issues (previously a fio thread would hang
>> about every other run or so across the eight clients).
>
>
> Goodie, I thought that it might.
>
> --
> Jens Axboe
>
^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2014-11-04 15:46 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-09 22:56 fio hangs with --status-interval Michael Mattsson
2014-07-10 8:44 ` Jens Axboe
2014-07-10 14:55 ` Michael Mattsson
2014-07-10 20:07 ` Jens Axboe
2014-07-10 21:27 ` Michael Mattsson
2014-07-11 11:48 ` Jens Axboe
2014-07-11 16:19 ` Michael Mattsson
2014-07-12 8:42 ` Jens Axboe
2014-07-16 16:58 ` Vasily Tarasov
2014-07-21 8:08 ` Jens Axboe
2014-07-21 20:25 ` Vasily Tarasov
2014-07-21 20:54 ` Jens Axboe
2014-07-25 7:43 ` Jens Axboe
2014-07-25 7:56 ` Jens Axboe
2014-07-25 16:34 ` Vasily Tarasov
2014-07-28 8:56 ` Jens Axboe
2014-07-28 16:05 ` Vasily Tarasov
2014-07-28 19:49 ` Jens Axboe
2014-10-23 15:57 ` Michael Mattsson
2014-10-23 16:01 ` Jens Axboe
2014-10-23 21:58 ` Michael Mattsson
2014-10-24 5:17 ` Jens Axboe
2014-10-24 15:33 ` Michael Mattsson
2014-10-24 15:50 ` Jens Axboe
2014-11-04 15:46 ` Vasily Tarasov