* fio hangs with --status-interval
@ 2014-07-09 22:56 Michael Mattsson
  2014-07-10  8:44 ` Jens Axboe
  0 siblings, 1 reply; 25+ messages in thread
From: Michael Mattsson @ 2014-07-09 22:56 UTC (permalink / raw)
  To: fio

Hey,
I've got 8 identical CentOS 6.5 clients on which fio randomly hangs
when using --status-interval. I've tried fio 2.1.4 and fio 2.1.10;
they both behave the same. I've also tried piping the output to tee
instead of redirecting to a file, and I also tried --output with a
specified output file; still the same problem. My fio command runs
through its tests flawlessly without --status-interval and exits
cleanly every time. Anywhere from 0 to 5 clients can be affected on
any given run. Running strace on a process that seems hung yields the
following output:

$ strace -p 31055
Process 31055 attached - interrupt to quit
futex(0x7f346ede802c, FUTEX_WAIT, 1, NULL

It will sit there forever. Excuse the splurge, but this is the command
(run on 8 clients simultaneously, where /fut1 and /fut2 are on shared
NFSv3 storage):

$ fio --minimal --direct=1 --group_reporting --filesize=64m
--norandommap --blocksize=4k --time_based --iodepth=1 --ramp_time=15
--ioengine=libaio --status-interval=5 --name=mytest-foo --rw=randread
--numjobs=64 --filename=/fut1/6of4_1:/fut2/6of4_1:/fut1/6of4_2:/fut2/6of4_2:/fut1/6of4_3:/fut2/6of4_3:/fut1/6of4_4:/fut2/6of4_4:/fut1/6of4_5:/fut2/6of4_5:/fut1/6of4_6:/fut2/6of4_6:/fut1/6of4_7:/fut2/6of4_7:/fut1/6of4_8:/fut2/6of4_8:/fut1/6of4_9:/fut2/6of4_9:/fut1/6of4_10:/fut2/6of4_10:/fut1/6of4_11:/fut2/6of4_11:/fut1/6of4_12:/fut2/6of4_12:/fut1/6of4_13:/fut2/6of4_13:/fut1/6of4_14:/fut2/6of4_14:/fut1/6of4_15:/fut2/6of4_15:/fut1/6of4_16:/fut2/6of4_16:/fut1/6of4_17:/fut2/6of4_17:/fut1/6of4_18:/fut2/6of4_18:/fut1/6of4_19:/fut2/6of4_19:/fut1/6of4_20:/fut2/6of4_20:/fut1/6of4_21:/fut2/6of4_21:/fut1/6of4_22:/fut2/6of4_22:/fut1/6of4_23:/fut2/6of4_23:/fut1/6of4_24:/fut2/6of4_24:/fut1/6of4_25:/fut2/6of4_25:/fut1/6of4_26:/fut2/6of4_26:/fut1/6of4_27:/fut2/6of4_27:/fut1/6of4_28:/fut2/6of4_28:/fut1/6of4_29:/fut2/6of4_29:/fut1/6of4_30:/fut2/6of4_30:/fut1/6of4_31:/fut2/6of4_31:/fut1/6of4_32:/fut2/6of4_32
--runtime=60

Worth noting is that the output file is redirected to shared storage
(NFSv4) on a different system than the one under test.

These are the fio 2.1.4 compile options:
$ ./configure
Operating system              Linux
CPU                           x86_64
Big endian                    no
Compiler                      gcc
Cross compile                 no

Wordsize                      64
zlib                          no
Linux AIO support             yes
POSIX AIO support             yes
POSIX AIO support needs -lrt  yes
POSIX AIO fsync               yes
Solaris AIO support           no
__sync_fetch_and_add          yes
libverbs                      no
rdmacm                        no
Linux fallocate               yes
POSIX fadvise                 yes
POSIX fallocate               yes
sched_setaffinity(3 arg)      yes
sched_setaffinity(2 arg)      no
clock_gettime                 yes
CLOCK_MONOTONIC               yes
CLOCK_MONOTONIC_PRECISE       no
gettimeofday                  yes
fdatasync                     yes
sync_file_range               yes
EXT4 move extent              yes
Linux splice(2)               yes
GUASI                         no
Fusion-io atomic engine       no
libnuma                       no
strsep                        yes
strcasestr                    yes
getopt_long_only()            yes
inet_aton                     yes
socklen_t                     yes
__thread                      yes
gtk 2.18 or higher            no
RUSAGE_THREAD                 yes
SCHED_IDLE                    yes
TCP_NODELAY                   yes
RLIMIT_MEMLOCK                yes
pwritev/preadv                yes

Regards
Michael

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: fio hangs with --status-interval
  2014-07-09 22:56 fio hangs with --status-interval Michael Mattsson
@ 2014-07-10  8:44 ` Jens Axboe
  2014-07-10 14:55   ` Michael Mattsson
  0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2014-07-10  8:44 UTC (permalink / raw)
  To: Michael Mattsson, fio

[-- Attachment #1: Type: text/plain, Size: 988 bytes --]

On 2014-07-10 00:56, Michael Mattsson wrote:
> Hey,
> I've got 8 identical CentOS 6.5 clients on which fio randomly hangs
> when using --status-interval. I've tried fio 2.1.4 and fio 2.1.10;
> they both behave the same. I've also tried piping the output to tee
> instead of redirecting to a file, and I also tried --output with a
> specified output file; still the same problem. My fio command runs
> through its tests flawlessly without --status-interval and exits
> cleanly every time. Anywhere from 0 to 5 clients can be affected on
> any given run. Running strace on a process that seems hung yields the
> following output:
>
> $ strace -p 31055
> Process 31055 attached - interrupt to quit
> futex(0x7f346ede802c, FUTEX_WAIT, 1, NULL

Strange, it must be stuck on the stat mutex, but I don't immediately see 
why that would happen. Does the attached patch make any difference for 
you, both in getting rid of the hang and in still producing output at the 
desired intervals?

-- 
Jens Axboe


[-- Attachment #2: stat.patch --]
[-- Type: text/x-patch, Size: 1790 bytes --]

diff --git a/stat.c b/stat.c
index 979c8100d378..93316a239f7b 100644
--- a/stat.c
+++ b/stat.c
@@ -1466,11 +1466,12 @@ static void *__show_running_run_stats(void fio_unused *arg)
  * in the sig handler, but we should be disturbing the system less by just
  * creating a thread to do it.
  */
-void show_running_run_stats(void)
+int show_running_run_stats(void)
 {
 	pthread_t thread;
 
-	fio_mutex_down(stat_mutex);
+	if (fio_mutex_down_trylock(stat_mutex))
+		return 1;
 
 	if (!pthread_create(&thread, NULL, __show_running_run_stats, NULL)) {
 		int err;
@@ -1479,10 +1480,11 @@ void show_running_run_stats(void)
 		if (err)
 			log_err("fio: DU thread detach failed: %s\n", strerror(err));
 
-		return;
+		return 0;
 	}
 
 	fio_mutex_up(stat_mutex);
+	return 1;
 }
 
 static int status_interval_init;
@@ -1531,8 +1533,8 @@ void check_for_running_stats(void)
 			fio_gettime(&status_time, NULL);
 			status_interval_init = 1;
 		} else if (mtime_since_now(&status_time) >= status_interval) {
-			show_running_run_stats();
-			fio_gettime(&status_time, NULL);
+			if (!show_running_run_stats())
+				fio_gettime(&status_time, NULL);
 			return;
 		}
 	}
diff --git a/stat.h b/stat.h
index 2e46175053e8..82b8e973e4be 100644
--- a/stat.h
+++ b/stat.h
@@ -218,7 +218,7 @@ extern void show_group_stats(struct group_run_stats *rs);
 extern int calc_thread_status(struct jobs_eta *je, int force);
 extern void display_thread_status(struct jobs_eta *je);
 extern void show_run_stats(void);
-extern void show_running_run_stats(void);
+extern int show_running_run_stats(void);
 extern void check_for_running_stats(void);
 extern void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src, int nr);
 extern void sum_group_stats(struct group_run_stats *dst, struct group_run_stats *src);
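
The core idea of the patch, for readers following along: make the interval
path opportunistic instead of blocking. A minimal sketch of the same pattern
with a plain pthread mutex (fio uses its own fio_mutex wrapper, so this is an
analogy, not fio code):

#include <pthread.h>

static pthread_mutex_t stat_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns 1 if this interval's dump was skipped because stats are busy. */
static int try_interval_dump(void)
{
	if (pthread_mutex_trylock(&stat_lock))
		return 1;	/* lock busy: skip now, retry next interval */

	/* ... gather and print the interim stats ... */

	pthread_mutex_unlock(&stat_lock);
	return 0;
}

The second hunk then only resets status_time when a dump actually ran, so a
skipped interval simply gets retried on the next pass through
check_for_running_stats().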

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: fio hangs with --status-interval
  2014-07-10  8:44 ` Jens Axboe
@ 2014-07-10 14:55   ` Michael Mattsson
  2014-07-10 20:07     ` Jens Axboe
  0 siblings, 1 reply; 25+ messages in thread
From: Michael Mattsson @ 2014-07-10 14:55 UTC (permalink / raw)
  To: Jens Axboe; +Cc: fio

Hey,
Thanks for the patch. I got log output but I had two clients hanging
in the same way with the attached patch.

$ patch -p0 < stat.patch
patching file b/stat.c
patching file b/stat.h
Hunk #1 succeeded at 213 (offset -5 lines).

I was using --output <filename>

Worth mentioning here: on the NFS server that the output file is
written to, a stat(1) is issued on that output file once per second.
Could that cause any problems?

Thanks!
Michael


On Thu, Jul 10, 2014 at 1:44 AM, Jens Axboe <axboe@kernel.dk> wrote:
> On 2014-07-10 00:56, Michael Mattsson wrote:
>>
>> Hey,
>> I've got 8 identical CentOS 6.5 clients on which fio randomly hangs
>> when using --status-interval. I've tried fio 2.1.4 and fio 2.1.10;
>> they both behave the same. I've also tried piping the output to tee
>> instead of redirecting to a file, and I also tried --output with a
>> specified output file; still the same problem. My fio command runs
>> through its tests flawlessly without --status-interval and exits
>> cleanly every time. Anywhere from 0 to 5 clients can be affected on
>> any given run. Running strace on a process that seems hung yields the
>> following output:
>>
>> $ strace -p 31055
>> Process 31055 attached - interrupt to quit
>> futex(0x7f346ede802c, FUTEX_WAIT, 1, NULL
>
>
> Strange, it must be stuck on the stat mutex, but I don't immediately see why
> that would happen. Does the attached patch make any difference for you, both
> in getting rid of the hang and in still producing output at the desired
> intervals?
>
> --
> Jens Axboe
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: fio hangs with --status-interval
  2014-07-10 14:55   ` Michael Mattsson
@ 2014-07-10 20:07     ` Jens Axboe
  2014-07-10 21:27       ` Michael Mattsson
  0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2014-07-10 20:07 UTC (permalink / raw)
  To: Michael Mattsson; +Cc: fio

On 2014-07-10 16:55, Michael Mattsson wrote:
> Hey,
> Thanks for the patch. I got log output but I had two clients hanging
> in the same way with the attached patch.
>
> $ patch -p0 < stat.patch
> patching file b/stat.c
> patching file b/stat.h
> Hunk #1 succeeded at 213 (offset -5 lines).
>
> I was using --output <filename>
>
> Worth mentioning here: on the NFS server that the output file is
> written to, a stat(1) is issued on that output file once per second.
> Could that cause any problems?

OK, so next question. If you leave it long enough, do you get "stuck 
process" dumps from the kernel? In any case, a:

# echo t > /proc/sysrq-trigger

dump would be handy to see for all the fio processes; this doesn't smell 
like a fio issue.
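
(The task dump from sysrq-t lands in the kernel log, so grabbing dmesg right
after triggering it, or pulling the fio entries out of /var/log/messages,
should be enough to capture it.)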

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: fio hangs with --status-interval
  2014-07-10 20:07     ` Jens Axboe
@ 2014-07-10 21:27       ` Michael Mattsson
  2014-07-11 11:48         ` Jens Axboe
  0 siblings, 1 reply; 25+ messages in thread
From: Michael Mattsson @ 2014-07-10 21:27 UTC (permalink / raw)
  To: Jens Axboe; +Cc: fio

Hey,
I don't get any kernel messages about stuck processes. echo t >
/proc/sysrq-trigger gives the following related to fio:

fio           S 0000000000000004     0  4189   3638 0x00000080
 ffff8817a0fd5bf8 0000000000000086 0000000000000000 ffffffff8111f867
 ffff8817a0fd5d01 00007fa2f43b6000 ffff8817a0fd5bd8 ffffffff810aec10
 ffff88186b867ab8 ffff8817a0fd5fd8 000000000000fbc8 ffff88186b867ab8
Call Trace:
 [<ffffffff8111f867>] ? unlock_page+0x27/0x30
 [<ffffffff810aec10>] ? get_futex_key+0x180/0x2b0
 [<ffffffff810ae559>] futex_wait_queue_me+0xb9/0xf0
 [<ffffffff810af668>] futex_wait+0x1f8/0x380
 [<ffffffff8100988e>] ? __switch_to+0x26e/0x320
 [<ffffffff810b0f31>] do_futex+0x121/0xb50
 [<ffffffff8109f491>] ? lock_hrtimer_base+0x31/0x60
 [<ffffffff810a010f>] ? hrtimer_try_to_cancel+0x3f/0xd0
 [<ffffffff810a01c2>] ? hrtimer_cancel+0x22/0x30
 [<ffffffff8152a413>] ? do_nanosleep+0x93/0xc0
 [<ffffffff810a0294>] ? hrtimer_nanosleep+0xc4/0x180
 [<ffffffff810b19db>] sys_futex+0x7b/0x170
 [<ffffffff810e1e87>] ? audit_syscall_entry+0x1d7/0x200
 [<ffffffff810e1c7e>] ? __audit_syscall_exit+0x25e/0x290
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
fio           S 0000000000000004     0  4191   3638 0x00000080
 ffff88186bccde68 0000000000000086 0000000000000000 ffffffff8109fdd3
 0000000000000004 0000000100000286 ffff88186bccde08 ffffffff8109f491
 ffff8818680c7af8 ffff88186bccdfd8 000000000000fbc8 ffff8818680c7af8
Call Trace:
 [<ffffffff8109fdd3>] ? __hrtimer_start_range_ns+0x1a3/0x460
 [<ffffffff8109f491>] ? lock_hrtimer_base+0x31/0x60
 [<ffffffff8100bb8e>] ? apic_timer_interrupt+0xe/0x20
 [<ffffffff8152a40b>] do_nanosleep+0x8b/0xc0
 [<ffffffff810a0294>] hrtimer_nanosleep+0xc4/0x180
 [<ffffffff8109f0f0>] ? hrtimer_wakeup+0x0/0x30
 [<ffffffff810a00c4>] ? hrtimer_start_range_ns+0x14/0x20
 [<ffffffff810a03be>] sys_nanosleep+0x6e/0x80
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
fio           S 0000000000000002     0  4270   3638 0x00000080
 ffff88186a503bf8 0000000000000086 ffff88186a503b68 ffffffff8111f867
 ffff88186a503d01 00007fa2f39b2000 ffff88186a503bd8 ffffffff810aec10
 ffff88186a747058 ffff88186a503fd8 000000000000fbc8 ffff88186a747058
Call Trace:
 [<ffffffff8111f867>] ? unlock_page+0x27/0x30
 [<ffffffff810aec10>] ? get_futex_key+0x180/0x2b0
 [<ffffffff810ae559>] futex_wait_queue_me+0xb9/0xf0
 [<ffffffff810af668>] futex_wait+0x1f8/0x380
 [<ffffffff810b0f31>] do_futex+0x121/0xb50
 [<ffffffff810b19db>] sys_futex+0x7b/0x170
 [<ffffffff810e1e87>] ? audit_syscall_entry+0x1d7/0x200
 [<ffffffff810e1c7e>] ? __audit_syscall_exit+0x25e/0x290
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

Above output is without the stat.patch, below is with the patch:

fio           S 0000000000000004     0  4189   3638 0x00000080
 ffff8817a0fd5bf8 0000000000000086 0000000000000000 ffffffff8111f867
 ffff8817a0fd5d01 00007fa2f43b6000 ffff8817a0fd5bd8 ffffffff810aec10
 ffff88186b867ab8 ffff8817a0fd5fd8 000000000000fbc8 ffff88186b867ab8
Call Trace:
 [<ffffffff8111f867>] ? unlock_page+0x27/0x30
 [<ffffffff810aec10>] ? get_futex_key+0x180/0x2b0
 [<ffffffff810ae559>] futex_wait_queue_me+0xb9/0xf0
 [<ffffffff810af668>] futex_wait+0x1f8/0x380
 [<ffffffff8100988e>] ? __switch_to+0x26e/0x320
 [<ffffffff810b0f31>] do_futex+0x121/0xb50
 [<ffffffff8109f491>] ? lock_hrtimer_base+0x31/0x60
 [<ffffffff810a010f>] ? hrtimer_try_to_cancel+0x3f/0xd0
 [<ffffffff810a01c2>] ? hrtimer_cancel+0x22/0x30
 [<ffffffff8152a413>] ? do_nanosleep+0x93/0xc0
 [<ffffffff810a0294>] ? hrtimer_nanosleep+0xc4/0x180
 [<ffffffff810b19db>] sys_futex+0x7b/0x170
 [<ffffffff810e1e87>] ? audit_syscall_entry+0x1d7/0x200
 [<ffffffff810e1c7e>] ? __audit_syscall_exit+0x25e/0x290
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
fio           S 0000000000000004     0  4191   3638 0x00000080
 ffff88186bccde68 0000000000000086 0000000000000000 ffffffff8109fdd3
 0000000000000004 0000000100000286 ffff88186bccde08 ffffffff8109f491
 ffff8818680c7af8 ffff88186bccdfd8 000000000000fbc8 ffff8818680c7af8
Call Trace:
 [<ffffffff8109fdd3>] ? __hrtimer_start_range_ns+0x1a3/0x460
 [<ffffffff8109f491>] ? lock_hrtimer_base+0x31/0x60
 [<ffffffff8100bb8e>] ? apic_timer_interrupt+0xe/0x20
 [<ffffffff8152a40b>] do_nanosleep+0x8b/0xc0
 [<ffffffff810a0294>] hrtimer_nanosleep+0xc4/0x180
 [<ffffffff8109f0f0>] ? hrtimer_wakeup+0x0/0x30
 [<ffffffff810a00c4>] ? hrtimer_start_range_ns+0x14/0x20
 [<ffffffff810a03be>] sys_nanosleep+0x6e/0x80
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
fio           S 0000000000000002     0  4270   3638 0x00000080
 ffff88186a503bf8 0000000000000086 ffff88186a503b68 ffffffff8111f867
 ffff88186a503d01 00007fa2f39b2000 ffff88186a503bd8 ffffffff810aec10
 ffff88186a747058 ffff88186a503fd8 000000000000fbc8 ffff88186a747058
Call Trace:
 [<ffffffff8111f867>] ? unlock_page+0x27/0x30
 [<ffffffff810aec10>] ? get_futex_key+0x180/0x2b0
 [<ffffffff810ae559>] futex_wait_queue_me+0xb9/0xf0
 [<ffffffff810af668>] futex_wait+0x1f8/0x380
 [<ffffffff810b0f31>] do_futex+0x121/0xb50
 [<ffffffff810b19db>] sys_futex+0x7b/0x170
 [<ffffffff810e1e87>] ? audit_syscall_entry+0x1d7/0x200
 [<ffffffff810e1c7e>] ? __audit_syscall_exit+0x25e/0x290
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b


Thanks!
Michael

On Thu, Jul 10, 2014 at 1:07 PM, Jens Axboe <axboe@kernel.dk> wrote:
> On 2014-07-10 16:55, Michael Mattsson wrote:
>>
>> Hey,
>> Thanks for the patch. I got log output but I had two clients hanging
>> in the same way with the attached patch.
>>
>> $ patch -p0 < stat.patch
>> patching file b/stat.c
>> patching file b/stat.h
>> Hunk #1 succeeded at 213 (offset -5 lines).
>>
>> I was using --output <filename>
>>
>> Worth mentioning here: on the NFS server that the output file is
>> written to, a stat(1) is issued on that output file once per second.
>> Could that cause any problems?
>
>
> OK, so next question. If you leave it long enough, do you get "stuck
> process" dumps from the kernel? In any case, a:
>
> # echo t > /proc/sysrq-trigger
>
> dump would be handy to see for all the fio processes; this doesn't smell
> like a fio issue.
>
> --
> Jens Axboe
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: fio hangs with --status-interval
  2014-07-10 21:27       ` Michael Mattsson
@ 2014-07-11 11:48         ` Jens Axboe
  2014-07-11 16:19           ` Michael Mattsson
  0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2014-07-11 11:48 UTC (permalink / raw)
  To: Michael Mattsson; +Cc: fio

On 2014-07-10 23:27, Michael Mattsson wrote:
> Hey,
> I don't get any kernel messages about stuck processes. echo t >
> /proc/sysrq-trigger gives the following related to fio:
>
> fio           S 0000000000000004     0  4189   3638 0x00000080
>   ffff8817a0fd5bf8 0000000000000086 0000000000000000 ffffffff8111f867
>   ffff8817a0fd5d01 00007fa2f43b6000 ffff8817a0fd5bd8 ffffffff810aec10
>   ffff88186b867ab8 ffff8817a0fd5fd8 000000000000fbc8 ffff88186b867ab8
> Call Trace:
>   [<ffffffff8111f867>] ? unlock_page+0x27/0x30
>   [<ffffffff810aec10>] ? get_futex_key+0x180/0x2b0
>   [<ffffffff810ae559>] futex_wait_queue_me+0xb9/0xf0
>   [<ffffffff810af668>] futex_wait+0x1f8/0x380
>   [<ffffffff8100988e>] ? __switch_to+0x26e/0x320
>   [<ffffffff810b0f31>] do_futex+0x121/0xb50
>   [<ffffffff8109f491>] ? lock_hrtimer_base+0x31/0x60
>   [<ffffffff810a010f>] ? hrtimer_try_to_cancel+0x3f/0xd0
>   [<ffffffff810a01c2>] ? hrtimer_cancel+0x22/0x30
>   [<ffffffff8152a413>] ? do_nanosleep+0x93/0xc0
>   [<ffffffff810a0294>] ? hrtimer_nanosleep+0xc4/0x180
>   [<ffffffff810b19db>] sys_futex+0x7b/0x170
>   [<ffffffff810e1e87>] ? audit_syscall_entry+0x1d7/0x200
>   [<ffffffff810e1c7e>] ? __audit_syscall_exit+0x25e/0x290
>   [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

OK, so nothing unexpected there. But it's all still very weird. Could 
you try and attach gdb to a fio process that is stuck like this, and 
generate 'bt' backtraces?
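
If attaching interactively on those clients is a hassle, a one-shot capture
along the lines of

# gdb -batch -p <pid> -ex bt

should do as well (with <pid> being the stuck fio process).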

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: fio hangs with --status-interval
  2014-07-11 11:48         ` Jens Axboe
@ 2014-07-11 16:19           ` Michael Mattsson
  2014-07-12  8:42             ` Jens Axboe
  0 siblings, 1 reply; 25+ messages in thread
From: Michael Mattsson @ 2014-07-11 16:19 UTC (permalink / raw)
  To: Jens Axboe; +Cc: fio

Hey,
This system will be busy over the weekend with other work; I'll get
this to you at the next window or try to reproduce it on a different
system. Do you want the trace with or without the patch you sent?

Thanks!
Michael

On Fri, Jul 11, 2014 at 4:48 AM, Jens Axboe <axboe@kernel.dk> wrote:
> On 2014-07-10 23:27, Michael Mattsson wrote:
>>
>> Hey,
>> I don't get any kernel messages about stuck processes. echo t >
>> /proc/sysrq-trigger gives the following related to fio:
>>
>> fio           S 0000000000000004     0  4189   3638 0x00000080
>>   ffff8817a0fd5bf8 0000000000000086 0000000000000000 ffffffff8111f867
>>   ffff8817a0fd5d01 00007fa2f43b6000 ffff8817a0fd5bd8 ffffffff810aec10
>>   ffff88186b867ab8 ffff8817a0fd5fd8 000000000000fbc8 ffff88186b867ab8
>> Call Trace:
>>   [<ffffffff8111f867>] ? unlock_page+0x27/0x30
>>   [<ffffffff810aec10>] ? get_futex_key+0x180/0x2b0
>>   [<ffffffff810ae559>] futex_wait_queue_me+0xb9/0xf0
>>   [<ffffffff810af668>] futex_wait+0x1f8/0x380
>>   [<ffffffff8100988e>] ? __switch_to+0x26e/0x320
>>   [<ffffffff810b0f31>] do_futex+0x121/0xb50
>>   [<ffffffff8109f491>] ? lock_hrtimer_base+0x31/0x60
>>   [<ffffffff810a010f>] ? hrtimer_try_to_cancel+0x3f/0xd0
>>   [<ffffffff810a01c2>] ? hrtimer_cancel+0x22/0x30
>>   [<ffffffff8152a413>] ? do_nanosleep+0x93/0xc0
>>   [<ffffffff810a0294>] ? hrtimer_nanosleep+0xc4/0x180
>>   [<ffffffff810b19db>] sys_futex+0x7b/0x170
>>   [<ffffffff810e1e87>] ? audit_syscall_entry+0x1d7/0x200
>>   [<ffffffff810e1c7e>] ? __audit_syscall_exit+0x25e/0x290
>>   [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
>
>
> OK, so nothing unexpected there. But it's all still very weird. Could you
> try and attach gdb to a fio process that is stuck like this, and generate
> 'bt' backtraces?
>
> --
> Jens Axboe
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: fio hangs with --status-interval
  2014-07-11 16:19           ` Michael Mattsson
@ 2014-07-12  8:42             ` Jens Axboe
  2014-07-16 16:58               ` Vasily Tarasov
  0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2014-07-12  8:42 UTC (permalink / raw)
  To: Michael Mattsson; +Cc: fio

On 2014-07-11 18:19, Michael Mattsson wrote:
> Hey,
> This system will be busy over the weekend with other work; I'll get
> this to you at the next window or try to reproduce it on a different
> system. Do you want the trace with or without the patch you sent?

That's fine - and either version can be used, doesn't matter to me.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: fio hangs with --status-interval
  2014-07-12  8:42             ` Jens Axboe
@ 2014-07-16 16:58               ` Vasily Tarasov
  2014-07-21  8:08                 ` Jens Axboe
  0 siblings, 1 reply; 25+ messages in thread
From: Vasily Tarasov @ 2014-07-16 16:58 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Michael Mattsson, fio

I started to observe similar behavior with one of my workloads, also
with periodic statistics output and also on RHEL 6.5. Here is the gdb
output in my case:


# ps axu | grep fio
root      4489  0.0  0.0 322040 52816 pts/1    Sl+  08:31   0:03 fio
--status-interval 10 --minimal fios/1.fio
root      5547  0.0  0.0 103256   860 pts/0    S+   09:56   0:00 grep fio

# cat /proc/4489/wchan
futex_wait_queue_me

# gdb
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
(gdb) attach 4489
Attaching to process 4489
Reading symbols from /usr/local/bin/fio...done.
Reading symbols from /usr/lib64/librdmacm.so.1...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib64/librdmacm.so.1
Reading symbols from /usr/lib64/libibverbs.so.1...(no debugging
symbols found)...done.
Loaded symbols for /usr/lib64/libibverbs.so.1
Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/librt.so.1
Reading symbols from /lib64/libaio.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libaio.so.1
Reading symbols from /lib64/libz.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib64/libz.so.1
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libpthread.so.0...(no debugging symbols
found)...done.
[New LWP 4768]
[New LWP 4491]
[Thread debugging using libthread_db enabled]
Loaded symbols for /lib64/libpthread.so.0
Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/libdl.so.2
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging
symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
Missing separate debuginfos, use: debuginfo-install
glibc-2.12-1.132.el6.x86_64 libaio-0.3.107-10.el6.x86_64
libibverbs-1.1.7-1.el6.x86_64 librdmacm-1.0.17-1.el6.x86_64
zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0  0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x000000000042ea39 in fio_mutex_down (mutex=0x7f08b4e9f000) at mutex.c:155
#2  0x000000000041b680 in show_run_stats () at stat.c:1409
#3  0x0000000000449c85 in fio_backend () at backend.c:2042
#4  0x000000376ee1ed1d in __libc_start_main () from /lib64/libc.so.6
#5  0x000000000040a4b9 in _start ()
(gdb)


Does it help?

Thanks,
Vasily

On Sat, Jul 12, 2014 at 4:42 AM, Jens Axboe <axboe@kernel.dk> wrote:
> On 2014-07-11 18:19, Michael Mattsson wrote:
>>
>> Hey,
>> This system will be busy over the weekend with other work; I'll get
>> this to you at the next window or try to reproduce it on a different
>> system. Do you want the trace with or without the patch you sent?
>
>
> That's fine - and either version can be used, doesn't matter to me.
>
>
> --
> Jens Axboe
>
> --
> To unsubscribe from this list: send the line "unsubscribe fio" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: fio hangs with --status-interval
  2014-07-16 16:58               ` Vasily Tarasov
@ 2014-07-21  8:08                 ` Jens Axboe
  2014-07-21 20:25                   ` Vasily Tarasov
  0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2014-07-21  8:08 UTC (permalink / raw)
  To: Vasily Tarasov; +Cc: Michael Mattsson, fio

[-- Attachment #1: Type: text/plain, Size: 3422 bytes --]

On 2014-07-16 18:58, Vasily Tarasov wrote:
> I started to observe similar behavior with one of my workloads, also
> with periodic statistics output and also on RHEL 6.5. Here is the gdb
> output in my case:
>
>
> # ps axu | grep fio
> root      4489  0.0  0.0 322040 52816 pts/1    Sl+  08:31   0:03 fio
> --status-interval 10 --minimal fios/1.fio
> root      5547  0.0  0.0 103256   860 pts/0    S+   09:56   0:00 grep fio
>
> # cat /proc/4489/wchan
> futex_wait_queue_me
>
> # gdb
> GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
> Copyright (C) 2010 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>.
> (gdb) attach 4489
> Attaching to process 4489
> Reading symbols from /usr/local/bin/fio...done.
> Reading symbols from /usr/lib64/librdmacm.so.1...(no debugging symbols
> found)...done.
> Loaded symbols for /usr/lib64/librdmacm.so.1
> Reading symbols from /usr/lib64/libibverbs.so.1...(no debugging
> symbols found)...done.
> Loaded symbols for /usr/lib64/libibverbs.so.1
> Reading symbols from /lib64/librt.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib64/librt.so.1
> Reading symbols from /lib64/libaio.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libaio.so.1
> Reading symbols from /lib64/libz.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libz.so.1
> Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libm.so.6
> Reading symbols from /lib64/libpthread.so.0...(no debugging symbols
> found)...done.
> [New LWP 4768]
> [New LWP 4491]
> [Thread debugging using libthread_db enabled]
> Loaded symbols for /lib64/libpthread.so.0
> Reading symbols from /lib64/libdl.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libdl.so.2
> Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib64/libc.so.6
> Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging
> symbols found)...done.
> Loaded symbols for /lib64/ld-linux-x86-64.so.2
> 0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> Missing separate debuginfos, use: debuginfo-install
> glibc-2.12-1.132.el6.x86_64 libaio-0.3.107-10.el6.x86_64
> libibverbs-1.1.7-1.el6.x86_64 librdmacm-1.0.17-1.el6.x86_64
> zlib-1.2.3-29.el6.x86_64
> (gdb) bt
> #0  0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x000000000042ea39 in fio_mutex_down (mutex=0x7f08b4e9f000) at mutex.c:155
> #2  0x000000000041b680 in show_run_stats () at stat.c:1409
> #3  0x0000000000449c85 in fio_backend () at backend.c:2042
> #4  0x000000376ee1ed1d in __libc_start_main () from /lib64/libc.so.6
> #5  0x000000000040a4b9 in _start ()
> (gdb)

Are there other threads alive? It would be interesting to see a
backtrace from them. In any case, I think it'd be better to move the
stat mutex grab to the stat thread itself. I can't reproduce this, so
can you check if the attached patch makes a difference?
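
For the "other threads" part, 'info threads' followed by 'thread apply all bt'
in the attached gdb session should show whether the disk util thread or a
stats thread is also sitting on that mutex.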

-- 
Jens Axboe


[-- Attachment #2: stat.patch --]
[-- Type: text/x-patch, Size: 1325 bytes --]

diff --git a/stat.c b/stat.c
index 979c8100d378..d8365811b25f 100644
--- a/stat.c
+++ b/stat.c
@@ -1411,13 +1411,15 @@ void show_run_stats(void)
 	fio_mutex_up(stat_mutex);
 }
 
-static void *__show_running_run_stats(void fio_unused *arg)
+static void *__show_running_run_stats(void *arg)
 {
 	struct thread_data *td;
 	unsigned long long *rt;
 	struct timeval tv;
 	int i;
 
+	fio_mutex_down(stat_mutex);
+
 	rt = malloc(thread_number * sizeof(unsigned long long));
 	fio_gettime(&tv, NULL);
 
@@ -1458,6 +1460,7 @@ static void *__show_running_run_stats(void fio_unused *arg)
 
 	free(rt);
 	fio_mutex_up(stat_mutex);
+	free(arg);
 	return NULL;
 }
 
@@ -1468,21 +1471,23 @@ static void *__show_running_run_stats(void fio_unused *arg)
  */
 void show_running_run_stats(void)
 {
-	pthread_t thread;
+	pthread_t *thread;
 
-	fio_mutex_down(stat_mutex);
+	thread = calloc(1, sizeof(*thread));
+	if (!thread)
+		return;
 
-	if (!pthread_create(&thread, NULL, __show_running_run_stats, NULL)) {
+	if (!pthread_create(thread, NULL, __show_running_run_stats, thread)) {
 		int err;
 
-		err = pthread_detach(thread);
+		err = pthread_detach(*thread);
 		if (err)
 			log_err("fio: DU thread detach failed: %s\n", strerror(err));
 
 		return;
 	}
 
-	fio_mutex_up(stat_mutex);
+	free(thread);
 }
 
 static int status_interval_init;
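
To spell out the intent of this second patch: the caller never touches
stat_mutex any more, so the interval tick cannot block even if a previous
dump is still in flight; the detached worker serializes on the lock instead.
A rough sketch of the pattern with plain pthreads (leaving out fio's wrappers
and the pthread_t allocation the real patch carries around):

#include <pthread.h>

static pthread_mutex_t stat_lock = PTHREAD_MUTEX_INITIALIZER;

static void *stats_worker(void *unused)
{
	(void) unused;
	pthread_mutex_lock(&stat_lock);	/* the grab now lives in the worker */
	/* ... gather and print the interim stats ... */
	pthread_mutex_unlock(&stat_lock);
	return NULL;
}

/* Interval tick: never blocks, just fires off a detached worker. */
static void kick_stats_dump(void)
{
	pthread_attr_t attr;
	pthread_t t;

	pthread_attr_init(&attr);
	pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
	pthread_create(&t, &attr, stats_worker, NULL);
	pthread_attr_destroy(&attr);
}

The trade-off is that nothing in this path bounds how many workers can end up
queued behind the lock if a dump ever fails to release it.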

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: fio hangs with --status-interval
  2014-07-21  8:08                 ` Jens Axboe
@ 2014-07-21 20:25                   ` Vasily Tarasov
  2014-07-21 20:54                     ` Jens Axboe
  2014-07-25  7:43                     ` Jens Axboe
  0 siblings, 2 replies; 25+ messages in thread
From: Vasily Tarasov @ 2014-07-21 20:25 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Michael Mattsson, fio

Hi Jens,

I tried your patch, but it didn't help. Interestingly, the number of
threads changes in the end. At first, during the run:

# ps -eLf | grep fio
root      5224  4274  5224  1    2 11:12 pts/1    00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root      5224  4274  5225  0    2 11:12 pts/1    00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root      5231  5224  5231 60    1 11:12 ?        00:00:07 fio
--status-interval 10 --minimal fios/1.fio
root      5260  5237  5260  0    1 11:12 pts/0    00:00:00 grep fio
[root@bison01 vass]# ps -eLf | grep fio
root      5224  4274  5224  0    2 11:12 pts/1    00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root      5224  4274  5225  0    2 11:12 pts/1    00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root      5231  5224  5231 16    1 11:12 ?        00:00:21 fio
--status-interval 10 --minimal fios/1.fio
root      5293  5237  5293  0    1 11:14 pts/0    00:00:00 grep fio
[root@bison01 vass]# ps -eLf | grep fio
root      5224  4274  5224  0    2 11:12 pts/1    00:00:01 fio
--status-interval 10 --minimal fios/1.fio
root      5224  4274  5225  0    2 11:12 pts/1    00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root      5231  5224  5231 12    1 11:12 ?        00:01:13 fio
--status-interval 10 --minimal fios/1.fio
root      5411  5237  5411  0    1 11:22 pts/0    00:00:00 grep fio

Later, when the threads are stuck:

# ps -eLf | grep fio
root      5224  4274  5224  0   16 11:12 pts/1    00:00:02 fio
--status-interval 10 --minimal fios/1.fio
root      5224  4274  5225  0   16 11:12 pts/1    00:00:01 fio
--status-interval 10 --minimal fios/1.fio
root      5224  4274  5458  0   16 11:25 pts/1    00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root      5224  4274  5459  0   16 11:25 pts/1    00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root      5224  4274  5460  0   16 11:25 pts/1    00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root      5224  4274  5461  0   16 11:25 pts/1    00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root      5224  4274  5462  0   16 11:25 pts/1    00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root      5224  4274  5471  0   16 11:25 pts/1    00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root      5224  4274  5472  0   16 11:26 pts/1    00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root      5224  4274  5475  0   16 11:26 pts/1    00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root      5224  4274  5476  0   16 11:26 pts/1    00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root      5224  4274  5477  0   16 11:26 pts/1    00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root      5224  4274  5478  0   16 11:26 pts/1    00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root      5224  4274  5487  0   16 11:26 pts/1    00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root      5224  4274  5488  0   16 11:27 pts/1    00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root      5224  4274  5489  0   16 11:27 pts/1    00:00:00 fio
--status-interval 10 --minimal fios/1.fio
root      6665  5237  6665  0    1 13:21 pts/0    00:00:00 grep fio

Is the number of threads supposed to change?..

Here is the fio file I'm using:

<snip starts>
# cat fios/1.fio
[global]
rw=write
bs=8m
direct=0

[sdaa]
filename=/dev/sdaa
<snip ends>

Command line is

fio --status-interval 10 --minimal fios/1.fio

Thanks,
Vasily


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: fio hangs with --status-interval
  2014-07-21 20:25                   ` Vasily Tarasov
@ 2014-07-21 20:54                     ` Jens Axboe
  2014-07-25  7:43                     ` Jens Axboe
  1 sibling, 0 replies; 25+ messages in thread
From: Jens Axboe @ 2014-07-21 20:54 UTC (permalink / raw)
  To: Vasily Tarasov; +Cc: Michael Mattsson, fio

On 2014-07-21 22:25, Vasily Tarasov wrote:
> Hi Jens,
>
> I tried your patch, but it didn't help. Interestingly, the number of
> threads changes in the end. At first, during the run:
>
> # ps -eLf | grep fio
> root      5224  4274  5224  1    2 11:12 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5225  0    2 11:12 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5231  5224  5231 60    1 11:12 ?        00:00:07 fio
> --status-interval 10 --minimal fios/1.fio
> root      5260  5237  5260  0    1 11:12 pts/0    00:00:00 grep fio
> [root@bison01 vass]# ps -eLf | grep fio
> root      5224  4274  5224  0    2 11:12 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5225  0    2 11:12 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5231  5224  5231 16    1 11:12 ?        00:00:21 fio
> --status-interval 10 --minimal fios/1.fio
> root      5293  5237  5293  0    1 11:14 pts/0    00:00:00 grep fio
> [root@bison01 vass]# ps -eLf | grep fio
> root      5224  4274  5224  0    2 11:12 pts/1    00:00:01 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5225  0    2 11:12 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5231  5224  5231 12    1 11:12 ?        00:01:13 fio
> --status-interval 10 --minimal fios/1.fio
> root      5411  5237  5411  0    1 11:22 pts/0    00:00:00 grep fio
>
> Later, when the threads are stuck:
>
> # ps -eLf | grep fio
> root      5224  4274  5224  0   16 11:12 pts/1    00:00:02 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5225  0   16 11:12 pts/1    00:00:01 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5458  0   16 11:25 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5459  0   16 11:25 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5460  0   16 11:25 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5461  0   16 11:25 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5462  0   16 11:25 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5471  0   16 11:25 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5472  0   16 11:26 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5475  0   16 11:26 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5476  0   16 11:26 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5477  0   16 11:26 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5478  0   16 11:26 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5487  0   16 11:26 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5488  0   16 11:27 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5489  0   16 11:27 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      6665  5237  6665  0    1 13:21 pts/0    00:00:00 grep fio
>
> Is the number of threads supposed to change?..
>
> Here is the fio file I'm using:
>
> <snip starts>
> # cat fios/1.fio
> [global]
> rw=write
> bs=8m
> direct=0
>
> [sdaa]
> filename=/dev/sdaa
> <snip ends>
>
> Command line is
>
> fio --status-interval 10 --minimal fios/1.fio

Thanks, I'll try and reproduce it tomorrow.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: fio hangs with --status-interval
  2014-07-21 20:25                   ` Vasily Tarasov
  2014-07-21 20:54                     ` Jens Axboe
@ 2014-07-25  7:43                     ` Jens Axboe
  2014-07-25  7:56                       ` Jens Axboe
  1 sibling, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2014-07-25  7:43 UTC (permalink / raw)
  To: Vasily Tarasov; +Cc: Michael Mattsson, fio

[-- Attachment #1: Type: text/plain, Size: 3920 bytes --]

On 2014-07-21 22:25, Vasily Tarasov wrote:
> Hi Jens,
>
> I tried your patch, but it didn't help. Interestingly, the number of
> threads changes in the end. At first, during the run:
>
> # ps -eLf | grep fio
> root      5224  4274  5224  1    2 11:12 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5225  0    2 11:12 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5231  5224  5231 60    1 11:12 ?        00:00:07 fio
> --status-interval 10 --minimal fios/1.fio
> root      5260  5237  5260  0    1 11:12 pts/0    00:00:00 grep fio
> [root@bison01 vass]# ps -eLf | grep fio
> root      5224  4274  5224  0    2 11:12 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5225  0    2 11:12 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5231  5224  5231 16    1 11:12 ?        00:00:21 fio
> --status-interval 10 --minimal fios/1.fio
> root      5293  5237  5293  0    1 11:14 pts/0    00:00:00 grep fio
> [root@bison01 vass]# ps -eLf | grep fio
> root      5224  4274  5224  0    2 11:12 pts/1    00:00:01 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5225  0    2 11:12 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5231  5224  5231 12    1 11:12 ?        00:01:13 fio
> --status-interval 10 --minimal fios/1.fio
> root      5411  5237  5411  0    1 11:22 pts/0    00:00:00 grep fio
>
> Later, when the threads are stuck:
>
> # ps -eLf | grep fio
> root      5224  4274  5224  0   16 11:12 pts/1    00:00:02 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5225  0   16 11:12 pts/1    00:00:01 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5458  0   16 11:25 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5459  0   16 11:25 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5460  0   16 11:25 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5461  0   16 11:25 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5462  0   16 11:25 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5471  0   16 11:25 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5472  0   16 11:26 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5475  0   16 11:26 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5476  0   16 11:26 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5477  0   16 11:26 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5478  0   16 11:26 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5487  0   16 11:26 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5488  0   16 11:27 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      5224  4274  5489  0   16 11:27 pts/1    00:00:00 fio
> --status-interval 10 --minimal fios/1.fio
> root      6665  5237  6665  0    1 13:21 pts/0    00:00:00 grep fio
>
> Is the number of threads supposed to change?..

Never answered this one... Yes, it'll change, since when you run the 
job, you'll have one backend process, a number of IO workers, and one 
disk util thread typically. When you get stuck, it's the backend that is 
left waiting for that mutex.

In any case, I haven't been able to figure this one out yet. But it 
should be safe enough to just ignore the stat mutex for the final 
output, since the threads otherwise accessing it are gone. Can you see 
if this one makes the issue go away?

-- 
Jens Axboe


[-- Attachment #2: stat-hang.patch --]
[-- Type: text/x-patch, Size: 812 bytes --]

diff --git a/backend.c b/backend.c
index 30f78b72f9d1..981625b61095 100644
--- a/backend.c
+++ b/backend.c
@@ -2068,7 +2068,7 @@ int fio_backend(void)
 	run_threads();
 
 	if (!fio_abort) {
-		show_run_stats();
+		__show_run_stats();
 		if (write_bw_log) {
 			int i;
 
diff --git a/stat.h b/stat.h
index 2e46175053e8..90a7fb31a1bc 100644
--- a/stat.h
+++ b/stat.h
@@ -218,6 +218,7 @@ extern void show_group_stats(struct group_run_stats *rs);
 extern int calc_thread_status(struct jobs_eta *je, int force);
 extern void display_thread_status(struct jobs_eta *je);
 extern void show_run_stats(void);
+extern void __show_run_stats(void);
 extern void show_running_run_stats(void);
 extern void check_for_running_stats(void);
 extern void sum_thread_stats(struct thread_stat *dst, struct thread_stat *src, int nr);
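
For context, this patch assumes show_run_stats() is split into a locking
wrapper plus an unlocked __show_run_stats() body; the stat.c side isn't shown
here, but going by the fio_mutex_down()/fio_mutex_up() calls visible in the
earlier hunks and backtraces, the presumed shape is roughly:

/* Sketch only, not the actual stat.c change: the locked wrapper stays for
 * runtime callers, the unlocked body becomes callable on its own. */
void __show_run_stats(void)
{
	/* ... the existing end-of-run stats reporting ... */
}

void show_run_stats(void)
{
	fio_mutex_down(stat_mutex);
	__show_run_stats();
	fio_mutex_up(stat_mutex);
}

fio_backend() then calls __show_run_stats() directly for the final output, on
the reasoning above that nothing else can be touching the stats at that point.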

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: fio hangs with --status-interval
  2014-07-25  7:43                     ` Jens Axboe
@ 2014-07-25  7:56                       ` Jens Axboe
  2014-07-25 16:34                         ` Vasily Tarasov
  0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2014-07-25  7:56 UTC (permalink / raw)
  To: Vasily Tarasov; +Cc: Michael Mattsson, fio

On 2014-07-25 09:43, Jens Axboe wrote:
> On 2014-07-21 22:25, Vasily Tarasov wrote:
>> Hi Jens,
>>
>> I tried your patch, but it didn't help. Interestingly, the number of
>> threads changes in the end. At first, during the run:
>>
>> # ps -eLf | grep fio
>> root      5224  4274  5224  1    2 11:12 pts/1    00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root      5224  4274  5225  0    2 11:12 pts/1    00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root      5231  5224  5231 60    1 11:12 ?        00:00:07 fio
>> --status-interval 10 --minimal fios/1.fio
>> root      5260  5237  5260  0    1 11:12 pts/0    00:00:00 grep fio
>> [root@bison01 vass]# ps -eLf | grep fio
>> root      5224  4274  5224  0    2 11:12 pts/1    00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root      5224  4274  5225  0    2 11:12 pts/1    00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root      5231  5224  5231 16    1 11:12 ?        00:00:21 fio
>> --status-interval 10 --minimal fios/1.fio
>> root      5293  5237  5293  0    1 11:14 pts/0    00:00:00 grep fio
>> [root@bison01 vass]# ps -eLf | grep fio
>> root      5224  4274  5224  0    2 11:12 pts/1    00:00:01 fio
>> --status-interval 10 --minimal fios/1.fio
>> root      5224  4274  5225  0    2 11:12 pts/1    00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root      5231  5224  5231 12    1 11:12 ?        00:01:13 fio
>> --status-interval 10 --minimal fios/1.fio
>> root      5411  5237  5411  0    1 11:22 pts/0    00:00:00 grep fio
>>
>> Later, when the threads are stuck:
>>
>> # ps -eLf | grep fio
>> root      5224  4274  5224  0   16 11:12 pts/1    00:00:02 fio
>> --status-interval 10 --minimal fios/1.fio
>> root      5224  4274  5225  0   16 11:12 pts/1    00:00:01 fio
>> --status-interval 10 --minimal fios/1.fio
>> root      5224  4274  5458  0   16 11:25 pts/1    00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root      5224  4274  5459  0   16 11:25 pts/1    00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root      5224  4274  5460  0   16 11:25 pts/1    00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root      5224  4274  5461  0   16 11:25 pts/1    00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root      5224  4274  5462  0   16 11:25 pts/1    00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root      5224  4274  5471  0   16 11:25 pts/1    00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root      5224  4274  5472  0   16 11:26 pts/1    00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root      5224  4274  5475  0   16 11:26 pts/1    00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root      5224  4274  5476  0   16 11:26 pts/1    00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root      5224  4274  5477  0   16 11:26 pts/1    00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root      5224  4274  5478  0   16 11:26 pts/1    00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root      5224  4274  5487  0   16 11:26 pts/1    00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root      5224  4274  5488  0   16 11:27 pts/1    00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root      5224  4274  5489  0   16 11:27 pts/1    00:00:00 fio
>> --status-interval 10 --minimal fios/1.fio
>> root      6665  5237  6665  0    1 13:21 pts/0    00:00:00 grep fio
>>
>> Is the number of threads supposed to change?..
>
> Never answered this one... Yes, it'll change, since when you run the
> job, you'll have one backend process, a number of IO workers, and one
> disk util thread typically. When you get stuck, it's the backend that is
> left waiting for that mutex.
>
> In any case, I haven't been able to figure this one out yet. But it
> should be safe enough to just ignore the stat mutex for the final
> output, since the threads otherwise accessing it are gone. Can you see
> if this one makes the issue go away?

The patch was not compile-tested and was missing the non-static
__show_run_stats(). But just pull current -git; I have committed a
variant that does compile :-)

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: fio hangs with --status-interval
  2014-07-25  7:56                       ` Jens Axboe
@ 2014-07-25 16:34                         ` Vasily Tarasov
  2014-07-28  8:56                           ` Jens Axboe
  0 siblings, 1 reply; 25+ messages in thread
From: Vasily Tarasov @ 2014-07-25 16:34 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Michael Mattsson, fio

Hi Jens,

You'll be surprised but it did not help :( I used the latest code from
git (fio-2.1.11-10-gae7e, commit ae7e050). Still see the same picture.

I don't know if it helps, but I see this behavior on a machine with
96GB of RAM. So, after buffered writes are over, fio waits for a long
time till all dirty buffers hit the disk. But, even after there is no
more disk activity, fio is still stuck for as long as I don't kill it.

Regarding the number of threads, I do understand where the 3 threads
can come from:

1) Backend thread (sort of a manager)
2) Worker thread(s)
3) Disk stats thread

In my case I defined only one job instance, so I suppose there should
always be only one worker thread. I don't understand how the total
number of threads goes to 10 in the end.

<snip starts>
$ ps -eLf | grep fio
root      4427  4135  4427  0   15 07:44 pts/1    00:00:02 fio
--minimal --status-interval 10 1.fio
root      4427  4135  4636  0   15 07:56 pts/1    00:00:00 fio
--minimal --status-interval 10 1.fio
root      4427  4135  4637  0   15 07:57 pts/1    00:00:00 fio
--minimal --status-interval 10 1.fio
root      4427  4135  4638  0   15 07:57 pts/1    00:00:00 fio
--minimal --status-interval 10 1.fio
root      4427  4135  4647  0   15 07:57 pts/1    00:00:00 fio
--minimal --status-interval 10 1.fio
root      4427  4135  4650  0   15 07:57 pts/1    00:00:00 fio
--minimal --status-interval 10 1.fio
root      4427  4135  4651  0   15 07:57 pts/1    00:00:00 fio
--minimal --status-interval 10 1.fio
root      4427  4135  4652  0   15 07:57 pts/1    00:00:00 fio
--minimal --status-interval 10 1.fio
root      4427  4135  4653  0   15 07:58 pts/1    00:00:00 fio
--minimal --status-interval 10 1.fio
root      4427  4135  4654  0   15 07:58 pts/1    00:00:00 fio
--minimal --status-interval 10 1.fio
root      4427  4135  4663  0   15 07:58 pts/1    00:00:00 fio
--minimal --status-interval 10 1.fio
root      4427  4135  4664  0   15 07:58 pts/1    00:00:00 fio
--minimal --status-interval 10 1.fio
root      4427  4135  4666  0   15 07:58 pts/1    00:00:00 fio
--minimal --status-interval 10 1.fio
root      4427  4135  4668  0   15 07:58 pts/1    00:00:00 fio
--minimal --status-interval 10 1.fio
root      4427  4135  4669  0   15 07:59 pts/1    00:00:00 fio
--minimal --status-interval 10 1.fio
<snip ends>

Thanks,
Vasily

On Fri, Jul 25, 2014 at 3:56 AM, Jens Axboe <axboe@kernel.dk> wrote:
> On 2014-07-25 09:43, Jens Axboe wrote:
>>
>> On 2014-07-21 22:25, Vasily Tarasov wrote:
>>>
>>> Hi Jens,
>>>
>>> I tried your patch, but it didn't help. Interestingly, the number of
>>> threads changes in the end. At first, during the run:
>>>
>>> # ps -eLf | grep fio
>>> root      5224  4274  5224  1    2 11:12 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5225  0    2 11:12 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5231  5224  5231 60    1 11:12 ?        00:00:07 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5260  5237  5260  0    1 11:12 pts/0    00:00:00 grep fio
>>> [root@bison01 vass]# ps -eLf | grep fio
>>> root      5224  4274  5224  0    2 11:12 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5225  0    2 11:12 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5231  5224  5231 16    1 11:12 ?        00:00:21 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5293  5237  5293  0    1 11:14 pts/0    00:00:00 grep fio
>>> [root@bison01 vass]# ps -eLf | grep fio
>>> root      5224  4274  5224  0    2 11:12 pts/1    00:00:01 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5225  0    2 11:12 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5231  5224  5231 12    1 11:12 ?        00:01:13 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5411  5237  5411  0    1 11:22 pts/0    00:00:00 grep fio
>>>
>>> Later, when the threads are stuck:
>>>
>>> # ps -eLf | grep fio
>>> root      5224  4274  5224  0   16 11:12 pts/1    00:00:02 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5225  0   16 11:12 pts/1    00:00:01 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5458  0   16 11:25 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5459  0   16 11:25 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5460  0   16 11:25 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5461  0   16 11:25 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5462  0   16 11:25 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5471  0   16 11:25 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5472  0   16 11:26 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5475  0   16 11:26 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5476  0   16 11:26 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5477  0   16 11:26 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5478  0   16 11:26 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5487  0   16 11:26 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5488  0   16 11:27 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      5224  4274  5489  0   16 11:27 pts/1    00:00:00 fio
>>> --status-interval 10 --minimal fios/1.fio
>>> root      6665  5237  6665  0    1 13:21 pts/0    00:00:00 grep fio
>>>
>>> Is the number of threads supposed to change?..
>>
>>
>> Never answered this one... Yes, it'll change, since when you run the
>> job, you'll have one backend process, a number of IO workers, and one
>> disk util thread typically. When you get stuck, it's the backend that is
>> left waiting for that mutex.
>>
>> In any case, I haven't been able to figure this one out yet. But it
>> should be safe enough to just ignore the stat mutex for the final
>> output, since the threads otherwise accessing it are gone. Can you see
>> if this one makes the issue go away?
>
>
> The patch was not compile-tested and was missing the non-static
> __show_run_stats(). But just pull current -git; I have committed a
> variant that does compile :-)
>
> --
> Jens Axboe
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: fio hangs with --status-interval
  2014-07-25 16:34                         ` Vasily Tarasov
@ 2014-07-28  8:56                           ` Jens Axboe
  2014-07-28 16:05                             ` Vasily Tarasov
  0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2014-07-28  8:56 UTC (permalink / raw)
  To: Vasily Tarasov; +Cc: Michael Mattsson, fio

On 2014-07-25 18:34, Vasily Tarasov wrote:
> Hi Jens,
>
> You'll be surprised but it did not help :( I used the latest code from
> git (fio-2.1.11-10-gae7e, commit ae7e050). Still see the same picture.

That's actually good news, since it didn't make a lot of sense. So let's 
see if we can't get to the bottom of this...

> I don't know if it helps, but I see this behavior on a machine with
> 96GB of RAM. So, after buffered writes are over, fio waits for a long
> time till all dirty buffers hit the disk. But, even after there is no
> more disk activity, fio is still stuck for as long as I don't kill it.
>
> Regarding the number of threads. I do understand where the 3 threads
> can come from:
>
> 1) Backend thread (sort of a manager)
> 2) Worker thread(s)
> 3) Disk stats thread
>
> In my case I defined only one job instance, so I suppose there should
> always be only one worker thread. I don't understand how the total
> number of threads grows to 10 in the end.
>
> <snip starts>
> $ ps -eLf | grep fio
> root      4427  4135  4427  0   15 07:44 pts/1    00:00:02 fio
> --minimal --status-interval 10 1.fio
> root      4427  4135  4636  0   15 07:56 pts/1    00:00:00 fio
> --minimal --status-interval 10 1.fio
> root      4427  4135  4637  0   15 07:57 pts/1    00:00:00 fio
> --minimal --status-interval 10 1.fio
> root      4427  4135  4638  0   15 07:57 pts/1    00:00:00 fio
> --minimal --status-interval 10 1.fio
> root      4427  4135  4647  0   15 07:57 pts/1    00:00:00 fio
> --minimal --status-interval 10 1.fio
> root      4427  4135  4650  0   15 07:57 pts/1    00:00:00 fio
> --minimal --status-interval 10 1.fio
> root      4427  4135  4651  0   15 07:57 pts/1    00:00:00 fio
> --minimal --status-interval 10 1.fio
> root      4427  4135  4652  0   15 07:57 pts/1    00:00:00 fio
> --minimal --status-interval 10 1.fio
> root      4427  4135  4653  0   15 07:58 pts/1    00:00:00 fio
> --minimal --status-interval 10 1.fio
> root      4427  4135  4654  0   15 07:58 pts/1    00:00:00 fio
> --minimal --status-interval 10 1.fio
> root      4427  4135  4663  0   15 07:58 pts/1    00:00:00 fio
> --minimal --status-interval 10 1.fio
> root      4427  4135  4664  0   15 07:58 pts/1    00:00:00 fio
> --minimal --status-interval 10 1.fio
> root      4427  4135  4666  0   15 07:58 pts/1    00:00:00 fio
> --minimal --status-interval 10 1.fio
> root      4427  4135  4668  0   15 07:58 pts/1    00:00:00 fio
> --minimal --status-interval 10 1.fio
> root      4427  4135  4669  0   15 07:59 pts/1    00:00:00 fio
> --minimal --status-interval 10 1.fio
> <snip ends>

Can you try and gdb attach to it when it's hung and produce a new 
backtrace? It can't be off the final status run, I wonder if it's off 
the mutex down and remove instead.
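
As an aside, the growth pattern in the ps listing quoted above -- a new TID
showing up roughly every status interval, and none of them ever going away --
is what you would expect if each periodic stats pass runs in its own thread
and every one of those threads then parks on a mutex. A small self-contained
model of that shape (purely illustrative, not fio's code):

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <unistd.h>

/* Stand-in for fio's stat mutex: starts "taken" and is never released,
 * like a fio_mutex whose matching up() never happens. */
static sem_t never_posted;

static void *periodic_stats(void *arg)
{
        (void) arg;
        sem_wait(&never_posted);        /* blocks forever */
        return NULL;
}

int main(void)
{
        pthread_t tid;
        int i;

        sem_init(&never_posted, 0, 0);

        /* One new "stats" thread per interval tick; each one blocks, so
         * ps -eLf shows the thread count creeping up, one entry per
         * interval, much like the listing above.  (The threads are
         * deliberately never joined here.) */
        for (i = 0; i < 5; i++) {
                pthread_create(&tid, NULL, periodic_stats, NULL);
                printf("tick %d: spawned another stuck stats thread\n", i);
                sleep(1);               /* think --status-interval */
        }
        return 0;                       /* exiting main ends the demo */
}

(Build with gcc -pthread; it exits on its own after five ticks.)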

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: fio hangs with --status-interval
  2014-07-28  8:56                           ` Jens Axboe
@ 2014-07-28 16:05                             ` Vasily Tarasov
  2014-07-28 19:49                               ` Jens Axboe
  0 siblings, 1 reply; 25+ messages in thread
From: Vasily Tarasov @ 2014-07-28 16:05 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Michael Mattsson, fio

Hi Jens,

Thanks for looking into this. Here is the information you've requested:

1. Workload I'm running:

<snip starts>

[global]
rw=write
bs=8m
direct=0
thread=1
size=4g

[sdaa]
filename=/dev/sdaa

<snip ends>

2. That's what I see on the screen:

<snip starts>

# fio --status-interval 10 1.fio
sdaa: (g=0): rw=write, bs=8M-8M/8M-8M/8M-8M, ioengine=sync, iodepth=1
fio-2.1.11-10-gae7e
Starting 1 thread
Jobs: 1 (f=1): [W(1)] [-.-% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 00m:00s]
sdaa: (groupid=0, jobs=1): err= 0: pid=18849: Mon Jul 28 08:50:47 2014
  write: io=4096.0MB, bw=367567KB/s, iops=44, runt= 11411msec
    clat (usec): min=2400, max=3657, avg=2677.80, stdev=234.68
     lat (usec): min=2674, max=3968, avg=2948.69, stdev=269.71
    clat percentiles (usec):
     |  1.00th=[ 2448],  5.00th=[ 2480], 10.00th=[ 2512], 20.00th=[ 2544],
     | 30.00th=[ 2544], 40.00th=[ 2576], 50.00th=[ 2576], 60.00th=[ 2576],
     | 70.00th=[ 2608], 80.00th=[ 3024], 90.00th=[ 3120], 95.00th=[ 3152],
     | 99.00th=[ 3248], 99.50th=[ 3312], 99.90th=[ 3664], 99.95th=[ 3664],
     | 99.99th=[ 3664]
    bw (MB  /s): min= 2464, max= 2842, per=100.00%, avg=2712.77, stdev=215.50
    lat (msec) : 4=100.00%
  cpu          : usr=1.31%, sys=13.92%, ctx=161, majf=0, minf=6
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=512/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: io=4096.0MB, aggrb=367566KB/s, minb=367566KB/s,
maxb=367566KB/s, mint=11411msec, maxt=11411msec

Disk stats (read/write):
  sdaa: ios=84/0, merge=0/0, ticks=43/0, in_queue=35, util=2.36%

<snip ends>

At this point fio does not exit. Below are the gdb backtraces. There
are three threads in the run. When I attach to one thread, get a
backtrace, and then detach, fio crashes with a segfault. So, to collect 3
backtraces, I ran the experiment 3 times and attached to a different
thread each time. Below, I changed the PIDs so that they match the ps
output.

# ps -eLf | grep fio
root     18844 10064 18844  0    3 08:50 pts/1    00:00:00 fio
--status-interval 10 1.fio
root     18844 10064 18862  0    3 08:50 pts/1    00:00:00 fio
--status-interval 10 1.fio
root     18844 10064 18863  0    3 08:50 pts/1    00:00:00 fio
--status-interval 10 1.fio
root     18902 18805 18902  0    1 08:53 pts/0    00:00:00 grep fio
# gdb
(gdb) attach 18844
(gdb) bt
#0  0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x0000000000430419 in fio_mutex_down (mutex=0x7ffff7ff6000) at mutex.c:155
#2  0x0000000000411bc0 in stat_exit () at stat.c:1905
#3  0x000000000044c1d6 in fio_backend () at backend.c:2094
#4  0x000000376ee1ed1d in __libc_start_main () from /lib64/libc.so.6
#5  0x000000000040a679 in _start ()
(gdb) attach 18862
(gdb) bt
#0  0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x0000000000430419 in fio_mutex_down (mutex=0x7ffff7ff2000) at mutex.c:155
#2  0x000000000041c717 in __show_running_run_stats (arg=0x72a330) at stat.c:1445
#3  0x000000376f6079d1 in start_thread () from /lib64/libpthread.so.0
#4  0x000000376eee8b6d in clone () from /lib64/libc.so.6
(gdb) attach 18863
(gdb) bt
#0  0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x0000000000430419 in fio_mutex_down (mutex=0x7ffff7ff6000) at mutex.c:155
#2  0x000000000041c5cd in __show_running_run_stats (arg=0x72a490) at stat.c:1421
#3  0x000000376f6079d1 in start_thread () from /lib64/libpthread.so.0
#4  0x000000376eee8b6d in clone () from /lib64/libc.so.6
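
The common shape of these three traces -- every thread parked in
pthread_cond_wait() underneath fio_mutex_down() -- is what a fio-style mutex
looks like when its value is zero and nothing is left to up() it. A
simplified model of that primitive (the same idea as mutex.c, but not fio's
actual code):

#include <pthread.h>

/* Simplified stand-in for struct fio_mutex: a counter guarded by a
 * pthread mutex, with waiters parked on a condition variable. */
struct demo_mutex {
        pthread_mutex_t lock;
        pthread_cond_t  cond;
        int value;
};

void demo_down(struct demo_mutex *m)
{
        pthread_mutex_lock(&m->lock);
        while (m->value == 0)
                /* This is why frame #0 is pthread_cond_wait() in all
                 * three backtraces: down() waits here until an up(). */
                pthread_cond_wait(&m->cond, &m->lock);
        m->value--;
        pthread_mutex_unlock(&m->lock);
}

void demo_up(struct demo_mutex *m)
{
        pthread_mutex_lock(&m->lock);
        m->value++;
        pthread_cond_signal(&m->cond);
        pthread_mutex_unlock(&m->lock);
}

If the backend (in stat_exit()) and both __show_running_run_stats() threads
are all sitting inside the equivalent of demo_down(), and whatever was
supposed to call the matching up() has already finished, no signal ever
arrives and the process hangs exactly as seen here.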

Let me know if you need any additional information.

Thank you,
Vasily

On Mon, Jul 28, 2014 at 1:56 AM, Jens Axboe <axboe@kernel.dk> wrote:
> On 2014-07-25 18:34, Vasily Tarasov wrote:
>>
>> Hi Jens,
>>
>> You'll be surprised but it did not help :( I used the latest code from
>> git (fio-2.1.11-10-gae7e, commit ae7e050). Still see the same picture.
>
>
> That's actually good news, since it didn't make a lot of sense. So lets see
> if we can't get to the bottom of this...
>
>
>> I don't know if it helps, but I see this behavior on a machine with
>> 96GB of RAM. So, after buffered writes are over, fio waits for a long
>> time till all dirty buffers hit the disk. But, even after there is no
>> more disk activity, fio is still stuck for as long as I don't kill it.
>>
>> Regarding the number of threads. I do understand where the 3 threads
>> can come from:
>>
>> 1) Backend thread (sort of a manager)
>> 2) Worker thread(s)
>> 3) Disk stats thread
>>
>> In my case I defined only one job instance, so I suppose there should
>> always be only one worker thread. I don't understand how the total
>> number of threads grows to 10 in the end.
>>
>> <snip starts>
>> $ ps -eLf | grep fio
>> root      4427  4135  4427  0   15 07:44 pts/1    00:00:02 fio
>> --minimal --status-interval 10 1.fio
>> root      4427  4135  4636  0   15 07:56 pts/1    00:00:00 fio
>> --minimal --status-interval 10 1.fio
>> root      4427  4135  4637  0   15 07:57 pts/1    00:00:00 fio
>> --minimal --status-interval 10 1.fio
>> root      4427  4135  4638  0   15 07:57 pts/1    00:00:00 fio
>> --minimal --status-interval 10 1.fio
>> root      4427  4135  4647  0   15 07:57 pts/1    00:00:00 fio
>> --minimal --status-interval 10 1.fio
>> root      4427  4135  4650  0   15 07:57 pts/1    00:00:00 fio
>> --minimal --status-interval 10 1.fio
>> root      4427  4135  4651  0   15 07:57 pts/1    00:00:00 fio
>> --minimal --status-interval 10 1.fio
>> root      4427  4135  4652  0   15 07:57 pts/1    00:00:00 fio
>> --minimal --status-interval 10 1.fio
>> root      4427  4135  4653  0   15 07:58 pts/1    00:00:00 fio
>> --minimal --status-interval 10 1.fio
>> root      4427  4135  4654  0   15 07:58 pts/1    00:00:00 fio
>> --minimal --status-interval 10 1.fio
>> root      4427  4135  4663  0   15 07:58 pts/1    00:00:00 fio
>> --minimal --status-interval 10 1.fio
>> root      4427  4135  4664  0   15 07:58 pts/1    00:00:00 fio
>> --minimal --status-interval 10 1.fio
>> root      4427  4135  4666  0   15 07:58 pts/1    00:00:00 fio
>> --minimal --status-interval 10 1.fio
>> root      4427  4135  4668  0   15 07:58 pts/1    00:00:00 fio
>> --minimal --status-interval 10 1.fio
>> root      4427  4135  4669  0   15 07:59 pts/1    00:00:00 fio
>> --minimal --status-interval 10 1.fio
>> <snip ends>
>
>
> Can you try and gdb attach to it when it's hung and produce a new backtrace?
> It can't be off the final status run, I wonder if it's off the mutex down
> and remove instead.
>
> --
> Jens Axboe
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: fio hangs with --status-interval
  2014-07-28 16:05                             ` Vasily Tarasov
@ 2014-07-28 19:49                               ` Jens Axboe
  2014-10-23 15:57                                 ` Michael Mattsson
  0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2014-07-28 19:49 UTC (permalink / raw)
  To: Vasily Tarasov; +Cc: Michael Mattsson, fio

On 2014-07-28 18:05, Vasily Tarasov wrote:
> Hi Jens,
>
> Thanks for looking into this. Here is the information you've requested:
>
> 1. Workload I'm running:
>
> <snip starts>
>
> [global]
> rw=write
> bs=8m
> direct=0
> thread=1
> size=4g
>
> [sdaa]
> filename=/dev/sdaa
>
> <snip ends>
>
> 2. That's what I see on the screen:
>
> <snip starts>
>
> # fio --status-interval 10 1.fio
> sdaa: (g=0): rw=write, bs=8M-8M/8M-8M/8M-8M, ioengine=sync, iodepth=1
> fio-2.1.11-10-gae7e
> Starting 1 thread
> Jobs: 1 (f=1): [W(1)] [-.-% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 00m:00s]
> sdaa: (groupid=0, jobs=1): err= 0: pid=18849: Mon Jul 28 08:50:47 2014
>    write: io=4096.0MB, bw=367567KB/s, iops=44, runt= 11411msec
>      clat (usec): min=2400, max=3657, avg=2677.80, stdev=234.68
>       lat (usec): min=2674, max=3968, avg=2948.69, stdev=269.71
>      clat percentiles (usec):
>       |  1.00th=[ 2448],  5.00th=[ 2480], 10.00th=[ 2512], 20.00th=[ 2544],
>       | 30.00th=[ 2544], 40.00th=[ 2576], 50.00th=[ 2576], 60.00th=[ 2576],
>       | 70.00th=[ 2608], 80.00th=[ 3024], 90.00th=[ 3120], 95.00th=[ 3152],
>       | 99.00th=[ 3248], 99.50th=[ 3312], 99.90th=[ 3664], 99.95th=[ 3664],
>       | 99.99th=[ 3664]
>      bw (MB  /s): min= 2464, max= 2842, per=100.00%, avg=2712.77, stdev=215.50
>      lat (msec) : 4=100.00%
>    cpu          : usr=1.31%, sys=13.92%, ctx=161, majf=0, minf=6
>    IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>       submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>       complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>       issued    : total=r=0/w=512/d=0, short=r=0/w=0/d=0
>       latency   : target=0, window=0, percentile=100.00%, depth=1
>
> Run status group 0 (all jobs):
>    WRITE: io=4096.0MB, aggrb=367566KB/s, minb=367566KB/s,
> maxb=367566KB/s, mint=11411msec, maxt=11411msec
>
> Disk stats (read/write):
>    sdaa: ios=84/0, merge=0/0, ticks=43/0, in_queue=35, util=2.36%
>
> <snip ends>
>
> At this point fio does not exit. Below are the gdb backtraces. There
> are three threads in the run. When I attach to one thread, get
> backtrace, and then detach fio crashes with Segfault. So, to collect 3
> backtraces, I ran the experiment 3 times and attached to a different
> thread every time. Below, I change the PIDs so that they match the ps
> output.
>
> # ps -eLf | grep fio
> root     18844 10064 18844  0    3 08:50 pts/1    00:00:00 fio
> --status-interval 10 1.fio
> root     18844 10064 18862  0    3 08:50 pts/1    00:00:00 fio
> --status-interval 10 1.fio
> root     18844 10064 18863  0    3 08:50 pts/1    00:00:00 fio
> --status-interval 10 1.fio
> root     18902 18805 18902  0    1 08:53 pts/0    00:00:00 grep fio
> # gdb
> (gdb) attach 18844
> (gdb) bt
> #0  0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x0000000000430419 in fio_mutex_down (mutex=0x7ffff7ff6000) at mutex.c:155
> #2  0x0000000000411bc0 in stat_exit () at stat.c:1905
> #3  0x000000000044c1d6 in fio_backend () at backend.c:2094
> #4  0x000000376ee1ed1d in __libc_start_main () from /lib64/libc.so.6
> #5  0x000000000040a679 in _start ()
> (gdb) attach 1862
> (gdb) bt
> #0  0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x0000000000430419 in fio_mutex_down (mutex=0x7ffff7ff2000) at mutex.c:155
> #2  0x000000000041c717 in __show_running_run_stats (arg=0x72a330) at stat.c:1445
> #3  0x000000376f6079d1 in start_thread () from /lib64/libpthread.so.0
> #4  0x000000376eee8b6d in clone () from /lib64/libc.so.6
> (gdb) attach 1863
> (gdb) bt
> #0  0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x0000000000430419 in fio_mutex_down (mutex=0x7ffff7ff6000) at mutex.c:155
> #2  0x000000000041c5cd in __show_running_run_stats (arg=0x72a490) at stat.c:1421
> #3  0x000000376f6079d1 in start_thread () from /lib64/libpthread.so.0
> #4  0x000000376eee8b6d in clone () from /lib64/libc.so.6
>
> Let me know if you need any additional information.

This is perfect, makes a lot more sense. Progress on this may be a 
little slow, I'm on vacation this week... But expect something next 
week, at least.


-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: fio hangs with --status-interval
  2014-07-28 19:49                               ` Jens Axboe
@ 2014-10-23 15:57                                 ` Michael Mattsson
  2014-10-23 16:01                                   ` Jens Axboe
  0 siblings, 1 reply; 25+ messages in thread
From: Michael Mattsson @ 2014-10-23 15:57 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Vasily Tarasov, fio

Hey,
Has there been any work on this? This is still an issue for me on fio
2.1.13. It happens less frequently but I've had two hangs in a sample
of ~40 runs.

Regards
Michael

On Mon, Jul 28, 2014 at 12:49 PM, Jens Axboe <axboe@kernel.dk> wrote:
> On 2014-07-28 18:05, Vasily Tarasov wrote:
>>
>> Hi Jens,
>>
>> Thanks for looking into this. Here is the information you've requested:
>>
>> 1. Workload I'm running:
>>
>> <snip starts>
>>
>> [global]
>> rw=write
>> bs=8m
>> direct=0
>> thread=1
>> size=4g
>>
>> [sdaa]
>> filename=/dev/sdaa
>>
>> <snip ends>
>>
>> 2. That's what I see on the screen:
>>
>> <snip starts>
>>
>> # fio --status-interval 10 1.fio
>> sdaa: (g=0): rw=write, bs=8M-8M/8M-8M/8M-8M, ioengine=sync, iodepth=1
>> fio-2.1.11-10-gae7e
>> Starting 1 thread
>> Jobs: 1 (f=1): [W(1)] [-.-% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta
>> 00m:00s]
>> sdaa: (groupid=0, jobs=1): err= 0: pid=18849: Mon Jul 28 08:50:47 2014
>>    write: io=4096.0MB, bw=367567KB/s, iops=44, runt= 11411msec
>>      clat (usec): min=2400, max=3657, avg=2677.80, stdev=234.68
>>       lat (usec): min=2674, max=3968, avg=2948.69, stdev=269.71
>>      clat percentiles (usec):
>>       |  1.00th=[ 2448],  5.00th=[ 2480], 10.00th=[ 2512], 20.00th=[
>> 2544],
>>       | 30.00th=[ 2544], 40.00th=[ 2576], 50.00th=[ 2576], 60.00th=[
>> 2576],
>>       | 70.00th=[ 2608], 80.00th=[ 3024], 90.00th=[ 3120], 95.00th=[
>> 3152],
>>       | 99.00th=[ 3248], 99.50th=[ 3312], 99.90th=[ 3664], 99.95th=[
>> 3664],
>>       | 99.99th=[ 3664]
>>      bw (MB  /s): min= 2464, max= 2842, per=100.00%, avg=2712.77,
>> stdev=215.50
>>      lat (msec) : 4=100.00%
>>    cpu          : usr=1.31%, sys=13.92%, ctx=161, majf=0, minf=6
>>    IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>> >=64=0.0%
>>       submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>> >=64=0.0%
>>       complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>> >=64=0.0%
>>       issued    : total=r=0/w=512/d=0, short=r=0/w=0/d=0
>>       latency   : target=0, window=0, percentile=100.00%, depth=1
>>
>> Run status group 0 (all jobs):
>>    WRITE: io=4096.0MB, aggrb=367566KB/s, minb=367566KB/s,
>> maxb=367566KB/s, mint=11411msec, maxt=11411msec
>>
>> Disk stats (read/write):
>>    sdaa: ios=84/0, merge=0/0, ticks=43/0, in_queue=35, util=2.36%
>>
>> <snip ends>
>>
>> At this point fio does not exit. Below are the gdb backtraces. There
>> are three threads in the run. When I attach to one thread, get
>> backtrace, and then detach fio crashes with Segfault. So, to collect 3
>> backtraces, I ran the experiment 3 times and attached to a different
>> thread every time. Below, I change the PIDs so that they match the ps
>> output.
>>
>> # ps -eLf | grep fio
>> root     18844 10064 18844  0    3 08:50 pts/1    00:00:00 fio
>> --status-interval 10 1.fio
>> root     18844 10064 18862  0    3 08:50 pts/1    00:00:00 fio
>> --status-interval 10 1.fio
>> root     18844 10064 18863  0    3 08:50 pts/1    00:00:00 fio
>> --status-interval 10 1.fio
>> root     18902 18805 18902  0    1 08:53 pts/0    00:00:00 grep fio
>> # gdb
>> (gdb) attach 18844
>> (gdb) bt
>> #0  0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from
>> /lib64/libpthread.so.0
>> #1  0x0000000000430419 in fio_mutex_down (mutex=0x7ffff7ff6000) at
>> mutex.c:155
>> #2  0x0000000000411bc0 in stat_exit () at stat.c:1905
>> #3  0x000000000044c1d6 in fio_backend () at backend.c:2094
>> #4  0x000000376ee1ed1d in __libc_start_main () from /lib64/libc.so.6
>> #5  0x000000000040a679 in _start ()
>> (gdb) attach 1862
>> (gdb) bt
>> #0  0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from
>> /lib64/libpthread.so.0
>> #1  0x0000000000430419 in fio_mutex_down (mutex=0x7ffff7ff2000) at
>> mutex.c:155
>> #2  0x000000000041c717 in __show_running_run_stats (arg=0x72a330) at
>> stat.c:1445
>> #3  0x000000376f6079d1 in start_thread () from /lib64/libpthread.so.0
>> #4  0x000000376eee8b6d in clone () from /lib64/libc.so.6
>> (gdb) attach 1863
>> (gdb) bt
>> #0  0x000000376f60b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from
>> /lib64/libpthread.so.0
>> #1  0x0000000000430419 in fio_mutex_down (mutex=0x7ffff7ff6000) at
>> mutex.c:155
>> #2  0x000000000041c5cd in __show_running_run_stats (arg=0x72a490) at
>> stat.c:1421
>> #3  0x000000376f6079d1 in start_thread () from /lib64/libpthread.so.0
>> #4  0x000000376eee8b6d in clone () from /lib64/libc.so.6
>>
>> Let me know if you need any additional information.
>
>
> This is perfect, makes a lot more sense. Progress on this may be a little
> slow, I'm on vacation this week... But expect something next week, at least.
>
>
> --
> Jens Axboe
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: fio hangs with --status-interval
  2014-10-23 15:57                                 ` Michael Mattsson
@ 2014-10-23 16:01                                   ` Jens Axboe
  2014-10-23 21:58                                     ` Michael Mattsson
  0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2014-10-23 16:01 UTC (permalink / raw)
  To: Michael Mattsson; +Cc: Vasily Tarasov, fio

On 10/23/2014 09:57 AM, Michael Mattsson wrote:
> Hey,
> Have there been any work on this? This is still an issue for me on fio
> 2.1.13. It happens less frequently but I've had two hangs in a sample
> of ~40 runs.

Try current -git and see if it works better. If not, let's get some new
traces and we'll get this fixed up.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: fio hangs with --status-interval
  2014-10-23 16:01                                   ` Jens Axboe
@ 2014-10-23 21:58                                     ` Michael Mattsson
  2014-10-24  5:17                                       ` Jens Axboe
  0 siblings, 1 reply; 25+ messages in thread
From: Michael Mattsson @ 2014-10-23 21:58 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Vasily Tarasov, fio

Hi,
I got some new traces of the stuck fio process.

This is 2.1.13:
(gdb) bt
#0  0x00007f53cbb085bc in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x0000000000430339 in fio_mutex_down (mutex=0x7f53c24ce000) at mutex.c:155
#2  0x0000000000411950 in stat_exit () at stat.c:1905
#3  0x000000000044c756 in fio_backend () at backend.c:2094
#4  0x00007f53cb583d1d in __libc_start_main () from /lib64/libc.so.6
#5  0x0000000000409e49 in _start ()
(gdb)

With fio-2.1.13-88-gb2ee7:
(gdb) bt
#0  0x00007f871e9de22d in pthread_join () from /lib64/libpthread.so.0
#1  0x0000000000411939 in wait_for_status_interval_thread_exit () at stat.c:1939
#2  0x000000000044ce2a in fio_backend () at backend.c:2084
#3  0x00007f871e45cd1d in __libc_start_main () from /lib64/libc.so.6
#4  0x000000000040a529 in _start ()
(gdb)
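
For what it's worth, the second trace -- the backend parked in pthread_join()
inside wait_for_status_interval_thread_exit() -- is the shape you get when a
status-interval thread is asked to exit but is itself stuck and never
re-checks the request. A hypothetical model (names and structure invented
here, not fio's code; it reproduces the hang on purpose):

#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>

static volatile bool status_should_exit;

/* Stand-in for the periodic stats dump; in the real hang this is where
 * the thread blocks on the stats mutex and never returns. */
static void dump_running_stats(void)
{
        pause();                        /* pretend we are stuck forever */
}

static void *status_interval_thread(void *arg)
{
        (void) arg;
        while (!status_should_exit) {
                sleep(10);              /* --status-interval 10 */
                dump_running_stats();   /* if this blocks, the flag is
                                         * never looked at again */
        }
        return NULL;
}

int main(void)
{
        pthread_t t;

        pthread_create(&t, NULL, status_interval_thread, NULL);
        sleep(1);
        /* Backend shutdown: setting the flag is not enough once the
         * thread is already blocked; the join below never returns,
         * matching the backtrace above.  Ctrl-C to stop the demo. */
        status_should_exit = true;
        pthread_join(t, NULL);
        return 0;
}

One common way to make this kind of shutdown robust is to wake the blocked
waiter (or use a timed wait) before joining; whether that is what the later
fio fix actually does, the git history would have to say.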

Regards
Michael

On Thu, Oct 23, 2014 at 9:01 AM, Jens Axboe <axboe@kernel.dk> wrote:
> On 10/23/2014 09:57 AM, Michael Mattsson wrote:
>> Hey,
>> Have there been any work on this? This is still an issue for me on fio
>> 2.1.13. It happens less frequently but I've had two hangs in a sample
>> of ~40 runs.
>
> Try current -git and see if it works better. If not, lets get some new
> traces and we'll get this fixed up.
>
> --
> Jens Axboe
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: fio hangs with --status-interval
  2014-10-23 21:58                                     ` Michael Mattsson
@ 2014-10-24  5:17                                       ` Jens Axboe
  2014-10-24 15:33                                         ` Michael Mattsson
  0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2014-10-24  5:17 UTC (permalink / raw)
  To: Michael Mattsson; +Cc: Vasily Tarasov, fio

On 2014-10-23 15:58, Michael Mattsson wrote:
> Hi,
> I got some new traces of the stuck fio process.
>
> This is 2.1.13:
> (gdb) bt
> #0  0x00007f53cbb085bc in pthread_cond_wait@@GLIBC_2.3.2 () from
> /lib64/libpthread.so.0
> #1  0x0000000000430339 in fio_mutex_down (mutex=0x7f53c24ce000) at mutex.c:155
> #2  0x0000000000411950 in stat_exit () at stat.c:1905
> #3  0x000000000044c756 in fio_backend () at backend.c:2094
> #4  0x00007f53cb583d1d in __libc_start_main () from /lib64/libc.so.6
> #5  0x0000000000409e49 in _start ()
> (gdb)
>
> With fio-2.1.13-88-gb2ee7:
> (gdb) bt
> #0  0x00007f871e9de22d in pthread_join () from /lib64/libpthread.so.0
> #1  0x0000000000411939 in wait_for_status_interval_thread_exit () at stat.c:1939
> #2  0x000000000044ce2a in fio_backend () at backend.c:2084
> #3  0x00007f871e45cd1d in __libc_start_main () from /lib64/libc.so.6
> #4  0x000000000040a529 in _start ()

OK, I think I see what this is. Please update to current -git again and 
see if that doesn't fix up this issue.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: fio hangs with --status-interval
  2014-10-24  5:17                                       ` Jens Axboe
@ 2014-10-24 15:33                                         ` Michael Mattsson
  2014-10-24 15:50                                           ` Jens Axboe
  0 siblings, 1 reply; 25+ messages in thread
From: Michael Mattsson @ 2014-10-24 15:33 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Vasily Tarasov, fio

Hey,
This seems to be fixed now. I successfully finished 1500+ runs
overnight in my rig without issues (previously a fio thread would hang
about every other run or so across the eight clients).

The stumbling-output issue I reported with the second status-interval
thread still remains, though it shows up inconsistently.

Regards
Michael

On Thu, Oct 23, 2014 at 10:17 PM, Jens Axboe <axboe@kernel.dk> wrote:
> On 2014-10-23 15:58, Michael Mattsson wrote:
>>
>> Hi,
>> I got some new traces of the stuck fio process.
>>
>> This is 2.1.13:
>> (gdb) bt
>> #0  0x00007f53cbb085bc in pthread_cond_wait@@GLIBC_2.3.2 () from
>> /lib64/libpthread.so.0
>> #1  0x0000000000430339 in fio_mutex_down (mutex=0x7f53c24ce000) at
>> mutex.c:155
>> #2  0x0000000000411950 in stat_exit () at stat.c:1905
>> #3  0x000000000044c756 in fio_backend () at backend.c:2094
>> #4  0x00007f53cb583d1d in __libc_start_main () from /lib64/libc.so.6
>> #5  0x0000000000409e49 in _start ()
>> (gdb)
>>
>> With fio-2.1.13-88-gb2ee7:
>> (gdb) bt
>> #0  0x00007f871e9de22d in pthread_join () from /lib64/libpthread.so.0
>> #1  0x0000000000411939 in wait_for_status_interval_thread_exit () at
>> stat.c:1939
>> #2  0x000000000044ce2a in fio_backend () at backend.c:2084
>> #3  0x00007f871e45cd1d in __libc_start_main () from /lib64/libc.so.6
>> #4  0x000000000040a529 in _start ()
>
>
> OK, I think I see what this is. Please update to current -git again and see
> if that doesn't fix up this issue.
>
> --
> Jens Axboe
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: fio hangs with --status-interval
  2014-10-24 15:33                                         ` Michael Mattsson
@ 2014-10-24 15:50                                           ` Jens Axboe
  2014-11-04 15:46                                             ` Vasily Tarasov
  0 siblings, 1 reply; 25+ messages in thread
From: Jens Axboe @ 2014-10-24 15:50 UTC (permalink / raw)
  To: Michael Mattsson; +Cc: Vasily Tarasov, fio

On 2014-10-24 09:33, Michael Mattsson wrote:
> Hey,
> This seems to be fixed now. I successfully finished 1500+ runs over
> night in my rig without issues (it would have a hanging fio thread
> about every other or second or so between eight clients).

Goodie, I thought that it might.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: fio hangs with --status-interval
  2014-10-24 15:50                                           ` Jens Axboe
@ 2014-11-04 15:46                                             ` Vasily Tarasov
  0 siblings, 0 replies; 25+ messages in thread
From: Vasily Tarasov @ 2014-11-04 15:46 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Michael Mattsson, fio

The latest code from git does not hang for me either. Thanks.

Vasily

On Fri, Oct 24, 2014 at 11:50 AM, Jens Axboe <axboe@kernel.dk> wrote:
> On 2014-10-24 09:33, Michael Mattsson wrote:
>>
>> Hey,
>> This seems to be fixed now. I successfully finished 1500+ runs over
>> night in my rig without issues (it would have a hanging fio thread
>> about every other or second or so between eight clients).
>
>
> Goodie, I thought that it might.
>
> --
> Jens Axboe
>


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread

Thread overview: 25+ messages
2014-07-09 22:56 fio hangs with --status-interval Michael Mattsson
2014-07-10  8:44 ` Jens Axboe
2014-07-10 14:55   ` Michael Mattsson
2014-07-10 20:07     ` Jens Axboe
2014-07-10 21:27       ` Michael Mattsson
2014-07-11 11:48         ` Jens Axboe
2014-07-11 16:19           ` Michael Mattsson
2014-07-12  8:42             ` Jens Axboe
2014-07-16 16:58               ` Vasily Tarasov
2014-07-21  8:08                 ` Jens Axboe
2014-07-21 20:25                   ` Vasily Tarasov
2014-07-21 20:54                     ` Jens Axboe
2014-07-25  7:43                     ` Jens Axboe
2014-07-25  7:56                       ` Jens Axboe
2014-07-25 16:34                         ` Vasily Tarasov
2014-07-28  8:56                           ` Jens Axboe
2014-07-28 16:05                             ` Vasily Tarasov
2014-07-28 19:49                               ` Jens Axboe
2014-10-23 15:57                                 ` Michael Mattsson
2014-10-23 16:01                                   ` Jens Axboe
2014-10-23 21:58                                     ` Michael Mattsson
2014-10-24  5:17                                       ` Jens Axboe
2014-10-24 15:33                                         ` Michael Mattsson
2014-10-24 15:50                                           ` Jens Axboe
2014-11-04 15:46                                             ` Vasily Tarasov
