* Exit all jobs on error
@ 2015-12-10  7:01 Sitsofe Wheeler
  2015-12-10 17:11 ` Jens Axboe
  0 siblings, 1 reply; 16+ messages in thread
From: Sitsofe Wheeler @ 2015-12-10  7:01 UTC
  To: fio

Hi,

Is there an option to exit all jobs but only on error? If I have a job like this

[global]
stonewall=1
verify=crc32
rw=write
[pass1]
bs=4k
[pass2]
bs=8k

I want fio to stop if pass1 fails verification and for pass2 not to be
performed at all. I'm aware of "exitall" but using that will make fio
quit even if pass1 is successful.

-- 
Sitsofe | http://sucs.org/~sits/

* Re: Exit all jobs on error
  2015-12-10  7:01 Exit all jobs on error Sitsofe Wheeler
@ 2015-12-10 17:11 ` Jens Axboe
  2015-12-10 17:58   ` Sitsofe Wheeler
  0 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2015-12-10 17:11 UTC
  To: Sitsofe Wheeler, fio

On 12/10/2015 12:01 AM, Sitsofe Wheeler wrote:
> Hi,
>
> Is there an option to exit all jobs but only on error? If I have a job like this
>
> [global]
> stonewall=1
> verify=crc32
> rw=write
> [pass1]
> bs=4k
> [pass2]
> bs=8k
>
> I want fio to stop if pass1 fails verification and for pass2 not to be
> performed at all. I'm aware of "exitall" but using that will make fio
> quit even if pass1 is successful.

That doesn't exist, but we could add an exitall_on_error to have that
behavior. Should be pretty easy to add.

-- 
Jens Axboe



* Re: Exit all jobs on error
  2015-12-10 17:11 ` Jens Axboe
@ 2015-12-10 17:58   ` Sitsofe Wheeler
  2015-12-10 18:11     ` Andrey Kuzmin
  2015-12-10 18:15     ` Jens Axboe
  0 siblings, 2 replies; 16+ messages in thread
From: Sitsofe Wheeler @ 2015-12-10 17:58 UTC
  To: Jens Axboe; +Cc: fio

On 10 December 2015 at 17:11, Jens Axboe <axboe@kernel.dk> wrote:
> On 12/10/2015 12:01 AM, Sitsofe Wheeler wrote:
>>
>> Hi,
>>
>> Is there an option to exit all jobs but only on error? If I have a job
>> like this
>>
>> [global]
>> stonewall=1
>> verify=crc32
>> rw=write
>> [pass1]
>> bs=4k
>> [pass2]
>> bs=8k
>>
>> I want fio to stop if pass1 fails verification and for pass2 not to be
>> performed at all. I'm aware of "exitall" but using that will make fio
>> quit even if pass1 is successful.
>
>
> That doesn't exist, but we could add an exitall_on_error to have that
> behavior. Should be pretty easy to add.

That would work for me - that way it could be put in the global
section or per (stonewall) group.

-- 
Sitsofe | http://sucs.org/~sits/


* Re: Exit all jobs on error
  2015-12-10 17:58   ` Sitsofe Wheeler
@ 2015-12-10 18:11     ` Andrey Kuzmin
  2015-12-10 18:15       ` Jens Axboe
  2015-12-10 18:15     ` Jens Axboe
  1 sibling, 1 reply; 16+ messages in thread
From: Andrey Kuzmin @ 2015-12-10 18:11 UTC
  To: Sitsofe Wheeler; +Cc: Jens Axboe, fio

I've also encountered a similar issue a number of times where the job
failed to stop (and refused to terminate in response to C-C) when a
thread/process fails, e.g. due to an error. My guess is that the loop
that waits for completions doesn't check for td->terminate being set.

Regards,
Andrey


On Thu, Dec 10, 2015 at 8:58 PM, Sitsofe Wheeler <sitsofe@gmail.com> wrote:
> On 10 December 2015 at 17:11, Jens Axboe <axboe@kernel.dk> wrote:
>> On 12/10/2015 12:01 AM, Sitsofe Wheeler wrote:
>>>
>>> Hi,
>>>
>>> Is there an option to exit all jobs but only on error? If I have a job
>>> like this
>>>
>>> [global]
>>> stonewall=1
>>> verify=crc32
>>> rw=write
>>> [pass1]
>>> bs=4k
>>> [pass2]
>>> bs=8k
>>>
>>> I want fio to stop if pass1 fails verification and for pass2 not to be
>>> performed at all. I'm aware of "exitall" but using that will make fio
>>> quit even if pass1 is successful.
>>
>>
>> That doesn't exist, but we could add an exitall_on_error to have that
>> behavior. Should be pretty easy to add.
>
> That would work for me - that way it could be put in the global
> section or per (stonewall) group.
>
> --
> Sitsofe | http://sucs.org/~sits/


* Re: Exit all jobs on error
  2015-12-10 17:58   ` Sitsofe Wheeler
  2015-12-10 18:11     ` Andrey Kuzmin
@ 2015-12-10 18:15     ` Jens Axboe
  1 sibling, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2015-12-10 18:15 UTC
  To: Sitsofe Wheeler; +Cc: fio

On Thu, Dec 10 2015, Sitsofe Wheeler wrote:
> On 10 December 2015 at 17:11, Jens Axboe <axboe@kernel.dk> wrote:
> > On 12/10/2015 12:01 AM, Sitsofe Wheeler wrote:
> >>
> >> Hi,
> >>
> >> Is there an option to exit all jobs but only on error? If I have a job
> >> like this
> >>
> >> [global]
> >> stonewall=1
> >> verify=crc32
> >> rw=write
> >> [pass1]
> >> bs=4k
> >> [pass2]
> >> bs=8k
> >>
> >> I want fio to stop if pass1 fails verification and for pass2 not to be
> >> performed at all. I'm aware of "exitall" but using that will make fio
> >> quit even if pass1 is successful.
> >
> >
> > That doesn't exist, but we could add an exitall_on_error to have that
> > behavior. Should be pretty easy to add.
> 
> That would work for me - that way it could be put in the global
> section or per (stonewall) group.

Something like the below should work. Apart from the 'adding the option'
sugar, it's a few lines of changes. If you could test, that'd be great.


diff --git a/HOWTO b/HOWTO
index eb9c8245d4e3..b21d27e3b15f 100644
--- a/HOWTO
+++ b/HOWTO
@@ -1227,6 +1227,9 @@ exitall		When one job finishes, terminate the rest. The default is
 		to wait for each job to finish, sometimes that is not the
 		desired action.
 
+exitall_on_error	When one job finishes in error, terminate the rest. The
+		default is to wait for each job to finish.
+
 bwavgtime=int	Average the calculated bandwidth over the given time. Value
 		is specified in milliseconds.
 
diff --git a/backend.c b/backend.c
index 425b0ee94c37..e37fffb7b183 100644
--- a/backend.c
+++ b/backend.c
@@ -974,7 +974,7 @@ reap:
 
 		if (!in_ramp_time(td) && should_check_rate(td)) {
 			if (check_min_rate(td, &comp_time)) {
-				if (exitall_on_terminate)
+				if (exitall_on_terminate || td->o.exitall_error)
 					fio_terminate_threads(td->groupid);
 				td_verror(td, EIO, "check_min_rate");
 				break;
@@ -1662,7 +1662,7 @@ static void *thread_main(void *data)
 	if (o->exec_postrun)
 		exec_string(o, o->exec_postrun, (const char *)"postrun");
 
-	if (exitall_on_terminate)
+	if (exitall_on_terminate || (o->exitall_error && td->error))
 		fio_terminate_threads(td->groupid);
 
 err:
diff --git a/cconv.c b/cconv.c
index c0168c47d7dd..a476aad6376a 100644
--- a/cconv.c
+++ b/cconv.c
@@ -167,6 +167,7 @@ void convert_thread_options_to_cpu(struct thread_options *o,
 	o->fsync_on_close = le32_to_cpu(top->fsync_on_close);
 	o->bs_is_seq_rand = le32_to_cpu(top->bs_is_seq_rand);
 	o->random_distribution = le32_to_cpu(top->random_distribution);
+	o->exitall_error = le32_to_cpu(top->exitall_error);
 	o->zipf_theta.u.f = fio_uint64_to_double(le64_to_cpu(top->zipf_theta.u.i));
 	o->pareto_h.u.f = fio_uint64_to_double(le64_to_cpu(top->pareto_h.u.i));
 	o->gauss_dev.u.f = fio_uint64_to_double(le64_to_cpu(top->gauss_dev.u.i));
@@ -353,6 +354,7 @@ void convert_thread_options_to_net(struct thread_options_pack *top,
 	top->fsync_on_close = cpu_to_le32(o->fsync_on_close);
 	top->bs_is_seq_rand = cpu_to_le32(o->bs_is_seq_rand);
 	top->random_distribution = cpu_to_le32(o->random_distribution);
+	top->exitall_error = cpu_to_le32(o->exitall_error);
 	top->zipf_theta.u.i = __cpu_to_le64(fio_double_to_uint64(o->zipf_theta.u.f));
 	top->pareto_h.u.i = __cpu_to_le64(fio_double_to_uint64(o->pareto_h.u.f));
 	top->gauss_dev.u.i = __cpu_to_le64(fio_double_to_uint64(o->gauss_dev.u.f));
diff --git a/fio.1 b/fio.1
index eab20d779e35..4fe1be27c31c 100644
--- a/fio.1
+++ b/fio.1
@@ -1126,6 +1126,10 @@ Should be a multiple of 1MB. Default: 4MB.
 .B exitall
 Terminate all jobs when one finishes.  Default: wait for each job to finish.
 .TP
+.BI exitall_on_error \fR=\fPbool
+Terminate all jobs if one job finishes in error.  Default: wait for each job
+to finish.
+.TP
 .BI bwavgtime \fR=\fPint
 Average bandwidth calculations over the given time in milliseconds.  Default:
 500ms.
diff --git a/init.c b/init.c
index 0100da213a24..63ba32481b9b 100644
--- a/init.c
+++ b/init.c
@@ -47,6 +47,7 @@ static char **job_sections;
 static int nr_job_sections;
 
 int exitall_on_terminate = 0;
+int exitall_on_terminate_error = 0;
 int output_format = FIO_OUTPUT_NORMAL;
 int eta_print = FIO_ETA_AUTO;
 int eta_new_line = 0;
diff --git a/options.c b/options.c
index 627029cd732c..46d5fb92ea98 100644
--- a/options.c
+++ b/options.c
@@ -3141,6 +3141,15 @@ struct fio_option fio_options[FIO_MAX_OPTS] = {
 		.group	= FIO_OPT_G_PROCESS,
 	},
 	{
+		.name	= "exitall_on_error",
+		.lname	= "Exit-all on terminate in error",
+		.type	= FIO_OPT_BOOL,
+		.off1	= td_var_offset(exitall_error),
+		.help	= "Terminate all jobs when one exits in error",
+		.category = FIO_OPT_C_GENERAL,
+		.group	= FIO_OPT_G_PROCESS,
+	},
+	{
 		.name	= "stonewall",
 		.lname	= "Wait for previous",
 		.alias	= "wait_for_previous",
diff --git a/thread_options.h b/thread_options.h
index 02c867f31936..6ae0335698c1 100644
--- a/thread_options.h
+++ b/thread_options.h
@@ -131,6 +131,7 @@ struct thread_options {
 	unsigned int verify_only;
 
 	unsigned int random_distribution;
+	unsigned int exitall_error;
 
 	fio_fp64_t zipf_theta;
 	fio_fp64_t pareto_h;
@@ -376,7 +377,7 @@ struct thread_options_pack {
 	uint32_t bs_is_seq_rand;
 
 	uint32_t random_distribution;
-	uint32_t pad;
+	uint32_t exitall_error;
 
 	fio_fp64_t zipf_theta;
 	fio_fp64_t pareto_h;

-- 
Jens Axboe



* Re: Exit all jobs on error
  2015-12-10 18:11     ` Andrey Kuzmin
@ 2015-12-10 18:15       ` Jens Axboe
  2015-12-10 18:17         ` Andrey Kuzmin
  0 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2015-12-10 18:15 UTC
  To: Andrey Kuzmin; +Cc: Sitsofe Wheeler, fio

On Thu, Dec 10 2015, Andrey Kuzmin wrote:
> I've also encountered a similar issue a number of times where the job
> failed to stop (and refused to terminate in response to C-C) when a
> thread/process fails, e.g. due to an error. My guess is that the loop
> that waits for completions doesn't check for td->terminate being set.

Attach with gdb and see what they are doing, could be a missing
terminate check. Or it could already be sitting waiting for completions.

-- 
Jens Axboe



* Re: Exit all jobs on error
  2015-12-10 18:15       ` Jens Axboe
@ 2015-12-10 18:17         ` Andrey Kuzmin
  2015-12-10 18:24           ` Jens Axboe
  0 siblings, 1 reply; 16+ messages in thread
From: Andrey Kuzmin @ 2015-12-10 18:17 UTC
  To: Jens Axboe; +Cc: Sitsofe Wheeler, fio

On Thu, Dec 10, 2015 at 9:15 PM, Jens Axboe <axboe@kernel.dk> wrote:
> On Thu, Dec 10 2015, Andrey Kuzmin wrote:
>> I've also encountered a similar issue a number of times where the job
>> failed to stop (and refused to terminate in response to C-C) when a
>> thread/process fails, e.g. due to an error. My guess is that the loop
>> that waits for completions doesn't check for td->terminate being set.
>
> Attach with gdb and see what they are doing, could be a missing
> terminate check. Or it could already be sitting waiting for completions.

It just sits there waiting for completions, as gdb understandably
predominantly hits the wait state.

Regards,
Andrey

>
> --
> Jens Axboe
>


* Re: Exit all jobs on error
  2015-12-10 18:17         ` Andrey Kuzmin
@ 2015-12-10 18:24           ` Jens Axboe
  2015-12-10 18:27             ` Andrey Kuzmin
  0 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2015-12-10 18:24 UTC
  To: Andrey Kuzmin; +Cc: Sitsofe Wheeler, fio

On 12/10/2015 11:17 AM, Andrey Kuzmin wrote:
> On Thu, Dec 10, 2015 at 9:15 PM, Jens Axboe <axboe@kernel.dk> wrote:
>> On Thu, Dec 10 2015, Andrey Kuzmin wrote:
>>> I've also encountered a similar issue a number of times where the job
>>> failed to stop (and refused to terminate in response to C-C) when a
>>> thread/process fails, e.g. due to an error. My guess is that the loop
>>> that waits for completions doesn't check for td->terminate being set.
>>
>> Attach with gdb and see what they are doing, could be a missing
>> terminate check. Or it could already be sitting waiting for completions.
>
> It just sits there waiting for completions, as gdb understandably
> predominantly hits the wait state.

Where is it sitting and/or looping?

-- 
Jens Axboe



* Re: Exit all jobs on error
  2015-12-10 18:24           ` Jens Axboe
@ 2015-12-10 18:27             ` Andrey Kuzmin
  2015-12-10 18:29               ` Jens Axboe
  0 siblings, 1 reply; 16+ messages in thread
From: Andrey Kuzmin @ 2015-12-10 18:27 UTC
  To: Jens Axboe; +Cc: Sitsofe Wheeler, fio

On Thu, Dec 10, 2015 at 9:24 PM, Jens Axboe <axboe@kernel.dk> wrote:
> On 12/10/2015 11:17 AM, Andrey Kuzmin wrote:
>>
>> On Thu, Dec 10, 2015 at 9:15 PM, Jens Axboe <axboe@kernel.dk> wrote:
>>>
>>> On Thu, Dec 10 2015, Andrey Kuzmin wrote:
>>>>
>>>> I've also encountered a similar issue a number of times where the job
>>>> failed to stop (and refused to terminate in response to C-C) when a
>>>> thread/process fails, e.g. due to an error. My guess is that the loop
>>>> that waits for completions doesn't check for td->terminate being set.
>>>
>>>
>>> Attach with gdb and see what they are doing, could be a missing
>>> terminate check. Or it could already be sitting waiting for completions.
>>
>>
>> It just sits there waiting for completions, as gdb understandably
>> predominantly hits the wait state.
>
>
> Where is it sitting and/or looping?

unix/wait smth ;), as far as I recall.

If you need an exact ref, let me make up an error in the code, run,
and get back to you with the exact gdb frame info.

Regards,
A.

>
> --
> Jens Axboe
>

* Re: Exit all jobs on error
  2015-12-10 18:27             ` Andrey Kuzmin
@ 2015-12-10 18:29               ` Jens Axboe
  2015-12-10 18:30                 ` Andrey Kuzmin
  0 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2015-12-10 18:29 UTC
  To: Andrey Kuzmin; +Cc: Sitsofe Wheeler, fio

On 12/10/2015 11:27 AM, Andrey Kuzmin wrote:
> On Thu, Dec 10, 2015 at 9:24 PM, Jens Axboe <axboe@kernel.dk> wrote:
>> On 12/10/2015 11:17 AM, Andrey Kuzmin wrote:
>>>
>>> On Thu, Dec 10, 2015 at 9:15 PM, Jens Axboe <axboe@kernel.dk> wrote:
>>>>
>>>> On Thu, Dec 10 2015, Andrey Kuzmin wrote:
>>>>>
>>>>> I've also encountered a similar issue a number of times where the job
>>>>> failed to stop (and refused to terminate in response to C-C) when a
>>>>> thread/process fails, e.g. due to an error. My guess is that the loop
>>>>> that waits for completions doesn't check for td->terminate being set.
>>>>
>>>>
>>>> Attach with gdb and see what they are doing, could be a missing
>>>> terminate check. Or it could already be sitting waiting for completions.
>>>
>>>
>>> It just sits there waiting for completions, as gdb understandably
>>> predominantly hits the wait state.
>>
>>
>> Where is it sitting and/or looping?
>
> unix/wait smth ;), as far as I recall.
>
> If you need an exact ref, let me make up an error in the code, run,
> and get back to you with the exact gdb frame info.

I'm generally not in the crystal ball or guessing game :-)

So yeah, a stack trace would be helpful.

-- 
Jens Axboe



* Re: Exit all jobs on error
  2015-12-10 18:29               ` Jens Axboe
@ 2015-12-10 18:30                 ` Andrey Kuzmin
  2015-12-11 10:01                   ` Andrey Kuzmin
  0 siblings, 1 reply; 16+ messages in thread
From: Andrey Kuzmin @ 2015-12-10 18:30 UTC
  To: Jens Axboe; +Cc: Sitsofe Wheeler, fio

On Thu, Dec 10, 2015 at 9:29 PM, Jens Axboe <axboe@kernel.dk> wrote:
> On 12/10/2015 11:27 AM, Andrey Kuzmin wrote:
>>
>> On Thu, Dec 10, 2015 at 9:24 PM, Jens Axboe <axboe@kernel.dk> wrote:
>>>
>>> On 12/10/2015 11:17 AM, Andrey Kuzmin wrote:
>>>>
>>>>
>>>> On Thu, Dec 10, 2015 at 9:15 PM, Jens Axboe <axboe@kernel.dk> wrote:
>>>>>
>>>>>
>>>>> On Thu, Dec 10 2015, Andrey Kuzmin wrote:
>>>>>>
>>>>>>
>>>>>> I've also encountered a similar issue a number of times where the job
>>>>>> failed to stop (and refused to terminate in response to C-C) when a
>>>>>> thread/process fails, e.g. due to an error. My guess is that the loop
>>>>>> that waits for completions doesn't check for td->terminate being set.
>>>>>
>>>>>
>>>>>
>>>>> Attach with gdb and see what they are doing, could be a missing
>>>>> terminate check. Or it could already be sitting waiting for
>>>>> completions.
>>>>
>>>>
>>>>
>>>> It just sits there waiting for completions, as gdb understandably
>>>> predominantly hits the wait state.
>>>
>>>
>>>
>>> Where is it sitting and/or looping?
>>
>>
>> unix/wait smth ;), as far as I recall.
>>
>> If you need an exact ref, let me make up an error in the code, run,
>> and get back to you with the exact gdb frame info.
>
>
> I'm generally not in the crystal ball or guessing game :-)
>
> So yeah, a stack trace would be helpful.

OK, will do.


Regards,
Andrey

>
> --
> Jens Axboe
>


* Re: Exit all jobs on error
  2015-12-10 18:30                 ` Andrey Kuzmin
@ 2015-12-11 10:01                   ` Andrey Kuzmin
  2015-12-11 15:32                     ` Jens Axboe
  0 siblings, 1 reply; 16+ messages in thread
From: Andrey Kuzmin @ 2015-12-11 10:01 UTC
  To: Jens Axboe; +Cc: Sitsofe Wheeler, fio

^Cbs: 1 (f=1): [w(1)] [0.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta
01d:12h:24m:29s]
Program received signal SIGINT, Interrupt.
0x00007ffff6b7ff3d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
81 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0  0x00007ffff6b7ff3d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007ffff6bb14a4 in usleep (useconds=<optimized out>) at
../sysdeps/unix/sysv/linux/usleep.c:32
#2  0x000000000045a7ed in do_usleep (usecs=10000) at backend.c:1951
#3  0x000000000045b33c in run_threads () at backend.c:2216
#4  0x000000000045b6a8 in fio_backend () at backend.c:2333
#5  0x00000000004991cb in main (argc=4, argv=0x7fffffffdda8,
envp=0x7fffffffddd0) at fio.c:60

Regards,
Andrey


On Thu, Dec 10, 2015 at 9:30 PM, Andrey Kuzmin
<andrey.v.kuzmin@gmail.com> wrote:
> On Thu, Dec 10, 2015 at 9:29 PM, Jens Axboe <axboe@kernel.dk> wrote:
>> On 12/10/2015 11:27 AM, Andrey Kuzmin wrote:
>>>
>>> On Thu, Dec 10, 2015 at 9:24 PM, Jens Axboe <axboe@kernel.dk> wrote:
>>>>
>>>> On 12/10/2015 11:17 AM, Andrey Kuzmin wrote:
>>>>>
>>>>>
>>>>> On Thu, Dec 10, 2015 at 9:15 PM, Jens Axboe <axboe@kernel.dk> wrote:
>>>>>>
>>>>>>
>>>>>> On Thu, Dec 10 2015, Andrey Kuzmin wrote:
>>>>>>>
>>>>>>>
>>>>>>> I've also encountered a similar issue a number of times where the job
>>>>>>> failed to stop (and refused to terminate in response to C-C) when a
>>>>>>> thread/process fails, e.g. due to an error. My guess is that the loop
>>>>>>> that waits for completions doesn't check for td->terminate being set.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Attach with gdb and see what they are doing, could be a missing
>>>>>> terminate check. Or it could already be sitting waiting for
>>>>>> completions.
>>>>>
>>>>>
>>>>>
>>>>> It just sits there waiting for completions, as gdb understandably
>>>>> predominantly hits the wait state.
>>>>
>>>>
>>>>
>>>> Where is it sitting and/or looping?
>>>
>>>
>>> unix/wait smth ;), as far as I recall.
>>>
>>> If you need an exact ref, let me make up an error in the code, run,
>>> and get back to you with the exact gdb frame info.
>>
>>
>> I'm generally not in the crystal ball or guessing game :-)
>>
>> So yeah, a stack trace would be helpful.
>
> OK, will do.
>


The log says "fio: terminating on signal 2", but when fio is not running
under gdb, ^C doesn't kill it: the job continues, seemingly waiting for
a completion that never comes.

Regards,
Andrey

>
> Regards,
> Andrey
>
>>
>> --
>> Jens Axboe
>>


* Re: Exit all jobs on error
  2015-12-11 10:01                   ` Andrey Kuzmin
@ 2015-12-11 15:32                     ` Jens Axboe
  2015-12-11 19:59                       ` Sitsofe Wheeler
  0 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2015-12-11 15:32 UTC
  To: Andrey Kuzmin; +Cc: Sitsofe Wheeler, fio

On 12/11/2015 03:01 AM, Andrey Kuzmin wrote:
> ^Cbs: 1 (f=1): [w(1)] [0.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta
> 01d:12h:24m:29s]
> Program received signal SIGINT, Interrupt.
> 0x00007ffff6b7ff3d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
> 81 ../sysdeps/unix/syscall-template.S: No such file or directory.
> (gdb) bt
> #0  0x00007ffff6b7ff3d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
> #1  0x00007ffff6bb14a4 in usleep (useconds=<optimized out>) at
> ../sysdeps/unix/sysv/linux/usleep.c:32
> #2  0x000000000045a7ed in do_usleep (usecs=10000) at backend.c:1951
> #3  0x000000000045b33c in run_threads () at backend.c:2216
> #4  0x000000000045b6a8 in fio_backend () at backend.c:2333
> #5  0x00000000004991cb in main (argc=4, argv=0x7fffffffdda8,
> envp=0x7fffffffddd0) at fio.c:60

That's not one of the IO threads, that's the main thread. It'll sit and
wait in that loop until jobs finish. You'll need the backtrace of one of
the stuck IO threads instead; this trace is quite normal and expected of
the backend.

-- 
Jens Axboe



* Re: Exit all jobs on error
  2015-12-11 15:32                     ` Jens Axboe
@ 2015-12-11 19:59                       ` Sitsofe Wheeler
  2015-12-11 20:32                         ` Andrey Kuzmin
  0 siblings, 1 reply; 16+ messages in thread
From: Sitsofe Wheeler @ 2015-12-11 19:59 UTC
  To: Jens Axboe; +Cc: Andrey Kuzmin, fio

On 11 December 2015 at 15:32, Jens Axboe <axboe@kernel.dk> wrote:
> On 12/11/2015 03:01 AM, Andrey Kuzmin wrote:
>>
>> ^Cbs: 1 (f=1): [w(1)] [0.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta
>> 01d:12h:24m:29s]
>> Program received signal SIGINT, Interrupt.
>> 0x00007ffff6b7ff3d in nanosleep () at
>> ../sysdeps/unix/syscall-template.S:81
>> 81 ../sysdeps/unix/syscall-template.S: No such file or directory.
>> (gdb) bt
>> #0  0x00007ffff6b7ff3d in nanosleep () at
>> ../sysdeps/unix/syscall-template.S:81
>> #1  0x00007ffff6bb14a4 in usleep (useconds=<optimized out>) at
>> ../sysdeps/unix/sysv/linux/usleep.c:32
>> #2  0x000000000045a7ed in do_usleep (usecs=10000) at backend.c:1951
>> #3  0x000000000045b33c in run_threads () at backend.c:2216
>> #4  0x000000000045b6a8 in fio_backend () at backend.c:2333
>> #5  0x00000000004991cb in main (argc=4, argv=0x7fffffffdda8,
>> envp=0x7fffffffddd0) at fio.c:60
>
>
> That's not one of the IO threads, that's the main thread. It'll sit and wait
> in that loop until jobs finish. You'll need the backtrace of one of the
> stuck IO threads instead; this trace is quite normal and expected of the
> backend.
>
> --
> Jens Axboe
>

Andrey:

Could you try

  thread apply all bt full

(found over on https://wiki.gentoo.org/wiki/Project:Quality_Assurance/Backtraces)?


-- 
Sitsofe | http://sucs.org/~sits/


* Re: Exit all jobs on error
  2015-12-11 19:59                       ` Sitsofe Wheeler
@ 2015-12-11 20:32                         ` Andrey Kuzmin
  2015-12-11 20:38                           ` Jens Axboe
  0 siblings, 1 reply; 16+ messages in thread
From: Andrey Kuzmin @ 2015-12-11 20:32 UTC
  To: Sitsofe Wheeler; +Cc: fio, Jens Axboe

On Dec 11, 2015 22:59, "Sitsofe Wheeler" <sitsofe@gmail.com> wrote:
>
> On 11 December 2015 at 15:32, Jens Axboe <axboe@kernel.dk> wrote:
> > On 12/11/2015 03:01 AM, Andrey Kuzmin wrote:
> >>
> >> ^Cbs: 1 (f=1): [w(1)] [0.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta
> >> 01d:12h:24m:29s]
> >> Program received signal SIGINT, Interrupt.
> >> 0x00007ffff6b7ff3d in nanosleep () at
> >> ../sysdeps/unix/syscall-template.S:81
> >> 81 ../sysdeps/unix/syscall-template.S: No such file or directory.
> >> (gdb) bt
> >> #0  0x00007ffff6b7ff3d in nanosleep () at
> >> ../sysdeps/unix/syscall-template.S:81
> >> #1  0x00007ffff6bb14a4 in usleep (useconds=<optimized out>) at
> >> ../sysdeps/unix/sysv/linux/usleep.c:32
> >> #2  0x000000000045a7ed in do_usleep (usecs=10000) at backend.c:1951
> >> #3  0x000000000045b33c in run_threads () at backend.c:2216
> >> #4  0x000000000045b6a8 in fio_backend () at backend.c:2333
> >> #5  0x00000000004991cb in main (argc=4, argv=0x7fffffffdda8,
> >> envp=0x7fffffffddd0) at fio.c:60
> >
> >
> > That's not one of the IO threads, that's the main thread. It'll sit and
> > wait in that loop until jobs finish. You'll need the backtrace of one of
> > the stuck IO threads instead; this trace is quite normal and expected of
> > the backend.
> >
> > --
> > Jens Axboe
> >
>
> Andrey:
>
> Could you try
> thread apply all bt full
> (found over on
> https://wiki.gentoo.org/wiki/Project:Quality_Assurance/Backtraces
> )?
>

That test case is already gone, but - if interested - you can easily
simulate it by randomly dropping an io_u inside the engine.

Regards,
Andrey

>
> --
> Sitsofe | http://sucs.org/~sits/

* Re: Exit all jobs on error
  2015-12-11 20:32                         ` Andrey Kuzmin
@ 2015-12-11 20:38                           ` Jens Axboe
  0 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2015-12-11 20:38 UTC
  To: Andrey Kuzmin, Sitsofe Wheeler; +Cc: fio

On 12/11/2015 01:32 PM, Andrey Kuzmin wrote:
>
> On Dec 11, 2015 22:59, "Sitsofe Wheeler" <sitsofe@gmail.com> wrote:
>  >
>  > On 11 December 2015 at 15:32, Jens Axboe <axboe@kernel.dk> wrote:
>  > > On 12/11/2015 03:01 AM, Andrey Kuzmin wrote:
>  > >>
>  > >> ^Cbs: 1 (f=1): [w(1)] [0.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta
>  > >> 01d:12h:24m:29s]
>  > >> Program received signal SIGINT, Interrupt.
>  > >> 0x00007ffff6b7ff3d in nanosleep () at
>  > >> ../sysdeps/unix/syscall-template.S:81
>  > >> 81 ../sysdeps/unix/syscall-template.S: No such file or directory.
>  > >> (gdb) bt
>  > >> #0  0x00007ffff6b7ff3d in nanosleep () at
>  > >> ../sysdeps/unix/syscall-template.S:81
>  > >> #1  0x00007ffff6bb14a4 in usleep (useconds=<optimized out>) at
>  > >> ../sysdeps/unix/sysv/linux/usleep.c:32
>  > >> #2  0x000000000045a7ed in do_usleep (usecs=10000) at backend.c:1951
>  > >> #3  0x000000000045b33c in run_threads () at backend.c:2216
>  > >> #4  0x000000000045b6a8 in fio_backend () at backend.c:2333
>  > >> #5  0x00000000004991cb in main (argc=4, argv=0x7fffffffdda8,
>  > >> envp=0x7fffffffddd0) at fio.c:60
>  > >
>  > >
>  > > That's not one of the IO threads, that's the main thread. It'll sit
>  > > and wait in that loop until jobs finish. You'll need the backtrace of
>  > > one of the stuck IO threads instead; this trace is quite normal and
>  > > expected of the backend.
>  > >
>  > > --
>  > > Jens Axboe
>  > >
>  >
>  > Andrey:
>  >
>  > Could you try
>  > thread apply all bt full
>  > (found over on
>  > https://wiki.gentoo.org/wiki/Project:Quality_Assurance/Backtraces
>  > )?
>  >
>
> That test case is already gone, but - if interested - you can easily
> simulate it by randomly dropping an io_u inside the engine.

To follow up on this, since apparently parts of that thread ended up 
outside of the mailing list.

If you drop an io_u inside the engine, then fio will of course get stuck 
waiting for completions. That would be an IO engine bug. Fio does not 
track timeouts internally, because it does not have to:

For the more realistic case of being stuck waiting for IO that has been
submitted to the kernel, we strictly depend on the kernel completing
those IOs. If it doesn't, that's a kernel bug, and it won't matter whether
we explicitly wait for the IO, since the wait happens in any case when we
drop the aio context. Either the IO gets completed by the device, or a
driver timeout will take care of completing it. In either case, we get a
completion event.

There's no fio bug here.

-- 
Jens Axboe



end of thread

Thread overview: 16+ messages
2015-12-10  7:01 Exit all jobs on error Sitsofe Wheeler
2015-12-10 17:11 ` Jens Axboe
2015-12-10 17:58   ` Sitsofe Wheeler
2015-12-10 18:11     ` Andrey Kuzmin
2015-12-10 18:15       ` Jens Axboe
2015-12-10 18:17         ` Andrey Kuzmin
2015-12-10 18:24           ` Jens Axboe
2015-12-10 18:27             ` Andrey Kuzmin
2015-12-10 18:29               ` Jens Axboe
2015-12-10 18:30                 ` Andrey Kuzmin
2015-12-11 10:01                   ` Andrey Kuzmin
2015-12-11 15:32                     ` Jens Axboe
2015-12-11 19:59                       ` Sitsofe Wheeler
2015-12-11 20:32                         ` Andrey Kuzmin
2015-12-11 20:38                           ` Jens Axboe
2015-12-10 18:15     ` Jens Axboe
