* fio signal 11
@ 2016-06-09 21:58 Jeff Furlong
  2016-06-10  7:17 ` Sitsofe Wheeler
  0 siblings, 1 reply; 28+ messages in thread
From: Jeff Furlong @ 2016-06-09 21:58 UTC (permalink / raw)
  To: fio

Hi All,
Using the latest version from git, I am getting a signal 11 error (segmentation fault) when running some workloads.  It may be that workloads fail when they wrap around the device capacity (e.g. with runtime and time_based set).  See the example parameters below:

# fio --name=test_job --ioengine=libaio --direct=1 --rw=write --iodepth=32 --size=100% --numjobs=1 --bs=1m --filename=/dev/nvme0n1 --group_reporting --write_bw_log=test_job --write_iops_log=test_job --write_lat_log=test_job --log_avg_msec=1000 --disable_lat=0 --disable_clat=0 --disable_slat=0 --runtime=1500s --time_based --ramp_time=5s --output=test_job

fio-2.11-17-ga275
Starting 1 process
fio: pid=145763, got signal=11

In fio-2.11 (release), the error did not occur.

Regards,
Jeff

Western Digital Corporation (and its subsidiaries) E-mail Confidentiality Notice & Disclaimer:

This e-mail and any files transmitted with it may contain confidential or legally privileged information of WDC and/or its affiliates, and are intended solely for the use of the individual or entity to which they are addressed. If you are not the intended recipient, any disclosure, copying, distribution or any action taken or omitted to be taken in reliance on it, is prohibited. If you have received this e-mail in error, please notify the sender immediately and delete the e-mail in its entirety from your system.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: fio signal 11
  2016-06-09 21:58 fio signal 11 Jeff Furlong
@ 2016-06-10  7:17 ` Sitsofe Wheeler
  2016-06-10 18:42   ` Jeff Furlong
  0 siblings, 1 reply; 28+ messages in thread
From: Sitsofe Wheeler @ 2016-06-10  7:17 UTC (permalink / raw)
  To: Jeff Furlong; +Cc: fio

On 9 June 2016 at 22:58, Jeff Furlong <jeff.furlong@hgst.com> wrote:
>
> Using the latest version from git, I am getting a signal 11 error when running some workloads.  It may be that workloads fail when we wrap around the device capacity (e.g. runtime and time_based).  See below example parameters:
>
> # fio --name=test_job --ioengine=libaio --direct=1 --rw=write --iodepth=32 --size=100% --numjobs=1 --bs=1m --filename=/dev/nvme0n1 --group_reporting --write_bw_log=test_job --write_iops_log=test_job --write_lat_log=test_job --log_avg_msec=1000 --disable_lat=0 --disable_clat=0 --disable_slat=0 --runtime=1500s --time_based --ramp_time=5s --output=test_job
>
> fio-2.11-17-ga275
> Starting 1 process
> fio: pid=145763, got signal=11

Could you run fio under gdb (
gdb ./fio
run --name=test_job --ioengine=libaio etc.
)

and when it crashes get a backtrace with
thread apply all bt
?

-- 
Sitsofe | http://sucs.org/~sits/


* RE: fio signal 11
  2016-06-10  7:17 ` Sitsofe Wheeler
@ 2016-06-10 18:42   ` Jeff Furlong
  2016-06-12  2:56     ` Jens Axboe
  0 siblings, 1 reply; 28+ messages in thread
From: Jeff Furlong @ 2016-06-10 18:42 UTC (permalink / raw)
  To: Sitsofe Wheeler; +Cc: fio

Good point.  Here is the trace:

[New LWP 59231]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `fio --name=test_job --ioengine=libaio --direct=1 --rw=write --iodepth=32'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000000000421e39 in regrow_log (iolog=0x7f828c0c5ad0) at stat.c:1909
1909		if (!cur_log) {
 
(gdb) bt
#0  0x0000000000421e39 in regrow_log (iolog=0x7f828c0c5ad0) at stat.c:1909
#1  0x000000000042d4df in regrow_logs (td=td@entry=0x7f8277de0000) at stat.c:1965
#2  0x000000000040ca90 in wait_for_completions (td=td@entry=0x7f8277de0000, time=time@entry=0x7fffcfb6b300) at backend.c:446
#3  0x000000000045ade7 in do_io (bytes_done=<synthetic pointer>, td=0x7f8277de0000) at backend.c:991
#4  thread_main (data=data@entry=0x264d450) at backend.c:1667
#5  0x000000000045cfec in run_threads (sk_out=sk_out@entry=0x0) at backend.c:2217
#6  0x000000000045d2cd in fio_backend (sk_out=sk_out@entry=0x0) at backend.c:2349
#7  0x000000000040d09c in main (argc=22, argv=0x7fffcfb6f638, envp=<optimized out>) at fio.c:63

Regards,
Jeff


-----Original Message-----
From: Sitsofe Wheeler [mailto:sitsofe@gmail.com] 
Sent: Friday, June 10, 2016 12:17 AM
To: Jeff Furlong <jeff.furlong@hgst.com>
Cc: fio@vger.kernel.org
Subject: Re: fio signal 11

On 9 June 2016 at 22:58, Jeff Furlong <jeff.furlong@hgst.com> wrote:
>
> Using the latest version from git, I am getting a signal 11 error when running some workloads.  It may be that workloads fail when we wrap around the device capacity (e.g. runtime and time_based).  See below example parameters:
>
> # fio --name=test_job --ioengine=libaio --direct=1 --rw=write 
> --iodepth=32 --size=100% --numjobs=1 --bs=1m --filename=/dev/nvme0n1 
> --group_reporting --write_bw_log=test_job --write_iops_log=test_job 
> --write_lat_log=test_job --log_avg_msec=1000 --disable_lat=0 
> --disable_clat=0 --disable_slat=0 --runtime=1500s --time_based 
> --ramp_time=5s --output=test_job
>
> fio-2.11-17-ga275
> Starting 1 process
> fio: pid=145763, got signal=11

Could you run fio under gdb (
gdb ./fio
run --name=test_job --ioengine=libaio etc.
)

and when it crashes get a backtrace with thread apply all bt ?

--
Sitsofe | http://sucs.org/~sits/


* Re: fio signal 11
  2016-06-10 18:42   ` Jeff Furlong
@ 2016-06-12  2:56     ` Jens Axboe
  2016-06-12  3:30       ` Jens Axboe
  0 siblings, 1 reply; 28+ messages in thread
From: Jens Axboe @ 2016-06-12  2:56 UTC (permalink / raw)
  To: Jeff Furlong, Sitsofe Wheeler; +Cc: fio

On 06/10/2016 12:42 PM, Jeff Furlong wrote:
> Good point.  Here is the trace:
>
> [New LWP 59231]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `fio --name=test_job --ioengine=libaio --direct=1 --rw=write --iodepth=32'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x0000000000421e39 in regrow_log (iolog=0x7f828c0c5ad0) at stat.c:1909
> 1909		if (!cur_log) {
>
> (gdb) bt
> #0  0x0000000000421e39 in regrow_log (iolog=0x7f828c0c5ad0) at stat.c:1909
> #1  0x000000000042d4df in regrow_logs (td=td@entry=0x7f8277de0000) at stat.c:1965
> #2  0x000000000040ca90 in wait_for_completions (td=td@entry=0x7f8277de0000, time=time@entry=0x7fffcfb6b300) at backend.c:446
> #3  0x000000000045ade7 in do_io (bytes_done=<synthetic pointer>, td=0x7f8277de0000) at backend.c:991
> #4  thread_main (data=data@entry=0x264d450) at backend.c:1667
> #5  0x000000000045cfec in run_threads (sk_out=sk_out@entry=0x0) at backend.c:2217
> #6  0x000000000045d2cd in fio_backend (sk_out=sk_out@entry=0x0) at backend.c:2349
> #7  0x000000000040d09c in main (argc=22, argv=0x7fffcfb6f638, envp=<optimized out>) at fio.c:63

That looks odd, thanks for reporting this. I'll see if I can get to this 
on Monday; if not, it'll have to wait until after my vacation... So 
while I appreciate people running -git and finding issues like these 
before they show up in a release, it might be best to revert to 2.11 
until I can get this debugged.

-- 
Jens Axboe




* Re: fio signal 11
  2016-06-12  2:56     ` Jens Axboe
@ 2016-06-12  3:30       ` Jens Axboe
  2016-06-13 17:58         ` Jeff Furlong
  2016-06-15 14:45         ` Jan Kara
  0 siblings, 2 replies; 28+ messages in thread
From: Jens Axboe @ 2016-06-12  3:30 UTC (permalink / raw)
  To: Jeff Furlong, Sitsofe Wheeler; +Cc: fio, Jan Kara

On 06/11/2016 08:56 PM, Jens Axboe wrote:
> On 06/10/2016 12:42 PM, Jeff Furlong wrote:
>> Good point.  Here is the trace:
>>
>> [New LWP 59231]
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> Core was generated by `fio --name=test_job --ioengine=libaio
>> --direct=1 --rw=write --iodepth=32'.
>> Program terminated with signal 11, Segmentation fault.
>> #0  0x0000000000421e39 in regrow_log (iolog=0x7f828c0c5ad0) at
>> stat.c:1909
>> 1909        if (!cur_log) {
>>
>> (gdb) bt
>> #0  0x0000000000421e39 in regrow_log (iolog=0x7f828c0c5ad0) at
>> stat.c:1909
>> #1  0x000000000042d4df in regrow_logs (td=td@entry=0x7f8277de0000) at
>> stat.c:1965
>> #2  0x000000000040ca90 in wait_for_completions
>> (td=td@entry=0x7f8277de0000, time=time@entry=0x7fffcfb6b300) at
>> backend.c:446
>> #3  0x000000000045ade7 in do_io (bytes_done=<synthetic pointer>,
>> td=0x7f8277de0000) at backend.c:991
>> #4  thread_main (data=data@entry=0x264d450) at backend.c:1667
>> #5  0x000000000045cfec in run_threads (sk_out=sk_out@entry=0x0) at
>> backend.c:2217
>> #6  0x000000000045d2cd in fio_backend (sk_out=sk_out@entry=0x0) at
>> backend.c:2349
>> #7  0x000000000040d09c in main (argc=22, argv=0x7fffcfb6f638,
>> envp=<optimized out>) at fio.c:63
>
> That looks odd, thanks for reporting this. I'll see if I can get to this
> on Monday, if not, it'll have to wait until after my vacation... So
> while I appreciate people running -git and finding issues like these
> before they show up in a release, it might be best to revert to 2.11
> until I can get this debugged.

I take that back - continue using -git! Just pull a fresh copy, should
be fixed now.

Jan, the reporter is right, 2.11 works and -git does not. So I just ran
a quick bisect, changing the logging from every second to every 100ms to
make it reproduce faster. I don't have time to look into why yet, so I
just reverted the commit.

commit d7982dd0ab2a1a315b5f9859c67a02414ce6274f
Author: Jan Kara <jack@suse.cz>
Date:   Tue May 24 17:03:21 2016 +0200

     fio: Simplify forking of processes

-- 
Jens Axboe




* RE: fio signal 11
  2016-06-12  3:30       ` Jens Axboe
@ 2016-06-13 17:58         ` Jeff Furlong
  2016-06-13 18:01           ` Jens Axboe
  2016-06-15 14:45         ` Jan Kara
  1 sibling, 1 reply; 28+ messages in thread
From: Jeff Furlong @ 2016-06-13 17:58 UTC (permalink / raw)
  To: Jens Axboe, Sitsofe Wheeler; +Cc: fio, Jan Kara

I'd like to use the latest git version so that I can pick up patch 356014ff351c6eb69339652650af5f6af72e5c22 to fix the ramp time issue.

I tried the latest git version fio-2.11-21-g2b762, but got a new issue:

[New LWP 120298]
[New LWP 120281]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `fio --name=test_job --ioengine=libaio --direct=1 --rw=write --iodepth=32'.
Program terminated with signal 11, Segmentation fault.
#0  __add_log_sample (iolog=iolog@entry=0x7f8376222c30, val=2230, ddir=ddir@entry=DDIR_WRITE, bs=bs@entry=0, t=t@entry=1056000, 
    offset=offset@entry=0) at stat.c:2015
2015			s->val = val;
(gdb) bt
#0  __add_log_sample (iolog=iolog@entry=0x7f8376222c30, val=2230, ddir=ddir@entry=DDIR_WRITE, bs=bs@entry=0, t=t@entry=1056000, 
    offset=offset@entry=0) at stat.c:2015
#1  0x0000000000422411 in __add_log_sample (offset=0, t=1056000, bs=0, ddir=DDIR_WRITE, val=<optimized out>, iolog=0x7f8376222c30)
    at stat.c:2004
#2  __add_stat_to_log (log_max=<optimized out>, elapsed=1056000, ddir=DDIR_WRITE, iolog=0x7f8376222c30) at stat.c:2091
#3  _add_stat_to_log (log_max=<optimized out>, elapsed=<optimized out>, iolog=0x7f8376222c30) at stat.c:2103
#4  add_log_sample (td=td@entry=0x7f8361f3d000, iolog=0x7f8376222c30, val=2230, ddir=ddir@entry=DDIR_WRITE, bs=1048576, 
    offset=offset@entry=0) at stat.c:2139
#5  0x000000000042f45f in add_log_sample (offset=0, bs=<optimized out>, ddir=DDIR_WRITE, val=<optimized out>, iolog=<optimized out>, 
    td=0x7f8361f3d000) at stat.c:2357
#6  add_iops_samples (t=0x7f835a4bae50, td=0x7f8361f3d000) at stat.c:2362
#7  calc_log_samples () at stat.c:2402
#8  0x0000000000467508 in helper_thread_main (data=0x7f8376222d90) at helper_thread.c:125
#9  0x00007f8377e7cdf5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f83779a61ad in clone () from /lib64/libc.so.6

Maybe I should try applying the ramp time patch to the 2.11 release only?

Regards,
Jeff


-----Original Message-----
From: Jens Axboe [mailto:axboe@kernel.dk] 
Sent: Saturday, June 11, 2016 8:30 PM
To: Jeff Furlong <jeff.furlong@hgst.com>; Sitsofe Wheeler <sitsofe@gmail.com>
Cc: fio@vger.kernel.org; Jan Kara <jack@suse.cz>
Subject: Re: fio signal 11

On 06/11/2016 08:56 PM, Jens Axboe wrote:
> On 06/10/2016 12:42 PM, Jeff Furlong wrote:
>> Good point.  Here is the trace:
>>
>> [New LWP 59231]
>> [Thread debugging using libthread_db enabled] Using host libthread_db 
>> library "/lib64/libthread_db.so.1".
>> Core was generated by `fio --name=test_job --ioengine=libaio
>> --direct=1 --rw=write --iodepth=32'.
>> Program terminated with signal 11, Segmentation fault.
>> #0  0x0000000000421e39 in regrow_log (iolog=0x7f828c0c5ad0) at
>> stat.c:1909
>> 1909        if (!cur_log) {
>>
>> (gdb) bt
>> #0  0x0000000000421e39 in regrow_log (iolog=0x7f828c0c5ad0) at
>> stat.c:1909
>> #1  0x000000000042d4df in regrow_logs (td=td@entry=0x7f8277de0000) at
>> stat.c:1965
>> #2  0x000000000040ca90 in wait_for_completions 
>> (td=td@entry=0x7f8277de0000, time=time@entry=0x7fffcfb6b300) at
>> backend.c:446
>> #3  0x000000000045ade7 in do_io (bytes_done=<synthetic pointer>,
>> td=0x7f8277de0000) at backend.c:991
>> #4  thread_main (data=data@entry=0x264d450) at backend.c:1667
>> #5  0x000000000045cfec in run_threads (sk_out=sk_out@entry=0x0) at
>> backend.c:2217
>> #6  0x000000000045d2cd in fio_backend (sk_out=sk_out@entry=0x0) at
>> backend.c:2349
>> #7  0x000000000040d09c in main (argc=22, argv=0x7fffcfb6f638, 
>> envp=<optimized out>) at fio.c:63
>
> That looks odd, thanks for reporting this. I'll see if I can get to 
> this on Monday, if not, it'll have to wait until after my vacation... 
> So while I appreciate people running -git and finding issues like 
> these before they show up in a release, it might be best to revert 
> to 2.11 until I can get this debugged.

I take that back - continue using -git! Just pull a fresh copy, should be fixed now.

Jan, the reporter is right, 2.11 works and -git does not. So I just ran a quick bisect, changing the logging from every second to every 100ms to make it reproduce faster. I don't have time to look into why yet, so I just reverted the commit.

commit d7982dd0ab2a1a315b5f9859c67a02414ce6274f
Author: Jan Kara <jack@suse.cz>
Date:   Tue May 24 17:03:21 2016 +0200

     fio: Simplify forking of processes

--
Jens Axboe



* Re: fio signal 11
  2016-06-13 17:58         ` Jeff Furlong
@ 2016-06-13 18:01           ` Jens Axboe
  2016-06-13 18:04             ` Jeff Furlong
  0 siblings, 1 reply; 28+ messages in thread
From: Jens Axboe @ 2016-06-13 18:01 UTC (permalink / raw)
  To: Jeff Furlong, Sitsofe Wheeler; +Cc: fio, Jan Kara

On 06/13/2016 11:58 AM, Jeff Furlong wrote:
> I'd like to use the latest git version so that I can pick up patch 356014ff351c6eb69339652650af5f6af72e5c22 to fix the ramp time issue.
>
> I tried the latest git version fio-2.11-21-g2b762, but got a new issue:
>
> [New LWP 120298]
> [New LWP 120281]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `fio --name=test_job --ioengine=libaio --direct=1 --rw=write --iodepth=32'.
> Program terminated with signal 11, Segmentation fault.
> #0  __add_log_sample (iolog=iolog@entry=0x7f8376222c30, val=2230, ddir=ddir@entry=DDIR_WRITE, bs=bs@entry=0, t=t@entry=1056000,
>      offset=offset@entry=0) at stat.c:2015
> 2015			s->val = val;
> (gdb) bt
> #0  __add_log_sample (iolog=iolog@entry=0x7f8376222c30, val=2230, ddir=ddir@entry=DDIR_WRITE, bs=bs@entry=0, t=t@entry=1056000,
>      offset=offset@entry=0) at stat.c:2015
> #1  0x0000000000422411 in __add_log_sample (offset=0, t=1056000, bs=0, ddir=DDIR_WRITE, val=<optimized out>, iolog=0x7f8376222c30)
>      at stat.c:2004
> #2  __add_stat_to_log (log_max=<optimized out>, elapsed=1056000, ddir=DDIR_WRITE, iolog=0x7f8376222c30) at stat.c:2091
> #3  _add_stat_to_log (log_max=<optimized out>, elapsed=<optimized out>, iolog=0x7f8376222c30) at stat.c:2103
> #4  add_log_sample (td=td@entry=0x7f8361f3d000, iolog=0x7f8376222c30, val=2230, ddir=ddir@entry=DDIR_WRITE, bs=1048576,
>      offset=offset@entry=0) at stat.c:2139
> #5  0x000000000042f45f in add_log_sample (offset=0, bs=<optimized out>, ddir=DDIR_WRITE, val=<optimized out>, iolog=<optimized out>,
>      td=0x7f8361f3d000) at stat.c:2357
> #6  add_iops_samples (t=0x7f835a4bae50, td=0x7f8361f3d000) at stat.c:2362
> #7  calc_log_samples () at stat.c:2402
> #8  0x0000000000467508 in helper_thread_main (data=0x7f8376222d90) at helper_thread.c:125
> #9  0x00007f8377e7cdf5 in start_thread () from /lib64/libpthread.so.0
> #10 0x00007f83779a61ad in clone () from /lib64/libc.so.6
>
> Maybe I should try the ramp time patch only on 2.11 release?

What did you run to cause this crash? Looks like only some of the 
arguments got shown.

-- 
Jens Axboe




* RE: fio signal 11
  2016-06-13 18:01           ` Jens Axboe
@ 2016-06-13 18:04             ` Jeff Furlong
  2016-06-13 19:21               ` Jeff Furlong
  0 siblings, 1 reply; 28+ messages in thread
From: Jeff Furlong @ 2016-06-13 18:04 UTC (permalink / raw)
  To: Jens Axboe, Sitsofe Wheeler; +Cc: fio, Jan Kara

I used the same workload:

# fio --name=test_job --ioengine=libaio --direct=1 --rw=write --iodepth=32 --size=100% --numjobs=1 --bs=1m --filename=/dev/nvme0n1 --group_reporting --write_bw_log=test_job --write_iops_log=test_job --write_lat_log=test_job --log_avg_msec=1000 --disable_lat=0 --disable_clat=0 --disable_slat=0 --runtime=1500s --time_based --ramp_time=5s --output=test_job

For whatever reason, it looks like gdb truncates the parameters (the command line saved in the core file is limited in length, so only the first few arguments survive).

Regards,
Jeff


-----Original Message-----
From: fio-owner@vger.kernel.org [mailto:fio-owner@vger.kernel.org] On Behalf Of Jens Axboe
Sent: Monday, June 13, 2016 11:02 AM
To: Jeff Furlong <jeff.furlong@hgst.com>; Sitsofe Wheeler <sitsofe@gmail.com>
Cc: fio@vger.kernel.org; Jan Kara <jack@suse.cz>
Subject: Re: fio signal 11

On 06/13/2016 11:58 AM, Jeff Furlong wrote:
> I'd like to use the latest git version so that I can pick up patch 356014ff351c6eb69339652650af5f6af72e5c22 to fix the ramp time issue.
>
> I tried the latest git version fio-2.11-21-g2b762, but got a new issue:
>
> [New LWP 120298]
> [New LWP 120281]
> [Thread debugging using libthread_db enabled] Using host libthread_db 
> library "/lib64/libthread_db.so.1".
> Core was generated by `fio --name=test_job --ioengine=libaio --direct=1 --rw=write --iodepth=32'.
> Program terminated with signal 11, Segmentation fault.
> #0  __add_log_sample (iolog=iolog@entry=0x7f8376222c30, val=2230, ddir=ddir@entry=DDIR_WRITE, bs=bs@entry=0, t=t@entry=1056000,
>      offset=offset@entry=0) at stat.c:2015
> 2015			s->val = val;
> (gdb) bt
> #0  __add_log_sample (iolog=iolog@entry=0x7f8376222c30, val=2230, ddir=ddir@entry=DDIR_WRITE, bs=bs@entry=0, t=t@entry=1056000,
>      offset=offset@entry=0) at stat.c:2015
> #1  0x0000000000422411 in __add_log_sample (offset=0, t=1056000, bs=0, ddir=DDIR_WRITE, val=<optimized out>, iolog=0x7f8376222c30)
>      at stat.c:2004
> #2  __add_stat_to_log (log_max=<optimized out>, elapsed=1056000, 
> ddir=DDIR_WRITE, iolog=0x7f8376222c30) at stat.c:2091
> #3  _add_stat_to_log (log_max=<optimized out>, elapsed=<optimized 
> out>, iolog=0x7f8376222c30) at stat.c:2103
> #4  add_log_sample (td=td@entry=0x7f8361f3d000, iolog=0x7f8376222c30, val=2230, ddir=ddir@entry=DDIR_WRITE, bs=1048576,
>      offset=offset@entry=0) at stat.c:2139
> #5  0x000000000042f45f in add_log_sample (offset=0, bs=<optimized out>, ddir=DDIR_WRITE, val=<optimized out>, iolog=<optimized out>,
>      td=0x7f8361f3d000) at stat.c:2357
> #6  add_iops_samples (t=0x7f835a4bae50, td=0x7f8361f3d000) at 
> stat.c:2362
> #7  calc_log_samples () at stat.c:2402
> #8  0x0000000000467508 in helper_thread_main (data=0x7f8376222d90) at 
> helper_thread.c:125
> #9  0x00007f8377e7cdf5 in start_thread () from /lib64/libpthread.so.0
> #10 0x00007f83779a61ad in clone () from /lib64/libc.so.6
>
> Maybe I should try the ramp time patch only on 2.11 release?

What did you run to cause this crash? Looks like only some of the arguments got shown.

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe fio" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html


* RE: fio signal 11
  2016-06-13 18:04             ` Jeff Furlong
@ 2016-06-13 19:21               ` Jeff Furlong
  2016-06-13 19:23                 ` Jens Axboe
  0 siblings, 1 reply; 28+ messages in thread
From: Jeff Furlong @ 2016-06-13 19:21 UTC (permalink / raw)
  To: Jeff Furlong, Jens Axboe, Sitsofe Wheeler; +Cc: fio, Jan Kara

I took the fio 2.11 release and applied patch 356014ff351c6eb69339652650af5f6af72e5c22, but it just produced a different error:

[New LWP 131047]
[New LWP 131030]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `fio --name=test_job --ioengine=libaio --direct=1 --rw=write --io'.
Program terminated with signal 11, Segmentation fault.
#0  __flist_del (next=0x0, prev=0x0) at flist.h:86
86		next->prev = prev;
(gdb) bt
#0  __flist_del (next=0x0, prev=0x0) at flist.h:86
#1  flist_del_init (entry=0xd609c0) at flist.h:109
#2  flush_log (log=<optimized out>, do_append=<optimized out>) at iolog.c:989
#3  0x0000000000457a08 in finish_log (td=td@entry=0x7f4f0fb1f000, log=0x7f4f23e04ad0, trylock=1) at iolog.c:1011
#4  0x0000000000457be5 in __write_log (try=<optimized out>, log=<optimized out>, td=0x7f4f0fb1f000) at iolog.c:1309
#5  write_bandw_log (td=0x7f4f0fb1f000, try=<optimized out>, unit_log=<optimized out>) at iolog.c:1377
#6  0x0000000000457efc in td_writeout_logs (unit_logs=false, td=0x7f4f0fb1f000) at iolog.c:1440
#7  fio_writeout_logs (unit_logs=unit_logs@entry=false) at iolog.c:1461
#8  0x0000000000466a67 in helper_thread_main (data=0x7f4f23e04d90) at helper_thread.c:135
#9  0x00007f4f25a5edf5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f4f255881ad in clone () from /lib64/libc.so.6

Regards,
Jeff


-----Original Message-----
From: fio-owner@vger.kernel.org [mailto:fio-owner@vger.kernel.org] On Behalf Of Jeff Furlong
Sent: Monday, June 13, 2016 11:05 AM
To: Jens Axboe <axboe@kernel.dk>; Sitsofe Wheeler <sitsofe@gmail.com>
Cc: fio@vger.kernel.org; Jan Kara <jack@suse.cz>
Subject: RE: fio signal 11

I used the same workload:

# fio --name=test_job --ioengine=libaio --direct=1 --rw=write --iodepth=32 --size=100% --numjobs=1 --bs=1m --filename=/dev/nvme0n1 --group_reporting --write_bw_log=test_job --write_iops_log=test_job --write_lat_log=test_job --log_avg_msec=1000 --disable_lat=0 --disable_clat=0 --disable_slat=0 --runtime=1500s --time_based --ramp_time=5s --output=test_job

For whatever reason, looks like gdb truncates the parameters.

Regards,
Jeff


-----Original Message-----
From: fio-owner@vger.kernel.org [mailto:fio-owner@vger.kernel.org] On Behalf Of Jens Axboe
Sent: Monday, June 13, 2016 11:02 AM
To: Jeff Furlong <jeff.furlong@hgst.com>; Sitsofe Wheeler <sitsofe@gmail.com>
Cc: fio@vger.kernel.org; Jan Kara <jack@suse.cz>
Subject: Re: fio signal 11

On 06/13/2016 11:58 AM, Jeff Furlong wrote:
> I'd like to use the latest git version so that I can pick up patch 356014ff351c6eb69339652650af5f6af72e5c22 to fix the ramp time issue.
>
> I tried the latest git version fio-2.11-21-g2b762, but got a new issue:
>
> [New LWP 120298]
> [New LWP 120281]
> [Thread debugging using libthread_db enabled] Using host libthread_db 
> library "/lib64/libthread_db.so.1".
> Core was generated by `fio --name=test_job --ioengine=libaio --direct=1 --rw=write --iodepth=32'.
> Program terminated with signal 11, Segmentation fault.
> #0  __add_log_sample (iolog=iolog@entry=0x7f8376222c30, val=2230, ddir=ddir@entry=DDIR_WRITE, bs=bs@entry=0, t=t@entry=1056000,
>      offset=offset@entry=0) at stat.c:2015
> 2015			s->val = val;
> (gdb) bt
> #0  __add_log_sample (iolog=iolog@entry=0x7f8376222c30, val=2230, ddir=ddir@entry=DDIR_WRITE, bs=bs@entry=0, t=t@entry=1056000,
>      offset=offset@entry=0) at stat.c:2015
> #1  0x0000000000422411 in __add_log_sample (offset=0, t=1056000, bs=0, ddir=DDIR_WRITE, val=<optimized out>, iolog=0x7f8376222c30)
>      at stat.c:2004
> #2  __add_stat_to_log (log_max=<optimized out>, elapsed=1056000, 
> ddir=DDIR_WRITE, iolog=0x7f8376222c30) at stat.c:2091
> #3  _add_stat_to_log (log_max=<optimized out>, elapsed=<optimized
> out>, iolog=0x7f8376222c30) at stat.c:2103
> #4  add_log_sample (td=td@entry=0x7f8361f3d000, iolog=0x7f8376222c30, val=2230, ddir=ddir@entry=DDIR_WRITE, bs=1048576,
>      offset=offset@entry=0) at stat.c:2139
> #5  0x000000000042f45f in add_log_sample (offset=0, bs=<optimized out>, ddir=DDIR_WRITE, val=<optimized out>, iolog=<optimized out>,
>      td=0x7f8361f3d000) at stat.c:2357
> #6  add_iops_samples (t=0x7f835a4bae50, td=0x7f8361f3d000) at
> stat.c:2362
> #7  calc_log_samples () at stat.c:2402
> #8  0x0000000000467508 in helper_thread_main (data=0x7f8376222d90) at
> helper_thread.c:125
> #9  0x00007f8377e7cdf5 in start_thread () from /lib64/libpthread.so.0
> #10 0x00007f83779a61ad in clone () from /lib64/libc.so.6
>
> Maybe I should try the ramp time patch only on 2.11 release?

What did you run to cause this crash? Looks like only some of the arguments got shown.

--
Jens Axboe



* Re: fio signal 11
  2016-06-13 19:21               ` Jeff Furlong
@ 2016-06-13 19:23                 ` Jens Axboe
  2016-06-13 19:34                   ` Jens Axboe
  0 siblings, 1 reply; 28+ messages in thread
From: Jens Axboe @ 2016-06-13 19:23 UTC (permalink / raw)
  To: Jeff Furlong, Sitsofe Wheeler; +Cc: fio, Jan Kara

On 06/13/2016 01:21 PM, Jeff Furlong wrote:
> I took the fio 2.11 release and applied patch 356014ff351c6eb69339652650af5f6af72e5c22 but it only created a different error:
>
> [New LWP 131047]
> [New LWP 131030]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> Core was generated by `fio --name=test_job --ioengine=libaio --direct=1 --rw=write --io'.
> Program terminated with signal 11, Segmentation fault.
> #0  __flist_del (next=0x0, prev=0x0) at flist.h:86
> 86		next->prev = prev;
> (gdb) bt
> #0  __flist_del (next=0x0, prev=0x0) at flist.h:86
> #1  flist_del_init (entry=0xd609c0) at flist.h:109
> #2  flush_log (log=<optimized out>, do_append=<optimized out>) at iolog.c:989
> #3  0x0000000000457a08 in finish_log (td=td@entry=0x7f4f0fb1f000, log=0x7f4f23e04ad0, trylock=1) at iolog.c:1011
> #4  0x0000000000457be5 in __write_log (try=<optimized out>, log=<optimized out>, td=0x7f4f0fb1f000) at iolog.c:1309
> #5  write_bandw_log (td=0x7f4f0fb1f000, try=<optimized out>, unit_log=<optimized out>) at iolog.c:1377
> #6  0x0000000000457efc in td_writeout_logs (unit_logs=false, td=0x7f4f0fb1f000) at iolog.c:1440
> #7  fio_writeout_logs (unit_logs=unit_logs@entry=false) at iolog.c:1461
> #8  0x0000000000466a67 in helper_thread_main (data=0x7f4f23e04d90) at helper_thread.c:135
> #9  0x00007f4f25a5edf5 in start_thread () from /lib64/libpthread.so.0
> #10 0x00007f4f255881ad in clone () from /lib64/libc.so.6

Hang on, I think I know what the issue is. Just now running a test to 
prove that the fix is correct, then I'll commit it and reply.

-- 
Jens Axboe




* Re: fio signal 11
  2016-06-13 19:23                 ` Jens Axboe
@ 2016-06-13 19:34                   ` Jens Axboe
  2016-06-13 20:55                     ` Jeff Furlong
  0 siblings, 1 reply; 28+ messages in thread
From: Jens Axboe @ 2016-06-13 19:34 UTC (permalink / raw)
  To: Jeff Furlong, Sitsofe Wheeler; +Cc: fio, Jan Kara

On 06/13/2016 01:23 PM, Jens Axboe wrote:
> On 06/13/2016 01:21 PM, Jeff Furlong wrote:
>> I took the fio 2.11 release and applied patch
>> 356014ff351c6eb69339652650af5f6af72e5c22 but it only created a
>> different error:
>>
>> [New LWP 131047]
>> [New LWP 131030]
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib64/libthread_db.so.1".
>> Core was generated by `fio --name=test_job --ioengine=libaio
>> --direct=1 --rw=write --io'.
>> Program terminated with signal 11, Segmentation fault.
>> #0  __flist_del (next=0x0, prev=0x0) at flist.h:86
>> 86        next->prev = prev;
>> (gdb) bt
>> #0  __flist_del (next=0x0, prev=0x0) at flist.h:86
>> #1  flist_del_init (entry=0xd609c0) at flist.h:109
>> #2  flush_log (log=<optimized out>, do_append=<optimized out>) at
>> iolog.c:989
>> #3  0x0000000000457a08 in finish_log (td=td@entry=0x7f4f0fb1f000,
>> log=0x7f4f23e04ad0, trylock=1) at iolog.c:1011
>> #4  0x0000000000457be5 in __write_log (try=<optimized out>,
>> log=<optimized out>, td=0x7f4f0fb1f000) at iolog.c:1309
>> #5  write_bandw_log (td=0x7f4f0fb1f000, try=<optimized out>,
>> unit_log=<optimized out>) at iolog.c:1377
>> #6  0x0000000000457efc in td_writeout_logs (unit_logs=false,
>> td=0x7f4f0fb1f000) at iolog.c:1440
>> #7  fio_writeout_logs (unit_logs=unit_logs@entry=false) at iolog.c:1461
>> #8  0x0000000000466a67 in helper_thread_main (data=0x7f4f23e04d90) at
>> helper_thread.c:135
>> #9  0x00007f4f25a5edf5 in start_thread () from /lib64/libpthread.so.0
>> #10 0x00007f4f255881ad in clone () from /lib64/libc.so.6
>
> Hang on, I think I know what the issue is. Just now running a test to
> prove that the fix is correct, then I'll commit it and reply.

Try current -git, I committed this:

http://git.kernel.dk/cgit/fio/commit/?id=1eb467fb0d2fc2ab41f12d9537184fc6ae95b48a

Hopefully that'll be the last logging bug, after the logging code got
rewritten...

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: fio signal 11
  2016-06-13 19:34                   ` Jens Axboe
@ 2016-06-13 20:55                     ` Jeff Furlong
  2016-06-13 21:23                       ` Jens Axboe
  0 siblings, 1 reply; 28+ messages in thread
From: Jeff Furlong @ 2016-06-13 20:55 UTC (permalink / raw)
  To: Jens Axboe, Sitsofe Wheeler; +Cc: fio, Jan Kara

Thanks, the latest git with fio version fio-2.11-22-g1eb4 fixes my workload!

Regards,
Jeff


-----Original Message-----
From: Jens Axboe [mailto:axboe@kernel.dk] 
Sent: Monday, June 13, 2016 12:34 PM
To: Jeff Furlong <jeff.furlong@hgst.com>; Sitsofe Wheeler <sitsofe@gmail.com>
Cc: fio@vger.kernel.org; Jan Kara <jack@suse.cz>
Subject: Re: fio signal 11

On 06/13/2016 01:23 PM, Jens Axboe wrote:
> On 06/13/2016 01:21 PM, Jeff Furlong wrote:
>> I took the fio 2.11 release and applied patch
>> 356014ff351c6eb69339652650af5f6af72e5c22 but it only created a 
>> different error:
>>
>> [New LWP 131047]
>> [New LWP 131030]
>> [Thread debugging using libthread_db enabled] Using host libthread_db 
>> library "/lib64/libthread_db.so.1".
>> Core was generated by `fio --name=test_job --ioengine=libaio
>> --direct=1 --rw=write --io'.
>> Program terminated with signal 11, Segmentation fault.
>> #0  __flist_del (next=0x0, prev=0x0) at flist.h:86
>> 86        next->prev = prev;
>> (gdb) bt
>> #0  __flist_del (next=0x0, prev=0x0) at flist.h:86
>> #1  flist_del_init (entry=0xd609c0) at flist.h:109
>> #2  flush_log (log=<optimized out>, do_append=<optimized out>) at
>> iolog.c:989
>> #3  0x0000000000457a08 in finish_log (td=td@entry=0x7f4f0fb1f000, 
>> log=0x7f4f23e04ad0, trylock=1) at iolog.c:1011
>> #4  0x0000000000457be5 in __write_log (try=<optimized out>, 
>> log=<optimized out>, td=0x7f4f0fb1f000) at iolog.c:1309
>> #5  write_bandw_log (td=0x7f4f0fb1f000, try=<optimized out>, 
>> unit_log=<optimized out>) at iolog.c:1377
>> #6  0x0000000000457efc in td_writeout_logs (unit_logs=false,
>> td=0x7f4f0fb1f000) at iolog.c:1440
>> #7  fio_writeout_logs (unit_logs=unit_logs@entry=false) at 
>> iolog.c:1461
>> #8  0x0000000000466a67 in helper_thread_main (data=0x7f4f23e04d90) at
>> helper_thread.c:135
>> #9  0x00007f4f25a5edf5 in start_thread () from /lib64/libpthread.so.0
>> #10 0x00007f4f255881ad in clone () from /lib64/libc.so.6
>
> Hang on, I think I know what the issue is. Just now running a test to 
> prove that the fix is correct, then I'll commit it and reply.

Try current -git, I committed this:

http://git.kernel.dk/cgit/fio/commit/?id=1eb467fb0d2fc2ab41f12d9537184fc6ae95b48a

Hopefully that'll be the last logging bug, after it got rewritten...

--
Jens Axboe


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: fio signal 11
  2016-06-13 20:55                     ` Jeff Furlong
@ 2016-06-13 21:23                       ` Jens Axboe
  0 siblings, 0 replies; 28+ messages in thread
From: Jens Axboe @ 2016-06-13 21:23 UTC (permalink / raw)
  To: Jeff Furlong, Sitsofe Wheeler; +Cc: fio, Jan Kara

On 06/13/2016 02:55 PM, Jeff Furlong wrote:
> Thanks, the latest git with fio version fio-2.11-22-g1eb4 fixes my workload!

Great! Thanks for testing and reporting. I'll be spinning a new release
later today, given that 2.11 is also broken.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: fio signal 11
  2016-06-12  3:30       ` Jens Axboe
  2016-06-13 17:58         ` Jeff Furlong
@ 2016-06-15 14:45         ` Jan Kara
  2016-06-16  7:06           ` Jens Axboe
  1 sibling, 1 reply; 28+ messages in thread
From: Jan Kara @ 2016-06-15 14:45 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Jeff Furlong, Sitsofe Wheeler, fio, Jan Kara

On Sat 11-06-16 21:30:00, Jens Axboe wrote:
> On 06/11/2016 08:56 PM, Jens Axboe wrote:
> >On 06/10/2016 12:42 PM, Jeff Furlong wrote:
> >>Good point.  Here is the trace:
> >>
> >>[New LWP 59231]
> >>[Thread debugging using libthread_db enabled]
> >>Using host libthread_db library "/lib64/libthread_db.so.1".
> >>Core was generated by `fio --name=test_job --ioengine=libaio
> >>--direct=1 --rw=write --iodepth=32'.
> >>Program terminated with signal 11, Segmentation fault.
> >>#0  0x0000000000421e39 in regrow_log (iolog=0x7f828c0c5ad0) at
> >>stat.c:1909
> >>1909        if (!cur_log) {
> >>
> >>(gdb) bt
> >>#0  0x0000000000421e39 in regrow_log (iolog=0x7f828c0c5ad0) at
> >>stat.c:1909
> >>#1  0x000000000042d4df in regrow_logs (td=td@entry=0x7f8277de0000) at
> >>stat.c:1965
> >>#2  0x000000000040ca90 in wait_for_completions
> >>(td=td@entry=0x7f8277de0000, time=time@entry=0x7fffcfb6b300) at
> >>backend.c:446
> >>#3  0x000000000045ade7 in do_io (bytes_done=<synthetic pointer>,
> >>td=0x7f8277de0000) at backend.c:991
> >>#4  thread_main (data=data@entry=0x264d450) at backend.c:1667
> >>#5  0x000000000045cfec in run_threads (sk_out=sk_out@entry=0x0) at
> >>backend.c:2217
> >>#6  0x000000000045d2cd in fio_backend (sk_out=sk_out@entry=0x0) at
> >>backend.c:2349
> >>#7  0x000000000040d09c in main (argc=22, argv=0x7fffcfb6f638,
> >>envp=<optimized out>) at fio.c:63
> >
> >That looks odd, thanks for reporting this. I'll see if I can get to this
> >on Monday, if not, it'll have to wait until after my vacation... So
> >while I appreciate people running -git and finding issues like these
> >before they show up in a release, might be best to revert back to 2.2.11
> >until I can get this debugged.
> 
> I take that back - continue using -git! Just pull a fresh copy, should
> be fixed now.
> 
> Jan, the reporter is right, 2.11 works and -git does not. So I just ran
> a quick bisect, changing the logging from every second to every 100ms to
> make it reproduce faster. I don't have time to look into why yet, so I
> just reverted the commit.
> 
> commit d7982dd0ab2a1a315b5f9859c67a02414ce6274f
> Author: Jan Kara <jack@suse.cz>
> Date:   Tue May 24 17:03:21 2016 +0200
> 
>     fio: Simplify forking of processes

Hum, I've tried reproducing this but failed (I've tried using /dev/ram0 and
/dev/sda4 as devices for fio). Is it somehow dependent on the
device fio works with? I have used commit
54d0a3150d44adca3ee4047fabd85651c6ea2db1 (just before you reverted my
patch) for testing.
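
A reproduction recipe along the lines described above (check out the
pre-revert commit, then run the reporter's job with log_avg_msec
lowered from 1000 to 100 ms, as Jens did for the bisect) might look
like this; the clone URL and device path are assumptions and will need
adjusting, and note the job writes to the named device:

```sh
git clone https://git.kernel.dk/fio.git && cd fio
git checkout 54d0a3150d44adca3ee4047fabd85651c6ea2db1
make -j"$(nproc)"
./fio --name=test_job --ioengine=libaio --direct=1 --rw=write \
      --iodepth=32 --size=100% --numjobs=1 --bs=1m \
      --filename=/dev/nvme0n1 --group_reporting \
      --write_bw_log=test_job --write_iops_log=test_job \
      --write_lat_log=test_job --log_avg_msec=100 \
      --runtime=1500s --time_based --ramp_time=5s --output=test_job
```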

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: fio signal 11
  2016-06-15 14:45         ` Jan Kara
@ 2016-06-16  7:06           ` Jens Axboe
  2016-07-20  5:08             ` Jan Kara
  0 siblings, 1 reply; 28+ messages in thread
From: Jens Axboe @ 2016-06-16  7:06 UTC (permalink / raw)
  To: Jan Kara; +Cc: Jeff Furlong, Sitsofe Wheeler, fio

On 06/15/2016 04:45 PM, Jan Kara wrote:
> On Sat 11-06-16 21:30:00, Jens Axboe wrote:
>> On 06/11/2016 08:56 PM, Jens Axboe wrote:
>>> On 06/10/2016 12:42 PM, Jeff Furlong wrote:
>>>> Good point.  Here is the trace:
>>>>
>>>> [New LWP 59231]
>>>> [Thread debugging using libthread_db enabled]
>>>> Using host libthread_db library "/lib64/libthread_db.so.1".
>>>> Core was generated by `fio --name=test_job --ioengine=libaio
>>>> --direct=1 --rw=write --iodepth=32'.
>>>> Program terminated with signal 11, Segmentation fault.
>>>> #0  0x0000000000421e39 in regrow_log (iolog=0x7f828c0c5ad0) at
>>>> stat.c:1909
>>>> 1909        if (!cur_log) {
>>>>
>>>> (gdb) bt
>>>> #0  0x0000000000421e39 in regrow_log (iolog=0x7f828c0c5ad0) at
>>>> stat.c:1909
>>>> #1  0x000000000042d4df in regrow_logs (td=td@entry=0x7f8277de0000) at
>>>> stat.c:1965
>>>> #2  0x000000000040ca90 in wait_for_completions
>>>> (td=td@entry=0x7f8277de0000, time=time@entry=0x7fffcfb6b300) at
>>>> backend.c:446
>>>> #3  0x000000000045ade7 in do_io (bytes_done=<synthetic pointer>,
>>>> td=0x7f8277de0000) at backend.c:991
>>>> #4  thread_main (data=data@entry=0x264d450) at backend.c:1667
>>>> #5  0x000000000045cfec in run_threads (sk_out=sk_out@entry=0x0) at
>>>> backend.c:2217
>>>> #6  0x000000000045d2cd in fio_backend (sk_out=sk_out@entry=0x0) at
>>>> backend.c:2349
>>>> #7  0x000000000040d09c in main (argc=22, argv=0x7fffcfb6f638,
>>>> envp=<optimized out>) at fio.c:63
>>>
>>> That looks odd, thanks for reporting this. I'll see if I can get to this
>>> on Monday, if not, it'll have to wait until after my vacation... So
>>> while I appreciate people running -git and finding issues like these
>>> before they show up in a release, might be best to revert back to 2.2.11
>>> until I can get this debugged.
>>
>> I take that back - continue using -git! Just pull a fresh copy, should
>> be fixed now.
>>
>> Jan, the reporter is right, 2.11 works and -git does not. So I just ran
>> a quick bisect, changing the logging from every second to every 100ms to
>> make it reproduce faster. I don't have time to look into why yet, so I
>> just reverted the commit.
>>
>> commit d7982dd0ab2a1a315b5f9859c67a02414ce6274f
>> Author: Jan Kara <jack@suse.cz>
>> Date:   Tue May 24 17:03:21 2016 +0200
>>
>>      fio: Simplify forking of processes
>
> Hum, I've tried reproducing this but failed (I've tried using /dev/ram0 and
> /dev/sda4 as devices for fio). Is it somehow dependent on the
> device fio works with? I have used commit
> 54d0a3150d44adca3ee4047fabd85651c6ea2db1 (just before you reverted my
> patch) for testing.

On vacation right now, I'll check when I get back. It is possible that 
it was just a fluke, since there was another bug there related to shared 
memory, but it was predictably crashing at the same time for the bisect.

It doesn't make a lot of sense, however.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: fio signal 11
  2016-06-16  7:06           ` Jens Axboe
@ 2016-07-20  5:08             ` Jan Kara
  2016-07-25 15:21               ` Jens Axboe
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2016-07-20  5:08 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Jan Kara, Jeff Furlong, Sitsofe Wheeler, fio

On Thu 16-06-16 09:06:51, Jens Axboe wrote:
> On 06/15/2016 04:45 PM, Jan Kara wrote:
> >On Sat 11-06-16 21:30:00, Jens Axboe wrote:
> >>On 06/11/2016 08:56 PM, Jens Axboe wrote:
> >>>On 06/10/2016 12:42 PM, Jeff Furlong wrote:
> >>>>Good point.  Here is the trace:
> >>>>
> >>>>[New LWP 59231]
> >>>>[Thread debugging using libthread_db enabled]
> >>>>Using host libthread_db library "/lib64/libthread_db.so.1".
> >>>>Core was generated by `fio --name=test_job --ioengine=libaio
> >>>>--direct=1 --rw=write --iodepth=32'.
> >>>>Program terminated with signal 11, Segmentation fault.
> >>>>#0  0x0000000000421e39 in regrow_log (iolog=0x7f828c0c5ad0) at
> >>>>stat.c:1909
> >>>>1909        if (!cur_log) {
> >>>>
> >>>>(gdb) bt
> >>>>#0  0x0000000000421e39 in regrow_log (iolog=0x7f828c0c5ad0) at
> >>>>stat.c:1909
> >>>>#1  0x000000000042d4df in regrow_logs (td=td@entry=0x7f8277de0000) at
> >>>>stat.c:1965
> >>>>#2  0x000000000040ca90 in wait_for_completions
> >>>>(td=td@entry=0x7f8277de0000, time=time@entry=0x7fffcfb6b300) at
> >>>>backend.c:446
> >>>>#3  0x000000000045ade7 in do_io (bytes_done=<synthetic pointer>,
> >>>>td=0x7f8277de0000) at backend.c:991
> >>>>#4  thread_main (data=data@entry=0x264d450) at backend.c:1667
> >>>>#5  0x000000000045cfec in run_threads (sk_out=sk_out@entry=0x0) at
> >>>>backend.c:2217
> >>>>#6  0x000000000045d2cd in fio_backend (sk_out=sk_out@entry=0x0) at
> >>>>backend.c:2349
> >>>>#7  0x000000000040d09c in main (argc=22, argv=0x7fffcfb6f638,
> >>>>envp=<optimized out>) at fio.c:63
> >>>
> >>>That looks odd, thanks for reporting this. I'll see if I can get to this
> >>>on Monday, if not, it'll have to wait until after my vacation... So
> >>>while I appreciate people running -git and finding issues like these
> >>>before they show up in a release, might be best to revert back to 2.2.11
> >>>until I can get this debugged.
> >>
> >>I take that back - continue using -git! Just pull a fresh copy, should
> >>be fixed now.
> >>
> >>Jan, the reporter is right, 2.11 works and -git does not. So I just ran
> >>a quick bisect, changing the logging from every second to every 100ms to
> >>make it reproduce faster. I don't have time to look into why yet, so I
> >>just reverted the commit.
> >>
> >>commit d7982dd0ab2a1a315b5f9859c67a02414ce6274f
> >>Author: Jan Kara <jack@suse.cz>
> >>Date:   Tue May 24 17:03:21 2016 +0200
> >>
> >>     fio: Simplify forking of processes
> >
> >Hum, I've tried reproducing this but failed (I've tried using /dev/ram0 and
> >/dev/sda4 as devices for fio). Is it somehow dependent on the
> >device fio works with? I have used commit
> >54d0a3150d44adca3ee4047fabd85651c6ea2db1 (just before you reverted my
> >patch) for testing.
> 
> On vacation right now, I'll check when I get back. It is possible that it
> was just a fluke, since there was another bug there related to shared
> memory, but it was predictably crashing at the same time for the bisect.
> 
> It doesn't make a lot of sense, however.

Did you have a chance to look into this?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: fio signal 11
  2016-07-20  5:08             ` Jan Kara
@ 2016-07-25 15:21               ` Jens Axboe
  2016-07-26  8:43                 ` Jan Kara
  0 siblings, 1 reply; 28+ messages in thread
From: Jens Axboe @ 2016-07-25 15:21 UTC (permalink / raw)
  To: Jan Kara; +Cc: Jeff Furlong, Sitsofe Wheeler, fio

On 07/19/2016 11:08 PM, Jan Kara wrote:
> On Thu 16-06-16 09:06:51, Jens Axboe wrote:
>> On 06/15/2016 04:45 PM, Jan Kara wrote:
>>> On Sat 11-06-16 21:30:00, Jens Axboe wrote:
>>>> On 06/11/2016 08:56 PM, Jens Axboe wrote:
>>>>> On 06/10/2016 12:42 PM, Jeff Furlong wrote:
>>>>>> Good point.  Here is the trace:
>>>>>>
>>>>>> [New LWP 59231]
>>>>>> [Thread debugging using libthread_db enabled]
>>>>>> Using host libthread_db library "/lib64/libthread_db.so.1".
>>>>>> Core was generated by `fio --name=test_job --ioengine=libaio
>>>>>> --direct=1 --rw=write --iodepth=32'.
>>>>>> Program terminated with signal 11, Segmentation fault.
>>>>>> #0  0x0000000000421e39 in regrow_log (iolog=0x7f828c0c5ad0) at
>>>>>> stat.c:1909
>>>>>> 1909        if (!cur_log) {
>>>>>>
>>>>>> (gdb) bt
>>>>>> #0  0x0000000000421e39 in regrow_log (iolog=0x7f828c0c5ad0) at
>>>>>> stat.c:1909
>>>>>> #1  0x000000000042d4df in regrow_logs (td=td@entry=0x7f8277de0000) at
>>>>>> stat.c:1965
>>>>>> #2  0x000000000040ca90 in wait_for_completions
>>>>>> (td=td@entry=0x7f8277de0000, time=time@entry=0x7fffcfb6b300) at
>>>>>> backend.c:446
>>>>>> #3  0x000000000045ade7 in do_io (bytes_done=<synthetic pointer>,
>>>>>> td=0x7f8277de0000) at backend.c:991
>>>>>> #4  thread_main (data=data@entry=0x264d450) at backend.c:1667
>>>>>> #5  0x000000000045cfec in run_threads (sk_out=sk_out@entry=0x0) at
>>>>>> backend.c:2217
>>>>>> #6  0x000000000045d2cd in fio_backend (sk_out=sk_out@entry=0x0) at
>>>>>> backend.c:2349
>>>>>> #7  0x000000000040d09c in main (argc=22, argv=0x7fffcfb6f638,
>>>>>> envp=<optimized out>) at fio.c:63
>>>>>
>>>>> That looks odd, thanks for reporting this. I'll see if I can get to this
>>>>> on Monday, if not, it'll have to wait until after my vacation... So
>>>>> while I appreciate people running -git and finding issues like these
>>>>> before they show up in a release, might be best to revert back to 2.2.11
>>>>> until I can get this debugged.
>>>>
>>>> I take that back - continue using -git! Just pull a fresh copy, should
>>>> be fixed now.
>>>>
>>>> Jan, the reporter is right, 2.11 works and -git does not. So I just ran
>>>> a quick bisect, changing the logging from every second to every 100ms to
>>>> make it reproduce faster. I don't have time to look into why yet, so I
>>>> just reverted the commit.
>>>>
>>>> commit d7982dd0ab2a1a315b5f9859c67a02414ce6274f
>>>> Author: Jan Kara <jack@suse.cz>
>>>> Date:   Tue May 24 17:03:21 2016 +0200
>>>>
>>>>     fio: Simplify forking of processes
>>>
>>> Hum, I've tried reproducing this but failed (I've tried using /dev/ram0 and
>>> /dev/sda4 as devices for fio). Is it somehow dependent on the
>>> device fio works with? I have used commit
>>> 54d0a3150d44adca3ee4047fabd85651c6ea2db1 (just before you reverted my
>>> patch) for testing.
>>
>> On vacation right now, I'll check when I get back. It is possible that it
>> was just a fluke, since there was another bug there related to shared
>> memory, but it was predictably crashing at the same time for the bisect.
>>
>> It doesn't make a lot of sense, however.
>
> Did you have a chance to look into this?

I have not, unfortunately, but I suspect the original patch was fine,
and that the later fix, allocating cur_log out of the shared pool, was
the real fix.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: fio signal 11
  2016-07-25 15:21               ` Jens Axboe
@ 2016-07-26  8:43                 ` Jan Kara
  2016-07-26 14:17                   ` Jens Axboe
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2016-07-26  8:43 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Jan Kara, Jeff Furlong, Sitsofe Wheeler, fio

On Mon 25-07-16 09:21:00, Jens Axboe wrote:
> On 07/19/2016 11:08 PM, Jan Kara wrote:
> >On Thu 16-06-16 09:06:51, Jens Axboe wrote:
> >>On 06/15/2016 04:45 PM, Jan Kara wrote:
> >>>On Sat 11-06-16 21:30:00, Jens Axboe wrote:
> >>>>On 06/11/2016 08:56 PM, Jens Axboe wrote:
> >>>>>On 06/10/2016 12:42 PM, Jeff Furlong wrote:
> >>>>>>Good point.  Here is the trace:
> >>>>>>
> >>>>>>[New LWP 59231]
> >>>>>>[Thread debugging using libthread_db enabled]
> >>>>>>Using host libthread_db library "/lib64/libthread_db.so.1".
> >>>>>>Core was generated by `fio --name=test_job --ioengine=libaio
> >>>>>>--direct=1 --rw=write --iodepth=32'.
> >>>>>>Program terminated with signal 11, Segmentation fault.
> >>>>>>#0  0x0000000000421e39 in regrow_log (iolog=0x7f828c0c5ad0) at
> >>>>>>stat.c:1909
> >>>>>>1909        if (!cur_log) {
> >>>>>>
> >>>>>>(gdb) bt
> >>>>>>#0  0x0000000000421e39 in regrow_log (iolog=0x7f828c0c5ad0) at
> >>>>>>stat.c:1909
> >>>>>>#1  0x000000000042d4df in regrow_logs (td=td@entry=0x7f8277de0000) at
> >>>>>>stat.c:1965
> >>>>>>#2  0x000000000040ca90 in wait_for_completions
> >>>>>>(td=td@entry=0x7f8277de0000, time=time@entry=0x7fffcfb6b300) at
> >>>>>>backend.c:446
> >>>>>>#3  0x000000000045ade7 in do_io (bytes_done=<synthetic pointer>,
> >>>>>>td=0x7f8277de0000) at backend.c:991
> >>>>>>#4  thread_main (data=data@entry=0x264d450) at backend.c:1667
> >>>>>>#5  0x000000000045cfec in run_threads (sk_out=sk_out@entry=0x0) at
> >>>>>>backend.c:2217
> >>>>>>#6  0x000000000045d2cd in fio_backend (sk_out=sk_out@entry=0x0) at
> >>>>>>backend.c:2349
> >>>>>>#7  0x000000000040d09c in main (argc=22, argv=0x7fffcfb6f638,
> >>>>>>envp=<optimized out>) at fio.c:63
> >>>>>
> >>>>>That looks odd, thanks for reporting this. I'll see if I can get to this
> >>>>>on Monday, if not, it'll have to wait until after my vacation... So
> >>>>>while I appreciate people running -git and finding issues like these
> >>>>>before they show up in a release, might be best to revert back to 2.2.11
> >>>>>until I can get this debugged.
> >>>>
> >>>>I take that back - continue using -git! Just pull a fresh copy, should
> >>>>be fixed now.
> >>>>
> >>>>Jan, the reporter is right, 2.11 works and -git does not. So I just ran
> >>>>a quick bisect, changing the logging from every second to every 100ms to
> >>>>make it reproduce faster. I don't have time to look into why yet, so I
> >>>>just reverted the commit.
> >>>>
> >>>>commit d7982dd0ab2a1a315b5f9859c67a02414ce6274f
> >>>>Author: Jan Kara <jack@suse.cz>
> >>>>Date:   Tue May 24 17:03:21 2016 +0200
> >>>>
> >>>>    fio: Simplify forking of processes
> >>>
> >>>Hum, I've tried reproducing this but failed (I've tried using /dev/ram0 and
> >>>/dev/sda4 as devices for fio). Is it somehow dependent on the
> >>>device fio works with? I have used commit
> >>>54d0a3150d44adca3ee4047fabd85651c6ea2db1 (just before you reverted my
> >>>patch) for testing.
> >>
> >>On vacation right now, I'll check when I get back. It is possible that it
> >>was just a fluke, since there was another bug there related to shared
> >>memory, but it was predictably crashing at the same time for the bisect.
> >>
> >>It doesn't make a lot of sense, however.
> >
> >Did you have a chance to look into this?
> 
> I have not, unfortunately, but I'm suspecting the patch is fine and the
> later fix to allocate the cur_log out of the shared pool was the real
> fix and that the original patch was fine.

That's what I'd suspect as well, but I'm not able to reproduce even the
original crash, so I cannot verify this theory... What's the plan going
forward? Will you re-apply the patch? Frankly, I don't care much, it
was just a small cleanup; I'm just curious whether it was really that
other bug or whether I missed something.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: fio signal 11
  2016-07-26  8:43                 ` Jan Kara
@ 2016-07-26 14:17                   ` Jens Axboe
  2016-07-26 18:33                     ` Jeff Furlong
  0 siblings, 1 reply; 28+ messages in thread
From: Jens Axboe @ 2016-07-26 14:17 UTC (permalink / raw)
  To: Jan Kara; +Cc: Jeff Furlong, Sitsofe Wheeler, fio

On 07/26/2016 02:43 AM, Jan Kara wrote:
> On Mon 25-07-16 09:21:00, Jens Axboe wrote:
>> On 07/19/2016 11:08 PM, Jan Kara wrote:
>>> On Thu 16-06-16 09:06:51, Jens Axboe wrote:
>>>> On 06/15/2016 04:45 PM, Jan Kara wrote:
>>>>> On Sat 11-06-16 21:30:00, Jens Axboe wrote:
>>>>>> On 06/11/2016 08:56 PM, Jens Axboe wrote:
>>>>>>> On 06/10/2016 12:42 PM, Jeff Furlong wrote:
>>>>>>>> Good point.  Here is the trace:
>>>>>>>>
>>>>>>>> [New LWP 59231]
>>>>>>>> [Thread debugging using libthread_db enabled]
>>>>>>>> Using host libthread_db library "/lib64/libthread_db.so.1".
>>>>>>>> Core was generated by `fio --name=test_job --ioengine=libaio
>>>>>>>> --direct=1 --rw=write --iodepth=32'.
>>>>>>>> Program terminated with signal 11, Segmentation fault.
>>>>>>>> #0  0x0000000000421e39 in regrow_log (iolog=0x7f828c0c5ad0) at
>>>>>>>> stat.c:1909
>>>>>>>> 1909        if (!cur_log) {
>>>>>>>>
>>>>>>>> (gdb) bt
>>>>>>>> #0  0x0000000000421e39 in regrow_log (iolog=0x7f828c0c5ad0) at
>>>>>>>> stat.c:1909
>>>>>>>> #1  0x000000000042d4df in regrow_logs (td=td@entry=0x7f8277de0000) at
>>>>>>>> stat.c:1965
>>>>>>>> #2  0x000000000040ca90 in wait_for_completions
>>>>>>>> (td=td@entry=0x7f8277de0000, time=time@entry=0x7fffcfb6b300) at
>>>>>>>> backend.c:446
>>>>>>>> #3  0x000000000045ade7 in do_io (bytes_done=<synthetic pointer>,
>>>>>>>> td=0x7f8277de0000) at backend.c:991
>>>>>>>> #4  thread_main (data=data@entry=0x264d450) at backend.c:1667
>>>>>>>> #5  0x000000000045cfec in run_threads (sk_out=sk_out@entry=0x0) at
>>>>>>>> backend.c:2217
>>>>>>>> #6  0x000000000045d2cd in fio_backend (sk_out=sk_out@entry=0x0) at
>>>>>>>> backend.c:2349
>>>>>>>> #7  0x000000000040d09c in main (argc=22, argv=0x7fffcfb6f638,
>>>>>>>> envp=<optimized out>) at fio.c:63
>>>>>>>
>>>>>>> That looks odd, thanks for reporting this. I'll see if I can get to this
>>>>>>> on Monday, if not, it'll have to wait until after my vacation... So
>>>>>>> while I appreciate people running -git and finding issues like these
>>>>>>> before they show up in a release, might be best to revert back to 2.2.11
>>>>>>> until I can get this debugged.
>>>>>>
>>>>>> I take that back - continue using -git! Just pull a fresh copy, should
>>>>>> be fixed now.
>>>>>>
>>>>>> Jan, the reporter is right, 2.11 works and -git does not. So I just ran
>>>>>> a quick bisect, changing the logging from every second to every 100ms to
>>>>>> make it reproduce faster. I don't have time to look into why yet, so I
>>>>>> just reverted the commit.
>>>>>>
>>>>>> commit d7982dd0ab2a1a315b5f9859c67a02414ce6274f
>>>>>> Author: Jan Kara <jack@suse.cz>
>>>>>> Date:   Tue May 24 17:03:21 2016 +0200
>>>>>>
>>>>>>    fio: Simplify forking of processes
>>>>>
>>>>> Hum, I've tried reproducing this but failed (I've tried using /dev/ram0 and
>>>>> /dev/sda4 as devices for fio). Is it somehow dependent on the
>>>>> device fio works with? I have used commit
>>>>> 54d0a3150d44adca3ee4047fabd85651c6ea2db1 (just before you reverted my
>>>>> patch) for testing.
>>>>
>>>> On vacation right now, I'll check when I get back. It is possible that it
>>>> was just a fluke, since there was another bug there related to shared
>>>> memory, but it was predictably crashing at the same time for the bisect.
>>>>
>>>> It doesn't make a lot of sense, however.
>>>
>>> Did you have a chance to look into this?
>>
>> I have not, unfortunately, but I'm suspecting the patch is fine and the
>> later fix to allocate the cur_log out of the shared pool was the real
>> fix and that the original patch was fine.
>
> So that's what I'd suspect as well but I'm not able to reproduce even the
> original crash so I cannot verify this theory... What's the plan going
> forward? Will you re-apply the patch? Frankly, I don't care much, it was
> just a small cleanup. I'm just curious whether it was really that other bug
> or whether I miss something.

Yes, I think re-applying would be the best way forward, especially
since 2.13 was just released, so we'll have a while to iron out any
issues. But I really don't see how it could be the reason for the
issue; I'm guessing it just exacerbated it somehow.

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: fio signal 11
  2016-07-26 14:17                   ` Jens Axboe
@ 2016-07-26 18:33                     ` Jeff Furlong
  2016-07-26 18:35                       ` Jens Axboe
  0 siblings, 1 reply; 28+ messages in thread
From: Jeff Furlong @ 2016-07-26 18:33 UTC (permalink / raw)
  To: Jens Axboe, Jan Kara; +Cc: Sitsofe Wheeler, fio

FYI, with the patch back in version fio-2.13-1-gce8b, I re-ran my prior workload that caused the signal 11.  The workload now completes without issue.

Regards,
Jeff


-----Original Message-----
From: fio-owner@vger.kernel.org [mailto:fio-owner@vger.kernel.org] On Behalf Of Jens Axboe
Sent: Tuesday, July 26, 2016 7:17 AM
To: Jan Kara <jack@suse.cz>
Cc: Jeff Furlong <jeff.furlong@hgst.com>; Sitsofe Wheeler <sitsofe@gmail.com>; fio@vger.kernel.org
Subject: Re: fio signal 11

On 07/26/2016 02:43 AM, Jan Kara wrote:
> On Mon 25-07-16 09:21:00, Jens Axboe wrote:
>> On 07/19/2016 11:08 PM, Jan Kara wrote:
>>> On Thu 16-06-16 09:06:51, Jens Axboe wrote:
>>>> On 06/15/2016 04:45 PM, Jan Kara wrote:
>>>>> On Sat 11-06-16 21:30:00, Jens Axboe wrote:
>>>>>> On 06/11/2016 08:56 PM, Jens Axboe wrote:
>>>>>>> On 06/10/2016 12:42 PM, Jeff Furlong wrote:
>>>>>>>> Good point.  Here is the trace:
>>>>>>>>
>>>>>>>> [New LWP 59231]
>>>>>>>> [Thread debugging using libthread_db enabled] Using host 
>>>>>>>> libthread_db library "/lib64/libthread_db.so.1".
>>>>>>>> Core was generated by `fio --name=test_job --ioengine=libaio
>>>>>>>> --direct=1 --rw=write --iodepth=32'.
>>>>>>>> Program terminated with signal 11, Segmentation fault.
>>>>>>>> #0  0x0000000000421e39 in regrow_log (iolog=0x7f828c0c5ad0) at
>>>>>>>> stat.c:1909
>>>>>>>> 1909        if (!cur_log) {
>>>>>>>>
>>>>>>>> (gdb) bt
>>>>>>>> #0  0x0000000000421e39 in regrow_log (iolog=0x7f828c0c5ad0) at
>>>>>>>> stat.c:1909
>>>>>>>> #1  0x000000000042d4df in regrow_logs 
>>>>>>>> (td=td@entry=0x7f8277de0000) at
>>>>>>>> stat.c:1965
>>>>>>>> #2  0x000000000040ca90 in wait_for_completions 
>>>>>>>> (td=td@entry=0x7f8277de0000, time=time@entry=0x7fffcfb6b300) at
>>>>>>>> backend.c:446
>>>>>>>> #3  0x000000000045ade7 in do_io (bytes_done=<synthetic 
>>>>>>>> pointer>,
>>>>>>>> td=0x7f8277de0000) at backend.c:991
>>>>>>>> #4  thread_main (data=data@entry=0x264d450) at backend.c:1667
>>>>>>>> #5  0x000000000045cfec in run_threads (sk_out=sk_out@entry=0x0) 
>>>>>>>> at
>>>>>>>> backend.c:2217
>>>>>>>> #6  0x000000000045d2cd in fio_backend (sk_out=sk_out@entry=0x0) 
>>>>>>>> at
>>>>>>>> backend.c:2349
>>>>>>>> #7  0x000000000040d09c in main (argc=22, argv=0x7fffcfb6f638, 
>>>>>>>> envp=<optimized out>) at fio.c:63
>>>>>>>
>>>>>>> That looks odd, thanks for reporting this. I'll see if I can get 
>>>>>>> to this on Monday, if not, it'll have to wait until after my 
>>>>>>> vacation... So while I appreciate people running -git and 
>>>>>>> finding issues like these before they show up in a release, 
>>>>>>> might be best to revert back to 2.2.11 until I can get this debugged.
>>>>>>
>>>>>> I take that back - continue using -git! Just pull a fresh copy, 
>>>>>> should be fixed now.
>>>>>>
>>>>>> Jan, the reporter is right, 2.11 works and -git does not. So I 
>>>>>> just ran a quick bisect, changing the logging from every second 
>>>>>> to every 100ms to make it reproduce faster. I don't have time to 
>>>>>> look into why yet, so I just reverted the commit.
>>>>>>
>>>>>> commit d7982dd0ab2a1a315b5f9859c67a02414ce6274f
>>>>>> Author: Jan Kara <jack@suse.cz>
>>>>>> Date:   Tue May 24 17:03:21 2016 +0200
>>>>>>
>>>>>>    fio: Simplify forking of processes
>>>>>
>>>>> Hum, I've tried reproducing this but failed (I've tried using 
>>>>> /dev/ram0 and
>>>>> /dev/sda4 as devices for fio). Is it somehow dependent on the 
>>>>> device fio works with? I have used commit
>>>>> 54d0a3150d44adca3ee4047fabd85651c6ea2db1 (just before you reverted 
>>>>> my
>>>>> patch) for testing.
>>>>
>>>> On vacation right now, I'll check when I get back. It is possible 
>>>> that it was just a fluke, since there was another bug there related 
>>>> to shared memory, but it was predictably crashing at the same time for the bisect.
>>>>
>>>> It doesn't make a lot of sense, however.
>>>
>>> Did you have a chance to look into this?
>>
>> I have not, unfortunately, but I'm suspecting the original patch was fine 
>> and that the later fix to allocate cur_log out of the shared pool was the 
>> real fix.
>
> So that's what I'd suspect as well but I'm not able to reproduce even 
> the original crash so I cannot verify this theory... What's the plan 
> going forward? Will you re-apply the patch? Frankly, I don't care 
> much, it was just a small cleanup. I'm just curious whether it was 
> really that other bug or whether I'm missing something.

Yes, I think re-applying would be the best way forward, especially since 2.13 was just released, so we'll have a while to iron out any issues. But I really don't see how it could be the cause of the issue; I'm guessing it just exacerbated it somehow.
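For reference, re-applying a previously reverted change is commonly done by reverting the revert commit itself. A self-contained sketch in a throwaway repository (not fio's actual history; the file name and commit messages are made up):

```shell
# Sketch in a throwaway repo: re-apply a reverted commit by
# reverting the revert commit itself.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q .
git config user.email you@example.com && git config user.name you
echo base > f && git add f && git commit -qm "base"
echo change > f && git commit -qam "apply patch"      # the original patch
git revert -n HEAD && git commit -qm "revert patch"   # the revert
git revert -n HEAD && git commit -qm "reapply patch"  # revert of the revert
cat f   # the file once again contains the patched content
```

The same effect can be had by cherry-picking the original commit again, at the cost of a duplicated commit ID in history.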

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe fio" in the body of a message to majordomo@vger.kernel.org More majordomo info at  http://vger.kernel.org/majordomo-info.html



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: fio signal 11
  2016-07-26 18:33                     ` Jeff Furlong
@ 2016-07-26 18:35                       ` Jens Axboe
  2016-08-01 22:57                         ` Jeff Furlong
  0 siblings, 1 reply; 28+ messages in thread
From: Jens Axboe @ 2016-07-26 18:35 UTC (permalink / raw)
  To: Jeff Furlong, Jan Kara; +Cc: Sitsofe Wheeler, fio

Perfect, thanks for testing!


On 07/26/2016 12:33 PM, Jeff Furlong wrote:
> FYI, with the patch back in version fio-2.13-1-gce8b, I re-ran my prior workload that caused the signal 11.  The workload now completes without issue.
>
> Regards,
> Jeff
>
>
> -----Original Message-----
> From: fio-owner@vger.kernel.org [mailto:fio-owner@vger.kernel.org] On Behalf Of Jens Axboe
> Sent: Tuesday, July 26, 2016 7:17 AM
> To: Jan Kara <jack@suse.cz>
> Cc: Jeff Furlong <jeff.furlong@hgst.com>; Sitsofe Wheeler <sitsofe@gmail.com>; fio@vger.kernel.org
> Subject: Re: fio signal 11


-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 28+ messages in thread

* RE: fio signal 11
  2016-07-26 18:35                       ` Jens Axboe
@ 2016-08-01 22:57                         ` Jeff Furlong
  2016-08-02 21:03                           ` Jens Axboe
  2016-08-03 14:55                           ` Jens Axboe
  0 siblings, 2 replies; 28+ messages in thread
From: Jeff Furlong @ 2016-08-01 22:57 UTC (permalink / raw)
  To: Jens Axboe, Jan Kara; +Cc: Sitsofe Wheeler, fio

Sorry to reopen this item.  It appears that adding the ramp_time option breaks the logging: specifically, slat logs every entry, regardless of log_avg_msec.

This example works as intended:
# fio --name=test_job --ioengine=libaio --direct=1 --rw=randread --iodepth=256 --size=100% --numjobs=4 --bs=4k --filename=/dev/nvme0n1 --group_reporting --write_bw_log=test_job --write_iops_log=test_job --write_lat_log=test_job --log_avg_msec=1000 --disable_lat=0 --disable_clat=0 --disable_slat=0 --runtime=10s --time_based --output=test_job

This example is the same but adds a ramp_time; the slat log then contains every individual entry:
# fio --name=test_job --ioengine=libaio --direct=1 --rw=randread --iodepth=256 --size=100% --numjobs=4 --bs=4k --filename=/dev/nvme0n1 --group_reporting --write_bw_log=test_job --write_iops_log=test_job --write_lat_log=test_job --log_avg_msec=1000 --disable_lat=0 --disable_clat=0 --disable_slat=0 --runtime=10s --time_based --output=test_job --ramp_time=1s
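A quick way to check whether log_avg_msec is being honored is to compare the number of log entries against runtime divided by log_avg_msec. A rough sketch (the wc -l file names are the ones produced by the example above):

```shell
# Rough sanity check: with log_avg_msec=1000 over a 10s runtime, an
# averaged per-job log should hold on the order of 10 entries, while an
# unaveraged slat log holds one entry per completed I/O (millions of lines).
runtime_ms=10000     # --runtime=10s
log_avg_msec=1000    # --log_avg_msec=1000
expected=$((runtime_ms / log_avg_msec))
echo "expected ~$expected entries per averaged log"
# Compare with the actual files, e.g.:
#   wc -l test_job_slat.*.log
```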

Regards,
Jeff

-----Original Message-----
From: Jens Axboe [mailto:axboe@kernel.dk] 
Sent: Tuesday, July 26, 2016 11:35 AM
To: Jeff Furlong <jeff.furlong@hgst.com>; Jan Kara <jack@suse.cz>
Cc: Sitsofe Wheeler <sitsofe@gmail.com>; fio@vger.kernel.org
Subject: Re: fio signal 11

Perfect, thanks for testing!




--
Jens Axboe




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: fio signal 11
  2016-08-01 22:57                         ` Jeff Furlong
@ 2016-08-02 21:03                           ` Jens Axboe
  2016-08-03 14:55                           ` Jens Axboe
  1 sibling, 0 replies; 28+ messages in thread
From: Jens Axboe @ 2016-08-02 21:03 UTC (permalink / raw)
  To: Jeff Furlong, Jan Kara; +Cc: Sitsofe Wheeler, fio

This is not related to the segfault issue; it's a continuation of another
logging problem. I'll check up on that one.


On 08/01/2016 03:57 PM, Jeff Furlong wrote:
> Sorry to open this item back up.  However, it appears when we add the ramp_time option, we break the logging.  Specifically, slat will log every entry, regardless of log_avg_msec.
>
> This example works as intended:
> # fio --name=test_job --ioengine=libaio --direct=1 --rw=randread --iodepth=256 --size=100% --numjobs=4 --bs=4k --filename=/dev/nvme0n1 --group_reporting --write_bw_log=test_job --write_iops_log=test_job --write_lat_log=test_job --log_avg_msec=1000 --disable_lat=0 --disable_clat=0 --disable_slat=0 --runtime=10s --time_based --output=test_job
>
> This example is the same, but adds a ramp_time, but the slat log is full of all entries:
> # fio --name=test_job --ioengine=libaio --direct=1 --rw=randread --iodepth=256 --size=100% --numjobs=4 --bs=4k --filename=/dev/nvme0n1 --group_reporting --write_bw_log=test_job --write_iops_log=test_job --write_lat_log=test_job --log_avg_msec=1000 --disable_lat=0 --disable_clat=0 --disable_slat=0 --runtime=10s --time_based --output=test_job --ramp_time=1s
>
> Regards,
> Jeff


-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: fio signal 11
  2016-08-01 22:57                         ` Jeff Furlong
  2016-08-02 21:03                           ` Jens Axboe
@ 2016-08-03 14:55                           ` Jens Axboe
  2016-08-03 15:02                             ` Jeff Furlong
  1 sibling, 1 reply; 28+ messages in thread
From: Jens Axboe @ 2016-08-03 14:55 UTC (permalink / raw)
  To: Jeff Furlong, Jan Kara; +Cc: Sitsofe Wheeler, fio

What version did you test? Works fine for me.


On 08/01/2016 03:57 PM, Jeff Furlong wrote:
> Sorry to open this item back up.  However, it appears when we add the ramp_time option, we break the logging.  Specifically, slat will log every entry, regardless of log_avg_msec.
>
> This example works as intended:
> # fio --name=test_job --ioengine=libaio --direct=1 --rw=randread --iodepth=256 --size=100% --numjobs=4 --bs=4k --filename=/dev/nvme0n1 --group_reporting --write_bw_log=test_job --write_iops_log=test_job --write_lat_log=test_job --log_avg_msec=1000 --disable_lat=0 --disable_clat=0 --disable_slat=0 --runtime=10s --time_based --output=test_job
>
> This example is the same, but adds a ramp_time, but the slat log is full of all entries:
> # fio --name=test_job --ioengine=libaio --direct=1 --rw=randread --iodepth=256 --size=100% --numjobs=4 --bs=4k --filename=/dev/nvme0n1 --group_reporting --write_bw_log=test_job --write_iops_log=test_job --write_lat_log=test_job --log_avg_msec=1000 --disable_lat=0 --disable_clat=0 --disable_slat=0 --runtime=10s --time_based --output=test_job --ramp_time=1s
>
> Regards,
> Jeff


-- 
Jens Axboe




* RE: fio signal 11
  2016-08-03 14:55                           ` Jens Axboe
@ 2016-08-03 15:02                             ` Jeff Furlong
  2016-08-03 15:03                               ` Jens Axboe
  0 siblings, 1 reply; 28+ messages in thread
From: Jeff Furlong @ 2016-08-03 15:02 UTC (permalink / raw)
  To: Jens Axboe, Jan Kara; +Cc: Sitsofe Wheeler, fio

# fio -version
fio-2.13-28-g059b6

# fio --name=test_job --ioengine=libaio --direct=1 --rw=randread --iodepth=256 --size=100% --numjobs=4 --bs=4k --filename=/dev/nvme0n1 --group_reporting --write_bw_log=test_job --write_iops_log=test_job --write_lat_log=test_job --log_avg_msec=1000 --disable_lat=0 --disable_clat=0 --disable_slat=0 --runtime=10s --time_based --output=test_job --ramp_time=1s

# ls -l
-rw-r--r-- 1 root root     1748 Aug  3 07:57 test_job
-rw-r--r-- 1 root root      201 Aug  3 07:57 test_job_bw.1.log
-rw-r--r-- 1 root root      196 Aug  3 07:57 test_job_bw.2.log
-rw-r--r-- 1 root root      192 Aug  3 07:57 test_job_bw.3.log
-rw-r--r-- 1 root root      201 Aug  3 07:57 test_job_bw.4.log
-rw-r--r-- 1 root root      179 Aug  3 07:57 test_job_clat.1.log
-rw-r--r-- 1 root root      181 Aug  3 07:57 test_job_clat.2.log
-rw-r--r-- 1 root root      186 Aug  3 07:57 test_job_clat.3.log
-rw-r--r-- 1 root root      178 Aug  3 07:57 test_job_clat.4.log
-rw-r--r-- 1 root root      191 Aug  3 07:57 test_job_iops.1.log
-rw-r--r-- 1 root root      187 Aug  3 07:57 test_job_iops.2.log
-rw-r--r-- 1 root root      184 Aug  3 07:57 test_job_iops.3.log
-rw-r--r-- 1 root root      191 Aug  3 07:57 test_job_iops.4.log
-rw-r--r-- 1 root root      179 Aug  3 07:57 test_job_lat.1.log
-rw-r--r-- 1 root root      181 Aug  3 07:57 test_job_lat.2.log
-rw-r--r-- 1 root root      186 Aug  3 07:57 test_job_lat.3.log
-rw-r--r-- 1 root root      178 Aug  3 07:57 test_job_lat.4.log
-rw-r--r-- 1 root root 49597737 Aug  3 07:57 test_job_slat.1.log
-rw-r--r-- 1 root root 42065243 Aug  3 07:57 test_job_slat.2.log
-rw-r--r-- 1 root root 24407670 Aug  3 07:57 test_job_slat.3.log
-rw-r--r-- 1 root root 47090233 Aug  3 07:57 test_job_slat.4.log

Above we can see that the slat log files are huge for a 10s runtime.  I'm not sure whether the size scales with total runtime or with the number of IOs completed during that runtime.  I can also reproduce the issue when I reduce the runtime to 1s.

Regards,
Jeff

-----Original Message-----
From: Jens Axboe [mailto:axboe@kernel.dk] 
Sent: Wednesday, August 3, 2016 7:56 AM
To: Jeff Furlong <jeff.furlong@hgst.com>; Jan Kara <jack@suse.cz>
Cc: Sitsofe Wheeler <sitsofe@gmail.com>; fio@vger.kernel.org
Subject: Re: fio signal 11

What version did you test? Works fine for me.


On 08/01/2016 03:57 PM, Jeff Furlong wrote:
> Sorry to open this item back up.  However, it appears when we add the ramp_time option, we break the logging.  Specifically, slat will log every entry, regardless of log_avg_msec.
>
> This example works as intended:
> # fio --name=test_job --ioengine=libaio --direct=1 --rw=randread 
> --iodepth=256 --size=100% --numjobs=4 --bs=4k --filename=/dev/nvme0n1 
> --group_reporting --write_bw_log=test_job --write_iops_log=test_job 
> --write_lat_log=test_job --log_avg_msec=1000 --disable_lat=0 
> --disable_clat=0 --disable_slat=0 --runtime=10s --time_based 
> --output=test_job
>
> This example is the same, but adds a ramp_time, but the slat log is full of all entries:
> # fio --name=test_job --ioengine=libaio --direct=1 --rw=randread 
> --iodepth=256 --size=100% --numjobs=4 --bs=4k --filename=/dev/nvme0n1 
> --group_reporting --write_bw_log=test_job --write_iops_log=test_job 
> --write_lat_log=test_job --log_avg_msec=1000 --disable_lat=0 
> --disable_clat=0 --disable_slat=0 --runtime=10s --time_based 
> --output=test_job --ramp_time=1s
>
> Regards,
> Jeff
>
> -----Original Message-----
> From: Jens Axboe [mailto:axboe@kernel.dk]
> Sent: Tuesday, July 26, 2016 11:35 AM
> To: Jeff Furlong <jeff.furlong@hgst.com>; Jan Kara <jack@suse.cz>
> Cc: Sitsofe Wheeler <sitsofe@gmail.com>; fio@vger.kernel.org
> Subject: Re: fio signal 11
>
> Perfect, thanks for testing!
>
>
> On 07/26/2016 12:33 PM, Jeff Furlong wrote:
>> FYI, with the patch back in version fio-2.13-1-gce8b, I re-ran my prior workload that caused the signal 11.  The workload now completes without issue.
>>
>> Regards,
>> Jeff
>>
>>
>> -----Original Message-----
>> From: fio-owner@vger.kernel.org [mailto:fio-owner@vger.kernel.org] On 
>> Behalf Of Jens Axboe
>> Sent: Tuesday, July 26, 2016 7:17 AM
>> To: Jan Kara <jack@suse.cz>
>> Cc: Jeff Furlong <jeff.furlong@hgst.com>; Sitsofe Wheeler 
>> <sitsofe@gmail.com>; fio@vger.kernel.org
>> Subject: Re: fio signal 11
>>
>> On 07/26/2016 02:43 AM, Jan Kara wrote:
>>> On Mon 25-07-16 09:21:00, Jens Axboe wrote:
>>>> On 07/19/2016 11:08 PM, Jan Kara wrote:
>>>>> On Thu 16-06-16 09:06:51, Jens Axboe wrote:
>>>>>> On 06/15/2016 04:45 PM, Jan Kara wrote:
>>>>>>> On Sat 11-06-16 21:30:00, Jens Axboe wrote:
>>>>>>>> On 06/11/2016 08:56 PM, Jens Axboe wrote:
>>>>>>>>> On 06/10/2016 12:42 PM, Jeff Furlong wrote:
>>>>>>>>>> Good point.  Here is the trace:
>>>>>>>>>>
>>>>>>>>>> [New LWP 59231]
>>>>>>>>>> [Thread debugging using libthread_db enabled] Using host 
>>>>>>>>>> libthread_db library "/lib64/libthread_db.so.1".
>>>>>>>>>> Core was generated by `fio --name=test_job --ioengine=libaio
>>>>>>>>>> --direct=1 --rw=write --iodepth=32'.
>>>>>>>>>> Program terminated with signal 11, Segmentation fault.
>>>>>>>>>> #0  0x0000000000421e39 in regrow_log (iolog=0x7f828c0c5ad0) 
>>>>>>>>>> at
>>>>>>>>>> stat.c:1909
>>>>>>>>>> 1909        if (!cur_log) {
>>>>>>>>>>
>>>>>>>>>> (gdb) bt
>>>>>>>>>> #0  0x0000000000421e39 in regrow_log (iolog=0x7f828c0c5ad0) 
>>>>>>>>>> at
>>>>>>>>>> stat.c:1909
>>>>>>>>>> #1  0x000000000042d4df in regrow_logs
>>>>>>>>>> (td=td@entry=0x7f8277de0000) at
>>>>>>>>>> stat.c:1965
>>>>>>>>>> #2  0x000000000040ca90 in wait_for_completions 
>>>>>>>>>> (td=td@entry=0x7f8277de0000, time=time@entry=0x7fffcfb6b300) 
>>>>>>>>>> at
>>>>>>>>>> backend.c:446
>>>>>>>>>> #3  0x000000000045ade7 in do_io (bytes_done=<synthetic
>>>>>>>>>> pointer>,
>>>>>>>>>> td=0x7f8277de0000) at backend.c:991
>>>>>>>>>> #4  thread_main (data=data@entry=0x264d450) at backend.c:1667
>>>>>>>>>> #5  0x000000000045cfec in run_threads
>>>>>>>>>> (sk_out=sk_out@entry=0x0) at
>>>>>>>>>> backend.c:2217
>>>>>>>>>> #6  0x000000000045d2cd in fio_backend
>>>>>>>>>> (sk_out=sk_out@entry=0x0) at
>>>>>>>>>> backend.c:2349
>>>>>>>>>> #7  0x000000000040d09c in main (argc=22, argv=0x7fffcfb6f638, 
>>>>>>>>>> envp=<optimized out>) at fio.c:63
>>>>>>>>>
>>>>>>>>> That looks odd, thanks for reporting this. I'll see if I can 
>>>>>>>>> get to this on Monday, if not, it'll have to wait until after 
>>>>>>>>> my vacation... So while I appreciate people running -git and 
>>>>>>>>> finding issues like these before they show up in a release, 
>>>>>>>>>> might be best to revert to 2.11 until I can get this debugged.
>>>>>>>>
>>>>>>>> I take that back - continue using -git! Just pull a fresh copy, 
>>>>>>>> should be fixed now.
>>>>>>>>
>>>>>>>> Jan, the reporter is right, 2.11 works and -git does not. So I 
>>>>>>>> just ran a quick bisect, changing the logging from every second 
>>>>>>>> to every 100ms to make it reproduce faster. I don't have time 
>>>>>>>> to look into why yet, so I just reverted the commit.
>>>>>>>>
>>>>>>>> commit d7982dd0ab2a1a315b5f9859c67a02414ce6274f
>>>>>>>> Author: Jan Kara <jack@suse.cz>
>>>>>>>> Date:   Tue May 24 17:03:21 2016 +0200
>>>>>>>>
>>>>>>>>    fio: Simplify forking of processes
>>>>>>>
>>>>>>> Hum, I've tried reproducing this but failed (I've tried using
>>>>>>> /dev/ram0 and
>>>>>>> /dev/sda4 as devices for fio). Is it somehow dependent on the 
>>>>>>> device fio works with? I have used commit
>>>>>>> 54d0a3150d44adca3ee4047fabd85651c6ea2db1 (just before you 
>>>>>>> reverted my
>>>>>>> patch) for testing.
>>>>>>
>>>>>> On vacation right now, I'll check when I get back. It is possible 
>>>>>> that it was just a fluke, since there was another bug there 
>>>>>> related to shared memory, but it was predictably crashing at the same time for the bisect.
>>>>>>
>>>>>> It doesn't make a lot of sense, however.
>>>>>
>>>>> Did you have a chance to look into this?
>>>>
>>>> I have not, unfortunately, but I'm suspecting the patch is fine and 
>>>> the later fix to allocate the cur_log out of the shared pool was 
>>>> the real fix and that the original patch was fine.
>>>
>>> So that's what I'd suspect as well but I'm not able to reproduce 
>>> even the original crash so I cannot verify this theory... What's the 
>>> plan going forward? Will you re-apply the patch? Frankly, I don't 
>>> care much, it was just a small cleanup. I'm just curious whether it 
>>> was really that other bug or whether I missed something.
>>
>> Yes, I think re-applying would be the best way forward. Especially since 2.13 was just released, so we'll have a while to iron out any issues. But I really don't see how it could be the reason for the issue; I'm guessing it just exacerbated it somehow.
>>
>> --
>> Jens Axboe
>>
>>
>
>
> --
> Jens Axboe
>


--
Jens Axboe





* Re: fio signal 11
  2016-08-03 15:02                             ` Jeff Furlong
@ 2016-08-03 15:03                               ` Jens Axboe
  2016-08-03 15:14                                 ` Jens Axboe
  0 siblings, 1 reply; 28+ messages in thread
From: Jens Axboe @ 2016-08-03 15:03 UTC (permalink / raw)
  To: Jeff Furlong, Jan Kara; +Cc: Sitsofe Wheeler, fio

I can reproduce it now. Seems to happen a bit randomly for me, and not 
all 4 slat logs are big, some of them are averaged fine. So smells like 
a race in updating after going out of ramp time.


On 08/03/2016 08:02 AM, Jeff Furlong wrote:
> # fio -version
> fio-2.13-28-g059b6
>
> # fio --name=test_job --ioengine=libaio --direct=1 --rw=randread --iodepth=256 --size=100% --numjobs=4 --bs=4k --filename=/dev/nvme0n1 --group_reporting --write_bw_log=test_job --write_iops_log=test_job --write_lat_log=test_job --log_avg_msec=1000 --disable_lat=0 --disable_clat=0 --disable_slat=0 --runtime=10s --time_based --output=test_job --ramp_time=1s
>
> # ls -l
> -rw-r--r-- 1 root root     1748 Aug  3 07:57 test_job
> -rw-r--r-- 1 root root      201 Aug  3 07:57 test_job_bw.1.log
> -rw-r--r-- 1 root root      196 Aug  3 07:57 test_job_bw.2.log
> -rw-r--r-- 1 root root      192 Aug  3 07:57 test_job_bw.3.log
> -rw-r--r-- 1 root root      201 Aug  3 07:57 test_job_bw.4.log
> -rw-r--r-- 1 root root      179 Aug  3 07:57 test_job_clat.1.log
> -rw-r--r-- 1 root root      181 Aug  3 07:57 test_job_clat.2.log
> -rw-r--r-- 1 root root      186 Aug  3 07:57 test_job_clat.3.log
> -rw-r--r-- 1 root root      178 Aug  3 07:57 test_job_clat.4.log
> -rw-r--r-- 1 root root      191 Aug  3 07:57 test_job_iops.1.log
> -rw-r--r-- 1 root root      187 Aug  3 07:57 test_job_iops.2.log
> -rw-r--r-- 1 root root      184 Aug  3 07:57 test_job_iops.3.log
> -rw-r--r-- 1 root root      191 Aug  3 07:57 test_job_iops.4.log
> -rw-r--r-- 1 root root      179 Aug  3 07:57 test_job_lat.1.log
> -rw-r--r-- 1 root root      181 Aug  3 07:57 test_job_lat.2.log
> -rw-r--r-- 1 root root      186 Aug  3 07:57 test_job_lat.3.log
> -rw-r--r-- 1 root root      178 Aug  3 07:57 test_job_lat.4.log
> -rw-r--r-- 1 root root 49597737 Aug  3 07:57 test_job_slat.1.log
> -rw-r--r-- 1 root root 42065243 Aug  3 07:57 test_job_slat.2.log
> -rw-r--r-- 1 root root 24407670 Aug  3 07:57 test_job_slat.3.log
> -rw-r--r-- 1 root root 47090233 Aug  3 07:57 test_job_slat.4.log
>
> Above we can see the slat log files are huge for a 10s runtime.  Not sure if it's a factor of total runtime or IOs during that runtime.  I can also create the issue when I reduce the runtime to 1s.
>
> Regards,
> Jeff
>
> -----Original Message-----
> From: Jens Axboe [mailto:axboe@kernel.dk]
> Sent: Wednesday, August 3, 2016 7:56 AM
> To: Jeff Furlong <jeff.furlong@hgst.com>; Jan Kara <jack@suse.cz>
> Cc: Sitsofe Wheeler <sitsofe@gmail.com>; fio@vger.kernel.org
> Subject: Re: fio signal 11
>
> What version did you test? Works fine for me.
>
>
> On 08/01/2016 03:57 PM, Jeff Furlong wrote:
>> Sorry to open this item back up.  However, it appears when we add the ramp_time option, we break the logging.  Specifically, slat will log every entry, regardless of log_avg_msec.
>>
>> This example works as intended:
>> # fio --name=test_job --ioengine=libaio --direct=1 --rw=randread
>> --iodepth=256 --size=100% --numjobs=4 --bs=4k --filename=/dev/nvme0n1
>> --group_reporting --write_bw_log=test_job --write_iops_log=test_job
>> --write_lat_log=test_job --log_avg_msec=1000 --disable_lat=0
>> --disable_clat=0 --disable_slat=0 --runtime=10s --time_based
>> --output=test_job
>>
>> This example is the same, but adds a ramp_time, but the slat log is full of all entries:
>> # fio --name=test_job --ioengine=libaio --direct=1 --rw=randread
>> --iodepth=256 --size=100% --numjobs=4 --bs=4k --filename=/dev/nvme0n1
>> --group_reporting --write_bw_log=test_job --write_iops_log=test_job
>> --write_lat_log=test_job --log_avg_msec=1000 --disable_lat=0
>> --disable_clat=0 --disable_slat=0 --runtime=10s --time_based
>> --output=test_job --ramp_time=1s
>>
>> Regards,
>> Jeff
>>
>> -----Original Message-----
>> From: Jens Axboe [mailto:axboe@kernel.dk]
>> Sent: Tuesday, July 26, 2016 11:35 AM
>> To: Jeff Furlong <jeff.furlong@hgst.com>; Jan Kara <jack@suse.cz>
>> Cc: Sitsofe Wheeler <sitsofe@gmail.com>; fio@vger.kernel.org
>> Subject: Re: fio signal 11
>>
>> Perfect, thanks for testing!
>>
>>
>> On 07/26/2016 12:33 PM, Jeff Furlong wrote:
>>> FYI, with the patch back in version fio-2.13-1-gce8b, I re-ran my prior workload that caused the signal 11.  The workload now completes without issue.
>>>
>>> Regards,
>>> Jeff
>>>
>>>
>>> -----Original Message-----
>>> From: fio-owner@vger.kernel.org [mailto:fio-owner@vger.kernel.org] On
>>> Behalf Of Jens Axboe
>>> Sent: Tuesday, July 26, 2016 7:17 AM
>>> To: Jan Kara <jack@suse.cz>
>>> Cc: Jeff Furlong <jeff.furlong@hgst.com>; Sitsofe Wheeler
>>> <sitsofe@gmail.com>; fio@vger.kernel.org
>>> Subject: Re: fio signal 11
>>>
>>> On 07/26/2016 02:43 AM, Jan Kara wrote:
>>>> On Mon 25-07-16 09:21:00, Jens Axboe wrote:
>>>>> On 07/19/2016 11:08 PM, Jan Kara wrote:
>>>>>> On Thu 16-06-16 09:06:51, Jens Axboe wrote:
>>>>>>> On 06/15/2016 04:45 PM, Jan Kara wrote:
>>>>>>>> On Sat 11-06-16 21:30:00, Jens Axboe wrote:
>>>>>>>>> On 06/11/2016 08:56 PM, Jens Axboe wrote:
>>>>>>>>>> On 06/10/2016 12:42 PM, Jeff Furlong wrote:
>>>>>>>>>>> Good point.  Here is the trace:
>>>>>>>>>>>
>>>>>>>>>>> [New LWP 59231]
>>>>>>>>>>> [Thread debugging using libthread_db enabled] Using host
>>>>>>>>>>> libthread_db library "/lib64/libthread_db.so.1".
>>>>>>>>>>> Core was generated by `fio --name=test_job --ioengine=libaio
>>>>>>>>>>> --direct=1 --rw=write --iodepth=32'.
>>>>>>>>>>> Program terminated with signal 11, Segmentation fault.
>>>>>>>>>>> #0  0x0000000000421e39 in regrow_log (iolog=0x7f828c0c5ad0)
>>>>>>>>>>> at
>>>>>>>>>>> stat.c:1909
>>>>>>>>>>> 1909        if (!cur_log) {
>>>>>>>>>>>
>>>>>>>>>>> (gdb) bt
>>>>>>>>>>> #0  0x0000000000421e39 in regrow_log (iolog=0x7f828c0c5ad0)
>>>>>>>>>>> at
>>>>>>>>>>> stat.c:1909
>>>>>>>>>>> #1  0x000000000042d4df in regrow_logs
>>>>>>>>>>> (td=td@entry=0x7f8277de0000) at
>>>>>>>>>>> stat.c:1965
>>>>>>>>>>> #2  0x000000000040ca90 in wait_for_completions
>>>>>>>>>>> (td=td@entry=0x7f8277de0000, time=time@entry=0x7fffcfb6b300)
>>>>>>>>>>> at
>>>>>>>>>>> backend.c:446
>>>>>>>>>>> #3  0x000000000045ade7 in do_io (bytes_done=<synthetic
>>>>>>>>>>> pointer>,
>>>>>>>>>>> td=0x7f8277de0000) at backend.c:991
>>>>>>>>>>> #4  thread_main (data=data@entry=0x264d450) at backend.c:1667
>>>>>>>>>>> #5  0x000000000045cfec in run_threads
>>>>>>>>>>> (sk_out=sk_out@entry=0x0) at
>>>>>>>>>>> backend.c:2217
>>>>>>>>>>> #6  0x000000000045d2cd in fio_backend
>>>>>>>>>>> (sk_out=sk_out@entry=0x0) at
>>>>>>>>>>> backend.c:2349
>>>>>>>>>>> #7  0x000000000040d09c in main (argc=22, argv=0x7fffcfb6f638,
>>>>>>>>>>> envp=<optimized out>) at fio.c:63
>>>>>>>>>>
>>>>>>>>>> That looks odd, thanks for reporting this. I'll see if I can
>>>>>>>>>> get to this on Monday, if not, it'll have to wait until after
>>>>>>>>>> my vacation... So while I appreciate people running -git and
>>>>>>>>>> finding issues like these before they show up in a release,
>>>>>>>>>>> might be best to revert to 2.11 until I can get this debugged.
>>>>>>>>>
>>>>>>>>> I take that back - continue using -git! Just pull a fresh copy,
>>>>>>>>> should be fixed now.
>>>>>>>>>
>>>>>>>>> Jan, the reporter is right, 2.11 works and -git does not. So I
>>>>>>>>> just ran a quick bisect, changing the logging from every second
>>>>>>>>> to every 100ms to make it reproduce faster. I don't have time
>>>>>>>>> to look into why yet, so I just reverted the commit.
>>>>>>>>>
>>>>>>>>> commit d7982dd0ab2a1a315b5f9859c67a02414ce6274f
>>>>>>>>> Author: Jan Kara <jack@suse.cz>
>>>>>>>>> Date:   Tue May 24 17:03:21 2016 +0200
>>>>>>>>>
>>>>>>>>>    fio: Simplify forking of processes
>>>>>>>>
>>>>>>>> Hum, I've tried reproducing this but failed (I've tried using
>>>>>>>> /dev/ram0 and
>>>>>>>> /dev/sda4 as devices for fio). Is it somehow dependent on the
>>>>>>>> device fio works with? I have used commit
>>>>>>>> 54d0a3150d44adca3ee4047fabd85651c6ea2db1 (just before you
>>>>>>>> reverted my
>>>>>>>> patch) for testing.
>>>>>>>
>>>>>>> On vacation right now, I'll check when I get back. It is possible
>>>>>>> that it was just a fluke, since there was another bug there
>>>>>>> related to shared memory, but it was predictably crashing at the same time for the bisect.
>>>>>>>
>>>>>>> It doesn't make a lot of sense, however.
>>>>>>
>>>>>> Did you have a chance to look into this?
>>>>>
>>>>> I have not, unfortunately, but I'm suspecting the patch is fine and
>>>>> the later fix to allocate the cur_log out of the shared pool was
>>>>> the real fix and that the original patch was fine.
>>>>
>>>> So that's what I'd suspect as well but I'm not able to reproduce
>>>> even the original crash so I cannot verify this theory... What's the
>>>> plan going forward? Will you re-apply the patch? Frankly, I don't
>>>> care much, it was just a small cleanup. I'm just curious whether it
>>>> was really that other bug or whether I missed something.
>>>
>>> Yes, I think re-applying would be the best way forward. Especially since 2.13 was just released, so we'll have a while to iron out any issues. But I really don't see how it could be the reason for the issue; I'm guessing it just exacerbated it somehow.
>>>
>>> --
>>> Jens Axboe
>>>
>>>
>>
>>
>> --
>> Jens Axboe
>>
>>
>
>
> --
> Jens Axboe
>
>


-- 
Jens Axboe




* Re: fio signal 11
  2016-08-03 15:03                               ` Jens Axboe
@ 2016-08-03 15:14                                 ` Jens Axboe
  2016-08-03 15:29                                   ` Jeff Furlong
  0 siblings, 1 reply; 28+ messages in thread
From: Jens Axboe @ 2016-08-03 15:14 UTC (permalink / raw)
  To: Jeff Furlong, Jan Kara; +Cc: Sitsofe Wheeler, fio

OK, try current -git, I believe it should be fixed.


On 08/03/2016 08:03 AM, Jens Axboe wrote:
> I can reproduce it now. Seems to happen a bit randomly for me, and not
> all 4 slat logs are big, some of them are averaged fine. So smells like
> a race in updating after going out of ramp time.
>
>
> On 08/03/2016 08:02 AM, Jeff Furlong wrote:
>> # fio -version
>> fio-2.13-28-g059b6
>>
>> # fio --name=test_job --ioengine=libaio --direct=1 --rw=randread
>> --iodepth=256 --size=100% --numjobs=4 --bs=4k --filename=/dev/nvme0n1
>> --group_reporting --write_bw_log=test_job --write_iops_log=test_job
>> --write_lat_log=test_job --log_avg_msec=1000 --disable_lat=0
>> --disable_clat=0 --disable_slat=0 --runtime=10s --time_based
>> --output=test_job --ramp_time=1s
>>
>> # ls -l
>> -rw-r--r-- 1 root root     1748 Aug  3 07:57 test_job
>> -rw-r--r-- 1 root root      201 Aug  3 07:57 test_job_bw.1.log
>> -rw-r--r-- 1 root root      196 Aug  3 07:57 test_job_bw.2.log
>> -rw-r--r-- 1 root root      192 Aug  3 07:57 test_job_bw.3.log
>> -rw-r--r-- 1 root root      201 Aug  3 07:57 test_job_bw.4.log
>> -rw-r--r-- 1 root root      179 Aug  3 07:57 test_job_clat.1.log
>> -rw-r--r-- 1 root root      181 Aug  3 07:57 test_job_clat.2.log
>> -rw-r--r-- 1 root root      186 Aug  3 07:57 test_job_clat.3.log
>> -rw-r--r-- 1 root root      178 Aug  3 07:57 test_job_clat.4.log
>> -rw-r--r-- 1 root root      191 Aug  3 07:57 test_job_iops.1.log
>> -rw-r--r-- 1 root root      187 Aug  3 07:57 test_job_iops.2.log
>> -rw-r--r-- 1 root root      184 Aug  3 07:57 test_job_iops.3.log
>> -rw-r--r-- 1 root root      191 Aug  3 07:57 test_job_iops.4.log
>> -rw-r--r-- 1 root root      179 Aug  3 07:57 test_job_lat.1.log
>> -rw-r--r-- 1 root root      181 Aug  3 07:57 test_job_lat.2.log
>> -rw-r--r-- 1 root root      186 Aug  3 07:57 test_job_lat.3.log
>> -rw-r--r-- 1 root root      178 Aug  3 07:57 test_job_lat.4.log
>> -rw-r--r-- 1 root root 49597737 Aug  3 07:57 test_job_slat.1.log
>> -rw-r--r-- 1 root root 42065243 Aug  3 07:57 test_job_slat.2.log
>> -rw-r--r-- 1 root root 24407670 Aug  3 07:57 test_job_slat.3.log
>> -rw-r--r-- 1 root root 47090233 Aug  3 07:57 test_job_slat.4.log
>>
>> Above we can see the slat log files are huge for a 10s runtime.  Not
>> sure if it's a factor of total runtime or IOs during that runtime.  I
>> can also create the issue when I reduce the runtime to 1s.
>>
>> Regards,
>> Jeff
>>
>> -----Original Message-----
>> From: Jens Axboe [mailto:axboe@kernel.dk]
>> Sent: Wednesday, August 3, 2016 7:56 AM
>> To: Jeff Furlong <jeff.furlong@hgst.com>; Jan Kara <jack@suse.cz>
>> Cc: Sitsofe Wheeler <sitsofe@gmail.com>; fio@vger.kernel.org
>> Subject: Re: fio signal 11
>>
>> What version did you test? Works fine for me.
>>
>>
>> On 08/01/2016 03:57 PM, Jeff Furlong wrote:
>>> Sorry to open this item back up.  However, it appears when we add the
>>> ramp_time option, we break the logging.  Specifically, slat will log
>>> every entry, regardless of log_avg_msec.
>>>
>>> This example works as intended:
>>> # fio --name=test_job --ioengine=libaio --direct=1 --rw=randread
>>> --iodepth=256 --size=100% --numjobs=4 --bs=4k --filename=/dev/nvme0n1
>>> --group_reporting --write_bw_log=test_job --write_iops_log=test_job
>>> --write_lat_log=test_job --log_avg_msec=1000 --disable_lat=0
>>> --disable_clat=0 --disable_slat=0 --runtime=10s --time_based
>>> --output=test_job
>>>
>>> This example is the same, but adds a ramp_time, but the slat log is
>>> full of all entries:
>>> # fio --name=test_job --ioengine=libaio --direct=1 --rw=randread
>>> --iodepth=256 --size=100% --numjobs=4 --bs=4k --filename=/dev/nvme0n1
>>> --group_reporting --write_bw_log=test_job --write_iops_log=test_job
>>> --write_lat_log=test_job --log_avg_msec=1000 --disable_lat=0
>>> --disable_clat=0 --disable_slat=0 --runtime=10s --time_based
>>> --output=test_job --ramp_time=1s
>>>
>>> Regards,
>>> Jeff
>>>
>>> -----Original Message-----
>>> From: Jens Axboe [mailto:axboe@kernel.dk]
>>> Sent: Tuesday, July 26, 2016 11:35 AM
>>> To: Jeff Furlong <jeff.furlong@hgst.com>; Jan Kara <jack@suse.cz>
>>> Cc: Sitsofe Wheeler <sitsofe@gmail.com>; fio@vger.kernel.org
>>> Subject: Re: fio signal 11
>>>
>>> Perfect, thanks for testing!
>>>
>>>
>>> On 07/26/2016 12:33 PM, Jeff Furlong wrote:
>>>> FYI, with the patch back in version fio-2.13-1-gce8b, I re-ran my
>>>> prior workload that caused the signal 11.  The workload now
>>>> completes without issue.
>>>>
>>>> Regards,
>>>> Jeff
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: fio-owner@vger.kernel.org [mailto:fio-owner@vger.kernel.org] On
>>>> Behalf Of Jens Axboe
>>>> Sent: Tuesday, July 26, 2016 7:17 AM
>>>> To: Jan Kara <jack@suse.cz>
>>>> Cc: Jeff Furlong <jeff.furlong@hgst.com>; Sitsofe Wheeler
>>>> <sitsofe@gmail.com>; fio@vger.kernel.org
>>>> Subject: Re: fio signal 11
>>>>
>>>> On 07/26/2016 02:43 AM, Jan Kara wrote:
>>>>> On Mon 25-07-16 09:21:00, Jens Axboe wrote:
>>>>>> On 07/19/2016 11:08 PM, Jan Kara wrote:
>>>>>>> On Thu 16-06-16 09:06:51, Jens Axboe wrote:
>>>>>>>> On 06/15/2016 04:45 PM, Jan Kara wrote:
>>>>>>>>> On Sat 11-06-16 21:30:00, Jens Axboe wrote:
>>>>>>>>>> On 06/11/2016 08:56 PM, Jens Axboe wrote:
>>>>>>>>>>> On 06/10/2016 12:42 PM, Jeff Furlong wrote:
>>>>>>>>>>>> Good point.  Here is the trace:
>>>>>>>>>>>>
>>>>>>>>>>>> [New LWP 59231]
>>>>>>>>>>>> [Thread debugging using libthread_db enabled] Using host
>>>>>>>>>>>> libthread_db library "/lib64/libthread_db.so.1".
>>>>>>>>>>>> Core was generated by `fio --name=test_job --ioengine=libaio
>>>>>>>>>>>> --direct=1 --rw=write --iodepth=32'.
>>>>>>>>>>>> Program terminated with signal 11, Segmentation fault.
>>>>>>>>>>>> #0  0x0000000000421e39 in regrow_log (iolog=0x7f828c0c5ad0)
>>>>>>>>>>>> at
>>>>>>>>>>>> stat.c:1909
>>>>>>>>>>>> 1909        if (!cur_log) {
>>>>>>>>>>>>
>>>>>>>>>>>> (gdb) bt
>>>>>>>>>>>> #0  0x0000000000421e39 in regrow_log (iolog=0x7f828c0c5ad0)
>>>>>>>>>>>> at
>>>>>>>>>>>> stat.c:1909
>>>>>>>>>>>> #1  0x000000000042d4df in regrow_logs
>>>>>>>>>>>> (td=td@entry=0x7f8277de0000) at
>>>>>>>>>>>> stat.c:1965
>>>>>>>>>>>> #2  0x000000000040ca90 in wait_for_completions
>>>>>>>>>>>> (td=td@entry=0x7f8277de0000, time=time@entry=0x7fffcfb6b300)
>>>>>>>>>>>> at
>>>>>>>>>>>> backend.c:446
>>>>>>>>>>>> #3  0x000000000045ade7 in do_io (bytes_done=<synthetic
>>>>>>>>>>>> pointer>,
>>>>>>>>>>>> td=0x7f8277de0000) at backend.c:991
>>>>>>>>>>>> #4  thread_main (data=data@entry=0x264d450) at backend.c:1667
>>>>>>>>>>>> #5  0x000000000045cfec in run_threads
>>>>>>>>>>>> (sk_out=sk_out@entry=0x0) at
>>>>>>>>>>>> backend.c:2217
>>>>>>>>>>>> #6  0x000000000045d2cd in fio_backend
>>>>>>>>>>>> (sk_out=sk_out@entry=0x0) at
>>>>>>>>>>>> backend.c:2349
>>>>>>>>>>>> #7  0x000000000040d09c in main (argc=22, argv=0x7fffcfb6f638,
>>>>>>>>>>>> envp=<optimized out>) at fio.c:63
>>>>>>>>>>>
>>>>>>>>>>> That looks odd, thanks for reporting this. I'll see if I can
>>>>>>>>>>> get to this on Monday, if not, it'll have to wait until after
>>>>>>>>>>> my vacation... So while I appreciate people running -git and
>>>>>>>>>>> finding issues like these before they show up in a release,
>>>>>>>>>>> might be best to revert to 2.11 until I can get this
>>>>>>>>>>> debugged.
>>>>>>>>>>
>>>>>>>>>> I take that back - continue using -git! Just pull a fresh copy,
>>>>>>>>>> should be fixed now.
>>>>>>>>>>
>>>>>>>>>> Jan, the reporter is right, 2.11 works and -git does not. So I
>>>>>>>>>> just ran a quick bisect, changing the logging from every second
>>>>>>>>>> to every 100ms to make it reproduce faster. I don't have time
>>>>>>>>>> to look into why yet, so I just reverted the commit.
>>>>>>>>>>
>>>>>>>>>> commit d7982dd0ab2a1a315b5f9859c67a02414ce6274f
>>>>>>>>>> Author: Jan Kara <jack@suse.cz>
>>>>>>>>>> Date:   Tue May 24 17:03:21 2016 +0200
>>>>>>>>>>
>>>>>>>>>>    fio: Simplify forking of processes
>>>>>>>>>
>>>>>>>>> Hum, I've tried reproducing this but failed (I've tried using
>>>>>>>>> /dev/ram0 and
>>>>>>>>> /dev/sda4 as devices for fio). Is it somehow dependent on the
>>>>>>>>> device fio works with? I have used commit
>>>>>>>>> 54d0a3150d44adca3ee4047fabd85651c6ea2db1 (just before you
>>>>>>>>> reverted my
>>>>>>>>> patch) for testing.
>>>>>>>>
>>>>>>>> On vacation right now, I'll check when I get back. It is possible
>>>>>>>> that it was just a fluke, since there was another bug there
>>>>>>>> related to shared memory, but it was predictably crashing at the
>>>>>>>> same time for the bisect.
>>>>>>>>
>>>>>>>> It doesn't make a lot of sense, however.
>>>>>>>
>>>>>>> Did you have a chance to look into this?
>>>>>>
>>>>>> I have not, unfortunately, but I suspect the original patch is fine
>>>>>> and that the later fix, allocating cur_log out of the shared pool,
>>>>>> was the real fix.
>>>>>
>>>>> So that's what I'd suspect as well but I'm not able to reproduce
>>>>> even the original crash so I cannot verify this theory... What's the
>>>>> plan going forward? Will you re-apply the patch? Frankly, I don't
>>>>> care much, it was just a small cleanup. I'm just curious whether it
>>>>> was really that other bug or whether I missed something.
>>>>
>>>> Yes, I think re-applying would be the best way forward. Especially
>>>> since 2.13 was just released, we'll have a while to iron out
>>>> any issues. But I really don't see how it could be the reason for
>>>> the issue, I'm guessing it just exacerbated it somehow.
>>>>
>>>> --
>>>> Jens Axboe
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe fio" in the
>>>> body of a message to majordomo@vger.kernel.org More majordomo info at
>>>> http://vger.kernel.org/majordomo-info.html
>>>> Western Digital Corporation (and its subsidiaries) E-mail
>>>> Confidentiality Notice & Disclaimer:
>>>>
>>>> This e-mail and any files transmitted with it may contain
>>>> confidential or legally privileged information of WDC and/or its
>>>> affiliates, and are intended solely for the use of the individual or
>>>> entity to which they are addressed. If you are not the intended
>>>> recipient, any disclosure, copying, distribution or any action taken
>>>> or omitted to be taken in reliance on it, is prohibited. If you have
>>>> received this e-mail in error, please notify the sender immediately
>>>> and delete the e-mail in its entirety from your system.
>>>>
>>>
>>>
>>> --
>>> Jens Axboe
>>>
>>
>>
>> --
>> Jens Axboe
>>
>
>


-- 
Jens Axboe




* RE: fio signal 11
  2016-08-03 15:14                                 ` Jens Axboe
@ 2016-08-03 15:29                                   ` Jeff Furlong
  0 siblings, 0 replies; 28+ messages in thread
From: Jeff Furlong @ 2016-08-03 15:29 UTC (permalink / raw)
  To: Jens Axboe, Jan Kara; +Cc: Sitsofe Wheeler, fio

Indeed.  With fio-2.13-35-gf5a5 I ran this workload and the usual combinations of log_avg_msec and ramp_time, and all output looks right.  Thanks!

Regards,
Jeff
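For anyone re-verifying a fix like this later, a quick check is whether the timestamps in a log file are actually spaced log_avg_msec apart. A minimal sketch (the helper and sample lines are illustrative, not real fio output; fio log lines are comma-separated with the timestamp in milliseconds first):

```python
# Sketch: check that consecutive timestamps in an fio-style log are spaced
# roughly log_avg_msec apart. The sample lines below are made up.
def check_log_interval(lines, avg_msec, tolerance=0.5):
    times = [int(line.split(",")[0]) for line in lines if line.strip()]
    gaps = [b - a for a, b in zip(times, times[1:])]
    # An averaged log has gaps near avg_msec; an unaveraged one has tiny gaps.
    return all(abs(g - avg_msec) <= avg_msec * tolerance for g in gaps)

averaged   = ["1000, 512, 0, 4096", "2001, 498, 0, 4096", "3000, 505, 0, 4096"]
unaveraged = ["1000, 512, 0, 4096", "1001, 498, 0, 4096", "1002, 505, 0, 4096"]
print(check_log_interval(averaged, 1000))    # True
print(check_log_interval(unaveraged, 1000))  # False
```

With the ramp_time bug, a slat log would look like the second sample: one entry per I/O rather than one per averaging window.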

-----Original Message-----
From: Jens Axboe [mailto:axboe@kernel.dk] 
Sent: Wednesday, August 3, 2016 8:15 AM
To: Jeff Furlong <jeff.furlong@hgst.com>; Jan Kara <jack@suse.cz>
Cc: Sitsofe Wheeler <sitsofe@gmail.com>; fio@vger.kernel.org
Subject: Re: fio signal 11

OK, try current -git, I believe it should be fixed.
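For context on why allocating cur_log from a shared pool mattered in the earlier crash: after fork(), ordinary heap writes in a child are private copy-on-write pages, while a shared anonymous mapping stays visible to both processes. A hypothetical illustration (not fio's actual code):

```python
import os, mmap, struct

# Hypothetical illustration, not fio's code: a shared anonymous mapping
# survives fork(), while ordinary heap objects are copy-on-write private.
shared = mmap.mmap(-1, 8)   # anonymous MAP_SHARED mapping on Unix
private = [0]               # ordinary heap object

pid = os.fork()
if pid == 0:                # child: write to both
    private[0] = 42
    shared[:8] = struct.pack("q", 42)
    os._exit(0)
os.waitpid(pid, 0)

print(private[0])                          # 0  - the child's write was private
print(struct.unpack("q", shared[:8])[0])   # 42 - the shared write is visible
```

State that one process updates and another reads therefore has to live in a mapping like `shared` above, which is the shape of the cur_log fix being discussed.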


On 08/03/2016 08:03 AM, Jens Axboe wrote:
> I can reproduce it now. It seems to happen a bit randomly for me, and not
> all 4 slat logs are big; some of them are averaged fine. So it smells
> like a race in updating the logging after going out of ramp time.
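The scale of the failure in the report quoted below is easy to sanity-check with back-of-envelope numbers (the IOPS and bytes-per-line figures are assumptions for illustration):

```python
# With averaging working, a 10 s run at log_avg_msec=1000 should produce
# about one sample per second in each log file:
runtime_s, log_avg_msec = 10, 1000
expected_samples = runtime_s * 1000 // log_avg_msec
print(expected_samples)  # 10

# Without averaging, every I/O is logged. At an assumed 150K IOPS per job
# and ~30 bytes per log line, that is tens of MB per file in 10 seconds:
iops, bytes_per_line = 150_000, 30
log_bytes = iops * runtime_s * bytes_per_line
print(log_bytes // 1_000_000)  # ~45 MB
```

That order of magnitude matches the tens-of-MB slat logs reported, versus ~200-byte files for the correctly averaged logs.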
>
>
> On 08/03/2016 08:02 AM, Jeff Furlong wrote:
>> # fio -version
>> fio-2.13-28-g059b6
>>
>> # fio --name=test_job --ioengine=libaio --direct=1 --rw=randread
>> --iodepth=256 --size=100% --numjobs=4 --bs=4k --filename=/dev/nvme0n1 
>> --group_reporting --write_bw_log=test_job --write_iops_log=test_job 
>> --write_lat_log=test_job --log_avg_msec=1000 --disable_lat=0
>> --disable_clat=0 --disable_slat=0 --runtime=10s --time_based 
>> --output=test_job --ramp_time=1s
>>
>> # ls -l
>> -rw-r--r-- 1 root root     1748 Aug  3 07:57 test_job
>> -rw-r--r-- 1 root root      201 Aug  3 07:57 test_job_bw.1.log
>> -rw-r--r-- 1 root root      196 Aug  3 07:57 test_job_bw.2.log
>> -rw-r--r-- 1 root root      192 Aug  3 07:57 test_job_bw.3.log
>> -rw-r--r-- 1 root root      201 Aug  3 07:57 test_job_bw.4.log
>> -rw-r--r-- 1 root root      179 Aug  3 07:57 test_job_clat.1.log
>> -rw-r--r-- 1 root root      181 Aug  3 07:57 test_job_clat.2.log
>> -rw-r--r-- 1 root root      186 Aug  3 07:57 test_job_clat.3.log
>> -rw-r--r-- 1 root root      178 Aug  3 07:57 test_job_clat.4.log
>> -rw-r--r-- 1 root root      191 Aug  3 07:57 test_job_iops.1.log
>> -rw-r--r-- 1 root root      187 Aug  3 07:57 test_job_iops.2.log
>> -rw-r--r-- 1 root root      184 Aug  3 07:57 test_job_iops.3.log
>> -rw-r--r-- 1 root root      191 Aug  3 07:57 test_job_iops.4.log
>> -rw-r--r-- 1 root root      179 Aug  3 07:57 test_job_lat.1.log
>> -rw-r--r-- 1 root root      181 Aug  3 07:57 test_job_lat.2.log
>> -rw-r--r-- 1 root root      186 Aug  3 07:57 test_job_lat.3.log
>> -rw-r--r-- 1 root root      178 Aug  3 07:57 test_job_lat.4.log
>> -rw-r--r-- 1 root root 49597737 Aug  3 07:57 test_job_slat.1.log
>> -rw-r--r-- 1 root root 42065243 Aug  3 07:57 test_job_slat.2.log
>> -rw-r--r-- 1 root root 24407670 Aug  3 07:57 test_job_slat.3.log
>> -rw-r--r-- 1 root root 47090233 Aug  3 07:57 test_job_slat.4.log
>>
>> Above we can see the slat log files are huge for a 10s runtime.  Not
>> sure if it scales with total runtime or with the number of IOs during
>> that runtime.  I can also reproduce the issue when I reduce the runtime
>> to 1s.
>>
>> Regards,
>> Jeff
>>
>> -----Original Message-----
>> From: Jens Axboe [mailto:axboe@kernel.dk]
>> Sent: Wednesday, August 3, 2016 7:56 AM
>> To: Jeff Furlong <jeff.furlong@hgst.com>; Jan Kara <jack@suse.cz>
>> Cc: Sitsofe Wheeler <sitsofe@gmail.com>; fio@vger.kernel.org
>> Subject: Re: fio signal 11
>>
>> What version did you test? Works fine for me.
>>
>>
>> On 08/01/2016 03:57 PM, Jeff Furlong wrote:
>>> Sorry to open this item back up.  However, it appears when we add 
>>> the ramp_time option, we break the logging.  Specifically, slat will 
>>> log every entry, regardless of log_avg_msec.
>>>
>>> This example works as intended:
>>> # fio --name=test_job --ioengine=libaio --direct=1 --rw=randread
>>> --iodepth=256 --size=100% --numjobs=4 --bs=4k 
>>> --filename=/dev/nvme0n1 --group_reporting --write_bw_log=test_job 
>>> --write_iops_log=test_job --write_lat_log=test_job 
>>> --log_avg_msec=1000 --disable_lat=0
>>> --disable_clat=0 --disable_slat=0 --runtime=10s --time_based 
>>> --output=test_job
>>>
>>> This example is the same but adds a ramp_time; now the slat log is
>>> full of individual entries:
>>> # fio --name=test_job --ioengine=libaio --direct=1 --rw=randread
>>> --iodepth=256 --size=100% --numjobs=4 --bs=4k 
>>> --filename=/dev/nvme0n1 --group_reporting --write_bw_log=test_job 
>>> --write_iops_log=test_job --write_lat_log=test_job 
>>> --log_avg_msec=1000 --disable_lat=0
>>> --disable_clat=0 --disable_slat=0 --runtime=10s --time_based 
>>> --output=test_job --ramp_time=1s
>>>
>>> Regards,
>>> Jeff
>>>
>>> -----Original Message-----
>>> From: Jens Axboe [mailto:axboe@kernel.dk]
>>> Sent: Tuesday, July 26, 2016 11:35 AM
>>> To: Jeff Furlong <jeff.furlong@hgst.com>; Jan Kara <jack@suse.cz>
>>> Cc: Sitsofe Wheeler <sitsofe@gmail.com>; fio@vger.kernel.org
>>> Subject: Re: fio signal 11
>>>
>>> Perfect, thanks for testing!
>>>
>>>
>>> On 07/26/2016 12:33 PM, Jeff Furlong wrote:
>>>> FYI, with the patch back in version fio-2.13-1-gce8b, I re-ran my 
>>>> prior workload that caused the signal 11.  The workload now 
>>>> completes without issue.
>>>>
>>>> Regards,
>>>> Jeff
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: fio-owner@vger.kernel.org [mailto:fio-owner@vger.kernel.org] 
>>>> On Behalf Of Jens Axboe
>>>> Sent: Tuesday, July 26, 2016 7:17 AM
>>>> To: Jan Kara <jack@suse.cz>
>>>> Cc: Jeff Furlong <jeff.furlong@hgst.com>; Sitsofe Wheeler 
>>>> <sitsofe@gmail.com>; fio@vger.kernel.org
>>>> Subject: Re: fio signal 11
>>>>
>>>> On 07/26/2016 02:43 AM, Jan Kara wrote:
>>>>> On Mon 25-07-16 09:21:00, Jens Axboe wrote:
>>>>>> On 07/19/2016 11:08 PM, Jan Kara wrote:
>>>>>>> On Thu 16-06-16 09:06:51, Jens Axboe wrote:
>>>>>>>> On 06/15/2016 04:45 PM, Jan Kara wrote:
>>>>>>>>> On Sat 11-06-16 21:30:00, Jens Axboe wrote:
>>>>>>>>>> On 06/11/2016 08:56 PM, Jens Axboe wrote:
>>>>>>>>>>> On 06/10/2016 12:42 PM, Jeff Furlong wrote:
>>>>>>>>>>>> Good point.  Here is the trace:
>>>>>>>>>>>>
>>>>>>>>>>>> [New LWP 59231]
>>>>>>>>>>>> [Thread debugging using libthread_db enabled] Using host 
>>>>>>>>>>>> libthread_db library "/lib64/libthread_db.so.1".
>>>>>>>>>>>> Core was generated by `fio --name=test_job 
>>>>>>>>>>>> --ioengine=libaio
>>>>>>>>>>>> --direct=1 --rw=write --iodepth=32'.
>>>>>>>>>>>> Program terminated with signal 11, Segmentation fault.
>>>>>>>>>>>> #0  0x0000000000421e39 in regrow_log (iolog=0x7f828c0c5ad0) 
>>>>>>>>>>>> at
>>>>>>>>>>>> stat.c:1909
>>>>>>>>>>>> 1909        if (!cur_log) {
>>>>>>>>>>>>
>>>>>>>>>>>> (gdb) bt
>>>>>>>>>>>> #0  0x0000000000421e39 in regrow_log (iolog=0x7f828c0c5ad0) 
>>>>>>>>>>>> at
>>>>>>>>>>>> stat.c:1909
>>>>>>>>>>>> #1  0x000000000042d4df in regrow_logs
>>>>>>>>>>>> (td=td@entry=0x7f8277de0000) at
>>>>>>>>>>>> stat.c:1965
>>>>>>>>>>>> #2  0x000000000040ca90 in wait_for_completions 
>>>>>>>>>>>> (td=td@entry=0x7f8277de0000, 
>>>>>>>>>>>> time=time@entry=0x7fffcfb6b300) at
>>>>>>>>>>>> backend.c:446
>>>>>>>>>>>> #3  0x000000000045ade7 in do_io (bytes_done=<synthetic
>>>>>>>>>>>> pointer>,
>>>>>>>>>>>> td=0x7f8277de0000) at backend.c:991
>>>>>>>>>>>> #4  thread_main (data=data@entry=0x264d450) at 
>>>>>>>>>>>> backend.c:1667
>>>>>>>>>>>> #5  0x000000000045cfec in run_threads
>>>>>>>>>>>> (sk_out=sk_out@entry=0x0) at
>>>>>>>>>>>> backend.c:2217
>>>>>>>>>>>> #6  0x000000000045d2cd in fio_backend
>>>>>>>>>>>> (sk_out=sk_out@entry=0x0) at
>>>>>>>>>>>> backend.c:2349
>>>>>>>>>>>> #7  0x000000000040d09c in main (argc=22, 
>>>>>>>>>>>> argv=0x7fffcfb6f638, envp=<optimized out>) at fio.c:63
>>>>>>>>>>>
>>>>>>>>>>> That looks odd, thanks for reporting this. I'll see if I can 
>>>>>>>>>>> get to this on Monday, if not, it'll have to wait until 
>>>>>>>>>>> after my vacation... So while I appreciate people running 
>>>>>>>>>>> -git and finding issues like these before they show up in a 
>>>>>>>>>> release, it might be best to revert to 2.11 until I can
>>>>>>>>>>> get this debugged.
>>>>>>>>>>
>>>>>>>>>> I take that back - continue using -git! Just pull a fresh 
>>>>>>>>>> copy, should be fixed now.
>>>>>>>>>>
>>>>>>>>>> Jan, the reporter is right, 2.11 works and -git does not. So 
>>>>>>>>>> I just ran a quick bisect, changing the logging from every 
>>>>>>>>>> second to every 100ms to make it reproduce faster. I don't 
>>>>>>>>>> have time to look into why yet, so I just reverted the commit.
>>>>>>>>>>
>>>>>>>>>> commit d7982dd0ab2a1a315b5f9859c67a02414ce6274f
>>>>>>>>>> Author: Jan Kara <jack@suse.cz>
>>>>>>>>>> Date:   Tue May 24 17:03:21 2016 +0200
>>>>>>>>>>
>>>>>>>>>>    fio: Simplify forking of processes
>>>>>>>>>
>>>>>>>>> Hum, I've tried reproducing this but failed (I've tried using
>>>>>>>>> /dev/ram0 and
>>>>>>>>> /dev/sda4 as devices for fio). Is it somehow dependent on the 
>>>>>>>>> device fio works with? I have used commit
>>>>>>>>> 54d0a3150d44adca3ee4047fabd85651c6ea2db1 (just before you 
>>>>>>>>> reverted my
>>>>>>>>> patch) for testing.
>>>>>>>>
>>>>>>>> On vacation right now, I'll check when I get back. It is 
>>>>>>>> possible that it was just a fluke, since there was another bug 
>>>>>>>> there related to shared memory, but it was predictably crashing 
>>>>>>>> at the same time for the bisect.
>>>>>>>>
>>>>>>>> It doesn't make a lot of sense, however.
>>>>>>>
>>>>>>> Did you have a chance to look into this?
>>>>>>
>>>>>> I have not, unfortunately, but I suspect the original patch is fine
>>>>>> and that the later fix, allocating cur_log out of the shared pool,
>>>>>> was the real fix.
>>>>>
>>>>> So that's what I'd suspect as well but I'm not able to reproduce 
>>>>> even the original crash so I cannot verify this theory... What's 
>>>>> the plan going forward? Will you re-apply the patch? Frankly, I 
>>>>> don't care much, it was just a small cleanup. I'm just curious 
>>>>> whether it was really that other bug or whether I missed something.
>>>>
>>>> Yes, I think re-applying would be the best way forward. Especially 
>>>> since 2.13 was just released, we'll have a while to iron
>>>> out any issues. But I really don't see how it could be the reason 
>>>> for the issue, I'm guessing it just exacerbated it somehow.
>>>>
>>>> --
>>>> Jens Axboe
>>>>
>>>
>>>
>>> --
>>> Jens Axboe
>>>
>>
>>
>> --
>> Jens Axboe
>>
>
>


--
Jens Axboe





end of thread, other threads:[~2016-08-03 15:29 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-09 21:58 fio signal 11 Jeff Furlong
2016-06-10  7:17 ` Sitsofe Wheeler
2016-06-10 18:42   ` Jeff Furlong
2016-06-12  2:56     ` Jens Axboe
2016-06-12  3:30       ` Jens Axboe
2016-06-13 17:58         ` Jeff Furlong
2016-06-13 18:01           ` Jens Axboe
2016-06-13 18:04             ` Jeff Furlong
2016-06-13 19:21               ` Jeff Furlong
2016-06-13 19:23                 ` Jens Axboe
2016-06-13 19:34                   ` Jens Axboe
2016-06-13 20:55                     ` Jeff Furlong
2016-06-13 21:23                       ` Jens Axboe
2016-06-15 14:45         ` Jan Kara
2016-06-16  7:06           ` Jens Axboe
2016-07-20  5:08             ` Jan Kara
2016-07-25 15:21               ` Jens Axboe
2016-07-26  8:43                 ` Jan Kara
2016-07-26 14:17                   ` Jens Axboe
2016-07-26 18:33                     ` Jeff Furlong
2016-07-26 18:35                       ` Jens Axboe
2016-08-01 22:57                         ` Jeff Furlong
2016-08-02 21:03                           ` Jens Axboe
2016-08-03 14:55                           ` Jens Axboe
2016-08-03 15:02                             ` Jeff Furlong
2016-08-03 15:03                               ` Jens Axboe
2016-08-03 15:14                                 ` Jens Axboe
2016-08-03 15:29                                   ` Jeff Furlong
