lttng-dev.lists.lttng.org archive mirror
* Large number of stream files in CTF trace -- too many file handles
@ 2020-03-13 21:55 Rocky Dunlap via lttng-dev
  2020-03-16 14:51 ` Jonathan Rajotte-Julien via lttng-dev
  0 siblings, 1 reply; 3+ messages in thread
From: Rocky Dunlap via lttng-dev @ 2020-03-13 21:55 UTC (permalink / raw)
  To: lttng-dev



I am attempting to use babeltrace2 to read a CTF trace that has ~2000
stream files.  This is a custom trace collected from an MPI application on
an HPC platform.  In this case, each MPI process opens and writes to its
own stream file, so you end up with one file per MPI task.

When I attempt to read the trace from the command line with babeltrace2, I
see the following error:

ERROR:    [Babeltrace CLI] (babeltrace2.c:2548)
  Graph failed to complete successfully
CAUSED BY [libbabeltrace2] (graph.c:473)
  Component's "consume" method failed: status=ERROR, comp-addr=0x1beab20,
comp-name="pretty", comp-log-level=WARNING, comp-class-type=SINK,
  comp-class-name="pretty", comp-class-partial-descr="Pretty-print messages
(`text` fo", comp-class-is-frozen=0,
  comp-class-so-handle-addr=0x174fc10,
comp-class-so-handle-path="/usr/lib/x86_64-linux-gnu/babeltrace2/plugins/babeltrace-plugin-text.so",
  comp-input-port-count=1, comp-output-port-count=0
CAUSED BY [libbabeltrace2] (iterator.c:864)
  Component input port message iterator's "next" method failed:
iter-addr=0x1c7cec0, iter-upstream-comp-name="muxer",
  iter-upstream-comp-log-level=WARNING,
iter-upstream-comp-class-type=FILTER, iter-upstream-comp-class-name="muxer",
  iter-upstream-comp-class-partial-descr="Sort messages from multiple
inpu", iter-upstream-port-type=OUTPUT, iter-upstream-port-name="out",
  status=ERROR
CAUSED BY [muxer: 'filter.utils.muxer'] (muxer.c:991)
  Cannot validate muxer's upstream message iterator wrapper:
muxer-msg-iter-addr=0x1c7d030, muxer-upstream-msg-iter-wrap-addr=0x1e23430
CAUSED BY [muxer: 'filter.utils.muxer'] (muxer.c:454)
  Upstream iterator's next method returned an error: status=ERROR
CAUSED BY [libbabeltrace2] (iterator.c:864)
  Component input port message iterator's "next" method failed:
iter-addr=0x1e22f00, iter-upstream-comp-name="auto-disc-source-ctf-fs",
  iter-upstream-comp-log-level=WARNING,
iter-upstream-comp-class-type=SOURCE, iter-upstream-comp-class-name="fs",
  iter-upstream-comp-class-partial-descr="Read CTF traces from the file
sy", iter-upstream-port-type=OUTPUT,
  iter-upstream-port-name="21c4e078-a5c7-11e8-8529-34f39aeaad30 | 0 |
/home/rocky/tmp/fv3/wave/traceout/esmf_stream_1020", status=ERROR
CAUSED BY [auto-disc-source-ctf-fs (21c4e078-a5c7-11e8-8529-34f39aeaad30 |
0 | /home/rocky/tmp/fv3/wave/traceout/esmf_stream_1020): 'source.ctf.fs']
(fs.c:109)
  Failed to get next message from CTF message iterator.
CAUSED BY [auto-disc-source-ctf-fs: 'source.ctf.fs'] (msg-iter.c:2899)
  Cannot handle state: msg-it-addr=0x1e230f0, state=SWITCH_PACKET
CAUSED BY [auto-disc-source-ctf-fs (21c4e078-a5c7-11e8-8529-34f39aeaad30 |
0 | /home/rocky/tmp/fv3/wave/traceout/esmf_stream_1020): 'source.ctf.fs']
(data-stream-file.c:385)
  failed to create ctf_fs_ds_file.
CAUSED BY [auto-disc-source-ctf-fs: 'source.ctf.fs'] (file.c:98)
  Cannot open file: Too many open files:
path=/home/rocky/tmp/fv3/wave/traceout/esmf_stream_1020, mode=rb

No doubt the issue is the large number of file handles.

I see a similar error when I try to use bt2.TraceCollectionMessageIterator.

Having so many stream files is probably somewhat non-standard.  But it
works quite well to write them out this way on an HPC system -- combining
the streams during the application run would require MPI communication,
which would degrade performance and make the tracing more complicated.

But now that I have the streams and am seeing the "too many open files"
system error, I am thinking maybe I should post-process the streams down
from 2000 to a much smaller number, maybe 20, where 100 of the original
streams are merged into each output stream.  The good news is that none of
the streams is very big, so the overall trace size should be manageable.

If this is the right approach, then what would be the best way to
post-process these streams down to a smaller number of files?

If this is not the right approach, how should I proceed?  E.g., should the
source-ctf-fs manage a limited pool of file handles?  I would think this
would be pretty inefficient, as you would need to constantly open and
close files, which is expensive.

Any help is appreciated!

Rocky


_______________________________________________
lttng-dev mailing list
lttng-dev@lists.lttng.org
https://lists.lttng.org/cgi-bin/mailman/listinfo/lttng-dev


* Re: Large number of stream files in CTF trace -- too many file handles
  2020-03-13 21:55 Large number of stream files in CTF trace -- too many file handles Rocky Dunlap via lttng-dev
@ 2020-03-16 14:51 ` Jonathan Rajotte-Julien via lttng-dev
  2020-03-17  3:38   ` Rocky Dunlap via lttng-dev
  0 siblings, 1 reply; 3+ messages in thread
From: Jonathan Rajotte-Julien via lttng-dev @ 2020-03-16 14:51 UTC (permalink / raw)
  To: Rocky Dunlap; +Cc: lttng-dev

Hi,

> If this is not the right approach, how should I proceed?  E.g., should the
> source-ctf-fs manage a limited pool of file handles?  I would think this
> would be pretty inefficient as you would need to constantly open/close
> files--expensive.

I would probably start by looking at the soft and hard limits on open
files for the babeltrace2 process:

On my machine:

joraj@~[]$ ulimit -Sn
1024

joraj@~[]$ ulimit -Hn
1048576

That is a lot of headroom.

I might have a setting somewhere increasing the base hard limit, but in
any case this will show you how much room you have.

Given the number of streams you have, I would say that you will need a
base soft limit of more than 2000 for this trace.
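
If adjusting the limit with ulimit before each run is inconvenient (e.g.
for a script using the bt2 Python bindings), the soft limit can also be
raised from inside the Python process before opening the trace.  A
minimal sketch -- the 4096 target is illustrative, not a recommendation:

```python
# Raise this process's soft open-file limit (RLIMIT_NOFILE) toward the
# hard limit before opening a trace with ~2000 stream files.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
needed = 4096  # illustrative: comfortably above ~2000 stream files

if soft < needed:
    # An unprivileged process may raise its soft limit up to the hard limit.
    target = needed if hard == resource.RLIM_INFINITY else min(needed, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))

print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])
```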

We do have an FD pooling system in place in another project we maintain
(lttng-tools [1], GPL-2.0) that might be pertinent for babeltrace2 at some
point.  As for the overhead that would occur in a scenario without enough
FDs available, I think it is a good compromise between reading a trace
with some overhead and not reading it at all.  A warning informing the
user that the pool limit has been reached might be a good start in such a
case.

[1] https://github.com/lttng/lttng-tools/tree/master/src/common/fd-tracker
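
For illustration only, such a pool can be sketched as a small LRU cache
of open file objects that remembers each file's offset across evictions.
This is a toy Python sketch of the general idea, not the lttng-tools
fd-tracker API:

```python
# Toy LRU pool of file handles: at most `capacity` files are open at
# once; an evicted file's offset is saved so it can later be reopened
# and resumed transparently.
from collections import OrderedDict

class FdPool:
    def __init__(self, capacity):
        self.capacity = capacity
        self._files = OrderedDict()  # path -> open file, in LRU order
        self._offsets = {}           # saved offsets of evicted files

    def _get(self, path):
        if path in self._files:
            self._files.move_to_end(path)  # mark as most recently used
            return self._files[path]
        if len(self._files) >= self.capacity:
            # Evict the least recently used file, remembering its offset.
            victim_path, victim = self._files.popitem(last=False)
            self._offsets[victim_path] = victim.tell()
            victim.close()
        f = open(path, "rb")
        f.seek(self._offsets.pop(path, 0))  # resume where we left off
        self._files[path] = f
        return f

    def read(self, path, size):
        return self._get(path).read(size)

    def close(self):
        for f in self._files.values():
            f.close()
        self._files.clear()
```

The cost is the extra open/close/seek churn once the working set exceeds
the pool capacity, which is exactly the overhead trade-off described
above.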

Cheers

-- 
Jonathan Rajotte-Julien
EfficiOS


* Re: Large number of stream files in CTF trace -- too many file handles
  2020-03-16 14:51 ` Jonathan Rajotte-Julien via lttng-dev
@ 2020-03-17  3:38   ` Rocky Dunlap via lttng-dev
  0 siblings, 0 replies; 3+ messages in thread
From: Rocky Dunlap via lttng-dev @ 2020-03-17  3:38 UTC (permalink / raw)
  To: Jonathan Rajotte-Julien; +Cc: lttng-dev



Jonathan,

Increasing the soft FD limit worked great, both for the command line and
the Python script.  Thanks for the help!

Rocky




