linux-trace-users.vger.kernel.org archive mirror
* cannot run trace-cmd split in parallel
@ 2023-07-09 15:39 Sharon Gabay
       [not found] ` <20230709151604.73725086@rorschach.local.home>
  0 siblings, 1 reply; 3+ messages in thread
From: Sharon Gabay @ 2023-07-09 15:39 UTC (permalink / raw)
  To: linux-trace-users

Hi!

I've had this very strange issue for as long as I've been using "trace-cmd split". I didn't report it until now because it's so strange that I was sure the user was to blame 😊

When I use "trace-cmd split" in parallel, I get randomly invalid output. This happens specifically when I use the start/end arguments.

To reproduce, take any trace.dat (as far as I can tell), and run the following command, which is 5 nearly identical command lines run in parallel. The only difference is in the start/end arguments. Without this difference, the issue does not reproduce.

trace-cmd split -i trace.dat -o /tmp/out1 <start> <end> &
trace-cmd split -i trace.dat -o /tmp/out2 <start+1> <end+1> &
trace-cmd split -i trace.dat -o /tmp/out3 <start+2> <end+2> &
trace-cmd split -i trace.dat -o /tmp/out4 <start+3> <end+3> &
trace-cmd split -i trace.dat -o /tmp/out5 <start> <end>

The first and last commands take exactly the same <start> and <end> arguments, yet if you compare their output, it differs:
diff /tmp/out1.1 /tmp/out5.1

But it's not consistent: every run behaves differently, so you might need a few runs to hit this. It might also not happen at all, I guess. Statistics.

You can expect the two outputs to differ if you see this in stdout/stderr:
…
libtracecmd: No such file or directory
  can not stat '/tmp/.tmp.tmp.0'
trace-cmd: No such file or directory
  Failed to append tracing data

libtracecmd: No such file or directory
  can not stat '/tmp/.tmp.tmp.0'
trace-cmd: No such file or directory
  Failed to append tracing data

It looks as if trace-cmd is using some file to store temporary data, and the same filename is used by all processes.

Can anyone help me understand this weird behavior?

Thanks!
Sharon



* RE: cannot run trace-cmd split in parallel
       [not found] ` <20230709151604.73725086@rorschach.local.home>
@ 2023-07-10  6:44   ` Sharon Gabay
  2023-07-10 15:00     ` Steven Rostedt
  0 siblings, 1 reply; 3+ messages in thread
From: Sharon Gabay @ 2023-07-10  6:44 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: linux-trace-users

Hi!

Running in parallel is needed because I'm using trace-cmd split to split a big job (analyzing multiple frames) between several processes to speed it up.

Unfortunately I don't currently build trace-cmd, so I can't try the patch, but writing this email made me think of another solution: I now specify output files in separate temporary directories (/tmp/1, /tmp/2 ...), and it works perfectly!

I think it would be useful to have two fixes:
- Make trace-cmd create the temporary output ("tmp.0.0") using either a uuid or the name of the output file itself, or maybe add some suffix to it. In short, avoid collisions.
- I'm not sure why, but "trace-cmd split -o /tmp/a" actually writes to /tmp/a.1. If possible, it would be best to write to the exact name specified by the user.

Thanks!
Sharon

-----Original Message-----
From: Steven Rostedt <rostedt@goodmis.org> 
Sent: Sunday, 09 July 2023 22:16
To: Sharon Gabay <Sharon.Gabay@mobileye.com>
Cc: "linux-trace-users@vger.kernel.org" <linux-trace-users@vger.kernel.org>
Subject: Re: cannot run trace-cmd split in parallel

On Sun, 9 Jul 2023 15:39:46 +0000
Sharon Gabay <Sharon.Gabay@mobileye.com> wrote:

> Hi!
> 
> I've been having this very strange issue for as long as I've been 
> using "trace-cmd split". Actually I didn't write about it until now 
> because it's so strange I was sure the blame is on the user 😊
> 
> When I use "trace-cmd split" in parallel, I get randomly invalid 
> output. This happens specifically when I use the start/end arguments.

I have to admit that I never thought about running it in parallel.

> 
> To reproduce, take any trace.dat (as far as I can tell), and run the 
> following command, which is 5 nearly identical command lines run in 
> parallel. The only difference is in the start/end arguments. Without 
> this difference, the issue does not reproduce.

> 
> trace-cmd split -i trace.dat -o /tmp/out1 <start> <end> & trace-cmd 
> split -i trace.dat -o /tmp/out2 <start+1> <end+1> & trace-cmd split -i 
> trace.dat -o /tmp/out3 <start+2> <end+2> & trace-cmd split -i 
> trace.dat -o /tmp/out4 <start+3> <end+3> & trace-cmd split -i 
> trace.dat -o /tmp/out5 <start> <end>
> 
> If you compare the output of the first and last command, which are in 
> bold and you can see they are the exact same, the output is different. 
> diff /tmp/out1.1 /tmp/out5.1
> 
> But it's not consistent, every run will behave differently, so you 
> might need few runs to get this. It might also not happen at all, I 
> guess. Statistics.
> 
> You can expect the diff to be different if you see this in the
> stdout/stderr:

> …
> libtracecmd: No such file or directory
>   can not stat '/tmp/.tmp.tmp.0'
> trace-cmd: No such file or directory
>   Failed to append tracing data
> 
> libtracecmd: No such file or directory
>   can not stat '/tmp/.tmp.tmp.0'
> trace-cmd: No such file or directory
>   Failed to append tracing data
> 

> It looks like as if trace-cmd is using some file to store temporary 
> data, and the same filename is used by all processes.

It is supposed to base the temp file names on the output file, but it appears that I got the dirname() and basename() calls backwards: dirname() truncated the output path, so the basename() call that followed returned the last component of the directory instead of the file name. This causes all the temp files to be named after the directory, and you will hit this conflict if your output files share the same directory!

> 
> Can anyone help me understand this weird behavior?
> 

Can you try this patch to see if it fixes your situation?

-- Steve

diff --git a/tracecmd/trace-split.c b/tracecmd/trace-split.c
index 1daa847d..57c4e64f 100644
--- a/tracecmd/trace-split.c
+++ b/tracecmd/trace-split.c
@@ -367,8 +367,8 @@ static double parse_file(struct tracecmd_input *handle,
 	int fd;
 
 	output = strdup(output_file);
-	dir = dirname(output);
 	base = basename(output);
+	dir = dirname(output);
 
 	ohandle = tracecmd_copy(handle, output_file, TRACECMD_FILE_CMD_LINES, 0, NULL);
 



* Re: cannot run trace-cmd split in parallel
  2023-07-10  6:44   ` Sharon Gabay
@ 2023-07-10 15:00     ` Steven Rostedt
  0 siblings, 0 replies; 3+ messages in thread
From: Steven Rostedt @ 2023-07-10 15:00 UTC (permalink / raw)
  To: Sharon Gabay; +Cc: linux-trace-users

On Mon, 10 Jul 2023 06:44:53 +0000
Sharon Gabay <Sharon.Gabay@mobileye.com> wrote:

> I think it would be useful to have two fixes:
> - make trace-cmd create the temporary output ("tmp.0.0") using either a
> uuid, or the name of the output file itself, or maybe add some suffix to
> it. In short, avoid collisions.

And the patch pretty much does the above ;-)

> - I'm not sure why but "trace-cmd split -o /tmp/a" will actually write to
> /tmp/a.1, if possible it would be best to write to the exact name
> specified by the user.

That was partially due to the default of the output file being the same as
the input file, and the '.1' made sure that the two did not collide.

But I can modify it to do the following:

diff --git a/tracecmd/trace-split.c b/tracecmd/trace-split.c
index 1daa847d9775..59df1d02b345 100644
--- a/tracecmd/trace-split.c
+++ b/tracecmd/trace-split.c
@@ -545,7 +545,7 @@ void trace_split (int argc, char **argv)
 	if (!output)
 		output = strdup(input_file);
 
-	if (!repeat) {
+	if (!repeat && strcmp(output, input_file) == 0) {
 		output = realloc(output, strlen(output) + 3);
 		strcat(output, ".1");
 	}

That is, only add the '.1' if the output file is the same as the input file.

-- Steve

