linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] perf, tools, stat: Force C numeric locale for CSV mode
@ 2016-01-05 19:17 Andi Kleen
  2016-01-05 21:05 ` Jiri Olsa
  0 siblings, 1 reply; 6+ messages in thread
From: Andi Kleen @ 2016-01-05 19:17 UTC (permalink / raw)
  To: acme; +Cc: jolsa, mingo, linux-kernel, Andi Kleen

From: Andi Kleen <ak@linux.intel.com>

Some locales print floating point numbers with a comma instead of a dot.
This causes problems with CSV mode because it causes extra false CSV
fields. Force the numeric locale to be always C in CSV mode.

Before:

$ LC_ALL=pl_PL.utf8  perf stat -x, true
0,399472,,task-clock,399472,100,00			<---- extra bogus field
...

After:
$ LC_ALL=pl_PL.utf8  ./obj-perf/perf stat -x, true
0.338422,,task-clock,338422,100.00

Originally reported in https://github.com/andikleen/pmu-tools/issues/43

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/builtin-stat.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 9805e03..4d5e504 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1826,6 +1826,7 @@ int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
 		csv_output = true;
 		if (!strcmp(csv_sep, "\\t"))
 			csv_sep = "\t";
+		setlocale(LC_NUMERIC, "C");
 	} else
 		csv_sep = DEFAULT_SEPARATOR;
 
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] perf, tools, stat: Force C numeric locale for CSV mode
  2016-01-05 19:17 [PATCH] perf, tools, stat: Force C numeric locale for CSV mode Andi Kleen
@ 2016-01-05 21:05 ` Jiri Olsa
  2016-01-05 21:17   ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 6+ messages in thread
From: Jiri Olsa @ 2016-01-05 21:05 UTC (permalink / raw)
  To: Andi Kleen; +Cc: acme, jolsa, mingo, linux-kernel, Andi Kleen

On Tue, Jan 05, 2016 at 11:17:45AM -0800, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
> 
> Some locales print floating point numbers with a comma instead of a dot.
> This causes problems with CSV mode because it causes extra false CSV
> fields. Force the numeric locale to be always C in CSV mode.
> 
> Before:
> 
> $ LC_ALL=pl_PL.utf8  perf stat -x, true
> 0,399472,,task-clock,399472,100,00			<---- extra bogus field
> ...
> 
> After:
> $ LC_ALL=pl_PL.utf8  ./obj-perf/perf stat -x, true
> 0.338422,,task-clock,338422,100.00
> 
> Originally reported in https://github.com/andikleen/pmu-tools/issues/43
> 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>

Acked-by: Jiri Olsa <jolsa@kernel.org>

thanks,
jirka

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] perf, tools, stat: Force C numeric locale for CSV mode
  2016-01-05 21:05 ` Jiri Olsa
@ 2016-01-05 21:17   ` Arnaldo Carvalho de Melo
  2016-01-05 21:28     ` Andi Kleen
  0 siblings, 1 reply; 6+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-01-05 21:17 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Jiri Olsa, jolsa, mingo, linux-kernel, Andi Kleen

Em Tue, Jan 05, 2016 at 10:05:01PM +0100, Jiri Olsa escreveu:
> On Tue, Jan 05, 2016 at 11:17:45AM -0800, Andi Kleen wrote:
> > From: Andi Kleen <ak@linux.intel.com>
> > 
> > Some locales print floating point numbers with a comma instead of a dot.
> > This causes problems with CSV mode because it causes extra false CSV
> > fields. Force the numeric locale to be always C in CSV mode.
> > 
> > Before:
> > 
> > $ LC_ALL=pl_PL.utf8  perf stat -x, true
> > 0,399472,,task-clock,399472,100,00			<---- extra bogus field
> > ...
> > 
> > After:
> > $ LC_ALL=pl_PL.utf8  ./obj-perf/perf stat -x, true
> > 0.338422,,task-clock,338422,100.00
> > 
> > Originally reported in https://github.com/andikleen/pmu-tools/issues/43
> > 
> > Signed-off-by: Andi Kleen <ak@linux.intel.com>
> 
> Acked-by: Jiri Olsa <jolsa@kernel.org>

I wonder what is that other tools do when stumbling on this, i.e.
some other tool output that produces values that have the CSV character
in it...

Completely disabling the configured locale seems too harsh to me, aren't
people used to changing the csv char via some option like we have in
'perf stat':

    -x, --field-separator

when changing the locale from the default 'C' one? Hey, you even used it
above, but you chose a CSV char that is used in this locale, oops ;-)

- Arnaldo

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] perf, tools, stat: Force C numeric locale for CSV mode
  2016-01-05 21:17   ` Arnaldo Carvalho de Melo
@ 2016-01-05 21:28     ` Andi Kleen
  2016-01-05 23:27       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 6+ messages in thread
From: Andi Kleen @ 2016-01-05 21:28 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Andi Kleen, Jiri Olsa, jolsa, mingo, linux-kernel, Andi Kleen

On Tue, Jan 05, 2016 at 06:17:57PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Tue, Jan 05, 2016 at 10:05:01PM +0100, Jiri Olsa escreveu:
> > On Tue, Jan 05, 2016 at 11:17:45AM -0800, Andi Kleen wrote:
> > > From: Andi Kleen <ak@linux.intel.com>
> > > 
> > > Some locales print floating point numbers with a comma instead of a dot.
> > > This causes problems with CSV mode because it causes extra false CSV
> > > fields. Force the numeric locale to be always C in CSV mode.
> > > 
> > > Before:
> > > 
> > > $ LC_ALL=pl_PL.utf8  perf stat -x, true
> > > 0,399472,,task-clock,399472,100,00			<---- extra bogus field
> > > ...
> > > 
> > > After:
> > > $ LC_ALL=pl_PL.utf8  ./obj-perf/perf stat -x, true
> > > 0.338422,,task-clock,338422,100.00
> > > 
> > > Originally reported in https://github.com/andikleen/pmu-tools/issues/43
> > > 
> > > Signed-off-by: Andi Kleen <ak@linux.intel.com>
> > 
> > Acked-by: Jiri Olsa <jolsa@kernel.org>
> 
> I wonder what is that other tools do when stumbling on this, i.e.
> some other tool output that produces values that have the CSV character
> in it...

Proper CSV supports escaping the separator by putting the whole field
into quotes. Unfortunately perf stat doesn't output proper CSV,
the event fields with commas are not quoted.

I usually work around it by using -x\; instead

But the , problem should be still fixed.

> 
> Completely disabling the configured locale seems too harsh to me, aren't
> people used to changing the csv char via some option like we have in
> 'perf stat':
> 
>     -x, --field-separator
> 
> when changing the locale from the default 'C' one? Hey, you even used it
> above, but you chose a CSV char that is used in this locale, oops ;-)

It's just for numbers (LC_NUMERIC), everything else is still localized.

-Andi

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] perf, tools, stat: Force C numeric locale for CSV mode
  2016-01-05 21:28     ` Andi Kleen
@ 2016-01-05 23:27       ` Arnaldo Carvalho de Melo
  2016-01-06  1:37         ` Andi Kleen
  0 siblings, 1 reply; 6+ messages in thread
From: Arnaldo Carvalho de Melo @ 2016-01-05 23:27 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Jiri Olsa, jolsa, mingo, linux-kernel, Andi Kleen

Em Tue, Jan 05, 2016 at 10:28:39PM +0100, Andi Kleen escreveu:
> On Tue, Jan 05, 2016 at 06:17:57PM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Tue, Jan 05, 2016 at 10:05:01PM +0100, Jiri Olsa escreveu:
> > > On Tue, Jan 05, 2016 at 11:17:45AM -0800, Andi Kleen wrote:
> > > > From: Andi Kleen <ak@linux.intel.com>
> > > > 
> > > > Some locales print floating point numbers with a comma instead of a dot.
> > > > This causes problems with CSV mode because it causes extra false CSV
> > > > fields. Force the numeric locale to be always C in CSV mode.
> > > > 
> > > > Before:
> > > > 
> > > > $ LC_ALL=pl_PL.utf8  perf stat -x, true
> > > > 0,399472,,task-clock,399472,100,00			<---- extra bogus field
> > > > ...
> > > > 
> > > > After:
> > > > $ LC_ALL=pl_PL.utf8  ./obj-perf/perf stat -x, true
> > > > 0.338422,,task-clock,338422,100.00
> > > > 
> > > > Originally reported in https://github.com/andikleen/pmu-tools/issues/43
> > > > 
> > > > Signed-off-by: Andi Kleen <ak@linux.intel.com>
> > > 
> > > Acked-by: Jiri Olsa <jolsa@kernel.org>
> > 
> > I wonder what is that other tools do when stumbling on this, i.e.
> > some other tool output that produces values that have the CSV character
> > in it...
> 
> Proper CSV supports escaping the separator by putting the whole field
> into quotes. Unfortunately perf stat doesn't output proper CSV,
> the event fields with commas are not quoted.

Right, there is even an RFC for CSV, and for a decade already :-)

 https://tools.ietf.org/html/rfc4180

> I usually work around it by using -x\; instead
> 
> But the , problem should be still fixed.

Humm, what is the problem then of doing, for example in my case, with a
LC_ALL=pt_BR, that uses commans as the decimal separator:

LC_ALL=C ./obj-perf/perf stat -x, true

I.e. disabling it using the existing shell mecanism?

[root@zoo ~]# export LC_ALL=pt_BR
[root@zoo ~]# perf stat -e cycles -x, usleep 1
1068190,,cycles,1007049,100,00

Bad, but its a side effect of needing to use a locale that uses the CSV
separator in the decimal separator, disabling it for commands where I
want that CSV separator does the trick:

[root@zoo ~]# LC_ALL= perf stat -e cycles -x, usleep 1
895081,,cycles,840687,100.00
[root@zoo ~]#

It is an inconvenience, yeah, having to prefix that for tools where I
want to use the comma for the CSV separator, but disabling LC_NUMERIC,
albeit better than my what I misparsed at first (forcing all locale to
'C') still looks too harsh :-\

Using -x\; looks sane and shorter tho, perhaps even -x:, to save one
extra char.

- Arnaldo
 
> > Completely disabling the configured locale seems too harsh to me, aren't
> > people used to changing the csv char via some option like we have in
> > 'perf stat':
> > 
> >     -x, --field-separator
> > 
> > when changing the locale from the default 'C' one? Hey, you even used it
> > above, but you chose a CSV char that is used in this locale, oops ;-)
> 
> It's just for numbers (LC_NUMERIC), everything else is still localized.
> 
> -Andi

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] perf, tools, stat: Force C numeric locale for CSV mode
  2016-01-05 23:27       ` Arnaldo Carvalho de Melo
@ 2016-01-06  1:37         ` Andi Kleen
  0 siblings, 0 replies; 6+ messages in thread
From: Andi Kleen @ 2016-01-06  1:37 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Andi Kleen, Jiri Olsa, jolsa, mingo, linux-kernel, Andi Kleen

> > I usually work around it by using -x\; instead
> > 
> > But the , problem should be still fixed.
> 
> Humm, what is the problem then of doing, for example in my case, with a
> LC_ALL=pt_BR, that uses commans as the decimal separator:

It's user unfriendly and unobvious. Also you end up with subtly broken
files,. And it would also change the locale of the measured program which
may not be intended.

Plus the floating point values with comma cannot be parsed by programs
that don't know your locale (that was the problem with pmu-tools)

> Using -x\; looks sane and shorter tho, perhaps even -x:, to save one
> extra char.

Even with that there is the problem that you end up with numbers that
cannot be parsed by locale unaware programs.

CSV is intended for other programs so it shouldn't be messed up like
this.

-Andi

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-01-06  1:37 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-05 19:17 [PATCH] perf, tools, stat: Force C numeric locale for CSV mode Andi Kleen
2016-01-05 21:05 ` Jiri Olsa
2016-01-05 21:17   ` Arnaldo Carvalho de Melo
2016-01-05 21:28     ` Andi Kleen
2016-01-05 23:27       ` Arnaldo Carvalho de Melo
2016-01-06  1:37         ` Andi Kleen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).