* col issue
@ 2017-03-28 12:16 Karel Zak
  2017-03-28 19:32 ` Ruediger Meier
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Karel Zak @ 2017-03-28 12:16 UTC (permalink / raw)
  To: util-linux; +Cc: Sami Kerola


 Hi,

 see https://bugzilla.redhat.com/show_bug.cgi?id=1436432

 Any idea what the right col(1) behavior is for escape sequences?

 The current code reads the first two bytes of the sequence and
 interprets the rest as ordinary characters (complex sequences like
 ^[..m are completely unknown to the code). For example, the input:

    ^[[1mtomcat-el^[(B^[[m

 produces:

    1mtomcat-elBm

 This seems incorrect. I think for "col -p" the whole sequence should be
 filtered out, that is:
   
    tomcat-el
 
 and the default behavior (without -p) should be to output all escape
 sequences without incrementing the internal width counters.
 
 Objections?

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com
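
The behavior described above can be reproduced with a minimal sketch.
This is not the col(1) source: the function names and the regular
expression are assumptions made for illustration. The first function
drops ESC plus the single byte that follows it, as the message says the
current code does; the second removes whole CSI sequences (such as
^[[1m) and charset-designation sequences (such as ^[(B).

```python
import re

ESC = "\x1b"


def strip_two_bytes(text: str) -> str:
    """Mimic the reported behavior: consume ESC and one following byte only."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == ESC:
            i += 2  # ESC and the next byte are swallowed, the rest leaks through
            continue
        out.append(text[i])
        i += 1
    return "".join(out)


# Hypothetical pattern: CSI sequences (ESC [ parameters intermediates final)
# or ESC followed by intermediate bytes and a final byte (e.g. ESC ( B).
ESCAPE_SEQ = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]|\x1b[ -/]+[0-~]")


def strip_sequences(text: str) -> str:
    """Remove each matched escape sequence as a whole."""
    return ESCAPE_SEQ.sub("", text)


sample = "\x1b[1mtomcat-el\x1b(B\x1b[m"
print(strip_two_bytes(sample))  # 1mtomcat-elBm
print(strip_sequences(sample))  # tomcat-el
```

The pattern covers only the two sequence shapes that appear in the
example input; other escape sequence families would need their own
handling.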


* Re: col issue
  2017-03-28 12:16 col issue Karel Zak
@ 2017-03-28 19:32 ` Ruediger Meier
  2017-03-28 21:38 ` Sami Kerola
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Ruediger Meier @ 2017-03-28 19:32 UTC (permalink / raw)
  To: Karel Zak; +Cc: util-linux, Sami Kerola

On Tuesday 28 March 2017, Karel Zak wrote:
>  Hi,
>
>  see https://bugzilla.redhat.com/show_bug.cgi?id=1436432
>
>  any idea what is the right col(1) behavior for escape sequences?

I have no opinion and no clue about col but still one comment :)

While you are looking at this issue, maybe it's related to some test
failures on OSX and FreeBSD. In .travis-functions we have disabled
certain tests for OSX:

    export TS_OPT_col_multibyte_known_fail=yes
    export TS_OPT_colcrt_regressions_known_fail=yes
    export TS_OPT_column_invalid_multibyte_known_fail=yes

cu,
Rudi


* Re: col issue
  2017-03-28 12:16 col issue Karel Zak
  2017-03-28 19:32 ` Ruediger Meier
@ 2017-03-28 21:38 ` Sami Kerola
  2017-03-29  9:41   ` Karel Zak
  2017-03-29  2:57 ` Pádraig Brady
  2017-03-29 20:56 ` J William Piggott
  3 siblings, 1 reply; 7+ messages in thread
From: Sami Kerola @ 2017-03-28 21:38 UTC (permalink / raw)
  To: Karel Zak; +Cc: util-linux

On 28 March 2017 at 13:16, Karel Zak <kzak@redhat.com> wrote:
>  see https://bugzilla.redhat.com/show_bug.cgi?id=1436432
>
>  any idea what is the right col(1) behavior for escape sequences?
>
>  The current code reads two first bytes from the sequence and the rest
>  is interpreted as standard chars (because complex sequences like
>  ^[..m are completely unknown for the code), for example input:
>
>     ^[[1mtomcat-el^[(B^[[m
>
>  produces:
>
>     1mtomcat-elBm
>
>  It seems incorrect. I think for "col -p" all the sequence should be
>  filtered out, it means:
>
>     tomcat-el
>
>  and the default behavior (without -p) should be output all escape
>  sequences but do not increment internal width counters.
>
>  Objections?

This is what Open Group[1] has to say about col(1) input handling.

On input, the only control characters accepted are space, backspace, tab,
carriage-return and newline characters, SI, SO, VT, reverse line-feed,
forward half-line-feed and reverse half-line-feed.  The VT character
is an alternative form of full reverse line-feed, included for
compatibility with some earlier programs of this type.  The only
other characters to be copied to the output are those that are printable.

The last sentence is pretty clear that control characters must be removed.
I am not sure whether the definition was meant to include control sequences,
but that feels like the spirit of it.  Maybe a silly question: how do we
choose which control sequences are recognised?  Maybe ECMA-48, VT100, and
Unicode.

[1] http://pubs.opengroup.org/onlinepubs/7908799/xcu/col.html

-- 
Sami Kerola
http://www.iki.fi/kerolasa/
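
A literal reading of the quoted paragraph can be sketched as a
per-character filter. This is only an illustration under assumptions:
the function name is made up, and the ESC 7 / ESC 8 / ESC 9 encodings
for the reverse and half line-feeds are taken from the col(1) manual
page rather than from the text quoted above. It shows why the
character-level rule does not settle the question of sequences: ESC
itself is dropped, but the printable bytes that follow it are copied.

```python
ESC = "\x1b"

# Space, backspace, tab, carriage-return, newline, SI, SO, VT.
ALLOWED_CONTROLS = {" ", "\b", "\t", "\r", "\n", "\x0f", "\x0e", "\x0b"}

# Assumed encodings of reverse line-feed and the half line-feeds.
LINE_MOTION = {"7", "8", "9"}


def filter_literal(text: str) -> str:
    """Keep accepted control characters and printable characters only."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == ESC:
            nxt = text[i + 1:i + 2]  # empty string at end of input
            if nxt in LINE_MOTION:
                out.append(ch + nxt)  # recognised line motion is kept
                i += 2
            else:
                i += 1  # a lone ESC is a control character, so it is dropped
            continue
        if ch in ALLOWED_CONTROLS or ch.isprintable():
            out.append(ch)
        i += 1
    return "".join(out)


print(filter_literal("\x1b[1mtomcat-el\x1b(B\x1b[m"))  # [1mtomcat-el(B[m
```

Applied to the example from the bug report, the remainder of each
sequence survives as ordinary text, so a rule about sequences is still
needed on top of the rule about characters.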


* Re: col issue
  2017-03-28 12:16 col issue Karel Zak
  2017-03-28 19:32 ` Ruediger Meier
  2017-03-28 21:38 ` Sami Kerola
@ 2017-03-29  2:57 ` Pádraig Brady
  2017-03-29  9:35   ` Karel Zak
  2017-03-29 20:56 ` J William Piggott
  3 siblings, 1 reply; 7+ messages in thread
From: Pádraig Brady @ 2017-03-29  2:57 UTC (permalink / raw)
  To: Karel Zak, util-linux; +Cc: Sami Kerola

On 28/03/17 05:16, Karel Zak wrote:
> 
>  Hi,
> 
>  see https://bugzilla.redhat.com/show_bug.cgi?id=1436432
> 
>  any idea what is the right col(1) behavior for escape sequences?
> 
>  The current code reads two first bytes from the sequence and the rest
>  is interpreted as standard chars (because complex sequences like
>  ^[..m are completely unknown for the code), for example input:
> 
>     ^[[1mtomcat-el^[(B^[[m
> 
>  produces:
> 
>     1mtomcat-elBm
> 
>  It seems incorrect. I think for "col -p" all the sequence should be
>  filtered out, it means:
>    
>     tomcat-el
>  
>  and the default behavior (without -p) should be output all escape
>  sequences but do not increment internal width counters.
>  
>  Objections?
> 
>     Karel
> 

I agree, but I presume you meant the opposite:
for `col -p` to pass through all these escape sequences unaltered.


* Re: col issue
  2017-03-29  2:57 ` Pádraig Brady
@ 2017-03-29  9:35   ` Karel Zak
  0 siblings, 0 replies; 7+ messages in thread
From: Karel Zak @ 2017-03-29  9:35 UTC (permalink / raw)
  To: Pádraig Brady; +Cc: util-linux, Sami Kerola

On Tue, Mar 28, 2017 at 07:57:35PM -0700, Pádraig Brady wrote:
> On 28/03/17 05:16, Karel Zak wrote:
> > 
> >  Hi,
> > 
> >  see https://bugzilla.redhat.com/show_bug.cgi?id=1436432
> > 
> >  any idea what is the right col(1) behavior for escape sequences?
> > 
> >  The current code reads two first bytes from the sequence and the rest
> >  is interpreted as standard chars (because complex sequences like
> >  ^[..m are completely unknown for the code), for example input:
> > 
> >     ^[[1mtomcat-el^[(B^[[m
> > 
> >  produces:
> > 
> >     1mtomcat-elBm
> > 
> >  It seems incorrect. I think for "col -p" all the sequence should be
> >  filtered out, it means:
> >    
> >     tomcat-el
> >  
> >  and the default behavior (without -p) should be output all escape
> >  sequences but do not increment internal width counters.
> >  
> >  Objections?
> > 
> >     Karel
> > 
> 
> I agree, but presuming you meant the opposite, and
> for `col -p` to pass through all these escape sequences unaltered

 Yes, -p is the opposite; it means "Force unknown control sequences to be
 passed through unchanged."

    Karel
 

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com
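
The pass-through idea, where sequences are copied to the output but do
not advance the width counter, might look like the sketch below. The
function name and the pattern are hypothetical, and len() counts code
points rather than display columns, so this is a simplification and not
how col(1) tracks columns.

```python
import re

# Hypothetical pattern: CSI sequences, or ESC + intermediates + final byte.
ESCAPE_SEQ = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]|\x1b[ -/]+[0-~]")


def pass_through(text: str):
    """Return (output, width); escape sequences are copied but add no width."""
    out = []
    width = 0
    pos = 0
    for match in ESCAPE_SEQ.finditer(text):
        chunk = text[pos:match.start()]
        out.append(chunk)
        width += len(chunk)          # ordinary text advances the counter
        out.append(match.group(0))   # the sequence is emitted unchanged
        pos = match.end()
    tail = text[pos:]
    out.append(tail)
    width += len(tail)
    return "".join(out), width


sample = "\x1b[1mtomcat-el\x1b(B\x1b[m"
output, width = pass_through(sample)
print(output == sample, width)  # True 9
```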


* Re: col issue
  2017-03-28 21:38 ` Sami Kerola
@ 2017-03-29  9:41   ` Karel Zak
  0 siblings, 0 replies; 7+ messages in thread
From: Karel Zak @ 2017-03-29  9:41 UTC (permalink / raw)
  To: kerolasa; +Cc: util-linux

On Tue, Mar 28, 2017 at 10:38:46PM +0100, Sami Kerola wrote:
> This is what Open Group[1] has to say about col(1) input handling.
> 
> On input, the only control characters accepted are space, backspace, tab,
> carriage-return and newline characters, SI, SO, VT, reverse line-feed,
> forward half-line-feed and reverse half-line-feed.  The VT character
> is an alternative form of full reverse line-feed, included for
> compatibility with some earlier programs of this type.  The only
> other characters to be copied to the output are those that are printable.
> 
> Last sentence is pretty clear that control characters must be removed.  I am
> not sure if the definition was meant to include control sequences, but it
> feels that is the spirit of the definition.  Maybe a silly question how to
> choose control sequences that are recognised?  Maybe ECMA-48, VT100, and
> Unicode.

I think that to keep col(1) usable it is necessary to have a way to
filter out the sequences *or* to keep the sequences unchanged (-p). The
current behavior produces malformed output. From my point of view it
would be enough to fix -p (at least for ^[..m).

    Karel

-- 
 Karel Zak  <kzak@redhat.com>
 http://karelzak.blogspot.com


* Re: col issue
  2017-03-28 12:16 col issue Karel Zak
                   ` (2 preceding siblings ...)
  2017-03-29  2:57 ` Pádraig Brady
@ 2017-03-29 20:56 ` J William Piggott
  3 siblings, 0 replies; 7+ messages in thread
From: J William Piggott @ 2017-03-29 20:56 UTC (permalink / raw)
  To: Karel Zak, util-linux; +Cc: Sami Kerola

On 03/28/2017 08:16 AM, Karel Zak wrote:
> 
>  Hi,
> 
>  see https://bugzilla.redhat.com/show_bug.cgi?id=1436432
> 

A more pertinent question for this bug report might be: why are they
running display terminal output (a log file) through 'col -b'? The 'b'
switch strips overstriking, of which there is none in this log file.
Besides clobbering the terminal CSIs, it does almost nothing to the
output.

>  any idea what is the right col(1) behavior for escape sequences?

This is a very old roff postprocessor for Model 37 Teletypes, which long
ago was the default output for nroff. The reason col(1) doesn't
recognize modern display terminal CSIs is because they didn't exist back
then.  You will notice that the escape sequences that col(1) does
recognize are not for a display terminal, they are for a print terminal.

Modern roff packages no longer output text this postprocessor was written
for. That is why col(1) is not included with *roff packages anymore.

The only way I see it being used currently is 'col -b' to strip
overstrikes. That is mostly unneeded, because modern pagers can handle
overstrikes. If someone really wants to strip them, a simple sed pipe
will do the job.
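
As a rough illustration of stripping overstrikes, here is a minimal
sketch. The function name is made up, and it only handles the simple
"character, backspace, character" pattern that nroff-style bold and
underline produce; it is not a replacement for 'col -b'.

```python
import re

# Any character (except newline) that is immediately followed by a backspace.
OVERSTRIKE = re.compile(r".\x08")


def strip_overstrikes(text: str) -> str:
    """Drop each overstruck character and its backspace, keeping the last one."""
    return OVERSTRIKE.sub("", text)


bold = "N\x08NA\x08AM\x08ME\x08E"   # "NAME" printed twice per column
underlined = "_\x08f_\x08o_\x08o"   # "foo" underlined
print(strip_overstrikes(bold))        # NAME
print(strip_overstrikes(underlined))  # foo
```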

So col(1) is not supposed to be a general text processor; it was a roff
postprocessor. It seems to me that if it were changed into a general text
processor, a new name would be in order to distinguish it from the
traditional command.

> 
>  The current code reads two first bytes from the sequence and the rest
>  is interpreted as standard chars (because complex sequences like
>  ^[..m are completely unknown for the code), for example input:
> 
>     ^[[1mtomcat-el^[(B^[[m
> 
>  produces:
> 
>     1mtomcat-elBm
> 
>  It seems incorrect. I think for "col -p" all the sequence should be
>  filtered out, it means:
>    
>     tomcat-el
>  
>  and the default behavior (without -p) should be output all escape
>  sequences but do not increment internal width counters.
>  
>  Objections?
> 
>     Karel
> 

