From: Timothy Miller <miller@techsource.com>
To: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Gerrit Huizenga <gh@us.ibm.com>,
	John Bradford <john@grabjohn.com>,
	Chuck Ebbert <76306.1226@compuserve.com>,
	Linus Torvalds <torvalds@transmeta.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: kernel support for non-English user messages
Date: Wed, 16 Apr 2003 12:20:39 -0400
Message-ID: <3E9D82D7.1080304@techsource.com>
In-Reply-To: 1050503867.28586.91.camel@dhcp22.swansea.linux.org.uk



Alan Cox wrote:

>On Wed, 2003-04-16 at 15:28, Timothy Miller wrote:
>
>>The point of this painfully off-topic rant is that messages being 
>>written in English are a disadvantage for no one since they all already 
>>know English.  The messages are also simple enough that anyone
>>
>
>That's a hopeless simplification for non techies and for some techies.
>

I recognize that having internationalized messages would be an 
advantage.  The question is whether or not that advantage outweighs the 
disadvantage.

This shouldn't stop anyone from working on it, however.  My message 
compression work only shaves off about 64k, MAYBE, which is tiny 
compared to the rest of the kernel footprint.  What you get is a more 
complex printk, no lower-case letters, and a meager space savings.  
Embedded users might like it, but they'll start out with less text to 
compress in the first place; those who would value the savings most 
have the least to gain.  Oh, and the build process will take at least 
twice as long.  IF this can be implemented in a way that makes it 
entirely transparent, maybe a few people will use it.  My primary 
motivation for doing it is that I'm enjoying working on it.  :)

>>I personally have a list of every kernel message I could extract from 
>>the source code of 2.4.20, and I've examined a lot of them.  It's a lot 
>>like reading Dr. Seuss.  Although some of the words are long, the 
>>vocabulary is incredibly small.  A lot of text is abbreviations and 
>>acronyms that you wouldn't translate anyhow!
>>
>
>I would be interested in how you extracted them, since a tool that can
>do this is the relevant 99% of the discussion, whether it's for building
>message explanations, translation, reducing messages for embedded...
>
I used 2.4.20 and modified Rules.make.  I changed the rule that does the 
compile (line 60) so that it also runs the preprocessor and saves its 
output next to the source file:

%.o: %.c
	$(CPP) $(CFLAGS) $(EXTRA_CFLAGS_nostdinc) \
	    -DKBUILD_BASENAME=$(subst $(comma),_,$(subst -,_,$(*F))) \
	    $(CFLAGS_$@) $< > $<.i ; \
	$(CC) $(CFLAGS) $(EXTRA_CFLAGS_nostdinc) \
	    -DKBUILD_BASENAME=$(subst $(comma),_,$(subst -,_,$(*F))) \
	    $(CFLAGS_$@) -c -o $@ $<
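
The message does not say why the scan runs on preprocessor output rather 
than on the .c files.  One plausible benefit is that macros are already 
expanded there, so a log-level prefix or a driver's wrapper macro shows 
up as plain adjacent string literals.  A minimal sketch of that effect, 
assuming a stand-in KERN_INFO ("<6>", as 2.4 defines it) and a 
hypothetical wrapper MYDRV_FMT and function expanded_format() that are 
not from the original:

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the log-level macro; 2.4 defines KERN_INFO as "<6>". */
#define KERN_INFO "<6>"

/* Hypothetical driver-style wrapper, for illustration only. */
#define MYDRV_FMT(fmt) KERN_INFO "mydrv: " fmt

/* After preprocessing, the argument below is three adjacent string
   literals, which the compiler then joins into one format string. */
const char *expanded_format(void)
{
    const char *fmt = MYDRV_FMT("link up at %d Mbps\n");

    assert(strlen(fmt) > 0);
    return fmt;
}
```

In the matching .i file such a call would read 
printk("<6>" "mydrv: " "link up at %d Mbps\n", ...), so a scanner only 
has to join neighbouring literals to recover the whole format string.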


When I built the kernel, for every .o file, I also got a .c.i file.  I 
used a script to scan the tree for .i files.  This is the script:

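#!/bin/sh
# Feed every preprocessed .i file in this directory to prkex, then
# recurse into each subdirectory.  Invoke this script by absolute path:
# it re-runs itself through $0 after changing directory.

for i in `ls -A | grep '\.i$'`
do
    if [ -f "$i" ]
    then
        /home/tim/tmp/prkex < "$i"
    fi
done

for i in `ls -A`
do
    if [ -d "$i" ]
    then
        P=`pwd`
        cd "$i"
        "$0"
        cd "$P"
    fi
done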
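
The same traversal can also be done in one process instead of one script 
invocation per directory.  A minimal C sketch, under these assumptions: 
the names is_preprocessed() and list_preprocessed() are hypothetical, it 
prints the matching paths rather than running prkex on them, and it uses 
lstat(), so unlike the shell test it does not descend into symlinked 
directories.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* True if name ends in a literal ".i" and has something before it. */
int is_preprocessed(const char *name)
{
    size_t len;

    assert(name != NULL);
    len = strlen(name);
    return len > 2 && strcmp(name + len - 2, ".i") == 0;
}

/* Print the path of every regular *.i file under dir; return the count.
   A directory that cannot be opened counts as empty. */
long list_preprocessed(const char *dir)
{
    char path[4096];
    struct dirent *e;
    struct stat st;
    long found = 0;
    DIR *d = opendir(dir);

    if (d == NULL) return 0;
    while ((e = readdir(d)) != NULL) {
        int n;

        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        n = snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue;                  /* path too long, skip it */
        if (lstat(path, &st) != 0)
            continue;
        if (S_ISDIR(st.st_mode)) {
            found += list_preprocessed(path);   /* recurse */
        } else if (S_ISREG(st.st_mode) && is_preprocessed(e->d_name)) {
            puts(path);
            found++;
        }
    }
    closedir(d);
    return found;
}
```

Calling list_preprocessed(".") from the top of a built tree would list 
the .c.i files; the suffix test compares against a literal ".i", so a 
name such as "ascii" is not picked up.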



And here's 'prkex', a horrible little extraction program that I hacked 
together rather abruptly.  It doesn't do anything cool like use yacc or 
regex.  It gets the printk format strings using a half-wit state 
machine.  I figure I can get away with posting it because it's shorter 
than a lot of patches I see.  :)



#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>


/* This is implemented as a state-machine which parses the input for
   strings which match:
   printk("...", ...);
*/


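/* A small stack of states, so the backslash state can return to the
   state that was active before it.  This is to deal with backslash
   followed by octal.  Since I don't try to deal with that properly,
   it's kinda pointless to have this.  oh well. */
int state_stack[10];
int top_state = 0;


/* STATE is the current state.  Values 0..5 count how many characters
   of "printk" have matched so far; every other state is named by a
   character code: '(', '[', '"' (ASCII 34) and '\\'. */
#define STATE (state_stack[top_state])

const char *printk_string = "printk";
char last_out = '\n';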

/* When extracting words, \n is a separator.  We don't need redundant \n */
void output_char(char c)
{
    if (last_out == '\n' && c == '\n') return;
    last_out = c;
    putchar(c);
}


/* Find printk and look for everything in the format string.  Deal with
   concatenation of multiple partial strings and ignore anything outside
   of quotes (__FILE__, etc) until we hit , or ). */
void process_char(int c)
{
    switch (STATE) {
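    case 5:   /* look for 'k' */
        if (c == printk_string[5]) {
            STATE = '(';
        } else {
            /* a mismatching 'p' may itself start a new "printk" */
            STATE = (c == printk_string[0]) ? 1 : 0;
        }
        break;
    default:  /* Look for first 5 chars of 'printk' */
        if (STATE < 5) {
            if (c == printk_string[STATE]) {
                STATE++;
            } else {
                /* restart, keeping a 'p' that begins a new match */
                STATE = (c == printk_string[0]) ? 1 : 0;
            }
        } else {
            STATE = 0;
        }
        break;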
    case '(':  /* Look for ( */
        if (isspace(c)) return;
        if (c == '(') {
            STATE = '[';
        } else {
            STATE = 0;
        }
        break;
    case '[':  /* Look for " */
        if (isspace(c)) return;
        switch (c) {
        case '"':
            STATE = '"';
            break;
        case ')':
        case ',':
        case ';':
            output_char('\n');
            STATE = 0;
            break;
        default:
            /* default doesn't abort because we get __FILE__, etc. */
            output_char('\n');
            break;
        }
        break;
    case '"':    /* We're in a string */
        if (c == '\\') {
            top_state++;
            STATE = '\\';
            return;
        }
        if (c == '"') {
            STATE = '[';
            return;
        }
        output_char(c);
        break;
    case '\\':  /* partially deal with backslash. */
        if (state_stack[top_state-1] == '"') {
            switch (c) {
            case 'n':
                output_char('\n');
                break;
            case 't':
                output_char('\t');
                break;
            case '\n':
                break;
            default:
                output_char(c);
                break;
            }
        }
        top_state--;
        break;
    }
}


int main(int argc, char *argv[])
{
    int c;

    state_stack[0] = 0;

    /* getchar() returns EOF at end of input, so no feof() test is needed */
    while ((c = getchar()) != EOF) {
        process_char(c);
    }

    return 0;
}
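
For readers who want to see the state machine's behaviour on a concrete 
input, here is a compact restatement with named states that writes into 
a buffer instead of stdout.  It is a sketch, not the program above: the 
names extract_formats() and put_char() and the enum are hypothetical, 
and it simplifies by skipping tokens outside quotes silently.  It does 
keep the same ideas: match "printk", join adjacent literals, decode \n 
and \t, and never emit two newlines in a row.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum state { SCAN, WANT_PAREN, BETWEEN, IN_STRING, ESCAPE };

/* Append c at out[n], suppressing redundant newlines; return new length. */
static size_t put_char(char *out, size_t n, char c)
{
    if (c == '\n' && (n == 0 || out[n - 1] == '\n')) return n;
    out[n] = c;
    return n + 1;
}

/* Copy the format string of each printk(...) call found in src into
   out, one per line.  Returns the number of bytes written, not
   counting the terminating NUL.  Output is truncated to fit outsz. */
size_t extract_formats(const char *src, char *out, size_t outsz)
{
    static const char word[] = "printk";
    enum state st = SCAN;
    size_t matched = 0, n = 0;

    assert(src != NULL && out != NULL && outsz > 0);

    for (; *src != '\0' && n + 1 < outsz; src++) {
        char c = *src;

        switch (st) {
        case SCAN:          /* match the characters of "printk" */
            if (c == word[matched]) {
                if (++matched == sizeof word - 1) {
                    matched = 0;
                    st = WANT_PAREN;
                }
            } else {
                matched = (c == word[0]) ? 1 : 0;
            }
            break;
        case WANT_PAREN:    /* optional whitespace, then '(' */
            if (isspace((unsigned char)c)) break;
            st = (c == '(') ? BETWEEN : SCAN;
            break;
        case BETWEEN:       /* outside quotes, inside the first argument */
            if (c == '"') {
                st = IN_STRING;
            } else if (c == ',' || c == ')' || c == ';') {
                n = put_char(out, n, '\n');   /* format argument ended */
                st = SCAN;
            }
            break;
        case IN_STRING:
            if (c == '\\') st = ESCAPE;
            else if (c == '"') st = BETWEEN;
            else n = put_char(out, n, c);
            break;
        case ESCAPE:        /* only \n and \t are decoded */
            if (c != '\n')
                n = put_char(out, n, (c == 'n') ? '\n' : (c == 't') ? '\t' : c);
            st = IN_STRING;
            break;
        }
    }
    out[n] = '\0';
    return n;
}
```

Given the text printk("<6>" "eth0: up %d\n", n); printk (KERN_ERR "bad\n"); 
this produces two lines, "<6>eth0: up %d" and "bad".  The unexpanded 
KERN_ERR token is simply skipped, which is why running on preprocessed 
input gives more complete strings.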


