From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S264524AbTDPPyu (ORCPT ); Wed, 16 Apr 2003 11:54:50 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S264528AbTDPPyu (ORCPT ); Wed, 16 Apr 2003 11:54:50 -0400 Received: from watch.techsource.com ([209.208.48.130]:42655 "EHLO techsource.com") by vger.kernel.org with ESMTP id S264524AbTDPPyn (ORCPT ); Wed, 16 Apr 2003 11:54:43 -0400 Message-ID: <3E9D82D7.1080304@techsource.com> Date: Wed, 16 Apr 2003 12:20:39 -0400 From: Timothy Miller User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020823 Netscape/7.0 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Alan Cox CC: Gerrit Huizenga , John Bradford , Chuck Ebbert <76306.1226@compuserve.com>, Linus Torvalds , Linux Kernel Mailing List Subject: Re: kernel support for non-English user messages References: <3E9D688F.5040204@techsource.com> <1050503867.28586.91.camel@dhcp22.swansea.linux.org.uk> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Alan Cox wrote: >On Mer, 2003-04-16 at 15:28, Timothy Miller wrote: > >>The point of this painfully off-topic rant is that messages being >>written in English are a disadvantage for no one since they all already >>know English. The messages are also simple enough that anyone >> > >Thats a hopeless simplification for non techies and for some techies. > I recognize that having internationalized messages would be an advantage. The question is whether or not that advantage outweighs the disadvantage. This shouldn't stop anyone from working on it, however. My message compression work only shaves off about 64k, MAYBE, which compared to the rest of the kernel footprint is tiny. What you get is a more complex printk, no lower-case letters, and a meager space savings. Embedded users might like it, but they'll start out with less text to compress in the first place; those who stand to benefit the most are the ones who can benefit the least. Oh, and the build process will take at least twice as long. IF this can be implemented in a way that makes it entirely transparent, maybe a few people will use it, so my primary motivation for doing it is that I'm enjoying working on it. :) >>I personally have a list of every kernel message I could extract from >>the source code of 2.4.20, and I've examined a lot of them. It's a lot >>like reading Dr. Seuss. Although some of the words are long, the >>vocabulary is incredibly small. A lot of text is abbreviations and >>acronyms that you wouldn't translate anyhow! >> > >I would be interested in how you extracted them, since a tool that can >do this is the relevant 99% of the discussion, whether its for building >message explanations, translation, reducing messages for embedded... > I used 2.4.20. I modified Rules.make. The line which does the compile (line 60), I changed like this: %.o: %.c $(CPP) $(CFLAGS) $(EXTRA_CFLAGS_nostdinc) -DKBUILD_BASENAME=$(subst $(comma),_,$(subst -,_,$(*F))) $(CFLAGS_$@) $< > $<.i ; \ $(CC) $(CFLAGS) $(EXTRA_CFLAGS_nostdinc) -DKBUILD_BASENAME=$(subst $(comma),_,$(subst -,_,$(*F))) $(CFLAGS_$@) -c -o $@ $< When I built the kernel, for every .o file, I also got a .c.i file. I used a script to scan the tree for .i files. This is the script: #!/bin/sh for i in `ls -A | grep .i$` do if [ -f $i ] then /home/tim/tmp/prkex < $i fi done for i in `ls -A` do if [ -d $i ] then P=`pwd` cd $i $0 cd $P fi done And here's 'prkex', a horrible little extraction program that I hacked together rather abruptly. It doesn't do anything cool like use yacc or regex. It gets the printk format strings using a half-wit state machine. I figure I can get away with posting it because it's shorter than a lot of patches I see. :) #include #include #include #include /* This is implemented as a state-machine which parses the input for strings which match: printk("...", ...); */ /* This is to deal with backslash followed by octal. Since I don't try to deal with that properly, it's kinda pointless to have this. oh well. */ int state_stack[10]; int top_state = 0; #define STATE (state_stack[top_state]) char *printk_string = "printk"; char last_out = '\n'; /* When extracting words, \n is a separator. We don't need redundant \n */ void output_char(char c) { if (last_out == '\n' && c == '\n') return; last_out = c; putchar(c); } /* Find printk and look for everything in the format string. Deal with concatenation of multiple partial strings and ignore anything outside of quotes (__FILE__, etc) until we hit , or ). */ void process_char(int c) { switch (STATE) { case 5: /* look for 'k' */ if (c == printk_string[5]) { STATE = '('; } else { STATE = 0; } break; default: /* Look for first 5 chars of 'printk' */ if (STATE < 5) { if (c == printk_string[STATE]) { STATE++; } else { STATE = 0; } } else { STATE = 0; } break; case '(': /* Look for ( */ if (isspace(c)) return; if (c == '(') { STATE = '['; } else { STATE = 0; } break; case '[': /* Look for " */ if (isspace(c)) return; switch (c) { case 34: STATE = 34; break; case ')': case ',': case ';': output_char('\n'); STATE = 0; break; default: /* default doesn't abort because we get __FILE__, etc. */ output_char('\n'); break; } break; case 34: /* We're in a string */ if (c == '\\') { top_state++; STATE = '\\'; return; } if (c == 34) { STATE = '['; return; } output_char(c); break; case '\\': /* partially deal with backslash. */ if (state_stack[top_state-1] == 34) { switch (c) { case 'n': output_char('\n'); break; case 't': output_char('\t'); break; case '\n': break; default: output_char(c); break; } } top_state--; break; } } int main(int argc, char *argv[]) { int c; state_stack[0] = 0; while (!feof(stdin)) { c = getchar(); if (c != EOF) process_char(c); } return 0; }