From mboxrd@z Thu Jan 1 00:00:00 1970 From: Anton Staaf Subject: Re: [PATCH v2 2/3] dtc: Support character literals in cell lists Date: Thu, 8 Sep 2011 14:20:16 -0700 Message-ID: References: <1315437340-1661-1-git-send-email-robotboy@chromium.org> <1315437340-1661-3-git-send-email-robotboy@chromium.org> <20110908034742.GN30278@yookeroo.fritz.box> Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Return-path: In-Reply-To: <20110908034742.GN30278-787xzQ0H9iQXU02nzanrWNbf9cGiqdzd@public.gmane.org> List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: devicetree-discuss-bounces+gldd-devicetree-discuss=m.gmane.org-uLR06cmDAlY/bJ5BZ2RsiQ@public.gmane.org Sender: devicetree-discuss-bounces+gldd-devicetree-discuss=m.gmane.org-uLR06cmDAlY/bJ5BZ2RsiQ@public.gmane.org To: David Gibson Cc: devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ@public.gmane.org List-Id: devicetree@vger.kernel.org On Wed, Sep 7, 2011 at 8:47 PM, David Gibson wrote: > On Wed, Sep 07, 2011 at 04:15:39PM -0700, Anton Staaf wrote: >> With this patch the following property assignment: >> >> =A0 =A0 property =3D <0x12345678 'a' '\r' 100>; >> >> is equivalent to: >> >> =A0 =A0 property =3D <0x12345678 0x00000061 0x0000000D 0x00000064> >> >> Signed-off-by: Anton Staaf >> Cc: Jon Loeliger >> Cc: David Gibson >> --- >> =A0Documentation/dts-format.txt | =A0 =A02 +- >> =A0Documentation/manual.txt =A0 =A0 | =A0 =A07 ++++--- >> =A0dtc-lexer.l =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0| =A0 14 ++++++++++++++ >> =A0dtc-parser.y =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 | =A0 =A04 ++++ >> =A04 files changed, 23 insertions(+), 4 deletions(-) >> >> diff --git a/Documentation/dts-format.txt b/Documentation/dts-format.txt >> index a655b87..eae8b76 100644 >> --- a/Documentation/dts-format.txt >> +++ b/Documentation/dts-format.txt >> @@ -33,7 +33,7 @@ Property values may be defined as an array of 32-bit i= nteger cells, as >> =A0NUL-terminated strings, as bytestrings or a combination of these. >> >> =A0* Arrays of cells are represented by angle brackets surrounding a >> - =A0space separated list of C-style integers >> + =A0space separated list of C-style integers or character literals. >> >> =A0 =A0 =A0 e.g. interrupts =3D <17 0xc>; >> >> diff --git a/Documentation/manual.txt b/Documentation/manual.txt >> index f8a8a7b..940fd3d 100644 >> --- a/Documentation/manual.txt >> +++ b/Documentation/manual.txt >> @@ -213,10 +213,11 @@ For example: >> >> =A0By default, all numeric values are hexadecimal. =A0Alternate bases >> =A0may be specified using a prefix "d#" for decimal, "b#" for binary, >> -and "o#" for octal. >> +and "o#" for octal. =A0Character literals are supported using the C >> +language character literal syntax of 'a'. > > Hrm, this paragraph is clearly referring to the old-style dts-v0 > syntax (hex is not the default, these days) - which may well mean the > whole document is hopelessly out of date. =A0And I think the char > literals should only be processed in dts-v1 mode. Fair enough, would it make sense for me to just remove this modification to manual.txt and leave the changes in place in dts-format.txt? >> -Strings support common escape sequences from C: "\n", "\t", "\r", >> -"\(octal value)", "\x(hex value)". >> +Strings and character literals support common escape sequences from C: >> +"\n", "\t", "\r", "\(octal value)", "\x(hex value)". >> >> >> =A04.3) Labels and References >> diff --git a/dtc-lexer.l b/dtc-lexer.l >> index e866ea5..d4f9eaa 100644 >> --- a/dtc-lexer.l >> +++ b/dtc-lexer.l >> @@ -29,6 +29,8 @@ PROPNODECHAR =A0 =A0 =A0 =A0[a-zA-Z0-9,._+*#?@-] >> =A0PATHCHAR =A0 =A0 ({PROPNODECHAR}|[/]) >> =A0LABEL =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0[a-zA-Z_][a-zA-Z0-9_]* >> =A0STRING =A0 =A0 =A0 =A0 =A0 =A0 =A0 \"([^\\"]|\\.)*\" >> +CHAR_LITERAL '[^\\']' >> +CHAR_ESCAPED '\\([^']+|')' > > I'd prefer a single regex here, of '[^']+', and then check that it is > indeed a single character when you process it. =A0An aside as to why... Done, though I made it a slightly more complex regex that can also match '\= ''. > So, when first working with lexer/parser generators, it is very > tempting to make the token regexes as tight as possible, so that as > soon as lex has recognized the token, you know it is 100% > syntactically valid. =A0The problem with this is that when the user > enters something that looks kinda like the token in question, but > isn't quite right, then flex will re-interpret those characters as > some other tokens. =A0That usually results in a deeply incomprehensible > syntax error from the parser. Good point. > I think the better approach, therefore, is to make the token regexes > as wide as they reasonably can be without ambiguity, then do extra > validation once the token is recognized by flex. =A0That way you can > produce a much more helpful "Bad format for token X" type error > message. =A0Currently dtc is a bit mixed in the extent it does this, but > that's the direction I'd prefer to move. Done. >> =A0WS =A0 =A0 =A0 =A0 =A0 [[:space:]] >> =A0COMMENT =A0 =A0 =A0 =A0 =A0 =A0 =A0"/*"([^*]|\*+[^*/])*\*+"/" >> =A0LINECOMMENT =A0"//".*\n >> @@ -109,6 +111,18 @@ static int pop_input_file(void); >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 return DT_LITERAL; >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 } >> >> +{CHAR_LITERAL} =A0 { >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 yylval.byte =3D yytext[1]; >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 DPRINT("Character literal: %s\= n", yytext); >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 return DT_BYTE; >> + =A0 =A0 =A0 =A0 =A0 =A0 } >> + >> +{CHAR_ESCAPED} =A0 { >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 yylval.byte =3D get_escape_cha= r_exact(yytext+1, yyleng-2); >> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 DPRINT("Character literal esca= ped: %s\n", yytext); > > So, I'd prefer the error handling to be done here - that is if there's > a bad escape sequence or more than one character, report it here. =A0For > now, a die() will do, although longer term I'd prefer a not > immediately fatal message (allowing the rest of the file to be checked > for errors) with an error triggered after the parse is complete. I ended up (as you'll see shortly) doing this in dtc-parser.y and using the print_error function there so I didn't have to just die out. So longer term can be now, instead of... Thanks, Anton > -- > David Gibson =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0| I'll have my music = baroque, and my code > david AT gibson.dropbear.id.au =A0| minimalist, thank you. =A0NOT _the_ _= other_ > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0| _way_ _a= round_! > http://www.ozlabs.org/~dgibson >