From: 罗勇刚(Yonggang Luo) <luoyonggang@gmail.com>
Date: Sat, 9 Jan 2021 00:13:31 +0800
Subject: Re: [PATCH] decodetree: Open files with encoding='utf-8'
To: Peter Maydell
Cc: Eduardo Habkost, Richard Henderson, QEMU Developers, Philippe Mathieu-Daudé, Cleber Rosa, Paolo Bonzini


On Sat, Jan 9, 2021 at 12:05 AM Peter Maydell <peter.maydell@linaro.org> wrote:
>
> On Fri, 8 Jan 2021 at 15:16, Philippe Mathieu-Daudé <f4bug@amsat.org> wrote:
> >
> > When decodetree.py was added in commit 568ae7efae7, QEMU was
> > using Python 2, which happily reads UTF-8 files in text mode.
> > Python 3 requires either a UTF-8 locale or an explicit encoding
> > passed to open(). Now that Python 3 is required, use an explicit
> > UTF-8 encoding for decodetree sources.
> >
> > This fixes:
> >
> >   $ /usr/bin/python3 scripts/decodetree.py test.decode
> >   Traceback (most recent call last):
> >     File "scripts/decodetree.py", line 1397, in <module>
> >       main()
> >     File "scripts/decodetree.py", line 1308, in main
> >       parse_file(f, toppat)
> >     File "scripts/decodetree.py", line 994, in parse_file
> >       for line in f:
> >     File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
> >       return codecs.ascii_decode(input, self.errors)[0]
> >   UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 80:
> >   ordinal not in range(128)
> >
> > Reported-by: Peter Maydell <peter.maydell@linaro.org>
> > Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
> > ---
> >  scripts/decodetree.py | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/scripts/decodetree.py b/scripts/decodetree.py
> > index 47aa9caf6d1..fa40903cff1 100644
> > --- a/scripts/decodetree.py
> > +++ b/scripts/decodetree.py
> > @@ -1304,7 +1304,7 @@ def main():
> >
> >      for filename in args:
> >          input_file = filename
> > -        f = open(filename, 'r')
> > +        f = open(filename, 'r', encoding='utf-8')
> >          parse_file(f, toppat)
> >          f.close()
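For illustration, here is a minimal sketch of the behaviour the patch addresses. It is not decodetree.py code. The file it creates and that file's contents are hypothetical. It assumes the input contains a character such as é (U+00E9), whose UTF-8 encoding begins with the 0xC3 byte the quoted traceback complains about. Reading with an explicit `encoding="utf-8"` works regardless of locale. Forcing the ASCII codec, which is what the locale default resolved to in the traceback, raises the same `UnicodeDecodeError`.

```python
import os
import tempfile

# Hypothetical input text; "\u00e9" (é) is encoded in UTF-8 as 0xC3 0xA9.
SAMPLE = "# comment with a non-ASCII name: Mathieu-Daud\u00e9\n# second line\n"


def read_lines(filename):
    """Read a text file as UTF-8, independent of the locale's default encoding."""
    with open(filename, "r", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def demo():
    """Return (lines read as UTF-8, whether reading the same file as ASCII failed)."""
    fd, path = tempfile.mkstemp(suffix=".decode")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(SAMPLE)
        lines = read_lines(path)
        try:
            # Mimic an ASCII locale default, as in the traceback above.
            with open(path, "r", encoding="ascii") as f:
                f.read()
            ascii_failed = False
        except UnicodeDecodeError:
            ascii_failed = True
        return lines, ascii_failed
    finally:
        os.remove(path)
```

Calling `demo()` should return the two decoded lines together with `True`. The `with` statement also ensures the file is closed if parsing raises, which the bare `open()`/`close()` pair in the diff does not.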
>
> Should we also be opening the output file explicitly as
> utf-8? (How do we say "write to sys.stdout as utf-8" for
> the case where we're doing that?)

It can be done with:
```
import io
import sys
# Re-wrap stdout's binary buffer so text written to it is encoded as UTF-8.
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf8", errors="ignore")
```
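Below is a sketch along the same lines that also covers the "output file" half of the question. `utf8_output()` and its `output_file`/`stream` parameters are hypothetical names, not taken from decodetree.py. A named output file is simply opened with `encoding="utf-8"`. For stdout, the sketch uses `TextIOWrapper.reconfigure()` where it exists (Python 3.7+). On Python 3.6, the version in the traceback, it falls back to re-wrapping `stdout.buffer` as in the snippet above.

```python
import io
import sys


def utf8_output(output_file=None, stream=None):
    """Return a text stream that encodes as UTF-8 (hypothetical helper).

    output_file: path to write, or None to use `stream` (default: sys.stdout).
    """
    if output_file:
        # A real file: just ask for UTF-8 explicitly.
        return open(output_file, "w", encoding="utf-8")
    stream = sys.stdout if stream is None else stream
    if hasattr(stream, "reconfigure"):
        # Python 3.7+: switch the existing text wrapper to UTF-8 in place.
        stream.reconfigure(encoding="utf-8")
        return stream
    # Python 3.6: wrap the underlying binary buffer in a new UTF-8 text layer.
    return io.TextIOWrapper(stream.buffer, encoding="utf-8")
```

This version keeps the default `errors="strict"` rather than `errors="ignore"`. UTF-8 can encode every code point except lone surrogates, so strict mode should rarely trigger, and failing loudly is arguably better than silently dropping characters from generated code. The caller should close the returned stream only when it opened a file. In the 3.6 fallback, keep a reference to the new wrapper (or assign it to `sys.stdout`) while writing, because a garbage-collected `TextIOWrapper` closes the buffer it wraps.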

>=
> thanks
> -- PMM
>


--
=C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0=E6=AD=A4=E8=87=B4
=E7=A4=BC
=E7=BD=97=E5=8B=87=E5= =88=9A
Yours
=C2=A0 =C2=A0 sincerely,
Yonggang Luo