From: Dr. Philipp Tomsich
Date: Wed, 26 Apr 2017 00:27:46 +0200
Subject: [U-Boot] [PATCH] patman: encode CC list to UTF-8
In-Reply-To:
References: <1492608257-924-1-git-send-email-philipp.tomsich@theobroma-systems.com> <20170425171225.GA12511@bill-the-cat>
Message-ID: <11D7D264-A753-4B6E-8231-CED423D0E740@theobroma-systems.com>
List-Id:
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit
To: u-boot@lists.denx.de

Hi Simon,

> On 25 Apr 2017, at 22:31, Simon Glass wrote:
>
> Hi Tom,
>
> On 25 April 2017 at 11:12, Tom Rini wrote:
>>
>> On Sat, Apr 22, 2017 at 05:53:36PM -0600, Simon Glass wrote:
>>> +Tom
>>>
>>> On 19 April 2017 at 07:24, Philipp Tomsich wrote:
>>>>
>>>> This change encodes the CC list to UTF-8 to avoid failures on
>>>> maintainer-addresses that include non-ASCII characters (observed on
>>>> Debian 7.11 with Python 2.7.3).
>>>>
>>>> Without this, I get the following failure:
>>>> Traceback (most recent call last):
>>>>   File "tools/patman/patman", line 159, in
>>>>     options.add_maintainers)
>>>>   File "[snip]/u-boot/tools/patman/series.py", line 234, in MakeCcFile
>>>>     print(commit.patch, ', '.join(set(list)), file=fd)
>>>> UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 81: ordinal not in range(128)
>>>> from Heiko's email address:
>>>> [..., u'"Heiko St\xfcbner" ', ...]
>>>>
>>>> While with this change added this encodes to:
>>>> "=?UTF-8?q?Heiko=20St=C3=BCbner?= "
>>>>
>>>> Signed-off-by: Philipp Tomsich
>>>> ---
>>>>
>>>>  tools/patman/series.py | 4 ++--
>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> Reviewed-by: Simon Glass
>>
>> Please put this in a PR for me, along with any other critical fixes to
>> the various python tools we have, thanks!
>>
>> And also, do we need to perhaps whack something at a higher level, and
>> more consistently, about unicode? This is, I gather, doing UTF-8 right.
>> In buildman we have a few patches to just translate to latin-1 instead.
>> We should do the same thing I think, and perhaps there's a higher level
>> up in the code where we need to do it too? I don't know..
>
> Actually I don't think we are quite there yet. This really needs a
> test with all the different places strings can come from, to make sure
> patman does the right thing.

On the topic of ‘different places strings can come from’, here’s another
change from my WIP tree that fixes further UTF-8 issues in patman (it
encodes each address to UTF-8 before the lists are joined, for both the
per-patch and the cover-letter CC lines) and may point you towards another
trouble spot:

@@ -229,14 +229,16 @@ class Series(dict):
                                            raise_on_error=raise_on_error)
             if add_maintainers:
                 list += get_maintainer.GetMaintainer(commit.patch)
+            list = [s.encode('utf-8') for s in list]
             all_ccs += list
-            print(commit.patch, ', '.join(set(list)).encode('utf-8'), file=fd)
+            print(commit.patch, ', '.join(set(list)), file=fd)
             self._generated_cc[commit.patch] = list
 
         if cover_fname:
             cover_cc = gitutil.BuildEmailList(self.get('cover_cc', ''))
-            cc_list = ', '.join([x.decode('utf-8') for x in set(cover_cc + all_ccs)])
-            print(cover_fname, cc_list.encode('utf-8'), file=fd)
+            cover_cc = [s.encode('utf-8') for s in cover_cc]
+            cc_list = ', '.join([x for x in set(cover_cc + all_ccs)])
+            print(cover_fname, cc_list, file=fd)
 
         fd.close()
         return fname

Regards,
Philipp.
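Tom's question about handling Unicode "at a higher level" usually comes down to one pattern: decode to text as soon as a string enters the tool, keep it as text internally, and encode exactly once when writing. The sketch below illustrates that idea in Python 3. It is not patman's code (the thread concerns Python 2.7), the helper names `to_text`, `format_cc_line` and `write_cc_file` are hypothetical, and it assumes any byte string it receives is UTF-8. Because every address is normalised to `str` before the join, the join never mixes byte and text strings, and the only encoding step is performed by the UTF-8 file object.

```python
import io
from typing import Iterable, TextIO, Tuple, Union

# An address may arrive as text or as raw bytes, depending on its source
# (git config, commit tags, get_maintainer.pl output, ...).
Address = Union[str, bytes]


def to_text(addr: Address) -> str:
    """Normalise one address to str (hypothetical helper).

    Assumes bytes input is UTF-8; bytes in another encoding raise
    UnicodeDecodeError rather than being silently mangled.
    """
    if isinstance(addr, bytes):
        return addr.decode('utf-8')
    return addr


def format_cc_line(patch: str, addrs: Iterable[Address]) -> str:
    """Build a '<patch> <addr>, <addr>, ...' line from mixed str/bytes input.

    Duplicates are dropped (like the set() in MakeCcFile) and the result
    is sorted so the output order is deterministic.
    """
    unique = sorted({to_text(a) for a in addrs})
    return '%s %s' % (patch, ', '.join(unique))


def write_cc_file(fd: TextIO,
                  entries: Iterable[Tuple[str, Iterable[Address]]]) -> None:
    """Write one CC line per patch; fd should be a text stream using UTF-8."""
    for patch, addrs in entries:
        print(format_cc_line(patch, addrs), file=fd)


if __name__ == '__main__':
    # Placeholder names and addresses; the non-ASCII one mirrors the failing case.
    entries = [
        ('0001-example.patch', ['"Jane Döe" <jane@example.org>',
                                b'Bob <bob@example.org>',
                                '"Jane Döe" <jane@example.org>']),
    ]
    raw = io.BytesIO()
    with io.TextIOWrapper(raw, encoding='utf-8', newline='\n') as fd:
        write_cc_file(fd, entries)
        fd.flush()
        data = raw.getvalue()  # read before the wrapper closes raw
    print(data)
```

In a real tool the `BytesIO` wrapper would simply be `open(fname, 'w', encoding='utf-8')`. Sorting instead of relying on `set()` iteration order is a design choice in this sketch so that repeated runs produce identical CC files.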