From: Linus Torvalds
Date: Wed, 24 Jan 2018 12:48:25 -0800
Subject: Re: [GIT PULL] PCI fixes for v4.15
To: Peter Grayson
Cc: Bjorn Helgaas, Catalin Marinas, linux-pci@vger.kernel.org,
    Linux Kernel Mailing List, Lorenzo Pieralisi, Christian König,
    Aaro Koskinen, Andy Shevchenko, Boris Ostrovsky, Juergen Gross,
    Alex Deucher, David Airlie
In-Reply-To: <20180124202501.yoy65ubq2zqumehn@chibold.localdomain>
References: <20180123195644.GA92112@bhelgaas-glaptop.roam.corp.google.com>
    <20180124004603.GH5317@bhelgaas-glaptop.roam.corp.google.com>
    <20180124162027.GI5317@bhelgaas-glaptop.roam.corp.google.com>
    <20180124202501.yoy65ubq2zqumehn@chibold.localdomain>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jan 24, 2018 at 12:25 PM, Peter Grayson wrote:
>
> The latest stgit release (v0.18) ignores any mis-encoding of the email
> body. However, stgit master now decodes email bodies and is thus exposed
> to this kind of stray latin-1 character in a UTF-8 body.
>
> I believe stgit's goal should be to identify and repair this kind of
> issue as git does. I will be working on that.

Yes, good.

The "latin1 vs utf-8" confusion is sadly still somewhat common in
Western Europe, from personal experience.
People just got used to Latin1 working almost by accident without any
explicit encoding, possibly _because_ it also acts as the first 256
code points of Unicode.

I suspect the old 8-bit DOS character set (aka "code page 437") is
perhaps even more commonly seen in some situations, just not in unix
development contexts. And it lacks a number of the (admittedly rarer)
European accented characters anyway.

So git basically first does a conversion according to the stated
encoding, but after that conversion it will then do another pass to
actually verify that the end result is valid utf-8, and if not, do the
(trivial) latin1 -> utf-8 conversion.

And part of the reason for that latin1 special case is very much the
whole "it's trivial" part. So it's not _just_ about "common error in
western emails", it's also simply that Latin1 really is special in the
Unicode domain. No other character set has that trivial conversion
into utf-8.

See verify_utf8() in commit.c in the git code.

> Unfortunately, the head of stgit master does not yet solve this issue. I
> am working to remedy that.

Thanks. We used to be *horrible* about getting "complex" names right
in the kernel logs, but I've tried to make sure that we actually get
this right and have proper names for the last many years.

              Linus
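
For anyone following along, the "verify, then fall back to latin1" pass described above can be sketched in a few lines. This is only an illustration in Python, modelled on the description in this mail and not copied from git's commit.c; the function names are hypothetical, and leaning on Python's own decoder to find the first invalid byte is an assumption of the sketch. It also shows why the conversion is "trivial": a Latin1 byte value is already the Unicode code point, so anything at or above 0x80 simply becomes a two-byte utf-8 sequence.

```python
def latin1_byte_to_utf8(b: int) -> bytes:
    """Encode one Latin1 byte as utf-8 (hypothetical helper).

    The byte value is already the Unicode code point, so values below
    0x80 pass through and the rest become a two-byte sequence.
    """
    if b < 0x80:
        return bytes([b])
    return bytes([0xC0 | (b >> 6), 0x80 | (b & 0x3F)])


def repair_stray_latin1(raw: bytes) -> bytes:
    """Return raw unchanged if it is valid utf-8; otherwise re-encode
    each offending byte as if it were Latin1 (hypothetical helper)."""
    out = bytearray()
    pos = 0
    while pos < len(raw):
        try:
            # Verification pass: is the remainder valid utf-8?
            raw[pos:].decode("utf-8")
            out += raw[pos:]
            break
        except UnicodeDecodeError as err:
            # err.start is the offset of the first bad byte in raw[pos:].
            bad = pos + err.start
            out += raw[pos:bad]                    # valid prefix, kept as-is
            out += latin1_byte_to_utf8(raw[bad])   # treat bad byte as Latin1
            pos = bad + 1
    return bytes(out)


# A utf-8 body with one stray Latin1 "ö" (0xF6) in it.
body = "Köln ".encode("utf-8") + b"K\xf6nig"
fixed = repair_stray_latin1(body)
print(fixed.decode("utf-8"))
```

Input that is already valid utf-8 comes back byte for byte, so running the repair on clean mail is harmless. The sketch rescans the remainder after every repair, which is fine for a commit message or mail body but is not how one would write it for large inputs.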