From mboxrd@z Thu Jan 1 00:00:00 1970
From: Krzysztof Żelechowski
To: git@vger.kernel.org
Subject: git log --encoding=HTML is not supported
Date: Tue, 24 Aug 2021 11:00:03 +0200
Message-ID: <9896630.2IqcCWsCYL@localhost.localdomain>

What did you do before the bug happened? (Steps to reproduce your issue)
{ git log --oneline --encoding=HTML stl_function.h; }

What did you expect to happen? (Expected behavior)
828176ba490 libstdc++: Improve doxygen comments in <bits/stl_function.h>

What happened instead? (Actual behavior)
828176ba490 libstdc++: Improve doxygen comments in

What's different between what you expected and what actually happened?
The "<" character that begins the file name is interpreted as the opening
of a tag.

Anything else you want to add:
Bug in the client:


Plugin implementation:
"--pretty=format:

%h %ad %s %an
"

Similar report:


Proposed solution:
Unlike the difficulties with a JSON output format discussed above, I think
that in this case it would be enough to encode the characters [<] and [&]
in the content in a way appropriate for HTML.
(This solution does not involve detecting and rejecting invalid characters.)

[System Info]
git version:
git version 2.32.0
cpu: x86_64
no commit associated with this build
sizeof-long: 8
sizeof-size_t: 8
shell-path: /bin/sh
uname: Linux 5.13.12-1-default #1 SMP Wed Aug 18 08:01:38 UTC 2021 (999e604) x86_64
compiler info: gnuc: 11.1
libc info: glibc: 2.33
$SHELL (typically, interactive shell): /bin/bash


[Enabled Hooks]
From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bagas Sanjaya
To: Krzysztof Żelechowski, git@vger.kernel.org
Subject: Re: git log --encoding=HTML is not supported
Date: Tue, 24 Aug 2021 17:31:14 +0700
Message-ID: <22496693-cf63-a278-c85e-d9e4376e2a59@gmail.com>
In-Reply-To: <9896630.2IqcCWsCYL@localhost.localdomain>

On 24/08/21 16.00, Krzysztof Żelechowski wrote:
> What did you do before the bug happened? (Steps to reproduce your issue)
> { git log --oneline --encoding=HTML stl_function.h; }
> 
> What did you expect to happen? (Expected behavior)
> 828176ba490 libstdc++: Improve doxygen comments in <bits/stl_function.h>
> 
> What happened instead? (Actual behavior)
> 828176ba490 libstdc++: Improve doxygen comments in
> 
> What's different between what you expected and what actually happened?
> The "<" character that begins the file name is interpreted as the opening
> of a tag.
> 
> Anything else you want to add:
> Bug in the client:
> 
> 
> Plugin implementation:
> "--pretty=format:
> 
> %h %ad %s %an
> "
> 
> Similar report:
> 
> 
> Proposed solution:
> Unlike the difficulties with a JSON output format discussed above, I think
> that in this case it would be enough to encode the characters [<] and [&]
> in the content in a way appropriate for HTML.
> (This solution does not involve detecting and rejecting invalid characters.)
> 
> [System Info]
> git version:
> git version 2.32.0
> cpu: x86_64
> no commit associated with this build
> sizeof-long: 8
> sizeof-size_t: 8
> shell-path: /bin/sh
> uname: Linux 5.13.12-1-default #1 SMP Wed Aug 18 08:01:38 UTC 2021 (999e604)
> x86_64
> compiler info: gnuc: 11.1
> libc info: glibc: 2.33
> $SHELL (typically, interactive shell): /bin/bash
> 
> 
> [Enabled Hooks]
> 
> 

Please speak English here (in other words, re-submit git-bugreport
without l10n).

-- 
An old man doll... just what I always wanted! - Clara

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Krzysztof Żelechowski
To: git@vger.kernel.org, Bagas Sanjaya
Subject: Re: git log --encoding=HTML is not supported
Date: Tue, 24 Aug 2021 12:33:10 +0200
Message-ID: <2197959.ZqlxZjeC1n@localhost.localdomain>
In-Reply-To: <22496693-cf63-a278-c85e-d9e4376e2a59@gmail.com>

On Tuesday, 24 August 2021 12:31:14 CEST Bagas Sanjaya wrote:

> Please speak English here (in other words, re-submit git-bugreport
> without l10n).

How do I do that?

Chris
From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bagas Sanjaya
To: Krzysztof Żelechowski, git@vger.kernel.org
Subject: Re: git log --encoding=HTML is not supported
Date: Tue, 24 Aug 2021 17:46:28 +0700
Message-ID: <05ffcc36-f473-14f3-d7df-1efa0dcfcade@gmail.com>
In-Reply-To: <2197959.ZqlxZjeC1n@localhost.localdomain>

On 24/08/21 17.33, Krzysztof Żelechowski wrote:
> On Tuesday, 24 August 2021 12:31:14 CEST Bagas Sanjaya wrote:
> 
>> Please speak English here (in other words, re-submit git-bugreport
>> without l10n).
> 
> How do I do that?

You need to set locale to English when executing `git bugreport`:

```
LANGUAGE=en_US.UTF-8 LC_ALL=en_US.UTF-8 /path/to/git bugreport
```

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Junio C Hamano
To: Emily Shaffer
Cc: Bagas Sanjaya, Krzysztof Żelechowski, git@vger.kernel.org
Subject: Re: git log --encoding=HTML is not supported
Date: Tue, 24 Aug 2021 12:11:59 -0700
In-Reply-To: <05ffcc36-f473-14f3-d7df-1efa0dcfcade@gmail.com>

Bagas Sanjaya writes:

> On 24/08/21 17.33, Krzysztof Żelechowski wrote:
>> On Tuesday, 24 August 2021 12:31:14 CEST Bagas Sanjaya wrote:
>> 
>>> Please speak English here (in other words, re-submit git-bugreport
>>> without l10n).
>> How do I do that?
>
> You need to set locale to English when executing `git bugreport`:
>
> ```
> LANGUAGE=en_US.UTF-8 LC_ALL=en_US.UTF-8 /path/to/git bugreport
> ```

Emily, what's your take on this exchange? I recall that many people
(me included) went for "user-friendliness" by pushing to localize the
questionnaire, but here, it seems to be backfiring on us.

I personally think that it is OK to give a localized questionnaire and
let volunteers who can speak the language help non-English speakers,
but at the same time, it may be a good idea to hint (in a localized
message) that filling in the answers in English would give a better
chance for their problems to be looked at.
From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeff King
To: Krzysztof Żelechowski
Cc: git@vger.kernel.org
Subject: Re: git log --encoding=HTML is not supported
Date: Tue, 24 Aug 2021 20:57:47 -0400
In-Reply-To: <9896630.2IqcCWsCYL@localhost.localdomain>

On Tue, Aug 24, 2021 at 11:00:03AM +0200, Krzysztof Żelechowski wrote:

> What did you do before the bug happened? (Steps to reproduce your issue)
> { git log --oneline --encoding=HTML stl_function.h; }
> 
> What did you expect to happen? (Expected behavior)
> 828176ba490 libstdc++: Improve doxygen comments in <bits/stl_function.h>
> 
> What happened instead? (Actual behavior)
> 828176ba490 libstdc++: Improve doxygen comments in

I can't read the non-English parts of the email, but I gather you were
expecting "--encoding=HTML" to escape syntactically significant HTML
characters. It's not that kind of "encoding", but more "which character
set are you using" (utf8 vs iso8859-1, etc).

We feed the encoding "HTML" to iconv_open(), which of course has no idea
what that is. It's unfortunate, though, that we don't even print a
warning, and instead just quietly leave the text intact. I wonder if we
should do something like:

diff --git a/pretty.c b/pretty.c
index 535eb97fa6..708b618cfe 100644
--- a/pretty.c
+++ b/pretty.c
@@ -672,7 +672,11 @@ const char *repo_logmsg_reencode(struct repository *r,
 	 * If the re-encoding failed, out might be NULL here; in that
 	 * case we just return the commit message verbatim.
 	 */
-	return out ? out : msg;
+	if (!out) {
+		warning("unable to reencode commit to '%s'", output_encoding);
+		return msg;
+	}
+	return out;
 }
 
 static int mailmap_name(const char **email, size_t *email_len,

As far as what you're trying to accomplish, HTML-escaping isn't
something Git supports. You'll have to run the output through an
external escaping mechanism.

-Peff

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Junio C Hamano
To: Jeff King
Cc: Krzysztof Żelechowski, git@vger.kernel.org
Subject: Re: git log --encoding=HTML is not supported
Date: Wed, 25 Aug 2021 09:31:59 -0700

Jeff King writes:

> We feed the encoding "HTML" to iconv_open(), which of course has no idea
> what that is. It's unfortunate, though, that we don't even print a
> warning, and instead just quietly leave the text intact. I wonder if we
> should do something like:
>
> diff --git a/pretty.c b/pretty.c
> index 535eb97fa6..708b618cfe 100644
> --- a/pretty.c
> +++ b/pretty.c
> @@ -672,7 +672,11 @@ const char *repo_logmsg_reencode(struct repository *r,
>  	 * If the re-encoding failed, out might be NULL here; in that
>  	 * case we just return the commit message verbatim.
>  	 */
> -	return out ? out : msg;
> +	if (!out) {
> +		warning("unable to reencode commit to '%s'", output_encoding);
> +		return msg;
> +	}
> +	return out;
>  }
>  
>  static int mailmap_name(const char **email, size_t *email_len,

This addition sounds quite sensible to me. "git log --encoding=bogus"
would issue this warning for each and every commit and that may be a
bit irritating, but being irritating may be a good characteristic for
a warning message that is given to an easily correctable condition.

I originally thought that the warning would be lost to the pager, but
apparently I forgot what I did eons ago at 61b80509 (sending errors to
stdout under $PAGER, 2008-02-16) ;-).
From mboxrd@z Thu Jan 1 00:00:00 1970
From: Krzysztof Żelechowski
To: Jeff King
Cc: git@vger.kernel.org
Subject: Re: git log --encoding=HTML is not supported
Date: Thu, 26 Aug 2021 01:00:44 +0200
Message-ID: <1790169.Z4XVHNUiN4@localhost.localdomain>

On Wednesday, 25 August 2021 02:57:47 CEST Jeff King wrote:
> As far as what you're trying to accomplish, HTML-escaping isn't
> something Git supports. You'll have to run the output through an
> external escaping mechanism.

Have you looked at the format? It is an HTML fragment with placeholders
to be filled in by git log. I cannot run the output through an external
escaping mechanism because it would destroy the markup that is already
there.

Chris
From: Krzysztof Żelechowski
To: Jeff King
Cc: git@vger.kernel.org
Subject: Re: git log --encoding=HTML is not supported
Date: Thu, 26 Aug 2021 01:28:58 +0200
Message-ID: <24330338.EZKKyuarjD@localhost.localdomain>
References: <9896630.2IqcCWsCYL@localhost.localdomain>

On Wednesday, 25 August 2021 02:57:47 CEST, Jeff King wrote:
> diff --git a/pretty.c b/pretty.c

Please fix the manual for git log. It should say what encoding is recognised (namely if supported by iconv(1), except that POSIX character maps of iconv(1p) are not supported), and that an unrecognised encoding is ignored.

I would also like to see the HTML encoding supported independently of iconv, which seems like a pretty easy thing to do. Dream on, I guess?

Chris

From mboxrd@z Thu Jan 1 00:00:00 1970
References: <9896630.2IqcCWsCYL@localhost.localdomain> <24330338.EZKKyuarjD@localhost.localdomain>
In-Reply-To: <24330338.EZKKyuarjD@localhost.localdomain>
From: Bryan Turner
Date: Wed, 25 Aug 2021 16:47:49 -0700
Subject: Re: git log --encoding=HTML is not supported
To: Krzysztof Żelechowski
Cc: Jeff King, Git Users

On Wed, Aug 25, 2021 at 4:29 PM Krzysztof Żelechowski wrote:
>
> On Wednesday, 25 August 2021 02:57:47 CEST, Jeff King wrote:
> > diff --git a/pretty.c b/pretty.c
>
> Please fix the manual for git log. It should say what encoding is recognised
> (namely if supported by iconv(1), except that POSIX character maps of
> iconv(1p) are not supported), and that an unrecognised encoding is ignored.
>
> I would also like to see the HTML encoding supported independently of iconv,
> which seems like a pretty easy thing to do. Dream on, I guess?

I suspect the answer is less "Dream on" and more "Patches welcome."
>
> Chris
>

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Junio C Hamano
To: Bryan Turner
Cc: Krzysztof Żelechowski, Jeff King, Git Users
Subject: Re: git log --encoding=HTML is not supported
Date: Thu, 26 Aug 2021 08:37:40 -0700
In-Reply-To: (Bryan Turner's message of "Wed, 25 Aug 2021 16:47:49 -0700")

Bryan Turner writes:

> On Wed, Aug 25, 2021 at 4:29 PM Krzysztof Żelechowski
> wrote:
>>
>> On Wednesday, 25 August 2021 02:57:47 CEST, Jeff King wrote:
>> > diff --git a/pretty.c b/pretty.c
>>
>> Please fix the manual for git log. It should say what encoding is recognised
>> (namely if supported by iconv(1), except that POSIX character maps of
>> iconv(1p) are not supported), and that an unrecognised encoding is ignored.
>>
>> I would also like to see the HTML encoding supported independently of iconv,
>> which seems like a pretty easy thing to do. Dream on, I guess?
>
> I suspect the answer is less "Dream on" and more "Patches welcome."

Patches are welcomed but not before a proposed design is fleshed out.
I am sure people do welcome the design discussion.

Pieces taken from the contents stored in Git (like "the title of the
commit", "the name of the author of the commit") may need quoting
and/or escaping when they are incorporated into a string to become
parts of "output", and the way the quoting/escaping must be done
would depend on the "host" language/format.
HTML has its own requirements for how these pieces coming from Git
contents are quoted, but it will not be the only "host" language
that needs quoting.

The requirement for the feature we are "Dreaming on" may be much
closer to the "host language" options (e.g. --tcl, --perl ...) the
"git for-each-ref" command has. These options tell us to format
each piece of information (e.g. "%(subject)") taken from Git as a
natural 'string' constant in the host language, so that

	git for-each-ref --shell \
		--format='do_something %(authorname) %(authoremail)'

would write a shell script that calls "do_something" command with
two arguments for each ref enumerated by the command, without having
to worry about whitespaces and quote characters that may appear in
the interpolated pieces.

It is immediately obvious that within the context of the
for-each-ref command, the following would equally be useful (note:
this is already "dreaming on" and does not exist yet):

	echo "<ul>"
	git for-each-ref --html \
		--format='<li>%(authoremail)</li>'
	echo "</ul>"

As we have been seeing efforts to port features around the --format
option between the for-each-ref family of commands and the log
family of commands, I would also imagine that it would be a natural
future direction to extend it to the latter and eventually allow

	git log --html \
		--format='%h%s...'

to format each commit into a single row in an HTML table, and things
like that.

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Krzysztof Żelechowski
To: Bryan Turner, Junio C Hamano
Cc: Jeff King, Git Users
Subject: Re: git log --encoding=HTML is not supported
Date: Thu, 26 Aug 2021 22:52:36 +0200
Message-ID: <3883941.fE8Og5qy2N@localhost.localdomain>
References: <9896630.2IqcCWsCYL@localhost.localdomain>

On Thursday, 26 August 2021 17:37:40 CEST, Junio C Hamano wrote:
> git log --html \
> --format='%h%s...'

I would like to be able to say:
{ git config i18n.logOutputEscape HTML; }

What do you think?

Chris

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Junio C Hamano
To: Krzysztof Żelechowski
Cc: Bryan Turner, Jeff King, Git Users
Subject: Re: git log --encoding=HTML is not supported
Date: Fri, 27 Aug 2021 08:59:40 -0700
In-Reply-To: <3883941.fE8Og5qy2N@localhost.localdomain> ("Krzysztof Żelechowski"'s message of "Thu, 26 Aug 2021 22:52:36 +0200")

Krzysztof Żelechowski writes:

> On Thursday, 26 August 2021 17:37:40 CEST, Junio C Hamano wrote:
>> git log --html \
>> --format='%h%s...'
>
> I would like to be able to say:
> { git config i18n.logOutputEscape HTML; }
>
> What do you think?

It depends on what it does.

If the configuration means that "git log" output (with any supported
options, like "-p") will be given with '<' written as '&lt;' etc. so
that it becomes safe to dump it in HTML, it fails to interest me at
all.
From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 27 Aug 2021 14:30:15 -0400
From: Jeff King
To: Junio C Hamano
Cc: Krzysztof Żelechowski, git@vger.kernel.org
Subject: Re: git log --encoding=HTML is not supported
References: <9896630.2IqcCWsCYL@localhost.localdomain>

On Wed, Aug 25, 2021 at 09:31:59AM -0700, Junio C Hamano wrote:

> Jeff King writes:
>
> > We feed the encoding "HTML" to iconv_open(), which of course has no idea
> > what that is. It's unfortunate, though, that we don't even print a
> > warning, and instead just quietly leave the text intact. I wonder if we
> > should do something like:
> [...]
> This addition sounds quite sensible to me.
>
> "git log --encoding=bogus" would issue this warning for each and
> every commit and that may be a bit irritating, but being irritating
> may be a good characteristic for a warning message that is given to
> an easily correctable condition.
>
> I originally thought that the warning would be lost to the pager,
> but apparently I forgot what I did eons ago at 61b80509 (sending
> errors to stdout under $PAGER, 2008-02-16) ;-).

Here it is polished into a real commit.

-- >8 --
Subject: [PATCH] logmsg_reencode(): warn when iconv() fails

If the user asks for a pretty-printed commit to be converted (either
explicitly with --encoding=foo, or implicitly because the commit is
non-utf8 and we want to convert it), we pass it through iconv(). If that
fails, we fall back to showing the input verbatim, but don't tell the
user that the output may be bogus.

Let's add a warning to do so, along with a mention in the documentation
for --encoding. Two things to note about the implementation:

  - we could produce the warning closer to the call to iconv() in
    reencode_string_len(), which would let us relay the value of errno.
    But this is not actually very helpful. reencode_string_len() does not
    know we are operating on a commit, and indeed does not know that the
    caller won't produce an error of its own. And the errno values from
    iconv() are seldom helpful (iconv_open() only ever produces EINVAL;
    perhaps EILSEQ from iconv() might be illuminating, but it can also
    return EINVAL for incomplete sequences).

  - if the reason for the failure is that the output charset is not
    supported, then the user will see this warning for every commit we
    try to display. That might be ugly and overwhelming, but on the other
    hand it is making it clear that every one of them has not been
    converted (and the likely outcome anyway is to re-try the command
    with a supported output encoding).

Signed-off-by: Jeff King
---
 Documentation/pretty-options.txt | 4 +++-
 pretty.c                         | 6 +++++-
 t/t4210-log-i18n.sh              | 7 +++++++
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/Documentation/pretty-options.txt b/Documentation/pretty-options.txt
index 27ddaf84a1..42b227bc40 100644
--- a/Documentation/pretty-options.txt
+++ b/Documentation/pretty-options.txt
@@ -40,7 +40,9 @@ people using 80-column terminals.
 	defaults to UTF-8. Note that if an object claims to be encoded
 	in `X` and we are outputting in `X`, we will output the object
 	verbatim; this means that invalid sequences in the original
-	commit may be copied to the output.
+	commit may be copied to the output. Likewise, if iconv(3) fails
+	to convert the commit, we will output the original object
+	verbatim, along with a warning.
 
 --expand-tabs=::
 --expand-tabs::
diff --git a/pretty.c b/pretty.c
index 9631529c10..73b5ead509 100644
--- a/pretty.c
+++ b/pretty.c
@@ -671,7 +671,11 @@ const char *repo_logmsg_reencode(struct repository *r,
 	 * If the re-encoding failed, out might be NULL here; in that
 	 * case we just return the commit message verbatim.
 	 */
-	return out ? out : msg;
+	if (!out) {
+		warning("unable to reencode commit to '%s'", output_encoding);
+		return msg;
+	}
+	return out;
 }
 
 static int mailmap_name(const char **email, size_t *email_len,
diff --git a/t/t4210-log-i18n.sh b/t/t4210-log-i18n.sh
index d2dfcf164e..0141f36e33 100755
--- a/t/t4210-log-i18n.sh
+++ b/t/t4210-log-i18n.sh
@@ -131,4 +131,11 @@ do
 	fi
 done
 
+test_expect_success 'log shows warning when conversion fails' '
+	enc=this-encoding-does-not-exist &&
+	git log -1 --encoding=$enc 2>err &&
+	echo "warning: unable to reencode commit to ${SQ}${enc}${SQ}" >expect &&
+	test_cmp expect err
+'
+
 test_done
-- 
2.33.0.396.g72f622fe47

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 27 Aug 2021 14:32:06 -0400
From: Jeff King
To: Junio C Hamano
Cc: Krzysztof Żelechowski, git@vger.kernel.org
Subject: Re: git log --encoding=HTML is not supported
References: <9896630.2IqcCWsCYL@localhost.localdomain>

On Fri, Aug 27, 2021 at 02:30:16PM -0400, Jeff King wrote:

> Here it is polished into a real commit.
>
> Subject: [PATCH] logmsg_reencode(): warn when iconv() fails

And here's a minimal documentation I'd suggest on top. We can discuss
going further in discussing subtleties of iconv() if we want, but IMHO
it would work to stop here.

-- >8 --
Subject: [PATCH] docs: use "character encoding" to refer to commit-object encoding

The word "encoding" can mean a lot of things (e.g., base64 or
quoted-printable encoding in emails, HTML entities, URL encoding, and so
on). The documentation for i18n.commitEncoding and i18n.logOutputEncoding
uses the phrase "character encoding" to make this more clear.

Let's use that phrase in other places to make it clear what kind of
encoding we are talking about. This patch covers the gui.encoding option,
as well as the --encoding option for git-log, etc (in this latter case, I
word-smithed the sentence a little at the same time). That, coupled with
the mention of iconv in the --encoding description, should make this more
clear.

The other spot I looked at is the working-tree-encoding section of
gitattributes(5). But it gives specific examples of encodings that I
think make the meaning pretty clear already.

Signed-off-by: Jeff King
---
 Documentation/config/gui.txt     | 2 +-
 Documentation/pretty-options.txt | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/config/gui.txt b/Documentation/config/gui.txt
index d30831a130..0c087fd8c9 100644
--- a/Documentation/config/gui.txt
+++ b/Documentation/config/gui.txt
@@ -11,7 +11,7 @@ gui.displayUntracked::
 	in the file list. The default is "true".
 
 gui.encoding::
-	Specifies the default encoding to use for displaying of
+	Specifies the default character encoding to use for displaying of
 	file contents in linkgit:git-gui[1] and linkgit:gitk[1].
 	It can be overridden by setting the 'encoding' attribute
 	for relevant files (see linkgit:gitattributes[5]).
diff --git a/Documentation/pretty-options.txt b/Documentation/pretty-options.txt
index 42b227bc40..b3af850608 100644
--- a/Documentation/pretty-options.txt
+++ b/Documentation/pretty-options.txt
@@ -33,7 +33,7 @@ people using 80-column terminals.
 	used together.
 
 --encoding=::
-	The commit objects record the encoding used for the log message
+	Commit objects record the character encoding used for the log message
 	in their encoding header; this option can be used to tell the
 	command to re-code the commit log message in the encoding
 	preferred by the user. For non plumbing commands this
-- 
2.33.0.396.g72f622fe47

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 27 Aug 2021 14:33:22 -0400
From: Jeff King
To: Krzysztof Żelechowski
Cc: git@vger.kernel.org
Subject: Re: git log --encoding=HTML is not supported
References: <9896630.2IqcCWsCYL@localhost.localdomain> <1790169.Z4XVHNUiN4@localhost.localdomain>
In-Reply-To: <1790169.Z4XVHNUiN4@localhost.localdomain>

On Thu, Aug 26, 2021 at 01:00:44AM +0200, Krzysztof Żelechowski wrote:

> On Wednesday, 25 August 2021 02:57:47 CEST, Jeff King wrote:
> > As far as what you're trying to accomplish, HTML-escaping isn't
> > something Git supports. You'll have to run the output through an
> > external escaping mechanism.
>
> Have you looked at the format? It is a HTML fragment with placeholders to be
> filled by git log. I cannot run the output through an external escaping
> mechanism because it will kill the markup that is already there.

Right, what I mean is that you'd have to pull the output out of Git, and
then format (and escape) it separately.

-Peff

From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 27 Aug 2021 14:37:28 -0400
From: Jeff King
To: Junio C Hamano
Cc: Krzysztof Żelechowski, Bryan Turner, Git Users
Subject: Re: git log --encoding=HTML is not supported
References: <9896630.2IqcCWsCYL@localhost.localdomain> <3883941.fE8Og5qy2N@localhost.localdomain>

On Fri, Aug 27, 2021 at 08:59:40AM -0700, Junio C Hamano wrote:

> Krzysztof Żelechowski writes:
>
> > On Thursday, 26 August 2021 17:37:40 CEST, Junio C Hamano wrote:
> >> git log --html \
> >> --format='%h%s...'
> >
> > I would like to be able to say:
> > { git config i18n.logOutputEscape HTML; }
> >
> > What do you think?
>
> It depends on what it does.
>
> If the configuration means that "git log" output (with any supported
> options, like "-p") will be given with '<' written as '&lt;' etc. so
> that it becomes safe to dump it in HTML, it fails to interest me at
> all.

Yeah, I think things get pretty weird when you start thinking about
dumping whole filenames and diff contents.

I wouldn't be opposed to an option for the pretty formatter to have
encodings. Something like:

  git log --format='%(authorname:quote=html)'

I'd probably put off implementing that until we actually unify the
for-each-ref and pretty formats, though (we do not even have
%(authorname) at this point!). The latter already has a quoting
mechanism for shell/perl/python/tcl (though it is not per-atom, and I
wouldn't be opposed to a --format-quote option that quoted all pretty.c
placeholders).

-Peff

From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from pobox.com (unknown [34.74.116.162]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by pb-smtp1.pobox.com (Postfix) with ESMTPSA id 0C241DDB5A; Fri, 27 Aug 2021 15:47:10 -0400 (EDT) (envelope-from junio@pobox.com) From: Junio C Hamano To: Jeff King Cc: Krzysztof =?utf-8?Q?=C5=BBelechowski?= , git@vger.kernel.org Subject: Re: git log --encoding=HTML is not supported References: <9896630.2IqcCWsCYL@localhost.localdomain> Date: Fri, 27 Aug 2021 12:47:08 -0700 In-Reply-To: (Jeff King's message of "Fri, 27 Aug 2021 14:32:06 -0400") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Pobox-Relay-ID: 91369B4C-076F-11EC-94FC-D601C7D8090B-77302942!pb-smtp1.pobox.com Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Jeff King writes: > On Fri, Aug 27, 2021 at 02:30:16PM -0400, Jeff King wrote: > >> Here it is polished into a real commit. >> >> Subject: [PATCH] logmsg_reencode(): warn when iconv() fails > > And here's a minimal documentation I'd suggest on top. We can discuss > going further in discussing subtleties of iconv() if we want, but IMHO > it would work to stop here. Thanks, both patches look sensible. Will queue. 
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3E37DC432BE for ; Fri, 27 Aug 2021 21:51:29 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1984860C40 for ; Fri, 27 Aug 2021 21:51:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232010AbhH0VwR (ORCPT ); Fri, 27 Aug 2021 17:52:17 -0400 Received: from pb-smtp2.pobox.com ([64.147.108.71]:57656 "EHLO pb-smtp2.pobox.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231906AbhH0VwO (ORCPT ); Fri, 27 Aug 2021 17:52:14 -0400 Received: from pb-smtp2.pobox.com (unknown [127.0.0.1]) by pb-smtp2.pobox.com (Postfix) with ESMTP id 71938EF386; Fri, 27 Aug 2021 17:51:25 -0400 (EDT) (envelope-from junio@pobox.com) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed; d=pobox.com; h=from:to:cc :subject:references:date:in-reply-to:message-id:mime-version :content-type; s=sasl; bh=Wi/1dsRBZmphU1jZ2bvUzt9XwRYkQ+Yt96pGTi ZRdqk=; b=vra/qPWZHEkAvcMcafx3gXKXlK4zTlIM99wCY29y0MDDDV9pitu2G6 bAz4UD7VNUvExmF+sHJE2TPM6ggt8XU5cX+LrKRO27JiE15LS3BeaErk+VmrzDJP U7OV7Wjwcgk9c6xNYGGUZzlmEESuiLi4ZAUsSxejICJeyOpJvPV48= Received: from pb-smtp2.nyi.icgroup.com (unknown [127.0.0.1]) by pb-smtp2.pobox.com (Postfix) with ESMTP id 65DB7EF385; Fri, 27 Aug 2021 17:51:25 -0400 (EDT) (envelope-from junio@pobox.com) Received: from pobox.com (unknown [34.74.116.162]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by pb-smtp2.pobox.com (Postfix) with 
ESMTPSA id C844AEF384; Fri, 27 Aug 2021 17:51:24 -0400 (EDT) (envelope-from junio@pobox.com) From: Junio C Hamano To: Jeff King Cc: Krzysztof =?utf-8?Q?=C5=BBelechowski?= , Bryan Turner , Git Users Subject: Re: git log --encoding=HTML is not supported References: <9896630.2IqcCWsCYL@localhost.localdomain> <3883941.fE8Og5qy2N@localhost.localdomain> Date: Fri, 27 Aug 2021 14:51:23 -0700 In-Reply-To: (Jeff King's message of "Fri, 27 Aug 2021 14:37:28 -0400") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-Pobox-Relay-ID: EC9A9202-0780-11EC-A26D-ECFD1DBA3BAF-77302942!pb-smtp2.pobox.com Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Jeff King writes: > I wouldn't be opposed to an option for the pretty formatter to have > encodings. Something like: > > git log --format='%(authorname:quote=html)' > > I'd probably put off implementing that until we actually unify the > for-each-ref and pretty formats, though (we do not even have > %(authorname) at this point!). The latter already has a quoting > mechanism for shell/perl/python/tcl (though it is not per-atom, and I > wouldn't be opposed to a --format-quote option that quoted all pretty.c > placeholders). Yeah, per-atom would be nice, as we can specify which piece needs what kind of quoting, e.g. git log --format=' if test %(authoremail:quote=shell) != "gitster@pobox.com" then echo %(authorname:quote=html+shell) fi ' | sh can be used to write a script to produce an "echo" command with a shell literal string as its argument, where that literal string writes author's name in a way that can be inserted in an HTML document, but omitting the commits by me. 
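Junio's `quote=html+shell` example stacks two quoting layers: the inner one makes the author name safe to paste into an HTML document, and the outer one turns the result into a single literal word for the generated `echo` command. A minimal C sketch of that layering, assuming HTML escaping of `&`, `<`, `>` and `"` and POSIX single-quote shell quoting (each `'` becomes `'\''`). The helpers `html_escape()`, `shell_quote()` and `quote_html_shell()` are hypothetical and not Git's API, and the `%(authorname:quote=...)` syntax itself is only a proposal in this thread.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd copy of s with the characters
 * that are special in HTML text replaced by entities. */
static char *html_escape(const char *s)
{
    size_t len = 0;
    const char *p;
    char *out, *q;

    for (p = s; *p; p++) {              /* first pass: measure */
        switch (*p) {
        case '&': len += 5; break;      /* &amp;      */
        case '<':
        case '>': len += 4; break;      /* &lt; &gt;  */
        case '"': len += 6; break;      /* &quot;     */
        default:  len += 1; break;
        }
    }
    if (!(out = malloc(len + 1)))
        return NULL;
    for (p = s, q = out; *p; p++) {     /* second pass: copy */
        const char *ent = NULL;
        switch (*p) {
        case '&': ent = "&amp;"; break;
        case '<': ent = "&lt;"; break;
        case '>': ent = "&gt;"; break;
        case '"': ent = "&quot;"; break;
        }
        if (ent) {
            size_t n = strlen(ent);
            memcpy(q, ent, n);
            q += n;
        } else {
            *q++ = *p;
        }
    }
    *q = '\0';
    return out;
}

/* Hypothetical helper: wrap s in single quotes for a POSIX shell,
 * turning each embedded ' into '\'' (close, escaped quote, reopen). */
static char *shell_quote(const char *s)
{
    size_t len = 2;                     /* the surrounding quotes */
    const char *p;
    char *out, *q;

    for (p = s; *p; p++)
        len += (*p == '\'') ? 4 : 1;
    if (!(out = malloc(len + 1)))
        return NULL;
    q = out;
    *q++ = '\'';
    for (p = s; *p; p++) {
        if (*p == '\'') {
            memcpy(q, "'\\''", 4);
            q += 4;
        } else {
            *q++ = *p;
        }
    }
    *q++ = '\'';
    *q = '\0';
    return out;
}

/* "html+shell": apply the inner (HTML) layer first, then the outer
 * (shell) layer, so sh strips its quoting and echo prints the HTML. */
static char *quote_html_shell(const char *s)
{
    char *html = html_escape(s);
    char *res = html ? shell_quote(html) : NULL;
    free(html);
    return res;
}
```

For example, `quote_html_shell("O'Neil <x>")` yields `'O'\''Neil &lt;x&gt;'`, and running `echo` with that word through `sh` prints `O'Neil &lt;x&gt;`. The layer applied last is the first one removed (here, by `sh`), which is why a per-atom quoting spec needs a defined order.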
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1EA4EC433F5 for ; Sat, 9 Oct 2021 01:24:13 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id EA18060F9C for ; Sat, 9 Oct 2021 01:24:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232054AbhJIB0H (ORCPT ); Fri, 8 Oct 2021 21:26:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55706 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232001AbhJIB0H (ORCPT ); Fri, 8 Oct 2021 21:26:07 -0400 Received: from mail-ed1-x534.google.com (mail-ed1-x534.google.com [IPv6:2a00:1450:4864:20::534]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0EC2CC061570 for ; Fri, 8 Oct 2021 18:24:11 -0700 (PDT) Received: by mail-ed1-x534.google.com with SMTP id a25so26706099edx.8 for ; Fri, 08 Oct 2021 18:24:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:references:user-agent:in-reply-to :message-id:mime-version:content-transfer-encoding; bh=lIR0EOyYC/NiSw/1tm0oxoUdQZvZNLb6EAySjM92KYw=; b=ahGZctR+dDfRa5lM7LX9u1zbdGxd8AYFSfq50M93Ek56dHPIm+lmQW7ZPRoZf/Uq33 dNecL5kbV1NZ9vn3MbshijYUDHlQfCkxDdLhJ3R6jZOMEE/V0+wRijN8y75BugQr2wyC NRLYb/OYGaGAxt2hjtJZUg2AvPXMW1N8hgLVlXJlTNostlxvQQfBNGqTI0SvJnvAuiNs he6i0prhxs9QPEos454Z8QeDPHxfb4ccActBEE91CVFD7FtrJAypHDsUScQfNKsL1a0m Vl7bYh9CxHl3hdJbo8CIHDzSVUa1XhkLncHH92y/xaL8G8+GlQRdxlgBhJKWHgU1+zQL T6zw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:references:user-agent :in-reply-to:message-id:mime-version:content-transfer-encoding; 
bh=lIR0EOyYC/NiSw/1tm0oxoUdQZvZNLb6EAySjM92KYw=; b=3Dd2QGsnXn5CopWW4i1wDxje7DcURD+wBgb9eWjkQVPQIMkTK7kFEGq0Tre/Wja5DK rv400XDRlA+UZFOSl7wQuV1seZ5N1l7a+VWbsU8kJ0eh6ypocMiJjXxgFOXL5lXOl4O+ B1+T8T9fhwptJ4MXxXBja8v1Cgcf7y8q5UEeDeDuBg/Ms70eoVxlDmJMPhEWqzFfRsnK LbU+IFH1Ci9A2LH0xRrbX+6DsDRxVtQ9a9Bpc8H8irRE+u2gDAl0NmT/wZVc4Y75o0CR mzj/Nn1wqzSKalshfdLJ7XjxD46OMEspMEe9Va5e0hFxUPbkHercXFgBNM8qIdKqge4Z 9+Wg== X-Gm-Message-State: AOAM533lN1sv4cLoGIY0+r52K8mzUolefeBUsZqy1+RWO6x2yvn6TYhf FCmIkTJaekJtQJO6KCkV4HE= X-Google-Smtp-Source: ABdhPJyL++dmerpX0Tqa12Jn9r3OYYyvO70lK4La0vq06JVcFvt1fNpghrOnFLFSk3HwCllrulb02Q== X-Received: by 2002:a05:6402:35c4:: with SMTP id z4mr19628236edc.197.1633742649448; Fri, 08 Oct 2021 18:24:09 -0700 (PDT) Received: from evledraar (j120189.upc-j.chello.nl. [24.132.120.189]) by smtp.gmail.com with ESMTPSA id v13sm381041ede.79.2021.10.08.18.24.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 08 Oct 2021 18:24:09 -0700 (PDT) From: =?utf-8?B?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason To: Jeff King Cc: Junio C Hamano , Krzysztof =?utf-8?Q?=C5=BBelechowski?= , git@vger.kernel.org, Hamza Mahfooz Subject: *Really* noisy encoding warnings post-v2.33.0 Date: Sat, 09 Oct 2021 02:58:10 +0200 References: <9896630.2IqcCWsCYL@localhost.localdomain> User-agent: Debian GNU/Linux bookworm/sid; Emacs 27.1; mu4e 1.7.0 In-reply-to: Message-ID: <87ily7m1mv.fsf@evledraar.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Fri, Aug 27 2021, Jeff King wrote: $subject because I think we should really consider backing this out before it gets to a real release. I ran into this while testing the grep coloring patch[1] (but it's unrelated). 
Before this commit e.g.: LC_ALL=C ~/g/git/git -P -c i18n.commitEncoding=ascii log --author=Ævar -100|wc -l 28333 So ~3k lines for my last 100 commits, but then: $ LC_ALL=C ~/g/git/git -P -c i18n.commitEncoding=ascii log --author=Ævar -100 2>&1|grep -c ^warning 299 At first I thought it was spewing warnings for every failed re-encoded line in some cases, because I get hundreds at a time sometimes, but it's because stderr and stdout I/O buffering is different (a common case). Adding a "fflush(stderr)" "fixes" that. But anyway, I think we've got a lot of users who *do* want to reencode something from, say, UTF-8 to latin1, but then might have the occasional data that isn't representable in latin1. The old behavior of silently falling back is going to be much better for those users, or maybe show one warning at the end or something, if you feel it really needs to be kept. > On Wed, Aug 25, 2021 at 09:31:59AM -0700, Junio C Hamano wrote: > >> Jeff King writes: >> >> > We feed the encoding "HTML" to iconv_open(), which of course has no idea >> > what that is. It's unfortunate, though, that we don't even print a >> > warning, and instead just quietly leave the text intact. I wonder if we >> > should do something like: >> [...] >> This addition sounds quite sensible to me. >> >> "git log --encoding=bogus" would issue this warning for each and >> every commit and that may be a bit irritating, but being irritating >> may be a good characteristic for a warning message that is given to >> an easily correctable condition. >> >> I originally thought that the warning would be lost to the pager, >> but apparently I forgot what I did eons ago at 61b80509 (sending >> errors to stdout under $PAGER, 2008-02-16) ;-). > > Here it is polished into a real commit.
> > -- >8 -- > Subject: [PATCH] logmsg_reencode(): warn when iconv() fails > > If the user asks for a pretty-printed commit to be converted (either > explicitly with --encoding=foo, or implicitly because the commit is > non-utf8 and we want to convert it), we pass it through iconv(). If that > fails, we fall back to showing the input verbatim, but don't tell the > user that the output may be bogus. > > Let's add a warning to do so, along with a mention in the documentation > for --encoding. Two things to note about the implementation: > > - we could produce the warning closer to the call to iconv() in > reencode_string_len(), which would let us relay the value of errno. > But this is not actually very helpful. reencode_string_len() does > not know we are operating on a commit, and indeed does not know that > the caller won't produce an error of its own. And the errno values > from iconv() are seldom helpful (iconv_open() only ever produces > EINVAL; perhaps EILSEQ from iconv() might be illuminating, but it > can also return EINVAL for incomplete sequences). > > - if the reason for the failure is that the output charset is not > supported, then the user will see this warning for every commit we > try to display. That might be ugly and overwhelming, but on the > other hand it is making it clear that every one of them has not been > converted (and the likely outcome anyway is to re-try the command > with a supported output encoding). > > Signed-off-by: Jeff King > --- > Documentation/pretty-options.txt | 4 +++- > pretty.c | 6 +++++- > t/t4210-log-i18n.sh | 7 +++++++ > 3 files changed, 15 insertions(+), 2 deletions(-) > > diff --git a/Documentation/pretty-options.txt b/Documentation/pretty-options.txt > index 27ddaf84a1..42b227bc40 100644 > --- a/Documentation/pretty-options.txt > +++ b/Documentation/pretty-options.txt > @@ -40,7 +40,9 @@ people using 80-column terminals. > defaults to UTF-8.
Note that if an object claims to be encoded > in `X` and we are outputting in `X`, we will output the object > verbatim; this means that invalid sequences in the original > - commit may be copied to the output. > + commit may be copied to the output. Likewise, if iconv(3) fails > + to convert the commit, we will output the original object > + verbatim, along with a warning. > > --expand-tabs=:: > --expand-tabs:: > diff --git a/pretty.c b/pretty.c > index 9631529c10..73b5ead509 100644 > --- a/pretty.c > +++ b/pretty.c > @@ -671,7 +671,11 @@ const char *repo_logmsg_reencode(struct repository *r, > * If the re-encoding failed, out might be NULL here; in that > * case we just return the commit message verbatim. > */ > - return out ? out : msg; > + if (!out) { > + warning("unable to reencode commit to '%s'", output_encoding); > + return msg; > + } > + return out; > } > > static int mailmap_name(const char **email, size_t *email_len, > diff --git a/t/t4210-log-i18n.sh b/t/t4210-log-i18n.sh > index d2dfcf164e..0141f36e33 100755 > --- a/t/t4210-log-i18n.sh > +++ b/t/t4210-log-i18n.sh > @@ -131,4 +131,11 @@ do > fi > done > > +test_expect_success 'log shows warning when conversion fails' ' > + enc=this-encoding-does-not-exist && > + git log -1 --encoding=$enc 2>err && > + echo "warning: unable to reencode commit to ${SQ}${enc}${SQ}" >expect && > + test_cmp expect err > +' > + > test_done From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 49024C433EF for ; Sat, 9 Oct 2021 01:32:14 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1BF0760E76 for ; Sat, 9 Oct 2021 01:32:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id
S244066AbhJIBeJ (ORCPT ); Fri, 8 Oct 2021 21:34:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57460 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232063AbhJIBeH (ORCPT ); Fri, 8 Oct 2021 21:34:07 -0400 Received: from mail-ed1-x52f.google.com (mail-ed1-x52f.google.com [IPv6:2a00:1450:4864:20::52f]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1F0B9C061570 for ; Fri, 8 Oct 2021 18:32:11 -0700 (PDT) Received: by mail-ed1-x52f.google.com with SMTP id b8so42609657edk.2 for ; Fri, 08 Oct 2021 18:32:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:references:user-agent:in-reply-to :message-id:mime-version:content-transfer-encoding; bh=/BXKDzXDYS2W3Dos5odxkzTyykJrDlVy5isPJVbUe1A=; b=alQvYf3tIjTCV8r8IlMU/GzvemIWQvhVxOlR+2jJMfXQlA+EkofDbb2ieo+RP6I4As 7uUKq8w2aW4109aW81ij9dmMivATbPT3ju3yAiFJ0eTL0DB1+ypgr9huDRv+4fbPjMJd mFp92d6ANv1T44Jiv1yrh6xsuzCCj3EBtq68Qs2l9IzgaI5ClEI/KjXk6/ATF4bDc7gU QT5hFaTPvbYHEcv367u8U8haimZARgMbFh1jIphCL0wjc19KG12st6/QJxD2WPabc20X rSEhzjWS7Mt83pXT6wahGWx4GcMCCSxKFkbM3uP/2IQWn/ag1lJ025M89RWyL9rTROau +34w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:references:user-agent :in-reply-to:message-id:mime-version:content-transfer-encoding; bh=/BXKDzXDYS2W3Dos5odxkzTyykJrDlVy5isPJVbUe1A=; b=rariJEwXv3xtak/eRhvWRLiqTMD6htvzcu1AD6VV+rV024Pf9YxH5ozRcKBo53oHki sb9UO2oRyWo/wU9SbBDf0Lkoj8goiFbacBMA9EET/OvAlmgTBEL/wkGpk6nUlQtmR2Yl o/Fn/f0oYAJ6HMCE0x9tD2bFi+S1mG7fshzXT6QkFNfHmIBGxos7PYoYdODaOmkV70yw pAbYFyIKfm7StC2c4LC7X41yXkHl+Zm4ihuJ39FBXCFC5C9LD8FFG85YJyNo4inOQPrl bIWFfUMJvVLjU7lSfkdgVDIO19/SRAUfHLx/Dmg7/iGaXk8UX4j02/32ge59/XeAe5iR j5uw== X-Gm-Message-State: AOAM530CNvn8xKd9sG4wiffDEv1tvpnE0jDBS5KuvjkIS5ty/O7jcqlK CeUHYtG9Dx24mv0hxeWyTxA= X-Google-Smtp-Source: 
ABdhPJw4GxYVHoPDAFgOwrdbAZKYr+NHKP5LO9UyHV8613d6MXo6hrjdDOARs8cIn2bzEyePg/NRQw== X-Received: by 2002:a17:906:ce2c:: with SMTP id sd12mr8546434ejb.488.1633743129551; Fri, 08 Oct 2021 18:32:09 -0700 (PDT) Received: from evledraar (j120189.upc-j.chello.nl. [24.132.120.189]) by smtp.gmail.com with ESMTPSA id h7sm399787ede.19.2021.10.08.18.32.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 08 Oct 2021 18:32:09 -0700 (PDT) From: =?utf-8?B?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason To: Jeff King Cc: Junio C Hamano , Krzysztof =?utf-8?Q?=C5=BBelechows?= =?utf-8?Q?ki?= , git@vger.kernel.org, Hamza Mahfooz Subject: Re: *Really* noisy encoding warnings post-v2.33.0 Date: Sat, 09 Oct 2021 03:29:45 +0200 References: <9896630.2IqcCWsCYL@localhost.localdomain> <87ily7m1mv.fsf@evledraar.gmail.com> User-agent: Debian GNU/Linux bookworm/sid; Emacs 27.1; mu4e 1.7.0 In-reply-to: <87ily7m1mv.fsf@evledraar.gmail.com> Message-ID: <87ee8vm19j.fsf@evledraar.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Sat, Oct 09 2021, Ævar Arnfjörð Bjarmason wrote: > On Fri, Aug 27 2021, Jeff King wrote: > > $subject because I think we should really consider backing this out > before it gets to a real release. > > I ran into this while testing the grep coloring patch[1] (but it's > unrelated). Before this commit e.g.: > > LC_ALL=C ~/g/git/git -P -c i18n.commitEncoding=ascii log --author=Ævar -100|wc -l > 28333 > > So ~3k lines for my last 100 commits, but then: > > $ LC_ALL=C ~/g/git/git -P -c i18n.commitEncoding=ascii log --author=Ævar -100 2>&1|grep -c ^warning > 299 > > At first I thought it was spewing warnings for every failed re-encoded > line in some cases, because I get hundreds at a time sometimes, but it's > because stderr and stdout I/O buffering is different (a common > case.
Adding a "fflush(stderr)" "fixes" that. It's partially that, but also more pathologically: $ git -P -c i18n.commitEncoding=ascii log --author=doesnotexist -1 | wc -l 0 $ git -P -c i18n.commitEncoding=ascii log --author=doesnotexist -1 2>&1 |wc -l 6688 I.e. even if we don't end up emitting anything, we'll still warn. Of course, we might fail to match *because* the re-encoding failed (e.g. if you had a non-ASCII --grep string), but in this case it's rather noisy. > But anyway, I think we've got a lot of users who say *do* want to > reencode something from say UTF-8 to latin1, but then might have the > occasional non-latin1 representable data. The old behavior of silently > falling back is going to be much better for those users, or maybe show > one warning at the end or something, if you feel it really needs to be > kept. From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EADE8C433F5 for ; Sat, 9 Oct 2021 02:36:05 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id C789560D07 for ; Sat, 9 Oct 2021 02:36:05 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244245AbhJICiA (ORCPT ); Fri, 8 Oct 2021 22:38:00 -0400 Received: from cloud.peff.net ([104.130.231.41]:36298 "EHLO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232256AbhJICh7 (ORCPT ); Fri, 8 Oct 2021 22:37:59 -0400 Received: (qmail 7201 invoked by uid 109); 9 Oct 2021 02:36:03 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Sat, 09 Oct 2021 02:36:03 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 27976 invoked by uid 111); 9 Oct 2021 02:36:02 -0000 Received: from coredump.intra.peff.net (HELO
sigill.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Fri, 08 Oct 2021 22:36:02 -0400 Authentication-Results: peff.net; auth=none Date: Fri, 8 Oct 2021 22:36:02 -0400 From: Jeff King To: =?utf-8?B?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason Cc: Junio C Hamano , Krzysztof =?utf-8?Q?=C5=BBelechowski?= , git@vger.kernel.org, Hamza Mahfooz Subject: Re: *Really* noisy encoding warnings post-v2.33.0 Message-ID: References: <9896630.2IqcCWsCYL@localhost.localdomain> <87ily7m1mv.fsf@evledraar.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <87ily7m1mv.fsf@evledraar.gmail.com> Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Sat, Oct 09, 2021 at 02:58:10AM +0200, Ævar Arnfjörð Bjarmason wrote: > I ran into this while testing the grep coloring patch[1] (but it's > unrelated). Before this commit e.g.: > > LC_ALL=C ~/g/git/git -P -c i18n.commitEncoding=ascii log --author=Ævar -100|wc -l > 28333 > > So ~3k lines for my last 100 commits, but then: > > $ LC_ALL=C ~/g/git/git -P -c i18n.commitEncoding=ascii log --author=Ævar -100 2>&1|grep -c ^warning > 299 > > At first I thought it was spewing warnings for every failed re-encoded > line in some cases, because I get hundreds at a time sometimes, but it's > because stderr and stdout I/O buffering is different (a common > case). Adding a "fflush(stderr)" "fixes" that. I don't think the buffering is the issue. By default stderr flushes on lines, and we flush commits after showing them. If you take away "-P" (or look at the combined 2>&1 output in order), you'll see that they are grouped. Now one thing you might notice is that there may be multiple warnings between output commits. But that's because we really are re-encoding each of those intermediate commits to do your --author grep. 
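That per-commit re-encoding goes through iconv(3), and the two failure modes mentioned in the patch description (EINVAL from iconv_open(), EILSEQ from iconv()) can be reproduced outside Git. A minimal C sketch, assuming UTF-8 input and a glibc-style iconv that reports a character with no equivalent in the target charset as `EILSEQ` (other implementations may substitute a replacement character instead); `reencode_utf8()` is a hypothetical helper, not Git's `reencode_string_len()`.

```c
#include <errno.h>
#include <iconv.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated UTF-8 string `in` to
 * `to_enc` into `out`. Returns 0 on success, or the errno value from the
 * step that failed:
 *   EINVAL from iconv_open(): the charset name is not supported;
 *   EILSEQ from iconv():      a character cannot be converted;
 *   EINVAL from iconv():      input ends in an incomplete sequence;
 *   E2BIG  from iconv():      `out` is too small. */
static int reencode_utf8(const char *in, const char *to_enc,
                         char *out, size_t outsize)
{
    iconv_t cd;
    char *inp = (char *)in;     /* iconv() takes char **, not const */
    char *outp = out;
    size_t inleft = strlen(in);
    size_t outleft;
    int err = 0;

    if (outsize == 0)
        return E2BIG;
    outleft = outsize - 1;      /* reserve room for the terminating NUL */

    cd = iconv_open(to_enc, "UTF-8");
    if (cd == (iconv_t)-1)
        return errno;

    if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1)
        err = errno;            /* save before iconv_close() can clobber it */
    *outp = '\0';
    iconv_close(cd);
    return err;
}
```

With ASCII as the output charset, `Ævar` fails inside iconv() with `EILSEQ`, so a match against the re-encoded text cannot succeed; an unknown charset name such as `this-encoding-does-not-exist` (or `HTML`, from the original report) fails earlier, in iconv_open(), with `EINVAL`.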
And if that re-encoding fails, we may well be producing the wrong output, because the matching won't be correct (in your case, presumably the correct output should be _nothing_, because Æ is not an ASCII character). I do think the current warning is particularly bad there, because it doesn't even mention the commit oid. So something like: diff --git a/pretty.c b/pretty.c index 708b618cfe..ddf501632d 100644 --- a/pretty.c +++ b/pretty.c @@ -673,7 +673,8 @@ const char *repo_logmsg_reencode(struct repository *r, * case we just return the commit message verbatim. */ if (!out) { - warning("unable to reencode commit to '%s'", output_encoding); + warning("unable to reencode commit %s to '%s'", + oid_to_hex(&commit->object.oid), output_encoding); return msg; } return out; means you get output like: $ git -c i18n.commitEncoding=ascii log --format='%h %s' --author=Ævar -100 warning: unable to reencode commit c90cfc225baaf64af311f7e2953267e4de636205 to 'ascii' warning: unable to reencode commit 1d1d731d30cbcd5f3a6a5cbac1fe218e4d4db72b to 'ascii' warning: unable to reencode commit 66237bcf60df357f188551e1ea4db90f94c519ae to 'ascii' warning: unable to reencode commit 100c2da2d3a330366588143d720f09a88926972a to 'ascii' warning: unable to reencode commit 59580685bee17de3efff614df7f508133d1e4a7a to 'ascii' 59580685be config.h: remove unused git_config_get_untracked_cache() declaration warning: unable to reencode commit 067e73c8aee9aeb05eac939205274cd2ad8b7cae to 'ascii' 067e73c8ae log-tree.h: remove unused function declarations [...etc...] If that were coupled with, say, an advise() call to explain that output and matching might be inaccurate (and show that _once_), that might make it more clear what's going on. Now I am sympathetic to flooding the user with too many messages, and maybe reducing this to a single instance of "some commit messages could not be re-encoded; output and matching might be inaccurate" is the right thing.
But in a sense, it's also working as designed: what you asked for is producing wrong output over and over, and Git is saying so. I'm not even sure what you're trying to do with that command. It could never output a single correct commit, because you've asked to match only commits that will be shown in the wrong encoding. > But anyway, I think we've got a lot of users who say *do* want to > reencode something from say UTF-8 to latin1, but then might have the > occasional non-latin1 representable data. The old behavior of silently > falling back is going to be much better for those users, or maybe show > one warning at the end or something, if you feel it really needs to be > kept. If there are real-world cases where the quantity of errors is really getting in the way, I'm open to the idea of having a single error message. And personally, I don't really have any experience working with broken encodings (all my commits are in utf8, and that's what I use as output). It just seems weird to me that 'git log --encoding=foo' would quietly ignore the option entirely (i.e., the old behavior, which did lead to a confused user and a post to the list). 
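Peff's closing paragraph, like Ævar's suggestion to "show one warning at the end", points at reporting failures once instead of once per commit. Peff's follow-up patch later in the thread warns once, at the first failure; a minimal C sketch of the other variant counts failures during the walk and formats a single summary afterwards. The struct, the function names and the message wording are hypothetical, and where Git would emit the summary (for instance once the revision walk finishes) is an assumption.

```c
#include <stdio.h>

/* Hypothetical per-command state; not a Git structure. */
struct reencode_stats {
    const char *output_encoding;
    unsigned failed;            /* commits whose re-encoding failed */
};

/* Called instead of warning() each time a commit fails to convert. */
static void note_reencode_failure(struct reencode_stats *st)
{
    st->failed++;
}

/* Format one summary line into buf. Returns 0 and leaves buf untouched
 * when nothing failed; otherwise returns snprintf()'s result. */
static int format_reencode_summary(const struct reencode_stats *st,
                                   char *buf, size_t size)
{
    if (!st->failed)
        return 0;
    return snprintf(buf, size,
                    "warning: %u commit(s) could not be re-encoded to '%s'",
                    st->failed, st->output_encoding);
}
```

Compared with a warn-once flag, the count tells the user how widespread the problem is, at the cost of delaying the message until the end of the output.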
-Peff From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 419B4C433F5 for ; Sat, 9 Oct 2021 02:42:35 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1626E60EFE for ; Sat, 9 Oct 2021 02:42:35 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S244182AbhJICoa (ORCPT ); Fri, 8 Oct 2021 22:44:30 -0400 Received: from cloud.peff.net ([104.130.231.41]:36310 "EHLO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244102AbhJICo3 (ORCPT ); Fri, 8 Oct 2021 22:44:29 -0400 Received: (qmail 7230 invoked by uid 109); 9 Oct 2021 02:42:33 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with ESMTP; Sat, 09 Oct 2021 02:42:33 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 28049 invoked by uid 111); 9 Oct 2021 02:42:33 -0000 Received: from coredump.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.2) by peff.net (qpsmtpd/0.94) with (TLS_AES_256_GCM_SHA384 encrypted) ESMTPS; Fri, 08 Oct 2021 22:42:33 -0400 Authentication-Results: peff.net; auth=none Date: Fri, 8 Oct 2021 22:42:32 -0400 From: Jeff King To: =?utf-8?B?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason Cc: Junio C Hamano , Krzysztof =?utf-8?Q?=C5=BBelechowski?= , git@vger.kernel.org, Hamza Mahfooz Subject: Re: *Really* noisy encoding warnings post-v2.33.0 Message-ID: References: <9896630.2IqcCWsCYL@localhost.localdomain> <87ily7m1mv.fsf@evledraar.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Fri, Oct 08, 2021 at 10:36:02PM -0400, Jeff King wrote: > If that were coupled with, say, an advise() call to 
explain that output > and matching might be inaccurate (and show that _once_), that might > might it more clear what's going on. > > Now I am sympathetic to flooding the user with too many messages, and > maybe reducing this to a single instance of "some commit messages could > not be re-encoded; output and matching might be inaccurate" is the right > thing. But in a sense, it's also working as designed: what you asked for > is producing wrong output over and over, and Git is saying so. The single-output version would perhaps be something like this: diff --git a/pretty.c b/pretty.c index 708b618cfe..c86f41bae7 100644 --- a/pretty.c +++ b/pretty.c @@ -606,6 +606,21 @@ static char *replace_encoding_header(char *buf, const char *encoding) return strbuf_detach(&tmp, NULL); } +static void show_encoding_warning(const char *output_encoding) +{ + static int seen_warning; + + if (seen_warning) + return; + + seen_warning = 1; + warning("one or more commits could not be re-encoded to '%s'", + output_encoding); + advise("When re-encoding fails, some output may be in an unexpected\n" + "encoding, and pattern matches against commit data may be\n" + "inaccurate."); +} + const char *repo_logmsg_reencode(struct repository *r, const struct commit *commit, char **commit_encoding, @@ -673,7 +688,7 @@ const char *repo_logmsg_reencode(struct repository *r, * case we just return the commit message verbatim. 
*/ if (!out) { - warning("unable to reencode commit to '%s'", output_encoding); + show_encoding_warning(output_encoding); return msg; } return out; From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 117FFC433EF for ; Sat, 9 Oct 2021 14:33:39 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id E1A5060F70 for ; Sat, 9 Oct 2021 14:33:38 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234010AbhJIOfe (ORCPT ); Sat, 9 Oct 2021 10:35:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60012 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233850AbhJIOfe (ORCPT ); Sat, 9 Oct 2021 10:35:34 -0400 Received: from mail-ed1-x531.google.com (mail-ed1-x531.google.com [IPv6:2a00:1450:4864:20::531]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 16595C061570 for ; Sat, 9 Oct 2021 07:33:37 -0700 (PDT) Received: by mail-ed1-x531.google.com with SMTP id y12so34565007eda.4 for ; Sat, 09 Oct 2021 07:33:37 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=from:to:cc:subject:date:references:user-agent:in-reply-to :message-id:mime-version; bh=H0PLKci7M0+T204HK2QbKGNf0BLsW+oCUw7vKaxM3pI=; b=aUD0pV+thQuUWly/0vwqOMak/ERsRDc6Jmbhe7Ejs1SfAc8LeCTm15Q2eSDMQHsM2T DfH0rayaYvkP4jBwvx7LhZDJ8OXAkKv490+GrsQV6JvGpazaZZhaEasXUZoKeDHUzEnI DtGI+5Y+4wQj9S4fnrKa9HVOaHjZD74oAfptT2iqBPiJdHH3hZDcMcJiVkWyzXf6HK3z 7SE6+Y/+qdJuA6lS4RBCCPMkjAZQ2sDHyny2CeWZunEtJVMBSQQGvWgIZJ3iP4wAPGkV fm8Zw82RqCHAPPsbLj0KluUFO7NoVqC3vGKq3Z0fhcWS6h8cX122CvD9uCylEvYT4689 SeJg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; 
h=x-gm-message-state:from:to:cc:subject:date:references:user-agent :in-reply-to:message-id:mime-version; bh=H0PLKci7M0+T204HK2QbKGNf0BLsW+oCUw7vKaxM3pI=; b=1SSpKaVy4svzXI2zxnHjt+W9dfFf/qxlMMXln0uBKkS2aPou7FizPLxcqVWbx4tM9Y p0UWajoU+Gf5EO0M7/tdJUSg8oEuQMtqpTeViXX4SQN3iGHAmBQqIIq56L3p38OrLZ+i BCmnBoxiDpz3Wi6VUT5eGhcz7DDELc2eywqPPGOEC42GFs8ppcXuJCSMfAPb/sUCMN7P L2oq280/JsZOci0Hbw4bFQ+riGLfyftpdoaR2xz582TniWTjLBX03OHM+9kbVct9ECgV Vjt6RMZY0GPySjS528rTp5uetRCfiBVIV29MH9aUWuS0Kb1wOq1NoEBY6z04M3nf1ozr 5Waw== X-Gm-Message-State: AOAM532Kh4qMWD6+2ou/of4Cjkp0dCGFiqKC3bT3fpu+/j1XnE6I0db7 FGO0gPwbp2xJL5aSh5+JJ+vHt98l68temw== X-Google-Smtp-Source: ABdhPJyif1HQ8tr8iwBQvQYeN5PHTywbjNRzdcnQa3d2y1+Pe9KZzqnHjLgdl4GlRn/Akkb9qYgyeQ== X-Received: by 2002:a05:6402:5112:: with SMTP id m18mr16354109edd.101.1633790015475; Sat, 09 Oct 2021 07:33:35 -0700 (PDT) Received: from evledraar (j120189.upc-j.chello.nl. [24.132.120.189]) by smtp.gmail.com with ESMTPSA id u16sm1054590ejy.14.2021.10.09.07.33.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 09 Oct 2021 07:33:34 -0700 (PDT) From: =?utf-8?B?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason To: Jeff King Cc: Junio C Hamano , Krzysztof =?utf-8?Q?=C5=BBelechows?= =?utf-8?Q?ki?= , git@vger.kernel.org, Hamza Mahfooz Subject: Re: *Really* noisy encoding warnings post-v2.33.0 Date: Sat, 09 Oct 2021 15:47:16 +0200 References: <9896630.2IqcCWsCYL@localhost.localdomain> <87ily7m1mv.fsf@evledraar.gmail.com> User-agent: Debian GNU/Linux bookworm/sid; Emacs 27.1; mu4e 1.7.0 In-reply-to: Message-ID: <871r4umfnm.fsf@evledraar.gmail.com> MIME-Version: 1.0 Content-Type: text/plain Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Fri, Oct 08 2021, Jeff King wrote: > On Fri, Oct 08, 2021 at 10:36:02PM -0400, Jeff King wrote: > >> If that were coupled with, say, an advise() call to explain that output >> and matching might be inaccurate (and show that _once_), that might >> might it more clear what's going on. 
>>
>> Now I am sympathetic to flooding the user with too many messages, and
>> maybe reducing this to a single instance of "some commit messages could
>> not be re-encoded; output and matching might be inaccurate" is the right
>> thing. But in a sense, it's also working as designed: what you asked for
>> is producing wrong output over and over, and Git is saying so.
>
> The single-output version would perhaps be something like this:
>
> diff --git a/pretty.c b/pretty.c
> index 708b618cfe..c86f41bae7 100644
> --- a/pretty.c
> +++ b/pretty.c
> @@ -606,6 +606,21 @@ static char *replace_encoding_header(char *buf, const char *encoding)
>  	return strbuf_detach(&tmp, NULL);
>  }
>  
> +static void show_encoding_warning(const char *output_encoding)
> +{
> +	static int seen_warning;
> +
> +	if (seen_warning)
> +		return;
> +
> +	seen_warning = 1;
> +	warning("one or more commits could not be re-encoded to '%s'",
> +		output_encoding);
> +	advise("When re-encoding fails, some output may be in an unexpected\n"
> +	       "encoding, and pattern matches against commit data may be\n"
> +	       "inaccurate.");
> +}
> +
>  const char *repo_logmsg_reencode(struct repository *r,
>  				 const struct commit *commit,
>  				 char **commit_encoding,
> @@ -673,7 +688,7 @@ const char *repo_logmsg_reencode(struct repository *r,
>  	 * case we just return the commit message verbatim.
>  	 */
>  	if (!out) {
> -		warning("unable to reencode commit to '%s'", output_encoding);
> +		show_encoding_warning(output_encoding);
>  		return msg;
>  	}
>  	return out;

I'm not categorically opposed to having this warning stay, because I can
imagine an implementation of it that isn't so overmatching, but I think
neither one of us is going to work on that, so...

We have the exact same edge case in the grep-a-file code, and it's a
deliberate decision to stay quiet about it. Let's assume your pattern
contains non-ASCII, you've asked for locale-aware grepping, and you want
\d+ to mean all sorts of Unicode fuzziness instead of just [0-9] etc.
(not yet implemented, PCRE_UCP). It would still be annoying to see a
warning every time you grep without providing a pathspec that blacklists,
say, the 100 '*.png' files that are in your tree.

And that's a case where we *could* say that the user should mark them
with .gitattributes or whatever, but making every git user go through
that would be annoying to them, so we just do our best and silently fall
back.

Similarly with this, let's say I'm on an OS that likes UTF-16 better, as
some of our users do, and I have the relevant settings to re-encode
git.git or linux.git. Now run:

    git -c i18n.logOutputEncoding=utf16 log --author=foobarbaz

And it's 2 warnings in git.git, and 157 in linux.git.

Anyway, your commit above makes that 1 in both cases, which is certainly
a *lot* better. But I think that, similar to the grep-a-file case, it's
still way too much: just because I've got some old badly encoded commits
in my history, I'll now see one warning every time a log revision
walk/grep comes across those.

On the "not categorically opposed" point, I think that this sort of
warning /might/ be good if:

 * It weren't enabled by default, or at least as a transition had
   something like an advise() message pointing at something
   fsck.skipList-like (or other instructions, replace?) about how to
   quiet it.

 * We weren't so dumb about how we chain data->iconv->PCRE here. I.e.
   we'll whine that we can't re-encode just to match my "foobarbaz", but
   we could just keep walking past bad bytes. We should ideally say "we
   might have matched your data, but *because* of the encoding failure
   we couldn't". We can easily know with something like "foobarbaz" that
   that's not the case.

Anyway, I think all of that we can leave for the future, because I'd
simply assumed that this was based on some report that had to do with
someone not matching with --grep or whatever because of the details of
the encoding going wrong, e.g. a string that's later in a commit message,
but a misencoded character tripped it up.
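For reference, the drop from 2 or 157 warnings to 1 comes from the function-local static flag in the quoted show_encoding_warning() patch. A minimal standalone sketch of that pattern, under stated assumptions: the function name is hypothetical, fprintf() stands in for Git's warning() and advise(), and it returns whether it printed so the behaviour can be checked.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the show_encoding_warning() helper in the
 * quoted pretty.c patch: the first re-encoding failure prints a warning
 * plus a hint, and every later failure in the same process is silent.
 * fprintf() replaces Git's warning()/advise(). Returns 1 if something
 * was printed, 0 otherwise.
 */
static int warn_reencode_failure_once(const char *output_encoding)
{
	static int seen_warning; /* static: zero-initialized, kept across calls */

	assert(output_encoding != NULL);
	if (seen_warning)
		return 0;

	seen_warning = 1;
	fprintf(stderr, "warning: one or more commits could not be "
		"re-encoded to '%s'\n", output_encoding);
	fprintf(stderr, "hint: When re-encoding fails, some output may be in "
		"an unexpected\nhint: encoding, and pattern matches against "
		"commit data may be\nhint: inaccurate.\n");
	return 1;
}
```

Called once per failing commit, this prints a single warning for the whole process whether the walk hits 2 failures or 157. It still prints on every run that meets a bad commit, which is the remaining objection above.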
But in this case this seems to have been because someone tried to feed
"HTML" to it, which is not an encoding, and something iconv_open() has
(I daresay) always errored on, and always will. It returns -1 and sets
errno=EINVAL.

So having a warning or other detection in the revision loop seems
backwards to me; surely we want something like the below instead? I.e.
die as close to bad option parsing as possible?

Note that this will now die if we have NO_ICONV=Y, which seems like a
feature: currently, even with your patch, we'll silently ignore it. I.e.
we'll warn because we failed to re-encode, but that's only because we're
using a stub function whose body is:

    { if (e) *e = 0; return NULL; }

So, as with the garbage encoding name, we should have died a lot earlier.

Aside from your warning test, the below makes tests in t4201-shortlog.sh
fail, but those just seem broken to me. I.e. they seem to rely on git
staying quiet if i18n.commitencoding is set to garbage.

diff --git a/environment.c b/environment.c
index 43bb1b35ffe..c26b18f8e5c 100644
--- a/environment.c
+++ b/environment.c
@@ -357,8 +357,18 @@ void set_git_dir(const char *path, int make_realpath)
 
 const char *get_log_output_encoding(void)
 {
-	return git_log_output_encoding ? git_log_output_encoding
+	const char *encoding = git_log_output_encoding ? git_log_output_encoding
 		: get_commit_output_encoding();
+#ifndef NO_ICONV
+	iconv_t conv;
+	conv = iconv_open(encoding, "UTF-8");
+	if (conv == (iconv_t) -1 && errno == EINVAL)
+		die_errno("the '%s' encoding is not known to iconv", encoding);
+#else
+	if (strcmp(encoding, "UTF-8"))
+		die("compiled with NO_ICONV=Y, can't re-encode to '%s'", encoding);
+#endif
+	return encoding;
 }
 
 const char *get_commit_output_encoding(void)

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Johannes Sixt
To: Jeff King
Cc: Junio C Hamano, Krzysztof Żelechowski, git@vger.kernel.org, Hamza Mahfooz, Ævar Arnfjörð Bjarmason
Subject: Re: *Really* noisy encoding warnings post-v2.33.0
Date: Sun, 10 Oct 2021 15:53:31 +0200
References: <9896630.2IqcCWsCYL@localhost.localdomain> <87ily7m1mv.fsf@evledraar.gmail.com>
Message-ID: <5eca71b7-e4df-92a1-35bf-5a99550e558e@kdbg.org>

On 09.10.21 at 04:36, Jeff King wrote:
> On Sat, Oct 09, 2021 at 02:58:10AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> I ran into this while testing the grep coloring patch[1] (but it's
>> unrelated). Before this commit e.g.:
>>
>>     LC_ALL=C ~/g/git/git -P -c i18n.commitEncoding=ascii log --author=Ævar -100|wc -l
>>     28333
>>
>> So ~3k lines for my last 100 commits, but then:
>>
>>     $ LC_ALL=C ~/g/git/git -P -c i18n.commitEncoding=ascii log --author=Ævar -100 2>&1|grep -c ^warning
>>     299
>>
>> At first I thought it was spewing warnings for every failed re-encoded
>> line in some cases, because I get hundreds at a time sometimes, but it's
>> because stderr and stdout I/O buffering is different (a common
>> case). Adding a "fflush(stderr)" "fixes" that.
>
> I don't think the buffering is the issue. By default stderr flushes on
> lines, and we flush commits after showing them. If you take away "-P"
> (or look at the combined 2>&1 output in order), you'll see that they are
> grouped.
>
> Now one thing you might notice is that there may be multiple warnings
> between output commits. But that's because we really are re-encoding
> each of those intermediate commits to do your --author grep. And if that
> re-encoding fails, we may well be producing the wrong output, because
> the matching won't be correct (in your case, presumably the correct
> output should be _nothing_, because Æ is not an ascii character).

I don't understand why i18n.commitEncoding plays a role here. Isn't it
an instruction "when you make a commit, mark the commit message as having
this encoding"? But grep does not make a commit.

If this were i18n.logOutputEncoding it would make much more sense.

Have I misunderstood the meaning of the two options?

-- 
Hannes

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ævar Arnfjörð Bjarmason
To: Johannes Sixt
Cc: Jeff King, Junio C Hamano, Krzysztof Żelechowski, git@vger.kernel.org, Hamza Mahfooz
Subject: Re: *Really* noisy encoding warnings post-v2.33.0
Date: Sun, 10 Oct 2021 17:43:42 +0200
References: <9896630.2IqcCWsCYL@localhost.localdomain> <87ily7m1mv.fsf@evledraar.gmail.com> <5eca71b7-e4df-92a1-35bf-5a99550e558e@kdbg.org>
In-Reply-To: <5eca71b7-e4df-92a1-35bf-5a99550e558e@kdbg.org>
Message-ID: <87sfx8lw8j.fsf@evledraar.gmail.com>

On Sun, Oct 10 2021, Johannes Sixt wrote:

> On 09.10.21 at 04:36, Jeff King wrote:
>> On Sat, Oct 09, 2021 at 02:58:10AM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>>> I ran into this while testing the grep coloring patch[1] (but it's
>>> unrelated). Before this commit e.g.:
>>>
>>>     LC_ALL=C ~/g/git/git -P -c i18n.commitEncoding=ascii log --author=Ævar -100|wc -l
>>>     28333
>>>
>>> So ~3k lines for my last 100 commits, but then:
>>>
>>>     $ LC_ALL=C ~/g/git/git -P -c i18n.commitEncoding=ascii log --author=Ævar -100 2>&1|grep -c ^warning
>>>     299
>>>
>>> At first I thought it was spewing warnings for every failed re-encoded
>>> line in some cases, because I get hundreds at a time sometimes, but it's
>>> because stderr and stdout I/O buffering is different (a common
>>> case). Adding a "fflush(stderr)" "fixes" that.
>>
>> I don't think the buffering is the issue. By default stderr flushes on
>> lines, and we flush commits after showing them. If you take away "-P"
>> (or look at the combined 2>&1 output in order), you'll see that they are
>> grouped.
>>
>> Now one thing you might notice is that there may be multiple warnings
>> between output commits. But that's because we really are re-encoding
>> each of those intermediate commits to do your --author grep. And if that
>> re-encoding fails, we may well be producing the wrong output, because
>> the matching won't be correct (in your case, presumably the correct
>> output should be _nothing_, because Æ is not an ascii character).
>
> I don't understand why i18n.commitEncoding plays a role here. Isn't it
> an instruction "when you make a commit, mark the commit message as having
> this encoding"? But grep does not make a commit.
>
> If this were i18n.logOutputEncoding it would make much more sense.
>
> Have I misunderstood the meaning of the two options?

It doesn't; see my later <871r4umfnm.fsf@evledraar.gmail.com> for when I
got it right.

For the number of warnings etc. it's the same whether we call iconv
because it's e.g. ascii->utf-8 and that triggers iconv() issues, or (with
i18n.logOutputEncoding) utf-8->ascii.
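The point that the warning count does not depend on the direction of the conversion can be checked with plain iconv(3). The sketch below uses a hypothetical reencode() helper (it is not Git's reencode_string_len()) and assumes glibc's iconv, which fails with EILSEQ on input it cannot convert. Other iconv implementations may substitute a replacement character instead. Non-ASCII UTF-8 bytes fail both when the input is declared as ASCII and when the output is ASCII, and an unknown name such as "HTML" already fails in iconv_open().

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, loosely modelled on how a commit message gets
 * re-encoded: convert the NUL-terminated string `in` from encoding
 * `from` to encoding `to`. Returns a malloc'd, NUL-terminated result,
 * or NULL when iconv_open() rejects the pair (e.g. an unknown name) or
 * iconv() meets input it cannot convert. A caller such as a log walker
 * would then fall back to the raw message (and perhaps warn).
 */
static char *reencode(const char *in, const char *to, const char *from)
{
	assert(in != NULL && to != NULL && from != NULL);

	iconv_t cd = iconv_open(to, from);
	if (cd == (iconv_t)-1)
		return NULL; /* errno is EINVAL for an unsupported pair */

	size_t inleft = strlen(in);
	size_t outsize = 4 * inleft + 16; /* ample room for short inputs */
	char *out = malloc(outsize);
	if (!out) {
		iconv_close(cd);
		return NULL;
	}

	char *inp = (char *)in; /* iconv() takes char ** but only reads input */
	char *outp = out;
	size_t outleft = outsize - 1; /* keep one byte for the NUL */
	size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
	if (rc != (size_t)-1) /* flush shift state (a no-op for UTF-8/ASCII) */
		rc = iconv(cd, NULL, NULL, &outp, &outleft);
	iconv_close(cd);

	if (rc == (size_t)-1) { /* EILSEQ, EINVAL or E2BIG */
		free(out);
		return NULL;
	}
	*outp = '\0';
	return out;
}
```

With this helper, reencode("\xC3\x86var", "ASCII", "UTF-8") and reencode("\xC3\x86var", "UTF-8", "ASCII") both return NULL, so a caller that warns on NULL warns equally often whichever setting triggers the conversion.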

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ævar Arnfjörð Bjarmason
To: Jeff King
Cc: Junio C Hamano, Krzysztof Żelechowski, git@vger.kernel.org, Hamza Mahfooz
Subject: Re: *Really* noisy encoding warnings post-v2.33.0
Date: Sat, 23 Oct 2021 00:58:34 +0200
References: <9896630.2IqcCWsCYL@localhost.localdomain> <87ily7m1mv.fsf@evledraar.gmail.com>
Message-ID: <211023.86sfwsis1i.gmgdl@evledraar.gmail.com>

On Fri, Oct 08 2021, Jeff King wrote:

> On Fri, Oct 08, 2021 at 10:36:02PM -0400, Jeff King wrote:
>
>> If that were coupled with, say, an advise() call to explain that output
>> and matching might be inaccurate (and show that _once_), that might
>> make it more clear what's going on.
>>
>> Now I am sympathetic to flooding the user with too many messages, and
>> maybe reducing this to a single instance of "some commit messages could
>> not be re-encoded; output and matching might be inaccurate" is the right
>> thing. But in a sense, it's also working as designed: what you asked for
>> is producing wrong output over and over, and Git is saying so.
>
> The single-output version would perhaps be something like this:
>
> diff --git a/pretty.c b/pretty.c
> index 708b618cfe..c86f41bae7 100644
> --- a/pretty.c
> +++ b/pretty.c
> @@ -606,6 +606,21 @@ static char *replace_encoding_header(char *buf, const char *encoding)
>  	return strbuf_detach(&tmp, NULL);
>  }
>  
> +static void show_encoding_warning(const char *output_encoding)
> +{
> +	static int seen_warning;
> +
> +	if (seen_warning)
> +		return;
> +
> +	seen_warning = 1;
> +	warning("one or more commits could not be re-encoded to '%s'",
> +		output_encoding);
> +	advise("When re-encoding fails, some output may be in an unexpected\n"
> +	       "encoding, and pattern matches against commit data may be\n"
> +	       "inaccurate.");
> +}
> +
>  const char *repo_logmsg_reencode(struct repository *r,
>  				 const struct commit *commit,
>  				 char **commit_encoding,
> @@ -673,7 +688,7 @@ const char *repo_logmsg_reencode(struct repository *r,
>  	 * case we just return the commit message verbatim.
>  	 */
>  	if (!out) {
> -		warning("unable to reencode commit to '%s'", output_encoding);
> +		show_encoding_warning(output_encoding);
>  		return msg;
>  	}
>  	return out;

*Poke* about this. We're getting pretty close to release.

I think the WIP hunk I posted in
https://lore.kernel.org/git/871r4umfnm.fsf@evledraar.gmail.com/ presents
a good way forward with this.

I.e. aside from how wise it is to warn about this in general, I think
there's a pretty bad bug in your implementation where what should
effectively be a parse_options() or git_config()-time one-off warning is
being fired off for every single commit in "git log" in some cases.
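A check done once at option/config time, rather than once per commit, could look roughly like the sketch below. The struct and function names are hypothetical. It probes the actual (to, from) pair instead of assuming UTF-8 on one side, because a pair-agnostic probe can reject conversions that would in fact work (a concern raised later in the thread). The cache is caller-owned so the number of iconv_open() calls can be observed. In a real program it could just be a static variable.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical once-per-process check of an encoding pair. It assumes
 * the same (to, from) pair is passed on every call: after the first
 * call the cached answer is returned without calling iconv_open() again.
 */
struct encoding_probe {
	int done;      /* has iconv_open() been tried yet? */
	int supported; /* did that single attempt succeed? */
	int probes;    /* how many times iconv_open() was actually called */
};

static int encoding_pair_supported(struct encoding_probe *p,
				   const char *to, const char *from)
{
	assert(p != NULL && to != NULL && from != NULL);
	if (p->done)
		return p->supported; /* cached: no further iconv_open() calls */

	iconv_t cd = iconv_open(to, from);
	p->probes++;
	p->done = 1;
	p->supported = (cd != (iconv_t)-1);
	if (p->supported)
		iconv_close(cd);
	else /* reported once, e.g. EINVAL for a name like "HTML" */
		fprintf(stderr, "error: cannot convert from '%s' to '%s': %s\n",
			from, to, strerror(errno));
	return p->supported;
}
```

Under these assumptions a bogus name such as "HTML" is reported once, however many commits a walk visits. It cannot catch commit-specific failures, such as one old commit whose header names an unsupported encoding. Those only surface when that commit is actually re-encoded.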

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeff King
To: Ævar Arnfjörð Bjarmason
Cc: Junio C Hamano, Krzysztof Żelechowski, git@vger.kernel.org, Hamza Mahfooz
Subject: Re: *Really* noisy encoding warnings post-v2.33.0
Date: Wed, 27 Oct 2021 07:03:05 -0400
References: <9896630.2IqcCWsCYL@localhost.localdomain> <87ily7m1mv.fsf@evledraar.gmail.com> <871r4umfnm.fsf@evledraar.gmail.com>
In-Reply-To: <871r4umfnm.fsf@evledraar.gmail.com>

On Sat, Oct 09, 2021 at 03:47:16PM +0200, Ævar Arnfjörð Bjarmason wrote:

> But in this case this seems to have been because someone tried to feed
> "HTML" to it, which is not an encoding, and something iconv_open() has
> (I daresay) always and will always error on. It returns -1 and sets
> errno=EINVAL.
>
> So having a warning or other detection in the revision loop seems
> backwards to me, surely we want something like the below instead?
> I.e. die as close to bad option parsing as possible?

Sorry for the slow response; this got thrown on my "to think about and
look at later" pile.

Yeah, I agree that if we sanity-checked the encoding up front, that
would cover the case we saw in practice, and goes a long way towards
catching any practical errors.

But I think this patch is tricky:

> diff --git a/environment.c b/environment.c
> index 43bb1b35ffe..c26b18f8e5c 100644
> --- a/environment.c
> +++ b/environment.c
> @@ -357,8 +357,18 @@ void set_git_dir(const char *path, int make_realpath)
>  
>  const char *get_log_output_encoding(void)
>  {
> -	return git_log_output_encoding ? git_log_output_encoding
> +	const char *encoding = git_log_output_encoding ? git_log_output_encoding
>  		: get_commit_output_encoding();
> +#ifndef NO_ICONV
> +	iconv_t conv;
> +	conv = iconv_open(encoding, "UTF-8");
> +	if (conv == (iconv_t) -1 && errno == EINVAL)
> +		die_errno("the '%s' encoding is not known to iconv", encoding);
> +#else
> +	if (strcmp(encoding, "UTF-8"))
> +		die("compiled with NO_ICONV=Y, can't re-encode to '%s'", encoding);
> +#endif
> +	return encoding;
>  }

So one obvious problem here is that we call this function once per
commit, so it's a lot of extra iconv_open() calls. But obviously we
could use a static flag to do it once per process.

The other issue is that it is assuming UTF-8 on one end of the
conversion. But we aren't necessarily doing such a conversion; it
depends on the commit's on-disk encoding, and the requested output
encoding.
In particular:

  - if both of those match, we do not need to call iconv at all (see the
    same_encoding() check in repo_logmsg_reencode()). With the patch
    above, the NO_ICONV case would start to die() when both are, say,
    iso8859-1, even though it currently works.

  - likewise, even if you have iconv support, it's possible that your
    preferred encoding is not compatible with utf8. In which case
    iconv_open() may complain, even though the actual conversion we'd
    ask it to do would succeed.

I.e., I don't think there's a way to just ask iconv "does this encoding
name by itself make any sense". You can only ask it about to/from
combos.

So I think a much better version of this is to catch the _actual_
iconv_open() call we make. And if it fails, say "woah, this combo of
encodings isn't supported". The reason I didn't do that in the earlier
patch is that all of this is obscured inside reencode_string_len(),
which does both the iconv_open() and the iconv() call. We could surface
that error information.

But I'm not sure it would make sense to die() in that case. While for
something like "git log --encoding=nonsense" every commit is going to
fail to re-encode, it's still possible that iconv_open() failures are
commit-specific. I.e., you could have some garbage commit in your
history with an unsupported encoding, and you wouldn't want to die() for
it (it's the same case you are complaining about having a warning for,
but much worse).

I suspect the best we could do along these lines is to wait until a real
iconv_open(to, from) fails, and then as a fallback try:

  iconv_open("UTF-8", from);
  iconv_open(to, "UTF-8");

to sanity-check them individually, and guess that one of them is broken
if it can't go to/from UTF-8. But even that feels like it's making
assumptions about both the system iconv, and the charsets people use.

To be clear, I'd expect that most people just use utf-8 in the first
place, and even if they don't, that their system has some basic utf-8
support.
But we are deep into the realm of weird corner cases here, and the
utility of this warning / error-checking doesn't seem high enough to
merit the possible regressions we'd get by trying to make too many
assumptions.

-Peff

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ævar Arnfjörð Bjarmason
To: Jeff King
Cc: Junio C Hamano, Krzysztof Żelechowski, git@vger.kernel.org, Hamza Mahfooz
Subject: Re: *Really* noisy encoding warnings post-v2.33.0
Date: Fri, 29 Oct 2021 12:47:36 +0200
References: <9896630.2IqcCWsCYL@localhost.localdomain> <87ily7m1mv.fsf@evledraar.gmail.com> <871r4umfnm.fsf@evledraar.gmail.com>
Message-ID: <211029.86bl38w124.gmgdl@evledraar.gmail.com>

On Wed, Oct 27 2021, Jeff King wrote:

> On Sat, Oct 09, 2021 at 03:47:16PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> But in this case this seems to have been because someone tried to feed
>> "HTML" to it, which is not an encoding, and something iconv_open() has
>> (I daresay) always and will always error on. It returns -1 and sets
>> errno=EINVAL.
>>
>> So having a warning or other detection in the revision loop seems
>> backwards to me, surely we want something like the below instead?
>> I.e. die as close to bad option parsing as possible?
>
> Sorry for the slow response; this got thrown on my "to think about and
> look at later" pile.
>
> Yeah, I agree that if we sanity-checked the encoding up front, that
> would cover the case we saw in practice, and goes a long way towards
> catching any practical errors.
>
> But I think this patch is tricky:
>
>> diff --git a/environment.c b/environment.c
>> index 43bb1b35ffe..c26b18f8e5c 100644
>> --- a/environment.c
>> +++ b/environment.c
>> @@ -357,8 +357,18 @@ void set_git_dir(const char *path, int make_realpath)
>>  
>>  const char *get_log_output_encoding(void)
>>  {
>> -	return git_log_output_encoding ? git_log_output_encoding
>> +	const char *encoding = git_log_output_encoding ? git_log_output_encoding
>>  		: get_commit_output_encoding();
>> +#ifndef NO_ICONV
>> +	iconv_t conv;
>> +	conv = iconv_open(encoding, "UTF-8");
>> +	if (conv == (iconv_t) -1 && errno == EINVAL)
>> +		die_errno("the '%s' encoding is not known to iconv", encoding);
>> +#else
>> +	if (strcmp(encoding, "UTF-8"))
>> +		die("compiled with NO_ICONV=Y, can't re-encode to '%s'", encoding);
>> +#endif
>> +	return encoding;
>>  }
>
> So one obvious problem here is that we call this function once per
> commit, so it's a lot of extra iconv_open() calls. But obviously we
> could use a static flag to do it once per process.

Yes, or the diff below, which seems like a better idea. I.e.
stop calling this in a loop if we know we'll need it, have setup_revisions() populate it once. > The other issue is that it is assuming UTF-8 on one end of the > conversion. But we aren't necessarily doing such a conversion; it > depends on the commit's on-disk encoding, and the requested output > encoding. In particular: > > - if both of those match, we do not need to call iconv at all (see the > same_encoding() check in repo_logmsg_reencode()). With the patch > above, the NO_ICONV case would start to die() when both are say > iso8859-1, even though it currently works. > > - likewise, even if you have iconv support, it's possible that your > preferred encoding is not compatible with utf8. In which case > iconv_open() may complain, even though the actual conversion we'd > ask it to do would succeed. > > I.e., I don't think there's a way to just ask iconv "does this encoding > name by itself make any sense". You can only ask it about to/from > combos. Yes, I'm not saying it covers the general problem, but that it covers the specific complained-about issue of a completely nonsensical encoding like "HTML". We should simply error on that on command startup, whether or not we have any commits to visit. But yes, if we want to cover specific encoding issues, e.g. not being able to squash CJK UTF-8 into US-ASCII that needs to be per-commit, but... > So I think a much better version of this is to catch the _actual_ > iconv_open() call we make. And if it fails, say "woah, this combo of > encodings isn't supported". The reason I didn't do that in the earlier > patch is that all of this is obscured inside reencode_string_len(), > which does both the iconv_open() and the iconv() call. We could surface > that error information. > > But I'm not sure it would make sense to die() in that case. While for > something like "git log --encoding=nonsense" every commit is going to > fail to re-encode, it's still possible that iconv_open() failures are > commit-specific.
I.e., you could have some garbage commit in your > history with an unsupported encoding, and you wouldn't want to die() for > it (it's the same case you are complaining about having a warning for, > but much worse). > > I suspect the best we could do along these lines is to wait until a real > iconv_open(to, from) fails, and then as a fallback try: > > iconv_open("UTF-8", from); > iconv_open(to, "UTF-8"); > > to sanity-check them individually, and guess that one of them is broken > if it can't go to/from UTF-8. But even that feels like it's making > assumptions about both the system iconv, and the charsets people use. > > To be clear, I'd expect that most people just use utf-8 in the first > place, and even if they don't that their system has some basic utf-8 > support. But we are deep into the realm of weird corner cases here, and > the utility of this warning / error-checking doesn't seem high enough to > merit the possible regressions we'd get by trying to make too many > assumptions. I still think the right move here is to just revert your patch. Yes we can think of a bunch of tricky edge cases that need to be fixed, but this is a long-standing problem, and the change as-is has some pretty bad UI problems. I think any change that solves those (probably starting with the below) is going to be relatively major this close to release. So per <87ily7m1mv.fsf@evledraar.gmail.com> why can't we just revert the warning(), and then consider a good way forward that covers some/all of these cases we've noted? Which should probably start with the below diff, so we can clearly distinguish startup bad encoding names from runtime ones (we'd put that proposed "die if iconv_open() doesn't like it" somewhere in setup_revisions()). 
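The fallback Jeff describes above is sketched below as a small standalone C helper. The idea is that once a real iconv_open(to, from) fails, you probe each side against UTF-8 to guess which name is broken. This is an illustration only, not code from git or from either patch in this thread. The names diagnose_encoding_pair(), can_open() and enum enc_diagnosis are hypothetical. As Jeff warns, the guess is only meaningful if the system iconv supports UTF-8 at all.

```c
#include <iconv.h>

/* Hypothetical outcomes; nothing like this enum exists in git. */
enum enc_diagnosis {
	ENC_PAIR_OK,      /* iconv_open(to, from) itself succeeded */
	ENC_FROM_SUSPECT, /* "from" cannot even be converted to UTF-8 */
	ENC_TO_SUSPECT,   /* "to" cannot even be produced from UTF-8 */
	ENC_PAIR_UNKNOWN  /* both work via UTF-8, just not directly together */
};

/* Returns 1 if iconv can open a from -> to converter (and closes it again). */
static int can_open(const char *to, const char *from)
{
	iconv_t cd = iconv_open(to, from);

	if (cd == (iconv_t)-1)
		return 0;
	iconv_close(cd);
	return 1;
}

/*
 * Diagnosis only: try the real pair first, then check each name
 * individually against UTF-8, as suggested in the thread. This
 * assumes the local iconv knows UTF-8; if it does not, the guess
 * below is meaningless.
 */
static enum enc_diagnosis diagnose_encoding_pair(const char *to,
						 const char *from)
{
	if (can_open(to, from))
		return ENC_PAIR_OK;
	if (!can_open("UTF-8", from))
		return ENC_FROM_SUSPECT;
	if (!can_open(to, "UTF-8"))
		return ENC_TO_SUSPECT;
	return ENC_PAIR_UNKNOWN;
}
```

With a typical glibc iconv, a request like `--encoding=HTML` on UTF-8 commits should land in ENC_TO_SUSPECT, because "HTML" is not a charset name iconv knows. The helper deliberately only classifies the failure. Whether to die(), warn, or stay silent is left to the caller, which matters because, as noted above, iconv_open() failures can be specific to a single commit.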
diff --git a/builtin/log.c b/builtin/log.c index f75d87e8d7f..2b3a607e947 100644 --- a/builtin/log.c +++ b/builtin/log.c @@ -539,7 +539,7 @@ static void show_tagger(const char *buf, struct rev_info *rev) pp.fmt = rev->commit_format; pp.date_mode = rev->date_mode; - pp_user_info(&pp, "Tagger", &out, buf, get_log_output_encoding()); + pp_user_info(&pp, "Tagger", &out, buf, rev->log_output_encoding); fprintf(rev->diffopt.file, "%s", out.buf); strbuf_release(&out); } @@ -1208,7 +1208,7 @@ static void make_cover_letter(struct rev_info *rev, int use_separate_file, log.file = rev->diffopt.file; log.groups = SHORTLOG_GROUP_AUTHOR; for (i = 0; i < nr; i++) - shortlog_add_commit(&log, list[i]); + shortlog_add_commit(rev, &log, list[i]); shortlog_output(&log); diff --git a/builtin/merge.c b/builtin/merge.c index ea3112e0c0b..1a8fbafc149 100644 --- a/builtin/merge.c +++ b/builtin/merge.c @@ -431,6 +431,7 @@ static void squash_message(struct commit *commit, struct commit_list *remotehead ctx.abbrev = rev.abbrev; ctx.date_mode = rev.date_mode; ctx.fmt = rev.commit_format; + ctx.output_encoding = rev.log_output_encoding; strbuf_addstr(&out, "Squashed commit of the following:\n"); while ((commit = get_revision(&rev)) != NULL) { diff --git a/builtin/rev-list.c b/builtin/rev-list.c index 36cb909ebaa..905ba7462f3 100644 --- a/builtin/rev-list.c +++ b/builtin/rev-list.c @@ -165,7 +165,7 @@ static void show_commit(struct commit *commit, void *data) ctx.date_mode = revs->date_mode; ctx.date_mode_explicit = revs->date_mode_explicit; ctx.fmt = revs->commit_format; - ctx.output_encoding = get_log_output_encoding(); + ctx.output_encoding = revs->log_output_encoding; ctx.color = revs->diffopt.use_color; pretty_print_commit(&ctx, commit, &buf); if (buf.len) { diff --git a/builtin/shortlog.c b/builtin/shortlog.c index e7f7af5de3f..2e5409a4fcc 100644 --- a/builtin/shortlog.c +++ b/builtin/shortlog.c @@ -198,7 +198,8 @@ static void
insert_records_from_trailers(struct shortlog *log, unuse_commit_buffer(commit, commit_buffer); } -void shortlog_add_commit(struct shortlog *log, struct commit *commit) +void shortlog_add_commit(struct rev_info *rev, struct shortlog *log, + struct commit *commit) { struct strbuf ident = STRBUF_INIT; struct strbuf oneline = STRBUF_INIT; @@ -210,7 +211,7 @@ void shortlog_add_commit(struct shortlog *log, struct commit *commit) ctx.abbrev = log->abbrev; ctx.print_email_subject = 1; ctx.date_mode.type = DATE_NORMAL; - ctx.output_encoding = get_log_output_encoding(); + ctx.output_encoding = rev->log_output_encoding; if (!log->summary) { if (log->user_format) @@ -254,7 +255,7 @@ static void get_from_rev(struct rev_info *rev, struct shortlog *log) if (prepare_revision_walk(rev)) die(_("revision walk setup failed")); while ((commit = get_revision(rev)) != NULL) - shortlog_add_commit(log, commit); + shortlog_add_commit(rev, log, commit); } static int parse_uint(char const **arg, int comma, int defval) diff --git a/bundle.c b/bundle.c index a0bb687b0f4..32d74ab0bee 100644 --- a/bundle.c +++ b/bundle.c @@ -448,6 +448,7 @@ static int write_bundle_refs(int bundle_fd, struct rev_info *revs) struct bundle_prerequisites_info { struct object_array *pending; + struct rev_info *rev; int fd; }; @@ -464,7 +465,7 @@ static void write_bundle_prerequisites(struct commit *commit, void *data) write_or_die(bpi->fd, buf.buf, buf.len); ctx.fmt = CMIT_FMT_ONELINE; - ctx.output_encoding = get_log_output_encoding(); + ctx.output_encoding = bpi->rev->log_output_encoding; strbuf_reset(&buf); pretty_print_commit(&ctx, commit, &buf); strbuf_trim(&buf); @@ -544,6 +545,7 @@ int create_bundle(struct repository *r, const char *path, die("revision walk setup failed"); bpi.fd = bundle_fd; bpi.pending = &revs_copy.pending; + bpi.rev = &revs; traverse_commit_list(&revs, write_bundle_prerequisites, NULL, &bpi);
object_array_remove_duplicates(&revs_copy.pending); diff --git a/log-tree.c b/log-tree.c index 644893fd8cf..d79324d5bfd 100644 --- a/log-tree.c +++ b/log-tree.c @@ -737,7 +737,7 @@ void show_log(struct rev_info *opt) raw = (opt->commit_format == CMIT_FMT_USERFORMAT); format_display_notes(&commit->object.oid, &notebuf, - get_log_output_encoding(), raw); + opt->log_output_encoding, raw); ctx.notes_message = strbuf_detach(&notebuf, NULL); } @@ -758,7 +758,7 @@ void show_log(struct rev_info *opt) ctx.mailmap = opt->mailmap; ctx.color = opt->diffopt.use_color; ctx.expand_tabs_in_log = opt->expand_tabs_in_log; - ctx.output_encoding = get_log_output_encoding(); + ctx.output_encoding = opt->log_output_encoding; ctx.rev = opt; if (opt->from_ident.mail_begin && opt->from_ident.name_begin) ctx.from_ident = &opt->from_ident; diff --git a/pretty.c b/pretty.c index fe95107ae5a..6eb64e8189d 100644 --- a/pretty.c +++ b/pretty.c @@ -2048,15 +2048,17 @@ void pretty_print_commit(struct pretty_print_context *pp, int indent = 4; const char *msg; const char *reencoded; - const char *encoding; + const char *encoding = pp->output_encoding; int need_8bit_cte = pp->need_8bit_cte; + if (!encoding) + BUG("should have .output_encoding"); + if (pp->fmt == CMIT_FMT_USERFORMAT) { format_commit_message(commit, user_format, sb, pp); return; } - encoding = get_log_output_encoding(); msg = reencoded = logmsg_reencode(commit, NULL, encoding); if (pp->fmt == CMIT_FMT_ONELINE || cmit_fmt_is_mail(pp->fmt)) @@ -2121,6 +2123,14 @@ void pp_commit_easy(enum cmit_fmt fmt, const struct commit *commit, struct strbuf *sb) { struct pretty_print_context pp = {0}; + static const char *output_encoding; + + if (!output_encoding) { + const char *tmp = get_log_output_encoding(); + output_encoding = tmp; + } + pp.fmt = fmt; + pp.output_encoding = output_encoding; pretty_print_commit(&pp, commit, sb); } diff --git a/remote-curl.c b/remote-curl.c
index d69156312bd..733a525bc73 100644 --- a/remote-curl.c +++ b/remote-curl.c @@ -345,8 +345,14 @@ static void free_discovery(struct discovery *d) static int show_http_message(struct strbuf *type, struct strbuf *charset, struct strbuf *msg) { + static const char *output_encoding; const char *p, *eol; + if (!output_encoding) { + const char *tmp = get_log_output_encoding(); + output_encoding = tmp; + } + /* * We only show text/plain parts, as other types are likely * to be ugly to look at on the user's terminal. @@ -354,7 +360,7 @@ static int show_http_message(struct strbuf *type, struct strbuf *charset, if (strcmp(type->buf, "text/plain")) return -1; if (charset->len) - strbuf_reencode(msg, charset->buf, get_log_output_encoding()); + strbuf_reencode(msg, charset->buf, output_encoding); strbuf_trim(msg); if (!msg->len) diff --git a/revision.c b/revision.c index ab7c1358042..0b2ad87f28e 100644 --- a/revision.c +++ b/revision.c @@ -2866,7 +2866,8 @@ int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s grep_commit_pattern_type(GREP_PATTERN_TYPE_UNSPECIFIED, &revs->grep_filter); - if (!is_encoding_utf8(get_log_output_encoding())) + revs->log_output_encoding = get_log_output_encoding(); + if (!is_encoding_utf8(revs->log_output_encoding)) revs->grep_filter.ignore_locale = 1; compile_grep_patterns(&revs->grep_filter); diff --git a/revision.h b/revision.h index 5578bb4720a..bf9dbca727e 100644 --- a/revision.h +++ b/revision.h @@ -237,6 +237,7 @@ struct rev_info { struct string_list *ref_message_ids; int add_signoff; const char *extra_headers; + const char *log_output_encoding; const char *log_reencode; const char *subject_prefix; int patch_name_max; diff --git a/shortlog.h b/shortlog.h index 3f7e9aabcae..f809617f8a0 100644 --- a/shortlog.h +++ b/shortlog.h @@ -30,7 +30,9 @@ struct shortlog { void shortlog_init(struct shortlog *log); -void shortlog_add_commit(struct shortlog *log, struct commit *commit); +struct
rev_info; +void shortlog_add_commit(struct rev_info *rev, struct shortlog *log, + struct commit *commit); void shortlog_output(struct shortlog *log); diff --git a/submodule.c b/submodule.c index c6890705241..f10e9c34ff6 100644 --- a/submodule.c +++ b/submodule.c @@ -501,7 +501,7 @@ static void print_submodule_diff_summary(struct repository *r, struct rev_info * while ((commit = get_revision(rev))) { struct pretty_print_context ctx = {0}; ctx.date_mode = rev->date_mode; - ctx.output_encoding = get_log_output_encoding(); + ctx.output_encoding = rev->log_output_encoding; strbuf_setlen(&sb, 0); repo_format_commit_message(r, commit, format, &sb, &ctx); From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 29 Oct 2021 16:40:01 -0400 From: Jeff King To: Ævar Arnfjörð Bjarmason Cc: Junio C Hamano , Krzysztof Żelechowski , git@vger.kernel.org, Hamza Mahfooz Subject: Re: *Really* noisy encoding warnings post-v2.33.0 Message-ID: References: <9896630.2IqcCWsCYL@localhost.localdomain> <87ily7m1mv.fsf@evledraar.gmail.com> <871r4umfnm.fsf@evledraar.gmail.com> <211029.86bl38w124.gmgdl@evledraar.gmail.com> In-Reply-To: <211029.86bl38w124.gmgdl@evledraar.gmail.com> X-Mailing-List: git@vger.kernel.org On Fri, Oct 29, 2021 at 12:47:36PM +0200, Ævar Arnfjörð Bjarmason wrote: > > The other issue is that it is assuming UTF-8 on one end of the > > conversion. But we aren't necessarily doing such a conversion; it > > depends on the commit's on-disk encoding, and the requested output > > encoding. In particular: > > > > - if both of those match, we do not need to call iconv at all (see the > > same_encoding() check in repo_logmsg_reencode()). With the patch > > above, the NO_ICONV case would start to die() when both are say > > iso8859-1, even though it currently works. > > > > - likewise, even if you have iconv support, it's possible that your > > preferred encoding is not compatible with utf8. In which case > > iconv_open() may complain, even though the actual conversion we'd > > ask it to do would succeed. > > > > I.e., I don't think there's a way to just ask iconv "does this encoding > > name by itself make any sense". You can only ask it about to/from > > combos. > > Yes, I'm not saying it covers the general problem, but that it covers > the specific complained-about issue of a completely nonsensical encoding > like "HTML". We should simply error on that on command startup, whether > or not we have any commits to visit.
I definitely agree with you on the direction, and I don't mind if we don't cover every case. What I was trying to point out above though is that the patch you showed actually _regresses_ some cases, and it's hard to robustly avoid that. > So per <87ily7m1mv.fsf@evledraar.gmail.com> why can't we just revert the > warning(), and then consider a good way forward that covers some/all of > these cases we've noted? Right, I agreed with that in the other thread. You may need to convince Junio. ;) TBH I am not even sure it is worth spending a lot of brain cells on the "and then consider..." part. Over all these years, we've had one report, and it simply misunderstood what "--encoding" was for. I thought it was something we could fix up easily by checking a return value, but IMHO doing it right is quite tricky because of iconv()'s limited interface, and the risk of regression outweighs the potential benefit. -Peff From mboxrd@z Thu Jan 1 00:00:00 1970
From: Junio C Hamano To: Jeff King Cc: Ævar Arnfjörð Bjarmason , Krzysztof Żelechowski , git@vger.kernel.org, Hamza Mahfooz Subject: Re: *Really* noisy encoding warnings post-v2.33.0 References: <9896630.2IqcCWsCYL@localhost.localdomain> <87ily7m1mv.fsf@evledraar.gmail.com> <871r4umfnm.fsf@evledraar.gmail.com> <211029.86bl38w124.gmgdl@evledraar.gmail.com> Date: Fri, 29 Oct 2021 13:45:11 -0700 In-Reply-To: (Jeff King's message of "Fri, 29 Oct 2021 16:40:01 -0400") Message-ID: X-Mailing-List: git@vger.kernel.org Jeff King writes: > TBH I am not even sure it is worth spending a lot of brain cells on the > "and then consider..." part. Over all these years, we've had one report, > and it simply misunderstood what "--encoding" was for.
I thought it was > something we could fix up easily by checking a return value, but IMHO > doing it right is quite tricky because of iconv()'s limited interface, > and the risk of regression outweighs the potential benefit. I tend to agree with the above. Let's not over-engineer things. From mboxrd@z Thu Jan 1 00:00:00 1970
From: Junio C Hamano To: Jeff King Cc: Ævar Arnfjörð Bjarmason , Krzysztof Żelechowski , git@vger.kernel.org, Hamza Mahfooz Subject: Re: *Really* noisy encoding warnings post-v2.33.0 References: <9896630.2IqcCWsCYL@localhost.localdomain> <87ily7m1mv.fsf@evledraar.gmail.com> <871r4umfnm.fsf@evledraar.gmail.com> <211029.86bl38w124.gmgdl@evledraar.gmail.com> Date: Fri, 29 Oct 2021 13:52:02 -0700 In-Reply-To: (Junio C. Hamano's message of "Fri, 29 Oct 2021 13:45:11 -0700") Message-ID: X-Mailing-List: git@vger.kernel.org Junio C Hamano writes: > Jeff King writes: > >> TBH I am not even sure it is worth spending a lot of brain cells on the >> "and then consider..." part. Over all these years, we've had one report, >> and it simply misunderstood what "--encoding" was for. I thought it was >> something we could fix up easily by checking a return value, but IMHO >> doing it right is quite tricky because of iconv()'s limited interface, >> and the risk of regression outweighs the potential benefit. > > I tend to agree with the above. Let's not over-engineer things. ----- >8 --------- >8 --------- >8 --------- >8 --------- >8 ----- From: Junio C Hamano Date: Fri, 29 Oct 2021 13:48:58 -0700 Subject: [PATCH] Revert "logmsg_reencode(): warn when iconv() fails" This reverts commit fd680bc5 (logmsg_reencode(): warn when iconv() fails, 2021-08-27).
Throwing a warning for each and every commit that gets reencoded, without allowing a way to squelch, would make it unpleasant for folks who have to deal with an ancient part of the history in an old project that used wrong encoding in the commits. Signed-off-by: Junio C Hamano --- Documentation/pretty-options.txt | 4 +--- pretty.c | 6 +----- t/t4210-log-i18n.sh | 7 ------- 3 files changed, 2 insertions(+), 15 deletions(-) diff --git a/Documentation/pretty-options.txt b/Documentation/pretty-options.txt index b3af850608..54d8bb3db0 100644 --- a/Documentation/pretty-options.txt +++ b/Documentation/pretty-options.txt @@ -40,9 +40,7 @@ people using 80-column terminals. defaults to UTF-8. Note that if an object claims to be encoded in `X` and we are outputting in `X`, we will output the object verbatim; this means that invalid sequences in the original - commit may be copied to the output. Likewise, if iconv(3) fails - to convert the commit, we will output the original object - verbatim, along with a warning. + commit may be copied to the output. --expand-tabs=:: --expand-tabs:: diff --git a/pretty.c b/pretty.c index 73b5ead509..9631529c10 100644 --- a/pretty.c +++ b/pretty.c @@ -671,11 +671,7 @@ const char *repo_logmsg_reencode(struct repository *r, * If the re-encoding failed, out might be NULL here; in that * case we just return the commit message verbatim. */ - if (!out) { - warning("unable to reencode commit to '%s'", output_encoding); - return msg; - } - return out; + return out ? 
out : msg; } static int mailmap_name(const char **email, size_t *email_len, diff --git a/t/t4210-log-i18n.sh b/t/t4210-log-i18n.sh index 0141f36e33..d2dfcf164e 100755 --- a/t/t4210-log-i18n.sh +++ b/t/t4210-log-i18n.sh @@ -131,11 +131,4 @@ do fi done -test_expect_success 'log shows warning when conversion fails' ' - enc=this-encoding-does-not-exist && - git log -1 --encoding=$enc 2>err && - echo "warning: unable to reencode commit to ${SQ}${enc}${SQ}" >expect && - test_cmp expect err -' - test_done -- 2.33.1-1007-g607b33ccc6 From mboxrd@z Thu Jan 1 00:00:00 1970 Date: Fri, 29 Oct 2021 17:10:03 -0400 From: Jeff King To: Junio C Hamano Cc:
Ævar Arnfjörð Bjarmason , Krzysztof Żelechowski , git@vger.kernel.org, Hamza Mahfooz Subject: Re: *Really* noisy encoding warnings post-v2.33.0 Message-ID: References: <87ily7m1mv.fsf@evledraar.gmail.com> <871r4umfnm.fsf@evledraar.gmail.com> <211029.86bl38w124.gmgdl@evledraar.gmail.com> In-Reply-To: X-Mailing-List: git@vger.kernel.org On Fri, Oct 29, 2021 at 01:52:02PM -0700, Junio C Hamano wrote: > > I tend to agree with the above. Let's not over-engineer things. > > ----- >8 --------- >8 --------- >8 --------- >8 --------- >8 ----- > From: Junio C Hamano > Date: Fri, 29 Oct 2021 13:48:58 -0700 > Subject: [PATCH] Revert "logmsg_reencode(): warn when iconv() fails" > > This reverts commit fd680bc5 (logmsg_reencode(): warn when iconv() > fails, 2021-08-27). Throwing a warning for each and every commit > that gets reencoded, without allowing a way to squelch, would make > it unpleasant for folks who have to deal with an ancient part of the > history in an old project that used wrong encoding in the commits. Thanks for tying this up. I do think fd680bc5's documentation was good, though. So I'd suggest squashing this in, or applying it on top. -- >8 -- Subject: [PATCH] log: document --encoding behavior on iconv() failure We already note that we may produce invalid output when we skip calling iconv() altogether. But we may also do so if iconv() fails, and we have no good alternative. Let's document this to avoid surprising users. Signed-off-by: Jeff King --- Documentation/pretty-options.txt | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/Documentation/pretty-options.txt b/Documentation/pretty-options.txt index 54d8bb3db0..dc685be363 100644 --- a/Documentation/pretty-options.txt +++ b/Documentation/pretty-options.txt @@ -40,7 +40,9 @@ people using 80-column terminals. defaults to UTF-8.
Note that if an object claims to be encoded in `X` and we are outputting in `X`, we will output the object verbatim; this means that invalid sequences in the original - commit may be copied to the output. + commit may be copied to the output. Likewise, if iconv(3) fails + to convert the commit, we will quietly output the original + object verbatim. --expand-tabs=:: --expand-tabs:: -- 2.33.1.1446.g3ed047b518
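The behaviour that the revert restores, and that the documentation patch above describes, can be modelled in a few lines of C. There are two rules: no conversion when the input and output encodings match, and a quiet verbatim fallback when iconv(3) fails. This is a simplified sketch, not git's repo_logmsg_reencode(). reencode() and reencode_or_verbatim() are hypothetical helpers. The same-encoding test is a plain case-insensitive comparison rather than git's same_encoding(). The output buffer uses a fixed size margin instead of growing on E2BIG.

```c
#define _POSIX_C_SOURCE 200809L
#include <iconv.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/*
 * Hypothetical helper: convert the NUL-terminated string `in` from
 * `from` to `to`. Returns a malloc()ed result, or NULL when iconv
 * cannot open the pair or the conversion fails.
 */
static char *reencode(const char *in, const char *to, const char *from)
{
	iconv_t cd = iconv_open(to, from);
	size_t inleft = strlen(in);
	/* Fixed margin for this sketch; E2BIG is simply treated as failure. */
	size_t outsize = inleft * 4 + 16;
	char *out, *outp, *inp = (char *)in;
	size_t outleft;

	if (cd == (iconv_t)-1)
		return NULL;
	out = malloc(outsize);
	if (!out) {
		iconv_close(cd);
		return NULL;
	}
	outp = out;
	outleft = outsize - 1; /* keep one byte for the terminating NUL */

	/* Convert the input, then flush any pending shift state. */
	if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1 ||
	    iconv(cd, NULL, NULL, &outp, &outleft) == (size_t)-1) {
		free(out);
		iconv_close(cd);
		return NULL;
	}
	*outp = '\0';
	iconv_close(cd);
	return out;
}

/*
 * Models the post-revert behaviour: matching encodings are copied
 * verbatim without calling iconv, and a failed conversion quietly
 * falls back to the original bytes (no warning). *to_free receives
 * the buffer the caller must free(), or NULL if msg itself is returned.
 */
static const char *reencode_or_verbatim(const char *msg, const char *to,
					const char *from, char **to_free)
{
	*to_free = NULL;
	if (!strcasecmp(to, from))
		return msg;
	*to_free = reencode(msg, to, from);
	return *to_free ? *to_free : msg;
}
```

Returning the original message rather than NULL keeps every caller simple, which is what `return out ? out : msg;` in the revert achieves. The cost, as the new documentation sentence says, is that bytes which could not be converted may reach the output unchanged.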