All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Taylor Blau" <me@ttaylorr.com>,
	"Johannes Schindelin" <johannes.schindelin@gmx.de>,
	"Johannes Schindelin" <johannes.schindelin@gmx.de>
Subject: [PATCH v2] ci: disallow directional formatting
Date: Wed, 03 Nov 2021 12:23:55 +0000	[thread overview]
Message-ID: <pull.1071.v2.git.1635942236065.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.1071.git.1635857935312.gitgitgadget@gmail.com>

From: Johannes Schindelin <johannes.schindelin@gmx.de>

As described in https://trojansource.codes/trojan-source.pdf, it is
possible to abuse directional formatting (a feature of Unicode) to
deceive human readers into interpreting code differently from compilers.

For example, an "if ()" expression could be enclosed in a comment, but
rendered as if it was outside of that comment. In effect, this could
fool a reviewer into misinterpreting the code flow as benign when it is
not.

It is highly unlikely that Git's source code wants to contain such
directional formatting in the first place, so let's just disallow it.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
    ci: disallow directional formatting
    
    I just stumbled over
    https://siliconangle.com/2021/11/01/trojan-source-technique-can-inject-malware-source-code-without-detection/,
    which details an interesting social-engineering attack: it uses
    directional formatting in source code to pretend to human readers that
    the code does something different than it actually does.
    
    It is highly unlikely that Git's source code wants to contain such
    directional formatting in the first place, so let's disallow it.
    
    Technically, this is not exactly -rc material, but the paper was just
    published, and I want us to be safe.
    
    Changes since v1:
    
     * The code was moved into a script in ci/.
     * We use git ls-files now to exclude files marked as binary in the Git
       attributes.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1071%2Fdscho%2Fcheck-for-utf-8-directional-formatting-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1071/dscho/check-for-utf-8-directional-formatting-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1071

Range-diff vs v1:

 1:  6a1661fd887 ! 1:  bbf963695ba ci: disallow directional formatting
     @@ Commit message
          possible to abuse directional formatting (a feature of Unicode) to
          deceive human readers into interpreting code differently from compilers.
      
     +    For example, an "if ()" expression could be enclosed in a comment, but
     +    rendered as if it was outside of that comment. In effect, this could
     +    fool a reviewer into misinterpreting the code flow as benign when it is
     +    not.
     +
          It is highly unlikely that Git's source code wants to contain such
     -    directional formatting in the first place, so let's disallow it.
     +    directional formatting in the first place, so let's just disallow it.
      
          Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
      
     @@ .github/workflows/main.yml: jobs:
           - uses: actions/checkout@v2
           - run: ci/install-dependencies.sh
           - run: ci/run-static-analysis.sh
     -+    - name: disallow Unicode directional formatting
     -+      run: |
     -+        # Use UTF-8-aware `printf` to feed a byte pattern to non-UTF-8-aware `git grep`
     -+        # (Ubuntu's `git grep` is compiled without support for libpcre, otherwise we
     -+        # could use `git grep -P` with the `\u` syntax).
     -+        ! LANG=C git grep -Il "$(LANG=C.UTF-8 printf \
     -+          '\\(\u202a\\|\u202b\\|\u202c\\|\u202d\\|\u202e\\|\u2066\\|\u2067\\|\u2068\\|\u2069\\)')"
     ++    - run: ci/check-directional-formatting.sh
         sparse:
           needs: ci-config
           if: needs.ci-config.outputs.enabled == 'yes'
     +
     + ## ci/check-directional-formatting.sh (new) ##
     +@@
     ++#!/bin/bash
     ++
     ++# This script verifies that the non-binary files tracked in the Git index do
     ++# not contain any Unicode directional formatting: such formatting could be used
     ++# to deceive reviewers into interpreting code differently from the compiler.
     ++# This is intended to run on an Ubuntu agent in a GitHub workflow.
     ++#
     ++# `git grep` as well as GNU grep do not handle `\u` as a way to specify UTF-8.
     ++# A PCRE-enabled `git grep` would handle `\u` as desired, but Ubuntu does
     ++# not build its `git` packages with PCRE support.
     ++#
     ++# To work around that, we use `printf` to produce the pattern as a byte
     ++# sequence, and then feed that to `git grep` as a byte sequence (setting
     ++# `LC_CTYPE` to make sure that the arguments are interpreted as intended).
     ++#
     ++# Note: we need to use Bash here because its `printf` interprets `\uNNNN` as
     ++# UTF-8 code points, as desired. Running this script through Ubuntu's `dash`,
     ++# for example, would use a `printf` that does not understand that syntax.
     ++
     ++# U+202a..U+2a2e: LRE, RLE, PDF, LRO and RLO
     ++# U+2066..U+2069: LRI, RLI, FSI and PDI
     ++regex='(\u202a|\u202b|\u202c|\u202d|\u202e|\u2066|\u2067|\u2068|\u2069)'
     ++
     ++! git ls-files -z ':(attr:!binary)' |
     ++LC_CTYPE=C xargs -0r git grep -Ele "$(LC_CTYPE=C.UTF-8 printf "$regex")" --


 .github/workflows/main.yml         |  1 +
 ci/check-directional-formatting.sh | 25 +++++++++++++++++++++++++
 2 files changed, 26 insertions(+)
 create mode 100755 ci/check-directional-formatting.sh

diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml
index 6ed6a9e8076..36b7a0bee38 100644
--- a/.github/workflows/main.yml
+++ b/.github/workflows/main.yml
@@ -289,6 +289,7 @@ jobs:
     - uses: actions/checkout@v2
     - run: ci/install-dependencies.sh
     - run: ci/run-static-analysis.sh
+    - run: ci/check-directional-formatting.sh
   sparse:
     needs: ci-config
     if: needs.ci-config.outputs.enabled == 'yes'
diff --git a/ci/check-directional-formatting.sh b/ci/check-directional-formatting.sh
new file mode 100755
index 00000000000..ab894715cf1
--- /dev/null
+++ b/ci/check-directional-formatting.sh
@@ -0,0 +1,25 @@
+#!/bin/bash
+
+# This script verifies that the non-binary files tracked in the Git index do
+# not contain any Unicode directional formatting: such formatting could be used
+# to deceive reviewers into interpreting code differently from the compiler.
+# This is intended to run on an Ubuntu agent in a GitHub workflow.
+#
+# `git grep` as well as GNU grep do not handle `\u` as a way to specify UTF-8.
+# A PCRE-enabled `git grep` would handle `\u` as desired, but Ubuntu does
+# not build its `git` packages with PCRE support.
+#
+# To work around that, we use `printf` to produce the pattern as a byte
+# sequence, and then feed that to `git grep` as a byte sequence (setting
+# `LC_CTYPE` to make sure that the arguments are interpreted as intended).
+#
+# Note: we need to use Bash here because its `printf` interprets `\uNNNN` as
+# UTF-8 code points, as desired. Running this script through Ubuntu's `dash`,
+# for example, would use a `printf` that does not understand that syntax.
+
+# U+202a..U+2a2e: LRE, RLE, PDF, LRO and RLO
+# U+2066..U+2069: LRI, RLI, FSI and PDI
+regex='(\u202a|\u202b|\u202c|\u202d|\u202e|\u2066|\u2067|\u2068|\u2069)'
+
+! git ls-files -z ':(attr:!binary)' |
+LC_CTYPE=C xargs -0r git grep -Ele "$(LC_CTYPE=C.UTF-8 printf "$regex")" --

base-commit: 0cddd84c9f3e9c3d793ec93034ef679335f35e49
-- 
gitgitgadget

  parent reply	other threads:[~2021-11-03 12:24 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-02 12:58 [PATCH] ci: disallow directional formatting Johannes Schindelin via GitGitGadget
2021-11-02 15:01 ` Ævar Arnfjörð Bjarmason
2021-11-02 15:48   ` Taylor Blau
2021-11-02 16:03     ` Ævar Arnfjörð Bjarmason
2021-11-02 16:12     ` Johannes Schindelin
2021-11-02 16:38       ` Ævar Arnfjörð Bjarmason
2021-11-03 17:20     ` Junio C Hamano
2021-11-03 12:23 ` Johannes Schindelin via GitGitGadget [this message]
2021-11-03 16:36   ` [PATCH v2] " Ævar Arnfjörð Bjarmason
2021-11-03 18:00   ` Junio C Hamano
2021-11-03 23:38   ` Junio C Hamano
2021-11-04 10:19     ` Johannes Schindelin
2021-11-04 13:13   ` [PATCH v3] " Johannes Schindelin via GitGitGadget
2021-11-04 13:48     ` Ævar Arnfjörð Bjarmason
2021-11-04 17:19       ` Junio C Hamano
2021-11-08 18:49         ` Ævar Arnfjörð Bjarmason
2021-11-08 20:08           ` Junio C Hamano
2021-11-09 13:34             ` Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=pull.1071.v2.git.1635942236065.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=johannes.schindelin@gmx.de \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.