All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Jonathan Nieder" <jrnieder@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: [PATCH 1/4] git-sh-i18n--envsubst: our own envsubst(1) for eval_gettext()
Date: Sun,  8 May 2011 12:10:56 +0000	[thread overview]
Message-ID: <1304856659-10672-2-git-send-email-avarab@gmail.com> (raw)
In-Reply-To: <1304856659-10672-1-git-send-email-avarab@gmail.com>

Add a git-sh-i18n--envsubst program which is a stripped-down version
of the GNU envsubst(1) program that comes with GNU gettext for use in
the eval_gettext() fallback, instead of using a clever (but broken)
printf + eval + printf trick.

In a previous incarnation of the gettext series I implemented the
eval_gettext() fallback like this:

    eval_gettext() {
        gettext_out=$(gettext "$1")
        gettext_eval="printf '%s' \"$gettext_out\""
        printf "%s" "`eval \"$gettext_eval\"`"
    }

This was clever, but would incorrectly handle cases where the variable
being interpolated contained spaces. E.g.:

    cmd="git foo"; eval_gettext "command: \$cmd"

Would emit "command: gitfoo", instead of the correct "command: git
foo". This happened with a message in git-am.sh that used the $cmdline
variable. An alternate version suggested by Junio is able to handle
this case:

   eval_gettext() {
           gettext_out=$(gettext "$1") &&
           gettext_eval="printf '%s' \"$gettext_out\"" &&
           gettext_cmd=$(eval "$gettext_eval") &&
           printf "%s" "$gettext_cmd"
   }

But it strips out double-quotes when they're included in the message
being passed to eval_gettext. E.g. the message in git-am.sh:

    "When you have resolved this problem run \"$cmdline --resolved\"."

Will be emitted as:

    When you have resolved this problem run git am --resolved.

Instead of the correct:

    When you have resolved this problem run "git am --resolved".

To work around this, and to improve our variable expansion behavior
(eval has security issues) I've imported a stripped-down version of
gettext's envsubst(1) program. With it eval_gettext() is implemented
like this:

    eval_gettext() {
        printf "%s" "$1" | (
            export PATH $(git sh-i18n--envsubst --variables "$1");
            git sh-i18n--envsubst "$1"
        )
    }

These are the modifications I made to envsubst.c as I turned it into
sh-i18n--envsubst.c:

 * Added our git-compat-util.h header for xrealloc() and friends.

 * Removed inclusion of gettext-specific headers.

 * Removed most of main() and replaced it with my own. The modified
   version only does option parsing for --variables. That's all it
   needs.

 * Modified error() invocations to use our error() instead of
   error(3).

 * Replaced the gettext XNMALLOC(n, size) macro with just
   xmalloc(n). Since XNMALLOC() only allocated char's.

 * Removed the string_list_destroy function. It's redundant (also in
   the upstream code).

 * Replaced the use of stdbool.h (a C99 header) by doing the following
   replacements on the code:

    * s/bool/unsigned short int/g
    * s/true/1/g
    * s/false/0/g

Reported-by: Johannes Sixt <j.sixt@viscovery.net>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 .gitignore                              |    2 +
 Documentation/git-sh-i18n--envsubst.txt |   36 +++
 Makefile                                |    1 +
 sh-i18n--envsubst.c                     |  444 +++++++++++++++++++++++++++++++
 t/t0201-gettext-fallbacks.sh            |   51 ++++
 5 files changed, 534 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/git-sh-i18n--envsubst.txt
 create mode 100644 sh-i18n--envsubst.c
 create mode 100755 t/t0201-gettext-fallbacks.sh

diff --git a/.gitignore b/.gitignore
index 711fce7..1ccf797 100644
--- a/.gitignore
+++ b/.gitignore
@@ -127,6 +127,8 @@
 /git-rm
 /git-send-email
 /git-send-pack
+/git-sh-i18n
+/git-sh-i18n--envsubst
 /git-sh-setup
 /git-shell
 /git-shortlog
diff --git a/Documentation/git-sh-i18n--envsubst.txt b/Documentation/git-sh-i18n--envsubst.txt
new file mode 100644
index 0000000..e146a2c
--- /dev/null
+++ b/Documentation/git-sh-i18n--envsubst.txt
@@ -0,0 +1,36 @@
+git-sh-i18n--envsubst(1)
+========================
+
+NAME
+----
+git-sh-i18n--envsubst - Git's own envsubst(1) for i18n fallbacks
+
+DESCRIPTION
+-----------
+
+This is not a command the end user would want to run.  Ever.
+This documentation is meant for people who are studying the
+plumbing scripts and/or are writing new ones.
+
+git-sh-i18n--envsubst is Git's stripped-down copy of the GNU
+`envsubst(1)` program that comes with the GNU gettext
+package. It's used internally by linkgit:git-sh-i18n[1] to
+provide the `eval_gettext` fallback on Solaris and systems
+that don't have a gettext implementation.
+
+No promises are made about the interface, or that this
+program won't disappear without warning in the next version
+of Git. Don't use it.
+
+Author
+------
+Written by Ævar Arnfjörð Bjarmason <avarab@gmail.com>
+
+Documentation
+--------------
+Documentation by Ævar Arnfjörð Bjarmason and the git-list
+<git@vger.kernel.org>.
+
+GIT
+---
+Part of the linkgit:git[1] suite
diff --git a/Makefile b/Makefile
index ef57c1c..87c83cb 100644
--- a/Makefile
+++ b/Makefile
@@ -414,6 +414,7 @@ PROGRAM_OBJS += shell.o
 PROGRAM_OBJS += show-index.o
 PROGRAM_OBJS += upload-pack.o
 PROGRAM_OBJS += http-backend.o
+PROGRAM_OBJS += sh-i18n--envsubst.o
 
 PROGRAMS += $(patsubst %.o,git-%$X,$(PROGRAM_OBJS))
 
diff --git a/sh-i18n--envsubst.c b/sh-i18n--envsubst.c
new file mode 100644
index 0000000..8db71b1
--- /dev/null
+++ b/sh-i18n--envsubst.c
@@ -0,0 +1,444 @@
+/*
+ * sh-i18n--envsubst.c - a stripped-down version of gettext's envsubst(1)
+ *
+ * Copyright (C) 2010 Ævar Arnfjörð Bjarmason
+ *
+ * This is a modified version of
+ * 67d0871a8c:gettext-runtime/src/envsubst.c from the gettext.git
+ * repository. It has been stripped down to only implement the
+ * envsubst(1) features that we need in the git-sh-i18n fallbacks.
+ *
+ * The "Close standard error" part in main() is from
+ * 8dac033df0:gnulib-local/lib/closeout.c. The copyright notices for
+ * both files are reproduced immediately below.
+ */
+
+#include "git-compat-util.h"
+
+/* Substitution of environment variables in shell format strings.
+   Copyright (C) 2003-2007 Free Software Foundation, Inc.
+   Written by Bruno Haible <bruno@clisp.org>, 2003.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2, or (at your option)
+   any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software Foundation,
+   Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.  */
+
+/* closeout.c - close standard output and standard error
+   Copyright (C) 1998-2007 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 2, or (at your option)
+   any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program; if not, write to the Free Software Foundation,
+   Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.  */
+
+#include <errno.h>
+#include <getopt.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+/* If true, substitution shall be performed on all variables.  */
+static unsigned short int all_variables;
+
+/* Forward declaration of local functions.  */
+static void print_variables (const char *string);
+static void note_variables (const char *string);
+static void subst_from_stdin (void);
+
+int
+main (int argc, char *argv[])
+{
+  /* Default values for command line options.  */
+  unsigned short int show_variables = 0;
+
+  switch (argc)
+	{
+	case 1:
+	  error ("we won't substitute all variables on stdin for you");
+	  /*
+	  all_variables = 1;
+      subst_from_stdin ();
+	  */
+	case 2:
+	  /* echo '$foo and $bar' | git sh-i18n--envsubst --variables '$foo and $bar' */
+	  all_variables = 0;
+	  note_variables (argv[1]);
+      subst_from_stdin ();
+	  break;
+	case 3:
+	  /* git sh-i18n--envsubst --variables '$foo and $bar' */
+	  if (strcmp(argv[1], "--variables"))
+		error ("first argument must be --variables when two are given");
+	  show_variables = 1;
+      print_variables (argv[2]);
+	  break;
+	default:
+	  error ("too many arguments");
+	  break;
+	}
+
+  /* Close standard error.  This is simpler than fwriteerror_no_ebadf, because
+     upon failure we don't need an errno - all we can do at this point is to
+     set an exit status.  */
+  errno = 0;
+  if (ferror (stderr) || fflush (stderr))
+    { 
+      fclose (stderr);
+      exit (EXIT_FAILURE);
+    }
+  if (fclose (stderr) && errno != EBADF)
+    exit (EXIT_FAILURE);
+
+  exit (EXIT_SUCCESS);
+}
+
+/* Parse the string and invoke the callback each time a $VARIABLE or
+   ${VARIABLE} construct is seen, where VARIABLE is a nonempty sequence
+   of ASCII alphanumeric/underscore characters, starting with an ASCII
+   alphabetic/underscore character.
+   We allow only ASCII characters, to avoid dependencies w.r.t. the current
+   encoding: While "${\xe0}" looks like a variable access in ISO-8859-1
+   encoding, it doesn't look like one in the BIG5, BIG5-HKSCS, GBK, GB18030,
+   SHIFT_JIS, JOHAB encodings, because \xe0\x7d is a single character in these
+   encodings.  */
+static void
+find_variables (const char *string,
+		void (*callback) (const char *var_ptr, size_t var_len))
+{
+  for (; *string != '\0';)
+    if (*string++ == '$')
+      {
+	const char *variable_start;
+	const char *variable_end;
+	unsigned short int valid;
+	char c;
+
+	if (*string == '{')
+	  string++;
+
+	variable_start = string;
+	c = *string;
+	if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_')
+	  {
+	    do
+	      c = *++string;
+	    while ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
+		   || (c >= '0' && c <= '9') || c == '_');
+	    variable_end = string;
+
+	    if (variable_start[-1] == '{')
+	      {
+		if (*string == '}')
+		  {
+		    string++;
+		    valid = 1;
+		  }
+		else
+		  valid = 0;
+	      }
+	    else
+	      valid = 1;
+
+	    if (valid)
+	      callback (variable_start, variable_end - variable_start);
+	  }
+      }
+}
+
+
+/* Print a variable to stdout, followed by a newline.  */
+static void
+print_variable (const char *var_ptr, size_t var_len)
+{
+  fwrite (var_ptr, var_len, 1, stdout);
+  putchar ('\n');
+}
+
+/* Print the variables contained in STRING to stdout, each one followed by a
+   newline.  */
+static void
+print_variables (const char *string)
+{
+  find_variables (string, &print_variable);
+}
+
+
+/* Type describing list of immutable strings,
+   implemented using a dynamic array.  */
+typedef struct string_list_ty string_list_ty;
+struct string_list_ty
+{
+  const char **item;
+  size_t nitems;
+  size_t nitems_max;
+};
+
+/* Initialize an empty list of strings.  */
+static inline void
+string_list_init (string_list_ty *slp)
+{
+  slp->item = NULL;
+  slp->nitems = 0;
+  slp->nitems_max = 0;
+}
+
+/* Append a single string to the end of a list of strings.  */
+static inline void
+string_list_append (string_list_ty *slp, const char *s)
+{
+  /* Grow the list.  */
+  if (slp->nitems >= slp->nitems_max)
+    {
+      size_t nbytes;
+
+      slp->nitems_max = slp->nitems_max * 2 + 4;
+      nbytes = slp->nitems_max * sizeof (slp->item[0]);
+      slp->item = (const char **) xrealloc (slp->item, nbytes);
+    }
+
+  /* Add the string to the end of the list.  */
+  slp->item[slp->nitems++] = s;
+}
+
+/* Compare two strings given by reference.  */
+static int
+cmp_string (const void *pstr1, const void *pstr2)
+{
+  const char *str1 = *(const char **)pstr1;
+  const char *str2 = *(const char **)pstr2;
+
+  return strcmp (str1, str2);
+}
+
+/* Sort a list of strings.  */
+static inline void
+string_list_sort (string_list_ty *slp)
+{
+  if (slp->nitems > 0)
+    qsort (slp->item, slp->nitems, sizeof (slp->item[0]), cmp_string);
+}
+
+/* Test whether a string list contains a given string.  */
+static inline int
+string_list_member (const string_list_ty *slp, const char *s)
+{
+  size_t j;
+
+  for (j = 0; j < slp->nitems; ++j)
+    if (strcmp (slp->item[j], s) == 0)
+      return 1;
+  return 0;
+}
+
+/* Test whether a sorted string list contains a given string.  */
+static int
+sorted_string_list_member (const string_list_ty *slp, const char *s)
+{
+  size_t j1, j2;
+
+  j1 = 0;
+  j2 = slp->nitems;
+  if (j2 > 0)
+    {
+      /* Binary search.  */
+      while (j2 - j1 > 1)
+	{
+	  /* Here we know that if s is in the list, it is at an index j
+	     with j1 <= j < j2.  */
+	  size_t j = (j1 + j2) >> 1;
+	  int result = strcmp (slp->item[j], s);
+
+	  if (result > 0)
+	    j2 = j;
+	  else if (result == 0)
+	    return 1;
+	  else
+	    j1 = j + 1;
+	}
+      if (j2 > j1)
+	if (strcmp (slp->item[j1], s) == 0)
+	  return 1;
+    }
+  return 0;
+}
+
+
+/* Set of variables on which to perform substitution.
+   Used only if !all_variables.  */
+static string_list_ty variables_set;
+
+/* Adds a variable to variables_set.  */
+static void
+note_variable (const char *var_ptr, size_t var_len)
+{
+  char *string = xmalloc (var_len + 1);
+  memcpy (string, var_ptr, var_len);
+  string[var_len] = '\0';
+
+  string_list_append (&variables_set, string);
+}
+
+/* Stores the variables occurring in the string in variables_set.  */
+static void
+note_variables (const char *string)
+{
+  string_list_init (&variables_set);
+  find_variables (string, &note_variable);
+  string_list_sort (&variables_set);
+}
+
+
+static int
+do_getc ()
+{
+  int c = getc (stdin);
+
+  if (c == EOF)
+    {
+      if (ferror (stdin))
+	error ("error while reading standard input");
+    }
+
+  return c;
+}
+
+static inline void
+do_ungetc (int c)
+{
+  if (c != EOF)
+    ungetc (c, stdin);
+}
+
+/* Copies stdin to stdout, performing substitutions.  */
+static void
+subst_from_stdin ()
+{
+  static char *buffer;
+  static size_t bufmax;
+  static size_t buflen;
+  int c;
+
+  for (;;)
+    {
+      c = do_getc ();
+      if (c == EOF)
+	break;
+      /* Look for $VARIABLE or ${VARIABLE}.  */
+      if (c == '$')
+	{
+	  unsigned short int opening_brace = 0;
+	  unsigned short int closing_brace = 0;
+
+	  c = do_getc ();
+	  if (c == '{')
+	    {
+	      opening_brace = 1;
+	      c = do_getc ();
+	    }
+	  if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_')
+	    {
+	      unsigned short int valid;
+
+	      /* Accumulate the VARIABLE in buffer.  */
+	      buflen = 0;
+	      do
+		{
+		  if (buflen >= bufmax)
+		    {
+		      bufmax = 2 * bufmax + 10;
+		      buffer = xrealloc (buffer, bufmax);
+		    }
+		  buffer[buflen++] = c;
+
+		  c = do_getc ();
+		}
+	      while ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
+		     || (c >= '0' && c <= '9') || c == '_');
+
+	      if (opening_brace)
+		{
+		  if (c == '}')
+		    {
+		      closing_brace = 1;
+		      valid = 1;
+		    }
+		  else
+		    {
+		      valid = 0;
+		      do_ungetc (c);
+		    }
+		}
+	      else
+		{
+		  valid = 1;
+		  do_ungetc (c);
+		}
+
+	      if (valid)
+		{
+		  /* Terminate the variable in the buffer.  */
+		  if (buflen >= bufmax)
+		    {
+		      bufmax = 2 * bufmax + 10;
+		      buffer = xrealloc (buffer, bufmax);
+		    }
+		  buffer[buflen] = '\0';
+
+		  /* Test whether the variable shall be substituted.  */
+		  if (!all_variables
+		      && !sorted_string_list_member (&variables_set, buffer))
+		    valid = 0;
+		}
+
+	      if (valid)
+		{
+		  /* Substitute the variable's value from the environment.  */
+		  const char *env_value = getenv (buffer);
+
+		  if (env_value != NULL)
+		    fputs (env_value, stdout);
+		}
+	      else
+		{
+		  /* Perform no substitution at all.  Since the buffered input
+		     contains no other '$' than at the start, we can just
+		     output all the buffered contents.  */
+		  putchar ('$');
+		  if (opening_brace)
+		    putchar ('{');
+		  fwrite (buffer, buflen, 1, stdout);
+		  if (closing_brace)
+		    putchar ('}');
+		}
+	    }
+	  else
+	    {
+	      do_ungetc (c);
+	      putchar ('$');
+	      if (opening_brace)
+		putchar ('{');
+	    }
+	}
+      else
+	putchar (c);
+    }
+}
diff --git a/t/t0201-gettext-fallbacks.sh b/t/t0201-gettext-fallbacks.sh
new file mode 100755
index 0000000..54d98b9
--- /dev/null
+++ b/t/t0201-gettext-fallbacks.sh
@@ -0,0 +1,51 @@
+#!/bin/sh
+#
+# Copyright (c) 2010 Ævar Arnfjörð Bjarmason
+#
+
+test_description='Gettext Shell fallbacks'
+
+. ./test-lib.sh
+. "$GIT_BUILD_DIR"/git-sh-i18n
+
+test_expect_success 'gettext: our gettext() fallback has pass-through semantics' '
+    printf "test" >expect &&
+    gettext "test" >actual &&
+    test_i18ncmp expect actual &&
+    printf "test more words" >expect &&
+    gettext "test more words" >actual &&
+    test_i18ncmp expect actual
+'
+
+test_expect_success 'eval_gettext: our eval_gettext() fallback has pass-through semantics' '
+    printf "test" >expect &&
+    eval_gettext "test" >actual &&
+    test_i18ncmp expect actual &&
+    printf "test more words" >expect &&
+    eval_gettext "test more words" >actual &&
+    test_i18ncmp expect actual
+'
+
+test_expect_success 'eval_gettext: our eval_gettext() fallback can interpolate variables' '
+    printf "test YesPlease" >expect &&
+    GIT_INTERNAL_GETTEXT_TEST_FALLBACKS=YesPlease eval_gettext "test \$GIT_INTERNAL_GETTEXT_TEST_FALLBACKS" >actual &&
+    test_i18ncmp expect actual
+'
+
+test_expect_success 'eval_gettext: our eval_gettext() fallback can interpolate variables with spaces' '
+    cmdline="git am" &&
+    export cmdline;
+    printf "When you have resolved this problem run git am --resolved." >expect &&
+    eval_gettext "When you have resolved this problem run \$cmdline --resolved." >actual
+    test_i18ncmp expect actual
+'
+
+test_expect_success 'eval_gettext: our eval_gettext() fallback can interpolate variables with spaces and quotes' '
+    cmdline="git am" &&
+    export cmdline;
+    printf "When you have resolved this problem run \"git am --resolved\"." >expect &&
+    eval_gettext "When you have resolved this problem run \"\$cmdline --resolved\"." >actual
+    test_i18ncmp expect actual
+'
+
+test_done
-- 
1.7.4.4

  reply	other threads:[~2011-05-08 12:11 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-05-08 12:10 [PATCH 0/4] i18n: Add shell script translation infrastructure Ævar Arnfjörð Bjarmason
2011-05-08 12:10 ` Ævar Arnfjörð Bjarmason [this message]
2011-05-08 17:15   ` [PATCH 1/4] git-sh-i18n--envsubst: our own envsubst(1) for eval_gettext() Junio C Hamano
2011-05-08 21:33     ` Ævar Arnfjörð Bjarmason
2011-05-09  3:17       ` Junio C Hamano
2011-05-09  7:52         ` Ævar Arnfjörð Bjarmason
2011-05-08 12:10 ` [PATCH 2/4] git-sh-i18n.sh: add no-op gettext() and eval_gettext() wrappers Ævar Arnfjörð Bjarmason
2011-05-08 12:10 ` [PATCH 3/4] git-sh-i18n.sh: add GIT_GETTEXT_POISON support Ævar Arnfjörð Bjarmason
2011-05-08 12:10 ` [PATCH 4/4] Makefile: add xgettext target for *.sh files Ævar Arnfjörð Bjarmason
2011-05-08 17:03 ` [PATCH 0/4] i18n: Add shell script translation infrastructure Sverre Rabbelier
2011-05-08 21:38   ` Ævar Arnfjörð Bjarmason
2011-05-08 21:45     ` Sverre Rabbelier
2011-05-08 21:52       ` Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1304856659-10672-2-git-send-email-avarab@gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jrnieder@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.