* [PATCH v4 0/6] kbuild: improve source package builds
@ 2023-02-02  3:37 Masahiro Yamada
  2023-02-02  3:37 ` [PATCH v4 1/6] kbuild: add a tool to generate a list of files ignored by git Masahiro Yamada
                   ` (5 more replies)
  0 siblings, 6 replies; 10+ messages in thread
From: Masahiro Yamada @ 2023-02-02  3:37 UTC (permalink / raw)
  To: linux-kbuild
  Cc: linux-kernel, Nathan Chancellor, Nick Desaulniers,
	Nicolas Schier, Ben Hutchings, Masahiro Yamada, Tom Rix, llvm


This series improves deb-pkg and (src)rpm-pkg so that they can be built
without cleaning the kernel tree.
The Debian source package will switch to format 3.0 (quilt).

My next plans are:

 - add 'srcdeb-pkg' target

 - add more compression modes

 - rewrite snap-pkg and delete the old tar macro



Masahiro Yamada (6):
  kbuild: add a tool to generate a list of files ignored by git
  kbuild: deb-pkg: create source package without cleaning
  kbuild: rpm-pkg: build binary packages from source rpm
  kbuild: srcrpm-pkg: create source package without cleaning
  kbuild: deb-pkg: hide KDEB_SOURCENAME from Makefile
  kbuild: deb-pkg: switch over to format 3.0 (quilt)

 Makefile                 |   4 +
 scripts/.gitignore       |   1 +
 scripts/Makefile         |   2 +-
 scripts/Makefile.package |  94 +++---
 scripts/gen-exclude.c    | 623 +++++++++++++++++++++++++++++++++++++++
 scripts/package/mkdebian |  23 +-
 scripts/package/mkspec   |   8 +-
 7 files changed, 706 insertions(+), 49 deletions(-)
 create mode 100644 scripts/gen-exclude.c

-- 
2.34.1



* [PATCH v4 1/6] kbuild: add a tool to generate a list of files ignored by git
  2023-02-02  3:37 [PATCH v4 0/6] kbuild: improve source package builds Masahiro Yamada
@ 2023-02-02  3:37 ` Masahiro Yamada
  2023-02-02  3:49   ` Masahiro Yamada
  2023-02-02 11:02   ` Nicolas Schier
  2023-02-02  3:37 ` [PATCH v4 2/6] kbuild: deb-pkg: create source package without cleaning Masahiro Yamada
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 10+ messages in thread
From: Masahiro Yamada @ 2023-02-02  3:37 UTC (permalink / raw)
  To: linux-kbuild
  Cc: linux-kernel, Nathan Chancellor, Nick Desaulniers,
	Nicolas Schier, Ben Hutchings, Masahiro Yamada

In short, the motivation of this commit is to build a source package
without cleaning the source tree.

The deb-pkg and (src)rpm-pkg targets first run 'make clean' before
creating a source tarball. Otherwise build artifacts such as *.o,
*.a, etc. would be included in the tarball. Yet, the tarball ends up
containing several garbage files since 'make clean' does not clean
everything.

Cleaning the tree every time is annoying since it makes the incremental
build impossible. It is desirable to create a source tarball without
cleaning the tree.

In fact, there are some ways to achieve this.

The easiest way is 'git archive'. Actually, 'make perf-tar*-src-pkg'
does it this way, but I do not like it because it works only when the
source tree is managed by git, and all files you want in the tarball must
be committed in advance.

I want to make it work without relying on git. We can do this.

Files that are not tracked by git are generated files. We can list them
out by parsing the .gitignore files. Of course, .gitignore does not cover
all the cases, but it works well enough.

tar(1) claims to support it:

  --exclude-vcs-ignores

    Exclude files that match patterns read from VCS-specific ignore files.
    Supported files are: .cvsignore, .gitignore, .bzrignore, and .hgignore.

The best scenario would be to use 'tar --exclude-vcs-ignores', but this
option does not work. --exclude-vcs-ignores does not understand
negation (!), a leading slash, a trailing slash, etc. So, this option
is just useless.

Hence, I wrote this gitignore parser. The previous version [1], written
in Python, was too slow. This version is implemented in C, so it works
much faster.

This tool traverses the source tree, parsing the .gitignore files. It
prints the file paths that are not tracked by git. The output can be
used for tar's --exclude-from= option.

[How to test this tool]

  $ git clean -dfx
  $ make -s -j$(nproc) defconfig all                       # or allmodconfig or whatever
  $ git archive -o ../linux1.tar --prefix=./ HEAD
  $ tar tf ../linux1.tar | LANG=C sort > ../file-list1     # files emitted by 'git archive'
  $ make scripts_exclude
    HOSTCC  scripts/gen-exclude
  $ scripts/gen-exclude --prefix=./ -o ../exclude-list
  $ tar cf ../linux2.tar --exclude-from=../exclude-list .
  $ tar tf ../linux2.tar | LANG=C sort > ../file-list2     # files emitted by 'tar'
  $ diff  ../file-list1 ../file-list2 | grep -E '^(<|>)'
  < ./Documentation/devicetree/bindings/.yamllint
  < ./drivers/clk/.kunitconfig
  < ./drivers/gpu/drm/tests/.kunitconfig
  < ./drivers/gpu/drm/vc4/tests/.kunitconfig
  < ./drivers/hid/.kunitconfig
  < ./fs/ext4/.kunitconfig
  < ./fs/fat/.kunitconfig
  < ./kernel/kcsan/.kunitconfig
  < ./lib/kunit/.kunitconfig
  < ./mm/kfence/.kunitconfig
  < ./net/sunrpc/.kunitconfig
  < ./tools/testing/selftests/arm64/tags/
  < ./tools/testing/selftests/arm64/tags/.gitignore
  < ./tools/testing/selftests/arm64/tags/Makefile
  < ./tools/testing/selftests/arm64/tags/run_tags_test.sh
  < ./tools/testing/selftests/arm64/tags/tags_test.c
  < ./tools/testing/selftests/kvm/.gitignore
  < ./tools/testing/selftests/kvm/Makefile
  < ./tools/testing/selftests/kvm/config
  < ./tools/testing/selftests/kvm/settings

The source tarball contains most of the files that are tracked by git.
You see some diffs, but only because some .gitignore files are wrong.

  $ git ls-files -i -c --exclude-per-directory=.gitignore
  Documentation/devicetree/bindings/.yamllint
  drivers/clk/.kunitconfig
  drivers/gpu/drm/tests/.kunitconfig
  drivers/hid/.kunitconfig
  fs/ext4/.kunitconfig
  fs/fat/.kunitconfig
  kernel/kcsan/.kunitconfig
  lib/kunit/.kunitconfig
  mm/kfence/.kunitconfig
  tools/testing/selftests/arm64/tags/.gitignore
  tools/testing/selftests/arm64/tags/Makefile
  tools/testing/selftests/arm64/tags/run_tags_test.sh
  tools/testing/selftests/arm64/tags/tags_test.c
  tools/testing/selftests/kvm/.gitignore
  tools/testing/selftests/kvm/Makefile
  tools/testing/selftests/kvm/config
  tools/testing/selftests/kvm/settings

[1]: https://lore.kernel.org/all/20230128173843.765212-1-masahiroy@kernel.org/

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
---

(no changes since v3)

Changes in v3:
 - Various code refactoring: remove struct gitignore, remove next: label etc.
 - Support --extra-pattern option

Changes in v2:
 - Reimplement in C

 Makefile              |   4 +
 scripts/.gitignore    |   1 +
 scripts/Makefile      |   2 +-
 scripts/gen-exclude.c | 623 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 629 insertions(+), 1 deletion(-)
 create mode 100644 scripts/gen-exclude.c

diff --git a/Makefile b/Makefile
index 2faf872b6808..35b294cc6f32 100644
--- a/Makefile
+++ b/Makefile
@@ -1652,6 +1652,10 @@ distclean: mrproper
 %pkg: include/config/kernel.release FORCE
 	$(Q)$(MAKE) -f $(srctree)/scripts/Makefile.package $@
 
+PHONY += scripts_exclude
+scripts_exclude: scripts_basic
+	$(Q)$(MAKE) $(build)=scripts scripts/gen-exclude
+
 # Brief documentation of the typical targets used
 # ---------------------------------------------------------------------------
 
diff --git a/scripts/.gitignore b/scripts/.gitignore
index 6e9ce6720a05..7f433bc1461c 100644
--- a/scripts/.gitignore
+++ b/scripts/.gitignore
@@ -1,5 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0-only
 /asn1_compiler
+/gen-exclude
 /generate_rust_target
 /insert-sys-cert
 /kallsyms
diff --git a/scripts/Makefile b/scripts/Makefile
index 32b6ba722728..5dcd7f57607f 100644
--- a/scripts/Makefile
+++ b/scripts/Makefile
@@ -38,7 +38,7 @@ HOSTCFLAGS_sorttable.o += -DMCOUNT_SORT_ENABLED
 endif
 
 # The following programs are only built on demand
-hostprogs += unifdef
+hostprogs += gen-exclude unifdef
 
 # The module linker script is preprocessed on demand
 targets += module.lds
diff --git a/scripts/gen-exclude.c b/scripts/gen-exclude.c
new file mode 100644
index 000000000000..5c4ecd902290
--- /dev/null
+++ b/scripts/gen-exclude.c
@@ -0,0 +1,623 @@
+// SPDX-License-Identifier: GPL-2.0-only
+//
+// Traverse the source tree, parsing all .gitignore files, and print file paths
+// that are not tracked by git.
+// The output is suitable for the --exclude-from option of tar.
+// This is useful until the --exclude-vcs-ignores option works correctly.
+//
+// Copyright (C) 2023 Masahiro Yamada <masahiroy@kernel.org>
+
+#include <dirent.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <fnmatch.h>
+#include <getopt.h>
+#include <stdarg.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/stat.h>
+#include <sys/types.h>
+#include <unistd.h>
+
+#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
+
+// struct pattern - represent an ignore pattern (a line in .gitignore)
+// @negate:          negate the pattern (prefixing '!')
+// @dir_only:        only matches directories (trailing '/')
+// @path_match:      true if the glob pattern is a path instead of a file name
+// @double_asterisk: true if the glob pattern contains double asterisks ('**')
+// @glob:            glob pattern
+struct pattern {
+	bool negate;
+	bool dir_only;
+	bool path_match;
+	bool double_asterisk;
+	char glob[];
+};
+
+struct pattern **patterns;
+static int nr_patterns, alloced_patterns;
+
+// Remember the number of patterns at each directory level
+static int *nr_patterns_at;
+// Track the current/max directory level
+static int depth, max_depth;
+static bool debug_on;
+static FILE *out_fp;
+static char *prefix = "";
+static char *progname;
+
+static void __attribute__((noreturn)) perror_exit(const char *s)
+{
+	perror(s);
+
+	exit(EXIT_FAILURE);
+}
+
+static void __attribute__((noreturn)) error_exit(const char *fmt, ...)
+{
+	va_list args;
+
+	fprintf(stderr, "%s: error: ", progname);
+
+	va_start(args, fmt);
+	vfprintf(stderr, fmt, args);
+	va_end(args);
+
+	exit(EXIT_FAILURE);
+}
+
+static void debug(const char *fmt, ...)
+{
+	va_list args;
+	int i;
+
+	if (!debug_on)
+		return;
+
+	fprintf(stderr, "[DEBUG]");
+
+	for (i = 0; i < depth * 2; i++)
+		fputc(' ', stderr);
+
+	va_start(args, fmt);
+	vfprintf(stderr, fmt, args);
+	va_end(args);
+}
+
+static void *xrealloc(void *ptr, size_t size)
+{
+	ptr = realloc(ptr, size);
+	if (!ptr)
+		perror_exit(progname);
+
+	return ptr;
+}
+
+static void *xmalloc(size_t size)
+{
+	return xrealloc(NULL, size);
+}
+
+static char *xstrdup(const char *s)
+{
+	char *new = strdup(s);
+
+	if (!new)
+		perror_exit(progname);
+
+	return new;
+}
+
+static bool simple_match(const char *string, const char *pattern)
+{
+	return fnmatch(pattern, string, FNM_PATHNAME) == 0;
+}
+
+// Handle double asterisks ("**") matching.
+// FIXME:
+//  This function does not work if double asterisks appear multiple times,
+//  like "foo/**/bar/**/baz".
+static bool double_asterisk_match(const char *path, const char *pattern)
+{
+	bool result = false;
+	int slash_diff = 0;
+	char *modified_pattern, *q;
+	const char *p;
+	size_t len;
+
+	for (p = path; *p; p++)
+		if (*p == '/')
+			slash_diff++;
+
+	for (p = pattern; *p; p++)
+		if (*p == '/')
+			slash_diff--;
+
+	len = strlen(pattern) + 1;
+
+	if (slash_diff > 0)
+		len += slash_diff * 2;
+	modified_pattern = xmalloc(len);
+
+	q = modified_pattern;
+	for (p = pattern; *p; p++) {
+		if (!strncmp(p, "**/", 3)) {
+			// "**/" means zero or more sequences of "*/".
+			// "foo**/bar" matches "foobar", "foo*/bar",
+			// "foo*/*/bar", etc.
+			while (slash_diff-- > 0) {
+				*q++ = '*';
+				*q++ = '/';
+			}
+
+			if (slash_diff == 0) {
+				*q++ = '*';
+				*q++ = '/';
+			}
+
+			if (slash_diff < 0)
+				slash_diff++;
+
+			p += 2;
+		} else if (!strcmp(p, "/**")) {
+			// A trailing "/**" matches everything inside.
+			while (slash_diff-- >= 0) {
+				*q++ = '/';
+				*q++ = '*';
+			}
+
+			p += 2;
+		} else {
+			// Copy other patterns as-is.
+			// Other consecutive asterisks are considered regular
+			// asterisks. fnmatch() already handles them like that.
+			*q++ = *p;
+		}
+	}
+
+	*q = '\0';
+
+	result = simple_match(path, modified_pattern);
+
+	free(modified_pattern);
+
+	return result;
+}
+
+// Return true if the given path is ignored by git.
+static bool is_ignored(const char *path, const char *name, bool is_dir)
+{
+	int i;
+
+	// Search the patterns in the reverse order because the last matching
+	// pattern wins.
+	for (i = nr_patterns - 1; i >= 0; i--) {
+		struct pattern *p = patterns[i];
+
+		if (!is_dir && p->dir_only)
+			continue;
+
+		if (!p->path_match) {
+			// If the pattern has no slash at the beginning or
+			// middle, it matches against the basename. Most cases
+			// fall into this and work well with double asterisks.
+			if (!simple_match(name, p->glob))
+				continue;
+		} else if (!p->double_asterisk) {
+			// Unless the pattern has double asterisks, it is still
+			// simple but matches against the path instead.
+			if (!simple_match(path, p->glob))
+				continue;
+		} else {
+			// Double asterisks with a slash. Complex, but rare.
+			if (!double_asterisk_match(path, p->glob))
+				continue;
+		}
+
+		debug("%s: matches %s%s%s\n", path, p->negate ? "!" : "",
+		      p->glob, p->dir_only ? "/" : "");
+
+		return !p->negate;
+	}
+
+	debug("%s: no match\n", path);
+
+	return false;
+}
+
+// Return the length of the initial segment of the string that does not contain
+// the unquoted sequence of the given character. Similar to strcspn() in libc.
+static size_t strcspn_trailer(const char *str, char c)
+{
+	bool quoted = false;
+	size_t len = strlen(str);
+	size_t spn = len;
+	const char *s;
+
+	for (s = str; *s; s++) {
+		if (!quoted && *s == c) {
+			if (s - str < spn)
+				spn = s - str;
+		} else {
+			spn = len;
+
+			if (!quoted && *s == '\\')
+				quoted = true;
+			else
+				quoted = false;
+		}
+	}
+
+	return spn;
+}
+
+// Add a gitignore pattern.
+static void add_pattern(char *s, const char *dirpath)
+{
+	bool negate = false;
+	bool dir_only = false;
+	bool path_match = false;
+	bool double_asterisk = false;
+	char *e = s + strlen(s);
+	struct pattern *p;
+	size_t len;
+
+	// Skip comments
+	if (*s == '#')
+		return;
+
+	// Trailing spaces are ignored unless they are quoted with backslash.
+	e = s + strcspn_trailer(s, ' ');
+	*e = '\0';
+
+	// The prefix '!' negates the pattern
+	if (*s == '!') {
+		s++;
+		negate = true;
+	}
+
+	// If there is slash(es) that is not escaped at the end of the pattern,
+	// it matches only directories.
+	len = strcspn_trailer(s, '/');
+	if (s + len < e) {
+		dir_only = true;
+		e = s + len;
+		*e = '\0';
+	}
+
+	// Skip if the line gets empty
+	if (*s == '\0')
+		return;
+
+	// Double asterisk is tricky. Mark it to handle it specially later.
+	if (strstr(s, "**/") || strstr(s, "/**"))
+		double_asterisk = true;
+
+	// If there is a slash at the beginning or middle, the pattern
+	// is relative to the directory level of the .gitignore.
+	if (strchr(s, '/')) {
+		if (*s == '/')
+			s++;
+		path_match = true;
+	}
+
+	len = e - s;
+
+	// We need more room to store dirpath and '/'
+	if (path_match)
+		len += strlen(dirpath) + 1;
+
+	p = xmalloc(sizeof(*p) + len + 1);
+	p->negate = negate;
+	p->dir_only = dir_only;
+	p->path_match = path_match;
+	p->double_asterisk = double_asterisk;
+	p->glob[0] = '\0';
+
+	if (path_match) {
+		strcat(p->glob, dirpath);
+		strcat(p->glob, "/");
+	}
+
+	strcat(p->glob, s);
+
+	debug("Add pattern: %s%s%s\n", negate ? "!" : "", p->glob,
+	      dir_only ? "/" : "");
+
+	if (nr_patterns >= alloced_patterns) {
+		alloced_patterns += 128;
+		patterns = xrealloc(patterns,
+				    sizeof(*patterns) * alloced_patterns);
+	}
+
+	patterns[nr_patterns++] = p;
+}
+
+static void *load_gitignore(const char *dirpath)
+{
+	struct stat st;
+	char path[PATH_MAX], *buf;
+	int fd, ret;
+
+	ret = snprintf(path, sizeof(path), "%s/.gitignore", dirpath);
+	if (ret >= sizeof(path))
+		error_exit("%s: too long path was truncated\n", path);
+
+	// If .gitignore does not exist in this directory, open() fails.
+	// It is ok, just skip it.
+	fd = open(path, O_RDONLY);
+	if (fd < 0)
+		return NULL;
+
+	if (fstat(fd, &st) < 0)
+		perror_exit(path);
+
+	buf = xmalloc(st.st_size + 1);
+	if (read(fd, buf, st.st_size) != st.st_size)
+		perror_exit(path);
+
+	buf[st.st_size] = '\0';
+	if (close(fd))
+		perror_exit(path);
+
+	return buf;
+}
+
+// Parse '.gitignore' in the given directory.
+static void parse_gitignore(const char *dirpath)
+{
+	char *buf, *s, *next;
+
+	buf = load_gitignore(dirpath);
+	if (!buf)
+		return;
+
+	debug("Parse %s/.gitignore\n", dirpath);
+
+	for (s = buf; *s; s = next) {
+		next = s;
+
+		while (*next != '\0' && *next != '\n')
+			next++;
+
+		if (*next != '\0') {
+			*next = '\0';
+			next++;
+		}
+
+		add_pattern(s, dirpath);
+	}
+
+	free(buf);
+}
+
+// Save the current number of patterns and increment the depth
+static void increment_depth(void)
+{
+	if (depth >= max_depth) {
+		max_depth += 1;
+		nr_patterns_at = xrealloc(nr_patterns_at,
+					  sizeof(*nr_patterns_at) * max_depth);
+	}
+
+	nr_patterns_at[depth] = nr_patterns;
+	depth++;
+}
+
+// Decrement the depth, and free up the patterns of this directory level.
+static void decrement_depth(void)
+{
+	depth--;
+	if (depth < 0)
+		error_exit("BUG\n");
+
+	while (nr_patterns > nr_patterns_at[depth])
+		free(patterns[--nr_patterns]);
+}
+
+// If we find an ignored path, print it.
+static void print_path(const char *path)
+{
+	// The path always starts with "./". If not, it is a bug.
+	if (strlen(path) < 2)
+		error_exit("BUG\n");
+
+	// Replace the root directory with the prefix you like.
+	// This is useful for the tar command.
+	fprintf(out_fp, "%s%s\n", prefix, path + 2);
+}
+
+// Traverse the entire directory tree, parsing .gitignore files.
+// Print file paths that are not tracked by git.
+//
+// Return true if all files under the directory are ignored, false otherwise.
+static bool traverse_directory(const char *dirpath)
+{
+	bool all_ignored = true;
+	DIR *dirp;
+
+	debug("Enter[%d]: %s\n", depth, dirpath);
+	increment_depth();
+
+	// We do not know whether .gitignore exists in this directory or not.
+	// Anyway, try to open it.
+	parse_gitignore(dirpath);
+
+	dirp = opendir(dirpath);
+	if (!dirp)
+		perror_exit(dirpath);
+
+	while (1) {
+		char path[PATH_MAX];
+		struct dirent *d;
+		int ret;
+
+		errno = 0;
+		d = readdir(dirp);
+		if (!d) {
+			// readdir() returns NULL on the end of the directory
+			// stream, and also on an error. To distinguish them,
+			// errno should be checked.
+			if (errno)
+				perror_exit(dirpath);
+			break;
+		}
+
+		if (!strcmp(d->d_name, "..") || !strcmp(d->d_name, "."))
+			continue;
+
+		ret = snprintf(path, sizeof(path), "%s/%s", dirpath, d->d_name);
+		if (ret >= sizeof(path))
+			error_exit("%s: too long path was truncated\n", path);
+
+		if (is_ignored(path, d->d_name, d->d_type == DT_DIR)) {
+			debug("Ignore: %s\n", path);
+			print_path(path);
+		} else {
+			if (d->d_type == DT_DIR) {
+				if (!traverse_directory(path))
+					all_ignored = false;
+			} else {
+				all_ignored = false;
+			}
+		}
+	}
+
+	if (closedir(dirp))
+		perror_exit(dirpath);
+
+	// If all the files under this directory are ignored, let's ignore this
+	// directory as well in order to avoid empty directories in the tarball.
+	if (all_ignored) {
+		debug("Ignore: %s (due to all files inside ignored)\n", dirpath);
+		print_path(dirpath);
+	}
+
+	decrement_depth();
+	debug("Leave[%d]: %s\n", depth, dirpath);
+
+	return all_ignored;
+}
+
+// Register hard-coded ignore patterns.
+static void add_fixed_patterns(void)
+{
+	const char * const fixed_patterns[] = {
+		".git/",
+	};
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(fixed_patterns); i++) {
+		char *s = xstrdup(fixed_patterns[i]);
+
+		add_pattern(s, ".");
+		free(s);
+	}
+}
+
+static void usage(void)
+{
+	fprintf(stderr,
+		"usage: %s [options]\n"
+		"\n"
+		"Print files that are ignored by git\n"
+		"\n"
+		"options:\n"
+		"  -d, --debug                   print debug messages to stderr\n"
+		"  -e, --extra-pattern PATTERN   Add extra ignore patterns. This behaves like it is prepended to the top .gitignore\n"
+		"  -h, --help                    show this help message and exit\n"
+		"  -o, --output FILE             output to a file (default: '-', i.e. stdout)\n"
+		"  -p, --prefix PREFIX           prefix added to each path (default: empty string)\n"
+		"  -r, --rootdir DIR             root of the source tree (default: current working directory)\n",
+		progname);
+}
+
+int main(int argc, char *argv[])
+{
+	const char *output = "-";
+	const char *rootdir = ".";
+
+	progname = strrchr(argv[0], '/');
+	if (progname)
+		progname++;
+	else
+		progname = argv[0];
+
+	while (1) {
+		static struct option long_options[] = {
+			{"debug",         no_argument,       NULL, 'd'},
+			{"extra-pattern", required_argument, NULL, 'e'},
+			{"help",          no_argument,       NULL, 'h'},
+			{"output",        required_argument, NULL, 'o'},
+			{"prefix",        required_argument, NULL, 'p'},
+			{"rootdir",       required_argument, NULL, 'r'},
+			{},
+		};
+
+		int c = getopt_long(argc, argv, "de:ho:p:r:", long_options, NULL);
+
+		if (c == -1)
+			break;
+
+		switch (c) {
+		case 'd':
+			debug_on = true;
+			break;
+		case 'e':
+			add_pattern(optarg, ".");
+			break;
+		case 'h':
+			usage();
+			exit(0);
+		case 'o':
+			output = optarg;
+			break;
+		case 'p':
+			prefix = optarg;
+			break;
+		case 'r':
+			rootdir = optarg;
+			break;
+		case '?':
+			usage();
+			/* fallthrough */
+		default:
+			exit(EXIT_FAILURE);
+		}
+	}
+
+	if (chdir(rootdir))
+		perror_exit(rootdir);
+
+	if (strcmp(output, "-")) {
+		out_fp = fopen(output, "w");
+		if (!out_fp)
+			perror_exit(output);
+	} else {
+		out_fp = stdout;
+	}
+
+	add_fixed_patterns();
+
+	traverse_directory(".");
+
+	if (depth != 0)
+		error_exit("BUG\n");
+
+	while (nr_patterns > 0)
+		free(patterns[--nr_patterns]);
+	free(patterns);
+	free(nr_patterns_at);
+
+	fflush(out_fp);
+	if (ferror(out_fp))
+		error_exit("not all data was written to the output\n");
+
+	if (fclose(out_fp))
+		perror_exit(output);
+
+	return 0;
+}
-- 
2.34.1



* [PATCH v4 2/6] kbuild: deb-pkg: create source package without cleaning
  2023-02-02  3:37 [PATCH v4 0/6] kbuild: improve source package builds Masahiro Yamada
  2023-02-02  3:37 ` [PATCH v4 1/6] kbuild: add a tool to generate a list of files ignored by git Masahiro Yamada
@ 2023-02-02  3:37 ` Masahiro Yamada
  2023-02-02  3:37 ` [PATCH v4 3/6] kbuild: rpm-pkg: build binary packages from source rpm Masahiro Yamada
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Masahiro Yamada @ 2023-02-02  3:37 UTC (permalink / raw)
  To: linux-kbuild
  Cc: linux-kernel, Nathan Chancellor, Nick Desaulniers,
	Nicolas Schier, Ben Hutchings, Masahiro Yamada, Tom Rix, llvm

If you run 'make deb-pkg', all objects are lost due to 'make clean',
which makes incremental builds impossible.

Instead of cleaning, pass the exclude list to tar's --exclude-from
option.

Previously, *.diff.gz contained some checked-in files such as
.clang-format and .cocciconfig.

With this commit, *.diff.gz will only contain the .config and debian/.
The other source files will go into the tarball.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
---

Changes in v4:
  - Fix a typo in comment

Changes in v3:
  - Add --extra-pattern='*.rej'
  - Exclude symlinks at the toplevel
  - Add --sort=name tar option

 scripts/Makefile.package | 38 +++++++++++++++++++++++++++++++++-----
 scripts/package/mkdebian | 25 +++++++++++++++++++++++++
 2 files changed, 58 insertions(+), 5 deletions(-)

diff --git a/scripts/Makefile.package b/scripts/Makefile.package
index dfbf40454a99..14567043a8af 100644
--- a/scripts/Makefile.package
+++ b/scripts/Makefile.package
@@ -50,6 +50,32 @@ fi ; \
 tar -I $(KGZIP) -c $(RCS_TAR_IGNORE) -f $(2).tar.gz \
 	--transform 's:^:$(2)/:S' $(TAR_CONTENT) $(3)
 
+# Source Tarball
+# ---------------------------------------------------------------------------
+
+PHONY += gen-exclude
+gen-exclude:
+	$(Q)$(MAKE) -f $(srctree)/Makefile scripts_exclude
+
+# - Commit 1f5d3a6b6532e25a5cdf1f311956b2b03d343a48 removed '*.rej' from
+#   .gitignore, but it is definitely a generated file.
+# - The kernel tree has no symlink at the toplevel. If it does, it is a
+#   generated one.
+quiet_cmd_exclude_list = GEN     $@
+      cmd_exclude_list = \
+	scripts/gen-exclude --extra-pattern='*.rej' --prefix=./ --rootdir=$(srctree) > $@; \
+	find . -maxdepth 1 -type l >> $@; \
+	echo "./$@" >> $@
+
+.exclude-list: gen-exclude
+	$(call cmd,exclude_list)
+
+quiet_cmd_tar = TAR     $@
+      cmd_tar = tar -I $(KGZIP) -c -f $@ -C $(srctree) --exclude-from=$< --exclude=./$@ --sort=name --transform 's:^\.:linux:S' .
+
+%.tar.gz: .exclude-list
+	$(call cmd,tar)
+
 # rpm-pkg
 # ---------------------------------------------------------------------------
 PHONY += rpm-pkg
@@ -81,12 +107,11 @@ binrpm-pkg:
 
 PHONY += deb-pkg
 deb-pkg:
-	$(MAKE) clean
 	$(CONFIG_SHELL) $(srctree)/scripts/package/mkdebian
-	$(call cmd,src_tar,$(KDEB_SOURCENAME))
-	origversion=$$(dpkg-parsechangelog -SVersion |sed 's/-[^-]*$$//');\
-		mv $(KDEB_SOURCENAME).tar.gz ../$(KDEB_SOURCENAME)_$${origversion}.orig.tar.gz
-	+dpkg-buildpackage -r$(KBUILD_PKG_ROOTCMD) -a$$(cat debian/arch) $(DPKG_FLAGS) --source-option=-sP -i.git -us -uc
+	$(Q)origversion=$$(dpkg-parsechangelog -SVersion |sed 's/-[^-]*$$//');\
+		$(MAKE) -f $(srctree)/scripts/Makefile.package ../$(KDEB_SOURCENAME)_$${origversion}.orig.tar.gz
+	+dpkg-buildpackage -r$(KBUILD_PKG_ROOTCMD) -a$$(cat debian/arch) $(DPKG_FLAGS) \
+		--build=source,binary --source-option=-sP -nc -us -uc
 
 PHONY += bindeb-pkg
 bindeb-pkg:
@@ -174,4 +199,7 @@ help:
 	@echo '  perf-tarxz-src-pkg  - Build $(perf-tar).tar.xz source tarball'
 	@echo '  perf-tarzst-src-pkg - Build $(perf-tar).tar.zst source tarball'
 
+PHONY += FORCE
+FORCE:
+
 .PHONY: $(PHONY)
diff --git a/scripts/package/mkdebian b/scripts/package/mkdebian
index c3bbef7a6754..2f612617cbcf 100755
--- a/scripts/package/mkdebian
+++ b/scripts/package/mkdebian
@@ -84,6 +84,8 @@ set_debarch() {
 	fi
 }
 
+rm -rf debian
+
 # Some variables and settings used throughout the script
 version=$KERNELRELEASE
 if [ -n "$KDEB_PKGVERSION" ]; then
@@ -135,6 +137,29 @@ fi
 mkdir -p debian/source/
 echo "1.0" > debian/source/format
 
+# Ugly: ignore anything except .config or debian/
+# (is there a cleaner way to do this?)
+cat<<'EOF' > debian/source/local-options
+diff-ignore
+
+extend-diff-ignore = ^[^.d]
+
+extend-diff-ignore = ^\.[^c]
+extend-diff-ignore = ^\.c($|[^o])
+extend-diff-ignore = ^\.co($|[^n])
+extend-diff-ignore = ^\.con($|[^f])
+extend-diff-ignore = ^\.conf($|[^i])
+extend-diff-ignore = ^\.confi($|[^g])
+extend-diff-ignore = ^\.config.
+
+extend-diff-ignore = ^d($|[^e])
+extend-diff-ignore = ^de($|[^b])
+extend-diff-ignore = ^deb($|[^i])
+extend-diff-ignore = ^debi($|[^a])
+extend-diff-ignore = ^debia($|[^n])
+extend-diff-ignore = ^debian[^/]
+EOF
+
 echo $debarch > debian/arch
 extra_build_depends=", $(if_enabled_echo CONFIG_UNWINDER_ORC libelf-dev:native)"
 extra_build_depends="$extra_build_depends, $(if_enabled_echo CONFIG_SYSTEM_TRUSTED_KEYRING libssl-dev:native)"
-- 
2.34.1



* [PATCH v4 3/6] kbuild: rpm-pkg: build binary packages from source rpm
  2023-02-02  3:37 [PATCH v4 0/6] kbuild: improve source package builds Masahiro Yamada
  2023-02-02  3:37 ` [PATCH v4 1/6] kbuild: add a tool to generate a list of files ignored by git Masahiro Yamada
  2023-02-02  3:37 ` [PATCH v4 2/6] kbuild: deb-pkg: create source package without cleaning Masahiro Yamada
@ 2023-02-02  3:37 ` Masahiro Yamada
  2023-02-02  3:37 ` [PATCH v4 4/6] kbuild: srcrpm-pkg: create source package without cleaning Masahiro Yamada
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 10+ messages in thread
From: Masahiro Yamada @ 2023-02-02  3:37 UTC (permalink / raw)
  To: linux-kbuild
  Cc: linux-kernel, Nathan Chancellor, Nick Desaulniers,
	Nicolas Schier, Ben Hutchings, Masahiro Yamada

The build rules of rpm-pkg and srcrpm-pkg are almost the same.
Remove the code duplication.

Change rpm-pkg to build binary packages from the source package generated
by srcrpm-pkg.

This changes the output directory of the srpm generated by 'make rpm-pkg'
because srcrpm-pkg overrides _srcrpmdir.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
---

(no changes since v3)

Changes in v3:
  - Explain that the source package location will be changed.

 scripts/Makefile.package | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/scripts/Makefile.package b/scripts/Makefile.package
index 14567043a8af..ebf3db81b994 100644
--- a/scripts/Makefile.package
+++ b/scripts/Makefile.package
@@ -79,11 +79,9 @@ quiet_cmd_tar = TAR     $@
 # rpm-pkg
 # ---------------------------------------------------------------------------
 PHONY += rpm-pkg
-rpm-pkg:
-	$(MAKE) clean
-	$(CONFIG_SHELL) $(MKSPEC) >$(objtree)/kernel.spec
-	$(call cmd,src_tar,$(KERNELPATH),kernel.spec)
-	+rpmbuild $(RPMOPTS) --target $(UTS_MACHINE)-linux -ta $(KERNELPATH).tar.gz \
+rpm-pkg: srpm = $(shell rpmspec --srpm --query --queryformat='%{name}-%{VERSION}-%{RELEASE}.src.rpm' kernel.spec)
+rpm-pkg: srcrpm-pkg
+	+rpmbuild $(RPMOPTS) --target $(UTS_MACHINE)-linux -rb $(srpm) \
 	--define='_smp_mflags %{nil}'
 
 # srcrpm-pkg
-- 
2.34.1



* [PATCH v4 4/6] kbuild: srcrpm-pkg: create source package without cleaning
  2023-02-02  3:37 [PATCH v4 0/6] kbuild: improve source package builds Masahiro Yamada
                   ` (2 preceding siblings ...)
  2023-02-02  3:37 ` [PATCH v4 3/6] kbuild: rpm-pkg: build binary packages from source rpm Masahiro Yamada
@ 2023-02-02  3:37 ` Masahiro Yamada
  2023-02-02  3:37 ` [PATCH v4 5/6] kbuild: deb-pkg: hide KDEB_SOURCENAME from Makefile Masahiro Yamada
  2023-02-02  3:37 ` [PATCH v4 6/6] kbuild: deb-pkg: switch over to format 3.0 (quilt) Masahiro Yamada
  5 siblings, 0 replies; 10+ messages in thread
From: Masahiro Yamada @ 2023-02-02  3:37 UTC (permalink / raw)
  To: linux-kbuild
  Cc: linux-kernel, Nathan Chancellor, Nick Desaulniers,
	Nicolas Schier, Ben Hutchings, Masahiro Yamada

If you run 'make (src)rpm-pkg', all objects are lost due to 'make clean',
which makes incremental builds impossible.

Instead of cleaning, pass the exclude list to tar's --exclude-from
option.

Previously, the .config was contained in the source tarball.

With this commit, the source rpm consists of separate linux.tar.gz
and .config.

Remove stale comments. Now, 'make (src)rpm-pkg' works with O= option.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
---

Changes in v4:
  - Do not delete the old tar command because it is still used
    by snap-pkg although snap-pkg is broken, and it does not work at all.

 scripts/Makefile.package | 29 +++--------------------------
 scripts/package/mkspec   |  8 ++++----
 2 files changed, 7 insertions(+), 30 deletions(-)

diff --git a/scripts/Makefile.package b/scripts/Makefile.package
index ebf3db81b994..6732632a0259 100644
--- a/scripts/Makefile.package
+++ b/scripts/Makefile.package
@@ -3,27 +3,6 @@
 
 include $(srctree)/scripts/Kbuild.include
 
-# RPM target
-# ---------------------------------------------------------------------------
-# The rpm target generates two rpm files:
-# /usr/src/packages/SRPMS/kernel-2.6.7rc2-1.src.rpm
-# /usr/src/packages/RPMS/i386/kernel-2.6.7rc2-1.<arch>.rpm
-# The src.rpm files includes all source for the kernel being built
-# The <arch>.rpm includes kernel configuration, modules etc.
-#
-# Process to create the rpm files
-# a) clean the kernel
-# b) Generate .spec file
-# c) Build a tar ball, using symlink to make kernel version
-#    first entry in the path
-# d) and pack the result to a tar.gz file
-# e) generate the rpm files, based on kernel.spec
-# - Use /. to avoid tar packing just the symlink
-
-# Note that the rpm-pkg target cannot be used with KBUILD_OUTPUT,
-# but the binrpm-pkg target can; for some reason O= gets ignored.
-
-# Remove hyphens since they have special meaning in RPM filenames
 KERNELPATH := kernel-$(subst -,_,$(KERNELRELEASE))
 KDEB_SOURCENAME ?= linux-upstream
 KBUILD_PKG_ROOTCMD ?="fakeroot -u"
@@ -87,12 +66,10 @@ rpm-pkg: srcrpm-pkg
 # srcrpm-pkg
 # ---------------------------------------------------------------------------
 PHONY += srcrpm-pkg
-srcrpm-pkg:
-	$(MAKE) clean
+srcrpm-pkg: linux.tar.gz
 	$(CONFIG_SHELL) $(MKSPEC) >$(objtree)/kernel.spec
-	$(call cmd,src_tar,$(KERNELPATH),kernel.spec)
-	+rpmbuild $(RPMOPTS) --target $(UTS_MACHINE)-linux -ts $(KERNELPATH).tar.gz \
-	--define='_smp_mflags %{nil}' --define='_srcrpmdir $(srctree)'
+	+rpmbuild $(RPMOPTS) --target $(UTS_MACHINE)-linux -bs kernel.spec \
+	--define='_smp_mflags %{nil}' --define='_sourcedir .' --define='_srcrpmdir .'
 
 # binrpm-pkg
 # ---------------------------------------------------------------------------
diff --git a/scripts/package/mkspec b/scripts/package/mkspec
index 108c0cb95436..83a64d9d7372 100755
--- a/scripts/package/mkspec
+++ b/scripts/package/mkspec
@@ -47,7 +47,8 @@ sed -e '/^DEL/d' -e 's/^\t*//' <<EOF
 	Group: System Environment/Kernel
 	Vendor: The Linux Community
 	URL: https://www.kernel.org
-$S	Source: kernel-$__KERNELRELEASE.tar.gz
+$S	Source0: linux.tar.gz
+$S	Source1: .config
 	Provides: $PROVIDES
 $S	BuildRequires: bc binutils bison dwarves
 $S	BuildRequires: (elfutils-libelf-devel or libelf-devel) flex
@@ -83,9 +84,8 @@ $S$M	This package provides kernel headers and makefiles sufficient to build modu
 $S$M	against the $__KERNELRELEASE kernel package.
 $S$M
 $S	%prep
-$S	%setup -q
-$S	rm -f scripts/basic/fixdep scripts/kconfig/conf
-$S	rm -f tools/objtool/{fixdep,objtool}
+$S	%setup -q -n linux
+$S	cp %{SOURCE1} .
 $S
 $S	%build
 $S	$MAKE %{?_smp_mflags} KERNELRELEASE=$KERNELRELEASE KBUILD_BUILD_VERSION=%{release}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v4 5/6] kbuild: deb-pkg: hide KDEB_SOURCENAME from Makefile
  2023-02-02  3:37 [PATCH v4 0/6] kbuild: improve source package builds Masahiro Yamada
                   ` (3 preceding siblings ...)
  2023-02-02  3:37 ` [PATCH v4 4/6] kbuild: srcrpm-pkg: create source package without cleaning Masahiro Yamada
@ 2023-02-02  3:37 ` Masahiro Yamada
  2023-02-02  3:37 ` [PATCH v4 6/6] kbuild: deb-pkg: switch over to format 3.0 (quilt) Masahiro Yamada
  5 siblings, 0 replies; 10+ messages in thread
From: Masahiro Yamada @ 2023-02-02  3:37 UTC (permalink / raw)
  To: linux-kbuild
  Cc: linux-kernel, Nathan Chancellor, Nick Desaulniers,
	Nicolas Schier, Ben Hutchings, Masahiro Yamada

scripts/Makefile.package does not need to know the value of
KDEB_SOURCENAME because the source name can be taken from
debian/changelog by using dpkg-parsechangelog.

Move the default of KDEB_SOURCENAME (i.e. linux-upstream) to
scripts/package/mkdebian.
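
A minimal shell sketch of how the tarball name is derived. The helper
orig_version and the version string 6.2.0-rc6-3 are made up for
illustration; the real rule reads both fields from debian/changelog
with dpkg-parsechangelog.

```shell
# Strip the Debian revision, i.e. everything from the last hyphen onward.
# This is the same sed expression the debian-tarball rule applies to the
# output of 'dpkg-parsechangelog -S Version' (in a Makefile, '$' is '$$').
orig_version() {
	printf '%s\n' "$1" | sed 's/-[^-]*$//'
}

# Fall back to linux-upstream when KDEB_SOURCENAME is unset or empty,
# mirroring the assignment in mkdebian.
unset KDEB_SOURCENAME
sourcename=${KDEB_SOURCENAME:-linux-upstream}

# Made-up package version: upstream part "6.2.0-rc6", revision "3".
version=6.2.0-rc6-3
tarball="../${sourcename}_$(orig_version "$version").orig.tar.gz"
echo "$tarball"
```

Only the last hyphen-separated field is removed, so a hyphen inside the
upstream version (as in 6.2.0-rc6) is preserved.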

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
---

(no changes since v3)

Changes in v3:
  - Move cmd_debianize

Changes in v2:
  - New patch

 scripts/Makefile.package | 23 +++++++++++++++--------
 scripts/package/mkdebian |  2 +-
 2 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/scripts/Makefile.package b/scripts/Makefile.package
index 6732632a0259..5135a5419a72 100644
--- a/scripts/Makefile.package
+++ b/scripts/Makefile.package
@@ -4,9 +4,7 @@
 include $(srctree)/scripts/Kbuild.include
 
 KERNELPATH := kernel-$(subst -,_,$(KERNELRELEASE))
-KDEB_SOURCENAME ?= linux-upstream
 KBUILD_PKG_ROOTCMD ?="fakeroot -u"
-export KDEB_SOURCENAME
 # Include only those top-level files that are needed by make, plus the GPL copy
 TAR_CONTENT := Documentation LICENSES arch block certs crypto drivers fs \
                include init io_uring ipc kernel lib mm net rust \
@@ -80,17 +78,26 @@ binrpm-pkg:
 	+rpmbuild $(RPMOPTS) --define "_builddir $(objtree)" --target \
 		$(UTS_MACHINE)-linux -bb $(objtree)/binkernel.spec
 
+quiet_cmd_debianize = GEN     $@
+      cmd_debianize = $(srctree)/scripts/package/mkdebian
+
+PHONY += debian
+debian:
+	$(call cmd,debianize)
+
+PHONY += debian-tarball
+debian-tarball: source = $(shell dpkg-parsechangelog -S Source)
+debian-tarball: orig-version = $(shell dpkg-parsechangelog -S Version | sed 's/-[^-]*$$//')
+debian-tarball: debian
+	$(Q)$(MAKE) -f $(srctree)/scripts/Makefile.package ../$(source)_$(orig-version).orig.tar.gz
+
 PHONY += deb-pkg
-deb-pkg:
-	$(CONFIG_SHELL) $(srctree)/scripts/package/mkdebian
-	$(Q)origversion=$$(dpkg-parsechangelog -SVersion |sed 's/-[^-]*$$//');\
-		$(MAKE) -f $(srctree)/scripts/Makefile.package ../$(KDEB_SOURCENAME)_$${origversion}.orig.tar.gz
+deb-pkg: debian-tarball
 	+dpkg-buildpackage -r$(KBUILD_PKG_ROOTCMD) -a$$(cat debian/arch) $(DPKG_FLAGS) \
 		--build=source,binary --source-option=-sP -nc -us -uc
 
 PHONY += bindeb-pkg
-bindeb-pkg:
-	$(CONFIG_SHELL) $(srctree)/scripts/package/mkdebian
+bindeb-pkg: debian
 	+dpkg-buildpackage -r$(KBUILD_PKG_ROOTCMD) -a$$(cat debian/arch) $(DPKG_FLAGS) -b -nc -uc
 
 PHONY += intdeb-pkg
diff --git a/scripts/package/mkdebian b/scripts/package/mkdebian
index 2f612617cbcf..0c1ed6215a02 100755
--- a/scripts/package/mkdebian
+++ b/scripts/package/mkdebian
@@ -95,7 +95,7 @@ else
 	revision=$($srctree/init/build-version)
 	packageversion=$version-$revision
 fi
-sourcename=$KDEB_SOURCENAME
+sourcename=${KDEB_SOURCENAME:-linux-upstream}
 
 if [ "$ARCH" = "um" ] ; then
 	packagename=user-mode-linux
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v4 6/6] kbuild: deb-pkg: switch over to format 3.0 (quilt)
  2023-02-02  3:37 [PATCH v4 0/6] kbuild: improve source package builds Masahiro Yamada
                   ` (4 preceding siblings ...)
  2023-02-02  3:37 ` [PATCH v4 5/6] kbuild: deb-pkg: hide KDEB_SOURCENAME from Makefile Masahiro Yamada
@ 2023-02-02  3:37 ` Masahiro Yamada
  5 siblings, 0 replies; 10+ messages in thread
From: Masahiro Yamada @ 2023-02-02  3:37 UTC (permalink / raw)
  To: linux-kbuild
  Cc: linux-kernel, Nathan Chancellor, Nick Desaulniers,
	Nicolas Schier, Ben Hutchings, Masahiro Yamada

Switch from "1.0" to "3.0 (quilt)" because it works more cleanly.

All files except .config and debian/ go into the .orig tarball.
You can add a single patch, debian/patches/config, and delete the ugly
extend-diff-ignore patterns.

The debian tarball will be compressed into *.debian.tar.xz by default.
If you would like to use a different compression mode, you can pass a
command line option such as DPKG_FLAGS=-Zgzip.

The .orig tarball only supports gzip for now. The combination of
gzip and xz is somewhat clumsy, but it is not a practical problem.
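
Roughly, the patch that adds .config can be produced as sketched below.
The two CONFIG_ lines and the output name config.patch are made up for
illustration; mkdebian writes to debian/patches/config and also emits
an Author: line. The linux/ prefix in the header is there on the
assumption that the patch is applied with one leading path component
stripped.

```shell
# Hypothetical .config with two options, kept in a scratch directory.
work=$(mktemp -d)
printf '%s\n' 'CONFIG_FOO=y' 'CONFIG_BAR=m' > "$work/.config"

{
	echo "Subject: Add .config"
	echo
	# Hand-written headers replace the ones diff would print, which
	# carry timestamps and lack the leading path component.
	echo "--- /dev/null"
	echo "+++ linux/.config"
	# diff exits with status 1 because the inputs differ; that is expected.
	# tail -n +3 drops the two header lines generated by diff itself.
	diff -u /dev/null "$work/.config" | tail -n +3
} > "$work/config.patch"

cat "$work/config.patch"
```

The result is a patch that creates the whole file: a single hunk in
which every line of .config appears as an added line.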

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
---

Changes in v4:
  - New patch

 scripts/Makefile.package |  2 +-
 scripts/package/mkdebian | 42 +++++++++++++++++-----------------------
 2 files changed, 19 insertions(+), 25 deletions(-)

diff --git a/scripts/Makefile.package b/scripts/Makefile.package
index 5135a5419a72..454268a37af1 100644
--- a/scripts/Makefile.package
+++ b/scripts/Makefile.package
@@ -94,7 +94,7 @@ debian-tarball: debian
 PHONY += deb-pkg
 deb-pkg: debian-tarball
 	+dpkg-buildpackage -r$(KBUILD_PKG_ROOTCMD) -a$$(cat debian/arch) $(DPKG_FLAGS) \
-		--build=source,binary --source-option=-sP -nc -us -uc
+		--build=source,binary -nc -us -uc
 
 PHONY += bindeb-pkg
 bindeb-pkg: debian
diff --git a/scripts/package/mkdebian b/scripts/package/mkdebian
index 0c1ed6215a02..1ab4c6ee76d9 100755
--- a/scripts/package/mkdebian
+++ b/scripts/package/mkdebian
@@ -135,30 +135,24 @@ else
 fi
 
 mkdir -p debian/source/
-echo "1.0" > debian/source/format
-
-# Ugly: ignore anything except .config or debian/
-# (is there a cleaner way to do this?)
-cat<<'EOF' > debian/source/local-options
-diff-ignore
-
-extend-diff-ignore = ^[^.d]
-
-extend-diff-ignore = ^\.[^c]
-extend-diff-ignore = ^\.c($|[^o])
-extend-diff-ignore = ^\.co($|[^n])
-extend-diff-ignore = ^\.con($|[^f])
-extend-diff-ignore = ^\.conf($|[^i])
-extend-diff-ignore = ^\.confi($|[^g])
-extend-diff-ignore = ^\.config.
-
-extend-diff-ignore = ^d($|[^e])
-extend-diff-ignore = ^de($|[^b])
-extend-diff-ignore = ^deb($|[^i])
-extend-diff-ignore = ^debi($|[^a])
-extend-diff-ignore = ^debia($|[^n])
-extend-diff-ignore = ^debian[^/]
-EOF
+echo "3.0 (quilt)" > debian/source/format
+
+{
+	echo "diff-ignore"
+	echo "extend-diff-ignore = .*"
+} > debian/source/local-options
+
+# Add .config as a patch
+mkdir -p debian/patches
+{
+	echo "Subject: Add .config"
+	echo "Author: ${maintainer}"
+	echo
+	echo "--- /dev/null"
+	echo "+++ linux/.config"
+	diff -u /dev/null .config | tail -n +3
+} > debian/patches/config
+echo config > debian/patches/series
 
 echo $debarch > debian/arch
 extra_build_depends=", $(if_enabled_echo CONFIG_UNWINDER_ORC libelf-dev:native)"
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v4 1/6] kbuild: add a tool to generate a list of files ignored by git
  2023-02-02  3:37 ` [PATCH v4 1/6] kbuild: add a tool to generate a list of files ignored by git Masahiro Yamada
@ 2023-02-02  3:49   ` Masahiro Yamada
  2023-02-02 11:02   ` Nicolas Schier
  1 sibling, 0 replies; 10+ messages in thread
From: Masahiro Yamada @ 2023-02-02  3:49 UTC (permalink / raw)
  To: linux-kbuild
  Cc: linux-kernel, Nathan Chancellor, Nick Desaulniers,
	Nicolas Schier, Ben Hutchings

On Thu, Feb 2, 2023 at 12:38 PM Masahiro Yamada <masahiroy@kernel.org> wrote:
>
> In short, the motivation of this commit is to build a source package
> without cleaning the source tree.
>
> The deb-pkg and (src)rpm-pkg targets first run 'make clean' before
> creating a source tarball. Otherwise build artifacts such as *.o,
> *.a, etc. would be included in the tarball. Yet, the tarball ends up
> containing several garbage files since 'make clean' does not clean
> everything.
>
> Cleaning the tree every time is annoying since it makes the incremental
> build impossible. It is desirable to create a source tarball without
> cleaning the tree.
>
> In fact, there are some ways to archive this.

"achieve this".



>
> The easiest way is 'git archive'. Actually, 'make perf-tar*-src-pkg'
> does it this way, but I do not like it because it works only when the source
> tree is managed by git, and all files you want in the tarball must be
> committed in advance.
>
> I want to make it work without relying on git. We can do this.
>
> Files that are not tracked by git are generated files. We can list them
> out by parsing the .gitignore files. Of course, .gitignore does not cover
> all the cases, but it works well enough.
>
> tar(1) claims to support it:
>
>   --exclude-vcs-ignores
>
>     Exclude files that match patterns read from VCS-specific ignore files.
>     Supported files are: .cvsignore, .gitignore, .bzrignore, and .hgignore.
>
> The best scenario would be to use 'tar --exclude-vcs-ignores', but this
> option does not work. --exclude-vcs-ignores does not understand any of
> the negation (!), preceding slash, following slash, etc. So, this option
> is just useless.
>
> Hence, I wrote this gitignore parser. The previous version [1], written
> in Python, was too slow. This version is implemented in C, so it works
> much faster.
>
> This tool traverses the source tree, parsing the .gitignore files. It
> prints the file paths that are not tracked by git. The output can be
> used for tar's --exclude-from= option.
>
> [How to test this tool]
>
>   $ git clean -dfx
>   $ make -s -j$(nproc) defconfig all                       # or allmodconfig or whatever
>   $ git archive -o ../linux1.tar --prefix=./ HEAD
>   $ tar tf ../linux1.tar | LANG=C sort > ../file-list1     # files emitted by 'git archive'
>   $ make scripts_exclude
>     HOSTCC  scripts/gen-exclude
>   $ scripts/gen-exclude --prefix=./ -o ../exclude-list
>   $ tar cf ../linux2.tar --exclude-from=../exclude-list .
>   $ tar tf ../linux2.tar | LANG=C sort > ../file-list2     # files emitted by 'tar'
>   $ diff  ../file-list1 ../file-list2 | grep -E '^(<|>)'
>   < ./Documentation/devicetree/bindings/.yamllint
>   < ./drivers/clk/.kunitconfig
>   < ./drivers/gpu/drm/tests/.kunitconfig
>   < ./drivers/gpu/drm/vc4/tests/.kunitconfig
>   < ./drivers/hid/.kunitconfig
>   < ./fs/ext4/.kunitconfig
>   < ./fs/fat/.kunitconfig
>   < ./kernel/kcsan/.kunitconfig
>   < ./lib/kunit/.kunitconfig
>   < ./mm/kfence/.kunitconfig
>   < ./net/sunrpc/.kunitconfig
>   < ./tools/testing/selftests/arm64/tags/
>   < ./tools/testing/selftests/arm64/tags/.gitignore
>   < ./tools/testing/selftests/arm64/tags/Makefile
>   < ./tools/testing/selftests/arm64/tags/run_tags_test.sh
>   < ./tools/testing/selftests/arm64/tags/tags_test.c
>   < ./tools/testing/selftests/kvm/.gitignore
>   < ./tools/testing/selftests/kvm/Makefile
>   < ./tools/testing/selftests/kvm/config
>   < ./tools/testing/selftests/kvm/settings
>
> The source tarball contains most of the files that are tracked by git. You
> see some diffs, but that is just because some .gitignore files are wrong.
>
>   $ git ls-files -i -c --exclude-per-directory=.gitignore
>   Documentation/devicetree/bindings/.yamllint
>   drivers/clk/.kunitconfig
>   drivers/gpu/drm/tests/.kunitconfig
>   drivers/hid/.kunitconfig
>   fs/ext4/.kunitconfig
>   fs/fat/.kunitconfig
>   kernel/kcsan/.kunitconfig
>   lib/kunit/.kunitconfig
>   mm/kfence/.kunitconfig
>   tools/testing/selftests/arm64/tags/.gitignore
>   tools/testing/selftests/arm64/tags/Makefile
>   tools/testing/selftests/arm64/tags/run_tags_test.sh
>   tools/testing/selftests/arm64/tags/tags_test.c
>   tools/testing/selftests/kvm/.gitignore
>   tools/testing/selftests/kvm/Makefile
>   tools/testing/selftests/kvm/config
>   tools/testing/selftests/kvm/settings
>
> [1]: https://lore.kernel.org/all/20230128173843.765212-1-masahiroy@kernel.org/
>
> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
> ---
>
> (no changes since v3)
>
> Changes in v3:
>  - Various code refactoring: remove struct gitignore, remove next: label etc.
>  - Support --extra-pattern option
>
> Changes in v2:
>  - Reimplement in C
>
>  Makefile              |   4 +
>  scripts/.gitignore    |   1 +
>  scripts/Makefile      |   2 +-
>  scripts/gen-exclude.c | 623 ++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 629 insertions(+), 1 deletion(-)
>  create mode 100644 scripts/gen-exclude.c
>
> diff --git a/Makefile b/Makefile
> index 2faf872b6808..35b294cc6f32 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1652,6 +1652,10 @@ distclean: mrproper
>  %pkg: include/config/kernel.release FORCE
>         $(Q)$(MAKE) -f $(srctree)/scripts/Makefile.package $@
>
> +PHONY += scripts_exclude
> +scripts_exclude: scripts_basic
> +       $(Q)$(MAKE) $(build)=scripts scripts/gen-exclude
> +
>  # Brief documentation of the typical targets used
>  # ---------------------------------------------------------------------------
>
> diff --git a/scripts/.gitignore b/scripts/.gitignore
> index 6e9ce6720a05..7f433bc1461c 100644
> --- a/scripts/.gitignore
> +++ b/scripts/.gitignore
> @@ -1,5 +1,6 @@
>  # SPDX-License-Identifier: GPL-2.0-only
>  /asn1_compiler
> +/gen-exclude
>  /generate_rust_target
>  /insert-sys-cert
>  /kallsyms
> diff --git a/scripts/Makefile b/scripts/Makefile
> index 32b6ba722728..5dcd7f57607f 100644
> --- a/scripts/Makefile
> +++ b/scripts/Makefile
> @@ -38,7 +38,7 @@ HOSTCFLAGS_sorttable.o += -DMCOUNT_SORT_ENABLED
>  endif
>
>  # The following programs are only built on demand
> -hostprogs += unifdef
> +hostprogs += gen-exclude unifdef
>
>  # The module linker script is preprocessed on demand
>  targets += module.lds
> diff --git a/scripts/gen-exclude.c b/scripts/gen-exclude.c
> new file mode 100644
> index 000000000000..5c4ecd902290
> --- /dev/null
> +++ b/scripts/gen-exclude.c
> @@ -0,0 +1,623 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +//
> +// Traverse the source tree, parsing all .gitignore files, and print file paths
> +// that are not tracked by git.
> +// The output is suitable for the --exclude-from option of tar.
> +// This is useful until the --exclude-vcs-ignores option starts working correctly.
> +//
> +// Copyright (C) 2023 Masahiro Yamada <masahiroy@kernel.org>
> +
> +#include <dirent.h>
> +#include <errno.h>
> +#include <fcntl.h>
> +#include <fnmatch.h>
> +#include <getopt.h>
> +#include <stdarg.h>
> +#include <stdbool.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <sys/stat.h>
> +#include <sys/types.h>
> +#include <unistd.h>
> +
> +#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
> +
> +// struct pattern - represent an ignore pattern (a line in .gitignore)
> +// @negate:          negate the pattern (prefixing '!')
> +// @dir_only:        only matches directories (trailing '/')
> +// @path_match:      true if the glob pattern is a path instead of a file name
> +// @double_asterisk: true if the glob pattern contains double asterisks ('**')
> +// @glob:            glob pattern
> +struct pattern {
> +       bool negate;
> +       bool dir_only;
> +       bool path_match;
> +       bool double_asterisk;
> +       char glob[];
> +};
> +
> +struct pattern **patterns;
> +static int nr_patterns, alloced_patterns;
> +
> +// Remember the number of patterns at each directory level
> +static int *nr_patterns_at;
> +// Track the current/max directory level;
> +static int depth, max_depth;
> +static bool debug_on;
> +static FILE *out_fp;
> +static char *prefix = "";
> +static char *progname;
> +
> +static void __attribute__((noreturn)) perror_exit(const char *s)
> +{
> +       perror(s);
> +
> +       exit(EXIT_FAILURE);
> +}
> +
> +static void __attribute__((noreturn)) error_exit(const char *fmt, ...)
> +{
> +       va_list args;
> +
> +       fprintf(stderr, "%s: error: ", progname);
> +
> +       va_start(args, fmt);
> +       vfprintf(stderr, fmt, args);
> +       va_end(args);
> +
> +       exit(EXIT_FAILURE);
> +}
> +
> +static void debug(const char *fmt, ...)
> +{
> +       va_list args;
> +       int i;
> +
> +       if (!debug_on)
> +               return;
> +
> +       fprintf(stderr, "[DEBUG]");
> +
> +       for (i = 0; i < depth * 2; i++)
> +               fputc(' ', stderr);
> +
> +       va_start(args, fmt);
> +       vfprintf(stderr, fmt, args);
> +       va_end(args);
> +}
> +
> +static void *xrealloc(void *ptr, size_t size)
> +{
> +       ptr = realloc(ptr, size);
> +       if (!ptr)
> +               perror_exit(progname);
> +
> +       return ptr;
> +}
> +
> +static void *xmalloc(size_t size)
> +{
> +       return xrealloc(NULL, size);
> +}
> +
> +static char *xstrdup(const char *s)
> +{
> +       char *new = strdup(s);
> +
> +       if (!new)
> +               perror_exit(progname);
> +
> +       return new;
> +}
> +
> +static bool simple_match(const char *string, const char *pattern)
> +{
> +       return fnmatch(pattern, string, FNM_PATHNAME) == 0;
> +}
> +
> +// Handle double asterisks ("**") matching.
> +// FIXME:
> +//  This function does not work if double asterisks appear multiple times,
> +//  like "foo/**/bar/**/baz".
> +static bool double_asterisk_match(const char *path, const char *pattern)
> +{
> +       bool result = false;
> +       int slash_diff = 0;
> +       char *modified_pattern, *q;
> +       const char *p;
> +       size_t len;
> +
> +       for (p = path; *p; p++)
> +               if (*p == '/')
> +                       slash_diff++;
> +
> +       for (p = pattern; *p; p++)
> +               if (*p == '/')
> +                       slash_diff--;
> +
> +       len = strlen(pattern) + 1;
> +
> +       if (slash_diff > 0)
> +               len += slash_diff * 2;
> +       modified_pattern = xmalloc(len);
> +
> +       q = modified_pattern;
> +       for (p = pattern; *p; p++) {
> +               if (!strncmp(p, "**/", 3)) {
> +                       // "**/" means zero or more sequences of "*/".
> +                       // "foo**/bar" matches "foobar", "foo*/bar",
> +                       // "foo*/*/bar", etc.
> +                       while (slash_diff-- > 0) {
> +                               *q++ = '*';
> +                               *q++ = '/';
> +                       }
> +
> +                       if (slash_diff == 0) {
> +                               *q++ = '*';
> +                               *q++ = '/';
> +                       }
> +
> +                       if (slash_diff < 0)
> +                               slash_diff++;
> +
> +                       p += 2;
> +               } else if (!strcmp(p, "/**")) {
> +                       // A trailing "/**" matches everything inside.
> +                       while (slash_diff-- >= 0) {
> +                               *q++ = '/';
> +                               *q++ = '*';
> +                       }
> +
> +                       p += 2;
> +               } else {
> +                       // Copy other patterns as-is.
> +                       // Other consecutive asterisks are considered regular
> +                       // asterisks. fnmatch() already handles them like that.
> +                       *q++ = *p;
> +               }
> +       }
> +
> +       *q = '\0';
> +
> +       result = simple_match(path, modified_pattern);
> +
> +       free(modified_pattern);
> +
> +       return result;
> +}
> +
> +// Return true if the given path is ignored by git.
> +static bool is_ignored(const char *path, const char *name, bool is_dir)
> +{
> +       int i;
> +
> +       // Search the patterns in the reverse order because the last matching
> +       // pattern wins.
> +       for (i = nr_patterns - 1; i >= 0; i--) {
> +               struct pattern *p = patterns[i];
> +
> +               if (!is_dir && p->dir_only)
> +                       continue;
> +
> +               if (!p->path_match) {
> +                       // If the pattern has no slash at the beginning or
> +                       // middle, it matches against the basename. Most cases
> +                       // fall into this and work well with double asterisks.
> +                       if (!simple_match(name, p->glob))
> +                               continue;
> +               } else if (!p->double_asterisk) {
> +                       // Unless the pattern has double asterisks, it is still
> +                       // simple but matches against the path instead.
> +                       if (!simple_match(path, p->glob))
> +                               continue;
> +               } else {
> +                       // Double asterisks with a slash. Complex, but rare.
> +                       if (!double_asterisk_match(path, p->glob))
> +                               continue;
> +               }
> +
> +               debug("%s: matches %s%s%s\n", path, p->negate ? "!" : "",
> +                     p->glob, p->dir_only ? "/" : "");
> +
> +               return !p->negate;
> +       }
> +
> +       debug("%s: no match\n", path);
> +
> +       return false;
> +}
> +
> +// Return the length of the string excluding a trailing run of the given
> +// character that is not escaped by a backslash. Similar to strcspn() in libc.
> +static size_t strcspn_trailer(const char *str, char c)
> +{
> +       bool quoted = false;
> +       size_t len = strlen(str);
> +       size_t spn = len;
> +       const char *s;
> +
> +       for (s = str; *s; s++) {
> +               if (!quoted && *s == c) {
> +                       if (s - str < spn)
> +                               spn = s - str;
> +               } else {
> +                       spn = len;
> +
> +                       if (!quoted && *s == '\\')
> +                               quoted = true;
> +                       else
> +                               quoted = false;
> +               }
> +       }
> +
> +       return spn;
> +}
> +
> +// Add a gitignore pattern.
> +static void add_pattern(char *s, const char *dirpath)
> +{
> +       bool negate = false;
> +       bool dir_only = false;
> +       bool path_match = false;
> +       bool double_asterisk = false;
> +       char *e = s + strlen(s);
> +       struct pattern *p;
> +       size_t len;
> +
> +       // Skip comments
> +       if (*s == '#')
> +               return;
> +
> +       // Trailing spaces are ignored unless they are quoted with backslash.
> +       e = s + strcspn_trailer(s, ' ');
> +       *e = '\0';
> +
> +       // The prefix '!' negates the pattern
> +       if (*s == '!') {
> +               s++;
> +               negate = true;
> +       }
> +
> +       // If the pattern ends with unescaped slash(es), it matches only
> +       // directories.
> +       len = strcspn_trailer(s, '/');
> +       if (s + len < e) {
> +               dir_only = true;
> +               e = s + len;
> +               *e = '\0';
> +       }
> +
> +       // Skip if the line gets empty
> +       if (*s == '\0')
> +               return;
> +
> +       // Double asterisk is tricky. Mark it to handle it specially later.
> +       if (strstr(s, "**/") || strstr(s, "/**"))
> +               double_asterisk = true;
> +
> +       // If there is a slash at the beginning or middle, the pattern
> +       // is relative to the directory level of the .gitignore.
> +       if (strchr(s, '/')) {
> +               if (*s == '/')
> +                       s++;
> +               path_match = true;
> +       }
> +
> +       len = e - s;
> +
> +       // We need more room to store dirpath and '/'
> +       if (path_match)
> +               len += strlen(dirpath) + 1;
> +
> +       p = xmalloc(sizeof(*p) + len + 1);
> +       p->negate = negate;
> +       p->dir_only = dir_only;
> +       p->path_match = path_match;
> +       p->double_asterisk = double_asterisk;
> +       p->glob[0] = '\0';
> +
> +       if (path_match) {
> +               strcat(p->glob, dirpath);
> +               strcat(p->glob, "/");
> +       }
> +
> +       strcat(p->glob, s);
> +
> +       debug("Add pattern: %s%s%s\n", negate ? "!" : "", p->glob,
> +             dir_only ? "/" : "");
> +
> +       if (nr_patterns >= alloced_patterns) {
> +               alloced_patterns += 128;
> +               patterns = xrealloc(patterns,
> +                                   sizeof(*patterns) * alloced_patterns);
> +       }
> +
> +       patterns[nr_patterns++] = p;
> +}
> +
> +static void *load_gitignore(const char *dirpath)
> +{
> +       struct stat st;
> +       char path[PATH_MAX], *buf;
> +       int fd, ret;
> +
> +       ret = snprintf(path, sizeof(path), "%s/.gitignore", dirpath);
> +       if (ret >= sizeof(path))
> +               error_exit("%s: too long path was truncated\n", path);
> +
> +       // If .gitignore does not exist in this directory, open() fails.
> +       // It is ok, just skip it.
> +       fd = open(path, O_RDONLY);
> +       if (fd < 0)
> +               return NULL;
> +
> +       if (fstat(fd, &st) < 0)
> +               perror_exit(path);
> +
> +       buf = xmalloc(st.st_size + 1);
> +       if (read(fd, buf, st.st_size) != st.st_size)
> +               perror_exit(path);
> +
> +       buf[st.st_size] = '\0';
> +       if (close(fd))
> +               perror_exit(path);
> +
> +       return buf;
> +}
> +
> +// Parse '.gitignore' in the given directory.
> +static void parse_gitignore(const char *dirpath)
> +{
> +       char *buf, *s, *next;
> +
> +       buf = load_gitignore(dirpath);
> +       if (!buf)
> +               return;
> +
> +       debug("Parse %s/.gitignore\n", dirpath);
> +
> +       for (s = buf; *s; s = next) {
> +               next = s;
> +
> +               while (*next != '\0' && *next != '\n')
> +                       next++;
> +
> +               if (*next != '\0') {
> +                       *next = '\0';
> +                       next++;
> +               }
> +
> +               add_pattern(s, dirpath);
> +       }
> +
> +       free(buf);
> +}
> +
> +// Save the current number of patterns and increment the depth
> +static void increment_depth(void)
> +{
> +       if (depth >= max_depth) {
> +               max_depth += 1;
> +               nr_patterns_at = xrealloc(nr_patterns_at,
> +                                         sizeof(*nr_patterns_at) * max_depth);
> +       }
> +
> +       nr_patterns_at[depth] = nr_patterns;
> +       depth++;
> +}
> +
> +// Decrement the depth, and free up the patterns of this directory level.
> +static void decrement_depth(void)
> +{
> +       depth--;
> +       if (depth < 0)
> +               error_exit("BUG\n");
> +
> +       while (nr_patterns > nr_patterns_at[depth])
> +               free(patterns[--nr_patterns]);
> +}
> +
> +// If we find an ignored path, print it.
> +static void print_path(const char *path)
> +{
> +       // The path always starts with "./". If not, it is a bug.
> +       if (strlen(path) < 2)
> +               error_exit("BUG\n");
> +
> +       // Replace the root directory with the prefix you like.
> +       // This is useful for the tar command.
> +       fprintf(out_fp, "%s%s\n", prefix, path + 2);
> +}
> +
> +// Traverse the entire directory tree, parsing .gitignore files.
> +// Print file paths that are not tracked by git.
> +//
> +// Return true if all files under the directory are ignored, false otherwise.
> +static bool traverse_directory(const char *dirpath)
> +{
> +       bool all_ignored = true;
> +       DIR *dirp;
> +
> +       debug("Enter[%d]: %s\n", depth, dirpath);
> +       increment_depth();
> +
> +       // We do not know whether .gitignore exists in this directory or not.
> +       // Anyway, try to open it.
> +       parse_gitignore(dirpath);
> +
> +       dirp = opendir(dirpath);
> +       if (!dirp)
> +               perror_exit(dirpath);
> +
> +       while (1) {
> +               char path[PATH_MAX];
> +               struct dirent *d;
> +               int ret;
> +
> +               errno = 0;
> +               d = readdir(dirp);
> +               if (!d) {
> +                       // readdir() returns NULL on the end of the directory
> +                       // stream, and also on an error. To distinguish them,
> +                       // errno should be checked.
> +                       if (errno)
> +                               perror_exit(dirpath);
> +                       break;
> +               }
> +
> +               if (!strcmp(d->d_name, "..") || !strcmp(d->d_name, "."))
> +                       continue;
> +
> +               ret = snprintf(path, sizeof(path), "%s/%s", dirpath, d->d_name);
> +               if (ret >= sizeof(path))
> +                       error_exit("%s: too long path was truncated\n", path);
> +
> +               if (is_ignored(path, d->d_name, d->d_type & DT_DIR)) {
> +                       debug("Ignore: %s\n", path);
> +                       print_path(path);
> +               } else {
> +                       if ((d->d_type & DT_DIR) && !(d->d_type & DT_LNK)) {
> +                               if (!traverse_directory(path))
> +                                       all_ignored = false;
> +                       } else {
> +                               all_ignored = false;
> +                       }
> +               }
> +       }
> +
> +       if (closedir(dirp))
> +               perror_exit(dirpath);
> +
> +       // If all the files under this directory are ignored, let's ignore this
> +       // directory as well in order to avoid empty directories in the tarball.
> +       if (all_ignored) {
> +               debug("Ignore: %s (due to all files inside ignored)\n", dirpath);
> +               print_path(dirpath);
> +       }
> +
> +       decrement_depth();
> +       debug("Leave[%d]: %s\n", depth, dirpath);
> +
> +       return all_ignored;
> +}
> +
> +// Register hard-coded ignore patterns.
> +static void add_fixed_patterns(void)
> +{
> +       const char * const fixed_patterns[] = {
> +               ".git/",
> +       };
> +       int i;
> +
> +       for (i = 0; i < ARRAY_SIZE(fixed_patterns); i++) {
> +               char *s = xstrdup(fixed_patterns[i]);
> +
> +               add_pattern(s, ".");
> +               free(s);
> +       }
> +}
> +
> +static void usage(void)
> +{
> +       fprintf(stderr,
> +               "usage: %s [options]\n"
> +               "\n"
> +               "Print files that are not ignored by git\n"
> +               "\n"
> +               "options:\n"
> +               "  -d, --debug                   print debug messages to stderr\n"
> +               "  -e, --extra-pattern PATTERN   Add extra ignore patterns. This behaves like it is prepended to the top .gitignore\n"
> +               "  -h, --help                    show this help message and exit\n"
> +               "  -o, --output FILE             output to a file (default: '-', i.e. stdout)\n"
> +               "  -p, --prefix PREFIX           prefix added to each path (default: empty string)\n"
> +               "  -r, --rootdir DIR             root of the source tree (default: current working directory):\n",
> +               progname);
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +       const char *output = "-";
> +       const char *rootdir = ".";
> +
> +       progname = strrchr(argv[0], '/');
> +       if (progname)
> +               progname++;
> +       else
> +               progname = argv[0];
> +
> +       while (1) {
> +               static struct option long_options[] = {
> +                       {"debug",         no_argument,       NULL, 'd'},
> +                       {"extra-pattern", required_argument, NULL, 'e'},
> +                       {"help",          no_argument,       NULL, 'h'},
> +                       {"output",        required_argument, NULL, 'o'},
> +                       {"prefix",        required_argument, NULL, 'p'},
> +                       {"rootdir",       required_argument, NULL, 'r'},
> +                       {},
> +               };
> +
> +               int c = getopt_long(argc, argv, "de:ho:p:r:", long_options, NULL);
> +
> +               if (c == -1)
> +                       break;
> +
> +               switch (c) {
> +               case 'd':
> +                       debug_on = true;
> +                       break;
> +               case 'e':
> +                       add_pattern(optarg, ".");
> +                       break;
> +               case 'h':
> +                       usage();
> +                       exit(0);
> +               case 'o':
> +                       output = optarg;
> +                       break;
> +               case 'p':
> +                       prefix = optarg;
> +                       break;
> +               case 'r':
> +                       rootdir = optarg;
> +                       break;
> +               case '?':
> +                       usage();
> +                       /* fallthrough */
> +               default:
> +                       exit(EXIT_FAILURE);
> +               }
> +       }
> +
> +       if (chdir(rootdir))
> +               perror_exit(rootdir);
> +
> +       if (strcmp(output, "-")) {
> +               out_fp = fopen(output, "w");
> +               if (!out_fp)
> +                       perror_exit(output);
> +       } else {
> +               out_fp = stdout;
> +       }
> +
> +       add_fixed_patterns();
> +
> +       traverse_directory(".");
> +
> +       if (depth != 0)
> +               error_exit("BUG\n");
> +
> +       while (nr_patterns > 0)
> +               free(patterns[--nr_patterns]);
> +       free(patterns);
> +       free(nr_patterns_at);
> +
> +       fflush(out_fp);
> +       if (ferror(out_fp))
> +               error_exit("not all data was written to the output\n");
> +
> +       if (fclose(out_fp))
> +               perror_exit(output);
> +
> +       return 0;
> +}
> --
> 2.34.1
>


-- 
Best Regards
Masahiro Yamada


* Re: [PATCH v4 1/6] kbuild: add a tool to generate a list of files ignored by git
  2023-02-02  3:37 ` [PATCH v4 1/6] kbuild: add a tool to generate a list of files ignored by git Masahiro Yamada
  2023-02-02  3:49   ` Masahiro Yamada
@ 2023-02-02 11:02   ` Nicolas Schier
  2023-02-06  3:29     ` Masahiro Yamada
  1 sibling, 1 reply; 10+ messages in thread
From: Nicolas Schier @ 2023-02-02 11:02 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: linux-kbuild, linux-kernel, Nathan Chancellor, Nick Desaulniers,
	Ben Hutchings


On Thu, Feb 02, 2023 at 12:37:11PM +0900 Masahiro Yamada wrote:
> In short, the motivation of this commit is to build a source package
> without cleaning the source tree.
> 
> The deb-pkg and (src)rpm-pkg targets first run 'make clean' before
> creating a source tarball. Otherwise build artifacts such as *.o,
> *.a, etc. would be included in the tarball. Yet, the tarball ends up
> containing several garbage files since 'make clean' does not clean
> everything.
> 
> Cleaning the tree every time is annoying since it makes the incremental
> build impossible. It is desirable to create a source tarball without
> cleaning the tree.
> 
> In fact, there are some ways to achieve this.
> 
> The easiest way is 'git archive'. Actually, 'make perf-tar*-src-pkg'
> does this way, but I do not like it because it works only when the source
> tree is managed by git, and all files you want in the tarball must be
> committed in advance.
> 
> I want to make it work without relying on git. We can do this.
> 
> Files that are not tracked by git are generated files. We can list them
> out by parsing the .gitignore files. Of course, .gitignore does not cover
> all the cases, but it works well enough.
> 
> tar(1) claims to support it:
> 
>   --exclude-vcs-ignores
> 
>     Exclude files that match patterns read from VCS-specific ignore files.
>     Supported files are: .cvsignore, .gitignore, .bzrignore, and .hgignore.
> 
> The best scenario would be to use 'tar --exclude-vcs-ignores', but this
> option does not work. --exclude-vcs-ignores does not understand any of
> the negation (!), preceding slash, following slash, etc. So, this option
> is just useless.
> 
> Hence, I wrote this gitignore parser. The previous version [1], written
> in Python, was so slow. This version is implemented in C, so it works
> much faster.
> 
> This tool traverses the source tree, parsing the .gitignore files. It
> prints the file paths that are not tracked by git. The output can be
> used for tar's --exclude-from= option.
> 
> [How to test this tool]
> 
>   $ git clean -dfx
>   $ make -s -j$(nproc) defconfig all                       # or allmodconfig or whatever
>   $ git archive -o ../linux1.tar --prefix=./ HEAD
>   $ tar tf ../linux1.tar | LANG=C sort > ../file-list1     # files emitted by 'git archive'
>   $ make scripts_exclude
>     HOSTCC  scripts/gen-exclude
>   $ scripts/gen-exclude --prefix=./ -o ../exclude-list
>   $ tar cf ../linux2.tar --exclude-from=../exclude-list .
>   $ tar tf ../linux2.tar | LANG=C sort > ../file-list2     # files emitted by 'tar'
>   $ diff  ../file-list1 ../file-list2 | grep -E '^(<|>)'
>   < ./Documentation/devicetree/bindings/.yamllint
>   < ./drivers/clk/.kunitconfig
>   < ./drivers/gpu/drm/tests/.kunitconfig
>   < ./drivers/gpu/drm/vc4/tests/.kunitconfig
>   < ./drivers/hid/.kunitconfig
>   < ./fs/ext4/.kunitconfig
>   < ./fs/fat/.kunitconfig
>   < ./kernel/kcsan/.kunitconfig
>   < ./lib/kunit/.kunitconfig
>   < ./mm/kfence/.kunitconfig
>   < ./net/sunrpc/.kunitconfig
>   < ./tools/testing/selftests/arm64/tags/
>   < ./tools/testing/selftests/arm64/tags/.gitignore
>   < ./tools/testing/selftests/arm64/tags/Makefile
>   < ./tools/testing/selftests/arm64/tags/run_tags_test.sh
>   < ./tools/testing/selftests/arm64/tags/tags_test.c
>   < ./tools/testing/selftests/kvm/.gitignore
>   < ./tools/testing/selftests/kvm/Makefile
>   < ./tools/testing/selftests/kvm/config
>   < ./tools/testing/selftests/kvm/settings
> 
> The source tarball contains most of the files that are tracked by git. You
> see some diffs, but it is just because some .gitignore files are wrong.
> 
>   $ git ls-files -i -c --exclude-per-directory=.gitignore
>   Documentation/devicetree/bindings/.yamllint
>   drivers/clk/.kunitconfig
>   drivers/gpu/drm/tests/.kunitconfig
>   drivers/hid/.kunitconfig
>   fs/ext4/.kunitconfig
>   fs/fat/.kunitconfig
>   kernel/kcsan/.kunitconfig
>   lib/kunit/.kunitconfig
>   mm/kfence/.kunitconfig
>   tools/testing/selftests/arm64/tags/.gitignore
>   tools/testing/selftests/arm64/tags/Makefile
>   tools/testing/selftests/arm64/tags/run_tags_test.sh
>   tools/testing/selftests/arm64/tags/tags_test.c
>   tools/testing/selftests/kvm/.gitignore
>   tools/testing/selftests/kvm/Makefile
>   tools/testing/selftests/kvm/config
>   tools/testing/selftests/kvm/settings
> 
> [1]: https://lore.kernel.org/all/20230128173843.765212-1-masahiroy@kernel.org/
> 
> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
> ---
> 
> (no changes since v3)
> 
> Changes in v3:
>  - Various code refactoring: remove struct gitignore, remove next: label etc.
>  - Support --extra-pattern option
> 
> Changes in v2:
>  - Reimplement in C
> 
>  Makefile              |   4 +
>  scripts/.gitignore    |   1 +
>  scripts/Makefile      |   2 +-
>  scripts/gen-exclude.c | 623 ++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 629 insertions(+), 1 deletion(-)
>  create mode 100644 scripts/gen-exclude.c
> 
> diff --git a/Makefile b/Makefile
> index 2faf872b6808..35b294cc6f32 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1652,6 +1652,10 @@ distclean: mrproper
>  %pkg: include/config/kernel.release FORCE
>  	$(Q)$(MAKE) -f $(srctree)/scripts/Makefile.package $@
>  
> +PHONY += scripts_exclude
> +scripts_exclude: scripts_basic
> +	$(Q)$(MAKE) $(build)=scripts scripts/gen-exclude
> +
>  # Brief documentation of the typical targets used
>  # ---------------------------------------------------------------------------
>  
> diff --git a/scripts/.gitignore b/scripts/.gitignore
> index 6e9ce6720a05..7f433bc1461c 100644
> --- a/scripts/.gitignore
> +++ b/scripts/.gitignore
> @@ -1,5 +1,6 @@
>  # SPDX-License-Identifier: GPL-2.0-only
>  /asn1_compiler
> +/gen-exclude
>  /generate_rust_target
>  /insert-sys-cert
>  /kallsyms
> diff --git a/scripts/Makefile b/scripts/Makefile
> index 32b6ba722728..5dcd7f57607f 100644
> --- a/scripts/Makefile
> +++ b/scripts/Makefile
> @@ -38,7 +38,7 @@ HOSTCFLAGS_sorttable.o += -DMCOUNT_SORT_ENABLED
>  endif
>  
>  # The following programs are only built on demand
> -hostprogs += unifdef
> +hostprogs += gen-exclude unifdef
>  
>  # The module linker script is preprocessed on demand
>  targets += module.lds
> diff --git a/scripts/gen-exclude.c b/scripts/gen-exclude.c
> new file mode 100644
> index 000000000000..5c4ecd902290
> --- /dev/null
> +++ b/scripts/gen-exclude.c
> @@ -0,0 +1,623 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +//
> +// Traverse the source tree, parsing all .gitignore files, and print file paths
> +// that are not tracked by git.
> +// The output is suitable for the --exclude-from option of tar.
> +// This is useful until the --exclude-vcs-ignores option works correctly.
> +//
> +// Copyright (C) 2023 Masahiro Yamada <masahiroy@kernel.org>
> +
> +#include <dirent.h>
> +#include <errno.h>
> +#include <fcntl.h>
> +#include <fnmatch.h>
> +#include <getopt.h>
> +#include <stdarg.h>
> +#include <stdbool.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <sys/stat.h>
> +#include <sys/types.h>
> +#include <unistd.h>
> +
> +#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
> +
> +// struct pattern - represent an ignore pattern (a line in .gitignore)
> +// @negate:          negate the pattern (prefixing '!')
> +// @dir_only:        only matches directories (trailing '/')
> +// @path_match:      true if the glob pattern is a path instead of a file name
> +// @double_asterisk: true if the glob pattern contains double asterisks ('**')
> +// @glob:            glob pattern
> +struct pattern {
> +	bool negate;
> +	bool dir_only;
> +	bool path_match;
> +	bool double_asterisk;
> +	char glob[];
> +};
> +
> +struct pattern **patterns;

Is there a reason why patterns is not static?  (sparse complains about it.)

> +static int nr_patterns, alloced_patterns;
> +
> +// Remember the number of patterns at each directory level
> +static int *nr_patterns_at;
> +// Track the current/max directory level;
> +static int depth, max_depth;
> +static bool debug_on;
> +static FILE *out_fp;
> +static char *prefix = "";
> +static char *progname;
> +
> +static void __attribute__((noreturn)) perror_exit(const char *s)
> +{
> +	perror(s);
> +
> +	exit(EXIT_FAILURE);
> +}
> +
> +static void __attribute__((noreturn)) error_exit(const char *fmt, ...)
> +{
> +	va_list args;
> +
> +	fprintf(stderr, "%s: error: ", progname);
> +
> +	va_start(args, fmt);
> +	vfprintf(stderr, fmt, args);
> +	va_end(args);
> +
> +	exit(EXIT_FAILURE);
> +}
> +
> +static void debug(const char *fmt, ...)
> +{
> +	va_list args;
> +	int i;
> +
> +	if (!debug_on)
> +		return;
> +
> +	fprintf(stderr, "[DEBUG]");
> +
> +	for (i = 0; i < depth * 2; i++)
> +		fputc(' ', stderr);
> +
> +	va_start(args, fmt);
> +	vfprintf(stderr, fmt, args);
> +	va_end(args);
> +}
> +
> +static void *xrealloc(void *ptr, size_t size)
> +{
> +	ptr = realloc(ptr, size);
> +	if (!ptr)
> +		perror_exit(progname);
> +
> +	return ptr;
> +}
> +
> +static void *xmalloc(size_t size)
> +{
> +	return xrealloc(NULL, size);
> +}
> +
> +static char *xstrdup(const char *s)
> +{
> +	char *new = strdup(s);
> +
> +	if (!new)
> +		perror_exit(progname);
> +
> +	return new;
> +}
> +
> +static bool simple_match(const char *string, const char *pattern)
> +{
> +	return fnmatch(pattern, string, FNM_PATHNAME) == 0;
> +}
> +
> +// Handle double asterisks ("**") matching.
> +// FIXME:
> +//  This function does not work if double asterisks appear multiple times,
> +//  like "foo/**/bar/**/baz".
> +static bool double_asterisk_match(const char *path, const char *pattern)
> +{
> +	bool result = false;
> +	int slash_diff = 0;
> +	char *modified_pattern, *q;
> +	const char *p;
> +	size_t len;
> +
> +	for (p = path; *p; p++)
> +		if (*p == '/')
> +			slash_diff++;
> +
> +	for (p = pattern; *p; p++)
> +		if (*p == '/')
> +			slash_diff--;
> +
> +	len = strlen(pattern) + 1;
> +
> +	if (slash_diff > 0)
> +		len += slash_diff * 2;
> +	modified_pattern = xmalloc(len);
> +
> +	q = modified_pattern;
> +	for (p = pattern; *p; p++) {
> +		if (!strncmp(p, "**/", 3)) {
> +			// "**/" means zero or more sequences of "*/".
> +			// "foo**/bar" matches "foobar", "foo*/bar",
> +			// "foo*/*/bar", etc.
> +			while (slash_diff-- > 0) {
> +				*q++ = '*';
> +				*q++ = '/';
> +			}
> +
> +			if (slash_diff == 0) {
> +				*q++ = '*';
> +				*q++ = '/';
> +			}
> +
> +			if (slash_diff < 0)
> +				slash_diff++;
> +
> +			p += 2;
> +		} else if (!strcmp(p, "/**")) {
> +			// A trailing "/**" matches everything inside.

In v2 you also checked against "(*p + 3) == '\0'".  Is the explicit check
against end-of-string really not needed here?  (pattern = "whatever/**/*.tmp"?)
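
For reference, a small sketch of the difference between the two checks.  The
helper names are hypothetical and not part of the patch.  strcmp() compares
up to and including the terminating '\0', so it only succeeds when "/**" is
the whole remainder of the pattern, whereas strncmp() with n = 3 needs a
separate end-of-string test:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

// Hypothetical helper: true only if the remainder of the pattern is
// exactly "/**", i.e. the end-of-string check is implicit.
static bool is_trailing_double_asterisk(const char *p)
{
	assert(p);
	return strcmp(p, "/**") == 0;
}

// Hypothetical helper: true whenever the remainder merely starts with
// "/**". An explicit p[3] == '\0' test would be needed to get the same
// result as the function above.
static bool starts_with_double_asterisk(const char *p)
{
	assert(p);
	return strncmp(p, "/**", 3) == 0;
}
```

With pattern = "whatever/**/*.tmp", the remainder at the first slash is
"/**/*.tmp", so the strcmp() form does not treat it as a trailing "/**";
the '/' is copied as-is and the following "**/" is then picked up by the
first branch.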

> +			while (slash_diff-- >= 0) {
> +				*q++ = '/';
> +				*q++ = '*';
> +			}
> +
> +			p += 2;
> +		} else {
> +			// Copy other patterns as-is.
> +			// Other consecutive asterisks are considered regular
> +			// asterisks. fnmatch() already handles them like that.
> +			*q++ = *p;
> +		}
> +	}
> +
> +	*q = '\0';
> +
> +	result = simple_match(path, modified_pattern);
> +
> +	free(modified_pattern);
> +
> +	return result;
> +}
> +
> +// Return true if the given path is ignored by git.
> +static bool is_ignored(const char *path, const char *name, bool is_dir)
> +{
> +	int i;
> +
> +	// Search the patterns in the reverse order because the last matching
> +	// pattern wins.
> +	for (i = nr_patterns - 1; i >= 0; i--) {
> +		struct pattern *p = patterns[i];
> +
> +		if (!is_dir && p->dir_only)
> +			continue;
> +
> +		if (!p->path_match) {
> +			// If the pattern has no slash at the beginning or
> +			// middle, it matches against the basename. Most cases
> +			// fall into this and work well with double asterisks.
> +			if (!simple_match(name, p->glob))
> +				continue;
> +		} else if (!p->double_asterisk) {
> +			// Unless the pattern has double asterisks, it is still
> +			// simple but matches against the path instead.
> +			if (!simple_match(path, p->glob))
> +				continue;
> +		} else {
> +			// Double asterisks with a slash. Complex, but rare.
> +			if (!double_asterisk_match(path, p->glob))
> +				continue;
> +		}
> +
> +		debug("%s: matches %s%s%s\n", path, p->negate ? "!" : "",
> +		      p->glob, p->dir_only ? "/" : "");
> +
> +		return !p->negate;
> +	}
> +
> +	debug("%s: no match\n", path);
> +
> +	return false;
> +}
> +
> +// Return the length of the initial segment of the string that does not contain
> +// the unquoted sequence of the given character. Similar to strcspn() in libc.

I stumbled over that comment and it took me quite some time to match it to
strcspn_trailer()'s behaviour.  I expect it to strip all unescaped occurrences
of c at the end of str and return the resulting strlen.  After reading it
several times, I can see the match.  I _think_ the main confusion came from my
(quite imperfect) English:

  "one two  "
   ^^^         initial segment of string not containing unquoted c ??

   ^^^^^^^     substr that is considered by strcspn_trailer

But this is just about a comment and I'm sure I understand what is intended.
No action required.
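
To make the behaviour concrete, here is a standalone sketch of the function.
The logic is taken from the patch; the cast, the assert() and the condensed
'quoted' update are additions made only for this sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Return the length of str with the trailing run of unquoted 'c' removed.
// A character preceded by an unquoted backslash counts as quoted.
static size_t strcspn_trailer(const char *str, char c)
{
	bool quoted = false;
	size_t len;
	size_t spn;
	const char *s;

	assert(str);
	len = strlen(str);
	spn = len;

	for (s = str; *s; s++) {
		if (!quoted && *s == c) {
			// Remember where the current run of unquoted 'c' began.
			if ((size_t)(s - str) < spn)
				spn = s - str;
		} else {
			// Any other character ends the run, so start over.
			spn = len;
			quoted = !quoted && *s == '\\';
		}
	}

	return spn;
}
```

For "one two  " and c = ' ' this returns 7, the length of "one two".  The
space in the middle is found first, but the following 't' resets spn to len,
so only the run that reaches the end of the string is stripped.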

> +static size_t strcspn_trailer(const char *str, char c)
> +{
> +	bool quoted = false;
> +	size_t len = strlen(str);
> +	size_t spn = len;
> +	const char *s;
> +
> +	for (s = str; *s; s++) {
> +		if (!quoted && *s == c) {
> +			if (s - str < spn)
> +				spn = s - str;
> +		} else {
> +			spn = len;

Is this really intended?  Or 'spn = str - s + 1'?

> +
> +			if (!quoted && *s == '\\')
> +				quoted = true;
> +			else
> +				quoted = false;
> +		}
> +	}
> +
> +	return spn;
> +}
> +
> +// Add a gitignore pattern.
> +static void add_pattern(char *s, const char *dirpath)
> +{
> +	bool negate = false;
> +	bool dir_only = false;
> +	bool path_match = false;
> +	bool double_asterisk = false;
> +	char *e = s + strlen(s);
> +	struct pattern *p;
> +	size_t len;
> +
> +	// Skip comments
> +	if (*s == '#')
> +		return;
> +
> +	// Trailing spaces are ignored unless they are quoted with backslash.
> +	e = s + strcspn_trailer(s, ' ');
> +	*e = '\0';
> +
> +	// The prefix '!' negates the pattern
> +	if (*s == '!') {
> +		s++;
> +		negate = true;
> +	}
> +
> +	// If there is slash(es) that is not escaped at the end of the pattern,
> +	// it matches only directories.

Are escaped slashes allowed in file names in git?  I think use of original
strcspn() would have been enough.

> +	len = strcspn_trailer(s, '/');
> +	if (s + len < e) {
> +		dir_only = true;
> +		e = s + len;
> +		*e = '\0';
> +	}
> +
> +	// Skip if the line gets empty
> +	if (*s == '\0')
> +		return;
> +
> +	// Double asterisk is tricky. Mark it to handle it specially later.
> +	if (strstr(s, "**/") || strstr(s, "/**"))
> +		double_asterisk = true;
> +
> +	// If there is a slash at the beginning or middle, the pattern
> +	// is relative to the directory level of the .gitignore.
> +	if (strchr(s, '/')) {
> +		if (*s == '/')
> +			s++;
> +		path_match = true;
> +	}
> +
> +	len = e - s;
> +
> +	// We need more room to store dirpath and '/'
> +	if (path_match)
> +		len += strlen(dirpath) + 1;
> +
> +	p = xmalloc(sizeof(*p) + len + 1);
> +	p->negate = negate;
> +	p->dir_only = dir_only;
> +	p->path_match = path_match;
> +	p->double_asterisk = double_asterisk;
> +	p->glob[0] = '\0';

(bike-shedding)
  *p = (struct pattern) {
	.negate = negate,
	.dir_only = dir_only,
	.path_match = path_match,
	.double_asterisk = double_asterisk,
  };
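
A compilable sketch of this suggestion, assuming the assignment goes through
the pointer (*p = ...).  new_pattern() is a hypothetical helper, not something
in the patch.  Note that the flexible array member is not covered by the
compound literal, so glob[] still has to be written separately:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

// Same layout as struct pattern in the patch.
struct pattern {
	bool negate;
	bool dir_only;
	bool path_match;
	bool double_asterisk;
	char glob[];
};

// Hypothetical helper: allocate a pattern and set the flags with a
// compound literal. Returns NULL if malloc() fails.
static struct pattern *new_pattern(const char *glob, bool negate, bool dir_only)
{
	size_t len;
	struct pattern *p;

	assert(glob);
	len = strlen(glob);

	p = malloc(sizeof(*p) + len + 1);
	if (!p)
		return NULL;

	// Members not named in the initializer (path_match, double_asterisk)
	// are zeroed. The flexible array member is not part of the assignment.
	*p = (struct pattern) {
		.negate = negate,
		.dir_only = dir_only,
	};

	// glob[] must still be filled in explicitly.
	memcpy(p->glob, glob, len + 1);

	return p;
}
```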


> +
> +	if (path_match) {
> +		strcat(p->glob, dirpath);
> +		strcat(p->glob, "/");
> +	}
> +
> +	strcat(p->glob, s);
> +
> +	debug("Add pattern: %s%s%s\n", negate ? "!" : "", p->glob,
> +	      dir_only ? "/" : "");
> +
> +	if (nr_patterns >= alloced_patterns) {
> +		alloced_patterns += 128;
> +		patterns = xrealloc(patterns,
> +				    sizeof(*patterns) * alloced_patterns);
> +	}
> +
> +	patterns[nr_patterns++] = p;
> +}
> +
> +static void *load_gitignore(const char *dirpath)
> +{
> +	struct stat st;
> +	char path[PATH_MAX], *buf;
> +	int fd, ret;
> +
> +	ret = snprintf(path, sizeof(path), "%s/.gitignore", dirpath);
> +	if (ret >= sizeof(path))
> +		error_exit("%s: too long path was truncated\n", path);
> +
> +	// If .gitignore does not exist in this directory, open() fails.
> +	// It is ok, just skip it.
> +	fd = open(path, O_RDONLY);
> +	if (fd < 0)
> +		return NULL;

Why not check for errno == ENOENT (2)?  I assume no other errno value is
expected, but to me it feels a bit odd not to check it and exit loudly if
something (unlikely) like EMFILE causes open() to fail.
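
A minimal sketch of what such a check could look like.  open_optional() is a
hypothetical name, and the error handling is simplified compared to the
perror_exit() helper in the patch:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper: open a file that is allowed to be missing.
// Returns -1 if the file does not exist; exits on any other failure.
static int open_optional(const char *path)
{
	int fd;

	assert(path);

	fd = open(path, O_RDONLY);
	if (fd < 0) {
		// Only a missing file is tolerated; anything else
		// (EACCES, EMFILE, ...) is reported and is fatal.
		if (errno != ENOENT) {
			perror(path);
			exit(EXIT_FAILURE);
		}
		return -1;
	}

	return fd;
}
```

load_gitignore() could then return NULL only when this returns -1, so that
an unexpected failure does not silently look like "no .gitignore here".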

> +
> +	if (fstat(fd, &st) < 0)
> +		perror_exit(path);
> +
> +	buf = xmalloc(st.st_size + 1);
> +	if (read(fd, buf, st.st_size) != st.st_size)
> +		perror_exit(path);
> +
> +	buf[st.st_size] = '\0';
> +	if (close(fd))
> +		perror_exit(path);
> +
> +	return buf;
> +}
> +
> +// Parse '.gitignore' in the given directory.
> +static void parse_gitignore(const char *dirpath)
> +{
> +	char *buf, *s, *next;
> +
> +	buf = load_gitignore(dirpath);
> +	if (!buf)
> +		return;
> +
> +	debug("Parse %s/.gitignore\n", dirpath);
> +
> +	for (s = buf; *s; s = next) {
> +		next = s;
> +
> +		while (*next != '\0' && *next != '\n')

Not relevant for in-tree use: git does not complain about '\0' in a .gitignore
and still processes the rest of the file, while this loop stops at the first '\0'.

> +			next++;
> +
> +		if (*next != '\0') {
> +			*next = '\0';
> +			next++;
> +		}
> +
> +		add_pattern(s, dirpath);
> +	}
> +
> +	free(buf);
> +}
> +
> +// Save the current number of patterns and increment the depth
> +static void increment_depth(void)
> +{
> +	if (depth >= max_depth) {
> +		max_depth += 1;
> +		nr_patterns_at = xrealloc(nr_patterns_at,
> +					  sizeof(*nr_patterns_at) * max_depth);
> +	}
> +
> +	nr_patterns_at[depth] = nr_patterns;
> +	depth++;
> +}
> +
> +// Decrement the depth, and free up the patterns of this directory level.
> +static void decrement_depth(void)
> +{
> +	depth--;
> +	if (depth < 0)
> +		error_exit("BUG\n");
> +
> +	while (nr_patterns > nr_patterns_at[depth])
> +		free(patterns[--nr_patterns]);
> +}
> +
> +// If we find an ignored path, print it.
> +static void print_path(const char *path)
> +{
> +	// The path always starts with "./". If not, it is a bug.
> +	if (strlen(path) < 2)
> +		error_exit("BUG\n");
> +
> +	// Replace the root directory with the prefix you like.
> +	// This is useful for the tar command.
> +	fprintf(out_fp, "%s%s\n", prefix, path + 2);
> +}
> +
> +// Traverse the entire directory tree, parsing .gitignore files.
> +// Print file paths that are not tracked by git.
> +//
> +// Return true if all files under the directory are ignored, false otherwise.
> +static bool traverse_directory(const char *dirpath)
> +{
> +	bool all_ignored = true;
> +	DIR *dirp;
> +
> +	debug("Enter[%d]: %s\n", depth, dirpath);
> +	increment_depth();
> +
> +	// We do not know whether .gitignore exists in this directory or not.
> +	// Anyway, try to open it.
> +	parse_gitignore(dirpath);
> +
> +	dirp = opendir(dirpath);
> +	if (!dirp)
> +		perror_exit(dirpath);
> +
> +	while (1) {
> +		char path[PATH_MAX];
> +		struct dirent *d;
> +		int ret;
> +
> +		errno = 0;
> +		d = readdir(dirp);
> +		if (!d) {
> +			// readdir() returns NULL on the end of the directory
> +			// stream, and also on an error. To distinguish them,
> +			// errno should be checked.
> +			if (errno)
> +				perror_exit(dirpath);
> +			break;
> +		}
> +
> +		if (!strcmp(d->d_name, "..") || !strcmp(d->d_name, "."))
> +			continue;
> +
> +		ret = snprintf(path, sizeof(path), "%s/%s", dirpath, d->d_name);
> +		if (ret >= sizeof(path))
> +			error_exit("%s: too long path was truncated\n", path);
> +
> +		if (is_ignored(path, d->d_name, d->d_type & DT_DIR)) {
> +			debug("Ignore: %s\n", path);
> +			print_path(path);
> +		} else {
> +			if ((d->d_type & DT_DIR) && !(d->d_type & DT_LNK)) {
> +				if (!traverse_directory(path))
> +					all_ignored = false;
> +			} else {
> +				all_ignored = false;
> +			}
> +		}
> +	}
> +
> +	if (closedir(dirp))
> +		perror_exit(dirpath);
> +
> +	// If all the files under this directory are ignored, let's ignore this
> +	// directory as well in order to avoid empty directories in the tarball.
> +	if (all_ignored) {
> +		debug("Ignore: %s (due to all files inside ignored)\n", dirpath);
> +		print_path(dirpath);
> +	}
> +
> +	decrement_depth();
> +	debug("Leave[%d]: %s\n", depth, dirpath);
> +
> +	return all_ignored;
> +}
> +
> +// Register hard-coded ignore patterns.
> +static void add_fixed_patterns(void)
> +{
> +	const char * const fixed_patterns[] = {
> +		".git/",
> +	};
> +	int i;
> +
> +	for (i = 0; i < ARRAY_SIZE(fixed_patterns); i++) {
> +		char *s = xstrdup(fixed_patterns[i]);
> +
> +		add_pattern(s, ".");
> +		free(s);
> +	}
> +}
> +
> +static void usage(void)
> +{
> +	fprintf(stderr,
> +		"usage: %s [options]\n"
> +		"\n"
> +		"Print files that are not ignored by git\n"
> +		"\n"
> +		"options:\n"
> +		"  -d, --debug                   print debug messages to stderr\n"
> +		"  -e, --extra-pattern PATTERN   Add extra ignore patterns. This behaves like it is prepended to the top .gitignore\n"
> +		"  -h, --help                    show this help message and exit\n"
> +		"  -o, --output FILE             output to a file (default: '-', i.e. stdout)\n"
> +		"  -p, --prefix PREFIX           prefix added to each path (default: empty string)\n"
> +		"  -r, --rootdir DIR             root of the source tree (default: current working directory):\n",
> +		progname);
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +	const char *output = "-";
> +	const char *rootdir = ".";
> +
> +	progname = strrchr(argv[0], '/');
> +	if (progname)
> +		progname++;
> +	else
> +		progname = argv[0];
> +
> +	while (1) {
> +		static struct option long_options[] = {
> +			{"debug",         no_argument,       NULL, 'd'},
> +			{"extra-pattern", required_argument, NULL, 'e'},
> +			{"help",          no_argument,       NULL, 'h'},
> +			{"output",        required_argument, NULL, 'o'},
> +			{"prefix",        required_argument, NULL, 'p'},
> +			{"rootdir",       required_argument, NULL, 'r'},
> +			{},
> +		};
> +
> +		int c = getopt_long(argc, argv, "de:ho:p:r:", long_options, NULL);
> +
> +		if (c == -1)
> +			break;
> +
> +		switch (c) {
> +		case 'd':
> +			debug_on = true;
> +			break;
> +		case 'e':
> +			add_pattern(optarg, ".");
> +			break;
> +		case 'h':
> +			usage();
> +			exit(0);
> +		case 'o':
> +			output = optarg;
> +			break;
> +		case 'p':
> +			prefix = optarg;
> +			break;
> +		case 'r':
> +			rootdir = optarg;
> +			break;
> +		case '?':
> +			usage();
> +			/* fallthrough */
> +		default:
> +			exit(EXIT_FAILURE);
> +		}
> +	}
> +
> +	if (chdir(rootdir))
> +		perror_exit(rootdir);
> +
> +	if (strcmp(output, "-")) {
> +		out_fp = fopen(output, "w");
> +		if (!out_fp)
> +			perror_exit(output);
> +	} else {
> +		out_fp = stdout;
> +	}
> +
> +	add_fixed_patterns();
> +
> +	traverse_directory(".");
> +
> +	if (depth != 0)
> +		error_exit("BUG\n");
> +
> +	while (nr_patterns > 0)
> +		free(patterns[--nr_patterns]);
> +	free(patterns);
> +	free(nr_patterns_at);
> +
> +	fflush(out_fp);
> +	if (ferror(out_fp))
> +		error_exit("not all data was written to the output\n");
> +
> +	if (fclose(out_fp))
> +		perror_exit(output);
> +
> +	return 0;
> +}
> -- 
> 2.34.1

I like the idea of gen-exclude.

Testing with some unusual patterns reveals a few gaps.  This should not be
a problem in practice, as nobody wants to write such .gitignore patterns,
but for completeness:

  $ mkdir -p test/foo/bar
  $ touch test/foo/bar/baz.tmp
  $ cat <<-eof >test/.gitignore
  **/*.tmp
  **/baz.tmp
  foo/**/*.tmp
  **/bar/baz.tmp
  /**/*.tmp
  eof
  $ cd test
  $ ../scripts/gen-exclude --debug
  [DEBUG]Add pattern: .git/
  [DEBUG]Enter[0]: .
  [DEBUG]  ./test: no match
  [DEBUG]  Enter[1]: ./test
  [DEBUG]    Parse ./test/.gitignore
  [DEBUG]    Add pattern: ./test/**/*.tmp
  [DEBUG]    Add pattern: ./test/**/baz.tmp
  [DEBUG]    Add pattern: ./test/foo/**/*.tmp
  [DEBUG]    Add pattern: ./test/**/bar/baz.tmp
  [DEBUG]    Add pattern: ./test/**/*.tmp
  [DEBUG]    ./test/.gitignore: no match
  [DEBUG]    ./test/foo: no match
  [DEBUG]    Enter[2]: ./test/foo
  [DEBUG]      ./test/foo/bar: no match
  [DEBUG]      Enter[3]: ./test/foo/bar
  [DEBUG]        ./test/foo/bar/baz.tmp: no match
  [DEBUG]      Leave[3]: ./test/foo/bar
  [DEBUG]    Leave[2]: ./test/foo
  [DEBUG]  Leave[1]: ./test
  [DEBUG]Leave[0]: .

Thus, no match.  Everything else I tested did what I expected.
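
A hand trace of double_asterisk_match() suggests a possible cause.  This is
a reading of the code, not something the debug output confirms.  The path
"./test/foo/bar/baz.tmp" has one more slash than the pattern
"./test/**/*.tmp", so "**/" appears to be expanded to a single "*/", which
covers only one directory level where two are needed.  The sketch below
checks just the fnmatch() part of that reasoning; count_slashes() is a
hypothetical helper:

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>

// Same call as simple_match() in the patch, plus an assert() for the sketch.
static bool simple_match(const char *string, const char *pattern)
{
	assert(string && pattern);
	return fnmatch(pattern, string, FNM_PATHNAME) == 0;
}

// Hypothetical helper: count '/' characters, the same way the two loops at
// the top of double_asterisk_match() do.
static int count_slashes(const char *s)
{
	int n = 0;

	assert(s);
	for (; *s; s++)
		if (*s == '/')
			n++;

	return n;
}
```

With FNM_PATHNAME, '*' never matches a '/', so "./test/*/*.tmp" cannot match
a file two levels below ./test, while "./test/*/*/*.tmp" can.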

Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
Tested-by: Nicolas Schier <nicolas@fjasle.eu>

Kind regards,
Nicolas



* Re: [PATCH v4 1/6] kbuild: add a tool to generate a list of files ignored by git
  2023-02-02 11:02   ` Nicolas Schier
@ 2023-02-06  3:29     ` Masahiro Yamada
  0 siblings, 0 replies; 10+ messages in thread
From: Masahiro Yamada @ 2023-02-06  3:29 UTC (permalink / raw)
  To: Nicolas Schier
  Cc: linux-kbuild, linux-kernel, Nathan Chancellor, Nick Desaulniers,
	Ben Hutchings

On Thu, Feb 2, 2023 at 8:08 PM Nicolas Schier <nicolas@fjasle.eu> wrote:
>
> On Thu, Feb 02, 2023 at 12:37:11PM +0900 Masahiro Yamada wrote:
> > In short, the motivation of this commit is to build a source package
> > without cleaning the source tree.
> >
> > The deb-pkg and (src)rpm-pkg targets first run 'make clean' before
> > creating a source tarball. Otherwise build artifacts such as *.o,
> > *.a, etc. would be included in the tarball. Yet, the tarball ends up
> > containing several garbage files since 'make clean' does not clean
> > everything.
> >
> > Cleaning the tree every time is annoying since it makes the incremental
> > build impossible. It is desirable to create a source tarball without
> > cleaning the tree.
> >
> > In fact, there are some ways to achieve this.
> >
> > The easiest way is 'git archive'. Actually, 'make perf-tar*-src-pkg'
> > does this way, but I do not like it because it works only when the source
> > tree is managed by git, and all files you want in the tarball must be
> > committed in advance.
> >
> > I want to make it work without relying on git. We can do this.
> >
> > Files that are not tracked by git are generated files. We can list them
> > out by parsing the .gitignore files. Of course, .gitignore does not cover
> > all the cases, but it works well enough.
> >
> > tar(1) claims to support it:
> >
> >   --exclude-vcs-ignores
> >
> >     Exclude files that match patterns read from VCS-specific ignore files.
> >     Supported files are: .cvsignore, .gitignore, .bzrignore, and .hgignore.
> >
> > The best scenario would be to use 'tar --exclude-vcs-ignores', but this
> > option does not work. --exclude-vcs-ignores does not understand any of
> > the negation (!), preceding slash, following slash, etc. So, this option
> > is just useless.
> >
> > Hence, I wrote this gitignore parser. The previous version [1], written
> > in Python, was so slow. This version is implemented in C, so it works
> > much faster.
> >
> > This tool traverses the source tree, parsing the .gitignore files. It
> > prints the file paths that are not tracked by git. The output can be
> > used for tar's --exclude-from= option.
> >
> > [How to test this tool]
> >
> >   $ git clean -dfx
> >   $ make -s -j$(nproc) defconfig all                       # or allmodconfig or whatever
> >   $ git archive -o ../linux1.tar --prefix=./ HEAD
> >   $ tar tf ../linux1.tar | LANG=C sort > ../file-list1     # files emitted by 'git archive'
> >   $ make scripts_exclude
> >     HOSTCC  scripts/gen-exclude
> >   $ scripts/gen-exclude --prefix=./ -o ../exclude-list
> >   $ tar cf ../linux2.tar --exclude-from=../exclude-list .
> >   $ tar tf ../linux2.tar | LANG=C sort > ../file-list2     # files emitted by 'tar'
> >   $ diff  ../file-list1 ../file-list2 | grep -E '^(<|>)'
> >   < ./Documentation/devicetree/bindings/.yamllint
> >   < ./drivers/clk/.kunitconfig
> >   < ./drivers/gpu/drm/tests/.kunitconfig
> >   < ./drivers/gpu/drm/vc4/tests/.kunitconfig
> >   < ./drivers/hid/.kunitconfig
> >   < ./fs/ext4/.kunitconfig
> >   < ./fs/fat/.kunitconfig
> >   < ./kernel/kcsan/.kunitconfig
> >   < ./lib/kunit/.kunitconfig
> >   < ./mm/kfence/.kunitconfig
> >   < ./net/sunrpc/.kunitconfig
> >   < ./tools/testing/selftests/arm64/tags/
> >   < ./tools/testing/selftests/arm64/tags/.gitignore
> >   < ./tools/testing/selftests/arm64/tags/Makefile
> >   < ./tools/testing/selftests/arm64/tags/run_tags_test.sh
> >   < ./tools/testing/selftests/arm64/tags/tags_test.c
> >   < ./tools/testing/selftests/kvm/.gitignore
> >   < ./tools/testing/selftests/kvm/Makefile
> >   < ./tools/testing/selftests/kvm/config
> >   < ./tools/testing/selftests/kvm/settings
> >
> > The source tarball contains most of the files that are tracked by git. You
> > see some diffs, but only because some .gitignore files are wrong.
> >
> >   $ git ls-files -i -c --exclude-per-directory=.gitignore
> >   Documentation/devicetree/bindings/.yamllint
> >   drivers/clk/.kunitconfig
> >   drivers/gpu/drm/tests/.kunitconfig
> >   drivers/hid/.kunitconfig
> >   fs/ext4/.kunitconfig
> >   fs/fat/.kunitconfig
> >   kernel/kcsan/.kunitconfig
> >   lib/kunit/.kunitconfig
> >   mm/kfence/.kunitconfig
> >   tools/testing/selftests/arm64/tags/.gitignore
> >   tools/testing/selftests/arm64/tags/Makefile
> >   tools/testing/selftests/arm64/tags/run_tags_test.sh
> >   tools/testing/selftests/arm64/tags/tags_test.c
> >   tools/testing/selftests/kvm/.gitignore
> >   tools/testing/selftests/kvm/Makefile
> >   tools/testing/selftests/kvm/config
> >   tools/testing/selftests/kvm/settings
> >
> > [1]: https://lore.kernel.org/all/20230128173843.765212-1-masahiroy@kernel.org/
> >
> > Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
> > ---
> >
> > (no changes since v3)
> >
> > Changes in v3:
> >  - Various code refactoring: remove struct gitignore, remove next: label etc.
> >  - Support --extra-pattern option
> >
> > Changes in v2:
> >  - Reimplement in C
> >
> >  Makefile              |   4 +
> >  scripts/.gitignore    |   1 +
> >  scripts/Makefile      |   2 +-
> >  scripts/gen-exclude.c | 623 ++++++++++++++++++++++++++++++++++++++++++
> >  4 files changed, 629 insertions(+), 1 deletion(-)
> >  create mode 100644 scripts/gen-exclude.c
> >
> > diff --git a/Makefile b/Makefile
> > index 2faf872b6808..35b294cc6f32 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -1652,6 +1652,10 @@ distclean: mrproper
> >  %pkg: include/config/kernel.release FORCE
> >       $(Q)$(MAKE) -f $(srctree)/scripts/Makefile.package $@
> >
> > +PHONY += scripts_exclude
> > +scripts_exclude: scripts_basic
> > +     $(Q)$(MAKE) $(build)=scripts scripts/gen-exclude
> > +
> >  # Brief documentation of the typical targets used
> >  # ---------------------------------------------------------------------------
> >
> > diff --git a/scripts/.gitignore b/scripts/.gitignore
> > index 6e9ce6720a05..7f433bc1461c 100644
> > --- a/scripts/.gitignore
> > +++ b/scripts/.gitignore
> > @@ -1,5 +1,6 @@
> >  # SPDX-License-Identifier: GPL-2.0-only
> >  /asn1_compiler
> > +/gen-exclude
> >  /generate_rust_target
> >  /insert-sys-cert
> >  /kallsyms
> > diff --git a/scripts/Makefile b/scripts/Makefile
> > index 32b6ba722728..5dcd7f57607f 100644
> > --- a/scripts/Makefile
> > +++ b/scripts/Makefile
> > @@ -38,7 +38,7 @@ HOSTCFLAGS_sorttable.o += -DMCOUNT_SORT_ENABLED
> >  endif
> >
> >  # The following programs are only built on demand
> > -hostprogs += unifdef
> > +hostprogs += gen-exclude unifdef
> >
> >  # The module linker script is preprocessed on demand
> >  targets += module.lds
> > diff --git a/scripts/gen-exclude.c b/scripts/gen-exclude.c
> > new file mode 100644
> > index 000000000000..5c4ecd902290
> > --- /dev/null
> > +++ b/scripts/gen-exclude.c
> > @@ -0,0 +1,623 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +//
> > +// Traverse the source tree, parsing all .gitignore files, and print file paths
> > +// that are not tracked by git.
> > +// The output is suitable to the --exclude-from option of tar.
> > +// This is useful until the --exclude-vcs-ignores option gets working correctly.
> > +//
> > +// Copyright (C) 2023 Masahiro Yamada <masahiroy@kernel.org>
> > +
> > +#include <dirent.h>
> > +#include <errno.h>
> > +#include <fcntl.h>
> > +#include <fnmatch.h>
> > +#include <getopt.h>
> > +#include <stdarg.h>
> > +#include <stdbool.h>
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +#include <sys/stat.h>
> > +#include <sys/types.h>
> > +#include <unistd.h>
> > +
> > +#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
> > +
> > +// struct pattern - represent an ignore pattern (a line in .gitignore)
> > +// @negate:          negate the pattern (prefixing '!')
> > +// @dir_only:        only matches directories (trailing '/')
> > +// @path_match:      true if the glob pattern is a path instead of a file name
> > +// @double_asterisk: true if the glob pattern contains double asterisks ('**')
> > +// @glob:            glob pattern
> > +struct pattern {
> > +     bool negate;
> > +     bool dir_only;
> > +     bool path_match;
> > +     bool double_asterisk;
> > +     char glob[];
> > +};
> > +
> > +struct pattern **patterns;
>
> Is there a reason, why patterns is not static?  (sparse asked)


No reason - I just forgot to run sparse.
Thanks for catching it.






> > +     q = modified_pattern;
> > +     for (p = pattern; *p; p++) {
> > +             if (!strncmp(p, "**/", 3)) {
> > +                     // "**/" means zero or more sequences of "*/".
> > +                     // "foo**/bar" matches "foobar", "foo*/bar",
> > +                     // "foo*/*/bar", etc.
> > +                     while (slash_diff-- > 0) {
> > +                             *q++ = '*';
> > +                             *q++ = '/';
> > +                     }
> > +
> > +                     if (slash_diff == 0) {
> > +                             *q++ = '*';
> > +                             *q++ = '/';
> > +                     }
> > +
> > +                     if (slash_diff < 0)
> > +                             slash_diff++;
> > +
> > +                     p += 2;
> > +             } else if (!strcmp(p, "/**")) {
> > +                     // A trailing "/**" matches everything inside.
>
> In v2 you also checked against "(*p + 3) == '\0'".  Is the explicit check
> against end-of-string really not needed here?  (pattern = "whatever/**/*.tmp"?)


This detects a trailing "/**".


See this documentation:
https://github.com/git/git/blob/v2.39.1/Documentation/gitignore.txt#L123



"whatever/**/*.tmp" is detected by the previous
     if (!strncmp(p, "**/", 3))


strcmp(p, "/**") only matches the pattern at the end,
while strncmp(p, "**/", 3) matches the pattern anywhere.
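
A minimal sketch of the difference between the two checks, written as two
standalone helpers. The helper names are hypothetical and do not appear in
the patch; they only reproduce the two comparisons from the loop above.

```c
#include <stdbool.h>
#include <string.h>

// Hypothetical helper: true only if the pattern ends with "/**".
// strcmp() compares up to the terminating NUL, so it can succeed
// only when p points at the last three characters.
static bool has_trailing_double_asterisk(const char *pattern)
{
	const char *p;

	for (p = pattern; *p; p++)
		if (!strcmp(p, "/**"))
			return true;
	return false;
}

// Hypothetical helper: true if "**/" appears anywhere in the pattern.
// strncmp() looks at three characters only, so what follows is ignored.
static bool contains_double_asterisk_slash(const char *pattern)
{
	const char *p;

	for (p = pattern; *p; p++)
		if (!strncmp(p, "**/", 3))
			return true;
	return false;
}
```

With "whatever/**/*.tmp", only the second helper reports a hit, which is
why the explicit end-of-string check is not needed.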


Anyway, I will throw away this code in v5.






> > +}
> > +
> > +// Return the length of the initial segment of the string that does not contain
> > +// the unquoted sequence of the given character. Similar to strcspn() in libc.
>
> I struggled across that comment and it took me quite some time to match it to
> strcspn_trailers() behaviour.  I expect it to strip all unescaped occurrences
> of c at the end of str and return the resulting strlen.  After reading it
> several times, I can get a match.  I _think_ main confusion came from my (quite
> imperfect) English:
>
>   "one two  "
>    ^^^         initial segment of string not containing unquoted c ??
>
>    ^^^^^^^     substr that is considered by strcspn_trailer
>
> But this is just about a comment and I'm sure I understand what is intended.
> No action required.


I am not good at English.

Indeed, this comment is really confusing.

Something like the following would have been clearer.

// This function strips the unescaped sequence of the given char from the end
// of the string, and returns the length of the resulting substring.





>
> > +static size_t strcspn_trailer(const char *str, char c)
> > +{
> > +     bool quoted = false;
> > +     size_t len = strlen(str);
> > +     size_t spn = len;
> > +     const char *s;
> > +
> > +     for (s = str; *s; s++) {
> > +             if (!quoted && *s == c) {
> > +                     if (s - str < spn)
> > +                             spn = s - str;
> > +             } else {
> > +                     spn = len;
>
> Is this really intended?  Or 'spn = str - s + 1'?


I think you meant, 'spn = s - str + 1'

My code works, but I think yours is cleaner
because it does not require 'len'.
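
For illustration, a sketch of the function with 'spn = s - str + 1' and
without 'len'. The name strip_trailer_len() is hypothetical, and this is
only my reading of the suggestion, not the code that will go into v5.

```c
#include <stdbool.h>
#include <stddef.h>

// Strip the unescaped run of 'c' from the end of the string, and return
// the length of the resulting substring. The string is not modified.
static size_t strip_trailer_len(const char *str, char c)
{
	bool quoted = false;
	size_t spn = 0;
	const char *s;

	for (s = str; *s; s++) {
		// An unescaped 'c' may belong to the trailer; leave spn alone.
		if (!quoted && *s == c)
			continue;

		// Any other character must be kept, so extend the span past it.
		spn = (size_t)(s - str) + 1;

		// A backslash escapes the next character only.
		quoted = !quoted && *s == '\\';
	}

	return spn;
}
```

For "one two  " and ' ' this returns 7, and a space escaped by a backslash
at the end of the string is kept.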




BTW, I read the source code of GIT.

GIT's implementation is here:
https://github.com/git/git/blob/v2.39.1/dir.c#L934






>
> > +
> > +                     if (!quoted && *s == '\\')
> > +                             quoted = true;
> > +                     else
> > +                             quoted = false;
> > +             }
> > +     }
> > +
> > +     return spn;
> > +}
> > +
> > +// Add a gitignore pattern.
> > +static void add_pattern(char *s, const char *dirpath)
> > +{
> > +     bool negate = false;
> > +     bool dir_only = false;
> > +     bool path_match = false;
> > +     bool double_asterisk = false;
> > +     char *e = s + strlen(s);
> > +     struct pattern *p;
> > +     size_t len;
> > +
> > +     // Skip comments
> > +     if (*s == '#')
> > +             return;
> > +
> > +     // Trailing spaces are ignored unless they are quoted with backslash.
> > +     e = s + strcspn_trailer(s, ' ');
> > +     *e = '\0';
> > +
> > +     // The prefix '!' negates the pattern
> > +     if (*s == '!') {
> > +             s++;
> > +             negate = true;
> > +     }
> > +
> > +     // If there is slash(es) that is not escaped at the end of the pattern,
> > +     // it matches only directories.
>
> Are escaped slashes allowed in file names in git?  I think use of original
> strcspn() would have been enough.


Perhaps, I had some reason to implement it like this, but
I cannot recall it.



Anyway, GIT's implementation is very simple:

https://github.com/git/git/blob/v2.39.1/dir.c#L634

I will follow that.





>
> > +
> > +     if (path_match) {
> > +             strcat(p->glob, dirpath);
> > +             strcat(p->glob, "/");
> > +     }
> > +
> > +     strcat(p->glob, s);
> > +
> > +     debug("Add pattern: %s%s%s\n", negate ? "!" : "", p->glob,
> > +           dir_only ? "/" : "");
> > +
> > +     if (nr_patterns >= alloced_patterns) {
> > +             alloced_patterns += 128;
> > +             patterns = xrealloc(patterns,
> > +                                 sizeof(*patterns) * alloced_patterns);
> > +     }
> > +
> > +     patterns[nr_patterns++] = p;
> > +}
> > +
> > +static void *load_gitignore(const char *dirpath)
> > +{
> > +     struct stat st;
> > +     char path[PATH_MAX], *buf;
> > +     int fd, ret;
> > +
> > +     ret = snprintf(path, sizeof(path), "%s/.gitignore", dirpath);
> > +     if (ret >= sizeof(path))
> > +             error_exit("%s: too long path was truncated\n", path);
> > +
> > +     // If .gitignore does not exist in this directory, open() fails.
> > +     // It is ok, just skip it.
> > +     fd = open(path, O_RDONLY);
> > +     if (fd < 0)
> > +             return NULL;
>
> Why don't you check against errno == 2 (ENOENT)?  I assume, no other
> errno value is expected, but for me it feels a bit odd to not check it
> and exit loudly if something (unlikely) like EMFILE causes open() to
> fail.


Good suggestion.

I will fix it.

GIT also checks this:

https://github.com/git/git/blob/v2.39.1/wrapper.c#L399
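
A rough sketch of what the fix could look like, assuming only ENOENT is
treated as "no .gitignore here". The helper name open_optional() is
hypothetical, and perror()/exit() stand in for the tool's own
perror_exit().

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

// Open a file that is allowed to be missing.
// Returns the file descriptor, or -1 if the file does not exist.
// Any other open() failure (EMFILE, EACCES, ...) is fatal.
static int open_optional(const char *path)
{
	int fd = open(path, O_RDONLY);

	if (fd < 0 && errno != ENOENT) {
		perror(path);
		exit(EXIT_FAILURE);
	}

	return fd;
}
```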


>
> > +
> > +     if (fstat(fd, &st) < 0)
> > +             perror_exit(path);
> > +
> > +     buf = xmalloc(st.st_size + 1);
> > +     if (read(fd, buf, st.st_size) != st.st_size)
> > +             perror_exit(path);
> > +
> > +     buf[st.st_size] = '\0';
> > +     if (close(fd))
> > +             perror_exit(path);
> > +
> > +     return buf;
> > +}
> > +
> > +// Parse '.gitignore' in the given directory.
> > +static void parse_gitignore(const char *dirpath)
> > +{
> > +     char *buf, *s, *next;
> > +
> > +     buf = load_gitignore(dirpath);
> > +     if (!buf)
> > +             return;
> > +
> > +     debug("Parse %s/.gitignore\n", dirpath);
> > +
> > +     for (s = buf; *s; s = next) {
> > +             next = s;
> > +
> > +             while (*next != '\0' && *next != '\n')
>
> Not relevant for in-tree use: git does not complain about '\0' in a .gitignore
> but also handles the remaining part of the file.
>


You are right.

I confirmed it from the source code:
https://github.com/git/git/blob/v2.39.1/dir.c#L1141


I will follow that.
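
One way to do it is to walk the buffer by its size instead of stopping at
the first '\0'. A minimal sketch, with a hypothetical helper split_lines()
that is not part of the patch. It assumes the caller allocated size + 1
bytes and stored '\0' at buf[size], as load_gitignore() does, so that a
last line without a newline is still terminated.

```c
#include <stddef.h>
#include <string.h>

// Split buf[0..size) into lines in place, replacing each '\n' with '\0'.
// Stores the start of each line in lines[] (at most max entries) and
// returns the number of lines found. An embedded '\0' does not end the
// scan; it only cuts short the line that contains it.
static size_t split_lines(char *buf, size_t size, char **lines, size_t max)
{
	char *p = buf;
	char *end = buf + size;
	size_t n = 0;

	while (p < end && n < max) {
		char *nl = memchr(p, '\n', (size_t)(end - p));

		lines[n++] = p;
		if (!nl)
			break;	// last line has no newline

		*nl = '\0';
		p = nl + 1;
	}

	return n;
}
```

With the bytes "a\0b\nc\n", the loop in v4 stops after "a", while this
sketch still reaches the line "c".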





>
> Testing with some strange patterns seems to reveal some missing points.  It
> should not be problematic, as nobody wants to write such .gitignore patterns,
> but for completeness:
>
>   $ mkdir -p test/foo/bar
>   $ touch test/foo/bar/baz.tmp
>   $ cat <<-eof >test/.gitignore
>   **/*.tmp
>   **/baz.tmp
>   foo/**/*.tmp
>   **/bar/baz.tmp
>   /**/*.tmp
>   eof
>   $ cd test
>   $ ../scripts/gen-exclude --debug
>   [DEBUG]Add pattern: .git/
>   [DEBUG]Enter[0]: .
>   [DEBUG]  ./test: no match
>   [DEBUG]  Enter[1]: ./test
>   [DEBUG]    Parse ./test/.gitignore
>   [DEBUG]    Add pattern: ./test/**/*.tmp
>   [DEBUG]    Add pattern: ./test/**/baz.tmp
>   [DEBUG]    Add pattern: ./test/foo/**/*.tmp
>   [DEBUG]    Add pattern: ./test/**/bar/baz.tmp
>   [DEBUG]    Add pattern: ./test/**/*.tmp
>   [DEBUG]    ./test/.gitignore: no match
>   [DEBUG]    ./test/foo: no match
>   [DEBUG]    Enter[2]: ./test/foo
>   [DEBUG]      ./test/foo/bar: no match
>   [DEBUG]      Enter[3]: ./test/foo/bar
>   [DEBUG]        ./test/foo/bar/baz.tmp: no match
>   [DEBUG]      Leave[3]: ./test/foo/bar
>   [DEBUG]    Leave[2]: ./test/foo
>   [DEBUG]  Leave[1]: ./test
>   [DEBUG]Leave[0]: .
>
> Thus, no match.  Everything else I tested, did what I expected.


You are right.

test/foo/bar/baz.tmp must be ignored.


I read the code because I was curious how GIT does this.

GIT has its own fnmatch() that supports double asterisks too.
https://github.com/git/git/blob/v2.39.1/wildmatch.c#L55


I cannot write such clever code, so I will
import the matching code in v5.
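
To show the idea only, here is a much smaller matcher than GIT's
wildmatch. This is a sketch under several assumptions, not the code
imported in v5: glob_match() is a hypothetical name, "**/" is assumed to
appear only at the start of the pattern or right after a slash, and
brackets, backslash escapes, a trailing "/**" and a leading slash are not
handled.

```c
#include <stdbool.h>
#include <string.h>

// Match 'str' against 'pat'.
//   "**/"  matches zero or more leading directories
//   '*'    matches any run of characters other than '/'
//   '?'    matches one character other than '/'
static bool glob_match(const char *pat, const char *str)
{
	for (; *pat; pat++, str++) {
		if (!strncmp(pat, "**/", 3)) {
			// Try the rest of the pattern at every directory boundary.
			const char *s = str;

			for (;;) {
				if (glob_match(pat + 3, s))
					return true;
				s = strchr(s, '/');
				if (!s)
					return false;
				s++;
			}
		}

		if (*pat == '*') {
			// Let '*' consume 0, 1, 2, ... characters, never a '/'.
			for (;; str++) {
				if (glob_match(pat + 1, str))
					return true;
				if (*str == '\0' || *str == '/')
					return false;
			}
		}

		if (*str == '\0')
			return false;
		if (*pat == '?' ? *str == '/' : *pat != *str)
			return false;
	}

	return *str == '\0';
}
```

With this, "**/*.tmp", "foo/**/*.tmp" and "**/bar/baz.tmp" all match
"foo/bar/baz.tmp", while a plain "*.tmp" does not cross the slash.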


v5 is almost ready for submission.
The code grew to about 1000 lines, but I can live with that.



In my local test, v5 worked correctly.


[DEBUG] Add pattern: .git/
[DEBUG] Enter[0]: .
[DEBUG]   ./test: no match
[DEBUG]   Enter[1]: ./test
[DEBUG]     Parse ./test/.gitignore
[DEBUG]     Add pattern: **/*.tmp
[DEBUG]     Add pattern: **/baz.tmp
[DEBUG]     Add pattern: foo/**/*.tmp
[DEBUG]     Add pattern: **/bar/baz.tmp
[DEBUG]     Add pattern: /**/*.tmp
[DEBUG]     ./test/foo: no match
[DEBUG]     Enter[2]: ./test/foo
[DEBUG]       ./test/foo/bar: no match
[DEBUG]       Enter[3]: ./test/foo/bar
[DEBUG]         ./test/foo/bar/baz.tmp: matches /**/*.tmp (./test/.gitignore)
[DEBUG]         Ignore: ./test/foo/bar/baz.tmp
test/foo/bar/baz.tmp
[DEBUG]         Ignore: ./test/foo/bar (due to all files inside ignored)
test/foo/bar
[DEBUG]       Leave[3]: ./test/foo/bar
[DEBUG]       Ignore: ./test/foo (due to all files inside ignored)
test/foo
[DEBUG]     Leave[2]: ./test/foo
[DEBUG]     ./test/.gitignore: no match
[DEBUG]   Leave[1]: ./test
[DEBUG] Leave[0]: .







>
> Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
> Tested-by: Nicolas Schier <nicolas@fjasle.eu>


Thanks for your close review, as always.



>
> Kind regards,
> Nicolas

--
Best Regards
Masahiro Yamada


end of thread, other threads:[~2023-02-06  3:30 UTC | newest]

Thread overview: 10+ messages
2023-02-02  3:37 [PATCH v4 0/6] kbuild: improve source package builds Masahiro Yamada
2023-02-02  3:37 ` [PATCH v4 1/6] kbuild: add a tool to generate a list of files ignored by git Masahiro Yamada
2023-02-02  3:49   ` Masahiro Yamada
2023-02-02 11:02   ` Nicolas Schier
2023-02-06  3:29     ` Masahiro Yamada
2023-02-02  3:37 ` [PATCH v4 2/6] kbuild: deb-pkg: create source package without cleaning Masahiro Yamada
2023-02-02  3:37 ` [PATCH v4 3/6] kbuild: rpm-pkg: build binary packages from source rpm Masahiro Yamada
2023-02-02  3:37 ` [PATCH v4 4/6] kbuild: srcrpm-pkg: create source package without cleaning Masahiro Yamada
2023-02-02  3:37 ` [PATCH v4 5/6] kbuild: deb-pkg: hide KDEB_SOURCENAME from Makefile Masahiro Yamada
2023-02-02  3:37 ` [PATCH v4 6/6] kbuild: deb-pkg: switch over to format 3.0 (quilt) Masahiro Yamada
