All of lore.kernel.org
 help / color / mirror / Atom feed
* Segfault in the attr stack
@ 2016-06-01 22:00 Stefan Beller
  2016-06-01 22:11 ` Junio C Hamano
  2016-06-01 22:11 ` Stefan Beller
  0 siblings, 2 replies; 12+ messages in thread
From: Stefan Beller @ 2016-06-01 22:00 UTC (permalink / raw)
  To: Junio C Hamano, git

(running git-next)

In the Gerrit repo I did
    $ echo "/plugins/commit-message-length-validator sub-default"
>>.gitattributes
    $ git status ":(attr:sub-default)"
Segmentation fault (core dumped)

Running this multiple times through gdb, this produces different back
traces as it seems that threads are involved? (So I don't attach a
back trace here)

Also notable are:
*** Error in `/usr/local/google/home/sbeller/bin/git': double free or
corruption (fasttop): 0x00007fffd80008f0 ***
*** Error in `/usr/local/google/home/sbeller/bin/git': double free or
corruption (fasttop): 0x00007ffff0001000 ***

When running another command `git status ":(attr:asdf)"`, I get
git: attr.c:634: prepare_attr_stack: Assertion `attr_stack->origin' failed.
git: attr.c:634: prepare_attr_stack: Assertion `attr_stack->origin' failed.
git: attr.c:634: prepare_attr_stack: Assertion `attr_stack->origin' failed.
git: attr.c:634: prepare_attr_stack: Assertion `attr_stack->origin' failed.
Aborted (core dumped)

The failing assertions are an indication that sb/pathspec-label is
using the attrs incorrectly or
jc/attr added tighter assertions than I was aware of.

Stefan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Segfault in the attr stack
  2016-06-01 22:00 Segfault in the attr stack Stefan Beller
@ 2016-06-01 22:11 ` Junio C Hamano
  2016-06-01 22:29   ` Stefan Beller
  2016-06-01 22:11 ` Stefan Beller
  1 sibling, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2016-06-01 22:11 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> In the Gerrit repo I did
>     $ echo "/plugins/commit-message-length-validator sub-default"
>>>.gitattributes
>     $ git status ":(attr:sub-default)"
> Segmentation fault (core dumped)

Thanks.  I can reproduce this easily with git.git, e.g.

	$ cat >>.git/info/attributes <<-\EOF
	Documentation/git-merge.txt conflict-marker-size=32
        Documentation/user-manual.txt conflict-marker-size=32
	EOF
	$ git status ':(attr:conflict-marker-size=32)'

would die (presumably) the same way.

By the way, I just noticed that the <specification> part of the
':(attr:<specification>)' syntax would need to be rethought.  In the
.gitattributes file everybody has, we see these lines:

        *.[ch] whitespace=indent,trail,space
        *.sh whitespace=indent,trail,space

but because comma is a special separator in the pathspec magic
system, we cannot do

	$ git status ':(attr:whitespace=indent,trail,space)'

I think we should introduce a quoting mechanism to hide these commas
from the pathspec magic splitter, e.g.

where attr_value_unquote() would copy string while unquoting some
special characters (i.e. at least ' ' and ',' because they are used
as syntactic elements in the higher level; there might be others).

diff --git a/pathspec.c b/pathspec.c
index 0a02255..fb22f28 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -132,7 +132,7 @@ static void parse_pathspec_attr_match(struct pathspec_item *item, const char *va
 				am->match_mode = MATCH_SET;
 			else {
 				am->match_mode = MATCH_VALUE;
-				am->value = xstrdup(&attr[attr_len + 1]);
+				am->value = attr_value_unquote(&attr[attr_len + 1]);
 				if (strchr(am->value, '\\'))
 					die(_("attr spec values must not contain backslashes"));
 			}

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: Segfault in the attr stack
  2016-06-01 22:00 Segfault in the attr stack Stefan Beller
  2016-06-01 22:11 ` Junio C Hamano
@ 2016-06-01 22:11 ` Stefan Beller
  2016-06-01 22:17   ` Junio C Hamano
  1 sibling, 1 reply; 12+ messages in thread
From: Stefan Beller @ 2016-06-01 22:11 UTC (permalink / raw)
  To: Junio C Hamano, git

This can be reproduced on sb/pathspec-label in git.git as well.

The key difference I notice is `git ls-files` works perfectly (e.g. in
the tests)
while `git status` fails.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Segfault in the attr stack
  2016-06-01 22:11 ` Stefan Beller
@ 2016-06-01 22:17   ` Junio C Hamano
  2016-06-01 22:46     ` Junio C Hamano
  2016-06-02  1:02     ` Duy Nguyen
  0 siblings, 2 replies; 12+ messages in thread
From: Junio C Hamano @ 2016-06-01 22:17 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Gaah, of course.

This is coming from the cache preload codepath, where multiple threads
try to run ce_path_match().
It used to be OK because pathspec magic never looked at attributes,
but now it does, and attribute system is not thread-safe.

On Wed, Jun 1, 2016 at 3:11 PM, Stefan Beller <sbeller@google.com> wrote:
> This can be reproduced on sb/pathspec-label in git.git as well.
>
> The key difference I notice is `git ls-files` works perfectly (e.g. in
> the tests)
> while `git status` fails.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Segfault in the attr stack
  2016-06-01 22:11 ` Junio C Hamano
@ 2016-06-01 22:29   ` Stefan Beller
  2016-06-01 22:35     ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Stefan Beller @ 2016-06-01 22:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Wed, Jun 1, 2016 at 3:11 PM, Junio C Hamano <gitster@pobox.com> wrote:
>
> By the way, I just noticed that the <specification> part of the
> ':(attr:<specification>)' syntax would need to be rethought.  In the
> .gitattributes file everybody has, we see these lines:
>
>         *.[ch] whitespace=indent,trail,space
>         *.sh whitespace=indent,trail,space
>
> but because comma is a special separator in the pathspec magic
> system, we cannot do
>
>         $ git status ':(attr:whitespace=indent,trail,space)'
>

Right. In [1] I wrote:

>>
>>> +     if (!item->attr_check)
>>> +             item->attr_check = git_attr_check_alloc();
>>
>> Given more than one "attr" magic, e.g. ":(attr:A=a,attr:B)/path",
>> the check may not be empty when we process the second one; we just
>> extend it without losing the existing contents.
>
> That is why I am not super happy with it though.
>
>    ":(attr:A=a,attr:B)/path",
>    ":(attr:A=a B)/path",
>
> are the same for the user as well as in the internal data structures.
> This "wastes" the white space as a possible convenient separator
> character, e.g. for multiple values. On the other hand it will be easier
> to type, specially for many attrs (read submodule groups).

So at that time I thought I had communicated the issue enough and we'd
be fine ignoring it for now. I propose to not escape commas, but use
white spaces instead, i.e.

    git status ':(attr:whitespace=indent trail space,attr:label=with
more values)' ':attr(attr:foo:bar)'

would match
* all files that have the whitespace AND the label setting (matching
exactly the values)
* OR foo=bar attribute

This syntax would require to repeat ",attr:" for multiple ANDed attributes,
but would save us from escaped commas, which may be a pain both in parsing as
well as doing the input on a shell?

[1] http://thread.gmane.org/gmane.comp.version-control.git/294989/focus=295016

> I think we should introduce a quoting mechanism to hide these commas
> from the pathspec magic splitter, e.g.
>
> where attr_value_unquote() would copy string while unquoting some
> special characters (i.e. at least ' ' and ',' because they are used
> as syntactic elements in the higher level; there might be others).
>
> diff --git a/pathspec.c b/pathspec.c
> index 0a02255..fb22f28 100644
> --- a/pathspec.c
> +++ b/pathspec.c
> @@ -132,7 +132,7 @@ static void parse_pathspec_attr_match(struct pathspec_item *item, const char *va
>                                 am->match_mode = MATCH_SET;
>                         else {
>                                 am->match_mode = MATCH_VALUE;
> -                               am->value = xstrdup(&attr[attr_len + 1]);
> +                               am->value = attr_value_unquote(&attr[attr_len + 1]);
>                                 if (strchr(am->value, '\\'))
>                                         die(_("attr spec values must not contain backslashes"));
>                         }
>

If we go by whitespaces, we can implement attr_value_unquote as a `tr " " ","`
conceptually, which seems easy.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Segfault in the attr stack
  2016-06-01 22:29   ` Stefan Beller
@ 2016-06-01 22:35     ` Junio C Hamano
  0 siblings, 0 replies; 12+ messages in thread
From: Junio C Hamano @ 2016-06-01 22:35 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Stefan Beller <sbeller@google.com> writes:

> I propose to not escape commas, but use
> white spaces instead, i.e.
>
>     git status ':(attr:whitespace=indent trail space,attr:label=with
> more values)' ':attr(attr:foo:bar)'
>
> would match
> * all files that have the whitespace AND the label setting (matching
> exactly the values)

"attr:whitespace=indent trail ..whatever other stuff.." is already a
non-starter.  In the example in the message you are responding to, I
want to find paths whose "whitespace" attribute is set to the exact
string "indent,trail,space".  And I do not want the pathspec magic
code to know how the interpretation of "whitespace" attribute treats
comma in the string.  It may or may not have any special meaning, so
if you meant "whitespace=indent trail ..whatever other stuff.."  to
mean "either whitespace is set to indent, or whitespace is set to
trail", then that is already unacceptable.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Segfault in the attr stack
  2016-06-01 22:17   ` Junio C Hamano
@ 2016-06-01 22:46     ` Junio C Hamano
  2016-06-02  0:22       ` Junio C Hamano
  2016-06-02  1:02     ` Duy Nguyen
  1 sibling, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2016-06-01 22:46 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> Gaah, of course.
>
> This is coming from the cache preload codepath, where multiple threads
> try to run ce_path_match().
> It used to be OK because pathspec magic never looked at attributes,
> but now it does, and attribute system is not thread-safe.

The symlink check code has dealt with a similar issue in the past in
the codepath.  threaded_has_symlink_leading_path() is called with
per-thread data structure called "cache", because the leading
symbolic link check uses a cache that is shared across different
invocations of the check.

We need something similar to duplicate the attribute stack per
thread.  git_check_attrs() currently uses the singleton instance of
attr_stack chain, that is rooted at the file scope global
"attr_stack", and no wonder that would be clobbered when multiple
threads try to use it.  The result of attribute collection is also
accumulated in a file scope global, which should probably be moved
to the "struct git_attr_check" introduced in the jc/attr topic.

We need to teach the callchain that includes prepare_attr_stack()
and bootstrap_attr_stack() to take a pointer to the attr_stack root,
give git_check_attr_threaded() that takes such root so that threaded
code can use per-thread attr stack, and for non-threaded
applications use &the_default_attr just like the file scope global
"default_cache" is used in symlinks.c.  Then a threaded attribute
lookup can maintain its own attr_stack when running more than one
instance of lookup.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Segfault in the attr stack
  2016-06-01 22:46     ` Junio C Hamano
@ 2016-06-02  0:22       ` Junio C Hamano
  2016-06-02 15:59         ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2016-06-02  0:22 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> Gaah, of course.
>>
>> This is coming from the cache preload codepath, where multiple threads
>> try to run ce_path_match().
>> It used to be OK because pathspec magic never looked at attributes,
>> but now it does, and attribute system is not thread-safe.
>
> The symlink check code has dealt with a similar issue in the past in
> the codepath.  threaded_has_symlink_leading_path() is called with
> per-thread data structure called "cache", because the leading
> symbolic link check uses a cache that is shared across different
> invocations of the check.
>
> We need something similar to duplicate the attribute stack per
> thread.  git_check_attrs() currently uses the singleton instance of
> attr_stack chain, that is rooted at the file scope global
> "attr_stack", and no wonder that would be clobbered when multiple
> threads try to use it.  The result of attribute collection is also
> accumulated in a file scope global, which should probably be moved
> to the "struct git_attr_check" introduced in the jc/attr topic.
>
> We need to teach the callchain that includes prepare_attr_stack()
> and bootstrap_attr_stack() to take a pointer to the attr_stack root,
> give git_check_attr_threaded() that takes such root so that threaded
> code can use per-thread attr stack, and for non-threaded
> applications use &the_default_attr just like the file scope global
> "default_cache" is used in symlinks.c.  Then a threaded attribute
> lookup can maintain its own attr_stack when running more than one
> instance of lookup.

Another is the global attr dictionary.  It is like the global object
hash, so the look-up and insertion into it need to be protected the
same way with mutex, just like builtin/pack-objects.c serializes the
object store access with locks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Segfault in the attr stack
  2016-06-01 22:17   ` Junio C Hamano
  2016-06-01 22:46     ` Junio C Hamano
@ 2016-06-02  1:02     ` Duy Nguyen
  2016-06-02  1:04       ` Duy Nguyen
  1 sibling, 1 reply; 12+ messages in thread
From: Duy Nguyen @ 2016-06-02  1:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, git

On Thu, Jun 2, 2016 at 5:17 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Gaah, of course.
>
> This is coming from the cache preload codepath, where multiple threads
> try to run ce_path_match().
> It used to be OK because pathspec magic never looked at attributes,
> but now it does, and attribute system is not thread-safe.

Off topic. I wonder if we can annotate which functions are thread-safe
(minority currently in git code base, I believe) and make gcc, clang
or sparse spot them in threaded code?
-- 
Duy

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Segfault in the attr stack
  2016-06-02  1:02     ` Duy Nguyen
@ 2016-06-02  1:04       ` Duy Nguyen
  0 siblings, 0 replies; 12+ messages in thread
From: Duy Nguyen @ 2016-06-02  1:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Stefan Beller, git

On Thu, Jun 2, 2016 at 8:02 AM, Duy Nguyen <pclouds@gmail.com> wrote:
> On Thu, Jun 2, 2016 at 5:17 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> Gaah, of course.
>>
>> This is coming from the cache preload codepath, where multiple threads
>> try to run ce_path_match().
>> It used to be OK because pathspec magic never looked at attributes,
>> but now it does, and attribute system is not thread-safe.
>
> Off topic. I wonder if we can annotate which functions are thread-safe
> (minority currently in git code base, I believe) and make gcc, clang
> or sparse spot them in threaded code?

By them I mean unannotated (i.e. not thread-safe) functions, of course.
-- 
Duy

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Segfault in the attr stack
  2016-06-02  0:22       ` Junio C Hamano
@ 2016-06-02 15:59         ` Junio C Hamano
  2016-06-02 16:56           ` Stefan Beller
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2016-06-02 15:59 UTC (permalink / raw)
  To: Stefan Beller; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

>>> Gaah, of course.
>>>
>>> This is coming from the cache preload codepath, where multiple threads
>>> try to run ce_path_match().
>>> It used to be OK because pathspec magic never looked at attributes,
>>> but now it does, and attribute system is not thread-safe.

I'll look into teaching a threadble interface to the attribute
subsystem, but for now, this should get you unstuck, I think.

One of the ways preload codepath assures that multiple threads do
not stomp on the shared datastructure is by copying things that are
involved in the operation, and there is copy_pathspec() call.  I
think the "pathspec magic that limits with the attributes" topic,
which adds to the pathspec structure, needs to update the function
so that its shared data structure is properly copied.  Before the
series, I think the pathspec structure only has either pointers to
borrowed strings (e.g. raw) or scalars, and it was sufficient to do
a shallow copy with memcpy().  With the change, that is no longer
the case.

Which would mean we'd need to make sure that any codepath that calls
copy_pathspec() needs to think about how to clean it up.  Before the
change, preload-index codepath didn't have to, but with the change,
it will.  We'd need pathspec_clear() or something like that.

 pathspec.c      | 12 ++++++++++++
 pathspec.h      |  2 ++
 preload-index.c |  4 ++++
 3 files changed, 18 insertions(+)

diff --git a/pathspec.c b/pathspec.c
index 0a02255..326863a 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -208,6 +208,18 @@ static void eat_long_magic(struct pathspec_item *item, const char *elt,
 	*copyfrom_ = copyfrom;
 }
 
+int pathspec_is_non_threadable(const struct pathspec *pathspec)
+{
+	int i;
+
+	for (i = 0; i < pathspec->nr; i++) {
+		const struct pathspec_item *item = &pathspec->items[i];
+		if (item->attr_check)
+			return 1;
+	}
+	return 0;
+}
+
 /*
  * Take an element of a pathspec and check for magic signatures.
  * Append the result to the prefix. Return the magic bitmap.
diff --git a/pathspec.h b/pathspec.h
index 5308137..07d21f0 100644
--- a/pathspec.h
+++ b/pathspec.h
@@ -115,4 +115,6 @@ extern void add_pathspec_matches_against_index(const struct pathspec *pathspec,
 extern const char *check_path_for_gitlink(const char *path);
 extern void die_if_path_beyond_symlink(const char *path, const char *prefix);
 
+extern int pathspec_is_non_threadable(const struct pathspec *);
+
 #endif /* PATHSPEC_H */
diff --git a/preload-index.c b/preload-index.c
index c1fe3a3..af46878 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -76,6 +76,10 @@ static void preload_index(struct index_state *index,
 	if (!core_preload_index)
 		return;
 
+	/* Do not preload when pathspec uses non-threadable subsystems */
+	if (pathspec && pathspec_is_non_threadable(pathspec))
+		return; /* for now ... */
+
 	threads = index->cache_nr / THREAD_COST;
 	if (threads < 2)
 		return;

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: Segfault in the attr stack
  2016-06-02 15:59         ` Junio C Hamano
@ 2016-06-02 16:56           ` Stefan Beller
  0 siblings, 0 replies; 12+ messages in thread
From: Stefan Beller @ 2016-06-02 16:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Thu, Jun 2, 2016 at 8:59 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
>>>> Gaah, of course.
>>>>
>>>> This is coming from the cache preload codepath, where multiple threads
>>>> try to run ce_path_match().
>>>> It used to be OK because pathspec magic never looked at attributes,
>>>> but now it does, and attribute system is not thread-safe.
>
> I'll look into teaching a threadble interface to the attribute
> subsystem, but for now, this should get you unstuck, I think.

Thanks for looking into this!
(I was about to do that if you would not have stepped up, as the bug
only surfaces when using the pathspec labels.)

>
> +       /* Do not preload when pathspec uses non-threadable subsystems */
> +       if (pathspec && pathspec_is_non_threadable(pathspec))
> +               return; /* for now ... */

This fixes the problem for now; I'll look into the comma escaping then.

Thanks,
Stefan

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-06-02 16:56 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-01 22:00 Segfault in the attr stack Stefan Beller
2016-06-01 22:11 ` Junio C Hamano
2016-06-01 22:29   ` Stefan Beller
2016-06-01 22:35     ` Junio C Hamano
2016-06-01 22:11 ` Stefan Beller
2016-06-01 22:17   ` Junio C Hamano
2016-06-01 22:46     ` Junio C Hamano
2016-06-02  0:22       ` Junio C Hamano
2016-06-02 15:59         ` Junio C Hamano
2016-06-02 16:56           ` Stefan Beller
2016-06-02  1:02     ` Duy Nguyen
2016-06-02  1:04       ` Duy Nguyen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.