All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jonathan Nieder <jrnieder@gmail.com>
To: Emily Shaffer <emilyshaffer@google.com>
Cc: git@vger.kernel.org
Subject: Re: [RFC PATCH] grep: provide sane default to grep_source struct
Date: Wed, 15 May 2019 20:11:52 -0700	[thread overview]
Message-ID: <20190516022911.GA135875@google.com> (raw)
In-Reply-To: <20190516020023.61161-1-emilyshaffer@google.com>

Hi,

Emily Shaffer wrote:

> grep_buffer creates a struct grep_source gs and calls grep_source()
> with it. However, gs.name is null, which eventually produces a
> segmentation fault in
> grep_source()->grep_source_1()->show_line() when grep_opt.status_only is
> not set.

Thanks for catching it.  Taking a step back, I think the problem is in
the definition of "struct grep_source":

	struct grep_source {
		char *name;

		enum grep_source_type {
			GREP_SOURCE_OID,
			GREP_SOURCE_FILE,
			GREP_SOURCE_BUF,
		} type;
		void *identifier;

		...
	};

What is the difference between a 'name' and an 'identifier'?  Who is
responsible for free()ing them?  Can they be NULL?  This is pretty
underdocumented for a public type.

If we take the point of view that 'name' should always be non-NULL,
then I wonder:

- can we document that?
- can grep_source_init enforce that?
- can we take advantage of that in grep_source as well, as a sanity
  check that the grep_source has been initialized?
- while we're here, can we describe what the field is used for
  (prefixing output with context before a ":", I believe)?

[...]
> This seems to be unreachable from existing commands but is reachable in
> the event that someone rolls their own revision walk and neglects to set
> rev_info->grep_filter->status_only.

init_revisions sets that field, so a caller would have to replace
grep_filter or unset status_only to trigger this.

>                                     Conceivably, someone might want to
> print all changed lines of all commits which match some regex.

Hm, does that work well?  The caller of grep_buffer in revision.c
doesn't seem to be careful about where in the output hits would be
printed, but I might be missing something.

[...]
> I ran into this issue while I was working on a tutorial on rolling your
> own revision walk. I didn't want to print all the lines associated with
> a matching change, but it didn't seem good that a seemingly-sane
> grep_filter config was segfaulting.

Good find, and thanks for taking the time to make it easier to debug for
the next person.

[...]
> Jonathan Nieder proposed alternatively adding some check to grep_source()
> to ensure that if opt->status_only is unset, gs->name must be non-NULL
> (and yell about it if not), as well as some extra comments indicating
> what assumptions are made about the data coming into functions like
> grep_source(). I'm fine with that as well (although I'm not sure it
> makes sense semantically to require a name which the user probably can't
> easily set, or else ban the user from printing LOC during grep). Mostly
> I'm happy with any solution besides a segfault with no error logging :)

Let's compare the two possibilities.

The advantage of "(in memory)" is that it Just Works, which should
make a nicer development experience with getting a new caller mostly
working on the way to getting them working just the way you want.

The disadvantage is that if we start outputting that in production, we
and static analyzers are less likely to notice.  In other words,
printing "(in memory)" is leaking details to the end user that do not
match what the end user asked for.  NULL would instead produce a
crash, prompting the author of the caller to fix it.

What was particularly pernicious about this example is that the NULL
dereference only occurs if the grep has a match.  So I suppose I'm
leaning toward (in addition to adding some comments to document the
struct) adding a check like

	if (!gs->name && !opt->status_only)
		BUG("grep calls that could print name require name");

to grep_source.

That would also sidestep the question of whether this debugging aid
should be translated. :)

Sensible?

Thanks,
Jonathan

  parent reply	other threads:[~2019-05-16  3:11 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-16  2:00 [RFC PATCH] grep: provide sane default to grep_source struct Emily Shaffer
2019-05-16  3:07 ` Junio C Hamano
2019-05-16  3:13   ` Jonathan Nieder
2019-05-16  3:11 ` Jonathan Nieder [this message]
2019-05-16 21:05   ` Emily Shaffer
2019-05-16 21:44 ` [PATCH v2] " Emily Shaffer
2019-05-16 22:02   ` Jeff King
2019-05-21 23:52     ` Emily Shaffer
2019-05-22  0:17       ` Jonathan Nieder
2019-05-22  0:34   ` [PATCH v3] " Emily Shaffer
2019-05-22  0:58     ` Jonathan Nieder
2019-05-22  4:01     ` Jeff King
2019-05-23 20:23     ` [PATCH v4] grep: fail if call could output and name is null Emily Shaffer
2019-05-23 20:34       ` Jonathan Nieder
2019-05-28 17:57         ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190516022911.GA135875@google.com \
    --to=jrnieder@gmail.com \
    --cc=emilyshaffer@google.com \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.