Git Mailing List Archive on lore.kernel.org
 help / color / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Jonathan Tan <jonathantanmy@google.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH v2 09/20] merge-ort: record stage and auxiliary info for every path
Date: Fri, 6 Nov 2020 16:26:54 -0800
Message-ID: <CABPp-BEyosjCpBr-8B19YXZV0mpn3oYAXoaaROKFNZQ+p4ZMnQ@mail.gmail.com> (raw)
In-Reply-To: <20201106225828.774616-1-jonathantanmy@google.com>

On Fri, Nov 6, 2020 at 2:58 PM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> > +static void setup_path_info(struct merge_options *opt,
> > +                         struct string_list_item *result,
> > +                         const char *current_dir_name,
> > +                         int current_dir_name_len,
> > +                         char *fullpath, /* we'll take over ownership */
> > +                         struct name_entry *names,
> > +                         struct name_entry *merged_version,
> > +                         unsigned is_null,     /* boolean */
> > +                         unsigned df_conflict, /* boolean */
>
> Booleans could be int, I think?

I guess this goes back to the question on patch 6 where you suggested
I mark some unsigned variables (derived from bit math on other
unsigned quantities) instead be int.  I guess it could, but I have the
same question; since "boolean" isn't available in C, does int vs.
unsigned matter?

> > +                         unsigned filemask,
> > +                         unsigned dirmask,
> > +                         int resolved          /* boolean */)
> > +{
> > +     struct conflict_info *path_info;
> > +
> > +     assert(!is_null || resolved);
> > +     assert(!df_conflict || !resolved); /* df_conflict implies !resolved */
> > +     assert(resolved == (merged_version != NULL));
> > +
> > +     path_info = xcalloc(1, resolved ? sizeof(struct merged_info) :
> > +                                       sizeof(struct conflict_info));
> > +     path_info->merged.directory_name = current_dir_name;
> > +     path_info->merged.basename_offset = current_dir_name_len;
> > +     path_info->merged.clean = !!resolved;
> > +     if (resolved) {
> > +             path_info->merged.result.mode = merged_version->mode;
> > +             oidcpy(&path_info->merged.result.oid, &merged_version->oid);
> > +             path_info->merged.is_null = !!is_null;
> > +     } else {
> > +             int i;
> > +
> > +             for (i = 0; i < 3; i++) {
> > +                     path_info->pathnames[i] = fullpath;
> > +                     path_info->stages[i].mode = names[i].mode;
> > +                     oidcpy(&path_info->stages[i].oid, &names[i].oid);
> > +             }
> > +             path_info->filemask = filemask;
> > +             path_info->dirmask = dirmask;
> > +             path_info->df_conflict = !!df_conflict;
> > +     }
> > +     strmap_put(&opt->priv->paths, fullpath, path_info);
>
> So these are placed in paths but not unmerged. I'm starting to wonder if
> struct merge_options_internal should be called merge_options_state or
> something, and each field having documentation about when they're used
> (or better yet, have functions like collect_merge_info() return their
> calculations in return values (which may be "out" parameters) instead of
> in this struct).

Right, unmerged is only those paths that remain unmerged after all
steps.  record_unmerged_index_entries() could simply walk over all
entries in paths and pick out the ones that were unmerged, but
process_entries() has to walk over all paths, determine whether they
can be merged, and determine what to record for the resulting tree for
each path.  So, having it stash away the unmerged stuff is a simple
optimization.

Renaming to merge_options_state or even just merge_state would be fine
-- but any renaming done here will also affect merge-recursive.[ch].
See the definition of merge_options in merge-recursive.  (For history,
merge-recursive.h stuffed state into merge_options, which risked funny
misusage patterns and made the API unnecessarily complex...and made it
suggest that alternative algorithms needed to have the same state.
So, the state was moved to a merge_options_internal struct.  That's
not to say we can't rename, but it does need to be done in
merge-recursive as well.)

As for having collect_merge_info() return their calculations in return
values, would that just end with me returning a struct
merge_options_internal?  Or did you want each return value added to
the function signature?  Each return value in the function signature
makes sense right now for this super-simplified initial 20 patches,
but what about when this data structure gains all kind of
rename-related state that is collected, updated, and passed between
these areas?  I'd have a huge number of "out" and "in" fields to every
function.  Eventually, merge_options_internal (or whatever it might be
renamed to) expands to the following, where I have to first define an
extra enum and two extra structs so that you know the definitions of
new types that show up in merge_options_internal:

enum relevance {
    RELEVANT_NO_MORE = 0,
    RELEVANT_CONTENT = 1,
    RELEVANT_LOCATION = 2,
    RELEVANT_BOTH = 3
};

struct traversal_callback_data {
    unsigned long mask;
    unsigned long dirmask;
    struct name_entry names[3];
};

struct rename_info {
    /* For the next six vars, the 0th entry is ignored and unused */
    struct diff_queue_struct pairs[3]; /* input to & output from
diffcore_rename */
    struct strintmap relevant_sources[3];  /* filepath => enum relevance */
    struct strintmap dirs_removed[3];      /* directory => bool */
    struct strmap dir_rename_count[3];     /* old_dir => {new_dir => int} */
    struct strintmap possible_trivial_merges[3]; /* dirname->dir_rename_mask */
    struct strset target_dirs[3];             /* set of directory paths */
    unsigned trivial_merges_okay[3];          /* 0 = no, 1 = maybe */
    /*
     * dir_rename_mask:
     *   0: optimization removing unmodified potential rename source okay
     *   2 or 4: optimization okay, but must check for files added to dir
     *   7: optimization forbidden; need rename source in case of dir rename
     */
    unsigned dir_rename_mask:3;

    /*
     * dir_rename_mask needs to be coupled with a traversal through trees
     * that iterates over all files in a given tree before all immediate
     * subdirectories within that tree.  Since traverse_trees() doesn't do
     * that naturally, we have a traverse_trees_wrapper() that stores any
     * immediate subdirectories while traversing files, then traverses the
     * immediate subdirectories later.
     */
    struct traversal_callback_data *callback_data;
    int callback_data_nr, callback_data_alloc;
    char *callback_data_traverse_path;

    /*
     * When doing repeated merges, we can re-use renaming information from
     * previous merges under special circumstances;
     */
    struct tree *merge_trees[3];
    int cached_pairs_valid_side;
    struct strmap cached_pairs[3];   /* fullnames -> {rename_path or NULL} */
    struct strset cached_irrelevant[3]; /* fullnames */
    struct strset cached_target_names[3]; /* set of target fullnames */
    /*
     * And sometimes it pays to detect renames, and then restart the merge
     * with the renames cached so that we can do trivial tree merging.
     * Values: 0 = don't bother, 1 = let's do it, 2 = we already did it.
     */
    unsigned redo_after_renames;
};

struct merge_options_internal {
    struct strmap paths;    /* maps path -> (merged|conflict)_info */
    struct strmap unmerged; /* maps path -> conflict_info */
#if USE_MEMORY_POOL
    struct mem_pool pool;
#else
    struct string_list paths_to_free; /* list of strings to free */
#endif
    struct rename_info *renames;
    struct index_state attr_index; /* renormalization weirdly needs one... */
    struct strmap output;  /* maps path -> conflict messages */
    const char *current_dir_name;
    char *toplevel_dir; /* see merge_info.directory_name comment */
    int call_depth;
    int needed_rename_limit;
};


> > +     result->string = fullpath;
> > +     result->util = path_info;
> > +}
> > +
> >  static int collect_merge_info_callback(int n,
> >                                      unsigned long mask,
> >                                      unsigned long dirmask,
> > @@ -91,10 +136,12 @@ static int collect_merge_info_callback(int n,
> >        */
> >       struct merge_options *opt = info->data;
> >       struct merge_options_internal *opti = opt->priv;
> > -     struct conflict_info *ci;
> > +     struct string_list_item pi;  /* Path Info */
> > +     struct conflict_info *ci; /* pi.util when there's a conflict */
>
> Looking ahead to patch 10, this seems more like "pi.util unless we know
> for sure that there's no conflict".

That's too long for the line to remain at 80 characters; it's 16
characters over the limit.  ;-)

  reply index

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-02 20:43 [PATCH v2 00/20] fundamentals of merge-ort implementation Elijah Newren
2020-11-02 20:43 ` [PATCH v2 01/20] merge-ort: setup basic internal data structures Elijah Newren
2020-11-06 22:05   ` Jonathan Tan
2020-11-06 22:45     ` Elijah Newren
2020-11-09 20:55       ` Jonathan Tan
2020-11-02 20:43 ` [PATCH v2 02/20] merge-ort: add some high-level algorithm structure Elijah Newren
2020-11-02 20:43 ` [PATCH v2 03/20] merge-ort: port merge_start() from merge-recursive Elijah Newren
2020-11-11 13:52   ` Derrick Stolee
2020-11-11 16:22     ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 04/20] merge-ort: use histogram diff Elijah Newren
2020-11-11 13:54   ` Derrick Stolee
2020-11-11 16:47     ` Elijah Newren
2020-11-11 16:51       ` Derrick Stolee
2020-11-11 17:03         ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 05/20] merge-ort: add an err() function similar to one from merge-recursive Elijah Newren
2020-11-11 13:58   ` Derrick Stolee
2020-11-11 17:07     ` Elijah Newren
2020-11-11 17:10       ` Derrick Stolee
2020-11-02 20:43 ` [PATCH v2 06/20] merge-ort: implement a very basic collect_merge_info() Elijah Newren
2020-11-06 22:19   ` Jonathan Tan
2020-11-06 23:10     ` Elijah Newren
2020-11-09 20:59       ` Jonathan Tan
2020-11-11 14:38   ` Derrick Stolee
2020-11-11 17:02     ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 07/20] merge-ort: avoid repeating fill_tree_descriptor() on the same tree Elijah Newren
2020-11-11 14:51   ` Derrick Stolee
2020-11-11 17:13     ` Elijah Newren
2020-11-11 17:21       ` Eric Sunshine
2020-11-02 20:43 ` [PATCH v2 08/20] merge-ort: compute a few more useful fields for collect_merge_info Elijah Newren
2020-11-06 22:52   ` Jonathan Tan
2020-11-06 23:41     ` Elijah Newren
2020-11-09 22:04       ` Jonathan Tan
2020-11-09 23:05         ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 09/20] merge-ort: record stage and auxiliary info for every path Elijah Newren
2020-11-06 22:58   ` Jonathan Tan
2020-11-07  0:26     ` Elijah Newren [this message]
2020-11-09 22:09       ` Jonathan Tan
2020-11-09 23:08         ` Elijah Newren
2020-11-11 15:26   ` Derrick Stolee
2020-11-11 18:16     ` Elijah Newren
2020-11-11 22:06       ` Elijah Newren
2020-11-12 18:23         ` Derrick Stolee
2020-11-12 18:39       ` Derrick Stolee
2020-11-02 20:43 ` [PATCH v2 10/20] merge-ort: avoid recursing into identical trees Elijah Newren
2020-11-11 15:31   ` Derrick Stolee
2020-11-02 20:43 ` [PATCH v2 11/20] merge-ort: add a preliminary simple process_entries() implementation Elijah Newren
2020-11-11 19:51   ` Jonathan Tan
2020-11-12  1:48     ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 12/20] merge-ort: have process_entries operate in a defined order Elijah Newren
2020-11-11 16:09   ` Derrick Stolee
2020-11-11 18:58     ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 13/20] merge-ort: step 1 of tree writing -- record basenames, modes, and oids Elijah Newren
2020-11-11 20:01   ` Jonathan Tan
2020-11-11 20:24     ` Elijah Newren
2020-11-12 20:39       ` Jonathan Tan
2020-11-02 20:43 ` [PATCH v2 14/20] merge-ort: step 2 of tree writing -- function to create tree object Elijah Newren
2020-11-11 20:47   ` Jonathan Tan
2020-11-11 21:21     ` Elijah Newren
2020-11-02 20:43 ` [PATCH v2 15/20] merge-ort: step 3 of tree writing -- handling subdirectories as we go Elijah Newren
2020-11-12 20:15   ` Jonathan Tan
2020-11-12 22:30     ` Elijah Newren
2020-11-24 20:19       ` Elijah Newren
2020-11-25  2:07         ` Jonathan Tan
2020-11-26 18:13           ` Elijah Newren
2020-11-30 18:41             ` Jonathan Tan
2020-11-02 20:43 ` [PATCH v2 16/20] merge-ort: basic outline for merge_switch_to_result() Elijah Newren
2020-11-02 20:43 ` [PATCH v2 17/20] merge-ort: add implementation of checkout() Elijah Newren
2020-11-02 20:43 ` [PATCH v2 18/20] tree: enable cmp_cache_name_compare() to be used elsewhere Elijah Newren
2020-11-02 20:43 ` [PATCH v2 19/20] merge-ort: add implementation of record_unmerged_index_entries() Elijah Newren
2020-11-02 20:43 ` [PATCH v2 20/20] merge-ort: free data structures in merge_finalize() Elijah Newren
2020-11-03 14:49 ` [PATCH v2 00/20] fundamentals of merge-ort implementation Derrick Stolee
2020-11-03 16:36   ` Elijah Newren
2020-11-07  6:06     ` Elijah Newren
2020-11-07 15:02       ` Derrick Stolee
2020-11-07 19:39         ` Elijah Newren
2020-11-09 12:30           ` Derrick Stolee
2020-11-09 17:13             ` Elijah Newren
2020-11-09 19:51               ` Derrick Stolee
2020-11-09 22:44                 ` Elijah Newren
2020-11-11 17:08 ` Derrick Stolee
2020-11-11 18:35   ` Elijah Newren
2020-11-11 20:48     ` Derrick Stolee
2020-11-11 21:18       ` Elijah Newren
2020-11-29  7:43 [PATCH " Elijah Newren via GitGitGadget
2020-12-04 20:47 ` [PATCH v2 " Elijah Newren via GitGitGadget
2020-12-04 20:47   ` [PATCH v2 09/20] merge-ort: record stage and auxiliary info for every path Elijah Newren via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CABPp-BEyosjCpBr-8B19YXZV0mpn3oYAXoaaROKFNZQ+p4ZMnQ@mail.gmail.com \
    --to=newren@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Git Mailing List Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/git/0 git/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 git git/ https://lore.kernel.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.git


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git