From: Elijah Newren <newren@gmail.com> To: Jonathan Tan <jonathantanmy@google.com> Cc: Git Mailing List <git@vger.kernel.org> Subject: Re: [PATCH v2 09/20] merge-ort: record stage and auxiliary info for every path Date: Fri, 6 Nov 2020 16:26:54 -0800 [thread overview] Message-ID: <CABPp-BEyosjCpBr-8B19YXZV0mpn3oYAXoaaROKFNZQ+p4ZMnQ@mail.gmail.com> (raw) In-Reply-To: <20201106225828.774616-1-jonathantanmy@google.com> On Fri, Nov 6, 2020 at 2:58 PM Jonathan Tan <jonathantanmy@google.com> wrote: > > > +static void setup_path_info(struct merge_options *opt, > > + struct string_list_item *result, > > + const char *current_dir_name, > > + int current_dir_name_len, > > + char *fullpath, /* we'll take over ownership */ > > + struct name_entry *names, > > + struct name_entry *merged_version, > > + unsigned is_null, /* boolean */ > > + unsigned df_conflict, /* boolean */ > > Booleans could be int, I think? I guess this goes back to the question on patch 6 where you suggested I mark some unsigned variables (derived from bit math on other unsigned quantities) instead be int. I guess it could, but I have the same question; since "boolean" isn't available in C, does int vs. unsigned matter? > > + unsigned filemask, > > + unsigned dirmask, > > + int resolved /* boolean */) > > +{ > > + struct conflict_info *path_info; > > + > > + assert(!is_null || resolved); > > + assert(!df_conflict || !resolved); /* df_conflict implies !resolved */ > > + assert(resolved == (merged_version != NULL)); > > + > > + path_info = xcalloc(1, resolved ? sizeof(struct merged_info) : > > + sizeof(struct conflict_info)); > > + path_info->merged.directory_name = current_dir_name; > > + path_info->merged.basename_offset = current_dir_name_len; > > + path_info->merged.clean = !!resolved; > > + if (resolved) { > > + path_info->merged.result.mode = merged_version->mode; > > + oidcpy(&path_info->merged.result.oid, &merged_version->oid); > > + path_info->merged.is_null = !!is_null; > > + } else { > > + int i; > > + > > + for (i = 0; i < 3; i++) { > > + path_info->pathnames[i] = fullpath; > > + path_info->stages[i].mode = names[i].mode; > > + oidcpy(&path_info->stages[i].oid, &names[i].oid); > > + } > > + path_info->filemask = filemask; > > + path_info->dirmask = dirmask; > > + path_info->df_conflict = !!df_conflict; > > + } > > + strmap_put(&opt->priv->paths, fullpath, path_info); > > So these are placed in paths but not unmerged. I'm starting to wonder if > struct merge_options_internal should be called merge_options_state or > something, and each field having documentation about when they're used > (or better yet, have functions like collect_merge_info() return their > calculations in return values (which may be "out" parameters) instead of > in this struct). Right, unmerged is only those paths that remain unmerged after all steps. record_unmerged_index_entries() could simply walk over all entries in paths and pick out the ones that were unmerged, but process_entries() has to walk over all paths, determine whether they can be merged, and determine what to record for the resulting tree for each path. So, having it stash away the unmerged stuff is a simple optimization. Renaming to merge_options_state or even just merge_state would be fine -- but any renaming done here will also affect merge-recursive.[ch]. See the definition of merge_options in merge-recursive. (For history, merge-recursive.h stuffed state into merge_options, which risked funny misusage patterns and made the API unnecessarily complex...and made it suggest that alternative algorithms needed to have the same state. So, the state was moved to a merge_options_internal struct. That's not to say we can't rename, but it does need to be done in merge-recursive as well.) As for having collect_merge_info() return their calculations in return values, would that just end with me returning a struct merge_options_internal? Or did you want each return value added to the function signature? Each return value in the function signature makes sense right now for this super-simplified initial 20 patches, but what about when this data structure gains all kind of rename-related state that is collected, updated, and passed between these areas? I'd have a huge number of "out" and "in" fields to every function. Eventually, merge_options_internal (or whatever it might be renamed to) expands to the following, where I have to first define an extra enum and two extra structs so that you know the definitions of new types that show up in merge_options_internal: enum relevance { RELEVANT_NO_MORE = 0, RELEVANT_CONTENT = 1, RELEVANT_LOCATION = 2, RELEVANT_BOTH = 3 }; struct traversal_callback_data { unsigned long mask; unsigned long dirmask; struct name_entry names[3]; }; struct rename_info { /* For the next six vars, the 0th entry is ignored and unused */ struct diff_queue_struct pairs[3]; /* input to & output from diffcore_rename */ struct strintmap relevant_sources[3]; /* filepath => enum relevance */ struct strintmap dirs_removed[3]; /* directory => bool */ struct strmap dir_rename_count[3]; /* old_dir => {new_dir => int} */ struct strintmap possible_trivial_merges[3]; /* dirname->dir_rename_mask */ struct strset target_dirs[3]; /* set of directory paths */ unsigned trivial_merges_okay[3]; /* 0 = no, 1 = maybe */ /* * dir_rename_mask: * 0: optimization removing unmodified potential rename source okay * 2 or 4: optimization okay, but must check for files added to dir * 7: optimization forbidden; need rename source in case of dir rename */ unsigned dir_rename_mask:3; /* * dir_rename_mask needs to be coupled with a traversal through trees * that iterates over all files in a given tree before all immediate * subdirectories within that tree. Since traverse_trees() doesn't do * that naturally, we have a traverse_trees_wrapper() that stores any * immediate subdirectories while traversing files, then traverses the * immediate subdirectories later. */ struct traversal_callback_data *callback_data; int callback_data_nr, callback_data_alloc; char *callback_data_traverse_path; /* * When doing repeated merges, we can re-use renaming information from * previous merges under special circumstances; */ struct tree *merge_trees[3]; int cached_pairs_valid_side; struct strmap cached_pairs[3]; /* fullnames -> {rename_path or NULL} */ struct strset cached_irrelevant[3]; /* fullnames */ struct strset cached_target_names[3]; /* set of target fullnames */ /* * And sometimes it pays to detect renames, and then restart the merge * with the renames cached so that we can do trivial tree merging. * Values: 0 = don't bother, 1 = let's do it, 2 = we already did it. */ unsigned redo_after_renames; }; struct merge_options_internal { struct strmap paths; /* maps path -> (merged|conflict)_info */ struct strmap unmerged; /* maps path -> conflict_info */ #if USE_MEMORY_POOL struct mem_pool pool; #else struct string_list paths_to_free; /* list of strings to free */ #endif struct rename_info *renames; struct index_state attr_index; /* renormalization weirdly needs one... */ struct strmap output; /* maps path -> conflict messages */ const char *current_dir_name; char *toplevel_dir; /* see merge_info.directory_name comment */ int call_depth; int needed_rename_limit; }; > > + result->string = fullpath; > > + result->util = path_info; > > +} > > + > > static int collect_merge_info_callback(int n, > > unsigned long mask, > > unsigned long dirmask, > > @@ -91,10 +136,12 @@ static int collect_merge_info_callback(int n, > > */ > > struct merge_options *opt = info->data; > > struct merge_options_internal *opti = opt->priv; > > - struct conflict_info *ci; > > + struct string_list_item pi; /* Path Info */ > > + struct conflict_info *ci; /* pi.util when there's a conflict */ > > Looking ahead to patch 10, this seems more like "pi.util unless we know > for sure that there's no conflict". That's too long for the line to remain at 80 characters; it's 16 characters over the limit. ;-)
next prev parent reply other threads:[~2020-11-07 0:27 UTC|newest] Thread overview: 84+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-11-02 20:43 [PATCH v2 00/20] fundamentals of merge-ort implementation Elijah Newren 2020-11-02 20:43 ` [PATCH v2 01/20] merge-ort: setup basic internal data structures Elijah Newren 2020-11-06 22:05 ` Jonathan Tan 2020-11-06 22:45 ` Elijah Newren 2020-11-09 20:55 ` Jonathan Tan 2020-11-02 20:43 ` [PATCH v2 02/20] merge-ort: add some high-level algorithm structure Elijah Newren 2020-11-02 20:43 ` [PATCH v2 03/20] merge-ort: port merge_start() from merge-recursive Elijah Newren 2020-11-11 13:52 ` Derrick Stolee 2020-11-11 16:22 ` Elijah Newren 2020-11-02 20:43 ` [PATCH v2 04/20] merge-ort: use histogram diff Elijah Newren 2020-11-11 13:54 ` Derrick Stolee 2020-11-11 16:47 ` Elijah Newren 2020-11-11 16:51 ` Derrick Stolee 2020-11-11 17:03 ` Elijah Newren 2020-11-02 20:43 ` [PATCH v2 05/20] merge-ort: add an err() function similar to one from merge-recursive Elijah Newren 2020-11-11 13:58 ` Derrick Stolee 2020-11-11 17:07 ` Elijah Newren 2020-11-11 17:10 ` Derrick Stolee 2020-11-02 20:43 ` [PATCH v2 06/20] merge-ort: implement a very basic collect_merge_info() Elijah Newren 2020-11-06 22:19 ` Jonathan Tan 2020-11-06 23:10 ` Elijah Newren 2020-11-09 20:59 ` Jonathan Tan 2020-11-11 14:38 ` Derrick Stolee 2020-11-11 17:02 ` Elijah Newren 2020-11-02 20:43 ` [PATCH v2 07/20] merge-ort: avoid repeating fill_tree_descriptor() on the same tree Elijah Newren 2020-11-11 14:51 ` Derrick Stolee 2020-11-11 17:13 ` Elijah Newren 2020-11-11 17:21 ` Eric Sunshine 2020-11-02 20:43 ` [PATCH v2 08/20] merge-ort: compute a few more useful fields for collect_merge_info Elijah Newren 2020-11-06 22:52 ` Jonathan Tan 2020-11-06 23:41 ` Elijah Newren 2020-11-09 22:04 ` Jonathan Tan 2020-11-09 23:05 ` Elijah Newren 2020-11-02 20:43 ` [PATCH v2 09/20] merge-ort: record stage and auxiliary info for every path Elijah Newren 2020-11-06 22:58 ` Jonathan Tan 2020-11-07 0:26 ` Elijah Newren [this message] 2020-11-09 22:09 ` Jonathan Tan 2020-11-09 23:08 ` Elijah Newren 2020-11-11 15:26 ` Derrick Stolee 2020-11-11 18:16 ` Elijah Newren 2020-11-11 22:06 ` Elijah Newren 2020-11-12 18:23 ` Derrick Stolee 2020-11-12 18:39 ` Derrick Stolee 2020-11-02 20:43 ` [PATCH v2 10/20] merge-ort: avoid recursing into identical trees Elijah Newren 2020-11-11 15:31 ` Derrick Stolee 2020-11-02 20:43 ` [PATCH v2 11/20] merge-ort: add a preliminary simple process_entries() implementation Elijah Newren 2020-11-11 19:51 ` Jonathan Tan 2020-11-12 1:48 ` Elijah Newren 2020-11-02 20:43 ` [PATCH v2 12/20] merge-ort: have process_entries operate in a defined order Elijah Newren 2020-11-11 16:09 ` Derrick Stolee 2020-11-11 18:58 ` Elijah Newren 2020-11-02 20:43 ` [PATCH v2 13/20] merge-ort: step 1 of tree writing -- record basenames, modes, and oids Elijah Newren 2020-11-11 20:01 ` Jonathan Tan 2020-11-11 20:24 ` Elijah Newren 2020-11-12 20:39 ` Jonathan Tan 2020-11-02 20:43 ` [PATCH v2 14/20] merge-ort: step 2 of tree writing -- function to create tree object Elijah Newren 2020-11-11 20:47 ` Jonathan Tan 2020-11-11 21:21 ` Elijah Newren 2020-11-02 20:43 ` [PATCH v2 15/20] merge-ort: step 3 of tree writing -- handling subdirectories as we go Elijah Newren 2020-11-12 20:15 ` Jonathan Tan 2020-11-12 22:30 ` Elijah Newren 2020-11-24 20:19 ` Elijah Newren 2020-11-25 2:07 ` Jonathan Tan 2020-11-26 18:13 ` Elijah Newren 2020-11-30 18:41 ` Jonathan Tan 2020-11-02 20:43 ` [PATCH v2 16/20] merge-ort: basic outline for merge_switch_to_result() Elijah Newren 2020-11-02 20:43 ` [PATCH v2 17/20] merge-ort: add implementation of checkout() Elijah Newren 2020-11-02 20:43 ` [PATCH v2 18/20] tree: enable cmp_cache_name_compare() to be used elsewhere Elijah Newren 2020-11-02 20:43 ` [PATCH v2 19/20] merge-ort: add implementation of record_unmerged_index_entries() Elijah Newren 2020-11-02 20:43 ` [PATCH v2 20/20] merge-ort: free data structures in merge_finalize() Elijah Newren 2020-11-03 14:49 ` [PATCH v2 00/20] fundamentals of merge-ort implementation Derrick Stolee 2020-11-03 16:36 ` Elijah Newren 2020-11-07 6:06 ` Elijah Newren 2020-11-07 15:02 ` Derrick Stolee 2020-11-07 19:39 ` Elijah Newren 2020-11-09 12:30 ` Derrick Stolee 2020-11-09 17:13 ` Elijah Newren 2020-11-09 19:51 ` Derrick Stolee 2020-11-09 22:44 ` Elijah Newren 2020-11-11 17:08 ` Derrick Stolee 2020-11-11 18:35 ` Elijah Newren 2020-11-11 20:48 ` Derrick Stolee 2020-11-11 21:18 ` Elijah Newren 2020-11-29 7:43 [PATCH " Elijah Newren via GitGitGadget 2020-12-04 20:47 ` [PATCH v2 " Elijah Newren via GitGitGadget 2020-12-04 20:47 ` [PATCH v2 09/20] merge-ort: record stage and auxiliary info for every path Elijah Newren via GitGitGadget
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=CABPp-BEyosjCpBr-8B19YXZV0mpn3oYAXoaaROKFNZQ+p4ZMnQ@mail.gmail.com \ --to=newren@gmail.com \ --cc=git@vger.kernel.org \ --cc=jonathantanmy@google.com \ --subject='Re: [PATCH v2 09/20] merge-ort: record stage and auxiliary info for every path' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).