git.vger.kernel.org archive mirror
* Conversion of 'git submodule' to C: need some help
@ 2020-02-09 11:13 Shourya Shukla
  2020-02-13 13:33 ` Johannes Schindelin
  0 siblings, 1 reply; 3+ messages in thread
From: Shourya Shukla @ 2020-02-09 11:13 UTC
  To: git; +Cc: Johannes.Schindelin, chriscool, peff, t.gummerer

Hello everyone,

I was trying to understand the code of 'git submodule'[1]. This is
also in reference to this conversation I had before[2].

I read the code and stumbled across a function with a 'TODO' tag[3].
Here we want to change the aforementioned function into a 'repo_submodule_init'
function I suppose.

I am facing some problems and would love some insight on them:
	
	1. What exactly are we aiming for in [3]? To replace the function completely
	   or to just add some 'repo_submodule_init' functionality?

	2. Something I inferred was that functions with names of the pattern 'strbuf_git_*'
	   are trying to 'create a path'(are they physically creating the path or just
	   instructing git about them?) while functions of the pattern 'git_*' are trying
	   to check some conditions denoted by their function names(for instance
	   'git_config_rename_section_in_file')? Is this inference correct to some extent?

	3. How does one check which parts of a command have been completed? Is it checked
	   by looking at the file history or by comparing with the shell script of the command
	   or are there any other means?
	
	4. Is it fine if I am not able to understand the purpose of certain functions right now(such as
	   'add_submodule_odb')? I am able to get a rough idea of what the functions are doing but I am
	   not able to decode certain functions line-by-line.

Currently, I am studying in depth about 'git objects' and the submodule command on the git Documentation.
What else would you advise me to do to strengthen my understanding of the code and git in general?

Regards,
Shourya Shukla

[1]: https://github.com/periperidip/git/blob/v2.25.0/submodule.c
[2]: https://lore.kernel.org/git/20200201173841.13760-1-shouryashukla.oo@gmail.com/
[3]: https://github.com/periperidip/git/blob/v2.25.0/submodule.c#L168



* Re: Conversion of 'git submodule' to C: need some help
  2020-02-09 11:13 Conversion of 'git submodule' to C: need some help Shourya Shukla
@ 2020-02-13 13:33 ` Johannes Schindelin
  0 siblings, 0 replies; 3+ messages in thread
From: Johannes Schindelin @ 2020-02-13 13:33 UTC
  To: Shourya Shukla; +Cc: git, chriscool, peff, t.gummerer

Hi Shourya,

just adding a little to what Abhishek said (which was pretty sound
advice!) below.

On Sun, 9 Feb 2020, Shourya Shukla wrote:

> I am facing some problems and would love some insight on them:
>
> 	1. What exactly are we aiming for in [3]? To replace the function completely
> 	   or to just add some 'repo_submodule_init' functionality?

If you follow the "Git blame" link in the breadcrumb menu, you will get to
the commit that added the TODO:
https://github.com/periperidip/git/commit/18cfc0886617e28fb6d29d579bec0ffcdb439196

Unfortunately, it does not necessarily help me understand what that TODO
is about. So let's analyze the code:

int add_submodule_odb(const char *path)
{
	struct strbuf objects_directory = STRBUF_INIT;
	int ret = 0;
	ret = strbuf_git_path_submodule(&objects_directory, path, "objects/");
	if (ret)
		goto done;
	if (!is_directory(objects_directory.buf)) {
		ret = -1;
		goto done;
	}
	add_to_alternates_memory(objects_directory.buf);
done:
	strbuf_release(&objects_directory);
	return ret;
}

Okay, so this just registers the submodule's object database as an
in-memory alternate, provided its `objects/` directory exists (if it does
not exist, the submodule is probably _already_ using the superproject's
database).

To understand what I am talking about, have a look at this document:
https://git-scm.com/docs/gitrepository-layout#Documentation/gitrepository-layout.txt-objects

So what does the function do that was suggested as a better alternative?

int repo_submodule_init(struct repository *subrepo,
			struct repository *superproject,
			const struct submodule *sub)
{
	struct strbuf gitdir = STRBUF_INIT;
	struct strbuf worktree = STRBUF_INIT;
	int ret = 0;


	if (!sub) {
		ret = -1;
		goto out;
	}


	strbuf_repo_worktree_path(&gitdir, superproject, "%s/.git", sub->path);
	strbuf_repo_worktree_path(&worktree, superproject, "%s", sub->path);


	if (repo_init(subrepo, gitdir.buf, worktree.buf)) {
		/*
		 * If initialization fails then it may be due to the submodule
		 * not being populated in the superproject's worktree.  Instead
		 * we can try to initialize the submodule by finding its gitdir
		 * in the superproject's 'modules' directory.  In this case the
		 * submodule would not have a worktree.
		 */
		strbuf_reset(&gitdir);
		strbuf_repo_git_path(&gitdir, superproject,
				     "modules/%s", sub->name);


		if (repo_init(subrepo, gitdir.buf, NULL)) {
			ret = -1;
			goto out;
		}
	}


	subrepo->submodule_prefix = xstrfmt("%s%s/",
					    superproject->submodule_prefix ?
					    superproject->submodule_prefix :
					    "", sub->path);

out:
	strbuf_release(&gitdir);
	strbuf_release(&worktree);
	return ret;
}

Ah, that populates a complete `struct repository`! I fear, however, that
our object lookup is currently not tied to such a `struct repository`
instance. So I think that this TODO can only be addressed once many more
patch series like
https://lore.kernel.org/git/f1e4da02-9411-8a93-ca62-6d7ae7bf4ae8@gmail.com/
have made it not only to the Git mailing list, but into `master`.
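
To make the fallback order in `repo_submodule_init` a bit more concrete,
here is a minimal standalone sketch. It is not Git's code: the names
`pick_submodule_gitdir`, `gitdir_ok_fn` and `only_modules_ok` are made up
for illustration, and the callback merely stands in for `repo_init`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback: nonzero if `gitdir` could be opened as a repository. */
typedef int (*gitdir_ok_fn)(const char *gitdir);

/* Stand-in predicate for the demo: pretend only ".../modules/..." gitdirs exist. */
static int only_modules_ok(const char *gitdir)
{
	return strstr(gitdir, "/modules/") != NULL;
}

/*
 * Hypothetical helper, not part of Git. Writes the first usable candidate
 * into `out` and returns 0. Returns -1 if neither candidate is usable or
 * `out` is too small; the contents of `out` are then unspecified.
 */
static int pick_submodule_gitdir(char *out, size_t outlen,
				 const char *super_worktree,
				 const char *super_gitdir,
				 const char *sub_path, const char *sub_name,
				 gitdir_ok_fn ok)
{
	int n;

	assert(out && ok);

	/* First choice: the submodule as populated in the superproject's worktree. */
	n = snprintf(out, outlen, "%s/%s/.git", super_worktree, sub_path);
	if (n >= 0 && (size_t)n < outlen && ok(out))
		return 0;

	/* Fallback: the gitdir kept under the superproject's "modules" directory. */
	n = snprintf(out, outlen, "%s/modules/%s", super_gitdir, sub_name);
	if (n >= 0 && (size_t)n < outlen && ok(out))
		return 0;

	return -1;
}
```

Note that the first candidate is built from the submodule's path while the
fallback is built from its name, mirroring `sub->path` and `sub->name` in
the function quoted above.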

> 	2. Something I inferred was that functions with names of the pattern 'strbuf_git_*'
> 	   are trying to 'create a path'(are they physically creating the path or just
> 	   instructing git about them?) while functions of the pattern 'git_*' are trying
> 	   to check some conditions denoted by their function names(for instance
> 	   'git_config_rename_section_in_file')? Is this inference correct to some extent?

All `strbuf_*()` functions work on our "string class" (I forgot who said
it, but it is true that any sufficiently advanced C project sooner or
later develops their own string data type).

To know whether the functions in question create a path or not, you will
have to find their documentation in the appropriate header file (usually
`strbuf.h`), or absent that, find and understand their implementation
(usually in `strbuf.c`).
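
To illustrate what such a "string class" boils down to, here is a minimal
sketch of a growable string buffer. It is deliberately not Git's `strbuf`
API: `struct sbuf`, `SBUF_INIT`, `sbuf_addstr` and `sbuf_release` are
hypothetical names, and the real thing does a lot more.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type for illustration; NOT Git's actual `struct strbuf`. */
struct sbuf {
	size_t len;   /* bytes used, excluding the trailing NUL */
	size_t alloc; /* bytes allocated for buf */
	char *buf;    /* NUL-terminated once anything has been added */
};

#define SBUF_INIT { 0, 0, NULL }

/* Append a C string, growing the allocation as needed. Returns 0, or -1 on OOM. */
static int sbuf_addstr(struct sbuf *sb, const char *s)
{
	size_t n, need;

	assert(sb && s);
	n = strlen(s);
	need = sb->len + n + 1;
	if (need > sb->alloc) {
		size_t new_alloc = sb->alloc ? sb->alloc * 2 : 16;
		char *p;

		if (new_alloc < need)
			new_alloc = need;
		p = realloc(sb->buf, new_alloc);
		if (!p)
			return -1;
		sb->buf = p;
		sb->alloc = new_alloc;
	}
	/* Copy the string including its NUL terminator. */
	memcpy(sb->buf + sb->len, s, n + 1);
	sb->len += n;
	return 0;
}

/* Free the storage and reset the buffer to its initial state. */
static void sbuf_release(struct sbuf *sb)
{
	free(sb->buf);
	sb->buf = NULL;
	sb->len = sb->alloc = 0;
}
```

The point is only that the buffer tracks its own length and allocation, so
callers can keep appending without doing the bookkeeping themselves.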

> 	3. How does one check which parts of a command have been completed? Is it checked
> 	   by looking at the file history or by comparing with the shell script of the command
> 	   or are there any other means?

You mean whether a scripted command has been completely converted to C?
There is no universal way to do that.

In the case of `git submodule`, I would say that a subcommand is converted
successfully when all parts except for the command-line option parsing
have been moved into the `submodule--helper`. Eventually,
`git-submodule.sh` will only have functions that parse command-line
options and then pass the result on to the helper. At that point, the
command-line option parsing can _also_ be moved into the helper. Or maybe
even the entire script in one go, I am not sure how big of a patch that
would be.

> 	4. Is it fine if I am not able to understand the purpose of certain functions right now(such as
> 	   'add_submodule_odb')? I am able to get a rough idea of what the functions are doing but I am
> 	   not able to decode certain functions line-by-line.

It is okay not to understand all the details, but if you want to work on
the code, you will need to understand at least the purpose, and if you
want to come up with a project plan (e.g. for GSoC), it will be _really_
helpful to form an understanding of the implementation details, too.

> Currently, I am studying in depth about 'git objects' and the submodule command on the git Documentation.
> What else would you advise me to do to strengthen my understanding of the code and git in general?

I don't know what in particular you want to strengthen. Typically, a good
way to learn enough about the code base in preparation for Google Summer
of Code or Outreachy is to read the code, and whenever anything is
unclear, try to learn about the data structures and/or the underlying
design by studying the files in `Documentation/` (in particular in the
`technical/` subdirectory) whose names seem relevant.

Ciao,
Johannes

>
> Regards,
> Shourya Shukla
>
> [1]: https://github.com/periperidip/git/blob/v2.25.0/submodule.c
> [2]: https://lore.kernel.org/git/20200201173841.13760-1-shouryashukla.oo@gmail.com/
> [3]: https://github.com/periperidip/git/blob/v2.25.0/submodule.c#L168
>
>


* Re: Conversion of 'git submodule' to C: need some help
@ 2020-02-09 15:00 Abhishek Kumar
  0 siblings, 0 replies; 3+ messages in thread
From: Abhishek Kumar @ 2020-02-09 15:00 UTC
  To: shouryashukla.oo; +Cc: Johannes.Schindelin, chriscool, git, peff, t.gummerer

Greetings Shourya

> 1. What exactly are we aiming for in [3]? To replace the function completely
> or to just add some 'repo_submodule_init' functionality?

We are aiming to convert calls to use `repo_submodule_init` instead
and remove this function.

If they differ in functionality, implement any changes to
`repo_submodule_init` such that code that already uses it runs without
any modifications.

> 2. Something I inferred was that functions with names of the pattern 'strbuf_git_*'
> are trying to 'create a path' (are they physically creating the path or just instructing git about them?)

`strbuf_git_*` construct a path to the git directory and append it to
the string buffer passed. They are not physically creating folders,
just creating a string variable that stores the path.
If anything, it returns -1 when the git directory does not exist already. [1]
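
The difference between building a path string and touching the filesystem
can be sketched as follows. This is a standalone illustration, assuming a
POSIX system; `build_path` and `path_is_directory` are hypothetical
helpers, not functions from Git.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: formats "<gitdir>/<sub>" into `out`.
 * It only writes to memory and never touches the filesystem.
 * Returns 0 on success, -1 if the result does not fit.
 */
static int build_path(char *out, size_t outlen,
		      const char *gitdir, const char *sub)
{
	int n;

	assert(out && gitdir && sub);
	n = snprintf(out, outlen, "%s/%s", gitdir, sub);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Separate step: ask the filesystem whether `path` is an existing directory. */
static int path_is_directory(const char *path)
{
	struct stat st;

	return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}
```

Building the string succeeds whether or not the directory exists; only the
second helper consults the disk, which is roughly the split between the
path construction and the `is_directory()` check in `add_submodule_odb`.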

> while functions of the pattern 'git_*' are trying to check some conditions denoted
> by their function names(for instance  'git_config_rename_section_in_file')?
> Is this inference correct to some extent?

While I cannot talk about whether your inference is correct,
`git_config_rename_section_in_file` does not **just** check the
condition.

In the case of `remove_path_from_gitmodules` (where I am guessing you
had this doubt), it removes the section from `.gitmodules` and returns
a negative value on failure. [2]

It's a common idiom in C: functions with intended side effects return
nonzero (usually negative) values on failure and zero otherwise. [3]
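
A small sketch of that idiom, using a made-up function rather than
anything from Git: `remove_entry` is hypothetical and edits a
NULL-terminated list in place (the side effect), reporting the outcome
through its return value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical example, not from Git: remove the first entry equal to
 * `key` from a NULL-terminated list, in place.
 * Returns 0 on success and -1 if `key` was not found.
 */
static int remove_entry(const char **list, const char *key)
{
	size_t i;

	assert(list && key);
	for (i = 0; list[i]; i++) {
		if (!strcmp(list[i], key)) {
			/* Shift the tail, including the NULL terminator, left by one. */
			for (; list[i]; i++)
				list[i] = list[i + 1];
			return 0;
		}
	}
	return -1;
}
```

A caller would then typically write `if (remove_entry(list, key) < 0)` and
handle the error, which is the same shape as checking the result of
`git_config_rename_section_in_file`.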

3. Not sure what you mean. Do elaborate.

4. Yes! Everyone has to begin at some point and learn. Feel free to
ask more questions when in doubt.

Regards
Abhishek

[1]: https://github.com/git/git/blob/de93cc14ab7e8db7645d8dbe4fd2603f76d5851f/submodule.c#L2257
[2]: https://github.com/git/git/blob/de93cc14ab7e8db7645d8dbe4fd2603f76d5851f/config.c#L3051
[3]: https://wiki.c2.com/?CeeIdioms

