From: ZheNing Hu <adlternative@gmail.com>
To: Git List <git@vger.kernel.org>
Cc: Junio C Hamano <gitster@pobox.com>,
	Eric Sunshine <sunshine@sunshineco.com>,
	Christian Couder <christian.couder@gmail.com>,
	Hariom verma <hariom18599@gmail.com>, Jeff King <peff@peff.net>,
	Shourya Shukla <periperidip@gmail.com>,
	olyatelezhnaya@gmail.com, ZheNing Hu <adlternative@gmail.com>
Subject: GSoC Git Proposal Draft - ZheNing Hu
Date: Fri, 2 Apr 2021 17:03:17 +0800
Message-ID: <CAOLTT8RfE4nn5NnjZh7xuF09-5=+K+_j_2kP0327HVdR4x_wAQ@mail.gmail.com>

Hello, Git,
I'm ZheNing Hu.
Here is my GSoC 2021 proposal draft.
A web version is available here:
https://docs.google.com/document/d/119k-Xa4CKOt5rC1gg1cqPr6H3MvdgTUizndJGAo1Erk/edit

Comments and corrections are welcome :)

----8<----
## Use ref-filter formats in git cat-file

### About Me
| Name | ZheNing Hu |
| ---------- | ------------------------------------------ |
| Major | Computer Science And Technology |
| Mobile no. | +86 15058356458 |
| Email | adlternative@gmail.com |
| IRC | adlternative (on #git-devel/#git@freenode) |
| Github | https://github.com/adlternative/ |
| Blogs | https://adlternative.github.io/ |
| Time Zone | CST (UTC +08:00) |

### Education & Background
* I am currently a second-year student majoring in computer science and
technology at Xi'an University of Posts & Telecommunications (China).
* In my freshman year, I joined the XiYou Linux Group of the
university and learned how to use Git to submit my own code to GitHub.
Over the past two years I have learned C, C++, Python and shell. I know
how to debug with gdb, and I am familiar with Linux system programming
and Linux network programming.
* I started studying the Git source code and contributing to Git
in December 2020.

### Me & Git
Around last November, I found a collection of projects,
[build-your-own-git](https://github.com/danistefanovic/build-your-own-x#build-your-own-git),
on GitHub that taught me how to write a simple Git. The mechanics of Git
are very interesting:

1. There are four types of objects in Git: BLOB, TREE, COMMIT, TAG
2. Loose objects are stored at `.git/objects/<sha1[0..1]>/<sha1[2..39]>`,
so the SHA-1 value of the object data serves as the storage address.
3. All branches are just references to commits.
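
To make point 2 concrete, here is a minimal Python sketch of how the storage address of a loose object can be derived. It assumes a SHA-1 repository; `loose_object_path` is a name made up for this illustration and is not part of Git. The file Git actually writes at that path is zlib-compressed, which this sketch does not model.

```python
import hashlib

def loose_object_path(content, obj_type="blob"):
    """Return the path under .git/ where a loose object with this content would live."""
    # Git hashes a header "<type> <size>\0" followed by the raw content.
    header = "{} {}".format(obj_type, len(content)).encode() + b"\x00"
    sha1 = hashlib.sha1(header + content).hexdigest()
    # The first two hex digits name the directory, the remaining 38 name the file.
    return ".git/objects/{}/{}".format(sha1[:2], sha1[2:])

print(loose_object_path(b"hello\n"))
```

The hex digits in the printed path should match what `git hash-object` reports for a file with the same content.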

Then I read `《Pro Git》` and Jiang Xin's `《Git Authoritative Guide》` and
learned how to use most Git subcommands.

Later, I started studying some of the Git source code. I found that Git has
at least 200,000 lines of C code and 200,000 lines of shell script
code, which left me a little confused about where to start.

But then, after I submitted my first patch, a lot of people in the Git
community gave me very enthusiastic guidance, which gave
me the courage to keep studying the Git source code. I then started making
my own contributions. You can find them here:
[gitgitgadget](https://github.com/gitgitgadget/git/pulls?q=is%3Apr+author%3Aadlternative+)
or
[git.kernel.org](https://git.kernel.org/pub/scm/git/git.git/log/?qt=grep&q=ZheNing+Hu)


These patches have been merged into the "master" branch:

#### [master]
* difftool.c: learn a new way start at specified file
[(mail list)](https://lore.kernel.org/git/pull.870.v6.git.1613739235241.gitgitgadget@gmail.com/)
* ls-files.c: add --deduplicate option
[(mail list)](https://lore.kernel.org/git/384f77a4c188456854bd86335e9bdc8018097a5f.1611485667.git.gitgitgadget@gmail.com/)
* ls_files.c: consolidate two for loops into one
[(mail list)](https://lore.kernel.org/git/f9d5e44d2c08b9e3d05a73b0a6e520ef7bb889c9.1611485667.git.gitgitgadget@gmail.com/)
* ls_files.c: bugfix for --deleted and --modified
[(mail list)](https://lore.kernel.org/git/8b02367a359e62d7721b9078ac8393a467d83724.1611485667.git.gitgitgadget@gmail.com/)
* builtin/*: update usage format
[(mail list)](https://lore.kernel.org/git/d3eb6dcff1468645560c16e1d8753002cbd7f143.1609944243.git.gitgitgadget@gmail.com/)

These patches are in the queue:

#### [next]

* format-patch: allow a non-integral version numbers
[(mail list)](https://lore.kernel.org/git/pull.885.v10.git.1616497946427.gitgitgadget@gmail.com/)
* [GSOC] commit: add --trailer option
[(mail list)](https://lore.kernel.org/git/pull.901.v14.git.1616507757999.gitgitgadget@gmail.com/)

#### [WIP]

* gitk: add right-click context menu for tags
[(mail list)](https://lore.kernel.org/git/pull.866.v5.git.1614227923637.gitgitgadget@gmail.com/)
* [GSOC] trailer: pass arg as positional parameter
[(mail list)](https://lore.kernel.org/git/5894d8c4b36466326b0427bfda0d6981e52a0907.1617185147.git.gitgitgadget@gmail.com/)

### Proposed Project

* Git historically had a problem of duplicated implementations of
some logic. For example, it had at least four different implementations
for formatting the output of different commands.

* `git cat-file` is a Git subcommand used to show information about a Git object.

* `git cat-file --batch` reads object names from stdin and prints each
object's information and contents to stdout. The only difference between
`--batch-check` and `--batch` is
that `--batch-check` does not print the contents of the object.
* `--batch-all-objects` shows all objects when combined with `--batch` or `--batch-check`.
* `--batch-check` and `--batch` both accept a format string with these atoms:
  * `%(objectname)`: the 40-hex-digit SHA-1 string of the Git object
  * `%(objecttype)`: the object type: blob, tree, commit or tag
  * `%(objectsize)`: the size of the object's content
  * `%(objectsize:disk)`: the size the object itself takes up on disk
  * `%(deltabase)`: if the object is stored as a delta in Git,
  the SHA-1 string of its delta base
  * `%(rest)`: the input line is split at the first whitespace (space
  or TAB); everything before it is treated as the object name, and
  everything after it is printed in place of `%(rest)`
* In the original design, `expand_format()` is used twice: the first time,
in `batch_objects()`, to parse the format string, and the second time,
in `batch_object_write()`, to format the
object information and store it in a string buffer. Eventually the
contents of this buffer are printed to standard output.


* [Olga](olyatelezhnaya@gmail.com) has been involved in integrating
`ref-filter` logic into `cat-file`
[(link)](https://github.com/git/git/pull/568). The problems with her
patches at that time were:
  1. The patch series was too long, which made it difficult to adjust and merge.
  2. I don't think it was a good idea to use `struct ref_array_item`
  instead of `struct expand_data` for `cat-file` to fit the
  `ref-filter` logic, because the two structs are not closely related.
  [(link)](https://github.com/git/git/pull/568/commits/e0aafaa76476ba5528f84b794043531ebd4633c7#diff-d03110606a7ed8cb9832bbcc572f1093435cc6115c4e58d7a7750af3c33319a7R238)

* Because part of the functionality of `git for-each-ref` is very similar to
that of `git cat-file`, I think `git cat-file` can borrow some workable
solutions from it.
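
The two uses of `expand_format()` described earlier in this section can be pictured with a small Python model. This is only a sketch of the idea, not Git's C code: `requested_atoms`, `split_rest` and `expand` are hypothetical names, and the object values are hard-coded instead of being looked up in a repository.

```python
import re

# Matches atoms such as %(objectname) or %(objectsize:disk).
ATOM = re.compile(r"%\(([^)]+)\)")

def requested_atoms(fmt):
    """First pass: record which atoms the format string asks for."""
    return set(ATOM.findall(fmt))

def split_rest(line):
    """Split an input line at the first run of whitespace into (object name, rest)."""
    parts = line.split(None, 1)
    name = parts[0] if parts else ""
    rest = parts[1] if len(parts) > 1 else ""
    return name, rest

def expand(fmt, info):
    """Second pass: replace every atom with the value looked up for one object."""
    def replace(match):
        atom = match.group(1)
        if atom not in info:
            raise ValueError("unknown format element: " + atom)
        return info[atom]
    return ATOM.sub(replace, fmt)

fmt = "%(objectname) %(objecttype) %(objectsize) %(rest)"
name, rest = split_rest("HEAD:greeting.txt  my label")
# Hard-coded values stand in for a real object lookup of `name`.
info = {
    "objectname": "ce013625030ba8dba906f756967f9e9ca394464a",
    "objecttype": "blob",
    "objectsize": "6",
    "rest": rest,
}
print(sorted(requested_atoms(fmt)))
print(expand(fmt, info))
```

Knowing the requested atoms before any object is read lets a caller skip expensive lookups, such as the on-disk size, when the format does not ask for them.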

#### My possible solutions:

1. Use the same [solution](https://github.com/git/git/pull/568/commits/cc40c464e813fc7a6bd93a01661646114d694d76)
as Olga: add a member `struct ref_format format` to `struct
batch_options`.
2. Use the function
[`verify_ref_format()`](https://github.com/gitgitgadget/git/blob/84d06cdc06389ae7c462434cb7b1db0980f63860/ref-filter.c#L904)
to replace the first `expand_format()` call for parsing format strings.
3. Write a function like
[`format_ref_array_item()`](https://github.com/gitgitgadget/git/blob/84d06cdc06389ae7c462434cb7b1db0980f63860/ref-filter.c#L2392)
that gets information about objects, and use `get_object()` to grab the
information we need (or just use `grab_common_value()`).
4. Migrating `%(rest)` may require learning how `%(if)` and `%(else)`
are handled.
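
Steps 2 and 3 amount to parsing the format string once and then rendering it for every object. The Python sketch below models that split under stated assumptions: `parse_format` and `format_item` are hypothetical names that only play roles analogous to `verify_ref_format()` and `format_ref_array_item()`, and `KNOWN_ATOMS` lists just the `cat-file` atoms from this proposal. It does not reflect the actual `ref-filter` data structures.

```python
import re

TOKEN = re.compile(r"%\(([^)]+)\)")
# Assumption: only the cat-file atoms discussed in this proposal are accepted.
KNOWN_ATOMS = {"objectname", "objecttype", "objectsize",
               "objectsize:disk", "deltabase", "rest"}

def parse_format(fmt):
    """Parse fmt once into ("lit", text) and ("atom", name) pieces, rejecting unknown atoms."""
    pieces, pos = [], 0
    for match in TOKEN.finditer(fmt):
        if match.start() > pos:
            pieces.append(("lit", fmt[pos:match.start()]))
        name = match.group(1)
        if name not in KNOWN_ATOMS:
            raise ValueError("unknown format element: " + name)
        pieces.append(("atom", name))
        pos = match.end()
    if pos < len(fmt):
        pieces.append(("lit", fmt[pos:]))
    return pieces

def format_item(pieces, info):
    """Render one object from the pre-parsed pieces without rescanning the format string."""
    return "".join(text if kind == "lit" else info[text] for kind, text in pieces)

pieces = parse_format("%(objectname) %(objecttype)")
for info in ({"objectname": "abc123", "objecttype": "blob"},
             {"objectname": "def456", "objecttype": "tree"}):
    print(format_item(pieces, info))
```

With this split, a malformed format string is reported once up front, and the per-object work is reduced to a walk over the parsed pieces.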

### Are you applying for other Projects?

No, Git is the only one.

### Blogging about Git

In fact, while studying the Git source code, I often write
[blog posts](https://adlternative.github.io/tags/git/) to record what
I have learned, which helps me recall it later.
Most of the posts so far were written in Chinese,
but during GSoC, I promise all my posts will be written in
English.

### Timeline
* May 18 ~ June 8
  * Look for a scheme to make `git cat-file` and `ref-filter` more
  compatible, and start the integration attempt.
  * *Stretch Goal*: move `%(objectsize)`, `%(objecttype)`, `%(objectname)`.

* June 8 ~ July 8
  * Move the main body of `git cat-file` onto the `ref-filter`
  logic and complete the basic functionality.
  * *Stretch Goal*: move `%(deltabase)`, `%(objectsize:disk)`, `%(rest)`.

* July 8 ~ August 17
  * Analyze the performance of ref-filter and try to reduce the
  cost of its heavy string matching. If I have some
  spare time, I could also work on other interesting patches.
  * *Stretch Goal*: Optimize ref-filter performance.

### Availability
My exams are expected to end in June. The time before the final exams
when I have no classes, as well as the summer vacation afterwards, is
basically my self-study time. Although I am taking many other
courses, I have enough time and energy to complete the daily tasks. I
stay active on the Git mailing list, and you can reach me at any time as
long as I am not sleeping. :)


### Post GSoC
* I love the open source philosophy, and I am willing to spread the spirit of
openness and freedom and to explore technology with like-minded
people.
* In my contact with the Git community over the past few
months, many people gave me great encouragement.
I hope I can keep my passion for Git alive, contribute my own code,
and pass this cool thing on.
* I am willing to contribute code to the Git community for a long time
after the end of GSoC.
* I hope the Git community can give me a chance to participate in
GSoC. I sincerely thank GSoC and the Git community!

