From: ZheNing Hu <adlternative@gmail.com>
To: Git List <git@vger.kernel.org>
Cc: Junio C Hamano <gitster@pobox.com>,
	Christian Couder <christian.couder@gmail.com>,
	Hariom verma <hariom18599@gmail.com>, Jeff King <peff@peff.net>
Subject: [GSoC] Git Blog 4
Date: Sun, 13 Jun 2021 22:17:23 +0800
Message-ID: <CAOLTT8QHL-6-DxoRKtx5cVp_DePxtWYU4CuBweYfCG1hGZZhaA@mail.gmail.com>

My blog post for the fourth week is finished.
The web version is here:
https://adlternative.github.io/GSOC-Git-Blog-3/

## Week4: Trouble is a friend

At the beginning of this week, since my previous code
broke some GitHub CI tests, I tried to fix the bugs
related to the atom `%(raw)`. The most confusing thing
is that some tests pass on your local machine
but fail in GitHub's CI.

For example, I needed to add the `GPG` prerequisite to a test like this:

```sh
test_expect_success GPG 'basic atom: refs/tags/signed-empty raw' '
git cat-file tag refs/tags/signed-empty >expected &&
git for-each-ref --format="%(raw)" refs/tags/signed-empty >actual &&
sanitize_pgp <expected >expected.clean &&
sanitize_pgp <actual >actual.clean &&
echo "" >>expected.clean &&
test_cmp expected.clean actual.clean
'
```

Otherwise, systems that do not have GnuPG installed
cannot run the related tests. With the prerequisite, the
test is skipped there.
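To illustrate the idea behind a prerequisite, here is a minimal, self-contained sketch. It is not Git's test framework: `have_gpg` and `run_with_prereq` are hypothetical helpers invented for this example, and they only mimic the "run the test when the tool is available, otherwise skip it" behavior.

```sh
# Hypothetical helper: succeed only when a gpg binary is on PATH.
have_gpg () {
	command -v gpg >/dev/null 2>&1
}

# Hypothetical mini-runner: run the command string "$2" only when
# the check command "$1" succeeds, otherwise report a skip.
run_with_prereq () {
	if "$1"
	then
		if eval "$2"
		then
			echo "ok"
		else
			echo "not ok"
		fi
	else
		echo "skipped (missing prerequisite: $1)"
	fi
}

# On a machine without GnuPG this reports a skip instead of a failure.
run_with_prereq have_gpg 'gpg --version >/dev/null'
```

A test that needs GnuPG but has no such guard would report a failure on a machine without it, which is one way a test can pass locally and still fail in CI.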

In addition, the output of a script like `printf "%b" "a\0b\0c" >blob1`
was truncated at the first NUL on a 32-bit machine, but it worked
well on 64-bit machines, where the NULs were stored in the file as expected.
This made me think that Git's file decompression had an error on
32-bit machines, until I used an Ubuntu32 Docker container to
clone the Git repository and analyze the bug in depth. In the end,
I used `printf "a\0b\0c"` so that the output is no longer truncated
at NUL on 32-bit machines. Is there a better way to write binary data
to a file than `printf` and `echo`?
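One possible answer, as a sketch rather than a definitive recommendation: write a printable placeholder and convert it to NUL with `tr`, or spell the NUL as a full three-digit octal escape in the `printf` format string. The helper name `q_to_nul_sketch` is made up for this example; Git's test library provides a helper called `q_to_nul` in the same spirit.

```sh
# Hypothetical helper: turn every "Q" placeholder into a NUL byte.
q_to_nul_sketch () {
	tr 'Q' '\000'
}

# Write the five bytes a NUL b NUL c without any printf escapes.
printf 'aQbQc' | q_to_nul_sketch >blob1

# Write the same bytes with explicit three-digit octal escapes
# in the format string.
printf 'a\000b\000c' >blob2

# Inspect the bytes and check that both files are identical.
od -c blob1
wc -c <blob1
cmp blob1 blob2 && echo "blob1 and blob2 are identical"
```

The placeholder approach keeps the test data readable and does not depend on how a particular shell's `printf` or `echo` interprets `\0`.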

Since I am new to Docker, I would like to know whether there is a
way to run Git's GitHub CI jobs remotely or locally.

In the second half of this week, I tried to make `cat-file` reuse the
logic of `ref-filter`. I have to say that this was a very difficult process:
I ran `rebase -i` again and again to repair the content of previous commits,
squashing commits, splitting commits, rewording commit messages... Finally, I
submitted the patches to the Git mailing list in
[[PATCH 0/8] [GSOC][RFC] cat-file: reuse `ref-filter`
logic](https://lore.kernel.org/git/pull.980.git.1623496458.gitgitgadget@gmail.com/).
Now `cat-file` has learned most of the atoms in `ref-filter`. I am very
happy to be able to make Git support richer functionality through my own code.

Regrettably, `git cat-file --batch --batch-all-objects` seems to take up
a huge amount of memory on a large repo such as git.git, and it gets
killed by the Linux OOM killer. This is mainly because we make a large
number of copies of the objects' raw data. The original `git cat-file`
uses `read_object_file()` or `stream_blob()` to output the object's
raw data, but in `ref-filter` we have to copy the object's data into `v->s`.
It is difficult to eliminate `v->s` and print the output directly to the
final output buffer, because atoms like `%(if)` and `%(else)` need
buffers on the stack to build the final output string layer by layer, and
`cmp_ref_sorting()` needs `v->s` to compare two refs. In short, it is
very difficult for `ref-filter` to reduce copy overhead. I even thought
about using the string pool API `memintern()` to replace `xmemdupz()`,
but the effect does not seem significant: a large number of objects' data
would still reside in memory, so this may not be a good method.

Anyway, I will stay confident. I can solve these difficult problems with
the help of mentors and reviewers. `:)`
