From: "Mazo, Andrey" <amazo@checkvideo.com>
To: "git@vger.kernel.org" <git@vger.kernel.org>
Cc: "Mazo, Andrey" <amazo@checkvideo.com>,
	"Luke Diamand" <luke@diamand.org>,
	"Eric Sunshine" <sunshine@sunshineco.com>,
	"George Vanburgh" <gvanburgh@bloomberg.net>,
	"Lars Schneider" <larsxschneider@gmail.com>,
	"Miguel Torroja" <miguel.torroja@gmail.com>,
	"Romain Merland" <merlorom@yahoo.fr>,
	"Vitor Antunes" <vitor.hda@gmail.com>,
	"Andrew Oakley" <aoakley@roku.com>,
	"SZEDER Gábor" <szeder.dev@gmail.com>, Andrey <ahippo@yandex.com>,
	"Junio C Hamano" <gitster@pobox.com>
Subject: [RFC PATCH 2/2] git-p4: support loading changelist descriptions from files
Date: Fri, 22 Mar 2019 19:54:45 +0000
Message-ID: <bb3e14a3897c98762b0e656d583eaa408a6aba60.1553283214.git.amazo@checkvideo.com>
In-Reply-To: <cover.1553283214.git.amazo@checkvideo.com>

Our Perforce server experienced some kind of database corruption a few years ago.
While the file data and revision history are mostly intact,
some metadata for several changesets got lost.
For example, inspecting certain changelists produces errors.
"""
$ p4 describe -s 12345
Date 2019/02/26 16:46:17:
Operation: user-describe
Operation 'user-describe' failed.
Change 12345 description missing!
"""

While some metadata (like changeset descriptions) is lost for good,
most of it can be reconstructed via other commands:
 * `p4 changes -l -t //...@12345,12345` --
   to obtain the date and time, the author, and the beginning of the changeset description;
 * `p4 files -a //...@12345,12345` --
   to obtain file revisions, types, and actions;
 * `p4 diff2 -u //...@12344 //...@12345` --
   to get a unified diff of text files in a changeset;
 * `p4 print -o binary.blob@12345 //depot/binary.blob@12345` --
   to get a revision of a binary file.
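Several of these commands can be run with the global `-G` option, which makes `p4` write its records to stdout as a stream of Python-marshaled dicts (the same stream git-p4's `p4CmdList()` reads). A minimal sketch of decoding such a stream follows; the helper names `read_p4_g_records` and `p4_g` are hypothetical, the code targets Python 3 (where string fields produced by a real `p4 -G` typically come back as `bytes`), and the demonstration fakes the stream with `marshal.dumps()` so it runs without a Perforce server.

```python
import io
import marshal
import subprocess


def read_p4_g_records(stream):
    """Read Python-marshaled dicts from a binary stream until it is exhausted."""
    records = []
    while True:
        try:
            records.append(marshal.load(stream))
        except EOFError:
            # marshal.load() raises EOFError when no more objects remain.
            break
    return records


def p4_g(*args):
    """Run `p4 -G <args>` and decode its output (needs a configured p4 client).

    Example: p4_g("files", "-a", "//...@12345,12345")
    """
    proc = subprocess.Popen(["p4", "-G"] + list(args), stdout=subprocess.PIPE)
    records = read_p4_g_records(proc.stdout)
    proc.wait()
    return records


# Offline demonstration: fake a `p4 -G` stream with marshal.dumps().
fake_stream = io.BytesIO(marshal.dumps({"code": "stat", "change": "12345"}) +
                         marshal.dumps({"code": "stat", "change": "12346"}))
print([r["change"] for r in read_p4_g_records(fake_stream)])  # prints ['12345', '12346']
```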

It might be possible to teach git-p4 to fall back to other methods if `p4 describe` fails,
but that is probably too special-cased (it depends heavily on the kind and scale of the DB corruption),
so some manual intervention is perhaps acceptable.

So, with some manual work, it's possible to reconstruct the `p4 -G describe ...` output.
In our case, once git-p4 gets past the `p4 describe` stage,
it can proceed further just fine.
Thus, it's tempting to feed the resurrected metadata to git-p4 when a normal `p4 describe` would fail.
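Reconstructing the describe output mostly means merging one changelist-level record with the per-file records and flattening the latter into numbered keys (`depotFile0`, `action0`, `rev0`, `type0`, ...), which is the layout used by the recovery script later in this message. A minimal sketch, assuming records shaped like those from `p4 -G changes` (with `change`, `time`, `user`, `client`, `desc`) and `p4 -G files` (with `depotFile`, `action`, `rev`, `type`); the function name `build_describe_dict` is hypothetical, and fields such as the default `status` are assumptions to check against a real `p4 -G describe -s` record.

```python
def build_describe_dict(change_rec, file_recs):
    """Merge a changelist record and its file records into one dict shaped
    like the single record `p4 -G describe -s` returns (assumed layout)."""
    desc = {
        "code": "stat",
        # Assumption: recovered changelists are submitted ones.
        "status": change_rec.get("status", "submitted"),
    }
    # Changelist-level fields are copied over when present.
    for key in ("change", "time", "user", "client", "desc"):
        if key in change_rec:
            desc[key] = change_rec[key]
    # File-level fields are flattened into numbered keys: depotFile0, rev0, ...
    for i, rec in enumerate(file_recs):
        for key in ("depotFile", "action", "rev", "type"):
            desc["%s%d" % (key, i)] = rec[key]
    return desc


example = build_describe_dict(
    {"change": "12345", "time": "1551369630", "user": "username1",
     "client": "username1_hostname1", "desc": "A bug is fixed.\n"},
    [{"depotFile": "//depot/branch1/foo.sh", "action": "edit", "rev": "28", "type": "xtext"}])
print(example["depotFile0"], example["rev0"])  # prints //depot/branch1/foo.sh 28
```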

This functionality may also be useful for caching changelist information,
or for modifying changelist info before feeding it to git-p4.
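For the caching or editing use case, the descriptions only need to be pickled one file per changelist and registered in the config. A minimal sketch, written for Python 3 (`shlex.quote`); the function name `cache_descriptions` is hypothetical. It records absolute paths because the patch opens each configured path relative to whatever working directory git-p4 is in at that point.

```python
import os
import pickle
import shlex
import tempfile


def cache_descriptions(descriptions, out_dir):
    """Pickle each describe-style dict to <out_dir>/<change>.pickled.

    Returns the `git config --add` commands that would register those files
    with git-p4, using absolute paths so they do not depend on the working
    directory git-p4 happens to be in when it opens them.
    """
    commands = []
    for desc in descriptions:
        path = os.path.abspath(os.path.join(out_dir, "%s.pickled" % desc["change"]))
        with open(path, "wb") as f:
            pickle.dump(desc, f)
        commands.append("git config --add git-p4.damagedChangelists " + shlex.quote(path))
    return commands


# Demonstration with a hand-edited description written to a temporary directory.
demo_dir = tempfile.mkdtemp()
edited = {"change": "12345", "status": "submitted", "user": "username1",
          "time": "1551369630", "desc": "A bug is fixed.\n"}
for cmd in cache_descriptions([edited], demo_dir):
    print(cmd)
```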

A new config parameter is introduced to tell git-p4
to load certain changelist descriptions from files instead of from the server.
For simplicity, it's one pickled file per changelist,
and each value of the multi-valued parameter is the path to one such file.
```
git config --add git-p4.damagedChangelists 12345.pickled
git config --add git-p4.damagedChangelists 12346.pickled
```
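On the git-p4 side, the patch reads every value of `git-p4.damagedChangelists`, unpickles each file, checks for the `change`, `status`, `user` and `time` fields, and stores the dict under the integer changelist number; `importChanges()` then uses that dict instead of calling `p4_describe()`. A standalone sketch of the loading step, assuming the paths are passed in directly rather than read with git-p4's `gitConfigList()`, and raising `ValueError` where git-p4 would call `die()`; the name `load_damaged_changelists` is hypothetical. Unlike the patch, it closes each file and also treats truncated or non-pickle files as read errors.

```python
import os
import pickle
import tempfile

REQUIRED_FIELDS = ("change", "status", "user", "time")


def load_damaged_changelists(paths):
    """Return {changelist number: describe dict} for each pickled file in paths."""
    result = {}
    for path in paths:
        if not path:
            continue  # skip empty config values, as the patch does
        try:
            with open(path, "rb") as f:
                desc = pickle.load(f)
        except (IOError, EOFError, pickle.UnpicklingError) as e:
            raise ValueError("Can't read changelist description from %s: %s" % (path, e))
        if not isinstance(desc, dict) or not all(k in desc for k in REQUIRED_FIELDS):
            raise ValueError("Changelist description read from %s lacks required fields" % path)
        result[int(desc["change"])] = desc
    return result


# Demonstration: write one pickled description, then load it back.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "12345.pickled")
with open(demo_path, "wb") as f:
    pickle.dump({"change": "12345", "status": "submitted", "user": "username1", "time": "0"}, f)
print(sorted(load_damaged_changelists([demo_path, ""])))  # prints [12345]
```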

The following trivial script may be used to produce pickled `p4 -G describe`-compatible output.
"""
 #!/usr/bin/python2

 import pickle
 import time

 # recovered commits of interest
 changes = [
     {
         'change':     '12345',
         'status':     'submitted',
         'code':       'stat',
         'user':       'username1',
         'time':       str(int(time.mktime(time.strptime('2019/02/28 16:00:30', '%Y/%m/%d %H:%M:%S')))),
         'client':     'username1_hostname1',
         'desc':       'A bug is fixed.\nDetails are below:<lost>\n',
         'depotFile0': '//depot/branch1/foo.sh',
         'action0':    'edit',
         'rev0':       '28',
         'type0':      'xtext',
         'depotFile1': '//depot/branch1/bar.py',
         'action1':    'edit',
         'rev1':       '43',
         'type1':      'text',
         'depotFile2': '//depot/branch1/baz.doc',
         'action2':    'edit',
         'rev2':       '8',
         'type2':      'binary',
         'depotFile3': '//depot/branch1/qqq.c',
         'action3':    'edit',
         'rev3':       '6',
         'type3':      'ktext',
     },
 ]

 for change in changes:
     # one file per changelist, named after its number (e.g. 12345.pickled)
     with open('{0}.pickled'.format(change['change']), 'wb') as f:
         pickle.dump(change, f)
"""

Signed-off-by: Andrey Mazo <amazo@checkvideo.com>
---

Notes:
    Documentation changes and tests are obviously missing,
    but I hoped to get some feedback on the idea overall
    before working on those.

 git-p4.py | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/git-p4.py b/git-p4.py
index 40bc84573b..3133419280 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -24,10 +24,11 @@
 import stat
 import zipfile
 import zlib
 import ctypes
 import errno
+import pickle
 
 # support basestring in python3
 try:
     unicode = unicode
 except NameError:
@@ -2615,10 +2616,12 @@ def __init__(self):
         self.knownAlienLabelBranches = {}
 
         self.tz = "%+03d%02d" % (- time.timezone / 3600, ((- time.timezone % 3600) / 60))
         self.labels = {}
 
+        self.damagedChangelists = {}
+
     # Force a checkpoint in fast-import and wait for it to finish
     def checkpoint(self):
         self.gitStream.write("checkpoint\n\n")
         self.gitStream.write("progress checkpoint\n\n")
         out = self.gitOutput.readline()
@@ -3312,10 +3315,25 @@ def getAlienLabelBranchMapping(self):
         for mapping in alienLabelBranches:
             if mapping:
                 (alien, ours) = mapping.split(":")
                 self.knownAlienLabelBranches[alien] = ours
 
+    def loadDamagedChangelists(self):
+        damagedChangelists = gitConfigList("git-p4.damagedChangelists")
+        for clPickled in damagedChangelists:
+            if not clPickled:
+                continue
+
+            try:
+                clDesc = pickle.load(open(clPickled, 'rb'))
+                if not ("status" in clDesc and "user" in clDesc and "time" in clDesc and "change" in clDesc):
+                    die("Changelist description read from {0} doesn't have required fields".format(clPickled))
+            except (IOError, TypeError) as e:
+                die("Can't read changelist description dict from {0}: {1}".format(clPickled, str(e)))
+
+            self.damagedChangelists[int(clDesc["change"])] = clDesc
+
     def updateOptionDict(self, d):
         option_keys = {}
         if self.keepRepoPath:
             option_keys['keepRepoPath'] = 1
 
@@ -3413,11 +3431,14 @@ def searchParent(self, parent, branch, target):
             return None
 
     def importChanges(self, changes, origin_revision=0):
         cnt = 1
         for change in changes:
-            description = p4_describe(change)
+            if change in self.damagedChangelists:
+                description = self.damagedChangelists[change]
+            else:
+                description = p4_describe(change)
             self.updateOptionDict(description)
 
             if not self.silent:
                 sys.stdout.write("\rImporting revision %s (%s%%)" % (change, cnt * 100 / len(changes)))
                 sys.stdout.flush()
@@ -3704,10 +3725,12 @@ def run(self, args):
                     bad_changesfile = True
                     break
         if bad_changesfile:
             die("Option --changesfile is incompatible with revision specifiers")
 
+        self.loadDamagedChangelists()
+
         newPaths = []
         for p in self.depotPaths:
             if p.find("@") != -1:
                 atIdx = p.index("@")
                 self.changeRange = p[atIdx:]
-- 
2.19.2

