On Thu, Jan 13, 2022 at 02:04:38PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
> On Thu, Jan 13 2022, Patrick Steinhardt wrote:
>
> > When deleting refs from the loose-files refs backend, then we need to be
> > careful to also delete the same ref from the packed refs backend, if it
> > exists. If we don't, then deleting the loose ref would "uncover" the
> > packed ref. We thus always have to queue up deletions of refs for both
> > the loose and the packed refs backend. This is done in two separate
> > transactions, where the end result is that the reference-transaction
> > hook is executed twice for the deleted refs.
>
> But do we (which would be an issue before this series) delete the loose
> and then the packed one, thus racily exposing the stale ref to any
> concurrent repository reader, or do we first update the packed ref to
> the value of the now-locked loose ref to avoid such a race?

We first commit the packed-refs file so that the stale ref is not
exposed.

> > [...]
> > Fix this behaviour and don't execute the reference-transaction hook at
> > all when deleting refs in the packed-refs backend if it's driven by the
> > files backend. This works as expected even in case the refs to be deleted
> > only exist in the packed-refs backend because the loose-backend always
> > queues refs in its own transaction even if they don't exist such that
> > they can be locked for concurrent creation. And it also does the right
> > thing in case neither of the backends has the ref because that would
> > cause the transaction to fail completely.
>
> I do wonder if the fundamental approach here is the right
> one. I.e. changing the hook to only expose "real" updates, as opposed to
> leaving it as a lower-level facility to listen in on any sort of ref
> updates.
>
> In such a scenario we could imagine adding a third parameter or
> otherwise flag the update as "real" to the hook, so a dumber hook
> consumer could ignore the more verbose inter-transactional chatter.
>
> I say that because this change does the right thing for the use-case you
> have in mind, but if you e.g. imagine a more gentle background-friendly
> "gc" such a thing could be implemented by backing off as soon as it sees
> an ongoing transaction being started.

I've mostly been acting on the original report by Waleed. And I tend to
agree with his report given that we also got a workaround at GitLab
which filters out reference transactions which only consist of force
deletions because they're likely to be pruning refs in the packed
backend which are about to be uncovered.

The result is that execution of the reftx hook is dependent on how
well-packed a repository's refs are: when refs are packed we execute the
hook twice, whereas we execute it once when they are not. This is
surprising behaviour, even though one can definitely argue that it's
just working as intended.

I think ultimately the question boils down to whether we want to treat
the files backend as a single compound backend and whether the reftx
hook should treat it like that. If we treat it as a single backend, then
we shouldn't report a change in refs when pruning about-to-be-uncovered
refs given that it wouldn't have been visible, but it's only internal
cleanup. And neither should we report ref changes when repacking refs
into a single file given that from the backend's perspective nothing is
about to change.

Patrick

> With my ae35e16cd43 (reflog expire: don't lock reflogs using previously
> seen OID, 2021-08-23) not getting that more chatty data should be OK
> for such a hypothetical hook.
>
> But we might have more avoidable tripping over locks as the gc and ref
> transaction race one another to lock various things in the repository.
>
> Or maybe nobody cares in practice, just food for thought.
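
For illustration, the kind of filter mentioned above (dropping
transactions that consist only of force deletions) could look roughly
like the sketch below. It is not GitLab's actual code. It only assumes
the documented hook interface: the state ("prepared", "committed" or
"aborted") as the sole argument, and "<old-value> SP <new-value> SP
<ref-name> LF" lines on stdin. The function names are made up for this
example. Note that the heuristic also swallows genuine force deletions
issued by a user, which is why it is only a workaround.

```python
import sys


def is_zero_oid(oid):
    # All-zeroes object name; 40 hex digits for SHA-1, 64 for SHA-256.
    return len(oid) > 0 and set(oid) == {"0"}


def parse_updates(stream):
    """Parse '<old-value> SP <new-value> SP <ref-name>' lines into tuples."""
    updates = []
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue
        old, new, ref = line.split(" ", 2)
        updates.append((old, new, ref))
    return updates


def only_force_deletions(updates):
    # A force deletion has no expected old value and no new value,
    # so both fields are the all-zeroes object name.
    return bool(updates) and all(
        is_zero_oid(old) and is_zero_oid(new) for old, new, _ in updates
    )


def main(argv, stdin):
    state = argv[1] if len(argv) > 1 else ""
    updates = parse_updates(stdin)
    if only_force_deletions(updates):
        # Likely internal pruning of about-to-be-uncovered packed refs
        # (an assumption of this heuristic): stay silent.
        return 0
    for old, new, ref in updates:
        print(f"{state}: {ref} {old} -> {new}")
    return 0
```

To use this as an actual hook, the script would end with
`sys.exit(main(sys.argv, sys.stdin))` and be installed as
`hooks/reference-transaction`.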