[PATCH 0/2] git-p4: encoding of data from perforce

* [PATCH 0/2] git-p4: encoding of data from perforce
@ 2021-04-12  8:52 Andrew Oakley
  2021-04-12  8:52 ` [PATCH 1/2] git-p4: avoid decoding more " Andrew Oakley
  2021-04-12  8:52 ` [PATCH 2/2] git-p4: do not decode data from perforce by default Andrew Oakley
From: Andrew Oakley @ 2021-04-12  8:52 UTC
  To: git; +Cc: Luke Diamand, Feiyang Xue, Tzadik Vanderhoof

When using Python 3, git-p4 fails to handle data from Perforce that is
not valid UTF-8.  In large repositories such data is very likely to
exist, because Perforce itself does not validate the data by default.
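Here is a minimal Python sketch of that failure mode. It assumes only that the description arrives as raw bytes. The sample text and the `try_decode` helper are hypothetical and are not part of git-p4. Strict UTF-8 decoding rejects a Latin-1 encoded "é":

```python
# Hypothetical changelist description: "café fix" encoded as Latin-1.
# Byte 0xE9 ("é") starts a UTF-8 multi-byte sequence that is never
# completed, so these bytes are not valid UTF-8.
raw_desc = "caf\u00e9 fix".encode("latin-1")

def try_decode(data: bytes):
    """Return the text decoded as strict UTF-8, or None if that fails."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return None

print(try_decode(raw_desc))                    # None: strict decoding fails
print(try_decode("caf\u00e9".encode("utf-8"))) # café: valid UTF-8 decodes
```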

Historically git-p4 has just passed whatever bytes it got from Perforce
into git.  This seems like a sensible approach: git-p4 has no idea which
encoding may have been used, and it seems likely that different
encodings are mixed within a single repository.
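The pass-through approach can be sketched as follows. git-p4 feeds commits to `git fast-import`, whose `data <count>` command takes a byte count followed by raw bytes, so the message never has to be decoded. The `fast_import_commit` helper, the ref and the committer line below are hypothetical examples, not git-p4's actual code:

```python
def fast_import_commit(ref: bytes, committer: bytes, message: bytes) -> bytes:
    """Build a minimal git fast-import 'commit' command.

    The message is copied through verbatim as bytes; 'data' expects a
    byte count, so no knowledge of the message's encoding is required.
    """
    return (
        b"commit " + ref + b"\n"
        + b"committer " + committer + b"\n"
        + b"data %d\n" % len(message)
        + message + b"\n"
    )

# Hypothetical non-UTF-8 message passed through untouched.
message = "caf\u00e9 fix".encode("latin-1")
cmd = fast_import_commit(
    b"refs/heads/master",
    b"A U Thor <author@example.com> 1618217520 +0000",
    message,
)
print(cmd)
```

Because every piece is bytes, nothing in this path can raise `UnicodeDecodeError`, whatever encoding the original Perforce client used.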

I tried to do a more thorough job by moving more of git-p4 over to
using bytes.  Unfortunately the changes ended up being large and hard to
review.  In most cases it's probably sufficient just to avoid decoding
the commit messages.
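Here is a rough sketch of decoding everything except the commit message. It assumes records shaped like the dictionaries `p4 -G` produces, where field names and values come back as bytes under Python 3. It also assumes the message lives in a `desc` field. `decode_record` is a hypothetical helper, not the code in these patches:

```python
def decode_record(record: dict, keep_bytes=frozenset({"desc"})) -> dict:
    """Decode field names and most values as UTF-8.

    Field names are assumed to be ASCII. Fields listed in keep_bytes
    (by default only the hypothetical 'desc' message field) are left as
    raw bytes so they can be passed to git unchanged.
    """
    out = {}
    for key, value in record.items():
        name = key.decode("utf-8") if isinstance(key, bytes) else key
        if isinstance(value, bytes) and name not in keep_bytes:
            value = value.decode("utf-8")
        out[name] = value
    return out

# Hypothetical record with a Latin-1 encoded description.
record = {b"code": b"stat", b"change": b"1234", b"desc": b"caf\xe9 fix\n"}
decoded = decode_record(record)
print(decoded["change"])  # 1234
print(decoded["desc"])    # b'caf\xe9 fix\n'
```

Other fields could still fail to decode. This matches the hedge in the text that skipping the message is probably sufficient in most cases, not all.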

There have been a couple of previous proposals to decode this data
using a user-configured encoding:
http://public-inbox.org/git/CAE5ih7-F9efsiV5AQmw3ocjiy+BT6ZAT5fA0Lx0OSkVTO8Kqjg@mail.gmail.com/T/
http://public-inbox.org/git/20210409153815.7joohvmlnh6itczc@tb-raspi4/T/
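For contrast, the user-configured-encoding idea amounts to something like the following sketch. The parameter names and the fallback step are assumptions for illustration and are not taken from either linked proposal:

```python
def decode_with_configured_encoding(data: bytes, encoding: str = "utf-8",
                                    fallback: str = "cp1252") -> str:
    """Decode data with a user-chosen encoding (hypothetical parameter).

    If that fails, decode with a fallback encoding, replacing any bytes
    the fallback cannot map, so that decoding never raises.
    """
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return data.decode(fallback, errors="replace")

print(decode_with_configured_encoding(b"caf\xe9 fix"))               # café fix
print(decode_with_configured_encoding(b"caf\xe9", encoding="latin-1"))  # café
```

A single configured encoding cannot be right for every commit if a repository mixes encodings. That is why passing the bytes through unchanged is attractive.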
