* [BUG b4] Encoding issues with --auto-to-cc
From: Duje Mihanović @ 2023-07-24 10:49 UTC
To: tools
I decided to try using b4 to submit a patchset for adding Marvell PXA1908 ARM
SoC support. Having enrolled an existing branch, I ran `b4 prep -c` and got
the following error (this is with the -d switch added):
Collecting from: [PATCH v2 06/10] dt-bindings: clock: Add documentation for Marvell PXA1908
Running git --no-pager rev-parse --show-toplevel
Changing dir to /home/duje/code/linux
Running /home/duje/code/linux/scripts/get_maintainer.pl --nogit --nogit-fallback --nogit-chief-penguins --norolestats --nol
Changing back into /home/duje/code/linux/Documentation/process
Traceback (most recent call last):
  File "/usr/bin/b4", line 33, in <module>
    sys.exit(load_entry_point('b4==0.12.3', 'console_scripts', 'b4')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/b4/command.py", line 360, in cmd
    cmdargs.func(cmdargs)
  File "/usr/lib/python3.11/site-packages/b4/command.py", line 76, in cmd_prep
    b4.ez.cmd_prep(cmdargs)
  File "/usr/lib/python3.11/site-packages/b4/ez.py", line 1994, in cmd_prep
    auto_to_cc()
  File "/usr/lib/python3.11/site-packages/b4/ez.py", line 1896, in auto_to_cc
    for tname, pairs in (('To', get_addresses_from_cmd(tocmd, msgbytes)),
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/b4/ez.py", line 948, in get_addresses_from_cmd
    addrs = out.strip().decode()
            ^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x87 in position 201: invalid start byte
I suspected that using an existing branch instead of creating a new one might
have caused the problem, so I unpacked a 6.5-rc2 tarball into /tmp,
initialized a Git repository there, and used `git am` to apply only patch
06/10, the one that triggered the error in the other repository. When I ran
`b4 prep -c` again, b4 collected the addresses without issues. However, after
I applied the whole set to the /tmp repository with `git am`, `b4 prep -c`
failed with the same error on the same patch.
Steps to reproduce:
- Checkout Linux 6.5-rc2
- Run `b4 prep -F "<20230721210042.21535-1-duje.mihanovic@skole.hr>" -n
<any branch name>`
- Run `b4 prep -c`
--
Regards,
Duje
* Re: [BUG b4] Encoding issues with --auto-to-cc
From: Konstantin Ryabitsev @ 2023-07-26 19:48 UTC
To: Duje Mihanović; +Cc: tools
On Mon, Jul 24, 2023 at 12:49:41PM +0200, Duje Mihanović wrote:
> Steps to reproduce:
> - Checkout Linux 6.5-rc2
> - Run `b4 prep -F "<20230721210042.21535-1-duje.mihanovic@skole.hr>" -n
> <any branch name>`
> - Run `b4 prep -c`
Thank you for that -- I can verify that it's happening.
bugbot assign to me
-K
* Re: Encoding issues with --auto-to-cc
From: Kernel.org Bugbot @ 2023-07-26 19:54 UTC
To: duje.mihanovic, tools
Hello:
This conversation is now tracked by Kernel.org Bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=217713
There is no need to do anything else, just keep talking.
--
Deet-doot-dot, I am a bot.
Kernel.org Bugzilla (peebz 0.1)
* Re: [BUG b4] Encoding issues with --auto-to-cc
From: Konstantin Ryabitsev @ 2023-07-26 20:29 UTC
To: Duje Mihanović; +Cc: tools
On Mon, Jul 24, 2023 at 12:49:41PM +0200, Duje Mihanović wrote:
> I decided to try using b4 to submit a patchset for adding Marvell PXA1908 ARM
> SoC support. Having enrolled an existing branch, I ran `b4 prep -c` and got
> the following error (this is with the -d switch added):
So, there's apparently something very interesting about that final ć in your
name that trips up get_maintainer.pl. For example, run the following:
$ ./scripts/get_maintainer.pl -f Documentation/devicetree/bindings/clock/marvell,pxa1908.yaml
You will get back a byte sequence \x87 where your name should be:
"<87>" <duje.mihanovic@skole.hr> (in file)
This is because ć is 0xC4 0x87, but I have no idea why get_maintainer.pl trips
up and splits the unicode sequence into two bytes. It seems to want to do that
for anything above base extended ascii (Latin-A).
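As an illustration of the byte sequence described above, the following Python sketch shows that ć (U+0107) encodes to 0xC4 0x87 in UTF-8 and that a lone 0x87 cannot be decoded on its own. The `sample` bytes are a hypothetical stand-in modelled on the get_maintainer.pl line quoted above, not captured output, and `describe_decode` is a helper name made up for this example.

```python
# UTF-8 encodes "ć" (U+0107) as a lead byte 0xC4 plus a continuation byte 0x87.
encoded = "ć".encode("utf-8")

# Hypothetical stand-in for the get_maintainer.pl line: only 0x87 survives.
sample = b'"\x87" <duje.mihanovic@skole.hr> (in file)'


def describe_decode(raw: bytes) -> str:
    """Return the decoded text, or a short description of the decode error."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"{exc.reason} (byte {raw[exc.start]:#04x} at position {exc.start})"


print(encoded)                  # the two-byte sequence for "ć"
print(describe_decode(sample))  # reports the stray continuation byte
```

A continuation byte such as 0x87 is only valid after a lead byte, which is why the decoder reports an invalid start byte rather than a truncated sequence.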
I can "fix" this in b4 by forcing it to ignore any unrecognized unicode errors
in get_maintainer.pl output, but it's not a real fix for the underlying
problem.
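A minimal sketch of what such a workaround could look like, assuming it is done through the `errors` argument of `bytes.decode()`. The thread does not show the actual change made to b4, so `decode_tool_output` and the sample bytes are hypothetical names and data used only for illustration.

```python
def decode_tool_output(out: bytes, errors: str = "ignore") -> str:
    """Decode output from an external command without raising on bad bytes."""
    return out.strip().decode("utf-8", errors=errors)


# Hypothetical stand-in for the get_maintainer.pl output discussed above.
sample = b'"\x87" <duje.mihanovic@skole.hr> (in file)\n'

ignored = decode_tool_output(sample)                      # drops the stray byte
replaced = decode_tool_output(sample, errors="replace")   # keeps a U+FFFD marker

print(ignored)
print(replaced)
```

Either error handler keeps the address usable, but the name is still lost, which is why this only avoids the crash and does not address the underlying problem.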
-K
* Re: Encoding issues with --auto-to-cc
From: Kernel.org Bugbot @ 2023-07-26 20:37 UTC
To: konstantin, duje.mihanovic, tools
Konstantin Ryabitsev writes in commit 034f2fb2ac27c89c1c7ab2af04d26ba63be9ea6c:
ez: ignore invalid unicode returned by get_maintainer
There's a bug in get_maintainer.pl that returns invalid unicode in
certain situations (see bug linked below). We can't fix this in b4, but
at least we can avoid crashing when we encounter this problem.
Reported-by: Duje Mihanović <duje.mihanovic@skole.hr>
Link: https://msgid.link/1940519.PYKUYFuaPT@radijator
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217713
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>
(via https://git.kernel.org/pub/scm/utils/b4/b4.git/commit/?id=034f2fb2ac27)
--
Deet-doot-dot, I am a bot.
Kernel.org Bugzilla (peebz 0.1)
* Re: [BUG b4] Encoding issues with --auto-to-cc
From: Duje Mihanović @ 2023-07-27 10:02 UTC
To: tools
On Wednesday, July 26, 2023 22:29:55 CEST, Konstantin Ryabitsev wrote:
> So, there's apparently something very interesting about that final ć in your
> name that trips up get_maintainer.pl. For example, run the following:
>
> $ ./scripts/get_maintainer.pl -f Documentation/devicetree/bindings/clock/marvell,pxa1908.yaml
>
> You will get back a byte sequence \x87 where your name should be:
>
> "<87>" <duje.mihanovic@skole.hr> (in file)
>
> This is because ć is 0xC4 0x87, but I have no idea why get_maintainer.pl trips
> up and splits the unicode sequence into two bytes. It seems to want to do that
> for anything above base extended ascii (Latin-A).
>
> I can "fix" this in b4 by forcing it to ignore any unrecognized unicode errors
> in get_maintainer.pl output, but it's not a real fix for the underlying
> problem.
I think it's wrong to say get_maintainer splits up the unicode sequence; it
straight up ignores the first half of the sequence. Perhaps I should file a
bug for get_maintainer.pl?
Regards,
Duje
* Re: [BUG b4] Encoding issues with --auto-to-cc
From: Konstantin Ryabitsev @ 2023-07-27 14:42 UTC
To: Duje Mihanović; +Cc: tools
On Thu, Jul 27, 2023 at 12:02:16PM +0200, Duje Mihanović wrote:
> I think it's wrong to say get_maintainer splits up the unicode sequence; it
> straight up ignores the first half of the sequence. Perhaps I should file a
> bug for get_maintainer.pl?
Yes, you should report the bug for sure. As you can tell from your originally
submitted series, it's manifested itself multiple times already:
To: =?UTF-8?q?=87?= <duje.mihanovic@skole.hr>
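The name in the header above is an RFC 2047 encoded-word. A short sketch using `decode_header` from Python's standard `email.header` module suggests it carries the same lone 0x87 byte. The `header` string is copied from the line above without the `To:` prefix; the other variable names are only for illustration.

```python
from email.header import decode_header

# Header value copied from the message above, without the "To:" prefix.
header = "=?UTF-8?q?=87?= <duje.mihanovic@skole.hr>"

# decode_header returns (bytes, charset) pairs; the first pair is the name.
parts = decode_header(header)
name_bytes, charset = parts[0]

try:
    name = name_bytes.decode(charset)
except UnicodeDecodeError:
    name = None  # the declared charset cannot decode a lone 0x87

print(name_bytes, charset, name)
```

The encoded-word declares UTF-8 but contains only the continuation byte, so a receiving mail client cannot recover the sender's name from it.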
-K
Thread overview: 7+ messages
2023-07-24 10:49 [BUG b4] Encoding issues with --auto-to-cc Duje Mihanović
2023-07-26 19:48 ` Konstantin Ryabitsev
2023-07-26 19:54 ` Kernel.org Bugbot
2023-07-26 20:29 ` [BUG b4] " Konstantin Ryabitsev
2023-07-27 10:02   ` Duje Mihanović
2023-07-27 14:42     ` Konstantin Ryabitsev
2023-07-26 20:37 ` Kernel.org Bugbot