* [BUG b4] Encoding issues with --auto-to-cc
@ 2023-07-24 10:49 Duje Mihanović
  2023-07-26 19:48 ` Konstantin Ryabitsev
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Duje Mihanović @ 2023-07-24 10:49 UTC (permalink / raw)
  To: tools

[-- Attachment #1: Type: text/plain, Size: 2291 bytes --]

I decided to try using b4 to submit a patchset for adding Marvell PXA1908 ARM 
SoC support. Having enrolled an existing branch, I ran `b4 prep -c` and got 
the following error (this is with the -d switch added):

Collecting from: [PATCH v2 06/10] dt-bindings: clock: Add documentation for Marvell PXA1908
Running git --no-pager rev-parse --show-toplevel
Changing dir to /home/duje/code/linux
Running /home/duje/code/linux/scripts/get_maintainer.pl --nogit --nogit-fallback --nogit-chief-penguins --norolestats --nol
Changing back into /home/duje/code/linux/Documentation/process
Traceback (most recent call last):
  File "/usr/bin/b4", line 33, in <module>
    sys.exit(load_entry_point('b4==0.12.3', 'console_scripts', 'b4')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/b4/command.py", line 360, in cmd
    cmdargs.func(cmdargs)
  File "/usr/lib/python3.11/site-packages/b4/command.py", line 76, in cmd_prep
    b4.ez.cmd_prep(cmdargs)
  File "/usr/lib/python3.11/site-packages/b4/ez.py", line 1994, in cmd_prep
    auto_to_cc()
  File "/usr/lib/python3.11/site-packages/b4/ez.py", line 1896, in auto_to_cc
    for tname, pairs in (('To', get_addresses_from_cmd(tocmd, msgbytes)),
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/b4/ez.py", line 948, in get_addresses_from_cmd
    addrs = out.strip().decode()
            ^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x87 in position 201: invalid start byte
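
For illustration, the same exception can be triggered without get_maintainer.pl: 0x87 is a UTF-8 continuation byte, so a strict decode rejects it as an "invalid start byte" whenever no lead byte precedes it. The sample line below is a hypothetical stand-in for get_maintainer.pl output (the real output had the bad byte at position 201), and `first_decode_error` is a hypothetical helper, not part of b4.

```python
def first_decode_error(data: bytes):
    """Return (reason, offset) of the first strict UTF-8 decoding error, or None."""
    try:
        data.decode('utf-8')  # same strict default as out.strip().decode() in the traceback
    except UnicodeDecodeError as exc:
        return exc.reason, exc.start
    return None

# Hypothetical output line whose display name was reduced to the single byte 0x87.
out = b'"\x87" <duje.mihanovic@skole.hr>\n'

print(first_decode_error(out.strip()))                  # ('invalid start byte', 1)
print(first_decode_error('Mihanović'.encode('utf-8')))  # None
```

The correctly encoded name decodes fine; only the orphaned continuation byte trips the decoder.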

I suspected that using an existing branch instead of creating a new one could
have caused this problem, so I extracted a 6.5-rc2 tarball into /tmp,
initialized a Git repository there, and used `git am` to apply only patch
06/10, the one that was causing issues in the other repository. When I ran
`b4 prep -c` again, b4 collected the addresses without issues. However, after
I applied the whole series to the /tmp repository with `git am`,
`b4 prep -c` failed with the same error on the same patch.

Steps to reproduce:
    - Check out Linux 6.5-rc2
    - Run `b4 prep -F "<20230721210042.21535-1-duje.mihanovic@skole.hr>" -n <any branch name>`
    - Run `b4 prep -c`

--
Regards,
Duje

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [BUG b4] Encoding issues with --auto-to-cc
  2023-07-24 10:49 [BUG b4] Encoding issues with --auto-to-cc Duje Mihanović
@ 2023-07-26 19:48 ` Konstantin Ryabitsev
  2023-07-26 19:54 ` Kernel.org Bugbot
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Konstantin Ryabitsev @ 2023-07-26 19:48 UTC (permalink / raw)
  To: Duje Mihanović; +Cc: tools

On Mon, Jul 24, 2023 at 12:49:41PM +0200, Duje Mihanović wrote:
> Steps to reproduce:
>     - Check out Linux 6.5-rc2
>     - Run `b4 prep -F "<20230721210042.21535-1-duje.mihanovic@skole.hr>" -n <any branch name>`
>     - Run `b4 prep -c`

Thank you for that -- I can verify that it's happening.

bugbot assign to me

-K


* Re: Encoding issues with --auto-to-cc
  2023-07-24 10:49 [BUG b4] Encoding issues with --auto-to-cc Duje Mihanović
  2023-07-26 19:48 ` Konstantin Ryabitsev
@ 2023-07-26 19:54 ` Kernel.org Bugbot
  2023-07-26 20:29 ` [BUG b4] " Konstantin Ryabitsev
  2023-07-26 20:37 ` Kernel.org Bugbot
  3 siblings, 0 replies; 7+ messages in thread
From: Kernel.org Bugbot @ 2023-07-26 19:54 UTC (permalink / raw)
  To: duje.mihanovic, tools

Hello:

This conversation is now tracked by Kernel.org Bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=217713

There is no need to do anything else, just keep talking.
-- 
Deet-doot-dot, I am a bot.
Kernel.org Bugzilla (peebz 0.1)



* Re: [BUG b4] Encoding issues with --auto-to-cc
  2023-07-24 10:49 [BUG b4] Encoding issues with --auto-to-cc Duje Mihanović
  2023-07-26 19:48 ` Konstantin Ryabitsev
  2023-07-26 19:54 ` Kernel.org Bugbot
@ 2023-07-26 20:29 ` Konstantin Ryabitsev
  2023-07-27 10:02   ` Duje Mihanović
  2023-07-26 20:37 ` Kernel.org Bugbot
  3 siblings, 1 reply; 7+ messages in thread
From: Konstantin Ryabitsev @ 2023-07-26 20:29 UTC (permalink / raw)
  To: Duje Mihanović; +Cc: tools

On Mon, Jul 24, 2023 at 12:49:41PM +0200, Duje Mihanović wrote:
> I decided to try using b4 to submit a patchset for adding Marvell PXA1908 ARM 
> SoC support. Having enrolled an existing branch, I ran `b4 prep -c` and got 
> the following error (this is with the -d switch added):

So, there's apparently something very interesting about that final ć in your
name that trips up get_maintainer.pl. For example, run the following:

$ ./scripts/get_maintainer.pl -f Documentation/devicetree/bindings/clock/marvell,pxa1908.yaml

You will get back a byte sequence \x87 where your name should be:

    "<87>" <duje.mihanovic@skole.hr> (in file)

This is because ć is 0xC4 0x87 in UTF-8, but I have no idea why get_maintainer.pl
trips up and splits the Unicode sequence into two bytes. It seems to do that for
anything above Latin-1 (i.e. starting with Latin Extended-A).
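
The byte values are easy to confirm from Python. This sketch only illustrates UTF-8 itself and says nothing about what get_maintainer.pl does internally: ć (U+0107, in Latin Extended-A, U+0100..U+017F) encodes to 0xC4 0x87, a lead byte followed by a continuation byte, and a Latin-1 character such as é is likewise two bytes. The helper `utf8_byte_kinds` is hypothetical.

```python
import unicodedata

def utf8_byte_kinds(ch: str) -> list[tuple[str, str]]:
    """Label each UTF-8 byte of ch as 'ascii', 'lead' or 'continuation'."""
    kinds = []
    for b in ch.encode('utf-8'):
        if b < 0x80:
            kind = 'ascii'          # 0xxxxxxx: single-byte character
        elif (b & 0xC0) == 0x80:
            kind = 'continuation'   # 10xxxxxx: can never start a character
        else:
            kind = 'lead'           # 11xxxxxx: first byte of a multi-byte sequence
        kinds.append((f'0x{b:02X}', kind))
    return kinds

# é (U+00E9) is in Latin-1 Supplement, ć (U+0107) in Latin Extended-A.
for ch in ('é', 'ć'):
    print(f'U+{ord(ch):04X}', unicodedata.name(ch), utf8_byte_kinds(ch))
```

Losing the 0xC4 lead byte leaves 0x87 stranded, which is exactly the byte b4 choked on.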

I can "fix" this in b4 by forcing it to ignore any Unicode decoding errors in
get_maintainer.pl output, but it's not a real fix for the underlying problem.
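
A minimal sketch of that kind of workaround, assuming the command output is available as bytes: decode with a non-strict error handler so that one stray byte cannot abort the whole run. The helper name `decode_cmd_output` and the sample line are hypothetical, and this is not b4's actual code. `errors='replace'` substitutes U+FFFD for the bad byte, while `errors='ignore'` drops it; either way the address itself survives and still parses with the standard library's `email.utils.getaddresses`.

```python
from email.utils import getaddresses

def decode_cmd_output(out: bytes, errors: str = 'replace') -> str:
    """Decode command output as UTF-8 without raising on invalid bytes.

    Hypothetical helper: errors='replace' inserts U+FFFD for each bad byte,
    errors='ignore' silently drops it.
    """
    return out.strip().decode('utf-8', errors=errors)

# Hypothetical get_maintainer.pl output line containing the stray 0x87 byte.
raw = b'"\x87" <duje.mihanovic@skole.hr>\n'

for handler in ('replace', 'ignore'):
    text = decode_cmd_output(raw, handler)
    # getaddresses() takes a list of header values and returns (name, address) pairs.
    print(handler, getaddresses([text]))
```

`'replace'` leaves a visible marker where the name was mangled, whereas `'ignore'` hides the problem entirely; which handler the eventual b4 commit uses is not shown in this thread.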

-K


* Re: Encoding issues with --auto-to-cc
  2023-07-24 10:49 [BUG b4] Encoding issues with --auto-to-cc Duje Mihanović
                   ` (2 preceding siblings ...)
  2023-07-26 20:29 ` [BUG b4] " Konstantin Ryabitsev
@ 2023-07-26 20:37 ` Kernel.org Bugbot
  3 siblings, 0 replies; 7+ messages in thread
From: Kernel.org Bugbot @ 2023-07-26 20:37 UTC (permalink / raw)
  To: konstantin, duje.mihanovic, tools

Konstantin Ryabitsev writes in commit 034f2fb2ac27c89c1c7ab2af04d26ba63be9ea6c:

ez: ignore invalid unicode returned by get_maintainer

There's a bug in get_maintainer.pl that returns invalid unicode in
certain situations (see bug linked below). We can't fix this in b4, but
at least we can avoid crashing when we encounter this problem.

Reported-by: Duje Mihanović <duje.mihanovic@skole.hr>
Link: https://msgid.link/1940519.PYKUYFuaPT@radijator
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217713
Signed-off-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org>

(via https://git.kernel.org/pub/scm/utils/b4/b4.git/commit/?id=034f2fb2ac27)
-- 
Deet-doot-dot, I am a bot.
Kernel.org Bugzilla (peebz 0.1)



* Re: [BUG b4] Encoding issues with --auto-to-cc
  2023-07-26 20:29 ` [BUG b4] " Konstantin Ryabitsev
@ 2023-07-27 10:02   ` Duje Mihanović
  2023-07-27 14:42     ` Konstantin Ryabitsev
  0 siblings, 1 reply; 7+ messages in thread
From: Duje Mihanović @ 2023-07-27 10:02 UTC (permalink / raw)
  To: tools

[-- Attachment #1: Type: text/plain, Size: 1060 bytes --]

On Wednesday, July 26, 2023 22:29:55 CEST, Konstantin Ryabitsev wrote:
> So, there's apparently something very interesting about that final ć in your
> name that trips up get_maintainer.pl. For example, run the following:
> 
> $ ./scripts/get_maintainer.pl -f Documentation/devicetree/bindings/clock/marvell,pxa1908.yaml
> 
> You will get back a byte sequence \x87 where your name should be:
> 
>     "<87>" <duje.mihanovic@skole.hr> (in file)
> 
> This is because ć is 0xC4 0x87 in UTF-8, but I have no idea why
> get_maintainer.pl trips up and splits the Unicode sequence into two bytes.
> It seems to do that for anything above Latin-1 (i.e. starting with Latin
> Extended-A).
> 
> I can "fix" this in b4 by forcing it to ignore any Unicode decoding errors in
> get_maintainer.pl output, but it's not a real fix for the underlying problem.

I don't think it's accurate to say that get_maintainer splits up the Unicode
sequence; it simply drops the first half of the sequence. Perhaps I should
file a bug for get_maintainer.pl?

Regards,
Duje

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [BUG b4] Encoding issues with --auto-to-cc
  2023-07-27 10:02   ` Duje Mihanović
@ 2023-07-27 14:42     ` Konstantin Ryabitsev
  0 siblings, 0 replies; 7+ messages in thread
From: Konstantin Ryabitsev @ 2023-07-27 14:42 UTC (permalink / raw)
  To: Duje Mihanović; +Cc: tools

On Thu, Jul 27, 2023 at 12:02:16PM +0200, Duje Mihanović wrote:
> I don't think it's accurate to say that get_maintainer splits up the Unicode
> sequence; it simply drops the first half of the sequence. Perhaps I should
> file a bug for get_maintainer.pl?

Yes, you should report the bug for sure. As you can tell from your originally
submitted series, it's manifested itself multiple times already:

	To: =?UTF-8?q?=87?= <duje.mihanovic@skole.hr>
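
That header is an RFC 2047 encoded-word (`=?charset?q?...?=`) whose quoted-printable payload is the single byte 0x87. As an illustration, Python's standard `email.header.decode_header` recovers the raw byte and the declared charset, and decoding that byte as UTF-8 fails the same way it did in b4. The header value is copied from the line above; the rest is a sketch.

```python
from email.header import decode_header

hdr = '=?UTF-8?q?=87?= <duje.mihanovic@skole.hr>'

# decode_header() returns a list of (bytes, charset) pairs.
parts = decode_header(hdr)
payload, charset = parts[0]
print(payload, charset)

try:
    payload.decode(charset)
except UnicodeDecodeError as exc:
    print('cannot decode display name:', exc.reason)
```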

-K



Thread overview: 7+ messages
2023-07-24 10:49 [BUG b4] Encoding issues with --auto-to-cc Duje Mihanović
2023-07-26 19:48 ` Konstantin Ryabitsev
2023-07-26 19:54 ` Kernel.org Bugbot
2023-07-26 20:29 ` [BUG b4] " Konstantin Ryabitsev
2023-07-27 10:02   ` Duje Mihanović
2023-07-27 14:42     ` Konstantin Ryabitsev
2023-07-26 20:37 ` Kernel.org Bugbot
