On Monday, 20 December 2021 11:03:08 CET Kalle Valo wrote:
[...]

Thanks for all the explanation and pointers. I will try to use this to more 
clearly formulate my concern.

If I understood it correctly then ev->replay_counter is:

* __le64 on little endian systems
* __be64 on big endian systems

Or in short: it is just an u64.

> Yeah, if the host does the conversion we would use __le64. But at the
> moment the firmware does the conversion so I think we should use
> ath11k_ce_byte_swap():
> 
> /* For Big Endian Host, Copy Engine byte_swap is enabled
>  * When Copy Engine does byte_swap, need to byte swap again for the
>  * Host to get/put buffer content in the correct byte order
>  */
> void ath11k_ce_byte_swap(void *mem, u32 len)
> {
> 	int i;
> 
> 	if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN)) {
> 		if (!mem)
> 			return;
> 
> 		for (i = 0; i < (len / 4); i++) {
> 			*(u32 *)mem = swab32(*(u32 *)mem);
> 			mem += 4;
> 		}
> 	}
> }

This function doesn't work for 64 bit values (if they are actually in big 
endian). It just rearranges (len / 4) u32s to host byte order - so the upper 
and lower 32 bit values for an u64 would still be swapped.

Unless I misunderstood what CE_ATTR_BYTE_SWAP_DATA is supposed to do. Maybe it 
is not causing returned data to be in big/little endian but causes for one of 
the host endianess' that the data for 64-bit values in mixed endianness.

And if the function would operate on a struct with 16 bit or 8 bit values then 
we have something which we call here Kuddelmuddel [1].


But if the value is an u64, then the code in the patch is wrong:

> /* supplicant expects big-endian replay counter */
> replay_ctr = cpu_to_be64(le64_to_cpup((__le64 *)ev->replay_counter));

This would break on big endian architectures because ev->replay_counter is a 
__be64 and not a __le64 on these systems. Just from the way the byte ordering 
is supposed to look like - not the data type for the C-compiler).

If you have a look at what the code does (beside 64 bit load by _cpup), is 
just to add a single swap64 - either by cpu_to_be64 or by le64_to_cpup - 
depending on whether the host system is little endian or big endian.

So for a __le64, it would (besides the incorrectly aligned 64 bit load from 
struct wmi_gtk_offload_status_event),  do a single swap64 to __be64. This 
swap64 is from cpu_to_be64 and le64_to_cpup doesn't swap anything.

But on a big endian system, the __be64 would also be sent through a swap64 
(from le64_to_cpup) and cpu_to_be64 wouldn't swap anything. So at the end, it 
would be a __le64. So something which conflicts with the comment above this 
code line.


There are now various ways to correctly implement it:

* change replay_counter to an u64 in the (packed) struct and:

      replay_ctr = cpu_to_be64(ev->replay_counter);

* keep it as u8[8] in the struct and make sure yourself that an unaligned-safe
  64 bit load is used:

      replay_ctr = cpu_to_be64(get_unaligned((u64 *)ev->replay_counter));


Kind regards,
	Sven


[1] It is something like "jumble" or "mess"