On Monday, 20 December 2021 11:03:08 CET Kalle Valo wrote:
[...]

Thanks for all the explanation and pointers. I will try to use this to
formulate my concern more clearly.

If I understood it correctly, then ev->replay_counter is:

* __le64 on little endian systems
* __be64 on big endian systems

Or in short: it is just a u64.

> Yeah, if the host does the conversion we would use __le64. But at the
> moment the firmware does the conversion so I think we should use
> ath11k_ce_byte_swap():
>
> /* For Big Endian Host, Copy Engine byte_swap is enabled
>  * When Copy Engine does byte_swap, need to byte swap again for the
>  * Host to get/put buffer content in the correct byte order
>  */
> void ath11k_ce_byte_swap(void *mem, u32 len)
> {
> 	int i;
>
> 	if (IS_ENABLED(CONFIG_CPU_BIG_ENDIAN)) {
> 		if (!mem)
> 			return;
>
> 		for (i = 0; i < (len / 4); i++) {
> 			*(u32 *)mem = swab32(*(u32 *)mem);
> 			mem += 4;
> 		}
> 	}
> }

This function doesn't work for 64 bit values (if they are actually in big
endian). It just rearranges (len / 4) u32s to host byte order - so the upper
and lower 32 bit halves of a u64 would still be swapped.

Unless I misunderstood what CE_ATTR_BYTE_SWAP_DATA is supposed to do. Maybe
it does not cause the returned data to be in big/little endian, but instead
causes, for one of the host endiannesses, 64-bit values to end up in mixed
endianness. And if the function operated on a struct with 16 bit or 8 bit
values, then we would have something which we call Kuddelmuddel [1] here.

But if the value is a u64, then the code in the patch is wrong:

> /* supplicant expects big-endian replay counter */
> replay_ctr = cpu_to_be64(le64_to_cpup((__le64 *)ev->replay_counter));

This would break on big endian architectures because ev->replay_counter is a
__be64 and not a __le64 on these systems. (Just in terms of how the bytes are
ordered - not in terms of the data type seen by the C compiler.)

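To illustrate the point about ath11k_ce_byte_swap() and 64 bit values, here
is a small userspace sketch. It is not driver code: demo_swab32(),
demo_swap_words() and demo_check_u64() are made-up names, and the loop is
only my reading of what the quoted function does on a big endian host. It
assumes the 8 bytes really hold a big endian u64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-in for swab32(): reverse the four bytes of v. */
static uint32_t demo_swab32(uint32_t v)
{
	return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
	       ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

/*
 * Same idea as the loop in ath11k_ce_byte_swap(), but unconditional and
 * with memcpy() for the loads/stores so it can run on any host.
 */
static void demo_swap_words(uint8_t *mem, size_t len)
{
	size_t i;

	for (i = 0; i < len / 4; i++) {
		uint32_t w;

		memcpy(&w, mem + 4 * i, sizeof(w));
		w = demo_swab32(w);
		memcpy(mem + 4 * i, &w, sizeof(w));
	}
}

/* A big endian u64 does not turn into a little endian u64. */
static void demo_check_u64(void)
{
	/* 0x0102030405060708 stored big endian */
	uint8_t buf[8] = { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08 };
	/* each 32 bit half reversed, but the halves keep their position */
	const uint8_t mixed[8] = { 0x04, 0x03, 0x02, 0x01,
				   0x08, 0x07, 0x06, 0x05 };
	/* what a real little endian u64 would look like */
	const uint8_t le[8] = { 0x08, 0x07, 0x06, 0x05,
				0x04, 0x03, 0x02, 0x01 };

	demo_swap_words(buf, sizeof(buf));
	assert(memcmp(buf, mixed, sizeof(buf)) == 0);
	assert(memcmp(buf, le, sizeof(buf)) != 0);
}
```

Calling demo_check_u64() from a main() should pass both assertions: the
result is neither big nor little endian as a 64 bit value.
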
If you have a look at what the code does (besides the 64 bit load by _cpup),
it just adds a single swap64 - either by cpu_to_be64 or by le64_to_cpup -
depending on whether the host system is little endian or big endian.

So for a __le64, it would (besides the incorrectly aligned 64 bit load from
struct wmi_gtk_offload_status_event) do a single swap64 to __be64. This
swap64 comes from cpu_to_be64; le64_to_cpup doesn't swap anything.

But on a big endian system, the __be64 would also be sent through a swap64
(from le64_to_cpup) and cpu_to_be64 wouldn't swap anything. So at the end, it
would be a __le64, which conflicts with the comment above this code line.

There are now various ways to implement it correctly:

* change replay_counter to a u64 in the (packed) struct and:

    replay_ctr = cpu_to_be64(ev->replay_counter);

* keep it as u8[8] in the struct and make sure yourself that an
  unaligned-safe 64 bit load is used:

    replay_ctr = cpu_to_be64(get_unaligned((u64 *)ev->replay_counter));

Kind regards,
	Sven

[1] It is something like "jumble" or "mess"
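
PS: The two cases can be played through in userspace. This is only a sketch
under the assumption stated in the mail (that the counter arrives in host
byte order). All demo_* names are made up for the illustration; the helpers
just model what cpu_to_be64 and le64_to_cpu do on a little or big endian
host, and the alignment problem is not modelled at all.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum demo_host { DEMO_HOST_LE, DEMO_HOST_BE };

/* Reverse all eight bytes of v (stand-in for swab64()). */
static uint64_t demo_swab64(uint64_t v)
{
	uint64_t r = 0;
	int i;

	for (i = 0; i < 8; i++) {
		r = (r << 8) | (v & 0xff);
		v >>= 8;
	}
	return r;
}

/* 64 bit load as the simulated host would perform it. */
static uint64_t demo_load_host(enum demo_host h, const uint8_t in[8])
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v |= (uint64_t)in[i] << (h == DEMO_HOST_LE ? 8 * i : 8 * (7 - i));
	return v;
}

/* 64 bit store as the simulated host would perform it. */
static void demo_store_host(enum demo_host h, uint64_t v, uint8_t out[8])
{
	int i;

	for (i = 0; i < 8; i++)
		out[i] = (uint8_t)(v >> (h == DEMO_HOST_LE ? 8 * i : 8 * (7 - i)));
}

/* le64_to_cpu only swaps on a big endian host ... */
static uint64_t demo_le64_to_cpu(enum demo_host h, uint64_t v)
{
	return h == DEMO_HOST_BE ? demo_swab64(v) : v;
}

/* ... and cpu_to_be64 only swaps on a little endian host. */
static uint64_t demo_cpu_to_be64(enum demo_host h, uint64_t v)
{
	return h == DEMO_HOST_LE ? demo_swab64(v) : v;
}

/* Models: cpu_to_be64(le64_to_cpup(...)) as in the patch. */
static void demo_patch_variant(enum demo_host h, const uint8_t ev[8],
			       uint8_t out[8])
{
	uint64_t v = demo_le64_to_cpu(h, demo_load_host(h, ev));

	demo_store_host(h, demo_cpu_to_be64(h, v), out);
}

/* Models: cpu_to_be64() applied to a plain host-order 64 bit load. */
static void demo_fixed_variant(enum demo_host h, const uint8_t ev[8],
			       uint8_t out[8])
{
	demo_store_host(h, demo_cpu_to_be64(h, demo_load_host(h, ev)), out);
}

static void demo_check_replay_counter(void)
{
	const uint8_t be[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
	const uint8_t le[8] = { 8, 7, 6, 5, 4, 3, 2, 1 };
	uint8_t ev[8], out[8];

	/* little endian host: both variants produce big endian bytes */
	demo_store_host(DEMO_HOST_LE, 0x0102030405060708ULL, ev);
	demo_patch_variant(DEMO_HOST_LE, ev, out);
	assert(memcmp(out, be, 8) == 0);
	demo_fixed_variant(DEMO_HOST_LE, ev, out);
	assert(memcmp(out, be, 8) == 0);

	/* big endian host: the patch variant ends up little endian */
	demo_store_host(DEMO_HOST_BE, 0x0102030405060708ULL, ev);
	demo_patch_variant(DEMO_HOST_BE, ev, out);
	assert(memcmp(out, le, 8) == 0);
	demo_fixed_variant(DEMO_HOST_BE, ev, out);
	assert(memcmp(out, be, 8) == 0);
}
```

Calling demo_check_replay_counter() from a main() should pass all four
assertions. If the firmware actually delivers the counter in some other
layout, the model above does not apply.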