Hi Konrad,

Thanks for testing it out. I have updated the patch and tested it successfully with dhclient on 5.11.0-rc2+. Could you please verify whether the patch works on your side?
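
In case it helps with verification, here is a rough, untested user-space
sketch of the kind of I/O that exercises the unaligned path: an O_DIRECT
read through a buffer that is 512-byte aligned but deliberately not page
aligned. The /dev/nvme0n1 path and the 512-byte logical block size are
assumptions, so adjust them for the machine under test, and boot with
swiotlb=force.

#define _GNU_SOURCE		/* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
	void *page;
	char *buf;
	int fd;

	/* 512-byte aligned but deliberately NOT page-aligned buffer. */
	if (posix_memalign(&page, 4096, 2 * 4096))
		return 1;
	buf = (char *)page + 512;

	fd = open("/dev/nvme0n1", O_RDONLY | O_DIRECT);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/*
	 * With swiotlb=force this buffer bounces through the SWIOTLB;
	 * a non-page-aligned buffer like this is the case the patch
	 * cares about.
	 */
	if (pread(fd, buf, 4096, 0) < 0)
		perror("pread");

	close(fd);
	free(page);
	return 0;
}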

Thank you

The NVMe driver and other applications depend on the data offset
to operate correctly. Currently, when unaligned data is mapped via
SWIOTLB, the data is mapped as slab aligned with the SWIOTLB and the
original in-page offset is lost. When booting with the swiotlb=force
option and using NVMe as the interface, running mkfs.xfs on RHEL
fails because of this misalignment. This patch makes sure the mapped
data preserves the page offset of the original address. Tested on the
latest kernel; this patch fixes the issue.
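
To illustrate the intended behaviour, here is a minimal, standalone
user-space sketch. It is not the swiotlb code itself; the 4 KB page
size and the example addresses are assumptions made for the
illustration only.

#include <stdio.h>

#define PAGE_SIZE 4096UL

/* Mimics the idea of the patch: keep the in-page offset of orig_addr. */
static unsigned long map_preserving_offset(unsigned long slot_start,
					   unsigned long orig_addr)
{
	/* Same value as offset_in_page(orig_addr) in the kernel. */
	unsigned long offset = orig_addr & (PAGE_SIZE - 1);

	/* The bounce buffer must be allocated with alloc_size + offset. */
	return slot_start + offset;
}

int main(void)
{
	unsigned long orig_addr  = 0x12345678UL; /* offset 0x678 in its page */
	unsigned long slot_start = 0x00abc000UL; /* start of the bounce slot */

	/* Prints 0xabc678: the device sees the same page offset as before. */
	printf("mapped: %#lx\n", map_preserving_offset(slot_start, orig_addr));
	return 0;
}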

Signed-off-by: Jianxiong Gao <jxgao@google.com>
Acked-by: David Rientjes <rientjes@google.com>
---
 kernel/dma/swiotlb.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 781b9dca197c..56a35e71b3fd 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -483,6 +483,12 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
        max_slots = mask + 1
                    ? ALIGN(mask + 1, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT
                    : 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
+
+       /*
+        * We need to keep the offset when mapping, so add the offset
+        * to the total size we need to allocate in the SWIOTLB.
+        */
+       alloc_size += offset_in_page(orig_addr);

        /*
         * For mappings greater than or equal to a page, we limit the stride
@@ -567,6 +573,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
         */
        for (i = 0; i < nslots; i++)
                io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
+       /*
+        * When keeping the offset of the original data, we need to advance
+        * the tlb_addr by the offset of orig_addr.
+        */
+       tlb_addr += offset_in_page(orig_addr);
        if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
            (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
                swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_TO_DEVICE);
--



On Fri, Dec 11, 2020 at 12:39 PM Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote:
On Mon, Dec 07, 2020 at 01:42:04PM -0800, Jianxiong Gao wrote:
> NVMe driver and other applications depend on the data offset
> to operate correctly. Currently when unaligned data is mapped via
> SWIOTLB, the data is mapped as slab aligned with the SWIOTLB. When
> booting with --swiotlb=force option and using NVMe as interface,
> running mkfs.xfs on Rhel fails because of the unalignment issue.
> This patch makes sure the mapped data preserves
> its offset of the orginal address. Tested on latest kernel that
> this patch fixes the issue.
>
> Signed-off-by: Jianxiong Gao <jxgao@google.com>
> Acked-by: David Rientjes <rientjes@google.com>

This breaks DHCP with the upstream kernel (applied this on top of
v5.10-rc7) booted with swiotlb=262144,force; dhclient is no longer working:

[  119.300502] bnxt_en 0000:3b:00.0 eno2np0: NIC Link is Up, 25000 Mbps full duplex, Flow control: ON - receive & transmit
[  119.437573] bnxt_en 0000:3b:00.0 eno2np0: FEC autoneg off encoding: None
[   90.064220] dracut-initqueue[1477]: Warning: dhcp for interface eno2np0 failed
[  101.155295] dracut-initqueue[1477]: Warning: dhcp for interface eno2np0 failed
[  142.361359] bnxt_en 0000:3b:00.1 eno3np1: NIC Link is Up, 25000 Mbps full duplex, Flow control: ON - receive & transmit
[  142.501860] bnxt_en 0000:3b:00.1 eno3np1: FEC autoneg off encoding: None
[  113.054108] dracut-initqueue[1477]: Warning: dhcp for interface eno3np1 failed
[  123.867108] dracut-initqueue[1477]: Warning: dhcp for interface eno3np1 failed
[  251.888002] dracut-initqueue[1477]: Warning: dracut-initqueue timeout - starting timeout scripts

Dropping from linux-next.

> ---
>  kernel/dma/swiotlb.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 781b9dca197c..56a35e71b3fd 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -483,6 +483,12 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
>       max_slots = mask + 1
>                   ? ALIGN(mask + 1, 1 << IO_TLB_SHIFT) >> IO_TLB_SHIFT
>                   : 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
> +
> +     /*
> +      * We need to keep the offset when mapping, so adding the offset
> +      * to the total set we need to allocate in SWIOTLB
> +      */
> +     alloc_size += offset_in_page(orig_addr);

>       /*
>        * For mappings greater than or equal to a page, we limit the stride
> @@ -567,6 +573,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t orig_addr,
>        */
>       for (i = 0; i < nslots; i++)
>               io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
> +     /*
> +      * When keeping the offset of the original data, we need to advance
> +      * the tlb_addr by the offset of orig_addr.
> +      */
> +     tlb_addr += orig_addr & (PAGE_SIZE - 1);
>       if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
>           (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
>               swiotlb_bounce(orig_addr, tlb_addr, mapping_size, DMA_TO_DEVICE);
> --
> 2.27.0
>
>


--
Jianxiong Gao