xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
From: Christoph Hellwig <hch@infradead.org>
To: Dongli Zhang <dongli.zhang@oracle.com>
Cc: iommu@lists.linux.dev, x86@kernel.org,
	xen-devel@lists.xenproject.org,
	linux-arm-kernel@lists.infradead.org, linux-mips@vger.kernel.org,
	linux-kernel@vger.kernel.org, hch@infradead.org,
	sstabellini@kernel.org, linux@armlinux.org.uk,
	tsbogend@alpha.franken.de, jgross@suse.com,
	boris.ostrovsky@oracle.com, tglx@linutronix.de, mingo@redhat.com,
	bp@alien8.de, dave.hansen@linux.intel.com,
	m.szyprowski@samsung.com, konrad.wilk@oracle.com,
	robin.murphy@arm.com, hpa@zytor.com,
	oleksandr_tyshchenko@epam.com, joe.jin@oracle.com
Subject: Re: [PATCH RFC v2 1/1] wiotlb: split buffer into 32-bit default and 64-bit extra zones
Date: Tue, 20 Sep 2022 00:55:03 -0700	[thread overview]
Message-ID: <Yylx1xPdEbq3k4v1@infradead.org> (raw)
In-Reply-To: <20220820074250.5460-1-dongli.zhang@oracle.com>

On Sat, Aug 20, 2022 at 12:42:50AM -0700, Dongli Zhang wrote:
> Hello,
> 
> I used to send out RFC v1 to introduce an extra io_tlb_mem (created with
> SWIOTLB_ANY) in addition to the default io_tlb_mem (32-bit).  The
> dev->dma_io_tlb_mem is set to either default or the extra io_tlb_mem,
> depending on dma mask. However, that is not good for setting
> dev->dma_io_tlb_mem at swiotlb layer transparently as suggested by
> Christoph Hellwig.
> 
> https://lore.kernel.org/all/20220609005553.30954-1-dongli.zhang@oracle.com/
> 
> Therefore, this is another RFC v2 implementation following a different
> direction. The core ideas are:
> 
> 1. The swiotlb is splited into two zones, io_tlb_mem->zone[0] (32-bit) and
> io_tlb_mem->zone[1] (64-bit).
> 
> struct io_tlb_mem {
> 	struct io_tlb_zone zone[SWIOTLB_NR];
> 	struct dentry *debugfs;
> 	bool late_alloc;
> 	bool force_bounce;
> 	bool for_alloc;
> 	bool has_extra;
> };
> 
> struct io_tlb_zone {
> 	phys_addr_t start;
> 	phys_addr_t end;
> 	void *vaddr;
> 	unsigned long nslabs;
> 	unsigned long used;
> 	unsigned int nareas;
> 	unsigned int area_nslabs;
> 	struct io_tlb_area *areas;
> 	struct io_tlb_slot *slots;
> };
> 
> 2. By default, only io_tlb_mem->zone[0] is available. The
> io_tlb_mem->zone[1] is allocated conditionally if:
> 
> - the "swiotlb=" is configured to allocate extra buffer, and
> - the SWIOTLB_EXTRA is set in the flag (this is to make sure arch(s) other
>   than x86/sev/xen will not enable it until it is fully tested by each
>   arch, e.g., mips/powerpc). Currently it is enabled for x86 and xen.
> 
> 3. During swiotlb map, whether zone[0] (32-bit) or zone[1] (64-bit
> SWIOTLB_ANY)
> is used depends on min_not_zero(*dev->dma_mask, dev->bus_dma_limit).
> 
> To test the RFC v2, here is the QEMU command line.
> 
> qemu-system-x86_64 -smp 8 -m 32G -enable-kvm -vnc :5 -hda disk.img \
> -kernel path-to-linux/arch/x86_64/boot/bzImage \
> -append "root=/dev/sda1 init=/sbin/init text console=ttyS0 loglevel=7 swiotlb=32768,4194304,force" \
> -net nic -net user,hostfwd=tcp::5025-:22 \
> -device nvme,drive=nvme01,serial=helloworld -drive file=test.qcow2,if=none,id=nvme01 \
> -serial stdio
> 
> There is below in syslog. The extra 8GB buffer is allocated.
> 
> [    0.152251] software IO TLB: area num 8.
> ... ...
> [    3.706088] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
> [    3.707334] software IO TLB: mapped default [mem 0x00000000bbfd7000-0x00000000bffd7000] (64MB)
> [    3.708585] software IO TLB: mapped extra [mem 0x000000061cc00000-0x000000081cc00000] (8192MB)
> 
> After the FIO is triggered over NVMe, the 64-bit buffer is used.
> 
> $ cat /sys/kernel/debug/swiotlb/io_tlb_nslabs_extra
> 4194304
> $ cat /sys/kernel/debug/swiotlb/io_tlb_used_extra
> 327552
> 
> Would you mind helping if this is the right direction to go?
> 
> Thank you very much!
> 
> Cc: Konrad Wilk <konrad.wilk@oracle.com>
> Cc: Joe Jin <joe.jin@oracle.com>
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> ---
>  arch/arm/xen/mm.c                      |   2 +-
>  arch/mips/pci/pci-octeon.c             |   5 +-
>  arch/x86/include/asm/xen/swiotlb-xen.h |   2 +-
>  arch/x86/kernel/pci-dma.c              |   6 +-
>  drivers/xen/swiotlb-xen.c              |  18 +-
>  include/linux/swiotlb.h                |  73 +++--

> diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
> index 3d826c0..4edfa42 100644
> --- a/arch/arm/xen/mm.c
> +++ b/arch/arm/xen/mm.c
> @@ -125,7 +125,7 @@ static int __init xen_mm_init(void)
>  		return 0;
>  
>  	/* we can work with the default swiotlb */
> -	if (!io_tlb_default_mem.nslabs) {
> +	if (!io_tlb_default_mem.zone[SWIOTLB_DF].nslabs) {
>  		rc = swiotlb_init_late(swiotlb_size_or_default(),
>  				       xen_swiotlb_gfp(), NULL);
>  		if (rc < 0)

First thing we need before doing anything about multiple default
pools is to get all the knowledge of details hidden inside swiotlb.c.

For swiotlb_init_late that seems easy as we can just move the check
into it.

> diff --git a/arch/mips/pci/pci-octeon.c b/arch/mips/pci/pci-octeon.c
> index e457a18..0bf0859 100644
> --- a/arch/mips/pci/pci-octeon.c
> +++ b/arch/mips/pci/pci-octeon.c
> @@ -654,6 +654,9 @@ static int __init octeon_pci_setup(void)
>  		octeon_pci_mem_resource.end =
>  			octeon_pci_mem_resource.start + (1ul << 30);
>  	} else {
> +		struct io_tlb_mem *mem = &io_tlb_default_mem;
> +		struct io_tlb_zone *zone = &mem->zone[SWIOTLB_DF];
> +
>  		/* Remap the Octeon BAR 0 to map 128MB-(128MB+4KB) */
>  		octeon_npi_write32(CVMX_NPI_PCI_CFG04, 128ul << 20);
>  		octeon_npi_write32(CVMX_NPI_PCI_CFG05, 0);
> @@ -664,7 +667,7 @@ static int __init octeon_pci_setup(void)
>  
>  		/* BAR1 movable regions contiguous to cover the swiotlb */
>  		octeon_bar1_pci_phys =
> -			io_tlb_default_mem.start & ~((1ull << 22) - 1);
> +			zone->start & ~((1ull << 22) - 1);

But we'll need to do something about this mess.  I'll need help from
the octeon maintainer on it.

> -	x86_swiotlb_flags |= SWIOTLB_ANY;
> +	x86_swiotlb_flags |= SWIOTLB_ANY | SWIOTLB_EXTRA;

I don't think this is how it is suppoed to be.  SWIOTLB_ANY already
says give me a pool with no addressing constrains.  We don't need
two pools without that.  EXTRA is also not exactly a very helpful
name here.

>  #ifdef CONFIG_X86
> -int xen_swiotlb_fixup(void *buf, unsigned long nslabs)
> +int xen_swiotlb_fixup(void *buf, unsigned long nslabs, unsigned int flags)
>  {
>  	int rc;
>  	unsigned int order = get_order(IO_TLB_SEGSIZE << IO_TLB_SHIFT);
>  	unsigned int i, dma_bits = order + PAGE_SHIFT;
>  	dma_addr_t dma_handle;
>  	phys_addr_t p = virt_to_phys(buf);
> +	unsigned int max_dma_bits = 32;

I think live will be a lot simple if the addressing bits are passed to
this function, and not some kind of flags.

> +#define SWIOTLB_DF	0
> +#define SWIOTLB_EX	1
> +#define SWIOTLB_NR	2

These names are not very descriptive.


      reply	other threads:[~2022-09-20  7:55 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-20  7:42 [PATCH RFC v2 1/1] wiotlb: split buffer into 32-bit default and 64-bit extra zones Dongli Zhang
2022-09-20  7:55 ` Christoph Hellwig [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yylx1xPdEbq3k4v1@infradead.org \
    --to=hch@infradead.org \
    --cc=boris.ostrovsky@oracle.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=dongli.zhang@oracle.com \
    --cc=hpa@zytor.com \
    --cc=iommu@lists.linux.dev \
    --cc=jgross@suse.com \
    --cc=joe.jin@oracle.com \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mips@vger.kernel.org \
    --cc=linux@armlinux.org.uk \
    --cc=m.szyprowski@samsung.com \
    --cc=mingo@redhat.com \
    --cc=oleksandr_tyshchenko@epam.com \
    --cc=robin.murphy@arm.com \
    --cc=sstabellini@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=tsbogend@alpha.franken.de \
    --cc=x86@kernel.org \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).