From: Haiyang Zhang <haiyangz@microsoft.com>
To: Michael Kelley <mikelley@microsoft.com>,
	Tianyu Lan <ltykernel@gmail.com>,
	KY Srinivasan <kys@microsoft.com>,
	Stephen Hemminger <sthemmin@microsoft.com>,
	"wei.liu@kernel.org" <wei.liu@kernel.org>,
	Dexuan Cui <decui@microsoft.com>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"mingo@redhat.com" <mingo@redhat.com>,
	"bp@alien8.de" <bp@alien8.de>, "x86@kernel.org" <x86@kernel.org>,
	"hpa@zytor.com" <hpa@zytor.com>,
	"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
	"luto@kernel.org" <luto@kernel.org>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"konrad.wilk@oracle.com" <konrad.wilk@oracle.com>,
	"boris.ostrovsky@oracle.com" <boris.ostrovsky@oracle.com>,
	"jgross@suse.com" <jgross@suse.com>,
	"sstabellini@kernel.org" <sstabellini@kernel.org>,
	"joro@8bytes.org" <joro@8bytes.org>,
	"will@kernel.org" <will@kernel.org>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"kuba@kernel.org" <kuba@kernel.org>,
	"jejb@linux.ibm.com" <jejb@linux.ibm.com>,
	"martin.petersen@oracle.com" <martin.petersen@oracle.com>,
	"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
	"arnd@arndb.de" <arnd@arndb.de>, "hch@lst.de" <hch@lst.de>,
	"m.szyprowski@samsung.com" <m.szyprowski@samsung.com>,
	"robin.murphy@arm.com" <robin.murphy@arm.com>,
	"brijesh.singh@amd.com" <brijesh.singh@amd.com>,
	Tianyu Lan <Tianyu.Lan@microsoft.com>,
	"thomas.lendacky@amd.com" <thomas.lendacky@amd.com>,
	"pgonda@google.com" <pgonda@google.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"kirill.shutemov@linux.intel.com"
	<kirill.shutemov@linux.intel.com>,
	"rppt@kernel.org" <rppt@kernel.org>,
	"sfr@canb.auug.org.au" <sfr@canb.auug.org.au>,
	"aneesh.kumar@linux.ibm.com" <aneesh.kumar@linux.ibm.com>,
	"saravanand@fb.com" <saravanand@fb.com>,
	"krish.sadhukhan@oracle.com" <krish.sadhukhan@oracle.com>,
	"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	"tj@kernel.org" <tj@kernel.org>,
	"rientjes@google.com" <rientjes@google.com>
Cc: "iommu@lists.linux-foundation.org"
	<iommu@lists.linux-foundation.org>,
	"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
	"linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	vkuznets <vkuznets@redhat.com>,
	"parri.andrea@gmail.com" <parri.andrea@gmail.com>,
	"dave.hansen@intel.com" <dave.hansen@intel.com>
Subject: RE: [PATCH V5 12/12] net: netvsc: Add Isolation VM support for netvsc driver
Date: Wed, 15 Sep 2021 16:46:57 +0000	[thread overview]
Message-ID: <MN2PR21MB12959F10240EC1BB2270B345CADB9@MN2PR21MB1295.namprd21.prod.outlook.com> (raw)
In-Reply-To: <MWHPR21MB15939A5D74CA1DF25EE816ADD7DB9@MWHPR21MB1593.namprd21.prod.outlook.com>



> -----Original Message-----
> From: Michael Kelley <mikelley@microsoft.com>
> Sent: Wednesday, September 15, 2021 12:22 PM
> To: Tianyu Lan <ltykernel@gmail.com>; KY Srinivasan <kys@microsoft.com>;

> > +				memset(vmap_pages, 0,
> > +				       sizeof(*vmap_pages) * vmap_page_index);
> > +				vmap_page_index = 0;
> > +
> > +				for (j = 0; j < i; j++)
> > +					__free_pages(pages[j], alloc_unit);
> > +
> > +				kfree(pages);
> > +				alloc_unit = 1;
> 
> This is the case where a large enough contiguous physical memory chunk
> could not be found.  But rather than dropping all the way down to single
> pages, would it make sense to try something smaller, but not 1?  For
> example, cut the alloc_unit in half and try again.  But I'm not sure of
> all the implications.

I had the same question. But gradually decreasing the allocation size would
probably take too much time?

> 
> > +				goto retry;
> > +			}
> > +		}
> > +
> > +		pages[i] = page;
> > +		for (j = 0; j < alloc_unit; j++)
> > +			vmap_pages[vmap_page_index++] = page++;
> > +	}
> > +
> > +	vaddr = vmap(vmap_pages, vmap_page_index, VM_MAP, PAGE_KERNEL);
> > +	kfree(vmap_pages);
> > +
> > +	*pages_array = pages;
> > +	return vaddr;
> > +
> > +cleanup:
> > +	for (j = 0; j < i; j++)
> > +		__free_pages(pages[j], alloc_unit);
> > +
> > +	kfree(pages);
> > +	kfree(vmap_pages);
> > +	return NULL;
> > +}
> > +
> > +static void *netvsc_map_pages(struct page **pages, int count, int alloc_unit)
> > +{
> > +	int pg_count = count * alloc_unit;
> > +	struct page *page;
> > +	unsigned long *pfns;
> > +	int pfn_index = 0;
> > +	void *vaddr;
> > +	int i, j;
> > +
> > +	if (!pages)
> > +		return NULL;
> > +
> > +	pfns = kcalloc(pg_count, sizeof(*pfns), GFP_KERNEL);
> > +	if (!pfns)
> > +		return NULL;
> > +
> > +	for (i = 0; i < count; i++) {
> > +		page = pages[i];
> > +		if (!page) {
> > +			pr_warn("page is not available %d.\n", i);
> > +			kfree(pfns);
> > +			return NULL;
> > +		}
> > +
> > +		for (j = 0; j < alloc_unit; j++) {
> > +			pfns[pfn_index++] = page_to_pfn(page++) +
> > +				(ms_hyperv.shared_gpa_boundary >> PAGE_SHIFT);
> > +		}
> > +	}
> > +
> > +	vaddr = vmap_pfn(pfns, pg_count, PAGE_KERNEL_IO);
> > +	kfree(pfns);
> > +	return vaddr;
> > +}
> > +
> 
> I think you are proposing this approach to allocating memory for the
> send and receive buffers so that you can avoid having two virtual
> mappings for the memory, per comments from Christoph Hellwig.  But
> overall, the approach seems a bit complex and I wonder if it is worth it.
> If allocating large contiguous chunks of physical memory is successful,
> then there is some memory savings in that the data structures needed to
> keep track of the physical pages is smaller than the equivalent page
> tables might be.  But if you have to revert to allocating individual
> pages, then the memory savings is reduced.
> 
> Ultimately, the list of actual PFNs has to be kept somewhere.  Another
> approach would be to do the reverse of what hv_map_memory() from the v4
> patch series does.  I.e., you could do virt_to_phys() on each virtual
> address that maps above VTOM, and subtract out the shared_gpa_boundary
> to get the list of actual PFNs that need to be freed.  This way you
> don't have two copies of the list of PFNs -- one with and one without
> the shared_gpa_boundary added.  But it comes at the cost of additional
> code so that may not be a great idea.
> 
> I think what you have here works, and I don't have a clearly better
> solution at the moment except perhaps to revert to the v4 solution and
> just have two virtual mappings.  I'll keep thinking about it.  Maybe
> Christoph has other thoughts.
> 
> >  static int netvsc_init_buf(struct hv_device *device,
> >  			   struct netvsc_device *net_device,
> >  			   const struct netvsc_device_info *device_info)
> > @@ -337,7 +462,7 @@ static int netvsc_init_buf(struct hv_device *device,
> >  	struct nvsp_1_message_send_receive_buffer_complete *resp;
> >  	struct net_device *ndev = hv_get_drvdata(device);
> >  	struct nvsp_message *init_packet;
> > -	unsigned int buf_size;
> > +	unsigned int buf_size, alloc_unit;
> >  	size_t map_words;
> >  	int i, ret = 0;
> >
> > @@ -350,7 +475,14 @@ static int netvsc_init_buf(struct hv_device *device,
> >  		buf_size = min_t(unsigned int, buf_size,
> >  				 NETVSC_RECEIVE_BUFFER_SIZE_LEGACY);
> >
> > -	net_device->recv_buf = vzalloc(buf_size);
> > +	if (hv_isolation_type_snp())
> > +		net_device->recv_buf =
> > +			netvsc_alloc_pages(&net_device->recv_pages,
> > +					   &net_device->recv_page_count,
> > +					   buf_size);
> > +	else
> > +		net_device->recv_buf = vzalloc(buf_size);
> > +
> 
> I wonder if it is necessary to have two different code paths here.  The
> allocating and freeing of the send and receive buffers is not perf
> sensitive, and it seems like netvsc_alloc_pages() could be used
> regardless of whether SNP Isolation is in effect.  To my thinking, one
> code path is better than two code paths unless there's a compelling
> reason to have two.

I still prefer keeping the simple vzalloc for non-isolated VMs, because a
simpler code path usually means more robustness.
I don't know how big the time difference between the two is, but in some
cases we really care about boot time.
Also, in the multi-vPort case for MANA, we potentially support hundreds of
vPorts, and there will be the same number of synthetic NICs associated with
them. So even a small difference in initialization time may add up.

Thanks,
- Haiyang

