From: Tianyu Lan <ltykernel@gmail.com>
To: Christoph Hellwig <hch@lst.de>
Cc: "iommu@lists.linux-foundation.org"
<iommu@lists.linux-foundation.org>,
"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
"linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
vkuznets <vkuznets@redhat.com>,
"parri.andrea@gmail.com" <parri.andrea@gmail.com>,
"dave.hansen@intel.com" <dave.hansen@intel.com>,
Michael Kelley <mikelley@microsoft.com>,
KY Srinivasan <kys@microsoft.com>,
Haiyang Zhang <haiyangz@microsoft.com>,
Stephen Hemminger <sthemmin@microsoft.com>,
"wei.liu@kernel.org" <wei.liu@kernel.org>,
Dexuan Cui <decui@microsoft.com>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"mingo@redhat.com" <mingo@redhat.com>,
"bp@alien8.de" <bp@alien8.de>, "x86@kernel.org" <x86@kernel.org>,
"hpa@zytor.com" <hpa@zytor.com>,
"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
"luto@kernel.org" <luto@kernel.org>,
"peterz@infradead.org" <peterz@infradead.org>,
"konrad.wilk@oracle.com" <konrad.wilk@oracle.com>,
"boris.ostrovsky@oracle.com" <boris.ostrovsky@oracle.com>,
"jgross@suse.com" <jgross@suse.com>,
"sstabellini@kernel.org" <sstabellini@kernel.org>,
"joro@8bytes.org" <joro@8bytes.org>,
"will@kernel.org" <will@kernel.org>,
"davem@davemloft.net" <davem@davemloft.net>,
"kuba@kernel.org" <kuba@kernel.org>,
"jejb@linux.ibm.com" <jejb@linux.ibm.com>,
"martin.petersen@oracle.com" <martin.petersen@oracle.com>,
"gregkh@linuxfoundation.org" <gregkh@linuxfoundation.org>,
"arnd@arndb.de" <arnd@arndb.de>,
"m.szyprowski@samsung.com" <m.szyprowski@samsung.com>,
"robin.murphy@arm.com" <robin.murphy@arm.com>,
"brijesh.singh@amd.com" <brijesh.singh@amd.com>,
Tianyu Lan <Tianyu.Lan@microsoft.com>,
"thomas.lendacky@amd.com" <thomas.lendacky@amd.com>,
"pgonda@google.com" <pgonda@google.com>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"kirill.shutemov@linux.intel.com"
<kirill.shutemov@linux.intel.com>,
"rppt@kernel.org" <rppt@kernel.org>,
"sfr@canb.auug.org.au" <sfr@canb.auug.org.au>,
"aneesh.kumar@linux.ibm.com" <aneesh.kumar@linux.ibm.com>,
"saravanand@fb.com" <saravanand@fb.com>,
"krish.sadhukhan@oracle.com" <krish.sadhukhan@oracle.com>,
"xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
"tj@kernel.org" <tj@kernel.org>,
"rientjes@google.com" <rientjes@google.com>
Subject: Re: [PATCH V5 12/12] net: netvsc: Add Isolation VM support for netvsc driver
Date: Mon, 27 Sep 2021 22:26:43 +0800
Message-ID: <e379a60b-4d74-9167-983f-f70c96bb279e@gmail.com>
In-Reply-To: <43e22b84-7273-4099-42ea-54b06f398650@gmail.com>
Hi Christoph:
Gentle ping. The swiotlb and shared memory mapping changes in this
patchset need your review. Could you have a look?
Thanks.
On 9/22/2021 6:34 PM, Tianyu Lan wrote:
> Hi Christoph:
> This patch follows your proposal in the previous discussion.
> Could you have a look?
> "use vmap_pfn as in the current series. But in that case I think
> we should get rid of the other mapping created by vmalloc. I
> thought a bit about finding a way to apply the offset in vmalloc
> itself, but I think it would be too invasive to the normal fast
> path. So the other sub-option would be to allocate the pages
> manually (maybe even using high order allocations to reduce TLB
> pressure) and then remap them." (https://lkml.org/lkml/2021/9/2/112)
>
> In addition, I have merged your previous swiotlb change into patch 9,
> “x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM”.
> Your previous change:
> (http://git.infradead.org/users/hch/misc.git/commit/8248f295928aded3364a1e54a4e0022e93d3610c)
> Please have a look.
>
>
> Thanks.
>
>
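The approach quoted above (allocate the pages manually, possibly as high-order chunks, then remap them) needs a per-page array for vmap(), because vmap() maps one struct page per PAGE_SIZE entry. The following user-space C sketch models only that bookkeeping step. expand_chunks() is a hypothetical helper, and pages are represented by plain PFN numbers rather than struct page pointers:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: each allocated chunk is identified by the PFN of
 * its first page, and every chunk holds alloc_unit physically contiguous
 * pages. vmap() wants one entry per PAGE_SIZE page, so the chunk array is
 * expanded into a flat per-page array (compare the vmap_pages[] loop in
 * netvsc_alloc_pages()).
 */
static size_t expand_chunks(const unsigned long *chunk_pfns, size_t nr_chunks,
			    unsigned int alloc_unit, unsigned long *page_pfns)
{
	size_t idx = 0;

	assert(alloc_unit > 0);
	for (size_t i = 0; i < nr_chunks; i++)
		for (unsigned int j = 0; j < alloc_unit; j++)
			page_pfns[idx++] = chunk_pfns[i] + j;

	/* Number of entries written: nr_chunks * alloc_unit. */
	return idx;
}
```

As a rough illustration of the space trade-off: with 4 KiB pages a 16 MiB buffer is 4096 pages. If alloc_unit is 1024 pages, the long-lived chunk array needs 4 entries, while the 4096-entry per-page array is only needed temporarily while vmap() runs.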
> On 9/16/2021 12:21 AM, Michael Kelley wrote:
>> From: Tianyu Lan <ltykernel@gmail.com> Sent: Tuesday, September 14, 2021 6:39 AM
>>>
>>> In Isolation VM, all memory shared with the host needs to be marked
>>> visible to the host via hvcall. vmbus_establish_gpadl() has already done
>>> this for the netvsc rx/tx ring buffers. The page buffers used by
>>> vmbus_sendpacket_pagebuffer() still need to be handled. Use the DMA API
>>> to map/unmap this memory when sending/receiving packets; the Hyper-V
>>> swiotlb bounce buffer DMA address will be returned. The swiotlb bounce
>>> buffer has been marked visible to the host at boot.
>>>
>>> Allocate the rx/tx ring buffers via alloc_pages() in Isolation VM and map
>>> these pages via vmap(). After calling vmbus_establish_gpadl(), which
>>> marks these pages visible to the host, unmap these pages to release the
>>> virtual addresses mapped to physical addresses below shared_gpa_boundary,
>>> and map them in the extra address space via vmap_pfn().
>>>
>>> Signed-off-by: Tianyu Lan <Tianyu.Lan@microsoft.com>
>>> ---
>>> Changes since v4:
>>> * Allocate rx/tx ring buffer via alloc_pages() in Isolation VM
>>> * Map pages after calling vmbus_establish_gpadl().
>>> * Set dma_set_min_align_mask for netvsc driver.
>>>
>>> Changes since v3:
>>> * Add comment to explain why dma_map_sg() is not used
>>> * Fix some error handling.
>>> ---
>>> drivers/net/hyperv/hyperv_net.h | 7 +
>>> drivers/net/hyperv/netvsc.c | 287 +++++++++++++++++++++++++++++-
>>> drivers/net/hyperv/netvsc_drv.c | 1 +
>>> drivers/net/hyperv/rndis_filter.c | 2 +
>>> include/linux/hyperv.h | 5 +
>>> 5 files changed, 296 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
>>> index 315278a7cf88..87e8c74398a5 100644
>>> --- a/drivers/net/hyperv/hyperv_net.h
>>> +++ b/drivers/net/hyperv/hyperv_net.h
>>> @@ -164,6 +164,7 @@ struct hv_netvsc_packet {
>>> u32 total_bytes;
>>> u32 send_buf_index;
>>> u32 total_data_buflen;
>>> + struct hv_dma_range *dma_range;
>>> };
>>>
>>> #define NETVSC_HASH_KEYLEN 40
>>> @@ -1074,6 +1075,8 @@ struct netvsc_device {
>>>
>>> /* Receive buffer allocated by us but manages by NetVSP */
>>> void *recv_buf;
>>> + struct page **recv_pages;
>>> + u32 recv_page_count;
>>> u32 recv_buf_size; /* allocated bytes */
>>> struct vmbus_gpadl recv_buf_gpadl_handle;
>>> u32 recv_section_cnt;
>>> @@ -1082,6 +1085,8 @@ struct netvsc_device {
>>>
>>> /* Send buffer allocated by us */
>>> void *send_buf;
>>> + struct page **send_pages;
>>> + u32 send_page_count;
>>> u32 send_buf_size;
>>> struct vmbus_gpadl send_buf_gpadl_handle;
>>> u32 send_section_cnt;
>>> @@ -1731,4 +1736,6 @@ struct rndis_message {
>>> #define RETRY_US_HI 10000
>>> #define RETRY_MAX 2000 /* >10 sec */
>>>
>>> +void netvsc_dma_unmap(struct hv_device *hv_dev,
>>> + struct hv_netvsc_packet *packet);
>>> #endif /* _HYPERV_NET_H */
>>> diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
>>> index 1f87e570ed2b..7d5254bf043e 100644
>>> --- a/drivers/net/hyperv/netvsc.c
>>> +++ b/drivers/net/hyperv/netvsc.c
>>> @@ -20,6 +20,7 @@
>>> #include <linux/vmalloc.h>
>>> #include <linux/rtnetlink.h>
>>> #include <linux/prefetch.h>
>>> +#include <linux/gfp.h>
>>>
>>> #include <asm/sync_bitops.h>
>>> #include <asm/mshyperv.h>
>>> @@ -150,11 +151,33 @@ static void free_netvsc_device(struct rcu_head *head)
>>> {
>>> struct netvsc_device *nvdev
>>> = container_of(head, struct netvsc_device, rcu);
>>> + unsigned int alloc_unit;
>>> int i;
>>>
>>> kfree(nvdev->extension);
>>> - vfree(nvdev->recv_buf);
>>> - vfree(nvdev->send_buf);
>>> +
>>> + if (nvdev->recv_pages) {
>>> + alloc_unit = (nvdev->recv_buf_size /
>>> + nvdev->recv_page_count) >> PAGE_SHIFT;
>>> +
>>> + vunmap(nvdev->recv_buf);
>>> + for (i = 0; i < nvdev->recv_page_count; i++)
>>> + __free_pages(nvdev->recv_pages[i], get_order(alloc_unit << PAGE_SHIFT));
>>> + } else {
>>> + vfree(nvdev->recv_buf);
>>> + }
>>> +
>>> + if (nvdev->send_pages) {
>>> + alloc_unit = (nvdev->send_buf_size /
>>> + nvdev->send_page_count) >> PAGE_SHIFT;
>>> +
>>> + vunmap(nvdev->send_buf);
>>> + for (i = 0; i < nvdev->send_page_count; i++)
>>> + __free_pages(nvdev->send_pages[i], get_order(alloc_unit << PAGE_SHIFT));
>>> + } else {
>>> + vfree(nvdev->send_buf);
>>> + }
>>> +
>>> kfree(nvdev->send_section_map);
>>>
>>> for (i = 0; i < VRSS_CHANNEL_MAX; i++) {
>>> @@ -330,6 +353,108 @@ int netvsc_alloc_recv_comp_ring(struct netvsc_device *net_device, u32 q_idx)
>>> return nvchan->mrc.slots ? 0 : -ENOMEM;
>>> }
>>>
>>> +void *netvsc_alloc_pages(struct page ***pages_array, unsigned int *array_len,
>>> + unsigned long size)
>>> +{
>>> + struct page *page, **pages, **vmap_pages;
>>> + unsigned long pg_count = size >> PAGE_SHIFT;
>>> + int alloc_unit = MAX_ORDER_NR_PAGES;
>>> + int i, j, vmap_page_index = 0;
>>> + void *vaddr;
>>> +
>>> + if (pg_count < alloc_unit)
>>> + alloc_unit = 1;
>>> +
>>> + /* vmap() takes a page array with one entry per PAGE_SIZE page, while
>>> + * high order pages are allocated here to save page array space.
>>> + * vmap_pages[] is the input parameter of vmap(); pages[] stores the
>>> + * allocated chunks so they can be remapped later.
>>> + */
>>> + vmap_pages = kmalloc_array(pg_count, sizeof(*vmap_pages), GFP_KERNEL);
>>> + if (!vmap_pages)
>>> + return NULL;
>>> +
>>> +retry:
>>> + *array_len = pg_count / alloc_unit;
>>> + pages = kmalloc_array(*array_len, sizeof(*pages), GFP_KERNEL);
>>> + if (!pages)
>>> + goto cleanup;
>>> +
>>> + for (i = 0; i < *array_len; i++) {
>>> + page = alloc_pages(GFP_KERNEL | __GFP_ZERO,
>>> + get_order(alloc_unit << PAGE_SHIFT));
>>> + if (!page) {
>>> + /* Try allocating small pages if high order pages are not available. */
>>> + if (alloc_unit == 1) {
>>> + goto cleanup;
>>> + } else {
>>
>> The "else" clause isn't really needed because of the goto cleanup above.
>> Then the indentation of the code below could be reduced by one level.
>>
>>> + memset(vmap_pages, 0,
>>> + sizeof(*vmap_pages) * vmap_page_index);
>>> + vmap_page_index = 0;
>>> +
>>> + for (j = 0; j < i; j++)
>>> + __free_pages(pages[j], get_order(alloc_unit << PAGE_SHIFT));
>>> +
>>> + kfree(pages);
>>> + alloc_unit = 1;
>>
>> This is the case where a large enough contiguous physical memory chunk
>> could not be found. But rather than dropping all the way down to single
>> pages, would it make sense to try something smaller, but not 1? For
>> example, cut the alloc_unit in half and try again. But I'm not sure of all
>> the implications.
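A minimal user-space sketch of the halving idea, assuming a hypothetical allocator model in which a chunk of 2^order pages can be allocated only if order does not exceed the largest free block (max_free_order). pages_to_order() mirrors what get_order(nr_pages << PAGE_SHIFT) computes for a page count; model_alloc() and pick_alloc_unit() are illustrative names, not kernel functions:

```c
#include <assert.h>
#include <stdbool.h>

/* Smallest order such that (1u << order) >= nr_pages, i.e. what
 * get_order(nr_pages << PAGE_SHIFT) yields for a page count. */
static unsigned int pages_to_order(unsigned int nr_pages)
{
	unsigned int order = 0;

	assert(nr_pages > 0);
	while ((1u << order) < nr_pages)
		order++;
	return order;
}

/* Hypothetical allocator model: a chunk of 2^order pages succeeds only
 * if the order does not exceed the largest free contiguous block. */
static bool model_alloc(unsigned int order, unsigned int max_free_order)
{
	return order <= max_free_order;
}

/* Halve alloc_unit on failure instead of dropping straight to 1. */
static unsigned int pick_alloc_unit(unsigned int pg_count, unsigned int alloc_unit,
				    unsigned int max_free_order)
{
	if (pg_count < alloc_unit)
		alloc_unit = 1;

	while (alloc_unit > 1 &&
	       !model_alloc(pages_to_order(alloc_unit), max_free_order))
		alloc_unit /= 2;

	return alloc_unit;
}
```

Two implications worth checking: __free_pages() takes an order, not a page count, so a chunk of alloc_unit pages must be freed with pages_to_order(alloc_unit) (get_order(alloc_unit << PAGE_SHIFT) in the kernel). And since *array_len is computed as pg_count / alloc_unit, every alloc_unit tried should divide pg_count evenly, otherwise the tail of the buffer is never allocated.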
>>
>>> + goto retry;
>>> + }
>>> + }
>>> +
>>> + pages[i] = page;
>>> + for (j = 0; j < alloc_unit; j++)
>>> + vmap_pages[vmap_page_index++] = page++;
>>> + }
>>> +
>>> + vaddr = vmap(vmap_pages, vmap_page_index, VM_MAP, PAGE_KERNEL);
>>> + kfree(vmap_pages);
>>> +
>>> + *pages_array = pages;
>>> + return vaddr;
>>> +
>>> +cleanup:
>>> + for (j = 0; pages && j < i; j++)
>>> + __free_pages(pages[j], get_order(alloc_unit << PAGE_SHIFT));
>>> +
>>> + kfree(pages);
>>> + kfree(vmap_pages);
>>> + return NULL;
>>> +}
>>> +
>>> +static void *netvsc_map_pages(struct page **pages, int count, int alloc_unit)
>>> +{
>>> + int pg_count = count * alloc_unit;
>>> + struct page *page;
>>> + unsigned long *pfns;
>>> + int pfn_index = 0;
>>> + void *vaddr;
>>> + int i, j;
>>> +
>>> + if (!pages)
>>> + return NULL;
>>> +
>>> + pfns = kcalloc(pg_count, sizeof(*pfns), GFP_KERNEL);
>>> + if (!pfns)
>>> + return NULL;
>>> +
>>> + for (i = 0; i < count; i++) {
>>> + page = pages[i];
>>> + if (!page) {
>>> + pr_warn("page is not available %d.\n", i);
>>> + kfree(pfns);
>>> + return NULL;
>>> + }
>>> +
>>> + for (j = 0; j < alloc_unit; j++) {
>>> + pfns[pfn_index++] = page_to_pfn(page++) +
>>> + (ms_hyperv.shared_gpa_boundary >> PAGE_SHIFT);
>>> + }
>>> + }
>>> +
>>> + vaddr = vmap_pfn(pfns, pg_count, PAGE_KERNEL_IO);
>>> + kfree(pfns);
>>> + return vaddr;
>>> +}
>>> +
>>
>> I think you are proposing this approach to allocating memory for the send
>> and receive buffers so that you can avoid having two virtual mappings for
>> the memory, per comments from Christoph Hellwig. But overall, the approach
>> seems a bit complex and I wonder if it is worth it. If allocating large
>> contiguous chunks of physical memory is successful, then there is some
>> memory savings in that the data structures needed to keep track of the
>> physical pages are smaller than the equivalent page tables might be. But
>> if you have to revert to allocating individual pages, then the memory
>> savings are reduced.
>>
>> Ultimately, the list of actual PFNs has to be kept somewhere. Another
>> approach would be to do the reverse of what hv_map_memory() from the v4
>> patch series does. I.e., you could do virt_to_phys() on each virtual
>> address that maps above VTOM, and subtract out the shared_gpa_boundary to
>> get the list of actual PFNs that need to be freed. This way you don't have
>> two copies of the list of PFNs -- one with and one without the
>> shared_gpa_boundary added. But it comes at the cost of additional code, so
>> that may not be a great idea.
>>
>> I think what you have here works, and I don't have a clearly better
>> solution at the moment except perhaps to revert to the v4 solution and
>> just have two virtual mappings. I'll keep thinking about it. Maybe
>> Christoph has other thoughts.
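The address arithmetic behind both options can be sketched in user-space C. pfn_to_shared_pfn() models the forward translation in netvsc_map_pages() (add shared_gpa_boundary >> PAGE_SHIFT), and shared_phys_to_pfn() models the reverse step suggested above (subtract the boundary from a physical address above VTOM). Both helper names, the 4 KiB page size, and the example boundary value are assumptions for illustration; how the physical address behind each vmap_pfn() mapping would be looked up is outside this sketch:

```c
#include <assert.h>
#include <stdint.h>

#define HV_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical helper: the host-visible alias of a guest page sits at
 * the same offset above shared_gpa_boundary (VTOM). */
static uint64_t pfn_to_shared_pfn(uint64_t pfn, uint64_t shared_gpa_boundary)
{
	return pfn + (shared_gpa_boundary >> HV_PAGE_SHIFT);
}

/* Hypothetical helper for the reverse direction: recover the original
 * PFN (the one that must be freed) from a physical address above VTOM. */
static uint64_t shared_phys_to_pfn(uint64_t shared_phys, uint64_t shared_gpa_boundary)
{
	assert(shared_phys >= shared_gpa_boundary);
	return (shared_phys - shared_gpa_boundary) >> HV_PAGE_SHIFT;
}
```

Keeping only one of the two PFN lists and deriving the other this way trades memory for the extra lookup code mentioned above.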
>>
>>> static int netvsc_init_buf(struct hv_device *device,
>>> struct netvsc_device *net_device,
>>> const struct netvsc_device_info *device_info)
>>> @@ -337,7 +462,7 @@ static int netvsc_init_buf(struct hv_device *device,
>>> struct nvsp_1_message_send_receive_buffer_complete *resp;
>>> struct net_device *ndev = hv_get_drvdata(device);
>>> struct nvsp_message *init_packet;
>>> - unsigned int buf_size;
>>> + unsigned int buf_size, alloc_unit;
>>> size_t map_words;
>>> int i, ret = 0;
>>>
>>> @@ -350,7 +475,14 @@ static int netvsc_init_buf(struct hv_device *device,
>>> buf_size = min_t(unsigned int, buf_size,
>>> NETVSC_RECEIVE_BUFFER_SIZE_LEGACY);
>>>
>>> - net_device->recv_buf = vzalloc(buf_size);
>>> + if (hv_isolation_type_snp())
>>> + net_device->recv_buf =
>>> + netvsc_alloc_pages(&net_device->recv_pages,
>>> + &net_device->recv_page_count,
>>> + buf_size);
>>> + else
>>> + net_device->recv_buf = vzalloc(buf_size);
>>> +
>>
>> I wonder if it is necessary to have two different code paths here. The
>> allocating and freeing of the send and receive buffers is not perf
>> sensitive, and it seems like netvsc_alloc_pages() could be used
>> regardless of whether SNP Isolation is in effect. To my thinking,
>> one code path is better than two code paths unless there's a
>> compelling reason to have two.
>>
>>> if (!net_device->recv_buf) {
>>> netdev_err(ndev,
>>> "unable to allocate receive buffer of size %u\n",
>>> @@ -375,6 +507,27 @@ static int netvsc_init_buf(struct hv_device *device,
>>> goto cleanup;
>>> }
>>>
>>> + if (hv_isolation_type_snp()) {
>>> + alloc_unit = (buf_size / net_device->recv_page_count)
>>> + >> PAGE_SHIFT;
>>> +
>>> + /* Unmap previous virtual address and map pages in the extra
>>> + * address space (above shared gpa boundary) in Isolation VM.
>>> + */
>>> + vunmap(net_device->recv_buf);
>>> + net_device->recv_buf =
>>> + netvsc_map_pages(net_device->recv_pages,
>>> + net_device->recv_page_count,
>>> + alloc_unit);
>>> + if (!net_device->recv_buf) {
>>> + netdev_err(ndev,
>>> + "unable to allocate receive buffer of size %u\n",
>>> + buf_size);
>>> + ret = -ENOMEM;
>>> + goto cleanup;
>>> + }
>>> + }
>>> +
>>> /* Notify the NetVsp of the gpadl handle */
>>> init_packet = &net_device->channel_init_pkt;
>>> memset(init_packet, 0, sizeof(struct nvsp_message));
>>> @@ -456,13 +609,21 @@ static int netvsc_init_buf(struct hv_device *device,
>>> buf_size = device_info->send_sections *
>>> device_info->send_section_size;
>>> buf_size = round_up(buf_size, PAGE_SIZE);
>>>
>>> - net_device->send_buf = vzalloc(buf_size);
>>> + if (hv_isolation_type_snp())
>>> + net_device->send_buf =
>>> + netvsc_alloc_pages(&net_device->send_pages,
>>> + &net_device->send_page_count,
>>> + buf_size);
>>> + else
>>> + net_device->send_buf = vzalloc(buf_size);
>>> +
>>> if (!net_device->send_buf) {
>>> netdev_err(ndev, "unable to allocate send buffer of size %u\n",
>>> buf_size);
>>> ret = -ENOMEM;
>>> goto cleanup;
>>> }
>>> +
>>> net_device->send_buf_size = buf_size;
>>>
>>> /* Establish the gpadl handle for this buffer on this
>>> @@ -478,6 +639,27 @@ static int netvsc_init_buf(struct hv_device *device,
>>> goto cleanup;
>>> }
>>>
>>> + if (hv_isolation_type_snp()) {
>>> + alloc_unit = (buf_size / net_device->send_page_count)
>>> + >> PAGE_SHIFT;
>>> +
>>> + /* Unmap previous virtual address and map pages in the extra
>>> + * address space (above shared gpa boundary) in Isolation VM.
>>> + */
>>> + vunmap(net_device->send_buf);
>>> + net_device->send_buf =
>>> + netvsc_map_pages(net_device->send_pages,
>>> + net_device->send_page_count,
>>> + alloc_unit);
>>> + if (!net_device->send_buf) {
>>> + netdev_err(ndev,
>>> + "unable to allocate send buffer of size %u\n",
>>> + buf_size);
>>> + ret = -ENOMEM;
>>> + goto cleanup;
>>> + }
>>> + }
>>> +
>>> /* Notify the NetVsp of the gpadl handle */
>>> init_packet = &net_device->channel_init_pkt;
>>> memset(init_packet, 0, sizeof(struct nvsp_message));
>>> @@ -768,7 +950,7 @@ static void netvsc_send_tx_complete(struct net_device *ndev,
>>>
>>> /* Notify the layer above us */
>>> if (likely(skb)) {
>>> - const struct hv_netvsc_packet *packet
>>> + struct hv_netvsc_packet *packet
>>> = (struct hv_netvsc_packet *)skb->cb;
>>> u32 send_index = packet->send_buf_index;
>>> struct netvsc_stats *tx_stats;
>>> @@ -784,6 +966,7 @@ static void netvsc_send_tx_complete(struct net_device *ndev,
>>> tx_stats->bytes += packet->total_bytes;
>>> u64_stats_update_end(&tx_stats->syncp);
>>>
>>> + netvsc_dma_unmap(ndev_ctx->device_ctx, packet);
>>> napi_consume_skb(skb, budget);
>>> }
>>>
>>> @@ -948,6 +1131,87 @@ static void netvsc_copy_to_send_buf(struct netvsc_device *net_device,
>>> memset(dest, 0, padding);
>>> }
>>>
>>> +void netvsc_dma_unmap(struct hv_device *hv_dev,
>>> + struct hv_netvsc_packet *packet)
>>> +{
>>> + u32 page_count = packet->cp_partial ?
>>> + packet->page_buf_cnt - packet->rmsg_pgcnt :
>>> + packet->page_buf_cnt;
>>> + int i;
>>> +
>>> + if (!hv_is_isolation_supported())
>>> + return;
>>> +
>>> + if (!packet->dma_range)
>>> + return;
>>> +
>>> + for (i = 0; i < page_count; i++)
>>> + dma_unmap_single(&hv_dev->device, packet->dma_range[i].dma,
>>> + packet->dma_range[i].mapping_size,
>>> + DMA_TO_DEVICE);
>>> +
>>> + kfree(packet->dma_range);
>>> +}
>>> +
>>> +/* netvsc_dma_map - Map the data pages of a packet sent by
>>> + * vmbus_sendpacket_pagebuffer() to swiotlb bounce buffers in an
>>> + * Isolation VM.
>>> + *
>>> + * In an Isolation VM, the netvsc send buffer has been marked visible
>>> + * to the host, so data copied to the send buffer doesn't need a
>>> + * bounce buffer. The data pages handled by vmbus_sendpacket_pagebuffer()
>>> + * may not be copied to the send buffer, so these pages need to be
>>> + * mapped to swiotlb bounce buffers; netvsc_dma_map() does that. The
>>> + * pfns in struct hv_page_buffer need to be converted to the bounce
>>> + * buffer's pfns. The loop here is necessary because the entries in
>>> + * the page buffer array are not necessarily full pages of data. Each
>>> + * entry in the array has a separate offset and len that may be
>>> + * non-zero, even for entries in the middle of the array. And the
>>> + * entries are not physically contiguous. So each entry must be
>>> + * individually mapped rather than as a contiguous unit, which is why
>>> + * dma_map_sg() is not used here.
>>> + */
>>> +static int netvsc_dma_map(struct hv_device *hv_dev,
>>> + struct hv_netvsc_packet *packet,
>>> + struct hv_page_buffer *pb)
>>> +{
>>> + u32 page_count = packet->cp_partial ?
>>> + packet->page_buf_cnt - packet->rmsg_pgcnt :
>>> + packet->page_buf_cnt;
>>> + dma_addr_t dma;
>>> + int i;
>>> +
>>> + if (!hv_is_isolation_supported())
>>> + return 0;
>>> +
>>> + packet->dma_range = kcalloc(page_count,
>>> + sizeof(*packet->dma_range),
>>> + GFP_KERNEL);
>>> + if (!packet->dma_range)
>>> + return -ENOMEM;
>>> +
>>> + for (i = 0; i < page_count; i++) {
>>> + char *src = phys_to_virt((pb[i].pfn << HV_HYP_PAGE_SHIFT)
>>> + + pb[i].offset);
>>> + u32 len = pb[i].len;
>>> +
>>> + dma = dma_map_single(&hv_dev->device, src, len,
>>> + DMA_TO_DEVICE);
>>> + if (dma_mapping_error(&hv_dev->device, dma)) {
>>> + kfree(packet->dma_range);
>>> + return -ENOMEM;
>>> + }
>>> +
>>> + packet->dma_range[i].dma = dma;
>>> + packet->dma_range[i].mapping_size = len;
>>> + pb[i].pfn = dma >> HV_HYP_PAGE_SHIFT;
>>> + pb[i].offset = offset_in_hvpage(dma);
>>
>> With the DMA min align mask now being set, the offset within
>> the Hyper-V page won't be changed by dma_map_single(). So I
>> think the above statement can be removed.
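The reasoning can be checked with a small model, assuming swiotlb honours the mask set by dma_set_min_align_mask(&dev->device, HV_HYP_PAGE_SIZE - 1) by carrying the masked low bits of the original address into the bounce address. bounce_map_model() and offset_in_hvpage_model() are hypothetical stand-ins, not kernel functions, and the slot address is an arbitrary page-aligned example:

```c
#include <assert.h>
#include <stdint.h>

/* User-space stand-in for the kernel constant (4096, HV_HYP_PAGE_SHIFT 12). */
#define HV_HYP_PAGE_SIZE 4096ULL
/* Mirrors dma_set_min_align_mask(&dev->device, HV_HYP_PAGE_SIZE - 1). */
#define MIN_ALIGN_MASK (HV_HYP_PAGE_SIZE - 1)

static uint64_t offset_in_hvpage_model(uint64_t addr)
{
	return addr & (HV_HYP_PAGE_SIZE - 1);
}

/* Hypothetical bounce mapping that honours min_align_mask: the slot
 * starts on a page boundary and the masked low bits of the original
 * address are preserved in the returned DMA address. */
static uint64_t bounce_map_model(uint64_t orig_phys, uint64_t slot_base)
{
	assert((slot_base & MIN_ALIGN_MASK) == 0);
	return slot_base | (orig_phys & MIN_ALIGN_MASK);
}
```

Under that assumption only the pfn changes across the mapping, so the value already in pb[i].offset stays correct.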
>>
>>> + pb[i].len = len;
>>
>> A few lines above, the value of "len" is set from pb[i].len. Neither
>> "len" nor "i" is changed in the loop, so this statement can also be
>> removed.
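Separately, on a dma_mapping_error() for entry i, the quoted netvsc_dma_map() frees packet->dma_range without unmapping entries 0..i-1 and without clearing the pointer. A conventional unwind pattern is sketched below in user-space C. model_map()/model_unmap() are hypothetical stand-ins for dma_map_single()/dma_unmap_single(), struct range loosely mirrors struct hv_dma_range, and fail_at forces a failure at a chosen index:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct range { unsigned long dma; unsigned int len; };

/* Hypothetical stand-ins for dma_map_single()/dma_unmap_single(). */
static int live_mappings;

static bool model_map(unsigned int idx, unsigned int fail_at, unsigned long *dma)
{
	if (idx == fail_at)
		return false;
	*dma = 0x1000UL * (idx + 1);
	live_mappings++;
	return true;
}

static void model_unmap(unsigned long dma)
{
	(void)dma;
	live_mappings--;
}

/* Map each entry individually; on failure, undo the mappings already
 * made, free the range array, and leave *out NULL so a later unmap
 * call finds nothing to release. */
static int map_all(const unsigned int *lens, unsigned int count,
		   unsigned int fail_at, struct range **out)
{
	struct range *r = calloc(count, sizeof(*r));
	unsigned int i;

	*out = NULL;
	if (!r)
		return -1;

	for (i = 0; i < count; i++) {
		if (!model_map(i, fail_at, &r[i].dma))
			goto unwind;
		r[i].len = lens[i];
	}
	*out = r;
	return 0;

unwind:
	while (i--)		/* unmap entries i-1 .. 0 */
		model_unmap(r[i].dma);
	free(r);
	return -1;
}
```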
>>
>>> + }
>>> +
>>> + return 0;
>>> +}
>>> +
>>> static inline int netvsc_send_pkt(
>>> struct hv_device *device,
>>> struct hv_netvsc_packet *packet,
>>> @@ -988,14 +1252,24 @@ static inline int netvsc_send_pkt(
>>>
>>> trace_nvsp_send_pkt(ndev, out_channel, rpkt);
>>>
>>> + packet->dma_range = NULL;
>>> if (packet->page_buf_cnt) {
>>> if (packet->cp_partial)
>>> pb += packet->rmsg_pgcnt;
>>>
>>> + ret = netvsc_dma_map(ndev_ctx->device_ctx, packet, pb);
>>> + if (ret) {
>>> + ret = -EAGAIN;
>>> + goto exit;
>>> + }
>>> +
>>> ret = vmbus_sendpacket_pagebuffer(out_channel,
>>> pb, packet->page_buf_cnt,
>>> &nvmsg, sizeof(nvmsg),
>>> req_id);
>>> +
>>> + if (ret)
>>> + netvsc_dma_unmap(ndev_ctx->device_ctx, packet);
>>> } else {
>>> ret = vmbus_sendpacket(out_channel,
>>> &nvmsg, sizeof(nvmsg),
>>> @@ -1003,6 +1277,7 @@ static inline int netvsc_send_pkt(
>>> VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
>>> }
>>>
>>> +exit:
>>> if (ret == 0) {
>>> atomic_inc_return(&nvchan->queue_sends);
>>>
>>> diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
>>> index 382bebc2420d..c3dc884b31e3 100644
>>> --- a/drivers/net/hyperv/netvsc_drv.c
>>> +++ b/drivers/net/hyperv/netvsc_drv.c
>>> @@ -2577,6 +2577,7 @@ static int netvsc_probe(struct hv_device *dev,
>>> list_add(&net_device_ctx->list, &netvsc_dev_list);
>>> rtnl_unlock();
>>>
>>> + dma_set_min_align_mask(&dev->device, HV_HYP_PAGE_SIZE - 1);
>>> netvsc_devinfo_put(device_info);
>>> return 0;
>>>
>>> diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c
>>> index f6c9c2a670f9..448fcc325ed7 100644
>>> --- a/drivers/net/hyperv/rndis_filter.c
>>> +++ b/drivers/net/hyperv/rndis_filter.c
>>> @@ -361,6 +361,8 @@ static void rndis_filter_receive_response(struct net_device *ndev,
>>> }
>>> }
>>>
>>> + netvsc_dma_unmap(((struct net_device_context *)
>>> + netdev_priv(ndev))->device_ctx, &request->pkt);
>>> complete(&request->wait_event);
>>> } else {
>>> netdev_err(ndev,
>>> diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
>>> index c94c534a944e..81e58dd582dc 100644
>>> --- a/include/linux/hyperv.h
>>> +++ b/include/linux/hyperv.h
>>> @@ -1597,6 +1597,11 @@ struct hyperv_service_callback {
>>> void (*callback)(void *context);
>>> };
>>>
>>> +struct hv_dma_range {
>>> + dma_addr_t dma;
>>> + u32 mapping_size;
>>> +};
>>> +
>>> #define MAX_SRV_VER 0x7ffffff
>>> extern bool vmbus_prep_negotiate_resp(struct icmsg_hdr *icmsghdrp,
>>> u8 *buf, u32 buflen,
>>> const int *fw_version, int fw_vercnt,
>>> --
>>> 2.25.1
>>
Thread overview: 31+ messages
2021-09-14 13:39 [PATCH V5 00/12] x86/Hyper-V: Add Hyper-V Isolation VM support Tianyu Lan
2021-09-14 13:39 ` [PATCH V5 01/12] x86/hyperv: Initialize GHCB page in Isolation VM Tianyu Lan
2021-09-14 13:39 ` [PATCH V5 02/12] x86/hyperv: Initialize shared memory boundary in the " Tianyu Lan
2021-09-14 13:39 ` [PATCH V5 03/12] x86/hyperv: Add new hvcall guest address host visibility support Tianyu Lan
2021-09-14 13:39 ` [PATCH V5 04/12] Drivers: hv: vmbus: Mark vmbus ring buffer visible to host in Isolation VM Tianyu Lan
2021-09-15 15:40 ` Michael Kelley
2021-09-14 13:39 ` [PATCH V5 05/12] x86/hyperv: Add Write/Read MSR registers via ghcb page Tianyu Lan
2021-09-15 15:41 ` Michael Kelley
2021-09-14 13:39 ` [PATCH V5 06/12] x86/hyperv: Add ghcb hvcall support for SNP VM Tianyu Lan
2021-09-14 13:39 ` [PATCH V5 07/12] Drivers: hv: vmbus: Add SNP support for VMbus channel initiate message Tianyu Lan
2021-09-15 15:41 ` Michael Kelley
2021-09-16 10:52 ` Tianyu Lan
2021-09-14 13:39 ` [PATCH V5 08/12] Drivers: hv : vmbus: Initialize VMbus ring buffer for Isolation VM Tianyu Lan
2021-09-14 13:39 ` [PATCH V5 09/12] x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM Tianyu Lan
2021-09-15 15:42 ` Michael Kelley
2021-09-16 10:57 ` Tianyu Lan
2021-09-14 13:39 ` [PATCH V5 10/12] hyperv/IOMMU: Enable swiotlb bounce buffer for Isolation VM Tianyu Lan
2021-09-15 15:43 ` Michael Kelley
2021-09-14 13:39 ` [PATCH V5 11/12] scsi: storvsc: Add Isolation VM support for storvsc driver Tianyu Lan
2021-09-15 15:43 ` Michael Kelley
2021-09-14 13:39 ` [PATCH V5 12/12] net: netvsc: Add Isolation VM support for netvsc driver Tianyu Lan
2021-09-14 15:49 ` Haiyang Zhang
2021-09-15 16:21 ` Michael Kelley
2021-09-15 16:46 ` Haiyang Zhang
2021-09-16 13:56 ` Tianyu Lan
2021-09-16 14:43 ` Tianyu Lan
2021-09-22 10:34 ` Tianyu Lan
2021-09-27 14:26 ` Tianyu Lan [this message]
2021-09-28 5:39 ` Christoph Hellwig
2021-09-28 9:23 ` Tianyu Lan
2021-09-30 5:48 ` Christoph Hellwig