All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/3] UIO driver for low speed Hyper-V devices
@ 2023-07-14 10:25 Saurabh Sengar
  2023-07-14 10:25 ` [PATCH v3 1/3] uio: Add hv_vmbus_client driver Saurabh Sengar
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Saurabh Sengar @ 2023-07-14 10:25 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, mikelley, gregkh, corbet,
	linux-kernel, linux-hyperv, linux-doc

Hyper-V is adding multiple low speed "speciality" synthetic devices.
Instead of writing a new kernel-level VMBus driver for each device,
make the devices accessible to user space through a UIO-based
hv_vmbus_client driver. Each device can then be supported by a user
space driver. This approach optimizes the development process and
provides flexibility to user space applications to control the key
interactions with the VMBus ring buffer.

The new synthetic devices are low speed devices that don't support
VMBus monitor bits, and so they must use vmbus_setevent() to notify
the host of ring buffer updates. The new driver provides this
functionality along with a configurable ring buffer size.

Moreover, this series of patches incorporates an update to the fcopy
application, enabling it to seamlessly utilize the new interface. The
older fcopy driver and application will be phased out gradually.
Development of other similar userspace drivers is still underway.

Moreover, this patch series adds a new implementation of the fcopy
application that uses the new UIO driver. The older fcopy driver and
application will be phased out gradually. Development of other similar
userspace drivers is still underway.

[V3]
- Removed ringbuffer sysfs entry and used uio framework for mmap
- Remove ".id_table = NULL"
- kasprintf -> devm_kasprintf
- Change global variable ring_size to per device
- More checks on value which can be set for ring_size
- Remove driverctl, and used echo command instead for driver documentation
- Remove unnecessary one time use macros
- Change kernel version and date for sysfs documentation
- Update documentation.
- Made ring buffer data offset depend on page size
- remove rte_smp_rwmb macro and reused rte_compiler_barrier instead
- Added legal counsel sign-off
- simplify mmap
- Removed "Link:" tag 
- Improve cover letter and commit messages
- Improve debug prints
- Instead of hardcoded instance id, query from class id sysfs
- Set the ring_size value from application
- new application compilation dependent on x86
- Not removing the old driver and application for backward compatibility

[V2]
- Update driver info in Documentation/driver-api/uio-howto.rst
- Update ring_size sysfs info in Documentation/ABI/stable/sysfs-bus-vmbus
- Remove DRIVER_VERSION
- Remove refcnt
- scnprintf -> sysfs_emit
- sysfs_create_file -> ATTRIBUTE_GROUPS + ".driver.groups";
- sysfs_create_bin_file -> device_create_bin_file
- dev_notice -> dev_err
- remove MODULE_VERSION
- simpler sysfs path, less parsing

Saurabh Sengar (3):
  uio: Add hv_vmbus_client driver
  tools: hv: Add vmbus_bufring
  tools: hv: Add new fcopy application based on uio driver

 Documentation/ABI/stable/sysfs-bus-vmbus |  10 +
 Documentation/driver-api/uio-howto.rst   |  54 +++
 drivers/uio/Kconfig                      |  12 +
 drivers/uio/Makefile                     |   1 +
 drivers/uio/uio_hv_vmbus_client.c        | 218 +++++++++
 tools/hv/Build                           |   2 +
 tools/hv/Makefile                        |  21 +-
 tools/hv/hv_fcopy_uio_daemon.c           | 578 +++++++++++++++++++++++
 tools/hv/vmbus_bufring.c                 | 297 ++++++++++++
 tools/hv/vmbus_bufring.h                 | 154 ++++++
 10 files changed, 1346 insertions(+), 1 deletion(-)
 create mode 100644 drivers/uio/uio_hv_vmbus_client.c
 create mode 100644 tools/hv/hv_fcopy_uio_daemon.c
 create mode 100644 tools/hv/vmbus_bufring.c
 create mode 100644 tools/hv/vmbus_bufring.h

-- 
2.34.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH v3 1/3] uio: Add hv_vmbus_client driver
  2023-07-14 10:25 [PATCH v3 0/3] UIO driver for low speed Hyper-V devices Saurabh Sengar
@ 2023-07-14 10:25 ` Saurabh Sengar
  2023-08-02 21:43   ` Michael Kelley (LINUX)
  2023-07-14 10:25 ` [PATCH v3 2/3] tools: hv: Add vmbus_bufring Saurabh Sengar
  2023-07-14 10:25 ` [PATCH v3 3/3] tools: hv: Add new fcopy application based on uio driver Saurabh Sengar
  2 siblings, 1 reply; 9+ messages in thread
From: Saurabh Sengar @ 2023-07-14 10:25 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, mikelley, gregkh, corbet,
	linux-kernel, linux-hyperv, linux-doc

Add a new UIO-based driver that generically supports low speed Hyper-V
VMBus devices. This driver can be bound to VMBus devices by user space
drivers that provide device-specific management.

The new driver provides the following core functionality, which is
suitable for low speed devices:
* A single VMBus channel for each device
* Ability to specify the VMBus channel ring buffer size for each device
* Host notification via a hypercall instead of monitor bits

Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
---
[V3]
- Removed ringbuffer sysfs entry and used uio framework for mmap
- Remove ".id_table = NULL"
- kasprintf -> devm_kasprintf
- Change global variable ring_size to per device
- More checks on value which can be set for ring_size
- Remove driverctl, and used echo command instead for driver documentation
- Remove unnecessary one time use macros
- Change kernel version and date for sysfs documentation
- Update documentation
- Better commit message

[V2]
- Update driver info in Documentation/driver-api/uio-howto.rst
- Update ring_size sysfs info in Documentation/ABI/stable/sysfs-bus-vmbus
- Remove DRIVER_VERSION
- Remove refcnt
- scnprintf -> sysfs_emit
- sysfs_create_file -> ATTRIBUTE_GROUPS + ".driver.groups";
- sysfs_create_bin_file -> device_create_bin_file
- dev_notice -> dev_err
- remove MODULE_VERSION

 Documentation/ABI/stable/sysfs-bus-vmbus |  10 ++
 Documentation/driver-api/uio-howto.rst   |  54 ++++++
 drivers/uio/Kconfig                      |  12 ++
 drivers/uio/Makefile                     |   1 +
 drivers/uio/uio_hv_vmbus_client.c        | 218 +++++++++++++++++++++++
 5 files changed, 295 insertions(+)
 create mode 100644 drivers/uio/uio_hv_vmbus_client.c

diff --git a/Documentation/ABI/stable/sysfs-bus-vmbus b/Documentation/ABI/stable/sysfs-bus-vmbus
index 3066feae1d8d..7e77eda77be3 100644
--- a/Documentation/ABI/stable/sysfs-bus-vmbus
+++ b/Documentation/ABI/stable/sysfs-bus-vmbus
@@ -153,6 +153,16 @@ Contact:	Stephen Hemminger <sthemmin@microsoft.com>
 Description:	Binary file created by uio_hv_generic for ring buffer
 Users:		Userspace drivers
 
+What:		/sys/bus/vmbus/devices/<UUID>/ring_size
+Date:		September 2023
+KernelVersion:	6.6
+Contact:	Saurabh Sengar <ssengar@microsoft.com>
+Description:	File created by uio_hv_vmbus_client for setting device ring
+		buffer size. The value specified within the file denotes the
+		total memory allocation for the one complete ring buffer, which
+		includes the ring buffer header, of size PAGE_SIZE.
+Users:		Userspace drivers
+
 What:           /sys/bus/vmbus/devices/<UUID>/channels/<N>/intr_in_full
 Date:           February 2019
 KernelVersion:  5.0
diff --git a/Documentation/driver-api/uio-howto.rst b/Documentation/driver-api/uio-howto.rst
index 907ffa3b38f5..625c2bda369f 100644
--- a/Documentation/driver-api/uio-howto.rst
+++ b/Documentation/driver-api/uio-howto.rst
@@ -722,6 +722,60 @@ For example::
 
 	/sys/bus/vmbus/devices/3811fe4d-0fa0-4b62-981a-74fc1084c757/channels/21/ring
 
+Generic Hyper-V driver for low speed devices
+============================================
+
+The generic driver is a kernel module named uio_hv_vmbus_client. It
+supports slow devices on the Hyper-V VMBus similar to uio_hv_generic
+for faster devices. This driver also gives flexibility of customized
+ring buffer sizes.
+
+Making the driver recognize the device
+--------------------------------------
+
+Since the driver does not declare any device GUID's, it will not get
+loaded automatically and will not automatically bind to any devices. You
+must load it and allocate id to the driver yourself. For example, to use
+the fcopy device class GUID::
+
+        modprobe uio_hv_vmbus_client
+        echo "34d14be3-dee4-41c8-9ae7-6b174977c192" > /sys/bus/vmbus/drivers/uio_hv_vmbus_client/new_id
+
+If there already is a hardware specific kernel driver for the device,
+the generic driver still won't bind to it. In this case if you want to
+use the generic driver for a userspace library you'll have to manually unbind
+the hardware specific driver and bind the generic driver, using the device
+instance GUID like this::
+
+          echo "eb765408-105f-49b6-b4aa-c123b64d17d4" > /sys/bus/vmbus/drivers/uio_hv_vmbus_client/unbind
+          echo "eb765408-105f-49b6-b4aa-c123b64d17d4" > /sys/bus/vmbus/drivers/uio_hv_vmbus_client/bind
+
+You can verify that the device has been bound to the driver by looking
+for it in sysfs, for example like the following::
+
+        ls -l /sys/bus/vmbus/devices/eb765408-105f-49b6-b4aa-c123b64d17d4/driver
+
+Which if successful should print::
+
+      .../eb765408-105f-49b6-b4aa-c123b64d17d4/driver -> ../../../bus/vmbus/drivers/uio_hv_vmbus_client
+
+Things to know about uio_hv_vmbus_client
+----------------------------------------
+
+The uio_hv_vmbus_client driver maps the Hyper-V device ring buffer to userspace
+and offers an interface to manage it.
+
+The userspace API for mapping and performing read/write operations on the device
+ring buffer is implemented in tools/hv/vmbus_bufring.c. Userspace applications
+should use this file as a library and build their logic on top of it.
+
+Additionally, the uio_hv_vmbus_client driver offers the "ring_size" sysfs entry
+for setting the device ring buffer size before opening the device.
+
+For example::
+
+        /sys/bus/vmbus/devices/eb765408-105f-49b6-b4aa-c123b64d17d4/ring_size
+
 Further information
 ===================
 
diff --git a/drivers/uio/Kconfig b/drivers/uio/Kconfig
index 2e16c5338e5b..bd4d27ecfc9a 100644
--- a/drivers/uio/Kconfig
+++ b/drivers/uio/Kconfig
@@ -166,6 +166,18 @@ config UIO_HV_GENERIC
 
 	  If you compile this as a module, it will be called uio_hv_generic.
 
+config UIO_HV_SLOW_DEVICES
+	tristate "Generic driver for low speed VMBus devices"
+	depends on HYPERV
+	help
+	  Generic driver that you can dynamically bind to low speed Hyper-V
+	  VMBus devices to allow a user space driver to manage the device.
+	  The driver provides a single VMBus channel and uses a hypercall
+	  instead of monitor bits to interrupt the host. The driver provides
+	  a configurable per-device ring buffer size.
+
+	  If you compile this as a module, it will be called uio_hv_vmbus_client.
+
 config UIO_DFL
 	tristate "Generic driver for DFL (Device Feature List) bus"
 	depends on FPGA_DFL
diff --git a/drivers/uio/Makefile b/drivers/uio/Makefile
index f2f416a14228..44be0f96da34 100644
--- a/drivers/uio/Makefile
+++ b/drivers/uio/Makefile
@@ -11,4 +11,5 @@ obj-$(CONFIG_UIO_PRUSS)         += uio_pruss.o
 obj-$(CONFIG_UIO_MF624)         += uio_mf624.o
 obj-$(CONFIG_UIO_FSL_ELBC_GPCM)	+= uio_fsl_elbc_gpcm.o
 obj-$(CONFIG_UIO_HV_GENERIC)	+= uio_hv_generic.o
+obj-$(CONFIG_UIO_HV_SLOW_DEVICES)	+= uio_hv_vmbus_client.o
 obj-$(CONFIG_UIO_DFL)	+= uio_dfl.o
diff --git a/drivers/uio/uio_hv_vmbus_client.c b/drivers/uio/uio_hv_vmbus_client.c
new file mode 100644
index 000000000000..778f43b3701d
--- /dev/null
+++ b/drivers/uio/uio_hv_vmbus_client.c
@@ -0,0 +1,218 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * uio_hv_vmbus_client - UIO driver for low speed VMBus devices
+ *
+ * Copyright (c) 2023, Microsoft Corporation.
+ *
+ * Authors:
+ *   Saurabh Sengar <ssengar@microsoft.com>
+ *
+ * Since the driver does not declare any device ids, userspace code must
+ * allocate an id and bind the device to the driver.
+ *
+ * For example, to associate the fcopy service with this driver:
+ * # echo "34d14be3-dee4-41c8-9ae7-6b174977c192" > /sys/bus/vmbus/drivers/uio_hv_vmbus_client/new_id
+ *
+ * If there already is a hardware specific kernel driver for the device,
+ * the generic driver still won't bind to it. In this case if you want to
+ * use the generic driver for a userspace library you'll have to manually unbind
+ * the hardware specific driver and bind the generic driver, using the device
+ * instance GUID like this:
+ * # echo "eb765408-105f-49b6-b4aa-c123b64d17d4" > /sys/bus/vmbus/drivers/uio_hv_vmbus_client/unbind
+ * # echo "eb765408-105f-49b6-b4aa-c123b64d17d4" > /sys/bus/vmbus/drivers/uio_hv_vmbus_client/bind
+ */
+
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/uio_driver.h>
+#include <linux/hyperv.h>
+
+struct uio_hv_vmbus_dev {
+	struct uio_info info;
+	struct hv_device *device;
+	int ring_size;
+};
+
+/*
+ * This is the irqcontrol callback to be registered to uio_info.
+ * It can be used to disable/enable interrupt from user space processes.
+ *
+ * @param info
+ *  pointer to uio_info.
+ * @param irq_state
+ *  state value. 1 to enable interrupt.
+ */
+static int uio_hv_vmbus_irqcontrol(struct uio_info *info, s32 irq_state)
+{
+	struct uio_hv_vmbus_dev *pdata = info->priv;
+	struct hv_device *hv_dev = pdata->device;
+
+	/* Issue a full memory barrier before triggering the notification */
+	virt_mb();
+
+	if (irq_state == 1)
+		vmbus_setevent(hv_dev->channel);
+
+	return 0;
+}
+
+/*
+ * Callback from vmbus_event when something is in inbound ring.
+ */
+static void uio_hv_vmbus_channel_cb(void *context)
+{
+	struct uio_hv_vmbus_dev *pdata = context;
+
+	/* Issue a full memory barrier before sending the event to userspace */
+	virt_mb();
+
+	uio_event_notify(&pdata->info);
+}
+
+static int uio_hv_vmbus_open(struct uio_info *info, struct inode *inode)
+{
+	struct uio_hv_vmbus_dev *pdata = container_of(info, struct uio_hv_vmbus_dev, info);
+	struct hv_device *hv_dev = pdata->device;
+	struct vmbus_channel *channel = hv_dev->channel;
+	void *ring_buffer;
+	int ret;
+
+	ret = vmbus_open(channel, pdata->ring_size, pdata->ring_size, NULL, 0,
+			 uio_hv_vmbus_channel_cb, pdata);
+	if (ret) {
+		dev_err(&hv_dev->device, "error %d when opening the channel\n", ret);
+		return ret;
+	}
+	channel->inbound.ring_buffer->interrupt_mask = 0;
+	set_channel_read_mode(channel, HV_CALL_ISR);
+
+	/* set the mem pointer */
+	info->mem[0].name = "txrx_rings";
+	ring_buffer = page_address(channel->ringbuffer_page);
+	info->mem[0].addr = (uintptr_t)virt_to_phys(ring_buffer);
+	info->mem[0].size = channel->ringbuffer_pagecount << PAGE_SHIFT;
+	info->mem[0].memtype = UIO_MEM_IOVA;
+
+	return ret;
+}
+
+static int uio_hv_vmbus_release(struct uio_info *info, struct inode *inode)
+{
+	struct uio_hv_vmbus_dev *pdata = container_of(info, struct uio_hv_vmbus_dev, info);
+	struct hv_device *hv_dev = pdata->device;
+
+	vmbus_close(hv_dev->channel);
+
+	/* restore the mem pointer to its original state */
+	info->mem[0].name = NULL;
+	info->mem[0].addr = 0;
+	info->mem[0].size = 1;
+	info->mem[0].memtype = UIO_MEM_NONE;
+
+	return 0;
+}
+
+static ssize_t ring_size_show(struct device *dev, struct device_attribute *attr,
+			      char *buf)
+{
+	struct uio_info *info = dev_get_drvdata(dev);
+	struct uio_hv_vmbus_dev *pdata = container_of(info, struct uio_hv_vmbus_dev, info);
+
+	return sysfs_emit(buf, "%d\n", pdata->ring_size);
+}
+
+static ssize_t ring_size_store(struct device *dev, struct device_attribute *attr,
+			       const char *buf, size_t count)
+{
+	unsigned int val;
+	struct uio_info *info = dev_get_drvdata(dev);
+	struct uio_hv_vmbus_dev *pdata = container_of(info, struct uio_hv_vmbus_dev, info);
+
+	if (kstrtouint(buf, 0, &val) < 0)
+		return -EINVAL;
+
+	if (val < 2 * PAGE_SIZE || val % PAGE_SIZE)
+		return -EINVAL;
+
+	pdata->ring_size = val;
+
+	return count;
+}
+
+static DEVICE_ATTR_RW(ring_size);
+
+static struct attribute *uio_hv_vmbus_client_attrs[] = {
+	&dev_attr_ring_size.attr,
+	NULL,
+};
+ATTRIBUTE_GROUPS(uio_hv_vmbus_client);
+
+static int uio_hv_vmbus_probe(struct hv_device *dev, const struct hv_vmbus_device_id *dev_id)
+{
+	struct uio_hv_vmbus_dev *pdata;
+	int ret;
+	char *name = NULL;
+
+	pdata = devm_kzalloc(&dev->device, sizeof(*pdata), GFP_KERNEL);
+	if (!pdata)
+		return -ENOMEM;
+
+	name = devm_kasprintf(&dev->device, GFP_KERNEL, "%pUl", &dev->dev_instance);
+
+	/* Fill general uio info */
+	pdata->info.name = name; /* /sys/class/uio/uioX/name */
+	pdata->info.version = "1";
+	pdata->info.irqcontrol = uio_hv_vmbus_irqcontrol;
+	pdata->info.open = uio_hv_vmbus_open;
+	pdata->info.release = uio_hv_vmbus_release;
+	pdata->info.irq = UIO_IRQ_CUSTOM;
+	pdata->info.priv = pdata;
+	pdata->ring_size = VMBUS_RING_SIZE(3 * HV_HYP_PAGE_SIZE); /* Default ringbuffer size */
+	pdata->device = dev;
+
+	/* dummy value to register the mem pointers which will be updated by open */
+	pdata->info.mem[0].size = 1;
+
+	ret = uio_register_device(&dev->device, &pdata->info);
+	if (ret) {
+		dev_err(&dev->device, "uio_hv_vmbus register failed\n");
+		return ret;
+	}
+
+	hv_set_drvdata(dev, pdata);
+
+	return 0;
+}
+
+static void uio_hv_vmbus_remove(struct hv_device *dev)
+{
+	struct uio_hv_vmbus_dev *pdata = hv_get_drvdata(dev);
+
+	if (pdata)
+		uio_unregister_device(&pdata->info);
+}
+
+static struct hv_driver uio_hv_vmbus_drv = {
+	.driver.dev_groups = uio_hv_vmbus_client_groups,
+	.name = "uio_hv_vmbus_client",
+	.probe = uio_hv_vmbus_probe,
+	.remove = uio_hv_vmbus_remove,
+};
+
+static int __init uio_hv_vmbus_init(void)
+{
+	return vmbus_driver_register(&uio_hv_vmbus_drv);
+}
+
+static void __exit uio_hv_vmbus_exit(void)
+{
+	vmbus_driver_unregister(&uio_hv_vmbus_drv);
+}
+
+module_init(uio_hv_vmbus_init);
+module_exit(uio_hv_vmbus_exit);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Saurabh Sengar <ssengar@microsoft.com>");
+MODULE_DESCRIPTION("Generic UIO driver for low speed VMBus devices");
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v3 2/3] tools: hv: Add vmbus_bufring
  2023-07-14 10:25 [PATCH v3 0/3] UIO driver for low speed Hyper-V devices Saurabh Sengar
  2023-07-14 10:25 ` [PATCH v3 1/3] uio: Add hv_vmbus_client driver Saurabh Sengar
@ 2023-07-14 10:25 ` Saurabh Sengar
  2023-08-02 21:43   ` Michael Kelley (LINUX)
  2023-07-14 10:25 ` [PATCH v3 3/3] tools: hv: Add new fcopy application based on uio driver Saurabh Sengar
  2 siblings, 1 reply; 9+ messages in thread
From: Saurabh Sengar @ 2023-07-14 10:25 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, mikelley, gregkh, corbet,
	linux-kernel, linux-hyperv, linux-doc

Provide a userspace interface for userspace drivers or applications to
read/write a VMBus ringbuffer. A significant part of this code is
borrowed from DPDK[1]. Current library is supported exclusively for
the x86 architecture.

To build this library:
make -C tools/hv libvmbus_bufring.a

Applications using this library can include the vmbus_bufring.h header
file and libvmbus_bufring.a statically.

[1] https://github.com/DPDK/dpdk/

Signed-off-by: Mary Hardy <maryhardy@microsoft.com>
Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
---
[V3]
- Made ring buffer data offset depend on page size
- remove rte_smp_rwmb macro and reused rte_compiler_barrier instead
- Added legal counsel sign-off
- Removed "Link:" tag 
- Improve commit messages
- new library compilation dependent on x86
- simplify mmap

[V2]
- simpler sysfs path, less parsing

 tools/hv/Build           |   1 +
 tools/hv/Makefile        |  13 +-
 tools/hv/vmbus_bufring.c | 297 +++++++++++++++++++++++++++++++++++++++
 tools/hv/vmbus_bufring.h | 154 ++++++++++++++++++++
 4 files changed, 464 insertions(+), 1 deletion(-)
 create mode 100644 tools/hv/vmbus_bufring.c
 create mode 100644 tools/hv/vmbus_bufring.h

diff --git a/tools/hv/Build b/tools/hv/Build
index 6cf51fa4b306..2a667d3d94cb 100644
--- a/tools/hv/Build
+++ b/tools/hv/Build
@@ -1,3 +1,4 @@
 hv_kvp_daemon-y += hv_kvp_daemon.o
 hv_vss_daemon-y += hv_vss_daemon.o
 hv_fcopy_daemon-y += hv_fcopy_daemon.o
+vmbus_bufring-y += vmbus_bufring.o
diff --git a/tools/hv/Makefile b/tools/hv/Makefile
index fe770e679ae8..33cf488fd20f 100644
--- a/tools/hv/Makefile
+++ b/tools/hv/Makefile
@@ -11,14 +11,19 @@ srctree := $(patsubst %/,%,$(dir $(CURDIR)))
 srctree := $(patsubst %/,%,$(dir $(srctree)))
 endif
 
+include $(srctree)/tools/scripts/Makefile.arch
+
 # Do not use make's built-in rules
 # (this improves performance and avoids hard-to-debug behaviour);
 MAKEFLAGS += -r
 
 override CFLAGS += -O2 -Wall -g -D_GNU_SOURCE -I$(OUTPUT)include
 
+ifeq ($(SRCARCH),x86)
+ALL_LIBS := libvmbus_bufring.a
+endif
 ALL_TARGETS := hv_kvp_daemon hv_vss_daemon hv_fcopy_daemon
-ALL_PROGRAMS := $(patsubst %,$(OUTPUT)%,$(ALL_TARGETS))
+ALL_PROGRAMS := $(patsubst %,$(OUTPUT)%,$(ALL_TARGETS)) $(patsubst %,$(OUTPUT)%,$(ALL_LIBS))
 
 ALL_SCRIPTS := hv_get_dhcp_info.sh hv_get_dns_info.sh hv_set_ifconfig.sh
 
@@ -27,6 +32,12 @@ all: $(ALL_PROGRAMS)
 export srctree OUTPUT CC LD CFLAGS
 include $(srctree)/tools/build/Makefile.include
 
+HV_VMBUS_BUFRING_IN := $(OUTPUT)vmbus_bufring.o
+$(HV_VMBUS_BUFRING_IN): FORCE
+	$(Q)$(MAKE) $(build)=vmbus_bufring
+$(OUTPUT)libvmbus_bufring.a : vmbus_bufring.o
+	$(AR) rcs $@ $^
+
 HV_KVP_DAEMON_IN := $(OUTPUT)hv_kvp_daemon-in.o
 $(HV_KVP_DAEMON_IN): FORCE
 	$(Q)$(MAKE) $(build)=hv_kvp_daemon
diff --git a/tools/hv/vmbus_bufring.c b/tools/hv/vmbus_bufring.c
new file mode 100644
index 000000000000..fb1f0489c625
--- /dev/null
+++ b/tools/hv/vmbus_bufring.c
@@ -0,0 +1,297 @@
+// SPDX-License-Identifier: BSD-3-Clause
+/*
+ * Copyright (c) 2009-2012,2016,2023 Microsoft Corp.
+ * Copyright (c) 2012 NetApp Inc.
+ * Copyright (c) 2012 Citrix Inc.
+ * All rights reserved.
+ */
+
+#include <errno.h>
+#include <emmintrin.h>
+#include <stdio.h>
+#include <string.h>
+#include <unistd.h>
+#include <sys/uio.h>
+#include "vmbus_bufring.h"
+
+#define	rte_compiler_barrier()	({ asm volatile ("" : : : "memory"); })
+#define RINGDATA_START_OFFSET	(getpagesize())
+#define VMBUS_RQST_ERROR	0xFFFFFFFFFFFFFFFF
+#define ALIGN(val, align)	((typeof(val))((val) & (~((typeof(val))((align) - 1)))))
+
+/* Increase bufring index by inc with wraparound */
+static inline uint32_t vmbus_br_idxinc(uint32_t idx, uint32_t inc, uint32_t sz)
+{
+	idx += inc;
+	if (idx >= sz)
+		idx -= sz;
+
+	return idx;
+}
+
+void vmbus_br_setup(struct vmbus_br *br, void *buf, unsigned int blen)
+{
+	br->vbr = buf;
+	br->windex = br->vbr->windex;
+	br->dsize = blen - RINGDATA_START_OFFSET;
+}
+
+static inline __always_inline void
+rte_smp_mb(void)
+{
+	asm volatile("lock addl $0, -128(%%rsp); " ::: "memory");
+}
+
+static inline int
+rte_atomic32_cmpset(volatile uint32_t *dst, uint32_t exp, uint32_t src)
+{
+	uint8_t res;
+
+	asm volatile("lock ; "
+		     "cmpxchgl %[src], %[dst];"
+		     "sete %[res];"
+		     : [res] "=a" (res),     /* output */
+		     [dst] "=m" (*dst)
+		     : [src] "r" (src),      /* input */
+		     "a" (exp),
+		     "m" (*dst)
+		     : "memory");            /* no-clobber list */
+	return res;
+}
+
+static inline uint32_t
+vmbus_txbr_copyto(const struct vmbus_br *tbr, uint32_t windex,
+		  const void *src0, uint32_t cplen)
+{
+	uint8_t *br_data = (uint8_t *)tbr->vbr + RINGDATA_START_OFFSET;
+	uint32_t br_dsize = tbr->dsize;
+	const uint8_t *src = src0;
+
+	if (cplen > br_dsize - windex) {
+		uint32_t fraglen = br_dsize - windex;
+
+		/* Wrap-around detected */
+		memcpy(br_data + windex, src, fraglen);
+		memcpy(br_data, src + fraglen, cplen - fraglen);
+	} else {
+		memcpy(br_data + windex, src, cplen);
+	}
+
+	return vmbus_br_idxinc(windex, cplen, br_dsize);
+}
+
+/*
+ * Write scattered channel packet to TX bufring.
+ *
+ * The offset of this channel packet is written as a 64bits value
+ * immediately after this channel packet.
+ *
+ * The write goes through three stages:
+ *  1. Reserve space in ring buffer for the new data.
+ *     Writer atomically moves priv_write_index.
+ *  2. Copy the new data into the ring.
+ *  3. Update the tail of the ring (visible to host) that indicates
+ *     next read location. Writer updates write_index
+ */
+static int
+vmbus_txbr_write(struct vmbus_br *tbr, const struct iovec iov[], int iovlen,
+		 bool *need_sig)
+{
+	struct vmbus_bufring *vbr = tbr->vbr;
+	uint32_t ring_size = tbr->dsize;
+	uint32_t old_windex, next_windex, windex, total;
+	uint64_t save_windex;
+	int i;
+
+	total = 0;
+	for (i = 0; i < iovlen; i++)
+		total += iov[i].iov_len;
+	total += sizeof(save_windex);
+
+	/* Reserve space in ring */
+	do {
+		uint32_t avail;
+
+		/* Get current free location */
+		old_windex = tbr->windex;
+
+		/* Prevent compiler reordering this with calculation */
+		rte_compiler_barrier();
+
+		avail = vmbus_br_availwrite(tbr, old_windex);
+
+		/* If not enough space in ring, then tell caller. */
+		if (avail <= total)
+			return -EAGAIN;
+
+		next_windex = vmbus_br_idxinc(old_windex, total, ring_size);
+
+		/* Atomic update of next write_index for other threads */
+	} while (!rte_atomic32_cmpset(&tbr->windex, old_windex, next_windex));
+
+	/* Space from old..new is now reserved */
+	windex = old_windex;
+	for (i = 0; i < iovlen; i++)
+		windex = vmbus_txbr_copyto(tbr, windex, iov[i].iov_base, iov[i].iov_len);
+
+	/* Set the offset of the current channel packet. */
+	save_windex = ((uint64_t)old_windex) << 32;
+	windex = vmbus_txbr_copyto(tbr, windex, &save_windex,
+				   sizeof(save_windex));
+
+	/* The region reserved should match region used */
+	if (windex != next_windex)
+		return -EINVAL;
+
+	/* Ensure that data is available before updating host index */
+	rte_compiler_barrier();
+
+	/* Checkin for our reservation. wait for our turn to update host */
+	while (!rte_atomic32_cmpset(&vbr->windex, old_windex, next_windex))
+		_mm_pause();
+
+	return 0;
+}
+
+int rte_vmbus_chan_send(struct vmbus_br *txbr, uint16_t type, void *data,
+			uint32_t dlen, uint32_t flags)
+{
+	struct vmbus_chanpkt pkt;
+	unsigned int pktlen, pad_pktlen;
+	const uint32_t hlen = sizeof(pkt);
+	bool send_evt = false;
+	uint64_t pad = 0;
+	struct iovec iov[3];
+	int error;
+
+	pktlen = hlen + dlen;
+	pad_pktlen = ALIGN(pktlen, sizeof(uint64_t));
+
+	pkt.hdr.type = type;
+	pkt.hdr.flags = flags;
+	pkt.hdr.hlen = hlen >> VMBUS_CHANPKT_SIZE_SHIFT;
+	pkt.hdr.tlen = pad_pktlen >> VMBUS_CHANPKT_SIZE_SHIFT;
+	pkt.hdr.xactid = VMBUS_RQST_ERROR; /* doesn't support multiple requests at same time */
+
+	iov[0].iov_base = &pkt;
+	iov[0].iov_len = hlen;
+	iov[1].iov_base = data;
+	iov[1].iov_len = dlen;
+	iov[2].iov_base = &pad;
+	iov[2].iov_len = pad_pktlen - pktlen;
+
+	error = vmbus_txbr_write(txbr, iov, 3, &send_evt);
+
+	return error;
+}
+
+static inline uint32_t
+vmbus_rxbr_copyfrom(const struct vmbus_br *rbr, uint32_t rindex,
+		    void *dst0, size_t cplen)
+{
+	const uint8_t *br_data = (uint8_t *)rbr->vbr + RINGDATA_START_OFFSET;
+	uint32_t br_dsize = rbr->dsize;
+	uint8_t *dst = dst0;
+
+	if (cplen > br_dsize - rindex) {
+		uint32_t fraglen = br_dsize - rindex;
+
+		/* Wrap-around detected. */
+		memcpy(dst, br_data + rindex, fraglen);
+		memcpy(dst + fraglen, br_data, cplen - fraglen);
+	} else {
+		memcpy(dst, br_data + rindex, cplen);
+	}
+
+	return vmbus_br_idxinc(rindex, cplen, br_dsize);
+}
+
+/* Copy data from receive ring but don't change index */
+static int
+vmbus_rxbr_peek(const struct vmbus_br *rbr, void *data, size_t dlen)
+{
+	uint32_t avail;
+
+	/*
+	 * The requested data and the 64bits channel packet
+	 * offset should be there at least.
+	 */
+	avail = vmbus_br_availread(rbr);
+	if (avail < dlen + sizeof(uint64_t))
+		return -EAGAIN;
+
+	vmbus_rxbr_copyfrom(rbr, rbr->vbr->rindex, data, dlen);
+	return 0;
+}
+
+/*
+ * Copy data from receive ring and change index
+ * NOTE:
+ * We assume (dlen + skip) == sizeof(channel packet).
+ */
+static int
+vmbus_rxbr_read(struct vmbus_br *rbr, void *data, size_t dlen, size_t skip)
+{
+	struct vmbus_bufring *vbr = rbr->vbr;
+	uint32_t br_dsize = rbr->dsize;
+	uint32_t rindex;
+
+	if (vmbus_br_availread(rbr) < dlen + skip + sizeof(uint64_t))
+		return -EAGAIN;
+
+	/* Record where host was when we started read (for debug) */
+	rbr->windex = rbr->vbr->windex;
+
+	/*
+	 * Copy channel packet from RX bufring.
+	 */
+	rindex = vmbus_br_idxinc(rbr->vbr->rindex, skip, br_dsize);
+	rindex = vmbus_rxbr_copyfrom(rbr, rindex, data, dlen);
+
+	/*
+	 * Discard this channel packet's 64bits offset, which is useless to us.
+	 */
+	rindex = vmbus_br_idxinc(rindex, sizeof(uint64_t), br_dsize);
+
+	/* Update the read index _after_ the channel packet is fetched.	 */
+	rte_compiler_barrier();
+
+	vbr->rindex = rindex;
+
+	return 0;
+}
+
+int rte_vmbus_chan_recv_raw(struct vmbus_br *rxbr,
+			    void *data, uint32_t *len)
+{
+	struct vmbus_chanpkt_hdr pkt;
+	uint32_t dlen, bufferlen = *len;
+	int error;
+
+	error = vmbus_rxbr_peek(rxbr, &pkt, sizeof(pkt));
+	if (error)
+		return error;
+
+	if (unlikely(pkt.hlen < VMBUS_CHANPKT_HLEN_MIN))
+		/* XXX this channel is dead actually. */
+		return -EIO;
+
+	if (unlikely(pkt.hlen > pkt.tlen))
+		return -EIO;
+
+	/* Length are in quad words */
+	dlen = pkt.tlen << VMBUS_CHANPKT_SIZE_SHIFT;
+	*len = dlen;
+
+	/* If caller buffer is not large enough */
+	if (unlikely(dlen > bufferlen))
+		return -ENOBUFS;
+
+	/* Read data and skip packet header */
+	error = vmbus_rxbr_read(rxbr, data, dlen, 0);
+	if (error)
+		return error;
+
+	/* Return the number of bytes read */
+	return dlen + sizeof(uint64_t);
+}
diff --git a/tools/hv/vmbus_bufring.h b/tools/hv/vmbus_bufring.h
new file mode 100644
index 000000000000..45ecc48e517f
--- /dev/null
+++ b/tools/hv/vmbus_bufring.h
@@ -0,0 +1,154 @@
+/* SPDX-License-Identifier: BSD-3-Clause */
+
+#ifndef _VMBUS_BUF_H_
+#define _VMBUS_BUF_H_
+
+#include <stdbool.h>
+#include <stdint.h>
+
+#define __packed   __attribute__((__packed__))
+#define unlikely(x)	__builtin_expect(!!(x), 0)
+
+#define ICMSGHDRFLAG_TRANSACTION	1
+#define ICMSGHDRFLAG_REQUEST		2
+#define ICMSGHDRFLAG_RESPONSE		4
+
+#define IC_VERSION_NEGOTIATION_MAX_VER_COUNT 100
+#define ICMSG_HDR (sizeof(struct vmbuspipe_hdr) + sizeof(struct icmsg_hdr))
+#define ICMSG_NEGOTIATE_PKT_SIZE(icframe_vercnt, icmsg_vercnt) \
+	(ICMSG_HDR + sizeof(struct icmsg_negotiate) + \
+	 (((icframe_vercnt) + (icmsg_vercnt)) * sizeof(struct ic_version)))
+
+/*
+ * Channel packets
+ */
+
+/* Channel packet flags */
+#define VMBUS_CHANPKT_TYPE_INBAND	0x0006
+#define VMBUS_CHANPKT_TYPE_RXBUF	0x0007
+#define VMBUS_CHANPKT_TYPE_GPA		0x0009
+#define VMBUS_CHANPKT_TYPE_COMP		0x000b
+
+#define VMBUS_CHANPKT_FLAG_NONE		0
+#define VMBUS_CHANPKT_FLAG_RC		0x0001  /* report completion */
+
+#define VMBUS_CHANPKT_SIZE_SHIFT	3
+#define VMBUS_CHANPKT_SIZE_ALIGN	BIT(VMBUS_CHANPKT_SIZE_SHIFT)
+#define VMBUS_CHANPKT_HLEN_MIN		\
+	(sizeof(struct vmbus_chanpkt_hdr) >> VMBUS_CHANPKT_SIZE_SHIFT)
+
+/*
+ * Buffer ring
+ */
+struct vmbus_bufring {
+	volatile uint32_t windex;
+	volatile uint32_t rindex;
+
+	/*
+	 * Interrupt mask {0,1}
+	 *
+	 * For TX bufring, host set this to 1, when it is processing
+	 * the TX bufring, so that we can safely skip the TX event
+	 * notification to host.
+	 *
+	 * For RX bufring, once this is set to 1 by us, host will not
+	 * further dispatch interrupts to us, even if there are data
+	 * pending on the RX bufring.  This effectively disables the
+	 * interrupt of the channel to which this RX bufring is attached.
+	 */
+	volatile uint32_t imask;
+
+	/*
+	 * Win8 uses some of the reserved bits to implement
+	 * interrupt driven flow management. On the send side
+	 * we can request that the receiver interrupt the sender
+	 * when the ring transitions from being full to being able
+	 * to handle a message of size "pending_send_sz".
+	 *
+	 * Add necessary state for this enhancement.
+	 */
+	volatile uint32_t pending_send;
+	uint32_t reserved1[12];
+
+	union {
+		struct {
+			uint32_t feat_pending_send_sz:1;
+		};
+		uint32_t value;
+	} feature_bits;
+
+	/*
+	 * Ring data starts here + RingDataStartOffset
+	 * !!! DO NOT place any fields below this !!!
+	 */
+	uint8_t data[];
+} __packed;
+
+struct vmbus_br {
+	struct vmbus_bufring *vbr;
+	uint32_t	dsize;
+	uint32_t	windex; /* next available location */
+};
+
+struct vmbus_chanpkt_hdr {
+	uint16_t	type;	/* VMBUS_CHANPKT_TYPE_ */
+	uint16_t	hlen;	/* header len, in 8 bytes */
+	uint16_t	tlen;	/* total len, in 8 bytes */
+	uint16_t	flags;	/* VMBUS_CHANPKT_FLAG_ */
+	uint64_t	xactid;
+} __packed;
+
+struct vmbus_chanpkt {
+	struct vmbus_chanpkt_hdr hdr;
+} __packed;
+
+struct vmbuspipe_hdr {
+	unsigned int flags;
+	unsigned int msgsize;
+} __packed;
+
+struct ic_version {
+	unsigned short major;
+	unsigned short minor;
+} __packed;
+
+struct icmsg_negotiate {
+	unsigned short icframe_vercnt;
+	unsigned short icmsg_vercnt;
+	unsigned int reserved;
+	struct ic_version icversion_data[]; /* any size array */
+} __packed;
+
+struct icmsg_hdr {
+	struct ic_version icverframe;
+	unsigned short icmsgtype;
+	struct ic_version icvermsg;
+	unsigned short icmsgsize;
+	unsigned int status;
+	unsigned char ictransaction_id;
+	unsigned char icflags;
+	unsigned char reserved[2];
+} __packed;
+
+int rte_vmbus_chan_recv_raw(struct vmbus_br *rxbr, void *data, uint32_t *len);
+int rte_vmbus_chan_send(struct vmbus_br *txbr, uint16_t type, void *data,
+			uint32_t dlen, uint32_t flags);
+void vmbus_br_setup(struct vmbus_br *br, void *buf, unsigned int blen);
+
+/* Amount of space available for write */
+static inline uint32_t vmbus_br_availwrite(const struct vmbus_br *br, uint32_t windex)
+{
+	uint32_t rindex = br->vbr->rindex;
+
+	if (windex >= rindex)
+		return br->dsize - (windex - rindex);
+	else
+		return rindex - windex;
+}
+
+static inline uint32_t vmbus_br_availread(const struct vmbus_br *br)
+{
+	return br->dsize - vmbus_br_availwrite(br, br->vbr->windex);
+}
+
+#endif	/* !_VMBUS_BUF_H_ */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v3 3/3] tools: hv: Add new fcopy application based on uio driver
  2023-07-14 10:25 [PATCH v3 0/3] UIO driver for low speed Hyper-V devices Saurabh Sengar
  2023-07-14 10:25 ` [PATCH v3 1/3] uio: Add hv_vmbus_client driver Saurabh Sengar
  2023-07-14 10:25 ` [PATCH v3 2/3] tools: hv: Add vmbus_bufring Saurabh Sengar
@ 2023-07-14 10:25 ` Saurabh Sengar
  2023-08-02 21:45   ` Michael Kelley (LINUX)
  2 siblings, 1 reply; 9+ messages in thread
From: Saurabh Sengar @ 2023-07-14 10:25 UTC (permalink / raw)
  To: kys, haiyangz, wei.liu, decui, mikelley, gregkh, corbet,
	linux-kernel, linux-hyperv, linux-doc

Implement the file copy service for Linux guests on Hyper-V. This
permits the host to copy a file (over VMBus) into the guest. This
facility is part of "guest integration services" supported on the
Hyper-V platform.

Here is a link that provides additional details on this functionality:

http://technet.microsoft.com/en-us/library/dn464282.aspx

This new fcopy application uses uio_hv_vmbus_client driver which
makes the earlier hv_util based driver and application obsolete.

Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
---
[V3]
- Improve cover letter and commit messages
- Improve debug prints
- Instead of hardcoded instance id, query from class id sysfs
- Set the ring_size value from application
- Update the application to mmap /dev/uio instead of sysfs
- new application compilation dependent on x86

[V2]
- simpler sysfs path

 tools/hv/Build                 |   1 +
 tools/hv/Makefile              |  10 +-
 tools/hv/hv_fcopy_uio_daemon.c | 578 +++++++++++++++++++++++++++++++++
 3 files changed, 588 insertions(+), 1 deletion(-)
 create mode 100644 tools/hv/hv_fcopy_uio_daemon.c

diff --git a/tools/hv/Build b/tools/hv/Build
index 2a667d3d94cb..efcbb74a0d23 100644
--- a/tools/hv/Build
+++ b/tools/hv/Build
@@ -2,3 +2,4 @@ hv_kvp_daemon-y += hv_kvp_daemon.o
 hv_vss_daemon-y += hv_vss_daemon.o
 hv_fcopy_daemon-y += hv_fcopy_daemon.o
 vmbus_bufring-y += vmbus_bufring.o
+hv_fcopy_uio_daemon-y += hv_fcopy_uio_daemon.o
diff --git a/tools/hv/Makefile b/tools/hv/Makefile
index 33cf488fd20f..678c6c450a53 100644
--- a/tools/hv/Makefile
+++ b/tools/hv/Makefile
@@ -21,8 +21,10 @@ override CFLAGS += -O2 -Wall -g -D_GNU_SOURCE -I$(OUTPUT)include
 
 ifeq ($(SRCARCH),x86)
 ALL_LIBS := libvmbus_bufring.a
-endif
+ALL_TARGETS := hv_kvp_daemon hv_vss_daemon hv_fcopy_daemon hv_fcopy_uio_daemon
+else
 ALL_TARGETS := hv_kvp_daemon hv_vss_daemon hv_fcopy_daemon
+endif
 ALL_PROGRAMS := $(patsubst %,$(OUTPUT)%,$(ALL_TARGETS)) $(patsubst %,$(OUTPUT)%,$(ALL_LIBS))
 
 ALL_SCRIPTS := hv_get_dhcp_info.sh hv_get_dns_info.sh hv_set_ifconfig.sh
@@ -56,6 +58,12 @@ $(HV_FCOPY_DAEMON_IN): FORCE
 $(OUTPUT)hv_fcopy_daemon: $(HV_FCOPY_DAEMON_IN)
 	$(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) $< -o $@
 
+HV_FCOPY_UIO_DAEMON_IN := $(OUTPUT)hv_fcopy_uio_daemon-in.o
+$(HV_FCOPY_UIO_DAEMON_IN): FORCE
+	$(Q)$(MAKE) $(build)=hv_fcopy_uio_daemon
+$(OUTPUT)hv_fcopy_uio_daemon: $(HV_FCOPY_UIO_DAEMON_IN) libvmbus_bufring.a
+	$(QUIET_LINK)$(CC) -lm $< -L. -lvmbus_bufring -o $@
+
 clean:
 	rm -f $(ALL_PROGRAMS)
 	find $(or $(OUTPUT),.) -name '*.o' -delete -o -name '\.*.d' -delete
diff --git a/tools/hv/hv_fcopy_uio_daemon.c b/tools/hv/hv_fcopy_uio_daemon.c
new file mode 100644
index 000000000000..e8618a30dc7e
--- /dev/null
+++ b/tools/hv/hv_fcopy_uio_daemon.c
@@ -0,0 +1,578 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * An implementation of host to guest copy functionality for Linux.
+ *
+ * Copyright (C) 2023, Microsoft, Inc.
+ *
+ * Author : K. Y. Srinivasan <kys@microsoft.com>
+ * Author : Saurabh Sengar <ssengar@microsoft.com>
+ *
+ */
+
+#include <dirent.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <getopt.h>
+#include <locale.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <syslog.h>
+#include <unistd.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <linux/hyperv.h>
+#include "vmbus_bufring.h"
+
+#define ICMSGTYPE_NEGOTIATE	0
+#define ICMSGTYPE_FCOPY		7
+
+#define WIN8_SRV_MAJOR		1
+#define WIN8_SRV_MINOR		1
+#define WIN8_SRV_VERSION	(WIN8_SRV_MAJOR << 16 | WIN8_SRV_MINOR)
+
+#define MAX_PATH_LEN		300
+#define MAX_LINE_LEN		40
+#define DEVICES_SYSFS		"/sys/bus/vmbus/devices"
+#define FCOPY_CLASS_ID		"34d14be3-dee4-41c8-9ae7-6b174977c192"
+
+#define FCOPY_VER_COUNT		1
+static const int fcopy_versions[] = {
+	WIN8_SRV_VERSION
+};
+
+#define FW_VER_COUNT		1
+static const int fw_versions[] = {
+	UTIL_FW_VERSION
+};
+
+#define HV_RING_SIZE		(4 * 4096)
+
+unsigned char desc[HV_RING_SIZE];
+
+static int target_fd;
+static char target_fname[PATH_MAX];
+static unsigned long long filesize;
+
+static int hv_fcopy_create_file(char *file_name, char *path_name, __u32 flags)
+{
+	int error = HV_E_FAIL;
+	char *q, *p;
+
+	filesize = 0;
+	p = (char *)path_name;
+	snprintf(target_fname, sizeof(target_fname), "%s/%s",
+		 (char *)path_name, (char *)file_name);
+
+	/*
+	 * Check to see if the path is already in place; if not,
+	 * create if required.
+	 */
+	while ((q = strchr(p, '/')) != NULL) {
+		if (q == p) {
+			p++;
+			continue;
+		}
+		*q = '\0';
+		if (access(path_name, F_OK)) {
+			if (flags & CREATE_PATH) {
+				if (mkdir(path_name, 0755)) {
+					syslog(LOG_ERR, "Failed to create %s",
+					       path_name);
+					goto done;
+				}
+			} else {
+				syslog(LOG_ERR, "Invalid path: %s", path_name);
+				goto done;
+			}
+		}
+		p = q + 1;
+		*q = '/';
+	}
+
+	if (!access(target_fname, F_OK)) {
+		syslog(LOG_INFO, "File: %s exists", target_fname);
+		if (!(flags & OVER_WRITE)) {
+			error = HV_ERROR_ALREADY_EXISTS;
+			goto done;
+		}
+	}
+
+	target_fd = open(target_fname,
+			 O_RDWR | O_CREAT | O_TRUNC | O_CLOEXEC, 0744);
+	if (target_fd == -1) {
+		syslog(LOG_INFO, "Open Failed: %s", strerror(errno));
+		goto done;
+	}
+
+	error = 0;
+done:
+	if (error)
+		target_fname[0] = '\0';
+	return error;
+}
+
+static int hv_copy_data(struct hv_do_fcopy *cpmsg)
+{
+	ssize_t bytes_written;
+	int ret = 0;
+
+	bytes_written = pwrite(target_fd, cpmsg->data, cpmsg->size,
+			       cpmsg->offset);
+
+	filesize += cpmsg->size;
+	if (bytes_written != cpmsg->size) {
+		switch (errno) {
+		case ENOSPC:
+			ret = HV_ERROR_DISK_FULL;
+			break;
+		default:
+			ret = HV_E_FAIL;
+			break;
+		}
+		syslog(LOG_ERR, "pwrite failed to write %llu bytes: %ld (%s)",
+		       filesize, (long)bytes_written, strerror(errno));
+	}
+
+	return ret;
+}
+
+/*
+ * Reset target_fname to "" in the two below functions for hibernation: if
+ * the fcopy operation is aborted by hibernation, the daemon should remove the
+ * partially-copied file; to achieve this, the hv_utils driver always fakes a
+ * CANCEL_FCOPY message upon suspend, and later when the VM resumes back,
+ * the daemon calls hv_copy_cancel() to remove the file; if a file is copied
+ * successfully before suspend, hv_copy_finished() must reset target_fname to
+ * avoid that the file can be incorrectly removed upon resume, since the faked
+ * CANCEL_FCOPY message is spurious in this case.
+ */
+static int hv_copy_finished(void)
+{
+	close(target_fd);
+	target_fname[0] = '\0';
+	return 0;
+}
+
+static void print_usage(char *argv[])
+{
+	fprintf(stderr, "Usage: %s [options]\n"
+		"Options are:\n"
+		"  -n, --no-daemon        stay in foreground, don't daemonize\n"
+		"  -h, --help             print this help\n", argv[0]);
+}
+
+static bool vmbus_prep_negotiate_resp(struct icmsg_hdr *icmsghdrp, unsigned char *buf,
+				      unsigned int buflen, const int *fw_version, int fw_vercnt,
+				const int *srv_version, int srv_vercnt,
+				int *nego_fw_version, int *nego_srv_version)
+{
+	int icframe_major, icframe_minor;
+	int icmsg_major, icmsg_minor;
+	int fw_major, fw_minor;
+	int srv_major, srv_minor;
+	int i, j;
+	bool found_match = false;
+	struct icmsg_negotiate *negop;
+
+	/* Check that there's enough space for icframe_vercnt, icmsg_vercnt */
+	if (buflen < ICMSG_HDR + offsetof(struct icmsg_negotiate, reserved)) {
+		syslog(LOG_ERR, "Invalid icmsg negotiate");
+		return false;
+	}
+
+	icmsghdrp->icmsgsize = 0x10;
+	negop = (struct icmsg_negotiate *)&buf[ICMSG_HDR];
+
+	icframe_major = negop->icframe_vercnt;
+	icframe_minor = 0;
+
+	icmsg_major = negop->icmsg_vercnt;
+	icmsg_minor = 0;
+
+	/* Validate negop packet */
+	if (icframe_major > IC_VERSION_NEGOTIATION_MAX_VER_COUNT ||
+	    icmsg_major > IC_VERSION_NEGOTIATION_MAX_VER_COUNT ||
+	    ICMSG_NEGOTIATE_PKT_SIZE(icframe_major, icmsg_major) > buflen) {
+		syslog(LOG_ERR, "Invalid icmsg negotiate - icframe_major: %u, icmsg_major: %u\n",
+		       icframe_major, icmsg_major);
+		goto fw_error;
+	}
+
+	/*
+	 * Select the framework version number we will
+	 * support.
+	 */
+
+	for (i = 0; i < fw_vercnt; i++) {
+		fw_major = (fw_version[i] >> 16);
+		fw_minor = (fw_version[i] & 0xFFFF);
+
+		for (j = 0; j < negop->icframe_vercnt; j++) {
+			if (negop->icversion_data[j].major == fw_major &&
+			    negop->icversion_data[j].minor == fw_minor) {
+				icframe_major = negop->icversion_data[j].major;
+				icframe_minor = negop->icversion_data[j].minor;
+				found_match = true;
+				break;
+			}
+		}
+
+		if (found_match)
+			break;
+	}
+
+	if (!found_match)
+		goto fw_error;
+
+	found_match = false;
+
+	for (i = 0; i < srv_vercnt; i++) {
+		srv_major = (srv_version[i] >> 16);
+		srv_minor = (srv_version[i] & 0xFFFF);
+
+		for (j = negop->icframe_vercnt;
+			(j < negop->icframe_vercnt + negop->icmsg_vercnt);
+			j++) {
+			if (negop->icversion_data[j].major == srv_major &&
+			    negop->icversion_data[j].minor == srv_minor) {
+				icmsg_major = negop->icversion_data[j].major;
+				icmsg_minor = negop->icversion_data[j].minor;
+				found_match = true;
+				break;
+			}
+		}
+
+		if (found_match)
+			break;
+	}
+
+	/*
+	 * Respond with the framework and service
+	 * version numbers we can support.
+	 */
+fw_error:
+	if (!found_match) {
+		negop->icframe_vercnt = 0;
+		negop->icmsg_vercnt = 0;
+	} else {
+		negop->icframe_vercnt = 1;
+		negop->icmsg_vercnt = 1;
+	}
+
+	if (nego_fw_version)
+		*nego_fw_version = (icframe_major << 16) | icframe_minor;
+
+	if (nego_srv_version)
+		*nego_srv_version = (icmsg_major << 16) | icmsg_minor;
+
+	negop->icversion_data[0].major = icframe_major;
+	negop->icversion_data[0].minor = icframe_minor;
+	negop->icversion_data[1].major = icmsg_major;
+	negop->icversion_data[1].minor = icmsg_minor;
+
+	return found_match;
+}
+
+static void wcstoutf8(char *dest, const __u16 *src, size_t dest_size)
+{
+	size_t len = 0;
+
+	while (len < dest_size) {
+		if (src[len] < 0x80)
+			dest[len++] = (char)(*src++);
+		else
+			dest[len++] = 'X';
+	}
+
+	dest[len] = '\0';
+}
+
+static int hv_fcopy_start(struct hv_start_fcopy *smsg_in)
+{
+	setlocale(LC_ALL, "en_US.utf8");
+	size_t file_size, path_size;
+	char *file_name, *path_name;
+	char *in_file_name = (char *)smsg_in->file_name;
+	char *in_path_name = (char *)smsg_in->path_name;
+
+	file_size = wcstombs(NULL, (const wchar_t *restrict)in_file_name, 0) + 1;
+	path_size = wcstombs(NULL, (const wchar_t *restrict)in_path_name, 0) + 1;
+
+	file_name = (char *)malloc(file_size * sizeof(char));
+	path_name = (char *)malloc(path_size * sizeof(char));
+
+	wcstoutf8(file_name, (__u16 *)in_file_name, file_size);
+	wcstoutf8(path_name, (__u16 *)in_path_name, path_size);
+
+	return hv_fcopy_create_file(file_name, path_name, smsg_in->copy_flags);
+}
+
+static int hv_fcopy_send_data(struct hv_fcopy_hdr *fcopy_msg, int recvlen)
+{
+	int operation = fcopy_msg->operation;
+
+	/*
+	 * The  strings sent from the host are encoded in
+	 * utf16; convert it to utf8 strings.
+	 * The host assures us that the utf16 strings will not exceed
+	 * the max lengths specified. We will however, reserve room
+	 * for the string terminating character - in the utf16s_utf8s()
+	 * function we limit the size of the buffer where the converted
+	 * string is placed to W_MAX_PATH -1 to guarantee
+	 * that the strings can be properly terminated!
+	 */
+
+	switch (operation) {
+	case START_FILE_COPY:
+		return hv_fcopy_start((struct hv_start_fcopy *)fcopy_msg);
+	case WRITE_TO_FILE:
+		return hv_copy_data((struct hv_do_fcopy *)fcopy_msg);
+	case COMPLETE_FCOPY:
+		return hv_copy_finished();
+	}
+
+	return HV_E_FAIL;
+}
+
+/* process the packet recv from host */
+static int fcopy_pkt_process(struct vmbus_br *txbr)
+{
+	int ret, offset, pktlen;
+	int fcopy_srv_version;
+	const struct vmbus_chanpkt_hdr *pkt;
+	struct hv_fcopy_hdr *fcopy_msg;
+	struct icmsg_hdr *icmsghdr;
+
+	pkt = (const struct vmbus_chanpkt_hdr *)desc;
+	offset = pkt->hlen << 3;
+	pktlen = (pkt->tlen << 3) - offset;
+	icmsghdr = (struct icmsg_hdr *)&desc[offset + sizeof(struct vmbuspipe_hdr)];
+	icmsghdr->status = HV_E_FAIL;
+
+	if (icmsghdr->icmsgtype == ICMSGTYPE_NEGOTIATE) {
+		if (vmbus_prep_negotiate_resp(icmsghdr, desc + offset, pktlen, fw_versions,
+					      FW_VER_COUNT, fcopy_versions, FCOPY_VER_COUNT,
+					      NULL, &fcopy_srv_version)) {
+			syslog(LOG_INFO, "FCopy IC version %d.%d",
+			       fcopy_srv_version >> 16, fcopy_srv_version & 0xFFFF);
+			icmsghdr->status = 0;
+		}
+	} else if (icmsghdr->icmsgtype == ICMSGTYPE_FCOPY) {
+		/* Ensure recvlen is big enough to contain hv_fcopy_hdr */
+		if (pktlen < ICMSG_HDR + sizeof(struct hv_fcopy_hdr)) {
+			syslog(LOG_ERR, "Invalid Fcopy hdr. Packet length too small: %u",
+			       pktlen);
+			return -ENOBUFS;
+		}
+
+		fcopy_msg = (struct hv_fcopy_hdr *)&desc[offset + ICMSG_HDR];
+		icmsghdr->status = hv_fcopy_send_data(fcopy_msg, pktlen);
+	}
+
+	icmsghdr->icflags = ICMSGHDRFLAG_TRANSACTION | ICMSGHDRFLAG_RESPONSE;
+	ret = rte_vmbus_chan_send(txbr, 0x6, desc + offset, pktlen, 0);
+	if (ret) {
+		syslog(LOG_ERR, "Write to ringbuffer failed err: %d", ret);
+		return ret;
+	}
+
+	return 0;
+}
+
+static void fcopy_get_first_folder(char *path, char *chan_no)
+{
+	DIR *dir = opendir(path);
+	struct dirent *entry;
+
+	if (!dir) {
+		syslog(LOG_ERR, "Failed to open directory (errno=%s).\n", strerror(errno));
+		return;
+	}
+
+	while ((entry = readdir(dir)) != NULL) {
+		if (entry->d_type == DT_DIR && strcmp(entry->d_name, ".") != 0 &&
+		    strcmp(entry->d_name, "..") != 0) {
+			strcpy(chan_no, entry->d_name);
+			break;
+		}
+	}
+
+	closedir(dir);
+}
+
+static void fcopy_set_ring_size(char *path, char *inst, int size)
+{
+	char ring_size_path[MAX_PATH_LEN] = {0};
+	FILE *fd;
+
+	snprintf(ring_size_path, sizeof(ring_size_path), "%s/%s/%s", path, inst, "ring_size");
+	fd = fopen(ring_size_path, "w");
+	if (!fd) {
+		syslog(LOG_WARNING, "Failed to open ring_size file (errno=%s).\n", strerror(errno));
+		return;
+	}
+	fprintf(fd, "%d", size);
+	fclose(fd);
+}
+
+static char *fcopy_read_sysfs(char *path, char *buf, int len)
+{
+	FILE *fd;
+	char *ret;
+
+	fd = fopen(path, "r");
+	if (!fd)
+		return NULL;
+
+	ret = fgets(buf, len, fd);
+	fclose(fd);
+
+	return ret;
+}
+
+static int fcopy_get_instance_id(char *path, char *class_id, char *inst)
+{
+	DIR *dir = opendir(path);
+	struct dirent *entry;
+	char tmp_path[MAX_PATH_LEN] = {0};
+	char line[MAX_LINE_LEN];
+
+	if (!dir) {
+		syslog(LOG_ERR, "Failed to open directory (errno=%s).\n", strerror(errno));
+		return -EINVAL;
+	}
+
+	while ((entry = readdir(dir)) != NULL) {
+		if (entry->d_type == DT_LNK && strcmp(entry->d_name, ".") != 0 &&
+		    strcmp(entry->d_name, "..") != 0) {
+			/* search for the sysfs path with matching class_id */
+			snprintf(tmp_path, sizeof(tmp_path), "%s/%s/%s",
+				 path, entry->d_name, "class_id");
+			if (!fcopy_read_sysfs(tmp_path, line, MAX_LINE_LEN))
+				continue;
+
+			/* class id matches, now fetch the instance id from device_id */
+			if (strstr(line, class_id)) {
+				snprintf(tmp_path, sizeof(tmp_path), "%s/%s/%s",
+					 path, entry->d_name, "device_id");
+				if (!fcopy_read_sysfs(tmp_path, line, MAX_LINE_LEN))
+					continue;
+				/* remove braces */
+				strncpy(inst, line + 1, strlen(line) - 3);
+				break;
+			}
+		}
+	}
+
+	closedir(dir);
+	return 0;
+}
+
+int main(int argc, char *argv[])
+{
+	int fcopy_fd = -1, tmp = 1;
+	int daemonize = 1, long_index = 0, opt, ret = -EINVAL;
+	struct vmbus_br txbr, rxbr;
+	void *ring;
+	uint32_t len = HV_RING_SIZE;
+	char uio_name[10] = {0};
+	char uio_dev_path[15] = {0};
+	char uio_path[MAX_PATH_LEN] = {0};
+	char inst[MAX_LINE_LEN] = {0};
+
+	static struct option long_options[] = {
+		{"help",	no_argument,	   0,  'h' },
+		{"no-daemon",	no_argument,	   0,  'n' },
+		{0,		0,		   0,  0   }
+	};
+
+	while ((opt = getopt_long(argc, argv, "hn", long_options,
+				  &long_index)) != -1) {
+		switch (opt) {
+		case 'n':
+			daemonize = 0;
+			break;
+		case 'h':
+		default:
+			print_usage(argv);
+			exit(EXIT_FAILURE);
+		}
+	}
+
+	if (daemonize && daemon(1, 0)) {
+		syslog(LOG_ERR, "daemon() failed; error: %s", strerror(errno));
+		exit(EXIT_FAILURE);
+	}
+
+	openlog("HV_UIO_FCOPY", 0, LOG_USER);
+	syslog(LOG_INFO, "starting; pid is:%d", getpid());
+
+	/* get instance id */
+	if (fcopy_get_instance_id(DEVICES_SYSFS, FCOPY_CLASS_ID, inst))
+		exit(EXIT_FAILURE);
+
+	/* set ring_size value */
+	fcopy_set_ring_size(DEVICES_SYSFS, inst, HV_RING_SIZE);
+
+	/* get /dev/uioX dev path and open it */
+	snprintf(uio_path, sizeof(uio_path), "%s/%s/%s", DEVICES_SYSFS, inst, "uio");
+	fcopy_get_first_folder(uio_path, uio_name);
+	snprintf(uio_dev_path, sizeof(uio_dev_path), "/dev/%s", uio_name);
+	fcopy_fd = open(uio_dev_path, O_RDWR);
+
+	if (fcopy_fd < 0) {
+		syslog(LOG_ERR, "open %s failed; error: %d %s",
+		       uio_dev_path, errno, strerror(errno));
+		syslog(LOG_ERR, "Please make sure module uio_hv_vmbus_client is loaded and" \
+		       " device is not used by any other application\n");
+		ret = fcopy_fd;
+		exit(EXIT_FAILURE);
+	}
+
+	ring = mmap(NULL, 2 * HV_RING_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fcopy_fd, 0);
+	if (ring == MAP_FAILED) {
+		ret = errno;
+		syslog(LOG_ERR, "mmap ringbuffer failed; error: %d %s", ret, strerror(ret));
+		goto close;
+	}
+	vmbus_br_setup(&txbr, ring, HV_RING_SIZE);
+	vmbus_br_setup(&rxbr, (char *)ring + HV_RING_SIZE, HV_RING_SIZE);
+
+	while (1) {
+		/*
+		 * In this loop we process fcopy messages after the
+		 * handshake is complete.
+		 */
+		ret = pread(fcopy_fd, &tmp, sizeof(int), 0);
+		if (ret < 0) {
+			syslog(LOG_ERR, "pread failed: %s", strerror(errno));
+			continue;
+		}
+
+		len = HV_RING_SIZE;
+		ret = rte_vmbus_chan_recv_raw(&rxbr, desc, &len);
+		if (unlikely(ret <= 0)) {
+			/* This indicates a failure to communicate (or worse) */
+			syslog(LOG_ERR, "VMBus channel recv error: %d", ret);
+		} else {
+			ret = fcopy_pkt_process(&txbr);
+			if (ret < 0)
+				goto close;
+
+			/* Signal host */
+			tmp = 1;
+			if ((write(fcopy_fd, &tmp, sizeof(int))) != sizeof(int)) {
+				ret = errno;
+				syslog(LOG_ERR, "Registration failed: %s\n", strerror(ret));
+				goto close;
+			}
+		}
+	}
+close:
+	close(fcopy_fd);
+	return ret;
+}
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* RE: [PATCH v3 1/3] uio: Add hv_vmbus_client driver
  2023-07-14 10:25 ` [PATCH v3 1/3] uio: Add hv_vmbus_client driver Saurabh Sengar
@ 2023-08-02 21:43   ` Michael Kelley (LINUX)
  0 siblings, 0 replies; 9+ messages in thread
From: Michael Kelley (LINUX) @ 2023-08-02 21:43 UTC (permalink / raw)
  To: Saurabh Sengar, KY Srinivasan, Haiyang Zhang, wei.liu,
	Dexuan Cui, gregkh, corbet, linux-kernel, linux-hyperv,
	linux-doc

From: Saurabh Sengar <ssengar@linux.microsoft.com> Sent: Friday, July 14, 2023 3:26 AM
> 
> Add a new UIO-based driver that generically supports low speed Hyper-V
> VMBus devices. This driver can be bound to VMBus devices by user space
> drivers that provide device-specific management.
> 
> The new driver provides the following core functionality, which is
> suitable for low speed devices:
> * A single VMBus channel for each device
> * Ability to specify the VMBus channel ring buffer size for each device
> * Host notification via a hypercall instead of monitor bits
> 
> Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
> ---
> [V3]
> - Removed ringbuffer sysfs entry and used uio framework for mmap
> - Remove ".id_table = NULL"
> - kasprintf -> devm_kasprintf
> - Change global variable ring_size to per device
> - More checks on value which can be set for ring_size
> - Remove driverctl, and used echo command instead for driver documentation
> - Remove unnecessary one time use macros
> - Change kernel version and date for sysfs documentation
> - Update documentation
> - Better commit message
> 
> [V2]
> - Update driver info in Documentation/driver-api/uio-howto.rst
> - Update ring_size sysfs info in Documentation/ABI/stable/sysfs-bus-vmbus
> - Remove DRIVER_VERSION
> - Remove refcnt
> - scnprintf -> sysfs_emit
> - sysfs_create_file -> ATTRIBUTE_GROUPS + ".driver.groups";
> - sysfs_create_bin_file -> device_create_bin_file
> - dev_notice -> dev_err
> - remove MODULE_VERSION
> 
>  Documentation/ABI/stable/sysfs-bus-vmbus |  10 ++
>  Documentation/driver-api/uio-howto.rst   |  54 ++++++
>  drivers/uio/Kconfig                      |  12 ++
>  drivers/uio/Makefile                     |   1 +
>  drivers/uio/uio_hv_vmbus_client.c        | 218 +++++++++++++++++++++++
>  5 files changed, 295 insertions(+)
>  create mode 100644 drivers/uio/uio_hv_vmbus_client.c
> 

Reviewed-by: Michael Kelley <mikelley@microsoft.com>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH v3 2/3] tools: hv: Add vmbus_bufring
  2023-07-14 10:25 ` [PATCH v3 2/3] tools: hv: Add vmbus_bufring Saurabh Sengar
@ 2023-08-02 21:43   ` Michael Kelley (LINUX)
  2023-08-03 12:06     ` Saurabh Singh Sengar
  0 siblings, 1 reply; 9+ messages in thread
From: Michael Kelley (LINUX) @ 2023-08-02 21:43 UTC (permalink / raw)
  To: Saurabh Sengar, KY Srinivasan, Haiyang Zhang, wei.liu,
	Dexuan Cui, gregkh, corbet, linux-kernel, linux-hyperv,
	linux-doc

From: Saurabh Sengar <ssengar@linux.microsoft.com> Sent: Friday, July 14, 2023 3:26 AM
> 
> Provide a userspace interface for userspace drivers or applications to
> read/write a VMBus ringbuffer. A significant part of this code is
> borrowed from DPDK[1]. Current library is supported exclusively for
> the x86 architecture.
> 
> To build this library:
> make -C tools/hv libvmbus_bufring.a
> 
> Applications using this library can include the vmbus_bufring.h header
> file and libvmbus_bufring.a statically.
> 
> [1] https://github.com/DPDK/dpdk/
> 
> Signed-off-by: Mary Hardy <maryhardy@microsoft.com>
> Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
> ---
> [V3]
> - Made ring buffer data offset depend on page size
> - remove rte_smp_rwmb macro and reused rte_compiler_barrier instead
> - Added legal counsel sign-off
> - Removed "Link:" tag
> - Improve commit messages
> - new library compilation dependent on x86
> - simplify mmap
> 
> [V2]
> - simpler sysfs path, less parsing
> 
>  tools/hv/Build           |   1 +
>  tools/hv/Makefile        |  13 +-
>  tools/hv/vmbus_bufring.c | 297 +++++++++++++++++++++++++++++++++++++++
>  tools/hv/vmbus_bufring.h | 154 ++++++++++++++++++++
>  4 files changed, 464 insertions(+), 1 deletion(-)
>  create mode 100644 tools/hv/vmbus_bufring.c
>  create mode 100644 tools/hv/vmbus_bufring.h
> 
> diff --git a/tools/hv/Build b/tools/hv/Build
> index 6cf51fa4b306..2a667d3d94cb 100644
> --- a/tools/hv/Build
> +++ b/tools/hv/Build
> @@ -1,3 +1,4 @@
>  hv_kvp_daemon-y += hv_kvp_daemon.o
>  hv_vss_daemon-y += hv_vss_daemon.o
>  hv_fcopy_daemon-y += hv_fcopy_daemon.o
> +vmbus_bufring-y += vmbus_bufring.o
> diff --git a/tools/hv/Makefile b/tools/hv/Makefile
> index fe770e679ae8..33cf488fd20f 100644
> --- a/tools/hv/Makefile
> +++ b/tools/hv/Makefile
> @@ -11,14 +11,19 @@ srctree := $(patsubst %/,%,$(dir $(CURDIR)))
>  srctree := $(patsubst %/,%,$(dir $(srctree)))
>  endif
> 
> +include $(srctree)/tools/scripts/Makefile.arch
> +
>  # Do not use make's built-in rules
>  # (this improves performance and avoids hard-to-debug behaviour);
>  MAKEFLAGS += -r
> 
>  override CFLAGS += -O2 -Wall -g -D_GNU_SOURCE -I$(OUTPUT)include
> 
> +ifeq ($(SRCARCH),x86)
> +ALL_LIBS := libvmbus_bufring.a
> +endif
>  ALL_TARGETS := hv_kvp_daemon hv_vss_daemon hv_fcopy_daemon
> -ALL_PROGRAMS := $(patsubst %,$(OUTPUT)%,$(ALL_TARGETS))
> +ALL_PROGRAMS := $(patsubst %,$(OUTPUT)%,$(ALL_TARGETS)) $(patsubst
> %,$(OUTPUT)%,$(ALL_LIBS))
> 
>  ALL_SCRIPTS := hv_get_dhcp_info.sh hv_get_dns_info.sh hv_set_ifconfig.sh
> 
> @@ -27,6 +32,12 @@ all: $(ALL_PROGRAMS)
>  export srctree OUTPUT CC LD CFLAGS
>  include $(srctree)/tools/build/Makefile.include
> 
> +HV_VMBUS_BUFRING_IN := $(OUTPUT)vmbus_bufring.o
> +$(HV_VMBUS_BUFRING_IN): FORCE
> +	$(Q)$(MAKE) $(build)=vmbus_bufring
> +$(OUTPUT)libvmbus_bufring.a : vmbus_bufring.o
> +	$(AR) rcs $@ $^
> +
>  HV_KVP_DAEMON_IN := $(OUTPUT)hv_kvp_daemon-in.o
>  $(HV_KVP_DAEMON_IN): FORCE
>  	$(Q)$(MAKE) $(build)=hv_kvp_daemon
> diff --git a/tools/hv/vmbus_bufring.c b/tools/hv/vmbus_bufring.c
> new file mode 100644
> index 000000000000..fb1f0489c625
> --- /dev/null
> +++ b/tools/hv/vmbus_bufring.c
> @@ -0,0 +1,297 @@
> +// SPDX-License-Identifier: BSD-3-Clause
> +/*
> + * Copyright (c) 2009-2012,2016,2023 Microsoft Corp.
> + * Copyright (c) 2012 NetApp Inc.
> + * Copyright (c) 2012 Citrix Inc.
> + * All rights reserved.
> + */
> +
> +#include <errno.h>
> +#include <emmintrin.h>
> +#include <stdio.h>
> +#include <string.h>
> +#include <unistd.h>
> +#include <sys/uio.h>
> +#include "vmbus_bufring.h"
> +
> +#define	rte_compiler_barrier()	({ asm volatile ("" : : : "memory"); })
> +#define RINGDATA_START_OFFSET	(getpagesize())
> +#define VMBUS_RQST_ERROR	0xFFFFFFFFFFFFFFFF
> +#define ALIGN(val, align)	((typeof(val))((val) & (~((typeof(val))((align) - 1)))))
> +
> +/* Increase bufring index by inc with wraparound */
> +static inline uint32_t vmbus_br_idxinc(uint32_t idx, uint32_t inc, uint32_t sz)
> +{
> +	idx += inc;
> +	if (idx >= sz)
> +		idx -= sz;
> +
> +	return idx;
> +}
> +
> +void vmbus_br_setup(struct vmbus_br *br, void *buf, unsigned int blen)
> +{
> +	br->vbr = buf;
> +	br->windex = br->vbr->windex;
> +	br->dsize = blen - RINGDATA_START_OFFSET;
> +}
> +
> +static inline __always_inline void
> +rte_smp_mb(void)
> +{
> +	asm volatile("lock addl $0, -128(%%rsp); " ::: "memory");
> +}
> +
> +static inline int
> +rte_atomic32_cmpset(volatile uint32_t *dst, uint32_t exp, uint32_t src)
> +{
> +	uint8_t res;
> +
> +	asm volatile("lock ; "
> +		     "cmpxchgl %[src], %[dst];"
> +		     "sete %[res];"
> +		     : [res] "=a" (res),     /* output */
> +		     [dst] "=m" (*dst)
> +		     : [src] "r" (src),      /* input */
> +		     "a" (exp),
> +		     "m" (*dst)
> +		     : "memory");            /* no-clobber list */
> +	return res;
> +}
> +
> +static inline uint32_t
> +vmbus_txbr_copyto(const struct vmbus_br *tbr, uint32_t windex,
> +		  const void *src0, uint32_t cplen)
> +{
> +	uint8_t *br_data = (uint8_t *)tbr->vbr + RINGDATA_START_OFFSET;
> +	uint32_t br_dsize = tbr->dsize;
> +	const uint8_t *src = src0;
> +
> +	if (cplen > br_dsize - windex) {
> +		uint32_t fraglen = br_dsize - windex;
> +
> +		/* Wrap-around detected */
> +		memcpy(br_data + windex, src, fraglen);
> +		memcpy(br_data, src + fraglen, cplen - fraglen);
> +	} else {
> +		memcpy(br_data + windex, src, cplen);
> +	}
> +
> +	return vmbus_br_idxinc(windex, cplen, br_dsize);
> +}
> +
> +/*
> + * Write scattered channel packet to TX bufring.
> + *
> + * The offset of this channel packet is written as a 64bits value
> + * immediately after this channel packet.
> + *
> + * The write goes through three stages:
> + *  1. Reserve space in ring buffer for the new data.
> + *     Writer atomically moves priv_write_index.
> + *  2. Copy the new data into the ring.
> + *  3. Update the tail of the ring (visible to host) that indicates
> + *     next read location. Writer updates write_index
> + */
> +static int
> +vmbus_txbr_write(struct vmbus_br *tbr, const struct iovec iov[], int iovlen,
> +		 bool *need_sig)
> +{
> +	struct vmbus_bufring *vbr = tbr->vbr;
> +	uint32_t ring_size = tbr->dsize;
> +	uint32_t old_windex, next_windex, windex, total;
> +	uint64_t save_windex;
> +	int i;
> +
> +	total = 0;
> +	for (i = 0; i < iovlen; i++)
> +		total += iov[i].iov_len;
> +	total += sizeof(save_windex);
> +
> +	/* Reserve space in ring */
> +	do {
> +		uint32_t avail;
> +
> +		/* Get current free location */
> +		old_windex = tbr->windex;
> +
> +		/* Prevent compiler reordering this with calculation */
> +		rte_compiler_barrier();
> +
> +		avail = vmbus_br_availwrite(tbr, old_windex);
> +
> +		/* If not enough space in ring, then tell caller. */
> +		if (avail <= total)
> +			return -EAGAIN;
> +
> +		next_windex = vmbus_br_idxinc(old_windex, total, ring_size);
> +
> +		/* Atomic update of next write_index for other threads */
> +	} while (!rte_atomic32_cmpset(&tbr->windex, old_windex, next_windex));
> +
> +	/* Space from old..new is now reserved */
> +	windex = old_windex;
> +	for (i = 0; i < iovlen; i++)
> +		windex = vmbus_txbr_copyto(tbr, windex, iov[i].iov_base, iov[i].iov_len);
> +
> +	/* Set the offset of the current channel packet. */
> +	save_windex = ((uint64_t)old_windex) << 32;
> +	windex = vmbus_txbr_copyto(tbr, windex, &save_windex,
> +				   sizeof(save_windex));
> +
> +	/* The region reserved should match region used */
> +	if (windex != next_windex)
> +		return -EINVAL;
> +
> +	/* Ensure that data is available before updating host index */
> +	rte_compiler_barrier();
> +
> +	/* Checkin for our reservation. wait for our turn to update host */
> +	while (!rte_atomic32_cmpset(&vbr->windex, old_windex, next_windex))
> +		_mm_pause();
> +
> +	return 0;
> +}
> +
> +int rte_vmbus_chan_send(struct vmbus_br *txbr, uint16_t type, void *data,
> +			uint32_t dlen, uint32_t flags)
> +{
> +	struct vmbus_chanpkt pkt;
> +	unsigned int pktlen, pad_pktlen;
> +	const uint32_t hlen = sizeof(pkt);
> +	bool send_evt = false;
> +	uint64_t pad = 0;
> +	struct iovec iov[3];
> +	int error;
> +
> +	pktlen = hlen + dlen;
> +	pad_pktlen = ALIGN(pktlen, sizeof(uint64_t));

This ALIGN function rounds down.  So pad_pktlen could be
less than pktlen.

> +
> +	pkt.hdr.type = type;
> +	pkt.hdr.flags = flags;
> +	pkt.hdr.hlen = hlen >> VMBUS_CHANPKT_SIZE_SHIFT;
> +	pkt.hdr.tlen = pad_pktlen >> VMBUS_CHANPKT_SIZE_SHIFT;
> +	pkt.hdr.xactid = VMBUS_RQST_ERROR; /* doesn't support multiple requests at same time */
> +
> +	iov[0].iov_base = &pkt;
> +	iov[0].iov_len = hlen;
> +	iov[1].iov_base = data;
> +	iov[1].iov_len = dlen;
> +	iov[2].iov_base = &pad;
> +	iov[2].iov_len = pad_pktlen - pktlen;

Given the way your ALIGN function works, the above could
produce a negative value for iov[2].iov_len.  Then bad things
will happen. :-(

> +
> +	error = vmbus_txbr_write(txbr, iov, 3, &send_evt);
> +
> +	return error;
> +}
> +
> +static inline uint32_t
> +vmbus_rxbr_copyfrom(const struct vmbus_br *rbr, uint32_t rindex,
> +		    void *dst0, size_t cplen)
> +{
> +	const uint8_t *br_data = (uint8_t *)rbr->vbr + RINGDATA_START_OFFSET;
> +	uint32_t br_dsize = rbr->dsize;
> +	uint8_t *dst = dst0;
> +
> +	if (cplen > br_dsize - rindex) {
> +		uint32_t fraglen = br_dsize - rindex;
> +
> +		/* Wrap-around detected. */
> +		memcpy(dst, br_data + rindex, fraglen);
> +		memcpy(dst + fraglen, br_data, cplen - fraglen);
> +	} else {
> +		memcpy(dst, br_data + rindex, cplen);
> +	}
> +
> +	return vmbus_br_idxinc(rindex, cplen, br_dsize);
> +}
> +
> +/* Copy data from receive ring but don't change index */
> +static int
> +vmbus_rxbr_peek(const struct vmbus_br *rbr, void *data, size_t dlen)
> +{
> +	uint32_t avail;
> +
> +	/*
> +	 * The requested data and the 64bits channel packet
> +	 * offset should be there at least.
> +	 */
> +	avail = vmbus_br_availread(rbr);
> +	if (avail < dlen + sizeof(uint64_t))
> +		return -EAGAIN;
> +
> +	vmbus_rxbr_copyfrom(rbr, rbr->vbr->rindex, data, dlen);
> +	return 0;
> +}
> +
> +/*
> + * Copy data from receive ring and change index
> + * NOTE:
> + * We assume (dlen + skip) == sizeof(channel packet).
> + */
> +static int
> +vmbus_rxbr_read(struct vmbus_br *rbr, void *data, size_t dlen, size_t skip)
> +{
> +	struct vmbus_bufring *vbr = rbr->vbr;
> +	uint32_t br_dsize = rbr->dsize;
> +	uint32_t rindex;
> +
> +	if (vmbus_br_availread(rbr) < dlen + skip + sizeof(uint64_t))
> +		return -EAGAIN;
> +
> +	/* Record where host was when we started read (for debug) */
> +	rbr->windex = rbr->vbr->windex;
> +
> +	/*
> +	 * Copy channel packet from RX bufring.
> +	 */
> +	rindex = vmbus_br_idxinc(rbr->vbr->rindex, skip, br_dsize);
> +	rindex = vmbus_rxbr_copyfrom(rbr, rindex, data, dlen);
> +
> +	/*
> +	 * Discard this channel packet's 64bits offset, which is useless to us.
> +	 */
> +	rindex = vmbus_br_idxinc(rindex, sizeof(uint64_t), br_dsize);
> +
> +	/* Update the read index _after_ the channel packet is fetched.	 */
> +	rte_compiler_barrier();
> +
> +	vbr->rindex = rindex;
> +
> +	return 0;
> +}
> +
> +int rte_vmbus_chan_recv_raw(struct vmbus_br *rxbr,
> +			    void *data, uint32_t *len)
> +{
> +	struct vmbus_chanpkt_hdr pkt;
> +	uint32_t dlen, bufferlen = *len;
> +	int error;
> +
> +	error = vmbus_rxbr_peek(rxbr, &pkt, sizeof(pkt));
> +	if (error)
> +		return error;
> +
> +	if (unlikely(pkt.hlen < VMBUS_CHANPKT_HLEN_MIN))
> +		/* XXX this channel is dead actually. */
> +		return -EIO;
> +
> +	if (unlikely(pkt.hlen > pkt.tlen))
> +		return -EIO;
> +
> +	/* Length are in quad words */
> +	dlen = pkt.tlen << VMBUS_CHANPKT_SIZE_SHIFT;
> +	*len = dlen;
> +
> +	/* If caller buffer is not large enough */
> +	if (unlikely(dlen > bufferlen))
> +		return -ENOBUFS;
> +
> +	/* Read data and skip packet header */
> +	error = vmbus_rxbr_read(rxbr, data, dlen, 0);
> +	if (error)
> +		return error;
> +
> +	/* Return the number of bytes read */
> +	return dlen + sizeof(uint64_t);
> +}
> diff --git a/tools/hv/vmbus_bufring.h b/tools/hv/vmbus_bufring.h
> new file mode 100644
> index 000000000000..45ecc48e517f
> --- /dev/null
> +++ b/tools/hv/vmbus_bufring.h
> @@ -0,0 +1,154 @@
> +/* SPDX-License-Identifier: BSD-3-Clause */
> +
> +#ifndef _VMBUS_BUF_H_
> +#define _VMBUS_BUF_H_
> +
> +#include <stdbool.h>
> +#include <stdint.h>
> +
> +#define __packed   __attribute__((__packed__))
> +#define unlikely(x)	__builtin_expect(!!(x), 0)
> +
> +#define ICMSGHDRFLAG_TRANSACTION	1
> +#define ICMSGHDRFLAG_REQUEST		2
> +#define ICMSGHDRFLAG_RESPONSE		4
> +
> +#define IC_VERSION_NEGOTIATION_MAX_VER_COUNT 100
> +#define ICMSG_HDR (sizeof(struct vmbuspipe_hdr) + sizeof(struct icmsg_hdr))
> +#define ICMSG_NEGOTIATE_PKT_SIZE(icframe_vercnt, icmsg_vercnt) \
> +	(ICMSG_HDR + sizeof(struct icmsg_negotiate) + \
> +	 (((icframe_vercnt) + (icmsg_vercnt)) * sizeof(struct ic_version)))
> +
> +/*
> + * Channel packets
> + */
> +
> +/* Channel packet flags */
> +#define VMBUS_CHANPKT_TYPE_INBAND	0x0006
> +#define VMBUS_CHANPKT_TYPE_RXBUF	0x0007
> +#define VMBUS_CHANPKT_TYPE_GPA		0x0009
> +#define VMBUS_CHANPKT_TYPE_COMP		0x000b
> +
> +#define VMBUS_CHANPKT_FLAG_NONE		0
> +#define VMBUS_CHANPKT_FLAG_RC		0x0001  /* report completion */
> +
> +#define VMBUS_CHANPKT_SIZE_SHIFT	3
> +#define VMBUS_CHANPKT_SIZE_ALIGN	BIT(VMBUS_CHANPKT_SIZE_SHIFT)
> +#define VMBUS_CHANPKT_HLEN_MIN		\
> +	(sizeof(struct vmbus_chanpkt_hdr) >> VMBUS_CHANPKT_SIZE_SHIFT)
> +
> +/*
> + * Buffer ring
> + */
> +struct vmbus_bufring {
> +	volatile uint32_t windex;
> +	volatile uint32_t rindex;
> +
> +	/*
> +	 * Interrupt mask {0,1}
> +	 *
> +	 * For TX bufring, host set this to 1, when it is processing
> +	 * the TX bufring, so that we can safely skip the TX event
> +	 * notification to host.
> +	 *
> +	 * For RX bufring, once this is set to 1 by us, host will not
> +	 * further dispatch interrupts to us, even if there are data
> +	 * pending on the RX bufring.  This effectively disables the
> +	 * interrupt of the channel to which this RX bufring is attached.
> +	 */
> +	volatile uint32_t imask;
> +
> +	/*
> +	 * Win8 uses some of the reserved bits to implement
> +	 * interrupt driven flow management. On the send side
> +	 * we can request that the receiver interrupt the sender
> +	 * when the ring transitions from being full to being able
> +	 * to handle a message of size "pending_send_sz".
> +	 *
> +	 * Add necessary state for this enhancement.
> +	 */
> +	volatile uint32_t pending_send;
> +	uint32_t reserved1[12];
> +
> +	union {
> +		struct {
> +			uint32_t feat_pending_send_sz:1;
> +		};
> +		uint32_t value;
> +	} feature_bits;
> +
> +	/*
> +	 * Ring data starts here + RingDataStartOffset

This mention of RingDataStartOffset looks stale.  I could
not find it defined anywhere.

> +	 * !!! DO NOT place any fields below this !!!
> +	 */
> +	uint8_t data[];
> +} __packed;
> +
> +struct vmbus_br {
> +	struct vmbus_bufring *vbr;
> +	uint32_t	dsize;
> +	uint32_t	windex; /* next available location */
> +};
> +
> +struct vmbus_chanpkt_hdr {
> +	uint16_t	type;	/* VMBUS_CHANPKT_TYPE_ */
> +	uint16_t	hlen;	/* header len, in 8 bytes */
> +	uint16_t	tlen;	/* total len, in 8 bytes */
> +	uint16_t	flags;	/* VMBUS_CHANPKT_FLAG_ */
> +	uint64_t	xactid;
> +} __packed;
> +
> +struct vmbus_chanpkt {
> +	struct vmbus_chanpkt_hdr hdr;
> +} __packed;
> +
> +struct vmbuspipe_hdr {
> +	unsigned int flags;
> +	unsigned int msgsize;
> +} __packed;
> +
> +struct ic_version {
> +	unsigned short major;
> +	unsigned short minor;
> +} __packed;
> +
> +struct icmsg_negotiate {
> +	unsigned short icframe_vercnt;
> +	unsigned short icmsg_vercnt;
> +	unsigned int reserved;
> +	struct ic_version icversion_data[]; /* any size array */
> +} __packed;
> +
> +struct icmsg_hdr {
> +	struct ic_version icverframe;
> +	unsigned short icmsgtype;
> +	struct ic_version icvermsg;
> +	unsigned short icmsgsize;
> +	unsigned int status;
> +	unsigned char ictransaction_id;
> +	unsigned char icflags;
> +	unsigned char reserved[2];
> +} __packed;
> +
> +int rte_vmbus_chan_recv_raw(struct vmbus_br *rxbr, void *data, uint32_t *len);
> +int rte_vmbus_chan_send(struct vmbus_br *txbr, uint16_t type, void *data,
> +			uint32_t dlen, uint32_t flags);
> +void vmbus_br_setup(struct vmbus_br *br, void *buf, unsigned int blen);
> +
> +/* Amount of space available for write */
> +static inline uint32_t vmbus_br_availwrite(const struct vmbus_br *br, uint32_t
> windex)
> +{
> +	uint32_t rindex = br->vbr->rindex;
> +
> +	if (windex >= rindex)
> +		return br->dsize - (windex - rindex);
> +	else
> +		return rindex - windex;
> +}
> +
> +static inline uint32_t vmbus_br_availread(const struct vmbus_br *br)
> +{
> +	return br->dsize - vmbus_br_availwrite(br, br->vbr->windex);
> +}
> +
> +#endif	/* !_VMBUS_BUF_H_ */
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH v3 3/3] tools: hv: Add new fcopy application based on uio driver
  2023-07-14 10:25 ` [PATCH v3 3/3] tools: hv: Add new fcopy application based on uio driver Saurabh Sengar
@ 2023-08-02 21:45   ` Michael Kelley (LINUX)
  2023-08-03 12:12     ` Saurabh Singh Sengar
  0 siblings, 1 reply; 9+ messages in thread
From: Michael Kelley (LINUX) @ 2023-08-02 21:45 UTC (permalink / raw)
  To: Saurabh Sengar, KY Srinivasan, Haiyang Zhang, wei.liu,
	Dexuan Cui, gregkh, corbet, linux-kernel, linux-hyperv,
	linux-doc

From: Saurabh Sengar <ssengar@linux.microsoft.com> Sent: Friday, July 14, 2023 3:26 AM
> 
> Implement the file copy service for Linux guests on Hyper-V. This
> permits the host to copy a file (over VMBus) into the guest. This
> facility is part of "guest integration services" supported on the
> Hyper-V platform.
> 
> Here is a link that provides additional details on this functionality:
> 
> https://learn.microsoft.com/en-us/powershell/module/hyper-v/copy-vmfile?view=windowsserver2022-ps
> 
> This new fcopy application uses uio_hv_vmbus_client driver which
> makes the earlier hv_util based driver and application obsolete.
> 
> Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
> ---
> [V3]
> - Improve cover letter and commit messages
> - Improve debug prints
> - Instead of hardcoded instance id, query from class id sysfs
> - Set the ring_size value from application
> - Update the application to mmap /dev/uio instead of sysfs
> - new application compilation dependent on x86
> 
> [V2]
> - simpler sysfs path
> 
>  tools/hv/Build                 |   1 +
>  tools/hv/Makefile              |  10 +-
>  tools/hv/hv_fcopy_uio_daemon.c | 578 +++++++++++++++++++++++++++++++++
>  3 files changed, 588 insertions(+), 1 deletion(-)
>  create mode 100644 tools/hv/hv_fcopy_uio_daemon.c
> 
> diff --git a/tools/hv/Build b/tools/hv/Build
> index 2a667d3d94cb..efcbb74a0d23 100644
> --- a/tools/hv/Build
> +++ b/tools/hv/Build
> @@ -2,3 +2,4 @@ hv_kvp_daemon-y += hv_kvp_daemon.o
>  hv_vss_daemon-y += hv_vss_daemon.o
>  hv_fcopy_daemon-y += hv_fcopy_daemon.o
>  vmbus_bufring-y += vmbus_bufring.o
> +hv_fcopy_uio_daemon-y += hv_fcopy_uio_daemon.o
> diff --git a/tools/hv/Makefile b/tools/hv/Makefile
> index 33cf488fd20f..678c6c450a53 100644
> --- a/tools/hv/Makefile
> +++ b/tools/hv/Makefile
> @@ -21,8 +21,10 @@ override CFLAGS += -O2 -Wall -g -D_GNU_SOURCE -
> I$(OUTPUT)include
> 
>  ifeq ($(SRCARCH),x86)
>  ALL_LIBS := libvmbus_bufring.a
> -endif
> +ALL_TARGETS := hv_kvp_daemon hv_vss_daemon hv_fcopy_daemon
> hv_fcopy_uio_daemon
> +else
>  ALL_TARGETS := hv_kvp_daemon hv_vss_daemon hv_fcopy_daemon
> +endif
>  ALL_PROGRAMS := $(patsubst %,$(OUTPUT)%,$(ALL_TARGETS)) $(patsubst
> %,$(OUTPUT)%,$(ALL_LIBS))
> 
>  ALL_SCRIPTS := hv_get_dhcp_info.sh hv_get_dns_info.sh hv_set_ifconfig.sh
> @@ -56,6 +58,12 @@ $(HV_FCOPY_DAEMON_IN): FORCE
>  $(OUTPUT)hv_fcopy_daemon: $(HV_FCOPY_DAEMON_IN)
>  	$(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) $< -o $@
> 
> +HV_FCOPY_UIO_DAEMON_IN := $(OUTPUT)hv_fcopy_uio_daemon-in.o
> +$(HV_FCOPY_UIO_DAEMON_IN): FORCE
> +	$(Q)$(MAKE) $(build)=hv_fcopy_uio_daemon
> +$(OUTPUT)hv_fcopy_uio_daemon: $(HV_FCOPY_UIO_DAEMON_IN)
> libvmbus_bufring.a
> +	$(QUIET_LINK)$(CC) -lm $< -L. -lvmbus_bufring -o $@
> +
>  clean:
>  	rm -f $(ALL_PROGRAMS)
>  	find $(or $(OUTPUT),.) -name '*.o' -delete -o -name '\.*.d' -delete
> diff --git a/tools/hv/hv_fcopy_uio_daemon.c b/tools/hv/hv_fcopy_uio_daemon.c
> new file mode 100644
> index 000000000000..e8618a30dc7e
> --- /dev/null
> +++ b/tools/hv/hv_fcopy_uio_daemon.c
> @@ -0,0 +1,578 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * An implementation of host to guest copy functionality for Linux.
> + *
> + * Copyright (C) 2023, Microsoft, Inc.
> + *
> + * Author : K. Y. Srinivasan <kys@microsoft.com>
> + * Author : Saurabh Sengar <ssengar@microsoft.com>
> + *
> + */
> +
> +#include <dirent.h>
> +#include <errno.h>
> +#include <fcntl.h>
> +#include <getopt.h>
> +#include <locale.h>
> +#include <stdbool.h>
> +#include <stddef.h>
> +#include <stdint.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <syslog.h>
> +#include <unistd.h>
> +#include <sys/mman.h>
> +#include <sys/stat.h>
> +#include <linux/hyperv.h>
> +#include "vmbus_bufring.h"
> +
> +#define ICMSGTYPE_NEGOTIATE	0
> +#define ICMSGTYPE_FCOPY		7
> +
> +#define WIN8_SRV_MAJOR		1
> +#define WIN8_SRV_MINOR		1
> +#define WIN8_SRV_VERSION	(WIN8_SRV_MAJOR << 16 | WIN8_SRV_MINOR)
> +
> +#define MAX_PATH_LEN		300
> +#define MAX_LINE_LEN		40
> +#define DEVICES_SYSFS		"/sys/bus/vmbus/devices"
> +#define FCOPY_CLASS_ID		"34d14be3-dee4-41c8-9ae7-6b174977c192"
> +
> +#define FCOPY_VER_COUNT		1
> +static const int fcopy_versions[] = {
> +	WIN8_SRV_VERSION
> +};
> +
> +#define FW_VER_COUNT		1
> +static const int fw_versions[] = {
> +	UTIL_FW_VERSION
> +};
> +
> +#define HV_RING_SIZE		(4 * 4096)
> +
> +unsigned char desc[HV_RING_SIZE];
> +
> +static int target_fd;
> +static char target_fname[PATH_MAX];
> +static unsigned long long filesize;
> +
> +static int hv_fcopy_create_file(char *file_name, char *path_name, __u32 flags)
> +{
> +	int error = HV_E_FAIL;
> +	char *q, *p;
> +
> +	filesize = 0;
> +	p = (char *)path_name;
> +	snprintf(target_fname, sizeof(target_fname), "%s/%s",
> +		 (char *)path_name, (char *)file_name);
> +
> +	/*
> +	 * Check to see if the path is already in place; if not,
> +	 * create if required.
> +	 */
> +	while ((q = strchr(p, '/')) != NULL) {
> +		if (q == p) {
> +			p++;
> +			continue;
> +		}
> +		*q = '\0';
> +		if (access(path_name, F_OK)) {
> +			if (flags & CREATE_PATH) {
> +				if (mkdir(path_name, 0755)) {
> +					syslog(LOG_ERR, "Failed to create %s",
> +					       path_name);
> +					goto done;
> +				}
> +			} else {
> +				syslog(LOG_ERR, "Invalid path: %s", path_name);
> +				goto done;
> +			}
> +		}
> +		p = q + 1;
> +		*q = '/';
> +	}
> +
> +	if (!access(target_fname, F_OK)) {
> +		syslog(LOG_INFO, "File: %s exists", target_fname);
> +		if (!(flags & OVER_WRITE)) {
> +			error = HV_ERROR_ALREADY_EXISTS;
> +			goto done;
> +		}
> +	}
> +
> +	target_fd = open(target_fname,
> +			 O_RDWR | O_CREAT | O_TRUNC | O_CLOEXEC, 0744);
> +	if (target_fd == -1) {
> +		syslog(LOG_INFO, "Open Failed: %s", strerror(errno));
> +		goto done;
> +	}
> +
> +	error = 0;
> +done:
> +	if (error)
> +		target_fname[0] = '\0';
> +	return error;
> +}
> +
> +static int hv_copy_data(struct hv_do_fcopy *cpmsg)
> +{
> +	ssize_t bytes_written;
> +	int ret = 0;
> +
> +	bytes_written = pwrite(target_fd, cpmsg->data, cpmsg->size,
> +			       cpmsg->offset);
> +
> +	filesize += cpmsg->size;
> +	if (bytes_written != cpmsg->size) {
> +		switch (errno) {
> +		case ENOSPC:
> +			ret = HV_ERROR_DISK_FULL;
> +			break;
> +		default:
> +			ret = HV_E_FAIL;
> +			break;
> +		}
> +		syslog(LOG_ERR, "pwrite failed to write %llu bytes: %ld (%s)",
> +		       filesize, (long)bytes_written, strerror(errno));
> +	}
> +
> +	return ret;
> +}
> +
> +/*
> + * Reset target_fname to "" in the two below functions for hibernation: if
> + * the fcopy operation is aborted by hibernation, the daemon should remove the
> + * partially-copied file; to achieve this, the hv_utils driver always fakes a
> + * CANCEL_FCOPY message upon suspend, and later when the VM resumes back,
> + * the daemon calls hv_copy_cancel() to remove the file; if a file is copied
> + * successfully before suspend, hv_copy_finished() must reset target_fname to
> + * avoid that the file can be incorrectly removed upon resume, since the faked
> + * CANCEL_FCOPY message is spurious in this case.
> + */
> +static int hv_copy_finished(void)
> +{
> +	close(target_fd);
> +	target_fname[0] = '\0';
> +	return 0;
> +}
> +
> +static void print_usage(char *argv[])
> +{
> +	fprintf(stderr, "Usage: %s [options]\n"
> +		"Options are:\n"
> +		"  -n, --no-daemon        stay in foreground, don't daemonize\n"
> +		"  -h, --help             print this help\n", argv[0]);
> +}
> +
> +static bool vmbus_prep_negotiate_resp(struct icmsg_hdr *icmsghdrp, unsigned char *buf,
> +				      unsigned int buflen, const int *fw_version, int fw_vercnt,
> +				const int *srv_version, int srv_vercnt,
> +				int *nego_fw_version, int *nego_srv_version)
> +{
> +	int icframe_major, icframe_minor;
> +	int icmsg_major, icmsg_minor;
> +	int fw_major, fw_minor;
> +	int srv_major, srv_minor;
> +	int i, j;
> +	bool found_match = false;
> +	struct icmsg_negotiate *negop;
> +
> +	/* Check that there's enough space for icframe_vercnt, icmsg_vercnt */
> +	if (buflen < ICMSG_HDR + offsetof(struct icmsg_negotiate, reserved)) {
> +		syslog(LOG_ERR, "Invalid icmsg negotiate");
> +		return false;
> +	}
> +
> +	icmsghdrp->icmsgsize = 0x10;
> +	negop = (struct icmsg_negotiate *)&buf[ICMSG_HDR];
> +
> +	icframe_major = negop->icframe_vercnt;
> +	icframe_minor = 0;
> +
> +	icmsg_major = negop->icmsg_vercnt;
> +	icmsg_minor = 0;
> +
> +	/* Validate negop packet */
> +	if (icframe_major > IC_VERSION_NEGOTIATION_MAX_VER_COUNT ||
> +	    icmsg_major > IC_VERSION_NEGOTIATION_MAX_VER_COUNT ||
> +	    ICMSG_NEGOTIATE_PKT_SIZE(icframe_major, icmsg_major) > buflen) {
> +		syslog(LOG_ERR, "Invalid icmsg negotiate - icframe_major: %u, icmsg_major: %u\n",
> +		       icframe_major, icmsg_major);
> +		goto fw_error;
> +	}
> +
> +	/*
> +	 * Select the framework version number we will
> +	 * support.
> +	 */
> +
> +	for (i = 0; i < fw_vercnt; i++) {
> +		fw_major = (fw_version[i] >> 16);
> +		fw_minor = (fw_version[i] & 0xFFFF);
> +
> +		for (j = 0; j < negop->icframe_vercnt; j++) {
> +			if (negop->icversion_data[j].major == fw_major &&
> +			    negop->icversion_data[j].minor == fw_minor) {
> +				icframe_major = negop->icversion_data[j].major;
> +				icframe_minor = negop->icversion_data[j].minor;
> +				found_match = true;
> +				break;
> +			}
> +		}
> +
> +		if (found_match)
> +			break;
> +	}
> +
> +	if (!found_match)
> +		goto fw_error;
> +
> +	found_match = false;
> +
> +	for (i = 0; i < srv_vercnt; i++) {
> +		srv_major = (srv_version[i] >> 16);
> +		srv_minor = (srv_version[i] & 0xFFFF);
> +
> +		for (j = negop->icframe_vercnt;
> +			(j < negop->icframe_vercnt + negop->icmsg_vercnt);
> +			j++) {
> +			if (negop->icversion_data[j].major == srv_major &&
> +			    negop->icversion_data[j].minor == srv_minor) {
> +				icmsg_major = negop->icversion_data[j].major;
> +				icmsg_minor = negop->icversion_data[j].minor;
> +				found_match = true;
> +				break;
> +			}
> +		}
> +
> +		if (found_match)
> +			break;
> +	}
> +
> +	/*
> +	 * Respond with the framework and service
> +	 * version numbers we can support.
> +	 */
> +fw_error:
> +	if (!found_match) {
> +		negop->icframe_vercnt = 0;
> +		negop->icmsg_vercnt = 0;
> +	} else {
> +		negop->icframe_vercnt = 1;
> +		negop->icmsg_vercnt = 1;
> +	}
> +
> +	if (nego_fw_version)
> +		*nego_fw_version = (icframe_major << 16) | icframe_minor;
> +
> +	if (nego_srv_version)
> +		*nego_srv_version = (icmsg_major << 16) | icmsg_minor;
> +
> +	negop->icversion_data[0].major = icframe_major;
> +	negop->icversion_data[0].minor = icframe_minor;
> +	negop->icversion_data[1].major = icmsg_major;
> +	negop->icversion_data[1].minor = icmsg_minor;
> +
> +	return found_match;
> +}
> +
> +static void wcstoutf8(char *dest, const __u16 *src, size_t dest_size)
> +{
> +	size_t len = 0;
> +
> +	while (len < dest_size) {
> +		if (src[len] < 0x80)
> +			dest[len++] = (char)(*src++);
> +		else
> +			dest[len++] = 'X';
> +	}
> +
> +	dest[len] = '\0';
> +}
> +
> +static int hv_fcopy_start(struct hv_start_fcopy *smsg_in)
> +{
> +	setlocale(LC_ALL, "en_US.utf8");
> +	size_t file_size, path_size;
> +	char *file_name, *path_name;
> +	char *in_file_name = (char *)smsg_in->file_name;
> +	char *in_path_name = (char *)smsg_in->path_name;
> +
> +	file_size = wcstombs(NULL, (const wchar_t *restrict)in_file_name, 0) + 1;
> +	path_size = wcstombs(NULL, (const wchar_t *restrict)in_path_name, 0) + 1;
> +
> +	file_name = (char *)malloc(file_size * sizeof(char));
> +	path_name = (char *)malloc(path_size * sizeof(char));
> +
> +	wcstoutf8(file_name, (__u16 *)in_file_name, file_size);
> +	wcstoutf8(path_name, (__u16 *)in_path_name, path_size);
> +
> +	return hv_fcopy_create_file(file_name, path_name, smsg_in->copy_flags);
> +}
> +
> +static int hv_fcopy_send_data(struct hv_fcopy_hdr *fcopy_msg, int recvlen)
> +{
> +	int operation = fcopy_msg->operation;
> +
> +	/*
> +	 * The  strings sent from the host are encoded in
> +	 * utf16; convert it to utf8 strings.
> +	 * The host assures us that the utf16 strings will not exceed
> +	 * the max lengths specified. We will however, reserve room
> +	 * for the string terminating character - in the utf16s_utf8s()
> +	 * function we limit the size of the buffer where the converted
> +	 * string is placed to W_MAX_PATH -1 to guarantee
> +	 * that the strings can be properly terminated!
> +	 */
> +
> +	switch (operation) {
> +	case START_FILE_COPY:
> +		return hv_fcopy_start((struct hv_start_fcopy *)fcopy_msg);
> +	case WRITE_TO_FILE:
> +		return hv_copy_data((struct hv_do_fcopy *)fcopy_msg);
> +	case COMPLETE_FCOPY:
> +		return hv_copy_finished();
> +	}
> +
> +	return HV_E_FAIL;
> +}
> +
> +/* process the packet recv from host */
> +static int fcopy_pkt_process(struct vmbus_br *txbr)
> +{
> +	int ret, offset, pktlen;
> +	int fcopy_srv_version;
> +	const struct vmbus_chanpkt_hdr *pkt;
> +	struct hv_fcopy_hdr *fcopy_msg;
> +	struct icmsg_hdr *icmsghdr;
> +
> +	pkt = (const struct vmbus_chanpkt_hdr *)desc;
> +	offset = pkt->hlen << 3;
> +	pktlen = (pkt->tlen << 3) - offset;
> +	icmsghdr = (struct icmsg_hdr *)&desc[offset + sizeof(struct vmbuspipe_hdr)];
> +	icmsghdr->status = HV_E_FAIL;
> +
> +	if (icmsghdr->icmsgtype == ICMSGTYPE_NEGOTIATE) {
> +		if (vmbus_prep_negotiate_resp(icmsghdr, desc + offset, pktlen, fw_versions,
> +					      FW_VER_COUNT, fcopy_versions, FCOPY_VER_COUNT,
> +					      NULL, &fcopy_srv_version)) {
> +			syslog(LOG_INFO, "FCopy IC version %d.%d",
> +			       fcopy_srv_version >> 16, fcopy_srv_version & 0xFFFF);
> +			icmsghdr->status = 0;
> +		}
> +	} else if (icmsghdr->icmsgtype == ICMSGTYPE_FCOPY) {
> +		/* Ensure recvlen is big enough to contain hv_fcopy_hdr */
> +		if (pktlen < ICMSG_HDR + sizeof(struct hv_fcopy_hdr)) {
> +			syslog(LOG_ERR, "Invalid Fcopy hdr. Packet length too small: %u",
> +			       pktlen);
> +			return -ENOBUFS;
> +		}
> +
> +		fcopy_msg = (struct hv_fcopy_hdr *)&desc[offset + ICMSG_HDR];
> +		icmsghdr->status = hv_fcopy_send_data(fcopy_msg, pktlen);
> +	}
> +
> +	icmsghdr->icflags = ICMSGHDRFLAG_TRANSACTION | ICMSGHDRFLAG_RESPONSE;
> +	ret = rte_vmbus_chan_send(txbr, 0x6, desc + offset, pktlen, 0);
> +	if (ret) {
> +		syslog(LOG_ERR, "Write to ringbuffer failed err: %d", ret);
> +		return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +static void fcopy_get_first_folder(char *path, char *chan_no)
> +{
> +	DIR *dir = opendir(path);
> +	struct dirent *entry;
> +
> +	if (!dir) {
> +		syslog(LOG_ERR, "Failed to open directory (errno=%s).\n", strerror(errno));
> +		return;
> +	}
> +
> +	while ((entry = readdir(dir)) != NULL) {
> +		if (entry->d_type == DT_DIR && strcmp(entry->d_name, ".") != 0 &&
> +		    strcmp(entry->d_name, "..") != 0) {
> +			strcpy(chan_no, entry->d_name);
> +			break;
> +		}
> +	}
> +
> +	closedir(dir);
> +}
> +
> +static void fcopy_set_ring_size(char *path, char *inst, int size)
> +{
> +	char ring_size_path[MAX_PATH_LEN] = {0};
> +	FILE *fd;
> +
> +	snprintf(ring_size_path, sizeof(ring_size_path), "%s/%s/%s", path, inst, "ring_size");
> +	fd = fopen(ring_size_path, "w");
> +	if (!fd) {
> +		syslog(LOG_WARNING, "Failed to open ring_size file (errno=%s).\n", strerror(errno));
> +		return;
> +	}
> +	fprintf(fd, "%d", size);

Check for and log an error if the new value isn't accepted by the kernel driver?
The code is using a ring size value that should be accepted by the kernel driver,
but weird stuff happens and it's probably better to know about it.

> +	fclose(fd);
> +}
> +
> +static char *fcopy_read_sysfs(char *path, char *buf, int len)
> +{
> +	FILE *fd;
> +	char *ret;
> +
> +	fd = fopen(path, "r");
> +	if (!fd)
> +		return NULL;
> +
> +	ret = fgets(buf, len, fd);
> +	fclose(fd);
> +
> +	return ret;
> +}
> +
> +static int fcopy_get_instance_id(char *path, char *class_id, char *inst)
> +{
> +	DIR *dir = opendir(path);
> +	struct dirent *entry;
> +	char tmp_path[MAX_PATH_LEN] = {0};
> +	char line[MAX_LINE_LEN];
> +
> +	if (!dir) {
> +		syslog(LOG_ERR, "Failed to open directory (errno=%s).\n", strerror(errno));
> +		return -EINVAL;
> +	}
> +
> +	while ((entry = readdir(dir)) != NULL) {
> +		if (entry->d_type == DT_LNK && strcmp(entry->d_name, ".") != 0 &&
> +		    strcmp(entry->d_name, "..") != 0) {
> +			/* search for the sysfs path with matching class_id */
> +			snprintf(tmp_path, sizeof(tmp_path), "%s/%s/%s",
> +				 path, entry->d_name, "class_id");
> +			if (!fcopy_read_sysfs(tmp_path, line, MAX_LINE_LEN))
> +				continue;
> +
> +			/* class id matches, now fetch the instance id from device_id */
> +			if (strstr(line, class_id)) {
> +				snprintf(tmp_path, sizeof(tmp_path), "%s/%s/%s",
> +					 path, entry->d_name, "device_id");
> +				if (!fcopy_read_sysfs(tmp_path, line, MAX_LINE_LEN))
> +					continue;
> +				/* remove braces */
> +				strncpy(inst, line + 1, strlen(line) - 3);
> +				break;
> +			}
> +		}
> +	}
> +
> +	closedir(dir);
> +	return 0;

If this function doesn't find a matching class_id, it appears that it
returns 0, but with the "inst" parameter unset.  The caller will then
proceed as if "inst" was set when it is actually an uninitialized stack
variable.  Probably need some better error detection and handling.

> +}
> +
> +int main(int argc, char *argv[])
> +{
> +	int fcopy_fd = -1, tmp = 1;
> +	int daemonize = 1, long_index = 0, opt, ret = -EINVAL;
> +	struct vmbus_br txbr, rxbr;
> +	void *ring;
> +	uint32_t len = HV_RING_SIZE;
> +	char uio_name[10] = {0};
> +	char uio_dev_path[15] = {0};
> +	char uio_path[MAX_PATH_LEN] = {0};
> +	char inst[MAX_LINE_LEN] = {0};
> +
> +	static struct option long_options[] = {
> +		{"help",	no_argument,	   0,  'h' },
> +		{"no-daemon",	no_argument,	   0,  'n' },
> +		{0,		0,		   0,  0   }
> +	};
> +
> +	while ((opt = getopt_long(argc, argv, "hn", long_options,
> +				  &long_index)) != -1) {
> +		switch (opt) {
> +		case 'n':
> +			daemonize = 0;
> +			break;
> +		case 'h':
> +		default:
> +			print_usage(argv);
> +			exit(EXIT_FAILURE);
> +		}
> +	}
> +
> +	if (daemonize && daemon(1, 0)) {
> +		syslog(LOG_ERR, "daemon() failed; error: %s", strerror(errno));
> +		exit(EXIT_FAILURE);
> +	}
> +
> +	openlog("HV_UIO_FCOPY", 0, LOG_USER);
> +	syslog(LOG_INFO, "starting; pid is:%d", getpid());
> +
> +	/* get instance id */
> +	if (fcopy_get_instance_id(DEVICES_SYSFS, FCOPY_CLASS_ID, inst))
> +		exit(EXIT_FAILURE);

Per above, need better error handling.  And since the syslog is now open,
any errors should be logged rather than having the process just
mysteriously exit.

> +
> +	/* set ring_size value */
> +	fcopy_set_ring_size(DEVICES_SYSFS, inst, HV_RING_SIZE);
> +
> +	/* get /dev/uioX dev path and open it */
> +	snprintf(uio_path, sizeof(uio_path), "%s/%s/%s", DEVICES_SYSFS, inst, "uio");
> +	fcopy_get_first_folder(uio_path, uio_name);
> +	snprintf(uio_dev_path, sizeof(uio_dev_path), "/dev/%s", uio_name);
> +	fcopy_fd = open(uio_dev_path, O_RDWR);
> +
> +	if (fcopy_fd < 0) {
> +		syslog(LOG_ERR, "open %s failed; error: %d %s",
> +		       uio_dev_path, errno, strerror(errno));
> +		syslog(LOG_ERR, "Please make sure module uio_hv_vmbus_client is loaded and" \
> +		       " device is not used by any other application\n");
> +		ret = fcopy_fd;
> +		exit(EXIT_FAILURE);
> +	}
> +
> +	ring = mmap(NULL, 2 * HV_RING_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fcopy_fd, 0);
> +	if (ring == MAP_FAILED) {
> +		ret = errno;
> +		syslog(LOG_ERR, "mmap ringbuffer failed; error: %d %s", ret, strerror(ret));
> +		goto close;
> +	}
> +	vmbus_br_setup(&txbr, ring, HV_RING_SIZE);
> +	vmbus_br_setup(&rxbr, (char *)ring + HV_RING_SIZE, HV_RING_SIZE);
> +
> +	while (1) {
> +		/*
> +		 * In this loop we process fcopy messages after the
> +		 * handshake is complete.
> +		 */
> +		ret = pread(fcopy_fd, &tmp, sizeof(int), 0);
> +		if (ret < 0) {
> +			syslog(LOG_ERR, "pread failed: %s", strerror(errno));
> +			continue;
> +		}
> +
> +		len = HV_RING_SIZE;
> +		ret = rte_vmbus_chan_recv_raw(&rxbr, desc, &len);
> +		if (unlikely(ret <= 0)) {
> +			/* This indicates a failure to communicate (or worse) */
> +			syslog(LOG_ERR, "VMBus channel recv error: %d", ret);
> +		} else {
> +			ret = fcopy_pkt_process(&txbr);
> +			if (ret < 0)
> +				goto close;
> +
> +			/* Signal host */
> +			tmp = 1;
> +			if ((write(fcopy_fd, &tmp, sizeof(int))) != sizeof(int)) {
> +				ret = errno;
> +				syslog(LOG_ERR, "Registration failed: %s\n", strerror(ret));
> +				goto close;
> +			}
> +		}
> +	}
> +close:
> +	close(fcopy_fd);
> +	return ret;
> +}
> --
> 2.34.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH v3 2/3] tools: hv: Add vmbus_bufring
  2023-08-02 21:43   ` Michael Kelley (LINUX)
@ 2023-08-03 12:06     ` Saurabh Singh Sengar
  0 siblings, 0 replies; 9+ messages in thread
From: Saurabh Singh Sengar @ 2023-08-03 12:06 UTC (permalink / raw)
  To: Michael Kelley (LINUX),
	Saurabh Sengar, KY Srinivasan, Haiyang Zhang, wei.liu,
	Dexuan Cui, gregkh, corbet, linux-kernel, linux-hyperv,
	linux-doc



> -----Original Message-----
> From: Michael Kelley (LINUX) <mikelley@microsoft.com>
> Sent: Thursday, August 3, 2023 3:14 AM
> To: Saurabh Sengar <ssengar@linux.microsoft.com>; KY Srinivasan
> <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>;
> wei.liu@kernel.org; Dexuan Cui <decui@microsoft.com>;
> gregkh@linuxfoundation.org; corbet@lwn.net; linux-kernel@vger.kernel.org;
> linux-hyperv@vger.kernel.org; linux-doc@vger.kernel.org
> Subject: [EXTERNAL] RE: [PATCH v3 2/3] tools: hv: Add vmbus_bufring
> 
> From: Saurabh Sengar <ssengar@linux.microsoft.com> Sent: Friday, July 14,
> 2023 3:26 AM
> >
> > Provide a userspace interface for userspace drivers or applications to
> > read/write a VMBus ringbuffer. A significant part of this code is
> > borrowed from DPDK[1]. Current library is supported exclusively for
> > the x86 architecture.
> >
> > To build this library:
> > make -C tools/hv libvmbus_bufring.a
> >
> > Applications using this library can include the vmbus_bufring.h header
> > file and libvmbus_bufring.a statically.
> >
> > [1]
> > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith
> >
> ub.com%2FDPDK%2Fdpdk%2F&data=05%7C01%7Cssengar%40microsoft.com
> %7C7aa6d
> >
> 4dbbcb44895db5008db93a193c9%7C72f988bf86f141af91ab2d7cd011db47%7
> C1%7C0
> >
> %7C638266094508922046%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLj
> AwMDAiLCJQ
> >
> IjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata
> =O0cvl
> > EWlbNS51VoaBHo5l2wWDDjAFJVdfDeT3t%2FR36Y%3D&reserved=0
> >
> > Signed-off-by: Mary Hardy <maryhardy@microsoft.com>
> > Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
> > ---
> > [V3]
> > - Made ring buffer data offset depend on page size
> > - remove rte_smp_rwmb macro and reused rte_compiler_barrier instead
> > - Added legal counsel sign-off
> > - Removed "Link:" tag
> > - Improve commit messages
> > - new library compilation dependent on x86
> > - simplify mmap
> >
> > [V2]
> > - simpler sysfs path, less parsing
> >
> >  tools/hv/Build           |   1 +
> >  tools/hv/Makefile        |  13 +-
> >  tools/hv/vmbus_bufring.c | 297
> > +++++++++++++++++++++++++++++++++++++++
> >  tools/hv/vmbus_bufring.h | 154 ++++++++++++++++++++
> >  4 files changed, 464 insertions(+), 1 deletion(-)  create mode 100644
> > tools/hv/vmbus_bufring.c  create mode 100644 tools/hv/vmbus_bufring.h
> >
> > diff --git a/tools/hv/Build b/tools/hv/Build index
> > 6cf51fa4b306..2a667d3d94cb 100644
> > --- a/tools/hv/Build
> > +++ b/tools/hv/Build
> > @@ -1,3 +1,4 @@
> >  hv_kvp_daemon-y += hv_kvp_daemon.o
> >  hv_vss_daemon-y += hv_vss_daemon.o
> >  hv_fcopy_daemon-y += hv_fcopy_daemon.o
> > +vmbus_bufring-y += vmbus_bufring.o
> > diff --git a/tools/hv/Makefile b/tools/hv/Makefile index
> > fe770e679ae8..33cf488fd20f 100644
> > --- a/tools/hv/Makefile
> > +++ b/tools/hv/Makefile
> > @@ -11,14 +11,19 @@ srctree := $(patsubst %/,%,$(dir $(CURDIR)))
> > srctree := $(patsubst %/,%,$(dir $(srctree)))  endif
> >
> > +include $(srctree)/tools/scripts/Makefile.arch
> > +
> >  # Do not use make's built-in rules
> >  # (this improves performance and avoids hard-to-debug behaviour);
> > MAKEFLAGS += -r
> >
> >  override CFLAGS += -O2 -Wall -g -D_GNU_SOURCE -I$(OUTPUT)include
> >
> > +ifeq ($(SRCARCH),x86)
> > +ALL_LIBS := libvmbus_bufring.a
> > +endif
> >  ALL_TARGETS := hv_kvp_daemon hv_vss_daemon hv_fcopy_daemon
> > -ALL_PROGRAMS := $(patsubst %,$(OUTPUT)%,$(ALL_TARGETS))
> > +ALL_PROGRAMS := $(patsubst %,$(OUTPUT)%,$(ALL_TARGETS)) $(patsubst
> > %,$(OUTPUT)%,$(ALL_LIBS))
> >
> >  ALL_SCRIPTS := hv_get_dhcp_info.sh hv_get_dns_info.sh
> > hv_set_ifconfig.sh
> >
> > @@ -27,6 +32,12 @@ all: $(ALL_PROGRAMS)  export srctree OUTPUT CC LD
> > CFLAGS  include $(srctree)/tools/build/Makefile.include
> >
> > +HV_VMBUS_BUFRING_IN := $(OUTPUT)vmbus_bufring.o
> > +$(HV_VMBUS_BUFRING_IN): FORCE
> > +	$(Q)$(MAKE) $(build)=vmbus_bufring
> > +$(OUTPUT)libvmbus_bufring.a : vmbus_bufring.o
> > +	$(AR) rcs $@ $^
> > +
> >  HV_KVP_DAEMON_IN := $(OUTPUT)hv_kvp_daemon-in.o
> >  $(HV_KVP_DAEMON_IN): FORCE
> >  	$(Q)$(MAKE) $(build)=hv_kvp_daemon
> > diff --git a/tools/hv/vmbus_bufring.c b/tools/hv/vmbus_bufring.c new
> > file mode 100644 index 000000000000..fb1f0489c625
> > --- /dev/null
> > +++ b/tools/hv/vmbus_bufring.c
> > @@ -0,0 +1,297 @@
> > +// SPDX-License-Identifier: BSD-3-Clause
> > +/*
> > + * Copyright (c) 2009-2012,2016,2023 Microsoft Corp.
> > + * Copyright (c) 2012 NetApp Inc.
> > + * Copyright (c) 2012 Citrix Inc.
> > + * All rights reserved.
> > + */
> > +
> > +#include <errno.h>
> > +#include <emmintrin.h>
> > +#include <stdio.h>
> > +#include <string.h>
> > +#include <unistd.h>
> > +#include <sys/uio.h>
> > +#include "vmbus_bufring.h"
> > +
> > +#define	rte_compiler_barrier()	({ asm volatile ("" : : : "memory"); })
> > +#define RINGDATA_START_OFFSET	(getpagesize())
> > +#define VMBUS_RQST_ERROR	0xFFFFFFFFFFFFFFFF
> > +#define ALIGN(val, align)	((typeof(val))((val) & (~((typeof(val))((align) -
> 1)))))
> > +
> > +/* Increase bufring index by inc with wraparound */ static inline
> > +uint32_t vmbus_br_idxinc(uint32_t idx, uint32_t inc, uint32_t sz) {
> > +	idx += inc;
> > +	if (idx >= sz)
> > +		idx -= sz;
> > +
> > +	return idx;
> > +}
> > +
> > +void vmbus_br_setup(struct vmbus_br *br, void *buf, unsigned int
> > +blen) {
> > +	br->vbr = buf;
> > +	br->windex = br->vbr->windex;
> > +	br->dsize = blen - RINGDATA_START_OFFSET; }
> > +
> > +static inline __always_inline void
> > +rte_smp_mb(void)
> > +{
> > +	asm volatile("lock addl $0, -128(%%rsp); " ::: "memory"); }
> > +
> > +static inline int
> > +rte_atomic32_cmpset(volatile uint32_t *dst, uint32_t exp, uint32_t
> > +src) {
> > +	uint8_t res;
> > +
> > +	asm volatile("lock ; "
> > +		     "cmpxchgl %[src], %[dst];"
> > +		     "sete %[res];"
> > +		     : [res] "=a" (res),     /* output */
> > +		     [dst] "=m" (*dst)
> > +		     : [src] "r" (src),      /* input */
> > +		     "a" (exp),
> > +		     "m" (*dst)
> > +		     : "memory");            /* no-clobber list */
> > +	return res;
> > +}
> > +
> > +static inline uint32_t
> > +vmbus_txbr_copyto(const struct vmbus_br *tbr, uint32_t windex,
> > +		  const void *src0, uint32_t cplen) {
> > +	uint8_t *br_data = (uint8_t *)tbr->vbr + RINGDATA_START_OFFSET;
> > +	uint32_t br_dsize = tbr->dsize;
> > +	const uint8_t *src = src0;
> > +
> > +	if (cplen > br_dsize - windex) {
> > +		uint32_t fraglen = br_dsize - windex;
> > +
> > +		/* Wrap-around detected */
> > +		memcpy(br_data + windex, src, fraglen);
> > +		memcpy(br_data, src + fraglen, cplen - fraglen);
> > +	} else {
> > +		memcpy(br_data + windex, src, cplen);
> > +	}
> > +
> > +	return vmbus_br_idxinc(windex, cplen, br_dsize); }
> > +
> > +/*
> > + * Write scattered channel packet to TX bufring.
> > + *
> > + * The offset of this channel packet is written as a 64bits value
> > + * immediately after this channel packet.
> > + *
> > + * The write goes through three stages:
> > + *  1. Reserve space in ring buffer for the new data.
> > + *     Writer atomically moves priv_write_index.
> > + *  2. Copy the new data into the ring.
> > + *  3. Update the tail of the ring (visible to host) that indicates
> > + *     next read location. Writer updates write_index
> > + */
> > +static int
> > +vmbus_txbr_write(struct vmbus_br *tbr, const struct iovec iov[], int iovlen,
> > +		 bool *need_sig)
> > +{
> > +	struct vmbus_bufring *vbr = tbr->vbr;
> > +	uint32_t ring_size = tbr->dsize;
> > +	uint32_t old_windex, next_windex, windex, total;
> > +	uint64_t save_windex;
> > +	int i;
> > +
> > +	total = 0;
> > +	for (i = 0; i < iovlen; i++)
> > +		total += iov[i].iov_len;
> > +	total += sizeof(save_windex);
> > +
> > +	/* Reserve space in ring */
> > +	do {
> > +		uint32_t avail;
> > +
> > +		/* Get current free location */
> > +		old_windex = tbr->windex;
> > +
> > +		/* Prevent compiler reordering this with calculation */
> > +		rte_compiler_barrier();
> > +
> > +		avail = vmbus_br_availwrite(tbr, old_windex);
> > +
> > +		/* If not enough space in ring, then tell caller. */
> > +		if (avail <= total)
> > +			return -EAGAIN;
> > +
> > +		next_windex = vmbus_br_idxinc(old_windex, total, ring_size);
> > +
> > +		/* Atomic update of next write_index for other threads */
> > +	} while (!rte_atomic32_cmpset(&tbr->windex, old_windex,
> > +next_windex));
> > +
> > +	/* Space from old..new is now reserved */
> > +	windex = old_windex;
> > +	for (i = 0; i < iovlen; i++)
> > +		windex = vmbus_txbr_copyto(tbr, windex, iov[i].iov_base,
> > +iov[i].iov_len);
> > +
> > +	/* Set the offset of the current channel packet. */
> > +	save_windex = ((uint64_t)old_windex) << 32;
> > +	windex = vmbus_txbr_copyto(tbr, windex, &save_windex,
> > +				   sizeof(save_windex));
> > +
> > +	/* The region reserved should match region used */
> > +	if (windex != next_windex)
> > +		return -EINVAL;
> > +
> > +	/* Ensure that data is available before updating host index */
> > +	rte_compiler_barrier();
> > +
> > +	/* Checkin for our reservation. wait for our turn to update host */
> > +	while (!rte_atomic32_cmpset(&vbr->windex, old_windex,
> next_windex))
> > +		_mm_pause();
> > +
> > +	return 0;
> > +}
> > +
> > +int rte_vmbus_chan_send(struct vmbus_br *txbr, uint16_t type, void
> *data,
> > +			uint32_t dlen, uint32_t flags)
> > +{
> > +	struct vmbus_chanpkt pkt;
> > +	unsigned int pktlen, pad_pktlen;
> > +	const uint32_t hlen = sizeof(pkt);
> > +	bool send_evt = false;
> > +	uint64_t pad = 0;
> > +	struct iovec iov[3];
> > +	int error;
> > +
> > +	pktlen = hlen + dlen;
> > +	pad_pktlen = ALIGN(pktlen, sizeof(uint64_t));
> 
> This ALIGN function rounds down.  So pad_pktlen could be less than pktlen.

Thanks for pointing this, will fix.

> 
> > +
> > +	pkt.hdr.type = type;
> > +	pkt.hdr.flags = flags;
> > +	pkt.hdr.hlen = hlen >> VMBUS_CHANPKT_SIZE_SHIFT;
> > +	pkt.hdr.tlen = pad_pktlen >> VMBUS_CHANPKT_SIZE_SHIFT;
> > +	pkt.hdr.xactid = VMBUS_RQST_ERROR; /* doesn't support multiple
> > +requests at same time */
> > +
> > +	iov[0].iov_base = &pkt;
> > +	iov[0].iov_len = hlen;
> > +	iov[1].iov_base = data;
> > +	iov[1].iov_len = dlen;
> > +	iov[2].iov_base = &pad;
> > +	iov[2].iov_len = pad_pktlen - pktlen;
> 
> Given the way your ALIGN function works, the above could produce a
> negative value for iov[2].iov_len.  Then bad things will happen. :-(

Got it.

> 
> > +
> > +	error = vmbus_txbr_write(txbr, iov, 3, &send_evt);
> > +
> > +	return error;
> > +}
> > +

<snip>

> > +	 * we can request that the receiver interrupt the sender
> > +	 * when the ring transitions from being full to being able
> > +	 * to handle a message of size "pending_send_sz".
> > +	 *
> > +	 * Add necessary state for this enhancement.
> > +	 */
> > +	volatile uint32_t pending_send;
> > +	uint32_t reserved1[12];
> > +
> > +	union {
> > +		struct {
> > +			uint32_t feat_pending_send_sz:1;
> > +		};
> > +		uint32_t value;
> > +	} feature_bits;
> > +
> > +	/*
> > +	 * Ring data starts here + RingDataStartOffset
> 
> This mention of RingDataStartOffset looks stale.  I could not find it defined
> anywhere.

Will correct it to:
Ring data starts after PAGE_SIZE offset from the start of this struct (RINGDATA_START_OFFSET).

- Saurabh

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [PATCH v3 3/3] tools: hv: Add new fcopy application based on uio driver
  2023-08-02 21:45   ` Michael Kelley (LINUX)
@ 2023-08-03 12:12     ` Saurabh Singh Sengar
  0 siblings, 0 replies; 9+ messages in thread
From: Saurabh Singh Sengar @ 2023-08-03 12:12 UTC (permalink / raw)
  To: Michael Kelley (LINUX),
	Saurabh Sengar, KY Srinivasan, Haiyang Zhang, wei.liu,
	Dexuan Cui, gregkh, corbet, linux-kernel, linux-hyperv,
	linux-doc



> -----Original Message-----
> From: Michael Kelley (LINUX) <mikelley@microsoft.com>
> Sent: Thursday, August 3, 2023 3:15 AM
> To: Saurabh Sengar <ssengar@linux.microsoft.com>; KY Srinivasan
> <kys@microsoft.com>; Haiyang Zhang <haiyangz@microsoft.com>;
> wei.liu@kernel.org; Dexuan Cui <decui@microsoft.com>;
> gregkh@linuxfoundation.org; corbet@lwn.net; linux-kernel@vger.kernel.org;
> linux-hyperv@vger.kernel.org; linux-doc@vger.kernel.org
> Subject: [EXTERNAL] RE: [PATCH v3 3/3] tools: hv: Add new fcopy application
> based on uio driver
> 
> From: Saurabh Sengar <ssengar@linux.microsoft.com> Sent: Friday, July 14,
> 2023 3:26 AM
> >
> > Implement the file copy service for Linux guests on Hyper-V. This
> > permits the host to copy a file (over VMBus) into the guest. This
> > facility is part of "guest integration services" supported on the
> > Hyper-V platform.
> >
> > Here is a link that provides additional details on this functionality:
> >
> > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Flear
> > n.microsoft.com%2Fen-us%2Fpowershell%2Fmodule%2Fhyper-v%2Fcopy-
> vmfile%
> > 3Fview%3Dwindowsserver2022-
> ps&data=05%7C01%7Cssengar%40microsoft.com%7
> >
> Ca5edc1b9d6574e2e6e3108db93a1c558%7C72f988bf86f141af91ab2d7cd011
> db47%7
> >
> C1%7C0%7C638266095311741847%7CUnknown%7CTWFpbGZsb3d8eyJWIjoi
> MC4wLjAwMD
> >
> AiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C
> &sdata
> > =VudSwIKYJJJgPKxNbbfCnOjia1lfKCdijnSn94OWm8Q%3D&reserved=0
> >
> > This new fcopy application uses uio_hv_vmbus_client driver which makes
> > the earlier hv_util based driver and application obsolete.
> >
> > Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
> > ---
> > [V3]
> > - Improve cover letter and commit messages
> > - Improve debug prints
> > - Instead of hardcoded instance id, query from class id sysfs
> > - Set the ring_size value from application
> > - Update the application to mmap /dev/uio instead of sysfs
> > - new application compilation dependent on x86
> >
> > [V2]
> > - simpler sysfs path
> >
> >  tools/hv/Build                 |   1 +
> >  tools/hv/Makefile              |  10 +-
> >  tools/hv/hv_fcopy_uio_daemon.c | 578
> > +++++++++++++++++++++++++++++++++
> >  3 files changed, 588 insertions(+), 1 deletion(-)  create mode 100644
> > tools/hv/hv_fcopy_uio_daemon.c
> >
> > diff --git a/tools/hv/Build b/tools/hv/Build index
> > 2a667d3d94cb..efcbb74a0d23 100644
> > --- a/tools/hv/Build
> > +++ b/tools/hv/Build
> > @@ -2,3 +2,4 @@ hv_kvp_daemon-y += hv_kvp_daemon.o
> hv_vss_daemon-y +=
> > hv_vss_daemon.o  hv_fcopy_daemon-y += hv_fcopy_daemon.o
> > vmbus_bufring-y += vmbus_bufring.o
> > +hv_fcopy_uio_daemon-y += hv_fcopy_uio_daemon.o
> > diff --git a/tools/hv/Makefile b/tools/hv/Makefile index
> > 33cf488fd20f..678c6c450a53 100644
> > --- a/tools/hv/Makefile
> > +++ b/tools/hv/Makefile
> > @@ -21,8 +21,10 @@ override CFLAGS += -O2 -Wall -g -D_GNU_SOURCE -
> > I$(OUTPUT)include
> >
> >  ifeq ($(SRCARCH),x86)
> >  ALL_LIBS := libvmbus_bufring.a
> > -endif
> > +ALL_TARGETS := hv_kvp_daemon hv_vss_daemon hv_fcopy_daemon
> > hv_fcopy_uio_daemon
> > +else
> >  ALL_TARGETS := hv_kvp_daemon hv_vss_daemon hv_fcopy_daemon
> > +endif
> >  ALL_PROGRAMS := $(patsubst %,$(OUTPUT)%,$(ALL_TARGETS)) $(patsubst
> > %,$(OUTPUT)%,$(ALL_LIBS))
> >
> >  ALL_SCRIPTS := hv_get_dhcp_info.sh hv_get_dns_info.sh
> > hv_set_ifconfig.sh @@ -56,6 +58,12 @@ $(HV_FCOPY_DAEMON_IN):
> FORCE
> >  $(OUTPUT)hv_fcopy_daemon: $(HV_FCOPY_DAEMON_IN)
> >  	$(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) $< -o $@
> >
> > +HV_FCOPY_UIO_DAEMON_IN := $(OUTPUT)hv_fcopy_uio_daemon-in.o
> > +$(HV_FCOPY_UIO_DAEMON_IN): FORCE
> > +	$(Q)$(MAKE) $(build)=hv_fcopy_uio_daemon
> > +$(OUTPUT)hv_fcopy_uio_daemon: $(HV_FCOPY_UIO_DAEMON_IN)
> > libvmbus_bufring.a
> > +	$(QUIET_LINK)$(CC) -lm $< -L. -lvmbus_bufring -o $@
> > +
> >  clean:
> >  	rm -f $(ALL_PROGRAMS)
> >  	find $(or $(OUTPUT),.) -name '*.o' -delete -o -name '\.*.d' -delete
> > diff --git a/tools/hv/hv_fcopy_uio_daemon.c
> > b/tools/hv/hv_fcopy_uio_daemon.c new file mode 100644 index
> > 000000000000..e8618a30dc7e
> > --- /dev/null
> > +++ b/tools/hv/hv_fcopy_uio_daemon.c
> > @@ -0,0 +1,578 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * An implementation of host to guest copy functionality for Linux.
> > + *
> > + * Copyright (C) 2023, Microsoft, Inc.
> > + *
> > + * Author : K. Y. Srinivasan <kys@microsoft.com>
> > + * Author : Saurabh Sengar <ssengar@microsoft.com>
> > + *
> > + */
> > +
> > +#include <dirent.h>
> > +#include <errno.h>
> > +#include <fcntl.h>
> > +#include <getopt.h>
> > +#include <locale.h>
> > +#include <stdbool.h>
> > +#include <stddef.h>
> > +#include <stdint.h>
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <string.h>
> > +#include <syslog.h>
> > +#include <unistd.h>
> > +#include <sys/mman.h>
> > +#include <sys/stat.h>
> > +#include <linux/hyperv.h>
> > +#include "vmbus_bufring.h"
> > +
> > +#define ICMSGTYPE_NEGOTIATE	0
> > +#define ICMSGTYPE_FCOPY		7
> > +
> > +#define WIN8_SRV_MAJOR		1
> > +#define WIN8_SRV_MINOR		1
> > +#define WIN8_SRV_VERSION	(WIN8_SRV_MAJOR << 16 |
> WIN8_SRV_MINOR)
> > +
> > +#define MAX_PATH_LEN		300
> > +#define MAX_LINE_LEN		40
> > +#define DEVICES_SYSFS		"/sys/bus/vmbus/devices"
> > +#define FCOPY_CLASS_ID		"34d14be3-dee4-41c8-9ae7-
> 6b174977c192"
> > +
> > +#define FCOPY_VER_COUNT		1
> > +static const int fcopy_versions[] = {
> > +	WIN8_SRV_VERSION
> > +};
> > +
> > +#define FW_VER_COUNT		1
> > +static const int fw_versions[] = {
> > +	UTIL_FW_VERSION
> > +};
> > +
> > +#define HV_RING_SIZE		(4 * 4096)
> > +
> > +unsigned char desc[HV_RING_SIZE];
> > +
> > +static int target_fd;
> > +static char target_fname[PATH_MAX];
> > +static unsigned long long filesize;
> > +
> > +static int hv_fcopy_create_file(char *file_name, char *path_name,
> > +__u32 flags) {
> > +	int error = HV_E_FAIL;
> > +	char *q, *p;
> > +
> > +	filesize = 0;
> > +	p = (char *)path_name;
> > +	snprintf(target_fname, sizeof(target_fname), "%s/%s",
> > +		 (char *)path_name, (char *)file_name);
> > +
> > +	/*
> > +	 * Check to see if the path is already in place; if not,
> > +	 * create if required.
> > +	 */
> > +	while ((q = strchr(p, '/')) != NULL) {
> > +		if (q == p) {
> > +			p++;
> > +			continue;
> > +		}
> > +		*q = '\0';
> > +		if (access(path_name, F_OK)) {
> > +			if (flags & CREATE_PATH) {
> > +				if (mkdir(path_name, 0755)) {
> > +					syslog(LOG_ERR, "Failed to create
> %s",
> > +					       path_name);
> > +					goto done;
> > +				}
> > +			} else {
> > +				syslog(LOG_ERR, "Invalid path: %s",
> path_name);
> > +				goto done;
> > +			}
> > +		}
> > +		p = q + 1;
> > +		*q = '/';
> > +	}
> > +
> > +	if (!access(target_fname, F_OK)) {
> > +		syslog(LOG_INFO, "File: %s exists", target_fname);
> > +		if (!(flags & OVER_WRITE)) {
> > +			error = HV_ERROR_ALREADY_EXISTS;
> > +			goto done;
> > +		}
> > +	}
> > +
> > +	target_fd = open(target_fname,
> > +			 O_RDWR | O_CREAT | O_TRUNC | O_CLOEXEC,
> 0744);
> > +	if (target_fd == -1) {
> > +		syslog(LOG_INFO, "Open Failed: %s", strerror(errno));
> > +		goto done;
> > +	}
> > +
> > +	error = 0;
> > +done:
> > +	if (error)
> > +		target_fname[0] = '\0';
> > +	return error;
> > +}
> > +
> > +static int hv_copy_data(struct hv_do_fcopy *cpmsg) {
> > +	ssize_t bytes_written;
> > +	int ret = 0;
> > +
> > +	bytes_written = pwrite(target_fd, cpmsg->data, cpmsg->size,
> > +			       cpmsg->offset);
> > +
> > +	filesize += cpmsg->size;
> > +	if (bytes_written != cpmsg->size) {
> > +		switch (errno) {
> > +		case ENOSPC:
> > +			ret = HV_ERROR_DISK_FULL;
> > +			break;
> > +		default:
> > +			ret = HV_E_FAIL;
> > +			break;
> > +		}
> > +		syslog(LOG_ERR, "pwrite failed to write %llu bytes: %ld (%s)",
> > +		       filesize, (long)bytes_written, strerror(errno));
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +/*
> > + * Reset target_fname to "" in the two below functions for
> > +hibernation: if
> > + * the fcopy operation is aborted by hibernation, the daemon should
> > +remove the
> > + * partially-copied file; to achieve this, the hv_utils driver always
> > +fakes a
> > + * CANCEL_FCOPY message upon suspend, and later when the VM
> resumes
> > +back,
> > + * the daemon calls hv_copy_cancel() to remove the file; if a file is
> > +copied
> > + * successfully before suspend, hv_copy_finished() must reset
> > +target_fname to
> > + * avoid that the file can be incorrectly removed upon resume, since
> > +the faked
> > + * CANCEL_FCOPY message is spurious in this case.
> > + */
> > +static int hv_copy_finished(void)
> > +{
> > +	close(target_fd);
> > +	target_fname[0] = '\0';
> > +	return 0;
> > +}
> > +
> > +static void print_usage(char *argv[]) {
> > +	fprintf(stderr, "Usage: %s [options]\n"
> > +		"Options are:\n"
> > +		"  -n, --no-daemon        stay in foreground, don't
> daemonize\n"
> > +		"  -h, --help             print this help\n", argv[0]);
> > +}
> > +
> > +static bool vmbus_prep_negotiate_resp(struct icmsg_hdr *icmsghdrp,
> unsigned char *buf,
> > +				      unsigned int buflen, const int *fw_version,
> int fw_vercnt,
> > +				const int *srv_version, int srv_vercnt,
> > +				int *nego_fw_version, int *nego_srv_version)
> {
> > +	int icframe_major, icframe_minor;
> > +	int icmsg_major, icmsg_minor;
> > +	int fw_major, fw_minor;
> > +	int srv_major, srv_minor;
> > +	int i, j;
> > +	bool found_match = false;
> > +	struct icmsg_negotiate *negop;
> > +
> > +	/* Check that there's enough space for icframe_vercnt, icmsg_vercnt
> */
> > +	if (buflen < ICMSG_HDR + offsetof(struct icmsg_negotiate, reserved)) {
> > +		syslog(LOG_ERR, "Invalid icmsg negotiate");
> > +		return false;
> > +	}
> > +
> > +	icmsghdrp->icmsgsize = 0x10;
> > +	negop = (struct icmsg_negotiate *)&buf[ICMSG_HDR];
> > +
> > +	icframe_major = negop->icframe_vercnt;
> > +	icframe_minor = 0;
> > +
> > +	icmsg_major = negop->icmsg_vercnt;
> > +	icmsg_minor = 0;
> > +
> > +	/* Validate negop packet */
> > +	if (icframe_major > IC_VERSION_NEGOTIATION_MAX_VER_COUNT ||
> > +	    icmsg_major > IC_VERSION_NEGOTIATION_MAX_VER_COUNT ||
> > +	    ICMSG_NEGOTIATE_PKT_SIZE(icframe_major, icmsg_major) >
> buflen) {
> > +		syslog(LOG_ERR, "Invalid icmsg negotiate - icframe_major:
> %u, icmsg_major: %u\n",
> > +		       icframe_major, icmsg_major);
> > +		goto fw_error;
> > +	}
> > +
> > +	/*
> > +	 * Select the framework version number we will
> > +	 * support.
> > +	 */
> > +
> > +	for (i = 0; i < fw_vercnt; i++) {
> > +		fw_major = (fw_version[i] >> 16);
> > +		fw_minor = (fw_version[i] & 0xFFFF);
> > +
> > +		for (j = 0; j < negop->icframe_vercnt; j++) {
> > +			if (negop->icversion_data[j].major == fw_major &&
> > +			    negop->icversion_data[j].minor == fw_minor) {
> > +				icframe_major = negop-
> >icversion_data[j].major;
> > +				icframe_minor = negop-
> >icversion_data[j].minor;
> > +				found_match = true;
> > +				break;
> > +			}
> > +		}
> > +
> > +		if (found_match)
> > +			break;
> > +	}
> > +
> > +	if (!found_match)
> > +		goto fw_error;
> > +
> > +	found_match = false;
> > +
> > +	for (i = 0; i < srv_vercnt; i++) {
> > +		srv_major = (srv_version[i] >> 16);
> > +		srv_minor = (srv_version[i] & 0xFFFF);
> > +
> > +		for (j = negop->icframe_vercnt;
> > +			(j < negop->icframe_vercnt + negop->icmsg_vercnt);
> > +			j++) {
> > +			if (negop->icversion_data[j].major == srv_major &&
> > +			    negop->icversion_data[j].minor == srv_minor) {
> > +				icmsg_major = negop-
> >icversion_data[j].major;
> > +				icmsg_minor = negop-
> >icversion_data[j].minor;
> > +				found_match = true;
> > +				break;
> > +			}
> > +		}
> > +
> > +		if (found_match)
> > +			break;
> > +	}
> > +
> > +	/*
> > +	 * Respond with the framework and service
> > +	 * version numbers we can support.
> > +	 */
> > +fw_error:
> > +	if (!found_match) {
> > +		negop->icframe_vercnt = 0;
> > +		negop->icmsg_vercnt = 0;
> > +	} else {
> > +		negop->icframe_vercnt = 1;
> > +		negop->icmsg_vercnt = 1;
> > +	}
> > +
> > +	if (nego_fw_version)
> > +		*nego_fw_version = (icframe_major << 16) | icframe_minor;
> > +
> > +	if (nego_srv_version)
> > +		*nego_srv_version = (icmsg_major << 16) | icmsg_minor;
> > +
> > +	negop->icversion_data[0].major = icframe_major;
> > +	negop->icversion_data[0].minor = icframe_minor;
> > +	negop->icversion_data[1].major = icmsg_major;
> > +	negop->icversion_data[1].minor = icmsg_minor;
> > +
> > +	return found_match;
> > +}
> > +
> > +static void wcstoutf8(char *dest, const __u16 *src, size_t dest_size)
> > +{
> > +	size_t len = 0;
> > +
> > +	while (len < dest_size) {
> > +		if (src[len] < 0x80)
> > +			dest[len++] = (char)(*src++);
> > +		else
> > +			dest[len++] = 'X';
> > +	}
> > +
> > +	dest[len] = '\0';
> > +}
> > +
> > +static int hv_fcopy_start(struct hv_start_fcopy *smsg_in) {
> > +	setlocale(LC_ALL, "en_US.utf8");
> > +	size_t file_size, path_size;
> > +	char *file_name, *path_name;
> > +	char *in_file_name = (char *)smsg_in->file_name;
> > +	char *in_path_name = (char *)smsg_in->path_name;
> > +
> > +	file_size = wcstombs(NULL, (const wchar_t *restrict)in_file_name, 0) +
> 1;
> > +	path_size = wcstombs(NULL, (const wchar_t *restrict)in_path_name,
> 0)
> > ++ 1;
> > +
> > +	file_name = (char *)malloc(file_size * sizeof(char));
> > +	path_name = (char *)malloc(path_size * sizeof(char));
> > +
> > +	wcstoutf8(file_name, (__u16 *)in_file_name, file_size);
> > +	wcstoutf8(path_name, (__u16 *)in_path_name, path_size);
> > +
> > +	return hv_fcopy_create_file(file_name, path_name,
> > +smsg_in->copy_flags); }
> > +
> > +static int hv_fcopy_send_data(struct hv_fcopy_hdr *fcopy_msg, int
> > +recvlen) {
> > +	int operation = fcopy_msg->operation;
> > +
> > +	/*
> > +	 * The  strings sent from the host are encoded in
> > +	 * utf16; convert it to utf8 strings.
> > +	 * The host assures us that the utf16 strings will not exceed
> > +	 * the max lengths specified. We will however, reserve room
> > +	 * for the string terminating character - in the utf16s_utf8s()
> > +	 * function we limit the size of the buffer where the converted
> > +	 * string is placed to W_MAX_PATH -1 to guarantee
> > +	 * that the strings can be properly terminated!
> > +	 */
> > +
> > +	switch (operation) {
> > +	case START_FILE_COPY:
> > +		return hv_fcopy_start((struct hv_start_fcopy *)fcopy_msg);
> > +	case WRITE_TO_FILE:
> > +		return hv_copy_data((struct hv_do_fcopy *)fcopy_msg);
> > +	case COMPLETE_FCOPY:
> > +		return hv_copy_finished();
> > +	}
> > +
> > +	return HV_E_FAIL;
> > +}
> > +
> > +/* process the packet recv from host */ static int
> > +fcopy_pkt_process(struct vmbus_br *txbr) {
> > +	int ret, offset, pktlen;
> > +	int fcopy_srv_version;
> > +	const struct vmbus_chanpkt_hdr *pkt;
> > +	struct hv_fcopy_hdr *fcopy_msg;
> > +	struct icmsg_hdr *icmsghdr;
> > +
> > +	pkt = (const struct vmbus_chanpkt_hdr *)desc;
> > +	offset = pkt->hlen << 3;
> > +	pktlen = (pkt->tlen << 3) - offset;
> > +	icmsghdr = (struct icmsg_hdr *)&desc[offset + sizeof(struct
> vmbuspipe_hdr)];
> > +	icmsghdr->status = HV_E_FAIL;
> > +
> > +	if (icmsghdr->icmsgtype == ICMSGTYPE_NEGOTIATE) {
> > +		if (vmbus_prep_negotiate_resp(icmsghdr, desc + offset,
> pktlen, fw_versions,
> > +					      FW_VER_COUNT, fcopy_versions,
> FCOPY_VER_COUNT,
> > +					      NULL, &fcopy_srv_version)) {
> > +			syslog(LOG_INFO, "FCopy IC version %d.%d",
> > +			       fcopy_srv_version >> 16, fcopy_srv_version &
> 0xFFFF);
> > +			icmsghdr->status = 0;
> > +		}
> > +	} else if (icmsghdr->icmsgtype == ICMSGTYPE_FCOPY) {
> > +		/* Ensure recvlen is big enough to contain hv_fcopy_hdr */
> > +		if (pktlen < ICMSG_HDR + sizeof(struct hv_fcopy_hdr)) {
> > +			syslog(LOG_ERR, "Invalid Fcopy hdr. Packet length too
> small: %u",
> > +			       pktlen);
> > +			return -ENOBUFS;
> > +		}
> > +
> > +		fcopy_msg = (struct hv_fcopy_hdr *)&desc[offset +
> ICMSG_HDR];
> > +		icmsghdr->status = hv_fcopy_send_data(fcopy_msg, pktlen);
> > +	}
> > +
> > +	icmsghdr->icflags = ICMSGHDRFLAG_TRANSACTION |
> ICMSGHDRFLAG_RESPONSE;
> > +	ret = rte_vmbus_chan_send(txbr, 0x6, desc + offset, pktlen, 0);
> > +	if (ret) {
> > +		syslog(LOG_ERR, "Write to ringbuffer failed err: %d", ret);
> > +		return ret;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static void fcopy_get_first_folder(char *path, char *chan_no) {
> > +	DIR *dir = opendir(path);
> > +	struct dirent *entry;
> > +
> > +	if (!dir) {
> > +		syslog(LOG_ERR, "Failed to open directory (errno=%s).\n",
> strerror(errno));
> > +		return;
> > +	}
> > +
> > +	while ((entry = readdir(dir)) != NULL) {
> > +		if (entry->d_type == DT_DIR && strcmp(entry->d_name, ".")
> != 0 &&
> > +		    strcmp(entry->d_name, "..") != 0) {
> > +			strcpy(chan_no, entry->d_name);
> > +			break;
> > +		}
> > +	}
> > +
> > +	closedir(dir);
> > +}
> > +
> > +static void fcopy_set_ring_size(char *path, char *inst, int size) {
> > +	char ring_size_path[MAX_PATH_LEN] = {0};
> > +	FILE *fd;
> > +
> > +	snprintf(ring_size_path, sizeof(ring_size_path), "%s/%s/%s", path, inst,
> "ring_size");
> > +	fd = fopen(ring_size_path, "w");
> > +	if (!fd) {
> > +		syslog(LOG_WARNING, "Failed to open ring_size file
> (errno=%s).\n", strerror(errno));
> > +		return;
> > +	}
> > +	fprintf(fd, "%d", size);
> 
> Check for and log an error if the new value isn't accepted by the kernel
> driver?
> The code is using a ring size value that should be accepted by the kernel
> driver, but weird stuff happens and it's probably better to know about it.

Ok, will add error check.

> 
> > +	fclose(fd);
> > +}
> > +
> > +static char *fcopy_read_sysfs(char *path, char *buf, int len) {
> > +	FILE *fd;
> > +	char *ret;
> > +
> > +	fd = fopen(path, "r");
> > +	if (!fd)
> > +		return NULL;
> > +
> > +	ret = fgets(buf, len, fd);
> > +	fclose(fd);
> > +
> > +	return ret;
> > +}
> > +
> > +static int fcopy_get_instance_id(char *path, char *class_id, char
> > +*inst) {
> > +	DIR *dir = opendir(path);
> > +	struct dirent *entry;
> > +	char tmp_path[MAX_PATH_LEN] = {0};
> > +	char line[MAX_LINE_LEN];
> > +
> > +	if (!dir) {
> > +		syslog(LOG_ERR, "Failed to open directory (errno=%s).\n",
> strerror(errno));
> > +		return -EINVAL;
> > +	}
> > +
> > +	while ((entry = readdir(dir)) != NULL) {
> > +		if (entry->d_type == DT_LNK && strcmp(entry->d_name, ".")
> != 0 &&
> > +		    strcmp(entry->d_name, "..") != 0) {
> > +			/* search for the sysfs path with matching class_id */
> > +			snprintf(tmp_path, sizeof(tmp_path), "%s/%s/%s",
> > +				 path, entry->d_name, "class_id");
> > +			if (!fcopy_read_sysfs(tmp_path, line, MAX_LINE_LEN))
> > +				continue;
> > +
> > +			/* class id matches, now fetch the instance id from
> device_id */
> > +			if (strstr(line, class_id)) {
> > +				snprintf(tmp_path, sizeof(tmp_path),
> "%s/%s/%s",
> > +					 path, entry->d_name, "device_id");
> > +				if (!fcopy_read_sysfs(tmp_path, line,
> MAX_LINE_LEN))
> > +					continue;
> > +				/* remove braces */
> > +				strncpy(inst, line + 1, strlen(line) - 3);
> > +				break;
> > +			}
> > +		}
> > +	}
> > +
> > +	closedir(dir);
> > +	return 0;
> 
> If this function doesn't find a matching class_id, it appears that it returns 0,
> but with the "inst" parameter unset.  The caller will then proceed as if "inst"
> was set when it is actually an uninitialized stack variable.  Probably need
> some better error detection and handling.

Good point !
Let me fix it in next version.

> 
> > +}
> > +
> > +int main(int argc, char *argv[])
> > +{
> > +	int fcopy_fd = -1, tmp = 1;
> > +	int daemonize = 1, long_index = 0, opt, ret = -EINVAL;
> > +	struct vmbus_br txbr, rxbr;
> > +	void *ring;
> > +	uint32_t len = HV_RING_SIZE;
> > +	char uio_name[10] = {0};
> > +	char uio_dev_path[15] = {0};
> > +	char uio_path[MAX_PATH_LEN] = {0};
> > +	char inst[MAX_LINE_LEN] = {0};
> > +
> > +	static struct option long_options[] = {
> > +		{"help",	no_argument,	   0,  'h' },
> > +		{"no-daemon",	no_argument,	   0,  'n' },
> > +		{0,		0,		   0,  0   }
> > +	};
> > +
> > +	while ((opt = getopt_long(argc, argv, "hn", long_options,
> > +				  &long_index)) != -1) {
> > +		switch (opt) {
> > +		case 'n':
> > +			daemonize = 0;
> > +			break;
> > +		case 'h':
> > +		default:
> > +			print_usage(argv);
> > +			exit(EXIT_FAILURE);
> > +		}
> > +	}
> > +
> > +	if (daemonize && daemon(1, 0)) {
> > +		syslog(LOG_ERR, "daemon() failed; error: %s",
> strerror(errno));
> > +		exit(EXIT_FAILURE);
> > +	}
> > +
> > +	openlog("HV_UIO_FCOPY", 0, LOG_USER);
> > +	syslog(LOG_INFO, "starting; pid is:%d", getpid());
> > +
> > +	/* get instance id */
> > +	if (fcopy_get_instance_id(DEVICES_SYSFS, FCOPY_CLASS_ID, inst))
> > +		exit(EXIT_FAILURE);
> 
> Per above, need better error handling.  And since the syslog is now open, any
> errors should be logged rather than having the process just mysteriously exit.

OK.

- Saurabh


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-08-03 12:12 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-14 10:25 [PATCH v3 0/3] UIO driver for low speed Hyper-V devices Saurabh Sengar
2023-07-14 10:25 ` [PATCH v3 1/3] uio: Add hv_vmbus_client driver Saurabh Sengar
2023-08-02 21:43   ` Michael Kelley (LINUX)
2023-07-14 10:25 ` [PATCH v3 2/3] tools: hv: Add vmbus_bufring Saurabh Sengar
2023-08-02 21:43   ` Michael Kelley (LINUX)
2023-08-03 12:06     ` Saurabh Singh Sengar
2023-07-14 10:25 ` [PATCH v3 3/3] tools: hv: Add new fcopy application based on uio driver Saurabh Sengar
2023-08-02 21:45   ` Michael Kelley (LINUX)
2023-08-03 12:12     ` Saurabh Singh Sengar

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.