From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sat, 28 Nov 2020 12:16:04 +0200
From: Mike Rapoport
To: "Catangiu, Adrian Costin"
Cc: Dmitry Safonov <0x7f454c46@gmail.com>, Alexander Graf,
 Christian Borntraeger, "Jason A. Donenfeld", Jann Horn, Willy Tarreau,
 "MacCarthaigh, Colm", Andy Lutomirski, "Theodore Y. Ts'o", Eric Biggers,
 "open list:DOCUMENTATION", kernel list, "Woodhouse, David",
 "bonzini@gnu.org", "Singh, Balbir", "Weiss, Radu", "oridgar@gmail.com",
 "ghammer@redhat.com", Jonathan Corbet, Greg Kroah-Hartman,
 "Michael S. Tsirkin", Qemu Developers, KVM list, Michal Hocko,
 "Rafael J. Wysocki", Pavel Machek, Linux API, "mpe@ellerman.id.au",
 linux-s390, "areber@redhat.com", Pavel Emelyanov, Andrey Vagin,
 Pavel Tikhomirov, "gil@azul.com", "asmehra@redhat.com",
 "dgunigun@redhat.com", "vijaysun@ca.ibm.com", "Eric W. Biederman"
Subject: Re: [PATCH v3] drivers/virt: vmgenid: add vm generation id driver
Message-ID: <20201128101604.GC557259@kernel.org>
References: <3E05451B-A9CD-4719-99D0-72750A304044@amazon.com>
 <300d4404-3efe-880e-ef30-692eabbff5f7@de.ibm.com>
 <20201119173800.GD8537@kernel.org>
 <1cdb6fac-0d50-3399-74a6-24c119ebbaa5@amazon.de>
 <106f56ca-49bc-7cad-480f-4b26656e90ce@gmail.com>
 <96625ce2-66c6-34b8-ef81-7c17c05b4c7a@amazon.com>
In-Reply-To: <96625ce2-66c6-34b8-ef81-7c17c05b4c7a@amazon.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Adrian,

Usually each version of a patch is a separate e-mail thread.

On Fri, Nov 27, 2020 at 08:26:02PM +0200, Catangiu, Adrian Costin wrote:
> - Background
>
> The VM Generation ID is a feature defined by Microsoft (paper:
> http://go.microsoft.com/fwlink/?LinkId=260709) and supported by
> multiple hypervisor vendors.
>
> The feature is required in virtualized environments by apps that work
> with local copies/caches of world-unique data such as random values,
> uuids, monotonically increasing counters, etc.
> Such apps can be negatively affected by VM snapshotting when the VM
> is either cloned or returned to an earlier point in time.
> > The VM Generation ID is a simple concept meant to alleviate the issue > by providing a unique ID that changes each time the VM is restored > from a snapshot. The hw provided UUID value can be used to > differentiate between VMs or different generations of the same VM. > > - Problem > > The VM Generation ID is exposed through an ACPI device by multiple > hypervisor vendors but neither the vendors or upstream Linux have no > default driver for it leaving users to fend for themselves. > > Furthermore, simply finding out about a VM generation change is only > the starting point of a process to renew internal states of possibly > multiple applications across the system. This process could benefit > from a driver that provides an interface through which orchestration > can be easily done. > > - Solution > > This patch is a driver that exposes a monotonic incremental Virtual > Machine Generation u32 counter via a char-dev FS interface. The FS > interface provides sync and async VmGen counter updates notifications. > It also provides VmGen counter retrieval and confirmation mechanisms. > > The generation counter and the interface through which it is exposed > are available even when there is no acpi device present. > > When the device is present, the hw provided UUID is not exposed to > userspace, it is internally used by the driver to keep accounting for > the exposed VmGen counter. The counter starts from zero when the > driver is initialized and monotonically increments every time the hw > UUID changes (the VM generation changes). > On each hw UUID change, the new hypervisor-provided UUID is also fed > to the kernel RNG. > > If there is no acpi vmgenid device present, the generation changes are > not driven by hw vmgenid events but can be driven by software through > a dedicated driver ioctl. 
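
To make the counter semantics described above concrete, here is a minimal userspace sketch. It is not the driver code: struct vmgen_model and the vmgen_model_*() helpers are hypothetical names invented for illustration. It only models the stated rules: the counter starts at zero, is bumped when the observed UUID differs from the previous one, and can also be bumped by software when no hardware events exist.

```c
#include <assert.h>
#include <string.h>

#define VMGEN_UUID_LEN 16

/* Hypothetical userspace model; none of these names come from the patch. */
struct vmgen_model {
	unsigned char uuid[VMGEN_UUID_LEN]; /* last UUID seen from the "hardware" */
	unsigned int counter;               /* value that would be shown to userspace */
};

/* Record the initial UUID; the exposed counter starts from zero. */
static void vmgen_model_init(struct vmgen_model *m, const unsigned char *uuid)
{
	assert(m && uuid);
	memcpy(m->uuid, uuid, VMGEN_UUID_LEN);
	m->counter = 0;
}

/*
 * Model of a hardware notification: bump the counter only if the UUID
 * really changed. Returns 1 when a bump happened, 0 otherwise.
 */
static int vmgen_model_notify(struct vmgen_model *m, const unsigned char *uuid)
{
	assert(m && uuid);
	if (memcmp(m->uuid, uuid, VMGEN_UUID_LEN) == 0)
		return 0;
	memcpy(m->uuid, uuid, VMGEN_UUID_LEN);
	m->counter++;
	return 1;
}

/* Model of a software-driven bump, standing in for the dedicated ioctl. */
static void vmgen_model_force(struct vmgen_model *m)
{
	assert(m);
	m->counter++;
}
```

Note that the UUID itself never leaves the model; only the counter would be visible to callers, which matches the description in the cover letter.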
>
> This patch builds on top of Or Idgar's proposal
> https://lkml.org/lkml/2018/3/1/498
>
> - Future improvements
>
> Ideally we would want the driver to register itself based on devices'
> _CID and not _HID, but unfortunately I couldn't find a way to do that.
> The problem is that ACPI device matching is done by
> '__acpi_match_device()' which exclusively looks at
> 'acpi_hardware_id *hwid'.
>
> There is a path for platform devices to match on _CID when _HID is
> 'PRP0001' - but this is not the case for the Qemu vmgenid device.
>
> Guidance and help here would be greatly appreciated.
>
> Signed-off-by: Adrian Catangiu
>
> ---

Please put the history in descending order next time:

v2 -> v3:
...
v1 -> v2:
...

> v1 -> v2:
>
>   - expose to userspace a monotonically increasing u32 Vm Gen Counter
>     instead of the hw VmGen UUID
>   - since the hw/hypervisor-provided 128-bit UUID is not public
>     anymore, add it to the kernel RNG as device randomness
>   - insert driver page containing Vm Gen Counter in the user vma in
>     the driver's mmap handler instead of using a fault handler
>   - turn driver into a misc device driver to auto-create /dev/vmgenid
>   - change ioctl arg to avoid leaking kernel structs to userspace
>   - update documentation
>   - various nits
>   - rebase on top of linus latest
>
> v2 -> v3:
>
>   - separate the core driver logic and interface, from the ACPI device.
>     The ACPI vmgenid device is now one possible backend.
>   - fix issue when timeout=0 in VMGENID_WAIT_WATCHERS
>   - add locking to avoid races between fs ops handlers and hw irq
>     driven generation updates
>   - change VMGENID_WAIT_WATCHERS ioctl so if the current caller is
>     outdated or a generation change happens while waiting (thus making
>     current caller outdated), the ioctl returns -EINTR to signal the
>     user to handle event and retry. Fixes blocking on oneself.
>   - add VMGENID_FORCE_GEN_UPDATE ioctl conditioned by
>     CAP_CHECKPOINT_RESTORE capability, through which software can force
>     generation bump.
> ---
>  Documentation/virt/vmgenid.rst | 240 +++++++++++++++++++++++
>  drivers/virt/Kconfig           |  17 ++
>  drivers/virt/Makefile          |   1 +
>  drivers/virt/vmgenid.c         | 435 +++++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/vmgenid.h   |  14 ++
>  5 files changed, 707 insertions(+)
>  create mode 100644 Documentation/virt/vmgenid.rst
>  create mode 100644 drivers/virt/vmgenid.c
>  create mode 100644 include/uapi/linux/vmgenid.h
>
> diff --git a/Documentation/virt/vmgenid.rst b/Documentation/virt/vmgenid.rst
> new file mode 100644
> index 0000000..b6a9f8d
> --- /dev/null
> +++ b/Documentation/virt/vmgenid.rst
> @@ -0,0 +1,240 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +============
> +VMGENID
> +============

The "==" line should be the same length as the title, I think.

> +
> +The VM Generation ID is a feature defined by Microsoft (paper:
> +http://go.microsoft.com/fwlink/?LinkId=260709) and supported by
> +multiple hypervisor vendors.
> +
> +The feature is required in virtualized environments by apps that work

Please spell 'applications' fully

> +with local copies/caches of world-unique data such as random values,
> +uuids, monotonically increasing counters, etc.
   UUIDs

> +Such apps can be negatively affected by VM snapshotting when the VM
        ^applications

> +is either cloned or returned to an earlier point in time.
> +
> +The VM Generation ID is a simple concept meant to alleviate the issue
> +by providing a unique ID that changes each time the VM is restored
> +from a snapshot. The hw provided UUID value can be used to
                        ^hardware (and below as well)

> +differentiate between VMs or different generations of the same VM.
> +
> +The VM Generation ID is exposed through an ACPI device by multiple
> +hypervisor vendors. The driver for it lives at
> +``drivers/virt/vmgenid.c``
> +
> +The ``vmgenid`` driver exposes a monotonic incremental Virtual
> +Machine Generation u32 counter via a char-dev FS interface that
> +provides sync and async VmGen counter updates notifications. It also
> +provides VmGen counter retrieval and confirmation mechanisms.

It would be nice to mention the name of the chardev here :)

> +This counter and the interface through which it is exposed are
> +available even when there is no acpi device present.
> +
> +When the device is present, the hw provided UUID is not exposed to
> +userspace, it is internally used by the driver to keep accounting for
> +the exposed VmGen counter. The counter starts from zero when the
> +driver is initialized and monotonically increments every time the hw
> +UUID changes (the VM generation changes).
> +On each hw UUID change, the new UUID is also fed to the kernel RNG.
> +
> +If there is no acpi vmgenid device present, the generation changes are
> +not driven by hw vmgenid events and thus should be driven by software
> +through a dedicated driver ioctl.
> +
> +Driver interface:
> +
> +``open()``:
> +  When the device is opened, a copy of the current Vm-Gen-Id (counter)
> +  is associated with the open file descriptor. The driver now tracks
> +  this file as an independent *watcher*. The driver tracks how many
> +  watchers are aware of the latest Vm-Gen-Id counter and how many of
> +  them are *outdated*; outdated being those that have lived through
> +  a Vm-Gen-Id change but not yet confirmed the new generation counter.
> +
> +``read()``:
> +  Read is meant to provide the *new* VM generation counter when a
> +  generation change takes place. The read operation blocks until the
> +  associated counter is no longer up to date - until HW vm gen id
> +  changes - at which point the new counter is provided/returned.
> +  Nonblocking ``read()`` uses ``EAGAIN`` to signal that there is no
> +  *new* counter value available. 
The generation counter is considered > +  *new* for each open file descriptor that hasn't confirmed the new > +  value, following a generation change. Therefore, once a generation > +  change takes place, all ``read()`` calls will immediately return the > +  new generation counter and will continue to do so until the > +  new value is confirmed back to the driver through ``write()``. > +  Partial reads are not allowed - read buffer needs to be at least > +  ``sizeof(unsigned)`` in size. > + > +``write()``: > +  Write is used to confirm the up-to-date Vm Gen counter back to the > +  driver. > +  Following a VM generation change, all existing watchers are marked > +  as *outdated*. Each file descriptor will maintain the *outdated* > +  status until a ``write()`` confirms the up-to-date counter back to > +  the driver. > +  Partial writes are not allowed - write buffer should be exactly > +  ``sizeof(unsigned)`` in size. > + > +``poll()``: > +  Poll is implemented to allow polling for generation counter updates. > +  Such updates result in ``EPOLLIN`` polling status until the new > +  up-to-date counter is confirmed back to the driver through a > +  ``write()``. > + > +``ioctl()``: > +  The driver also adds support for tracking count of open file > +  descriptors that haven't acknowledged a generation counter update. > +  This is exposed through two IOCTLs: > + > +  - VMGENID_GET_OUTDATED_WATCHERS: immediately returns the number of > +    *outdated* watchers - number of file descriptors that were open > +    during a VM generation change, and which have not yet confirmed the > +    new generation counter. > +  - VMGENID_WAIT_WATCHERS: blocks until there are no more *outdated* > +    watchers, or if a ``timeout`` argument is provided, until the > +    timeout expires. 
> +    If the current caller is *outdated* or a generation change happens > +    while waiting (thus making current caller *outdated*), the ioctl > +    returns ``-EINTR`` to signal the user to handle event and retry. > +  - VMGENID_FORCE_GEN_UPDATE: forces a generation counter bump. Can only > +    be used by processes with CAP_CHECKPOINT_RESTORE or CAP_SYS_ADMIN > +    capabilities. > + > +``mmap()``: > +  The driver supports ``PROT_READ, MAP_SHARED`` mmaps of a single page > +  in size. The first 4 bytes of the mapped page will contain an > +  up-to-date copy of the VM generation counter. > +  The mapped memory can be used as a low-latency generation counter > +  probe mechanism in critical sections - see examples. > + > +``close()``: > +  Removes the file descriptor as a Vm generation counter watcher. > + > +Example application workflows > +----------------------------- > + > +1) Watchdog thread simplified example:: > + > +    void watchdog_thread_handler(int *thread_active) > +    { > +        unsigned genid; > +        int fd = open("/dev/vmgenid", O_RDWR | O_CLOEXEC, S_IRUSR | > S_IWUSR); > + > +        do { > +            // read new gen ID - blocks until VM generation changes > +            read(fd, &genid, sizeof(genid)); > + > +            // because of VM generation change, we need to rebuild world > +            reseed_app_env(); > + > +            // confirm we're done handling gen ID update > +            write(fd, &genid, sizeof(genid)); > +        } while (atomic_read(thread_active)); > + > +        close(fd); > +    } > + > +2) ASYNC simplified example:: > + > +    void handle_io_on_vmgenfd(int vmgenfd) > +    { > +        unsigned genid; > + > +        // read new gen ID - we need it to confirm we've handled update > +        read(fd, &genid, sizeof(genid)); > + > +        // because of VM generation change, we need to rebuild world > +        reseed_app_env(); > + > +        // confirm we're done handling the gen ID update > +        write(fd, 
&genid, sizeof(genid)); > +    } > + > +    int main() { > +        int epfd, vmgenfd; > +        struct epoll_event ev; > + > +        epfd = epoll_create(EPOLL_QUEUE_LEN); > + > +        vmgenfd = open("/dev/vmgenid", > +                       O_RDWR | O_CLOEXEC | O_NONBLOCK, > +                       S_IRUSR | S_IWUSR); > + > +        // register vmgenid for polling > +        ev.events = EPOLLIN; > +        ev.data.fd = vmgenfd; > +        epoll_ctl(epfd, EPOLL_CTL_ADD, vmgenfd, &ev); > + > +        // register other parts of your app for polling > +        // ... > + > +        while (1) { > +            // wait for something to do... > +            int nfds = epoll_wait(epfd, events, > +                MAX_EPOLL_EVENTS_PER_RUN, > +                EPOLL_RUN_TIMEOUT); > +            if (nfds < 0) die("Error in epoll_wait!"); > + > +            // for each ready fd > +            for(int i = 0; i < nfds; i++) { > +                int fd = events[i].data.fd; > + > +                if (fd == vmgenfd) > +                    handle_io_on_vmgenfd(vmgenfd); > +                else > +                    handle_some_other_part_of_the_app(fd); > +            } > +        } > + > +        return 0; > +    } > + > +3) Mapped memory polling simplified example:: > + > +    /* > +     * app/library function that provides cached secrets > +     */ > +    char * safe_cached_secret(app_data_t *app) > +    { > +        char *secret; > +        volatile unsigned *const genid_ptr = get_vmgenid_mapping(app); > +    again: > +        secret = __cached_secret(app); > + > +        if (unlikely(*genid_ptr != app->cached_genid)) { > +            // rebuild world then confirm the genid update (thru write) > +            rebuild_caches(app); > + > +            app->cached_genid = *genid_ptr; > +            ack_vmgenid_update(app); > + > +            goto again; > +        } > + > +        return secret; > +    } > + > +4) Orchestrator simplified example:: > + > +    /* > +     * 
orchestrator - manages multiple apps and libraries used by a service
> +     * and tries to make sure all sensitive components gracefully handle
> +     * VM generation changes.
> +     * Following function is called on detection of a VM generation change.
> +     */
> +    int handle_vmgen_update(int vmgen_fd, unsigned new_gen_id)
> +    {
> +        // pause until all components have handled event
> +        pause_service();
> +
> +        // confirm *this* watcher as up-to-date
> +        write(vmgen_fd, &new_gen_id, sizeof(unsigned));
> +
> +        // wait for all *others* for at most 5 seconds.
> +        ioctl(vmgen_fd, VMGENID_WAIT_WATCHERS, 5000);
> +
> +        // all apps on the system have rebuilt worlds
> +        resume_service();
> +    }
> diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig
> index 80c5f9c1..5d5f37b 100644
> --- a/drivers/virt/Kconfig
> +++ b/drivers/virt/Kconfig
> @@ -13,6 +13,23 @@ menuconfig VIRT_DRIVERS
>  
>  if VIRT_DRIVERS
>  
> +config VMGENID
> +    tristate "Virtual Machine Generation ID driver"
> +    depends on ACPI

I think this is not needed. We have /dev/vmgenid regardless of the ACPI
device for the container use case, and we may have a different HW
emulation for s390 and PowerPC.

> +    default N
> +    help
> +      This is a Virtual Machine Generation ID driver which provides
> +      a virtual machine generation counter. The driver exposes FS ops
> +      on /dev/vmgenid through which it can provide information and
> +      notifications on VM generation changes that happen on snapshots
> +      or cloning.
> +      This enables applications and libraries that store or cache
> +      sensitive information, to know that they need to regenerate it
> +      after process memory has been exposed to potential copying.
> +
> +      To compile this driver as a module, choose M here: the
> +      module will be called vmgenid.
> +
>  config FSL_HV_MANAGER
>      tristate "Freescale hypervisor management driver"
>      depends on FSL_SOC
> diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile
> index f28425c..889be01 100644
> --- a/drivers/virt/Makefile
> +++ b/drivers/virt/Makefile
> @@ -4,6 +4,7 @@
>  #
>  
>  obj-$(CONFIG_FSL_HV_MANAGER)    += fsl_hypervisor.o
> +obj-$(CONFIG_VMGENID)        += vmgenid.o
>  obj-y                += vboxguest/
>  
>  obj-$(CONFIG_NITRO_ENCLAVES)    += nitro_enclaves/
> diff --git a/drivers/virt/vmgenid.c b/drivers/virt/vmgenid.c
> new file mode 100644
> index 0000000..c4d4683
> --- /dev/null
> +++ b/drivers/virt/vmgenid.c
> @@ -0,0 +1,435 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Virtual Machine Generation ID driver
> + *
> + * Copyright (C) 2018 Red Hat Inc. All rights reserved.
> + *
> + * Copyright (C) 2020 Amazon. All rights reserved.
> + *
> + *    Authors:
> + *      Adrian Catangiu
> + *      Or Idgar
> + *      Gal Hammer
> + *
> + */
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#define DEV_NAME "vmgenid"
> +ACPI_MODULE_NAME(DEV_NAME);
> +
> +struct acpi_data {
> +    uuid_t uuid;
> +    void   *uuid_iomap;
> +};
> +
> +struct driver_data {

I'd suggest vmgenid_data

> +    unsigned long     map_buf;

We use tab=8 for indentation. Please run your patch through
scripts/checkpatch.pl to make sure it conforms to the coding style.

> +    wait_queue_head_t read_waitq;
> +    atomic_t          generation_counter;
> +
> +    unsigned int      watchers;
> +    atomic_t          outdated_watchers;
> +    wait_queue_head_t outdated_waitq;
> +    spinlock_t        lock;
> +
> +    struct acpi_data  *acpi_data;
> +};
> +struct driver_data driver_data;

static

> +
> +struct file_data {
> +    unsigned int acked_gen_counter;
> +};
> +
> +static int equals_gen_counter(unsigned int counter)
> +{
> +    return counter == atomic_read(&driver_data.generation_counter);
> +}
> +
> +static void vmgenid_bump_generation(void)
> +{
> +    unsigned long flags;
> +    int counter;
> +
> +    spin_lock_irqsave(&driver_data.lock, flags);
> +    counter = atomic_inc_return(&driver_data.generation_counter);
> +    *((int *) driver_data.map_buf) = counter;
> +    atomic_set(&driver_data.outdated_watchers, driver_data.watchers);
> +
> +    wake_up_interruptible(&driver_data.read_waitq);
> +    wake_up_interruptible(&driver_data.outdated_waitq);
> +    spin_unlock_irqrestore(&driver_data.lock, flags);
> +}
> +
> +static void vmgenid_put_outdated_watchers(void)
> +{
> +    if (atomic_dec_and_test(&driver_data.outdated_watchers))
> +        wake_up_interruptible(&driver_data.outdated_waitq);
> +}
> +
> +static int vmgenid_open(struct inode *inode, struct file *file)
> +{
> +    struct file_data *fdata = kzalloc(sizeof(struct file_data),
> GFP_KERNEL);
> +    unsigned long flags;
> +
> +    if (!fdata)
> +        return -ENOMEM;
> +
> +    spin_lock_irqsave(&driver_data.lock, flags);
> +    fdata->acked_gen_counter =
> atomic_read(&driver_data.generation_counter);
> +    ++driver_data.watchers;
> +    spin_unlock_irqrestore(&driver_data.lock, flags);
> +
> +    file->private_data = fdata;
> +
> +    return 0;
> +}
> +
> +static int vmgenid_close(struct inode *inode, struct file *file)
> +{
> +    struct file_data *fdata = file->private_data;
> +    unsigned long flags;
> +
> +    spin_lock_irqsave(&driver_data.lock, flags);
> +    if (!equals_gen_counter(fdata->acked_gen_counter))
> +        vmgenid_put_outdated_watchers();
> +    --driver_data.watchers;
> +    spin_unlock_irqrestore(&driver_data.lock, flags);
> +
> +    kfree(fdata);
> +
> +    return 0;
> +}
> +
> +static ssize_t
> +vmgenid_read(struct file *file, char __user *ubuf, size_t nbytes,

Please keep the function name on the same line as the return type and
wrap the parameters onto the next line.

> loff_t *ppos)
> +{
> +    struct file_data *fdata = file->private_data;
> +    ssize_t ret;
> +    int gen_counter;
> +
> +    if (nbytes == 0)
> +        return 0;
> +    /* disallow partial reads */
> +    if (nbytes < sizeof(gen_counter))
> +        return -EINVAL;
> +
> +    if (equals_gen_counter(fdata->acked_gen_counter)) {
> +        if (file->f_flags & O_NONBLOCK)
> +            return -EAGAIN;
> +        ret = wait_event_interruptible(
> +            driver_data.read_waitq,
> +            !equals_gen_counter(fdata->acked_gen_counter)
> +        );
> +        if (ret)
> +            return ret;
> +    }
> +
> +    gen_counter = atomic_read(&driver_data.generation_counter);
> +    ret = copy_to_user(ubuf, &gen_counter, sizeof(gen_counter));
> +    if (ret)
> +        return -EFAULT;
> +
> +    return sizeof(gen_counter);
> +}
> +
> +static ssize_t vmgenid_write(struct file *file, const char __user *ubuf,
> +                size_t count, loff_t *ppos)
> +{
> +    struct file_data *fdata = file->private_data;
> +    unsigned int new_acked_gen;
> +    unsigned long flags;
> +
> +    /* disallow partial writes */
> +    if (count != sizeof(new_acked_gen))
> +        return -EINVAL;
> +    if (copy_from_user(&new_acked_gen, ubuf, count))
> +        return -EFAULT;
> +
> +    spin_lock_irqsave(&driver_data.lock, flags);
> +    /* wrong gen-counter acknowledged */
> +    if (!equals_gen_counter(new_acked_gen)) {
> +        spin_unlock_irqrestore(&driver_data.lock, flags);
> +        return -EINVAL;
> +    }
> +    if 
(!equals_gen_counter(fdata->acked_gen_counter)) { > +        fdata->acked_gen_counter = new_acked_gen; > +        vmgenid_put_outdated_watchers(); > +    } > +    spin_unlock_irqrestore(&driver_data.lock, flags); > + > +    return (ssize_t)count; > +} > + > +static __poll_t > +vmgenid_poll(struct file *file, poll_table *wait) > +{ > +    __poll_t mask = 0; > +    struct file_data *fdata = file->private_data; > + > +    if (!equals_gen_counter(fdata->acked_gen_counter)) > +        return EPOLLIN | EPOLLRDNORM; > + > +    poll_wait(file, &driver_data.read_waitq, wait); > + > +    if (!equals_gen_counter(fdata->acked_gen_counter)) > +        mask = EPOLLIN | EPOLLRDNORM; > + > +    return mask; > +} > + > +static long vmgenid_ioctl(struct file *file, > +        unsigned int cmd, unsigned long arg) > +{ > +    struct file_data *fdata = file->private_data; > +    unsigned long timeout_ns; > +    ktime_t until; > +    int ret = 0; > + > +    switch (cmd) { > +    case VMGENID_GET_OUTDATED_WATCHERS: > +        ret = atomic_read(&driver_data.outdated_watchers); > +        break; > +    case VMGENID_WAIT_WATCHERS: > +        timeout_ns = arg * NSEC_PER_MSEC; > +        until = timeout_ns ? 
ktime_set(0, timeout_ns) : KTIME_MAX;
> +
> +        ret = wait_event_interruptible_hrtimeout(
> +            driver_data.outdated_waitq,
> +            (!atomic_read(&driver_data.outdated_watchers) ||
> +                    !equals_gen_counter(fdata->acked_gen_counter)),
> +            until
> +        );
> +        if (atomic_read(&driver_data.outdated_watchers))
> +            ret = -EINTR;
> +        else
> +            ret = 0;
> +        break;
> +    case VMGENID_FORCE_GEN_UPDATE:
> +        if (!checkpoint_restore_ns_capable(current_user_ns()))
> +            return -EACCES;
> +        vmgenid_bump_generation();
> +        break;
> +    default:
> +        ret = -EINVAL;
> +        break;
> +    }
> +    return ret;
> +}
> +
> +static int vmgenid_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> +    struct file_data *fdata = file->private_data;
> +
> +    if (vma->vm_pgoff != 0 || vma_pages(vma) > 1)
> +        return -EINVAL;
> +
> +    if ((vma->vm_flags & VM_WRITE) != 0)
> +        return -EPERM;
> +
> +    vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
> +    vma->vm_flags &= ~VM_MAYWRITE;
> +    vma->vm_private_data = fdata;
> +
> +    return vm_insert_page(vma, vma->vm_start,
> +                          virt_to_page(driver_data.map_buf));
> +}
> +
> +static const struct file_operations fops = {
> +    .owner          = THIS_MODULE,
> +    .mmap           = vmgenid_mmap,
> +    .open           = vmgenid_open,
> +    .release        = vmgenid_close,
> +    .read           = vmgenid_read,
> +    .write          = vmgenid_write,
> +    .poll           = vmgenid_poll,
> +    .unlocked_ioctl = vmgenid_ioctl,
> +};
> +
> +struct miscdevice vmgenid_misc = {

static

> +    .minor = MISC_DYNAMIC_MINOR,
> +    .name = "vmgenid",
> +    .fops = &fops,
> +};
> +
> +static int vmgenid_acpi_map(struct acpi_data *priv, acpi_handle handle)
> +{
> +    int i;
> +    phys_addr_t phys_addr;
> +    struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
> +   
 acpi_status status; > +    union acpi_object *pss; > +    union acpi_object *element; > + > +    status = acpi_evaluate_object(handle, "ADDR", NULL, &buffer); > +    if (ACPI_FAILURE(status)) { > +        ACPI_EXCEPTION((AE_INFO, status, "Evaluating ADDR")); > +        return -ENODEV; > +    } > +    pss = buffer.pointer; > +    if (!pss || pss->type != ACPI_TYPE_PACKAGE || pss->package.count != 2) > +        return -EINVAL; > + > +    phys_addr = 0; > +    for (i = 0; i < pss->package.count; i++) { > +        element = &(pss->package.elements[i]); > +        if (element->type != ACPI_TYPE_INTEGER) > +            return -EINVAL; > +        phys_addr |= element->integer.value << i * 32; > +    } > + > +    priv->uuid_iomap = acpi_os_map_memory(phys_addr, sizeof(uuid_t)); > +    if (!priv->uuid_iomap) { > +        pr_err("Could not map memory at 0x%llx, size %u\n", > +               phys_addr, > +               (u32) sizeof(uuid_t)); > +        return -ENOMEM; > +    } > + > +    memcpy_fromio(&priv->uuid, priv->uuid_iomap, sizeof(uuid_t)); > + > +    return 0; > +} > + > +static int vmgenid_acpi_add(struct acpi_device *device) > +{ > +    int ret; > + > +    if (!device) > +        return -EINVAL; > + > +    driver_data.acpi_data = kzalloc(sizeof(struct acpi_data), GFP_KERNEL); > +    if (!driver_data.acpi_data) { > +        pr_err("vmgenid: failed to allocate acpi_data\n"); > +        return -ENOMEM; > +    } > +    device->driver_data = &driver_data; > + > +    ret = vmgenid_acpi_map(driver_data.acpi_data, device->handle); > +    if (ret < 0) { > +        pr_err("vmgenid: failed to map acpi device\n"); > +        goto err; > +    } > + > +    return 0; > + > +err: > +    kfree(driver_data.acpi_data); > +    driver_data.acpi_data = NULL; > + > +    return ret; > +} > + > +static int vmgenid_acpi_remove(struct acpi_device *device) > +{ > +    struct acpi_data *priv; > + > +    if (!device || !acpi_driver_data(device)) > +        return -EINVAL; > + > +    
device->driver_data = NULL; > +    priv = driver_data.acpi_data; > +    driver_data.acpi_data = NULL; > + > +    if (priv && priv->uuid_iomap) > +        acpi_os_unmap_memory(priv->uuid_iomap, sizeof(uuid_t)); > +    kfree(priv); > + > +    return 0; > +} > + > +static void vmgenid_acpi_notify(struct acpi_device *device, u32 event) > +{ > +    struct acpi_data *priv; > +    uuid_t old_uuid; > + > +    if (!device || !acpi_driver_data(device)) { > +        pr_err("VMGENID notify with NULL private data\n"); > +        return; > +    } > +    priv = driver_data.acpi_data; > + > +    /* update VM Generation UUID */ > +    old_uuid = priv->uuid; > +    memcpy_fromio(&priv->uuid, priv->uuid_iomap, sizeof(uuid_t)); > + > +    if (memcmp(&old_uuid, &priv->uuid, sizeof(uuid_t))) { > +        /* HW uuid updated */ > +        vmgenid_bump_generation(); > +        add_device_randomness(&priv->uuid, sizeof(uuid_t)); > +    } > +} > + > +static const struct acpi_device_id vmgenid_ids[] = { > +    {"QEMUVGID", 0}, > +    {"", 0}, > +}; > + > +static struct acpi_driver acpi_vmgenid_driver = { > +    .name = "vm_generation_id", > +    .ids = vmgenid_ids, > +    .owner = THIS_MODULE, > +    .ops = { > +        .add = vmgenid_acpi_add, > +        .remove = vmgenid_acpi_remove, > +        .notify = vmgenid_acpi_notify, > +    } > +}; > + > +static int __init vmgenid_init(void) > +{ > +    int ret; > + > +    driver_data.map_buf = get_zeroed_page(GFP_KERNEL); > +    if (!driver_data.map_buf) > +        return -ENOMEM; > + > +    atomic_set(&driver_data.generation_counter, 0); > +    atomic_set(&driver_data.outdated_watchers, 0); > +    init_waitqueue_head(&driver_data.read_waitq); > +    init_waitqueue_head(&driver_data.outdated_waitq); > +    spin_lock_init(&driver_data.lock); > +    driver_data.acpi_data = NULL; > + > +    ret = misc_register(&vmgenid_misc); > +    if (ret < 0) { > +        pr_err("misc_register() failed for vmgenid\n"); > +        goto err; > +    } > + > +    ret = 
acpi_bus_register_driver(&acpi_vmgenid_driver);
> +    if (ret < 0)
> +        pr_warn("No vmgenid acpi device found\n");

I think this needs to be reworked to support a no-ACPI version. For
instance, we could call something like this here:

	ret = vmgenid_hw_register();

and have

#ifdef CONFIG_ACPI
static int vmgenid_hw_register(void)
{
	return acpi_bus_register_driver(&acpi_vmgenid_driver);
}
#else
static int vmgenid_hw_register(void)
{
	return 0;
}
#endif

> +
> +    return 0;
> +
> +err:
> +    free_pages(driver_data.map_buf, 0);
> +    driver_data.map_buf = 0;
> +
> +    return ret;
> +}
> +
> +static void __exit vmgenid_exit(void)
> +{
> +    acpi_bus_unregister_driver(&acpi_vmgenid_driver);
> +
> +    misc_deregister(&vmgenid_misc);
> +    free_pages(driver_data.map_buf, 0);
> +    driver_data.map_buf = 0;
> +}
> +
> +module_init(vmgenid_init);
> +module_exit(vmgenid_exit);
> +
> +MODULE_AUTHOR("Adrian Catangiu");
> +MODULE_DESCRIPTION("Virtual Machine Generation ID");
> +MODULE_LICENSE("GPL");
> +MODULE_VERSION("0.1");
> diff --git a/include/uapi/linux/vmgenid.h b/include/uapi/linux/vmgenid.h
> new file mode 100644
> index 0000000..9316b00
> --- /dev/null
> +++ b/include/uapi/linux/vmgenid.h
> @@ -0,0 +1,14 @@
> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> +
> +#ifndef _UAPI_LINUX_VMGENID_H
> +#define _UAPI_LINUX_VMGENID_H
> +
> +#include
> +
> +#define VMGENID_IOCTL 0x2d
> +#define VMGENID_GET_OUTDATED_WATCHERS _IO(VMGENID_IOCTL, 1)
> +#define VMGENID_WAIT_WATCHERS         _IO(VMGENID_IOCTL, 2)
> +#define VMGENID_FORCE_GEN_UPDATE      _IO(VMGENID_IOCTL, 3)
> +
> +#endif /* _UAPI_LINUX_VMGENID_H */
> +
> --
> 2.7.4
>

-- 
Sincerely yours,
Mike.
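
For anyone following the watcher accounting in the quoted driver (open, write, release and the generation bump), the bookkeeping can be modelled in plain userspace C. This is only a sketch under stated assumptions: struct vmgen_core, struct vmgen_watcher and the core_*() helpers are hypothetical names, and locking, wait queues and the mmap page are left out on purpose. It mirrors the rules visible in the patch: a bump marks every current watcher as outdated, an ack must carry the current generation, and an outdated watcher that closes is dropped from the outdated count.

```c
#include <assert.h>

/* Hypothetical userspace model of the driver's accounting; no locking. */
struct vmgen_core {
	unsigned int generation; /* current generation counter */
	unsigned int watchers;   /* number of open file descriptors */
	unsigned int outdated;   /* watchers that have not acked yet */
};

struct vmgen_watcher {
	unsigned int acked;      /* last generation this watcher confirmed */
};

/* open(): the new watcher starts out up to date. */
static void core_open(struct vmgen_core *c, struct vmgen_watcher *w)
{
	w->acked = c->generation;
	c->watchers++;
}

/* Generation change: every existing watcher becomes outdated. */
static void core_bump(struct vmgen_core *c)
{
	c->generation++;
	c->outdated = c->watchers;
}

/* write(): returns -1 if the caller confirms a stale generation. */
static int core_ack(struct vmgen_core *c, struct vmgen_watcher *w,
		    unsigned int gen)
{
	if (gen != c->generation)
		return -1;
	if (w->acked != c->generation) {
		w->acked = gen;
		assert(c->outdated > 0);
		c->outdated--;
	}
	return 0;
}

/* close(): an outdated watcher must not be waited for any longer. */
static void core_close(struct vmgen_core *c, struct vmgen_watcher *w)
{
	if (w->acked != c->generation) {
		assert(c->outdated > 0);
		c->outdated--;
	}
	assert(c->watchers > 0);
	c->watchers--;
}
```

In this model, a waiter in the style of VMGENID_WAIT_WATCHERS would simply wait for the outdated field to reach zero; repeating an ack for the same generation is harmless because the watcher is only counted once.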
+=EF=BF=BD=EF=BF=BD=EF=BF=BD struct file_data *fdata =3D file->private_da= ta; > + > +=EF=BF=BD=EF=BF=BD=EF=BF=BD if (vma->vm_pgoff !=3D 0 || vma_pages(vma) >= 1) > +=EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF=BD=EF=BF=BD return -EINVAL; > + > +=EF=BF=BD=EF=BF=BD=EF=BF=BD if ((vma->vm_flags & VM_WRITE) !=3D 0) > +=EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF=BD=EF=BF=BD return -EPERM; > + > +=EF=BF=BD=EF=BF=BD=EF=BF=BD vma->vm_flags |=3D VM_DONTEXPAND | VM_DONTDU= MP; > +=EF=BF=BD=EF=BF=BD=EF=BF=BD vma->vm_flags &=3D ~VM_MAYWRITE; > +=EF=BF=BD=EF=BF=BD=EF=BF=BD vma->vm_private_data =3D fdata; > + > +=EF=BF=BD=EF=BF=BD=EF=BF=BD return vm_insert_page(vma, vma->vm_start, > +=EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF= =BD=EF=BF=BD =EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF= =BD=EF=BF=BD=EF=BF=BD =EF=BF=BD virt_to_page(driver_data.map_buf)); > +} > + > +static const struct file_operations fops =3D { > +=EF=BF=BD=EF=BF=BD=EF=BF=BD .owner=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD= =EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD =3D THIS_MODULE, > +=EF=BF=BD=EF=BF=BD=EF=BF=BD .mmap=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF= =BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD =3D vmgenid_mmap, > +=EF=BF=BD=EF=BF=BD=EF=BF=BD .open=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF= =BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD =3D vmgenid_open, > +=EF=BF=BD=EF=BF=BD=EF=BF=BD .release=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD= =EF=BF=BD=EF=BF=BD=EF=BF=BD =3D vmgenid_close, > +=EF=BF=BD=EF=BF=BD=EF=BF=BD .read=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF= =BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD =3D vmgenid_read, > +=EF=BF=BD=EF=BF=BD=EF=BF=BD .write=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD= =EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD =3D vmgenid_write, > +=EF=BF=BD=EF=BF=BD=EF=BF=BD .poll=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF= =BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD =3D vmgenid_poll, > +=EF=BF=BD=EF=BF=BD=EF=BF=BD .unlocked_ioctl =3D 
vmgenid_ioctl, > +}; > + > +struct miscdevice vmgenid_misc =3D { static > +=EF=BF=BD=EF=BF=BD=EF=BF=BD .minor =3D MISC_DYNAMIC_MINOR, > +=EF=BF=BD=EF=BF=BD=EF=BF=BD .name =3D "vmgenid", > +=EF=BF=BD=EF=BF=BD=EF=BF=BD .fops =3D &fops, > +}; > + > +static int vmgenid_acpi_map(struct acpi_data *priv, acpi_handle handle) > +{ > +=EF=BF=BD=EF=BF=BD=EF=BF=BD int i; > +=EF=BF=BD=EF=BF=BD=EF=BF=BD phys_addr_t phys_addr; > +=EF=BF=BD=EF=BF=BD=EF=BF=BD struct acpi_buffer buffer =3D { ACPI_ALLOCAT= E_BUFFER, NULL }; > +=EF=BF=BD=EF=BF=BD=EF=BF=BD acpi_status status; > +=EF=BF=BD=EF=BF=BD=EF=BF=BD union acpi_object *pss; > +=EF=BF=BD=EF=BF=BD=EF=BF=BD union acpi_object *element; > + > +=EF=BF=BD=EF=BF=BD=EF=BF=BD status =3D acpi_evaluate_object(handle, "ADD= R", NULL, &buffer); > +=EF=BF=BD=EF=BF=BD=EF=BF=BD if (ACPI_FAILURE(status)) { > +=EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF=BD=EF=BF=BD ACPI_EXCEPTION((= AE_INFO, status, "Evaluating ADDR")); > +=EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF=BD=EF=BF=BD return -ENODEV; > +=EF=BF=BD=EF=BF=BD=EF=BF=BD } > +=EF=BF=BD=EF=BF=BD=EF=BF=BD pss =3D buffer.pointer; > +=EF=BF=BD=EF=BF=BD=EF=BF=BD if (!pss || pss->type !=3D ACPI_TYPE_PACKAGE= || pss->package.count !=3D 2) > +=EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF=BD=EF=BF=BD return -EINVAL; > + > +=EF=BF=BD=EF=BF=BD=EF=BF=BD phys_addr =3D 0; > +=EF=BF=BD=EF=BF=BD=EF=BF=BD for (i =3D 0; i < pss->package.count; i++) { > +=EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF=BD=EF=BF=BD element =3D &(ps= s->package.elements[i]); > +=EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF=BD=EF=BF=BD if (element->typ= e !=3D ACPI_TYPE_INTEGER) > +=EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF= =BD=EF=BF=BD return -EINVAL; > +=EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF=BD=EF=BF=BD phys_addr |=3D e= lement->integer.value << i * 32; > +=EF=BF=BD=EF=BF=BD=EF=BF=BD } > + > +=EF=BF=BD=EF=BF=BD=EF=BF=BD priv->uuid_iomap =3D acpi_os_map_memory(phys= _addr, sizeof(uuid_t)); > 
+=EF=BF=BD=EF=BF=BD=EF=BF=BD if (!priv->uuid_iomap) { > +=EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF=BD=EF=BF=BD pr_err("Could no= t map memory at 0x%llx, size %u\n", > +=EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF= =BD=EF=BF=BD =EF=BF=BD=EF=BF=BD phys_addr, > +=EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF= =BD=EF=BF=BD =EF=BF=BD=EF=BF=BD (u32) sizeof(uuid_t)); > +=EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF=BD=EF=BF=BD return -ENOMEM; > +=EF=BF=BD=EF=BF=BD=EF=BF=BD } > + > +=EF=BF=BD=EF=BF=BD=EF=BF=BD memcpy_fromio(&priv->uuid, priv->uuid_iomap,= sizeof(uuid_t)); > + > +=EF=BF=BD=EF=BF=BD=EF=BF=BD return 0; > +} > + > +static int vmgenid_acpi_add(struct acpi_device *device) > +{ > +=EF=BF=BD=EF=BF=BD=EF=BF=BD int ret; > + > +=EF=BF=BD=EF=BF=BD=EF=BF=BD if (!device) > +=EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF=BD=EF=BF=BD return -EINVAL; > + > +=EF=BF=BD=EF=BF=BD=EF=BF=BD driver_data.acpi_data =3D kzalloc(sizeof(str= uct acpi_data), GFP_KERNEL); > +=EF=BF=BD=EF=BF=BD=EF=BF=BD if (!driver_data.acpi_data) { > +=EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF=BD=EF=BF=BD pr_err("vmgenid:= failed to allocate acpi_data\n"); > +=EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF=BD=EF=BF=BD return -ENOMEM; > +=EF=BF=BD=EF=BF=BD=EF=BF=BD } > +=EF=BF=BD=EF=BF=BD=EF=BF=BD device->driver_data =3D &driver_data; > + > +=EF=BF=BD=EF=BF=BD=EF=BF=BD ret =3D vmgenid_acpi_map(driver_data.acpi_da= ta, device->handle); > +=EF=BF=BD=EF=BF=BD=EF=BF=BD if (ret < 0) { > +=EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF=BD=EF=BF=BD pr_err("vmgenid:= failed to map acpi device\n"); > +=EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF=BD=EF=BF=BD goto err; > +=EF=BF=BD=EF=BF=BD=EF=BF=BD } > + > +=EF=BF=BD=EF=BF=BD=EF=BF=BD return 0; > + > +err: > +=EF=BF=BD=EF=BF=BD=EF=BF=BD kfree(driver_data.acpi_data); > +=EF=BF=BD=EF=BF=BD=EF=BF=BD driver_data.acpi_data =3D NULL; > + > +=EF=BF=BD=EF=BF=BD=EF=BF=BD return ret; > +} > + > +static int vmgenid_acpi_remove(struct 
acpi_device *device) > +{ > +=EF=BF=BD=EF=BF=BD=EF=BF=BD struct acpi_data *priv; > + > +=EF=BF=BD=EF=BF=BD=EF=BF=BD if (!device || !acpi_driver_data(device)) > +=EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF=BD=EF=BF=BD return -EINVAL; > + > +=EF=BF=BD=EF=BF=BD=EF=BF=BD device->driver_data =3D NULL; > +=EF=BF=BD=EF=BF=BD=EF=BF=BD priv =3D driver_data.acpi_data; > +=EF=BF=BD=EF=BF=BD=EF=BF=BD driver_data.acpi_data =3D NULL; > + > +=EF=BF=BD=EF=BF=BD=EF=BF=BD if (priv && priv->uuid_iomap) > +=EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF=BD=EF=BF=BD acpi_os_unmap_me= mory(priv->uuid_iomap, sizeof(uuid_t)); > +=EF=BF=BD=EF=BF=BD=EF=BF=BD kfree(priv); > + > +=EF=BF=BD=EF=BF=BD=EF=BF=BD return 0; > +} > + > +static void vmgenid_acpi_notify(struct acpi_device *device, u32 event) > +{ > +=EF=BF=BD=EF=BF=BD=EF=BF=BD struct acpi_data *priv; > +=EF=BF=BD=EF=BF=BD=EF=BF=BD uuid_t old_uuid; > + > +=EF=BF=BD=EF=BF=BD=EF=BF=BD if (!device || !acpi_driver_data(device)) { > +=EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF=BD=EF=BF=BD pr_err("VMGENID = notify with NULL private data\n"); > +=EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF=BD=EF=BF=BD return; > +=EF=BF=BD=EF=BF=BD=EF=BF=BD } > +=EF=BF=BD=EF=BF=BD=EF=BF=BD priv =3D driver_data.acpi_data; > + > +=EF=BF=BD=EF=BF=BD=EF=BF=BD /* update VM Generation UUID */ > +=EF=BF=BD=EF=BF=BD=EF=BF=BD old_uuid =3D priv->uuid; > +=EF=BF=BD=EF=BF=BD=EF=BF=BD memcpy_fromio(&priv->uuid, priv->uuid_iomap,= sizeof(uuid_t)); > + > +=EF=BF=BD=EF=BF=BD=EF=BF=BD if (memcmp(&old_uuid, &priv->uuid, sizeof(uu= id_t))) { > +=EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF=BD=EF=BF=BD /* HW uuid updat= ed */ > +=EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF=BD=EF=BF=BD vmgenid_bump_gen= eration(); > +=EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF=BD=EF=BF=BD add_device_rando= mness(&priv->uuid, sizeof(uuid_t)); > +=EF=BF=BD=EF=BF=BD=EF=BF=BD } > +} > + > +static const struct acpi_device_id vmgenid_ids[] =3D { > +=EF=BF=BD=EF=BF=BD=EF=BF=BD {"QEMUVGID", 0}, > 
+=EF=BF=BD=EF=BF=BD=EF=BF=BD {"", 0}, > +}; > + > +static struct acpi_driver acpi_vmgenid_driver =3D { > +=EF=BF=BD=EF=BF=BD=EF=BF=BD .name =3D "vm_generation_id", > +=EF=BF=BD=EF=BF=BD=EF=BF=BD .ids =3D vmgenid_ids, > +=EF=BF=BD=EF=BF=BD=EF=BF=BD .owner =3D THIS_MODULE, > +=EF=BF=BD=EF=BF=BD=EF=BF=BD .ops =3D { > +=EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF=BD=EF=BF=BD .add =3D vmgenid= _acpi_add, > +=EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF=BD=EF=BF=BD .remove =3D vmge= nid_acpi_remove, > +=EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF=BD=EF=BF=BD .notify =3D vmge= nid_acpi_notify, > +=EF=BF=BD=EF=BF=BD=EF=BF=BD } > +}; > + > +static int __init vmgenid_init(void) > +{ > +=EF=BF=BD=EF=BF=BD=EF=BF=BD int ret; > + > +=EF=BF=BD=EF=BF=BD=EF=BF=BD driver_data.map_buf =3D get_zeroed_page(GFP_= KERNEL); > +=EF=BF=BD=EF=BF=BD=EF=BF=BD if (!driver_data.map_buf) > +=EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF=BD=EF=BF=BD return -ENOMEM; > + > +=EF=BF=BD=EF=BF=BD=EF=BF=BD atomic_set(&driver_data.generation_counter, = 0); > +=EF=BF=BD=EF=BF=BD=EF=BF=BD atomic_set(&driver_data.outdated_watchers, 0= ); > +=EF=BF=BD=EF=BF=BD=EF=BF=BD init_waitqueue_head(&driver_data.read_waitq); > +=EF=BF=BD=EF=BF=BD=EF=BF=BD init_waitqueue_head(&driver_data.outdated_wa= itq); > +=EF=BF=BD=EF=BF=BD=EF=BF=BD spin_lock_init(&driver_data.lock); > +=EF=BF=BD=EF=BF=BD=EF=BF=BD driver_data.acpi_data =3D NULL; > + > +=EF=BF=BD=EF=BF=BD=EF=BF=BD ret =3D misc_register(&vmgenid_misc); > +=EF=BF=BD=EF=BF=BD=EF=BF=BD if (ret < 0) { > +=EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF=BD=EF=BF=BD pr_err("misc_reg= ister() failed for vmgenid\n"); > +=EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF=BD=EF=BF=BD goto err; > +=EF=BF=BD=EF=BF=BD=EF=BF=BD } > + > +=EF=BF=BD=EF=BF=BD=EF=BF=BD ret =3D acpi_bus_register_driver(&acpi_vmgen= id_driver); > +=EF=BF=BD=EF=BF=BD=EF=BF=BD if (ret < 0) > +=EF=BF=BD=EF=BF=BD=EF=BF=BD =EF=BF=BD=EF=BF=BD=EF=BF=BD pr_warn("No vmge= nid acpi device found\n"); I think this needs to be reworked 
to support no-ACPI version. For instance we can call here something like ret =3D vmgenid_hw_register(); and have=20 #ifdef CONFIG_ACPI static int vmgenid_hw_register(void) { return acpi_bus_register_driver(&acpi_vmgenid_driver); } #else static int vmgenid_hw_register(void) { return 0; } #endif > + > +=EF=BF=BD=EF=BF=BD=EF=BF=BD return 0; > + > +err: > +=EF=BF=BD=EF=BF=BD=EF=BF=BD free_pages(driver_data.map_buf, 0); > +=EF=BF=BD=EF=BF=BD=EF=BF=BD driver_data.map_buf =3D 0; > + > +=EF=BF=BD=EF=BF=BD=EF=BF=BD return ret; > +} > + > +static void __exit vmgenid_exit(void) > +{ > +=EF=BF=BD=EF=BF=BD=EF=BF=BD acpi_bus_unregister_driver(&acpi_vmgenid_dri= ver); > + > +=EF=BF=BD=EF=BF=BD=EF=BF=BD misc_deregister(&vmgenid_misc); > +=EF=BF=BD=EF=BF=BD=EF=BF=BD free_pages(driver_data.map_buf, 0); > +=EF=BF=BD=EF=BF=BD=EF=BF=BD driver_data.map_buf =3D 0; > +} > + > +module_init(vmgenid_init); > +module_exit(vmgenid_exit); > + > +MODULE_AUTHOR("Adrian Catangiu"); > +MODULE_DESCRIPTION("Virtual Machine Generation ID"); > +MODULE_LICENSE("GPL"); > +MODULE_VERSION("0.1"); > diff --git a/include/uapi/linux/vmgenid.h b/include/uapi/linux/vmgenid.h > new file mode 100644 > index 0000000..9316b00 > --- /dev/null > +++ b/include/uapi/linux/vmgenid.h > @@ -0,0 +1,14 @@ > +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */ > + > +#ifndef _UAPI_LINUX_VMGENID_H > +#define _UAPI_LINUX_VMGENID_H > + > +#include > + > +#define VMGENID_IOCTL 0x2d > +#define VMGENID_GET_OUTDATED_WATCHERS _IO(VMGENID_IOCTL, 1) > +#define VMGENID_WAIT_WATCHERS=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF= =BD=EF=BF=BD=EF=BF=BD=EF=BF=BD _IO(VMGENID_IOCTL, 2) > +#define VMGENID_FORCE_GEN_UPDATE=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF=BF=BD=EF= =BF=BD _IO(VMGENID_IOCTL, 3) > + > +#endif /* _UAPI_LINUX_VMGENID_H */ > + > --=20 > 2.7.4 >=20 --=20 Sincerely yours, Mike. 
From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.8 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E6D15C63697 for ; Sat, 28 Nov 2020 10:17:46 +0000 (UTC) Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D82002227F for ; Sat, 28 Nov 2020 10:17:45 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=kernel.org header.i=@kernel.org header.b="yZdliu4g" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D82002227F Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Received: from localhost ([::1]:57720 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1kixIW-0008Cj-Fs for qemu-devel@archiver.kernel.org; Sat, 28 Nov 2020 05:17:44 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]:60054) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kixHK-0007YR-IH for qemu-devel@nongnu.org; Sat, 28 Nov 2020 05:16:30 -0500 Received: from mail.kernel.org ([198.145.29.99]:36494) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1kixHE-0008J9-IO for qemu-devel@nongnu.org; Sat, 28 Nov 2020 05:16:29 -0500 Received: from kernel.org (unknown 
[77.125.7.142]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 0F478222EB; Sat, 28 Nov 2020 10:16:08 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1606558581; bh=SXZ50O0tZ9SfgxO8cbDzArUsagGX8/+Yhh5eFFtIV3g=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=yZdliu4g5VM7aPv77CfA4Drg0lJGOysCszVmHduFA3ju2GDAiX1KFB119cnjPZb6B ynWOzkQsPzH2rVyUnuaqIq+iA1CduHiSQyl0AQKoH1yJV6NYTIldrCTHzas/4lMJMD 6j7DGWWd/IYAF/1hu4n5YcAlfspmE3DEfZxmv6+A= Date: Sat, 28 Nov 2020 12:16:04 +0200 From: Mike Rapoport To: "Catangiu, Adrian Costin" Subject: Re: [PATCH v3] drivers/virt: vmgenid: add vm generation id driver Message-ID: <20201128101604.GC557259@kernel.org> References: <3E05451B-A9CD-4719-99D0-72750A304044@amazon.com> <300d4404-3efe-880e-ef30-692eabbff5f7@de.ibm.com> <20201119173800.GD8537@kernel.org> <1cdb6fac-0d50-3399-74a6-24c119ebbaa5@amazon.de> <106f56ca-49bc-7cad-480f-4b26656e90ce@gmail.com> <96625ce2-66c6-34b8-ef81-7c17c05b4c7a@amazon.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <96625ce2-66c6-34b8-ef81-7c17c05b4c7a@amazon.com> Received-SPF: pass client-ip=198.145.29.99; envelope-from=rppt@kernel.org; helo=mail.kernel.org X-Spam_score_int: -70 X-Spam_score: -7.1 X-Spam_bar: ------- X-Spam_report: (-7.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_HI=-5, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: "asmehra@redhat.com" , "Jason A. 
Donenfeld" , "dgunigun@redhat.com" , KVM list , "open list:DOCUMENTATION" , "ghammer@redhat.com" , "vijaysun@ca.ibm.com" , Dmitry Safonov <0x7f454c46@gmail.com>, Qemu Developers , Michal Hocko , Andrey Vagin , Pavel Machek , Pavel Tikhomirov , linux-s390 , Jonathan Corbet , "mpe@ellerman.id.au" , "Michael S. Tsirkin" , Eric Biggers , Christian Borntraeger , "Singh, Balbir" , "bonzini@gnu.org" , Alexander Graf , Jann Horn , "Weiss, Radu" , "oridgar@gmail.com" , Andy Lutomirski , "gil@azul.com" , "MacCarthaigh, Colm" , "Theodore Y. Ts'o" , Greg Kroah-Hartman , "areber@redhat.com" , kernel list , Pavel Emelyanov , "Eric W. Biederman" , Linux API , "Rafael J. Wysocki" , Willy Tarreau , "Woodhouse, David"
Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org
Sender: "Qemu-devel"

Hi Adrian,

Usually each version of a patch is a separate e-mail thread

On Fri, Nov 27, 2020 at 08:26:02PM +0200, Catangiu, Adrian Costin wrote:
> - Background
>
> The VM Generation ID is a feature defined by Microsoft (paper:
> http://go.microsoft.com/fwlink/?LinkId=260709) and supported by
> multiple hypervisor vendors.
>
> The feature is required in virtualized environments by apps that work
> with local copies/caches of world-unique data such as random values,
> uuids, monotonically increasing counters, etc.
> Such apps can be negatively affected by VM snapshotting when the VM
> is either cloned or returned to an earlier point in time.
>
> The VM Generation ID is a simple concept meant to alleviate the issue
> by providing a unique ID that changes each time the VM is restored
> from a snapshot. The hw provided UUID value can be used to
> differentiate between VMs or different generations of the same VM.
>
> - Problem
>
> The VM Generation ID is exposed through an ACPI device by multiple
> hypervisor vendors but neither the vendors or upstream Linux have no
> default driver for it leaving users to fend for themselves.
>
> Furthermore, simply finding out about a VM generation change is only
> the starting point of a process to renew internal states of possibly
> multiple applications across the system. This process could benefit
> from a driver that provides an interface through which orchestration
> can be easily done.
>
> - Solution
>
> This patch is a driver that exposes a monotonic incremental Virtual
> Machine Generation u32 counter via a char-dev FS interface. The FS
> interface provides sync and async VmGen counter updates notifications.
> It also provides VmGen counter retrieval and confirmation mechanisms.
>
> The generation counter and the interface through which it is exposed
> are available even when there is no acpi device present.
>
> When the device is present, the hw provided UUID is not exposed to
> userspace, it is internally used by the driver to keep accounting for
> the exposed VmGen counter. The counter starts from zero when the
> driver is initialized and monotonically increments every time the hw
> UUID changes (the VM generation changes).
> On each hw UUID change, the new hypervisor-provided UUID is also fed
> to the kernel RNG.
>
> If there is no acpi vmgenid device present, the generation changes are
> not driven by hw vmgenid events but can be driven by software through
> a dedicated driver ioctl.
>
> This patch builds on top of Or Idgar's proposal
> https://lkml.org/lkml/2018/3/1/498
>
> - Future improvements
>
> Ideally we would want the driver to register itself based on devices'
> _CID and not _HID, but unfortunately I couldn't find a way to do that.
> The problem is that ACPI device matching is done by
> '__acpi_match_device()' which exclusively looks at
> 'acpi_hardware_id *hwid'.
>
> There is a path for platform devices to match on _CID when _HID is
> 'PRP0001' - but this is not the case for the Qemu vmgenid device.
>
> Guidance and help here would be greatly appreciated.
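
To make the watcher accounting from the Solution part above concrete,
here is a minimal userspace model of it. It is only a sketch under
stated assumptions: the names (gen_model, gen_watcher, model_open,
model_bump, model_ack, model_close) are made up for illustration and are
not the driver's, and the rule that a generation change marks every open
watcher as outdated is taken from the documentation text in this patch.

```c
#include <errno.h>

/* Hypothetical userspace model; names are illustrative, not the driver's. */
struct gen_model {
	unsigned int generation;        /* exposed generation counter, starts at 0 */
	unsigned int watchers;          /* number of open file descriptors */
	unsigned int outdated_watchers; /* watchers that have not acknowledged yet */
};

struct gen_watcher {
	unsigned int acked_gen;         /* last generation this watcher confirmed */
};

/* open(): the watcher starts out up to date with the current counter. */
static void model_open(struct gen_model *m, struct gen_watcher *w)
{
	w->acked_gen = m->generation;
	m->watchers++;
}

/* Generation change (hardware event or forced by software):
 * assumed to mark every currently open watcher as outdated. */
static void model_bump(struct gen_model *m)
{
	m->generation++;
	m->outdated_watchers = m->watchers;
}

/* write(): confirm the current counter; a stale value is rejected. */
static int model_ack(struct gen_model *m, struct gen_watcher *w,
		     unsigned int gen)
{
	if (gen != m->generation)
		return -EINVAL;
	if (w->acked_gen != m->generation) {
		w->acked_gen = gen;
		m->outdated_watchers--;
	}
	return 0;
}

/* close(): an outdated watcher stops counting as outdated when it goes away. */
static void model_close(struct gen_model *m, struct gen_watcher *w)
{
	if (w->acked_gen != m->generation)
		m->outdated_watchers--;
	m->watchers--;
}
```

With two watchers open, one model_bump() leaves both outdated;
acknowledging generation 1 from one of them and closing the other brings
outdated_watchers back to zero.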
>
> Signed-off-by: Adrian Catangiu
>
> ---

Please put the history in the descending order next time

v2 -> v3:
...
v1 -> v2:
...

> v1 -> v2:
>
>   - expose to userspace a monotonically increasing u32 Vm Gen Counter
>     instead of the hw VmGen UUID
>   - since the hw/hypervisor-provided 128-bit UUID is not public
>     anymore, add it to the kernel RNG as device randomness
>   - insert driver page containing Vm Gen Counter in the user vma in
>     the driver's mmap handler instead of using a fault handler
>   - turn driver into a misc device driver to auto-create /dev/vmgenid
>   - change ioctl arg to avoid leaking kernel structs to userspace
>   - update documentation
>   - various nits
>   - rebase on top of linus latest
>
> v2 -> v3:
>
>   - separate the core driver logic and interface, from the ACPI device.
>     The ACPI vmgenid device is now one possible backend.
>   - fix issue when timeout=0 in VMGENID_WAIT_WATCHERS
>   - add locking to avoid races between fs ops handlers and hw irq
>     driven generation updates
>   - change VMGENID_WAIT_WATCHERS ioctl so if the current caller is
>     outdated or a generation change happens while waiting (thus making
>     current caller outdated), the ioctl returns -EINTR to signal the
>     user to handle event and retry. Fixes blocking on oneself.
>   - add VMGENID_FORCE_GEN_UPDATE ioctl conditioned by
>     CAP_CHECKPOINT_RESTORE capability, through which software can force
>     generation bump.
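
For the timeout=0 item in the list above, a small sketch of how a
millisecond ioctl argument could be mapped to a wait deadline, assuming
(as in the ioctl handler of the patch) that 0 means waiting without a
timeout. The helper name wait_timeout_ns() is hypothetical, INT64_MAX
stands in for the kernel's KTIME_MAX, and the saturation on overflow is
an addition of this sketch, not something the patch does.

```c
#include <stdint.h>

#define NSEC_PER_MSEC 1000000ULL

/*
 * Hypothetical helper: turn a timeout in milliseconds into nanoseconds.
 * 0 means "no timeout" and maps to the largest representable value.
 */
static int64_t wait_timeout_ns(uint64_t timeout_ms)
{
	if (timeout_ms == 0)
		return INT64_MAX;       /* stands in for KTIME_MAX */
	if (timeout_ms > (uint64_t)INT64_MAX / NSEC_PER_MSEC)
		return INT64_MAX;       /* saturate instead of overflowing */
	return (int64_t)(timeout_ms * NSEC_PER_MSEC);
}
```

Doing the multiplication in 64 bits keeps the result well defined
regardless of the width of unsigned long on the target.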
> ---
>  Documentation/virt/vmgenid.rst | 240 +++++++++++++++++++++++
>  drivers/virt/Kconfig           |  17 ++
>  drivers/virt/Makefile          |   1 +
>  drivers/virt/vmgenid.c         | 435 +++++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/vmgenid.h   |  14 ++
>  5 files changed, 707 insertions(+)
>  create mode 100644 Documentation/virt/vmgenid.rst
>  create mode 100644 drivers/virt/vmgenid.c
>  create mode 100644 include/uapi/linux/vmgenid.h
>
> diff --git a/Documentation/virt/vmgenid.rst b/Documentation/virt/vmgenid.rst
> new file mode 100644
> index 0000000..b6a9f8d
> --- /dev/null
> +++ b/Documentation/virt/vmgenid.rst
> @@ -0,0 +1,240 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +============
> +VMGENID
> +============

The "==" line should be the same length as the title, I think.

> +
> +The VM Generation ID is a feature defined by Microsoft (paper:
> +http://go.microsoft.com/fwlink/?LinkId=260709) and supported by
> +multiple hypervisor vendors.
> +
> +The feature is required in virtualized environments by apps that work

Please spell 'applications' fully

> +with local copies/caches of world-unique data such as random values,
> +uuids, monotonically increasing counters, etc.

UUIDs

> +Such apps can be negatively affected by VM snapshotting when the VM

        ^applications

> +is either cloned or returned to an earlier point in time.
> +
> +The VM Generation ID is a simple concept meant to alleviate the issue
> +by providing a unique ID that changes each time the VM is restored
> +from a snapshot. The hw provided UUID value can be used to

                        ^hardware (and below as well)

> +differentiate between VMs or different generations of the same VM.
> +
> +The VM Generation ID is exposed through an ACPI device by multiple
> +hypervisor vendors. The driver for it lives at
> +``drivers/virt/vmgenid.c``
> +
> +The ``vmgenid`` driver exposes a monotonic incremental Virtual
> +Machine Generation u32 counter via a char-dev FS interface that
> +provides sync and async VmGen counter updates notifications. It also
> +provides VmGen counter retrieval and confirmation mechanisms.

It would be nice to mention here the name of the chardev :)

> +This counter and the interface through which it is exposed are
> +available even when there is no acpi device present.
> +
> +When the device is present, the hw provided UUID is not exposed to
> +userspace, it is internally used by the driver to keep accounting for
> +the exposed VmGen counter. The counter starts from zero when the
> +driver is initialized and monotonically increments every time the hw
> +UUID changes (the VM generation changes).
> +On each hw UUID change, the new UUID is also fed to the kernel RNG.
> +
> +If there is no acpi vmgenid device present, the generation changes are
> +not driven by hw vmgenid events and thus should be driven by software
> +through a dedicated driver ioctl.
> +
> +Driver interface:
> +
> +``open()``:
> +  When the device is opened, a copy of the current Vm-Gen-Id (counter)
> +  is associated with the open file descriptor. The driver now tracks
> +  this file as an independent *watcher*. The driver tracks how many
> +  watchers are aware of the latest Vm-Gen-Id counter and how many of
> +  them are *outdated*; outdated being those that have lived through
> +  a Vm-Gen-Id change but not yet confirmed the new generation counter.
> +
> +``read()``:
> +  Read is meant to provide the *new* VM generation counter when a
> +  generation change takes place. The read operation blocks until the
> +  associated counter is no longer up to date - until HW vm gen id
> +  changes - at which point the new counter is provided/returned.
> +  Nonblocking ``read()`` uses ``EAGAIN`` to signal that there is no
> +  *new* counter value available. The generation counter is considered
> +  *new* for each open file descriptor that hasn't confirmed the new
> +  value, following a generation change. Therefore, once a generation
> +  change takes place, all ``read()`` calls will immediately return the
> +  new generation counter and will continue to do so until the
> +  new value is confirmed back to the driver through ``write()``.
> +  Partial reads are not allowed - read buffer needs to be at least
> +  ``sizeof(unsigned)`` in size.
> +
> +``write()``:
> +  Write is used to confirm the up-to-date Vm Gen counter back to the
> +  driver.
> +  Following a VM generation change, all existing watchers are marked
> +  as *outdated*. Each file descriptor will maintain the *outdated*
> +  status until a ``write()`` confirms the up-to-date counter back to
> +  the driver.
> +  Partial writes are not allowed - write buffer should be exactly
> +  ``sizeof(unsigned)`` in size.
> +
> +``poll()``:
> +  Poll is implemented to allow polling for generation counter updates.
> +  Such updates result in ``EPOLLIN`` polling status until the new
> +  up-to-date counter is confirmed back to the driver through a
> +  ``write()``.
> +
> +``ioctl()``:
> +  The driver also adds support for tracking count of open file
> +  descriptors that haven't acknowledged a generation counter update.
> +  This is exposed through two IOCTLs:
> +
> +  - VMGENID_GET_OUTDATED_WATCHERS: immediately returns the number of
> +    *outdated* watchers - number of file descriptors that were open
> +    during a VM generation change, and which have not yet confirmed the
> +    new generation counter.
> +  - VMGENID_WAIT_WATCHERS: blocks until there are no more *outdated*
> +    watchers, or if a ``timeout`` argument is provided, until the
> +    timeout expires.
> +    If the current caller is *outdated* or a generation change happens > +    while waiting (thus making current caller *outdated*), the ioctl > +    returns ``-EINTR`` to signal the user to handle event and retry. > +  - VMGENID_FORCE_GEN_UPDATE: forces a generation counter bump. Can only > +    be used by processes with CAP_CHECKPOINT_RESTORE or CAP_SYS_ADMIN > +    capabilities. > + > +``mmap()``: > +  The driver supports ``PROT_READ, MAP_SHARED`` mmaps of a single page > +  in size. The first 4 bytes of the mapped page will contain an > +  up-to-date copy of the VM generation counter. > +  The mapped memory can be used as a low-latency generation counter > +  probe mechanism in critical sections - see examples. > + > +``close()``: > +  Removes the file descriptor as a Vm generation counter watcher. > + > +Example application workflows > +----------------------------- > + > +1) Watchdog thread simplified example:: > + > +    void watchdog_thread_handler(int *thread_active) > +    { > +        unsigned genid; > +        int fd = open("/dev/vmgenid", O_RDWR | O_CLOEXEC, S_IRUSR | > S_IWUSR); > + > +        do { > +            // read new gen ID - blocks until VM generation changes > +            read(fd, &genid, sizeof(genid)); > + > +            // because of VM generation change, we need to rebuild world > +            reseed_app_env(); > + > +            // confirm we're done handling gen ID update > +            write(fd, &genid, sizeof(genid)); > +        } while (atomic_read(thread_active)); > + > +        close(fd); > +    } > + > +2) ASYNC simplified example:: > + > +    void handle_io_on_vmgenfd(int vmgenfd) > +    { > +        unsigned genid; > + > +        // read new gen ID - we need it to confirm we've handled update > +        read(fd, &genid, sizeof(genid)); > + > +        // because of VM generation change, we need to rebuild world > +        reseed_app_env(); > + > +        // confirm we're done handling the gen ID update > +        write(fd, 
&genid, sizeof(genid));
> +    }
> +
> +    int main() {
> +        int epfd, vmgenfd;
> +        struct epoll_event ev;
> +
> +        epfd = epoll_create(EPOLL_QUEUE_LEN);
> +
> +        vmgenfd = open("/dev/vmgenid",
> +                       O_RDWR | O_CLOEXEC | O_NONBLOCK,
> +                       S_IRUSR | S_IWUSR);
> +
> +        // register vmgenid for polling
> +        ev.events = EPOLLIN;
> +        ev.data.fd = vmgenfd;
> +        epoll_ctl(epfd, EPOLL_CTL_ADD, vmgenfd, &ev);
> +
> +        // register other parts of your app for polling
> +        // ...
> +
> +        while (1) {
> +            // wait for something to do...
> +            int nfds = epoll_wait(epfd, events,
> +                MAX_EPOLL_EVENTS_PER_RUN,
> +                EPOLL_RUN_TIMEOUT);
> +            if (nfds < 0) die("Error in epoll_wait!");
> +
> +            // for each ready fd
> +            for(int i = 0; i < nfds; i++) {
> +                int fd = events[i].data.fd;
> +
> +                if (fd == vmgenfd)
> +                    handle_io_on_vmgenfd(vmgenfd);
> +                else
> +                    handle_some_other_part_of_the_app(fd);
> +            }
> +        }
> +
> +        return 0;
> +    }
> +
> +3) Mapped memory polling simplified example::
> +
> +    /*
> +     * app/library function that provides cached secrets
> +     */
> +    char * safe_cached_secret(app_data_t *app)
> +    {
> +        char *secret;
> +        volatile unsigned *const genid_ptr = get_vmgenid_mapping(app);
> +    again:
> +        secret = __cached_secret(app);
> +
> +        if (unlikely(*genid_ptr != app->cached_genid)) {
> +            // rebuild world then confirm the genid update (thru write)
> +            rebuild_caches(app);
> +
> +            app->cached_genid = *genid_ptr;
> +            ack_vmgenid_update(app);
> +
> +            goto again;
> +        }
> +
> +        return secret;
> +    }
> +
> +4) Orchestrator simplified example::
> +
> +    /*
> +     *
orchestrator - manages multiple apps and libraries used by a service
> +     * and tries to make sure all sensitive components gracefully handle
> +     * VM generation changes.
> +     * Following function is called on detection of a VM generation change.
> +     */
> +    int handle_vmgen_update(int vmgen_fd, unsigned new_gen_id)
> +    {
> +        // pause until all components have handled event
> +        pause_service();
> +
> +        // confirm *this* watcher as up-to-date
> +        write(vmgen_fd, &new_gen_id, sizeof(unsigned));
> +
> +        // wait for all *others* for at most 5 seconds.
> +        ioctl(vmgen_fd, VMGENID_WAIT_WATCHERS, 5000);
> +
> +        // all apps on the system have rebuilt worlds
> +        resume_service();
> +    }
> diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig
> index 80c5f9c1..5d5f37b 100644
> --- a/drivers/virt/Kconfig
> +++ b/drivers/virt/Kconfig
> @@ -13,6 +13,23 @@ menuconfig VIRT_DRIVERS
>
>  if VIRT_DRIVERS
>
> +config VMGENID
> +    tristate "Virtual Machine Generation ID driver"
> +    depends on ACPI

I think this is not needed. We have /dev/vmgenid regardless of the ACPI
device for the container use case, and we may have a different HW
emulation for s390 and PowerPC.

> +    default N
> +    help
> +      This is a Virtual Machine Generation ID driver which provides
> +      a virtual machine generation counter. The driver exposes FS ops
> +      on /dev/vmgenid through which it can provide information and
> +      notifications on VM generation changes that happen on snapshots
> +      or cloning.
> +      This enables applications and libraries that store or cache
> +      sensitive information, to know that they need to regenerate it
> +      after process memory has been exposed to potential copying.
> +
> +      To compile this driver as a module, choose M here: the
> +      module will be called vmgenid.
> +
>  config FSL_HV_MANAGER
>      tristate "Freescale hypervisor management driver"
>      depends on FSL_SOC
> diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile
> index f28425c..889be01 100644
> --- a/drivers/virt/Makefile
> +++ b/drivers/virt/Makefile
> @@ -4,6 +4,7 @@
>  #
>
>  obj-$(CONFIG_FSL_HV_MANAGER)    += fsl_hypervisor.o
> +obj-$(CONFIG_VMGENID)        += vmgenid.o
>  obj-y                += vboxguest/
>
>  obj-$(CONFIG_NITRO_ENCLAVES)    += nitro_enclaves/
> diff --git a/drivers/virt/vmgenid.c b/drivers/virt/vmgenid.c
> new file mode 100644
> index 0000000..c4d4683
> --- /dev/null
> +++ b/drivers/virt/vmgenid.c
> @@ -0,0 +1,435 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Virtual Machine Generation ID driver
> + *
> + * Copyright (C) 2018 Red Hat Inc. All rights reserved.
> + *
> + * Copyright (C) 2020 Amazon. All rights reserved.
> + *
> + *    Authors:
> + *      Adrian Catangiu
> + *      Or Idgar
> + *      Gal Hammer
> + *
> + */
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#define DEV_NAME "vmgenid"
> +ACPI_MODULE_NAME(DEV_NAME);
> +
> +struct acpi_data {
> +    uuid_t uuid;
> +    void   *uuid_iomap;
> +};
> +
> +struct driver_data {

I'd suggest naming this struct vmgenid_data.

> +    unsigned long     map_buf;

We use tab=8 for indentation. Please run your patch through
scripts/checkpatch.pl to make sure it conforms to the coding style.
> +    wait_queue_head_t read_waitq;
> +    atomic_t          generation_counter;
> +
> +    unsigned int      watchers;
> +    atomic_t          outdated_watchers;
> +    wait_queue_head_t outdated_waitq;
> +    spinlock_t        lock;
> +
> +    struct acpi_data  *acpi_data;
> +};
> +struct driver_data driver_data;

This should be static.

> +
> +struct file_data {
> +    unsigned int acked_gen_counter;
> +};
> +
> +static int equals_gen_counter(unsigned int counter)
> +{
> +    return counter == atomic_read(&driver_data.generation_counter);
> +}
> +
> +static void vmgenid_bump_generation(void)
> +{
> +    unsigned long flags;
> +    int counter;
> +
> +    spin_lock_irqsave(&driver_data.lock, flags);
> +    counter = atomic_inc_return(&driver_data.generation_counter);
> +    *((int *) driver_data.map_buf) = counter;
> +    atomic_set(&driver_data.outdated_watchers, driver_data.watchers);
> +
> +    wake_up_interruptible(&driver_data.read_waitq);
> +    wake_up_interruptible(&driver_data.outdated_waitq);
> +    spin_unlock_irqrestore(&driver_data.lock, flags);
> +}
> +
> +static void vmgenid_put_outdated_watchers(void)
> +{
> +    if (atomic_dec_and_test(&driver_data.outdated_watchers))
> +        wake_up_interruptible(&driver_data.outdated_waitq);
> +}
> +
> +static int vmgenid_open(struct inode *inode, struct file *file)
> +{
> +    struct file_data *fdata = kzalloc(sizeof(struct file_data),
> GFP_KERNEL);
> +    unsigned long flags;
> +
> +    if (!fdata)
> +        return -ENOMEM;
> +
> +    spin_lock_irqsave(&driver_data.lock, flags);
> +    fdata->acked_gen_counter =
> atomic_read(&driver_data.generation_counter);
> +    ++driver_data.watchers;
> +    spin_unlock_irqrestore(&driver_data.lock, flags);
> +
> +    file->private_data = fdata;
> +
> +    return 0;
> +}
> +
> +static int vmgenid_close(struct inode *inode, struct file *file)
> +{
> +    struct file_data *fdata = file->private_data;
> +    unsigned long flags;
> +
> +    spin_lock_irqsave(&driver_data.lock, flags);
> +    if (!equals_gen_counter(fdata->acked_gen_counter))
> +        vmgenid_put_outdated_watchers();
> +    --driver_data.watchers;
> +    spin_unlock_irqrestore(&driver_data.lock, flags);
> +
> +    kfree(fdata);
> +
> +    return 0;
> +}
> +
> +static ssize_t
> +vmgenid_read(struct file *file, char __user *ubuf, size_t nbytes,

Please keep the function name on the same line as the return type and
wrap the parameters to the next line.

> loff_t *ppos)
> +{
> +    struct file_data *fdata = file->private_data;
> +    ssize_t ret;
> +    int gen_counter;
> +
> +    if (nbytes == 0)
> +        return 0;
> +    /* disallow partial reads */
> +    if (nbytes < sizeof(gen_counter))
> +        return -EINVAL;
> +
> +    if (equals_gen_counter(fdata->acked_gen_counter)) {
> +        if (file->f_flags & O_NONBLOCK)
> +            return -EAGAIN;
> +        ret = wait_event_interruptible(
> +            driver_data.read_waitq,
> +            !equals_gen_counter(fdata->acked_gen_counter)
> +        );
> +        if (ret)
> +            return ret;
> +    }
> +
> +    gen_counter = atomic_read(&driver_data.generation_counter);
> +    ret = copy_to_user(ubuf, &gen_counter, sizeof(gen_counter));
> +    if (ret)
> +        return -EFAULT;
> +
> +    return sizeof(gen_counter);
> +}
> +
> +static ssize_t vmgenid_write(struct file *file, const char __user *ubuf,
> +                size_t count, loff_t *ppos)
> +{
> +    struct file_data *fdata = file->private_data;
> +    unsigned int new_acked_gen;
> +    unsigned long flags;
> +
> +    /* disallow partial writes */
> +    if (count != sizeof(new_acked_gen))
> +        return -EINVAL;
> +    if (copy_from_user(&new_acked_gen, ubuf, count))
> +        return -EFAULT;
> +
> +    spin_lock_irqsave(&driver_data.lock, flags);
> +    /* wrong gen-counter acknowledged */
> +    if (!equals_gen_counter(new_acked_gen)) {
> +        spin_unlock_irqrestore(&driver_data.lock, flags);
> +        return -EINVAL;
> +    }
> +    if
(!equals_gen_counter(fdata->acked_gen_counter)) {
> +        fdata->acked_gen_counter = new_acked_gen;
> +        vmgenid_put_outdated_watchers();
> +    }
> +    spin_unlock_irqrestore(&driver_data.lock, flags);
> +
> +    return (ssize_t)count;
> +}
> +
> +static __poll_t
> +vmgenid_poll(struct file *file, poll_table *wait)
> +{
> +    __poll_t mask = 0;
> +    struct file_data *fdata = file->private_data;
> +
> +    if (!equals_gen_counter(fdata->acked_gen_counter))
> +        return EPOLLIN | EPOLLRDNORM;
> +
> +    poll_wait(file, &driver_data.read_waitq, wait);
> +
> +    if (!equals_gen_counter(fdata->acked_gen_counter))
> +        mask = EPOLLIN | EPOLLRDNORM;
> +
> +    return mask;
> +}
> +
> +static long vmgenid_ioctl(struct file *file,
> +        unsigned int cmd, unsigned long arg)
> +{
> +    struct file_data *fdata = file->private_data;
> +    unsigned long timeout_ns;
> +    ktime_t until;
> +    int ret = 0;
> +
> +    switch (cmd) {
> +    case VMGENID_GET_OUTDATED_WATCHERS:
> +        ret = atomic_read(&driver_data.outdated_watchers);
> +        break;
> +    case VMGENID_WAIT_WATCHERS:
> +        timeout_ns = arg * NSEC_PER_MSEC;
> +        until = timeout_ns ?
ktime_set(0, timeout_ns) : KTIME_MAX;
> +
> +        ret = wait_event_interruptible_hrtimeout(
> +            driver_data.outdated_waitq,
> +            (!atomic_read(&driver_data.outdated_watchers) ||
> +                    !equals_gen_counter(fdata->acked_gen_counter)),
> +            until
> +        );
> +        if (atomic_read(&driver_data.outdated_watchers))
> +            ret = -EINTR;
> +        else
> +            ret = 0;
> +        break;
> +    case VMGENID_FORCE_GEN_UPDATE:
> +        if (!checkpoint_restore_ns_capable(current_user_ns()))
> +            return -EACCES;
> +        vmgenid_bump_generation();
> +        break;
> +    default:
> +        ret = -EINVAL;
> +        break;
> +    }
> +    return ret;
> +}
> +
> +static int vmgenid_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> +    struct file_data *fdata = file->private_data;
> +
> +    if (vma->vm_pgoff != 0 || vma_pages(vma) > 1)
> +        return -EINVAL;
> +
> +    if ((vma->vm_flags & VM_WRITE) != 0)
> +        return -EPERM;
> +
> +    vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
> +    vma->vm_flags &= ~VM_MAYWRITE;
> +    vma->vm_private_data = fdata;
> +
> +    return vm_insert_page(vma, vma->vm_start,
> +                          virt_to_page(driver_data.map_buf));
> +}
> +
> +static const struct file_operations fops = {
> +    .owner          = THIS_MODULE,
> +    .mmap           = vmgenid_mmap,
> +    .open           = vmgenid_open,
> +    .release        = vmgenid_close,
> +    .read           = vmgenid_read,
> +    .write          = vmgenid_write,
> +    .poll           = vmgenid_poll,
> +    .unlocked_ioctl = vmgenid_ioctl,
> +};
> +
> +struct miscdevice vmgenid_misc = {

This should also be static.

> +    .minor = MISC_DYNAMIC_MINOR,
> +    .name = "vmgenid",
> +    .fops = &fops,
> +};
> +
> +static int vmgenid_acpi_map(struct acpi_data *priv, acpi_handle handle)
> +{
> +    int i;
> +    phys_addr_t phys_addr;
> +    struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
> +   
 acpi_status status;
> +    union acpi_object *pss;
> +    union acpi_object *element;
> +
> +    status = acpi_evaluate_object(handle, "ADDR", NULL, &buffer);
> +    if (ACPI_FAILURE(status)) {
> +        ACPI_EXCEPTION((AE_INFO, status, "Evaluating ADDR"));
> +        return -ENODEV;
> +    }
> +    pss = buffer.pointer;
> +    if (!pss || pss->type != ACPI_TYPE_PACKAGE || pss->package.count != 2)
> +        return -EINVAL;
> +
> +    phys_addr = 0;
> +    for (i = 0; i < pss->package.count; i++) {
> +        element = &(pss->package.elements[i]);
> +        if (element->type != ACPI_TYPE_INTEGER)
> +            return -EINVAL;
> +        phys_addr |= element->integer.value << i * 32;
> +    }
> +
> +    priv->uuid_iomap = acpi_os_map_memory(phys_addr, sizeof(uuid_t));
> +    if (!priv->uuid_iomap) {
> +        pr_err("Could not map memory at 0x%llx, size %u\n",
> +               phys_addr,
> +               (u32) sizeof(uuid_t));
> +        return -ENOMEM;
> +    }
> +
> +    memcpy_fromio(&priv->uuid, priv->uuid_iomap, sizeof(uuid_t));
> +
> +    return 0;
> +}
> +
> +static int vmgenid_acpi_add(struct acpi_device *device)
> +{
> +    int ret;
> +
> +    if (!device)
> +        return -EINVAL;
> +
> +    driver_data.acpi_data = kzalloc(sizeof(struct acpi_data), GFP_KERNEL);
> +    if (!driver_data.acpi_data) {
> +        pr_err("vmgenid: failed to allocate acpi_data\n");
> +        return -ENOMEM;
> +    }
> +    device->driver_data = &driver_data;
> +
> +    ret = vmgenid_acpi_map(driver_data.acpi_data, device->handle);
> +    if (ret < 0) {
> +        pr_err("vmgenid: failed to map acpi device\n");
> +        goto err;
> +    }
> +
> +    return 0;
> +
> +err:
> +    kfree(driver_data.acpi_data);
> +    driver_data.acpi_data = NULL;
> +
> +    return ret;
> +}
> +
> +static int vmgenid_acpi_remove(struct acpi_device *device)
> +{
> +    struct acpi_data *priv;
> +
> +    if (!device || !acpi_driver_data(device))
> +        return -EINVAL;
> +
> +    
device->driver_data = NULL;
> +    priv = driver_data.acpi_data;
> +    driver_data.acpi_data = NULL;
> +
> +    if (priv && priv->uuid_iomap)
> +        acpi_os_unmap_memory(priv->uuid_iomap, sizeof(uuid_t));
> +    kfree(priv);
> +
> +    return 0;
> +}
> +
> +static void vmgenid_acpi_notify(struct acpi_device *device, u32 event)
> +{
> +    struct acpi_data *priv;
> +    uuid_t old_uuid;
> +
> +    if (!device || !acpi_driver_data(device)) {
> +        pr_err("VMGENID notify with NULL private data\n");
> +        return;
> +    }
> +    priv = driver_data.acpi_data;
> +
> +    /* update VM Generation UUID */
> +    old_uuid = priv->uuid;
> +    memcpy_fromio(&priv->uuid, priv->uuid_iomap, sizeof(uuid_t));
> +
> +    if (memcmp(&old_uuid, &priv->uuid, sizeof(uuid_t))) {
> +        /* HW uuid updated */
> +        vmgenid_bump_generation();
> +        add_device_randomness(&priv->uuid, sizeof(uuid_t));
> +    }
> +}
> +
> +static const struct acpi_device_id vmgenid_ids[] = {
> +    {"QEMUVGID", 0},
> +    {"", 0},
> +};
> +
> +static struct acpi_driver acpi_vmgenid_driver = {
> +    .name = "vm_generation_id",
> +    .ids = vmgenid_ids,
> +    .owner = THIS_MODULE,
> +    .ops = {
> +        .add = vmgenid_acpi_add,
> +        .remove = vmgenid_acpi_remove,
> +        .notify = vmgenid_acpi_notify,
> +    }
> +};
> +
> +static int __init vmgenid_init(void)
> +{
> +    int ret;
> +
> +    driver_data.map_buf = get_zeroed_page(GFP_KERNEL);
> +    if (!driver_data.map_buf)
> +        return -ENOMEM;
> +
> +    atomic_set(&driver_data.generation_counter, 0);
> +    atomic_set(&driver_data.outdated_watchers, 0);
> +    init_waitqueue_head(&driver_data.read_waitq);
> +    init_waitqueue_head(&driver_data.outdated_waitq);
> +    spin_lock_init(&driver_data.lock);
> +    driver_data.acpi_data = NULL;
> +
> +    ret = misc_register(&vmgenid_misc);
> +    if (ret < 0) {
> +        pr_err("misc_register() failed for vmgenid\n");
> +        goto err;
> +    }
> +
> +    ret =
acpi_bus_register_driver(&acpi_vmgenid_driver);
> +    if (ret < 0)
> +        pr_warn("No vmgenid acpi device found\n");

I think this needs to be reworked to support a no-ACPI version. For
instance, we can call here something like

	ret = vmgenid_hw_register();

and have

#ifdef CONFIG_ACPI
static int vmgenid_hw_register(void)
{
	return acpi_bus_register_driver(&acpi_vmgenid_driver);
}
#else
static int vmgenid_hw_register(void)
{
	return 0;
}
#endif

> +
> +    return 0;
> +
> +err:
> +    free_pages(driver_data.map_buf, 0);
> +    driver_data.map_buf = 0;
> +
> +    return ret;
> +}
> +
> +static void __exit vmgenid_exit(void)
> +{
> +    acpi_bus_unregister_driver(&acpi_vmgenid_driver);
> +
> +    misc_deregister(&vmgenid_misc);
> +    free_pages(driver_data.map_buf, 0);
> +    driver_data.map_buf = 0;
> +}
> +
> +module_init(vmgenid_init);
> +module_exit(vmgenid_exit);
> +
> +MODULE_AUTHOR("Adrian Catangiu");
> +MODULE_DESCRIPTION("Virtual Machine Generation ID");
> +MODULE_LICENSE("GPL");
> +MODULE_VERSION("0.1");
> diff --git a/include/uapi/linux/vmgenid.h b/include/uapi/linux/vmgenid.h
> new file mode 100644
> index 0000000..9316b00
> --- /dev/null
> +++ b/include/uapi/linux/vmgenid.h
> @@ -0,0 +1,14 @@
> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> +
> +#ifndef _UAPI_LINUX_VMGENID_H
> +#define _UAPI_LINUX_VMGENID_H
> +
> +#include
> +
> +#define VMGENID_IOCTL 0x2d
> +#define VMGENID_GET_OUTDATED_WATCHERS _IO(VMGENID_IOCTL, 1)
> +#define VMGENID_WAIT_WATCHERS         _IO(VMGENID_IOCTL, 2)
> +#define VMGENID_FORCE_GEN_UPDATE      _IO(VMGENID_IOCTL, 3)
> +
> +#endif /* _UAPI_LINUX_VMGENID_H */
> +
> --
> 2.7.4
>

--
Sincerely yours,
Mike.
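
P.S. In case it helps with the vmgenid_hw_register() idea, here is a
minimal userspace sketch of the same compile-time stub pattern. It is
only an illustration under assumptions, not proposed driver code:
HAVE_HW_BACKEND stands in for CONFIG_ACPI, hw_backend_register() stands
in for acpi_bus_register_driver(), and vmgenid_init_model() is a made-up
name for the init path. None of these three names exist in the patch.

```c
#include <stdio.h>

/*
 * Userspace model of the stub pattern, not kernel code.
 * Assumptions for this sketch: HAVE_HW_BACKEND plays the role of
 * CONFIG_ACPI and hw_backend_register() the role of
 * acpi_bus_register_driver().
 */
#ifdef HAVE_HW_BACKEND
static int hw_backend_register(void)
{
	/* pretend the hardware backend registered successfully */
	return 0;
}

static int vmgenid_hw_register(void)
{
	return hw_backend_register();
}
#else
/* No backend compiled in: nothing to register, so report success. */
static int vmgenid_hw_register(void)
{
	return 0;
}
#endif

/* The init path calls the helper unconditionally, with no #ifdef here. */
static int vmgenid_init_model(void)
{
	int ret;

	ret = vmgenid_hw_register();
	if (ret < 0)
		fprintf(stderr, "no vmgenid hw backend found\n");

	/* the device node stays usable even without a backend */
	return 0;
}
```

The point of the pattern is that the #ifdef lives in one place next to
the backend, and the init code reads the same with or without it.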