All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: qemu-devel@nongnu.org, Igor Mammedov <imammedo@redhat.com>,
	Xiao Guangrong <guangrong.xiao@gmail.com>,
	Stefan Hajnoczi <stefanha@gmail.com>,
	Dan Williams <dan.j.williams@intel.com>
Subject: Re: [Qemu-devel] [PATCH v2 2/4] nvdimm: warn if the backend is not a DAX device
Date: Mon, 12 Jun 2017 06:25:20 +0300	[thread overview]
Message-ID: <20170612062351-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <20170612031821.ycsasc2p6svil55a@hz-desktop>

On Mon, Jun 12, 2017 at 11:18:21AM +0800, Haozhong Zhang wrote:
> On 06/08/17 15:51 +0300, Michael S. Tsirkin wrote:
> > On Tue, Jun 06, 2017 at 03:22:27PM +0800, Haozhong Zhang wrote:
> > > Applications in Linux guest that use device-dax never trigger flush
> > > that can be trapped by KVM/QEMU. Meanwhile, if the host backend is not
> > > device-dax, QEMU cannot guarantee the persistence of guest writes.
> > > Before solving this flushing problem, QEMU should warn users if the
> > > host backend is not device-dax.
> > 
> > No one reads warnings unless things fail but they can be a debugging aid
> > if they do. But they have to be simple and robust to be helpful for
> > that. This one seems to me too complex and fragile.
> >
> > 
> > > Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
> > > Message-id: CAPcyv4hV2-ZW8SMCRtD0P_86KgR3DHOvNe+6T5SY2u7wXg3gEg@mail.gmail.com
> > 
> > Don't include this line in commit log please. Put it after --- if you
> > must.
> > 
> > > ---
> > >  hw/mem/nvdimm.c      |  6 ++++++
> > >  include/qemu/osdep.h |  9 ++++++++
> > >  util/osdep.c         | 61 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  3 files changed, 76 insertions(+)
> > > 
> > > diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
> > > index a9b0863f20..b23542fbdf 100644
> > > --- a/hw/mem/nvdimm.c
> > > +++ b/hw/mem/nvdimm.c
> > > @@ -26,6 +26,7 @@
> > >  #include "qapi/error.h"
> > >  #include "qapi/visitor.h"
> > >  #include "hw/mem/nvdimm.h"
> > > +#include "qemu/error-report.h"
> > >  
> > >  static void nvdimm_get_label_size(Object *obj, Visitor *v, const char *name,
> > >                                    void *opaque, Error **errp)
> > > @@ -84,6 +85,11 @@ static void nvdimm_realize(PCDIMMDevice *dimm, Error **errp)
> > >      NVDIMMDevice *nvdimm = NVDIMM(dimm);
> > >      uint64_t align, pmem_size, size = memory_region_size(mr);
> > >  
> > > +    if (!qemu_fd_is_dev_dax(memory_region_get_fd(mr))) {
> > > +        error_report("warning: nvdimm backend does not look like a DAX device, "
> > > +                     "unable to guarantee persistence of guest writes");
> > > +    }
> > > +
> > >      align = memory_region_get_alignment(mr);
> > >  
> > >      pmem_size = size - nvdimm->label_size;
> > > diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> > > index 1c9f5e260c..7f26af371e 100644
> > > --- a/include/qemu/osdep.h
> > > +++ b/include/qemu/osdep.h
> > > @@ -470,4 +470,13 @@ char *qemu_get_pid_name(pid_t pid);
> > >   */
> > >  pid_t qemu_fork(Error **errp);
> > >  
> > > +/**
> > > + * qemu_fd_is_dev_dax:
> > > + *
> > > + * Check whether @fd describes a DAX device.
> > > + *
> > > + * Returns true if it is; otherwise, return false.
> > > + */
> > > +bool qemu_fd_is_dev_dax(int fd);
> > > +
> > >  #endif
> > > diff --git a/util/osdep.c b/util/osdep.c
> > > index a2863c8e53..02881f96bc 100644
> > > --- a/util/osdep.c
> > > +++ b/util/osdep.c
> > > @@ -471,3 +471,64 @@ writev(int fd, const struct iovec *iov, int iov_cnt)
> > >      return readv_writev(fd, iov, iov_cnt, true);
> > >  }
> > >  #endif
> > > +
> > > +#ifdef __linux__
> > > +static ssize_t qemu_dev_dax_sysfs_read(int fd, const char *entry,
> > > +                                       char *buf, size_t len)
> > > +{
> > > +    ssize_t read_bytes;
> > > +    struct stat st;
> > > +    unsigned int major, minor;
> > > +    char *path, *pos;
> > > +    int sysfs_fd;
> > > +
> > > +    if (fstat(fd, &st)) {
> > > +        return 0;
> > > +    }
> > > +
> > > +    major = major(st.st_rdev);
> > > +    minor = minor(st.st_rdev);
> > > +    path = g_strdup_printf("/sys/dev/char/%u:%u/%s", major, minor, entry);
> > > +
> > > +    sysfs_fd = open(path, O_RDONLY);
> > > +    g_free(path);
> > > +    if (sysfs_fd == -1) {
> > > +        return 0;
> > > +    }
> > > +
> > > +    read_bytes = read(sysfs_fd, buf, len - 1);
> > > +    close(sysfs_fd);
> > > +    if (read_bytes > 0) {
> > > +        buf[read_bytes] = '\0';
> > > +        pos = g_strstr_len(buf, read_bytes, "\n");
> > > +        if (pos) {
> > > +            *pos = '\0';
> > > +        }
> > > +    }
> > > +
> > > +    return read_bytes;
> > > +}
> > > +#endif /* __linux__ */
> > > +
> > > +bool qemu_fd_is_dev_dax(int fd)
> > > +{
> > > +    bool is_dax = false;
> > > +
> > > +#ifdef __linux__
> > > +    char devtype[7];
> > > +    ssize_t len;
> > > +
> > > +    if (fd == -1) {
> > > +        return false;
> > > +    }
> > > +
> > > +    len = qemu_dev_dax_sysfs_read(fd, "device/devtype",
> > > +                                  devtype, sizeof(devtype));
> > > +    if (len <= 0) {
> > > +        return false;
> > > +    }
> > 
> > This can easily trigger false positives e.g. if sysfs access
> > is blocked by selinux.
> >
> 
> A part of userland interface of nvdimm is exposed via sysfs, so QEMU
> has to access to certain sysfs entries in order to, e.g. perform the
> DAX check in this patch and get alignment requirement in patch 4.
> 
> I agree it's not robust if the permission is not properly granted. We
> may either
> 1) require users to grant the permissions to QEMU and document the
>    requirements
>  or
> 2) get the information from sysfs from the outside of QEMU
>    (e.g. libvirt) which has the permission and then pass to QEMU.
> 
> Which one do you think is better?

2) since that allows emulating things without hardware dax.

> > 
> > > +    is_dax = !strncmp(devtype, "nd_dax", len);
> > 
> > E.g. will return true on any string starting with nd_dax.
> > And it's not clear non-dax devices can't guarantee safety.
> 
> This check can be done by checking whether /sys/dev/char/MAJ:MIN/subsystem
> points to /sys/class/dax, as Dan suggested, if we decide to access
> sysfs in QEMU.
> 
> Thanks,
> Haozhong
> 
> > 
> > > +#endif /* __linux__ */
> > > +
> > > +    return is_dax;
> > > +}
> > > -- 
> > > 2.11.0

  reply	other threads:[~2017-06-12  3:25 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-06  7:22 [Qemu-devel] [PATCH v2 0/4] nvdimm: fixes for (non-)dax backends Haozhong Zhang
2017-06-06  7:22 ` [Qemu-devel] [PATCH v2 1/4] nvdimm: add a macro for property "label-size" Haozhong Zhang
2017-06-07 15:29   ` Stefan Hajnoczi
2017-06-06  7:22 ` [Qemu-devel] [PATCH v2 2/4] nvdimm: warn if the backend is not a DAX device Haozhong Zhang
2017-06-06 17:59   ` Dan Williams
2017-06-08  1:07     ` Haozhong Zhang
2017-06-07 15:14   ` Stefan Hajnoczi
2017-06-08  1:07     ` Haozhong Zhang
2017-06-08 12:51   ` Michael S. Tsirkin
2017-06-12  3:18     ` Haozhong Zhang
2017-06-12  3:25       ` Michael S. Tsirkin [this message]
2017-06-06  7:22 ` [Qemu-devel] [PATCH v2 3/4] nvdimm: add a boolean option "restrict" Haozhong Zhang
2017-06-07 15:27   ` Stefan Hajnoczi
2017-06-08  1:45     ` Haozhong Zhang
2017-06-08 10:19       ` Stefan Hajnoczi
2017-06-08  6:39     ` Haozhong Zhang
2017-06-08 10:18       ` Stefan Hajnoczi
2017-06-08 12:56   ` Michael S. Tsirkin
2017-06-09  9:34     ` Stefan Hajnoczi
2017-06-12  1:18     ` Haozhong Zhang
2017-06-06  7:22 ` [Qemu-devel] [PATCH v2 4/4] util/mmap-alloc: account for DAX device in qemu_fd_getpagesize Haozhong Zhang
2017-06-07 15:29   ` Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170612062351-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=dan.j.williams@intel.com \
    --cc=guangrong.xiao@gmail.com \
    --cc=imammedo@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.