From: Pankaj Gupta <pagupta@redhat.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	qemu-devel@nongnu.org, linux-nvdimm@ml01.01.org,
	linux-mm@kvack.org, jack@suse.cz, stefanha@redhat.com,
	dan j williams <dan.j.williams@intel.com>,
	riel@redhat.com, haozhong zhang <haozhong.zhang@intel.com>,
	nilal@redhat.com, kwolf@redhat.com, pbonzini@redhat.com,
	ross zwisler <ross.zwisler@intel.com>,
	david@redhat.com,
	xiaoguangrong eric <xiaoguangrong.eric@gmail.com>
Subject: Re: [RFC 2/2] KVM: add virtio-pmem driver
Date: Mon, 16 Oct 2017 13:04:34 -0400 (EDT)
Message-ID: <1080174355.20804941.1508173474622.JavaMail.zimbra@redhat.com>
In-Reply-To: <20171016144753.GB14135@stefanha-x1.localdomain>


> 
> On Fri, Oct 13, 2017 at 06:48:15AM -0400, Pankaj Gupta wrote:
> > > On Thu, Oct 12, 2017 at 09:20:26PM +0530, Pankaj Gupta wrote:
> > > > +static blk_qc_t virtio_pmem_make_request(struct request_queue *q,
> > > > +			struct bio *bio)
> > > > +{
> > > > +	blk_status_t rc = 0;
> > > > +	struct bio_vec bvec;
> > > > +	struct bvec_iter iter;
> > > > +	struct virtio_pmem *pmem = q->queuedata;
> > > > +
> > > > +	if (bio->bi_opf & REQ_FLUSH)
> > > > +		//todo host flush command
> > > 
> > > This detail is critical to the device design.  What is the plan?
> > 
> > yes, this is good point.
> > 
> > was thinking of guest sending a flush command to Qemu which
> > will do a fsync on file fd.
> 
> Previously there was discussion about fsyncing a specific file range
> instead of the whole file.  This could perform better in cases where
> only a subset of dirty pages need to be flushed.

Yes, we discussed this and decided to flush the entire block device
rather than do range-level flushes.

> 
> One possibility is to design the virtio interface to communicate ranges
> but the emulation code simply fsyncs the fd for the time being.  Later
> on, if the necessary kernel and userspace interfaces are added, we can
> make use of the interface.
> 
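
To make the suggestion above concrete, here is a rough host-side sketch: the
request carries a range, but the handler ignores it and syncs the whole backing
file with fsync(2). The names struct vpmem_flush_req and vpmem_handle_flush are
hypothetical; they are not taken from this patch series or from QEMU, and the
16-byte layout is only an assumption for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical flush request layout (an assumption, not from the patch). */
struct vpmem_flush_req {
	uint64_t offset;	/* start of the dirty range, in bytes */
	uint64_t len;		/* length of the dirty range, in bytes */
};

/* Two 64-bit fields, so the assumed layout has no padding. */
static_assert(sizeof(struct vpmem_flush_req) == 16,
	      "flush request is expected to be 16 bytes");

/*
 * Hypothetical host-side handler. The range is accepted so the interface
 * does not have to change later, but for now it is ignored and the whole
 * backing file is synced.
 * Returns 0 on success or a negative errno value on failure.
 */
int vpmem_handle_flush(int backing_fd, const struct vpmem_flush_req *req)
{
	if (req == NULL)
		return -EINVAL;

	/* req->offset and req->len are deliberately unused for the time being. */
	if (fsync(backing_fd) != 0)
		return -errno;

	return 0;
}
```

If a range-aware sync with suitable durability guarantees became available
later, only the body of the handler would need to change; the guest-visible
request format would stay the same.
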
> > If we do a async flush and move the task to wait queue till we receive
> > flush complete reply from host we can allow other tasks to execute
> > in current cpu.
> > 
> > Any suggestions you have or anything I am not foreseeing here?
> 
> My main thought about this patch series is whether pmem should be a
> virtio-blk feature bit instead of a whole new device.  There is quite a
> bit of overlap between the two.

Exposing an existing virtio-blk device as a persistent memory range would,
at a high level, require the following additional features:

- Use of a persistent memory range, with an option to allocate the memmap array in the
  device itself.

- Block operations for DAX and persistent memory range.

- Bifurcation at filesystem level based on type of virtio-blk device selected.

- Bifurcation of flushing interface and communication channel between guest & host.

But yes, these features could be configured dynamically based on the type of device
added? What if we have an m:n ratio of virtio-blk to virtio-pmem devices, and what about the scale involved?

If I understand correctly, virtio-blk is a high-performance interface with multiqueue support
and additional host-side features such as data-plane mode. If we bloat it with additional
stuff (even when we need it) and add locking for the new features on both the guest and
host side, won't we take a performance hit? Also, as the requirements of both interfaces
grow, won't it become more difficult to maintain? I would prefer simpler interfaces with
well-defined functionality, but yes, common code can be shared through well-defined wrappers.

> 
> Stefan
> 
