* [PATCH V2 0/3] xen-blkfront/xen-blkback trim support @ 2011-08-18 9:34 Li Dongyang 2011-08-18 9:34 ` [PATCH V2 1/3] xen-blkfront: add BLKIF_OP_TRIM and backend type flags Li Dongyang ` (5 more replies) 0 siblings, 6 replies; 15+ messages in thread From: Li Dongyang @ 2011-08-18 9:34 UTC (permalink / raw) To: xen-devel; +Cc: owen.smith, JBeulich Hi, List Here is V2 of the trim support for blkfront/blkback, it's now rebased on Jeremy's next-3.1 tree and has some adjustments according to the comments from Jan Beulich, Thanks ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH V2 1/3] xen-blkfront: add BLKIF_OP_TRIM and backend type flags 2011-08-18 9:34 [PATCH V2 0/3] xen-blkfront/xen-blkback trim support Li Dongyang @ 2011-08-18 9:34 ` Li Dongyang 2011-08-20 0:38 ` Jeremy Fitzhardinge 2011-08-18 9:34 ` [PATCH V2 2/3] xen-blkfront: teach blkfront driver handle trim request Li Dongyang ` (4 subsequent siblings) 5 siblings, 1 reply; 15+ messages in thread From: Li Dongyang @ 2011-08-18 9:34 UTC (permalink / raw) To: xen-devel; +Cc: owen.smith, JBeulich This adds the BLKIF_OP_TRIM for blkfront and blkback, also 2 flags telling us the type of the backend, used in blkback to determine what to do when we see a trim request. Part of the patch is just taken from Owen Smith, Thanks Signed-off-by: Owen Smith <owen.smith@citrix.com> Signed-off-by: Li Dongyang <lidongyang@novell.com> --- include/xen/interface/io/blkif.h | 21 +++++++++++++++++++++ 1 files changed, 21 insertions(+), 0 deletions(-) diff --git a/include/xen/interface/io/blkif.h b/include/xen/interface/io/blkif.h index 3d5d6db..b92cf23 100644 --- a/include/xen/interface/io/blkif.h +++ b/include/xen/interface/io/blkif.h @@ -57,6 +57,19 @@ typedef uint64_t blkif_sector_t; * "feature-flush-cache" node! */ #define BLKIF_OP_FLUSH_DISKCACHE 3 + +/* + * Recognised only if "feature-trim" is present in backend xenbus info. + * The "feature-trim" node contains a boolean indicating whether barrier + * requests are likely to succeed or fail. Either way, a trim request + * may fail at any time with BLKIF_RSP_EOPNOTSUPP if it is unsupported by + * the underlying block-device hardware. The boolean simply indicates whether + * or not it is worthwhile for the frontend to attempt trim requests. + * If a backend does not recognise BLKIF_OP_TRIM, it should *not* + * create the "feature-trim" node! + */ +#define BLKIF_OP_TRIM 5 + /* * Maximum scatter/gather segments per request. * This is carefully chosen so that sizeof(struct blkif_ring) <= PAGE_SIZE. @@ -74,6 +87,11 @@ struct blkif_request_rw { } seg[BLKIF_MAX_SEGMENTS_PER_REQUEST]; }; +struct blkif_request_trim { + blkif_sector_t sector_number; + uint64_t nr_sectors; +}; + struct blkif_request { uint8_t operation; /* BLKIF_OP_??? */ uint8_t nr_segments; /* number of segments */ @@ -81,6 +99,7 @@ struct blkif_request { uint64_t id; /* private guest value, echoed in resp */ union { struct blkif_request_rw rw; + struct blkif_request_trim trim; } u; }; @@ -109,6 +128,8 @@ DEFINE_RING_TYPES(blkif, struct blkif_request, struct blkif_response); #define VDISK_CDROM 0x1 #define VDISK_REMOVABLE 0x2 #define VDISK_READONLY 0x4 +#define VDISK_FILE_BACKEND 0x8 +#define VDISK_PHY_BACKEND 0x10 /* Xen-defined major numbers for virtual disks, they look strangely * familiar */ -- 1.7.6 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH V2 1/3] xen-blkfront: add BLKIF_OP_TRIM and backend type flags 2011-08-18 9:34 ` [PATCH V2 1/3] xen-blkfront: add BLKIF_OP_TRIM and backend type flags Li Dongyang @ 2011-08-20 0:38 ` Jeremy Fitzhardinge 2011-08-22 9:36 ` Li Dongyang 0 siblings, 1 reply; 15+ messages in thread From: Jeremy Fitzhardinge @ 2011-08-20 0:38 UTC (permalink / raw) To: Li Dongyang; +Cc: xen-devel, owen.smith, JBeulich On 08/18/2011 02:34 AM, Li Dongyang wrote: > This adds the BLKIF_OP_TRIM for blkfront and blkback, also 2 flags telling > us the type of the backend, used in blkback to determine what to do when we > see a trim request. > Part of the patch is just taken from Owen Smith, Thanks > > Signed-off-by: Owen Smith <owen.smith@citrix.com> > Signed-off-by: Li Dongyang <lidongyang@novell.com> > --- > include/xen/interface/io/blkif.h | 21 +++++++++++++++++++++ > 1 files changed, 21 insertions(+), 0 deletions(-) > > diff --git a/include/xen/interface/io/blkif.h b/include/xen/interface/io/blkif.h > index 3d5d6db..b92cf23 100644 > --- a/include/xen/interface/io/blkif.h > +++ b/include/xen/interface/io/blkif.h > @@ -57,6 +57,19 @@ typedef uint64_t blkif_sector_t; > * "feature-flush-cache" node! > */ > #define BLKIF_OP_FLUSH_DISKCACHE 3 > + > +/* > + * Recognised only if "feature-trim" is present in backend xenbus info. > + * The "feature-trim" node contains a boolean indicating whether barrier > + * requests are likely to succeed or fail. Either way, a trim request Barrier requests? > + * may fail at any time with BLKIF_RSP_EOPNOTSUPP if it is unsupported by > + * the underlying block-device hardware. The boolean simply indicates whether > + * or not it is worthwhile for the frontend to attempt trim requests. > + * If a backend does not recognise BLKIF_OP_TRIM, it should *not* > + * create the "feature-trim" node! Is all this necessary? What happens if guests just send OP_TRIM requests, and if the host doesn't understand them then it will fails them with EOPNOTSUPP? Is a TRIM request ever anything more than a hint to the backend that certain blocks are no longer needed? > + */ > +#define BLKIF_OP_TRIM 5 > + > /* > * Maximum scatter/gather segments per request. > * This is carefully chosen so that sizeof(struct blkif_ring) <= PAGE_SIZE. > @@ -74,6 +87,11 @@ struct blkif_request_rw { > } seg[BLKIF_MAX_SEGMENTS_PER_REQUEST]; > }; > > +struct blkif_request_trim { > + blkif_sector_t sector_number; > + uint64_t nr_sectors; > +}; > + > struct blkif_request { > uint8_t operation; /* BLKIF_OP_??? */ > uint8_t nr_segments; /* number of segments */ > @@ -81,6 +99,7 @@ struct blkif_request { > uint64_t id; /* private guest value, echoed in resp */ > union { > struct blkif_request_rw rw; > + struct blkif_request_trim trim; > } u; > }; > > @@ -109,6 +128,8 @@ DEFINE_RING_TYPES(blkif, struct blkif_request, struct blkif_response); > #define VDISK_CDROM 0x1 > #define VDISK_REMOVABLE 0x2 > #define VDISK_READONLY 0x4 > +#define VDISK_FILE_BACKEND 0x8 > +#define VDISK_PHY_BACKEND 0x10 What are these for? Why does a frontend care about these? J ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH V2 1/3] xen-blkfront: add BLKIF_OP_TRIM and backend type flags 2011-08-20 0:38 ` Jeremy Fitzhardinge @ 2011-08-22 9:36 ` Li Dongyang 2011-08-22 17:44 ` Jeremy Fitzhardinge 0 siblings, 1 reply; 15+ messages in thread From: Li Dongyang @ 2011-08-22 9:36 UTC (permalink / raw) To: Jeremy Fitzhardinge; +Cc: xen-devel, owen.smith, JBeulich On Sat, Aug 20, 2011 at 8:38 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote: > On 08/18/2011 02:34 AM, Li Dongyang wrote: >> This adds the BLKIF_OP_TRIM for blkfront and blkback, also 2 flags telling >> us the type of the backend, used in blkback to determine what to do when we >> see a trim request. >> Part of the patch is just taken from Owen Smith, Thanks >> >> Signed-off-by: Owen Smith <owen.smith@citrix.com> >> Signed-off-by: Li Dongyang <lidongyang@novell.com> >> --- >> include/xen/interface/io/blkif.h | 21 +++++++++++++++++++++ >> 1 files changed, 21 insertions(+), 0 deletions(-) >> >> diff --git a/include/xen/interface/io/blkif.h b/include/xen/interface/io/blkif.h >> index 3d5d6db..b92cf23 100644 >> --- a/include/xen/interface/io/blkif.h >> +++ b/include/xen/interface/io/blkif.h >> @@ -57,6 +57,19 @@ typedef uint64_t blkif_sector_t; >> * "feature-flush-cache" node! >> */ >> #define BLKIF_OP_FLUSH_DISKCACHE 3 >> + >> +/* >> + * Recognised only if "feature-trim" is present in backend xenbus info. >> + * The "feature-trim" node contains a boolean indicating whether barrier >> + * requests are likely to succeed or fail. Either way, a trim request > > Barrier requests? hm, I wonder the same, seems it's a copy & paste mistake, the BLKIF_OP_TRIM part is taken from Owen's patch back in Jan 2011: http://lists.xensource.com/archives/html/xen-devel/2011-01/msg01059.html > >> + * may fail at any time with BLKIF_RSP_EOPNOTSUPP if it is unsupported by >> + * the underlying block-device hardware. The boolean simply indicates whether >> + * or not it is worthwhile for the frontend to attempt trim requests. >> + * If a backend does not recognise BLKIF_OP_TRIM, it should *not* >> + * create the "feature-trim" node! > > Is all this necessary? What happens if guests just send OP_TRIM > requests, and if the host doesn't understand them then it will fails > them with EOPNOTSUPP? Is a TRIM request ever anything more than a hint > to the backend that certain blocks are no longer needed? that won't happen: we only mark the queue in the guest has TRIM if blkback tells blkfront via xenstore. if we don't init the queue with TRIM in guest, if guest send OP_TRIM, it gonna fail with ENONOTSUPP in the guest's block layer, see blkdev_issue_discard. and yes, trim is just a hint, the basic idea is forward the hint to phy dev if it has trim, or punch a hole to reduce disk usage if the backend is a file. and this comment is taken from Owen, I think he could give sth here. > >> + */ >> +#define BLKIF_OP_TRIM 5 >> + >> /* >> * Maximum scatter/gather segments per request. >> * This is carefully chosen so that sizeof(struct blkif_ring) <= PAGE_SIZE. >> @@ -74,6 +87,11 @@ struct blkif_request_rw { >> } seg[BLKIF_MAX_SEGMENTS_PER_REQUEST]; >> }; >> >> +struct blkif_request_trim { >> + blkif_sector_t sector_number; >> + uint64_t nr_sectors; >> +}; >> + >> struct blkif_request { >> uint8_t operation; /* BLKIF_OP_??? */ >> uint8_t nr_segments; /* number of segments */ >> @@ -81,6 +99,7 @@ struct blkif_request { >> uint64_t id; /* private guest value, echoed in resp */ >> union { >> struct blkif_request_rw rw; >> + struct blkif_request_trim trim; >> } u; >> }; >> >> @@ -109,6 +128,8 @@ DEFINE_RING_TYPES(blkif, struct blkif_request, struct blkif_response); >> #define VDISK_CDROM 0x1 >> #define VDISK_REMOVABLE 0x2 >> #define VDISK_READONLY 0x4 >> +#define VDISK_FILE_BACKEND 0x8 >> +#define VDISK_PHY_BACKEND 0x10 > > What are these for? Why does a frontend care about these? they are used for the backend driver to decide what to do when we get a OP_TRIM, if it's phy then forward the trim, or punch a hole in the file, Jan also mentioned this is not good, gonna find another place for the flags, thanks > > J > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH V2 1/3] xen-blkfront: add BLKIF_OP_TRIM and backend type flags 2011-08-22 9:36 ` Li Dongyang @ 2011-08-22 17:44 ` Jeremy Fitzhardinge 0 siblings, 0 replies; 15+ messages in thread From: Jeremy Fitzhardinge @ 2011-08-22 17:44 UTC (permalink / raw) To: Li Dongyang; +Cc: xen-devel, owen.smith, JBeulich On 08/22/2011 02:36 AM, Li Dongyang wrote: > On Sat, Aug 20, 2011 at 8:38 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote: >> On 08/18/2011 02:34 AM, Li Dongyang wrote: >>> This adds the BLKIF_OP_TRIM for blkfront and blkback, also 2 flags telling >>> us the type of the backend, used in blkback to determine what to do when we >>> see a trim request. >>> Part of the patch is just taken from Owen Smith, Thanks >>> >>> Signed-off-by: Owen Smith <owen.smith@citrix.com> >>> Signed-off-by: Li Dongyang <lidongyang@novell.com> >>> --- >>> include/xen/interface/io/blkif.h | 21 +++++++++++++++++++++ >>> 1 files changed, 21 insertions(+), 0 deletions(-) >>> >>> diff --git a/include/xen/interface/io/blkif.h b/include/xen/interface/io/blkif.h >>> index 3d5d6db..b92cf23 100644 >>> --- a/include/xen/interface/io/blkif.h >>> +++ b/include/xen/interface/io/blkif.h >>> @@ -57,6 +57,19 @@ typedef uint64_t blkif_sector_t; >>> * "feature-flush-cache" node! >>> */ >>> #define BLKIF_OP_FLUSH_DISKCACHE 3 >>> + >>> +/* >>> + * Recognised only if "feature-trim" is present in backend xenbus info. >>> + * The "feature-trim" node contains a boolean indicating whether barrier >>> + * requests are likely to succeed or fail. Either way, a trim request >> Barrier requests? > hm, I wonder the same, seems it's a copy & paste mistake, > the BLKIF_OP_TRIM part is taken from Owen's patch back in Jan 2011: > http://lists.xensource.com/archives/html/xen-devel/2011-01/msg01059.html >>> + * may fail at any time with BLKIF_RSP_EOPNOTSUPP if it is unsupported by >>> + * the underlying block-device hardware. The boolean simply indicates whether >>> + * or not it is worthwhile for the frontend to attempt trim requests. >>> + * If a backend does not recognise BLKIF_OP_TRIM, it should *not* >>> + * create the "feature-trim" node! >> Is all this necessary? What happens if guests just send OP_TRIM >> requests, and if the host doesn't understand them then it will fails >> them with EOPNOTSUPP? Is a TRIM request ever anything more than a hint >> to the backend that certain blocks are no longer needed? > that won't happen: we only mark the queue in the guest has TRIM if > blkback tells blkfront > via xenstore. if we don't init the queue with TRIM in guest, if guest > send OP_TRIM, > it gonna fail with ENONOTSUPP in the guest's block layer, see > blkdev_issue_discard. > and yes, trim is just a hint, the basic idea is forward the hint to > phy dev if it has trim, or > punch a hole to reduce disk usage if the backend is a file. > and this comment is taken from Owen, I think he could give sth here. Right. So if this is just a hint, and there's no correctness implications for the frontend if the backend doesn't support trim, then the frontend should just start trying to use trim and then suppress them if they come back as failed. I guess if the frontend needs other information from the backend (about trim chunk size?) then this is moot, but this kind of feature negotiation via xenbus seems pretty flaky and it would be nice to avoid repeating the mistake. J ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH V2 2/3] xen-blkfront: teach blkfront driver handle trim request 2011-08-18 9:34 [PATCH V2 0/3] xen-blkfront/xen-blkback trim support Li Dongyang 2011-08-18 9:34 ` [PATCH V2 1/3] xen-blkfront: add BLKIF_OP_TRIM and backend type flags Li Dongyang @ 2011-08-18 9:34 ` Li Dongyang 2011-08-18 15:05 ` Konrad Rzeszutek Wilk 2011-08-18 9:34 ` [PATCH V2 3/3] xen-blkback: handle trim request in backend driver Li Dongyang ` (3 subsequent siblings) 5 siblings, 1 reply; 15+ messages in thread From: Li Dongyang @ 2011-08-18 9:34 UTC (permalink / raw) To: xen-devel; +Cc: owen.smith, JBeulich The blkfront driver now will read feature-trim from xenstore, and set up the request queue with trim params, then we can forward the discard requests to backend driver. Signed-off-by: Li Dongyang <lidongyang@novell.com> --- drivers/block/xen-blkfront.c | 96 ++++++++++++++++++++++++++++++++---------- 1 files changed, 73 insertions(+), 23 deletions(-) diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c index b536a9c..c22fb07 100644 --- a/drivers/block/xen-blkfront.c +++ b/drivers/block/xen-blkfront.c @@ -98,6 +98,9 @@ struct blkfront_info unsigned long shadow_free; unsigned int feature_flush; unsigned int flush_op; + unsigned int feature_trim; + unsigned int discard_granularity; + unsigned int discard_alignment; int is_ready; }; @@ -302,29 +305,36 @@ static int blkif_queue_request(struct request *req) ring_req->operation = info->flush_op; } - ring_req->nr_segments = blk_rq_map_sg(req->q, req, info->sg); - BUG_ON(ring_req->nr_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST); + if (unlikely(req->cmd_flags & REQ_DISCARD)) { + /* id, sector_number and handle are set above. */ + ring_req->operation = BLKIF_OP_TRIM; + ring_req->nr_segments = 0; + ring_req->u.trim.nr_sectors = blk_rq_sectors(req); + } else { + ring_req->nr_segments = blk_rq_map_sg(req->q, req, info->sg); + BUG_ON(ring_req->nr_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST); - for_each_sg(info->sg, sg, ring_req->nr_segments, i) { - buffer_mfn = pfn_to_mfn(page_to_pfn(sg_page(sg))); - fsect = sg->offset >> 9; - lsect = fsect + (sg->length >> 9) - 1; - /* install a grant reference. */ - ref = gnttab_claim_grant_reference(&gref_head); - BUG_ON(ref == -ENOSPC); + for_each_sg(info->sg, sg, ring_req->nr_segments, i) { + buffer_mfn = pfn_to_mfn(page_to_pfn(sg_page(sg))); + fsect = sg->offset >> 9; + lsect = fsect + (sg->length >> 9) - 1; + /* install a grant reference. */ + ref = gnttab_claim_grant_reference(&gref_head); + BUG_ON(ref == -ENOSPC); - gnttab_grant_foreign_access_ref( - ref, - info->xbdev->otherend_id, - buffer_mfn, - rq_data_dir(req) ); - - info->shadow[id].frame[i] = mfn_to_pfn(buffer_mfn); - ring_req->u.rw.seg[i] = - (struct blkif_request_segment) { - .gref = ref, - .first_sect = fsect, - .last_sect = lsect }; + gnttab_grant_foreign_access_ref( + ref, + info->xbdev->otherend_id, + buffer_mfn, + rq_data_dir(req)); + + info->shadow[id].frame[i] = mfn_to_pfn(buffer_mfn); + ring_req->u.rw.seg[i] = + (struct blkif_request_segment) { + .gref = ref, + .first_sect = fsect, + .last_sect = lsect }; + } } info->ring.req_prod_pvt++; @@ -399,6 +409,7 @@ wait: static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size) { struct request_queue *rq; + struct blkfront_info *info = gd->private_data; rq = blk_init_queue(do_blkif_request, &blkif_io_lock); if (rq == NULL) @@ -406,6 +417,13 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size) queue_flag_set_unlocked(QUEUE_FLAG_VIRT, rq); + if (info->feature_trim) { + queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, rq); + blk_queue_max_discard_sectors(rq, get_capacity(gd)); + rq->limits.discard_granularity = info->discard_granularity; + rq->limits.discard_alignment = info->discard_alignment; + } + /* Hard sector size and max sectors impersonate the equiv. hardware. */ blk_queue_logical_block_size(rq, sector_size); blk_queue_max_hw_sectors(rq, 512); @@ -722,6 +740,19 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id) error = (bret->status == BLKIF_RSP_OKAY) ? 0 : -EIO; switch (bret->operation) { + case BLKIF_OP_TRIM: + if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) { + struct request_queue *rq = info->rq; + printk(KERN_WARNING "blkfront: %s: trim op failed\n", + info->gd->disk_name); + error = -EOPNOTSUPP; + info->feature_trim = 0; + spin_lock_irq(rq->queue_lock); + queue_flag_clear(QUEUE_FLAG_DISCARD, rq); + spin_unlock_irq(rq->queue_lock); + } + __blk_end_request_all(req, error); + break; case BLKIF_OP_FLUSH_DISKCACHE: case BLKIF_OP_WRITE_BARRIER: if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) { @@ -1108,7 +1139,9 @@ static void blkfront_connect(struct blkfront_info *info) unsigned long sector_size; unsigned int binfo; int err; - int barrier, flush; + unsigned int discard_granularity; + unsigned int discard_alignment; + int barrier, flush, trim; switch (info->connected) { case BLKIF_STATE_CONNECTED: @@ -1178,7 +1211,24 @@ static void blkfront_connect(struct blkfront_info *info) info->feature_flush = REQ_FLUSH; info->flush_op = BLKIF_OP_FLUSH_DISKCACHE; } - + + err = xenbus_gather(XBT_NIL, info->xbdev->otherend, + "feature-trim", "%d", &trim, + NULL); + + if (!err && trim) { + info->feature_trim = 1; + err = xenbus_gather(XBT_NIL, info->xbdev->otherend, + "discard_granularity", "%u", &discard_granularity, + "discard_alignment", "%u", &discard_alignment, + NULL); + if (!err) { + info->discard_granularity = discard_granularity; + info->discard_alignment = discard_alignment; + } + } else + info->feature_trim = 0; + err = xlvbd_alloc_gendisk(sectors, info, binfo, sector_size); if (err) { xenbus_dev_fatal(info->xbdev, err, "xlvbd_add at %s", -- 1.7.6 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH V2 2/3] xen-blkfront: teach blkfront driver handle trim request 2011-08-18 9:34 ` [PATCH V2 2/3] xen-blkfront: teach blkfront driver handle trim request Li Dongyang @ 2011-08-18 15:05 ` Konrad Rzeszutek Wilk 0 siblings, 0 replies; 15+ messages in thread From: Konrad Rzeszutek Wilk @ 2011-08-18 15:05 UTC (permalink / raw) To: Li Dongyang; +Cc: xen-devel, owen.smith, JBeulich On Thu, Aug 18, 2011 at 05:34:30PM +0800, Li Dongyang wrote: > The blkfront driver now will read feature-trim from xenstore, > and set up the request queue with trim params, then we can forward the > discard requests to backend driver. > > Signed-off-by: Li Dongyang <lidongyang@novell.com> > --- > drivers/block/xen-blkfront.c | 96 ++++++++++++++++++++++++++++++++---------- > 1 files changed, 73 insertions(+), 23 deletions(-) > > diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c > index b536a9c..c22fb07 100644 > --- a/drivers/block/xen-blkfront.c > +++ b/drivers/block/xen-blkfront.c > @@ -98,6 +98,9 @@ struct blkfront_info > unsigned long shadow_free; > unsigned int feature_flush; > unsigned int flush_op; > + unsigned int feature_trim; > + unsigned int discard_granularity; > + unsigned int discard_alignment; > int is_ready; > }; > > @@ -302,29 +305,36 @@ static int blkif_queue_request(struct request *req) > ring_req->operation = info->flush_op; > } > > - ring_req->nr_segments = blk_rq_map_sg(req->q, req, info->sg); > - BUG_ON(ring_req->nr_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST); > + if (unlikely(req->cmd_flags & REQ_DISCARD)) { > + /* id, sector_number and handle are set above. */ > + ring_req->operation = BLKIF_OP_TRIM; > + ring_req->nr_segments = 0; > + ring_req->u.trim.nr_sectors = blk_rq_sectors(req); > + } else { > + ring_req->nr_segments = blk_rq_map_sg(req->q, req, info->sg); > + BUG_ON(ring_req->nr_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST); > > - for_each_sg(info->sg, sg, ring_req->nr_segments, i) { > - buffer_mfn = pfn_to_mfn(page_to_pfn(sg_page(sg))); > - fsect = sg->offset >> 9; > - lsect = fsect + (sg->length >> 9) - 1; > - /* install a grant reference. */ > - ref = gnttab_claim_grant_reference(&gref_head); > - BUG_ON(ref == -ENOSPC); > + for_each_sg(info->sg, sg, ring_req->nr_segments, i) { > + buffer_mfn = pfn_to_mfn(page_to_pfn(sg_page(sg))); > + fsect = sg->offset >> 9; > + lsect = fsect + (sg->length >> 9) - 1; > + /* install a grant reference. */ > + ref = gnttab_claim_grant_reference(&gref_head); > + BUG_ON(ref == -ENOSPC); > > - gnttab_grant_foreign_access_ref( > - ref, > - info->xbdev->otherend_id, > - buffer_mfn, > - rq_data_dir(req) ); > - > - info->shadow[id].frame[i] = mfn_to_pfn(buffer_mfn); > - ring_req->u.rw.seg[i] = > - (struct blkif_request_segment) { > - .gref = ref, > - .first_sect = fsect, > - .last_sect = lsect }; > + gnttab_grant_foreign_access_ref( > + ref, > + info->xbdev->otherend_id, > + buffer_mfn, > + rq_data_dir(req)); > + > + info->shadow[id].frame[i] = mfn_to_pfn(buffer_mfn); > + ring_req->u.rw.seg[i] = > + (struct blkif_request_segment) { > + .gref = ref, > + .first_sect = fsect, > + .last_sect = lsect }; > + } > } > > info->ring.req_prod_pvt++; > @@ -399,6 +409,7 @@ wait: > static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size) > { > struct request_queue *rq; > + struct blkfront_info *info = gd->private_data; > > rq = blk_init_queue(do_blkif_request, &blkif_io_lock); > if (rq == NULL) > @@ -406,6 +417,13 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size) > > queue_flag_set_unlocked(QUEUE_FLAG_VIRT, rq); > > + if (info->feature_trim) { > + queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, rq); > + blk_queue_max_discard_sectors(rq, get_capacity(gd)); > + rq->limits.discard_granularity = info->discard_granularity; > + rq->limits.discard_alignment = info->discard_alignment; > + } > + > /* Hard sector size and max sectors impersonate the equiv. hardware. */ > blk_queue_logical_block_size(rq, sector_size); > blk_queue_max_hw_sectors(rq, 512); > @@ -722,6 +740,19 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id) > > error = (bret->status == BLKIF_RSP_OKAY) ? 0 : -EIO; > switch (bret->operation) { > + case BLKIF_OP_TRIM: > + if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) { > + struct request_queue *rq = info->rq; > + printk(KERN_WARNING "blkfront: %s: trim op failed\n", > + info->gd->disk_name); > + error = -EOPNOTSUPP; > + info->feature_trim = 0; > + spin_lock_irq(rq->queue_lock); > + queue_flag_clear(QUEUE_FLAG_DISCARD, rq); > + spin_unlock_irq(rq->queue_lock); > + } > + __blk_end_request_all(req, error); > + break; > case BLKIF_OP_FLUSH_DISKCACHE: > case BLKIF_OP_WRITE_BARRIER: > if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) { > @@ -1108,7 +1139,9 @@ static void blkfront_connect(struct blkfront_info *info) > unsigned long sector_size; > unsigned int binfo; > int err; > - int barrier, flush; > + unsigned int discard_granularity; > + unsigned int discard_alignment; > + int barrier, flush, trim; > > switch (info->connected) { > case BLKIF_STATE_CONNECTED: > @@ -1178,7 +1211,24 @@ static void blkfront_connect(struct blkfront_info *info) > info->feature_flush = REQ_FLUSH; > info->flush_op = BLKIF_OP_FLUSH_DISKCACHE; > } > - > + > + err = xenbus_gather(XBT_NIL, info->xbdev->otherend, > + "feature-trim", "%d", &trim, > + NULL); > + > + if (!err && trim) { > + info->feature_trim = 1; > + err = xenbus_gather(XBT_NIL, info->xbdev->otherend, > + "discard_granularity", "%u", &discard_granularity, > + "discard_alignment", "%u", &discard_alignment, I am just not sure why the '_' was choosen. It seems so odds with the rest of what is put in XenBus. > + NULL); > + if (!err) { > + info->discard_granularity = discard_granularity; > + info->discard_alignment = discard_alignment; > + } > + } else > + info->feature_trim = 0; > + > err = xlvbd_alloc_gendisk(sectors, info, binfo, sector_size); > if (err) { > xenbus_dev_fatal(info->xbdev, err, "xlvbd_add at %s", > -- > 1.7.6 > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH V2 3/3] xen-blkback: handle trim request in backend driver 2011-08-18 9:34 [PATCH V2 0/3] xen-blkfront/xen-blkback trim support Li Dongyang 2011-08-18 9:34 ` [PATCH V2 1/3] xen-blkfront: add BLKIF_OP_TRIM and backend type flags Li Dongyang 2011-08-18 9:34 ` [PATCH V2 2/3] xen-blkfront: teach blkfront driver handle trim request Li Dongyang @ 2011-08-18 9:34 ` Li Dongyang 2011-08-18 14:56 ` Konrad Rzeszutek Wilk 2011-08-18 14:56 ` [PATCH V2 0/3] xen-blkfront/xen-blkback trim support Konrad Rzeszutek Wilk ` (2 subsequent siblings) 5 siblings, 1 reply; 15+ messages in thread From: Li Dongyang @ 2011-08-18 9:34 UTC (permalink / raw) To: xen-devel; +Cc: owen.smith, JBeulich Now blkback driver can handle the trim request from guest, we will forward the request to phy device if it really has trim support, or we'll punch a hole on the image file. Signed-off-by: Li Dongyang <lidongyang@novell.com> --- drivers/block/xen-blkback/blkback.c | 85 +++++++++++++++++++++++++++++------ drivers/block/xen-blkback/common.h | 4 +- drivers/block/xen-blkback/xenbus.c | 61 +++++++++++++++++++++++++ 3 files changed, 135 insertions(+), 15 deletions(-) diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c index 2330a9a..5acc37a 100644 --- a/drivers/block/xen-blkback/blkback.c +++ b/drivers/block/xen-blkback/blkback.c @@ -39,6 +39,9 @@ #include <linux/list.h> #include <linux/delay.h> #include <linux/freezer.h> +#include <linux/loop.h> +#include <linux/falloc.h> +#include <linux/fs.h> #include <xen/events.h> #include <xen/page.h> @@ -258,13 +261,16 @@ irqreturn_t xen_blkif_be_int(int irq, void *dev_id) static void print_stats(struct xen_blkif *blkif) { - pr_info("xen-blkback (%s): oo %3d | rd %4d | wr %4d | f %4d\n", + pr_info("xen-blkback (%s): oo %3d | rd %4d | wr %4d | f %4d" + " | tr %4d\n", current->comm, blkif->st_oo_req, - blkif->st_rd_req, blkif->st_wr_req, blkif->st_f_req); + blkif->st_rd_req, blkif->st_wr_req, + blkif->st_f_req, blkif->st_tr_req); blkif->st_print = jiffies + msecs_to_jiffies(10 * 1000); blkif->st_rd_req = 0; blkif->st_wr_req = 0; blkif->st_oo_req = 0; + blkif->st_tr_req = 0; } int xen_blkif_schedule(void *arg) @@ -563,6 +569,10 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif, blkif->st_f_req++; operation = WRITE_FLUSH; break; + case BLKIF_OP_TRIM: + blkif->st_tr_req++; + operation = REQ_DISCARD; + break; case BLKIF_OP_WRITE_BARRIER: default: operation = 0; /* make gcc happy */ @@ -572,7 +582,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif, /* Check that the number of segments is sane. */ nseg = req->nr_segments; - if (unlikely(nseg == 0 && operation != WRITE_FLUSH) || + if (unlikely(nseg == 0 && operation != (WRITE_FLUSH | REQ_DISCARD)) || unlikely(nseg > BLKIF_MAX_SEGMENTS_PER_REQUEST)) { pr_debug(DRV_PFX "Bad number of segments in request (%d)\n", nseg); @@ -627,10 +637,13 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif, * the hypercall to unmap the grants - that is all done in * xen_blkbk_unmap. */ - if (xen_blkbk_map(req, pending_req, seg)) + if (operation != BLKIF_OP_TRIM && xen_blkbk_map(req, pending_req, seg)) goto fail_flush; - /* This corresponding xen_blkif_put is done in __end_block_io_op */ + /* + * This corresponding xen_blkif_put is done in __end_block_io_op, or + * below if we are handling a BLKIF_OP_TRIM. + */ xen_blkif_get(blkif); for (i = 0; i < nseg; i++) { @@ -654,18 +667,62 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif, preq.sector_number += seg[i].nsec; } - /* This will be hit if the operation was a flush. */ + /* This will be hit if the operation was a flush or trim. */ if (!bio) { - BUG_ON(operation != WRITE_FLUSH); + BUG_ON(operation != (WRITE_FLUSH | REQ_DISCARD)); - bio = bio_alloc(GFP_KERNEL, 0); - if (unlikely(bio == NULL)) - goto fail_put_bio; + if (operation == WRITE_FLUSH) { + bio = bio_alloc(GFP_KERNEL, 0); + if (unlikely(bio == NULL)) + goto fail_put_bio; - biolist[nbio++] = bio; - bio->bi_bdev = preq.bdev; - bio->bi_private = pending_req; - bio->bi_end_io = end_block_io_op; + biolist[nbio++] = bio; + bio->bi_bdev = preq.bdev; + bio->bi_private = pending_req; + bio->bi_end_io = end_block_io_op; + } else if (operation == REQ_DISCARD) { + int err = 0; + int status = BLKIF_RSP_OKAY; + struct block_device *bdev = blkif->vbd.bdev; + + preq.nr_sects = req->u.trim.nr_sectors; + if (blkif->vbd.type & VDISK_PHY_BACKEND) + /* just forward the trim request */ + err = blkdev_issue_discard(bdev, + preq.sector_number, + preq.nr_sects, + GFP_KERNEL, 0); + else if (blkif->vbd.type & VDISK_FILE_BACKEND) { + /* punch a hole in the backing file */ + struct loop_device *lo = + bdev->bd_disk->private_data; + struct file *file = lo->lo_backing_file; + + if (file->f_op->fallocate) + err = file->f_op->fallocate(file, + FALLOC_FL_KEEP_SIZE | + FALLOC_FL_PUNCH_HOLE, + preq.sector_number << 9, + preq.nr_sects << 9); + else + err = -EOPNOTSUPP; + } else + status = BLKIF_RSP_EOPNOTSUPP; + + if (err == -EOPNOTSUPP) { + DPRINTK("blkback: discard op failed, " + "not supported\n"); + status = BLKIF_RSP_EOPNOTSUPP; + } else if (err) + status = BLKIF_RSP_ERROR; + + if (status == BLKIF_RSP_OKAY) + blkif->st_tr_sect += preq.nr_sects; + make_response(blkif, req->id, req->operation, status); + xen_blkif_put(blkif); + free_req(pending_req); + return 0; + } } /* diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h index 9e40b28..1fef727 100644 --- a/drivers/block/xen-blkback/common.h +++ b/drivers/block/xen-blkback/common.h @@ -159,8 +159,10 @@ struct xen_blkif { int st_wr_req; int st_oo_req; int st_f_req; + int st_tr_req; int st_rd_sect; int st_wr_sect; + int st_tr_sect; wait_queue_head_t waiting_to_free; @@ -182,7 +184,7 @@ struct xen_blkif { struct phys_req { unsigned short dev; - unsigned short nr_sects; + blkif_sector_t nr_sects; struct block_device *bdev; blkif_sector_t sector_number; }; diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c index 3f129b4..05ea8e0 100644 --- a/drivers/block/xen-blkback/xenbus.c +++ b/drivers/block/xen-blkback/xenbus.c @@ -272,16 +272,20 @@ VBD_SHOW(oo_req, "%d\n", be->blkif->st_oo_req); VBD_SHOW(rd_req, "%d\n", be->blkif->st_rd_req); VBD_SHOW(wr_req, "%d\n", be->blkif->st_wr_req); VBD_SHOW(f_req, "%d\n", be->blkif->st_f_req); +VBD_SHOW(tr_req, "%d\n", be->blkif->st_tr_req); VBD_SHOW(rd_sect, "%d\n", be->blkif->st_rd_sect); VBD_SHOW(wr_sect, "%d\n", be->blkif->st_wr_sect); +VBD_SHOW(tr_sect, "%d\n", be->blkif->st_tr_sect); static struct attribute *xen_vbdstat_attrs[] = { &dev_attr_oo_req.attr, &dev_attr_rd_req.attr, &dev_attr_wr_req.attr, &dev_attr_f_req.attr, + &dev_attr_tr_req.attr, &dev_attr_rd_sect.attr, &dev_attr_wr_sect.attr, + &dev_attr_tr_sect.attr, NULL }; @@ -419,6 +423,59 @@ int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt, return err; } +int xen_blkbk_trim(struct xenbus_transaction xbt, struct backend_info *be) +{ + struct xenbus_device *dev = be->dev; + struct xen_vbd *vbd = &be->blkif->vbd; + char *type; + int err; + int state = 0; + + type = xenbus_read(XBT_NIL, dev->nodename, "type", NULL); + if (!IS_ERR(type)) { + if (strcmp(type, "file") == 0) + state = 1; + vbd->type |= VDISK_FILE_BACKEND; + if (strcmp(type, "phy") == 0) { + struct block_device *bdev = be->blkif->vbd.bdev; + struct request_queue *q = bdev_get_queue(bdev); + if (blk_queue_discard(q)) { + err = xenbus_printf(xbt, dev->nodename, + "discard_granularity", "%u", + q->limits.discard_granularity); + if (err) { + xenbus_dev_fatal(dev, err, + "writing discard_granularity"); + goto kfree; + } + err = xenbus_printf(xbt, dev->nodename, + "discard_alignment", "%u", + q->limits.discard_alignment); + if (err) { + xenbus_dev_fatal(dev, err, + "writing discard_alignment"); + goto kfree; + } + state = 1; + vbd->type |= VDISK_PHY_BACKEND; + } + } + } else { + err = PTR_ERR(type); + xenbus_dev_fatal(dev, err, "reading type"); + goto out; + } + + err = xenbus_printf(xbt, dev->nodename, "feature-trim", + "%d", state); + if (err) + xenbus_dev_fatal(dev, err, "writing feature-trim"); +kfree: + kfree(type); +out: + return err; +} + /* * Entry point to this code when a new device is created. Allocate the basic * structures, and watch the store waiting for the hotplug scripts to tell us @@ -650,6 +707,10 @@ again: if (err) goto abort; + err = xen_blkbk_trim(xbt, be); + if (err) + goto abort; + err = xenbus_printf(xbt, dev->nodename, "sectors", "%llu", (unsigned long long)vbd_sz(&be->blkif->vbd)); if (err) { -- 1.7.6 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH V2 3/3] xen-blkback: handle trim request in backend driver 2011-08-18 9:34 ` [PATCH V2 3/3] xen-blkback: handle trim request in backend driver Li Dongyang @ 2011-08-18 14:56 ` Konrad Rzeszutek Wilk 2011-08-22 9:43 ` Li Dongyang 0 siblings, 1 reply; 15+ messages in thread From: Konrad Rzeszutek Wilk @ 2011-08-18 14:56 UTC (permalink / raw) To: Li Dongyang; +Cc: xen-devel, owen.smith, JBeulich On Thu, Aug 18, 2011 at 05:34:31PM +0800, Li Dongyang wrote: > Now blkback driver can handle the trim request from guest, we will > forward the request to phy device if it really has trim support, or we'll > punch a hole on the image file. > > Signed-off-by: Li Dongyang <lidongyang@novell.com> > --- > drivers/block/xen-blkback/blkback.c | 85 +++++++++++++++++++++++++++++------ > drivers/block/xen-blkback/common.h | 4 +- > drivers/block/xen-blkback/xenbus.c | 61 +++++++++++++++++++++++++ > 3 files changed, 135 insertions(+), 15 deletions(-) > > diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c > index 2330a9a..5acc37a 100644 > --- a/drivers/block/xen-blkback/blkback.c > +++ b/drivers/block/xen-blkback/blkback.c > @@ -39,6 +39,9 @@ > #include <linux/list.h> > #include <linux/delay.h> > #include <linux/freezer.h> > +#include <linux/loop.h> > +#include <linux/falloc.h> > +#include <linux/fs.h> > > #include <xen/events.h> > #include <xen/page.h> > @@ -258,13 +261,16 @@ irqreturn_t xen_blkif_be_int(int irq, void *dev_id) > > static void print_stats(struct xen_blkif *blkif) > { > - pr_info("xen-blkback (%s): oo %3d | rd %4d | wr %4d | f %4d\n", > + pr_info("xen-blkback (%s): oo %3d | rd %4d | wr %4d | f %4d" > + " | tr %4d\n", > current->comm, blkif->st_oo_req, > - blkif->st_rd_req, blkif->st_wr_req, blkif->st_f_req); > + blkif->st_rd_req, blkif->st_wr_req, > + blkif->st_f_req, blkif->st_tr_req); > blkif->st_print = jiffies + msecs_to_jiffies(10 * 1000); > blkif->st_rd_req = 0; > blkif->st_wr_req = 0; > blkif->st_oo_req = 0; > + blkif->st_tr_req = 0; > } > > int xen_blkif_schedule(void *arg) > @@ -563,6 +569,10 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif, > blkif->st_f_req++; > operation = WRITE_FLUSH; > break; > + case BLKIF_OP_TRIM: > + blkif->st_tr_req++; > + operation = REQ_DISCARD; > + break; > case BLKIF_OP_WRITE_BARRIER: > default: > operation = 0; /* make gcc happy */ > @@ -572,7 +582,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif, > > /* Check that the number of segments is sane. */ > nseg = req->nr_segments; > - if (unlikely(nseg == 0 && operation != WRITE_FLUSH) || > + if (unlikely(nseg == 0 && operation != (WRITE_FLUSH | REQ_DISCARD)) || > unlikely(nseg > BLKIF_MAX_SEGMENTS_PER_REQUEST)) { > pr_debug(DRV_PFX "Bad number of segments in request (%d)\n", > nseg); > @@ -627,10 +637,13 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif, > * the hypercall to unmap the grants - that is all done in > * xen_blkbk_unmap. > */ > - if (xen_blkbk_map(req, pending_req, seg)) > + if (operation != BLKIF_OP_TRIM && xen_blkbk_map(req, pending_req, seg)) > goto fail_flush; > > - /* This corresponding xen_blkif_put is done in __end_block_io_op */ > + /* > + * This corresponding xen_blkif_put is done in __end_block_io_op, or > + * below if we are handling a BLKIF_OP_TRIM. > + */ > xen_blkif_get(blkif); > > for (i = 0; i < nseg; i++) { > @@ -654,18 +667,62 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif, > preq.sector_number += seg[i].nsec; > } > > - /* This will be hit if the operation was a flush. */ > + /* This will be hit if the operation was a flush or trim. */ > if (!bio) { > - BUG_ON(operation != WRITE_FLUSH); > + BUG_ON(operation != (WRITE_FLUSH | REQ_DISCARD)); > > - bio = bio_alloc(GFP_KERNEL, 0); > - if (unlikely(bio == NULL)) > - goto fail_put_bio; > + if (operation == WRITE_FLUSH) { > + bio = bio_alloc(GFP_KERNEL, 0); > + if (unlikely(bio == NULL)) > + goto fail_put_bio; > > - biolist[nbio++] = bio; > - bio->bi_bdev = preq.bdev; > - bio->bi_private = pending_req; > - bio->bi_end_io = end_block_io_op; > + biolist[nbio++] = bio; > + bio->bi_bdev = preq.bdev; > + bio->bi_private = pending_req; > + bio->bi_end_io = end_block_io_op; > + } else if (operation == REQ_DISCARD) { > + int err = 0; > + int status = BLKIF_RSP_OKAY; > + struct block_device *bdev = blkif->vbd.bdev; > + > + preq.nr_sects = req->u.trim.nr_sectors; > + if (blkif->vbd.type & VDISK_PHY_BACKEND) > + /* just forward the trim request */ > + err = blkdev_issue_discard(bdev, > + preq.sector_number, > + preq.nr_sects, > + GFP_KERNEL, 0); > + else if (blkif->vbd.type & VDISK_FILE_BACKEND) { > + /* punch a hole in the backing file */ > + struct loop_device *lo = > + bdev->bd_disk->private_data; > + struct file *file = lo->lo_backing_file; > + > + if (file->f_op->fallocate) > + err = file->f_op->fallocate(file, > + FALLOC_FL_KEEP_SIZE | > + FALLOC_FL_PUNCH_HOLE, > + preq.sector_number << 9, > + preq.nr_sects << 9); > + else > + err = -EOPNOTSUPP; > + } else > + status = BLKIF_RSP_EOPNOTSUPP; > + > + if (err == -EOPNOTSUPP) { > + DPRINTK("blkback: discard op failed, " > + "not supported\n"); Use pr_debug like the rest of the file does. > + status = BLKIF_RSP_EOPNOTSUPP; > + } else if (err) > + status = BLKIF_RSP_ERROR; > + > + if (status == BLKIF_RSP_OKAY) > + blkif->st_tr_sect += preq.nr_sects; > + make_response(blkif, req->id, req->operation, status); > + xen_blkif_put(blkif); > + free_req(pending_req); > + return 0; > + } All of this should really be moved to its own function. > } > > /* > diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h > index 9e40b28..1fef727 100644 > --- a/drivers/block/xen-blkback/common.h > +++ b/drivers/block/xen-blkback/common.h > @@ -159,8 +159,10 @@ struct xen_blkif { > int st_wr_req; > int st_oo_req; > int st_f_req; > + int st_tr_req; > int st_rd_sect; > int st_wr_sect; > + int st_tr_sect; > > wait_queue_head_t waiting_to_free; > > @@ -182,7 +184,7 @@ struct xen_blkif { > > struct phys_req { > unsigned short dev; > - unsigned short nr_sects; > + blkif_sector_t nr_sects; > struct block_device *bdev; > blkif_sector_t sector_number; > }; > diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c > index 3f129b4..05ea8e0 100644 > --- a/drivers/block/xen-blkback/xenbus.c > +++ b/drivers/block/xen-blkback/xenbus.c > @@ -272,16 +272,20 @@ VBD_SHOW(oo_req, "%d\n", be->blkif->st_oo_req); > VBD_SHOW(rd_req, "%d\n", be->blkif->st_rd_req); > VBD_SHOW(wr_req, "%d\n", be->blkif->st_wr_req); > VBD_SHOW(f_req, "%d\n", be->blkif->st_f_req); > +VBD_SHOW(tr_req, "%d\n", be->blkif->st_tr_req); > VBD_SHOW(rd_sect, "%d\n", be->blkif->st_rd_sect); > VBD_SHOW(wr_sect, "%d\n", be->blkif->st_wr_sect); > +VBD_SHOW(tr_sect, "%d\n", be->blkif->st_tr_sect); > > static struct attribute *xen_vbdstat_attrs[] = { > &dev_attr_oo_req.attr, > &dev_attr_rd_req.attr, > &dev_attr_wr_req.attr, > &dev_attr_f_req.attr, > + &dev_attr_tr_req.attr, > &dev_attr_rd_sect.attr, > &dev_attr_wr_sect.attr, > + &dev_attr_tr_sect.attr, > NULL > }; > > @@ -419,6 +423,59 @@ int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt, > return err; > } > > +int xen_blkbk_trim(struct xenbus_transaction xbt, struct backend_info *be) > +{ > + struct xenbus_device *dev = be->dev; > + struct xen_vbd *vbd = &be->blkif->vbd; > + char *type; > + int err; > + int state = 0; > + > + type = xenbus_read(XBT_NIL, dev->nodename, "type", NULL); > + if (!IS_ERR(type)) { > + if (strcmp(type, "file") == 0) > + state = 1; > + vbd->type |= VDISK_FILE_BACKEND; > + if (strcmp(type, "phy") == 0) { Use 'strncmp' please. > + struct block_device *bdev = be->blkif->vbd.bdev; > + struct request_queue *q = bdev_get_queue(bdev); > + if (blk_queue_discard(q)) { > + err = xenbus_printf(xbt, dev->nodename, > + "discard_granularity", "%u", Hm, most of the items written to the Xenbus use '-', not '_'. Any particular reason for using '_'? > + q->limits.discard_granularity); > + if (err) { > + xenbus_dev_fatal(dev, err, > + "writing discard_granularity"); > + goto kfree; > + } > + err = xenbus_printf(xbt, dev->nodename, > + "discard_alignment", "%u", Ditto here. > + q->limits.discard_alignment); > + if (err) { > + xenbus_dev_fatal(dev, err, > + "writing discard_alignment"); > + goto kfree; > + } > + state = 1; > + vbd->type |= VDISK_PHY_BACKEND; > + } > + } > + } else { > + err = PTR_ERR(type); > + xenbus_dev_fatal(dev, err, "reading type"); > + goto out; > + } > + > + err = xenbus_printf(xbt, dev->nodename, "feature-trim", > + "%d", state); > + if (err) > + xenbus_dev_fatal(dev, err, "writing feature-trim"); > +kfree: > + kfree(type); > +out: > + return err; > +} > + > /* > * Entry point to this code when a new device is created. Allocate the basic > * structures, and watch the store waiting for the hotplug scripts to tell us > @@ -650,6 +707,10 @@ again: > if (err) > goto abort; > > + err = xen_blkbk_trim(xbt, be); > + if (err) > + goto abort; > + > err = xenbus_printf(xbt, dev->nodename, "sectors", "%llu", > (unsigned long long)vbd_sz(&be->blkif->vbd)); > if (err) { > -- > 1.7.6 > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH V2 3/3] xen-blkback: handle trim request in backend driver 2011-08-18 14:56 ` Konrad Rzeszutek Wilk @ 2011-08-22 9:43 ` Li Dongyang 2011-08-22 13:27 ` Konrad Rzeszutek Wilk 0 siblings, 1 reply; 15+ messages in thread From: Li Dongyang @ 2011-08-22 9:43 UTC (permalink / raw) To: Konrad Rzeszutek Wilk; +Cc: xen-devel, owen.smith, JBeulich On Thu, Aug 18, 2011 at 10:56 PM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote: > On Thu, Aug 18, 2011 at 05:34:31PM +0800, Li Dongyang wrote: >> Now blkback driver can handle the trim request from guest, we will >> forward the request to phy device if it really has trim support, or we'll >> punch a hole on the image file. >> >> Signed-off-by: Li Dongyang <lidongyang@novell.com> >> --- >> drivers/block/xen-blkback/blkback.c | 85 +++++++++++++++++++++++++++++------ >> drivers/block/xen-blkback/common.h | 4 +- >> drivers/block/xen-blkback/xenbus.c | 61 +++++++++++++++++++++++++ >> 3 files changed, 135 insertions(+), 15 deletions(-) >> >> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c >> index 2330a9a..5acc37a 100644 >> --- a/drivers/block/xen-blkback/blkback.c >> +++ b/drivers/block/xen-blkback/blkback.c >> @@ -39,6 +39,9 @@ >> #include <linux/list.h> >> #include <linux/delay.h> >> #include <linux/freezer.h> >> +#include <linux/loop.h> >> +#include <linux/falloc.h> >> +#include <linux/fs.h> >> >> #include <xen/events.h> >> #include <xen/page.h> >> @@ -258,13 +261,16 @@ irqreturn_t xen_blkif_be_int(int irq, void *dev_id) >> >> static void print_stats(struct xen_blkif *blkif) >> { >> - pr_info("xen-blkback (%s): oo %3d | rd %4d | wr %4d | f %4d\n", >> + pr_info("xen-blkback (%s): oo %3d | rd %4d | wr %4d | f %4d" >> + " | tr %4d\n", >> current->comm, blkif->st_oo_req, >> - blkif->st_rd_req, blkif->st_wr_req, blkif->st_f_req); >> + blkif->st_rd_req, blkif->st_wr_req, >> + blkif->st_f_req, blkif->st_tr_req); >> blkif->st_print = jiffies + msecs_to_jiffies(10 * 1000); >> blkif->st_rd_req = 0; >> blkif->st_wr_req = 0; >> blkif->st_oo_req = 0; >> + blkif->st_tr_req = 0; >> } >> >> int xen_blkif_schedule(void *arg) >> @@ -563,6 +569,10 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif, >> blkif->st_f_req++; >> operation = WRITE_FLUSH; >> break; >> + case BLKIF_OP_TRIM: >> + blkif->st_tr_req++; >> + operation = REQ_DISCARD; >> + break; >> case BLKIF_OP_WRITE_BARRIER: >> default: >> operation = 0; /* make gcc happy */ >> @@ -572,7 +582,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif, >> >> /* Check that the number of segments is sane. */ >> nseg = req->nr_segments; >> - if (unlikely(nseg == 0 && operation != WRITE_FLUSH) || >> + if (unlikely(nseg == 0 && operation != (WRITE_FLUSH | REQ_DISCARD)) || >> unlikely(nseg > BLKIF_MAX_SEGMENTS_PER_REQUEST)) { >> pr_debug(DRV_PFX "Bad number of segments in request (%d)\n", >> nseg); >> @@ -627,10 +637,13 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif, >> * the hypercall to unmap the grants - that is all done in >> * xen_blkbk_unmap. >> */ >> - if (xen_blkbk_map(req, pending_req, seg)) >> + if (operation != BLKIF_OP_TRIM && xen_blkbk_map(req, pending_req, seg)) >> goto fail_flush; >> >> - /* This corresponding xen_blkif_put is done in __end_block_io_op */ >> + /* >> + * This corresponding xen_blkif_put is done in __end_block_io_op, or >> + * below if we are handling a BLKIF_OP_TRIM. >> + */ >> xen_blkif_get(blkif); >> >> for (i = 0; i < nseg; i++) { >> @@ -654,18 +667,62 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif, >> preq.sector_number += seg[i].nsec; >> } >> >> - /* This will be hit if the operation was a flush. */ >> + /* This will be hit if the operation was a flush or trim. */ >> if (!bio) { >> - BUG_ON(operation != WRITE_FLUSH); >> + BUG_ON(operation != (WRITE_FLUSH | REQ_DISCARD)); >> >> - bio = bio_alloc(GFP_KERNEL, 0); >> - if (unlikely(bio == NULL)) >> - goto fail_put_bio; >> + if (operation == WRITE_FLUSH) { >> + bio = bio_alloc(GFP_KERNEL, 0); >> + if (unlikely(bio == NULL)) >> + goto fail_put_bio; >> >> - biolist[nbio++] = bio; >> - bio->bi_bdev = preq.bdev; >> - bio->bi_private = pending_req; >> - bio->bi_end_io = end_block_io_op; >> + biolist[nbio++] = bio; >> + bio->bi_bdev = preq.bdev; >> + bio->bi_private = pending_req; >> + bio->bi_end_io = end_block_io_op; >> + } else if (operation == REQ_DISCARD) { >> + int err = 0; >> + int status = BLKIF_RSP_OKAY; >> + struct block_device *bdev = blkif->vbd.bdev; >> + >> + preq.nr_sects = req->u.trim.nr_sectors; >> + if (blkif->vbd.type & VDISK_PHY_BACKEND) >> + /* just forward the trim request */ >> + err = blkdev_issue_discard(bdev, >> + preq.sector_number, >> + preq.nr_sects, >> + GFP_KERNEL, 0); >> + else if (blkif->vbd.type & VDISK_FILE_BACKEND) { >> + /* punch a hole in the backing file */ >> + struct loop_device *lo = >> + bdev->bd_disk->private_data; >> + struct file *file = lo->lo_backing_file; >> + >> + if (file->f_op->fallocate) >> + err = file->f_op->fallocate(file, >> + FALLOC_FL_KEEP_SIZE | >> + FALLOC_FL_PUNCH_HOLE, >> + preq.sector_number << 9, >> + preq.nr_sects << 9); >> + else >> + err = -EOPNOTSUPP; >> + } else >> + status = BLKIF_RSP_EOPNOTSUPP; >> + >> + if (err == -EOPNOTSUPP) { >> + DPRINTK("blkback: discard op failed, " >> + "not supported\n"); > > Use pr_debug like the rest of the file does. gonna fix > >> + status = BLKIF_RSP_EOPNOTSUPP; >> + } else if (err) >> + status = BLKIF_RSP_ERROR; >> + >> + if (status == BLKIF_RSP_OKAY) >> + blkif->st_tr_sect += preq.nr_sects; >> + make_response(blkif, req->id, req->operation, status); >> + xen_blkif_put(blkif); >> + free_req(pending_req); >> + return 0; >> + } > > All of this should really be moved to its own function. not quite clear about this, do you mean we should make sth like dispatch_trim_block_io only for OP_TRIM? I added the trim handling stuff to dispatch_rw_block_io because it also handles flush stuff. > >> } >> >> /* >> diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h >> index 9e40b28..1fef727 100644 >> --- a/drivers/block/xen-blkback/common.h >> +++ b/drivers/block/xen-blkback/common.h >> @@ -159,8 +159,10 @@ struct xen_blkif { >> int st_wr_req; >> int st_oo_req; >> int st_f_req; >> + int st_tr_req; >> int st_rd_sect; >> int st_wr_sect; >> + int st_tr_sect; >> >> wait_queue_head_t waiting_to_free; >> >> @@ -182,7 +184,7 @@ struct xen_blkif { >> >> struct phys_req { >> unsigned short dev; >> - unsigned short nr_sects; >> + blkif_sector_t nr_sects; >> struct block_device *bdev; >> blkif_sector_t sector_number; >> }; >> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c >> index 3f129b4..05ea8e0 100644 >> --- a/drivers/block/xen-blkback/xenbus.c >> +++ b/drivers/block/xen-blkback/xenbus.c >> @@ -272,16 +272,20 @@ VBD_SHOW(oo_req, "%d\n", be->blkif->st_oo_req); >> VBD_SHOW(rd_req, "%d\n", be->blkif->st_rd_req); >> VBD_SHOW(wr_req, "%d\n", be->blkif->st_wr_req); >> VBD_SHOW(f_req, "%d\n", be->blkif->st_f_req); >> +VBD_SHOW(tr_req, "%d\n", be->blkif->st_tr_req); >> VBD_SHOW(rd_sect, "%d\n", be->blkif->st_rd_sect); >> VBD_SHOW(wr_sect, "%d\n", be->blkif->st_wr_sect); >> +VBD_SHOW(tr_sect, "%d\n", be->blkif->st_tr_sect); >> >> static struct attribute *xen_vbdstat_attrs[] = { >> &dev_attr_oo_req.attr, >> &dev_attr_rd_req.attr, >> &dev_attr_wr_req.attr, >> &dev_attr_f_req.attr, >> + &dev_attr_tr_req.attr, >> &dev_attr_rd_sect.attr, >> &dev_attr_wr_sect.attr, >> + &dev_attr_tr_sect.attr, >> NULL >> }; >> >> @@ -419,6 +423,59 @@ int xen_blkbk_flush_diskcache(struct xenbus_transaction xbt, >> return err; >> } >> >> +int xen_blkbk_trim(struct xenbus_transaction xbt, struct backend_info *be) >> +{ >> + struct xenbus_device *dev = be->dev; >> + struct xen_vbd *vbd = &be->blkif->vbd; >> + char *type; >> + int err; >> + int state = 0; >> + >> + type = xenbus_read(XBT_NIL, dev->nodename, "type", NULL); >> + if (!IS_ERR(type)) { >> + if (strcmp(type, "file") == 0) >> + state = 1; >> + vbd->type |= VDISK_FILE_BACKEND; >> + if (strcmp(type, "phy") == 0) { > > Use 'strncmp' please. gonna fix > >> + struct block_device *bdev = be->blkif->vbd.bdev; >> + struct request_queue *q = bdev_get_queue(bdev); >> + if (blk_queue_discard(q)) { >> + err = xenbus_printf(xbt, dev->nodename, >> + "discard_granularity", "%u", > > Hm, most of the items written to the Xenbus use '-', not '_'. > Any particular reason for using '_'? they are taken from struct queue_limits so I used the underscore, no problem to change them to '-' > >> + q->limits.discard_granularity); >> + if (err) { >> + xenbus_dev_fatal(dev, err, >> + "writing discard_granularity"); >> + goto kfree; >> + } >> + err = xenbus_printf(xbt, dev->nodename, >> + "discard_alignment", "%u", > > Ditto here. >> + q->limits.discard_alignment); >> + if (err) { >> + xenbus_dev_fatal(dev, err, >> + "writing discard_alignment"); >> + goto kfree; >> + } >> + state = 1; >> + vbd->type |= VDISK_PHY_BACKEND; >> + } >> + } >> + } else { >> + err = PTR_ERR(type); >> + xenbus_dev_fatal(dev, err, "reading type"); >> + goto out; >> + } >> + >> + err = xenbus_printf(xbt, dev->nodename, "feature-trim", >> + "%d", state); >> + if (err) >> + xenbus_dev_fatal(dev, err, "writing feature-trim"); >> +kfree: >> + kfree(type); >> +out: >> + return err; >> +} >> + >> /* >> * Entry point to this code when a new device is created. Allocate the basic >> * structures, and watch the store waiting for the hotplug scripts to tell us >> @@ -650,6 +707,10 @@ again: >> if (err) >> goto abort; >> >> + err = xen_blkbk_trim(xbt, be); >> + if (err) >> + goto abort; >> + >> err = xenbus_printf(xbt, dev->nodename, "sectors", "%llu", >> (unsigned long long)vbd_sz(&be->blkif->vbd)); >> if (err) { >> -- >> 1.7.6 >> >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xensource.com >> http://lists.xensource.com/xen-devel > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH V2 3/3] xen-blkback: handle trim request in backend driver 2011-08-22 9:43 ` Li Dongyang @ 2011-08-22 13:27 ` Konrad Rzeszutek Wilk 0 siblings, 0 replies; 15+ messages in thread From: Konrad Rzeszutek Wilk @ 2011-08-22 13:27 UTC (permalink / raw) To: Li Dongyang; +Cc: xen-devel, owen.smith, JBeulich On Mon, Aug 22, 2011 at 05:43:28PM +0800, Li Dongyang wrote: > On Thu, Aug 18, 2011 at 10:56 PM, Konrad Rzeszutek Wilk > <konrad.wilk@oracle.com> wrote: > > On Thu, Aug 18, 2011 at 05:34:31PM +0800, Li Dongyang wrote: > >> Now blkback driver can handle the trim request from guest, we will > >> forward the request to phy device if it really has trim support, or we'll > >> punch a hole on the image file. > >> > >> Signed-off-by: Li Dongyang <lidongyang@novell.com> > >> --- > >> drivers/block/xen-blkback/blkback.c | 85 +++++++++++++++++++++++++++++------ > >> drivers/block/xen-blkback/common.h | 4 +- > >> drivers/block/xen-blkback/xenbus.c | 61 +++++++++++++++++++++++++ > >> 3 files changed, 135 insertions(+), 15 deletions(-) > >> > >> diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c > >> index 2330a9a..5acc37a 100644 > >> --- a/drivers/block/xen-blkback/blkback.c > >> +++ b/drivers/block/xen-blkback/blkback.c > >> @@ -39,6 +39,9 @@ > >> #include <linux/list.h> > >> #include <linux/delay.h> > >> #include <linux/freezer.h> > >> +#include <linux/loop.h> > >> +#include <linux/falloc.h> > >> +#include <linux/fs.h> > >> > >> #include <xen/events.h> > >> #include <xen/page.h> > >> @@ -258,13 +261,16 @@ irqreturn_t xen_blkif_be_int(int irq, void *dev_id) > >> > >> static void print_stats(struct xen_blkif *blkif) > >> { > >> - pr_info("xen-blkback (%s): oo %3d | rd %4d | wr %4d | f %4d\n", > >> + pr_info("xen-blkback (%s): oo %3d | rd %4d | wr %4d | f %4d" > >> + " | tr %4d\n", > >> current->comm, blkif->st_oo_req, > >> - blkif->st_rd_req, blkif->st_wr_req, blkif->st_f_req); > >> + blkif->st_rd_req, blkif->st_wr_req, > >> + blkif->st_f_req, blkif->st_tr_req); > >> blkif->st_print = jiffies + msecs_to_jiffies(10 * 1000); > >> blkif->st_rd_req = 0; > >> blkif->st_wr_req = 0; > >> blkif->st_oo_req = 0; > >> + blkif->st_tr_req = 0; > >> } > >> > >> int xen_blkif_schedule(void *arg) > >> @@ -563,6 +569,10 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif, > >> blkif->st_f_req++; > >> operation = WRITE_FLUSH; > >> break; > >> + case BLKIF_OP_TRIM: > >> + blkif->st_tr_req++; > >> + operation = REQ_DISCARD; > >> + break; > >> case BLKIF_OP_WRITE_BARRIER: > >> default: > >> operation = 0; /* make gcc happy */ > >> @@ -572,7 +582,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif, > >> > >> /* Check that the number of segments is sane. */ > >> nseg = req->nr_segments; > >> - if (unlikely(nseg == 0 && operation != WRITE_FLUSH) || > >> + if (unlikely(nseg == 0 && operation != (WRITE_FLUSH | REQ_DISCARD)) || > >> unlikely(nseg > BLKIF_MAX_SEGMENTS_PER_REQUEST)) { > >> pr_debug(DRV_PFX "Bad number of segments in request (%d)\n", > >> nseg); > >> @@ -627,10 +637,13 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif, > >> * the hypercall to unmap the grants - that is all done in > >> * xen_blkbk_unmap. > >> */ > >> - if (xen_blkbk_map(req, pending_req, seg)) > >> + if (operation != BLKIF_OP_TRIM && xen_blkbk_map(req, pending_req, seg)) > >> goto fail_flush; > >> > >> - /* This corresponding xen_blkif_put is done in __end_block_io_op */ > >> + /* > >> + * This corresponding xen_blkif_put is done in __end_block_io_op, or > >> + * below if we are handling a BLKIF_OP_TRIM. > >> + */ > >> xen_blkif_get(blkif); > >> > >> for (i = 0; i < nseg; i++) { > >> @@ -654,18 +667,62 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif, > >> preq.sector_number += seg[i].nsec; > >> } > >> > >> - /* This will be hit if the operation was a flush. */ > >> + /* This will be hit if the operation was a flush or trim. */ > >> if (!bio) { > >> - BUG_ON(operation != WRITE_FLUSH); > >> + BUG_ON(operation != (WRITE_FLUSH | REQ_DISCARD)); > >> > >> - bio = bio_alloc(GFP_KERNEL, 0); > >> - if (unlikely(bio == NULL)) > >> - goto fail_put_bio; > >> + if (operation == WRITE_FLUSH) { > >> + bio = bio_alloc(GFP_KERNEL, 0); > >> + if (unlikely(bio == NULL)) > >> + goto fail_put_bio; > >> > >> - biolist[nbio++] = bio; > >> - bio->bi_bdev = preq.bdev; > >> - bio->bi_private = pending_req; > >> - bio->bi_end_io = end_block_io_op; > >> + biolist[nbio++] = bio; > >> + bio->bi_bdev = preq.bdev; > >> + bio->bi_private = pending_req; > >> + bio->bi_end_io = end_block_io_op; > >> + } else if (operation == REQ_DISCARD) { > >> + int err = 0; > >> + int status = BLKIF_RSP_OKAY; > >> + struct block_device *bdev = blkif->vbd.bdev; > >> + > >> + preq.nr_sects = req->u.trim.nr_sectors; > >> + if (blkif->vbd.type & VDISK_PHY_BACKEND) > >> + /* just forward the trim request */ > >> + err = blkdev_issue_discard(bdev, > >> + preq.sector_number, > >> + preq.nr_sects, > >> + GFP_KERNEL, 0); > >> + else if (blkif->vbd.type & VDISK_FILE_BACKEND) { > >> + /* punch a hole in the backing file */ > >> + struct loop_device *lo = > >> + bdev->bd_disk->private_data; > >> + struct file *file = lo->lo_backing_file; > >> + > >> + if (file->f_op->fallocate) > >> + err = file->f_op->fallocate(file, > >> + FALLOC_FL_KEEP_SIZE | > >> + FALLOC_FL_PUNCH_HOLE, > >> + preq.sector_number << 9, > >> + preq.nr_sects << 9); > >> + else > >> + err = -EOPNOTSUPP; > >> + } else > >> + status = BLKIF_RSP_EOPNOTSUPP; > >> + > >> + if (err == -EOPNOTSUPP) { > >> + DPRINTK("blkback: discard op failed, " > >> + "not supported\n"); > > > > Use pr_debug like the rest of the file does. > gonna fix > > > >> + status = BLKIF_RSP_EOPNOTSUPP; > >> + } else if (err) > >> + status = BLKIF_RSP_ERROR; > >> + > >> + if (status == BLKIF_RSP_OKAY) > >> + blkif->st_tr_sect += preq.nr_sects; > >> + make_response(blkif, req->id, req->operation, status); > >> + xen_blkif_put(blkif); > >> + free_req(pending_req); > >> + return 0; > >> + } > > > > All of this should really be moved to its own function. > not quite clear about this, do you mean we should make sth like > dispatch_trim_block_io only for OP_TRIM? > I added the trim handling stuff to dispatch_rw_block_io because it > also handles flush stuff. That function is getting quite busy - it has a lot of code. My thought was that you could seperate it out. Like if (!bio) { if (operation == WRITE_FLUSH) { bio = bio_alloc(..) .. snip (the same as your patch has it. } else if (operation == REQ_DISCARD) { xen_blk_trim(blkif, preq); return 0; } And do all of the operation in xen_blk_trim. Including the make_response .. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH V2 0/3] xen-blkfront/xen-blkback trim support 2011-08-18 9:34 [PATCH V2 0/3] xen-blkfront/xen-blkback trim support Li Dongyang ` (2 preceding siblings ...) 2011-08-18 9:34 ` [PATCH V2 3/3] xen-blkback: handle trim request in backend driver Li Dongyang @ 2011-08-18 14:56 ` Konrad Rzeszutek Wilk 2011-08-18 15:06 ` Konrad Rzeszutek Wilk 2011-08-20 0:41 ` Jeremy Fitzhardinge 5 siblings, 0 replies; 15+ messages in thread From: Konrad Rzeszutek Wilk @ 2011-08-18 14:56 UTC (permalink / raw) To: Li Dongyang; +Cc: xen-devel, owen.smith, JBeulich On Thu, Aug 18, 2011 at 05:34:28PM +0800, Li Dongyang wrote: > Hi, List > Here is V2 of the trim support for blkfront/blkback, > it's now rebased on Jeremy's next-3.1 tree and has some adjustments > according to the comments from Jan Beulich, Thanks Great! Thanks for posting them! > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH V2 0/3] xen-blkfront/xen-blkback trim support 2011-08-18 9:34 [PATCH V2 0/3] xen-blkfront/xen-blkback trim support Li Dongyang ` (3 preceding siblings ...) 2011-08-18 14:56 ` [PATCH V2 0/3] xen-blkfront/xen-blkback trim support Konrad Rzeszutek Wilk @ 2011-08-18 15:06 ` Konrad Rzeszutek Wilk 2011-08-20 0:41 ` Jeremy Fitzhardinge 5 siblings, 0 replies; 15+ messages in thread From: Konrad Rzeszutek Wilk @ 2011-08-18 15:06 UTC (permalink / raw) To: Li Dongyang; +Cc: xen-devel, owen.smith, JBeulich On Thu, Aug 18, 2011 at 05:34:28PM +0800, Li Dongyang wrote: > Hi, List > Here is V2 of the trim support for blkfront/blkback, > it's now rebased on Jeremy's next-3.1 tree and has some adjustments In the future, please base it on top of a Linus's kernel - say 3.1-rc2. > according to the comments from Jan Beulich, Thanks > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH V2 0/3] xen-blkfront/xen-blkback trim support 2011-08-18 9:34 [PATCH V2 0/3] xen-blkfront/xen-blkback trim support Li Dongyang ` (4 preceding siblings ...) 2011-08-18 15:06 ` Konrad Rzeszutek Wilk @ 2011-08-20 0:41 ` Jeremy Fitzhardinge 5 siblings, 0 replies; 15+ messages in thread From: Jeremy Fitzhardinge @ 2011-08-20 0:41 UTC (permalink / raw) To: Li Dongyang; +Cc: xen-devel, owen.smith, JBeulich On 08/18/2011 02:34 AM, Li Dongyang wrote: > Hi, List > Here is V2 of the trim support for blkfront/blkback, > it's now rebased on Jeremy's next-3.1 tree and has some adjustments Please don't use that branch; its really just for my personal use, and I don't intend it for general use. New patches should be based on Linus's linux.git. J ^ permalink raw reply [flat|nested] 15+ messages in thread
[parent not found: <6688c4e4.427@victor.provo.novell.com>]
[parent not found: <4E4D02F80200007800051CB6@victor.provo.novell.com>]
* Re: <subject missing #2> [not found] ` <4E4D02F80200007800051CB6@victor.provo.novell.com> @ 2011-08-22 9:19 ` Li Dongyang 2011-08-22 9:55 ` [PATCH V2 2/3] xen-blkfront: teach blkfront driver handle trim request Jan Beulich 0 siblings, 1 reply; 15+ messages in thread From: Li Dongyang @ 2011-08-22 9:19 UTC (permalink / raw) To: Jan Beulich; +Cc: xen-devel, owen.smith On Thu, Aug 18, 2011 at 6:18 PM, Jan Beulich <JBeulich@novell.com> wrote: > >>> On 18.08.11 at 11:35, Li Dongyang <lidongyang@novell.com> wrote: >> JBeulich@novell.com >> Subject: [PATCH V2 2/3] xen-blkfront: teach blkfront driver handle trim >> request >> Date: Thu, 18 Aug 2011 17:34:30 +0800 >> Message-Id: <1313660071-25230-3-git-send-email-lidongyang@novell.com> >> X-Mailer: git-send-email 1.7.6 >> In-Reply-To: <1313660071-25230-1-git-send-email-lidongyang@novell.com> >> References: <1313660071-25230-1-git-send-email-lidongyang@novell.com> >> >> The blkfront driver now will read feature-trim from xenstore, >> and set up the request queue with trim params, then we can forward the >> discard requests to backend driver. >> >> Signed-off-by: Li Dongyang <lidongyang@novell.com> >> --- >> drivers/block/xen-blkfront.c | 96 ++++++++++++++++++++++++++++++++---------- >> 1 files changed, 73 insertions(+), 23 deletions(-) >> >> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c >> index b536a9c..c22fb07 100644 >> --- a/drivers/block/xen-blkfront.c >> +++ b/drivers/block/xen-blkfront.c >> @@ -98,6 +98,9 @@ struct blkfront_info >> unsigned long shadow_free; >> unsigned int feature_flush; >> unsigned int flush_op; >> + unsigned int feature_trim; >> + unsigned int discard_granularity; >> + unsigned int discard_alignment; >> int is_ready; >> }; >> >> @@ -302,29 +305,36 @@ static int blkif_queue_request(struct request *req) >> ring_req->operation = info->flush_op; >> } >> >> - ring_req->nr_segments = blk_rq_map_sg(req->q, req, info->sg); >> - BUG_ON(ring_req->nr_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST); >> + if (unlikely(req->cmd_flags & REQ_DISCARD)) { >> + /* id, sector_number and handle are set above. */ >> + ring_req->operation = BLKIF_OP_TRIM; >> + ring_req->nr_segments = 0; >> + ring_req->u.trim.nr_sectors = blk_rq_sectors(req); >> + } else { >> + ring_req->nr_segments = blk_rq_map_sg(req->q, req, info->sg); >> + BUG_ON(ring_req->nr_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST); >> >> - for_each_sg(info->sg, sg, ring_req->nr_segments, i) { >> - buffer_mfn = pfn_to_mfn(page_to_pfn(sg_page(sg))); >> - fsect = sg->offset >> 9; >> - lsect = fsect + (sg->length >> 9) - 1; >> - /* install a grant reference. */ >> - ref = gnttab_claim_grant_reference(&gref_head); >> - BUG_ON(ref == -ENOSPC); >> + for_each_sg(info->sg, sg, ring_req->nr_segments, i) { >> + buffer_mfn = pfn_to_mfn(page_to_pfn(sg_page(sg))); >> + fsect = sg->offset >> 9; >> + lsect = fsect + (sg->length >> 9) - 1; >> + /* install a grant reference. */ >> + ref = gnttab_claim_grant_reference(&gref_head); >> + BUG_ON(ref == -ENOSPC); >> >> - gnttab_grant_foreign_access_ref( >> - ref, >> - info->xbdev->otherend_id, >> - buffer_mfn, >> - rq_data_dir(req) ); >> - >> - info->shadow[id].frame[i] = mfn_to_pfn(buffer_mfn); >> - ring_req->u.rw.seg[i] = >> - (struct blkif_request_segment) { >> - .gref = ref, >> - .first_sect = fsect, >> - .last_sect = lsect }; >> + gnttab_grant_foreign_access_ref( >> + ref, >> + info->xbdev->otherend_id, >> + buffer_mfn, >> + rq_data_dir(req)); >> + >> + info->shadow[id].frame[i] = mfn_to_pfn(buffer_mfn); >> + ring_req->u.rw.seg[i] = >> + (struct blkif_request_segment) { >> + .gref = ref, >> + .first_sect = fsect, >> + .last_sect = lsect }; >> + } >> } >> >> info->ring.req_prod_pvt++; >> @@ -399,6 +409,7 @@ wait: >> static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size) >> { >> struct request_queue *rq; >> + struct blkfront_info *info = gd->private_data; >> >> rq = blk_init_queue(do_blkif_request, &blkif_io_lock); >> if (rq == NULL) >> @@ -406,6 +417,13 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 >> sector_size) >> >> queue_flag_set_unlocked(QUEUE_FLAG_VIRT, rq); >> >> + if (info->feature_trim) { >> + queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, rq); >> + blk_queue_max_discard_sectors(rq, get_capacity(gd)); >> + rq->limits.discard_granularity = info->discard_granularity; >> + rq->limits.discard_alignment = info->discard_alignment; >> + } >> + >> /* Hard sector size and max sectors impersonate the equiv. hardware. */ >> blk_queue_logical_block_size(rq, sector_size); >> blk_queue_max_hw_sectors(rq, 512); >> @@ -722,6 +740,19 @@ static irqreturn_t blkif_interrupt(int irq, void >> *dev_id) >> >> error = (bret->status == BLKIF_RSP_OKAY) ? 0 : -EIO; >> switch (bret->operation) { >> + case BLKIF_OP_TRIM: >> + if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) { >> + struct request_queue *rq = info->rq; >> + printk(KERN_WARNING "blkfront: %s: trim op failed\n", >> + info->gd->disk_name); >> + error = -EOPNOTSUPP; >> + info->feature_trim = 0; >> + spin_lock_irq(rq->queue_lock); >> + queue_flag_clear(QUEUE_FLAG_DISCARD, rq); >> + spin_unlock_irq(rq->queue_lock); > > Are you sure you want to re-enable interrupts here unconditionally? hm, you're right, as we called spin_lock_irqsave() above, I think it's safe for us to just use spin_lock and spin_unlock here. > >> + } >> + __blk_end_request_all(req, error); >> + break; >> case BLKIF_OP_FLUSH_DISKCACHE: >> case BLKIF_OP_WRITE_BARRIER: >> if (unlikely(bret->status == BLKIF_RSP_EOPNOTSUPP)) { >> @@ -1108,7 +1139,9 @@ static void blkfront_connect(struct blkfront_info >> *info) >> unsigned long sector_size; >> unsigned int binfo; >> int err; >> - int barrier, flush; >> + unsigned int discard_granularity; >> + unsigned int discard_alignment; >> + int barrier, flush, trim; >> >> switch (info->connected) { >> case BLKIF_STATE_CONNECTED: >> @@ -1178,7 +1211,24 @@ static void blkfront_connect(struct blkfront_info >> *info) >> info->feature_flush = REQ_FLUSH; >> info->flush_op = BLKIF_OP_FLUSH_DISKCACHE; >> } >> - >> + >> + err = xenbus_gather(XBT_NIL, info->xbdev->otherend, >> + "feature-trim", "%d", &trim, >> + NULL); > > So you switched this to use "trim", but ... > >> + >> + if (!err && trim) { >> + info->feature_trim = 1; >> + err = xenbus_gather(XBT_NIL, info->xbdev->otherend, >> + "discard_granularity", "%u", &discard_granularity, >> + "discard_alignment", "%u", &discard_alignment, > > ... you kept these using "discard" - quite inconsistent. the discard_granularity and discard_alignment are taken from struct queue_limits they are needed to setup the queue in guest if we are using a phy device has trim. so I think we can keep the names here. > > Also, I think (but I may be wrong) that dashes are preferred over > underscores in xenstore node names. that could be done in xenstore, I used dashes because they are taken from struct queue_limits > >> + NULL); >> + if (!err) { >> + info->discard_granularity = discard_granularity; >> + info->discard_alignment = discard_alignment; > > I think you should set info->feature_trim only here (and then you can > eliminate the local variables). those discard_granularity and discard_alignment are only meaningful if the backend is a phy device has trim, and the 2 args are read from the queue limits from host. if the backend is a file, we can go with no discard_granularity and discard_alignment because we are about to punch a hole in the image file. > > Jan > >> + } >> + } else >> + info->feature_trim = 0; >> + >> err = xlvbd_alloc_gendisk(sectors, info, binfo, sector_size); >> if (err) { >> xenbus_dev_fatal(info->xbdev, err, "xlvbd_add at %s", > > > > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH V2 2/3] xen-blkfront: teach blkfront driver handle trim request 2011-08-22 9:19 ` <subject missing #2> Li Dongyang @ 2011-08-22 9:55 ` Jan Beulich 0 siblings, 0 replies; 15+ messages in thread From: Jan Beulich @ 2011-08-22 9:55 UTC (permalink / raw) To: Li Dongyang; +Cc: xen-devel, owen.smith >>> On 22.08.11 at 11:19, Li Dongyang <jerry87905@gmail.com> wrote: > On Thu, Aug 18, 2011 at 6:18 PM, Jan Beulich <JBeulich@novell.com> wrote: >> >>> On 18.08.11 at 11:35, Li Dongyang <lidongyang@novell.com> wrote: >>> @@ -1178,7 +1211,24 @@ static void blkfront_connect(struct blkfront_info >>> *info) >>> info->feature_flush = REQ_FLUSH; >>> info->flush_op = BLKIF_OP_FLUSH_DISKCACHE; >>> } >>> - >>> + >>> + err = xenbus_gather(XBT_NIL, info->xbdev->otherend, >>> + "feature-trim", "%d", &trim, >>> + NULL); >> >> So you switched this to use "trim", but ... >> >>> + >>> + if (!err && trim) { >>> + info->feature_trim = 1; >>> + err = xenbus_gather(XBT_NIL, info->xbdev->otherend, >>> + "discard_granularity", "%u", &discard_granularity, >>> + "discard_alignment", "%u", &discard_alignment, >> >> ... you kept these using "discard" - quite inconsistent. > the discard_granularity and discard_alignment are taken from struct > queue_limits > they are needed to setup the queue in guest if we are using a phy > device has trim. > so I think we can keep the names here. The way Linux names them doesn't matter for the xenstore interface. I think they should be named so that the connection to the base node's name is obvious. >> >> Also, I think (but I may be wrong) that dashes are preferred over >> underscores in xenstore node names. > that could be done in xenstore, I used dashes because they are taken > from struct queue_limits >> >>> + NULL); >>> + if (!err) { >>> + info->discard_granularity = discard_granularity; >>> + info->discard_alignment = discard_alignment; >> >> I think you should set info->feature_trim only here (and then you can >> eliminate the local variables). > those discard_granularity and discard_alignment are only meaningful if > the backend is a phy device has trim, > and the 2 args are read from the queue limits from host. if the > backend is a file, we can go with no discard_granularity > and discard_alignment because we are about to punch a hole in the image > file. Then you should update the condition accordingly. Setting info->feature_trim prematurely is just calling for later problems. Jan ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2011-08-22 17:44 UTC | newest] Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2011-08-18 9:34 [PATCH V2 0/3] xen-blkfront/xen-blkback trim support Li Dongyang 2011-08-18 9:34 ` [PATCH V2 1/3] xen-blkfront: add BLKIF_OP_TRIM and backend type flags Li Dongyang 2011-08-20 0:38 ` Jeremy Fitzhardinge 2011-08-22 9:36 ` Li Dongyang 2011-08-22 17:44 ` Jeremy Fitzhardinge 2011-08-18 9:34 ` [PATCH V2 2/3] xen-blkfront: teach blkfront driver handle trim request Li Dongyang 2011-08-18 15:05 ` Konrad Rzeszutek Wilk 2011-08-18 9:34 ` [PATCH V2 3/3] xen-blkback: handle trim request in backend driver Li Dongyang 2011-08-18 14:56 ` Konrad Rzeszutek Wilk 2011-08-22 9:43 ` Li Dongyang 2011-08-22 13:27 ` Konrad Rzeszutek Wilk 2011-08-18 14:56 ` [PATCH V2 0/3] xen-blkfront/xen-blkback trim support Konrad Rzeszutek Wilk 2011-08-18 15:06 ` Konrad Rzeszutek Wilk 2011-08-20 0:41 ` Jeremy Fitzhardinge [not found] <6688c4e4.427@victor.provo.novell.com> [not found] ` <4E4D02F80200007800051CB6@victor.provo.novell.com> 2011-08-22 9:19 ` <subject missing #2> Li Dongyang 2011-08-22 9:55 ` [PATCH V2 2/3] xen-blkfront: teach blkfront driver handle trim request Jan Beulich
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.