All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] ]PATCH 0/7] add TRIM/UNMAP support, v3
@ 2010-12-10 15:00 Christoph Hellwig
  2010-12-10 15:00 ` [Qemu-devel] [PATCH 1/7] block: add discard support Christoph Hellwig
                   ` (9 more replies)
  0 siblings, 10 replies; 21+ messages in thread
From: Christoph Hellwig @ 2010-12-10 15:00 UTC (permalink / raw)
  To: qemu-devel

This patchset adds support for the ATA TRIM and SCSI WRITE SAME with
unmap commands, which allow reclaiming free space from a backing image.

The user facing implementation is pretty complete, but not really
efficient because the underlying bdrv_discard implementation doesn't
use the aio implementation yet.  The reason for that is that the SCSI
layer doesn't really allow any asynchronous commands except for
READ/WRITE by design, and implementing the ATA TRIM command with it's
multiple ranges is rather painful, and combined with the SCSI limitation
I didn't bother yet.  The only backend support so far is the XFS hole
punching ioctl, but others can be added easily when they become
available.  A virtio implementation for a discard command would also
be pretty easy, but until we actually support a better backend then
a plain sparse file it's not worth using for production enviroments
anyway, but more for playing with the thin provisioning infrastructure,
or observing guest behaviour when TRIM / unmap is supported.

If the support is enabled and the backend doesn't support hole punching
the TRIM / WRITE SAME commands become no-ops so that migration from
hosts supporting or not supporting it works.

Version 3:
	- refactor IDE dma support code
	- proper brace obsfucation
	- fix compile without xfs headers
	- use bool instead of int for a one-byte flag

Version 2:
	- replace tabs with spaces
	- return -ENOMEDIUM from bdrv_discard if there's no driver
	  assigned
	- actually list the TP EVPD page as supported when querying
	  for supported EVPD pages

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH 1/7] block: add discard support
  2010-12-10 15:00 [Qemu-devel] ]PATCH 0/7] add TRIM/UNMAP support, v3 Christoph Hellwig
@ 2010-12-10 15:00 ` Christoph Hellwig
  2010-12-10 15:00 ` [Qemu-devel] [PATCH 2/7] scsi-disk: support WRITE SAME (16) with unmap bit Christoph Hellwig
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2010-12-10 15:00 UTC (permalink / raw)
  To: qemu-devel

Add a new bdrv_discard method to free blocks in a mapping image, and a new
drive property to set the granularity for these discard.  If no discard
granularity support is set discard support is disabled.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: qemu/block.c
===================================================================
--- qemu.orig/block.c	2010-12-07 09:34:17.563010201 +0100
+++ qemu/block.c	2010-12-07 09:37:44.734255486 +0100
@@ -1499,6 +1499,17 @@ int bdrv_has_zero_init(BlockDriverState
     return 1;
 }
 
+int bdrv_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors)
+{
+    if (!bs->drv) {
+        return -ENOMEDIUM;
+    }
+    if (!bs->drv->bdrv_discard) {
+        return 0;
+    }
+    return bs->drv->bdrv_discard(bs, sector_num, nb_sectors);
+}
+
 /*
  * Returns true iff the specified sector is present in the disk image. Drivers
  * not implementing the functionality are assumed to not support backing files,
Index: qemu/block.h
===================================================================
--- qemu.orig/block.h	2010-12-07 09:34:17.571010061 +0100
+++ qemu/block.h	2010-12-07 09:34:29.689022144 +0100
@@ -146,6 +146,7 @@ int bdrv_flush(BlockDriverState *bs);
 void bdrv_flush_all(void);
 void bdrv_close_all(void);
 
+int bdrv_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors);
 int bdrv_has_zero_init(BlockDriverState *bs);
 int bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
 	int *pnum);
Index: qemu/block_int.h
===================================================================
--- qemu.orig/block_int.h	2010-12-07 09:34:17.588010480 +0100
+++ qemu/block_int.h	2010-12-07 09:34:29.693010131 +0100
@@ -73,6 +73,8 @@ struct BlockDriver {
         BlockDriverCompletionFunc *cb, void *opaque);
     BlockDriverAIOCB *(*bdrv_aio_flush)(BlockDriverState *bs,
         BlockDriverCompletionFunc *cb, void *opaque);
+    int (*bdrv_discard)(BlockDriverState *bs, int64_t sector_num,
+                        int nb_sectors);
 
     int (*bdrv_aio_multiwrite)(BlockDriverState *bs, BlockRequest *reqs,
         int num_reqs);
@@ -227,6 +229,7 @@ typedef struct BlockConf {
     uint16_t logical_block_size;
     uint16_t min_io_size;
     uint32_t opt_io_size;
+    uint32_t discard_granularity;
 } BlockConf;
 
 static inline unsigned int get_physical_block_exp(BlockConf *conf)
@@ -249,6 +252,8 @@ static inline unsigned int get_physical_
     DEFINE_PROP_UINT16("physical_block_size", _state,                   \
                        _conf.physical_block_size, 512),                 \
     DEFINE_PROP_UINT16("min_io_size", _state, _conf.min_io_size, 0),  \
-    DEFINE_PROP_UINT32("opt_io_size", _state, _conf.opt_io_size, 0)
+    DEFINE_PROP_UINT32("opt_io_size", _state, _conf.opt_io_size, 0), \
+    DEFINE_PROP_UINT32("discard_granularity", _state, \
+                       _conf.discard_granularity, 0)
 
 #endif /* BLOCK_INT_H */
Index: qemu/block/raw.c
===================================================================
--- qemu.orig/block/raw.c	2010-12-07 09:34:17.596025496 +0100
+++ qemu/block/raw.c	2010-12-07 09:34:29.695297321 +0100
@@ -65,6 +65,11 @@ static int raw_probe(const uint8_t *buf,
    return 1; /* everything can be opened as raw image */
 }
 
+static int raw_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors)
+{
+    return bdrv_discard(bs->file, sector_num, nb_sectors);
+}
+
 static int raw_is_inserted(BlockDriverState *bs)
 {
     return bdrv_is_inserted(bs->file);
@@ -130,6 +135,7 @@ static BlockDriver bdrv_raw = {
     .bdrv_aio_readv     = raw_aio_readv,
     .bdrv_aio_writev    = raw_aio_writev,
     .bdrv_aio_flush     = raw_aio_flush,
+    .bdrv_discard       = raw_discard,
 
     .bdrv_is_inserted   = raw_is_inserted,
     .bdrv_eject         = raw_eject,

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH 2/7] scsi-disk: support WRITE SAME (16) with unmap bit
  2010-12-10 15:00 [Qemu-devel] ]PATCH 0/7] add TRIM/UNMAP support, v3 Christoph Hellwig
  2010-12-10 15:00 ` [Qemu-devel] [PATCH 1/7] block: add discard support Christoph Hellwig
@ 2010-12-10 15:00 ` Christoph Hellwig
  2010-12-16 15:48   ` Kevin Wolf
  2010-12-10 15:01 ` [Qemu-devel] [PATCH 3/7] make dma_bdrv_io available to drivers Christoph Hellwig
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2010-12-10 15:00 UTC (permalink / raw)
  To: qemu-devel

Support discards via the WRITE SAME command with the unmap bit set, and
tell the initiator about the support for it via the block limit and the
new thin provisioning EVPD pages.  Also fix the comment which incorrectly
describedthe block limits EVPD page.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: qemu/hw/scsi-disk.c
===================================================================
--- qemu.orig/hw/scsi-disk.c	2010-12-07 09:35:44.203009851 +0100
+++ qemu/hw/scsi-disk.c	2010-12-07 09:42:07.984255141 +0100
@@ -424,7 +424,8 @@ static int scsi_disk_emulate_inquiry(SCS
             outbuf[buflen++] = 0x80; // unit serial number
             outbuf[buflen++] = 0x83; // device identification
             if (bdrv_get_type_hint(s->bs) != BDRV_TYPE_CDROM) {
-                outbuf[buflen++] = 0xb0; // block device characteristics
+                outbuf[buflen++] = 0xb0; // block limits
+                outbuf[buflen++] = 0xb2; // thin provisioning
             }
             outbuf[pages] = buflen - pages - 1; // number of pages
             break;
@@ -466,8 +467,10 @@ static int scsi_disk_emulate_inquiry(SCS
             buflen += id_len;
             break;
         }
-        case 0xb0: /* block device characteristics */
+        case 0xb0: /* block limits */
         {
+            unsigned int unmap_sectors =
+                    s->qdev.conf.discard_granularity / s->qdev.blocksize;
             unsigned int min_io_size =
                     s->qdev.conf.min_io_size / s->qdev.blocksize;
             unsigned int opt_io_size =
@@ -492,6 +495,21 @@ static int scsi_disk_emulate_inquiry(SCS
             outbuf[13] = (opt_io_size >> 16) & 0xff;
             outbuf[14] = (opt_io_size >> 8) & 0xff;
             outbuf[15] = opt_io_size & 0xff;
+
+            /* optimal unmap granularity */
+            outbuf[28] = (unmap_sectors >> 24) & 0xff;
+            outbuf[29] = (unmap_sectors >> 16) & 0xff;
+            outbuf[30] = (unmap_sectors >> 8) & 0xff;
+            outbuf[31] = unmap_sectors & 0xff;
+            break;
+        }
+        case 0xb2: /* thin provisioning */
+        {
+            outbuf[3] = buflen = 8;
+            outbuf[4] = 0;
+            outbuf[5] = 0x40; /* write same with unmap supported */
+            outbuf[6] = 0;
+            outbuf[7] = 0;
             break;
         }
         default:
@@ -959,6 +977,11 @@ static int scsi_disk_emulate_command(SCS
             outbuf[11] = 0;
             outbuf[12] = 0;
             outbuf[13] = get_physical_block_exp(&s->qdev.conf);
+
+            /* set TPE bit if the format supports discard */
+            if (s->qdev.conf.discard_granularity)
+                outbuf[14] = 0x80;
+
             /* Protection, exponent and lowest lba field left blank. */
             buflen = req->cmd.xfer;
             break;
@@ -1123,6 +1146,34 @@ static int32_t scsi_send_command(SCSIDev
             goto illegal_lba;
         }
         break;
+    case WRITE_SAME_16:
+        len = r->req.cmd.xfer / d->blocksize;
+
+        DPRINTF("WRITE SAME(16) (sector %" PRId64 ", count %d)\n",
+                r->req.cmd.lba, len);
+
+        if (r->req.cmd.lba > s->max_lba) {
+            goto illegal_lba;
+        }
+
+        /*
+         * We only support WRITE SAME with the unmap bit set for now.
+         */
+        if (!(buf[1] & 0x8)) {
+            goto fail;
+        }
+
+        rc = bdrv_discard(s->bs, r->req.cmd.lba * s->cluster_size,
+                          len * s->cluster_size);
+        if (rc < 0) {
+            /* XXX: better error code ?*/
+            goto fail;
+        }
+
+        scsi_req_set_status(r, GOOD, NO_SENSE);
+        scsi_req_complete(&r->req);
+        scsi_remove_request(r);
+        return 0;
     default:
         DPRINTF("Unknown SCSI command (%2.2x)\n", buf[0]);
     fail:
Index: qemu/hw/scsi-defs.h
===================================================================
--- qemu.orig/hw/scsi-defs.h	2010-12-07 09:35:44.219005032 +0100
+++ qemu/hw/scsi-defs.h	2010-12-07 09:38:35.726003986 +0100
@@ -84,6 +84,7 @@
 #define MODE_SENSE_10         0x5a
 #define PERSISTENT_RESERVE_IN 0x5e
 #define PERSISTENT_RESERVE_OUT 0x5f
+#define WRITE_SAME_16         0x93
 #define MAINTENANCE_IN        0xa3
 #define MAINTENANCE_OUT       0xa4
 #define MOVE_MEDIUM           0xa5

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH 3/7] make dma_bdrv_io available to drivers
  2010-12-10 15:00 [Qemu-devel] ]PATCH 0/7] add TRIM/UNMAP support, v3 Christoph Hellwig
  2010-12-10 15:00 ` [Qemu-devel] [PATCH 1/7] block: add discard support Christoph Hellwig
  2010-12-10 15:00 ` [Qemu-devel] [PATCH 2/7] scsi-disk: support WRITE SAME (16) with unmap bit Christoph Hellwig
@ 2010-12-10 15:01 ` Christoph Hellwig
  2010-12-10 15:01 ` [Qemu-devel] [PATCH 4/7] ide: factor dma handling helpers Christoph Hellwig
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2010-12-10 15:01 UTC (permalink / raw)
  To: qemu-devel

Make dma_bdrv_io available for drivers, and pass an explicit I/O function
instead of hardcoding bdrv_aio_readv/bdrv_aio_writev.  This is required
to implement non-READ/WRITE dma commands in the ide driver, e.g. the
upcoming TRIM support.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: qemu/dma-helpers.c
===================================================================
--- qemu.orig/dma-helpers.c	2010-11-23 14:13:42.178253390 +0100
+++ qemu/dma-helpers.c	2010-11-23 14:14:45.186253110 +0100
@@ -47,6 +47,7 @@ typedef struct {
     target_phys_addr_t sg_cur_byte;
     QEMUIOVector iov;
     QEMUBH *bh;
+    DMAIOFunc *io_func;
 } DMAAIOCB;
 
 static void dma_bdrv_cb(void *opaque, int ret);
@@ -116,13 +117,8 @@ static void dma_bdrv_cb(void *opaque, in
         return;
     }
 
-    if (dbs->is_write) {
-        dbs->acb = bdrv_aio_writev(dbs->bs, dbs->sector_num, &dbs->iov,
-                                   dbs->iov.size / 512, dma_bdrv_cb, dbs);
-    } else {
-        dbs->acb = bdrv_aio_readv(dbs->bs, dbs->sector_num, &dbs->iov,
-                                  dbs->iov.size / 512, dma_bdrv_cb, dbs);
-    }
+    dbs->acb = dbs->io_func(dbs->bs, dbs->sector_num, &dbs->iov,
+                            dbs->iov.size / 512, dma_bdrv_cb, dbs);
     if (!dbs->acb) {
         dma_bdrv_unmap(dbs);
         qemu_iovec_destroy(&dbs->iov);
@@ -144,12 +140,12 @@ static AIOPool dma_aio_pool = {
     .cancel             = dma_aio_cancel,
 };
 
-static BlockDriverAIOCB *dma_bdrv_io(
+BlockDriverAIOCB *dma_bdrv_io(
     BlockDriverState *bs, QEMUSGList *sg, uint64_t sector_num,
-    BlockDriverCompletionFunc *cb, void *opaque,
-    int is_write)
+    DMAIOFunc *io_func, BlockDriverCompletionFunc *cb,
+    void *opaque, int is_write)
 {
-    DMAAIOCB *dbs =  qemu_aio_get(&dma_aio_pool, bs, cb, opaque);
+    DMAAIOCB *dbs = qemu_aio_get(&dma_aio_pool, bs, cb, opaque);
 
     dbs->acb = NULL;
     dbs->bs = bs;
@@ -158,6 +154,7 @@ static BlockDriverAIOCB *dma_bdrv_io(
     dbs->sg_cur_index = 0;
     dbs->sg_cur_byte = 0;
     dbs->is_write = is_write;
+    dbs->io_func = io_func;
     dbs->bh = NULL;
     qemu_iovec_init(&dbs->iov, sg->nsg);
     dma_bdrv_cb(dbs, 0);
@@ -173,12 +170,12 @@ BlockDriverAIOCB *dma_bdrv_read(BlockDri
                                 QEMUSGList *sg, uint64_t sector,
                                 void (*cb)(void *opaque, int ret), void *opaque)
 {
-    return dma_bdrv_io(bs, sg, sector, cb, opaque, 0);
+    return dma_bdrv_io(bs, sg, sector, bdrv_aio_readv, cb, opaque, 0);
 }
 
 BlockDriverAIOCB *dma_bdrv_write(BlockDriverState *bs,
                                  QEMUSGList *sg, uint64_t sector,
                                  void (*cb)(void *opaque, int ret), void *opaque)
 {
-    return dma_bdrv_io(bs, sg, sector, cb, opaque, 1);
+    return dma_bdrv_io(bs, sg, sector, bdrv_aio_writev, cb, opaque, 1);
 }
Index: qemu/dma.h
===================================================================
--- qemu.orig/dma.h	2010-11-23 14:13:42.185253809 +0100
+++ qemu/dma.h	2010-11-23 14:14:45.186253110 +0100
@@ -32,6 +32,14 @@ void qemu_sglist_add(QEMUSGList *qsg, ta
                      target_phys_addr_t len);
 void qemu_sglist_destroy(QEMUSGList *qsg);
 
+typedef BlockDriverAIOCB *DMAIOFunc(BlockDriverState *bs, int64_t sector_num,
+                                 QEMUIOVector *iov, int nb_sectors,
+                                 BlockDriverCompletionFunc *cb, void *opaque);
+
+BlockDriverAIOCB *dma_bdrv_io(BlockDriverState *bs,
+                              QEMUSGList *sg, uint64_t sector_num,
+                              DMAIOFunc *io_func, BlockDriverCompletionFunc *cb,
+                              void *opaque, int is_write);
 BlockDriverAIOCB *dma_bdrv_read(BlockDriverState *bs,
                                 QEMUSGList *sg, uint64_t sector,
                                 BlockDriverCompletionFunc *cb, void *opaque);

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH 4/7] ide: factor dma handling helpers
  2010-12-10 15:00 [Qemu-devel] ]PATCH 0/7] add TRIM/UNMAP support, v3 Christoph Hellwig
                   ` (2 preceding siblings ...)
  2010-12-10 15:01 ` [Qemu-devel] [PATCH 3/7] make dma_bdrv_io available to drivers Christoph Hellwig
@ 2010-12-10 15:01 ` Christoph Hellwig
  2010-12-10 15:01 ` [Qemu-devel] [PATCH 5/7] ide: also reset io_buffer_index for writes Christoph Hellwig
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2010-12-10 15:01 UTC (permalink / raw)
  To: qemu-devel

The DMA I/O path is duplicated between read and write commands, and
support for the TRIM command would add another copy.  Factor the
code into common helpers using the s->is_read flag added for the
macio ATA controller, and the newly added dma_bdrv_io function.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: qemu/hw/ide/core.c
===================================================================
--- qemu.orig/hw/ide/core.c	2010-12-10 11:17:57.328254648 +0100
+++ qemu/hw/ide/core.c	2010-12-10 11:35:23.164005731 +0100
@@ -61,7 +61,8 @@ static inline int media_is_cd(IDEState *
     return (media_present(s) && s->nb_sectors <= CD_MAX_SECTORS);
 }
 
-static void ide_dma_start(IDEState *s, BlockDriverCompletionFunc *dma_cb);
+static void ide_dma_start(IDEState *s, BlockDriverCompletionFunc *dma_cb,
+       DMAIOFunc *io_func);
 static void ide_dma_restart(IDEState *s, int is_read);
 static void ide_atapi_cmd_read_dma_cb(void *opaque, int ret);
 static int ide_handle_rw_error(IDEState *s, int error, int op);
@@ -568,7 +569,7 @@ static int dma_buf_rw(BMDMAState *bm, in
     return 1;
 }
 
-static void ide_read_dma_cb(void *opaque, int ret)
+static void ide_dma_cb(void *opaque, int ret)
 {
     BMDMAState *bm = opaque;
     IDEState *s = bmdma_active_if(bm);
@@ -576,9 +577,12 @@ static void ide_read_dma_cb(void *opaque
     int64_t sector_num;
 
     if (ret < 0) {
-        if (ide_handle_rw_error(s, -ret,
-            BM_STATUS_DMA_RETRY | BM_STATUS_RETRY_READ))
-        {
+        int op = BM_STATUS_DMA_RETRY;
+
+        if (s->is_read)
+            op |= BM_STATUS_RETRY_READ;
+
+        if (ide_handle_rw_error(s, -ret, op)) {
             return;
         }
     }
@@ -586,7 +590,7 @@ static void ide_read_dma_cb(void *opaque
     n = s->io_buffer_size >> 9;
     sector_num = ide_get_sector(s);
     if (n > 0) {
-        dma_buf_commit(s, 1);
+        dma_buf_commit(s, s->is_read);
         sector_num += n;
         ide_set_sector(s, sector_num);
         s->nsector -= n;
@@ -596,32 +600,39 @@ static void ide_read_dma_cb(void *opaque
     if (s->nsector == 0) {
         s->status = READY_STAT | SEEK_STAT;
         ide_set_irq(s->bus);
-    eot:
-        bm->status |= BM_STATUS_INT;
-        ide_dma_set_inactive(bm);
-        return;
+        goto eot;
     }
 
     /* launch next transfer */
     n = s->nsector;
-    s->io_buffer_index = 0;
+    if (s->is_read)
+        s->io_buffer_index = 0;
     s->io_buffer_size = n * 512;
-    if (dma_buf_prepare(bm, 1) == 0)
+    if (dma_buf_prepare(bm, s->is_read) == 0)
         goto eot;
+
 #ifdef DEBUG_AIO
-    printf("aio_read: sector_num=%" PRId64 " n=%d\n", sector_num, n);
+    printf("ide_dma_cb: read: %d sector_num=%" PRId64 " n=%d\n",
+           s->is_read, sector_num, n);
 #endif
-    bm->aiocb = dma_bdrv_read(s->bs, &s->sg, sector_num, ide_read_dma_cb, bm);
-    ide_dma_submit_check(s, ide_read_dma_cb, bm);
+
+    bm->aiocb = dma_bdrv_io(s->bs, &s->sg, sector_num, bm->io_func,
+                            ide_dma_cb, bm, !s->is_read);
+    ide_dma_submit_check(s, ide_dma_cb, bm);
+    return;
+
+eot:
+    bm->status |= BM_STATUS_INT;
+    ide_dma_set_inactive(bm);
 }
 
-static void ide_sector_read_dma(IDEState *s)
+static void ide_sector_dma(IDEState *s, DMAIOFunc *io_func, bool is_read)
 {
     s->status = READY_STAT | SEEK_STAT | DRQ_STAT | BUSY_STAT;
     s->io_buffer_index = 0;
     s->io_buffer_size = 0;
-    s->is_read = 1;
-    ide_dma_start(s, ide_read_dma_cb);
+    s->is_read = is_read;
+    ide_dma_start(s, ide_dma_cb, io_func);
 }
 
 static void ide_sector_write_timer_cb(void *opaque)
@@ -714,58 +725,6 @@ void ide_dma_restart_cb(void *opaque, in
     }
 }
 
-static void ide_write_dma_cb(void *opaque, int ret)
-{
-    BMDMAState *bm = opaque;
-    IDEState *s = bmdma_active_if(bm);
-    int n;
-    int64_t sector_num;
-
-    if (ret < 0) {
-        if (ide_handle_rw_error(s, -ret,  BM_STATUS_DMA_RETRY))
-            return;
-    }
-
-    n = s->io_buffer_size >> 9;
-    sector_num = ide_get_sector(s);
-    if (n > 0) {
-        dma_buf_commit(s, 0);
-        sector_num += n;
-        ide_set_sector(s, sector_num);
-        s->nsector -= n;
-    }
-
-    /* end of transfer ? */
-    if (s->nsector == 0) {
-        s->status = READY_STAT | SEEK_STAT;
-        ide_set_irq(s->bus);
-    eot:
-        bm->status |= BM_STATUS_INT;
-        ide_dma_set_inactive(bm);
-        return;
-    }
-
-    n = s->nsector;
-    s->io_buffer_size = n * 512;
-    /* launch next transfer */
-    if (dma_buf_prepare(bm, 0) == 0)
-        goto eot;
-#ifdef DEBUG_AIO
-    printf("aio_write: sector_num=%" PRId64 " n=%d\n", sector_num, n);
-#endif
-    bm->aiocb = dma_bdrv_write(s->bs, &s->sg, sector_num, ide_write_dma_cb, bm);
-    ide_dma_submit_check(s, ide_write_dma_cb, bm);
-}
-
-static void ide_sector_write_dma(IDEState *s)
-{
-    s->status = READY_STAT | SEEK_STAT | DRQ_STAT | BUSY_STAT;
-    s->io_buffer_index = 0;
-    s->io_buffer_size = 0;
-    s->is_read = 0;
-    ide_dma_start(s, ide_write_dma_cb);
-}
-
 void ide_atapi_cmd_ok(IDEState *s)
 {
     s->error = 0;
@@ -1003,7 +962,7 @@ static void ide_atapi_cmd_reply(IDEState
 
     if (s->atapi_dma) {
     	s->status = READY_STAT | SEEK_STAT | DRQ_STAT;
-	ide_dma_start(s, ide_atapi_cmd_read_dma_cb);
+	ide_dma_start(s, ide_atapi_cmd_read_dma_cb, NULL);
     } else {
     	s->status = READY_STAT | SEEK_STAT;
     	ide_atapi_cmd_reply_end(s);
@@ -1111,7 +1070,7 @@ static void ide_atapi_cmd_read_dma(IDESt
 
     /* XXX: check if BUSY_STAT should be set */
     s->status = READY_STAT | SEEK_STAT | DRQ_STAT | BUSY_STAT;
-    ide_dma_start(s, ide_atapi_cmd_read_dma_cb);
+    ide_dma_start(s, ide_atapi_cmd_read_dma_cb, NULL);
 }
 
 static void ide_atapi_cmd_read(IDEState *s, int lba, int nb_sectors,
@@ -1968,7 +1927,7 @@ void ide_ioport_write(void *opaque, uint
             if (!s->bs)
                 goto abort_cmd;
 	    ide_cmd_lba48_transform(s, lba48);
-            ide_sector_read_dma(s);
+            ide_sector_dma(s, bdrv_aio_readv, 1);
             break;
 	case WIN_WRITEDMA_EXT:
 	    lba48 = 1;
@@ -1977,7 +1936,7 @@ void ide_ioport_write(void *opaque, uint
             if (!s->bs)
                 goto abort_cmd;
 	    ide_cmd_lba48_transform(s, lba48);
-            ide_sector_write_dma(s);
+            ide_sector_dma(s, bdrv_aio_writev, 0);
             s->media_changed = 1;
             break;
         case WIN_READ_NATIVE_MAX_EXT:
@@ -2912,13 +2871,15 @@ const VMStateDescription vmstate_ide_bus
 /***********************************************************/
 /* PCI IDE definitions */
 
-static void ide_dma_start(IDEState *s, BlockDriverCompletionFunc *dma_cb)
+static void ide_dma_start(IDEState *s, BlockDriverCompletionFunc *dma_cb,
+       DMAIOFunc *io_func)
 {
     BMDMAState *bm = s->bus->bmdma;
     if(!bm)
         return;
     bm->unit = s->unit;
     bm->dma_cb = dma_cb;
+    bm->io_func = io_func;
     bm->cur_prd_last = 0;
     bm->cur_prd_addr = 0;
     bm->cur_prd_len = 0;
@@ -2936,15 +2897,11 @@ static void ide_dma_restart(IDEState *s,
     s->io_buffer_index = 0;
     s->io_buffer_size = 0;
     s->nsector = bm->nsector;
+    s->is_read = is_read;
     bm->cur_addr = bm->addr;
 
-    if (is_read) {
-        bm->dma_cb = ide_read_dma_cb;
-    } else {
-        bm->dma_cb = ide_write_dma_cb;
-    }
-
-    ide_dma_start(s, bm->dma_cb);
+    bm->dma_cb = ide_dma_cb;
+    ide_dma_start(s, bm->dma_cb, bm->io_func);
 }
 
 void ide_dma_cancel(BMDMAState *bm)
Index: qemu/hw/ide/internal.h
===================================================================
--- qemu.orig/hw/ide/internal.h	2010-12-10 11:17:57.348254927 +0100
+++ qemu/hw/ide/internal.h	2010-12-10 11:33:41.444006150 +0100
@@ -433,7 +433,6 @@ struct IDEState {
     uint32_t mdata_size;
     uint8_t *mdata_storage;
     int media_changed;
-    /* for pmac */
     int is_read;
     /* SMART */
     uint8_t smart_enabled;
@@ -493,6 +492,7 @@ struct BMDMAState {
     uint8_t unit;
     BlockDriverCompletionFunc *dma_cb;
     BlockDriverAIOCB *aiocb;
+    DMAIOFunc *io_func;
     struct iovec iov;
     QEMUIOVector qiov;
     int64_t sector_num;
Index: qemu/hw/ide/macio.c
===================================================================
--- qemu.orig/hw/ide/macio.c	2010-12-10 11:17:57.357254927 +0100
+++ qemu/hw/ide/macio.c	2010-12-10 11:19:00.936260584 +0100
@@ -146,12 +146,8 @@ static void pmac_ide_transfer_cb(void *o
     io->addr += io->len;
     io->len = 0;
 
-    if (s->is_read)
-        m->aiocb = dma_bdrv_read(s->bs, &s->sg, sector_num,
-		                 pmac_ide_transfer_cb, io);
-    else
-        m->aiocb = dma_bdrv_write(s->bs, &s->sg, sector_num,
-		                  pmac_ide_transfer_cb, io);
+    m->aiocb = dma_bdrv_io(s->bs, &s->sg, sector_num, s->io_func,
+		           pmac_ide_transfer_cb, io, !s->is_read);
     if (!m->aiocb)
         pmac_ide_transfer_cb(io, -1);
 }

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH 5/7] ide: also reset io_buffer_index for writes
  2010-12-10 15:00 [Qemu-devel] ]PATCH 0/7] add TRIM/UNMAP support, v3 Christoph Hellwig
                   ` (3 preceding siblings ...)
  2010-12-10 15:01 ` [Qemu-devel] [PATCH 4/7] ide: factor dma handling helpers Christoph Hellwig
@ 2010-12-10 15:01 ` Christoph Hellwig
  2010-12-10 15:01 ` [Qemu-devel] [PATCH 6/7] ide: add TRIM support Christoph Hellwig
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2010-12-10 15:01 UTC (permalink / raw)
  To: qemu-devel

Currenly the code only resets the io_buffer_index field for reads,
but the code seems to expect this for all types of I/O.  I guess
we simply don't hit large enough transfers that would require this
often enough.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: qemu/hw/ide/core.c
===================================================================
--- qemu.orig/hw/ide/core.c	2010-12-10 11:35:23.164005731 +0100
+++ qemu/hw/ide/core.c	2010-12-10 11:35:30.471253949 +0100
@@ -605,8 +605,7 @@ static void ide_dma_cb(void *opaque, int
 
     /* launch next transfer */
     n = s->nsector;
-    if (s->is_read)
-        s->io_buffer_index = 0;
+    s->io_buffer_index = 0;
     s->io_buffer_size = n * 512;
     if (dma_buf_prepare(bm, s->is_read) == 0)
         goto eot;

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH 6/7] ide: add TRIM support
  2010-12-10 15:00 [Qemu-devel] ]PATCH 0/7] add TRIM/UNMAP support, v3 Christoph Hellwig
                   ` (4 preceding siblings ...)
  2010-12-10 15:01 ` [Qemu-devel] [PATCH 5/7] ide: also reset io_buffer_index for writes Christoph Hellwig
@ 2010-12-10 15:01 ` Christoph Hellwig
  2010-12-10 15:01 ` [Qemu-devel] [PATCH 7/7] raw-posix: add discard support Christoph Hellwig
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2010-12-10 15:01 UTC (permalink / raw)
  To: qemu-devel

Add support for the data set management command, and the TRIM sub function
in it.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: qemu/hw/ide/core.c
===================================================================
--- qemu.orig/hw/ide/core.c	2010-12-10 11:35:30.471253949 +0100
+++ qemu/hw/ide/core.c	2010-12-10 11:35:33.430253739 +0100
@@ -146,6 +146,9 @@ static void ide_identify(IDEState *s)
     put_le16(p + 66, 120);
     put_le16(p + 67, 120);
     put_le16(p + 68, 120);
+    if (dev && dev->conf.discard_granularity) {
+        put_le16(p + 69, (1 << 14)); /* determinate TRIM behavior */
+    }
     put_le16(p + 80, 0xf0); /* ata3 -> ata6 supported */
     put_le16(p + 81, 0x16); /* conforms to ata5 */
     /* 14=NOP supported, 5=WCACHE supported, 0=SMART supported */
@@ -172,6 +175,9 @@ static void ide_identify(IDEState *s)
     dev = s->unit ? s->bus->slave : s->bus->master;
     if (dev && dev->conf.physical_block_size)
         put_le16(p + 106, 0x6000 | get_physical_block_exp(&dev->conf));
+    if (dev && dev->conf.discard_granularity) {
+        put_le16(p + 169, 1); /* TRIM support */
+    }
 
     memcpy(s->identify_data, p, sizeof(s->identify_data));
     s->identify_set = 1;
@@ -1746,6 +1752,72 @@ static void ide_clear_hob(IDEBus *bus)
     bus->ifs[1].select &= ~(1 << 7);
 }
 
+typedef struct TrimAIOCB {
+    BlockDriverAIOCB common;
+    QEMUBH *bh;
+    int ret;
+} TrimAIOCB;
+
+static void trim_aio_cancel(BlockDriverAIOCB *acb)
+{
+    TrimAIOCB *iocb = container_of(acb, TrimAIOCB, common);
+
+    qemu_bh_delete(iocb->bh);
+    iocb->bh = NULL;
+    qemu_aio_release(iocb);
+}
+
+static AIOPool trim_aio_pool = {
+    .aiocb_size         = sizeof(TrimAIOCB),
+    .cancel             = trim_aio_cancel,
+};
+
+static void ide_trim_bh_cb(void *opaque)
+{
+    TrimAIOCB *iocb = opaque;
+
+    iocb->common.cb(iocb->common.opaque, iocb->ret);
+
+    qemu_bh_delete(iocb->bh);
+    iocb->bh = NULL;
+
+    qemu_aio_release(iocb);
+}
+
+static BlockDriverAIOCB *ide_issue_trim(BlockDriverState *bs,
+        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
+        BlockDriverCompletionFunc *cb, void *opaque)
+{
+    TrimAIOCB *iocb;
+    int i, j, ret;
+
+    iocb = qemu_aio_get(&trim_aio_pool, bs, cb, opaque);
+    iocb->bh = qemu_bh_new(ide_trim_bh_cb, iocb);
+    iocb->ret = 0;
+
+    for (j = 0; j < qiov->niov; j++) {
+        uint64_t *buffer = qiov->iov[j].iov_base;
+
+        for (i = 0; i < qiov->iov[j].iov_len / 8; i++) {
+            /* 6-byte LBA + 2-byte range per entry */
+            uint64_t entry = le64_to_cpu(buffer[i]);
+            uint64_t sector = entry & 0x0000ffffffffffffULL;
+            uint16_t count = entry >> 48;
+
+            if (count == 0)
+                break;
+
+            ret = bdrv_discard(bs, sector * 512, count * 512);
+            if (!iocb->ret)
+                iocb->ret = ret;
+        }
+    }
+
+    qemu_bh_schedule(iocb->bh);
+
+    return &iocb->common;
+}
+
 void ide_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 {
     IDEBus *bus = opaque;
@@ -1825,6 +1897,17 @@ void ide_ioport_write(void *opaque, uint
             break;
 
         switch(val) {
+        case WIN_DSM:
+            switch (s->feature) {
+            case DSM_TRIM:
+                if (!s->bs)
+                   goto abort_cmd;
+                ide_sector_dma(s, ide_issue_trim, 0);
+                break;
+            default:
+                goto abort_cmd;
+            }
+            break;
         case WIN_IDENTIFY:
             if (s->bs && s->drive_kind != IDE_CD) {
                 if (s->drive_kind != IDE_CFATA)
Index: qemu/hw/ide/internal.h
===================================================================
--- qemu.orig/hw/ide/internal.h	2010-12-10 11:33:41.444006150 +0100
+++ qemu/hw/ide/internal.h	2010-12-10 11:35:33.446003914 +0100
@@ -60,7 +60,11 @@ typedef struct BMDMAState BMDMAState;
  */
 #define CFA_REQ_EXT_ERROR_CODE		0x03 /* CFA Request Extended Error Code */
 /*
- *	0x04->0x07 Reserved
+ *      0x04->0x05 Reserved
+ */
+#define WIN_DSM                         0x06
+/*
+ *      0x07 Reserved
  */
 #define WIN_SRST			0x08 /* ATAPI soft reset command */
 #define WIN_DEVICE_RESET		0x08
@@ -188,6 +192,9 @@ typedef struct BMDMAState BMDMAState;
 
 #define IDE_DMA_BUF_SECTORS 256
 
+/* feature values for Data Set Management */
+#define DSM_TRIM                        0x01
+
 #if (IDE_DMA_BUF_SECTORS < MAX_MULT_SECTORS)
 #error "IDE_DMA_BUF_SECTORS must be bigger or equal to MAX_MULT_SECTORS"
 #endif
Index: qemu/hw/ide/qdev.c
===================================================================
--- qemu.orig/hw/ide/qdev.c	2010-12-10 11:24:19.324253880 +0100
+++ qemu/hw/ide/qdev.c	2010-12-10 11:35:33.450257860 +0100
@@ -110,6 +110,11 @@ static int ide_drive_initfn(IDEDevice *d
     const char *serial;
     DriveInfo *dinfo;
 
+    if (dev->conf.discard_granularity && dev->conf.discard_granularity != 512) {
+        error_report("discard_granularity must be 512 for ide");
+        return -1;
+    }
+
     serial = dev->serial;
     if (!serial) {
         /* try to fall back to value set with legacy -drive serial=... */

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH 7/7] raw-posix: add discard support
  2010-12-10 15:00 [Qemu-devel] ]PATCH 0/7] add TRIM/UNMAP support, v3 Christoph Hellwig
                   ` (5 preceding siblings ...)
  2010-12-10 15:01 ` [Qemu-devel] [PATCH 6/7] ide: add TRIM support Christoph Hellwig
@ 2010-12-10 15:01 ` Christoph Hellwig
  2010-12-12 15:28 ` [Qemu-devel] ]PATCH 0/7] add TRIM/UNMAP support, v3 Stefan Hajnoczi
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2010-12-10 15:01 UTC (permalink / raw)
  To: qemu-devel

Add support to discard blocks in a raw image residing on an XFS filesystem
by calling the XFS_IOC_UNRESVSP64 ioctl to punch holes.  Support for other
hole punching mechanisms can be added when they become available.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: qemu/block/raw-posix.c
===================================================================
--- qemu.orig/block/raw-posix.c	2010-12-07 09:35:40.806004264 +0100
+++ qemu/block/raw-posix.c	2010-12-07 09:44:30.093290897 +0100
@@ -69,6 +69,10 @@
 #include <sys/diskslice.h>
 #endif
 
+#ifdef CONFIG_XFS
+#include <xfs/xfs.h>
+#endif
+
 //#define DEBUG_FLOPPY
 
 //#define DEBUG_BLOCK
@@ -120,6 +124,9 @@ typedef struct BDRVRawState {
 #endif
     uint8_t *aligned_buf;
     unsigned aligned_buf_size;
+#ifdef CONFIG_XFS
+    bool is_xfs : 1;
+#endif
 } BDRVRawState;
 
 static int fd_open(BlockDriverState *bs);
@@ -196,6 +203,12 @@ static int raw_open_common(BlockDriverSt
 #endif
     }
 
+#ifdef CONFIG_XFS
+    if (platform_test_xfs_fd(s->fd)) {
+        s->is_xfs = 1;
+    }
+#endif
+
     return 0;
 
 out_free_buf:
@@ -740,6 +753,36 @@ static int raw_flush(BlockDriverState *b
     return qemu_fdatasync(s->fd);
 }
 
+#ifdef CONFIG_XFS
+static int xfs_discard(BDRVRawState *s, int64_t sector_num, int nb_sectors)
+{
+    struct xfs_flock64 fl;
+
+    memset(&fl, 0, sizeof(fl));
+    fl.l_whence = SEEK_SET;
+    fl.l_start = sector_num << 9;
+    fl.l_len = (int64_t)nb_sectors << 9;
+
+    if (xfsctl(NULL, s->fd, XFS_IOC_UNRESVSP64, &fl) < 0) {
+        printf("cannot punch hole (%s)\n", strerror(errno));
+        return -errno;
+    }
+
+    return 0;
+}
+#endif
+
+static int raw_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors)
+{
+#ifdef CONFIG_XFS
+    BDRVRawState *s = bs->opaque;
+
+    if (s->is_xfs)
+        return xfs_discard(s, sector_num, nb_sectors);
+#endif
+
+    return 0;
+}
 
 static QEMUOptionParameter raw_create_options[] = {
     {
@@ -761,6 +804,7 @@ static BlockDriver bdrv_file = {
     .bdrv_close = raw_close,
     .bdrv_create = raw_create,
     .bdrv_flush = raw_flush,
+    .bdrv_discard = raw_discard,
 
     .bdrv_aio_readv = raw_aio_readv,
     .bdrv_aio_writev = raw_aio_writev,
Index: qemu/configure
===================================================================
--- qemu.orig/configure	2010-12-07 09:35:40.815261632 +0100
+++ qemu/configure	2010-12-07 09:44:52.764255834 +0100
@@ -288,6 +288,7 @@ xen=""
 linux_aio=""
 attr=""
 vhost_net=""
+xfs=""
 
 gprof="no"
 debug_tcg="no"
@@ -1393,6 +1394,27 @@ EOF
 fi
 
 ##########################################
+# xfsctl() probe, used for raw-posix
+if test "$xfs" != "no" ; then
+  cat > $TMPC << EOF
+#include <xfs/xfs.h>
+int main(void)
+{
+    xfsctl(NULL, 0, 0, NULL);
+    return 0;
+}
+EOF
+  if compile_prog "" "" ; then
+    xfs="yes"
+  else
+    if test "$xfs" = "yes" ; then
+      feature_not_found "xfs"
+    fi
+    xfs=no
+  fi
+fi
+
+##########################################
 # vde libraries probe
 if test "$vde" != "no" ; then
   vde_libs="-lvdeplug"
@@ -2354,6 +2376,7 @@ echo "vhost-net support $vhost_net"
 echo "Trace backend     $trace_backend"
 echo "Trace output file $trace_file-<pid>"
 echo "spice support     $spice"
+echo "xfsctl support    $xfs"
 
 if test $sdl_too_old = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -2499,6 +2522,9 @@ fi
 if test "$uuid" = "yes" ; then
   echo "CONFIG_UUID=y" >> $config_host_mak
 fi
+if test "$xfs" = "yes" ; then
+  echo "CONFIG_XFS=y" >> $config_host_mak
+fi
 qemu_version=`head $source_path/VERSION`
 echo "VERSION=$qemu_version" >>$config_host_mak
 echo "PKGVERSION=$pkgversion" >>$config_host_mak

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] ]PATCH 0/7] add TRIM/UNMAP support, v3
  2010-12-10 15:00 [Qemu-devel] ]PATCH 0/7] add TRIM/UNMAP support, v3 Christoph Hellwig
                   ` (6 preceding siblings ...)
  2010-12-10 15:01 ` [Qemu-devel] [PATCH 7/7] raw-posix: add discard support Christoph Hellwig
@ 2010-12-12 15:28 ` Stefan Hajnoczi
  2010-12-13 15:45   ` Christoph Hellwig
  2010-12-16 15:43 ` Kevin Wolf
  2010-12-16 18:36 ` [Qemu-devel] Re: [PATCH 0/3] add TRIM/UNMAP support, v4 Christoph Hellwig
  9 siblings, 1 reply; 21+ messages in thread
From: Stefan Hajnoczi @ 2010-12-12 15:28 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: qemu-devel

Do you have qemu-io support for discard?

Any hints on testing this?  A recent guest kernel and ext -o discard
might exercise the code but I haven't tried yet.

Stefan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] ]PATCH 0/7] add TRIM/UNMAP support, v3
  2010-12-12 15:28 ` [Qemu-devel] ]PATCH 0/7] add TRIM/UNMAP support, v3 Stefan Hajnoczi
@ 2010-12-13 15:45   ` Christoph Hellwig
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2010-12-13 15:45 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel

On Sun, Dec 12, 2010 at 03:28:14PM +0000, Stefan Hajnoczi wrote:
> Do you have qemu-io support for discard?

Now that you wrote it we have the support :)

> Any hints on testing this?  A recent guest kernel and ext -o discard
> might exercise the code but I haven't tried yet.

Anything that submits a discard in the guest is fine.  The simples thing
to test are the various mkfs tools, as they do a whole device discard.
Also -o discard for various Linux filesystem works, Mark Lord's wiper.sh
script, or any Windows 7 installation.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] ]PATCH 0/7] add TRIM/UNMAP support, v3
  2010-12-10 15:00 [Qemu-devel] ]PATCH 0/7] add TRIM/UNMAP support, v3 Christoph Hellwig
                   ` (7 preceding siblings ...)
  2010-12-12 15:28 ` [Qemu-devel] ]PATCH 0/7] add TRIM/UNMAP support, v3 Stefan Hajnoczi
@ 2010-12-16 15:43 ` Kevin Wolf
  2010-12-16 16:42   ` Christoph Hellwig
  2010-12-16 18:36 ` [Qemu-devel] Re: [PATCH 0/3] add TRIM/UNMAP support, v4 Christoph Hellwig
  9 siblings, 1 reply; 21+ messages in thread
From: Kevin Wolf @ 2010-12-16 15:43 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: qemu-devel

Am 10.12.2010 16:00, schrieb Christoph Hellwig:
> This patchset adds support for the ATA TRIM and SCSI WRITE SAME with
> unmap commands, which allow reclaiming free space from a backing image.
> 
> The user facing implementation is pretty complete, but not really
> efficient because the underlying bdrv_discard implementation doesn't
> use the aio implementation yet.  The reason for that is that the SCSI
> layer doesn't really allow any asynchronous commands except for
> READ/WRITE by design, and implementing the ATA TRIM command with it's
> multiple ranges is rather painful, and combined with the SCSI limitation
> I didn't bother yet.  The only backend support so far is the XFS hole
> punching ioctl, but others can be added easily when they become
> available.  A virtio implementation for a discard command would also
> be pretty easy, but until we actually support a better backend then
> a plain sparse file it's not worth using for production enviroments
> anyway, but more for playing with the thin provisioning infrastructure,
> or observing guest behaviour when TRIM / unmap is supported.
> 
> If the support is enabled and the backend doesn't support hole punching
> the TRIM / WRITE SAME commands become no-ops so that migration from
> hosts supporting or not supporting it works.
> 
> Version 3:
> 	- refactor IDE dma support code
> 	- proper brace obsfucation
> 	- fix compile without xfs headers
> 	- use bool instead of int for a one-byte flag
> 
> Version 2:
> 	- replace tabs with spaces
> 	- return -ENOMEDIUM from bdrv_discard if there's no driver
> 	  assigned
> 	- actually list the TP EVPD page as supported when querying
> 	  for supported EVPD pages

The SCSI changes seem to apply, but the rest conflicts with recent
changes, most notably AHCI. Can you rebase on top of the block branch?

I also found some minor things in the SCSI code, so I'll take the chance
to comment on them.

Kevin

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH 2/7] scsi-disk: support WRITE SAME (16) with unmap bit
  2010-12-10 15:00 ` [Qemu-devel] [PATCH 2/7] scsi-disk: support WRITE SAME (16) with unmap bit Christoph Hellwig
@ 2010-12-16 15:48   ` Kevin Wolf
  2010-12-16 16:41     ` Christoph Hellwig
  0 siblings, 1 reply; 21+ messages in thread
From: Kevin Wolf @ 2010-12-16 15:48 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: qemu-devel

Am 10.12.2010 16:00, schrieb Christoph Hellwig:
> Support discards via the WRITE SAME command with the unmap bit set, and
> tell the initiator about the support for it via the block limit and the
> new thin provisioning EVPD pages.  Also fix the comment which incorrectly
> describedthe block limits EVPD page.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> Index: qemu/hw/scsi-disk.c
> ===================================================================
> --- qemu.orig/hw/scsi-disk.c	2010-12-07 09:35:44.203009851 +0100
> +++ qemu/hw/scsi-disk.c	2010-12-07 09:42:07.984255141 +0100
> @@ -424,7 +424,8 @@ static int scsi_disk_emulate_inquiry(SCS
>              outbuf[buflen++] = 0x80; // unit serial number
>              outbuf[buflen++] = 0x83; // device identification
>              if (bdrv_get_type_hint(s->bs) != BDRV_TYPE_CDROM) {
> -                outbuf[buflen++] = 0xb0; // block device characteristics
> +                outbuf[buflen++] = 0xb0; // block limits
> +                outbuf[buflen++] = 0xb2; // thin provisioning
>              }
>              outbuf[pages] = buflen - pages - 1; // number of pages
>              break;
> @@ -466,8 +467,10 @@ static int scsi_disk_emulate_inquiry(SCS
>              buflen += id_len;
>              break;
>          }
> -        case 0xb0: /* block device characteristics */
> +        case 0xb0: /* block limits */
>          {
> +            unsigned int unmap_sectors =
> +                    s->qdev.conf.discard_granularity / s->qdev.blocksize;
>              unsigned int min_io_size =
>                      s->qdev.conf.min_io_size / s->qdev.blocksize;
>              unsigned int opt_io_size =
> @@ -492,6 +495,21 @@ static int scsi_disk_emulate_inquiry(SCS
>              outbuf[13] = (opt_io_size >> 16) & 0xff;
>              outbuf[14] = (opt_io_size >> 8) & 0xff;
>              outbuf[15] = opt_io_size & 0xff;
> +
> +            /* optimal unmap granularity */
> +            outbuf[28] = (unmap_sectors >> 24) & 0xff;
> +            outbuf[29] = (unmap_sectors >> 16) & 0xff;
> +            outbuf[30] = (unmap_sectors >> 8) & 0xff;
> +            outbuf[31] = unmap_sectors & 0xff;
> +            break;
> +        }
> +        case 0xb2: /* thin provisioning */
> +        {
> +            outbuf[3] = buflen = 8;
> +            outbuf[4] = 0;
> +            outbuf[5] = 0x40; /* write same with unmap supported */
> +            outbuf[6] = 0;
> +            outbuf[7] = 0;
>              break;
>          }
>          default:
> @@ -959,6 +977,11 @@ static int scsi_disk_emulate_command(SCS
>              outbuf[11] = 0;
>              outbuf[12] = 0;
>              outbuf[13] = get_physical_block_exp(&s->qdev.conf);
> +
> +            /* set TPE bit if the format supports discard */
> +            if (s->qdev.conf.discard_granularity)
> +                outbuf[14] = 0x80;

Missing braces.

> +
>              /* Protection, exponent and lowest lba field left blank. */
>              buflen = req->cmd.xfer;
>              break;
> @@ -1123,6 +1146,34 @@ static int32_t scsi_send_command(SCSIDev
>              goto illegal_lba;
>          }
>          break;
> +    case WRITE_SAME_16:
> +        len = r->req.cmd.xfer / d->blocksize;
> +
> +        DPRINTF("WRITE SAME(16) (sector %" PRId64 ", count %d)\n",
> +                r->req.cmd.lba, len);
> +
> +        if (r->req.cmd.lba > s->max_lba) {
> +            goto illegal_lba;
> +        }
> +
> +        /*
> +         * We only support WRITE SAME with the unmap bit set for now.
> +         */
> +        if (!(buf[1] & 0x8)) {
> +            goto fail;
> +        }
> +
> +        rc = bdrv_discard(s->bs, r->req.cmd.lba * s->cluster_size,
> +                          len * s->cluster_size);
> +        if (rc < 0) {
> +            /* XXX: better error code ?*/
> +            goto fail;
> +        }
> +
> +        scsi_req_set_status(r, GOOD, NO_SENSE);
> +        scsi_req_complete(&r->req);
> +        scsi_remove_request(r);

Isn't this the same as scsi_command_complete()?

> +        return 0;

And if you break; here instead of returning (like all other commands)
and remove the three lines above completely, I think it would just do
the right thing.

Kevin

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH 2/7] scsi-disk: support WRITE SAME (16) with unmap bit
  2010-12-16 15:48   ` Kevin Wolf
@ 2010-12-16 16:41     ` Christoph Hellwig
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2010-12-16 16:41 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Christoph Hellwig, qemu-devel

On Thu, Dec 16, 2010 at 04:48:15PM +0100, Kevin Wolf wrote:
> > +        scsi_req_set_status(r, GOOD, NO_SENSE);
> > +        scsi_req_complete(&r->req);
> > +        scsi_remove_request(r);
> 
> Isn't this the same as scsi_command_complete()?

Yes.

> 
> > +        return 0;
> 
> And if you break; here instead of returning (like all other commands)
> and remove the three lines above completely, I think it would just do
> the right thing.

Yes, that looks doable.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] ]PATCH 0/7] add TRIM/UNMAP support, v3
  2010-12-16 15:43 ` Kevin Wolf
@ 2010-12-16 16:42   ` Christoph Hellwig
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2010-12-16 16:42 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Christoph Hellwig, qemu-devel

On Thu, Dec 16, 2010 at 04:43:20PM +0100, Kevin Wolf wrote:
> The SCSI changes seem to apply, but the rest conflicts with recent
> changes, most notably AHCI. Can you rebase on top of the block branch?

I've tried to, but with the maze of pointer indirections I'm totally
lost for now.  I'll have to drop the IDE side for now until that area
gets less messy or I get a lot more time.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Qemu-devel] Re: [PATCH 0/3] add TRIM/UNMAP support, v4
  2010-12-10 15:00 [Qemu-devel] ]PATCH 0/7] add TRIM/UNMAP support, v3 Christoph Hellwig
                   ` (8 preceding siblings ...)
  2010-12-16 15:43 ` Kevin Wolf
@ 2010-12-16 18:36 ` Christoph Hellwig
  2010-12-16 18:36   ` [Qemu-devel] [PATCH 1/3] block: add discard support Christoph Hellwig
                     ` (2 more replies)
  9 siblings, 3 replies; 21+ messages in thread
From: Christoph Hellwig @ 2010-12-16 18:36 UTC (permalink / raw)
  To: qemu-devel

This patchset adds support for the SCSI WRITE SAME with unmap command,
which allows reclaiming free space from a backing image.

The user facing implementation is pretty complete, but not really
efficient because the underlying bdrv_discard implementation doesn't
use the aio implementation yet.  The reason for that is that the SCSI
layer doesn't really allow any asynchronous commands except for
READ/WRITE by design.  The only support backend so far is the XFS hole
punching ioctl, but others can be added easily when they become
available.  A virtio implementation for a discard command would also
be pretty easy, but until we actually support a better backend then
a plain sparse file it's not worth using for production enviroments
anyway, but more for playing with the thin provisioning infrastructure,
or observing guest behaviour when TRIM / unmap is supported.

If the support is enabled and the backend doesn't support hole punching
the WRITE SAME command becomes a no-op so that migration from
hosts supporting or not supporting it works.

Version 4:
	- incorporate Kevin's review comments for the scsi patch
	- update to the current block queue
	- drop ide TRIM support, as it doesn't easily port to the
	  refactoring of the IDE code

Version 3:
 	- refactor IDE dma support code
 	- proper brace obsfucation
 	- fix compile without xfs headers
 	- use bool instead of int for a one-byte flag
 
Version 2:
 	- replace tabs with spaces
 	- return -ENOMEDIUM from bdrv_discard if there's no driver
 	  assigned
 	- actually list the TP EVPD page as supported when querying
 	  for supported EVPD pages

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH 1/3] block: add discard support
  2010-12-16 18:36 ` [Qemu-devel] Re: [PATCH 0/3] add TRIM/UNMAP support, v4 Christoph Hellwig
@ 2010-12-16 18:36   ` Christoph Hellwig
  2010-12-16 18:36   ` [Qemu-devel] [PATCH 2/3] scsi-disk: support WRITE SAME (16) with unmap bit Christoph Hellwig
  2010-12-16 18:36   ` [Qemu-devel] [PATCH 3/3] raw-posix: add discard support Christoph Hellwig
  2 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2010-12-16 18:36 UTC (permalink / raw)
  To: qemu-devel

Add a new bdrv_discard method to free blocks in a mapping image, and a new
drive property to set the granularity for these discard.  If no discard
granularity support is set discard support is disabled.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: qemu/block.c
===================================================================
--- qemu.orig/block.c	2010-12-16 16:51:43.000000000 +0100
+++ qemu/block.c	2010-12-16 16:52:05.875253180 +0100
@@ -1515,6 +1515,17 @@ int bdrv_has_zero_init(BlockDriverState
     return 1;
 }
 
+int bdrv_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors)
+{
+    if (!bs->drv) {
+        return -ENOMEDIUM;
+    }
+    if (!bs->drv->bdrv_discard) {
+        return 0;
+    }
+    return bs->drv->bdrv_discard(bs, sector_num, nb_sectors);
+}
+
 /*
  * Returns true iff the specified sector is present in the disk image. Drivers
  * not implementing the functionality are assumed to not support backing files,
Index: qemu/block.h
===================================================================
--- qemu.orig/block.h	2010-12-16 16:51:43.000000000 +0100
+++ qemu/block.h	2010-12-16 16:52:05.875253180 +0100
@@ -146,6 +146,7 @@ int bdrv_flush(BlockDriverState *bs);
 void bdrv_flush_all(void);
 void bdrv_close_all(void);
 
+int bdrv_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors);
 int bdrv_has_zero_init(BlockDriverState *bs);
 int bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
 	int *pnum);
Index: qemu/block_int.h
===================================================================
--- qemu.orig/block_int.h	2010-12-16 16:51:43.000000000 +0100
+++ qemu/block_int.h	2010-12-16 16:52:38.542254787 +0100
@@ -72,6 +72,8 @@ struct BlockDriver {
         BlockDriverCompletionFunc *cb, void *opaque);
     BlockDriverAIOCB *(*bdrv_aio_flush)(BlockDriverState *bs,
         BlockDriverCompletionFunc *cb, void *opaque);
+    int (*bdrv_discard)(BlockDriverState *bs, int64_t sector_num,
+                        int nb_sectors);
 
     int (*bdrv_aio_multiwrite)(BlockDriverState *bs, BlockRequest *reqs,
         int num_reqs);
@@ -227,6 +229,7 @@ typedef struct BlockConf {
     uint16_t min_io_size;
     uint32_t opt_io_size;
     int32_t bootindex;
+    uint32_t discard_granularity;
 } BlockConf;
 
 static inline unsigned int get_physical_block_exp(BlockConf *conf)
@@ -250,6 +253,8 @@ static inline unsigned int get_physical_
                        _conf.physical_block_size, 512),                 \
     DEFINE_PROP_UINT16("min_io_size", _state, _conf.min_io_size, 0),  \
     DEFINE_PROP_UINT32("opt_io_size", _state, _conf.opt_io_size, 0),    \
-    DEFINE_PROP_INT32("bootindex", _state, _conf.bootindex, -1)         \
+    DEFINE_PROP_INT32("bootindex", _state, _conf.bootindex, -1),        \
+    DEFINE_PROP_UINT32("discard_granularity", _state, \
+                       _conf.discard_granularity, 0)
 
 #endif /* BLOCK_INT_H */
Index: qemu/block/raw.c
===================================================================
--- qemu.orig/block/raw.c	2010-12-15 10:05:38.000000000 +0100
+++ qemu/block/raw.c	2010-12-16 16:52:05.882253111 +0100
@@ -65,6 +65,11 @@ static int raw_probe(const uint8_t *buf,
    return 1; /* everything can be opened as raw image */
 }
 
+static int raw_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors)
+{
+    return bdrv_discard(bs->file, sector_num, nb_sectors);
+}
+
 static int raw_is_inserted(BlockDriverState *bs)
 {
     return bdrv_is_inserted(bs->file);
@@ -130,6 +135,7 @@ static BlockDriver bdrv_raw = {
     .bdrv_aio_readv     = raw_aio_readv,
     .bdrv_aio_writev    = raw_aio_writev,
     .bdrv_aio_flush     = raw_aio_flush,
+    .bdrv_discard       = raw_discard,
 
     .bdrv_is_inserted   = raw_is_inserted,
     .bdrv_eject         = raw_eject,

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH 2/3] scsi-disk: support WRITE SAME (16) with unmap bit
  2010-12-16 18:36 ` [Qemu-devel] Re: [PATCH 0/3] add TRIM/UNMAP support, v4 Christoph Hellwig
  2010-12-16 18:36   ` [Qemu-devel] [PATCH 1/3] block: add discard support Christoph Hellwig
@ 2010-12-16 18:36   ` Christoph Hellwig
  2010-12-16 18:36   ` [Qemu-devel] [PATCH 3/3] raw-posix: add discard support Christoph Hellwig
  2 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2010-12-16 18:36 UTC (permalink / raw)
  To: qemu-devel

Support discards via the WRITE SAME command with the unmap bit set, and
tell the initiator about the support for it via the block limit and the
new thin provisioning EVPD pages.  Also fix the comment which incorrectly
describedthe block limits EVPD page.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: qemu/hw/scsi-disk.c
===================================================================
--- qemu.orig/hw/scsi-disk.c	2010-12-15 10:06:14.260003984 +0100
+++ qemu/hw/scsi-disk.c	2010-12-16 16:57:26.881003566 +0100
@@ -424,7 +424,8 @@ static int scsi_disk_emulate_inquiry(SCS
             outbuf[buflen++] = 0x80; // unit serial number
             outbuf[buflen++] = 0x83; // device identification
             if (bdrv_get_type_hint(s->bs) != BDRV_TYPE_CDROM) {
-                outbuf[buflen++] = 0xb0; // block device characteristics
+                outbuf[buflen++] = 0xb0; // block limits
+                outbuf[buflen++] = 0xb2; // thin provisioning
             }
             outbuf[pages] = buflen - pages - 1; // number of pages
             break;
@@ -466,8 +467,10 @@ static int scsi_disk_emulate_inquiry(SCS
             buflen += id_len;
             break;
         }
-        case 0xb0: /* block device characteristics */
+        case 0xb0: /* block limits */
         {
+            unsigned int unmap_sectors =
+                    s->qdev.conf.discard_granularity / s->qdev.blocksize;
             unsigned int min_io_size =
                     s->qdev.conf.min_io_size / s->qdev.blocksize;
             unsigned int opt_io_size =
@@ -492,6 +495,21 @@ static int scsi_disk_emulate_inquiry(SCS
             outbuf[13] = (opt_io_size >> 16) & 0xff;
             outbuf[14] = (opt_io_size >> 8) & 0xff;
             outbuf[15] = opt_io_size & 0xff;
+
+            /* optimal unmap granularity */
+            outbuf[28] = (unmap_sectors >> 24) & 0xff;
+            outbuf[29] = (unmap_sectors >> 16) & 0xff;
+            outbuf[30] = (unmap_sectors >> 8) & 0xff;
+            outbuf[31] = unmap_sectors & 0xff;
+            break;
+        }
+        case 0xb2: /* thin provisioning */
+        {
+            outbuf[3] = buflen = 8;
+            outbuf[4] = 0;
+            outbuf[5] = 0x40; /* write same with unmap supported */
+            outbuf[6] = 0;
+            outbuf[7] = 0;
             break;
         }
         default:
@@ -959,6 +977,12 @@ static int scsi_disk_emulate_command(SCS
             outbuf[11] = 0;
             outbuf[12] = 0;
             outbuf[13] = get_physical_block_exp(&s->qdev.conf);
+
+            /* set TPE bit if the format supports discard */
+            if (s->qdev.conf.discard_granularity) {
+                outbuf[14] = 0x80;
+            }
+
             /* Protection, exponent and lowest lba field left blank. */
             buflen = req->cmd.xfer;
             break;
@@ -1123,6 +1147,31 @@ static int32_t scsi_send_command(SCSIDev
             goto illegal_lba;
         }
         break;
+    case WRITE_SAME_16:
+        len = r->req.cmd.xfer / d->blocksize;
+
+        DPRINTF("WRITE SAME(16) (sector %" PRId64 ", count %d)\n",
+                r->req.cmd.lba, len);
+
+        if (r->req.cmd.lba > s->max_lba) {
+            goto illegal_lba;
+        }
+
+        /*
+         * We only support WRITE SAME with the unmap bit set for now.
+         */
+        if (!(buf[1] & 0x8)) {
+            goto fail;
+        }
+
+        rc = bdrv_discard(s->bs, r->req.cmd.lba * s->cluster_size,
+                          len * s->cluster_size);
+        if (rc < 0) {
+            /* XXX: better error code ?*/
+            goto fail;
+        }
+
+        break;
     default:
         DPRINTF("Unknown SCSI command (%2.2x)\n", buf[0]);
     fail:
Index: qemu/hw/scsi-defs.h
===================================================================
--- qemu.orig/hw/scsi-defs.h	2010-12-15 10:05:38.301003914 +0100
+++ qemu/hw/scsi-defs.h	2010-12-16 16:52:55.803004683 +0100
@@ -84,6 +84,7 @@
 #define MODE_SENSE_10         0x5a
 #define PERSISTENT_RESERVE_IN 0x5e
 #define PERSISTENT_RESERVE_OUT 0x5f
+#define WRITE_SAME_16         0x93
 #define MAINTENANCE_IN        0xa3
 #define MAINTENANCE_OUT       0xa4
 #define MOVE_MEDIUM           0xa5

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH 3/3] raw-posix: add discard support
  2010-12-16 18:36 ` [Qemu-devel] Re: [PATCH 0/3] add TRIM/UNMAP support, v4 Christoph Hellwig
  2010-12-16 18:36   ` [Qemu-devel] [PATCH 1/3] block: add discard support Christoph Hellwig
  2010-12-16 18:36   ` [Qemu-devel] [PATCH 2/3] scsi-disk: support WRITE SAME (16) with unmap bit Christoph Hellwig
@ 2010-12-16 18:36   ` Christoph Hellwig
  2010-12-17 10:32     ` Kevin Wolf
  2 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2010-12-16 18:36 UTC (permalink / raw)
  To: qemu-devel

Add support to discard blocks in a raw image residing on an XFS filesystem
by calling the XFS_IOC_UNRESVSP64 ioctl to punch holes.  Support for other
hole punching mechanisms can be added when they become available.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: qemu/block/raw-posix.c
===================================================================
--- qemu.orig/block/raw-posix.c	2010-12-15 10:05:37.000000000 +0100
+++ qemu/block/raw-posix.c	2010-12-16 17:40:47.617253460 +0100
@@ -69,6 +69,10 @@
 #include <sys/diskslice.h>
 #endif
 
+#ifdef CONFIG_XFS
+#include <xfs/xfs.h>
+#endif
+
 //#define DEBUG_FLOPPY
 
 //#define DEBUG_BLOCK
@@ -120,6 +124,9 @@ typedef struct BDRVRawState {
 #endif
     uint8_t *aligned_buf;
     unsigned aligned_buf_size;
+#ifdef CONFIG_XFS
+    bool is_xfs : 1;
+#endif
 } BDRVRawState;
 
 static int fd_open(BlockDriverState *bs);
@@ -196,6 +203,12 @@ static int raw_open_common(BlockDriverSt
 #endif
     }
 
+#ifdef CONFIG_XFS
+    if (platform_test_xfs_fd(s->fd)) {
+        s->is_xfs = 1;
+    }
+#endif
+
     return 0;
 
 out_free_buf:
@@ -740,6 +753,36 @@ static int raw_flush(BlockDriverState *b
     return qemu_fdatasync(s->fd);
 }
 
+#ifdef CONFIG_XFS
+static int xfs_discard(BDRVRawState *s, int64_t sector_num, int nb_sectors)
+{
+    struct xfs_flock64 fl;
+
+    memset(&fl, 0, sizeof(fl));
+    fl.l_whence = SEEK_SET;
+    fl.l_start = sector_num << 9;
+    fl.l_len = (int64_t)nb_sectors << 9;
+
+    if (xfsctl(NULL, s->fd, XFS_IOC_UNRESVSP64, &fl) < 0) {
+        printf("cannot punch hole (%s)\n", strerror(errno));
+        return -errno;
+    }
+
+    return 0;
+}
+#endif
+
+static int raw_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors)
+{
+#ifdef CONFIG_XFS
+    BDRVRawState *s = bs->opaque;
+
+    if (s->is_xfs)
+        return xfs_discard(s, sector_num, nb_sectors);
+#endif
+
+    return 0;
+}
 
 static QEMUOptionParameter raw_create_options[] = {
     {
@@ -761,6 +804,7 @@ static BlockDriver bdrv_file = {
     .bdrv_close = raw_close,
     .bdrv_create = raw_create,
     .bdrv_flush = raw_flush,
+    .bdrv_discard = raw_discard,
 
     .bdrv_aio_readv = raw_aio_readv,
     .bdrv_aio_writev = raw_aio_writev,
Index: qemu/configure
===================================================================
--- qemu.orig/configure	2010-12-16 16:51:43.000000000 +0100
+++ qemu/configure	2010-12-16 17:41:11.410254507 +0100
@@ -288,6 +288,7 @@ xen=""
 linux_aio=""
 attr=""
 vhost_net=""
+xfs=""
 
 gprof="no"
 debug_tcg="no"
@@ -1399,6 +1400,27 @@ EOF
 fi
 
 ##########################################
+# xfsctl() probe, used for raw-posix
+if test "$xfs" != "no" ; then
+  cat > $TMPC << EOF
+#include <xfs/xfs.h>
+int main(void)
+{
+    xfsctl(NULL, 0, 0, NULL);
+    return 0;
+}
+EOF
+  if compile_prog "" "" ; then
+    xfs="yes"
+  else
+    if test "$xfs" = "yes" ; then
+      feature_not_found "xfs"
+    fi
+    xfs=no
+  fi
+fi
+
+##########################################
 # vde libraries probe
 if test "$vde" != "no" ; then
   vde_libs="-lvdeplug"
@@ -2403,6 +2425,7 @@ echo "Trace backend     $trace_backend"
 echo "Trace output file $trace_file-<pid>"
 echo "spice support     $spice"
 echo "rbd support       $rbd"
+echo "xfsctl support    $xfs"
 
 if test $sdl_too_old = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -2548,6 +2571,9 @@ fi
 if test "$uuid" = "yes" ; then
   echo "CONFIG_UUID=y" >> $config_host_mak
 fi
+if test "$xfs" = "yes" ; then
+  echo "CONFIG_XFS=y" >> $config_host_mak
+fi
 qemu_version=`head $source_path/VERSION`
 echo "VERSION=$qemu_version" >>$config_host_mak
 echo "PKGVERSION=$pkgversion" >>$config_host_mak

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] raw-posix: add discard support
  2010-12-16 18:36   ` [Qemu-devel] [PATCH 3/3] raw-posix: add discard support Christoph Hellwig
@ 2010-12-17 10:32     ` Kevin Wolf
  2010-12-17 10:41       ` [Qemu-devel] [PATCH v5] " Christoph Hellwig
  0 siblings, 1 reply; 21+ messages in thread
From: Kevin Wolf @ 2010-12-17 10:32 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: qemu-devel

Am 16.12.2010 19:36, schrieb Christoph Hellwig:
> Add support to discard blocks in a raw image residing on an XFS filesystem
> by calling the XFS_IOC_UNRESVSP64 ioctl to punch holes.  Support for other
> hole punching mechanisms can be added when they become available.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
> Index: qemu/block/raw-posix.c
> ===================================================================
> --- qemu.orig/block/raw-posix.c	2010-12-15 10:05:37.000000000 +0100
> +++ qemu/block/raw-posix.c	2010-12-16 17:40:47.617253460 +0100
> @@ -69,6 +69,10 @@
>  #include <sys/diskslice.h>
>  #endif
>  
> +#ifdef CONFIG_XFS
> +#include <xfs/xfs.h>
> +#endif
> +
>  //#define DEBUG_FLOPPY
>  
>  //#define DEBUG_BLOCK
> @@ -120,6 +124,9 @@ typedef struct BDRVRawState {
>  #endif
>      uint8_t *aligned_buf;
>      unsigned aligned_buf_size;
> +#ifdef CONFIG_XFS
> +    bool is_xfs : 1;
> +#endif
>  } BDRVRawState;
>  
>  static int fd_open(BlockDriverState *bs);
> @@ -196,6 +203,12 @@ static int raw_open_common(BlockDriverSt
>  #endif
>      }
>  
> +#ifdef CONFIG_XFS
> +    if (platform_test_xfs_fd(s->fd)) {
> +        s->is_xfs = 1;
> +    }
> +#endif
> +
>      return 0;
>  
>  out_free_buf:
> @@ -740,6 +753,36 @@ static int raw_flush(BlockDriverState *b
>      return qemu_fdatasync(s->fd);
>  }
>  
> +#ifdef CONFIG_XFS
> +static int xfs_discard(BDRVRawState *s, int64_t sector_num, int nb_sectors)
> +{
> +    struct xfs_flock64 fl;
> +
> +    memset(&fl, 0, sizeof(fl));
> +    fl.l_whence = SEEK_SET;
> +    fl.l_start = sector_num << 9;
> +    fl.l_len = (int64_t)nb_sectors << 9;
> +
> +    if (xfsctl(NULL, s->fd, XFS_IOC_UNRESVSP64, &fl) < 0) {
> +        printf("cannot punch hole (%s)\n", strerror(errno));

Debugging leftover? Block drivers shouldn't print anything to stdout.

> +        return -errno;
> +    }
> +
> +    return 0;
> +}
> +#endif
> +
> +static int raw_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors)
> +{
> +#ifdef CONFIG_XFS
> +    BDRVRawState *s = bs->opaque;
> +
> +    if (s->is_xfs)
> +        return xfs_discard(s, sector_num, nb_sectors);

Missing braces.

I have already applied patch 1 and 2 to the block branch, so sending a
v5 for this one is enough.

Kevin

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Qemu-devel] [PATCH v5] raw-posix: add discard support
  2010-12-17 10:32     ` Kevin Wolf
@ 2010-12-17 10:41       ` Christoph Hellwig
  2010-12-17 10:53         ` [Qemu-devel] " Kevin Wolf
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2010-12-17 10:41 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel

Add support to discard blocks in a raw image residing on an XFS filesystem
by calling the XFS_IOC_UNRESVSP64 ioctl to punch holes.  Support for other
hole punching mechanisms can be added when they become available.

Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: qemu/block/raw-posix.c
===================================================================
--- qemu.orig/block/raw-posix.c	2010-12-17 11:30:21.455262819 +0100
+++ qemu/block/raw-posix.c	2010-12-17 11:32:00.131003705 +0100
@@ -69,6 +69,10 @@
 #include <sys/diskslice.h>
 #endif
 
+#ifdef CONFIG_XFS
+#include <xfs/xfs.h>
+#endif
+
 //#define DEBUG_FLOPPY
 
 //#define DEBUG_BLOCK
@@ -120,6 +124,9 @@ typedef struct BDRVRawState {
 #endif
     uint8_t *aligned_buf;
     unsigned aligned_buf_size;
+#ifdef CONFIG_XFS
+    bool is_xfs : 1;
+#endif
 } BDRVRawState;
 
 static int fd_open(BlockDriverState *bs);
@@ -196,6 +203,12 @@ static int raw_open_common(BlockDriverSt
 #endif
     }
 
+#ifdef CONFIG_XFS
+    if (platform_test_xfs_fd(s->fd)) {
+        s->is_xfs = 1;
+    }
+#endif
+
     return 0;
 
 out_free_buf:
@@ -740,6 +753,37 @@ static int raw_flush(BlockDriverState *b
     return qemu_fdatasync(s->fd);
 }
 
+#ifdef CONFIG_XFS
+static int xfs_discard(BDRVRawState *s, int64_t sector_num, int nb_sectors)
+{
+    struct xfs_flock64 fl;
+
+    memset(&fl, 0, sizeof(fl));
+    fl.l_whence = SEEK_SET;
+    fl.l_start = sector_num << 9;
+    fl.l_len = (int64_t)nb_sectors << 9;
+
+    if (xfsctl(NULL, s->fd, XFS_IOC_UNRESVSP64, &fl) < 0) {
+        DEBUG_BLOCK_PRINT("cannot punch hole (%s)\n", strerror(errno));
+        return -errno;
+    }
+
+    return 0;
+}
+#endif
+
+static int raw_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors)
+{
+#ifdef CONFIG_XFS
+    BDRVRawState *s = bs->opaque;
+
+    if (s->is_xfs) {
+        return xfs_discard(s, sector_num, nb_sectors);
+    }
+#endif
+
+    return 0;
+}
 
 static QEMUOptionParameter raw_create_options[] = {
     {
@@ -761,6 +805,7 @@ static BlockDriver bdrv_file = {
     .bdrv_close = raw_close,
     .bdrv_create = raw_create,
     .bdrv_flush = raw_flush,
+    .bdrv_discard = raw_discard,
 
     .bdrv_aio_readv = raw_aio_readv,
     .bdrv_aio_writev = raw_aio_writev,
Index: qemu/configure
===================================================================
--- qemu.orig/configure	2010-12-17 11:30:21.468254368 +0100
+++ qemu/configure	2010-12-17 11:30:40.080023610 +0100
@@ -288,6 +288,7 @@ xen=""
 linux_aio=""
 attr=""
 vhost_net=""
+xfs=""
 
 gprof="no"
 debug_tcg="no"
@@ -1399,6 +1400,27 @@ EOF
 fi
 
 ##########################################
+# xfsctl() probe, used for raw-posix
+if test "$xfs" != "no" ; then
+  cat > $TMPC << EOF
+#include <xfs/xfs.h>
+int main(void)
+{
+    xfsctl(NULL, 0, 0, NULL);
+    return 0;
+}
+EOF
+  if compile_prog "" "" ; then
+    xfs="yes"
+  else
+    if test "$xfs" = "yes" ; then
+      feature_not_found "xfs"
+    fi
+    xfs=no
+  fi
+fi
+
+##########################################
 # vde libraries probe
 if test "$vde" != "no" ; then
   vde_libs="-lvdeplug"
@@ -2403,6 +2425,7 @@ echo "Trace backend     $trace_backend"
 echo "Trace output file $trace_file-<pid>"
 echo "spice support     $spice"
 echo "rbd support       $rbd"
+echo "xfsctl support    $xfs"
 
 if test $sdl_too_old = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -2548,6 +2571,9 @@ fi
 if test "$uuid" = "yes" ; then
   echo "CONFIG_UUID=y" >> $config_host_mak
 fi
+if test "$xfs" = "yes" ; then
+  echo "CONFIG_XFS=y" >> $config_host_mak
+fi
 qemu_version=`head $source_path/VERSION`
 echo "VERSION=$qemu_version" >>$config_host_mak
 echo "PKGVERSION=$pkgversion" >>$config_host_mak

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [Qemu-devel] Re: [PATCH v5] raw-posix: add discard support
  2010-12-17 10:41       ` [Qemu-devel] [PATCH v5] " Christoph Hellwig
@ 2010-12-17 10:53         ` Kevin Wolf
  0 siblings, 0 replies; 21+ messages in thread
From: Kevin Wolf @ 2010-12-17 10:53 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: qemu-devel

Am 17.12.2010 11:41, schrieb Christoph Hellwig:
> Add support to discard blocks in a raw image residing on an XFS filesystem
> by calling the XFS_IOC_UNRESVSP64 ioctl to punch holes.  Support for other
> hole punching mechanisms can be added when they become available.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Thanks, applied to the block branch.

Kevin

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2010-12-17 10:52 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-12-10 15:00 [Qemu-devel] ]PATCH 0/7] add TRIM/UNMAP support, v3 Christoph Hellwig
2010-12-10 15:00 ` [Qemu-devel] [PATCH 1/7] block: add discard support Christoph Hellwig
2010-12-10 15:00 ` [Qemu-devel] [PATCH 2/7] scsi-disk: support WRITE SAME (16) with unmap bit Christoph Hellwig
2010-12-16 15:48   ` Kevin Wolf
2010-12-16 16:41     ` Christoph Hellwig
2010-12-10 15:01 ` [Qemu-devel] [PATCH 3/7] make dma_bdrv_io available to drivers Christoph Hellwig
2010-12-10 15:01 ` [Qemu-devel] [PATCH 4/7] ide: factor dma handling helpers Christoph Hellwig
2010-12-10 15:01 ` [Qemu-devel] [PATCH 5/7] ide: also reset io_buffer_index for writes Christoph Hellwig
2010-12-10 15:01 ` [Qemu-devel] [PATCH 6/7] ide: add TRIM support Christoph Hellwig
2010-12-10 15:01 ` [Qemu-devel] [PATCH 7/7] raw-posix: add discard support Christoph Hellwig
2010-12-12 15:28 ` [Qemu-devel] ]PATCH 0/7] add TRIM/UNMAP support, v3 Stefan Hajnoczi
2010-12-13 15:45   ` Christoph Hellwig
2010-12-16 15:43 ` Kevin Wolf
2010-12-16 16:42   ` Christoph Hellwig
2010-12-16 18:36 ` [Qemu-devel] Re: [PATCH 0/3] add TRIM/UNMAP support, v4 Christoph Hellwig
2010-12-16 18:36   ` [Qemu-devel] [PATCH 1/3] block: add discard support Christoph Hellwig
2010-12-16 18:36   ` [Qemu-devel] [PATCH 2/3] scsi-disk: support WRITE SAME (16) with unmap bit Christoph Hellwig
2010-12-16 18:36   ` [Qemu-devel] [PATCH 3/3] raw-posix: add discard support Christoph Hellwig
2010-12-17 10:32     ` Kevin Wolf
2010-12-17 10:41       ` [Qemu-devel] [PATCH v5] " Christoph Hellwig
2010-12-17 10:53         ` [Qemu-devel] " Kevin Wolf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.