All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mmc: block: delete packed command support
@ 2016-11-21 10:08 Linus Walleij
  2016-11-21 11:11 ` Arnd Bergmann
  0 siblings, 1 reply; 15+ messages in thread
From: Linus Walleij @ 2016-11-21 10:08 UTC (permalink / raw)
  To: linux-mmc, Ulf Hansson
  Cc: Chunyan Zhang, Baolin Wang, Linus Walleij, Jaehoon Chung,
	Namjae Jeon, Maya Erez

I've had it with this code now.

The packed command support is a complex hurdle in the MMC/SD block
layer, around 500+ lines of code which was introduced in 2013 in
commits

ce39f9d17c14 ("mmc: support packed write command for eMMC4.5 devices")
abd9ac144947 ("mmc: add packed command feature of eMMC4.5")

...and since then it has been rotting. The original author of the
code has disappeared from the community and the mail address is
bouncing.

For the code to be exercised the host must flag that it supports
packed commands, so in mmc_blk_prep_packed_list() which is called for
every single request, the following construction appears:

u8 max_packed_rw = 0;

if ((rq_data_dir(cur) == WRITE) &&
    mmc_host_packed_wr(card->host))
        max_packed_rw = card->ext_csd.max_packed_writes;

if (max_packed_rw == 0)
    goto no_packed;

This has the following logical deductions:

- Only WRITE commands can really be packed, so the solution is
  only half-done: we support packed WRITE but not packed READ.
  The packed command support has not been finalized by supporting
  reads in three years!

- mmc_host_packed_wr() is just a static inline that checks
  host->caps2 & MMC_CAP2_PACKED_WR. The problem with this is
  that NO upstream host sets this capability flag! No driver
  in the kernel is using it, and we can't test it. Packed
  command may be supported in out-of-tree code, but I doubt
  it. I doubt that the code is even working anymore due to
  other refactorings in the MMC block layer, who would
  notice if patches affecting it broke packed commands?
  No one.

- There is no Device Tree binding or code to mark a host as
  supporting packed read or write commands, just this flag
  in caps2, so for sure there are not any DT systems using
  it either.

It has other problems as well: mmc_blk_prep_packed_list() is
speculatively picking requests out of the request queue with
blk_fetch_request() making the MMC/SD stack harder to convert
to the multiqueue block layer. By this we get rid of an
obstacle.

The way I see it this is just cruft littering the MMC/SD
stack.

Cc: Jaehoon Chung <jh80.chung@samsung.com>
Cc: Namjae Jeon <namjae.jeon@samsung.com>
Cc: Maya Erez <qca_merez@qca.qualcomm.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
If you are using the packed command and ready to step up and
finalize the READ support, integrate a host to actually use
it upstream, provide proper device tree bindings and and work
to support it in the future: step up NOW or the code will
be deleted.
---
 drivers/mmc/card/block.c | 482 ++---------------------------------------------
 drivers/mmc/card/queue.c |  53 +-----
 drivers/mmc/card/queue.h |  19 --
 include/linux/mmc/host.h |   5 -
 4 files changed, 20 insertions(+), 539 deletions(-)

diff --git a/drivers/mmc/card/block.c b/drivers/mmc/card/block.c
index ea9de876a846..b1d6392ed8d3 100644
--- a/drivers/mmc/card/block.c
+++ b/drivers/mmc/card/block.c
@@ -65,9 +65,6 @@ MODULE_ALIAS("mmc:block");
 
 #define mmc_req_rel_wr(req)	((req->cmd_flags & REQ_FUA) && \
 				  (rq_data_dir(req) == WRITE))
-#define PACKED_CMD_VER	0x01
-#define PACKED_CMD_WR	0x02
-
 static DEFINE_MUTEX(block_mutex);
 
 /*
@@ -101,7 +98,6 @@ struct mmc_blk_data {
 	unsigned int	flags;
 #define MMC_BLK_CMD23	(1 << 0)	/* Can do SET_BLOCK_COUNT for multiblock */
 #define MMC_BLK_REL_WR	(1 << 1)	/* MMC Reliable write support */
-#define MMC_BLK_PACKED_CMD	(1 << 2)	/* MMC packed command support */
 
 	unsigned int	usage;
 	unsigned int	read_only;
@@ -125,12 +121,6 @@ struct mmc_blk_data {
 
 static DEFINE_MUTEX(open_lock);
 
-enum {
-	MMC_PACKED_NR_IDX = -1,
-	MMC_PACKED_NR_ZERO,
-	MMC_PACKED_NR_SINGLE,
-};
-
 module_param(perdev_minors, int, 0444);
 MODULE_PARM_DESC(perdev_minors, "Minors numbers to allocate per device");
 
@@ -138,17 +128,6 @@ static inline int mmc_blk_part_switch(struct mmc_card *card,
 				      struct mmc_blk_data *md);
 static int get_card_status(struct mmc_card *card, u32 *status, int retries);
 
-static inline void mmc_blk_clear_packed(struct mmc_queue_req *mqrq)
-{
-	struct mmc_packed *packed = mqrq->packed;
-
-	mqrq->cmd_type = MMC_PACKED_NONE;
-	packed->nr_entries = MMC_PACKED_NR_ZERO;
-	packed->idx_failure = MMC_PACKED_NR_IDX;
-	packed->retries = 0;
-	packed->blocks = 0;
-}
-
 static struct mmc_blk_data *mmc_blk_get(struct gendisk *disk)
 {
 	struct mmc_blk_data *md;
@@ -1419,111 +1398,12 @@ static enum mmc_blk_status mmc_blk_err_check(struct mmc_card *card,
 	if (!brq->data.bytes_xfered)
 		return MMC_BLK_RETRY;
 
-	if (mmc_packed_cmd(mq_mrq->cmd_type)) {
-		if (unlikely(brq->data.blocks << 9 != brq->data.bytes_xfered))
-			return MMC_BLK_PARTIAL;
-		else
-			return MMC_BLK_SUCCESS;
-	}
-
 	if (blk_rq_bytes(req) != brq->data.bytes_xfered)
 		return MMC_BLK_PARTIAL;
 
 	return MMC_BLK_SUCCESS;
 }
 
-static int mmc_packed_init(struct mmc_queue *mq, struct mmc_card *card)
-{
-	struct mmc_queue_req *mqrq_cur = &mq->mqrq[0];
-	struct mmc_queue_req *mqrq_prev = &mq->mqrq[1];
-	int ret = 0;
-
-
-	mqrq_cur->packed = kzalloc(sizeof(struct mmc_packed), GFP_KERNEL);
-	if (!mqrq_cur->packed) {
-		pr_warn("%s: unable to allocate packed cmd for mqrq_cur\n",
-			mmc_card_name(card));
-		ret = -ENOMEM;
-		goto out;
-	}
-
-	mqrq_prev->packed = kzalloc(sizeof(struct mmc_packed), GFP_KERNEL);
-	if (!mqrq_prev->packed) {
-		pr_warn("%s: unable to allocate packed cmd for mqrq_prev\n",
-			mmc_card_name(card));
-		kfree(mqrq_cur->packed);
-		mqrq_cur->packed = NULL;
-		ret = -ENOMEM;
-		goto out;
-	}
-
-	INIT_LIST_HEAD(&mqrq_cur->packed->list);
-	INIT_LIST_HEAD(&mqrq_prev->packed->list);
-
-out:
-	return ret;
-}
-
-static void mmc_packed_clean(struct mmc_queue *mq)
-{
-	struct mmc_queue_req *mqrq_cur = &mq->mqrq[0];
-	struct mmc_queue_req *mqrq_prev = &mq->mqrq[1];
-
-	kfree(mqrq_cur->packed);
-	mqrq_cur->packed = NULL;
-	kfree(mqrq_prev->packed);
-	mqrq_prev->packed = NULL;
-}
-
-static enum mmc_blk_status mmc_blk_packed_err_check(struct mmc_card *card,
-						    struct mmc_async_req *areq)
-{
-	struct mmc_queue_req *mq_rq = container_of(areq, struct mmc_queue_req,
-			mmc_active);
-	struct request *req = mq_rq->req;
-	struct mmc_packed *packed = mq_rq->packed;
-	enum mmc_blk_status status, check;
-	int err;
-	u8 *ext_csd;
-
-	packed->retries--;
-	check = mmc_blk_err_check(card, areq);
-	err = get_card_status(card, &status, 0);
-	if (err) {
-		pr_err("%s: error %d sending status command\n",
-		       req->rq_disk->disk_name, err);
-		return MMC_BLK_ABORT;
-	}
-
-	if (status & R1_EXCEPTION_EVENT) {
-		err = mmc_get_ext_csd(card, &ext_csd);
-		if (err) {
-			pr_err("%s: error %d sending ext_csd\n",
-			       req->rq_disk->disk_name, err);
-			return MMC_BLK_ABORT;
-		}
-
-		if ((ext_csd[EXT_CSD_EXP_EVENTS_STATUS] &
-		     EXT_CSD_PACKED_FAILURE) &&
-		    (ext_csd[EXT_CSD_PACKED_CMD_STATUS] &
-		     EXT_CSD_PACKED_GENERIC_ERROR)) {
-			if (ext_csd[EXT_CSD_PACKED_CMD_STATUS] &
-			    EXT_CSD_PACKED_INDEXED_ERROR) {
-				packed->idx_failure =
-				  ext_csd[EXT_CSD_PACKED_FAILURE_INDEX] - 1;
-				check = MMC_BLK_PARTIAL;
-			}
-			pr_err("%s: packed cmd failed, nr %u, sectors %u, "
-			       "failure index: %d\n",
-			       req->rq_disk->disk_name, packed->nr_entries,
-			       packed->blocks, packed->idx_failure);
-		}
-		kfree(ext_csd);
-	}
-
-	return check;
-}
-
 static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq,
 			       struct mmc_card *card,
 			       int disable_multi,
@@ -1684,222 +1564,6 @@ static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq,
 	mmc_queue_bounce_pre(mqrq);
 }
 
-static inline u8 mmc_calc_packed_hdr_segs(struct request_queue *q,
-					  struct mmc_card *card)
-{
-	unsigned int hdr_sz = mmc_large_sector(card) ? 4096 : 512;
-	unsigned int max_seg_sz = queue_max_segment_size(q);
-	unsigned int len, nr_segs = 0;
-
-	do {
-		len = min(hdr_sz, max_seg_sz);
-		hdr_sz -= len;
-		nr_segs++;
-	} while (hdr_sz);
-
-	return nr_segs;
-}
-
-static u8 mmc_blk_prep_packed_list(struct mmc_queue *mq, struct request *req)
-{
-	struct request_queue *q = mq->queue;
-	struct mmc_card *card = mq->card;
-	struct request *cur = req, *next = NULL;
-	struct mmc_blk_data *md = mq->blkdata;
-	struct mmc_queue_req *mqrq = mq->mqrq_cur;
-	bool en_rel_wr = card->ext_csd.rel_param & EXT_CSD_WR_REL_PARAM_EN;
-	unsigned int req_sectors = 0, phys_segments = 0;
-	unsigned int max_blk_count, max_phys_segs;
-	bool put_back = true;
-	u8 max_packed_rw = 0;
-	u8 reqs = 0;
-
-	/*
-	 * We don't need to check packed for any further
-	 * operation of packed stuff as we set MMC_PACKED_NONE
-	 * and return zero for reqs if geting null packed. Also
-	 * we clean the flag of MMC_BLK_PACKED_CMD to avoid doing
-	 * it again when removing blk req.
-	 */
-	if (!mqrq->packed) {
-		md->flags &= (~MMC_BLK_PACKED_CMD);
-		goto no_packed;
-	}
-
-	if (!(md->flags & MMC_BLK_PACKED_CMD))
-		goto no_packed;
-
-	if ((rq_data_dir(cur) == WRITE) &&
-	    mmc_host_packed_wr(card->host))
-		max_packed_rw = card->ext_csd.max_packed_writes;
-
-	if (max_packed_rw == 0)
-		goto no_packed;
-
-	if (mmc_req_rel_wr(cur) &&
-	    (md->flags & MMC_BLK_REL_WR) && !en_rel_wr)
-		goto no_packed;
-
-	if (mmc_large_sector(card) &&
-	    !IS_ALIGNED(blk_rq_sectors(cur), 8))
-		goto no_packed;
-
-	mmc_blk_clear_packed(mqrq);
-
-	max_blk_count = min(card->host->max_blk_count,
-			    card->host->max_req_size >> 9);
-	if (unlikely(max_blk_count > 0xffff))
-		max_blk_count = 0xffff;
-
-	max_phys_segs = queue_max_segments(q);
-	req_sectors += blk_rq_sectors(cur);
-	phys_segments += cur->nr_phys_segments;
-
-	if (rq_data_dir(cur) == WRITE) {
-		req_sectors += mmc_large_sector(card) ? 8 : 1;
-		phys_segments += mmc_calc_packed_hdr_segs(q, card);
-	}
-
-	do {
-		if (reqs >= max_packed_rw - 1) {
-			put_back = false;
-			break;
-		}
-
-		spin_lock_irq(q->queue_lock);
-		next = blk_fetch_request(q);
-		spin_unlock_irq(q->queue_lock);
-		if (!next) {
-			put_back = false;
-			break;
-		}
-
-		if (mmc_large_sector(card) &&
-		    !IS_ALIGNED(blk_rq_sectors(next), 8))
-			break;
-
-		if (mmc_req_is_special(next))
-			break;
-
-		if (rq_data_dir(cur) != rq_data_dir(next))
-			break;
-
-		if (mmc_req_rel_wr(next) &&
-		    (md->flags & MMC_BLK_REL_WR) && !en_rel_wr)
-			break;
-
-		req_sectors += blk_rq_sectors(next);
-		if (req_sectors > max_blk_count)
-			break;
-
-		phys_segments +=  next->nr_phys_segments;
-		if (phys_segments > max_phys_segs)
-			break;
-
-		list_add_tail(&next->queuelist, &mqrq->packed->list);
-		cur = next;
-		reqs++;
-	} while (1);
-
-	if (put_back) {
-		spin_lock_irq(q->queue_lock);
-		blk_requeue_request(q, next);
-		spin_unlock_irq(q->queue_lock);
-	}
-
-	if (reqs > 0) {
-		list_add(&req->queuelist, &mqrq->packed->list);
-		mqrq->packed->nr_entries = ++reqs;
-		mqrq->packed->retries = reqs;
-		return reqs;
-	}
-
-no_packed:
-	mqrq->cmd_type = MMC_PACKED_NONE;
-	return 0;
-}
-
-static void mmc_blk_packed_hdr_wrq_prep(struct mmc_queue_req *mqrq,
-					struct mmc_card *card,
-					struct mmc_queue *mq)
-{
-	struct mmc_blk_request *brq = &mqrq->brq;
-	struct request *req = mqrq->req;
-	struct request *prq;
-	struct mmc_blk_data *md = mq->blkdata;
-	struct mmc_packed *packed = mqrq->packed;
-	bool do_rel_wr, do_data_tag;
-	u32 *packed_cmd_hdr;
-	u8 hdr_blocks;
-	u8 i = 1;
-
-	mqrq->cmd_type = MMC_PACKED_WRITE;
-	packed->blocks = 0;
-	packed->idx_failure = MMC_PACKED_NR_IDX;
-
-	packed_cmd_hdr = packed->cmd_hdr;
-	memset(packed_cmd_hdr, 0, sizeof(packed->cmd_hdr));
-	packed_cmd_hdr[0] = cpu_to_le32((packed->nr_entries << 16) |
-		(PACKED_CMD_WR << 8) | PACKED_CMD_VER);
-	hdr_blocks = mmc_large_sector(card) ? 8 : 1;
-
-	/*
-	 * Argument for each entry of packed group
-	 */
-	list_for_each_entry(prq, &packed->list, queuelist) {
-		do_rel_wr = mmc_req_rel_wr(prq) && (md->flags & MMC_BLK_REL_WR);
-		do_data_tag = (card->ext_csd.data_tag_unit_size) &&
-			(prq->cmd_flags & REQ_META) &&
-			(rq_data_dir(prq) == WRITE) &&
-			blk_rq_bytes(prq) >= card->ext_csd.data_tag_unit_size;
-		/* Argument of CMD23 */
-		packed_cmd_hdr[(i * 2)] = cpu_to_le32(
-			(do_rel_wr ? MMC_CMD23_ARG_REL_WR : 0) |
-			(do_data_tag ? MMC_CMD23_ARG_TAG_REQ : 0) |
-			blk_rq_sectors(prq));
-		/* Argument of CMD18 or CMD25 */
-		packed_cmd_hdr[((i * 2)) + 1] = cpu_to_le32(
-			mmc_card_blockaddr(card) ?
-			blk_rq_pos(prq) : blk_rq_pos(prq) << 9);
-		packed->blocks += blk_rq_sectors(prq);
-		i++;
-	}
-
-	memset(brq, 0, sizeof(struct mmc_blk_request));
-	brq->mrq.cmd = &brq->cmd;
-	brq->mrq.data = &brq->data;
-	brq->mrq.sbc = &brq->sbc;
-	brq->mrq.stop = &brq->stop;
-
-	brq->sbc.opcode = MMC_SET_BLOCK_COUNT;
-	brq->sbc.arg = MMC_CMD23_ARG_PACKED | (packed->blocks + hdr_blocks);
-	brq->sbc.flags = MMC_RSP_R1 | MMC_CMD_AC;
-
-	brq->cmd.opcode = MMC_WRITE_MULTIPLE_BLOCK;
-	brq->cmd.arg = blk_rq_pos(req);
-	if (!mmc_card_blockaddr(card))
-		brq->cmd.arg <<= 9;
-	brq->cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R1 | MMC_CMD_ADTC;
-
-	brq->data.blksz = 512;
-	brq->data.blocks = packed->blocks + hdr_blocks;
-	brq->data.flags = MMC_DATA_WRITE;
-
-	brq->stop.opcode = MMC_STOP_TRANSMISSION;
-	brq->stop.arg = 0;
-	brq->stop.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC;
-
-	mmc_set_data_timeout(&brq->data, card);
-
-	brq->data.sg = mqrq->sg;
-	brq->data.sg_len = mmc_queue_map_sg(mq, mqrq);
-
-	mqrq->mmc_active.mrq = &brq->mrq;
-	mqrq->mmc_active.err_check = mmc_blk_packed_err_check;
-
-	mmc_queue_bounce_pre(mqrq);
-}
-
 static int mmc_blk_cmd_err(struct mmc_blk_data *md, struct mmc_card *card,
 			   struct mmc_blk_request *brq, struct request *req,
 			   int ret)
@@ -1922,79 +1586,10 @@ static int mmc_blk_cmd_err(struct mmc_blk_data *md, struct mmc_card *card,
 		if (blocks != (u32)-1) {
 			ret = blk_end_request(req, 0, blocks << 9);
 		}
-	} else {
-		if (!mmc_packed_cmd(mq_rq->cmd_type))
-			ret = blk_end_request(req, 0, brq->data.bytes_xfered);
-	}
-	return ret;
-}
-
-static int mmc_blk_end_packed_req(struct mmc_queue_req *mq_rq)
-{
-	struct request *prq;
-	struct mmc_packed *packed = mq_rq->packed;
-	int idx = packed->idx_failure, i = 0;
-	int ret = 0;
-
-	while (!list_empty(&packed->list)) {
-		prq = list_entry_rq(packed->list.next);
-		if (idx == i) {
-			/* retry from error index */
-			packed->nr_entries -= idx;
-			mq_rq->req = prq;
-			ret = 1;
-
-			if (packed->nr_entries == MMC_PACKED_NR_SINGLE) {
-				list_del_init(&prq->queuelist);
-				mmc_blk_clear_packed(mq_rq);
-			}
-			return ret;
-		}
-		list_del_init(&prq->queuelist);
-		blk_end_request(prq, 0, blk_rq_bytes(prq));
-		i++;
 	}
-
-	mmc_blk_clear_packed(mq_rq);
 	return ret;
 }
 
-static void mmc_blk_abort_packed_req(struct mmc_queue_req *mq_rq)
-{
-	struct request *prq;
-	struct mmc_packed *packed = mq_rq->packed;
-
-	while (!list_empty(&packed->list)) {
-		prq = list_entry_rq(packed->list.next);
-		list_del_init(&prq->queuelist);
-		blk_end_request(prq, -EIO, blk_rq_bytes(prq));
-	}
-
-	mmc_blk_clear_packed(mq_rq);
-}
-
-static void mmc_blk_revert_packed_req(struct mmc_queue *mq,
-				      struct mmc_queue_req *mq_rq)
-{
-	struct request *prq;
-	struct request_queue *q = mq->queue;
-	struct mmc_packed *packed = mq_rq->packed;
-
-	while (!list_empty(&packed->list)) {
-		prq = list_entry_rq(packed->list.prev);
-		if (prq->queuelist.prev != &packed->list) {
-			list_del_init(&prq->queuelist);
-			spin_lock_irq(q->queue_lock);
-			blk_requeue_request(mq->queue, prq);
-			spin_unlock_irq(q->queue_lock);
-		} else {
-			list_del_init(&prq->queuelist);
-		}
-	}
-
-	mmc_blk_clear_packed(mq_rq);
-}
-
 static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
 {
 	struct mmc_blk_data *md = mq->blkdata;
@@ -2005,15 +1600,11 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
 	struct mmc_queue_req *mq_rq;
 	struct request *req = rqc;
 	struct mmc_async_req *areq;
-	const u8 packed_nr = 2;
 	u8 reqs = 0;
 
 	if (!rqc && !mq->mqrq_prev->req)
 		return 0;
 
-	if (rqc)
-		reqs = mmc_blk_prep_packed_list(mq, rqc);
-
 	do {
 		if (rqc) {
 			/*
@@ -2028,11 +1619,7 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
 				goto cmd_abort;
 			}
 
-			if (reqs >= packed_nr)
-				mmc_blk_packed_hdr_wrq_prep(mq->mqrq_cur,
-							    card, mq);
-			else
-				mmc_blk_rw_rq_prep(mq->mqrq_cur, card, 0, mq);
+			mmc_blk_rw_rq_prep(mq->mqrq_cur, card, 0, mq);
 			areq = &mq->mqrq_cur->mmc_active;
 		} else
 			areq = NULL;
@@ -2057,13 +1644,8 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
 			 */
 			mmc_blk_reset_success(md, type);
 
-			if (mmc_packed_cmd(mq_rq->cmd_type)) {
-				ret = mmc_blk_end_packed_req(mq_rq);
-				break;
-			} else {
-				ret = blk_end_request(req, 0,
-						brq->data.bytes_xfered);
-			}
+			ret = blk_end_request(req, 0,
+					brq->data.bytes_xfered);
 
 			/*
 			 * If the blk_end_request function returns non-zero even
@@ -2100,8 +1682,7 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
 			err = mmc_blk_reset(md, card->host, type);
 			if (!err)
 				break;
-			if (err == -ENODEV ||
-				mmc_packed_cmd(mq_rq->cmd_type))
+			if (err == -ENODEV)
 				goto cmd_abort;
 			/* Fall through */
 		}
@@ -2132,23 +1713,14 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
 		}
 
 		if (ret) {
-			if (mmc_packed_cmd(mq_rq->cmd_type)) {
-				if (!mq_rq->packed->retries)
-					goto cmd_abort;
-				mmc_blk_packed_hdr_wrq_prep(mq_rq, card, mq);
-				mmc_start_req(card->host,
-					      &mq_rq->mmc_active, NULL);
-			} else {
-
-				/*
-				 * In case of a incomplete request
-				 * prepare it again and resend.
-				 */
-				mmc_blk_rw_rq_prep(mq_rq, card,
-						disable_multi, mq);
-				mmc_start_req(card->host,
-						&mq_rq->mmc_active, NULL);
-			}
+			/*
+			 * In case of a incomplete request
+			 * prepare it again and resend.
+			 */
+			mmc_blk_rw_rq_prep(mq_rq, card,
+					disable_multi, mq);
+			mmc_start_req(card->host,
+					&mq_rq->mmc_active, NULL);
 			mq_rq->brq.retune_retry_done = retune_retry_done;
 		}
 	} while (ret);
@@ -2156,15 +1728,11 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
 	return 1;
 
  cmd_abort:
-	if (mmc_packed_cmd(mq_rq->cmd_type)) {
-		mmc_blk_abort_packed_req(mq_rq);
-	} else {
-		if (mmc_card_removed(card))
-			req->cmd_flags |= REQ_QUIET;
-		while (ret)
-			ret = blk_end_request(req, -EIO,
-					blk_rq_cur_bytes(req));
-	}
+	if (mmc_card_removed(card))
+		req->cmd_flags |= REQ_QUIET;
+	while (ret)
+		ret = blk_end_request(req, -EIO,
+				blk_rq_cur_bytes(req));
 
  start_new_req:
 	if (rqc) {
@@ -2172,12 +1740,6 @@ static int mmc_blk_issue_rw_rq(struct mmc_queue *mq, struct request *rqc)
 			rqc->cmd_flags |= REQ_QUIET;
 			blk_end_request_all(rqc, -EIO);
 		} else {
-			/*
-			 * If current request is packed, it needs to put back.
-			 */
-			if (mmc_packed_cmd(mq->mqrq_cur->cmd_type))
-				mmc_blk_revert_packed_req(mq, mq->mqrq_cur);
-
 			mmc_blk_rw_rq_prep(mq->mqrq_cur, card, 0, mq);
 			mmc_start_req(card->host,
 				      &mq->mqrq_cur->mmc_active, NULL);
@@ -2360,14 +1922,6 @@ static struct mmc_blk_data *mmc_blk_alloc_req(struct mmc_card *card,
 		blk_queue_write_cache(md->queue.queue, true, true);
 	}
 
-	if (mmc_card_mmc(card) &&
-	    (area_type == MMC_BLK_DATA_AREA_MAIN) &&
-	    (md->flags & MMC_BLK_CMD23) &&
-	    card->ext_csd.packed_event_en) {
-		if (!mmc_packed_init(&md->queue, card))
-			md->flags |= MMC_BLK_PACKED_CMD;
-	}
-
 	return md;
 
  err_putdisk:
@@ -2471,8 +2025,6 @@ static void mmc_blk_remove_req(struct mmc_blk_data *md)
 		 */
 		card = md->queue.card;
 		mmc_cleanup_queue(&md->queue);
-		if (md->flags & MMC_BLK_PACKED_CMD)
-			mmc_packed_clean(&md->queue);
 		if (md->disk->flags & GENHD_FL_UP) {
 			device_remove_file(disk_to_dev(md->disk), &md->force_ro);
 			if ((md->area_type & MMC_BLK_DATA_AREA_BOOT) &&
diff --git a/drivers/mmc/card/queue.c b/drivers/mmc/card/queue.c
index 3f6a2463ab30..7dacf2744fbd 100644
--- a/drivers/mmc/card/queue.c
+++ b/drivers/mmc/card/queue.c
@@ -406,41 +406,6 @@ void mmc_queue_resume(struct mmc_queue *mq)
 	}
 }
 
-static unsigned int mmc_queue_packed_map_sg(struct mmc_queue *mq,
-					    struct mmc_packed *packed,
-					    struct scatterlist *sg,
-					    enum mmc_packed_type cmd_type)
-{
-	struct scatterlist *__sg = sg;
-	unsigned int sg_len = 0;
-	struct request *req;
-
-	if (mmc_packed_wr(cmd_type)) {
-		unsigned int hdr_sz = mmc_large_sector(mq->card) ? 4096 : 512;
-		unsigned int max_seg_sz = queue_max_segment_size(mq->queue);
-		unsigned int len, remain, offset = 0;
-		u8 *buf = (u8 *)packed->cmd_hdr;
-
-		remain = hdr_sz;
-		do {
-			len = min(remain, max_seg_sz);
-			sg_set_buf(__sg, buf + offset, len);
-			offset += len;
-			remain -= len;
-			sg_unmark_end(__sg++);
-			sg_len++;
-		} while (remain);
-	}
-
-	list_for_each_entry(req, &packed->list, queuelist) {
-		sg_len += blk_rq_map_sg(mq->queue, req, __sg);
-		__sg = sg + (sg_len - 1);
-		sg_unmark_end(__sg++);
-	}
-	sg_mark_end(sg + (sg_len - 1));
-	return sg_len;
-}
-
 /*
  * Prepare the sg list(s) to be handed of to the host driver
  */
@@ -449,26 +414,14 @@ unsigned int mmc_queue_map_sg(struct mmc_queue *mq, struct mmc_queue_req *mqrq)
 	unsigned int sg_len;
 	size_t buflen;
 	struct scatterlist *sg;
-	enum mmc_packed_type cmd_type;
 	int i;
 
-	cmd_type = mqrq->cmd_type;
-
-	if (!mqrq->bounce_buf) {
-		if (mmc_packed_cmd(cmd_type))
-			return mmc_queue_packed_map_sg(mq, mqrq->packed,
-						       mqrq->sg, cmd_type);
-		else
-			return blk_rq_map_sg(mq->queue, mqrq->req, mqrq->sg);
-	}
+	if (!mqrq->bounce_buf)
+		return blk_rq_map_sg(mq->queue, mqrq->req, mqrq->sg);
 
 	BUG_ON(!mqrq->bounce_sg);
 
-	if (mmc_packed_cmd(cmd_type))
-		sg_len = mmc_queue_packed_map_sg(mq, mqrq->packed,
-						 mqrq->bounce_sg, cmd_type);
-	else
-		sg_len = blk_rq_map_sg(mq->queue, mqrq->req, mqrq->bounce_sg);
+	sg_len = blk_rq_map_sg(mq->queue, mqrq->req, mqrq->bounce_sg);
 
 	mqrq->bounce_sg_len = sg_len;
 
diff --git a/drivers/mmc/card/queue.h b/drivers/mmc/card/queue.h
index ae96eb223990..47f5532b5776 100644
--- a/drivers/mmc/card/queue.h
+++ b/drivers/mmc/card/queue.h
@@ -22,23 +22,6 @@ struct mmc_blk_request {
 	int			retune_retry_done;
 };
 
-enum mmc_packed_type {
-	MMC_PACKED_NONE = 0,
-	MMC_PACKED_WRITE,
-};
-
-#define mmc_packed_cmd(type)	((type) != MMC_PACKED_NONE)
-#define mmc_packed_wr(type)	((type) == MMC_PACKED_WRITE)
-
-struct mmc_packed {
-	struct list_head	list;
-	u32			cmd_hdr[1024];
-	unsigned int		blocks;
-	u8			nr_entries;
-	u8			retries;
-	s16			idx_failure;
-};
-
 struct mmc_queue_req {
 	struct request		*req;
 	struct mmc_blk_request	brq;
@@ -47,8 +30,6 @@ struct mmc_queue_req {
 	struct scatterlist	*bounce_sg;
 	unsigned int		bounce_sg_len;
 	struct mmc_async_req	mmc_active;
-	enum mmc_packed_type	cmd_type;
-	struct mmc_packed	*packed;
 };
 
 struct mmc_queue {
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index 5310f94be0ab..37c3d82d8550 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -495,11 +495,6 @@ static inline int mmc_host_uhs(struct mmc_host *host)
 		 MMC_CAP_UHS_DDR50);
 }
 
-static inline int mmc_host_packed_wr(struct mmc_host *host)
-{
-	return host->caps2 & MMC_CAP2_PACKED_WR;
-}
-
 static inline int mmc_card_hs(struct mmc_card *card)
 {
 	return card->host->ios.timing == MMC_TIMING_SD_HS ||
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] mmc: block: delete packed command support
  2016-11-21 10:08 [PATCH] mmc: block: delete packed command support Linus Walleij
@ 2016-11-21 11:11 ` Arnd Bergmann
  2016-11-21 12:44   ` Adrian Hunter
  2016-11-21 14:23   ` Alex Lemberg
  0 siblings, 2 replies; 15+ messages in thread
From: Arnd Bergmann @ 2016-11-21 11:11 UTC (permalink / raw)
  To: Linus Walleij
  Cc: linux-mmc, Ulf Hansson, Chunyan Zhang, Baolin Wang,
	Jaehoon Chung, Namjae Jeon, Maya Erez, Luca Porzio, Alex Lemberg

On Monday, November 21, 2016 11:08:57 AM CET Linus Walleij wrote:
> I've had it with this code now.
> 
> The packed command support is a complex hurdle in the MMC/SD block
> layer, around 500+ lines of code which was introduced in 2013 in
> commits
> 
> ce39f9d17c14 ("mmc: support packed write command for eMMC4.5 devices")
> abd9ac144947 ("mmc: add packed command feature of eMMC4.5")
> 
> ...and since then it has been rotting. The original author of the
> code has disappeared from the community and the mail address is
> bouncing.
> 
> For the code to be exercised the host must flag that it supports
> packed commands, so in mmc_blk_prep_packed_list() which is called for
> every single request, the following construction appears:
> 
> u8 max_packed_rw = 0;
> 
> if ((rq_data_dir(cur) == WRITE) &&
>     mmc_host_packed_wr(card->host))
>         max_packed_rw = card->ext_csd.max_packed_writes;
> 
> if (max_packed_rw == 0)
>     goto no_packed;
> 
> This has the following logical deductions:
> 
> - Only WRITE commands can really be packed, so the solution is
>   only half-done: we support packed WRITE but not packed READ.
>   The packed command support has not been finalized by supporting
>   reads in three years!

Packed reads don't make a lot of sense, there is very little
for an MMC to optimize in reads that it can't already do without
the packing. For writes, packing makes could be an important
performance optimization, if the eMMC supports it.

I've added Luca Porzio  and Alex Lemberg to Cc. I think they
are subscribed to the list already, but it would be good to
get some estimate from them too about how common the packed
write support is on existing hardware from their respective
employers before we kill it off.

If none of Samsung/Micron/Sandisk are currently shipping
devices that support eMMC-4.5 packed commands but don't
support the eMMC-5.1 command queuing (which IIRC is a more
general way to achieve the same), we probably don't need to
worry about it too much. 

> - mmc_host_packed_wr() is just a static inline that checks
>   host->caps2 & MMC_CAP2_PACKED_WR. The problem with this is
>   that NO upstream host sets this capability flag! No driver
>   in the kernel is using it, and we can't test it. Packed
>   command may be supported in out-of-tree code, but I doubt
>   it. I doubt that the code is even working anymore due to
>   other refactorings in the MMC block layer, who would
>   notice if patches affecting it broke packed commands?
>   No one.

This is a very good indication that it's not really used.
It would be useful to also check out the Android AOSP
kernel tree to see if it's in there, if anything important
supports packed commands, it's probably in there.

	Arnd

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mmc: block: delete packed command support
  2016-11-21 11:11 ` Arnd Bergmann
@ 2016-11-21 12:44   ` Adrian Hunter
  2016-11-21 14:02     ` Ulf Hansson
  2016-11-21 14:23   ` Alex Lemberg
  1 sibling, 1 reply; 15+ messages in thread
From: Adrian Hunter @ 2016-11-21 12:44 UTC (permalink / raw)
  To: Linus Walleij
  Cc: Arnd Bergmann, linux-mmc, Ulf Hansson, Chunyan Zhang,
	Baolin Wang, Jaehoon Chung, Namjae Jeon, Maya Erez, Luca Porzio,
	Alex Lemberg

On 21/11/16 13:11, Arnd Bergmann wrote:
> On Monday, November 21, 2016 11:08:57 AM CET Linus Walleij wrote:
>> I've had it with this code now.
>>
>> The packed command support is a complex hurdle in the MMC/SD block
>> layer, around 500+ lines of code which was introduced in 2013 in
>> commits
>>
>> ce39f9d17c14 ("mmc: support packed write command for eMMC4.5 devices")
>> abd9ac144947 ("mmc: add packed command feature of eMMC4.5")
>>
>> ...and since then it has been rotting. The original author of the
>> code has disappeared from the community and the mail address is
>> bouncing.
>>
>> For the code to be exercised the host must flag that it supports
>> packed commands, so in mmc_blk_prep_packed_list() which is called for
>> every single request, the following construction appears:
>>
>> u8 max_packed_rw = 0;
>>
>> if ((rq_data_dir(cur) == WRITE) &&
>>     mmc_host_packed_wr(card->host))
>>         max_packed_rw = card->ext_csd.max_packed_writes;
>>
>> if (max_packed_rw == 0)
>>     goto no_packed;
>>
>> This has the following logical deductions:
>>
>> - Only WRITE commands can really be packed, so the solution is
>>   only half-done: we support packed WRITE but not packed READ.
>>   The packed command support has not been finalized by supporting
>>   reads in three years!
> 
> Packed reads don't make a lot of sense, there is very little
> for an MMC to optimize in reads that it can't already do without
> the packing. For writes, packing makes could be an important
> performance optimization, if the eMMC supports it.
> 
> I've added Luca Porzio  and Alex Lemberg to Cc. I think they
> are subscribed to the list already, but it would be good to
> get some estimate from them too about how common the packed
> write support is on existing hardware from their respective
> employers before we kill it off.
> 
> If none of Samsung/Micron/Sandisk are currently shipping
> devices that support eMMC-4.5 packed commands but don't
> support the eMMC-5.1 command queuing (which IIRC is a more
> general way to achieve the same), we probably don't need to
> worry about it too much. 
> 
>> - mmc_host_packed_wr() is just a static inline that checks
>>   host->caps2 & MMC_CAP2_PACKED_WR. The problem with this is
>>   that NO upstream host sets this capability flag! No driver
>>   in the kernel is using it, and we can't test it. Packed
>>   command may be supported in out-of-tree code, but I doubt
>>   it. I doubt that the code is even working anymore due to
>>   other refactorings in the MMC block layer, who would
>>   notice if patches affecting it broke packed commands?
>>   No one.
> 
> This is a very good indication that it's not really used.
> It would be useful to also check out the Android AOSP
> kernel tree to see if it's in there, if anything important
> supports packed commands, it's probably in there.
> 
>> - There is no Device Tree binding or code to mark a host as
>>   supporting packed read or write commands, just this flag
>>   in caps2, so for sure there are not any DT systems using
>>   it either.
>> 
>> It has other problems as well: mmc_blk_prep_packed_list() is
>> speculatively picking requests out of the request queue with
>> blk_fetch_request() making the MMC/SD stack harder to convert
>> to the multiqueue block layer. By this we get rid of an
>> obstacle.

SDHCIv4 has a feature (ADMA3) which is slightly similar to packed
commands but it does not require card support.

Why is it a problem for blk-mq to allow some extra requests to
pick from for packing?


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mmc: block: delete packed command support
  2016-11-21 12:44   ` Adrian Hunter
@ 2016-11-21 14:02     ` Ulf Hansson
  2016-11-21 14:17       ` Adrian Hunter
  0 siblings, 1 reply; 15+ messages in thread
From: Ulf Hansson @ 2016-11-21 14:02 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Linus Walleij, Arnd Bergmann, linux-mmc, Chunyan Zhang,
	Baolin Wang, Jaehoon Chung, Namjae Jeon, Maya Erez, Luca Porzio,
	Alex Lemberg

On 21 November 2016 at 13:44, Adrian Hunter <adrian.hunter@intel.com> wrote:
> On 21/11/16 13:11, Arnd Bergmann wrote:
>> On Monday, November 21, 2016 11:08:57 AM CET Linus Walleij wrote:
>>> I've had it with this code now.
>>>
>>> The packed command support is a complex hurdle in the MMC/SD block
>>> layer, around 500+ lines of code which was introduced in 2013 in
>>> commits
>>>
>>> ce39f9d17c14 ("mmc: support packed write command for eMMC4.5 devices")
>>> abd9ac144947 ("mmc: add packed command feature of eMMC4.5")
>>>
>>> ...and since then it has been rotting. The original author of the
>>> code has disappeared from the community and the mail address is
>>> bouncing.
>>>
>>> For the code to be exercised the host must flag that it supports
>>> packed commands, so in mmc_blk_prep_packed_list() which is called for
>>> every single request, the following construction appears:
>>>
>>> u8 max_packed_rw = 0;
>>>
>>> if ((rq_data_dir(cur) == WRITE) &&
>>>     mmc_host_packed_wr(card->host))
>>>         max_packed_rw = card->ext_csd.max_packed_writes;
>>>
>>> if (max_packed_rw == 0)
>>>     goto no_packed;
>>>
>>> This has the following logical deductions:
>>>
>>> - Only WRITE commands can really be packed, so the solution is
>>>   only half-done: we support packed WRITE but not packed READ.
>>>   The packed command support has not been finalized by supporting
>>>   reads in three years!
>>
>> Packed reads don't make a lot of sense, there is very little
>> for an MMC to optimize in reads that it can't already do without
>> the packing. For writes, packing makes could be an important
>> performance optimization, if the eMMC supports it.
>>
>> I've added Luca Porzio  and Alex Lemberg to Cc. I think they
>> are subscribed to the list already, but it would be good to
>> get some estimate from them too about how common the packed
>> write support is on existing hardware from their respective
>> employers before we kill it off.
>>
>> If none of Samsung/Micron/Sandisk are currently shipping
>> devices that support eMMC-4.5 packed commands but don't
>> support the eMMC-5.1 command queuing (which IIRC is a more
>> general way to achieve the same), we probably don't need to
>> worry about it too much.
>>
>>> - mmc_host_packed_wr() is just a static inline that checks
>>>   host->caps2 & MMC_CAP2_PACKED_WR. The problem with this is
>>>   that NO upstream host sets this capability flag! No driver
>>>   in the kernel is using it, and we can't test it. Packed
>>>   command may be supported in out-of-tree code, but I doubt
>>>   it. I doubt that the code is even working anymore due to
>>>   other refactorings in the MMC block layer, who would
>>>   notice if patches affecting it broke packed commands?
>>>   No one.
>>
>> This is a very good indication that it's not really used.
>> It would be useful to also check out the Android AOSP
>> kernel tree to see if it's in there, if anything important
>> supports packed commands, it's probably in there.
>>
>>> - There is no Device Tree binding or code to mark a host as
>>>   supporting packed read or write commands, just this flag
>>>   in caps2, so for sure there are not any DT systems using
>>>   it either.
>>>
>>> It has other problems as well: mmc_blk_prep_packed_list() is
>>> speculatively picking requests out of the request queue with
>>> blk_fetch_request() making the MMC/SD stack harder to convert
>>> to the multiqueue block layer. By this we get rid of an
>>> obstacle.
>
> SDHCIv4 has a feature (ADMA3) which is slightly similar to packed
> commands but it does not require card support.
>
> Why is it a problem for blk-mq to allow some extra requests to
> pick from for packing?
>

In blk-mq, requests don't get picked from the queue, but instead those
gets "pushed" to the block device driver.

First, to support packing, it seems like we would need to specify a
queue-depth > 1, more or less lie to blk-mq layer about the device's
capability. Okay, we can do that. But..

I also believe, the implementation would become really complex, as you
would need to "hold" the first write request, while waiting for a
second to arrive. Then for how long shall you hold it? And then what
if you are unlucky so the next is read request, thus you can't pack
them. The solution will be suboptimal, won't it?

Perhaps Linus can add something, or confirm?

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mmc: block: delete packed command support
  2016-11-21 14:02     ` Ulf Hansson
@ 2016-11-21 14:17       ` Adrian Hunter
  2016-11-21 15:17         ` Ulf Hansson
  2016-11-21 15:27         ` Linus Walleij
  0 siblings, 2 replies; 15+ messages in thread
From: Adrian Hunter @ 2016-11-21 14:17 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Linus Walleij, Arnd Bergmann, linux-mmc, Chunyan Zhang,
	Baolin Wang, Jaehoon Chung, Namjae Jeon, Maya Erez, Luca Porzio,
	Alex Lemberg

On 21/11/16 16:02, Ulf Hansson wrote:
> On 21 November 2016 at 13:44, Adrian Hunter <adrian.hunter@intel.com> wrote:
>> On 21/11/16 13:11, Arnd Bergmann wrote:
>>> On Monday, November 21, 2016 11:08:57 AM CET Linus Walleij wrote:
>>>> I've had it with this code now.
>>>>
>>>> The packed command support is a complex hurdle in the MMC/SD block
>>>> layer, around 500+ lines of code which was introduced in 2013 in
>>>> commits
>>>>
>>>> ce39f9d17c14 ("mmc: support packed write command for eMMC4.5 devices")
>>>> abd9ac144947 ("mmc: add packed command feature of eMMC4.5")
>>>>
>>>> ...and since then it has been rotting. The original author of the
>>>> code has disappeared from the community and the mail address is
>>>> bouncing.
>>>>
>>>> For the code to be exercised the host must flag that it supports
>>>> packed commands, so in mmc_blk_prep_packed_list() which is called for
>>>> every single request, the following construction appears:
>>>>
>>>> u8 max_packed_rw = 0;
>>>>
>>>> if ((rq_data_dir(cur) == WRITE) &&
>>>>     mmc_host_packed_wr(card->host))
>>>>         max_packed_rw = card->ext_csd.max_packed_writes;
>>>>
>>>> if (max_packed_rw == 0)
>>>>     goto no_packed;
>>>>
>>>> This has the following logical deductions:
>>>>
>>>> - Only WRITE commands can really be packed, so the solution is
>>>>   only half-done: we support packed WRITE but not packed READ.
>>>>   The packed command support has not been finalized by supporting
>>>>   reads in three years!
>>>
>>> Packed reads don't make a lot of sense, there is very little
>>> for an MMC to optimize in reads that it can't already do without
>>> the packing. For writes, packing makes could be an important
>>> performance optimization, if the eMMC supports it.
>>>
>>> I've added Luca Porzio  and Alex Lemberg to Cc. I think they
>>> are subscribed to the list already, but it would be good to
>>> get some estimate from them too about how common the packed
>>> write support is on existing hardware from their respective
>>> employers before we kill it off.
>>>
>>> If none of Samsung/Micron/Sandisk are currently shipping
>>> devices that support eMMC-4.5 packed commands but don't
>>> support the eMMC-5.1 command queuing (which IIRC is a more
>>> general way to achieve the same), we probably don't need to
>>> worry about it too much.
>>>
>>>> - mmc_host_packed_wr() is just a static inline that checks
>>>>   host->caps2 & MMC_CAP2_PACKED_WR. The problem with this is
>>>>   that NO upstream host sets this capability flag! No driver
>>>>   in the kernel is using it, and we can't test it. Packed
>>>>   command may be supported in out-of-tree code, but I doubt
>>>>   it. I doubt that the code is even working anymore due to
>>>>   other refactorings in the MMC block layer, who would
>>>>   notice if patches affecting it broke packed commands?
>>>>   No one.
>>>
>>> This is a very good indication that it's not really used.
>>> It would be useful to also check out the Android AOSP
>>> kernel tree to see if it's in there, if anything important
>>> supports packed commands, it's probably in there.
>>>
>>>> - There is no Device Tree binding or code to mark a host as
>>>>   supporting packed read or write commands, just this flag
>>>>   in caps2, so for sure there are not any DT systems using
>>>>   it either.
>>>>
>>>> It has other problems as well: mmc_blk_prep_packed_list() is
>>>> speculatively picking requests out of the request queue with
>>>> blk_fetch_request() making the MMC/SD stack harder to convert
>>>> to the multiqueue block layer. By this we get rid of an
>>>> obstacle.
>>
>> SDHCIv4 has a feature (ADMA3) which is slightly similar to packed
>> commands but it does not require card support.
>>
>> Why is it a problem for blk-mq to allow some extra requests to
>> pick from for packing?
>>
> 
> In blk-mq, requests don't get picked from the queue, but instead those
> gets "pushed" to the block device driver.
> 
> First, to support packing, it seems like we would need to specify a
> queue-depth > 1, more or less lie to blk-mq layer about the device's
> capability. Okay, we can do that. But..
> 
> I also believe, the implementation would become really complex, as you
> would need to "hold" the first write request, while waiting for a
> second to arrive. Then for how long shall you hold it? And then what
> if you are unlucky so the next is read request, thus you can't pack
> them. The solution will be suboptimal, won't it?

It doesn't hold and wait now.  So why would it in the blk-mq case?



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mmc: block: delete packed command support
  2016-11-21 11:11 ` Arnd Bergmann
  2016-11-21 12:44   ` Adrian Hunter
@ 2016-11-21 14:23   ` Alex Lemberg
  2016-11-22  3:53     ` Jaehoon Chung
  1 sibling, 1 reply; 15+ messages in thread
From: Alex Lemberg @ 2016-11-21 14:23 UTC (permalink / raw)
  To: Arnd Bergmann, Linus Walleij
  Cc: linux-mmc, Ulf Hansson, Chunyan Zhang, Baolin Wang,
	Jaehoon Chung, Namjae Jeon, Maya Erez, Luca Porzio

Hi,


On 11/21/16, 1:11 PM, "Arnd Bergmann" <arnd@arndb.de> wrote:

>On Monday, November 21, 2016 11:08:57 AM CET Linus Walleij wrote:
>> I've had it with this code now.
>> 
>> The packed command support is a complex hurdle in the MMC/SD block
>> layer, around 500+ lines of code which was introduced in 2013 in
>> commits
>> 
>> ce39f9d17c14 ("mmc: support packed write command for eMMC4.5 devices")
>> abd9ac144947 ("mmc: add packed command feature of eMMC4.5")
>> 
>> ...and since then it has been rotting. The original author of the
>> code has disappeared from the community and the mail address is
>> bouncing.
>> 
>> For the code to be exercised the host must flag that it supports
>> packed commands, so in mmc_blk_prep_packed_list() which is called for
>> every single request, the following construction appears:
>> 
>> u8 max_packed_rw = 0;
>> 
>> if ((rq_data_dir(cur) == WRITE) &&
>>     mmc_host_packed_wr(card->host))
>>         max_packed_rw = card->ext_csd.max_packed_writes;
>> 
>> if (max_packed_rw == 0)
>>     goto no_packed;
>> 
>> This has the following logical deductions:
>> 
>> - Only WRITE commands can really be packed, so the solution is
>>   only half-done: we support packed WRITE but not packed READ.
>>   The packed command support has not been finalized by supporting
>>   reads in three years!
>
>Packed reads don't make a lot of sense, there is very little
>for an MMC to optimize in reads that it can't already do without
>the packing. For writes, packing makes could be an important
>performance optimization, if the eMMC supports it.
>
>I've added Luca Porzio  and Alex Lemberg to Cc. I think they
>are subscribed to the list already, but it would be good to
>get some estimate from them too about how common the packed
>write support is on existing hardware from their respective
>employers before we kill it off.

Correct, in general there is no value in using packed for Read.
But I can’t say this for all existing flash management solution.
The eMMC spec allows to use it for Read as well.

>
>If none of Samsung/Micron/Sandisk are currently shipping
>devices that support eMMC-4.5 packed commands but don't
>support the eMMC-5.1 command queuing (which IIRC is a more
>general way to achieve the same), we probably don't need to
>worry about it too much. 

Please note that even by having eMMC-5.1 device,
not all chipset vendors are having HW/SW CMDQ support.
So they might be using packed commands instead.

>
>> - mmc_host_packed_wr() is just a static inline that checks
>>   host->caps2 & MMC_CAP2_PACKED_WR. The problem with this is
>>   that NO upstream host sets this capability flag! No driver
>>   in the kernel is using it, and we can't test it. Packed
>>   command may be supported in out-of-tree code, but I doubt
>>   it. I doubt that the code is even working anymore due to
>>   other refactorings in the MMC block layer, who would
>>   notice if patches affecting it broke packed commands?
>>   No one.
>
>This is a very good indication that it's not really used.
>It would be useful to also check out the Android AOSP
>kernel tree to see if it's in there, if anything important
>supports packed commands, it's probably in there.

As far as I can say from reviewing the mobile (Android)
platforms kernel source (from different vendors), 
many of them are enabling Packed Commands.

>
>	Arnd


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mmc: block: delete packed command support
  2016-11-21 14:17       ` Adrian Hunter
@ 2016-11-21 15:17         ` Ulf Hansson
  2016-11-21 15:27         ` Linus Walleij
  1 sibling, 0 replies; 15+ messages in thread
From: Ulf Hansson @ 2016-11-21 15:17 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Linus Walleij, Arnd Bergmann, linux-mmc, Chunyan Zhang,
	Baolin Wang, Jaehoon Chung, Namjae Jeon, Maya Erez, Luca Porzio,
	Alex Lemberg

[...]

>>>
>>> SDHCIv4 has a feature (ADMA3) which is slightly similar to packed
>>> commands but it does not require card support.
>>>
>>> Why is it a problem for blk-mq to allow some extra requests to
>>> pick from for packing?
>>>
>>
>> In blk-mq, requests don't get picked from the queue, but instead those
>> gets "pushed" to the block device driver.
>>
>> First, to support packing, it seems like we would need to specify a
>> queue-depth > 1, more or less lie to blk-mq layer about the device's
>> capability. Okay, we can do that. But..
>>
>> I also believe, the implementation would become really complex, as you
>> would need to "hold" the first write request, while waiting for a
>> second to arrive. Then for how long shall you hold it? And then what
>> if you are unlucky so the next is read request, thus you can't pack
>> them. The solution will be suboptimal, won't it?
>
> It doesn't hold and wait now.  So why would it in the blk-mq case?

Because earlier the mmc block device driver *fetches* requests from
the queue and when finding more than one request, we could try to pack
them. If there are only one request, we just move on with it.

With blk-mq, we don't *fetch* request from the queue, but instead
those would get pushed to the mmc block device driver via the
implemented block-mq ops.

Please note, I am not an blk-mq expert, but this is my understanding.

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mmc: block: delete packed command support
  2016-11-21 14:17       ` Adrian Hunter
  2016-11-21 15:17         ` Ulf Hansson
@ 2016-11-21 15:27         ` Linus Walleij
  1 sibling, 0 replies; 15+ messages in thread
From: Linus Walleij @ 2016-11-21 15:27 UTC (permalink / raw)
  To: Adrian Hunter, Jens Axboe, Christoph Hellwig, linux-block
  Cc: Ulf Hansson, Arnd Bergmann, linux-mmc, Chunyan Zhang,
	Baolin Wang, Jaehoon Chung, Namjae Jeon, Maya Erez, Luca Porzio,
	Alex Lemberg, Paolo Valente

[CC the block layer maintainers so they can smack my fingers
if I misunderstood any of how the block layer semantics work...]

On Mon, Nov 21, 2016 at 3:17 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
> On 21/11/16 16:02, Ulf Hansson wrote:

>> I also believe, the implementation would become really complex, as you
>> would need to "hold" the first write request, while waiting for a
>> second to arrive. Then for how long shall you hold it? And then what
>> if you are unlucky so the next is read request, thus you can't pack
>> them. The solution will be suboptimal, won't it?
>
> It doesn't hold and wait now.  So why would it in the blk-mq case?

The current kthread in drivers/mmc/card/queue.c looks like this
in essence:

struct request *r;

while (1)
    r = blk_fetch_request(q);
    issue();
}

It is pulling out as much as it can to asynchronously issue
two requests in parallel and also the packed command (as
mentioned in the commitlog to $SUBJECT) pulls out even more
stuff of the queue to speculatively issue things in a packed
command.

The block layer isn't supposed to be used like this. It is
supposed to be used like so:

1. You get notified by the request_fn that is passed with
   blk_init_queue()
2. The request function fires a work.
3. The work pick ONE request with blk_fetch_request()
  and handles it.
4. Repeat from (1)

Instead of doing this the MMC layer kthread is speculatively
pulling out stuff of the queue whenever it can, including
pulling out a few NULL at the end before it stops. The
mechanism is similar to a person running along a queue and
picking a segment of passengers into a bus to send off,
batching them. Which is clever, but the block layer is not
supposed to be used like that. It just happens to work.

In blk-mq this speculative fetching is no longer possible.
Instead you register a notification function in struct blk_mq_ops
vtable .queue_rq() callback: this will be called by the block
layer core whenever there is a request available on "our"
queue. It further approves a .init_request() callback that is
called overhead to allocate a per-request context, such as
the current struct mmc_queue_req - just not quite because
the current mmc_queue_req is not mapped 1-to-1 onto a
request from the block layer because of packed command;
but it is after this patch, hehe ;)

Any speculative batching needs to happen *after* this, i.e.
the MMC layer would have to report a certain larger queue
depth (if you set it to 1 you only ever get one request at the time
and have to finish it before you get a new one), group the
requests itself with packed command or command queueing,
then signal them back as they are confirmed completed by
the device, or, if they cannot be grouped, handle as far as you
can and put the remaining requests back on the queue
(creating a "bubble" in the pipeline).

Relying on iterating and inspecting the block layer queue is
*not* possible with blk-mq, sure blk_fetch_request() is still
there, but if you call it on an mq-registered queue, it will
crash the kernel. (At least it did for me.) Clearly it is not
intended to be used with MQ: none of the MQ-converted
subsystems use this. (drivers/mtd/ubi/block.c is a good
simple example)

I liken these mechanisms to a pipeline:

- The two-levels deep speculation buffer in struct mmc_queue
  field .mqrq[2] is a "software pipeline" in the MMC layer (so we
  can prepare and handle requests in parallel)

- The packed command and command queue is a
  hardware-supported pipeline on the device side.

Both try to overcome the hardware limitations of the MMC/SD
logical interface. This batching-style pipelining isn't really
solving the problem the way real multiqueue hardware does,
so it is a poor man's patchwork to the problem.

In either case, as Ulf notes, you need to get a few requests
off the queue and group them using packed command or
command queueing if possible, but that grouping needs to
happen in the MMC/SD layer, after picking the requests from
the queue. I think it is OK to do so and just put any requests
you cannot pack into the pipeline back on the queue. But I
am not sure (still learning....)

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mmc: block: delete packed command support
  2016-11-21 14:23   ` Alex Lemberg
@ 2016-11-22  3:53     ` Jaehoon Chung
  2016-11-22  8:49       ` Ulf Hansson
  0 siblings, 1 reply; 15+ messages in thread
From: Jaehoon Chung @ 2016-11-22  3:53 UTC (permalink / raw)
  To: Alex Lemberg, Arnd Bergmann, Linus Walleij
  Cc: linux-mmc, Ulf Hansson, Chunyan Zhang, Baolin Wang, Namjae Jeon,
	Maya Erez, Luca Porzio

On 11/21/2016 11:23 PM, Alex Lemberg wrote:
> Hi,
> 
> 
> On 11/21/16, 1:11 PM, "Arnd Bergmann" <arnd@arndb.de> wrote:
> 
>> On Monday, November 21, 2016 11:08:57 AM CET Linus Walleij wrote:
>>> I've had it with this code now.
>>>
>>> The packed command support is a complex hurdle in the MMC/SD block
>>> layer, around 500+ lines of code which was introduced in 2013 in
>>> commits
>>>
>>> ce39f9d17c14 ("mmc: support packed write command for eMMC4.5 devices")
>>> abd9ac144947 ("mmc: add packed command feature of eMMC4.5")
>>>
>>> ...and since then it has been rotting. The original author of the
>>> code has disappeared from the community and the mail address is
>>> bouncing.
>>>
>>> For the code to be exercised the host must flag that it supports
>>> packed commands, so in mmc_blk_prep_packed_list() which is called for
>>> every single request, the following construction appears:
>>>
>>> u8 max_packed_rw = 0;
>>>
>>> if ((rq_data_dir(cur) == WRITE) &&
>>>     mmc_host_packed_wr(card->host))
>>>         max_packed_rw = card->ext_csd.max_packed_writes;
>>>
>>> if (max_packed_rw == 0)
>>>     goto no_packed;
>>>
>>> This has the following logical deductions:
>>>
>>> - Only WRITE commands can really be packed, so the solution is
>>>   only half-done: we support packed WRITE but not packed READ.
>>>   The packed command support has not been finalized by supporting
>>>   reads in three years!
>>
>> Packed reads don't make a lot of sense, there is very little
>> for an MMC to optimize in reads that it can't already do without
>> the packing. For writes, packing makes could be an important
>> performance optimization, if the eMMC supports it.
>>
>> I've added Luca Porzio  and Alex Lemberg to Cc. I think they
>> are subscribed to the list already, but it would be good to
>> get some estimate from them too about how common the packed
>> write support is on existing hardware from their respective
>> employers before we kill it off.
> 
> Correct, in general there is no value in using packed for Read.
> But I can’t say this for all existing flash management solution.
> The eMMC spec allows to use it for Read as well.

As i know, when packed command had implemented, early eMMC had the firmware problem
for Packed Read operation. but so I can't say Packed Read doesn't have the benefit for performance.
But Packed Write command can see the benefit for performance.

> 
>>
>> If none of Samsung/Micron/Sandisk are currently shipping
>> devices that support eMMC-4.5 packed commands but don't
>> support the eMMC-5.1 command queuing (which IIRC is a more
>> general way to achieve the same), we probably don't need to
>> worry about it too much. 
> 
> Please note that even by having eMMC-5.1 device,
> not all chipset vendors are having HW/SW CMDQ support.
> So they might be using packed commands instead.
> 
>>
>>> - mmc_host_packed_wr() is just a static inline that checks
>>>   host->caps2 & MMC_CAP2_PACKED_WR. The problem with this is
>>>   that NO upstream host sets this capability flag! No driver
>>>   in the kernel is using it, and we can't test it. Packed
>>>   command may be supported in out-of-tree code, but I doubt
>>>   it. I doubt that the code is even working anymore due to
>>>   other refactorings in the MMC block layer, who would
>>>   notice if patches affecting it broke packed commands?
>>>   No one.
>>
>> This is a very good indication that it's not really used.
>> It would be useful to also check out the Android AOSP
>> kernel tree to see if it's in there, if anything important
>> supports packed commands, it's probably in there.
> 
> As far as I can say from reviewing the mobile (Android)
> platforms kernel source (from different vendors), 
> many of them are enabling Packed Commands.

Actually, some shipping Samsung devices with eMMC4.5 might be used packed command.
(For Android/Tizen OS and ARTIK boards)

Best Regards,
Jaehoon Chung

> 
>>
>> 	Arnd
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mmc: block: delete packed command support
  2016-11-22  3:53     ` Jaehoon Chung
@ 2016-11-22  8:49       ` Ulf Hansson
  2016-11-22 12:49         ` Linus Walleij
  2016-11-22 14:44         ` Arnd Bergmann
  0 siblings, 2 replies; 15+ messages in thread
From: Ulf Hansson @ 2016-11-22  8:49 UTC (permalink / raw)
  To: Jaehoon Chung
  Cc: Alex Lemberg, Arnd Bergmann, Linus Walleij, linux-mmc,
	Chunyan Zhang, Baolin Wang, Namjae Jeon, Maya Erez, Luca Porzio

On 22 November 2016 at 04:53, Jaehoon Chung <jh80.chung@samsung.com> wrote:
> On 11/21/2016 11:23 PM, Alex Lemberg wrote:
>> Hi,
>>
>>
>> On 11/21/16, 1:11 PM, "Arnd Bergmann" <arnd@arndb.de> wrote:
>>
>>> On Monday, November 21, 2016 11:08:57 AM CET Linus Walleij wrote:
>>>> I've had it with this code now.
>>>>
>>>> The packed command support is a complex hurdle in the MMC/SD block
>>>> layer, around 500+ lines of code which was introduced in 2013 in
>>>> commits
>>>>
>>>> ce39f9d17c14 ("mmc: support packed write command for eMMC4.5 devices")
>>>> abd9ac144947 ("mmc: add packed command feature of eMMC4.5")
>>>>
>>>> ...and since then it has been rotting. The original author of the
>>>> code has disappeared from the community and the mail address is
>>>> bouncing.
>>>>
>>>> For the code to be exercised the host must flag that it supports
>>>> packed commands, so in mmc_blk_prep_packed_list() which is called for
>>>> every single request, the following construction appears:
>>>>
>>>> u8 max_packed_rw = 0;
>>>>
>>>> if ((rq_data_dir(cur) == WRITE) &&
>>>>     mmc_host_packed_wr(card->host))
>>>>         max_packed_rw = card->ext_csd.max_packed_writes;
>>>>
>>>> if (max_packed_rw == 0)
>>>>     goto no_packed;
>>>>
>>>> This has the following logical deductions:
>>>>
>>>> - Only WRITE commands can really be packed, so the solution is
>>>>   only half-done: we support packed WRITE but not packed READ.
>>>>   The packed command support has not been finalized by supporting
>>>>   reads in three years!
>>>
>>> Packed reads don't make a lot of sense, there is very little
>>> for an MMC to optimize in reads that it can't already do without
>>> the packing. For writes, packing makes could be an important
>>> performance optimization, if the eMMC supports it.
>>>
>>> I've added Luca Porzio  and Alex Lemberg to Cc. I think they
>>> are subscribed to the list already, but it would be good to
>>> get some estimate from them too about how common the packed
>>> write support is on existing hardware from their respective
>>> employers before we kill it off.
>>
>> Correct, in general there is no value in using packed for Read.
>> But I can’t say this for all existing flash management solution.
>> The eMMC spec allows to use it for Read as well.
>
> As i know, when packed command had implemented, early eMMC had the firmware problem
> for Packed Read operation. but so I can't say Packed Read doesn't have the benefit for performance.
> But Packed Write command can see the benefit for performance.

Regarding "performance", are you merely thinking about increased
throughput? With packed command we decrease the communication overhead
with the card so less commands becomes sent/received.

Or, did you also observed an improved behaviour of the card from a
garbage collect point of view? In other words also a decreased latency
when the device is becoming more and more used?

Finally, did you compare the packed command, towards using the
asynchronous request mechanisms (using the ->pre|post_req() mmc host
ops)?

>
>>
>>>
>>> If none of Samsung/Micron/Sandisk are currently shipping
>>> devices that support eMMC-4.5 packed commands but don't
>>> support the eMMC-5.1 command queuing (which IIRC is a more
>>> general way to achieve the same), we probably don't need to
>>> worry about it too much.
>>
>> Please note that even by having eMMC-5.1 device,
>> not all chipset vendors are having HW/SW CMDQ support.
>> So they might be using packed commands instead.
>>
>>>
>>>> - mmc_host_packed_wr() is just a static inline that checks
>>>>   host->caps2 & MMC_CAP2_PACKED_WR. The problem with this is
>>>>   that NO upstream host sets this capability flag! No driver
>>>>   in the kernel is using it, and we can't test it. Packed
>>>>   command may be supported in out-of-tree code, but I doubt
>>>>   it. I doubt that the code is even working anymore due to
>>>>   other refactorings in the MMC block layer, who would
>>>>   notice if patches affecting it broke packed commands?
>>>>   No one.
>>>
>>> This is a very good indication that it's not really used.
>>> It would be useful to also check out the Android AOSP
>>> kernel tree to see if it's in there, if anything important
>>> supports packed commands, it's probably in there.
>>
>> As far as I can say from reviewing the mobile (Android)
>> platforms kernel source (from different vendors),
>> many of them are enabling Packed Commands.
>
> Actually, some shipping Samsung devices with eMMC4.5 might be used packed command.
> (For Android/Tizen OS and ARTIK boards)

Thanks for sharing this information!

It seems like we need to run another round of performance
measurements, as to get some fresh number of the benefit of packed
command.
I would really appreciate if you could help out with that.

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mmc: block: delete packed command support
  2016-11-22  8:49       ` Ulf Hansson
@ 2016-11-22 12:49         ` Linus Walleij
  2016-11-23  9:34           ` Jaehoon Chung
  2016-11-22 14:44         ` Arnd Bergmann
  1 sibling, 1 reply; 15+ messages in thread
From: Linus Walleij @ 2016-11-22 12:49 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Jaehoon Chung, Alex Lemberg, Arnd Bergmann, linux-mmc,
	Chunyan Zhang, Baolin Wang, Namjae Jeon, Maya Erez, Luca Porzio

On Tue, Nov 22, 2016 at 9:49 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
> On 22 November 2016 at 04:53, Jaehoon Chung <jh80.chung@samsung.com> wrote:

>>> Correct, in general there is no value in using packed for Read.
>>> But I can’t say this for all existing flash management solution.
>>> The eMMC spec allows to use it for Read as well.
>>
>> As i know, when packed command had implemented, early eMMC had the firmware problem
>> for Packed Read operation. but so I can't say Packed Read doesn't have the benefit for performance.
>> But Packed Write command can see the benefit for performance.
>
> Regarding "performance", are you merely thinking about increased
> throughput? With packed command we decrease the communication overhead
> with the card so less commands becomes sent/received.
>
> Or, did you also observed an improved behaviour of the card from a
> garbage collect point of view? In other words also a decreased latency
> when the device is becoming more and more used?
>
> Finally, did you compare the packed command, towards using the
> asynchronous request mechanisms (using the ->pre|post_req() mmc host
> ops)?

Packed write (and read, if we had implemented it) can only happen when
we get a number of requests to different areas of the card.

If they are to consecutive sectors (such as with a dd-test) it will not
improve performance, as that case is already optimized by the block
layer by front-merging of the requests into large chunks, I guess?

Indeed commit ce39f9d17c14 mentions improvement on lmdd but not
how it was invoked. I suspect it was invoked with random writes...

It is important to do everything we can to improve random small writes
though, and it seems packed command can do that.

(...)
>> Actually, some shipping Samsung devices with eMMC4.5 might be used packed command.
>> (For Android/Tizen OS and ARTIK boards)
>
> Thanks for sharing this information!
>
> It seems like we need to run another round of performance
> measurements, as to get some fresh number of the benefit of packed
> command.
> I would really appreciate if you could help out with that.

I would add: do it on the upstream kernel and submit the stuff needed
to make it work for you out-of-the box.

This is on Exynos I guess?

Can we have this flag set for the Exynos host controller, and/or
proper DT bindings to mark it as packed command enabled?

Hell I don't even know if this is a feature that needs anything special
from the host controller. Does it? Should we rather just enable it for all
host controllers if the card supports packed command? I'm lost.

Yours,
Linus Walleij

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mmc: block: delete packed command support
  2016-11-22  8:49       ` Ulf Hansson
  2016-11-22 12:49         ` Linus Walleij
@ 2016-11-22 14:44         ` Arnd Bergmann
  2016-11-22 16:06           ` Ulf Hansson
  1 sibling, 1 reply; 15+ messages in thread
From: Arnd Bergmann @ 2016-11-22 14:44 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Jaehoon Chung, Alex Lemberg, Linus Walleij, linux-mmc,
	Chunyan Zhang, Baolin Wang, Namjae Jeon, Maya Erez, Luca Porzio

On Tuesday, November 22, 2016 9:49:26 AM CET Ulf Hansson wrote:
> On 22 November 2016 at 04:53, Jaehoon Chung <jh80.chung@samsung.com> wrote:
> > On 11/21/2016 11:23 PM, Alex Lemberg wrote:
> >> On 11/21/16, 1:11 PM, "Arnd Bergmann" <arnd@arndb.de> wrote:
> >>> On Monday, November 21, 2016 11:08:57 AM CET Linus Walleij wrote:
> >>>
> >>> Packed reads don't make a lot of sense, there is very little
> >>> for an MMC to optimize in reads that it can't already do without
> >>> the packing. For writes, packing makes could be an important
> >>> performance optimization, if the eMMC supports it.
> >>>
> >>> I've added Luca Porzio  and Alex Lemberg to Cc. I think they
> >>> are subscribed to the list already, but it would be good to
> >>> get some estimate from them too about how common the packed
> >>> write support is on existing hardware from their respective
> >>> employers before we kill it off.
> >>
> >> Correct, in general there is no value in using packed for Read.
> >> But I can’t say this for all existing flash management solution.
> >> The eMMC spec allows to use it for Read as well.
> >
> > As i know, when packed command had implemented, early eMMC had the firmware problem
> > for Packed Read operation. but so I can't say Packed Read doesn't have the benefit for performance.
> > But Packed Write command can see the benefit for performance.
> 
> Regarding "performance", are you merely thinking about increased
> throughput? With packed command we decrease the communication overhead
> with the card so less commands becomes sent/received.
> 
> Or, did you also observed an improved behaviour of the card from a
> garbage collect point of view? In other words also a decreased latency
> when the device is becoming more and more used?
> 
> Finally, did you compare the packed command, towards using the
> asynchronous request mechanisms (using the ->pre|post_req() mmc host
> ops)?

The main point of command packing is that the device can be smarter
about garbage collection as well as combine sub-page sized writes.

The communication overhead is nearly irrelevant in comparison,
and we would probably not have done anything for that.

> >> As far as I can say from reviewing the mobile (Android)
> >> platforms kernel source (from different vendors),
> >> many of them are enabling Packed Commands.
> >
> > Actually, some shipping Samsung devices with eMMC4.5 might be used packed command.
> > (For Android/Tizen OS and ARTIK boards)
> 
> Thanks for sharing this information!
> 
> It seems like we need to run another round of performance
> measurements, as to get some fresh number of the benefit of packed
> command.
> I would really appreciate if you could help out with that.

As far as I'm concerned, there are already two conclusions and
I don't think those measurements would help much:

- It's a problem that none of our upstream drivers support this
  feature, and we really want them to do so, at least after the
  blk_mq change.

- Linus' analysis is still valid: there is no regression in removing
  it from the traditional blk code today, anyone who has a private
  driver that uses it can simply revert the removal in their next
  private product release.

If removing it helps us enable blk_mq support more easily, then
I think we can take out the packed command handling, but we have
to be prepared to put it back later on top of blk_mq.

	Arnd

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mmc: block: delete packed command support
  2016-11-22 14:44         ` Arnd Bergmann
@ 2016-11-22 16:06           ` Ulf Hansson
  2016-11-23  9:40             ` Jaehoon Chung
  0 siblings, 1 reply; 15+ messages in thread
From: Ulf Hansson @ 2016-11-22 16:06 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Jaehoon Chung, Alex Lemberg, Linus Walleij, linux-mmc,
	Chunyan Zhang, Baolin Wang, Namjae Jeon, Maya Erez, Luca Porzio

[...]

>>
>> Regarding "performance", are you merely thinking about increased
>> throughput? With packed command we decrease the communication overhead
>> with the card so less commands becomes sent/received.
>>
>> Or, did you also observed an improved behaviour of the card from a
>> garbage collect point of view? In other words also a decreased latency
>> when the device is becoming more and more used?
>>
>> Finally, did you compare the packed command, towards using the
>> asynchronous request mechanisms (using the ->pre|post_req() mmc host
>> ops)?
>
> The main point of command packing is that the device can be smarter
> about garbage collection as well as combine sub-page sized writes.

Ideally, yes! But I am not sure that is the case. Vendors would have to confirm.

By following the evolution of development of the eMMC spec one could
make some guesses. Let me try it :-).

In the eMMC 4.5 spec, "data tag", "context ID" and "packed command"
was added. It seems to me that "data tag" and "context ID" was
intended to help the device to be smarter about garbage collection,
while "packed command" is about reducing overhead.

The below is quoted from the packed command section in the spec:

"Read and write commands can be packed in groups of commands (either
all read or all write) that transfer the data for all commands in the
group in one transfer on the bus, to reduce overheads."

>
> The communication overhead is nearly irrelevant in comparison,
> and we would probably not have done anything for that.

I think we did.

And that's particularly why I am interested to see a fresh comparison
against the async request transfer mechanism.

>
>> >> As far as I can say from reviewing the mobile (Android)
>> >> platforms kernel source (from different vendors),
>> >> many of them are enabling Packed Commands.
>> >
>> > Actually, some shipping Samsung devices with eMMC4.5 might be used packed command.
>> > (For Android/Tizen OS and ARTIK boards)
>>
>> Thanks for sharing this information!
>>
>> It seems like we need to run another round of performance
>> measurements, as to get some fresh number of the benefit of packed
>> command.
>> I would really appreciate if you could help out with that.
>
> As far as I'm concerned, there are already two conclusions and
> I don't think those measurements would help much:
>
> - It's a problem that none of our upstream drivers support this
>   feature, and we really want them to do so, at least after the
>   blk_mq change.
>
> - Linus' analysis is still valid: there is no regression in removing
>   it from the traditional blk code today, anyone who has a private
>   driver that uses it can simply revert the removal in their next
>   private product release.
>
> If removing it helps us enable blk_mq support more easily, then
> I think we can take out the packed command handling, but we have
> to be prepared to put it back later on top of blk_mq.
>

Thanks for the summary. This do seems like a nice approach.

Let's see what other people thinks about this.

Kind regards
Uffe

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mmc: block: delete packed command support
  2016-11-22 12:49         ` Linus Walleij
@ 2016-11-23  9:34           ` Jaehoon Chung
  0 siblings, 0 replies; 15+ messages in thread
From: Jaehoon Chung @ 2016-11-23  9:34 UTC (permalink / raw)
  To: Linus Walleij, Ulf Hansson
  Cc: Alex Lemberg, Arnd Bergmann, linux-mmc, Chunyan Zhang,
	Baolin Wang, Namjae Jeon, Maya Erez, Luca Porzio

Hi Linus,

On 11/22/2016 09:49 PM, Linus Walleij wrote:
> On Tue, Nov 22, 2016 at 9:49 AM, Ulf Hansson <ulf.hansson@linaro.org> wrote:
>> On 22 November 2016 at 04:53, Jaehoon Chung <jh80.chung@samsung.com> wrote:
> 
>>>> Correct, in general there is no value in using packed for Read.
>>>> But I can’t say this for all existing flash management solution.
>>>> The eMMC spec allows to use it for Read as well.
>>>
>>> As i know, when packed command had implemented, early eMMC had the firmware problem
>>> for Packed Read operation. but so I can't say Packed Read doesn't have the benefit for performance.
>>> But Packed Write command can see the benefit for performance.
>>
>> Regarding "performance", are you merely thinking about increased
>> throughput? With packed command we decrease the communication overhead
>> with the card so less commands becomes sent/received.
>>
>> Or, did you also observed an improved behaviour of the card from a
>> garbage collect point of view? In other words also a decreased latency
>> when the device is becoming more and more used?
>>
>> Finally, did you compare the packed command, towards using the
>> asynchronous request mechanisms (using the ->pre|post_req() mmc host
>> ops)?
> 
> Packed write (and read, if we had implemented it) can only happen when
> we get a number of requests to different areas of the card.
> 
> If they are to consecutive sectors (such as with a dd-test) it will not
> improve performance, as that case is already optimized by the block
> layer by front-merging of the requests into large chunks, I guess?
> 
> Indeed commit ce39f9d17c14 mentions improvement on lmdd but not
> how it was invoked. I suspect it was invoked with random writes...

Well, I don't know those improvements was right or not..

> 
> It is important to do everything we can to improve random small writes
> though, and it seems packed command can do that.

Agreed...but i have checked with latest kernel and older version(v3.18).
I saw the random write performance with small chunk size was increased with v3.18, not v4.9-rc5.
I don't know what happens.

> 
> (...)
>>> Actually, some shipping Samsung devices with eMMC4.5 might be used packed command.
>>> (For Android/Tizen OS and ARTIK boards)
>>
>> Thanks for sharing this information!
>>
>> It seems like we need to run another round of performance
>> measurements, as to get some fresh number of the benefit of packed
>> command.
>> I would really appreciate if you could help out with that.
> 
> I would add: do it on the upstream kernel and submit the stuff needed
> to make it work for you out-of-the box.
> 
> This is on Exynos I guess?
> 
> Can we have this flag set for the Exynos host controller, and/or
> proper DT bindings to mark it as packed command enabled?

Actually, I can add the property or flags..but i think it's meaningless..
If you ask me my opinion, i don't want to add the enabling packed command at this time.

Because i didn't see the increasing performance dynamically at this time.
But i can say..when we had applied the packed command (2~3years ago.), there were some benefit to use this command.

Yes..my comments might be confused..
but i just mentioned about some Samsung guys are using packed command for their devices with eMMC4.5.
I'm not sure which kernel version they used..maybe older version.

> 
> Hell I don't even know if this is a feature that needs anything special
> from the host controller. Does it? Should we rather just enable it for all
> host controllers if the card supports packed command? I'm lost.
> 

I'm also feeling likes you...Now I'm checking about performance with all exynos devices that i have.

Best Regards,
Jaehoon Chung

> Yours,
> Linus Walleij
> --
> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mmc: block: delete packed command support
  2016-11-22 16:06           ` Ulf Hansson
@ 2016-11-23  9:40             ` Jaehoon Chung
  0 siblings, 0 replies; 15+ messages in thread
From: Jaehoon Chung @ 2016-11-23  9:40 UTC (permalink / raw)
  To: Ulf Hansson, Arnd Bergmann
  Cc: Alex Lemberg, Linus Walleij, linux-mmc, Chunyan Zhang,
	Baolin Wang, Namjae Jeon, Maya Erez, Luca Porzio

On 11/23/2016 01:06 AM, Ulf Hansson wrote:
> [...]
> 
>>>
>>> Regarding "performance", are you merely thinking about increased
>>> throughput? With packed command we decrease the communication overhead
>>> with the card so less commands becomes sent/received.
>>>
>>> Or, did you also observed an improved behaviour of the card from a
>>> garbage collect point of view? In other words also a decreased latency
>>> when the device is becoming more and more used?
>>>
>>> Finally, did you compare the packed command, towards using the
>>> asynchronous request mechanisms (using the ->pre|post_req() mmc host
>>> ops)?
>>
>> The main point of command packing is that the device can be smarter
>> about garbage collection as well as combine sub-page sized writes.
> 
> Ideally, yes! But I am not sure that is the case. Vendors would have to confirm.
> 
> By following the evolution of development of the eMMC spec one could
> make some guesses. Let me try it :-).
> 
> In the eMMC 4.5 spec, "data tag", "context ID" and "packed command"
> was added. It seems to me that "data tag" and "context ID" was
> intended to help the device to be smarter about garbage collection,
> while "packed command" is about reducing overhead.
> 
> The below is quoted from the packed command section in the spec:
> 
> "Read and write commands can be packed in groups of commands (either
> all read or all write) that transfer the data for all commands in the
> group in one transfer on the bus, to reduce overheads."
> 
>>
>> The communication overhead is nearly irrelevant in comparison,
>> and we would probably not have done anything for that.
> 
> I think we did.
> 
> And that's particularly why I am interested to see a fresh comparison
> against the async request transfer mechanism.
> 
>>
>>>>> As far as I can say from reviewing the mobile (Android)
>>>>> platforms kernel source (from different vendors),
>>>>> many of them are enabling Packed Commands.
>>>>
>>>> Actually, some shipping Samsung devices with eMMC4.5 might be used packed command.
>>>> (For Android/Tizen OS and ARTIK boards)
>>>
>>> Thanks for sharing this information!
>>>
>>> It seems like we need to run another round of performance
>>> measurements, as to get some fresh number of the benefit of packed
>>> command.
>>> I would really appreciate if you could help out with that.
>>
>> As far as I'm concerned, there are already two conclusions and
>> I don't think those measurements would help much:
>>
>> - It's a problem that none of our upstream drivers support this
>>   feature, and we really want them to do so, at least after the
>>   blk_mq change.
>>
>> - Linus' analysis is still valid: there is no regression in removing
>>   it from the traditional blk code today, anyone who has a private
>>   driver that uses it can simply revert the removal in their next
>>   private product release.

It looks like makes sense..I agreed this..

>>
>> If removing it helps us enable blk_mq support more easily, then
>> I think we can take out the packed command handling, but we have
>> to be prepared to put it back later on top of blk_mq.
>>
> 
> Thanks for the summary. This do seems like a nice approach.
> 
> Let's see what other people thinks about this.
> 
> Kind regards
> Uffe
> --
> To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2016-11-23  9:40 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-21 10:08 [PATCH] mmc: block: delete packed command support Linus Walleij
2016-11-21 11:11 ` Arnd Bergmann
2016-11-21 12:44   ` Adrian Hunter
2016-11-21 14:02     ` Ulf Hansson
2016-11-21 14:17       ` Adrian Hunter
2016-11-21 15:17         ` Ulf Hansson
2016-11-21 15:27         ` Linus Walleij
2016-11-21 14:23   ` Alex Lemberg
2016-11-22  3:53     ` Jaehoon Chung
2016-11-22  8:49       ` Ulf Hansson
2016-11-22 12:49         ` Linus Walleij
2016-11-23  9:34           ` Jaehoon Chung
2016-11-22 14:44         ` Arnd Bergmann
2016-11-22 16:06           ` Ulf Hansson
2016-11-23  9:40             ` Jaehoon Chung

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.