All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: tianyu.lan@intel.com, kevin.tian@intel.com, mst@redhat.com,
	jan.kiszka@siemens.com, bd.aviv@gmail.com, qemu-devel@nongnu.org,
	alex.williamson@redhat.com
Subject: Re: [Qemu-devel] [PATCH RFC v4 15/20] intel_iommu: provide its own replay() callback
Date: Mon, 23 Jan 2017 10:54:49 +0800	[thread overview]
Message-ID: <20170123025449.GE26526@pxdev.xzpeter.org> (raw)
In-Reply-To: <c7265d9c-54e9-f4a2-003d-85c4416c1651@redhat.com>

On Mon, Jan 23, 2017 at 09:48:48AM +0800, Jason Wang wrote:
> 
> 
> On 2017年01月22日 16:51, Peter Xu wrote:
> >On Sun, Jan 22, 2017 at 03:56:10PM +0800, Jason Wang wrote:
> >
> >[...]
> >
> >>>+/**
> >>>+ * vtd_page_walk_level - walk over specific level for IOVA range
> >>>+ *
> >>>+ * @addr: base GPA addr to start the walk
> >>>+ * @start: IOVA range start address
> >>>+ * @end: IOVA range end address (start <= addr < end)
> >>>+ * @hook_fn: hook func to be called when detected page
> >>>+ * @private: private data to be passed into hook func
> >>>+ * @read: whether parent level has read permission
> >>>+ * @write: whether parent level has write permission
> >>>+ * @skipped: accumulated skipped ranges
> >>What's the usage for this parameter? Looks like it was never used in this
> >>series.
> >This was for debugging purpose before, and I kept it in case one day
> >it can be used again, considering that will not affect much on the
> >overall performance.
> 
> I think we usually do not keep debugging codes outside debug macros.

I'll remove it.

> 
> >
> >>>+ * @notify_unmap: whether we should notify invalid entries
> >>>+ */
> >>>+static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
> >>>+                               uint64_t end, vtd_page_walk_hook hook_fn,
> >>>+                               void *private, uint32_t level,
> >>>+                               bool read, bool write, uint64_t *skipped,
> >>>+                               bool notify_unmap)
> >>>+{
> >>>+    bool read_cur, write_cur, entry_valid;
> >>>+    uint32_t offset;
> >>>+    uint64_t slpte;
> >>>+    uint64_t subpage_size, subpage_mask;
> >>>+    IOMMUTLBEntry entry;
> >>>+    uint64_t iova = start;
> >>>+    uint64_t iova_next;
> >>>+    uint64_t skipped_local = 0;
> >>>+    int ret = 0;
> >>>+
> >>>+    trace_vtd_page_walk_level(addr, level, start, end);
> >>>+
> >>>+    subpage_size = 1ULL << vtd_slpt_level_shift(level);
> >>>+    subpage_mask = vtd_slpt_level_page_mask(level);
> >>>+
> >>>+    while (iova < end) {
> >>>+        iova_next = (iova & subpage_mask) + subpage_size;
> >>>+
> >>>+        offset = vtd_iova_level_offset(iova, level);
> >>>+        slpte = vtd_get_slpte(addr, offset);
> >>>+
> >>>+        /*
> >>>+         * When one of the following case happens, we assume the whole
> >>>+         * range is invalid:
> >>>+         *
> >>>+         * 1. read block failed
> >>Don't get the meaning (and don't see any code relate to this comment).
> >I took above vtd_get_slpte() a "read", so I was trying to mean that we
> >failed to read the SLPTE due to some reason, we assume the range is
> >invalid.
> 
> I see, so we'd better move the comment above of vtd_get_slpte().

Let me remove the whole comment block... I think the codes explained
it clearly even without any comment. (when people see the code check
slpte against -1, it'll think about above function naturally)

> 
> >
> >>>+         * 3. reserved area non-zero
> >>>+         * 2. both read & write flag are not set
> >>Should be 1,2,3? And the above comment is conflict with the code at least
> >>when notify_unmap is true.
> >Yes, okay I don't know why 3 jumped. :(
> >
> >And yes, I should mention that "both read & write flag not set" only
> >suites for page tables here.
> >
> >>>+         */
> >>>+
> >>>+        if (slpte == (uint64_t)-1) {
> >>If this is true, vtd_slpte_nonzero_rsvd(slpte) should be true too I think?
> >Yes, but we are doing two checks here:
> >
> >- checking against -1 to make sure slpte read success
> >- checking against nonzero reserved fields to make sure it follows spec
> >
> >Imho we should not skip the first check here, only if one day removing
> >this may really matter (e.g., for performance reason? I cannot think
> >of one yet).
> 
> Ok. (return -1 seems not good, but we can address this in the future).

Thanks.

> 
> >
> >>>+            trace_vtd_page_walk_skip_read(iova, iova_next);
> >>>+            skipped_local++;
> >>>+            goto next;
> >>>+        }
> >>>+
> >>>+        if (vtd_slpte_nonzero_rsvd(slpte, level)) {
> >>>+            trace_vtd_page_walk_skip_reserve(iova, iova_next);
> >>>+            skipped_local++;
> >>>+            goto next;
> >>>+        }
> >>>+
> >>>+        /* Permissions are stacked with parents' */
> >>>+        read_cur = read && (slpte & VTD_SL_R);
> >>>+        write_cur = write && (slpte & VTD_SL_W);
> >>>+
> >>>+        /*
> >>>+         * As long as we have either read/write permission, this is
> >>>+         * a valid entry. The rule works for both page or page tables.
> >>>+         */
> >>>+        entry_valid = read_cur | write_cur;
> >>>+
> >>>+        if (vtd_is_last_slpte(slpte, level)) {
> >>>+            entry.target_as = &address_space_memory;
> >>>+            entry.iova = iova & subpage_mask;
> >>>+            /*
> >>>+             * This might be meaningless addr if (!read_cur &&
> >>>+             * !write_cur), but after all this field will be
> >>>+             * meaningless in that case, so let's share the code to
> >>>+             * generate the IOTLBs no matter it's an MAP or UNMAP
> >>>+             */
> >>>+            entry.translated_addr = vtd_get_slpte_addr(slpte);
> >>>+            entry.addr_mask = ~subpage_mask;
> >>>+            entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
> >>>+            if (!entry_valid && !notify_unmap) {
> >>>+                trace_vtd_page_walk_skip_perm(iova, iova_next);
> >>>+                skipped_local++;
> >>>+                goto next;
> >>>+            }
> >>Under which case should we care about unmap here (better with a comment I
> >>think)?
> >When PSIs are for invalidation, rather than newly mapped entries. In
> >that case, notify_unmap will be true, and here we need to notify
> >IOMMU_NONE to do the cache flush or unmap.
> >
> >(this page walk is not only for replaying, it is for cache flushing as
> >  well)
> >
> >Do you have suggestion on the comments?
> 
> I think then we'd better move this to patch 18 which will use notify_unmap.

Do you mean to add some more comment on patch 18?

> 
> >
> >>>+            trace_vtd_page_walk_one(level, entry.iova, entry.translated_addr,
> >>>+                                    entry.addr_mask, entry.perm);
> >>>+            if (hook_fn) {
> >>>+                ret = hook_fn(&entry, private);
> >>For better performance, we could try to merge adjacent mappings here. I
> >>think both vfio and vhost support this and it can save a lot of ioctls.
> >Looks so, and this is in my todo list.
> >
> >Do you mind I do it later after this series merged? I would really
> >appreciate if we can have this codes settled down first (considering
> >that this series has been dangling for half a year, or more, startint
> >from Aviv's series), and I am just afraid this will led to
> >unconvergence of this series (and I believe there are other places
> >that can be enhanced in the future as well).
> 
> Yes, let's do it on top.

Thanks.

> 
> >
> >>>+                if (ret < 0) {
> >>>+                    error_report("Detected error in page walk hook "
> >>>+                                 "function, stop walk.");
> >>>+                    return ret;
> >>>+                }
> >>>+            }
> >>>+        } else {
> >>>+            if (!entry_valid) {
> >>>+                trace_vtd_page_walk_skip_perm(iova, iova_next);
> >>>+                skipped_local++;
> >>>+                goto next;
> >>>+            }
> >>>+            trace_vtd_page_walk_level(vtd_get_slpte_addr(slpte), level - 1,
> >>>+                                      iova, MIN(iova_next, end));
> >>This looks duplicated?
> >I suppose not. The level is different.
> 
> Seem not? level - 1 was passed to vtd_page_walk_level() as level which did:
> 
> trace_vtd_page_walk_level(addr, level, start, end);

Hmm yes I didn't notice that I had one at the entry. :(

Let me only keep that one.

Thanks,

-- peterx

  reply	other threads:[~2017-01-23  2:55 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-20 13:08 [Qemu-devel] [PATCH RFC v4 00/20] VT-d: vfio enablement and misc enhances Peter Xu
2017-01-20 13:08 ` [Qemu-devel] [PATCH RFC v4 01/20] vfio: trace map/unmap for notify as well Peter Xu
2017-01-23 18:20   ` Alex Williamson
2017-01-20 13:08 ` [Qemu-devel] [PATCH RFC v4 02/20] vfio: introduce vfio_get_vaddr() Peter Xu
2017-01-23 18:49   ` Alex Williamson
2017-01-24  3:28     ` Peter Xu
2017-01-24  4:30       ` Alex Williamson
2017-01-20 13:08 ` [Qemu-devel] [PATCH RFC v4 03/20] vfio: allow to notify unmap for very large region Peter Xu
2017-01-20 13:08 ` [Qemu-devel] [PATCH RFC v4 04/20] IOMMU: add option to enable VTD_CAP_CM to vIOMMU capility exposoed to guest Peter Xu
2017-01-22  2:51   ` [Qemu-devel] [PATCH RFC v4.1 04/20] intel_iommu: add "caching-mode" option Peter Xu
2017-01-20 13:08 ` [Qemu-devel] [PATCH RFC v4 05/20] intel_iommu: simplify irq region translation Peter Xu
2017-01-20 13:08 ` [Qemu-devel] [PATCH RFC v4 06/20] intel_iommu: renaming gpa to iova where proper Peter Xu
2017-01-20 13:08 ` [Qemu-devel] [PATCH RFC v4 07/20] intel_iommu: fix trace for inv desc handling Peter Xu
2017-01-20 13:08 ` [Qemu-devel] [PATCH RFC v4 08/20] intel_iommu: fix trace for addr translation Peter Xu
2017-01-20 13:08 ` [Qemu-devel] [PATCH RFC v4 09/20] intel_iommu: vtd_slpt_level_shift check level Peter Xu
2017-01-20 13:08 ` [Qemu-devel] [PATCH RFC v4 10/20] memory: add section range info for IOMMU notifier Peter Xu
2017-01-23 19:12   ` Alex Williamson
2017-01-24  7:48     ` Peter Xu
2017-01-20 13:08 ` [Qemu-devel] [PATCH RFC v4 11/20] memory: provide IOMMU_NOTIFIER_FOREACH macro Peter Xu
2017-01-20 13:08 ` [Qemu-devel] [PATCH RFC v4 12/20] memory: provide iommu_replay_all() Peter Xu
2017-01-20 13:08 ` [Qemu-devel] [PATCH RFC v4 13/20] memory: introduce memory_region_notify_one() Peter Xu
2017-01-20 13:08 ` [Qemu-devel] [PATCH RFC v4 14/20] memory: add MemoryRegionIOMMUOps.replay() callback Peter Xu
2017-01-20 13:08 ` [Qemu-devel] [PATCH RFC v4 15/20] intel_iommu: provide its own replay() callback Peter Xu
2017-01-22  7:56   ` Jason Wang
2017-01-22  8:51     ` Peter Xu
2017-01-22  9:36       ` Peter Xu
2017-01-23  1:50         ` Jason Wang
2017-01-23  1:48       ` Jason Wang
2017-01-23  2:54         ` Peter Xu [this message]
2017-01-23  3:12           ` Jason Wang
2017-01-23  3:35             ` Peter Xu
2017-01-23 19:34           ` Alex Williamson
2017-01-24  4:04             ` Peter Xu
2017-01-23 19:33       ` Alex Williamson
2017-01-20 13:08 ` [Qemu-devel] [PATCH RFC v4 16/20] intel_iommu: do replay when context invalidate Peter Xu
2017-01-23 10:36   ` Jason Wang
2017-01-24  4:52     ` Peter Xu
2017-01-25  3:09       ` Jason Wang
2017-01-25  3:46         ` Peter Xu
2017-01-25  6:37           ` Tian, Kevin
2017-01-25  6:44             ` Peter Xu
2017-01-25  7:45               ` Jason Wang
2017-01-20 13:08 ` [Qemu-devel] [PATCH RFC v4 17/20] intel_iommu: allow dynamic switch of IOMMU region Peter Xu
2017-01-20 13:08 ` [Qemu-devel] [PATCH RFC v4 18/20] intel_iommu: enable vfio devices Peter Xu
2017-01-22  8:08   ` Jason Wang
2017-01-22  9:04     ` Peter Xu
2017-01-23  1:55       ` Jason Wang
2017-01-23  3:34         ` Peter Xu
2017-01-23 10:23           ` Jason Wang
2017-01-23 19:40             ` Alex Williamson
2017-01-25  1:19               ` Jason Wang
2017-01-25  1:31                 ` Alex Williamson
2017-01-25  7:41                   ` Jason Wang
2017-01-24  4:42             ` Peter Xu
2017-01-23 18:03           ` Alex Williamson
2017-01-24  7:22             ` Peter Xu
2017-01-24 16:24               ` Alex Williamson
2017-01-25  4:04                 ` Peter Xu
2017-01-23  2:01   ` Jason Wang
2017-01-23  2:17     ` Jason Wang
2017-01-23  3:40     ` Peter Xu
2017-01-23 10:27       ` Jason Wang
2017-01-20 13:08 ` [Qemu-devel] [PATCH RFC v4 19/20] intel_iommu: unmap existing pages before replay Peter Xu
2017-01-22  8:13   ` Jason Wang
2017-01-22  9:09     ` Peter Xu
2017-01-23  1:57       ` Jason Wang
2017-01-23  7:30         ` Peter Xu
2017-01-23 10:29           ` Jason Wang
2017-01-23 10:40   ` Jason Wang
2017-01-24  7:31     ` Peter Xu
2017-01-25  3:11       ` Jason Wang
2017-01-25  4:15         ` Peter Xu
2017-01-20 13:08 ` [Qemu-devel] [PATCH RFC v4 20/20] intel_iommu: replay even with DSI/GLOBAL inv desc Peter Xu
2017-01-23 15:55 ` [Qemu-devel] [PATCH RFC v4 00/20] VT-d: vfio enablement and misc enhances Michael S. Tsirkin
2017-01-24  7:40   ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170123025449.GE26526@pxdev.xzpeter.org \
    --to=peterx@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=bd.aviv@gmail.com \
    --cc=jan.kiszka@siemens.com \
    --cc=jasowang@redhat.com \
    --cc=kevin.tian@intel.com \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=tianyu.lan@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.