From: Boaz Harrosh <bharrosh@panasas.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>,
    Wang Sen <senwang@linux.vnet.ibm.com>,
    <linux-scsi@vger.kernel.org>, <JBottomley@parallels.com>,
    <stefanha@linux.vnet.ibm.com>, <mc@linux.vnet.ibm.com>,
    <linux-kernel@vger.kernel.org>,
    "kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: Re: virtio(-scsi) vs. chained sg_lists (was Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list)
Date: Mon, 30 Jul 2012 11:56:47 +0300
Message-ID: <50164C4F.7030905@panasas.com>
In-Reply-To: <501633EA.603@redhat.com>

On 07/30/2012 10:12 AM, Paolo Bonzini wrote:
> On 30/07/2012 01:50, Rusty Russell wrote:
>>> Also, being the first user of chained scatterlist doesn't exactly give
>>> me warm fuzzies.
>>
>> We're far from the first user: they've been in the kernel for well over
>> 7 years. They were introduced for the block layer, but they tended to
>> ignore private uses of scatterlists like this one.
>
> Yeah, but sg_chain has no users in drivers, only a private one in
> lib/scatterlist.c. The internal API could be changed to something else
> and leave virtio-scsi screwed...
>
>> Yes, we should do this. But note that this means an iteration, so we
>> might as well combine the loops :)
>
> I'm really bad at posting pseudo-code, but you can count the number of
> physically-contiguous entries at the beginning of the list only. So if
> everything is contiguous, you use a single non-indirect buffer and save
> a kmalloc. If you use indirect buffers, I suspect it's much less
> effective to collapse physically-contiguous entries. More elaborate
> heuristics do need a loop, though.
>

[All the below with a grain of salt, from my senile memory]

You must not forget some facts about the scatterlist received here at
the LLD. It has already been DMA mapped and locked by the generic
layer, which means that the DMA engine has already collapsed
physically-contiguous entries. The entries you get here are already
physically unique. (There were bugs in the past where this was not
true; please complain if you find them again.)

A scatterlist is two different lists occupying the same space, but
with two different lengths.

- The first list is the PAGE pointers plus offset && length. It is
  longer than or equal to the second list. The end marker corresponds
  to this list. This list is the input into the DMA engine.

- The second list is the list of physical DMA addresses with their
  physical lengths. Offset is not needed because it is incorporated in
  the DMA address. This list is the output from the DMA engine.

The reason the second list is shorter is that the DMA engine tries to
minimize the number of physical scatter-list entries, which are usually
a limited HW resource.

The second list might follow chains, but its end is determined by the
sg_count returned from the DMA engine, not by the end marker.

At the time my opinion, and I think Rusty agreed, was that the
scatterlist should be split in two. The input page-ptr list is just
the BIO, and the output of the DMA engine should just be the physical
part of the sg_list, passed as a separate parameter. But all this was
buried under too many APIs, and the noise was too strong for any
single brave soul.

So I'd just blindly trust the sg_count returned from the DMA engine;
it is already optimized. I THINK

> Paolo

Boaz
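The "two lists in one array" picture above can be illustrated with a small
user-space sketch. This is only a toy model under stated assumptions, not
kernel code: `toy_sg`, `toy_map_sg` and `toy_total_dma_bytes` are
hypothetical names, the "mapping" is an identity translation that merges
physically adjacent entries, and chaining is left out. In a real driver the
corresponding pieces would be the count returned by dma_map_sg() and the
sg_dma_address()/sg_dma_len() accessors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct scatterlist: the input view
 * (page, offset, length) and the output view (DMA address, DMA length)
 * share the same array slots. */
struct toy_sg {
    uint64_t page_phys;   /* input: physical address of the page */
    uint32_t offset;      /* input: offset into the page */
    uint32_t length;      /* input: length in bytes */
    uint64_t dma_address; /* output: filled in by toy_map_sg() */
    uint32_t dma_length;  /* output: filled in by toy_map_sg() */
};

/* Toy "DMA mapping": identity translation that merges entries which are
 * physically adjacent. Returns the number of output entries, which is
 * always <= nents. Only the first <return value> slots carry valid
 * dma_address/dma_length afterwards. */
size_t toy_map_sg(struct toy_sg *sg, size_t nents)
{
    size_t out = 0;

    assert(sg != NULL || nents == 0);

    for (size_t i = 0; i < nents; i++) {
        uint64_t addr = sg[i].page_phys + sg[i].offset;
        uint32_t len = sg[i].length;

        if (out > 0 &&
            sg[out - 1].dma_address + sg[out - 1].dma_length == addr) {
            /* Adjacent to the previous output entry: extend it. */
            sg[out - 1].dma_length += len;
        } else {
            /* Start a new output entry in the next free slot. */
            sg[out].dma_address = addr;
            sg[out].dma_length = len;
            out++;
        }
    }
    return out;
}

/* Consumer side: walk the output list using the mapped count,
 * not the length of the input list. */
uint64_t toy_total_dma_bytes(const struct toy_sg *sg, size_t mapped)
{
    uint64_t total = 0;

    for (size_t i = 0; i < mapped; i++)
        total += sg[i].dma_length;
    return total;
}
```

With three input entries of which the first two are adjacent, the toy
mapping returns a count of two, and a consumer that loops up to that count
never looks at the stale third output slot. That is the sense in which the
returned count, rather than the end marker, bounds the DMA-side list.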