All of lore.kernel.org
 help / color / mirror / Atom feed
From: Boaz Harrosh <bharrosh@panasas.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>,
	Wang Sen <senwang@linux.vnet.ibm.com>,
	<linux-scsi@vger.kernel.org>, <JBottomley@parallels.com>,
	<stefanha@linux.vnet.ibm.com>, <mc@linux.vnet.ibm.com>,
	<linux-kernel@vger.kernel.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: Re: virtio(-scsi) vs. chained sg_lists (was Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list)
Date: Mon, 30 Jul 2012 11:56:47 +0300	[thread overview]
Message-ID: <50164C4F.7030905@panasas.com> (raw)
In-Reply-To: <501633EA.603@redhat.com>

On 07/30/2012 10:12 AM, Paolo Bonzini wrote:

> Il 30/07/2012 01:50, Rusty Russell ha scritto:
>>> Also, being the first user of chained scatterlist doesn't exactly give
>>> me warm fuzzies.
>>
>> We're far from the first user: they've been in the kernel for well over
>> 7 years.  They were introduced for the block layer, but they tended to
>> ignore private uses of scatterlists like this one.
> 
> Yeah, but sg_chain has no users in drivers, only a private one in
> lib/scatterlist.c.  The internal API could be changed to something else
> and leave virtio-scsi screwed...
> 
>> Yes, we should do this.  But note that this means an iteration, so we
>> might as well combine the loops :)
> 
> I'm really bad at posting pseudo-code, but you can count the number of
> physically-contiguous entries at the beginning of the list only.  So if
> everything is contiguous, you use a single non-indirect buffer and save
> a kmalloc.  If you use indirect buffers, I suspect it's much less
> effective to collapse physically-contiguous entries.  More elaborate
> heuristics do need a loop, though.
> 


[All the below with a grain of salt, from my senile memory]

You must not forget some facts about the scatterlist received here at the LLD.
It has already been DMA mapped and locked by the generic layer.
Which means that the DMA engine has already collapsed physically-contiguous
entries. Those you get here are already unique physically.
(There were bugs in the past, where this was not true, please complain
 if you find them again)

A scatterlist is two different lists taking the same space, but with two
different length.
- One list is the PAGE pointers plus offset && length, which is bigger or
  equal to the 2nd list. The end marker corresponds to this list.

  This list is the input into the DMA engine.

- Second list is the physical DMA addresses list. With their physical-lengths.
  Offset is not needed because it is incorporated in the DMA address.

  This list is the output from the DMA engine.

  The reason 2nd list is shorter is because the DMA engine tries to minimize
  the physical scatter-list entries which is usually a limited HW resource.

  This list might follow chains but it's end is determined by the received
  sg_count from the DMA engine, not by the end marker.

At the time my opinion, and I think Rusty agreed, was that the scatterlist
should be split in two. The input page-ptr list is just the BIO, and the
output of the DMA-engine should just be the physical part of the sg_list,
as a separate parameter. But all this was berried under too much APIs and
the noise was two strong, for any single brave sole.

So I'd just trust blindly the returned sg_count from the DMA engine, it is
already optimized. I THINK

> Paolo


Boaz

WARNING: multiple messages have this Message-ID (diff)
From: Boaz Harrosh <bharrosh@panasas.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>,
	Wang Sen <senwang@linux.vnet.ibm.com>,
	linux-scsi@vger.kernel.org, JBottomley@parallels.com,
	stefanha@linux.vnet.ibm.com, mc@linux.vnet.ibm.com,
	linux-kernel@vger.kernel.org,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>
Subject: Re: virtio(-scsi) vs. chained sg_lists (was Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list)
Date: Mon, 30 Jul 2012 11:56:47 +0300	[thread overview]
Message-ID: <50164C4F.7030905@panasas.com> (raw)
In-Reply-To: <501633EA.603@redhat.com>

On 07/30/2012 10:12 AM, Paolo Bonzini wrote:

> Il 30/07/2012 01:50, Rusty Russell ha scritto:
>>> Also, being the first user of chained scatterlist doesn't exactly give
>>> me warm fuzzies.
>>
>> We're far from the first user: they've been in the kernel for well over
>> 7 years.  They were introduced for the block layer, but they tended to
>> ignore private uses of scatterlists like this one.
> 
> Yeah, but sg_chain has no users in drivers, only a private one in
> lib/scatterlist.c.  The internal API could be changed to something else
> and leave virtio-scsi screwed...
> 
>> Yes, we should do this.  But note that this means an iteration, so we
>> might as well combine the loops :)
> 
> I'm really bad at posting pseudo-code, but you can count the number of
> physically-contiguous entries at the beginning of the list only.  So if
> everything is contiguous, you use a single non-indirect buffer and save
> a kmalloc.  If you use indirect buffers, I suspect it's much less
> effective to collapse physically-contiguous entries.  More elaborate
> heuristics do need a loop, though.
> 


[All the below with a grain of salt, from my senile memory]

You must not forget some facts about the scatterlist received here at the LLD.
It has already been DMA mapped and locked by the generic layer.
Which means that the DMA engine has already collapsed physically-contiguous
entries. Those you get here are already unique physically.
(There were bugs in the past, where this was not true, please complain
 if you find them again)

A scatterlist is two different lists taking the same space, but with two
different length.
- One list is the PAGE pointers plus offset && length, which is bigger or
  equal to the 2nd list. The end marker corresponds to this list.

  This list is the input into the DMA engine.

- Second list is the physical DMA addresses list. With their physical-lengths.
  Offset is not needed because it is incorporated in the DMA address.

  This list is the output from the DMA engine.

  The reason 2nd list is shorter is because the DMA engine tries to minimize
  the physical scatter-list entries which is usually a limited HW resource.

  This list might follow chains but it's end is determined by the received
  sg_count from the DMA engine, not by the end marker.

At the time my opinion, and I think Rusty agreed, was that the scatterlist
should be split in two. The input page-ptr list is just the BIO, and the
output of the DMA-engine should just be the physical part of the sg_list,
as a separate parameter. But all this was berried under too much APIs and
the noise was two strong, for any single brave sole.

So I'd just trust blindly the returned sg_count from the DMA engine, it is
already optimized. I THINK

> Paolo


Boaz

  reply	other threads:[~2012-07-30  8:57 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-07-25  8:29 [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list Wang Sen
2012-07-25  8:44 ` Paolo Bonzini
2012-07-25  9:22   ` Boaz Harrosh
2012-07-25  9:22     ` Boaz Harrosh
2012-07-25  9:41     ` Paolo Bonzini
2012-07-25 12:34       ` Boaz Harrosh
2012-07-25 12:34         ` Boaz Harrosh
2012-07-25 12:49         ` Paolo Bonzini
2012-07-25 13:26           ` Boaz Harrosh
2012-07-25 13:26             ` Boaz Harrosh
2012-07-25 13:36             ` Paolo Bonzini
2012-07-25 14:36               ` Boaz Harrosh
2012-07-25 14:36                 ` Boaz Harrosh
2012-07-25 15:09                 ` performance improvements for the sglist API (Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list) Paolo Bonzini
2012-07-25 15:16                   ` Paolo Bonzini
2012-07-25 14:17             ` virtio(-scsi) vs. chained sg_lists (was " Paolo Bonzini
2012-07-25 15:28               ` Boaz Harrosh
2012-07-25 15:28                 ` Boaz Harrosh
2012-07-25 17:43                 ` Paolo Bonzini
2012-07-25 19:16                   ` Boaz Harrosh
2012-07-25 19:16                     ` Boaz Harrosh
2012-07-25 20:06                     ` Paolo Bonzini
2012-07-25 21:04                       ` Boaz Harrosh
2012-07-25 21:04                         ` Boaz Harrosh
2012-07-26  7:23                         ` Paolo Bonzini
2012-07-26  7:56                           ` Boaz Harrosh
2012-07-26  7:56                             ` Boaz Harrosh
2012-07-26  7:58                             ` Paolo Bonzini
2012-07-26 13:05                               ` Paolo Bonzini
2012-07-27  6:27                                 ` Rusty Russell
2012-07-27  6:27                                   ` Rusty Russell
2012-07-27  8:11                                   ` Paolo Bonzini
2012-07-29 23:50                                     ` Rusty Russell
2012-07-29 23:50                                       ` Rusty Russell
2012-07-30  7:12                                       ` Paolo Bonzini
2012-07-30  8:56                                         ` Boaz Harrosh [this message]
2012-07-30  8:56                                           ` Boaz Harrosh
2012-07-25 10:41   ` [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list Stefan Hajnoczi
2012-07-25 11:48     ` Sen Wang
2012-07-25 11:44   ` Sen Wang
2012-07-25 12:40     ` Boaz Harrosh
2012-07-25 12:40       ` Boaz Harrosh
2012-07-27  3:12       ` Wang Sen
2012-07-27  6:50         ` Paolo Bonzini
2012-07-25 10:04 ` Rolf Eike Beer
2012-07-25 10:04   ` Rolf Eike Beer
2012-07-25 11:46   ` Sen Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50164C4F.7030905@panasas.com \
    --to=bharrosh@panasas.com \
    --cc=JBottomley@parallels.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=mc@linux.vnet.ibm.com \
    --cc=pbonzini@redhat.com \
    --cc=rusty@rustcorp.com.au \
    --cc=senwang@linux.vnet.ibm.com \
    --cc=stefanha@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.