All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vladislav Bolkhovitin <vst@vlnb.net>
To: Evgeniy Polyakov <zbr@ioremap.net>
Cc: linux-scsi@vger.kernel.org,
	James Bottomley <James.Bottomley@HansenPartnership.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>,
	Mike Christie <michaelc@cs.wisc.edu>,
	Jeff Garzik <jeff@garzik.org>,
	Boaz Harrosh <bharrosh@panasas.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-kernel@vger.kernel.org, scst-devel@lists.sourceforge.net,
	Bart Van Assche <bart.vanassche@gmail.com>,
	"Nicholas A. Bellinger" <nab@linux-iscsi.org>,
	netdev@vger.kernel.org
Subject: Re: [PATCH][RFC 23/23]: Support for zero-copy TCP transmit of user space data
Date: Thu, 11 Dec 2008 21:16:47 +0300	[thread overview]
Message-ID: <4941590F.3070705@vlnb.net> (raw)
In-Reply-To: <20081210214500.GA24212@ioremap.net>

Hi Evgeniy,

Evgeniy Polyakov wrote:
> Hi Vladislav.
> 
> On Wed, Dec 10, 2008 at 10:04:36PM +0300, Vladislav Bolkhovitin (vst@vlnb.net) wrote:
>> In the chosen approach new optional field void *net_priv was added to 
>> struct page. It is enclosed by
> 
> There is a huge no-no in networking land on increasing skb.
> Reason is simple every skb will carry potentially unneded data as long
> as given option is enabled, and most of the time it will.
> To break this barrier one has to have (I wanted to write ego, but then
> decided to replace it with mojo) so huge reason to do this, that it is
> almost impossible to have.
> 
> Something tells me that increasing page structure with 8 bytes because
> of zero-copy iscsi transfer is not that great idea, since basically every
> user out there will have it enabled in the distro config and will waste
> noticeble amount of ram.

The waste will be only 0.2% of RAM or 2MB per 1GB. Not much. Perhaps, 
not noticeable for an average user of distro kernels at all. Embedded 
people, who count each byte, almost always don't need iSCSI, so won't 
have any problems to disable 
TCP_ZERO_COPY_TRANSFER_COMPLETION_NOTIFICATION option.

Also, as I wrote, iSCSI-SCST can work without this patch or with 
TCP_ZERO_COPY_TRANSFER_COMPLETION_NOTIFICATION option disabled, but with 
user space device handlers it will work considerably worse. Only few 
distro kernels users need an iSCSI target and only few among such users 
need to use a user space device handler. So, option 
TCP_ZERO_COPY_TRANSFER_COMPLETION_NOTIFICATION can be safely disabled in 
default configs of distro kernels. People who need both iSCSI target 
*and* fast working user space device handler would simply enable that 
option and rebuild the kernel. Rejecting this patch provides much worse 
alternative: those people would also have to *patch* the kernel at 
first, only then enable that option, then rebuild the kernel.

> The same problem of not sending any kind of notification to the user
> when his pages are 'acked' by receiving some packet or freeing the data
> exists long ago and was tried to be fixed several times.
> 
> The most applicable to your case maybe DST experience. DST is a block
> layer device and all its pages starting from quite recent kernels are
> not allowed to be slab ones (xfs was the last one who provided slab
> pages in the bios), so each page has two 'unused' pointers in lru list
> entry, which you may reuse. If scsi layer may have slab pages from some
> place (although this does not sound like a good idea, ->sendpage() will
> bug on on them anyway), this hack will not work, otherwise you only need
> to have net_page_get/put stuff in and do not mess with increasing page.
> And this was tested 3-4 kernel releases ago, so things may be changed.

I've just rechecked if any of struct page fields can be shared with 
net_priv field. Unfortunately, it looks like none among the mandatory 
fields:

  - Transmitting pages can be mapped in some user space, so sharing in 
unions with _mapcount, mapping and index fields isn't possible

  - Transmitting pages can be in the page cache, so sharing lru field 
isn't possible as well

It might, however, be shared with field "virtual", since SCST allocates 
only lowmem pages, but on non-HIGHMEM kernels and most HIGHMEM kernels 
WANT_PAGE_VIRTUAL isn't defined, so it will change basically nothing.

> Another appropach is to increase skb's shared data (at the end of the
> skb->data), and this approach was not frowned upon too much either, but
> it requires to mess with skb->destructor, which may not be appropriate
> in some cases. If iscsi does not use sockets (it does iirc), things are
> much simpler.

As I wrote, approach to use net_priv on the skb level was examined, but 
rejected as unpractical. Simply too many places should be modified to 
prevent merging skb's with different net_priv-like labels and those 
places are too inobvious. Implementation of this approach and, 
especially, maintenance it would be just a nightmare.

Thanks,
Vlad


  reply	other threads:[~2008-12-11 18:17 UTC|newest]

Thread overview: 129+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-12-10 18:26 [PATCH][RFC 0/23] New SCSI target framework (SCST) and 4 target drivers Vladislav Bolkhovitin
2008-12-10 18:28 ` [PATCH][RFC 1/23]: SCST public headers Vladislav Bolkhovitin
2008-12-10 18:30 ` [PATCH][RFC 2/23]: SCST core Vladislav Bolkhovitin
2008-12-10 19:12   ` Sam Ravnborg
2008-12-11 17:28     ` Vladislav Bolkhovitin
2008-12-11 21:09       ` Sam Ravnborg
2008-12-12 19:24         ` Vladislav Bolkhovitin
2008-12-12 21:50           ` Steven Rostedt
     [not found]             ` <20081212230523.GB4775@ghostprotocols.net>
2008-12-13  1:25               ` Frédéric Weisbecker
2008-12-13  1:25                 ` Frédéric Weisbecker
2008-12-13  1:27                 ` Frédéric Weisbecker
2008-12-13  1:27                   ` Frédéric Weisbecker
2008-12-13 14:46             ` Vladislav Bolkhovitin
2008-12-14  0:35               ` Frédéric Weisbecker
2008-12-16 21:49                 ` Ingo Molnar
2008-12-16 21:49                   ` Ingo Molnar
2008-12-16 22:13                   ` Frédéric Weisbecker
2008-12-16 22:13                     ` Frédéric Weisbecker
2008-12-16 22:22                     ` Ingo Molnar
2008-12-16 22:22                       ` Ingo Molnar
2008-12-16 23:46                       ` Frédéric Weisbecker
2008-12-16 23:46                         ` Frédéric Weisbecker
2008-12-18 11:45                 ` Vladislav Bolkhovitin
2008-12-20 13:06                   ` Frédéric Weisbecker
2008-12-20 13:06                     ` Frédéric Weisbecker
2008-12-23 19:11                     ` Vladislav Bolkhovitin
2008-12-23 19:11                       ` Vladislav Bolkhovitin
2008-12-27 11:20                       ` Ingo Molnar
2008-12-30 17:13                         ` Vladislav Bolkhovitin
2008-12-30 21:03                           ` Frederic Weisbecker
2008-12-30 21:35                             ` Steven Rostedt
2008-12-10 18:34 ` [PATCH][RFC 3/23]: SCST core docs Vladislav Bolkhovitin
2008-12-10 18:34   ` Vladislav Bolkhovitin
2008-12-10 18:36 ` [PATCH][RFC 4/23]: SCST debug support Vladislav Bolkhovitin
2008-12-10 18:37 ` [PATCH][RFC 5/23]: SCST /proc interface Vladislav Bolkhovitin
2008-12-11 20:23   ` Nicholas A. Bellinger
2008-12-12 19:23     ` Vladislav Bolkhovitin
2008-12-10 18:39 ` [PATCH][RFC 6/23]: SCST SGV cache Vladislav Bolkhovitin
2008-12-10 18:40 ` [PATCH][RFC 7/23]: SCST integration into the kernel Vladislav Bolkhovitin
2008-12-10 18:42 ` [PATCH][RFC 8/23]: SCST pass-through backend handlers Vladislav Bolkhovitin
2008-12-10 18:43 ` [PATCH][RFC 9/23]: SCST virtual disk backend handler Vladislav Bolkhovitin
2008-12-10 18:44 ` [PATCH][RFC 10/23]: SCST user space " Vladislav Bolkhovitin
2008-12-10 18:46 ` [PATCH][RFC 11/23]: Makefile for SCST backend handlers Vladislav Bolkhovitin
2008-12-10 18:47 ` [PATCH][RFC 12/23]: Patch to add necessary support for SCST pass-through Vladislav Bolkhovitin
2008-12-10 18:49 ` [PATCH][RFC 13/23]: Export of alloc_io_context() function Vladislav Bolkhovitin
2008-12-11 13:34   ` Jens Axboe
2008-12-11 18:17     ` Vladislav Bolkhovitin
2008-12-11 18:41       ` Jens Axboe
2008-12-11 19:00         ` Vladislav Bolkhovitin
2008-12-11 19:06           ` Jens Axboe
2008-12-12 19:16             ` Vladislav Bolkhovitin
2008-12-10 18:50 ` [PATCH][RFC 14/23]: Necessary functionality in qla2xxx driver to support target mode Vladislav Bolkhovitin
2008-12-10 18:51 ` [PATCH][RFC 15/23]: QLogic target driver Vladislav Bolkhovitin
2008-12-10 18:54 ` [PATCH][RFC 16/23]: Documentation for " Vladislav Bolkhovitin
2008-12-10 18:55 ` [PATCH][RFC 17/23]: InfiniBand SRP " Vladislav Bolkhovitin
2008-12-10 18:57 ` [PATCH][RFC 18/23]: Documentation for " Vladislav Bolkhovitin
2008-12-10 18:58 ` [PATCH][RFC 19/23]: scst_local " Vladislav Bolkhovitin
2008-12-10 19:00 ` [PATCH][RFC 20/23]: Documentation for scst_local driver Vladislav Bolkhovitin
2008-12-10 19:01 ` [PATCH][RFC 21/23]: iSCSI target driver Vladislav Bolkhovitin
2008-12-11 22:55   ` Nicholas A. Bellinger
2008-12-11 22:59     ` Nicholas A. Bellinger
2008-12-12 19:26     ` Vladislav Bolkhovitin
2008-12-13 10:03       ` Nicholas A. Bellinger
2008-12-13 10:11         ` Bart Van Assche
2008-12-13 10:16           ` Nicholas A. Bellinger
2008-12-13 10:27             ` Bart Van Assche
2008-12-13 15:01             ` Vladislav Bolkhovitin
2008-12-13 15:01               ` Vladislav Bolkhovitin
2008-12-13 14:57         ` Vladislav Bolkhovitin
2008-12-10 19:02 ` [PATCH][RFC 22/23]: Documentation for iSCSI-SCST Vladislav Bolkhovitin
2008-12-10 19:04 ` [PATCH][RFC 23/23]: Support for zero-copy TCP transmit of user space data Vladislav Bolkhovitin
2008-12-10 21:45   ` Evgeniy Polyakov
2008-12-11 18:16     ` Vladislav Bolkhovitin [this message]
2008-12-11 19:12       ` James Bottomley
2008-12-12 19:25         ` Vladislav Bolkhovitin
2008-12-12 19:37           ` James Bottomley
2008-12-15 17:58             ` Vladislav Bolkhovitin
2008-12-15 23:18               ` Christoph Hellwig
2008-12-16 18:57                 ` Vladislav Bolkhovitin
2008-12-18 18:35                   ` [RFC]: " Vladislav Bolkhovitin
2008-12-18 18:35                     ` Vladislav Bolkhovitin
2008-12-18 18:43                     ` David M. Lloyd
2008-12-18 18:43                       ` David M. Lloyd
2008-12-19 17:37                       ` Vladislav Bolkhovitin
2008-12-19 17:37                         ` Vladislav Bolkhovitin
2008-12-19 19:07                         ` Jens Axboe
2008-12-19 19:07                           ` Jens Axboe
2008-12-19 19:17                           ` Vladislav Bolkhovitin
2008-12-19 19:17                             ` Vladislav Bolkhovitin
2008-12-19 19:27                             ` Jens Axboe
2008-12-19 19:27                               ` Jens Axboe
2008-12-19 21:58                               ` Evgeniy Polyakov
2008-12-19 21:58                                 ` Evgeniy Polyakov
2008-12-23 19:11                               ` Vladislav Bolkhovitin
2008-12-23 19:11                                 ` Vladislav Bolkhovitin
2008-12-19 11:27                     ` Andi Kleen
2008-12-19 11:27                       ` Andi Kleen
2008-12-19 11:27                       ` Andi Kleen
2008-12-19 17:38                       ` Vladislav Bolkhovitin
2008-12-19 17:38                         ` Vladislav Bolkhovitin
2008-12-19 18:00                         ` Andi Kleen
2008-12-19 18:00                           ` Andi Kleen
2008-12-19 17:57                           ` Vladislav Bolkhovitin
2008-12-19 17:57                             ` Vladislav Bolkhovitin
2008-12-16 16:00     ` [PATCH][RFC 23/23]: " Bart Van Assche
2008-12-16 17:41       ` Evgeniy Polyakov
2008-12-19 20:21   ` Jeremy Fitzhardinge
2008-12-19 22:04     ` Evgeniy Polyakov
2008-12-19 22:21       ` Jeremy Fitzhardinge
2008-12-19 22:33         ` Evgeniy Polyakov
2008-12-20  1:56           ` Jeremy Fitzhardinge
2008-12-20  2:02             ` Herbert Xu
2008-12-20  6:14               ` Jeremy Fitzhardinge
2008-12-20  6:51                 ` Herbert Xu
2008-12-20  7:43                   ` Jeremy Fitzhardinge
2008-12-20  8:10                     ` Herbert Xu
2008-12-20 10:32                       ` Evgeniy Polyakov
2008-12-20 19:39                         ` Jeremy Fitzhardinge
2008-12-22  0:43                           ` Rusty Russell
2008-12-23 19:14                             ` Vladislav Bolkhovitin
2008-12-23 19:16                         ` Vladislav Bolkhovitin
2008-12-23 21:38                           ` Evgeniy Polyakov
2008-12-24 14:37                             ` Vladislav Bolkhovitin
2008-12-24 14:44                               ` Evgeniy Polyakov
2008-12-24 17:46                                 ` Vladislav Bolkhovitin
2008-12-24 18:08                                   ` Evgeniy Polyakov
2008-12-30 17:37                                     ` Vladislav Bolkhovitin
2008-12-30 21:35                                       ` Evgeniy Polyakov
2008-12-23 19:13     ` Vladislav Bolkhovitin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4941590F.3070705@vlnb.net \
    --to=vst@vlnb.net \
    --cc=James.Bottomley@HansenPartnership.com \
    --cc=akpm@linux-foundation.org \
    --cc=bart.vanassche@gmail.com \
    --cc=bharrosh@panasas.com \
    --cc=fujita.tomonori@lab.ntt.co.jp \
    --cc=jeff@garzik.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=michaelc@cs.wisc.edu \
    --cc=nab@linux-iscsi.org \
    --cc=netdev@vger.kernel.org \
    --cc=scst-devel@lists.sourceforge.net \
    --cc=torvalds@linux-foundation.org \
    --cc=zbr@ioremap.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.