* [LSF/MM TOPIC][ATTEND] protection information and userspace
@ 2013-02-06 19:51 Ben Myers
  2013-02-06 20:24 ` Darrick J. Wong
  2013-02-07 19:20 ` Martin K. Petersen
  0 siblings, 2 replies; 28+ messages in thread
From: Ben Myers @ 2013-02-06 19:51 UTC (permalink / raw)
  To: lsf-pc; +Cc: linux-fsdevel, linux-scsi, martin.petersen

Hi,

I'm interested in discussing how to pass protection information to and from
userspace.  Maybe Martin could be enlisted for the discussion.

I read that some work has already been done in this area but have not been able
to locate it.  It looks like the bio-integrity code already makes it possible
to generate the t10-dif crc in the filesystem.  It would be good to be able to
get the guard and application tags back out to backup applications such as
xfsdump.  Enabling other applications to generate their own tags in userspace
is also interesting.

Regards,
	Ben

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [LSF/MM TOPIC][ATTEND] protection information and userspace
  2013-02-06 19:51 [LSF/MM TOPIC][ATTEND] protection information and userspace Ben Myers
@ 2013-02-06 20:24 ` Darrick J. Wong
  2013-02-06 20:34   ` Chuck Lever
  2013-02-07 19:09   ` Martin K. Petersen
  2013-02-07 19:20 ` Martin K. Petersen
  1 sibling, 2 replies; 28+ messages in thread
From: Darrick J. Wong @ 2013-02-06 20:24 UTC (permalink / raw)
  To: Ben Myers; +Cc: lsf-pc, linux-fsdevel, linux-scsi, martin.petersen

On Wed, Feb 06, 2013 at 01:51:22PM -0600, Ben Myers wrote:
> Hi,
> 
> I'm interested in discussing how to pass protection information to and from
> userspace.  Maybe Martin could be enlisted for the discussion.
> 
> I read that some work has already been done in this area but have not been able
> to locate it.  It looks like the bio-integrity code already makes it possible
> to generate the t10-dif crc in the filesystem.  It would be good to be able to
> get the guard and application tags back out to backup applications such as
> xfsdump.  Enabling other applications to generate their own tags in userspace
> is also interesting.

This one's been on my list for a couple of years (and companies) too.  A few
years ago Joel Becker had support for it in his sys_dio proposal (that hasn't
gone anywhere), and more recently I've theorized that we could add a magic
fcntl/ioctl to make the kernel recognize, say, the first iovec of a O_DIRECT
*{read,write}v call as the PI buffer, which I think is similar to how DIX gets
PI data to a disk.  But it's not like I have any code to show for it.
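
A minimal sketch of the first-iovec convention floated above, assuming 8 bytes
of PI per 512-byte sector (the T10 DIF tuple size). The helper names
pi_bytes_for() and pi_build_iov() are made up for illustration, and the
fcntl/ioctl that would switch a file descriptor into this mode does not exist,
so it is not shown:

```c
#include <stddef.h>
#include <sys/uio.h>

#define PI_SECTOR_SIZE 512u	/* assumed protection interval */
#define PI_TUPLE_SIZE  8u	/* guard (2) + app tag (2) + ref tag (4) */

/*
 * Bytes of PI needed to cover data_len bytes of data, or 0 if data_len
 * is empty or not a whole number of sectors.
 */
static size_t pi_bytes_for(size_t data_len)
{
	if (data_len == 0 || data_len % PI_SECTOR_SIZE != 0)
		return 0;
	return (data_len / PI_SECTOR_SIZE) * PI_TUPLE_SIZE;
}

/*
 * Hypothetical helper: put the PI buffer in iov[0] and the data buffer
 * in iov[1], as the proposed convention would expect.  Returns the
 * number of iovecs filled (2), or -1 if the two lengths do not match.
 */
static int pi_build_iov(struct iovec iov[2], void *pi, size_t pi_len,
			void *data, size_t data_len)
{
	size_t need = pi_bytes_for(data_len);

	if (need == 0 || pi_len != need)
		return -1;
	iov[0].iov_base = pi;
	iov[0].iov_len = pi_len;
	iov[1].iov_base = data;
	iov[1].iov_len = data_len;
	return 2;
}
```

With such a layout the resulting array would simply be handed to writev() or
readv() on an O_DIRECT descriptor once the (still hypothetical) mode is set.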

I /think/ it's fairly straightforward to change the directio submit code to
find the userspace PI buffer and amend the block integrity code to attach our
own PI buffer.  You'd still have to let the block layer set the sector # field,
but afaik that won't affect the crc or the app tag.
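
To illustrate why stamping the sector number should not disturb the other two
fields: in the T10 DIF layout each sector carries an 8-byte tuple of guard tag,
application tag and reference tag, and the guard is a CRC-16 (polynomial
0x8BB7) over the sector data only. The following is a userspace illustration,
not kernel code; the struct and function names are invented here:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * One protection tuple per sector.  On the wire the fields are
 * big-endian; host order is used here to keep the sketch short.
 */
struct dif_tuple {
	uint16_t guard;	/* CRC-16 of the sector data */
	uint16_t app;	/* application tag, owner-defined */
	uint32_t ref;	/* reference tag, typically low 32 bits of the LBA */
};

/*
 * Bitwise CRC-16/T10-DIF: polynomial 0x8BB7, initial value 0,
 * no bit reflection, no final XOR.
 */
static uint16_t dif_crc16(const uint8_t *buf, size_t len)
{
	uint16_t crc = 0;
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		crc ^= (uint16_t)(buf[i] << 8);
		for (bit = 0; bit < 8; bit++)
			crc = (crc & 0x8000)
				? (uint16_t)((crc << 1) ^ 0x8BB7)
				: (uint16_t)(crc << 1);
	}
	return crc;
}

/* Userspace supplies guard and app tag; ref is left for a later stage. */
static struct dif_tuple dif_prepare(const uint8_t *sector, size_t len,
				    uint16_t app)
{
	struct dif_tuple t;

	t.guard = dif_crc16(sector, len);
	t.app = app;
	t.ref = 0;
	return t;
}

/* Stand-in for the block layer stamping the sector number. */
static void dif_set_ref(struct dif_tuple *t, uint64_t lba)
{
	t->ref = (uint32_t)lba;
}
```

Because dif_set_ref() touches only the ref field, the guard and app tag chosen
by the caller survive unchanged in this model.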

I hear that the NFS guys want to propose some sort of protocol for transmitting
PI data (across NFS), but I haven't seen anything concrete yet.

Well, I hope I'll scrape together the time to hack together a PoC before LSF...
on the other hand, I ran the discussion about PI userland interfaces at LPC2011
and (shamefully) haven't done anything yet.

<end rambling>

--D
> 
> Regards,
> 	Ben
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [LSF/MM TOPIC][ATTEND] protection information and userspace
  2013-02-06 20:24 ` Darrick J. Wong
@ 2013-02-06 20:34   ` Chuck Lever
  2013-02-07  9:40     ` Joel Becker
  2013-02-07 19:09   ` Martin K. Petersen
  1 sibling, 1 reply; 28+ messages in thread
From: Chuck Lever @ 2013-02-06 20:34 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Ben Myers, lsf-pc, linux-fsdevel, linux-scsi, martin.petersen


On Feb 6, 2013, at 3:24 PM, "Darrick J. Wong" <darrick.wong@oracle.com> wrote:

> On Wed, Feb 06, 2013 at 01:51:22PM -0600, Ben Myers wrote:
>> Hi,
>> 
>> I'm interested in discussing how to pass protection information to and from
>> userspace.  Maybe Martin could be enlisted for the discussion.
>> 
>> I read that some work has already been done in this area but have not been able
>> to locate it.  It looks like the bio-integrity code already makes it possible
>> to generate the t10-dif crc in the filesystem.  It would be good to be able to
>> get the guard and application tags back out to backup applications such as
>> xfsdump.  Enabling other applications to generate their own tags in userspace
>> is also interesting.
> 
> This one's been on my list for a couple of years (and companies) too.  A few
> years ago Joel Becker had support for it in his sys_dio proposal (that hasn't
> gone anywhere), and more recently I've theorized that we could add a magic
> fcntl/ioctl to make the kernel recognize, say, the first iovec of a O_DIRECT
> *{read,write}v call as the PI buffer, which I think is similar to how DIX gets
> PI data to a disk.  But it's not like I have any code to show for it.
> 
> I /think/ it's fairly straightforward to change the directio submit code to
> find the userspace PI buffer and amend the block integrity code to attach our
> own PI buffer.  You'd still have to let the block layer set the sector # field,
> but afaik that won't affect the crc or the app tag.
> 
> I hear that the NFS guys want to propose some sort of protocol for transmitting
> PI data (across NFS), but I haven't seen anything concrete yet.

I'm writing a requirements document for the NFS protocol which I can discuss at LSF.  The use cases for NFS for now would be virtual disk devices (hypervisors) or direct NFS access to storage from user space.

Like everyone else we are waiting for a magical VFS and user space API to appear that can pass PI to and from storage.

> Well, I hope I'll scrape together the time to hack together a PoC before LSF...
> on the other hand, I ran the discussion about PI userland interfaces at LPC2011
> and (shamefully) haven't done anything yet.
> 
> <end rambling>
> 
> --D
>> 
>> Regards,
>> 	Ben

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [LSF/MM TOPIC][ATTEND] protection information and userspace
  2013-02-06 20:34   ` Chuck Lever
@ 2013-02-07  9:40     ` Joel Becker
  2013-02-07 10:01       ` Darrick J. Wong
  2013-02-07 19:12       ` Martin K. Petersen
  0 siblings, 2 replies; 28+ messages in thread
From: Joel Becker @ 2013-02-07  9:40 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Darrick J. Wong, Ben Myers, lsf-pc, linux-fsdevel, linux-scsi,
	martin.petersen

On Wed, Feb 06, 2013 at 03:34:49PM -0500, Chuck Lever wrote:
> 
> On Feb 6, 2013, at 3:24 PM, "Darrick J. Wong" <darrick.wong@oracle.com> wrote:
> 
> > On Wed, Feb 06, 2013 at 01:51:22PM -0600, Ben Myers wrote:
> >> Hi,
> >> 
> >> I'm interested in discussing how to pass protection information to and from
> >> userspace.  Maybe Martin could be enlisted for the discussion.
> >> 
> >> I read that some work has already been done in this area but have not been able
> >> to locate it.  It looks like the bio-integrity code already makes it possible
> >> to generate the t10-dif crc in the filesystem.  It would be good to be able to
> >> get the guard and application tags back out to backup applications such as
> >> xfsdump.  Enabling other applications to generate their own tags in userspace
> >> is also interesting.
> > 
> > This one's been on my list for a couple of years (and companies) too.  A few
> > years ago Joel Becker had support for it in his sys_dio proposal (that hasn't
> > gone anywhere), and more recently I've theorized that we could add a magic
> > fcntl/ioctl to make the kernel recognize, say, the first iovec of a O_DIRECT
> > *{read,write}v call as the PI buffer, which I think is similar to how DIX gets
> > PI data to a disk.  But it's not like I have any code to show for it.
> > 
> > I /think/ it's fairly straightforward to change the directio submit code to
> > find the userspace PI buffer and amend the block integrity code to attach our
> > own PI buffer.  You'd still have to let the block layer set the sector # field,
> > but afaik that won't affect the crc or the app tag.
> > 
> > I hear that the NFS guys want to propose some sort of protocol for transmitting
> > PI data (across NFS), but I haven't seen anything concrete yet.
> 
> I'm writing a requirements document for the NFS protocol which I can discuss at LSF.  The use cases for NFS for now would be virtual disk devices (hypervisors) or direct NFS access to storage from user space.
> 
> Like everyone else we are waiting for a magical VFS and user space API to appear that can pass PI to and from storage.

I'm happy to chat about it.  Unfortunately, like Darrick says, sys_dio()
coding hasn't happened.  I do think we're better off with some kind of
explicit API than some magic state on the file.  I mean, even something
like:

	ssize_t write_with_pi(int fd, const void *buf, size_t count,
			      const void *pi, size_t pi_count);

It's not as nice as a non-historical API (eg sys_dio), but it also
probably plays nicer with buffered I/O.
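
A userspace stand-in can show how a caller might size the pi argument of such
a call. This is only a sketch: write_with_pi_stub() is a made-up name, it
assumes 8 bytes of PI per 512-byte sector, and it merely validates the lengths
before falling back to plain write(2), discarding the PI:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define PI_SECTOR 512u	/* assumed protection interval */
#define PI_TUPLE  8u	/* assumed PI bytes per interval */

/*
 * Stand-in for the proposed call.  Rejects requests whose PI buffer
 * does not cover the data exactly, then writes the data alone.
 */
static ssize_t write_with_pi_stub(int fd, const void *buf, size_t count,
				  const void *pi, size_t pi_count)
{
	if (count == 0 || count % PI_SECTOR != 0 || pi == NULL ||
	    pi_count != (count / PI_SECTOR) * PI_TUPLE) {
		errno = EINVAL;
		return -1;
	}
	/* A real implementation would pass pi down to the integrity code. */
	return write(fd, buf, count);
}
```

Whether a mismatch should be an error, or whether the kernel should generate
the missing PI itself, is a design question the sketch does not settle.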

Joel

> 
> > Well, I hope I'll scrape together the time to hack together a PoC before LSF...
> > on the other hand, I ran the discussion about PI userland interfaces at LPC2011
> > and (shamefully) haven't done anything yet.
> > 
> > <end rambling>
> > 
> > --D
> >> 
> >> Regards,
> >> 	Ben
> 
> -- 
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
> 
> 
> 
> 

-- 

"I think it would be a good idea."  
        - Mahatma Ghandi, when asked what he thought of Western
          civilization

			http://www.jlbec.org/
			jlbec@evilplan.org

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [LSF/MM TOPIC][ATTEND] protection information and userspace
  2013-02-07  9:40     ` Joel Becker
@ 2013-02-07 10:01       ` Darrick J. Wong
  2013-02-07 11:27         ` Hannes Reinecke
  2013-02-07 19:12       ` Martin K. Petersen
  1 sibling, 1 reply; 28+ messages in thread
From: Darrick J. Wong @ 2013-02-07 10:01 UTC (permalink / raw)
  To: Chuck Lever, Ben Myers, lsf-pc, linux-fsdevel, linux-scsi,
	martin.petersen

On Thu, Feb 07, 2013 at 01:40:14AM -0800, Joel Becker wrote:
> On Wed, Feb 06, 2013 at 03:34:49PM -0500, Chuck Lever wrote:
> > 
> > On Feb 6, 2013, at 3:24 PM, "Darrick J. Wong" <darrick.wong@oracle.com> wrote:
> > 
> > > On Wed, Feb 06, 2013 at 01:51:22PM -0600, Ben Myers wrote:
> > >> Hi,
> > >> 
> > >> I'm interested in discussing how to pass protection information to and from
> > >> userspace.  Maybe Martin could be enlisted for the discussion.
> > >> 
> > >> I read that some work has already been done in this area but have not been able
> > >> to locate it.  It looks like the bio-integrity code already makes it possible
> > >> to generate the t10-dif crc in the filesystem.  It would be good to be able to
> > >> get the guard and application tags back out to backup applications such as
> > >> xfsdump.  Enabling other applications to generate their own tags in userspace
> > >> is also interesting.
> > > 
> > > This one's been on my list for a couple of years (and companies) too.  A few
> > > years ago Joel Becker had support for it in his sys_dio proposal (that hasn't
> > > gone anywhere), and more recently I've theorized that we could add a magic
> > > fcntl/ioctl to make the kernel recognize, say, the first iovec of a O_DIRECT
> > > *{read,write}v call as the PI buffer, which I think is similar to how DIX gets
> > > PI data to a disk.  But it's not like I have any code to show for it.
> > > 
> > > I /think/ it's fairly straightforward to change the directio submit code to
> > > find the userspace PI buffer and amend the block integrity code to attach our
> > > own PI buffer.  You'd still have to let the block layer set the sector # field,
> > > but afaik that won't affect the crc or the app tag.
> > > 
> > > I hear that the NFS guys want to propose some sort of protocol for transmitting
> > > PI data (across NFS), but I haven't seen anything concrete yet.
> > 
> > I'm writing a requirements document for the NFS protocol which I can discuss at LSF.  The use cases for NFS for now would be virtual disk devices (hypervisors) or direct NFS access to storage from user space.
> > 
> > Like everyone else we are waiting for a magical VFS and user space API to appear that can pass PI to and from storage.
> 
> I'm happy to chat about it.  Unfortunately, like Darrick says, sys_dio()
> coding hasn't happened.  I do think we're better off with some kind of
> explicit API than some magic state on the file.  I mean, even something
> like:
> 
> 	ssize_t write_with_pi(int fd, const void *buf, size_t count,
> 			      const void *pi, size_t pi_count);
> 
> It's not as nice as a non-historical API (eg sys_dio), but it also
> probably plays nicer with buffered I/O.

I also pondered simply adding a new io_prep_* function + IO_CMD_ code to libaio
and all the other plumbing necessary to make that happen...

void io_prep_preadv_pi(struct iocb *iocb, int fd, const struct iovec *iov,
		       int iovcnt, long long offset, const void *pi,
		       size_t pi_count);
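
A sketch of what such a prep helper might fill in. To stay self-contained it
uses a made-up struct pi_iocb rather than libaio's real struct iocb, and
PI_CMD_PREADV_PI is an invented opcode value; neither exists in libaio or the
kernel. The PI pointer is non-const here on the assumption that a read
completion would write into it:

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Invented opcode value for this sketch; not part of libaio. */
enum { PI_CMD_PREADV_PI = 100 };

/* Made-up request descriptor standing in for libaio's struct iocb. */
struct pi_iocb {
	int fd;
	int opcode;
	const struct iovec *iov;	/* data buffers */
	int iovcnt;
	long long offset;
	void *pi;			/* filled with PI on read completion */
	size_t pi_count;		/* size of the PI buffer in bytes */
};

/* Record everything a vectored read with PI would need in one place. */
static void pi_prep_preadv_pi(struct pi_iocb *iocb, int fd,
			      const struct iovec *iov, int iovcnt,
			      long long offset, void *pi, size_t pi_count)
{
	memset(iocb, 0, sizeof(*iocb));
	iocb->fd = fd;
	iocb->opcode = PI_CMD_PREADV_PI;
	iocb->iov = iov;
	iocb->iovcnt = iovcnt;
	iocb->offset = offset;
	iocb->pi = pi;
	iocb->pi_count = pi_count;
}
```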

--D
> 
> Joel
> 
> > 
> > > Well, I hope I'll scrape together the time to hack together a PoC before LSF...
> > > on the other hand, I ran the discussion about PI userland interfaces at LPC2011
> > > and (shamefully) haven't done anything yet.
> > > 
> > > <end rambling>
> > > 
> > > --D
> > >> 
> > >> Regards,
> > >> 	Ben
> > 
> > -- 
> > Chuck Lever
> > chuck[dot]lever[at]oracle[dot]com
> > 
> > 
> > 
> > 
> 
> -- 
> 
> "I think it would be a good idea."  
>         - Mahatma Ghandi, when asked what he thought of Western
>           civilization
> 
> 			http://www.jlbec.org/
> 			jlbec@evilplan.org

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [LSF/MM TOPIC][ATTEND] protection information and userspace
  2013-02-07 10:01       ` Darrick J. Wong
@ 2013-02-07 11:27         ` Hannes Reinecke
  2013-02-07 12:08             ` Boaz Harrosh
  0 siblings, 1 reply; 28+ messages in thread
From: Hannes Reinecke @ 2013-02-07 11:27 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Chuck Lever, Ben Myers, lsf-pc, linux-fsdevel, linux-scsi,
	martin.petersen

On 02/07/2013 11:01 AM, Darrick J. Wong wrote:
> On Thu, Feb 07, 2013 at 01:40:14AM -0800, Joel Becker wrote:
>> On Wed, Feb 06, 2013 at 03:34:49PM -0500, Chuck Lever wrote:
>>>
>>> On Feb 6, 2013, at 3:24 PM, "Darrick J. Wong" <darrick.wong@oracle.com> wrote:
>>>
>>>> On Wed, Feb 06, 2013 at 01:51:22PM -0600, Ben Myers wrote:
>>>>> Hi,
>>>>>
>>>>> I'm interested in discussing how to pass protection information to and from
>>>>> userspace.  Maybe Martin could be enlisted for the discussion.
>>>>>
>>>>> I read that some work has already been done in this area but have not been able
>>>>> to locate it.  It looks like the bio-integrity code already makes it possible
>>>>> to generate the t10-dif crc in the filesystem.  It would be good to be able to
>>>>> get the guard and application tags back out to backup applications such as
>>>>> xfsdump.  Enabling other applications to generate their own tags in userspace
>>>>> is also interesting.
>>>>
>>>> This one's been on my list for a couple of years (and companies) too.  A few
>>>> years ago Joel Becker had support for it in his sys_dio proposal (that hasn't
>>>> gone anywhere), and more recently I've theorized that we could add a magic
>>>> fcntl/ioctl to make the kernel recognize, say, the first iovec of a O_DIRECT
>>>> *{read,write}v call as the PI buffer, which I think is similar to how DIX gets
>>>> PI data to a disk.  But it's not like I have any code to show for it.
>>>>
>>>> I /think/ it's fairly straightforward to change the directio submit code to
>>>> find the userspace PI buffer and amend the block integrity code to attach our
>>>> own PI buffer.  You'd still have to let the block layer set the sector # field,
>>>> but afaik that won't affect the crc or the app tag.
>>>>
>>>> I hear that the NFS guys want to propose some sort of protocol for transmitting
>>>> PI data (across NFS), but I haven't seen anything concrete yet.
>>>
>>> I'm writing a requirements document for the NFS protocol which I can discuss at LSF.  The use cases for NFS for now would be virtual disk devices (hypervisors) or direct NFS access to storage from user space.
>>>
>>> Like everyone else we are waiting for a magical VFS and user space API to appear that can pass PI to and from storage.
>>
>> I'm happy to chat about it.  Unfortunately, like Darrick says, sys_dio()
>> coding hasn't happened.  I do think we're better off with some kind of
>> explicit API than some magic state on the file.  I mean, even something
>> like:
>>
>> 	ssize_t write_with_pi(int fd, const void *buf, size_t count,
>> 			      const void *pi, size_t pi_count);
>>
>> It's not as nice as a non-historical API (eg sys_dio), but it also
>> probably plays nicer with buffered I/O.
>
> I also pondered simply adding a new io_prep_* function + IO_CMD_ code to libaio
> and all the other plumbing necessary to make that happen...
>
> void io_prep_preadv_pi(struct iocb *iocb, int fd, const struct iovec *iov,
> 		       int iovcnt, long long offset, const void *pi,
> 		       size_t pi_count);
>
This is also what I've envisioned.
Updating io_prep / async I/O is reasonably easy as it's been using a
separate structure for passing in the I/O details.

Normal read/write calls don't really map, as you simply don't have
enough parameters to feed PI information into the kernel.
So for that you'd need to invent a new interface / syscall.

For aio we just need to add additional fields to an existing structure.

So yeah, I'd be interested in that discussion as well.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [LSF/MM TOPIC][ATTEND] protection information and userspace
  2013-02-07 11:27         ` Hannes Reinecke
@ 2013-02-07 12:08             ` Boaz Harrosh
  0 siblings, 0 replies; 28+ messages in thread
From: Boaz Harrosh @ 2013-02-07 12:08 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Darrick J. Wong, Chuck Lever, Ben Myers, lsf-pc, linux-fsdevel,
	linux-scsi, martin.petersen, FUJITA Tomonori

On 02/07/2013 01:27 PM, Hannes Reinecke wrote:
> On 02/07/2013 11:01 AM, Darrick J. Wong wrote:
>> On Thu, Feb 07, 2013 at 01:40:14AM -0800, Joel Becker wrote:
>>> On Wed, Feb 06, 2013 at 03:34:49PM -0500, Chuck Lever wrote:
>>>>
>>>> On Feb 6, 2013, at 3:24 PM, "Darrick J. Wong" <darrick.wong@oracle.com> wrote:
>>>>
>>>>> On Wed, Feb 06, 2013 at 01:51:22PM -0600, Ben Myers wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I'm interested in discussing how to pass protection information to and from
>>>>>> userspace.  Maybe Martin could be enlisted for the discussion.
>>>>>>
>>>>>> I read that some work has already been done in this area but have not been able
>>>>>> to locate it.  It looks like the bio-integrity code already makes it possible
>>>>>> to generate the t10-dif crc in the filesystem.  It would be good to be able to
>>>>>> get the guard and application tags back out to backup applications such as
>>>>>> xfsdump.  Enabling other applications to generate their own tags in userspace
>>>>>> is also interesting.
>>>>>
>>>>> This one's been on my list for a couple of years (and companies) too.  A few
>>>>> years ago Joel Becker had support for it in his sys_dio proposal (that hasn't
>>>>> gone anywhere), and more recently I've theorized that we could add a magic
>>>>> fcntl/ioctl to make the kernel recognize, say, the first iovec of a O_DIRECT
>>>>> *{read,write}v call as the PI buffer, which I think is similar to how DIX gets
>>>>> PI data to a disk.  But it's not like I have any code to show for it.
>>>>>
>>>>> I /think/ it's fairly straightforward to change the directio submit code to
>>>>> find the userspace PI buffer and amend the block integrity code to attach our
>>>>> own PI buffer.  You'd still have to let the block layer set the sector # field,
>>>>> but afaik that won't affect the crc or the app tag.
>>>>>
>>>>> I hear that the NFS guys want to propose some sort of protocol for transmitting
>>>>> PI data (across NFS), but I haven't seen anything concrete yet.
>>>>
>>>> I'm writing a requirements document for the NFS protocol which I can discuss at LSF.  The use cases for NFS for now would be virtual disk devices (hypervisors) or direct NFS access to storage from user space.
>>>>
>>>> Like everyone else we are waiting for a magical VFS and user space API to appear that can pass PI to and from storage.
>>>
>>> I'm happy to chat about it.  Unfortunately, like Darrick says, sys_dio()
>>> coding hasn't happened.  I do think we're better off with some kind of
>>> explicit API than some magic state on the file.  I mean, even something
>>> like:
>>>
>>> 	ssize_t write_with_pi(int fd, const void *buf, size_t count,
>>> 			      const void *pi, size_t pi_count);
>>>
>>> It's not as nice as a non-historical API (eg sys_dio), but it also
>>> probably plays nicer with buffered I/O.
>>
>> I also pondered simply adding a new io_prep_* function + IO_CMD_ code to libaio
>> and all the other plumbing necessary to make that happen...
>>
>> void io_prep_preadv_pi(struct iocb *iocb, int fd, const struct iovec *iov,
>> 		       int iovcnt, long long offset, const void *pi,
>> 		       size_t pi_count);
>>
> This is also what I've envisioned.
> Updating io_prep / async I/O is reasonably easy as its been using a 
> separate structure for passing in the I/O details.
> 
> Normal read/write calls don't really map as you simply don't have 
> enough parameter to feed PI information into the kernel.
> So for that you'd need to invent a new interface / syscall.
> 
> For aio we just need to add additional fields to an existing structure.
> 
> So yeah, I'd be interested in that discussion as well.
> 

Me too, on multiple fronts. It's part of my general concern about
   "things we would like for user-mode servers"

I think the current aio and libaio interface has been broken for a long
time, for a multitude of reasons. For instance, the nested structure
definitions are COMPAT broken, and there are lots of missing pieces. (For
example, search the archives for why bsg does not support sg-lists.)

And there are all these additions that everyone wants on top, which call for
a new interface anyway.

So I would like to see a deep fixup of this interface, with an aio version 2
that takes into consideration all future needs, including the ones
above. Kernel code would be much happier implemented against the new interface,
and a COMPAT layer could be put in place for the old interface.

All interested parties should bring to the table the extensions/changes
they need, and we can try to union all of them together.

(My addition is support for sg_lists in bsg, in a way that makes Tomo happy.
 I know that qemu has wanted this for a while, as have the multitude of
 user-mode servers.)
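
One way to sidestep the COMPAT breakage described above is to describe
requests only with fixed-width fields, carrying user pointers as 64-bit
integers so that 32-bit and 64-bit callers see the same layout. The structure
below is purely illustrative; struct aio2_req and its fields are invented for
this sketch and are not a proposed ABI:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Invented request layout: no pointers, no long, no implicit padding.
 * Note the iovec array that iov points to is still ABI-dependent; a
 * full design would need a fixed-width vector format as well.
 */
struct aio2_req {
	uint32_t size;		/* sizeof(struct aio2_req) seen by the caller */
	uint32_t opcode;
	int32_t  fd;
	uint32_t iovcnt;
	uint64_t iov;		/* user pointer to the iovec array */
	int64_t  offset;
	uint64_t pi;		/* user pointer to the PI buffer, 0 if none */
	uint64_t pi_count;	/* PI buffer size in bytes */
};

/* Fail the build if the layout ever starts to depend on the ABI. */
_Static_assert(sizeof(struct aio2_req) == 48, "unexpected struct size");
_Static_assert(offsetof(struct aio2_req, iov) == 16, "unexpected padding");

/* Widen a user pointer to the fixed 64-bit representation. */
static uint64_t aio2_ptr(const void *p)
{
	return (uint64_t)(uintptr_t)p;
}

/* Zero a request and stamp the size field used for future extension. */
static void aio2_init(struct aio2_req *r, uint32_t opcode, int32_t fd)
{
	*r = (struct aio2_req){ 0 };
	r->size = (uint32_t)sizeof(*r);
	r->opcode = opcode;
	r->fd = fd;
}
```

The leading size field is one common way to let such a structure grow later
without a new syscall number; it is shown as an option, not a recommendation
from this thread.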

Thanks
Boaz

> Cheers,
> 
> Hannes
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [LSF/MM TOPIC][ATTEND] protection information and userspace
  2013-02-07 12:08             ` Boaz Harrosh
@ 2013-02-07 12:16               ` Boaz Harrosh
  -1 siblings, 0 replies; 28+ messages in thread
From: Boaz Harrosh @ 2013-02-07 12:16 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Darrick J. Wong, Chuck Lever, Ben Myers, lsf-pc, linux-fsdevel,
	linux-scsi, martin.petersen, FUJITA Tomonori

On 02/07/2013 02:08 PM, Boaz Harrosh wrote:
> On 02/07/2013 01:27 PM, Hannes Reinecke wrote:
>> On 02/07/2013 11:01 AM, Darrick J. Wong wrote:
>>> On Thu, Feb 07, 2013 at 01:40:14AM -0800, Joel Becker wrote:
>>>> On Wed, Feb 06, 2013 at 03:34:49PM -0500, Chuck Lever wrote:
>>>>>
>>>>> On Feb 6, 2013, at 3:24 PM, "Darrick J. Wong" <darrick.wong@oracle.com> wrote:
>>>>>
>>>>>> On Wed, Feb 06, 2013 at 01:51:22PM -0600, Ben Myers wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm interested in discussing how to pass protection information to and from
>>>>>>> userspace.  Maybe Martin could be enlisted for the discussion.
>>>>>>>
>>>>>>> I read that some work has already been done in this area but have not been able
>>>>>>> to locate it.  It looks like the bio-integrity code already makes it possible
>>>>>>> to generate the t10-dif crc in the filesystem.  It would be good to be able to
>>>>>>> get the guard and application tags back out to backup applications such as
>>>>>>> xfsdump.  Enabling other applications to generate their own tags in userspace
>>>>>>> is also interesting.
>>>>>>
>>>>>> This one's been on my list for a couple of years (and companies) too.  A few
>>>>>> years ago Joel Becker had support for it in his sys_dio proposal (that hasn't
>>>>>> gone anywhere), and more recently I've theorized that we could add a magic
>>>>>> fcntl/ioctl to make the kernel recognize, say, the first iovec of a O_DIRECT
>>>>>> *{read,write}v call as the PI buffer, which I think is similar to how DIX gets
>>>>>> PI data to a disk.  But it's not like I have any code to show for it.
>>>>>>
>>>>>> I /think/ it's fairly straightforward to change the directio submit code to
>>>>>> find the userspace PI buffer and amend the block integrity code to attach our
>>>>>> own PI buffer.  You'd still have to let the block layer set the sector # field,
>>>>>> but afaik that won't affect the crc or the app tag.
>>>>>>
>>>>>> I hear that the NFS guys want to propose some sort of protocol for transmitting
>>>>>> PI data (across NFS), but I haven't seen anything concrete yet.
>>>>>
>>>>> I'm writing a requirements document for the NFS protocol which I can discuss at LSF.  The use cases for NFS for now would be virtual disk devices (hypervisors) or direct NFS access to storage from user space.
>>>>>
>>>>> Like everyone else we are waiting for a magical VFS and user space API to appear that can pass PI to and from storage.
>>>>
>>>> I'm happy to chat about it.  Unfortunately, like Darrick says, sys_dio()
>>>> coding hasn't happened.  I do think we're better off with some kind of
>>>> explicit API than some magic state on the file.  I mean, even something
>>>> like:
>>>>
>>>> 	ssize_t write_with_pi(int fd, const void *buf, size_t count,
>>>> 			      const void *pi, size_t pi_count);
>>>>
>>>> It's not as nice as a non-historical API (eg sys_dio), but it also
>>>> probably plays nicer with buffered I/O.
>>>
>>> I also pondered simply adding a new io_prep_* function + IO_CMD_ code to libaio
>>> and all the other plumbing necessary to make that happen...
>>>
>>> void io_prep_preadv_pi(struct iocb *iocb, int fd, const struct iovec *iov,
>>> 		       int iovcnt, long long offset, const void *pi,
>>> 		       size_t pi_count);
>>>
>> This is also what I've envisioned.
>> Updating io_prep / async I/O is reasonably easy as its been using a 
>> separate structure for passing in the I/O details.
>>
>> Normal read/write calls don't really map as you simply don't have 
>> enough parameter to feed PI information into the kernel.
>> So for that you'd need to invent a new interface / syscall.
>>
>> For aio we just need to add additional fields to an existing structure.
>>
>> So yeah, I'd be interested in that discussion as well.
>>
> 
> Me too, in multiple fronts. It's part of my general concern about
>    "things we would like for user-mode servers"
> 
> I think that the current aio and libaio Interface is broken for a long
> time, for multitude of reasons. For instance the nested structure definitions
> are COMPAT broken, and lots of missing pieces. (For example search in archives
> for why bsg does not support sg-lists.)
> 
> And there are all these additions that everyone wants on top, that call for
> a new interface anyway.
> 
> So I would like to see a deep fixup of this interface, with an aio version2
> that can take into considerations, all of future needs including these
> above. Kernel code will be very happy to be implemented with the new, interface
> and a COMPAT layer could be put in place for the old interface.
> 
> All interested parties should bring to the table what is the extension/changes
> they need. And we can try and union all of them together.
> 
> (My addition is for support of sg_lists to bsg, in a way that makes Tomo happy
>  I know that qemu was wanting this for a while as well as the multitude of
>  user-mode servers)
> 

I wanted to add that there is another LSF/MM thread going on:
	"[LSF TOPIC] What to do about O_DIRECT?"

Everyone in that thread should be participating here too, so that any change
to core structures and behavior moves toward a better model that helps us
here rather than working against us.

(Again, libaio should be changed in concert with the kernel's new API, and we
 can sacrifice performance for old user-mode binaries by routing them through
 a COMPAT layer. Distro maintainers should consider updating libaio together
 with the new kernel, so that only those who do their own mix-and-match hit
 the mismatch, and they can fix it themselves.)

> Thanks
> Boaz
> 
>> Cheers,
>>
>> Hannes
>>
> 


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [LSF/MM TOPIC][ATTEND] protection information and userspace
  2013-02-07 12:08             ` Boaz Harrosh
  (?)
  (?)
@ 2013-02-07 12:29             ` Bart Van Assche
  2013-02-07 12:47                 ` Boaz Harrosh
  -1 siblings, 1 reply; 28+ messages in thread
From: Bart Van Assche @ 2013-02-07 12:29 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Hannes Reinecke, Darrick J. Wong, Chuck Lever, Ben Myers, lsf-pc,
	linux-fsdevel, linux-scsi, martin.petersen, FUJITA Tomonori

On 02/07/13 13:08, Boaz Harrosh wrote:
> (My addition is for support of sg_lists to bsg, in a way that makes Tomo happy
>   I know that qemu was wanting this for a while as well as the multitude of
>   user-mode servers)

Do you think it would help / make sense if sg_alloc_table() were
modified so that it allocates the entire scatterlist table via one
vmalloc() call instead of chaining several page-sized scatterlist tables?
Note: such a change is not possible without modifying
scsi_alloc_sgtable().

Bart.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [LSF/MM TOPIC][ATTEND] protection information and userspace
  2013-02-07 12:16               ` Boaz Harrosh
  (?)
@ 2013-02-07 12:33               ` Hannes Reinecke
  2013-02-07 12:54                   ` Boaz Harrosh
  -1 siblings, 1 reply; 28+ messages in thread
From: Hannes Reinecke @ 2013-02-07 12:33 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Darrick J. Wong, Chuck Lever, Ben Myers, lsf-pc, linux-fsdevel,
	linux-scsi, martin.petersen, FUJITA Tomonori

On 02/07/2013 01:16 PM, Boaz Harrosh wrote:
> On 02/07/2013 02:08 PM, Boaz Harrosh wrote:
>> On 02/07/2013 01:27 PM, Hannes Reinecke wrote:
>>> On 02/07/2013 11:01 AM, Darrick J. Wong wrote:
>>>> On Thu, Feb 07, 2013 at 01:40:14AM -0800, Joel Becker wrote:
>>>>> On Wed, Feb 06, 2013 at 03:34:49PM -0500, Chuck Lever wrote:
>>>>>>
>>>>>> On Feb 6, 2013, at 3:24 PM, "Darrick J. Wong" <darrick.wong@oracle.com> wrote:
>>>>>>
>>>>>>> On Wed, Feb 06, 2013 at 01:51:22PM -0600, Ben Myers wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'm interested in discussing how to pass protection information to and from
>>>>>>>> userspace.  Maybe Martin could be enlisted for the discussion.
>>>>>>>>
>>>>>>>> I read that some work has already been done in this area but have not been able
>>>>>>>> to locate it.  It looks like the bio-integrity code already makes it possible
>>>>>>>> to generate the t10-dif crc in the filesystem.  It would be good to be able to
>>>>>>>> get the guard and application tags back out to backup applications such as
>>>>>>>> xfsdump.  Enabling other applications to generate their own tags in userspace
>>>>>>>> is also interesting.
>>>>>>>
>>>>>>> This one's been on my list for a couple of years (and companies) too.  A few
>>>>>>> years ago Joel Becker had support for it in his sys_dio proposal (that hasn't
>>>>>>> gone anywhere), and more recently I've theorized that we could add a magic
>>>>>>> fcntl/ioctl to make the kernel recognize, say, the first iovec of a O_DIRECT
>>>>>>> *{read,write}v call as the PI buffer, which I think is similar to how DIX gets
>>>>>>> PI data to a disk.  But it's not like I have any code to show for it.
>>>>>>>
>>>>>>> I /think/ it's fairly straightforward to change the directio submit code to
>>>>>>> find the userspace PI buffer and amend the block integrity code to attach our
>>>>>>> own PI buffer.  You'd still have to let the block layer set the sector # field,
>>>>>>> but afaik that won't affect the crc or the app tag.
>>>>>>>
>>>>>>> I hear that the NFS guys want to propose some sort of protocol for transmitting
>>>>>>> PI data (across NFS), but I haven't seen anything concrete yet.
>>>>>>
>>>>>> I'm writing a requirements document for the NFS protocol which I can discuss at LSF.  The use cases for NFS for now would be virtual disk devices (hypervisors) or direct NFS access to storage from user space.
>>>>>>
>>>>>> Like everyone else we are waiting for a magical VFS and user space API to appear that can pass PI to and from storage.
>>>>>
>>>>> I'm happy to chat about it.  Unfortunately, like Darrick says, sys_dio()
>>>>> coding hasn't happened.  I do think we're better off with some kind of
>>>>> explicit API than some magic state on the file.  I mean, even something
>>>>> like:
>>>>>
>>>>> 	ssize_t write_with_pi(int fd, const void *buf, size_t count,
>>>>> 			      const void *pi, size_t pi_count);
>>>>>
>>>>> It's not as nice as a non-historical API (eg sys_dio), but it also
>>>>> probably plays nicer with buffered I/O.
>>>>
>>>> I also pondered simply adding a new io_prep_* function + IO_CMD_ code to libaio
>>>> and all the other plumbing necessary to make that happen...
>>>>
>>>> void io_prep_preadv_pi(struct iocb *iocb, int fd, const struct iovec *iov,
>>>> 		       int iovcnt, long long offset, const void *pi,
>>>> 		       size_t pi_count);
>>>>
>>> This is also what I've envisioned.
>>> Updating io_prep / async I/O is reasonably easy as its been using a
>>> separate structure for passing in the I/O details.
>>>
>>> Normal read/write calls don't really map as you simply don't have
>>> enough parameter to feed PI information into the kernel.
>>> So for that you'd need to invent a new interface / syscall.
>>>
>>> For aio we just need to add additional fields to an existing structure.
>>>
>>> So yeah, I'd be interested in that discussion as well.
>>>
>>
>> Me too, in multiple fronts. It's part of my general concern about
>>     "things we would like for user-mode servers"
>>
>> I think that the current aio and libaio Interface is broken for a long
>> time, for multitude of reasons. For instance the nested structure definitions
>> are COMPAT broken, and lots of missing pieces. (For example search in archives
>> for why bsg does not support sg-lists.)
>>
>> And there are all these additions that everyone wants on top, that call for
>> a new interface anyway.
>>
>> So I would like to see a deep fixup of this interface, with an aio version2
>> that can take into considerations, all of future needs including these
>> above. Kernel code will be very happy to be implemented with the new, interface
>> and a COMPAT layer could be put in place for the old interface.
>>
>> All interested parties should bring to the table what is the extension/changes
>> they need. And we can try and union all of them together.
>>
>> (My addition is for support of sg_lists to bsg, in a way that makes Tomo happy
>>   I know that qemu was wanting this for a while as well as the multitude of
>>   user-mode servers)
>>
>
> I wanted to add that there is another LSF/MM thread going on about:
> 	"[LSF TOPIC] What to do about O_DIRECT?"
>
> All these guys should be participating here, so to change core structures
> and behavior to a better model, that helps us here, and not against us.
>
> (Again libaio should be changed in concert with Kernel's new API, and we
>   can sacrifice old user-mode performance, with a COMPAT layer. Distro
>   maintainers should consider replacing libaio, together with the new
>   Kernel, so it is only those that do their own mix-and-match, who can
>   fix that mismatch too)
>
And while we're at it, I still would _love_ to connect aio_cancel() 
and blk_abort_request().

That way we could sensibly abort an I/O and get out of the darn 'D' 
state.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [LSF/MM TOPIC][ATTEND] protection information and userspace
  2013-02-07 12:29             ` Bart Van Assche
@ 2013-02-07 12:47                 ` Boaz Harrosh
  0 siblings, 0 replies; 28+ messages in thread
From: Boaz Harrosh @ 2013-02-07 12:47 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Hannes Reinecke, Darrick J. Wong, Chuck Lever, Ben Myers, lsf-pc,
	linux-fsdevel, linux-scsi, martin.petersen, FUJITA Tomonori

On 02/07/2013 02:29 PM, Bart Van Assche wrote:
> On 02/07/13 13:08, Boaz Harrosh wrote:
>> (My addition is for support of sg_lists to bsg, in a way that makes Tomo happy
>>   I know that qemu was wanting this for a while as well as the multitude of
>>   user-mode servers)
> 
> Do you think it would help / make sense if sg_alloc_table() would be 
> modified such that it allocates the entire scatterlist table via one 
> vmalloc() call instead of chaining several page-sized scatterlist tables 
> ? Note: such a change is not possible without modifying 
> scsi_alloc_sgtable().
> 

I don't think so, no. sg_alloc_table() is used not only for direct I/O but
also for buffered I/O, and vmalloc() is terribly slow; it would be a
bottleneck at today's SSD speeds.

I love that the Linux kernel avoids vmalloc on its I/O paths and instead
chains objects of up to PAGE_SIZE each. Coming from other OSs that don't,
believe me, it is a great performance pain.
(TLBs are a bitch)
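
To make the chaining cost concrete, here is a small userspace model (not
kernel code) of how many page-sized chunks a chained table needs. It assumes
the usual scheme in which the last slot of every non-final chunk is spent on
a link to the next chunk. The name sg_chunks_needed() and the 128-entry
figure used below are illustrative assumptions; the real per-chunk capacity
depends on PAGE_SIZE and on the configured size of struct scatterlist.

```c
#include <stddef.h>

/*
 * Userspace model of chained scatterlist allocation (illustrative only).
 * Each chunk holds at most max_ents slots. When more entries remain,
 * the last slot of the chunk is used as a link to the next chunk, so a
 * non-final chunk carries only max_ents - 1 data entries.
 * Returns the number of chunks needed for nents data entries.
 */
size_t sg_chunks_needed(size_t nents, size_t max_ents)
{
    size_t chunks = 0;
    size_t left = nents;

    if (max_ents < 2)
        return 0;   /* chaining needs room for one entry plus a link */

    while (left > 0) {
        if (left > max_ents)
            left -= max_ents - 1;   /* one slot becomes the chain link */
        else
            left = 0;               /* final chunk: no link needed */
        chunks++;
    }
    return chunks;
}
```

With 128 slots per chunk, 128 entries fit in one chunk, 129 need two, and
256 need three, each obtained with an ordinary page allocation rather than
one large virtually-contiguous mapping.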

> Bart.
> 

Thanks
Boaz


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [LSF/MM TOPIC][ATTEND] protection information and userspace
  2013-02-07 12:33               ` Hannes Reinecke
@ 2013-02-07 12:54                   ` Boaz Harrosh
  0 siblings, 0 replies; 28+ messages in thread
From: Boaz Harrosh @ 2013-02-07 12:54 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Darrick J. Wong, Chuck Lever, Ben Myers, lsf-pc, linux-fsdevel,
	linux-scsi, martin.petersen, FUJITA Tomonori

On 02/07/2013 02:33 PM, Hannes Reinecke wrote:
> On 02/07/2013 01:16 PM, Boaz Harrosh wrote:
>> (Again libaio should be changed in concert with Kernel's new API, and we
>>   can sacrifice old user-mode performance, with a COMPAT layer. Distro
>>   maintainers should consider replacing libaio, together with the new
>>   Kernel, so it is only those that do their own mix-and-match, who can
>>   fix that mismatch too)
>>
> And while we're at it, I still would _love_ to connect aio_cancel() 
> and blk_abort_request().
> 
> That way we could sensibly abort an I/O and get out of the darn 'D' 
> state.
> 

Yes!! Thanks. It is very interesting how the socket side of the world
has had this right for ages, while the same "fd" object on disks is a
second-class citizen in UNIX land. (Anybody voting for epoll on async disk IO?)
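
As a small illustration of that asymmetry: on Linux, epoll_ctl() accepts a
pipe or socket but rejects a regular file with EPERM. The sketch below only
demonstrates that behavior; the helper name epoll_try_add() is made up for
this example.

```c
#include <errno.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/*
 * Try to register fd for EPOLLIN on a fresh epoll instance.
 * Returns 0 on success, otherwise the errno reported by
 * epoll_create1() or epoll_ctl().
 */
int epoll_try_add(int fd)
{
    struct epoll_event ev;
    int err = 0;
    int ep = epoll_create1(0);

    if (ep < 0)
        return errno;

    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN;
    ev.data.fd = fd;

    if (epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) < 0)
        err = errno;    /* EPERM for regular files and directories */

    close(ep);
    return err;
}
```

Calling it on the read end of a pipe returns 0, while calling it on an fd
opened on an ordinary on-disk file returns EPERM.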

Thanks Hannes, yes, that too. The wait_*_interruptible() variants as well, in
a couple of places; those will need a serious error-handling audit.

> Cheers,
> 
> Hannes
> 

Boaz


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [LSF/MM TOPIC][ATTEND] protection information and userspace
  2013-02-07 12:08             ` Boaz Harrosh
@ 2013-02-07 16:19               ` Jeff Moyer
  -1 siblings, 0 replies; 28+ messages in thread
From: Jeff Moyer @ 2013-02-07 16:19 UTC (permalink / raw)
  To: Boaz Harrosh
  Cc: Hannes Reinecke, Darrick J. Wong, Chuck Lever, Ben Myers, lsf-pc,
	linux-fsdevel, linux-scsi, martin.petersen, FUJITA Tomonori,
	Zach Brown

Boaz Harrosh <bharrosh@panasas.com> writes:

>>> I also pondered simply adding a new io_prep_* function + IO_CMD_ code to libaio
>>> and all the other plumbing necessary to make that happen...
>>>
>>> void io_prep_preadv_pi(struct iocb *iocb, int fd, const struct iovec *iov,
>>> 		       int iovcnt, long long offset, const void *pi,
>>> 		       size_t pi_count);
>>>
>> This is also what I've envisioned.
>> Updating io_prep / async I/O is reasonably easy as its been using a 
>> separate structure for passing in the I/O details.
>> 
>> Normal read/write calls don't really map as you simply don't have 
>> enough parameter to feed PI information into the kernel.
>> So for that you'd need to invent a new interface / syscall.
>> 
>> For aio we just need to add additional fields to an existing structure.
>> 
>> So yeah, I'd be interested in that discussion as well.

Sure, it's easy to start there, but then you eventually end up having to
add a non-aio interface as well.  Let's not take the latter off the
table.

> Me too, in multiple fronts. It's part of my general concern about
>    "things we would like for user-mode servers"
>
> I think that the current aio and libaio Interface is broken for a long
> time, for multitude of reasons. For instance the nested structure definitions
> are COMPAT broken

News to me.  I run the libaio test harness built with -m32 on 64 bit
regularly.  What, exactly, is broken?
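
One plausible reason the -m32 case works: the kernel's aio ABI structures in
<linux/aio_abi.h> are built from fixed-width fields, so their size and layout
should not depend on the word size of the caller. The compile-time checks
below are a sketch of how to confirm that on a given toolchain (try it with
and without -m32); the helper names are only for illustration, and this says
nothing about structures defined elsewhere.

```c
#include <linux/aio_abi.h>
#include <stddef.h>

/* The submission and completion records are fixed-size on the wire. */
_Static_assert(sizeof(struct iocb) == 64, "struct iocb should be 64 bytes");
_Static_assert(sizeof(struct io_event) == 32, "struct io_event should be 32 bytes");

/* Pointers travel as 64-bit integers, so this offset is ABI-independent. */
_Static_assert(offsetof(struct iocb, aio_buf) == 24, "aio_buf offset");

/* Runtime accessors, handy when printing the values from a test program. */
size_t iocb_size(void)     { return sizeof(struct iocb); }
size_t io_event_size(void) { return sizeof(struct io_event); }
```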

> , and lots of missing pieces. (For example search in archives
> for why bsg does not support sg-lists.)

> And there are all these additions that everyone wants on top, that call for
> a new interface anyway.

What was proposed above does not require a new interface.  It's just an
additional IO_CMD_*.  I'm not saying there aren't reasons for a new
interface, it's just I didn't see any in this thread.

> So I would like to see a deep fixup of this interface, with an aio version2
> that can take into considerations, all of future needs including these
> above. Kernel code will be very happy to be implemented with the new, interface
> and a COMPAT layer could be put in place for the old interface.
>
> All interested parties should bring to the table what is the extension/changes
> they need. And we can try and union all of them together.
>
> (My addition is for support of sg_lists to bsg, in a way that makes Tomo happy
>  I know that qemu was wanting this for a while as well as the multitude of
>  user-mode servers)

I'm not sure how that's directly related to aio, but ok.  If we're going
to rewrite the aio code, I think Zach's acall would be a good start, at
least on the API front:
  http://lwn.net/Articles/316806/

Cheers,
Jeff

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [LSF/MM TOPIC][ATTEND] protection information and userspace
  2013-02-07 16:19               ` Jeff Moyer
  (?)
@ 2013-02-07 17:27               ` Zach Brown
  2013-02-07 17:36                 ` Joel Becker
  -1 siblings, 1 reply; 28+ messages in thread
From: Zach Brown @ 2013-02-07 17:27 UTC (permalink / raw)
  To: Jeff Moyer
  Cc: Boaz Harrosh, Hannes Reinecke, Darrick J. Wong, Chuck Lever,
	Ben Myers, lsf-pc, linux-fsdevel, linux-scsi, martin.petersen,
	FUJITA Tomonori

On Thu, Feb 07, 2013 at 11:19:59AM -0500, Jeff Moyer wrote:
> Boaz Harrosh <bharrosh@panasas.com> writes:
> >> 
> >> For aio we just need to add additional fields to an existing structure.
> >> 
> >> So yeah, I'd be interested in that discussion as well.
> 
> Sure, it's easy to start there, but then you eventually end up having to
> add a non-aio interface as well.  Let's not take the latter off the
> table.

I agree that a sync variant shouldn't be ignored, but needing a sync
interface with PI arguments also shouldn't get in the way of adding
support to the aio+dio path.  Simply because it's what people use :/.
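
For reference, here is a rough sketch of what an application would have to
produce for the pi / pi_count arguments floated earlier in the thread,
assuming T10 DIF with a CRC guard tag: one 8-byte tuple per sector, holding
a 16-bit guard (CRC with polynomial 0x8BB7), a 16-bit application tag and a
32-bit reference tag, all big-endian. The names pi_tuple and pi_generate()
are made up for this example; nothing here is an existing kernel or libaio
interface.

```c
#include <stddef.h>
#include <stdint.h>

/* One 8-byte protection tuple per sector; fields are stored big-endian. */
struct pi_tuple {
    uint8_t guard[2];   /* CRC16 over the sector's data */
    uint8_t app[2];     /* application tag, owned by the application */
    uint8_t ref[4];     /* reference tag, normally tied to the sector number */
};

/* Bitwise CRC-16/T10-DIF: polynomial 0x8BB7, initial value 0, no reflection. */
uint16_t crc_t10dif(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;

    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x8BB7)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/*
 * Fill pi[] with one tuple per sector of data.
 * Returns the number of PI bytes written (the would-be pi_count), or 0 if
 * len is not a whole number of sectors or the PI buffer is too small.
 */
size_t pi_generate(const uint8_t *data, size_t len, size_t sector_size,
                   uint16_t app_tag, uint32_t first_ref,
                   struct pi_tuple *pi, size_t pi_cap_bytes)
{
    if (sector_size == 0 || len % sector_size != 0)
        return 0;

    size_t nsect = len / sector_size;
    if (nsect * sizeof(*pi) > pi_cap_bytes)
        return 0;

    for (size_t i = 0; i < nsect; i++) {
        uint16_t g = crc_t10dif(data + i * sector_size, sector_size);
        uint32_t r = first_ref + (uint32_t)i;

        pi[i].guard[0] = (uint8_t)(g >> 8);
        pi[i].guard[1] = (uint8_t)(g & 0xff);
        pi[i].app[0]   = (uint8_t)(app_tag >> 8);
        pi[i].app[1]   = (uint8_t)(app_tag & 0xff);
        pi[i].ref[0]   = (uint8_t)(r >> 24);
        pi[i].ref[1]   = (uint8_t)(r >> 16);
        pi[i].ref[2]   = (uint8_t)(r >> 8);
        pi[i].ref[3]   = (uint8_t)(r & 0xff);
    }
    return nsect * sizeof(*pi);
}
```

Note that the guard and application tags depend only on the data and on the
caller's choice of tag, not on the reference tag, so the reference field can
be rewritten later without recomputing the other two.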

> I'm not sure how that's directly related to aio, but ok.  If we're going
> to rewrite the aio code, I think Zach's acall would be a good start, at
> least on the API front:
>   http://lwn.net/Articles/316806/

Yeah, I'm happy to chat about this stuff if people are interested.  I
think I'd do things differently today than what was done in that aged
acall prototype.

- z

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [LSF/MM TOPIC][ATTEND] protection information and userspace
  2013-02-07 17:27               ` Zach Brown
@ 2013-02-07 17:36                 ` Joel Becker
  2013-02-07 21:04                   ` J. Bruce Fields
  0 siblings, 1 reply; 28+ messages in thread
From: Joel Becker @ 2013-02-07 17:36 UTC (permalink / raw)
  To: Zach Brown
  Cc: Jeff Moyer, Boaz Harrosh, Hannes Reinecke, Darrick J. Wong,
	Chuck Lever, Ben Myers, lsf-pc, linux-fsdevel, linux-scsi,
	martin.petersen, FUJITA Tomonori

Dear LSF committee,
	I'd like to explicitly request attendance for this discussion
:-)

Joel

On Thu, Feb 07, 2013 at 09:27:35AM -0800, Zach Brown wrote:
> On Thu, Feb 07, 2013 at 11:19:59AM -0500, Jeff Moyer wrote:
> > Boaz Harrosh <bharrosh@panasas.com> writes:
> > >> 
> > >> For aio we just need to add additional fields to an existing structure.
> > >> 
> > >> So yeah, I'd be interested in that discussion as well.
> > 
> > Sure, it's easy to start there, but then you eventually end up having to
> > add a non-aio interface as well.  Let's not take the latter off the
> > table.
> 
> I agree that a sync variant shouldn't be ignored, but needing a sync
> interface with PI arguments also shouldn't get in the way of adding
> support to the aio+dio path.  Simply because it's what people use :/.
> 
> > I'm not sure how that's directly related to aio, but ok.  If we're going
> > to rewrite the aio code, I think Zach's acall would be a good start, at
> > least on the API front:
> >   http://lwn.net/Articles/316806/
> 
> Yeah, I'm happy to chat about this stuff if people are interested.  I
> think I'd do things differently today than what was done in that aged
> acall prototype.
> 
> - z
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 

"You can get more with a kind word and a gun than you can with
 a kind word alone."
         - Al Capone

			http://www.jlbec.org/
			jlbec@evilplan.org

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [LSF/MM TOPIC][ATTEND] protection information and userspace
  2013-02-06 20:24 ` Darrick J. Wong
  2013-02-06 20:34   ` Chuck Lever
@ 2013-02-07 19:09   ` Martin K. Petersen
  2013-02-07 23:45     ` Darrick J. Wong
  1 sibling, 1 reply; 28+ messages in thread
From: Martin K. Petersen @ 2013-02-07 19:09 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Ben Myers, lsf-pc, linux-fsdevel, linux-scsi, martin.petersen

>>>>> "Darrick" == Darrick J Wong <darrick.wong@oracle.com> writes:

Darrick> and more recently I've theorized that we could add a magic
Darrick> fcntl/ioctl to make the kernel recognize, say, the first iovec
Darrick> of a O_DIRECT *{read,write}v call as the PI buffer, which I
Darrick> think is similar to how DIX gets PI data to a disk.  But it's
Darrick> not like I have any code to show for it.

I don't particularly like the "stick it in the first iovec" magic. Also,
we need a bit more than this. A handful of knobs need to be present to
convey how the PI should be sliced and diced. So then we get into the
territory where the first iovec is a PI descriptor of some sort. And
then the second entry is the PI buffer.
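
To make the descriptor-plus-buffer idea a bit more concrete, here is a
minimal sketch of what such a PI descriptor could carry.  Everything in
it is hypothetical: struct pi_desc, its fields and pi_desc_buf_len() are
made up for illustration and are not an existing or proposed kernel
interface.  The only outside assumption is the T10 DIF convention of one
8-byte tuple per protection interval.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor; none of these names exist in the kernel. */
struct pi_desc {
	uint32_t flags;        /* which checks to perform (hypothetical knob) */
	uint32_t interval;     /* bytes of data covered by one PI tuple */
	uint32_t tuple_size;   /* bytes per PI tuple, 8 for T10 DIF */
	uint32_t ref_tag_seed; /* ref tag the application used for interval 0 */
};

/*
 * Bytes of PI that must accompany data_len bytes of data.
 * Returns 0 if data_len is not a whole number of protection intervals.
 */
static size_t pi_desc_buf_len(const struct pi_desc *d, size_t data_len)
{
	assert(d != NULL && d->interval != 0);

	if (data_len % d->interval != 0)
		return 0;
	return (data_len / d->interval) * d->tuple_size;
}
```

A submission path could use a check like this to reject a PI buffer
whose length does not match the data buffer before building the bio.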


Darrick> I /think/ it's fairly straightforward to change the directio
Darrick> submit code to find the userspace PI buffer and amend the block
Darrick> integrity code to attach our own PI buffer.  

I recommend that you check out how I do this in oracleasm.


Darrick> You'd still have to let the block layer set the sector # field,
Darrick> but afaik that won't affect the crc or the app tag.

Correct. But the right way would be to pass the ref tag seed in as part
of the IOCB and let sd or the HBA hardware do the remapping.
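
As a rough illustration of what remapping from a seed means, the sketch
below rewrites the ref tags of a run of 8-byte T10 DIF tuples (2-byte
guard, 2-byte app tag, 4-byte ref tag, all big-endian) from seed-relative
values to LBA-relative ones.  The helper names are invented here, and
this is not how sd or any HBA driver actually implements it; it assumes
ref tags that increment by one per protection interval.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* On-the-wire T10 DIF tuple: every field is big-endian. */
struct dif_tuple {
	uint8_t guard[2];
	uint8_t app[2];
	uint8_t ref[4];
};

static uint32_t get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/*
 * Check that tuple i carries ref tag (seed + i) and rewrite it to
 * (start_lba + i).  Returns n on success, or the index of the first
 * tuple whose ref tag did not match the seed sequence.
 */
static size_t remap_ref_tags(struct dif_tuple *t, size_t n,
			     uint32_t seed, uint32_t start_lba)
{
	assert(t != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (get_be32(t[i].ref) != seed + (uint32_t)i)
			return i;
		put_be32(t[i].ref, start_lba + (uint32_t)i);
	}
	return n;
}
```

Only the ref tag bytes change, so the guard and app tags supplied by the
application pass through untouched.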

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [LSF/MM TOPIC][ATTEND] protection information and userspace
  2013-02-07  9:40     ` Joel Becker
  2013-02-07 10:01       ` Darrick J. Wong
@ 2013-02-07 19:12       ` Martin K. Petersen
  2013-02-08  9:36         ` Joel Becker
  1 sibling, 1 reply; 28+ messages in thread
From: Martin K. Petersen @ 2013-02-07 19:12 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Darrick J. Wong, Ben Myers, lsf-pc, linux-fsdevel, linux-scsi,
	martin.petersen

>>>>> "Joel" == Joel Becker <jlbec@evilplan.org> writes:

Joel> I'm happy to chat about it.  Unfortunately, like Darrick says,
Joel> sys_dio() coding hasn't happened.  I do think we're better off
Joel> with some kind of explicit API than some magic state on the file.
Joel> I mean, even something like:

Joel> 	ssize_t write_with_pi(int fd, const void *buf, size_t count,
Joel> 			      const void *pi, size_t pi_count);

Joel> It's not as nice as a non-historical API (eg sys_dio), but it also
Joel> probably plays nicer with buffered I/O.

Pretty much everyone I have talked to who is interested in explicitly
attaching PI (as opposed to relying on the kernel doing it) is using
Linux aio.

I am not opposed to having a more read()/write()-like interface as
well. But I think it's important to cater to the I/O paradigm used by
the applications interested in this. It's a lot easier to tweak a few
IOCB fields than it is to rewrite how an application does I/O.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [LSF/MM TOPIC][ATTEND] protection information and userspace
  2013-02-06 19:51 [LSF/MM TOPIC][ATTEND] protection information and userspace Ben Myers
  2013-02-06 20:24 ` Darrick J. Wong
@ 2013-02-07 19:20 ` Martin K. Petersen
  1 sibling, 0 replies; 28+ messages in thread
From: Martin K. Petersen @ 2013-02-07 19:20 UTC (permalink / raw)
  To: Ben Myers; +Cc: lsf-pc, linux-fsdevel, linux-scsi, martin.petersen

>>>>> "Ben" == Ben Myers <bpm@sgi.com> writes:

Ben> I'm interested in discussing how to pass protection information to
Ben> and from userspace.  Maybe Martin could be enlisted for the
Ben> discussion.

I'll be there, obviously.


Ben> I read that some work has already been done in this area but have
Ben> not been able to locate it.  It looks like the bio-integrity code
Ben> already makes it possible to generate the t10-dif crc in the
Ben> filesystem.  

Yep. Although the block layer will generate the PI when the filesystem
submits the bio. So until we have a userland conduit there hasn't been
much point in the filesystems mucking with the PI explicitly.
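
For anyone who wants to generate or verify the guard tag in userland,
the T10 DIF guard is a CRC-16 with polynomial 0x8BB7, a zero initial
value and no bit reflection.  Below is a minimal bitwise sketch; the
function name is made up for this example, and real implementations are
typically table-driven or hardware-assisted rather than bit-at-a-time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time CRC-16/T10-DIF: poly 0x8BB7, init 0, MSB first. */
static uint16_t crc_t10dif_bitwise(const uint8_t *buf, size_t len)
{
	uint16_t crc = 0;

	assert(buf != NULL || len == 0);

	for (size_t i = 0; i < len; i++) {
		/* Fold the next data byte into the top of the register. */
		crc ^= (uint16_t)(buf[i] << 8);
		for (int bit = 0; bit < 8; bit++) {
			if (crc & 0x8000)
				crc = (uint16_t)((crc << 1) ^ 0x8BB7);
			else
				crc = (uint16_t)(crc << 1);
		}
	}
	return crc;
}
```

The result would be stored big-endian in the first two bytes of each
tuple, computed over one protection interval of data at a time.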


Ben> It would be good to be able to get the guard and application tags
Ben> back out to backup applications such as xfsdump.  Enabling other
Ben> applications to generate their own tags in userspace is also
Ben> interesting.

However, the app tag is really only good for disk drives. Most array
vendors use it internally. And going forward we're going to use it for
access control instead of opaque storage. So exposing the application
tag space to userland applications is of very limited use at this point.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [LSF/MM TOPIC][ATTEND] protection information and userspace
  2013-02-07 17:36                 ` Joel Becker
@ 2013-02-07 21:04                   ` J. Bruce Fields
  2013-02-08  9:38                     ` Joel Becker
  0 siblings, 1 reply; 28+ messages in thread
From: J. Bruce Fields @ 2013-02-07 21:04 UTC (permalink / raw)
  To: Zach Brown, Jeff Moyer, Boaz Harrosh, Hannes Reinecke,
	Darrick J. Wong, Chuck Lever, Ben Myers, lsf-pc, linux-fsdevel,
	linux-scsi, martin.petersen, FUJITA Tomonori

On Thu, Feb 07, 2013 at 09:36:39AM -0800, Joel Becker wrote:
> Dear LSF committee,
> 	I'd like to explicitly request attendance for this discussion
> :-)

http://marc.info/?l=linux-fsdevel&m=135894412908342&w=2

	"Also, the way I compile the list of requests is from thread
	heads ...  that means don't send your attendee request as a
	reply to something else either otherwise it might get missed."

--b.

> 
> Joel
> 
> On Thu, Feb 07, 2013 at 09:27:35AM -0800, Zach Brown wrote:
> > On Thu, Feb 07, 2013 at 11:19:59AM -0500, Jeff Moyer wrote:
> > > Boaz Harrosh <bharrosh@panasas.com> writes:
> > > >> 
> > > >> For aio we just need to add additional fields to an existing structure.
> > > >> 
> > > >> So yeah, I'd be interested in that discussion as well.
> > > 
> > > Sure, it's easy to start there, but then you eventually end up having to
> > > add a non-aio interface as well.  Let's not take the latter off the
> > > table.
> > 
> > I agree that a sync variant shouldn't be ignored, but needing a sync
> > interface with PI arguments also shouldn't get in the way of adding
> > support to the aio+dio path.  Simply because it's what people use :/.
> > 
> > > I'm not sure how that's directly related to aio, but ok.  If we're going
> > > to rewrite the aio code, I think Zach's acall would be a good start, at
> > > least on the API front:
> > >   http://lwn.net/Articles/316806/
> > 
> > Yeah, I'm happy to chat about this stuff if people are interested.  I
> > think I'd do things differently today than what was done in that aged
> > acall prototype.
> > 
> > - z
> 
> -- 
> 
> "You can get more with a kind word and a gun than you can with
>  a kind word alone."
>          - Al Capone
> 
> 			http://www.jlbec.org/
> 			jlbec@evilplan.org
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [LSF/MM TOPIC][ATTEND] protection information and userspace
  2013-02-07 19:09   ` Martin K. Petersen
@ 2013-02-07 23:45     ` Darrick J. Wong
  2013-02-07 23:59       ` Martin K. Petersen
  0 siblings, 1 reply; 28+ messages in thread
From: Darrick J. Wong @ 2013-02-07 23:45 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: Ben Myers, lsf-pc, linux-fsdevel, linux-scsi

On Thu, Feb 07, 2013 at 02:09:17PM -0500, Martin K. Petersen wrote:
> >>>>> "Darrick" == Darrick J Wong <darrick.wong@oracle.com> writes:
> 
> Darrick> and more recently I've theorized that we could add a magic
> Darrick> fcntl/ioctl to make the kernel recognize, say, the first iovec
> Darrick> of a O_DIRECT *{read,write}v call as the PI buffer, which I
> Darrick> think is similar to how DIX gets PI data to a disk.  But it's
> Darrick> not like I have any code to show for it.
> 
> I don't particularly like the "stick it in the first iovec" magic. Also,
> we need a bit more than this. A handful of knobs need to be present to
> convey how the PI should be sliced and diced. So then we get into the
> territory where the first iovec is a PI descriptor of some sort. And
> then the second entry is the PI buffer.

Hm, well if we're adding another IO_CMD_ anyway, it probably isn't that hard to
find space to stuff in an extra pointer or two to a PI descriptor + buffer.

(or a pointer to a descriptor that itself points to a buffer...)

> Darrick> I /think/ it's fairly straightforward to change the directio
> Darrick> submit code to find the userspace PI buffer and amend the block
> Darrick> integrity code to attach our own PI buffer.  
> 
> I recommend that you check out how I do this in oracleasm.

Is there a newer one than this?
https://oss.oracle.com/projects/oracleasm/files/sources/

(Nov. 2008?)

> 
> Darrick> You'd still have to let the block layer set the sector # field,
> Darrick> but afaik that won't affect the crc or the app tag.
> 
> Correct. But the right way would be to pass the ref tag seed in as part
> of the IOCB and let sd or the HBA hardware do the remapping.

<nod>

--D
> 
> -- 
> Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [LSF/MM TOPIC][ATTEND] protection information and userspace
  2013-02-07 23:45     ` Darrick J. Wong
@ 2013-02-07 23:59       ` Martin K. Petersen
  0 siblings, 0 replies; 28+ messages in thread
From: Martin K. Petersen @ 2013-02-07 23:59 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Martin K. Petersen, Ben Myers, lsf-pc, linux-fsdevel, linux-scsi

>>>>> "Darrick" == Darrick J Wong <darrick.wong@oracle.com> writes:

Darrick> Is there a newer one than this?
Darrick> https://oss.oracle.com/projects/oracleasm/files/sources/

UEK2 git.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [LSF/MM TOPIC][ATTEND] protection information and userspace
  2013-02-07 19:12       ` Martin K. Petersen
@ 2013-02-08  9:36         ` Joel Becker
  0 siblings, 0 replies; 28+ messages in thread
From: Joel Becker @ 2013-02-08  9:36 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Chuck Lever, Darrick J. Wong, Ben Myers, lsf-pc, linux-fsdevel,
	linux-scsi

On Thu, Feb 07, 2013 at 02:12:57PM -0500, Martin K. Petersen wrote:
> >>>>> "Joel" == Joel Becker <jlbec@evilplan.org> writes:
> 
> Joel> I'm happy to chat about it.  Unfortunately, like Darrick says,
> Joel> sys_dio() coding hasn't happened.  I do think we're better off
> Joel> with some kind of explicit API than some magic state on the file.
> Joel> I mean, even something like:
> 
> Joel> 	ssize_t write_with_pi(int fd, const void *buf, size_t count,
> Joel> 			      const void *pi, size_t pi_count);
> 
> Joel> It's not as nice as a non-historical API (eg sys_dio), but it also
> Joel> probably plays nicer with buffered I/O.
> 
> Pretty much everyone I have talked to who is interested in explicitly
> attaching PI (as opposed to relying on the kernel doing it) is using
> Linux aio.
> 
> I am not opposed to having a more read()/write()-like interface as
> well. But I think it's important to cater to the I/O paradigm used by
> the applications interested in this. It's a lot easier to tweak a few
> IOCB fields than it is to rewrite how an application does I/O.

You know I'm not going to argue with this.  I was merely stating that
I'm flexible in how we start :-)

Joel

> 
> -- 
> Martin K. Petersen	Oracle Linux Engineering

-- 

"Depend on the rabbit's foot if you will, but remember, it didn't
 help the rabbit."
	- R. E. Shay

			http://www.jlbec.org/
			jlbec@evilplan.org

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [LSF/MM TOPIC][ATTEND] protection information and userspace
  2013-02-07 21:04                   ` J. Bruce Fields
@ 2013-02-08  9:38                     ` Joel Becker
  0 siblings, 0 replies; 28+ messages in thread
From: Joel Becker @ 2013-02-08  9:38 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Zach Brown, Jeff Moyer, Boaz Harrosh, Hannes Reinecke,
	Darrick J. Wong, Chuck Lever, Ben Myers, lsf-pc, linux-fsdevel,
	linux-scsi, martin.petersen, FUJITA Tomonori

On Thu, Feb 07, 2013 at 04:04:36PM -0500, J. Bruce Fields wrote:
> On Thu, Feb 07, 2013 at 09:36:39AM -0800, Joel Becker wrote:
> > Dear LSF committee,
> > 	I'd like to explicitly request attendance for this discussion
> > :-)
> 
> http://marc.info/?l=linux-fsdevel&m=135894412908342&w=2
> 
> 	"Also, the way I compile the list of requests is from thread
> 	heads ...  that means don't send your attendee request as a
> 	reply to something else either otherwise it might get missed."

Ack.  Sent as such.

Thanks,
Joel

> 
> --b.
> 
> > 
> > Joel
> > 
> > On Thu, Feb 07, 2013 at 09:27:35AM -0800, Zach Brown wrote:
> > > On Thu, Feb 07, 2013 at 11:19:59AM -0500, Jeff Moyer wrote:
> > > > Boaz Harrosh <bharrosh@panasas.com> writes:
> > > > >> 
> > > > >> For aio we just need to add additional fields to an existing structure.
> > > > >> 
> > > > >> So yeah, I'd be interested in that discussion as well.
> > > > 
> > > > Sure, it's easy to start there, but then you eventually end up having to
> > > > add a non-aio interface as well.  Let's not take the latter off the
> > > > table.
> > > 
> > > I agree that a sync variant shouldn't be ignored, but needing a sync
> > > interface with PI arguments also shouldn't get in the way of adding
> > > support to the aio+dio path.  Simply because it's what people use :/.
> > > 
> > > > I'm not sure how that's directly related to aio, but ok.  If we're going
> > > > to rewrite the aio code, I think Zach's acall would be a good start, at
> > > > least on the API front:
> > > >   http://lwn.net/Articles/316806/
> > > 
> > > Yeah, I'm happy to chat about this stuff if people are interested.  I
> > > think I'd do things differently today than what was done in that aged
> > > acall prototype.
> > > 
> > > - z
> > 
> > -- 
> > 
> > "You can get more with a kind word and a gun than you can with
> >  a kind word alone."
> >          - Al Capone
> > 
> > 			http://www.jlbec.org/
> > 			jlbec@evilplan.org

-- 

"You look in her eyes, the music begins to play.
 Hopeless romantics, here we go again."

			http://www.jlbec.org/
			jlbec@evilplan.org

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2013-02-08  9:38 UTC | newest]

Thread overview: 28+ messages
2013-02-06 19:51 [LSF/MM TOPIC][ATTEND] protection information and userspace Ben Myers
2013-02-06 20:24 ` Darrick J. Wong
2013-02-06 20:34   ` Chuck Lever
2013-02-07  9:40     ` Joel Becker
2013-02-07 10:01       ` Darrick J. Wong
2013-02-07 11:27         ` Hannes Reinecke
2013-02-07 12:08           ` Boaz Harrosh
2013-02-07 12:08             ` Boaz Harrosh
2013-02-07 12:16             ` Boaz Harrosh
2013-02-07 12:16               ` Boaz Harrosh
2013-02-07 12:33               ` Hannes Reinecke
2013-02-07 12:54                 ` Boaz Harrosh
2013-02-07 12:54                   ` Boaz Harrosh
2013-02-07 12:29             ` Bart Van Assche
2013-02-07 12:47               ` Boaz Harrosh
2013-02-07 12:47                 ` Boaz Harrosh
2013-02-07 16:19             ` Jeff Moyer
2013-02-07 16:19               ` Jeff Moyer
2013-02-07 17:27               ` Zach Brown
2013-02-07 17:36                 ` Joel Becker
2013-02-07 21:04                   ` J. Bruce Fields
2013-02-08  9:38                     ` Joel Becker
2013-02-07 19:12       ` Martin K. Petersen
2013-02-08  9:36         ` Joel Becker
2013-02-07 19:09   ` Martin K. Petersen
2013-02-07 23:45     ` Darrick J. Wong
2013-02-07 23:59       ` Martin K. Petersen
2013-02-07 19:20 ` Martin K. Petersen
