* another semantic storage system (in userspace)
@ 2006-07-12 21:44 Hubert Chan
  2006-07-12 23:22 ` Clay Barnes
  2006-07-13 17:06 ` Clay Barnes
  0 siblings, 2 replies; 7+ messages in thread
From: Hubert Chan @ 2006-07-12 21:44 UTC (permalink / raw)
  To: reiserfs-list

I thought this may be interesting to the people on the list.

I was pointed to the GLScube project (http://www.glscube.org/), which
tries to implement semantic storage in userspace.  GLScube (GLS^3)
stands for GNU/Linux Semantic Storage System.  It provides a C++ API for
programmers to use, and also exports a virtual filesystem (glscubefs)
using FUSE, so that legacy applications can take advantage of its
functionality.  It performs incremental indexing (I assume using
either inotify or dnotify).

It is licensed under the GPL and can be downloaded from their website.
Unfortunately, the 0.1 tarball that I downloaded doesn't seem to include
Makefiles, so I'm not sure how to build it.

They also have video demos available on the site.

I think that what makes this more interesting to ReiserFS people,
compared to, say, Beagle, is the glscubefs, which allows legacy
applications to take advantage of GLScube functionality.  Although we'll
have to see how much functionality is actually exposed through glscubefs
(i.e. can we do everything from the commandline?).

-- 
Hubert Chan - email & Jabber: hubert@uhoreg.ca - http://www.uhoreg.ca/
PGP/GnuPG key: 1024D/124B61FA   (Key available at wwwkeys.pgp.net)
Fingerprint: 96C5 012F 5F74 A5F7 1FF7  5291 AF29 C719 124B 61FA


* Re: another semantic storage system (in userspace)
  2006-07-12 21:44 another semantic storage system (in userspace) Hubert Chan
@ 2006-07-12 23:22 ` Clay Barnes
  2006-07-13  7:30   ` Hans Reiser
  2006-07-13 17:06 ` Clay Barnes
  1 sibling, 1 reply; 7+ messages in thread
From: Clay Barnes @ 2006-07-12 23:22 UTC (permalink / raw)
  To: reiserfs-list

On 17:44 Wed 12 Jul, Hubert Chan wrote:
> I thought this may be interesting to the people on the list.
> 
> I was pointed to the GLScube project (http://www.glscube.org/), which
> tries to implement semantic storage in userspace.  GLScube (GLS^3)
> stands for GNU/Linux Semantic Storage System.  It provides a C++ API for
> programmers to use, and also exports a virtual filesystem (glscubefs)
> using FUSE, so that legacy applications can take advantage of its
> functionality.  It performs incremental indexing (I assume using
> either inotify or dnotify).
> 
> It is licensed under the GPL and can be downloaded from their website.
> Unfortunately, the 0.1 tarball that I downloaded doesn't seem to include
> Makefiles, so I'm not sure how to build it.
> 
> They also have video demos available on the site.
> 
> I think that what makes this more interesting to ReiserFS people,
> compared to, say, Beagle, is the glscubefs, which allows legacy
> applications to take advantage of GLScube functionality.  Although we'll
> have to see how much functionality is actually exposed through glscubefs
> (i.e. can we do everything from the commandline?).
> 
> -- 
> Hubert Chan - email & Jabber: hubert@uhoreg.ca - http://www.uhoreg.ca/
> PGP/GnuPG key: 1024D/124B61FA   (Key available at wwwkeys.pgp.net)
> Fingerprint: 96C5 012F 5F74 A5F7 1FF7  5291 AF29 C719 124B 61FA

It's an interesting project, and we as much as anyone appreciate their
efforts to integrate semantics into legacy applications through a
virtual FS, but I don't see anything in the demonstrations or
whitepaper that would benefit the Reiser4 project.  Why build a virtual
filesystem to store semantic data when it can be built into an actual
one?  The one benefit I see is that they may be able to do more with
search optimization in their design, but only at the expense of space.
If Reiser4 has a clever design for metadata storage, though, even that
will be easily bested.

I do like their careful reliance on standard tools like inotify,
FUSE, and Lucene to implement their design, instead of writing a bunch
of redundant code, and their interest in backwards compatibility through
a virtual FS.  It has the chance to bring non-hierarchical storage to
some, but it lacks the careful planning that has gone into the
discussion of the metadata plugin that I've seen on the mailing list
alone.

--Clay Barnes

* Re: another semantic storage system (in userspace)
  2006-07-12 23:22 ` Clay Barnes
@ 2006-07-13  7:30   ` Hans Reiser
  0 siblings, 0 replies; 7+ messages in thread
From: Hans Reiser @ 2006-07-13  7:30 UTC (permalink / raw)
  To: Clay Barnes; +Cc: reiserfs-list

I agree that it is an interesting project.

What can I say, it is time for ReiserFS to leave this storage layer
stuff behind us, and start doing the semantics.....

Hopefully by January we can do so.

Hans

* Re: another semantic storage system (in userspace)
  2006-07-12 21:44 another semantic storage system (in userspace) Hubert Chan
  2006-07-12 23:22 ` Clay Barnes
@ 2006-07-13 17:06 ` Clay Barnes
  2006-07-13 17:38   ` Hans Reiser
  1 sibling, 1 reply; 7+ messages in thread
From: Clay Barnes @ 2006-07-13 17:06 UTC (permalink / raw)
  To: reiserfs-list

I have been thinking lately that though we certainly need to do 
cleanup of the various bugs and such relating to the storage layer,
perhaps now is a good time to review and discuss the plans for the
semantic layer so that any outstanding concerns can be thoroughly
discussed and resolved before we get close to starting actual work
on that portion of Reiser4.  Remember, we have a real chance at
being the first semantic storage system with a significant user base,
and that places a terrible pressure for perfection on us (and I use 'us'
loosely, since I don't have nearly the code skills in C needed to dare
touch source in non-trivial ways---I hope however that between my CS and
Linguistics degrees, I'll be able to at least contribute some ideas).
If we're first out of the gate, but we have some significant flaw in
design, we're deeply endangered.  People will wait for our correction of
it (which may be impossible if it's a fundamental or debated problem),
or for another system that has less critical flaws.

These are my critical concerns.  I know some of these have been addressed
before, but this keeps anything from being skipped under the assumption
that they've already been resolved.
1) Scope
  a) Should the semantic content of files be purely user-defined?
  b) Should the full extricable content of a file be read into semantic
  space?
  c) If so, should there be a separation of the two forms of content?
  d) How would we address the two in a simple, user-transparent way?
2) Storage
  a) How do we store the semantic data so it is very rapidly accessible
  and easy to update, especially if we decide to use the full textual
  content of parsable files?
3) Changes
  a) Should we index changes instantly at full capacity, or should we
  queue files needing re-indexing for a very low resource daemon to
  process?
  b) If we use the latter, how do we avoid disagreement between newly
  changed/created files and the semantic actions regarding them while the
  daemon works?
  c) If we use the former, how do we minimize the impact on the user of
  this sudden spike in resource use without risking the index and data
  getting out of sync?
4) Portability
  a) Should we provide a way to export semantic data when archiving to
  formats whose standards prevent the use of Reiser4 (such as DVD)?
  b) How do we handle exports from a partial filesystem, if we decide to
  provide export capabilities?
  c) Should we provide the ability to import from competing semantic
  systems?  Export?
5) Code revisions
  a) With emerging formats, updates to formats, and the numerous ways
  file standards change, how do we provide easy addition and updates to
  the filters we use to index files?
  b) Should we provide a simple user-editable means to change/augment
  filters?
  c) Can these both be resolved by placing the actual filters in
  userspace/filesystemspace instead of into the code?

I hope I haven't overstepped my relevance, and my apologies if I have,
but I just wanted to raise some concerns while they are easy to
address---before the code is started.

Further disclaimer:  I'm at work, so I may have been a little hasty
writing this (though technically, I'm *supposed* to be researching
semantic storage systems for our documents, so I'm not really goofing
off), so there may be errors from my minimal review/revision.

Thanks,
Clay


* Re: another semantic storage system (in userspace)
  2006-07-13 17:06 ` Clay Barnes
@ 2006-07-13 17:38   ` Hans Reiser
  2006-07-13 20:30     ` Hubert Chan
  0 siblings, 1 reply; 7+ messages in thread
From: Hans Reiser @ 2006-07-13 17:38 UTC (permalink / raw)
  To: Clay Barnes; +Cc: reiserfs-list

Clay Barnes wrote:

>I have been thinking lately that though we certainly need to do 
>cleanup of the various bugs and such relating to the storage layer,
>perhaps now is a good time to review and discuss the plans for the
>semantic layer so that any outstanding concerns can be thoroughly
>discussed and resolved before we get close to starting actual work
>on that portion of Reiser4.  Remember, we have a real chance at
>being the first semantic storage system with a significant user base,
>and that places a terrible pressure for perfection on us (and I use 'us'
>loosely, since I don't have nearly the code skills in C needed to dare
>touch source in non-trivial ways---I hope however that between my CS and
>Linguistics degrees, I'll be able to at least contribute some ideas).
>If we're first out of the gate, but we have some significant flaw in
>design, we're deeply endangered.  People will wait for our correction of
>it (which may be impossible if it's a fundamental or debated problem),
>or for another system that has less critical flaws.
>
>These are my critical concerns.  I know some of these have been addressed
>before, but this keeps anything from being skipped under the assumption
>that they've already been resolved.
>1) Scope
>  a) Should the semantic content of files be purely user-defined?

Yes.

>  b) Should the full extricable content of a file be read into semantic
>  space?

If the user wants that.  The user should configure the auto-indexer he
has selected to work as he desires and to apply to the files he
chooses.  By default there should be a delay in indexing (such as until
the repacker runs at night) to ensure that we only index that which
will be around for a while.  This is for performance reasons.
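
A minimal sketch of the delayed indexing described above, assuming a
simple in-memory queue.  The class name, the settle_seconds parameter,
and the index_fn callback are hypothetical and are not part of any
Reiser4 interface; the nightly repacker run is modelled only as a call
to run_batch().

```python
import os
import time


class DeferredIndexQueue:
    """Hypothetical queue: remember changed paths, index them only later."""

    def __init__(self, settle_seconds):
        self.settle_seconds = settle_seconds
        self.pending = {}  # path -> time of the most recent change

    def note_change(self, path, now=None):
        """Record that a file changed; a repeated change resets its delay."""
        self.pending[path] = time.time() if now is None else now

    def run_batch(self, index_fn, now=None):
        """Index pending files that have been quiet long enough and still exist."""
        now = time.time() if now is None else now
        indexed = []
        for path, changed_at in list(self.pending.items()):
            if now - changed_at < self.settle_seconds:
                continue  # too recent; leave it queued for a later batch
            del self.pending[path]
            if os.path.exists(path):  # files deleted in the meantime are skipped
                index_fn(path)
                indexed.append(path)
        return indexed
```

Files that are created and deleted between two batches never reach
index_fn, which is the performance benefit the delay is meant to buy.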

>  c) If so, should there be a separation of the two forms of content?
>  d) How would we address the two in a simple, user-transparent way?
>2) Storage
>  a) How do we store the semantic data so it is very rapidly accessible
>  and easy to update, especially if we decide to use the full textual
>  content of parsable files?
>3) Changes
>  a) Should we index changes instantly at full capacity, or should we
>  queue files needing re-indexing for a very low resource daemon to
>  process?
>  b) If we use the latter, how do we avoid disagreement between newly
>  changed/created files and the semantic actions regarding them while the
>  daemon works?
>  c) If we use the former, how do we minimize the impact on the user of
>  this sudden spike in resource use without risking the index and data
>  getting out of sync?
>4) Portability
>  a) Should we provide a way to export semantic data when archiving to
>  formats whose standards prevent the use of Reiser4 (such as DVD)?
>  b) How do we handle exports from a partial filesystem, if we decide to
>  provide export capabilities?
>  c) Should we provide the ability to import from competing semantic
>  systems?  Export?
>5) Code revisions
>  a) With emerging formats, updates to formats, and the numerous ways
>  file standards change, how do we provide easy addition and updates to
>  the filters we use to index files?
>  b) Should we provide a simple user-editable means to change/augment
>  filters?
>  c) Can these both be resolved by placing the actual filters in
>  userspace/filesystemspace instead of into the code?
>
>I hope I haven't overstepped my relevance, and my apologies if I have,
>but I just wanted to raise some concerns while they are easy to
>address---before the code is started.
>
>Further disclaimer:  I'm at work, so I may have been a little hasty
>writing this (though technically, I'm *supposed* to be researching
>semantic storage systems for our documents, so I'm not really goofing
>off), so there may be errors from my minimal review/revision.
>
>Thanks,
>Clay


* Re: another semantic storage system (in userspace)
  2006-07-13 17:38   ` Hans Reiser
@ 2006-07-13 20:30     ` Hubert Chan
  2006-07-14  0:23       ` Jonathan Briggs
  0 siblings, 1 reply; 7+ messages in thread
From: Hubert Chan @ 2006-07-13 20:30 UTC (permalink / raw)
  To: reiserfs-list

On Thu, 13 Jul 2006 10:38:23 -0700, Hans Reiser <reiser@namesys.com> said:

> Clay Barnes wrote:

>> 1) Scope
>> a) Should the semantic content of files be purely user-defined?

> Yes.

I guess this also raises the question of how multiple users on the same
machine can define their own semantic content (e.g. if user A wants to
index some new file format, but doesn't want to have to bug the
administrator to add support for it).  Will the filesystem be talking to
some userspace daemons?

-- 
Hubert Chan - email & Jabber: hubert@uhoreg.ca - http://www.uhoreg.ca/
PGP/GnuPG key: 1024D/124B61FA   (Key available at wwwkeys.pgp.net)
Fingerprint: 96C5 012F 5F74 A5F7 1FF7  5291 AF29 C719 124B 61FA


* Re: another semantic storage system (in userspace)
  2006-07-13 20:30     ` Hubert Chan
@ 2006-07-14  0:23       ` Jonathan Briggs
  0 siblings, 0 replies; 7+ messages in thread
From: Jonathan Briggs @ 2006-07-14  0:23 UTC (permalink / raw)
  To: Hubert Chan; +Cc: Reiserfs mail-list

On Thu, 2006-07-13 at 16:30 -0400, Hubert Chan wrote:
> On Thu, 13 Jul 2006 10:38:23 -0700, Hans Reiser <reiser@namesys.com> said:
> 
> > Clay Barnes wrote:
> 
> >> 1) Scope
> >> a) Should the semantic content of files be purely user-defined?
> 
> > Yes.
> 
> I guess this also raises the question of how multiple users on the same
> machine can define their own semantic content (e.g. if user A wants to
> index some new file format, but doesn't want to have to bug the
> administrator to add support for it).  Will the filesystem be talking to
> some userspace daemons?

I was thinking that the file system should only index its own meta-data
attributes.  A user-space daemon should read the file contents and
create these attributes.

Search directories would display selected parts of the indexes.  One of
these that would be highly useful for a user-space indexing daemon is a
timestamp search directory.  The indexer would begin with the timestamp
search set to (UID == my user and timestamp > 0).  After indexing a few
files it would update the search to (my user and > timestamp of last
indexed file).  Or possibly, if Reiser4 has something like a 64-bit
monotonic update ID, it could use that instead of a timestamp.
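
The cursor idea above can be sketched roughly as follows.  This is only
an illustration: FileRecord, search(), and index_pass() are hypothetical
names, the search directory is simulated by filtering an in-memory list,
and update_id stands in for either the timestamp or the monotonic update
ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    path: str
    uid: int
    update_id: int  # stand-in for a timestamp or a monotonic update counter


def search(records, uid, after):
    """Stand-in for the search directory: this user's files newer than the cursor."""
    hits = [r for r in records if r.uid == uid and r.update_id > after]
    return sorted(hits, key=lambda r: r.update_id)


def index_pass(records, uid, cursor, index_fn, batch_size=2):
    """Index up to batch_size files and return the advanced cursor."""
    for record in search(records, uid, cursor)[:batch_size]:
        index_fn(record)
        cursor = record.update_id  # narrow the next search to newer files
    return cursor
```

The indexer would start with cursor 0 and call index_pass() repeatedly
until the cursor stops moving.  With plain timestamps, two files sharing
one timestamp could be split across batches and the second one missed,
which is one reason a unique monotonic update ID would be the safer
cursor.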

If the filesystem indexes are not going to be updated in real-time but
only at specific times, another search type that could list updated but
not yet indexed files would also be useful.
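
A rough sketch of what such an "updated but not yet indexed" search
could compute, assuming each file carries an update ID and the index
remembers the update ID it last saw for each file.  Both dictionaries
and the function name are hypothetical.

```python
def needs_indexing(last_update, last_indexed):
    """Return paths whose latest update is newer than their last indexing.

    last_update:  dict path -> update ID of the most recent change
    last_indexed: dict path -> update ID that the index currently reflects
    """
    return sorted(
        path
        for path, update_id in last_update.items()
        # A file that was never indexed counts as older than any update.
        if last_indexed.get(path, -1) < update_id
    )
```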
-- 
Jonathan Briggs <jbriggs@esoft.com>
eSoft, Inc.
