Re: [kDev] kendraTools: information structure...
Hi Neil and All,
Thanks for clarifying. I've snipped a lot of this, so for Neil's full email
please see:
http://www.kendra.org.uk/lists/archive/k-developers/msg00077.html
This kind of referencing of the list's web archives should be an automatic
feature of our new list Tool: tight integration between web and email.
Please see my comments below...
On Tuesday, May 6, 2003, at 04:12 pm, Neil Harris wrote:
1 The data store stores _everything_ under a name which is a Unicode
string (the "Universe of discourse")
And we need to be able to store binary files too: photos, music
tracks, anything, everything... In a conversation a while ago, Joe
said that at the very lowest level we'd have a list of object names
pointing to different tables, one for each object type: text string,
long text string, blob, etc.
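To make Joe's low-level layout concrete, here is a minimal Python sketch, assuming an in-memory store. The type names `text`, `long_text` and `blob` follow the list above; the class `LowLevelStore` and its `put`/`get` methods are hypothetical, not an existing Kendra API. Note that it still keys each object by a unique name, which is exactly the limitation raised below.

```python
from dataclasses import dataclass, field

# Hypothetical per-type tables, following the types listed above.
OBJECT_TYPES = ("text", "long_text", "blob")


@dataclass
class LowLevelStore:
    """Object names point into one table per object type (a sketch)."""
    tables: dict = field(default_factory=lambda: {t: [] for t in OBJECT_TYPES})
    # name -> (object type, row index within that type's table)
    index: dict = field(default_factory=dict)

    def put(self, name: str, obj_type: str, value) -> None:
        if obj_type not in self.tables:
            raise ValueError(f"unknown object type: {obj_type}")
        table = self.tables[obj_type]
        table.append(value)
        self.index[name] = (obj_type, len(table) - 1)

    def get(self, name: str):
        obj_type, row = self.index[name]
        return self.tables[obj_type][row]


store = LowLevelStore()
store.put("greeting", "text", "Hello")
store.put("holiday-photo", "blob", b"\x89PNG...")  # binary data lives in the blob table
print(store.get("holiday-photo")[:4])  # b'\x89PNG'
```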
1a [there will also be simple namespaces to deal with special / new
data types]
We need to be able to have different instances with the same object
name. See "Job" in the demo. That's the current fault with the demo: we
need to be able to give two objects the same name but mean different
objects.
Ultimately, objects become defined by what links they have rather than
what they're called. For example, there are many (too many) people in
the world called "Daniel Harris". So the name can't be the unique
identifier, as it is in the demo.
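One way to drop the unique-name assumption, sketched in Python under stated assumptions: every object gets a generated numeric id, the name index maps a name to a *list* of ids, and same-named objects are told apart by their links. The helpers `new_object`, `link` and `find`, the "lives in" relation and the sample objects are all hypothetical.

```python
import itertools
from collections import defaultdict

_ids = itertools.count(1)      # hypothetical id generator
names = defaultdict(list)      # name -> list of object ids (names are NOT unique)
links = defaultdict(set)       # object id -> set of (relation, other object id)


def new_object(name: str) -> int:
    oid = next(_ids)
    names[name].append(oid)
    return oid


def link(a: int, relation: str, b: int) -> None:
    links[a].add((relation, b))


def find(name: str, relation: str, target: int) -> list:
    """Pick out same-named objects by the links they have, not by name alone."""
    return [oid for oid in names[name] if (relation, target) in links[oid]]


london = new_object("London")
dh1 = new_object("Daniel Harris")
dh2 = new_object("Daniel Harris")
link(dh1, "lives in", london)

print(len(names["Daniel Harris"]))                     # 2
print(find("Daniel Harris", "lives in", london) == [dh1])  # True
```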
2 these things can be (initially)
* structured data objects with fields made from simple types
(including names of other objects)
* relational statements of the form "x relation y"
* English or other language plain text comments
Yes, the point here is that we are creating a receptacle that can hold
*any* data structure. It may not do so in the most space-efficient way,
or in a way that can be searched most quickly, but that's OK. The
important point is that we retain the relationships across a
distributed network of servers.
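The "x relation y" form maps naturally onto subject/relation/object triples. The sketch below, assuming a plain in-memory list, shows how one pattern query can pull out everything linked to an object regardless of what kind of object it is. `Statement`, `match` and sample ids like `obj:1` are hypothetical.

```python
from typing import NamedTuple, Optional


class Statement(NamedTuple):
    subject: str
    relation: str
    obj: str


statements = [
    Statement("obj:1", "is a", "Person"),
    Statement("obj:1", "has name", "Daniel Harris"),
    Statement("obj:2", "is a", "Photo"),
    Statement("obj:2", "depicts", "obj:1"),
]


def match(subject: Optional[str] = None, relation: Optional[str] = None,
          obj: Optional[str] = None) -> list:
    """Return statements matching a pattern; None acts as a wildcard."""
    return [s for s in statements
            if (subject is None or s.subject == subject)
            and (relation is None or s.relation == relation)
            and (obj is None or s.obj == obj)]


# Everything that points at obj:1, whatever the relation:
print(match(obj="obj:1"))
```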
The search paradigm is "offline". Searches may not be that fast,
especially when querying multiple servers with complicated criteria. So
we don't want to keep webpages open for hours (?) waiting for results
to come in. Instead, we'll have a search page of "my most recent
searches" showing their progress ("75% completed"), and the page gets
updated as more info comes in.
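A rough model of an "offline" search, assuming each search is fanned out to a fixed list of servers and progress is simply the fraction of servers that have answered. `OfflineSearch`, its fields and the `*.example` server names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class OfflineSearch:
    """A long-running search fanned out to several servers (hypothetical)."""
    criteria: str
    servers: list
    results: list = field(default_factory=list)
    answered: set = field(default_factory=set)

    def receive(self, server: str, hits: list) -> None:
        # Called whenever a server reports back; results accumulate over time.
        # Duplicate or unexpected replies are ignored.
        if server in self.servers and server not in self.answered:
            self.answered.add(server)
            self.results.extend(hits)

    @property
    def progress(self) -> str:
        if not self.servers:
            return "100% completed"
        pct = 100 * len(self.answered) // len(self.servers)
        return f"{pct}% completed"


search = OfflineSearch("photos near London",
                       ["a.example", "b.example", "c.example", "d.example"])
for server in ["a.example", "b.example", "c.example"]:
    search.receive(server, [f"hit from {server}"])
print(search.progress)  # 75% completed
```

The "my most recent searches" page would then just list these records and re-read `progress` each time it is refreshed.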
The search results may be examined in great detail. In some cases the
results will be built from searching cached (mirrored) data that may
not be current. That's OK, as the search criteria will dictate how
current the results need to be. There may be massive, fast data caches
that hold non-current data (no names ;-) and that'll be the quickest
way to get a result. So in some cases it will be a trade-off between
speed and up-to-dateness.
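The speed versus up-to-dateness trade-off can be expressed as a maximum acceptable age carried with the search criteria. This is a sketch under that assumption only; the `cache` dictionary, `answer` and `slow_live_query` are hypothetical stand-ins for a mirror and the authoritative servers.

```python
import time
from typing import Callable, Optional

# Hypothetical mirror: criteria -> (results, time the copy was fetched).
cache = {
    "photos near London": (["cached hit 1", "cached hit 2"], time.time() - 3600),
}


def answer(criteria: str, max_age_seconds: Optional[float],
           query_live: Callable[[str], list]) -> list:
    """Use cached (mirrored) results if they are fresh enough for the criteria.

    max_age_seconds=None means "any age will do", i.e. speed over freshness.
    """
    entry = cache.get(criteria)
    if entry is not None:
        results, fetched_at = entry
        if max_age_seconds is None or time.time() - fetched_at <= max_age_seconds:
            return results              # fast path: possibly stale data
    results = query_live(criteria)      # slow path: ask the authoritative servers
    cache[criteria] = (results, time.time())
    return results


def slow_live_query(criteria: str) -> list:
    return ["live hit"]


print(answer("photos near London", None, slow_live_query))  # cached copy is fine
print(answer("photos near London", 60, slow_live_query))    # an hour old: goes live
```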
Of course, searches may end up being really fast, in which case they
can behave as if online, if you get my drift.
Search criteria are themselves objects in the data store that can be
linked to, etc.
* page/report templates -- definition by example, etc.
Also referred to as "views". So we could have plain tabular lists, like
the many we currently have on the website. Add to that more
bulletin-board-style views, which are also a form of table. And how
about a 2D/3D globular rendering of the data store, emphasising the
assertions with the most links to them, or colouring them by selected
criteria? One for later, eh?
Anything on the website can be commented on, because everything comes
from the database and so can be explicitly referred to. You can
subscribe to a topic or forum just as with the better bulletin boards.
You can create a topic by selecting a piece of text or an image in
"comment mode". Comment mode would render the viewed webpage with
everything hotlinked and clickable. When the user clicks on something,
they would be invited to comment on that object. Really cool!
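A minimal sketch of comment mode, assuming every rendered element carries the id of the data-store object it came from. The `/comment?object=` URL, `render` and the sample objects are hypothetical; the point is only that each element becomes a link addressing one specific object.

```python
from html import escape

# Hypothetical page content: (data-store object id, text to display).
page_objects = [(101, "Search results"), (102, "Photo caption <draft>")]


def render(objects, comment_mode: bool = False) -> str:
    parts = []
    for oid, text in objects:
        body = escape(text)
        if comment_mode:
            # Every object becomes clickable; the (hypothetical) URL opens a
            # comment form attached to that specific object id.
            body = f'<a href="/comment?object={oid}">{body}</a>'
        parts.append(f"<p>{body}</p>")
    return "\n".join(parts)


print(render(page_objects, comment_mode=True))
```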
2a Some operations are restricted to admins for the time being.
The "owner" or sysadmin of a server will always be in full control of
what happens on that server.
3 The system will initially support very simple syntax for declaring
templates for new kinds of object, and creating and editing objects.
If people are linking their own objects to other people's objects, they
are not necessarily going to want those other people's objects to
change... ever. They may agree with assertion Xv1 but not with Xv2. So
we may have to rule that objects are never modified directly once
placed in the data store. Every modification becomes a new
object/relationship, i.e. "Y is a modification of X". Hmm?
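If objects are never changed in place, a modification is just a new object pointing back at the one it modifies. A sketch in Python, assuming integer ids and a `modifies` field standing in for the "is a modification of" relationship; `Obj`, `create` and `modify` are hypothetical.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Obj:
    oid: int
    content: str
    modifies: Optional[int] = None  # "Y is a modification of X"


_next_id = itertools.count(1)
objects: dict = {}


def create(content: str) -> Obj:
    obj = Obj(next(_next_id), content)
    objects[obj.oid] = obj
    return obj


def modify(old: Obj, new_content: str) -> Obj:
    """Never change `old`; record a new object that points back at it."""
    obj = Obj(next(_next_id), new_content, modifies=old.oid)
    objects[obj.oid] = obj
    return obj


x_v1 = create("Assertion X, version 1")
x_v2 = modify(x_v1, "Assertion X, version 2")
# Anyone who linked to x_v1 still sees exactly what they agreed with:
print(objects[x_v1.oid].content)   # Assertion X, version 1
print(x_v2.modifies == x_v1.oid)   # True
```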
4 Every user has an account.
5 All assertions are tagged by which user asserted them.
Yes, anonymity is to be discouraged by ergonomic engineering.
8 Full source + database dumps to be available on servers, so mirrors
can be set up by anyone.
Some data (like addresses and username/password) will be private. A
user should be able to set levels of privacy. They may want to share
their address with everyone or just with certain people. These rules
will apply across all objects that a user owns.
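A sketch of per-user privacy levels, assuming three levels (everyone, certain people, owner only) and that an object without its own setting inherits its owner's default, which is one way to make the rules apply to everything a user owns. `Privacy`, `User` and `OwnedObject` are hypothetical names.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Privacy(Enum):
    PUBLIC = "everyone"
    FRIENDS = "certain people"
    PRIVATE = "owner only"


@dataclass
class User:
    name: str
    default_privacy: Privacy = Privacy.PRIVATE
    friends: set = field(default_factory=set)


@dataclass
class OwnedObject:
    owner: User
    value: str
    privacy: Optional[Privacy] = None  # None: fall back to the owner's default

    def visible_to(self, viewer: str) -> bool:
        level = self.privacy if self.privacy is not None else self.owner.default_privacy
        if viewer == self.owner.name or level is Privacy.PUBLIC:
            return True
        return level is Privacy.FRIENDS and viewer in self.owner.friends


alice = User("alice", default_privacy=Privacy.FRIENDS, friends={"bob"})
address = OwnedObject(alice, "1 Example Street")
print(address.visible_to("bob"), address.visible_to("carol"))  # True False
```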
The user will also be able to choose where their private data is held.
They may decide to hold it only on a trusted friend's server, or they
may let it be mirrored on any server. If they do the latter, they run
the risk of server owners looking at their private data. But that may
not be a problem for them. The choice is theirs.
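Data placement could be a per-owner allow-list that a server consults before mirroring anything. This sketch assumes exactly that; `PlacementPolicy`, `replicate` and the server names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlacementPolicy:
    """Where an owner allows their private data to be held (hypothetical)."""
    allowed_servers: Optional[frozenset] = None  # None: any server may mirror it

    def may_hold(self, server: str) -> bool:
        return self.allowed_servers is None or server in self.allowed_servers


def replicate(policy: PlacementPolicy, candidates: list) -> list:
    """Return only the candidate servers this owner's data may be copied to."""
    return [s for s in candidates if policy.may_hold(s)]


servers = ["friend.example", "mirror1.example", "mirror2.example"]
careful = PlacementPolicy(frozenset({"friend.example"}))
relaxed = PlacementPolicy()
print(replicate(careful, servers))  # ['friend.example']
print(replicate(relaxed, servers))  # all three servers
```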
Having everything in a relational database *should* bring space savings
for things like access logs. I say *should* because I don't know how
efficient SQL databases are at keeping data small. At some point we may
want to archive part of the database and take it offline. But if all
these objects have relationships to other objects, how are we to take
them offline? I guess we just chuck more hard disk at it! ;-/
9 Need to be able to import data from public domain sources (NIMA
database etc), peer with copyleft content (MusicBrainz etc.), and
allow copyrighted data owners to participate, without giving up the
rights on their data...
Good stuff.
See also: http://www.wikipedia.org/wiki/Wikipedia:Size_comparisons
After reading that list you really get a sense of "raw data" versus
"useful data", meaning data you can actually do something with.
A migration path for the software...
1 single server, running a single copy of the server
2 multiple cooperating servers, running on multiple boxes, run by
Kendra, as proof of concept
3 as 2, but with servers run by other trusted organizations, with a
central Kendra "mothership"
4 allow anyone to act as a content peer? (requires self-governing
community critical mass to manage potential problems)
5 the "mothership" becomes unnecessary.
Yes, Kendra leaves home at last and goes off to fend for itself... This
all means we can't open the network up completely from day one, so we
need some kind of trust model governing which servers join the network
(?) and who we hand out server software to (?). I'm not sure how to go
about that, as it rather goes against our very open attitude to date.
Ah! I remember Joe saying that people would have to prove themselves
before getting the software. But rather than harsh criteria, it could
simply be a set of requirements like "you need a server", etc.
1 different licences may be needed per project
Yup. The owner will decide. Licences need to be codifiable, to make them
quicker and easier to understand, and hence to make it easier for users
to decide whether they wish to interact with licence-restricted data.
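A codified licence could be as simple as a handful of boolean permissions that software can check before data is used. The fields below are illustrative assumptions, not a summary of any real licence; `Licence` and `permits` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Licence:
    """A machine-readable licence summary (hypothetical fields)."""
    name: str
    may_copy: bool
    may_modify: bool
    commercial_use: bool
    must_attribute: bool


def permits(licence: Licence, *, copy: bool = False, modify: bool = False,
            commercial: bool = False) -> bool:
    """True if every requested use is allowed by the licence."""
    return ((not copy or licence.may_copy)
            and (not modify or licence.may_modify)
            and (not commercial or licence.commercial_use))


copyleft = Licence("example copyleft", True, True, True, True)
all_rights_reserved = Licence("example copyrighted", False, False, False, True)

print(permits(copyleft, copy=True, modify=True))   # True
print(permits(all_rights_reserved, copy=True))     # False
```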
2 what dump format to support?
If these are dumps between kendraServers, then they are not really
wholesale dumps but more like queries/searches whose results get cached
and marked as cached. Remember that object owners can specify where
they want their data to reside and whether it gets mirrored at all.
If, however, these are dumps from a kendraServer to the outside world,
then again they will be queries, and the requester will ask for a
format such as XML, LDAP, etc.
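For the outside-world case, here is a sketch of serialising query results as XML with a flag saying whether they came from a cache. It uses Python's standard `xml.etree.ElementTree`; the element and attribute names are assumptions, and LDAP output is not covered.

```python
import xml.etree.ElementTree as ET


def results_to_xml(results, cached: bool, source: str) -> str:
    """Serialise (object id, text) results, marking whether they came from a cache."""
    root = ET.Element("results", source=source, cached="true" if cached else "false")
    for oid, text in results:
        item = ET.SubElement(root, "object", id=str(oid))
        item.text = text
    return ET.tostring(root, encoding="unicode")


print(results_to_xml([(7, "Daniel Harris")], cached=True, source="mirror.example"))
```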
Look forward to questions/comments.
Cheers Daniel