
Re: [kDev] kendraTools: information structure...



Hi Neil and All,

Thanks for clarifying. I've massively snipped this so for Neil's email please see:
http://www.kendra.org.uk/lists/archive/k-developers/msg00077.html

This kind of reference to the list's web archive should be an automatic feature of our new list Tool: tight integration between web and email.

Please see comments...

On Tuesday, May 6, 2003, at 04:12  pm, Neil Harris wrote:
1 The data store stores _everything_ under a name which is a Unicode string (the "Universe of discourse")

And we need to be able to store binary files too. Like photos, music tracks, anything, everything... In conversation with Joe a while ago he said that on the very low level we'd have a list of object names pointing to different tables - one for each object type: text string, long text string, blob, etc.
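As a very rough Python sketch of that low-level layout (my guess at the shape, not what Joe has actually designed): one index of object names, each pointing at the table for its type. The type tags, the 255-character cut-off and the `DataStore` class are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Union

# Hypothetical type tags; the real set of low-level tables is undecided.
TEXT, LONG_TEXT, BLOB = "text", "long_text", "blob"


@dataclass
class DataStore:
    # One "table" per object type, each keyed by object name.
    tables: dict = field(
        default_factory=lambda: {TEXT: {}, LONG_TEXT: {}, BLOB: {}}
    )
    # The index maps each object name to the table that holds its value.
    index: dict = field(default_factory=dict)

    def put(self, name: str, value: Union[str, bytes]) -> str:
        """Store a value under a Unicode name, picking a table by type."""
        if isinstance(value, bytes):
            kind = BLOB
        elif len(value) > 255:  # arbitrary cut-off, an assumption
            kind = LONG_TEXT
        else:
            kind = TEXT
        # If the name was stored before, drop the old entry first.
        old_kind = self.index.get(name)
        if old_kind is not None:
            del self.tables[old_kind][name]
        self.tables[kind][name] = value
        self.index[name] = kind
        return kind

    def get(self, name: str) -> Union[str, bytes]:
        """Look the name up in the index, then fetch from the right table."""
        return self.tables[self.index[name]][name]
```

So a photo and a short string go through the same `put`/`get` calls and only the index knows which table each one lives in.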

1a [there will also be simple namespaces to deal with special / new data types]

We need to have different instances of the same object name. See Job in the demo. That's the current fault with the demo, we need to be able to call an object by the same name but mean a different object. Ultimately, objects become defined by what links they have rather than what they're called. For example there are many (too many) people in the world called "Daniel Harris". So, the name can't be the unique identifier as it is in the demo.
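To illustrate (a sketch only, with invented names like `LinkedStore` and the "works on" relation): each object gets an opaque id, the name is just a label that many objects may share, and you pick out the one you mean by its links.

```python
import uuid


class LinkedStore:
    """Objects are identified by an opaque id; names are only labels."""

    def __init__(self):
        self.names = {}     # object id -> display name
        self.by_name = {}   # display name -> set of object ids
        self.links = set()  # (subject id, relation, target id)

    def create(self, name: str) -> str:
        """Create a new object; the same name may be used many times."""
        oid = uuid.uuid4().hex
        self.names[oid] = name
        self.by_name.setdefault(name, set()).add(oid)
        return oid

    def relate(self, subject: str, relation: str, target: str) -> None:
        """Record the statement 'subject relation target'."""
        self.links.add((subject, relation, target))

    def find(self, name: str, relation: str, target: str) -> list:
        """Return ids of objects with this name that have the given link."""
        candidates = self.by_name.get(name, set())
        return sorted(
            oid for oid in candidates
            if (oid, relation, target) in self.links
        )
```

Two objects called "Daniel Harris" then coexist happily, and `find("Daniel Harris", "works on", kendra_id)` returns only the one linked to Kendra.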

2 these things can be (initially)
* structured data objects with fields made from simple types (including names of other objects)
* relational statements of the form "x relation y"
* English or other language plain text comments

Yes, the point here is that we are creating a receptacle that can hold *any* data structure. It may not do it in the most space-efficient way, or in a way that can be searched most quickly. But that's OK. The important point is that we retain the relationships over a distributed network of servers.

The search paradigm is "offline". Searches may not be that fast. Especially if querying multiple servers with complicated criteria. So, we don't want to keep webpages open for hours (?) waiting for results to come in. So, we'll have a search page of "my most recent searches" and their progress ("75% completed"). So, the search page gets updated as more info comes in.
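A minimal sketch of how the progress figure might be worked out, assuming (my assumption, nothing more) that progress simply means the share of queried servers that have answered so far. `SearchJob` and its fields are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class SearchJob:
    """One entry on a user's "my most recent searches" page."""
    query: str
    servers: list                      # servers the query was sent to
    results: list = field(default_factory=list)
    answered: set = field(default_factory=set)

    def receive(self, server: str, hits: list) -> None:
        """Record the hits from one server; repeat answers are ignored."""
        if server in self.servers and server not in self.answered:
            self.answered.add(server)
            self.results.extend(hits)

    @property
    def progress(self) -> int:
        """Percentage of queried servers that have answered so far."""
        if not self.servers:
            return 100
        return 100 * len(self.answered) // len(self.servers)
```

With four servers queried and three answers in, `progress` gives 75, and the page can show whatever is in `results` in the meantime.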

The search result may be examined in great detail. In some cases the result will be made up from searching cached (mirrored) data that may not be current. That's OK as the search criteria will dictate how current the results need to be. It may be that there'll be massive and fast data caches that hold non-current data (no names ;-) and that'll be the quickest way to get a result. So, in some cases it will be a trade off between speed and up-to-dateness.
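Roughly, the "how current" criterion could work like this. It's only a sketch: the `max_age_seconds` parameter, the `CachedResult` record and the `fetch_live` callback are my own inventions for illustration.

```python
import time
from dataclasses import dataclass


@dataclass
class CachedResult:
    value: str
    fetched_at: float  # seconds since the epoch


def answer(cache, key, max_age_seconds, fetch_live, now=None):
    """Serve from the cache if it is fresh enough, else ask the live source.

    Returns (value, source), where source is "cache" or "live".
    """
    now = time.time() if now is None else now
    entry = cache.get(key)
    if entry is not None and now - entry.fetched_at <= max_age_seconds:
        return entry.value, "cache"   # fast, but possibly not current
    value = fetch_live(key)           # slow, but up to date
    cache[key] = CachedResult(value, now)
    return value, "live"
```

A search that tolerates day-old data passes a large `max_age_seconds` and usually gets the quick cached answer; one that needs current data passes 0 and always goes to the source.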

Of course, it may be that searches end up being really fast and then can behave as if online, if you get my drift.

Search criteria are themselves objects in the data store that can be linked to, etc.

* page/report templates -- definition by example, etc.

Also referred to as "views". So, we could have plain tabled lists like we have many of on the website currently. Add to that more bulletin board type views - also a form of tables. And how about a 2D/3D globular rendering of the data store placing emphasis on assertions that have most links to them or coloured based on selected criteria - one for later, eh?

Anything on the website can be commented on because everything comes from the database and so can be explicitly referred to. You can subscribe to a topic or forum just as with the better bulletin boards. You can create a topic by selecting a piece of text or an image in "comment mode". The comment mode would render the viewed webpage with everything hotlinked and clickable. When an object is clicked on, the user would be invited to comment on it. Really cool!

2a Some operations are restricted to admins for the time being.

The "owner" or sysadmin for their own server will always be in full control of what happens on their server.

3 The system will initially support very simple syntax for declaring templates for new kinds of object, and creating and editing objects.

If people are linking their-objects to other-people's-objects then they are not necessarily going to want those other-people's-objects to change... ever. They may agree with assertion Xv1 but not with Xv2. So, it may be that we have to say that there are no direct modifications to objects once placed in the data store. All modifications are a new object/relationship. So "Y is a modification of X". Hmmm?
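Something like this, as a sketch (the class name and the sequential ids are just for illustration): the store has no update operation at all, so a "modification" is always a new object plus a relationship back to the old one.

```python
import itertools


class AppendOnlyStore:
    """Objects are never changed in place; edits create new objects."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.objects = {}    # object id -> content, written once
        self.relations = []  # (subject id, relation, target id)

    def add(self, content: str) -> int:
        """Place a new object in the store and return its id."""
        oid = next(self._ids)
        self.objects[oid] = content
        return oid

    def modify(self, old_id: int, new_content: str) -> int:
        """Create Y from X and record "Y is a modification of X"."""
        if old_id not in self.objects:
            raise KeyError(old_id)
        new_id = self.add(new_content)
        self.relations.append((new_id, "is a modification of", old_id))
        return new_id
```

Anyone who linked to Xv1 still sees exactly Xv1, and can follow the relation to Xv2 if they want to.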

4 Every user has an account.
5 All assertions are tagged by which user asserted them.

Yes, anonymity is to be discouraged by ergonomic engineering.

8 Full source + database dumps to be available on servers, so mirrors can be set up by anyone.

Some data (like addresses and username/password) will be private. A user should be able to set levels of privacy. They may want to share their address with everyone or just certain people. These rules will be pervasive for all objects that a user owns.

The user will also be able to elect where their private data is held. They may decide to hold it only on a friend's server that they trust, or they may let it be mirrored on any server. If they do that then they run the risk of server owners looking at their private data. But that may not be a problem for them. The choice is theirs.
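As a sketch of the two choices above, who may see an object and which servers may hold it, here is one possible shape for a per-owner rule. The `PrivacyRule` fields and both helper functions are hypothetical; the real privacy levels haven't been designed.

```python
from dataclasses import dataclass, field


@dataclass
class PrivacyRule:
    """One owner's rule, applied to every object that owner holds."""
    owner: str
    public: bool = False                           # share with everyone?
    viewers: set = field(default_factory=set)      # or just these people
    mirror_anywhere: bool = False                  # any server may hold it?
    trusted_servers: set = field(default_factory=set)


def can_view(rule: PrivacyRule, user: str) -> bool:
    """The owner always sees their data; others only if the rule allows."""
    return user == rule.owner or rule.public or user in rule.viewers


def can_host(rule: PrivacyRule, server: str) -> bool:
    """A server may hold the data if it is trusted or mirroring is open."""
    return rule.mirror_anywhere or server in rule.trusted_servers
```

The default is the cautious one: a fresh rule shows the data to nobody but the owner and lets no server mirror it until the owner says so.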

Having everything in a relational database *should* bring space savings for things like access logs. I say *should* because I don't know how efficient SQL databases are at keeping data small. But at some point we may want to archive some of the database and take it offline. But if all these objects have relationships to other objects then how are we to take them offline? I guess we just chuck more hard disk at it! ;-/

9 Need to be able to import data from public domain sources (NIMA database etc), peer with copyleft content (MusicBrainz etc.), and allow copyrighted data owners to participate, without giving up the rights on their data...

Good stuff.

See also: http://www.wikipedia.org/wiki/Wikipedia:Size_comparisons

After reading the list you really get the sense of "raw data" versus "useful data" - meaning stuff you can do stuff with.

A migration path for the software...

1  single server, running a single copy of the server
2  multiple cooperating servers, running on multiple boxes, run by Kendra, as proof of concept
3  as 2, but with servers run by other trusted organizations, with a central Kendra "mothership"
4  allow anyone to act as a content peer? (requires self-governing community critical mass to manage potential problems)
5  the "mothership" becomes unnecessary.

Yes, Kendra leaves home at last and goes off to fend for itself... This all necessitates that we don't completely open the network up from day one so we have to have a kind of trust model for which servers come into the network (?) and who we hand out server software to (?). Not sure how to go about that as it sort of goes against our very open attitude to date. Ah! Remember Joe saying that people would have to prove themselves before getting the software. But rather than the criteria being nasty it could simply be a set of requirements like "you need a server", etc.

1  different licences may be needed per project

Yup. The owner will decide. Licences need to be codify-able so that they are quicker and easier to understand, and hence so that users can decide whether they wish to interact with data that carries licence restrictions.
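By "codify-able" I mean something along these lines. This is purely a sketch: the three permission flags are invented examples, not a proposal for the actual licence vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Licence:
    """A licence reduced to machine-readable permissions (examples only)."""
    name: str
    may_copy: bool
    may_modify: bool
    may_sell: bool


def acceptable(licence: Licence, needs: set) -> bool:
    """True if the licence grants every permission the user needs.

    `needs` holds permission field names such as {"may_copy"}.
    """
    return all(getattr(licence, need) for need in needs)
```

A user (or their software) can then decide up front whether to interact with some data, without having to read the licence text first.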

2  what dump format to support?

If this means dumps between kendraServers, then they are not really wholesale dumps; they are more like queries/searches where the results get cached and marked as cached. Remember, object owners can specify where they want their data to reside and whether it gets mirrored at all.

If, however, these are dumps from a kendraServer to the outside world, then again they will be queries, and the requester will specify the format they want: XML, LDAP, etc.
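In other words, a "dump" is just a query result rendered in whatever format was asked for. A sketch with only XML filled in (the element names, the `cached` attribute and the `FORMATS` table are made up, and it assumes the field names are valid XML element names):

```python
import xml.etree.ElementTree as ET


def to_xml(rows: list, cached: bool) -> str:
    """Render query results as XML, marking whether they came from a cache."""
    root = ET.Element("results", cached="true" if cached else "false")
    for row in rows:
        item = ET.SubElement(root, "object")
        for key, value in row.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Formats a server is able to produce; others would be registered here.
FORMATS = {"xml": to_xml}


def export(rows: list, fmt: str, cached: bool = False) -> str:
    """Return the results of a query in the format the requester asked for."""
    try:
        formatter = FORMATS[fmt.lower()]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt}") from None
    return formatter(rows, cached)
```

Supporting another format then means adding one more function to the table rather than inventing a separate dump mechanism.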

Look forward to questions/comments.

Cheers Daniel