
Re: Wikipedia software was Re: [kDev] RFC: Kendra Tools Project Plan 1...



Hi Neil and All,

I have massively snipped this email for ease of reading, so please refer to the web archive for history. Please see comments and questions...

On Wednesday, January 29, 2003, at 02:14  pm, Neil Harris wrote:
I think Kendra is trying to solve a different problem from Wikipedia, which has the "narrow" aim of writing a multilingual encyclopedia on all subjects. However, it does not hurt that their content is available under the GFDL, should we need to use it. See also the Wiktionary, which is an early-stages project to use the same technology for a dictionary.

I don't like the way that multilingual content is handled at all. You shouldn't have to go to a different site (which then sets another cookie). Also, the content is different depending on which language site you're visiting. I am not saying that everything should be a direct translation of an English page but I would like the switch from one language to another to be as transparent/accurate as possible. So, there are 2 issues here: 1, the original text in whatever language and 2, the accurate translation of that text into whatever language. The reason being that language should be an enabler for passing on knowledge and not a barrier.

I want to know what French speaking people think about stuff so that means the translations must be accurate and transparent.

And another thing, I want my objects to persist. I'd love people commenting on them or even taking them and modifying them to create their own objects. I know I can look through the archive/history of each page to see who wrote what but it's not the same. I don't want to be forced to see the data in a certain way. I want to be able to see perhaps just what Neil's written or comments that people have made on this subject or the views of people in Russia or the-list-is-endless(tm).

Also, I don't like the way anonymous people can make changes, as I see in the archives. It removes responsibility and accountability for what's being said.

Can this only support pages/articles? I mean can it support data like lists of kendraParticipants, their addresses, their servers, their songs, their songs' business rules, which songs on which servers, etc? And can we easily extend the data model to allow for new objects bolted on?

Well "pages" can be anything

Granted.

and they can be linked in many different ways:

But right now the only "link" is a hyperlink, yes? There doesn't seem to be a way to define the relationships between objects as with test1. Like (Pigs on the Wing | is a song by | Pink Floyd)?
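To show what I mean by defining relationships, here is a minimal sketch of a store of (subject | relationship | object) statements. It is not how test1 is actually built; the `TripleStore` class, its method names and the second statement about Neil are made up for illustration, and only the Pink Floyd statement comes from this email.

```python
from collections import namedtuple

# One statement of the form (subject | predicate | object).
Triple = namedtuple("Triple", ["subject", "predicate", "object"])


class TripleStore:
    """Hypothetical in-memory store of typed links between objects."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add(Triple(subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return matching triples, sorted; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.object == obj)
        )


store = TripleStore()
store.add("Pigs on the Wing", "is a song by", "Pink Floyd")
# Hypothetical statement, only to show a second kind of relationship.
store.add("Neil", "commented on", "Pigs on the Wing")

# "Which songs are by Pink Floyd?"
songs = [t.subject for t in store.match(predicate="is a song by", obj="Pink Floyd")]
# "What has Neil written about?"
neil_links = store.match(subject="Neil")
```

The same structure would answer "just what Neil's written" or "everything linked to this song" by changing which fields are left as wildcards, which a plain hyperlink cannot do.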

the free-form Wiki structure can work alongside table-driven data structures

Sure, but somehow I'd rather it was all the same structure, a hybrid, perhaps. A structure that gives us the free form of Wiki *and* the clarity of tables.

or the Wiki data be treated as raw material for data mining into a more rigid form. Wikipedia articles tend to follow style rules: by using conventions in free-form material, they can be parsed to build up relational indices, which can then generate auto-generated content, web services etc.

The problem I have with parsing free-form material is that the result is somehow removed from the original material. What happens if a change needs to be made by the originator? Does it all have to be re-parsed? But what if I've already linked to an index? Will it lose the reference? Not a neat solution, surely?
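To make the re-parsing worry concrete, here is a minimal sketch of parsing by convention. The `Field: value` line convention, the page text and the `build_index` name are all assumptions for illustration and not anything the Wikipedia software defines. In this sketch the index is rebuilt from the current text and is keyed by field, value and page title, so a link to a page title survives an edit, whereas a link to a position inside the index would not.

```python
import re

# Assumed convention: a line such as "Artist: Pink Floyd" carries structured data.
FIELD_LINE = re.compile(r"^(\w[\w ]*):\s*(.+)$")


def build_index(pages):
    """Map (field, value) to the set of page titles that declare it."""
    index = {}
    for title, text in pages.items():
        for line in text.splitlines():
            found = FIELD_LINE.match(line.strip())
            if found:
                key = (found.group(1), found.group(2))
                index.setdefault(key, set()).add(title)
    return index


pages = {
    "Pigs on the Wing": "Artist: Pink Floyd\nSome free-form text about the song.",
}
index = build_index(pages)

# The originator edits the free-form part; the whole page is parsed again.
pages["Pigs on the Wing"] += "\nA later edit by the originator."
index_after = build_index(pages)
```

If the originator had changed the `Artist` line itself, the old index entry would simply disappear on the next parse, which is exactly the lost-reference case I am asking about.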

Why do we have to have free form material that contains lists or tables? Free form is fine for free form stuff like text. But why not create structured tables for structured data. So, in some ways we need to have a "create table" function on the input.

Am I right that Wiki is all about doing stuff in flat page format? Maybe that model just won't work for the complexity of what we need. Though there are some parts of the Wiki interface that are really great and that we should borrow.

I keep looking at test1 and think that it could provide a basis for a linking structure that could accommodate html pages, structured lists, comments, questions, transactions, anything. It does seem so, yes? What's wrong with it?

* existing developer and 1000+ user base
* current database exceeds 100,000 articles with full histories

Above 2 points only relevant if we merge, yes?

The point here was: it scales well (don't look today, though, they're debugging new database support -- see below -- and it's slow at the moment).

Cool.

We need totally distributed databases. Each company/organisation/group/individual may want to host their own data and we need to be able to cope with that.

Inter-wiki links should be able to handle that.

Are those connections made via standard methods like XML or web services? We need kendraTools to talk to any third party application. I'm not saying we code the connection tools ourselves, just that we make the interface plug-in-able.

For example we would need something like "Inter-wiki links" and there would need to be some kind of name resolution service to find out where the data is stored (hope this is correct terminology). Using domain names and IP addresses is the obvious approach but we need to be able to cope with any new stuff that comes along like p2p applications.

Also, still on name resolution, I remember in the heady days of the kendraNetworkTrial that there was an issue with just putting in IP address or host name for where the stream was stored because a few CDNs use an abstracted way to locate the stream. So, actually, this plug-in-able stuff would be cool. Then the CDNs could build in their own nearest-server-algorithms.
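A rough sketch of what plug-in-able name resolution could look like follows. Everything in it is hypothetical: the `ResolverRegistry` class, the `host:` and `cdn:` name prefixes, the example.org server names and the distance figures are invented for illustration, and picking the lowest number stands in for whatever nearest-server algorithm a CDN would really supply.

```python
class ResolverRegistry:
    """Hypothetical registry: each name prefix maps to a pluggable resolver."""

    def __init__(self):
        self._resolvers = {}

    def register(self, scheme, resolver):
        self._resolvers[scheme] = resolver

    def resolve(self, name):
        scheme, _, rest = name.partition(":")
        if scheme not in self._resolvers:
            raise LookupError(f"no resolver registered for {scheme!r}")
        return self._resolvers[scheme](rest)


def direct_resolver(rest):
    # A plain host name or IP address already says where the data lives.
    return rest


def make_cdn_resolver(distances):
    """Build a resolver that picks the server with the smallest distance."""
    def resolve(rest):
        nearest = min(distances, key=distances.get)
        return f"{nearest}/{rest}"
    return resolve


registry = ResolverRegistry()
registry.register("host", direct_resolver)
# Made-up servers and distances; a real CDN would plug in its own logic here.
registry.register("cdn", make_cdn_resolver({"edge1.example.org": 40,
                                            "edge2.example.org": 12}))

direct = registry.resolve("host:media.example.org/streams/demo")
via_cdn = registry.resolve("cdn:streams/demo")
```

A p2p application would be one more `register` call with its own lookup function, so the core never needs to know how each network finds its data.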

* provision for page caching

Cool. Users need to be able to specify, for their own content, their own rules about how that caching operates on kendraTools. Master records would be kept in one or more places, on servers where the user trusts the system administrator - unless they're not that bothered, of course - and depending on exactly what the content is.

So, EVERYONE can then use the code in ANYTHING they produce and if they can make money from it then so much the better. Yes?

As I understand it, only code derived from, or directly linked to, GPL'd code is required to be GPL'd.

Sure, and that's a restriction that serves free software but may not serve Kendra. It may be better for us to give it away with no restrictions. So, if Kendra Foundation is spending money on coding then the license needs to serve Kendra.

Using a GPL'd engine does not require the use of GPL on subsystems built to run on it: for example, Linux can be used to run proprietary software, without any GPL restrictions.

Sure.

Finally, the current Wikipedia software is the work of only a few people, and they _might_ be willing to let it be relicensed under, say, the BSD licence if we ask them nicely, and give them a really good explanation of why we need it.

Sure. The licence that we use needs to have some thought put into it. It would be good for Kendra to actually produce something tangible so there needs to be ownership of it and hence a license for use. I'm going to send my thoughts in a new email as this one is getting too long.

PostgreSQL is much better in systems with a high update level: it uses multi-versioning instead of locks wherever possible.

Sounds like PostgreSQL is the way to go, but it's not my decision. Whoever gets employed to do this makes the decision based on the requirements of the project plan, I guess. Is the project plan clear enough to make those decisions? If not, where does it need more work?

Cheers Daniel

Test: http://www.kendra.org.uk/tools1/
Plan: http://www.kendra.org.uk/documents/kendra_tools_project_plan_1.html