Re: Wikipedia software was Re: [kDev] RFC: Kendra Tools Project Plan 1...
Hi Neil and All,
I have massively snipped this email for ease of reading so please refer
to web archive for history. Please see comments and questions...
On Wednesday, January 29, 2003, at 02:14 pm, Neil Harris wrote:
I think Kendra is trying to solve a different problem from Wikipedia,
which has the "narrow" aim of writing a multilingual encyclopedia on
all subjects. However, it does not hurt that their content is
available under the GFDL, should we need to use it. See also the
Wiktionary, which is an early-stages project to use the same
technology for a dictionary.
I don't like the way that multilingual content is handled at all. You
shouldn't have to go to a different site (which then sets another
cookie). Also, the content is different depending on which language
site you're visiting. I am not saying that everything should be a
direct translation of an English page but I would like the switch from
one language to another to be as transparent/accurate as possible. So,
there are 2 issues here: 1, the original text in whatever language and
2, the accurate translation of that text in whatever language. The
reason being that language should be an enabler for passing on
knowledge and not a barrier.
I want to know what French speaking people think about stuff so that
means the translations must be accurate and transparent.
And another thing, I want my objects to persist. I'd love people
commenting on them or even taking them and modifying them to create
their own objects. I know I can look through the archive/history of
each page to see who wrote what but it's not the same. I don't want to
be forced to see the data in a certain way. I want to be able to see
perhaps just what Neil's written or comments that people have made of
this subject or the views of people in Russia or
the-list-is-endless(tm).
Also, I don't like the way anonymous people can make changes as I see in
the archives. Removes responsibility and accountability for what's
being said.
Can this only support pages/articles? I mean can it support data like
lists of kendraParticipants, their addresses, their servers, their
songs, their songs' business rules, which songs on
which servers, etc? And can we easily extend the data model to allow
for new objects bolted on?
Well "pages" can be anything
Granted.
and they can be linked in many different ways:
But right now the only "link" is a hyperlink, yes? There doesn't seem
to be a way to define the relationships between objects as with test1.
Like (Pigs on the Wing | is a song by | Pink Floyd)?
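A relationship of that shape can be sketched as a (subject | relation | object) triple. The following is only a minimal illustration, assuming nothing about how test1 actually stores its links; the names Triple, TripleStore, add and match are hypothetical.

```python
from collections import namedtuple

# Hypothetical names throughout: Triple, TripleStore, add and match are
# illustrative only and are not taken from test1 or the Wikipedia software.
Triple = namedtuple("Triple", ["subject", "relation", "object"])


class TripleStore:
    """A tiny in-memory store of (subject | relation | object) links."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, relation, obj):
        self._triples.add(Triple(subject, relation, obj))

    def match(self, subject=None, relation=None, obj=None):
        """Return, sorted, the triples matching every field that is not None."""
        return sorted(
            t for t in self._triples
            if (subject is None or t.subject == subject)
            and (relation is None or t.relation == relation)
            and (obj is None or t.object == obj)
        )


store = TripleStore()
store.add("Pigs on the Wing", "is a song by", "Pink Floyd")
store.add("Wish You Were Here", "is a song by", "Pink Floyd")
store.add("Pink Floyd", "is a", "band")

# Ask the question the other way round: which songs are by Pink Floyd?
songs = [t.subject for t in store.match(relation="is a song by", obj="Pink Floyd")]
print(songs)
```

Because the relation is data rather than a bare hyperlink, the same store can answer queries from either end of the link.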
the free-form Wiki structure can work alongside table-driven data
structures
Sure, but somehow I'd rather it was all the same structure, a hybrid,
perhaps. A structure that gives us the free form of Wiki *and* the
clarity of tables.
or the Wiki data be treated as raw material for data mining into a
more rigid form. Wikipedia articles tend to follow style rules: by
using conventions in free-form material, they can be parsed to build
up relational indices, which can then generate auto-generated content,
web services etc.
The problem I have with parsing free-form material is that it is
somehow removed from the original material. What happens if a change
needs to be made by the originator? It all has to be re-parsed? But
what if I've already linked to an index? Will it lose the reference?
Not a neat solution surely?
Why do we have to have free form material that contains lists or
tables? Free form is fine for free form stuff like text. But why not
create structured tables for structured data? So, in some ways we need
to have a "create table" function on the input.
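As a rough sketch of what a "create table" function on the input might look like, here is free-form page text sitting next to a user-defined structured table, using SQLite from the Python standard library. The helper create_table, the table and column names, and the host server1.example.org are all assumptions made up for illustration; nothing here is taken from test1 or the Wikipedia code.

```python
import sqlite3


def create_table(conn, name, columns):
    """Create a table from a list of column names (all stored as TEXT).

    Hypothetical helper: names are checked before being placed in the SQL
    because identifiers cannot be passed as bound parameters.
    """
    for ident in [name, *columns]:
        if not ident.isidentifier():
            raise ValueError(f"unsafe identifier: {ident!r}")
    cols = ", ".join(f"{c} TEXT" for c in columns)
    conn.execute(f"CREATE TABLE {name} ({cols})")


conn = sqlite3.connect(":memory:")

# Free-form material stays free-form: one row per page of text.
conn.execute("CREATE TABLE pages (title TEXT PRIMARY KEY, body TEXT)")
conn.execute(
    "INSERT INTO pages VALUES (?, ?)",
    ("Pink Floyd", "Free-form notes about the band."),
)

# Structured material gets a table defined at input time.
create_table(conn, "songs", ["title", "artist", "server"])
conn.execute(
    "INSERT INTO songs VALUES (?, ?, ?)",
    ("Pigs on the Wing", "Pink Floyd", "server1.example.org"),
)

# Both kinds of data can be queried together without parsing any text.
rows = conn.execute(
    "SELECT songs.title, songs.server, pages.body "
    "FROM songs JOIN pages ON pages.title = songs.artist"
).fetchall()
print(rows)
```

The point of the sketch is that the structured rows are the original material, so a change by the originator does not require anything to be re-parsed.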
Am I right that Wiki is all about doing stuff in flat page format?
Maybe that model just won't work for the complexity of what we need.
Though there are some parts of the Wiki interface that are really great
and we should borrow.
I keep looking at test1 and think that it could provide a basis for a
linking structure that could accommodate html pages, structured lists,
comments, questions, transactions, anything. It does seem so, yes?
What's wrong with it?
* existing developer and 1000+ user base
* current database exceeds 100,000 articles with full histories
Above 2 points only relevant if we merge, yes?
The point here was: it scales well (don't look today, though, they're
debugging new database support -- see below -- and it's slow at the
moment).
Cool.
We need totally distributed databases. Each
company/organisation/group/individual may want to host their own data
and we need to be able to cope with that.
Inter-wiki links should be able to handle that.
Are those connections made via standard methods like XML or web
services? We need kendraTools to talk to any third party application.
I'm not saying we code the connection tools but just make the
interface plug-in-able.
For example we would need something like "Inter-wiki links" and there
would need to be some kind of name resolution service to find out where
the data is stored (hope this is correct terminology). Using domain
names and IP addresses is obvious but we need to be able to cope with
any new stuff that comes along like p2p applications.
Also, still on name resolution, I remember in the heady days of the
kendraNetworkTrial that there was an issue with just putting in an IP
address or host name for where the stream was stored because a few CDNs
use an abstracted way to locate the stream. So, actually, this
plug-in-able stuff would be cool. Then the CDNs could build in their
own nearest-server-algorithms.
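One way to picture the plug-in-able name resolution is a registry keyed on the scheme of a reference, where each plug-in decides how to turn a name into concrete locations. This is only a sketch under assumptions: the cdn scheme, the edge host names and the functions register and resolve are invented for illustration, and a real CDN plug-in would apply its own nearest-server logic instead of returning a fixed list.

```python
from urllib.parse import urlparse

# Hypothetical registry: maps a scheme such as "http" or "cdn" to the
# plug-in function that knows how to locate data for that scheme.
_resolvers = {}


def register(scheme):
    """Decorator that installs a resolver plug-in for one scheme."""
    def decorator(func):
        _resolvers[scheme] = func
        return func
    return decorator


@register("http")
def resolve_http(ref):
    # Plain host names and IP addresses: the reference is the location.
    return [ref.geturl()]


@register("cdn")
def resolve_cdn(ref):
    # Made-up CDN plug-in: a real one would choose the nearest server.
    edges = ["edge1.example.net", "edge2.example.net"]
    return [f"http://{edge}{ref.path}" for edge in edges]


def resolve(reference):
    """Return candidate locations for a reference via the matching plug-in."""
    ref = urlparse(reference)
    try:
        resolver = _resolvers[ref.scheme]
    except KeyError:
        raise LookupError(f"no resolver registered for {ref.scheme!r}") from None
    return resolver(ref)


print(resolve("http://www.kendra.org.uk/tools1/"))
print(resolve("cdn://pinkfloyd/pigs.ogg"))
```

New kinds of location, such as a p2p network, would then only need a new registered function rather than a change to the core.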
* provision for page caching
Cool. The user needs to be able to specify for their own content their
own rules about how that caching operates on kendraTools. Master
records would be kept in one or more places on servers where the
user would trust the system administrator - unless they're not that
bothered, of course - and depending on exactly what the content is.
So, EVERYONE can then use the code in ANYTHING they produce and if
they can make money from it then so much the better. Yes?
As I understand it, only code derived from, or directly linked to,
GPL'd code is required to be GPL'd.
Sure, and that's a restriction that serves free software but may not
serve Kendra. It may be better for us to give it away with no
restrictions. So, if Kendra Foundation is spending money on coding then
the license needs to serve Kendra.
Using a GPL'd engine does not require the use of GPL on subsystems
built to run on it: for example, Linux can be used to run proprietary
software, without any GPL restrictions.
Sure.
Finally, the current Wikipedia software is the work of only a few
people, and they _might_ be willing to let it be relicensed under,
say, the BSD licence if we ask them nicely, and give them a really
good explanation of why we need it.
Sure. The licence that we use needs to have some thought put into it.
It would be good for Kendra to actually produce something tangible so
there needs to be ownership of it and hence a license for use. I'm
going to send my thoughts in a new email as this one is getting too
long.
PostgreSQL is much better in systems with a high update level: it uses
multi-versioning instead of locks wherever possible.
Sounds like PostgreSQL is the way to go but not my decision. Whoever
gets employed to do this makes the decision based on the requirements
of the project plan, I guess. Is the project plan clear enough to make
those decisions? If not, where does it need more work?
Cheers, Daniel
Test: http://www.kendra.org.uk/tools1/
Plan:
http://www.kendra.org.uk/documents/kendra_tools_project_plan_1.html