
20010314 Triples: The centre of the Semantic Web?

I believe that there are going to be at least two different levels of information on the SW: triples that are produced and consumed by machines, and other "higher-level" formats that humans use. We humans interact with software programs that transform our thoughts (or parts of them) into some basic triple representation. The software programs should also be able to create "higher-level" formats from basic triples according to our own preferences. The important thing is that there is no semantic difference in the information whether it is present at one level or the other. Triples should encode the same semantics as the "higher-level" formats and vice versa.
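Just to make the two levels concrete, here is a rough sketch in Python (all the URIs and the labels table are made up for illustration): the machine level is nothing but triples, and a "higher-level" human sentence is generated from them according to the reader's preferences.

```python
# Minimal sketch of the two-level idea: the machine level holds plain
# (subject, predicate, object) triples; the human level is produced
# from them on demand.

triple = (
    "http://example.org/people#elvis",    # subject: the resource
    "http://example.org/terms#livesIn",   # predicate: the property
    "http://example.org/places#memphis",  # object: another resource
)

# Hypothetical per-reader preferences for turning URIs back into words.
labels = {
    "http://example.org/people#elvis": "Elvis",
    "http://example.org/terms#livesIn": "lives in",
    "http://example.org/places#memphis": "Memphis",
}

def render(t):
    """Turn a triple into a 'higher-level' sentence.  The sentence and
    the triple carry the same semantics; only the presentation differs."""
    s, p, o = t
    return f"{labels.get(s, s)} {labels.get(p, p)} {labels.get(o, o)}."

print(render(triple))  # -> Elvis lives in Memphis.
```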

This basic triple representation needs to be a global standard. But this does not mean that all software programs have to use this format internally; they can use whatever format they like. The important thing is that every software program can exchange triples. Of course, since the SW is the "Web of conversions", one should be able to convert formats into the basic triple standard by using some conversion language. If I design some triple format, I should express in some standard conversion language how this format can be transformed into the basic triple format. (The reason for creating another triple format could be that the triples have to be transported along with other information formats in which the basic triple format might, for instance, violate some rule of the transporting format.) The exchange of triples between machines will be massive, so it is very important to invent an efficient conversion language. (Something to write about!)
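Here is a rough sketch of the effect of such a conversion, assuming a made-up custom triple format with one "subject | predicate | object" line per statement. In reality the mapping itself would be written in the standard conversion language so that any consumer could apply it, but the result is the same: basic triples come out.

```python
# Sketch of converting a hypothetical custom triple format (one
# "subject | predicate | object" line per statement) into the basic
# triple format, represented here as plain 3-tuples.

custom_document = """\
http://example.org/people#elvis | http://example.org/terms#livesIn | http://example.org/places#memphis
http://example.org/people#elvis | http://example.org/terms#name | Elvis Presley
"""

def to_basic_triples(doc):
    """Apply the (hypothetical) conversion rules for this custom format."""
    for line in doc.splitlines():
        if not line.strip():
            continue
        subject, predicate, obj = (part.strip() for part in line.split("|"))
        yield (subject, predicate, obj)

for triple in to_basic_triples(custom_document):
    print(triple)
```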

I am quite convinced that the format will be based on XML, but other than that I simply don't know (RDF perhaps, if it is not judged too complex by the community). XML has several properties that make it suitable for representing the basic triple format: it uses Unicode, supports namespaces, handles URIs neatly and can represent triple structures easily.
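For example, a triple maps quite naturally onto a small XML element structure. The element names below are invented for illustration and are not the actual RDF/XML syntax:

```python
# Sketch of why XML fits: URIs and Unicode come for free, and a triple
# maps onto a tiny element structure.  The element names are made up.

import xml.etree.ElementTree as ET

def triple_to_xml(subject, predicate, obj):
    stmt = ET.Element("statement")
    ET.SubElement(stmt, "subject").text = subject
    ET.SubElement(stmt, "predicate").text = predicate
    ET.SubElement(stmt, "object").text = obj
    return ET.tostring(stmt, encoding="unicode")

print(triple_to_xml(
    "http://example.org/people#elvis",
    "http://example.org/terms#livesIn",
    "http://example.org/places#memphis",
))
```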

Using triples as the core information set is something that I have not seen directly mentioned in many places. Sean B Palmer has mentioned this in his More SW Ramblings, which by the way is quite good but does not explain the general and global use of triples. I will try to explain how the triples are going to be used. I will use the terminology of the RDF M&S so as not to confuse people too much.

A triple is a statement. Such statements might stem from a database (a bit tweaked) or from some annotated information. Statements attach properties to resources or create relationships between resources. By using properties that are "understandable" to machines, it becomes possible to automate the processing of these resources. In order to be universal, and to enable machine processing, at least two of the elements in the triple have to be URIs. Since the triple consists of a predicate, an object and a subject, it is the predicate and the subject that have to be URIs: we need to know exactly what resource the statement is about and exactly what kind of property we assign to that resource. Of course, often the triple will consist of three URIs. This is the basic construction and intent of using triples.
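A small sketch of that constraint (the URI check and the class name are only for illustration): the subject and the predicate must be URIs, while the object may be a URI or a literal.

```python
# Sketch of the constraint above: subject and predicate must be URIs so
# the statement is universally identifiable; the object may be a URI or
# a literal value.

from dataclasses import dataclass

def is_uri(value):
    """Crude URI check, good enough for the illustration."""
    return isinstance(value, str) and "://" in value

@dataclass(frozen=True)
class Statement:
    subject: str    # must be a URI: the resource the statement is about
    predicate: str  # must be a URI: the property being assigned
    object: str     # a URI or a literal

    def __post_init__(self):
        if not is_uri(self.subject) or not is_uri(self.predicate):
            raise ValueError("subject and predicate must be URIs")

# A statement with a literal object...
Statement("http://example.org/people#elvis",
          "http://example.org/terms#name",
          "Elvis Presley")

# ...and one where all three elements are URIs.
Statement("http://example.org/people#elvis",
          "http://example.org/terms#livesIn",
          "http://example.org/places#memphis")
```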

Now, where will these triples "live" and where do they come from? To explain, let's look at an example. Imagine a "triple search engine" (TSE) (or some form of agent) that traverses the Web. At some location it might stumble across a set of triples (e.g. from an RDF annotation). This set of triples might be in a format that the TSE does not understand. But since there is a URL to the format specification, and since that specification describes in a machine-understandable way how to transform the format into the basic triple format, it is "no problemos!" The TSE takes these statements (triples), if they are in some way useful, but in order to represent them locally the TSE needs to transform them into reified statements in its preferred triple format. Since we can't trust anybody, the TSE needs to store the statements as "according to my source bla bla bla". Thus, internally the TSE stores the statements as reified statements. This means that the TSE collects a set of statements from different sources and stores them locally. Now, some client of the TSE can search or do some form of logical manipulation on the set of collected statements.
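A sketch of how such a TSE might store what it collects (the class and method names are hypothetical): every triple is kept reified together with the source it came from, so any answer can say "according to whom".

```python
# Sketch of a TSE's local store: each collected triple is reified by
# keeping it paired with its source URL, so queries always carry an
# "according to ..." qualifier.  All names here are hypothetical.

class TripleSearchEngine:
    def __init__(self):
        # Each entry: (source_url, (subject, predicate, object))
        self.reified = []

    def collect(self, source_url, triples):
        """Store triples found at source_url as reified statements."""
        for t in triples:
            self.reified.append((source_url, t))

    def query(self, predicate=None):
        """Yield matching statements, each tagged with its source."""
        for source, (s, p, o) in self.reified:
            if predicate is None or p == predicate:
                yield f"according to {source}: ({s}, {p}, {o})"

tse = TripleSearchEngine()
tse.collect("http://somewhere/somedoc",
            [("http://example.org/people#elvis",
              "http://example.org/terms#status",
              "lives")])
for answer in tse.query():
    print(answer)
```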

[[If another TSE asks questions of this TSE and receives statements like "according to http://somewhere/somedoc Elvis lives", it should then ask the original source directly (e.g. go straight to the source) instead of relying on the first TSE.]]
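That delegation rule could look something like this (fetch_triples is a hypothetical helper that would dereference the source URL): the asking TSE throws away the second-hand copy and fetches from the named source itself.

```python
# Sketch of the delegation rule: an answer from another TSE is only
# second-hand, so the asking TSE re-fetches from the original source
# instead of storing the copy.  fetch_triples is a hypothetical helper.

def follow_up(answer, fetch_triples):
    """answer is a (source_url, triple) pair handed back by another TSE."""
    source_url, second_hand_triple = answer
    # Ignore the second-hand triple; ask the original source directly.
    return list(fetch_triples(source_url))

# Usage with a stubbed-out fetcher standing in for a real dereference:
stub_fetch = lambda url: [("http://example.org/people#elvis",
                           "http://example.org/terms#status",
                           "lives")]
answer = ("http://somewhere/somedoc",
          ("http://example.org/people#elvis",
           "http://example.org/terms#status",
           "lives"))
print(follow_up(answer, stub_fetch))
```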

But is there going to be a new URI scheme for exchanging triples? Hmm, I don't know, but generally one should not invent URI schemes unless it is absolutely necessary. (This is also something to write about.)

It feels too simple, I must have forgotten something!
