Advocating FDL University-courses
Wim De Smet
fragmeat at yucom.be
Tue Jul 16 18:17:13 UTC 2002
Tomasz Wegrzanowski wrote:
>On Tue, Jul 16, 2002 at 02:35:06PM +0100, Jaime E . Villate wrote:
>>On Tue, Jul 16, 2002 at 02:51:58PM +0200, Tomasz Wegrzanowski wrote:
>>>On Tue, Jul 16, 2002 at 02:35:06PM +0200, Joerg Schilling wrote:
>>>>There is no "Microsoft Word format", there is a bunch of different formats.
>>>>Some of them are more or less decoded by free software.
>>>The same can be said about HTML.
>>Wrong. All existing versions of HTML are perfectly decoded by free software
>>because the DTDs are publicly available at http://www.w3.org/ and an sgml
>>parser can easily decide whether a document is valid or not.
>The majority of HTML documents available online aren't valid according to
>W3C standards. They use tons of browser-specific extensions and often
>have bugs which nobody cares about as long as they work in major browsers.
Most modern webmasters make a good attempt to make their documents
W3C-compliant, and even if they don't, the extensions they use can
usually be ignored in order to find the content. HTML is text, after all,
and using the HTML standards one can easily remove all superfluous tags
and extract its content. This way, HTML is both portable and transparent.
An example: any HTML you find out there will almost certainly use the
tags <body></body> and the tag <p> or <table>. The only case in
which extracting the information becomes impossible is when the
document uses proprietary formats that can only be viewed with a plugin
(and are not HTML).
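As a rough sketch of this kind of extraction, assuming Python 3 and its
standard html.parser module: the parser below ignores every tag, standard
or browser-specific, and keeps only the text. The names TextExtractor and
extract_text are made up for illustration, and skipping <script> and
<style> contents is an assumption about what counts as "content".

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, ignoring every tag, known or unknown."""

    # Assumption: contents of these elements are code or styling, not text.
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []      # text fragments in document order
        self.skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the text of an HTML string with all tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Calling extract_text on a document that uses nonstandard tags such as
<blink> or <marquee>, or that leaves <p> unclosed, still returns the
text, because the parser does not need to know a tag to skip over it.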
DHTML of course can be pure evil, but the information is always present
in the .html file.
As for .doc, the specification is seldom adhered to and changes with every
version of the MS Word program. It is therefore not transparent and
certainly not portable. That's why it's not equivalent to HTML...