Translated by
2010/05/18 18:22:29

XML

XML (eXtensible Markup Language, pronounced [ex-em-el]) is a markup language recommended by the World Wide Web Consortium (W3C); in effect it is a set of general syntactic rules. XML is a text format intended for storing structured data (as an alternative to existing database files), for exchanging information between programs, and as a basis for more specialized markup languages (for example, XHTML), sometimes called vocabularies (dictionaries). XML is a simplified subset of the SGML language.


XML was created to ensure compatibility when structured data is transferred between different information processing systems, especially when such data is transferred over the Internet. Vocabularies based on XML (for example RDF, RSS, MathML, XHTML, SVG) are themselves formally described, which makes it possible to modify and check documents based on these vocabularies programmatically without knowing their semantics, that is, without knowing the meaning of the elements. Another important feature of XML is the use of so-called namespaces.
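As a rough illustration of namespaces, the sketch below uses Python's standard xml.etree.ElementTree module to read a small document that mixes two vocabularies. The document itself, the http://example.org/recipes URI, and all prefixes are invented for this example; only the XHTML namespace URI is a real one.

```python
import xml.etree.ElementTree as ET

# Hypothetical document mixing two vocabularies. The recipe namespace URI
# is made up for illustration; the XHTML namespace URI is the real one.
DOC = """<r:recipe xmlns:r="http://example.org/recipes"
                  xmlns:h="http://www.w3.org/1999/xhtml">
  <r:title>Simple bread</r:title>
  <h:p>A note written with an XHTML element.</h:p>
</r:recipe>"""

root = ET.fromstring(DOC)

# ElementTree expands each prefixed name to "{namespace-uri}local-name",
# so elements from different vocabularies cannot be confused.
print(root.tag)

# A prefix map lets us search by qualified name. The prefixes used here
# need not match the ones in the document; only the URIs matter.
ns = {"rec": "http://example.org/recipes", "x": "http://www.w3.org/1999/xhtml"}
title = root.find("rec:title", ns).text
note = root.find("x:p", ns).text
print(title)
print(note)
```

The prefixes in the lookup map differ from those in the document on purpose: a namespace is identified by its URI, and the prefix is only a local abbreviation.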

Well-formed and valid XML documents

The standard defines two levels of correctness for an XML document:

  • Well-formed. A well-formed document conforms to all the general syntax rules of XML that apply to any XML document. If, for example, a start tag has no matching end tag, the document is not well-formed. A document that is not well-formed cannot be considered an XML document; an XML processor (parser) must not process it in the normal way and is obliged to treat the situation as a fatal error.

  • Valid. A valid document additionally conforms to certain semantic rules. This is a stricter, additional check of the document against rules that are defined in advance but externally, with the aim of minimizing the number of errors, for example in the structure and composition of a specific document or family of documents. These rules can be developed either by the user or by third parties, for example the developers of vocabularies or data exchange standards. Such rules are usually stored in special files called schemas, which describe in detail the structure of the document, all permitted names of elements and attributes, and much else. If a document contains, for example, an element name that is not defined in the schemas, the XML document is considered invalid; a validating XML processor (validator), when checking conformance to the rules and schemas, is obliged (at the user's option) to report an error.
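The difference between the two levels can be sketched in Python with the standard xml.etree.ElementTree module. Well-formedness is checked by the parser itself. The "validity" check here is only a toy stand-in: the ALLOWED_NAMES set and the unknown_elements helper are invented for this example, and real validation would use a DTD or XML Schema with a validating parser, which the Python standard library does not provide.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if `text` parses as XML, False on a syntax error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Toy stand-in for a schema: the set of element names we allow.
# This is an assumption for illustration, not real schema validation.
ALLOWED_NAMES = {"recipe", "title", "ingredient"}

def unknown_elements(text):
    """List element names in a well-formed document that the toy 'schema' lacks."""
    root = ET.fromstring(text)
    return sorted({el.tag for el in root.iter()} - ALLOWED_NAMES)

print(is_well_formed("<recipe><title>Bread</title></recipe>"))    # True
print(is_well_formed("<recipe><title>Bread</recipe>"))            # False: <title> is never closed
print(unknown_elements("<recipe><sauce>Pesto</sauce></recipe>"))  # ['sauce']
```

The last document is well-formed but would count as invalid under the toy rule set, which mirrors the distinction drawn above.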

These two concepts have no settled, standardized translation into Russian, especially the concept "valid", which can also be rendered as valid, lawful, reliable, suitable, or even checked for compliance with rules, standards, or laws. Some programmers simply use the established calque of the English word "valid".

Syntax of XML

This section covers only the correct construction of XML documents, that is, their syntax.

XML is a hierarchical structure intended for storing arbitrary data; visually, the structure can be represented as a tree. The most important mandatory syntax requirement is that a document has exactly one root element (also called the document element). This means that the text or other data of the whole document must be placed between a single root start tag and its matching end tag.

The following very simple example is a well-formed XML document:

<source lang="xml"> <book>This is a book: "Little Book"</book> </source>

The first line of an XML document is called the XML declaration. It is an optional line that specifies the version of the XML standard (normally 1.0); the character encoding and external dependencies can also be specified here.

<source lang="xml"> <?xml version="1.0" encoding="UTF-8"?> </source>

The specification requires XML processors to support the Unicode encodings UTF-8 and UTF-16 (UTF-32 is optional). Other encodings based on the ISO/IEC 8859 standard are also permitted, supported, and widely used (but not mandatory), as are other encodings such as the Russian Windows-1251 and KOI-8.
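To show what the encoding named in the declaration does, the sketch below serializes one document as UTF-8 and as ISO-8859-1 and parses both with Python's standard xml.etree.ElementTree module. The sample text and variable names are chosen for this example only.

```python
import xml.etree.ElementTree as ET

body = "<book>Crème brûlée</book>"

# The same document as bytes in two encodings. The declaration tells
# the parser how to decode the bytes that follow it.
utf8_doc = ('<?xml version="1.0" encoding="UTF-8"?>' + body).encode("utf-8")
latin1_doc = ('<?xml version="1.0" encoding="ISO-8859-1"?>' + body).encode("iso-8859-1")

utf8_text = ET.fromstring(utf8_doc).text
latin1_text = ET.fromstring(latin1_doc).text

print(utf8_doc == latin1_doc)    # the byte sequences differ
print(utf8_text == latin1_text)  # the decoded character data is the same
```

The bytes on disk differ, but an application reading either file sees the same characters, which is why the declared encoding has to match the encoding actually used to write the file.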

A comment can be placed anywhere in the tree. XML comments are placed between the delimiters <!-- and -->. Two consecutive hyphens (--) cannot appear anywhere inside a comment. <source lang="xml"> <!-- This is a comment --> </source>

Below is an example of a simple cooking recipe marked up in XML:

<source lang="xml">
<?xml version="1.0" encoding="UTF-8"?>
<recipe name="bread" preptime="5" cooktime="180">
  <title>Simple bread</title>
  <ingredient amount="3" unit="cup">Flour</ingredient>
  <ingredient amount="0.25" unit="gram">Yeast</ingredient>
  <ingredient amount="1.5" unit="cup">Warm water</ingredient>
  <ingredient amount="1" unit="teaspoon">Salt</ingredient>
  <Instructions>
    <step>Mix all the ingredients and knead thoroughly.</step>
    <step>Cover with a cloth and leave for one hour in a warm room.</step>
    <step>Knead again, place on a baking sheet, and put in the oven.</step>
  </Instructions>
</recipe>
</source>

Structure

The rest of this XML document consists of nested elements, some of which have attributes and content. An element usually consists of a start tag and an end tag that enclose text and other elements. The start tag consists of the element name in angle brackets, for example "<step>"; the end tag consists of the same name in angle brackets, but with a slash before the name, for example "</step>". The content of an element is everything located between the start and end tags, including text and other (nested) elements. Below is an example of an XML element that contains a start tag, an end tag, and element content:

<source lang="xml"> <step>Knead again, place on a baking sheet, and put in the oven.</step> </source>

Besides content, an element can have attributes: name-value pairs added to the start tag after the element name. Attribute values are always enclosed in quotes (single or double), and the same attribute name cannot occur twice in one element. Using different types of quotes for the attribute values of a single tag is not recommended.

<source lang="xml"> <ingredient amount="3" unit="cup">Flour</ingredient> </source>

In this example, the ingredient element has two attributes: "amount", with the value "3", and "unit", with the value "cup". From the point of view of XML markup, these attributes carry no meaning and are simply sets of characters.
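A program sees attributes the same way: as names mapped to uninterpreted strings. The sketch below reads an abridged copy of the recipe with Python's standard xml.etree.ElementTree module; converting "amount" to a number is a decision made by this example, not something XML prescribes.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the recipe, shortened for this example.
RECIPE = """<recipe name="bread" preptime="5" cooktime="180">
  <title>Simple bread</title>
  <ingredient amount="3" unit="cup">Flour</ingredient>
  <ingredient amount="1" unit="teaspoon">Salt</ingredient>
</recipe>"""

root = ET.fromstring(RECIPE)

# Element.attrib is a plain dict of attribute names to string values.
first = root.find("ingredient")
print(first.tag, first.attrib, first.text)

# The parser delivers "3" and "cup" as strings; treating amount as a
# number is the application's job.
for ing in root.iter("ingredient"):
    amount = float(ing.get("amount"))
    print(f"{amount:g} {ing.get('unit')} of {ing.text.lower()}")
```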

Besides text, an element may contain other elements:

<source lang="xml"> <Instructions> <step>Mix all the ingredients and knead thoroughly.</step> <step>Cover with a cloth and leave for one hour in a warm room.</step> <step>Knead again, place on a baking sheet, and put in the oven.</step> </Instructions> </source>

In this case, the Instructions element contains three step elements. XML does not allow overlapping elements. For example, the fragment below is incorrect because the "em" and "strong" elements overlap.

<source lang="xml"> <!-- WARNING! Incorrect XML! --> <p>Normal <em>emphasized <strong>strong and emphasized</em> strong</strong></p> </source>

Every XML document must contain exactly one root element (also called the document element); therefore, the following fragment cannot be considered a well-formed XML document.

<source lang="xml"> <!-- WARNING! Incorrect XML! --> <thing>Thing No. 1</thing> <thing>Thing No. 2</thing> </source>
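Both kinds of error can be observed with a parser. The sketch below feeds a properly nested fragment, an overlapping one, and a fragment with two roots to Python's standard xml.etree.ElementTree module; the check helper and the fragment labels are made up for this example, and the exact wording of the error messages depends on the parser.

```python
import xml.etree.ElementTree as ET

FRAGMENTS = {
    "properly nested": "<p>Normal <em>emphasized <strong>both</strong></em></p>",
    "overlapping": "<p>Normal <em>emphasized <strong>both</em> strong</strong></p>",
    "two roots": "<thing>Thing No. 1</thing><thing>Thing No. 2</thing>",
}

def check(text):
    """Return None if the text parses, otherwise the parser's error message."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as err:
        return str(err)

results = {label: check(text) for label, text in FRAGMENTS.items()}
for label, error in results.items():
    status = "well-formed" if error is None else "rejected: " + error
    print(f"{label}: {status}")
```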

An element without content, called an empty element, can be written in a special form consisting of a single tag in which a slash follows the element name. If an element is not declared empty in the DTD but has no content in the document, this form may still be used for it. For example:

<source lang="xml"> <foo></foo> <foo /> <foo/> </source>
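A parser treats these three notations alike. The sketch below, which assumes Python's standard xml.etree.ElementTree module, parses each form and shows that all of them produce an element with no text and no children.

```python
import xml.etree.ElementTree as ET

forms = ["<foo></foo>", "<foo />", "<foo/>"]
parsed = [ET.fromstring(form) for form in forms]

# Each form yields an element with no text and no child elements.
for form, el in zip(forms, parsed):
    print(form, "->", el.tag, el.text, len(el))

# When writing, ElementTree uses the short form for empty elements by default.
print(ET.tostring(parsed[0], encoding="unicode"))
```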

XML defines two ways of writing special characters: the entity reference and the numeric character reference. An entity in XML is a named piece of data, usually text, in particular a special character. An entity reference is placed where the entity should appear and consists of an ampersand ("&"), the entity name, and a semicolon (";"). XML has several predefined entities, such as "lt" (referenced by writing "&lt;") for the left angle bracket and "amp" (referenced as "&amp;") for the ampersand; custom entities can also be defined. Besides representing individual characters, entities can be used to represent frequently occurring blocks of text. Below is an example that uses a predefined entity to avoid a literal ampersand in a name:

<source lang="xml"> <company-name>AT&amp;T</company-name> </source>

The complete list of predefined entities consists of &amp; ("&"), &lt; ("<"), &gt; (">"), &apos; ("'"), and &quot; ("""); the last two are useful for writing delimiters inside attribute values. Custom entities can be defined in the document's DTD.

Sometimes a non-breaking space is needed; it is used very often in HTML, where it is written as &nbsp;. XML has no such predefined entity; the character is written as &#160;, and using &nbsp; causes an error. The absence of this very common entity often surprises programmers and creates some difficulty when migrating HTML work to XML.
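The behaviour described above can be tried out with Python's standard library. In this sketch, xml.sax.saxutils.escape produces entity references for the characters that need them, and xml.etree.ElementTree resolves them again; the sample strings are invented for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

# escape() replaces &, < and > with the predefined entity references.
raw = "AT&T <Research>"
escaped = escape(raw)
print(escaped)                      # AT&amp;T &lt;Research&gt;
print(unescape(escaped) == raw)     # True

# The parser resolves predefined entities back to characters.
name = ET.fromstring(f"<company-name>{escaped}</company-name>").text
print(name)

# &nbsp; is an HTML entity, not an XML one: without a declaration it is an error.
try:
    ET.fromstring("<p>10&nbsp;kg</p>")
    nbsp_error = None
except ET.ParseError as err:
    nbsp_error = str(err)
print(nbsp_error)

# The numeric form of the same character needs no declaration.
weight = ET.fromstring("<p>10&#160;kg</p>").text
print(repr(weight))
```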

A numeric character reference looks like an entity reference, but instead of an entity name it contains the character # followed by a number (in decimal or hexadecimal notation) that is the character's code point in Unicode. Such references are typically used for characters that cannot be encoded directly, for example an Arabic letter in an ASCII-encoded document. The ampersand can be represented as follows:

<source lang="xml"> <company-name>AT&#38;T</company-name> </source>
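As a small sketch of both notations, the hypothetical helper char_ref below builds a decimal or hexadecimal reference from a character's code point (the hexadecimal form is written with an x after the #), and Python's standard xml.etree.ElementTree module decodes the result.

```python
import xml.etree.ElementTree as ET

def char_ref(ch, hexadecimal=False):
    """Build a numeric character reference for a single character."""
    code = ord(ch)
    return f"&#x{code:X};" if hexadecimal else f"&#{code};"

decimal_amp = char_ref("&")
hex_amp = char_ref("&", hexadecimal=True)
print(decimal_amp)   # &#38;
print(hex_amp)       # &#x26;

# Both notations decode to the same character.
doc = f"<company-name>AT{decimal_amp}T and AT{hex_amp}T</company-name>"
company = ET.fromstring(doc).text
print(company)

# An Arabic letter (U+0628) carried inside a pure-ASCII document.
beh = "\u0628"
ascii_doc = f"<letter>{char_ref(beh, hexadecimal=True)}</letter>".encode("ascii")
letter = ET.fromstring(ascii_doc).text
print(letter == beh)
```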

There are many more rules for constructing a correct XML document, but the purpose of this brief overview is to show only the basics needed to understand the structure of an XML document.

History

The year of XML's birth can be considered either 1996, at the end of which a draft version of the language specification appeared, or 1998, when the specification was approved. It all began with the appearance of the SGML language in 1986.

SGML (Standard Generalized Markup Language) established itself as a flexible, complex, and comprehensive metalanguage for creating markup languages. Although the concept of hypertext appeared in 1965 (and its fundamental principles were formulated in 1945; http://www.arbuz.uz/x_revich_mouse.html), SGML has no hypertext model. The creation of SGML can confidently be called an attempt to embrace the boundless, since it combines capabilities that are extremely rarely all used together. This is its main shortcoming: its complexity and, as a result, its high cost restrict its use to large companies that can afford to buy the corresponding software and hire highly paid specialists. Besides, small companies rarely have tasks so complex that SGML would be needed to solve them.

SGML is most widely used for creating other markup languages; it was used to create HTML, the markup language for hypertext documents, whose specification was approved in 1992. Its appearance was driven by the need to organize the rapidly growing body of documents on the Internet. The rapid growth in the number of Internet connections and, accordingly, of Web servers created a demand for encoding electronic documents that SGML, being so hard to master, could not meet. The appearance of HTML, a very simple markup language, quickly solved this problem: ease of learning and a wealth of document creation tools made it the most popular language for Internet users. However, as the number of documents on the Web grew and their quality changed, the requirements placed on them grew as well, and the simplicity of HTML turned into its main shortcoming. The limited number of tags and complete indifference to document structure prompted developers, represented by the W3C consortium, to create a markup language that would be neither as complex as SGML nor as primitive as HTML. As a result, the XML language was born, combining the simplicity of HTML with the markup logic of SGML and meeting the requirements of the Internet.

Strengths and weaknesses

Advantages

  • XML is a markup language that allows binary data to be represented as text that can be read by a person and parsed by a computer;
  • XML supports Unicode;
  • data structures such as records, lists, and trees can be described in the XML format;
  • XML is a self-documenting format that describes structure and field names as well as field values;
  • XML has a strictly defined syntax and parsing requirements, which allows it to remain simple, efficient, and consistent. At the same time, developers are not restricted in their choice of means of expression (for example, data can be modeled by placing values in tag attributes or in the tag body, and different languages and notations can be used for naming tags, etc.);
  • XML is a format based on international standards;
  • The hierarchical structure of XML is suitable for describing practically any type of document, except audio and video media streams, bitmap images, network data structures, and binary data;
  • XML is plain text, free of licensing and of any restrictions;
  • XML is platform-independent;
  • XML is a subset of SGML (which has been in use since 1986). Extensive experience with the language has already been accumulated, and specialized applications have been created;
  • XML imposes no requirements on the position of characters within a line [1]
  • Unlike binary formats, XML contains metadata about the names, types, and classes of the described objects, from which an application can process a document of unknown structure (for example, for dynamic creation of interfaces [2]);
  • XML has parser implementations for all modern programming languages [3]
  • XML is supported at the low hardware, firmware, and software levels in modern hardware solutions. [4]

Shortcomings

  • The syntax of XML is redundant. [5]

* The size of an XML document is significantly larger than a binary representation of the same data. In rough estimates, this factor is taken to be one order of magnitude (10 times).
* The size of an XML document is significantly larger than that of a document in alternative text-based data transfer formats (for example JSON [1], YAML), and especially in data formats optimized for a specific use case.
* The redundancy of XML can affect application efficiency. The cost of storing, processing, and transmitting data increases.
* XML contains metadata (about field names, classes, and the nesting of structures), and at the same time XML is positioned as a language for open systems interconnection. When a large number of objects of the same type (the same structure) is transferred between systems, there is no point in transferring the metadata repeatedly, yet it is contained in every instance of the XML description.
* For a large number of tasks, the full power of XML syntax is not needed, and much simpler and more efficient solutions can be used. [6]
  • Ambiguity of modeling.

* There is no commonly accepted methodology for data modeling in XML, whereas for the relational and object-oriented models such tools have been developed and are based on relational algebra, the systems approach, and systems analysis.
* In nature there are many objects and phenomena for whose description different data structures (network, relational, hierarchical) are natural, and mapping an object onto a model that is unnatural for it distorts its essence. For the relational and hierarchical models, decomposition procedures are defined that ensure relative unambiguity, which cannot be said of the network model. [7]
* Because of the great flexibility of the language and the lack of strict restrictions, the same structure can be represented in many ways (by different developers); for example, a value can be written as an attribute of a tag or as the tag body, etc. For example: <a b="1" c="1"/> or <a b="1" c="1"></a> or <a><b> 1 </b> <c>1</c></a> or <a><b value="1"/><c value="1"/></a> or <a><fields b="1" c="1"/></a>, etc. [8]
* Support for many languages in tag names makes it possible, for example, to name a weight field with the Russian word; in that case the computer has no way to match this field with the weight field in the English-language version of the program or with the corresponding fields in versions of the object model in other languages.
  • XML has no built-in support for data types. It has no strict typing, that is, no concepts of "integers", "strings", "dates", "boolean values", etc.
  • The hierarchical data model offered by XML is limited in comparison with the relational model, object-oriented graphs, and the network data model.

* Expressing non-hierarchical data (for example graphs) requires additional effort.
* Christopher Date, a specialist in relational databases and the author of the classic textbook "An Introduction to Database Systems", noted that "… XML is an attempt to reinvent hierarchical databases …" [9] (in the 1980s hierarchical databases were displaced by relational databases).
  • XML namespaces are difficult to use and difficult to implement in XML parsers.
  • There are other text-based data formats with capabilities similar to those of XML that are easier for a person to read (YAML, JSON, SweetXML [10], XF [11]).
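Two of the shortcomings listed above, ambiguity of modeling and the absence of built-in data types, can be seen together in a short sketch. It takes the five equivalent notations from the list and reads the fields b and c from each with Python's standard xml.etree.ElementTree module. The read_field helper is hypothetical and knows only these five layouts; a sixth layout would need more code.

```python
import xml.etree.ElementTree as ET

VARIANTS = [
    '<a b="1" c="1"/>',
    '<a b="1" c="1"></a>',
    '<a><b> 1 </b> <c>1</c></a>',
    '<a><b value="1"/><c value="1"/></a>',
    '<a><fields b="1" c="1"/></a>',
]

def read_field(root, name):
    """Find field `name` under any of the layouts above (hypothetical helper)."""
    if name in root.attrib:                # <a b="1"/>
        return root.get(name)
    child = root.find(name)
    if child is not None:                  # <a><b>1</b></a> or <a><b value="1"/></a>
        return child.get("value", child.text)
    fields = root.find("fields")           # <a><fields b="1"/></a>
    if fields is not None:
        return fields.get(name)
    return None

decoded = []
for text in VARIANTS:
    root = ET.fromstring(text)
    # Values arrive as strings, possibly padded with whitespace; the int()
    # conversion is the application's decision, not something XML defines.
    b = int(read_field(root, "b").strip())
    c = int(read_field(root, "c").strip())
    decoded.append((b, c))
    print(f"{text} -> b={b}, c={c}")
```

Every variant decodes to the same pair of integers, but only because the reading code anticipates each layout and chooses the type itself.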

Displaying XML on the World Wide Web

Three methods of converting an XML document into a form displayed to the user are most common:

  1. Use of CSS styles;
  2. Use of XSLT transformation;
  3. Writing a handler for the XML document in some programming language.

Without CSS or XSL, an XML document is displayed as plain text in most Web browsers. Some browsers, such as Internet Explorer, Mozilla, and Mozilla Firefox, display the document structure as a tree, allowing nodes to be collapsed and expanded with mouse clicks.

Use of CSS styles

The process is similar to applying CSS to an HTML document for display.

To apply CSS when displaying in a browser, the XML document must contain a special reference to the style sheet. For example:

<source lang="xml"> <?xml-stylesheet type="text/css" href="myStyleSheet.css"?> </source>

This differs from the HTML approach, which uses the <link> element.

Use of XSLT transformation

XSL is a technology that describes how to format or transform the data of an XML document. The document is transformed into a format suitable for display in a browser. The browser is the most common use of XSL, but it should not be forgotten that XSL can transform XML into any format, for example VRML, PDF, or plain text.

Performing an XSL transformation (XSLT) on the client side requires the following instruction in the XML:

<source lang="xml"> <?xml-stylesheet type="text/xsl" href="transform.xsl"?> </source>
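A program that processes the document itself can discover the same instruction. The sketch below, which assumes Python's standard xml.dom.minidom module and a made-up recipe document, lists the processing instructions that precede the root element. It only reads the reference; it does not perform the transformation.

```python
from xml.dom import minidom

# Hypothetical document that carries the stylesheet instruction shown above.
DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="transform.xsl"?>
<recipe><title>Simple bread</title></recipe>"""

dom = minidom.parseString(DOC)

# Processing instructions are ordinary nodes of the document. The XML
# declaration is not one of them, so only xml-stylesheet is listed.
instructions = [
    (node.target, node.data)
    for node in dom.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]
for target, data in instructions:
    print(target, "->", data)
```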

XML vocabularies

Since XML is a rather abstract language, XML vocabularies were developed.

A vocabulary allows developers to agree on a finite set of tag names and attributes for those tags. One of the first vocabularies was XHTML, which most browsers understand. XHTML is often used for storing and editing content in CMSs.

More specialized vocabularies were also created, for example the data transfer protocol SOAP, which is not human-oriented and is rather hard to read. There are commercial vocabularies, such as CommerceML, xCBL, and cXML, which are used for data transfer; these trade-oriented vocabularies include descriptions of ordering systems, suppliers, products, and so on.

Usually, when describing a document, a person invents some vocabulary, which is then either described by means of a DTD or simply explained informally to the interested parties.

One interesting vocabulary that has become widespread is FB2, a vocabulary describing a book format with various footnotes, quotations, and even pictures.

Versions of XML

  • XML 1.0
  • XML 1.1

See Also

  • DTD, XSD, XML Schema: structure declaration languages for XML documents
  • XLink, XPointer: languages for describing links in XML
  • XPath: addressing language for XML
  • XQuery: query language for XML
  • XML DOM: an interface for processing XML documents
  • XSL, XSL-FO, XSLT: transformation languages for XML documents
  • YAML
  • DITA

Notes

  1. http://www.json.org/xml.html
  2. http://www.xml.com/lpt/a/1535
  3. http://www.xml.com/pub/rg/XML_Parsers
  4. Intel XML Accelerator
  5. David Megginson. Imperfect XML: Rants, Raves, Tips, and Tricks … from an Insider. Chapter 8
  6. http://www.faqs.org/docs/artu/ch05s02.html#id2907018
  7. Gustavo Alonso. Myths around Web Services. Swiss Federal Institute of Technology, page 6
  8. Tim Bray. Using XML in Internet Protocols. Sun Microsystems
  9. http://www.oreillynet.com/lpt/a/6060
  10. SweetXML: http://innig.net/software/sweetxml/index.html
  11. XF: http://xfhome.org
