
XML: Its Time Has Come

Although this material may seem somewhat difficult for the ordinary user, I recommend not skimming it but making the effort to read it through, and returning to it again if necessary. It is written mainly for developers of Internet applications, but today we can say with confidence that the time of XML has come for the ordinary user of the Internet and computers as well. The creation of the Semantic Web (see the article "It Makes Sense: The Semantic Web") has firmly fixed in developers' minds the idea that there is no way forward without XML, which means that the programs they create will rely on XML as much as possible. And for whom are programs written? Right, for you and me. So we too will be using XML to the full. Don't miss the train; it hasn't left yet.


If you develop for the Web, you have had to deal with a host of technologies (Netscape plug-ins, ActiveX controls, Dynamic HTML, Cascading Style Sheets (CSS), and so on) that supposedly extend what your pages can do. In a few cases you really got what was promised, but mostly these technologies just seriously complicated your life because of their inconsistent behavior in different browsers.


As one of the victims, I must admit that my reaction to browser extensions eventually became exactly the same as to a migraine headache: turn off the light, draw the curtains, lie down, and wait for it to pass.


The Extensible Markup Language (XML), however, is another matter entirely. Although, like any new technology, it takes some learning, it should not give you a migraine. XML is here to stay. Most importantly, it should make your life easier, not harder.


The most important feature of XML and its companion technology, the Extensible Stylesheet Language (XSL), is the separation of formatting from content. This may seem familiar to anyone who has worked with CSS or with style sheets in Microsoft Word. However, if standard HTML is likened to a photograph of a building, then CSS corresponds to instructions to the photo lab on how to process the photo. You can make all the doors red, all the walls pink, and the roof gray. But without access to the blueprint of the building you cannot make any fundamental changes. XML, unlike HTML, exposes the data and lets you manipulate it.


HTML at the End of the Road


The beauty of XML can be appreciated only by comparing it with HTML. Formalized in RFC 1866 in 1995 (though, naturally, it was in use earlier), HTML is the most popular markup language in the world. The term "markup", applied to a document, usually means everything that is not part of its content. For example, when this article was being prepared for print, the editors of Network Magazine marked it up (with a good old red pen), inserting remarks for the author and instructions for the layout staff on how to format various elements.


Every Web user has surely seen an HTML file in its source form, where formatting tags are interspersed with ordinary text. (Some of you may recall WordStar in this connection, which also used mostly paired tags. In the days of text-mode monitors a document could easily be ruined when, having inserted an opening tag to switch to bold or underline, you forgot to turn it off with a closing tag at the end of the word.)


The main feature of HTML markup is, of course, the ability to insert links to external documents or to internal sections of the same document. Note that although HTML is most often served over HTTP, it can also be used on a CD-ROM or on a local area network. General-purpose markup languages are not tied to any particular transport.


HTML has succeeded not only as a convenient markup language but also as middleware (see J. Angel's article in this issue of LAN). Thanks to their low cost and ubiquity, Web browsers make excellent clients; through HTML they can communicate with a wide variety of servers.


However, HTML has run into certain difficulties. Attempts have been made to overcome its limited formatting capabilities with CSS, Bitstream's TrueDoc initiative, and, of course, a multitude of browser-specific extensions, and to overcome its limitations as middleware with Java, ActiveX, and so on. Nevertheless, none of this eliminates its fundamental shortcomings.


What You See Is All You Get


In essence, HTML is a presentation technology: it describes how the browser should arrange text and graphics on a page. As a result, there is no way to describe the data independently of how it is displayed (except for the extremely weak system of keywords in the head of a Web page). This is the main reason why it is so hard to find the information you need with a search engine.


The client has no reasonably acceptable means of extracting data from a Web page for further work with it. With a steady hand you can paste the contents of an HTML table into a spreadsheet, but that is no solution! Furthermore, on any given Web page the client receives only one presentation of a particular set of data.


Suppose you are browsing a list of eBay auctions ordered by the date bidding opened. If you want to see the same list sorted by the date bidding closes, your browser must send a new request to the server. The server, in turn, must send the full HTML page with the list of auctions all over again. Manipulating data this way substantially increases the number of hits on Web servers and thus complicates scaling them.


Another problem with HTML is that it is a flat language, i.e. authors cannot use it to convey information about the hierarchy of the data. Furthermore, it is inconsistent and therefore hard for software to parse. For example, although most opening tags (such as <B> or <H1>) have corresponding closing tags, some do not.


A simple solution to some of these problems would be to introduce additional HTML tags that name the data they enclose. With their help the client could determine what the data represents and display it differently or export it at the user's request. History shows, however, that introducing additional HTML tags can take years; consensus on what they should mean is rarely reached quickly, if it can be reached at all. And if you decide not to wait for the standard to change, you are creating something nonstandard and thereby giving up one of the main advantages of HTML.


Therefore, in 1996 members of a working group of the World Wide Web Consortium (W3C, http://www.w3.org) returned to the Standard Generalized Markup Language (SGML), of which HTML is a greatly simplified descendant. Proposed in 1974 by Charles Goldfarb, SGML is a metalanguage, a system for describing other languages. For all its capabilities, it is too complex for most Web browsers: the SGML specification alone runs to more than 500 pages.


Having simplified SGML for use on the Web, the group proposed XML (a W3C Recommendation as of February 1998). XML is a subset of SGML, and any valid XML document is a valid SGML document. Like SGML, XML is a metalanguage that defines other markup languages for specific purposes. For example, the Synchronized Multimedia Integration Language (SMIL) is based on XML.


XML is used to mark up ordinary documents in much the same way as HTML. However, XML surpasses HTML when working with structured data, such as search results, metadata about a Web site, or the elements and types of a schema.


An XML document looks much like HTML. It too consists of text fragments annotated by tags enclosed in angle brackets. Unlike HTML, however, tag names are case-sensitive, and every opening tag must always have a matching closing tag.


Calling Things by Their Proper Names


By not restricting the author to a fixed set of tags, XML lets the author introduce whatever names seem useful. This capability is the key to active manipulation of data. As an example, I will show for comparison how a list of names and addresses looks in HTML and how it is represented in XML.


HTML fragment:

<H1>Editor Contacts</H1>
<H2>Name: Jonathan Angel</H2>
<P>Title: Senior Editor</P>
<P>Publication: Network Magazine</P>
<P>Street Address: 600 Harrison St.</P>
<P>City: San Francisco</P>
<P>State: California</P>
<P>Zip: 94107</P>
<P>E-Mail: jangel@mfi.com</P>


The tags place the data on the screen but say nothing about its structure. Of course, you can infer the structure and even import a long list of records into a spreadsheet, but what happens if one of the records lacks the line with the e-mail address, or the street and city are swapped?


In XML the same fragment would be represented as follows (saved in the file EDITORS.XML):

<?xml version="1.0"?>
<editor_contacts>
  <editor>
    <first_name>Jonathan</first_name>
    <last_name>Angel</last_name>
    <title>Senior Editor</title>
    <publication>Network Magazine</publication>
    <address>
      <street>600 Harrison St.</street>
      <city>San Francisco</city>
      <state>California</state>
      <zip>94107</zip>
    </address>
    <e_mail>jangel@mfi.com</e_mail>
  </editor>
</editor_contacts>


XML, although somewhat more verbose than HTML, makes it much easier to determine what the data fields are and where they are located. In XML, tags cannot overlap, as they can in HTML (where overlapping is discouraged but tolerated by most HTML parsers). They can, however, be nested within each other. In fact, nesting is encouraged as a way of creating a hierarchy of data (parent-child or sibling relationships). As the example shows, elements such as <first_name> and <e_mail> contain data, while others (<address>) are present only for the sake of structure.
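To make the point about named fields concrete, here is a minimal Python sketch that reads a document shaped like EDITORS.XML with the standard library's xml.etree.ElementTree. The second editor record, the helper name read_contacts, and the choice of Python are assumptions made for illustration; they are not part of the original article.

```python
import xml.etree.ElementTree as ET

# Sample data in the shape of EDITORS.XML. The second record is hypothetical
# and deliberately has no <e_mail> element.
EDITORS_XML = """<?xml version="1.0"?>
<editor_contacts>
  <editor>
    <first_name>Jonathan</first_name>
    <last_name>Angel</last_name>
    <address><city>San Francisco</city></address>
    <e_mail>jangel@mfi.com</e_mail>
  </editor>
  <editor>
    <first_name>Pat</first_name>
    <last_name>Example</last_name>
    <address><city>Oakland</city></address>
  </editor>
</editor_contacts>"""


def read_contacts(xml_text):
    """Return one dict per <editor>, looking fields up by name, not by position."""
    root = ET.fromstring(xml_text)
    contacts = []
    for editor in root.findall("editor"):
        first = editor.findtext("first_name", default="").strip()
        last = editor.findtext("last_name", default="").strip()
        contacts.append({
            "name": f"{first} {last}",
            # A path expression walks the nesting: <address> then <city>.
            "city": editor.findtext("address/city", default="").strip(),
            # A missing field yields an empty string instead of shifting columns.
            "e_mail": editor.findtext("e_mail", default="").strip(),
        })
    return contacts


contacts = read_contacts(EDITORS_XML)
```

Because each value is found by its element name, a record without an e-mail address or with its fields in a different order is still read correctly, which is exactly what the flat HTML version cannot guarantee.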


Element start and end tags are the main markup used in XML, but they are not the whole story. For example, elements can be given attributes. This is similar to HTML, where, for example, the <table> element can be given the attribute align="center". In XML an element can have one or more attributes associated with it, and when composing a document you can invent as many as you like, for example <publication topic="networking" circulation="controlled">.
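As a small illustration of author-defined attributes, the following Python sketch reads the attributes of the <publication> element quoted above. The element content and the absent "frequency" attribute are assumptions added for the example.

```python
import xml.etree.ElementTree as ET

# The attribute names come from the article's example; the text content is assumed.
pub = ET.fromstring(
    '<publication topic="networking" circulation="controlled">'
    "Network Magazine</publication>"
)

attrs = dict(pub.attrib)                       # all attributes as a plain dict
topic = pub.get("topic")                       # a single attribute by name
frequency = pub.get("frequency", "unknown")    # hypothetical attribute, not present
```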


XML documents can contain entity references. An entity reference is a string that begins with an ampersand and ends with a semicolon. Such references make it possible, in particular, to insert special characters that could confuse the parser if they appeared literally. For example, to place a less-than sign (<) in a document you insert the reference &lt;, and to insert an ampersand you use the reference &amp;, and so on. So far this is the same as in HTML. However, XML entity references offer much more, since they can refer to sections of text defined by the author in the same document or in another one.
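The two uses of entity references described here can be sketched in Python with the standard library. The sample strings and the entity name "introduction" with its replacement text are assumptions for illustration; the sketch relies on the parser expanding entities declared in the document's internal DTD subset.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# 1. Predefined references protect characters that would confuse the parser.
raw = "AT&T sells for < $50"
escaped = escape(raw)                          # & becomes &amp;, < becomes &lt;
parsed = ET.fromstring(f"<note>{escaped}</note>").text

# 2. An author-defined entity, declared in the internal DTD subset,
#    is replaced by its text when the document is parsed.
ARTICLE = """<!DOCTYPE article [
  <!ENTITY introduction "XML is here to stay.">
]>
<article>&introduction;</article>"""
article_text = ET.fromstring(ARTICLE).text
```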


For example, entity references make it possible to take an object-oriented approach to assembling a magazine article:

<article>
  &introduction;
  &body;
  &sidebar;
  &conclusion;
  &resources;
</article>


Other kinds of XML markup are comments (written the same way as in HTML) and processing instructions. Processing instructions are introduced by a question mark. They describe what exactly the parser should use to interpret a particular document or a part of it. For example, the instruction <?xml version="1.0"?> tells the XML parser that the document being processed really is written in XML. The instruction <?rtf page?>, on the other hand, serves to invoke an RTF parser and insert a page break.


Finally, character data (CDATA) sections are parts of the document that are treated purely as character data and are not parsed. They look like this:

<![CDATA[
This text, even if it contains elements of HTML code such as <B>bold text</B> or <H1>a heading</H1>, is not parsed. Instead, it is displayed as is.
]]>
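A short Python check, offered only as a sketch with an assumed wrapper element named sample, shows that a parser hands back the contents of a CDATA section as literal text and creates no child elements from the tags inside it.

```python
import xml.etree.ElementTree as ET

DOC = (
    "<sample><![CDATA[Markup such as <B>bold</B> & <H1>heading</H1> "
    "stays literal.]]></sample>"
)

sample = ET.fromstring(DOC)
literal = sample.text        # the CDATA content, delivered as plain text
child_count = len(sample)    # number of child elements the parser created
```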


Style Sheets


So far in discussing XML I have sidestepped two important questions. The first is how XML elements should be formatted (you have surely looked, in vain, for formatting instructions in the code fragments above). The second is how browsers can understand nonstandard tags such as <publication>.


The answer lies in style sheets. Cascading Style Sheets (CSS), which enjoy moderate popularity on the Web, let you change the formatting of known HTML tags and define new ones. In particular, the Network Magazine Web server uses CSS style sheets to standardize the presentation of typical elements, such as <H1>, and to introduce new ones, such as sidebars.


CSS can also be used to format XML documents, but it is not the best choice. The main advantage of XML is that it represents a document as a tree structure that can be manipulated. Unfortunately, CSS cannot interact with the tree and can format XML documents only "as is". You can display the document in any attractive format, but you cannot present its data selectively without resorting to a scripting language. Moreover, to use CSS you have to learn yet another syntax.


These limitations led to the creation of XSL. XSL is an application of XML with its own semantics (a fixed set of elements); hence it can be used to create style sheets (document templates) that any XML parser can understand.


XSL style sheets describe how XML documents should be transformed into other formats, such as HTML or RTF. But XSL style sheets are more than just format converters; they also provide a mechanism for manipulating data. For example, data can be sorted, searched, deleted, or added directly from the browser.


Let us look at a simple style sheet that we could use for the Editor Contacts application presented earlier.

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  <!-- Declares that the document is a style sheet bound to the xsl: namespace -->
  <xsl:template match="/">
    <!-- Apply the template to everything in the source XML document -->
    <HTML>
      <BODY>
        <H1>Editor Contacts</H1>
        <xsl:for-each select="editor_contacts/editor">
          <H2>Name: <xsl:value-of select="first_name"/>
            <xsl:value-of select="last_name"/></H2>
          <P>Title: <xsl:value-of select="title"/></P>
          <P>Publication: <xsl:value-of select="publication"/></P>
          <P>Street Address: <xsl:value-of select="address/street"/></P>
          <P>City: <xsl:value-of select="address/city"/></P>
          <P>State: <xsl:value-of select="address/state"/></P>
          <P>Zip: <xsl:value-of select="address/zip"/></P>
          <P>E-Mail: <xsl:value-of select="e_mail"/></P>
        </xsl:for-each>
      </BODY>
    </HTML>
  </xsl:template>
</xsl:stylesheet>


When saved to disk under the name EDITORS.XSL (or any other), this template is applied to EDITORS.XML if the following line is added after the first line of that file:

<?xml-stylesheet type="text/xsl" href="editors.xsl"?>


In the end, the text on the browser screen looks the same as the HTML fragment shown earlier. However, XSL can work like a word processor's mail-merge function. Defined as part of the XSL namespace, the xsl:for-each element tells the processor to loop over all matching nodes in the source XML file. The xsl:value-of element inserts the value of an XML node into an HTML element. Thus, if you go back to EDITORS.XML and insert tens or hundreds of contact addresses, they will be displayed without any changes to the style sheet. Because the formatting information needs to be transmitted only once, XML and XSL save bandwidth.


XSL style sheets also resemble mail merge in that they let you selectively omit data fields on display. In addition, the output can be sorted on any particular data field. To sort the contact database by the editor's last name in ascending alphabetical order, the xsl:for-each element should be changed as follows:

<xsl:for-each select="editor_contacts/editor" order-by="+ last_name">
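The loop-and-sort behavior described for xsl:for-each and order-by can be imitated in a few lines of Python. This is only an analogy under stated assumptions: it is not an XSL processor, the two editor records are hypothetical, and render is an invented helper name.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical records, stored out of alphabetical order on purpose.
EDITORS_XML = """<editor_contacts>
  <editor><first_name>Ann</first_name><last_name>Young</last_name><title>Editor</title></editor>
  <editor><first_name>Bob</first_name><last_name>Adams</last_name><title>Writer</title></editor>
</editor_contacts>"""


def render(xml_text, sort_key=None):
    """Emit one HTML block per <editor>, optionally sorted by a child element."""
    root = ET.fromstring(xml_text)
    editors = root.findall("editor")               # like select="editor_contacts/editor"
    if sort_key:
        # Ascending sort on the text of the chosen child, like order-by="+ last_name".
        editors.sort(key=lambda e: e.findtext(sort_key, default=""))
    lines = ["<H1>Editor Contacts</H1>"]
    for e in editors:                              # like xsl:for-each
        name = f"{e.findtext('first_name')} {e.findtext('last_name')}"
        lines.append(f"<H2>Name: {escape(name)}</H2>")      # like xsl:value-of
        lines.append(f"<P>Title: {escape(e.findtext('title'))}</P>")
    return "\n".join(lines)


unsorted_html = render(EDITORS_XML)
sorted_html = render(EDITORS_XML, sort_key="last_name")
```

The data is fetched once; re-sorting is a purely local operation on the parsed tree, which is the point the article makes about sparing the server a second request.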


XSL can also transform the output conditionally, depending on the values of various elements or attributes. Moreover, it lets you query data sets using various pattern operators, wildcards, filters, Boolean operators, and set expressions. XML and XSL are by no means intended to replace SQL, and few people are likely to want to store their databases directly in XML format. However, XSL opens up a variety of ways to search data after it has been loaded into the browser. You will never again need to rely on the browser's primitive built-in Find command to locate information.


XML's considerable potential as middleware is supported by the Document Object Model (DOM), version 1.0 of which became a W3C Recommendation in October 1998. DOM originated as a specification for ensuring the portability of JavaScript scripts and Java programs across Web browsers and later evolved into an API for HTML and XML documents. It defines the logical structure of documents and the ways to access and manipulate them. Programmers can create documents, navigate their structure, and add, modify, or delete elements and content.


DOM has no bearing on how XML and HTML documents should be written. Rather than defining a set of data structures, it represents documents according to an object model, as a tree structure consisting of nodes. There is no need to use DOM simply to view XML documents in a browser. It is used when a script needs to change an XML document or access its data. On the server, DOM can be used to parse XML files arriving from the client and to respond to them appropriately. In addition, programmers can use DOM as an intermediate layer for converting from a database format to XML. If the DOM interfaces are implemented correctly, users never need to know that the data is stored in any format other than XML.
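The add, modify, and delete operations mentioned above can be tried with Python's xml.dom.minidom, a lightweight DOM implementation in the standard library. The sketch below is an illustration under assumptions: the small input document and the particular edits are invented for the example.

```python
from xml.dom.minidom import parseString

dom = parseString(
    "<editor_contacts><editor>"
    "<first_name>Jonathan</first_name><title>Senior Editor</title>"
    "</editor></editor_contacts>"
)
editor = dom.getElementsByTagName("editor")[0]

# Add: create a new element with a text node and attach it to the tree.
e_mail = dom.createElement("e_mail")
e_mail.appendChild(dom.createTextNode("jangel@mfi.com"))
editor.appendChild(e_mail)

# Modify: change the character data of an existing text node.
first_name = editor.getElementsByTagName("first_name")[0]
first_name.firstChild.data = "Jon"

# Delete: detach an element from its parent.
title = editor.getElementsByTagName("title")[0]
editor.removeChild(title)

# Serialize the root element (without an XML declaration).
result = dom.documentElement.toxml()
```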

Declaring the Structure of a Document


All right, I admit it: in moving on to XML as middleware, I skipped one important step. If XML tags and elements are used purely for convenience on your own Web site (as you might use CSS), it does not matter that you give these elements and tags names whose meaning differs from the standard one and is known only to you. If, on the other hand, you want to provide data to the outside world and receive information from business partners, this becomes hugely important. You must use elements and attributes the same way everyone else does, or at least document what you do.


For this purpose you should use Document Type Definitions (DTDs). Stored at the beginning of an XML file or externally in a *.DTD file, these definitions describe the information structure of the document. DTDs list the possible element names, define the attributes available for each element type, and describe how elements may be combined with one another.


Each line of a document type definition can contain an element type declaration, which names an element and defines the type of data the element can contain. It has the following form:

<!ELEMENT element_name (data_type)>


For example, the declaration <!ELEMENT publication (#PCDATA)> defines an element named publication that contains character data (i.e. text). The declaration <!ELEMENT special_report (article_1, article_2, article_3)> defines an element named special_report that contains the subelements article_1, article_2, and article_3, in that order, for example:

<special_report>
  <article_1>XML: Its Time Has Come</article_1>
  <article_2>XML Surpasses Itself</article_2>
  <article_3>Managing Networks and Systems with XML</article_3>
</special_report>


After defining elements, a DTD can also define attributes with the <!ATTLIST> declaration. It specifies an element, names the attribute associated with it, and then describes the attribute's allowed values. For example, the following declaration associates the attribute manufacturer with the element car, and the attribute can take one of the listed values (a complete declaration also ends with a default specification, here #IMPLIED, meaning the attribute is optional):

<!ATTLIST car manufacturer (Audi|Volvo|Volkswagen) #IMPLIED>


<!ATTLIST> lets you control attributes in many other ways as well: setting default values, suppressing whitespace, and so on. A DTD can also contain <!ENTITY> declarations, where entities are defined, and <!NOTATION> declarations, which specify what to do with binary files that are not in XML format.


A serious and somewhat surprising limitation of DTDs is that they do not support data typing, i.e. restricting data to a particular format (such as a date, an integer, or a floating-point number). As you may already have noticed, DTDs use a syntax different from that of XML, and not a very intuitive one. For these reasons DTDs will probably be replaced by XML schemas, which are more powerful and easier to use and on which work is now under way. Additional information on XML schemas can be found in the working draft linked in the table and in the sidebar.


You have probably heard the terms "well-formed" and "valid" applied to XML documents. A document is well-formed if every opening tag has a matching closing tag and there are no overlapping tags. (Thus, most HTML documents are not well-formed.) A document is valid if it contains a DTD and conforms to its rules.
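Well-formedness is easy to test mechanically: a conforming parser must reject a document that breaks the rules. The Python sketch below uses the standard library's non-validating parser, so it checks well-formedness only; checking validity against a DTD would need a validating parser, which is outside this example. The helper name is_well_formed and the sample strings are assumptions.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


checks = {
    "nested": is_well_formed("<a><b>text</b></a>"),          # properly nested
    "overlapping": is_well_formed("<a><b>text</a></b>"),     # tags overlap
    "unclosed": is_well_formed("<a><b>text</a>"),            # <b> never closed
    "case_mismatch": is_well_formed("<a>text</A>"),          # names are case-sensitive
}
```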

XML Goes to Work


XML will enjoy growing popularity as an open and effective standard for interaction between companies and for electronic commerce. XML data will travel mainly over HTTP, but it can also be distributed by message-queuing technologies such as IBM's MQSeries or Microsoft's Message Queue Server.


For this to become possible, however, specific schemas must be defined and adopted in a coordinated way. The W3C has quite rightly decided that it should not interfere in this; as a result, dozens of industry standards organizations are busy defining XML DTDs and schemas. Among them are RosettaNet (focused on the IT supply chain; see http://www.rosettanet.org), CommerceNet (http://www.commercenet.com), the XML/EDI Group (http://www.geocities.com/WallStreet/Floor/5815/), the Open Applications Group (http://www.openapplications.org), XML.ORG (http://www.xml.org), and BizTalk (http://www.biztalk.org).


The most controversy surrounds Microsoft's BizTalk: the company's supporters see it as an altruistic attempt to help XML take hold, while opponents see it as one more attempt to dominate the industry. (Additional information about BizTalk can be found in the article.)


In my personal opinion (which, by a twist of fate, coincides with the forecast published on the BizTalk Web server), hardly any industry will manage to agree on a common set of semantic rules for its various XML schemas. On the other hand, the variety of schemas can probably be reduced to two or three competing schemas per industry, after which maps for adapting these schemas to one another can be published.


Business partners may not accept XML and common schemas for exchanging data with one another. Where competition between companies is strong, as among online bookstores, auction sites, and so on, they may hold on to their own schemas to the last before finally beginning to provide data in a standard form. In the end, however, customers will force them to do so. As soon as XML applications make it possible to search and compare prices across servers, those who refuse to standardize will simply lose business.


XML is here to stay, and it has a host of advantages. Incidentally, according to the W3C, in the guise of a recently adopted extension to the Synchronized Multimedia Integration Language (SMIL), XML may become a key element of digital television broadcasting.


It is probably too early to rebuild a Web server completely around XML. However, it is already time to start working with it, since the necessary tools already exist. From the end user's point of view, Microsoft's Internet Explorer 5.0 supports XML, XSL, DTDs, and XML schemas, and Netscape Navigator/Mozilla 5.x will do so once it is released.

Internet Resources


Tim Bray, co-editor of the Extensible Markup Language (XML) 1.0 specification, has written an excellent introduction to XML for Scientific American. It can be read at http://www.sciam.com/1999/0599issue/0599bosak.html. An article of his can also be found on the server of Web Techniques magazine at http://www.webtechniques.com/archives/1998/12/bray/.


The XML home page of the World Wide Web Consortium, with links to introductory articles, answers to questions, and the relevant standards, is located at http://www.w3.org/XML/.


If you want to keep up with events in the XML world, visit http://www.xml.com, published by Seybold Publications and O'Reilly.

XML Schemas


XML schemas are intended to replace Document Type Definitions (DTDs). See the W3C working draft at http://www.w3.org/TR/xmlschema-1/.

What's in a Name?


The Extensible Markup Language (XML) lets you create your own tags, document them with Document Type Definitions (DTDs) or XML schemas, and then exchange data with other sources without problems. This is all well and good, but it may turn out that others use the same names for elements and attributes as you do while relying on different DTDs.


Turning to the popular bookstore example, both Know Knew Books and Amazon.com will almost certainly use tags with names such as author, title, isbn, and price. At the same time, it is unlikely that they will use the same DTD. That is a direct road to trouble.


To prevent such conflicts, the W3C developed the concept of namespaces and the xmlns keyword. Thanks to them, element and attribute names that would otherwise conflict with each other can be used in one document. They are distinguished by different namespace prefixes and are defined by different DTDs or schemas.


Here, for example, is a fragment of XML code that uses namespaces:

<inventory xmlns:storea="http://www.knowknew.com/books.dtd"
           xmlns:storeb="http://www.amazon.com/schema">
  <storea:magazine>
    <storea:title>Network Magazine</storea:title>
  </storea:magazine>
  <storeb:magazine storeb:title="Data Communications">
  </storeb:magazine>
</inventory>


In store A's DTD, the title is a subelement of magazine. In store B's schema, the title is an attribute of magazine.


Because the names are distinguished by different namespace prefixes, they can be used together. The location of the DTD and of the schema is indicated in this example by a URL, but it can also be given by a Uniform Resource Name (URN, see RFC 2141) or a Uniform Resource Identifier (URI, see RFC 2396).
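How a program tells the two title names apart can be sketched in Python. The namespace URIs are the ones from the fragment above; the short prefixes "a" and "b" in the lookup table are local aliases chosen for this example, since what matters to the parser is the URI, not the prefix used in the document.

```python
import xml.etree.ElementTree as ET

INVENTORY = """<inventory xmlns:storea="http://www.knowknew.com/books.dtd"
           xmlns:storeb="http://www.amazon.com/schema">
  <storea:magazine>
    <storea:title>Network Magazine</storea:title>
  </storea:magazine>
  <storeb:magazine storeb:title="Data Communications"/>
</inventory>"""

# Local prefix-to-URI table used only for the queries below.
NS = {
    "a": "http://www.knowknew.com/books.dtd",
    "b": "http://www.amazon.com/schema",
}

root = ET.fromstring(INVENTORY)

# Store A: the title is a subelement of magazine.
title_a = root.findtext("a:magazine/a:title", namespaces=NS)

# Store B: the title is a namespace-qualified attribute of magazine.
magazine_b = root.find("b:magazine", NS)
title_b = magazine_b.get("{http://www.amazon.com/schema}title")
```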