Perl and XML. Programmer's Library

Perl and XML



The Perl programming language has been around for a long time and was originally designed for text processing. XML, by contrast, is only a few years old, yet in that time it has already proved its worth. It is widely used for processing web content, manipulating documents, designing web services, and in any other situation that calls for structuring changeable information. Despite their very different natures, the two languages get along very well. This book tells the story of their sometimes complicated but successful union.


Why are Perl and XML such a good match?



First, Perl is ideally suited to text processing. Filehandles, "here" documents, string manipulation, and regular expressions are all built into the language. Anyone who has written programs in a low-level language such as C and then in Perl knows how much better Perl is adapted to working with text. And XML is, at its core, plain text, so Perl and XML complement each other well.



Moreover, since version 5.6 Perl has supported Unicode-based character encodings (for example, UTF-8), which is extremely important when processing XML documents. Character encodings are covered in more detail in Chapter 3.



Second, there is the Comprehensive Perl Archive Network (CPAN), a huge collection of Perl modules available to anyone who wants them. CPAN makes a programmer's job much easier: anyone starting out in Perl can simply reuse existing modules, saving a great deal of time and effort. For example, why write your own parser when CPAN offers several ready-made parsers for free download? As a rule, these modules have already been tested and can be adapted to specific needs. CPAN is not a rigidly controlled structure: many people contribute to it, and oversight is limited. As soon as a new technology appears, a module supporting it shows up on CPAN. This suits XML well, because XML keeps changing and keeps acquiring new supporting technologies.



At first, XML modules sprang up like mushrooms after the rain. Each came with its own unique interface and its own style, in keeping with Perl tradition. Recently there has been a trend toward common interfaces that make modules interchangeable. If for some reason you don't like a particular SAX parser, you can easily switch to another one without extra effort.



Third, Perl's flexible support for object-oriented programming is very useful when working with XML. Data in an XML document has a hierarchical structure built from basic units called XML elements, which can contain nested elements. As a result, the elements that make up a document can be represented by a single class of objects with simple, uniform interfaces. Moreover, XML encapsulates content much as objects encapsulate code and data, so the two fit together naturally. Objects are also useful for organizing XML processors into modules: parsers, parser factories, helper objects, and parsers that return objects. All of this makes for clean code that runs on practically any platform.



Fourth, there is the close connection between Perl and the Web. Java and JavaScript certainly have a great deal to offer here, but anyone with even a little web-programming experience will tell you that Perl runs the server side of a great many web sites. Many Perl web libraries can easily be adapted for use with XML. Experienced programmers who have spent years building web sites in Perl can move into the XML "kingdom" with ease.



Finally, choosing a programming language is a matter of personal preference. Perl is an excellent language for working with XML, but you don't have to confine yourself to it. Just give it a try.



XML Is Simpler Than You Think



Many people see XML as the work of some "evil genius" bent on destroying mankind. At first glance the markup language, with its angle brackets and slashes, looks rather complicated. Add nested elements, node types, and DTD declarations, and things get harder still.



Now we want to share a little secret: writing programs that process XML is not particularly hard. There is a whole range of tools for parsing documents and building data structures, and they offer convenient APIs you can learn in a few minutes. If you want to experience all the charms of sophisticated XML applications, nothing stops you, but there is no need to make things more complicated than necessary. XML work varies widely in complexity, and a simple XML application calls for simple tools.



To illustrate the point, consider a simple basic module called XML::Simple, created by Grant McLean. With minimal effort on your part, it gives you a solid set of useful features for processing XML.



A typical program reads an XML document, makes some changes, and writes the result back to a file. XML::Simple is designed to automate this process as much as possible. One subroutine call reads the XML document and stores it in memory, using nested hashes to represent elements and data. When you have made all the necessary changes, you call another subroutine to produce the XML for writing back out.



Now let's try it out. As with any other module, you load XML::Simple into your program with a use directive:




use XML::Simple;


Once this statement runs, XML::Simple exports two subroutines into your namespace:



XMLin() reads an XML document from a file or a string and builds a data structure containing its data and elements. It returns a reference to a hash holding that structure.


XMLout() takes a reference to a hash containing an encoded document, generates XML markup from it, and returns that markup as a string of text.
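To make the two calls concrete, here is a minimal sketch. It parses a trimmed-down, made-up SpamChucker document held in a string rather than a file, inspects the resulting structure, and converts it back to XML. It assumes XML::Simple is installed from CPAN, and the sample data is hypothetical.

```perl
use strict;
use warnings;
use XML::Simple;

# A hypothetical, trimmed-down SpamChucker document held in a string.
# XMLin treats its argument as XML text (not a file name) when it contains markup.
my $xml = <<'XML';
<spam-document version="3.5">
  <customer>
    <first-name>Joe</first-name>
    <surname>Wrigley</surname>
  </customer>
</spam-document>
XML

# XMLin: parse into nested hashes and arrays. ForceArray => 1 wraps every
# element's value in an array reference; attributes stay plain scalars.
my $data = XMLin($xml, ForceArray => 1);

print "version: $data->{version}\n";
print "first name: $data->{customer}[0]{'first-name'}[0]\n";

# XMLout: serialize the structure back into XML text.
my $out = XMLout($data);
print $out;
```

The root element comes back as opt because XMLin does not keep the root's name by default.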


You can also create a document from scratch by building a data structure out of hashes, arrays, and strings. You will need to do this when creating a file for the first time. Be careful to avoid circular references, though, or the module will not work properly.
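Here is a minimal sketch of building a document from scratch, assuming XML::Simple is installed. The customer data is made up for illustration. The RootName option names the outermost element, which otherwise defaults to opt.

```perl
use strict;
use warnings;
use XML::Simple;

# A hand-built (hypothetical) structure. By default XMLout writes plain
# scalars as attributes and array references as child elements.
my $doc = {
    version  => '3.5',
    customer => [
        { 'first-name' => ['Joe'],       surname => ['Wrigley']  },
        { 'first-name' => ['Henrietta'], surname => ['Pussycat'] },
    ],
};

# RootName replaces the default root element name, "opt".
my $xml = XMLout($doc, RootName => 'spam-document');
print $xml;
```

Each name is wrapped in a one-element array so that XMLout writes it as a child element rather than as an attribute of customer.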



Suppose your boss wants to send messages to a group of users of WarbleSoft SpamChucker, a mailing-list manager. One feature of this application is the ability to import and export XML files that represent mailing lists. The only problem is that your boss finds it hard to read the users' names on screen in their original form and would prefer them in capital letters. So you need to write a program that edits the XML data files and performs the conversion.



The first task is to examine the XML files to work out their markup style. Listing 1.1 shows a sample document.



Listing 1.1. A SpamChucker data file




<?xml version="1.0"?>
<spam-document version="3.5" timestamp="2002-05-13 15:33:45">
<!-- Autogenerated by WarbleSoft Spam, version 3.5 -->
<customer>
  <first-name>Joe</first-name>
  <surname>Wrigley</surname>
  <address>
    <street>17 Beable Ave.</street>
    <city>Meatball</city>
    <state>MI</state>
    <zip>82649</zip>
  </address>
  <email>joewrigley@jmac.org</email>
  <age>42</age>
</customer>
<customer>
  <first-name>Henrietta</first-name>
  <surname>Pussycat</surname>
  <address>
    <street>R.F.D. 2</street>
    <city>Flangerville</city>
    <state>NY</state>
    <zip>83642</zip>
  </address>
  <email>meow@263A.org</email>
  <age>37</age>
</customer>
</spam-document>


After reading the perldoc page for XML::Simple, you feel confident enough to write the small script shown in Listing 1.2.



Listing 1.2. A script that capitalizes customer names




# This program capitalizes all the customer names in an XML document
# generated by WarbleSoft SpamChucker.

# Turn on strict and warnings; you know the drill.
use strict;
use warnings;

# Import the XML::Simple module.
use XML::Simple;

# Read the file into a hash reference with XML::Simple's XMLin
# subroutine. The 'forcearray' option is also turned on, so that
# every element contains an array reference.
my $cust_xml = XMLin('./customers.xml', forcearray => 1);

# Loop over each customer sub-hash; they are all stored as an
# anonymous list under the 'customer' key.
for my $customer (@{ $cust_xml->{customer} }) {
    # Capitalize the contents of the 'first-name' and 'surname'
    # elements with Perl's built-in uc() function.
    foreach (qw(first-name surname)) {
        $customer->{$_}->[0] = uc($customer->{$_}->[0]);
    }
}

# Print the hash out as an XML document again, followed by a
# newline for readability.
print XMLout($cust_xml);
print "\n";


You run the program, a little nervously, since all the data belongs to your boss, and get the following output:




<opt version="3.5" timestamp="2002-05-13 15:33:45">
  <customer>
    <address>
      <state>MI</state>
      <zip>82649</zip>
      <city>Meatball</city>
      <street>17 Beable Ave.</street>
    </address>
    <first-name>JOE</first-name>
    <email>i-like-cheese@jmac.org</email>
    <surname>WRIGLEY</surname>
    <age>42</age>
  </customer>
  <customer>
    <address>
      <state>NY</state>
      <zip>83642</zip>
      <city>Flangerville</city>
      <street>R.F.D. 2</street>
    </address>
    <first-name>HENRIETTA</first-name>
    <email>meowmeow@augh.org</email>
    <surname>PUSSYCAT</surname>
    <age>37</age>
  </customer>
</opt>


Congratulations! You have written an XML-processing program, and it works. Well, almost. The output differs a little from what you might expect. Because hashes do not preserve the order of their keys, the elements come out in a different order. The root element has also been renamed opt, and the comment from the original file is gone. The spacing between elements may change too. Is any of this a problem?
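The following sketch makes these losses visible on a shortened version of Listing 1.1. It assumes XML::Simple is installed. The comment does not survive the round trip, and the root element comes back as opt unless you pass the RootName option to XMLout.

```perl
use strict;
use warnings;
use XML::Simple;

# A shortened version of Listing 1.1, including its comment.
my $xml = <<'XML';
<?xml version="1.0"?>
<spam-document version="3.5">
<!-- Autogenerated by WarbleSoft Spam, version 3.5 -->
<customer><first-name>Joe</first-name><surname>Wrigley</surname></customer>
</spam-document>
XML

my $data = XMLin($xml, ForceArray => 1);

# Default output: the root is renamed "opt" and the comment is gone.
my $plain = XMLout($data);

# RootName restores the original root element name; the comment,
# however, cannot come back because XMLin never stored it.
my $named = XMLout($data, RootName => 'spam-document');

print $plain, $named;
```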



This script highlights an important theme: the trade-off between simplicity and completeness. As a developer, you have to decide which parts of the markup are essential and which are not. Sometimes the order of elements matters, and then a module like XML::Simple is not appropriate. Or you may need access to processing instructions and want to keep them in the file; again, modules like XML::Simple cannot do that. So before committing to a particular module, you need to evaluate what it can do. In this case things are easy: you have checked with your boss and tested SpamChucker on the modified data, and everyone is satisfied. The new document is close enough to the original to meet the application's requirements. Consider yourself initiated: you have processed an XML document with Perl!



Keep in mind that this is only the beginning. Most of the book is a crash course full of tips and techniques for processing any kind of XML document. Not every XML-processing problem is as simple as the one above, but we hope you will be well prepared and will not find processing XML with Perl an insurmountable challenge.


XML Processors



Now that we have seen the simple side of XML, let's look at some of its peculiarities. You need to keep these in mind when working with XML and Perl.



This book often uses the term XML processor (frequently shortened to just processor, which has nothing to do with the central processing unit of a computer). The term is deliberately as broad as possible. A processor does not care what a program does with the document it processes. Nor does it care where the document came from or what happens to it afterward.



As you might expect, an XML processor on its own is not very interesting. A program that does something useful with an XML document uses a processor as just one of its components. The processor usually reads an XML file and, by parsing it, turns it into in-memory data structures that the rest of the program then works with.



In the Perl world, this behavior is provided by Perl modules. Typically, a program that needs to process XML loads a module with a use statement, which gives the programmer an object-oriented interface. Many XML-aware Perl programs begin processing by handing the document to XML::Parser (or a similar module). From then on, the grunt work of parsing is delegated to modules someone else wrote. The programmer's own code takes care of preprocessing and postprocessing.
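XML::Simple hides its parser from you. To see the event-driven style that underlies many of these modules, here is a minimal sketch using XML::Parser (installed from CPAN) with a Start handler that counts elements. The tiny document is invented for the example.

```perl
use strict;
use warnings;
use XML::Parser;

my %count;   # element name => number of start tags seen

my $parser = XML::Parser->new(
    Handlers => {
        # Called for every start tag with the parser object and the element name.
        Start => sub { my ($expat, $element) = @_; $count{$element}++ },
    },
);

$parser->parse('<doc><customer/><customer/><note/></doc>');

print "$_: $count{$_}\n" for sort keys %count;
```

The parser does the reading and tokenizing, and the program only supplies the callback.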



Use Existing Modules



One of Perl's strengths is its worldwide community of supporters. As soon as Perl programmers identify a problem and write a module to solve it, their work becomes available to everyone, and CPAN exists for exactly this purpose. The big advantage is that if you need some piece of Perl code, chances are someone has already written it and you can find the corresponding module on CPAN.



However, this "collective creativity" has its drawbacks when applied to a technology as young, popular, and unconventional as XML. By the time XML first took the stage, CPAN already held a variety of XML modules written by different programmers. The resulting anarchy made them sound like a discordant chorus of different structures and interfaces aimed at different goals.



Don't despair, though. The days of "anarchy and disorder," around 1998, are behind us. Today there is a certain degree of organization and standardization. The initiative came from the Perl/XML community, which began on the perl-xml mailing list hosted by ActiveState. Its members developed the first modules to build the tools they needed. In doing so they followed the rules established by other players in the XML world, such as the SAX and DOM parsing standards and related XML technologies like XPath. Low-level base parsers came later. Interesting frameworks such as XML::SAX then appeared, bringing Perl's DWIM ("do what I mean") philosophy to these standards.



Of course, if you want to use quick-and-dirty tools suited only to rough, fast jobs, you can always do so; XML::Simple is one such tool. But we will do our best to help you use standardized tools. Once you have them, you only need to start the processing and let them do their work.



A Note to the Programmer



As a rule, the XML modules on CPAN cover about 90 percent of programmers' needs. The remaining 10 percent, you might say, is what separates your company's top experts from the candidates for dismissal. We intend to earn the price of this book by showing you some of the gory details of how Perl processes XML documents at the lowest levels, as opposed to ordinary specialized text processing in Perl. To start, let's review a few basic truths to keep in mind.


Where the Document Comes From Doesn't Matter



By the time the part of a program that parses XML documents starts its work, it makes no difference where the document came from. It may have been received over a network, loaded from a database, or read from disk. All the parser cares about is the XML itself.



Keep in mind that your program as a whole still deserves attention. For example, if you write a program that implements XML-RPC, you had better know how to use TCP to fetch and send all that XML data over the Internet! You can handle the fetching and sending however you like. As long as the end product is a clean XML document, the XML processor at the core of the program does not need to care.


All XML Documents Share the Same Structural Rules



Regardless of its purpose or how it was created, every XML document must follow the same basic formatting rules: exactly one root element, no overlapping elements, all attribute values in quotes, and so on. The parser component of any XML processor has to do the same work as that of any other XML processor. This means all processors can share a common base. Perl programs that process XML usually rely on freely available parser modules rather than reimplementing the basic XML parsing routines.



Moreover, because each XML document is a single tree under one root element, processing becomes a pleasantly recursive task that does not take much time. A document pulled into another through an external entity magically becomes "just another element" in the structure of the including document. Code that works on the first document can therefore handle the included one as well, without extra effort from the programmer.


XML Applications Give Documents Their Purpose



An XML application is what gives an XML document its reason to exist. It is a set of higher-level rules that a document follows to serve some useful purpose, such as filling in a configuration file, preparing a network data transfer, or performing some other task. XML applications do more than give humble documents their meaning: they also define the structure of documents according to a particular application specification. A DTD helps enforce that structure. However, you may not always have a formal validation scheme available for your application. You may want to create some validation rules anyway, for example to make sure that your successors don't get lost in the program you developed. Those successors include you, two weeks from now. You also need a validation scheme if you want other programmers to be able to write programs that take advantage of your XML.



In practice, most XML-processing techniques revolve around this document/application duality. In most cases, the software you write will include sections that handle the following three steps:



Get the input data by whatever means suits you, for example by listening on a network socket or reading a file from disk. This is very typical of Perl: do whatever it takes to get the data.

Pass the captured input to some kind of XML processor. Usually the best choice is one of the parsers created and maintained by the Perl community, such as XML::Simple or one of the more sophisticated modules covered later.

Finally, act on the processor's output. You might produce XML (or HTML), update a database, or send e-mail to your mother. This step is where the XML application does its real work: it takes in XML and does something specific with it. This book won't explore the endless possibilities at this stage; instead, it focuses on the close connections between the XML processor and the rest of your program.
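The three steps can be sketched as three small subroutines. This is an illustration under assumptions, not a prescribed structure. XML::Simple stands in for the XML processor, the subroutine names are hypothetical, and an in-memory filehandle plays the role of the file or socket.

```perl
use strict;
use warnings;
use XML::Simple;

# Step 1 (input): read a whole document from any filehandle. The handle
# could be a file, a socket, or STDIN; the parser never needs to know.
sub read_input {
    my ($fh) = @_;
    local $/;                  # slurp mode: read everything at once
    return scalar <$fh>;
}

# Step 2 (processing): hand the text to an XML processor.
sub parse_document {
    my ($text) = @_;
    return XMLin($text, ForceArray => 1);
}

# Step 3 (action): do something with the result; here, list full names.
sub customer_names {
    my ($data) = @_;
    return map { join ' ', $_->{'first-name'}[0], $_->{surname}[0] }
           @{ $data->{customer} };
}

# An in-memory filehandle stands in for a real file or socket.
my $xml = '<doc><customer><first-name>Joe</first-name><surname>Wrigley</surname></customer></doc>';
open my $fh, '<', \$xml or die "cannot open in-memory handle: $!";

my @names = customer_names(parse_document(read_input($fh)));
print "$_\n" for @names;
```

Keeping the steps separate means you can swap the input source or the parser module without touching the other two parts.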


XML Peculiarities



This section touches on topics that come up throughout the book. They are related to the problems that arise when processing XML documents.


Well-Formedness



XML has a built-in quality control system. A document must satisfy a minimal set of syntax rules to count as well-formed XML. Most parsers refuse to process a document that breaks any of these rules, so you must make sure your input data is of acceptable quality.
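A quick way to check well-formedness is to let a parser try the document. This minimal sketch assumes XML::Parser is installed from CPAN. It relies on the fact that XML::Parser dies on the first syntax error. The helper is_well_formed is a hypothetical name.

```perl
use strict;
use warnings;
use XML::Parser;

# Returns 1 if the text parses as well-formed XML, 0 otherwise.
# XML::Parser dies on the first syntax error, so the parse runs inside eval.
sub is_well_formed {
    my ($text) = @_;
    return eval { XML::Parser->new->parse($text); 1 } ? 1 : 0;
}

for my $sample (
    '<a><b>ok</b></a>',      # well-formed
    '<a><b>bad</a></b>',     # overlapping elements
    '<a/><b/>',              # two root elements
    '<a x=1/>',              # unquoted attribute value
) {
    printf "%-20s %s\n", $sample,
        is_well_formed($sample) ? 'well-formed' : 'NOT well-formed';
}
```

After a failed parse, $@ holds the parser's error message, including the line and column of the problem.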


Character Encodings



Life in the 21st century requires attention to issues such as character encodings. The days when web content on the Internet was encoded in ASCII are gone for good. Today's hero is Unicode, which underlies all the major character sets used on the Web. XML works with Unicode characters. There are several ways to encode them, but UTF-8 is the one used most often in Perl. You will rarely need to think about these matters, but you should know what options are available.
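The difference between bytes and characters is easy to see with Perl's core Encode module. This small sketch decodes a hand-written UTF-8 byte string and encodes it back. The sample word is arbitrary.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The word "cafe" with an accented e, as raw UTF-8 bytes, the way it might
# arrive from a file or socket: the accented e is the two bytes 0xC3 0xA9.
my $bytes = "caf\xC3\xA9";

# Decode into a Perl character string, where the accented e is one character.
my $chars = decode('UTF-8', $bytes);
printf "%d bytes, %d characters\n", length($bytes), length($chars);

# Encode back to UTF-8 bytes before writing the text out again.
my $round_trip = encode('UTF-8', $chars);
print "round trip ok\n" if $round_trip eq $bytes;
```

This prints "5 bytes, 4 characters" and then "round trip ok".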


Namespaces



Not everyone has worked with namespaces. Namespaces divide markup into separate areas, keeping tags, markup, and declarations apart, so you can mix and match different types of documents. Whether you are embedding equations in HTML or treating markup as data in XSLT, namespaces are the right tool. Keep in mind that only recently developed modules support namespaces.
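Here is a minimal sketch of namespace processing, assuming XML::Parser is installed. With the Namespaces option turned on, handlers receive local names, and the parser's namespace method reports the URI each name belongs to (or undef). The sample document is made up and borrows the XHTML namespace URI.

```perl
use strict;
use warnings;
use XML::Parser;

my @seen;   # one description per element, in document order

my $parser = XML::Parser->new(
    Namespaces => 1,   # let expat split prefixed names into URI + local name
    Handlers   => {
        Start => sub {
            my ($expat, $name) = @_;
            my $uri = $expat->namespace($name);   # undef if no namespace
            push @seen, defined $uri ? "$name in $uri" : "$name (no namespace)";
        },
    },
);

$parser->parse(<<'XML');
<doc xmlns:h="http://www.w3.org/1999/xhtml">
  <h:p>Markup from XHTML</h:p>
  <note>Plain element</note>
</doc>
XML

print "$_\n" for @seen;
```

The h: prefix itself is gone. What identifies the element is the URI, so a document could bind a different prefix to the same namespace and the handler would see the same result.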


Declarations



Strictly speaking, declarations are not part of the document; they simply describe it. Accept this as a fact of life and don't give it too much thought. Just remember that documents often use DTD declarations, and may also include declarations for things such as entities and attributes. Keep this in mind so that it doesn't catch you out later.


Entities



Entities and entity references seem simple enough: they stand in for content that you would rather not type at that point. The content may live in another file, or it may contain characters that are hard to type. Sometimes you need to resolve entity references, and sometimes it is better not to. Sometimes the parser reads the declarations, and sometimes it ignores them. Entities can contain other entities, nested to any depth. In any case, keep an eye on the situation to avoid problems later.
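As an illustration, this sketch declares an internal entity in the DTD. It assumes XML::Parser is installed. It shows that, by default, the parser replaces the reference with the entity's text before handing character data to the Char handler. The entity name vendor is hypothetical.

```perl
use strict;
use warnings;
use XML::Parser;

my $text = '';

my $parser = XML::Parser->new(
    Handlers => {
        # Character data can arrive in several chunks, so append each one.
        Char => sub { my ($expat, $string) = @_; $text .= $string },
    },
);

# "vendor" is an internal entity declared in the document's own DTD.
$parser->parse(<<'XML');
<!DOCTYPE note [
  <!ENTITY vendor "WarbleSoft">
]>
<note>Generated by &vendor; SpamChucker</note>
XML

print "$text\n";
```

The handler never sees &vendor; itself, only its replacement text.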


Whitespace



Under XML's rules, everything that is not a markup tag counts as meaningful character data. This can lead to some unexpected results. For example, it is not always clear what happens to whitespace. By default, an XML processor keeps every character. That includes newlines inserted between tags to make the code easier to read and spaces used for indentation. Some parsers let you ignore whitespace, but there are no hard-and-fast, simple rules for this.
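The following sketch collects every piece of character data the parser reports for a small indented document. It assumes XML::Parser is installed. The indentation and newlines around the age element arrive as data alongside the 42.

```perl
use strict;
use warnings;
use XML::Parser;

my @chunks;   # every piece of character data, in document order

my $parser = XML::Parser->new(
    Handlers => { Char => sub { my ($expat, $string) = @_; push @chunks, $string } },
);

$parser->parse("<customer>\n  <age>42</age>\n</customer>");

my $all = join '', @chunks;
my $ws  = ($all =~ tr/ \n//);    # count spaces and newlines without changing $all
printf "%d characters of data, %d of them whitespace\n", length($all), $ws;
```

This prints "6 characters of data, 4 of them whitespace". Whether those four characters matter is a decision the program has to make, not the parser.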



To sum up this overview: Perl and XML complement each other excellently. There are some pitfalls along the way, but thanks to the many modules developed by the community, exploring what Perl and XML can do together should be easy and enjoyable.