
The DOM specification and DOM functions in PHP


Introduction: the specifications of XML technologies


The many specifications around XML are aimed first of all at ordering and standardizing approaches to working with data in XML format. At present there are XML, XLink, XSL, namespaces, the information set, XML Linking, the XPointer model, XPointer namespaces, xptr() XPointer, XSLT, XPath, XSL FO, DOM, SAX, the processing instruction for associating a style sheet, XML Schema, XQuery, XML Encryption, Canonical XML, XML Signature, DOM Level 2 and DOM Level 3 (the list is taken from the article "Happy birthday, XML!").


What is DOM


DOM stands for Document Object Model. "Object" here is meant in the programmer's sense: an artifact of OOP, with everything we like it for.


Let's look at the source of an XML document:



<?xml version="1.0" encoding="windows-1251"?>
<root language="russian">
        <title>XML: the DOM specification and DOM functions in PHP</title>

        <text>The many specifications around <acronym>XML</acronym>
        are aimed first of all at <b>ordering</b> and standardizing
        approaches to working with data in the format <acronym>XML</acronym>.
        </text>

        <date>2003-05-12</date>

        <raw-code>
           <![CDATA[<br> an example of markup that is not well-formed: <p> bla-bla </p>]]>
        </raw-code>

        <!-- it would be good to add live examples to the article... -->
</root>


The basic idea of XML is that a document is a set of nodes forming a tree-like data structure. The document above can be represented as the following tree:


-o- Document
  |
  +-o- Element root
    |
    +-o- Attribute language
    |
    +-o- Element title
    | |
    | +-o- Text node ("XML: the DOM specification...")
    |
    +-o- Element text
    | |
    | +-o- Text node ("The many...")
    | |
    | +-o- Element acronym
    | | |
    | | +-o- Text node ("XML")
    | |
    | +-o- Text node ("are aimed first of all...")
    | |
    | +-o- Element b
    | | |
    | | +-o- Text node ("ordering")
    | |
    | +-o- Text node ("and standardizing...")
    | |
    | +-o- Element acronym
    | | |
    | | +-o- Text node ("XML")
    | |
    | +-o- Text node (".")
    |
    +-o- Element date
    | |
    | +-o- Text node ("2003-05-12")
    |
    +-o- Element raw-code
    | |
    | +-o- CDATA section ("<br>...")
    |
    +-o- Comment ("it would be good to add...")


In the diagram, the "-o-" sign marks nodes. The text to its right gives the node type. For text nodes, the CDATA section and the comment, the content is added as well to make orientation easier. Strictly speaking, the line breaks between elements are text nodes too, and they could also have been shown in the diagram.

So, let's go through the diagram. Everything in the document is a node, and the document itself is a node too. This means there is a "node" class of objects, and the other classes ("document", "element", "text node", "CDATA", "comment") are derived from it and inherit its properties and methods. Which properties and methods each class must have is described in the DOM specification.

If you look at the documentation for the DOM XML module (not much of a yardstick, I know :)), you can see that all these different nodes have a lot in common: the DomNode class has 28 methods, and together with the derived classes there are 62. As you might guess, the methods and properties of DomNode are present in the other classes as well.

The phpPatterns() site recently (9.4.3) published Harry Fuecks's article "A Rough Outline of the DOM XML Module in PHP". Those who read English can go to the original; for everyone else I offer a coarsened version of the rough outline.

The article includes an illustration of the relationships between the classes of the DOM XML module. Here is the class tree in my own rendering:


o-DomNode
|
+-o-DomAttribute
|
+-o-DomCData
| |
| +-o-DomComment
| |
| +-o-DomDTD
| |
| +-o-DomText
|
+-o-DomDocument
|
+-o-DomDocumentType
|
+-o-DomElement
|
+-o-DomEntity
|
+-o-DomEntityReference
|
+-o-DomProcessingInstruction


The article goes on to note that the DOM XML module does not yet fully comply with the specification (and everyone has already had time to use the non-standard functions, so in many applications they now have to be hunted down and the code rewritten), and that many of the module's memory leaks will be fixed in version 4.3.2 (which has not been released yet and is at the release-candidate stage). But these are the little things in life. Those who have used DOM XML for a long time are used to it, and if you are only getting acquainted with it, you will start using it on real tasks by the time it is stable and conforms to the specification. So let's continue getting to know DOM and the module.

The DOM specification describes which objects must be present in applications that work with XML, which methods those objects must have, and how they must affect the nodes of a document. As a result, in Java, JavaScript and other systems that already support DOM, XML documents have the same interface, differing only in function names. It is frightening to imagine what would happen if developers each started inventing a model of their own.

Working with a document in PHP



Cyrillic support


The standard provides for working with data encoded in UTF-8, so all functions require their input data to be converted to UTF-8, and they produce UTF-8 on output as well. Use the iconv function for the conversion.
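The round trip described above can be sketched as follows. This is only an illustration: the sample word is my own, and its windows-1251 bytes are written as escapes so that the encoding of the script file itself does not matter.

```php
<?php
// "Привет" as windows-1251 bytes (hypothetical sample data).
$cp1251 = "\xCF\xF0\xE8\xE2\xE5\xF2";

// Convert to UTF-8 before passing the string to any DOM function.
$utf8 = iconv("windows-1251", "UTF-8", $cp1251);

// Convert DOM output back for a page that is served in windows-1251.
$back = iconv("UTF-8", "windows-1251", $utf8);

// Each Cyrillic letter takes two bytes in UTF-8 and one in windows-1251.
echo strlen($cp1251), " bytes -> ", strlen($utf8), " bytes\n";
```

Latin-only strings come out of iconv unchanged, which is why the conversion is easy to forget until the first Cyrillic value reaches the document.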


A modified php_domxml library with Russian support is available at dan.phpclub.net. It can create a document object from a file or from a string whose XML declaration carries the corresponding attribute:



<?xml version="1.0" encoding="windows-1251"?> Russian text


Its dump_mem function also outputs text in the windows-1251 encoding, and that is where the conveniences end: any other data must be converted to UTF-8 before it is entered into the document.

Creating a document


A document object can be created from an existing file, from a text string, or as a brand-new empty document.



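<?php
// From an existing file
$dom1 = domxml_open_file("c:/xml/existing_file.xml");
// From a string that holds the XML text
$dom2 = domxml_open_mem($string);
// A new empty document; the argument is the XML version
$dom3 = domxml_new_doc("1.0");
?>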


On error, all of these functions return false instead of an object, so checking the result of the operation is simple.


By default, creating a document checks its syntax (well-formedness) but not its validity (conformance to the document's DTD or XML Schema). To check validity as well, pass the constant DOMXML_LOAD_VALIDATING as the second parameter of the loading function (domxml_open_file or domxml_open_mem); this parameter is not documented yet:



<?php
$dom2 = domxml_open_mem($string, DOMXML_LOAD_VALIDATING);
?>
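The domxml module described here shipped with PHP 4. For readers on PHP 5 or later, the same "load, then compare the result with false" check might look like the sketch below. It uses the DOM extension (DOMDocument) that replaced domxml, which is my assumption about the reader's environment and not something this article covers; the sample XML strings are made up.

```php
<?php
// Assumption: PHP 5+ DOM extension (DOMDocument), not the PHP 4 domxml module.
libxml_use_internal_errors(true);   // collect parser errors instead of printing warnings

$dom = new DOMDocument();
if ($dom->loadXML('<root language="russian"><title>DOM</title></root>') === false) {
    throw new RuntimeException("The document is not well-formed");
}

// A document with an unclosed <title> tag: loadXML() reports failure with false.
$broken = new DOMDocument();
$ok = $broken->loadXML('<root><title>unclosed</root>');
libxml_clear_errors();

echo $dom->documentElement->nodeName, "\n";
var_dump($ok);
```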



Getting an element object


Once the document has been created, PHP keeps the objects of all its elements in memory. They do not appear in the script's variables, however, unless you ask for them explicitly.


The root element of the document can be obtained by calling the document object's document_element method. It returns an object of class DomElement, which can be passed as an argument to another function or stored in a variable:



<?php
$root = $dom1->document_element();
?>


Any node of the document can be obtained in a similar way, using the methods of the document object or of element objects.



<?php
// Array of the child nodes of root
$root_child = $root->child_nodes();

for ($i = 0; $i < sizeof($root_child); $i++)
        print("$i. " . $root_child[$i]->node_type() . " " . $root_child[$i]->node_name() . "<br/>");

// First and last child nodes
$first_child = $root->first_child();
$last_child = $root->last_child();

print($first_child->node_name() . " and " . $root_child[0]->node_name() . " - the same<br/>");
print($last_child->node_name() . " and " . $root_child[sizeof($root_child) - 1]->node_name() .
        " - also coincide<br/>");

// The node that follows the first one;
// previous_sibling works the same way
$second_child = $first_child->next_sibling();

print($second_child->node_name() . " " . $root_child[1]->node_name() . "<br/>");
?>

When walking through child nodes it is important to check node types, because the line breaks inserted for ease of reading and editing also become nodes of the document and therefore end up in the array of child nodes.



<?php
for ($i = 0; $i < sizeof($root_child); $i++)
        if ($root_child[$i]->node_type() == XML_ELEMENT_NODE)
                // For illustration the text is re-encoded here, although for
                // Latin characters this is not necessary
                $root_child[$i]->set_attribute("makes-sense", iconv("windows-1251",
                                        "UTF-8", "maybe"));
        else
                print("$i - a node of type " . $root_child[$i]->node_type());
?>
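To see the whitespace nodes for yourself, here is a small sketch. It assumes the PHP 5+ DOM extension (DOMDocument) in place of the PHP 4 domxml module used in this article, and the two-element document is invented for the illustration.

```php
<?php
// Assumption: PHP 5+ DOM extension, not the PHP 4 domxml module.
$dom = new DOMDocument();
$dom->loadXML("<root>\n  <title>DOM</title>\n  <date>2003-05-12</date>\n</root>");
$root = $dom->documentElement;

// childNodes also contains the line breaks between the elements,
// each of them as a text node.
$types = [];
foreach ($root->childNodes as $child) {
    $types[] = $child->nodeType;
}

// Touch only the element nodes and skip the whitespace text nodes.
$names = [];
foreach ($root->childNodes as $child) {
    if ($child->nodeType === XML_ELEMENT_NODE) {
        $child->setAttribute("makes-sense", "maybe");
        $names[] = $child->nodeName;
    }
}

echo count($types), " child nodes, elements: ", implode(", ", $names), "\n";
```

Two elements produce five child nodes here: the text nodes before, between and after them are counted too.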


Sometimes, however, you cannot be sure at all that you have received a node object rather than false or null. Calling a method on such a value puts a warning line straight into the resulting document. To avoid this, check the type of the value with the get_class function.


You may be unsure of the result when, for example, you fetch an element from the document with an XPath expression. To get the element you need there is, of course, no point in walking through every element of the document in search of it. XPath expressions exist for exactly this purpose; they are the same expressions that XSLT uses to address nodes of the document being transformed (the select and match attributes).



<?php
/* Create an XPath context. The argument is the document object in which
        the XPath expressions will be evaluated. */
$context = xpath_new_context($dom1);

/* Evaluate an expression and store the result in $result */
$result = xpath_eval($context, "/root/text/acronym");

var_dump($result);

/* $result is an object of class XPathObject; its nodeset property is an
        array containing the objects of the elements found. */
for ($i = 0; $i < sizeof($result->nodeset); $i++)
{
        $text = $result->nodeset[$i]->first_child();
        print(iconv("UTF-8", "windows-1251", $text->node_value()) . "<br/>");
}

/* Get a scalar value with XPath (count all elements in the document
        except root) */
$result = xpath_eval($context, "count(/root//*)");

var_dump($result);
print("<br/>{$result->value}");
?>
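For comparison, the same two kinds of XPath result (a node set and a scalar) can be sketched with the PHP 5+ DOM extension. DOMXPath is my substitution for xpath_new_context and xpath_eval, not the API this article describes, and the sample document is invented.

```php
<?php
// Assumption: PHP 5+ DOM extension (DOMDocument, DOMXPath).
$dom = new DOMDocument();
$dom->loadXML('<root><text>About <acronym>XML</acronym> and <acronym>DOM</acronym></text>'
            . '<date>2003-05-12</date></root>');

$xpath = new DOMXPath($dom);

// A node-set expression: query() returns a list of the matching nodes.
$found = [];
foreach ($xpath->query("/root/text/acronym") as $node) {
    $found[] = $node->textContent;
}

// A scalar expression: evaluate() returns the number itself (as a float).
$count = $xpath->evaluate("count(/root//*)");

echo implode(", ", $found), "; elements below root: ", (int) $count, "\n";
```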


It is important to remember the XML namespaces that documents may use. If you want to evaluate expressions in documents that contain elements from namespaces (XSLT documents, for example), you need to declare the namespace. Otherwise you will not be able to use names such as "xsl:template" in an expression.


(The namespace URI passed to the function must match the one given in the document exactly; otherwise the XPath parser will consider that two different namespaces are registered under the same prefix xsl.)



<?php
$xslt = domxml_open_file("c:/xml/custom.xslt");
$context = xpath_new_context($xslt);

/* Register the xsl namespace in the XPath context */
xpath_register_ns($context, "xsl", "http://www.w3.org/1999/XSL/Transform");

/* Count the templates in the XSLT stylesheet */
$result = xpath_eval($context, "count(/xsl:stylesheet/xsl:template)");

print($result->value);
?>
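The point about the URI having to match can be checked with a short sketch. It assumes the PHP 5+ DOM extension (DOMXPath::registerNamespace in place of xpath_register_ns); the inline stylesheet and the URI http://example.com/not-xslt are hypothetical and exist only for this illustration.

```php
<?php
// Assumption: PHP 5+ DOM extension (DOMDocument, DOMXPath).
$dom = new DOMDocument();
$dom->loadXML('<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
            . '<xsl:template match="/"/><xsl:template match="title"/>'
            . '</xsl:stylesheet>');

$xpath = new DOMXPath($dom);

// The prefix is bound to exactly the URI used in the document.
$xpath->registerNamespace("xsl", "http://www.w3.org/1999/XSL/Transform");
$templates = $xpath->evaluate("count(/xsl:stylesheet/xsl:template)");

// A prefix bound to a different (hypothetical) URI names a different
// namespace, so the same path matches nothing.
$xpath->registerNamespace("x", "http://example.com/not-xslt");
$none = $xpath->evaluate("count(/x:stylesheet/x:template)");

echo (int) $templates, " templates, ", (int) $none, " with the wrong URI\n";
```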


So, the problem of getting the object of the required element is solved. Now, what to do with it.

Copying elements


While the DOM XML module did not quite conform to the DOM specification, this was as easy as could be: you got an element object and appended it to an element of another document. Now you first have to clone the element with the clone_node function. The following code copies the child nodes of the first document's root into the root of the second.



<?php
$root1 = $dom1->document_element();
$child = $root1->child_nodes();

$root2 = $dom2->document_element();

for ($i = 0; $i < sizeof($child); $i++)
        $root2->append_child($child[$i]->clone_node());
?>
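In the PHP 5+ DOM extension the copy step is done with a different call, importNode, which creates a copy of a node that belongs to the target document. The sketch below is therefore an analogue of the code above and not a translation of it; the two small documents are invented.

```php
<?php
// Assumption: PHP 5+ DOM extension; importNode() stands in for clone_node().
$dom1 = new DOMDocument();
$dom1->loadXML('<root><title>One</title><date>2003-05-12</date></root>');

$dom2 = new DOMDocument();
$dom2->loadXML('<root/>');
$root2 = $dom2->documentElement;

foreach ($dom1->documentElement->childNodes as $child) {
    // The second argument (true) asks for a deep copy, children included.
    $root2->appendChild($dom2->importNode($child, true));
}

echo $dom2->saveXML($root2), "\n";
```

The source document is left untouched: the nodes are copied, not moved.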



Creating new nodes in the document


You have noticed that the text says "element" in some places and "node" in others. I hope the class inheritance diagram (see above) has made it clear that an element is a tag, while a node is a more general concept that covers everything there is. I try to use each word in its proper place to avoid ambiguity.


Any node is inserted into a document in two steps. The first is creating the node. A node must be created inside the document into which it will be inserted. Then the node is appended as a child of one of the document's nodes. For attributes, which strictly speaking are nodes too, there is a more convenient construct.


The PHP documentation I downloaded recently has a list of functions that existed in previous versions but did not conform to the DOM specification, with descriptions of how they work. You can study them, but using them is not recommended. The behaviour of some constructs has changed from version to version as it is, so it is not worth relying on anything that is slated for removal.



<?php
/* The root element is added in the same way as any other node. */
$dom3 = domxml_new_doc("1.0");

/* create_element creates a node of type "element" */
$root3_new = $dom3->create_element("root");

/* Now the created element is appended to the document. Nothing prevents
        you from passing the result of create_element straight to the
        document rather than through the variable $root3_new. */
$root3 = $dom3->append_child($root3_new);

$title = $root3->append_child($dom3->create_element("title"));

/* create_text_node creates a text node. We append it as the content of
        the title element. Storing the appended node in a variable is only
        necessary if you want to work with it afterwards. */
$title->append_child($dom3->create_text_node("Creating new nodes in the document"));
?>
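The same two steps (create the node in the target document, then append it) can be sketched with the PHP 5+ DOM extension, assuming that is what you have installed in place of domxml.

```php
<?php
// Assumption: PHP 5+ DOM extension, not the PHP 4 domxml module.
$dom = new DOMDocument("1.0", "UTF-8");

// Step 1: create the node inside the document. Step 2: append it.
$root  = $dom->appendChild($dom->createElement("root"));
$title = $root->appendChild($dom->createElement("title"));
$title->appendChild($dom->createTextNode("Creating new nodes in the document"));

echo $dom->saveXML($root), "\n";
```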


Nodes of other types are created and inserted into the document in the same way.

Changing nodes


Formally, such methods... are not provided. Attributes do have a method that changes their content. There are elements that have no child elements and contain only a CDATA section, text with entities, or a comment. These can be changed by removing the existing nodes and inserting new ones. For elements whose children are mixed with text, an editing method would make no sense at all.


Attributes are created and changed through methods of the element objects that contain them. According to the specification they should also be created and added to elements with the create_attribute and append_child functions, but none of this is implemented yet in PHP 4.3.1 (4.3.2).



<?php
// An attribute value can be set through the object of its element
$root3->set_attribute("language", "Russian");

// This is what the documentation says, but it does not work.
$root2->append_child($dom2->create_attribute("language", "Russian"));
?>


The explanation of why it does not work is "not implemented yet". At the same time the suggestion is to use the latest version from CVS, as if we had no other worries; it is clear, of course, that DOM XML is no longer for average minds, but having to obtain it that way is too much. It is also suggested to use the set_attribute_node function. In some cases this is inconvenient: for example, when the type of the node being inserted into an element is not known in advance (it may turn out to be text or an attribute) and you would like to use a single function. For now that is impossible, and an if-else construct is required.


Replacing a single text node means removing the existing one and inserting a new one. If you are not sure that the element contains only a text node and nothing else, you can clone the element, insert the clone into the parent, and then remove the original. The result is a "clean" element with no children, although the attributes will not be copied either.



<?php
// $target is a variable holding the element to change

// Get the parent node.
$parent = $target->parent_node();

// Insert into it a clone of the node we need.
$new_target = $parent->append_child($target->clone_node(false));

// Remove the old element.
$parent->remove_child($target);

// Insert the required text into the new element.
$new_target->append_child($dom->create_text_node(iconv("windows-1251",
        "UTF-8", "Replacing a node means removing the existing one and inserting a new one.")));
?>


Changing text nodes or CDATA sections within a complex combination of elements is simple too: get the required object, insert the new node before it, and remove the old one.



<?php
$new_node = $target_node->insert_before($dom->create_text_node(iconv("windows-1251",
        "UTF-8",
        "Replacing a node means removing the existing one and inserting a new one.")),
        $target_node);

$parent = $target_node->parent_node();
$parent->remove_child($target_node);
?>


The last two lines could be replaced by a single one, $target_node->unlink_node(), but since that function does not conform to the standard it may be removed, so it is better not to use it in examples.
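For what it is worth, the DOM specification itself has a replaceChild method that does the insert and the removal in one call. Here is a sketch with the PHP 5+ DOM extension, which I am assuming in place of domxml; the sample element is invented.

```php
<?php
// Assumption: PHP 5+ DOM extension, not the PHP 4 domxml module.
$dom = new DOMDocument();
$dom->loadXML('<root><text>old text <b>bold</b> tail</text></root>');

$text   = $dom->getElementsByTagName("text")->item(0);
$target = $text->firstChild;   // the text node "old text "

// replaceChild(new, old) puts the new node where the old one was
// and detaches the old one; the sibling nodes stay in place.
$text->replaceChild($dom->createTextNode("new text "), $target);

echo $dom->saveXML($text), "\n";
```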

XSL transformations in the DOM XML module


An XSLT stylesheet is an XML document too. It is read from a file (or a string) in the same way a document object is created, or it is created from an XML document object. Then the process method is called with the object of the document to be transformed as its argument, and the output is an XML document object.



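<?php
// domxml_xslt_stylesheet_file reads the stylesheet from a file;
// domxml_xslt_stylesheet_doc expects a document object instead
$xslt = domxml_xslt_stylesheet_file("c:/xml/custom.xslt");
$dom = domxml_open_file("c:/xml/existing_file.xml");

$final = $xslt->process($dom);

print($final->dump_mem());
?>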


Conclusion


The object approach to documents is a step forward, our bright future. The DOM XML module provides a programming interface with capabilities that cannot be achieved with a SAX parser in a PHP script. It removes the uncertainty about how the characters in an XML document you have produced will be treated when it is parsed by an XSLT processor or another handler. For example, the problem of inserting text into document elements does not exist in DOM XML, whereas when working with the document as text you have to check and filter special characters. Text in a DOM object is just text; if it is simply pasted into the document as a string, the characters < and > turn into tags, and entities, if they are not declared, can cause an error.
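The point about special characters is easy to check. The sketch below assumes the PHP 5+ DOM extension in place of domxml, and the sample string is made up; it inserts text that looks like markup and then serializes the element.

```php
<?php
// Assumption: PHP 5+ DOM extension, not the PHP 4 domxml module.
$dom  = new DOMDocument();
$root = $dom->appendChild($dom->createElement("root"));

// The string contains characters that would be markup in raw XML text.
$raw = '<b>AT&T</b> is > 5';
$root->appendChild($dom->createTextNode($raw));

// On output the serializer escapes them, so no <b> element appears...
$xml = $dom->saveXML($root);

// ...while the DOM still holds the text exactly as it was inserted.
$same = ($root->textContent === $raw);

echo $xml, "\n";
```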


Yes, the DOM XML module is still raw. Not all functions are implemented, and the latest release will always lag behind what the developers have in CVS. The documentation lags far behind the releases as well. However, the developers actively communicate with users, and the module is open to innovations and improvements. So it is worth mastering its functions right now, so that by the time of your first commercial project that uses XML you have a store of knowledge and experience of building simple sites with XML.





© Web Development Company Conkurent, LLC 2008-2009. All rights reserved.