Using XML in PHP
Here is the plan. First we will find out which functions PHP provides for working with XML and how to use them. To make this easier to understand, we will look at a small script that displays the structure of an XML document.
Let's get started. I do not want to give a long, tedious general talk about working with XML in PHP; it is better to work through an example. So, the task: write a script that shows the structure of an XML document. In the examples this is the file xml.php.
First we create the XML document (test.xml in the examples). This file will describe photos. We will not overcomplicate things and will do without a DTD (not to be confused with DDT :)). Here we meet the first unpleasant feature of PHP: XML documents processed by a script can only be written in the following encodings: US-ASCII, ISO-8859-1 and UTF-8. Since we need to describe the photos in Russian, we have to choose the last one, because the first two have no Russian letters. Not all text editors can work with this encoding. I, for example, typed the XML in the SciTE editor. It is small, free, and has good syntax highlighting (including PHP and XML). Our XML document looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<album>
<foto smallfoto = "Fotos/1smallvelo.jpg" bigfoto = "Fotos/1bigvelo.jpg">
<title> the Name 1 </title>
<comment> the Long comment
For some lines 1 </comment>
<date> 26.05.2003 </date>
<color/>
<detailed> 0 </detailed>
</foto>
<foto smallfoto = "Fotos/smallbardak.jpg" bigfoto = "Fotos/bigbardak.jpg">
<title> the Name 2 </title>
<comment> the Long comment
For some lines 2 </comment>
<date> 27.05.2003 </date>
<color/>
<detailed> 1 </detailed>
</foto>
</album>
The actual meaning of the tags in this XML does not matter right now (though it should all be clear anyway). The only note: <color/> could indicate whether the photo is in color or not. It is here only as an example of a tag that has no closing tag.
Now let's write a script that shows the structure of the XML document. PHP has more than 20 functions for working with XML. To begin with, we will look at the most essential ones. Here is the script:
<?php
$xmlfilename = "test.xml";
$code = "UTF-8"; // Encoding of the XML file
$curcode = "Windows-1251"; // Encoding used for output
$level = 0; // Nesting level
$list = array(); // List of elements in the XML file
// Converts a string from Unicode to the output encoding
function encoding($str)
{
    global $code;
    global $curcode;
    $str = mb_convert_encoding($str, $curcode, $code);
    return $str;
}
// Prints the indentation for the current nesting level
function drawspace()
{
    global $level;
    for ($i = 0; $i < $level * 10; $i++)
    {
        // Non-breaking space, so the browser does not collapse the indentation
        echo "&nbsp;";
    }
}
// Handles the text between tags
function characterhandler($parser, $data)
{
    drawspace();
    $data = encoding($data);
    $data = trim($data) . "<br>";
    echo $data;
}
// Handles opening tags
function starthandler($parser, $name, $attribs)
{
    global $level;
    global $list;
    $name = encoding($name);
    $list[] = $name;
    drawspace();
    echo "&lt;<font color='blue' size='+1'>$name</font> ";
    foreach ($attribs as $atname => $val)
    {
        echo encoding(" $atname => $val ");
    }
    echo "&gt;<br>";
    $level++;
}
// Handles closing tags
function endhandler($parser, $name)
{
    global $level;
    global $list;
    array_pop($list);
    $level--;
    drawspace();
    echo "&lt;<font color='blue' size='+1'>/$name</font>&gt;<p>";
}
// Create the parser
$parser = xml_parser_create($code);
if (!$parser)
{
    exit("Cannot create the parser");
}
else
{
    echo "Parser created successfully<p>";
}
// Set the handlers for tags and for the text between them
xml_set_element_handler($parser, 'starthandler', 'endhandler');
xml_set_character_data_handler($parser, 'characterhandler');
// Open the XML file
$fp = fopen($xmlfilename, "r");
if (!$fp)
{
    xml_parser_free($parser);
    exit("Cannot open the file");
}
while ($data = fread($fp, 4096))
{
    if (!xml_parse($parser, $data, feof($fp)))
    {
        die(sprintf("An error occurred: %s in line %d",
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)));
    }
}
fclose($fp);
xml_parser_free($parser);
?>
After declaring the auxiliary functions, the first thing to do is create the parser. This can be done with one of two functions: xml_parser_create or xml_parser_create_ns. The first takes one optional parameter that specifies the encoding in which the XML document is written. If it is omitted, the document is assumed by default to be in ISO-8859-1. But, as I wrote above, that does not suit us, so we choose UTF-8. Since we will need the name of this encoding again, we put it in a global variable ($code = "UTF-8";). We also put there the encoding in which the text will be output to the browser ($curcode = "Windows-1251";). The function xml_parser_create_ns has an additional (also optional) parameter that specifies the character used as the namespace separator. Since we do not need that now, we used the first function. If the parser is created successfully, the variable $parser receives a value other than false.
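As a minimal sketch of this step, the snippet below creates both kinds of parser and reads back the configured encoding. The "|" separator and the variable names $nsParser and $target are arbitrary choices for illustration and do not come from the script above.

```php
<?php
// Sketch only: create a parser for UTF-8 and check the result.
$code = "UTF-8";
$parser = xml_parser_create($code);
if (!$parser) {
    exit("Cannot create the parser");
}

// Namespace-aware variant: the second argument is the separator character.
// "|" is an arbitrary choice made for this illustration.
$nsParser = xml_parser_create_ns($code, "|");

// The target encoding can be read back through the parser options.
$target = xml_parser_get_option($parser, XML_OPTION_TARGET_ENCODING);
echo "Target encoding: $target\n";
```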
After that we have to tell the XML parser which functions to call when it encounters XML tags in the text. In our example this is done like this:
// Set the handlers for tags and for the text between them
xml_set_element_handler($parser, 'starthandler', 'endhandler');
xml_set_character_data_handler($parser, 'characterhandler');
The function xml_set_element_handler sets the handlers for opening and closing tags. Its first parameter is the parser we created earlier. The second and third are the names of the functions that will be called when an opening and a closing tag are encountered, respectively. These functions must be declared in a specific way. The function for opening tags should look roughly like this:
// Handles opening tags
function starthandler($parser, $name, $attribs)
{
}
When it is called, it receives the parser we created, the name of the tag being processed, and its attributes (what comes inside the angle brackets after the name). The name needs no special explanation, while the attributes are passed as an associative array, i.e. as key => value pairs. That is why we process them like this:
foreach ($attribs as $atname => $val)
{
    echo encoding(" $atname => $val ");
}
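To make the shape of the attribute array concrete, here is a small self-contained sketch. The handler names collectAttribs and ignoreClose, the $seen array and the one-tag document are hypothetical and exist only for this illustration. Note that with the parser's default settings attribute names are upper-cased in the same way as tag names.

```php
<?php
// Sketch only: collect the attributes of every opening tag into $seen.
$seen = array();

function collectAttribs($parser, $name, $attribs)
{
    global $seen;
    foreach ($attribs as $atname => $val) {
        $seen[$atname] = $val; // key is the attribute name, value is its text
    }
}

function ignoreClose($parser, $name)
{
    // Nothing to do for closing tags in this sketch.
}

$parser = xml_parser_create("UTF-8");
xml_set_element_handler($parser, 'collectAttribs', 'ignoreClose');
xml_parse($parser, '<foto smallfoto="a.jpg" bigfoto="b.jpg"/>', true);

print_r($seen);
```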
The same goes for closing tags, except that the function is not passed any attributes, which a closing tag cannot have in the first place:
function endhandler($parser, $name)
{
}
There is one interesting detail here. Even if a tag has no closing tag, the second function is still called. If you look at the script's output, you will see that for the tag <color/> we got:
<COLOR>
</COLOR>
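The upper-case names in this output come from the parser's case folding, which is switched on by default. The following sketch compares the two settings; the helper collectNames and the handlers recordStart and recordEnd are hypothetical names used only here.

```php
<?php
// Sketch only: record tag events with and without case folding.
$names = array();

function recordStart($parser, $name, $attribs)
{
    global $names;
    $names[] = "<$name>";
}

function recordEnd($parser, $name)
{
    global $names;
    $names[] = "</$name>";
}

// Parses $xml and returns the recorded events; $fold toggles case folding.
function collectNames($xml, $fold)
{
    global $names;
    $names = array();
    $parser = xml_parser_create("UTF-8");
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $fold ? 1 : 0);
    xml_set_element_handler($parser, 'recordStart', 'recordEnd');
    xml_parse($parser, $xml, true);
    return $names;
}

$folded = collectNames("<album><color/></album>", true);
$plain = collectNames("<album><color/></album>", false);
print_r($folded);
print_r($plain);
```

In both runs the empty tag <color/> produces an opening event followed immediately by a closing event.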
To process the text located between tags, you need to set the corresponding handler with the function xml_set_character_data_handler. It is used in the same way, except that its second argument must be the name of a function declared like this:
function characterhandler($parser, $data)
That is, the same as for a closing tag. It receives data such as "the Name 1" or "the Long comment For some lines 2" from our example. And finally, the most important part: how to read the XML document. It turns out to be simple: just like an ordinary text file. That is, we open it with the fopen function, for example:
$fp = fopen($xmlfilename, "r");
Then we read it piece by piece (up to 4096 bytes at a time) and pass each piece to the function xml_parse:
while ($data = fread($fp, 4096))
{
    if (!xml_parse($parser, $data, feof($fp)))
    {
        die(sprintf("An error occurred: %s in line %d",
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)));
    }
}
xml_parse takes three arguments. The first is the parser variable we created earlier, the second is the string just read, and the third (optional) is a flag indicating that it is time to finish parsing (we pass whether the end of the file has been reached). We have also added error checking. Everything there should be clear from the names: xml_get_error_code returns an error code, from which xml_error_string produces a string describing the error.
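The role of the third argument can be sketched without a file by feeding the parser a string in small pieces. The 16-byte chunk size, the sample document and the handler names countStart and ignoreEnd are assumptions made for this illustration only.

```php
<?php
// Sketch only: parse a document in several chunks, flagging the last one.
$count = 0;

function countStart($parser, $name, $attribs)
{
    global $count;
    $count++; // one more opening tag seen
}

function ignoreEnd($parser, $name)
{
    // Closing tags are not counted in this sketch.
}

$xml = "<album><foto><title>One</title></foto>"
     . "<foto><title>Two</title></foto></album>";
$chunks = str_split($xml, 16); // chunk boundaries may fall inside a tag

$parser = xml_parser_create("UTF-8");
xml_set_element_handler($parser, 'countStart', 'ignoreEnd');

$last = count($chunks) - 1;
foreach ($chunks as $i => $chunk) {
    // The third argument is true only for the final chunk.
    if (!xml_parse($parser, $chunk, $i === $last)) {
        die(sprintf("An error occurred: %s in line %d",
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)));
    }
}
echo "Opening tags seen: $count\n";
```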
After all this, we must not forget to destroy the parser. This is done with the function xml_parser_free:
xml_parser_free($parser);
Now for one of the most unpleasant features. Since we wrote the XML in Unicode, the strings are passed to us in the same encoding. And since sites are usually built on a more familiar encoding (KOI8, Windows), something has to be done about this Unicode. And here the most unpleasant part begins. The PHP extension responsible for XML has two functions for converting UTF-8: utf8_decode, which converts a string from UTF-8, and utf8_encode, which conversely converts a string to UTF-8. But they do not suit us, because they can only work with the ISO-8859-1 encoding, which has no Russian letters. Fortunately, the PHP developers did provide a function that works without problems with other encodings as well: mb_convert_encoding. In this case we used it like this:
$str = mb_convert_encoding($str, $curcode, $code);
$curcode and $code are variables that hold the names of the encodings (remember, we declared them as globals earlier?). This function is straightforward: the first argument is the source string, the second is the name of the encoding to convert to, and the third (optional) argument is the encoding to convert from. The function returns a new string. It would seem that all is well: there is a function and it works great (it really does), but for it to work, the PHP extension mbstring (multi-byte string) must be enabled. To do this on Windows, uncomment the line extension=php_mbstring.dll in the php.ini file. But while this is easy to do at home, on the hosting where your site lives the extension may not be enabled. That is why I moved the encoding conversion into a separate function, so that it can easily be changed:
// Converts a string from Unicode to the output encoding
function encoding($str)
{
    global $code;
    global $curcode;
    $str = mb_convert_encoding($str, $curcode, $code);
    return $str;
}
If you have ideas on how to do without mb_convert_encoding, write to me.
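One possible direction, offered only as a sketch and not as the author's solution, is to fall back to the iconv extension when mbstring is missing. This assumes that iconv is available on the host and that it recognizes the encoding name "Windows-1251"; neither is guaranteed everywhere. The sample string is the UTF-8 byte sequence for a six-letter Russian word.

```php
<?php
// Sketch only: a variant of encoding() that tries mbstring, then iconv.
$code = "UTF-8";
$curcode = "Windows-1251";

function encoding($str)
{
    global $code;
    global $curcode;
    if (function_exists('mb_convert_encoding')) {
        return mb_convert_encoding($str, $curcode, $code);
    }
    if (function_exists('iconv')) {
        // Note the argument order of iconv: from, to, string.
        return iconv($code, $curcode, $str);
    }
    return $str; // No converter available: return the string unchanged.
}

// Twelve bytes in UTF-8, six bytes after conversion to a single-byte encoding.
$utf8 = "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82";
$converted = encoding($utf8);
echo strlen($utf8), " -> ", strlen($converted), "\n";
```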
Those were the simplest functions for working with XML. To make things more interesting, our script keeps track of the nesting level of the tags (to shift the text to the right correctly), and also appends each opening tag to the global variable $list, removing the last element when a closing tag is encountered. Thus $list stores the path we have taken to reach the current tag, and that tag is at the end of the list.
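The path stored in $list can be made visible with a short sketch that records it at every opening tag. The $paths array, the handler names pathStart and pathEnd and the sample document are hypothetical additions for this illustration.

```php
<?php
// Sketch only: record the path from the root to each opening tag.
$list = array();  // current path, as in the script above
$paths = array(); // one entry per opening tag

function pathStart($parser, $name, $attribs)
{
    global $list;
    global $paths;
    $list[] = $name;
    $paths[] = implode("/", $list);
}

function pathEnd($parser, $name)
{
    global $list;
    array_pop($list); // leave the current tag
}

$parser = xml_parser_create("UTF-8");
xml_set_element_handler($parser, 'pathStart', 'pathEnd');
xml_parse($parser, "<album><foto><title>x</title><color/></foto></album>", true);

print_r($paths);
```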
Now let's play around a bit and see how error handling works. Remove the slash from the color tag, that is, leave <color>, as if we had forgotten to close it. Here is what PHP gives us: "An error occurred: mismatched tag in line 16". And processing stops there. The same "mismatched tag" error will appear if we move the closing tag </date> after the tag </foto>.
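The same experiment can be sketched on a tiny in-memory document. The three-line document and the variable names $ok, $message and $line are assumptions for this illustration, and the exact wording of the message may differ between PHP versions and underlying XML libraries.

```php
<?php
// Sketch only: parse a document with an unclosed tag and report the error.
$broken = "<album>\n<color>\n</album>"; // <color> is never closed

$parser = xml_parser_create("UTF-8");
$ok = xml_parse($parser, $broken, true);

$message = null;
$line = null;
if (!$ok) {
    // Turn the numeric error code into readable text and locate the error.
    $message = xml_error_string(xml_get_error_code($parser));
    $line = xml_get_current_line_number($parser);
    echo "An error occurred: $message in line $line\n";
}
```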
Let's play with encodings. If we save our XML document in the Windows-1251 encoding and honestly declare it in the header <?xml version="1.0" encoding="Windows-1251"?> (do not forget to change the corresponding global variable in the script), PHP... happily crashes :) At least that is what happened for me. I tested this script on the following configuration: Win2000 + SP3; Apache 1.3.27; PHP 4.3.1.
