Re: "Invalid hexadecimal character reference" error parsing an XML file with a SAX processor

Date: Sat, 09 Aug 2008 18:21:24 +0000
Groups: php.xml.dev
I have found the problem: the XML file contained an &amp; entity (an escaped ampersand), and that is what trips up the PHP XML parser.

So it may be enough to replace &amp; with [amp] before parsing and then, when you output the result, replace [amp] back with &.
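The workaround can be sketched in PHP roughly as follows. This is a minimal illustration, not code from the thread: the class name AmpWorkaroundParser and its methods are hypothetical, and it assumes the whole document fits in memory, that the literal text [amp] never appears in the original XML, and that only the text of leaf elements such as <author> is needed.

```php
<?php
// Hypothetical sketch of the "[amp]" workaround. Assumptions: the whole
// document fits in memory, the literal text "[amp]" never occurs in the
// original XML, and only the text of leaf elements is collected.
class AmpWorkaroundParser
{
    public $texts = array();   // restored text of each non-empty element
    public $error = '';        // parser error message, if any
    private $buffer = '';      // character data of the current element

    public function start_element($parser, $name, $attrs)
    {
        $this->buffer = '';
    }

    public function end_element($parser, $name)
    {
        $text = trim($this->buffer);
        if ($text !== '') {
            // Turn the placeholder back into "&" only when producing output.
            $this->texts[] = str_replace('[amp]', '&', $text);
        }
        $this->buffer = '';
    }

    public function char_data($parser, $data)
    {
        // One text node may arrive in several pieces, so just accumulate.
        $this->buffer .= $data;
    }

    public function parse_string($xml)
    {
        $parser = xml_parser_create();
        xml_set_element_handler($parser,
            array($this, 'start_element'), array($this, 'end_element'));
        xml_set_character_data_handler($parser, array($this, 'char_data'));

        // Hide every "&amp;" from the parser before parsing.
        $ok = xml_parse($parser, str_replace('&amp;', '[amp]', $xml), true) === 1;
        if (!$ok) {
            // xml_error_string() takes an error code, not the parser itself.
            $this->error = xml_error_string(xml_get_error_code($parser));
        }
        return $ok;
    }
}

// Usage: prints "Smith & Jones"
$p = new AmpWorkaroundParser();
$p->parse_string('<author>Smith &amp; Jones</author>');
echo $p->texts[0], "\n";
```

Restoring the placeholder in end_element() rather than in char_data() is deliberate: the parser may deliver a single text node through several char_data() calls, so "[amp]" could be split between two of them.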

r.


Roberto Battistoni wrote:
Hi everyone,

I have written a simple SAX parser for a very simple XML file. When I run the code below I get this error: "Invalid hexadecimal character reference".

The strange thing is that if I change the chunk size of the data I send to the parser, the line reported for the error changes. I ran one more test with the chunk size set equal to the file size, and I get the same error at the end of the file. The same XML file processed with another language doesn't raise any error.

I use PHP 5.2.3 and a LAMP stack (AppServ Open Project 2.5.9 for Windows) on a Windows Vista PC. Here is the code I used:

    public function create_parser($filename)
    {
        $this->fp = fopen($filename, 'r');
        $this->fsize = filesize($filename);
        $this->parser = xml_parser_create();
        // array($this, ...) works for both static and instance handler methods
        xml_set_element_handler($this->parser,
            array($this, 'start_element'), array($this, 'end_element'));
        xml_set_character_data_handler($this->parser, array($this, 'char_data'));
    }

    public function parse()
    {
        //$blockSize = 4*1024;
        $blockSize = $this->fsize;
        echo 'File length: '.$this->fsize;
        // Loop on feof() so the last xml_parse() call is always flagged as final
        while (!feof($this->fp)) {
            $data = fread($this->fp, $blockSize);
            //$data = str_replace('\n','',$data);
            if (!xml_parse($this->parser, $data, feof($this->fp))) {
                // xml_error_string() expects an error code, not the parser
                echo 'Parser error: ('.xml_get_current_byte_index($this->parser).') \''.
                    xml_error_string(xml_get_error_code($this->parser)).'\' at line '.
                    xml_get_current_line_number($this->parser).' at col '.
                    xml_get_current_column_number($this->parser);
                return false;
            }
        }
        return true;
    }

A piece of the XML file follows:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE dblp SYSTEM "dblp.dtd">
<dblp>
     <incollection mdate="2002-01-03"
key="books/acm/kim95/AnnevelinkACFHK95">
       <author>
Jurgen Annevelink
       </author>
       <author>
Rafiul Ahad
       </author>
       <author>
Amelia Carlson
       </author>
       <author>
Daniel H. Fishman
       </author>
       <author>
Michael L. Heytens
       </author>
       <author>
.... The Industrial Information Technology Handbook
       </booktitle>
       <url>
db/books/collections/IITHandbook2005.html#SeyfarthK05
       </url>
     </incollection>
</dblp>
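When the file is fed to the parser in chunks, as in the parse() method above, the replacement has to be applied chunk by chunk, and an "&amp;" that straddles two fread() calls would slip past str_replace(). A minimal sketch, assuming the same [amp] placeholder and a parser whose handlers are already registered; parse_in_chunks() is a hypothetical helper, not part of PHP's xml extension. It also always finishes with an xml_parse(..., true) call so truncated input is reported.

```php
<?php
// Hypothetical helper: feed $filename to an already-configured xml parser
// in $blockSize chunks, hiding "&amp;" as "[amp]" (the thread's workaround).
function parse_in_chunks($parser, $filename, $blockSize = 4096)
{
    $fp = fopen($filename, 'rb');
    if ($fp === false) {
        return false;
    }
    $carry = '';   // trailing bytes held back from the previous chunk
    $ok = true;
    while ($ok && !feof($fp)) {
        $chunk = fread($fp, $blockSize);
        if ($chunk === false) {
            $ok = false;
            break;
        }
        $data = $carry . $chunk;
        $carry = '';
        // "&amp;" is 5 bytes: if the last '&' is closer than that to the end,
        // it may be the start of a split "&amp;", so keep it for the next round.
        $pos = strrpos($data, '&');
        if ($pos !== false && strlen($data) - $pos < 5) {
            $carry = substr($data, $pos);
            $data = substr($data, 0, $pos);
        }
        $ok = xml_parse($parser, str_replace('&amp;', '[amp]', $data), false) === 1;
    }
    fclose($fp);
    // Final call flushes any held-back bytes and marks the end of input.
    if ($ok) {
        $ok = xml_parse($parser, str_replace('&amp;', '[amp]', $carry), true) === 1;
    }
    if (!$ok) {
        printf("Parser error: '%s' at line %d, column %d\n",
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser),
            xml_get_current_column_number($parser));
    }
    return $ok;
}
```

With the quoted class, this would be called as parse_in_chunks($this->parser, $filename, 4*1024) after create_parser(); the handlers still need to turn [amp] back into & when they output text.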
