Migrating from PHP 4 to PHP 5
John Coggeshall, Zend Technologies

Let's get started
  • Alongside user-facing engine improvements such as the new object model and iterators, PHP 5 has also improved from an internals perspective
  • These improvements allow for incredible flexibility in PHP 5 extensions
      • Object overloading for XML parsing
      • Integration of objects from Java, .NET, etc.
  • PHP 5 also boasts some great new DB functionality

SQLite
  • One of the best new DB features in PHP 5 is SQLite
  • What is SQLite?
      • A stand-alone RDBMS
      • Allows developers to store relational data in the local file system
      • No external server exists, nor is one needed
      • Depending on the application, it can significantly outperform other DB packages

The difference in paradigms
  • While RDBMSs like MySQL run on a client/server model, SQLite modifies files directly

Simplicity at a price
  • SQLite is a simpler RDBMS model, but the simplicity comes at a price
  • Because of the architecture, the database does not scale for concurrent writing (every write locks the entire database)
  • The simplicity makes it usable in almost any environment
  • SQLite is, however, incredibly good at reading!

Example: Zip Code / Area Code lookup
  • SQLite is extremely good for look-up tables
  • For instance, relating U.S. postal codes to city names and phone area codes
  • Where to get the data: zipfiles.txt
      • A little text file I picked up years ago somewhere along the line

Zip file format
  • The file has one entry per line, in the format: <ZIPCODE><STATE><AREACODE><CITYNAME>
  • First step: create tables
  • Second step: create indexes
  • Third step: populate database
  • Fourth step: lock and load!

A note about creating tables in SQLite
  • Unlike most other RDBMS packages, SQLite does not require typing information such as INTEGER, VARCHAR, etc.
  • Rather, SQLite has only a notion of type classes:
      • textual
      • numeric

A lack of typing information means...
  • Although you can use whatever you want for a type, SQLite does have some simple rules
  • INTEGER must be used (as INTEGER PRIMARY KEY) if you want to create an auto-incrementing key
  • Anything with the substring "CHAR" in it will be considered textual

Create your tables
  • Download the sqlite command line tool from sqlite.org and create zipfiles.db:

    $ sqlite zipfiles.db
    sqlite> CREATE TABLE cities (zip INTEGER, city_name, state);
    sqlite> CREATE INDEX cities_city_name_idx on cities(city_name);
    sqlite> CREATE INDEX cities_zip_idx on cities(zip);
    sqlite> CREATE TABLE areacode(zip INTEGER, areacode INTEGER);
    sqlite> CREATE INDEX areacode_idx on areacode(areacode);
    sqlite> CREATE INDEX areacode_zip_idx on areacode(zip);

Populate the tables (zipcode_db_populate.php)
  • With the tables created, populate them using a simple PHP script that parses our text file
  • Use sqlite_open() to open the database file
  • When inserting data, always use sqlite_escape_string() to escape data
  • Use sqlite_query() to perform the queries
  • Use sqlite_close() to close the database

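The populate step could be sketched roughly as follows. This is not the contents of zipcode_db_populate.php: the function names parse_zip_line() and populate_zip_db() are hypothetical, and the code assumes tab-separated fields because the slides do not state the delimiter used in zipfiles.txt. It relies on the PHP 5 sqlite_* functions listed above.

```php
<?php
// Hypothetical helper: split one line of zipfiles.txt into its four fields.
// ASSUMPTION: fields are tab-separated; the slides do not state the delimiter.
function parse_zip_line($line)
{
    $fields = explode("\t", rtrim($line, "\r\n"));
    if (count($fields) != 4) {
        return false; // malformed line, caller should skip it
    }
    list($zip, $state, $areacode, $city) = $fields;
    return array('zip' => $zip, 'state' => $state,
                 'areacode' => $areacode, 'city' => $city);
}

// Hypothetical loader built on the PHP 5 sqlite_* functions.
function populate_zip_db($textfile, $dbfile)
{
    $db = sqlite_open($dbfile, 0666, $error);
    if (!$db) {
        trigger_error("Could not open database: $error", E_USER_WARNING);
        return false;
    }
    foreach (file($textfile) as $line) {
        $row = parse_zip_line($line);
        if ($row === false) {
            continue;
        }
        // Numeric fields are cast to int; text fields are escaped.
        $zip   = (int) $row['zip'];
        $area  = (int) $row['areacode'];
        $city  = sqlite_escape_string($row['city']);
        $state = sqlite_escape_string($row['state']);
        sqlite_query($db, "INSERT INTO cities VALUES ($zip, '$city', '$state')");
        sqlite_query($db, "INSERT INTO areacode VALUES ($zip, $area)");
    }
    sqlite_close($db);
    return true;
}
```

A call such as populate_zip_db('zipfiles.txt', 'zipfiles.db') would then fill both tables created earlier.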
The Zipcode lookup API (zipcode_api.php, zipcode_lookup.php)
  • Now that we have our database, wrap the queries into a clean API that we can use
      • find_cities_by_zipcode($zipcode, $db)
      • find_state_by_zipcode($zipcode, $db)
      • find_areacodes_by_zipcode($zipcode, $db)
      • find_zipcode_by_city_name($city, $state, $db)
  • _handle_sqlite_error($result, $db)
      • Handles errors which occur during a query

Improving Write Performance
  • Although SQLite isn't very good at writing, there are a number of things you can do to improve write performance
  • Wrap large numbers of queries in a transaction
  • Use PRAGMA to tweak SQLite options
  • Spread tables across multiple database files

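As a rough illustration of the first tip, the sketch below wraps a batch of write queries in a single transaction. The helper name run_in_transaction() is hypothetical; it assumes $db is a handle returned by sqlite_open().

```php
<?php
// Hypothetical helper: run many write queries inside one transaction so the
// whole batch is committed together instead of one commit per query.
function run_in_transaction($db, $queries)
{
    sqlite_query($db, 'BEGIN');
    foreach ($queries as $sql) {
        if (!@sqlite_query($db, $sql)) {
            // Undo everything written so far if any query fails.
            sqlite_query($db, 'ROLLBACK');
            return false;
        }
    }
    sqlite_query($db, 'COMMIT');
    return true;
}
```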
Synchronous
  • The synchronous option is a very important option in SQLite
  • It controls the trade-off between absolute data integrity and speed
  • Three different levels:
      • OFF: Fastest, but a sudden power outage can result in data loss
      • NORMAL: Default setting, offering a reasonable mix between data integrity and speed
      • FULL: Near 100% assurance of data integrity at the cost of performance

Synchronous
  • Control this setting using PRAGMA in a query:

    PRAGMA default_synchronous=OFF;

  • Other interesting PRAGMA options:
      • count_changes: If enabled, SQLite will count the number of rows affected by a query. If disabled, functionality which relies on knowing the number of affected rows will not work

Splitting up Tables
  • Since every write locks the entire database, splitting out tables which are written to heavily can improve performance
  • Multiple databases means multiple files
  • Join them together using SQLite's ATTACH:

    ATTACH DATABASE 'mydatabase.db' AS mydb;

Table splitting pitfalls
  • Can only attach a maximum of 10 databases together
  • Transactions lock all databases
  • Cross-database transactions are not atomic
  • Attached databases cannot have their schema modified

Improving Reads
  • By default SQLite performs reads using a buffered query
      • Allows seeking forward and backward through the data
  • If you are only interested in reading from start to finish, you can use an unbuffered query: sqlite_unbuffered_query()
      • Only fetches one row at a time
      • Good for large result sets

Questions?
MySQLi
  • MySQLi (or Improved MySQL) is a complete rewrite of the old MySQL extension for PHP
  • Used with MySQL version 4.1 and above
  • Supports PHP APIs for new MySQL 4.1 features
  • Most legacy functions still exist, although their names have changed

Making the leap (mysqlidiff.php)
  • MySQL and MySQLi share a similar API
  • Most functions which existed in the old extension exist today: instead of mysql_query() use mysqli_query()
  • There are incompatibilities, however
      • No more implicit database resources (all queries must specify the database connection being used)
      • Doesn't work with versions of MySQL < 4.1

Backward Compatibility?
  • MySQLi and the old MySQL extension do not play very nicely together
  • Difficult, if not impossible, to get both mysql_* and mysqli_* functions available at the same time from PHP
  • To overcome this, I created a compatibility layer: http://www.coggeshall.org/oss/mysql2i/
      • Maps MySQLi functions to the old mysql_* names
      • Should be a drop-in fix for most legacy code

Same steps
  • Although the API has changed slightly, the steps for working with MySQL are the same:
      1. Connect to the database server
      2. Select the database to use
      3. Perform queries
      4. Retrieve results
      5. Close the database connection

An example (mysqli_simple.php)
  • Here is a simple example of using MySQLi
  • mysqli_connect() to connect to the database server
  • mysqli_select_db() to select the database
  • mysqli_query() to perform queries
  • mysqli_fetch_array() to return results
      • mysqli_fetch_row() and mysqli_fetch_assoc() are also both available as helper functions
  • mysqli_close() to close the connection

Dealing with Errors (mysqli_error.php)
  • You'll notice that in my example I didn't deal with errors very nicely
  • As with the old extension, MySQLi can retrieve nice error codes and messages for users
      • mysqli_errno() - returns an error code
      • mysqli_error() - returns a string representation of the error
      • mysqli_connect_errno() - returns an error code from the connection process
      • mysqli_connect_error() - returns an error string from the connection process

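Putting the five steps and the error functions together might look something like the sketch below. It is not the code from mysqli_simple.php or mysqli_error.php: fetch_all_rows() is a hypothetical helper, and the host, credentials, database name and query are placeholders supplied by the caller. It is written for the PHP 5 behaviour where the mysqli_* functions return false on failure; newer PHP versions can be configured to throw exceptions instead.

```php
<?php
// Hypothetical helper: connect, select a database, run one query, and
// return all rows as associative arrays (or false on any error).
function fetch_all_rows($host, $user, $pass, $dbname, $sql)
{
    $link = mysqli_connect($host, $user, $pass);
    if (!$link) {
        // Connection errors have their own pair of functions.
        trigger_error('Connect failed (' . mysqli_connect_errno() . '): '
                      . mysqli_connect_error(), E_USER_WARNING);
        return false;
    }
    if (!mysqli_select_db($link, $dbname)
        || !($result = mysqli_query($link, $sql))) {
        trigger_error('Query failed (' . mysqli_errno($link) . '): '
                      . mysqli_error($link), E_USER_WARNING);
        mysqli_close($link);
        return false;
    }
    $rows = array();
    while ($row = mysqli_fetch_assoc($result)) {
        $rows[] = $row;
    }
    mysqli_free_result($result);
    mysqli_close($link);
    return $rows;
}
```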
Executing Multiple Queries
  • One of the big improvements in MySQLi is the ability to execute multiple queries at the same time
  • Using a multi-query in MySQLi is more complex than using a single query
      • You must iterate through a set of result objects and then treat each one as a result

Need to know for Multiqueries (using_multiqueries.php)
  • There are a few functions you need to know about when dealing with multi-query SELECT statements:
      • mysqli_multi_query(): perform the multi-query
      • mysqli_store_result(): retrieve a result
          • Perform operations against the result as before
      • mysqli_more_results(): check for another result
      • mysqli_next_result(): advance to the next result

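The loop these four functions imply can be sketched as follows. The helper name run_multi_query() is hypothetical and this is not the code from using_multiqueries.php; $link is assumed to be an open MySQLi connection and $sql a string of semicolon-separated statements.

```php
<?php
// Hypothetical helper: run several statements at once and collect the rows
// of every result set that comes back.
function run_multi_query($link, $sql)
{
    if (!mysqli_multi_query($link, $sql)) {
        return false;
    }
    $sets = array();
    do {
        // mysqli_store_result() returns false for statements that
        // produce no result set (INSERT, UPDATE, ...).
        if ($result = mysqli_store_result($link)) {
            $rows = array();
            while ($row = mysqli_fetch_assoc($result)) {
                $rows[] = $row;
            }
            mysqli_free_result($result);
            $sets[] = $rows;
        }
    } while (mysqli_more_results($link) && mysqli_next_result($link));
    return $sets;
}
```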
Prepared Statements
  • Prepared statements are a more efficient way of performing queries against the database
  • Every time you execute a query, the query must be parsed, checked for syntax, etc.
      • This is an expensive process
  • Prepared statements allow you to save compiled "templates" of a query
  • Instead of recompiling and retransmitting an entire query, only the values to plug into the template are sent

Using prepared statements
  • Consider the following query:

    INSERT INTO mytable VALUES($data1, '$data2');

  • Instead of specifying the variables in the query ($data1, $data2), replace each one with a ? placeholder
  • Use this type of prepared statement for database writes

Using prepared statements
  • The same query as a prepared statement:

    INSERT INTO mytable VALUES(?, ?);

  • Each variable was replaced with ?
  • Note that quotes are no longer necessary
      • Prepared statements automatically escape data

Using a Prepared Statement (mysqli_bound_param.php)
  • Once you have a query you can use it in four steps:
      1. Prepare the statement using mysqli_prepare()
      2. Bind PHP variables to the statement using mysqli_stmt_bind_param()
      3. Set the variable values
      4. Execute the query and write to the database

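The four steps can be sketched like this. insert_rows() is a hypothetical helper, not the code from mysqli_bound_param.php, and the 'is' type string assumes the first column is an integer and the second a string, which the slides do not specify.

```php
<?php
// Hypothetical helper: insert many rows through one prepared statement.
// $rows is a list of array($data1, $data2) pairs.
function insert_rows($link, $rows)
{
    // Step 1: prepare the query template.
    $stmt = mysqli_prepare($link, 'INSERT INTO mytable VALUES(?, ?)');
    if (!$stmt) {
        return false;
    }
    // Step 2: bind variables by reference.
    // ASSUMPTION: "is" = integer first column, string second column.
    mysqli_stmt_bind_param($stmt, 'is', $data1, $data2);
    foreach ($rows as $row) {
        // Step 3: set the bound variables to the current values.
        list($data1, $data2) = $row;
        // Step 4: execute; only the values are sent to the server.
        if (!mysqli_stmt_execute($stmt)) {
            mysqli_stmt_close($stmt);
            return false;
        }
    }
    mysqli_stmt_close($stmt);
    return true;
}
```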
Using Result-Bound Prepared Statements (mysqli_bind_result.php)
  • The second type of prepared statement is a result-bound prepared statement
  • Bind PHP variables to the columns being returned
  • Loop over the result set and the PHP variables will automatically be populated with the current row's data

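A result-bound statement might be used roughly as shown below. fetch_pairs() is a hypothetical helper rather than the code from mysqli_bind_result.php, and the column names col1 and col2 are placeholders.

```php
<?php
// Hypothetical helper: read two columns through a result-bound statement.
function fetch_pairs($link)
{
    // ASSUMPTION: mytable has columns named col1 and col2.
    $stmt = mysqli_prepare($link, 'SELECT col1, col2 FROM mytable');
    if (!$stmt || !mysqli_stmt_execute($stmt)) {
        return false;
    }
    // Bind one PHP variable per returned column.
    mysqli_stmt_bind_result($stmt, $col1, $col2);
    $rows = array();
    // Each fetch overwrites $col1 and $col2 with the next row's values.
    while (mysqli_stmt_fetch($stmt)) {
        $rows[] = array($col1, $col2);
    }
    mysqli_stmt_close($stmt);
    return $rows;
}
```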
Transactions
  • One of the biggest improvements in MySQLi/MySQL 4.0+ is the support for atomic transactions
  • Multiple writes done as a single write
  • Ensures data integrity during critical multi-write operations such as credit card processing

Transactions API (mysqli_transactions.php)
  • MySQLi supports a number of transaction APIs
      • mysqli_autocommit() enables and disables auto-committing of transactions
      • mysqli_commit() allows you to explicitly commit a transaction
      • mysqli_rollback() allows you to roll back (undo) a transaction
  • To determine the state of auto-committing, perform the query:

    SELECT @@autocommit;

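A minimal sketch of how these calls fit together follows. run_atomic() is a hypothetical helper, not the code from mysqli_transactions.php, and it assumes the tables involved use a transactional storage engine such as InnoDB.

```php
<?php
// Hypothetical helper: apply a list of write queries as one atomic unit.
function run_atomic($link, $queries)
{
    // Turn off auto-commit so nothing is made permanent until commit.
    mysqli_autocommit($link, false);
    foreach ($queries as $sql) {
        if (!mysqli_query($link, $sql)) {
            // Undo every write made since auto-commit was disabled.
            mysqli_rollback($link);
            mysqli_autocommit($link, true);
            return false;
        }
    }
    $ok = mysqli_commit($link);
    mysqli_autocommit($link, true);
    return $ok;
}
```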
Questions?
That's it for Databases
  • Now that you've been introduced to the two new database extensions available in PHP 5, let's take a look at some of the other functionality
  • PHP 5 boasts a completely revamped XML system
      • Based on the libxml2 library
      • dom
      • simplexml
      • xmlreader (to be released in PHP 5.1)

XML Processing in PHP 5
  • PHP 5 can parse XML in a variety of ways
      • SAX (inherited from PHP 4)
      • DOM (as defined by the W3C)
      • XPath
      • SimpleXML

Benefits to the new XML
  • Because everything in PHP 5 uses a single underlying library, many improvements have been made
  • Can switch between SimpleXML and DOM processing at will
  • Streams support has been extended to XML documents themselves (use a stream for an <xsl:include> or <xi:include> tag, for instance)

DOM in PHP 5
  • PHP 5 supports a W3C-compliant Document Object Model for XML
  • A very detailed way of parsing XML
  • Refer to http://www.w3c.org/DOM for a complete description

Reading XML using DOM
  • Consider the following simple XML document:

    <?xml version="1.0" encoding="iso-8859-1" ?>
    <articles>
      <item>
        <title>PHP Weekly: Issue # 172</title>
        <link>http://www.zend.com/zend/week172.php</link>
      </item>
    </articles>

Reading XML using DOM
  • To use DOM, create a new instance of the DomDocument class
  • Load an XML file using the load() method
  • To output the XML file to the browser, use the saveXML() method
  • To write the XML file to the filesystem, use the save() method

Retrieving nodes by name (dom_getelementbytagname.php)
  • One of the easiest ways to pull data out of an XML document is to retrieve nodes by name
  • The getElementsByTagName() method returns a DomNodeList object
  • To get the content of a node, refer to $node->firstChild->data;
  • PHP 5 also provides $node->textContent to retrieve the same data in a simplified fashion
  • DomNodeList objects can be iterated over like an array using foreach()

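A small sketch of retrieving nodes by name, using the sample document from the earlier slide. It loads the XML from a string with loadXML() purely to stay self-contained; that choice is mine and not necessarily what dom_getelementbytagname.php does.

```php
<?php
// The sample document from the earlier slide, held in a string.
$xml = <<<XML
<?xml version="1.0" encoding="iso-8859-1" ?>
<articles>
  <item>
    <title>PHP Weekly: Issue # 172</title>
    <link>http://www.zend.com/zend/week172.php</link>
  </item>
</articles>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

// Iterate over the node list like an array and read each node's text.
$titles = array();
foreach ($dom->getElementsByTagName('title') as $node) {
    $titles[] = $node->textContent;
}

// The longer route to the same kind of data: the text node's data property.
$link = $dom->getElementsByTagName('link')->item(0)->firstChild->data;

print_r($titles);
echo $link, "\n";
```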
More DOM navigation (dom_navigation.php)
  • Although getElementsByTagName() is useful, it is also a bit limited
      • Doesn't give you information stored in the structure of the XML document itself
  • To be more detailed, you must walk the document manually
      • Iterate over the childNodes property to get child nodes
      • Use nodeType and nodeName to identify the nodes you are interested in

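Walking the tree by hand might look like the sketch below. child_elements() is a hypothetical helper, not the code from dom_navigation.php; the nodeType check is what skips the whitespace text nodes between elements.

```php
<?php
// Hypothetical helper: collect the element children of a node as
// name => text pairs, skipping whitespace text nodes and comments.
function child_elements($parent)
{
    $found = array();
    foreach ($parent->childNodes as $child) {
        if ($child->nodeType == XML_ELEMENT_NODE) {
            $found[$child->nodeName] = $child->textContent;
        }
    }
    return $found;
}

$dom = new DOMDocument();
$dom->loadXML("<articles>\n <item>\n"
    . "  <title>PHP Weekly: Issue # 172</title>\n"
    . "  <link>http://www.zend.com/zend/week172.php</link>\n"
    . " </item>\n</articles>");

// Keep the document structure: one array entry per <item> element.
$articles = array();
foreach ($dom->documentElement->childNodes as $node) {
    if ($node->nodeType == XML_ELEMENT_NODE && $node->nodeName == 'item') {
        $articles[] = child_elements($node);
    }
}
print_r($articles);
```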
Writing XML using DOM (dom_writing.php)
  • You can also write to XML documents using the DOM model
  • Create nodes using the createElement() method
  • Create values using the createTextNode() method
  • Add nodes as children to existing nodes using appendChild()

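As an illustration, the sample articles document can be rebuilt from scratch with those three methods. This is a sketch under my own choice of structure, not the contents of dom_writing.php.

```php
<?php
$dom = new DOMDocument('1.0', 'iso-8859-1');

// appendChild() returns the node it added, so calls can be chained.
$root = $dom->appendChild($dom->createElement('articles'));
$item = $root->appendChild($dom->createElement('item'));

$title = $item->appendChild($dom->createElement('title'));
$title->appendChild($dom->createTextNode('PHP Weekly: Issue # 172'));

$link = $item->appendChild($dom->createElement('link'));
$link->appendChild($dom->createTextNode('http://www.zend.com/zend/week172.php'));

$output = $dom->saveXML();
echo $output;
```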
Extending DOM
  • Because DOM in PHP 5 is handled through a DomDocument class, you can extend it to implement your own helper functions
  • You must call the DomDocument constructor (__construct) when your extended class is constructed
  • Add a method like addArticle() which encapsulates the steps from the previous example to add a new article to the XML document

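One way such an extended class could look is sketched below. The class name ArticleDocument and the exact signature of addArticle() are hypothetical.

```php
<?php
// Hypothetical subclass that knows how to add articles to itself.
class ArticleDocument extends DOMDocument
{
    public function __construct()
    {
        // The parent constructor must run before the document is usable.
        parent::__construct('1.0', 'iso-8859-1');
        $this->appendChild($this->createElement('articles'));
    }

    // Wraps the createElement / createTextNode / appendChild steps.
    public function addArticle($title, $link)
    {
        $item = $this->documentElement->appendChild($this->createElement('item'));
        $t = $item->appendChild($this->createElement('title'));
        $t->appendChild($this->createTextNode($title));
        $l = $item->appendChild($this->createElement('link'));
        $l->appendChild($this->createTextNode($link));
        return $item;
    }
}

$doc = new ArticleDocument();
$doc->addArticle('PHP Weekly: Issue # 172', 'http://www.zend.com/zend/week172.php');
echo $doc->saveXML();
```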
XML Validation
  • You can also validate XML documents using DOM in PHP 5, using one of the following three methods:
      • DTD: A very old and largely unneeded method of XML validation
      • XML Schema: Defined by the W3C and can be very complex to work with
      • RelaxNG: A much simplified version of XML validation (recommended)

XML Validation
  • To use one of these three methods, simply call one of the following after loading an XML document using the load() method:

    $dom->validate();    // takes no argument; uses the DTD declared in the document
    $dom->relaxNGValidate('myxmlfile.rng');
    $dom->schemaValidate('myxmlfile.xsd');

  • These functions return a boolean indicating whether the validation was successful
  • Currently doesn't have the best error handling...

Simplified XML parsing
  • Although DOM is great when you don't really know what you are looking for, it is overly complex for when you do
  • For this reason PHP 5 comes with the SimpleXML extension
  • Maps the structure of an XML document directly to a PHP 5 overloaded object for easy navigation
  • Only good for when you know the structure of the XML document beforehand

Using SimpleXML (simplexml.php)
  • To use SimpleXML, load the XML document using...
      • simplexml_load_file() to load a file
      • simplexml_load_string() to load from a string
      • simplexml_import_dom() to load from an existing DOM node
  • Once loaded, you can access nodes directly by name as properties / methods of the object returned

More details on SimpleXML
  • As you can see, nodes can be directly accessed by name from the returned object
  • If you would like to extract attributes from a node, reference the name as an associative array:

    $simplexml->title['id'];

  • This will get the id attribute of the title node directly beneath the root node

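Applied to the sample articles document, node and attribute access might look like this. The id attribute on <title> is my own addition to the sample so there is an attribute to read; it is not in the original document.

```php
<?php
// ASSUMPTION: the id attribute is added here for illustration only.
$xml = '<articles><item>'
     . '<title id="172">PHP Weekly: Issue # 172</title>'
     . '<link>http://www.zend.com/zend/week172.php</link>'
     . '</item></articles>';

// The returned object stands for the root <articles> element.
$simplexml = simplexml_load_string($xml);

// Child nodes are properties; cast to string to get their text.
$title = (string) $simplexml->item->title;

// Attributes are read with array syntax on the node.
$id = (string) $simplexml->item->title['id'];

echo "$id: $title\n";
```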
XPath in SimpleXML (simplexml_xpath.php)
  • SimpleXML also supports XPath for pulling particular nodes out of an XML document
  • Use the xpath() method to provide your query

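A short sketch of an XPath query against the sample document follows. The query string is just one example of what could be passed to xpath(), and this is not the code from simplexml_xpath.php.

```php
<?php
$simplexml = simplexml_load_string(
    '<articles><item>'
    . '<title>PHP Weekly: Issue # 172</title>'
    . '<link>http://www.zend.com/zend/week172.php</link>'
    . '</item></articles>'
);

// xpath() returns an array of SimpleXML elements matching the query.
$links = array();
foreach ($simplexml->xpath('/articles/item/link') as $node) {
    $links[] = (string) $node;
}
print_r($links);
```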
Writing XML using SimpleXML
  • Although there are limitations, you can also write XML documents using SimpleXML
  • Just reassign a node or attribute to a new value:

    $simplexml->item->title = "My new title";
    $simplexml->item->title['id'] = 42;

  • Use the asXML() method to get an XML document back from SimpleXML
  • Alternatively, you can reimport a SimpleXML document into DOM using dom_import_simplexml()

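The two assignments above can be tried end to end with a sketch like the following, again using the sample articles document as the starting point.

```php
<?php
$simplexml = simplexml_load_string(
    '<articles><item>'
    . '<title>PHP Weekly: Issue # 172</title>'
    . '<link>http://www.zend.com/zend/week172.php</link>'
    . '</item></articles>'
);

// Reassign the node's text and set (here: create) an attribute.
$simplexml->item->title = "My new title";
$simplexml->item->title['id'] = 42;

// Serialize the modified document back to an XML string.
$output = $simplexml->asXML();
echo $output;

// Hand the same document over to DOM for further processing.
$domElement = dom_import_simplexml($simplexml);
```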
Questions?
Moving along
  • As you can see, XML support has been drastically improved for PHP 5
  • HTML support has been improved as well
  • The new tidy extension allows for intelligent HTML parsing, manipulation and repair

What exactly is Tidy?
  • Tidy is an intelligent HTML parser
  • It can parse malformed HTML documents and intelligently correct most common errors in their syntax
      • Missing or misaligned end tags
      • Unquoted attributes
      • Missing required tag elements
  • Tidy automatically adjusts itself based on the detected HTML document type

Using Tidy in PHP (tidy_syntax_fix.php)
  • In its simplest form Tidy will read an HTML document, parse it, correct any syntax errors and allow you to display the corrected document
  • Use tidy_parse_file() to parse the file
  • Use tidy_get_output() to return the corrected HTML
  • Note that the value returned from tidy_parse_file() can also be treated as a string to get the output

Identifying problems with a document
  • Once a document has been parsed you can identify problems with it by examining the return value of the tidy_get_error_buffer() function
  • Returns something like the following:

    line 1 column 1 - Warning: missing <!DOCTYPE> declaration
    line 1 column 1 - Warning: replacing unexpected i by </i>
    line 1 column 43 - Warning: <u> is probably intended as </u>
    line 1 column 1 - Warning: inserting missing 'title' element

Repairing HTML documents
  • Once a document has been parsed you can be sure it is valid from a syntax standpoint
  • However, this does not mean the document is actually web-standards compliant
  • To make a parsed HTML document standards compliant, call the tidy_clean_repair() function
  • Brings the document up to spec according to configuration options (discussed later)

Configuration Options?
  • The vast majority of the power in tidy comes from the configuration options which can be set
  • Allows you to do everything from replacing deprecated <FONT> tags with CSS to converting HTML 3.2 documents into XHTML 1.0 documents
  • Can be set either at run time or through a configuration file
  • A default configuration can be set using the tidy.default_config php.ini directive

Runtime configuration (tidy_runtime_config.php)
  • To configure tidy at run time you must pass the configuration as the second parameter to tidy_parse_file()
  • If the second parameter is an array, it should be a series of key/value pairs mapping configuration options to values
  • If the second parameter is a string, it will be treated as a tidy configuration filename and loaded from the filesystem

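A runtime configuration array might be used as sketched below. repair_html() is a hypothetical helper, not the code from tidy_runtime_config.php, and it uses tidy_parse_string() instead of tidy_parse_file() so the input can be passed in directly; both take the configuration as their second parameter. The option values are examples only.

```php
<?php
// Hypothetical helper: parse an HTML string with a runtime configuration,
// repair it, and return the cleaned markup plus Tidy's diagnostics.
function repair_html($html)
{
    // Option names are standard Tidy settings; the values are examples.
    $config = array(
        'indent'       => true,
        'output-xhtml' => true,
        'wrap'         => 4096,
    );
    $tidy = tidy_parse_string($html, $config);
    tidy_clean_repair($tidy);
    return array(
        'html'   => tidy_get_output($tidy),
        'errors' => tidy_get_error_buffer($tidy),
    );
}
```

Calling repair_html('<p><i>unclosed<p>text') would return the repaired markup under 'html' and the warning list under 'errors', provided the tidy extension is loaded.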
Configuration Files
  • Configuration files are useful for creating tidy "profiles" representing different tasks
      • A profile to strip all unnecessary data from an HTML document (save bandwidth)
      • A profile to beautify HTML documents which are difficult to read

Configuration Files
  • Below is an example tidy configuration file:

    indent: yes
    indent-spaces: 4
    wrap: 4096
    tidy-mark: no
    new-blocklevel-tags: mytag, anothertag

Parsing with Tidy
  • Along with all of the functionality for cleaning and repairing HTML, tidy can also be used to walk the structure of a parsed HTML document
  • Four different entry points:
      • ROOT
      • HEAD
      • HTML
      • BODY
  • Enter using the root(), head(), html(), or body() methods

The Tidy Node (pseudo_tidy_node.php)
  • When calling one of the entry-point methods on the return value from tidy_parse_file(), you get back a tidyNode object
  • Each node represents a tag in the HTML document
      • Allows you to find out many interesting things about the node
      • Allows you to pull out attributes quickly, making screen scraping a snap
  • Consult the PHP manual for details on type, etc.

Example of using Tidy Parsing (tidy_dump_nodes.php)
  • In this example we will parse a document using Tidy and extract all of the URLs found within <A> tags
  • Check the $id property of the node to see if it matches the TIDY_TAG_A constant
  • Look for the 'href' key in the $attribute array

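The link extraction described above could be sketched as a recursive walk over the node tree. collect_links() is a hypothetical helper, not the code from tidy_dump_nodes.php, and the file name in the usage note is a placeholder.

```php
<?php
// Hypothetical helper: walk a tidyNode tree and gather every href found
// on an <A> tag into $urls.
function collect_links($node, &$urls)
{
    // Tag identity is checked through the node's id property.
    if ($node->id == TIDY_TAG_A && isset($node->attribute['href'])) {
        $urls[] = $node->attribute['href'];
    }
    // Recurse into child nodes, if there are any.
    if ($node->hasChildren()) {
        foreach ($node->child as $child) {
            collect_links($child, $urls);
        }
    }
}
```

With the tidy extension loaded, usage would be along the lines of $tidy = tidy_parse_file('page.html'); $urls = array(); collect_links($tidy->body(), $urls);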
Questions?

Migrating from PHP 4 to PHP 5

  • 1.
    Migrating from PHP4 to 5 John Coggeshall Zend Technologies
  • 2.
    Let's get startedPHP 5, among its engine-level improvements such as OO and Iterators for users also improved from an internals perspective These improvements allow for incredible flexibility in PHP 5 extensions Object overloading for XML parsing Integration of other objects from Java, .NET, etc. PHP 5 also boasts some great new DB functionality
  • 3.
    SQLite One ofthe best new DB features in PHP 5 is SQLite What is SQLite? A stand-alone RDBMS database system Allows developers to store relational data in the local file system No external server exists, nor is needed Depending on the application, it can significantly out perform other DB packages
  • 4.
    The difference inparadigms While RDBMS like MySQL run on a client server model, SQLite modifies files directly.
  • 5.
    Simplicity at aprice While SQLite is a simpler RDBMS model, the simplicity comes at a price Because of the architecture, the database is inherently un-scalable for concurrent writing (every write locks the entire database) The simplicity makes it usable in almost any environment SQLite, however, is incredibly good at reading!
  • 6.
    Example: Zip Code/ Area Code lookup SQLite is extremely good for look-up tables For instance, relating U.S. postal codes to city names and phone area codes Where to get the data: zipfiles.txt A little text file I picked up years ago somewhere along the line
  • 7.
    Zip file formatFile is one line per entry in the format: <ZIPCODE><STATE><AREACODE><CITYNAME> First step: Create tables Second step: Create indexes Third step: populate database Fourth Step: Lock and Load!
  • 8.
    A note aboutcreating tables in SQLite SQLite is unlike most other RDBMS packages does not require typing information such as INTEGER VARCHAR etc. Rather, SQLite has only a notion of type classes: textual numeric
  • 9.
    A lack oftyping information means Although you can use whatever you want for a type, SQLite does have some simple rules INTEGER must be used if you want to create an auto incrementing key Anything with the substring “CHAR” in it will be considered textual
  • 10.
    Create your tablesDownload the sqlite command line tool from sqlite.org and create zipfiles.db: $ sqlite zipfiles.db sqlite> CREATE TABLE cities (zip INTEGER, city_name, state); sqlite> CREATE INDEX cities_city_name_idx on cities(city_name); sqlite> CREATE INDEX cities_zip_idx on cities(zip); sqlite> CREATE TABLE areacode(zip INTEGER, areacode INTEGER); sqlite> CREATE INDEX areacode_idx on areacode(areacode); sqlite> CREATE INDEX areacode_zip_idx on areacode(zip);
  • 11.
    Populate the tables(zipcode_db_populate.php) With the tables created populate them using a simple PHP script to parse our text file Use sqlite_open() to open the database file When inserting data, always use sqlite_escape_string() to escape data Use sqlite_query() to perform the queries Use sqlite_close() to close the database
  • 12.
    The Zipcode lookupAPI (zipcode_api.php, zipcode_lookup.php) Now that we have our database, wrap the queries into a clean API that we can use find_cities_by_zipcode($zipcode, $db) find_state_by_zipcode($zipcode, $db) find_areacodes_by_zipcode($zipcode, $db) find_zipcode_by_city_name($city, $state, $db) _handle_sqlite_error($result, $db) To handle errors which occur during a query
  • 13.
    Improving Write PerformanceAlthough SQLite isn't very good at writing there are a number of things you can do to improve write performance Wrap large numbers of queries in a transaction Using PRAGMA to tweak SQLite options to improve performance Spread tables across multiple database files
  • 14.
    Synchronous The synchronousoption is a very import option in SQLite. It controls the trade off between absolute data integrity and speed Three different levels: NONE : Fastest, but sudden power outage can result in data loss NORMAL: Default setting offering a reasonable mix between data integrity and speed FULL: Near 100% assurance of data integrity at the cost of performance
  • 15.
    Synchronous Control thissetting using PRAGMA in a query: PRAGMA default_synchronous=OFF; Other interesting PRAGMA options: count_changes: If enabled SQLite will count the number of affected rows in a query. If disabled functionality which relies on knowing the number of rows will be disabled
  • 16.
    Splitting up TablesSince every write locks the entire database, splitting tables which have heavy writing to them can improve performance Multiple databases means multiple files Join them together using SQLite's ATTACH: ATTACH DATABASE mydatabase.db AS mydb;
  • 17.
    Table splitting pitfallsCan only attach a maximum of 10 databases together Transactions lock all databases Cross-database Transactions are not atomic Attached databases cannot have their schema modified
  • 18.
    Improving Reads Bydefault SQLite performs reads using a buffered query Allows for data seeks forward and backward If you are only interested in reading from start to finish you can use an unbuffered query sqlite_unbuffered_query() Only fetches one row at a time Good for large result sets
  • 19.
  • 20.
    MySQLi MySQLi (or I mproved MySQL) is a complete re-write of the old MySQL extension for PHP Used with MySQL version 4.1 and above Supports PHP APIs for new MySQL 4.1 features Most legacy functions still exist, although their name has changed
  • 21.
    Making the leap(mysqlidiff.php) MySQL and MySQLi share a similar API Most functions which existed in the old extension exist today: instead of mysql_query() use mysqli_query() There are incompatibilities however No more implicit database resources (all queries must specify the database connection being used) Doesn't work with versions of MySQL < 4.1
  • 22.
    Backward Compatibility? MySQLiand the old MySQL extension do not play very nicely together Difficult, if not impossible, to get both mysql_* and mysqli_* functions available at the same time from PHP To overcome this, I created a compatibility layer: http://www.coggeshall.org/oss/mysql2i/ Maps MySQLi functions to the old mysql_* names Should be a drop-in fix to most legacy code
  • 23.
    Same steps Althoughthe API has changed slightly, the steps for working with MySQL are the same: Connect to the database server Select the database to use Perform queries Retrieve results Close database connection
  • 24.
    An example (mysqli_simple.php)Here is a simple example of using MySQLi mysqli_connect() to connect to the database server mysqli_select_db() to select the database mysqli_query() to perform queries mysqli_fetch_array() to return results mysqli_fetch_row() and mysqli_fetch_assoc() are also both available as helper methods mysqli_close() to close the connection
  • 25.
    Dealing with Errors(mysqli_error.php) You'll notice in my example I didn't deal with errors very nicely As with the old extension, MySQLi can retrieve nice error codes and messages for users mysqli_errno() - returns an error code mysqli_error() - returns a string representation of the error mysqli_connect_error() - returns an error code from the connection process mysqli_connect_error() - returns an error string from the connection process
  • 26.
    Executing Multiple QueriesOne of the big improvements in MySQLi is the ability to execute multiple queries at the same time . Using a single multiquery in MySQLi is more complex than a single query Must iterate through a set of result objects and then treat each one as a result
  • 27.
    Need to knowfor Multiqueries (using_multiqueries.php) There are a few functions you need to know about when dealing with Multi-query select statements: mysqli_multi_query(): perform the multi-query mysqli_store_result(): retrieve a result Perform operations against result as before mysqli_more_results(): check for another result mysqli_next_result(): increment to next result
  • 28.
    Prepared Statements PreparedStatements are a more efficient way of performing queries against the database Every time you execute a query, the query must be parsed, checked for syntax, etc. This is an expensive process Prepared statements allow you to save compiled “templates” of a query Instead of recompiling and retransmitting an entire query, only the values to plug into the templates are sent.
  • 29.
    Using prepared statementsConsider the following query INSERT INTO mytable VALUES($data1, '$data2'); Instead of specifying the variable in the query ($id), replace it with a ? placeholder Use this type of prepared statement for database writes
  • 30.
    Using prepared statementsThe same query as a prepared statement INSERT INTO mytable VALUES(?, ?); Variable was replaced with ? Note quotes are no longer necessary Prepared statements automatically escape data
  • 31.
    Using a PreparedStatement (mysqli_bound_param.php) Once you have a query you can use it in four steps: Prepare the statement using mysqli_prepare() Bind PHP variables to the statement using mysqli_bind_param Set the variable values Execute the query and write to the database
  • 32.
    Using Result-Bound PreparedStatements (mysqli_bind_result.php) The second type of prepared statement is a result-bound prepared statement Bind PHP variables to columns being returned Loop over the result set and the PHP variables will be automatically be populated with current data
  • 33.
    Transactions One ofthe biggest improvements in MySQLi/MySQL 4.0+ is the support for atomic transactions Multiple writes done as a single write Insures data integrity during critical multi-write operations such as credit card processing
  • 34.
    Transactions API (mysqli_transactions.php)MySQLi supports a number of transaction APIs mysqli_autocommit() enables and disables auto committing of transactions mysqli_commit() allows you to explicitly commit a transaction mysqli_rollback() allows you to roll back (undo) a transaction To determine the state of auto committing, perform the query: SELECT @@autocommit;
  • 35.
  • 36.
    That's it forDatabases Now that you've been introduced to the two new database extensions available in PHP 5, let's take a look at some of the other functionality PHP 5 boasts a completely revamped XML system Based on libxml2 library dom simplexml xmlreader (to be released in PHP 5.1)
  • 37.
    XML Processing inPHP 5 PHP 5 can parse XML in a variety of ways SAX (inherited from PHP 4) DOM (as defined by the W3C) Xpath SimpleXML
  • 38.
    Benefits to thenew XML In PHP 5 because everything uses a single underlying library many improvements have been made Can switch between SimpleXML/DOM processing at will Streams support has been extended to XML documents themselves (use a stream for an <xsl:include> or <xi:include> tag, for instance.)
  • 39.
    DOM in PHP5 PHP 5 supports a W3C compliant Document Object Model for XML A very detailed way of parsing XML Refer to http://www.w3c.org/DOM for a complete description
  • 40.
    Reading XML usingDOM Consider the following simple XML document <?xml version=&quot;1.0&quot; encoding=&quot;iso-8859-1&quot; ?> <articles> <item> <title>PHP Weekly: Issue # 172</title> <link>http://www.zend.com/zend/week172.php</link> </item> </articles>
  • 41.
    Reading XML usingDOM To use DOM, create a new instance of the DomDocument() object Load an XML file using the load() method To output the XML file to the browser, using the saveXML() method To write the XML file to the filesystem use the save() method
  • 42.
    Retrieving nodes byname (dom_getelementbytagname.php) One of the easiest ways to pull data out of an XML document is to retrieve them by name the getElementsByTagName() method returns a DomNodeList object To get the content of a node refer to $node->firstChild->data; PHP 5 also provides $node->textContent to retrieve the same data in a simplified fashion DomNodeList objects can be iterated over like an array using foreach()
  • 43.
    More DOM navigation(dom_navigation.php) Although getElementsByTagName is useful, it is also a bit limited Doesn't give you information stored in the structure of the XML document itself To be more detailed, you must parse the document manually Iterate over the childNodes property to get child nodes Use nodeType and nodeName to identify nodes you are interested in
  • 44.
    Writing XML usingDOM (dom_writing.php) You can also write to XML documents using the DOM model Create nodes using the createElement() method Create values using the createTextNode() method Add nodes as children to existing nodes using appendChild()
  • 45.
    Extending DOM Becausein PHP 5 DOM is handled through a DomDocument class, you can extend it to implement your own helper functions Must call the DomDocument constructor (__construct) when your extended class is constructed Add a method like addArticle() which encapsulates the steps from the previous example to add a new article to the XML document
  • 46.
    XML Validation Youcan also validate XML documents using DOM in PHP 5 using one of the following three methods: DTD: A very old and largely unneeded method of XML validation XML Schema: Defined by the W3C and can be very complex to work with RelaxNG: A much simplified version of XML validation (recommended)
  • 47.
    XML Validation Touse one of these three methods simply call one of the following after loading an XML document using the load() method $dom->validate('myxmlfile.dtd'); $dom->relaxNGValidate('myxmlfile.rng'); $dom->schemaValidate('myxmlfile.xsd'); These functions will return a boolean indicating if the validation was successful. Currently doesn't have the best error handling...
  • 48.
    Simplified XML parsing: Although DOM is great when you don't really know what you are looking for, it is overly complex when you do. For this reason PHP 5 comes with the SimpleXML extension, which maps the structure of an XML document directly to a PHP 5 overloaded object for easy navigation. It is only good when you know the structure of the XML document beforehand.
  • 49.
    Using SimpleXML (simplexml.php): To use SimpleXML, load the XML document using simplexml_load_file() to load a file, simplexml_load_string() to load from a string, or simplexml_import_dom() to load from an existing DOM node. Once loaded, you can access nodes directly by name as properties of the returned object.
  • 50.
    More details on SimpleXML: As you can see, nodes can be accessed directly by name from the returned object. If you would like to extract attributes from a node, reference the attribute name as an associative array key: $simplexml->title['id']; This will get the id attribute of the title node beneath the root.
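For instance, with a made-up <article> document (an assumption, not the file used by simplexml.php), node and attribute access might look like this:

```php
<?php
// Hypothetical sample document
$xml = '<article><title id="7">Hello SimpleXML</title><author>Jane</author></article>';

$sx = simplexml_load_string($xml);

// Child nodes are exposed as properties; cast to string to get their text
$title = (string) $sx->title;

// Attributes are exposed through array syntax on the node
$id = (string) $sx->title['id'];

echo "$title (id $id)\n";
```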
  • 51.
    XPath in SimpleXML (simplexml_xpath.php): SimpleXML also supports XPath for pulling particular nodes out of an XML document. Use the xpath() method to provide your query.
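A short sketch of an XPath query, assuming a hypothetical <feed> document; xpath() returns an array of SimpleXML elements that match.

```php
<?php
// Hypothetical sample document
$xml = '<feed><item><title>One</title></item><item><title>Two</title></item></feed>';
$sx  = simplexml_load_string($xml);

$titles = array();
// Each match is itself a SimpleXML element, so cast it to get the text
foreach ($sx->xpath('/feed/item/title') as $match) {
    $titles[] = (string) $match;
}

print_r($titles);
```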
  • 52.
    Writing XML using SimpleXML: Although there are limitations, you can also write XML documents using SimpleXML. Just assign a new value to a node or attribute: $simplexml->item->title = "My new title"; $simplexml->item->title['id'] = 42; Use the asXML() method to get an XML document back from SimpleXML. Alternatively, you can re-import a SimpleXML document into DOM using dom_import_simplexml().
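The assignments above, applied to a hypothetical <feed> document so the result can be inspected:

```php
<?php
// Hypothetical starting document
$sx = simplexml_load_string('<feed><item><title id="1">Old title</title></item></feed>');

// Reassign the node value and the attribute
$sx->item->title = 'My new title';
$sx->item->title['id'] = 42;

// Serialize straight from SimpleXML...
$xmlOut = $sx->asXML();

// ...or hand the same tree over to DOM for further work
$domElement = dom_import_simplexml($sx);
$domOut = $domElement->ownerDocument->saveXML();

echo $xmlOut;
```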
  • 53.
  • 54.
    Moving along: As you can see, XML support has been drastically improved in PHP 5. HTML support has been improved as well. The new tidy extension allows for intelligent HTML parsing, manipulation and repair.
  • 55.
    What exactly is Tidy? Tidy is an intelligent HTML parser. It can parse malformed HTML documents and intelligently correct the most common errors in their syntax: missing or misaligned end tags, unquoted attributes, and missing required elements. Tidy automatically adjusts itself based on the detected HTML document type.
  • 56.
    Using Tidy in PHP (tidy_syntax_fix.php): In its simplest form, Tidy will read an HTML document, parse it, correct any syntax errors, and let you display the corrected document. Use tidy_parse_file() to parse the file and tidy_get_output() to return the corrected HTML. Note that the value returned from tidy_parse_file() can also be treated as a string to get the output.
  • 57.
    Identifying problems with a document: Once a document has been parsed, you can identify problems with it by examining the return value of the tidy_get_error_buffer() function. It returns something like the following:
    line 1 column 1 - Warning: missing <!DOCTYPE> declaration
    line 1 column 1 - Warning: replacing unexpected i by </i>
    line 1 column 43 - Warning: <u> is probably intended as </u>
    line 1 column 1 - Warning: inserting missing 'title' element
  • 58.
    Repairing HTML documents: Once a document has been parsed, you can be sure it is valid from a syntax standpoint. However, this does not mean the document is actually web-standards compliant. To make a parsed HTML document standards compliant, call the tidy_clean_repair() function, which brings the document up to spec according to the configuration options (discussed later).
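The parse, inspect, and repair steps from the last three slides can be sketched together as follows. The example assumes the tidy extension is installed, and it uses tidy_parse_string() on a made-up HTML fragment in place of tidy_parse_file() only so that no input file is needed.

```php
<?php
// The tidy extension is optional; bail out early if it is not loaded
if (!extension_loaded('tidy')) {
    exit("tidy extension not available\n");
}

// Hypothetical malformed input: unclosed <i> and <p>, no doctype, no title
$html = '<p><i>unclosed italics<p>second paragraph';

$tidy = tidy_parse_string($html);

// Warnings recorded while parsing (missing doctype, unclosed tags, ...)
$problems = tidy_get_error_buffer($tidy);

// Bring the parsed document up to spec, then fetch the corrected markup
tidy_clean_repair($tidy);
$fixed = tidy_get_output($tidy);

echo $problems, "\n\n", $fixed, "\n";
```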
  • 59.
    Configuration Options? The vast majority of tidy's power comes from the configuration options which can be set. They allow you to do everything from replacing deprecated <FONT> tags with CSS to converting HTML 3.2 documents into XHTML 1.0 documents. Options can be set either at run time or through a configuration file. A default configuration can be set using the tidy.default_config php.ini directive.
  • 60.
    Runtime configuration (tidy_runtime_config.php): To configure tidy at run time, pass the configuration as the second parameter to tidy_parse_file(). If the second parameter is an array, it should be a series of key/value pairs mapping configuration options to values. If the second parameter is a string, it will be treated as a tidy configuration filename and loaded from the filesystem.
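A sketch of the array form, assuming the tidy extension is installed. It again substitutes tidy_parse_string(), which accepts the configuration as its second parameter in the same way, so the example does not depend on a file; the option values are illustrative choices.

```php
<?php
if (!extension_loaded('tidy')) {
    exit("tidy extension not available\n");
}

// Key/value pairs map tidy option names to their values
$config = array(
    'indent'       => true,
    'output-xhtml' => true,
    'wrap'         => 200,
);

// Second parameter: configuration array; third: input encoding
$tidy = tidy_parse_string('<p>Hello<br>world', $config, 'utf8');
tidy_clean_repair($tidy);

$out = tidy_get_output($tidy);
echo $out, "\n";
```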
  • 61.
    Configuration Files: Configuration files are useful for creating tidy "profiles" representing different tasks, for example a profile to strip all unnecessary data from an HTML document (to save bandwidth), or a profile to beautify HTML documents which are difficult to read.
  • 62.
    Configuration Files: Below is an example tidy configuration file:
    indent: yes
    indent-spaces: 4
    wrap: 4096
    tidy-mark: no
    new-blocklevel-tags: mytag, anothertag
  • 63.
    Parsing with Tidy: Along with all of the functionality for cleaning and repairing HTML, tidy can also be used to walk the parsed document as a tree of nodes. There are four different entry points: ROOT, HEAD, HTML, and BODY. Enter using the root(), head(), html(), or body() methods.
  • 64.
    The Tidy Node (pseudo_tidy_node.php): When calling one of the entry-point methods on the return value from tidy_parse_file(), you get back a tidyNode object. Each node represents a tag in the HTML document and lets you find out many interesting things about it. It also lets you pull out attributes quickly, making screen scraping a snap. Consult the PHP manual for details on type, etc.
  • 65.
    Example of using Tidy parsing (tidy_dump_nodes.php): In this example we will parse a document using Tidy and extract all of the URLs found within <A> tags. Check the $id property of the node to see if it matches the TIDY_TAG_A constant, then look for the 'href' key in the $attribute array.
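One way this extraction could be written is sketched below. It assumes the tidy extension is installed; the function name collectLinks() and the sample HTML are hypothetical, and the original tidy_dump_nodes.php may differ.

```php
<?php
if (!extension_loaded('tidy')) {
    exit("tidy extension not available\n");
}

// Hypothetical helper: walks the node tree and gathers href values from <a> tags
function collectLinks($node, array &$urls)
{
    // $id identifies the tag type; TIDY_TAG_A marks anchor elements
    if (isset($node->id) && $node->id == TIDY_TAG_A
        && isset($node->attribute['href'])) {
        $urls[] = $node->attribute['href'];
    }

    // Recurse into child nodes, if any
    if ($node->hasChildren()) {
        foreach ($node->child as $child) {
            collectLinks($child, $urls);
        }
    }
}

// Hypothetical input document
$html = '<html><head><title>Links</title></head><body>'
      . '<a href="http://www.php.net/">PHP</a> '
      . '<a href="http://www.zend.com/">Zend</a>'
      . '</body></html>';

$tidy = tidy_parse_string($html);

$urls = array();
collectLinks($tidy->body(), $urls);   // start from the BODY entry point

print_r($urls);
```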
  • 66.