A library for comparing two HTML files/snippets and highlighting the differences using simple HTML. Includes support for comparing complex lists and tables. Originally forked from https://github.com/rashid2538/php-htmldiff.

SavageTiger/php-htmldiff

 
 

php-htmldiff

php-htmldiff is a library for comparing two HTML files/snippets and highlighting the differences using simple HTML.

This HTML Diff implementation was forked from rashid2538/php-htmldiff and has been modified with new features, bug fixes, and enhancements to the original code.

For more information on these modifications, read the differences from rashid2538/php-htmldiff or view the CHANGELOG.

Installation

The recommended way to install php-htmldiff is through Composer. Require the caxy/php-htmldiff package by running the following command:

composer require caxy/php-htmldiff

This will resolve the latest stable version.

Otherwise, install the library and set up the autoloader yourself.

Working with Symfony

If you are using Symfony, you can use the caxy/HtmlDiffBundle to make life easy!

Usage

    use Caxy\HtmlDiff\HtmlDiff;

    $htmlDiff = new HtmlDiff($oldHtml, $newHtml);
    $content = $htmlDiff->build();
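The string returned by build() is an HTML fragment in which changed text is wrapped in <ins> and <del> tags, so it usually needs some styling before the changes stand out. The sketch below wraps a fragment in a bare page with two CSS rules. wrapDiffForDisplay() is a hypothetical helper written for this example, not part of php-htmldiff, and the colors are arbitrary.

```php
<?php

/**
 * Wrap a diff fragment in a minimal HTML page that colors <ins> and <del>.
 * Hypothetical helper for illustration; php-htmldiff does not provide it.
 */
function wrapDiffForDisplay(string $diffHtml, string $title = 'HTML diff'): string
{
    // Escape the title only. The diff fragment is already HTML and is inserted as-is.
    $safeTitle = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');

    return <<<HTML
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>{$safeTitle}</title>
<style>
ins { background: #cfc; text-decoration: none; }
del { background: #fcc; }
</style>
</head>
<body>
{$diffHtml}
</body>
</html>
HTML;
}

// Usage with the library (assumes Composer's autoloader is available):
//
//   require __DIR__ . '/vendor/autoload.php';
//   $diff = new \Caxy\HtmlDiff\HtmlDiff('<p>Hello world</p>', '<p>Hello there</p>');
//   echo wrapDiffForDisplay($diff->build());
```

Because the fragment is inserted unescaped, only pass it HTML you trust or have already sanitized.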

Configuration

The configuration for HtmlDiff is contained in the Caxy\HtmlDiff\HtmlDiffConfig class.

There are two ways to set the configuration:

  1. Configure an Existing HtmlDiff Object
  2. Create and Use a HtmlDiffConfig Object

Configure an Existing HtmlDiff Object

When a new HtmlDiff object is created, it creates a HtmlDiffConfig object with the default configuration. You can change the configuration using setters on the object:

    use Caxy\HtmlDiff\HtmlDiff;

    // ...

    $htmlDiff = new HtmlDiff($oldHtml, $newHtml);

    // Set some of the configuration options.
    $htmlDiff->getConfig()
        ->setMatchThreshold(80)
        ->setInsertSpaceInReplace(true)
    ;

    // Calculate the differences using the configuration and get the html diff.
    $content = $htmlDiff->build();

    // ...

Create and Use a HtmlDiffConfig Object

You can also set the configuration by creating an instance of Caxy\HtmlDiff\HtmlDiffConfig and passing it to HtmlDiff::create when creating a new HtmlDiff object.

This is useful when creating more than one instance of HtmlDiff:

    use Caxy\HtmlDiff\HtmlDiff;
    use Caxy\HtmlDiff\HtmlDiffConfig;

    // ...

    $config = new HtmlDiffConfig();
    $config
        ->setMatchThreshold(95)
        ->setInsertSpaceInReplace(true)
    ;

    // Create an HtmlDiff object with the custom configuration.
    $firstHtmlDiff = HtmlDiff::create($oldHtml, $newHtml, $config);
    $firstContent = $firstHtmlDiff->build();

    $secondHtmlDiff = HtmlDiff::create($oldHtml2, $newHtml2, $config);
    $secondHtmlDiff->getConfig()->setMatchThreshold(50);
    $secondContent = $secondHtmlDiff->build();

    // ...

Full Configuration with Defaults:

    $config = new HtmlDiffConfig();
    $config
        // Percentage required for list items to be considered a match.
        ->setMatchThreshold(80)

        // Set the encoding of the text to be diffed.
        ->setEncoding('UTF-8')

        // If true, a space will be added between the <del> and <ins> tags of text that was replaced.
        ->setInsertSpaceInReplace(false)

        // Option to disable the new Table Diffing feature and treat tables as regular text.
        ->setUseTableDiffing(true)

        // Pass an instance of \Doctrine\Common\Cache\Cache to cache the calculated diffs.
        ->setCacheProvider(null)

        // Set the cache directory that HTMLPurifier should use.
        ->setPurifierCacheLocation(null)

        // Group consecutive deletions and insertions instead of showing a deletion and insertion for each word individually.
        ->setGroupDiffs(true)

        // List of characters to consider part of a single word when in the middle of text.
        ->setSpecialCaseChars(array('.', ',', '(', ')', '\''))

        // List of tags to treat as special case tags.
        ->setSpecialCaseTags(array('strong', 'b', 'i', 'big', 'small', 'u', 'sub', 'sup', 'strike', 's', 'p'))

        // List of tags (and their replacement strings) to be diffed in isolation.
        ->setIsolatedDiffTags(array(
            'ol'     => '[[REPLACE_ORDERED_LIST]]',
            'ul'     => '[[REPLACE_UNORDERED_LIST]]',
            'sub'    => '[[REPLACE_SUB_SCRIPT]]',
            'sup'    => '[[REPLACE_SUPER_SCRIPT]]',
            'dl'     => '[[REPLACE_DEFINITION_LIST]]',
            'table'  => '[[REPLACE_TABLE]]',
            'strong' => '[[REPLACE_STRONG]]',
            'b'      => '[[REPLACE_B]]',
            'em'     => '[[REPLACE_EM]]',
            'i'      => '[[REPLACE_I]]',
            'a'      => '[[REPLACE_A]]',
        ))
    ;
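The setCacheProvider option accepts a \Doctrine\Common\Cache\Cache instance but is left at null by default. The sketch below shows one way it might be wired up, assuming doctrine/cache 1.x is installed and using its in-memory ArrayCache. The choice of ArrayCache and the sample strings are assumptions made for illustration; a persistent provider would be needed to keep results across requests.

```php
<?php

use Caxy\HtmlDiff\HtmlDiff;
use Caxy\HtmlDiff\HtmlDiffConfig;
use Doctrine\Common\Cache\ArrayCache;

// Assumes the library and doctrine/cache 1.x were installed through Composer.
require __DIR__ . '/vendor/autoload.php';

// ArrayCache stores entries in memory for the lifetime of the current process only.
$config = new HtmlDiffConfig();
$config->setCacheProvider(new ArrayCache());

// Illustrative input, not taken from the project.
$oldHtml = '<p>The quick brown fox</p>';
$newHtml = '<p>The quick red fox</p>';

// Both diffs share the same config, and therefore the same cache provider.
$first = HtmlDiff::create($oldHtml, $newHtml, $config)->build();
$second = HtmlDiff::create($oldHtml, $newHtml, $config)->build();

echo $second;
```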

Contributing

See CONTRIBUTING file.

Contributor Code of Conduct

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms. See CODE_OF_CONDUCT file.

Credits

Did we miss anyone? If we did, let us know or put in a pull request!

License

php-htmldiff is available under the GNU General Public License, version 2. See the LICENSE file for details.

TODO

  • Tests, tests, and more tests! (mostly unit tests) - need more tests before we can do major refactoring / cleanup for a v1 release
  • Add documentation for setting up a cache provider (doctrine cache)
    • Maybe add abstraction layer for cache + adapter for doctrine cache
  • Make HTML Purifier an optional dependency - possibly use abstraction layer for purifiers so alternatives could be used (or none at all for performance)
  • Expose configuration for HTML Purifier (used in table diffing) - currently only cache dir is configurable through HtmlDiffConfig object
  • Add option to enable using HTML Purifier to purify all input
  • Performance improvements (we have 1 benchmark test, we should probably get more)
    • Algorithm improvements - trimming alike text at start and ends, store nested diff results in memory to re-use (like we do w/ caching)
    • Benchmark using DOMDocument vs. alternatives vs. string parsing
  • Benchmarking
  • Look into removing dependency on php-simple-html-dom-parser library - possibly find alternative or no library at all. Consider how this affects performance.
  • Refactoring (but... tests first)
    • Overall design/architecture improvements
    • API improvements so a new HtmlDiff isn't required for each new diff (especially so that configuration can be re-used)
  • Split demo application to separate repository
  • Add documentation on alternative htmldiff engines and perhaps some comparisons
