Skip to content

Conversation

@jschroed91
Copy link
Member

Change diffing to strike through entire words/numbers if they contain periods or commas within the word.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pretty messy if statement, I'll figure out a way to clean it up.

I do have a question on implementation:
As you can see I added the specialCaseChars array for characters that should be considered part of the 'word' when inside it. I only set it to period and comma here.

I'm wondering if I should change it so that any character would be considered part of the word if inside, instead of whitelisting (consider part of the word if not alpha-numeric but the next character in text is anything other than whitespace).

For example, by including all chars, then: this-is-hyphenated would be considered one word, instead of breaking it up like |this|-|is|-|hyphenated|

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, we don't want to end up with tags as parts of a word. I would think
hyphenated words should be diff'd sanely.
On May 30, 2014 5:20 PM, "Josh Schroeder" notifications@github.com wrote:

In lib/Caxy/HtmlDiff/HtmlDiff.php:

@@ -195,7 +201,10 @@ private function convertHtmlToListOfWords($characterString)
$current_word = $character;
$mode = 'whitespace';
} else {

  •  if ( ctype_alnum( $character ) && ( strlen($current_word) == 0 || ctype_alnum( $current_word ) ) ) { 
  •  if ( 
  •  (ctype_alnum($character) && (strlen($current_word) == 0 || $this->isSingleWord($current_word))) || 
  •  (in_array($character, $this->specialCaseChars) && isset($characterString[$i+1]) && $this->isSingleWord($characterString[$i+1])) 
  •  ) { 

This is a pretty messy if statement, I'll figure out a way to clean it up.

I do have a question on implementation:
As you can see I added the specialCaseChars array for characters that
should be considered part of the 'word' when inside it. I only set it to
period and comma here.

I'm wondering if I should change it so that any character would be
considered part of the word if inside, instead of whitelisting (consider
part of the word if not alpha-numeric but the next character in text is
anything other than whitespace).

For example, by including all chars, then: this-is-hyphenated would be
considered one word, instead of breaking it up like |this|-|is|-|hyphenated|


Reply to this email directly or view it on GitHub
https://github.com/caxy/php-htmldiff/pull/3/files#r13254106.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps ( and ) as should be considered as part of a word as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 these are common with table and figure titles

On Fri, May 30, 2014 at 6:07 PM, Brandon Raxter notifications@github.com
wrote:

In lib/Caxy/HtmlDiff/HtmlDiff.php:

@@ -195,7 +201,10 @@ private function convertHtmlToListOfWords($characterString)
$current_word = $character;
$mode = 'whitespace';
} else {

  •  if ( ctype_alnum( $character ) && ( strlen($current_word) == 0 || ctype_alnum( $current_word ) ) ) { 
  •  if ( 
  •  (ctype_alnum($character) && (strlen($current_word) == 0 || $this->isSingleWord($current_word))) || 
  •  (in_array($character, $this->specialCaseChars) && isset($characterString[$i+1]) && $this->isSingleWord($characterString[$i+1])) 
  •  ) { 

Perhaps ( and ) as should be considered as part of a word as well?


Reply to this email directly or view it on GitHub
https://github.com/caxy/php-htmldiff/pull/3/files#r13255285.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call Brandon - I will add them. Any others that you guys can think of?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Symbols that might be used? #@$% maybe?

@mgersten-caxy
Copy link
Contributor

Looks good, thanks

mgersten-caxy added a commit that referenced this pull request Jun 2, 2014
@mgersten-caxy mgersten-caxy merged commit 213afcc into master Jun 2, 2014
@mgersten-caxy mgersten-caxy deleted the feature-nonpartial_word_diffing branch June 2, 2014 00:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

3 participants