Tuesday, January 3, 2017

Super difficult bug to identify in PHP Simple HTML DOM Parser Library

Insert, edit, and append operations in the PHP Simple HTML DOM library can fail silently, for no apparent reason.

I suspect the cause is an internal logic bug in the library, or maybe just a memory leak, as the library seems to be pretty memory-inefficient. Anyway, the bug looks like this:

Scenario:

You update an element through its innertext or outertext property. Everything seems fine, but when you print your DOM object, it doesn't include the changed or added elements.

SOLUTION / WORKAROUND:

Call the ->clear() method on the $html object and reload it from its string form every now and then:


$x = (string) $html;
$html->clear();
$html = str_get_html($x);




UPDATE:
Recreating the simple_html_dom object with the method above adds significant overhead to page load time (~100 ms in my tests), so I decided to investigate the issue in more detail. The result: the bug is linked to a line that changes the innertext of the body tag:
$html->find("body", 0)->innertext.=$BOX;

After this call, any further change to innertext fails. So instead of extending the body tag directly this way, I added a dummy child element to the body tag and put the $BOX content there. That worked well. So I guess there must be a bug somewhere in simple_html_dom that makes it fail silently once the body tag's innertext is changed. It seems unable to rebuild the DOM tree when the root element is modified?
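
The dummy-child workaround could look roughly like the sketch below. It is only an illustration under assumptions: the library is assumed to live in simple_html_dom.php next to the script, the boxholder id and the sample markup are made up, and the empty container is assumed to already be part of the page source. A plausible explanation for why this helps (an assumption, not confirmed here) is that a node whose innertext has been assigned is rendered from that stored string rather than from its children, so later edits to descendants of body never reach the output.

```php
<?php
require_once 'simple_html_dom.php'; // assumed location of the library file

$BOX = '<div class="box">Hello</div>'; // placeholder for the real box markup

// Assumption: the page source already contains an empty container in <body>.
// The id "boxholder" is a hypothetical name chosen for this sketch.
$source = '<html><body><p id="msg">old</p><div id="boxholder"></div></body></html>';

$html = str_get_html($source);

// Write into the dummy child instead of appending to the body's innertext.
$html->find('#boxholder', 0)->innertext = $BOX;

// A later edit to another element should still show up in the output.
$html->find('#msg', 0)->innertext = 'new';

$out = (string) $html;
echo $out;
```

Because body itself is never assigned to, its children keep being rendered from the live tree, which is why the second edit is not lost.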
UPDATE AGAIN:

It seems that simple_html_dom really doesn't support changing HTML that was added dynamically (via innertext).

Alternative:
Use a new $html2 object to process the tags, then convert it back to a string.
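
One way to read this alternative is sketched below: parse the dynamic fragment as its own document, edit it there, and only then hand the finished string to the main document. The file path, the sample markup, the span.title selector, and the boxholder id are all assumptions made for the example.

```php
<?php
require_once 'simple_html_dom.php'; // assumed location of the library file

$BOX = '<div class="box"><span class="title">draft</span></div>'; // sample fragment

// Parse the fragment on its own, so the edit happens on real nodes.
$html2 = str_get_html($BOX);
$html2->find('span.title', 0)->innertext = 'final';
$BOX = (string) $html2; // back to a string
$html2->clear();        // free the temporary tree

// Insert the finished string into the main document as the last step.
$html = str_get_html('<html><body><div id="boxholder"></div></body></html>');
$html->find('#boxholder', 0)->innertext = $BOX;

$out = (string) $html;
echo $out;
```

The point of the ordering is that nothing inside the inserted markup needs to be changed after it has been assigned through innertext.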
