Parsing introduces new line breaks when outputting the html with multiple childs #96

Open
mangei opened this issue Jan 13, 2023 · 3 comments
mangei commented Jan 13, 2023

What is this feature about (expected vs actual behaviour)?

When parsing a file, I would like to have the original html of an element, so that I can search-replace a specific part of a document, without changing/updating the rest.

The issue is that $el->html does not return the original string if any of the element's children has more than one child of its own: the returned HTML contains additional line breaks.

How can I reproduce it?

Script: (it shows my full use-case; the notable part is highlighted)

<?php

use voku\helper\HtmlDomParser;

require_once '../composer/autoload.php';

$fileContent = file_get_contents('./test.html');
$dom = HtmlDomParser::str_get_html($fileContent);

foreach($dom->find('.mydiv') as $myDivEl) {
    $currentHtml = $myDivEl->html;
    echo $currentHtml;                                           // <---- here you can see the wrong output (you can skip the rest)

    $newContent = "";
    foreach($myDivEl->find('.mydiv-item') as $childEl) {
        $childEl->class = 'replaced';

        $newContent .= $childEl;
    }

    $myDivEl->outerhtml = '<div class="myreplacement">' . $newContent . '</div>';
    
    $fileContent = str_replace($currentHtml, $myDivEl->html, $fileContent);
}

file_put_contents('./test-out.html', $fileContent);

Input HTML file:

<html>
<body>
        <div class="mydiv">
        </div>
        <div class="mydiv">
            <div class="mydiv-item"><span>A1</span></div>
        </div>
        <div class="mydiv">
            <div class="mydiv-item"><span>B1</span><span>B2</span></div>
        </div>
</body>
</html>

Actual output: (B is not replaced)

<html>
<body>
        <div class="myreplacement"></div>
        <div class="myreplacement"><div class="replaced"><span>A1</span></div></div>
        <div class="mydiv">
            <div class="mydiv-item"><span>B1</span><span>B2</span></div>
        </div>
</body>
</html>

Expected output:

<html>
<body>
        <div class="myreplacement"></div>
        <div class="myreplacement"><div class="replaced"><span>A1</span></div></div>
        <div class="myreplacement"><div class="replaced"><span>B1</span><span>B2</span></div></div>
</body>
</html>

The problem is that the HTML returned for a selected element differs from the source when one of its children has more than one child, so the search-replace finds no match. This is the HTML returned for each of the three elements:

<div class="mydiv">
        </div>

A:
<div class="mydiv">
            <div class="mydiv-item"><span>A1</span></div>
        </div>

B:
<div class="mydiv">
            <div class="mydiv-item">
<span>B1</span><span>B2</span>
</div>
        </div>

B should be:

<div class="mydiv">
            <div class="mydiv-item"><span>B1</span><span>B2</span></div>
        </div>
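
Why the replacement is silently skipped can be shown without the parser at all. The sketch below uses only PHP's built-in `str_replace`; the two strings are shortened stand-ins for the source markup and the re-serialized markup of case B, and the variable names are made up for illustration.

```php
<?php
// Stand-in for the markup as it appears in the source file (case B).
$original = '<div class="mydiv-item"><span>B1</span><span>B2</span></div>';

// Stand-in for what the parser returns: same markup plus two line breaks.
$reserialized = "<div class=\"mydiv-item\">\n<span>B1</span><span>B2</span>\n</div>";

// str_replace() matches byte-for-byte, so the extra line breaks prevent a match.
// The optional fourth argument receives the number of replacements performed.
$out = str_replace($reserialized, '<div class="replaced"></div>', $original, $count);

if ($count === 0) {
    // No exception or warning is raised; the input comes back unchanged.
    echo "No match: the search string differs from the source text\n";
}
```

Checking `$count` after each call is a cheap way to notice when a search string no longer matches the file.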

Does it take minutes, hours or days to fix?

Minutes?

Any additional information?

.

Thanks for your help!

mangei (Author) commented Jan 15, 2023

It would also help if I could get the original parsed text of an element, so that I can search and replace it, for example as from/to indices into the original parsed string.

voku added a commit that referenced this issue Feb 12, 2023
voku (Owner) commented Feb 12, 2023

It's much simpler to modify the HtmlDom object directly than to do string replacements on the file content. Here is an example: 7571bee
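
For readers who cannot open the commit, here is a minimal sketch of the general idea: change the nodes in the parsed tree and serialize the document once, instead of matching strings in the file. This is not the code from 7571bee. To stay self-contained it uses PHP's built-in DOM extension (`DOMDocument`, `DOMXPath`) rather than `voku\helper\HtmlDomParser`, and the XPath expressions assume each element carries exactly one class, as in the input file above.

```php
<?php
// Sketch only: built-in DOM extension, not voku\helper\HtmlDomParser.
$html = <<<'HTML'
<html>
<body>
        <div class="mydiv">
        </div>
        <div class="mydiv">
            <div class="mydiv-item"><span>A1</span></div>
        </div>
        <div class="mydiv">
            <div class="mydiv-item"><span>B1</span><span>B2</span></div>
        </div>
</body>
</html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Copy the matches into an array first, because nodes are replaced in the loop.
$wrappers = iterator_to_array($xpath->query('//div[@class="mydiv"]'));

foreach ($wrappers as $wrapper) {
    $replacement = $doc->createElement('div');
    $replacement->setAttribute('class', 'myreplacement');

    // appendChild() moves each item out of the old wrapper into the new one.
    $items = iterator_to_array($xpath->query('.//div[@class="mydiv-item"]', $wrapper));
    foreach ($items as $item) {
        $item->setAttribute('class', 'replaced');
        $replacement->appendChild($item);
    }

    $wrapper->parentNode->replaceChild($replacement, $wrapper);
}

// Serialize the whole document once; no string matching is involved.
$result = $doc->saveHTML();
```

The serialized output is not guaranteed to be byte-identical to the input outside the replaced elements, since the serializer may normalize whitespace or add a doctype. That is the trade-off against the search-replace approach described in the issue.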

voku self-assigned this Feb 12, 2023

ducwp commented May 16, 2023

This is a shortcoming of this library. With multiple parent selectors and multiple child selectors, it becomes a big problem.

Example:

<html>
<body>
        <div class="mydiv">
        </div>
        <div class="mydiv_a">
            <div class="mydiv-item"><span>A1</span></div>
        </div>
        <div class="mydiv_b">
            <div class="mydiv-item"><span>B1</span><span>B2</span></div>
        </div>

<div class="mydiv_c">
            <div class="mydiv-item-next"><span>B1</span><span>B2</span></div>
        </div>
</body>
</html>
