incomplete parsing using myhtml_node_next and myhtml_node_text #157

Open
parser12 opened this issue Sep 10, 2018 · 3 comments

parser12 commented Sep 10, 2018

Hi,
I am using myhtml to parse the following HTML:

<html>
<span class="c3">
<span class="sonne" title="Sonnenscheindauer"><img width="20" height="20" src="whatever1.img" alt="sun" />0.0 h</span>
<span class="regen" title="Niederschlagsmenge"><img width="20" height="20" src="whatever2.img" alt="rain" />0 mm</span>
</span>
</html>

I expect to get the texts "0.0 h" and "0 mm".

My understanding of the tree is:
tag:span class c3

  • tag: span
    • attrib class
    • attrib title
  • tag: img
    • attrib width
    • attrib height
    • attrib src
  • 0 mm

I use:

node: span with class c3
subnode1: the tags span, img and the required text

pseudo code:

subNode1 = myhtml_node_child(node);
while (subNode1 != NULL) {
  printf("child: of %lu -> %s\n", myhtml_node_tag_id(node), myhtml_node_text(subNode1, &len));
  subNode1 = myhtml_node_next(subNode1);
}

The complete source code is attached, as well as the HTML file.

I am able to read e.g. the span with class=regen and the img with its attributes, but not the text "0 mm".
Do you have a suggestion?

@lexborisov
Owner

lexborisov commented Sep 10, 2018

@parser12 hi!
The input is hard to read as posted. Please use Markdown code blocks in comments. For HTML:

```HTML
<html>In this place HTML tags</html>
```

and for C code:
```C
subNode1 = myhtml_node_child(node);
```

See Creating and highlighting code blocks

Thanks!

@lexborisov
Owner

@parser12

After parsing, you get this tree:

<html>
  <head>
  <body>
    <span class="c3">
      "
      "
      <span class="sonne" title="Sonnenscheindauer">
        <img width="20" height="20" src="whatever1.img" alt="sun">
        "0.0 h"
      "
      "
      <span class="regen" title="Niederschlagsmenge">
        <img width="20" height="20" src="whatever2.img" alt="rain">
        "0 mm"
      "
      "
    "
    "

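Each pair of quotes in the tree is a text node holding a newline, for example the one right after <span class="c3">:

      "
      "

Iterating over the children of that node and printing the text only for text nodes:

subNode1 = myhtml_node_child(node);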

while (subNode1 != NULL) {
    printf("child: %s\n", myhtml_tag_name_by_id(subNode1->tree, myhtml_node_tag_id(subNode1), NULL));

    if (myhtml_node_tag_id(subNode1) == MyHTML_TAG__TEXT) {
        printf("Text: %s\n", myhtml_node_text(subNode1, NULL));
    }

    subNode1 = myhtml_node_next(subNode1);
}

Output:

child: -text
Text: \n
child: span
child: -text
Text: \n
child: span
child: -text
Text: \n

I think the general idea is clear?

For your task, see the functions for searching nodes and the example.

P.S.: you can use Modest and selectors for this.
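
The output above stops at the direct children of the outer span, so the wanted text sits one level deeper, as the last child of each inner span. Below is a minimal sketch of walking every level and keeping only non-blank text nodes. It deliberately uses a hypothetical stand-in struct (`node_t`) instead of the real myhtml types, so it compiles without the library; the `child` and `next` fields play the role of `myhtml_node_child` and `myhtml_node_next`, and the `"-text"` comparison plays the role of the `MyHTML_TAG__TEXT` check in the snippet above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a parsed node; this is NOT the myhtml type. */
typedef struct node {
    const char *tag;     /* "-text" for text nodes, otherwise the element name */
    const char *text;    /* text content; NULL for element nodes */
    struct node *child;  /* first child (role of myhtml_node_child) */
    struct node *next;   /* next sibling (role of myhtml_node_next) */
} node_t;

/* Returns 1 if s is NULL, empty, or whitespace only (such as "\n"). */
int is_blank(const char *s)
{
    if (s == NULL) {
        return 1;
    }
    for (; *s != '\0'; s++) {
        if (!isspace((unsigned char)*s)) {
            return 0;
        }
    }
    return 1;
}

/*
 * Depth-first walk over everything below `parent`.
 * Stores a pointer to the text of each non-blank text node in out[],
 * up to `max` entries, and returns how many were stored.
 */
size_t collect_text(const node_t *parent, const char **out, size_t max)
{
    assert(parent != NULL);
    assert(out != NULL);

    size_t count = 0;
    for (const node_t *n = parent->child; n != NULL; n = n->next) {
        if (strcmp(n->tag, "-text") == 0) {
            /* Text node: keep it unless it is only a newline or spaces. */
            if (!is_blank(n->text) && count < max) {
                out[count++] = n->text;
            }
        } else {
            /* Element node (span, img, ...): descend into its children. */
            count += collect_text(n, out + count, max - count);
        }
    }
    return count;
}
```

Called on a stand-in for the span with class c3 built like the tree above, `collect_text` skips the three newline text nodes and stores "0.0 h" and "0 mm". With the real library, the same shape should apply by replacing the field accesses with the myhtml calls already used in this thread.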

@parser12
Author

Thank you for the really fast response.
I thought I already had this as a solution, but I assume I was looping at the node level rather than at the subNode1 level.
