Provide a more faithful json mapping of html sibling nodes of mixed types #110

Open
andremarianiello opened this issue Feb 23, 2019 · 1 comment · May be fixed by #111


@andremarianiello

Currently, the json{} displayer combines all text node children of an element node into a single string (separated by spaces):

# echo '<div><span>a</span>1<span>b</span>2 3</div>' | pup 'div json{}'
[
 {
  "children": [
   {
    "tag": "span",
    "text": "a"
   },
   {
    "tag": "span",
    "text": "b"
   }
  ],
  "tag": "div",
  "text": "1 2 3"
 }
]

This is identical to the output for a different HTML document:

# echo '<div><span>a</span>1 2<span>b</span>3</div>' | pup 'div json{}'
[
 {
  "children": [
   {
    "tag": "span",
    "text": "a"
   },
   {
    "tag": "span",
    "text": "b"
   }
  ],
  "tag": "div",
  "text": "1 2 3"
 }
]

I would like the json output to preserve the distinction between these documents. For example,

# echo '<div><span>a</span>1<span>b</span>2 3</div>' | pup 'div json{}'
[
 {
  "children": [
   {
    "children": [
     {
      "text": "a"
     }
    ],
    "tag": "span"
   },
   {
    "text": "1"
   },
   {
    "children": [
     {
      "text": "b"
     }
    ],
    "tag": "span"
   },
   {
    "text": "2 3"
   }
  ],
  "tag": "div"
 }
]

# echo '<div><span>a</span>1 2<span>b</span>3</div>' | pup 'div json{}'
[
 {
  "children": [
   {
    "children": [
     {
      "text": "a"
     }
    ],
    "tag": "span"
   },
   {
    "text": "1 2"
   },
   {
    "children": [
     {
      "text": "b"
     }
    ],
    "tag": "span"
   },
   {
    "text": "3"
   }
  ],
  "tag": "div"
 }
]
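
To make the requested shape concrete, here is a minimal sketch of the mapping in Python, using only the standard library's html.parser. It is not pup's code and says nothing about how pup is implemented; the names OrderedTreeBuilder, to_tree and clean are hypothetical, attributes are ignored, and well-formed markup is assumed. The only point is that text nodes become entries in "children" at their original position instead of being merged into one "text" string.

```python
import json
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class OrderedTreeBuilder(HTMLParser):
    """Builds nested dicts in which text nodes keep their place among siblings."""

    def __init__(self):
        super().__init__()
        self.root = {"children": []}  # synthetic root, not part of the output
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Only pop when the closing tag matches the innermost open element.
        if len(self.stack) > 1 and self.stack[-1].get("tag") == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only text nodes
            self.stack[-1]["children"].append({"text": text})


def clean(node):
    """Drop empty 'children' lists so leaves match the shape proposed above."""
    if "tag" not in node:
        return node
    out = {"tag": node["tag"]}
    if node["children"]:
        out["children"] = [clean(child) for child in node["children"]]
    return out


def to_tree(html):
    builder = OrderedTreeBuilder()
    builder.feed(html)
    builder.close()
    return [clean(child) for child in builder.root["children"]]


if __name__ == "__main__":
    first = to_tree("<div><span>a</span>1<span>b</span>2 3</div>")
    second = to_tree("<div><span>a</span>1 2<span>b</span>3</div>")
    print(json.dumps(first, indent=1, sort_keys=True))
    print(first != second)
```

With this mapping the two example documents no longer collapse to the same JSON, because the text "1" and "2 3" (or "1 2" and "3") stay as separate sibling entries.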
@carlosvsilva

I'm also trying to parse a webpage that has an item name followed by its price, then a second item name followed by its own price. When I combine the two selectors with a comma ',', I get both item names first and both prices last, separated from their names. Why doesn't pup keep the original order?
Is there any way to fix this?
I'm running: pup '[class="product-name"],[class="price"] json{}'

Thanks
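
One possible workaround outside pup, sketched under assumptions: walk the document once and record every element whose class attribute equals one of the wanted values, so matches come out in document order. This uses Python's standard html.parser, not pup; the ClassCollector and collect_in_order names and the sample markup are hypothetical, and well-formed markup is assumed. The thread does not say why pup groups results by selector, so this does not explain or change that behaviour.

```python
from html.parser import HTMLParser

# Elements without a closing tag; they are never pushed on the open-element stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}

# Hypothetical sample page: each name is followed by its own price.
SAMPLE = """
<ul>
  <li><span class="product-name">Widget</span><span class="price">$5</span></li>
  <li><span class="product-name">Gadget</span><span class="price">$7</span></li>
</ul>
"""


class ClassCollector(HTMLParser):
    """Collects elements whose class attribute exactly equals a wanted value."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.results = []  # matches, appended in document order
        self.stack = []    # one (tag, record-or-None) pair per open element

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        record = None
        if cls in self.wanted:  # exact match, like [class="..."]
            record = {"class": cls, "tag": tag, "text": ""}
            self.results.append(record)
        if tag not in VOID_TAGS:
            self.stack.append((tag, record))

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1][0] == tag:
            self.stack.pop()

    def handle_data(self, data):
        # Text belongs to every matched element that is currently open.
        for _, record in self.stack:
            if record is not None:
                record["text"] += data


def collect_in_order(html, wanted):
    collector = ClassCollector(wanted)
    collector.feed(html)
    collector.close()
    return [{"class": r["class"], "text": r["text"].strip()}
            for r in collector.results]


if __name__ == "__main__":
    for row in collect_in_order(SAMPLE, {"product-name", "price"}):
        print(row["class"], row["text"])
```

Because matches are recorded as the parser meets them, each name is immediately followed by its price rather than all names coming before all prices.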
