Preprocessing #39

Answered by jamesturk
brasfb asked this question in Q&A
The matching HTML is sent to the LLM. So if you had HTML like:

<html>
<div class="sidebar"> ... </div>
<main><a href="#">Something</a><div>Something Else</div></main>
<div class="footer"> ... </div>
</html>

And passed the selector `main`, all of the innerHTML of that element would get sent. (If there are multiple matches, their contents are appended together.)
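To make that concrete, here is a rough sketch of the behavior described above, not the library's actual code. It uses only the standard library's `xml.etree.ElementTree`, which only works because this sample happens to be well-formed; real pages would need a proper HTML parser. `select_inner_html` and `inner_html` are hypothetical helper names, and matching is by tag name only rather than a full CSS selector.

```python
import xml.etree.ElementTree as ET

PAGE = """<html>
<div class="sidebar"> ... </div>
<main><a href="#">Something</a><div>Something Else</div></main>
<div class="footer"> ... </div>
</html>"""


def inner_html(element):
    """Serialize an element's leading text and children, without its own tag."""
    parts = [element.text or ""]
    for child in element:
        # tostring() also emits the child's trailing text (its "tail").
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


def select_inner_html(html, tag):
    """Hypothetical helper: append together the inner HTML of every <tag> match."""
    root = ET.fromstring(html)  # assumption: the markup is well-formed
    return "".join(inner_html(el) for el in root.iter(tag))


fragment = select_inner_html(PAGE, "main")
print(fragment)
# -> <a href="#">Something</a><div>Something Else</div>
```

The sidebar and footer never appear in `fragment`, so only the selected region would count toward what gets sent.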

Just extracting text would tend to lose context, and some data (URLs, phone numbers, etc.) might not be in the text, but instead in attributes. Further refinement of this library will likely entail figuring out how minimal the HTML sent can be without affecting results. (I'm currently working on building a test corpus of sorts, since these sorts…
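As a small illustration of the point about attributes, the sketch below compares a text-only view with the full markup. The snippet and its phone number are made up for the example, and it again leans on the standard library's `xml.etree.ElementTree`, which assumes well-formed markup.

```python
import xml.etree.ElementTree as ET

# Made-up snippet for illustration: the phone number lives only in the href.
SNIPPET = '<main><a href="tel:+15555550100">Call us</a><div>Open daily</div></main>'

root = ET.fromstring(SNIPPET)

# Text-only view: concatenates the text nodes and drops every attribute.
text_only = "".join(root.itertext())

# Markup view: keeps the tags and attributes.
markup = ET.tostring(root, encoding="unicode")

print(text_only)  # -> Call usOpen daily
print(markup)
```

The text-only version has no trace of the number, and the boundary between the link and the following element is gone as well.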

This discussion was converted from issue #38 on April 05, 2023 00:47.