This issue was moved to a discussion.

[Question] does content.url in filename for websites make sense? (I want attribution per paragraph via separate prompt) #491

Closed
chaelli opened this issue May 15, 2024 · 4 comments

Comments

@chaelli
Contributor

chaelli commented May 15, 2024

Context / Scenario

I changed the prompt so that the LLM includes the source for each paragraph of the answer, which lets me align the response more closely with the facts for my users. When I do that, I can only tell it to reference the file name (as this is what the LLM gets in the facts part of the prompt). For websites this is always "content.url", because it is set that way in the following line:

fileName: "content.url",

Question

I wonder whether it would make more sense to put the URL there instead of a static string, or at least to include the URL in the facts where it exists.

@chaelli chaelli added the question Further information is requested label May 15, 2024
@dluc
Collaborator

dluc commented May 15, 2024

You should be able to swap content.url with the URL upon receiving the response; there is a property with the URL.

@chaelli
Contributor Author

chaelli commented May 15, 2024

This only works if there is just one relevant source. If there are multiple sources, they are all called content.url, so I cannot tell which part of the answer is based on which page and cannot align separate sources to separate paragraphs.
FYI, until I started using Kernel Memory, I just used a prompt like this:

Add a source reference to the end of each sentence. e.g. Apple is a fruit ([Reference page title](Reference page url)) (markdown link formatting). ...
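
To illustrate the limitation described above, here is a minimal Python sketch of swapping the placeholder after the answer comes back. It is not Kernel Memory code: `swap_placeholder` and its arguments are hypothetical names, and the list of URLs stands in for whatever per-source URL property the response exposes. With one distinct URL the swap is unambiguous; with several, every citation reads "content.url" and nothing in the text says which page each one refers to.

```python
PLACEHOLDER = "content.url"  # static file name used for web pages, per the issue


def swap_placeholder(answer: str, source_urls: list[str]) -> str:
    """Replace the placeholder with the real URL, but only when it is unambiguous.

    `source_urls` is a hypothetical list of URLs taken from the response's sources.
    """
    # Deduplicate while preserving order.
    distinct = list(dict.fromkeys(source_urls))
    if len(distinct) == 1:
        return answer.replace(PLACEHOLDER, distinct[0])
    # Several pages share the same placeholder name: there is no way to map
    # each citation back to its page, so the answer is left untouched.
    return answer
```

With two different pages in the sources, the function returns the answer unchanged, which is the case the comment above describes.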

@chaelli
Contributor Author

chaelli commented May 27, 2024

@dluc Do you have any preference between the options:

  • replace "content.url" during indexing with the real URL value?
  • adding the URL as an additional value in the prompt?

Or none of them?

@dluc
Collaborator

dluc commented May 27, 2024

> @dluc Do you have any preference between the options:
>
> * replace "content.url" during indexing with the real URL value?
> * adding the URL as an additional value in the prompt?
>
> Or none of them?

I would try the approach with the prompt; it should be easier. Changing the indexing pipeline might have unexpected impact.
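
The prompt-side approach can be sketched roughly as follows, in Python for brevity. This is an illustration under stated assumptions, not the actual SearchClient change: the `Fact` type, the `web_page_url` field, and the `==== [Source: ...]` layout are all hypothetical. The idea is that each fact is labeled with its page URL when one exists and falls back to the file name otherwise, so the LLM sees a distinct source per fact and the index is left untouched.

```python
from dataclasses import dataclass
from typing import Optional

PLACEHOLDER_NAME = "content.url"  # static file name used for web pages, per the issue


@dataclass
class Fact:
    """Hypothetical shape of one retrieved chunk handed to the prompt."""
    file_name: str
    content: str
    web_page_url: Optional[str] = None


def source_label(fact: Fact) -> str:
    # Prefer the real URL for web pages; keep the file name for everything else.
    if fact.file_name == PLACEHOLDER_NAME and fact.web_page_url:
        return fact.web_page_url
    return fact.file_name


def render_facts(facts: list[Fact]) -> str:
    # Illustrative fact layout; the real template may differ.
    return "\n".join(f"==== [Source: {source_label(f)}]\n{f.content}" for f in facts)
```

Because only the text sent to the LLM changes, previously indexed documents do not need to be re-ingested, which matches the concern about touching the indexing pipeline.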

chaelli pushed a commit to chaelli/kernel-memory that referenced this issue May 27, 2024
Update SearchClient to use webPageUrl instead of static fileName for webpages
chaelli pushed a commit to chaelli/kernel-memory that referenced this issue May 27, 2024
Update SearchClient to use webPageUrl instead of static fileName for webpages
@dluc dluc closed this as completed in 17d73e0 May 27, 2024
@microsoft microsoft locked and limited conversation to collaborators Jun 4, 2024
@dluc dluc converted this issue into discussion #553 Jun 4, 2024
@dluc dluc added discussion and removed question Further information is requested labels Jun 4, 2024
