H-2692: Infer facts from text before proposing entities #4467

Merged
merged 16 commits into from
May 22, 2024

Member

@benwerner01 benwerner01 commented May 15, 2024

🌟 What is the purpose of this PR?

This PR modifies how entities are proposed in the research action, and stops making use of the inferEntitiesFromContent action to propose entities. The process of proposing entities in the worker agent is now:

  1. Summarise all relevant entities in the text provided
  2. Infer facts from the text, which have a subject, predicate, and singular object
  3. For each summarised entity, propose the entity and its outgoing links based on the facts which have the entity as their "subject"
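
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the type names (`EntitySummary`, `Fact`, `ProposedEntity`), their fields, and the function `proposeEntitiesFromFacts` are all assumptions made for this example, and the LLM-backed steps 1 and 2 are replaced by hard-coded sample output.

```typescript
// Hypothetical shapes: the PR description does not show its actual types.
type EntitySummary = { localId: string; name: string; summary: string };

type Fact = {
  subjectEntityLocalId: string;
  predicate: string;
  // A fact has a single object: either plain text or another summarised entity.
  object: { kind: "text"; value: string } | { kind: "entity"; localId: string };
};

type ProposedEntity = {
  localId: string;
  name: string;
  properties: Record<string, string>;
  outgoingLinks: { linkType: string; targetLocalId: string }[];
};

// Step 3: build one proposal per summarised entity, using only the facts
// whose subject is that entity.
const proposeEntitiesFromFacts = (
  summaries: EntitySummary[],
  facts: Fact[],
): ProposedEntity[] =>
  summaries.map((entity) => {
    const proposal: ProposedEntity = {
      localId: entity.localId,
      name: entity.name,
      properties: {},
      outgoingLinks: [],
    };
    for (const fact of facts) {
      if (fact.subjectEntityLocalId !== entity.localId) continue;
      if (fact.object.kind === "entity") {
        // Facts pointing at another entity become outgoing links.
        proposal.outgoingLinks.push({
          linkType: fact.predicate,
          targetLocalId: fact.object.localId,
        });
      } else {
        // Facts with a text object become properties.
        proposal.properties[fact.predicate] = fact.object.value;
      }
    }
    return proposal;
  });

// Steps 1 and 2 would be model calls; their outputs are hard-coded here.
const summaries: EntitySummary[] = [
  { localId: "e1", name: "Example Corp", summary: "A parent company" },
  { localId: "e2", name: "Example Labs", summary: "A subsidiary" },
];
const facts: Fact[] = [
  { subjectEntityLocalId: "e1", predicate: "has subsidiary", object: { kind: "entity", localId: "e2" } },
  { subjectEntityLocalId: "e2", predicate: "industry", object: { kind: "text", value: "research" } },
];

const proposals = proposeEntitiesFromFacts(summaries, facts);
```

Keying each proposal on the fact's subject means a fact contributes to exactly one entity, which is what makes the "singular object" constraint in step 2 useful.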

In a follow-up, we aim to reuse the underlying pieces of this process so that entities are no longer proposed while processing a single piece of text. Instead, we will gather all the facts from different sources at the coordinator level, so that entities can be proposed based on information obtained from a variety of sources.

🔗 Related links

πŸ” What does this change?

  • adds mocks for the Temporal functionality needed to run flow step methods in the vitest testing library

Pre-Merge Checklist 🚀

🚢 Has this modified a publishable library?

This PR:

  • does not modify any publishable blocks or libraries, or modifications do not need publishing

📜 Does this require a change to the docs?

The changes in this PR:

  • are internal and do not require a docs change

πŸ•ΈοΈ Does this require a change to the Turbo Graph?

The changes in this PR:

  • do not affect the execution graph

⚠️ Known issues

  • This PR breaks the ability of proposed entities to link to existing entities passed to the research action. This will be partially addressed once fact gathering moves to the coordinator level, as existing entities can be incorporated into the required fact deduplication work (H-2693). Ideally we will also make the fact inference methods aware of existing entities (H-2713).
  • We will need to add fields to the facts so that provenance information is captured. This PR does not yet require them: all properties are still derived from a single source, so provenance can be determined as before.

🐾 Next steps

  • Gather facts at the coordinator level from multiple sources before proposing the entities (H-2693)
  • Add ability to specify existingEntities when inferring facts, so that these can be directly linked from new proposed entities (H-2713)
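
As a rough illustration of the first next step, the sketch below merges facts gathered from several sources and records where each one came from. Everything here is an assumption for the sake of the example: the `SourcedFact` shape, the `sources` provenance field, the `gatherFacts` function, and the choice to treat two facts as duplicates when their subject, predicate, and object match after trimming and lowercasing. The planned work (H-2693) may deduplicate quite differently.

```typescript
// Hypothetical fact shape with a provenance field listing the sources it came from.
type SourcedFact = {
  subjectName: string;
  predicate: string;
  object: string;
  sources: string[];
};

// Two facts are treated as duplicates when all three parts match,
// ignoring case and surrounding whitespace (an assumption for this sketch).
const factKey = (fact: SourcedFact): string =>
  [fact.subjectName, fact.predicate, fact.object]
    .map((part) => part.trim().toLowerCase())
    .join("\u0000");

// Merge the facts inferred from each source, combining provenance for duplicates.
const gatherFacts = (factsBySource: SourcedFact[][]): SourcedFact[] => {
  const merged = new Map<string, SourcedFact>();
  for (const facts of factsBySource) {
    for (const fact of facts) {
      const key = factKey(fact);
      const existing = merged.get(key);
      if (existing) {
        for (const source of fact.sources) {
          if (!existing.sources.includes(source)) existing.sources.push(source);
        }
      } else {
        // Copy so the caller's input is never mutated.
        merged.set(key, { ...fact, sources: [...fact.sources] });
      }
    }
  }
  return Array.from(merged.values());
};

const gathered = gatherFacts([
  [{ subjectName: "Example Corp", predicate: "has subsidiary", object: "Example Labs", sources: ["page-a"] }],
  [
    { subjectName: "example corp", predicate: "has subsidiary", object: "Example Labs ", sources: ["page-b"] },
    { subjectName: "Example Labs", predicate: "industry", object: "research", sources: ["page-b"] },
  ],
]);
```

Entities would then be proposed from `gathered` rather than from the facts of any single page, with `sources` carrying the provenance that a single-source flow could previously infer implicitly.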

🛑 What tests cover this?

Manual testing

❓ How to test this?

Try out the existing flows that make use of the research action. I used "Get subsidiary companies of Google" as the prompt and the Company flow test type to produce the demoed result.

📹 Demo

image

@github-actions github-actions bot added area/deps Relates to third-party dependencies (area) area/apps > hash* Affects HASH (a `hash-*` app) area/apps > hash-api Affects the HASH API (app) type/eng > frontend Owned by the @frontend team type/eng > backend Owned by the @backend team area/apps labels May 15, 2024
@benwerner01 benwerner01 force-pushed the bw/infer-entities-from-facts branch from d68c05a to b8f9ee1 Compare May 15, 2024 18:37
@github-actions github-actions bot removed the type/eng > frontend Owned by the @frontend team label May 15, 2024

codecov bot commented May 15, 2024

Codecov Report

Attention: Patch coverage is 0%, with 217 lines in your changes missing coverage. Please review.

Project coverage is 20.83%. Comparing base (19e6e65) to head (2c18ba4).
Report is 28 commits behind head on main.

Files Patch % Lines
...e-entities-from-facts/propose-entity-from-facts.ts 0.00% 66 Missing ⚠️
...er-facts-from-text/infer-entity-facts-from-text.ts 0.00% 48 Missing ⚠️
...w-activities/shared/propose-entities-from-facts.ts 0.00% 28 Missing ⚠️
...-facts-from-text/get-entity-summaries-from-text.ts 0.00% 21 Missing ⚠️
.../shared/testing-utilities/mock-get-flow-context.ts 0.00% 18 Missing ⚠️
...es/flow-activities/shared/infer-facts-from-text.ts 0.00% 17 Missing ⚠️
...red/testing-utilities/get-alice-user-account-id.ts 0.00% 8 Missing ⚠️
...ction/infer-entities-from-web-page-worker-agent.ts 0.00% 6 Missing ⚠️
...worker-ts/src/activities/shared/activity-logger.ts 0.00% 4 Missing ⚠️
...sh-ai-worker-ts/src/activities/shared/stringify.ts 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4467      +/-   ##
==========================================
- Coverage   21.10%   20.83%   -0.27%     
==========================================
  Files         449      456       +7     
  Lines       15247    15443     +196     
  Branches     2275     2316      +41     
==========================================
  Hits         3218     3218              
- Misses      11988    12184     +196     
  Partials       41       41              
Flag Coverage Δ
apps.hash-ai-worker-ts 1.69% <0.00%> (-0.11%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@benwerner01 benwerner01 marked this pull request as ready for review May 15, 2024 20:43
@benwerner01 benwerner01 requested a review from CiaranMn May 15, 2024 20:43
@benwerner01 benwerner01 changed the title H-2692: infer facts from text, before proposing entities H-2692: infer facts from text before inferring entities May 15, 2024
Contributor

Benchmark results

@rust/graph-benches – Integrations

scaling_read_entity_complete_one_depth

Function Value Mean
get_entity_by_id Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 5 $$24.4 \mathrm{ms} \pm 276 \mathrm{μs}\left({\color{gray}0.397 \mathrm{\%}}\right) $$
get_entity_by_id Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 50 $$255 \mathrm{ms} \pm 1.54 \mathrm{ms}\left({\color{gray}-2.188 \mathrm{\%}}\right) $$
get_entity_by_id Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 10 $$45.5 \mathrm{ms} \pm 2.55 \mathrm{ms}\left({\color{red}48.9 \mathrm{\%}}\right) $$
get_entity_by_id Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 25 $$69.6 \mathrm{ms} \pm 485 \mathrm{μs}\left({\color{gray}-3.468 \mathrm{\%}}\right) $$
get_entity_by_id Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 1 $$20.4 \mathrm{ms} \pm 95.4 \mathrm{μs}\left({\color{gray}-0.909 \mathrm{\%}}\right) $$

representative_read_entity

Function Value Mean
entity_by_id Account ID: d4e16033-c281-4cde-aa35-9085bf2e7579, Entity Type ID: https://blockprotocol.org/@alice/types/entity-type/book/v/1 $$16.1 \mathrm{ms} \pm 189 \mathrm{μs}\left({\color{gray}-0.448 \mathrm{\%}}\right) $$
entity_by_id Account ID: d4e16033-c281-4cde-aa35-9085bf2e7579, Entity Type ID: https://blockprotocol.org/@alice/types/entity-type/block/v/1 $$16.5 \mathrm{ms} \pm 185 \mathrm{μs}\left({\color{gray}-4.006 \mathrm{\%}}\right) $$
entity_by_id Account ID: d4e16033-c281-4cde-aa35-9085bf2e7579, Entity Type ID: https://blockprotocol.org/@alice/types/entity-type/person/v/1 $$16.2 \mathrm{ms} \pm 189 \mathrm{μs}\left({\color{gray}1.94 \mathrm{\%}}\right) $$
entity_by_id Account ID: d4e16033-c281-4cde-aa35-9085bf2e7579, Entity Type ID: https://blockprotocol.org/@alice/types/entity-type/page/v/2 $$16.7 \mathrm{ms} \pm 187 \mathrm{μs}\left({\color{gray}0.484 \mathrm{\%}}\right) $$
entity_by_id Account ID: d4e16033-c281-4cde-aa35-9085bf2e7579, Entity Type ID: https://blockprotocol.org/@alice/types/entity-type/organization/v/1 $$17.3 \mathrm{ms} \pm 198 \mathrm{μs}\left({\color{lightgreen}-32.658 \mathrm{\%}}\right) $$
entity_by_id Account ID: d4e16033-c281-4cde-aa35-9085bf2e7579, Entity Type ID: https://blockprotocol.org/@alice/types/entity-type/building/v/1 $$16.8 \mathrm{ms} \pm 213 \mathrm{μs}\left({\color{gray}0.506 \mathrm{\%}}\right) $$
entity_by_id Account ID: d4e16033-c281-4cde-aa35-9085bf2e7579, Entity Type ID: https://blockprotocol.org/@alice/types/entity-type/song/v/1 $$16.5 \mathrm{ms} \pm 186 \mathrm{μs}\left({\color{gray}1.22 \mathrm{\%}}\right) $$
entity_by_id Account ID: d4e16033-c281-4cde-aa35-9085bf2e7579, Entity Type ID: https://blockprotocol.org/@alice/types/entity-type/uk-address/v/1 $$15.9 \mathrm{ms} \pm 157 \mathrm{μs}\left({\color{gray}-0.071 \mathrm{\%}}\right) $$
entity_by_id Account ID: d4e16033-c281-4cde-aa35-9085bf2e7579, Entity Type ID: https://blockprotocol.org/@alice/types/entity-type/playlist/v/1 $$16.7 \mathrm{ms} \pm 169 \mathrm{μs}\left({\color{gray}2.85 \mathrm{\%}}\right) $$

representative_read_multiple_entities

Function Value Mean
link_by_source_by_property depths: DT=255, PT=255, ET=255, E=255 $$1.98 \mathrm{s} \pm 8.08 \mathrm{ms}\left({\color{gray}-0.737 \mathrm{\%}}\right) $$
link_by_source_by_property depths: DT=2, PT=2, ET=2, E=2 $$1.05 \mathrm{s} \pm 3.57 \mathrm{ms}\left({\color{gray}0.515 \mathrm{\%}}\right) $$
link_by_source_by_property depths: DT=0, PT=2, ET=2, E=2 $$1.05 \mathrm{s} \pm 6.96 \mathrm{ms}\left({\color{gray}-0.038 \mathrm{\%}}\right) $$
link_by_source_by_property depths: DT=0, PT=0, ET=0, E=2 $$95.7 \mathrm{ms} \pm 559 \mathrm{μs}\left({\color{gray}-0.172 \mathrm{\%}}\right) $$
link_by_source_by_property depths: DT=0, PT=0, ET=2, E=2 $$418 \mathrm{ms} \pm 1.31 \mathrm{ms}\left({\color{gray}0.233 \mathrm{\%}}\right) $$
link_by_source_by_property depths: DT=0, PT=0, ET=0, E=0 $$60.2 \mathrm{ms} \pm 372 \mathrm{μs}\left({\color{gray}-0.088 \mathrm{\%}}\right) $$
entity_by_property depths: DT=255, PT=255, ET=255, E=255 $$2.87 \mathrm{s} \pm 6.72 \mathrm{ms}\left({\color{gray}0.240 \mathrm{\%}}\right) $$
entity_by_property depths: DT=2, PT=2, ET=2, E=2 $$974 \mathrm{ms} \pm 4.96 \mathrm{ms}\left({\color{gray}-0.631 \mathrm{\%}}\right) $$
entity_by_property depths: DT=0, PT=2, ET=2, E=2 $$965 \mathrm{ms} \pm 3.13 \mathrm{ms}\left({\color{gray}-2.832 \mathrm{\%}}\right) $$
entity_by_property depths: DT=0, PT=0, ET=0, E=2 $$39.7 \mathrm{ms} \pm 220 \mathrm{μs}\left({\color{gray}-1.224 \mathrm{\%}}\right) $$
entity_by_property depths: DT=0, PT=0, ET=2, E=2 $$355 \mathrm{ms} \pm 1.96 \mathrm{ms}\left({\color{gray}-2.990 \mathrm{\%}}\right) $$
entity_by_property depths: DT=0, PT=0, ET=0, E=0 $$35.9 \mathrm{ms} \pm 153 \mathrm{μs}\left({\color{gray}-0.392 \mathrm{\%}}\right) $$

representative_read_entity_type

Function Value Mean
get_entity_type_by_id Account ID: d4e16033-c281-4cde-aa35-9085bf2e7579 $$1.35 \mathrm{ms} \pm 5.32 \mathrm{μs}\left({\color{gray}-1.499 \mathrm{\%}}\right) $$

scaling_read_entity_linkless

Function Value Mean
get_entity_by_id Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 10 $$2.39 \mathrm{ms} \pm 10.9 \mathrm{μs}\left({\color{gray}0.921 \mathrm{\%}}\right) $$
get_entity_by_id Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 10000 $$13.5 \mathrm{ms} \pm 124 \mathrm{μs}\left({\color{gray}0.377 \mathrm{\%}}\right) $$
get_entity_by_id Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 100 $$2.55 \mathrm{ms} \pm 14.6 \mathrm{μs}\left({\color{gray}0.276 \mathrm{\%}}\right) $$
get_entity_by_id Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 1000 $$3.26 \mathrm{ms} \pm 21.4 \mathrm{μs}\left({\color{gray}1.62 \mathrm{\%}}\right) $$
get_entity_by_id Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 1 $$2.39 \mathrm{ms} \pm 7.59 \mathrm{μs}\left({\color{gray}0.279 \mathrm{\%}}\right) $$

scaling_read_entity_complete_zero_depth

Function Value Mean
get_entity_by_id Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 5 $$2.43 \mathrm{ms} \pm 13.2 \mathrm{μs}\left({\color{gray}-1.434 \mathrm{\%}}\right) $$
get_entity_by_id Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 50 $$4.43 \mathrm{ms} \pm 20.5 \mathrm{μs}\left({\color{gray}1.72 \mathrm{\%}}\right) $$
get_entity_by_id Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 10 $$2.63 \mathrm{ms} \pm 18.2 \mathrm{μs}\left({\color{gray}0.041 \mathrm{\%}}\right) $$
get_entity_by_id Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 25 $$3.06 \mathrm{ms} \pm 12.2 \mathrm{μs}\left({\color{gray}-1.401 \mathrm{\%}}\right) $$
get_entity_by_id Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba, Number Of Entities: 1 $$2.40 \mathrm{ms} \pm 9.30 \mathrm{μs}\left({\color{gray}-0.121 \mathrm{\%}}\right) $$

@vilkinsons vilkinsons changed the title H-2692: infer facts from text before inferring entities H-2692: Infer facts from text before proposing entities May 16, 2024
Member

@CiaranMn CiaranMn left a comment

Good stuff, only a couple of minor comments

@@ -144,6 +88,7 @@ export const compareLlmResponses = async () => {
const llmResponses = await Promise.all(
models.map((model) => {
return getLlmResponse(
// @ts-expect-error - figure out what's going wrong here
Member

The models field accepts any model name, but the params demands that only OpenAI model names are sent with OpenAI params, etc.

The type of CompareLlmResponseConfig would have to be updated somehow to make sure they were in sync, maybe with generics or something. Not sure it's worth sinking a lot of time into, since it's a testing utility.

Member Author

Yeah, I made an attempt at this but didn't think it was worth sinking a lot of time into. We'll probably revise this method further once we dedicate time to the model evaluation you've mentioned.
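
For reference, one way the model names and params discussed in this thread could be kept in sync at the type level is a mapped type that produces a discriminated union. This is only a sketch under stated assumptions: the provider keys, the placeholder model names, the parameter fields, and the `ModelRequest` and `describeRequest` names are invented for illustration and do not reflect the actual `CompareLlmResponseConfig` type.

```typescript
// Hypothetical parameter shapes per provider, for illustration only.
type ParamsByProvider = {
  openai: { temperature?: number; seed?: number };
  anthropic: { temperature?: number; topK?: number };
};

// Hypothetical placeholder model names per provider.
type ModelsByProvider = {
  openai: "openai-model-a" | "openai-model-b";
  anthropic: "anthropic-model-a";
};

type Provider = keyof ParamsByProvider;

// Mapping over the providers and indexing by Provider yields a union in which
// each member pairs one provider's model names with that provider's params.
type ModelRequest = {
  [P in Provider]: {
    provider: P;
    model: ModelsByProvider[P];
    params: ParamsByProvider[P];
  };
}[Provider];

// Switching on the discriminant narrows both `model` and `params`.
const describeRequest = (request: ModelRequest): string => {
  switch (request.provider) {
    case "openai":
      return `${request.model} (seed: ${request.params.seed ?? "none"})`;
    case "anthropic":
      return `${request.model} (topK: ${request.params.topK ?? "none"})`;
  }
};

const requests: ModelRequest[] = [
  { provider: "openai", model: "openai-model-a", params: { seed: 42 } },
  { provider: "anthropic", model: "anthropic-model-a", params: { topK: 5 } },
  // { provider: "openai", model: "anthropic-model-a", params: {} } would fail to type-check.
];

const descriptions = requests.map(describeRequest);
```

The trade-off is that callers must state the provider explicitly, which may be more ceremony than a testing utility warrants.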

Comment on lines +35 to +36
// eslint-disable-next-line no-console
console.log(logMessage);
Member

Why not use logToConsole here?

Member Author

Because it doesn't actually log anything when I run the test via vitest. I'm not entirely sure why, and I haven't spent much time trying to fix it.

Member

Fair play, don't worry about it then

@benwerner01 benwerner01 requested a review from CiaranMn May 21, 2024 17:23
@benwerner01 benwerner01 added this pull request to the merge queue May 22, 2024
Merged via the queue into main with commit 33e24e4 May 22, 2024
137 checks passed
@benwerner01 benwerner01 deleted the bw/infer-entities-from-facts branch May 22, 2024 15:10
Labels
area/apps > hash* Affects HASH (a `hash-*` app) area/apps > hash-api Affects the HASH API (app) area/apps area/deps Relates to third-party dependencies (area) type/eng > backend Owned by the @backend team