Following the discussion from #816, there would be benefits to allowing materializer nodes (both `DataLoader` and `DataSaver`) to be defined statically at the `Driver` level:
- The nodes can be called directly via `.execute()`.
- Materializers appear in the `HamiltonGraph` and in visualizations even if they aren't executed.
- The DAG, including the materializers, can be validated before execution.
For basic `to.parquet()` usage, it might be more efficient to store simple Python functions using `pd.to_parquet()` in a module to enable this pattern. More powerful materializers (e.g., dlt) would benefit from this approach, though.
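The proposal can be sketched with a toy builder and driver. This is not Hamilton's real API — `Builder`, `Driver`, and the tuple-based materializer here are minimal stand-ins to illustrate that materializer nodes become ordinary, executable graph nodes:

```python
# Toy sketch: materializers are registered on the builder and added to the
# graph at build time, so they can be executed by name like any other node.
# NOT Hamilton's real implementation; all names here are illustrative.

def df() -> dict:
    # Stand-in dataflow node producing some data.
    return {"a": [1, 2, 3]}

storage = {}  # pretend destination (a file, a table, ...)

def save_to_storage(data: dict) -> dict:
    storage["df"] = data
    return {"rows": len(data["a"])}

class Driver:
    def __init__(self, nodes):
        self.nodes = nodes

    def execute(self, final_vars):
        # Materializer nodes are requested by name, like any other node.
        return {v: self.nodes[v]() for v in final_vars}

class Builder:
    def __init__(self):
        self.nodes = {}           # name -> zero-arg callable (toy "graph")
        self.materializers = []   # (node_name, upstream_name, save_fn)

    def with_modules(self, *fns):
        for fn in fns:
            self.nodes[fn.__name__] = fn
        return self

    def with_materializers(self, *mats):
        self.materializers.extend(mats)
        return self

    def build(self) -> Driver:
        # Materializer nodes join the graph at build time,
        # not at .materialize() time.
        for name, upstream, save_fn in self.materializers:
            self.nodes[name] = lambda u=upstream, s=save_fn: s(self.nodes[u]())
        return Driver(self.nodes)

dr = (
    Builder()
    .with_modules(df)
    .with_materializers(("save_df", "df", save_to_storage))
    .build()
)
result = dr.execute(["save_df"])
print(result)  # {'save_df': {'rows': 3}}
```

Because the saver is a regular node, it would also show up in graph visualizations and be validated with the rest of the DAG before execution.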
Following #877
Added the ability to add materializer nodes to the `FunctionGraph` at `Driver` build time.
Use cases:
- Easier to view materializers as part of `.display_all_functions()`
- Allows calling materializers by name with `.execute()`, which would let some users ignore `.materialize()` entirely
- Possible to combine "static" materializers at build time and "dynamic" materializers at execution time
- Catch invalid dataflows earlier than `dr.materialize()`
## Changes
- Builder now accepts `.with_materializers(*materializers)`
- `Builder.build()` adds materializer nodes after creating the `Driver` and returns the updated Driver. The logic is derived from `Driver.materialize()`
- `Builder.copy()` copies the materializers
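The `copy()` behavior can be sketched with a stand-in class (not Hamilton's real `Builder`); the point is that the materializer list is copied, not shared, so the original and the copy can diverge:

```python
# Toy sketch of Builder.copy() carrying materializers over.
class Builder:
    def __init__(self):
        self.modules = []
        self.materializers = []

    def with_materializers(self, *mats):
        self.materializers.extend(mats)
        return self

    def copy(self) -> "Builder":
        new = Builder()
        new.modules = list(self.modules)
        # Copy the list (not just the reference) so the copies can diverge.
        new.materializers = list(self.materializers)
        return new

base = Builder().with_materializers("to_parquet")
variant = base.copy().with_materializers("to_csv")
print(base.materializers)     # ['to_parquet']
print(variant.materializers)  # ['to_parquet', 'to_csv']
```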
## How I tested this
- Added tests to check materializer nodes (savers and loaders) are properly added
- Tested that "static" and "dynamic" materializers can be used together
## Notes
- Each time materializers are added, `post_graph_construct` hooks are triggered
- Should we include materializers in the `version` hashing of the dataflow? Should we treat "static" and "dynamic" materializers differently? IMO, we might want to create two versions: one for "the dataflow transforms, ignoring materializers" and one for "all nodes", since they answer different equality / diffing questions
- There's duplicated logic between `Builder.build()` and `Driver.materialize()`. A better approach could be to have `Driver.add_materializers()` and `Driver.execute_materializers()`; both `Builder.build()` and `Driver.materialize()` could call `Driver.add_materializers()`
- Related to #713, users need to be aware that materializers need a valid `target` or `dependencies` (type and name) to connect to.
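The two-version idea from the notes could look like the following sketch. Node "code" is just a string here, and the hashing scheme is illustrative, not Hamilton's actual versioning:

```python
import hashlib

def version(nodes: dict) -> str:
    """Hash node names and code in a stable order."""
    h = hashlib.sha256()
    for name in sorted(nodes):
        h.update(name.encode())
        h.update(nodes[name].encode())
    return h.hexdigest()

# Toy node sets: transform nodes vs. materializer nodes.
transform_nodes = {"df": "def df(): ...", "clean_df": "def clean_df(df): ..."}
materializer_nodes = {"save_df": "to.parquet(...)"}

# One version answers "did the transforms change?",
# the other "did anything in the dataflow change?"
transform_version = version(transform_nodes)
full_version = version({**transform_nodes, **materializer_nodes})

# Adding a materializer leaves the transform-only version untouched.
assert version(transform_nodes) == transform_version
assert transform_version != full_version
```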
Squashed commits of:
* added Builder.with_materializers()
* moved logic to Driver __init__
* added type annotation
* made materializers an internal attribute
* updated docs; pre-commit hook; typing bugfix
---------
Co-authored-by: zilto <tjean@DESKTOP-V6JDCS2>
Solution 2: an alternative would be to allow materializers to be imported and added via `.with_modules()`. For example, a `production_materializers.py` module would contain the materializer definitions.
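As a sketch of the module idea, a hypothetical `production_materializers.py` could hold plain functions wrapping `pd.to_parquet()`. The actual example was not shown in the issue, so `saved_df`, the path, and the return shape are all assumptions:

```python
# Hypothetical contents of production_materializers.py: a plain Hamilton-style
# function wrapping DataFrame.to_parquet, importable via .with_modules().

def saved_df(df) -> dict:
    """Persist `df` to parquet and return metadata, as a DataSaver node would."""
    path = "./df.parquet"
    df.to_parquet(path)  # any object exposing to_parquet works for this sketch
    return {"path": path}
```

For simple cases like this, a plain module function may be all that's needed; the static-materializer machinery pays off for more powerful materializers (e.g., dlt).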