I just started using Prefect at work and am actively learning more about it, but here's my understanding. Tasks are small units of processing that can be retried and cached; they can also be run asynchronously. You decorate plain Python functions to provide metadata (such as name, retries, tags, etc.) and to let Prefect know that this is a special function that will be observed by the platform. What might make sense is to think of a flow as the integration of many such operations, the tasks. So a flow might be a single pipeline that has to ETL some data out of MySQL, apply some transformations, and load the result into a data warehouse. A very simple implementation would be a flow "My Pipeline" with tasks "FetchMySQL", "TransformWithDbt", and "LoadToDW", each with its own set of parameters (sketched in code below). They will execute in the specified order anyway, but you can also run tasks asynchronously.

The main point I have seen is that you can see a flow's task execution breakdown in the web UI and understand what might have gone wrong in which task. For example, if you have a flow "Test Flow" that runs two tasks, "Task 1" and "Task 2", you can see every task run under "Test Flow" along with each task's output, errors, and debug logs.

On caching, one question that came up was: what if the data being cached is too large? I haven't used Prefect's caching yet (didn't need it), so I'm a bit unfamiliar with it, but my understanding is that task results are cached in memory during a flow run and then persisted to the location specified by the PREFECT_LOCAL_STORAGE_PATH setting, so that data can be reused by subsequent flow runs. As a result, task caching between flow runs is currently limited to flow runs with access to that local storage path. Let me know if I can help with anything else.
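To make the flow/task split concrete, here is a minimal sketch of what the "My Pipeline" example could look like, assuming Prefect 2.x. The task bodies, the default query, and the data shapes are made-up placeholders, not a real implementation:

```python
# A minimal sketch of the "My Pipeline" example, assuming Prefect 2.x.
# The task bodies, query, and data shapes are made-up placeholders.
from prefect import flow, task


@task(name="FetchMySQL", retries=3, tags=["extract"])
def fetch_mysql(query: str) -> list[dict]:
    # Placeholder: run the query against MySQL and return rows.
    return [{"id": 1, "value": "raw"}]


@task(name="TransformWithDbt", tags=["transform"])
def transform_with_dbt(rows: list[dict]) -> list[dict]:
    # Placeholder: kick off a dbt run or transform the rows in Python.
    return [{**row, "value": row["value"].upper()} for row in rows]


@task(name="LoadToDW", retries=2, tags=["load"])
def load_to_dw(rows: list[dict]) -> int:
    # Placeholder: write the rows to the data warehouse.
    return len(rows)


@flow(name="My Pipeline")
def my_pipeline(query: str = "SELECT * FROM orders") -> int:
    rows = fetch_mysql(query)
    transformed = transform_with_dbt(rows)
    return load_to_dw(transformed)


if __name__ == "__main__":
    my_pipeline()
```

If you want tasks to run concurrently instead of strictly in order, calling them with .submit() returns futures rather than blocking, which is how I read the "run tasks asynchronously" part.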
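And here is a rough sketch of what caching might look like, again assuming Prefect 2.x; the expensive_fetch task and its query are made-up placeholders. The cache_key_fn, cache_expiration, and persist_result arguments and the task_input_hash helper are the parts of the API I'd reach for here:

```python
# A rough sketch of task caching, assuming Prefect 2.x; the
# expensive_fetch task and its query are made-up placeholders.
from datetime import timedelta

from prefect import flow, task
from prefect.tasks import task_input_hash


@task(
    cache_key_fn=task_input_hash,        # cache key derived from the task's inputs
    cache_expiration=timedelta(days=1),  # how long the cached result stays valid
    persist_result=True,                 # write the result to local storage
)
def expensive_fetch(query: str) -> list[dict]:
    # Placeholder for a slow or costly query.
    return [{"query": query}]


@flow
def cached_pipeline():
    # The second call has the same inputs, so it can be served from the
    # cache instead of re-running the task.
    expensive_fetch("SELECT * FROM orders")
    expensive_fetch("SELECT * FROM orders")


if __name__ == "__main__":
    cached_pipeline()
```

Persisted results end up under whatever PREFECT_LOCAL_STORAGE_PATH points to, which is why reuse across flow runs depends on having access to that path. For results too large to keep locally, I believe you can point result storage at a remote storage block instead, but I haven't tried that myself.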