

# Amazon Managed Airflow
In an On the Road episode of Makers, recorded at the Linux Foundation’s Open Source Summit North America, our guests, who all work on the AWS Managed Service for Airflow team, reflected on their work on Apache Airflow to improve the overall experience.

Dennis Ferruzzi, a software developer at AWS, is an Airflow contributor working on AIP-49 (Airflow Improvement Proposal), which will update Airflow’s logging and metrics backend to the OpenTelemetry standard. The new backend will allow for more granular metrics and better visibility into Airflow environments.
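The AIP-49 work was still in flight when this was recorded, so the following is only a sketch of how such a backend might be switched on, assuming the `[metrics]` options (`otel_on`, `otel_host`, `otel_port`) that later Airflow 2.x releases expose; the collector endpoint is hypothetical. Airflow reads any configuration key from an environment variable named `AIRFLOW__{SECTION}__{KEY}`:

```python
# A sketch, not the final AIP-49 surface: enable OpenTelemetry metrics through
# Airflow's standard env-var configuration. Option names may differ by version.
import os

os.environ["AIRFLOW__METRICS__OTEL_ON"] = "True"         # switch metrics to the OTel backend
os.environ["AIRFLOW__METRICS__OTEL_HOST"] = "localhost"  # hypothetical OTel collector host
os.environ["AIRFLOW__METRICS__OTEL_PORT"] = "4318"       # default OTLP/HTTP port
```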

Niko Oliveira, a senior software development engineer at AWS, is a committer/maintainer for Apache Airflow; he spends much of his time reviewing, approving and merging pull requests. A recent project included writing and implementing AIP-51, which modifies and updates the Executor interface in Airflow. It gives Airflow a more pluggable architecture, which makes it easier for users to build and write their own Airflow Executors.

Raphaël Vandon, a senior software engineer at AWS, is an Apache Airflow contributor working on performance improvements for Airflow and on leveraging async capabilities in the AWS Operators, the part of Airflow that allows for seamless interactions with AWS.
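To make that pluggability concrete, here is a minimal sketch of the kind of custom executor AIP-51 makes easier to write. It assumes the `BaseExecutor` hooks of Airflow 2.x (`execute_async`, `sync`) and, rather than actually running anything, just logs tasks and reports them as successful:

```python
# A toy executor sketch (assumes Airflow 2.x BaseExecutor hooks; a real executor
# would hand each task to a backend such as a queue or a container service).
from airflow.executors.base_executor import BaseExecutor
from airflow.utils.state import TaskInstanceState


class LoggingExecutor(BaseExecutor):
    """Accepts tasks from the scheduler, logs them, and marks them successful."""

    def execute_async(self, key, command, queue=None, executor_config=None):
        # The scheduler calls this for each task instance that is ready to run.
        self.log.info("Pretending to run %s: %s", key, command)
        self.running.add(key)

    def sync(self):
        # Called on every scheduler heartbeat; report task states back.
        for key in list(self.running):
            self.change_state(key, TaskInstanceState.SUCCESS)
```

Pointing the `executor` option in Airflow’s config at a class path like this is, roughly, the wiring that AIP-51 standardizes.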

“The beautiful thing about Airflow, that has made it so popular, is that it’s so easy,” Oliveira said. “And two, we have this operator ecosystem. So companies like AWS, Google and Databricks are all contributing these operators, which really wrap their underlying SDK.”

## ‘That Blueprint Exists for Everyone’

Operators are like generic building blocks; each operator does one specific task, Ferruzzi said. “You just chain them together in different ways,” he said.
“So, for example, there’s an operator to write data to …, and then there’s an operator that will send the data to a SQL server or something like that. And basically, the community develops and contributes to these operators so that the users, in the end, are basically saying: the task I want to do is pull data from here, so I’m going to use that operator, and then I want to send the data somewhere else. So I’m going to go and look at, say, the Google Cloud operators and find one that fits what I want to do there. You can interact with so many different services and cloud providers. We’re at 2,500 contributors now, I believe. And it’s just like people find a need, and they contribute it back. And now that block, that blueprint exists for everyone.”
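As an illustration of that pull-from-here, send-it-there pattern, here is a minimal DAG sketch chaining two community-contributed transfer operators from the Amazon provider package; the buckets, schema and table are hypothetical, and parameter names vary a little across provider versions:

```python
# A sketch of chaining community operators: pull from Google Cloud Storage,
# then load into a SQL engine (Redshift here). All names below are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.transfers.gcs_to_s3 import GCSToS3Operator
from airflow.providers.amazon.aws.transfers.s3_to_redshift import S3ToRedshiftOperator

with DAG(
    dag_id="gcs_to_warehouse",
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
    catchup=False,
):
    pull = GCSToS3Operator(
        task_id="pull_from_gcs",
        gcs_bucket="example-source-bucket",   # hypothetical GCS source
        dest_s3_key="s3://example-staging/raw/",
    )
    load = S3ToRedshiftOperator(
        task_id="load_to_redshift",
        s3_bucket="example-staging",
        s3_key="raw/",
        schema="analytics",                   # hypothetical target table
        table="events",
    )
    pull >> load                              # "you just chain them together"
```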
Airflow 2.6 also has an alpha for notifiers, Vandon said. Sensors are operators that wait for something to happen. Notifiers, in turn, get placed at the end of the workflow and act depending on the success (or not) of the workflow. As Vandon said, “It’s just making things simpler for users.”
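A minimal sketch of both ideas together, assuming the alpha `BaseNotifier` interface from Airflow 2.6 (which may still change) and the Amazon provider’s `S3KeySensor`; the bucket and messages are hypothetical:

```python
# Sketch: a sensor waits for something to happen; notifiers sit at the end of
# the workflow and act on its success (or not). BaseNotifier is alpha in 2.6.
from datetime import datetime

from airflow import DAG
from airflow.notifications.basenotifier import BaseNotifier
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor


class LogNotifier(BaseNotifier):
    """Toy notifier; a real one might post to Slack or page an on-call rotation."""

    def __init__(self, message):
        super().__init__()
        self.message = message

    def notify(self, context):
        print(f"DAG {context['dag'].dag_id}: {self.message}")


with DAG(
    dag_id="sensor_and_notifier",
    start_date=datetime(2023, 1, 1),
    schedule="@daily",
    catchup=False,
    on_success_callback=LogNotifier(message="run succeeded"),
    on_failure_callback=LogNotifier(message="run failed"),
):
    # The sensor waits for a file to land in S3 before downstream tasks run.
    S3KeySensor(task_id="wait_for_file", bucket_key="s3://example-bucket/{{ ds }}.csv")
```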
## Airflow at Halodoc

At Halodoc, we have been using Airflow as a scheduling tool since 2019. Airflow plays a key role in our data platform; most of our data consumption and orchestration is scheduled using it. We leverage Airflow to schedule over 350 DAGs and 2,500 tasks, and as the business grows we are continuously orchestrating new data sources and adding new DAGs to the Airflow server. The Airflow cluster and its components (webserver, scheduler and workers) are hosted on EC2 instances and are managed collaboratively by Data Engineers and Site Reliability Engineers (SREs). The Airflow cluster architecture is described in our previous blog here, and how we leverage Airflow on our Data Platform is described here.
## Challenges with the current setup (self-managed cluster)

The Airflow community is very active and keeps releasing new features and bug fixes within a short span of time, but upgrading the Airflow version is a cumbersome process. Every upgrade consumes a lot of engineering effort, from performing the upgrade itself to testing the new version, with all the library dependency checks and so on.
