My Outreachy internship is coming to its ends which is also the best time to look back and reflect on the progress so far.
The goal of my project is to Extend and Improve the Apache Airflow REST API. In this post, I will be sharing my progress so far.
We started a bit late implementing the REST API because it took time for the OpenAPI 3.0 specification we were to use for the project to be merged. Thanks to Kamil, who paved the way for us to start implementing the REST API endpoints. Below are the endpoints I implemented and the challenges I encountered, including how I overcame them.
Implementing The Read-Only Connection Endpoints
The read-only connection endpoints were the first endpoint I implemented. Looking back, I can see how much I have improved.
I started by implementing the database schema for the Connection table using Marshmallow 2. We had to use Marshmallow 2 because Flask-AppBuilder was still using it and Flask-AppBuilder is deeply integrated to Apache Airflow. This meant I had to unlearn Marshmallow 3 that I had been studying before this realization, but thankfully, Marshmallow 3 isn’t too different, so I was able to start using Marshmallow 2 in no time.
This first PR would have been more difficult than it was unless there had been any reference endpoint to look at. Kamil implemented a draft PR in which I took inspiration from. Thanks to this, It was easy for me to write the unit tests. It was also in this endpoint that I learned using parameterized in unit tests :D.
Implementing The Read-Only DagRuns Endpoints
This endpoint came with its many challenges, especially on filtering with
This was because the
connexion library we were using to build the REST API was not validating
date-time format in OpenAPI 3.0 specification, what I eventually found out, was intentional.
strict-rfc3339 because of the later license which is not compatible with
Apache 2.0 license.
I implemented a workaround on this, by defining a function called
conn_parse_datetime in the
API utils module. This was later refactored and thankfully, Kamil
implemented a decorator that allowed us to have cleaner code on the views while using this function.
Then we tried using
rfc3339-validator whose license is compatible with Apache 2.0 licence but
later discarded this because with our custom date parser we were able to use duration and
not just date times.
I implemented some different other endpoints. One peculiar issue I faced was because of Marshmallow 2
not giving error when extra fields are in the request body. I implemented a
method on the schema to handle this. Thankfully, Flask-AppBuilder updated to using Marshmallow 3,
we quickly updated Flask-AppBuilder in Apache Airflow and started using Marshmallow 3 too.
Here are some PRs I contributed that are related to the REST API:
Add event log endpoints The event log would help users get information on operations performed at the UI
Add CRUD endpoints for connection This PR performs DELETE, PATCH and POST operations on
Add log endpoint This PR enables users to get Task Instances log entries
Move limit & offset to kwargs in views plus work on a configurable maximum limit This helped us in having a neat code on the views and added configurable maximum limit on query results.
Update FlaskAppBuilder to v3 This enabled Airflow to start using v3 of Flask App Builder and also made it possible for the API to use a modern database serializer/deserializer
Add migration guide from the experimental REST API to the stable REST API This would enable users to start using the stable REST API in less time.
There is still lots of works to be done on the REST API including writing helpful documentation. I still follow up on these and hopefully, we will complete the REST API before the internship ends.
I am very grateful to my mentors, Jarek and Kaxil for their patience with me and for surviving my never-ending questions. Kamil and Tomek have been very supportive and I appreciate them for their support and amazing code reviews.
Thanks for reading!