

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Apache Airflow Blog</title>
  <link href="/blog/" rel="alternate"/>
<link href="/blog/index.xml" rel="self" type="application/atom+xml"/>
<id>/blog/</id>


  <updated>2026-04-08T16:39:10Z</updated>


  <entry>
    <title>Apache Airflow 3.2.0: Data-Aware Workflows at Scale</title>
    <link href="/blog/airflow-3.2.0/" rel="alternate"/>
    <id>/blog/airflow-3.2.0/</id>
    <published>2026-04-07T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html"><![CDATA[<p>We&rsquo;re proud to announce the release of <strong>Apache Airflow 3.2.0</strong>! Airflow 3.1 put humans at the center of automated workflows. Airflow 3.2 brings that same precision to data, with asset partitioning for granular pipeline orchestration, multi-team deployments for enterprise scale, synchronous deadline alert callbacks, and continued progress toward full Task SDK separation.</p>
<p><strong>Details</strong>:</p>
<p>📦 PyPI: <a href="https://pypi.org/project/apache-airflow/3.2.0/">https://pypi.org/project/apache-airflow/3.2.0/</a> <br>
📚 Docs: <a href="https://airflow.apache.org/docs/apache-airflow/3.2.0/">https://airflow.apache.org/docs/apache-airflow/3.2.0/</a> <br>
🛠️ Release Notes: <a href="https://airflow.apache.org/docs/apache-airflow/3.2.0/release_notes.html">https://airflow.apache.org/docs/apache-airflow/3.2.0/release_notes.html</a> <br>
🐳 Docker Image: <code>docker pull apache/airflow:3.2.0</code> <br>
🚏 Constraints: <a href="https://github.com/apache/airflow/tree/constraints-3.2.0">https://github.com/apache/airflow/tree/constraints-3.2.0</a></p>
<h1 id="-asset-partitioning-aip-76-only-the-right-work-gets-triggered">🗂️ Asset Partitioning (AIP-76): Only the Right Work Gets Triggered</h1>
<p>Asset partitioning has been one of the most requested additions to data-aware scheduling. If you work with date-partitioned S3 paths, Hive table partitions, BigQuery partitions, or really any partitioned data store, you&rsquo;ve dealt with this: An upstream task updates one partition, and every downstream Dag fires regardless of which slice actually changed. It&rsquo;s wasteful, and for large deployments it creates real operational noise.</p>
<p>Asset partitioning in 3.2 makes this granular. Downstream Dags trigger only when the specific partition they care about gets updated. It&rsquo;s the biggest change to data-aware scheduling since Assets were introduced, and it turns partition-driven orchestration into something Airflow handles natively rather than something you work around.</p>
<p><img src="/blog/airflow-3.2.0/images/asset_partitioning.png" alt="Asset Partitioning"></p>
<h2 id="key-capabilities">Key Capabilities</h2>
<ul>
<li><strong>Partition-driven scheduling</strong>: Dags trigger on specific partition updates, not every asset change</li>
<li><strong>CronPartitionTimetable</strong>: Schedule Dags against partitions using cron expressions. Also available in the Task SDK</li>
<li><strong>Backfill for partitioned Dags</strong>: Backfill historical partitions without re-triggering everything downstream (#61464)</li>
<li><strong>Multi-asset partitions</strong>: A single Dag can listen for partitions across multiple assets, which matters when your downstream work depends on several sources aligning (#60577)</li>
</ul>
<p>For more advanced use cases, there are temporal and range partition mappers (#61522, #55247) for mapping time ranges and value ranges to partition keys, a partition key field on Dag run references (#61725) so you can inspect exactly which partition triggered a run, and PartitionedAssetTimetable for full control over how partition events from multiple assets get resolved into a unified trigger.</p>
<p><strong>Example</strong>: In the excerpt below, two upstream ingestion Dags each write to a separate asset on an hourly cadence. The downstream Dag triggers only when both have updated the same hourly partition. Since the assets don&rsquo;t share a partition key natively, a mapper resolves them into a common key.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-py" data-lang="py"><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">__future__</span> <span class="kn">import</span> <span class="n">annotations</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">airflow.sdk</span> <span class="kn">import</span> <span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="n">DAG</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">Asset</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">CronPartitionTimetable</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">PartitionedAssetTimetable</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">StartOfHourMapper</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">asset</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">task</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">team_a_player_stats</span> <span class="o">=</span> <span class="n">Asset</span><span class="p">(</span><span class="n">uri</span><span class="o">=</span><span class="s2">&#34;file://incoming/player-stats/team_a.csv&#34;</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s2">&#34;team_a_player_stats&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="n">combined_player_stats</span> <span class="o">=</span> <span class="n">Asset</span><span class="p">(</span><span class="n">uri</span><span class="o">=</span><span class="s2">&#34;file://curated/player-stats/combined.csv&#34;</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s2">&#34;combined_player_stats&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">with</span> <span class="n">DAG</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="n">dag_id</span><span class="o">=</span><span class="s2">&#34;ingest_team_a_player_stats&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">schedule</span><span class="o">=</span><span class="n">CronPartitionTimetable</span><span class="p">(</span><span class="s2">&#34;0 * * * *&#34;</span><span class="p">,</span> <span class="n">timezone</span><span class="o">=</span><span class="s2">&#34;UTC&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">    <span class="n">tags</span><span class="o">=</span><span class="p">[</span><span class="s2">&#34;player-stats&#34;</span><span class="p">,</span> <span class="s2">&#34;ingestion&#34;</span><span class="p">],</span>
</span></span><span class="line"><span class="cl"><span class="p">):</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="nd">@task</span><span class="p">(</span><span class="n">outlets</span><span class="o">=</span><span class="p">[</span><span class="n">team_a_player_stats</span><span class="p">])</span>
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">ingest_team_a_stats</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">        <span class="s2">&#34;&#34;&#34;Materialize Team A player statistics for the current hourly partition.&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">        <span class="k">pass</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">ingest_team_a_stats</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nd">@asset</span><span class="p">(</span><span class="n">schedule</span><span class="o">=</span><span class="n">CronPartitionTimetable</span><span class="p">(</span><span class="s2">&#34;15 * * * *&#34;</span><span class="p">,</span> <span class="n">timezone</span><span class="o">=</span><span class="s2">&#34;UTC&#34;</span><span class="p">))</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">team_b_player_stats</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="k">pass</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">with</span> <span class="n">DAG</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="n">dag_id</span><span class="o">=</span><span class="s2">&#34;clean_and_combine_player_stats&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">schedule</span><span class="o">=</span><span class="n">PartitionedAssetTimetable</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">        <span class="n">assets</span><span class="o">=</span><span class="n">team_a_player_stats</span> <span class="o">&amp;</span> <span class="n">team_b_player_stats</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="n">default_partition_mapper</span><span class="o">=</span><span class="n">StartOfHourMapper</span><span class="p">(),</span>
</span></span><span class="line"><span class="cl">    <span class="p">),</span>
</span></span><span class="line"><span class="cl">    <span class="n">catchup</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"><span class="p">):</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="nd">@task</span><span class="p">(</span><span class="n">outlets</span><span class="o">=</span><span class="p">[</span><span class="n">combined_player_stats</span><span class="p">])</span>
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">combine_player_stats</span><span class="p">(</span><span class="n">dag_run</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="s2">&#34;&#34;&#34;Merge the aligned hourly partitions into a combined dataset.&#34;&#34;&#34;</span>
</span></span><span class="line"><span class="cl">        <span class="nb">print</span><span class="p">(</span><span class="n">dag_run</span><span class="o">.</span><span class="n">partition_key</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="n">combine_player_stats</span><span class="p">()</span>
</span></span></code></pre></div><p>See <a href="https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/example_dags/example_asset_partition.py"><code>example_asset_partition.py</code></a> and the Task SDK API docs for <code>PartitionedAssetTimetable</code> and partition mappers.</p>
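<p>The alignment idea behind the example can be sketched in plain Python. This is a conceptual illustration only, not how Airflow implements <code>StartOfHourMapper</code> or <code>PartitionedAssetTimetable</code>: the helper names <code>start_of_hour_key</code> and <code>ready_partitions</code>, the key format, and the sample timestamps are all assumptions made for the sketch.</p>

```python
from __future__ import annotations

from collections import defaultdict
from datetime import datetime, timezone


def start_of_hour_key(ts: datetime) -> str:
    """Hypothetical helper: truncate a timestamp to its hour and use that as the key."""
    return ts.replace(minute=0, second=0, microsecond=0).isoformat()


def ready_partitions(events: list[tuple[str, datetime]], required: set[str]) -> list[str]:
    """Return the hourly keys for which every required asset has reported an update."""
    seen: dict[str, set[str]] = defaultdict(set)
    for asset_name, ts in events:
        seen[start_of_hour_key(ts)].add(asset_name)
    # A key is ready only when the set of assets seen covers all required assets.
    return sorted(key for key, assets in seen.items() if required <= assets)


# Sample events: team A updates at minute 0, team B at minute 15 (illustrative data).
events = [
    ("team_a_player_stats", datetime(2026, 4, 7, 9, 0, tzinfo=timezone.utc)),
    ("team_b_player_stats", datetime(2026, 4, 7, 9, 15, tzinfo=timezone.utc)),
    ("team_a_player_stats", datetime(2026, 4, 7, 10, 0, tzinfo=timezone.utc)),
]
required = {"team_a_player_stats", "team_b_player_stats"}

print(ready_partitions(events, required))  # ['2026-04-07T09:00:00+00:00']
```

<p>Only the 09:00 partition is reported as ready, because the 10:00 partition has so far been updated by one of the two assets.</p>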
<h1 id="-multi-team-deployments-aip-67-airflow-for-the-enterprise">🏢 Multi-Team Deployments (AIP-67): Airflow for the Enterprise</h1>
<blockquote>
<p>⚠️ <strong>Experimental</strong>: Multi-Team support is experimental in Airflow 3.2 and may change in future releases based on user feedback.</p></blockquote>
<p>Airflow 3.2 introduces multi-team support, allowing organizations to run multiple isolated teams within a single Airflow deployment. Each team can have its own Dags, connections, variables, pools, and executors, which provides resource and permission isolation without requiring a separate Airflow instance per team.</p>
<p>This is particularly valuable for platform teams that serve multiple data engineering or data science teams from shared infrastructure, while maintaining strong boundaries between teams&rsquo; resources and access.</p>
<h2 id="key-capabilities-1">Key Capabilities</h2>
<ul>
<li><strong>Per-team resource isolation</strong>: Each team has its own Dags, connections, variables, and pools</li>
<li><strong>Per-team executors</strong>: Different teams can use different executors (e.g. Celery, Kubernetes, Local, or AWS ECS) and configure them separately (#57837, #57910)</li>
<li><strong>Team-scoped authorization</strong>: Keycloak and Simple auth managers support team-scoped access control (#61351, #61861)</li>
<li><strong>Team-scoped secrets</strong>: Use the <code>AIRFLOW_VAR__&lt;TEAM&gt;___&lt;KEY&gt;</code> and <code>AIRFLOW_CONN__&lt;TEAM&gt;___&lt;CONN_ID&gt;</code> environment variable patterns for team-specific variables and connections (#62588)</li>
<li><strong>CLI management</strong>: New CLI commands for managing teams (#55283)</li>
<li><strong>UI team selector</strong>: Team selector in connection, variable, and pool create/edit forms (#60237, #60474, #61082)</li>
<li><strong>Full API support</strong>: <code>team_name</code> field added to Connection, Variable, and Pool APIs (#59336, #57102, #60952)</li>
</ul>
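<p>As a rough illustration of the team-scoped secret naming listed above, the sketch below builds environment variable names that follow the two patterns. The helper names, the team name, and the keys are hypothetical, and upper-casing the parts is an assumption carried over from the usual Airflow environment variable style; check the multi-team documentation for the exact rules.</p>

```python
def team_var_env_name(team: str, key: str) -> str:
    """Build a team-scoped variable name: prefix, two underscores, team, three underscores, key."""
    # Upper-casing is an assumption, not something the release notes state.
    return f"AIRFLOW_VAR__{team.upper()}___{key.upper()}"


def team_conn_env_name(team: str, conn_id: str) -> str:
    """Build a team-scoped connection name using the same separator layout."""
    return f"AIRFLOW_CONN__{team.upper()}___{conn_id.upper()}"


# Hypothetical team and secret names, for illustration only.
print(team_var_env_name("analytics", "api_token"))      # AIRFLOW_VAR__ANALYTICS___API_TOKEN
print(team_conn_env_name("analytics", "warehouse_db"))  # AIRFLOW_CONN__ANALYTICS___WAREHOUSE_DB
```

<p>Note the separators: two underscores before the team name and three underscores between the team name and the key or connection ID.</p>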
<h2 id="enabling-multi-team">Enabling Multi-Team</h2>
<pre tabindex="0"><code># In airflow.cfg:
[core]
multi_team = True

# Or via environment variable:
export AIRFLOW__CORE__MULTI_TEAM=True
</code></pre><h1 id="-deadline-alerts-now-with-synchronous-callbacks-aip-86">⏰ Deadline Alerts: Now With Synchronous Callbacks (AIP-86)</h1>
<blockquote>
<p>⚠️ <strong>Experimental</strong>: Deadline Alerts are experimental in Airflow 3.2 and may change in future releases based on user feedback.</p></blockquote>
<p>Building on the Deadline Alerts system introduced in Airflow 3.1, this release adds synchronous callback support. In 3.1, callbacks ran through the triggerer (async only), which limited integration options. Synchronous callbacks execute directly via the executor, with optional targeting of a specific executor via the executor parameter.</p>
<h2 id="whats-new-in-32">What&rsquo;s New in 3.2</h2>
<ul>
<li><strong>SyncCallback support</strong>: Unlike <code>AsyncCallback</code>, which runs on the triggerer, <code>SyncCallback</code> executes directly on the worker via the executor, with optional targeting of a specific executor</li>
<li><strong>Multiple Deadline Alerts per Dag</strong>: Pass a list to the deadline parameter to configure multiple thresholds on a single Dag</li>
<li><strong>Missed-deadline metadata in Grid API</strong>: Dag run API now includes missed-deadline information for programmatic monitoring</li>
<li><strong>Improved UX for custom DeadlineReferences</strong>: Cleaner developer experience when defining custom deadline reference points (#57222)</li>
</ul>
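<p>The arithmetic behind multiple thresholds can be modeled in a few lines of plain Python. This is a conceptual sketch, not the Airflow API: the <code>Threshold</code> class, the <code>missed</code> function, and the sample times are assumptions, and the model takes a deadline to be a reference time plus an interval.</p>

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Threshold:
    """Hypothetical stand-in for one alert: a reference time plus an interval."""

    name: str
    reference: datetime
    interval: timedelta

    @property
    def deadline(self) -> datetime:
        return self.reference + self.interval


def missed(thresholds: list[Threshold], now: datetime) -> list[str]:
    """Return the names of thresholds whose deadline has already passed at `now`."""
    return [t.name for t in thresholds if now > t.deadline]


# Two thresholds on the same (illustrative) reference time.
queued_at = datetime(2026, 4, 7, 9, 0)
thresholds = [
    Threshold("warn", queued_at, timedelta(minutes=30)),
    Threshold("page", queued_at, timedelta(hours=2)),
]

print(missed(thresholds, datetime(2026, 4, 7, 9, 45)))  # ['warn']
```

<p>At 09:45 only the 30-minute threshold has been crossed. Each threshold in such a list would carry its own callback.</p>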
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-py" data-lang="py"><span class="line"><span class="cl"><span class="k">with</span> <span class="n">DAG</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="n">dag_id</span><span class="o">=</span><span class="s2">&#34;sync_deadline&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">deadline</span><span class="o">=</span><span class="n">DeadlineAlert</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">        <span class="n">reference</span><span class="o">=</span><span class="n">DeadlineReference</span><span class="o">.</span><span class="n">FIXED_DATETIME</span><span class="p">(</span><span class="n">datetime</span><span class="p">(</span><span class="mi">1980</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">2</span><span class="p">)),</span>
</span></span><span class="line"><span class="cl">        <span class="n">interval</span><span class="o">=</span><span class="n">timedelta</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">        <span class="n">callback</span><span class="o">=</span><span class="n">SyncCallback</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">            <span class="n">SlackWebhookNotifier</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">            <span class="p">{</span><span class="s2">&#34;text&#34;</span><span class="p">:</span> <span class="s2">&#34;Sync Callback; Alert should trigger immediately!&#34;</span><span class="p">},</span>
</span></span><span class="line"><span class="cl">        <span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="n">EmptyOperator</span><span class="p">(</span><span class="n">task_id</span><span class="o">=</span><span class="s1">&#39;empty_task&#39;</span><span class="p">)</span>
</span></span></code></pre></div><h1 id="-ui-enhancements">🖥️ UI Enhancements</h1>
<ul>
<li><strong>HITL Approval History</strong>: The Human-in-the-Loop approval interface now shows the complete audit trail of approvals and rejections for any task. (#56760, #55952)</li>
<li><strong>XCom Management</strong>: You can now add, edit, and delete XCom values directly from the UI. (#58921)</li>
<li><strong>Segmented state bar</strong>: Collapsed task groups and mapped tasks now show a segmented state bar for at-a-glance status (#61854)</li>
<li><strong>Unified tooltips</strong>: Grid and Graph view tooltips now show dates, duration, and child states (#62119)</li>
<li><strong>Filename in Dag Code tab</strong>: File identification now shown in the Code tab (#60759)</li>
<li><strong>Copy button for logs</strong>: One-click log copying (#61185)</li>
<li><strong>Date range filter</strong>: Filter Dag executions by date range (#60772)</li>
<li><strong>Task upstream/downstream filter</strong>: Filter by upstream or downstream tasks in Graph and Grid views (#57237)</li>
<li><strong>Data redaction</strong>: Sensitive fields are now redacted in the UI and Public API (#59873)</li>
<li><strong>Custom theme support</strong>: <code>globalCss</code> and theme config for white-label/custom deployments (#61161, #58411)</li>
<li><strong>Inherit core UI theme in React plugins</strong>: Plugin UIs now automatically match the core Airflow theme (#60256)</li>
<li><strong>Task display names in Gantt</strong>: <code>task_display_name</code> shown for better readability (#61438)</li>
</ul>
<h1 id="-performance-improvements">🚀 Performance Improvements</h1>
<p><strong>Rendered Task Instance Fields Cleanup: ~42x Faster.</strong> The cleanup job for rendered task instance fields has been rewritten and is roughly 42 times faster for Dags with many mapped tasks. Retention is now based on the N most recent Dag runs rather than the N most recent task executions, which is both more intuitive and dramatically more performant. The config option has been renamed from <code>max_num_rendered_ti_fields_per_task</code> to <code>num_dag_runs_to_retain_rendered_fields</code> (the old name still works, with a deprecation warning). (#60951)</p>
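<p>The run-based retention rule can be illustrated with a small sketch. It is not the actual cleanup query: the function name, the record layout <code>(run_id, task_id, map_index)</code>, and the sample data are assumptions chosen to show why counting Dag runs is simpler than counting task executions when a task is mapped.</p>

```python
from __future__ import annotations


def prune_rendered_fields(
    records: list[tuple[str, str, int]],
    run_ids_newest_first: list[str],
    num_dag_runs_to_retain: int,
) -> list[tuple[str, str, int]]:
    """Keep only the records that belong to the N most recent Dag runs."""
    keep = set(run_ids_newest_first[:num_dag_runs_to_retain])
    return [record for record in records if record[0] in keep]


# Three runs of a mapped task with two map indexes each (illustrative data).
run_ids = ["run_3", "run_2", "run_1"]
records = [(run_id, "transform", idx) for run_id in reversed(run_ids) for idx in range(2)]

kept = prune_rendered_fields(records, run_ids, num_dag_runs_to_retain=2)
print(sorted({run_id for run_id, _, _ in kept}))  # ['run_2', 'run_3']
print(len(kept))  # 4
```

<p>Every mapped instance of a retained run survives together, so the number of records kept scales with the number of runs rather than with the number of map indexes.</p>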
<p><strong>Scheduler Improvements.</strong> For large-scale deployments, 3.2 addresses several known bottlenecks:</p>
<ul>
<li>The scheduler no longer loads all TaskInstances into memory, preventing memory spikes on large deployments (#60956)</li>
<li>Faster task dequeuing loop (#61376)</li>
<li>Queue query now enforces <code>max_active_tasks</code> directly, preventing over-queueing (#54103)</li>
</ul>
<p><strong>API Server Improvements:</strong></p>
<ul>
<li>Eliminated SerializedDag loads on task start, reducing memory usage (#60803)</li>
<li><code>serialized_dag</code> data column now uses JSONB on PostgreSQL (#55979)</li>
</ul>
<h1 id="-task-sdk-evolution--developer-experience">🔧 Task SDK Evolution &amp; Developer Experience</h1>
<h2 id="task-sdk-decoupling-continues">Task SDK Decoupling Continues</h2>
<p>Airflow 3.2 continues moving components from <code>airflow-core</code> into the Task SDK, progressing toward full client-server separation. This enables Dag authors to independently upgrade the Task SDK without requiring Airflow Core upgrades, reducing coordination overhead between Dag authors and Ops teams.</p>
<p>Modules moved to Task SDK in this release (old import paths still work with deprecation warnings):</p>
<ul>
<li><strong>Exceptions</strong>: <code>AirflowSkipException</code>, <code>TaskDeferred</code>, etc. → <code>airflow.sdk.exceptions</code> (#59780)</li>
<li><strong>Serde</strong>: <code>airflow.serialization.serde</code> → <code>airflow.sdk.serde</code>; serializers → <code>airflow.sdk.serde.serializers.*</code> (#58900)</li>
<li><strong>SkipMixin / BranchMixIn</strong>: Moved to Task SDK; existing imports work via <code>common-compat</code> (#62749, #62776)</li>
<li><strong>Lineage module</strong>: Moved to Task SDK for client-server separation (#60968, #61157)</li>
<li><strong>Listeners module</strong>: Moved to shared library (#59883)</li>
<li><strong>XCom API</strong>: Decoupled from <code>XComEncoder</code> (#58900)</li>
</ul>
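<p>Dag code that has to run on both sides of such a move can try the new import path first and fall back to the old one. The helper below is a generic sketch of that pattern using only the standard library. The name <code>import_first</code> is hypothetical, and the commented usage assumes the old location of <code>AirflowSkipException</code> is <code>airflow.exceptions</code>.</p>

```python
from __future__ import annotations

from importlib import import_module
from typing import Any


def import_first(*candidates: str) -> Any:
    """Return the first importable attribute from 'module.path:Name' candidates."""
    errors = []
    for candidate in candidates:
        module_path, _, attr = candidate.partition(":")
        try:
            return getattr(import_module(module_path), attr)
        except (ImportError, AttributeError) as exc:
            # Remember why this candidate failed, then try the next one.
            errors.append(f"{candidate}: {exc}")
    raise ImportError("none of the candidates could be imported: " + "; ".join(errors))


# Possible usage in a Dag file (new path first, old path as the fallback):
# AirflowSkipException = import_first(
#     "airflow.sdk.exceptions:AirflowSkipException",
#     "airflow.exceptions:AirflowSkipException",
# )
```

<p>Since the old import paths still work with deprecation warnings, a fallback like this is mainly useful for shared libraries that must support several Airflow versions at once.</p>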
<h2 id="pythonoperator-async-support">PythonOperator Async Support</h2>
<p><code>PythonOperator</code> now supports async callables. You can pass an async function as the <code>python_callable</code> and the operator will correctly await it, enabling async I/O patterns without needing a custom operator. (#60268)</p>
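<p>A minimal, self-contained sketch of the kind of async callable this enables is shown here, run with <code>asyncio.run</code> so that it works outside Airflow. The <code>fetch</code> coroutine is a hypothetical stand-in for real async I/O, and the batch size is arbitrary.</p>

```python
from __future__ import annotations

import asyncio


async def fetch(name: str) -> str:
    """Hypothetical stand-in for real async I/O such as an SFTP or HTTP read."""
    await asyncio.sleep(0.01)
    return name.upper()


async def fetch_all(names: list[str], batch_size: int = 2) -> list[str]:
    """Process names in batches, awaiting each batch before starting the next."""
    results: list[str] = []
    for start in range(0, len(names), batch_size):
        batch = names[start:start + batch_size]
        # gather() runs the batch concurrently and preserves input order.
        results.extend(await asyncio.gather(*(fetch(n) for n in batch)))
    return results


print(asyncio.run(fetch_all(["a.xml", "b.xml", "c.xml"])))  # ['A.XML', 'B.XML', 'C.XML']
```

<p>In a Dag, a coroutine function like <code>fetch_all</code> would be passed as the <code>python_callable</code> and awaited by the operator instead of being run with <code>asyncio.run</code>.</p>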
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-py" data-lang="py"><span class="line"><span class="cl"><span class="nd">@task</span><span class="p">(</span><span class="n">show_return_value_in_logs</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="k">async</span> <span class="k">def</span> <span class="nf">load_xml_files</span><span class="p">(</span><span class="n">files</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="kn">import</span> <span class="nn">asyncio</span>
</span></span><span class="line"><span class="cl">    <span class="kn">from</span> <span class="nn">io</span> <span class="kn">import</span> <span class="n">BytesIO</span>
</span></span><span class="line"><span class="cl">    <span class="kn">from</span> <span class="nn">more_itertools</span> <span class="kn">import</span> <span class="n">chunked</span>
</span></span><span class="line"><span class="cl">    <span class="kn">from</span> <span class="nn">os</span> <span class="kn">import</span> <span class="n">cpu_count</span>
</span></span><span class="line"><span class="cl">    <span class="kn">from</span> <span class="nn">tenacity</span> <span class="kn">import</span> <span class="n">retry</span><span class="p">,</span> <span class="n">stop_after_attempt</span><span class="p">,</span> <span class="n">wait_fixed</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="kn">from</span> <span class="nn">airflow.providers.sftp.hooks.sftp</span> <span class="kn">import</span> <span class="n">SFTPClientPool</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="s2">&#34;number of files:&#34;</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">files</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">    <span class="k">async</span> <span class="k">with</span> <span class="n">SFTPClientPool</span><span class="p">(</span><span class="n">sftp_conn_id</span><span class="o">=</span><span class="n">sftp_conn</span><span class="p">,</span> <span class="n">pool_size</span><span class="o">=</span><span class="n">cpu_count</span><span class="p">())</span> <span class="k">as</span> <span class="n">pool</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="nd">@retry</span><span class="p">(</span><span class="n">stop</span><span class="o">=</span><span class="n">stop_after_attempt</span><span class="p">(</span><span class="mi">3</span><span class="p">),</span> <span class="n">wait</span><span class="o">=</span><span class="n">wait_fixed</span><span class="p">(</span><span class="mi">5</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">        <span class="k">async</span> <span class="k">def</span> <span class="nf">download_file</span><span class="p">(</span><span class="n">file</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">            <span class="k">async</span> <span class="k">with</span> <span class="n">pool</span><span class="o">.</span><span class="n">get_sftp_client</span><span class="p">()</span> <span class="k">as</span> <span class="n">sftp</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">                <span class="nb">print</span><span class="p">(</span><span class="s2">&#34;downloading:&#34;</span><span class="p">,</span> <span class="n">file</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">                <span class="n">buffer</span> <span class="o">=</span> <span class="n">BytesIO</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">                <span class="k">async</span> <span class="k">with</span> <span class="n">sftp</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="n">file</span><span class="p">,</span> <span class="n">encoding</span><span class="o">=</span><span class="n">xml_encoding</span><span class="p">)</span> <span class="k">as</span> <span class="n">remote_file</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">                    <span class="n">data</span> <span class="o">=</span> <span class="k">await</span> <span class="n">remote_file</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">                    <span class="n">buffer</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">data</span><span class="o">.</span><span class="n">encode</span><span class="p">(</span><span class="n">xml_encoding</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">                    <span class="n">buffer</span><span class="o">.</span><span class="n">seek</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">                <span class="k">return</span> <span class="n">buffer</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">        <span class="k">for</span> <span class="n">batch</span> <span class="ow">in</span> <span class="n">chunked</span><span class="p">(</span><span class="n">files</span><span class="p">,</span> <span class="n">cpu_count</span><span class="p">()</span> <span class="o">*</span> <span class="mi">2</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">            <span class="n">tasks</span> <span class="o">=</span> <span class="p">[</span><span class="n">asyncio</span><span class="o">.</span><span class="n">create_task</span><span class="p">(</span><span class="n">download_file</span><span class="p">(</span><span class="n">f</span><span class="p">))</span> <span class="k">for</span> <span class="n">f</span> <span class="ow">in</span> <span class="n">batch</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">            <span class="c1"># Wait for this batch to finish before starting the next</span>
</span></span><span class="line"><span class="cl">            <span class="k">for</span> <span class="n">task</span> <span class="ow">in</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">as_completed</span><span class="p">(</span><span class="n">tasks</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">                <span class="n">result</span> <span class="o">=</span> <span class="k">await</span> <span class="n">task</span>
</span></span><span class="line"><span class="cl">                <span class="c1"># Do something with result or accumulate it and return it as an XCom</span>
</span></span></code></pre></div><h1 id="updated-securiy-model">Updated Security Model</h1>
<p>We are working to improve the isolation and security of Airflow deployments. To help users understand what to expect from Airflow security, we updated the security model to reflect the changes implemented in Airflow 3.2.0 and to explain the improvements we are still working on in this area. For details, see the <a href="https://airflow.apache.org/docs/apache-airflow/stable/security/security_model.html">Airflow Security Model</a>.</p>
<h1 id="-community-appreciation">🙏 Community Appreciation</h1>
<p>This release represents the collaborative effort of hundreds of contributors from around the world. Special thanks to our release manager and all the developers, documentarians, testers, and community members who made Airflow 3.2.0 possible.</p>
<p>Thanks to contributors like you, the Airflow project continues to thrive. Whether you&rsquo;re filing issues, submitting PRs, improving documentation, or helping others in the community, every contribution matters.</p>
<h1 id="-get-involved">🔗 Get Involved</h1>
<ul>
<li><strong>Try the Release</strong>: Upgrade your development environment and explore the new features</li>
<li><strong>Join the Conversation</strong>: Connect with us on <a href="https://s.apache.org/airflow-slack">Slack</a> and the <a href="https://airflow.apache.org/community/">dev mailing list</a></li>
<li><strong>Contribute</strong>: Check out our <a href="https://github.com/apache/airflow/blob/main/contributing-docs/README.rst">contribution guide</a></li>
<li><strong>Provide Feedback</strong>: Share your experiences and suggestions on <a href="https://github.com/apache/airflow">GitHub</a></li>
</ul>
<p>Apache Airflow 3.2.0 marks a new chapter in data-aware, partition-driven workflow orchestration. We can&rsquo;t wait to see what you build with it!</p>
]]></content>
  </entry>
  
  <entry>
    <title>Introducing the Apache Airflow Registry</title>
    <link href="/blog/airflow-registry/" rel="alternate"/>
    <id>/blog/airflow-registry/</id>
    <published>2026-03-19T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html"><![CDATA[<p>Today we&rsquo;re launching the <strong><a href="https://airflow.apache.org/registry/">Apache Airflow Registry</a></strong>, a searchable catalog of every official Airflow provider and its modules, live at <a href="https://airflow.apache.org/registry/">airflow.apache.org/registry/</a>.</p>
<p>Need an S3 operator? A Snowflake hook? An OpenAI sensor? The Registry helps you find, compare, and configure the right components for your data pipelines — without digging through docs or PyPI pages.</p>
<p><img src="/blog/airflow-registry/images/registry-homepage.png" alt="Registry Homepage"></p>
<h2 id="by-the-numbers">By the Numbers</h2>
<table>
  <thead>
      <tr>
          <th></th>
          <th></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><strong>98</strong></td>
          <td>Official providers</td>
      </tr>
      <tr>
          <td><strong>1,602</strong></td>
          <td>Modules (operators, hooks, sensors, triggers, transfers, and more)</td>
      </tr>
      <tr>
          <td><strong>329M+</strong></td>
          <td>Monthly PyPI downloads across all providers</td>
      </tr>
      <tr>
          <td><strong>125+</strong></td>
          <td>Integrations with cloud platforms, databases, ML tools, and messaging services</td>
      </tr>
  </tbody>
</table>
<h2 id="search-everything">Search Everything</h2>
<p>Hit <strong>Cmd+K</strong> from any page and start typing. Results show up instantly, grouped by Providers and Modules, with type badges so you can tell a hook from an operator at a glance.</p>
<p><img src="/blog/airflow-registry/images/search.png" alt="Search results showing the S3Hook from the Amazon provider"></p>
<h2 id="provider-pages">Provider Pages</h2>
<p>Each provider gets a dedicated page with everything in one place: install command with copy-to-clipboard, version selector, extras dropdown, compatibility info, connection types, and the full module listing organized by type.</p>
<p><img src="/blog/airflow-registry/images/provider-detail.png" alt="Amazon provider detail page showing 372 modules across 10 types"></p>
<p>The Amazon provider, for example, has <strong>372 modules</strong> across operators, hooks, sensors, triggers, transfers, and more. Module type tabs let you filter to exactly what you&rsquo;re looking for, and a category sidebar groups modules by AWS service (S3, Lambda, Glue, Step Functions, etc.).</p>
<h2 id="connection-builder">Connection Builder</h2>
<p>Click any connection type badge on a provider page, fill in the fields, and the builder generates the connection in three formats — <strong>URI</strong>, <strong>JSON</strong>, and <strong>Env Var</strong> — ready to copy into your configuration.</p>
<p><img src="/blog/airflow-registry/images/connection-builder.gif" alt="Connection builder showing URI, JSON, and Env Var export formats"></p>
<p>No more guessing URI encoding or JSON structure.</p>
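<p>As a rough illustration of what those three formats look like, here is a minimal Python sketch that renders one connection as a URI, as JSON, and as an environment variable. It is not the Registry builder itself: the function name <code>build_connection_exports</code> and every sample value are made up for this example, and it assumes the usual Airflow connection URI layout and the <code>AIRFLOW_CONN_{CONN_ID}</code> variable naming.</p>

```python
import json
from urllib.parse import quote, urlencode


def build_connection_exports(conn_id, conn_type, host, login, password, port, schema, extra):
    """Return the URI, JSON, and environment-variable forms of one connection."""
    # URI form: each component is percent-encoded so characters such as
    # "@", ":" or "/" inside a password cannot be mistaken for delimiters.
    uri = (
        f"{conn_type}://{quote(login, safe='')}:{quote(password, safe='')}"
        f"@{quote(host, safe='')}:{port}/{quote(schema, safe='')}"
    )
    if extra:
        uri += "?" + urlencode(extra)

    # JSON form: no percent-encoding needed, extras stay a nested object.
    as_json = json.dumps(
        {
            "conn_type": conn_type,
            "host": host,
            "login": login,
            "password": password,
            "port": port,
            "schema": schema,
            "extra": extra,
        },
        sort_keys=True,
    )

    # Env var form (assumed naming): AIRFLOW_CONN_ plus the upper-cased connection id.
    env_var = f"AIRFLOW_CONN_{conn_id.upper()}={uri}"
    return {"uri": uri, "json": as_json, "env_var": env_var}


# Hypothetical sample connection, used only to show the three outputs.
exports = build_connection_exports(
    conn_id="my_postgres",
    conn_type="postgres",
    host="db.example.com",
    login="analyst",
    password="p@ss/word",
    port=5432,
    schema="reports",
    extra={"sslmode": "require"},
)
for name, value in exports.items():
    print(f"{name}: {value}")
```

<p>Note how the <code>@</code> and <code>/</code> in the sample password are percent-encoded in the URI form but left untouched in the JSON form, which is exactly the kind of detail the builder takes care of for you.</p>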
<h2 id="explore-by-category">Explore by Category</h2>
<p>Not sure which provider you need? The <strong><a href="https://airflow.apache.org/registry/explore/">Explore page</a></strong> organizes providers into categories: Cloud Platforms, Databases, Data Warehouses, Messaging &amp; Notifications, AI &amp; Machine Learning, Data Processing, and more.</p>
<p><img src="/blog/airflow-registry/images/explore-categories.png" alt="Explore page showing providers grouped by category"></p>
<h2 id="statistics">Statistics</h2>
<p>The <strong><a href="https://airflow.apache.org/registry/stats/">Stats page</a></strong> breaks down the ecosystem: <strong>848 operators</strong>, <strong>298 hooks</strong>, <strong>164 triggers</strong>, <strong>157 sensors</strong>, <strong>83 transfers</strong>, and more — plus top providers by downloads and module count.</p>
<p><img src="/blog/airflow-registry/images/stats-page.png" alt="Registry statistics showing module distribution by type"></p>
<h2 id="json-api">JSON API</h2>
<p>Every piece of data in the Registry is available as structured JSON — providers, modules, parameters, connections, versions. An <strong><a href="https://airflow.apache.org/registry/api-explorer/">API Explorer</a></strong> lets you browse all endpoints interactively.</p>
<p><img src="/blog/airflow-registry/images/api-explorer.png" alt="API Explorer with OpenAPI 3.1 spec"></p>
<p>This makes the Registry accessible to IDE extensions, AI coding assistants, and automation tools.</p>
<h2 id="light--dark-mode">Light &amp; Dark Mode</h2>
<p>Full theme support with dark mode as the default. One click to switch.</p>
<p><img src="/blog/airflow-registry/images/light-mode.png" alt="Registry homepage in light mode"></p>
<h2 id="standing-on-shoulders">Standing on Shoulders</h2>
<p>The Apache Airflow PMC would like to thank <a href="https://www.astronomer.io">Astronomer</a> for building and maintaining the Astronomer Registry for years — it was the go-to place to discover Airflow providers and proved the value of a searchable provider catalog. That work directly shaped this community-owned registry.</p>
<p>The Apache Airflow Registry lives at <code>airflow.apache.org</code>, is built from the same repo as the providers, and updates automatically when new versions are published.</p>
<h2 id="whats-next">What&rsquo;s Next</h2>
<p>This is the first release of the Registry. Here&rsquo;s what&rsquo;s coming:</p>
<ul>
<li><strong>Third-party provider support</strong> — we&rsquo;re exploring options to list community-built providers alongside the official ones</li>
<li><strong>Richer module pages</strong> — dedicated pages per module with full parameter docs and usage examples</li>
</ul>
<h2 id="get-involved">Get Involved</h2>
<ul>
<li><strong><a href="https://airflow.apache.org/registry/">Explore the Registry</a></strong> and let us know what you think</li>
<li><strong>Join the conversation</strong> on <a href="https://s.apache.org/airflow-slack">Airflow Slack</a> and the <a href="https://airflow.apache.org/community/">dev mailing list</a></li>
<li><strong>Contribute</strong> — the code lives in <a href="https://github.com/apache/airflow/tree/main/registry"><code>registry/</code></a> in the main Airflow repo</li>
<li><strong>Report issues or request features</strong> on <a href="https://github.com/apache/airflow/issues">GitHub</a></li>
</ul>
]]></content>
  </entry>
  
  <entry>
    <title>Airflow Survey 2025</title>
    <link href="/blog/airflow-survey-2025/" rel="alternate"/>
    <id>/blog/airflow-survey-2025/</id>
    <published>2026-01-22T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html">&lt;![CDATA[<p><img src="/blog/airflow-survey-2025/images/Airflow-Survey-2025-Results.png" alt="Airflow Survey 2025" title="airflow_survey_2025"></p>
<div style="display: flex; align-items: flex-start; gap: 0.75rem;">
  <a href="https://www.astronomer.io/" style="flex-shrink: 0;"><img src="images/astronomer-logo.svg" alt="Astronomer" width="40" height="40" /></a>
  <div>
    <p style="margin: 0;">The interactive report is hosted by <a href="https://www.astronomer.io/">Astronomer</a>. The Apache Airflow community thanks <a href="https://www.astronomer.io/">Astronomer</a> for running this survey, for sponsoring it and providing the report in this form, and for their effort in marketing, analysis, and preparing the graphics.</p>
  </div>
</div>
<hr style="margin: 1rem 0; border: none; border-top: 1px solid #ccc;" />
<p><a href="https://astronomer.typeform.com/reports/01KESPS8SJ2Y80THJAEYCECE5B">View raw data</a></p>
<p><a href="/data/survey-responses/airflow-user-survey-responses-2025.csv.zip">Download survey responses (CSV)</a></p>
]]></content>
  </entry>
  
  <entry>
    <title>Apache Airflow CTL aka airflowctl 0.1.0</title>
    <link href="/blog/airflowctl-0.1.0/" rel="alternate"/>
    <id>/blog/airflowctl-0.1.0/</id>
    <published>2025-10-15T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html">&lt;![CDATA[<p>We are thrilled to announce the first major release of <strong><code>airflowctl</code> 0.1.0</strong>, the new <strong>secure, API-driven command-line interface (CLI)</strong> for Apache Airflow — built under <a href="https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-81&#43;Enhanced&#43;Security&#43;in&#43;CLI&#43;via&#43;Integration&#43;of&#43;API"><strong>AIP-81</strong></a>.</p>
<p>With this release, the CLI adopts the API-first communication model used across the rest of Airflow, bringing it into the modern era of secure, auditable, and remote-first operations.</p>
<p><strong>Details</strong>:</p>
<p>📦 <strong>PyPI:</strong> <a href="https://pypi.org/project/apache-airflow-ctl/0.1.0/">https://pypi.org/project/apache-airflow-ctl/0.1.0/</a>  <br>
🛠️ <strong>Release Notes:</strong> <a href="https://airflow.apache.org/docs/apache-airflow-ctl/stable/release_notes.html">https://airflow.apache.org/docs/apache-airflow-ctl/stable/release_notes.html</a>  <br>
🪶 <strong>Source Code:</strong> <a href="https://github.com/apache/airflow/tree/main/airflow-ctl">https://github.com/apache/airflow/tree/main/airflow-ctl</a></p>
<h2 id="-what-is-airflowctl">🎯 What is airflowctl?</h2>
<p><code>airflowctl</code> is a new command-line interface for Apache Airflow that interacts exclusively with the Airflow REST API.
It provides a secure, auditable, and consistent way to manage Airflow deployments — without direct access to the metadata database.</p>
<h2 id="-coexistence-with-airflow-cli">🔄 Coexistence with Airflow CLI</h2>
<p>The Airflow CLI will remain in place, primarily for admin tasks such as running Airflow components (<code>airflow api-server</code>, <code>airflow scheduler</code>) or managing the metadata database (<code>airflow db migrate</code>).
<code>airflowctl</code> focuses on operational commands that interact with Airflow resources via the API (<code>airflowctl dagrun trigger</code>, <code>airflowctl connection create</code>, etc.).</p>
<p>Commands fall into <strong>two main categories</strong>:</p>
<ol>
<li><strong>Remote Commands</strong>: Operations that can be provided via API (e.g., managing DAGs, connections, variables, triggering DAG runs) are now available in <code>airflowctl</code> and will be the recommended approach going forward.</li>
<li><strong>Local/Admin Commands</strong>: Operations that manage Airflow components or the metadata database will remain in the Airflow CLI.</li>
</ol>
<p>For now, both CLIs still provide the remote commands.
We are planning a zero-disruption migration path where <strong>Remote Commands</strong> will be gradually deprecated from the Airflow CLI as they achieve parity in <code>airflowctl</code>.</p>
<h2 id="-why-airflowctl">🔒 Why airflowctl?</h2>
<p>Until now, the Airflow CLI connected directly to the <strong>metadata database</strong>, bypassing RBAC, authentication, and API logs.
While convenient, this approach limited <strong>security, auditing, and remote management</strong> capabilities — especially for enterprise environments.</p>
<p><strong><code>airflowctl</code></strong> changes that by routing every command through the <strong>Airflow REST API</strong>, bringing:</p>
<ul>
<li><strong>Authentication &amp; RBAC enforcement</strong></li>
<li><strong>Centralized logging &amp; audit trail</strong></li>
<li><strong>Secure credential storage via Keyring</strong></li>
<li><strong>Remote command execution with zero DB access</strong></li>
<li><strong>Consistency with Airflow UI and API behaviors</strong></li>
</ul>
<h2 id="-aip-81-cli-reimagined-through-the-api">🚀 AIP-81: CLI Reimagined Through the API</h2>
<p><strong>AIP-81</strong> (“Enhanced Security in CLI via Integration of API”) defined a clear goal:</p>
<blockquote>
<p>“The CLI must be a first-class, secure client of the Airflow REST API — not a privileged database actor.”</p></blockquote>
<p><code>airflowctl</code> is the direct realization of that vision.</p>
<h3 id="core-design-principles">Core design principles:</h3>
<ul>
<li><strong>All remote commands use the REST API</strong></li>
<li><strong>RBAC and auth handled consistently via API layer</strong></li>
<li><strong>Pluggable auth mechanisms</strong> (basic auth, OAuth, token, etc.)</li>
<li><strong>Secure credential persistence</strong> through <strong>system keyring</strong></li>
<li><strong>Extensible</strong> to new API endpoints as Airflow evolves</li>
</ul>
<h2 id="-getting-started">⚙️ Getting Started</h2>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">pip install apache-airflow-ctl
</span></span></code></pre></div><p>Once installed, you can connect your CLI to an Airflow instance:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">airflowctl auth login --url http://localhost:8080 --username admin --password admin
</span></span></code></pre></div><p>You do not have to put the password on the command line. Pass <code>--password</code> without a value and airflowctl prompts for it interactively, without echoing it to the terminal:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">airflowctl auth login --url http://localhost:8080 --username admin --password
</span></span></code></pre></div><h2 id="-command-highlights">🧩 Command Highlights</h2>
<p>Here’s a quick look at some of the most popular commands, now fully API-backed in airflowctl 0.1.0:</p>
<h3 id="-assets">🧩 Assets</h3>
<p><img src="/blog/airflowctl-0.1.0/images/assets_create_event.gif" alt="Assets Create Event">
<img src="/blog/airflowctl-0.1.0/images/assets_get.gif" alt="Assets Get"></p>
<h3 id="-config">⚙️ Config</h3>
<p><img src="/blog/airflowctl-0.1.0/images/config_get.gif" alt="Config Get"></p>
<h3 id="-connections">🔑 Connections</h3>
<p><img src="/blog/airflowctl-0.1.0/images/connections_create.gif" alt="Connections Create">
<img src="/blog/airflowctl-0.1.0/images/connections_update.gif" alt="Connections Update"></p>
<h3 id="-dag-runs">🎯 DAG Runs</h3>
<p>Trigger and inspect DAG runs securely through the API:</p>
<p><img src="/blog/airflowctl-0.1.0/images/dagrun_list.gif" alt="DagRun List">
<img src="/blog/airflowctl-0.1.0/images/dagrun_trigger.gif" alt="DagRun Trigger"></p>
<h3 id="-variables">🪣 Variables</h3>
<p><img src="/blog/airflowctl-0.1.0/images/variables_export.gif" alt="Variables Export">
<img src="/blog/airflowctl-0.1.0/images/variables_import.gif" alt="Variables Import"></p>
<p>All these commands — and many more — operate via Airflow’s public REST API, ensuring secure, logged, and RBAC-controlled execution.</p>
<h2 id="-key-security-features">🔐 Key Security Features</h2>
<h3 id="-keyring-integration">🔑 Keyring Integration</h3>
<p>No more plaintext tokens or passwords.
airflowctl uses your OS-level keyring (e.g., macOS Keychain, Windows Credential Manager, or Linux Secret Service) to store and retrieve authentication tokens securely.</p>
<h3 id="-role-based-access-control">🧱 Role-Based Access Control</h3>
<p>Every command is evaluated by Airflow’s RBAC system through the API — ensuring consistent authorization with the web UI and API clients.</p>
<h3 id="-auditing-and-traceability">🕵️‍♀️ Auditing and Traceability</h3>
<p>All CLI commands generate API logs and can be observed through standard audit mechanisms — closing a long-standing gap between the CLI and Airflow’s security model.</p>
<h2 id="-roadmap-highlights">📈 Roadmap Highlights</h2>
<p>airflowctl 0.1.0 is just the beginning. The foundation is in place for a fully unified, secure CLI experience.</p>
<h3 id="-coming-soon">🧩 Coming Soon</h3>
<ul>
<li>Completeness of API coverage</li>
<li>Live log streaming</li>
<li>Worker management</li>
<li>Remote debugging</li>
<li>Incremental deprecation of legacy <code>airflow</code> CLI commands as they achieve API parity</li>
</ul>
<h2 id="-migration">🧭 Migration</h2>
<p>Migration requires mapping commands, updating authentication, and re-testing automation to ensure compatibility with the new API-backed architecture.
Because airflowctl mirrors the core CLI syntax, most workflows require minimal changes — primarily adjusting authentication and configuration.</p>
<p>Side-by-side comparison:</p>
<table>
  <thead>
      <tr>
          <th>Before</th>
          <th>After</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td><img src="/blog/airflowctl-0.1.0/images/pools_list_old.gif" alt="Listing pools with the legacy Airflow CLI"></td>
          <td><img src="/blog/airflowctl-0.1.0/images/pools_list.gif" alt="Listing pools with airflowctl"></td>
      </tr>
      <tr>
          <td><img src="/blog/airflowctl-0.1.0/images/variables_list_old.gif" alt="Listing variables with the legacy Airflow CLI"></td>
          <td><img src="/blog/airflowctl-0.1.0/images/variables_list_yaml.gif" alt="Listing variables as YAML with airflowctl"></td>
      </tr>
  </tbody>
</table>
<h2 id="-community--acknowledgments">🙏 Community &amp; Acknowledgments</h2>
<p>This release is the result of extensive collaboration across the Apache Airflow community.
Many thanks to all who worked on AIP-81, the Airflow REST API, Authentication, and the airflowctl implementation.</p>
<h2 id="leading-contributors">Leading Contributors</h2>
<p>Special thanks to leading contributors of <code>airflowctl</code>:
<strong>Amar Prakash Pandey, Amogh Desai, Aritra Basu, Aryan Khurana, ayush3singh, Brent Bovenzi, Brunda10,
Bugra Ozturk, Daniel Standish, D. Ferruzzi, Deji Ibrahim, Elad Kalif, Ephraim Anierobi, GPK,
Guan Ming(Wesley) Chiu, Hussein Awala, Jake Roach, Jarek Potiuk, Jed Cunningham, Jens Scheffler,
Jaejun Lee, Kalyan R, Karthikeyan Singaravelan, Kaxil Naik, Kevin Yang, Kiruban Kamaraj, LI,JHE-CHEN,
Pierre Jeambrun, Pratiksha, Sam Wheating, Tzu-ping Chung, Valentyn, Vincent, Wei Lee, Yeonguk,
Yunchi Pang, Zhen-Lun (Kevin) Hong</strong></p>
<h2 id="-in-summary">✨ In Summary</h2>
<p>airflowctl 0.1.0 changes Airflow’s command line as follows:</p>
<table>
  <thead>
      <tr>
          <th>Before</th>
          <th>After</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Direct DB access</td>
          <td>API-backed security</td>
      </tr>
      <tr>
          <td>No RBAC or audit</td>
          <td>Centralized auth &amp; logging</td>
      </tr>
      <tr>
          <td>Inconsistent behavior</td>
          <td>Unified CLI + API experience</td>
      </tr>
      <tr>
          <td>Manual secrets</td>
          <td>Keyring-secured credentials</td>
      </tr>
  </tbody>
</table>
<p>Security first. API always. CLI reimagined.
The secure CLI foundation lays the groundwork for Airflow’s next generation: a unified, API-first platform for orchestration and operations.</p>
]]></content>
  </entry>
  
  <entry>
    <title>Apache Airflow 3.1.0: Human-Centered Workflows</title>
    <link href="/blog/airflow-3.1.0/" rel="alternate"/>
    <id>/blog/airflow-3.1.0/</id>
    <published>2025-09-25T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html">&lt;![CDATA[<p>We are thrilled to announce the release of <strong>Apache Airflow 3.1.0</strong>, an update that puts humans at the center of data
workflows. This release introduces powerful new capabilities for human decision-making in automated
processes, comprehensive internationalization support, and significant developer experience improvements.</p>
<p><strong>Details</strong>:</p>
<p>📦 PyPI: <a href="https://pypi.org/project/apache-airflow/3.1.0/">https://pypi.org/project/apache-airflow/3.1.0/</a> <br>
📚 Core Airflow Docs: <a href="https://airflow.apache.org/docs/apache-airflow/3.1.0/">https://airflow.apache.org/docs/apache-airflow/3.1.0/</a> <br>
📚 Task SDK Docs: <a href="https://airflow.apache.org/docs/task-sdk/1.1.0/">https://airflow.apache.org/docs/task-sdk/1.1.0/</a> <br>
🛠️ Release Notes: <a href="https://airflow.apache.org/docs/apache-airflow/3.1.0/release_notes.html">https://airflow.apache.org/docs/apache-airflow/3.1.0/release_notes.html</a> <br>
🪶 Sources: <a href="https://airflow.apache.org/docs/apache-airflow/3.1.0/installation/installing-from-sources.html">https://airflow.apache.org/docs/apache-airflow/3.1.0/installation/installing-from-sources.html</a> <br>
🚏 Constraints: <a href="https://github.com/apache/airflow/tree/constraints-3.1.0">https://github.com/apache/airflow/tree/constraints-3.1.0</a></p>
<h2 id="-human-in-the-loop-hitl-when-automation-meets-human-judgment">🤝 Human-in-the-Loop (HITL): When Automation Meets Human Judgment</h2>
<p>This powerful capability bridges the gap between automated processes and human expertise, making Airflow invaluable for:</p>
<p><img src="/blog/airflow-3.1.0/images/hitl.gif" alt="Human-in-the-Loop HITL"></p>
<ul>
<li><strong>AI/ML Model Validation</strong>: Pause inference pipelines for human review of model outputs</li>
<li><strong>Content Moderation</strong>: Route content through human reviewers before publication</li>
<li><strong>Approval Workflows</strong>: Require manager approval for sensitive operations</li>
<li><strong>Data Quality Gates</strong>: Allow data stewards to validate critical datasets</li>
</ul>
<p><strong>HITL</strong> tasks pause in a deferred state while presenting intuitive web forms in the Airflow UI. Users with appropriate roles can review context data, DAG parameters, and XCom values before making informed decisions.</p>
<h2 id="example-code">Example Code:</h2>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-py" data-lang="py"><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">airflow.sdk</span> <span class="kn">import</span> <span class="n">DAG</span>
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">airflow.providers.standard.operators.hitl</span> <span class="kn">import</span> <span class="n">HITLOperator</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">with</span> <span class="n">DAG</span><span class="p">(</span><span class="s2">&#34;content_moderation&#34;</span><span class="p">,</span> <span class="n">schedule</span><span class="o">=</span><span class="s2">&#34;@daily&#34;</span><span class="p">)</span> <span class="k">as</span> <span class="n">dag</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="n">moderate_content</span> <span class="o">=</span> <span class="n">HITLOperator</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">        <span class="n">task_id</span><span class="o">=</span><span class="s2">&#34;review_content&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="n">subject</span><span class="o">=</span><span class="s2">&#34;Please review this content for publication&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="n">options</span><span class="o">=</span><span class="p">[</span><span class="s2">&#34;Approve&#34;</span><span class="p">,</span> <span class="s2">&#34;Reject&#34;</span><span class="p">],</span>
</span></span><span class="line"><span class="cl">    <span class="p">)</span>
</span></span></code></pre></div><h1 id="-ui-enhancements--performance">📊 UI Enhancements &amp; Performance</h1>
<h2 id="calendar-and-gantt-views-make-their-comeback">Calendar and Gantt Views Make Their Comeback</h2>
<p>Remember those beloved Calendar and Gantt chart views from Airflow 2.x? They&rsquo;re back, completely rebuilt for the
modern React UI after being omitted from the 3.0 release.</p>
<p>The new Calendar view is genuinely interactive with filtering capabilities that make it easy to drill down
into specific time periods and dag states.</p>
<p><img src="/blog/airflow-3.1.0/images/calendar.gif" alt="Calendar View"></p>
<p>The Gantt chart is now integrated directly into the grid view and renders much faster than the old
version, giving you that timeline perspective without the performance headaches.</p>
<p><img src="/blog/airflow-3.1.0/images/gantt.png" alt="Gantt View"></p>
<h2 id="theme-updates-that-actually-matter">Theme Updates That Actually Matter</h2>
<p>We&rsquo;ve refreshed the color palette using modern design principles, making the UI more consistent and professional.
Most of all, we took a careful look at contrast ratios so the UI is more accessible.</p>
<h2 id="other-improvements">Other Improvements</h2>
<p>We&rsquo;ve added a lot more filtering options across the pages!</p>
<p>Plus, you can now pin your <strong>favorite DAGs</strong> to keep them at the top of your list or to filter for them easily. It&rsquo;s
one of those small features that makes a huge difference when dealing with hundreds of workflows.</p>
<p><img src="/blog/airflow-3.1.0/images/favorite.png" alt="Favorite Dags in the UI">
Credited to Volker Janz.</p>
<blockquote>
<p>📊 <strong>UI Development Milestone</strong>: Airflow 3.1.0 features <strong>5x more UI pull requests</strong> than the 2.10 release and <strong>50% more</strong> than Airflow 3.0, demonstrating the community&rsquo;s commitment to user experience excellence.</p></blockquote>
<h1 id="-deadline-alerts-proactive-workflow-monitoring">⏰ <strong>Deadline Alerts</strong>: Proactive Workflow Monitoring</h1>
<p>Say goodbye to reactive monitoring. <strong>Deadline Alerts</strong> provide proactive notifications when DAG runs
exceed time thresholds, helping ensure SLA compliance and timely completion of critical workflows.</p>
<p>Configure monitoring by specifying:</p>
<ul>
<li><strong>Reference point</strong>: DAG queued time, logical date, or fixed datetime</li>
<li><strong>Interval</strong>: Time threshold (positive or negative)</li>
<li><strong>Callback</strong>: Notifications via Airflow Notifiers or custom functions</li>
</ul>
<h2 id="example-code-1">Example Code:</h2>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-py" data-lang="py"><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">datetime</span> <span class="kn">import</span> <span class="n">timedelta</span>
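</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">airflow.sdk</span> <span class="kn">import</span> <span class="n">DAG</span>
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">airflow.sdk.definitions.deadline</span> <span class="kn">import</span> <span class="n">DeadlineAlert</span><span class="p">,</span> <span class="n">DeadlineReference</span><span class="p">,</span> <span class="n">AsyncCallback</span>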
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">airflow.providers.slack.notifications.slack_webhook</span> <span class="kn">import</span> <span class="n">SlackWebhookNotifier</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">with</span> <span class="n">DAG</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;critical_etl&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">deadline</span><span class="o">=</span><span class="n">DeadlineAlert</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">        <span class="n">reference</span><span class="o">=</span><span class="n">DeadlineReference</span><span class="o">.</span><span class="n">DAGRUN_QUEUED_AT</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="n">interval</span><span class="o">=</span><span class="n">timedelta</span><span class="p">(</span><span class="n">hours</span><span class="o">=</span><span class="mi">2</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">        <span class="n">callback</span><span class="o">=</span><span class="n">AsyncCallback</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">            <span class="n">SlackWebhookNotifier</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">            <span class="n">kwargs</span><span class="o">=</span><span class="p">{</span><span class="s2">&#34;text&#34;</span><span class="p">:</span> <span class="s2">&#34;🚨 Critical ETL missed deadline!&#34;</span><span class="p">}</span>
</span></span><span class="line"><span class="cl">        <span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="p">)</span> <span class="k">as</span> <span class="n">dag</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># Your tasks here</span>
</span></span><span class="line"><span class="cl">    <span class="o">...</span>
</span></span></code></pre></div><p>Perfect for monitoring daily ETLs, alerting before critical deadlines, or escalating resource-constrained workflows.</p>
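<p>To make the reference-plus-interval idea concrete, here is a small standard-library sketch of the arithmetic. It assumes the deadline is simply the reference point plus the interval, so a negative interval lands before the reference. <code>compute_deadline</code> and the sample timestamps are illustrative only and are not part of the Airflow API.</p>

```python
from datetime import datetime, timedelta, timezone


def compute_deadline(reference, interval):
    """Return the moment the alert becomes due: reference plus interval."""
    return reference + interval


# Positive interval: the run should finish within 2 hours of being queued.
queued_at = datetime(2025, 9, 25, 6, 0, tzinfo=timezone.utc)
after_queue = compute_deadline(queued_at, timedelta(hours=2))

# Negative interval: the alert becomes due 30 minutes before a fixed cutoff.
cutoff = datetime(2025, 9, 25, 9, 0, tzinfo=timezone.utc)
before_cutoff = compute_deadline(cutoff, timedelta(minutes=-30))

print(after_queue.isoformat())    # 2025-09-25T08:00:00+00:00
print(before_cutoff.isoformat())  # 2025-09-25T08:30:00+00:00
```

<p>Under that assumption, a run that is still going at the computed moment would trigger the configured callback.</p>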
<h1 id="-going-global-with-17-languages">🌍 Going Global with 17 Languages</h1>
<p>Airflow now speaks your team&rsquo;s language. Literally. We have added comprehensive internationalization support
for <strong>17 languages</strong>, including Arabic, Chinese, French, German, Spanish and more. The interface detects your
browser preferences automatically, but you can switch languages on the fly without refreshing the page.</p>
<p><img src="/blog/airflow-3.1.0/images/i18n-demo.gif" alt="Internationalization Demo"></p>
<p>For our Arabic and Hebrew users, we&rsquo;ve built in <strong>proper right-to-left (RTL) support</strong>.</p>
<p>The best part? We have made it straightforward for the community to contribute additional languages with clear
contribution guidelines, so this is just the beginning of Airflow&rsquo;s global reach.</p>
<h1 id="-build-your-airflow-your-way">🎨 Build <em>your Airflow</em>, <em>your way</em></h1>
<p>The new <strong>React Plugin System</strong> (<strong>AIP-68</strong>) transforms how you extend Airflow&rsquo;s interface. We have replaced
the old Flask-based approach with a modern toolkit that lets you customize Airflow exactly how your team works.</p>
<p>Want to embed your company&rsquo;s dashboard right in the Airflow UI? Build React applications or iframes that
render inside Airflow&rsquo;s UI (nav bar, dashboard, details page, etc.). Want to link to your existing tools
seamlessly? Create custom external links to your resources. Want to extend Airflow&rsquo;s API server? Register
FastAPI sub applications and middlewares that fit your specific processes.</p>
<p>The system includes:</p>
<ul>
<li><strong>External Views</strong> for linking to existing tools (external links or embedded iframes)</li>
<li><strong>React Applications</strong> for rendering external React apps</li>
<li><strong>FastAPI Sub Applications</strong> to extend the API server</li>
<li><strong>Root Middlewares</strong> for intercepting API requests (even core ones)</li>
</ul>
<p>We&rsquo;ve already seen teams integrate everything from Wikipedia searches to data lineage
visualizations to, yes, a snake game to play while waiting on dag runs!</p>
<p><img src="/blog/airflow-3.1.0/images/snake.gif" alt="Snake Game Plugin">
Credited to Tamara Fingerlin.</p>
<h1 id="-enhanced-developer-and-authoring-experience">🔧 Enhanced Developer and Authoring Experience</h1>
<h2 id="task-sdk-evolution">Task SDK Evolution</h2>
<p>Airflow 3.1 advances the decoupling of the <strong>Task SDK</strong> from Airflow Core through improved DAG serialization. While
complete separation arrives in 3.2.0, the foundation enables:</p>
<ul>
<li><strong>Independent Upgrades</strong>: Less coordination needed between Dag authors and Airflow Ops teams</li>
<li><strong>Forward Compatibility</strong>: Dag authors should now write Dags by importing from the <strong>airflow.sdk</strong> namespace for future-proofing. (Naturally, the old imports still work but issue a warning.)</li>
<li><strong>Deployment Flexibility</strong>: Better support for separated component deployment</li>
</ul>
<h2 id="python-313-support">Python 3.13 Support</h2>
<p>Airflow 3.1.0 adds <strong>Python 3.13</strong> support while removing Python 3.9 (end-of-life). The platform now supports Python 3.10, 3.11, 3.12, and 3.13.</p>
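<p>If you want a script or CI job to fail fast on an unsupported interpreter, a check along these lines may help. It is a sketch only: <code>SUPPORTED_PYTHONS</code> and <code>is_supported</code> are hypothetical names, and the version set is copied from the list above rather than read from Airflow itself.</p>

```python
import sys

# Minor versions listed as supported by Airflow 3.1.0 in the text above.
SUPPORTED_PYTHONS = {(3, 10), (3, 11), (3, 12), (3, 13)}


def is_supported(version_info=None):
    """Return True if the (major, minor) pair is in the supported set."""
    # Fall back to the running interpreter when no version is given.
    major, minor = (version_info or sys.version_info)[:2]
    return (major, minor) in SUPPORTED_PYTHONS


# Reports whether the interpreter running this script is in the set.
print(is_supported())
```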
<h2 id="inference-execution">Inference Execution</h2>
<p>A new streaming API endpoint (<strong><code>/dags/{dag_id}/dagRuns/{dag_run_id}/wait</code></strong>) allows applications to watch DAG runs
until completion, enabling responsive integration patterns for real-time workflows.</p>
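<p>As a minimal sketch of how a client might read that stream, assume the endpoint emits newline-delimited JSON objects that each carry a <code>state</code> field, which is how the <code>httpx</code> example in this section treats it. The helper name <code>final_state</code> and the sample payloads are hypothetical, and the sketch makes no network calls:</p>

```python
import json
from typing import Iterable, Optional


def final_state(lines: Iterable[str]) -> Optional[str]:
    """Return the "state" of the last non-empty JSON line, or None if there is none."""
    last = None
    for raw in lines:
        text = raw.strip()
        if not text:
            continue  # assumption: blank lines, if any, carry no information
        last = json.loads(text)
    return None if last is None else last.get("state")


# Hypothetical payloads; the real response may carry more fields.
sample = ['{"state": "running"}', "", '{"state": "success"}']
print(final_state(sample))
```

<p>Keeping the parsing apart from the HTTP call makes it easy to check against canned lines, and it returns <code>None</code> instead of raising when the stream yields nothing.</p>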
<p>The example below uses <a href="https://www.python-httpx.org/async/"><code>httpx</code></a> to trigger a dag run and emit the final dag run
state after it finishes:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-py" data-lang="py"><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">asyncio</span>
</span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">json</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">httpx</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">dag_id</span> <span class="o">=</span> <span class="s2">&#34;my-dag&#34;</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">async</span> <span class="k">def</span> <span class="nf">create_and_wait</span><span class="p">(</span><span class="n">client</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># Create a dag run...</span>
</span></span><span class="line"><span class="cl">    <span class="n">r</span> <span class="o">=</span> <span class="k">await</span> <span class="n">client</span><span class="o">.</span><span class="n">post</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;https://my-airflow.example.com/api/v2/dags/</span><span class="si">{</span><span class="n">dag_id</span><span class="si">}</span><span class="s2">/dagRuns&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">run_id</span> <span class="o">=</span> <span class="n">r</span><span class="o">.</span><span class="n">json</span><span class="p">()[</span><span class="s2">&#34;dag_run_id&#34;</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">    <span class="k">async</span> <span class="k">with</span> <span class="n">client</span><span class="o">.</span><span class="n">stream</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">        <span class="s2">&#34;GET&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="sa">f</span><span class="s2">&#34;https://my-airflow.example.com/api/v2/dags/</span><span class="si">{</span><span class="n">dag_id</span><span class="si">}</span><span class="s2">/dagRuns/</span><span class="si">{</span><span class="n">run_id</span><span class="si">}</span><span class="s2">/wait&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="p">)</span> <span class="k">as</span> <span class="n">r</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">async</span> <span class="k">for</span> <span class="n">line</span> <span class="ow">in</span> <span class="n">r</span><span class="o">.</span><span class="n">aiter_lines</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">            <span class="k">pass</span>  <span class="c1"># You can do progress report here instead.</span>
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="s2">&#34;Dag run state:&#34;</span><span class="p">,</span> <span class="n">json</span><span class="o">.</span><span class="n">loads</span><span class="p">(</span><span class="n">line</span><span class="o">.</span><span class="n">strip</span><span class="p">())[</span><span class="s2">&#34;state&#34;</span><span class="p">])</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">async</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="k">async</span> <span class="k">with</span> <span class="n">httpx</span><span class="o">.</span><span class="n">AsyncClient</span><span class="p">()</span> <span class="k">as</span> <span class="n">client</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">await</span> <span class="n">create_and_wait</span><span class="p">(</span><span class="n">client</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">asyncio</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">main</span><span class="p">())</span>
</span></span></code></pre></div><h1 id="-amazing-community">🙏 Amazing Community</h1>
<p>Apache Airflow 3.1.0 represents an extraordinary community effort, showcasing the vibrant ecosystem that drives this project forward with <strong>163 contributors</strong> making this release possible across <strong>1,400+ commits</strong>.</p>
<h2 id="leading-contributors">Leading Contributors</h2>
<p>Special thanks to our top 20 contributors who drove this release forward: <strong>Amogh Desai</strong>, <strong>Ash Berlin-Taylor</strong>, <strong>Brent Bovenzi</strong>, <strong>Bugra Ozturk</strong>, <strong>Daniel Standish</strong>, <strong>Elad Kalif</strong>, <strong>Ephraim Anierobi</strong>, <strong>GPK</strong>, <strong>Guan-Ming (Wesley) Chiu</strong>, <strong>Jarek Potiuk</strong>, <strong>Jens Scheffler</strong>, <strong>Karthikeyan Singaravelan</strong>, <strong>Kaxil Naik</strong>, <strong>LI,JHE-CHEN</strong>, <strong>Pierre Jeambrun</strong>, <strong>Shahar Epstein</strong>, <strong>Tzu-ping Chung</strong>, <strong>Vincent</strong>, <strong>Wei Lee</strong>, and <strong>Yeonguk Choo</strong>.</p>
<details>
<summary>View all 143 additional contributors</summary>
<p>1in3x, Aaron Chen, Aayush Bisen, Abhishek, Achim Gädke, Aldo, Alex Neal Albinda, Alyssa Mhie M. Matila, Anand Raman, Andrei Serdiukov, Ankit Chaurasia, Antony Southworth, Aritra Basu, Aryan Khurana, Atul Singh, Azis, BBQing, Bjorn Olsen, Bowrna, Brunda10, Carl Leake, Chang-Yen (Brian) Li, Christos Bisias, Collin McNulty, Constance Martineau, D. Ferruzzi, DHARMENDRA AHIRWAR, Damian Shaw, Daniel Wolf, David Blain, Denis Krivenko, Dev-iL, Dheeraj Turaga, Diogo Rodrigues, Domadelfin, Dov Benyomin Sohacheski, Duc Nguyen, Evgenii Prusov, Farhan, Fortytwo, Gabriel TOUZALIN, Gajo Petrovic, Gary Hsu, Glenn Schuurman, Guangyang Li, Gwak Beomgyu, Hoyeop Lee, Hussein Awala, Isaiah Iruoha, Ivan, Jake Roach, James Hyphen, Jason, Jason Brownstein, Jed Cunningham, Jeongseok Kang, John Bampton, Josef Šimánek, Josué Velázquez Gen, João Ramiro, Kacper Muda, Kalyan R, Karan Anand, Karen Braganza, Karthik S, Kavya Katal, Ken Lewerentz, Kevin Liu, Kevin Yang, Kiran R, Kiruban Kamaraj, Kosteev Eugene, Kumbha Lakshmi Narayana, Kyungjun Lee, LIU ZHE YOU, Lipu Fei, Maciej Obuchowski, Maksim, Mike Lay, Mikhail Dengin, Minkyu Kim, N R Navaneet, NOEUN KIM, Naseem Shah, Nataneljpwd, Niko Oliveira, Nithin U, Nitochkin, Olivier, Owen Leung, Paolo Facchinetti, Pedro Leal, Pratiksha, Przemysław Mirowski, Qiang-Liu, Rahul Vats, Ramit Kataria, Sam Wheating, Sean Ghaeli, Sean Rose, Sebastián Ortega, Seongho Kim, SeungMin, Shlomit-B, Shubham Raj, Sneha Prabhu, Stanley Law, Stephan, Steve Ahn, Valentyn, Vic Wen, Vincent Kling, VladaZakharova, Wei-Yu Chen, Wonseok Yang, Xch1, Xiaodong DENG, Y. SOMDA, Yann Lambret, Yannick Suter, Yeonguk, Yiming Peng, Yusin, Zach, Zach Liu, Zhen-Lun (Kevin) Hong, anasatzemoso, ayush3singh, codecae, davidfgcorreia, dominikhei, ecodina, fuatcakici, humit, magic_frog, majorosdonat, mandeepzemo, oboki, olegkachur-e, pawelgrochowicz, roach231428, shreyaskj-0710, sujitha-saranam, suman-himanshu, vikrantkumar-max, yangyulely, 코딩하는펭귄.</p>
</details>
<h2 id="ui-excellence--community-growth">UI Excellence &amp; Community Growth</h2>
<p>The exceptional growth in UI contributions - <strong>5x more pull requests</strong> than Airflow 2.10 and <strong>50% more</strong> than Airflow 3.0 - reflects the dedicated efforts of our UI maintainers and an expanding community of <strong>70 frontend contributors</strong> who have made user experience a cornerstone of this release.</p>
<h2 id="global-collaboration">Global Collaboration</h2>
<p>The internationalization effort represents contributors from around the world, making Airflow accessible across <strong>17 languages</strong> and diverse technical communities, and demonstrating the truly global nature of the Airflow project.</p>
<hr>
<p><em>Apache Airflow is a community-driven project. Special thanks to all contributors who made this release possible through code, documentation, testing, and feedback. The future of workflow orchestration is built together.</em></p>
<h1 id="-migration--upgrade-notes">📝 Migration &amp; Upgrade Notes</h1>
<ul>
<li><strong>Python Support</strong>: Ensure you&rsquo;re running Python 3.10+ before upgrading. We recommend at least Python 3.12 for the performance improvements from the Python core team; 3.13, if you can manage it, is even better!</li>
<li><strong>Provider Updates</strong>: Update to the latest provider packages to take advantage of new features.</li>
<li><strong>Breaking Changes</strong>: Review the <a href="https://airflow.apache.org/docs/apache-airflow/3.1.0/installation/upgrading.html">migration guide</a> for configuration changes and removed features if you are upgrading directly from Airflow 2.x.</li>
</ul>
<h1 id="-get-involved">🔗 Get Involved</h1>
<ul>
<li><strong>Try the Release</strong>: Upgrade your development environment and explore the new features</li>
<li><strong>Join the Conversation</strong>: Connect with us on <a href="https://s.apache.org/airflow-slack">Airflow Slack</a> and the <a href="https://airflow.apache.org/community/">dev mailing list</a></li>
<li><strong>Contribute</strong>: Check out our <a href="https://github.com/apache/airflow/blob/main/contributing-docs/README.rst">contribution guide</a>.</li>
<li><strong>Provide Feedback</strong>: Share your experiences and suggestions on GitHub (<a href="https://github.com/apache/airflow">https://github.com/apache/airflow</a>)</li>
</ul>
<p>Apache Airflow 3.1.0 marks a new chapter in making data orchestration more inclusive, intelligent, and
human-centered. We can&rsquo;t wait to see what you build with it!</p>
]]></content>
  </entry>
  
  <entry>
    <title>Apache Airflow® 3 is Generally Available!</title>
    <link href="/blog/airflow-three-point-oh-is-here/" rel="alternate"/>
    <id>/blog/airflow-three-point-oh-is-here/</id>
    <published>2025-04-22T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html">&lt;![CDATA[<p>We announced our intent to focus on Apache Airflow 3.0® as the next big milestone for the Airflow project at the Airflow Summit in September 2024. We are delighted to announce that Airflow 3.0 is now released!</p>
<h2 id="a-major-release-four-years-in-the-making">A Major Release, Four Years in the Making</h2>
<p>Airflow 3.0 is the biggest release in Airflow’s history. Airflow 2.0 was released in 2020, and the last four years have seen incremental updates and releases every quarter, with version 2.10 released in Q4 2024. With over 30 million monthly downloads (up over 30x since 2020) and 80,000 organizations (up from 25,000 in 2020) now using Airflow, we’ve seen incredible growth in popularity since 2.0.</p>
<p>Over the last four years, Airflow has grown to power business-critical data workflows within organizations of all sizes. We have seen an exponential increase in the use cases for Airflow from its beginnings with ETL, ELT, and Reverse ETL, with over 30% of Airflow users using it for MLOps, and 10% using it for GenAI workflows. Airflow 3 is a response to this use case expansion and is the standard for data application development across the enterprise.</p>
<p>Here are some highlights:</p>
<ul>
<li>
<p>Airflow 3 is significantly easier to use for data practitioners and incorporates their key requests for critical changes to Airflow. Early user reactions to features such as the new React-based UI, DAG Versioning, and improved Backfill support have been incredibly positive. I was ecstatic to see the reaction from data engineers when I demonstrated this at a recent Airflow meetup.</p>
</li>
<li>
<p>The seamless UI transition when navigating between Asset-oriented workflows and Task-oriented workflows is beautiful. Once again, Airflow lets developers choose how they want to develop and navigate without imposing any restrictions.</p>
</li>
<li>
<p>Introduction of Event Driven Scheduling enables Airflow to seamlessly integrate with messaging providers and react to events happening and data assets being updated outside of Airflow.</p>
</li>
<li>
<p>The big architecture change, the introduction of the Task Execution Interface and the Task SDKs, enables a stronger security model, including secure, scalable execution across multi-cloud, hybrid-cloud, and local data center deployments.</p>
</li>
</ul>
<p>This is the result of 300+ developers within the Airflow community working together tirelessly for many months and I could not be more proud to be part of this wonderful team. Here are some more details of the release.</p>
<h2 id="highly-requested-ux-features">Highly requested UX features</h2>
<h3 id="dag-versioning">DAG Versioning</h3>
<p>DAG Versioning has been the most requested feature within Airflow based on the annual Airflow survey. As implemented in Airflow 3, a DAG will run through to completion based on the version at start, even if a new version has been uploaded while this DAG was being run. All DAG runs in the UI are now associated with the version of the DAG as run including the Task structure, the code, the logs, and more.
This is described in two AIPs: Improve DAG history (<a href="https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-65%3A&#43;Improve&#43;DAG&#43;history&#43;in&#43;UI">AIP-65</a>) and DAG Bundles and Parsing (<a href="https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=294816356">AIP-66</a>).</p>
<p><img src="/blog/airflow-three-point-oh-is-here/versioning_ui.gif" alt="DAG Versioning UI"></p>
<h3 id="backfills-improvements">Backfills improvements</h3>
<p>Another long-standing user request has been better support for backfills. Often discussed in the context of machine learning, backfills also apply to traditional ETL and ELT use cases.  In Airflow 3, backfills are run within the scheduler for improved control, scalability, and diagnostics. Backfills can now be started from the UI or API, and monitored within the UI.</p>
<p>This was defined as part of “Scheduler-managed backfills” (<a href="https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-78&#43;Scheduler-managed&#43;backfill">AIP-78</a>), and an example screenshot is shown below:</p>
<p><img src="/blog/airflow-three-point-oh-is-here/backfill.png" alt="Backfill UI"></p>
<h2 id="run-anywhere-at-any-time-in-any-language">Run anywhere, at any time, in any Language</h2>
<h3 id="run-anywhere-in-any-language">Run anywhere, in any language</h3>
<p>A foundational goal of Airflow 3 is allowing execution in any environment, in any language. A key component of this is the Task Execution Interface (<a href="https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-72&#43;Task&#43;Execution&#43;Interface&#43;aka&#43;Task&#43;SDK">AIP-72</a>), which enables the evolution of Airflow into a client-server architecture, one of the most significant architectural shifts in Airflow’s history. This supports Celery, Kubernetes, and Local Executors, but also enables new capabilities. A component of this change is the API server, which represents input for the Task Execution Interface. This foundational feature enables multi-cloud deployments and multi-language support in the form of the Task Execution API. The Airflow 3 release includes the Python Task SDK, which enables backward compatibility for existing DAGs. Task SDKs for additional languages, starting with Golang, will be released over the next few months.</p>
<p>To enable data pipelines to be run on edge devices, outside of the core data centers and clouds, the Edge Executor (<a href="https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=301795932">AIP-69</a>) is available as a provider package with Airflow 3. This is an incremental feature built on top of the Task Execution Interface. Initial incarnations were released in experimental mode based on Airflow 2.x, and this executor has now evolved to leverage the Airflow 3 API Server.</p>
<h3 id="event-driven-scheduling-and-data-assets">Event-driven scheduling and Data Assets</h3>
<p>Airflow 3 represents a foundational jump in enabling Airflow to react to events happening outside of Airflow, including data assets being created or updated by external data systems. This was based on the evolution of Datasets into Data Assets and was broken out into several AIPs as detailed below, which are all part of the release.</p>
<p>The fundamental evolution of Datasets into Data Assets has been done as part of “Introducing Data Assets” (<a href="https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-74&#43;Introducing&#43;Data&#43;Assets">AIP-74</a>). This introduces the concept of Watchers which is leveraged by other capabilities detailed below. A significant enhancement around Data Assets is the New Asset-Centric Syntax (<a href="https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-75&#43;New&#43;Asset-Centric&#43;Syntax">AIP-75</a>) for defining Assets easily with DAGs using the Python decorator syntax, which is part of this release.</p>
<p>External event driven scheduling (<a href="https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-82&#43;External&#43;event&#43;driven&#43;scheduling&#43;in&#43;Airflow">AIP-82</a>) is based on the foundational Data Assets work described above, specifically Watchers. The initial scope as defined in the AIP is complete and now incorporates a “Common Message Bus” interface. This release also includes an implementation of the above for AWS SQS as an “out of the box” integration, which demonstrates DAGs being triggered upon the arrival of a message in AWS SQS.</p>
<h3 id="inference-execution-and-hyperparameter-tuning">Inference execution and hyperparameter tuning</h3>
<p>Many ML and AI Engineers are already using Airflow for ML/AI Ops, especially for model training. However, there were challenges for Inference Execution. Enhancing Airflow for Inference Execution by adding support for non-data-interval-Dags (sorry, that’s a mouthful) is an important change. This work is covered as part of “Remove Execution date unique constraint from DAG run” (<a href="https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-83&#43;Remove&#43;Execution&#43;Date&#43;Unique&#43;Constraint&#43;from&#43;DAG&#43;Run">AIP-83</a>).</p>
<h2 id="security-and-usability-improvements">Security and usability improvements</h2>
<h3 id="ui-modernization">UI Modernization</h3>
<p>The Airflow UI has been completely rewritten as part of Airflow 3 and incorporates a significantly improved user experience which seamlessly blends Asset-oriented workflows with Task-oriented workflows. This is a dramatic improvement which enables developers to author DAGs as they choose, without being opinionated about “a right way”.</p>
<p><img src="/blog/airflow-three-point-oh-is-here/airflow-3.0-ui.gif" alt="Airflow 3.0’s new UI"></p>
<p>Check out <a href="http://airflow.apache.org/docs/apache-airflow/stable/ui.html">the screenshots in the docs</a> for more.</p>
<p>Recreating it on top of React and FastAPI has been a massive project and was broken out into several AIPs as detailed below.</p>
<p>The foundation for the new UI is the REST API and a set of internal APIs for UI Operations (<a href="https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-84&#43;UI&#43;REST&#43;API">AIP-84</a>) both of which are now based on FastAPI. These APIs are served as part of the API Server described above as part of the Task Execution framework.</p>
<p>The Airflow 3.0 UI has been significantly improved and includes a streamlined user experience workflow encompassing both the Grid and Graph views. The interaction between DAGs and Assets is also more streamlined. User experience is always a work in progress and we very much appreciate your feedback. This is covered in great detail as part of the Modern Web Application proposal (<a href="https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-38&#43;Modern&#43;Web&#43;Application">AIP-38</a>).</p>
<p>As part of this project, Flask AppBuilder has now been moved into a separate provider package and is no longer a part of the Core Airflow package. This enables an easier security and maintenance update process, while retaining backwards compatibility. This is documented as part of the “Remove Flask App Builder as a Core Dependency” proposal (<a href="https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-79%3A&#43;Remove&#43;Flask&#43;AppBuilder&#43;as&#43;Core&#43;dependency">AIP-79</a>).</p>
<h3 id="security">Security</h3>
<p>A key benefit of the Task Execution Interface and the API server is Task Isolation. This has often been requested by Airflow enterprise deployments for a better security posture when an Airflow deployment is shared by multiple teams. Further security and authorization patterns can be developed on top of this foundation as more detailed requirements are uncovered.</p>
<p>Improving the CLI and reducing the maintenance burden by having the CLI use the Airflow APIs, rather than direct access, is an important evolution for Airflow. We have now split the core Airflow CLI into two parts: the first for local development and backwards compatibility, and the second for remote access using the API. The second will be a new provider package called “airflowctl”, which can be optionally installed along with Core Airflow. This is documented in more detail as part of the “Enhanced security in CLI via Integration of API” proposal (<a href="https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-81&#43;Enhanced&#43;Security&#43;in&#43;CLI&#43;via&#43;Integration&#43;of&#43;API">AIP-81</a>).</p>
<h2 id="an-amazing-community">An amazing community</h2>
<p>This release could not have happened without the inspiration and technical leadership of key contributors who led the AIPs listed above. We thank them all here: Ash Berlin-Taylor, Brent Bovenzi, Bugra Ozturk, Constance Martineau, Daniel Standish, Jed Cunningham, Jens Scheffler, Kaxil Naik, Pierre Jeambrun, Vincent Beck, and Vikram Koka. We also wanted to thank Jarek Potiuk for the critical development infrastructure and packaging work and to Elad Kalif for shepherding all the key provider changes needed. We would like to recognize Wei Lee and Ankit Chaurasia for their work on the upgrade utilities to enable users to easily upgrade to Airflow 3.</p>
<p>Finally, a huge shoutout to Jed Cunningham and Kaxil Naik for the critical part of release management!</p>
<p>Over three hundred developers around the world have contributed to making this release a reality. We thank them all for their contributions. They are listed here in alphabetical order:</p>
<ul>
<li>Aakcht</li>
<li>Aaron Chen</li>
<li>Abhishek</li>
<li>Adam Turner</li>
<li>Adan</li>
<li>Aditya Yadav</li>
<li>Adrian Lazar</li>
<li>Adrian Perea</li>
<li>Ajit J Gupta</li>
<li>Albert Okiri</li>
<li>Alex Waygood</li>
<li>Alexander Millin</li>
<li>AlteredOracle</li>
<li>Amar Prakash Pandey</li>
<li>Amir Mor</li>
<li>Amogh Desai</li>
<li>Amol Saini</li>
<li>Anakin Skywalker Pactores</li>
<li>Andor Markus</li>
<li>Andre Miranda</li>
<li>Andres Lowrie</li>
<li>Andrew Arochukwu</li>
<li>Andrew Stein</li>
<li>Andrii Abramov</li>
<li>Andrii Korotkov</li>
<li>Andrii Yerko</li>
<li>Ankit Chaurasia</li>
<li>Anthony Lin</li>
<li>Antony Southworth</li>
<li>Aritra Basu</li>
<li>Arjun Pathak</li>
<li>Arnel Jan Sarmiento</li>
<li>Arnout Engelen</li>
<li>Artem Suslov</li>
<li>Arthur Braveheart</li>
<li>Artour</li>
<li>Artur Skarżyński</li>
<li>Arunav Gupta</li>
<li>Aryan Khurana</li>
<li>Ash Berlin-Taylor</li>
<li>AshKatzEm</li>
<li>AutomationDev85</li>
<li>Avihais12344</li>
<li>Azhar Izzannada E</li>
<li>Baitur Ulukbekov</li>
<li>Balthazar Rouberol</li>
<li>Bartosz Jankiewicz</li>
<li>Bas</li>
<li>Ben Chen</li>
<li>Benoit Perigaud</li>
<li>Biswamitra Biswas</li>
<li>Bjorn Olsen</li>
<li>Bluefox9x5</li>
<li>Bohdan Udovenko</li>
<li>Bonnie Why</li>
<li>Boris Morel</li>
<li>Bowrna</li>
<li>Brent Bovenzi</li>
<li>Bugra Ozturk</li>
<li>Błażej Tecław</li>
<li>Castle Cheng</li>
<li>Chris Luedtke</li>
<li>Christian Yarros</li>
<li>Christos Bisias</li>
<li>Collin McNulty</li>
<li>Computer Network Investigation</li>
<li>Constance Martineau</li>
<li>D. Ferruzzi</li>
<li>DShi</li>
<li>Daniel Gellert</li>
<li>Daniel Imberman</li>
<li>Daniel Standish</li>
<li>Daniel van der Ende</li>
<li>Danish Amjad</li>
<li>Danny Liu</li>
<li>David Blain</li>
<li>Derek</li>
<li>Detlev V.</li>
<li>Dewen Kong</li>
<li>Sriraj Dheeraj Turaga</li>
<li>Diogo Rodrigues</li>
<li>Dmitry Astankov</li>
<li>Dmitry Pustoshilov</li>
<li>Dominic Leung</li>
<li>Dong-yeong0</li>
<li>Doug Guthrie</li>
<li>Dylan Melotik</li>
<li>Elad Kalif</li>
<li>Eldar Kasmamytov</li>
<li>Ephraim Anierobi</li>
<li>Eric</li>
<li>Everton Seiei Arakaki</li>
<li>Farhan</li>
<li>Fedor Kobak</li>
<li>Felix Uellendall</li>
<li>Fred Thomsen</li>
<li>Fully.is(풀리)</li>
<li>GPK</li>
<li>Gagan Bhullar</li>
<li>Geonwoo Kim</li>
<li>GlenboLake</li>
<li>Gopal Dirisala</li>
<li>Gregory Borodin</li>
<li>Guan-Ming (Wesley) Chiu</li>
<li>Guangyang Li</li>
<li>Guillaume Lostis</li>
<li>Hari Selvarajan</li>
<li>HassanAlahmed</li>
<li>Hojin Jun</li>
<li>Howard Yoo</li>
<li>Huanjie Guo</li>
<li>Hung</li>
<li>Hussein Awala</li>
<li>Hyunsoo Kang</li>
<li>Ian Buss</li>
<li>Idris Adebisi</li>
<li>Igor Kholopov</li>
<li>IlaiGigi</li>
<li>Indrale Dnyaneshwar</li>
<li>JISHAN GARGACHARYA</li>
<li>Jaejun</li>
<li>Jake Ferriero</li>
<li>Jake Roach</li>
<li>Jakub Dardzinski</li>
<li>James Chaldecott</li>
<li>James Regan</li>
<li>Jarek Potiuk</li>
<li>Jasmin Patel</li>
<li>Jason</li>
<li>Jed Cunningham</li>
<li>Jeff Harrison</li>
<li>Jens Scheffler</li>
<li>Jianzhun Du</li>
<li>Jimmy McBroom</li>
<li>Joao Amaral</li>
<li>João Pedro M Miguel</li>
<li>Joel Labes</li>
<li>Joey Cumines</li>
<li>Joffrey Bienvenu</li>
<li>John Bampton</li>
<li>John C. Merfeld</li>
<li>Johnny1cyber</li>
<li>José Joaquín Virtudes Castro</li>
<li>Joseph Ang</li>
<li>JoshuaXOng</li>
<li>Josix</li>
<li>Julian Maicher</li>
<li>Kacper Kulczak</li>
<li>Kacper Muda</li>
<li>Kalyan R</li>
<li>Kamil Breguła</li>
<li>Karen Braganza</li>
<li>Karthik Dulam</li>
<li>Karthik Ravi</li>
<li>Karthikeyan Singaravelan</li>
<li>Kaxil Naik</li>
<li>Kevin Allen</li>
<li>Kim</li>
<li>Kris</li>
<li>Kunal Bhattacharya</li>
<li>LIU ZHE YOU</li>
<li>Lennox Stevenson</li>
<li>Linh</li>
<li>Lorin Dawson</li>
<li>Lou ✨</li>
<li>Lucy Hu</li>
<li>Lukas Mikelionis</li>
<li>Luyang Liu</li>
<li>Lyndon Fan</li>
<li>M. Olcay Tercanlı</li>
<li>Maciej Obuchowski</li>
<li>Madison Swain-Bowden</li>
<li>Maksim</li>
<li>Marcelo Trylesinski</li>
<li>Marcos Marx</li>
<li>Maria</li>
<li>Mark Andreev</li>
<li>Mark H</li>
<li>Matt Burke</li>
<li>Matt Dupree</li>
<li>Maxim Martynov</li>
<li>Mayuresh Kedari</li>
<li>Mehul Goyal</li>
<li>Mike</li>
<li>Mike Beckhusen</li>
<li>Mikhail Dengin</li>
<li>MishchenkoYuriy</li>
<li>Muhammad Hanif Mohamad Musa</li>
<li>Myles Hollowed</li>
<li>Narendra-Neerukonda</li>
<li>Natsu</li>
<li>Nikita</li>
<li>Niko Oliveira</li>
<li>Nishant Gupta</li>
<li>Nitesh Kumar Dubey Samsung</li>
<li>Nitochkin</li>
<li>Oleg Ovcharuk</li>
<li>Oleksandr Slynko</li>
<li>Omkar P</li>
<li>Owen Leung</li>
<li>Pandycool</li>
<li>Pankaj Koti</li>
<li>Park Jiwon</li>
<li>Pavan Sharma</li>
<li>Peng-Jui Wang</li>
<li>Peter Debelak</li>
<li>Phani Kumar</li>
<li>Pierre Jeambrun</li>
<li>Po-Yu Hsieh</li>
<li>Prajwal7842</li>
<li>Pratiksha</li>
<li>Purna Chander</li>
<li>Rafa</li>
<li>Rahul Madan</li>
<li>Rahul Vats</li>
<li>Ramit Kataria</li>
<li>Rishabh Srivastava</li>
<li>Rushabh Garambha</li>
<li>Ryan Eakman</li>
<li>Ryan Hatter</li>
<li>Rytis Ulys</li>
<li>SAI GANESH S</li>
<li>Sam Lendle</li>
<li>SamLiaoP</li>
<li>Saumil Patel</li>
<li>SaurabhhB</li>
<li>Sean Gabriel Bayron</li>
<li>Sean Rose</li>
<li>Sebastian Daum</li>
<li>SeonghwanLee</li>
<li>Shahar Epstein</li>
<li>Shahbaz Aamir</li>
<li>Shoaib UR Rehman</li>
<li>Shubham Raj</li>
<li>Simon Sawicki</li>
<li>Siva Kumar Edupuganti</li>
<li>Sneha Prabhu</li>
<li>Sooter Saalu</li>
<li>Srabasti Banerjee</li>
<li>Stefan Keidel</li>
<li>Steven Loria</li>
<li>Steven Shidi Zhou</li>
<li>Stijn De Haes</li>
<li>Success Moses</li>
<li>TakawaAkirayo</li>
<li>Tamara Janina Fingerlin</li>
<li>Tamas Palinkas</li>
<li>Tatiana Al-Chueyr</li>
<li>Topher Anderson</li>
<li>Tzu-ping Chung</li>
<li>Usiel Riedl</li>
<li>Utkarsh Sharma</li>
<li>Valentyn</li>
<li>Venkat VJ</li>
<li>Vikram Koka</li>
<li>Vikram Medabalimi</li>
<li>Vikramaditya Gaonkar</li>
<li>Vincent</li>
<li>Vincent Kling</li>
<li>VladaZakharova</li>
<li>Waldemar Hummer</li>
<li>Wang Ran (汪然)</li>
<li>Wei Lee</li>
<li>Wojciech Szlachta</li>
<li>Wonseok Yang</li>
<li>Yeonguk Choo</li>
<li>Yohei Kishimoto</li>
<li>Youngha, Park</li>
<li>Yuan Li</li>
<li>Zach Liu</li>
<li>Zhen-Lun (Kevin) Hong</li>
<li>althati</li>
<li>ambikagarg</li>
<li>atrbgithub</li>
<li>awdavidson</li>
<li>codecae</li>
<li>dan-js</li>
<li>darkag</li>
<li>davidfgcorreia</li>
<li>dominikhei</li>
<li>ellisms</li>
<li>enisnazif</li>
<li>fritz-astronomer</li>
<li>gaurav7261</li>
<li>geraj1010</li>
<li>got686-yandex</li>
<li>harjeevan maan</li>
<li>harry.shi</li>
<li>hikaruhk</li>
<li>hprassad</li>
<li>ipsatrivedi</li>
<li>jaejun</li>
<li>jj.lee</li>
<li>jonhspyro</li>
<li>kanagaraj</li>
<li>kandharvishnu</li>
<li>leoguzman</li>
<li>lucasmo</li>
<li>luoyuliuyin</li>
<li>mahdi alizadeh</li>
<li>majorosdonat</li>
<li>max</li>
<li>mayankymailusfedu</li>
<li>michaeljs-c</li>
<li>morooshka</li>
<li>ninad-opsverse</li>
<li>olegkachur-e</li>
<li>paolomoriello</li>
<li>perry2of5</li>
<li>pgvishnuram</li>
<li>phi-friday</li>
<li>rahulgoyal2987</li>
<li>raphaelauv</li>
<li>rgriffier</li>
<li>rom sharon</li>
<li>saucoide</li>
<li>sbock-slack</li>
<li>sc-anssi</li>
<li>seyoon-lim</li>
<li>simonprydden</li>
<li>skandala23</li>
<li>sonu4578</li>
<li>suyesh-amatya</li>
<li>svellaiyan</li>
<li>tnk-ysk</li>
<li>uzhastik</li>
<li>vatsrahul1001</li>
<li>vfeldsher</li>
<li>xavipuerto</li>
<li>xitep</li>
<li>yangyulely</li>
<li>yunchi</li>
<li>鐘翊修</li>
<li>김영준</li>
</ul>
<h2 id="whats-next">What’s Next</h2>
<p>We’d love your feedback. Try out the release, open issues, file PRs, or just join the conversation on the Airflow dev list, Slack, and GitHub.
Let’s build the future of data orchestration—together.</p>
]]></content>
  </entry>
  
  <entry>
    <title>Airflow Survey 2024</title>
    <link href="/blog/airflow-survey-2024/" rel="alternate"/>
    <id>/blog/airflow-survey-2024/</id>
    <published>2025-02-27T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html">&lt;![CDATA[<p><img src="/blog/airflow-survey-2024/images/Airflow-Survey-2024-Results-v2.png" alt="Airflow Survey 2024" title="airflow_survey_2024"></p>
<div style="display: flex; align-items: flex-start; gap: 0.75rem;">
  <a href="https://www.astronomer.io/" style="flex-shrink: 0;"><img src="images/astronomer-logo.svg" alt="Astronomer" width="40" height="40" /></a>
  <div>
    <p style="margin: 0;">The interactive report is hosted by <a href="https://www.astronomer.io/">Astronomer</a>. The Apache Airflow community thanks <a href="https://www.astronomer.io/">Astronomer</a> for running this survey, for sponsoring it and providing the report in this form, and for their effort in marketing, analysis, and preparing the graphics.</p>
  </div>
</div>
<hr style="margin: 1rem 0; border: none; border-top: 1px solid #ccc;" />
<p><a href="https://astronomer.typeform.com/report/SF2VGNTc/fRSeRcKKJ3kgYXVl">View raw data</a></p>
<p><a href="/data/survey-responses/airflow-user-survey-responses-2024.csv.zip">Download survey responses (CSV)</a></p>
]]></content>
  </entry>
  
  <entry>
    <title>Apache Airflow 2.10.0 is here</title>
    <link href="/blog/airflow-2.10.0/" rel="alternate"/>
    <id>/blog/airflow-2.10.0/</id>
    <published>2024-08-08T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html">&lt;![CDATA[<p>I&rsquo;m happy to announce that Apache Airflow 2.10.0 is now available, bringing an array of noteworthy enhancements and new features that will greatly serve our community.</p>
<p><strong>Details</strong>:</p>
<p>📦 PyPI: <a href="https://pypi.org/project/apache-airflow/2.10.0/">https://pypi.org/project/apache-airflow/2.10.0/</a> <br>
📚 Docs: <a href="https://airflow.apache.org/docs/apache-airflow/2.10.0/">https://airflow.apache.org/docs/apache-airflow/2.10.0/</a> <br>
🛠 Release Notes: <a href="https://airflow.apache.org/docs/apache-airflow/2.10.0/release_notes.html">https://airflow.apache.org/docs/apache-airflow/2.10.0/release_notes.html</a> <br>
🐳 Docker Image: <code>docker pull apache/airflow:2.10.0</code> <br>
🚏 Constraints: <a href="https://github.com/apache/airflow/tree/constraints-2.10.0">https://github.com/apache/airflow/tree/constraints-2.10.0</a></p>
<h2 id="airflow-now-collects-telemetry-data-by-default">Airflow now collects Telemetry data by default</h2>
<p>With the release of Airflow 2.10.0, we’ve introduced the collection of basic telemetry data, as outlined <a href="https://airflow.apache.org/docs/apache-airflow/2.10.0/faq.html#does-airflow-collect-any-telemetry-data">here</a>. This data will play a crucial role in helping Airflow maintainers gain a deeper understanding of how Airflow is utilized across various deployments. The insights derived from this information are invaluable in guiding the prioritization of patches, minor releases, and security fixes. Moreover, this data will inform key decisions regarding the development roadmap, ensuring that Airflow continues to evolve in line with community needs.</p>
<p>For those who prefer not to participate in data collection, deployments can easily opt out by setting the <code>[usage_data_collection] enabled</code> option to <code>False</code> or by using the <code>SCARF_ANALYTICS=false</code> environment variable.</p>
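<p>As a minimal sketch of the opt-out, the snippet below sets both switches from Python before any Airflow process starts. It assumes Airflow's general <code>AIRFLOW__{SECTION}__{KEY}</code> convention for overriding configuration options through the environment; the helper name <code>airflow_env_var</code> is hypothetical and only illustrates how that variable name is derived.</p>

```python
import os


def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable name that overrides a config option.

    Hypothetical helper: it only applies the AIRFLOW__{SECTION}__{KEY} naming rule.
    """
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


# Either switch is enough according to the text above; setting both is harmless.
opt_out = {
    airflow_env_var("usage_data_collection", "enabled"): "False",
    "SCARF_ANALYTICS": "false",
}
os.environ.update(opt_out)

for name, value in opt_out.items():
    print(f"{name}={value}")
```

<p>In a real deployment these variables would normally be set in the service or container environment rather than from Python.</p>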
<h2 id="multiple-executor-configuration-formerly-hybrid-execution">Multiple Executor Configuration (formerly &ldquo;Hybrid Execution&rdquo;)</h2>
<p>Each executor comes with its unique set of strengths and weaknesses, typically balancing latency, isolation, and compute efficiency. Traditionally, an Airflow environment is limited to a single executor, requiring users to make trade-offs, as no single executor is perfectly suited for all types of tasks.</p>
<p>We are introducing a new feature that allows for the concurrent use of multiple executors within a single Airflow environment. This flexibility enables users to take advantage of the specific strengths of different executors for various tasks, improving overall efficiency and mitigating weaknesses. Users can set a default executor for the entire environment and, if necessary, assign particular executors to individual DAGs or tasks.</p>
<p>To configure multiple executors, pass a comma-separated list in the Airflow configuration. The first executor in the list is the default executor for the environment.</p>
<pre tabindex="0"><code>[core]
executor = &#39;LocalExecutor,CeleryExecutor&#39;
</code></pre><p>To make things easier for DAG authors, you can also define short aliases for executors in the executor configuration:</p>
<pre tabindex="0"><code class="language-commandline" data-lang="commandline">[core]
executor = &#39;LocalExecutor,KubernetesExecutor,my.custom.module.ExecutorClass:ShortName&#39;
</code></pre><p>DAG authors can specify which executor to use at the task level:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="n">BashOperator</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="n">task_id</span><span class="o">=</span><span class="s2">&#34;hello_world&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">executor</span><span class="o">=</span><span class="s2">&#34;ShortName&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">bash_command</span><span class="o">=</span><span class="s2">&#34;echo &#39;hello world!&#39;&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nd">@task</span><span class="p">(</span><span class="n">executor</span><span class="o">=</span><span class="s2">&#34;KubernetesExecutor&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">hello_world</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="s2">&#34;hello world!&#34;</span><span class="p">)</span>
</span></span></code></pre></div><p>Executors can also be specified at the DAG level:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="nd">@task</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">hello_world</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="s2">&#34;hello world!&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nd">@task</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">hello_world_again</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="s2">&#34;hello world again!&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">with</span> <span class="n">DAG</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="n">dag_id</span><span class="o">=</span><span class="s2">&#34;hello_worlds&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">default_args</span><span class="o">=</span><span class="p">{</span><span class="s2">&#34;executor&#34;</span><span class="p">:</span> <span class="s2">&#34;ShortName&#34;</span><span class="p">},</span>  <span class="c1"># Applies to all tasks in the DAG</span>
</span></span><span class="line"><span class="cl"><span class="p">)</span> <span class="k">as</span> <span class="n">dag</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># All tasks will use the executor from default args automatically</span>
</span></span><span class="line"><span class="cl">    <span class="n">hw</span> <span class="o">=</span> <span class="n">hello_world</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="n">hw_again</span> <span class="o">=</span> <span class="n">hello_world_again</span><span class="p">()</span>
</span></span></code></pre></div><h2 id="dynamic-dataset-scheduling-through-datasetalias">Dynamic Dataset scheduling through DatasetAlias</h2>
<p>Airflow 2.10 comes with a <code>DatasetAlias</code> class, which can be passed as a value in the <code>outlets</code> or <code>inlets</code> of a task and in the <code>schedule</code> of a DAG. An instance of <code>DatasetAlias</code> is resolved dynamically to a real dataset. Downstream DAGs can depend on either the resolved dataset or the alias itself.</p>
<p><code>DatasetAlias</code> takes a single argument, <code>name</code>, that uniquely identifies the alias. The task must first declare the alias as an outlet, and then use <code>outlet_events</code> or <code>yield Metadata</code> to add events to it.</p>
<h3 id="emit-a-dataset-event-during-task-execution-through-outlet_events">Emit a dataset event during task execution through outlet_events</h3>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">airflow.datasets</span> <span class="kn">import</span> <span class="n">Dataset</span><span class="p">,</span> <span class="n">DatasetAlias</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nd">@task</span><span class="p">(</span><span class="n">outlets</span><span class="o">=</span><span class="p">[</span><span class="n">DatasetAlias</span><span class="p">(</span><span class="s2">&#34;my-task-outputs&#34;</span><span class="p">)])</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">my_task_with_outlet_events</span><span class="p">(</span><span class="o">*</span><span class="p">,</span> <span class="n">outlet_events</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="n">outlet_events</span><span class="p">[</span><span class="s2">&#34;my-task-outputs&#34;</span><span class="p">]</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">Dataset</span><span class="p">(</span><span class="s2">&#34;s3://bucket/my-task&#34;</span><span class="p">))</span>
</span></span></code></pre></div><h3 id="emit-a-dataset-event-during-task-execution-by-yielding-metadata">Emit a dataset event during task execution by yielding Metadata</h3>
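<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">airflow.datasets</span> <span class="kn">import</span> <span class="n">Dataset</span><span class="p">,</span> <span class="n">DatasetAlias</span>
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">airflow.datasets.metadata</span> <span class="kn">import</span> <span class="n">Metadata</span>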
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nd">@task</span><span class="p">(</span><span class="n">outlets</span><span class="o">=</span><span class="p">[</span><span class="n">DatasetAlias</span><span class="p">(</span><span class="s2">&#34;my-task-outputs&#34;</span><span class="p">)])</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">my_task_with_metadata</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="n">s3_dataset</span> <span class="o">=</span> <span class="n">Dataset</span><span class="p">(</span><span class="s2">&#34;s3://bucket/my-task&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="k">yield</span> <span class="n">Metadata</span><span class="p">(</span><span class="n">s3_dataset</span><span class="p">,</span> <span class="n">alias</span><span class="o">=</span><span class="s2">&#34;my-task-outputs&#34;</span><span class="p">)</span>
</span></span></code></pre></div><p>There are two options for scheduling on dataset aliases: schedule on the <code>DatasetAlias</code> itself, or on the real datasets it resolves to.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">with</span> <span class="n">DAG</span><span class="p">(</span><span class="n">dag_id</span><span class="o">=</span><span class="s2">&#34;dataset-alias-producer&#34;</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="nd">@task</span><span class="p">(</span><span class="n">outlets</span><span class="o">=</span><span class="p">[</span><span class="n">DatasetAlias</span><span class="p">(</span><span class="s2">&#34;example-alias&#34;</span><span class="p">)])</span>
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">produce_dataset_events</span><span class="p">(</span><span class="o">*</span><span class="p">,</span> <span class="n">outlet_events</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="n">outlet_events</span><span class="p">[</span><span class="s2">&#34;example-alias&#34;</span><span class="p">]</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">Dataset</span><span class="p">(</span><span class="s2">&#34;s3://bucket/my-task&#34;</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">with</span> <span class="n">DAG</span><span class="p">(</span><span class="n">dag_id</span><span class="o">=</span><span class="s2">&#34;dataset-consumer&#34;</span><span class="p">,</span> <span class="n">schedule</span><span class="o">=</span><span class="n">Dataset</span><span class="p">(</span><span class="s2">&#34;s3://bucket/my-task&#34;</span><span class="p">)):</span>
</span></span><span class="line"><span class="cl">    <span class="o">...</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">with</span> <span class="n">DAG</span><span class="p">(</span><span class="n">dag_id</span><span class="o">=</span><span class="s2">&#34;dataset-alias-consumer&#34;</span><span class="p">,</span> <span class="n">schedule</span><span class="o">=</span><span class="n">DatasetAlias</span><span class="p">(</span><span class="s2">&#34;example-alias&#34;</span><span class="p">)):</span>
</span></span><span class="line"><span class="cl">    <span class="o">...</span>
</span></span></code></pre></div><h3 id="dataset-aliases-ui-enhancements">Dataset Aliases UI Enhancements</h3>
<p>Users can now see Dataset Aliases in the legend of each cross-DAG dependency graph, with a corresponding icon and color.</p>
<p><img src="/blog/airflow-2.10.0/dag_dependencies_legend.png" alt="DAG Dependencies graph"></p>
<h2 id="dark-mode-for-airflow-ui">Dark Mode for Airflow UI</h2>
<p>Airflow 2.10 comes with a new Dark Mode feature, an alternative visual theme that is easier on the eyes, especially in low-light conditions. Click the crescent icon on the right side of the navigation bar to switch between light and dark mode.</p>
<p><img src="/blog/airflow-2.10.0/airflow_dark_mode.png" alt="Airflow Dark mode"></p>
<p><img src="/blog/airflow-2.10.0/airflow_light_mode.png" alt="Airflow Light mode"></p>
<h2 id="task-instance-history">Task Instance History</h2>
<p>In Apache Airflow 2.10.0, when a task instance is retried or cleared, its execution history is maintained. You can view this history by clicking on the task instance in the Grid view, allowing you to access information about each attempt, such as logs, execution durations, and any failures. This feature improves transparency into the task&rsquo;s execution process, making it easier to troubleshoot and analyze your DAGs.</p>
<p><img src="/blog/airflow-2.10.0/task_instance_history.png" alt="Task instance history"></p>
<p>The history displays the final values of the task instance attributes for each specific run. On the log page, you can also access the logs for each attempt of the task instance. This information is valuable for debugging purposes.</p>
<p><img src="/blog/airflow-2.10.0/task_instance_history_log.png" alt="Task instance history"></p>
<h2 id="dataset-ui-enhancements">Dataset UI Enhancements</h2>
<p>The dataset page has been revamped to include a focused dataset events section with additional details such as extras, consuming DAGs, and producing tasks.
<img src="/blog/airflow-2.10.0/dataset_list.png" alt="Dataset list"></p>
<p>We now have separate dependency graph and dataset list pages in new tabs, enhancing the user experience.</p>
<p><img src="/blog/airflow-2.10.0/dependency_graph.png" alt="Dataset dependency graph"></p>
<p>Dataset events are now displayed in both the Details tab of each DAG run and within the DAG graph.</p>
<p><img src="/blog/airflow-2.10.0/dataset_details.png" alt="Dataset list"></p>
<h3 id="toggle-datasets-in-graph">Toggle datasets in Graph</h3>
<p>You can now toggle datasets on and off in the DAG graph.</p>
<p><img src="/blog/airflow-2.10.0/dataset_toggle_on.png" alt="Dataset toggle button on">
<img src="/blog/airflow-2.10.0/dataset_toggle_off.png" alt="Dataset toggle button off"></p>
<h3 id="dataset-conditions-in-dag-graph-view">Dataset Conditions in DAG Graph view</h3>
<p>We now display the graph view with logical gates. Datasets with actual events are highlighted with a different border, making it easier to see what triggered the selected run.</p>
<p><img src="/blog/airflow-2.10.0/render_dataset_conditions.png" alt="Render dataset conditions in graph view"></p>
<h3 id="dataset-event-info-in-dag-graph">Dataset event info in DAG Graph</h3>
<p>For a DAG run, users can now view the dataset events connected to it directly in the graph view.</p>
<p><img src="/blog/airflow-2.10.0/dataset_info.png" alt="Dataset event info"></p>
<h2 id="on-demand-dag-re-parsing">On-demand DAG Re-parsing</h2>
<p>In 2.10, users can reparse DAGs on demand using the button shown below, available on the DAG list and DAG detail pages.</p>
<p><img src="/blog/airflow-2.10.0/DAG_reparsing_button_list.png" alt="DAG Reparsing button on DAG list page">
<img src="/blog/airflow-2.10.0/DAG_reparse_button_detail.png" alt="DAG Reparsing button on DAG detail page"></p>
<h2 id="additional-new-features">Additional new features</h2>
<p>Here are just a few interesting new features since there are too many to list in full:</p>
<ul>
<li>Deferrable operators can now execute directly from the triggerer without needing to go through the worker. This is especially efficient for certain operators, like sensors, and can help teams save both time and money.</li>
<li>Crucial executor logs are now integrated into the task logs. If the executor fails to start a task, the relevant error messages will be available in the task logs, simplifying the debugging process.</li>
</ul>
<h2 id="contributors">Contributors</h2>
<p>Thanks to everyone who contributed to this release, including Andrey Anshin, Brent Bovenzi, Daniel Standish, Ephraim Anierobi, Hussein Awala, Jarek Potiuk, Jed Cunningham, Jens Scheffler, Tzu-ping Chung, Vincent, and over 63 others!</p>
<p>I hope you enjoy using Apache Airflow 2.10.0!</p>
]]></content>
  </entry>
  
  <entry>
    <title>Apache Airflow 2.9.0: Dataset and UI Improvements</title>
    <link href="/blog/airflow-2.9.0/" rel="alternate"/>
    <id>/blog/airflow-2.9.0/</id>
    <published>2024-04-08T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html">&lt;![CDATA[<p>I’m happy to announce that Apache Airflow 2.9.0 has been released! This time around we have new features for data-aware scheduling and a bunch of UI-related improvements.</p>
<p>Apache Airflow 2.9.0 contains over 550 commits, which include 38 new features, 70 improvements, 31 bug fixes, and 18 documentation changes.</p>
<p><strong>Details</strong>:</p>
<p>📦 PyPI: <a href="https://pypi.org/project/apache-airflow/2.9.0/">https://pypi.org/project/apache-airflow/2.9.0/</a> <br>
📚 Docs: <a href="https://airflow.apache.org/docs/apache-airflow/2.9.0/">https://airflow.apache.org/docs/apache-airflow/2.9.0/</a> <br>
🛠 Release Notes: <a href="https://airflow.apache.org/docs/apache-airflow/2.9.0/release_notes.html">https://airflow.apache.org/docs/apache-airflow/2.9.0/release_notes.html</a> <br>
🐳 Docker Image: <code>docker pull apache/airflow:2.9.0</code> <br>
🚏 Constraints: <a href="https://github.com/apache/airflow/tree/constraints-2.9.0">https://github.com/apache/airflow/tree/constraints-2.9.0</a></p>
<p>Airflow 2.9.0 is also the first release that supports Python 3.12. However, Pendulum 2 does not support Python 3.12, so you’ll need to use <a href="https://pendulum.eustace.io/blog/announcing-pendulum-3-0-0.html">Pendulum 3</a> if you upgrade to Python 3.12.</p>
<h2 id="new-data-aware-scheduling-options">New data-aware scheduling options</h2>
<h3 id="logical-operators-and-conditional-expressions-for-dag-scheduling">Logical operators and conditional expressions for DAG scheduling</h3>
<p>When Datasets were added in Airflow 2.4, DAGs only had scheduling support for logical AND combinations of Datasets. Simply, you could schedule against more than one Dataset, but a DAG run would only be created once all the Datasets were updated after the last run. Now in Airflow 2.9, we support logical OR and even arbitrary combinations of AND and OR.</p>
<p>As an example, you can schedule a DAG to run whenever <code>dataset_1</code> or <code>dataset_2</code> is updated:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">with</span> <span class="n">DAG</span><span class="p">(</span><span class="n">schedule</span><span class="o">=</span><span class="p">(</span><span class="n">dataset_1</span> <span class="o">|</span> <span class="n">dataset_2</span><span class="p">),</span> <span class="o">...</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="o">...</span>
</span></span></code></pre></div><p>You can have arbitrary combinations:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">with</span> <span class="n">DAG</span><span class="p">(</span><span class="n">schedule</span><span class="o">=</span><span class="p">((</span><span class="n">dataset_1</span> <span class="o">|</span> <span class="n">dataset_2</span><span class="p">)</span> <span class="o">&amp;</span> <span class="n">dataset_3</span><span class="p">),</span> <span class="o">...</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="o">...</span>
</span></span></code></pre></div><p>You can read more about this new functionality in the <a href="https://airflow.apache.org/docs/apache-airflow/2.9.0/authoring-and-scheduling/datasets.html#advanced-dataset-scheduling-with-conditional-expressions">data-aware scheduling docs</a>.</p>
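<p>To make the AND/OR semantics concrete, here is a small pure-Python model of how such a condition could be evaluated against the set of datasets updated since the last run. This is an illustrative sketch only, not Airflow's implementation: the classes <code>Ds</code>, <code>AnyOf</code>, and <code>AllOf</code> and the <code>evaluate</code> method are hypothetical names.</p>

```python
from dataclasses import dataclass


class Condition:
    """Base class giving every condition the | (OR) and & (AND) operators."""

    def __or__(self, other: "Condition") -> "AnyOf":
        return AnyOf(self, other)

    def __and__(self, other: "Condition") -> "AllOf":
        return AllOf(self, other)


@dataclass(frozen=True)
class Ds(Condition):
    """Hypothetical stand-in for a dataset, identified by its URI."""

    uri: str

    def evaluate(self, updated: set) -> bool:
        return self.uri in updated


class AnyOf(Condition):
    """True when at least one sub-condition is satisfied."""

    def __init__(self, *parts: Condition):
        self.parts = parts

    def evaluate(self, updated: set) -> bool:
        return any(part.evaluate(updated) for part in self.parts)


class AllOf(Condition):
    """True only when every sub-condition is satisfied."""

    def __init__(self, *parts: Condition):
        self.parts = parts

    def evaluate(self, updated: set) -> bool:
        return all(part.evaluate(updated) for part in self.parts)


dataset_1, dataset_2, dataset_3 = Ds("s3://a"), Ds("s3://b"), Ds("s3://c")
condition = (dataset_1 | dataset_2) & dataset_3

print(condition.evaluate({"s3://a"}))            # False: dataset_3 has not updated
print(condition.evaluate({"s3://b", "s3://c"}))  # True: dataset_2 and dataset_3 updated
```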
<h3 id="combining-dataset-and-time-based-schedules">Combining Dataset and Time-Based Schedules</h3>
<p>Airflow 2.9 comes with a new timetable, <code>DatasetOrTimeSchedule</code>, that allows you to schedule DAGs based on both dataset events and a timetable. Now you have the best of both worlds.</p>
<p>For example, to run whenever <code>dag1_dataset</code> updates and also at midnight UTC:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">with</span> <span class="n">DAG</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="n">schedule</span><span class="o">=</span><span class="n">DatasetOrTimeSchedule</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">        <span class="n">timetable</span><span class="o">=</span><span class="n">CronTriggerTimetable</span><span class="p">(</span><span class="s2">&#34;0 0 * * *&#34;</span><span class="p">,</span> <span class="n">timezone</span><span class="o">=</span><span class="s2">&#34;UTC&#34;</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">        <span class="n">datasets</span><span class="o">=</span><span class="p">[</span><span class="n">dag1_dataset</span><span class="p">],</span>
</span></span><span class="line"><span class="cl">    <span class="p">),</span>
</span></span><span class="line"><span class="cl">    <span class="o">...</span>
</span></span><span class="line"><span class="cl"><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="o">...</span>
</span></span></code></pre></div><h3 id="dataset-event-rest-api-endpoints">Dataset Event REST API endpoints</h3>
<p>New REST API endpoints have been introduced for creating, listing, and deleting dataset events. This makes it possible for external systems to notify Airflow about dataset updates and unlocks management of event queues for more sophisticated use cases.</p>
<p>See the <a href="https://airflow.apache.org/docs/apache-airflow/2.9.0/stable-rest-api-ref.html#tag/Dataset">Dataset API docs</a> for more details.</p>
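<p>As a rough sketch of how an external system might notify Airflow, the snippet below builds (but does not send) a request that creates a dataset event. The endpoint path <code>/api/v1/datasets/events</code> and the <code>dataset_uri</code> and <code>extra</code> fields are assumptions to verify against the linked API docs; the host, credentials, and the helper name <code>build_dataset_event_request</code> are hypothetical, and basic authentication only works if the deployment's API auth backend allows it.</p>

```python
import base64
import json
import urllib.request


def build_dataset_event_request(base_url, dataset_uri, extra, username, password):
    """Build a POST request that would create a dataset event (hypothetical helper)."""
    payload = json.dumps({"dataset_uri": dataset_uri, "extra": extra}).encode("utf-8")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/datasets/events",  # assumed endpoint path
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


request = build_dataset_event_request(
    base_url="http://localhost:8080",      # hypothetical webserver address
    dataset_uri="s3://bucket/my-task",
    extra={"source": "external-system"},   # free-form metadata attached to the event
    username="admin",                      # hypothetical credentials
    password="admin",
)

# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
print(request.get_method(), request.full_url)
```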
<h3 id="dataset-ui-enhancements">Dataset UI Enhancements</h3>
<p>The DAG&rsquo;s graph view has been enhanced to display both the datasets it is scheduled on and those in the task outlets, providing a comprehensive overview of the datasets consumed and produced by the DAG.</p>
<p><img src="/blog/airflow-2.9.0/datasets-in-graph.png" alt="Datasets in the graph view"></p>
<p>The main datasets view now allows you to filter for both DAGs and datasets:</p>
<p><img src="/blog/airflow-2.9.0/dataset-view-filtering.png" alt="Dataset view filtering"></p>
<p>When viewing a Dataset, you can now create a manual dataset event through the UI by clicking the play button shown in the top right here:</p>
<p><img src="/blog/airflow-2.9.0/create-manual-dataset-event.png" alt="Creating manual Dataset event"></p>
<h2 id="custom-names-for-dynamic-task-mapping">Custom names for Dynamic Task Mapping</h2>
<p>Gone are the days of clicking into index numbers and hunting for the dynamically mapped task you wanted to see! This has been a requested feature ever since task mapping was added in Airflow 2.3, and we are happy it’s finally here.</p>
<p>You can provide a <code>map_index_template</code> to mapped operators:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="n">BashOperator</span><span class="o">.</span><span class="n">partial</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="n">task_id</span><span class="o">=</span><span class="s2">&#34;hello&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">bash_command</span><span class="o">=</span><span class="s2">&#34;echo Hello $NAME&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">map_index_template</span><span class="o">=</span><span class="s2">&#34;{{ task.env[&#39;NAME&#39;] }}&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl"><span class="p">)</span><span class="o">.</span><span class="n">expand</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="n">env</span><span class="o">=</span><span class="p">[{</span><span class="s2">&#34;NAME&#34;</span><span class="p">:</span> <span class="s2">&#34;John&#34;</span><span class="p">},</span> <span class="p">{</span><span class="s2">&#34;NAME&#34;</span><span class="p">:</span> <span class="s2">&#34;Bob&#34;</span><span class="p">},</span> <span class="p">{</span><span class="s2">&#34;NAME&#34;</span><span class="p">:</span> <span class="s2">&#34;Fred&#34;</span><span class="p">}],</span>
</span></span><span class="line"><span class="cl"><span class="p">)</span>
</span></span></code></pre></div><p>That template will be rendered after each task finishes running and will populate the name in the UI:</p>
<p><img src="/blog/airflow-2.9.0/dynamic-task-mapping-custom-names.png" alt="Dynamic Task Mapping custom names"></p>
<p>More details on this, including a TaskFlow example, are available in the <a href="https://airflow.apache.org/docs/apache-airflow/2.9.0/authoring-and-scheduling/dynamic-task-mapping.html#named-mapping">dynamic task mapping docs</a>.</p>
<h2 id="object-storage-as-xcom-backend">Object Storage as XCom Backend</h2>
<p>You can now configure Object Storage to be used as an XCom backend, making it much easier to get XCom results into an object store. Deployment managers can configure the object store of their choice, a size threshold to route some results to the Airflow metadata database and some to the object store, and even a compression method to apply before the data is stored.</p>
<p>The following configuration will store anything above 1MB in S3 and will compress it using gzip:</p>
<pre tabindex="0"><code>[core]
xcom_backend = airflow.providers.common.io.xcom.backend.XComObjectStoreBackend

[common.io]
xcom_objectstorage_path = s3://conn_id@mybucket/key
xcom_objectstorage_threshold = 1048576
xcom_objectstorage_compression = gzip
</code></pre><p>See the <a href="https://airflow.apache.org/docs/apache-airflow/2.9.0/core-concepts/xcoms.html#object-storage-xcom-backend">docs on the object storage xcom backend</a> for more details.</p>
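<p>The routing idea behind that configuration can be sketched in a few lines of plain Python. This is a simplified model under stated assumptions, not the provider's code: it assumes values are JSON-serialized, that the threshold is compared against the serialized size in bytes, and that only values strictly above it go to the object store. The function name <code>route_xcom</code> is hypothetical.</p>

```python
import gzip
import json

THRESHOLD_BYTES = 1048576  # mirrors xcom_objectstorage_threshold in the config above


def route_xcom(value, threshold=THRESHOLD_BYTES):
    """Return (destination, payload) for a JSON-serializable XCom value.

    Hypothetical model: small values stay in the metadata database,
    large values are gzip-compressed for the object store.
    """
    raw = json.dumps(value).encode("utf-8")
    if len(raw) > threshold:
        return "object_store", gzip.compress(raw)
    return "metadata_db", raw


small = route_xcom({"rows": 3})
large = route_xcom("x" * (2 * 1024 * 1024))  # roughly 2 MB once serialized

print(small[0])  # metadata_db
print(large[0])  # object_store
```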
<h2 id="display-names-for-dags-and-tasks">Display names for DAGs and Tasks</h2>
<p>Get your emojis ready! You can now set a display name for DAGs and tasks, separate from the <code>dag_id</code> and <code>task_id</code>. This allows you to have localized display names in the UI, or just use a bunch of emojis.</p>
<p>Using <code>dag_display_name</code> and <code>task_display_name</code>, you can break away from the ASCII handcuffs:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">with</span> <span class="n">DAG</span><span class="p">(</span><span class="s2">&#34;not_a_fun_dag_id&#34;</span><span class="p">,</span> <span class="n">dag_display_name</span><span class="o">=</span><span class="s2">&#34;📣 Best DAG ever 🎉&#34;</span><span class="p">,</span> <span class="o">...</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="n">BashOperator</span><span class="p">(</span><span class="n">task_id</span><span class="o">=</span><span class="s2">&#34;some_task&#34;</span><span class="p">,</span> <span class="n">task_display_name</span><span class="o">=</span><span class="s2">&#34;🥳 Fun task!&#34;</span><span class="p">,</span> <span class="o">...</span><span class="p">)</span>
</span></span></code></pre></div><p><img src="/blog/airflow-2.9.0/display-names.png" alt="Display names for DAGs and tasks"></p>
<h2 id="task-log-grouping">Task log grouping</h2>
<p>Airflow now has support for arbitrary grouping of task logs.</p>
<p>By default, pre-execute and post-execute logs are grouped and collapsed, making it easier to see your task logs:</p>
<p><img src="/blog/airflow-2.9.0/pre-post-logs-grouped.png" alt="Pre and post execute logs are grouped"></p>
<p>You can also use this feature in your task code to make your logs easier to follow:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="nd">@task</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">big_hello</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="s2">&#34;::group::Setup our big Hello&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="n">greeting</span> <span class="o">=</span> <span class="s2">&#34;&#34;</span>
</span></span><span class="line"><span class="cl">    <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="s2">&#34;Hello Airflow 2.9&#34;</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="n">greeting</span> <span class="o">+=</span> <span class="n">c</span>
</span></span><span class="line"><span class="cl">        <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;Adding </span><span class="si">{</span><span class="n">c</span><span class="si">}</span><span class="s2"> to our greeting. Current greeting: </span><span class="si">{</span><span class="n">greeting</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="s2">&#34;::endgroup::&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="n">greeting</span><span class="p">)</span>
</span></span></code></pre></div><p>That custom group is collapsed by default:</p>
<p><img src="/blog/airflow-2.9.0/custom-log-grouping.png" alt="Custom log grouping collapsed by default"></p>
<p>And it can be expanded if you want to dig into the details:</p>
<p><img src="/blog/airflow-2.9.0/custom-log-grouping-expanded.png" alt="Custom log grouping expanded"></p>
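<p>To make the marker convention concrete, here is a minimal sketch of how <code>::group::</code> and <code>::endgroup::</code> lines could be folded into collapsible blocks. It is an illustration only, not how the Airflow UI implements it: <code>LogGroup</code> and <code>fold_groups</code> are hypothetical names, and the sketch assumes each marker sits at the very start of a line, whereas real task log lines usually carry a timestamp and logger prefix.</p>

```python
from dataclasses import dataclass, field

GROUP_START = "::group::"
GROUP_END = "::endgroup::"


@dataclass
class LogGroup:
    """A titled block of log lines that a viewer could collapse."""

    title: str
    lines: list[str] = field(default_factory=list)


def fold_groups(log_lines):
    """Split log lines into plain strings and LogGroup blocks."""
    result = []
    current = None  # the group currently being filled, if any
    for line in log_lines:
        if line.startswith(GROUP_START):
            # Everything after the marker is the group title.
            current = LogGroup(title=line[len(GROUP_START):])
            result.append(current)
        elif line.startswith(GROUP_END):
            current = None
        elif current is not None:
            current.lines.append(line)
        else:
            result.append(line)
    return result


folded = fold_groups([
    "::group::Setup our big Hello",
    "Adding H to our greeting. Current greeting: H",
    "::endgroup::",
    "Hello Airflow 2.9",
])
for item in folded:
    if isinstance(item, LogGroup):
        print(f"[+] {item.title} ({len(item.lines)} lines hidden)")
    else:
        print(item)
```

<p>Lines outside any group stay visible, which is why the final greeting in the example above still shows when the group is collapsed.</p>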
<h2 id="ui-modernization">UI Modernization</h2>
<p>In addition to all the UI improvements mentioned above, we have a bunch more improvements in Airflow 2.9!</p>
<p>The rest of the DAG-level views have been moved into React and the grid view interface, allowing for a more cohesive experience. This includes the calendar, task duration, run duration (which replaces landing times), and the audit log. These weren’t just “moved”; each was improved as well.</p>
<p>Here is the new run duration view, which replaces landing times. Users can toggle between landing times and simple run duration:</p>
<p><img src="/blog/airflow-2.9.0/run-duration.png" alt="Run duration"></p>
<p>And the new task duration view. Users can toggle queued time on/off and see the median value across the displayed runs:</p>
<p><img src="/blog/airflow-2.9.0/task-duration.png" alt="Task duration"></p>
<h2 id="additional-new-features">Additional new features</h2>
<p>Here are just a few interesting new features since there are too many to list in full:</p>
<ul>
<li>All create/update/delete actions in the REST API are now recorded in the audit log</li>
<li><a href="https://airflow.apache.org/docs/apache-airflow/2.9.0/administration-and-deployment/logging-monitoring/callbacks.html#callback-types">New <code>on_skipped_callback</code></a></li>
<li><a href="https://airflow.apache.org/docs/apache-airflow/2.9.0/core-concepts/dags.html#dag-auto-pausing-experimental">Auto pause DAGs after n consecutive failures</a></li>
<li>Support for <a href="https://matomo.org/">Matomo</a> as an <a href="https://airflow.apache.org/docs/apache-airflow/2.9.0/administration-and-deployment/logging-monitoring/tracking-user-activity.html">analytics tool</a></li>
<li><a href="https://airflow.apache.org/docs/apache-airflow/2.9.0/howto/operator/bash.html">New <code>@task.bash</code> TaskFlow decorator</a></li>
<li>Support regex in <code>dag_id</code> for the DAG pause and resume CLI commands</li>
<li><code>airflow tasks test</code> now works with deferrable operators</li>
</ul>
<h2 id="contributors">Contributors</h2>
<p>Thanks to everyone who contributed to this release, including Amogh Desai, Andrey Anshin, Brent Bovenzi, Daniel Standish, Ephraim Anierobi, Hussein Awala, Jarek Potiuk, Jed Cunningham, Jens Scheffler, Tzu-ping Chung, Vincent Beck, Wei Lee, and over 120 others!</p>
<p>I’d especially like to thank our release manager, Ephraim, for getting this release out the door.</p>
<p>I hope you enjoy using Apache Airflow 2.9.0!</p>
]]></content>
  </entry>
  
  <entry>
    <title>Vulnerability in long deprecated OpenID authentication method in Flask AppBuilder</title>
    <link href="/blog/fab-oid-vulnerability/" rel="alternate"/>
    <id>/blog/fab-oid-vulnerability/</id>
    <published>2024-02-26T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html">&lt;![CDATA[<h1 id="vulnerability-in-long-deprecated-openid-authentication-method-in-flask-appbuilder">Vulnerability in long deprecated OpenID authentication method in Flask AppBuilder</h1>
<p>Recently <a href="https://www.linkedin.com/in/islam-rzayev">Islam Rzayev</a> made us aware of a vulnerability in the
long-deprecated OpenID authentication method in Flask AppBuilder. This vulnerability allowed a malicious user
to take over the identity of any Airflow UI user by forging a specially crafted request and implementing
their own OpenID service. Although this is an old, deprecated, and rarely used authentication method, we still
took the issue seriously.</p>
<p>This issue ONLY affects users who have <code>AUTH_TYPE</code> set to <code>AUTH_OID</code> in their
<code>webserver_config.py</code> file. This is a very old, deprecated authentication method that is unlikely to be used by anyone.</p>
<p>We advise the small number of users who still use this
authentication method to take immediate action: either upgrade to Apache Airflow 2.8.2 or switch to
another authentication method. If neither is possible right away, apply the workaround provided below.</p>
<p>Because the names are easy to confuse, it is important to stress that OpenID is NOT the same as
OpenID Connect. They are completely different protocols. OpenID Connect (also known as OIDC) is
a modern, widely used protocol, whereas OpenID is a legacy protocol that was deprecated more than 10 years
ago and has since been abandoned by almost everyone in the community, including all of the example services in
Flask AppBuilder that supported it, so it is highly unlikely that anyone is still using it.</p>
<p>Because this configuration is so unlikely, the <a href="https://www.cve.org/CVERecord?id=CVE-2024-25128">Flask AppBuilder CVE</a>
is rated just &ldquo;Moderate&rdquo;, not &ldquo;Critical&rdquo;. It affects a very small (if any) number of users and it&rsquo;s not likely
to be a target for an attack. However, we advise anyone who still uses <code>AUTH_OID</code> to apply a remediation.</p>
<p>This vulnerability is fixed in Flask-AppBuilder 4.3.11, and Apache Airflow 2.8.2 uses that version of
Flask-AppBuilder. We advise users who still use this authentication method to either switch to another
authentication method or upgrade to Apache Airflow 2.8.2. If they cannot do either
quickly, they should apply the workaround provided below.</p>
<h2 id="impact">Impact</h2>
<p>When Flask-AppBuilder has <code>AUTH_TYPE</code> set to <code>AUTH_OID</code>, an attacker can forge an HTTP
request that deceives the backend into using any requested OpenID service. This vulnerability
could grant an attacker unauthorised privileged access if a custom OpenID service is deployed
by the attacker and accessible by the backend.</p>
<p>This vulnerability is only exploitable when the application is using OpenID (not OpenID Connect also known
as OIDC). Currently, this protocol is regarded as legacy, with significantly reduced usage.</p>
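<p>The underlying problem is that the provider URL submitted with the login request was trusted as-is. A minimal sketch of the kind of allowlist check that closes this gap follows, in plain Python. The function name <code>resolve_identity_url</code> and the example URLs are hypothetical, and the provider entries are assumed to be dictionaries with a <code>url</code> key; this is an illustration of the idea, not the Flask-AppBuilder fix itself.</p>

```python
def resolve_identity_url(submitted_url, configured_providers):
    """Return submitted_url only if it exactly matches a configured provider."""
    for provider in configured_providers:
        if provider.get("url") == submitted_url:
            return submitted_url
    return None  # unknown provider: the login attempt should be rejected


# Hypothetical configuration: the only OpenID provider the deployment trusts.
providers = [{"name": "Example", "url": "https://openid.example.com/"}]

trusted = resolve_identity_url("https://openid.example.com/", providers)
forged = resolve_identity_url("https://attacker.example.net/", providers)

print(trusted)  # the configured URL is accepted unchanged
print(forged)   # an attacker-controlled URL resolves to None
```

<p>Exact string matching is deliberately strict here: anything that is not a configured provider URL is refused rather than normalised or guessed at.</p>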
<h2 id="possible-remediation">Possible remediation</h2>
<ul>
<li>Change your authentication method. If you are using <code>AUTH_OID</code>, note that almost no commercial services
support it: it was deprecated 10 years ago and abandoned by nearly everyone in the community 4 years
ago. Your best option is to choose a different authentication method.</li>
<li>Upgrade to Apache Airflow 2.8.2 (which also upgrades to Flask-AppBuilder 4.3.11 that contains a fix)</li>
<li>If upgrade is not possible, apply the workaround below</li>
</ul>
<h2 id="workarounds">Workarounds</h2>
<p>If upgrading or changing the authentication method is not possible, add the following to
your <code>webserver_config.py</code> file to fix the issue:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">import</span> <span class="nn">os</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">flask</span> <span class="kn">import</span> <span class="n">flash</span><span class="p">,</span> <span class="n">redirect</span>
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">flask_appbuilder.security.forms</span> <span class="kn">import</span> <span class="n">LoginForm_oid</span>
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">flask_appbuilder.security.views</span> <span class="kn">import</span> <span class="n">AuthOIDView</span>
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">flask_appbuilder.views</span> <span class="kn">import</span> <span class="n">expose</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">airflow.www.security</span> <span class="kn">import</span> <span class="n">AirflowSecurityManager</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">basedir</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">abspath</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">dirname</span><span class="p">(</span><span class="vm">__file__</span><span class="p">))</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">FixedOIDView</span><span class="p">(</span><span class="n">AuthOIDView</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="nd">@expose</span><span class="p">(</span><span class="s2">&#34;/login/&#34;</span><span class="p">,</span> <span class="n">methods</span><span class="o">=</span><span class="p">[</span><span class="s2">&#34;GET&#34;</span><span class="p">,</span> <span class="s2">&#34;POST&#34;</span><span class="p">])</span>
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">login</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">flag</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="n">form</span> <span class="o">=</span> <span class="n">LoginForm_oid</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">        <span class="k">if</span> <span class="n">form</span><span class="o">.</span><span class="n">validate_on_submit</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">            <span class="n">identity_url</span> <span class="o">=</span> <span class="kc">None</span>
</span></span><span class="line"><span class="cl">            <span class="k">for</span> <span class="n">provider</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">appbuilder</span><span class="o">.</span><span class="n">sm</span><span class="o">.</span><span class="n">openid_providers</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">                <span class="k">if</span> <span class="n">provider</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&#34;url&#34;</span><span class="p">)</span> <span class="o">==</span> <span class="n">form</span><span class="o">.</span><span class="n">openid</span><span class="o">.</span><span class="n">data</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">                    <span class="n">identity_url</span> <span class="o">=</span> <span class="n">form</span><span class="o">.</span><span class="n">openid</span><span class="o">.</span><span class="n">data</span>
</span></span><span class="line"><span class="cl">            <span class="k">if</span> <span class="n">identity_url</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">                <span class="n">flash</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">invalid_login_message</span><span class="p">,</span> <span class="s2">&#34;warning&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">                <span class="k">return</span> <span class="n">redirect</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">appbuilder</span><span class="o">.</span><span class="n">get_url_for_login</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="n">login</span><span class="p">(</span><span class="n">flag</span><span class="o">=</span><span class="n">flag</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">class</span> <span class="nc">FixedAirflowSecurityManager</span><span class="p">(</span><span class="n">AirflowSecurityManager</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="n">authoidview</span> <span class="o">=</span> <span class="n">FixedOIDView</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">SECURITY_MANAGER_CLASS</span> <span class="o">=</span> <span class="n">FixedAirflowSecurityManager</span>
</span></span></code></pre></div><h2 id="credits">Credits</h2>
<p>Big thanks to <a href="https://www.linkedin.com/in/islam-rzayev">Islam Rzayev</a> for finding and responsibly reporting the issue, and to <a href="https://github.com/dpgaspar">Daniel Gaspar</a> for
very close cooperation on this one and for coordinating the disclosure with the <a href="https://superset.apache.org/">Apache Superset</a>
project, where Flask AppBuilder is also used.</p>
]]></content>
  </entry>
  
  <entry>
    <title>Apache Airflow 2.8.0 is here</title>
    <link href="/blog/airflow-2.8.0/" rel="alternate"/>
    <id>/blog/airflow-2.8.0/</id>
    <published>2023-12-15T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html">&lt;![CDATA[<p>I am thrilled to announce the release of Apache Airflow 2.8.0, featuring a host of significant enhancements and new features that will greatly benefit our community.</p>
<p><strong>Details</strong>:</p>
<p>📦 PyPI: <a href="https://pypi.org/project/apache-airflow/2.8.0/">https://pypi.org/project/apache-airflow/2.8.0/</a> <br>
📚 Docs: <a href="https://airflow.apache.org/docs/apache-airflow/2.8.0/">https://airflow.apache.org/docs/apache-airflow/2.8.0/</a> <br>
🛠 Release Notes: <a href="https://airflow.apache.org/docs/apache-airflow/2.8.0/release_notes.html">https://airflow.apache.org/docs/apache-airflow/2.8.0/release_notes.html</a> <br>
🐳 Docker Image: <code>docker pull apache/airflow:2.8.0</code> <br>
🚏 Constraints: <a href="https://github.com/apache/airflow/tree/constraints-2.8.0">https://github.com/apache/airflow/tree/constraints-2.8.0</a></p>
<h2 id="airflow-object-storage-aip-58">Airflow Object Storage (AIP-58)</h2>
<p><em>This feature is experimental and subject to change.</em></p>
<p>Airflow now offers a generic abstraction layer over various object stores like S3, GCS, and Azure Blob Storage, enabling the use of different storage systems in DAGs without code modification.</p>
<p>In addition, it allows you to use most of the standard Python modules, like shutil, that can work with file-like objects.</p>
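<p>The reason this works is that such modules only need objects that behave like files. The sketch below shows the idea with <code>shutil.copyfileobj</code>, using in-memory <code>io.BytesIO</code> buffers as stand-ins for the handles you would get from opening a source and a destination path. The buffers and their contents are made up for illustration, so the example runs without any object store.</p>

```python
import io
import shutil

# Stand-ins for file handles opened on a source and a destination path.
source = io.BytesIO(b"partition=2023-12-15,rows=42\n")
destination = io.BytesIO()

# copyfileobj only needs .read() on the source and .write() on the target,
# so it does not care which storage system is behind either handle.
shutil.copyfileobj(source, destination)

copied = destination.getvalue()
print(copied)
```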
<p>Here is an example of how to use the new feature to open a file:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">airflow.io.path</span> <span class="kn">import</span> <span class="n">ObjectStoragePath</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">base</span> <span class="o">=</span> <span class="n">ObjectStoragePath</span><span class="p">(</span><span class="s2">&#34;s3://my-bucket/&#34;</span><span class="p">,</span> <span class="n">conn_id</span><span class="o">=</span><span class="s2">&#34;aws_default&#34;</span><span class="p">)</span>  <span class="c1"># conn_id is optional</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nd">@task</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">read_file</span><span class="p">(</span><span class="n">path</span><span class="p">:</span> <span class="n">ObjectStoragePath</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="k">with</span> <span class="n">path</span><span class="o">.</span><span class="n">open</span><span class="p">()</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">        <span class="k">return</span> <span class="n">f</span><span class="o">.</span><span class="n">read</span><span class="p">()</span>
</span></span></code></pre></div><p>The above example is just the tip of the iceberg. The new feature allows you to configure an alternative backend for a scheme or protocol.</p>
<p>Here is an example of how to configure a custom backend for the <code>dbfs</code> scheme:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">airflow.io.path</span> <span class="kn">import</span> <span class="n">ObjectStoragePath</span>
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">airflow.io.store</span> <span class="kn">import</span> <span class="n">attach</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">fsspec.implementations.dbfs</span> <span class="kn">import</span> <span class="n">DBFSFileSystem</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">attach</span><span class="p">(</span><span class="n">protocol</span><span class="o">=</span><span class="s2">&#34;dbfs&#34;</span><span class="p">,</span> <span class="n">fs</span><span class="o">=</span><span class="n">DBFSFileSystem</span><span class="p">(</span><span class="n">instance</span><span class="o">=</span><span class="s2">&#34;myinstance&#34;</span><span class="p">,</span> <span class="n">token</span><span class="o">=</span><span class="s2">&#34;mytoken&#34;</span><span class="p">))</span>
</span></span><span class="line"><span class="cl"><span class="n">base</span> <span class="o">=</span> <span class="n">ObjectStoragePath</span><span class="p">(</span><span class="s2">&#34;dbfs://my-location/&#34;</span><span class="p">)</span>
</span></span></code></pre></div><p>For more information: <a href="https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/objectstorage.html">Airflow Object Storage</a></p>
<p>The support for a specific object storage system depends on the installed providers,
with out-of-the-box support for the file scheme.</p>
<h2 id="ship-logs-from-other-components-to-task-logs">Ship logs from other components to Task logs</h2>
<p>This feature seamlessly integrates task-related messages from various Airflow components, including the Scheduler and
Executors, into the task logs. This integration allows users to easily track error messages and other relevant
information within a single log view.</p>
<p>Previously, if a task was terminated by the scheduler before it started, timed out after queuing for too long, or became a zombie, nothing was recorded in the task log. With this enhancement,
an error message can be sent to the task log in these situations, so that it is visible in the UI.</p>
<p>This feature can be toggled; for more information, <a href="https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#enable-task-context-logger">see “enable_task_context_logger” in the logging configuration documentation</a>.</p>
<h2 id="listener-hooks-for-datasets">Listener hooks for Datasets</h2>
<p><em>Please note that listeners are still experimental and subject to change.</em></p>
<p>This feature enables users to subscribe to Dataset creation and update events using listener hooks.
It’s particularly useful to trigger external processes based on a Dataset being created or updated.</p>
<h2 id="using-extra-index-urls-with-pythonvirtualenvoperator-and-caching">Using Extra Index URLs with PythonVirtualenvOperator and Caching</h2>
<p>This feature allows you to pass extra index URLs to PythonVirtualenvOperator (and the corresponding decorator), so that virtualenvs can be built with packages from additional (private) Python package repositories.</p>
<p>You can also cache virtualenvs in a specified directory and reuse them in subsequent runs. This
can be achieved by setting <code>venv_cache_path</code> to a file system folder on your worker.</p>
<p>For more information: <a href="https://airflow.apache.org/docs/apache-airflow/stable/howto/operator/python.html#pythonvirtualenvoperator">PythonVirtualenvOperator</a></p>
<h1 id="web-ui-improvements">Web UI improvements</h1>
<p>There are a number of improvements to the Web UI in this release, including:</p>
<h2 id="add-multiselect-to-run-state-in-grid-view">Add multiselect to run state in grid view</h2>
<p>The grid view now supports multiselect for run states. This allows you to select multiple states to filter the DAG runs shown in the grid view.</p>
<p><img src="/blog/airflow-2.8.0/multiselect-states.png" alt="Multiselect on the run state"></p>
<h2 id="improved-visibility-of-task-status-in-the-graph-view">Improved visibility of task status in the Graph view</h2>
<p>You can now see the status of a task in the graph view through the border color of the task. This makes it easier to see the status of a task at a glance.</p>
<p><img src="/blog/airflow-2.8.0/task_status_visibility.png" alt="Task status visibility"></p>
<h2 id="raw-html-code-in-dag-docs-and-dag-params-descriptions-is-disabled-by-default">Raw HTML code in DAG docs and DAG params descriptions is disabled by default</h2>
<p>As part of our continuous quest to make Airflow more secure by default, we have disabled raw HTML code in DAG docs and DAG params descriptions by default.
We care about your security, and &ldquo;secure by default&rdquo; is a principle we follow strongly.</p>
<p>Other notable UI improvements include:</p>
<ul>
<li>Simplify DAG trigger UI</li>
<li>Hide logical date and run id in trigger UI form</li>
<li>Move external logs links to top of react logs page</li>
</ul>
<p>Additional new features and improvements can be found in the <a href="https://airflow.apache.org/docs/apache-airflow/2.8.0/release_notes.html#airflow-2-8-0-2023-12-14">Airflow 2.8.0 release notes</a>.</p>
<h1 id="contributors">Contributors</h1>
<p>Thanks to everyone who contributed to this release, including Amogh Desai, Andrey Anshin, Bolke de Bruin, Daniel Dyląg, Daniel Standish, Ephraim Anierobi, Hussein Awala, Jarek Potiuk, Jed Cunningham, Jens Scheffler, mhenc, Miroslav Šedivý, Pankaj Koti, Tzu-ping Chung, Vincent, and everyone else who committed, all 110 of you! You are what makes Airflow the successful project that it is!</p>
<p>I hope you enjoy using Apache Airflow 2.8.0!</p>
]]></content>
  </entry>
  
  <entry>
    <title>Airflow Survey 2023</title>
    <link href="/blog/airflow-survey-2023/" rel="alternate"/>
    <id>/blog/airflow-survey-2023/</id>
    <published>2023-09-21T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html">&lt;![CDATA[<p><img src="/blog/airflow-survey-2023/images/Astronomer_Demographics.png" alt="Demographics" title="airflow_usage">
<img src="/blog/airflow-survey-2023/images/Astronomer_Community_and_Contribution.png" alt="Community and Contribution" title="community_and_contributions">
<img src="/blog/airflow-survey-2023/images/Airflow-Survey-2023-Results--Airflow-Usage-Page-1-Revised.png" alt="Airflow Usage Page 1">
<img src="/blog/airflow-survey-2023/images/Astronomer-Airflow-Survey-2023-Results-Airflow-Usage-Page-2-Landscape.png" alt="Airflow Usage Page 2">
<img src="/blog/airflow-survey-2023/images/Airflow-Survey-2023-Results-Airflow-Usage-Page-3-Revised-Landscape@2x.png" alt="Airflow Usage Page 3">
<img src="/blog/airflow-survey-2023/images/Astronomer-Airflow-Survey-2023-Results-Future-Landscape@2x.png" alt="Future"></p>
<p><a href="https://docs.google.com/forms/d/1wYm6c5Gn379zkg7zD7vcWB-1fCjnOocT0oZm-tjft_Q/viewanalytics">View Raw Data</a></p>
]]></content>
  </entry>
  
  <entry>
    <title>Apache Airflow 2.7.0 is here</title>
    <link href="/blog/airflow-2.7.0/" rel="alternate"/>
    <id>/blog/airflow-2.7.0/</id>
    <published>2023-08-18T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html">&lt;![CDATA[<p>I’m happy to announce that Apache Airflow 2.7.0 has been released! Some notable features have been added that we are excited for the community to use.</p>
<p>Apache Airflow 2.7.0 contains over 500 commits, which include 40 new features, 49 improvements, 53 bug fixes, and 15 documentation changes.</p>
<p><strong>Details</strong>:</p>
<p>📦 PyPI: <a href="https://pypi.org/project/apache-airflow/2.7.0/">https://pypi.org/project/apache-airflow/2.7.0/</a> <br>
📚 Docs: <a href="https://airflow.apache.org/docs/apache-airflow/2.7.0/">https://airflow.apache.org/docs/apache-airflow/2.7.0/</a> <br>
🛠 Release Notes: <a href="https://airflow.apache.org/docs/apache-airflow/2.7.0/release_notes.html">https://airflow.apache.org/docs/apache-airflow/2.7.0/release_notes.html</a> <br>
🐳 Docker Image: <code>docker pull apache/airflow:2.7.0</code> <br>
🚏 Constraints: <a href="https://github.com/apache/airflow/tree/constraints-2.7.0">https://github.com/apache/airflow/tree/constraints-2.7.0</a></p>
<p>Airflow 2.7.0 is a release that focuses on security. The Airflow security team, working together with security researchers, identified a number of areas that required strengthening of security. This resulted in, among other things, an improved description of the <a href="https://airflow.apache.org/docs/apache-airflow/stable/security/security_model/">Airflow security model</a>, a better explanation of our <a href="https://github.com/apache/airflow/security/policy">security policy</a>, and the disabling by default of certain potentially dangerous features, such as connection testing (#32052).</p>
<p>Airflow 2.7.0 is also the first release that drops support for end-of-life Python 3.7. This allows Airflow users and maintainers to make use of features and improvements in Python 3.8, and unlocks newer versions of our dependencies.</p>
<h2 id="setup-and-teardown-aip-52">Setup and Teardown (AIP-52)</h2>
<p>Airflow now has first class support for the concept of setup and teardown tasks. These tasks have special behavior in that:</p>
<ul>
<li>Teardown tasks will still run, no matter what state the upstream tasks end up in</li>
<li>Teardown tasks failing won’t, by default, cause the DAG run to fail</li>
<li>Setup and teardown tasks are automatically cleared when a dependent task is cleared</li>
</ul>
<p>You can read more about setup and teardown in the <a href="/blog/introducing_setup_teardown/">Introducing Setup and Teardown tasks blog post</a>, or in the <a href="https://airflow.apache.org/docs/apache-airflow/2.7.0/howto/setup-and-teardown.html">setup and teardown docs</a>.</p>
<h2 id="cluster-activity-ui">Cluster Activity UI</h2>
<p>There is a new top-level page in Airflow, the Cluster Activity page. This gives an overview of the cluster, including component health, DAG and task state counts, and more!</p>
<p><img src="/blog/airflow-2.7.0/cluster_activity.png" alt="New cluster activity page"></p>
<h2 id="graph-and-gantt-views-moved-into-the-grid-view-ui">Graph and gantt views moved into the Grid view UI</h2>
<p>The graph and gantt views have been rewritten and moved into the now familiar grid view. This makes it easier to jump between task details, logs, graph, and gantt views without losing your place in a complicated DAG.</p>
<p><img src="/blog/airflow-2.7.0/graph_in_grid.png" alt="Graph in grid view"></p>
<h2 id="enable-deferrable-mode-for-all-deferable-tasks-with-1-config-setting">Enable deferrable mode for all deferrable tasks with 1 config setting</h2>
<p>Airflow 2.7.0 comes with a new config option, <code>default_deferrable</code>, which allows admins to enable deferrable mode for all deferrable tasks without requiring any DAG modifications. Simply set it in your config and enjoy async tasks!</p>
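<p>As a rough sketch of what that setting looks like, the snippet below embeds an <code>airflow.cfg</code> fragment and parses it with Python&rsquo;s standard <code>configparser</code>, purely to show the section and key. It assumes the option lives in the <code>[operators]</code> section; check the configuration reference for your version before relying on it. The variable name <code>AIRFLOW_CFG_FRAGMENT</code> is made up for this example.</p>

```python
import configparser

# Assumed airflow.cfg fragment: the [operators] section name is an assumption
# to verify against the configuration reference for your Airflow version.
AIRFLOW_CFG_FRAGMENT = """
[operators]
default_deferrable = true
"""

parser = configparser.ConfigParser()
parser.read_string(AIRFLOW_CFG_FRAGMENT)

# getboolean accepts the usual spellings such as "true", "yes", "on", and "1".
default_deferrable = parser.getboolean("operators", "default_deferrable")
print(default_deferrable)
```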
<h2 id="openlineage-built-in-integration">OpenLineage built-in integration</h2>
<p><a href="https://openlineage.io/">OpenLineage</a> provides a spec standardizing operational lineage collection and distribution across the data ecosystem that projects – open source or proprietary – implement.</p>
<p>With 2.7.0, OpenLineage changes from a plugin implementation maintained in the OpenLineage project to a built-in feature of Airflow. As a plugin, OpenLineage depended on Airflow and operators’ internals, making it brittle. Built-in OpenLineage support in Airflow makes publishing operational lineage through the OpenLineage ecosystem easier and more reliable. It has been implemented by moving the <a href="https://github.com/OpenLineage/OpenLineage/tree/main/integration/airflow">openlineage-airflow</a> package from the OpenLineage project to an <code>apache-airflow-providers-openlineage</code> provider in the base Airflow Docker image, where it can be easily enabled by configuration. Also, lineage extraction logic that was included in <a href="https://github.com/OpenLineage/OpenLineage/tree/main/integration/airflow/openlineage/airflow/extractors">Extractors</a> in that package has been moved into each corresponding provider package along with unit tests, eliminating the need for Extractors in most cases. For this purpose, a new optional API for Operators (<code>get_openlineage_facets_on_{start(), complete(ti), failure(ti)}</code>, documented <a href="https://openlineage.io/docs/integrations/airflow/default-extractors">here</a>) can be used. Having the extraction logic in each provider ensures the stability of the lineage contract in each operator and makes adding lineage coverage to custom operators easier.</p>
<h2 id="some-executors-moved-into-providers">Some executors moved into providers</h2>
<p>Some of the executors that were shipped in core Airflow have moved into their respective providers for Airflow 2.7.0. The main benefit is faster bug-fix releases, since providers are released independently of core.
The following executors have moved and require certain minimum provider versions:</p>
<ul>
<li>In order to use Celery executors, install the <a href="https://pypi.org/project/apache-airflow-providers-celery/">celery provider version 3.3.0+</a></li>
<li>In order to use the Kubernetes executor, install the <a href="https://pypi.org/project/apache-airflow-providers-cncf-kubernetes/">kubernetes provider version 7.4.0+</a></li>
<li>In order to use the Dask executor, install any version of the <a href="https://pypi.org/project/apache-airflow-providers-daskexecutor/">daskexecutor provider</a></li>
</ul>
<p>If you use the official docker images, all of these providers come preinstalled.</p>
<h2 id="additional-new-features">Additional new features</h2>
<p>Here are just a few interesting new features, since there are too many to list in full:</p>
<ul>
<li>Pools can now consider tasks in the deferred state as running (#32709)</li>
<li><code>chain_linear</code>, like <code>chain</code> but with each list of tasks depending on every task in the preceding list (#31927)</li>
<li>Grid view now supports keyboard shortcuts! (#30950)</li>
<li>Mark task groups as success or failed (#30478)</li>
<li>Fail_stop, allowing all remaining and running tasks to be failed on the first failure in a DAG (#29406)</li>
</ul>
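<p>To make the <code>chain_linear</code> item a little more concrete, here is a toy model in plain Python of the dependency edges it is understood to create: every task in one element is wired to every task in the next. This is a sketch of the wiring rule only, under that assumption; <code>chain_linear_edges</code> is a hypothetical helper, not part of Airflow, and task names are plain strings.</p>

```python
def chain_linear_edges(*elements):
    """Return (upstream, downstream) pairs for a chain_linear-style wiring.

    Each element is either a single task name or a list of task names.
    Every task in one element becomes upstream of every task in the next.
    """
    edges = []
    previous = None
    for element in elements:
        group = list(element) if isinstance(element, (list, tuple)) else [element]
        if previous is not None:
            edges.extend((up, down) for up in previous for down in group)
        previous = group
    return edges


for edge in chain_linear_edges("start", ["b", "c"], ["d", "e"]):
    print(edge)
```

<p>With plain <code>chain</code>, two adjacent lists are instead paired element by element, so <code>b</code> would lead only to <code>d</code> and <code>c</code> only to <code>e</code>.</p>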
<h2 id="contributors">Contributors</h2>
<p>Thanks to everyone who contributed to this release, including Akash Sharma, Amogh Desai, Brent Bovenzi, D. Ferruzzi, Daniel Standish, Ephraim Anierobi, Hussein Awala, Jarek Potiuk, Jed Cunningham, Karthikeyan Singaravelan, Maciej Obuchowski, Niko Oliveira, Pankaj Koti, Pankaj Singh, Pierre Jeambrun, Tzu-ping Chung, Utkarsh Sharma, Vincent Beck, and over 74 others!</p>
<p>I’d especially like to thank our release manager, Ephraim, for getting this release out the door.</p>
<p>I hope you enjoy using Apache Airflow 2.7.0!</p>
]]></content>
  </entry>
  
  <entry>
    <title>Introducing Setup and Teardown tasks</title>
    <link href="/blog/introducing_setup_teardown/" rel="alternate"/>
    <id>/blog/introducing_setup_teardown/</id>
    <published>2023-08-18T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html">&lt;![CDATA[<p>In data pipelines, commonly we need to create infrastructure resources, like a cluster or GPU nodes in an existing cluster, before doing the actual “work” and delete them after the work is done. Airflow 2.7 adds “setup” and “teardown” tasks to better support this type of pipeline. This blog post aims to highlight the key features so you know what’s possible. For full documentation on how to use setup and teardown tasks, see the <a href="https://airflow.apache.org/docs/apache-airflow/2.7.0/howto/setup-and-teardown.html">setup and teardown docs</a>.</p>
<h2 id="why-setup-and-teardown">Why setup and teardown?</h2>
<p>Before we dig into examples, let me state at a high level what setup and teardown bring to the table.</p>
<h3 id="more-expressive-dependencies">More expressive dependencies</h3>
<p>Before setup and teardown, upstream and downstream relationships could only mean one thing: “this comes before that”. With setup and teardown, in effect we can say “this requires that”. And what it means in practice is, if you clear your task, and it requires a setup, that setup will be cleared too. And if that setup has a teardown, that will run again as well.</p>
<h3 id="separating-the-work-from-the-infra">Separating the work from the infra</h3>
<p>Sometimes the part of the dag you care about is not, say, the cleanup task. For example, suppose you have a dag that loads some data and then deletes temp files. As long as the data loads, you want your dag to be marked successful. By default, this is how teardown tasks work; that is, they are ignored when determining dag run state.</p>
<h2 id="simple-case">Simple case</h2>
<p>A simple example is one setup / teardown pair, and one normal or “work” task.</p>
<p><img src="/blog/introducing_setup_teardown/simple.png" alt="Simple setup and teardown example"></p>
<p>Setups and teardowns are indicated by the up and down arrows, respectively. From that we can see that <code>create_cluster</code> is a setup task and <code>delete_cluster</code> is a teardown. The link between a setup and a teardown is always dotted to highlight the special relationship.</p>
<p>Some things to observe:</p>
<ul>
<li>If <code>create_cluster</code> fails, neither <code>run_query</code> nor <code>delete_cluster</code> will run.</li>
<li>If <code>create_cluster</code> succeeds and <code>run_query</code> fails, then <code>delete_cluster</code> will still run.</li>
<li>If <code>create_cluster</code> is skipped, <code>run_query</code> and <code>delete_cluster</code> will be skipped.</li>
<li>By default, if <code>run_query</code> succeeds and <code>delete_cluster</code> fails, then the dag run will still be marked successful. (This behavior can be overridden.)</li>
</ul>
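<p>The rules above can be restated as a small truth table. The following is a toy model in plain Python, not Airflow code: <code>simulate</code> and <code>run_marked_successful</code> are hypothetical helpers, and the <code>"not_run"</code> label is our own shorthand for &ldquo;will not run&rdquo; rather than an Airflow task state.</p>

```python
def simulate(setup, work="success", teardown="success"):
    """Return the resulting state of each task in the simple example.

    Each argument is the outcome the task would have if it ran:
    "success" or "failed" (the setup may also be "skipped").
    """
    if setup == "failed":
        # A failed setup means neither the work nor the teardown runs.
        return {
            "create_cluster": "failed",
            "run_query": "not_run",
            "delete_cluster": "not_run",
        }
    if setup == "skipped":
        # A skipped setup skips everything that requires it.
        return {
            "create_cluster": "skipped",
            "run_query": "skipped",
            "delete_cluster": "skipped",
        }
    # Setup succeeded: the teardown runs whether or not the work succeeded.
    return {
        "create_cluster": "success",
        "run_query": work,
        "delete_cluster": teardown,
    }


def run_marked_successful(states):
    """Default rule: the teardown is ignored when deciding the run state."""
    if states["create_cluster"] != "success":
        raise ValueError("this sketch only covers runs where the setup succeeded")
    return states["run_query"] == "success"


states = simulate("success", work="success", teardown="failed")
print(states, run_marked_successful(states))
```

<p>The last two lines cover the fourth bullet: the teardown fails, yet the run still counts as successful because only <code>run_query</code> is considered.</p>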
<h2 id="authoring-with-task-groups">Authoring with task groups</h2>
<p>When we set something downstream of a task group, any teardowns in the task group are ignored. This reflects the assumption that in general, we probably don’t want to stop dag execution just because a teardown fails. So, let’s wrap the above dag in a task group and see what happens:</p>
<p><img src="/blog/introducing_setup_teardown/task-group-arrow.png" alt="Setup and teardown in task groups"></p>
<p>And here’s how we linked those groups in the code:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">with</span> <span class="n">TaskGroup</span><span class="p">(</span><span class="s2">&#34;do_emr&#34;</span><span class="p">)</span> <span class="k">as</span> <span class="n">do_emr</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="n">create_cluster_task</span> <span class="o">=</span> <span class="n">create_cluster</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="n">run_query</span><span class="p">(</span><span class="n">create_cluster_task</span><span class="p">)</span> <span class="o">&gt;&gt;</span> <span class="n">delete_cluster</span><span class="p">(</span><span class="n">create_cluster_task</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">with</span> <span class="n">TaskGroup</span><span class="p">(</span><span class="s2">&#34;load&#34;</span><span class="p">)</span> <span class="k">as</span> <span class="n">load</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="n">create_config_task</span> <span class="o">=</span> <span class="n">create_configuration</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">    <span class="n">load_data</span><span class="p">(</span><span class="n">create_config_task</span><span class="p">)</span> <span class="o">&gt;&gt;</span> <span class="n">delete_configuration</span><span class="p">(</span><span class="n">create_config_task</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">do_emr</span> <span class="o">&gt;&gt;</span> <span class="n">load</span>
</span></span></code></pre></div><p>In this code, each group has a teardown, and we just arrow the first group to the second. As advertised, <code>delete_cluster</code>, a teardown task, is ignored. This has two important consequences: one, even if it fails, the <code>load</code> group will still run; and two, <code>delete_cluster</code> and <code>create_configuration</code> can run in parallel (generally speaking, we’d imagine you don’t want to wait for teardown operations to complete before continuing onto other tasks in the dag). Of course, you can override this behavior by adding an arrow between <code>delete_cluster</code> and <code>create_configuration</code>. Further, the success of this dag will depend only on whether the <code>load_data</code> task completes successfully.</p>
<h2 id="conclusion">Conclusion</h2>
<p>There’s a lot of detail we’re omitting here about exactly how to write dags with setup and teardown tasks, and for that please head over to the <a href="https://airflow.apache.org/docs/apache-airflow/2.7.0/howto/setup-and-teardown.html">setup and teardown docs</a>. But hopefully this post gives you enough of an idea of what is possible with setup and teardown tasks that you can begin to see where they can improve your data pipelines in Airflow.</p>
<p>Curious to know what else is new in Airflow 2.7? Head over to the main <a href="/blog/airflow-2.7.0/">Airflow 2.7 blog post</a> to find out!</p>
<h2 id="acknowledgements">Acknowledgements</h2>
<p>Setup and Teardown was the product of AIP-52. Thanks to everyone who contributed to it, including those that read and voted on the AIP. Special thanks to Ash Berlin-Taylor, Brent Bovenzi, Daniel Standish, Ephraim Anierobi, Jed Cunningham, Rahul Vats, and Vikram Koka.</p>
]]></content>
  </entry>
  
  <entry>
    <title>What&#39;s new in Apache Airflow 2.6.0</title>
    <link href="/blog/airflow-2.6.0/" rel="alternate"/>
    <id>/blog/airflow-2.6.0/</id>
    <published>2023-04-30T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html">&lt;![CDATA[<p>I am excited to announce that Apache Airflow 2.6.0 has been released, bringing many minor features and improvements to the community.</p>
<p>Apache Airflow 2.6.0 contains over 500 commits, which include 42 new features, 58 improvements, 38 bug fixes, and 17 documentation changes.</p>
<p><strong>Details</strong>:</p>
<p>📦 PyPI: <a href="https://pypi.org/project/apache-airflow/2.6.0/">https://pypi.org/project/apache-airflow/2.6.0/</a> <br>
📚 Docs: <a href="https://airflow.apache.org/docs/apache-airflow/2.6.0/">https://airflow.apache.org/docs/apache-airflow/2.6.0/</a> <br>
🛠 Release Notes: <a href="https://airflow.apache.org/docs/apache-airflow/2.6.0/release_notes.html">https://airflow.apache.org/docs/apache-airflow/2.6.0/release_notes.html</a> <br>
🐳 Docker Image: <code>docker pull apache/airflow:2.6.0</code> <br>
🚏 Constraints: <a href="https://github.com/apache/airflow/tree/constraints-2.6.0">https://github.com/apache/airflow/tree/constraints-2.6.0</a></p>
<p>As the changelog is quite large, the following are some notable new features that shipped in this release.</p>
<h2 id="trigger-logs-can-now-be-viewed-in-webserver">Trigger logs can now be viewed in webserver</h2>
<p>Trigger logs have now been added to task logs. They appear right alongside the rest of the logs from your task.</p>
<p><img src="/blog/airflow-2.6.0/trigger_logging.png" alt="Trigger logs shown in task log"></p>
<p>Adding this feature required changes across the entire Airflow logging stack, so be sure to update your providers if you are using remote logging.</p>
<h2 id="grid-view-improvements">Grid view improvements</h2>
<p>The grid view has received a number of minor improvements in this release.</p>
<p>Most notably, there is now a graph tab in the grid view. This offers a more integrated graph representation of the DAG, where choosing a task in either the grid or graph will highlight the same task in both views.</p>
<p><img src="/blog/airflow-2.6.0/graph.png" alt="The new graph view"></p>
<p>You can also filter upstream and downstream from a single task. For example, in the screenshot above, <code>describe_integrity</code> is the selected task. If you choose to filter downstream, this is the result:</p>
<p><img src="/blog/airflow-2.6.0/filter_downstream.png" alt="The new graph view can be filtered to show downstream tasks only"></p>
<h2 id="trigger-ui-based-on-dag-level-params">Trigger UI based on DAG level params</h2>
<p>A user-friendly form is now shown to users triggering runs for DAGs with DAG level params.</p>
<p><img src="/blog/airflow-2.6.0/trigger_dag_form.png" alt="Form shown for params in UI when triggering a DAG"></p>
<p>See the <a href="https://airflow.apache.org/docs/apache-airflow/2.6.0/core-concepts/params.html#use-params-to-provide-a-trigger-ui-form">Params docs</a> for more details.</p>
<h2 id="consolidation-of-handling-stuck-queued-tasks">Consolidation of handling stuck queued tasks</h2>
<p>Airflow now has a single configuration, <code>[scheduler] task_queued_timeout</code>, to handle tasks that get stuck in queued for too long. With a simpler implementation than the outgoing code handling these tasks, tasks stuck in queued will no longer slip through the cracks and stay stuck.</p>
<p>For more details, see the <a href="https://medium.com/apache-airflow/unsticking-airflow-stuck-queued-tasks-are-no-more-in-2-6-0-6f40a1a22835">Unsticking Airflow: Stuck Queued Tasks are No More in 2.6.0</a> Medium post.</p>
<h2 id="cluster-policy-hooks-can-come-from-plugins">Cluster Policy hooks can come from plugins</h2>
<p>Cluster policy hooks (e.g. <code>dag_policy</code>) can now come from Airflow plugins in addition to Airflow local settings. Allowing multiple hooks to be defined makes it easier for more than one team to run hooks in a single Airflow instance.</p>
<p>See the <a href="https://airflow.apache.org/docs/apache-airflow/2.6.0/administration-and-deployment/cluster-policies.html">cluster policy docs</a> for more details.</p>
<h2 id="notification-support-added">Notification support added</h2>
<p>The notifications framework allows you to send messages to external systems when a task instance/DAG run changes state. For example, you can easily post a message to Slack:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="k">with</span> <span class="n">DAG</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="s2">&#34;slack_notifier_example&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">start_date</span><span class="o">=</span><span class="n">datetime</span><span class="p">(</span><span class="mi">2023</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">),</span>
</span></span><span class="line"><span class="cl">    <span class="n">on_success_callback</span><span class="o">=</span><span class="p">[</span>
</span></span><span class="line"><span class="cl">        <span class="n">send_slack_notification</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">            <span class="n">text</span><span class="o">=</span><span class="s2">&#34;The DAG {{ dag.dag_id }} succeeded&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">            <span class="n">channel</span><span class="o">=</span><span class="s2">&#34;#general&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">            <span class="n">username</span><span class="o">=</span><span class="s2">&#34;Airflow&#34;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">        <span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="p">],</span>
</span></span><span class="line"><span class="cl"><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="o">...</span>
</span></span></code></pre></div><p>As of today, Slack is the only system supported out of the box. However, watch this space as more integrations will be added soon.</p>
<p>You can also create notifiers for your own use. Refer to the <a href="https://airflow.apache.org/docs/apache-airflow/2.6.0/howto/notifications.html">notifier how-to docs</a> for more details.</p>
<h2 id="thanks-to-the-contributors">Thanks to the contributors</h2>
<p>Thanks to everyone who contributed to this release, including Andrey Anshin, Ash Berlin-Taylor, Brent Bovenzi, Daniel Standish, Ephraim Anierobi, Hussein Awala, Jarek Potiuk, Jed Cunningham, Josh Fell, Michael Petro, Niko Oliveira, Pierre Jeambrun, Tzu-ping Chung, Victor Chiapaikeo, and over 120 others!</p>
<p>I&rsquo;d especially like to thank our release manager, Ephraim, for getting this release out the door.</p>
<p>I hope you enjoy using Apache Airflow 2.6.0!</p>
]]></content>
  </entry>
  
  <entry>
    <title>Apache Airflow 2.5.0: Tick-Tock</title>
    <link href="/blog/airflow-2.5.0/" rel="alternate"/>
    <id>/blog/airflow-2.5.0/</id>
    <published>2022-12-02T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html">&lt;![CDATA[<p>Apache Airflow 2.5 has just been released, barely two and a half months after 2.4!</p>
<p><strong>Details</strong>:</p>
<p>📦 PyPI: <a href="https://pypi.org/project/apache-airflow/2.5.0/">https://pypi.org/project/apache-airflow/2.5.0/</a> <br>
📚 Docs: <a href="https://airflow.apache.org/docs/apache-airflow/2.5.0/">https://airflow.apache.org/docs/apache-airflow/2.5.0/</a> <br>
🛠️ Release Notes: <a href="https://airflow.apache.org/docs/apache-airflow/2.5.0/release_notes.html">https://airflow.apache.org/docs/apache-airflow/2.5.0/release_notes.html</a> <br>
🐳 Docker Image: docker pull apache/airflow:2.5.0 <br>
🚏 Constraints: <a href="https://github.com/apache/airflow/tree/constraints-2.5.0">https://github.com/apache/airflow/tree/constraints-2.5.0</a></p>
<p>This quicker release cadence is a departure from our previous habit of releasing every five-to-seven months and was a deliberate effort to listen to you, our users, and get the changes and improvements into your workflows earlier.</p>
<h2 id="usability-improvements-to-the-datasets-ui">Usability improvements to the Datasets UI</h2>
<p>When we released Dataset-aware scheduling in September, we knew that the tools we gave you to manage Datasets were very much a Minimum Viable Product. In the last two months the committers and contributors have been hard at work making the UI much more usable when it comes to Datasets.</p>
<p>But we aren&rsquo;t done yet - keep an eye out for more improvements coming over the next couple of releases too.</p>
<h2 id="greatly-improved-airflow-dags-test-command">Greatly improved <code>airflow dags test</code> command</h2>
<p>This airflow subcommand has been rethought and re-optimized to make it much easier to test your DAGs locally - the major changes are:</p>
<ul>
<li>Task logs are visible right there in the console, instead of hidden away inside the task log files.</li>
<li>It is about an order of magnitude quicker to run the tasks than before (i.e. it gets to running the task code much sooner).</li>
<li>Everything runs in one process, so you can put a breakpoint in your IDE, configure it to run <code>airflow dags test &lt;mydag&gt;</code>, and then debug your code!</li>
</ul>
<h2 id="auto-tailing-task-logs-in-the-grid-view">Auto tailing task logs in the Grid view</h2>
<p>Hopefully the headline says enough. It&rsquo;s lovely, go check it out.</p>
<h2 id="more-improvements-to-dynamic-task-mapping">More improvements to Dynamic Task Mapping</h2>
<p>In a similar vein to the improvements to the Dataset (UI), we have continued to iterate on and improve the feature we first added in Airflow 2.3, Dynamic Task Mapping, and 2.5 includes <a href="https://github.com/apache/airflow/pulls?q=is%3Apr&#43;author%3Auranusjr&#43;is%3Aclosed&#43;milestone%3A%22Airflow&#43;2.5.0%22">dozens of improvements</a>.</p>
<h2 id="thanks-to-the-contributors">Thanks to the contributors</h2>
<p>Andrey Anshin, Ash Berlin-Taylor, blag, Bolke de Bruin, Brent Bovenzi, Chenglong Yan, Daniel Standish, Dov Benyomin Sohacheski, Elad Kalif, Ephraim Anierobi, Jarek Potiuk, Jed Cunningham, Jorrick Sleijster, Michael Petro, Niko, Pierre Jeambrun, Tzu-ping Chung and many more, over 75 of you. Thank you!</p>
<p>And a special thank you to Ephraim who tirelessly worked behind the scenes as release manager!</p>
<p>A much shorter change log than 2.4, but I think you&rsquo;ll agree, some great changes.</p>
]]></content>
  </entry>
  
  <entry>
    <title>Apache Airflow 2.4.0: That Data Aware Release</title>
    <link href="/blog/airflow-2.4.0/" rel="alternate"/>
    <id>/blog/airflow-2.4.0/</id>
    <published>2022-09-19T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html">&lt;![CDATA[<p>Apache Airflow 2.4.0 contains over 650 &ldquo;user-facing&rdquo; commits (excluding commits to providers or chart) and over 870 total. That includes 46 new features, 39 improvements, 52 bug fixes, and several documentation changes.</p>
<p><strong>Details</strong>:</p>
<p>📦 PyPI: <a href="https://pypi.org/project/apache-airflow/2.4.0/">https://pypi.org/project/apache-airflow/2.4.0/</a> <br>
📚 Docs: <a href="https://airflow.apache.org/docs/apache-airflow/2.4.0/">https://airflow.apache.org/docs/apache-airflow/2.4.0/</a> <br>
🛠️ Release Notes: <a href="https://airflow.apache.org/docs/apache-airflow/2.4.0/release_notes.html">https://airflow.apache.org/docs/apache-airflow/2.4.0/release_notes.html</a> <br>
🐳 Docker Image: docker pull apache/airflow:2.4.0 <br>
🚏 Constraints: <a href="https://github.com/apache/airflow/tree/constraints-2.4.0">https://github.com/apache/airflow/tree/constraints-2.4.0</a></p>
<h2 id="data-aware-scheduling-aip-48">Data-aware scheduling (AIP-48)</h2>
<p>This one is big. Airflow now has the ability to schedule DAGs based on other tasks updating datasets.</p>
<p>What does this mean, exactly? This is a great new feature that lets DAG authors create smaller, more self-contained DAGs, which chain together into a larger data-based workflow. If you are currently using <code>ExternalTaskSensor</code> or <code>TriggerDagRunOperator</code> you should take a look at datasets &ndash; in most cases you can replace them with something that will speed up the scheduling!</p>
<p>But enough talking, let&rsquo;s look at a short example. First, let&rsquo;s write a simple DAG with a task called <code>my_task</code> that produces a dataset called <code>my-dataset</code>:</p>
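<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">airflow</span> <span class="kn">import</span> <span class="n">DAG</span><span class="p">,</span> <span class="n">Dataset</span>
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">airflow.decorators</span> <span class="kn">import</span> <span class="n">task</span>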
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">dataset</span> <span class="o">=</span> <span class="n">Dataset</span><span class="p">(</span><span class="n">uri</span><span class="o">=</span><span class="s1">&#39;my-dataset&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">with</span> <span class="n">DAG</span><span class="p">(</span><span class="n">dag_id</span><span class="o">=</span><span class="s1">&#39;producer&#39;</span><span class="p">,</span> <span class="o">...</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="nd">@task</span><span class="p">(</span><span class="n">outlets</span><span class="o">=</span><span class="p">[</span><span class="n">dataset</span><span class="p">])</span>
</span></span><span class="line"><span class="cl">    <span class="k">def</span> <span class="nf">my_task</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">        <span class="o">...</span>
</span></span></code></pre></div><p>Datasets are defined by a URI. Now, we can create a second DAG (<code>dataset-consumer</code>) that gets scheduled whenever this dataset changes:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">airflow</span> <span class="kn">import</span> <span class="n">DAG</span><span class="p">,</span> <span class="n">Dataset</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">dataset</span> <span class="o">=</span> <span class="n">Dataset</span><span class="p">(</span><span class="n">uri</span><span class="o">=</span><span class="s1">&#39;my-dataset&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">with</span> <span class="n">DAG</span><span class="p">(</span><span class="n">dag_id</span><span class="o">=</span><span class="s1">&#39;dataset-consumer&#39;</span><span class="p">,</span> <span class="n">schedule</span><span class="o">=</span><span class="p">[</span><span class="n">dataset</span><span class="p">]):</span>
</span></span><span class="line"><span class="cl">    <span class="o">...</span>
</span></span></code></pre></div><p>With these two DAGs, the instant <code>my_task</code> finishes, Airflow will create the DAG run for the <code>dataset-consumer</code> workflow.</p>
<p>We know that what exists right now won&rsquo;t fit all use cases that people might wish for datasets, and in the coming minor releases (2.5, 2.6, etc.) we will expand and improve upon this foundation.</p>
<p>Datasets represent the abstract concept of a dataset, and (for now) do not have any direct read or write capability - in this release we are adding the foundational feature that we will build upon in the future - and it&rsquo;s part of our goal to have smaller releases to get new features in your hands sooner!</p>
<p>For more information on datasets, see the <a href="https://airflow.apache.org/docs/apache-airflow/2.4.0/concepts/datasets.html">documentation on Data-aware scheduling</a>. That includes details on how datasets are identified (URIs), how you can depend on multiple datasets, and how to think about what a dataset is (hint: don&rsquo;t include &ldquo;date partitions&rdquo; in a dataset, it&rsquo;s higher level than that).</p>
<h2 id="easier-management-of-conflicting-python-dependencies-using-the-new-externalpythonoperator">Easier management of conflicting python dependencies using the new ExternalPythonOperator</h2>
<p>As much as we wish all python libraries could be used happily together that sadly isn&rsquo;t the world we live in, and sometimes there are conflicts when trying to install multiple python libraries in an Airflow install &ndash; right now we hear this a lot with <code>dbt-core</code>.</p>
<p>To make this easier we have introduced <code>@task.external_python</code> (and the matching <code>ExternalPythonOperator</code>) that lets you run a python function as an Airflow task in a pre-configured virtual env, or even a whole different python version. For example:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="nd">@task.external_python</span><span class="p">(</span><span class="n">python</span><span class="o">=</span><span class="s1">&#39;/opt/venvs/task_deps/bin/python&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">my_task</span><span class="p">(</span><span class="n">data_interval_start</span><span class="p">,</span> <span class="n">data_interval_end</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">&#39;Looking at data between </span><span class="si">{</span><span class="n">data_interval_start</span><span class="si">}</span><span class="s1"> and </span><span class="si">{</span><span class="n">data_interval_end</span><span class="si">}</span><span class="s1">&#39;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="o">...</span>
</span></span></code></pre></div><p>There are a few subtleties as to what you need installed in the virtual env depending on which context variables you access, so be sure to read the <a href="http://airflow.apache.org/docs/apache-airflow/2.4.0/howto/operator/python.html#externalpythonoperator">how-to on using the ExternalPythonOperator</a>.</p>
<h2 id="more-improvements-to-dynamic-task-mapping-aip-42">More improvements to Dynamic Task Mapping (AIP-42)</h2>
<p>You asked, we listened. Dynamic task mapping now includes support for:</p>
<ul>
<li><code>expand_kwargs</code>: To assign multiple parameters to a non-TaskFlow operator.</li>
<li><code>zip</code>: To combine multiple things without cross-product.</li>
<li><code>map</code>: To transform the parameters just before the task is run.</li>
</ul>
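<p>As a plain-Python analogy for these three additions (not the Airflow API itself), the sketch below contrasts the cross-product you get when expanding over two arguments independently with the positional pairing that <code>zip</code> gives, and shows a <code>map</code>-style transformation. The bucket and file names are made up for illustration.</p>

```python
from itertools import product

buckets = ["raw", "clean"]
files = ["a.csv", "b.csv"]

# Expanding over two arguments independently yields every combination.
cross = [{"bucket": b, "file": f} for b, f in product(buckets, files)]

# zip pairs elements by position instead, so no cross-product is formed.
# A list of dicts like this is also the shape expand_kwargs works with:
# each dict supplies the full set of arguments for one mapped task.
zipped = [{"bucket": b, "file": f} for b, f in zip(buckets, files)]

# map transforms each element just before it is handed to the task.
mapped = [name.upper() for name in files]

print(len(cross), len(zipped), mapped)
```

<p>Two inputs of two elements each produce four mapped task instances with the cross-product but only two with the zipped form, which is usually what you want when the inputs are already aligned.</p>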
<p>For more information on dynamic task mapping, see the new sections of the doc on <a href="https://airflow.apache.org/docs/apache-airflow/2.4.0/concepts/dynamic-task-mapping.html#transforming-mapped-data">Transforming Mapped Data</a>, <a href="https://airflow.apache.org/docs/apache-airflow/2.4.0/concepts/dynamic-task-mapping.html#combining-upstream-data-aka-zipping">Combining upstream data (aka &ldquo;zipping&rdquo;)</a>, and <a href="https://airflow.apache.org/docs/apache-airflow/2.4.0/concepts/dynamic-task-mapping.html#assigning-multiple-parameters-to-a-non-taskflow-operator">Assigning multiple parameters to a non-TaskFlow operator</a>.</p>
<h2 id="auto-register-dags-used-in-a-context-manager-no-more-as-dag-needed">Auto-register DAGs used in a context manager (no more <code>as dag:</code> needed)</h2>
<p>This one is a small quality of life improvement, and I don&rsquo;t want to admit how many times I forgot the <code>as dag:</code>, or worse, had <code>as dag:</code> repeated.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">with</span> <span class="n">DAG</span><span class="p">(</span><span class="n">dag_id</span><span class="o">=</span><span class="s2">&#34;example&#34;</span><span class="p">)</span> <span class="k">as</span> <span class="n">dag</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="o">...</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nd">@dag</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">dag_maker</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="o">...</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">dag2</span> <span class="o">=</span> <span class="n">dag_maker</span><span class="p">()</span>
</span></span></code></pre></div><p>can become</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">with</span> <span class="n">DAG</span><span class="p">(</span><span class="n">dag_id</span><span class="o">=</span><span class="s2">&#34;example&#34;</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="o">...</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nd">@dag</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">my_dag</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="o">...</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">my_dag</span><span class="p">()</span>
</span></span></code></pre></div><p>If you want to disable the behaviour for any reason, set <code>auto_register=False</code> on the DAG:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="c1"># This dag will not be picked up by Airflow as it&#39;s not assigned to a variable</span>
</span></span><span class="line"><span class="cl"><span class="k">with</span> <span class="n">DAG</span><span class="p">(</span><span class="n">dag_id</span><span class="o">=</span><span class="s2">&#34;example&#34;</span><span class="p">,</span> <span class="n">auto_register</span><span class="o">=</span><span class="kc">False</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="o">...</span>
</span></span></code></pre></div><h2 id="additional-improvements">Additional improvements</h2>
<p>With over 650 commits, the <a href="https://airflow.apache.org/docs/apache-airflow/2.4.0/release_notes.html#airflow-2-4-0-2022-09-19">full list of features, fixes and changes</a> is too big to go into here (check out the release notes for a full list), but some noteworthy or interesting small features include:</p>
<ul>
<li>Auto-refresh on the home page</li>
<li>Add <code>@task.short_circuit</code> TaskFlow decorator</li>
<li>Add roles delete command to CLI</li>
<li>Add support for <code>TaskGroup</code> in <code>ExternalTaskSensor</code></li>
<li>Add <code>@task.kubernetes</code> TaskFlow decorator</li>
<li>Add experimental <code>parsing_context</code> to enable optimization of Dynamic DAG handling in workers</li>
<li>Consolidate to one <code>schedule</code> param</li>
<li>Allow showing non-sensitive config values in Admin -&gt; Configuration (rather than all or nothing)</li>
<li>Operator name separate from class (no more <code>_PythonDecoratedOperator</code> when using TaskFlow)</li>
</ul>
<h2 id="contributors">Contributors</h2>
<p>Thanks to everyone who contributed to this release, including Andrey Anshin, Ash Berlin-Taylor, Bartłomiej Hirsz, Brent Bovenzi, Chenglong Yan, D. Ferruzzi, Daniel Standish, Drew Hubl, Elad Kalif, Ephraim Anierobi, Jarek Potiuk, Jed Cunningham, Josh Fell, Mark Norman Francis, Niko, Tzu-ping Chung, Vincent, Wojciech Januszek, chethanuk-plutoflume, pierrejeambrun, and everyone else who committed, all 152 of you! You are what makes Airflow the successful project that it is!</p>
]]></content>
  </entry>
  
  <entry>
    <title>Airflow Survey 2022</title>
    <link href="/blog/airflow-survey-2022/" rel="alternate"/>
    <id>/blog/airflow-survey-2022/</id>
    <published>2022-06-17T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html">&lt;![CDATA[<h1 id="airflow-user-survey-2022">Airflow User Survey 2022</h1>
<p>This year’s survey has come and gone, and with it we’ve got a new batch of data for everyone! We collected 210 responses over two weeks. We continue to see growth in both contributions and downloads over the last two years, and expect that trend will continue through 2022.</p>
<p>The raw response data will be made available here soon. In the meantime, feel free to email <a href="mailto:john.thomas@astronomer.io">john.thomas@astronomer.io</a> for a copy.</p>
<h2 id="tldr">TL;DR</h2>
<h3 id="overview-of-the-user">Overview of the user</h3>
<ul>
<li>As in previous years, more than half of Airflow users are Data Engineers (54%). Solutions Architects (13%), Developers (12%), DevOps (6%) and Data Scientists (4%) are also active Airflow users! There was a slight increase in the representation of Solutions Architect roles compared to results from <a href="https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user">2020</a> and <a href="https://airflow.apache.org/blog/airflow-survey/">2019</a>.</li>
<li>Airflow is popular in bigger companies: 64% of Airflow users work for companies with 200+ employees, an increase of 11 percentage points compared to <a href="https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user">2020</a>.</li>
<li>62% of the survey participants have more than 6 Airflow users in their company.</li>
<li>More Airflow users (65.9%) are very likely to recommend Apache Airflow compared to the survey results in <a href="https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user">2020</a> and <a href="https://airflow.apache.org/blog/airflow-survey/">2019</a>. There is a general positive trend in willingness to recommend Airflow: 93% of surveyed Airflow users are willing to recommend it (85.7% in <a href="https://airflow.apache.org/blog/airflow-survey/">2019</a> and 92% in <a href="https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user">2020</a>), and only 1% of users are not likely to recommend it (3.6% in <a href="https://airflow.apache.org/blog/airflow-survey/">2019</a> and 3.5% in <a href="https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user">2020</a>).</li>
<li>Airflow documentation is a critical source of information, with more than 90% of survey participants using the documentation (an increase of 15 percentage points compared to results from <a href="https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user">2020</a>). Airflow documentation is also one of the top areas to improve! Interestingly, Stack Overflow is also an important source, with about 60% of users reporting that they use it as a source of information (an increase of 24 percentage points compared to results from <a href="https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user">2020</a>).</li>
</ul>
<h3 id="deployments">Deployments</h3>
<ul>
<li>85% of the Airflow users have between 1 and 7 active Airflow instances. 62.5% of the Airflow users have between 11 and 250 DAGs in their largest Airflow instance. 75% of the surveyed Airflow users have between 1 and 100 tasks per DAG.</li>
<li>Close to 85% of users use one of the Airflow 2 versions, 9.2% of users still use 1.10.15, and the remaining 6.3% are still using older Airflow 1 versions. The good news is that the majority of users on Airflow 1 are planning migration to Airflow 2 quite soon, with resources and capacity being the main blockers.</li>
<li>In comparison to results from <a href="https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user">2020</a>, more users were interested in monitoring in general and specifically in using tools such as external monitoring services (40.7%, up from 29.6%) and information from the metadatabase (35.7%, up from 25.1%).</li>
<li>Celery (52.7%) and Kubernetes (39.4%) are the most common executors used.</li>
</ul>
<h3 id="usage">Usage</h3>
<ul>
<li>81.3% of Airflow users who responded to the survey don’t have any customisation of Airflow.</li>
<li>XCom (69.8%) is the most popular method to pass inputs and outputs between tasks; however, saving and retrieving inputs and outputs from storage still plays an important role (49%).</li>
<li>Lineage is still a fairly new topic for Airflow users. Most of them don’t use lineage solutions but might be interested if it were supported by Airflow (47.5%), are not familiar with data lineage (29%), or say that data lineage is not their concern (13%).</li>
<li>The Airflow web UI is used heavily for Monitoring Runs (95.9%), Accessing Task Logs (89.8%), Manually triggering DAGs (85.2%), Clearing Tasks (82.7%) and Marking Tasks as successful (60.7%). The top 3 views used are: List of DAGs, Task Logs and DAG Runs, which is very similar to results from <a href="https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user">2020</a> and <a href="https://airflow.apache.org/blog/airflow-survey/">2019</a>.</li>
</ul>
<h3 id="community-and-contribution">Community and contribution</h3>
<ul>
<li>Most Airflow users (57.1%) are aware they could contribute but do not, and an additional 21.7% contribute very rarely. 14.8% of users were not aware they could contribute. There is much more to be done to engage our community to be more active contributors and raise the current 6.4% of users who actively contribute, especially considering that one important blocker for contribution is lack of knowledge on how to start (37.7%).</li>
</ul>
<h3 id="the-future-of-airflow">The future of Airflow</h3>
<ul>
<li>The top area for improvement is still the Airflow web UI (49.5%), closely followed by more telemetry for logging, monitoring and alerting purposes (48%). However, all those efforts should go hand in hand with improved documentation (36.6%) and resources about using Airflow, especially when we take into account the need to onboard new users (36.6%).</li>
<li>DAG Versioning (66.2%) is the winner for new features in Airflow, which is no surprise as this feature may positively impact the daily work of Airflow users. It is followed by three other ideas: Dependency management and Data-driven scheduling (42.6%), More dynamic task structure (42.1%) and Multi-Tenancy (37.9%).</li>
</ul>
<h2 id="overview-of-the-user-1">Overview of the user</h2>
<h3 id="what-best-describes-your-current-occupation-single-choice">What best describes your current occupation? (single choice)</h3>
<p><img src="/blog/airflow-survey-2022/images/image1.png" alt="alt_text" title="user_occupations"></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th></th>
          <th></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td></td>
          <td>No.</td>
          <td>%</td>
      </tr>
      <tr>
          <td>Data Engineer</td>
          <td>114</td>
          <td>54%</td>
      </tr>
      <tr>
          <td>Solutions Architect</td>
          <td>27</td>
          <td>13%</td>
      </tr>
      <tr>
          <td>Developer</td>
          <td>25</td>
          <td>12%</td>
      </tr>
      <tr>
          <td>DevOps</td>
          <td>12</td>
          <td>6%</td>
      </tr>
      <tr>
          <td>Data Scientist</td>
          <td>8</td>
          <td>4%</td>
      </tr>
      <tr>
          <td>Support Engineer</td>
          <td>5</td>
          <td>2%</td>
      </tr>
      <tr>
          <td>Data Analyst</td>
          <td>3</td>
          <td>1%</td>
      </tr>
      <tr>
          <td>Business Analyst</td>
          <td>2</td>
          <td>1%</td>
      </tr>
      <tr>
          <td>Other</td>
          <td>14</td>
          <td>7%</td>
      </tr>
  </tbody>
</table>
<p>According to the survey, more than half of Airflow users are Data Engineers (54%). Roles of the remaining Airflow users might be broken down into Solutions Architects (13%), Developers (12%), DevOps (6%) and Data Scientists (4%). The 2022 results are similar to <a href="https://airflow.apache.org/blog/airflow-survey/">those from 2019</a> and <a href="https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user">2020</a> with a slight increase in the representation of Solutions Architect roles.</p>
<h3 id="how-often-do-you-interact-with-airflow-single-choice">How often do you interact with Airflow? (single choice)</h3>
<p><img src="/blog/airflow-survey-2022/images/image2.png" alt="alt_text" title="interaction_frequency"></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th></th>
          <th></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td></td>
          <td>No.</td>
          <td>%</td>
      </tr>
      <tr>
          <td>Every day</td>
          <td>154</td>
          <td>73%</td>
      </tr>
      <tr>
          <td>At least once per week</td>
          <td>36</td>
          <td>17%</td>
      </tr>
      <tr>
          <td>At least once per month</td>
          <td>11</td>
          <td>5%</td>
      </tr>
      <tr>
          <td>Less than once per month</td>
          <td>9</td>
          <td>4%</td>
      </tr>
  </tbody>
</table>
<p>Users who took the survey are actively using Airflow as part of their current role. 73% of Airflow users who responded use it on a daily basis, 17% weekly.</p>
<h3 id="how-many-people-work-at-your-company-single-choice">How many people work at your company? (single choice)</h3>
<p><img src="/blog/airflow-survey-2022/images/image3.png" alt="alt_text" title="company_size"></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th></th>
          <th></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td></td>
          <td>No.</td>
          <td>%</td>
      </tr>
      <tr>
          <td>201-5000</td>
          <td>85</td>
          <td>41%</td>
      </tr>
      <tr>
          <td>5000+</td>
          <td>49</td>
          <td>23%</td>
      </tr>
      <tr>
          <td>51-200</td>
          <td>46</td>
          <td>22%</td>
      </tr>
      <tr>
          <td>11-50</td>
          <td>20</td>
          <td>10%</td>
      </tr>
      <tr>
          <td>1-10</td>
          <td>9</td>
          <td>4%</td>
      </tr>
  </tbody>
</table>
<p>Airflow is a framework that is popular in bigger companies: 64% of Airflow users who responded (compared to 52.7% in <a href="https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user">2020</a>) work for companies with more than 200 employees (41% in companies of 201-5000 employees and 23% in companies of 5000+).</p>
<h3 id="how-many-people-at-your-company-use-airflow-single-choice">How many people at your company use Airflow? (single choice)</h3>
<p><img src="/blog/airflow-survey-2022/images/image4.png" alt="alt_text" title="airflow_usage"></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th></th>
          <th></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td></td>
          <td>No.</td>
          <td>%</td>
      </tr>
      <tr>
          <td>6-20</td>
          <td>80</td>
          <td>38%</td>
      </tr>
      <tr>
          <td>1-5</td>
          <td>61</td>
          <td>29%</td>
      </tr>
      <tr>
          <td>51-200</td>
          <td>49</td>
          <td>24%</td>
      </tr>
      <tr>
          <td>200+</td>
          <td>18</td>
          <td>9%</td>
      </tr>
  </tbody>
</table>
<p>Airflow is generally used by small to medium-sized teams. 62% of the survey participants have more than 6 Airflow users in their company (38% have between 6 and 20 users, 24% between 51 and 200 users).</p>
<h3 id="how-likely-are-you-to-recommend-apache-airflow-single-choice">How likely are you to recommend Apache Airflow? (single choice)</h3>
<table>
  <thead>
      <tr>
          <th></th>
          <th></th>
          <th></th>
          <th></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td></td>
          <td>% 2019</td>
          <td>% 2020</td>
          <td>% 2022</td>
      </tr>
      <tr>
          <td>Very Likely</td>
          <td>45.4%</td>
          <td>61.6%</td>
          <td>65.9%</td>
      </tr>
      <tr>
          <td>Likely</td>
          <td>40.3%</td>
          <td>30.4%</td>
          <td>26.9%</td>
      </tr>
      <tr>
          <td>Neutral</td>
          <td>10.7%</td>
          <td>5.4%</td>
          <td>6.3%</td>
      </tr>
      <tr>
          <td>Unlikely</td>
          <td>2.6%</td>
          <td>1.5%</td>
          <td>0.5%</td>
      </tr>
      <tr>
          <td>Very Unlikely</td>
          <td>1%</td>
          <td>1%</td>
          <td>0.5%</td>
      </tr>
  </tbody>
</table>
<p>According to the survey, more Airflow users (65.9%) are very likely to recommend Apache Airflow compared to the survey results in <a href="https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user">2020</a> and <a href="https://airflow.apache.org/blog/airflow-survey/">2019</a>. There is a general positive trend in willingness to recommend Airflow: 93% of surveyed Airflow users are willing to recommend it (92% in <a href="https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user">2020</a> and 85.7% in <a href="https://airflow.apache.org/blog/airflow-survey/">2019</a>), and only 1% of users are not likely to recommend it (3.6% in <a href="https://airflow.apache.org/blog/airflow-survey/">2019</a> and 3.5% in <a href="https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user">2020</a>).</p>
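<p>As a quick check on the combined "willing to recommend" figures, the following minimal sketch adds the "Very Likely" and "Likely" shares for each survey year, using the percentages from the table. The <code>responses</code> dictionary and the <code>willing_share</code> helper are illustrative names chosen for this example, not part of any survey tooling.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"># Percentages copied from the table above, in row order:
# (very likely, likely, neutral, unlikely, very unlikely)
responses = {
    2019: (45.4, 40.3, 10.7, 2.6, 1.0),
    2020: (61.6, 30.4, 5.4, 1.5, 1.0),
    2022: (65.9, 26.9, 6.3, 0.5, 0.5),
}


def willing_share(year):
    """Return the combined 'Very Likely' + 'Likely' share for a survey year."""
    very_likely, likely = responses[year][:2]
    # Round to one decimal place to hide floating-point noise.
    return round(very_likely + likely, 1)


for year in sorted(responses):
    print(year, willing_share(year))
</code></pre></div>
<p>The 2022 value of 92.8 is what the text rounds to 93%.</p>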
<h3 id="what-is-your-source-of-information-about-airflow-multiple-choice">What is your source of information about Airflow? (multiple choice)</h3>
<table>
  <thead>
      <tr>
          <th></th>
          <th></th>
          <th></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td></td>
          <td>No.</td>
          <td>%</td>
      </tr>
      <tr>
          <td>Documentation</td>
          <td>189</td>
          <td>90.4%</td>
      </tr>
      <tr>
          <td>Airflow website (Blog, etc.)</td>
          <td>142</td>
          <td>67.9%</td>
      </tr>
      <tr>
          <td>Stack Overflow</td>
          <td>126</td>
          <td>60.3%</td>
      </tr>
      <tr>
          <td>Github Issues</td>
          <td>104</td>
          <td>49.8%</td>
      </tr>
      <tr>
          <td>Slack</td>
          <td>96</td>
          <td>45.9%</td>
      </tr>
      <tr>
          <td>Airflow Summit Videos</td>
          <td>88</td>
          <td>42.1%</td>
      </tr>
      <tr>
          <td>GitHub Discussions</td>
          <td>76</td>
          <td>36.4%</td>
      </tr>
      <tr>
          <td>Airflow Community Webinars</td>
          <td>41</td>
          <td>19.6%</td>
      </tr>
      <tr>
          <td>Astronomer Registry</td>
          <td>51</td>
          <td>24.4%</td>
      </tr>
      <tr>
          <td>Airflow Mailing List</td>
          <td>34</td>
          <td>16.3%</td>
      </tr>
  </tbody>
</table>
<p>Airflow documentation is a critical source of information, with more than 90% of survey participants using the documentation. It is of increasing importance compared to results from <a href="https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user">2020</a>, where documentation was at about the 75% level. Moreover, more than 60% of users are getting information from the Airflow website (67.9%) and Stack Overflow (60.3%), which is also a big increase compared to the 36% level in <a href="https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user">2020</a>. Interestingly, Slack usage decreased from 63.05% in <a href="https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user">2020</a> to 45.9% in 2022.</p>
<h2 id="deployments-1">Deployments</h2>
<h3 id="how-many-active-dags-do-you-have-in-your-largest-airflow-instance-single-choice">How many active DAGs do you have in your largest Airflow instance? (single choice)</h3>
<p><img src="/blog/airflow-survey-2022/images/image5.png" alt="alt_text" title="active_dags"></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th></th>
          <th></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td></td>
          <td>No.</td>
          <td>%</td>
      </tr>
      <tr>
          <td>51-250</td>
          <td>66</td>
          <td>31.7%</td>
      </tr>
      <tr>
          <td>11-50</td>
          <td>64</td>
          <td>30.8%</td>
      </tr>
      <tr>
          <td>5-10</td>
          <td>25</td>
          <td>12.0%</td>
      </tr>
      <tr>
          <td>251-500</td>
          <td>20</td>
          <td>9.6%</td>
      </tr>
      <tr>
          <td>&lt;5</td>
          <td>14</td>
          <td>6.7%</td>
      </tr>
      <tr>
          <td>1000+</td>
          <td>10</td>
          <td>4.8%</td>
      </tr>
      <tr>
          <td>501-1000</td>
          <td>9</td>
          <td>4.3%</td>
      </tr>
  </tbody>
</table>
<p>62.5% of the Airflow users surveyed have between 11 and 250 DAGs in their largest Airflow instance.</p>
<h3 id="how-many-active-airflow-instances-do-you-have-single-choice">How many active Airflow instances do you have? (single choice)</h3>
<p><img src="/blog/airflow-survey-2022/images/image6.png" alt="alt_text" title="image_tooltip"></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th></th>
          <th></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td></td>
          <td>No.</td>
          <td>%</td>
      </tr>
      <tr>
          <td>1</td>
          <td>52</td>
          <td>25.2%</td>
      </tr>
      <tr>
          <td>2</td>
          <td>46</td>
          <td>22.3%</td>
      </tr>
      <tr>
          <td>4-7</td>
          <td>40</td>
          <td>19.4%</td>
      </tr>
      <tr>
          <td>3</td>
          <td>37</td>
          <td>18.0%</td>
      </tr>
      <tr>
          <td>20+</td>
          <td>19</td>
          <td>9.2%</td>
      </tr>
      <tr>
          <td>8-10</td>
          <td>7</td>
          <td>3.4%</td>
      </tr>
      <tr>
          <td>11-20</td>
          <td>5</td>
          <td>2.4%</td>
      </tr>
  </tbody>
</table>
<p>85% of the Airflow users surveyed have between 1 and 7 active Airflow instances, and nearly 50% have only 1 or 2.</p>
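<p>The grouped figures can be reproduced from the raw counts in the table. The sketch below is only an illustration: <code>instance_counts</code> and <code>share</code> are hypothetical names, and the total is assumed to be the sum of the counts listed for this question.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"># Respondent counts copied from the table above.
instance_counts = {
    "1": 52,
    "2": 46,
    "3": 37,
    "4-7": 40,
    "8-10": 7,
    "11-20": 5,
    "20+": 19,
}


def share(buckets):
    """Percentage of respondents whose answer falls in any of the given buckets."""
    total = sum(instance_counts.values())
    selected = sum(instance_counts[bucket] for bucket in buckets)
    return round(100 * selected / total, 1)


print(share(["1", "2", "3", "4-7"]))  # share with 1 to 7 instances
print(share(["1", "2"]))  # share with only 1 or 2 instances
</code></pre></div>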
<h3 id="what-is-the-maximum-number-of-tasks-that-you-have-used-in-a-single-dagsingle-choice">What is the maximum number of tasks that you have used in a single DAG? (single choice)</h3>
<p><img src="/blog/airflow-survey-2022/images/image7.png" alt="alt_text" title="maximum tasks"></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th></th>
          <th></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td></td>
          <td>No.</td>
          <td>%</td>
      </tr>
      <tr>
          <td>11-25</td>
          <td>51</td>
          <td>24.5%</td>
      </tr>
      <tr>
          <td>26-50</td>
          <td>41</td>
          <td>19.7%</td>
      </tr>
      <tr>
          <td>51-100</td>
          <td>35</td>
          <td>16.8%</td>
      </tr>
      <tr>
          <td>&lt;10</td>
          <td>29</td>
          <td>13.9%</td>
      </tr>
      <tr>
          <td>101-250</td>
          <td>23</td>
          <td>11.1%</td>
      </tr>
      <tr>
          <td>501-1000</td>
          <td>9</td>
          <td>4.3%</td>
      </tr>
      <tr>
          <td>1000-2500</td>
          <td>8</td>
          <td>3.8%</td>
      </tr>
      <tr>
          <td>251-500</td>
          <td>8</td>
          <td>3.8%</td>
      </tr>
      <tr>
          <td>2500-5000</td>
          <td>4</td>
          <td>1.9%</td>
      </tr>
  </tbody>
</table>
<p>75% of the surveyed Airflow users have between 1 and 100 tasks per DAG.</p>
<h3 id="how-many-schedulers-do-you-have-in-your-largest-airflow-instance-single-choice">How many schedulers do you have in your largest Airflow instance? (single choice)</h3>
<p><img src="/blog/airflow-survey-2022/images/image8.png" alt="alt_text" title="max_schedulers"></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th></th>
          <th></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td></td>
          <td>No.</td>
          <td>%</td>
      </tr>
      <tr>
          <td>1</td>
          <td>113</td>
          <td>55.1%</td>
      </tr>
      <tr>
          <td>2</td>
          <td>61</td>
          <td>29.8%</td>
      </tr>
      <tr>
          <td>3</td>
          <td>18</td>
          <td>8.8%</td>
      </tr>
      <tr>
          <td>4+</td>
          <td>13</td>
          <td>6.3%</td>
      </tr>
  </tbody>
</table>
<p>More than half of Airflow users who responded to the survey (55.1%) have 1 scheduler in their largest Airflow instance. However, it’s important to note that the remaining 44.9% run 2 or more schedulers.</p>
<h3 id="what-executor-type-do-you-use-multiple-choice">What executor type do you use? (multiple choice)</h3>
<table>
  <thead>
      <tr>
          <th></th>
          <th></th>
          <th></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td></td>
          <td>No.</td>
          <td>%</td>
      </tr>
      <tr>
          <td>Celery</td>
          <td>107</td>
          <td>52.7%</td>
      </tr>
      <tr>
          <td>Kubernetes</td>
          <td>80</td>
          <td>39.4%</td>
      </tr>
      <tr>
          <td>Local</td>
          <td>49</td>
          <td>24.1%</td>
      </tr>
      <tr>
          <td>Sequential</td>
          <td>21</td>
          <td>10.3%</td>
      </tr>
      <tr>
          <td>CeleryKubernetes</td>
          <td>14</td>
          <td>6.9%</td>
      </tr>
  </tbody>
</table>
<p>Celery (52.7%) and Kubernetes (39.4%) are the most common executors used. The CeleryKubernetes executor (6.9%) has also started to be noticed and used by Airflow users.</p>
<h3 id="if-you-use-the-celery-executor-how-many-workers-do-you-have-in-your-largest-airflow-instance-single-choice">If you use the Celery executor, how many workers do you have in your largest Airflow instance? (single choice)</h3>
<p><img src="/blog/airflow-survey-2022/images/image9.png" alt="alt_text" title="max_workers"></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th></th>
          <th></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td></td>
          <td>No.</td>
          <td>%</td>
      </tr>
      <tr>
          <td>2-5</td>
          <td>64</td>
          <td>44.8%</td>
      </tr>
      <tr>
          <td>10+</td>
          <td>28</td>
          <td>19.6%</td>
      </tr>
      <tr>
          <td>1</td>
          <td>26</td>
          <td>18.2%</td>
      </tr>
      <tr>
          <td>6-10</td>
          <td>25</td>
          <td>17.5%</td>
      </tr>
  </tbody>
</table>
<p>Among Celery executor users who responded to the survey, close to half (44.8%) have between 2 and 5 workers in their largest Airflow instance. It’s notable that nearly a fifth (19.6%) have more than 10 workers.</p>
<h3 id="which-version-of-airflow-do-you-currently-use-single-choice">Which version of Airflow do you currently use? (single choice)</h3>
<p><img src="/blog/airflow-survey-2022/images/image10.png" alt="alt_text" title="airflow_version"></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th></th>
          <th></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td></td>
          <td>No.</td>
          <td>%</td>
      </tr>
      <tr>
          <td>1.10.14 or older</td>
          <td>13</td>
          <td>6.3%</td>
      </tr>
      <tr>
          <td>1.10.15</td>
          <td>19</td>
          <td>9.2%</td>
      </tr>
      <tr>
          <td>2.0.x</td>
          <td>23</td>
          <td>11.1%</td>
      </tr>
      <tr>
          <td>2.1.x</td>
          <td>24</td>
          <td>11.6%</td>
      </tr>
      <tr>
          <td>2.2.x</td>
          <td>79</td>
          <td>38.2%</td>
      </tr>
      <tr>
          <td>2.3.x</td>
          <td>49</td>
          <td>23.7%</td>
      </tr>
  </tbody>
</table>
<p>It&rsquo;s good to see that close to 85% of users who responded to the survey use one of the Airflow 2 versions, 9.2% of users still use 1.10.15, and the remaining 6.3% are still using older Airflow 1.10 versions.</p>
<p>The good news is that the majority of users on Airflow 1 are planning to migrate to Airflow 2 quite soon; for now, they feel they lack the capacity to undertake such a significant effort. However, the survey’s comments also show that some users are generally skeptical about migrating to Airflow 2, with negative opinions about the new scheduler or compatibility with the helm chart.</p>
<p>As to plans about migration to the newest version of Airflow 2, users who responded to the survey are committed and waiting especially for the features related to dynamic DAGs. However, some users also reported that they are waiting to solve some dependencies they have or they prefer to wait a little bit more for the community to test the new version before they decide to move on.</p>
<h3 id="what-metrics-do-you-use-to-monitor-airflow-multiple-choice">What metrics do you use to monitor Airflow? (multiple choice)</h3>
<table>
  <thead>
      <tr>
          <th></th>
          <th></th>
          <th></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td></td>
          <td>No.</td>
          <td>%</td>
      </tr>
      <tr>
          <td>External monitoring service</td>
          <td>81</td>
          <td>40.7%</td>
      </tr>
      <tr>
          <td>Information from metadatabase</td>
          <td>71</td>
          <td>35.7%</td>
      </tr>
      <tr>
          <td>Statsd</td>
          <td>54</td>
          <td>27.1%</td>
      </tr>
      <tr>
          <td>I do not use monitoring</td>
          <td>47</td>
          <td>23.6%</td>
      </tr>
      <tr>
          <td>Other</td>
          <td>14</td>
          <td>7%</td>
      </tr>
  </tbody>
</table>
<p>In comparison to results from <a href="https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user">2020</a>, more users are monitoring Airflow in some way. External monitoring services (40.7%) and information from the metadatabase (35.7%) started to play a more important role in Airflow monitoring.</p>
<h3 id="how-do-you-deploy-airflow-multiple-choice">How do you deploy Airflow? (multiple choice)</h3>
<table>
  <thead>
      <tr>
          <th></th>
          <th></th>
          <th></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td></td>
          <td>No.</td>
          <td>%</td>
      </tr>
      <tr>
          <td>On virtual machines (for example using AWS EC2)</td>
          <td>63</td>
          <td>30.6%</td>
      </tr>
      <tr>
          <td>Using a managed service like Astronomer, Google Composer or AWS MWAA</td>
          <td>54</td>
          <td>26.2%</td>
      </tr>
      <tr>
          <td>On Kubernetes (using Apache Airflow’s helm chart)</td>
          <td>46</td>
          <td>22.3%</td>
      </tr>
      <tr>
          <td>On premises</td>
          <td>43</td>
          <td>20.9%</td>
      </tr>
      <tr>
          <td>On Kubernetes (using custom deployments)</td>
          <td>39</td>
          <td>18.9%</td>
      </tr>
      <tr>
          <td>On Kubernetes (using another helm chart)</td>
          <td>21</td>
          <td>10.2%</td>
      </tr>
      <tr>
          <td>Other</td>
          <td>13</td>
          <td>6.5%</td>
      </tr>
  </tbody>
</table>
<p>More than half of Airflow users who responded (51.4%) deploy Airflow on Kubernetes. This is about 20 percent more than in <a href="https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user">2020</a>. The remaining top deployment methods are on virtual machines (30.6%) and via managed services (26.2%).</p>
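<p>The 51.4% figure is the sum of the three Kubernetes rows in the table. Because this was a multiple-choice question, that sum equals the share of respondents using Kubernetes only if nobody selected more than one Kubernetes option. The sketch below, which uses the hypothetical names <code>kubernetes_options</code>, <code>upper_bound</code> and <code>lower_bound</code>, shows the range the true share could fall in under that caveat.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"># Percentages of respondents per Kubernetes option, copied from the table above.
kubernetes_options = {
    "official helm chart": 22.3,
    "custom deployments": 18.9,
    "another helm chart": 10.2,
}

# Upper bound: assumes no respondent ticked two Kubernetes options.
upper_bound = round(sum(kubernetes_options.values()), 1)

# Lower bound: assumes everyone who ticked a smaller option
# also ticked the most popular one.
lower_bound = max(kubernetes_options.values())

print(upper_bound, lower_bound)
</code></pre></div>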
<h3 id="how-do-you-distribute-your-dags-from-your-developer-environment-to-the-cloud-single-choice">How do you distribute your DAGs from your developer environment to the cloud? (single choice)</h3>
<table>
  <thead>
      <tr>
          <th></th>
          <th></th>
          <th></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td></td>
          <td>No.</td>
          <td>%</td>
      </tr>
      <tr>
          <td>Using a synchronizing process (Git sync, GCS fuse, etc)</td>
          <td>100</td>
          <td>49%</td>
      </tr>
      <tr>
          <td>Bake them into the docker image</td>
          <td>51</td>
          <td>25%</td>
      </tr>
      <tr>
          <td>Shared file system</td>
          <td>30</td>
          <td>14.7%</td>
      </tr>
      <tr>
          <td>Other</td>
          <td>16</td>
          <td>7.9%</td>
      </tr>
      <tr>
          <td>I don’t know</td>
          <td>7</td>
          <td>3.4%</td>
      </tr>
  </tbody>
</table>
<p>According to the survey responses, the most popular way of distributing DAGs is a synchronizing process: about half of Airflow users (49%) use one to distribute DAGs from developer environments to the cloud.</p>
<h2 id="usage-1">Usage</h2>
<h3 id="do-you-have-any-customisation-of-airflow-single-choice">Do you have any customisation of Airflow? (single choice)</h3>
<p><img src="/blog/airflow-survey-2022/images/image11.png" alt="alt_text" title="customization"></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th></th>
          <th></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td></td>
          <td>No.</td>
          <td>%</td>
      </tr>
      <tr>
          <td>No, we use vanilla airflow</td>
          <td>165</td>
          <td>81.3%</td>
      </tr>
      <tr>
          <td>Yes, we have a separate fork</td>
          <td>13</td>
          <td>6.4%</td>
      </tr>
      <tr>
          <td>Yes, we use a 3rd-party fork</td>
          <td>12</td>
          <td>5.9%</td>
      </tr>
      <tr>
          <td>Yes, we’ve backported bug fixes to an older version</td>
          <td>13</td>
          <td>6.4%</td>
      </tr>
  </tbody>
</table>
<p>More Airflow users (81.3%) don’t have any customisation of Airflow (compared to 75.9% in <a href="https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user">2020</a>). Those Airflow users who have customisations (18.7%) introduced them mainly to separate development and production workflows, to backport bug fixes, to apply security fixes, or to run a backfill command on a Kubernetes pod.</p>
<h3 id="which-metadata-database-do-you-use-single-choice">Which Metadata Database do you use? (single choice)</h3>
<p><img src="/blog/airflow-survey-2022/images/image12.png" alt="alt_text" title="database"></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th></th>
          <th></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td></td>
          <td>No.</td>
          <td>%</td>
      </tr>
      <tr>
          <td>PostgreSQL 13</td>
          <td>86</td>
          <td>43.9%</td>
      </tr>
      <tr>
          <td>PostgreSQL 12</td>
          <td>74</td>
          <td>37.8%</td>
      </tr>
      <tr>
          <td>MySQL 8</td>
          <td>22</td>
          <td>11.2%</td>
      </tr>
      <tr>
          <td>MySQL 5</td>
          <td>9</td>
          <td>4.6%</td>
      </tr>
      <tr>
          <td>MariaDB</td>
          <td>4</td>
          <td>2.0%</td>
      </tr>
      <tr>
          <td>MsSQL</td>
          <td>1</td>
          <td>0.5%</td>
      </tr>
  </tbody>
</table>
<p>According to the survey responses, the most popular metadata databases are PostgreSQL 13 (43.9%) and PostgreSQL 12 (37.8%). This represents a sharp increase from 2020, up from 68.9% to 81.7% total on PostgreSQL, with a corresponding decrease in MySQL, down from 23% to 15.8%. This is an interesting result given the community discussion about not adding support for more database backends, or even settling on a single supported database.</p>
<h3 id="whats-the-primary-method-by-which-you-integrate-with-providers-and-external-services-in-your-airflow-dags-single-choice">What&rsquo;s the primary method by which you integrate with providers and external services in your Airflow DAGs? (single choice)</h3>
<p><img src="/blog/airflow-survey-2022/images/image13.png" alt="alt_text" title="providers_interface"></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th></th>
          <th></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td></td>
          <td>No.</td>
          <td>%</td>
      </tr>
      <tr>
          <td>Using existing dedicated operators / hooks</td>
          <td>70</td>
          <td>34.5%</td>
      </tr>
      <tr>
          <td>Using Bash/Python operators</td>
          <td>58</td>
          <td>28.6%</td>
      </tr>
      <tr>
          <td>Using custom operators / hooks</td>
          <td>50</td>
          <td>24.6%</td>
      </tr>
      <tr>
          <td>Using KubernetesPodOperator</td>
          <td>25</td>
          <td>12.3%</td>
      </tr>
  </tbody>
</table>
<p>According to the survey responses, the most popular ways of connecting Airflow to external services are existing dedicated operators / hooks (34.5%), Bash/Python operators (28.6%) and custom operators / hooks (24.6%). KubernetesPodOperator (12.3%) is less popular among respondents. This ranking of integration methods is similar to the one from <a href="https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user">2020</a>.</p>
<h3 id="what-providers-do-you-use-in-your-airflow-dags-multiple-choice">What providers do you use in your Airflow DAGs? (multiple choice)</h3>
<table>
  <thead>
      <tr>
          <th></th>
          <th></th>
          <th></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td></td>
          <td>No.</td>
          <td>%</td>
      </tr>
      <tr>
          <td>Amazon Web Services</td>
          <td>112</td>
          <td>55.4%</td>
      </tr>
      <tr>
          <td>Google Cloud Platform / Google APIs</td>
          <td>79</td>
          <td>39.1%</td>
      </tr>
      <tr>
          <td>Internal company systems</td>
          <td>75</td>
          <td>37.1%</td>
      </tr>
      <tr>
          <td>Hadoop / Spark / Flink / Other Apache software</td>
          <td>57</td>
          <td>28.2%</td>
      </tr>
      <tr>
          <td>Microsoft Azure</td>
          <td>17</td>
          <td>8.4%</td>
      </tr>
      <tr>
          <td>Other</td>
          <td>21</td>
          <td>10.5%</td>
      </tr>
      <tr>
          <td>I do not use external services in my Airflow DAGs</td>
          <td>14</td>
          <td>6.9%</td>
      </tr>
  </tbody>
</table>
<p>It’s not surprising that Amazon Web Services (55.4% vs 59.6% in <a href="https://airflow.apache.org/blog/airflow-survey-2020/">2020</a>) is the leading Airflow provider. The next three positions are taken by Google Cloud Platform (39.1% vs 47.7% in <a href="https://airflow.apache.org/blog/airflow-survey-2020/">2020</a>), internal company systems (37.1% vs 55.6% in <a href="https://airflow.apache.org/blog/airflow-survey-2020/">2020</a>), and other Apache products (28.2% vs 35.47% in <a href="https://airflow.apache.org/blog/airflow-survey-2020/">2020</a>).</p>
<h3 id="how-frequently-do-you-upgrade-airflow-environments-single-choice">How frequently do you upgrade Airflow environments? (single choice)</h3>
<p><img src="/blog/airflow-survey-2022/images/image14.png" alt="alt_text" title="upgrade_frequency"></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th></th>
          <th></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td></td>
          <td>No.</td>
          <td>%</td>
      </tr>
      <tr>
          <td>every 12 months</td>
          <td>46</td>
          <td>22.9%</td>
      </tr>
      <tr>
          <td>every 6 months</td>
          <td>49</td>
          <td>24.4%</td>
      </tr>
      <tr>
          <td>once a quarter</td>
          <td>47</td>
          <td>23.4%</td>
      </tr>
      <tr>
          <td>Whenever there is a newer version</td>
          <td>59</td>
          <td>29.4%</td>
      </tr>
  </tbody>
</table>
<p>The different upgrade frequencies are almost equally popular amongst Airflow users who responded to the survey, ranging from 22.9% (every 12 months) to 29.4% (whenever there is a newer version).</p>
<h3 id="do-you-upgrade-providers-separately-from-the-core-single-choice">Do you upgrade providers separately from the core? (single choice)</h3>
<p><img src="/blog/airflow-survey-2022/images/image15.png" alt="alt_text" title="providers_upgrade"></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th></th>
          <th></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td></td>
          <td>No.</td>
          <td>%</td>
      </tr>
      <tr>
          <td>When I need it</td>
          <td>83</td>
          <td>42.8%</td>
      </tr>
      <tr>
          <td>Never - always use the providers that come with Airflow</td>
          <td>68</td>
          <td>35.1%</td>
      </tr>
      <tr>
          <td>I did not know I can upgrade providers separately</td>
          <td>32</td>
          <td>16.5%</td>
      </tr>
      <tr>
          <td>I upgrade providers when they are released</td>
          <td>11</td>
          <td>5.7%</td>
      </tr>
  </tbody>
</table>
<p>According to the survey responses, Airflow users most often upgrade providers when they need to (42.8%) or prefer to stay with the providers that come with Airflow (35.1%). It’s surprising that 16.5% of Airflow users who responded to the survey were not aware that they can upgrade their providers separately from Airflow core.</p>
<h3 id="how-do-you-pass-inputs-and-outputs-between-tasks-multiple-choice">How do you pass inputs and outputs between tasks? (multiple choice)</h3>
<table>
  <thead>
      <tr>
          <th></th>
          <th></th>
          <th></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td></td>
          <td>No.</td>
          <td>%</td>
      </tr>
      <tr>
          <td>XCom</td>
          <td>141</td>
          <td>69.8%</td>
      </tr>
      <tr>
          <td>Saving and retrieving from Storage</td>
          <td>99</td>
          <td>49%</td>
      </tr>
      <tr>
          <td>TaskFlow</td>
          <td>37</td>
          <td>18.3%</td>
      </tr>
      <tr>
          <td>Other</td>
          <td>5</td>
          <td>2.5%</td>
      </tr>
      <tr>
          <td>We don’t</td>
          <td>29</td>
          <td>14.4%</td>
      </tr>
  </tbody>
</table>
<p>According to the survey responses, XCom (69.8%) is the most popular method to pass inputs and outputs between tasks; however, saving to and retrieving from storage still plays an important role (49%). It’s interesting that close to 15% of Airflow users who responded to the survey report that they do not pass any outputs or inputs between tasks.</p>
<h3 id="do-you-use-a-data-lineage-backend-multiple-choice">Do you use a data lineage backend? (multiple choice)</h3>
<table>
  <thead>
      <tr>
          <th></th>
          <th></th>
          <th></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td></td>
          <td>No.</td>
          <td>%</td>
      </tr>
      <tr>
          <td>No, but I will use such feature if fully supported in Airflow</td>
          <td>95</td>
          <td>47.5%</td>
      </tr>
      <tr>
          <td>I’m not familiar with data lineage</td>
          <td>58</td>
          <td>29%</td>
      </tr>
      <tr>
          <td>No, data lineage isn’t a concern for my usage</td>
          <td>26</td>
          <td>13%</td>
      </tr>
      <tr>
          <td>Yes, I send lineage to an Open Source lineage repository</td>
          <td>15</td>
          <td>7.5%</td>
      </tr>
      <tr>
          <td>Yes, I send lineage to an Enterprise lineage repository</td>
          <td>7</td>
          <td>3.5%</td>
      </tr>
      <tr>
          <td>Yes, I send lineage to a custom internal lineage repository</td>
          <td>9</td>
          <td>4.5%</td>
      </tr>
  </tbody>
</table>
<p>When asked what lineage backend Airflow users use, the answers indicated that, while lineage itself is a fairly new topic, there is interest in the feature as a whole. Most Airflow users responded that they don’t use lineage solutions currently but would if the feature were fully supported in Airflow (47.5%), that they are not familiar with data lineage (29%), or that data lineage is not a concern for them (13%).</p>
<h3 id="which-interfaces-of-airflow-do-you-use-as-part-of-your-current-role-multiple-choice">Which interfaces of Airflow do you use as part of your current role? (multiple choice)</h3>
<table>
  <thead>
      <tr>
          <th></th>
          <th></th>
          <th></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td></td>
          <td>No.</td>
          <td>%</td>
      </tr>
      <tr>
          <td>Original Airflow Graphical User Interface</td>
          <td>189</td>
          <td>94%</td>
      </tr>
      <tr>
          <td>CLI</td>
          <td>98</td>
          <td>48.8%</td>
      </tr>
      <tr>
          <td>API</td>
          <td>80</td>
          <td>39.8%</td>
      </tr>
      <tr>
          <td>Custom (own created) Airflow Graphical User Interface</td>
          <td>12</td>
          <td>6%</td>
      </tr>
      <tr>
          <td>GCP Composer</td>
          <td>1</td>
          <td>0.5%</td>
      </tr>
  </tbody>
</table>
<p>It’s clear that the Airflow web UI is important, as 94% of users who responded to the survey use it as part of their current role. The CLI (48.8%) and the API (39.8%) see similar levels of use, but both are much less common than the web UI.</p>
<h3 id="if-gui-marked-what-do-you-use-the-gui-for-multiple-choice">(If GUI Marked) What do you use the GUI for? (multiple choice)</h3>
<table>
  <thead>
      <tr>
          <th></th>
          <th></th>
          <th></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td></td>
          <td>No.</td>
          <td>%</td>
      </tr>
      <tr>
          <td>Monitoring Runs</td>
          <td>188</td>
          <td>95.9%</td>
      </tr>
      <tr>
          <td>Accessing Task Logs</td>
          <td>176</td>
          <td>89.8%</td>
      </tr>
      <tr>
          <td>Manually triggering DAGs</td>
          <td>167</td>
          <td>85.2%</td>
      </tr>
      <tr>
          <td>Clearing Tasks</td>
          <td>162</td>
          <td>82.7%</td>
      </tr>
      <tr>
          <td>Marking Tasks as successful</td>
          <td>119</td>
          <td>60.7%</td>
      </tr>
      <tr>
          <td>Other</td>
          <td>6</td>
          <td>3%</td>
      </tr>
  </tbody>
</table>
<p>The Airflow web UI is used heavily for monitoring (Monitoring Runs, 95.9%) and for troubleshooting: Accessing Task Logs (89.8%), Manually triggering DAGs (85.2%), Clearing Tasks (82.7%) and Marking Tasks as successful (60.7%).</p>
<h3 id="if-cli-marked-what-do-you-use-the-cli-for-multiple-choice">(If CLI Marked) What do you use the CLI for? (multiple choice)</h3>
<table>
  <thead>
      <tr>
          <th></th>
          <th></th>
          <th></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td></td>
          <td>No.</td>
          <td>%</td>
      </tr>
      <tr>
          <td>Backfilling</td>
          <td>63</td>
          <td>56.8%</td>
      </tr>
      <tr>
          <td>Manually triggering DAGs</td>
          <td>52</td>
          <td>46.8%</td>
      </tr>
      <tr>
          <td>Clearing Tasks</td>
          <td>26</td>
          <td>23.4%</td>
      </tr>
      <tr>
          <td>Monitoring Runs</td>
          <td>25</td>
          <td>22.5%</td>
      </tr>
      <tr>
          <td>Accessing Task Logs</td>
          <td>21</td>
          <td>18.9%</td>
      </tr>
      <tr>
          <td>Marking Tasks as successful</td>
          <td>11</td>
          <td>9.9%</td>
      </tr>
      <tr>
          <td>Other</td>
          <td>17</td>
          <td>15.3%</td>
      </tr>
  </tbody>
</table>
<p>In contrast to the web UI, the Airflow CLI is used mainly for Backfilling (56.8%) and Manually triggering DAGs (46.8%).</p>
<h3 id="in-airflow-which-ui-views-are-important-for-you-multiple-choice">In Airflow, which UI views are important for you? (multiple choice)</h3>
<table>
  <thead>
      <tr>
          <th></th>
          <th></th>
          <th></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td></td>
          <td>No.</td>
          <td>%</td>
      </tr>
      <tr>
          <td>List of DAGs</td>
          <td>178</td>
          <td>89.4%</td>
      </tr>
      <tr>
          <td>Task Logs</td>
          <td>162</td>
          <td>81.4%</td>
      </tr>
      <tr>
          <td>DAG Runs</td>
          <td>160</td>
          <td>80.4%</td>
      </tr>
      <tr>
          <td>Graph view</td>
          <td>147</td>
          <td>73.9%</td>
      </tr>
      <tr>
          <td>Grid/Tree View</td>
          <td>138</td>
          <td>69.3%</td>
      </tr>
      <tr>
          <td>Run Details</td>
          <td>117</td>
          <td>58.8%</td>
      </tr>
      <tr>
          <td>DAG details</td>
          <td>111</td>
          <td>55.8%</td>
      </tr>
      <tr>
          <td>Task Instances</td>
          <td>102</td>
          <td>51.3%</td>
      </tr>
      <tr>
          <td>Task Duration</td>
          <td>91</td>
          <td>45.7%</td>
      </tr>
      <tr>
          <td>Code</td>
          <td>90</td>
          <td>45.2%</td>
      </tr>
      <tr>
          <td>Task Tries</td>
          <td>60</td>
          <td>30.2%</td>
      </tr>
      <tr>
          <td>Gantt</td>
          <td>48</td>
          <td>21.4%</td>
      </tr>
      <tr>
          <td>Landing Times</td>
          <td>27</td>
          <td>13.6%</td>
      </tr>
      <tr>
          <td>Other</td>
          <td>4</td>
          <td>2%</td>
      </tr>
  </tbody>
</table>
<p>The ranking of UI views shows that the majority of Airflow users use the web UI mostly for monitoring and/or troubleshooting purposes, with the top 3 views being List of DAGs (89.4%), Task Logs (81.4%) and DAG Runs (80.4%). The results are very similar to those from <a href="https://airflow.apache.org/blog/airflow-survey-2020/#overview-of-the-user">2020</a> and <a href="https://airflow.apache.org/blog/airflow-survey/">2019</a>.</p>
<h2 id="community-and-contribution-1">Community and contribution</h2>
<h3 id="are-you-participating-in-the-airflow-community-discussions-single-choice">Are you participating in the Airflow community discussions? (single choice)</h3>
<p><img src="/blog/airflow-survey-2022/images/image16.png" alt="alt_text" title="discussions_engagement"></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th></th>
          <th></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td></td>
          <td>No.</td>
          <td>%</td>
      </tr>
      <tr>
          <td>I see them from time to time</td>
          <td>99</td>
          <td>48.3%</td>
      </tr>
      <tr>
          <td>I regularly follow what&rsquo;s being discussed but don&rsquo;t participate</td>
          <td>53</td>
          <td>25.9%</td>
      </tr>
      <tr>
          <td>I didn&rsquo;t know I could</td>
          <td>41</td>
          <td>20.0%</td>
      </tr>
      <tr>
          <td>I actively participate in the discussions</td>
          <td>12</td>
          <td>5.9%</td>
      </tr>
  </tbody>
</table>
<table>
  <thead>
      <tr>
          <th></th>
          <th></th>
          <th></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td></td>
          <td>No.</td>
          <td>%</td>
      </tr>
      <tr>
          <td>I know I can but I do not contribute</td>
          <td>116</td>
          <td>57.1%</td>
      </tr>
      <tr>
          <td>Very rarely when it relates to what I need</td>
          <td>44</td>
          <td>21.7%</td>
      </tr>
      <tr>
          <td>I do not know I could</td>
          <td>30</td>
          <td>14.8%</td>
      </tr>
      <tr>
          <td>I regularly contribute by discussing, reviewing and submitting PR</td>
          <td>13</td>
          <td>6.4%</td>
      </tr>
  </tbody>
</table>
<p>Results related to contributing to Airflow are very similar to those about participating in the Airflow community discussions. Most of the Airflow users who responded to the survey know that they can contribute but do not (57.1%), or contribute very rarely (21.7%). 14.8% of users were not aware they could contribute. Once again, it’s a clear indicator that there is much more to be done to engage our community to be more active contributors and raise the current 6.4% of users who actively contribute.</p>
<h3 id="if-you-do-not-contribute---why">If you do not contribute - why?</h3>
<p><img src="/blog/airflow-survey-2022/images/image18.png" alt="alt_text" title="contribution_reasons"></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th></th>
          <th></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td></td>
          <td>No.</td>
          <td>%</td>
      </tr>
      <tr>
          <td>I have no time to contribute even if I would like to</td>
          <td>65</td>
          <td>38.9%</td>
      </tr>
      <tr>
          <td>I don’t know how to start</td>
          <td>63</td>
          <td>37.7%</td>
      </tr>
      <tr>
          <td>I don’t have a need to contribute</td>
          <td>19</td>
          <td>11.4%</td>
      </tr>
      <tr>
          <td>I didn’t know I could</td>
          <td>12</td>
          <td>7.2%</td>
      </tr>
      <tr>
          <td>My employer has a policy that makes it difficult to contribute</td>
          <td>8</td>
          <td>4.8%</td>
      </tr>
  </tbody>
</table>
<p>According to the survey results, the most important blocker to contributing to Airflow is limited time (38.9%). Surprisingly, a nearly equally important blocker is not knowing how to start (37.7%), and a further 7.2% of respondents did not know that it’s possible to contribute.</p>
<h2 id="the-future-of-airflow-1">The future of Airflow</h2>
<h3 id="in-your-opinion-what-could-be-improved-in-airflow-multiple-choice">In your opinion, what could be improved in Airflow? (multiple choice)</h3>
<table>
  <thead>
      <tr>
          <th></th>
          <th></th>
          <th></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td></td>
          <td>No.</td>
          <td>%</td>
      </tr>
      <tr>
          <td>Web UI</td>
          <td>100</td>
          <td>49.5%</td>
      </tr>
      <tr>
          <td>Logging, monitoring and alerting</td>
          <td>97</td>
          <td>48.0%</td>
      </tr>
      <tr>
          <td>Examples, how-to, onboarding documentation</td>
          <td>74</td>
          <td>36.6%</td>
      </tr>
      <tr>
          <td>Technical documentation</td>
          <td>74</td>
          <td>36.6%</td>
      </tr>
      <tr>
          <td>Scheduler performance</td>
          <td>56</td>
          <td>27.7%</td>
      </tr>
      <tr>
          <td>Reliability</td>
          <td>52</td>
          <td>25.7%</td>
      </tr>
      <tr>
          <td>DAG authoring</td>
          <td>48</td>
          <td>23.8%</td>
      </tr>
      <tr>
          <td>REST API</td>
          <td>43</td>
          <td>21.3%</td>
      </tr>
      <tr>
          <td>Authentication and authorization</td>
          <td>41</td>
          <td>20.3%</td>
      </tr>
      <tr>
          <td>External integration e.g. AWS, GCP, Apache products</td>
          <td>41</td>
          <td>20.3%</td>
      </tr>
      <tr>
          <td>Better support for various deployments (Docker-compose/Nomad/Others)</td>
          <td>39</td>
          <td>19.3%</td>
      </tr>
      <tr>
          <td>Everything works fine for me</td>
          <td>19</td>
          <td>9.4%</td>
      </tr>
      <tr>
          <td>I don’t know</td>
          <td>4</td>
          <td>2.0%</td>
      </tr>
  </tbody>
</table>
<p>The results are quite self-explanatory. According to the survey results, the top area for improvement is still the Airflow web UI (49.5%), closely followed by more telemetry for logging, monitoring and alerting purposes (48%). However, all those efforts should go hand in hand with improved technical documentation (36.6%) and resources about using Airflow, especially given the need to onboard new users (36.6%).</p>
<h3 id="which-features-would-you-like-to-see-in-airflow">Which features would you like to see in Airflow?</h3>
<table>
  <thead>
      <tr>
          <th></th>
          <th></th>
          <th></th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td></td>
          <td>No.</td>
          <td>%</td>
      </tr>
      <tr>
          <td>DAG Versioning</td>
          <td>129</td>
          <td>66.2%</td>
      </tr>
      <tr>
          <td>Dependency management and Data-driven scheduling</td>
          <td>83</td>
          <td>42.6%</td>
      </tr>
      <tr>
          <td>More dynamic task structure</td>
          <td>82</td>
          <td>42.1%</td>
      </tr>
      <tr>
          <td>Multi-Tenancy</td>
          <td>74</td>
          <td>37.9%</td>
      </tr>
      <tr>
          <td>Signal-based scheduling</td>
          <td>67</td>
          <td>34.4%</td>
      </tr>
      <tr>
          <td>Better Security (Isolation)</td>
          <td>65</td>
          <td>33.3%</td>
      </tr>
      <tr>
          <td>Submitting new DAGs externally via API</td>
          <td>53</td>
          <td>27.2%</td>
      </tr>
      <tr>
          <td>Composable Operators</td>
          <td>46</td>
          <td>23.6%</td>
      </tr>
      <tr>
          <td>Support for native cloud executors (AWS/GCP/Azure etc.)</td>
          <td>44</td>
          <td>22.6%</td>
      </tr>
      <tr>
          <td>Better support for Machine Learning</td>
          <td>38</td>
          <td>19.5%</td>
      </tr>
      <tr>
          <td>Remote CLI</td>
          <td>36</td>
          <td>18.5%</td>
      </tr>
      <tr>
          <td>Support for hybrid executors</td>
          <td>22</td>
          <td>11.3%</td>
      </tr>
  </tbody>
</table>
<p>According to the survey results, DAG Versioning (66.2%) is the clear winner among new features, which is no surprise as this feature may positively impact the daily work of Airflow users. It is followed by three other ideas: Dependency management and Data-driven scheduling (42.6%), More dynamic task structure (42.1%) and Multi-Tenancy (37.9%). Another interesting point from that question is that only 11.3% think that support for hybrid executors is needed in Airflow.</p>
<h2 id="data">Data</h2>
<p>If you&rsquo;re interested in taking a look at the raw data yourself, it&rsquo;s available here: <a href="/data/survey-responses/airflow-user-survey-responses-2022.csv.zip">Airflow User Survey 2022.csv</a></p>
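<p>For readers who want to recompute the tables from the raw data, the following is a minimal sketch using only the Python standard library. It assumes the downloaded archive contains a single CSV file with a header row; the function names and the column header in the usage comment are hypothetical, so check the actual header row of the file first. Note also that for multiple-choice questions a single cell may hold several answers, and that the percentages in this post are shares of respondents, so a simple tally like this may not reproduce them exactly.</p>

```python
import csv
import io
import zipfile
from collections import Counter


def tally_answers(zip_path, column):
    """Count how often each non-empty answer appears in one column of the zipped CSV."""
    with zipfile.ZipFile(zip_path) as archive:
        # Assumption: the archive holds one CSV file with a header row.
        csv_name = next(n for n in archive.namelist() if n.lower().endswith(".csv"))
        with archive.open(csv_name) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.DictReader(text)
            return Counter(row[column] for row in reader if row.get(column))


def as_percentages(counts):
    """Convert raw counts to percentages of all counted answers, rounded to 0.1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {answer: round(100 * n / total, 1) for answer, n in counts.items()}


# Hypothetical usage (file name and column header are placeholders):
# counts = tally_answers("airflow-user-survey-responses-2022.csv.zip",
#                        "Which Metadata Database do you use?")
# print(as_percentages(counts))
```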
]]></content>
  </entry>
  
  <entry>
    <title>Airflow Summit 2022</title>
    <link href="/blog/airflow_summit_2022/" rel="alternate"/>
    <id>/blog/airflow_summit_2022/</id>
    <published>2022-05-16T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html">&lt;![CDATA[<p>The biggest Airflow Event of the Year returns May 23–27! Airflow Summit 2022 will bring together the global
community of Apache Airflow practitioners and data leaders.</p>
<h3 id="whats-on-the-agenda">What’s on the Agenda</h3>
<p>During the free conference, you will hear about Apache Airflow best practices, trends in building data
pipelines, data governance, Airflow and machine learning, and the future of Airflow. There will also be
a series of presentations on non-code contributions driving the open-source project.</p>
<h3 id="how-to-attend">How to Attend</h3>
<p>This year’s edition will include a variety of online sessions across different time zones.
Additionally, you can take part in local in-person events organized worldwide for data
communities to watch the event and network.</p>
<h3 id="interested">Interested?</h3>
<p>🪶 <a href="https://www.crowdcast.io/e/airflowsummit2022/register?utm_campaign=Astronomer_marketing&amp;utm_source=Astronomer%20website&amp;utm_medium=website&amp;utm_term=Airflow%20Summit">Register for Airflow Summit 2022</a> today</p>
<p>🤝 <a href="https://airflowsummit.org/in-person-events/">Check out the in-person events</a> planned for Airflow Summit 2022.</p>
]]></content>
  </entry>
  
  <entry>
    <title>Apache Airflow 2.3.0 is here</title>
    <link href="/blog/airflow-2.3.0/" rel="alternate"/>
    <id>/blog/airflow-2.3.0/</id>
    <published>2022-04-30T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html">&lt;![CDATA[<p>Apache Airflow 2.3.0 contains over 700 commits since 2.2.0 and includes 50 new features, 99 improvements, 85 bug fixes, and several doc changes.</p>
<p><strong>Details</strong>:</p>
<p>📦 PyPI: <a href="https://pypi.org/project/apache-airflow/2.3.0/">https://pypi.org/project/apache-airflow/2.3.0/</a> <br>
📚 Docs: <a href="https://airflow.apache.org/docs/apache-airflow/2.3.0/">https://airflow.apache.org/docs/apache-airflow/2.3.0/</a> <br>
🛠️ Release Notes: <a href="https://airflow.apache.org/docs/apache-airflow/2.3.0/release_notes.html">https://airflow.apache.org/docs/apache-airflow/2.3.0/release_notes.html</a> <br>
🐳 Docker Image: <code>docker pull apache/airflow:2.3.0</code> <br>
🚏 Constraints: <a href="https://github.com/apache/airflow/tree/constraints-2.3.0">https://github.com/apache/airflow/tree/constraints-2.3.0</a></p>
<p>As the changelog is quite large, the following are some notable new features that shipped in this release.</p>
<h2 id="dynamic-task-mappingaip-42">Dynamic Task Mapping (AIP-42)</h2>
<p>There&rsquo;s now first-class support for dynamic tasks in Airflow. What this means is that you can generate tasks dynamically at runtime. Much like using a <code>for</code> loop
to create a list of tasks, here you can create the same tasks without having to know the exact number of tasks ahead of time.</p>
<p>You can have a <code>task</code> generate the list to iterate over, which is not possible with a <code>for</code> loop.</p>
<p>Here is an example:</p>
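<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">datetime</span> <span class="kn">import</span> <span class="n">datetime</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">airflow</span> <span class="kn">import</span> <span class="n">DAG</span>
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">airflow.decorators</span> <span class="kn">import</span> <span class="n">task</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nd">@task</span>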
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">make_list</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># This can also be from an API call, checking a database, -- almost anything you like, as long as the</span>
</span></span><span class="line"><span class="cl">    <span class="c1"># resulting list/dictionary can be stored in the current XCom backend.</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="p">{</span><span class="s2">&#34;a&#34;</span><span class="p">:</span> <span class="s2">&#34;b&#34;</span><span class="p">},</span> <span class="s2">&#34;str&#34;</span><span class="p">]</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nd">@task</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">consumer</span><span class="p">(</span><span class="n">arg</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="n">arg</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">with</span> <span class="n">DAG</span><span class="p">(</span><span class="n">dag_id</span><span class="o">=</span><span class="s2">&#34;dynamic-map&#34;</span><span class="p">,</span> <span class="n">start_date</span><span class="o">=</span><span class="n">datetime</span><span class="p">(</span><span class="mi">2022</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">2</span><span class="p">))</span> <span class="k">as</span> <span class="n">dag</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">    <span class="n">consumer</span><span class="o">.</span><span class="n">expand</span><span class="p">(</span><span class="n">arg</span><span class="o">=</span><span class="n">make_list</span><span class="p">())</span>
</span></span></code></pre></div><p>More information can be found here: <a href="https://airflow.apache.org/docs/apache-airflow/2.3.0/concepts/dynamic-task-mapping.html">Dynamic Task Mapping</a></p>
<h2 id="grid-view-replaces-tree-view">Grid View replaces Tree View</h2>
<p>Grid view replaces tree view in Airflow 2.3.0.</p>
<p><strong>Screenshots</strong>:
<img src="/blog/airflow-2.3.0/grid-view.png" alt="The new grid view"></p>
<h2 id="purge-history-from-metadata-database">Purge history from metadata database</h2>
<p>Airflow 2.3.0 introduces a new <code>airflow db clean</code> command that can be used to purge old data from the metadata database.</p>
<p>Use this command when you want to reduce the size of the metadata database.</p>
<p>More information can be found here: <a href="https://airflow.apache.org/docs/apache-airflow/2.3.0/usage-cli.html#purge-history-from-metadata-database">Purge history from metadata database</a></p>
<h2 id="localkubernetesexecutor">LocalKubernetesExecutor</h2>
<p>There is a new executor named LocalKubernetesExecutor. It lets you run some tasks with the LocalExecutor and others with the KubernetesExecutor in the same deployment, selected by each task&rsquo;s queue.</p>
<p>More information can be found here: <a href="https://airflow.apache.org/docs/apache-airflow/2.3.0/executor/local_kubernetes.html">LocalKubernetesExecutor</a></p>
<h2 id="dagprocessormanager-as-standalone-process-aip-43">DagProcessorManager as standalone process (AIP-43)</h2>
<p>As of 2.3.0, you can run the DagProcessorManager as a standalone process. Because DagProcessorManager runs user code, separating it from the scheduler process and running it as an independent process on a different host is a good idea.</p>
<p>The <code>airflow dag-processor</code> CLI command starts the DagProcessorManager in a separate process. Before you can run the DagProcessorManager as a standalone process, you need to set <a href="https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html#standalone_dag_processor">[scheduler] standalone_dag_processor</a> to <code>True</code>.</p>
<p>More information can be found here: <a href="https://airflow.apache.org/docs/apache-airflow/2.3.0/cli-and-env-variables-ref.html#dag-processor">dag-processor CLI command</a></p>
<h2 id="json-serialization-for-connections">JSON serialization for connections</h2>
<p>You can now create connections using the <code>json</code> serialization format.</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">airflow connections add <span class="s1">&#39;my_prod_db&#39;</span> <span class="se">\
</span></span></span><span class="line"><span class="cl"><span class="se"></span>    --conn-json <span class="s1">&#39;{
</span></span></span><span class="line"><span class="cl"><span class="s1">        &#34;conn_type&#34;: &#34;my-conn-type&#34;,
</span></span></span><span class="line"><span class="cl"><span class="s1">        &#34;login&#34;: &#34;my-login&#34;,
</span></span></span><span class="line"><span class="cl"><span class="s1">        &#34;password&#34;: &#34;my-password&#34;,
</span></span></span><span class="line"><span class="cl"><span class="s1">        &#34;host&#34;: &#34;my-host&#34;,
</span></span></span><span class="line"><span class="cl"><span class="s1">        &#34;port&#34;: 1234,
</span></span></span><span class="line"><span class="cl"><span class="s1">        &#34;schema&#34;: &#34;my-schema&#34;,
</span></span></span><span class="line"><span class="cl"><span class="s1">        &#34;extra&#34;: {
</span></span></span><span class="line"><span class="cl"><span class="s1">            &#34;param1&#34;: &#34;val1&#34;,
</span></span></span><span class="line"><span class="cl"><span class="s1">            &#34;param2&#34;: &#34;val2&#34;
</span></span></span><span class="line"><span class="cl"><span class="s1">        }
</span></span></span><span class="line"><span class="cl"><span class="s1">    }&#39;</span>
</span></span></code></pre></div><p>You can also use <code>json</code> serialization format when setting the connection in environment variables.</p>
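<p>As a minimal sketch of the environment-variable form, the snippet below serializes the same hypothetical connection to JSON and stores it under <code>AIRFLOW_CONN_MY_PROD_DB</code> (Airflow reads connections from variables named <code>AIRFLOW_CONN_{CONN_ID}</code>). The field values are the placeholders from the CLI example, not real credentials.</p>

```python
import json
import os

# Hypothetical connection details, mirroring the CLI example above.
conn = {
    "conn_type": "my-conn-type",
    "login": "my-login",
    "password": "my-password",
    "host": "my-host",
    "port": 1234,
    "schema": "my-schema",
    "extra": {"param1": "val1", "param2": "val2"},
}

# Airflow looks up connections in environment variables named
# AIRFLOW_CONN_{CONN_ID}, with the connection id upper-cased.
env_name = "AIRFLOW_CONN_MY_PROD_DB"
os.environ[env_name] = json.dumps(conn)

print(f"{env_name}={os.environ[env_name]}")
```

<p>Setting <code>os.environ</code> only affects the current process and its children. In a real deployment the variable would normally be exported in the environment that starts the Airflow components.</p>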
<p>More information can be found here: <a href="https://airflow.apache.org/docs/apache-airflow/2.3.0/howto/connection.html">JSON serialization for connections</a></p>
<h2 id="airflow-db-downgrade-and-offline-generation-of-sql-scripts">Airflow <code>db downgrade</code> and Offline generation of SQL scripts</h2>
<p>Airflow 2.3.0 introduced a new command <code>airflow db downgrade</code> that will downgrade the database to your chosen version.</p>
<p>You can also generate the downgrade/upgrade SQL scripts for your database and run them manually against your database, or just view the SQL queries that would be run by the downgrade/upgrade command.</p>
<p>More information can be found here: <a href="https://airflow.apache.org/docs/apache-airflow/2.3.0/usage-cli.html#downgrading-airflow">Airflow <code>db downgrade</code> and Offline generation of SQL scripts</a></p>
<h2 id="reuse-of-decorated-tasks">Reuse of decorated tasks</h2>
<p>You can now reuse decorated tasks across your dag files. A decorated task has an <code>override</code> method that allows you to override its arguments.</p>
<p>Here&rsquo;s an example:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="nd">@task</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">add_task</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">    <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&#34;Task args: x=</span><span class="si">{</span><span class="n">x</span><span class="si">}</span><span class="s2">, y=</span><span class="si">{</span><span class="n">y</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="k">return</span> <span class="n">x</span> <span class="o">+</span> <span class="n">y</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nd">@dag</span><span class="p">(</span><span class="n">start_date</span><span class="o">=</span><span class="n">datetime</span><span class="p">(</span><span class="mi">2022</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">))</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">mydag</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">    <span class="n">start</span> <span class="o">=</span> <span class="n">add_task</span><span class="o">.</span><span class="n">override</span><span class="p">(</span><span class="n">task_id</span><span class="o">=</span><span class="s2">&#34;start&#34;</span><span class="p">)(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">3</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="n">start</span> <span class="o">&gt;&gt;</span> <span class="n">add_task</span><span class="o">.</span><span class="n">override</span><span class="p">(</span><span class="n">task_id</span><span class="o">=</span><span class="sa">f</span><span class="s2">&#34;add_start_</span><span class="si">{</span><span class="n">i</span><span class="si">}</span><span class="s2">&#34;</span><span class="p">)(</span><span class="n">start</span><span class="p">,</span> <span class="n">i</span><span class="p">)</span>
</span></span></code></pre></div><p>More information can be found here: <a href="https://airflow.apache.org/docs/apache-airflow/2.3.0/tutorial_taskflow_api.html#reusing-a-decorated-task">Reuse of decorated tasks</a></p>
<h2 id="other-small-features">Other small features</h2>
<p>This isn’t a comprehensive list, but some noteworthy or interesting small features include:</p>
<ul>
<li>Support different timeout value for dag file parsing</li>
<li><code>airflow dags reserialize</code> command to reserialize dags</li>
<li>Events Timetable</li>
<li>SmoothOperator - Operator that does literally nothing except log a YouTube link to
Sade&rsquo;s &ldquo;Smooth Operator&rdquo;. Enjoy!</li>
</ul>
<h2 id="contributors">Contributors</h2>
<p>Thanks to everyone who contributed to this release: Ash Berlin-Taylor, Brent Bovenzi, Daniel Standish, Elad, Ephraim Anierobi, Jarek Potiuk, Jed Cunningham, Josh Fell, Kamil Breguła, Kanthi, Kaxil Naik, Khalid Mammadov, Malthe Borch, Ping Zhang, Tzu-ping Chung and many others who keep making Airflow better for everyone.</p>
]]></content>
  </entry>
  
  <entry>
    <title>What&#39;s new in Apache Airflow 2.2.0</title>
    <link href="/blog/airflow-2.2.0/" rel="alternate"/>
    <id>/blog/airflow-2.2.0/</id>
    <published>2021-10-11T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html">&lt;![CDATA[<p>I’m proud to announce that Apache Airflow 2.2.0 has been released. It contains over 600 commits since 2.1.4 and includes 30 new features, 84 improvements, 85 bug fixes, and many internal and doc changes.</p>
<p><strong>Details</strong>:</p>
<p>📦 PyPI: <a href="https://pypi.org/project/apache-airflow/2.2.0/">https://pypi.org/project/apache-airflow/2.2.0/</a> <br>
📚 Docs: <a href="https://airflow.apache.org/docs/apache-airflow/2.2.0/">https://airflow.apache.org/docs/apache-airflow/2.2.0/</a> <br>
🛠️ Changelog: <a href="https://airflow.apache.org/docs/apache-airflow/2.2.0/changelog.html">https://airflow.apache.org/docs/apache-airflow/2.2.0/changelog.html</a> <br>
🐳 Docker Image: <code>docker pull apache/airflow:2.2.0</code> <br>
🚏 Constraints: <a href="https://github.com/apache/airflow/tree/constraints-2.2.0">https://github.com/apache/airflow/tree/constraints-2.2.0</a></p>
<p>As the changelog is quite large, the following are some notable new features that shipped in this release.</p>
<h2 id="custom-timetables-aip-39">Custom Timetables (AIP-39)</h2>
<p>Airflow has historically used cron expressions and timedeltas to represent when a DAG should run. This worked for a lot of use cases, but not all. For example, running daily on Monday-Friday, but not on weekends wasn’t possible.</p>
<p>To provide more scheduling flexibility, determining when a DAG should run is now done with Timetables. Backwards compatibility has been maintained: cron expressions and timedeltas are still fully supported. Timetables are also pluggable, so you can add your own custom timetable to fit your needs! For example, you could write a timetable to schedule a DagRun</p>
<p><code>execution_date</code> has long been confusing to new Airflowers, so as part of this change a new concept named <code>data_interval</code> has been added to Airflow to replace it. It is the period of data that a task should operate on. The following are now available:</p>
<ul>
<li><code>logical_date</code> (aka <code>execution_date</code>)</li>
<li><code>data_interval_start</code> (same value as <code>execution_date</code> for cron)</li>
<li><code>data_interval_end</code> (aka <code>next_execution_date</code>)</li>
</ul>
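<p>To make the relationship between these values concrete, here is a small sketch in plain Python. It assumes a DAG scheduled daily at midnight UTC and does not use any Airflow API. The variable names simply mirror the template values listed above.</p>

```python
from datetime import datetime, timedelta, timezone

# Assumption: a DAG scheduled daily at midnight UTC ("0 0 * * *").
schedule_step = timedelta(days=1)

# The run that processes the data for 2021-10-10.
logical_date = datetime(2021, 10, 10, tzinfo=timezone.utc)

# For a cron schedule, the interval starts at the logical date
# and ends one schedule step later.
data_interval_start = logical_date
data_interval_end = logical_date + schedule_step

print(data_interval_start.isoformat())
print(data_interval_end.isoformat())
```

<p>The run itself starts only after <code>data_interval_end</code> has passed, which is why the old <code>execution_date</code> name was easy to misread.</p>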
<p>If you write your own timetables, keep in mind they should be idempotent and fast as they are used in the scheduler to create DagRuns.</p>
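<p>The weekday-only case mentioned earlier shows what idempotent and fast means in practice. The sketch below is not the Airflow <code>Timetable</code> interface. It only illustrates the kind of pure date arithmetic a custom timetable might wrap, and the function name <code>next_weekday_interval</code> is hypothetical.</p>

```python
from datetime import datetime, timedelta, timezone


def next_weekday_interval(after):
    """Return the (start, end) of the first full weekday on or after `after`."""
    # Round up to the next midnight unless `after` is already at midnight.
    start = after.replace(hour=0, minute=0, second=0, microsecond=0)
    if start != after:
        start += timedelta(days=1)
    # Skip Saturday (5) and Sunday (6).
    while start.weekday() in (5, 6):
        start += timedelta(days=1)
    return start, start + timedelta(days=1)


saturday = datetime(2021, 10, 9, tzinfo=timezone.utc)
interval = next_weekday_interval(saturday)
print(interval[0].date(), interval[1].date())
```

<p>The function depends only on its argument, so calling it twice with the same input gives the same interval, and it does no I/O that could slow the scheduler down.</p>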
<p>More information can be found at: <a href="https://airflow.apache.org/docs/apache-airflow/stable/howto/timetable.html">Customizing DAG Scheduling with Timetables</a></p>
<h2 id="deferrable-tasks-aip-40">Deferrable Tasks (AIP-40)</h2>
<p>Deferrable tasks allow operators or sensors to defer themselves until a light-weight async check passes, at which point they can resume executing. Most importantly, this returns the worker slot, and any resources used by it, to Airflow. This makes simple things like monitoring a job in an external system or watching for an event much cheaper.</p>
<p>To support this feature, a new component has been added to Airflow, the triggerer, which is the daemon process that runs the asyncio event loop.</p>
<p>Airflow 2.2.0 ships with 2 deferrable sensors, <code>DateTimeSensorAsync</code> and <code>TimeDeltaSensorAsync</code>, both of which are drop-in replacements for the corresponding existing sensors.</p>
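<p>The idea behind deferral can be sketched with plain <code>asyncio</code>: many waiting checks share one event loop instead of each holding a worker slot. This is only an illustration of the concept, not the triggerer&rsquo;s implementation, and <code>wait_until_ready</code> is a hypothetical stand-in for a trigger&rsquo;s async check.</p>

```python
import asyncio


async def wait_until_ready(name, delay):
    # Stand-in for a light-weight async check, such as polling an external job.
    await asyncio.sleep(delay)
    return name


async def main():
    # Both waits run concurrently on a single event loop.
    return await asyncio.gather(
        wait_until_ready("job-a", 0.02),
        wait_until_ready("job-b", 0.01),
    )


finished = asyncio.run(main())
print(finished)
```

<p><code>asyncio.gather</code> returns results in the order the awaitables were passed, regardless of which wait finishes first.</p>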
<p>More information can be found at:</p>
<p><a href="https://airflow.apache.org/docs/apache-airflow/stable/concepts/deferring.html">Deferrable Operators &amp; Triggers</a></p>
<h2 id="custom-task-decorators-and-taskdocker">Custom <code>@task</code> decorators and <code>@task.docker</code></h2>
<p>Airflow 2.2.0 allows providers to create custom <code>@task</code> decorators in the TaskFlow interface.</p>
<p>The <code>@task.docker</code> decorator is one such decorator that allows you to run a function in a Docker container. Airflow handles getting the code into the container and returning the XCom - you just worry about your function. This is particularly useful when you have conflicting dependencies between Airflow itself and tasks you need to run.</p>
<p>More information on creating custom <code>@task</code> decorators can be found at: <a href="https://airflow.apache.org/docs/apache-airflow/stable/howto/create-custom-decorator.html">Creating Custom @task Decorators</a></p>
<p>More information on the <code>@task.docker</code> decorator can be found at: <a href="https://airflow.apache.org/docs/apache-airflow/stable/tutorial_taskflow_api.html#using-the-taskflow-api-with-docker-or-virtual-environments">Using the Taskflow API with Docker or Virtual Environments</a></p>
<h2 id="validation-of-dag-params">Validation of DAG params</h2>
<p>You can now apply validation on DAG params by passing a <code>Param</code> object for each param. The <code>Param</code> object supports the full <a href="https://json-schema.org/draft/2020-12/json-schema-validation.html">json-schema validation specifications</a>.</p>
<p>Currently, this only functions with manually triggered DAGs, but it does set the stage for future params-related functionality.</p>
<p>More information can be found at: <a href="https://airflow.apache.org/docs/apache-airflow/stable/concepts/params.html">Params</a></p>
<h2 id="other-small-features">Other small features</h2>
<p>This isn’t a comprehensive list, but some noteworthy or interesting small features include:</p>
<ul>
<li>Testing Connections from the UI - test that the credentials for your Connection actually work</li>
<li>Duplicating Connections from the UI</li>
<li>DAGs “Next run” info is shown in the UI, including when the run will actually start</li>
<li><code>airflow standalone</code> command runs all of the Airflow components directly without Docker - great for local development</li>
</ul>
<h2 id="contributors">Contributors</h2>
<p>Thanks to everyone who contributed to this release: Andrew Godwin, Ash Berlin-Taylor, Brent Bovenzi, Elad Kalif, Ephraim Anierobi, James Timmins, Jarek Potiuk, Jed Cunningham, Josh Fell, Kamil Breguła, Kaxil Naik, Malthe Borch, Sam Wheating, Sumit Maheshwari, Tzu-ping Chung and many others.</p>
]]></content>
  </entry>
  
  <entry>
    <title>Airflow Summit 2021</title>
    <link href="/blog/airflow_summit_2021/" rel="alternate"/>
    <id>/blog/airflow_summit_2021/</id>
    <published>2021-03-21T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html">&lt;![CDATA[<h2 id="airflow-summit-2021-is-here">Airflow Summit 2021 is here!</h2>
<p>The summit will be held online, July 8-16, 2021. Join us from all over the world to find
out how Airflow is being used by leading companies, what its roadmap is and how you can
participate in its development.</p>
<h2 id="useful-information">Useful information:</h2>
<ul>
<li>The official website: <a href="https://airflowsummit.org">https://airflowsummit.org</a></li>
<li>The call for proposals is open until <strong>12 April 2021</strong>. To submit your talk, go to <a href="https://sessionize.com/airflow-summit-2021/">https://sessionize.com/airflow-summit-2021/</a></li>
<li>If you have any questions, reach out to us via <a href="mailto:info@airflowsummit.org">info@airflowsummit.org</a></li>
</ul>
]]></content>
  </entry>
  
  <entry>
    <title>Airflow Survey 2020</title>
    <link href="/blog/airflow-survey-2020/" rel="alternate"/>
    <id>/blog/airflow-survey-2020/</id>
    <published>2021-03-09T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html">&lt;![CDATA[<h1 id="apache-airflow-survey-2020">Apache Airflow Survey 2020</h1>
<p>The world of data processing tools is growing steadily. Apache Airflow already seems to be considered a
crucial component of this complex ecosystem. We observe steady growth in the number of users as well as in
the number of active contributors. So listening to and understanding our community is of high importance.</p>
<p>It&rsquo;s worth noting that the 2020 survey was still mostly about the 1.10.X versions of Apache Airflow, and
possibly many drawbacks were addressed in the 2.0 version that was released in December 2020. But if this
is true, we will learn next year!</p>
<h2 id="overview-of-the-user">Overview of the user</h2>
<p><img src="/blog/airflow-survey-2020/What_best_describes_your_current_occupation.png" alt=""></p>
<p><strong>What best describes your current occupation? (single choice)</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Data Engineer</td>
          <td>115</td>
          <td>56.65</td>
      </tr>
      <tr>
          <td>Developer</td>
          <td>28</td>
          <td>13.79</td>
      </tr>
      <tr>
          <td>DevOps</td>
          <td>17</td>
          <td>8.37</td>
      </tr>
      <tr>
          <td>Solutions Architect</td>
          <td>14</td>
          <td>6.9</td>
      </tr>
      <tr>
          <td>Data Scientist</td>
          <td>12</td>
          <td>5.91</td>
      </tr>
      <tr>
          <td>Other</td>
          <td>10</td>
          <td>4.93</td>
      </tr>
      <tr>
          <td>Data Analyst</td>
          <td>4</td>
          <td>1.97</td>
      </tr>
      <tr>
          <td>Support Engineer</td>
          <td>3</td>
          <td>1.48</td>
      </tr>
  </tbody>
</table>
<p>These results are not a surprise, as Airflow is a tool dedicated to data-related tasks. The majority of
our users are data engineers, scientists or analysts. The 2020 results are similar to <a href="https://airflow.apache.org/blog/airflow-survey/">those from 2019</a>, with
a slight visible increase in ML use cases.</p>
<p>Additionally, 79% of users use Airflow on a daily basis and 16% interact with it at least once a week.</p>
<p><strong>How many people work in your company? (single choice)</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>200+</td>
          <td>107</td>
          <td>52.71</td>
      </tr>
      <tr>
          <td>51-200</td>
          <td>44</td>
          <td>21.67</td>
      </tr>
      <tr>
          <td>11-50</td>
          <td>37</td>
          <td>18.23</td>
      </tr>
      <tr>
          <td>1-10</td>
          <td>15</td>
          <td>7.39</td>
      </tr>
  </tbody>
</table>
<p><strong>How many people in your company use Airflow? (single choice)</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>1-5</td>
          <td>84</td>
          <td>41.38</td>
      </tr>
      <tr>
          <td>6-20</td>
          <td>75</td>
          <td>36.95</td>
      </tr>
      <tr>
          <td>21-50</td>
          <td>23</td>
          <td>11.33</td>
      </tr>
      <tr>
          <td>50+</td>
          <td>21</td>
          <td>10.34</td>
      </tr>
  </tbody>
</table>
<p>Airflow is software that is used and trusted by big companies. We can also see that Airflow can work
fine for teams of different sizes. However, in some cases users may use multiple Airflow instances.</p>
<p><strong>Are you considering moving to other workflow engines? (single choice)</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>No, we are happy with Airflow</td>
          <td>174</td>
          <td>85.71</td>
      </tr>
      <tr>
          <td>Yes</td>
          <td>29</td>
          <td>14.29</td>
      </tr>
  </tbody>
</table>
<p>Nearly 1 out of 7 users is considering migrating to other workflow engines. Their decision is usually
justified by the need for an <strong>easier workflow writing experience</strong> (12.32%), <strong>better UI/UX</strong> and a <strong>faster scheduler</strong>
(8.37% each).</p>
<p>While the first point may be addressed by the <a href="http://airflow.apache.org/docs/apache-airflow/stable/concepts.html#taskflow-api">TaskFlow API</a> in Airflow 2.0, the other two are definitely addressed
in the new major version. And the early feedback from 2.0 users seems to confirm it.</p>
<p>The alternative engines considered by users are mainly Prefect and Argo. Some participants also mentioned
Luigi, Kubeflow or custom solutions.</p>
<p><strong>Are you or your team actively participating in Airflow development - contributing? (single choice)</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>I wish we could</td>
          <td>99</td>
          <td>48.77</td>
      </tr>
      <tr>
          <td>No</td>
          <td>59</td>
          <td>29.06</td>
      </tr>
      <tr>
          <td>Yes</td>
          <td>45</td>
          <td>22.17</td>
      </tr>
  </tbody>
</table>
<p>This is a really heart-warming result. It means that more than 1 out of 5 users actively contribute to our project!
But it would be good to learn whether something other than time is stopping people who wish to contribute
from doing so. If there are other obstacles, we would definitely like to learn about them so we can improve.
That said, if you know something we can improve, please reach out via Slack, the dev list or GitHub
discussions.</p>
<p><strong>How likely are you to recommend Apache Airflow? (single choice)</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>2020 %</th>
          <th>2019 %</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Very Likely</td>
          <td>125</td>
          <td>61.58</td>
          <td>45.45</td>
      </tr>
      <tr>
          <td>Likely</td>
          <td>62</td>
          <td>30.54</td>
          <td>40.26</td>
      </tr>
      <tr>
          <td>Neutral</td>
          <td>11</td>
          <td>5.42</td>
          <td>10.71</td>
      </tr>
      <tr>
          <td>Unlikely</td>
          <td>3</td>
          <td>1.48</td>
          <td>2.60</td>
      </tr>
      <tr>
          <td>Very unlikely</td>
          <td>2</td>
          <td>0.99</td>
          <td>0.97</td>
      </tr>
  </tbody>
</table>
<p>Here is good news! It seems that people are more willing to recommend Apache Airflow than the year before.</p>
<p><strong>What is your source of information about Airflow? (multiple choice)</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Documentation</td>
          <td>154</td>
          <td>75.86</td>
      </tr>
      <tr>
          <td>Airflow website</td>
          <td>139</td>
          <td>68.47</td>
      </tr>
      <tr>
          <td>Slack</td>
          <td>128</td>
          <td>63.05</td>
      </tr>
      <tr>
          <td>Github</td>
          <td>127</td>
          <td>62.56</td>
      </tr>
      <tr>
          <td>Stack Overflow</td>
          <td>72</td>
          <td>35.47</td>
      </tr>
      <tr>
          <td>Airflow Summit Videos</td>
          <td>44</td>
          <td>21.67</td>
      </tr>
      <tr>
          <td>The dev mailing list</td>
          <td>33</td>
          <td>16.26</td>
      </tr>
      <tr>
          <td>Awesome Apache Airflow repository</td>
          <td>21</td>
          <td>10.34</td>
      </tr>
      <tr>
          <td>Other</td>
          <td>15</td>
          <td>7.39</td>
      </tr>
  </tbody>
</table>
<p>Here we see that Airflow documentation is the crucial source of information. What&rsquo;s interesting is that more
than 60% of users are getting information from Github and Slack channels.</p>
<p><img src="/blog/airflow-survey-2020/Where_are_you_based.png" alt=""></p>
<h2 id="airflow-uses-cases">Airflow use cases</h2>
<p><strong>Do you have any customisation of Airflow? (single choice)</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>No, we use vanilla Airflow</td>
          <td>154</td>
          <td>75.86</td>
      </tr>
      <tr>
          <td>Yes, we have small patches (no fork)</td>
          <td>34</td>
          <td>16.75</td>
      </tr>
      <tr>
          <td>Yes, we have separate fork</td>
          <td>15</td>
          <td>7.39</td>
      </tr>
  </tbody>
</table>
<p><strong>When onboarding new members to Airflow, what is the biggest problem? (multiple choice)</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>No guide on best practises on developing DAGs</td>
          <td>102</td>
          <td>50.25</td>
      </tr>
      <tr>
          <td>There is no easy option to launch Airflow</td>
          <td>64</td>
          <td>31.53</td>
      </tr>
      <tr>
          <td>Small number of tutorials on different aspects of using Airflow</td>
          <td>57</td>
          <td>28.08</td>
      </tr>
      <tr>
          <td>Documentation is not clear enough</td>
          <td>53</td>
          <td>26.11</td>
      </tr>
      <tr>
          <td>There is no easy option to deploy DAGs to an Airflow instance</td>
          <td>52</td>
          <td>25.62</td>
      </tr>
      <tr>
          <td>No problems</td>
          <td>34</td>
          <td>16.75</td>
      </tr>
      <tr>
          <td>Small number of blogs regarding Airflow</td>
          <td>30</td>
          <td>14.78</td>
      </tr>
  </tbody>
</table>
<p><strong>Which interface(s) of Airflow do you use as part of your current role? (multiple choice)</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Original Airflow Graphical User Interface</td>
          <td>199</td>
          <td>98.03</td>
      </tr>
      <tr>
          <td>CLI</td>
          <td>88</td>
          <td>43.35</td>
      </tr>
      <tr>
          <td>API</td>
          <td>48</td>
          <td>23.65</td>
      </tr>
      <tr>
          <td>Custom (own created) Airflow Graphical User Interface</td>
          <td>12</td>
          <td>5.91</td>
      </tr>
      <tr>
          <td>Other</td>
          <td>3</td>
          <td>1.48</td>
      </tr>
  </tbody>
</table>
<p><strong>Do you combine multiple DAGs? (multiple choice)</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Yes, by triggering another DAG</td>
          <td>87</td>
          <td>42.86</td>
      </tr>
      <tr>
          <td>No, I don&rsquo;t combine multiple DAGs</td>
          <td>79</td>
          <td>38.92</td>
      </tr>
      <tr>
          <td>Yes, through SubDAG</td>
          <td>40</td>
          <td>19.7</td>
      </tr>
      <tr>
          <td>Other</td>
          <td>18</td>
          <td>8.87</td>
      </tr>
  </tbody>
</table>
<p><strong>How do you integrate with external services? (multiple choice)</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Using existing dedicated operators / hooks</td>
          <td>147</td>
          <td>72.41</td>
      </tr>
      <tr>
          <td>Using Bash / Python operator</td>
          <td>140</td>
          <td>68.97</td>
      </tr>
      <tr>
          <td>Using own custom operators / hooks</td>
          <td>138</td>
          <td>67.98</td>
      </tr>
      <tr>
          <td>Other</td>
          <td>12</td>
          <td>5.91</td>
      </tr>
  </tbody>
</table>
<p><strong>What external services do you use in your Airflow DAGs? (multiple choice)</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Amazon Web Services</td>
          <td>121</td>
          <td>59.61</td>
      </tr>
      <tr>
          <td>Internal company systems</td>
          <td>113</td>
          <td>55.67</td>
      </tr>
      <tr>
          <td>Google Cloud Platform / Google APIs</td>
          <td>97</td>
          <td>47.78</td>
      </tr>
      <tr>
          <td>Hadoop / Spark / Flink / Other Apache software</td>
          <td>72</td>
          <td>35.47</td>
      </tr>
      <tr>
          <td>Microsoft Azure</td>
          <td>21</td>
          <td>10.34</td>
      </tr>
      <tr>
          <td>Other</td>
          <td>19</td>
          <td>9.36</td>
      </tr>
      <tr>
          <td>I do not use external services in my Airflow DAGs</td>
          <td>5</td>
          <td>2.46</td>
      </tr>
  </tbody>
</table>
<p><img src="/blog/airflow-survey-2020/What_external_services_do_you_use_in_your_Airflow_DAGs.png" alt=""></p>
<p><strong>Do you use Airflow Plugins? If yes, what do you use them for? (multiple choice)</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Adding new operators/sensors and hooks</td>
          <td>119</td>
          <td>58.62</td>
      </tr>
      <tr>
          <td>I don&rsquo;t use Airflow plugins</td>
          <td>69</td>
          <td>33.99</td>
      </tr>
      <tr>
          <td>Adding AppBuilder views &amp; menu items</td>
          <td>27</td>
          <td>13.3</td>
      </tr>
      <tr>
          <td>Adding new executors</td>
          <td>17</td>
          <td>8.37</td>
      </tr>
      <tr>
          <td>Adding OperatorExtraLinks</td>
          <td>13</td>
          <td>6.4</td>
      </tr>
  </tbody>
</table>
<p><strong>Do you use Airflow&rsquo;s data lineage feature? (single choice)</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>No, I will use such feature if fully supported in Airflow</td>
          <td>105</td>
          <td>51.72</td>
      </tr>
      <tr>
          <td>No, data lineage isn’t a concern for my usage.</td>
          <td>68</td>
          <td>33.5</td>
      </tr>
      <tr>
          <td>Yes, I use another data lineage product</td>
          <td>24</td>
          <td>11.82</td>
      </tr>
      <tr>
          <td>Yes, I use custom implementation</td>
          <td>5</td>
          <td>2.46</td>
      </tr>
      <tr>
          <td>Yes, I use Airflow&rsquo;s experimental data lineage feature</td>
          <td>1</td>
          <td>0.49</td>
      </tr>
  </tbody>
</table>
<p>When asked which lineage product they use, respondents named everything from custom tools
to well-known products such as Amundsen, Atlas, or dbt.</p>
<h2 id="deployment">Deployment</h2>
<p><strong>How many active DAGs do you have in your largest Airflow instance? (open question)</strong></p>
<table>
  <thead>
      <tr>
          <th>Number of DAGs</th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>&lt; 20</td>
          <td>64</td>
          <td>32</td>
      </tr>
      <tr>
          <td>21-40</td>
          <td>33</td>
          <td>16</td>
      </tr>
      <tr>
          <td>41-60</td>
          <td>13</td>
          <td>6</td>
      </tr>
      <tr>
          <td>61-100</td>
          <td>32</td>
          <td>16</td>
      </tr>
      <tr>
          <td>101-200</td>
          <td>31</td>
          <td>15</td>
      </tr>
      <tr>
          <td>201-300</td>
          <td>8</td>
          <td>4</td>
      </tr>
      <tr>
          <td>301-999</td>
          <td>12</td>
          <td>6</td>
      </tr>
      <tr>
          <td>1000+</td>
          <td>10</td>
          <td>5</td>
      </tr>
  </tbody>
</table>
<p><strong>What is the maximum number of tasks that you have used in one DAG? (open question)</strong></p>
<table>
  <thead>
      <tr>
          <th>Number of tasks</th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>&lt; 10</td>
          <td>42</td>
          <td>21</td>
      </tr>
      <tr>
          <td>11-20</td>
          <td>31</td>
          <td>15</td>
      </tr>
      <tr>
          <td>21-30</td>
          <td>15</td>
          <td>7</td>
      </tr>
      <tr>
          <td>31-40</td>
          <td>11</td>
          <td>5</td>
      </tr>
      <tr>
          <td>41-50</td>
          <td>22</td>
          <td>11</td>
      </tr>
      <tr>
          <td>51-100</td>
          <td>39</td>
          <td>19</td>
      </tr>
      <tr>
          <td>101-200</td>
          <td>16</td>
          <td>8</td>
      </tr>
      <tr>
          <td>201-500</td>
          <td>16</td>
          <td>8</td>
      </tr>
      <tr>
          <td>501+</td>
          <td>11</td>
          <td>5</td>
      </tr>
  </tbody>
</table>
<p><strong>Which version of Airflow do you use currently? (single choice)</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>1.10.14</td>
          <td>55</td>
          <td>27.09</td>
      </tr>
      <tr>
          <td>2.0.0+</td>
          <td>45</td>
          <td>22.17</td>
      </tr>
      <tr>
          <td>1.10.12</td>
          <td>27</td>
          <td>13.3</td>
      </tr>
      <tr>
          <td>1.10.10</td>
          <td>26</td>
          <td>12.81</td>
      </tr>
      <tr>
          <td>1.10.11</td>
          <td>14</td>
          <td>6.9</td>
      </tr>
      <tr>
          <td>1.10.5 or older</td>
          <td>10</td>
          <td>4.93</td>
      </tr>
      <tr>
          <td>1.10.9</td>
          <td>8</td>
          <td>3.94</td>
      </tr>
      <tr>
          <td>1.10.13</td>
          <td>7</td>
          <td>3.45</td>
      </tr>
      <tr>
          <td>1.10.6</td>
          <td>4</td>
          <td>1.97</td>
      </tr>
      <tr>
          <td>1.10.7</td>
          <td>4</td>
          <td>1.97</td>
      </tr>
      <tr>
          <td>1.10.8</td>
          <td>3</td>
          <td>1.48</td>
      </tr>
  </tbody>
</table>
<p>This was probably one of the most important questions in the survey. While it&rsquo;s good to see
that more than 60% of users use one of the three latest Airflow versions, it&rsquo;s worrying that the rest
are using versions that are old or have known security vulnerabilities.</p>
<p>Additionally, more than 20% of users are already on 2.0.0+, which is encouraging.</p>
<p><strong>What meta-database do you use? (single choice)</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Postgres 12</td>
          <td>36</td>
          <td>17.73</td>
      </tr>
      <tr>
          <td>Postgres 9.6</td>
          <td>33</td>
          <td>16.26</td>
      </tr>
      <tr>
          <td>Postgres 11</td>
          <td>31</td>
          <td>15.27</td>
      </tr>
      <tr>
          <td>MySQL 5.7</td>
          <td>27</td>
          <td>13.3</td>
      </tr>
      <tr>
          <td>MySQL 8.0</td>
          <td>20</td>
          <td>9.85</td>
      </tr>
      <tr>
          <td>Postgres 10</td>
          <td>20</td>
          <td>9.85</td>
      </tr>
      <tr>
          <td>Other</td>
          <td>19</td>
          <td>9.36</td>
      </tr>
      <tr>
          <td>Postgres 13</td>
          <td>18</td>
          <td>8.87</td>
      </tr>
  </tbody>
</table>
<p>This means that about 69% of users choose Postgres as their meta-database.
MySQL is the choice of nearly 24% of users. The other responses included MySQL variants
such as MariaDB and cloud-hosted databases such as Cloud SQL (used by Google Composer) or AWS Aurora.</p>
<p>It&rsquo;s good to know that users tend to avoid SQLite in production deployments!</p>
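<p>The database family totals can be re-derived from the table rows. The following is a minimal sketch that simply sums the percentages listed in the table; the helper name <code>family_share</code> is illustrative and not part of the survey tooling.</p>

```python
# Percentages copied from the meta-database table above.
shares = {
    "Postgres 12": 17.73,
    "Postgres 9.6": 16.26,
    "Postgres 11": 15.27,
    "MySQL 5.7": 13.3,
    "MySQL 8.0": 9.85,
    "Postgres 10": 9.85,
    "Other": 9.36,
    "Postgres 13": 8.87,
}


def family_share(prefix):
    """Sum the share of every row whose label starts with `prefix`."""
    total = sum(pct for name, pct in shares.items() if name.startswith(prefix))
    return round(total, 2)


postgres_share = family_share("Postgres")
mysql_share = family_share("MySQL")
print(f"Postgres: {postgres_share}%  MySQL: {mysql_share}%")
```

<p>Summing only the named rows gives roughly 68% for Postgres and 23% for MySQL, slightly below the rounded figures quoted in the text. One possible explanation, which is an assumption since the post does not say, is that some of the &ldquo;Other&rdquo; answers were counted towards one family or the other.</p>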
<p><strong>What executor type do you use? (single choice)</strong></p>
<p><img src="/blog/airflow-survey-2020/What_executor_type_do_you_use.png" alt=""></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>2020</th>
          <th>2019</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Celery</td>
          <td>100</td>
          <td>49.26%</td>
          <td>44.81%</td>
      </tr>
      <tr>
          <td>Kubernetes</td>
          <td>48</td>
          <td>23.65%</td>
          <td>16.88%</td>
      </tr>
      <tr>
          <td>Local</td>
          <td>40</td>
          <td>19.7%</td>
          <td>27.60%</td>
      </tr>
      <tr>
          <td>Sequential</td>
          <td>10</td>
          <td>4.93%</td>
          <td>7.14%</td>
      </tr>
      <tr>
          <td>Other</td>
          <td>5</td>
          <td>2.46%</td>
          <td>3.57%</td>
      </tr>
  </tbody>
</table>
<p>Compared to the previous year, more users now run the Celery and
Kubernetes executors, while LocalExecutor usage dropped by nearly 8 percentage points. This may
suggest that users&rsquo; deployments are growing and need more scalable solutions.</p>
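<p>The year-over-year shift for each executor can be read off the table as a difference in percentage points. A short sketch of that arithmetic, using only the values from the table above:</p>

```python
# (2020 %, 2019 %) pairs copied from the executor table above.
executor_share = {
    "Celery": (49.26, 44.81),
    "Kubernetes": (23.65, 16.88),
    "Local": (19.70, 27.60),
    "Sequential": (4.93, 7.14),
    "Other": (2.46, 3.57),
}

# Change in percentage points: 2020 share minus 2019 share.
change = {
    name: round(y2020 - y2019, 2)
    for name, (y2020, y2019) in executor_share.items()
}

for name, delta in change.items():
    print(f"{name:12}{delta:+.2f}")
```

<p>Celery and Kubernetes are the only executors with a positive change, while Local shows the largest drop.</p>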
<p>Among CeleryExecutor users, 78% use Redis as a broker, 19% use RabbitMQ, and the rest
use other brokers or are not sure what is used in their deployments.</p>
<p><strong>What metrics do you use to monitor Airflow? (multiple choice)</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>I do not use monitoring</td>
          <td>65</td>
          <td>32.02</td>
      </tr>
      <tr>
          <td>External monitoring service</td>
          <td>60</td>
          <td>29.56</td>
      </tr>
      <tr>
          <td>Information from metadatabase</td>
          <td>51</td>
          <td>25.12</td>
      </tr>
      <tr>
          <td>Statsd</td>
          <td>49</td>
          <td>24.14</td>
      </tr>
      <tr>
          <td>Other</td>
          <td>31</td>
          <td>15.27</td>
      </tr>
  </tbody>
</table>
<p>The other responses mostly named the tools in use,
including DataDog and the Prometheus exporter.</p>
<p><strong>How do you deploy Airflow? (single choice)</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>On virtual machines (for example using AWS EC2)</td>
          <td>64</td>
          <td>31.53</td>
      </tr>
      <tr>
          <td>Using a managed service like Astronomer, Google Composer or AWS MWAA</td>
          <td>35</td>
          <td>17.24</td>
      </tr>
      <tr>
          <td>On Kubernetes (using custom deployments)</td>
          <td>29</td>
          <td>14.29</td>
      </tr>
      <tr>
          <td>On premises</td>
          <td>28</td>
          <td>13.79</td>
      </tr>
      <tr>
          <td>On Kubernetes (using another helm chart)</td>
          <td>20</td>
          <td>9.85</td>
      </tr>
      <tr>
          <td>On Kubernetes (using Apache Airflow&rsquo;s helm chart)</td>
          <td>17</td>
          <td>8.37</td>
      </tr>
      <tr>
          <td>Other</td>
          <td>12</td>
          <td>5.91</td>
      </tr>
  </tbody>
</table>
<p>Nearly 33% of users deploy Airflow using some kind of Kubernetes deployment. This is about
10 percent more than in 2019. There&rsquo;s also a slight increase in the use of Airflow via
managed services (14.61% in 2019).</p>
<p><strong>Do you use containerisation for deployment? (single choice)</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Yes, using helm chart / kubernetes</td>
          <td>58</td>
          <td>28.57</td>
      </tr>
      <tr>
          <td>No, I don’t use containerisation</td>
          <td>57</td>
          <td>28.08</td>
      </tr>
      <tr>
          <td>Yes, single docker image</td>
          <td>49</td>
          <td>24.14</td>
      </tr>
      <tr>
          <td>Yes, using docker compose</td>
          <td>39</td>
          <td>19.21</td>
      </tr>
  </tbody>
</table>
<p>Among users who do not use Kubernetes-based deployments, 58% use containerisation. About
42% of those users use docker-compose for deployments.</p>
<p><strong>How do you distribute your DAGs? (single choice)</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Using a synchronizing process (Git sync, GCS fuse, etc)</td>
          <td>79</td>
          <td>38.92</td>
      </tr>
      <tr>
          <td>Bake them into the docker image</td>
          <td>56</td>
          <td>27.59</td>
      </tr>
      <tr>
          <td>Shared files system</td>
          <td>34</td>
          <td>16.75</td>
      </tr>
      <tr>
          <td>Other</td>
          <td>20</td>
          <td>9.85</td>
      </tr>
      <tr>
          <td>I don’t know</td>
          <td>14</td>
          <td>6.9</td>
      </tr>
  </tbody>
</table>
<p>The most popular way of distributing DAGs seems to be using a synchronizing process. About
40% of users use this process together with Kubernetes deployments.</p>
<h2 id="future-of-airflow">Future of Airflow</h2>
<p><strong>In your opinion, what could be improved in Airflow? (multiple choice)</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Web UI</td>
          <td>100</td>
          <td>49.26</td>
      </tr>
      <tr>
          <td>Examples, how-to, onboarding documentation</td>
          <td>90</td>
          <td>44.33</td>
      </tr>
      <tr>
          <td>Logging, monitoring and alerting</td>
          <td>90</td>
          <td>44.33</td>
      </tr>
      <tr>
          <td>Technical documentation</td>
          <td>90</td>
          <td>44.33</td>
      </tr>
      <tr>
          <td>Scheduler performance</td>
          <td>83</td>
          <td>40.89</td>
      </tr>
      <tr>
          <td>DAG authoring</td>
          <td>64</td>
          <td>31.53</td>
      </tr>
      <tr>
          <td>Authentication and authorization</td>
          <td>58</td>
          <td>28.57</td>
      </tr>
      <tr>
          <td>REST API</td>
          <td>51</td>
          <td>25.12</td>
      </tr>
      <tr>
          <td>Other</td>
          <td>44</td>
          <td>21.67</td>
      </tr>
      <tr>
          <td>Reliability</td>
          <td>41</td>
          <td>20.2</td>
      </tr>
      <tr>
          <td>External integration e.g. AWS, GCP, Apache products</td>
          <td>36</td>
          <td>17.73</td>
      </tr>
      <tr>
          <td>Security</td>
          <td>28</td>
          <td>13.79</td>
      </tr>
      <tr>
          <td>CLI</td>
          <td>20</td>
          <td>9.85</td>
      </tr>
      <tr>
          <td>Everything works fine for me</td>
          <td>14</td>
          <td>6.9</td>
      </tr>
      <tr>
          <td>I don’t know</td>
          <td>4</td>
          <td>1.97</td>
      </tr>
  </tbody>
</table>
<p><strong>Which features would most interest you? (multiple choice)</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>DAG versioning</td>
          <td>109</td>
          <td>53.69</td>
      </tr>
      <tr>
          <td>Builtin statistics</td>
          <td>71</td>
          <td>34.98</td>
      </tr>
      <tr>
          <td>Improved data lineage</td>
          <td>65</td>
          <td>32.02</td>
      </tr>
      <tr>
          <td>Scheduling at the start of the interval</td>
          <td>63</td>
          <td>31.03</td>
      </tr>
      <tr>
          <td>Stateless workers</td>
          <td>59</td>
          <td>29.06</td>
      </tr>
      <tr>
          <td>More options to configure schedules (time units, increments)</td>
          <td>57</td>
          <td>28.08</td>
      </tr>
      <tr>
          <td>Multi-tenant deployment</td>
          <td>49</td>
          <td>24.14</td>
      </tr>
      <tr>
          <td>DAG fetcher (AIP-5)</td>
          <td>39</td>
          <td>19.21</td>
      </tr>
      <tr>
          <td>Generic transfer operator</td>
          <td>34</td>
          <td>16.75</td>
      </tr>
      <tr>
          <td>Other</td>
          <td>33</td>
          <td>16.26</td>
      </tr>
      <tr>
          <td>I have everything I need</td>
          <td>11</td>
          <td>5.42</td>
      </tr>
      <tr>
          <td>Nothing</td>
          <td>11</td>
          <td>5.42</td>
      </tr>
  </tbody>
</table>
<p><strong>Will you consider migrating to Airflow 2.0? (single choice)</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Yes, as soon as possible</td>
          <td>81</td>
          <td>39.9</td>
      </tr>
      <tr>
          <td>Yes, once it’s mature (for example after 2.1)</td>
          <td>72</td>
          <td>35.47</td>
      </tr>
      <tr>
          <td>I am already using Airflow 2.0+</td>
          <td>39</td>
          <td>19.21</td>
      </tr>
      <tr>
          <td>I don&rsquo;t know yet</td>
          <td>8</td>
          <td>3.94</td>
      </tr>
      <tr>
          <td>No, I do not plan to migrate</td>
          <td>3</td>
          <td>1.48</td>
      </tr>
  </tbody>
</table>
<p><strong>What are the features of Airflow 2.0 you are most excited about? (multiple choice)</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>General performance improvements</td>
          <td>133</td>
          <td>65.52</td>
      </tr>
      <tr>
          <td>Refreshed WebUI</td>
          <td>102</td>
          <td>50.25</td>
      </tr>
      <tr>
          <td>Scheduler HA</td>
          <td>99</td>
          <td>48.77</td>
      </tr>
      <tr>
          <td>Official docker image</td>
          <td>84</td>
          <td>41.38</td>
      </tr>
      <tr>
          <td>@task decorator</td>
          <td>56</td>
          <td>27.59</td>
      </tr>
      <tr>
          <td>Official helm chart</td>
          <td>51</td>
          <td>25.12</td>
      </tr>
      <tr>
          <td>Providers packages</td>
          <td>41</td>
          <td>20.2</td>
      </tr>
      <tr>
          <td>Configurable XCom backends</td>
          <td>33</td>
          <td>16.26</td>
      </tr>
      <tr>
          <td>CeleryKubernetesExecutor</td>
          <td>31</td>
          <td>15.27</td>
      </tr>
      <tr>
          <td>Other</td>
          <td>12</td>
          <td>5.91</td>
      </tr>
  </tbody>
</table>
<h2 id="summary">Summary</h2>
<p>From an open-source point of view, it is good to see that many people would love to contribute to Apache Airflow.
This means there are resources that, if unleashed, may make our community even stronger. From a product perspective, it is important to know that users are usually running the latest versions of our software and
are willing to upgrade to new ones.</p>
<p>Finally, there are still some things to improve: documentation, onboarding guides, and plug-and-play Airflow
deployments. However, we hope that as adoption increases, more people will be willing
to share their experience and tools.</p>
<h2 id="data">Data</h2>
<p>If you think I missed something or you simply want to look for insights on your own, the data is available for you here: <a href="/data/survey-responses/airflow-user-survey-responses-2020.csv.zip">Airflow User Survey 2020.csv</a></p>
]]></content>
  </entry>
  
  <entry>
    <title>Apache Airflow 2.0 is here!</title>
    <link href="/blog/airflow-two-point-oh-is-here/" rel="alternate"/>
    <id>/blog/airflow-two-point-oh-is-here/</id>
    <published>2020-12-17T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html">&lt;![CDATA[<p>I am proud to announce that Apache Airflow 2.0.0 has been released.</p>
<p>The full changelog is about 3,000 lines long (already excluding everything backported to 1.10), so for now I&rsquo;ll simply share some of the major features in 2.0.0 compared to 1.10.14:</p>
<h2 id="a-new-way-of-writing-dags-the-taskflow-api-aip-31">A new way of writing DAGs: the TaskFlow API (AIP-31)</h2>
<p>(Known in 2.0.0alphas as Functional DAGs.)</p>
<p>DAGs are now much nicer to author, especially when using PythonOperator. Dependencies are handled more clearly and XCom is nicer to use.</p>
<p>Read more here:</p>
<p><a href="http://airflow.apache.org/docs/apache-airflow/stable/tutorial_taskflow_api.html">TaskFlow API Tutorial</a> <br>
<a href="https://airflow.apache.org/docs/apache-airflow/stable/concepts.html#decorated-flows">TaskFlow API Documentation</a></p>
<p>A quick teaser of what DAGs can now look like:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">airflow.decorators</span> <span class="kn">import</span> <span class="n">dag</span><span class="p">,</span> <span class="n">task</span>
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">airflow.utils.dates</span> <span class="kn">import</span> <span class="n">days_ago</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="nd">@dag</span><span class="p">(</span><span class="n">default_args</span><span class="o">=</span><span class="p">{</span><span class="s1">&#39;owner&#39;</span><span class="p">:</span> <span class="s1">&#39;airflow&#39;</span><span class="p">},</span> <span class="n">schedule_interval</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">start_date</span><span class="o">=</span><span class="n">days_ago</span><span class="p">(</span><span class="mi">2</span><span class="p">))</span>
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">tutorial_taskflow_api_etl</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">   <span class="nd">@task</span>
</span></span><span class="line"><span class="cl">   <span class="k">def</span> <span class="nf">extract</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">       <span class="k">return</span> <span class="p">{</span><span class="s2">&#34;1001&#34;</span><span class="p">:</span> <span class="mf">301.27</span><span class="p">,</span> <span class="s2">&#34;1002&#34;</span><span class="p">:</span> <span class="mf">433.21</span><span class="p">,</span> <span class="s2">&#34;1003&#34;</span><span class="p">:</span> <span class="mf">502.22</span><span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">   <span class="nd">@task</span>
</span></span><span class="line"><span class="cl">   <span class="k">def</span> <span class="nf">transform</span><span class="p">(</span><span class="n">order_data_dict</span><span class="p">:</span> <span class="nb">dict</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">dict</span><span class="p">:</span>
</span></span><span class="line"><span class="cl">       <span class="n">total_order_value</span> <span class="o">=</span> <span class="mi">0</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">       <span class="k">for</span> <span class="n">value</span> <span class="ow">in</span> <span class="n">order_data_dict</span><span class="o">.</span><span class="n">values</span><span class="p">():</span>
</span></span><span class="line"><span class="cl">           <span class="n">total_order_value</span> <span class="o">+=</span> <span class="n">value</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">       <span class="k">return</span> <span class="p">{</span><span class="s2">&#34;total_order_value&#34;</span><span class="p">:</span> <span class="n">total_order_value</span><span class="p">}</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">   <span class="nd">@task</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">   <span class="k">def</span> <span class="nf">load</span><span class="p">(</span><span class="n">total_order_value</span><span class="p">:</span> <span class="nb">float</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">       <span class="nb">print</span><span class="p">(</span><span class="s2">&#34;Total order value is: </span><span class="si">%.2f</span><span class="s2">&#34;</span> <span class="o">%</span> <span class="n">total_order_value</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl">   <span class="n">order_data</span> <span class="o">=</span> <span class="n">extract</span><span class="p">()</span>
</span></span><span class="line"><span class="cl">   <span class="n">order_summary</span> <span class="o">=</span> <span class="n">transform</span><span class="p">(</span><span class="n">order_data</span><span class="p">)</span>
</span></span><span class="line"><span class="cl">   <span class="n">load</span><span class="p">(</span><span class="n">order_summary</span><span class="p">[</span><span class="s2">&#34;total_order_value&#34;</span><span class="p">])</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="n">tutorial_etl_dag</span> <span class="o">=</span> <span class="n">tutorial_taskflow_api_etl</span><span class="p">()</span>
</span></span></code></pre></div><h2 id="fully-specified-rest-api-aip-32">Fully specified REST API (AIP-32)</h2>
<p>We now have a fully supported, no-longer-experimental API with a comprehensive OpenAPI specification.</p>
<p>Read more here:</p>
<p><a href="http://airflow.apache.org/docs/apache-airflow/stable/stable-rest-api-ref.html">REST API Documentation</a>.</p>
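<p>As a rough illustration of calling the stable API, the sketch below builds a request for the DAG list endpoint (<code>/api/v1/dags</code>) using only the Python standard library. The base URL, the <code>admin</code>/<code>admin</code> credentials, and the helper name <code>build_list_dags_request</code> are placeholders, and the example assumes the deployment has a basic-auth API backend enabled; adjust all of these to your own setup.</p>

```python
import base64
import urllib.request


def build_list_dags_request(base_url, username, password, limit=10):
    """Build (but do not send) a GET request for the stable API's DAG list."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/v1/dags?limit={limit}",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
        method="GET",
    )


# Placeholder host and credentials; replace with your own.
request = build_list_dags_request("http://localhost:8080", "admin", "admin")
print(request.get_method(), request.full_url)

# To send it against a running webserver:
#     with urllib.request.urlopen(request) as response:
#         print(response.read().decode("utf-8"))
```

<p>Keeping request construction separate from sending makes it easy to check the URL and headers before pointing the code at a real instance.</p>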
<h2 id="massive-scheduler-performance-improvements">Massive Scheduler performance improvements</h2>
<p>As part of AIP-15 (Scheduler HA+performance) and other work Kamil did, we significantly improved the performance of the Airflow Scheduler. It now starts tasks much, MUCH quicker.</p>
<p>Over at Astronomer.io we&rsquo;ve <a href="https://www.astronomer.io/blog/airflow-2-scheduler">benchmarked the scheduler—it&rsquo;s fast</a> (we had to triple check the numbers as we didn&rsquo;t quite believe them at first!)</p>
<h2 id="scheduler-is-now-ha-compatible-aip-15">Scheduler is now HA compatible (AIP-15)</h2>
<p>It&rsquo;s now possible and supported to run more than a single scheduler instance. This is super useful for both resiliency (in case a scheduler goes down) and scheduling performance.</p>
<p>To fully use this feature you need Postgres 9.6+ or MySQL 8+ (MySQL 5 and MariaDB won&rsquo;t work with more than one scheduler, I&rsquo;m afraid).</p>
<p>There&rsquo;s no config or other set up required to run more than one scheduler—just start up a scheduler somewhere else (ensuring it has access to the DAG files) and it will cooperate with your existing schedulers through the database.</p>
<p>For more information, read the <a href="http://airflow.apache.org/docs/apache-airflow/stable/scheduler.html#running-more-than-one-scheduler">Scheduler HA documentation</a>.</p>
<h2 id="task-groups-aip-34">Task Groups (AIP-34)</h2>
<p>SubDAGs were commonly used for grouping tasks in the UI, but they had many drawbacks in their execution behaviour (primarily that they only executed a single task in parallel!) To improve this experience, we’ve introduced &ldquo;Task Groups&rdquo;: a method for organizing tasks which provides the same grouping behaviour as a subdag without any of the execution-time drawbacks.</p>
<p>SubDAGs will still work for now, but we think that any previous use of SubDAGs can now be replaced with task groups. If you find an example where this isn&rsquo;t the case, please let us know by opening an issue on GitHub.</p>
<p>For more information, check out the <a href="http://airflow.apache.org/docs/apache-airflow/stable/concepts.html#taskgroup">Task Group documentation</a>.</p>
<h2 id="refreshed-ui">Refreshed UI</h2>
<p>We&rsquo;ve given the Airflow UI <a href="https://github.com/apache/airflow/pull/11195">a visual refresh</a> and updated some of the styling.</p>
<p><img src="/blog/airflow-two-point-oh-is-here/airflow-2.0-ui.gif" alt="Airflow 2.0’s new UI"></p>
<p>We have also added an option to auto-refresh task states in Graph View so you no longer need to continuously press the refresh button :).</p>
<p>Check out <a href="http://airflow.apache.org/docs/apache-airflow/stable/ui.html">the screenshots in the docs</a> for more.</p>
<h2 id="smart-sensors-for-reduced-load-from-sensors-aip-17">Smart Sensors for reduced load from sensors (AIP-17)</h2>
<p>If you make heavy use of sensors in your Airflow cluster, you might find that sensor execution takes up a significant proportion of your cluster even with &ldquo;reschedule&rdquo; mode. To improve this, we&rsquo;ve added a new mode called &ldquo;Smart Sensors&rdquo;.</p>
<p>This feature is in &ldquo;early-access&rdquo;: it&rsquo;s been well-tested by Airbnb and is &ldquo;stable&rdquo;/usable, but we reserve the right to make backwards incompatible changes to it in a future release (if we have to. We&rsquo;ll try very hard not to!)</p>
<p>Read more about it in the <a href="https://airflow.apache.org/docs/apache-airflow/stable/smart-sensor.html">Smart Sensors documentation</a>.</p>
<h2 id="simplified-kubernetesexecutor">Simplified KubernetesExecutor</h2>
<p>For Airflow 2.0, we have re-architected the KubernetesExecutor in a fashion that is simultaneously faster, easier to understand, and more flexible for Airflow users. Users will now be able to access the full Kubernetes API to create a .yaml <code>pod_template_file</code> instead of specifying parameters in their airflow.cfg.</p>
<p>We have also replaced the <code>executor_config</code> dictionary with the <code>pod_override</code> parameter, which takes a Kubernetes V1Pod object for a 1:1 setting override. These changes have removed over three thousand lines of code from the KubernetesExecutor, which makes it run faster and creates fewer potential errors.</p>
<p>Read more here:</p>
<p><a href="https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html?highlight=pod_override#pod-template-file">Docs on pod_template_file</a> <br>
<a href="https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html?highlight=pod_override#pod-override">Docs on pod_override</a></p>
<h2 id="airflow-core-and-providers-splitting-airflow-into-60-packages">Airflow core and providers: Splitting Airflow into 60+ packages</h2>
<p>Airflow 2.0 is not a monolithic &ldquo;one to rule them all&rdquo; package. We’ve split Airflow into core and 61 (for now) provider packages. Each provider package is for either a particular external service (Google, Amazon, Microsoft, Snowflake), a database (Postgres, MySQL), or a protocol (HTTP/FTP). Now you can create a custom Airflow installation from &ldquo;building&rdquo; blocks and choose only what you need, plus add whatever other requirements you might have. Some of the common providers are installed automatically (ftp, http, imap, sqlite) as they are commonly used. Other providers are automatically installed when you choose appropriate extras when installing Airflow.</p>
<p>The provider architecture should make it much easier to get a fully customized, yet consistent runtime with the right set of Python dependencies.</p>
<p>But that’s not all: you can write your own custom providers and add things like custom connection types, customizations of the Connection Forms, and extra links to your operators in a manageable way. You can build your own provider and install it as a Python package and have your customizations visible right in the Airflow UI.</p>
<p>Our very own Jarek Potiuk has written about <a href="https://higrys.medium.com/airflow-2-0-providers-1bd21ba3bd93">providers in much more detail</a> on Jarek&rsquo;s blog.</p>
<p>Docs on the <a href="http://airflow.apache.org/docs/apache-airflow-providers/">providers concept and writing custom providers</a> <br>
Docs on <a href="http://airflow.apache.org/docs/apache-airflow-providers/packages-ref.html">all providers packages available</a></p>
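<p>Provider packages are ordinary Python distributions published under the <code>apache-airflow-providers-</code> name prefix, so one way to see which ones ended up in an environment is to inspect the installed package metadata. The helper below is a minimal sketch using the standard library; <code>installed_providers</code> is an illustrative name, not an Airflow API.</p>

```python
from importlib import metadata

PROVIDER_PREFIX = "apache-airflow-providers-"


def installed_providers():
    """Return {package name: version} for provider distributions found here."""
    found = {}
    for dist in metadata.distributions():
        # Distributions with broken metadata may lack a name; skip those.
        name = dist.metadata.get("Name") or ""
        if name.lower().startswith(PROVIDER_PREFIX):
            found[name] = dist.version
    return dict(sorted(found.items()))


for name, version in installed_providers().items():
    print(f"{name}=={version}")
```

<p>In an environment without Airflow installed, the function returns an empty dictionary and nothing is printed.</p>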
<h2 id="security">Security</h2>
<p>As part of the Airflow 2.0 effort, there has been a conscious focus on security and on reducing areas of exposure. This is represented across different functional areas in different forms. For example, in the new REST API, all operations now require authorization. Similarly, in the configuration settings, the Fernet key is now required to be specified.</p>
<h2 id="configuration">Configuration</h2>
<p>Configuration in the form of the airflow.cfg file has been rationalized further into distinct sections, specifically around &ldquo;core&rdquo;. Additionally, a significant number of configuration options have been deprecated or moved to individual component-specific configuration files, such as the pod-template-file for Kubernetes execution-related configuration.</p>
<h2 id="thanks-to-all-of-you">Thanks to all of you</h2>
<p>We&rsquo;ve tried to make as few breaking changes as possible and to provide a deprecation path in the code, especially in the case of anything called in the DAG. That said, please read through UPDATING.md to check what might affect you. For example: we have re-organized the layout of operators (they now all live under airflow.providers.*) but the old names should continue to work - you&rsquo;ll just notice a lot of DeprecationWarnings that need to be fixed up.</p>
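<p>Python hides many DeprecationWarnings by default, so it can help to surface them explicitly while migrating. The sketch below shows the general pattern with the standard <code>warnings</code> module; <code>old_style_import</code> is a hypothetical stand-in for code that still uses a deprecated path, not a real Airflow function.</p>

```python
import warnings


def old_style_import():
    """Hypothetical stand-in for a deprecated code path that still works."""
    warnings.warn(
        "This module is deprecated. Please use the new provider path instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return "operator"


# Record every warning instead of letting the default filters hide them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_style_import()

deprecations = [w for w in caught if issubclass(w.category, DeprecationWarning)]
for w in deprecations:
    print(f"{w.filename}:{w.lineno}: {w.message}")
```

<p>A stricter variant is to run the interpreter with <code>-W error::DeprecationWarning</code>, which turns each such warning into an exception so that no deprecated usage goes unnoticed.</p>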
<p>Thank you so much to all the contributors who got us to this point, in no particular order: Kaxil Naik, Daniel Imberman, Jarek Potiuk, Tomek Urbaszek, Kamil Breguła, Gerard Casas Saez, Xiaodong DENG, Kevin Yang, James Timmins, Yingbo Wang, Qian Yu, Ryan Hamilton and the 100s of others who keep making Airflow better for everyone.</p>
]]></content>
  </entry>
  
  <entry>
    <title>Journey with Airflow as an Outreachy Intern</title>
    <link href="/blog/experience-with-airflow-as-an-outreachy-intern/" rel="alternate"/>
    <id>/blog/experience-with-airflow-as-an-outreachy-intern/</id>
    <published>2020-08-30T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html">&lt;![CDATA[<p><a href="https://www.outreachy.org/">Outreachy</a> is a program which organises three-month paid internships with FOSS
projects for people who are typically underrepresented in those projects.</p>
<h3 id="contribution-period">Contribution Period</h3>
<p>The first thing I had to do was choose a project under an organisation. After going through all the projects
I chose “Extending the REST API of Apache Airflow”, because I had a good idea of what REST APIs are, so I
thought it would be easier to get started with the contributions. The next step was to set up Airflow’s dev
environment, which, thanks to <a href="https://github.com/apache/airflow/blob/master/BREEZE.rst">Breeze</a>, was a breeze.
I had never contributed to FOSS before, so this part was overwhelming, but there were plenty of issues
labelled “good first issues” with detailed descriptions, and some even had code snippets, which nudged
me in the right direction. These things about Airflow and the positive vibes from the community were the reasons
why I chose to stick with Airflow as my Outreachy project.</p>
<h3 id="internship-period">Internship Period</h3>
<p>My first PR was followed by many new experiences, one of them being that I introduced a
<a href="https://github.com/apache/airflow/pull/7680#issuecomment-619763051">bug</a> in it ;).
Nonetheless, it made me familiar with the feedback loop, and the feedback on my subsequent
<a href="https://github.com/apache/airflow/pulls?q=is%3Apr&#43;author%3AOmairK&#43;">PRs</a> was the focal point of the overall
learning experience I went through, which boosted my confidence to contribute more and move out of my comfort zone.
I wanted to learn more about what happens under Airflow’s hood, so I started filtering out recent PRs
dealing with different components and going through the code changes along with the discussion, which helped me
get a better understanding of the whole workflow. <a href="https://lists.apache.org/list.html?dev@airflow.apache.org">Airflow’s mailing list</a>
was also a great source of knowledge.</p>
<p>The API-related PRs that I worked on helped me with some important concepts, such as:</p>
<ol>
<li>
<p><a href="https://github.com/apache/airflow/pull/9329">Pool CRUD endpoints</a> where pools limit the execution parallelism.</p>
</li>
<li>
<p><a href="https://github.com/apache/airflow/pull/9597">Tasks</a> determine the actual work that has to be carried out.</p>
</li>
<li>
<p><a href="https://github.com/apache/airflow/pull/9473">DAG</a> which represents the structure for a collection
of tasks. It keeps track of tasks, their dependencies and the sequence in which they have to run.</p>
</li>
<li>
<p><a href="https://github.com/apache/airflow/pull/9473">Dag Runs</a> that are the instantiation of DAG(s) in time.</p>
</li>
</ol>
<p>Through actively and passively participating in discussions I learnt that even if there is a difference of opinion
one can always learn from the different approaches, and <a href="https://github.com/apache/airflow/pull/8721">this PR</a> with
more than 300 comments is the proof of it. I also started reviewing small PRs, which gave me the amazing opportunity
to interact with new people. Throughout my internship I learnt a lot about different frameworks and technologies,
but the biggest takeaway for me was that code is read more often than it&rsquo;s written, and I started writing code with
that in mind.</p>
<h3 id="wrapping-up">Wrapping Up</h3>
<p>So with my project of extending Airflow’s REST API as well as the Outreachy internship coming to an end, I would like
to thank my mentors <a href="https://github.com/potiuk">Jarek Potiuk</a>, <a href="https://github.com/kaxil">Kaxil Naik</a> and
<a href="https://github.com/mik-laj">Kamil Breguła</a> for the patience and the time they invested in mentoring me, and
the Airflow community for making me feel so welcome. I plan to stick around and contribute to give back to the
community that has made my summer one to remember.</p>
]]></content>
  </entry>
  
  <entry>
    <title>Apache Airflow 1.10.12</title>
    <link href="/blog/airflow-1.10.12/" rel="alternate"/>
    <id>/blog/airflow-1.10.12/</id>
    <published>2020-08-25T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html">&lt;![CDATA[<p>Airflow 1.10.12 contains 113 commits since 1.10.11 and includes 5 new features, 23 improvements, 23 bug fixes,
and several doc changes.</p>
<p><strong>Details</strong>:</p>
<ul>
<li><strong>PyPI</strong>: <a href="https://pypi.org/project/apache-airflow/1.10.12/">https://pypi.org/project/apache-airflow/1.10.12/</a></li>
<li><strong>Docs</strong>: <a href="https://airflow.apache.org/docs/1.10.12/">https://airflow.apache.org/docs/1.10.12/</a></li>
<li><strong>Changelog</strong>: <a href="http://airflow.apache.org/docs/1.10.12/changelog.html">http://airflow.apache.org/docs/1.10.12/changelog.html</a></li>
</ul>
<p><strong>Airflow 1.10.11 has breaking changes with respect to
KubernetesExecutor &amp; KubernetesPodOperator, so I recommend that users upgrade directly to Airflow 1.10.12 instead</strong>.</p>
<p>Some of the noteworthy new features (user-facing) are:</p>
<ul>
<li><a href="https://github.com/apache/airflow/pull/8560">Allow defining custom XCom class</a></li>
<li><a href="https://github.com/apache/airflow/pull/9645">Get Airflow configs with sensitive data from Secret Backends</a></li>
<li><a href="https://github.com/apache/airflow/pull/10282">Add AirflowClusterPolicyViolation support to Airflow local settings</a></li>
</ul>
<h3 id="allow-defining-custom-xcom-class">Allow defining Custom XCom class</h3>
<p>Until Airflow 1.10.11, XCom data was only stored in the Airflow metadata database. From Airflow 1.10.12, users
can define custom XCom classes. This allows users to transfer larger data between tasks.
An example would be storing XCom data in an S3 or GCS bucket when the data that needs to be stored is larger
than <code>XCom.MAX_XCOM_SIZE</code> (48 KB).</p>
<p><strong>PR</strong>: <a href="https://github.com/apache/airflow/pull/8560">https://github.com/apache/airflow/pull/8560</a></p>
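<p>To make the idea concrete, here is a minimal sketch of the pattern, written so that it runs without Airflow installed. Everything in it is an assumption for illustration: <code>ObjectStoreXComSketch</code>, <code>FAKE_BUCKET</code> and the <code>objstore://</code> prefix are hypothetical, and the method names only mirror the serialize/deserialize split one would expect from a custom XCom class. See the PR above for the actual hooks.</p>

```python
import json

# Hypothetical in-memory stand-in for an S3 or GCS bucket.
FAKE_BUCKET = {}

MAX_INLINE_BYTES = 48 * 1024  # mirrors the 48 KB limit quoted above


class ObjectStoreXComSketch:
    """Stand-in for a custom XCom class; it does not import Airflow."""

    PREFIX = "objstore://"

    @staticmethod
    def serialize_value(value):
        payload = json.dumps(value).encode("utf-8")
        if len(payload) <= MAX_INLINE_BYTES:
            return payload  # small enough to store directly
        key = "xcom/{}".format(len(FAKE_BUCKET))
        FAKE_BUCKET[key] = payload
        # Only a short reference to the stored object is returned.
        return json.dumps(ObjectStoreXComSketch.PREFIX + key).encode("utf-8")

    @staticmethod
    def deserialize_value(stored):
        value = json.loads(stored.decode("utf-8"))
        if isinstance(value, str) and value.startswith(ObjectStoreXComSketch.PREFIX):
            key = value[len(ObjectStoreXComSketch.PREFIX):]
            return json.loads(FAKE_BUCKET[key].decode("utf-8"))
        return value
```

<p>The point of the design is that the metadata database only ever holds a short reference, while the large payload lives in external storage. A real implementation would also need to handle string values that happen to start with the reference prefix.</p>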
<h3 id="get-airflow-configs-with-sensitive-data-from-secret-backends">Get Airflow configs with sensitive data from Secret Backends</h3>
<p>Users can now get the following Airflow configs from a Secrets Backend such as HashiCorp Vault:</p>
<ul>
<li><code>sql_alchemy_conn</code> in [core] section</li>
<li><code>fernet_key</code> in [core] section</li>
<li><code>broker_url</code> in [celery] section</li>
<li><code>flower_basic_auth</code> in [celery] section</li>
<li><code>result_backend</code> in [celery] section</li>
<li><code>password</code> in [atlas] section</li>
<li><code>smtp_password</code> in [smtp] section</li>
<li><code>bind_password</code> in [ldap] section</li>
<li><code>git_password</code> in [kubernetes] section</li>
</ul>
<p>Further improving Airflow&rsquo;s Secret Management story, from Airflow 1.10.12, users don&rsquo;t need to hardcode
the <strong>sensitive</strong> config value in airflow.cfg, nor do they need to use an environment variable to set this config.</p>
<p>For example, the metadata database connection string can be set in airflow.cfg like this:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-ini" data-lang="ini"><span class="line"><span class="cl"><span class="k">[core]</span>
</span></span><span class="line"><span class="cl"><span class="na">sql_alchemy_conn_secret</span> <span class="o">=</span> <span class="s">sql_alchemy_conn</span>
</span></span></code></pre></div><p>This retrieves the config option from the configured Secrets Backend.</p>
<p>As you can see you just need to add a <code>_secret</code> suffix at the end of the actual config option
and the value needs to be the <strong>key</strong> which the Secrets backend will look for.</p>
<p>Similarly, <code>_secret</code> config options can also be set using a corresponding environment variable. For example:</p>
<pre tabindex="0"><code>export AIRFLOW__CORE__SQL_ALCHEMY_CONN_SECRET=sql_alchemy_conn
</code></pre><p>More details: <a href="http://airflow.apache.org/docs/1.10.12/howto/set-config.html">http://airflow.apache.org/docs/1.10.12/howto/set-config.html</a></p>
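<p>The naming rule for the environment variable can be sketched in a few lines of Python. The helper <code>secret_env_var</code> is hypothetical and only reproduces the pattern visible in the example above (upper-cased section and option, joined by double underscores, with a <code>_SECRET</code> suffix).</p>

```python
def secret_env_var(section, option):
    """Build the environment variable name for a ``_secret`` config option."""
    return "AIRFLOW__{}__{}_SECRET".format(section.upper(), option.upper())


print(secret_env_var("core", "sql_alchemy_conn"))
# AIRFLOW__CORE__SQL_ALCHEMY_CONN_SECRET
```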
<h3 id="add-airflowclusterpolicyviolation-support-to-airflow_local_settingspy">Add AirflowClusterPolicyViolation support to airflow_local_settings.py</h3>
<p>Users can use Cluster Policies to apply cluster-wide checks on Airflow
tasks. You can raise <a href="http://airflow.apache.org/docs/1.10.12/_api/airflow/exceptions/index.html#airflow.exceptions.AirflowClusterPolicyViolation">AirflowClusterPolicyViolation</a>
in a policy or task mutation hook to prevent a DAG from being
imported or prevent a task from being executed if the task is not compliant with
your check.</p>
<p>These checks are intended to help teams using Airflow to protect against common
beginner errors that may get past a code reviewer, rather than as technical
security controls.</p>
<p>For example, to reject tasks that have no owner or that still use the default <code>airflow</code> owner:</p>
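<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">airflow.configuration</span> <span class="kn">import</span> <span class="n">conf</span>
</span></span><span class="line"><span class="cl"><span class="kn">from</span> <span class="nn">airflow.exceptions</span> <span class="kn">import</span> <span class="n">AirflowClusterPolicyViolation</span>
</span></span><span class="line"><span class="cl">
</span></span><span class="line"><span class="cl"><span class="k">def</span> <span class="nf">task_must_have_owners</span><span class="p">(</span><span class="n">task</span><span class="p">):</span>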
</span></span><span class="line"><span class="cl">    <span class="k">if</span> <span class="ow">not</span> <span class="n">task</span><span class="o">.</span><span class="n">owner</span> <span class="ow">or</span> <span class="n">task</span><span class="o">.</span><span class="n">owner</span><span class="o">.</span><span class="n">lower</span><span class="p">()</span> <span class="o">==</span> <span class="n">conf</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s1">&#39;operators&#39;</span><span class="p">,</span> <span class="s1">&#39;default_owner&#39;</span><span class="p">):</span>
</span></span><span class="line"><span class="cl">        <span class="k">raise</span> <span class="n">AirflowClusterPolicyViolation</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">            <span class="s1">&#39;Task must have non-None non-default owner. Current value: </span><span class="si">{}</span><span class="s1">&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">task</span><span class="o">.</span><span class="n">owner</span><span class="p">))</span>
</span></span></code></pre></div><p>More details: <a href="http://airflow.apache.org/docs/1.10.12/concepts.html#cluster-policies-for-custom-task-checks">http://airflow.apache.org/docs/1.10.12/concepts.html#cluster-policies-for-custom-task-checks</a></p>
<h3 id="launch-pods-via-yaml-files-when-using-kubernetesexecutor-and-kubernetespodoperator">Launch Pods via YAML files when using KubernetesExecutor and KubernetesPodOperator</h3>
<p>As of 1.10.12, users can launch pods via YAML files instead of passing various configurations.</p>
<p>To allow greater flexibility we have deprecated Airflow&rsquo;s Pod class and instead now use classes and
objects from the official Kubernetes API. The Pod class will still work but raises a deprecation
warning. This feature involved a pretty extensive rewrite of all of our pod creation code.</p>
<p>Initially, we were going to hold off on these features until Airflow 2.0. However, we soon
realized that exposing these features in 1.10.x is crucial in preparing users for the 2.0 release to come.</p>
<p>Details: <a href="https://github.com/apache/airflow/pull/6230">https://github.com/apache/airflow/pull/6230</a> (<a href="https://github.com/apache/airflow/commit/7aa0f472b57985a952a3e3d0a38f1b2535d93413">Backport commit</a>)</p>
<h2 id="updating-guide">Updating Guide</h2>
<p>If you are updating Apache Airflow from a previous version to <code>1.10.12</code>, please take note of the following:</p>
<ul>
<li>
<p>Run <code>airflow upgradedb</code> after <code>pip install -U apache-airflow==1.10.12</code> as <code>1.10.12</code> contains 1 database migration.</p>
</li>
<li>
<p>As of Airflow 1.10.12, using the <code>airflow.contrib.kubernetes.Pod</code> class in the <code>pod_mutation_hook</code> is
deprecated. Instead we recommend that users treat the pod parameter as a <code>kubernetes.client.models.V1Pod</code> object.
This means that users now have access to the full Kubernetes API when mutating Airflow pods.</p>
</li>
<li>
<p>Previously, when tasks skipped by SkipMixin (such as <code>BranchPythonOperator</code>, <code>BaseBranchOperator</code> and
<code>ShortCircuitOperator</code>) were cleared, they executed. Since 1.10.12, when such skipped tasks are cleared,
they will be skipped again by the newly introduced <code>NotPreviouslySkippedDep</code>.</p>
</li>
</ul>
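<p>As a rough illustration of the <code>pod_mutation_hook</code> point above, the sketch below treats the pod as an object shaped like a <code>V1Pod</code> and adds a label. To keep it runnable without a Kubernetes client installed, it uses a <code>SimpleNamespace</code> stand-in instead of a real <code>V1Pod</code>; the <code>team</code> label and its value are hypothetical.</p>

```python
from types import SimpleNamespace


def pod_mutation_hook(pod):
    """Add a label to every pod that Airflow launches."""
    # On a real V1Pod, metadata.labels may be None when no labels were set.
    labels = dict(pod.metadata.labels or {})
    labels["team"] = "data-platform"  # hypothetical label
    pod.metadata.labels = labels


# Stand-in for a kubernetes.client.models.V1Pod (assumption for this sketch).
fake_pod = SimpleNamespace(metadata=SimpleNamespace(labels=None))
pod_mutation_hook(fake_pod)
print(fake_pod.metadata.labels)
```

<p>In a real deployment the hook would live in <code>airflow_local_settings.py</code> and receive the pod object from Airflow.</p>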
<h2 id="special-note">Special Note</h2>
<h3 id="python-2">Python 2</h3>
<p>Python 2 reached its end of life in January 2020. Airflow Master no longer supports Python 2.
Airflow 1.10.* will be the last series to support Python 2.</p>
<p>We strongly recommend that users use Python &gt;= 3.6.</p>
<h3 id="use-airflow-rbac-ui">Use Airflow RBAC UI</h3>
<p>Airflow 1.10.12 ships with two UIs: the default non-RBAC Flask-admin based UI and the Flask-AppBuilder based UI.</p>
<p>The Flask-AppBuilder (FAB) based UI allows Role-based Access Control and has more advanced features compared to
the legacy Flask-admin based UI. This UI can be enabled by setting <code>rbac=True</code> in <code>[webserver]</code> section in
your <code>airflow.cfg</code>.</p>
<p>The Flask-admin based UI is deprecated and new features won&rsquo;t be ported to it. This UI will still be the default
for the 1.10.* series but will no longer be available from Airflow 2.0.</p>
<h3 id="we-have-moved-to-github-issues">We have moved to GitHub Issues</h3>
<p>The Airflow Project has moved from <a href="https://issues.apache.org/jira/projects/AIRFLOW/issues">JIRA</a> to
<a href="https://github.com/apache/airflow/issues">GitHub</a> for tracking issues.</p>
<p>So if you find any bugs in Airflow 1.10.12, please create a GitHub Issue for them.</p>
<h2 id="list-of-contributors">List of Contributors</h2>
<p>According to git shortlog, the following people contributed to the 1.10.12 release. Thank you to all contributors!</p>
<p>Alexander Sutcliffe, Andy, Aneesh Joseph, Ash Berlin-Taylor, Aviral Agrawal, BaoshanGu, Beni Ben zikry,
Daniel Imberman, Daniel Standish, Danylo Baibak, Ephraim Anierobi, Felix Uellendall, Greg Neiheisel,
Hartorn, Jacob Ferriero, Jannik F, Jarek Potiuk, Jinhui Zhang, Kamil Breguła, Kaxil Naik, Kurganov,
Luis Magana, Max Arrich, Pete DeJoy, Sumit Maheshwari, Tomek Urbaszek, Vicken Simonian, Vinnie Guimaraes,
William Tran, Xiaodong Deng, YI FU, Zikun Zhu, dewaldabrie, pulsar314, retornam, yuqian90</p>
]]></content>
  </entry>
  
  <entry>
    <title>Apache Airflow For Newcomers</title>
    <link href="/blog/apache-airflow-for-newcomers/" rel="alternate"/>
    <id>/blog/apache-airflow-for-newcomers/</id>
    <published>2020-08-17T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html">&lt;![CDATA[<p>Apache Airflow is a platform to programmatically author, schedule, and monitor workflows.
A workflow is a sequence of tasks that processes a set of data. You can think of a workflow as the
path that describes how tasks go from being undone to done. Scheduling, on the other hand, is the
process of planning, controlling, and optimizing when a particular task should be done.</p>
<h3 id="authoring-workflow-in-apache-airflow">Authoring Workflow in Apache Airflow.</h3>
<p>Airflow makes it easy to author workflows using Python scripts. A <a href="https://en.wikipedia.org/wiki/Directed_acyclic_graph">Directed Acyclic Graph</a>
(DAG) represents a workflow in Airflow. It is a collection of tasks organized in a way that shows each task&rsquo;s
relationships and dependencies. You can have as many DAGs as you want, and Airflow will execute
them according to the tasks&rsquo; relationships and dependencies. If task B depends on the successful
execution of another task A, Airflow will run task A first and only run task B after task A succeeds.
This dependency is very easy to express in Airflow. For example, the above scenario is expressed as</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="n">task_A</span> <span class="o">&gt;&gt;</span> <span class="n">task_B</span>
</span></span></code></pre></div><p>Also equivalent to</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="n">task_A</span><span class="o">.</span><span class="n">set_downstream</span><span class="p">(</span><span class="n">task_B</span><span class="p">)</span>
</span></span></code></pre></div><p><img src="/blog/apache-airflow-for-newcomers/Simple_dag.png" alt="Simple Dag"></p>
<p>That helps Airflow to know that it needs to execute task A before task B. Tasks can have far more complex
relationships to each other than expressed above and Airflow figures out how and when to execute the tasks following
their relationships and dependencies.
<img src="/blog/apache-airflow-for-newcomers/semicomplex.png" alt="Complex Dag"></p>
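<p>If you are curious how <code>task_A &gt;&gt; task_B</code> can be equivalent to <code>task_A.set_downstream(task_B)</code>, the toy class below shows the general Python mechanism. It is a sketch, not Airflow&rsquo;s actual implementation: <code>ToyTask</code> is a hypothetical stand-in that only records its downstream tasks.</p>

```python
class ToyTask:
    """Toy stand-in for an Airflow task; not the real operator class."""

    def __init__(self, name):
        self.name = name
        self.downstream = []

    def set_downstream(self, other):
        self.downstream.append(other)

    def __rshift__(self, other):
        # Python evaluates task_A >> task_B as task_A.__rshift__(task_B).
        self.set_downstream(other)
        return other  # returning `other` allows chaining: a >> b >> c


task_A, task_B, task_C = ToyTask("A"), ToyTask("B"), ToyTask("C")
task_A >> task_B >> task_C
print([t.name for t in task_A.downstream])  # ['B']
print([t.name for t in task_B.downstream])  # ['C']
```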
<p>Before we discuss the architecture of Airflow that makes scheduling, executing, and monitoring of
workflow an easy thing, let us discuss the <a href="https://github.com/apache/airflow/blob/master/BREEZE.rst">Breeze environment</a>.</p>
<h3 id="breeze-environment">Breeze Environment</h3>
<p>The Breeze environment is the development environment for Airflow where you can run tests, build images,
build documentation and do many other things. There is excellent
<a href="https://github.com/apache/airflow/blob/master/BREEZE.rst">documentation and a video</a> on the Breeze environment.
Please check them out. You enter the Breeze environment by running the <code>./breeze</code> script. You can run all
the commands mentioned here in the Breeze environment.</p>
<h3 id="scheduler">Scheduler</h3>
<p>The scheduler is the component that monitors DAGs and triggers those tasks whose dependencies have
been met. It watches over the DAG folder, checking the tasks in each DAG, and triggers them once they
are ready. It accomplishes this by spawning a process that runs periodically (every minute or so),
reads the metadata database to check the status of each task and decides what needs to be done.
The metadata database is where the status of every task is recorded. The status can be one of running,
success, failed, etc.</p>
<p>A task is said to be ready when its dependencies have been met. The dependencies include all the data
necessary for the task to be executed. It should be noted that the scheduler won&rsquo;t trigger your tasks until
the period they cover has ended. If a DAG&rsquo;s <code>schedule_interval</code> is <code>@daily</code>, the scheduler triggers its tasks
at the end of the day and not at the beginning. This is to ensure that the data needed for the tasks
is ready. It is also possible to trigger tasks manually in the UI.</p>
<p>In the <a href="https://github.com/apache/airflow/blob/master/BREEZE.rst">Breeze environment</a>, the scheduler is started by running the command <code>airflow scheduler</code>. It uses
the configured production environment. The configuration can be specified in <code>airflow.cfg</code>.</p>
<h3 id="executor">Executor</h3>
<p>Executors are responsible for running tasks. They work with the scheduler to get information about
what resources are needed to run a task as the task is queued.</p>
<p>By default, Airflow uses the <a href="https://airflow.apache.org/docs/stable/executor/sequential.html#sequential-executor">SequentialExecutor</a>.
However, this executor is limited and it is the only executor that can be used with SQLite.</p>
<p>There are many other <a href="https://airflow.apache.org/docs/stable/executor/index.html">executors</a>.
They differ in the resources they have and in how they use those resources. The available executors
are:</p>
<ul>
<li>Sequential Executor</li>
<li>Debug Executor</li>
<li>Local Executor</li>
<li>Dask Executor</li>
<li>Celery Executor</li>
<li>Kubernetes Executor</li>
<li>Scaling Out with Mesos (community contributed)</li>
</ul>
<p>The CeleryExecutor is better suited to running many tasks than the SequentialExecutor. It uses several
workers to execute a job in a distributed way. If a worker node is ever down, the CeleryExecutor assigns its
tasks to another worker node. This ensures high availability.</p>
<p>The CeleryExecutor works closely with the scheduler, which adds a message to the queue, and the Celery broker,
which delivers the message to a Celery worker to execute.
You can find more information about the CeleryExecutor and how to configure it in the
<a href="https://airflow.apache.org/docs/stable/executor/celery.html#celery-executor">documentation</a>.</p>
<h3 id="webserver">Webserver</h3>
<p>The webserver is the web interface (UI) for Airflow. The UI is feature-rich. It makes it easy to
monitor and troubleshoot DAGs and Tasks.</p>
<p><img src="/blog/apache-airflow-for-newcomers/airflow-ui.png" alt="airflow UI"></p>
<p>There are many actions you can perform on the UI. You can trigger a task, monitor the execution
including the duration of the task. The UI makes it possible to view the task&rsquo;s dependencies in a
tree view and graph view. You can view task logs in the UI.</p>
<p>The web UI is started with the command <code>airflow webserver</code> in the breeze environment.</p>
<h3 id="backend">Backend</h3>
<p>By default, Airflow uses the SQLite backend for storing the configuration information, DAG states,
and other useful information. This should not be used in production as SQLite can cause data
loss.</p>
<p>You can use PostgreSQL or MySQL as a backend for Airflow. It is easy to change to PostgreSQL or MySQL.</p>
<p>The command <code>./breeze --backend mysql</code> selects MySQL as the backend when starting the breeze environment.</p>
<h3 id="operators">Operators</h3>
<p>Operators determine what gets done by a task. Airflow has a lot of built-in Operators. Each operator
does a specific task. There&rsquo;s a BashOperator that executes a bash command, the PythonOperator which
calls a Python function, AwsBatchOperator which executes a job on AWS Batch and <a href="https://airflow.apache.org/docs/stable/concepts.html#operators">many more</a>.</p>
<h4 id="sensors">Sensors</h4>
<p>Sensors can be described as special operators that wait for a condition to be met, for example a long-running
task finishing or a file appearing. Just like Operators, there are many predefined sensors in Airflow. These include:</p>
<ul>
<li>AthenaSensor: Asks for the state of the Query until it reaches a failure state or success state.</li>
<li>AzureCosmosDocumentSensor: Checks for the existence of a document which matches the given query in CosmosDB</li>
<li>GoogleCloudStorageObjectSensor:  Checks for the existence of a file in Google Cloud Storage</li>
</ul>
<p>A list of most of the available sensors can be found in this <a href="https://airflow.apache.org/docs/stable/_api/airflow/contrib/sensors/index.html?highlight=sensors#module-airflow.contrib.sensors">module</a>.</p>
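<p>Conceptually, a sensor keeps checking (&ldquo;poking&rdquo;) for a condition until it holds or a timeout is reached. The sketch below shows that loop in plain Python. It is not Airflow&rsquo;s sensor code: <code>wait_until</code> and <code>file_exists</code> are hypothetical, and a real sensor fails the task on timeout instead of returning <code>False</code>.</p>

```python
import time


def wait_until(poke, poke_interval=0.01, timeout=1.0):
    """Call poke() repeatedly until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if poke():
            return True
        time.sleep(poke_interval)  # wait before checking again
    return False


# Hypothetical condition: the "file" shows up on the third check.
checks = {"count": 0}


def file_exists():
    checks["count"] += 1
    return checks["count"] >= 3


print(wait_until(file_exists))
```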
<h3 id="contributing-to-airflow">Contributing to Airflow</h3>
<p>Airflow is an open source project, and everyone is welcome to contribute. It is easy to get started thanks
to the excellent <a href="https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst">documentation on how to get started</a>.</p>
<p>I joined the community about 12 weeks ago through the <a href="https://www.outreachy.org/">Outreachy Program</a> and have
completed about <a href="https://github.com/apache/airflow/pulls/ephraimbuddy">40 PRs</a>.</p>
<p>It has been an amazing experience! Thanks to my mentors <a href="https://github.com/potiuk">Jarek</a> and
<a href="https://github.com/kaxil">Kaxil</a>, and the community members especially <a href="https://github.com/mik-laj">Kamil</a>
and <a href="https://github.com/turbaszek">Tomek</a> for all their support. I&rsquo;m grateful!</p>
<p>Thank you so much, <a href="https://github.com/leahecole">Leah E. Cole</a>, for your wonderful reviews.</p>
]]></content>
  </entry>
  
  <entry>
    <title>Implementing Stable API for Apache Airflow</title>
    <link href="/blog/implementing-stable-api-for-apache-airflow/" rel="alternate"/>
    <id>/blog/implementing-stable-api-for-apache-airflow/</id>
    <published>2020-07-19T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html">&lt;![CDATA[<p>My <a href="https://outreachy.org">Outreachy internship</a> is coming to an end, which is also the best time to look back and
reflect on the progress so far.</p>
<p>The goal of my project is to extend and improve the Apache Airflow REST API. In this post,
I will be sharing my progress so far.</p>
<p>We started a bit late implementing the REST API because it took time for the OpenAPI 3.0
specification we were to use for the project to be merged. Thanks to <a href="https://github.com/mik-laj">Kamil</a>
for paving the way for us to start implementing the REST API endpoints. Below are the endpoints I
implemented and the challenges I encountered, including how I overcame them.</p>
<h3 id="implementing-the-read-only-connection-endpoints">Implementing The Read-Only Connection Endpoints</h3>
<p>The <a href="https://github.com/apache/airflow/pull/9095">read-only connection endpoints</a> were the first endpoints I implemented. Looking back,
I can see how much I have improved.</p>
<p>I started by implementing the database schema for the Connection table using <a href="https://marshmallow.readthedocs.io/en/2.x-line/">Marshmallow 2</a>.
We had to use Marshmallow 2 because Flask-AppBuilder was still using it and Flask-AppBuilder
is deeply integrated into Apache Airflow. This meant I had to unlearn the Marshmallow 3 that I had
been studying before this realization, but thankfully, <a href="https://marshmallow.readthedocs.io/en/stable/index.html">Marshmallow 3</a> isn&rsquo;t too
different, so I was able to start using Marshmallow 2 in no time.</p>
<p>This first PR would have been more difficult had there not been a reference
endpoint to look at. <a href="https://github.com/mik-laj">Kamil</a> implemented a <a href="https://github.com/apache/airflow/pull/9045">draft PR</a> from which I took inspiration.
Thanks to this, it was easy for me to write the unit tests. It was also in this endpoint that
I learned to use <a href="https://github.com/wolever/parameterized">parameterized</a> in unit tests :D.</p>
<h3 id="implementing-the-read-only-dagruns-endpoints">Implementing The Read-Only DagRuns Endpoints</h3>
<p>This <a href="https://github.com/apache/airflow/pull/9153">endpoint</a> came with many challenges, especially around filtering with <code>datetimes</code>.
This was because the <code>connexion</code> library we were using to build the REST API was not validating
the date-time format in the OpenAPI 3.0 specification, which, I eventually found out, was intentional.
Connexion dropped <code>strict-rfc3339</code> because its license is not compatible with
the Apache 2.0 license.</p>
<p>I implemented a workaround for this by defining a function called <code>conn_parse_datetime</code> in the
API utils module. This was later refactored and, thankfully, <a href="https://github.com/mik-laj">Kamil</a>
implemented a decorator that allowed us to have cleaner code in the views while using this function.</p>
<p>Then we tried using <code>rfc3339-validator</code>, whose license is compatible with the Apache 2.0 license, but
later discarded it because our custom date parser let us accept durations and
not just date-times.</p>
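<p>To give a feel for what such a parser has to do, here is a minimal sketch that uses only the Python standard library (Python 3.7 or newer). It is not the project&rsquo;s <code>conn_parse_datetime</code>: <code>parse_rfc3339</code> is a hypothetical name, and unlike the custom parser described above it does not handle durations.</p>

```python
from datetime import datetime, timezone


def parse_rfc3339(value):
    """Parse a timestamp such as 2020-07-19T10:00:00Z into an aware datetime."""
    # Older versions of datetime.fromisoformat() reject a trailing "Z".
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)  # raises ValueError on bad input
    if parsed.tzinfo is None:
        raise ValueError("timezone offset is required: {!r}".format(value))
    return parsed


print(parse_rfc3339("2020-07-19T10:00:00Z") == datetime(2020, 7, 19, 10, tzinfo=timezone.utc))
```

<p>Rejecting values without a timezone offset keeps filters unambiguous, which is the kind of validation the library was not doing for us.</p>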
<h3 id="other-endpoints">Other Endpoints</h3>
<p>I implemented several other endpoints. One peculiar issue I faced was that Marshmallow 2
does not raise an error when extra fields are in the request body. I implemented a <code>validate_unknown</code>
method on the schema to handle this. Thankfully, Flask-AppBuilder later moved to Marshmallow 3, so
we quickly updated Flask-AppBuilder in Apache Airflow and started using Marshmallow 3 too.</p>
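<p>The idea behind the unknown-field check can be sketched without Marshmallow at all. The function below is a plain-Python illustration, not the schema method from the PRs; its signature and error message are assumptions.</p>

```python
def validate_unknown(payload, known_fields):
    """Raise ValueError if the payload has keys outside known_fields."""
    unknown = sorted(set(payload) - set(known_fields))
    if unknown:
        raise ValueError("Unknown fields: {}".format(", ".join(unknown)))
    return payload


known = {"connection_id", "conn_type"}
print(validate_unknown({"connection_id": "db", "conn_type": "postgres"}, known))
```

<p>Passing a body with an extra key, for example <code>{"connection_id": "db", "extra": 1}</code>, raises a <code>ValueError</code> instead of silently ignoring the field.</p>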
<p>Here are some PRs I contributed that are related to the REST API:</p>
<ol>
<li>
<p><a href="https://github.com/apache/airflow/pull/9227">Add event log endpoints</a>
The event log helps users get information about operations performed in the UI.</p>
</li>
<li>
<p><a href="https://github.com/apache/airflow/pull/9266">Add CRUD endpoints for connection</a>
This PR performs DELETE, PATCH and POST operations on <code>Connection</code></p>
</li>
<li>
<p><a href="https://github.com/apache/airflow/pull/9331">Add log endpoint</a>
This PR enables users to get log entries for Task Instances.</p>
</li>
<li>
<p><a href="https://github.com/apache/airflow/pull/9431">Move limit &amp; offset to kwargs in views plus work on a configurable maximum limit</a>
This helped us keep the code in the views neat and added a configurable maximum limit on query results.</p>
</li>
<li>
<p><a href="https://github.com/apache/airflow/pull/9648">Update FlaskAppBuilder to v3</a>
This enabled Airflow to start using v3 of Flask App Builder and also made it possible for the API to use
a modern database serializer/deserializer</p>
</li>
<li>
<p><a href="https://github.com/apache/airflow/pull/9771">Add migration guide from the experimental REST API to the stable REST API</a>
This would enable users to start using the stable REST API in less time.</p>
</li>
</ol>
<h3 id="follow-ups">Follow-Ups</h3>
<p>There is still a lot of work to be done on the REST API, including writing helpful documentation.
I am still following up on this and, hopefully, we will complete the REST API before the internship ends.</p>
<p>I am very grateful to my mentors, <a href="https://github.com/potiuk">Jarek</a> and <a href="https://github.com/kaxil">Kaxil</a> for their
patience with me and for surviving my never-ending questions. <a href="https://github.com/mik-laj">Kamil</a> and <a href="https://github.com/turbaszek">Tomek</a>
have been very supportive and I appreciate them for their support and amazing code reviews.</p>
<p>Thanks to <a href="https://github.com/leahecole">Leah E. Cole</a> and <a href="https://github.com/mschickensoup">Karolina Rosół</a>, for their
wonderful reviews. I&rsquo;m grateful.</p>
<p>Thanks for reading!</p>
]]></content>
  </entry>
  
  <entry>
    <title>Apache Airflow 1.10.10</title>
    <link href="/blog/airflow-1.10.10/" rel="alternate"/>
    <id>/blog/airflow-1.10.10/</id>
    <published>2020-04-09T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html">&lt;![CDATA[<p>Airflow 1.10.10 contains 199 commits since 1.10.9 and includes 11 new features, 43 improvements, 44 bug fixes, and several doc changes.</p>
<p><strong>Details</strong>:</p>
<ul>
<li><strong>PyPI</strong>: <a href="https://pypi.org/project/apache-airflow/1.10.10/">https://pypi.org/project/apache-airflow/1.10.10/</a></li>
<li><strong>Docs</strong>: <a href="https://airflow.apache.org/docs/1.10.10/">https://airflow.apache.org/docs/1.10.10/</a></li>
<li><strong>Changelog</strong>: <a href="http://airflow.apache.org/docs/1.10.10/changelog.html">http://airflow.apache.org/docs/1.10.10/changelog.html</a></li>
</ul>
<p>Some of the noteworthy new features (user-facing) are:</p>
<ul>
<li><a href="https://github.com/apache/airflow/pull/8046">Allow user to choose timezone to use in the RBAC UI</a></li>
<li><a href="https://github.com/apache/airflow/pull/7832">Add Production Docker image support</a></li>
<li><a href="http://airflow.apache.org/docs/1.10.10/howto/use-alternative-secrets-backend.html">Allow Retrieving Airflow Connections &amp; Variables from various Secrets backend</a></li>
<li><a href="http://airflow.apache.org/docs/1.10.10/dag-serialization.html">Stateless Webserver using DAG Serialization</a></li>
<li><a href="https://github.com/apache/airflow/pull/7880">Tasks with Dummy Operators are no longer sent to executor</a></li>
<li><a href="https://github.com/apache/airflow/pull/7312">Allow passing DagRun conf when triggering dags via UI</a></li>
</ul>
<h3 id="allow-user-to-choose-timezone-to-use-in-the-rbac-ui">Allow user to choose timezone to use in the RBAC UI</h3>
<p>By default the Web UI will show times in UTC. It is possible to change the timezone shown by using the menu in the top
right (click on the clock to activate it):</p>
<p><strong>Screenshot</strong>:
<img src="/blog/airflow-1.10.10/rbac-ui-timezone.gif" alt="Allow user to choose timezone to use in the RBAC UI"></p>
<p>Details: <a href="https://airflow.apache.org/docs/1.10.10/timezone.html#web-ui">https://airflow.apache.org/docs/1.10.10/timezone.html#web-ui</a></p>
<p><strong>Note</strong>: This feature is only available for the RBAC UI (enabled using <code>rbac=True</code> in the <code>[webserver]</code> section of your <code>airflow.cfg</code>).</p>
<h3 id="add-production-docker-image-support">Add Production Docker image support</h3>
<p>There are brand-new production images (alpha quality) available for Airflow 1.10.10. You can pull them from the
<a href="https://hub.docker.com/r/apache/airflow">Apache Airflow Dockerhub</a> repository and start using them.</p>
<p>More information about using production images can be found in <a href="https://github.com/apache/airflow/blob/master/IMAGES.rst#using-the-images">https://github.com/apache/airflow/blob/master/IMAGES.rst#using-the-images</a>. It will soon be updated with
information on how to use the images with the official Helm chart.</p>
<p>To pull the images you can run one of the following commands:</p>
<ul>
<li><code>docker pull apache/airflow:1.10.10-python2.7</code></li>
<li><code>docker pull apache/airflow:1.10.10-python3.5</code></li>
<li><code>docker pull apache/airflow:1.10.10-python3.6</code></li>
<li><code>docker pull apache/airflow:1.10.10-python3.7</code></li>
<li><code>docker pull apache/airflow:1.10.10</code> (uses Python 3.6)</li>
</ul>
<h3 id="allow-retrieving-airflow-connections--variables-from-various-secrets-backend">Allow Retrieving Airflow Connections &amp; Variables from various Secrets backend</h3>
<p>From Airflow 1.10.10, users can get Airflow Variables from Environment Variables.</p>
<p>Details: <a href="https://airflow.apache.org/docs/1.10.10/concepts.html#storing-variables-in-environment-variables">https://airflow.apache.org/docs/1.10.10/concepts.html#storing-variables-in-environment-variables</a></p>
<p>A new concept of Secrets Backend has been introduced to retrieve Airflow Connections and Variables.</p>
<p>From Airflow 1.10.10, users can retrieve Connections &amp; Variables using the same syntax (no DAG code change is required)
from a secrets backend defined in <code>airflow.cfg</code>. If no backend is defined, Airflow falls back to Environment Variables
and then the Metadata DB.</p>
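<p>The lookup order described above can be sketched in plain Python. This is only an illustration, not Airflow&rsquo;s implementation: <code>FAKE_SECRETS_BACKEND</code>, <code>FAKE_METADATA_DB</code> and <code>get_variable</code> are hypothetical names, and the sketch assumes that a miss in a configured backend also falls through to the next source. The <code>AIRFLOW_VAR_</code> prefix is the naming convention for Variables stored in Environment Variables.</p>

```python
import os

# Hypothetical in-memory stand-ins for a secrets backend and the Metadata DB.
FAKE_SECRETS_BACKEND = {"variables/api_key": "from-vault"}
FAKE_METADATA_DB = {"api_key": "from-db", "region": "eu-west-1"}


def get_variable(key, use_backend=True, environ=None):
    """Return the first value found: backend, then environment, then DB."""
    environ = os.environ if environ is None else environ

    # 1. Custom secrets backend, consulted only if one is configured.
    if use_backend:
        value = FAKE_SECRETS_BACKEND.get("variables/" + key)
        if value is not None:
            return value

    # 2. Environment Variables, using the AIRFLOW_VAR_ prefix (upper-cased key).
    value = environ.get("AIRFLOW_VAR_" + key.upper())
    if value is not None:
        return value

    # 3. Metadata DB as the last resort (None if the key is unknown).
    return FAKE_METADATA_DB.get(key)
```

<p>Because every source is queried through the same function, the calling code does not change when a backend is added or removed, which is the point of the &ldquo;no DAG code change&rdquo; remark above.</p>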
<p>Check <a href="https://airflow.apache.org/docs/1.10.10/howto/use-alternative-secrets-backend.html#configuration">https://airflow.apache.org/docs/1.10.10/howto/use-alternative-secrets-backend.html#configuration</a> for details on how to
configure a Secrets backend.</p>
<p>As of 1.10.10, Airflow supports the following Secret Backends:</p>
<ul>
<li>Hashicorp Vault</li>
<li>GCP Secrets Manager</li>
<li>AWS SSM Parameter Store</li>
</ul>
<p>Details: <a href="https://airflow.apache.org/docs/1.10.10/howto/use-alternative-secrets-backend.html">https://airflow.apache.org/docs/1.10.10/howto/use-alternative-secrets-backend.html</a></p>
<p>Example configuration to use Hashicorp Vault as the backend:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-ini" data-lang="ini"><span class="line"><span class="cl"><span class="k">[secrets]</span>
</span></span><span class="line"><span class="cl"><span class="na">backend</span> <span class="o">=</span> <span class="s">airflow.contrib.secrets.hashicorp_vault.VaultBackend</span>
</span></span><span class="line"><span class="cl"><span class="na">backend_kwargs</span> <span class="o">=</span> <span class="s">{&#34;url&#34;: &#34;http://127.0.0.1:8200&#34;, &#34;connections_path&#34;: &#34;connections&#34;, &#34;variables_path&#34;: &#34;variables&#34;, &#34;mount_point&#34;: &#34;airflow&#34;}</span>
</span></span></code></pre></div><h3 id="stateless-webserver-using-dag-serialization">Stateless Webserver using DAG Serialization</h3>
<p>The Webserver can now run without access to DAG Files when DAG Serialization is turned on.
The 2 limitations we had in 1.10.7-1.10.9 (
<a href="https://airflow.apache.org/docs/1.10.7/dag-serialization.html#limitations">https://airflow.apache.org/docs/1.10.7/dag-serialization.html#limitations</a>)
have been resolved.</p>
<p>The main advantage is a reduction in Webserver startup time for deployments with a large number of DAGs.
Without DAG Serialization, all the DAGs are loaded into the DagBag during
Webserver startup.</p>
<p>With DAG Serialization, an empty DagBag is created and
DAGs are loaded from the DB only when needed (i.e. when a particular DAG is
clicked on in the home page).</p>
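<p>A toy sketch of that lazy-loading idea follows. It does not use Airflow&rsquo;s actual <code>DagBag</code> class; <code>LazyDagBag</code>, <code>serialized_store</code> and <code>db_reads</code> are hypothetical names chosen for illustration, and a plain dictionary stands in for the serialized DAGs stored in the DB.</p>

```python
class LazyDagBag:
    """Toy DagBag that starts empty and loads a DAG on first access."""

    def __init__(self, serialized_store):
        # serialized_store stands in for the serialized DAGs kept in the DB.
        self._store = serialized_store
        self.dags = {}      # nothing is loaded at startup
        self.db_reads = 0   # counts how often the "DB" is hit

    def get_dag(self, dag_id):
        # Load from the store only the first time a DAG is requested.
        if dag_id not in self.dags:
            self.db_reads += 1
            self.dags[dag_id] = dict(self._store[dag_id])
        return self.dags[dag_id]


store = {
    "etl_daily": {"tasks": ["extract", "load"]},
    "ml_train": {"tasks": ["fit"]},
}
bag = LazyDagBag(store)
```

<p>Startup cost in this sketch is independent of the number of DAGs in <code>store</code>; only the DAGs that are actually opened are ever read.</p>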
<p>Details: <a href="http://airflow.apache.org/docs/1.10.10/dag-serialization.html">http://airflow.apache.org/docs/1.10.10/dag-serialization.html</a></p>
<h3 id="tasks-using-dummy-operators-are-no-longer-sent-to-executor">Tasks using Dummy Operators are no longer sent to executor</h3>
<p>Dummy operators do not actually do any work and are mostly used for organizing/grouping tasks along
with BranchPythonOperator.</p>
<p>Previously, when using the Kubernetes Executor, the executor would spin up a whole worker pod to execute a dummy task.
With Airflow 1.10.10, tasks using Dummy Operators are scheduled &amp; evaluated by the Scheduler but not sent to the
Executor. This should significantly improve execution time and resource usage.</p>
<h3 id="allow-passing-dagrun-conf-when-triggering-dags-via-ui">Allow passing DagRun conf when triggering dags via UI</h3>
<p>When triggering a DAG from the CLI or the REST API, it&rsquo;s possible to pass configuration for the DAG run as a JSON blob.</p>
<p>From Airflow 1.10.10, when a user clicks on the Trigger Dag button, a new screen is shown that confirms the trigger request and allows the user to pass a JSON configuration
blob.</p>
<p><strong>Screenshot</strong>:
<img src="/blog/airflow-1.10.10/trigger-dag-conf.png" alt="Allow passing DagRun conf when triggering dags via UI"></p>
<p>Details: <a href="https://github.com/apache/airflow/pull/7312">https://github.com/apache/airflow/pull/7312</a></p>
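<p>As a minimal sketch of how a task might consume such a blob: the callable below reads the configuration from the <code>dag_run</code> entry of the task context. <code>FakeDagRun</code>, <code>pick_target_table</code> and the <code>table</code> key are hypothetical names used so the example runs without Airflow installed; in a real DAG the context would be supplied by Airflow rather than built by hand.</p>

```python
import json


class FakeDagRun:
    """Minimal stand-in for a DagRun, holding only the conf attribute."""

    def __init__(self, conf=None):
        self.conf = conf


def pick_target_table(**context):
    """Task callable: read an optional 'table' key from the run conf."""
    # conf is None when the run was triggered without a configuration.
    conf = context["dag_run"].conf or {}
    return conf.get("table", "default_table")


# The JSON blob a user might paste into the trigger screen (illustrative).
blob = '{"table": "sales_2020_04"}'
run = FakeDagRun(conf=json.loads(blob))
```

<p>Falling back to a default when no configuration is passed keeps scheduled runs and manually triggered runs on the same code path.</p>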
<h2 id="updating-guide">Updating Guide</h2>
<p>If you are updating Apache Airflow from a previous version to <code>1.10.10</code>, please take note of the following:</p>
<ul>
<li>
<p>Run <code>airflow upgradedb</code> after <code>pip install -U apache-airflow==1.10.10</code> as <code>1.10.10</code> contains 3 database migrations.</p>
</li>
<li>
<p>If you have used the <code>none_failed</code> trigger rule in your DAG, change it to use the new <code>none_failed_or_skipped</code> trigger rule.
As previously implemented, the <code>none_failed</code> trigger rule skipped the current task if all parents of the task
had also been skipped. This was not in line with what was documented about that trigger rule. We have changed the implementation to match
the documentation, so if you need the old behavior, use <code>none_failed_or_skipped</code>.</p>
<p>More details in <a href="https://github.com/apache/airflow/pull/7464">https://github.com/apache/airflow/pull/7464</a>.</p>
</li>
<li>
<p>Setting an Airflow Variable to an empty string and reading it back now returns an empty string; it previously returned <code>None</code>.</p>
<p>Example:</p>
<pre><code>&gt;&gt;&gt; from airflow.models import Variable
&gt;&gt;&gt; Variable.set('test_key', '')
&gt;&gt;&gt; Variable.get('test_key')
</code></pre>
<p>The above code previously returned <code>None</code>; it now returns an empty string (<code>''</code>).</p>
</li>
<li>
<p>When a task is marked as <code>success</code> by a user in the Airflow UI, the function defined in <code>on_success_callback</code> will be called.</p>
</li>
</ul>
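<p>The trigger-rule change in the list above can be illustrated with a simplified sketch. It models only the two rules discussed, assumes all parent tasks have finished, and uses a hypothetical helper named <code>evaluate</code>; it is not the scheduler&rsquo;s actual dependency code.</p>

```python
# Parent states that count as a failure for both rules.
FAILED_STATES = {"failed", "upstream_failed"}


def evaluate(rule, upstream_states):
    """Return the outcome for a task whose parents have all finished."""
    # Any failed parent stops the task under both rules.
    if any(state in FAILED_STATES for state in upstream_states):
        return "upstream_failed"
    if rule == "none_failed":
        # New 1.10.10 behavior: run even if every parent was skipped.
        return "run"
    if rule == "none_failed_or_skipped":
        # Old none_failed behavior: skip when every parent was skipped.
        all_skipped = all(state == "skipped" for state in upstream_states)
        return "skip" if all_skipped else "run"
    raise ValueError("unsupported rule: " + rule)
```

<p>The only case where the two rules differ in this sketch is when every parent was skipped, which is exactly the case to check before upgrading.</p>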
<h2 id="special-note--deprecations">Special Note / Deprecations</h2>
<h3 id="python-2">Python 2</h3>
<p>Python 2 reached its end of life in January 2020. Airflow Master no longer supports Python 2.
Airflow 1.10.* will be the last series to support Python 2.</p>
<p>We strongly recommend that users use Python &gt;= 3.6.</p>
<h3 id="use-airflow-rbac-ui">Use Airflow RBAC UI</h3>
<p>Airflow 1.10.10 ships with 2 UIs: the default non-RBAC Flask-admin based UI and the Flask-AppBuilder based UI.</p>
<p>The Flask-AppBuilder (FAB) based UI allows Role-based Access Control and has more advanced features compared to
the legacy Flask-admin based UI. This UI can be enabled by setting <code>rbac=True</code> in the <code>[webserver]</code> section of your <code>airflow.cfg</code>.</p>
<p>The Flask-admin based UI is deprecated and new features won&rsquo;t be ported to it. This UI will still be the default
for the 1.10.* series but will no longer be available from Airflow 2.0.</p>
<h3 id="running-airflow-on-macos">Running Airflow on MacOS</h3>
<p>If you are running Airflow on macOS and get the following error in the Scheduler logs,
run <code>export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES</code> in your scheduler environment:</p>
<pre><code>objc[1873]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[1873]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
</code></pre>
<p>This error occurs because of added security that restricts multiprocessing &amp; multithreading in macOS High Sierra and above.</p>
<h3 id="we-have-moved-to-github-issues">We have moved to GitHub Issues</h3>
<p>The Airflow Project has moved from <a href="https://issues.apache.org/jira/projects/AIRFLOW/issues">JIRA</a> to
<a href="https://github.com/apache/airflow/issues">GitHub</a> for tracking issues.</p>
<p>So if you find any bugs in Airflow 1.10.10, please create a GitHub Issue for them.</p>
<h2 id="list-of-contributors">List of Contributors</h2>
<p>According to git shortlog, the following people contributed to the 1.10.10 release. Thank you to all contributors!</p>
<p>ANiteckiP, Alex Guziel, Alex Lue, Anita Fronczak, Ash Berlin-Taylor, Benji Visser, Bhavika Tekwani, Brad Dettmer, Chris McLennon, Cooper Gillan, Daniel Imberman, Daniel Standish, Felix Uellendall, Jarek Potiuk, Jiajie Zhong, Jithin Sukumar, Kamil Breguła, Kaxil Naik, Kengo Seki, Kris, Kumpan Anton, Lokesh Lal, Louis Guitton, Louis Simoneau, Luyao Yang, Noël Bardelot, Omair Khan, Philipp Großelfinger, Ping Zhang, RasPavel, Ray, Robin Edwards, Ry Walker, Saurabh, Sebastian Brandt, Tomek Kzukowski, Tomek Urbaszek, Van-Duyet Le, Xiaodong Deng, Xinbin Huang, Yu Qian, Zacharya, atrbgithub, cong-zhu, retornam</p>
]]></content>
  </entry>
  
  <entry>
    <title>Apache Airflow 1.10.8 &amp; 1.10.9</title>
    <link href="/blog/airflow-1.10.8-1.10.9/" rel="alternate"/>
    <id>/blog/airflow-1.10.8-1.10.9/</id>
    <published>2020-02-23T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html">&lt;![CDATA[<p>Airflow 1.10.8 contains 160 commits since 1.10.7 and includes 4 new features, 42 improvements, 36 bug fixes, and several doc changes.</p>
<p>We released 1.10.9 on the same day as one of the Flask dependencies (Werkzeug) released 1.0, which broke Airflow 1.10.8.</p>
<p><strong>Details</strong>:</p>
<ul>
<li><strong>PyPI</strong>: <a href="https://pypi.org/project/apache-airflow/1.10.9/">https://pypi.org/project/apache-airflow/1.10.9/</a></li>
<li><strong>Docs</strong>: <a href="https://airflow.apache.org/docs/1.10.9/">https://airflow.apache.org/docs/1.10.9/</a></li>
<li><strong>Changelog (1.10.8)</strong>: <a href="http://airflow.apache.org/docs/1.10.8/changelog.html#airflow-1-10-8-2020-01-07">http://airflow.apache.org/docs/1.10.8/changelog.html#airflow-1-10-8-2020-01-07</a></li>
<li><strong>Changelog (1.10.9)</strong>: <a href="http://airflow.apache.org/docs/1.10.9/changelog.html#airflow-1-10-9-2020-02-10">http://airflow.apache.org/docs/1.10.9/changelog.html#airflow-1-10-9-2020-02-10</a></li>
</ul>
<p>Some of the noteworthy new features (user-facing) are:</p>
<ul>
<li><a href="https://github.com/apache/airflow/pull/6489">Add tags to DAGs and use it for filtering in the UI (RBAC only)</a></li>
<li><a href="http://airflow.apache.org/docs/1.10.9/executor/debug.html">New Executor: DebugExecutor for Local debugging from your IDE</a></li>
<li><a href="https://github.com/apache/airflow/pull/7281">Allow passing conf in &ldquo;Add DAG Run&rdquo; (Triggered Dags) view</a></li>
<li><a href="https://github.com/apache/airflow/pull/7038">Allow dags to run for future execution dates for manually triggered DAGs (only if <code>schedule_interval=None</code>)</a></li>
<li><a href="https://airflow.apache.org/docs/1.10.9/configurations-ref.html">Dedicated page in documentation for all configs in airflow.cfg</a></li>
</ul>
<h3 id="add-tags-to-dags-and-use-it-for-filtering-in-the-ui">Add tags to DAGs and use it for filtering in the UI</h3>
<p>In order to filter DAGs (e.g. by team), you can add tags to each DAG. The filter is saved in a cookie and can be reset with the reset button.</p>
<p>For example:</p>
<p>In your DAG file, pass a list of the tags you want to add to the DAG object:</p>
<div class="highlight"><pre tabindex="0" class="chroma"><code class="language-python" data-lang="python"><span class="line"><span class="cl"><span class="n">dag</span> <span class="o">=</span> <span class="n">DAG</span><span class="p">(</span>
</span></span><span class="line"><span class="cl">    <span class="n">dag_id</span><span class="o">=</span><span class="s1">&#39;example_dag_tag&#39;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">schedule_interval</span><span class="o">=</span><span class="s1">&#39;0 0 * * *&#39;</span><span class="p">,</span>
</span></span><span class="line"><span class="cl">    <span class="n">tags</span><span class="o">=</span><span class="p">[</span><span class="s1">&#39;example&#39;</span><span class="p">]</span>
</span></span><span class="line"><span class="cl"><span class="p">)</span>
</span></span></code></pre></div><p><strong>Screenshot</strong>:
<img src="/blog/airflow-1.10.8-1.10.9/airflow-dag-tags.png" alt="Add filter by DAG tags"></p>
<p><strong>Note</strong>: This feature is only available for the RBAC UI (enabled using <code>rbac=True</code> in the <code>[webserver]</code> section of your <code>airflow.cfg</code>).</p>
<h2 id="special-note--deprecations">Special Note / Deprecations</h2>
<h3 id="python-2">Python 2</h3>
<p>Python 2 reached its end of life in January 2020. Airflow Master no longer supports Python 2.
Airflow 1.10.* will be the last series to support Python 2.</p>
<p>We strongly recommend that users use Python &gt;= 3.6.</p>
<h3 id="use-airflow-rbac-ui">Use Airflow RBAC UI</h3>
<p>Airflow 1.10.9 ships with 2 UIs: the default non-RBAC Flask-admin based UI and the Flask-AppBuilder based UI.</p>
<p>The Flask-AppBuilder (FAB) based UI allows Role-based Access Control and has more advanced features compared to
the legacy Flask-admin based UI. This UI can be enabled by setting <code>rbac=True</code> in the <code>[webserver]</code> section of your <code>airflow.cfg</code>.</p>
<p>The Flask-admin based UI is deprecated and new features won&rsquo;t be ported to it. This UI will still be the default
for the 1.10.* series but will no longer be available from Airflow 2.0.</p>
<h2 id="list-of-contributors">List of Contributors</h2>
<p>According to git shortlog, the following people contributed to the 1.10.8 and 1.10.9 releases. Thank you to all contributors!</p>
<p>Anita Fronczak, Ash Berlin-Taylor, BasPH, Bharat Kashyap, Bharath Palaksha, Bhavika Tekwani, Bjorn Olsen, Brian Phillips, Cooper Gillan, Daniel Cohen, Daniel Imberman, Daniel Standish, Gabriel Eckers, Hossein Torabi, Igor Khrol, Jacob, Jarek Potiuk, Jay, Jiajie Zhong, Jithin Sukumar, Kamil Breguła, Kaxil Naik, Kousuke Saruta, Mustafa Gök, Noël Bardelot, Oluwafemi Sule, Pete DeJoy, QP Hou, Qian Yu, Robin Edwards, Ry Walker, Steven van Rossum, Tomek Urbaszek, Xinbin Huang, Yuen-Kuei Hsueh, Yu Qian, Zacharya, ZxMYS, rconroy293, tooptoop4</p>
]]></content>
  </entry>
  
  <entry>
    <title>Experience in Google Season of Docs 2019 with Apache Airflow</title>
    <link href="/blog/experience-in-google-season-of-docs-2019-with-apache-airflow/" rel="alternate"/>
    <id>/blog/experience-in-google-season-of-docs-2019-with-apache-airflow/</id>
    <published>2019-12-20T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html">&lt;![CDATA[<p>I came across <a href="https://developers.google.com/season-of-docs">Google Season of Docs</a> (GSoD) almost by accident, thanks to my extensive HackerNews and Twitter addiction.  I was familiar with the Google Summer of Code but not with this program.
It turns out it was the inaugural phase. I read the details, and the process felt a lot like GSoC except that this was about documentation.</p>
<h2 id="about-me">About Me</h2>
<p>I have been writing tech articles on Medium as well as my blog for the past 1.5 years. Blogging helps me test my understanding of concepts, as untangling the toughest of ideas into simple sentences requires a considerable time investment.</p>
<p>Also, I have been working as a Software Developer for the past three years, which involves writing documentation for my projects as well. I completed my B.Tech at IIT Roorkee. During my time in college, I applied for GSoC once but didn’t make it to the final list of selected candidates.</p>
<p>I saw GSoD as an excellent opportunity to improve my technical writing skills using feedback from the open-source community. I contributed some bug fixes and features to Apache Superset and Apache Druid, but this would be my first contribution as a technical writer.</p>
<h2 id="searching-for-the-organization">Searching for the organization</h2>
<p>More than 40 organizations were participating in GSoD. However, two stood out immediately as the right choice for me. The first was <a href="https://airflow.apache.org/">Apache Airflow</a>, because I had already used Airflow extensively and had also contributed some custom operators to my previous company’s fork.</p>
<p>The second was <a href="http://cassandra.apache.org/">Apache Cassandra</a>, which I had also worked with extensively but had never contributed code or doc changes to.</p>
<p>Considering my overall experience, I decided to go with Airflow.</p>
<h2 id="project-selection">Project selection</h2>
<p>After selecting the org, the next step was to choose the project. Again, my previous experience played a role here, and I ended up picking the <strong>How to create a workflow</strong> project. The aim of the project was to write documentation that helps users create complex as well as custom DAGs.
The final deliverables were a bit different, though. More on that later.</p>
<p>After submitting my application, I got involved in my job until one day, I saw a mail from Google confirming my selection as a Technical Writer for the project.</p>
<h2 id="community-bonding">Community Bonding</h2>
<p>Getting selected is just the beginning. I got the invite to the Airflow Slack channel where most of the discussions happened.
My mentor was <a href="https://github.com/ashb">Ash Berlin-Taylor</a> from Apache Airflow. I started talking to my mentor to get a general sense of what deliverables were expected. The deliverables were documented in <a href="https://cwiki.apache.org/confluence/display/AIRFLOW/Season&#43;of&#43;Docs&#43;2019">Confluence</a>:</p>
<ul>
<li>A page for how to create a DAG that also includes:
<ul>
<li>Revamping the page related to scheduling a DAG</li>
<li>Adding tips for specific DAG conditions, such as rerunning a failed task</li>
</ul>
</li>
<li>A page for developing custom operators that includes:
<ul>
<li>Describing mechanisms that are important when creating an operator, such as template fields, UI color, hooks, connection, etc.</li>
<li>Describing the responsibility between the operator and the hook</li>
<li>Considerations for dealing with shared resources (such as connections and hooks)</li>
</ul>
</li>
<li>A page that describes how to define the relationships between tasks. The page should include information about:
<ul>
<li><code>&gt;&gt;</code> and <code>&lt;&lt;</code></li>
<li>set upstream / set downstream</li>
<li>helper methods, e.g. <code>chain</code></li>
</ul>
</li>
<li>A page that describes the communication between tasks that also includes:
<ul>
<li>Revamping the page related to macros and XCOM</li>
</ul>
</li>
</ul>
<p>My mentor set the expectation early on that the deliverables were guidelines rather than strict rules.
If I wanted to, I could also choose to work on something else related to the project that was not listed under the deliverables.
After connecting with the mentor, I started engaging with the overall Airflow community. The people in the community were helpful, especially <a href="https://github.com/mik-laj">Kamil Bregula</a>. Kamil helped me get started with the guidelines to follow while writing documentation for Airflow.</p>
<h2 id="doc-development">Doc Development</h2>
<p>I picked DAG run as my first deliverable. I chose this topic as some parts of it were already documented but needed some additional text.
I split the existing Scheduling &amp; Triggers page into two new pages.</p>
<ol>
<li>Schedulers</li>
<li>DAG Runs</li>
</ol>
<p>Most of the details unrelated to schedulers were moved to the DAG Runs page, and then missing points, such as how to re-run a task or DAG, were added.
Once I was satisfied with my version, I asked my mentor and Kamil to review it. For the first version, I shared the text in the Google Docs file in which the reviewers added comments.
However, the document started getting messy, and it became difficult to track the changes. The time had come now to raise a proper Pull Request.</p>
<p>This was when I faced my first challenge. The documentation of Apache Airflow is written using reStructuredText (RST) syntax, with which I was entirely unfamiliar. I had mostly worked in Markdown.
I spent the next couple of days understanding the syntax. Fortunately, it was quite easy to get acquainted.
I raised the <a href="https://github.com/apache/airflow/pull/6295">Pull Request</a> and waited for the comments. Finally, after a few days when I saw the comments, they were mostly related to two things - grammar and formatting. There were also comments related to what I had missed or misinterpreted.</p>
<h3 id="using-correct-grammar">Using correct grammar</h3>
<p>After discussing with Kamil, I decided to follow <a href="https://developers.google.com/style/">Google’s Developer Documentation Guidelines</a>. These guidelines contain almost everything you’ll need to consider while writing good documentation, such as always using the active voice.
Secondly, I installed the Grammarly app. After writing a doc, I would put it in Grammarly to check for errors. Then I corrected the errors, made some more changes, and pushed it to Grammarly again. This was an iterative process until I arrived at a version of the doc that was grammatically correct but did not seem to have been written by an AI.</p>
<h3 id="formatting">Formatting</h3>
<p>Formatting involves writing notes and tips, marking the Airflow components correctly in the text, and making sure a user who is skimming through the docs doesn’t miss the critical text.
This required a bit of trial and error. I studied the current pattern in the Airflow docs, made changes, pushed commits, incorporated new review comments, and so on.</p>
<p>In the end, all the reviewers approved the PR, but it was not merged until two months later. This was because we were unsure whether some more pages, such as <strong>Concepts</strong>, should also be split up, resulting in a better-structured document. In the end, we decided to delay it until we had discussed it with the broader community.</p>
<p>My <a href="https://github.com/apache/airflow/pull/6348">second PR</a> was a completely new document. It covered how to create a custom operator. Since I was now familiar with most of the syntax, I raised the PR directly without going via Google Docs. I received a lot of comments again, but this time they were more about what I had written than how I had written it,
for example, asking me to describe in detail how to use <strong>template fields</strong> and to clean up my code examples. Fewer comments about grammar &amp; formatting showed I had made progress.
The PR was accepted within two weeks and gave me a huge confidence boost.</p>
<p>After my second PR, I was in a bit of a deadlock. My last remaining deliverable was related to <strong>Macros</strong>, but the scope wasn’t clear. I talked to my mentor, and he told me he didn’t mind if I went off-track to work on something else while the community figured out what changes were needed.
We discussed a lot of ideas. In the end, I decided to go with a Best Practices guide inspired by my mentor’s <a href="https://drive.google.com/file/d/1E4zle8-fv5S1rrlcNUzjiEV19OMYvwoY/view?usp=sharing">talk on Apache Airflow</a> at a meetup. Having faced challenges while running Airflow in production myself, I was highly motivated to write something like this so that other developers don’t suffer.
The first draft was ready within two weeks. I called it <strong>Running Airflow in Production</strong>. However, after adding a few more pieces to the document, I realized it was better to call it a <strong>Best Practices</strong> guide, something most open-source projects have.</p>
<p>People were enthusiastic about this <a href="https://github.com/apache/airflow/pull/6515">pull request</a> since a lot of them faced the challenges described in the doc. I had hit the nail on the head. After some deliberation over the next 1-2 weeks, my PR got accepted.</p>
<p>I then returned to my first PR and started making changes related to the new review comments. After this, I talked with my mentor about specific elements that were bugging him, such as getting people to understand how the schedule interval works in as few words as possible.
After a lot of trial and error, we arrived at a version with which both of us could make peace.</p>
<h2 id="final-evaluation">Final Evaluation</h2>
<p>On 12th September, I received an email from Google about the successful completion of the project. This meant my mentor liked my work. The Airflow community also appreciated the contributions.</p>
<p>My documents were finally published on the Airflow website:</p>
<ul>
<li><a href="https://airflow.readthedocs.io/en/latest/dag-run.html">DAG Runs</a></li>
<li><a href="https://airflow.readthedocs.io/en/latest/scheduler.html">Scheduler</a></li>
<li><a href="https://airflow.readthedocs.io/en/latest/howto/custom-operator.html">Creating a custom operator</a></li>
<li><a href="https://airflow.readthedocs.io/en/latest/best-practices.html">Best Practices</a></li>
</ul>
<p>I also started getting invited to review other developers’ PRs. I am looking forward to making more contributions to the project in the coming year.</p>
]]></content>
  </entry>
  
  <entry>
    <title>Airflow Survey 2019</title>
    <link href="/blog/airflow-survey/" rel="alternate"/>
    <id>/blog/airflow-survey/</id>
    <published>2019-12-11T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html">&lt;![CDATA[<h1 id="apache-airflow-survey-2019">Apache Airflow Survey 2019</h1>
<p>Apache Airflow is <a href="https://www.astronomer.io/blog/why-airflow/">growing faster than ever</a>.
Thus, receiving and adjusting to our users’ feedback is a must. We created a
<a href="https://forms.gle/XAzR1pQBZiftvPQM7">survey</a> and received <strong>308</strong> responses.
Let’s see who Airflow users are, how they use it, and what they miss.</p>
<h1 id="overview-of-the-user">Overview of the user</h1>
<p><strong>What best describes your current occupation?</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Data Engineer</td>
          <td>194</td>
          <td>62.99%</td>
      </tr>
      <tr>
          <td>Developer</td>
          <td>34</td>
          <td>11.04%</td>
      </tr>
      <tr>
          <td>Architect</td>
          <td>23</td>
          <td>7.47%</td>
      </tr>
      <tr>
          <td>Data Scientist</td>
          <td>19</td>
          <td>6.17%</td>
      </tr>
      <tr>
          <td>Data Analyst</td>
          <td>13</td>
          <td>4.22%</td>
      </tr>
      <tr>
          <td>DevOps</td>
          <td>13</td>
          <td>4.22%</td>
      </tr>
      <tr>
          <td>IT Administrator</td>
          <td>2</td>
          <td>0.65%</td>
      </tr>
      <tr>
          <td>Machine Learning Engineer</td>
          <td>2</td>
          <td>0.65%</td>
      </tr>
      <tr>
          <td>Manager</td>
          <td>2</td>
          <td>0.65%</td>
      </tr>
      <tr>
          <td>Operations</td>
          <td>2</td>
          <td>0.65%</td>
      </tr>
      <tr>
          <td>Chief Data Officer</td>
          <td>1</td>
          <td>0.32%</td>
      </tr>
      <tr>
          <td>Engineering Manager</td>
          <td>1</td>
          <td>0.32%</td>
      </tr>
      <tr>
          <td>Intern</td>
          <td>1</td>
          <td>0.32%</td>
      </tr>
      <tr>
          <td>Product owner</td>
          <td>1</td>
          <td>0.32%</td>
      </tr>
      <tr>
          <td>Quant</td>
          <td>1</td>
          <td>0.32%</td>
      </tr>
  </tbody>
</table>
<p><strong>In your day to day job, what do you use Airflow for?</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Data processing (ETL)</td>
          <td>298</td>
          <td>96.75%</td>
      </tr>
      <tr>
          <td>Artificial Intelligence and Machine Learning Pipelines</td>
          <td>90</td>
          <td>29.22%</td>
      </tr>
      <tr>
          <td>Automating DevOps operations</td>
          <td>64</td>
          <td>20.78%</td>
      </tr>
  </tbody>
</table>
<p>According to the survey, most Airflow users are “data” people. Moreover,
28.57% use Airflow for both ETL and ML pipelines, which suggests that those two fields
are connected. Only five respondents use Airflow for DevOps operations alone,
which means that the other 59 people who use Airflow for DevOps also use it for
ETL / ML purposes.</p>
<p><strong>How many active DAGs do you have in your largest Airflow instance?</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>0-20</td>
          <td>115</td>
          <td>37.34%</td>
      </tr>
      <tr>
          <td>21-40</td>
          <td>65</td>
          <td>21.10%</td>
      </tr>
      <tr>
          <td>41-60</td>
          <td>44</td>
          <td>14.29%</td>
      </tr>
      <tr>
          <td>61-100</td>
          <td>28</td>
          <td>9.09%</td>
      </tr>
      <tr>
          <td>101-200</td>
          <td>28</td>
          <td>9.09%</td>
      </tr>
      <tr>
          <td>201-300</td>
          <td>7</td>
          <td>2.27%</td>
      </tr>
      <tr>
          <td>301-999</td>
          <td>8</td>
          <td>2.60%</td>
      </tr>
      <tr>
          <td>1000+</td>
          <td>13</td>
          <td>4.22%</td>
      </tr>
  </tbody>
</table>
<p>The majority of users (about 82%) do not exceed 100 active DAGs per Airflow instance.
However, as we can see, some users run more than a thousand DAGs, with a reported maximum of 5000.</p>
<p><strong>What is the maximum number of tasks that you have used in one DAG?</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>0-10</td>
          <td>61</td>
          <td>19.81%</td>
      </tr>
      <tr>
          <td>11-20</td>
          <td>60</td>
          <td>19.48%</td>
      </tr>
      <tr>
          <td>21-30</td>
          <td>31</td>
          <td>10.06%</td>
      </tr>
      <tr>
          <td>31-40</td>
          <td>21</td>
          <td>6.82%</td>
      </tr>
      <tr>
          <td>41-50</td>
          <td>26</td>
          <td>8.44%</td>
      </tr>
      <tr>
          <td>51-100</td>
          <td>36</td>
          <td>11.69%</td>
      </tr>
      <tr>
          <td>101-200</td>
          <td>28</td>
          <td>9.09%</td>
      </tr>
      <tr>
          <td>201-500</td>
          <td>21</td>
          <td>6.82%</td>
      </tr>
      <tr>
          <td>501+</td>
          <td>24</td>
          <td>7.79%</td>
      </tr>
  </tbody>
</table>
<p>The given maximum number of tasks in a single DAG was 10 000 (!). The number of tasks
depends on the purposes of a DAG, so it’s rather hard to say if users have “simple”
or “complicated” workflows.</p>
<p><strong>When onboarding new members to Airflow, what is the biggest problem?</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>No guide on best practises on developing DAGs</td>
          <td>160</td>
          <td>51.95%</td>
      </tr>
      <tr>
          <td>Small number of tutorials on different aspects of using Airflow</td>
          <td>57</td>
          <td>18.51%</td>
      </tr>
      <tr>
          <td>Documentation is not clear enough</td>
          <td>42</td>
          <td>13.64%</td>
      </tr>
      <tr>
          <td>Small number of blogs regarding Airflow</td>
          <td>6</td>
          <td>1.95%</td>
      </tr>
      <tr>
          <td>Other</td>
          <td>43</td>
          <td>13.96%</td>
      </tr>
  </tbody>
</table>
<p>This is an important result. Using Airflow is all about writing and scheduling DAGs,
so the lack of a guide or any other complete resource on best practices for developing
DAGs is a big problem. Diving deep into the “other” answers, we can find that:</p>
<ul>
<li>Airflow’s “magic” (scheduler, executors, schedule times) is hard to understand</li>
<li>DAG testing is not easy to do and to explain</li>
<li>Airflow UI needs some love.</li>
</ul>
<p><strong>How likely are you to recommend Apache Airflow?</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Very Likely</td>
          <td>140</td>
          <td>45.45%</td>
      </tr>
      <tr>
          <td>Likely</td>
          <td>124</td>
          <td>40.26%</td>
      </tr>
      <tr>
          <td>Neutral</td>
          <td>33</td>
          <td>10.71%</td>
      </tr>
      <tr>
          <td>Unlikely</td>
          <td>8</td>
          <td>2.60%</td>
      </tr>
      <tr>
          <td>Very unlikely</td>
          <td>3</td>
          <td>0.97%</td>
      </tr>
  </tbody>
</table>
<p>This means that more than 85% of people who use Airflow like it. It seems Airflow does
its job nicely. However, we have to remember that this survey is likely biased - it’s
more likely that you respond to the survey if you like the tool you use. Should we
focus then on those 11 people who did not like Airflow? It’s a good question.</p>
<h2 id="airflow-usage">Airflow usage</h2>
<p><strong>Which interface(s) of Airflow do you use as part of your current role?</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Original Airflow Graphical User Interface</td>
          <td>297</td>
          <td>96.43%</td>
      </tr>
      <tr>
          <td>CLI</td>
          <td>126</td>
          <td>40.91%</td>
      </tr>
      <tr>
          <td>Original Airflow Graphical User Interface, CLI</td>
          <td>117</td>
          <td>37.99%</td>
      </tr>
      <tr>
          <td>API</td>
          <td>60</td>
          <td>19.48%</td>
      </tr>
      <tr>
          <td>Original Airflow Graphical User Interface, CLI, API</td>
          <td>32</td>
          <td>10.39%</td>
      </tr>
      <tr>
          <td>Custom (own created) Airflow Graphical User Interface</td>
          <td>25</td>
          <td>8.12%</td>
      </tr>
  </tbody>
</table>
<p>CLI usage goes hand in hand with use of the Airflow web UI: 117 of the 126 CLI users
also use the graphical interface. Our survey included some UX-related questions to help
us understand how users use the Airflow webserver.</p>
<p><strong>What do you use the Graphical User Interface for?</strong></p>
<p><img src="/blog/airflow-survey/plot1.png" alt=""></p>
<p><strong>What do you use CLI for?</strong></p>
<p><img src="/blog/airflow-survey/plot2.png" alt=""></p>
<p><strong>In Airflow, which UI view(s) are important for you?</strong></p>
<p><img src="/blog/airflow-survey/plot3.png" alt=""></p>
<p>Here we see that the majority use the web UI mostly for monitoring purposes:</p>
<ul>
<li>Monitoring DAGs</li>
<li>Accessing logs</li>
</ul>
<p>An interesting result is that many people seem not to use backfilling, as
the only way to do it is through the CLI.</p>
<p><strong>What executor type do you use?</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Celery</td>
          <td>138</td>
          <td>44.81%</td>
      </tr>
      <tr>
          <td>Local</td>
          <td>85</td>
          <td>27.60%</td>
      </tr>
      <tr>
          <td>Kubernetes</td>
          <td>52</td>
          <td>16.88%</td>
      </tr>
      <tr>
          <td>Sequential</td>
          <td>22</td>
          <td>7.14%</td>
      </tr>
      <tr>
          <td>Other</td>
          <td>11</td>
          <td>3.57%</td>
      </tr>
  </tbody>
</table>
<p>The “Other” answers mostly came from respondents who use several executor types or are
migrating from one executor to another. What can be observed is an increase in usage
of Local and Kubernetes executors when compared to results from an earlier <a href="https://ash.berlintaylor.com/writings/2019/02/airflow-user-survey-2019/">survey done
by Ash</a>.</p>
<p><strong>Do you use Kubernetes-based deployments for Airflow?</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>No - we do not plan to use Kubernetes near term</td>
          <td>88</td>
          <td>28.57%</td>
      </tr>
      <tr>
          <td>Yes - setup on our own via Helm Chart or similar</td>
          <td>65</td>
          <td>21.10%</td>
      </tr>
      <tr>
          <td>Not yet - but we use Kubernetes in our organization and we could move</td>
          <td>61</td>
          <td>19.81%</td>
      </tr>
      <tr>
          <td>Yes - via managed service in the cloud (Composer / Astronomer etc.)</td>
          <td>45</td>
          <td>14.61%</td>
      </tr>
      <tr>
          <td>Not yet - but we plan to deploy Kubernetes in our organization soon</td>
          <td>42</td>
          <td>13.64%</td>
      </tr>
      <tr>
          <td>Other</td>
          <td>7</td>
          <td>2.27%</td>
      </tr>
  </tbody>
</table>
<p>The most interesting thing is that nearly 30% of users do not use Kubernetes
and do not plan to move to it. This means we should keep other deployment options in
mind when working on Airflow 2.0. On the other hand, almost 70% of the users either already
use Kubernetes or consider it a viable option.</p>
<p><strong>Do you combine multiple DAGs?</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>No, I don&rsquo;t combine multiple DAGs</td>
          <td>127</td>
          <td>41.23%</td>
      </tr>
      <tr>
          <td>Yes, through SubDAG</td>
          <td>73</td>
          <td>23.70%</td>
      </tr>
      <tr>
          <td>Yes, by triggering another DAG</td>
          <td>72</td>
          <td>23.38%</td>
      </tr>
      <tr>
          <td>Other</td>
          <td>36</td>
          <td>11.69%</td>
      </tr>
  </tbody>
</table>
<p>In the other category, 9 people explicitly mentioned using <code>ExternalTaskSensor</code>,
and I think it could be treated as running subDAGs by triggering other DAGs.</p>
<p><strong>Do you use Airflow Plugins? If yes, what do you use it for?</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Adding new operators/sensors and hooks</td>
          <td>187</td>
          <td>60.71%</td>
      </tr>
      <tr>
          <td>I don&rsquo;t use Airflow plugins</td>
          <td>109</td>
          <td>35.39%</td>
      </tr>
      <tr>
          <td>Adding AppBuilder views &amp; menu items</td>
          <td>31</td>
          <td>10.06%</td>
      </tr>
      <tr>
          <td>Adding new executor</td>
          <td>18</td>
          <td>5.84%</td>
      </tr>
      <tr>
          <td>Adding OperatorExtraLinks</td>
          <td>7</td>
          <td>2.27%</td>
      </tr>
  </tbody>
</table>
<p>The high percentage (60%) for “Adding new operators/sensors and hooks” is quite a
surprising result for some of us, especially since you do not actually need to use the
plugin mechanism to add any of those. They are standard Python objects, and you can
simply drop your hook/operator/sensor code into a directory listed in the <code>PYTHONPATH</code> environment variable and
it will work. This may be a result of the lack of a best practices guide.</p>
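<p>As a rough illustration of the point about <code>PYTHONPATH</code>, the sketch below writes a small module
to a temporary directory, puts that directory on <code>sys.path</code> (which is what <code>PYTHONPATH</code> feeds at
interpreter startup), and imports a class from it without any plugin machinery. The module name
<code>my_company_operators</code> and the class <code>HelloOperator</code> are hypothetical, and the class is a plain
Python stand-in so that the example runs without Airflow installed; a real operator would
subclass Airflow&rsquo;s <code>BaseOperator</code>.</p>
<pre><code class="language-python">import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module source. A real operator would subclass Airflow's
# BaseOperator; a plain class is used here so the sketch runs standalone.
MODULE_SOURCE = '''
class HelloOperator:
    """Stand-in for a custom operator (hypothetical name)."""

    def __init__(self, name):
        self.name = name

    def execute(self, context):
        return "Hello, " + self.name
'''


def load_operator_class(directory):
    """Import HelloOperator from a module file located in `directory`."""
    # Adding the directory to sys.path mirrors listing it in PYTHONPATH.
    sys.path.insert(0, str(directory))
    try:
        importlib.invalidate_caches()
        module = importlib.import_module("my_company_operators")
    finally:
        # Restore sys.path so the sketch leaves no side effects behind.
        sys.path.remove(str(directory))
    return module.HelloOperator


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "my_company_operators.py").write_text(MODULE_SOURCE)
    HelloOperator = load_operator_class(tmp)
    result = HelloOperator(name="Airflow").execute(context={})
    print(result)
</code></pre>
<p>In a deployment, the same effect would come from exporting <code>PYTHONPATH</code> with the directory that holds
the module, after which DAG files can use a normal <code>import</code> statement.</p>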
<p>Plugins are more useful for adding views and menu items, yet only 10% of respondents use them that way.
OperatorExtraLinks are an even more useful (though relatively new) feature, so it’s not
entirely surprising that they are hardly used.</p>
<p>It was also somewhat surprising that anyone uses plugins to add their own
executors. We considered removing that option recently, but now we have to rethink
our approach.</p>
<p><strong>What metrics do you use to monitor Airflow?</strong></p>
<p>There were a lot of different responses. Some use Prometheus and other services,
others do not use any monitoring. One of the interesting responses linked to this
solution for <a href="https://github.com/mastak/airflow_operators_metrics">airflow_operators_metrics</a>.</p>
<h2 id="external-services">External services</h2>
<p><strong>What external services do you use in your Airflow DAGs?</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Amazon Web Services</td>
          <td>160</td>
          <td>51.95%</td>
      </tr>
      <tr>
          <td>Internal company systems</td>
          <td>150</td>
          <td>48.70%</td>
      </tr>
      <tr>
          <td>Hadoop / Spark / Flink / Other Apache software</td>
          <td>119</td>
          <td>38.64%</td>
      </tr>
      <tr>
          <td>Google Cloud Platform / Google APIs</td>
          <td>112</td>
          <td>36.36%</td>
      </tr>
      <tr>
          <td>Microsoft Azure</td>
          <td>28</td>
          <td>9.09%</td>
      </tr>
      <tr>
          <td>I do not use external services in my Airflow DAGs</td>
          <td>18</td>
          <td>5.84%</td>
      </tr>
  </tbody>
</table>
<p>It’s not surprising that Amazon Web Services is leading the way, as it is considered the most mature
cloud provider. Internal systems and other Apache products in the next two positions are
quite understandable if we take into account that the majority use Airflow for ETL processes.</p>
<p><strong>What external services do you use in your Airflow DAGs? (Mixed providers)</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Google Cloud Platform / Google APIs, Amazon Web Services</td>
          <td>44</td>
          <td>14.29%</td>
      </tr>
      <tr>
          <td>Amazon Web Services, Microsoft Azure</td>
          <td>5</td>
          <td>1.62%</td>
      </tr>
      <tr>
          <td>Google Cloud Platform / Google APIs, Microsoft Azure</td>
          <td>4</td>
          <td>1.30%</td>
      </tr>
  </tbody>
</table>
<p>This result is not surprising because companies usually prefer to stick with one cloud
provider.</p>
<p><strong>How do you integrate with external services?</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Using Bash / Python operator</td>
          <td>220</td>
          <td>71.43%</td>
      </tr>
      <tr>
          <td>Using existing, dedicated operators / hooks</td>
          <td>217</td>
          <td>70.45%</td>
      </tr>
      <tr>
          <td>Using own, custom operators / hooks</td>
          <td>216</td>
          <td>70.13%</td>
      </tr>
  </tbody>
</table>
<p>We had some anecdotal evidence that people use more Python/Bash operators than the
dedicated ones - but it looks like all ways of using Airflow to connect to external
services are equally popular.</p>
<h2 id="what-can-be-improved">What can be improved</h2>
<p><strong>In your opinion, what could be improved in Airflow?</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Scheduler performance</td>
          <td>189</td>
          <td>61.36%</td>
      </tr>
      <tr>
          <td>Web UI</td>
          <td>180</td>
          <td>58.44%</td>
      </tr>
      <tr>
          <td>Logging, monitoring and alerting</td>
          <td>145</td>
          <td>47.08%</td>
      </tr>
      <tr>
          <td>Examples, how-to, onboarding documentation</td>
          <td>143</td>
          <td>46.43%</td>
      </tr>
      <tr>
          <td>Technical documentation</td>
          <td>137</td>
          <td>44.48%</td>
      </tr>
      <tr>
          <td>Reliability</td>
          <td>112</td>
          <td>36.36%</td>
      </tr>
      <tr>
          <td>REST API</td>
          <td>96</td>
          <td>31.17%</td>
      </tr>
      <tr>
          <td>Authentication and authorization</td>
          <td>89</td>
          <td>28.90%</td>
      </tr>
      <tr>
          <td>External integration e.g. AWS, GCP, Apache product</td>
          <td>49</td>
          <td>15.91%</td>
      </tr>
      <tr>
          <td>CLI</td>
          <td>41</td>
          <td>13.31%</td>
      </tr>
      <tr>
          <td>I don’t know</td>
          <td>5</td>
          <td>1.62%</td>
      </tr>
  </tbody>
</table>
<p>The results are fairly self-explanatory. Improved performance of Airflow, a better
UI, and more telemetry are desirable. But this should go hand in hand with improved
documentation and resources about using Airflow, especially when we
take into account the problem of onboarding new users.</p>
<p>Another interesting point from that question is that only 16% think that operators
should be extended and improved. This suggests that we should focus on improving
Airflow core instead of adding more and more integrations.</p>
<p><strong>What would be the most interesting feature for you?</strong></p>
<table>
  <thead>
      <tr>
          <th></th>
          <th>No.</th>
          <th>%</th>
      </tr>
  </thead>
  <tbody>
      <tr>
          <td>Production-ready Airflow docker image</td>
          <td>175</td>
          <td>56.82%</td>
      </tr>
      <tr>
          <td>Declarative way of writing DAGs / automated DAGs generation</td>
          <td>155</td>
          <td>50.32%</td>
      </tr>
      <tr>
          <td>Horizontal Autoscaling</td>
          <td>122</td>
          <td>39.61%</td>
      </tr>
      <tr>
          <td>Asynchronous Operators</td>
          <td>97</td>
          <td>31.49%</td>
      </tr>
      <tr>
          <td>Stateless web server</td>
          <td>81</td>
          <td>26.30%</td>
      </tr>
      <tr>
          <td>Knative Executor</td>
          <td>48</td>
          <td>15.58%</td>
      </tr>
      <tr>
          <td>I already have all I need</td>
          <td>13</td>
          <td>4.22%</td>
      </tr>
  </tbody>
</table>
<p>The production Docker image wins, and that’s no surprise. We all know that deploying
Airflow is not a plug-and-play process, and that’s why the official image is being
worked on by Jarek Potiuk. An unexpected result is that half of the users would like to
have a declarative way of creating DAGs. That seems to go “against Airflow”,
as we always emphasize the possibility of writing workflows in pure Python. Stories
about DAG generators are not new and confirm that there’s a need for a way to
declare DAGs.</p>
]]></content>
  </entry>
  
  <entry>
    <title>New Airflow website</title>
    <link href="/blog/announcing-new-website/" rel="alternate"/>
    <id>/blog/announcing-new-website/</id>
    <published>2019-12-11T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html">&lt;![CDATA[<p>The brand <a href="https://airflow.apache.org/">new Airflow website</a> has arrived! Those who have been following the process know that the journey to update <a href="https://airflow.readthedocs.io/en/1.10.6/">the old Airflow website</a> started at the beginning of the year.
Thanks to sponsorship from the Cloud Composer team at Google, we were able to
collaborate with <code>Polidea</code> and their design studio <code>Utilo</code> and deliver an awesome website.</p>
<p>Documentation of open source projects is key to engaging new contributors in the maintenance,
development, and adoption of software. We want the Apache Airflow community to have
the best possible experience contributing to and using the project. We also took this opportunity to make the project
more accessible, and in doing so, increase its reach.</p>
<p>In the past three and a half months, we have updated everything: created a more efficient landing page,
enhanced information architecture, and improved UX &amp; UI. Most importantly, the website now has capabilities
to be translated into many languages. This is our effort to foster a more inclusive community around
Apache Airflow, and we look forward to seeing contributions in Spanish, Chinese, Russian, and other languages as well!</p>
<p>We built our website on Docsy, a platform that is easy to use and contribute to. Follow
<a href="https://github.com/apache/airflow-site/blob/master/README.md">these steps</a> to set up your environment and
to create your first pull request. You may also use
the new website for your own open source project as a template.
All of our <a href="https://github.com/apache/airflow-site/tree/master">code is open and hosted on GitHub</a>.</p>
<p>Share your questions, comments, and suggestions with us, to help us improve the website.
We hope that this new design makes finding documentation about Airflow easier,
and that its improved accessibility increases adoption and use of Apache Airflow around the world.</p>
<p>Happy browsing!</p>
]]></content>
  </entry>
  
  <entry>
    <title>ApacheCon Europe 2019 — Thoughts and Insights by Airflow Committers</title>
    <link href="/blog/apache-con-europe-2019-thoughts-and-insights-by-airflow-committers/" rel="alternate"/>
    <id>/blog/apache-con-europe-2019-thoughts-and-insights-by-airflow-committers/</id>
    <published>2019-11-22T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html">&lt;![CDATA[<p>Is it possible to create an organization that delivers tens of projects used by millions, where nearly no one is paid for their work, and that has still been carrying on fruitfully for more than 20 years? The Apache Software Foundation proves it is possible. For the last two decades, the ASF has been crafting a model called the Apache Way: a way of organizing and leading tech open source projects. Thanks to this approach, which is strongly based on the “community over code” motto, we can enjoy awesome projects such as Apache Spark, Flink, Beam, and Airflow (and many more).</p>
<p>After this year’s ApacheCon, Polidea’s engineers talked with Committers of Apache projects, including Aizhamal Nurmamat kyzy, Felix Uellendall, and Fokko Driesprong, about what makes the ASF such an amazing organization.</p>
<p>You can read the <a href="https://higrys.medium.com/apachecon-europe-2019-thoughts-and-insights-by-airflow-committers-9ff5f6938c99">insights after the ApacheCon 2019</a>.</p>
]]></content>
  </entry>
  
  <entry>
    <title>Documenting using local development environment</title>
    <link href="/blog/documenting-using-local-development-environments/" rel="alternate"/>
    <id>/blog/documenting-using-local-development-environments/</id>
    <published>2019-11-22T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html">&lt;![CDATA[<h2 id="documenting-local-development-environment-of-apache-airflow">Documenting local development environment of Apache Airflow</h2>
<p>From September to November 2019, I participated in a wonderful initiative, <a href="https://developers.google.com/season-of-docs">Google Season of Docs</a>.</p>
<p>I had the pleasure of contributing to the Apache Airflow open source project as a technical writer.
My initial assignment was an extension to the GitHub-based Contribution guide.</p>
<p>From the very first days, I was closely involved in inter-project communications
via email and Slack, and had regular 1:1s with my mentor, Jarek Potiuk.</p>
<p>I got infected with Jarek’s enthusiasm for easing the onboarding experience for
Airflow contributors. I share this goal and did my best to improve the structure,
language and DX. As a result, Jarek and I extended the current contributor’s docs and
ended up with: the Contributing guide, which navigates users through the project
infrastructure and provides a workflow example based on a real-life use case;
the Testing guide, with an overview of the complex testing infrastructure for Apache Airflow;
and two guides dedicated to the Breeze dev environment and the local virtual environment
(my initial assignment).</p>
<p>I’m deeply grateful to my mentor and Airflow developers for their feedback,
patience and help while I was breaking through new challenges
(I’ve never worked on an open source project before),
and for their support of all my ideas! I think a key success factor for any contributor
is a responsive, supportive and motivated team, and I was lucky to join such
a team for 3 months.</p>
<p>Documents I worked on:</p>
<ul>
<li><a href="https://github.com/apache/airflow/blob/master/BREEZE.rst">Breeze development environment documentation</a></li>
<li><a href="https://github.com/apache/airflow/blob/master/LOCAL_VIRTUALENV.rst">Local virtualenv environment documentation</a></li>
<li><a href="https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst">Contributing guide</a></li>
<li><a href="https://github.com/apache/airflow/blob/master/TESTING.rst">Testing guide</a></li>
</ul>
]]></content>
  </entry>
  
  <entry>
    <title>It&#39;s a &#34;Breeze&#34; to develop Apache Airflow</title>
    <link href="/blog/its-a-breeze-to-develop-apache-airflow/" rel="alternate"/>
    <id>/blog/its-a-breeze-to-develop-apache-airflow/</id>
    <published>2019-11-22T00:00:00Z</published>
    <updated>2026-04-08T16:39:10Z</updated>
    <author>
      <name>Apache Airflow</name>
    </author>
    <content type="html">&lt;![CDATA[<h2 id="the-story-behind-the-airflow-breeze-tool">The story behind the Airflow Breeze tool</h2>
<p>Initially, we started contributing to this fantastic open-source project [Apache Airflow] with a team of three, which then grew to five. When we kicked it off a year ago, I realized pretty soon where the biggest bottlenecks and areas for improvement in terms of productivity were. Even with the help of our client, who provided us with a “homegrown” development environment, it took us literally days to set it up and learn some basics.</p>
<p>That is how the journey to increased productivity in Apache Airflow began. The result? The Airflow Breeze open-source tool. Jarek Potiuk, an Airflow Committer, will tell you all about it.</p>
<p>You can learn <a href="https://higrys.medium.com/its-a-breeze-to-develop-apache-airflow-bf306d3e3505">how and why it’s a &ldquo;Breeze&rdquo; to Develop Apache Airflow</a>.</p>
]]></content>
  </entry>
  
</feed>
