- Notifications
You must be signed in to change notification settings - Fork 16.2k
Make CronTriggerTimetable startup behavior intuitive #41558
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| The "best practice" for having the first run begin at the most recent run is to set the start_date far in the past and catchup=False, but I think having a meaningless date passed to start_date is not a good thing for Airflow to need and doesn't match author intent. |
| I found a logical bug. Setting to False creates a race condition to actually schedule the first run. We'll need some minimum buffer time where the next run time does not update even when |
1cf6a65 to 3b65469 Compare | @uranusjr We spoke about what CronDataInterval's behavior is when start_date is None, but I found that the DAG object actually restricts you from setting start date to None. This, I believe, is no longer necessary, and actually the CronDataIntervalTimetable's behavior as written seems sensible, it's just not accessible. I added a commit to remove that restriction on start_date |
| Needs to resolve conflicts but otherwise looks good to me. |
| This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. |
| @uranusjr I don't think this test failure is related to the change. I merged in the changes from main and think this new version should be ready to merge. |
| You need to rebase - there are conflicts. That should also address the issue with unrelated test failing @collinmcnulty |
| Smth is wrong. There are a LOT of changes |
3bc3d56 to e1f66e6 Compare | I believe I fixed it. Someone else had made the changes to the dag model to allow None in the meantime while this PR was languishing, so now its only about CronTriggerTimetable itself. |
--------- Co-authored-by: Tzu-ping Chung <uranusjr@gmail.com>
The first run behavior for cron based DAGs (and most DAGs really) is quite unintuitive. It is very common for users to want to run a DAG for the first time either immediately, with a logical date of the past run that would have occurred most recently, or to wait until the next run time and then begin. The former is only possible if you set catchup to False and the latter is not possible without knowing ahead of time when that next run would be and hardcoding the start time for the DAG (which is not feasible when doing dev/text/prod environment promotion of the same DAG file).
It is also very common to set the start_date to a dynamic function that calculates a date in the recent past, such as days_ago(1), which can bite you if you're used to doing that (or have coded it into a DAG factory pattern) and then you set up your first monthly DAG. The existing timetables behavior is to silently miss the run, which makes it even worse.
This PR adds the ability to set start_date to
Noneand get the intuitive behavior of running the last run if that run would have happened in the very recent past (10% of the distance between DAG runs) or to wait for the next run otherwise. It also allows using therun_immediatelyparameter to specify always running the last run, always waiting for the next run, or passing a timedelta to exactly control what "recent past" means for this particular DAG. Setting start_date toNoneis a much more accurate and clear indication of author intent thatdays_ago(x).I hope that if this is widely adopted, we can stop seeing users miss DAG runs or using dynamic start dates.
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rstor{issue_number}.significant.rst, in newsfragments.