Hello, we have two ETL jobs: "main" runs at night and "realtime" runs every 15 minutes. To avoid inconsistent data, the two jobs must never run at the same time, so each job has to check whether the other one is running.
How can we do that?
Our first approach is a Python script that requests the task state via the API (".../task/running"). Is there an easier way to solve this?
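A minimal sketch of the API-polling approach, for comparison. The full endpoint URL is left to the caller because only ".../task/running" is given, and the response shape (a JSON list of objects with a `"name"` key) is an assumption; `fetch_running_tasks`, `is_job_running`, and `realtime_may_start` are hypothetical names.

```python
import json
import urllib.request
from typing import Any, Dict, List


def fetch_running_tasks(running_tasks_url: str, timeout: float = 10.0) -> List[Dict[str, Any]]:
    """GET the running-tasks endpoint; the full URL is supplied by the caller."""
    with urllib.request.urlopen(running_tasks_url, timeout=timeout) as resp:
        return json.load(resp)


def is_job_running(running_tasks: List[Dict[str, Any]], job_name: str) -> bool:
    # Assumed response shape: a list of task objects, each with a "name" key.
    return any(task.get("name") == job_name for task in running_tasks)


def realtime_may_start(running_tasks_url: str) -> bool:
    """True if "main" does not appear among the currently running tasks."""
    return not is_job_running(fetch_running_tasks(running_tasks_url), "main")
```

Note that check-then-start leaves a small window in which both jobs can see "nothing running" and start together, so this only works reliably if the two start times never coincide.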
Kalyan Arangam —
Could each job write a specific flag to disk or a database, or drop a file in S3, to indicate that it is running, and clean it up once done?
Each job would then check for this flag before it runs.
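A minimal sketch of the flag idea using a local lock file, assuming both jobs run on the same machine (for jobs on different hosts, the shared location would have to be a database or S3 as suggested). It swaps the "one flag per job" idea for a single shared lock file created atomically with `O_CREAT | O_EXCL`, so checking for the flag and setting it happen in one step. `LOCK_PATH`, `etl_lock`, and `JobAlreadyRunning` are hypothetical names.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator

# Hypothetical lock location; "main" and "realtime" must both use this path,
# which only works if both jobs run on the same machine.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "etl_jobs.lock")


class JobAlreadyRunning(RuntimeError):
    """Raised when another ETL job already holds the lock."""


@contextmanager
def etl_lock(job_name: str, lock_path: str = LOCK_PATH) -> Iterator[None]:
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so checking for
        # the flag and creating it are one atomic step: only one job wins.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise JobAlreadyRunning(f"{lock_path} exists; another ETL job is running") from None
    try:
        # Record which job holds the lock, to help when debugging a stale lock.
        with os.fdopen(fd, "w") as f:
            f.write(job_name)
        yield
    finally:
        # Clean up once done, even if the job raised an exception.
        os.remove(lock_path)
```

The realtime job would wrap its work in `with etl_lock("realtime"):` and catch `JobAlreadyRunning` to skip that 15-minute slot. If a job is killed hard (e.g. SIGKILL), the `finally` block never runs and the stale lock file has to be removed manually.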
If the schedule patterns are similar (every 15 minutes), you could create a parent job that runs every 15 minutes and launches the appropriate job based on its launch time. What is the pattern for the "main" job? How often does it run at night?
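A minimal sketch of the parent-job idea. Because the thread does not give the real nightly schedule, `MAIN_SLOT_START` (02:00) is a hypothetical value, and `run_main`/`run_realtime` are placeholders. It assumes the scheduler never starts a new parent run while the previous one is still active; under that assumption the chosen jobs run one at a time and cannot overlap.

```python
from datetime import datetime, time, timedelta
from typing import Callable, Dict, Optional

# Hypothetical schedule: "main" is assumed to start in the 02:00 slot.
MAIN_SLOT_START = time(2, 0)
SLOT_LENGTH = timedelta(minutes=15)


def choose_job(launch: datetime) -> str:
    """Return "main" if the launch falls in main's 15-minute slot, else "realtime"."""
    slot_start = launch.replace(
        hour=MAIN_SLOT_START.hour, minute=MAIN_SLOT_START.minute, second=0, microsecond=0
    )
    return "main" if slot_start <= launch < slot_start + SLOT_LENGTH else "realtime"


def run_main() -> None:
    print("running main ETL")  # placeholder for the nightly job


def run_realtime() -> None:
    print("running realtime ETL")  # placeholder for the 15-minute job


JOBS: Dict[str, Callable[[], None]] = {"main": run_main, "realtime": run_realtime}


def parent_job(launch: Optional[datetime] = None) -> str:
    """Entry point the scheduler would trigger every 15 minutes."""
    name = choose_job(launch or datetime.now())
    JOBS[name]()  # runs the chosen job synchronously in this process
    return name
```

If "main" takes longer than 15 minutes, the realtime slots that fall during its run are skipped or delayed, depending on how the scheduler handles a parent run that is still active.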