
Azure driver is not working correctly, when it tries to execute more than one job in parallel #384

Open
Ben10k opened this issue Dec 2, 2023 · 2 comments


Ben10k commented Dec 2, 2023

I am using the latest versions of:

.drone_pool.yml that works:

version: "1"
instances:
- name: ubuntu-azure
  default: true
  type: azure
  pool: 0
  limit: 1
  platform:
    os: linux
    arch: amd64
  spec:
    account:
      client_id: "****"
      client_secret: "****"
      subscription_id: "****"
      tenant_id: "****"
    resource_group: drone-runners
    location: eastus2
    size: Standard_B4s_v2
    image:
      username: "****"
      password: "****"
      publisher: canonical
      offer: 0001-com-ubuntu-server-focal
      sku: 20_04-lts-gen2
      version: latest

With this configuration, pipelines run correctly. But if I increase the limit to anything above 1 and try to run pipelines in parallel, the following happens:

  • Both pipeline stages turn yellow and the timer starts.
  • In the Azure portal and the runner's logs, I can see that 2 VMs are provisioned.
  • After about 60-80 seconds, one job starts running and works as expected. When the pipeline finishes, its VM is destroyed (visible both in the runner's logs and in the Azure portal).
  • The other job stays in progress, but its steps never start.
  • After about 20-21 minutes, that pipeline stage fails with context deadline exceeded.
  • After enabling trace and debug logs on the runner, I found the log messages below. They indicate that when several VMs are started at the same time, the runner records the same IP address for all of them, even though only 1 VM actually has that IP and the other 2 VMs have their own unique public IPs:
time="2023-12-02T11:36:06Z" level=debug msg="azure: [provision] complete" cloud=azure fields.time=47.23s image=0001-com-ubuntu-server-focal ip=20.230.100.19 name=drone-runner-vm-cd8fb7c9b-6gn44-ubuntu-azure-XWp05980 pool=ubuntu-azure size=Standard_B4s_v2 zone="[]"
time="2023-12-02T11:36:08Z" level=debug msg="azure: [provision] complete" cloud=azure fields.time=48.40s image=0001-com-ubuntu-server-focal ip=20.230.100.19 name=drone-runner-vm-cd8fb7c9b-6gn44-ubuntu-azure-basMfX91 pool=ubuntu-azure size=Standard_B4s_v2 zone="[]"
time="2023-12-02T11:36:09Z" level=debug msg="azure: [provision] complete" cloud=azure fields.time=47.53s image=0001-com-ubuntu-server-focal ip=20.230.100.19 name=drone-runner-vm-cd8fb7c9b-6gn44-ubuntu-azure-5v88gnFS pool=ubuntu-azure size=Standard_B4s_v2 zone="[]"

  • The VM that actually has that IP continues executing the pipeline, but the other 2 hang for 20 minutes until they are removed.
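
To check a longer runner log for the same symptom, the "[provision] complete" lines can be grouped by their ip field. The following is only a minimal sketch: it assumes the key=value log format shown above, and the helper names (parse_logfmt, duplicate_ips) are hypothetical and not part of the runner.

```python
import re
from collections import defaultdict

# Matches key=value pairs where the value is either "quoted" or a bare token.
PAIR = re.compile(r'([\w.]+)=("[^"]*"|\S+)')

# The three debug lines quoted in this issue.
SAMPLE_LOG = '''\
time="2023-12-02T11:36:06Z" level=debug msg="azure: [provision] complete" cloud=azure fields.time=47.23s image=0001-com-ubuntu-server-focal ip=20.230.100.19 name=drone-runner-vm-cd8fb7c9b-6gn44-ubuntu-azure-XWp05980 pool=ubuntu-azure size=Standard_B4s_v2 zone="[]"
time="2023-12-02T11:36:08Z" level=debug msg="azure: [provision] complete" cloud=azure fields.time=48.40s image=0001-com-ubuntu-server-focal ip=20.230.100.19 name=drone-runner-vm-cd8fb7c9b-6gn44-ubuntu-azure-basMfX91 pool=ubuntu-azure size=Standard_B4s_v2 zone="[]"
time="2023-12-02T11:36:09Z" level=debug msg="azure: [provision] complete" cloud=azure fields.time=47.53s image=0001-com-ubuntu-server-focal ip=20.230.100.19 name=drone-runner-vm-cd8fb7c9b-6gn44-ubuntu-azure-5v88gnFS pool=ubuntu-azure size=Standard_B4s_v2 zone="[]"
'''


def parse_logfmt(line):
    """Split one log line into a dict of fields (hypothetical helper)."""
    return {key: value.strip('"') for key, value in PAIR.findall(line)}


def duplicate_ips(lines):
    """Return {ip: [vm names]} for IPs reported by more than one VM."""
    names_by_ip = defaultdict(set)
    for line in lines:
        fields = parse_logfmt(line)
        if fields.get("msg") != "azure: [provision] complete":
            continue
        if "ip" in fields and "name" in fields:
            names_by_ip[fields["ip"]].add(fields["name"])
    return {ip: sorted(names) for ip, names in names_by_ip.items() if len(names) > 1}


if __name__ == "__main__":
    for ip, names in duplicate_ips(SAMPLE_LOG.splitlines()).items():
        print(f"{ip} reported for {len(names)} VMs:")
        for name in names:
            print(f"  {name}")
```

Run against the three lines above, this reports one IP shared by three VM names, which matches what the portal contradicts.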

If I increase the pool value to anything more than 0, then all pipelines time out with the same issue.

Note: I have tried the same runner with the same pipelines with another pool of AWS instances and it worked perfectly.

Ben10k commented Dec 2, 2023

I found the issue and was able to solve it locally.
Will raise a PR shortly.

Ben10k changed the title from "Azure drivwer is not working correctly, when it tries to execute more than one job in parallel" to "Azure driver is not working correctly, when it tries to execute more than one job in parallel" on Dec 2, 2023

Ben10k commented Dec 15, 2023

@raghavharness I am tagging you as I see you are actively maintaining this repository.

Can you please take a look?
