Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Travis Build Failure monitor(ing) #469

Closed
wants to merge 7 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion Gemfile
Original file line number Diff line number Diff line change
Expand Up @@ -37,9 +37,10 @@ gem 'sinatra', :require => false
gem 'slim'

# Services gems
gem 'gitter-api', '~> 0.1.0'
gem 'minigit', '~> 0.0.4'
gem 'tracker_api', '~> 1.6'
gem 'travis', '~> 1.7.6'
gem 'travis', '~> 1.8.10'

gem 'awesome_spawn', '>= 1.4.1'
gem 'default_value_for', '>= 3.1.0'
Expand Down
14 changes: 4 additions & 10 deletions Gemfile.lock
Original file line number Diff line number Diff line change
Expand Up @@ -69,7 +69,6 @@ GEM
timers (>= 4.1.1)
celluloid-supervision (0.20.6)
timers (>= 4.1.1)
coderay (1.1.2)
coercible (1.0.0)
descendants_tracker (~> 0.0.1)
coffee-rails (4.2.2)
Expand Down Expand Up @@ -164,6 +163,7 @@ GEM
multi_json (~> 1.0)
net-http-persistent (~> 2.9)
net-http-pipeline
gitter-api (0.1.0)
globalid (0.4.2)
activesupport (>= 4.2.0)
haml (5.1.2)
Expand Down Expand Up @@ -223,10 +223,6 @@ GEM
parser (2.7.1.2)
ast (~> 2.4.0)
pg (1.2.2)
pry (0.9.12.6)
coderay (~> 1.0)
method_source (~> 0.8)
slop (~> 3.4)
pusher-client (0.6.2)
json
websocket (~> 1.0)
Expand Down Expand Up @@ -340,7 +336,6 @@ GEM
slim (4.0.1)
temple (>= 0.7.6, < 0.9)
tilt (>= 2.0.6, < 2.1)
slop (3.6.0)
sprockets (3.7.2)
concurrent-ruby (~> 1.0)
rack (> 1, < 3)
Expand Down Expand Up @@ -369,15 +364,13 @@ GEM
multi_json
representable
virtus
travis (1.7.7)
addressable (~> 2.3)
travis (1.8.10)
backports
faraday (~> 0.9)
faraday_middleware (~> 0.9, >= 0.9.1)
gh (~> 0.13)
highline (~> 1.6)
launchy (~> 2.1)
pry (~> 0.9, < 0.10)
pusher-client (~> 0.4)
typhoeus (~> 0.6, >= 0.6.8)
turbolinks (5.2.1)
Expand Down Expand Up @@ -419,6 +412,7 @@ DEPENDENCIES
faraday (~> 0.9.2)
faraday-http-cache (~> 2.0.0)
foreman (~> 0.64.0)
gitter-api (~> 0.1.0)
haml (~> 5.1)
haml_lint (~> 0.35.0)
influxdb (~> 0.3.13)
Expand All @@ -443,7 +437,7 @@ DEPENDENCIES
thin
timecop
tracker_api (~> 1.6)
travis (~> 1.7.6)
travis (~> 1.8.10)
turbolinks
uglifier (>= 1.3.0)
webmock
Expand Down
33 changes: 33 additions & 0 deletions app/models/branch.rb
Original file line number Diff line number Diff line change
Expand Up @@ -99,4 +99,37 @@ def fq_branch_name
def git_service
GitService::Branch.new(self)
end

# Branch Failure

def notify_of_failure
if passing?
BuildFailureNotifier.new(self).report_passing
update(:last_build_failure_notified_at => nil, :travis_build_failure_id => nil)
elsif should_notify_of_failure?
update(:last_build_failure_notified_at => Time.zone.now)

BuildFailureNotifier.new(self).post_failure
end
end

def previously_failing?
!!travis_build_failure_id
end

def should_notify_of_failure?
last_build_failure_notified_at.nil? || last_build_failure_notified_at < 1.day.ago
end

# If we have reported a failure before and the branch is now green.
#
# The other advantage of checking `last_build_failure_notified_at.nil?` here
# is that we save a Travis API call, since we shouldn't be creating
# BuildFailure records without having found a build failure elsewhere (e.g.
# TravisBranchMonitor).
#
# New records will short circut before hitting `Travis::Repository.find`.
def passing?
!last_build_failure_notified_at.nil? && Travis::Repository.find(repo.name).branch(name).green?
end
end
94 changes: 94 additions & 0 deletions app/workers/travis_branch_monitor.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,94 @@
require 'travis'

class TravisBranchMonitor
include Sidekiq::Worker
sidekiq_options :queue => :miq_bot, :retry => false

include Sidetiq::Schedulable
recurrence { hourly.minute_of_hour(0, 15, 30, 45) }

include SidekiqWorkerMixin

class << self
private

# For this class, sometimes the repo needs to be mapped to a specific
# gitter room, so a hash is required.
#
# This override allows for doing this in the config
#
# travis_branch_monitor:
# included_repos:
# ManageIQ/manageiq-ui-classic: ManageIQ/ui
# ManageIQ/manageiq-gems-pending: ManageIQ/core
# ManageIQ/manageiq:
# ManageIQ/miq_bot:
#
# Which you are allowed to leave the value empty, and the key will be used
# where appropriate (not used in this class).
#
# The result from the above for this method will then be:
#
# [
# [
# "ManageIQ/manageiq-ui-classic",
# "ManageIQ/manageiq-gems-pending",
# "ManageIQ/manageiq",
# "ManageIQ/miq_bot"
# ],
# []
# ]
#
def included_and_excluded_repos
super # just used for error handling...

[
settings.included_repos.try(:to_h).try(:stringify_keys).try(:keys),
settings.excluded_repos.try(:to_h).try(:stringify_keys).try(:keys)
]
end
end

def perform
if !first_unique_worker?
logger.info("#{self.class} is already running, skipping")
else
process_repos
end
end

def process_repos
enabled_repos.each do |repo|
process_repo(repo)
end
end

def process_repo(repo)
repo.regular_branch_names.each do |branch_record|
process_branch(repo, branch_record)
end
end

def process_branch(repo, branch_record)
# If we already have a failure record, call notify with that record
return branch_record.notify_of_failure if branch_record.previously_failing?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check that it's still failing first?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this probably is a "naming is hard" problem, but .notify_of_failure actually does that. It does the .passing? check which will determine if a repo is still failing for that given branch.

I do forget the rational for my logic as to "why" I split it up like this, so I will have to look at this again, but the logic does as you wish, it just is a little unclear.


# otherwise, check if any builds exist with a failures, and if so, update
# the branch_record to add the `travis_build_failure_id`.
v3_client = TravisV3Client.new(:repo => Travis::Repository.find(repo.name))
branch_builds = v3_client.repo_branch_builds(branch_record.name)

if branch_builds.first.failed?
first_failure = find_first_recent_failure(branch_builds)
branch_record.update(:travis_build_failure_id => first_failure.id)

branch_record.notify_of_failure
end
end

private

def find_first_recent_failure(builds)
builds.take_while(&:failed?).last
end
end
6 changes: 6 additions & 0 deletions db/migrate/20200109223351_add_branch_build_failures.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
class AddBranchBuildFailures < ActiveRecord::Migration[5.2]
def change
add_column :branches, :travis_build_failure_id, :integer
add_column :branches, :last_build_failure_notified_at, :datetime
end
end
4 changes: 3 additions & 1 deletion db/schema.rb
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@
#
# It's strongly recommended that you check this file into your version control system.

ActiveRecord::Schema.define(version: 2017_10_06_050814) do
ActiveRecord::Schema.define(version: 2020_01_09_223351) do

# These are extensions that must be enabled in order to support this database
enable_extension "plpgsql"
Expand Down Expand Up @@ -44,6 +44,8 @@
t.string "merge_target"
t.string "pr_title"
t.integer "linter_offense_count"
t.integer "travis_build_failure_id"
t.datetime "last_build_failure_notified_at"
end

create_table "repos", id: :serial, force: :cascade do |t|
Expand Down
98 changes: 98 additions & 0 deletions lib/build_failure_notifier.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,98 @@
require 'travis'
require 'gitter/api'

class BuildFailureNotifier
def self.gitter
@gitter ||= begin
api_url = Settings.gitter.api_url
client_settings = {
:token => Settings.gitter_credentials.token,
:api_prefix => Settings.gitter.api_prefix,
:api_uri => api_url && URI(api_url)
}

Gitter::API::Client.new(client_settings)
end
end

def self.repo_room_map
settings = Settings.travis_branch_monitor
repo_map = settings.included_repos.to_h.merge(settings.excluded_repos.to_h)

repo_map.stringify_keys!
repo_map.each do |repo, room_uri|
repo_map[repo] = repo if room_uri.nil?
end
end

attr_reader :branch, :build, :repo, :repo_path, :room

def initialize(branch)
@branch = branch
@repo_path = branch.repo.name
@repo = Travis::Repository.find(repo_path)
@room = repo_room_map[repo_path]
@build = repo.session.find_one(Travis::Client::Build, branch.travis_build_failure_id)
end

def post_failure
notification_msg = <<~MSG
> ### :red_circle: Build Failure in #{repo_branches_markdown_url}!
>
> **Travis Build**: #{travis_build_url}
MSG
notification_msg << "> **Failure PR**: #{offending_pr}\n" if offending_pr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not always a PR to that repo which causes the failure. Sometimes a merge to core breaks the API repo, but it is only noticed after the API repo cron runs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is why the if offending_pr is there, which can return nil.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I am open to suggestions on how we can improve this to better provide context for the message. I realize that this could still display a false positive by displaying the "last merged commit" on something that was merged weeks ago, so maybe only display the PR if it was merged recently? Within the last 3 days?

notification_msg << "> **Commit**: #{commit_url}\n" if commit_url
notification_msg << "> **Compare**: #{compare_url}\n" if compare_url

gitter_room.send_message(notification_msg)
end

def report_passing
notification_msg = <<~MSG
> ### :green_heart: #{repo_branches_markdown_url} now passing!
MSG

gitter_room.send_message(notification_msg)
end

private

def gitter
self.class.gitter
end

def repo_room_map
self.class.repo_room_map
end

# join room if needed, otherwise returns room
def gitter_room
@gitter_room ||= gitter.join_room(room)
end

def travis_build_url
"https://travis-ci.org/#{repo_path}/builds/#{build.id}"
end

# find the PR that caused this mess...
def offending_pr
if build.commit && build.commit.message =~ /^Merge pull request #(\d+)/
"https://github.com/#{repo_path}/issues/#{$1}"
end
end

def commit_url
if build.commit
"https://github.com/#{repo_path}/commit/#{build.commit.sha[0, 8]}"
end
end

def compare_url
build.commit.compare_url if build.commit && build.commit.compare_url
end

def repo_branches_markdown_url
"[`#{repo_path}`](https://travis-ci.org/#{repo_path}/branches)"
end
end
5 changes: 5 additions & 0 deletions lib/tasks/travis_branch_monitor.rake
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
namespace :travis_branch_monitor do
task :poll_single => :environment do
TravisBranchMonitor.new.perform
end
end
45 changes: 45 additions & 0 deletions lib/travis_v3_client.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
require 'travis'

# A helper class for making v3 API class to the TravisCI API, since the current
# API mostly only makes v2 calls, and does a decent amount of inificient ones
# at that...
#
# Expects for most calls that a base Travis::Entity has been set for the given
# client call. Passed in as a hash during initialization.
#
class TravisV3Client
attr_reader :connection, :repo, :user_agent

def initialize(base_entities = {})
@connection = Faraday.new(:url => Travis::Client::ORG_URI)

@connection.headers['Authorization'] = "token #{::Settings.travis.access_token}"
@connection.headers['travis-api-version'] = '3'

@repo = base_entities[:repo]

set_user_agent
end

def repo_branch_builds(branch = "master", params = {})
query_params = {
"branch.name" => branch,
"build.event_type" => "push,api,cron"
}.merge(params)

data = connection.get("/repo/#{repo.id}/builds", query_params)
repo.session.load(JSON.parse(data.body))["builds"]
end

private

# Use the Travis user agent for this
def set_user_agent
base_model = repo
if base_model
agent_string = repo.session.headers['User-Agent']
@connection.headers['User-Agent'] = agent_string
agent_string
end
end
end
Loading