Skip to content
Snippets Groups Projects

Add message to the hook script to notify of potential validation delay

Merged Martin Lowe requested to merge malowe/main/script-large-set-messages into main
2 unresolved threads
1 file
+ 7
0
Compare changes
  • Side-by-side
  • Inline
+ 7
0
@@ -7,6 +7,7 @@ require 'multi_json'
@@ -7,6 +7,7 @@ require 'multi_json'
WIKI_REGEX_MATCH = /.*\.wiki$/
WIKI_REGEX_MATCH = /.*\.wiki$/
HOST_URL='https://gitlab.eclipse.org'
HOST_URL='https://gitlab.eclipse.org'
API_URL='https://api.eclipse.org'
API_URL='https://api.eclipse.org'
 
LARGE_COMMIT_SET_THRESHOLD=15
# this should be removed as soon as Oniro is onside again
# this should be removed as soon as Oniro is onside again
ALLOW_LIST_PROJECTS = ['/eclipse/oniro-core/meta-ts',
ALLOW_LIST_PROJECTS = ['/eclipse/oniro-core/meta-ts',
'/eclipse/oniro-core/meta-ledge-sesure',
'/eclipse/oniro-core/meta-ledge-sesure',
@@ -138,6 +139,8 @@ diff_git_commits = diff_git_commits_raw.split(/\n/)
@@ -138,6 +139,8 @@ diff_git_commits = diff_git_commits_raw.split(/\n/)
if (diff_git_commits.empty?) then
if (diff_git_commits.empty?) then
puts "There are no commits to validate for current push, skipping validation step"
puts "There are no commits to validate for current push, skipping validation step"
exit 0
exit 0
 
elsif (diff_git_commits.length() > LARGE_COMMIT_SET_THRESHOLD) then
 
puts "The following push has a large number of commits (#{diff_git_commits.length()}), and may have a delay in response or timeout for very large sets"
end
end
processed_git_data = []
processed_git_data = []
@@ -165,6 +168,10 @@ begin
@@ -165,6 +168,10 @@ begin
parsed_response = MultiJson.load(response.body)
parsed_response = MultiJson.load(response.body)
rescue MultiJson::ParseError
rescue MultiJson::ParseError
puts "GL-HOOK-ERR: Unable to validate commit, server error encountered.\n\nPlease contact the administrator, and retry the commit at a later time.\n\n"
puts "GL-HOOK-ERR: Unable to validate commit, server error encountered.\n\nPlease contact the administrator, and retry the commit at a later time.\n\n"
 
## If a large commit set, this could be a timeout, and may be resolved through a second attempt after a short delay
 
if (diff_git_commits.length() > LARGE_COMMIT_SET_THRESHOLD) then
    • Why are we printing 2 messages?

      • Author Maintainer

        One is to inform them before the request that it'll probably take some time and is expected. The second is when there's an error and we had a ton of commits that it may have been a timeout, as another push will likely succeed after a minute or so

      • Gotcha, the second one only occurs if there is an error...

        I am concerned this may create too much noise, what do you think about only including the message when we detect an error?

        Do we know who is timing out? Is it the API or the Ruby script making the HTTP call?

        My understanding is that httparty uses the default from Net::HTTP which has a default value of 60 seconds.

      • Author Maintainer

        My assumption is Quarkus kills the connection, looks like somewhere between 15s and 30s. I'm not finding anything in configs about that value, but that about tracks from what I can see. I'm hoping to put in a large-ish thread pool over the next week and have requests grab from that to do parallel processing rather than sequential. 15s is a long wait for a validation, and I know we can do better.

      • Author Maintainer

        The vert.x client has a really clear timeout of 60s, which is what we're starting to switch to this quarter. I can't find any reference to the legacy resteasy client timeouts in the config docs though.

      • When we validate multiple commits, do we currently make individual API calls to either the foundation or profile API for each commit a user makes, or have we optimized this to validate a user only once?

        One idea to consider is creating a new endpoint that allows us to retrieve the ECA status (and any other necessary information) for multiple users simultaneously. This could help reduce the number of API requests required.

        Another suggestion is to enhance our Ruby script so that it automatically retries requests after encountering a timeout. However, this could lead to an even longer wait time so not best solution.

        tldr; While increasing the timeout may be something we should do, we should prioritize ways to optimize our processes to shorten the time required for commit validation

      • It would be interesting to have some stats for this API on where are the biggest bottlenecks.

      • Author Maintainer

        Biggest bottleneck is without a doubt sequential processing of commits. We use somewhat aggressive caching, so people should only be fetched once, but we fetch their profiles as we need ECA status, which is the second biggest issue. So if there is a few users on a patch, it adds up fast. Add in that it's sequential and we have a recipe for slow responses.

        One idea to consider is creating a new endpoint that allows us to retrieve the ECA status (and any other necessary information) for multiple users simultaneously. This could help reduce the number of API requests required.

        This would be a huge win but would also probably need an at least somewhat complicated rewrite. Without a new endpoint, something that could actually make a lot of sense here would be looking into reactive processing, as it's made to pick up and drop processing as it's waiting for responses, but that would take much longer than an endpoint as I'm a novice at reactive.

        tldr; While increasing the timeout may be something we should do, we should prioritize ways to optimize our processes to shorten the time required for commit validation

        That's honestly the plan. Making it wait longer doesn't fix that it gets really slow sometimes.

      • Author Maintainer

        I am concerned this may create too much noise, what do you think about only including the message when we detect an error?

        I only include the first message if there are more than 15 commits in the patch, which will probably take at least 10 seconds, more if there are a lot of unique committers. This might make a user wonder whats taking so long, so this is to just let them know it's somewhat expected. This is just a quick fix for a rare case while we look into performance. With the changes I'm looking into now along with switching to native, I'm hoping to get a lot of performance gains. My hope is that we won't need that message after all the changes are in.

        If you want, I can move the first message to go out with second message if you'd prefer. Just let me know!

      • I am good with what you have!

      • Please register or sign in to reply
Please register or sign in to reply
 
puts "The following push had a large commit set and may have timed out in processing. In this case, after a short delay (~1 minute), a second attempt can be made which may resolve the issue. If the issue is still not resolved, see above message."
 
end
## Additional information for debugging - request will allow us to easily retest exact output and find errors in render
## Additional information for debugging - request will allow us to easily retest exact output and find errors in render
puts "Request body: #{MultiJson.dump(json_data)}\n\n"
puts "Request body: #{MultiJson.dump(json_data)}\n\n"
exit 1
exit 1
Loading