diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..12d1efcae72dd5020a9d8c823490735332f60d14 --- /dev/null +++ b/.gitignore @@ -0,0 +1,2 @@ +/dist/ +/.venv/ diff --git a/README.md b/README.md index 664e6846560299ed64bc17e9e4c355ec47db614c..a51d6493641be629afa09511c306fdf55b3e81fa 100644 --- a/README.md +++ b/README.md @@ -12,11 +12,11 @@ # About -Gitlab IP Check enables dependency checks in Gitlab projects using the Eclipse Dash tool, generating a report file with -the results. +Eclipse IP Analysis (IPA) enables dependency analysis in GitLab and GitHub projects using the Eclipse Dash +tool, generating a report file with the results. _List of currently supported programming languages: Go, Java (Maven and Gradle), JavaScript (NPM and Yarn), -Kotlin (Gradle), Python_ +Kotlin (Gradle), Python._ # Getting Started @@ -24,20 +24,20 @@ Kotlin (Gradle), Python_ In order to run the tool, you must install the base requirements below. -- Python 3.11.x: check your Python version with the command ```python --version```. In some systems, you may not have +- Python >=3.6, <3.13: check your Python version with the command ```python --version```. In some systems, you may not have the alias for Python mapping to Python 3. In such cases, you can run ```python3 --version```. Moreover, check that you have the Python Package Manager (pip) installed. Similar to Python, you can run ```pip --version``` or ```pip3 --version```. The resulting line should contain your version of Python at its end. If pip is not installed, official documentation can be followed [here](https://pip.pypa.io/en/stable/installation/). -- Java JDK 11 or above: the latest version can be safely installed. Check that Java is installed and its current version -by running the command ```java --version```. +- Java JDK 11 or above: the latest version can be safely installed. 
Check that Java is installed and its current
+version by running the command ```java --version```.
 
-- Apache Maven: the latest version can be safely installed. Check that Maven is installed and its current version by
-running the command ```mvn --version```.
+- Apache Maven: the latest version can be safely installed. Check that Maven is installed and its current version
+by running the command ```mvn --version```.
 
-- Git CLI: the latest version can be safely installed. Check that Git is installed and its current version by running
-the command ```git --version```.
+- Git CLI: the latest version can be safely installed. Check that Git is installed and its current version by
+running the command ```git --version```.
 
 ## Setup
 
@@ -46,12 +46,15 @@ the command ```git --version```.
 
 - Clone this repository with Git command line:
 
 ```git clone https://gitlab.eclipse.org/eclipse-research-labs/research-eclipse/ip-check.git```
 
 - Navigate to the directory of the repository that you just cloned.
-- Install Python dependencies using pip command line:
+- Install Hatch to build the tool (https://hatch.pypa.io/latest/install).
+- Build and install the tool: -```pip install -r requirements.txt``` +```hatch build``` -_Please note that if you may need to run the command ```pip3 install -r requirements.txt``` if pip is not mapped to your -version of Python 3 as discussed in the installation of [Base Requirements](#base-requirements)._ +```pip install dist/eclipse_ipa-0.1.0.tar.gz``` + +_Please note that you may need to run ```pip``` as ```pip3``` if pip is not mapped to your +version of Python 3, as mentioned in the installation of [Base Requirements](#base-requirements)._ ([back to top](#About)) @@ -59,69 +62,93 @@ version of Python 3 as discussed in the installation of [Base Requirements](#bas Run the tool with the following command: -```python ip_check.py [-h] [-b BRANCH] [-c CONFIG] [-g GROUP] [-p PROJECT] [-pf PROJECTS_FILE] [-df DEPENDENCIES_FILE]``` - -You must adapt ```python``` to ```python3``` depending on what your result was in the installation of -[Base Requirements](#base-requirements). +```eclipse-ipa [-h] [-ci] [-gh] [-gl GITLAB] [-b BRANCH] [-c CONFIG] [-g GROUP] [-p PROJECT] [-s] [-t TOKEN] [-pf PROJECTS_FILE] [-df DEPENDENCIES_FILE]``` -The command does not require any of its options. However, a minimum set is needed to execute simple IP checks if +The command does not require any of its options. However, a minimum set is needed to execute simple IP analysis if a configuration file is not specified. 
A summary of the options is given below:
 
 ```
   -h, --help            show this help message and exit
+  -ci, --ci_mode        execute in CI mode
+  -gh, --GitHub         execute for GitHub
+  -gl GITLAB, --gitlab GITLAB
+                        execute for GitLab URL
   -b BRANCH, --branch BRANCH
-                        branch to check
+                        branch to analyze
   -c CONFIG, --config CONFIG
                         config file to use
   -g GROUP, --group GROUP
-                        Gitlab group ID to check
+                        GitHub organization/GitLab group ID to analyze
   -p PROJECT, --project PROJECT
-                        Gitlab project ID to check
+                        GitHub/GitLab project ID to analyze
+  -s, --summary         output is a Dash summary file
+  -t TOKEN, --token TOKEN
+                        access token for API
   -pf PROJECTS_FILE, --projects_file PROJECTS_FILE
-                        file with projects to check
+                        file with projects to analyze
   -df DEPENDENCIES_FILE, --dependencies_file DEPENDENCIES_FILE
-                        file with dependency locations to check
+                        file with dependency locations to analyze
 ```
 
-To start using the tool, you should provide **one of the following five options**:
+To start using the tool, you must provide **one of the following five options**:
+
+1. A file with the dependency locations to analyze. Each line should contain the GitHub project full name or GitLab
+Project ID, the full location path and the programming language, all separated by semicolons (;). The full path of this
+file is specified with option -df as summarized above.
+
+Example for a GitHub line:
+
+```kubernetes-client/python;requirements.txt;Python```
+
+Example for a GitLab line:
-1. A file with the dependency locations to check. Each line should contain the Gitlab Project ID, the full location path
-(not parsed) and the programming language, all separated by semicolons (;). The full path of this file is specified with
-option -df as summarized above. Example for a line:
 
 ```7602;eclipse-research-labs/research-eclipse/ip-tools/gitlab-ip-check/requirements.txt;Python```
 
-2. A file with the list of Gitlab Projects to check. Each line should contain the Gitlab Project ID and its full path
-(not parsed), separated by semicolons (;). The full path of this file is specified with option -pf as summarized above.
-Example for a line:
+2. A file with the list of GitHub/GitLab Projects to analyze. For GitHub, each line should contain the GitHub project
+full name. For GitLab, each line should contain the GitLab Project ID and its full path (not parsed), separated by
+semicolons (;). The full path of this file is specified with option -pf as summarized above.
+
+Example for a GitHub line:
+
+```kubernetes-client/python```
+
+Example for a GitLab line:
+
 ```7602;eclipse-research-labs/research-eclipse/ip-tools```
 
-3. Your specific Gitlab Group ID that can be obtained from the Gitlab web interface by navigating to your group and
-pressing the three dots (More actions) at the right-top section of the menu. This is specified with option -g as summarized
-above.
+3. Your specific GitHub Organization name, or your specific GitLab Group ID that can be obtained from the GitLab web
+interface by navigating to your group and pressing the three dots (More actions) at the top-right section of the menu.
+This is specified with option -g as summarized above.
 
-4. Your specific Gitlab Project ID that can be obtained from the Gitlab web interface by navigating to your project and
-pressing the three dots (More actions) at the right-top section of the menu. This is specified with option -p as summarized
-above.
+4. Your specific GitHub Project name (full name including Organization), or your specific GitLab Project ID that can be
+obtained from the GitLab web interface by navigating to your project and pressing the three dots (More actions) at the
+top-right section of the menu. This is specified with option -p as summarized above.
 
 5. A configuration file, specified with option -c as summarized above. It allows additional customization, and a sample
 is provided in the same folder as the tool with the filename *config.ini.sample*. Parameters within the config file are
 described in the comments.
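The semicolon-separated formats in options 1 and 2 above are simple to generate programmatically. A minimal sketch (not part of the tool) that builds `-df` input lines from the examples in this README — the `entries` structure is an assumption for illustration:

```python
# Build lines in the -df input format: <project>;<path>;<language>.
# GitHub projects are identified by their full name, GitLab projects
# by their numeric Project ID.
entries = [
    ("kubernetes-client/python", "requirements.txt", "Python"),
    ("7602",
     "eclipse-research-labs/research-eclipse/ip-tools/gitlab-ip-check/requirements.txt",
     "Python"),
]
lines = [";".join(fields) for fields in entries]
print("\n".join(lines))
```

Each resulting line matches the format the tool expects in the file passed with -df.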
+_Please note that for unauthenticated GitHub access, API rate limits are very low. It's recommended to provide an access
+token if running against GitHub projects._
+
 ## Adding verified dependencies (optional)
 
 If a restricted or unknown dependency has been manually verified, this information can be added to the
 verified-dependencies.txt file and is later displayed in the HTML report. Add one per line in the following format:
 
 ```<dependency_full_name>;<comments>```
 
+Please bear in mind that the file must exist in the directory where you run the tool.
+
 ## How the tool works
 
-If a Gitlab Group ID or a list of Gitlab Projects is provided, the tool fetches the programming languages for each project
-and searches for dependency files for each supported programming language. Once a list of dependency locations is available
-(user-provided or automatically detected), it runs Eclipse Dash on those dependencies to check their IP approval status.
+If a GitHub Organization/GitLab Group ID or a list of GitHub/GitLab Projects is provided, the tool fetches the programming
+languages for each project and searches for dependency files for each supported programming language. Once a list of
+dependency locations is available (user-provided or automatically detected), it runs Eclipse Dash on those dependencies
+to analyze their IP approval status.
 
-At the end, the tool outputs a full report in HTML. Any additional details can be found in the log file (ip-check.log)
-or by looking into the Eclipse Dash output files (in the "output" folder, by default).
+At the end, the tool outputs a full report in HTML by default. Any additional details can be found in the log file
+(ip-analysis.log).
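Since verified-dependencies.txt is a plain semicolon-separated file, its format can be read with a few lines of Python. A sketch that splits each line on the first semicolon only, skipping blank and commented lines — the helper name `parse_verified` is hypothetical, not part of the tool:

```python
def parse_verified(lines):
    # Map <dependency_full_name> -> <comments> from lines of the form
    # <dependency_full_name>;<comments>, skipping blanks and "#" comments.
    verified = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, comments = line.partition(";")
        verified[name] = comments
    return verified

sample = ["pypi/pypi/-/example/1.0;approved after manual review"]
print(parse_verified(sample))
# → {'pypi/pypi/-/example/1.0': 'approved after manual review'}
```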
([back to top](#About))
diff --git a/config.ini.sample b/config.ini.sample
index 2497d2bd4c24233be3824ca807f845d6ad6c20cf..c0a0a577d4ce5a7083f859c6a58c035895b158db 100644
--- a/config.ini.sample
+++ b/config.ini.sample
@@ -11,24 +11,26 @@
 [General]
 LogLevel = INFO
-LogFile = ip_check.log
-GitlabURL = https://gitlab.eclipse.org
+LogFile = ip_analysis.log
+;GitlabURL = https://gitlab.eclipse.org
+;GithubAuthToken =
 ;GitlabAuthToken =
-GitlabConnectionAttempts = 3
+APIConnectionAttempts = 3
 # If not set, the default branch for each project will be used
 ;Branch =
-# Check dependencies with EclipseDash
-CheckDependencies = yes
+# Analyze dependencies with EclipseDash
+AnalyzeDependencies = yes
 # Input file with manually verified dependencies
 VerifiedDependencies = verified-dependencies.txt
 
 [Groups]
-# If not specified and no list of projects/dependencies is given, the tool will exit
+# If not specified and no list of projects/dependencies is given, the tool will exit.
+# A GitHub organization may also be specified here.
;BaseGroupID = [Projects] ;SingleProjectID = -# If processing groups, a project list can be saved +# If processing groups/organizations, a project list can be saved Save = no # If loaded from file, there will be no group processing LoadFromFile = no @@ -47,10 +49,10 @@ OutputFile = gitlab-dependencies.txt JarPath = ./assets/org.eclipse.dash.licenses-1.1.1-20240607.055024-182.jar BatchSize = 500 ConfidenceThreshold = 60 -# Output folder for all Eclipse Dash runs -OutputFolder = ./output -# Report with a summary of Eclipse Dash execution saved in the main folder +# HTML Report of Eclipse Dash execution saved in the main folder OutputReport = yes +# Alternatively, the output can be a combined summary in the original Dash format +;OutputSummary = no [Go] Enabled = yes diff --git a/ip_check.py b/ip_check.py deleted file mode 100644 index 1f9a22ea31c34b77803d30f771d1ebdc5ec684a6..0000000000000000000000000000000000000000 --- a/ip_check.py +++ /dev/null @@ -1,545 +0,0 @@ -# Copyright (c) 2024 The Eclipse Foundation -# -# This program and the accompanying materials are made available under the -# terms of the Eclipse Public License 2.0 which is available at -# http://www.eclipse.org/legal/epl-2.0. 
-# -# SPDX-License-Identifier: EPL-2.0 -# -# Contributors: -# asgomes - Initial implementation - - -import argparse -import configparser -import fnmatch -import shutil -import stat -import tempfile - -import gitlab -import logging -import os -import re -import report_generator -from chardet import detect -from collections import OrderedDict, defaultdict -from get_pypi_latest_version import GetPyPiLatestVersion -from subprocess import PIPE, Popen, run -from time import sleep - -config = configparser.ConfigParser() - -logger = logging.getLogger(__name__) -log_level = logging.getLevelName(config.get('General', 'LogLevel', fallback='INFO')) -log_file = config.get('General', 'LogFile', fallback='ip_check.log') -logging.basicConfig(filename=log_file, encoding='utf-8', - format='%(asctime)s [%(levelname)s] %(message)s', level=log_level) - - -def recursive_chmod(path, mode): - for dirpath, dirnames, filenames in os.walk(path): - os.chmod(dirpath, mode) - for filename in filenames: - os.chmod(os.path.join(dirpath, filename), mode) - - -def cleanup_fs(path): - try: - shutil.rmtree(path) - except FileNotFoundError as e: - logger.info("Error while cleaning up: " + str(e)) - except PermissionError as e: - logger.info("Error while cleaning up: " + str(e) + ". Attempting again after changing permissions.") - recursive_chmod(path, stat.S_IWUSR) - try: - shutil.rmtree(path) - except PermissionError as e: - logger.info("Error while cleaning up: " + str(e) + ". 
Giving up.") - - -def add_error_report(location, error): - if config.getboolean('EclipseDash', 'OutputReport', fallback=True): - return location + ";" + error + ";;error;-" - - -def dash_check(project, filepaths, lang): - effective_count = 0 - total_count = 0 - output_report = [] - for fpath in filepaths: - total_count = total_count + 1 - print("Processing " + lang + " dependency location " + str(total_count) + "/" + str(len(filepaths))) - logger.info("Processing " + lang + " dependency location " + str(total_count) + "/" + str(len(filepaths))) - - # Make relative path for processing - fpath = fpath.replace(project.path_with_namespace + "/", "") - location = project.path_with_namespace + "/-/blob/" + config.get('General', 'Branch', - fallback=project.default_branch) + "/" + fpath - # Java (Maven Only) - if lang == 'Java' and 'gradle' not in fpath: - # Git clone repo for Maven - p_git = Popen([shutil.which('git'), 'clone', '--depth', '1', - project.http_url_to_repo, 'tmp'], stdout=PIPE, stderr=PIPE) - stdout, stderr = p_git.communicate() - # If errors from Git clone - if p_git.returncode != 0: - logger.warning( - "Error Git cloning repository for dependency file (" + project.path_with_namespace + "/" + fpath - + "). 
Please check.") - logger.warning(stdout) - logger.warning(stderr) - output_report.append( - add_error_report(location, "Error Git cloning repository for the dependency file")) - # Attempt cleanup - cleanup_fs('./tmp') - continue - # Create dependency list with Maven - relative_path = './tmp/' + fpath.replace(project.path_with_namespace, "") - # Run Maven - p_maven = Popen([shutil.which('mvn'), '-f', relative_path, 'verify', 'dependency:list', '-DskipTests', - '-Dmaven.javadoc.skip=true', '-DoutputFile=maven.deps'], stdout=PIPE, stderr=PIPE) - stdout, stderr = p_maven.communicate() - # If no errors from Maven - if p_maven.returncode == 0: - with open(relative_path.replace('pom.xml', 'maven.deps'), 'r') as fp: - raw_content = fp.read() - # Retrieve only the right content - processed_content = [x.group(0) for x in re.finditer(r'\S+:(system|provided|compile)', raw_content)] - # Sort and remove duplicates - processed_content.sort() - processed_content = "\n".join(list(OrderedDict.fromkeys(processed_content))) - raw_content = processed_content.encode('utf-8') - # Attempt cleanup - cleanup_fs('./tmp') - else: - # Get Maven error - maven_error = [x for x in stdout.decode('utf-8', errors='ignore').splitlines() - if '[ERROR]' in x][0].replace('[ERROR]', '').strip() - logger.warning( - "Error running Maven for dependency file (" + project.path_with_namespace + "/" + fpath - + "). 
Please see debug information below.") - logger.warning(maven_error) - output_report.append( - add_error_report(location, "Error running Maven for the dependency file")) - # Attempt cleanup - cleanup_fs('./tmp') - continue - # Python - elif lang == 'Python': - # Get raw version of requirements.txt - raw_content = get_file_gitlab(project, fpath) - if raw_content is None: - output_report.append( - add_error_report(location, "Error obtaining dependency file from Gitlab")) - continue - # Detect charset and decode - res = detect(raw_content) - if res['encoding'] is not None: - # Remove commented lines - processed_content = re.sub(r'(?m)^ *#.*\n?', '', - raw_content.decode(res['encoding'], errors='ignore')) - else: - # Unknown charset, cannot decode - logger.warning( - 'Error detecting encoding for dependency file (' + project.path_with_namespace + '/' + fpath + ')') - output_report.append( - add_error_report(location, "Error detecting encoding for the dependency file")) - continue - # Sort content - sorted_content = processed_content.split("\n") - sorted_content.sort() - # Fix versions - obtainer = GetPyPiLatestVersion() - processed_content = [] - for line in sorted_content: - line = line.strip() - if line == "": - continue - elif ">" in line: - # If a range of versions is given assume the base version - tmp = line.split('>') - processed_content.append(tmp[0] + "==" + tmp[1].replace("=", "").strip()) - elif "=" not in line: - # When no version is specified, assume the latest - try: - processed_content.append(line + "==" + obtainer(line)) - except ValueError: - logger.warning( - "Error obtaining latest version for " + line + ". Attempting with " + line.capitalize()) - try: - processed_content.append(line.capitalize() + "==" + obtainer(line.capitalize())) - except ValueError: - logger.warning("Error obtaining latest version for " + line.capitalize() + ". 
Gave up...") - else: - processed_content.append(line) - - # Convert from list to text and ignore duplicates - processed_content = "\n".join(list(OrderedDict.fromkeys(processed_content))) - - # Change format to be compatible with Eclipse Dash - processed_content = re.sub(r'^([^=~ ]+)[=|~]=([^= ]+)$', r'pypi/pypi/-/\1/\2', processed_content, - flags=re.MULTILINE) - processed_content = re.sub(r'\[.*\]', '', processed_content, flags=re.MULTILINE) - - # Encode as file - raw_content = processed_content.encode('utf-8') - # Java or Kotlin using Gradle - elif 'gradle' in fpath: - # Get raw version of build.gradle.kts - raw_content = get_file_gitlab(project, fpath) - if raw_content is None: - if config.getboolean('EclipseDash', 'OutputReport', fallback=True): - output_report.append( - add_error_report(location, "Error obtaining dependency file from Gitlab")) - continue - # Detect charset and decode - res = detect(raw_content) - if res['encoding'] is not None: - # Remove commented lines - processed_content = re.sub(r'(?m)^ *//.*\n?', '', - raw_content.decode(res['encoding'], errors='ignore')) - else: - # Unknown charset, cannot decode - logger.warning( - 'Error detecting encoding for dependency file (' + project.path_with_namespace + '/' + fpath + ')') - output_report.append( - add_error_report(location, "Error detecting encoding for the dependency file")) - continue - # Get only the dependencies - filtered_content = re.findall(r'(?s)(?<=^dependencies\s\{)(.+?)(?=\})', processed_content, - flags=re.MULTILINE) - # If dependencies are empty, continue to next item - if len(filtered_content) == 0: - continue - # Remove Kotlin internals - filtered_content = "\n".join(x for x in filtered_content[0].splitlines() if 'kotlin(' not in x) - # Expand variables with versions - variables = re.findall(r'val(.*=.*)$', processed_content, flags=re.MULTILINE) - for var in variables: - var_declaration = var.split('=') - filtered_content = filtered_content.replace('$' + 
var_declaration[0].strip(), - var_declaration[1].strip().replace('"', '')) - # Sort dependencies - sorted_content = re.findall(r'"(.*?)"', filtered_content, flags=re.MULTILINE) - sorted_content.sort() - # Convert from list to text and ignore duplicates - processed_content = "\n".join(list(OrderedDict.fromkeys(sorted_content))) - # Encode as file - raw_content = processed_content.encode('utf-8') - else: - raw_content = get_file_gitlab(project, fpath) - if raw_content is None: - output_report.append( - add_error_report(location, "Error obtaining dependency file from Gitlab")) - continue - # Execute Eclipse Dash - with tempfile.TemporaryDirectory() as tmpdir: - # Temporary file for input to Eclipse Dash - dash_input_fpath = os.path.join(tmpdir, os.path.basename(fpath)) - with open(dash_input_fpath, 'wb') as fp: - fp.write(raw_content) - # Run Eclipse Dash process - p_dash = Popen(['java', '-jar', config.get('EclipseDash', 'JarPath', - fallback='assets/org.eclipse.dash.licenses-1.1.1-20240607.055024-182.jar'), - dash_input_fpath, '-summary', config.get('EclipseDash', 'OutputFolder', fallback='output') + - '/' + str(project.id) + '-check_' + lang + str(effective_count) + '.txt', '-batch', - config.get('EclipseDash', 'BatchSize', fallback='500'), '-confidence', - config.get('EclipseDash', 'ConfidenceThreshold', fallback='60')], stdout=PIPE, stderr=PIPE) - stdout, stderr = p_dash.communicate() - - # Add to report output - if config.getboolean('EclipseDash', 'OutputReport', fallback=True): - dash_output = config.get('EclipseDash', 'OutputFolder', fallback='output') + '/' + str( - project.id) + '-check_' + lang + str(effective_count) + '.txt' - with open(dash_output, 'r') as fp: - for line in fp: - output_report.append(project.path_with_namespace + "/-/blob/" + - config.get('General', 'Branch', fallback=project.default_branch) + - "/" + fpath + ";" + line.replace(", ", ";")) - effective_count += 1 - return output_report - - -def get_file_gitlab(project, fpath): - try: - 
raw_content = project.files.raw(file_path=fpath, - ref=config.get('General', 'Branch', fallback=project.default_branch)) - except gitlab.exceptions.GitlabGetError as e: - logger.warning("Error obtaining file (" + fpath + ") from Gitlab (" + str(e.response_code) + ")") - return - return raw_content - - -def find_dependencies(lang, files, default_filename): - # Attempt to find dependency files - filepaths = [] - for pattern in config.get(lang, 'DependencySearch', fallback=default_filename).split(','): - regex = fnmatch.translate(pattern.strip()) - for f in files: - if re.match(regex, f['name']): - filepaths.append(f['path']) - # print(filepaths) - logger.info("Dependency filepaths for " + lang + ": " + str(filepaths)) - return filepaths - - -def add_dependency_locations(dependency_locations, proj, lang, paths): - for path in paths: - try: - dependency_locations[proj.id][lang].append(str(proj.path_with_namespace) + '/' + path) - except KeyError: - dependency_locations[proj.id][lang] = [] - dependency_locations[proj.id][lang].append(str(proj.path_with_namespace) + '/' + path) - - -def main(): - print("Executing IP Check of Eclipse Gitlab Projects") - logger.info("Starting IP Check of Eclipse Gitlab Projects") - - # Handle parameters and defaults - parser = argparse.ArgumentParser() - parser.add_argument('-b', '--branch', help='branch to check') - parser.add_argument('-c', '--config', default='config.ini', help='config file to use') - parser.add_argument('-g', '--group', type=int, help='Gitlab group ID to check') - parser.add_argument('-p', '--project',type=int, help='Gitlab project ID to check') - parser.add_argument('-pf', '--projects_file', help='file with projects to check') - parser.add_argument('-df', '--dependencies_file', help='file with dependency locations to check') - - try: - args = parser.parse_args() - config.read(args.config) - if args.branch is not None: - if not config.has_section('General'): - config.add_section('General') - config.set('General', 
'Branch', str(args.branch)) - if args.group is not None: - if not config.has_section('Groups'): - config.add_section('Groups') - config.set('Groups', 'BaseGroupID', str(args.group)) - if args.project is not None: - if not config.has_section('Projects'): - config.add_section('Projects') - config.set('Projects', 'SingleProjectID', str(args.project)) - if args.projects_file is not None: - if not config.has_section('Projects'): - config.add_section('Projects') - config.set('Projects', 'LoadFromFile', 'Yes') - config.set('Projects', 'InputFile', str(args.projects)) - if args.dependencies_file is not None: - if not config.has_section('DependencyLocations'): - config.add_section('DependencyLocations') - config.set('DependencyLocations', 'LoadFromFile', 'Yes') - config.set('DependencyLocations', 'InputFile', str(args.projects)) - except argparse.ArgumentError as e: - print(e) - - # Gitlab instance - gl = gitlab.Gitlab(url=config.get('General', 'GitlabURL', fallback='https://gitlab.eclipse.org'), - private_token=config.get('General', 'GitlabAuthToken', fallback=None)) - - # Data structure - # dict -> dict -> list - dependency_locations = defaultdict(dict) - - # If a list of dependency locations is given, work with that - if config.getboolean('DependencyLocations', 'LoadFromFile', fallback=False): - input_file = config.get('DependencyLocations', 'InputFile', fallback='gitlab-dependencies.txt') - line_count = 0 - try: - with open(input_file, 'r') as fp: - for line in fp: - # Ignore commented lines - if not line.startswith('#') and line != "": - line_count = line_count + 1 - tokens = line.strip().split(';') - proj_id = int(tokens[0]) - try: - dependency_locations[proj_id][tokens[2]].append(tokens[1]) - except KeyError: - dependency_locations[proj_id][tokens[2]] = [] - dependency_locations[proj_id][tokens[2]].append(tokens[1]) - print("Read " + str(line_count) + " dependency locations from " + input_file) - logger.info("Read " + str(line_count) + " dependency locations from " + 
input_file) - except FileNotFoundError: - print("The provided dependency file (" + input_file + ") cannot be found. Exiting...") - logger.error("The provided dependency file (" + input_file + ") cannot be found. Exiting...") - exit(1) - # If a list of projects is given, work with that - elif config.getboolean('Projects', 'LoadFromFile', fallback=False): - input_file = config.get('Projects', 'InputFile', fallback='gitlab-projects.txt') - line_count = 0 - try: - with open(input_file, 'r') as fp: - for line in fp: - # Ignore commented lines - if not line.startswith('#'): - line_count = line_count + 1 - proj_id = int(line.strip().split(';')[0]) - dependency_locations[proj_id] = {} - print("Read " + str(line_count) + " projects from " + input_file) - logger.info("Read " + str(line_count) + " projects from " + input_file) - except FileNotFoundError: - print("The provided projects file (" + input_file + ") cannot be found. Exiting...") - logger.error("The provided projects file (" + input_file + ") cannot be found. Exiting...") - exit(1) - # If a group ID is given, get our own list of projects from it - elif config.has_option('Groups', 'BaseGroupID'): - # Set base group ID to work - try: - base_group = gl.groups.get(config.getint('Groups', 'BaseGroupID'), lazy=True) - except ValueError: - print("Invalid BaseGroupID provided. Exiting...") - logger.warning("Invalid BaseGroupID specified. Exiting...") - exit(1) - # Get all projects - try: - projects = base_group.projects.list(include_subgroups=True, all=True, lazy=True) - except gitlab.exceptions.GitlabListError: - print("Invalid BaseGroupID provided. Exiting...") - logger.warning("Invalid BaseGroupID specified. 
Exiting...") - exit(1) - if config.getboolean('Projects', 'Save', fallback=False): - # Write all projects to a file - output_file = config.get('Projects', 'OutputFile', fallback='gitlab-projects.txt') - with open(output_file, 'w') as fp: - fp.write("#ID;PATH\n") - fp.write("\n".join(str(proj.id) + ';' + str(proj.path_with_namespace) for proj in projects)) - logger.info("Wrote " + str(len(projects)) + " projects to " + output_file) - for proj in projects: - dependency_locations[proj.id] = {} - # Work with a single project ID - elif config.has_option('Projects', 'SingleProjectID'): - dependency_locations[config.getint('Projects', 'SingleProjectID')] = {} - else: - # No valid option provided, exit - print("Insufficient parameters provided. Exiting...") - logger.warning("Insufficient parameters provided. Exiting...") - exit(0) - - # If dependency check with Eclipse Dash is enabled, proceed - if config.getboolean('General', 'CheckDependencies', fallback=True): - # Initialize output with dependency locations - if (config.has_section('DependencyLocations') and - config.getboolean('DependencyLocations', 'Save', fallback=False)): - # Write list of dependency files to a file - output_file = config.get('DependencyLocations', 'OutputFile', fallback='gitlab-dependencies.txt') - with open(output_file, 'w') as fp: - fp.write("#PROJ_ID;PATH;P_LANGUAGE\n") - output_report = [] - proj_count = 0 - print("Handling dependency location(s) for " + str(len(dependency_locations)) + " Gitlab project(s)") - logger.info("Handling dependency location(s) for " + str(len(dependency_locations)) + " Gitlab project(s)") - # For all projects to be processed - for proj in dependency_locations.keys(): - proj_count = proj_count + 1 - print("Handling Gitlab project " + str(proj_count) + "/" + str(len(dependency_locations))) - logger.info("Handling Gitlab project " + str(proj_count) + "/" + str(len(dependency_locations))) - - # Get project details - max_attempts = int(config.get('General', 
'GitlabConnectionAttempts', fallback=3)) + 1 - for i in range(max_attempts): - try: - p_details = gl.projects.get(proj) - - except BaseException: - # Max attempts reached - if i == max_attempts - 1: - print("Connection error fetching project with ID after " + str( - max_attempts - 1) + "attempts. Exiting...") - logger.error("Connection error fetching project with ID after " + str( - max_attempts - 1) + "attempts. Exiting...") - exit(1) - logger.warning("Connection error fetching project with ID. Retrying in 30 seconds..." + str(proj)) - sleep(30) - - logger.info("Project full path: " + str(p_details.path_with_namespace)) - - # User did not provide dependencies for the project - if len(dependency_locations[proj]) == 0: - logger.info("No dependencies given for project. Attempting to find them.") - # Get programming languages of the project - p_langs = p_details.languages() - logger.info("Project programming languages: " + str(p_langs)) - # Get a list of files in the project repository - files = [] - try: - files = p_details.repository_tree( - ref=config.get('General', 'Branch', fallback=p_details.default_branch), - recursive=True, all=True) - except gitlab.exceptions.GitlabGetError: - logger.warning("Project repository not found for: " + p_details.path_with_namespace) - # Attempt to find dependency files for supported programming languages - if 'Go' in p_langs and config.getboolean('Go', 'Enabled', fallback=True): - dependency_paths = find_dependencies('Go', files, default_filename='go.sum') - add_dependency_locations(dependency_locations, p_details, 'Go', dependency_paths) - if 'Java' in p_langs and config.getboolean('Java', 'Enabled', fallback=True): - dependency_paths = find_dependencies('Java', files, default_filename='pom.xml') - add_dependency_locations(dependency_locations, p_details, 'Java', dependency_paths) - if 'JavaScript' in p_langs and config.getboolean('JavaScript', 'Enabled', fallback=True): - dependency_paths = find_dependencies('JavaScript', files, 
default_filename='package-lock.json') - add_dependency_locations(dependency_locations, p_details, 'JavaScript', dependency_paths) - if 'Kotlin' in p_langs and config.getboolean('Kotlin', 'Enabled', fallback=True): - dependency_paths = find_dependencies('Kotlin', files, default_filename='build.gradle.kts') - add_dependency_locations(dependency_locations, p_details, 'Kotlin', dependency_paths) - if 'Python' in p_langs and config.getboolean('Python', 'Enabled', fallback=True): - dependency_paths = find_dependencies('Python', files, default_filename='requirements.txt') - add_dependency_locations(dependency_locations, p_details, 'Python', dependency_paths) - # Dash Check - for lang in dependency_locations[proj].keys(): - if config.getboolean(lang, 'Enabled', fallback=True): - print("Processing " + str(len(dependency_locations[proj][lang])) + - " dependency location(s) for " + lang + " in project " + p_details.path_with_namespace) - logger.info("Processing " + str(len(dependency_locations[proj][lang])) + - " dependency location(s) for " + lang + " in project" + p_details.path_with_namespace) - output_report.extend(dash_check(p_details, dependency_locations[proj][lang], lang)) - - # Initialize output with dependency locations - if (config.has_section('DependencyLocations') and - config.getboolean('DependencyLocations', 'Save', fallback=False)): - # Write list of dependency locations to a file - output_file = config.get('DependencyLocations', 'OutputFile', fallback='gitlab-dependencies.txt') - line_count = 0 - with open(output_file, 'a') as fp: - for proj in dependency_locations.keys(): - for lang in dependency_locations[proj].keys(): - fp.write("\n".join(str(proj) + ';' + depl + ';' + lang - for depl in dependency_locations[proj][lang])) - line_count = line_count + 1 - fp.write("\n") - logger.info("Wrote " + str(line_count) + " dependency locations to " + output_file) - if config.getboolean('EclipseDash', 'OutputReport', fallback=True): - base_url = config.get('General', 
'GitlabURL', fallback='https://gitlab.eclipse.org') + "/" - try: - with open(config.get('General', 'VerifiedDependencies', fallback='verified-dependencies.txt'), - 'r') as fp: - for line in fp: - # Ignore commented/blank lines - if line.startswith('#') or line.strip() == '': - continue - tokens = line.split(';') - # Check all items in the current output report - for item in output_report: - # If the verified dependency is present (and not approved), add the verified column value - if tokens[0].lower() in item.lower() and 'approved' not in item: - index = output_report.index(item) - # Get verification status from comments - verification = tokens[1].split(' ')[0].lower() - # Add verification status + comments in different columns to improve filtering - output_report[index] = output_report[index] + ";" + verification + ";" + tokens[1] - except FileNotFoundError: - logger.warning("Verified dependencies file (" + - config.get('General', 'VerifiedDependencies', - fallback='verified-dependencies.txt') + ") was not found") - # Generate output report - report_filename = report_generator.render(base_url, output_report) - - print("IP Check Report written to " + os.path.join(os.getcwd(), report_filename)) - logger.info("IP Check Report written to " + os.path.join(os.getcwd(), report_filename)) - - print("IP Check of Eclipse Gitlab Projects is now complete. Goodbye!") - logger.info("IP Check of Eclipse Gitlab Projects is now complete. Goodbye!") - - -if __name__ == '__main__': - main() diff --git a/output/README.md b/output/README.md deleted file mode 100644 index 316a1929e815758ac8e1a9f163119743de9f1345..0000000000000000000000000000000000000000 --- a/output/README.md +++ /dev/null @@ -1,15 +0,0 @@ -<!-- - * Copyright (c) 2024 The Eclipse Foundation - * - * This program and the accompanying materials are made available under the - * terms of the Eclipse Public License v. 2.0 which is available at - * http://www.eclipse.org/legal/epl-2.0. 
- * - * SPDX-FileType: DOCUMENTATION - * SPDX-FileCopyrightText: 2024 The Eclipse Foundation - * SPDX-License-Identifier: EPL-2.0 ---> - -# Gitlab IP Check - -## Default output folder for Eclipse Dash \ No newline at end of file diff --git a/pyproject.toml b/pyproject.toml new file mode 100644 index 0000000000000000000000000000000000000000..98fcbe9c131692746d53944bf293bf5b52185795 --- /dev/null +++ b/pyproject.toml @@ -0,0 +1,47 @@ +# Copyright (c) 2024 The Eclipse Foundation +# +# This program and the accompanying materials are made available under the +# terms of the Eclipse Public License 2.0 which is available at +# http://www.eclipse.org/legal/epl-2.0. +# +# SPDX-License-Identifier: EPL-2.0 +# +# Contributors: +# asgomes - Initial definition + +[build-system] +requires = ["hatchling"] +build-backend = "hatchling.build" + +[project] +name = "eclipse-ipa" +dynamic = ["version"] +authors = [ + { name="André Gomes", email="andre.gomes@eclipse-foundation.org" }, +] +description = "A package to perform IP Analysis for GitHub and GitLab projects" +readme = "README.md" +license = { text = "EPL-2.0" } +requires-python = ">=3.6,<3.13" +dependencies = ["chardet==5.2.0", "get-pypi-latest-version==0.1.0", "jinja2==3.1.5", "python-gitlab==5.3.1", +"PyGitHub==2.5.0"] +classifiers = [ + "Programming Language :: Python :: 3", + "License :: OSI Approved :: Eclipse Public License 2.0 (EPL-2.0)", + "Operating System :: OS Independent", +] + +[project.scripts] +eclipse-ipa = 'ipa:main' + +[tool.hatch.build.targets.sdist] +exclude = [ + "/.git", + "/docs", +] + +[tool.hatch.build.targets.wheel] +packages = ["src/eclipse/ipa"] + +[tool.hatch.version] +path = "src/eclipse/ipa/__init__.py" diff --git a/src/eclipse/__init__.py b/src/eclipse/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..95dd682ff48021d44cd52d13c47a1fcdb3b597f2 --- /dev/null +++ b/src/eclipse/__init__.py @@ -0,0 +1,12 @@ +# Copyright (c) 2024 The Eclipse Foundation +# +# This program 
and the accompanying materials are made available under the +# terms of the Eclipse Public License 2.0 which is available at +# http://www.eclipse.org/legal/epl-2.0. +# +# SPDX-License-Identifier: EPL-2.0 +# +# Contributors: +# asgomes - Initial implementation + +__version__ = "0.1.0" diff --git a/src/eclipse/ipa/__init__.py b/src/eclipse/ipa/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..e17368e472d6229e7f82d3fe9e00a289cb56ae4b --- /dev/null +++ b/src/eclipse/ipa/__init__.py @@ -0,0 +1,113 @@ +# Copyright (c) 2024 The Eclipse Foundation +# +# This program and the accompanying materials are made available under the +# terms of the Eclipse Public License 2.0 which is available at +# http://www.eclipse.org/legal/epl-2.0. +# +# SPDX-License-Identifier: EPL-2.0 +# +# Contributors: +# asgomes - Initial implementation + +__version__ = "0.1.0" + +import argparse +import configparser +from shutil import which +from . import glab +from . import ghub + +config = configparser.ConfigParser() + + +def main(): + # Handle parameters and defaults + parser = argparse.ArgumentParser() + parser.add_argument('-ci', '--ci_mode', action='store_true', help='execute in CI mode') + parser.add_argument('-gh', '--github', action='store_true', help='execute for GitHub') + parser.add_argument('-gl', '--gitlab', default='gitlab.eclipse.org', help='execute for GitLab URL') + + parser.add_argument('-b', '--branch', help='branch to analyze') + parser.add_argument('-c', '--config', default='config.ini', help='config file to use') + parser.add_argument('-g', '--group', help='Github organization/Gitlab group ID to analyze') + parser.add_argument('-p', '--project', help='Github/Gitlab project ID to analyze') + parser.add_argument('-s', '--summary', action='store_true', help='output is a Dash summary file') + parser.add_argument('-t', '--token', help='access token for API') + parser.add_argument('-pf', '--projects_file', help='file with projects to analyze') + 
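The command-line flags defined here are merged into an INI-style `configparser` object, using an ensure-section-then-set pattern so that a flag can override a config file that never declared the section. A minimal runnable sketch of that pattern — the `General`/`Branch` names mirror ones used by the tool, but this fragment is illustrative, not the tool's actual entry point:

```python
import argparse
import configparser

def apply_override(config, section, option, value):
    # configparser.set() raises NoSectionError for a missing section,
    # so create the section on first use before applying the override
    if not config.has_section(section):
        config.add_section(section)
    config.set(section, option, str(value))

parser = argparse.ArgumentParser()
parser.add_argument('-b', '--branch')
args = parser.parse_args(['--branch', 'main'])

config = configparser.ConfigParser()
if args.branch is not None:
    apply_override(config, 'General', 'Branch', args.branch)
print(config.get('General', 'Branch'))  # prints: main
```

Because `configparser` stores everything as strings, the `str(value)` coercion also keeps integer IDs (e.g. group or project IDs) valid as option values.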
parser.add_argument('-df', '--dependencies_file', help='file with dependency locations to analyze') + + try: + args = parser.parse_args() + config.read(args.config) + + if args.branch is not None: + if not config.has_section('General'): + config.add_section('General') + config.set('General', 'Branch', str(args.branch)) + if args.group is not None: + if not config.has_section('Groups'): + config.add_section('Groups') + config.set('Groups', 'BaseGroupID', str(args.group)) + if args.project is not None: + if not config.has_section('Projects'): + config.add_section('Projects') + config.set('Projects', 'SingleProjectID', str(args.project)) + if args.projects_file is not None: + if not config.has_section('Projects'): + config.add_section('Projects') + config.set('Projects', 'LoadFromFile', 'Yes') + config.set('Projects', 'InputFile', str(args.projects_file)) + if args.dependencies_file is not None: + if not config.has_section('DependencyLocations'): + config.add_section('DependencyLocations') + config.set('DependencyLocations', 'LoadFromFile', 'Yes') + config.set('DependencyLocations', 'InputFile', str(args.dependencies_file)) + if args.summary: + if not config.has_section('EclipseDash'): + config.add_section('EclipseDash') + config.set('EclipseDash', 'OutputReport', 'No') + config.set('EclipseDash', 'OutputSummary', 'Yes') + + # Check for pre-requisites + if which('git') is None: + print('Git CLI not found. Please check the official installation instructions:') + print('https://git-scm.com/book/en/v2/Getting-Started-Installing-Git') + print('Exiting...') + exit(0) + if which('java') is None: + print('Java CLI not found. Please check the official installation instructions:') + print('https://adoptium.net/installation/') + print('Exiting...') + exit(0) + if which('mvn') is None: + print('Maven CLI not found. 
Please check the official installation instructions:') + print('https://maven.apache.org/install.html') + print('Exiting...') + exit(0) + + # If in CI mode + if args.ci_mode: + glab.ci.execute() + elif args.github: + if args.token is not None: + if not config.has_section('General'): + config.add_section('General') + config.set('General', 'GithubAuthToken', str(args.token)) + ghub.remote.execute(config) + elif args.gitlab is not None: + if not config.has_section('General'): + config.add_section('General') + config.set('General', 'GitlabURL', 'https://' + str(args.gitlab)) + if args.token is not None: + config.set('General', 'GitlabAuthToken', str(args.token)) + glab.remote.execute(config) + else: + print('Not yet supported. Exiting...') + exit(0) + + except argparse.ArgumentError as e: + print(e) + + +if __name__ == '__main__': + main() diff --git a/src/eclipse/ipa/dash/__init__.py b/src/eclipse/ipa/dash/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..c12a6f2ec54c4ce5a0dabe3079261d79c6e1ac44 --- /dev/null +++ b/src/eclipse/ipa/dash/__init__.py @@ -0,0 +1,15 @@ +# Copyright (c) 2024 The Eclipse Foundation +# +# This program and the accompanying materials are made available under the +# terms of the Eclipse Public License 2.0 which is available at +# http://www.eclipse.org/legal/epl-2.0. +# +# SPDX-License-Identifier: EPL-2.0 +# +# Contributors: +# asgomes - Initial implementation + +from . 
import report, run + +__all__ = ['report', 'run'] +__version__ = '0.1.0' diff --git a/assets/org.eclipse.dash.licenses-1.1.1-20240607.055024-182.jar b/src/eclipse/ipa/dash/assets/eclipse-dash.jar similarity index 100% rename from assets/org.eclipse.dash.licenses-1.1.1-20240607.055024-182.jar rename to src/eclipse/ipa/dash/assets/eclipse-dash.jar diff --git a/report_generator.py b/src/eclipse/ipa/dash/report.py similarity index 67% rename from report_generator.py rename to src/eclipse/ipa/dash/report.py index 4400bd90ea5e8df3047ba59311124dd21397277e..62402a5386567a6bf0adec704b669a6b7c388037 100644 --- a/report_generator.py +++ b/src/eclipse/ipa/dash/report.py @@ -10,28 +10,32 @@ # asgomes - Initial implementation import re -from datetime import datetime, date -from jinja2 import Environment, FileSystemLoader +from datetime import date +from jinja2 import Environment, PackageLoader row_template = ''' <tr> <td class="bs-checkbox"></td> ''' -def render(base_url, entries, report_filename=""): - env = Environment(loader=FileSystemLoader('templates')) +def render(base_url, entries, report_filename="", branch=None): + env = Environment(loader=PackageLoader('ipa', 'templates')) template = env.get_template('report_template.jinja') if report_filename == "": - report_filename = datetime.now().strftime("%Y%m%d_%H%M%S") + "-ip-report.html" + report_filename = "ip-report.html" trows = "" for e in entries: trow = row_template columns = e.split(';') # Update location to URL - columns[0] = ('<a href="' + base_url + columns[0] + '" target="_blank">' + - re.sub(r'-/blob/.*?/', '', columns[0]) + '</a>') + if branch: + columns[0] = ('<a href="' + base_url + '/-/blob/' + branch + columns[0][1:] + + '" target="_blank">' + columns[0][2:] + '</a>') + else: + columns[0] = ('<a href="' + base_url + columns[0] + '" target="_blank">' + + re.sub(r'/?-?/blob/.*?/', '/', columns[0]) + '</a>') # Set empty license name to Unknown if columns[2].strip() == "" or columns[2].strip() == "unknown": 
columns[2] = "Unknown" @@ -41,7 +45,7 @@ def render(base_url, entries, report_filename=""): columns.append('-') else: # Replace any URLs with HTML links (for comments) - urls = re.compile(r"((https?):((//)|(\\\\))+[\w\d:#@%/;$~_?+-=\\.&]*)", re.UNICODE) + urls = re.compile(r"((https?):((//)|(\\\\))+[\w:#@%/;$~_?+-=\\.&]*)", re.UNICODE) columns[len(columns) - 1] = urls.sub(r'<a href="\1" target="_blank">\1</a>', columns[len(columns) - 1]) # Write all columns for this row for col in columns: @@ -51,5 +55,3 @@ def render(base_url, entries, report_filename=""): with open(report_filename, 'w') as fp: print(template.render(trows=trows, year=date.today().year), file=fp) - - return report_filename diff --git a/src/eclipse/ipa/dash/run.py b/src/eclipse/ipa/dash/run.py new file mode 100644 index 0000000000000000000000000000000000000000..89dd44951c25ff1bb06e6de953675ca7ca4d8050 --- /dev/null +++ b/src/eclipse/ipa/dash/run.py @@ -0,0 +1,259 @@ +# Copyright (c) 2024 The Eclipse Foundation +# +# This program and the accompanying materials are made available under the +# terms of the Eclipse Public License 2.0 which is available at +# http://www.eclipse.org/legal/epl-2.0. 
+# +# SPDX-License-Identifier: EPL-2.0 +# +# Contributors: +# asgomes - Initial implementation + +import re +import tempfile +from collections import OrderedDict +from datetime import datetime +from importlib import resources +from os import path +from shutil import which +from subprocess import PIPE, Popen + +from chardet import detect +from get_pypi_latest_version import GetPyPiLatestVersion + + +def read_file(fpath, decode=True): + # Get file contents + try: + with open(fpath, 'rb') as fp: + raw_contents = fp.read() + except FileNotFoundError: + return None + + # If contents need to be decoded + if decode: + # Detect charset and decode + res = detect(raw_contents) + if res['encoding'] is None: + return None + return raw_contents.decode(res['encoding']) + + return raw_contents + + +def handle_gradle(contents): + # Get only the dependencies + filtered_contents = re.findall(r'(?s)(?<=^dependencies\s\{)(.+?)(?=})', contents, + flags=re.MULTILINE) + + # Remove Kotlin internals + filtered_contents = "\n".join(x for x in filtered_contents[0].splitlines() if 'kotlin(' not in x) + + # Expand variables with versions + variables = re.findall(r'val(.*=.*)$', filtered_contents, flags=re.MULTILINE) + for var in variables: + var_declaration = var.split('=') + filtered_contents = filtered_contents.replace('$' + var_declaration[0].strip(), + var_declaration[1].strip().replace('"', '')) + + # Sort dependencies + sorted_contents = re.findall(r'"(.*?)"', filtered_contents, flags=re.MULTILINE) + sorted_contents.sort() + + # Convert from list to text and ignore duplicates + processed_contents = "\n".join(list(OrderedDict.fromkeys(sorted_contents))) + + # Encode as file + raw_contents = processed_contents.encode('utf-8') + + return raw_contents + + +def handle_maven(fpath): + # Run Maven + p_maven = Popen([which('mvn'), '-f', fpath, 'verify', 'dependency:list', '-DskipTests', + '-Dmaven.javadoc.skip=true', '-DoutputFile=maven.deps'], stdout=PIPE, stderr=PIPE) + stdout, stderr = 
p_maven.communicate() + + # If no errors from Maven + if p_maven.returncode == 0: + with open(fpath.replace('pom.xml', 'maven.deps'), 'r') as fp: + raw_contents = fp.read() + + # Retrieve only the right content + processed_contents = [x.group(0) for x in re.finditer(r'\S+:(system|provided|compile)', raw_contents)] + + # Sort and remove duplicates + processed_contents.sort() + processed_contents = "\n".join(list(OrderedDict.fromkeys(processed_contents))) + raw_contents = processed_contents.encode('utf-8') + + return raw_contents, None + else: + # Get Maven error + maven_error = [x for x in stdout.decode('utf-8', errors='ignore').splitlines() + if '[ERROR]' in x][0].replace('[ERROR]', '').strip() + return None, maven_error + + +class Dash: + def __init__(self, config, logger): + self.config = config + self.logger = logger + + def dash_execute(self, raw_contents, tmpdir): + # Output summary full path + summary_filepath = path.join(tmpdir, str(datetime.now().timestamp()) + '_analysis.txt') + + # Run Eclipse Dash + with resources.as_file(resources.files(__package__).joinpath('assets/eclipse-dash.jar')) as exe: + p_dash = Popen(['java', '-jar', str(exe), '-', + '-summary', summary_filepath, '-batch', self.config['batch_size'], '-confidence', + self.config['confidence_threshold']], stdin=PIPE, stdout=PIPE, stderr=PIPE) + stdout, stderr = p_dash.communicate(input=raw_contents) + # print(stdout) + return p_dash.returncode, summary_filepath + + def dash_report(self, raw_contents): + report = [] + with tempfile.TemporaryDirectory() as tmpdir: + return_code, summary_filepath = self.dash_execute(raw_contents, tmpdir) + try: + with open(summary_filepath, 'r') as fp: + for line in fp: + report.append(line.replace(", ", ";")) + except FileNotFoundError: + return None + return report + + def dash_generic(self, dependency_locations): + report = [] + for dependency in dependency_locations: + # Get file raw contents + raw_contents = read_file(dependency, decode=False) + + # Run Dash 
and get report + rep = self.dash_report(raw_contents) + if rep is not None: + for line in rep: + report.append(dependency + ";" + line) + return report + + def dash_python(self, dependency_locations): + report = [] + for dependency in dependency_locations: + # Get file contents + contents = read_file(dependency) + if contents is None: + report.append(dependency + ";Error detecting encoding for the dependency file;;error;-") + continue + + # Remove commented lines + contents = re.sub(r'(?m)#.*', '', contents, flags=re.MULTILINE) + + # If multiple version conditions are given, only consider the base one + contents = re.sub(r'(?m),.*', '', contents, flags=re.MULTILINE) + + # Sort content + sorted_contents = contents.split("\n") + sorted_contents.sort() + + # To get latest versions + obtainer = GetPyPiLatestVersion() + + # Handle versions + contents = [] + for line in sorted_contents: + line = line.strip() + if line == "": + continue + elif ">" in line: + # If a range of versions is given, assume the base version + tmp = line.split('>') + contents.append(tmp[0] + "==" + tmp[1].replace("=", "").strip()) + elif "=" not in line: + # When no version is specified, assume the latest + try: + contents.append(line + "==" + obtainer(line)) + except ValueError: + self.logger.warning( + "Error obtaining latest version for " + line + ". Attempting with " + line.capitalize()) + try: + contents.append(line.capitalize() + "==" + obtainer(line.capitalize())) + except ValueError: + self.logger.warning( + "Error obtaining latest version for " + line.capitalize() + ". 
Gave up...") + continue + else: + contents.append(line) + + # Convert from list to text and ignore duplicates + contents = "\n".join(list(OrderedDict.fromkeys(contents))) + + # Change format to be compatible with Eclipse Dash + contents = re.sub(r'^([^=~ ]+)[=|~]=([^= ]+)$', r'pypi/pypi/-/\1/\2', contents, + flags=re.MULTILINE) + contents = re.sub(r'\[.*]', '', contents, flags=re.MULTILINE) + + # Encode as file + raw_contents = contents.encode('utf-8') + + # Run Dash and get report + rep = self.dash_report(raw_contents) + if rep is not None: + for line in rep: + report.append(dependency + ";" + line) + + return report + + def dash_java(self, dependency_locations): + report = [] + for dependency in dependency_locations: + if 'gradle' in dependency: + # Get file contents + contents = read_file(dependency) + if contents is None: + report.append(dependency + ";Error detecting encoding for the dependency file;;error;-") + continue + + # Process contents for Gradle analysis + raw_contents = handle_gradle(contents) + + # Run Dash and get report + rep = self.dash_report(raw_contents) + if rep is not None: + for line in rep: + report.append(dependency + ";" + line) + elif 'pom.xml' in dependency: + # Process contents for Maven analysis + raw_contents, error = handle_maven(dependency) + if raw_contents is None: + report.append(dependency + ";" + error + ";;error;-") + continue + + # Run Dash and get report + rep = self.dash_report(raw_contents) + if rep is not None: + for line in rep: + report.append(dependency + ";" + line) + return report + + def dash_kotlin(self, dependency_locations): + report = [] + for dependency in dependency_locations: + if 'gradle' in dependency: + # Get file contents + contents = read_file(dependency) + if contents is None: + report.append(dependency + ";Error detecting encoding for the dependency file;;error;-") + continue + + # Process contents for Gradle analysis + raw_contents = handle_gradle(contents) + + # Run Dash and get report + rep = 
self.dash_report(raw_contents) + if rep is not None: + for line in rep: + report.append(dependency + ";" + line) + return report diff --git a/src/eclipse/ipa/general/__init__.py b/src/eclipse/ipa/general/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..2e064075247d8589a07f2bd54a1dc9ae86919728 --- /dev/null +++ b/src/eclipse/ipa/general/__init__.py @@ -0,0 +1,15 @@ +# Copyright (c) 2024 The Eclipse Foundation +# +# This program and the accompanying materials are made available under the +# terms of the Eclipse Public License 2.0 which is available at +# http://www.eclipse.org/legal/epl-2.0. +# +# SPDX-License-Identifier: EPL-2.0 +# +# Contributors: +# asgomes - Initial implementation + +from . import utils + +__all__ = ['utils'] +__version__ = '0.1.0' diff --git a/src/eclipse/ipa/general/utils.py b/src/eclipse/ipa/general/utils.py new file mode 100644 index 0000000000000000000000000000000000000000..d1a129eb57f5e3857d029b55524751909a2bb601 --- /dev/null +++ b/src/eclipse/ipa/general/utils.py @@ -0,0 +1,64 @@ +# Copyright (c) 2024 The Eclipse Foundation +# +# This program and the accompanying materials are made available under the +# terms of the Eclipse Public License 2.0 which is available at +# http://www.eclipse.org/legal/epl-2.0. 
+# +# SPDX-License-Identifier: EPL-2.0 +# +# Contributors: +# asgomes - Initial implementation + +__version__ = "0.1.0" + +import fnmatch +import re + + +def find_dependencies_gitlab(config, logger, lang, files, default_filename): + # Attempt to find dependency files + filepaths = [] + for pattern in config.get(lang, 'DependencySearch', fallback=default_filename).split(','): + regex = fnmatch.translate(pattern.strip()) + for f in files: + if re.match(regex, f['name']): + filepaths.append(f['path']) + # print(filepaths) + logger.info("Dependency filepaths for " + lang + ": " + str(filepaths)) + return filepaths + + +def find_dependencies_github(config, logger, lang, files, default_filename): + # Attempt to find dependency files + filepaths = [] + for pattern in config.get(lang, 'DependencySearch', fallback=default_filename).split(','): + regex = fnmatch.translate(pattern.strip()) + for f in files: + if re.match(regex, f.name): + filepaths.append(f.path) + # print(filepaths) + logger.info("Dependency filepaths for " + lang + ": " + str(filepaths)) + return filepaths + + +def add_gldep_locations(dependency_locations, proj, lang, paths): + for path in paths: + try: + dependency_locations[proj.id][lang].append(str(proj.path_with_namespace) + '/' + path) + except KeyError: + dependency_locations[proj.id][lang] = [] + dependency_locations[proj.id][lang].append(str(proj.path_with_namespace) + '/' + path) + + +def add_ghdep_locations(dependency_locations, proj, lang, paths): + for path in paths: + try: + dependency_locations[proj][lang].append(path) + except KeyError: + dependency_locations[proj][lang] = [] + dependency_locations[proj][lang].append(path) + + +def add_error_report(config, location, error): + if config.getboolean('EclipseDash', 'OutputReport', fallback=True): + return location + ";" + error + ";;error;-" diff --git a/src/eclipse/ipa/ghub/__init__.py b/src/eclipse/ipa/ghub/__init__.py new file mode 100644 index 
0000000000000000000000000000000000000000..13114b0f3c6668b32194241f2e06fa9c79ff47c7 --- /dev/null +++ b/src/eclipse/ipa/ghub/__init__.py @@ -0,0 +1,15 @@ +# Copyright (c) 2024 The Eclipse Foundation +# +# This program and the accompanying materials are made available under the +# terms of the Eclipse Public License 2.0 which is available at +# http://www.eclipse.org/legal/epl-2.0. +# +# SPDX-License-Identifier: EPL-2.0 +# +# Contributors: +# asgomes - Initial implementation + +from . import remote + +__all__ = ['remote'] +__version__ = '0.1.0' diff --git a/src/eclipse/ipa/ghub/remote.py b/src/eclipse/ipa/ghub/remote.py new file mode 100644 index 0000000000000000000000000000000000000000..ecba357a24763e8ce6cf00fcf599121b30b5d481 --- /dev/null +++ b/src/eclipse/ipa/ghub/remote.py @@ -0,0 +1,407 @@ +# Copyright (c) 2024 The Eclipse Foundation +# +# This program and the accompanying materials are made available under the +# terms of the Eclipse Public License 2.0 which is available at +# http://www.eclipse.org/legal/epl-2.0. 
+# +# SPDX-License-Identifier: EPL-2.0 +# +# Contributors: +# asgomes - Initial implementation + +__version__ = "0.1.0" + +import logging +import os +import re +import shutil +import tempfile +from collections import defaultdict +from datetime import datetime +from pathlib import Path +from subprocess import PIPE, Popen +from time import sleep + +from github import Auth, Github + +from ..dash import report, run +from ..general import utils + +logger = logging.getLogger(__name__) + + +def get_dependency_locations(gh, config): + # Data structure + # dict -> dict -> list + dependency_locations = defaultdict(dict) + + # If a list of dependency locations is given, work with that + if config.getboolean('DependencyLocations', 'LoadFromFile', fallback=False): + input_file = config.get('DependencyLocations', 'InputFile', fallback='github-dependencies.txt') + line_count = 0 + try: + with open(input_file, 'r') as fp: + for line in fp: + # Ignore commented lines + if not line.startswith('#') and line != "": + line_count = line_count + 1 + tokens = line.strip().split(';') + proj_id = tokens[0] + try: + dependency_locations[proj_id][tokens[2]].append(tokens[1]) + except KeyError: + dependency_locations[proj_id][tokens[2]] = [] + dependency_locations[proj_id][tokens[2]].append(tokens[1]) + print("Read " + str(line_count) + " dependency locations from " + input_file) + logger.info("Read " + str(line_count) + " dependency locations from " + input_file) + except FileNotFoundError: + print("The provided dependency file (" + input_file + ") cannot be found. Exiting...") + logger.error("The provided dependency file (" + input_file + ") cannot be found. 
Exiting...") + exit(1) + # If a list of projects is given, work with that + elif config.getboolean('Projects', 'LoadFromFile', fallback=False): + input_file = config.get('Projects', 'InputFile', fallback='github-projects.txt') + line_count = 0 + try: + with open(input_file, 'r') as fp: + for line in fp: + # Ignore commented lines + if not line.startswith('#'): + line_count = line_count + 1 + proj_id = line.strip() + dependency_locations[proj_id] = {} + print("Read " + str(line_count) + " projects from " + input_file) + logger.info("Read " + str(line_count) + " projects from " + input_file) + except FileNotFoundError: + print("The provided projects file (" + input_file + ") cannot be found. Exiting...") + logger.error("The provided projects file (" + input_file + ") cannot be found. Exiting...") + exit(1) + # If an organization is given, get our own list of projects from it + elif config.has_option('Groups', 'BaseGroupID'): + # Set base organization to work + organization = gh.get_organization(config.get('Groups', 'BaseGroupID')) + + # Get all projects + projects = organization.get_repos() + # print("Invalid BaseGroupID provided. Exiting...") + # logger.warning("Invalid BaseGroupID specified. 
Exiting...") + + # Iterate over all projects (several API calls because of pagination) + for proj in projects: + dependency_locations[proj.full_name] = {} + + if config.getboolean('Projects', 'Save', fallback=False): + # Write all projects to a file + output_file = config.get('Projects', 'OutputFile', fallback='github-projects.txt') + with open(output_file, 'w') as fp: + fp.write("#FULL_NAME\n") + fp.write("\n".join(proj for proj in dependency_locations.keys())) + logger.info("Wrote " + str(len(projects)) + " projects to " + output_file) + # Work with a single project ID + elif config.has_option('Projects', 'SingleProjectID'): + dependency_locations[config.get('Projects', 'SingleProjectID')] = {} + else: + # No valid option provided, exit + print("Insufficient parameters provided. Exiting...") + logger.warning("Insufficient parameters provided. Exiting...") + exit(0) + + return dependency_locations + + +def analyze_dependencies(gh, config, dependency_locations): + # If dependency check with Eclipse Dash is enabled, proceed + if config.getboolean('General', 'AnalyzeDependencies', fallback=True): + # Initialize output with dependency locations + if (config.has_section('DependencyLocations') and + config.getboolean('DependencyLocations', 'Save', fallback=False)): + # Write list of dependency files to a file + output_file = config.get('DependencyLocations', 'OutputFile', fallback='github-dependencies.txt') + with open(output_file, 'w') as fp: + fp.write("#PROJ_ID;PATH;P_LANGUAGE\n") + output_report = [] + proj_count = 0 + print("Handling dependency location(s) for " + str(len(dependency_locations)) + " Github project(s)") + logger.info("Handling dependency location(s) for " + str(len(dependency_locations)) + " Github project(s)") + # For all projects to be processed + for proj in dependency_locations.keys(): + proj_count = proj_count + 1 + print("Handling Github project " + str(proj_count) + "/" + str(len(dependency_locations))) + logger.info("Handling Github project " 
+ str(proj_count) + "/" + str(len(dependency_locations))) + + # Get project details + max_attempts = int(config.get('General', 'APIConnectionAttempts', fallback=3)) + 1 + for i in range(max_attempts): + try: + p_details = gh.get_repo(re.sub(r'https://github.com/', '', proj)) + break + except BaseException: + # Max attempts reached + if i == max_attempts - 1: + print("Connection error fetching project with ID " + str(proj) + " after " + str( + max_attempts - 1) + " attempts. Exiting...") + logger.error("Connection error fetching project with ID " + str(proj) + " after " + str( + max_attempts - 1) + " attempts. Exiting...") + exit(1) + logger.warning( + "Connection error fetching project with ID " + str(proj) + ". Retrying in 30 seconds...") + sleep(30) + + logger.info("Project full path: " + str(p_details.full_name)) + + # User did not provide dependencies for the project + if len(dependency_locations[proj]) == 0: + logger.info("No dependencies given for project. Attempting to find them.") + + # Get programming languages of the project + p_langs = p_details.get_languages() + logger.info("Project programming languages: " + str(p_langs)) + + # Get a list of files in the project repository + files = [] + repo_contents = p_details.get_contents("") + while repo_contents: + file = repo_contents.pop(0) + if file.type == "dir": + repo_contents.extend(p_details.get_contents(file.path)) + else: + files.append(file) + + # Attempt to find dependency files for supported programming languages + if 'Go' in p_langs and config.getboolean('Go', 'Enabled', fallback=True): + dependency_paths = utils.find_dependencies_github(config, logger, 'Go', files, + default_filename='go.sum') + utils.add_ghdep_locations(dependency_locations, proj, 'Go', dependency_paths) + if 'Java' in p_langs and config.getboolean('Java', 'Enabled', fallback=True): + dependency_paths = utils.find_dependencies_github(config, logger, 'Java', 
files, + default_filename='pom.xml') + utils.add_ghdep_locations(dependency_locations, proj, 'Java', dependency_paths) + if 'JavaScript' in p_langs and config.getboolean('JavaScript', 'Enabled', fallback=True): + dependency_paths = utils.find_dependencies_github(config, logger, 'JavaScript', files, + default_filename='package-lock.json') + utils.add_ghdep_locations(dependency_locations, proj, 'JavaScript', dependency_paths) + if 'Kotlin' in p_langs and config.getboolean('Kotlin', 'Enabled', fallback=True): + dependency_paths = utils.find_dependencies_github(config, logger, 'Kotlin', files, + default_filename='build.gradle.kts') + utils.add_ghdep_locations(dependency_locations, proj, 'Kotlin', dependency_paths) + if 'Python' in p_langs and config.getboolean('Python', 'Enabled', fallback=True): + dependency_paths = utils.find_dependencies_github(config, logger, 'Python', files, + default_filename='requirements.txt') + utils.add_ghdep_locations(dependency_locations, proj, 'Python', dependency_paths) + + # Dash Analysis + for lang in dependency_locations[proj].keys(): + if config.getboolean(lang, 'Enabled', fallback=True): + print("Processing " + str(len(dependency_locations[proj][lang])) + + " dependency location(s) for " + lang + " in project " + p_details.full_name) + logger.info("Processing " + str(len(dependency_locations[proj][lang])) + + " dependency location(s) for " + lang + " in project " + p_details.full_name) + output_report.extend(dash_processing(config, p_details, dependency_locations[proj][lang], lang)) + + return output_report + + +def dash_processing(config, project, filepaths, lang): + effective_count = 0 + total_count = 0 + output_report = [] + dash_config = { + 'batch_size': config.get('EclipseDash', 'BatchSize', fallback='500'), + 'confidence_threshold': config.get('EclipseDash', 'ConfidenceThreshold', fallback='60'), + } + dash_runner = run.Dash(dash_config, logger) + + for fpath in filepaths: + total_count = total_count + 1 + print("Processing " + 
lang + " dependency location " + str(total_count) + "/" + str(len(filepaths))) + logger.info("Processing " + lang + " dependency location " + str(total_count) + "/" + str(len(filepaths))) + + # Set full location for reports + location = project.full_name + "/blob/" + config.get('General', 'Branch', + fallback=project.default_branch) + "/" + fpath + # Java (Maven Only) + if lang == 'Java' and 'gradle' not in fpath: + # Git clone repo for Maven + with tempfile.TemporaryDirectory() as tmpdir: + p_git = Popen([shutil.which('git'), 'clone', '-b', + config.get('General', 'Branch', fallback=project.default_branch), '--single-branch', + '--depth', '1', project.clone_url, tmpdir], stdout=PIPE, stderr=PIPE) + stdout, stderr = p_git.communicate() + # If errors from Git clone + if p_git.returncode != 0: + logger.warning( + "Error Git cloning repository for dependency file (" + project.full_name + "/" + fpath + + "). Please check.") + logger.warning(stdout) + logger.warning(stderr) + output_report.append( + utils.add_error_report(config, location, + "Error Git cloning repository for the dependency file")) + continue + # Create dependency list with Maven + relative_path = tmpdir + os.sep + fpath.replace(project.full_name, "") + + dash_output = dash_runner.dash_java([relative_path]) + for line in dash_output: + if 'error' in line: + columns = line.split(';') + logger.warning( + "Error running Maven for dependency file (" + project.full_name + "/" + fpath + + "). 
Please see debug information below.") + logger.warning(columns[1]) + output_report.append( + utils.add_error_report(config, location, "Error running Maven for the dependency file")) + continue + else: + line = re.sub(r'(.*?);', project.full_name + "/blob/" + + config.get('General', 'Branch', fallback=project.default_branch) + + "/" + fpath + ";", line, 1) + output_report.append(line) + # Java or Kotlin using Gradle + elif 'gradle' in fpath: + with tempfile.TemporaryDirectory() as tmpdir: + # Get raw version of build.gradle.kts + if tmpfile := get_file_github(config, project, fpath, tmpdir): + dash_output = dash_runner.dash_java([str(tmpfile)]) + for line in dash_output: + line = re.sub(r'(.*?);', project.full_name + "/blob/" + + config.get('General', 'Branch', fallback=project.default_branch) + + "/" + fpath + ";", line, 1) + output_report.append(line) + else: + output_report.append( + utils.add_error_report(config, location, "Error obtaining dependency file from GitHub")) + continue + # Python + elif lang == 'Python': + with tempfile.TemporaryDirectory() as tmpdir: + # Get raw version of requirements.txt + if tmpfile := get_file_github(config, project, fpath, tmpdir): + dash_output = dash_runner.dash_python([str(tmpfile)]) + for line in dash_output: + line = re.sub(r'(.*?);', project.full_name + "/blob/" + + config.get('General', 'Branch', fallback=project.default_branch) + + "/" + fpath + ";", line, 1) + output_report.append(line) + else: + output_report.append( + utils.add_error_report(config, location, "Error obtaining dependency file from GitHub")) + continue + # Go, JavaScript (or others directly supported) + else: + with tempfile.TemporaryDirectory() as tmpdir: + # Get raw version of file + if tmpfile := get_file_github(config, project, fpath, tmpdir): + dash_output = dash_runner.dash_generic([str(tmpfile)]) + for line in dash_output: + line = re.sub(r'(.*?);', project.full_name + "/blob/" + + config.get('General', 'Branch', fallback=project.default_branch) 
+ + "/" + fpath + ";", line, 1) + output_report.append(line) + else: + output_report.append( + utils.add_error_report(config, location, "Error obtaining dependency file from GitHub")) + continue + effective_count += 1 + + return output_report + + +def get_file_github(config, project, fpath, tmpdir): + try: + wpath = Path(os.path.join(tmpdir, os.path.basename(fpath))) + with open(wpath, 'w+b') as f: + file_contents = project.get_contents(fpath, ref=config.get('General', 'Branch', + fallback=project.default_branch)).decoded_content + f.write(file_contents) + return wpath + except Exception as e: + logger.warning("Error obtaining file (" + fpath + ") from GitHub (" + str(e) + ")") + return None + + +def write_output(config, dependency_locations, output_report): + # Initialize output with dependency locations + if (config.has_section('DependencyLocations') and + config.getboolean('DependencyLocations', 'Save', fallback=False)): + # Write list of dependency locations to a file + output_file = config.get('DependencyLocations', 'OutputFile', fallback='github-dependencies.txt') + line_count = 0 + with open(output_file, 'a') as fp: + for proj in dependency_locations.keys(): + for lang in dependency_locations[proj].keys(): + fp.write("\n".join(str(proj.id) + ';' + depl + ';' + lang + for depl in dependency_locations[proj][lang])) + line_count = line_count + len(dependency_locations[proj][lang]) + fp.write("\n") + logger.info("Wrote " + str(line_count) + " dependency locations to " + output_file) + if config.getboolean('EclipseDash', 'OutputReport', fallback=True): + base_url = 'https://github.com/' + try: + with open(config.get('General', 'VerifiedDependencies', fallback='verified-dependencies.txt'), + 'r') as fp: + for line in fp: + # Ignore commented/blank lines + if line.startswith('#') or line.strip() == '': + continue + tokens = line.split(';') + # Check all items in the current output report + for item in output_report: + # If the verified dependency is present (and not approved), add the verified column value + if 
tokens[0].lower() in item.lower() and 'approved' not in item: + index = output_report.index(item) + # Get verification status from comments + verification = tokens[1].split(' ')[0].lower() + # Add verification status + comments in different columns to improve filtering + output_report[index] = output_report[index] + ";" + verification + ";" + tokens[1] + except FileNotFoundError: + logger.warning("Verified dependencies file (" + + config.get('General', 'VerifiedDependencies', + fallback='verified-dependencies.txt') + ") was not found") + # Generate output report + report_filename = datetime.now().strftime("%Y%m%d_%H%M%S") + "-ip-report.html" + report.render(base_url, output_report, report_filename=report_filename) + + print("IP Analysis Report written to " + os.path.join(os.getcwd(), report_filename)) + logger.info("IP Analysis Report written to " + os.path.join(os.getcwd(), report_filename)) + if config.getboolean('EclipseDash', 'OutputSummary', fallback=False): + # Generate output summary + summary_filename = datetime.now().strftime("%Y%m%d_%H%M%S") + "-ip-summary.csv" + summary_contents = "" + for e in output_report: + columns = e.split(';') + summary_contents = summary_contents + columns[1] + "," + columns[2] + "," + columns[3] + "," + columns[4] + with open(summary_filename, 'w') as fp: + fp.write(summary_contents) + + print("IP Analysis Summary written to " + os.path.join(os.getcwd(), summary_filename)) + logger.info("IP Analysis Summary written to " + os.path.join(os.getcwd(), summary_filename)) + + +def execute(config): + # Set logging + log_level = logging.getLevelName(config.get('General', 'LogLevel', fallback='INFO')) + log_file = config.get('General', 'LogFile', fallback='ip_analysis.log') + logging.basicConfig(filename=log_file, encoding='utf-8', + format='%(asctime)s [%(levelname)s] %(message)s', level=log_level) + + print("Executing IP Analysis of Github Projects") + logger.info("Starting IP Analysis of Github Projects") + + # Using an access token + 
auth = Auth.Token(config.get('General', 'GithubAuthToken', fallback=None)) + + # Open GitHub connection + gh = Github(auth=auth) + + # Get dependency locations + dependency_locations = get_dependency_locations(gh, config) + + # Analyze dependencies + output_report = analyze_dependencies(gh, config, dependency_locations) + + # Close GitHub connection + gh.close() + + # Write output + write_output(config, dependency_locations, output_report) + + print("IP Analysis of Github Projects is now complete. Goodbye!") + logger.info("IP Analysis of Github Projects is now complete. Goodbye!") diff --git a/src/eclipse/ipa/glab/__init__.py b/src/eclipse/ipa/glab/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..995948df5b771dacfb9b8c6f3366c08770b40ac3 --- /dev/null +++ b/src/eclipse/ipa/glab/__init__.py @@ -0,0 +1,15 @@ +# Copyright (c) 2024 The Eclipse Foundation +# +# This program and the accompanying materials are made available under the +# terms of the Eclipse Public License 2.0 which is available at +# http://www.eclipse.org/legal/epl-2.0. +# +# SPDX-License-Identifier: EPL-2.0 +# +# Contributors: +# asgomes - Initial implementation + +from . import ci, remote + +__all__ = ['ci', 'remote'] +__version__ = '0.1.0' diff --git a/src/eclipse/ipa/glab/ci.py b/src/eclipse/ipa/glab/ci.py new file mode 100644 index 0000000000000000000000000000000000000000..9f10597811913f91cf45315f8dd67daf3c46a662 --- /dev/null +++ b/src/eclipse/ipa/glab/ci.py @@ -0,0 +1,114 @@ +# Copyright (c) 2024 The Eclipse Foundation +# +# This program and the accompanying materials are made available under the +# terms of the Eclipse Public License 2.0 which is available at +# http://www.eclipse.org/legal/epl-2.0. 
+# +# SPDX-License-Identifier: EPL-2.0 +# +# Contributors: +# asgomes - Initial implementation + +__version__ = "0.1.0" + +import logging +import os +import fnmatch +import re +from ..dash import report, run +from glob import glob + +logger = logging.getLogger(__name__) + + +def list_files(path): + tree = glob(path + '/**/*', recursive=True) + files = [] + for item in tree: + # Ignore directories + if os.path.isdir(item): + continue + files.append(item) + return files + + +def find_dependencies(files, patterns): + # Attempt to find dependency files + dependency_locations = [] + for pattern in patterns: + regex = fnmatch.translate(pattern.strip()) + for f in files: + if re.match(regex, os.path.basename(f)): + dependency_locations.append(f) + return dependency_locations + + +def execute(): + # Set logging + log_level = logging.getLevelName('INFO') + logging.basicConfig(filename='ip_analysis.log', encoding='utf-8', + format='%(asctime)s [%(levelname)s] %(message)s', level=log_level) + + print("Performing IP Analysis") + logger.info("Performing IP Analysis") + + # Check for programming languages in repository + if 'CI_PROJECT_REPOSITORY_LANGUAGES' in os.environ: + # Split the comma-separated list so checks match whole names (e.g. 'go' would otherwise match 'django') + p_langs = os.environ['CI_PROJECT_REPOSITORY_LANGUAGES'].split(',') + else: + logger.warning("Unable to get project repository languages from environment") + # Nothing to do, exit + exit(0) + + # Get list of files in repository + files = list_files('.') + logger.debug("List of repository files: " + str(files)) + + # Prepare report contents + output = [] + + # Get Dash runner + dash_config = { + 'batch_size': '500', + 'confidence_threshold': '60' + } + dash_runner = run.Dash(dash_config, logger) + + # Run Eclipse Dash for dependency files of supported programming languages + if 'go' in p_langs: + logger.info("Analyzing any Go dependencies") + dependency_locations = find_dependencies(files, patterns=['*.sum']) + output.extend(dash_runner.dash_generic(dependency_locations)) + if 'javascript' in p_langs: + 
logger.info("Analyzing any JavaScript dependencies") + dependency_locations = find_dependencies(files, patterns=['package-lock.json']) + output.extend(dash_runner.dash_generic(dependency_locations)) + if 'python' in p_langs: + logger.info("Analyzing any Python dependencies") + dependency_locations = find_dependencies(files, patterns=['requirements*.txt']) + output.extend(dash_runner.dash_python(dependency_locations)) + if 'java' in p_langs: + logger.info("Analyzing any Java dependencies") + dependency_locations = find_dependencies(files, patterns=['pom.xml', 'build.gradle.kts']) + output.extend(dash_runner.dash_java(dependency_locations)) + if 'kotlin' in p_langs: + logger.info("Analyzing any Kotlin dependencies") + dependency_locations = find_dependencies(files, patterns=['build.gradle.kts']) + output.extend(dash_runner.dash_kotlin(dependency_locations)) + + # Render HTML report + if 'CI_PROJECT_URL' in os.environ: + base_url = os.environ['CI_PROJECT_URL'] + else: + base_url = "" + if 'CI_COMMIT_BRANCH' in os.environ: + branch = os.environ['CI_COMMIT_BRANCH'] + else: + branch = "" + + report.render(base_url, output, branch=branch, report_filename='ip_analysis.html') + + print("IP Analysis complete") + logger.info("IP Analysis complete") + + exit(0) diff --git a/src/eclipse/ipa/glab/remote.py b/src/eclipse/ipa/glab/remote.py new file mode 100644 index 0000000000000000000000000000000000000000..72426a80ca19b5caaed28a5030ecf2b2f5a9fda4 --- /dev/null +++ b/src/eclipse/ipa/glab/remote.py @@ -0,0 +1,401 @@ +# Copyright (c) 2024 The Eclipse Foundation +# +# This program and the accompanying materials are made available under the +# terms of the Eclipse Public License 2.0 which is available at +# http://www.eclipse.org/legal/epl-2.0. 
+# +# SPDX-License-Identifier: EPL-2.0 +# +# Contributors: +# asgomes - Initial implementation + +__version__ = "0.1.0" + +import logging +import os +import re +import shutil +import tempfile +from collections import defaultdict +from datetime import datetime +from pathlib import Path +from subprocess import PIPE, Popen +from time import sleep + +import gitlab + +from ..dash import report, run +from ..general import utils + +logger = logging.getLogger(__name__) + + +def get_dependency_locations(gl, config): + # Data structure + # dict -> dict -> list + dependency_locations = defaultdict(dict) + + # If a list of dependency locations is given, work with that + if config.getboolean('DependencyLocations', 'LoadFromFile', fallback=False): + input_file = config.get('DependencyLocations', 'InputFile', fallback='gitlab-dependencies.txt') + line_count = 0 + try: + with open(input_file, 'r') as fp: + for line in fp: + # Ignore commented/blank lines + if not line.startswith('#') and line.strip() != "": + line_count = line_count + 1 + tokens = line.strip().split(';') + proj_id = int(tokens[0]) + try: + dependency_locations[proj_id][tokens[2]].append(tokens[1]) + except KeyError: + dependency_locations[proj_id][tokens[2]] = [] + dependency_locations[proj_id][tokens[2]].append(tokens[1]) + print("Read " + str(line_count) + " dependency locations from " + input_file) + logger.info("Read " + str(line_count) + " dependency locations from " + input_file) + except FileNotFoundError: + print("The provided dependency file (" + input_file + ") cannot be found. Exiting...") + logger.error("The provided dependency file (" + input_file + ") cannot be found. 
Exiting...") + exit(1) + # If a list of projects is given, work with that + elif config.getboolean('Projects', 'LoadFromFile', fallback=False): + input_file = config.get('Projects', 'InputFile', fallback='gitlab-projects.txt') + line_count = 0 + try: + with open(input_file, 'r') as fp: + for line in fp: + # Ignore commented lines + if not line.startswith('#'): + line_count = line_count + 1 + proj_id = int(line.strip().split(';')[0]) + dependency_locations[proj_id] = {} + print("Read " + str(line_count) + " projects from " + input_file) + logger.info("Read " + str(line_count) + " projects from " + input_file) + except FileNotFoundError: + print("The provided projects file (" + input_file + ") cannot be found. Exiting...") + logger.error("The provided projects file (" + input_file + ") cannot be found. Exiting...") + exit(1) + # If a group ID is given, get our own list of projects from it + elif config.has_option('Groups', 'BaseGroupID'): + # Set base group ID to work + try: + base_group = gl.groups.get(config.getint('Groups', 'BaseGroupID'), lazy=True) + except ValueError: + print("Invalid BaseGroupID provided. Exiting...") + logger.warning("Invalid BaseGroupID specified. Exiting...") + exit(1) + # Get all projects + try: + projects = base_group.projects.list(include_subgroups=True, all=True, lazy=True) + except gitlab.exceptions.GitlabListError: + print("Invalid BaseGroupID provided. Exiting...") + logger.warning("Invalid BaseGroupID specified. 
Exiting...") + exit(1) + if config.getboolean('Projects', 'Save', fallback=False): + # Write all projects to a file + output_file = config.get('Projects', 'OutputFile', fallback='gitlab-projects.txt') + with open(output_file, 'w') as fp: + fp.write("#ID;PATH\n") + fp.write("\n".join(str(proj.id) + ';' + str(proj.path_with_namespace) for proj in projects)) + logger.info("Wrote " + str(len(projects)) + " projects to " + output_file) + for proj in projects: + dependency_locations[proj.id] = {} + # Work with a single project ID + elif config.has_option('Projects', 'SingleProjectID'): + dependency_locations[config.getint('Projects', 'SingleProjectID')] = {} + else: + # No valid option provided, exit + print("Insufficient parameters provided. Exiting...") + logger.warning("Insufficient parameters provided. Exiting...") + exit(0) + + return dependency_locations + + +def analyze_dependencies(gl, config, dependency_locations): + # If dependency check with Eclipse Dash is enabled, proceed + if config.getboolean('General', 'AnalyzeDependencies', fallback=True): + # Initialize output with dependency locations + if (config.has_section('DependencyLocations') and + config.getboolean('DependencyLocations', 'Save', fallback=False)): + # Write list of dependency files to a file + output_file = config.get('DependencyLocations', 'OutputFile', fallback='gitlab-dependencies.txt') + with open(output_file, 'w') as fp: + fp.write("#PROJ_ID;PATH;P_LANGUAGE\n") + output_report = [] + proj_count = 0 + print("Handling dependency location(s) for " + str(len(dependency_locations)) + " Gitlab project(s)") + logger.info("Handling dependency location(s) for " + str(len(dependency_locations)) + " Gitlab project(s)") + # For all projects to be processed + for proj in dependency_locations.keys(): + proj_count = proj_count + 1 + print("Handling Gitlab project " + str(proj_count) + "/" + str(len(dependency_locations))) + logger.info("Handling Gitlab project " + str(proj_count) + "/" + 
str(len(dependency_locations))) + + # Get project details + max_attempts = int(config.get('General', 'APIConnectionAttempts', fallback=3)) + 1 + for i in range(max_attempts): + try: + p_details = gl.projects.get(proj) + break + except BaseException: + # Max attempts reached + if i == max_attempts - 1: + print("Connection error fetching project with ID " + str(proj) + " after " + str( + max_attempts - 1) + " attempts. Exiting...") + logger.error("Connection error fetching project with ID " + str(proj) + " after " + str( + max_attempts - 1) + " attempts. Exiting...") + exit(1) + logger.warning("Connection error fetching project with ID " + str(proj) + ". Retrying in 30 seconds...") + sleep(30) + + logger.info("Project full path: " + str(p_details.path_with_namespace)) + + # User did not provide dependencies for the project + if len(dependency_locations[proj]) == 0: + logger.info("No dependencies given for project. Attempting to find them.") + # Get programming languages of the project + p_langs = p_details.languages() + logger.info("Project programming languages: " + str(p_langs)) + # Get a list of files in the project repository + files = [] + try: + files = p_details.repository_tree( + ref=config.get('General', 'Branch', fallback=p_details.default_branch), + recursive=True, all=True) + except gitlab.exceptions.GitlabGetError: + logger.warning("Project repository not found for: " + p_details.path_with_namespace) + # Attempt to find dependency files for supported programming languages + if 'Go' in p_langs and config.getboolean('Go', 'Enabled', fallback=True): + dependency_paths = utils.find_dependencies_gitlab(config, logger, 'Go', files, default_filename='go.sum') + utils.add_gldep_locations(dependency_locations, p_details, 'Go', dependency_paths) + if 'Java' in p_langs and config.getboolean('Java', 'Enabled', fallback=True): + dependency_paths = utils.find_dependencies_gitlab(config, logger, 'Java', files, default_filename='pom.xml') + 
utils.add_gldep_locations(dependency_locations, p_details, 'Java', dependency_paths) + if 'JavaScript' in p_langs and config.getboolean('JavaScript', 'Enabled', fallback=True): + dependency_paths = utils.find_dependencies_gitlab(config, logger, 'JavaScript', files, + default_filename='package-lock.json') + utils.add_gldep_locations(dependency_locations, p_details, 'JavaScript', dependency_paths) + if 'Kotlin' in p_langs and config.getboolean('Kotlin', 'Enabled', fallback=True): + dependency_paths = utils.find_dependencies_gitlab(config, logger, 'Kotlin', files, + default_filename='build.gradle.kts') + utils.add_gldep_locations(dependency_locations, p_details, 'Kotlin', dependency_paths) + if 'Python' in p_langs and config.getboolean('Python', 'Enabled', fallback=True): + dependency_paths = utils.find_dependencies_gitlab(config, logger, 'Python', files, + default_filename='requirements.txt') + utils.add_gldep_locations(dependency_locations, p_details, 'Python', dependency_paths) + + # Dash Analysis + for lang in dependency_locations[proj].keys(): + if config.getboolean(lang, 'Enabled', fallback=True): + print("Processing " + str(len(dependency_locations[proj][lang])) + + " dependency location(s) for " + lang + " in project " + p_details.path_with_namespace) + logger.info("Processing " + str(len(dependency_locations[proj][lang])) + + " dependency location(s) for " + lang + " in project " + p_details.path_with_namespace) + output_report.extend(dash_processing(config, p_details, dependency_locations[proj][lang], lang)) + + return output_report + + +def dash_processing(config, project, filepaths, lang): + effective_count = 0 + total_count = 0 + output_report = [] + dash_config = { + 'batch_size': config.get('EclipseDash', 'BatchSize', fallback='500'), + 'confidence_threshold': config.get('EclipseDash', 'ConfidenceThreshold', fallback='60'), + } + dash_runner = run.Dash(dash_config, logger) + + for fpath in filepaths: + total_count = total_count + 1 + print("Processing " 
+ lang + " dependency location " + str(total_count) + "/" + str(len(filepaths))) + logger.info("Processing " + lang + " dependency location " + str(total_count) + "/" + str(len(filepaths))) + + # Make relative path for processing + fpath = fpath.replace(project.path_with_namespace + "/", "") + location = project.path_with_namespace + "/-/blob/" + config.get('General', 'Branch', + fallback=project.default_branch) + "/" + fpath + # Java (Maven Only) + if lang == 'Java' and 'gradle' not in fpath: + # Git clone repo for Maven + with tempfile.TemporaryDirectory() as tmpdir: + p_git = Popen([shutil.which('git'), 'clone', '-b', + config.get('General', 'Branch', fallback=project.default_branch), '--single-branch', + '--depth', '1', project.http_url_to_repo, tmpdir], stdout=PIPE, stderr=PIPE) + stdout, stderr = p_git.communicate() + # If errors from Git clone + if p_git.returncode != 0: + logger.warning( + "Error Git cloning repository for dependency file (" + project.path_with_namespace + "/" + fpath + + "). Please check.") + logger.warning(stdout) + logger.warning(stderr) + output_report.append( + utils.add_error_report(config, location, "Error Git cloning repository for the dependency file")) + continue + # Create dependency list with Maven + relative_path = tmpdir + os.sep + fpath.replace(project.path_with_namespace, "") + + dash_output = dash_runner.dash_java([relative_path]) + for line in dash_output: + if 'error' in line: + columns = line.split(';') + logger.warning( + "Error running Maven for dependency file (" + project.path_with_namespace + "/" + fpath + + "). 
Please see debug information below.") + logger.warning(columns[1]) + output_report.append( + utils.add_error_report(config, location, "Error running Maven for the dependency file")) + continue + else: + line = re.sub(r'(.*?);', project.path_with_namespace + "/-/blob/" + + config.get('General', 'Branch', fallback=project.default_branch) + + "/" + fpath + ";", line, 1) + output_report.append(line) + # Java or Kotlin using Gradle + elif 'gradle' in fpath: + with tempfile.TemporaryDirectory() as tmpdir: + # Get raw version of build.gradle.kts + if tmpfile := get_file_gitlab(config, project, fpath, tmpdir): + dash_output = dash_runner.dash_java([str(tmpfile)]) + for line in dash_output: + line = re.sub(r'(.*?);', project.path_with_namespace + "/-/blob/" + + config.get('General', 'Branch', fallback=project.default_branch) + + "/" + fpath + ";", line, 1) + output_report.append(line) + else: + output_report.append( + utils.add_error_report(config, location, "Error obtaining dependency file from Gitlab")) + continue + # Python + elif lang == 'Python': + with tempfile.TemporaryDirectory() as tmpdir: + # Get raw version of requirements.txt + if tmpfile := get_file_gitlab(config, project, fpath, tmpdir): + dash_output = dash_runner.dash_python([str(tmpfile)]) + for line in dash_output: + line = re.sub(r'(.*?);', project.path_with_namespace + "/-/blob/" + + config.get('General', 'Branch', fallback=project.default_branch) + + "/" + fpath + ";", line, 1) + output_report.append(line) + else: + output_report.append( + utils.add_error_report(config, location, "Error obtaining dependency file from Gitlab")) + continue + # Go, Javascript (or others directly supported) + else: + with tempfile.TemporaryDirectory() as tmpdir: + # Get raw version of file + if tmpfile := get_file_gitlab(config, project, fpath, tmpdir): + dash_output = dash_runner.dash_generic([str(tmpfile)]) + for line in dash_output: + line = re.sub(r'(.*?);', project.path_with_namespace + "/-/blob/" + + 
config.get('General', 'Branch', fallback=project.default_branch) + + "/" + fpath + ";", line, 1) + output_report.append(line) + else: + output_report.append( + utils.add_error_report(config, location, "Error obtaining dependency file from Gitlab")) + continue + effective_count += 1 + + return output_report + + +def get_file_gitlab(config, project, fpath, tmpdir): + try: + wpath = Path(os.path.join(tmpdir, os.path.basename(fpath))) + with open(wpath, 'w+b') as f: + project.files.raw(file_path=fpath, + ref=config.get('General', 'Branch', fallback=project.default_branch), + streamed=True, action=f.write) + return wpath + except gitlab.exceptions.GitlabGetError as e: + logger.warning("Error obtaining file (" + fpath + ") from Gitlab (" + str(e.response_code) + ")") + return None + + +def write_output(config, dependency_locations, output_report): + # Initialize output with dependency locations + if (config.has_section('DependencyLocations') and + config.getboolean('DependencyLocations', 'Save', fallback=False)): + # Write list of dependency locations to a file + output_file = config.get('DependencyLocations', 'OutputFile', fallback='gitlab-dependencies.txt') + line_count = 0 + with open(output_file, 'a') as fp: + for proj in dependency_locations.keys(): + for lang in dependency_locations[proj].keys(): + fp.write("\n".join(str(proj) + ';' + depl + ';' + lang + for depl in dependency_locations[proj][lang])) + line_count = line_count + len(dependency_locations[proj][lang]) + fp.write("\n") + logger.info("Wrote " + str(line_count) + " dependency locations to " + output_file) + if config.getboolean('EclipseDash', 'OutputReport', fallback=True): + base_url = config.get('General', 'GitlabURL', fallback='https://gitlab.eclipse.org') + "/" + try: + with open(config.get('General', 'VerifiedDependencies', fallback='verified-dependencies.txt'), + 'r') as fp: + for line in fp: + # Ignore commented/blank lines + if line.startswith('#') or line.strip() == '': + continue + tokens = line.split(';') + # Check all 
items in the current output report + for item in output_report: + # If the verified dependency is present (and not approved), add the verified column value + if tokens[0].lower() in item.lower() and 'approved' not in item: + index = output_report.index(item) + # Get verification status from comments + verification = tokens[1].split(' ')[0].lower() + # Add verification status + comments in different columns to improve filtering + output_report[index] = output_report[index] + ";" + verification + ";" + tokens[1] + except FileNotFoundError: + logger.warning("Verified dependencies file (" + + config.get('General', 'VerifiedDependencies', + fallback='verified-dependencies.txt') + ") was not found") + # Generate output report + report_filename = datetime.now().strftime("%Y%m%d_%H%M%S") + "-ip-report.html" + report.render(base_url, output_report, report_filename=report_filename) + + print("IP Analysis Report written to " + os.path.join(os.getcwd(), report_filename)) + logger.info("IP Analysis Report written to " + os.path.join(os.getcwd(), report_filename)) + if config.getboolean('EclipseDash', 'OutputSummary', fallback=False): + # Generate output summary + summary_filename = datetime.now().strftime("%Y%m%d_%H%M%S") + "-ip-summary.csv" + summary_contents = "" + for e in output_report: + columns = e.split(';') + summary_contents = summary_contents + columns[1] + "," + columns[2] + "," + columns[3] + "," + columns[4] + with open(summary_filename, 'w') as fp: + fp.write(summary_contents) + + print("IP Analysis Summary written to " + os.path.join(os.getcwd(), summary_filename)) + logger.info("IP Analysis Summary written to " + os.path.join(os.getcwd(), summary_filename)) + + +def execute(config): + # Set logging + log_level = logging.getLevelName(config.get('General', 'LogLevel', fallback='INFO')) + log_file = config.get('General', 'LogFile', fallback='ip_analysis.log') + logging.basicConfig(filename=log_file, encoding='utf-8', + format='%(asctime)s [%(levelname)s] 
%(message)s', level=log_level) + + print("Executing IP Analysis of Gitlab Projects") + logger.info("Starting IP Analysis of Gitlab Projects") + + # Gitlab instance + gl = gitlab.Gitlab(url=config.get('General', 'GitlabURL', fallback='https://gitlab.eclipse.org'), + private_token=config.get('General', 'GitlabAuthToken', fallback=None)) + + # Get dependency locations + dependency_locations = get_dependency_locations(gl, config) + + # Analyze dependencies + output_report = analyze_dependencies(gl, config, dependency_locations) + + # Write output + write_output(config, dependency_locations, output_report) + + print("IP Analysis of Gitlab Projects is now complete. Goodbye!") + logger.info("IP Analysis of Gitlab Projects is now complete. Goodbye!") diff --git a/templates/report_template.jinja b/src/eclipse/ipa/templates/report_template.jinja similarity index 98% rename from templates/report_template.jinja rename to src/eclipse/ipa/templates/report_template.jinja index e48c23fa18964c081a5fe9c4c407afd851732386..379f5ea06444d29db8e2e285590781e4ae4cca1f 100644 --- a/templates/report_template.jinja +++ b/src/eclipse/ipa/templates/report_template.jinja @@ -26,7 +26,7 @@ Contributors: <!-- Required meta tags --> <meta charset="utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no"> - <title>IP Check Report</title> + <title>IP Analysis Report</title> <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.3/dist/css/bootstrap.min.css" rel="stylesheet"> <link href="https://cdn.jsdelivr.net/npm/bootstrap-icons@1.11.3/font/bootstrap-icons.css" rel="stylesheet"> @@ -34,7 +34,7 @@ Contributors: </head> <body> <div class="container" style="width: 80vw;"> - <h1>IP Check Report</h1> + <h1>IP Analysis Report</h1> <div id="toolbar"> <select class="form-control"> diff --git a/tests/__init__.py b/tests/__init__.py new file mode 100644 index 0000000000000000000000000000000000000000..95dd682ff48021d44cd52d13c47a1fcdb3b597f2 --- /dev/null +++ 
b/tests/__init__.py @@ -0,0 +1,12 @@ +# Copyright (c) 2024 The Eclipse Foundation +# +# This program and the accompanying materials are made available under the +# terms of the Eclipse Public License 2.0 which is available at +# http://www.eclipse.org/legal/epl-2.0. +# +# SPDX-License-Identifier: EPL-2.0 +# +# Contributors: +# asgomes - Initial implementation + +__version__ = "0.1.0" diff --git a/requirements.txt b/tests/test_params.py similarity index 76% rename from requirements.txt rename to tests/test_params.py index ee0596ec2ca6b76babc7d0a13b20861668db64b0..cc16fdf520b808acbac1588f83cc31dd3e552f8e 100644 --- a/requirements.txt +++ b/tests/test_params.py @@ -8,7 +8,10 @@ # # Contributors: # asgomes - Initial definition -python-gitlab==4.8.0 -get-pypi-latest-version==0.0.12 -chardet==5.2.0 -jinja2==3.1.4 \ No newline at end of file + +import pytest + +from src.eclipse.ipa import main + +def test_something(): + assert main is not None