Commit e1c74d8a authored by Boris Baldassari

#5 Start web site refactoring.

parent 50168b72
@@ -3,17 +3,74 @@
![Scava logo](scava-header.jpg)
This web site hosts the various downloads available from the [Eclipse Scava project](https://eclipse.org/scava). It notably includes:
* Open [software-engineering datasets](datasets/index.html) related to the Eclipse forge, including AERI stacktraces, Eclipse mailing lists, and Eclipse projects data.
* Downloads for the Scava application (availability planned for Q3 2019).
This web site hosts the open datasets generated in the course of the [Crossminer research project](https://crossminer.org). The Crossminer project ended in 2019; since then, the datasets have been maintained by [Castalia Solutions](https://castalia.solutions) as a service to the Eclipse and research communities.
## More information
The datasets include data retrieved from the Eclipse forge (mailing lists, project development data, and AERI stacktraces) in handy CSV and JSON formats. Each dataset comes with an R Markdown document that describes its content and gives hints on how to use it. The examples mainly use the [R statistical analysis software](https://r-project.org).
All datasets are published under the [Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)](https://creativecommons.org/licenses/by-sa/4.0/) licence.
All data is anonymised; please see the [dedicated document](datasets_privacy.html) to learn more about privacy and the anonymisation mechanism.
We're open: if you'd like to contribute, please see the [GitLab project](https://gitlab.eclipse.org/bbaldassari2kd/scava-datasets) page.
## Eclipse projects
We generate comprehensive data extracts of a [set of Eclipse projects](projects/eclipse_projects.html), including data sources like:
* Software Configuration Management ([git](https://git.eclipse.org)),
* Issues tracking ([Bugzilla](https://bugs.eclipse.org) or GitHub),
* Project metadata checks ([PMI](https://projects.eclipse.org)),
* Licensing and copyrights ([Scancode](https://github.com/nexB/scancode-toolkit)), and
* Static Code Analysis ([SonarCloud](https://sonarcloud.io)) when available.
These datasets are updated weekly, at 2am on Sunday. If you would like to add a project, please [let us know](https://gitlab.eclipse.org/bbaldassari2kd/scava-datasets/-/issues).
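If you mirror these datasets, a refresh once a week, just after the Sunday update, is enough. The sketch below computes the next update slot. It is a minimal illustration, assuming naive datetimes, since this page does not state which timezone the 2 am slot refers to; `next_weekly_update` is a hypothetical helper, not part of any project tooling.

```python
from datetime import datetime, timedelta

# Assumption: the timezone of the update job is not stated, so naive
# datetimes are interpreted in whatever timezone that job uses.
UPDATE_WEEKDAY = 6  # datetime.weekday(): Monday is 0, Sunday is 6
UPDATE_HOUR = 2     # 2 am


def next_weekly_update(now: datetime) -> datetime:
    """Return the first Sunday 02:00 strictly after `now`."""
    days_ahead = (UPDATE_WEEKDAY - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=UPDATE_HOUR, minute=0, second=0, microsecond=0
    )
    if candidate <= now:
        # This week's slot has already passed: use next Sunday instead.
        candidate += timedelta(days=7)
    return candidate
```

In cron syntax the same weekly slot is written `0 2 * * 0`. Whether the update job actually uses cron is not stated here.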
**Downloads**
* **List of projects** -- See the [list of projects with their associated datasets and documentation](projects/eclipse_projects.html).
## Eclipse mailing lists
The [Eclipse Mailing lists](eclipse_mls/eclipse_mls.html) dump is an extract of all emails posted on the Eclipse mailing lists.
* Download the **Eclipse mailing lists dataset** [ [CSV](eclipse_mls/eclipse_mls.gz) ].
* Check the **documentation** for the dataset [here (HTML)](eclipse_mls/mbox_csv_analysis.html). For reproducibility we also provide the [R Markdown document](eclipse_mls/mbox_csv_analysis.rmd) for the dataset analysis and documentation.
* Download the **mbox files** [ [see the list](eclipse_mls/eclipse_mls.html#project-mboxes) ]
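The CSV dump can be processed as a stream, without decompressing it to disk first. The following is a minimal Python sketch using only the standard library. It assumes that the `.gz` file is gzip-compressed, UTF-8 encoded, and comma-separated, and that it starts with a header row. The actual column names and delimiter should be taken from the dataset documentation, and `count_by_column` is a hypothetical helper.

```python
import csv
import gzip
from collections import Counter


def count_by_column(path: str, column: str, delimiter: str = ",") -> Counter:
    """Stream a gzip-compressed CSV file and count rows per value of `column`.

    Assumptions (not confirmed by this page): gzip compression, UTF-8
    encoding, and a header row; adjust `delimiter` and `column` to match
    the dataset documentation.
    """
    counts: Counter = Counter()
    # "rt" opens the gzip stream in text mode; newline="" lets the csv
    # module handle quoted fields that span several lines.
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        for row in csv.DictReader(handle, delimiter=delimiter):
            counts[row[column]] += 1
    return counts
```

For example, `count_by_column("eclipse_mls.gz", "list").most_common(10)` would return the ten most frequent values of a column named `list`. That column name is a hypothetical placeholder for whichever column identifies the mailing list in the actual file.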
More information can be found on the official [Eclipse page for mailing lists](https://accounts.eclipse.org/mailing-list).
## AERI Stacktraces
The [AERI stacktraces dataset](aeri_stacktraces/aeri_stacktraces.html) is a list of exceptions encountered by users in the Eclipse IDE, as retrieved by the AERI system. The Automated Error Reporting (AERI) system was developed by [Code Trails](https://www.codetrails.com/) and collects information about exceptions. It is installed by default in the Eclipse IDE and has helped hundreds of projects better support their users and resolve bugs. This dataset is a dump of all records over a couple of years, with useful information about the exceptions and their environment.
The dataset was last updated on 2018-02-11.
**Downloads**
* **Problems full** [ [Download JSON](aeri_stacktraces/problems_full.tar.bz2) ] -- A list of all problems, exported as JSON (one problem per file).
* **Problems extract** [ [Download CSV](aeri_stacktraces/problems_extract.csv.bz2) ] -- A list of all problems, exported as CSV (one big file).
* **Incidents full** [ [Download JSON](aeri_stacktraces/incidents_full.tar.bz2) ] -- A list of all incidents, exported as JSON (one incident per file).
* **Incidents extract** [ [Download CSV](aeri_stacktraces/incidents_extract.csv.bz2) ] -- A list of all incidents, exported as CSV (one big file).
* **Incidents Bundles** [ [Download CSV](aeri_stacktraces/incidents_bundles_extract.csv.bz2) ] -- A list of all bundles found in incidents, exported as CSV. Attributes are bundle_name, bundle_version, and number of occurrences.
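The extracts are bzip2-compressed, so both the CSV files and the JSON archives can be read directly without unpacking them first. The following is a minimal Python sketch using only the standard library. For the bundles file, the column order (bundle name, bundle version, number of occurrences) comes from the description above. The header row, the comma delimiter, and the UTF-8 encoding are assumptions to check against the analysis documents. `bundle_totals` and `iter_json_records` are hypothetical helpers.

```python
import bz2
import csv
import json
import tarfile
from collections import Counter
from typing import Iterator


def bundle_totals(path: str, has_header: bool = True) -> Counter:
    """Sum occurrences per bundle name in a bzip2-compressed bundles CSV.

    Assumes columns in the order bundle_name, bundle_version, occurrences,
    comma-separated, and (by default) a header row.
    """
    totals: Counter = Counter()
    with bz2.open(path, "rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # skip the header row
        for row in reader:
            if len(row) < 3:
                continue  # skip blank or malformed lines
            totals[row[0]] += int(row[2])
    return totals


def iter_json_records(path: str) -> Iterator[dict]:
    """Yield one parsed JSON document per regular file in a .tar.bz2 archive,
    such as an export with one problem or incident per file."""
    with tarfile.open(path, mode="r:bz2") as archive:
        for member in archive:
            if not member.isfile():
                continue  # ignore directories and links
            extracted = archive.extractfile(member)
            if extracted is not None:
                with extracted:
                    yield json.load(extracted)
```

For instance, `bundle_totals("incidents_bundles_extract.csv.bz2").most_common(10)` would list the ten bundles with the most occurrences, assuming the file has been downloaded to the working directory. Iterating over `iter_json_records("problems_full.tar.bz2")` keeps only one record in memory at a time, which helps with the larger exports.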
**Documentation**
* **Stacktraces Problems analysis document** [ [Download PDF](aeri_stacktraces/problems_analysis.pdf) | [Download Rmd](aeri_stacktraces/problems_analysis.rmd) ] -- An R Markdown document to analyse the Stacktraces problems dataset, with a description of the actual content and examples of usage.
* **Stacktraces Incidents analysis document** [ [Download PDF](aeri_stacktraces/incidents_analysis.pdf) | [Download Rmd](aeri_stacktraces/incidents_analysis.rmd) ] -- An R Markdown document to analyse the Stacktraces incidents dataset, with a description of the actual content and examples of usage.
More information about the AERI system can be found on the [Code Trails website](https://www.codetrails.com/error-analytics/manual/).
## About Scava
Scava is the Eclipse spin-off of Crossminer, an EU-funded research project. More information can be found at the following places:
* The [Eclipse Scava project](https://eclipse.org/scava)
* The official [documentation for Scava](https://scava-docs.readthedocs.io)
* The [documentation repository](https://github.com/crossminer/scava-docs)
* The official [Crossminer web page](https://crossminer.org)
* The [GitHub Crossminer organisation](https://github.com/crossminer)
@@ -22,10 +79,3 @@ Scava is the Eclipse spin-off of Crossminer, a EU-funded research project. More
All datasets are published under the [Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)](https://creativecommons.org/licenses/by-sa/4.0/) licence.
All code is, unless otherwise stated, published under the [Eclipse Public Licence, v2](https://www.eclipse.org/legal/epl-2.0/).
## Associated repositories
More information can be found in the following places:
* The [git repository dedicated to the datasets extraction](https://github.com/eclipse-scava/scava-datasets): `git@github.com:eclipse-scava/scava-datasets.git`
* The [documentation repository](https://github.com/crossminer/scava-docs) for the Scava project: `https://github.com/crossminer/scava-docs`