Commit 858c3981 authored by Boris Baldassari

Fix downloads for mls.

parent 9e760977
@@ -18,7 +18,8 @@ pipeline {
steps {
sh '''
ls /data/eclipse_mls/ > website/content/eclipse_mls/list_mboxes.txt
cp /data/eclipse_mls_full.csv website/content/eclipse_mls/
cp /data/eclipse_mls_full.csv.gz website/content/eclipse_mls/eclipse_mls.csv.gz
cp -r /data/eclipse_mls_scrambled/ website/content/eclipse_mls/mboxes/
'''
}
}
@@ -56,6 +57,7 @@ pipeline {
sh '''
echo "Creating download area."
rsync -am --include='*.bz2' --include='*/' --exclude='*' website/public/ download/
rsync -am --include='*.xz' --include='*/' --exclude='*' website/public/ download/
rsync -am --include='*.gz' --include='*/' --exclude='*' website/public/ download/
echo "Cleaning website zone from compressed files."
find website/public/ -name "*.gz" -or -name "*.xz" -or -name "*.bz2" | xargs rm -rf
......
@@ -9,7 +9,7 @@ weight: 40
slug: eclipse_mls
---
-The Eclipse Mailing lists dump is an extract of all emails posted on the [Eclipse mailing lists](https://accounts.eclipse.org/mailing-list), as a single CSV file or as per-project mboxes.
+The Eclipse Mailing lists dump is an extract of all emails posted on the [Eclipse mailing lists](https://accounts.eclipse.org/mailing-list) **from 2001-11-05 to 2021-09-04**, as a single CSV file or as per-project mboxes.
* Download the **Eclipse mailing lists dataset** [ [CSV]({{< ref "eclipse_mls" >}}eclipse_mls.csv.gz) ].
* Check the **documentation** for the dataset [here (HTML)](mbox_csv_analysis). For reproducibility we also provide the [R Markdown document](https://gitlab.eclipse.org/eclipse/dataeggs/dataeggs/-/raw/master/website/content/eclipse_mls/mbox_csv_analysis.Rmarkdown) for the dataset analysis and documentation.
@@ -20,7 +20,7 @@ These datasets are published under the [Creative Commons BY-Attribution-Share Al
## The CSV extract
-This dataset is a dump of all posts sent on all mailing lists hosted at the Eclipse Forge. It only includes the list name, post ID, sent date, author name and address, and post subject. the body of messages is dismissed.
+This dataset is a dump of all posts sent on all mailing lists hosted at the Eclipse Forge. It only includes the list name, post ID, sent date, author name and address, and post subject. The body of messages is dismissed.
Although this is public data (the mailing lists can be browsed on the [official mailman page](https://accounts.eclipse.org/mailing-list)) all data has been anonymised to prevent any misuse.
The privacy issues identified, along with the anonymisation process, have been covered in a [dedicated document](../privacy/).
......
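A side note on the cleanup step in the second hunk: piping `find` output into `xargs rm -rf` splits names on whitespace, so a file name containing a space would be mishandled. Below is a minimal sketch of a more defensive variant, assuming a POSIX `find`. The function name `clean_compressed` is hypothetical and not part of this commit.

```shell
#!/bin/sh
# Hypothetical helper (not in the commit): remove compressed files under a
# directory without passing file names through xargs.
clean_compressed() {
    dir="$1"
    # The parentheses group the three -name tests so that -type f and -exec
    # apply to all of them; "-exec ... {} +" hands names to rm directly,
    # so spaces in file names are preserved.
    find "$dir" -type f \
        \( -name '*.gz' -o -name '*.xz' -o -name '*.bz2' \) \
        -exec rm -f {} +
}
```

In the pipeline this could be called as `clean_compressed website/public/` after the rsync steps have copied the archives to `download/`.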