**UPDATE:** It is safe to move p2 and Maven repositories, as download.e.o will detect the presence of archived content and serve a 301 redirect to it.
We've received an email from OSU OSL that our download.e.o footprint exceeds 2TB.
As we rely on volunteer mirrors around the world to help distribute Eclipse bits, download.e.o must be kept as small as possible:
* download.e.o must only contain recent nightly builds. Nightly builds older than 2 weeks must be purged.
** Nightly builds must be in an excluded directory structure (see below).
* download.e.o must only contain the last 2 releases. Older releases must be moved to archives.
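A rough sketch of how the two rules above could be checked, in Python. The inputs are assumptions: each nightly build directory comes with a build time, and release directory names are passed in oldest-to-newest order. `plan_cleanup` is a hypothetical helper, not part of any existing Eclipse tooling, and it only reports what to purge or archive; it doesn't touch the file system.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

NIGHTLY_MAX_AGE = timedelta(weeks=2)  # "Nightly builds older than 2 weeks must be purged"
RELEASES_TO_KEEP = 2                  # "only the last 2 releases"


def plan_cleanup(
    nightlies: Dict[str, datetime],
    releases: List[str],
    now: datetime,
) -> Tuple[List[str], List[str]]:
    """Return (nightlies_to_purge, releases_to_archive) without touching disk.

    nightlies: build directory name -> build time (hypothetical input).
    releases: release directory names, oldest first.
    """
    to_purge = sorted(
        name for name, built in nightlies.items() if now - built > NIGHTLY_MAX_AGE
    )
    # Everything except the newest RELEASES_TO_KEEP releases goes to archive.e.o.
    to_archive = releases[: max(len(releases) - RELEASES_TO_KEEP, 0)]
    return to_purge, to_archive
```

With three releases listed, only the oldest is reported for archiving; a nightly built 15 days ago is reported for purging, while one built yesterday is kept.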
There are two tools you can use to ensure your footprint is reasonable:
This is an index of download.e.o, with the actual on-mirror footprint and file content. Please locate your projects and perform the necessary maintenance:
Hi Denis. I tried to prune the Linux Tools project, but the Move to Archives button is disabled for me. I have logged in, but the page doesn't show me as logged in. Logging in again doesn't help either.
@jjohnston There is an unfortunate bug with the pages on download.e.o (and archive.e.o) in that they do not show the logged in user. The options should still be there. Can you attach a screenshot?
From what I can see in the code, the "Move to archive.eclipse.org" button is disabled by default, and when a checkbox is clicked, some JavaScript should remove the disabled attribute from the button.
@jjohnston can you tell me if you have disabled JS in your browser? Thanks.
I'm also thinking we might simply remove the disabled attribute from the button and let it be usable even if no checkboxes have been selected.
I spoke too soon. None of the directories I tried to archive were moved. I tried again today, waited, and refreshed. Every directory I tried to archive came back with "Folder cannot be archived, failed with 256, contact webmaster".
Hi Jeff, a few have failed with: unable to remove target: Directory not empty
Can you check the archive location for those that failed? A matching directory may already exist there. It can be deleted from the archive, and then the folder can be archived again (and permanently deleted if need be).
Unfortunately, this doesn't work properly. From the archives page, I can't delete the directories. I can click into them, and from there I can delete their contents. I went into update-7.5.0 and deleted all contents, which worked. I then archived update-7.5.0 from the downloads page, and although it didn't fail, it placed the update-7.5.0 directory inside the archived update-7.5.0 directory I couldn't remove. So old references won't resolve, as they don't have update-7.5.0/update-7.5.0 in the URL.
@npeifer1fn We seem to be missing a 'genie.subversive' user account. @mward @jmazanek4ep can genie.subversive be created and made owner of download.e.o/technology/subversive and archive.e.o/technology/subversive?
Same issue as Jeff here: "Move to archive" is grayed out and cannot be used. The download.eclipse.org page doesn't show me as logged in. This happens in both Firefox and Chrome.
That is an interesting use case. We could implement a transparent 30x redirect if the file is nonexistent and it does exist in the same structure on archives
@droy The redirects from download to archive are not working properly: the 301 returns an http:// URL even though the request was made over https://.
So I get errors like this in the browser console, and nothing is downloaded when I click a link on www.eclipse.org.
Mixed Content: The site at 'https://www.eclipse.org/' was loaded over a secure connection, but the file at 'https://archive.eclipse.org/technology/epp/downloads/release/2021-03/R/eclipse-java-2021-03-R-linux-gtk-x86_64.tar.gz' was redirected through an insecure connection. This file should be served over HTTPS. This download has been blocked. See https://blog.chromium.org/2020/02/protecting-users-from-insecure.html for more details.
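One way to picture the transparent fallback discussed above, as a minimal Python sketch: if a path is missing from download.e.o but exists at the same path on archive.e.o, answer with a 301 whose Location reuses the scheme of the incoming request, so an https:// page never gets bounced through http://. `resolve_download` and the two existence callbacks are hypothetical; this says nothing about how the actual download.e.o server is configured.

```python
from typing import Callable, Optional, Tuple

ARCHIVE_HOST = "archive.eclipse.org"


def resolve_download(
    path: str,
    request_scheme: str,
    on_download: Callable[[str], bool],
    on_archive: Callable[[str], bool],
) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, Location header or None) for a request on download.e.o."""
    if on_download(path):
        return 200, None  # serve the file normally
    if on_archive(path):
        # Reuse the request's scheme instead of hard-coding http://; an http://
        # Location on an https:// request is what the browser blocks as mixed content.
        scheme = "https" if request_scheme.lower() == "https" else "http"
        return 301, f"{scheme}://{ARCHIVE_HOST}{path}"
    return 404, None
```

Always redirecting to https:// would work just as well; mirroring the request scheme is simply the smallest change that matches the report.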
@droy Is there a transparent redirect to the archive if a file is requested via https://www.eclipse.org/downloads/download.php?file=/path/to/binary.zip ?
I have archived EPP releases <= 2021-03 - this removed ~200GB from the mirrors
Note that some of the now archived files are still downloaded 100s of times per day (mostly the JEE and Java packages for Windows). See download stats for /technology/epp/downloads/release/neon/3/eclipse-jee-neon-3-win32-x86_64.zip as an example, downloaded ~150k times in the last 12 months.
Long story short - @droy please let me know if you see a spike in traffic to archive.e.o that causes you a problem and we can unarchive some of the older releases.
Ah, sorry, I mean the project's download page, which lets us move folders to the "archive". Currently, it's only possible to move a folder to the "archive" and then delete it there. Perhaps there could be an option to delete a folder directly.
@phwenig We're adding navigation to the Archives. Deletes will be 3 clicks (Archive, Go to, Delete) but that gives us enough confidence that deletions are intentional while making them easier to perform.
Looking at the size of the mirror itself since Denis' call to action, I see it has shrunk from 1.5T to 1.2T today, so that's a great improvement:
I wonder if you would consider adding to the exclusions list? For example, I could suggest adding webtools/CI (does it need some * in the pattern?) given that its subfolders are taking up 100G currently and these appear to be a huge number of relatively recent integration builds (several months worth) of multiple different WTP versions:
Better of course (for the overall backup footprint of all the servers) would be to limit these to perhaps the 5 most recent such builds for each version, but even in that case, I doubt these need to be mirrored at all...
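To make the "5 most recent builds per version" idea concrete, here's a small Python sketch that groups builds by version and lists everything older than the newest N in each group. It assumes build IDs sort chronologically as strings (for example, a timestamp prefix); the function name and input shape are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def builds_to_prune(
    builds: Iterable[Tuple[str, str]], keep: int = 5
) -> List[Tuple[str, str]]:
    """Return the (version, build_id) pairs outside the newest `keep` builds of their version.

    Assumes build_id strings sort chronologically (e.g. a timestamp prefix).
    """
    by_version: Dict[str, List[str]] = defaultdict(list)
    for version, build_id in builds:
        by_version[version].append(build_id)

    prune: List[Tuple[str, str]] = []
    for version in sorted(by_version):
        ids = sorted(by_version[version])     # oldest first
        excess = max(len(ids) - keep, 0)      # how many fall outside the newest `keep`
        prune.extend((version, b) for b in ids[:excess])
    return prune
```

Grouping per version matters here because several WTP versions build in parallel; a single global "newest 5" would starve the less active streams.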
I'll start a running list of potential exclusions:
*/ci/* (might as well grab them all)
*/CI/*
/releases/neon/
/releases/mars/
/releases/oxygen/
/releases/luna/
etc... ? https://download.eclipse.org/oomph/archive/eclipse/releases/index.html
The removed elements are not available on archive.eclipse.org/jetty. Is there a way to access the removed resources?
I did not find any official corresponding repo yet.
Note that the following two URLs are quite useful for browsing exactly what's in the file system of the two servers, even if there is an index.* in the folder that would otherwise prevent showing the listing of that folder:
Here's a list of potential additions to the exclude list, which will purge matching files and directories from mirrors:
/technology/osee (doesn't use mirrors)
*-dev/ (dev stuff)
/technology/linuxtools (2012 archives)
/tools/orbit (is it even worth mirroring this?)
*/CI/*
*ganymede*
*galileo*
*helios*
*indigo*
*juno*
*kepler*
*luna*
*mars*
*neon*
*snapshots
*test/
*milestone/
*milestones/
*stable/
*/ee4j/* (doesn't use mirrors)
builds/*
Do we know the download counts for the directories affected by this exclude list? Are they reasonably low? What's the cutoff count?
Expanding the exclude list is a good way to reduce disk usage on the mirrors, but it should be accompanied by default quotas for every project's download dir (with exceptions where necessary) so we don't end up in the same situation in a couple of months.
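A minimal sketch of a per-project quota check with exceptions, in Python. The 50 GiB default, the project names, and the helper names are all hypothetical; `dir_size` sums regular files under a directory without following symlinks, and `over_quota` compares each project's usage to its exception if one exists, otherwise to the default.

```python
import os
from typing import Dict, List, Optional

GIB = 1024 ** 3
DEFAULT_QUOTA = 50 * GIB  # hypothetical default; the real number would need agreement


def dir_size(root: str) -> int:
    """Sum the sizes of regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):  # does not follow symlinked dirs
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def over_quota(
    usage: Dict[str, int],
    exceptions: Optional[Dict[str, int]] = None,
    default_quota: int = DEFAULT_QUOTA,
) -> List[str]:
    """Return the projects whose usage exceeds their quota.

    A project listed in `exceptions` gets that limit; everyone else gets the default.
    """
    exceptions = exceptions or {}
    return sorted(
        project
        for project, used in usage.items()
        if used > exceptions.get(project, default_quota)
    )
```

For instance, with the 50 GiB default and an 80 GiB exception for `project-b`, a project using 70 GiB without an exception is flagged while `project-b` at 70 GiB is not.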
There's also a price to pay for mirroring (and syncing) content that no one downloads, or worse, content that folks link to directly on download.e.o, bypassing the mirrors.
The rsync process, even when nothing has changed between sender and receiver, is expensive in disk I/O and runs for several minutes. Plus, the bandwidth used by the sync itself is not small.
Before changes to the exclude list:
total size is 1,588,864,727,405
List xfer alone is 200M
After:
total size is 1,080,857,525,956
List transfer alone is 146M
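The savings can be read straight off those two rsync summaries. A small Python sketch, assuming the `total size is N` line format shown above (with thousands separators); `total_size` is a hypothetical helper name:

```python
import re

TOTAL_RE = re.compile(r"total size is ([\d,]+)")


def total_size(rsync_output: str) -> int:
    """Extract the byte count from rsync's 'total size is N' summary line."""
    match = TOTAL_RE.search(rsync_output)
    if match is None:
        raise ValueError("no 'total size is' line found")
    return int(match.group(1).replace(",", ""))


before = total_size("total size is 1,588,864,727,405")
after = total_size("total size is 1,080,857,525,956")
saved = before - after
print(f"saved {saved:,} bytes ({saved / before:.1%}), about {saved / 1024**3:.0f} GiB")
```

With the two numbers quoted above, the difference is 508,007,201,449 bytes: roughly 508 GB (about 473 GiB), or about 32% of the previous total. The list transfer went from 200M to 146M, about 27% less.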
If I cross-check those patterns to the download stats:
/stable: /ldt/products/stable/1.4.2/org.eclipse.ldt.product* would pay the biggest price: 99MB, 30 downloads per day. We could ask them to rename /stable to /release or something.
-dev/ : insignificant
/technology/linuxtools : less than 1000 in the last year, mostly .ogg files
/tools/orbit: about 200/year of this 600M file: /tools/orbit/downloads/drops/R20190226160451/orbit-buildrepo-R20190226160451.zip + a few jar files here and there.
/CI/ : 0
/ganymede: 225 various zips in the last year, and these could be on archive.e.o already
/galileo: about 200 zips, a few hundred swtbot jar files
/helios: mostly /technology/epp/downloads/release/helios/SR1/eclipse-jee-helios-SR1-win32-x86_64.zip, but that is not even on download.e.o; it's on archive and is only being counted for stats.
/juno: nothing of importance that is not already on archive.e.o
[skip]
/neon: nothing of importance that is not already on archive.e.o
/milestone% : about 8900 downloads in the last year, mostly eclipselink 30M zips
/ee4j: nothing of importance, or stuff that is already gone.
builds/ : About 12000 downloads, of those 6000 are Subversive zip files @ 20M each
From the comments it looks like orbit isn't mirrored. Does that mean I can leave everything on tools/orbit, or do you want/need me to move old releases to archive.eclipse.org?
There are other patterns that look scary to me. Surely these two are bad as well:
*cbi* *CI*
Given that I'm currently looking at projects/bundles with names like org.eclipse.cbi.*, I really would not be happy if such bundles were selectively excluded from p2 repositories. We should be careful that any patterns that apply to files, not just to folders, don't selectively exclude a file that could be in a p2 repository.
All these are suspect, especially ones that start and end with *.
I tested a few patterns with find on download.eclipse.org:
*cbi* pattern is too wide and not very useful. For example, it excludes files like
/home/data/httpd/download.eclipse.org/modeling/mdt/papyrus/updates/releases/2021-09/5.2.0/toolsmiths/plugins/org.eclipse.cbi.p2repo.aggregator_1.0.300.20200825-1205.jar (it might be debatable, if those plugins should be part of the papyrus update site).
Out of ~2900 pattern matches on download.eclipse.org, ~2200 refer to cbi.p2repo.aggregator files that are part of I-Builds and are therefore excluded from mirroring anyway.
*ci* is too wide, as it matches files that include the following words: ascii, reconciler, tracing, principal
*CI* is too wide, as it matches files like /home/data/httpd/download.eclipse.org/modeling/gmp/updates/releases/features/org.eclipse.gmf.runtime.notation.sdk_1.6.0.v20120327-2213-47F08xGD6FxMBN7CJFV3CIKK9t84.jar. *CI/ would be better, it currently matches:
*-dev* is too wide, as it matches files like /home/data/httpd/download.eclipse.org/chess/core/releases/devel/CHESS-devel_2111221039-linux.gtk.x86_64.zip
for release names like *galileo*, etc it's probably better to exclude only directories like *galileo/, otherwise individual files like /home/data/httpd/download.eclipse.org/releases/neon/201612211000/plugins/org.eclipse.rcptt.updates.galileo_2.1.0.201510050740.jar are excluded.
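The over-matching described above is easy to reproduce offline. This Python sketch runs a few of the candidate patterns against sample paths quoted in this thread, using `fnmatch.fnmatchcase`, where `*` also matches `/` (as with `find -path`). It is only an approximation: rsync's own exclude rules differ (there `*` stops at `/`, `**` does not, and a trailing `/` restricts a pattern to directories), so a real exclude list should also be checked with rsync itself. `matches` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from typing import List

# Sample paths quoted in this thread (the /home/data/httpd/download.eclipse.org prefix dropped).
SAMPLE_PATHS = [
    "/modeling/mdt/papyrus/updates/releases/2021-09/5.2.0/toolsmiths/plugins/"
    "org.eclipse.cbi.p2repo.aggregator_1.0.300.20200825-1205.jar",
    "/modeling/gmp/updates/releases/features/"
    "org.eclipse.gmf.runtime.notation.sdk_1.6.0.v20120327-2213-47F08xGD6FxMBN7CJFV3CIKK9t84.jar",
    "/chess/core/releases/devel/CHESS-devel_2111221039-linux.gtk.x86_64.zip",
    "/releases/neon/201612211000/plugins/org.eclipse.rcptt.updates.galileo_2.1.0.201510050740.jar",
]


def matches(pattern: str, paths: List[str]) -> List[str]:
    """Paths a find(1) -path style glob would hit; '*' also matches '/' here."""
    return [p for p in paths if fnmatchcase(p, pattern)]


for pattern in ["*cbi*", "*CI*", "*/CI/*", "*-dev*", "*galileo*", "*/galileo/*"]:
    print(f"{pattern:14} {len(matches(pattern, SAMPLE_PATHS))} match(es)")
```

Each "too wide" pattern hits exactly one unrelated file here, while the variants anchored on path separators (`*/CI/*`, `*/galileo/*`) hit none of them.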
I tried to archive the m2e-milestones directory (and its content) at
https://download.eclipse.org/technology/m2e/milestones/ with the intention to delete it. But moving it to archives failed with the following message:
(Folder cannot be archived: failed with 256. Please contact webmaster.)
When looking into the milestones directory I noticed that all but the 1.5 and 1.6 sub-directories were moved.
In the archives the 1.5 and 1.6 sub-directory as well as the other directories are present, so it looks like it just failed to delete those two sub-directories.
Furthermore I noticed that the content of https://download.eclipse.org/technology/m2e/milestones/ was moved to the following address: https://archive.eclipse.org/technology/m2e/milestones/milestones/
I think the second milestones path element is wrong, isn't it?
Before that, I tried to delete the milestones directories via Jenkins CI, but it failed due to missing permissions (the owner is another m2e committer). For details on that effort, see https://github.com/eclipse-m2e/m2e-core/issues/224
@hwellmannwr6 I've fixed the permissions, so you should be able to clean up via CI. Indeed, moves are problematic when the target already exists. Please let me know if you want me to clean up; just specify which directories should be deleted and which should be archived.
Thanks. Both deleting via CI and deleting by moving to archives worked.
Could it be a problem to archive a subfolder when other subfolders of the same folder have been archived before, so that the parent folder already exists in the target?
For example, if we now move older releases to archives (thereby creating the releases folder in the archives) and later want to archive other releases?
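On the question above: when the destination directory already exists, a plain move puts the source inside it (that's how `mv` and Python's `shutil.move` behave), which is consistent with the update-7.5.0/update-7.5.0 and milestones/milestones results reported earlier. A merge-style move avoids that. This Python sketch is hypothetical (`archive_move` is not the archiver's actual code): it renames when the target is absent, merges directory by directory when it exists, and refuses to overwrite files already in the archive.

```python
import os
import shutil


def archive_move(src: str, dst: str) -> None:
    """Move src to dst; if dst is an existing directory, merge into it.

    Raises FileExistsError on a file/file or file/directory clash instead of
    overwriting anything already in the archive.
    """
    if not os.path.exists(dst):
        # Create missing parents (e.g. a "releases" folder not archived yet).
        os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
        # dst does not exist, so this is a rename to dst, not a move into dst.
        shutil.move(src, dst)
        return
    if not (os.path.isdir(src) and os.path.isdir(dst)):
        raise FileExistsError(f"refusing to overwrite existing {dst}")
    # Both are directories: merge child by child, then drop the emptied source.
    for name in os.listdir(src):
        archive_move(os.path.join(src, name), os.path.join(dst, name))
    os.rmdir(src)
```

A clash part way through leaves the tree partially moved, so a real tool would want to check for conflicts before moving anything.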
One, the size of the mirror as computed by crawling ftp.fau.de; two, the size of the actual download.eclipse.org file system; three, the size of the actual archive.eclipse.org file system.
As Ed mentioned, most of the old releases are excluded now:
| Release name | Status |
|---|---|
| Callisto | not on d.e.o / not archived |
| Europa | not mirrored |
| Ganymede | not mirrored (changed recently) |
| Galileo | not mirrored (changed recently) |
| Helios | not mirrored (changed recently) |
| Indigo | not mirrored (changed recently) |
| Juno | not mirrored (changed recently) |
| Kepler | not mirrored (changed recently) |
| Luna | not mirrored (changed recently) |
| Neon | not mirrored (changed recently) |
| Mars | not mirrored (changed recently) |
| Oxygen | mirrored |
| Photon | mirrored |
| 2018-09 | mirrored |
| ... | ... |
| 2021-12 | mirrored |
I recently (in August) removed a bunch of milestones and RCs from old releases. The releases dir had around 270GB; it's down to 100GB now. I'd be happy to move old releases to archive.eclipse.org, depending on the download stats. As Denis mentioned above, it should be fine to move the releases up to Neon to a.e.o.
BTW: each new release adds about 2-3GB (depending on additional respins, etc).
@droy can you comment on the download stats of oxygen and photon?
Bit late to the party - but I have updated the rest of the projects that I am PL for (CDT, remaining parts of EPP, and LSP4J) to have the minimal amount on download.eclipse.org.
Also, the release repo for EPP has grown a lot over the last couple of years or so with the addition of new architectures (aarch on Linux + mac) and bundled JREs. So I have also updated EPP release procedures to archive each milestone as new ones are created. In the past we would accumulate them on download.e.o for the whole release cycle.
The current release size of EPP is ~65GB - up from 11GB 2 years ago. And it is probably set to grow as the remaining packages adopt bundled JREs.
If we need to have a discussion about EPP usage, let's have it on epp-dev or in Bugzilla (as this issue is actually closed).