**UPDATE:** It is safe to move p2 and Maven repositories, as download.e.o will detect the presence of archived content and serve a 301 redirect to it.
We've received an email from OSU OSL that our download.e.o footprint exceeds 2TB.
As we rely on volunteer mirrors around the world to help distribute Eclipse bits, download.e.o must be kept as small as possible:
- download.e.o must only contain recent nightly builds. Nightly builds older than 2 weeks must be purged.
  - Nightly builds must be in an Excluded directory structure (see below).
- download.e.o must only contain the last 2 releases. Older releases must be moved to archives.
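As a rough illustration of the two-week rule for nightly builds, here is a minimal sketch that selects purge candidates. The function name, the `(name, timestamp)` input shape and the build names are assumptions made for this example; it only reports candidates and deletes nothing.

```python
from datetime import datetime, timedelta


def stale_nightlies(builds, now, max_age_days=14):
    """Return the names of builds older than max_age_days, sorted by name.

    builds: iterable of (name, datetime) pairs (hypothetical input shape).
    """
    cutoff = now - timedelta(days=max_age_days)
    return sorted(name for name, built in builds if built < cutoff)


# Made-up build names and dates, for illustration only.
example = [
    ("N20210825", datetime(2021, 8, 25)),
    ("N20210910", datetime(2021, 9, 10)),
]
print(stale_nightlies(example, now=datetime(2021, 9, 15)))
```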
There are two tools you can use to ensure your footprint is reasonable:
This is an index of download.e.o, with the actual on-mirror footprint and file content. Please locate your projects and perform the necessary maintenance:
Hi Denis. I tried to prune Linux Tools project, but the button for Move to Archives is prohibited for me. I have logged in, but the page doesn't mark me as logged in. I have tried re-logging in and that doesn't help either.
@jjohnston There is an unfortunate bug with the pages on download.e.o (and archive.e.o) in that they do not show the logged in user. The options should still be there. Can you attach a screenshot?
From what I can see in the code, the "Move to archive.eclipse.org" button is disabled by default, and if a checkbox is clicked, we have some JavaScript that should remove the disabled attribute from the button.
@jjohnston can you tell me if you have disabled JS in your browser? Thanks.
I'm also thinking we might just want to simply remove the disabled attribute from the button and let it be accessible even if no checkboxes have been selected.
I spoke too soon. All the directories I tried to archive didn't move. I tried today and waited and refreshed. All the directories I tried to archive came back with Folder cannot be archived, failed with 256, contact webmaster.
Hi Jeff, a few have failed with: unable to remove target: Directory not empty
Can you check the archive location for those that failed? They may already have a matching directory. From archive, it can be deleted, then archived again (and permanently deleted if need be)
Unfortunately, this doesn't work properly. From the archives page, I can't delete the directories. I can click on them and from there I can delete contents. I went into update-7.5.0 and deleted all contents. That worked. I then went and archived update-7.5.0 from downloads and though it didn't fail, it placed the update-7.5.0 directory inside the archive update-7.5.0 directory I couldn't remove. So any old references shouldn't resolve as they won't have update-7.5.0/update-7.5.0 in the url.
@npeifer1fn We seem to be missing a 'genie.subversive' user account. @mward @jmazanek4ep can genie.subversive be created, and made owner of download.e.o/technology/subversive and archive.e.o/technology/subversive?
Same issue as Jeff here: Move to archive is grayed out and cannot be used. The download.eclipse.org page doesn't show me as logged in. This is true on both Firefox and Chrome.
That is an interesting use case. We could implement a transparent 30x redirect if the file is nonexistent and it does exist in the same structure on archives
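The idea could be sketched roughly as follows. This is not the actual server implementation: the function name, the root-directory parameters and the choice to default to `https` are assumptions for illustration.

```python
from pathlib import Path

ARCHIVE_HOST = "archive.eclipse.org"  # archive host named in this thread


def redirect_location(request_path, download_root, archive_root, scheme="https"):
    """Return an archive URL to use in a 30x Location header, or None.

    download_root and archive_root are hypothetical local directories that
    mirror the layout of the two servers.
    """
    rel = request_path.lstrip("/")
    if ".." in Path(rel).parts:
        return None  # refuse paths that try to escape the root
    if (Path(download_root) / rel).exists():
        return None  # still on download: serve it normally
    if (Path(archive_root) / rel).exists():
        return f"{scheme}://{ARCHIVE_HOST}/{rel}"
    return None  # missing on both: the caller would answer 404
```

Building the URL from the scheme of the incoming request keeps an https request on https after the redirect.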
@droy The redirects are not working properly from download -> archive because the 301 is returning an http:// even though the initiator is https://
So I get errors like this in the browser console and no download if I click on a link on www.eclipse.org.
Mixed Content: The site at 'https://www.eclipse.org/' was loaded over a secure connection, but the file at 'https://archive.eclipse.org/technology/epp/downloads/release/2021-03/R/eclipse-java-2021-03-R-linux-gtk-x86_64.tar.gz' was redirected through an insecure connection. This file should be served over HTTPS. This download has been blocked. See https://blog.chromium.org/2020/02/protecting-users-from-insecure.html for more details.
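A small check like the one below could help confirm the problem from a script. It is only a sketch: the function name is made up, and the example URLs use a placeholder path rather than a real file.

```python
from urllib.parse import urlsplit


def is_scheme_downgrade(request_url, location):
    """True when an https request is redirected to a plain http URL."""
    return (
        urlsplit(request_url).scheme == "https"
        and urlsplit(location).scheme == "http"
    )


# Placeholder path, for illustration only.
print(is_scheme_downgrade(
    "https://download.eclipse.org/path/to/binary.zip",
    "http://archive.eclipse.org/path/to/binary.zip",
))
```

A redirect for which this returns `True` is the kind that browsers block as mixed content.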
@droy Is there a transparent redirect to the archive if a file is requested via https://www.eclipse.org/downloads/download.php?file=/path/to/binary.zip ?
I have archived EPP releases <= 2021-03 - this removed ~200GB from the mirrors
Note that some of the now archived files are still downloaded 100s of times per day (mostly the JEE and Java packages for Windows). See download stats for /technology/epp/downloads/release/neon/3/eclipse-jee-neon-3-win32-x86_64.zip as an example, downloaded ~150k times in the last 12 months.
Long story short - @droy please let me know if you see a spike in traffic to archive.e.o that causes you a problem and we can unarchive some of the older releases.
Ah sorry, I mean the projects download page, which enables us to move folders to the "archive". Currently, it's only possible to move a folder to "archive" and then to delete the folder in the archive. Probably, there could be an option to directly delete a folder.
@phwenig We're adding navigation to the Archives. Deletes will be 3 clicks (Archive, Go to, Delete) but that gives us enough confidence that deletions are intentional while making them easier to perform.
Looking at the size of the mirror itself since Denis' call for action, I see it has gone from 1.5T then to 1.2T today, so that's a great improvement:
I wonder if you would consider adding to the exclusions list? For example, I could suggest adding webtools/CI (does it need some * in the pattern?) given that its subfolders are taking up 100G currently and these appear to be a huge number of relatively recent integration builds (several months worth) of multiple different WTP versions:
Better of course (for the overall backup footprint of all the servers) would be to limit these to perhaps the 5 most recent such builds for each version, but even in that case, I doubt these need to be mirrored at all...
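A "keep the N most recent builds per version" rule might look roughly like this. The `(version, build_id)` input shape is hypothetical, and the sketch assumes that build ids sort chronologically (for example timestamp-based names), which may not hold for every project.

```python
from collections import defaultdict


def builds_to_prune(builds, keep=5):
    """Return (version, build_id) pairs beyond the `keep` newest per version.

    builds: iterable of (version, build_id) pairs; build ids are assumed to
    sort chronologically.
    """
    by_version = defaultdict(list)
    for version, build_id in builds:
        by_version[version].append(build_id)

    prune = []
    for version, ids in sorted(by_version.items()):
        ids.sort()
        # ids[:-0] would be empty, so keep == 0 is handled explicitly.
        surplus = ids[:-keep] if keep > 0 else ids
        prune.extend((version, build_id) for build_id in surplus)
    return prune
```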
I'll start a running list of potential exclusions:
- `*/ci/*` (might as well grab them all)
- `*/CI/*`
- `/releases/neon/`, `/releases/mars/`, `/releases/oxygen/`, `/releases/luna/`, etc... ? https://download.eclipse.org/oomph/archive/eclipse/releases/index.html
The removed elements are not available on archive.eclipse.org/jetty, is there a way to have access to the removed resources ?
I did not find any official corresponding repo yet.
Note that the following two URLs are quite useful for browsing exactly what's in the file system of the two servers, even if there is an index.* in the folder that would otherwise prevent showing the listing of that folder:
Here's a list of potential additions to the exclude list, which will purge matching files and directories from mirrors:
- `/technology/osee` (doesn't use mirrors)
- `*-dev/` (dev stuff)
- `/technology/linuxtools` (2012 archives)
- `/tools/orbit` (is it even worth mirroring this?)
- `*/CI/*`
- `*ganymede*`
- `*galileo*`
- `*helios*`
- `*indigo*`
- `*juno*`
- `*kepler*`
- `*luna*`
- `*mars*`
- `*neon*`
- `*snapshots*`
- `test/*`
- `milestone/*`
- `milestones/*`
- `stable/*`
- `/ee4j/*` (doesn't use mirrors)
- `builds/*`
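To get a feel for which paths a few of these wildcard patterns would catch, here is a rough sketch using Python's `fnmatchcase`. This is only an approximation: the mirror sync applies its own matching rules (rsync, for example, treats slashes and anchoring differently), and the test paths below are made up.

```python
from fnmatch import fnmatchcase


def excluded(path, patterns):
    """True if path matches any pattern (case-sensitive shell-style globbing)."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


# Three patterns from the list; the paths are hypothetical.
patterns = ["*/CI/*", "*ganymede*", "*snapshots*"]
for path in [
    "/webtools/CI/build-1/site.zip",
    "/webtools/ci/build-1/site.zip",
    "/releases/ganymede/content.jar",
    "/someproject/releases/latest/content.jar",
]:
    print(path, excluded(path, patterns))
```

Because the match is case-sensitive, `*/CI/*` does not catch a lowercase `ci` directory, so both spellings would need to be listed.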
Do we know the download count for the directories affected by this exclude list? Is it reasonably low? What's the cut off number/count?
Expanding the exclude list is a good measure to reduce the disk usage on the mirrors, but it should be accompanied by default quotas for every project's download dir (+ exceptions where necessary) so we do not end up in the same situation in a couple of months.
There's also a price to pay in mirroring (and sync'ing) content that no one downloads -- or worse, mirroring content where folks link to download.e.o directly.
The rsync process -- even if there are no bit changes in the content from the sender to receiver -- is expensive in disk I/O and runs for several minutes. Plus the stream of bandwidth used for the sync itself is not small.
Before changes to the exclude list:
total size is 1,588,864,727,405
List xfer alone is 200M
After:
total size is 1,080,857,525,956
List transfer alone is 146M
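For a quick sense of scale, the arithmetic on those figures can be sketched as below, assuming the totals are byte counts as rsync normally reports them. The helper name is made up for this example.

```python
def reduction(before, after):
    """Return (absolute saving, saving as a percentage of `before`)."""
    saved = before - after
    return saved, 100 * saved / before


saved_bytes, size_pct = reduction(1_588_864_727_405, 1_080_857_525_956)
_, list_pct = reduction(200, 146)  # file-list transfer, in the "M" units quoted

print(f"total size: {saved_bytes:,} fewer bytes ({size_pct:.1f}% smaller)")
print(f"file list:  {list_pct:.1f}% smaller")
```

That works out to roughly 508 GB removed from the mirrored set, close to a third of the original footprint.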