Replace bash scripts and aws instruction with the cc-downloader #18

Merged
lfoppiano merged 3 commits into main from feature/luca/use-cc-downloader
Feb 26, 2026

Conversation

@lfoppiano
Collaborator

As discussed in https://github.com/commoncrawl/issues/issues/625:

  • provide instructions using the polite cc-downloader
  • remove awk-based download script
  • remove AWS S3 instructions

@wumpus
Member

wumpus commented Feb 17, 2026

cargo install cc-downloader doesn't always work -- it doesn't work on rf, for example. So near where you mention installing cc-downloader, I'd mention that users should look at the cc-downloader repo for installation instructions if cargo install cc-downloader does not work.

@lfoppiano
Collaborator Author

lfoppiano commented Feb 17, 2026

> cargo install cc-downloader doesn't always work -- it doesn't work on rf, for example. So near where you mention installing cc-downloader, I'd mention that users should look at the cc-downloader repo for installation instructions if cargo install cc-downloader does not work.

ahhh, good catch @wumpus!

I've also added the same fix in the whirlwind-python PR.

@sebastian-nagel sebastian-nagel left a comment

Thanks, @lfoppiano! Looks good.

@lfoppiano lfoppiano merged commit 7318a0f into main Feb 26, 2026
1 check passed
@lfoppiano lfoppiano deleted the feature/luca/use-cc-downloader branch February 26, 2026 12:28
3 participants