That suggestion is exactly the same as what I started with when I said “IMO the ideal solution would be the one Wikimedia uses, which is to make the information available in an easily-downloadable archive file.” It just cuts out the Aaron-Schwarts-style external middleman, so it’s easier and more efficient to create the downloadable data.
I don’t understand why the burden is on the victims here.
They put the website up. Load balancing, rate limiting, and such go with the turf. It’s their responsibility to make the site easy to use and hard to break. Putting up an archive of the content that the scrapers want is an easy and straightforward thing to do to accomplish this goal.
I think what’s really going on here is that your concern isn’t about ensuring that the site is up, and it’s certainly not about ensuring that the data it’s providing is readily available. It’s that there are these specific companies you don’t like and you just want to forbid them from accessing otherwise freely accessible data.
That is absolutely ridiculous. The pressure AI scraping puts on sites vastly outstrips anything people built for, as evidenced by the fact that the systems are going down.
Yes. Which is why I’m suggesting providing an approach that doesn’t require scraping the site.
That suggestion is exactly the same as what I started with when I said “IMO the ideal solution would be the one Wikimedia uses, which is to make the information available in an easily-downloadable archive file.” It just cuts out the Aaron-Schwarts-style external middleman, so it’s easier and more efficient to create the downloadable data.
deleted by creator
They put the website up. Load balancing, rate limiting, and such go with the turf. It’s their responsibility to make the site easy to use and hard to break. Putting up an archive of the content that the scrapers want is an easy and straightforward thing to do to accomplish this goal.
I think what’s really going on here is that your concern isn’t about ensuring that the site is up, and it’s certainly not about ensuring that the data it’s providing is readily available. It’s that there are these specific companies you don’t like and you just want to forbid them from accessing otherwise freely accessible data.
deleted by creator
Yes. Which is why I’m suggesting providing an approach that doesn’t require scraping the site.
deleted by creator
Perhaps be more succinct? You’re really flooding the zone here.
No, I’m staying focused.
Do you have any idea how deeply it undermines your argument when you just openly say, “You’re writing oo much for me to read, please write less.”
Don’t respond if you don’t have the common courtesy to read what the person person wrote.
“be more succinct”?
maybe have AI summarize it for you 🙄
deleted by creator