r/ExperiencedDevs pointed at programming.dev specifically, which is why I am giving it a shot. I have used decentralized stuff in the past like Usenet and IRC. I kinda miss not having corporate overlords.
I am an academic and outdoors enthusiast who supports the free and open exchange of information. We need to stop the trend of social media companies closing data access to researchers, open source developers, and the general community.
I am cautiously optimistic about the decentralization and federation. But I think the biggest hurdle is developing the user base right now. ExperiencedDevs is the only subreddit I followed before this all started that directly linked a Lemmy alternative.
I was wondering if someone would bring up search engine indexing. Google certainly has the upper hand for LLM training data with Reddit’s new API change, since they have the comments anyway. This is a big reason I fear these API changes: they concentrate power in the hands of already powerful companies.
I am also wary of big tech companies using my comment history for their LLMs. However, I worry that the tech companies will scrape data anyway and Reddit’s API pricing just locks out the open source LLMs. There are a few of them; here are a couple that I have played with:
https://github.com/nomic-ai/gpt4all
https://github.com/ggerganov/llama.cpp
Some projects even try to preserve privacy, but I think that depends more on what extra training data you give the model and the queries you issue.
I totally agree that Reddit’s motivation is probably not related to LLMs and the link I posted is more of an excuse than anything. However, I am curious what people think about data scraping and LLMs in general.
I hope cross posts are OK. I am curious about ExperiencedDevs’ perspective on this as well, since the question is rather technical.
Copying my opinion from the other thread in case you don’t want to click through:
My personal opinion is that high API usage fees hurt open source LLMs (e.g. GPT4All). I would rather not see this new technology monopolized by those who can pay API fees.
Yeah, automatic posts drive me away faster than anything. Good point on cross posting though, I just followed your advice. It’s pretty much free if your post fits in multiple places and there are lots of nearly empty communities right now.