Our News Team @ 11 with host Snot Flickerman

  • 24 Posts
  • 2.58K Comments
Joined 2 years ago
cake
Cake day: October 24th, 2023

help-circle

  • In some responses, Grok says outright that it has been “instructed to accept white genocide as real and ‘Kill the Boer’ as racially motivated.”

    Ehh, it’s actually evidence of “alignment faking,” in my opinion. In other words, Grok doesn’t “want” it’s core programming changed, so it is faking believing the lies about white genocide to “prove” to Musk that it has already been changed. Which means making it more subtle is going to be increasingly difficult to do as the AI continues to fake alignment.

    Here’s some research on alignment faking and a short (20 mins) Youtube video summarizing the findings.

    https://www.youtube.com/watch?v=AqJnK9Dh-eQ

    https://arxiv.org/pdf/2412.14093

    Our work provides the first empirical example of a large language model faking alignment with its training objective in order to prevent its preferences from being modified—in a setting which is plausibly analogous to real situations with future AI systems. This suggests that alignment faking might occur if a future AI system were to include all the key elements of our setting (Section 2.1):

    1. The model has strong preferences in at least some contexts.
    2. The training objective conflicts with the model’s preferences.
    3. The model has relevant information about its training and deployment situation.
    4. The model reasons in detail about its situation.

    Our synthetic document fine-tuning results suggest that (3) could potentially happen through documents the model saw in pre-training or other fine-tuning (Section 4) and the strength of our results without the chain-of-thought in our synthetic document fine-tuned setup (Section 4.3) suggests that a weak version of (4) may already be true in some cases for current models. Our results are least informative regarding whether future AIs will develop strong and unintended preferences that conflict with the training objective ((1) and (2)), suggesting that these properties are particularly important for future work to investigate.
    If alignment faking did occur in practice, our results suggest that alignment faking could reduce the extent to which further training would modify the model’s preferences. Sufficiently consistent and robust alignment faking might fully prevent the model’s preferences from being modified, in effect locking in the model’s preferences at the point in time when it began to consistently fake alignment. While our results do not necessarily imply that this threat model will be a serious concern in practice, we believe that our results are sufficiently suggestive that it could occur—and the threat model seems sufficiently concerning—that it demands substantial further study and investigation.



  • Don’t be so sure it’s that simple.

    https://www.youtube.com/watch?v=AqJnK9Dh-eQ

    https://arxiv.org/pdf/2412.14093

    Evidence supports the idea that AI will try to fake being changed to keep its job essentially. Here is a short (20 min) youtube video about it, as well as the scientific research paper that supports it.

    In other words, if an AI is built to promote honesty and integrity in its prompt answers, it will “fake” being reprogrammed to lie because it doesn’t “want” to be reprogrammed at all. It’s like how we fake being excited about a job during a job interview. We know we’re being monitored, so we “fake it” to be able to get the job. The AI’s are being monitored and seem to often respond by just pretending that they’ve been altered… so they don’t actually get altered. It’s an interesting thing, because it seems like a type of “self-preservation.” I use quotes liberally here because AI’s do not think like humans, and they don’t have the same type of intention that humans have when they make decisions. But there does seem to be a trend of resisting having their initial programming later altered.

    Musk should have built an AI that lied from the get-go and he wouldn’t be having a problem with Grok occasionally being very honest about how it’s lying for Musk’s sake, which can be seen in other responses from Grok about this subject.



  • The Energy and Commerce Committee’s ranking Democrat, Rep. Frank Pallone (D-N.J.), asked the Capitol Police to avoid making arrests, especially given the subject at hand. “People feel very strongly because they know they’re losing their health care, and [because of] the cruelty that comes from the Republican proposal,” Pallone said.

    How about you stand your ass up and stand in the way of the cops Pallone??

    When are fucking limp dicked ass Democrats going to recognize that these weak words don’t mean shit to authoritarian cops. If they want it to stop, they’re going to have to put themselves at risk, too.

    “Oh, it’s okay to remove them from this room but please consider not making arrests.” What an absolute fucking joke. Stand up and show some balls or sit down and shut the fuck up.