• @redcalcium
    15 points · 11 months ago

    We love this example because it illustrates just how difficult it will be to fully understand LLMs. The five-member Redwood team published a 25-page paper explaining how they identified and validated these attention heads. Yet even after they did all that work, we are still far from having a comprehensive explanation for why GPT-2 decided to predict Mary as the next word.
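    If you want to poke at this yourself, here's a minimal sketch using the Hugging Face transformers library. The "gpt2" checkpoint and the exact prompt are my assumptions, modeled on the paper's indirect-object-identification examples, not quoted from the article:

    ```python
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    # Load the publicly available GPT-2 small checkpoint.
    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    # A prompt in the style of the paper's indirect-object-identification task.
    prompt = "When Mary and John went to the store, John gave a drink to"
    inputs = tokenizer(prompt, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

    # Inspect the model's top candidates for the next token after the prompt;
    # " Mary" should come out on top.
    top = torch.topk(logits[0, -1], k=5)
    for token_id, score in zip(top.indices, top.values):
        print(f"{tokenizer.decode(int(token_id))!r}: {score.item():.2f}")
    ```

    Running it shows the prediction itself is trivial to reproduce; explaining *why* the model makes it is what took the 25-page paper.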

    The current approach to ML model development has the same vibe as someone writing a block of code that somehow works and then leaving a comment like "no idea why but it works, modify at your own risk."

    • @Jumper775@lemmy.world
      2 points · 11 months ago

      Perhaps we could see even greater improvements if we stopped and looked at how these models actually work. Eventually we will need to, since there is a limit to how much real text exists.