• vcmj@programming.dev

    Yes, that’s by design: the network operates on one transcript per input, and it does genuinely get cut off eventually. Usually an entire older line is purged once the token count exceeds the limit.
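    A minimal sketch of what that purging could look like. The token counting here is a naive whitespace split and the limit is made up for the example; real systems use a subword tokenizer and a much larger context window.

    ```python
    TOKEN_LIMIT = 8  # hypothetical limit, just for illustration

    def count_tokens(line: str) -> int:
        # Stand-in for a real tokenizer: one token per word.
        return len(line.split())

    def truncate_transcript(lines: list[str], limit: int = TOKEN_LIMIT) -> list[str]:
        """Drop entire lines from the top until the transcript fits the limit."""
        kept = list(lines)
        while kept and sum(count_tokens(l) for l in kept) > limit:
            kept.pop(0)  # purge the oldest line whole, not token by token
        return kept

    transcript = [
        "user: hello there",       # 3 tokens
        "bot: hi how can I help",  # 6 tokens
        "user: what is a token",   # 5 tokens
    ]
    print(truncate_transcript(transcript))
    # → ['user: what is a token']
    ```

    With 14 tokens total against a limit of 8, the two oldest lines are dropped whole and only the newest survives.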

    • vcmj@programming.dev

      Or, to explain it better: most training samples are cut off at the top, so the network learns, to some extent, to ignore that.