Smallest transformer that can add two 10-digit numbers (github.com)

45 points | by ks2048 a day ago

6 comments

  • E-Reverance 11 minutes ago

    Not sure how well this fits the rules, but I saw someone on Twitter claim 28 params: https://gist.github.com/SeuperHakkerJa/da3050739bea97aabd86e...

  • ks2048 8 minutes ago

    So hand-coded weights can do it with 36 params, versus 311 for trained weights. Did anyone try the hand-coded architecture, but starting from random weights and training?

  • amelius an hour ago

    > In short: if you can swap in a different set of weights and use the exact same inference code for a different task, your setup is legitimate. If the inference code is inseparable from the algorithm, it's not.

    I wonder why they don't just write the inference code themselves, so that by design the focus is on the model.
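
    A toy illustration of that criterion (my own sketch, not from the article): infer() and the AND/OR weight sets below are made up, but the point is that the inference code never changes; only the swapped-in weights determine the task.

      import numpy as np

      def infer(weights, x):
          # Exact same inference code for every task;
          # only the swapped-in weights change the behavior.
          return (x @ weights["W"] + weights["b"] > 0).astype(int)

      x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
      and_weights = {"W": np.array([[1.0], [1.0]]), "b": -1.5}  # task: AND
      or_weights = {"W": np.array([[1.0], [1.0]]), "b": -0.5}   # task: OR
      print(infer(and_weights, x).ravel())  # [0 0 0 1]
      print(infer(or_weights, x).ravel())   # [0 1 1 1]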

  • medi8r 39 minutes ago

    You can do that in a single matmul, of course.
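
    (A minimal sketch of the single-matmul point; the example digits are arbitrary:)

      import numpy as np

      # Addition is linear, so one matmul computes it:
      # [a, b] @ [[1], [1]] = [a + b]
      W = np.array([[1], [1]])
      x = np.array([[1234567890, 9876543210]])
      print(x @ W)  # [[11111111100]]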

    • hyperhello 33 minutes ago

      So can you take an arbitrary transformer and somehow turn it into a compact set of low-power fast gates by some algorithm?

      • measurablefunc 32 minutes ago

        I think you're misunderstanding the joke.