[HN Gopher] Paving the way to efficient architectures: StripedHy...
___________________________________________________________________
Paving the way to efficient architectures: StripedHyena-7B

Author : minimaxir
Score  : 79 points
Date   : 2023-12-08 20:26 UTC (2 hours ago)

(HTM) web link (www.together.ai)
(TXT) w3m dump (www.together.ai)

| mmaunder wrote:
| Is the model available or is this just an API/app?
| lelag wrote:
| Weights seem available at
| https://huggingface.co/togethercomputer/StripedHyena-Nous-7B...
| SparkyMcUnicorn wrote:
| And here's the base model:
| https://huggingface.co/togethercomputer/StripedHyena-Hessian...
|
| And GH repo: https://github.com/togethercomputer/stripedhyena
| anon373839 wrote:
| This is a seriously impressive model.
| kcorbitt wrote:
| For short-context tasks it looks slightly stronger than Llama 7B
| and slightly weaker than Mistral 7B. A really impressive showing
| for a completely new architecture. I've also heard it was trained
| on far fewer tokens than Mistral, so there is likely still room
| to grow.
|
| Overall, incredibly impressive work from the team at Together!
| bratao wrote:
| And this uses Hyena, which can be considered a "previous
| generation" of Mamba. I think this answers the question about the
| scalability of SSMs, and the transformer has finally found an
| opponent.
| skerit wrote:
| 7B models are so exciting. So much is happening with those
| smaller models.
| firejake308 wrote:
| Darn, I was hoping the RWKV people had finally obtained
| reportable results. This is still interesting, though. Maybe we
| will see more alternatives to transformers soon.
| goalonetwo wrote:
| There seems to be a new model every single day. How do people
| have time to keep track of everything going on in AI?
| hinkley wrote:
| From decades of observing at a distance and observing observers
| at a distance, I think it's safe to say that, like fusion, there
| are walls that AI runs into, not unlike the risers on a
| staircase, and when we collectively hit one, there's a lot of
| scuttling back and forth. A lot of movement, but no real
| progress. If that plateau goes on too long, excitement (and
| funding) dries up and things die down.
|
| Then someone figures out how to get past the current plateau, and
| the whole process repeats. That could be new tech, a new
| architecture, or it could be old tech that was infeasible and had
| to wait for Moore's Law.
|
| Right now we are on the vertical part of the sawtooth pattern.
| Everyone hopes this will be the time that takes us to infinity,
| but the old people are just waiting for everyone to crash into
| the new wall.
| goalonetwo wrote:
| Thanks for putting this so eloquently. That's exactly how I feel
| as well.
| fvv wrote:
| Why should things dry up when, contrary to fusion, AI is already
| usable by millions daily? Even if progress should stall a bit,
| the product, fine-tunes, or normal progress will still be super
| useful; the "too soon" point has been surpassed.
___________________________________________________________________
(page generated 2023-12-08 23:00 UTC)