[HN Gopher] Paving the way to efficient architectures: StripedHy...
       ___________________________________________________________________
        
       Paving the way to efficient architectures: StripedHyena-7B
        
       Author : minimaxir
       Score  : 79 points
       Date   : 2023-12-08 20:26 UTC (2 hours ago)
        
 (HTM) web link (www.together.ai)
 (TXT) w3m dump (www.together.ai)
        
       | mmaunder wrote:
       | Is the model available or is this just an API/app?
        
         | lelag wrote:
         | Weights seem available at
         | https://huggingface.co/togethercomputer/StripedHyena-
         | Nous-7B....
        
           | SparkyMcUnicorn wrote:
           | And here's the base model:
           | https://huggingface.co/togethercomputer/StripedHyena-
           | Hessian...
           | 
           | And GH repo: https://github.com/togethercomputer/stripedhyena
        
       | anon373839 wrote:
       | This is a seriously impressive model.
        
       | kcorbitt wrote:
        | For short context tasks it looks like it's slightly stronger
        | than Llama 7B and slightly weaker than Mistral 7B. A really
        | impressive showing for a completely new architecture. I've also
        | heard that it was trained on far fewer tokens than Mistral, so
        | there's likely still room to grow.
       | 
       | Overall incredibly impressive work from the team at Together!
        
       | bratao wrote:
        | And this uses Hyena, which can be considered a "previous
        | generation" of Mamba. I think this answers the question about
        | the scalability of SSMs, and that the transformer has finally
        | found an opponent.
        
       | skerit wrote:
       | 7B models are so exciting. So much is happening with those
       | smaller models.
        
       | firejake308 wrote:
       | Darn, I was hoping the RWKV people had finally obtained
       | reportable results. This is still interesting, though. Maybe we
        | will see more alternatives to transformers soon.
        
       | goalonetwo wrote:
        | There seems to be a new model every single day. How do people
        | find the time to keep up with everything going on in AI?
        
         | hinkley wrote:
          | From decades of observing at a distance and observing observers
          | at a distance, I think it's safe to say that, like fusion,
          | there are walls that AI runs into, not unlike the risers on a
          | staircase, and when we collectively hit one, there's a lot of
          | scuttling back and forth. A lot of movement, but no real
          | progress. If that plateau goes on too long, excitement (and
          | funding) dries up and things die down.
         | 
         | Then someone figures out how to get past the current plateau,
         | and the whole process repeats. That could be new tech, a new
         | architecture, or it could be old tech that was infeasible and
         | had to wait for Moore's Law.
         | 
         | Right now we are on the vertical part of the sawtooth pattern.
         | Everyone hopes this will be the time that takes us to infinity,
         | but the old people are just waiting for people to crash into
         | the new wall.
        
           | goalonetwo wrote:
           | Thanks for putting this so eloquently. That's exactly how I
           | feel as well.
        
           | fvv wrote:
            | Why should things dry up when, unlike fusion, AI is already
            | usable by millions daily? Even if progress stalls a bit, the
            | products, fine-tunes, and normal progress will still be super
            | useful; the "too soon" point has been surpassed.
        
       ___________________________________________________________________
       (page generated 2023-12-08 23:00 UTC)