[HN Gopher] Show HN: Ecco - See what your NLP language model is "thinking"
       ___________________________________________________________________
        
       Show HN: Ecco - See what your NLP language model is "thinking"
        
       Author : jalammar
       Score  : 153 points
       Date   : 2021-01-08 12:08 UTC (10 hours ago)
        
 (HTM) web link (www.eccox.io)
 (TXT) w3m dump (www.eccox.io)
        
       | shenberg wrote:
       | NMF for factorizing activations is brilliant!
        
       | ZeroCool2u wrote:
       | Fantastic work. This is the kind of stuff we need to get these
       | models actually adopted and integrated into non-tech
       | organizations.
        
       | pizza wrote:
       | One small step on the path towards solid-state intelligence
        
       | Der_Einzige wrote:
       | This work is awesome!
       | 
        | Are there theoretical reasons to choose NMF over other
        | dimensionality reduction algorithms, e.g. UMAP?
       | 
       | Is it easy to add other DR algorithms? I may submit a PR adding
       | those in if it is...
        
         | jalammar wrote:
         | I actually started with PCA. But NMF proved more understandable
         | since negative dimensions in PCA are hard to interpret. I
         | didn't consider UMAP, but would be interested to see how it
         | performs here.
         | 
          | It should be easy, yeah. For NMF, the activations tensor is
          | reshaped from (layers, neurons, token position) down into
          | (layers x neurons, token position). And we present that to
         | sklearn's NMF model. I would assume UMAP would operate on that
         | same matrix. That matrix is called 'merged_act' and is located
         | here:
         | https://github.com/jalammar/ecco/blob/1e957a4c1c9bd49c203993...
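          | 
          | A minimal sketch of that reshape-then-factorize step (the
          | shapes and component count below are made up for
          | illustration; NMF needs non-negative input, hence the abs()
          | on this toy data):
          | 
          |   import numpy as np
          |   from sklearn.decomposition import NMF
          | 
          |   # toy activations: (layers, neurons, token positions)
          |   activations = np.abs(np.random.randn(6, 3072, 20))
          | 
          |   # merge the layer and neuron axes into one
          |   merged_act = activations.reshape(
          |       -1, activations.shape[-1])
          | 
          |   # factorize into a handful of components
          |   nmf = NMF(n_components=10, init='nndsvd')
          |   W = nmf.fit_transform(merged_act)  # (layers*neurons, k)
          |   H = nmf.components_                # (k, positions)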
        
       | jalammar wrote:
       | Hi HN,
       | 
       | Author here. I had been fascinated with Andrej Karpathy's article
       | (https://karpathy.github.io/2015/05/21/rnn-effectiveness/) --
       | especially where it shows neurons being activated in response to
       | brackets and indentation.
       | 
       | I built Ecco to enable examining neurons inside Transformer-based
       | language models.
       | 
        | You can use Ecco to simply interact with a language model and
        | see its output token by token (as it's built on the awesome
        | Hugging Face transformers package). But more interestingly,
        | you can use it to examine neuron activations. The article
        | explains more:
       | https://jalammar.github.io/explaining-transformers/
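        | 
        | Basic usage looks roughly like this (from the article -- check
        | the docs for the exact, current API):
        | 
        |   import ecco
        | 
        |   # wrap a Hugging Face model, collecting activations
        |   lm = ecco.from_pretrained('distilgpt2', activations=True)
        | 
        |   # generate, keeping an output object for analysis
        |   output = lm.generate("The countries of the EU are:",
        |                        generate=20, do_sample=False)
        | 
        |   # factorize neuron activations and visualize
        |   nmf = output.run_nmf(n_components=10)
        |   nmf.explore()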
       | 
       | I have a couple more visualizations I'd like to add in the
       | future. It's open source, so feel free to help me improve it.
        
         | Grimm1 wrote:
         | This is fantastic, I used your earlier transformers article to
         | first get a real grasp on the architecture. I hope you expand
          | this to accommodate other modes of attention outside of the
          | transformer paradigm as well!
        
           | jalammar wrote:
           | Wonderful! Thanks!
           | 
            | I am curious about those recent O(L) attention
            | transformers (see slide 106 of
            | http://gabrielilharco.com/publications/EMNLP_2020_Tutorial__...).
            | If these methods are converging
           | towards a new self-attention mechanism, I'd love to try
           | illustrating that.
           | 
           | What other attention modes are you referring to? Did
           | something in particular catch your attention?
        
             | Grimm1 wrote:
             | Personally, I implemented this just yesterday.
             | 
             | https://arxiv.org/pdf/1703.03130.pdf
             | 
             | It's a bit older now but I was looking for a self attention
             | method without resorting to a transformer model and this
             | proposed an interesting implementation that wound up being
             | very successful for my problem case.
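              | 
              | The core of that paper's attention is small. A
              | rough PyTorch sketch, with d_a and r following
              | the paper's notation:
              | 
              |   import torch
              |   import torch.nn as nn
              | 
              |   class StructuredSelfAttn(nn.Module):
              |       # A = softmax(W_s2 tanh(W_s1 H^T))
              |       def __init__(self, hidden, d_a=350, r=30):
              |           super().__init__()
              |           self.W_s1 = nn.Linear(
              |               hidden, d_a, bias=False)
              |           self.W_s2 = nn.Linear(
              |               d_a, r, bias=False)
              | 
              |       def forward(self, H):
              |           # H: (batch, seq, hidden)
              |           A = torch.softmax(
              |               self.W_s2(torch.tanh(
              |                   self.W_s1(H))),
              |               dim=1)  # softmax over tokens
              |           # M: (batch, r, hidden)
              |           return A.transpose(1, 2) @ H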
        
         | airstrike wrote:
         | I just want to say I absolutely love the name and logo. Brings
         | back some fond memories of an incredibly hard game from once
         | upon a time...
         | 
          | Having said that, IANAL, but I find it unlikely that the use
          | of a dolphin and the word Ecco together is not trademarked,
          | so you may want to check on that before someone bugs you
          | about it.
        
           | cmrdsprklpny wrote:
           | "Ecco the Dolphin" is a game for Sega consoles.
           | https://en.wikipedia.org/wiki/Ecco_the_Dolphin
        
             | airstrike wrote:
             | Yes, that's precisely what I meant
        
         | ninjin wrote:
          | I cannot thank you enough for your "The Illustrated
          | Transformer" [1], which I have directed two cohorts of MSc
          | students to - it is a true gem of an article. A few years ago
          | my group made an interface to visualise contextual word
          | representations [2] that looked like a primordial-soup
          | ancestor of your most recent article (no screenshots though,
          | sadly). I hope putting these together brings you as much joy
          | as it brings to your fans in academia and education, like
          | myself, reading it. Despite Chris Olah's efforts with
          | Distill, I still think we lack a good way to give efforts
          | like yours the amount of credit they deserve.
         | 
         | [1]: https://jalammar.github.io/illustrated-transformer
         | 
         | [2]: https://github.com/uclnlp/muppetshow
        
           | tchalla wrote:
           | I also want to make an additional "Thank You" note for the
           | author on the lovely "The Illustrated Word2Vec" [0]. I wish
           | every concept Machine Learning or otherwise would follow such
           | a framework.
           | 
           | [0] https://jalammar.github.io/illustrated-word2vec/
        
           | jalammar wrote:
            | I'd love to look at your group's visualizations! Is it a
            | private repo? The link doesn't open for me. It never
            | ceases to blow my mind that we can represent words and
            | concepts as vectors of numbers.
           | 
           | Thanks for your kind words! It's a labor of passion,
           | honestly. And while in previous years it was a nights-and-
           | weekends project, I have recently been giving it my entire
           | time and focus -- which is why I'm able to dip my toes more
           | heavily into R&D like Ecco and the "Explaining Transformers"
           | article.
        
             | ninjin wrote:
              | Yikes, you are right... I just linked a private repo.
              | '^^ I have poked the rest of the group and it seems
              | that at least a tweet was made [1] - but not much else
              | remains. Describing it from memory, we ran ELMo and
              | BERT over Wikipedia and then allowed similarity search
              | between a query and contexts, showing heat maps over
              | the matched context. Nothing particularly deep compared
              | to yours, which goes into the transformer "machinery",
              | but I think it captures very well how most Question
              | Answering models still operate: embed the query and
              | contexts in a high-dimensional space, compare, find a
              | semantically plausible span, and done!
             | 
              | [1]: https://twitter.com/Johannes_Welbl/status/106530965474036121...
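              | 
              | In code, that recipe is basically just a cosine
              | similarity search (toy vectors here; any encoder
              | would do):
              | 
              |   import numpy as np
              | 
              |   def search(query_vec, context_vecs, top_k=3):
              |       # normalise, then rank contexts by
              |       # cosine similarity to the query
              |       q = query_vec / np.linalg.norm(query_vec)
              |       C = context_vecs / np.linalg.norm(
              |           context_vecs, axis=1, keepdims=True)
              |       scores = C @ q
              |       return np.argsort(-scores)[:top_k]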
             | 
              | Work and articles like yours have truly had an impact
              | on me, even though they are largely qualitative. We
              | always say "Turing complete" this and "Turing complete"
              | that, but theoretical statements such as these have
              | little practical utility to me, as we all know that
              | what can be learnt and what is learnt are two very
              | different things. For example, "Visualizing and
              | Understanding Recurrent Networks" by Karpathy et al.
              | (2015) [2], which you list as inspiration, blew my mind
              | - for example, the neurons that monotonically decrease
              | from the sentence start. I remember Karpathy giving a
              | talk on it in London, and what struck me was how he had
              | simply gone and inspected the neurons manually
              | (heresy!), as there were only a few thousand of them
              | anyway. That playfulness is truly admirable.
             | 
             | [2]: https://arxiv.org/abs/1506.02078
             | 
             | Another anecdote, now from "Attention Is All You Need" by
             | Vaswani et al. (2017) [3] where I was far from sold on
             | Transformers as a model until Uszkoreit gave a talk at an
             | invitation-only summit where he showed those cherry-picked
             | attention heads that "flipped" based on whether an object
             | was animate or not. I approached him after the talk and
             | asked why it was not in the paper as it was awesome! Maybe
             | I am biased because I give a large role to intuition in
             | science, but analysis such as this is far more valuable to
             | me as a researcher than yet another point of BLEU or a 10th
             | dataset. Again, my bias, but I feel that there is a need
             | for new ways of thinking in terms of both "hard" empiricism
             | and "soft" analysis in machine learning as we seemingly are
             | now having to mature given the attention we are receiving.
             | 
             | [3]: https://arxiv.org/abs/1706.03762
             | 
             | Apologies if I am rambling, it is midnight now and I barely
             | slept last night.
        
               | ptd wrote:
               | You are not rambling. Thanks for sharing.
        
               | jalammar wrote:
               | Hey, I feel you! I'm an intuitive learner as well. I
               | wouldn't have been able to learn much in ML if it weren't
               | for people who write and visualize and make the methods
               | accessible to non-experts. In my case, as with many
               | others, it was the writing and videos of Andrew Ng,
               | Karpathy, Chris Olah, Nando de Freitas, Sebastian Ruder,
               | Andrew Trask, and Denny Britz amongst others. Accessible
               | content like this goes a long way in building the
               | confidence to further pursue the topic and not be
                | intimidated by the steep learning curve. It fills me
                | with joy that you've found some of my work helpful.
               | 
                | Thanks for digging up the screenshot. Exploring
                | contextualized word embeddings is truly fascinating.
                | And thanks for sharing your experience!
        
       | indymike wrote:
       | Helping people understand "what the ai is thinking" is really
       | important when you are trying to get organizations to adopt the
       | technology. Great work.
        
         | nathanyz wrote:
          | Exactly, and maybe we can "lobotomize" sections of the
          | models that replicate unwanted bias in the training data.
        
       | anfal_alatawi wrote:
       | Thank you, Jay! I appreciate the addition of the colab notebooks
       | with code examples. I can't wait to play around with this and
       | investigate how language models _speak_.
        
         | jalammar wrote:
         | Thanks! Please let me know if you have any feedback!
        
       | GistNoesis wrote:
        | Interesting. The non-negative matrix factorization on the
        | first level kinda highlights some semantic groupings:
        | paragraphs, verbs, auxiliaries, commas, pronouns, nominal
        | propositions.
        | 
        | I tried to look at higher-level layers, and the groupings
        | were indeed higher level: for example, at level 4 there was
        | a grouping which highlighted any punctuation (and not just
        | commas). The groupings were also qualifying more: for
        | example, "would deliberately", whereas at a lower level it
        | was just "would".
        | 
        | But it's not as clear as I had hoped it would be. I hoped it
        | would somehow highlight groupings of larger and larger size,
        | which could nicely map to the equivalent of a parse tree.
       | 
        | The problem I have with this kind of visualization is that
        | it often requires interpretation. It also can't tell me
        | whether structure that was really present in the neural
        | network was simply hidden by the prism of the non-negative
        | matrix factorization.
        | 
        | For my own networks, instead of visualizing, I like to
        | quantify things a little more. I give the neural network
        | some additional layers, and I try to make the neural network
        | produce the visualization directly. I give it some examples
        | of what I'd like the visualization to look like, and jointly
        | train/fine-tune the neural network so that it simultaneously
        | solves its original task and produces the visualization,
        | which is then easier to inspect.
       | 
        | Depending on how many additional layers I had to add, where
        | they were added, and how accurate (measured by a loss
        | function!) the network's predictions are, I can better infer
        | how it's working internally, and whether the network is
        | really doing the work or taking some mental shortcuts.
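        | 
        | A toy sketch of that joint objective (the model, heads, and
        | loss weight here are all made up):
        | 
        |   import torch
        |   import torch.nn as nn
        |   import torch.nn.functional as F
        | 
        |   base = nn.TransformerEncoderLayer(d_model=256, nhead=4)
        |   task_head = nn.Linear(256, 10)  # original task
        |   viz_head = nn.Linear(256, 3)    # visualization target
        | 
        |   params = (list(base.parameters())
        |             + list(task_head.parameters())
        |             + list(viz_head.parameters()))
        |   opt = torch.optim.Adam(params)
        | 
        |   def step(x, y_task, y_viz):
        |       h = base(x)  # (seq, batch, 256)
        |       task_loss = F.cross_entropy(
        |           task_head(h).mean(0), y_task)
        |       viz_loss = F.mse_loss(viz_head(h), y_viz)
        |       # train both objectives jointly
        |       loss = task_loss + 0.1 * viz_loss
        |       opt.zero_grad(); loss.backward(); opt.step()
        |       return task_loss.item(), viz_loss.item()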
       | 
        | For example, in my Colorify [1] browser extension, which
        | aims to reduce the cognitive load of reading, I use neural
        | networks to simultaneously predict visualizations of
        | sentence groupings, linguistic features, and even the parse
        | tree.
       | 
       | [1] https://addons.mozilla.org/en-US/firefox/addon/colorify/
        
         | jalammar wrote:
          | Interesting. Thanks for sharing your notes on the higher
          | layers. Allow me to repost that to the discussion board on
          | GitHub.
          | 
          | I do get your point on interpretation. This work is just a
          | starting point. I'm curious to arrive at ways to
          | automatically select the appropriate number of factors for
          | a specific sequence - kind of like the elbow method for
          | K-means clustering.
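          | 
          | Something like this could sweep the factor count and look
          | for the elbow in the reconstruction error (a sketch, not
          | Ecco code):
          | 
          |   from sklearn.decomposition import NMF
          | 
          |   def reconstruction_errors(merged_act, ks):
          |       errs = []
          |       for k in ks:
          |           m = NMF(n_components=k, init='nndsvd',
          |                   max_iter=500)
          |           m.fit(merged_act)
          |           # error remaining after a rank-k fit
          |           errs.append(m.reconstruction_err_)
          |       return errs
          | 
          |   # pick k where the error curve flattens out
          |   errs = reconstruction_errors(merged_act, range(2, 16))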
        
       | blackbear_ wrote:
       | Any examples of novel insights obtained with this method?
        
         | amrrs wrote:
         | It's also mentioned in this video
         | https://youtu.be/gJPMXgvnX4Y?t=429
        
         | jalammar wrote:
          | What I found most fascinating is identifying neuron firing
          | patterns corresponding to linguistic properties: e.g.
          | groups of neurons that fire in response to verbs, or
          | pronouns.
         | 
         | Scroll down to "Factorizing Activations of a Single Layer" in
         | https://jalammar.github.io/explaining-transformers/ to see
         | those.
         | 
          | The figure above it, titled 'Explorable: Ten Activation
          | Factors of XML', shows neuron firing patterns in response
          | to XML -- opening tags, closing tags, and even indentation.
         | 
         | It's still fresh, but I'm keen to see what other people uncover
         | in their examinations (or what shortfalls/areas of improvement
         | there are for such a method).
        
       | yowlingcat wrote:
       | Wow, love the NNMF visualization. Like all great visualizations,
       | it does a very good job of showing and not telling me what's
       | going on. More of this, please. One question: how does this kind
       | of thing line up with what people describe as "explainable AI?"
        
         | gfody wrote:
         | It's not explainable until all these weights are between
         | unambiguous concepts in a knowledge base rather than plain text
         | tokens that must be interpreted. For some reason we gave up on
         | symbolic AI in the 70's and decided making machines write
         | poetry is where the money's at.
        
         | jalammar wrote:
          | These are AI explanation methods. They belong to the same
          | toolbox as LIME, Shapley values, etc. Input saliency, for
          | instance, is a gradient-based explanation method.
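          | 
          | A rough gradient-x-input saliency sketch with Hugging Face
          | transformers (the model and prompt are just examples, not
          | Ecco's implementation):
          | 
          |   import torch
          |   from transformers import (AutoModelForCausalLM,
          |                             AutoTokenizer)
          | 
          |   tok = AutoTokenizer.from_pretrained("gpt2")
          |   model = AutoModelForCausalLM.from_pretrained("gpt2")
          | 
          |   ids = tok("The keys to the cabinet",
          |             return_tensors="pt").input_ids
          |   emb = model.get_input_embeddings()(ids)
          |   emb = emb.detach().requires_grad_(True)
          | 
          |   logits = model(inputs_embeds=emb).logits
          |   # backprop from the top next-token logit
          |   logits[0, -1].max().backward()
          | 
          |   # one saliency score per input token
          |   saliency = (emb.grad * emb).norm(dim=-1).squeeze(0)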
        
       | khalidlafi wrote:
       | looks great!
        
         | jalammar wrote:
         | Thank you!
        
       ___________________________________________________________________
       (page generated 2021-01-08 23:01 UTC)