[HN Gopher] Show HN: Ecco - See what your NLP language model is ...
___________________________________________________________________
 
Show HN: Ecco - See what your NLP language model is "thinking"
 
Author : jalammar
Score  : 153 points
Date   : 2021-01-08 12:08 UTC (10 hours ago)
 
(HTM) web link (www.eccox.io)
(TXT) w3m dump (www.eccox.io)
 
| shenberg wrote:
| NMF for factorizing activations is brilliant!
| 
| ZeroCool2u wrote:
| Fantastic work. This is the kind of stuff we need to get these
| models actually adopted and integrated into non-tech
| organizations.
| 
| pizza wrote:
| One small step on the path towards solid-state intelligence
| 
| Der_Einzige wrote:
| This work is awesome!
| 
| Are there theoretical reasons to choose NMF over other
| dimensionality reduction algorithms, e.g. UMAP?
| 
| Is it easy to add other DR algorithms? I may submit a PR adding
| those in if it is...
| 
| jalammar wrote:
| I actually started with PCA, but NMF proved more understandable,
| since negative dimensions in PCA are hard to interpret. I didn't
| consider UMAP, but I would be interested to see how it performs
| here.
| 
| It should be easy, yeah. For NMF, the activations tensor is
| reshaped from (layers, neurons, token position) down into
| (layers/neurons, token position), and we present that to
| sklearn's NMF model. I would assume UMAP would operate on that
| same matrix. That matrix is called 'merged_act' and is located
| here:
| https://github.com/jalammar/ecco/blob/1e957a4c1c9bd49c203993...
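A minimal sketch of the reshape-and-factorize step described in the
comment above. It reuses the 'merged_act' name from the comment, but
the array shapes and the random stand-in data are illustrative
assumptions, not Ecco's actual code:

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical activations captured while generating 50 tokens:
# (layers, neurons per layer, token positions). NMF needs
# non-negative input, hence the abs() on this stand-in data.
activations = np.abs(np.random.randn(6, 3072, 50))

# Collapse the layer and neuron axes into one, as described above:
# (layers * neurons, token positions)
merged_act = activations.reshape(-1, activations.shape[-1])

# Factorize into a handful of components; each component groups
# neurons that tend to fire at the same token positions.
nmf = NMF(n_components=10, init='random', random_state=0)
W = nmf.fit_transform(merged_act)  # (layers * neurons, components)
H = nmf.components_                # (components, token positions)
print(W.shape, H.shape)
```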
| jalammar wrote:
| Hi HN,
| 
| Author here. I had been fascinated with Andrej Karpathy's article
| (https://karpathy.github.io/2015/05/21/rnn-effectiveness/) --
| especially where it shows neurons being activated in response to
| brackets and indentation.
| 
| I built Ecco to enable examining neurons inside Transformer-based
| language models.
| 
| You can use Ecco to simply interact with a language model and see
| its output token by token (as it's built on the awesome Hugging
| Face transformers package). But more interestingly, you can use
| it to examine neuron activations. The article explains more:
| https://jalammar.github.io/explaining-transformers/
| 
| I have a couple more visualizations I'd like to add in the
| future. It's open source, so feel free to help me improve it.
| 
| Grimm1 wrote:
| This is fantastic. I used your earlier transformers article to
| first get a real grasp on the architecture. I hope you expand
| this to accommodate other modes of attention outside the
| transformer paradigm as well!
| 
| jalammar wrote:
| Wonderful! Thanks!
| 
| I am curious about those recent O(L) attention transformers (see
| slide 106 of
| http://gabrielilharco.com/publications/EMNLP_2020_Tutorial__...).
| If these methods are converging towards a new self-attention
| mechanism, I'd love to try illustrating that.
| 
| What other attention modes are you referring to? Did something
| in particular catch your attention?
| 
| Grimm1 wrote:
| Personally, I implemented this just yesterday:
| 
| https://arxiv.org/pdf/1703.03130.pdf
| 
| It's a bit older now, but I was looking for a self-attention
| method that doesn't require a transformer model, and this paper
| proposed an interesting implementation that wound up being very
| successful for my problem case.
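For context, the mechanism in that paper computes attention weights
directly from an RNN's hidden states, with no transformer involved. A
minimal sketch under the paper's formulation A = softmax(W2 tanh(W1
H^T)); the layer sizes here are illustrative assumptions, not
Grimm1's actual settings:

```python
import torch
import torch.nn as nn

class StructuredSelfAttention(nn.Module):
    """Self-attentive sentence embedding in the style of Lin et
    al. (2017): multi-head attention over e.g. BiLSTM hidden
    states, pooling a variable-length sequence into a fixed-size
    matrix."""

    def __init__(self, hidden_dim=256, attn_dim=64, n_heads=4):
        super().__init__()
        self.w1 = nn.Linear(hidden_dim, attn_dim, bias=False)
        self.w2 = nn.Linear(attn_dim, n_heads, bias=False)

    def forward(self, h):
        # h: (batch, seq_len, hidden_dim), e.g. BiLSTM outputs
        a = torch.softmax(self.w2(torch.tanh(self.w1(h))), dim=1)
        # a: (batch, seq_len, n_heads) -- attention over positions
        m = a.transpose(1, 2) @ h  # (batch, n_heads, hidden_dim)
        return m, a

# Example: pool a batch of 8 sequences of length 20 into
# fixed-size sentence embeddings.
attn = StructuredSelfAttention()
m, a = attn(torch.randn(8, 20, 256))
print(m.shape, a.shape)  # (8, 4, 256) and (8, 20, 4)
```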
| airstrike wrote:
| I just want to say I absolutely love the name and logo. Brings
| back some fond memories of an incredibly hard game from once
| upon a time...
| 
| Having said that, IANAL, but I find it unlikely that the use of
| a dolphin and the word Ecco together are not trademarked, so you
| may want to check on that before someone bugs you about it.
| 
| cmrdsprklpny wrote:
| "Ecco the Dolphin" is a game for Sega consoles.
| https://en.wikipedia.org/wiki/Ecco_the_Dolphin
| 
| airstrike wrote:
| Yes, that's precisely what I meant.
| 
| ninjin wrote:
| I cannot thank you enough for your "The Illustrated Transformer"
| [1], which I have directed two cohorts of MSc students to - it
| is a true gem of an article. A few years ago my group made an
| interface to visualise contextual word representations [2] that
| looked like a primordial-soup ancestor of your most recent
| article (no screenshots though, sadly). I hope putting these
| together brings you as much joy as it does to your fans in
| academia and education like myself reading it. Despite Chris
| Olah's effort with Distill, I still think we lack a good way to
| give efforts like yours the amount of credit they deserve.
| 
| [1]: https://jalammar.github.io/illustrated-transformer
| 
| [2]: https://github.com/uclnlp/muppetshow
| 
| tchalla wrote:
| I also want to add a "Thank You" note for the author's lovely
| "The Illustrated Word2Vec" [0]. I wish every concept, Machine
| Learning or otherwise, were explained with such a framework.
| 
| [0] https://jalammar.github.io/illustrated-word2vec/
| 
| jalammar wrote:
| I'd love to look at your group's visualizations! Is it a private
| repo? Because the link doesn't open up. It never ceases to blow
| my mind that we can represent words and concepts as vectors of
| numbers.
| 
| Thanks for your kind words! It's a labor of passion, honestly.
| And while in previous years it was a nights-and-weekends
| project, I have recently been giving it my entire time and focus
| -- which is why I'm able to dip more heavily into R&D like Ecco
| and the "Explaining Transformers" article.
| 
| ninjin wrote:
| Yikes, you are right... I just linked a private repo. '^^ I have
| poked the rest of the group and it seems that at least a tweet
| was made [1] - but not much else remains. Describing it from
| memory: we ran ELMo and BERT on Wikipedia and then allowed
| similarity search between a query and the contexts, showing heat
| maps over a matched context. Nothing particularly deep compared
| to your article, which goes into the transformer "machinery",
| but I think it captures very well how most Question Answering
| models still operate: embed the query and the contexts in a
| high-dimensional space, compare, find a semantically plausible
| span, and done!
| 
| [1]: https://twitter.com/Johannes_Welbl/status/106530965474036121...
| 
| Work and articles like yours have truly had an impact on me,
| even though they are largely qualitative. We always say "Turing
| complete" this and "Turing complete" that, but theoretical
| statements like these have little practical utility to me, as we
| all know that what can be learnt and what is learnt are two very
| different things. For example, "Visualizing and Understanding
| Recurrent Networks" by Karpathy et al. (2015) [2], which you
| list as inspiration, blew my mind with, for example, neurons
| that monotonically decrease from the sentence start. I remember
| Karpathy giving a talk on it in London, and what struck me was
| how he had simply gone and inspected the neurons manually
| (heresy!), as there were only a few thousand of them anyway.
| That playfulness, truly admirable.
| 
| [2]: https://arxiv.org/abs/1506.02078
| 
| Another anecdote, now from "Attention Is All You Need" by
| Vaswani et al. (2017) [3]: I was far from sold on Transformers
| as a model until Uszkoreit gave a talk at an invitation-only
| summit where he showed those cherry-picked attention heads that
| "flipped" based on whether an object was animate or not. I
| approached him after the talk and asked why it was not in the
| paper, as it was awesome! Maybe I am biased because I give a
| large role to intuition in science, but analysis like this is
| far more valuable to me as a researcher than yet another point
| of BLEU or a 10th dataset. Again, my bias, but I feel there is a
| need for new ways of thinking about both "hard" empiricism and
| "soft" analysis in machine learning, as we seemingly now have to
| mature given the attention we are receiving.
| 
| [3]: https://arxiv.org/abs/1706.03762
| 
| Apologies if I am rambling; it is midnight now and I barely
| slept last night.
| 
| ptd wrote:
| You are not rambling. Thanks for sharing.
| 
| jalammar wrote:
| Hey, I feel you! I'm an intuitive learner as well. I wouldn't
| have been able to learn much in ML if it weren't for people who
| write and visualize and make the methods accessible to
| non-experts. In my case, as with many others, it was the writing
| and videos of Andrew Ng, Karpathy, Chris Olah, Nando de Freitas,
| Sebastian Ruder, Andrew Trask, and Denny Britz, amongst others.
| Accessible content like this goes a long way in building the
| confidence to further pursue the topic and not be intimidated by
| the steep learning curve. It fills me with joy that you've found
| some of my work helpful.
| 
| Thanks for digging up the screenshot. Exploring contextualized
| word embeddings is truly fascinating. And thanks for sharing
| your experience!
| 
| indymike wrote:
| Helping people understand "what the AI is thinking" is really
| important when you are trying to get organizations to adopt the
| technology. Great work.
| 
| nathanyz wrote:
| Exactly, and maybe we can "lobotomize" sections of the models
| that replicate unwanted bias in the training data.
| 
| anfal_alatawi wrote:
| Thank you, Jay! I appreciate the addition of the Colab notebooks
| with code examples. I can't wait to play around with this and
| investigate how language models _speak_.
| 
| jalammar wrote:
| Thanks! Please let me know if you have any feedback!
| 
| GistNoesis wrote:
| Interesting. The non-negative matrix factorization on the first
| layer kinda highlights some semantic groupings: paragraphs,
| verbs, auxiliaries, commas, pronouns, nominal propositions.
| 
| I tried to look at higher layers, and the groupings were indeed
| higher level: for example, at level 4 there was a grouping which
| highlighted any punctuation (and not just commas). The groupings
| were also more qualified: for example, "would deliberately",
| whereas at a lower level it was just "would".
| 
| But it's not as clear as I had hoped it would be. I hoped it
| would highlight groupings of larger and larger size that could
| map nicely to the equivalent of a parse tree.
| 
| The problem I have with this kind of visualization is that it
| often requires interpretation. Also, it doesn't tell me whether
| structure was really missing from the neural network, or present
| but hidden by the prism of the Non-negative Matrix
| Factorization.
| 
| For my own networks, instead of visualizing, I like to quantify
| things a little more. I give the neural network some additional
| layers, and I try to make the neural network produce the
| visualization directly. I give it some examples of what I'd like
| the visualization to look like, and jointly train/fine-tune the
| neural network so that it simultaneously solves its original
| task and produces the visualization, which is then easier to
| inspect.
| 
| Depending on how many additional layers I had to add, where they
| were added, and how accurate (measured by a loss function!) the
| network's predictions are, I can better infer how it's working
| internally, and whether the network is really doing the work or
| taking some mental shortcuts.
| 
| For example, in my Colorify [1] browser extension, which aims to
| reduce the cognitive load of reading, I use neural networks to
| simultaneously predict visualizations of sentence grouping,
| linguistic features, and even the parse tree.
| 
| [1] https://addons.mozilla.org/en-US/firefox/addon/colorify/
| 
| jalammar wrote:
| Interesting. Thanks for sharing your notes on the higher layers.
| Allow me to repost that to the discussion board on GitHub.
| 
| I do get your point on interpretation. This work is just a
| starting point. I'm curious to arrive at ways to automatically
| select the appropriate number of factors for a specific
| sequence, kind of like the elbow method for K-means clustering.
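One way that selection could look in practice: sweep the number of
NMF components over the same 'merged_act' matrix and watch the
reconstruction error for an elbow. This is illustrative only -- not
part of Ecco -- and the data below is a random stand-in:

```python
import numpy as np
from sklearn.decomposition import NMF

def nmf_elbow_curve(merged_act, max_factors=15):
    """Fit NMF with an increasing number of factors and record the
    reconstruction error. A bend ('elbow') in this curve -- where
    extra factors stop paying off -- suggests a reasonable number
    of components, analogous to the elbow method for K-means."""
    errors = []
    for k in range(1, max_factors + 1):
        model = NMF(n_components=k, init='nndsvd', max_iter=500)
        model.fit(merged_act)
        errors.append(model.reconstruction_err_)
    return errors

# Random non-negative stand-in for a (layers * neurons, positions)
# activations matrix:
merged_act = np.abs(np.random.randn(200, 50))
for k, err in enumerate(nmf_elbow_curve(merged_act), start=1):
    print(f"{k:2d} factors: reconstruction error {err:.2f}")
```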
| blackbear_ wrote:
| Any examples of novel insights obtained with this method?
| 
| amrrs wrote:
| It's also mentioned in this video:
| https://youtu.be/gJPMXgvnX4Y?t=429
| 
| jalammar wrote:
| What I found most fascinating is identifying neuron firing
| patterns corresponding to linguistic properties: e.g. groups of
| neurons that fire in response to verbs, or pronouns.
| 
| Scroll down to "Factorizing Activations of a Single Layer" in
| https://jalammar.github.io/explaining-transformers/ to see
| those.
| 
| The figure above it, titled "Explorable: Ten Activation Factors
| of XML", shows neuron firing patterns in response to XML --
| opening tags, closing tags, and even indentation.
| 
| It's still fresh, but I'm keen to see what other people uncover
| in their examinations (or what shortfalls/areas of improvement
| there are for such a method).
| 
| yowlingcat wrote:
| Wow, love the NNMF visualization. Like all great visualizations,
| it does a very good job of showing, not telling, me what's going
| on. More of this, please. One question: how does this kind of
| thing line up with what people describe as "explainable AI"?
| 
| gfody wrote:
| It's not explainable until all these weights are between
| unambiguous concepts in a knowledge base rather than plain-text
| tokens that must be interpreted. For some reason we gave up on
| symbolic AI in the '70s and decided making machines write poetry
| is where the money's at.
| 
| jalammar wrote:
| These are AI explanation methods. They belong to the same
| toolbox as LIME, Shapley values, etc. Input saliency is a
| gradient-based explanation method.
| 
| khalidlafi wrote:
| looks great!
| 
| jalammar wrote:
| Thank you!
___________________________________________________________________
(page generated 2021-01-08 23:01 UTC)