hngopher.com

       [HN Gopher] An Open Version of WordNet
       ___________________________________________________________________
        
       An Open Version of WordNet
        
       Author : syats
       Score  : 49 points
       Date   : 2020-07-30 13:50 UTC (1 days ago)
        
 (HTM) web link (en-word.net)
        
       | rohan1024 wrote:
       | A fun way to access WordNet hosted by dict.org
       | nc dict.org 2628
       | DEFINE wn hacker
       | 
       | Don't think it's open though.
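
       The nc session above speaks the DICT protocol (RFC 2229). A
       minimal Python sketch of the same exchange follows. The helper
       names are made up for illustration, and the status-code handling
       (a 151 line opens a definition, a lone "." closes it) reflects
       one reading of the RFC, not a tested client.

```python
import socket

DICT_PORT = 2628  # port used in the nc example above


def build_define(database: str, word: str) -> bytes:
    """Return a DICT DEFINE command line, CRLF-terminated."""
    return f"DEFINE {database} {word}\r\n".encode("utf-8")


def parse_definitions(lines):
    """Collect definition bodies from already-split DICT response lines.

    A "151 ..." status line opens a definition; a line holding only "."
    closes it. Lines outside a definition (150, 250, 552, ...) are skipped.
    """
    definitions, current = [], None
    for line in lines:
        if current is None:
            if line.startswith("151 "):
                current = []
        elif line == ".":
            definitions.append("\n".join(current))
            current = None
        else:
            # A leading ".." is an escaped "." inside the text body.
            current.append(line[1:] if line.startswith("..") else line)
    return definitions


def define(word, database="wn", host="dict.org"):
    """Query a live DICT server (needs network access; hypothetical helper)."""
    with socket.create_connection((host, DICT_PORT), timeout=10) as sock:
        reader = sock.makefile("r", encoding="utf-8", newline="\r\n")
        reader.readline()  # discard the 220 greeting banner
        sock.sendall(build_define(database, word) + b"QUIT\r\n")
        lines = [raw.rstrip("\r\n") for raw in reader]
    return parse_definitions(lines)
```

       Calling define("hacker") would mirror the nc session; the
       parsing step can be tried offline on canned response lines.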
        
         | minerjoe wrote:
         | It's open. You can install your own copy locally.
         | 
         | Arch has an AUR package, dict-wn, that does this. For those
         | not using Arch, you can still clone the PKGBUILD and see how
         | it grabs, compiles, and installs everything.
         | 
         | git clone https://aur.archlinux.org/dict-wn
        
           | frank2 wrote:
           | I remember installing the Wordnet database on my Arch Linux
           | install 17 years ago so I would have a dictionary I could use
           | without internet connectivity. Wordnet can be used as a
           | traditional dictionary although it is not very good compared
           | to, e.g., the dictionary that comes with any Mac or iPhone.
        
       | Yeroc wrote:
       | Is there a similar API with the etymology of the words? I played
       | with this a bit and it doesn't seem to cover this area.
        
       | jatsign wrote:
       | An animated graph of wordnet links may help explain why it's
       | useful: https://www.wordsapi.com/
        
       | syats wrote:
       | Wordnet is like a dictionary in that it contains definitions and
       | synonyms of words. It goes beyond a dictionary in that it also
       | records relationships like hypernym (broader) and hyponym
       | (narrower), which can be useful for "understanding" (whatever
       | that means) text. It is a graph that connects different senses
       | (called synsets) to each other, and also senses to words. It
       | used to be released under a closed license and was poorly
       | maintained; now there's a fork of it on GitHub to which anyone
       | can contribute.
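
       As a rough illustration of the graph described above, here is a
       toy sketch in Python. The synset ids, word lists, and links are
       invented for the example and are not taken from the real
       database, where a synset can also have several hypernyms.

```python
# Hypothetical toy data: each synset (sense) lists the words that express it.
SYNSET_WORDS = {
    "dog.n.01": ["dog", "domestic dog"],
    "canid.n.01": ["canid", "canine"],
    "carnivore.n.01": ["carnivore"],
    "frump.n.01": ["frump", "dog"],
}

# Hypernym (broader) links between synsets; hyponyms are the inverse.
HYPERNYMS = {
    "dog.n.01": ["canid.n.01"],
    "canid.n.01": ["carnivore.n.01"],
}


def senses(word):
    """Word -> synsets: every sense the word can express."""
    return [sid for sid, words in SYNSET_WORDS.items() if word in words]


def hypernym_chain(synset):
    """Follow the first hypernym link upward until there is none."""
    chain = []
    while HYPERNYMS.get(synset):
        synset = HYPERNYMS[synset][0]
        chain.append(synset)
    return chain


def hyponyms(synset):
    """Synsets that name this one as a hypernym (narrower senses)."""
    return [child for child, parents in HYPERNYMS.items() if synset in parents]
```

       Here senses("dog") returns two synsets, and
       hypernym_chain("dog.n.01") walks up through the broader senses.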
        
         | tasogare wrote:
         | Wordnet has a "gloss" field but it's very lacking if used as a
         | traditional dictionary. Its value lies in the graph of synsets.
        
           | compressedgas wrote:
           | The problem with the gloss or example field is that it is per
           | synset and not per (word, sense or synset) pair, as it would
           | be in a normal dictionary.
           | 
           | This means if you try to use it as a normal dictionary the
           | glosses tend to not contain the word for which you are
           | listing the senses.
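
       A toy sketch of that mismatch, in Python. The synset ids and
       glosses below are invented for illustration. Because each gloss
       hangs off the synset rather than the (word, synset) pair, a
       dictionary-style lookup can return glosses that never mention
       the headword.

```python
# Hypothetical toy data: one gloss per synset, shared by all of its words.
SYNSET_WORDS = {
    "car.n.01": ["car", "auto", "automobile", "machine"],
    "machine.n.01": ["machine"],
}
GLOSSES = {
    "car.n.01": "a motor vehicle with four wheels",
    "machine.n.01": "a device that uses power to perform a task",
}


def dictionary_entry(word):
    """List (synset, gloss) pairs for a word, as a dictionary lookup would."""
    return [(sid, GLOSSES[sid]) for sid, words in SYNSET_WORDS.items() if word in words]


def glosses_without_headword(word):
    """Synsets of the word whose shared gloss does not contain the word."""
    return [sid for sid, gloss in dictionary_entry(word) if word not in gloss.split()]
```

       With this data, every gloss returned for "machine" omits the
       word "machine" itself.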
        
       | Isamu wrote:
       | Wordnet is already open; I think the advantage of this is that
       | it is being actively maintained.
       | 
       | From Wordnet:
       | 
       | > Permission to use, copy, modify and distribute this software
       | and database and its documentation for any purpose and without
       | fee or royalty is hereby granted, provided that you agree to
       | comply with the following copyright notice and statements,
       | including the disclaimer, and that the same appear on ALL copies
       | of the software, database and documentation, including
       | modifications that you make for internal use or for distribution.
        
       | rrose wrote:
       | it... doesn't know the word "how"? literally like the second word
       | i typed in. it seems like this must have some holes in it?
        
       | Bellamy wrote:
       | What exactly can you do/produce with this graph and connection of
       | words? I can't understand the benefits.
        
         | suyash wrote:
         | yes, I'm also interested in use cases for the same.
        
         | abhgh wrote:
         | Using WordNet used to be a very popular way to perform
         | "knowledge-rich" NLP from around the late 90s up to around
         | 2010 (approximate timeline). "Knowledge-rich" meant you could
         | start with some understanding of the language and not rely
         | solely on the data at hand, much like the use case that
         | pretrained models like GloVe serve today (WordNet is probably
         | closer to dependency-based word vectors [1]). Some interesting
         | uses were query expansion [2], sense disambiguation [3], word
         | similarities (popular: Wu-Palmer similarity, check out NLTK),
         | and an interesting area called "lexical chains" [4]: groups
         | of related words running through a text, with their "weave"
         | signifying topics.
         | 
         | The arrival of WordNet on the scene, when it happened, was a
         | big deal, since there weren't many ways to perform knowledge-
         | rich NLP back then. The common ones were using a dictionary or
         | a thesaurus. There was some effort to tie topic models with
         | WordNet too, like LDAWN [5]. And extending it, based on
         | collocation information you could glean from the gloss -
         | "eXtended WordNet" [6].
         | 
         | You still (occasionally) see it used where you need some kind
         | of rich prior knowledge. For example, the "Hierarchical
         | Probabilistic Neural Network Language Model" by Morin and
         | Bengio [7], or cluster labeling (which uses embeddings with
         | WordNet) [8]. To quote an example from the latter, 'a word
         | cluster containing words "dog" and "wolf" should not be labeled
         | with either word, but as "canids"'. And you know "canids" is a
         | super-category here, by looking up the precise relationships in
         | WordNet.
         | 
         | My own Master's research looked at combining WordNet based
         | lexical chaining with more "ML"-ish techniques like Hidden
         | Markov Models [9]. Which is why I know, or rather, vaguely
         | remember, some of the stuff that was happening back then :-)
         | 
         | I think the primary reason why WordNet did not retain its
         | popularity was that it was a good "one-off" solution. It worked
         | well with "correct" English. You want to adapt it to your
         | domain vocabulary? Heuristics. You want to use WordNet in
         | another language? Well, someone needs to build one first. You
         | want to use it to process text in internet lingo? Nope, hybrid
         | models and heuristics. Also, at this time the amount of text
         | available to train on was increasing by leaps and bounds, so
         | the field moving toward ML-heavy techniques made sense.
         | 
         | [1] https://www.aclweb.org/anthology/P14-2050.pdf
         | 
         | [2] https://www.aclweb.org/anthology/P08-1017.pdf
         | 
         | [3]
         | https://pdfs.semanticscholar.org/7f2c/b3e390c5e539ef9089014a...
         | 
         | [4]
         | http://www.cs.columbia.edu/nlp/papers/2003/galley_mckeown_03...
         | 
         | [5] https://wordnet.cs.princeton.edu/papers/jbg-EMNLP07.pdf
         | 
         | [6] https://en.wikipedia.org/wiki/EXtended_WordNet
         | 
         | [7] https://www.iro.umontreal.ca/~lisa/pointeurs/hierarchical-
         | nn...
         | 
         | [8] https://www.aclweb.org/anthology/U18-1008/
         | 
         | [9]
         | https://pdfs.semanticscholar.org/e7ce/34e5acdbb7a91e28fdafa9...
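
       Two of the ideas above, Wu-Palmer similarity and labeling a
       cluster by a shared super-category, can be sketched on a tiny
       hand-built taxonomy. This is a simplified Python illustration,
       not NLTK's implementation: the taxonomy is invented, every node
       has a single parent, and depth is counted with the root at 1.

```python
# Hypothetical toy taxonomy, child -> parent (hypernym). Real WordNet is far
# larger, and a synset there can have more than one hypernym.
PARENT = {
    "carnivore": "animal",
    "canid": "carnivore",
    "feline": "carnivore",
    "dog": "canid",
    "wolf": "canid",
    "cat": "feline",
}


def path_to_root(node):
    """Return the node followed by its ancestors, ending at the root."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(node):
    """Number of nodes on the path to the root (the root has depth 1)."""
    return len(path_to_root(node))


def lowest_common_hypernym(a, b):
    """Deepest node that lies on both paths to the root."""
    ancestors_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_a:
            return node
    raise ValueError(f"{a!r} and {b!r} share no ancestor")


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    lcs = lowest_common_hypernym(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))
```

       In this toy tree, lowest_common_hypernym("dog", "wolf") gives
       "canid", which matches the cluster-labeling example quoted
       above, and wu_palmer scores "dog" closer to "wolf" than to
       "cat".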
        
         | defen wrote:
         | You can build a company called Applied Semantics and then sell
         | your tech to Google so they can develop products called
         | "AdSense" and "AdWords" and make trillions of dollars of
         | revenue. However, first you'll need to invent a time machine
         | that can take you back in time 20 years.
        
         | azinman2 wrote:
         | It used to be used in NLP, but never to great success. Word
         | embeddings are a far more powerful way to achieve a lot of
         | similar goals, but with easier computation, easier scalability
         | to other languages, and accommodation for new/personal words
         | that aren't (yet) in the dictionary.
        
       ___________________________________________________________________
       (page generated 2020-07-31 23:00 UTC)