[HN Gopher] An Open Version of WordNet

Author : syats
Score  : 49 points
Date   : 2020-07-30 13:50 UTC (1 day ago)
Link   : en-word.net

rohan1024 wrote:
| A fun way to access WordNet, hosted by dict.org:
|
|     nc dict.org 2628
|     DEFINE wn hacker
|
| Don't think it's open, though.

minerjoe wrote:
| It's open. You can install your own copy locally.
|
| Arch has an AUR package, dict-wn, that does this. If you don't use Arch, you can still clone the PKGBUILD and see how it grabs, compiles and installs the data.
|
|     git clone https://aur.archlinux.org/dict-wn

frank2 wrote:
| I remember installing the WordNet database on my Arch Linux install 17 years ago so I would have a dictionary I could use without internet connectivity. WordNet can be used as a traditional dictionary, although it is not very good compared to, e.g., the dictionary that comes with any Mac or iPhone.

Yeroc wrote:
| Is there a similar API with the etymology of words? I played with this a bit and it doesn't seem to cover that area.

jatsign wrote:
| An animated graph of WordNet links may help explain why it's useful: https://www.wordsapi.com/

syats wrote:
| WordNet is like a dictionary in that it contains definitions and synonyms of words. It goes beyond a dictionary in that it also records relationships like hypernym (broader) and hyponym (narrower), which can be useful for "understanding" (whatever that means) text. It is a graph that connects different senses (called synsets) to one another, and senses to words. It used to be released under a closed license and was poorly maintained; now there's a fork of it on GitHub to which anyone can contribute.

tasogare wrote:
| WordNet has a "gloss" field, but it's very lacking if you use it as a traditional dictionary. Its value lies in the graph of synsets.
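To make the "graph of synsets" idea concrete, here is a minimal Python sketch of the structure described above: words map to senses (synsets), each synset carries one gloss shared by all its words, and hypernym edges point from narrower synsets to broader ones. The synset ids, lemmas, glosses and edges are hypothetical toy data written for illustration, not entries copied from WordNet, and the helper names (`senses`, `hyponyms`, `hypernym_closure`) are invented here; real WordNet releases use their own file formats and APIs.

```python
from collections import deque

# Hypothetical toy data in the spirit of WordNet: each synset (sense) has
# several lemmas (words) and ONE gloss shared by all of them.
SYNSETS = {
    "animal": {"lemmas": ["animal", "beast"], "gloss": "a living organism that can move about"},
    "canid": {"lemmas": ["canid", "canine"], "gloss": "a mammal of the dog family"},
    "dog": {"lemmas": ["dog", "domestic_dog"], "gloss": "a domesticated member of the dog family"},
    "wolf": {"lemmas": ["wolf"], "gloss": "a wild member of the dog family"},
    "chase": {"lemmas": ["chase", "dog", "tail"], "gloss": "follow persistently"},
}

# Hypernym edges: narrower synset -> list of broader synsets.
HYPERNYMS = {
    "canid": ["animal"],
    "dog": ["canid"],
    "wolf": ["canid"],
}


def senses(word):
    """All synsets whose lemma list contains `word` (one word, many senses)."""
    return [sid for sid, data in SYNSETS.items() if word in data["lemmas"]]


def hyponyms(synset):
    """Narrower synsets, found by inverting the hypernym edges."""
    return sorted(child for child, parents in HYPERNYMS.items() if synset in parents)


def hypernym_closure(synset):
    """Breadth-first walk up the hypernym edges, nearest ancestors first."""
    seen, order = set(), []
    queue = deque(HYPERNYMS.get(synset, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        queue.extend(HYPERNYMS.get(node, []))
    return order


if __name__ == "__main__":
    for sid in senses("dog"):
        print(sid, "-", SYNSETS[sid]["gloss"])
    print(hypernym_closure("dog"))
    print(hyponyms("canid"))
```

Looking up "dog" here returns two senses, and the gloss of the "chase" sense never mentions "dog": in this layout the gloss belongs to the synset, not to a (word, sense) pair.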

compressedgas wrote:
| The problem with the gloss and example fields is that they are per synset, not per (word, sense) pair as they would be in a normal dictionary.
|
| This means that if you try to use WordNet as a normal dictionary, the glosses tend not to contain the word whose senses you are listing.

Isamu wrote:
| WordNet is already open; I think the advantage of this one is that it is being actively maintained.
|
| From WordNet's license:
|
| > Permission to use, copy, modify and distribute this software and database and its documentation for any purpose and without fee or royalty is hereby granted, provided that you agree to comply with the following copyright notice and statements, including the disclaimer, and that the same appear on ALL copies of the software, database and documentation, including modifications that you make for internal use or for distribution.

rrose wrote:
| It... doesn't know the word "how"? That was literally the second word I typed in. It seems like this must have some holes in it?

Bellamy wrote:
| What exactly can you do or produce with this graph of connected words? I can't see the benefits.

suyash wrote:
| Yes, I'm also interested in use cases for this.

abhgh wrote:
| Using WordNet was a very popular way to perform "knowledge-rich" NLP from the late '90s up to around 2010 (approximate timeline). "Knowledge-rich" meant you could start with some understanding of the language and not rely solely on the data at hand, much like the role pretrained models such as GloVe play today (WordNet is probably closer to dependency-based word vectors [1]). Some interesting uses were query expansion [2], sense disambiguation [3], word similarities (a popular one is Wu-Palmer similarity; check out NLTK), and an interesting area called "lexical chains" [4]: groups of related words running through a text, whose "weave" signals its topics.
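The Wu-Palmer similarity mentioned in the comment above scores two senses by how deep their lowest common subsumer (their most specific shared hypernym) sits relative to the senses themselves: 2 * depth(LCS) / (depth(a) + depth(b)). Below is a minimal from-scratch Python sketch over a hypothetical single-inheritance toy taxonomy (not real WordNet data); NLTK's `wup_similarity` works on the real WordNet graph, handles multiple hypernym paths, and may count depth slightly differently, so its numbers will not match these.

```python
# Hypothetical toy taxonomy (child -> its single parent); not real WordNet data.
PARENT = {
    "animal": "entity",
    "mammal": "animal",
    "canid": "mammal",
    "feline": "mammal",
    "dog": "canid",
    "wolf": "canid",
    "cat": "feline",
}


def path_to_root(node):
    """Return [node, parent, grandparent, ..., root]."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(node):
    """Depth counted in nodes, so the root ("entity") has depth 1."""
    return len(path_to_root(node))


def lowest_common_subsumer(a, b):
    """Most specific node that is an ancestor (or self) of both a and b."""
    ancestors_of_b = set(path_to_root(b))
    # Walking upward from a, the first shared node is the deepest one.
    for node in path_to_root(a):
        if node in ancestors_of_b:
            return node
    return None


def wu_palmer(a, b):
    """Wu-Palmer similarity: 2 * depth(LCS) / (depth(a) + depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    if lcs is None:
        return 0.0
    return 2 * depth(lcs) / (depth(a) + depth(b))


if __name__ == "__main__":
    print(lowest_common_subsumer("dog", "wolf"), wu_palmer("dog", "wolf"))
    print(lowest_common_subsumer("dog", "cat"), wu_palmer("dog", "cat"))
```

With this toy taxonomy, "dog" and "wolf" share the subsumer "canid" (depth 4) and score 2 * 4 / (5 + 5) = 0.8, while "dog" and "cat" only share "mammal" (depth 3) and score 0.6. The same shared-hypernym lookup is what lets a "dog"/"wolf" group be named by its broader category.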
|
| The arrival of WordNet on the scene was a big deal, since there weren't many ways to perform knowledge-rich NLP back then; the common ones were using a dictionary or a thesaurus. There were also efforts to tie topic models to WordNet, like LDAWN [5], and to extend WordNet itself using collocation information gleaned from its glosses: "eXtended WordNet" [6].
|
| You still occasionally see it used where you need some kind of rich prior knowledge: for example, the "Hierarchical Probabilistic Neural Network Language Model" by Morin and Bengio [7], or cluster labeling that combines embeddings with WordNet [8]. To quote an example from the latter, 'a word cluster containing words "dog" and "wolf" should not be labeled with either word, but as "canids"'. You can tell that "canids" is the super-category here by looking up the precise relationships in WordNet.
|
| My own Master's research looked at combining WordNet-based lexical chaining with more "ML"-ish techniques like Hidden Markov Models [9], which is why I know (or rather, vaguely remember) some of what was happening back then :-)
|
| I think the primary reason WordNet did not retain its popularity is that it was a good "one-off" solution: it worked well with "correct" English. You want to adapt it to your domain vocabulary? Heuristics. You want to use WordNet in another language? Well, someone needs to build one first. You want to use it to process text in internet lingo? Nope: hybrid models and heuristics. Also, at this time the amount of text available to train on was increasing by leaps and bounds, so the field moving toward ML-heavy techniques made sense.
|
| [1] https://www.aclweb.org/anthology/P14-2050.pdf
| [2] https://www.aclweb.org/anthology/P08-1017.pdf
| [3] https://pdfs.semanticscholar.org/7f2c/b3e390c5e539ef9089014a...
| [4] http://www.cs.columbia.edu/nlp/papers/2003/galley_mckeown_03...
| [5] https://wordnet.cs.princeton.edu/papers/jbg-EMNLP07.pdf
| [6] https://en.wikipedia.org/wiki/EXtended_WordNet
| [7] https://www.iro.umontreal.ca/~lisa/pointeurs/hierarchical-nn...
| [8] https://www.aclweb.org/anthology/U18-1008/
| [9] https://pdfs.semanticscholar.org/e7ce/34e5acdbb7a91e28fdafa9...

defen wrote:
| You can build a company called Applied Semantics and then sell your tech to Google so they can develop products called "AdSense" and "AdWords" and make trillions of dollars of revenue. However, first you'll need to invent a time machine that can take you back 20 years.

azinman2 wrote:
| It used to be used in NLP, but never to great success. Word embeddings are a far more powerful way to achieve many of the same goals, with easier computation, easier scalability to other languages, and accommodation for new or personal words that aren't (yet) in the dictionary.

(page generated 2020-07-31 23:00 UTC)