Text file with all English words and their part of speech

In summary, the speakers are discussing the need for a text file containing all English words with their corresponding parts of speech for use in natural language processing. One speaker mentions a possible source, but also notes that it may not be necessary to have all the words in English for this task. They also mention other reputable sources, such as WordNet and the Brown Corpus, that are used by NLP libraries like Python's NLTK.
  • #1
Superposed_Cat
Hey all, I've been wanting to get into NLP (natural language processing), but I need a text file listing all English words (not the definitions), each with a tag indicating its part of speech. I know it exists because I had it on my old laptop, but I can't seem to find it again. Any help appreciated.
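For illustration, a file of this kind could be as simple as one word and one tag per line. The sketch below assumes a tab-separated layout and generic tag names such as NOUN and VERB; the file the poster actually had is unknown, so the format, the sample data, and the helper name `load_pos_file` are all hypothetical.

```python
import io
from collections import defaultdict

def load_pos_file(handle):
    """Read 'word<TAB>tag' lines into a dict mapping word -> set of tags."""
    lexicon = defaultdict(set)
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        word, tag = line.split("\t", 1)
        lexicon[word.lower()].add(tag)
    return dict(lexicon)

# Made-up sample data standing in for a real file on disk.
SAMPLE = "run\tVERB\nrun\tNOUN\nquickly\tADV\n"
sample_lexicon = load_pos_file(io.StringIO(SAMPLE))
print(sorted(sample_lexicon["run"]))
```

Storing a set of tags per word reflects the fact that many English words (like "run") belong to more than one part of speech.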
 
  • #2
Superposed_Cat said:
Hey all, I've been wanting to get into NLP (natural language processing), but I need a text file listing all English words (not the definitions), each with a tag indicating its part of speech. I know it exists because I had it on my old laptop, but I can't seem to find it again. Any help appreciated.
ALL the words in English? That's going to be one hell of a file, and mostly useless. Of the 1,000,000+ words in English (depending on who you believe), an average speaker has a vocabulary of about 6,000 to 8,000 words and a highly educated one has under 20,000, so even highly educated English speakers use less than 2% of the words in the language (and may have "receptive" knowledge of another 1% or less). I suspect that your list probably had 20,000 to 30,000 words, not "all" the words in English.
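The percentages in this reply follow directly from the quoted figures. A quick check, taking the post's rough 1,000,000-word total at face value (the figure itself is the poster's estimate, and `share_of_language` is just an illustrative helper name):

```python
TOTAL_WORDS = 1_000_000  # the post's rough figure for English as a whole

def share_of_language(vocab_size, total=TOTAL_WORDS):
    """Return vocab_size as a percentage of the total word count."""
    return 100 * vocab_size / total

for label, size in [
    ("average speaker (upper estimate)", 8_000),
    ("highly educated speaker (upper bound)", 20_000),
    ("suggested size of the lost file (upper estimate)", 30_000),
]:
    print(f"{label}: {share_of_language(size):.1f}%")
```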
 
  • #3
I won't be able to help you find your file, but if you want a dictionary with words in it https://github.com/TheBerkin/Rantionary/blob/master/Prepositions.dic is one. It has pronunciation as well.
 
  • #4
http://wordnet.princeton.edu/
WordNet® is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations.

WordNet is often used as a lexical resource for natural language work, and the database is a free download. Python's NLTK uses it, as do a lot of other NLP libraries.
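As a rough sketch of how part-of-speech information comes out of WordNet: it labels each synset with a single-letter code, and the table below maps those codes to names. The helper names `describe_pos` and `wordnet_pos_for` are made up for this example, and `wordnet_pos_for` assumes NLTK is installed and its `wordnet` data package has been downloaded.

```python
# WordNet's single-letter part-of-speech codes.
WORDNET_POS = {
    "n": "noun",
    "v": "verb",
    "a": "adjective",
    "s": "adjective satellite",
    "r": "adverb",
}

def describe_pos(codes):
    """Turn an iterable of WordNet POS codes into a sorted list of unique names."""
    return sorted(WORDNET_POS[code] for code in set(codes))

def wordnet_pos_for(word):
    """Return the parts of speech WordNet lists for a word.

    Requires NLTK plus its 'wordnet' data; imported lazily so the rest of
    this sketch runs without NLTK.
    """
    from nltk.corpus import wordnet as wn
    return describe_pos(synset.pos() for synset in wn.synsets(word))

print(describe_pos(["v", "n", "v"]))
```

Note that WordNet only covers nouns, verbs, adjectives and adverbs, so function words such as prepositions and determiners would have to come from another source.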
 
  • #5
You might want to search for the 'Brown Corpus', one of the earliest and best-known corpora tagged with parts of speech. I don't think any two groups of computational linguists agree on the set of parts of speech, and you may not even need part-of-speech data, depending on what you're doing.
 
  • #6
http://www.nltk.org/nltk_data/

That's the complete list of data sources used by Python's Natural Language Toolkit (NLTK). WordNet and the Brown Corpus are in there, as are others. It's quite a good library.
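One way to get something close to the file the original poster wanted is to derive it from a tagged corpus. The sketch below is only an assumption about how that might be done: it takes any iterable of (word, tag) pairs, counts the tags seen for each word, and formats the result as tab-separated lines. The function names and the toy data are invented for illustration.

```python
from collections import Counter, defaultdict

def build_pos_lexicon(tagged_words):
    """Map each lowercased word to its tags, most frequent tag first."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        counts[word.lower()][tag] += 1
    return {w: [t for t, _ in c.most_common()] for w, c in counts.items()}

def format_lexicon(lexicon):
    """Render the lexicon as 'word<TAB>tag1,tag2' lines, sorted by word."""
    return "\n".join(
        f"{word}\t{','.join(tags)}" for word, tags in sorted(lexicon.items())
    )

# Toy stand-in for a real tagged corpus.
toy_corpus = [
    ("The", "DET"), ("run", "NOUN"),
    ("They", "PRON"), ("run", "VERB"), ("run", "VERB"),
]
toy_lexicon = build_pos_lexicon(toy_corpus)
print(format_lexicon(toy_lexicon))
```

With NLTK installed and its `brown` and `universal_tagset` data packages downloaded, `nltk.corpus.brown.tagged_words(tagset="universal")` should supply pairs of this shape in place of `toy_corpus`. The resulting list covers only words that occur in that corpus, not all of English.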
 

Related to Text file with all English words and their part of speech

1. What is a text file with all English words and their part of speech?

A text file with all English words and their part of speech is a digital document that contains a comprehensive list of all English words and their corresponding part of speech, such as noun, verb, adjective, etc. It is often used for language analysis and natural language processing tasks.

2. Where can I find a text file with all English words and their part of speech?

There are various sources where you can find a text file with all English words and their part of speech, such as online databases, language corpora, and language processing software. You can also create your own text file by compiling a list of words and their part of speech manually or using automated tools.

3. What is the benefit of having a text file with all English words and their part of speech?

A text file with all English words and their part of speech can be a valuable resource for language-related research, education, and development. It can provide a comprehensive and standardized list of words for natural language processing tasks, assist in language learning and teaching, and aid in the development of dictionaries and language processing algorithms.

4. How accurate and comprehensive is a text file with all English words and their part of speech?

The accuracy and comprehensiveness of a text file with all English words and their part of speech can vary depending on the source and method used to create it. Some text files may only include common words, while others may contain rare or obsolete words as well. It is important to check the credibility and validity of the source when using such a text file for language-related tasks.

5. Can I use a text file with all English words and their part of speech for commercial purposes?

The usage rights of a text file with all English words and their part of speech may vary depending on the source. Some sources may offer the text file for free and allow commercial use, while others may have restrictions on its usage. It is important to read the terms and conditions of the source before using the text file for commercial purposes.
