Micah P. Dombrowski / May 13 2019

NLTK Stanford Environment

1. Download NLTK Add-ons

Download the Java-based Stanford tools that NLTK can wrap: the NER tagger, the POS tagger, the parser, and the CoreNLP model jars for six languages. The cell is locked to prevent accidental re-downloading.

cd /results
wget --progress=dot:giga \
  https://nlp.stanford.edu/software/stanford-ner-2018-10-16.zip \
  https://nlp.stanford.edu/software/stanford-postagger-full-2018-10-16.zip \
  https://nlp.stanford.edu/software/stanford-parser-full-2018-10-17.zip \
  https://nlp.stanford.edu/software/stanford-{arabic,chinese,english,french,german,spanish}-corenlp-2018-10-05-models.jar
stanford-arabic-corenlp-2018-10-05-models.jar
stanford-french-corenlp-2018-10-05-models.jar
stanford-german-corenlp-2018-10-05-models.jar
stanford-chinese-corenlp-2018-10-05-models.jar
stanford-spanish-corenlp-2018-10-05-models.jar
stanford-english-corenlp-2018-10-05-models.jar
stanford-ner-2018-10-16.zip
stanford-postagger-full-2018-10-16.zip
stanford-parser-full-2018-10-17.zip
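
Because the download cell is locked, a partial or failed download could go unnoticed. The following is a minimal sketch of a check that could be run before installing. The function name missing_downloads is hypothetical, and the file names are copied from the wget command above.

```shell
# Print the name of each expected download that is missing or empty in DIR.
# missing_downloads is a hypothetical helper, not part of NLTK or wget.
missing_downloads() {
  dir="$1"
  for f in \
    stanford-ner-2018-10-16.zip \
    stanford-postagger-full-2018-10-16.zip \
    stanford-parser-full-2018-10-17.zip
  do
    # -s is true only if the file exists and is larger than zero bytes
    [ -s "$dir/$f" ] || echo "$f"
  done
  for lang in arabic chinese english french german spanish; do
    f="stanford-$lang-corenlp-2018-10-05-models.jar"
    [ -s "$dir/$f" ] || echo "$f"
  done
}
```

Calling missing_downloads /results should print nothing when all nine files are present. It only checks existence and size, not archive integrity.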

2. Install NLTK + Tools

apt-get -qq update
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
  default-jdk
apt-get clean
rm -r /var/lib/apt/lists/* # Clear package list so it isn't stale

conda install -y nltk
conda clean -qtipy

pip install PyPDF2
mkdir /usr/local/nltk
cd /usr/local/nltk

# Unpack each tool downloaded to /results and give it a version-independent name
unzip -q /results/stanford-ner-2018-10-16.zip
ln -s stanford-ner* ner
unzip -q /results/stanford-postagger-full-2018-10-16.zip
ln -s stanford-postagger* postagger
unzip -q /results/stanford-parser-full-2018-10-17.zip
ln -s stanford-parser* parser
# Place the per-language CoreNLP model jars alongside the parser
cp /results/stanford-arabic-corenlp-2018-10-05-models.jar \
  /results/stanford-chinese-corenlp-2018-10-05-models.jar \
  /results/stanford-english-corenlp-2018-10-05-models.jar \
  /results/stanford-french-corenlp-2018-10-05-models.jar \
  /results/stanford-german-corenlp-2018-10-05-models.jar \
  /results/stanford-spanish-corenlp-2018-10-05-models.jar \
  parser/
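
NLTK's Stanford wrappers still have to locate the unpacked jars at run time. The following is a minimal sketch of one way to do that, assuming the wrappers in the installed NLTK version consult the CLASSPATH and STANFORD_MODELS environment variables; check this against the NLTK release in use. The helper name jar_classpath is hypothetical, and the example paths assume the ner, postagger, and parser symlinks under /usr/local/nltk.

```shell
# Join every .jar found directly inside the given directories into a
# colon-separated list suitable for CLASSPATH.
# jar_classpath is a hypothetical helper, not an NLTK or Stanford command.
jar_classpath() {
  cp_list=""
  for d in "$@"; do
    for jar in "$d"/*.jar; do
      # Skip the unexpanded pattern when a directory contains no jars
      [ -e "$jar" ] || continue
      cp_list="${cp_list:+$cp_list:}$jar"
    done
  done
  printf '%s\n' "$cp_list"
}

# Example use (assumed layout, not run here):
# export CLASSPATH="$(jar_classpath /usr/local/nltk/ner /usr/local/nltk/postagger /usr/local/nltk/parser)"
# export STANFORD_MODELS="/usr/local/nltk/postagger/models:/usr/local/nltk/ner/classifiers"
```

Building the list from the symlinked directories keeps the environment independent of the release dates embedded in the unpacked directory names.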