As an Artificial Intelligence/Natural Language Processing Data Scientist, you will be responsible for building AI and Data Science models, with a main focus on extracting data and insights from forms or any text corpora. You will need to rapidly prototype various algorithmic implementations and test their efficacy using appropriate experimental design and hypothesis validation.


Basic Qualifications

  • Bachelor's degree in a quantitative field such as statistics, computer science, engineering or applied mathematics, or equivalent work experience

  • Six to eight years of relevant experience

Preferred Skills/Experience

  • PhD or MS in Computer Science, Computational Linguistics, or Artificial Intelligence, with a heavy focus on NLP/text mining and 6 years of relevant industry experience.

  • Experience with financial documents such as SEC filings, financial reports, credit agreements, or business news is a plus.

  • Creativity, resourcefulness, and a collaborative spirit.

  • Knowledge and working experience in one or more of the following areas: Natural Language Processing, Clustering and Classification of Text, Question Answering, Text Mining, Information Retrieval, Distributional Semantics, Knowledge Engineering, Search Ranking and Recommendation.

  • Deep experience with text wrangling and pre-processing, such as document parsing and cleanup, vectorization, tokenization, language modeling, and phrase detection.

  • Proficient programming skills in a high-level language (e.g. Python, R, Java, Scala)

  • Being comfortable with rapid prototyping practices.

  • Being comfortable with developing clean, production-ready code.

  • Being comfortable with pre-processing unstructured or semi-structured data.

  • Experience with statistical data analysis, experimental design, and hypothesis validation.

  • Project-based experience with some of the following tools:

      • Natural Language Processing (e.g. spaCy, NLTK, OpenNLP or similar, BERT transfer learning)

      • Applied Machine Learning (e.g. scikit-learn, SparkML, H2O or similar)

      • Information retrieval and search engines (e.g. Elasticsearch/ELK, Solr/Lucene)

      • Distributed computing platforms, such as Spark, Hadoop (Hive, HBase, Pig), GraphLab

      • Databases (traditional and NoSQL)

  • Proficiency in traditional Machine Learning models such as LDA/topic modeling, graphical models, etc.

  • Familiarity with Deep Learning architectures and frameworks such as PyTorch, TensorFlow, Keras.

