NLP Interview Coding Task

Please comment on the following NLP interview coding task that I have prepared for candidates for a Data Science NLP position that I am looking to fill. The goal is to check the candidate's understanding of the fundamental role of vector representations of text in NLP, as well as their coding skills and their ability to optimize computations with the vectorization that NumPy provides.

In particular I need your opinion on:

  1. Is the task clear?
  2. Is the task suitable for coding a rough solution from scratch in 20-30 minutes during an online interview?
  3. What level (Junior, Middle or Senior DS NLP Engineer) would you assign this task to?

Task:

# Write from scratch (you can only use NumPy arrays)
# a very basic and simple algorithm to classify sentences:

test1 = "cats like meat and fish is best for cats"
test2 = "train your mind reading good fiction, thrillers and other books"

# Use these sentences to train your classifier:

# Class 1
sent1 = "meat is a good food for all dogs and cats , dogs also like apples"

# Class 2
sent2 = "reading fiction books is a good food for mind and some thrillers are not"

To solve this task, the candidate should write a count vectorizer and a cosine similarity function from scratch. Using these functions, the candidate can compute the similarity of each test sentence to classes 1 and 2, and assign each test sentence to the more similar class. Normalizing the vectors would be a bonus for the candidate.
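For reference, here is a rough sketch of the kind of solution described above. It is only one possible approach and rests on assumptions that the task does not state: tokens are lowercased and split on whitespace, commas are dropped, the vocabulary is built from the two training sentences only, and test words outside that vocabulary are ignored. The helper names (`tokenize`, `count_vector`, `cosine_similarity_matrix`) are illustrative.

```python
import numpy as np

train = {
    1: "meat is a good food for all dogs and cats , dogs also like apples",
    2: "reading fiction books is a good food for mind and some thrillers are not",
}
tests = [
    "cats like meat and fish is best for cats",
    "train your mind reading good fiction, thrillers and other books",
]

def tokenize(text):
    # Assumption: lowercase, treat commas as separators, split on whitespace.
    return text.lower().replace(",", " ").split()

# Vocabulary is built from the training sentences only (an assumption).
vocab = sorted({tok for s in train.values() for tok in tokenize(s)})
index = {tok: i for i, tok in enumerate(vocab)}

def count_vector(text):
    # Count how often each vocabulary word occurs; unseen words are ignored.
    vec = np.zeros(len(vocab))
    for tok in tokenize(text):
        if tok in index:
            vec[index[tok]] += 1
    return vec

def cosine_similarity_matrix(A, B):
    # Normalize rows to unit length, then one matrix product gives all
    # pairwise cosine similarities (rows of A versus rows of B).
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T

labels = np.array(list(train.keys()))
X_train = np.vstack([count_vector(s) for s in train.values()])
X_test = np.vstack([count_vector(s) for s in tests])

sims = cosine_similarity_matrix(X_test, X_train)  # shape: (n_tests, n_classes)
predicted = labels[sims.argmax(axis=1)]           # most similar class per test
print(sims.round(3))
print(predicted)
```

With these assumptions, the first test sentence is closest to class 1 and the second to class 2. The row normalization inside `cosine_similarity_matrix` is the "normalizing vectors" bonus: after it, the similarities for all test/class pairs reduce to a single matrix product.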

It took me 20 minutes to code, test and describe this task. I am not sure how much time a candidate for an NLP position may need.



The task is not clear to me. Mostly, I cannot tell if this project is meant to be totally self-contained or not.

  1. Are the two sentences 'sent1' and 'sent2' intended to be the entirety of the training corpus that the word vectors are created from? Or is the intention to use an external dataset for this?

  2. Are the two sentences 'test1' and 'test2' the entirety of the testing set? Or is the intention that the classifier should work for sentences that may share no words in common with the training ones?
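The second question has a practical consequence. If a test sentence shares no words with the training sentences, its count vector over the training vocabulary is all zeros, and cosine similarity becomes 0/0. Below is a minimal sketch of one possible guard; the toy vocabulary, the `safe_cosine` name, and the choice to return 0.0 for a zero vector are all assumptions made for illustration, not part of the task.

```python
import numpy as np

vocab = ["cats", "dogs", "meat"]              # hypothetical toy vocabulary
index = {w: i for i, w in enumerate(vocab)}

def count_vector(text):
    # Words outside the vocabulary are silently dropped.
    vec = np.zeros(len(vocab))
    for tok in text.lower().split():
        if tok in index:
            vec[index[tok]] += 1
    return vec

def safe_cosine(u, v, eps=1e-12):
    # Return 0.0 instead of dividing by zero when either vector is all zeros.
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > eps else 0.0

class_vec = count_vector("dogs like meat and more meat")
no_overlap = count_vector("books are fun")    # all-zero vector

print(safe_cosine(class_vec, no_overlap))     # guarded case
print(safe_cosine(class_vec, class_vec))      # identical vectors
```

Whether such a sentence should get similarity 0.0, a separate "unknown" label, or an error is exactly the kind of detail the task statement leaves open.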


  1. Yes, the task is very clear. One suggestion is to replace the wording "build a classifier" with more detail, for example: create a rule-based classifier that calculates the cosine similarity between a sentence and each of the given classes, and assigns the sentence to the class with the maximum cosine similarity.

  2. This would be a very good exercise; it tests coding as well as the candidate's general understanding. If a person is able to achieve even 90% of this, they should be a good candidate.

  3. It should be suitable for mid-level and senior DS engineers.

  4. If you want to make it harder, you could ask them to code TF-IDF instead of a count vectorizer.
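To give a sense of how much extra work the TF-IDF variant in point 4 would be, here is a minimal sketch using only NumPy. It assumes the smoothed weighting idf = ln((1 + N) / (1 + df)) + 1, which is one common convention among several; with only two training sentences, the plain ln(N / df) form would give zero weight to every word that appears in both. The tokenization (lowercase, commas dropped, whitespace split) is also an assumption.

```python
import numpy as np

docs = [
    "meat is a good food for all dogs and cats , dogs also like apples",
    "reading fiction books is a good food for mind and some thrillers are not",
]

def tokenize(text):
    # Assumption: lowercase, treat commas as separators, split on whitespace.
    return text.lower().replace(",", " ").split()

vocab = sorted({t for d in docs for t in tokenize(d)})
index = {t: i for i, t in enumerate(vocab)}

# Term-count matrix: one row per document, one column per vocabulary word.
counts = np.zeros((len(docs), len(vocab)))
for row, d in enumerate(docs):
    for t in tokenize(d):
        counts[row, index[t]] += 1

# Document frequency: in how many documents each word occurs.
df = (counts > 0).sum(axis=0)

# Smoothed inverse document frequency (an assumed convention, see above).
idf = np.log((1 + len(docs)) / (1 + df)) + 1.0

# Weight the counts, then scale each row to unit length so that a dot
# product between rows is a cosine similarity.
tfidf = counts * idf
tfidf /= np.linalg.norm(tfidf, axis=1, keepdims=True)

print(idf[index["good"]], idf[index["dogs"]])
```

Words shared by both classes, such as "good", end up with a lower weight than class-specific words such as "dogs", which is the point of moving from raw counts to TF-IDF.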
