AMA Session with AR on Data Science

While I keep answering questions from data science (DS) beginners and discussing interesting topics with DS practitioners on LinkedIn, we recently held an ‘Ask Me Anything’ (AMA) session where many beginners and practitioners asked me some interesting questions. This post collects those questions and my responses; I believe you will find it useful.


Latish Khubnani: Is it wiser to start with a Data Analyst position rather than a Data Scientist one, especially when you don’t have a PhD?

Ankit Rathi: A PhD is not a requirement to become a Data Scientist, unless the role involves sophisticated research work where a PhD particularly helps. However, if building the skill set for a Data Scientist role looks intimidating, you can start with a Data Analyst job.

Eshan Bhatt: In DS, it’s important not only to possess technical knowledge and an analytical approach but also to have domain knowledge of the specific industry. It is definitely not easy to have good knowledge of every industry. How would you tackle this situation while looking for jobs in DS/DA?

Ankit Rathi: As the DS field is evolving, when hiring a Data Scientist we focus on DS know-how (Stats/Maths, IT/Programming) and have a separate domain expert on the project, as it is very rare to find a unicorn who has it all. Data Scientists learn the domain over time by working with domain experts, so domain knowledge is a nice-to-have skill, at least in the projects I have worked on to date.

Devendra Kumar: Since I believe it will take a long time to land a data science job, can I work as a data analyst in the meantime, prepare for DS, and then switch to DS? Is that a safe approach? Would learning Excel, advanced Excel and SQL, a scripting language such as Python, and a reporting tool such as Power BI or Tableau be enough to get a job as a Data Analyst?

Ankit Rathi: This looks like a good approach. If you are finding it difficult to land a job as a Data Scientist, you can start with any data-related job or project (data analysis, visualization, SQL work, etc.) and keep honing your skills until the right opportunity comes along.

Every job requires some generic and some specific skills. Have a look at various data analyst job descriptions on job portals; you will see which skills are general and frequently required.

Gayatri Iyer: How do you think like a data scientist? In our day-to-day life, we may come across many instances where data science can be applied. How do we identify these potential data science problems hidden in the mundane world?

Ankit Rathi: To me, thinking like a data scientist means analyzing a use case, backtracking it to its data sources, and identifying the patterns in the data that help meet the business objective. In other words, it means finding actionable insights by analyzing the relevant data of a business function. Identifying data science use cases requires both knowledge of data science and domain expertise.

So as data scientists, if we don’t have expertise in a domain, it’s critical for us to educate business leaders and operators about data science in an intuitive way, so that they can identify use cases for us and we can validate them.

We can also map use cases from previous projects in other domains onto the current one: if we built a particular use case for a certain business function elsewhere, is there a similar business function in this domain?

Lakshmipathi G: What are the main components one should focus on when a non-engineering or non-technical person wants to become a Data Scientist?

Ankit Rathi: There are 3 main aspects of Data Science: Maths/Stats, IT/Programming and Business/Domain. Non-engineering candidates can focus on the Programming and Maths parts in parallel, while the domain is learnt by working on real projects. In my view, the quickest way for anyone to start with DS these days is ‘Kaggle Learn’. Please refer to this podcast:

https://link.medium.com/zZyNd99nBR

Vishal Tiwari: What techniques can be used to extract features for text classification?

Ankit Rathi: Basically, you need to convert text into some kind of numerical representation; some methods are common and some are contextual. Some common vectorization-based methods are described in these articles:

https://link.medium.com/zZyNd99nBR
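To make the vectorization idea concrete, here is a minimal sketch in plain Python (standard library only) of two common representations: bag-of-words counts and TF-IDF weights. The toy corpus and the helper names (`tokenize`, `build_vocabulary`, `count_vectors`, `tfidf_vectors`) are hypothetical illustrations, not taken from the linked articles. The IDF smoothing follows the default formula of scikit-learn's `TfidfVectorizer`; in a real project you would use a library rather than this code.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric word tokens only (a deliberately simple tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(docs: list[str]) -> list[str]:
    # Sorted list of every distinct token in the corpus; its order fixes the feature columns.
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def count_vectors(docs: list[str], vocab: list[str]) -> list[list[int]]:
    # Bag-of-words: one row per document, one column per vocabulary term, value = raw count.
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts[term] for term in vocab])
    return rows


def tfidf_vectors(docs: list[str], vocab: list[str]) -> list[list[float]]:
    # TF-IDF: term counts scaled by a smoothed inverse document frequency,
    # so words that appear in many documents are down-weighted.
    counts = count_vectors(docs, vocab)
    n_docs = len(docs)
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    rows = []
    for row in counts:
        weighted = [c * w for c, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in weighted)) or 1.0
        rows.append([v / norm for v in weighted])  # L2-normalise each document vector
    return rows


if __name__ == "__main__":
    # Hypothetical toy corpus for illustration only.
    corpus = [
        "the movie was great",
        "the movie was terrible",
        "great acting and a great plot",
    ]
    vocab = build_vocabulary(corpus)
    print(vocab)
    print(count_vectors(corpus, vocab))
    for row in tfidf_vectors(corpus, vocab):
        print([round(v, 3) for v in row])
```

Each row is a fixed-length numeric feature vector that any classifier can consume. These common methods ignore word order; the contextual methods mentioned above instead produce representations that depend on the surrounding words.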

Sohini Aich: How can a postgraduate in control and instrumentation get a break into the data science field and land a job?

Ankit Rathi: As I responded above, there are 3 main aspects to learn in Data Science: Maths/Stats, IT/Programming and Business/Domain. Please refer to the following podcast:

https://link.medium.com/zZyNd99nBR

Mukul Sharma: Can we use bootstrapping with accuracy as the test statistic, and set a significance level for model testing? I think this would give better results than cross-validation.

Ankit Rathi: The right statistical test depends on the problem space we are working in; there is no silver bullet as such. Generally, I don’t rely on one specific method but try different methods in context, and if most of them say the same thing, I rely on that result.
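One way to make the bootstrap idea from the question concrete is sketched below in plain Python, assuming you already have per-example correctness (1 = right, 0 = wrong) for each model on a fixed hold-out set. The function names, the 2,000 resamples and the 200-example data are hypothetical choices for illustration, not the author's method. It computes a percentile bootstrap confidence interval for accuracy, plus a paired bootstrap estimate of how often model A fails to beat model B, which can be read as a rough one-sided p-value and compared against a chosen significance level.

```python
import random


def accuracy(correct: list[int]) -> float:
    # correct[i] is 1 if the model classified hold-out example i correctly, else 0.
    return sum(correct) / len(correct)


def bootstrap_accuracy_ci(correct: list[int], n_boot: int = 2000,
                          alpha: float = 0.05, seed: int = 0) -> tuple[float, float]:
    # Percentile bootstrap: resample the evaluation set with replacement,
    # recompute accuracy each time, and take the alpha/2 and 1 - alpha/2 quantiles.
    rng = random.Random(seed)
    n = len(correct)
    stats = sorted(
        accuracy([correct[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def paired_bootstrap_not_better_rate(correct_a: list[int], correct_b: list[int],
                                     n_boot: int = 2000, seed: int = 0) -> float:
    # Paired bootstrap: resample the SAME example indices for both models and count
    # how often A does not outscore B. A small rate suggests A's lead is not just
    # resampling noise (a rough approximation, not an exact p-value).
    rng = random.Random(seed)
    n = len(correct_a)
    not_better = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        diff = sum(correct_a[i] - correct_b[i] for i in idx)
        if diff <= 0:
            not_better += 1
    return not_better / n_boot


if __name__ == "__main__":
    # Hypothetical paired results for two models on the same 200 hold-out examples:
    # both right on 150, only A right on 20, only B right on 10, both wrong on 20.
    model_a = [1] * 150 + [1] * 20 + [0] * 10 + [0] * 20
    model_b = [1] * 150 + [0] * 20 + [1] * 10 + [0] * 20
    print("accuracy A:", accuracy(model_a))
    print("95% CI for A:", bootstrap_accuracy_ci(model_a))
    print("rate A not better than B:", paired_bootstrap_not_better_rate(model_a, model_b))
```

A design note: this only resamples the evaluation set and never retrains the model, so it captures test-set sampling noise but not the variability from training on different data, which is what cross-validation measures. That is one reason, in the spirit of the answer above, to run both and trust a conclusion only when they agree.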

Manish Shinde: What should I learn in order to build a small data science project that mimics an actual production-level scenario? I am trying to learn AWS and Django to build something in data science, e.g. finding players within a budget using the Moneyball dataset.

Ankit Rathi: In my view, it’s difficult for a self-initiated project to mimic a production-level project, as many things are missing, such as business stakeholders’ expectations and technology ecosystem limitations. At the same time, I believe it’s a good exercise to get that kind of exposure, so please carry on.

Purnasai Gudikandula: Everything about Kaggle: tips and tricks, and how long does it take a beginner to become a Kaggler, and then to go from Kaggler to Grandmaster like you, Ankit Rathi sir?

Ankit Rathi: First of all, I am just a Kaggle Competitions Expert. These are tiers you earn based on how you perform in Kaggle competitions; there are separate tiers for kernels and discussions.

In my view, start with the ‘Kaggle Learn’ section to learn the basics, then learn from ‘Kaggle Kernels’ how other data scientists have solved common problems using DS techniques. After that, you can take part in beginner competitions meant for learning, such as Titanic, House Prices and Digit Recognizer.

Once you are comfortable with the above, you can participate in any live competition. How long it takes to become a Kaggle Expert varies: I took 2 years because I was active intermittently, but many brilliant people have achieved it in 6 months. So it depends on how active and engaged you are.


Thank you for reading my post. I regularly write about Data & Technology on LinkedIn & Medium. If you would like to read my future posts then simply ‘Connect’ or ‘Follow’. Also feel free to listen to me on SoundCloud & visit my website https://ankitrathi.com.
