[CTQ Smartcast] Mid-career transitions to ML-AI, with Yogesh Kulkarni
It’s the big question on many minds. Should I undergo a mid-career transition, that too into a disruptive field like Machine Learning and Artificial Intelligence?
After 15-odd years of working as an engineer and manager in the domain of geometric modeling, Yogesh Kulkarni decided to pursue a PhD, and then switch tracks to the new world of AI-ML and Data Science.
Today, Yogesh works in this area, teaches, runs a community, and more. So…
Why and how did he make this change?
How should you plan such a transition?
After getting into AI-ML, why is he now taking courses from areas like biology?
In this Smartcast, Yogesh tells CTQ’s Ramanand about his fascinating journey. He reveals the thinking, the process, and the prep that helped his switch.
(Read the shownotes or skip to the transcript)
Some of the things we discuss
Decision-making process behind the switch
How to manage your money during a career transition
Transitioning to a data science career
How to think like a data scientist
Managing a data science team
Building credentials and showcasing expertise
Managing ML engineers
Building your network with social media
The scope of ML and data science
Keeping your data science skills fresh
Data science tools for non-programmers
Tips for making time for learning
Deciding what skills to learn
Course recommendations
Book recommendations
Best platforms for online courses
Data Science communities
YOGESH RECOMMENDS
Zen Habits: Handbook for Life by Leo Babauta
The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World by Pedro Domingos
Analytics Vidhya’s Learning Paths
The Meetup Page for Tensorflow User Group, Pune
READ THE TRANSCRIPT OF THIS EPISODE
[Start of Transcript]
[00:00:00]
Yogesh Kulkarni: If I want to have something on my name, this mode is not going to help. Blogging, GitHub repositories, teaching, doing free consultancy: this is how I transitioned and built some kind of profile in the new domain. What is different in the ‘machine learning’ way of doing things: you give the inputs as well as the outputs, and let the computer program generate the logic.
Ramanand: Around 2010, our guest had a full-time career. He had been an engineer; he was a manager. But a few years later, suddenly, he was pursuing a PhD in the area of geometric modelling. He then switched over to data sciences and the world of AI and ML. So how did this shift happen? Why did it happen? To talk about his very interesting journey, we have Yogesh Kulkarni with us today. Yogesh, welcome to the CTQ Smartcast.
Yogesh Kulkarni: Thanks, Ramanand.
Ramanand: Yogesh, take us back all those years, now that you're in a position to reflect, what was the thinking process? What motivated you to make that switch?
Yogesh Kulkarni: The shift to AI-ML was rather unplanned. But the precursor to that shift, i.e. going for the PhD, was very much planned. I was around 40 and I thought, there should be something on my name somewhere, some method on my name. That was the primary motive. Of course, there were a few other reasons. Like, I wanted to have the highest degree, a very fashionable thing, like having a 'Dr.' before your name. So those are all normal things that were there. [00:02:00] But having something on my name was very important for me. Plus, I have been a visiting faculty at multiple places, right from the start of my career. One or two lectures here and there, typically at College of Engineering, Pune (COEP), teaching CAD, computer-aided design. I thought with this degree, it could be more formal; later, if I decide to do it full time, then I have the paper and I can get some official recognition. So that was the main idea behind going for a PhD.
Luckily, I had a problem to work on. Luckily, I got good advisors, good clients, at COEP. So, COEP was my research centre. I continued doing the PhD; midway, I came across this new problem-solving technique. Of course, the earlier research was a normal procedural way of solving problems. But this Machine Learning way came via one paper. I was really impressed. I thought, once this PhD is done, I'm going to go ahead with this topic. So, in a nutshell, I think the PhD was planned, but the AI-ML shift was not. It just dawned upon me one day and I thought, this is it.
Ramanand: In some sense, your PhD was the culmination of the previous portion of your career. When I told a couple of people we were doing this talk, one thing they were curious to know was whether this... the PhD actually launched the next half... but it's kind of a bridge, but I think it's more of a culmination. So, one thing I wanted to ask, was there a decision making process [00:04:00] that you had? Like, for example, I take so much time, PhDs are notorious, hard to predict. Tell us about the decision making process.
Yogesh Kulkarni: As I said, the PhD part was planned. So the initial plan was that I would do it part-time. I was working full time at a company called Autodesk, handling a reasonably high position; I was managing the whole group, and was the site manager also. So I thought I'd do it part-time, and it started that way only. In the first year of a PhD, you have to do coursework: you have to take two Master's courses, the odd research methodology course, and all. Apart from that, you have to do a literature survey to come up with problem statements, precise problem statements. During that time, I was doing this full-time job, with its responsibility, and then the coursework. The coursework was manageable. I could get it done.
But then, during the literature survey, I realized: if I want to have something on my name, this mode is not going to help. Doing both things, juggling both things, is not going to help. Then I talked with my family and decided to take the plunge, basically. I decided midway that year, around the November-December 2012 time frame, and talked with my boss, an American. He understood, of course, and then we planned it that way. We planned a succession within the company role; we had other managers to take care of it. I had a six-month timeframe planned for my switch: start talking to my boss in November-December, so that I could leave in the next semester, that is, around June. Completed that. [00:06:00] That's the way this decision-making process happened: I talked with my family, and then that thing happened.
I decided that once I start full time, I will do it in four years. Of course, there are multiple reasons why four years and not seven, or not 10, which some people take. In four years, I'll have this kind of output, these many papers, this, this, this. And of course, if you see a roadblock somewhere in the middle, you can always come back. So, that confidence was there. I decided to go full-time for four years. That's how this happened.
Ramanand: Great! What was the family reaction like? And more importantly, what was the peer reaction like?
Yogesh Kulkarni: The family reaction was actually good. My wife is very supportive. She was working, so she supported it very much. For peers, and for my boss, of course, it was a surprise that at that age, you are deciding to go back to school. But somehow my work, although it was in software engineering, was research-oriented. The topic, geometric modelling, is not a run-of-the-mill topic. So we had to do some kind of research anyway, algorithmic research. So, peers were also conversant with the level we work at. They were not really surprised then, as I had a proper six months' transition period. Initially, though, there was some surprise, but later it smoothed out and things were okay.
Ramanand: One other question that often comes up, before I get into some [00:08:00] of the mechanics of what you did those years, was that from a finance point of view, did you again, do some planning? Because that's an area, a lot of people would like to do what you do, but that's a challenge for them.
Yogesh Kulkarni: Actually, if there's one thing I can boast about, it is my financial prowess. I've been very meticulous about my financial planning, right from my early career days. That way, I had no loans, a decent income, good savings. There was nothing to worry about. I could always afford it. Plus, of course, my wife was working. So that way, no risk, actually. And there was always the confidence that if nothing works, I can come back.
Ramanand: Talk to me a little bit about the transition to data sciences. Obviously, like you mentioned, because you were doing geometric modelling, and reading through your resume, it was fairly at a deep level. You hadn't lost touch with the mathematical foundations or even the technical programming side of things. Let's talk a little bit about that. How did that fit into your data sciences plans?
Yogesh Kulkarni: Maybe I'll talk about this problem, for which I'll get a little detailed or involved. My PhD research was on a topic called 'Midsurface', a surface in the middle. Meaning, if you have a thin-walled shape, like a sheet-metal or plastic shape, if you imagine it, you will have two surfaces. [00:10:00] For certain methods like the finite element method, instead of this 3D solid shape, if you give an equivalent surface shape, the computations are faster. Instead of 3D elements, you can use 2D elements, surfaces. So, the problem statement is: given any thin-walled shape, compute a surface in the middle.
This research has been going on for 50-60 years, and still the problem is unsolved. It's there in the commercial packages, but still there are issues. I had worked commercially on this very problem in my previous job, so I knew where the issues were. Of course, my method is also not solving the whole thing, but certain aspects. Imagine a table: you have a flat table top, and you have to create a surface. Of course, the surface equation is very simple, a planar equation. An offset of it in the middle would be the midsurface, simple. But in real life, of course, the surface can be anything. And it can have other small features, meaning holes, chamfers, some fillets here and there. So, before computing the midsurface, you typically remove these small things, which are irrelevant. The process is called de-featuring, or suppressing small features. It's not that simple. What is small? Is it 5% of the total shape? 10%? So, there is judgment involved. Plus, if the feature is coming in the direction of the load, you don't remove it. There's a lot of engineering judgment; that's also a big area of research, 10-15 years of research.
I too did de-featuring before computing the midsurface in my research, and had papers on it. But then one paper came to me, [00:12:00] which said: forget writing these rules, that small is 5% of the total shape, the radius should be this, the chamfer should be this side. Let's give a shape with a feature and the shape with suppressed features together, and multiple examples of that, and let the system evolve the rules. That was a flash, and of course he could do it because he had data. I didn't do it because I did not have data. So that was another point: you needed data. So, this ML method was very impressive, surprising to me. Not just the geometry. But then I read up on it, did courses, and I thought this method is what I will do next. Again, that's the transition part. Now, what was there in geometric modelling that was useful in machine learning? Although the mathematics is common, the application of the mathematics is rather different.
Imagine you're offsetting a surface: you have a surface equation, not necessarily a planar equation, and translation is different from offsetting. A regular translation moves the whole surface; an offset moves each point in the normal direction. You can have collapses of surfaces and other things. So, there you need matrices, because all transformations are homogeneous transformation matrices, affine transformations. You have calculus; you have to take derivatives, curvatures, slopes. But the application is that the surfaces have to lie within a goodness tolerance. You have the angular surface, but it has to be within certain tolerances. So, it's all floating-point [00:14:00] arithmetic, wherein tolerance is the key. Whereas in machine learning, probability is the key. Instead of the ideal question and the ideal answer, you have 10 possible answers: which is most probable?
The mathematical way of looking at things is different. But as I said, vectors, matrices, and calculus are also there in machine learning. All these neural networks that you see, these circles and arrows, are actually matrices. For visualization purposes, we see them as a neural network. The only additional part in machine learning is statistics. That's generally not there, at least at the level I've worked at in geometric modelling. Linear algebra and calculus were very useful while doing machine learning.
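To make the "neural networks are actually matrices" point concrete, here is a tiny sketch in plain Python (the numbers are invented for illustration): one layer of a network is just a matrix-vector multiply plus a bias, passed through an activation function.

```python
# One "layer" of a neural network is y = f(Wx + b):
# a matrix multiply plus a bias, then an activation.
W = [[0.5, -1.0],
     [2.0,  0.0]]   # 2x2 weight matrix (the "arrows" in the diagram)
b = [0.0, 1.0]      # bias vector
x = [1.0, 2.0]      # input vector (the "circles" in the diagram)

def relu(v):
    # the usual ReLU activation: clip negatives to zero
    return [max(0.0, u) for u in v]

y = relu([sum(W[i][j] * x[j] for j in range(2)) + b[i]
          for i in range(2)])
print(y)  # [0.0, 3.0]
```

Stacking several such matrix multiplies, with a nonlinearity between them, is all a feed-forward network is; the diagram of circles and arrows is just a picture of W.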
Ramanand: Two questions I have based on this. One is that, while talking to a few people, I also briefly worked in that area. What I noticed is that for some people, to make a mental switch to this world of almost... It's a slightly different way of looking at the results. Sometimes there is a lot of discomfort with the uncertainty. Did you have to make a mental switch very consciously now that your work revolves around data sciences?
Yogesh Kulkarni: Yes, actually very much. Apart from the results being probabilistic, meaning indeterminate, the solution not being deterministic, the other big issue that comes while doing software development for machine learning is that things are still black-boxed. In the geometric modelling domain, I was writing the logic. So, if there was any issue, while debugging I could see the steps, get the intermediate output, [00:16:00] look at it, see where I was going wrong; all the steps were written by me. So I could debug them and get it done. But here, you have to pray, basically; it's a black box. You can, of course, do some tinkering: you can change the activation functions, add one more layer, add two more nodes. But you still can't really debug the way you used to, the procedural way of solving problems. So, rather than the tolerance way of looking at things, looking at them in a probabilistic way, and debugging, was a big hurdle for a programmer and developer making the mental shift.
Ramanand: Let's say there was someone listening to you who is in the managerial cadre, someone who now has to work with data science-oriented teams. What is the shift you think, or you've seen, them having to make?
Yogesh Kulkarni: Again, a big mental shift. Machine learning is not deterministic; it is statistical, it is probabilistic. So you generally can never say that you'll get 100% accuracy. The accuracy talk has to be made very carefully, or you should not be making that kind of a claim. If you're claiming 99% accuracy, typically your model is overfitting. So which results to believe: that sense, as a manager, is what you should develop. Not whether it is 99% or 99.9%. That's irrelevant. How useful is the result, and how generalized is the result? You are mugging up for an exam, but the next exam has a completely different set of questions. Then you are gone, right? So that [00:18:00] sense of understanding the results would be very necessary for managerial people.
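The "mugging up for an exam" analogy can be sketched in a few lines of Python (a toy illustration, not from the episode): a model that merely memorizes its training data scores perfectly on it, yet falls apart on unseen questions, while a model that learned the underlying rule generalizes.

```python
# Training "questions" and answers: is a number odd or even?
train = {1: "odd", 2: "even", 3: "odd", 4: "even"}

def memorizer(x):
    # "mugs up" the training set; guesses blindly otherwise
    return train.get(x, "odd")

def generalizer(x):
    # actually learned the underlying rule
    return "even" if x % 2 == 0 else "odd"

test = {6: "even", 9: "odd", 10: "even"}  # the "next exam"

def accuracy(model, data):
    return sum(model(x) == y for x, y in data.items()) / len(data)

print(accuracy(memorizer, train))   # 1.0: looks perfect
print(accuracy(memorizer, test))    # falls apart on unseen data
print(accuracy(generalizer, test))  # 1.0: the rule generalizes
```

The memorizer's perfect training score is exactly the misleading "99% accuracy" a manager should learn to question.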
Ramanand: I'm going to come back to some of these points a little later, but I'm just going to switch back to that phase when you are in the middle of your PhD, or coming to the end of it. You've discovered this new and interesting field, it's kind of opened up a new window for you. Tell me about what you did once you knew you were going to get into that area? So one problem a lot of people have is how do you build credentials? How do you showcase your expertise? Tell me about that.
Yogesh Kulkarni: Personally, in my case, as I was saying, I was in a different domain, though linear algebra and calculus were useful. But I did not have formal exposure, apart from what we did in college, to statistics, and certainly not to the data analytics part. That I had to take care of. What I did was a BI course, meaning a Business Intelligence course, where you learn what is called the data pipeline: ETL, Extract, Transform and Load. A very old-ish kind of technology, still used though. The data analytics side of it: how to connect to databases, how to transform data, how to fire SQL queries. That I actually learned for two and a half months, every day for two hours. That gave me a sense of data.
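The Extract, Transform, Load idea Yogesh describes can be sketched with Python's built-in sqlite3 module (the table and column names here are invented for illustration):

```python
import sqlite3

# Extract: pull raw rows from a source table.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE sales (item TEXT, amount_paise INTEGER)")
src.executemany("INSERT INTO sales VALUES (?, ?)",
                [("pen", 1500), ("pen", 2500), ("book", 30000)])

# Transform: convert paise to rupees and aggregate per item
# with a SQL query fired against the source.
rows = src.execute("SELECT item, SUM(amount_paise) / 100.0 "
                   "FROM sales GROUP BY item ORDER BY item").fetchall()

# Load: write the transformed rows into a reporting table.
dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE sales_report (item TEXT, total_rupees REAL)")
dst.executemany("INSERT INTO sales_report VALUES (?, ?)", rows)

print(dst.execute("SELECT * FROM sales_report ORDER BY item").fetchall())
# [('book', 300.0), ('pen', 40.0)]
```

Real ETL tools add scheduling, error handling, and connectors, but the extract, transform, load stages are the same shape as this sketch.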
Then once that was done, I started doing courses, the normal, very fashionable or popular courses, did projects, and solved some of [00:20:00] the Kaggle competitions. Another thing that I did, rather differently, was I started writing blogs. If I got to understand one particular subject, I would write a blog about it. Then I would publish it, typically on a very good site called Analytics Vidhya. So I'd write blogs; plus, for whatever I did, I made a GitHub repo. So I tried building a GitHub repo portfolio, so that later, somebody could see what I've done. The BI class, blogging, and the GitHub repo is what I did. Once this was done, I started getting queries through the network. And of course, people knew that I had started working on this. Initially, I worked for free. I went to friends, asked them for real-life problems, and worked for free for months, so that I could get a sense of how customer data comes, how to present to a customer, what the real-life problems are.
Once that was done, I started giving talks on whatever I understood, at meetups and all. Through such things, initially free, I later started getting consultancy assignments as well. That got built up, and it went on very smoothly. That's how I landed a job. Initially, my idea was also to consult for the same company. But then, it so happened that the data is so private, so secret, that typically [00:22:00] it is not given to an outsider or a consultant. So, I thought joining there would be beneficial. The BI class for whatever gap there was between my skill set and the needed skill set, blogging, GitHub repositories, teaching, doing free consultancy: this is how I transitioned and built some kind of profile in the new domain. That's how I got into it formally then.
Ramanand: Very fascinating, Yogesh, because you did a lot of things that people have been telling engineers or technical folks to do as part of their work to build up that personal profile, especially in a day and age where that is possible. Maybe 10 years ago, it was much harder to do that. But no one really does it, because you always have something urgent burning, and you don't feel the need to do it. What I find fascinating about what you've done is that it's almost like you assembled that structure of credibility through various means. I know you were doing teaching, but were you doing blogging and using social media and meetups earlier as well?
Yogesh Kulkarni: Not really. I was a normal WhatsApp, Facebook guy. I had some presence on LinkedIn, of course. WhatsApp is still there, but I left Facebook. I've never even written a tweet on Twitter, but I'm fully focused on LinkedIn. LinkedIn is a platform I'm always at, like a second home. There I started following people, seeing what they are doing, and then posting my own things; that would help me build a network as I was doing consultancy after my PhD. Getting work also, [00:24:00] getting to know what's going on. So, LinkedIn is where I focused as far as social media goes, and I removed the rest of the normal social media.
Ramanand: Again, one thing I find fascinating is that even though you have kind of now reached a point of stability in terms of your journey, let's say it is local stability, you haven't stopped doing the things that took you here, which is what I've also noticed.
Yogesh Kulkarni: As far as teaching goes, I've been teaching for decades now, initially as visiting faculty. During my PhD time, you have to do it formally, so I took courses as a student myself. I was a full-time student at COEP, so I did semester-long courses there. But this has continued, and now that I'm in a job, I cannot do it full-time anyway, or take big, semester-long courses. So I do it on weekends. I've co-founded a meetup group in Pune called TensorFlow User Group. I typically give a lecture a month; plus, people call for these faculty development programs, so there are various requests. That way, some kind of teaching has continued.
Ramanand: How has the teaching helped over the years? Because it's clearly a source of strength.
Yogesh Kulkarni: It started for selfish reasons, actually. Generally, if I understand a topic, I'm able to explain it to somebody well. And that's the touchstone, a litmus test: if I'm able to explain it to somebody well, then I've actually understood it. [00:26:00] And while teaching, I get to learn more, from the way people react and the way people ask questions. I'll give you a very simple example. Machine learning, as you would know, is a dramatic shift from traditional programming. In traditional programming, you know the inputs, and you have to write logic, some program, to generate the output. But what is different in the machine learning way of doing things is that you give the inputs as well as the outputs, and let the computer program generate the logic. Of course, you have to provide some structure; you have to decide if it is a linear relationship or nonlinear, something you have to provide.
But the logic, actually the set of weights, is generated by the machine learning program itself. So you don't write hard-coded, explicit logic there. So you need inputs and outputs for machine learning. By the way, COEP started teaching machine learning as an elective to third-year mechanical engineering students, not just computer science. In one of the classes, a third-year mechanical engineering student asked me: you said that for machine learning, inputs and outputs are needed. But in unsupervised learning, you provide only inputs; you don't have outputs. So why is it machine learning? That was a bouncer, right? Of course, I figured it out; maybe I'll leave it as an exercise for those who are familiar with it: why is unsupervised learning called machine learning? This is how you get to think about aspects that you have not looked at. And solving people's questions, addressing them, responding to them in an understandable [00:28:00] manner, gives you clarity. Teaching has helped me learn more, or learn better.
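The contrast Yogesh draws can be shown in a few lines of plain Python (a toy sketch with made-up numbers): in the traditional way you write the rule yourself; in the machine learning way you supply input-output pairs and let the program derive the "logic", here the weight and bias of a line, via a closed-form least-squares fit.

```python
# Traditional programming: you write the logic yourself.
def double_plus_one(x):
    return 2 * x + 1  # hard-coded rule

# The machine learning way: give inputs AND outputs, provide some
# structure (here, "the relationship is a straight line"), and let
# the program generate the logic: the weight w and bias b.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]          # outputs of the unknown rule

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
# closed-form least-squares fit for a straight line
w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
    / sum((x - mean_x) ** 2 for x in xs)
b = mean_y - w * mean_x

print(w, b)  # the learned "logic": weight 2.0, bias 1.0
```

The derived (w, b) reproduces exactly what `double_plus_one` hard-codes, but it was generated from the data, not written by the programmer, which is the shift the student's question was probing.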
Ramanand: I also wanted to get a little bit of a few tips from you, in terms of what have been the two-three big areas in learning about machine learning and data sciences that you have consciously focused on? It's a very broad area, it's a very highly evolving area. Have you constrained the scope a little bit for yourself?
Yogesh Kulkarni: In any field, there is something called a horizontal, and there are verticals also. There are some base strata that you have to have. As I mentioned, in my case, the data engineering side I had figured out. An understanding of data, that is, the ETL pipeline, databases, SQL, data engineering: that part is essential for machine learning. If you don't have data, you're not doing machine learning. It's just a PowerPoint thing.
Another thing that you need to have is good familiarity with mathematics, or some aspects of mathematics: linear algebra, meaning vectors and matrices; calculus, meaning derivatives and gradients; and statistics and probability. If you dislike them, then it's hard later. So data engineering, mathematics, and another thing: you have to implement things. So you have to know programming well. Algorithms, data structures, programming, any language per se. If you're coming from C, C++, or Java, and then going to Python and R, which are typically used for machine learning, there is more unlearning of many things, many complex things.
But if you're starting afresh, Python could be a good programming language to start with. [00:30:00] So, data, mathematics and programming: this is all the horizontal. But that won't be sufficient; that is probably a Bachelor's degree. To survive, you will have to have a Master's degree; you have to have a domain. So some specialization should be there, which is, let's say, computer vision, or natural language processing, or even an actual domain, meaning healthcare, finance, something of that sort, of your own liking. It turned out for me that I would do these three basic things, data, mathematics and programming, plus natural language processing as a specialization. That's sort of the Master's in that case.
Ramanand: If someone were to try and focus on a domain, one challenge is getting hold of good datasets which are domain-specific. Are there domains that someone who's self-taught or a student can start with, safer domains for which datasets are now available?
Yogesh Kulkarni: Generally, finance; the buck stops there, literally. It had data from early on; I mean, finance is data. You had Excels from the beginning; the computer was meant for finance, I guess, need generated it. So if you're into finance, then getting data is easily possible. If you're into computer vision, lots of datasets are available. So much improvement has happened in computer vision on images that it has surpassed, at least in some cases, human accuracy. A somewhat obscure domain, in terms of data, would be healthcare, which is regulated because [00:32:00] of the secrecy aspect. And a few other things which are highly regulated. Getting data there is an issue. If you are into genomics and all, it's still harder. But otherwise, it is very easy to get datasets.
Ramanand: Great. Yogesh, since we are midway through our conversation, you have been scaring your students with questions. We thought we'll reverse it. And this is the moment you were very worried about. So I'm going to put you on the spot with a simple quiz question. You will get clues if you ask for it. Since you are a learner at heart, we expect you to work your way to the answer. If you get this right, we will give you a little gift to give to someone which is a slot in one of our reading groups or 'reading compound', as we call it. A question from the world of data sciences, I'm talking about something which is popular in the world of ML software. This was created by Ross Ihaka and Robert Gentleman, and it was named partly because of what their names have in common. What from the world of ML software are we talking about?
Yogesh Kulkarni: Is it R?
Ramanand: Why do you say that?
Yogesh Kulkarni: R, the names are with R.
Ramanand: Absolutely right, well done Yogesh. You got that right. We will give you a gift, well done for that. This is not that bad, right?
Yogesh Kulkarni: I see what's coming next.
Ramanand: We happen to land on that spot very naturally as well. So, since you spoke about some of the areas, [00:34:00] and we touched upon programming languages, are there software environments that people should now also consider, go-to kind of places?
Yogesh Kulkarni: If you're a programmer, I would suggest Python. If you come from a statistics background, a hardcore statistics or mathematics background, then R would be suitable for you. But still, being a little biased, I would say Python is the programming language you should go ahead with. If you have been programming before, then it's probably just two days to get you started, nothing more than that. For machine learning, the favourite library is Scikit-learn: import sklearn. So Scikit-learn is the library you should go to. It is widely used, very popular academically, and also has 200-plus algorithms. For deep learning, there are quite a few good libraries.
Now that I'm a Google Developer Expert for TensorFlow, I would say TensorFlow. Actually, TensorFlow's first version, 1.0, was very difficult to work with. In the middle, I had shifted to PyTorch, sort of a rival system. But then TensorFlow 2.0 came, and it has an interface called Keras. Keras was a rather easy-to-use library; it has now become sort of a front end for TensorFlow. So, for deep learning, I would say TensorFlow 2.0. If you're into NLP, there are quite a few popular libraries there too. So Python, Scikit-learn and TensorFlow are what I would suggest for machine learning and deep learning.
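As a minimal illustration of the Scikit-learn workflow mentioned above (assuming scikit-learn is installed; the heights-and-weights data is invented), nearly all of its 200-plus algorithms share the same construct, fit, predict interface:

```python
from sklearn.neighbors import KNeighborsClassifier

# Invented toy data: [height_cm, weight_kg] -> size label
X = [[150, 50], [160, 55], [180, 80], [185, 90]]
y = ["small", "small", "large", "large"]

# Every scikit-learn estimator follows the same pattern:
# construct the model, fit(X, y), then predict on new inputs.
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(clf.predict([[155, 52], [183, 85]]))
```

Swapping `KNeighborsClassifier` for, say, `LogisticRegression` or `RandomForestClassifier` changes only the first two lines, which is a large part of why the library is so popular for getting started.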
Ramanand: How much research paper reading do you do? Because [00:36:00] there are lots of new developments all the time, so if you're going by a textbook, you're probably a few generations behind. Would you still advise people to learn from papers? It's not necessarily taught in the standard undergraduate curriculum.
Yogesh Kulkarni: Actually, you have to go stepwise. You have to have some courses done first, and have the basic mathematical abilities to understand those courses. If you do the Andrew Ng course, as we call it in India, he starts with the loss function of a linear regression. And if you don't know that word, or what those Greek letters are, then it becomes difficult. So: have a little background, do courses, and there are some intermediate-level courses also; then you can start looking at the PyData kind of conference, where topics like topic modelling are taught or explained. The last stage would be research papers, I would say. Because a research paper also tells you what was there before; they'll have a literature survey section, a survey of what has happened before.
But on going to research papers, I would still say: if you're a beginner or even intermediate, unless you are an expert, I would suggest going to articles which explain the research paper. I would suggest those; quite a few people, even very renowned researchers, also write blogs. They do research, have very condensed material in their research paper, but they explain [00:38:00] it also. The blogs by Chris Olah and others on deep learning are wonderfully explained. That explanation you don't get in any research paper; it's condensed, all Greek and Latin there. But of course, once you have reached that level, there's nothing else other than research papers.
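The "loss function of a linear regression" that Yogesh mentions from the Andrew Ng course can be written out in a few lines (a sketch: the course writes it with Greek thetas, here plain w and b are used). It is the mean squared error over the training pairs, halved by convention:

```python
# Squared-error loss for a line y = w*x + b:
# J(w, b) = (1 / 2m) * sum_i (w*x_i + b - y_i)^2
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]     # true rule: y = 2x

def loss(w, b):
    m = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

print(loss(2.0, 0.0))  # 0.0: a perfect fit has zero loss
print(loss(1.0, 0.0))  # a worse fit gives a higher loss
```

Training is then just a search (typically gradient descent) for the (w, b) that minimizes this J, which is why the course introduces it on day one.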
Ramanand: I want to ask you a few questions about how you go about learning and things like that. But before that, let's leave this section with your recommendations for, say, someone who's been a regular software developer for three to five years, and thinks that they should now catch this train, or at least understand what this train is all about, so that they can make an informed choice about making a switch or bringing it into their work. What would be an ideal next step?
Yogesh Kulkarni: As I mentioned, there are three or four things, meaning maths, data and programming, that you have to know, plus a domain. There are some learning paths that people have already formulated. A site that I would suggest is called Analytics Vidhya. It's an Indian site, started by a friend of mine, from IIT Kanpur, I guess. There, go and search for 'learning path 2017'. They have 2019 and 2020 ones, but I would still suggest the slightly older 2017 path. There he gives you a one-year program: learn this in this month, then learn this; a stepwise program for a newcomer to enter this field. And it covers all these things that I mentioned: the data part, the maths part, [00:40:00] and the programming part. Not much on the domain side. And all the resources there are predominantly free resources. You can easily go through them. For a newcomer to test the waters, I would suggest going through this learning path.
Ramanand: Fascinating, we'll link this in our notes on our blog. That sounds really promising. But let's say, on the other end, you have someone with two or three decades of experience, who just wants to know enough to, say, be dangerous. They may not be doing a lot of technical work, but this is increasingly something to be functionally literate about. What should they do?
Yogesh Kulkarni: Being AI-aware is, I think, a new thing. As I mentioned, these things are statistical, these things are probabilistic. But even at that level, even though you are in a non-tech mode, a managerial mode, even a sales or marketing mode, you cannot really bluff that easily nowadays. You have to have some sense of what you're speaking about. If you're not doing programming, not doing Scikit-learn or TensorFlow, there are still some tools available, which are called low-code or no-code tools, almost visual tools for machine learning. Tools like RapidMiner or KNIME; there are quite a few visual tools available. Even platforms like Microsoft Azure have a studio which is visual. You drag a data acquisition block, then a linear regression block, then a reporting block. That way, you can play with datasets. And I'd say datasets are available. [00:42:00] Play with those datasets with these visual tools. So you're not doing programming, but you get a sense of what kind of accuracy you get with so-called clean data. On the Titanic dataset, what kind of accuracy do you get? 99%? Once you're familiar with this, you'll be able to have an informed talk, or even a discussion, later.
So using visual tools is one thing I would suggest for them. This may sound a little prophetic, but AutoML is coming. The mundane, regular steps are going to be handled automatically. So those who are enjoying doing machine learning by hand today will have some test later: AutoML is coming, so learn more about AutoML and what kind of things it will do. Another thing, which is very important and sort of non-technical, is to have a decent domain understanding. If you are in finance: interpretation of results, explainability of the results, and of course the perennial storytelling. Why are you doing machine learning? You're extracting information from text data. Is the extracted data itself the only use of this workflow? Or is it saving you time? What is the business use? What is the value proposition that you are giving? That's how you actually get the work. Even though you are a non-technical person, a non-programmer, there is no less value in you if you bring the value proposition.
Ramanand: I think that's a very valuable point. People can get swept away [00:44:00] by saying, 'Oh yeah, I am now an ML person' or whatever. But there are things which are timeless, like being able to convey the business impact and make that case for it. Those are not going away just because you know ML. So I think that's a very valid point. Yogesh, one thing everybody who knows you, or is even just connected to you on LinkedIn, will know is that you're a prolific learner. You're taking courses all the time, you're giving talks, you're sharing your learnings. Some people want to know: where do you get the time to do all of this? Tell us a little bit about the planning when it comes to doing all these things, and managing and balancing.
Yogesh Kulkarni: Actually, there's no plan, and there's no motivation needed either. Now it has become a habit. If I'm not doing something extra, then it feels a little odd. I've learned really diverse things, right from the Yoga Sutras to genomics, which is very different from what I do for bread and butter. So it has become a habit now. Another thing is that my daily schedule is a little different from the usual. I get up early, before four, start work by seven, and get it done by four or five. Then I have my evenings free, generally. So I spend time in the evenings on my own learning, plus weekends. I get ample time, actually. I don't have other things that are attractive in that sense, or rather socially attractive. I can just stay back in the evenings and go ahead with these courses. I really love them, so no motivation is needed that way. [00:46:00]
Ramanand: You described it well. You don't have to go and seek it; you're not depleting your willpower searching for motivation. You're just doing it. How do you pick what you learn? You've been learning Sanskrit, genomics, all these things.
Yogesh Kulkarni: There is no set pattern to this. The core thing would be: if I find something witty, something intellectual somehow, then I get attracted to it. That's a common thing I've realized about myself. In the case of genomics, although it is all biological, at first you don't understand what a nucleotide is, what gene expression is, what sequencing is. Sequencing is identifying the chemicals, the bases. So even though you're not from that background, finally it boils down to an abstraction.
It's a sequence of letters. Actually, it's bio-NLP in that sense. It's natural language processing: it's a sequence of letters, and your algorithms could be finding a subsequence, finding a matching strand. If you get to know these abstractions somehow, then you get fascinated, and the domain really doesn't matter. There is some sense of intellectualness involved in this. In Sanskrit also, if you've done Sanskrit, the multiple interpretations it can have, the craft people can have, is just mind-boggling. If I see that sense of wittiness somewhere, I somehow get attracted, and I do those courses just to know more about those things.
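The abstraction Yogesh describes, a genome as just a sequence of letters, is easy to see in code. A minimal sketch, where the toy sequence and the motif being searched for are invented purely for illustration:

```python
# At this level of abstraction, a genome is just a string over {A, C, G, T},
# and "finding a subsequence" is plain string matching, exactly as in NLP.
genome = "ATGCGATACGCTTGAGATTACA"  # made-up toy sequence
motif = "GAT"                      # made-up pattern to look for

# Slide a window over the genome and record every position with an exact match.
positions = [i for i in range(len(genome) - len(motif) + 1)
             if genome[i:i + len(motif)] == motif]
print(positions)  # positions where the motif occurs
```

Real genomics uses far more sophisticated alignment algorithms, but the underlying object is the same: a sequence of symbols, which is why NLP techniques transfer.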
Ramanand: I often tell people we are fortunate to live in almost a golden age of access to learning. You have so many things. I don't [00:48:00] think you would have imagined, maybe 10-15 years ago, that you would have access to such an array of courses while balancing your other things. Tell me a couple of courses that you're in the middle of right now.
Yogesh Kulkarni: Actually, I'm more into genomics now. I've done one specialization on Coursera, and did AI for Medicine also. AutoML is coming, and on the Gartner hype cycle, machine learning is coming down now, if you see the 2020 chart. Domain is what will drive things. Healthcare, or genetics, has not exploited machine learning's prowess and capabilities enough. So I thought I should at least understand what's going on, and as I've said, the abstraction is really workable; I don't think you have to know biology that way. If I can get the sequence, can I do sequence predictions, clustering? The interpretation can be done by the domain person. So this is where I'm going personally. Of course, Sanskrit and other things are happening side by side anyway.
Ramanand: It almost seems to me that these things are building on top of each other. They are not entirely unconnected to what you have done is the sense I get.
Yogesh Kulkarni: Plus, I have some flair for languages. I had Sanskrit for five years in school, then German for two years in college. I was working in a company which had offices in Shanghai, so I had to go to China, and I learned Chinese for two and a half months. I did Japanese for one or two years, and worked in Japan also. And I am working in natural language processing [00:50:00] per se. So probably some linguistic flair is there. And this new language of AutoML is fascinating. That's where I'm going now.
Ramanand: Nice. Coming back to AI, you mentioned AutoML. I know, you have dabbled a little bit in explainable AI. What are the areas that you're looking ahead to in the field of AI, which maybe we have not covered so far?
Yogesh Kulkarni: Again, biases in AI are getting very important, responsible AI. For listeners, I suggest you follow the Gartner hype cycle. Gartner is a market research company, a very famous one I think. Right from 1995, they have published hype cycles: graphs that technologies traverse. There is an initial hype, then a peak of hype. Everybody starts using them, the expectations become too much, and it drops. Currently machine learning is also dropping. But if it survives that trough, that valley, then it matures, and then it is used heavily. I'd suggest following at least the 2018, 2019 and 2020 hype cycles, which are publicly available. That's why I think explainable AI and responsible AI are coming up now.
What is happening is that AI is so mature now. Machine learning and deep learning, I would say, have crossed the peak of the hype. People are starting to use it. And as people use it, they want to understand why it made a particular decision. That's why explainability is coming: why was I given a loan, and why was this [00:52:00] dark-skinned guy not given a loan? People are asking legal questions; there's a compliance aspect, a GDPR aspect to it also. If you have the tools and skills to explain AI, explainable AI, to explain the biases that algorithms, and typically data, bring, you are best suited for this. So that is what should be explored, more than the core ML, because ML will be taken care of by AutoML.
Ramanand: I think you can't do this by being just a single-sliced individual. By that I mean having only a niche skill. The ability to use language, or use the domain, or make that connection across disciplines is probably what's going to matter.
Yogesh Kulkarni: Yes, cross-pollination is what is going to take us further. Just knowing one algorithm, working on SVMs for 30 or 50 years, is not going to help, in my opinion. This is what I think Taleb calls optionality; he said it in Antifragile. There are actually two schools of thought: focusing on only one thing for 50 years is one, and being horizontal is the other. But there is a middle ground also: decide the direction, but be a little divergent within that direction itself, and do cross-pollination. Have different domains talk to you. NLP can be, of course, the legal domain that I work in at my day job. But Indic languages, Sanskrit NLP, could be one; genomics could be another type of NLP. So some balance of horizontal and vertical.
Ramanand: The half-life of some of these skills has dipped, right? That's why you have to have a divergent portfolio, like you mentioned with GitHub, [00:54:00] a portfolio for your own career or life.
Yogesh Kulkarni: Actually, rather than deciding what you want to do, there's a very clichéd statement: what is your passion, what do you like? A very smoothed-out thing. I would go a little different: be open, and decide what you don't like. I don't like macOS, I don't like programming in JavaScript. Decide that, and then everything else is open. That's what you have to fix first.
Ramanand: To bring in Taleb again, Via Negativa.
Yogesh Kulkarni: Via Negativa is the thing. Yeah.
Ramanand: We have about three, four minutes left. So last couple of questions, Yogesh. We've discussed courses. Tell me a little bit about reading. Do you read a lot? Could you recommend a couple of books for people that you like?
Yogesh Kulkarni: I read a lot, generally, I would say. Sometimes self-help books also, still! But what I have found is that there are some core three-four principles that they expand and make a book out of. I would suggest one small book. It is not just three-four principles; it is a whole recipe kind of thing. The book's name is Zen Habits by Leo Babauta, a wonderful book, almost like a recipe book: what you should do, how many things you should do in a day. I liked that very much, because it's very simple and not diluted. On the technical front, one book that I read a few years back is The Master Algorithm [00:56:00] by Pedro Domingos.
I'm fascinated by the way things evolve, like with the Gartner hype cycle. There were types of algorithms: first procedural, then came machine learning. In procedural, you knew only the input, and you wrote the logic. In machine learning, you give the input and output, and the thing writes the logic. But you still have to configure the neural network that does the conversion. Can something evolve the neural network itself? That would be the master algorithm. People put these things in perspective, so that book I would recommend. In the case of non-technical things, or fiction, actually, I'm not much into English fiction. I'm still a Marathi person; if it is fiction, something non-technical, I read Marathi only. That's something which I follow very intuitively. Not the English part of it.
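The procedural-versus-machine-learning contrast Yogesh draws can be made concrete in a few lines. A minimal sketch, where the rule being learned and the tiny dataset are invented for illustration: procedurally, you write the rule yourself; in the ML mode, you hand over input-output pairs and let a fitting routine recover the rule.

```python
# Procedural: you know the input and you write the logic yourself.
def double_plus_one(x):
    return 2 * x + 1

# Machine learning: you give inputs AND outputs, and a program finds the logic.
# Here "finding the logic" is ordinary least-squares fitting of a line.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [0, 1, 2, 3, 4]
ys = [double_plus_one(x) for x in xs]   # the input-output examples
slope, intercept = fit_line(xs, ys)     # the program recovers "2x + 1"
print(slope, intercept)
```

In the book's framing, the next step up is a system that can evolve the model structure itself, not just its parameters, which is what the hunt for a "master algorithm" is about.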
Ramanand: Thank you. That was a nice sample. I'm sure it doesn't do justice to your interests and your library, but it's a very nice sampling. One quick question on the courses front: what are your go-to platforms for courses these days? Where do you start?
Yogesh Kulkarni: I've done almost all the popular ones: Coursera, edX, Udacity, Udemy, DataCamp. Coursera is more for intermediate-level people. Udemy is for beginners. edX and Udacity would be in between. I would suggest that once you get some basic understanding, do Coursera courses. They're very well thought out, and platform-wise, the platform is very good. [00:58:00]
Ramanand: You spoke about a community that you have helped nurture. Tell us a little about the different communities that people interested in data science should consider joining.
Yogesh Kulkarni: There's a site called meetup.com. It is a wonderful site where you can type in your subject and your location, say deep learning, whatever, and you'll come to know the meetups that are happening. Meetups are actually the communities now. They used to be physical before, but now it is all virtual. So there is a Python community in Pune, a TensorFlow community, and of course ones for all the domains. As far as data science goes, these communities are available. I've subscribed to quite a few meetups, and I attend them rather religiously. And offline, Pune is very vibrant that way. If you are a runner, there's a Pune Runners club. If you are a sketcher, there's the Urban Sketchers group. I've done those also. Community-wise, there is no dearth of things, in Pune at least.
Ramanand: Alright, Yogesh, on that very diverse, divergent, cross-pollination note: you started with geometric modeling, and right now we're just about to touch art and sketching. I'm sure we'll do another episode to do justice to that half of Yogesh Kulkarni. Yogesh, thank you so much for spending the last hour with me.
Yogesh Kulkarni: Thank you, Ramanand.
If you want to get into the habit of reading, or explore diverse topics that you wouldn't have read otherwise, CTQ Compounds is for you. Compounds are expertly curated by us and are a great way to slip in 15 minutes of reading nonfiction every day. The FutureStack compound is perfect for anyone with their eye on the future. It gives you a regular dose of relevant info to keep you current and relevant in the future to come. For how you can be a part of a compound, go to ctqcompounds.com. You can also see what our compound members have to say about their experience there. That's ctqcompounds.com.
[End of Transcript]