Eight Machine Learning Myths

8 Lies in Ai You Might Believe are True

Hello everyone.

What’s in store for today? 

  • Introduction: Lies in Ai

  • Myth one: Math

  • Myth two: Coding

  • Myth three: Modeling

  • Myth four: Faker Scientist

  • Myth five: Degree

  • Myth six: Model choice

  • Myth seven: Entry level

  • Myth eight: Job competition

The number of outright lies in this space is astounding. Many things are stated as fact that are absolutely false. In this newsletter, let’s debunk eight myths that are commonplace in machine learning.

🤥 Machine Learning is math.

Nope. Not in the real world. All the top models for all the problems we face are already written. You DO NOT need to know the math behind the model. The math that is REQUIRED is applied statistics. The job of the machine learning engineer is to cleanse data and pass that cleansed data to the model. Most data cleansing processes are based on applied statistics.

The job of the machine learning engineer is not to write the model; the models already exist.
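To make "data cleansing is applied statistics" a little more concrete, here is a minimal sketch using only Python's standard library. The function name `clean_column`, the sample ages, and the 1.5 × IQR fence are illustrative assumptions, not a prescribed recipe: missing values are filled with the median, and values outside the fences are clipped.

```python
import statistics


def clean_column(values, k=1.5):
    """Fill missing values with the median, then clip outliers to the IQR fences.

    `values` is a list of numbers where None marks a missing entry.
    `k` is the fence multiplier (1.5 is a common convention, assumed here).
    """
    present = [v for v in values if v is not None]
    median = statistics.median(present)

    # Quartiles of the observed values (statistics.quantiles, default method).
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr

    cleaned = []
    for v in values:
        if v is None:
            v = median  # impute missing entries with the median
        cleaned.append(min(max(v, low), high))  # clip to [low, high]
    return cleaned


# Hypothetical ages with one missing value and one obvious data-entry error.
ages = [34, 29, None, 41, 38, 300, 27, 45]
print(clean_column(ages))
```

Nothing here involves the math behind a model. It is medians, quartiles, and a judgment call about what counts as an outlier.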

🤥 Faker Scientists and machine learning engineers code their own models.

Nope. I’ve worked at over 30 companies, including Uber and Microsoft, and can count on one hand the people who could author a machine learning model from scratch. In order to code one single model, you must know the math behind that model and have enough programming knowledge to write the code for it. This is where most fall short. There are many people who could work out the math behind a model but very few who could code it.

NOTE: Here’s an example of a simple model written from scratch in Python. It will give you an idea of what’s involved from a coding and mathematics perspective.
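The example originally referenced here is not reproduced in this text. As a stand-in, and not necessarily the model the author had in mind, here is a minimal sketch of a one-feature linear regression trained by gradient descent in plain Python. The function name, learning rate, epoch count, and toy data are all assumptions made for illustration.

```python
def fit_linear_regression(xs, ys, lr=0.01, epochs=5000):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors for the current parameters.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(error^2) with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_linear_regression(xs, ys)
print(f"w={w:.3f}, b={b:.3f}")
```

Even for a model this small, you need the loss function, its derivatives, and a working training loop. That is the combination of math and programming described above.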

The only people authoring machine learning models in the real world are called developers.

The models in the real world already exist. In machine learning, we simply call the models; we don’t write them.

🤥 Most machine learning professionals spend their time modeling.

Nope. The models already exist and they are easy to use. Modeling is the easy part in machine learning. Additionally, in the real world, 90% of all the models in the applied space are regression and classification models, and one model is king here. We can’t talk about it here because it will spoil an upcoming myth. 😊

If machine learning engineers and faker scientists don’t model, then what do they do? The answer is well known and defined in the Ai space. The answer is… data cleansing. How much time is devoted to cleaning data in the real world? Most of it. The quote directly below is from one of the world’s foremost authorities on artificial intelligence, Andrew Ng.

Andrew describes data as the food for your AI model. Just like with real food, the goal isn't to feed your model the most calories you can possibly cram in; it's to give it a well-balanced diet. Data cleaning makes up over 80% of a machine learning engineer's work.

Andrew Ng

🤥 The top job in Ai is the data scientist.

A lie that has cost companies billions of dollars. You simply can’t take an academic and plug them into real-world roles. It simply doesn’t work. I use the term faker scientist to describe the academics who thought they were going to take over all the predictive modeling data roles in the real world. The year was 2012 and the Harvard Business Review published an article titled “Data Scientist: The Sexiest Job of the 21st Century.”

Image From Failed Article

The article and its contents would turn out to be one of the biggest failures in the history of predictive analytics. I was one of the first people to blog about it and I took a beating for it. A decade later, I got the last laugh. The faker science role has dropped off every top job list, and while the role still exists, the top job in Ai is the machine learning engineer.

If you want to learn more, here’s Google trying to whitewash the failure of the faker scientists. Type this in Google and read the Ai Overview. It’s great for a laugh.

the sexiest job in the world harvard business review failure

The pic below is Google trying to whitewash the failures of faker scientists in the real world.

Google Bullshit

I’ve already spent too much time on this one myth but this one kills me. A “potential” for failure? Holy shit. Where do you fail 85% of the time and think you’re doing a good job?

Gartner, a company that studies trends in this space, noted, “more than 85% of all faker science projects failed and the same study found that only 4% of the projects that were started end up with a production machine learning model.”

You read that right. Only 4% of all real-world faker science projects ended with a production model. That’s a 96% failure rate. 

The goal of EVERY project in machine learning and faker science is a production model.

🤥 You need a master’s or PhD to work in machine learning.

I think we’ve killed this myth with the demise of the faker scientist. The opposite is true. We need data professionals skilled in machine learning, not academics with no real-world data skills. This mindset came from the 2012 article from the Harvard Business Review and ended as many top tech companies fired entire data science teams.

Now, there is one outlier here and it’s the Ai researcher. Almost all Ai research roles require a PhD. However, no sane person wants to be an Ai researcher. Less than one percent of all Ai research will ever make it into a production model. Imagine spending your entire career doing something that will have zero impact on the real world. No thanks.

Less than 1% of all Ai research will ever make it into a production machine learning model.

🤥 Choosing a machine learning model is difficult.

Nothing could be further from the truth. The two problems most frequently seen in the real world are classification and regression problems. The top models for both of these problems are gradient boosters. These account for the majority of applied machine learning.
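For readers who have not met the term, a gradient booster builds a model as a sum of many small trees, each one fit to the errors left by the trees before it. The sketch below is a toy illustration of that idea for squared error on a single feature, with made-up data and hypothetical function names. In practice you would call an existing library implementation rather than write this yourself, which is the whole point of this post.

```python
def fit_stump(xs, residuals):
    """Find the one-split tree that best fits the residuals (least squares).

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_gradient_booster(xs, ys, n_rounds=50, learning_rate=0.1):
    """Boosting for squared error: each stump is fit to the current residuals."""
    base = sum(ys) / len(ys)  # start from the mean prediction
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is just the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps, learning_rate


def predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy structured data (made up): a step-like relationship between x and y.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
one_round = fit_gradient_booster(xs, ys, n_rounds=1)
fifty_rounds = fit_gradient_booster(xs, ys, n_rounds=50)
print(mse(one_round, xs, ys), mse(fifty_rounds, xs, ys))
```

The training error shrinks as rounds are added, because every new stump corrects a little of what the previous ones got wrong.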

Deep learning is a niche within machine learning. Deep learning models get all the attention because of the pace of change. Their accomplishments have been astounding. Think of Tesla’s FSD and ChatGPT… both deep learning models. All of the LLMs are deep learning models. 

The most used models in the real world are gradient boosters, not deep learning models.

Here’s an easy way to decide what model you are going to use. If the problem is computer vision, NLP, audio or other unstructured data, then a deep learning model is the right choice. If you’re going to be modeling structured data, then a gradient booster is most likely the right choice.
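The rule of thumb above is simple enough to write down as a lookup. This is only a sketch: the function name `choose_model_family` and the label strings are hypothetical, and anything not listed as unstructured is assumed to be structured (tabular) data.

```python
# Kinds of data the rule of thumb treats as unstructured.
UNSTRUCTURED_KINDS = {"computer vision", "nlp", "audio", "unstructured"}


def choose_model_family(data_kind):
    """Map a kind of data to the model family the rule of thumb suggests."""
    if data_kind.strip().lower() in UNSTRUCTURED_KINDS:
        return "deep learning"
    # Anything else is assumed to be structured (tabular) data.
    return "gradient booster"


for kind in ["NLP", "audio", "structured"]:
    print(kind, "->", choose_model_family(kind))
```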

🤥 I can take a course on edX or Coursera and get a job in machine learning.

Nope. Not a snowball’s chance in hell. There are ZERO entry level jobs in machine learning and there is NOTHING you can do or course you can take that will put you in a machine learning role without heavy real-world skills and experience. I wish this weren’t true because I could sell a lot more courses on my site if it were.

If you navigate to any job board and type in the title machine learning engineer, it won’t take you long to notice a trend in the jobs and the skills and experience companies are looking for. Don’t believe me? Then go do it. Look at the top 100 jobs and write down the skills and experience they are looking for. Here’s a job off Indeed. It’s the job board I use the most to get contracts. What do you see?

I’m not asking you to trust me blindly. You don’t need to if you can read. Just break down the skills this company wants. Do these skills align with everything I’ve been telling you? If you read my post from seven years ago, you’ll notice my tune hasn’t changed. I’ve been saying the same thing for a decade now.
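If you do write down the skills from a batch of postings, a few lines of Python will tally them for you. The postings below are placeholders made up to show the shape of the input, not real listings, and `tally_skills` is a hypothetical helper name.

```python
from collections import Counter

# Placeholder postings: replace with the skills you wrote down from real
# listings. These entries are illustrative only, not collected data.
postings = [
    ["python", "sql", "production ml", "data cleansing"],
    ["python", "sql", "gradient boosting"],
    ["python", "production ml", "cloud"],
]


def tally_skills(postings):
    """Count how many postings mention each skill (case-insensitive)."""
    counts = Counter()
    for skills in postings:
        # A set avoids double-counting a skill repeated within one posting.
        counts.update({s.strip().lower() for s in skills})
    return counts


for skill, n in tally_skills(postings).most_common():
    print(f"{skill}: {n} of {len(postings)} postings")
```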

🤥 There is a ton of competition for jobs in machine learning.

This is false. The truth is… if you’ve created a production machine learning model then you’re among the elite few who have ever done so. Right now, there are over 400K jobs for machine learning engineers open that aren’t being filled. The University of Texas projects there will be 500K next year. The jobs are there; the people who have the skills to fill these jobs are not. Companies are NOT going to put an amateur in charge of their most precious resource, their data. It simply isn’t going to happen.

Companies are not going to put an amateur in charge of their most precious resource, their data.

Mike

This notion that the jobs are going away or that Ai is replacing them is laughable. The jobs are there, they aren’t going away, but companies would rather leave a job empty than risk their entire business on an unknown.

We could continue on but I don’t want to dilute the list. Let’s wrap this up with a sentence or two about each myth.

  • There is very little math in real-world machine learning. Most of the math is applied statistics.

  • The only people authoring machine learning models in the real world work in big tech. As a machine learning engineer, you aren’t getting paid to create models, you’re getting paid to apply models to the data of the company that hired you.

  • The majority of a machine learning engineer’s time in the real world will be spent sourcing and cleansing data. It won’t be modeling.

  • The top job in Ai is the machine learning engineer and has been now for the last five years or more. It’s not the faker scientist. Companies have cleaned these people out.

  • You don’t need a master’s or PhD to work in any Ai role outside of research. The top machine learning engineers I know came from BI or other data roles.

  • Model choice is not difficult. Most of you will be working with structured data and the kings here are gradient boosters.

  • There is no course you can take, no degree you can attain to get a job in applied machine learning. This space is about experience and skills. You either have them or you don’t.

  • There is no competition for jobs in the machine learning space. A lot of unqualified people applying for jobs they won’t get doesn’t make the market saturated.

You’ve reached the end of another very important post. 🎉

Thanks for reading my content. Have a great day and continue learning.