Some quotes from Lesson 3 of course.fast.ai by Jeremy Howard.
I remember a few years ago, when I said something like this in a class, somebody on the forum was like, “this reminds me of that thing about how to draw an owl”. Jeremy’s basically saying: okay, step one, draw two circles; step two, draw the rest of the owl. The thing I find I have a lot of trouble explaining to students is: when it comes to deep learning, there’s nothing between these two steps. When you have ReLUs getting added together, and gradient descent to optimize the parameters, and samples of inputs and of what you want, the computer draws the owl. That’s it.
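To make the "nothing between these two steps" point concrete, here is a minimal sketch (mine, not code from the lesson) of ReLUs added together and fitted with plain gradient descent. The target function `x * x`, the number of ReLUs, the learning rate, and every function name are assumptions chosen for illustration; it uses only the Python standard library.

```python
import random

def relu(z):
    return z if z > 0 else 0.0

def model(units, c, x):
    """Sum of scaled ReLUs plus a constant: sum(a * relu(w*x + b)) + c."""
    return sum(a * relu(w * x + b) for w, b, a in units) + c

def mse(units, c, xs, ys):
    return sum((model(units, c, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train(hidden=6, steps=2000, lr=0.005, seed=0):
    rng = random.Random(seed)
    # Samples of inputs and of what we want (assumed target: a quadratic).
    xs = [-2 + 0.1 * i for i in range(41)]
    ys = [x * x for x in xs]
    # Each unit is [w, b, a], randomly initialised.
    units = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(hidden)]
    c = 0.0
    initial = mse(units, c, xs, ys)
    n = len(xs)
    for _ in range(steps):
        grads = [[0.0, 0.0, 0.0] for _ in units]
        grad_c = 0.0
        for x, y in zip(xs, ys):
            # Derivative of the mean squared error w.r.t. the prediction.
            err = 2 * (model(units, c, x) - y) / n
            grad_c += err
            for g, (w, b, a) in zip(grads, units):
                z = w * x + b
                if z > 0:  # a ReLU only passes gradient where it is active
                    g[0] += err * a * x
                    g[1] += err * a
                    g[2] += err * z
        # Gradient descent step on every parameter.
        for unit, g in zip(units, grads):
            for k in range(3):
                unit[k] -= lr * g[k]
        c -= lr * grad_c
    return initial, mse(units, c, xs, ys)

if __name__ == "__main__":
    initial, final = train()
    print(f"loss before: {initial:.4f}, after: {final:.4f}")
```

There is no step in the loop that knows anything about parabolas: the parameters start random and the loss goes down purely because of the gradient updates.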
I pretty much only use resnet18 at the start of a new project because I want to spend all of my time trying things out. I’m going to try different data augmentation, I’m going to try different ways of cleaning the data, I’m going to try to bring in external data, and so I want to be trying lots of things and I want to be able to try it as fast as possible. Trying better architectures is the very last thing that I do.
My very strong opinion is that the vast majority of projects I see in industry wait far too long before they train their first model. You know, in my opinion, you want to train your first model on day one with whatever CSV files or whatever that you can hack together. And you might be surprised that none of the fancy stuff you’re thinking of doing is necessary, because you already have a good enough accuracy for what you need. Or you might find quite the opposite: you might find that, oh my god, we’re basically getting no accuracy at all, maybe it’s impossible. These are things you want to know at the start.
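As a rough illustration of what a "day one" model might look like (my own sketch, not something shown in the lesson), the snippet below reads a tiny CSV, computes a majority-class baseline, and fits a one-feature threshold rule. The column names `hours_used` and `churned` and the data itself are made up, and the accuracy is measured on the same rows used for fitting, so it is only a first sanity check.

```python
import csv
import io

# Hypothetical data standing in for "whatever CSV files you can hack together".
RAW_CSV = """hours_used,churned
1,1
2,1
3,1
4,0
5,1
6,0
7,0
8,0
9,0
10,0
"""

def load_rows(text):
    reader = csv.DictReader(io.StringIO(text))
    return [(float(r["hours_used"]), int(r["churned"])) for r in reader]

def majority_baseline(rows):
    """Always predict the most common label; returns (label, accuracy)."""
    labels = [y for _, y in rows]
    majority = max(set(labels), key=labels.count)
    accuracy = sum(y == majority for y in labels) / len(labels)
    return majority, accuracy

def best_threshold(rows):
    """Try each observed value as a cut; returns (accuracy, cut, label at or below cut)."""
    best = (0.0, None, None)
    for cut in sorted({x for x, _ in rows}):
        for low_label in (0, 1):
            correct = sum(
                (low_label if x <= cut else 1 - low_label) == y for x, y in rows
            )
            accuracy = correct / len(rows)
            if accuracy > best[0]:
                best = (accuracy, cut, low_label)
    return best

if __name__ == "__main__":
    rows = load_rows(RAW_CSV)
    print("majority baseline:", majority_baseline(rows))
    print("threshold rule:", best_threshold(rows))
```

Comparing the two numbers is the point: if the simple rule is already good enough, the fancy work may be unnecessary, and if nothing beats the majority baseline, that is worth knowing before investing more.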