If you had samples of cow behavior, such as videos, you could simulate behavior at each step and use a genetic algorithm to optimize for candidates that produce similar behaviors. In this case it's probably not the right solution, but it's not that hard to swap out different learning algorithms when things don't seem to jell.
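For concreteness, a minimal sketch of that loop in Python; simulate() and behavior_distance() are hypothetical stand-ins for whatever simulator and similarity measure you actually have:

    import random

    def simulate(params):
        # Hypothetical stand-in: run the agent and record a behavior trace.
        return [sum(p * (t + 1) for p in params) for t in range(10)]

    def behavior_distance(trace, observed):
        # Hypothetical stand-in: mean squared difference between traces.
        return sum((a - b) ** 2 for a, b in zip(trace, observed)) / len(observed)

    def evolve(observed, pop_size=50, generations=100, sigma=0.1):
        # Fitness = behavioral similarity to the observed samples.
        fitness = lambda p: -behavior_distance(simulate(p), observed)
        population = [[random.gauss(0, 1) for _ in range(4)] for _ in range(pop_size)]
        for _ in range(generations):
            parents = sorted(population, key=fitness, reverse=True)[:pop_size // 4]
            # Refill the population by mutating the fittest quarter.
            population = [[w + random.gauss(0, sigma) for w in random.choice(parents)]
                          for _ in range(pop_size)]
        return max(population, key=fitness)

Swapping the learner then mostly means swapping evolve() for something else while keeping the same fitness measure.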
I have already explained in the post above why optimization is very much a part of machine learning. Now, you say that classification and optimization are two different things, and that is true. But really, it can be more fruitful to look at supervised learning as a special case of unsupervised learning [1]. It is best to seek the most general framework from which to understand things, as it leads to deeper understanding, broader applicability of concepts, and easier cross-fertilization across fields.
For example, understanding the spectral theorem makes SVD (and hence PCA) and the DFT class of algorithms much clearer. Understand the notions of Lp-norms, convexity, adjoints, loss functions, and regularization, and a whole bunch of seemingly different algorithms collapse into facets of the same thing. Hook that up to automatic differentiation and some optimization algorithms and you can write anything from neural networks, SVMs, and regularized logistic regression to non-negative tensor factorization in a few lines. You stop making arbitrary divisions between classification and optimization. Much the same kind of collapse can be done for the dual [2] notion of probabilistic algorithms by thinking in terms of graphs, simplices, parametrizations, families, and conjugacy.
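To make that concrete, a rough sketch in Python with JAX (toy data, plain gradient descent; a sketch of the idea, not a canonical implementation). The only thing separating regularized logistic regression from a linear SVM here is the loss function:

    import jax.numpy as jnp
    from jax import grad

    def logistic_loss(w, X, y):          # regularized logistic regression
        return jnp.mean(jnp.log1p(jnp.exp(-y * (X @ w))))

    def hinge_loss(w, X, y):             # linear SVM
        return jnp.mean(jnp.maximum(0.0, 1.0 - y * (X @ w)))

    def fit(loss, X, y, lam=0.1, lr=0.1, steps=500):
        # objective = loss + L2 regularizer; autodiff supplies the gradient.
        objective = lambda w: loss(w, X, y) + lam * jnp.sum(w ** 2)
        w = jnp.zeros(X.shape[1])
        for _ in range(steps):
            w = w - lr * grad(objective)(w)
        return w

    # Toy data with labels in {-1, +1}.
    X = jnp.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
    y = jnp.array([1.0, 1.0, -1.0, -1.0])
    w_logreg = fit(logistic_loss, X, y)
    w_svm = fit(hinge_loss, X, y)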
The best thing about all this is that you stop asking "which algorithm should I use?" and start asking "what do I want to do, and what is the best mathematical model for it?" What would really be great is a machine learning language, where one could work with things akin to folds and maps over various structures and manifolds and make the incidental complexity disappear. Stuff like [3] is really encouraging for that direction.
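As a hint of what that could look like, even in plain Python an SGD fit is literally a fold of an update step over the data (toy least-squares setup, just for illustration):

    from functools import reduce

    def sgd_step(w, example, lr=0.1):
        # One least-squares gradient step: grad of 0.5 * (pred - y)^2.
        x, y = example
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        return [wi - lr * err * xi for wi, xi in zip(w, x)]

    data = [([1.0, 1.0], 2.0), ([2.0, 0.0], 2.0), ([0.0, 3.0], 3.0)]
    w = reduce(sgd_step, 10 * data, [0.0, 0.0])   # training = fold(step, data, w0)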
[1] The problem of learning a distribution is usually called unsupervised learning, and supervised learning is formally a special case of it: if we admit that all the functional relations or associations we are trying to learn have some element of noise or stochasticity, then this connection between supervised and unsupervised problems is quite general.
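Spelled out: a noisy functional relation y = f(x) + e is just a conditional distribution p(y | x), and since

    p(x, y) = p(y | x) p(x),

learning the joint distribution (the unsupervised problem) subsumes learning the conditional (the supervised one).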
I agree with everything you said. I never said optimization isn't a part of ML; it was very much a part of my ML master's, in fact. I was just saying that, in this case, the OP doesn't have an explicit function to optimize, which is why he didn't need optimization...
Oh cool =) I don't really think in terms of classification vs. optimization anymore; either way you're minimizing some loss or objective function. But I did rant on, because I am really passionate about how much unnecessary complexity (hehe) there is in much of machine learning, and I am always eager to talk about promising unifying approaches. If you haven't read up on semirings or computational algebraic geometry, you should. For an MLer, learning more math is like a programmer learning math: a lot of upfront work, but it turns a seemingly arbitrary list of rules into something far more cohesive and simple; what you end up with is basically a conceptual compression.
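To give a flavor of the semiring point (a toy illustration, not any particular library): one generic matrix "product", parameterized by (plus, times, zero), yields shortest paths, reachability, or path counts depending on the semiring you plug in:

    from functools import reduce

    def semiring_matmul(A, B, plus, times, zero):
        # Generic C[i][j] = plus-reduce over k of times(A[i][k], B[k][j]).
        return [[reduce(plus, (times(A[i][k], B[k][j]) for k in range(len(B))), zero)
                 for j in range(len(B[0]))]
                for i in range(len(A))]

    INF = float("inf")
    W = [[0, 5, INF],      # edge weights of a 3-node graph (INF = no edge)
         [INF, 0, 2],
         [1, INF, 0]]

    # Tropical semiring (min, +): W "squared" gives shortest <=2-hop distances.
    shortest = semiring_matmul(W, W, min, lambda a, b: a + b, INF)

    # Boolean semiring (or, and): the same code computes reachability instead.
    R = [[x != INF for x in row] for row in W]
    reach = semiring_matmul(R, R, lambda a, b: a or b, lambda a, b: a and b, False)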
Yeah. It must be pretty energy-intensive for the brain, considering all the rewiring that occurs. I take the interlaced-GIF approach to learning hard things.