Backpropagation is based on derivative-based optimization (i.e., finding the maximum or minimum of a function from its derivatives; in my country this is taught before university), and on the chain rule for differentiating composite functions (first year of university).
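For concreteness, the chain-rule part is nothing more than this (writing it for a two-function composition, in my own notation):

    \frac{dL}{dx} = \frac{dL}{dg} \cdot \frac{dg}{dx},
    \qquad \text{e.g. } L = f(g(x)) \implies \frac{dL}{dx} = f'(g(x)) \, g'(x)

Backprop just applies this once per layer, from the output back to the input.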
But my point was: if you look at the equations and the steps without fully understanding the insights behind them, it looks like a joke of an algorithm. It just does some multiplications, applies the gradients, moves to the previous layer, and repeats.
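To show how little machinery that loop actually is, here is a toy sketch in plain NumPy (my own throwaway example, names and layer sizes picked arbitrarily, not any library's API): a two-layer tanh network trained by exactly that "multiply, apply the gradient, step back a layer, repeat" recipe.

    import numpy as np

    rng = np.random.default_rng(0)

    # Tiny network: 2 -> 4 -> 1, weights chosen at random for the example.
    W = [rng.normal(size=(2, 4)), rng.normal(size=(4, 1))]
    b = [np.zeros(4), np.zeros(1)]

    def forward(x):
        """Forward pass, keeping every activation because
        backprop needs them for the chain rule."""
        acts = [x]
        for Wl, bl in zip(W, b):
            x = np.tanh(x @ Wl + bl)
            acts.append(x)
        return acts

    def backward(acts, grad_out, lr=0.1):
        """Walk from the last layer to the first: multiply the incoming
        gradient by each local derivative (chain rule), update, repeat."""
        g = grad_out
        for l in reversed(range(len(W))):
            g = g * (1 - acts[l + 1] ** 2)  # d tanh(z)/dz = 1 - tanh(z)^2
            gW = acts[l].T @ g              # gradient w.r.t. this layer's weights
            gb = g.sum(axis=0)
            g = g @ W[l].T                  # pass the gradient to the previous layer
            W[l] -= lr * gW                 # apply the gradient step
            b[l] -= lr * gb

    # One training step on a dummy batch with squared-error loss.
    x = rng.normal(size=(8, 2))
    y = rng.normal(size=(8, 1))
    acts = forward(x)
    backward(acts, grad_out=2 * (acts[-1] - y) / len(x))

The backward function really is the whole algorithm: each iteration is one chain-rule multiplication plus a gradient step, and then you move to the previous layer.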