Lots of things are torsors: position, currency values, calendar dates, etc. The values themselves are arbitrary, and translating/scaling them by some value doesn't make a functional difference. Torsors let us talk about these things without needing to make such an arbitrary choice a priori.
In the case of baseless logs, the underlying set is "information units", i.e. log 2 is bits, log e is nats, log 10 is digits, etc. The conversion factors give us the torsor's group, and picking a privileged unit is just a trivialization of the torsor.
The vector division notation is, similarly, encoding a g-torsor in precisely the same way as length units are.
The examples so far are all torsors with abelian groups, but specifying position requires choosing both an origin and a length unit. The group of this torsor is a suitable semidirect product of translations and scalings, which is a non-abelian group.
Most of the time we just implicitly choose a trivialization, which often causes confusion because it identifies objects with operations on them, e.g. conflating vectors as positions with vectors as translations. The author's treatise on problems with geometric algebra [1] even brings up this point!
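To make the non-abelian claim concrete, here is a minimal sketch, assuming positions on a line and modelling a change of unit and origin as the map x -> scale * x + shift. The `Affine` class and its `then` method are made up for illustration, not taken from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Affine:
    """The map x -> scale * x + shift (hypothetical helper, scale > 0)."""
    scale: float
    shift: float

    def __call__(self, x: float) -> float:
        return self.scale * x + self.shift

    def then(self, other: "Affine") -> "Affine":
        """Return the map 'apply self first, then other'."""
        # other(self(x)) = other.scale * (self.scale * x + self.shift) + other.shift
        return Affine(other.scale * self.scale,
                      other.scale * self.shift + other.shift)


double = Affine(scale=2.0, shift=0.0)  # change of length unit
move = Affine(scale=1.0, shift=3.0)    # change of origin

scale_then_move = double.then(move)    # x -> 2x + 3
move_then_scale = move.then(double)    # x -> 2(x + 3) = 2x + 6

print(scale_then_move, move_then_scale)
print(scale_then_move == move_then_scale)  # False: the order matters
```

Scaling and then translating differs from translating and then scaling, which is all "non-abelian" means here.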
I do know about torsors, actually, but I didn't think to link it from there. I guess I don't find the term very useful; it feels like things are still hard to think about even after you know it's a torsor! But I also think I need to get more familiar with the concept, because the other commenter on here who described my basis-logarithm as a "GL(V)-torsor" really said it much more succinctly than what I was hacking out manually.
Regardless of the terminology, I thought it was interesting because I have never seen the logarithm thought about in that way.
Thanks for the article. I do think your more elementary approach is good pedagogy since the subject is so broadly familiar already. I just like torsors, since they elegantly encode the "arbitrary choice" needed to deal with lots of objects.
Using the term "torsor" for that mathematical concept has been a very bad choice, both because the concept does not have any obvious relationship with the meaning of the word and because the word "torsor" had already been used for a very long time in classical mechanics for a very different concept, i.e. for the quantity that must be null for a rigid body to stay in equilibrium (i.e. the pair of a resultant force and a resultant torque).
Unfortunately, in mathematics there already is a long tradition of reusing common words to designate concepts that have no relationship whatsoever with the original meanings of those words. This obfuscates the content of many mathematical books or research papers, because even when they state trivial facts the statements are opaque for those unfamiliar with the specific jargon used in that niche branch of mathematics.
Words happen more than they are chosen, cf. "computer". The term "torsor" in this sense likely comes from the French "torseur" [0], which was used to describe rigid-body motions via a fundamental screw-like action.
The hypothesis seems to be that the idea of affine spaces came out of that theory, for whatever reason, which was subsequently generalized to principal bundles and finally into what we have now. The point is that, at every step along the way, we want to connect the incrementally new ideas to existing ones, and creating a hard break with new, idiosyncratic terminology is itself obfuscatory.
My beef is more with use of the heavily-overloaded words "regular" and "normal" in math, which just seems like lazy naming:
> In the normal extension K/Q, every normal subgroup of the regular representation acts on a normal scheme that is regular in codimension one, whose normal bundle — orthonormal to the regular surface at each regular value — carries a normal operator whose spectrum follows a normal distribution over a space that is at once regular and normal, all indexed by a regular cardinal.
That's like 8 different meanings of normal and 6 different meanings of regular. lol
Logs are awesome. I started a math textbook from the 1920s a while ago, and all the calculations relied on tabulated logs, where you would convert the number to a log in a table to reduce the operation's degree, then convert back to the ordinary representation. This would reduce operations like finding cube roots to division, which could be converted to log-log to be further reduced to subtraction before you would restore to ordinary notation. It feels like you're using a magic wormhole or something when you're doing this stuff by hand; it's really neat.
Another neat application, if a bit simplistic, is those mechanical paper computers that let you figure out your body mass index. They are basically two disks with logarithmic scales on them that you rotate relative to each other. Like a slide rule, but circular. I think you can find them under the name 'BMI wheel'.
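A rough sketch of that table workflow, with `math.log10` and `10 ** x` standing in for the printed log and antilog tables. The function names are invented for illustration, and the results are only as exact as floating point allows.

```python
import math


def cube_root_via_logs(x: float) -> float:
    """Cube root of a positive x: log lookup, divide by 3, antilog lookup."""
    log_x = math.log10(x)      # "look up" the log in the table
    log_root = log_x / 3       # root extraction becomes a division
    return 10 ** log_root      # "look up" the antilog


def product_via_logs(a: float, b: float) -> float:
    """Product of positive a and b: multiplication becomes an addition."""
    log_sum = math.log10(a) + math.log10(b)
    return 10 ** log_sum


print(cube_root_via_logs(2197.0))    # close to 13
print(product_via_logs(37.0, 54.0))  # close to 1998
```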
The important properties of the logarithm are structural: we usually do not care about units or bases, except when carrying out an actual numerical computation.
As developed in the article, informally, but somewhat sufficiently, the change of base formula shows that the choice of base is largely irrelevant: different bases give equivalent logarithms up to a constant factor.
The Taylor expansion of exp gives a more intrinsic and general definition of the exponential function. This allows exp to be generalised structurally to many algebraic settings, provided the relevant convergence conditions are met: for example, the complex exponential and its many possible logs, the matrix exponential, and so on…
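As a small illustration of the series definition, here is a sketch that sums the Taylor series of exp for real and complex arguments. The helper name `exp_taylor` and the fixed number of terms are assumptions made for the example; the same series with a square matrix in place of z gives the matrix exponential.

```python
import math


def exp_taylor(z: complex, terms: int = 40) -> complex:
    """Partial sum of sum_{n >= 0} z**n / n! (hypothetical helper)."""
    total = 0j
    term = 1 + 0j              # z**0 / 0!
    for n in range(terms):
        total += term
        term *= z / (n + 1)    # turn z**n / n! into z**(n+1) / (n+1)!
    return total


print(exp_taylor(1.0))            # approximately e
print(exp_taylor(1j * math.pi))   # approximately -1
```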
> The important properties of the logarithm are structural: we usually do not care about units or bases, except when carrying out an actual numerical computation.
Units are important as a sort-of type system, even at the conceptual level.
You are right that bases are not as important conceptually.
Well, the brightness of celestial objects is also sometimes negative:
> The apparent magnitude of known objects can range from −26.832 for our Sun to about +31.5 for objects in deep space imaged by the Hubble Space Telescope.[3]
More specifically, 0 dB is the loudest sound the audio system is rated to produce without distortion. It's common to be able to actually drive systems harder than their specified engineering limits, which is why meters have a short positive dB section marked in red.
The later reuse of “log” across valuations, dimension, vector fields, orders of vanishing is not so good. Those may be related ideas, but each needs a type signature: from what, to what, and preserving which operation?
Or, to say a little more explicitly what you're getting at: when you take a logarithm of some quantity, log x, x absolutely must be unitless. There's no way whatsoever to take a logarithm of something with a unit attached. (This is an important and useful dimensional analysis check in formulas and long calculations!)
So what do you do in practice? You have to normalize: you don't calculate log x, but instead log(x/U) for some scaling unit U. It's typical for U to be something like 1 mV or 1 W in electrical engineering, for example. This is completely legitimate, but it does mean that the thing that comes out needs a corresponding unit attached to it: dBmV, dBW, et cetera.
And it's really kind of important to be careful about that.
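A minimal sketch of that normalization, using the usual conventions of 20 * log10 for voltage ratios and 10 * log10 for power ratios. The function names are made up; the point is that the reference U is baked into each function and therefore into the unit of the result.

```python
import math


def to_dbmv(voltage_volts: float) -> float:
    """Voltage level in dBmV: 20 * log10(V / 1 mV)."""
    reference_volts = 1e-3
    return 20 * math.log10(voltage_volts / reference_volts)


def to_dbw(power_watts: float) -> float:
    """Power level in dBW: 10 * log10(P / 1 W)."""
    reference_watts = 1.0
    return 10 * math.log10(power_watts / reference_watts)


print(to_dbmv(1.0))    # 1 V relative to 1 mV, about 60 dBmV
print(to_dbw(0.001))   # 1 mW relative to 1 W, about -30 dBW
```

The argument of each log is a pure ratio, so the dimensional check from the parent comment passes.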
The term "baseless logarithm" is really nonsensical and using it would be a great mistake.
Nonetheless, where the author of TFA is correct is that logarithms are a single physical quantity, like length, area or volume, and that choosing the so-called "base" is choosing the unit of measurement for logarithms.
Logarithms are included in the dimensional formulae of many derived physical quantities, e.g. for describing the attenuation or amplification of waves during their propagation, where one uses quantities like logarithm per length and logarithm per time.
Changing the "base" of logarithms modifies the numeric values of all derived physical quantities exactly in the same manner as changing any other fundamental unit of measurement, like the unit of length or the unit of time.
As for any physical quantity, the complete value of a logarithm is independent of the unit of measurement, because it is the product of the numeric value and the unit of measurement. When the unit of measurement is changed, both the numeric value and the unit change and the product stays the same (i.e. the logarithm corresponds to the same ratio, regardless of what base is used to compute a numeric value for the logarithm).
Nowadays, the unit of logarithms is normally chosen between the octave (binary logarithms), neper (hyperbolic logarithms) or bel (decimal logarithms).
The units of measurement for logarithms are not the bases, but the logarithms of the bases, which is why e.g. the value of the number "e", the base of the hyperbolic logarithms, is never needed in any computation. The only values that are needed are "ln 2" or its inverse "log2 e", which are used to convert the numeric values of logarithms when the unit of measurement is changed between those corresponding to binary logarithms and to hyperbolic logarithms (a.k.a. natural logarithms, but there is nothing more "natural" about hyperbolic logarithms than about any other kind of logarithms).
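A small sketch of that unit-conversion view, assuming we measure every unit in nepers. The table and the `convert` helper are illustrative names only. Note that the table holds the logs of the bases (ln 2, ln 10), never the bases themselves.

```python
import math

# Size of each logarithmic unit, expressed in nepers.
NEPERS_PER_UNIT = {
    "neper": 1.0,            # hyperbolic (natural) logarithms
    "octave": math.log(2),   # binary logarithms
    "bel": math.log(10),     # decimal logarithms
}


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Re-express a logarithm given in one unit in another unit."""
    return value * NEPERS_PER_UNIT[from_unit] / NEPERS_PER_UNIT[to_unit]


ratio = 1024.0
in_octaves = math.log2(ratio)
print(convert(in_octaves, "octave", "neper"), math.log(ratio))
print(convert(in_octaves, "octave", "bel"), math.log10(ratio))
```

Each printed pair should agree up to rounding: the same ratio, expressed in different units.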
Wasn't there some scientific paper recently that proved that every operation can be represented as a logarithm? Like, the same as every logic gate can be derived from NAND gates?
I think what's going on with the complex logarithm is basically the same as the logarithm that outputs the set of all possible bases for a vector space. The complex logarithm produces a Z-torsor, and the basis logarithm produces a GL(V)-torsor. There's probably some way to represent a choice of branch cut as a part of the choice of the base of the complex logarithm, and similarly the choice of a specific basis as part of the choice of base of the vector space base logarithm.
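For the complex case, a quick sketch of the Z-torsor: every value of log z differs from the principal one by an integer multiple of 2*pi*i, and all of them exponentiate back to the same z. The sample point z and the range of k are arbitrary choices for the example.

```python
import cmath
import math

z = complex(-1.0, 1.0)
principal = cmath.log(z)   # principal value, imaginary part in [-pi, pi]

# The integers act on the set of logarithms by adding 2*pi*i*k.
branches = [principal + 2j * math.pi * k for k in range(-2, 3)]

for w in branches:
    # Every candidate logarithm maps back to (approximately) the same z.
    print(w, cmath.exp(w))
```

Picking the principal value is one trivialization; any other branch would serve equally well.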
Interesting, it did not occur to me to think of those as two instances of the same phenomenon. Although I still find the complex analytic one hard to think about.
I think that's more about integrations/differentials not producing them (generally speaking). Physics likes to deal with integrals and differentiation as you calculate change over time or over spatial dimensions.
E.g. the integral of x^10 is x^11 / 11 + c. No hyper-operation appears and it's just another power of x (with a division).
The integral of log(x) is x log(x) - x + c. So still basically just a logarithm.
Even the integral of 2^x is just 2^x / log(2) + c. Still basically the same thing.
There's no easy way to pull a hyper-operation out.
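Those three antiderivatives can be sanity-checked numerically by differentiating them and comparing with the integrand. This is only a spot check at one arbitrary point (x0 = 1.5) with a central difference, not a proof; `derivative` is a throwaway helper.

```python
import math


def derivative(f, x: float, h: float = 1e-6) -> float:
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


# (integrand, claimed antiderivative) pairs; log is the natural log here.
pairs = [
    (lambda x: x**10, lambda x: x**11 / 11),
    (math.log, lambda x: x * math.log(x) - x),
    (lambda x: 2**x, lambda x: 2**x / math.log(2)),
]

x0 = 1.5
for integrand, antiderivative in pairs:
    # The two printed numbers should agree to several digits.
    print(integrand(x0), derivative(antiderivative, x0))
```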
I read this kind of essay as a certain part of the arc by which new thoughts are formed: an act of large-scale pattern matching, laying out a bunch of cases which resemble each other, searching for the essential basis of the resemblance.
To post such a pattern allows the thought process to become distributed. Perhaps someone else will see the insight.
Logarithms are laughably simple once you've fully internalized the meaning of the log function; it simply answers the question:
"To what power must I raise the base to get the argument?"
This is why the output tapers out as you increase the argument; because even if you increase the argument exponentially, you only need a fixed increment in the power to reach that number... So if you increase the argument only by a fixed amount (linearly) instead of exponentially, then it makes sense that the output will grow sub-linearly.
I remember when I was doing algebra with logs many years ago at school, I was applying rules to remove the log from one side of the equation.
Then when I got to uni, I had to revise the rules but it was kind of silly of me because those rules can be trivially derived if you just think about what the log function means. Turns out I had been solving equations with logs throughout school without understanding what they even meant... It's only at university that I actually bothered to learn them.
Actually, TBH, I didn't even fully understand powers for some time even though I was doing calculus with them at school. I only fully understood powers once I properly internalized the concept of k-ary trees as a proxy.
It's one thing to be able to apply something, another to understand it. And I think to innovate with something, as a tool, it's not enough to be able to apply it. You must understand it.
// The power to which I must raise 10 to get 100 is 2.
log10(100) = 2
// The power to which I must raise 10 to get 1000 is 3.
log10(1000) = 3
// The power to which I must raise 3 to get 27 is 3.
log3(27) = 3
Also it makes solving equations much more intuitive:
log3(x) = 4
^ This means: the power to which I must raise 3 to get x is 4.
So it follows logically that if I raise 3 to the power of 4, I will get x.
This makes it intuitive that this equation can be rewritten as:
x = 3 ^ 4
You don't even need to know the algebraic rule. I felt foolish when I figured this out.
This was a rule I had memorized before. It's even dumber and easier to infer than the rule to compute derivatives. I wonder why teachers even bother to teach you all these rules when they could just explain the fundamentals to you.
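The examples above translate almost directly into runnable Python. This is a minimal sketch; `log_base` is a made-up helper built on the change-of-base formula, and floating-point results are only approximately equal to the integers in the comments above.

```python
import math


def log_base(base: float, argument: float) -> float:
    """The power to which `base` must be raised to get `argument`."""
    return math.log(argument) / math.log(base)


print(log_base(10, 100))    # approximately 2
print(log_base(10, 1000))   # approximately 3
print(log_base(3, 27))      # approximately 3

# Solving log3(x) = 4 by undoing the question: raise 3 to the power 4.
x = 3 ** 4
print(x)                    # 81
print(log_base(3, x))       # approximately 4
```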
I had a weird relationship with Math growing up; I alternated between getting very high grades and terrible grades depending on the teacher. I didn't like all the notations and conventions of Math and the way it was taught, but I enjoyed it conceptually. It had ended badly in high school as I did poorly in advanced Math though I did quite well in all my other subjects so I got into a good Software Engineering degree at a top 50 university for engineering globally anyway.
But early in college, it occurred to me that I didn't understand Math concepts as intuitively as I understood programming concepts so I challenged myself to revisit everything from the beginning including numbers, addition, subtraction, fractions, roots, powers, probabilities, derivatives, integrals, vectors, matrices, calculus...
I had to free myself from thinking of Math as symbols on a piece of paper and think of it as being about actual quantities, transformations and combinations. I needed a completely new way to think about it and visualize every single step. When I was practicing calculus, I would stop at each step and try to visualize the equation. For example, when finding the 3D plane perpendicular to a point on a 3D curve, I would put effort into visualizing what happened to the equations across different dimensions at each step when I found the partial derivatives and combined them to get the 3D plane vectors.
My Math grades at university were quite good. I passed all the Math courses with ease and got several distinctions even.
Look, the whole thing actually makes sense and the core idea is pretty cool because it's true that a lot of stuff in math looks identical. But in my opinion this is way too much of a macro-level overgeneralization and you risk throwing everything into the same pot, which ends up diluting the actual point of things.
I mean, if you take a hammer and a meat mallet, at the end of the day they're both chunks of metal used to hit stuff, but if you bunch them together without making any distinction, you lose track of why you use one to drive nails into a wall and the other to prep cutlets.
Saying everything is just one big logarithm is a nice mental exercise, but I feel like it flattens out the differences too much and makes you lose the practical utility of the individual math tools, which are meant to solve completely different problems.
I'm a programmer so to me this brings to mind the idea of classes and subclasses. A program is implemented by having a set of classes. The classes can be organized into a class-hierarchy where they inherit methods from their ancestor-classes.
Now assume that originally you did not have the feature of inheritance in your programming language, so you would just create all the classes you need without organizing them into an inheritance tree. Then you upgraded to a language that does have inheritance and you wanted to refactor your program to omit duplicate definitions of methods.
What kind of class hierarchy would you come up with? There is no single way to do it. Some ways are better than others. There might be more than one optimal way.
The same goes for generalization in general: it is part of the language we create to describe things, and there are many different languages we may come up with, some simpler, some more difficult to understand.
[0]:https://math.ucr.edu/home/baez/torsors.html
[1]:https://alexkritchevsky.com/2024/02/28/geometric-algebra.htm...
Thanks for the writeup!
[0]:https://fr.wikipedia.org/wiki/Torseur
https://golem.ph.utexas.edu/category/2013/06/torsors_and_enr...
Consider in particular that use of ‘distance’
>I think you can look at adjoint profunctors from the unit category and show that they consist of giving a consistent ‘distance’ to every object, which in a torsor will be represented.
https://www.google.com/books/edition/Trigonometry_for_Naviga...
See my other comment:
https://news.ycombinator.com/item?id=48623646
https://www.google.com/books/edition/Trigonometry_for_Naviga...
I found this book because I was a little rusty on my trig and most celestial navigation texts will just throw the PZX equation (and others) at you without breaking down what's actually being done with it on a mathematical level...it's just kind of treated like a magical black box without any discussion, and I'd rather have a complete understanding of what I'm doing and why. Having an application-specific approach also makes it a lot easier to learn.
I'm using it with Norie's Nautical Tables, which has the log tables and a whole lot else:
https://bluewaterweb.com/product/nories-nautical-tables-2025...
I'm sure there are plenty of free PDFs of log tables you can find though.
(I believe they used log tables on boats primarily because it's easier to use than a slide rule when everything is constantly rocking back and forth.)
It’s like audio where people say "dB" as if it answers the next question. Relative to what, measured how, and weighted for whom?
Author should brush up on https://en.wikipedia.org/wiki/Lie_theory
See https://en.wikipedia.org/wiki/Apparent_magnitude
https://en.wikipedia.org/wiki/Absolute_threshold_of_hearing
[0] magworld.pw
Seeing there is nothing (right well-beloved Students of the Mathematics) that is so troublesome to mathematical practice, nor that doth more molest and hinder calculators, than the multiplications, divisions, square and cubical extractions of great numbers, which besides the tedious expense of time are for the most part subject to many slippery errors, I began therefore to consider in my mind by what certain and ready art I might remove those hindrances. And having thought upon many things to this purpose, I found at length some excellent brief rules to be treated of (perhaps) hereafter. But amongst all, none more profitable than this which together with the hard and tedious multiplications, divisions, and extractions of roots, doth also cast away from the work itself even the very numbers themselves that are to be multiplied, divided and resolved into roots, and putteth other numbers in their place which perform as much as they can do, only by addition and subtraction, division by two or division by three.
This is what provides the intuition, viz. converting multiplication/division/etc. of large numbers into addition/subtraction of other, smaller numbers. Logarithms as the inverse of exponentiation came much later. Starting with this generally confuses students, since they do not understand the point of it all.
From https://en.wikipedia.org/wiki/History_of_logarithms:
Napier conceived the logarithm as the relationship between two particles moving along a line, one at constant speed and the other at a speed proportional to its distance from a fixed endpoint.
Since the speed is directly proportional to its remaining distance from the fixed endpoint, it therefore is a deceleration, which results in the characteristic "flattening" of the curve.
Further details for understanding the above can be found at Priority, Parallel Discovery, and Pre-eminence: Napier, Burgi and the Early History of the Logarithm Relation (pdf) - http://www.numdam.org/item/RHM_2012__18_2_223_0.pdf
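The two-particle picture can be simulated directly. Below is a rough sketch under simplifying assumptions: a segment of length 1, a constant-speed particle moving at speed 1, and a plain Euler step of size `dt`. The function name is invented, and this normalization is not Napier's original scaling. The distance covered by the constant-speed particle when the other one has a given distance left comes out close to the negative natural log of that distance.

```python
import math


def napier_time_to_reach(remaining_target: float, dt: float = 1e-5) -> float:
    """Simulate the decelerating particle on a unit segment.

    Its speed always equals its remaining distance to the endpoint, so
    d(remaining)/dt = -remaining. The second particle moves at constant
    speed 1, so the distance it has covered equals the elapsed time.
    Expects 0 < remaining_target <= 1.
    """
    remaining = 1.0
    elapsed = 0.0
    while remaining > remaining_target:
        remaining -= remaining * dt   # Euler step of the deceleration
        elapsed += dt
    return elapsed


for target in (0.5, 0.25, 0.125):
    # The last two columns should agree to roughly the size of dt.
    print(target, napier_time_to_reach(target), -math.log(target))
```

Halving the remaining distance always costs the same extra time, which is the "fixed increment" behaviour described elsewhere in the thread.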