I agree. I think it comes down to the motivation behind why one does mathematics (or any other field for that matter). If it's a means to an end, then sure, have the AI do the work and get rid of the researchers. However, that's not why everyone does math. For many it's more akin to why an artist paints. People still paint today even though a camera can produce much more realistic images. It was probably the case (I'm guessing!) that the camera caused a significant drop in jobs for artists-for-hire, for whom painting was just a means to an end (e.g. creating a portrait), but the artists who were doing it for the sake of art survived and were presumably made better by the ability to see photos of other places they wanted to paint and art from other artists.
Another way to get to the same result is to use "Feynman's Trick" of differentiating inside a sum:
Consider the function f(x) = Sum_{n=1}^\infty c^(-xn)
Then differentiate this k times. Each time you pull down a factor of n (as well as a log(c), but that's just a constant). So, the sum you're looking for is related to the kth derivative of this function.
Now, fortunately this function can be evaluated explicitly since it's just a geometric series: it's 1 / (c^x - 1) -- note that the sum starts at 1 and not 0. Then it's just a matter of calculating a bunch of derivatives of this function, keeping track of factors of log(c) etc. and then evaluating it at x = 1 at the very end. Very labor intensive, but (in my opinion) less mysterious than the approach shown here (although, of course the polylogarithm function is precisely this tower of derivatives for negative integer values).
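For concreteness, here's a quick sympy sketch of that calculation with numbers of my own choosing (c = 2 and k = 3, i.e. the sum Sum_{n>=1} n^3 / 2^n); it's only meant to illustrate the bookkeeping of the log(c) factors:

```python
import sympy as sp

x = sp.symbols('x')
c, k = sp.Integer(2), 3

# closed form of f(x) = Sum_{n>=1} c^(-x n), a geometric series
f = 1 / (c**x - 1)

# each x-derivative of c^(-x n) pulls down a factor of (-n*log(c)), so
# Sum_{n>=1} n^k c^(-n) = (-1/log(c))^k * f^(k)(x) evaluated at x = 1
result = sp.simplify((-1 / sp.log(c))**k * sp.diff(f, x, k).subs(x, 1))
print(result)                                    # 26

# sanity check against a direct partial sum
print(sum(n**3 / 2**n for n in range(1, 100)))   # ~26.0
```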
Instead of differentiating c^(-xn) w.r.t. x to pull down factors of n (and inconvenient logarithms of c), you can use (z d/dz) z^n = n z^n to pull down factors of n with no inconvenient logarithms. Then you can set z=1/2 at the end to get the desired summand here. This approach makes it more obvious that the answer will be rational.
This is effectively what OP does, but it is phrased there in terms of properties of the Li function, which makes it seem a little more exotic than thinking just in terms of differentiating power functions.
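To make the comparison concrete, here's the same toy sum (k = 3, z = 1/2, my choice of numbers matching the sketch above) done with the (z d/dz) operator; the log(c) factors never appear and the result is manifestly rational:

```python
import sympy as sp

z = sp.symbols('z')
g = z / (1 - z)                  # Sum_{n>=1} z^n for |z| < 1

for _ in range(3):               # apply (z d/dz) three times to pull down n^3
    g = z * sp.diff(g, z)

print(sp.simplify(g.subs(z, sp.Rational(1, 2))))   # 26
```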
Yeah, differentiating these infinite sums to pull down polynomial factors is a familiar trick.
It happens in basic moment generating function manipulations (e.g., higher moments of random variables). Or from z-transforms in signal processing (z transforms of integrals or derivatives). And (a little less obvious, but still the same) from Fourier analysis.
The concept applies to any moment generating function, z-transform, whatever. It’s clearest for the geometric distribution, where the distribution itself has the geometric form (https://mathworld.wolfram.com/GeometricDistribution.html, around equation 6).
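For instance, here's a small sympy sketch of my own (using the "number of trials until the first success" convention for the geometric distribution) showing the same differentiate-the-generating-function move for moments:

```python
import sympy as sp

t, p = sp.symbols('t p', positive=True)

# MGF of a geometric distribution on {1, 2, 3, ...} with success probability p
M = p * sp.exp(t) / (1 - (1 - p) * sp.exp(t))

mean = sp.simplify(sp.diff(M, t).subs(t, 0))        # 1/p
second = sp.simplify(sp.diff(M, t, 2).subs(t, 0))   # (2 - p)/p^2
print(mean, sp.simplify(second - mean**2))          # variance: (1 - p)/p^2
```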
I agree that the Li function seems like a detour, but maybe it can make some of the manipulation easier?
I mean, you could jump into the black hole to see what's inside so it's not unfalsifiable. The only issue is that you can't convey it to someone on the outside of the black hole.
I especially like the end of that video where he discusses trying to transcribe regular speech to music notation. It completes a part of my music theory mental puzzle.
Admittedly, I know very little beyond high school band, but the theory always felt arbitrary. Like, whenever I would try to drill down on something like why notes are the frequencies they are, I eventually got to "We don't know." That left me thinking either that only that teacher didn't know or that the theory was BS.
Now I see it was neither. Music is something we irrational, inscrutable humans made up. Music theory, even at its most brilliant, is just an approximation. I always thought music theory described fundamental, physical truths that humans intuitively discovered. Now I see that music theory describes our intuitive understanding.
Sort of like how a recipe calls for 1/4 tsp salt but people just use a pinch. I thought the people were being imprecise. But really it's that the recipe cannot precisely capture all the variables in cooking, which actually makes "a pinch of salt" the more precise description.
At the end of the day music theory is prescriptive, not predictive. That's poorly taught in high school education, and it's not limited to the arts.
I think people go into a music theory course and want it to be like math or physics where you can plug and chug to get an answer (where the answer is "good" music) - but that's not really what's going on. The tools that music theory gives you are those to understand how a piece of music has been constructed, and if you want to apply that to creating new music you can try using those structures in your own work. It's still up to you (or really, your audience) to decide if it's "good."
Disagree. Music theory should not be prescriptive. It's descriptive at best.
But then you seem to go on to explain that it's actually not prescriptive. If it sounds good, it is good. Theory is sometimes useful to make it sound good.
> I always thought music theory described fundamental, physical truths that humans intuitively discovered.
There _are_ some fundamental physical truths in music (related to overtones, wavelengths, etc), but the funny thing is that most of our "music theory" has actually had to bodge those truths a bit to accommodate variety and playability.
(For example, listen to a major third on a just intoned instrument versus an equal temperament instrument like a guitar!)
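If you want to see how big that bodge is, the arithmetic is quick (standard ratios, nothing specific to any particular instrument):

```python
import math

just = 5 / 4                # a pure 5:4 major third
equal = 2 ** (4 / 12)       # four semitones of 12-tone equal temperament
print(just, equal)                                     # 1.25 vs ~1.2599
print(1200 * math.log2(equal / just), "cents sharp")   # ~13.7 cents
```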
And he references the exact blog post here. I hear it as 6/8 (definitely a rest after the first three 16th notes, which is borne out in the timing) and I still think that's closer, but I appreciate the 21/32 argument and I'm glad to see him actually try to find a common subdivision for those timings.
When he talks about how things are done based on feel instead of complex subdivisions: well, normal subdivisions are ultimately done based on feel as well, so there's not much difference really.
Here's a fun thought experiment / apparent paradox.
In high school physics we learn that a 1kg mass accelerates as quickly as a 2kg mass when only subjected to the force of gravity. When I used to teach physics, the intuitive explanation I gave for this hinged on a thought experiment. Suppose that you have three 1kg masses falling side by side after being dropped from the same height. Clearly they are all going to fall at the same rate since they're equivalent. Now imagine redoing the experiment but this time taking two of the masses and placing them closer together. Does anything change? Clearly not, they're still all equivalent and ought to fall at the same rate. Now imagine doing this until those two masses are right next to each other, touching. Does anything change? Well no, all three should still fall at the same rate. But now, why not glue those two masses together and call it a 2kg mass? Once you do that you've shown that a 1kg mass and a 2kg mass fall at the same rate.
This usually convinces people, but there's actually a flaw in the argument that gets to the heart of why gravity is so different from the other forces.
To see the flaw, replace the above masses by three electrons falling next to each other in an electric field. Everything goes through in exactly the same way. You end up gluing together two of the electrons and these two electrons will accelerate at the same rate as the single electron. But if you're not careful you'd conclude that all electric charges fall at the same rate in an electric field, something we know is false.
Where's the flaw? Well, all matter is built from constituent particles, and as long as you restrict yourself to particles that have the same "charge/mass ratio", the argument above works. It is true that one electron accelerates the same as 100 electrons tied together, but that's just because e/m is the same for all those constituents.
So, the thing that's glossed over in my high school explanation for why 1kg and 2kg accelerate at the same rate is that the constituent particles all have the same "gravitational charge / inertial mass" ratio. Because this ratio is the same for all particles, we may as well absorb that ratio into the gravitational constant and just use "m" in place of both of them. It's this "universal coupling" that's really responsible for the equivalence principle and what sets gravity apart from the other forces.
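A toy calculation (my own illustrative numbers) makes the ratio argument concrete: in a uniform external field, the acceleration of a "glued together" object depends only on the charge-to-mass ratio of its constituents.

```python
E = 1.0e5                        # external electric field, V/m (arbitrary choice)
e, m_e = 1.602e-19, 9.109e-31    # electron charge magnitude and mass
m_p = 1.673e-27                  # proton mass

for n in (1, 2, 100):            # n electrons "glued" together
    print(n, "electron(s):", (n * e) * E / (n * m_e))   # same e/m, same acceleration

print("1 proton:", e * E / m_p)  # different charge-to-mass ratio, different acceleration

# Gravity is the special case where the "charge" is the inertial mass itself,
# so the ratio is 1 for every body and everything falls at the same rate.
```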
A 1kg mass and a 2kg mass do not fall at the same rate. The gravitational force is (G*m1*m2)/r^2. You are observing that m1 (the earth) is much, much greater than m2 (the 1 or 2 kg masses), and you are simplifying to (G*m1)/r^2 because of the precision of the measuring device. Also, r is the same for both masses.
The force is on both objects at the same time. The force in F = ma is a function of the mass of both and their distance. If the mass is different in the two scenarios, then the force is different. On earth with small weights, they seem the same because of the precision of the measurement.
Is what you're getting at the fact that the distance between the earth and the other object changes from two effects (the first being the ball falling towards the Earth and the second being the Earth falling towards the ball)? That's right, of course. But that distance's second derivative is not the acceleration a in F=ma. Indeed, in both Galilean and Einsteinian relativity acceleration is detectable locally without a needed reference to another object.
Yes - I was making a mistake. I was trying to describe the effect of both masses. When one is much smaller than the other, the movement is mostly in one direction. When they are closer in mass or even equal, they move toward each other. For example, if you have a 1 liter water bottle filled with a material that gives it the same mass as the earth, then the two bodies will move toward each other, and the water bottle will seem to move toward the earth much faster than the one filled with water (1 kg). If it is filled with a material that gives it much greater mass than the earth, the earth will move toward it.
The mass of the earth dictates the acceleration of the individual masses towards the earth. However, the acceleration of the earth itself towards the masses is dependent on how much mass is falling towards the earth. When more mass is falling to the earth, the earth accelerates towards the masses faster. So the thought experiment is flawed because with only one 1 kg weight falling towards earth, the gap between the weight and the earth closes more slowly than when there are three 1 kg weights spaced 1 m apart and dropped simultaneously.
That's if you define "fall" as the rate at which the gap closes. You could also take it as acceleration towards the barycenter, which would be the same for both masses. These are indistinguishable for everyday objects, so one could argue that the word "fall" could be interpreted either way.
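To put rough numbers on the gap-closing picture (my own back-of-the-envelope sketch): the gap closes with relative acceleration G*(M + m)/r^2, so a heavier dropped object does close the gap very slightly faster, while its own acceleration G*M/r^2 is independent of its mass.

```python
G, M_earth, r = 6.674e-11, 5.972e24, 6.371e6   # SI units, r ~ Earth's radius

for m in (1.0, 2.0, 1000.0):
    a_object = G * M_earth / r**2    # acceleration of the object toward Earth (~9.8)
    a_earth = G * m / r**2           # acceleration of Earth toward the object (tiny)
    print(m, "kg:", a_object + a_earth)   # relative acceleration of the gap
```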
>You end up gluing together two of the electrons and these two electrons will accelerate at the same rate as the single electron.
Not really. You have swept under the rug the fact that it's really hard to glue electrons together. And if you were to actually find a way to do it, you would have to add so much potential energy to the system that its inertial mass would increase dramatically. In fact, two electrons that were actually "together" (whatever that might actually mean for a quantum particle that obeys the Pauli exclusion principle) would have a mass orders of magnitude higher than two electrons separately.
You probably want to talk about ions in an electric field, which you can "glue together", but then it becomes rather obvious that they don't all accelerate at the same rate.
The equivalence principle is not the only thing that distinguishes gravity from other forces. There is also the fact that there is only one gravitational "charge" and it's mutually attractive. (Gravity is also many orders of magnitude weaker than all other forces.)
It is generally believed that the perturbation expansions that we see in realistic quantum field theories are what are known as asymptotic expansions. These are series that have a radius of convergence of zero (i.e. they only converge when the expansion parameter is exactly zero and diverge for all nonzero values).
There are then two natural questions: 1. If the perturbation series diverges, why doesn't the universe explode? 2. If the series diverges, why can we use it at all?
Let's first talk about the first part: why doesn't the universe explode? Well, it's because the perturbation series is not actually what is going on; the real answer is the solution to the full set of equations. It's just that we're using a perturbation expansion as a crutch. It's sort of like if the universe's function is 1/(1-x) but we constantly insist on using 1+x+x^2+... Clearly the first function is completely well behaved at x=2 but the second one is not. If we notice that our series explodes for x=2 we should not immediately assume that the universe also must explode; it's just that our representation of the true physics is not faithful. This is perhaps a bad example because the series in question is convergent for some x, just not for x=2. The perturbation expansions in question are more subtle since they never converge.
This then leads into the second question: if the series diverges, how can we even use it? Well, the idea here is that it's not just any divergent series (like my silly example with 1/(1-x) above) but rather an asymptotic series. This means that as long as you truncate the series at some point, it is in fact reasonably close to the target function for a sufficiently small value of the parameter. It's just that the more terms you want to include, the sooner the approximation breaks down in terms of the parameter. So, if you want to include 10 terms it might be a decent approximation until x ~ 0.1, but if you include 100 terms it might only be a good approximation until x ~ 0.01. Now, within the overlapping range (x < 0.01) it's better to have 100 terms than 10 terms, so it's not like including more terms is bad in all ways. But you see the issue: if you include 1000 terms you get a better approximation to your function for values x < 0.001 than you had with 100 terms, but now your approximation breaks down much sooner. If you want to include all the terms, your approximation breaks down the moment you leave the point x=0.
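A standard textbook illustration of this behaviour (not the QFT series itself, just a toy I'm using to show the pattern) is the Stieltjes integral F(x) = int_0^inf e^(-t)/(1 + x t) dt, whose asymptotic series Sum (-1)^n n! x^n has zero radius of convergence yet approximates F well for small x, up to an optimal truncation order of roughly 1/x:

```python
import math
from scipy.integrate import quad

def F(x):
    # "exact" value of the Stieltjes integral by numerical quadrature
    val, _ = quad(lambda t: math.exp(-t) / (1 + x * t), 0, math.inf)
    return val

def truncated(x, N):
    # partial sum of the divergent asymptotic series Sum_{n=0}^{N} (-1)^n n! x^n
    return sum((-1)**n * math.factorial(n) * x**n for n in range(N + 1))

for x in (0.2, 0.1, 0.05):
    errs = ["%.1e" % abs(truncated(x, N) - F(x)) for N in (3, 10, 30)]
    print(x, errs)   # adding terms helps only up to N ~ 1/x, then makes things worse
```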
Why do we think that QFT perturbation series generally have zero radius of convergence? Well, look at QED, the quantum theory of E&M. If the theory had a nonzero radius of convergence, that would mean the theory would need to make sense for negative coupling constants as well. However, what would E&M look like for negative coupling? Well, we'd still have electron/positron virtual pair creation from the vacuum since the interactions of the theory are still the same. However, this time around they wouldn't attract each other anymore but instead repel each other, causing an instability in the vacuum of the theory. We would just constantly be producing these particle/anti-particle pairs, and they'd form two separate clusters where all the electrons attract each other and all the positrons attract each other, while electrons and positrons repel. In other words, the vacuum would break. This suggests that QED with a negative coupling constant doesn't make sense, which contradicts the assumption that the radius of convergence of the perturbative expansion is nonzero.
That's not to say that all QFTs must have zero radius of convergence, but similar arguments can (I think) be made for the type of QFTs that we actually see in nature.
While everyone is debating whether this is impressive or dumb, if it's a leap forward in technology or just a rehashing of old ideas with more data, if we should really care that much about it passing the bar exam or if it's all just a parlor trick, people around the world are starting to use this as a tool and getting real results, becoming more productive, and building stuff... Seems like the proof is in the pudding!
One of the major reasons why this is all heavily debated, including by those not in the field at all, is that if these things are really capable of human-like reasoning, it leads to answers to some commonly asked philosophical questions on the nature of human consciousness, intellect, etc. that many people find difficult to accept.
Yeah, from a philosophical perspective these are interesting questions to ponder, but my impression of these comments is less that people are pondering the depth of consciousness and more that they're trying to be contrarian / naysayers.
Yeah but they aren't using it for the same stuff that they would use a lawyer who got the same result on the bar exam. I think it is fair to say that LLMs have an unfair advantage over humans on these exams, and we should take that into account when trying to assess them.
One (perhaps silly) way I like to frame this stuff is by imagining that I'm some special purpose Turing machine designed specifically for some task. Sure, sometimes other Turing machines come along that appear to infringe upon my skill set but they ultimately only perform a small subtask better than I am able to (e.g. calculator, spell check, word processor, IDE, code completion, ...). So, I incorporate it into my routine, effectively boosting my own performance.
Now, what would happen if all of a sudden a universal Turing machine came along? Well, by virtue of being universal, that means that it can emulate me and all other Turing machines. This time around things are different. Even if I can find a way to incorporate it into my workflow, it can still emulate that more sophisticated version of me by virtue of being universal. So it then comes down to whether or not I can incorporate the latest version of this universal Turing machine faster than its own design is improved. If not, I will be replaced. Since, in our instantiation, I am made from biological material, it's in my mind only a matter of time before the universal Turing machine starts outpacing me.
So, I guess the question is then if these GPT models (or their descendants) are universal (in my hand wavy definition of the term).
You are misunderstanding / misusing "universal Turing machine" ... Computers have been universal Turing machines, aka Turing complete, approximately since computers were invented.
You seem to be confusing UTM with Artificial General Intelligence. Universal Turing Machine is not the term for some magic machine that can interpret and integrate any observed computation. LLMs will significantly change how we interact with computers, but the ability to emulate another Turing machine has always been there (for computers and yes, LLMs with memory are Turing complete). That doesn't mean AGI can be implemented efficiently or that LLMs are sufficient for AGI.
No, I'm just using it as an analogy. Not all Turing machines are universal.
What we're going through now could maybe be likened to what it would be like for a Turing machine to encounter a universal Turing machine for the first time. For all its life this fictitious Turing machine has encountered other non-universal Turing machines and has simply incorporated them into its own process. When it then encounters its first universal Turing machine it would possibly not be too concerned, since each time before it has always just been able to use the new machine to make itself more productive. However, this time it's different.
My point is just that while it may very well have been true throughout history that new tools have just made us more productive rather than fully replacing us, this won't be the case for AGI. It's not just another tool we can add to our arsenal but instead something that can subsume us entirely, much like how a universal Turing machine can emulate any other Turing machine.
There's another version of the Monty Hall problem that highlights why this is such a counterintuitive problem.
Imagine that after you pick your door, Monty Hall invites an audience member up on stage and instructs them to choose one of the remaining two doors to open. This audience member doesn't know anything at all and just randomly picks one of the two doors. When their door is opened we see that it's empty. You're now given the option of switching just like in the standard game. Should you?
Cosmetically everything is identical to the standard game, but if you analyze the game carefully, this time you're left with a 50/50 shot, so there's no benefit to switching.
I think most of the arguments in this article would appear to work for this modified version of the game which means that they're not actually getting to the heart of the problem.
For completeness, the reason this now reduces to 50/50 is that there's also now a chance that the spectator opens the door with the car behind it, something that couldn't happen in the original Monty Hall problem. Put another way, there's actually a little bit of information conveyed to you when you see that the spectator happens to not open the door with the car, and this extra information exactly cancels the usual benefit you get from eliminating the other empty door. In the example of "scaling up" in the article, if you did this with 20 doors and the spectator randomly picked 18 of the 19 unopened ones to open and happened to not stumble upon the car, you might actually start to think that you had been lucky all along. Ultimately you're left with a 50/50 chance.
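If you'd rather not trust the case counting, a quick Monte Carlo of both games (my own simulation sketch, with the usual three-door setup) shows the difference directly:

```python
import random

def play(opener, switch, trials=200_000):
    """opener='host' (knows where the car is) or 'audience' (opens at random)."""
    wins = valid = 0
    for _ in range(trials):
        car, pick = random.randrange(3), random.randrange(3)
        remaining = [d for d in range(3) if d != pick]
        if opener == "host":
            opened = random.choice([d for d in remaining if d != car])  # always empty
        else:
            opened = random.choice(remaining)   # might reveal the car
            if opened == car:
                continue                        # game over; condition on this not happening
        valid += 1
        final = pick if not switch else [d for d in range(3) if d not in (pick, opened)][0]
        wins += (final == car)
    return wins / valid

for opener in ("host", "audience"):
    print(opener, "stay:", round(play(opener, switch=False), 3),
          "switch:", round(play(opener, switch=True), 3))
# host: stay ~0.33, switch ~0.67; audience: both ~0.5
```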
Correct me if I'm wrong, but in your particular example (spectator opens an empty door and I am asked if I want to switch), nothing changes in regards to the original Monty Hall problem. If a spectator opens a random remaining door, one of two things can happen:
- a car is revealed, I lose immediately (there is no option to switch anymore)
- no car is revealed, which means I again have 2/3 chances when switching, not a 50% chance as you've stated
In your example, the spectator opens an empty door, so there is no difference to the host opening an empty door in regards to the probability. Again, if the spectator opens a car, I just lose.
No, if the spectator happens to open an empty door, the probability collapses to 50/50. That's the point I'm making about how counterintuitive this is. Here's the full set of cases. Let's say that you select door 1 and the spectator opens door 2 (all other cases are permutations of this):
Door1 Door2 Door3
Car Empty Empty
Empty Car Empty
Empty Empty Car
Let's suppose your strategy is to stick with your original choice. In the first case above you get the car. In the second case the audience member stumbles upon the car and you lose. In the third case you lose because you stick with an empty door. All three cases are equally likely, and since in the second case the game ends, and you know that your game didn't end, you know that you're either in case 1 or case 3. Your chance is thus 50/50.
The issue is that in the classic MH problem, cases 2 and 3 are collapsed into one outcome (MH opens an empty door), but that's not true here.
More mathematically, you should ask yourself p(Door1 | Game Did Not End).
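Written out with exact fractions (same setup as the table above: you pick door 1, the spectator opens door 2):

```python
from fractions import Fraction

prior = Fraction(1, 3)             # car equally likely behind each door
p_continue = prior + prior         # cases 1 and 3: door 2 is empty, game goes on
print(prior / p_continue)          # P(car behind door 1 | game didn't end) = 1/2
```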
The point I was making is that the problem (MH) is a lot more subtle than people give it credit for. Many of the arguments people make for how the MH problem is "obvious" seem to work for this modified game too unless you're careful with them.
No. The fact that they happened to not open the door with the car tells you something. Take it to the extreme of 100 doors. If the spectator randomly opens 98 doors and doesn't randomly stumble upon the car you should take that as evidence that you might have the car yourself. This extra evidence in favor of staying with your door exactly cancels the original MH advantage of switching.
Thank you for the explanation; I think I see it now.
Here's another way of seeing it which I think may be effectively the same. Suppose that there are 100 doors in a circle. You pick one and then all the doors are opened except for the one you picked and the next door along. If the car hasn't been revealed yet, it seems intuitively right that you have a 50-50 chance of winning whether you stick or switch at that point.
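A quick simulation of that circle version (my own sketch: 100 doors, conditioning on the car not having been revealed) backs up the 50-50 intuition:

```python
import random

def circle_variant(doors=100, trials=2_000_000):
    stay = switch = valid = 0
    for _ in range(trials):
        car, pick = random.randrange(doors), random.randrange(doors)
        neighbour = (pick + 1) % doors      # "the next door along"
        if car not in (pick, neighbour):
            continue                        # the car would have been revealed; discard
        valid += 1
        stay += (car == pick)
        switch += (car == neighbour)
    return stay / valid, switch / valid

print(circle_variant())                     # both come out near 0.5
```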