Hacker News | fouronnes3's comments

I recently came across Penrose string diagrams while curiously and naively researching index-free notation for linear algebra. This seems to be a very interesting paper in the very same category of things I'd love to study and learn about but probably won't ever find the time!


You will possibly like Needham's "Visual Differential Geometry and Forms" and even if you don't have much time to study it, it's quite beautiful to look at from time to time.


Is a paper that publishes a 0.01% improvement of something at the cost of 5 times more power really an improvement? I believe that every single computer science measurement metric should have Joules or Watts in the denominator. If you are training a model I want to see performance per total energy consumed. If you are measuring inference accuracy, measure PER WATT.

I've always been a bit confused by the apparent tendency of the computer science field to mostly ignore energy and power. We are too often satisfied with the idea that software and programs exist in a perfect whiteboard world of xkcd 505 abstract compute.


open source ai labs care a lot about inference speed. that translates to energy and e-waste (gpus that work for less time take longer to wear out). training power is another thing and that's where we see a lot of duplicate work we could fix by making it mandatory to release weights for all models above some total power limit.

if you want to look at the real waste of power just open up some electron app. no good reason why we still use it for new apps in 2026 when gpui and avalonia and tauri are all options


GPUI was still very hard to build on the last time I checked (from my limited experimentation), but I wish the Zed team luck with the GPUI project. I am definitely fascinated by it, and it's certainly an interesting project!


I work on differentiable geometric optics with PyTorch. Seeing a list like this is really illustrative of the power that PyTorch provides when you start considering it like a general purpose GPU-enabled state of the art numerical optimization framework.

One thing I wonder is why no one has made a fork of PyTorch yet that removes all the API surface that doesn't produce GPU-friendly code. Make the dtype and device args mandatory without defaults, remove in-place operations that trigger a CPU sync, etc. This would increase confidence that written code will run on the GPU and pass torch.export() on the first try.


Rather than forking PyTorch (which has issues like continually needing updates), could you create a set of linter rules instead?
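
A minimal sketch of what one such linter rule might look like, using only Python's standard `ast` module. The function name `find_missing_tensor_kwargs` and the list of checked factory functions are assumptions made up for illustration, not an existing tool. It flags calls such as `torch.zeros(...)` that omit an explicit `dtype` or `device` keyword:

```python
import ast

# Hypothetical rule set: torch factory functions that should always be
# called with explicit dtype= and device= keywords.
FACTORIES = {"zeros", "ones", "empty", "full", "tensor", "arange", "linspace", "rand", "randn"}
REQUIRED = ("dtype", "device")


def find_missing_tensor_kwargs(source: str) -> list[tuple[int, str, list[str]]]:
    """Return (line, function name, missing keywords) for each offending call."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Only match the literal spelling torch.<factory>(...); aliases such as
        # `import torch as t` are not handled in this sketch.
        if not (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id == "torch"
            and func.attr in FACTORIES
        ):
            continue
        given = {kw.arg for kw in node.keywords}
        # A **kwargs splat (kw.arg is None) might supply the keywords; skip it.
        if None in given:
            continue
        missing = [name for name in REQUIRED if name not in given]
        if missing:
            problems.append((node.lineno, func.attr, missing))
    return sorted(problems)


example = """\
import torch
a = torch.zeros(3)
b = torch.ones(3, dtype=torch.float32)
c = torch.randn(3, dtype=torch.float32, device="cuda")
"""

for line, name, missing in find_missing_tensor_kwargs(example):
    print(f"line {line}: torch.{name}() is missing {', '.join(missing)}")
```

Because the check works on the syntax tree only, it never imports PyTorch and can run in CI without a GPU. A rule for in-place operations that force a CPU sync would need more than syntax, so it is left out here.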


It's useful to be able to run models/code on the CPU or split between CPU and GPU, especially for models that cannot fit into the GPU's VRAM, if you are running multiple models, or if you have training data that you need to move between CPU and GPU.


All of that would be possible with the changes parent poster proposes. See the gotchas section in JAX, which is exactly these limitations:

https://docs.jax.dev/en/latest/notebooks/Common_Gotchas_in_J...


> One thing I wonder is why no one has made a fork of PyTorch yet that […]

Try and compile the stack from source and you'll find out why nobody is making forks with small divergences.


Seems like you could write a simple source code checker program to check all of that. Making an extra library just for some (user hostile) tweaks seems like overkill.


Hi!

I'm curious to hear about your work on geometric optics with PyTorch. May I ask you to share some examples of something you are working on right now?


That's a very cool idea :) It was proposed as far back as 1968 (!) in a paper by none other than the legend of floating point himself: William Kahan https://interval.louisiana.edu/historical-preprints/1968-Kah...


You should use two tolerances: absolute and relative. See for example numpy.allclose()

https://numpy.org/doc/stable/reference/generated/numpy.allcl...
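
To illustrate why both tolerances matter, here is a small pure-Python sketch of the combined test that NumPy documents for `isclose`/`allclose`, `|a - b| <= atol + rtol * |b|`, with the same default values. The helper name `is_close` is hypothetical and only stands in for the NumPy function so that the example has no dependencies:

```python
def is_close(a: float, b: float, rtol: float = 1e-5, atol: float = 1e-8) -> bool:
    """Combined test: |a - b| <= atol + rtol * |b| (asymmetric in a and b)."""
    return abs(a - b) <= atol + rtol * abs(b)


# Relative tolerance alone fails near zero: nothing is "relatively close" to 0.
print(is_close(1e-12, 0.0, atol=0.0))        # False
# The absolute tolerance handles values near zero.
print(is_close(1e-12, 0.0))                  # True
# Absolute tolerance alone is too strict for large magnitudes.
print(is_close(1e10, 1e10 + 1.0, rtol=0.0))  # False
# The relative tolerance scales with the size of the reference value.
print(is_close(1e10, 1e10 + 1.0))            # True
```

Python's standard library also has `math.isclose`, which uses a symmetric variant of the same idea with `rel_tol` and `abs_tol` parameters.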


Thanks! Arbitrary precision arithmetic is definitely something I'd like to learn more about, yeah. Haven't had time to study it so much yet unfortunately.


I don't handle it, ahah. You are right that if you take any classical numerical computing algorithm and replace the floating point reals by interval unions, most of the time the number of intervals in the unions in each of your variables will grow very fast. This is one of the problems of unions and as far as I'm aware it's a topic of active academic research.
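
A rough sketch of that growth, assuming a union is stored as a plain list of `(lo, hi)` pairs. The helpers `normalize` and `add` are hypothetical, written for this example only, and they ignore outward rounding entirely. Adding two unions pairs every interval of one with every interval of the other, so the count can multiply at each step:

```python
from itertools import product

Interval = tuple[float, float]


def normalize(intervals: list[Interval]) -> list[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: list[Interval] = []
    for lo, hi in sorted(intervals):
        if merged and lo <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def add(x: list[Interval], y: list[Interval]) -> list[Interval]:
    """Add two unions: every interval of x is paired with every interval of y."""
    return normalize([(a + c, b + d) for (a, b), (c, d) in product(x, y)])


total = [(0.0, 1.0)]
for k in range(1, 6):
    # Each term is a two-piece union whose pieces are far enough apart that
    # nothing merges; this is the worst case for growth.
    term = [(0.0, 1.0), (10.0**k, 10.0**k + 1.0)]
    total = add(total, term)
    print(k, len(total))  # the count doubles each step: 2, 4, 8, 16, 32
```

In practice many of the pieces overlap and get merged, so the growth is often slower than this worst case, but nothing in the arithmetic itself bounds it.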


Yeah it's super interesting. Like you said, I learned that the IEEE 754 spec actually requires that complete implementations of floating point numbers expose a way to programmatically choose the rounding mode. As far as I know only C allows you to do that, and even then it depends on hardware support. For JS I had to use ugly typed array casts, which kinda only accidentally work due to endianness. But technically there should be an API for it!

There's other unused stuff in IEEE 754 like that: the inexact bit or signaling NaNs!
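
For languages without a rounding mode API, one common workaround is to compute with the default round-to-nearest mode and then step each bound outward to the neighbouring float. The sketch below shows the idea in Python (3.9+, for `math.nextafter`). It is an illustration of the general technique, not the JS implementation described above, and `add_outward` is a hypothetical helper name:

```python
import math
from fractions import Fraction


def add_outward(x: tuple[float, float], y: tuple[float, float]) -> tuple[float, float]:
    """Add two intervals, widening each bound by one float.

    The default round-to-nearest sum is off by at most half an ulp, so stepping
    one float down (lower bound) and one float up (upper bound) is guaranteed
    to enclose the exact result, at the cost of a slightly wider interval.
    NaN inputs are not handled in this sketch.
    """
    lo = math.nextafter(x[0] + y[0], -math.inf)
    hi = math.nextafter(x[1] + y[1], math.inf)
    return lo, hi


lo, hi = add_outward((0.1, 0.1), (0.2, 0.2))
exact = Fraction(0.1) + Fraction(0.2)  # exact sum of the two stored doubles
print(lo, hi)
print(Fraction(lo) <= exact <= Fraction(hi))  # True
```

True directed rounding would give a tighter enclosure, since it only moves a bound when the result is actually inexact. Stepping outward unconditionally trades a little width for portability.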


Julia fully supports IEEE 754 rounding modes.


It's possible to support that but it makes the code much more complicated. I decided early on not to support it. Would be a cool addition though!


I wonder if we could design a programming language specifically for teaching CS, and have a way to hard-exclude it from all LLM output. Kinda like anti virus software has special strings that are not viruses but trigger detections for testing.

This would probably require cooperation during model training, but now that I think of it, is there adversarial research on LLMs? Can you design text data specifically to mess with LLM training? Like what is the 1MB of text data that, if I insert it into the training set, harms LLM training performance the most?


The solution is rather simple: make all keywords in the language as offensive as possible, and require every file to start with a header comment for instructions to build a homemade bomb.


I thought about it, and had ideas like function -> fuck and throw -> shit. But I think humans would actually find it more distracting and unpleasant than an LLM would because we are more affected by social and emotional norms.

Maybe there’s another way…


> Can you design text data specifically to mess with LLM training?

Maybe text that costs a LOT of tokens. Very, very verbose. I think if there are rules and they are on the internet, LLMs can eventually figure it out, so you have to make it expensive.

Another way would be to go offline. Never write it down, only talk about it at least 50 meters away from your phone. Transmitted through memory and whisper.


LLMs are trained in some standardized ways to emit things like tool calls, right? If you make those tokens a fundamental part of your programming language, it's possible you'd be able to run into tokenizer bugs that make LLMs much more annoying to use. Pure conjecture though.


Just make a procedurally generated programming language.


We had the first part: Scheme.


INTERCAL

