Hacker News | leeoniya's comments

or was it Windows hiding file extensions by default and you downloaded a .mp3.exe file?

Isn't it just great how a decision made by some genius at Microsoft decades ago caused so much confusion and mess? Even on Windows 11 the default is to hide extensions, because, geez, wouldn't want to confuse people with change after decades of it being like that.

Although, was the hiding something that the Mac introduced?

The idea of the last part of the filename (after the period) determining what program is launched to handle the file is odd anyway...

I wonder if the Windows spyware infrastructure measures what % of people turn off extension hiding..


> The idea of the last part of the filename (after the period) determining what program is launched to handle the file is odd anyway...

I challenge you to suggest a better solution - the best that Linux came up with is a giant database of all magic numbers known to God and praying that something matches... sometimes it does, and sometimes it even matches the correct program.


the linux way works tho

it really doesn't

hiding information from user is not good by any means


But relying on them to tell the OS what type a file is, or allowing them to change the extension, isn't good either. lena.jpg doesn't become lena.pdf by changing the last 3 letters of the filename..

It gets really complicated when you get into overlapping file types, like with ISOBMFF. An .mp4 can also, simultaneously, be a valid .3gp because those are profiles of ISOBMFF. On the other extreme, JPEG is secretly two different incompatible formats (JFIF and Exif), and a video file with a different codec in the same container, or even a different track layout, might as well be a different format.

> allowing them to change the extension, isn't good either

why not?

power to the people. At worst it just breaks. At best you get filetype chameleons that are straight up cool, yet harmless


Am I missing something? Hiding things from users is a property of the windows approach. Did you reply to the wrong person?

GGP comment literally says that linux way is hiding file type in the binary information. If you consider that as more visible I have many questions

windows at least gets an option to display file type


Some file formats, eg png, require a particular file header in order to be considered valid. This is true regardless of your operating system, be it windows or linux. If that is hidden information, then it is hidden regardless of which operating system you're on. On windows, if I have a png named .doc, then there is absolutely no way to determine that it is a valid png and could be opened with my image viewer with standard tools. On linux it will recommend you open the file with an image viewer regardless of the file extension. That seems to me like significantly less hidden information.
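A minimal sketch of what header-based detection amounts to, assuming only two formats are of interest. The PNG signature and ELF magic bytes are the real fixed values; the `sniff_type` name and the two-entry check are illustrative and far simpler than a real magic-number database.

```python
# Fixed leading bytes defined by each format.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # 8-byte PNG file signature
ELF_MAGIC = b"\x7fELF"                # 4-byte ELF magic number


def sniff_type(data: bytes) -> str:
    """Guess a file type from its leading bytes, ignoring the filename."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(ELF_MAGIC):
        return "elf"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first few bytes of a file and classify them."""
    with open(path, "rb") as fh:
        return sniff_type(fh.read(8))
```

With this, a PNG renamed to .doc still reports "png", because the name is never consulted.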

> On windows, if I have a png named .doc, then there is absolutely no way to determine that it is a valid png

And that's a good thing. Imagine receiving a file called "very_funny_cat.png" and Linux realizing that while it's not a valid PNG image, it is a valid ELF executable.


The sane option would be to change the file explorer to have a column/field of "What type of file we've detected it is". We already do this for directories. Nowadays we just rely on the naming convention that the last 3 letters of the name correctly identify the file format.

Right, linux should instead go by the normal extension for an elf, which is no extension... instead this problem is solved by prompting the user if they want to execute the program.

> instead this problem is solved by prompting the user

it is solved by informing user how the file will be used

aka file type


The mac started out without using extensions at all, the type was embedded in the metadata. That's still possible now, but it's largely derived from extensions first. I believe Finder shows all extensions by default. It certainly does in details mode.

Macs originally didn’t have filename extensions because the file type was stored as metadata in the file system

That really is a superior way of doing things too. Or at least it would have been if that metadata were transferred with the file itself in all protocols.

Nope. It had been renamed from .avi. CP.

You'd think it wouldn't be that hard for a client to automatically check if a file is what it claims to be after downloading.

surprised not to see http://phpsadness.com/ here

Maybe that’s because a site last updated 8 years ago is utterly irrelevant for, like, anything?

It’s not so irrelevant if your org is stuck on older versions of PHP.

Being stuck on versions of PHP published >= 8 years ago isn't a valid excuse anymore, with AI and static analysis tools available IMHO. And even if you really don't identify using CVE-ridden, unsupported, ancient software to handle customer data as a critical business risk and insist on riding the burning train instead - well, then you've lost all right to poke fun at the legacy PHP versions you're stuck on, because the pain is entirely self-inflicted by your org.

I like this guy

the term "Americans" always bothered me, though it's commonly used to refer to US

> "no no, it has full test coverage"

i don't have enough fingers (and toes) to count how many times i've demonstrated that "100% coverage" is almost universally bullshit.


Codex is freakin hot-to-trot to churn out test coverage for every single thing it implements, and some of it is very esoteric and highly prescriptive (regexes for days) BUT .. after a while, it dawned on me that LLM-driven test coverage is less about proving “code correctness” (you’re better off writing those tests yourself alongside them), and more about just trying to ensure that whatever gets bolted on stays bolted on. For better or worse, obviously, since if you bolt on trash, trash you shall have.


There's a very old paper by Cem Kaner about the meaninglessness of "100% coverage" where he included an appendix where he enumerates 101 different possible types of code coverage: https://www.researchgate.net/publication/243782285_Software_...


Wholeheartedly agree, but in fairness, I trust the tests of the best AI models more than those of the average human developer. There's a lot of people around that combine high diligence with complete intellectual laziness, producing tons of useless tests.

Actually no, cancel that. I realise now that I trust AIs more than the average developer, period. At this point they do produce better code than most people I've dealt with.


if you want to do image diffing, use https://github.com/dmtrKovalenko/odiff

however, if you have SVGs already, compare those without rasterizing them first, then if that fails rasterize & odiff the baseline and new on-demand. this way you dont waste a ton of space storing a bunch of binary [possibly evolving] pngs in git.
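A rough sketch of that two-step check, under the assumption that a byte-for-byte comparison of the SVG sources is an acceptable fast path. `svg_changed` and the `raster_diff` callback are hypothetical names; the callback stands in for whatever rasterizes both files and runs an image differ such as odiff.

```python
from typing import Callable


def svg_changed(
    baseline_svg: bytes,
    new_svg: bytes,
    raster_diff: Callable[[bytes, bytes], bool],
) -> bool:
    """Return True if the new SVG should be treated as visually changed."""
    # Fast path: identical source means identical rendering, so nothing
    # needs to be rasterized or stored.
    if baseline_svg == new_svg:
        return False
    # Slow path: the sources differ, but the rendering may not. Rasterize
    # both on demand and let the (hypothetical) image differ decide.
    return raster_diff(baseline_svg, new_svg)
```

Only the SVG text lives in git here; the PNGs exist just long enough to be compared.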


Levenshtein distance is often a poor way to fuzzy match or rank. i suspect that in js, even the trie approach would incur significant GC/alloc thrashing or cost of building a huge trie index.

i tried fuzzy matching using a cleverly-assembled regexp approach which works surprisingly well: https://github.com/leeoniya/uFuzzy


I would argue the opposite, with the 'often' doing some heavy lifting.

It is very likely that you have interacted with a Levenshtein distance based spell corrector (with many modifications) and I have touched that code. Used well they can be very powerful.


> Sen. Sheldon Whitehouse

ok, i was confused for a minute.

> And, given that the gas breaks down relatively quickly, this would have been one of the fastest ways to reduce global warming.

s/quickly/slowly?


No, quickly is correct. Methane in the atmosphere has a lifespan of about 12 years, if I recall correctly. That means that if you stop emitting it, then the warming effect of atmospheric methane would subside over the course of a decade. This is as compared to CO2, which is a much weaker greenhouse gas, but does not break down naturally over time. Atmospheric CO2 has to be absorbed by natural processes (oceanic absorption, plant photosynthesis, etc) otherwise it exists in the atmosphere effectively forever.
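As a back-of-the-envelope illustration, assume the roughly 12-year figure is an e-folding lifetime and that the decay is a simple exponential. Both are simplifying assumptions, and `fraction_remaining` is an illustrative name.

```python
import math


def fraction_remaining(years: float, lifetime_years: float = 12.0) -> float:
    """Fraction of an initial methane pulse left after `years`.

    Assumes plain exponential decay with the given e-folding lifetime.
    """
    return math.exp(-years / lifetime_years)


if __name__ == "__main__":
    for t in (0, 12, 24, 36):
        print(f"after {t:2d} years: {fraction_remaining(t):.1%} remains")
```

Under these assumptions about 37% of a pulse remains after 12 years and about 14% after 24, which is the sense in which the gas breaks down "relatively quickly".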


> for a 2–3% performance gain

this is highly workload-dependent. there are plenty of APIs that are multiple-factor faster and 10x more memory efficient due to native implementation.


i wrote something similar for this purpose, but much simpler and in 2kb, without AI, about a year ago.

uWrap.js: https://news.ycombinator.com/item?id=43583478. it did not reach 11k stars overnight, tho :D

for ASCII text, mine finishes in 80ms, while pretext takes 2200ms. i haven't yet checked pretext for accuracy (how closely it matches the browser), but will test tonight - i expect it will do well.

let's see how close pretext can get to 80ms (or better) without adopting the same tricks.

https://github.com/chenglou/pretext/issues/18

there are already significant perf improvement PRs open right now, including one done using autoresearch.


Looks like uWrap only handles latin characters and doesn't deal with things like soft hyphens or emoji correction, plus uWrap only handles white-space: pre-line while Pretext doesn't handle pre-line but does handle both normal and pre-wrap.


correct, it was meant for estimating row height for virtualizing a 100k row table with a latin-ish LTR charset (no emoji handling, etc). its scope is much narrower. still, the difference in perf is significant, which i have found to be true in general of AI-generated greenfield code.


I've worked with text and in my experience all of these things (soft hyphens, emoji correction, non-latin languages, etc) are not exceptions you can easily incorporate later, but rather the rules that end up foundational parts of the codebase.

That is to say, I wouldn't be so quick to call a library that only handles latin characters comparable to one that handles all this breadth of things, and I also wouldn't be so quick to blame the performance delta on the assumption of greenfield AI-generated code.


no disagreement. i never claimed uWrap did anything more than it does. it was not meant for typography/text-layout but for line count estimation. it never needed to be perfect, and i did not claim equivalence to pretext. however, for the use case of virtualization of data tables -- a use case pretext is also targeted at -- and in the common case of latin alphabets, right now pretext is significantly slower. i hope that it can become faster despite its much more thorough support for "rest of the owl".


uWrap demo has text extending beyond text boxes all over the place on Safari, is that the price of simplicity?


i don't have a mac to test this with currently, so hopefully it's not the price but a matter of adding a Safari-specific adjustment :)

internally it still uses the Canvas measureText() API, so there's nothing fundamentally that should differ unless Safari has broken measureText, which tbh, would not be out of character for that browser.


Chrome is no different, for example, "RobertDowneyjr" is out of the box, and so is "enthusiastic" in a couple of places.


ack, see my reply to sibling comment: https://news.ycombinator.com/item?id=47572206


prepare uses measureText; if it is in a for loop, it won't be fast. This library is meant to do prepare once and then layout many times. layout calls should be sub-1 ms.
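The prepare-once, layout-many split can be sketched roughly as below. This is not pretext's API: `measure`, `Prepared`, `prepare`, and `layout` are hypothetical stand-ins, and the fixed 8px-per-character width is an assumption replacing a real Canvas measureText() call.

```python
from dataclasses import dataclass
from typing import List


def measure(text: str) -> float:
    """Stand-in for an expensive measurement call; assumes 8px per char."""
    return 8.0 * len(text)


@dataclass
class Prepared:
    widths: List[float]  # measured width of each word, in order
    space: float         # measured width of a single space


def prepare(text: str) -> Prepared:
    """Expensive step: measure every word once."""
    return Prepared([measure(w) for w in text.split()], measure(" "))


def layout(p: Prepared, max_width: float) -> int:
    """Cheap step: greedy line count at a given width, pure arithmetic."""
    lines, used = 0, 0.0
    for w in p.widths:
        if lines == 0:
            lines, used = 1, w              # first word opens the first line
        elif used + p.space + w <= max_width:
            used += p.space + w             # word fits on the current line
        else:
            lines, used = lines + 1, w      # wrap to a new line
    return lines
```

Resizing a column then only re-runs `layout` on the cached widths. For 100k rows of different text the `prepare` cost is still paid once per row, which is the case discussed in the replies.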


it is not clear from the API/docs how i would use prepare() once on one text and then use layout() for completely different text.

i think the intended purpose is that your text is maybe large but static and your layout just changes quickly. this is not the case for figuring out the height of 100k rows of different texts in a table, for example.


I think the way to use pretext for that is to join each row with a hard line break, do prepare once, and then walk each line. At least that will put the single layout performance in the best light.

I am skeptical that getting the row height of many items only once is the intended behavior though. The intended behavior is probably to get the row height of many items and then let you resize the width many times later (which is pretty useful on desktop).


tried just doing a concat of the 100k sentences with line breaks, it wasnt much faster, ~1880ms.


There's a handful of perf related PRs open already so maybe it will be faster soon. I'm sure with enough focus on it we could have a hyper optimized version in a few hours.


just throw another few on the pile:

https://mastodon.social/@azureshit

