Hacker News | mnorris's comments

The smallest Nvidia GPU I've run Sharp on is a T4, which I think has 16 GB of memory.


TL;DR: I accidentally allowlisted only my own domains, so no one else was able to generate an <image-3d> embed.

I was testing this on my own domains and it was working, so I wrongly assumed that no one else was trying it. But I do see quite a few failed requests, which will hopefully succeed now.

But I made an allowlist for only my domains yesterday that rejects everything not on the list, and forgot to toggle OFF the switch I built that opens it up to everyone.
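
For anyone curious what this kind of check looks like, here is a minimal sketch of an allowlist with an "open to everyone" switch. The function name, the config shape, and the choice to match on hostname are all assumptions for illustration, not the actual Worker code.

```javascript
// Hypothetical sketch: `isAllowedSource` and the config fields are made-up names.
function isAllowedSource(url, config) {
  // When the switch is on, the allowlist is bypassed entirely.
  if (config.openToEveryone) return true;

  let hostname;
  try {
    hostname = new URL(url).hostname.toLowerCase();
  } catch {
    return false; // not a parseable absolute URL: reject
  }

  // Accept the listed domain itself and any of its subdomains.
  return config.allowedDomains.some(
    (domain) => hostname === domain || hostname.endsWith("." + domain)
  );
}

// Example: with the switch off, only listed domains pass.
// isAllowedSource("https://example.com/a.jpg",
//                 { openToEveryone: false, allowedDomains: ["mukba.ng"] })
```

The failure mode described above is the first branch never being taken: with the switch left in the restrictive position, every request from a domain outside the list is rejected.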

Anyway, happy hacking. If anyone else stops in and tries this out, let me know whether it fails or succeeds.


I'm Michael. I built https://mukba.ng because I felt like 3D creation tools were too complicated.

All of the Gaussian Splat embeds that I see on the web require you to create a Gaussian Splat first, and then embed it. I thought, what if we could just turn every photo on the web into 3D?

I built a web component that turns any image URL into an interactive 3D embed. Drag to rotate, pinch to zoom. Install in two lines:

<script src="https://mukba.ng/image-3d/embed.js" defer></script>

<image-3d src="/your-photo.jpg"></image-3d>

The component fetches from a Cloudflare Worker that either returns a cached Gaussian Splat or generates one on the spot. Splats are ~5 MB, but each one has a 15 KB preview mesh that you can interact with while the full splat streams in on slower connections.
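
The cached-or-generated lookup can be sketched roughly like this. It is only an illustration: an in-memory `Map` stands in for whatever storage the Worker really uses, and `generate` is a hypothetical placeholder for the step that builds a splat from an image URL.

```javascript
// Hypothetical sketch: `cache` is any Map-like store, `generate` is a stand-in.
function getOrGenerateSplat(imageUrl, cache, generate) {
  const hit = cache.get(imageUrl);
  if (hit !== undefined) return hit; // already generated, or generation in flight

  // Store the promise itself so concurrent requests for the same image
  // share a single generation instead of each starting their own.
  const pending = Promise.resolve(generate(imageUrl));
  cache.set(imageUrl, pending);

  // Do not keep failed generations cached, so a later request can retry.
  pending.catch(() => cache.delete(imageUrl));
  return pending;
}
```

Caching the promise rather than the finished result is a design choice in this sketch. It means two visitors hitting the same uncached photo at the same moment trigger one generation, not two.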

Works with any image URL.

Docs + live demo: https://mukba.ng/image-3d/docs/

Source: https://github.com/imichaelnorris/image-3d


I built DeepSteve (https://github.com/deepsteve/deepsteve) with a similar itch but went the other way. Instead of adding graphics to the terminal, I put the terminal in a place that already has graphics.

I kept trying to optimize my terminal layout and realized I could just run my terminals inside of the browser, and let Claude Code write JavaScript in the same browser tab to customize the experience however I want. It's kind of a terrible idea, but it's my terrible idea, and I love it.


Have you run into any issues with handling inputs? For example, Ctrl+P in vim might be intercepted by the browser's Ctrl+P shortcut for print.

And have you run into any other issues, maybe like performance?

I feel like web-ified terminals get nerfed pretty hard and I'm not sure if/how people overcome that.

I like the idea of customizing multiplexed terminals with on-the-fly JavaScript, tho.
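
To make the Ctrl+P case concrete, here is a rough sketch of how a page can claim a shortcut before the browser acts on it. The shortcut list and the names `TERMINAL_SHORTCUTS`, `shouldTerminalHandle`, and `terminalEl` are assumptions for illustration; this is not how any particular web terminal does it.

```javascript
// Ctrl+<key> combinations the terminal wants for itself.
// Illustrative list only; a real terminal would likely claim more.
const TERMINAL_SHORTCUTS = new Set(["p", "f", "s"]);

// Decide whether a keydown event should go to the terminal
// instead of triggering the browser's default action.
function shouldTerminalHandle(event) {
  const ctrlOnly = event.ctrlKey && !event.metaKey && !event.altKey;
  return ctrlOnly && TERMINAL_SHORTCUTS.has(event.key.toLowerCase());
}

// In the browser, this would be wired to the element hosting the terminal:
// terminalEl.addEventListener("keydown", (event) => {
//   if (shouldTerminalHandle(event)) event.preventDefault(); // no print dialog
// });
```

As far as I know, this only works for shortcuts the browser lets pages override. Some, such as Ctrl+W or Ctrl+T, are typically reserved by the browser and never reach the page's handler.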


I haven't seen any performance issues with Claude Code, even when I'm running around 20 instances in one browser tab and looking at them all at the same time (rendered with xterm.js). Gemini and OpenCode flicker a lot, though, even with only one open.


I’m making a detective-themed visual novel for iOS with React Native.

Making the game engine was easy. Making the story consistent, believable, and interesting has been the biggest challenge for me.

I’ve written a few bad novels but never any narrative games, so it’s been a good exercise for me.


Disclaimer: I work on a different project in the space but got excited by your comment.

DeepSteve (deepsteve.com) has a similar premise: it spawns Claude Code processes and attaches terminals to them in a browser UI, so you can automate coordination in ways a regular terminal can’t: spawning new agents from GitHub issues, coordinating tasks via inter-agent chat, modifying its own UI, and terminals that fork themselves.

Re: native vs external orchestration, I think the external layer matters precisely because it doesn’t have to replicate traditional company hierarchies. I’m less interested in “AI org chart” setups like gstack (we don’t have to bring antiquated corporate hierarchies with us) and more in hackable, flat coordination where agents talk to each other via MCP and you decide the topology yourself.


I was intrigued and had a look at deepsteve.com, but I couldn't figure the website out. I'm guessing it won't give you any information about it until you install it?


Thanks for the feedback.

DeepSteve is a Node server that runs on your machine, and the website is designed to look like DeepSteve's UI. You actually access it at localhost:3000 in your browser, not via deepsteve.com.

But now I can see how that would be confusing.


I ran exiftool on an image I just generated:

$ exiftool chatgpt_image.png

...

Actions Software Agent Name : GPT-4o

Actions Digital Source Type : http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgori...

Name : jumbf manifest

Alg : sha256

Hash : (Binary data 32 bytes, use -b option to extract)

Pad : (Binary data 8 bytes, use -b option to extract)

Claim Generator Info Name : ChatGPT

...
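
If you want to check for these fields programmatically, here is a minimal sketch that parses exiftool's default "Tag : Value" text output. The function names are hypothetical, the tag names are simply the ones in the output above, and the check matches only the visible prefix of the truncated source-type URL.

```javascript
// Parse "Tag Name : Value" lines into a Map. Lines without " : "
// (such as the "..." placeholders) are skipped.
function parseExiftoolOutput(text) {
  const fields = new Map();
  for (const line of text.split("\n")) {
    const sep = line.indexOf(" : ");
    if (sep === -1) continue;
    fields.set(line.slice(0, sep).trim(), line.slice(sep + 3).trim());
  }
  return fields;
}

// Summarize the provenance-related tags seen in the output above.
// Assumption: these exact tag names appear; other tools may label them differently.
function summarizeProvenance(fields) {
  const sourceType = fields.get("Actions Digital Source Type") || "";
  return {
    claimGenerator: fields.get("Claim Generator Info Name") || null,
    softwareAgent: fields.get("Actions Software Agent Name") || null,
    // Matches only the prefix that is visible in the truncated value.
    declaresTrainedAlgorithm: sourceType.includes("digitalsourcetype/trainedAlgori"),
  };
}
```

A missing field proves nothing, since this metadata disappears if the image is re-encoded or stripped. The sketch only reports what the file declares about itself.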


Exif isn't all that robust though.

I suppose I'm going to have to bite the bullet and actually train an AI detector that works roughly in real time.


Why food? It's static, and AI 3D models do not make food that I want to eat. With photogrammetry, high-quality reconstructions of real food look tasty, which gives me an easy qualitative metric.

Previously the app only produced 3D models and threw away the original video. Incorporating the underlying videos shows new users what type of content they're supposed to record (e.g. a 1-second video of a darkly lit pizza box is NOT going to produce good content), and it also makes the output shareable.


I've been working on Mukbang 3D for the past year and a half. It's an iOS app that converts food videos into interactive 3D models using photogrammetry. Record a short video of food, and viewers can rotate, zoom, and explore it while the video plays.

I recently added pose tracking of the 3D model so I can overlay 3D effects onto the underlying video.

Here's a demo: https://mukba.ng/p?id=29265051-b9c7-400b-b15a-139ca5dfaf7e


This is awesome!

I'd love to boot this up and see how it runs on a Quest headset

