Tailscale is awesome, and Netbird is awesome, and Wireguard is awesome. It is a great time to be alive for sure. I have a guide that I wrote https://dynip.dev/guides/tailscale where I explain how and why they can exist
Agree that the OpenWrt DDNS scripts are a bit of a pain with keys secrets but the snippets function actually take the guess / how-does-it-work work out of the equation so I am pretty happy with that
Your guide sounds obviously written by an LLM. I think that's okay, and you might have directed the LLM's work, but don't say you wrote it; this misrepresents the guide as more carefully crafted and authoritative than it really is.
I would have been all over this a few months ago but I've recently been an enthusiastic convert to netbird recently. I had a look at your guide. I am using netbird reverse proxy to expose a few services and it's been pretty much flawless. It saves me from needing to set up port forwards or worry about a firewall.
Do you see an advantage or alternative benefits to also having a public dynamic DNS, because for me I am struggling to see any?
Okay well I guess we are still dealing with someone else's proxy in the way (also providing TLS termination which was a big thing I was after). So you share fates with that service. It's not just a case of hole punching via a relay.
It would be nice to get something like that also with easy TLS setup.
So many self replies :) happy to dive in a bit more at a later time to get your take on how the services work together. hope you found the /guide helpful
I now use both. DynIP for public-facing services (yeah I still have a few), and Tailscale for what only I need to access. Drastically reduced my attack surface.
This makes me really happy, like really really. It is the exact part of the /guide where things work together and not agaist or replace, synergy and happiness.
Tailscale simp here, been using this feature since it launched in beta, can't believe it didn't exist earlier.
This solved every last remaining problem of my CGNAT'd devices having to hop through STUN servers (with the QoS being noticable), now they just route through my own nodes.
Not OP. Personal opinion on why it is a somewhat hard problem. The main problem is using the available compute correctly and productively while doing two very separate types of tasks that were previously solved independently: generating responses with llm inference engines and modifying weights with a training code. A step of training updates the weights so the inference engines have to adjust theirs, but we talk about 750B parameters and multiple inference servers. Stale weights can be used instead, but only for a tiny bit and the data from them needs special corrections that also involve large compute/memory. Your inference engines better be deterministic (for given pseudoRNG; it clashes with parallelism) or you have a way to correct the probability streams. Ideally inference and training should have same everything at the bit level when they handle the same context, but we dont live in that world yet. And of course, GPUs break. For no great reason, other than the tiny scale of their features making them fragile. And because you scale, you need to handle failures gracefully and efficiently.
Surely you could just pre-generate rollouts with slightly stale weights and then cheaply verify the rollout when up-to-date weights stream in by treating the former solution as speculative decoding. Sounds quite trivial to me, perhaps I'm missing something.
Cheap verifying of speculative decoding only works for a few tokens at a time. Long sequence generations (thousands to tens of thousands of tokens in typical rollouts for thinking models) are dominated by distribution drift on stale weights (because slightly wrong probabilities multiply over long streams), and the off policy RL training methods dont work well (high variance) for such high dimensional problems.
I don't think it's possible to separate any open source contribution from the ones that came before it, as we're all standing on the shoulders of giants. Every developer learns from their predecessors and adapts patterns and code from existing projects.
Exactly that. And all the books about, for instance, operating systems, totally based on the work of others: their ideas where collected and documented, the exact algorithms, and so forth. All the human culture worked this way. Moreover there is a strong pattern of the most prolific / known open source developers being NOT against the fact that their code was used for training: they can't talk for everybody but it is a signal that for many this use is within the scope of making source code available.
Yeah, documented *and credited*. I'm not against the idea of disseminating knowledge, and even with my misgivings about LLMs, I wouldn't have said anything if this blog post was simply "LLMs are really useful".
My comment was in response to you essentially saying "all the criticisms of LLMs aren't real, and you should be uncompromisingly proud about using them".
> Moreover there is a strong pattern of the most prolific / known open source developers being NOT against the fact that their code was used for training
I think it's easy to get "echo-chambered" by who you follow online with this, my experience has been the opposite, i don't think it's clear what the reality is.
If you fork an open source project and nuke the git history, that's considered to be a "dick move" because you are erasing the record of people's contributions.
The hard truth is that if you're big enough (and the original creator is small enough) you can just do whatever you want and to hell with what any license says about it.
To my understanding, the expensive lawyers hired by the biggest people around, filtered through layers of bureaucracy and translated to software teams, still result in companies mostly avoiding GPL code.
I’ve been thinking that information provenance would be very useful for LLMs. Not just for attribution (git authors), but the LLM would know (and be able to control) which outputs are derived from reliable sources (e.g. Wikipedia vs a Reddit post; also which outputs are derived from ideologically-aligned sources, which would make LLMs more personal and subjectively better, but also easier to bias and generate deliberate misinformation).
“Information provenance” could (and I think most likely would, although I’m very unfamiliar with LLM internals) be which sources most plausibly derive an output, so even output that exists today could eventually get proper attribution.
At least today if you know something’s origin, and it’s both obvious and publicly online, you have proof via the Internet Archive.
> I don't think it's possible to separate any open source contribution from the ones that came before it, as we're all standing on the shoulders of giants. Every developer learns from their predecessors and adapts patterns and code from existing projects.
Yes but you can also ask the developer (wheter in libera.irc, or say if its a foss project on any foss talk, about which books and blogs they followed for code patterns & inspirations & just talk to them)
I do feel like some aspects of this are gonna get eaten away by the black box if we do spec-development imo.
well, its not without issues. the actual motivation was not that dhcp is the suxxors, but to promote a model where the assigned prefix was free and highly dynamic.
the goal being to support a model where one could support multiple prefixes to handle the common case of multiple internet connections. more importantly to allow providers to shuffle the address space around without having to coordinate with the end organization. this was perceived to be necessary to prevent the v6 address space from accruing segmentation.
It's funny the "handle the common case of multiple internet connections" just doesn't work at all with ipv6 yet works much better under IPv4 NAT. With IPv6 each machine gets it's own routing table due to having two addresses which means I can't failover on the router when an ISP goes down. Machine will keep trying to use the ISP that is having 100% packet loss. I can't prioritize sending traffic out of one ISP because I'd need to configure it on each machine due to them having their own routing table. With IPv4 the router can handle those rules since its doing NAT for all machines in the network so it gets to choose.
Then Tailscale came out and I stopped caring about DDNS or CGNAT ever since.
reply