Hacker News | secondcoming's comments

> But as of now there is no such problem on any kind of significant scale.

This is not the same as saying there's no problem.

Only a fraction of humans will ever compete in the Olympics. People train their whole lives for it. It's not about 'scale', it's about safety and fairness. It's not reasonable to expect them to 'shut up' about it.

I don't want to watch a man beat up a woman in a boxing ring.


You're most likely part of the 2bn who showed no interest, or only a passing interest, in the Olympics.

I sincerely doubt more than half the population of the entire planet showed more than a passing interest in them, and I'm still curious how it'd be possible to measure that.

> Continuously capturing low-overhead performance profiles in production

It surprises me that anything designed by the OTel community could ever meet 'low-overhead' expectations.


The reference implementation of the profiler [1] was originally built by the Optimyze team that Elastic then acquired (and donated to OTEL). That team is very good at what they do. For example, they invented the .eh_frame walking technique to get stack traces from binaries without frame pointers enabled.

Some of the OGs from that team later founded Zymtrace [2] and they're doing the same for profiling what happens inside GPUs now!

[1] https://github.com/open-telemetry/opentelemetry-ebpf-profile...

[2] https://zymtrace.com/article/zero-friction-gpu-profiler/


> For example, they invented the .eh_frame walking technique to get stack traces from binaries without frame pointers enabled.

This is not an accurate summary of what they developed.

Using .eh_frame to unwind stacks without frame pointers is not novel - it is exactly what it is for, and perf has had an implementation doing it since ~2010. The problem is that kernel support for this was repeatedly rejected, so the kernel samples kilobytes of stack and userspace then does the unwind.

What they developed is an implementation of unwinding from an eBPF program running in the kernel using data from eh_frame.


True, I should have been more specific about the context:

Their invention is about pushing down the .eh_frame walking to kernel space, so you don't need to ship large chunks of stack memory to userspace for post-processing. And eBPF code is the executor of that "pushed down" .eh_frame walking.

The GitHub page mentions a patent on this too: https://patents.google.com/patent/US11604718B1/en


I believe this is a case of convergent invention – the idea of pushing DWARF/.eh_frame unwinding into eBPF seems to have occurred to several people around the same time. For example, there's a working implementation discussed as early as March 2021: https://github.com/iovisor/bcc/issues/1234#issuecomment-7875...

OTel Profiling SIG maintainer here: I understand your concern, but we’ve tried our best to make things efficient across the protocol and all involved components.

Please let us know if you find any issues with what we are shipping right now.


Anything to actually add?

Do you feel better now?

If you enforce that the buffer size is a power of 2, you can just use a mask to do the

    if (next_head == buffer.size())
        next_head = 0;
part

If it's a power of two, you don't need the branch at all. Let the unsigned index wrap.

You ultimately need a mask to access the correct slot in the ring. But it's true that you can leave unmasked values in your reader/writer indices.

Interesting, I've never heard of anybody using this. Maybe a bit unreadable? But yeah, it should work :)


Nice one!


Indeed, that's true. That extra constraint enables further optimization.

It's mentioned in the post, but worth reiterating!


Nice!

Should be able to push it more if

* we limit data shared to an atomic-writable size and have a sentinel - less mucking around with cached indexes - just spinning on (buffer_[rpos_] != sentinel) (atomic style with proper semantics, etc..).

* buffer size is compile-time - then mod becomes compile-time (and if a power of 2 - just a bitmask) - and so we can use a 64-bit uint that just counts increments, not position. No branch to wrap the index to 0.

Also, I think there's a chunk of false sharing if the reader is 2 or 3 ahead of the writer - so performance will be best if reader and writer are a cache line apart - but will slow down if they are sharing the same cache line (and buffer_[12] and buffer_[13] very well may if the payload is small). Several solutions to this - the disruptor pattern, or use a cycle from group theory - i.e. buffer[_wpos%9] for example (9 needs to be computed based on cache line size and size of payload).

I've seen these pushed to about clockspeed/3 for uint64 payload writes on modern AMD chips on the same CCD.


This was, in fact, mentioned in the article.

UK, California and Brazil, no?

California's law requires that the OS ask the user for their age, and accept the response as-is without doing any verification.

Terry Gilliam's Brazil, California, and geographic Brazil, yes.

If I get a beer with no head I'm assuming the glass was dirty

My first thought is that this is a sign of burn-out.

Has Amazon's advertising TAM product been affected by AI?


Boost is stronger than ever.


Then why are you using rust for these tasks?


I'm not.

