C++ Language Interface Foundation (CLIF) (github.com/google)
97 points by matt_d on May 2, 2017 | 69 comments


C++ is an extremely challenging language to write a wrapper generator for. I did a C++ to Common Lisp generator about a decade ago for my GSoC project, based on GCC-XML (with the goal of being able to wrap Qt). There are two challenges you run into. The first is that the GCC C++ ABI is extremely complicated and requires a runtime on top of whatever low-level C FFI you're working with.

The second is that many semantics just don't map very easily. I imagine the latter problem has only gotten worse with heavier use of templated types in APIs. It looks like CLIF doesn't expose the template per se (at least in the Python examples) but requires explicit instantiation of each specialization as a distinct Python class. Templates mess with the very concept of an FFI generator, which tends to assume a reasonably clear separation between compile time and run time (so you don't need to run a C++ compiler as a pre-pass on any code that uses a binding).

All that is a roundabout way of saying: for god's sake write your system APIs in C.


Yeah I've grown to appreciate C++ in many ways, but what I noticed at my last job is that the baroque interfaces give it a viral effect.

Once you have a little C++ code, the easiest thing to do is to continue writing more C++ code, even if it's not the best tool for that job. It's too hard to use a C++ interface from any other language. Even wrapping it in C is annoying, although as I understand LLVM does exactly that for some of its API.

IMO there are a lot of systems where C++ is the best language for about 10% of the code. C++ really is unique in terms of offering zero-cost abstraction. But then the remaining 90% gets written in C++ too. It can be remarkably awkward for many problems, and your build times scale nonlinearly too.

In some way this is inherent ... the compiler's job is to erase all those abstractions and generate straightforward machine code. In other ways it is the fault/result of adhering to the C linking model, which ironically was for interoperability.

C++ is sort of like a universal receiver and not a universal donor. It can assimilate any C code, and thus it gets code in other languages transitively. But other languages can't assimilate it, at least not without great effort.

That said, Clang has a better API than GCC-XML, so the problem might be solved. Although honestly C++ just keeps growing more features with C++11, 14, 17 that make it harder to interoperate with any other code. It's this huge compile-time language completely separate from the C linking model.


> It's too hard to use a C++ interface from any other language.

D has the best support for this. It's not as simple as `#include <cppheader.hpp>` but it's better than any other language at it. Name mangling matches, C++ exceptions are supported, C++ abstract classes can be declared as D interfaces, and everything works.

> C++ really is unique in terms of offering zero-cost abstraction

Rust and D (at the very least) would disagree.


OK, good to know about D.

As far as zero cost, my understanding is that D is still trying to get rid of GC from the standard library.

And I'm explaining why I think Rust is having a hard time getting adoption. It makes an effort to be compatible with C, but not with C++. But really C++ is its "competitor", not C.

And C++ has the network / lock-in effect I described. All languages have a network effect to some degree (libraries, documentation), but I think the situation with C++ is especially acute.

Also, I would say that safety and zero-cost abstractions are the goal of Rust. Those two goals conflict somewhat -- e.g. in the decision over bounds checking. Although I guess you can say that without bounds checking you have no abstraction; you just have a pile of buggy code at zero cost :)


I quite like using Erlang/OTP to host C++ nifs. I feel like it strikes that 10% of code being in C++ balance that you mentioned. I heard it's a popular combo for HFT applications.

http://erlang.org/doc/tutorial/nif.html


The linked page only shows C and not C++.


> C++ is sort of like a universal receiver and not a universal donor. It can assimilate any C code...

This is part of the language's design.

Bjarne's goal after being forced to use BCPL instead of Simula was never to use a bare bones language ever again, and C with Classes needed to fit into AT&T's C tooling.

Hence why we also didn't get modules back then, and got a funky name mangling to fit into those bare UNIX linkers.

As for being a universal donor, the story is a bit different in OSes where the ABI is not C based, e.g. BeOS, Symbian, Genode, Windows (COM, .NET, UWP), OS/400 (TIMI), z/OS (ILC).


Yeah I read "The Design and Evolution of C++" and quite liked it. His goal was definitely to assimilate C code, with compatibility being a high priority.

But I'm not sure the goal was to make C++ hard to use from other languages. I think that just fell out of the focus on zero-cost abstractions.

COM seems like the right middle ground between baroque C++ interfaces and RPC/message passing. You write native code, but it can interoperate dynamically with components in other languages, in the same address space. But I think it is overly tied to OOP, and that doesn't play well with the style of Unix.

I wonder if it would be possible to do better, or if that ship has sailed. I'm not overly familiar with Windows... I know there were some problems with COM but it seemed basically sound. I used JScript once and it was pretty powerful.


COM works remarkably well (and a ton of system APIs simply don't exist as a C API). It probably only gets bashed because some people conflate it with DCOM and its security issues.


Well UWP is basically the COM+ Runtime reborn, picking up the ideas that they were discussing back when Ext-VOS was being planned, which eventually became .NET instead.

OS/2 SOM was better in that it supported implementation inheritance and meta-classes.

Apple also had some nice ideas for Copland and how to further develop Taligent, but there is little documentation left of that effort.


It makes sense to write system APIs in C, but there are many more things than system APIs, and many of them find it useful to use C++ constructs.

(edit) Clang's internals are also several orders of magnitude easier to work with than GCC's internals. In fact, clang is designed to be used as a library, and CLIF leverages that very, very highly.

See, for example, https://clang.llvm.org/docs/ClangTools.html . Clang tools are very powerful, and much of CLIF is based on a Clang tool, not a GCC internals hack.


It should also be noted that outside the UNIX world, there are other OSes that don't use C conventions, or are migrating away from them (even if slowly).


How stable are Clang's internals? Are you supposed to be able to write your own Clang tools and link them dynamically, or are they mostly distributed and built together with the Clang source code? (i.e. like Linux drivers where there is no stable interface.)

I know that LLVM is notable for not maintaining API stability -- not sure about Clang.


There are (roughly) two ways of using Clang as a library. The first is via a C-based API into a libclang.so. This is an API designed to be extremely stable, is fairly powerful, and works relatively well.

That API doesn't expose absolutely everything though, which is one reason it can be so stable. Tools that need more are typically shipped and built alongside the Clang source code. CLIF is the latter.

You can read about both of them here:

http://clang.llvm.org/docs/Tooling.html


> All that is a roundabout way of saying: for god's sake write your system APIs in C.

Or at least expose a C-compatible ABI.


On Windows we get to use COM and .NET bindings, with UWP controls slowly picking up steam.

Much better and we get to use a modern OO API.

Hence why better Windows support is relevant for Rust, if you want it to be taken more seriously by Windows devs.

EDIT: Same applies to mainframes, which also don't follow a C ABI, rather their own native languages.


Hopefully this can read the code well enough to recognize explicit instantiation in C++ code and just generate the Python versions of those. One of the (many) painful things about SWIG is trying to keep your .i file in sync with your C++ code.


CLIF does recognize explicit template instantiations, and can instantiate a limited set itself. There is no pain like keeping the two files in sync because CLIF only allows a very small amount of C++ in its sources--basically just type names.

And CLIF takes care to error out when the wrapper description doesn't match the C++. "When in doubt, refuse the temptation to guess."

Our internal clients really like it. Someone described it the other day as "magic".


"magic" is an overloaded term.

It can mean something so beautifully advanced that you cannot distinguish it from advanced tech and you really don't care how it works, since it works so flawlessly you couldn't bother looking behind the scenes. I.e. praise.

Or it can mean something that uses some arcane unearthly constructs and at the moment you need to look behind the scenes you find yourself utterly lost. I.e. a critique.


No goats need to be sacrificed in order for CLIF to do the right thing.


I was in the room when this person said it (and it was in the context of CLIF's Google-wide adoption), so I think my interpretation that this was a positive statement is reasonable.

At least if you weren't in the room as well. I'm assuming you weren't?


Why do you need a handwritten clif file?


Why would this be down voted? It's a legit question, and a followup in the comment chain.

I understand why SWIG needs a .i file, as it doesn't understand much about the code. But when you control the compiler you can 1> as in the example I gave, look at actual explicit instantiation instead of doing the synthetic thing SWIG does, as well as make other, smarter decisions and 2> use #pragmas or attributes to direct translation at the point of use.

And the author even said himself that "code two files is a pain."

Thus my question remains: why the need for a .clif file?


#pragmas are not portable, and have various problems with compatibility.

Structured comments could work, but because the C++ compiler doesn't parse them, you essentially have a file-within-a-file, and 80% of the problems you encounter with the dual-file system.

You also have the problem of the API's client trying to understand what the API looks like. We don't want to make them read C++ code in any way, and a pyclif file looks a lot like python.

Finally, CLIF is substantially more terse than SWIG--on the order of 1/10th the number of lines. This makes it less of a big deal to have a separate file.


A C++ header file is full of garbage^W implementation details, and non-C++ users want a stable API definition - that's what the .clif file is about. It's an interface declaration for a native user.


> All that is a roundabout way of saying: for god's sake write your system APIs in C.

No. Use an IDL to define component interfaces and a generator for any particular language. That's how C++ components interoperate on Windows. (COM is just a formalization of vtables generated from an IDL file and compiled with midl compiler.)


I wondered how long it would take before someone posted CLIF here since we haven't put up a blog post about it yet... :)

Interesting stat: Around 80% of the Python C++ extension module wrappings being added to our code base are now being done using CLIF instead of SWIG. We are actively working on forbidding new Python SWIG wrappers from being added and migrating important legacy wrappings off of it onto CLIF.

Who am I? I am TL of the team that created CLIF (design and code reviewer, _not_ a primary author; they can identify themselves as they see fit). -gps@


When are y'all doing Go and Java, though? The fact that this exists just compounds the pain of writing SWIG for the other two languages. The grass really is greener.


Go and Java claimed they can do without C++, so let them suffer :)

It takes time and because we love Python we did it first. Others will come.


SWIG is ... special.


So... someone with basically zero Python experience and a ton of C++ experience could basically "hand that experience down" to his newly minted Python-self?


Sounds like a worthy goal. There will always be infelicities (as with human language translation as well) but why not?


Having used both SWIG and Boost.Python professionally, I find manual work required by the latter to be 100% worthwhile and usually necessary.

C++ and Python are not very similar, and the result of auto-binding them typically gives up a combination of fluency and performance.

A simple example of the former is a C++ algorithm which writes to an output iterator. In Python this might be expressed as a generator. But none of the auto-binding tools can do this transformation.
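As a rough illustration of that gap (a sketch with made-up names, not output from any binding tool): `primes_into` stands in for a C++ algorithm that fills a caller-supplied output, which is roughly the shape a mechanical binding tends to preserve, while `iter_primes` is the generator a hand-written binding could expose instead.

```python
from itertools import islice
from typing import Iterator, List


def _is_prime(n: int) -> bool:
    """Trial division; only here to give the example something to compute."""
    return n >= 2 and all(n % p for p in range(2, int(n ** 0.5) + 1))


def primes_into(limit: int, out: List[int]) -> None:
    """Hypothetical auto-bound shape: the caller passes a container to fill,
    mirroring a C++ function that writes to an output iterator."""
    for n in range(2, limit):
        if _is_prime(n):
            out.append(n)


def iter_primes(limit: int) -> Iterator[int]:
    """Hypothetical hand-written shape: a lazy generator, so the caller can
    stop early without computing or storing the rest."""
    for n in range(2, limit):
        if _is_prime(n):
            yield n


# The generator version lets the consumer decide how much work gets done.
first_five = list(islice(iter_primes(10**9), 5))
```

With the output-container version the whole range has to be processed before the caller sees anything; with the generator the `islice` call above returns after five values.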

As for the latter--performance--I have seen 100x performance penalties in practice when the blind lead the bind. A good example is that in Python we have memoryview and the buffer protocol, but auto-binding tools take these no further than std::vector (if that). A C++ API which produces a large stream of numeric data just begs to be bound using the buffer protocol or perhaps even NumPy directly. But if a C++ API produces small values really quickly, an auto-bound one will produce the same small values really slowly.
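To make the buffer-protocol point concrete, here is a small sketch using only the standard library. The `array` object is an assumption standing in for a block of numeric data owned by C++; nothing here comes from SWIG, CLIF, or Boost.Python.

```python
from array import array

# Stand-in for a large numeric buffer produced on the C++ side.
samples = array("d", (i * 0.5 for i in range(1_000)))

# Buffer protocol: a memoryview shares the underlying memory, so slicing
# it copies nothing and writes go straight through to the original.
view = memoryview(samples)
window = view[100:200]
window[0] = -1.0

# The element-by-element route: every value becomes a separate Python
# float object in a freshly allocated list.
as_list = samples.tolist()
```

The first route is what a binding that exposes the buffer protocol gives you; the second is closer to what happens when a `std::vector<double>` is converted into a Python list one element at a time.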

Some people see the writing of bindings as manual labor to be avoided. I see it as an optimization opportunity. There are huge gains available.


> I find manual work required by the latter to be 100% worthwhile and usually necessary

Roughly the same experience. Automatic generation for anything more than trivial examples always seems to lead to the need for more and more configuration to keep everything in line, to the point where just writing everything by hand becomes less work. This is likely due to the type of software we use it for, but still, the amount of time I lost on SWIG really seems wasted instead of leaving the feeling of having learnt at least something. It was all not very pleasant.

The Python side of our software gets exposed to less tech-savvy users and is meant to be a more friendly layer over C++ functions and classes. But the C++ side isn't really an API in the sense that there is no hard distinction between the 'internal' layer and the API layer. Some class which is only used internally might the next day also be wrapped to be available in Python. So there's no single directory or so one can point to and say 'everything in there is the API'. Also given a C++ class, sometimes only a couple of methods have to get exposed to the users, sometimes with a different name even, or some arguments defaulted etc. Preferably without having to write a wrapper just for the sake of exposing it. All this turned out to be a nightmare in SWIG.

Just writing one or more lines for registering the function manually (not Boost.Python but similar) is usually a one-time-almost-never-look-back thing which, for us, is way less work in the end. Even with the hundreds of functions we have already. And of course performance is the other advantage.


CLIF is used inside Google for some very non-trivial projects.


Well I'm always open to new ways of doing things. Any estimate of how hard it would be to generate MicroPython [1] wrappers for C++ code (in a rather configurable way as described earlier)?

[1] https://github.com/micropython/micropython


It seems entirely reasonable for someone to create a MicroPython wrapper generator. It appears to have a C/C++ API so it should be similar to the existing CPython Generator and Runtime.


There are always justifiable use cases for manually written Python bindings.

But a "there be dragons" caveat applies as getting the CPython API correct is complicated. Reference counting and error checking bugs are common in hand written CPython C API code.

For most cross language bindings the primary goal is to "just work reliably". Optimization can happen later after you have profiles to figure out where it is worth doing and in what way.


Assuming one wants to call C++ code from Python (the common case) I would even start with simple pipe-based IPC and just build an executable in C++ that acts as a bridge between Python (or any other language in this case) and C++. E.g. using JSON serialization it would look like this:

  $LANG (e.g. Python) <-> JSON <-> Pipe <-> JSON <-> C++

If this is too slow I still wouldn't use a binding generator (SWIG, etc.), but build a C API using opaque pointers that offers the required functions, i.e. a bridge between C and C++. Finally one can simply call into this API with Python's ctypes, which is part of its standard library.

  Python <-> ctypes <-> C-API <-> C++

I have worked previously with SWIG, but my experience has been very bad. In particular, the problem with binding generators is similar to the problem parser generators face: they make the simple stuff simple and the hard stuff hard. Furthermore, the bindings that are auto-generated are mostly too low-level, so one normally writes wrappers around them to make them more "pythonic", which raises the question: why not write the bindings from scratch anyway?
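A minimal sketch of the pipe-based variant, under loud assumptions: the line-delimited JSON protocol and the `sum` operation are invented for illustration, and the "bridge" is a small Python child process standing in for what would really be a compiled C++ executable speaking the same protocol.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for the C++ bridge executable: reads one JSON
# request per line on stdin and writes one JSON reply per line on stdout.
BRIDGE_SOURCE = r'''
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    if request["op"] == "sum":
        reply = {"id": request["id"], "result": sum(request["values"])}
    else:
        reply = {"id": request["id"], "error": "unknown op"}
    print(json.dumps(reply), flush=True)
'''


def call_bridge(requests):
    """Send a batch of requests over a pipe and return the decoded replies."""
    payload = "".join(json.dumps(r) + "\n" for r in requests)
    done = subprocess.run(
        [sys.executable, "-c", BRIDGE_SOURCE],
        input=payload,
        capture_output=True,
        text=True,
        check=True,
    )
    return [json.loads(line) for line in done.stdout.splitlines()]
```

The caller never links against anything, so the C++ side can be rebuilt or crash without taking the Python process with it; the cost is serialization and process startup on every batch.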


Finally! I have been suffering under SWIG and have been hoping for some time that someone would get the compiler to do this.+

Not to say SWIG isn't an amazing effort, but it's a whole C++ compiler maintained by a very small team.

+ I already put my time in at the gcc salt mines so no, I didn't try doing this myself.


Perhaps the best thing about CLIF is that it never, ever, parses any C++ by hand [0]. It always uses a state-of-the-art, fully-industrial-strength, well-supported compiler: Clang. And it can be updated in lockstep with Clang near trivially.

[0] Actually, there is one tiny exception: CLIF occasionally separates a C++ qualified name on the scope-resolution operator "::".


I'm cautiously excited. The problem with C++ is there's just so many good frameworks but you can't use them without creating more C++.

SWIG should be called WIG. And anyway, I can't see it surviving modern C++.

As long as I'm ranting, what is modern C++ but a bunch of new languages that are incompatible with C++, each other, and everything else? Forgive me if I'm wrong but this seems like a terrible idea.


Modern C++ means picking up the ideas from Alexandrescu, avoiding unsafe C style programming unless profiler tells otherwise and using the higher level features from C++ for writing nice, usable, safe libraries.

Many of the idioms were actually already possible back in the C++ARM days, before C++98 was a thing, but spoiled by C refugees.


Maybe I got my terminology wrong. I'm talking about the wave of new standards. I feel like they're a bunch of new languages which are improvements over C++ but not backwards compatible with C++. Every problem you have interoperating two different languages you have between these different C++es but worse because there aren't tools like SWIG to help you.


You can say the same about any programming language that enjoys wide market adoption, except maybe for C that still thinks computers are like PDP-11's, with C99 and C11 being very tiny evolutions with little regard for improving overall productivity.

Even Fortran and Cobol(!) have evolved more than C.


> SWIG should be called WIG.

Depends on your perspective. The other alternative is (was?) to start typing up those wrapper functions manually.


There are several issues that I usually hit with bindings to C/C++ and other languages:

  - Calling a native function that itself takes callback, and that callback might be your non-native code. (trampolines?)
  - Dealing with memory allocation. Who owns what.
  - Exceptions or longjmp
  - Green threads, fibers, etc.
  - Other?


Also:

  - int64 -> uint64 conversion
  - int64/uint64 -> double conversion
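Those two conversions are easy to poke at from Python with `ctypes`. This is only a sketch of the failure modes; the helper names are made up.

```python
import ctypes


def as_uint64(value: int) -> int:
    """Reinterpret an integer as a C uint64_t (ctypes wraps modulo 2**64)."""
    return ctypes.c_uint64(value).value


def as_int64(value: int) -> int:
    """Reinterpret an integer as a C int64_t (ctypes wraps modulo 2**64)."""
    return ctypes.c_int64(value).value


def survives_double(value: int) -> bool:
    """True if the integer round-trips through a 64-bit IEEE double unchanged."""
    return int(float(value)) == value


# A negative int64 silently becomes a huge uint64, and vice versa.
wrapped_up = as_uint64(-1)
wrapped_down = as_int64(2**63)

# Doubles have a 53-bit significand, so not every 64-bit integer fits.
exact = survives_double(2**53)
lossy = survives_double(2**53 + 1)
```

A binding layer has to pick a policy for each case (raise, clamp, or wrap), and the languages on either side rarely agree on a default.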


callbacks in swig are straightforward. you have a C/C++ receiver function that gets registered along with a pointer to the scripting language target function. I've used this design for over a decade.


I was mentioning FFI in general, like Common Lisp's FFI, LuaJIT, others. For example, for calling back you need to have a trampoline, where the "C/C++" trampoline would call back your language runtime "engine". But this introduces a gap in the "stack", and might not work with all VMs, or you may run out of trampoline slots and not be able to allocate new ones, since this would require dynamic code generation, and on, say, iOS and certain game consoles that's not allowed.


there are no trampoline slots required. it's a function pointer that's provided as an argument (user data). Can you point to an FFI that doesn't support callbacks in this way? This would be a major problem in the FFI implementation.
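For what it's worth, the "function pointer plus user data" design can be sketched with Python's `ctypes`. Everything named here (`dispatch`, `register`, `native_for_each`) is hypothetical, and `native_for_each` is plain Python standing in for a native function that would accept `int (*cb)(int, void *)` and a `void *`. Note that `ctypes` itself still builds the one C-callable thunk at runtime, which is the step the parent comment says some platforms disallow.

```python
import ctypes

# C signature being modelled: int (*callback)(int value, void *user_data)
CALLBACK = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int, ctypes.c_void_p)

# Maps integer handles (passed through user_data) to Python callables.
_targets = {}


@CALLBACK
def dispatch(value, user_data):
    """The single C-callable receiver; user_data selects the Python target."""
    return _targets[user_data](value)


def register(func):
    """Store a Python callable and return the handle to pass as user_data."""
    handle = len(_targets) + 1  # start at 1: a NULL void* arrives as None
    _targets[handle] = func
    return handle


def native_for_each(values, callback, user_data):
    """Stand-in for a native function that invokes the callback per element."""
    return [callback(v, user_data) for v in values]
```

One receiver serves any number of registered targets, so the number of thunks does not grow with the number of callbacks; the limit only bites for C APIs that take a bare function pointer with no user-data argument.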


LuaJIT limits both the number of registered callbacks [1], and re-entering the interpreter from a JIT'd callback [2] (it will try to detect and prevent JIT for such functions, but if that fails you get a panic).

[1] http://stackoverflow.com/questions/31042530/why-does-luajit-...

[2] http://stackoverflow.com/questions/25924755/c-and-lua-unprot...


that's a limitation of a specific FFI, which is apparently a design choice motivated by limited memory systems.


Yes, sorry - I'm simply speaking as a user of a given language runtime connecting to "C" exported functions. I understand that given a different implementation this would not have been needed.


Callbacks are well supported in SWIG for most languages.


Having trouble grokking this. Can someone give a simple use case / example? If the Parser generates language agnostic data, how can this data be passed to the Matcher which parses C++ headers (i.e., C++ headers are not language agnostic)?

A good README should give a "high level" description of what the thing is and what it's used for.


The repository mentions "other languages" but I could only find Python examples?

While I think wrapper generators are a noble idea, I doubt I would choose this over the convenience of IDL-like metaprogramming approaches (e.g. pybind11, or Boost.Python)


Metaprogramming such as pybind11 does is neat. Thanks for the pointer to the project. It still looks like manually written C extension modules to me based off of what I see in https://github.com/pybind/pybind11/blob/master/docs/basics.r.... Just much shorter code with a lot of the hard to get right details taken care of for you. Good! Better than the status quo. But it doesn't abstract the problem away very much. (no doubt some will consider that a feature)

What if I also wanted my wrapper available on a non CPython VM? The proper way to use PyPy is not via its fake CPython API support (slow and memory hungry). Imagine if a PyPy Generator were added to CLIF. It'd use the same interface definition Parser but generate an entirely different set of code. In PyPy's case that would probably be a C library wrapping the C++ for generated cffi based Python code to interact with.

Admittedly all hypothetical here until other Python VMs have CLIF Generator implementations.


Semi-related, in case you haven't seen it:

http://doc.pypy.org/en/latest/cppyy.html


Thanks! Eek! An XML interface definition. In 2017!


CLIF is designed to support arbitrary front-end languages--and other languages are in the exploratory phases. But there is no full support for other languages at this time.


Doesn't the IDL approach require that the declarations/intermediate language be written manually though? Most C++ libraries out there don't have any IDL files (looks at Qt)


I mean IDL-like, as in valid c++ that is almost as easy to read as an IDL (to me at least) but without losing the ability to dive in to the binding if necessary. If you're trying to produce a binding that feels idiomatic in the target language, I'd say this is important, and this is where Swig-like approaches show their weakness.

I think the author of Swig shares this sentiment: (http://code.activestate.com/lists/python-dev/109281/).


Can this wrap Qt (or barring Qt, wxWidgets)? There are great Qt bindings in Python, but in other languages, not so much.


Fellows, I would suggest you check https://github.com/mono/CppSharp. It's Clang-based as well and despite the name, it's not bound to C# or .NET, generators for any languages can be added. It's feature complete with the exception of templates which are being worked on as we speak. It's also fully automated, manual intervention is only required if the user wants binding-specific customisation.


How much C++ do I need to know in order to wrap someone else's C++ library?


Quite a lot, or at least by proxy given that you would have to talk to libraries and compilers that are written in C++.

However: Some libraries/packages do expose a C api, which you could use in a language assuming it had a decent FFI.

A lot of major C++ projects use exceptions, which you will have to catch and handle inside of the C++ because even if C could catch exceptions, there is no guarantee of ABI compatibility. Some languages, I think D can do it, can catch an exception thrown in C++ code.

You might have better luck integrating a standard data interface between the C++ and any other language: However, that could be a little slow.


I like this idea, I will try this soon. Is there any plan for somewhat transparently supporting NumPy?


NumPy support is a borderline issue for CLIF.

The typical usage I saw is the following: C++ has some class(es) that are best represented as NumPy arrays. Because those classes are specific to the project (not generic like std::), they don't fit into the CLIF runtime. Instead, the project supplies a C++ library with custom conversion functions, as described in ext.md, that use the NumPy C API and tell CLIF that those classes are convertible to Python objects (which NumPy objects are).

That's all the NumPy integration CLIF needs, and it has to come from the user.



