AutoCXX: Safely call C++ from Rust with auto generated bindings

adonese · on Aug 22, 2020

Probably a follow up on this one[0]. Also related to cxx's[1] (mentioned in chromium's original post[2])

[0]: https://news.ycombinator.com/item?id=24211691 [1]: https://github.com/dtolnay/cxx [2]: https://www.chromium.org/Home/chromium-security/memory-safet...

kibwen · on Aug 22, 2020

These libraries must contain a list of types which they consider "mostly equivalent" between Rust and C++ in order to facilitate the bridge, e.g. Rust's `Box` and C++'s `unique_ptr`. I would be interested in reading the exhaustive list of the types that are considered equivalent; surely there must be subtle semantic differences to account for.

steveklabnik · on Aug 22, 2020

https://github.com/dtolnay/cxx#builtin-types

AprilArcus · on Aug 22, 2020

Result<T> -> throw/catch seems surprising and hard to walk back later. Why not use std::expected or a work-alike, and let the caller decide whether to throw an exception?

MaulingMonkey · on Aug 23, 2020

Indeed. I'd expect panics -> throw/catch -> panics... - since those both unwind the stack until they reach a handler by default - but Result -> throw/catch? No way. Making a work-alike and calling it rust::Result would be much more appropriate IMO.

Even setting aside personal taste, some gamedev platforms - even in $(CURRENTYEAR) - still default to C++ exceptions being disabled. And may throw linker errors if you try to enable exceptions, while linking any closed source third party libraries that were built with default exceptionless build settings. I've rewritten my share of exception-based error handling - just to sanely port across platforms - as a result. It's just as well - I've fixed enough bugs where exceptions propigate across a C ABI to consider them UB-bait.

There's a reason Rust lets you configure panic="abort", and a reason it gives you Result s without magic unwinding semantics, and a reason why a lot of C++ codebases have ended up with some kind of custom Result-alike.

masklinn · on Aug 22, 2020

cxx is bidirectional so probably because the mapping needs to work both ways, and you will eventually have C++ -> Rust -> C++ where the "inner" C++ throwing would be expected to cause the "outer" C++ to catch?

Not to mention as far as I know std::expected does not actually exist. It was first proposed 7 years ago but is still a proposal, now hoping for inclusion in C++23 (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p032...)

AprilArcus · on Aug 22, 2020

> you will eventually have C++ -> Rust -> C++ where the "inner" C++ throwing would be expected to cause the "outer" C++ to catch?

Look I hate to be that girl with the strong opinions on the orange website, but "goto is not an appropriate feature for an ffi" is a hill I'm ready to die on.

_dh54 · on Aug 22, 2020

Exceptions are not “goto.” Exceptions are an example of “structured control flow” while “goto” certainly is not. Throwing an exception is an example of a “non local exit” which Rust also implements (in terms of “Return”).

Also the exception does not leak into rust-space. Like it or not, any serious c++ ffi must accommodate C++ exceptions, since that is a standard feature (not to mention idiomatic error signaling mechanism) in C++.

arcticbull · on Aug 22, 2020

Exceptions are worse than go-to, they're a comes-from statement. At least with a go-to you know where you're going when you look at it -- with exceptions you could be coming from literally anywhere. All sorts of different control flows all filter into this one weird spot.

Exceptions are probably the worst thing in computer science.

This is a hill I'm happy to join AprilArcus on.

_dh54 · on Aug 23, 2020

> exceptions you could be coming from literally anywhere.

False. exceptions can only come from throw statements, which are both static and finite within every codebase.

> Exceptions are probably the worst thing in computer science.

Opinion, irrelevant.

arcticbull · on Aug 23, 2020

> False. exceptions can only come from throw statements, which are both static and finite within every codebase.

Not between libraries and library customers. From the perspective of the library author, the set is dynamic and uncountable, and similarly from the perspective of the library customer, any exceptions thrown are dynamic and uncountable.

> Opinion, irrelevant.

It's almost like opinions come from some basis worth exploring instead of outright rejecting without consideration. In my (obviously irrelevant) opinion, you may grow as a person if you learn to explore the opinions of others :)

monadic2 · on Aug 23, 2020

That's ridiculous, exceptions always come from further down the stack and can whose throw location can be precisely identified with basic debug symbols.

arcticbull · on Aug 23, 2020

Indeed, and that is orders of magnitude harder to reason about than something that can only come from a specific invocation.

AprilArcus · on Aug 22, 2020

> Throwing an exception is an example of a “non local exit” which Rust also implements (in terms of “Return”).

They are not isomorphic. Return always hands control back to the caller; you can emerge from an exception inside any arbitrarily higher scope, and the proliferation of weird edge cases (what happens when I throw an uncaught exception in a constructor or destructor, the exception safety guarantee hierarchy) are good evidence for why this behavior is too complicated for its own good.

> Like it or not, any serious c++ ffi must accommodate C++ exceptions

Obviously this is the case and I'm not trying to say otherwise. I already said so upthread, but it seems to me that the least complicated way to do so is to wrap wrap foreign C++ function return values in a Result<T> (unless they can be statically proven not to throw), and report errors to C++ through some kind of Either-ish box. Then the C++ caller can decide whether they want to handle the error then and there, or throw an "idiomatic" exception.

_dh54 · on Aug 22, 2020

>> Throwing an exception is an example of a “non local exit” which Rust also implements (in terms of “Return”).

> They are not isomorphic.

It was never claimed that “return” and “throw” were isomorphic. Only that they are both examples of non-local exits.

> good evidence for why this behavior is too complicated for its own good.

That’s a nice opinion but I wouldn’t consider that evidence except under the loosest definitions of the word. By the same reasoning I could claim that being able to invoke “return” at any arbitrary point in the function is evidence that return is a complicated feature.

Just like “return” exceptions in C++ have a statically defined set of areas they can branch to and the invocation of throw is well defined w.r.t. to object cleanup. Admittedly those landing sites are more numerous than “return” but not different in their more defining properties.

arcticbull · on Aug 22, 2020

> I could claim that being able to invoke “return” at any arbitrary point in the function is evidence that return is a complicated feature.

Not really, because a return statement (like a go-to) only has one place to go. An exception has an unlimited number of places to come from. That put it into its own special circle of hell.

_dh54 · on Aug 23, 2020

Your metric of what makes a language feature too complicated is just as arbitrary as mine.

> An exception has an unlimited number of places to come from

False. An exception can only come from a throw statement, which is both lexical/statically defined and finite in number within every codebase.

arcticbull · on Aug 23, 2020

> False. An exception can only come from a throw statement...

Obviously.

> ...which is both lexical/statically defined and finite in number within every codebase.

Not in every case. Any time you have a library that calls into client code the set is dynamic and uncountable from the perspective of the library author.

MaulingMonkey · on Aug 23, 2020

Tons of C++ FFI is really C FFI (with maybe some lightweight wrappers) and completely ignores exceptions, leaving them to the end user to manually mangle.

On the flipside, any serious gamedev C++ FFI must accomodate exceptions-disabled environments, since that's common in gamedev codebases - and even the default settings on at least one relatively modern gamedev platform.

_dh54 · on Aug 23, 2020

> On the flipside, any serious gamedev C++ FFI must accomodate exceptions-disabled environments

Any C++ ffi that accommodates exception also accommodates exception-disabled environments by virtue of accommodating function calls. Certainly AutoCXX accommodates codebases that use exceptions as well as those that disable exceptions by convention.

MaulingMonkey · on Aug 23, 2020

> Certainly AutoCXX accommodates codebases that [...] disable exceptions by convention.

From the documentation, it sounds like that's only true if you eschew Result, the absolute backbone of not-automatically-unwinding Rust error handling.

You could presumably, technically, roll your own Rust struct/enum - "NonStdResult" - and expose that to C++ however cxx deigns to expose Rust enums to C++. But that seems... very awkward at best, to completely counter to the entire point of AutoCXX - in trying to eliminate the need to write a bunch of special case conversion boilerplate - at worst. If you're suggesting that as a superior design choice, we're seriously going to have to agree to disagree.

monadic2 · on Aug 23, 2020

How is goto related to dynamic stack unwinding?

steveklabnik · on Aug 22, 2020

I am not sure, as I'm not the author. I can see it both ways, but I think I personally lean towards what you're saying.

singron · on Aug 22, 2020

Being a Googler project, I expected this to be a version of cxx that used StatusOr and -fno-exceptions.

joshuamorton · on Aug 22, 2020

autoCXX is Google-y, but cxx doesn't appear to be, and enforcing noexcept in a general use tool seems ill-advised.

fluffy87 · on Aug 22, 2020

I wonder how it can auto generate safe bindings for C++.

To do that, it would essentially need to literally prove that the C++ code is thread and memory safe, which is an open research problem at Best, and probably impossible since it requires solving the halting problem.

If it can do that, then the actual binding generation would be the most uninteresting part of this work.

steveklabnik · on Aug 22, 2020

See https://github.com/dtolnay/cxx/issues/1 for some of this debate.

fluffything · on Aug 22, 2020

Ralf Jung's views there are perfectly reasonable.

    unsafe  { ... }

is a soundness proof, it reads "the code within this block is sound".

People lying about having proved safety in their crates accidentally is bad. People doing this intentionally is extremely bad and I wish there was a way to automatically reject being able to depend on intentionally unsound crates in crates.io (or ban their authors are their crates from pushing anything to crates.io since they cannot be trusted).

Code generators that automatically generate thousands of broken soundness proof en masse and by design are IMO the ultimate evil. They completely defeat Rust's purpose. It makes absolutely no sense to interface Rust and C++ in this way, and the people doing this would be better off by just sticking to C++ instead of trying to make safe Rust unsound.

If safe Rust cannot be trusted, Rust value proposition is _dead_ (you cannot hack without feat anymore, refactor without fear, avoid segfaults, ...). This people are writing tools to automatically generate massive amounts of broken Rust code. If crates.io does not protect Rust users from them, we need a different crate repository that does.

Rusky · on Aug 22, 2020

There is no conflict between (auto)cxx and the view that `unsafe` blocks assert the soundness of their contents. Using cxx does not, in fact, remove any `unsafe` blocks- thus it is not "lying," nor is it "the ultimate evil." (Such hyperbole makes productive discussion in this space rather difficult...)

Consider the (completely and utterly standard) practice of writing a safe Rust library wrapping a C++ library. To be sound, this crate has two obligations: First, its `extern {}` function signatures must match the C++ library's. Second, its public API must enforce any additional soundness invariants from the C++ library.

Ensuring that signatures match is tedious and entirely mechanical- automating it increases confidence that a program is sound! Crucially, there is more to signature matching than mere arity and type layout- C++ vocabulary types express ownership information that bindgen alone does not capture!

Cxx isolates all the auditing work related to C++ vocabulary types, like `std::vector` or `std::unique_ptr`. A single audit of cxx covers this aspect of all cxx-using crates. This is an enormous win if your goal is to integrate Rust with a large C++ codebase like Gecko or Chromium!

The kernel of truth to your argument lies only in that second obligation. As Ralf concludes in that same issue thread, the problem of generated `unsafe` blocks is not new to cxx, but a near-universal aspect of Rust/C++ bindings. The solution is not, and never was, to give up and hand-write a bunch of `unsafe` blocks.

Instead, the solution is simply to ensure that auditors can easily locate generated `unsafe` blocks and the soundness assertions they represent. A typical solution here (also mentioned in that thread) is to add "unsafe" to the macro name or input. (Dtolnay already plans to do this for cxx.) Personally, I would argue that this is not even cxx's failing, but a tooling issue- if we're going to use macros to generate bindings, then our auditing tools should see through those macros.

fluffything · on Aug 24, 2020

> Using cxx does not, in fact, remove any `unsafe` blocks- thus it is not "lying," nor is it "the ultimate evil." (Such hyperbole makes productive discussion in this space rather difficult...)

If I can write an unsound Rust program using autocxx or cxx safe Rust APIs, then these APIs are unsound.

Whether these APIs are macros or function calls does not matter. There is no distinction at the API level about whether incorrect unsafe code is expanded, or called into. The API is safe, and for these crates, the currently-safe API introduces undefined behavior.

cxx will be fixed to require unsafe, but AFAICT autocxx usecase is to satisfy Chromiums requirements which require completely avoiding unsafe.

Rusky · on Aug 24, 2020

There is no such thing as a "safe macro" or an "unsafe macro," so you can't make the usual argument there. Macros (and other forms of code generation) all exist before and outside the "safety system," and so must be inspected in an audit regardless.

Putting "unsafe" in the name or input of cxx/autocxx/etc. is a great tweak that improves local readability, but it's not a fundamental change. The actual generated wrapper functions would still be safe, after all. If you're generating `unsafe` blocks, good tooling will point them out to you just the same as if they were hand-written. Nobody is trying to sweep that responsibility under the rug- not cxx, not Chromium, not autocxx.

That is, Chromium's requirement is notably not "completely avoiding unsafe." (This is another piece of hyperbole that derails productive discussion.) Their actual requirement is perfectly reasonable: restrict manual per-function unsafe to exceptional scenarios that deserve extra attention, above and beyond the usual check that a function is always safe to call with arguments that match its type.

Updating autocxx to require surface-level `unsafe` in the same way as cxx does not conflict with this requirement. However, it's also relevant that autocxx is a) rather new, and b) not (yet?) an official Google project: https://github.com/google/autocxx#license-and-usage-notes. It would be unwise to blindly infer any sort of Chromium policy from it at this point, especially when they have just published a rather detailed document that contradicts your inference.

Overall, the problem we should be trying to solve here is not simply "how do we prevent people from generating unsound code," but "how can we enable people to correctly bind to large C++ APIs?"

fluffything · on Aug 24, 2020

> There is no such thing as a "safe macro" or an "unsafe macro," so you can't make the usual argument there

From all the soundness issues that have been filled on macros that were not safe in the past, the impression I have is that all exported Rust macros must be safe.

> but "how can we enable people to correctly bind to large C++ APIs?"

This is a solved problem: rust-bindgen, it correctly generates unsafe Rust FFI wrappers for all C++ code.

What the Chromium devs actually want is what I pointed out in my root comment: to automatically generate correct soundness proofs for their C++ code / APIs, so that they can automatically generate safe Rust FFI wrappers for them.

This is an unsolved problem, which is probably impossible to solve.

So I think this is an absurd requirement from the part of the Chromium project, that deserves the absurd solution proposed by the `autocxx` crate, which is to just assume that these proofs exist. The disclaimer "You can only use `autocxx` on those APIs for which those proofs actually exist" isn't very reassuring TBH. They don't appear interested in finding out whether those proofs actually exist, so what's the point really.

They will just start using Rust, continue getting CVEs because "oh, I guess we shouldn't have blidnly wrapped those thousands of APIs", and conclude that Rust isn't more secure than C++.

Rusky · on Aug 24, 2020

"Just use bindgen" is a cop-out, not a solution. It is technically correct, merely by marking all APIs unsafe. But as the Chromium document describes, this introduces a lot of noise that can mask the more involved proofs.

What Chromium wants is, again, not a way to generate soundness proofs, but a way to consolidate large numbers of similar proofs into a single place (the invocation of the binding macro), so that the remaining proofs can stand out.

Please, do us all a favor and stop misattributing absurdities to Chromium and autocxx. This is a useful part of the design space, ill-served by bindgen alone. Chromium is not the first project to hit it, nor will it be the last. For example, here is a Servo/Gecko dev's take on the subject: https://www.reddit.com/r/rust/comments/ielvxu/the_cxx_debate...

fluffything · on Aug 24, 2020

> "Just use bindgen" is a cop-out, not a solution. It is technically correct, merely by marking all APIs unsafe. But as the Chromium document describes, this introduces a lot of noise that can mask the more involved proofs.

Chromium does not want to use `unsafe` on every call. The solution is simple, write a safe wrapper, that uses `unsafe` once.

> but a way to consolidate large numbers of similar proofs into a single place

That wasn't my read of their comment. If that's what they want, then the patched `cxx` crate which requires `unsafe` would give them exactly this.

The reddit user mentions:

> The key issue that people get worked up about, is that all C++ is unsafe,

This is not what we are talking about here. Lot of C++ code is safe, and writing a safe wrapper over C++ that's obviously safe is a one liner. What's problematic is doing so without checking _and_ without writing unsafe, by assuming that all C++ code is safe, which is what a library that asks you for a path to a headerfile and that will automatically generate safe wrappers in safe rust without asking even if the header file silently changes seems to encourage.

I also worked on one of the largest FFI Rust projects, and what one does is automatically generate thousands of unsafe C bindings, and have a crate exposing safe wrappers. Every time you need to call one FFI you either use the safe wrapper, or you add one if there isn't one. Very often, safe C wrappers weren't trivial, because what the FFI bindings were doing was inherently unsafe, and the amount of abstraction required to make that safe was prohibitive (and many many projects have run into this, Vulkan, wayland, ...). Then you either spend a lot of time into a safe abstraction, or you just use the unsafe API. Automatically generating safe wrappers instead just sounds like a bad idea, one would just be pushing all those issues onto safe Rust, which is not where they belong.

This is exactly what the first reply to that reddit user mentions, and here is your reddit user's reply to that, in which they explain that they were mostly referring to FFI when it comes to splicing Rust into C++: https://www.reddit.com/r/rust/comments/ielvxu/the_cxx_debate...

Calling Rust from C++ is a _very_ different use case from just calling C++ from Rust, which is what the Chromium devs mention is their primary concern:

> we are primarily concerned with the ability for new Rust code to call into existing C++ code

It's so different that one can in fact easily generate "safe" C++ bindings to safe Rust.

Rusky · on Aug 24, 2020

> The solution is simple, write a safe wrapper, that uses `unsafe` once.

This is the thing I'm calling a cop-out. Chromium wants to avoid the error-prone boilerplate of writing out all those safe wrappers, by asserting in a single place that a whole collection of C++ functions are always safe to call with any arguments that match their (generated Rust-side) type.

> That wasn't my read of their comment. If that's what they want, then the patched `cxx` crate which requires `unsafe` would give them exactly this.

Yes, this is what I have been telling you. The Chromium document also says this at the end of point #1: "This particular property is satisfied by dtolnay’s marvellous cxx library already."

> > The key issue that people get worked up about, is that all C++ is unsafe,

>

> This is not what we are talking about here. Lot of C++ code is safe, and writing a safe wrapper over C++ that's obviously safe is a one liner.

We need to be careful with terminology here. All C++ code is unsafe in the sense that its soundness is unchecked by the compiler. Some C++ is also unsafe in the sense of an `unsafe fn`, placing additional expectations on its callers which are not captured by the type signature.

I'm repeating myself now, but it is absolutely possible to take a C++ header file, review the functions it declares, and decide "yes, the wrappers generated by cxx for these functions would all be sound." This is the case where autocxx helps- once you have such a header file, generating a bunch of safe wrappers is much less error prone than writing and maintaining them by hand.

You are correct that Vulkan/Wayland/etc. are usually not conducive to this. But that is why I linked that Reddit comment- the Chromium devs invoking cxx also control the APIs they're feeding it! The reply you linked is not talking about calling Rust from C++, but about this approach of calling C++ which you also own from Rust.

Once you control both sides of the language boundary, changes to the header are a totally different matter. All the thought that goes into preserving those functions' contracts, or updating their callers when they change, must "simply" take the Rust code (and its generated wrappers) into account as well. This is what "all C++ code is unsafe" means- regardless of whether or how Rust is involved, changes to C++ functions must consider their impact on soundness.

fluffything · on Aug 24, 2020

I think we pretty much agree on everything then.

My only complain about autocxx remains that it does not require `unsafe`, but if they adopt the same fix that the `cxx` crate wants to adopt, that complain is gone. I hope the Chromium devs would be "ok" with such a small amount of unsafe code that condenses the proofs of all APIs being wrapped.

We kind of tossed aside the discussion of unsafe vs safe macros, but I'd understand some people see that differently (macros expand Rust code, but IMO they are not that different from generics, and we do have safe and unsafe generics; right now, AFAICT, we only have safe macros).

Rusky · on Aug 24, 2020

Yeah, I do think it might be interesting (if somewhat of a backwards compatibility problem) to introduce the concept of an "unsafe macro." Along with https://github.com/rust-lang/rfcs/pull/2585 it would really emphasize the proof obligation-vs-discharge model of `unsafe`.

fluffything · on Aug 24, 2020

I think that would be worth doing, but more than the backward compat problem, the main complication I see would be in the syntax/grammar/macro expansion.

Like, say we wanted to make cxx or autocxx macros unsafe. We'd need to allow at item scope writing this:

    mod foo {
        unsafe { my_unsafe_macro!() }
    }

and we'd "somehow" have to check that unsafe macros are only expanded within unsafe "scopes", rejecting the program otherwise (can't be a type error because we can't do type checking before macro expansion). Also, these unsafe scopes would need to export all the items, etc.

I don't know. Seems like a lot of contortion for something that could be solved with just a rename like other have proposed, e.g., unsafe_autocxx_include!(...);

Might be worth opening an internal threads about this. Maybe Centril could shed more like on how hard this would actually be.

Most (all?) macros that I regularly use are sound (the derives, println!, debug!, offset_of!, pin_mut!...). So it might also be worth looking into how common of a problem this is. I've seen soundness bugs in macros before (offset_of!), but these were always bugs. This is the first time I recall a macro that's unsound by design.

Matthias247 · on Aug 22, 2020

So I read about things being the ultimate evil. But what is your counter proposal to make the world a better place?

Rewrite everything in Rust in one go? Not going to happen.

The last teams I worked on all maintained a giant pile of C and C++ code. Which obviously sometimes is buggy, and sometimes might even exhibit memory safety issues. I’m looking for every opportunity to make things better and using Rust for to improve safety and reliability. However due to realities this isn’t going to happen at once.

So now we have 2 choices:

Option 1: use bindings and replace things step by step, knowing that we have still unsafe code (but less so!)

Option 2: Stick with C and C++

I think everyone agrees that option 1 is better than 2. But reading about it as „the ultimate evil“ makes me sound very uncomfortable.

fluffything · on Aug 24, 2020

> Rewrite everything in Rust in one go?

No? Just do what every good Rust programmer does today instead?

If I have a C++ function like this:

   // If a == nullptr the behavior is undefined 
   void foo(int* a);

the right way to provide a safe Rust API for it is to write:

   mod ffi { extern "C" { fn foo(a: *mut c_int); } }
   fn foo(a: *mut c_int) { 
      assert!(!a.is_null());
      unsafe { ffi::foo(a) }
   }

The crates being discussed here generate:

   mod ffi { extern "C" { fn foo(a: *mut c_int); } }
   fn foo(a: *mut c_int) { 
      unsafe { ffi::foo(a) }
   }

instead (notice the missing assert), which is broken Rust code according to the Rust spec, because now `foo` will introduce undefined behavior into safe Rust every time its safely called with a null pointer. That is, doing this makes _all_ Rust safe code "unsafe", making the unsafe Rust keyword essentially meaningless, and negating any advantage that Rust has over C++ (memory safety, thread safety, lacks of segfaults, refactoring without introducing errors, etc.). So you end up with 2 languages in your codebase, + a lot of glue boilerplate, for very little win.

If this is what you want, you are better off just sticking with C++ instead.

What Firefox, Servo, and any other good C++ project that cannot/does not want to review all C++ APIs does is to just write:

   extern "C" { fn foo(a: *mut c_int); }

instead. That's less code, and it is correct code. When a programmer needs to call foo, they need to write "unsafe { foo(a) }", and that often leads to them actually checking "foo"'s API docs, and explaining why the call is safe (maybe the assert is not needed because a cannot be null. The burden of doing the work is on the caller.

Firefox and Servo heavily use `rust-bindgen`, which will automatically generate all these FFI wrappers with a correct ABIs for you. The main difference is that rust-bindgen does not attempt to falsely convey the idea that these APIs are safe to call.

The_rationalist · on Aug 22, 2020

I think everyone agrees that option 1 is better than 2. What? It seems pretty reasonable to be against it. Code clarity is the number one priority and the ffi are going to add cognitive overhead / clutter / boilerplate which is harmful.

steveklabnik · on Aug 22, 2020

I actually disagree with you completely, but I understand where it’s coming from.

I think this debate is really, really interesting.

dtolnay · on Aug 22, 2020

I've been planning on the cxx side to update my attribute macro to require `unsafe mod ffi {...}` or `unsafe extern "C" {...}` to signal presence of a proof obligation. I think that resolves all concerns from the January discussion, including viewpoints I don't necessarily agree with, without sacrificing any ease of use.

Just haven't found time to make the compiler PR to allow exposing that syntax to macros yet, but I will.

fluffything · on Aug 22, 2020

That would completely resolve my concern about cxx.

(i still think its a soundness bug in Rust that `extern { }` does not require unsafe since one can use it to trigger UB in safe Rust code)

tick_tock_tick · on Aug 22, 2020

What's your thoughts on #[no_mangle]? https://github.com/rust-lang/rust/issues/28179

Frankly Rust values being a usable language too much to be 100% sound.

fluffything · on Aug 24, 2020

#[no_mangle], like many other Rust features, is unsound; that's a fact and not an opinion, there is a bug in the tracker open for it and labelled I-unsound (accepted as an unsound bug). It is a low priority issue, but will be fixed eventually.

> Frankly Rust values being a usable language too much to be 100% sound.

Rust has a really good track record of identifying, prioritizing, and fixing soundness bugs (e.g. it took 4 years of work to fix one floating-point soundness bug!). Many people continuously work on this at the academic, toolchain, and backend (llvm, crane lift) levels. There is also a lot of people continuously working on making sure that new language features like async/await, const generics, specialization, GATs, ... are sound.

TBH i'm surprised to learn that not all Rust core members consider soundness to be a Rust core value. Maybe it isn't a Rust core value? (it is for me; without it, Rust makes no sense as a language to me)

Jweb_Guru · on Aug 22, 2020

I don't understand why you and others think that bringing up existing and irritating soundness holes in Rust is an argument for introducing new ones. Whether or not you think CXX exposes a sound API, it's a bad argument. This one is literally a bug that we can hopefully fix at some point, just like the floating point casts one was finally fixed.

steveklabnik · on Aug 22, 2020

I think that’s a fantastic compromise. I am 100% in agreement with you on this topic, FWIW.

fluffything · on Aug 22, 2020

How is that a compromise?

The unsafe in `unsafe mod ffi { ... }` is literally the proof that all APIs exposed in the block are sound to call from safe Rust.

It would only need to go hand in hand with a comment explaining why each API in the block is sound to call safely, and it allows users to not list unsound APIs in there, but wrap them manually when needed. Along with code that checks the C++ lib version, etc. to make sure that the proof are kept in sync with each version of the lib.

That's completely different from not requiring any unsafe in the Rust side, and doing this en masse via bindgen.

cornstalks · on Aug 22, 2020

> How is that a compromise?

How is it not a compromise? Party A wanted (and implemented) something that didn't require `unsafe`. Party B wanted `unsafe` to be used. After much discussion, Party A concedes with allowing people to put `unsafe` in some of the code, and Party B concedes that putting it there is sufficient instead of requiring it to be strewn all over the place.

Sounds like a (reasonable) compromise to me.

simias · on Aug 22, 2020

Can you explain why having to put "unsafe" before FFI calls is considered a problem? That's the part I don't get, isn't the whole point of using Rust that you want sound, memory-safe code by default?

If you don't want to have unsafe everywhere it seems to me that the reasonable path (taken by many Rust libraries) is simply to wrap the raw C/C++ interface around a Rust interface that enforces the safety invariants. That's why you can write safe GTK or OpenGL code in Rust, using Rust libraries that expose an actually sound interface.

I mean, FFI code is tagged as unsafe because it is unsafe to call. I don't see how "but I don't like having to write unsafe everywhere in my FFI code :(" is in any way a reasonable technical argument here.

cornstalks · on Aug 22, 2020

> Can you explain why having to put "unsafe" before FFI calls is considered a problem?

I intentionally avoided taking a side in this debate. My response was focused on the claim that there was no compromise between the parties, when I think it's pretty clear cut that there is a compromise.

Personally, I'm sympathetic to both parties. In my own personal FFI-binding library I chose to let the programmer decide per-method whether it's safe or unsafe.

simias · on Aug 23, 2020

I'm all for compromise but first I need to understand the pros and cons of both sides. For me here so far it's pretty clear cut: there's a right way and a wrong way. The compromise is just a slightly less wrong way. So no I don't accept this compromise in this situation.

drran · on Aug 22, 2020

Some coders want their code to be safe. Some coders want their code to look safe. Feel the difference.

fluffything · on Aug 22, 2020

Rust requires all safe Rust code to not have undefined behavior.

Party A wanting safe Rust to have undefined behavior was wrong. Party A now adds unsafe to their API, so that undefined behavior only happens in unsafe Rust.

I don't see the compromise anywhere. Party B told party A that they were wrong, and party A acknowledge it and fixed their crate.

cornstalks · on Aug 22, 2020

You either haven't looked at how cxx works, or you're intentionally misrepresenting it.

cxx generates code that uses `unsafe` blocks/functions. It then generates safe wrappers around those, which are what it exposes to users. It's no different then someone doing this:

  pub fn safe_fn() {
    extern { fn unsafe_fn(); }
    unsafe { unsafe_fn(); }
  }

cxx just uses macros to generate that. You're welcome to have a different opinion on whether or not the user must pass an `unsafe` token to the macro. But that has no bearing on the code generated by the macro.

Statements like "Party A wanting safe Rust to have undefined behavior was wrong" are just straight up incorrect.

fluffything · on Aug 22, 2020

I know how cxx works, and I am not misrepresenting anything.

cxx is currently unsound, and the author (dtolnay) has chimed in above with a fix they are considering that fixes it (https://news.ycombinator.com/item?id=24244121)

The API of the cxx crate is safe, and it can cause UB, therefore it is unsound.

It doesn't matter that the cxx crate API is a macro. Yes, this macro expands to unsafe code like you mention, but the problem is that this unsafe code is often "broken" (unsound).

Ar-Curunir · on Aug 22, 2020

Unsafe != unsound

fluffything · on Aug 24, 2020

I know.

The Rust spec defines "unsound" as "introducing undefined behavior in safe Rust". The API of the cxx crate allows a safe Rust program to have undefined behavior, and it is therefore unsound.

Jweb_Guru · on Aug 22, 2020

Someone doing what you just described is wrong in the context of C++, at least without manual review.

tick_tock_tick · on Aug 22, 2020

Rust requires ALL CODE to not to have undefined behavior. Unsafe doesn't mean you can do whatever you want it just means that you are guaranteeing to the compiler that this block is safe.

eximius · on Aug 22, 2020

> The unsafe in `unsafe mod ffi { ... }` is literally the proof that all APIs exposed in the block are sound to call from safe Rust.

I think there might be a misunderstanding here. I interpreted the `unsafe mod ffi { ... }` to be like `unsafe fn foo()`, declaring the module as unsafe, not an unsafe block where we're telling the compiler we will maintain the invariants ourselves.

It is somewhat unfortunate both the proof obligation and proof 'declaration' use the same token.

fluffything · on Aug 24, 2020

> It is somewhat unfortunate both the proof obligation and proof 'declaration' use the same token.

There have been some RFCs open to improve this situation (e.g. unsafe blocks in unsafe functions comes to mind).

Jweb_Guru · on Aug 22, 2020

I don't actually understand your position on this argument at all... maybe you can clarify? I guess it doesn't really matter since it was resolved in a way that satisfies everyone, it just makes me think I'm missing something. If I grep a dependency for `unsafe`, don't find any, but I have UB in my application from that library, I am going to be very, very unhappy.

steveklabnik · on Aug 22, 2020

I wrote it up as a blog post: https://steveklabnik.com/blog/the-cxx-debate

TL;DR: I think this is just regular old composition of Rust features, and that using the library itself is the proof obligation. I can see why others don't like it. I think dtolnay has come to a good compromise.

Jweb_Guru · on Aug 22, 2020

All of the examples in your blog post are examples of code that is obviously safe. I think extending that analogy to something as large as a nontrivial C++ library (one you'd need CXX to integrate with) is such a huge stretch that it kind of feels disingenuous as an argument. However, your current stated justification ("using the library is equivalent to unsafe") is at least a little more understandable to me; nonetheless, the standard in Rust is syntactically writing `unsafe` and I'm glad that dtolnay found a compromise that worked for everyone.

(I will point out, however, that I still seriously doubt that every single one of these autogenerated functions is safe, and will probably avoid libraries that use CXX for that reason. But at least now someone who hasn't heard of CXX can come to that conclusion independently :)).

steveklabnik · on Aug 22, 2020

Well, I do that partially because I think that it's the core of the issue here. Saying "hey I think this makes it too easy to make mistakes" is a very different argument than "obviously this library is inherently unsound."

Jweb_Guru · on Aug 22, 2020

Well, both can be true (and I am fairly confident that this was the case previously, even though the exact standards for macro unsafety have not been formally established--in the past, for example when adding #[may_dangle] and other unsafe attributes, a way was found to add `unsafe` to the syntax even though Rust lacked a way to do this directly for the attribute in question). But I agree these are different arguments that shouldn't be confused for one another.

simias · on Aug 22, 2020

I'm really surprised to read this, it's so obvious to me that what fluffything says is correct that I can't quite imagine what the counter-argument looks like.

I kinda feel like the original sin in this discussion is that "unsafe {}" should really have been called "safe {}" or "sound {}" since it's really the programmer telling the compiler "hey, this block is safe to run as-is, I've checked, trust me on that". An automatic C++ binding library is in no position to make this assertion.

It's weird to quote the Holy Scripture back at you of all people but here it goes (emphasis mine):

https://doc.rust-lang.org/beta/reference/behavior-considered...

> Rust code is incorrect if it exhibits any of the behaviors in the following list. This includes code within unsafe blocks and unsafe functions. unsafe only means that avoiding undefined behavior is on the programmer; it does not change anything about the fact that Rust programs must never cause undefined behavior.

> It is the programmer's responsibility when writing unsafe code to ensure that any safe code interacting with the unsafe code cannot trigger these behaviors. unsafe code that satisfies this property for any safe client is called sound; if unsafe code can be misused by safe code to exhibit undefined behavior, it is unsound.

This library generates safe bindings to unsafe FFI calls without actually checking that said code is safe to call that way. It's therefore totally unsound until proven otherwise.

So what's the argument here? That the C++ code might be safe to call that way if we're lucky? I mean, sure, but how is that reasonable or practical? I definitely don't want a FFI library to tag functions as safe for me before I had the time to validate it, it seems to easy to let something bad slip unnoticed. If the interface is unsafe it means that I need to review it and manually mark it as safe once I've taken due diligence and I can vouch that the interface is actually sound. And I definitely don't want a third-party crate to expose a safe-but-unsound Rust interface with a big disclaimer that "basically it's just raw calls to C++, make sure you use it right otherwise it's segfault time!".

Again, it's mind boggling to me that this is controversial. It's not just bullshit language lawyering either, I think it's a terrible idea both in theory and in practice. I can already see the Rust crates leaking unsound """safe""" interfaces left and right because they're just a thin automatically-generated wrapper around C++ header files.

steveklabnik · on Aug 22, 2020

I wrote up a blog post about it, I linked to it in a sibling comment, I don't want to spam :)

> So what's the argument here? That the C++ code might be safe to call that way if we're lucky?

No, the argument is that we've checked that the C++ code is safe to call, just like any other FFI. We've reduced the boilerplate of doing so with a macro, just like any other boilerplate might be.

> I definitely don't want a FFI library to tag functions as safe for me before I had the time to validate it,

Do you validate every single unsafe code in every single library you use? Even the standard library? Every time?

> And I definitely don't want a third-party crate to expose a safe-but-unsound Rust interface with a big disclaimer that "basically it's just raw calls to C++, make sure you use it right otherwise it's segfault time!".

I do not either, but CXX doesn't change this in any meaningful way. This could still happen before.

fluffything · on Aug 24, 2020

> Do you validate every single unsafe code in every single library you use? Even the standard library? Every time?

We don't need to? Rust's main soundness theorem says that if a safe Rust API is sound, then safe Rust code using it is sound.

So the only thing we need to check is that the unsafe code that _we_ write is sound, and we 100% do this all the way for the standard library. In the standard library _every_ unsafe block has a comment explaining why the "user-provided soundness proof" that they represent is correct. We even have a linter that actually rejects PRs that add unsafe blocks to the standard library without such comments.

> Do you validate every single unsafe code in every single library you use? Even the standard library? Every time?

Soundness is IMO Rust's main feature and its main core value, and is more important than other Rust core values like, e.g., zero-overhead abstractions, which are also extremely important. Soundness is what enables "Hack without fear", "no segfaults", "refactor without fear", cargo ("large scale software without fear"), crates.io ("using other people's libraries without fear")... and it's what sets Rust apart from unsound-by-design languages like D, Nim, Zig, C, C++, etc.

I haven't read the book in a long time, but the documentation I do read (nomicon, spec, unsafe code guidelines, issue tracker, internals) makes it very clear that making sure that unsafe Rust code is correct is critical for the ecosystem and for users to benefit from the main advantages Rust has to offer.

tonyhb · on Aug 22, 2020

> Do you validate every single unsafe code in every single library you use? Even the standard library? Every time?

Nope. But if I use a rust library, the compiler will do that for me. If it's Rust. And not CXX.

I shouldn't have to in Rust. That's... the central premise here. Usage of unsafe code should say that it's unsafe.

tonyhb · on Aug 22, 2020

Agree with you and fluffything entirely. I can't believe this is an argument - it's basic logic. That the "rust community" can't see this is absolutely WILD.

1) C++ may be unsafe

2) The rust bindings are safe

3) But the rust bindings call potentially unsafe C++

Yet... it's controversial to want unsafe{}? unsafe{} is there SO THAT YOU KNOW that the rust compiler can't assert soundness.

Absolutely bananas.

Jweb_Guru · on Aug 22, 2020

It's not the case that "the Rust community" can't see this, and the fact that there's a huge amount of disagreement in this thread should be evidence of that. Ultimately, the problem is being resolved to everyone's satisfaction, which means that the discussion was productive, so I don't see the point in antagonizing people about whether they personally think it's an issue or not.

fluffything · on Aug 22, 2020

We already had this debate many times.

One of the last ones was the actix debacle, which was contained and accidental and ended up badly.

This is IMO infinitely worse.

steveklabnik · on Aug 22, 2020

I think they are categorically different debates.

fluffything · on Aug 22, 2020

To me they are categorically identical.

Rust code author knowingly exposes unsound safe Rust APIs.

The details are different. Actix did so in their own project and were willing to fix it (the debacle was mostly social, and in some sense even accidental), while here exposing unsound safe Rust APIs is the whole raison d'etre of the project.

AutoCXX doesn't even check that the underlying C++ code does not change. So even if an autogenerated safe Rust API happens to be "accidentally safe", that can change any time without Rust users knowing.

steveklabnik · on Aug 22, 2020

> Rust code author knowingly exposes unsound safe Rust APIs.

This is where we disagree. I 100% agree that knowingly exposing an unsound API to safe Rust is a bad thing. However, what I see is the user of cxx/autocxx saying "I am asserting that this API is sound."

> AutoCXX doesn't even check that the underlying C++ code does not change

This is no different than any other Rust code calling into C++ code. The author has to declare that they believe it is safe either way.

fluffything · on Aug 22, 2020

> However, what I see is the user of cxx/autocxx saying "I am asserting that this API is sound."

And since the API of this library allows this assertion to be performed in safe Rust code, when it fails, safe Rust code has UB, which according to the Rust language reference makes the API of this crate unsound.

Unsound safe Rust APIs are broken Rust code. The Rust language spec and toolchain make no guarantees about what the behavior of these APIs is.

So that's what I see here. Just another safe Rust API that is broken by design. Maybe with the twist that this crate is actually a factory to generates thousands of those broken safe Rust APIs in masse, which kind of makes it worse than your usual "broken API" bug (which is what these are, these are soundness bugs in Rust libraries).

steveklabnik · on Aug 22, 2020

> this assertion to be performed in safe Rust code,

This assertion is always performed in safe Rust code. That's how you turn unsafe code into safe code: you write "unsafe { }" in safe code. That this does this in the body of a macro is not material.

(I know you disagree and don't think we're going to get anywhere, and frankly, find your aggressiveness really offputting.)

fluffything · on Aug 22, 2020

> This assertion is always performed in safe Rust code.

Safe Rust code performs these assertions by using the "unsafe" keyword. This library API allows safe Rust to perform these assertions _without_ writing "unsafe".

That's the problem.

steveklabnik · on Aug 22, 2020

Do you think that macros should treat unsafe blocks in their body as a compiler error? Because that's what this boils down to.

If so, why? How is that materially different from calling a function with unsafe in its body? If not, why is this particular macro different?

fluffything · on Aug 22, 2020

> Do you think that macros should treat unsafe blocks in their body as a compiler error?

No. Exported macros (as opposed to private ones) are only sound if they do not allow safe Rust to introduce undefined behavior.

Whether these macros use "unsafe" internally or not is irrelevant. For example, `pin_mut!` uses unsafe internally, but it does not allow safe Rust calling it to introduce UB.

A macro that allows safe Rust to introduce undefined behavior is unsound. An example of such an unsound macro would be `offset_of!`.

---

That is, I do not differentiate Rust abstractions when it comes to soundness. Whether its a function, a trait method, a macro, or a function pointer, it does not matter. If an abstraction its safe to use it shall not introduce UB. If it does, it is an unsound abstraction.

steveklabnik · on Aug 22, 2020

> Exported macros (as opposed to private ones) are only sound if they do not allow safe Rust to introduce undefined behavior.

There is no debate about this.

You didn't really answer my question. I'm also not really gonna continue this argument.

fluffything · on Aug 22, 2020

> There is no debate about this.

Of course there is, for example, the offset_of! macro was unsound for a long time:

* https://internals.rust-lang.org/t/pre-rfc-add-a-new-offset-o...

* https://github.com/rust-lang/unsafe-code-guidelines/issues/1...

* https://github.com/rust-lang/unsafe-code-guidelines/issues/2...

* https://github.com/rust-lang/rust-memory-model/issues/35

> You didn't really answer my question.

I literally wrote "No.". To your follow up questions starting with "If so ..." i did not reply, because the assumption these questions were based on (that I would reply to the previous question with "yes") did not hold.

To expand on this. Rust does not have "unsafe macros", so all exported macros must be sound. Whether a macro is sound or not is orthogonal to whether the macro itself uses `unsafe` (and your original claim was whether I thought that macros containing unsafe should be rejected by the compiler, to which I replied "No.", `pin_mut!` is an example of a macro that uses `unsafe` and is sound).

atq2119 · on Aug 22, 2020

I'm observing this discussion as an outsider and can't help but wonder: maybe this could be solved by renaming the macro in question so that it contains "unsafe" in its name?

Whoever invokes the macro asserts that doing so is sound / cannot cause UB, and having "unsafe" in the macro name draws attention to that in the same way that the unsafe keyword itself does.

steveklabnik · on Aug 22, 2020

That is what in fact is being proposed by the author of this library (well the one this library depends on) elsewhere in this thread. I think it's a great idea.

Dylan16807 · on Aug 22, 2020

> However, what I see is the user of cxx/autocxx saying "I am asserting that this API is sound."

Right now that's very implicit, and I bet there are users of the library that don't think they're asserting that. They expect you to follow the documented rules about invariants.

Requiring people to write 'unsafe' at the import site makes that a lot clearer, but if it's mandatory then people are probably going to write it whether the API is sound or not.

I'd be more comfortable if you could either declare an API sound when you import it or write 'unsafe' everywhere you call it.

kibwen · on Aug 22, 2020

Even if someone uploads Rust bindings to C or C++ code to crates.io, as a library user you would still need to manually install the C/C++ library on your end, along with a compiler. I agree that we need to have strong social customs to enforce the meaning of safety where the Rust compiler cannot, but when it comes to FFI-induced unsafety I don't think we don't need to worry about library authors sneaking random unsafely-bound C or C++ libs into the dependencies of a crates.io lib without the user's realization.

est31 · on Aug 22, 2020

1. this only makes interfacing with C++ libraries safe, not with C or the C-ish subset of C++. So no C pointers without unsafe.

2. many C/C++ library bindings in Rust ship the C code with them, so no need to install the library, only the compiler.

fluffything · on Aug 24, 2020

> Even if someone uploads Rust bindings to C or C++ code to crates.io, as a library user you would still need to manually install the C/C++ library on your end, along with a compiler.

Not if the library comes with your system. One can use this macro to generate a more "ergonomic" "safe libc" crate, that just introduces UB in safe Rust.

geofft · on Aug 22, 2020

Do you think that a crate that allows you to write to files should be banned from crates.io, because you can write to /proc/self/mem without writing the word "unsafe"?

fluffything · on Aug 24, 2020

This is currently a soundness bug in libstd: https://github.com/rust-lang/rust/issues/32670

(note: the bug was proposed to review by the libs team, but this review never happened, some prominent community members claim there that this is indeed a soundness bug, just one that might not be worth fixing, or would be really hard to fix, but that should be tracked as such).

To me it looks like one should just link the bug to the unsafe code guidelines, so that they can clarify the definition of soundness (and the limits of the rust abstract machine in which soundness is meaningful) to take this into account.

See the linked "mitosis crate" issue at the end for a similar issue.

Jweb_Guru · on Aug 22, 2020

I actually find that hole in Rust really bad and would be happy if Rust's runtime closed this loophole. Not sure why people think "hah! You'd have to disable being able to write bytes to random memory addresses, too!" is an argument ad absurdum, the fact that you can do that is terrible and dangerous.

geofft · on Aug 22, 2020

How would you close that loophole, exactly? Would you prevent running a subprocess that writes to /proc/$pid/mem? Would you prevent running all subprocesses, or somehow only those that edit /proc/$pid/mem?

Like, I get that you want the loophole to not exist, but does there exist a possible world in which Rust closed the loophole and you would actually be happy about it?

The fact that you can write bytes to random memory addresses if you intentionally set out to do so is not what Rust's safety system is about. Rust's safety system is a tool for the programmer to build working programs. It is not an end goal in itself - hence why unsafe exists and why so much of the standard library and so many third party crates use safe. It's a tool you have to use correctly, sure, but it's only a tool. We don't avoid memory corruption and data races because they are inherently bad - we avoid them because they cause bad things to happen, like getting the wrong results out of a computation or allowing attackers to subvert program control flow.

If you want to 100% guarantee that a program does not bad things, you're looking for a sandbox. Compile your code to wasm, or run it in a VM, or something, and then even if you have memory corruption or data races, it can't escape the sandbox and read the rest of your files. Or if you want to 100% guarantee that your program has no internal bugs, write it in Coq or ATS or something. Rust has a specific goal, and it does not involve being either a sandbox or a theorem prover.

Jweb_Guru · on Aug 23, 2020

I mean, ideally Linux would remove the interface completely, as it's not even needed by the original application that proposed it and is a constant lurking potential security flaw. But moving to more realistic measures...

AFAIK, it is actually possible to prevent /proc/$pid/mem from being updated from an individual process, and Rust's runtime could in fact ensure this on startup, at least on Linux. I was also pretty sure (but would have to look this up--it's a really crappy interface) that in general other processes cannot invoke this interface under ordinary privileges, which I think is a fairly reasonable compromise in terms of safety.

Not really sure what the point of the rest of your post is. I'm quite confident I have never argued that libraries should not use unsafe, or that Rust should prevent all bugs. I do think that Rust's standard library should not expose tools that can lead directly to memory unsafety, in the same process, when only safe code is used, under some "reasonable" model of the environment in which it runs (which might include, for example, a working MMU, and process isolation boundaries). That seems like a pretty reasonable standard for a language calling itself safe, and even if we can't quite reach it, we should strive for it. I could even buy the argument that exposing an interface like /proc/$pid/mem/ is not "reasonable" and thus discount this unsafety (well, I could if it weren't the default on all Linux distributions I'm aware of).

To instead use cases like this as an argument for "pragmatically" giving up memory safety in situations where it isn't actually forced upon us by the underlying execution environment, isn't at all convincing to me.

geofft · on Aug 23, 2020

Well, /proc/$pid/mem is only one way of doing it. Here are a number of others:

- Use the debugger interface (ptrace on Linux, but most OSes have some form of debugger). Should Rust programs block themselves from being debugged, iTunes-style?

- By editing the binary files on disk that are mapped into the process space. Should Rust binaries refuse to run if they are writable by the current user?

- By loading a kernel module. Should Rust binaries drop CAP_SYS_MODULE from the bounding set, just in case someone wants to load kernel modules from a subprocess?

And so forth. Now, yes, these cases are getting increasingly silly, but I'm having trouble figuring out what the precise distinction is between "It is bad to release a crate that permits you, in safe code, to call a function that might be buggy" and "It is bad to release a crate that permits you, in safe code, to run SSH, because you could SSH to your own hypervisor and modify the current process's memory."

The best distinction I can think of is that a programmer who thinks they are writing code to do task X should not unintentionally also be writing code to cause problem Y. This is the fundamental problem with unsafe interfaces - it's not that, say, gets() doesn't work, it's that it's literally impossible to use gets() without also allowing whoever's providing input to overflow a buffer. If you set out to modify your own memory, well, you're intending to do so, so I don't think that the safe/unsafe distinction makes much sense there. I suspect an argument could be made that AutoCXX's bindings should be marked unsafe, but I think that argument needs to be more complicated than "producing any sort of automated safe bindings is equivalent to the halting problem".

> Rust's standard library should not expose tools that can lead directly to memory unsafety when only safe code is used, under some "reasonable" model of the environment in which it runs (which might include, for example, a working MMU, and process isolation boundaries)

I do agree with this, I think - I just think that "directly" / "reasonable" excludes stuff like /proc/self/mem.

Jweb_Guru · on Aug 24, 2020

The difference is that all of the other scenarios you mentioned don't work by default under standard execution privileges, and/or require additional setup beyond just executing the program. /proc/$pid/mem just works with any program under Linux. That's a pretty fundamental difference. Like I said, if it weren't the default behavior on Linux, I wouldn't care. But as it is I don't think you can meaningfully say the default Linux environment is "not a reasonable environment" for running secure software while I am happy to say this about, say, running with root privileges, or having a debugger inserted, or loading non-default kernel modules.

The idea that unsafe is supposed to only cover "deliberate" invocations is kind of silly, IMO. That is not what unsafe means. A function called "get_unchecked_index_this_requires_a_bounds_check_be_careful" still needs to be marked unsafe. But even if somehow this distinction could be made meaningful, this turns an accidental bug that lets you manipulate the file path into full blown arbitrary code execution.

Jweb_Guru · on Aug 25, 2020

Also, people have brought up examples of real C and C++ libraries that people use, that would be unsound if bound by AutoCXX as initially implemented. I'm really not sure what the argument is for AutoCXX being sound without requiring the user to write unsafe, but the argument against is not an abstract one as you are implying.

pjmlp · on Aug 22, 2020

Which is why going back to Burroughs, Modula variants, Oberon variants and even .NET, all offer the mechanism to say no linking of unsafe code.

nullc · on Aug 23, 2020

Has anyone suggested adding unsafe type bitmask?

E.g. "This code and its transitive dependencies has been carefully validated to be correct", "I had to do this to make it compile", and "This is a FFI that is inherently potentially unsafe".

Then the unsafty of any function would be defined as the logical OR of it and its dependencies... and you could filter what you're willing to use -- maybe with exceptions programmed for specific unsafe blocks, so you could e.g. except specific FFI interfaces without taking random ones people sneak in while they're adding malware to deep dependencies in crates.io to get silently embedded into your software.

throwfaraway12 · on Aug 22, 2020

crates.io isn't the ultimate ivory tower. It is an open space for anyone wanting to share and upload code.

It is your responsibility to vet your dependencies. For anything serious, you better setup a layered review process.

If you want a "vetted crates.io", then propose that. I’d be in favor. I certainly never liked crates.io to be the next NPM.

But telling people their crates are "evil" and trying to get them banned is breaking the CoC. The people uploading "broken" code, no matter how much they upload, aren’t the ones breaking it.

Ygg2 · on Aug 22, 2020

It assumes using your library is safe, same way you assume Rust std is safe, even though it uses unsafe in the background.

Basically, burden of proof of using unsafe correctly is on the programmer.

simias · on Aug 22, 2020

But there's no concept of safety, lifetime and borrows in C++ (there's aliasing, but the rules are different). There's a lot of code that one could write that's perfectly kosher in C++ but a no-no in Rust.

A function like memmove would be completely unsafe in Rust for instance but C++ could expose such an interface without having to do anything special.

Ygg2 · on Aug 22, 2020

So, Rust is to blame for C++ problems?

No. C++ programs are safe under certain conditions. If they aren't then they need to be changed in C++ or in Rust interfacing code.

Even if something is unsafe, it can still be used safely.

simias · on Aug 22, 2020

> C++ programs are safe under certain conditions.

I entirely agree with this. My point is that these conditions are not always (and often aren't) encoded in the C++ syntax itself, and are instead part of the documentation or some softer coding convention.

As such an automatic code converter can't be aware of these details and can't decide if a call is safe or not. C++ doesn't have the concept of safety built in, it's the programmer that's in charge of keeping track of the soundness of it all. It's not enforced by the compiler like Rust does.

I'm really surprised that there's so much debate around this in this thread honestly (including by people who know Rust very well). That's blindingly obvious to me that it's a very bad idea. Clearly that means that I'm missing one side of this issue.

cle · on Aug 22, 2020

I’m not surprised at all. If you follow the Rust community, this is an endless debate. It’s totally off-putting to me and makes me want to avoid the language altogether.

fluffything · on Aug 22, 2020

Sure, but safe Rust functions can _always_ be used safely. If your function cannot always be used safely, either it is an `unsafe` Rust function, or your program is broken according to the Rust spec.

Publishing safe Rust APIs that are _intentionally_ unsound should at best be warned about to any other crate that depends on it. Ideally, those authors and their crate would simply be banned.

Ygg2 · on Aug 22, 2020

And I explained.

Either you change the C++ lib to not behave unsafely for inputs.

Or you manually change the binding to keep the invariant.

The assumption behind linking to C++ code is that it doesn't contain unsoundness. If it does the library was fucked way before.

fluffything · on Aug 22, 2020

I agree with you.

But this discussion is about a tool that:

* does not change the C++ code to behave correctly for all inputs

* does not (and technically cannot) generate safe Rust bindings that keep the invariant (the bindings are automatically generated under the assumption that all C++ code is safe for all inputs).

the_mitsuhiko · on Aug 22, 2020

It seems it’s quite an absolutist way to look at safety. It establishes that nothing is allowed to ask a user to proof safety unless the unsafe keyword is used somewhere.

I think this is already invalidated by countless code generation and build scripts.

Jweb_Guru · on Aug 22, 2020

The use of the unsafe keyword is how Rust's "absolutist" view is made practical... the obligation to write an extra seven to nine characters is pretty minimal compared to the burden of an actual proof. "Some automated build script probably don't respect that" isn't a great argument against that standard, I think, especially with the push to sandbox build scripts and procedural macros showing that people are still quite concerned about safety in those areas (I know you're referring to the code they generate and not the scripts themselves, but I see these as related issues of trust).

Ygg2 · on Aug 23, 2020

But the thing is, unsafe was never absolutist.

Even regular safe Rust assumes you use unsafe in a proper manner.

This build tool assumes your C++ code is safe enough to wrap around unsafe trivially.

Jweb_Guru · on Aug 23, 2020

Regular Rust assumes that you use unsafe in a proper manner, but still asks you to mark functions that can cause memory unsafety with the wrong inputs as unsafe. This is so people know what code to audit. If you don't do this (or at least, mark something in the module as unsafe), it's considered a bug in your library. Among other things, this kind of clear isolation boundary makes it obvious where your proof obligations are and consequently which code you can ignore when debugging undefined behavior (code outside the unsafe module is not responsible for the unsafety no matter what weird input it provides).

This build tool, as first proposed, knowingly exposed macros that can cause memory unsafety with the wrong inputs, without making you write unsafe. If a build tool makes assumptions like "you probably know what you're doing with that C++ code" without asking you to write unsafe, it is just as wrong as an implementation of unchecked indexing that wasn't marked unsafe (because "you'll probably know not to use this unless you're really in bounds."). It is clear that you consider this a trivial matter, but many people very heavily involved in the Rust community (including the people working on its soundness proofs) do not.

Ygg2 · on Aug 23, 2020

> knowingly exposed macros that can cause memory unsafety with the wrong inputs,

This was done to deal with remarks Chromium devs had against Rust C++ binding.

In lieu of that it's liberal treatment of unsafe is understandable.

fluffything · on Aug 24, 2020

> This was done to deal with remarks Chromium devs had against Rust C++ binding.

I have a hard time believing those Chromium devs know what they want.

Their request allows safe Rust to introduce undefined behavior, making the unsafe Rust keyword meaningless (since now you can't even grep for the places in the code where this UB is introduced), and eliminating the actual reason they want to use Rust in the first place (to avoid undefined behavior, segfaults, CVEs, etc.).

The only thing complying with the request will achieve is hurt the language. Once the chromium devs start hitting the same CVEs and segfaults in Rust that they were hitting in C++, they will just tell the whole world that Rust is as bad as C++ and using it is not worth it.

Jweb_Guru · on Aug 24, 2020

With all due respect, "Chromium devs wanted feature X" does not make it a good idea. Servo devs wanted features that would have made Rust worse, too.

Ygg2 · on Aug 22, 2020

Yes. And? You seem to forget famous saying. Perfect is the enemy of good.

Rust std wasn't yet verified to not contain unsafe behavior. In fact, some modelling discovered and fixed unsafe behavior in Rust std libs.

Same happens here. If you discover that for inputs you have unsafe behavior calling ffi, you uncapsulate it in way that prevents misuses or fix in source ffi lib.

rumanator · on Aug 22, 2020

> There's a lot of code that one could write that's perfectly kosher in C++ but a no-no in Rust.

Not a problem. Rust checks are applied to the Rust code you write or import, and everything else is just ingested as is.

yoshuaw · on Aug 22, 2020

I read "safely call" as "does not require `unsafe {}` to call". Invariants will still need to be manually upheld to ensure the C++ code will work as expected.

simias · on Aug 22, 2020

The problem with that is that making a function safe means that you guarantee that these invariants are enforced. If you tag a function as safe erroneously you basically throw Rust's safety out of the window.

Given that C++ has no explicit concept of safety it seems like it would be hard to do that automatically as soon as pointers or references are involved.

As a quick example: if you have the following signature in C++:

    int *foo(int &a);

How you automatically generate safe FFI for it?

As a human doing the same task I'd have to ask myself at the very least:

- Can the return value be NULL?

- Can the return value alias a?

- What's the lifetime of the returned value and who owns it?

rumanator · on Aug 22, 2020

> If you tag a function as safe erroneously you basically throw Rust's safety out of the window.

You really don't. Rust checks are applied to the code written in Rust. If a developer intentionally marked a dependency as safe then you get exactly what you've asked for.

Sometimes it's unquestionably better to have a working system than not having one just because a pedantic compiler complains about stuff that you can't do nothing about.

ChrisSD · on Aug 22, 2020

If an interface can break safe Rust (without that being an implementation bug) then it should not be marked as safe. Breaking safety isn't about the compiler being pedantic, it can break your real world program.

Yes this may mean you sometimes can't create a safe interface to a particular foreign function. However all this means is that calling the function requires an explicit unsafe block and extra care. But that 'unsafe' marker is valuable! It's something that screams "here be dragons".

simias · on Aug 22, 2020

I don't understand what you mean. You can always use "unsafe" anywhere in your code, if autoCXX marked all interface code as unsafe you'd never be stuck because of it, it would just mean that you'd had to check the safety of the C++ function yourself and, if you deem that your usage is kosher, add an unsafe block around your call.

If you erroneously tag unsafe interfaces as safe then you basically throw all of Rust's safety out of the window. You can end up with multiple mutable borrows, or borrows with a bogus lifetime or whatever.

Jweb_Guru · on Aug 22, 2020

Actually, many C and C++ functions that take no variables at all are unsafe to call because they update shared global state (and tacitly assume that the user of the library will not do this more than once). Initialization functions for example.

simias · on Aug 22, 2020

Indeed. In my experience it's also fairly common for C and C++ functions to be non-thread safe and have the implicit restriction that they should always be called from the same thread. OpenGL being a good example of that.

Raw bindings to OpenGL in Rust is actually a good example of FFI that wouldn't be safe to call for about half a trillion reasons. There are so many implicit constraints in there that even manually designing a safe Rust interface is a rather complicated endeavor.

rightbyte · on Aug 22, 2020

foo can also store a pointer to a somewhere right?

fluffything · on Aug 22, 2020

If it does not require `unsafe` to call these bindings, then the code they execute _must_ be sound according to _Rust_ rules.

If this is not the case, then this whole library thing is _unsound_. We really need an "intentially-broken" tag in crates.io to identify crates that lie about their soundness intentionally.

awestroke · on Aug 22, 2020

The bindings are safe, not the C++ they bind to

simias · on Aug 22, 2020

Unsafety is contagious, the whole point of "unsafe {}" is to create a well defined interface between code that can rely on safety guarantees enforced by the compiler and code that needs to be manually checked by the developer.

Safe bindings to unsafe code need to enforce the invariants to make the calls safe, otherwise they are not safe.

Consider this code:

    fn int_to_string(e: &mut u32) -> &mut String {
        unsafe {
            &mut*(e as *mut u32 as *mut String)
        }
    }

    fn main() {
        let mut i = 42u32;
        
        let s = int_to_string(&mut i);
        
        s.push_str("Ayyy");
        
        println!("{}", s);
    }

This int_to_string function is marked safe and can be called from main without any unsafe block, yet if you run this code it will probably segfault. Or maybe it'll format your hard drive, who knows. Because int_to_string is clearly unsound and is broken.

If you just start tagging random, potentially unsound interfaces as safe, what's even the point?

And if you agree that the code in this example is bad and "int_to_string" should definitely not be considered safe, why would that change if I rewrote it to make "int_to_string" a C++ function called through FFI instead?

fluffything · on Aug 22, 2020

If the C++ they bind to is not safe, then allowing these to be called from safe Rust is unsound.

alvarelle · on Aug 22, 2020

The point is that the C++ code should be safe because the C++ programmer should not introduce UB on its C++ code. If the C++ code invoke UB, that is a bug in the C++ code which should be found by reviewing the C++ code alone.

No need to write 'unsafe' because .cpp files are already known to need carefull review.

masklinn · on Aug 22, 2020

> The point is that the C++ code should be safe because the C++ programmer should not introduce UB on its C++ code.

That's a misunderstanding of safety, and ub, and `unsafe`.

The C++ code could be unsafe when called with certain values which it is not normally called with. This is common. This is also not allowed in Rust, it'd be unsound.

Furthermore C++ has different notions of safety than Rust. C++ allows dangling and null pointers (whether raw or smart), it doesn't allow calling them. Rust does not allow dangling or null pointers unless they're raw. You can have a null unique_ptr, you can not have an empty Box.

alvarelle · on Aug 22, 2020

I believe I understand correctly UB and unsafe.

The cxx crate and the autocxx tool should make sure that the exposed C++ functions only take arguments types which have well defined semantics.

In your example, a rust Box<T> maps to a rust::Box<T> in C++, which cannot be null. And a unique_ptr from C++ maps to a cxx::UniquePtr in rust which can be empty.

If somehow the C++ code puts a dangling or null pointer into a rust::Box, that is clearly a bug in the C++ code.

fluffything · on Aug 24, 2020

I agree with you that by controlling both sides of the FFI (the Rust and the C++ code) one can make sure that the types work.

The real problem is, however, that C++ lacks an "unsafe" keyword, so functions like:

    /// # Unsafe
    ///
    /// Must call `bar` after a sequence of calls to `foo`
    unsafe fn foo();
    fn bar();

just look like

    /// note: must call bar after a sequence of calls to foo
    void foo();
    void bar();

You can autogenerate "correct" C++ code from that Rust code (just loose the "unsafe"), but you cannot autogenerate safe Rust code from that C++ code unless you start parsing and understanding documentation comments (which could be possible, e.g., chromium could annotate C++ APIs that should be unsafe in Rust).

To generate Rust from C++, it does not suffice to just "look at the types" like cxx and autocxx do. One also _at least_ need to read all the API documentation comments, check if there are any invariants that must be preserved, and act accordingly.

If the APIs are ok and can be wrapped mechanically, the actual wrapping can be made trivial with tools, but there is no tool today that will tell you whether this is the case.

That is, at the end of the day, if you need to expose 10k C++ APIs from Rust, you will still need to manually inspect those 10k C++ APIs, and _think_ about whether they are safe or not.

That's the time consuming part, and you actually want to only do this once, and write down why an API is safe or not, so that other programmers don't have to repeat this work every time you hit an FFI issue.

So IMO while cxx and autocxx are "ergonomic", they spare you only the easy lest time consuming portion of the work. autocxx also makes it easy for you to either not check, or not write down the result of the check, and this could end up creating a lot more work down the road.

---

Note that this is something one wants to do even when one trusts that the C++ code is correct. In the example above, the C++ APIs can be correct, but one can still UB by using them incorrectly.

fluffything · on Aug 24, 2020

C++ code only needs to be safe according to C++ rules (not Rust rules). So it is possible for the C++ to be safe, and the corresponding Rust code to be unsafe, e.g.,

* int foo(); which returns an uninitialized int is OK according to C++ rules, but would need a MaybeUninit<c_int> according to Rust rules.

* int foo(); could throw an exception, causing UB in Rust, since Rust assumes FFI declarations not to throw according to the spec. Rust can only export `noexcept(true)` C++ FFI declarations, or C functions (since C cannot throw). Apparently, autocxx and the cxx crate ignore this and treat all C++ functions as if they never throw, giving them a safe API. That's unsound. (One can fix that on nightly Rust though).

Unsafety can also be introduced through ABI incompatibilities, but IIUC autocxx usage of rust-bindgen deals with that.

geofft · on Aug 22, 2020

Do you think that e.g. the Rust bindings to libgit2 or OpenSSL or libc also need to prove that the entire C library being bound is 100% thread and memory safe and free of bugs in order to expose safe wrappers?

masklinn · on Aug 22, 2020

Kinda?

It needs to ensure that whatever preconditions those libraries have which are not reflected in their API because the languages they use don't allow for it are never broken. So let's say a libgit function takes a pointer (for an array) and an index, the rust bindings must ensure that the pointer is valid and the index is within the array.

Will there be bugs and things which will be missed? Likely, after all we've seen that in pure-rust unsafe code, including the standard library.

But the library "can't" just yolo and expose the entire thing as-is through a safe interface. As in technically it can do that just fine, but that's completely unsound even if it's effectively never called incorrectly.

geofft · on Aug 22, 2020

So, if you restrict yourself to looking just at the API and not the implementation, it seems to me that if your library operates entirely on non-pointer C++ types (e.g., it takes in a std::string and returns an std::string), a program could automatically determine that and call the binding "safe". Would that be enough?

I agree that the program should not automatically generate "safe" bindings that take pointers, because in Rust, creating and passing around raw pointers is safe but actually using them is safe. But if, let's say, you have an API that consists entirely of integers, bools, std::strings, and structs and classes thereof, what additional things would you need to check to be confident calling the binding safe?

(Sure, there are weird cases here like "this function takes a long, casts it to a pointer, and dereferences it," but I assume those are uncommon enough that you'll see if you're about to create or use auto-generated bindings to such a function. I suppose there could be "this function takes an std::string and an index, and the index must be less than the length or it's UB" - are those common enough that they make this endeavor questionable?)

masklinn · on Aug 22, 2020

> So, if you restrict yourself to looking just at the API and not the implementation, it seems to me that if your library operates entirely on non-pointer C++ types (e.g., it takes in a std::string and returns an std::string), a program could automatically determine that and call the binding "safe". Would that be enough?

My understanding is that this is the idea behind cxx: high-level C++ types are considered "safe", and so functions which use these are assumed to be safe.

Whether this is sufficient is the debate you see upthread.

> But if, let's say, you have an API that consists entirely of integers, bools, std::strings, and structs and classes thereof, what additional things would you need to check to be confident calling the binding safe?

The entire codebase, because even with no parameters at all, C++ can do thing which Rust considers wildly unsafe. Like manipulating global state, possibly across functions (there are entire libraries which work on that principle).

Put an other way, a "safe rust" function is reentrant, thread-safe and does not cause UB given any input.

By this definition, it is not possible to know that a C(++) function is safe without going through its source with a microscope.

geofft · on Aug 23, 2020

Right, that's what I was getting at - is it inappropriate to publish safe bindings for C libraries that you have not rigorously audited and fuzzed?

(I mean, it would be really cool if we rigorously audited and fuzzed all these C libraries, so I'm not completely opposed to this!)

_bz2r · on Aug 22, 2020

It might just be me, but I feel like from this comment on down, everyone is saying the same thing in different words.

(which is great, when the topic is a little complicated like this)

crb002 · on Aug 22, 2020

Why not wrap for use with dlopen()?