This is an unhelpful interpretation. With a decent memory-safe parser, it’s perfectly safe to deserialize JSON, (most of) XML [0], protobuf, Cap’n Proto, HTTP requests, etc., or to query a database containing untrusted data. You need to be careful not to introduce a vulnerability by doing something unwise with the deserialized result, but a good deserializer will safely produce a correctly typed output given any input; the biggest risk is that the output is excessively large.
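A minimal Python sketch of what “safe to deserialize” means here (Python chosen because the thread’s unsafe counterexample, pickle, is Python; the payload is made up):

```python
import json

# Untrusted bytes from the network: a memory-safe parser either returns
# plain data (dicts, lists, strings, numbers) or raises a parse error.
# It never hands back an object that carries behavior.
untrusted = b'{"user": "mallory", "role": "admin"}'

try:
    result = json.loads(untrusted)
except json.JSONDecodeError:
    result = None

print(type(result).__name__)   # → dict
```

Whatever bytes arrive, the worst outcomes are an exception or an oversized but inert value; nothing in the input can choose code to run.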
But tools like Pickle or Java deserialization or, most likely, rkyv_dyn will happily give you outputs that contain callables and that contain behavior, and the result is not safe to access. (In Python, it’s wildly unsafe to access, as merely reading a field of a Python object calls functions encoded by the class, and the class may be quite dynamic.)
[0] The world is full of infamously dangerous XML parsers. Don’t use them, especially if they’re written in C or C++ or they don’t promise that they will not access the network.
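The pickle failure mode described above is easy to demonstrate; `side_effect` below is a harmless, made-up stand-in for something like `os.system`:

```python
import pickle

calls = []

def side_effect(arg):
    # Stand-in for something dangerous like os.system.
    calls.append(arg)
    return arg

class Payload:
    # __reduce__ tells pickle: "to rebuild me, call side_effect('boom')".
    # A hostile pickle can name any importable callable instead.
    def __reduce__(self):
        return (side_effect, ("boom",))

blob = pickle.dumps(Payload())
result = pickle.loads(blob)   # merely loading invokes side_effect
print(calls, result)          # → ['boom'] boom
```

The key point: the *input* chose what got called. No field of the loaded object was even accessed; `pickle.loads` alone ran the smuggled callable.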
> The solution is to add a cryptographic signature to detect tampering.
If you don’t have a deserializer that works on untrusted input, how do you verify signatures? Also, do you really think it’s okay to run “sh $cmd” just because you happen to have verified a signature?
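To make the second point concrete: even a correctly verified HMAC (sketched below with a hypothetical shared key) only proves who produced the bytes, not that acting on the content is safe:

```python
import hashlib
import hmac

KEY = b"hypothetical-shared-key"
blob = b'{"cmd": "rm -rf /"}'   # signed, but still just sender-chosen bytes
tag = hmac.new(KEY, blob, hashlib.sha256).digest()

def verify(blob, tag):
    # The check runs over raw bytes; no deserialization is involved.
    expected = hmac.new(KEY, blob, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)

print(verify(blob, tag))   # → True, yet running the embedded command is still a terrible idea
```

A valid tag narrows “who could have written this” to the key-holders; it does nothing to make `sh $cmd` on the verified content okay.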
> This is also called a man in the middle attack.
I suggest looking up what a man in the middle attack is.
Ah, I see the confusion. rkyv_dyn doesn't serialize code. Rust is compiled to machine code. It would be quite a feat to accomplish.
I was a bit confused when you compared it to Python pickle and assumed you were talking about general input validation somehow.
I agree that pickle and similar are profoundly surprising and error-prone. I struggle to find any legitimate reason one would want that.
As for the man-in-the-middle attack, I meant that if somebody intercepts the serialized form, they can mutate it, and without a cryptographic signature you wouldn't know.
> rkyv_dyn doesn't serialize code. Rust is compiled to machine code.
Java is compiled to bytecode, and Obj-C is compiled to machine code. Yet both Android and iOS have had repeated severe vulnerabilities related to deserializing an object that contains a subobject of an unexpected type that pulls code along with it. It seems to me that rkyv_dyn has exactly the same underlying issue.
Sure, Rust is “safe”, and if all the unsafe code is sufficiently careful, it ought to be impossible to get the type of corruption that results in direct code execution, memory writes, etc. But systems can be fully compromised by semantic errors, too.
If I’m designing a system that takes untrusted input and produces an object of type Thing, I want Thing to be pure data. Once you start allowing an open set of methods on Thing or its subobjects, you have lost control of your own control flow. So doing:
thing.a.func()
may call a function that wasn’t even written at the time you wrote that line of code, or even a function that is only present in some, but not all, programs that execute that line of code.
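A Python analogy for the `thing.a.func()` hazard (the names here are invented for illustration): the call dispatches on the runtime type of `thing.a`, so what actually runs can come from a subclass defined long after, and far away from, the calling code.

```python
class Base:
    def func(self):
        return "expected"

class Thing:
    def __init__(self, a):
        self.a = a

def handle(thing):
    # The author of this line assumed Base.func.
    return thing.a.func()

# Later, someone else's module adds a subtype the caller never saw:
class Surprise(Base):
    def func(self):
        return "code the original author never wrote"

print(handle(Thing(Base())))      # → expected
print(handle(Thing(Surprise())))  # → code the original author never wrote
```

(And in Python it is worse still: a `@property` or `__getattr__` means that merely reading `thing.a`, with no call syntax at all, can execute class-defined code, which is the parenthetical point made earlier about pickle.)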
Exploiting this is considerably harder than exploiting pickle, but considerably harder is not the same as impossible.
You know very well what I meant by "compile to machine code". But you decided to interpret it in a combative way. Even though you seem very knowledgeable, this makes me want to stop discussing with you.
Ultimately you should read the code of rkyv_dyn to understand what it does instead of making random claims.
It will be faster for you to read the code than for me to attempt explaining how it works. Especially since you will most likely choose the least charitable interpretation of everything I say. There is very little code, it won't take long.
> You know very well what I meant by "compile to machine code".
I really don't. I think you mean that Rust compiles to machine code and neither loads executable code at runtime nor contains a JIT, so you can't possibly open a file and deserialize it and end up with code or particularly code-like things from that file being executed in your process.
My point is that there's an open-ended global registry of objects that implement a given trait, and it's possible (I think) to deserialize and get an unexpected type out, and calling its methods may run code that was not expected by whoever wrote the calling code. And the set of impls and thus the set of actual methods may expand by the mere fact of linking something else into the project.
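A toy Python analog of that open-ended registry (everything here is invented for illustration, not rkyv_dyn’s actual mechanism): a tag-dispatched deserializer lets the *input* pick the constructor, and merely importing another module can grow the set of reachable types.

```python
import json

# Global registry mapping type names to constructors; anything imported
# into the program can add to it.
REGISTRY = {}

def register(cls):
    REGISTRY[cls.__name__] = cls
    return cls

@register
class Greeting:
    def __init__(self, text):
        self.text = text
    def render(self):
        return f"hello, {self.text}"

def load(blob):
    msg = json.loads(blob)
    # The untrusted input chooses which registered type is constructed.
    return REGISTRY[msg["type"]](msg["value"])

# Elsewhere, an unrelated import registers another implementation:
@register
class Evil:
    def __init__(self, text):
        self.text = text
    def render(self):
        return "something the caller never intended"

obj = load(b'{"type": "Evil", "value": "x"}')
print(obj.render())   # → something the caller never intended
```

The caller’s `obj.render()` looks innocuous, but which `render` runs was decided jointly by the attacker’s bytes and the link-time set of registered impls.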
This probably won't blow up quite as badly as NSCoding does in ObjC because Rust is (except when unsafe is used) memory-safe, so use-after-free just from deserializing is pretty unlikely. But I would still never use a mechanism like this if there was any chance of it consuming potentially malicious input.