[HN Gopher] Native Reflection in Rust
       Native Reflection in Rust
       Author : jswrenn
       Score  : 199 points
       Date   : 2022-12-15 15:54 UTC (7 hours ago)
 (HTM) web link (jack.wrenn.fyi)
 (TXT) w3m dump (jack.wrenn.fyi)
       | unconed wrote:
       | My version of Greenspun's Tenth [1] is that any sufficiently
       | complex static language contains an ad hoc, informally specified,
       | bug-ridden and slow version of a dynamic "any" type.
       | Thx OP for providing an example.
       | [1] https://en.wikipedia.org/wiki/Greenspun's_tenth_rule
         | kibwen wrote:
         | Rust has a dynamic any type, `std::any::Any`.
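         | A minimal sketch of what `std::any::Any` offers, for comparison
         | with the article's approach. The `describe` helper is a
         | hypothetical name used only for illustration; `downcast_ref` is
         | the standard-library method. Note that the caller has to name
         | every candidate type up front.

```rust
use std::any::Any;

/// Describes a value whose concrete type is only known at run time.
/// (`describe` is an illustrative helper, not part of any library.)
fn describe(value: &dyn Any) -> String {
    if let Some(n) = value.downcast_ref::<i32>() {
        format!("i32: {n}")
    } else if let Some(s) = value.downcast_ref::<String>() {
        format!("String: {s}")
    } else {
        // `Any` can only test for types the caller names explicitly.
        "unknown type".to_string()
    }
}
```

         | Calling `describe(&42i32)` takes the first branch; a type that is
         | not listed, such as `f64`, falls through to the last one.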
       | 8jy89hui wrote:
       | This is a beautiful (hacky) demo of something that I didn't think
       | was possible in Rust (yet). I hope other applications don't
       | accidentally start using it just to discover that it doesn't work
       | in release mode.
       | Very impressive work!
         | jswrenn wrote:
         | Oh, I should add a note about that. Fortunately, it's quite
         | easy to tell Rust to generate debuginfo even in release mode.
       | bouk wrote:
       | It would be really cool if it was possible to natively inspect
       | the state of a Rust generator in a type-safe way
       | Animats wrote:
       | _"When you call .reflect on a dyn Reflect value, deflect figures
       | out its concrete type in four steps:"_
       | * _invokes local_type_id to get the memory address of your
       | value's static implementation of local_type_id_
       | * _maps that memory address to an offset in your application's
       | binary_
       | * _searches your application's debug info for the entry
       | describing the function at that offset_
       | * _parses that debugging information entry (DIE) to determine the
       | type of local_type_id's &self parameter_.
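       | The first of the quoted steps can be sketched roughly as follows.
       | This is an assumption-laden illustration, not deflect's actual
       | code: the trait and method names come from the quote, while the
       | blanket impl and the `address_of_impl` helper are hypothetical.
       | Steps two to four (mapping the address into the binary and
       | reading the DWARF entry) are not shown.

```rust
/// Stand-in for the `Reflect` trait named in the quote (hypothetical definition).
trait Reflect {
    /// Returns the address of this type's own copy of `local_type_id`.
    fn local_type_id(&self) -> usize;
}

// Assumption: a blanket impl, so each sized type gets its own monomorphized copy.
impl<T> Reflect for T {
    #[inline(never)]
    fn local_type_id(&self) -> usize {
        // The function item is turned into a raw address.
        <T as Reflect>::local_type_id as *const () as usize
    }
}

/// Step 1 only: ask a type-erased value for the address of its implementation.
/// (`address_of_impl` is an illustrative helper.)
fn address_of_impl(value: &dyn Reflect) -> usize {
    value.local_type_id()
}
```

       | The call goes through the vtable, so the returned address belongs
       | to the copy compiled for the value's concrete type. That address
       | is what the remaining steps would look up in the debug info.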
       | This is a rather strange thing to bolt onto a language. I could
       | see this as an external tool. The use case seems to be programs
       | which used "async" so much they can't figure out the resulting
       | state machine. External debug tools to view and examine the async
       | state machine might be helpful.
       | My experience with Rust has been that debugging of safe code is
       | just not a problem. Print statements and logging are enough.
         | pcwalton wrote:
         | > This is a rather strange thing to bolt onto a language. I
         | could see this as an external tool.
         | It _is_ an external tool. This is a crate, not a part of the
         | compiler.
         | loeg wrote:
         | > This is a rather strange thing to bolt onto a language.
         | It can just be an extremely fun and cute demo, without
         | practical application.
           | jerf wrote:
           | It can also be something that looks cool and doesn't
           | necessarily ever get past "kinda works", but piques the
           | interest of the core dev team and they take steps to make it
           | work even better, resulting in the ultimate "deprecation" of
           | this sort of thing by virtue of it being even better
           | integrated into the core.
           | I don't have the context to judge the probability of that in
           | this specific case (lots of technical nitty-gritty comes in
           | to this sort of thing), but I've certainly seen similar
           | things happen in other communities.
           | More-nitors wrote:
           | how about adding this to debuggers for better object-views?
           | (could it be possible to provide near-js/python/java level of
           | obj view?)
             | gpderetta wrote:
             | This is already using DWARF debug info. Using this for
             | debugging would be a long way around to arrive where you
             | started.
             | You can already script gdb to provide rich views of any
             | data structure.
       | olvy0 wrote:
       | I've used a very similar method, at work, to provide C++
       | "reflection" between my own system and a system from another
       | team.
       | Basically, the other system is a dynamic library which sends and
       | receives C structures from my application. Those structures are
       | then mapped into a buffer that is supposed to have the same size
       | and there are pointers with metadata pointing into the buffer
       | that are supposed to be exactly like the struct elements. Those
       | structures can have arbitrary complexity, and are passed around
       | through type erasure (essentially char*).
       | I wrote "reflection" code for the other team, which runs when
       | they register the struct instance to be sent, checks if there's a
       | matching PDB [0] around, reads it, and outputs a json including
       | the metadata needed, which can then be used to define the
       | structures' metadata on our side correctly.
       | This is all in C/C++ since in some contexts we have soft
       | real-time requirements, else I would have used any of the many
       | RPC frameworks available.
       | This has been working for several years now.
       | This is not a generic solution but it's good enough for in-house
       | communication between 2 systems that are maintained by different
       | parts of the organization, where the API between them, that like
       | I said is based on passing around char* buffers, has been more or
       | less set in stone a long time ago. Conway's law [1] and all that.
       | Sigh.
       | [0] We are a Windows shop although the same thing should work
       | with DWARF info, same as the OP library works. In fact he says
       | "It may never work on Windows, which does not use DWARF to encode
       | debug info" but I can say that the same approach does work on
       | Windows, for C++ at least. The PDB format might be a tad
       | undocumented, but its documentation has been improved in the last
       | decade or so since I started working on my library. Writing some
       | small test programs is enough to understand how to access it, if
       | all you need is meta info on C-style structures. Other stuff is
       | more... challenging. But it wasn't necessary for my use-case.
       | [1] https://en.wikipedia.org/wiki/Conway%27s_law
       | kp995 wrote:
       | Can't we rely more on Rust's pattern matching and its strong
       | type system?
       | Reflection seems more helpful when the programming language is
       | a little unsound.
         | jswrenn wrote:
         | Absolutely! That's the approach that frunk [0] takes. Frunk
         | (and other reflection libraries like it) are suitable for most
         | use cases, and make better use of Rust's affordances.
         | My crate is suitable for cases where you cannot know (or
         | control) the set of types you might need to reflect on in
         | advance. Its primary use cases are related to debugging.
         | [0]: https://docs.rs/frunk
           | halfmatthalfcat wrote:
           | Is Frunk Rust's Shapeless (from Scala)?
             | jswrenn wrote:
             | Yep!
       | Thaxll wrote:
       | Today I learned that Rust does not have reflection.
         | estebank wrote:
         | Reflection is usually not available in AoT compiled languages.
         | The prevalent Rust coding styles rely heavily on monomorphic
         | data types and functions, meaning there's nothing _left_ to
         | reflect at runtime. But if you want to deal with trait objects
         | and need to access the underlying type, you need to use
         | Any::downcast or rely on annotations on every type you want to
         | reflect on. Or now, leverage DWARF info on Linux with deflect.
           | omginternets wrote:
           | What are monomorphic data types? What should be my first read
           | on the subject?
             | estebank wrote:
             | It's a fancy way of saying "every time this type is used,
             | replace all the generic type params with what was used and
             | generate code for it". It's how generics are implemented in
             | Rust. If you have
             |     struct Foo<T>(T);
             | And you create Foo(42i32) and Foo(0.0f64), the compiler
             | will create the equivalent to
             |     struct Fooi32(i32);
             |     struct Foof64(f64);
             | In other languages like Java, generics are implemented the
             | way that Rust does "trait objects" (&dyn Trait).
             | Rust is not the only language that does this, to be clear.
             | If you're interested in a quick intro on the _compiler_
             | side of this, you can read
             | https://rustc-dev-guide.rust-lang.org/backend/monomorph.html
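             | One way to observe this from inside a program, as a small
             | sketch: each instantiation of `Foo<T>` is a distinct type
             | with its own size. `layout_of` is a hypothetical helper;
             | `type_name` and `size_of` are standard-library functions,
             | and the exact text `type_name` returns is not guaranteed.

```rust
use std::any::type_name;
use std::mem::size_of;

/// The generic wrapper from the comment above.
struct Foo<T>(T);

/// Reports the type name and size the compiler produced for one instantiation.
/// (`layout_of` is an illustrative helper.)
fn layout_of<T>(_value: &Foo<T>) -> (&'static str, usize) {
    (type_name::<Foo<T>>(), size_of::<Foo<T>>())
}
```

             | With `Foo(42i32)` the reported size is that of an `i32`, and
             | with `Foo(0.0f64)` that of an `f64`, because `layout_of` is
             | itself compiled once per `T`.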
               | shpongled wrote:
               | Nice examples - you can also have languages (like SML)
               | where monomorphization is simply an implementation
               | detail. Some compilers (e.g., MLton) perform
               | monomorphization and others don't.
               | yakubin wrote:
               | That depends on what you mean. SML has "polymorphism"
               | boiling down to being able to plug an arbitrary type in
               | some places, which is denoted like _'a_. But when people
               | talk about generics, they more often talk about C++
               | templates, Java generics, Rust traits, etc. whose SML
               | equivalents are signatures, structs and functors.
               | Signatures are a bit like Rust traits, structs are a bit
               | like Rust implementations of traits, whereas functors are
               | like Rust's "templates", i.e. wherever you swap angle
               | brackets to parametrise something with types constrained
               | by traits, or values constrained by types. Except in Rust
               | this parametrisation can be slapped on a bunch of things.
               | It can be on structs, on functions, on traits, on
               | implementations of traits etc. In SML you need to group
               | all the "parametrised" things into a struct (and a
               | corresponding signature), which is going to be returned
               | by a functor.
               | And now the thing is: with transparent signature
               | ascriptions, functors are monomorphised in SML, instead
               | of everything being hidden behind signatures (as is in
               | the case of Rust with traits when you use _dyn_ ), which
               | has semantic consequences. E.g. a struct returned by a
               | functor may contain a type. You can't perform proper
               | type-checking without monomorphising, because you don't
               | know what the exact type is. E.g. in the following
               | program, the final line couldn't be type-checked without
               | monomorphisation:
               |     signature ITERABLE = sig
               |         type ElemT
               |         type SrcT
               |         val new_iter: SrcT -> unit -> ElemT option
               |     end
               |
               |     signature LIST_ELEM_TYPE = sig
               |         type T
               |     end
               |
               |     functor ListIterFun (ListElemType: LIST_ELEM_TYPE): ITERABLE = struct
               |         type ElemT = ListElemType.T
               |         type SrcT = ElemT list
               |         fun new_iter l = let val lr = ref l
               |                          in
               |                            fn () => case !lr of
               |                                       nil => NONE
               |                                     | (x::xs) => (lr := xs; SOME x)
               |                          end
               |     end
               |
               |     structure IntElemType: LIST_ELEM_TYPE = struct
               |         type T = int
               |     end
               |
               |     structure IntListIter = ListIterFun(IntElemType)
               |
               |     val next = IntListIter.new_iter [1, 2, 3, 4, 5]
               | If I change the signature ascription on ListIterFun to an
               | opaque ascription ( _:> ITERABLE_ ), the final line won't
               | type-check, because it's not obvious from the signature
               | that ElemT is int. So transparent signature ascriptions
               | require monomorphisation (Rust traits without _dyn_), and
               | opaque signature ascriptions free the compiler from
               | having to do monomorphisation (Rust traits with _dyn_).
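               | For readers who know Rust better than SML, a rough and
               | imperfect analogue of the transparent case, using
               | associated types in place of the signature's type members.
               | All names here (`Iterable`, `ListIter`, `first_two`) are
               | hypothetical; the mapping to SML is only approximate.

```rust
/// Rough analogue of the ITERABLE signature: its types become associated types.
trait Iterable {
    type Elem;
    type Src;
    fn new_iter(src: Self::Src) -> Box<dyn FnMut() -> Option<Self::Elem>>;
}

/// Rough analogue of ListIterFun: generic over the element type.
struct ListIter<T>(std::marker::PhantomData<T>);

impl<T: 'static> Iterable for ListIter<T> {
    type Elem = T;
    type Src = Vec<T>;

    fn new_iter(src: Vec<T>) -> Box<dyn FnMut() -> Option<T>> {
        let mut items = src.into_iter();
        // Each call to the returned closure yields the next element.
        Box::new(move || items.next())
    }
}

/// Type-checks because `<ListIter<i32> as Iterable>::Src` is known to be `Vec<i32>`.
fn first_two() -> (Option<i32>, Option<i32>) {
    let mut next = <ListIter<i32> as Iterable>::new_iter(vec![1, 2, 3, 4, 5]);
    (next(), next())
}
```

               | Because the concrete `ListIter<i32>` is named, the compiler
               | can see that `Src` is `Vec<i32>` and accepts the vector
               | literal, much like the transparent ascription above.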
               | There was a lot of discussion of this issue when Go was
               | settling on a design for its generics, under the phrase
               | "reified generics".
               | codeflo wrote:
               | I only recently realized that certain type system
               | features, like polymorphic recursion, make
               | monomorphization impossible in the general case. In
               | Haskell for example, it's by necessity only an
               | optimization that's used where applicable, and not the
               | general strategy.
               | gloryjulio wrote:
               | I think cpp does this too
               | estebank wrote:
               | It indeed does. The only difference is that Rust has
               | traits (similar to C++'s concepts) which require explicit
               | mention of what interface the type parameters have inside
               | the function, whereas C++'s templates will have a compile
               | error _after_ instantiation if you passed something that
               | didn't meet the expected contract. This is closer to
               | Rust's macros in operation.
               | Given
               |     fn foo<T>(a: T, b: T) -> T { a + b }
               | The compiler will complain that you should have been
               | explicit on how T is going to be used:
               |     error[E0369]: cannot add `T` to `T`
               |      --> src/lib.rs:1:32
               |       |
               |     1 | fn foo<T>(a: T, b: T) -> T { a + b }
               |       |                              - ^ - T
               |       |                              |
               |       |                              T
               |       |
               |     help: consider restricting type parameter `T`
               |       |
               |     1 | fn foo<T: Add<Output = T>>(a: T, b: T) -> T { a + b }
               |       |         +++++++++++++++++
               | whereas in C++ this would have been accepted _until_ you
               | called foo with two things that couldn't be added
               | together, like a Rust macro[1].
               | [1]: https://play.rust-
               | lang.org/?version=nightly&mode=debug&editi...
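               | For completeness, a sketch of the version that compiles,
               | applying the bound the compiler suggests. The only addition
               | beyond the comment above is the `use` line that brings
               | `Add` into scope.

```rust
use std::ops::Add;

/// Generic addition: the `Add<Output = T>` bound states up front how `T` is used.
fn foo<T: Add<Output = T>>(a: T, b: T) -> T {
    a + b
}
```

               | This accepts any `T` that can be added to itself and yields
               | a `T`, such as `foo(2, 3)` or `foo(1.5, 2.0)`; a type without
               | that impl is rejected at the call site.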
               | codeflo wrote:
               | To add to this, even the Foo-wrapper is gone, just the
               | i32 remains. Rust values are amorphous data blobs at
               | runtime.
               | CryZe wrote:
               | ABI wise that is not true though. structs have struct
               | ABI, even just a newtype struct around an integer will
               | not use integer ABI unless annotated with
               | #[repr(transparent)].
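               | A small sketch of the annotation in question. `Meters`,
               | `double` and `same_layout_as_i32` are hypothetical names;
               | the attribute and the `size_of`/`align_of` functions are
               | standard Rust.

```rust
use std::mem::{align_of, size_of};

/// Newtype that is guaranteed to share `i32`'s layout and ABI.
#[repr(transparent)]
struct Meters(i32);

/// Passed across an `extern "C"` boundary the same way a plain `i32` would be.
extern "C" fn double(m: Meters) -> Meters {
    Meters(m.0 * 2)
}

/// Compares the newtype's size and alignment with those of `i32`.
fn same_layout_as_i32() -> bool {
    size_of::<Meters>() == size_of::<i32>() && align_of::<Meters>() == align_of::<i32>()
}
```

               | Without the attribute the size would usually still match,
               | but the calling convention for the struct would not be
               | guaranteed to be that of the integer.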
               | estebank wrote:
               | Yes, that's true but that is an implementation detail
               | that only comes into play when dealing with ABI, and
               | _then_ you should be using #[repr(transparent)] to ensure
               | that the compiler won't do something else :)
               | codeflo wrote:
               | Sure, it's good to point out the difference between "the
               | behavior of a typical optimizing compiler" and "things
               | actually guaranteed by the language". The context of the
               | discussion was the former, I think. I'm not even that
               | certain that monomorphization is actually required in
               | theory.
               | estebank wrote:
               | Yes, monomorphization isn't _needed_ in theory, as long
               | as the user-visible behavior remains the same, and in
               | practice the team is exploring options[1] to identify
               | cases where the currently manual practice of writing
               |     pub fn foo<T: AsRef<X>>(x: T) {
               |         inner_foo(x.as_ref());
               |     }
               |     fn inner_foo(_: &X) { todo!() }
               | can be instead done by the compiler automatically
               | (turning monomorphized code back into polymorphic code,
               | hence the polymorphization name).
               | [1]: https://rustc-dev-guide.rust-
               | lang.org/backend/monomorph.html...
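               | A concrete instance of that manual pattern, as a sketch
               | with `str` standing in for the placeholder `X`. The
               | function names are hypothetical.

```rust
/// Thin generic shim: one small copy is generated per caller type `T`.
pub fn count_words<T: AsRef<str>>(text: T) -> usize {
    inner_count_words(text.as_ref())
}

/// Non-generic body: compiled once, however many `T`s call the shim.
fn inner_count_words(text: &str) -> usize {
    text.split_whitespace().count()
}
```

               | Callers can pass a `&str` or a `String`; only the one-line
               | shim is duplicated per type, while the real work lives in a
               | single non-generic function.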
               | estebank wrote:
               | Expanding on trait objects: these are implemented with
               | "V-Tables", structs holding pointers to the trait's
               | methods; a trait object pairs a pointer to that table
               | with a pointer to the underlying value. This means that
               | if you _need_ to know what the underlying type is, you
               | have to do something fancy, usually referred to as
               | "reflection".
               | Also, invoking generic functions that use V-Tables
               | requires "chasing pointers", which makes cache locality
               | worse (because data might not be in the same cache read
               | as the v-table itself), but makes the generated binary
               | smaller (because if you have something like Foo<T> used
               | with 1000 types, with monomorphization you end up with
               | 2000 generated types in the binary, instead of 1001 with
               | trait objects).
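               | A short sketch of the two-pointer layout and of dispatch
               | through the table. `Shape`, `Square`, `Rect` and both
               | functions are hypothetical names made up for illustration.

```rust
use std::mem::size_of;

trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);
struct Rect(f64, f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

impl Shape for Rect {
    fn area(&self) -> f64 {
        self.0 * self.1
    }
}

/// One compiled copy; each `area` call goes through the value's vtable.
fn total_area(shapes: &[&dyn Shape]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

/// A `&dyn Shape` is two pointers wide: a data pointer plus a vtable pointer.
fn dyn_ref_is_two_pointers() -> bool {
    size_of::<&dyn Shape>() == 2 * size_of::<usize>()
        && size_of::<&Square>() == size_of::<usize>()
}
```

               | `total_area` exists once in the binary no matter how many
               | shape types are added, which is the size trade-off
               | described above.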
               | Joker_vD wrote:
               | Pretty sure that some usage patterns of polymorphic types
               | cannot be completely monomorphized. Here's an example in
               | Go:
               |     package main
               |
               |     import (
               |         "fmt"
               |     )
               |
               |     type wrapper[T any] struct {
               |         Value T
               |     }
               |
               |     func (w wrapper[T]) String() string {
               |         return fmt.Sprintf("{%v}", w.Value)
               |     }
               |
               |     func stringWrapped[T any](n int, v T) string {
               |         if n == 0 {
               |             return fmt.Sprintf("%v", v)
               |         }
               |         return stringWrapped(n-1, wrapper[T]{Value: v})
               |     }
               |
               |     func main() {
               |         n := 0
               |         fmt.Scanf("%d", &n)
               |         result := stringWrapped(n, "test")
               |         fmt.Println(result)
               |     }
               | Go refuses to compile because it can't possibly generate
               | all instances of wrapper[T] that this program may use:
               | wrapper[string], wrapper[wrapper[string]],
               | wrapper[wrapper[wrapper[string]]], etc.
               | estebank wrote:
               | Rust will complain about a recursion limit being reached
               | during instantiation[1]. The solution in Rust is to use
               | &dyn Trait or Box<dyn Trait> instead.[2]
               | [1]: https://play.rust-
               | lang.org/?version=stable&mode=debug&editio...
               | [2]: https://play.rust-
               | lang.org/?version=stable&mode=debug&editio...
               | ^ This blows the stack because it keeps calling itself
               | with no break condition, but shows how the type system
               | accepted the code.
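               | A sketch of the same program in Rust with a break
               | condition, using `&dyn Display` so that only one copy of
               | the function is needed. This is not the code behind the
               | truncated playground links; `Wrapper` and `string_wrapped`
               | are hypothetical names mirroring the Go example.

```rust
use std::fmt;

/// Wraps any displayable value behind a type-erased reference.
struct Wrapper<'a>(&'a dyn fmt::Display);

impl fmt::Display for Wrapper<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // `{{` and `}}` are literal braces around the inner value.
        write!(f, "{{{}}}", self.0)
    }
}

/// Wraps `v` in `n` layers of braces; the depth is decided at run time.
fn string_wrapped(n: u32, v: &dyn fmt::Display) -> String {
    if n == 0 {
        return v.to_string();
    }
    // The argument type is always `&dyn Display`, so no new instantiation is needed.
    string_wrapped(n - 1, &Wrapper(v))
}
```

               | Since the recursive call always passes the same erased
               | type, the compiler never has to generate
               | `Wrapper<Wrapper<...>>` instantiations, so the depth can
               | come from user input.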
               | gpderetta wrote:
               | I think this is called polymorphic recursion in Haskell
               | circles.
               | In C++ you can monomorphize as long as you can somehow
               | prove the recursion terminates at compile time (for
               | example by threading a static recursion counter).
               | dgb23 wrote:
               | Not exactly the same thing but JITs can turn dynamic
               | objects into structs if the structure is consistent. JS
               | runtimes and Julia do this as far as I know.
               | adgjlsfhk1 wrote:
               | Julia doesn't do this. It just has structs in the first
               | place.
               | mmis1000 wrote:
               | Firefox's JS runtime also does tricks like generating
               | multiple copies of an optimized function when it is called
               | from multiple call sites, instead of one copy with lots of
               | if/else branches. That way, a function that frequently
               | receives different parameter types from different call
               | sites no longer performs poorly.
               | It's probably exactly how templates work, except the
               | details are invisible to users.
               | https://hacks.mozilla.org/2020/11/warp-improved-js-
               | performan...
               | estebank wrote:
               | Yes! Java as well. And this is how those languages can
               | show impressive benchmarks for consistent workloads. In
               | theory they can even surpass AoT languages. In practice
               | it depends on the specifics.
             | [deleted]
           | planede wrote:
           | That's runtime reflection.
           | Compile time reflection AFAIK is available in D and Zig, and
           | is planned for C++.
             | elcritch wrote:
             | That's right. Nim does as well. It's amazing. Once you get
             | used to having CTTI and being able to use it, it's hard to
             | program without it. Bonus points if you can do basic
             | dependent types too.
             | In C++ you can effectively do CTTI-style programming with
             | SFINAE. C++ has long had runtime type
             | reflection as well (RTTI), though it needs to be compiled
             | in. Looks like there's a boost library for CTTI.
               | Conscat wrote:
               | The C++ reflection improves a lot in C++20, but it's
               | still very limited compared to that aspect of Nim, or
               | even Zig. The std::meta::info and "splices" based on
               | Haskell for C++26 are incredibly exciting to me. I have
               | many use cases in mind. Splices in combination with
               | std::embed will make C++ basically just a bad Racket (but
               | one with inline assembly!).
             | yakubin wrote:
             | Yup. I consider runtime reflection an antifeature, which
             | has negative performance effects, is unsafe (see e.g.
             | log4j) and leads to fragile code.
             | I would however welcome static reflection with open arms.
             | In Rust in particular, I'd prefer it if derive was
             | implemented using static reflection, rather than proc
             | macros.
         | nestorD wrote:
         | The usual argument is that between having macros and focusing on
         | a strong type system, there are very few legitimate use cases for
         | reflection left in Rust.
         | snordgren wrote:
         | Rust has very little influence from reflection-heavy languages
         | like Java and C#. On their list of influences
         | (https://doc.rust-lang.org/reference/influences.html), Java is
         | not even mentioned, and C# is only mentioned for its
         | attributes. There is very little overlap between the design
         | philosophies that influenced Rust and Java/C#.
         | Rust does not support inheritance either. But I have never
         | missed either feature in a Rust program.
         | Tuna-Fish wrote:
         | Reflection is typically provided by a runtime, and languages
         | that don't have runtimes usually don't have it. You shouldn't
         | expect a low-level systems language to have reflection. There
         | is no zero-cost way of implementing it.
           | spacechild1 wrote:
           | This is of course only true for runtime reflection. And which
           | language does not have a runtime?
           | Joker_vD wrote:
           | Except Rust has a runtime: [0]. And so, usually, does C (in
           | hosted implementations).
           | [0] https://doc.rust-lang.org/reference/runtime.html
             | pornel wrote:
             | These are a couple of functions executables can call at run
             | time, but they're more like an extra standard library. It's
             | not a runtime in the same sense as a runtime in dynamic or
             | GC languages that manages all objects and is able to know
             | types of arbitrary objects and inspect/trace them.
             | Rust has no run-time type information except limited
             | downcasts via `dyn Any` or explicitly derived traits on
             | per-type basis, and these features compile to type-specific
             | monomorphic code rather than calling some run-time
             | reflection.
               | throwaway894345 wrote:
               | Pretty sure you don't need a runtime to track runtime
               | type info. What we think of as a "runtime" in GC
               | languages is usually several distinct things (a
               | scheduler, a GC, and maybe some other stuff in the case
               | of Java/.Net).
             | [deleted]
       | armchairhacker wrote:
       | Does this still work if the application is compiled in release
       | mode or with optimizations?
       | Even if not, this is still very useful for debugging.
         | jswrenn wrote:
         | It only works if DWARF is generated. By default, the `release`
         | profile of Cargo sets `debug = false` [0]. But, it's quite easy
         | to override this setting, and have a build that is both
         | optimized and includes debuginfo.
         | [0]: https://doc.rust-
         | lang.org/cargo/reference/profiles.html#rele...
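         | As a sketch, the override would go in the project's Cargo.toml.
         | `debug` is the documented Cargo profile key; whether full
         | debuginfo or a lower level is enough for deflect is not stated
         | here, so `true` is an assumption.

```toml
# Cargo.toml: keep optimizations on, but also emit debuginfo in release builds.
[profile.release]
debug = true
```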
       | jeroenhd wrote:
       | Does using DWARF info imply that this will break when you strip
       | the resulting executable? I often strip my Rust binaries because
       | it practically halves the application size, which can become
       | quite a lot in a language where you're statically linking
       | everything.
       | Regardless, quite an ingenious use of standard ELF features, I
       | didn't think this would be possible in Rust without adding some
       | kind of VM around reflection code.
         | jswrenn wrote:
         | Yes, unfortunately that's a tradeoff here. Rust does support
         | splitting debug info into other files, but Deflect doesn't
         | support loading split debuginfo _yet_.
         | HideousKojima wrote:
         | C# has similar issues where they have to be conservative about
         | what they trim from binaries for AoT in case it is used for
         | reflection, so I imagine you'd run into the same issues for
         | almost any compiled language you want to implement reflection
         | for.
       | davidhyde wrote:
       | Great writeup! The defmt logging crate uses a linker script to
       | extract debug symbols so that you get nicely formatted stack
       | traces on embedded systems. It works on Linux, macOS and Windows.
       | I wonder if the same technique can be applied to this project. It
       | needs a runner though so may not be the right approach.
       | https://github.com/knurling-rs/defmt
       (page generated 2022-12-15 23:00 UTC)