Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migration to serde from rustc-serialize causes binary to grow significantly #286

Open
alexbool opened this issue Apr 14, 2016 · 13 comments
Open
Assignees

Comments

@alexbool
Copy link

Some time ago I migrated my crate to serde from rustc-serialize and I was somewhat dissapointed that my binary size increased approximately 2x (and so did compilation times).

If I run nm --size-sort -S, I get a lot of symbols like this:

0000000100120c40 0000000000000370 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17h0683deb22b9931ffE
0000000100123bb0 0000000000000370 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17h2fd21583584d6bf1E
00000001001471c0 0000000000000370 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17h539ca1b6dbf1cb43E
000000010014e650 0000000000000370 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17h5806957261f0bc9bE
0000000100096cd0 0000000000000370 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17hb74320b766f864cbE
00000001001a59e0 0000000000000370 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17hcbf6c07be4ec0d3bE
0000000100109790 0000000000000370 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17hcc2221ebbabcbce1E
000000010015c1e0 0000000000000370 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17hcccb666beff69b03E
0000000100191100 0000000000000370 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17hd99ca0f77ada84c1E
00000001001117e0 0000000000000370 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17hec08322395fba03eE
000000010010c440 0000000000000370 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17hf3862946ff0b8b3cE
...
00000001000b5140 0000000000000380 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17h21580c3267c18d93E
0000000100138910 0000000000000380 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17h5c55d12e7448b4a3E
0000000100193c30 0000000000000380 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17h5d14cb7b1aee2532E
00000001000ab9f0 0000000000000380 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17h5df1c90391bf8f60E
000000010014b7e0 0000000000000380 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17h6b721180eb040b2dE
00000001000d6210 0000000000000380 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17h85b05d12103203d6E
00000001000b00f0 0000000000000380 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17h9385aa3d135608f9E
0000000100106900 0000000000000380 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17hb58e2b12ab99f16dE
00000001000f46a0 0000000000000380 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17hb9a69783daa5ea47E
00000001000bf7c0 0000000000000380 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17hbca6642924436262E
0000000100144350 0000000000000380 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17he7129a26a2066f08E
00000001001355e0 0000000000000380 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17hf7e87397027fcc4aE
00000001000ffbb0 0000000000000380 t __ZN41_$LT$rmp_serde..Deserializer$LT$R$GT$$GT$8read_str17hf8b195b1ed175901E

With rustc-serialize, seems like there is no such symbol mutiplication.

OS X rustc 1.9.0-nightly (a43eb4e77 2016-04-12), serde 0.7.0 (I also tried on Windows and results are the same)

@oli-obk
Copy link
Member

oli-obk commented Apr 14, 2016

Are you compiling with optimizations? Have you tried enabling link time optimizations?

@alexbool
Copy link
Author

I run with -O3. I'll try with -C lto soon.
But it's still suspicious. Both rustc-serialize and serde are compiled with same compiler settings and I get different results.

@oli-obk
Copy link
Member

oli-obk commented Apr 14, 2016

Well... Rust's monomorphisation doesn't merge identical implementations yet. We could improve on this probably by adjusting the macros that generate all the code

@alexbool
Copy link
Author

-C lto doesn't help much. ~3% difference in binary size.

@dtolnay
Copy link
Member

dtolnay commented Jul 11, 2016

From IRC on how eddyb creates a count of monomorphized functions in order to debug compiler memory usage (and also executable size):

eddyb: unoptimized LLVM IR
eddyb: first used grep '^define' to get only the lines defining function bodies
eddyb: then regex replace in my editor to remove everything before @ and everything after (
eddyb: then sort | uniq -c

I will try this and see if there is any low-hanging fruit.

@dtolnay
Copy link
Member

dtolnay commented Feb 13, 2017

Command to show which symbols contribute the most to binary size:

nm -S target/debug/json-benchmark \
  | awk '{
           print $4 |& "rustfilt";
           "rustfilt" |& getline id;
           sub(/::h[0-9a-f]{16}$/, "", id);
           sums[id] += strtonum("0x"$2);
           counts[id] += 1
         }
         END{
           for (id in sums) {
             printf "%8s %4s  %s\n", sums[id], counts[id], id
           }
         }' \
  | sort -nr \
  | head -16

The first column is total number of bytes associated with that function across all the different instantiations of it, second column is number of different instantiations of the function (with different generic types etc). I ran it on json-benchmark for lack of a better thing to run it on. Debug mode:

594744   85  <serde_json::de::Deserializer<R>>::parse_value
187045   85  <serde_json::de::Deserializer<R>>::parse_integer
183834   85  <serde_json::de::Deserializer<R>>::parse_exponent
180242   85  <serde_json::de::Deserializer<R>>::parse_decimal
129856   64  <serde_json::de::SeqVisitor<'a, R> as serde::de::SeqVisitor>::visit_seed
 76946   26  <serde_json::de::MapVisitor<'a, R> as serde::de::MapVisitor>::visit_key_seed
 73037   85  <serde_json::de::Deserializer<R>>::visit_f64_from_parts
 72937   85  <serde_json::de::Deserializer<R>>::parse_number
 69700   85  <serde_json::de::Deserializer<R>>::parse_exponent_overflow
 68943   85  <serde_json::de::Deserializer<R>>::parse_long_integer
 65195   85  serde_json::error::Error::fix_position
 59979    1  <json_benchmark::twitter::_IMPL_DESERIALIZE_FOR_User::<impl serde::de::Deserialize for json_benchmark::twitter::User>::deserialize::__Visitor as serde::de::Visitor>::visit_map
 44385   55  <serde_json::ser::Compound<'a, W, F> as serde::ser::SerializeMap>::serialize_value
 36773    1  <json_benchmark::twitter::_IMPL_DESERIALIZE_FOR_Status::<impl serde::de::Deserialize for json_benchmark::twitter::Status>::deserialize::__Visitor as serde::de::Visitor>::visit_map
 35042    1  <json_benchmark::twitter::_IMPL_DESERIALIZE_FOR_User::<impl serde::de::Deserialize for json_benchmark::twitter::User>::deserialize::__Visitor as serde::de::Visitor>::visit_seq
 26925    1  <json_benchmark::citm_catalog::_IMPL_DESERIALIZE_FOR_CitmCatalog::<impl serde::de::Deserialize for json_benchmark::citm_catalog::CitmCatalog>::deserialize::__Visitor as serde::de::Visitor>::visit_map

Release mode:

174550   42  <core::marker::PhantomData<T> as serde::de::DeserializeSeed>::deserialize
 53537    4  <serde_json::de::Deserializer<R>>::parse_value
 29883   61  <serde_json::de::Deserializer<R>>::parse_integer
 19929   61  <serde_json::de::Deserializer<R>>::parse_exponent
 18492   61  <serde_json::de::Deserializer<R>>::visit_f64_from_parts
 16438   61  <serde_json::de::Deserializer<R>>::parse_decimal
 15927    1  <json_benchmark::twitter::_IMPL_DESERIALIZE_FOR_User::<impl serde::de::Deserialize for json_benchmark::twitter::User>::deserialize::__Visitor as serde::de::Visitor>::visit_map
 12567    1  <json_benchmark::serde_json::Library as json_benchmark::Library>::parse_struct
 12425    1  <json_benchmark::serde_json::Library as json_benchmark::Library>::stringify_struct
 12240    1  <json_benchmark::twitter::_IMPL_DESERIALIZE_FOR_Status::<impl serde::de::Deserialize for json_benchmark::twitter::Status>::deserialize::__Visitor as serde::de::Visitor>::visit_map
 11186    1  json_benchmark::twitter::_IMPL_SERIALIZE_FOR_Status::<impl serde::ser::Serialize for json_benchmark::twitter::Status>::serialize
 10693    1  json_benchmark::num_trials
 10084   61  <serde_json::de::Deserializer<R>>::parse_exponent_overflow
 10076   52  <serde_json::de::Deserializer<R>>::parse_number
  9570    1  stats_arena_print
  8628   21  <serde_json::ser::Compound<'a, W, F> as serde::ser::SerializeStruct>::serialize_field

The number parsing is definitely something we can improve. Those functions are generic on the Visitor type but they do a lot of work that does not touch the visitor and could be factored out.

I don't see any other obvious culprits. We just need to work through these and reduce the number of redundant copies of code being compiled.

@dtolnay
Copy link
Member

dtolnay commented Feb 13, 2017

If anyone has large binary projects using Serde where you care about compile time / executable size, please comment with a link to the repo. I would like to make sure that the things we optimize will actually improve your situation.

@dtolnay
Copy link
Member

dtolnay commented May 3, 2017

serde-rs/json#313 has a good minimal case to reproduce this.

extern crate pandoc_ast;
fn main() {
    pandoc_ast::filter(String::new(), |p| p);
}

Instantiations are listed in serde-rs/json#313 (comment). Again I think we should tackle the number parsing area first.

rubdos pushed a commit to rubdos/serde that referenced this issue Jun 20, 2017
…viper

impl remaining num-traits for std::num::Wrapping<T>

This is a (late) follow-up for [rust-num/num#279 since I realized that implementing `Num` for `Wrapping<T>` was merely half of the work.

This PR makes `Wrapping<T>` implement the remaining appropriate traits, granting it the ability to really be used a complete substitute for its primitive integer counterparts.
Some benefits are :
- Less refactoring for users using `num`'s traits replacing some primitives by their `Wrapping` counterpart (same for the opposite);
- Since `Wrapping<T>` is from `std`, nobody except us can `impl` our traits for it, so people don't have to create their own.
@raphlinus
Copy link

I'm seeing a pretty big impact on xi-editor.

Debug:

  399029   88  <serde::private::de::content::ContentDeserializer<'de, E> as serde::de::Deserializer<'de>>::deserialize_any
  198961 3289  core::ptr::drop_in_place
  181294  502  <core::option::Option<T>>::map
  173009   68  serde_json::value::de::<impl serde::de::Deserializer<'de> for serde_json::value::Value>::deserialize_any
  165551  401  <core::result::Result<T, E>>::map
   76056   86  <alloc::raw_vec::RawVec<T, A>>::double
   64426   20  <serde::private::de::content::ContentRefDeserializer<'a, 'de, E> as serde::de::Deserializer<'de>>::deserialize_any
   62602  116  <core::result::Result<T, E>>::map_err
   59192   28  <alloc::btree::map::IntoIter<K, V> as core::iter::iterator::Iterator>::next
   54314  134  <alloc::btree::node::NodeRef<BorrowType, K, V, Type>>::ascend
   51567   63  <alloc::raw_vec::RawVec<T, A>>::reserve
   51300   76  <alloc::raw_vec::RawVec<T, A>>::allocate_in
   50644  108  <serde::de::value::MapDeserializer<'de, I, E>>::end
   43181   29  std::collections::hash::map::search_hashed
   42266   67  <alloc::vec::Vec<T>>::extend_desugared
   41073   67  <alloc::vec::Vec<T> as alloc::vec::SpecExtend<T, I>>::spec_extend

Release:

  204186 1732  core::ptr::drop_in_place
  168838   54  <core::marker::PhantomData<T> as serde::de::DeserializeSeed<'de>>::deserialize
  126183   11  <serde::private::de::content::ContentDeserializer<'de, E> as serde::de::Deserializer<'de>>::deserialize_any
   72758    1  <xi_core_lib::rpc::_IMPL_DESERIALIZE_FOR_CoreNotification::<impl serde::de::Deserialize<'de> for xi_core_lib::rpc::CoreNotification>::deserialize::__Visitor<'de> as serde::de::Visitor<'de>>::visit_map
   55573    1  <xi_core_lib::rpc::_IMPL_DESERIALIZE_FOR_EditNotification::<impl serde::de::Deserialize<'de> for xi_core_lib::rpc::EditNotification>::deserialize::__Visitor<'de> as serde::de::Visitor<'de>>::visit_map
   48550    7  serde_json::value::de::<impl serde::de::Deserializer<'de> for serde_json::value::Value>::deserialize_any
   46759    1  <xi_core_lib::internal::plugins::rpc::_IMPL_DESERIALIZE_FOR_PluginNotification::<impl serde::de::Deserialize<'de> for xi_core_lib::internal::plugins::rpc::PluginNotification>::deserialize::__Seed<'de> as serde::de::DeserializeSeed<'de>>::deserialize
   41345    1  xi_core_lib::internal::tabs::Documents::handle_notification
   40838    1  <xi_core_lib::internal::plugins::rpc::_IMPL_DESERIALIZE_FOR_PluginNotification::<impl serde::de::Deserialize<'de> for xi_core_lib::internal::plugins::rpc::PluginNotification>::deserialize::__Visitor<'de> as serde::de::Visitor<'de>>::visit_map
   35555    1  <xi_core_lib::rpc::_IMPL_DESERIALIZE_FOR_CoreNotification::<impl serde::de::Deserialize<'de> for xi_core_lib::rpc::CoreNotification>::deserialize::__Seed<'de> as serde::de::DeserializeSeed<'de>>::deserialize
   32647    1  <xi_core_lib::internal::plugins::manifest::_IMPL_DESERIALIZE_FOR_PluginDescription::<impl serde::de::Deserialize<'de> for xi_core_lib::internal::plugins::manifest::PluginDescription>::deserialize::__Visitor<'de> as serde::de::Visitor<'de>>::visit_map
   32604  111  <alloc::arc::Arc<T>>::drop_slow
   31838    1  <xi_core_lib::rpc::_IMPL_DESERIALIZE_FOR_EditNotification::<impl serde::de::Deserialize<'de> for xi_core_lib::rpc::EditNotification>::deserialize::__Seed<'de> as serde::de::DeserializeSeed<'de>>::deserialize
   24323    1  <xi_core_lib::rpc::_IMPL_DESERIALIZE_FOR_CoreRequest::<impl serde::de::Deserialize<'de> for xi_core_lib::rpc::CoreRequest>::deserialize::__Visitor<'de> as serde::de::Visitor<'de>>::visit_map
   22424    3  ref.1B
   21805    8  serde::de::SeqAccess::next_element

That's measured on linux. Happy to do more digging, and try low-hanging fruit that might improve things.

@dtolnay
Copy link
Member

dtolnay commented Apr 16, 2018

Thanks Raph! Those ContentDeserializer::deserialize_any, ContentRefDeserializer::deserialize_any, and Value::deserialize_any numbers definitely should not be that high. I will see what I can do. So far there has not been any effort to optimize any of those.

ContentRefDeserializer shows up when dealing with untagged enums and ContentDeserializer shows up in internally tagged and adjacently tagged enums. They are the mechanism Serde uses for buffering data until we are able to determine which variant of the enum we are looking at.

@raphlinus
Copy link

@dtolnay We're heavily using the tag mechanism on enums (see https://github.com/google/xi-editor/blob/master/rust/core-lib/src/rpc.rs#L89), so not surprising if that's the code path affected.

Thanks for any help in improving this.

@dtolnay
Copy link
Member

dtolnay commented Apr 17, 2018

As a start, I opened xi-editor/xi-editor#617 with a 22% improvement from upgrading to serde 1.0.39 and serde_json 1.0.15.

@dtolnay
Copy link
Member

dtolnay commented Apr 21, 2018

I opened xi-editor/xi-editor#628 with a 12% improvement this time focusing on the release binary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

4 participants