Skip to content

Conversation

@casperisfine
Copy link
Contributor

Fix: #179

YJIT doesn't support being interrupted by signal in random places, and probably will never support it.

Basically turn off #150, if YJIT is enabled.

cc @tenderlove @XrXr @jhawthorn

Fix: tmm1#179 YJIT doesn't support being interrupted by signal in random places, and probably will never support it.
@casperisfine
Copy link
Contributor Author

I got some weird failures on 2.3 and 2.4, but not sure it's worth digging into it, I'd be tempted to make 2.5 the minimum tested version.

@casperisfine
Copy link
Contributor Author

I tested this on our YJIT CI and it does get rid of the VM crashes.

@tenderlove tenderlove merged commit ec245ef into tmm1:master Jul 18, 2022
@tenderlove tenderlove deleted the yjit-head-compat branch July 18, 2022 15:53
* See https://github.com/ruby/ruby/commit/0e276dc458f94d9d79a0f7c7669bde84abe80f21
*/
#if RUBY_API_VERSION_MAJOR < 3
stackprof_use_postponed_job = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a bug.
If Ruby < 3.0 (or YJIT), we want to use postponed job.
In other words rb_profile_frames() can only be used anywhere for Ruby 3.0+ no YJIT.
This is most likely the cause of https://bugs.ruby-lang.org/issues/18967 and recent CI failures #182 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in #186

@eregon
Copy link
Contributor

eregon commented May 21, 2023

One surprising side effect of this is wildly different profiles when using YJIT vs not, and some of these differences are not because of JITed code but because of postponed jobs when using YJIT vs not.

For instance I'm looking at YARP parse, dump and load performance (on branch varint).
3.2.1 without YJIT:

$ ruby -Ilib -ryarp -rstackprof/autorun -rbenchmark -e '10.times { p Benchmark.realtime { Dir.glob("top-100-gems/activesupport-7.0.4.3/**/*.rb") { |file| code = File.read(file); YARP.load code, YARP.dump(code, file) } } }' 0.20659038900066662 0.20447865699952672 0.20755467499930091 ... StackProf results dumped at: /tmp/stackprof20230521-28681-srrvhw.dump $ stackprof /tmp/stackprof20230521-28681-srrvhw.dump ================================== Mode: wall(1000) Samples: 2006 (0.35% miss rate) GC: 144 (7.18%) ================================== TOTAL (pct) SAMPLES (pct) FRAME 273 (13.6%) 273 (13.6%) YARP.dump 349 (17.4%) 237 (11.8%) Class#new 275 (13.7%) 227 (11.3%) YARP::Serialize::Loader#load_integer 1532 (76.4%) 163 (8.1%) YARP::Serialize::Loader#load_node 496 (24.7%) 156 (7.8%) YARP::Serialize::Loader#load_token 97 (4.8%) 97 (4.8%) (sweeping) 503 (25.1%) 79 (3.9%) YARP::Serialize::Loader#load_location 79 (3.9%) 79 (3.9%) StringIO#read 1517 (75.6%) 58 (2.9%) Enumerable#map 58 (2.9%) 58 (2.9%) StringIO#getbyte 397 (19.8%) 57 (2.8%) YARP::Serialize::Loader#load_optional_token 1862 (92.8%) 54 (2.7%) Integer#times 1478 (73.7%) 51 (2.5%) YARP::Serialize::Loader#load_optional_node 48 (2.4%) 48 (2.4%) String#[] 48 (2.4%) 48 (2.4%) String#unpack1 47 (2.3%) 47 (2.3%) (marking) 41 (2.0%) 41 (2.0%) YARP::Location#initialize 38 (1.9%) 38 (1.9%) IO.read 36 (1.8%) 36 (1.8%) String#force_encoding 27 (1.3%) 27 (1.3%) YARP::Token#initialize 1517 (75.6%) 18 (0.9%) Enumerator#each 66 (3.3%) 17 (0.8%) YARP::Serialize::Loader#load_optional_location 45 (2.2%) 13 (0.6%) YARP::Serialize::Loader#load_string 11 (0.5%) 11 (0.5%) StringIO#pos 11 (0.5%) 11 (0.5%) YARP::CallNode#initialize 7 (0.3%) 7 (0.3%) StringIO#pos= 6 (0.3%) 6 (0.3%) Integer#<< 1541 (76.8%) 5 (0.2%) YARP::Serialize::Loader#load 5 (0.2%) 5 (0.2%) YARP::DefNode#initialize 1861 (92.8%) 4 (0.2%) Dir.glob 

3.2.1 with YJIT:

$ ruby --yjit -Ilib -ryarp -rstackprof/autorun -rbenchmark -e '10.times { p Benchmark.realtime { Dir.glob("top-100-gems/activesupport-7.0.4.3/**/*.rb") { |file| code = File.read(file); YARP.load code, YARP.dump(code, file) } } }' 0.17736748199968133 0.15191541199965286 0.14858805099993333 ... StackProf results dumped at: /tmp/stackprof20230521-28997-ym472m.dump $ stackprof /tmp/stackprof20230521-28997-ym472m.dump ================================== Mode: wall(1000) Samples: 1512 (0.40% miss rate) GC: 125 (8.27%) ================================== TOTAL (pct) SAMPLES (pct) FRAME 290 (19.2%) 290 (19.2%) BasicObject#initialize 144 (9.5%) 144 (9.5%) String#unpack1 128 (8.5%) 128 (8.5%) StringIO#getbyte 117 (7.7%) 117 (7.7%) YARP::Location#initialize 109 (7.2%) 109 (7.2%) StringIO#read 102 (6.7%) 102 (6.7%) String#[] 93 (6.2%) 93 (6.2%) (sweeping) 78 (5.2%) 78 (5.2%) YARP::Token#initialize 68 (4.5%) 68 (4.5%) String#force_encoding 35 (2.3%) 35 (2.3%) Integer#== 34 (2.2%) 34 (2.2%) IO.read 32 (2.1%) 32 (2.1%) (marking) 1386 (91.7%) 31 (2.1%) Integer#times 112 (7.4%) 23 (1.5%) YARP::Serialize::Loader#load_integer 23 (1.5%) 23 (1.5%) Integer#<< 19 (1.3%) 19 (1.3%) StringIO#pos 1029 (68.1%) 18 (1.2%) Enumerable#map 16 (1.1%) 16 (1.1%) YARP::CallNode#initialize 13 (0.9%) 13 (0.9%) YARP::StatementsNode#initialize 10 (0.7%) 10 (0.7%) YARP::ArgumentsNode#initialize 10 (0.7%) 10 (0.7%) YARP.dump 9 (0.6%) 9 (0.6%) YARP::LocalVariableReadNode#initialize 999 (66.1%) 8 (0.5%) YARP::Serialize::Loader#load_optional_node 255 (16.9%) 8 (0.5%) YARP::Serialize::Loader#load_optional_token 341 (22.6%) 7 (0.5%) YARP::Serialize::Loader#load_token 7 (0.5%) 7 (0.5%) StringIO#pos= 1029 (68.1%) 6 (0.4%) Enumerator#each 1385 (91.6%) 5 (0.3%) Dir.glob 1040 (68.8%) 4 (0.3%) YARP::Serialize::Loader#load_node 4 (0.3%) 4 (0.3%) YARP::ConstantReadNode#initialize 

YARP.dump (a method implemented in C) had 273 samples (=~ 273 ms) in the first profile, but only 10 samples in the second profile. So with YJIT (with postponed jobs) it seems the time spent in C extension methods is misreported, and probably attributed to something else incorrectly.
And also the time spent in JITed code might also be misattributed.

For instance 19.2% BasicObject#initialize seems unlikely and too high here for an empty method, that's probably in part a misattribution for allocation, i.e., for Class#new, and possibly a misattribution for YARP.dump (that's just a guess though).

@casperisfine
Copy link
Contributor Author

Note that historically, stackprof would just always go through postponed jobs.

It's only since #150 that postponed jobs are skipped if you are on Ruby 3+.

So yeah, enabling YJIT introduce some amount of inaccuracy, but I don't see what we can do about it :/

@ivoanjo
Copy link
Contributor

ivoanjo commented May 22, 2023

In the JVM, there's a ~unofficial API that profilers such as async-profiler where they can still sample from the signal handler, even in the presence of JIT and other weird VM states (such as during GC).

That's obviously a non-trivial approach, but making rb_profile_frames (or some successor) always be usable from the signal handler would help avoiding this sampling bias.

@casperisfine
Copy link
Contributor Author

When I discussed this with the YJIT team, they said it was unlikely this would ever be supported. I only have limited knowledge here, but I get that it's tricky to do.

@ivoanjo
Copy link
Contributor

ivoanjo commented May 22, 2023

I wonder if it's mostly a performance concern (e.g. limiting the kind of optimizations that can be done, or being more costly having to push the stack frames), or if there's other concerns as well.

E.g. if the main concern was performance, it occurs to me that it would be interesting to maybe make it an option to "go slower, but more observable". That way we could enable the flag we you really need more visibility, with knowledge that Ruby would go slower.

@casperisfine
Copy link
Contributor Author

IIRC it was both, as it both complicate the code and made it slower. That conversation was a long time ago.

But overall I agree that Ruby profiling APIs are less than ideal, and some work upstream would be nice.

@eregon
Copy link
Contributor

eregon commented May 22, 2023

I think the main question is why is it unsafe to use rb_profile_frames() in a signal handler with YJIT enabled?
Is the problem that rb_profile_frames() can't walk the stack to find the caller?
I think https://bugs.ruby-lang.org/issues/19541 might be related.

@casperisfine
Copy link
Contributor Author

why is it unsafe to use rb_profile_frames() in a signal handler with YJIT enabled?

The conversation was too long ago, so I'll defer to @tenderlove here.

@tenderlove
Copy link
Collaborator

@eregon At the time we did this, YJIT didn't "atomically" write Ruby frames. It would push a frame before filling out the frame's memory, and if rb_profile_frames was ever executed when that happened, it would break.

The redmine ticket you're linking to is with regard to C stack frames, rb_profile_frames only walks Ruby frames. I don't think we want to force YJIT (or any JIT) to write stack frames in a certain order, so it seems like maybe we need to make rb_profile_frames JIT aware.

Regardless, I don't think this PR has anything to do with the redmine issue.

@eregon
Copy link
Contributor

eregon commented May 23, 2023

It would push a frame before filling out the frame's memory

Is there any performance advantage to do so? I'm thinking "push a frame" is basically updating some pointer by writing to memory, so performance-wise it might be identical to do it in a different order. Doing it like the interpreter ("push" last) would then let rb_profile_frames() work as-is probably. And in fact the JIT can easily choose the order of writes since it emits the assembly, unlike the interpreter.

But maybe it's about filling the frame lazily and only when needed, e.g., just before a side exit?

The redmine ticket you're linking to is with regard to C stack frames, rb_profile_frames only walks Ruby frames

Ah, my bad, thanks for clarifying.

@ianks
Copy link
Contributor

ianks commented Jun 16, 2023

Thanks for the information here, y’all. My question is: can we currently trust rb_profile_frames when YJIT is enabled?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

7 participants