Bug #16809: Fiber crashes with --with-coroutine=copy

Added by ncopa (Natanael Copa) over 5 years ago. Updated over 4 years ago.

Status:
Closed
Target version:
-
ruby -v:
ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [s390x-linux-musl]
[ruby-core:98026]

Description

./revision.h unchanged
#190 test_fiber.rb:15:in `<top (required)>':
     Fiber.new{ }.resume
     :ok
  #=> "" (expected "ok")
#192 test_fiber.rb:26:in `<top (required)>':
     fibers = 100.times.collect{Fiber.new{Fiber.yield}}
     fibers.each(&:resume)
     fibers.each(&:resume)
     :ok
  #=> "" (expected "ok")
#193 test_fiber.rb:33:in `<top (required)>':
     at_exit { Fiber.new{}.resume }
  #=> killed by SIGFPE (signal 8)
#194 test_fiber.rb:37:in `<top (required)>':
     Fiber.new(&Object.method(:class_eval)).resume("foo")
  #=> killed by SIGFPE (signal 8) [ruby-dev:34128]
test_fiber.rb FAIL 4/5
#934 test_massign.rb:165:in `<top (required)>':
     a,s=[],"aaa"
     300.times { a<<s; s=s.succ }
     eval <<-END__
     GC.stress=true
     Fiber.new do
       #{ a.join(",") },*zzz=1
     end.resume
     END__
     :ok
  #=> "" (expected "ok") [ruby-dev:32581]
test_massign.rb FAIL 1/34
#1391 test_thread.rb:310:in `<top (required)>':
     g = enum_for(:local_variables)
     loop { g.next }
  #=> killed by SIGFPE (signal 8) [ruby-dev:34128]
#1392 test_thread.rb:315:in `<top (required)>':
     g = enum_for(:block_given?)
     loop { g.next }
  #=> killed by SIGFPE (signal 8) [ruby-dev:34128]
#1393 test_thread.rb:320:in `<top (required)>':
     g = enum_for(:binding)
     loop { g.next }
  #=> killed by SIGFPE (signal 8) [ruby-dev:34128]
#1394 test_thread.rb:325:in `<top (required)>':
     g = "abc".enum_for(:scan, /./)
     loop { g.next }
  #=> killed by SIGFPE (signal 8) [ruby-dev:34128]
#1395 test_thread.rb:330:in `<top (required)>':
     g = Module.enum_for(:new)
     loop { g.next }
  #=> killed by SIGFPE (signal 8) [ruby-dev:34128]
test_thread.rb FAIL 5/48
Thread count: 10000 (skipping)
FAIL 10/1409 tests failed
make: *** [uncommon.mk:751: yes-btest-ruby] Error 1

May be related to this warning:

compiling coroutine/copy/Context.c
coroutine/copy/Context.c: In function 'coroutine_restore_stack_padded':
coroutine/copy/Context.c:87:34: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
   87 |     _longjmp(context->state, 1 | (int)buffer);
      |

Updated by puchuu (Andrew Aladjev) about 5 years ago #1 [ruby-core:100085]

I've tested the copy coroutine. Unfortunately, it is currently completely broken: hangs, segfaults, etc.

Updated by jeremyevans0 (Jeremy Evans) about 5 years ago #2 [ruby-core:100086]

  • Subject changed from ruby testsuite fails on s390x alpine (musl) with --with-coroutine=copy to Fiber crashes with --with-coroutine=copy
  • Status changed from Open to Assigned
  • Assignee set to ioquatix (Samuel Williams)

OpenBSD/sparc64 (which uses the copy coroutine) is similarly broken with regard to fibers. Even something as simple as ruby27 -e 'Fiber.new{Fiber.yield}.resume' crashes (ruby26 runs it fine). I'm changing the title to be more general, since this does not affect only s390x Alpine (musl).
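
For anyone who wants to check a build quickly, a few of the failing snippets from the description can be gathered into one small script. This is only a sketch: the names FIBER_CHECKS and run_fiber_checks are made up for illustration and are not part of Ruby's test suite. Note that a crash such as SIGFPE or a segfault kills the process outright rather than raising an exception, so on a broken build the script is expected to die instead of printing results.

```ruby
# Hypothetical helper, not part of ruby/ruby: each entry mirrors a snippet
# from the btest output in the description.
FIBER_CHECKS = {
  "empty fiber" => -> {
    Fiber.new { }.resume
    :ok
  },
  "yield and resume 100 fibers" => -> {
    fibers = 100.times.collect { Fiber.new { Fiber.yield } }
    fibers.each(&:resume) # run each fiber up to its Fiber.yield
    fibers.each(&:resume) # resume each fiber so that it finishes
    :ok
  },
  "fiber-backed external enumerator" => -> {
    g = "abc".enum_for(:scan, /./)
    loop { g.next } # loop rescues StopIteration at the end
    :ok
  },
}

# Runs every check and returns a hash of name => :ok or the raised exception.
def run_fiber_checks(checks = FIBER_CHECKS)
  checks.map do |name, check|
    result = begin
      check.call
    rescue StandardError => e
      e
    end
    [name, result]
  end.to_h
end

if __FILE__ == $PROGRAM_NAME
  run_fiber_checks.each { |name, result| puts "#{name}: #{result.inspect}" }
end
```

On a working build every entry should come back as :ok. Running the script with the interpreter under test (for example the ruby27 binary mentioned above) is the whole usage.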

Updated by jeremyevans0 (Jeremy Evans) about 5 years ago #3 [ruby-core:100096]

It looks like sometimes the copy coroutine implementation can segfault even on x86_64: https://travis-ci.org/github/ruby/ruby/jobs/729643639

Updated by ioquatix (Samuel Williams) about 5 years ago #4 [ruby-core:100286]

This might be a pointer alignment issue / problem with the alloca elision.

After playing around with godbolt compiler explorer, I think this might be one option:

https://github.com/ruby/ruby/pull/3624

However, I wouldn't be surprised if it doesn't solve the issue.

Updated by jeremyevans0 (Jeremy Evans) about 5 years ago #5 [ruby-core:100674]

I tried pull request #3624 on OpenBSD/sparc64 and it still crashed.

I was able to come up with a fix that works on OpenBSD/sparc64, as long as a couple of files are compiled without optimization: https://github.com/ruby/ruby/pull/3726

Updated by ioquatix (Samuel Williams) about 5 years ago #6 [ruby-core:100757]

I think we found the root cause of this, and it should be addressed by:

https://github.com/ruby/ruby/pull/3624/commits/9de559acc82a28bb0d912ed55cd36cf6f652ea9f

However, @jeremyevans0 (Jeremy Evans) is still testing it.

Updated by ioquatix (Samuel Williams) almost 5 years ago #7 [ruby-core:101308]

  • Status changed from Assigned to Closed

Updated by ioquatix (Samuel Williams) almost 5 years ago #8 [ruby-core:101309]

  • Backport changed from 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN to 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: REQUIRED

@jeremyevans0 (Jeremy Evans) can you manage the backport? Or who is responsible?

This commit (and only this commit) should be backported: https://github.com/ruby/ruby/pull/3624/commits/440983fa9e7695d83def190e9701b5a22e076495

Updated by jeremyevans0 (Jeremy Evans) almost 5 years ago #9 [ruby-core:101319]

  • Backport changed from 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: REQUIRED to 2.5: DONTNEED, 2.6: DONTNEED, 2.7: REQUIRED

ioquatix (Samuel Williams) wrote in #note-8:

@jeremyevans0 (Jeremy Evans) can you manage the backport? Or who is responsible?

The branch maintainer is responsible. For 2.7, that is currently @nagachika (Tomoyuki Chikanaga).

I updated the backport flag to indicate this is only needed by 2.7 and not earlier versions.

Updated by nagachika (Tomoyuki Chikanaga) over 4 years ago #10 [ruby-core:102946]

  • Backport changed from 2.5: DONTNEED, 2.6: DONTNEED, 2.7: REQUIRED to 2.5: DONTNEED, 2.6: DONTNEED, 2.7: DONE

ruby_2_7 d84cc717020be1da7d89b6bda02d1427f9593968 merged revision(s) 15e23312f6abcbf1afc6fbbf7917a57a0637f680.
