Closed
Description
A SIGFPE crashed an ML job in 6.8.1. This is the failure message:
    Fatal error: 'si_signo 8, si_code: 1, si_errno: 0, address: 0x7f64eccb39b1, library: /usr/share/elasticsearch/modules/x-pack-ml/platform/linux-x86_64/bin/../lib/libMlMaths.so, base: 0x7f64ec95c000, normalized address: 0x3579b1', version: 6.8.1 (build 6e3432237cefa4)

It looks like the function where the SIGFPE occurred was ml::maths::CTimeSeriesDecompositionDetail::CComponents::CSeasonal::propagateForwards(long, long):
    $ objdump -T libMlMaths.so | grep '^00000000003579' | sort
    0000000000357930 g    DF .text  000000000000016e  Base        _ZN2ml5maths30CTimeSeriesDecompositionDetail11CComponents9CSeasonal17propagateForwardsEll
    $ c++filt _ZN2ml5maths30CTimeSeriesDecompositionDetail11CComponents9CSeasonal17propagateForwardsEll
    ml::maths::CTimeSeriesDecompositionDetail::CComponents::CSeasonal::propagateForwards(long, long)

The relevant code looks like this in 6.8.1:
    void CTimeSeriesDecompositionDetail::CComponents::CSeasonal::propagateForwards(core_t::TTime start,
                                                                                   core_t::TTime end) {
        for (std::size_t i = 0u; i < m_Components.size(); ++i) {
            core_t::TTime period{m_Components[i].time().period()};
            core_t::TTime a{CIntegerTools::floor(start, period)};
            core_t::TTime b{CIntegerTools::floor(end, period)};
            if (b > a) {
                double time{static_cast<double>(b - a) /
                            static_cast<double>(CTools::truncate(period, DAY, WEEK))};
                m_Components[i].propagateForwardsByTime(time);
                m_PredictionErrors[i].age(std::exp(-m_Components[i].decayRate() * time));
            }
        }
    }

It's still the same in latest 6.8. It was refactored in version 7.3.0 in #496.

We should probably backport a bare minimum change to 6.8 that at least logs an error instead of crashing with a SIGFPE.
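To make the "log instead of crash" idea concrete, here is a minimal standalone sketch. On Linux, si_code 1 for SIGFPE is FPE_INTDIV (integer divide by zero). The floating-point division in this function would not normally raise a signal, so one plausible culprit is a zero period reaching the integer arithmetic in CIntegerTools::floor. That is an assumption, as the failure message does not confirm it. Everything named below (TTime, floorToMultiple, truncateTo, elapsedPeriods) is a hypothetical stand-in written for illustration, not the ml-cpp API, and std::cerr stands in for the project's logging.

```cpp
#include <algorithm>
#include <cstdint>
#include <iostream>

// Hypothetical stand-in for core_t::TTime.
using TTime = std::int64_t;

constexpr TTime DAY{86400};
constexpr TTime WEEK{7 * DAY};

// Hypothetical stand-in for CIntegerTools::floor: round x down to a
// multiple of period. Requires period > 0; with period == 0 the modulo
// below is an integer division by zero, which raises SIGFPE on x86-64.
TTime floorToMultiple(TTime x, TTime period) {
    TTime remainder{x % period};
    if (remainder < 0) {
        remainder += period;
    }
    return x - remainder;
}

// Hypothetical stand-in for CTools::truncate: clamp x to [lo, hi].
TTime truncateTo(TTime x, TTime lo, TTime hi) {
    return std::min(std::max(x, lo), hi);
}

// Computes the elapsed time, in units of the truncated period, between the
// period boundaries at or before start and end. Returns false, leaving
// time untouched, if the period is invalid or no boundary was crossed.
bool elapsedPeriods(TTime start, TTime end, TTime period, double& time) {
    if (period <= 0) {
        // Guard: report the bad period rather than dividing by it.
        std::cerr << "ERROR: unexpected seasonal period " << period
                  << ", skipping component\n";
        return false;
    }
    TTime a{floorToMultiple(start, period)};
    TTime b{floorToMultiple(end, period)};
    if (b <= a) {
        return false;
    }
    time = static_cast<double>(b - a) /
           static_cast<double>(truncateTo(period, DAY, WEEK));
    return true;
}
```

In the real loop the equivalent guard would be a check on period followed by a `continue`, so the remaining components are still propagated.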