Return to Question

added 2380 characters in body

edited Aug 3, 2016 at 17:31

EDIT:

It seems like the problem is SSL related:

~$ curl -v -k https://mycompany.com:9001/service1/ -o /dev/null * Hostname was NOT found in DNS cache % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying x.x.x.x... * Connected to mycompany.com (x.x.x.x) port 9001 (#0) * successfully set certificate verify locations: * CAfile: none CApath: /etc/ssl/certs * SSLv3, TLS handshake, **Client hello** (1): } [data not shown] 0 0 0 0 0 0 0 0 --:--:-- **0:00:16** --:--:-- 0* SSLv3, TLS handshake, **Server hello** (2): { [data not shown] * SSLv3, TLS handshake, CERT (11): { [data not shown] * SSLv3, TLS handshake, Server key exchange (12): { [data not shown] * SSLv3, TLS handshake, Server finished (14): { [data not shown] * SSLv3, TLS handshake, Client key exchange (16): } [data not shown] * SSLv3, TLS change cipher, Client hello (1): } [data not shown] * SSLv3, TLS handshake, Finished (20): } [data not shown] * SSLv3, TLS change cipher, Client hello (1): { [data not shown] * SSLv3, TLS handshake, Finished (20): { [data not shown] * SSL connection using EDH-RSA-DES-CBC3-SHA * Server certificate: * subject: C=xx; ST=xx; L=xx; O=xx; OU=Tech; CN=mycompany.com * start date: 2016-02-25 13:17:10 GMT * expire date: 2021-02-23 13:17:10 GMT * issuer: DC=xx; DC=xx; DC=xx; CN=xx * SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway. > GET /service1 HTTP/1.1 > User-Agent: curl/7.35.0 > Host: mycompany.com:9001 > Accept: */* > < HTTP/1.1 200 OK < Date: Wed, 03 Aug 2016 15:46:41 GMT < Transfer-Encoding: chunked < Content-Type: text/xml; charset=UTF-8 < X-Powered-By: Servlet/2.5 JSP/2.1

It seems that while doing GC it takes the server a lot of time to complete the SSL handshake (in this example 16 seconds) But I don't have any idea why. Does anyone know of any java problem related to GCs and SSL protocol?

Regards.

EDIT:

It seems like the problem is SSL related:

~$ curl -v -k https://mycompany.com:9001/service1/ -o /dev/null * Hostname was NOT found in DNS cache % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying x.x.x.x... * Connected to mycompany.com (x.x.x.x) port 9001 (#0) * successfully set certificate verify locations: * CAfile: none CApath: /etc/ssl/certs * SSLv3, TLS handshake, **Client hello** (1): } [data not shown] 0 0 0 0 0 0 0 0 --:--:-- **0:00:16** --:--:-- 0* SSLv3, TLS handshake, **Server hello** (2): { [data not shown] * SSLv3, TLS handshake, CERT (11): { [data not shown] * SSLv3, TLS handshake, Server key exchange (12): { [data not shown] * SSLv3, TLS handshake, Server finished (14): { [data not shown] * SSLv3, TLS handshake, Client key exchange (16): } [data not shown] * SSLv3, TLS change cipher, Client hello (1): } [data not shown] * SSLv3, TLS handshake, Finished (20): } [data not shown] * SSLv3, TLS change cipher, Client hello (1): { [data not shown] * SSLv3, TLS handshake, Finished (20): { [data not shown] * SSL connection using EDH-RSA-DES-CBC3-SHA * Server certificate: * subject: C=xx; ST=xx; L=xx; O=xx; OU=Tech; CN=mycompany.com * start date: 2016-02-25 13:17:10 GMT * expire date: 2021-02-23 13:17:10 GMT * issuer: DC=xx; DC=xx; DC=xx; CN=xx * SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway. > GET /service1 HTTP/1.1 > User-Agent: curl/7.35.0 > Host: mycompany.com:9001 > Accept: */* > < HTTP/1.1 200 OK < Date: Wed, 03 Aug 2016 15:46:41 GMT < Transfer-Encoding: chunked < Content-Type: text/xml; charset=UTF-8 < X-Powered-By: Servlet/2.5 JSP/2.1

Regards.

Source Link

asked Aug 2, 2016 at 15:44

Franco

WebLogic server seems to freeze after CMS Garbage Collection run

We're using Java Hotspot 1.6 with a 10.3.6 WebLogic Cluster with 2 nodes running OSB 11.1.1.7 behind a physical load balancer:

java version "1.6.0_91" Java(TM) SE Runtime Environment (build 1.6.0_91-b13) Java HotSpot(TM) 64-Bit Server VM (build 20.91-b07, mixed mode)

We've tuned the JVM to use CMS and avoid long running garbage collections:

-XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:NewRatio=4 -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=60 -XX:+ScavengeBeforeFullGC -XX:+CMSScavengeBeforeRemark

Most of the pause phases of the GCs (initial mark and remark) take less than a second, we don't see any failed CMS runs and we don't see Full GCs in the gc logs.

However, we're receiving timeout errors on clients trying to connect to OSB a few seconds after a completed CMS GC execution (particularly in an Oracle SOA installation with lots of composites that call OSB):

OSB 1 gc.log:

> 2016-07-28T08:31:52.656-0300: 492247.689: [GC > 2016-07-28T08:31:52.656-0300: 492247.690: [ParNew: > 1027265K->22014K(1132416K), 0.0412613 secs] > 4046262K->3043322K(6165632K), 0.0420791 secs] [Times: user=0.25 > sys=0.00, real=0.04 secs] 2016-07-28T08:31:52.701-0300: 492247.735: > [GC [1 CMS-initial-mark: 3021307K(5033216K)] 3043844K(6165632K), > 0.0683781 secs] [Times: user=0.08 sys=0.00, real=0.07 secs] 2016-07-28T08:31:52.770-0300: 492247.804: [CMS-concurrent-mark-start] > 2016-07-28T08:31:56.971-0300: 492252.005: [CMS-concurrent-mark: > 4.201/4.201 secs] [Times: user=10.55 sys=0.16, real=4.20 secs] 2016-07-28T08:31:56.971-0300: 492252.005: > [CMS-concurrent-preclean-start] 2016-07-28T08:31:57.064-0300: > 492252.098: [CMS-concurrent-preclean: 0.090/0.093 secs] [Times: user=0.12 sys=0.00, real=0.09 secs] 2016-07-28T08:31:57.064-0300: > 492252.098: [CMS-concurrent-abortable-preclean-start] 2016-07-28T08:32:01.198-0300: 492256.231: [GC > 2016-07-28T08:32:01.198-0300: 492256.232: [ParNew: > 1028128K->30711K(1132416K), 0.0502326 secs] > 4049435K->3053383K(6165632K), 0.0508313 secs] [Times: user=0.34 > sys=0.00, real=0.05 secs] CMS: abort preclean due to time > 2016-07-28T08:32:02.128-0300: 492257.161: > [CMS-concurrent-abortable-preclean: 4.027/5.063 secs] [Times: > user=7.62 sys=0.24, real=5.06 secs] 2016-07-28T08:32:02.134-0300: > 492257.167: [GC[YG occupancy: 186361 K (1132416 K)]2016-07-28T08:32:02.134-0300: 492257.167: [GC > 2016-07-28T08:32:02.134-0300: 492257.167: [ParNew: > 186361K->25052K(1132416K), 0.0231613 secs] > 3209034K->3049538K(6165632K), 0.0236069 secs] [Times: user=0.17 > sys=0.00, real=0.02 secs] 2016-07-28T08:32:02.158-0300: 492257.191: > [Rescan (parallel) , 0.0289939 secs]2016-07-28T08:32:02.187-0300: > 492257.220: [weak refs processing, 0.3412320 secs] [1 CMS-remark: 3024485K(5033216K)] 3049538K(6165632K), 0.4179878 secs] [Times: > user=0.67 sys=0.00, real=0.42 secs] 2016-07-28T08:32:02.552-0300: > 492257.586: [CMS-concurrent-sweep-start] 2016-07-28T08:32:07.144-0300: 492262.177: [CMS-concurrent-sweep: 4.591/4.591 secs] [Times: user=14.05 sys=0.22, real=4.59 secs] 2016-07-28T08:32:07.146-0300: > 492262.180: [CMS-concurrent-reset-start] 2016-07-28T08:32:07.283-0300: 492262.316: [CMS-concurrent-reset: 0.137/0.137 secs] [Times: user=0.36 sys=0.01, real=0.14 secs] 2016-07-28T08:32:36.201-0300: 492291.234: > [GC 2016-07-28T08:32:36.202-0300: 492291.234: [ParNew: > 1030955K->45449K(1132416K), 0.0671313 secs] > 2269248K->1285059K(6165632K), 0.0676485 secs] [Times: user=0.41 > sys=0.00, real=0.07 secs]

Is in the OSB WebLogic access log that we noticed this strange behavior, usually 5 or 10 seconds after CMS starts, there's a 20+ second gap between requests in the access log, when usually we see dozens of calls per second.

OSB1 access log:

2016-07-28 08:32:03 POST /service1/ 200 0.043 2016-07-28 08:32:03 POST /service1/ 200 0.074 2016-07-28 08:32:03 POST /service1/ 200 0.045 2016-07-28 08:32:03 POST /service1/ 200 0.053 2016-07-28 08:32:35 POST /service1/ 200 0.076

And at the same time some of the calls made from the SOA server fail:

SOA log error: <Jul 28, 2016 8:32:19 AM ART> <oracle.integration.platform.blocks.soap> <Unable to invoke endpoint URI "https://esburl.mycompany.com/service1" successfully due to: Unable to invoke endpoint URI "https://esburl.mycompany.com/service1" successfully due to: javax.xml.soap.SOAPException: javax.xml.soap.SOAPException: Message send failed: Read timed out>

There's no errors reported in the OSB weblogic output log. We've taken some threadumps while GC was running but haven't been able to pinpoint the issue. We thought that if initial mark and remark were fast enough we wouldn't see this kind of problems, but we still have doubts regarding if this is a JVM, a WebLogic or an application issue (OSB). Anyone has seen something like this with WebLogic or another applications server?

Thanks!