Mining Development Knowledge to Understand and Support Software Logging Practices
Heng Li
Supervisor: Dr. Ahmed E. Hassan
Software Analysis & Intelligence Lab (SAIL), Queen’s University, Canada
Developers insert logging code that produces log messages at runtime. Logging code in the software system, such as Log.info("Stopping server on " + port);, produces log messages such as: 2016-07-23 17:56:16 INFO Stopping server on 8032. Log messages record valuable runtime information.
Logging is critical for software maintenance. Log messages are widely used in software maintenance efforts: to understand runtime behaviors (Fu et al., Contextual analysis of program logs for understanding system behaviors. MSR ’13), to diagnose failures (Yuan et al., Sherlog: Error diagnosis by connecting clues from run-time logs. ASPLOS ’10), and to detect anomalies (Xu et al., Detecting large-scale system problems by mining console logs. SOSP ’09).
Developers have difficulty deciding on appropriate logging code, as shown by complaints such as “A lot of log noise”, “Slowing down perf by 20%”, and “Missing an error log”. Developers spend a significant amount of effort maintaining their logging code. Prior work:
§ Logging practices in open source projects [Yuan et al., 2012; Chen and Jiang, 2017]
§ Logging practices in industry [Shang et al., 2014; Fu et al., 2014]
Development knowledge explains the development of logging code. Source code shows what is logged: LOG.warn(msg);. Change history shows how it changed: − LOG.info(msg); + LOG.warn(msg);. Issue reports explain why: “To help users identify a problem”.
Thesis statement: Development knowledge (source code, change history, and issue reports) can help us understand current logging practices and develop useful tools to support such logging practices.
Mining development knowledge to understand and support logging practices. Developers’ logging concerns? [TSE under review]; Where to log? [EMSE 2018]; When to update log? [EMSE 2017]; How to log (Error, Warn, Info)? [EMSE 2017]
Developers communicate their logging concerns in issue reports. One example issue report asks to remove a logging statement (logging cost: performance overhead); another asks to add a logging statement (logging benefit: exposing runtime problems).
We study logging-related issue reports to understand developers’ logging concerns. Automated and manual filtering yields the logging issue reports; qualitative analysis of these reports yields the logging concerns.
What are developers’ logging concerns? Categories are listed by frequency.
Logging benefits (research opportunity: leverage):
§ Assisting in debugging
§ Providing runtime perf
§ Exposing runtime problems
§ Bookkeeping
§ Showing execution progress
Logging costs (research opportunity: minimize):
§ Excessive log information
§ Exposing unnecessary details
§ Misleading end users
§ Performance overhead
§ Exposing sensitive info
Developers’ logging concerns? [TSE under review]: 10 categories of logging concerns (e.g., misleading users).
Some code topics are more likely to need logging statements. Examples: JIRA issues that require developers to log the topic of “connections” [EMSE 2018].
Can code topics explain where to log? We extract the code topics (e.g., the topic “connection”) and the logging statements for each code snippet (method level) [EMSE 2018].
We use LDA to extract code topics [EMSE 2018]. The source code is tokenized (e.g., queue, connection) and passed to a topic model (LDA), which yields topics such as “connection”.
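The tokenization step can be illustrated with a minimal sketch. The slides do not describe the actual preprocessing, so the splitting rules below (underscores and camelCase boundaries, then lower-casing) are an assumption, and the class name `IdentifierTokenizer` and the method name `closeConnectionQueue` are hypothetical. The resulting tokens are the kind of input a topic model such as LDA would consume.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class IdentifierTokenizer {
    // Splits an identifier on underscores and camelCase boundaries,
    // then lower-cases each piece so "Queue" and "queue" become one token.
    static List<String> tokenize(String identifier) {
        String boundary = "_|(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])";
        List<String> tokens = new ArrayList<>();
        for (String part : identifier.split(boundary)) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical method name echoing the slide's "queue, connection" tokens.
        System.out.println(tokenize("closeConnectionQueue")); // prints [close, connection, queue]
    }
}
```

A real pipeline would typically also drop language keywords and very common tokens before fitting the topic model; that step is omitted here.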
A small number of topics are much more likely to be logged. The most log-intensive topics usually capture communication between machines (e.g., “connection”) or interactions between threads (e.g., “thread interruption”) [EMSE 2018].
We combine both the structure and topic info to explain where to log [EMSE 2018]. A LASSO model takes the structure info (lines of code, complexity, control flow statements, etc.) and the topic info (e.g., the topic “connection”) of a code snippet and explains whether it contains a logging statement.
Code topics bring additional explanatory power (up to 13% AUC improvement) [EMSE 2018]. [Figure: the performance (AUC) of our LASSO models, using structure info vs. structure & topic info, compared with a random-guess baseline. Extracted bar labels: 0.82, 0.86, 0.8, 0.86, 0.83, 0.96, 0.87, 0.94, 0.9, 0.9, 0.88, 0.99.]
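For readers unfamiliar with AUC: it can be read as the probability that a model scores a randomly chosen positive instance (here, a method that contains a logging statement) higher than a randomly chosen negative one, so a random guess gives 0.5. The sketch below computes it by direct pairwise comparison. It illustrates the metric only, not the evaluation code behind the slides; the scores and labels are made up.

```java
class AucSketch {
    // AUC = P(score of a random positive > score of a random negative);
    // a tie between a positive and a negative counts as half a win.
    static double auc(double[] scores, boolean[] labels) {
        double wins = 0.0;
        long pairs = 0;
        for (int i = 0; i < scores.length; i++) {
            if (!labels[i]) continue;            // i ranges over positives
            for (int j = 0; j < scores.length; j++) {
                if (labels[j]) continue;         // j ranges over negatives
                pairs++;
                if (scores[i] > scores[j]) wins += 1.0;
                else if (scores[i] == scores[j]) wins += 0.5;
            }
        }
        if (pairs == 0) {
            throw new IllegalArgumentException("need at least one positive and one negative");
        }
        return wins / pairs;
    }

    public static void main(String[] args) {
        // Made-up predicted scores and ground-truth labels (true = logged).
        double[] scores = {0.9, 0.8, 0.3, 0.2};
        boolean[] labels = {true, false, true, false};
        System.out.println(auc(scores, labels)); // prints 0.75
    }
}
```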
Where to log? [EMSE 2018]: Logging varies across code topics.
Developers have difficulty making appropriate log changes. Developers usually forget to change logging code when they change their code; in many cases, logging code is written as “after-thoughts” after a failure happens and logs are needed [Yuan et al., 2012]. [Figure: code change history (commit n, commit n+1, version k) relating code changes and log changes to debugging difficulties and maintenance efforts.]
Learning from the code change history to provide log change suggestions [EMSE 2017]. The code change history (commit 1, commit 2, … commit n) contains code changes without log changes and code changes with log changes. For a new code change, the question is: do we need to change logs?
Providing automated suggestions for log changes when developers change the code [EMSE 2017]. A Random Forest classifier uses 25 metrics in three dimensions (change metrics, historical metrics, and product metrics) to produce log change suggestions.
Our models can effectively suggest whether a log change is needed [EMSE 2017]. [Figure: the performance (AUC) of our Random Forest models compared with a random-guess baseline. Extracted bar labels: 0.84, 0.91, 0.86, 0.88.]
The source code and code changes are important for explaining log changes [EMSE 2017]. The same three dimensions of 25 metrics (change metrics, historical metrics, and product metrics) are used to explain the log change suggestions.
When to update log? [EMSE 2017]: The source code & code changes can explain log changes.
Log levels are used to disable some verbose log messages while enabling important ones. From more verbose (lower levels) to less verbose (higher levels): Trace, Debug, Info, Warn, Error, Fatal. For example, Log.error("message") logs at the Error level.
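The filtering role of log levels can be sketched in a few lines. This is a simplified model of the threshold check that logging libraries perform, not the API of any particular library; the class `LogLevelFilter` and its method are hypothetical names. The enum order follows the slide, from most verbose to least verbose.

```java
class LogLevelFilter {
    // Ordered from most verbose (lowest) to least verbose (highest).
    enum Level { TRACE, DEBUG, INFO, WARN, ERROR, FATAL }

    // A message is emitted only if its level is at or above the configured threshold.
    static boolean isEnabled(Level messageLevel, Level threshold) {
        return messageLevel.compareTo(threshold) >= 0;
    }

    public static void main(String[] args) {
        Level threshold = Level.INFO; // e.g., a production setting
        System.out.println(isEnabled(Level.DEBUG, threshold)); // prints false
        System.out.println(isEnabled(Level.ERROR, threshold)); // prints true
    }
}
```

With the threshold at Info, Trace and Debug messages are dropped while Info, Warn, Error, and Fatal messages are kept, which is why a wrongly chosen level either hides an important message or adds noise.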
Improper log levels can have many negative impacts, for example “…tends to generate a lot of log noise…” and “These warnings worry users”. Developers spend much effort adjusting log levels [Yuan et al., 2012].
Learning from the code change history to provide log level suggestions [EMSE 2017]. The code change history (commit 1, commit 2, … commit n) contains logging statements such as Log.warn(msg), Log.info(msg), and Log.error(msg). For a new logging statement Log.?(msg), the question is: which log level to use?
Providing automated suggestions for log levels when developers add logging code [EMSE 2017]. An ordinal regression model maps logging statement metrics, containing block metrics, containing file metrics, code change metrics, and historical change metrics to a log level: Trace, Debug, Info, Warn, Error, or Fatal.
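To show why an ordinal model suits ordered log levels, here is a minimal sketch of one common formulation, the proportional-odds (cumulative logit) model, where P(level ≤ k) = sigmoid(θ_k − score). The slides do not state which formulation the study uses, so this choice is an assumption. The thresholds are hypothetical, not fitted values, and `linearScore` stands in for a weighted sum of the metrics.

```java
class OrdinalLevelSketch {
    static final String[] LEVELS = {"trace", "debug", "info", "warn", "error", "fatal"};

    // Hypothetical, increasing cut points between adjacent levels (not fitted values).
    static final double[] HYPOTHETICAL_THRESHOLDS = {-2.0, -1.0, 0.0, 1.0, 2.0};

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // P(level = k) = P(level <= k) - P(level <= k-1), with P(level <= k) = sigmoid(theta_k - score).
    static double[] probabilities(double linearScore, double[] thresholds) {
        double[] probs = new double[thresholds.length + 1];
        double previousCumulative = 0.0;
        for (int k = 0; k < thresholds.length; k++) {
            double cumulative = sigmoid(thresholds[k] - linearScore);
            probs[k] = cumulative - previousCumulative;
            previousCumulative = cumulative;
        }
        probs[thresholds.length] = 1.0 - previousCumulative;
        return probs;
    }

    // Suggests the level with the highest predicted probability.
    static String predict(double linearScore, double[] thresholds) {
        double[] probs = probabilities(linearScore, thresholds);
        int best = 0;
        for (int k = 1; k < probs.length; k++) {
            if (probs[k] > probs[best]) best = k;
        }
        return LEVELS[best];
    }

    public static void main(String[] args) {
        System.out.println(predict(-10.0, HYPOTHETICAL_THRESHOLDS)); // prints trace
        System.out.println(predict(10.0, HYPOTHETICAL_THRESHOLDS));  // prints fatal
    }
}
```

Unlike a plain multi-class classifier, this formulation uses a single score and ordered cut points, so it encodes the fact that Warn lies between Info and Error.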
Ordinal regression models can effectively model log levels [EMSE 2017]. [Figure: the performance (AUC) of our ordinal regression models compared with a random-guess baseline. Extracted bar labels: 0.76, 0.78, 0.81, 0.75.]
The content of a logging statement and its containing block/file explain its log level [EMSE 2017]. Of the five groups of metrics (logging statement, containing block, containing file, code change, and historical change metrics), the first three best explain the log level (Trace, Debug, Info, Warn, Error, or Fatal).
How to log? [EMSE 2017]: The log content & containing blocks/files can explain log levels.
Mining development knowledge to understand and support logging practices: summary. Developers’ logging concerns? [TSE under review]: 10 categories of logging concerns (e.g., misleading users). Where to log? [EMSE 2018]: logging varies across code topics. When to update log? [EMSE 2017]: the source code & code changes can explain log changes. How to log (Error, Warn, Info)? [EMSE 2017]: the log content & containing blocks/files can explain log levels.
References
§ Fu, Q., Lou, J. G., Lin, Q., Ding, R., Zhang, D., and Xie, T. (2013). Contextual analysis of program logs for understanding system behaviors. In Proceedings of the 10th Working Conference on Mining Software Repositories, MSR ’13, pages 397–400.
§ Xu, W., Huang, L., Fox, A., Patterson, D., and Jordan, M. I. (2009). Detecting large-scale system problems by mining console logs. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, SOSP ’09, pages 117–132.
§ Yuan, D., Mai, H., Xiong, W., Tan, L., Zhou, Y., and Pasupathy, S. (2010). Sherlog: Error diagnosis by connecting clues from run-time logs. In Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’10, pages 143–154.
§ Yuan, D., Park, S., and Zhou, Y. (2012). Characterizing logging practices in open source software. In Proceedings of the 34th International Conference on Software Engineering, ICSE ’12, pages 102–112.
§ Chen, B. and Jiang, Z. M. J. (2017). Characterizing logging practices in Java-based open source software projects – a replication study in Apache Software Foundation. Empirical Software Engineering, 22(1):330–374.
§ Shang, W., Jiang, Z. M., Adams, B., Hassan, A. E., Godfrey, M. W., Nasser, M., and Flora, P. (2014). An exploratory study of the evolution of communicated information about the execution of large software systems. Journal of Software: Evolution and Process, 26(1):3–26.
§ Fukushima, T., Kamei, Y., McIntosh, S., Yamashita, K., and Ubayashi, N. (2014). An empirical study of just-in-time defect prediction using cross-project models. In Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, pages 172–181.
Extra slides
Literature review: mining log messages, mining logging code, and improving logging code.
Mining log messages. Understanding runtime behaviors [Fu et al., 2013; Hassan et al., 2008; Shang et al., 2013]; detecting anomaly conditions [Xu et al., 2008, 2009; Fu et al., 2009; Jiang et al., 2008]; diagnosing system failures [Yuan et al., 2010; Syer et al., 2013]. Prior work highlights the importance of improving logging quality.
Mining logging code. Logging practices in open source projects [Yuan et al., 2012; Chen and Jiang, 2017]; logging practices in industry [Fu et al., 2014; Pecchia et al., 2015]; evolution of logging code [Shang et al., 2011; Kabinna et al., 2016]. Software logging is a common practice, and developers spend much effort maintaining their logging code.
Improving logging code: proactive logging. Proactively adding logging info in the source code [Yuan et al., 2011, 2012; Zhao et al., 2017]. Limitations: producing excessive log information; developers’ expertise and concerns are not considered.
Improving logging code: learning to log. Learning statistical models to suggest where to log [Zhu et al., 2015; Lal and Sureka, 2016; Jia et al., 2018]. Limitations: ignoring logging patterns (e.g., log level, stack trace); focusing on one dimension of development knowledge (source code); providing logging suggestions as a post-development process.
Logging stack traces can grow log files very fast. Log.warn(msg) logs a log message only; Log.warn(msg, e) logs a log message plus the full stack trace.
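The size difference between the two calls can be made concrete with a small sketch. It uses the standard `java.util.logging` API as a stand-in for the `Log.warn(msg)` and `Log.warn(msg, e)` calls on the slide; that substitution is an assumption, since the slides do not name a logging library. The handler class, logger name, and message text are hypothetical.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

class StackTraceLoggingSketch {
    // Collects records in memory so the example can inspect what was logged.
    static class CollectingHandler extends Handler {
        final List<LogRecord> records = new ArrayList<>();
        @Override public void publish(LogRecord record) { records.add(record); }
        @Override public void flush() { }
        @Override public void close() { }
    }

    // Lines a record would occupy: one for the message, plus the stack trace lines if present.
    static int lineCount(LogRecord record) {
        if (record.getThrown() == null) return 1;
        StringWriter buffer = new StringWriter();
        PrintWriter writer = new PrintWriter(buffer);
        record.getThrown().printStackTrace(writer);
        writer.flush();
        return 1 + buffer.toString().split("\\R").length;
    }

    static List<LogRecord> run() {
        Logger logger = Logger.getLogger("StackTraceLoggingSketch");
        logger.setUseParentHandlers(false); // keep the demo output off the console
        CollectingHandler handler = new CollectingHandler();
        logger.addHandler(handler);
        Exception e = new IllegalStateException("connection reset");
        logger.log(Level.WARNING, "Request failed");    // message only
        logger.log(Level.WARNING, "Request failed", e); // message + full stack trace
        logger.removeHandler(handler);
        return handler.records;
    }

    public static void main(String[] args) {
        for (LogRecord record : run()) {
            System.out.println(lineCount(record) + " line(s)");
        }
    }
}
```

The first record occupies a single line, while the second grows with the depth of the call stack, which is the growth the slide warns about.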
Developers have difficulty deciding whether to log stack traces. Examples: a missing stack trace; improper logging of a stack trace.
Learning from existing source code to suggest whether to log a stack trace. Existing source code contains logging statements without a stack trace, Log(msg), and with one, Log(msg, e). For a new statement Log(msg, ?), a Random Forest classifier with six dimensions of features suggests whether to log the stack trace.
Our models can effectively suggest whether a stack trace is needed. [Figure: the performance (AUC) of our Random Forest models compared with a random-guess baseline. Extracted bar labels: 0.85, 0.94, 0.9, 0.86.]
