Introduction
OWASP Juice Shop is a widely used Capture The Flag (CTF) application for penetration testing (PT) teams. It offers a gamified experience with logical puzzles. While it serves its intended purpose well, it is not a suitable benchmarking target for Dynamic Application Security Testing (DAST), and in this post we will explain why. Before we dive into the concerns of using Juice Shop as a DAST benchmarking target, let's first define why and how we should approach DAST benchmarking.
The Purpose of Benchmarking
In the realm of DAST, benchmarking involves comparing the performance, capabilities, and efficacy of various tools in identifying and mitigating security vulnerabilities. The primary goal is to select a DAST solution that aligns with the unique requirements and objectives of an organization's security strategy. As such, we should make sure the benchmarking target resembles the organization's end target applications as closely as possible. This is a key reason why selecting very old benchmarking targets with obsolete technologies, like DVWA or bWAPP, or targets that do not behave like real-world applications, does not align with the end goal of finding the best tool for the job; the job being testing the organization's real-world applications.
Approaching DAST Testing
To extract maximum value from DAST benchmarking, it’s crucial to adopt a comprehensive testing approach. Consider the following key aspects:
a. Ability to Test Modern Technologies: Ensure that the DAST tool supports and effectively tests applications built on modern technologies. Compatibility with diverse tech stacks is vital for addressing the ever-evolving nature of web applications.
Examples of technologies we should ensure are present in a modern benchmark:
- A modern backend language such as NodeJS, Go, Elixir, etc.
- Modern frontend frameworks such as React, Angular, and Vue.js.
- Modern architectures: SPA, backend/frontend APIs communicating over REST/GraphQL.
- Dynamic application behavior: JS events, a complicated DOM, frontend logic (see the sketch after this list).
- A modern stack: PostgreSQL, NoSQL, a modern web server, etc.
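To illustrate the dynamic-application point, here is a minimal sketch of why a scanner needs a real browser engine for an SPA. It assumes Node 18+ (for the global fetch), the Playwright library, and a hypothetical target at http://localhost:3000: a plain HTTP fetch sees only the empty HTML shell, while a headless browser sees the rendered DOM.

```typescript
// Minimal sketch: static fetch vs. rendered DOM for an SPA target.
// Assumes Playwright is installed; the target URL is hypothetical.
import { chromium } from 'playwright';

async function compareStaticVsRendered(target: string): Promise<void> {
  // Static fetch: an SPA typically returns a near-empty HTML shell.
  const shell = await (await fetch(target)).text();
  console.log(`static HTML length: ${shell.length}`);

  // Headless browser: let the frontend framework render, then read
  // the fully built DOM, including JS-generated links.
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto(target, { waitUntil: 'networkidle' });
  const rendered = await page.content();
  const links = await page.$$eval('a[href]', (anchors) =>
    anchors.map((a) => (a as HTMLAnchorElement).href),
  );
  console.log(`rendered HTML length: ${rendered.length}, links: ${links.length}`);
  await browser.close();
}

compareStaticVsRendered('http://localhost:3000').catch(console.error);
```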
b. Modern Vulnerabilities: Evaluate the tool’s proficiency in detecting modern vulnerabilities. The benchmarking process should include testing for threats beyond traditional issues, such as those related to cloud services, microservices, and serverless architectures.
Examples of modern vulnerabilities we should ensure are present in a modern benchmark:
- Cloud resources: AWS S3 issues, Google Cloud Storage, Azure Blobs, leaked API keys and secrets.
- API security: GraphQL misconfigurations, the OWASP API Top 10, business constraint issues, business logic issues.
- Authorization: JWT issues, privilege escalation, access control misconfigurations (a sample JWT check follows this list).
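As one concrete example from the authorization category, a DAST tool should be able to craft an unsigned JWT and check whether a protected route accepts it. A minimal sketch, assuming Node 18+; the endpoint path and claims are hypothetical stand-ins:

```typescript
// Sketch of a JWT "alg: none" acceptance check. Endpoint and claims
// are hypothetical; a real benchmark would expose a protected route.
function b64url(obj: object): string {
  return Buffer.from(JSON.stringify(obj)).toString('base64url');
}

async function probeAlgNone(endpoint: string): Promise<boolean> {
  const header = b64url({ alg: 'none', typ: 'JWT' });
  const payload = b64url({ sub: 'admin', role: 'admin' }); // hypothetical claims
  const unsigned = `${header}.${payload}.`; // empty signature segment

  const res = await fetch(endpoint, {
    headers: { Authorization: `Bearer ${unsigned}` },
  });
  // A 200 on a protected route means the signature was never verified.
  return res.status === 200;
}

probeAlgNone('http://localhost:3000/api/admin').then((vulnerable) =>
  console.log(vulnerable ? 'accepts unsigned JWT' : 'rejects unsigned JWT'),
);
```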
c. Authentication Scenarios: Assess the DAST tool’s capability to handle various authentication mechanisms. Robust testing should encompass scenarios involving single sign-on (SSO), multi-factor authentication (MFA), and other authentication protocols to provide a holistic security assessment.
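In the simplest case this means scripting a credential-based login and carrying the resulting session into every probe; SSO and MFA flows additionally require redirect handling or TOTP seeds. A minimal sketch with hypothetical URLs and field names:

```typescript
// Minimal scripted-login sketch: obtain a session token, then reuse
// it so the scanner sees the authenticated attack surface. All URLs
// and field names are hypothetical.
async function loginAndProbe(base: string): Promise<void> {
  const login = await fetch(`${base}/api/login`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ email: 'tester@example.com', password: 'secret' }),
  });
  const { token } = (await login.json()) as { token: string };

  // Every subsequent request carries the session, so authenticated
  // pages and APIs become part of the scanned surface.
  const profile = await fetch(`${base}/api/profile`, {
    headers: { Authorization: `Bearer ${token}` },
  });
  console.log(`authenticated probe returned ${profile.status}`);
}
```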
d. Crawling and Discovery: The tool’s ability to thoroughly crawl and discover the application’s attack surface is critical. Effective crawling ensures comprehensive coverage of the application, uncovering hidden vulnerabilities that may escape less sophisticated tools.
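At its core this is a breadth-first walk with URL normalization and deduplication, bounded to the target's origin. A minimal sketch of the idea (static HTML only, no JS execution), assuming Node 18+:

```typescript
// Sketch of a breadth-first crawler. Normalizing and deduplicating
// URLs keeps the sitemap bounded; real crawlers also execute JS.
async function crawl(start: string, maxPages = 200): Promise<Set<string>> {
  const origin = new URL(start).origin;
  const seen = new Set<string>([start]);
  const queue: string[] = [start];

  while (queue.length > 0 && seen.size < maxPages) {
    const url = queue.shift()!;
    const res = await fetch(url);
    if (!res.headers.get('content-type')?.includes('text/html')) continue;
    const html = await res.text();

    // Naive href extraction; enough to show resolution of relative links.
    for (const match of html.matchAll(/href="([^"#]+)"/g)) {
      let next: URL;
      try {
        next = new URL(match[1], url); // resolves relative links
      } catch {
        continue; // skip malformed hrefs
      }
      next.hash = '';
      if (next.origin === origin && !seen.has(next.href)) {
        seen.add(next.href);
        queue.push(next.href);
      }
    }
  }
  return seen;
}
```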
e. API and Backend Testing: With the rise of API-centric architectures, a robust DAST tool should extend its testing capabilities to APIs and backend services. Evaluate how well the tool can identify vulnerabilities in API endpoints across different API technologies such as REST, GraphQL, and others. We should also make sure the DAST tool supports multiple ways to map and identify all of the different API endpoints (loading schemas, handling introspection, allowing editing or manual setup of specific API endpoints), as the introspection sketch below illustrates.
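For instance, one check a tool can run against a GraphQL backend is whether introspection is enabled. A minimal sketch, assuming Node 18+ and the conventional /graphql path (deployments vary):

```typescript
// Sketch of a GraphQL introspection probe: if introspection is left
// enabled, every type and field of the schema can be enumerated.
// The /graphql path is an assumption.
async function probeIntrospection(base: string): Promise<void> {
  const res = await fetch(`${base}/graphql`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      query: '{ __schema { types { name fields { name } } } }',
    }),
  });
  const body = await res.json();
  if (body?.data?.__schema) {
    console.log(`introspection enabled: ${body.data.__schema.types.length} types`);
  } else {
    console.log('introspection disabled, or endpoint is not GraphQL');
  }
}
```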
Now that we agree on the requirements for an effective benchmark, we need to ensure our benchmarking target supports all of these points. This keeps the benchmark as true as possible to the actual targets we will test for the organization: it should contain multiple modern vulnerabilities and should behave, and be architected, in a way that resembles real-world applications as much as possible.
Why Does Juice Shop Fall Short?
Gamified Approach and Logical Puzzles:
OWASP Juice Shop's design heavily emphasizes a game-like approach, incorporating logical puzzles that rarely align with real-world application security challenges.

One prominent example is the scenario where a user is prompted to "Reset the password of Bjoern's OWASP account via the Forgot Password mechanism with the truthful answer to his security question." To solve this, one needs to either watch Bjoern's 2018 OWASP lecture to see his playthrough of Juice Shop, or go to his Twitter feed and scroll until a post about his favorite cat "Zaya" happens to come into view.

Another good example is the "Receive a coupon code from the support chatbot" challenge. To win this one, a user needs to "bully" the chatbot, asking again and again for a coupon code until the bot gives up and supplies one.
Many similar "vulnerabilities" have been programmed into Juice Shop. While this makes the application a very fun PT puzzle platform, these issues are hardly in the realm of real-world vulnerabilities, nor issues that a DAST tool is expected to find.
Limited Automated Vulnerability Detection:
Certain vulnerabilities within Juice Shop cannot be efficiently detected through automated means. An illustrative example involves extracting security question answers from external sources like YouTube videos. This kind of manual intervention and information retrieval, as demonstrated by Bjoern Kimminich himself in a conference talk, highlights the inherent limitations of automated vulnerability detection in Juice Shop.
Non-Conformity to HTTP Standards:
A major drawback of Juice Shop lies in its non-conformity to HTTP standards. Every page returns a 200 OK status, regardless of whether it exists, creating potential confusion for DAST tools that rely on standard status codes for interpretation.

Since the application uses only relative links, every such non-existent URL can endlessly grow the sitemap if the tool is not configured to handle this situation.
Furthermore, the application employs unconventional HTTP response statuses, such as returning a 500 Internal Server Error for unauthorized access, a departure from the industry-standard 401 or 403.
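A scanner can partially defend against this by calibrating a "soft 404" baseline: request a path that cannot exist, fingerprint the response, and treat look-alike responses as not-found even when the status code says 200 OK. A minimal sketch of the idea, assuming Node 18+:

```typescript
// Sketch of "soft 404" calibration against apps that answer 200 OK
// for every path. The heuristic (identical status + body hash) is a
// simplification; real scanners use fuzzier similarity measures.
import { createHash } from 'node:crypto';

interface Fingerprint {
  status: number;
  bodyHash: string;
}

async function fingerprint(url: string): Promise<Fingerprint> {
  const res = await fetch(url);
  const body = await res.text();
  return {
    status: res.status,
    bodyHash: createHash('sha256').update(body).digest('hex'),
  };
}

async function isSoft404(base: string, candidatePath: string): Promise<boolean> {
  const randomPath = `/definitely-missing-${Math.random().toString(36).slice(2)}`;
  const baseline = await fingerprint(base + randomPath);
  const probe = await fingerprint(base + candidatePath);
  // If a random impossible path and the candidate look the same, the
  // "page" is probably just the SPA shell echoed back for any URL.
  return probe.status === baseline.status && probe.bodyHash === baseline.bodyHash;
}
```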

Moreover, much effort has been invested in making the application behave in ways that make an automated scanner's job harder, to ensure PT players do not "cheat" the game using automated tools. This also includes other complicated scenarios, like forms which are not really forms:

JS events attached to images, and fields which do not open, or are not editable, until an icon is clicked.
One good example can be seen when looking at the image sources on the main page:

We can see multiple event listeners attached to the image, each one creating a different behavior.
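Illustratively (this is not Juice Shop's actual source), the pattern looks like the following: navigation and state changes hang off script-attached listeners on an image, so a crawler that never dispatches events sees no links at all.

```typescript
// Illustrative only -- not Juice Shop's actual frontend code. The
// handlers and selector are hypothetical stand-ins for the app logic.
function showProductDetail(): void {
  /* open the product detail dialog */
}
function preloadReviews(): void {
  /* fetch review data in the background */
}

const img = document.querySelector<HTMLImageElement>('.product-image');
if (img) {
  // None of these appear as <a href> links in the static HTML, so a
  // crawler that does not fire events never discovers the behavior.
  img.addEventListener('click', showProductDetail);
  img.addEventListener('mouseover', preloadReviews);
}
```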
Another good example is the "search" bar, which hides a DOM XSS:

The search bar does not exist until a click or touch event fires, and only then does the DOM reveal it:
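A scanner can only reach this sink by driving the UI the way a human would. A minimal sketch with Playwright; the selectors and target URL are hypothetical stand-ins for Juice Shop's actual markup:

```typescript
// Sketch: reach a click-gated search input with a headless browser.
// Selectors and target URL are hypothetical.
import { chromium } from 'playwright';

async function probeHiddenSearch(target: string, payload: string): Promise<void> {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto(target, { waitUntil: 'networkidle' });

  // The input is not in the DOM until the magnifier icon is clicked.
  await page.click('#search-icon'); // hypothetical selector
  await page.fill('#search-input', payload); // hypothetical selector
  await page.keyboard.press('Enter');

  // Check whether the payload was reflected into the rendered DOM.
  const html = await page.content();
  console.log(html.includes(payload) ? 'payload reflected in DOM' : 'not reflected');
  await browser.close();
}
```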

Another example is "Directory Listing". Usually this issue refers to a misconfiguration at the server level that enables browsing directories from the browser, and it looks like this:

In Juice Shop, instead, the behavior is an in-app directory-browsing library that lets you go through the files in a specific folder. This is not what we would classify as "Directory Listing"; it is more of an application feature inside Juice Shop:

There are other examples of behavior that is very human-centric, designed to make sure automated tools have a hard time parsing the target and managing to run scans.
Conclusion
In conclusion, while OWASP Juice Shop provides an engaging platform for PT teams and serves its intended purpose as a gamified CTF application, it falls short as an ideal benchmarking target for DAST tools. Its unique design choices, non-standard HTTP practices, and deliberate anti-automation features pose challenges that diverge from the realistic security scenarios encountered in actual applications. To ensure comprehensive security testing and benchmarking, it is crucial to consider applications that more closely emulate real-world conditions. As the cybersecurity landscape evolves, the need for reliable and realistic benchmarks becomes increasingly vital in fortifying applications against emerging threats.
This is why we should consider proper modern benchmarks like the following:
- BrokenCrystals (GitHub: NeuraLegion/brokencrystals) – a broken application, very vulnerable!
- DVGA (GitHub: dolevf/Damn-Vulnerable-GraphQL-Application) – an intentionally vulnerable implementation of Facebook's GraphQL technology, to learn and practice GraphQL security.
- vAPI (GitHub: roottusk/vapi) – a Vulnerable Adversely Programmed Interface: a self-hostable API that mimics OWASP API Top 10 scenarios through exercises.
- crAPI (GitHub: OWASP/crAPI) – completely ridiculous API.
