AI for Trust

Challenge

Artificial Intelligence (AI) is increasingly shaping all areas of our daily lives – from product recommendations to high-stakes decisions. These systems are often described as “black boxes” whose decision-making processes and underlying mechanisms are not comprehensible to users, making it difficult to build trust and ensure responsible use. Current approaches to increasing the transparency of AI systems often overlook consistent user-centered design and evaluation.

Approach

This project explored:

  • Common methods for measuring and evaluating trust in AI, to improve standards in empirical research
  • The effectiveness and acceptance of AI systems in three application areas, to provide recommendations for the design of trustworthy AI systems

Results

Study 1: analyzed twenty-four years of research on trust in AI to critically examine the limits of trust as a framework for evaluating adaptive, responsive systems.

Key findings: Trust in AI is often equated with reliability, overlooking essential human aspects. Many models rely on outdated concepts from automation research, oversimplifying human-AI interaction. Rather than focusing solely on acceptance and error rates, transparency efforts should support shared understanding and effective collaboration.

Study 2: tested a collaborative immersive analytics system developed by the lab to explore how decisions are made in data-driven teams with diverse expertise.

Key findings: Effective teamwork in machine learning (ML) modeling depends on clear roles, shared frameworks, and mutual awareness. Observed challenges included blind trust in model outputs and usability issues. Systems should support reflective collaboration, enable task delegation, and provide tools for shared understanding.

Study 3: explored how users assess the trustworthiness of Large Language Models (LLMs) such as GPT-4, aiming to bridge the gap between expert debates and user perspectives.

Key findings: Users identified 68 trust criteria, emphasizing transparent governance, monitoring, and user-centered design. Technical terms such as auditability were less well understood. The study shows that trust in LLMs stems more from clear communication than from technical features, highlighting the need for user-focused standards and education in LLM development and regulation.

Study 4: examined how different paradigms of AI-based decision support systems (DSS) influence human decision-making, especially in complex, high-stakes contexts such as health or justice.

Key findings: In a web experiment with 290 participants, recommendation-driven DSS led to more uncritical acceptance of suggestions, while hypothesis-driven DSS encouraged analytical thinking and improved decision accuracy. The study highlights that not just whether but how a DSS is used significantly affects decision quality, a crucial consideration for designing responsible AI systems.
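
To make the paradigm contrast concrete, the sketch below illustrates how the same model output might be presented to a user under each paradigm. It is a minimal illustration only: the function names, data, and interface wording are hypothetical and do not reflect the experiment’s actual implementation.

    # Illustrative sketch of two DSS presentation paradigms
    # (hypothetical code, not the study's implementation).

    def recommendation_driven(scores: dict[str, float]) -> str:
        # Present a single suggested option, which invites acceptance
        # rather than analysis of the alternatives.
        best = max(scores, key=scores.get)
        return f"The system recommends: {best} (confidence {scores[best]:.2f})"

    def hypothesis_driven(evidence: dict[str, list[str]]) -> str:
        # Present the evidence for each candidate hypothesis and leave
        # the final judgment to the user.
        lines = []
        for hypothesis, findings in evidence.items():
            lines.append(f"Hypothesis: {hypothesis}")
            lines.extend(f"  - evidence: {finding}" for finding in findings)
        lines.append("Which hypothesis do the findings support best?")
        return "\n".join(lines)

    print(recommendation_driven({"Option A": 0.87, "Option B": 0.13}))
    print(hypothesis_driven({
        "Option A": ["elevated marker X", "symptom Y present"],
        "Option B": ["family history of Z"],
    }))

The design difference is deliberate: the recommendation-driven interface foregrounds a single answer, whereas the hypothesis-driven interface foregrounds the evidence and keeps the decision with the user.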

Selected Scientific Contributions

Benk, M., Fröhlich, A., von Wangenheim, F., & Miller, T. Hypothesis- and Recommendation-Driven Strategies for Responsive Decision Support (under review).

Benk, M., Schlicker, N., von Wangenheim, F., & Scharowski, N. (2025). Bridging the knowledge gap: Understanding user expectations for trustworthy LLM standards. In Proceedings of the 39th Annual AAAI Conference on Artificial Intelligence (forthcoming).

Benk, M., Kerstan, S., von Wangenheim, F., & Ferrario, A. (2024). Twenty-four years of empirical research on trust in AI: A bibliometric review of trends, overlooked issues, and future directions. AI & Society, 1-24.

Benk, M., Weibel, R. P., Feuerriegel, S., & Ferrario, A. (2022). “Is It My Turn?”: Assessing Teamwork and Taskwork in Collaborative Immersive Analytics. Proc. ACM Hum.-Comput. Interact. 6, CSCW2, Article 479 (November 2022).

Benk, M., Weibel, R. P., & Ferrario, A. (2022). Creative Uses of AI Systems and their Explanations: A Case Study from Insurance. ACM CHI 2022 Workshop on Human-Centered Explainable AI (HCXAI’22).

Benk, M., Tolmeijer, S., von Wangenheim, F., & Ferrario, A. (2022). The Value of Measuring Trust in AI – A Socio-Technical System Perspective. ACM CHI 2022 Workshop on Trust and Reliance in AI-Human Teams (TRAIT’22).

Project Status

Completed

Lead Researchers

Michaela Benk, Dr. Joseph Ollier