Jailbreak in pieces: Compositional Adversarial Attacks on Multi-Modal Language Models - 🔥 ICLR 2024 Spotlight - 🏆 Best Paper Award SoCal NLP 2023
🐢 Open-Source Evaluation & Testing for LLMs and ML models
A curated list of awesome responsible machine learning resources.
RuLES: a benchmark for evaluating rule-following in language models
Aira is a series of chatbots developed as an experimentation playground for value alignment.
Scan your AI/ML models for problems before you put them into production.
Website to track people, organizations, and products (tools, websites, etc.) in AI safety
Evaluation & testing framework for computer vision models
DPLL(T)-based verification tool for DNNs
Universal Neurons in GPT2 Language Models
Extended multi-agent and multi-objective (MaMoRL) environments based on DeepMind's AI Safety Gridworlds: a suite of reinforcement learning environments illustrating various safety properties of intelligent agents, made compatible with OpenAI Gym/Gymnasium and the Farama Foundation's PettingZoo.
[ICLR'24] RAIN: Your Language Models Can Align Themselves without Finetuning
A novel physical adversarial attack tackling the Digital-to-Physical Visual Inconsistency problem.
The official implementation of the paper "Data Contamination Calibration for Black-box LLMs" (ACL 2024)
Code for our paper "ModelObfuscator: Obfuscating Model Information to Protect Deployed ML-Based Systems", published at ISSTA'23
An attack that induces hallucinations in LLMs