  1. EvoTest: Evolutionary Test-Time Learning for Self-Improving Agentic ...

    Sep 16, 2025 · A fundamental limitation of current AI agents is their inability to learn complex skills on the fly at test time, often behaving like “clever but clueless interns” in novel environments. This …

  2. CLEVER: A Curated Benchmark for Formally Verified Code Generation

    Jul 9, 2025 · TL;DR: We introduce CLEVER, a hand-curated benchmark for verified code generation in Lean. It requires full formal specs and proofs. No few-shot method solves all stages, making it a …

  3. Submissions | OpenReview

    Jan 22, 2025 · Leaving the barn door open for Clever Hans: Simple features predict LLM benchmark answers Lorenzo Pacchiardi, Marko Tesic, Lucy G Cheke, Jose Hernandez-Orallo 27 Sept 2024 …

  4. STAIR: Improving Safety Alignment with Introspective Reasoning

    May 1, 2025 · One common approach is training models to refuse unsafe queries, but this strategy can be vulnerable to clever prompts, often referred to as jailbreak attacks, which can trick the AI into …

  5. LongWriter: Unleashing 10,000+ Word Generation from Long Context …

    Jan 22, 2025 · The work includes a new benchmark (LongBench-Write) for evaluating ultra-long generation. Reviewers highlighted the paper's clear identification of the problem, the clever and …

  6. Contrastive Learning Via Equivariant Representation - OpenReview

    Sep 25, 2024 · In this paper, we revisit the roles of augmentation strategies and equivariance in improving CL's efficacy. We propose CLeVER (Contrastive Learning Via Equivariant …

  7. DeblurDiff: Real-World Image Deblurring with Generative Diffusion...

    Sep 18, 2025 · Strengths: Using diffusion based Kernel Prediction in the latent domain to iteratively refine the input for the diffusion based restoration model is a clever idea. It tackles a prevalent …

  8. Do Histopathological Foundation Models Eliminate Batch Effects? A ...

    Oct 11, 2024 · Deep learning has led to remarkable advancements in computational histopathology, e.g., in diagnostics, biomarker prediction, and outcome prognosis. Yet, the lack of annotated data …

  9. Benchmarking and Enhancing Rational Preference Utilization for ...

    Sep 14, 2025 · Large language model (LLM)-powered assistants have recently integrated memory mechanisms that record user preferences, leading to more personalized and user-aligned …

  10. Super Deep Contrastive Information Bottleneck for Multi-modal...

    May 1, 2025 · In multi-modal clustering, effectively capturing the complex relationships between modalities remains a challenge. To address this, this paper proposes a new super deep contrastive …