Arxiv Day: Article

LAB-Bench: Measuring Capabilities of Language Models for Biology Research

There is widespread optimism that frontier Large Language Models (LLMs) and LLM-augmented systems have the potential to rapidly accelerate scientific discovery across disciplines. Today, many benchmarks exist to measure LLM knowledge and reasoning on textbook-style science questions, but few if any benchmarks are designed to evaluate language model performance on practical tasks required for scientific research, such as literature search, protocol planning, and data analysis. As a step toward building such benchmarks, we introduce the Language Agent Biology Benchmark (LAB-Bench), a broad dataset of over 2,400 multiple choice questions for evaluating AI systems on a range of practical biology research capabilities, including recall and reasoning over literature, interpretation of figures, access and navigation of databases, and comprehension and manipulation of DNA and protein sequences. Importantly, in contrast to previous scientific benchmarks, we expect that an AI system that can achieve consistently high scores on the more difficult LAB-Bench tasks would serve as a useful assistant for researchers in areas such as literature search and molecular cloning. As an initial assessment of the emergent scientific task capabilities of frontier language models, we measure performance of several against our benchmark and report results compared to human expert biology researchers. We will continue to update and expand LAB-Bench over time, and expect it to serve as a useful tool in the development of automated research systems going forward. A public subset of LAB-Bench is available for use at the following URL: https://huggingface.co/datasets/futurehouse/lab-bench

Updated: 2024-07-14 23:52:25

Categories: cs.AI

Download: http://arxiv.org/abs/2407.10362v1

Evolved Developmental Artificial Neural Networks for Multitasking with Advanced Activity Dependence

Recently, Cartesian Genetic Programming has been used to evolve developmental programs to guide the formation of artificial neural networks (ANNs). This approach has demonstrated success in enabling ANNs to perform multiple tasks while avoiding catastrophic forgetting. One unique aspect of this approach is the use of separate developmental programs evolved to regulate the development of separate soma and dendrite units. An opportunity afforded by this approach is the ability to incorporate Activity Dependence (AD) into the model such that environmental feedback can help to regulate the behavior of each type of unit. Previous work has shown a limited version of AD (influencing neural bias) to provide marginal improvements over non-AD ANNs. In this work, we present promising results from new extensions to AD. Specifically, we demonstrate a more significant improvement via AD on new neural parameters including health and position, as well as a combination of all of these along with bias. We report on the implications of this work and suggest several promising directions for future work.

Updated: 2024-07-14 23:39:07

Categories: cs.NE,cs.AI,I.2.6; I.2.11

Download: http://arxiv.org/abs/2407.10359v1

Evaluating Language Model Context Windows: A "Working Memory" Test and Inference-time Correction

Large language models are prominently used in real-world applications, often tasked with reasoning over large volumes of documents. An exciting development in this space is models boasting extended context capabilities, with some accommodating over 2 million tokens. Such long context model capabilities remain uncertain in production systems, motivating the need to benchmark their performance on real world use cases. We address this challenge by proposing SWiM, an evaluation framework that addresses the limitations of standard tests. Testing the framework on eight long context models, we find that even strong models such as GPT-4 and Claude 3 Opus degrade in performance when information is present in the middle of the context window (lost-in-the-middle effect). Next, in addition to our benchmark, we propose medoid voting, a simple, but effective training-free approach that helps alleviate this effect, by generating responses a few times, each time randomly permuting documents in the context, and selecting the medoid answer. We evaluate medoid voting on single document QA tasks, achieving up to a 24% lift in accuracy. Our code is available at https://github.com/snorkel-ai/long-context-eval.
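
The medoid-voting idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the `generate` callable stands in for any model call, the function names are hypothetical, and character-level similarity from `difflib` is used here in place of whatever similarity measure the paper uses.

```python
import random
from difflib import SequenceMatcher


def medoid_answer(answers):
    """Return the answer with the highest total similarity to all the others."""
    def similarity(a, b):
        # Character-level similarity in [0, 1]; an assumed stand-in measure.
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    best_index = max(
        range(len(answers)),
        key=lambda i: sum(
            similarity(answers[i], answers[j])
            for j in range(len(answers)) if j != i
        ),
    )
    return answers[best_index]


def medoid_vote(generate, question, documents, n_runs=3, seed=0):
    """Query the model n_runs times, shuffling the documents each time."""
    rng = random.Random(seed)
    answers = []
    for _ in range(n_runs):
        shuffled = list(documents)
        rng.shuffle(shuffled)  # a new random document order for every run
        answers.append(generate(question, shuffled))
    return medoid_answer(answers)


# Hypothetical responses from three runs with differently ordered contexts.
runs = [
    "The contract was signed in 2019.",
    "The contract was signed in 2019",
    "No signing date is mentioned.",
]
print(medoid_answer(runs))
```

Because the medoid is always one of the generated responses, the outlier answer in the example cannot win as long as the other two agree closely with each other.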

Updated: 2024-07-14 22:47:13

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2407.03651v2

SynCode: LLM Generation with Grammar Augmentation

LLMs are widely used in complex AI applications. These applications underscore the need for LLM outputs to adhere to a specific format so that they can be integrated with other components in the system. Typically, the format rules, e.g., for data serialization formats such as JSON and YAML or for code in a programming language, are expressed as a context-free grammar (CFG). Because LLMs hallucinate and are unreliable, instructing them to adhere to a specified syntax is an increasingly important challenge. We present SynCode, a novel framework for efficient and general syntactical decoding with LLMs, to address this challenge. SynCode ensures soundness and completeness with respect to the CFG of a formal language, effectively retaining valid tokens while filtering out invalid ones. SynCode uses an offline-constructed, efficient lookup table, the DFA mask store, derived from the DFA of the language's grammar for efficient generation. SynCode seamlessly integrates with any language defined by a CFG, as evidenced by experiments focusing on generating JSON, Python, and Go outputs. Our experiments evaluating the effectiveness of SynCode for JSON generation demonstrate that SynCode eliminates all syntax errors and significantly outperforms state-of-the-art baselines. Furthermore, our results show that SynCode eliminates 96.07% of syntax errors in generated Python and Go code, showcasing its substantial impact on enhancing syntactical precision in LLM generation. Our code is available at https://github.com/uiuc-focal-lab/syncode
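
To make the idea of a precomputed token mask concrete, here is a toy sketch. It is not SynCode's implementation or API: the DFA (signed integers, `-?[0-9]+`), the vocabulary, the scores, and every function name are assumptions chosen for illustration, and a real system must also handle a full CFG parser state and partially matched tokens.

```python
DIGIT_CHARS = "0123456789"
START, SIGNED, NUMBER = "start", "signed", "number"
STATES = (START, SIGNED, NUMBER)


def step(state, ch):
    """Advance the toy DFA for -?[0-9]+ by one character; None means rejected."""
    if ch in DIGIT_CHARS:
        return NUMBER
    if ch == "-" and state == START:
        return SIGNED
    return None


def walk(state, token):
    """Feed every character of a token to the DFA, stopping at the first rejection."""
    for ch in token:
        state = step(state, ch)
        if state is None:
            return None
    return state


def build_mask_store(vocab):
    """Precompute, once and offline, which tokens keep the DFA alive from each state."""
    return {s: [walk(s, tok) is not None for tok in vocab] for s in STATES}


def constrained_pick(state, scores, vocab, store):
    """Pick the best-scoring token among those the mask allows in this state."""
    allowed = [i for i, ok in enumerate(store[state]) if ok]
    best = max(allowed, key=lambda i: scores[i])
    return vocab[best], walk(state, vocab[best])


VOCAB = ["-", "1", "23", "-4", "a", "5-"]
STORE = build_mask_store(VOCAB)
# Made-up model scores that prefer the syntactically invalid token "a".
token, state = constrained_pick(START, [0.1, 0.2, 0.3, 0.0, 0.9, 0.5], VOCAB, STORE)
print(token, state)  # 23 number
```

The point of building the table ahead of time is that decoding only needs a lookup per step, rather than re-checking every vocabulary token against the grammar.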

Updated: 2024-07-14 22:22:59

Categories: cs.LG,cs.FL,cs.PL,cs.SE

Download: http://arxiv.org/abs/2403.01632v3

Share, Collaborate, Benchmark: Advancing Travel Demand Research through rigorous open-source collaboration

This research foregrounds general practices in travel demand research, emphasizing the need to change our ways. A critical barrier preventing travel demand literature from effectively informing policy is the volume of publications without clear, consolidated benchmarks, making it difficult for researchers and policymakers to gather insights and use models to guide decision-making. By emphasizing reproducibility and open collaboration, we aim to enhance the reliability and policy relevance of travel demand research. We present a collaborative infrastructure for transit demand prediction models, focusing on their performance during highly dynamic conditions like the COVID-19 pandemic. Drawing from over 300 published papers, we develop an open-source infrastructure with five common methodologies and assess their performance under stable and dynamic conditions. We found that the prediction error for the LSTM deep learning approach stabilized at a mean arctangent absolute percentage error (MAAPE) of about 0.12 within 1.5 months, whereas other models continued to exhibit higher error rates even a year into the pandemic. If research practices had prioritized reproducibility before the COVID-19 pandemic, transit agencies would have had clearer guidance on the best forecasting methods and quickly identified those best suited for pandemic conditions to inform operations in response to changes in transit demand. The aim of this open-source codebase is to lower the barrier for other researchers to replicate, reproduce models and build upon findings. We encourage researchers to test their own modeling approaches on this benchmarking platform, challenge the analyses conducted in this paper, and develop model specifications that can outperform those evaluated here. Further, collaborative research approaches must be expanded across travel demand modeling if we wish to impact policy and planning.
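
For readers unfamiliar with the error measure quoted above, MAAPE averages the arctangent of the absolute percentage error, which keeps each term bounded by π/2 even when the actual value is zero. The sketch below is a plain-Python illustration; the function name is hypothetical, and treating a zero actual with a zero forecast as zero error is an assumption made here, not something stated in the abstract.

```python
import math


def maape(actual, forecast):
    """Mean arctangent absolute percentage error, in radians (0 to pi/2)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        if a == 0:
            # arctan(|error| / 0) tends to pi/2; 0/0 is treated as no error (assumption).
            total += 0.0 if f == 0 else math.pi / 2
        else:
            total += math.atan(abs((a - f) / a))
    return total / len(actual)


# Made-up ridership numbers, for illustration only.
print(round(maape([1000, 800, 0], [900, 800, 50]), 4))
```

For small errors arctan(x) is close to x, so a MAAPE of about 0.12 corresponds roughly to a 12% typical percentage error.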

Updated: 2024-07-14 22:11:43

Categories: cs.LG

Download: http://arxiv.org/abs/2306.06194v2

Affordance-Guided Reinforcement Learning via Visual Prompting

Robots equipped with reinforcement learning (RL) have the potential to learn a wide range of skills solely from a reward signal. However, obtaining a robust and dense reward signal for general manipulation tasks remains a challenge. Existing learning-based approaches require significant data, such as demonstrations or examples of success and failure, to learn task-specific reward functions. Recently, there has also been growing adoption of large multi-modal foundation models for robotics. These models can perform visual reasoning in physical contexts and generate coarse robot motions for various manipulation tasks. Motivated by this range of capability, in this work, we propose and study rewards shaped by vision-language models (VLMs). State-of-the-art VLMs have demonstrated an impressive ability to reason about affordances through keypoints in a zero-shot manner, and we leverage this to define dense rewards for robotic learning. On a real-world manipulation task specified by a natural language description, we find that these rewards improve the sample efficiency of autonomous RL and enable successful completion of the task in 20K online finetuning steps. Additionally, we demonstrate the robustness of the approach to reductions in the number of in-domain demonstrations used for pretraining, reaching comparable performance in 35K online finetuning steps.

Updated: 2024-07-14 21:41:29

Categories: cs.RO,cs.AI,cs.LG

Download: http://arxiv.org/abs/2407.10341v1

Mapping the Scholarship of Dark Pattern Regulation: A Systematic Review of Concepts, Regulatory Paradigms, and Solutions from an Interdisciplinary Perspective

Dark patterns, design tricks used on online interfaces to manipulate users' decision-making processes, have raised public concerns. However, research on the regulation of dark patterns remains underdeveloped and scattered, particularly regarding scholars' views on the concept, regulatory paradigms, and solutions. Following PRISMA guidelines, this paper systematically reviews the formats and content of regulatory discussions on dark patterns from the interdisciplinary scholarship of Law and Human-Computer Interaction. A total of 65 studies were analysed through content and thematic analysis. This study synthesises the unique trends and characteristics of legal scholarship on dark patterns, identifying five root problems and triple-layered harms. It critiques current regulations in terms of legal theories and sectoral legislation, highlighting their inadequacies in addressing dark patterns. The paper also critically examines existing proposed solutions, including paradigmatic shifts in legal doctrines, refinements to existing frameworks, technical design-embedded solutions, and accountability measures for design practices. This research critically discusses the current barriers to effective dark pattern regulation and explores promising regulatory solutions. The difficulty of identifying the normative nature of various forms of dark patterns and of identifying evident and actionable harm, together with the expanding scope of what the term "dark patterns" connotes, inherently hinders effective regulation. However, technical design-embedded solutions, accountability frameworks, and practical design guidelines offer potential routes for more proactive regulation, while legal pluralism stands as a promising macro-level change in regulatory paradigms for dark pattern regulation.

Updated: 2024-07-14 21:41:18

Categories: cs.CY,cs.AI,cs.HC,cs.IT,cs.SI,math.IT

Download: http://arxiv.org/abs/2407.10340v1

SENTINEL: Securing Indoor Localization against Adversarial Attacks with Capsule Neural Networks

With the increasing demand for edge device powered location-based services in indoor environments, Wi-Fi received signal strength (RSS) fingerprinting has become popular, given the unavailability of GPS indoors. However, achieving robust and efficient indoor localization faces several challenges, due to RSS fluctuations from dynamic changes in indoor environments and heterogeneity of edge devices, leading to diminished localization accuracy. While advances in machine learning (ML) have shown promise in mitigating these phenomena, it remains an open problem. Additionally, emerging threats from adversarial attacks on ML-enhanced indoor localization systems, especially those introduced by malicious or rogue access points (APs), can deceive ML models to further increase localization errors. To address these challenges, we present SENTINEL, a novel embedded ML framework utilizing modified capsule neural networks to bolster the resilience of indoor localization solutions against adversarial attacks, device heterogeneity, and dynamic RSS fluctuations. We also introduce RSSRogueLoc, a novel dataset capturing the effects of rogue APs from several real-world indoor environments. Experimental evaluations demonstrate that SENTINEL achieves significant improvements, with up to 3.5x reduction in mean error and 3.4x reduction in worst-case error compared to state-of-the-art frameworks using simulated adversarial attacks. SENTINEL also achieves improvements of up to 2.8x in mean error and 2.7x in worst-case error compared to state-of-the-art frameworks when evaluated with the real-world RSSRogueLoc dataset.

Updated: 2024-07-14 21:40:12

Categories: eess.SP,cs.CR,cs.DC,cs.LG,cs.SY,eess.SY

Download: http://arxiv.org/abs/2407.11091v1

Thyroidiomics: An Automated Pipeline for Segmentation and Classification of Thyroid Pathologies from Scintigraphy Images

The objective of this study was to develop an automated pipeline that enhances thyroid disease classification using thyroid scintigraphy images, aiming to decrease assessment time and increase diagnostic accuracy. Anterior thyroid scintigraphy images from 2,643 patients were collected and categorized into diffuse goiter (DG), multinodular goiter (MNG), and thyroiditis (TH) based on clinical reports, and then segmented by an expert. A ResUNet model was trained to perform auto-segmentation. Radiomic features were extracted from both physician (scenario 1) and ResUNet segmentations (scenario 2), followed by omitting highly correlated features using Spearman's correlation, and feature selection using Recursive Feature Elimination (RFE) with XGBoost as the core. All models were trained under a leave-one-center-out cross-validation (LOCOCV) scheme, where nine instances of algorithms were iteratively trained and validated on data from eight centers and tested on the ninth, for both scenarios separately. Segmentation performance was assessed using the Dice similarity coefficient (DSC), while classification performance was assessed using metrics such as precision, recall, F1-score, accuracy, area under the Receiver Operating Characteristic curve (ROC AUC), and area under the precision-recall curve (PRC AUC). ResUNet achieved DSC values of 0.84±0.03, 0.71±0.06, and 0.86±0.02 for MNG, TH, and DG, respectively. Classification in scenario 1 achieved an accuracy of 0.76±0.04 and a ROC AUC of 0.92±0.02, while in scenario 2, classification yielded an accuracy of 0.74±0.05 and a ROC AUC of 0.90±0.02. The automated pipeline demonstrated comparable performance to physician segmentations on several classification metrics across different classes, effectively reducing assessment time while maintaining high diagnostic accuracy. Code is available at: https://github.com/ahxmeds/thyroidiomics.git.
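
Two of the evaluation ingredients named above are easy to show in code: the Dice similarity coefficient, DSC = 2|A∩B| / (|A| + |B|), and a leave-one-center-out split. The sketch below uses plain Python lists and hypothetical function names; it is an illustration of the definitions, not the code from the linked repository, and returning 1.0 when both masks are empty is an assumption.

```python
def dice_coefficient(pred_mask, true_mask):
    """Dice similarity between two flat binary masks given as 0/1 sequences."""
    if len(pred_mask) != len(true_mask):
        raise ValueError("masks must have the same number of pixels")
    overlap = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    size_sum = sum(1 for p in pred_mask if p) + sum(1 for t in true_mask if t)
    if size_sum == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumption)
    return 2.0 * overlap / size_sum


def leave_one_center_out(center_ids):
    """Yield (held_out_center, train_indices, test_indices) once per center."""
    for held_out in sorted(set(center_ids)):
        train = [i for i, c in enumerate(center_ids) if c != held_out]
        test = [i for i, c in enumerate(center_ids) if c == held_out]
        yield held_out, train, test


# Toy data: four "pixels" and four patients from three made-up centers.
print(dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0]))
for center, train_idx, test_idx in leave_one_center_out(["A", "A", "B", "C"]):
    print(center, train_idx, test_idx)
```

Holding out a whole center at a time tests whether the model transfers to data from a site it has never seen, which a random patient-level split would not reveal.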

Updated: 2024-07-14 21:29:28

Categories: eess.IV,cs.CV,cs.LG,physics.med-ph

Download: http://arxiv.org/abs/2407.10336v1

Towards Adapting Reinforcement Learning Agents to New Tasks: Insights from Q-Values

While contemporary reinforcement learning research and applications have embraced policy gradient methods as a panacea for learning problems, value-based methods can still be useful in many domains as long as we can work out how to exploit them in a sample-efficient way. In this paper, we explore the chaotic nature of DQNs in reinforcement learning, while seeking to understand how the information that they retain when trained can be repurposed for adapting a model to different tasks. We start by designing a simple experiment in which we are able to observe the Q-values for each state and action in an environment. Then we train in eight different ways to explore how these training algorithms affect the way that accurate Q-values are learned (or not learned). We test the adaptability of each trained model when retrained to accomplish a slightly modified task. We then scale our setup to test the larger problem of an autonomous vehicle at an unprotected intersection. We observe that the model is able to adapt to new tasks more quickly when the base model's Q-value estimates are closer to the true Q-values. The results provide some insights and guidelines into which algorithms are useful for sample-efficient task adaptation.

Updated: 2024-07-14 21:28:27

Categories: cs.AI

Download: http://arxiv.org/abs/2407.10335v1

An Interpretable Neural Network for Vegetation Phenotyping with Visualization of Trait-Based Spectral Features

Plant phenotyping is the assessment of a plant's traits, and plant identification is the process of determining its category, such as genus and species. In this paper we present an interpretable neural network trained on the UPWINS spectral library, which contains spectra with rich metadata across variation in species, health, growth stage, annual variation, and environmental conditions for 13 selected indicator species and natural common background species. We show that the neurons in the network learn spectral indicators for chemical and physiological traits through visualization of the network weights, and we show how these traits are combined by the network for species identification with an accuracy of around 90% on a test set. While neural networks are often perceived as "black box" classifiers, our work shows that they can in fact be more explainable and informative than other machine learning methods. We show that the neurons learn fundamental traits about the vegetation, for example the composition of the different types of chlorophyll present, which indicates species as well as response to illumination conditions. There is clear excess training capacity in our network, and we expect that as the UPWINS spectral library continues to grow, the approach in this paper will provide further foundational insights into plant traits. This provides a methodology for designing and interpreting neural networks on spectral data in general, and a framework for using neural networks with hyperspectral imagery to understand vegetation that is extendable to other domains.

Updated: 2024-07-14 21:20:37

Categories: cs.LG,q-bio.QM,stat.ML

Download: http://arxiv.org/abs/2407.10333v1

Ontology-driven Reinforcement Learning for Personalized Student Support

In the search for more effective education, there is a widespread effort to develop better approaches to personalize student education. Unassisted, educators often do not have time or resources to personally support every student in a given classroom. Motivated by this issue, and by recent advancements in artificial intelligence, this paper presents a general-purpose framework for personalized student support, applicable to any virtual educational system such as a serious game or an intelligent tutoring system. To fit any educational situation, we apply ontologies for their semantic organization, combining them with data collection considerations and multi-agent reinforcement learning. The result is a modular system that can be adapted to any virtual educational software to provide useful personalized assistance to students.

Updated: 2024-07-14 21:11:44

Categories: cs.CY,cs.LG

Download: http://arxiv.org/abs/2407.10332v1

3D Foundation Models Enable Simultaneous Geometry and Pose Estimation of Grasped Objects

Humans have the remarkable ability to use held objects as tools to interact with their environment. For this to occur, humans internally estimate how hand movements affect the object's movement. We wish to endow robots with this capability. We contribute methodology to jointly estimate the geometry and pose of objects grasped by a robot, from RGB images captured by an external camera. Notably, our method transforms the estimated geometry into the robot's coordinate frame, while not requiring the extrinsic parameters of the external camera to be calibrated. Our approach leverages 3D foundation models, large models pre-trained on huge datasets for 3D vision tasks, to produce initial estimates of the in-hand object. These initial estimations do not have physically correct scales and are in the camera's frame. Then, we formulate, and efficiently solve, a coordinate-alignment problem to recover accurate scales, along with a transformation of the objects to the coordinate frame of the robot. Forward kinematics mappings can subsequently be defined from the manipulator's joint angles to specified points on the object. These mappings enable the estimation of points on the held object at arbitrary configurations, enabling robot motion to be designed with respect to coordinates on the grasped objects. We empirically evaluate our approach on a robot manipulator holding a diverse set of real-world objects.

Updated: 2024-07-14 21:02:55

Categories: cs.RO,cs.LG,cs.SY,eess.SY

Download: http://arxiv.org/abs/2407.10331v1

Enhancing Low-Precision Sampling via Stochastic Gradient Hamiltonian Monte Carlo

Low-precision training has emerged as a promising low-cost technique to enhance the training efficiency of deep neural networks without sacrificing much accuracy. Its Bayesian counterpart can further provide uncertainty quantification and improved generalization accuracy. This paper investigates low-precision sampling via Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) with low-precision and full-precision gradient accumulators for both strongly log-concave and non-log-concave distributions. Theoretically, our results show that, to achieve $\epsilon$-error in the 2-Wasserstein distance for non-log-concave distributions, low-precision SGHMC achieves quadratic improvement ($\widetilde{\mathbf{O}}\left({\epsilon^{-2}{\mu^*}^{-2}\log^2\left({\epsilon^{-1}}\right)}\right)$) compared to the state-of-the-art low-precision sampler, Stochastic Gradient Langevin Dynamics (SGLD) ($\widetilde{\mathbf{O}}\left({{\epsilon}^{-4}{\lambda^{*}}^{-1}\log^5\left({\epsilon^{-1}}\right)}\right)$). Moreover, we prove that low-precision SGHMC is more robust to the quantization error compared to low-precision SGLD due to the robustness of the momentum-based update w.r.t. gradient noise. Empirically, we conduct experiments on synthetic data and on the MNIST, CIFAR-10, and CIFAR-100 datasets, which validate our theoretical findings. Our study highlights the potential of low-precision SGHMC as an efficient and accurate sampling method for large-scale and resource-limited machine learning.
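
As a rough illustration of what low-precision sampling involves, the sketch below combines a commonly used momentum-style SGHMC update with stochastic rounding onto a fixed grid. It is a one-dimensional toy under stated assumptions, not the paper's algorithm: the function names, the choice to quantize both the parameter and the momentum after every step, and all hyperparameter values are hypothetical.

```python
import math
import random


def stochastic_round(x, delta, rng):
    """Round x to a multiple of delta; round up with probability equal to the
    fractional part, so the result equals x in expectation."""
    lower = math.floor(x / delta)
    frac = x / delta - lower
    return (lower + (1 if rng.random() < frac else 0)) * delta


def sghmc_step(theta, v, grad, step_size, friction, delta, rng):
    """One momentum-style SGHMC update followed by quantization (assumed scheme)."""
    theta = theta + v
    noise = rng.gauss(0.0, math.sqrt(2.0 * friction * step_size))
    v = v - step_size * grad(theta) - friction * v + noise
    return stochastic_round(theta, delta, rng), stochastic_round(v, delta, rng)


# Toy target: standard normal, whose potential U(theta) = theta^2 / 2 has gradient theta.
rng = random.Random(0)
theta, v = 1.0, 0.0
samples = []
for _ in range(2000):
    theta, v = sghmc_step(theta, v, lambda t: t, 0.1, 0.1, 1 / 16, rng)
    samples.append(theta)
print(sum(samples) / len(samples))
```

Stochastic rounding is used instead of round-to-nearest because it keeps the quantization error zero-mean, so the rounding behaves like extra noise rather than a systematic bias.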

Updated: 2024-07-14 21:02:27

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2310.16320v2

Proof-of-Learning with Incentive Security

Most current blockchain systems rely heavily on the Proof-of-Work (PoW) or Proof-of-Stake (PoS) mechanisms for decentralized consensus and security assurance. However, the substantial energy expenditure stemming from computationally intensive yet meaningless tasks has raised considerable concerns surrounding traditional PoW approaches. The PoS mechanism, while free of energy consumption, is subject to security and economic issues. Addressing these issues, the paradigm of Proof-of-Useful-Work (PoUW) seeks to employ challenges of practical significance as PoW, thereby imbuing energy consumption with tangible value. While previous efforts in Proof of Learning (PoL) explored the utilization of deep learning model training SGD tasks as PoUW challenges, recent research has revealed their vulnerability to adversarial attacks and the theoretical hardness of crafting a byzantine-secure PoL mechanism. In this paper, we introduce the concept of incentive-security, which incentivizes rational provers to behave honestly in their own best interest, bypassing the existing hardness to design a PoL mechanism with computational efficiency, a provable incentive-security guarantee, and controllable difficulty. In particular, our work is secure against two attacks on the recent work of Jia et al. [2021], and also improves the computational overhead from $\Theta(1)$ to $O(\frac{\log E}{E})$. Furthermore, while most recent research assumes trusted problem providers and verifiers, our design also guarantees frontend incentive-security even when problem providers are untrusted, and verifier incentive-security that bypasses the Verifier's Dilemma. By incorporating ML training into blockchain consensus mechanisms with provable guarantees, our research not only proposes an eco-friendly solution for blockchain systems, but also provides a proposal for a completely decentralized computing power market in the new AI age.

Updated: 2024-07-14 20:56:10

Categories: cs.CR,cs.AI,cs.ET,cs.GT,cs.LG

Download: http://arxiv.org/abs/2404.09005v6

The Interpretation Gap in Text-to-Music Generation Models

Large-scale text-to-music generation models have significantly enhanced music creation capabilities, offering unprecedented creative freedom. However, their ability to collaborate effectively with human musicians remains limited. In this paper, we propose a framework to describe the musical interaction process, which includes expression, interpretation, and execution of controls. Following this framework, we argue that the primary gap between existing text-to-music models and musicians lies in the interpretation stage, where models lack the ability to interpret controls from musicians. We also propose two strategies to address this gap and call on the music information retrieval community to tackle the interpretation challenge to improve human-AI musical collaboration.

Updated: 2024-07-14 20:51:08

Categories: cs.SD,cs.AI

Download: http://arxiv.org/abs/2407.10328v1

Learning Unlabeled Clients Divergence via Anchor Model Aggregation for Federated Semi-supervised Learning

Federated semi-supervised learning (FedSemi) refers to scenarios where there may be clients with fully labeled data, clients with partially labeled data, and even fully unlabeled clients while preserving data privacy. However, challenges arise from client drift due to undefined heterogeneous class distributions and erroneous pseudo-labels. Existing FedSemi methods typically fail to aggregate models from unlabeled clients due to their inherent unreliability, thus overlooking unique information from their heterogeneous data distribution, leading to sub-optimal results. In this paper, we enable unlabeled client aggregation through SemiAnAgg, a novel Semi-supervised Anchor-Based federated Aggregation. SemiAnAgg learns unlabeled client contributions via an anchor model, effectively harnessing their informative value. Our key idea is that by feeding local client data to the same global model and the same consistently initialized anchor model (i.e., random model), we can measure the importance of each unlabeled client accordingly. Extensive experiments demonstrate that SemiAnAgg achieves new state-of-the-art results on four widely used FedSemi benchmarks, leading to substantial performance improvements: a 9% increase in accuracy on CIFAR-100 and a 7.6% improvement in recall on the medical dataset ISIC-18, compared with prior state-of-the-art. Code is available at: https://github.com/xmed-lab/SemiAnAgg.

Updated: 2024-07-14 20:50:40

Categories: cs.LG,cs.AI,cs.CV

Download: http://arxiv.org/abs/2407.10327v1

Insecurity of Quantum Two-Party Computation with Applications to Cheat-Sensitive Protocols and Oblivious Transfer Reductions

Oblivious transfer (OT) is a fundamental primitive for secure two-party computation. It is well known that OT cannot be implemented with information-theoretic security if the two players only have access to noiseless communication channels, even in the quantum case. As a result, weaker variants of OT have been studied. In this work, we rigorously establish the impossibility of cheat-sensitive OT, where a dishonest party can cheat, but risks being detected. We construct a general attack on any quantum protocol that allows the receiver to compute all inputs of the sender and provide an explicit upper bound on the success probability of this attack. This implies that cheat-sensitive quantum Symmetric Private Information Retrieval cannot be implemented with statistical information-theoretic security. Leveraging the techniques devised for our proofs, we provide entropic bounds on primitives needed for secure function evaluation. They imply impossibility results for protocols where the players have access to OT as a resource. This result significantly improves upon existing bounds and yields tight bounds for reductions of 1-out-of-n OT to a resource primitive. Our results hold in particular for transformations between a finite number of primitives and for any error.

Updated: 2024-07-14 20:48:17

Categories: quant-ph,cs.CR

Download: http://arxiv.org/abs/2405.12121v2

Emergent World Models and Latent Variable Estimation in Chess-Playing Language Models

Language models have shown unprecedented capabilities, sparking debate over the source of their performance. Is it merely the outcome of learning syntactic patterns and surface level statistics, or do they extract semantics and a world model from the text? Prior work by Li et al. investigated this by training a GPT model on synthetic, randomly generated Othello games and found that the model learned an internal representation of the board state. We extend this work into the more complex domain of chess, training on real games and investigating our model's internal representations using linear probes and contrastive activations. The model is given no a priori knowledge of the game and is solely trained on next character prediction, yet we find evidence of internal representations of board state. We validate these internal representations by using them to make interventions on the model's activations and edit its internal board state. Unlike Li et al.'s prior synthetic dataset approach, our analysis finds that the model also learns to estimate latent variables like player skill to better predict the next character. We derive a player skill vector and add it to the model, improving the model's win rate by up to 2.6 times.

Updated: 2024-07-14 20:23:19

Categories: cs.LG,cs.CL

Download: http://arxiv.org/abs/2403.15498v2

Order parameters and phase transitions of continual learning in deep neural networks

Continual learning (CL) enables animals to learn new tasks without erasing prior knowledge. CL in artificial neural networks (NNs) is challenging due to catastrophic forgetting, where new learning degrades performance on older tasks. While various techniques exist to mitigate forgetting, theoretical insights into when and why CL fails in NNs are lacking. Here, we present a statistical-mechanics theory of CL in deep, wide NNs, which characterizes the network's input-output mapping as it learns a sequence of tasks. It gives rise to order parameters (OPs) that capture how task relations and network architecture influence forgetting and knowledge transfer, as verified by numerical evaluations. We found that the input and rule similarity between tasks have different effects on CL performance. In addition, the theory predicts that increasing the network depth can effectively reduce overlap between tasks, thereby lowering forgetting. For networks with task-specific readouts, the theory identifies a phase transition where CL performance shifts dramatically as tasks become less similar, as measured by the OPs. Sufficiently low similarity leads to catastrophic anterograde interference, where the network retains old tasks perfectly but completely fails to generalize new learning. Our results delineate important factors affecting CL performance and suggest strategies for mitigating forgetting.

Updated: 2024-07-14 20:22:36

Categories: cs.LG,physics.app-ph,q-bio.NC

Download: http://arxiv.org/abs/2407.10315v1

MUSE: Machine Unlearning Six-Way Evaluation for Language Models

Language models (LMs) are trained on vast amounts of text data, which may include private and copyrighted content. Data owners may request the removal of their data from a trained model due to privacy or copyright concerns. However, exactly unlearning only these datapoints (i.e., retraining with the data removed) is intractable in modern-day models. This has led to the development of many approximate unlearning algorithms. The evaluation of the efficacy of these algorithms has traditionally been narrow in scope, failing to precisely quantify the success and practicality of the algorithm from the perspectives of both the model deployers and the data owners. We address this issue by proposing MUSE, a comprehensive machine unlearning evaluation benchmark that enumerates six diverse desirable properties for unlearned models: (1) no verbatim memorization, (2) no knowledge memorization, (3) no privacy leakage, (4) utility preservation on data not intended for removal, (5) scalability with respect to the size of removal requests, and (6) sustainability over sequential unlearning requests. Using these criteria, we benchmark how effectively eight popular unlearning algorithms on 7B-parameter LMs can unlearn Harry Potter books and news articles. Our results demonstrate that most algorithms can prevent verbatim memorization and knowledge memorization to varying degrees, but only one algorithm does not lead to severe privacy leakage. Furthermore, existing algorithms fail to meet deployers' expectations because they often degrade general model utility and also cannot sustainably accommodate successive unlearning requests or large-scale content removal. Our findings identify key issues with the practicality of existing unlearning algorithms on language models, and we release our benchmark to facilitate further evaluations: muse-bench.github.io

Updated: 2024-07-14 20:14:02

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2407.06460v2

Augmented prediction of a true class for Positive Unlabeled data under selection bias

We introduce a new observational setting for Positive Unlabeled (PU) data where the observations at prediction time are also labeled. This occurs commonly in practice -- we argue that the additional information is important for prediction, and call this task "augmented PU prediction". We allow for labeling to be feature dependent. In this scenario, we establish the Bayes classifier and its risk and compare it with the risk of a classifier that, for unlabeled data, is based only on predictors. We introduce several variants of the empirical Bayes rule in this scenario and investigate their performance. We emphasise the dangers (and ease) of applying a classical classification rule in the augmented PU scenario -- due to no preexisting studies, an unaware researcher is prone to skewing the obtained predictions. We conclude that the variant based on a recently proposed variational autoencoder designed for the PU scenario performs on par with or better than the other considered variants and yields an advantage over feature-only methods in terms of accuracy for unlabeled samples.

Updated: 2024-07-14 19:58:01

Categories: stat.ML,cs.LG

Download: http://arxiv.org/abs/2407.10309v1

Ada-NAV: Adaptive Trajectory Length-Based Sample Efficient Policy Learning for Robotic Navigation

Trajectory length stands as a crucial hyperparameter within reinforcement learning (RL) algorithms, significantly contributing to the sample inefficiency in robotics applications. Motivated by the pivotal role trajectory length plays in the training process, we introduce Ada-NAV, a novel adaptive trajectory length scheme designed to enhance the training sample efficiency of RL algorithms in robotic navigation tasks. Unlike traditional approaches that treat trajectory length as a fixed hyperparameter, we propose to dynamically adjust it based on the entropy of the underlying navigation policy. Interestingly, Ada-NAV can be applied to both existing on-policy and off-policy RL methods, which we demonstrate by empirically validating its efficacy on three popular RL methods: REINFORCE, Proximal Policy Optimization (PPO), and Soft Actor-Critic (SAC). We demonstrate through simulated and real-world robotic experiments that Ada-NAV outperforms conventional methods that employ constant or randomly sampled trajectory lengths. Specifically, for a fixed sample budget, Ada-NAV achieves an 18% increase in navigation success rate, a 20-38% reduction in navigation path length, and a 9.32% decrease in elevation costs. Furthermore, we showcase the versatility of Ada-NAV by integrating it with the Clearpath Husky robot, illustrating its applicability in complex outdoor environments.

Updated: 2024-07-14 19:35:43

Categories: cs.RO,cs.AI,cs.LG

Download: http://arxiv.org/abs/2306.06192v6

The Feasibility of a Smart Contract "Kill Switch"

The advent of blockchain technology and its adoption across various sectors have raised critical discussions about the need for regulatory mechanisms to ensure consumer protection, maintain financial stability, and address privacy concerns without compromising the foundational principles of decentralization and immutability inherent in blockchain platforms. We examine the existing mechanisms for smart contract termination across several major blockchain platforms, including Ethereum, BNB Smart Chain, Cardano, Solana, Hyperledger Fabric, Corda, IOTA, Aptos, and Sui. We assess the compatibility of these mechanisms with the requirements of the EU Data Act, focusing on aspects such as consumer protection, error correction, and regulatory compliance. Our analysis reveals a diverse landscape of approaches, from immutable smart contracts with built-in termination conditions to upgradable smart contracts that allow for post-deployment modifications. We discuss the challenges associated with implementing the so-called smart contract "kill switches," such as the balance between enabling regulatory compliance and preserving the decentralized ethos, the technical feasibility of such mechanisms, and the implications for security and trust in the ecosystem.

Updated: 2024-07-14 19:31:15

Categories: cs.CR,cs.ET

Download: http://arxiv.org/abs/2407.10302v1

Risks of uncertainty propagation in AI-augmented security pipelines

The use of AI technologies is percolating into the secure development of software-based systems, with an increasing trend of composing AI-based subsystems (with uncertain levels of performance) into automated pipelines. This presents a fundamental research challenge and poses a serious threat to safety-critical domains (e.g., aviation). Despite the existing knowledge about uncertainty in risk analysis, no previous work has estimated the uncertainty of AI-augmented systems given the propagation of errors in the pipeline. We provide the formal underpinnings for capturing uncertainty propagation, develop a simulator to quantify uncertainty, and evaluate the simulation of propagating errors with two case studies. We discuss the generalizability of our approach and present policy implications and recommendations for aviation. Future work includes extending the approach and investigating the required metrics for validation in the aviation domain.
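
To make the idea of error propagation through a pipeline concrete, here is a minimal Monte Carlo sketch. It is not the authors' simulator: the stage accuracy ranges are hypothetical, and it assumes that stages fail independently and that the pipeline output is correct only when every stage is correct.

```python
import random


def simulate_pipeline(stage_ranges, trials=10_000, seed=0):
    """Monte Carlo estimate of end-to-end accuracy for a chain of stages.

    stage_ranges: list of (low, high) accuracy bounds, one per stage.
    The uniform sampling and stage independence are assumptions of
    this sketch, not something stated in the abstract.
    """
    rng = random.Random(seed)
    outcomes = []
    for _ in range(trials):
        accuracy = 1.0
        for low, high in stage_ranges:
            # Assumed: the output is right only if every stage is right,
            # so per-stage accuracies multiply along the chain.
            accuracy *= rng.uniform(low, high)
        outcomes.append(accuracy)
    outcomes.sort()
    mean = sum(outcomes) / trials
    p05 = outcomes[int(0.05 * trials)]
    p95 = outcomes[int(0.95 * trials)]
    return mean, p05, p95


# Three hypothetical AI-based stages with uncertain accuracy.
stages = [(0.90, 0.99), (0.80, 0.95), (0.85, 0.90)]
mean, p05, p95 = simulate_pipeline(stages)
print(f"mean={mean:.3f}, 5th pct={p05:.3f}, 95th pct={p95:.3f}")
```

Even when each stage looks acceptable in isolation, the end-to-end interval is both lower and wider than any single stage's range, which is the effect the abstract warns about.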

Updated: 2024-07-14 19:02:20

Categories: cs.SE,cs.AI,cs.CR

Download: http://arxiv.org/abs/2407.14540v1

Merging Improves Self-Critique Against Jailbreak Attacks

The robustness of large language models (LLMs) against adversarial manipulations, such as jailbreak attacks, remains a significant challenge. In this work, we propose an approach that enhances the self-critique capability of the LLM and further fine-tunes it over sanitized synthetic data. This is done with the addition of an external critic model that can be merged with the original, thus bolstering self-critique capabilities and improving the robustness of the LLM's response to adversarial prompts. Our results demonstrate that the combination of merging and self-critique can reduce the attack success rate of adversaries significantly, thus offering a promising defense mechanism against jailbreak attacks. Code, data and models released at https://github.com/vicgalle/merging-self-critique-jailbreaks .
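
The abstract does not say which merging method is used. As an illustration only, the sketch below shows linear interpolation of parameters, one common way to merge two models with identical architectures. The parameter names, the toy weights, and the mixing coefficient `alpha` are all hypothetical.

```python
def merge_linear(base, critic, alpha=0.5):
    """Interpolate two parameter dicts: (1 - alpha) * base + alpha * critic.

    Both dicts map a parameter name to a flat list of floats. This is a
    generic merging scheme, not necessarily the one used in the paper.
    """
    if base.keys() != critic.keys():
        raise ValueError("models must share the same parameter names")
    merged = {}
    for name in base:
        if len(base[name]) != len(critic[name]):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [
            (1.0 - alpha) * b + alpha * c
            for b, c in zip(base[name], critic[name])
        ]
    return merged


# Toy "models" with a single two-element weight vector each.
base_model = {"layer.weight": [1.0, 2.0]}
critic_model = {"layer.weight": [3.0, 4.0]}
merged_model = merge_linear(base_model, critic_model, alpha=0.5)
print(merged_model)
```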

Updated: 2024-07-14 18:27:14

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2406.07188v2

Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge

With the breakthrough of multi-modal large language models, answering complex visual questions that demand advanced reasoning abilities and world knowledge has become a much more important testbed for developing AI models than ever. However, equipping AI models with robust cross-modality reasoning ability remains challenging since the cognition scheme of humans has not been understood systematically. In this paper, we believe that if we can collect visual clues in the given image as much as possible, we will recognize the image more accurately, understand the question better, recall relevant knowledge more easily, and finally reason out the answer. We discover these rich visual clues by mining question-answer pairs in images and sending them into multi-modal large language models as prompts. We call the proposed method Q&A Prompts. Specifically, we first use the image-answer pairs and the corresponding questions in the training set as inputs and outputs to train a visual question generation model. Then, we use an image tagging model to identify various instances and send packaged image-tag pairs into the visual question generation model to generate relevant questions with the extracted image tags as answers. Finally, we encode these generated question-answer pairs as prompts with a visual-aware prompting module and send them into pre-trained multi-modal large language models to reason out the final answers. Experimental results show that, compared with state-of-the-art methods, our Q&A Prompts achieves substantial improvements on the challenging visual question answering datasets requiring reasoning over diverse world knowledge, such as OK-VQA and A-OKVQA.

Updated: 2024-07-14 18:18:05

Categories: cs.CV,cs.AI,cs.CL

Download: http://arxiv.org/abs/2401.10712v4

A Closer Look at the Limitations of Instruction Tuning

Instruction Tuning (IT), the process of training large language models (LLMs) using instruction-response pairs, has emerged as the predominant method for transforming base pre-trained LLMs into open-domain conversational agents. While IT has achieved notable success and widespread adoption, its limitations and shortcomings remain underexplored. In this paper, through rigorous experiments and an in-depth analysis of the changes LLMs undergo through IT, we reveal various limitations of IT. In particular, we show that (1) IT fails to enhance knowledge or skills in LLMs. LoRA fine-tuning is limited to learning response initiation and style tokens, and full-parameter fine-tuning leads to knowledge degradation. (2) Copying response patterns from IT datasets derived from knowledgeable sources leads to a decline in response quality. (3) Full-parameter fine-tuning increases hallucination by inaccurately borrowing tokens from conceptually similar instances in the IT dataset for generating responses. (4) Popular methods to improve IT do not lead to performance improvements over a simple LoRA fine-tuned model. Our findings reveal that responses generated solely from pre-trained knowledge consistently outperform responses by models that learn any form of new knowledge from IT on open-source datasets. We hope the insights and challenges revealed in this paper inspire future work in related directions.

Updated: 2024-07-14 18:14:57

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2402.05119v5

Hyperplane Arrangements and Fixed Points in Iterated PWL Neural Networks

We leverage the framework of hyperplane arrangements to analyze potential regions of (stable) fixed points. We provide an upper bound on the number of fixed points for multi-layer neural networks equipped with piecewise linear (PWL) activation functions with arbitrarily many linear pieces. We show that the exponential growth of this bound in the number of layers is theoretically optimal. We also derive a sharper upper bound on the number of stable fixed points for one-hidden-layer networks with hard tanh activation.
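
As a small illustration of the region-by-region view, the sketch below finds the fixed points of a scalar one-hidden-layer hard tanh map by solving f(x) = x separately on each linear piece. The function names and the toy weights are assumptions made for this example; it does not reproduce the paper's bounds.

```python
def hardtanh(z):
    return max(-1.0, min(1.0, z))


def network(x, w, b, v, c):
    """Scalar map f(x) = sum_i v_i * hardtanh(w_i * x + b_i) + c."""
    return sum(vi * hardtanh(wi * x + bi) for wi, bi, vi in zip(w, b, v)) + c


def fixed_points(w, b, v, c):
    """Return [(x, is_stable)] for fixed points strictly inside a linear region."""
    inf = float("inf")
    # In one dimension the "hyperplanes" are the points where a hidden
    # unit enters or leaves saturation: w_i * x + b_i = -1 or +1.
    cuts = sorted({(s - bi) / wi for wi, bi in zip(w, b) if wi != 0 for s in (-1.0, 1.0)})
    edges = [-inf] + cuts + [inf]
    found = []
    for lo, hi in zip(edges, edges[1:]):
        # Pick any point inside the region to read off the local affine map.
        if lo == -inf and hi == inf:
            probe = 0.0
        elif lo == -inf:
            probe = hi - 1.0
        elif hi == inf:
            probe = lo + 1.0
        else:
            probe = 0.5 * (lo + hi)
        # Only unsaturated units contribute to the slope on this region.
        slope = sum(vi * wi for wi, bi, vi in zip(w, b, v) if -1.0 < wi * probe + bi < 1.0)
        intercept = network(probe, w, b, v, c) - slope * probe
        if slope == 1.0:
            continue  # degenerate region (no or infinitely many solutions), skipped
        x = intercept / (1.0 - slope)
        # Fixed points lying exactly on a region boundary are ignored here.
        if lo < x < hi:
            found.append((x, abs(slope) < 1.0))  # stable iff |f'(x)| < 1
    return found


# f(x) = hardtanh(2x): three linear regions, one candidate per region.
points = fixed_points(w=[2.0], b=[0.0], v=[1.0], c=0.0)
for x, stable in points:
    print(f"x = {x:+.2f}, stable = {stable}")
```

For this toy map the saturated regions each hold a stable fixed point and the middle region holds an unstable one, so the count of stable fixed points is tied to the number of regions the breakpoints create.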

Updated: 2024-07-14 18:01:28

Categories: cs.LG,cs.AI,stat.ML,68T07,G.0

Download: http://arxiv.org/abs/2405.09878v2

Numbers Matter! Bringing Quantity-awareness to Retrieval Systems

Quantitative information plays a crucial role in understanding and interpreting the content of documents. Many user queries contain quantities and cannot be resolved without understanding their semantics, e.g., "car that costs less than $10k". Yet, modern search engines apply the same ranking mechanisms for both words and quantities, overlooking magnitude and unit information. In this paper, we introduce two quantity-aware ranking techniques designed to rank both the quantity and textual content either jointly or independently. These techniques incorporate quantity information in available retrieval systems and can address queries with the numerical conditions equal to, greater than, and less than. To evaluate the effectiveness of our proposed models, we introduce two novel quantity-aware benchmark datasets in the domains of finance and medicine and compare our method against various lexical and neural models. The code and data are available under https://github.com/satya77/QuantityAwareRankers.
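
The following toy sketch shows why a numeric condition needs magnitude and unit handling in addition to term matching. It is not one of the paper's ranking techniques: the document structure, the overlap score, and the hard filter on the quantity condition are assumptions chosen for brevity.

```python
import operator

# Supported numerical conditions: equal, greater than, less than.
OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}


def term_overlap(query_terms, doc_terms):
    """Number of distinct query terms that also occur in the document."""
    return len(set(query_terms) & set(doc_terms))


def rank(query_terms, condition, docs):
    """Keep documents with a quantity satisfying the condition, then sort by overlap."""
    op, value, unit = condition
    scored = []
    for doc in docs:
        satisfied = any(
            doc_unit == unit and OPS[op](doc_value, value)
            for doc_value, doc_unit in doc["quantities"]
        )
        if satisfied:
            scored.append((term_overlap(query_terms, doc["terms"]), doc["id"]))
    # Higher overlap first; ties broken by document id for determinism.
    return [doc_id for score, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]


docs = [
    {"id": "d1", "terms": ["used", "car"], "quantities": [(8500, "USD")]},
    {"id": "d2", "terms": ["car"], "quantities": [(12000, "USD")]},
    {"id": "d3", "terms": ["car", "sedan"], "quantities": [(9000, "EUR")]},
    {"id": "d4", "terms": ["bike"], "quantities": [(300, "USD")]},
]

# "car that costs less than $10k"
results = rank(["car"], ("<", 10000, "USD"), docs)
print(results)
```

A purely lexical ranker would treat d1, d2, and d3 alike because all three mention "car"; only the quantity check separates the document that actually meets the price condition in the right unit.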

Updated: 2024-07-14 17:56:11

Categories: cs.IR,cs.LG,H.3.3; I.2.7

Download: http://arxiv.org/abs/2407.10283v1

Deep Learning Activation Functions: Fixed-Shape, Parametric, Adaptive, Stochastic, Miscellaneous, Non-Standard, Ensemble

In the architecture of deep learning models, inspired by biological neurons, activation functions (AFs) play a pivotal role. They significantly influence the performance of artificial neural networks. By modulating the non-linear properties essential for learning complex patterns, AFs are fundamental in both classification and regression tasks. This paper presents a comprehensive review of various types of AFs, including fixed-shape, parametric, adaptive, stochastic/probabilistic, non-standard, and ensemble/combining types. We begin with a systematic taxonomy and detailed classification frameworks that delineate the principal characteristics of AFs and organize them based on their structural and functional distinctions. Our in-depth analysis covers primary groups such as sigmoid-based, ReLU-based, and ELU-based AFs, discussing their theoretical foundations, mathematical formulations, and specific benefits and limitations in different contexts. We also highlight key attributes of AFs such as output range, monotonicity, and smoothness. Furthermore, we explore miscellaneous AFs that do not conform to these categories but have shown unique advantages in specialized applications. Non-standard AFs are also explored, showcasing cutting-edge variations that challenge traditional paradigms and offer enhanced adaptability and model performance. We examine strategies for combining multiple AFs to leverage complementary properties. The paper concludes with a comparative evaluation of 12 state-of-the-art AFs, using rigorous statistical and experimental methodologies to assess their efficacy. This analysis not only aids practitioners in selecting and designing the most appropriate AFs for their specific deep learning tasks but also encourages continued innovation in AF development within the machine learning community.
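
For readers who want the three primary groups side by side, here is a short sketch using the standard textbook definitions of sigmoid, ReLU, and ELU, with an empirical check of two of the attributes the survey highlights (output range and monotonicity). The grid bounds and the `summarize` helper are choices made for this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def relu(x):
    return max(0.0, x)


def elu(x, alpha=1.0):
    # Identity for positive inputs, smooth saturation toward -alpha otherwise.
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def summarize(fn, lo=-5.0, hi=5.0, n=201):
    """Empirical output range and monotonicity of fn on an evenly spaced grid."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    ys = [fn(x) for x in xs]
    monotone = all(later >= earlier for earlier, later in zip(ys, ys[1:]))
    return {"min": min(ys), "max": max(ys), "monotone": monotone}


for name, fn in [("sigmoid", sigmoid), ("relu", relu), ("elu", elu)]:
    stats = summarize(fn)
    print(f"{name:8s} min={stats['min']:.4f} max={stats['max']:.4f} monotone={stats['monotone']}")
```

On this grid the sigmoid stays inside (0, 1), ReLU is bounded below by 0 and unbounded above, and ELU with alpha = 1 is bounded below by -1, which matches the usual characterization of their output ranges.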

Updated: 2024-07-14 17:53:49

Categories: cs.LG

Download: http://arxiv.org/abs/2407.11090v1

AlphaDou: High-Performance End-to-End Doudizhu AI Integrating Bidding

Artificial intelligence for card games has long been a popular topic in AI research. In recent years, complex card games like Mahjong and Texas Hold'em have been solved, with corresponding AI programs reaching the level of human experts. However, the game of Dou Di Zhu presents significant challenges due to its vast state/action space and unique characteristics involving reasoning about competition and cooperation, making the game extremely difficult to solve. The RL model DouZero, trained using the Deep Monte Carlo algorithm framework, has shown excellent performance in DouDiZhu. However, there are differences between its simplified game environment and the actual Dou Di Zhu environment, and its performance is still a considerable distance from that of human experts. This paper modifies the Deep Monte Carlo algorithm framework by using reinforcement learning to obtain a neural network that simultaneously estimates win rates and expectations. The action space is pruned using expectations, and strategies are generated based on win rates. This RL model is trained in a realistic DouDiZhu environment and achieves a state-of-the-art level among publicly available models.
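
The two-step selection described above (prune by expectation, then pick by win rate) can be sketched as follows. The abstract does not give the pruning rule, so keeping the top `keep` actions by expectation is an assumption, and the candidate actions and their estimates are made up for the example.

```python
def choose_action(candidates, keep=2):
    """Pick an action from (name, win_rate, expectation) tuples.

    Step 1: prune to the `keep` actions with the highest expectation
            (top-k pruning is an assumption of this sketch).
    Step 2: among the survivors, return the one with the highest win rate.
    """
    pruned = sorted(candidates, key=lambda c: c[2], reverse=True)[:keep]
    return max(pruned, key=lambda c: c[1])[0]


# Hypothetical network outputs for four legal moves.
candidates = [
    ("pass", 0.55, 0.1),
    ("bomb", 0.50, 0.9),
    ("pair", 0.60, 0.4),
    ("single", 0.70, -0.2),
]
action = choose_action(candidates, keep=2)
print(action)
```

Here "single" has the best win rate but is pruned for its poor expectation, so the choice is made between "bomb" and "pair" only.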

Updated: 2024-07-14 17:32:36

标题: AlphaDou：高性能端到端斗地主AI整合叫牌

摘要: 人工智能在纸牌游戏中长期以来一直是人工智能研究中的热门话题。近年来，像麻将和德州扑克这样复杂的纸牌游戏已经被解决，相应的人工智能程序已经达到了人类专家的水平。然而，斗地主这个游戏由于其庞大的状态/动作空间和涉及到竞争和合作的独特特征，使得该游戏非常难以解决。使用深度蒙特卡洛算法框架训练的RL模型DouZero在斗地主中表现出色。然而，其简化的游戏环境与实际斗地主环境之间存在差异，其性能仍然与人类专家相去甚远。本文通过使用强化学习修改深度蒙特卡洛算法框架，得到一个同时估计胜率和期望的神经网络。通过期望对动作空间进行修剪，并基于胜率生成策略。这个RL模型在一个真实的斗地主环境中进行训练，并在公开可用的模型中达到了最先进的水平。

更新时间: 2024-07-14 17:32:36

领域: cs.AI,cs.GT,cs.MA

下载: http://arxiv.org/abs/2407.10279v1

From PEFT to DEFT: Parameter Efficient Finetuning for Reducing Activation Density in Transformers

Pretrained Language Models (PLMs) have become the de facto starting point for fine-tuning on downstream tasks. However, as model sizes continue to increase, traditional fine-tuning of all the parameters becomes challenging. To address this, parameter-efficient fine-tuning (PEFT) methods have gained popularity as a means to adapt PLMs effectively. In parallel, recent studies have revealed the presence of activation sparsity within the intermediate outputs of the multilayer perceptron (MLP) blocks in transformers. Low activation density enables efficient model inference on sparsity-aware hardware. Building upon this insight, in this work, we propose a novel density loss that encourages higher activation sparsity (equivalently, lower activation density) in the pre-trained models. We demonstrate the effectiveness of our approach by utilizing mainstream PEFT techniques, including QLoRA, LoRA, Adapter, and Prompt/Prefix Tuning, to facilitate efficient model adaptation across diverse downstream tasks. Experiments show that our proposed method, DEFT (Density-Efficient Fine-Tuning), can consistently reduce activation density by up to 44.94% on RoBERTa-Large and by 53.19% (encoder density) and 90.60% (decoder density) on Flan-T5-XXL (11B) compared to PEFT, using GLUE and QA (SQuAD) benchmarks respectively. We also introduce ADA-DEFT, an adaptive variant of our DEFT approach, which achieves significant memory and runtime savings during inference. For instance, ADA-DEFT reduces runtime by 8.79% and memory usage by 17.46% in Flan-T5-XL, and by 2.79% and 2.54% respectively in Flan-T5-XXL. Additionally, we showcase that DEFT works complementarily with quantized and pruned models.

Updated: 2024-07-14 17:32:36

标题: 从PEFT到DEFT：用于减少变压器中激活密度的参数高效微调

摘要: 预训练语言模型（PLMs）已成为在下游任务上微调的事实上的起点。然而，随着模型规模的持续增加，传统的微调所有参数变得具有挑战性。为了解决这一问题，参数高效微调（PEFT）方法作为一种有效调整PLMs的手段已经变得流行起来。同时，最近的研究揭示了变压器中多层感知器（MLP）块的中间输出中存在激活稀疏性。低激活密度使得在稀疏感知硬件上进行高效的模型推断成为可能。基于这一观点，在这项工作中，我们提出了一种鼓励更高激活稀疏性（等效地，更低激活密度）的新型密度损失，以促进预训练模型中的激活稀疏性。我们通过利用主流的PEFT技术，包括QLoRA、LoRA、Adapter和Prompt/Prefix Tuning，来展示我们方法的有效性，以促进在不同下游任务中的高效模型调整。实验证明，我们提出的方法DEFT（密度高效微调）可以在RoBERTa$_\mathrm{Large}$上将激活密度降低高达44.94％，并且在Flan-T5$_\mathrm{XXL}$（11B）上将编码器密度降低53.19％和解码器密度降低90.60％，相比于PEFT，在GLUE和QA（SQuAD）基准上分别进行了测试。我们还介绍了ADA-DEFT，这是我们DEFT方法的自适应变体，在推断过程中实现了显著的内存和运行时节约。例如，在Flan-T5$_\mathrm{XL}$中，ADA-DEFT将运行时减少了8.79％，内存使用减少了17.46％，在Flan-T5$_\mathrm{XXL}$中分别减少了2.79％和2.54％。此外，我们展示了DEFT与量化和修剪模型的互补作用。

更新时间: 2024-07-14 17:32:36

领域: cs.LG

下载: http://arxiv.org/abs/2402.01911v2
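
To make the notion of activation density in the abstract above concrete, the sketch below measures the fraction of non-zero entries after a ReLU and pairs it with a smooth proxy that a gradient-based optimizer could minimize. The abstract does not state the form of the density loss, so the tanh-based proxy `smooth_density`, its sharpness parameter `beta`, and the sample values are assumptions made purely for illustration.

```python
import math


def relu(x: float) -> float:
    return max(0.0, x)


def activation_density(acts) -> float:
    """Fraction of non-zero entries in a flat list of activations."""
    return sum(1 for a in acts if a != 0.0) / len(acts)


def smooth_density(acts, beta: float = 10.0) -> float:
    """Assumed differentiable proxy for density (not taken from the paper).

    tanh(beta * |a|) is 0 at a = 0 and approaches 1 as |a| grows, so the mean
    approximates the non-zero fraction while remaining differentiable.
    """
    return sum(math.tanh(beta * abs(a)) for a in acts) / len(acts)


# Hypothetical pre-activations of one MLP intermediate layer.
pre_activations = [-1.2, 0.4, -0.3, 2.0, -0.7, 0.0, 1.5, -2.1]
post_activations = [relu(x) for x in pre_activations]

if __name__ == "__main__":
    print("exact density :", activation_density(post_activations))
    print("smooth proxy  :", smooth_density(post_activations))
```

Adding such a proxy to the task loss with a small weight is one way a fine-tuning objective could push activations toward zero; lower density means more sparsity for sparsity-aware hardware to exploit.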

Disrupting Diffusion-based Inpainters with Semantic Digression

The fabrication of visual misinformation on the web and social media has increased exponentially with the advent of foundational text-to-image diffusion models. In particular, Stable Diffusion inpainters allow the synthesis of maliciously inpainted images of personal and private figures and of copyrighted content, also known as deepfakes. To combat such generations, a disruption framework, Photoguard, has been proposed, which adds adversarial noise to the context image to disrupt inpainting synthesis. While that framework suggested a diffusion-friendly approach, the disruption is not sufficiently strong, and it requires a significant amount of GPU memory and time to immunize the context image. In our work, we re-examine both the minimal and favorable conditions for a successful inpainting disruption, proposing DDD, a "Digression guided Diffusion Disruption" framework. First, we identify the most adversarially vulnerable diffusion timestep range with respect to the hidden space. Within this scope of noised manifold, we pose the problem as a semantic digression optimization. We maximize the distance between the inpainting instance's hidden states and a semantic-aware hidden state centroid, calibrated both by Monte Carlo sampling of hidden states and a discretely projected optimization in the token space. As a result, our approach achieves stronger disruption and a higher success rate than Photoguard while lowering the GPU memory requirement and speeding up the optimization by up to three times.

Updated: 2024-07-14 17:21:19

标题: 使用语义偏离打破基于扩散的修复算法

摘要: 随着基于文本到图像扩散模型的出现，网络和社交媒体上视觉虚假信息的制造呈指数增长。特别是，稳定扩散修复器允许合成个人和私人人物以及受版权保护内容的恶意修复图像，也被称为深度伪造。为了对抗这种生成，提出了一个破坏框架，即Photoguard，它向上下文图像添加对抗性噪声以干扰其修复合成。虽然他们的框架提出了一种友好的扩散方法，但是破坏并不足够强大，需要大量GPU和时间来使上下文图像免疫。在我们的工作中，我们重新审视了成功修复破坏的最小和有利条件，提出了DDD，即“偏离引导扩散破坏”框架。首先，我们确定了在隐藏空间中最易受攻击的扩散时间步范围。在这个受噪声影响的流形范围内，我们将问题提出为语义偏离优化。我们通过对隐藏状态进行蒙特卡洛采样和在令牌空间中进行离散投影的校准，最大化修复实例的隐藏状态与语义感知隐藏状态中心之间的距离。有效地，我们的方法实现了比Photoguard更强的破坏和更高的成功率，同时降低了GPU内存需求，并将优化速度提高了三倍。

更新时间: 2024-07-14 17:21:19

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2407.10277v1

The Error Analysis of the Secret Key Generation Algorithm Using Analog Function Computation

This study introduces a decentralized approach to secure wireless communication using a cryptographic secret key generation algorithm among distributed nodes. The system model employs Gaussian prime numbers, ensuring the collaborative generation of a secret key. Pre-processing and post-processing functions enable the generation of a secret key across the network. An error model evaluates aspects like thermal noise power and channel estimation errors, while simulations assess the success rate of factorizing the norm of the secret key. It is observed that path loss-induced large-scale fading emerges as a critical component impacting information and power loss. The robustness of the proposed model under fading channel conditions is evaluated in terms of the success rate. It is also observed that the tolerance value set in the factorization algorithms has a significant impact on the success rate. Furthermore, the success rate is compared in two scenarios, one with 2 users and another with 3 users, to provide a comprehensive evaluation of the system performance.

Updated: 2024-07-14 17:20:54

标题: 使用模拟函数计算的秘钥生成算法的误差分析

摘要: 这项研究介绍了一种分散式方法，利用分布式节点之间的加密秘密密钥生成算法来保护无线通信。系统模型采用高斯素数，确保秘密密钥的协同生成。预处理和后处理函数使得能够在网络中生成秘密密钥。一个错误模型评估了诸如热噪声功率和信道估计误差等方面，而模拟评估了因子化秘密密钥范数的成功率。观察到由路径损耗引起的大尺度衰落成为影响信息和功率损失的关键因素。在衰落信道条件下提出的模型的鲁棒性通过成功率进行评估。此外，观察到在因子化算法中设置的容差值对成功率有显著影响。此外，将两种情景进行成功率比较，一个是2用户，另一个是3用户，以全面评估系统性能。

更新时间: 2024-07-14 17:20:54

领域: cs.CR

下载: http://arxiv.org/abs/2407.10276v1
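
The abstract above relies on Gaussian primes and on factorizing the norm of the secret key. The sketch below only illustrates the underlying number theory: a Gaussian integer a + bi has norm a^2 + b^2, the norm is multiplicative, and so the norm of a product of Gaussian primes factors into the norms of its parts. How the paper maps nodes to primes or computes the product over the air is not stated in the abstract; the tuple representation and all function names here are hypothetical.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality test for ordinary integers."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def norm(z) -> int:
    """Norm of the Gaussian integer z = (a, b), i.e. a + bi."""
    a, b = z
    return a * a + b * b


def is_gaussian_prime(z) -> bool:
    """Standard characterization of Gaussian primes."""
    a, b = z
    if a != 0 and b != 0:
        # Both parts non-zero: prime exactly when the norm is an ordinary prime.
        return is_prime(norm(z))
    # On an axis: prime exactly when the non-zero part is a prime p with p % 4 == 3.
    n = abs(a) + abs(b)
    return is_prime(n) and n % 4 == 3


def multiply(z, w):
    """Product of two Gaussian integers."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)


def factorize(n: int):
    """Prime factorization of a positive integer by trial division."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors


if __name__ == "__main__":
    z, w = (2, 1), (3, 2)          # two Gaussian primes with norms 5 and 13
    key = multiply(z, w)           # (4, 7)
    print(key, norm(key), factorize(norm(key)))
```

In a noisy channel the received norm would be a perturbed real number rather than an exact integer, which is presumably where the tolerance value mentioned in the abstract enters.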

Cross-Lingual Multi-Hop Knowledge Editing -- Benchmarks, Analysis and a Simple Contrastive Learning based Approach

Large language models are often expected to constantly adapt to new sources of knowledge, and knowledge editing techniques aim to efficiently patch outdated model knowledge with minimal modification. Most prior works focus on monolingual knowledge editing in English, even though new information can emerge in any language from any part of the world. We propose the Cross-Lingual Multi-Hop Knowledge Editing paradigm for measuring and analyzing the performance of various SoTA knowledge editing techniques in a cross-lingual setup. Specifically, we create a parallel cross-lingual benchmark, CROLIN-MQUAKE, for measuring knowledge editing capabilities. Our extensive analysis of various knowledge editing techniques uncovers significant gaps in performance between the cross-lingual and English-centric settings. Following this, we propose a significantly improved system for cross-lingual multi-hop knowledge editing, CLEVER-CKE. CLEVER-CKE is based on a retrieve, verify and generate knowledge editing framework, where a retriever is formulated to recall edited facts and support an LLM in adhering to knowledge edits. We develop language-aware and hard-negative based contrastive objectives for improving the cross-lingual and fine-grained fact retrieval and verification process used in this framework. Extensive experiments on three LLMs, eight languages, and two datasets show CLEVER-CKE's significant gains of up to 30% over prior methods.

Updated: 2024-07-14 17:18:16

标题: 跨语言多跳知识编辑--基准、分析和基于简单对比学习的方法

摘要: 大型语言模型通常被期望不断适应新知识来源，知识编辑技术旨在有效地修补过时的模型知识，而修改最小。大多数先前的工作侧重于英语中的单语知识编辑，尽管新信息可以在世界任何地方的任何语言中出现。我们提出了跨语言多跳知识编辑范式，用于衡量和分析各种SoTA知识编辑技术在跨语言设置中的性能。具体地，我们创建了一个平行的跨语言基准，CROLIN-MQUAKE，用于衡量知识编辑能力。我们对各种知识编辑技术进行了广泛的分析，发现了跨语言和以英语为中心设置之间的性能差距。在此基础上，我们提出了一个显著改进的跨语言多跳知识编辑系统，CLEVER-CKE。CLEVER-CKE基于检索、验证和生成知识编辑框架，其中检索器被设计为检索编辑后的事实，并支持LLM遵循知识编辑。我们开发了基于语言感知和基于困难负例的对比目标，用于改进在该框架中使用的跨语言和细粒度事实检索和验证过程。在三个LLM、八种语言和两个数据集上进行的广泛实验显示，CLEVER-CKE相对于先前方法的显著增益高达30%。

更新时间: 2024-07-14 17:18:16

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2407.10275v1

Enhancing Weakly-Supervised Histopathology Image Segmentation with Knowledge Distillation on MIL-Based Pseudo-Labels

Segmenting tumors in histological images is vital for cancer diagnosis. While fully supervised models excel with pixel-level annotations, creating such annotations is labor-intensive and costly. Accurate histopathology image segmentation under weakly-supervised conditions with coarse-grained image labels is still a challenging problem. Although multiple instance learning (MIL) has shown promise in segmentation tasks, surprisingly, no previous pseudo-supervision methods have used MIL-based outputs as pseudo-masks for training. We suspect this stems from concerns that noise in MIL results affects pseudo-supervision quality. To explore the potential of leveraging MIL-based segmentation for pseudo supervision, we propose a novel distillation framework for histopathology image segmentation. This framework introduces an iterative fusion-knowledge distillation strategy, enabling the student model to learn directly from the teacher's comprehensive outcomes. Through dynamic role reversal between the fixed teacher and learnable student models and the incorporation of weighted cross-entropy loss for model optimization, our approach prevents performance deterioration and noise amplification during knowledge distillation. Experimental results on public histopathology datasets, Camelyon16 and Digestpath2019, demonstrate that our approach not only complements various MIL-based segmentation methods but also significantly enhances their performance. Additionally, our method achieves new SOTA in the field.

Updated: 2024-07-14 17:15:47

标题: 通过基于MIL伪标签的知识蒸馏提升弱监督组织病理图像分割

摘要: 在组织学图像中分割肿瘤对癌症诊断至关重要。尽管完全监督模型在像素级注释方面表现出色，但创建这种注释是劳动密集且昂贵的。在弱监督条件下使用粗粒度图像标签进行准确的组织病理学图像分割仍然是一个具有挑战性的问题。虽然多实例学习（MIL）在分割任务中显示出潜力，但令人惊讶的是，以前的伪监督方法没有使用基于MIL的输出作为伪掩模进行训练。我们怀疑这是由于对MIL结果中的噪声影响伪监督质量的担忧所致。为了探索利用基于MIL的分割进行伪监督的潜力，我们提出了一个新颖的组织病理学图像分割蒸馏框架。该框架引入了迭代融合知识蒸馏策略，使学生模型能够直接从教师的综合结果中学习。通过固定教师和可学习学生模型之间的动态角色转换以及加权交叉熵损失的引入进行模型优化，我们的方法防止了知识蒸馏过程中性能的恶化和噪声的增强。对公共组织病理学数据集Camelyon16和Digestpath2019的实验结果表明，我们的方法不仅补充了各种基于MIL的分割方法，还显著提升了它们的性能。此外，我们的方法在该领域取得了新的最先进技术水平。

更新时间: 2024-07-14 17:15:47

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2407.10274v1

psifx -- Psychological and Social Interactions Feature Extraction Package

psifx is a plug-and-play multi-modal feature extraction toolkit, aiming to facilitate and democratize the use of state-of-the-art machine learning techniques for human sciences research. It is motivated by a need (a) to automate and standardize data annotation processes that otherwise involve expensive, lengthy, and inconsistent human labor, such as the transcription or coding of behavior changes from audio and video sources; (b) to develop and distribute open-source community-driven psychology research software; and (c) to enable large-scale access and ease of use for non-expert users. The framework contains an array of tools for tasks such as speaker diarization, closed-caption transcription and translation from audio, as well as body, hand, and facial pose estimation and gaze tracking from video. The package has been designed with a modular and task-oriented approach, enabling the community to add or update tools easily. We strongly hope that this package will provide psychologists with a simple and practical solution for efficiently extracting a range of audio, linguistic, and visual features from audio and video, thereby creating new opportunities for in-depth study of real-time behavioral phenomena.

Updated: 2024-07-14 16:20:42

标题: psifx -- 心理和社会互动特征提取包

摘要: psifx是一个即插即用的多模态特征提取工具包，旨在促进和民主化最先进的机器学习技术在人类科学研究中的应用。它的动机是为了（a）自动化和标准化数据注释过程，否则需要昂贵、耗时且不一致的人力，比如从音频和视频来源转录或编码行为变化；（b）开发和分发开源社区驱动的心理学研究软件；以及（c）为非专家用户提供大规模访问和易用性。该框架包含一系列工具，用于任务，如说话者分离、从音频转录和翻译的闭幕字幕，以及从视频中估计身体、手部和面部姿势以及凝视跟踪。该软件包采用模块化和任务导向的方法设计，使社区能够轻松添加或更新新工具。我们强烈希望这个软件包将为心理学家提供一个简单实用的解决方案，能够高效地从音频和视频中提取一系列音频、语言和视觉特征，从而为深入研究实时行为现象创造新机会。

更新时间: 2024-07-14 16:20:42

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2407.10266v1

Divide & Bind Your Attention for Improved Generative Semantic Nursing

Emerging large-scale text-to-image generative models, e.g., Stable Diffusion (SD), have exhibited overwhelming results with high fidelity. Despite the magnificent progress, current state-of-the-art models still struggle to generate images fully adhering to the input prompt. Prior work, Attend & Excite, has introduced the concept of Generative Semantic Nursing (GSN), aiming to optimize cross-attention during inference time to better incorporate the semantics. It demonstrates promising results on simple prompts, e.g., "a cat and a dog". However, its efficacy declines when dealing with more complex prompts, and it does not explicitly address the problem of improper attribute binding. To address the challenges posed by complex prompts or scenarios involving multiple entities and to achieve improved attribute binding, we propose Divide & Bind. We introduce two novel loss objectives for GSN: an attendance loss and a binding loss. Our approach stands out in its ability to faithfully synthesize desired objects with improved attribute alignment from complex prompts and exhibits superior performance across multiple evaluation benchmarks.

Updated: 2024-07-14 16:20:19

标题: 将您的注意力分割和绑定以改善生成语义护理

摘要: 新兴的大规模文本到图像生成模型，例如稳定扩散（SD），已经展示出具有高保真度的压倒性结果。尽管取得了巨大进展，但目前的领先模型仍然难以生成完全符合输入提示的图像。先前的研究《注意与激发》引入了生成语义护理（GSN）的概念，旨在优化推理时的交叉注意力，以更好地融入语义。它在生成简单提示，例如“一只猫和一只狗”方面展示了有希望的结果。然而，当处理更复杂的提示时，它的效力会下降，并且它并没有明确解决属性绑定不当的问题。为了解决复杂提示或涉及多个实体的情景所带来的挑战，并实现改进的属性绑定，我们提出了“分割与绑定”方法。我们为GSN引入了两个新的损失目标：一种新的出席损失和一个绑定损失。我们的方法在能够忠实合成复杂提示中所需的对象并展现出改进的属性对齐方面脱颖而出，并在多个评估基准测试中展现出卓越的性能。

更新时间: 2024-07-14 16:20:19

领域: cs.CV,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2307.10864v3

Logical Distillation of Graph Neural Networks

We present a logic-based interpretable model for learning on graphs and an algorithm to distill this model from a Graph Neural Network (GNN). Recent results have shown connections between the expressivity of GNNs and the two-variable fragment of first-order logic with counting quantifiers (C2). We introduce a decision-tree-based model which leverages an extension of C2 to distill interpretable logical classifiers from GNNs. We test our approach on multiple GNN architectures. The distilled models are interpretable, succinct, and attain similar accuracy to the underlying GNN. Furthermore, when the ground truth is expressible in C2, our approach outperforms the GNN.

Updated: 2024-07-14 16:19:30

标题: 图神经网络的逻辑蒸馏

摘要: 我们提出了一种基于逻辑的可解释模型，用于在图上学习，并提出了一种从图神经网络（GNN）中提炼这个模型的算法。最近的研究结果表明，GNN的表达能力与一阶逻辑的双变量片段和计数量词（C2）之间存在联系。我们引入了一个基于决策树的模型，利用C2的扩展从GNN中提炼可解释的逻辑分类器。我们在多个GNN架构上测试了我们的方法。提炼出的模型是可解释的、简洁的，并且获得了与基础GNN相似的准确性。此外，当基本事实可以用C2表示时，我们的方法胜过了GNN。

更新时间: 2024-07-14 16:19:30

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2406.07126v2
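
For readers unfamiliar with C2, the abstract above refers to first-order formulas that use only two variables but may count, such as "x has at least two red neighbours". The sketch below evaluates one such formula as a node classifier on a small graph. The graph, the `red` label set, and the function names are hypothetical; the sketch shows what a C2-style classifier looks like, not how the paper distills one from a GNN.

```python
def count_neighbors(graph, node, pred) -> int:
    """Number of neighbours y of `node` for which pred(y) holds."""
    return sum(1 for neighbor in graph[node] if pred(neighbor))


def at_least(k, graph, node, pred) -> bool:
    """Counting quantifier: there exist at least k neighbours y with pred(y)."""
    return count_neighbors(graph, node, pred) >= k


# Hypothetical undirected graph as an adjacency list, plus a unary predicate Red.
graph = {
    "a": ["b", "c", "d"],
    "b": ["a"],
    "c": ["a", "d"],
    "d": ["a", "c"],
}
red = {"b", "c"}


def classify(node) -> bool:
    """phi(x) = exists>=2 y . (E(x, y) and Red(y)); uses only variables x and y."""
    return at_least(2, graph, node, lambda y: y in red)


labels = {node: classify(node) for node in graph}

if __name__ == "__main__":
    print(labels)
```

A decision tree whose internal nodes test conditions of this kind stays readable, because each split is a sentence about a node and a count over its neighbours.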

What Makes and Breaks Safety Fine-tuning? Mechanistic Study

Safety fine-tuning helps align Large Language Models (LLMs) with human preferences for their safe deployment. To better understand the underlying factors that make models safe via safety fine-tuning, we design a synthetic data generation framework that captures salient aspects of an unsafe input by modeling the interaction between the task the model is asked to perform (e.g., "design") versus the specific concepts the task is asked to be performed upon (e.g., a "cycle" vs. a "bomb"). Using this, we investigate three well-known safety fine-tuning methods -- supervised safety fine-tuning, direct preference optimization, and unlearning -- and provide significant evidence demonstrating that these methods minimally transform MLP weights to specifically align unsafe inputs into the weights' null space. This yields a clustering of inputs based on whether the model deems them safe or not. Correspondingly, when an adversarial input (e.g., a jailbreak) is provided, its activations are closer to safer samples, leading to the model processing such an input as if it were safe. We validate our findings, wherever possible, on real-world models -- specifically, Llama-2 7B and Llama-3 8B.

Updated: 2024-07-14 16:12:57

标题: 什么造就和破坏安全的微调？机制研究

摘要: 安全微调有助于将大型语言模型（LLM）与人类偏好对其安全部署进行调整。为了更好地理解通过安全微调使模型安全的潜在因素，我们设计了一个合成数据生成框架，通过模拟模型被要求执行的任务（例如，“设计”）与被要求执行的特定概念之间的交互（例如，“循环”与“炸弹”）。利用这一框架，我们研究了三种众所周知的安全微调方法--监督安全微调、直接偏好优化和遗忘--并提供了显著证据表明这些方法最小程度地转换MLP权重，以将不安全的输入特别调整到权重的零空间。这导致基于模型是否认为它们安全而对输入进行聚类。相应地，当提供一个对抗性输入（例如越狱）时，其激活更接近更安全的样本，导致模型处理这样的输入就像它是安全的一样。我们在可能的情况下验证了我们的发现，特别是在现实世界模型上--具体来说是Llama-27B和Llama-38B。

更新时间: 2024-07-14 16:12:57

领域: cs.LG

下载: http://arxiv.org/abs/2407.10264v1

Spatial-Temporal Graph Representation Learning for Tactical Networks Future State Prediction

Resource allocation in tactical ad-hoc networks presents unique challenges due to their dynamic and multi-hop nature. Accurate prediction of future network connectivity is essential for effective resource allocation in such environments. In this paper, we introduce the Spatial-Temporal Graph Encoder-Decoder (STGED) framework for Tactical Communication Networks that leverages both spatial and temporal features of network states to learn latent tactical behaviors effectively. STGED hierarchically uses a graph-based attention mechanism to spatially encode a series of communication network states, a recurrent neural network to temporally encode the evolution of states, and a fully-connected feed-forward network to decode the connectivity in the future state. Through extensive experiments, we demonstrate that STGED consistently outperforms baseline models by large margins across different numbers of input time steps, achieving an accuracy of up to 99.2% for the future state prediction task of tactical communication networks.

Updated: 2024-07-14 15:59:14

标题: 空间-时间图表示学习用于战术网络未来状态预测

摘要: 战术自组织网络中的资源分配面临独特挑战，因为其动态和多跳的特性。准确预测未来网络连接对于在这种环境中有效地进行资源分配至关重要。本文介绍了一种用于战术通信网络的空间-时间图编码器-解码器（STGED）框架，利用网络状态的空间和时间特征有效地学习潜在的战术行为。STGED层次化地利用基于图的注意力机制对一系列通信网络状态进行空间编码，利用循环神经网络对状态的演变进行时间编码，并利用全连接前馈网络来解码未来状态的连接性。通过大量实验，我们展示了STGED在不同时间步输入下始终以较大幅度优于基线模型，实现了战术通信网络未来状态预测任务的高达99.2\%的准确率。

更新时间: 2024-07-14 15:59:14

领域: cs.LG,cs.SI

下载: http://arxiv.org/abs/2403.13872v3

Reinforcement Learning in a Safety-Embedded MDP with Trajectory Optimization

Safe Reinforcement Learning (RL) plays an important role in applying RL algorithms to safety-critical real-world applications, addressing the trade-off between maximizing rewards and adhering to safety constraints. This work introduces a novel approach that combines RL with trajectory optimization to manage this trade-off effectively. Our approach embeds safety constraints within the action space of a modified Markov Decision Process (MDP). The RL agent produces a sequence of actions that are transformed into safe trajectories by a trajectory optimizer, thereby effectively ensuring safety and increasing training stability. This novel approach excels in its performance on challenging Safety Gym tasks, achieving significantly higher rewards and near-zero safety violations during inference. The method's real-world applicability is demonstrated through a safe and effective deployment in a real robot task of box-pushing around obstacles.

Updated: 2024-07-14 15:56:37

标题: 在具有轨迹优化的安全嵌入式MDP中的强化学习

摘要: 安全强化学习（RL）在将RL算法应用于安全关键的现实世界应用中发挥着重要作用，解决了最大化奖励和遵守安全约束之间的权衡。本文介绍了一种将RL与轨迹优化结合起来有效管理这种权衡的新方法。我们的方法将安全约束嵌入到修改后的马尔可夫决策过程（MDP）的动作空间中。RL代理生成一系列动作，这些动作由轨迹优化器转换为安全轨迹，从而有效确保安全性并增加训练稳定性。这种新方法在挑战性的Safety Gym任务中表现出色，在推理过程中取得了显着更高的奖励和几乎零的安全违规。通过在真实机器人任务中安全有效地部署推箱子绕过障碍物的实例，证明了该方法在现实世界中的适用性。

更新时间: 2024-07-14 15:56:37

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2310.06903v2

Towards detailed and interpretable hybrid modeling of continental-scale bird migration

Hybrid modeling aims to augment traditional theory-driven models with machine learning components that learn unknown parameters, sub-models or correction terms from data. In this work, we build on FluxRGNN, a recently developed hybrid model of continental-scale bird migration, which combines a movement model inspired by fluid dynamics with recurrent neural networks that capture the complex decision-making processes of birds. While FluxRGNN has been shown to successfully predict key migration patterns, its spatial resolution is constrained by the typically sparse observations obtained from weather radars. Additionally, its trainable components lack explicit incentives to adequately predict take-off and landing events. Both aspects limit our ability to interpret model results ecologically. To address this, we propose two major modifications that allow for more detailed predictions on any desired tessellation while providing control over the interpretability of model components. In experiments on the U.S. weather radar network, the enhanced model effectively leverages the underlying movement model, resulting in strong extrapolation capabilities to unobserved locations.

Updated: 2024-07-14 15:52:19

标题: 朝向大陆尺度鸟类迁徙的详细和可解释的混合建模

摘要: 混合建模旨在通过机器学习组件来增强传统的理论驱动模型，这些组件可以从数据中学习未知参数、子模型或修正项。在这项工作中，我们基于最近开发的 FluxRGNN，这是一个大陆尺度鸟类迁徙的混合模型，它将受流体动力学启发的移动模型与捕捉鸟类复杂决策过程的循环神经网络结合起来。虽然已经证明 FluxRGNN 能成功预测关键迁徙模式，但其空间分辨率受到通常从天气雷达获得的稀疏观测的限制。此外，其可训练组件缺乏明确的动机来充分预测起飞和降落事件。这两个方面限制了我们在生态学上解释模型结果的能力。为了解决这个问题，我们提出了两个主要修改，允许在任何所需的镶嵌上进行更详细的预测，同时控制模型组件的可解释性。在美国天气雷达网络的实验中，增强的模型有效地利用了基础移动模型，从而具有强大的对未观察位置的外推能力。

更新时间: 2024-07-14 15:52:19

领域: cs.LG

下载: http://arxiv.org/abs/2407.10259v1

Deep Learning Algorithms for Early Diagnosis of Acute Lymphoblastic Leukemia

Acute lymphoblastic leukemia (ALL) is a form of blood cancer that affects the white blood cells. ALL constitutes approximately 25% of pediatric cancers. Early diagnosis and treatment of ALL are crucial for improving patient outcomes. The task of distinguishing immature leukemic blasts from normal cells under the microscope can prove challenging, since images of healthy and cancerous cells appear morphologically similar. In this study, we propose a binary image classification model to assist in the diagnostic process of ALL. Our model takes as input microscopic images of blood samples and outputs a binary prediction of whether the sample is normal or cancerous. Our dataset consists of 10,661 images from 118 subjects. Deep learning techniques on convolutional neural network architectures were used to achieve accurate classification results. Our proposed method achieved 94.3% accuracy and could be used as an assisting tool for hematologists trying to predict the likelihood of a patient developing ALL.

Updated: 2024-07-14 15:35:39

标题: 深度学习算法用于急性淋巴细胞白血病早期诊断

摘要: 急性淋巴细胞白血病（ALL）是一种影响白细胞的血液癌症。ALL占约儿童癌症的25％。早期诊断和治疗对改善患者预后至关重要。在显微镜下区分未成熟的白血病爆发细胞和正常细胞的任务可能具有挑战性，因为健康细胞和癌细胞的图像在形态上看起来相似。在这项研究中，我们提出了一个二元图像分类模型，用于辅助诊断ALL的过程。我们的模型以血液样本的显微图像作为输入，并输出样本是否正常或癌症的二元预测。我们的数据集包括118个受试者的10661张图像。利用卷积神经网络架构上的深度学习技术，实现了准确的分类结果。我们提出的方法达到了94.3％的准确率，可用作血液学家预测患者患ALL的可能性的辅助工具。

更新时间: 2024-07-14 15:35:39

领域: eess.IV,cs.CV,cs.LG,I.2.6; I.5.4

下载: http://arxiv.org/abs/2407.10251v1

An exactly solvable model for emergence and scaling laws

Deep learning models can exhibit what appears to be a sudden ability to solve a new problem as training time, training data, or model size increases, a phenomenon known as emergence. In this paper, we present a framework where each new ability (a skill) is represented as a basis function. We solve a simple multi-linear model in this skill-basis, finding analytic expressions for the emergence of new skills, as well as for scaling laws of the loss with training time, data size, model size, and optimal compute ($C$). We compare our detailed calculations to direct simulations of a two-layer neural network trained on multitask sparse parity, where the tasks in the dataset are distributed according to a power-law. Our simple model captures, using a single fit parameter, the sigmoidal emergence of multiple new skills as training time, data size or model size increases in the neural network.

Updated: 2024-07-14 15:28:01

标题: 一个可解释的模型用于涌现和标度定律

摘要: 深度学习模型在训练时间、训练数据或模型规模增加时可能表现出一种突然的解决新问题的能力，这种现象被称为出现。在本文中，我们提出了一个框架，其中每种新能力（一种技能）被表示为一个基函数。我们在这种技能基础上解决一个简单的多线性模型，找到了新技能的出现以及损失随训练时间、数据规模、模型规模和最优计算量（$C$）的缩放规律的解析表达式。我们将详细计算结果与直接模拟在多任务稀疏奇偶性数据集上训练的两层神经网络进行了比较，其中数据集中的任务按照幂律分布。我们的简单模型使用一个拟合参数捕捉了神经网络中随着训练时间、数据规模或模型规模增加而出现的多个新技能的S形出现特征。

更新时间: 2024-07-14 15:28:01

领域: cs.LG,cond-mat.dis-nn,stat.ML

下载: http://arxiv.org/abs/2404.17563v2
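
The abstract above describes skills that emerge sigmoidally, with tasks distributed according to a power law. The toy sketch below reproduces that qualitative picture under loudly stated assumptions: skill k occurs with frequency proportional to k^-(alpha+1), and each skill's strength follows a logistic curve whose rate is proportional to its frequency. The functional forms, the parameters `s0` and `eta`, and every function name are illustrative choices, not the paper's analytic expressions.

```python
import math


def skill_frequencies(num_skills: int, alpha: float):
    """Normalized power-law frequencies p_k proportional to k^-(alpha + 1) (assumed form)."""
    weights = [k ** -(alpha + 1.0) for k in range(1, num_skills + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def skill_strength(t: float, p_k: float, s0: float = 1e-3, eta: float = 1.0) -> float:
    """Logistic growth from s0 toward 1 at a rate proportional to p_k (assumed form)."""
    return 1.0 / (1.0 + (1.0 / s0 - 1.0) * math.exp(-eta * p_k * t))


def emergence_time(p_k: float, s0: float = 1e-3, eta: float = 1.0) -> float:
    """Training time at which the skill strength reaches one half."""
    return math.log(1.0 / s0 - 1.0) / (eta * p_k)


def loss(t: float, freqs) -> float:
    """Frequency-weighted squared error summed over skills not yet learned."""
    return 0.5 * sum(p * (1.0 - skill_strength(t, p)) ** 2 for p in freqs)


if __name__ == "__main__":
    freqs = skill_frequencies(num_skills=5, alpha=0.5)
    for k, p in enumerate(freqs, start=1):
        print(f"skill {k}: frequency {p:.3f}, emerges near t = {emergence_time(p):.1f}")
    for t in (0.0, 10.0, 100.0, 1000.0):
        print(f"t = {t:7.1f}  loss = {loss(t, freqs):.4f}")
```

Each skill on its own switches on abruptly, while the summed loss decays smoothly because rarer skills switch on later; this is the intuition behind pairing emergence with scaling laws.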

Explainable bank failure prediction models: Counterfactual explanations to reduce the failure risk

The accuracy and understandability of bank failure prediction models are crucial. While interpretable models like logistic regression are favored for their explainability, complex models such as random forest, support vector machines, and deep learning offer higher predictive performance but lower explainability. These models, known as black boxes, make it difficult to derive actionable insights. To address this challenge, using counterfactual explanations is suggested. These explanations demonstrate how changes in input variables can alter the model output and suggest ways to mitigate bank failure risk. The key challenge lies in selecting the most effective method for generating useful counterfactuals, which should demonstrate validity, proximity, sparsity, and plausibility. The paper evaluates several counterfactual generation methods: WhatIf, Multi Objective, and Nearest Instance Counterfactual Explanation, and also explores resampling methods like undersampling, oversampling, SMOTE, and the cost sensitive approach to address data imbalance in bank failure prediction in the US. The results indicate that the Nearest Instance Counterfactual Explanation method yields higher quality counterfactual explanations, mainly using the cost sensitive approach. Overall, the Multi Objective Counterfactual and Nearest Instance Counterfactual Explanation methods outperform others regarding validity, proximity, and sparsity metrics, with the cost sensitive approach providing the most desirable counterfactual explanations. These findings highlight the variability in the performance of counterfactual generation methods across different balancing strategies and machine learning models, offering valuable strategies to enhance the utility of black box bank failure prediction models.

Updated: 2024-07-14 15:27:27

标题: 可解释的银行倒闭预测模型：反事实解释以减少倒闭风险

摘要: 银行失败预测模型的准确性和可理解性至关重要。尽管像逻辑回归这样的可解释模型备受青睐，因为它们易于解释，但复杂模型如随机森林、支持向量机和深度学习提供了更高的预测性能，但解释性较低。这些模型被称为黑匣子，使得难以得出可操作的见解。为了解决这一挑战，建议使用对立事实解释。这些解释展示了输入变量的变化如何改变模型输出，并提出了减轻银行失败风险的方法。关键挑战在于选择最有效的方法来生成有用的对立事实，这些对立事实应该展示出有效性、接近度、稀疏性和合理性。本文评估了几种对立事实生成方法：WhatIf、多目标和最近实例对立事实解释，还探讨了重新采样方法如欠采样、过采样、SMOTE和成本敏感方法，以解决美国银行失败预测中的数据不平衡问题。结果表明，最近实例对立事实解释方法使用成本敏感方法产生了更高质量的对立事实解释。总体而言，多目标对立事实和最近实例对立事实解释方法在有效性、接近度和稀疏性指标方面优于其他方法，成本敏感方法提供了最可取的对立事实解释。这些发现突出了对立事实生成方法在不同平衡策略和机器学习模型中表现差异的可变性，提供了增强黑匣子银行失败预测模型效用的宝贵策略。

更新时间: 2024-07-14 15:27:27

领域: cs.LG

下载: http://arxiv.org/abs/2407.11089v1
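
The abstract above evaluates counterfactuals by validity, proximity, and sparsity. The sketch below shows a deliberately simplified nearest-instance counterfactual: for a bank predicted to fail, it returns the closest known bank that the model predicts will survive, then reports the three metrics. The rule-based `predict_failure` stands in for a black-box model, and the feature names, thresholds, and data are invented for illustration; a full nearest-instance method would typically go further than returning the neighbour unchanged.

```python
def predict_failure(bank) -> int:
    """Hypothetical stand-in for a black-box model: 1 = predicted to fail."""
    capital_ratio, npl_ratio, _return_on_assets = bank
    return 1 if (capital_ratio < 0.08 or npl_ratio > 0.10) else 0


def l1_distance(a, b) -> float:
    """Proximity metric: sum of absolute feature differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def sparsity(a, b) -> int:
    """Number of features that differ between the query and the counterfactual."""
    return sum(1 for x, y in zip(a, b) if x != y)


def nearest_counterfactual(query, candidates):
    """Closest candidate whose predicted class differs from the query's."""
    target = 1 - predict_failure(query)
    unlike = [c for c in candidates if predict_failure(c) == target]
    if not unlike:
        return None
    return min(unlike, key=lambda c: l1_distance(query, c))


# Features: (capital ratio, non-performing loan ratio, return on assets).
query = (0.06, 0.05, 0.01)
known_banks = [
    (0.09, 0.05, 0.01),
    (0.12, 0.02, 0.02),
    (0.05, 0.12, -0.01),
]

if __name__ == "__main__":
    cf = nearest_counterfactual(query, known_banks)
    print("counterfactual:", cf)
    print("valid         :", predict_failure(cf) != predict_failure(query))
    print("proximity (L1):", round(l1_distance(query, cf), 4))
    print("sparsity      :", sparsity(query, cf))
```

Here the counterfactual changes a single feature, which reads as an actionable suggestion: raise the capital ratio and leave the rest as is.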

Hybrid quantum physics-informed neural networks for simulating computational fluid dynamics in complex shapes

Finding the distribution of the velocities and pressures of a fluid by solving the Navier-Stokes equations is a principal task in the chemical, energy, and pharmaceutical industries, as well as in mechanical engineering and the design of pipeline systems. With existing solvers, such as OpenFOAM and Ansys, simulations of fluid dynamics in intricate geometries are computationally expensive and require re-simulation whenever the geometric parameters or the initial and boundary conditions are altered. Physics-informed neural networks are a promising tool for simulating fluid flows in complex geometries, as they can adapt to changes in the geometry and mesh definitions, allowing for generalization across fluid parameters and transfer learning across different shapes. We present a hybrid quantum physics-informed neural network that simulates laminar fluid flows in 3D Y-shaped mixers. Our approach combines the expressive power of a quantum model with the flexibility of a physics-informed neural network, resulting in a 21% higher accuracy compared to a purely classical neural network. Our findings highlight the potential of machine learning approaches, and in particular hybrid quantum physics-informed neural networks, for complex shape optimization tasks in computational fluid dynamics. By improving the accuracy of fluid simulations in complex geometries, our research using hybrid quantum models contributes to the development of more efficient and reliable fluid dynamics solvers.

Updated: 2024-07-14 15:24:07

标题: 混合量子物理信息神经网络用于在复杂形状中模拟计算流体动力学

摘要: 通过解决纳维-斯托克斯方程来找到流体的速度和压力分布是化工、能源和制药行业以及机械工程和管道系统设计中的一个主要任务。使用现有的求解器，如OpenFOAM和Ansys，在复杂几何形状中进行流体动力学模拟是计算昂贵的，并且在几何参数或初始和边界条件发生变化时需要重新模拟。物理信息神经网络是在复杂几何形状中模拟流体流动的一种有前途的工具，因为它们可以适应几何形状和网格定义的变化，实现在流体参数上的泛化和在不同形状之间的迁移学习。我们提出了一种混合的量子物理信息神经网络，用于模拟3D Y形混合器中的层流流体流动。我们的方法结合了量子模型的表达能力和物理信息神经网络的灵活性，与纯粹的经典神经网络相比，精度提高了21％。我们的研究结果突出了机器学习方法，特别是混合的量子物理信息神经网络，在计算流体动力学中复杂形状优化任务中的潜力。通过提高在复杂几何形状中的流体模拟的准确性，我们使用混合量子模型的研究有助于开发更高效和可靠的流体动力学求解器。

更新时间: 2024-07-14 15:24:07

领域: cs.LG,physics.flu-dyn,quant-ph

下载: http://arxiv.org/abs/2304.11247v3

xLSTMTime : Long-term Time Series Forecasting With xLSTM

In recent years, transformer-based models have gained prominence in multivariate long-term time series forecasting (LTSF), demonstrating significant advancements despite facing challenges such as high computational demands, difficulty in capturing temporal dynamics, and managing long-term dependencies. The emergence of LTSF-Linear, with its straightforward linear architecture, has notably outperformed transformer-based counterparts, prompting a reevaluation of the transformer's utility in time series forecasting. In response, this paper presents an adaptation of a recent architecture termed extended LSTM (xLSTM) for LTSF. xLSTM incorporates exponential gating and a revised memory structure with higher capacity that has good potential for LTSF. Our adapted architecture for LTSF, termed xLSTMTime, surpasses current approaches. We compare xLSTMTime's performance against various state-of-the-art models across multiple real-world datasets, demonstrating superior forecasting capabilities. Our findings suggest that refined recurrent architectures can offer competitive alternatives to transformer-based models in LTSF tasks, potentially redefining the landscape of time series forecasting.

Updated: 2024-07-14 15:15:00

标题: xLSTMTime：使用xLSTM进行长期时间序列预测

摘要: 近年来，基于Transformer的模型在多变量长期时间序列预测(LTSF)中备受关注，尽管面临高计算需求、捕捉时间动态和管理长期依赖性等挑战，但仍然取得了显著进展。具有直观线性结构的LTSF-Linear的出现明显优于基于Transformer的对应模型，促使重新评估Transformer在时间序列预测中的实用性。为此，本文提出了一种适用于LTSF的最新架构——扩展LSTM（xLSTM）。xLSTM结合了指数门控和具有更高容量的修订记忆结构，对LTSF具有良好的潜力。我们采用的名为xLSTMTime的LTSF架构超越了当前的方法。我们将xLSTMTime的性能与多个实际数据集上的各种最先进模型进行比较，展示出优越的预测能力。我们的研究结果表明，精细调整的循环架构可以为LTSF任务提供与基于Transformer的模型竞争的替代方案，可能重新定义时间序列预测的格局。

更新时间: 2024-07-14 15:15:00

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.10240v1

Parameter Estimation for Generalized Low-Rank Matrix Sensing by Learning on Riemannian Manifolds

We prove convergence guarantees for generalized low-rank matrix sensing -- i.e., matrix sensing where the observations may be passed through some nonlinear link function. We focus on local convergence of the optimal estimator, ignoring questions of optimization. In particular, assuming the minimizer of the empirical loss $\theta^0$ is in a constant size ball around the true parameters $\theta^*$, we prove that $d(\theta^0,\theta^*)=\tilde{O}(\sqrt{dk^2/n})$. Our analysis relies on tools from Riemannian geometry to handle the rotational symmetry in the parameter space.

Updated: 2024-07-14 15:11:13

Categories: stat.ML, cs.LG

Download: http://arxiv.org/abs/2407.10238v1

Visual Prompt Selection for In-Context Learning Segmentation

As a fundamental and extensively studied task in computer vision, image segmentation aims to locate and identify different semantic concepts at the pixel level. Recently, inspired by In-Context Learning (ICL), several generalist segmentation frameworks have been proposed, providing a promising paradigm for segmenting specific objects. However, existing works mostly ignore the value of visual prompts or simply apply similarity sorting to select contextual examples. In this paper, we focus on rethinking and improving the example selection strategy. By comprehensive comparisons, we first demonstrate that ICL-based segmentation models are sensitive to different contexts. Furthermore, empirical evidence indicates that the diversity of contextual prompts plays a crucial role in guiding segmentation. Based on the above insights, we propose a new stepwise context search method. Different from previous works, we construct a small yet rich candidate pool and adaptively search the well-matched contexts. More importantly, this method effectively reduces the annotation cost by compacting the search space. Extensive experiments show that our method is an effective strategy for selecting examples and enhancing segmentation performance.

Updated: 2024-07-14 15:02:54

Categories: cs.CV, cs.AI

Download: http://arxiv.org/abs/2407.10233v1

Weighted Aggregation of Conformity Scores for Classification

Conformal prediction is a powerful framework for constructing prediction sets with valid coverage guarantees in multi-class classification. However, existing methods often rely on a single score function, which can limit their efficiency and informativeness. We propose a novel approach that combines multiple score functions to improve the performance of conformal predictors by identifying optimal weights that minimize prediction set size. Our theoretical analysis establishes a connection between the weighted score functions and subgraph classes of functions studied in Vapnik-Chervonenkis theory, providing a rigorous mathematical basis for understanding the effectiveness of the proposed method. Experiments demonstrate that our approach consistently outperforms single-score conformal predictors while maintaining valid coverage, offering a principled and data-driven way to enhance the efficiency and practicality of conformal prediction in classification tasks.
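The idea of combining several score functions with weights can be sketched with split conformal prediction. The sketch below is an illustration under stated assumptions, not the paper's algorithm: it combines two common scores (one minus the true-class probability, and a non-randomized cumulative-probability score), picks the weight by grid search on a separate selection split, and then recalibrates the threshold on an independent calibration split so the usual coverage argument still applies. All function names and the grid-search procedure are hypothetical.

```python
import math


def lac_score(probs, label):
    """Score 1: one minus the probability assigned to the label."""
    return 1.0 - probs[label]


def aps_score(probs, label):
    """Score 2: total probability of classes at least as likely as the label."""
    return sum(p for p in probs if p >= probs[label])


def combined_score(probs, label, w):
    """Convex combination of the two scores with weight w in [0, 1]."""
    return w * lac_score(probs, label) + (1.0 - w) * aps_score(probs, label)


def conformal_quantile(scores, alpha):
    """The ceil((n + 1)(1 - alpha))-th smallest score, or +inf if out of range."""
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]


def prediction_set(probs, w, qhat):
    """All labels whose combined score does not exceed the threshold."""
    return [y for y in range(len(probs)) if combined_score(probs, y, w) <= qhat]


def select_weight(selection, calibration, alpha, grid):
    """Pick w minimizing mean set size on `selection`, then calibrate qhat.

    Both splits are lists of (probs, true_label) pairs. The final threshold
    comes only from `calibration`, which was not used to choose w.
    """
    best_w, best_size = None, math.inf
    for w in grid:
        q = conformal_quantile(
            [combined_score(p, y, w) for p, y in selection], alpha
        )
        size = sum(len(prediction_set(p, w, q)) for p, _ in selection) / len(selection)
        if size < best_size:
            best_w, best_size = w, size
    qhat = conformal_quantile(
        [combined_score(p, y, best_w) for p, y in calibration], alpha
    )
    return best_w, qhat
```

Keeping weight selection and threshold calibration on disjoint data is a deliberate choice in this sketch: tuning the weights on the calibration scores themselves would break the exchangeability that the coverage guarantee relies on.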

Updated: 2024-07-14 14:58:03

Categories: stat.ML, cs.LG

Download: http://arxiv.org/abs/2407.10230v1

TwinS: Revisiting Non-Stationarity in Multivariate Time Series Forecasting

Recently, multivariate time series forecasting tasks have garnered increasing attention due to their significant practical applications, leading to the emergence of various deep forecasting models. However, real-world time series exhibit pronounced non-stationary distribution characteristics. These characteristics are not solely limited to time-varying statistical properties highlighted by non-stationary Transformer but also encompass three key aspects: nested periodicity, absence of periodic distributions, and hysteresis among time variables. In this paper, we begin by validating this theory through wavelet analysis and propose the Transformer-based TwinS model, which consists of three modules to address the non-stationary periodic distributions: Wavelet Convolution, Period-Aware Attention, and Channel-Temporal Mixed MLP. Specifically, the Wavelet Convolution models nested periods by scaling the convolution kernel size like a wavelet transform. The Period-Aware Attention guides attention computation by generating period relevance scores through a convolutional sub-network. The Channel-Temporal Mixed MLP captures the overall relationships between time series through channel-time mixing learning. TwinS achieves SOTA performance compared to mainstream TS models, with a maximum improvement in MSE of 25.8% over PatchTST.

Updated: 2024-07-14 14:55:16

Categories: cs.LG, cs.AI

Download: http://arxiv.org/abs/2406.03710v2

Attractor Memory for Long-Term Time Series Forecasting: A Chaos Perspective

In long-term time series forecasting (LTSF) tasks, an increasing number of models have acknowledged that discrete time series originate from continuous dynamic systems and have attempted to model their dynamical structures. Recognizing the chaotic nature of real-world data, our model, Attraos, incorporates chaos theory into LTSF, perceiving real-world time series as observations from unknown high-dimensional chaotic dynamic systems. Under the concept of attractor invariance, Attraos utilizes non-parametric Phase Space Reconstruction embedding and the proposed multi-scale dynamic memory unit to memorize historical dynamics structure and predicts by a frequency-enhanced local evolution strategy. Detailed theoretical analysis and abundant empirical evidence consistently show that Attraos outperforms various LTSF methods on mainstream LTSF datasets and chaotic datasets with only one-twelfth of the parameters compared to PatchTST.

Updated: 2024-07-14 14:46:50

Categories: cs.LG, cs.AI, nlin.CD

Download: http://arxiv.org/abs/2402.11463v6

Surpassing legacy approaches to PWR core reload optimization with single-objective Reinforcement learning

Optimizing the fuel cycle cost through the optimization of nuclear reactor core loading patterns involves multiple objectives and constraints, leading to a vast number of candidate solutions that cannot be explicitly solved. To advance the state-of-the-art in core reload patterns, we have developed methods based on Deep Reinforcement Learning (DRL) for both single- and multi-objective optimization. Our previous research has laid the groundwork for these approaches and demonstrated their ability to discover high-quality patterns within a reasonable time frame. On the other hand, stochastic optimization (SO) approaches are commonly used in the literature, but there is no rigorous explanation that shows which approach is better in which scenario. In this paper, we demonstrate the advantage of our RL-based approach, specifically using Proximal Policy Optimization (PPO), against the most commonly used SO-based methods: Genetic Algorithm (GA), Parallel Simulated Annealing (PSA) with mixing of states, and Tabu Search (TS), as well as an ensemble-based method, Prioritized Replay Evolutionary and Swarm Algorithm (PESA). We found that the LP scenarios derived in this paper are amenable to a global search to identify promising research directions rapidly, but then need to transition into a local search to exploit these directions efficiently and prevent getting stuck in local optima. PPO adapts its search capability via a policy with learnable weights, allowing it to function as both a global and local search method. Subsequently, we compared all algorithms against PPO in long runs, which exacerbated the differences seen in the shorter cases. Overall, the work demonstrates the statistical superiority of PPO compared to the other considered algorithms.

Updated: 2024-07-14 14:45:52

Categories: cs.NE, cs.LG, physics.soc-ph

Download: http://arxiv.org/abs/2402.11040v2

Time-SSM: Simplifying and Unifying State Space Models for Time Series Forecasting

State Space Models (SSMs) have emerged as a potent tool in sequence modeling tasks in recent years. These models approximate continuous systems using a set of basis functions and discretize them to handle input data, making them well-suited for modeling time series data collected at specific frequencies from continuous systems. Despite their potential, the application of SSMs in time series forecasting remains underexplored, with most existing models treating SSMs as a black box for capturing temporal or channel dependencies. To address this gap, this paper proposes a novel theoretical framework termed Dynamic Spectral Operator, offering more intuitive and general guidance on applying SSMs to time series data. Building upon our theory, we introduce Time-SSM, a novel SSM-based foundation model with only one-seventh of the parameters compared to Mamba. Various experiments validate both our theoretical framework and the superior performance of Time-SSM.

Updated: 2024-07-14 14:40:20

Categories: cs.LG, cs.AI

Download: http://arxiv.org/abs/2405.16312v2

Density Estimation via Binless Multidimensional Integration

We introduce the Binless Multidimensional Thermodynamic Integration (BMTI) method for nonparametric, robust, and data-efficient density estimation. BMTI estimates the logarithm of the density by initially computing log-density differences between neighbouring data points. Subsequently, such differences are integrated, weighted by their associated uncertainties, using a maximum-likelihood formulation. This procedure can be seen as an extension to a multidimensional setting of the thermodynamic integration, a technique developed in statistical physics. The method leverages the manifold hypothesis, estimating quantities within the intrinsic data manifold without defining an explicit coordinate map. It does not rely on any binning or space partitioning, but rather on the construction of a neighbourhood graph based on an adaptive bandwidth selection procedure. BMTI mitigates the limitations commonly associated with traditional nonparametric density estimators, effectively reconstructing smooth profiles even in high-dimensional embedding spaces. The method is tested on a variety of complex synthetic high-dimensional datasets, where it is shown to outperform traditional estimators, and is benchmarked on realistic datasets from the chemical physics literature.

Updated: 2024-07-14 14:38:16

Categories: stat.ML, cs.LG, physics.chem-ph, physics.data-an

Download: http://arxiv.org/abs/2407.08094v2

Practical Unlearning for Large Language Models

While LLMs have demonstrated impressive performance across various domains and tasks, their security issues have become increasingly severe. Machine unlearning (MU) has emerged as a promising solution to address these issues by removing the influence of undesired data on the target model without compromising its utility in other aspects. MU typically assumes full access to the original training data to preserve utility, which is difficult to achieve in LLM unlearning. Existing LLM unlearning methods often assume access to data most affected by undesired data unlearning. However, this assumption underestimates the entanglement among various LLM capabilities and ignores data access limitations due to various issues. Moreover, these LLM unlearning methods do not sufficiently consider that unlearning requests in real-world scenarios are continuously emerging. To overcome these challenges and achieve practical LLM unlearning, we propose the O3 framework. The O3 framework includes an Out-Of-Distribution (OOD) detector to measure the similarity between input and unlearning data, and an Orthogonal low-rank adapter (LoRA) for continuously unlearning requested data. The OOD detector is trained with a novel contrastive entropy loss and utilizes a local-global layer-aggregated scoring mechanism. The orthogonal LoRA achieves parameter disentanglement among continual unlearning requests. During inference, our O3 framework can smartly decide whether and to what extent to load the unlearning LoRA based on the OOD detector's predictions. Notably, O3's effectiveness does not rely on any retained data. We conducted extensive experiments on O3 and state-of-the-art LLM unlearning methods across three tasks and seven datasets. The results indicate that O3 consistently achieves the best trade-off between unlearning effectiveness and utility preservation, especially when facing continuous unlearning requests.

Updated: 2024-07-14 14:26:17

Categories: cs.LG, cs.CR

Download: http://arxiv.org/abs/2407.10223v1

Stable generative modeling using Schrödinger bridges

We consider the problem of sampling from an unknown distribution for which only a sufficiently large number of training samples are available. Such settings have recently drawn considerable interest in the context of generative modelling and Bayesian inference. In this paper, we propose a generative model combining Schrödinger bridges and Langevin dynamics. Schrödinger bridges over an appropriate reversible reference process are used to approximate the conditional transition probability from the available training samples, which is then implemented in a discrete-time reversible Langevin sampler to generate new samples. By setting the kernel bandwidth in the reference process to match the time step size used in the unadjusted Langevin algorithm, our method effectively circumvents any stability issues typically associated with the time-stepping of stiff stochastic differential equations. Moreover, we introduce a novel split-step scheme, ensuring that the generated samples remain within the convex hull of the training samples. Our framework can be naturally extended to generate conditional samples and to Bayesian inference problems. We demonstrate the performance of our proposed scheme through experiments on synthetic datasets with increasing dimensions and on a stochastic subgrid-scale parametrization conditional sampling problem.

Updated: 2024-07-14 14:18:26

Categories: stat.ML, cs.LG, cs.NA, math.NA, stat.CO, 60H10, 62F15, 62F30, 65C05, 65C40

Download: http://arxiv.org/abs/2401.04372v2

Learning to Steer Markovian Agents under Model Uncertainty

Designing incentives for an adapting population is a ubiquitous problem in a wide array of economic applications and beyond. In this work, we study how to design additional rewards to steer multi-agent systems towards desired policies without prior knowledge of the agents' underlying learning dynamics. We introduce a model-based non-episodic Reinforcement Learning (RL) formulation for our steering problem. Importantly, we focus on learning a history-dependent steering strategy to handle the inherent model uncertainty about the agents' learning dynamics. We introduce a novel objective function to encode the desiderata of achieving a good steering outcome with reasonable cost. Theoretically, we identify conditions for the existence of steering strategies to guide agents to the desired policies. Complementing our theoretical contributions, we provide empirical algorithms to approximately solve our objective, which effectively tackle the challenge in learning history-dependent strategies. We demonstrate the efficacy of our algorithms through empirical evaluations.

Updated: 2024-07-14 14:01:38

Categories: cs.LG, cs.AI, cs.MA, stat.ML

Download: http://arxiv.org/abs/2407.10207v1

Empowering ChatGPT-Like Large-Scale Language Models with Local Knowledge Base for Industrial Prognostics and Health Management

Prognostics and health management (PHM) is essential for industrial operation and maintenance, focusing on predicting, diagnosing, and managing the health status of industrial systems. The emergence of the ChatGPT-Like large-scale language model (LLM) has begun to lead a new round of innovation in the AI field. It has extensively promoted the level of intelligence in various fields. Therefore, it is also expected to further change the application paradigm in industrial PHM and make PHM more intelligent. Although ChatGPT-Like LLMs have rich knowledge reserves and powerful language understanding and generation capabilities, they lack domain-specific expertise, significantly limiting their practicability in PHM applications. To this end, this study explores the ChatGPT-Like LLM empowered by the local knowledge base (LKB) in industrial PHM to solve the above limitations. In addition, we introduce the method and steps of combining the LKB with LLMs, including LKB preparation, LKB vectorization, prompt engineering, etc. Experimental analysis of real cases shows that combining the LKB with ChatGPT-Like LLMs can significantly improve their performance, making them more accurate, relevant, and able to provide more insightful information. This can promote the development of ChatGPT-Like LLMs in industrial PHM and improve their efficiency and quality.

Updated: 2024-07-14 14:01:01

Categories: cs.IR, cs.AI, cs.CL

Download: http://arxiv.org/abs/2312.14945v2

Dominant Design Prediction with Phylogenetic Networks

This study proposes an effective method to predict technology development from an evolutionary perspective. Product evolution is the result of technological evolution and market selection. A phylogenetic network is the main method to study product evolution. The formation of the dominant design determines the trajectory of technology development. How to predict future dominant design has become a key issue in technology forecasting and new product development. We define the dominant product and use machine learning methods, combined with product evolutionary theory, to construct a Fully Connected Phylogenetic Network dataset to effectively predict the future dominant design.

Updated: 2024-07-14 14:00:02

Categories: cs.CE, cs.AI, cs.NE, cs.SI

Download: http://arxiv.org/abs/2407.10206v1

Computational Copyright: Towards A Royalty Model for Music Generative AI

The advancement of generative AI has given rise to pressing copyright challenges, especially within the music industry. This paper focuses on the economic aspects of these challenges, emphasizing that the economic impact constitutes a central issue in the copyright arena. Furthermore, the complexity of the black-box generative AI technologies not only suggests but necessitates algorithmic solutions. Yet, such solutions have been largely missing, exacerbating regulatory hurdles in this landscape. We seek to address this gap by proposing viable royalty models for revenue sharing on AI music generation platforms. We start by examining existing royalty models utilized by platforms like Spotify and YouTube, and then discuss how to adapt them to the unique context of AI-generated music. A significant challenge emerging from this adaptation is the attribution of AI-generated music to influential copyrighted content in the training data. To this end, we present algorithmic solutions employing data attribution techniques. We also conduct a range of experiments to verify the effectiveness and robustness of these solutions. This research is one of the early attempts to integrate technical advancements with economic and legal considerations in the field of music generative AI, offering a computational copyright solution for the challenges posed by the opaque nature of AI technologies.
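One simple way to connect attribution scores to revenue sharing is a pro-rata split. The sketch below is only an assumption about what such a rule could look like, not the royalty model proposed in the paper: the function name, the rights-holder keys, and the decision to clip negative attribution scores to zero are all hypothetical.

```python
def royalty_shares(attribution, revenue):
    """Split `revenue` in proportion to non-negative attribution scores.

    `attribution` maps a rights holder to a data-attribution score for one
    generated track. Negative scores are clipped to zero (an assumption).
    """
    clipped = {holder: max(score, 0.0) for holder, score in attribution.items()}
    total = sum(clipped.values())
    if total == 0.0:
        # No positive attribution: nothing is distributed under this rule.
        return {holder: 0.0 for holder in clipped}
    return {holder: revenue * score / total for holder, score in clipped.items()}
```

For example, scores of 3.0, 1.0, and -0.5 over a revenue of 100.0 would give the first two holders 75.0 and 25.0 and the third nothing.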

Updated: 2024-07-14 13:49:37

Categories: cs.AI

Download: http://arxiv.org/abs/2312.06646v3

Improving Graph Out-of-distribution Generalization on Real-world Data

Existing methods for graph out-of-distribution (OOD) generalization primarily rely on empirical studies on synthetic datasets. Such approaches tend to overemphasize the causal relationships between invariant sub-graphs and labels, thereby neglecting the non-negligible role of environment in real-world scenarios. In contrast to previous studies that impose rigid independence assumptions on environments and invariant sub-graphs, this paper presents the theorems of environment-label dependency and mutable rationale invariance, where the former characterizes the usefulness of environments in determining graph labels while the latter refers to the mutable importance of graph rationales. Based on analytic investigations, a novel variational inference based method named "Probability Dependency on Environments and Rationales for OOD Graphs on Real-world Data" (DEROG) is introduced. To alleviate the adverse effect of unknown prior knowledge on environments and rationales, DEROG utilizes generalized Bayesian inference. Further, DEROG employs an EM-based algorithm for optimization. Finally, extensive experiments on real-world datasets under different distribution shifts are conducted to show the superiority of DEROG. Our code is publicly available at https://anonymous.4open.science/r/DEROG-536B.

Updated: 2024-07-14 13:48:25

Categories: cs.LG

Download: http://arxiv.org/abs/2407.10204v1

Evolutionary Retrosynthetic Route Planning

Molecular retrosynthesis is a significant and complex problem in the field of chemistry. However, traditional manual synthesis methods not only need well-trained experts but are also time-consuming. With the development of big data and machine learning, artificial intelligence (AI) based retrosynthesis is attracting more attention and has become a valuable tool for molecular retrosynthesis. At present, Monte Carlo tree search is a mainstream search framework employed to address this problem. Nevertheless, its search efficiency is compromised by its large search space. Therefore, this paper proposes a novel approach for retrosynthetic route planning based on evolutionary optimization, marking the first use of Evolutionary Algorithm (EA) in the field of multi-step retrosynthesis. The proposed method involves modeling the retrosynthetic problem as an optimization problem and defining the search space and operators. Additionally, to improve the search efficiency, a parallel strategy is implemented. The new approach is applied to four case products and compared with Monte Carlo tree search. The experimental results show that, in comparison to the Monte Carlo tree search algorithm, EA reduces the number of calls to the single-step model by an average of 53.9%. The time required to search three solutions decreases by an average of 83.9%, and the number of feasible search routes increases by 1.38 times. The source code is available at https://github.com/ilog-ecnu/EvoRRP.

Updated: 2024-07-14 13:43:23

Categories: cs.AI

Download: http://arxiv.org/abs/2310.05186v2

Shape2Scene: 3D Scene Representation Learning Through Pre-training on Shape Data

Current 3D self-supervised learning methods for 3D scenes face a data desert issue, resulting from the time-consuming and expensive process of collecting 3D scene data. Conversely, 3D shape datasets are easier to collect. Despite this, existing pre-training strategies on shape data offer limited potential for 3D scene understanding due to significant disparities in point quantities. To tackle these challenges, we propose Shape2Scene (S2S), a novel method that learns representations of large-scale 3D scenes from 3D shape data. We first design multiscale and high-resolution backbones for shape and scene level 3D tasks, i.e., MH-P (point-based) and MH-V (voxel-based). MH-P/V establishes direct paths to high-resolution features that capture deep semantic information across multiple scales. This pivotal nature makes them suitable for a wide range of 3D downstream tasks that tightly rely on high-resolution features. We then employ a Shape-to-Scene strategy (S2SS) to amalgamate points from various shapes, creating a random pseudo scene (comprising multiple objects) for training data, mitigating disparities between shapes and scenes. Finally, a point-point contrastive loss (PPC) is applied for the pre-training of MH-P/V. In PPC, the inherent correspondence (i.e., point pairs) is naturally obtained in S2SS. Extensive experiments have demonstrated the transferability of 3D representations learned by MH-P/V across shape-level and scene-level 3D tasks. MH-P achieves notable performance on well-known point cloud datasets (93.8% OA on ScanObjectNN and 87.6% instance mIoU on ShapeNetPart). MH-V also achieves promising performance in 3D semantic segmentation and 3D object detection.

Updated: 2024-07-14 13:42:05

Categories: cs.CV, cs.AI

Download: http://arxiv.org/abs/2407.10200v1

A3S: A General Active Clustering Method with Pairwise Constraints

Active clustering aims to boost the clustering performance by integrating human-annotated pairwise constraints through strategic querying. Conventional approaches with semi-supervised clustering schemes encounter high query costs when applied to large datasets with numerous classes. To address these limitations, we propose a novel Adaptive Active Aggregation and Splitting (A3S) framework, falling within the cluster-adjustment scheme in active clustering. A3S features strategic active clustering adjustment on the initial cluster result, which is obtained by an adaptive clustering algorithm. In particular, our cluster adjustment is inspired by the quantitative analysis of Normalized mutual information gain under the information theory framework and can provably improve the clustering quality. The proposed A3S framework significantly elevates the performance and scalability of active clustering. In extensive experiments across diverse real-world datasets, A3S achieves desired results with significantly fewer human queries compared with existing methods.
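Normalized mutual information (NMI), which the abstract uses to motivate cluster adjustment, can be computed directly from two labelings. The sketch below is illustrative and is not the A3S procedure: it uses the arithmetic-mean normalization 2·I(U;V) / (H(U) + H(V)), and the helper `merge_clusters` is a hypothetical stand-in for an aggregation step. Ground-truth labels appear here only to show how a merge changes NMI; in active clustering they are not available and would have to be probed through pairwise queries.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labelings of the same items."""
    n = len(a)
    count_a, count_b, joint = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a, b):
    """NMI with arithmetic-mean normalization; 1.0 if both labelings are trivial."""
    denom = entropy(a) + entropy(b)
    if denom == 0.0:
        return 1.0
    return 2.0 * mutual_information(a, b) / denom


def merge_clusters(clusters, src, dst):
    """Relabel every item in cluster `src` as cluster `dst`."""
    return [dst if c == src else c for c in clusters]
```

As a usage example, with true classes `[0, 0, 0, 0, 1, 1]` and an over-segmented clustering `[0, 0, 1, 1, 2, 2]`, merging cluster 1 into cluster 0 raises the NMI against the true classes, since the merged clusters contain items of the same class.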

Updated: 2024-07-14 13:37:03

标题: A3S: 一种具有成对约束的通用主动聚类方法

摘要: 主动聚类旨在通过整合人工注释的成对约束来提高聚类性能，通过策略性查询。当应用于具有众多类别的大型数据集时，传统的半监督聚类方案会遇到较高的查询成本。为了解决这些限制，我们提出了一个新颖的自适应主动聚合和分裂（A3S）框架，属于主动聚类中的集群调整方案。 A3S具有在由自适应聚类算法获得的初始聚类结果上进行战略性主动聚类调整的特点。特别是，我们的集群调整受到信息理论框架下标准化互信息增益的定量分析的启发，并可以明显提高聚类质量。提出的A3S框架显著提升了主动聚类的性能和可扩展性。在各种真实世界数据集上进行了广泛实验，与现有方法相比，A3S在人类查询数量明显较少的情况下实现了理想的结果。

更新时间: 2024-07-14 13:37:03

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2407.10196v1

Nonverbal Interaction Detection

This work addresses a new challenge of understanding human nonverbal interaction in social contexts. Nonverbal signals pervade virtually every communicative act. Our gestures, facial expressions, postures, gaze, even physical appearance all convey messages, without anything being said. Despite their critical role in social life, nonverbal signals receive very limited attention as compared to the linguistic counterparts, and existing solutions typically examine nonverbal cues in isolation. Our study marks the first systematic effort to enhance the interpretation of multifaceted nonverbal signals. First, we contribute a novel large-scale dataset, called NVI, which is meticulously annotated to include bounding boxes for humans and corresponding social groups, along with 22 atomic-level nonverbal behaviors under five broad interaction types. Second, we establish a new task NVI-DET for nonverbal interaction detection, which is formalized as identifying triplets in the form <individual, group, interaction> from images. Third, we propose a nonverbal interaction detection hypergraph (NVI-DEHR), a new approach that explicitly models high-order nonverbal interactions using hypergraphs. Central to the model is a dual multi-scale hypergraph that adeptly addresses individual-to-individual and group-to-group correlations across varying scales, facilitating interactional feature learning and eventually improving interaction prediction. Extensive experiments on NVI show that NVI-DEHR improves various baselines significantly in NVI-DET. It also exhibits leading performance on HOI-DET, confirming its versatility in supporting related tasks and strong generalization ability. We hope that our study will offer the community new avenues to explore nonverbal signals in more depth.

Updated: 2024-07-14 13:33:57

标题: 非言语互动检测

摘要: 这项工作涉及了理解社交背景下人类非语言互动的新挑战。非语言信号几乎渗透到每一个交流行为中。我们的手势、面部表情、姿势、注视，甚至外貌都传达着信息，而无需言语。尽管在社交生活中扮演着至关重要的角色，与语言对应物相比，非语言信号受到的关注非常有限，现有的解决方案通常孤立地考察非语言线索。我们的研究标志着第一次系统性地努力增强多方面非语言信号的解释。首先，我们贡献了一个新颖的大规模数据集，名为NVI，该数据集经过精心注释，包括人类和相应社交群体的边界框，以及五种广泛的互动类型下的22种原子级非语言行为。其次，我们建立了一个新任务NVI-DET用于非语言互动检测，该任务被形式化为从图像中识别<个人、群体、互动>三元组。第三，我们提出了一种非语言互动检测超图（NVI-DEHR），这是一种新方法，明确地利用超图对高阶非语言互动进行建模。该模型的核心是一个双多尺度超图，灵活地处理不同尺度下个体与个体以及群体与群体之间的相关性，促进互动特征学习，最终改善互动预测。对NVI的大量实验表明，NVI-DEHR在NVI-DET中显著改善了各种基线。它还在HOI-DET中表现出领先的性能，证实了其在支持相关任务和强大泛化能力方面的多功能性。我们希望我们的研究将为社区提供探索更深层次的非语言信号的新途径。

更新时间: 2024-07-14 13:33:57

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2407.08133v2

Curriculum Learning for Small Code Language Models

Code language models have emerged as useful tools for various programming tasks, yet they often struggle with complex ones. In this paper, we explore the potential of curriculum learning in enhancing the performance of these models. While prior research has suggested that curriculum learning does not necessarily help in improving the performance of language models, our results surprisingly show that this may not be the case for code language models. We demonstrate that a well-designed curriculum learning approach significantly improves the accuracy of small decoder-only code language models on the task of code execution, while its effect on code completion is less significant. To explore the potential of curriculum learning, we train multiple GPT models with 1 million parameters each to predict the next token and evaluate them on code completion and execution tasks. Our contributions include proposing a novel code difficulty assessment metric that combines software code measures, investigating the effectiveness of curriculum learning for code language models, and introducing a novel curriculum learning schedule that enhances the performance of small decoder-only language models in code execution tasks. The results of this paper open the door to more research on the use of curriculum learning for code language models.

Updated: 2024-07-14 13:32:24

标题: 小型代码语言模型的课程学习

摘要: 代码语言模型已经成为各种编程任务中的有用工具，但是在复杂任务中它们通常会遇到困难。在本文中，我们探讨了课程学习在提升这些模型性能方面的潜力。尽管先前的研究表明课程学习并不一定有助于改善语言模型的性能，但我们的结果令人惊讶地表明这对于代码语言模型可能并非如此。我们展示了一个精心设计的课程学习方法显著提高了小型仅解码器的代码语言模型在代码执行任务上的准确性，而对代码完成任务的影响较小。为了探索课程学习的潜力，我们训练了多个具有100万参数的GPT模型，用于预测下一个标记，并在代码完成和执行任务上对它们进行评估。我们的贡献包括提出一种通过结合软件代码度量来评估代码难度的新型指标，研究课程学习对代码语言模型的有效性，并引入一种增强小型仅解码器语言模型在代码执行任务中性能的新型课程学习计划。本文的结果为更多关于将课程学习应用于代码语言模型的研究打开了大门。

更新时间: 2024-07-14 13:32:24

领域: cs.LG,cs.AI,cs.PL

下载: http://arxiv.org/abs/2407.10194v1

Unexpected Benefits of Self-Modeling in Neural Systems

Self-models have been a topic of great interest for decades in studies of human cognition and more recently in machine learning. Yet what benefits do self-models confer? Here we show that when artificial networks learn to predict their internal states as an auxiliary task, they change in a fundamental way. To better perform the self-model task, the network learns to make itself simpler, more regularized, more parameter-efficient, and therefore more amenable to being predictively modeled. To test the hypothesis of self-regularizing through self-modeling, we used a range of network architectures performing three classification tasks across two modalities. In all cases, adding self-modeling caused a significant reduction in network complexity. The reduction was observed in two ways. First, the distribution of weights was narrower when self-modeling was present. Second, a measure of network complexity, the real log canonical threshold (RLCT), was smaller when self-modeling was present. Not only were measures of complexity reduced, but the reduction became more pronounced as greater training weight was placed on the auxiliary task of self-modeling. These results strongly support the hypothesis that self-modeling is more than simply a network learning to predict itself. The learning has a restructuring effect, reducing complexity and increasing parameter efficiency. This self-regularization may help explain some of the benefits of self-models reported in recent machine learning literature, as well as the adaptive value of self-models to biological systems. In particular, these findings may shed light on the possible interaction between the ability to model oneself and the ability to be more easily modeled by others in a social or cooperative context.

Updated: 2024-07-14 13:16:23

标题: 神经系统中自我建模的意外好处

摘要: 自我模型在人类认知研究以及最近的机器学习中已经成为一个长期关注的话题。然而，自我模型究竟带来了哪些好处呢？在这里，我们展示了当人工网络学习预测其内部状态作为辅助任务时，它们会以一种基本的方式发生改变。为了更好地执行自我模型任务，网络学会使自身变得更简单、更规范化、更参数高效，因此更容易被预测性地建模。为了测试通过自我模型实现自我正则化的假设，我们使用了一系列网络架构在两种模态下执行三个分类任务。在所有情况下，添加自我模型会显著减少网络复杂性。这种减少以两种方式观察到。首先，当存在自我模型时，权重分布会更窄。其次，网络复杂性的度量，即实对数标准阈值（RLCT），在存在自我模型时会更小。不仅复杂性度量减少，而且随着对自我模型辅助任务的训练权重增加，这种减少会变得更加显著。这些结果强烈支持了自我模型不仅仅是网络学会预测自身的假设。学习具有重构效应，减少复杂性并增加参数效率。这种自我正则化可能有助于解释最近机器学习文献中报告的自我模型的一些好处，以及自我模型对生物系统的适应性价值。特别是，这些发现可能阐明了在社会或合作环境中模拟自己的能力和更容易被他人模拟的能力之间的可能互动。

更新时间: 2024-07-14 13:16:23

领域: cs.LG

下载: http://arxiv.org/abs/2407.10188v1

Identity Chain

The first generation of cryptocurrencies introduced revolutionary concepts, yet faced challenges in privacy and regulatory compliance. While subsequent cryptocurrencies (such as Zcash and Monero) aimed to address privacy concerns, they often conflicted with regulatory frameworks, hindering broader adoption. In response, inspired by recent research on privacy, accountability, and incentive techniques in blockchain, we propose IdentityChain, a novel framework that integrates privacy and accountability principles, leading to a robust system equipped with adaptable rules. IdentityChain is a KYC (Know Your Customer) service on top of a public blockchain (e.g., Ethereum, Ton, Polygon). The goal is to maintain privacy while ensuring compliance with existing regulations. Privacy is one of the key characteristics of IdentityChain; it is crucial for preventing conflicts of interest, as discussed further in the paper. Accountability is also one of the main characteristics of IdentityChain and deters user misbehavior. Combining privacy and accountability would not be possible without advances in cryptography.

Updated: 2024-07-14 13:14:16

标题: 身份链

摘要: 加密货币的第一代引入了革命性的概念，但在隐私和监管合规方面面临挑战。尽管随后的加密货币旨在解决隐私问题（如Zcash和Monero），但它们经常与监管框架发生冲突，阻碍了更广泛的采用。为此，受最近关于隐私和责任以及区块链中的激励技术的研究启发，我们提出了IdentityChain作为一个整合隐私和责任原则的新框架，导致一个装备有适应性规则的强大系统。 IdentityChain是建立在公共区块链（例如以太坊、Ton、Polygon等）之上的KYC（认识您的客户）服务。其目标是在确保符合现有法规的同时保持隐私。隐私是IdentityChain的关键特征之一，对于预防利益冲突至关重要。责任也是IdentityChain的主要特征之一，可以防止用户的不当行为。隐私和责任在没有密码学的进步的情况下是不可能的。

更新时间: 2024-07-14 13:14:16

领域: cs.GT,cs.CR

下载: http://arxiv.org/abs/2407.10187v1

FreeCG: Free the Design Space of Clebsch-Gordan Transform for Machine Learning Force Field

The Clebsch-Gordan Transform (CG transform) effectively encodes many-body interactions. Many studies have proven its accuracy in depicting atomic environments, although this comes with high computational needs. This computational burden is hard to reduce due to the need for permutation equivariance, which limits the design space of the CG transform layer. We show that implementing the CG transform layer on permutation-invariant inputs allows complete freedom in the design of this layer without affecting symmetry. Building on this premise, our idea is to create a CG transform layer that operates on permutation-invariant abstract edges generated from real edge information. We bring in group CG transform with sparse path, abstract edges shuffling, and attention enhancer to form a powerful and efficient CG transform layer. Our method, known as FreeCG, achieves state-of-the-art (SoTA) results in force prediction for MD17, rMD17, MD22, and property prediction in QM9 datasets with notable enhancement. It introduces a novel paradigm for carrying out efficient and expressive CG transform in future geometric neural network designs.

Updated: 2024-07-14 12:40:35

标题: FreeCG：为机器学习势场释放Clebsch-Gordan变换的设计空间

更新时间: 2024-07-14 12:40:35

领域: cs.LG,physics.chem-ph,q-bio.BM,quant-ph

下载: http://arxiv.org/abs/2407.02263v2

Key-Point-Driven Mathematical Reasoning Distillation of Large Language Model

Large Language Models (LLMs) have demonstrated exceptional proficiency in mathematical reasoning tasks due to their extensive parameter counts and training on vast datasets. Despite these capabilities, deploying LLMs is hindered by their computational demands. Distilling LLM mathematical reasoning into Smaller Language Models (SLMs) has emerged as a solution to this challenge, although these smaller models often suffer from errors in calculation and semantic understanding. Prior work has proposed Program-of-Thought Distillation (PoTD) to avoid calculation error. To further address semantic understanding errors, we propose Key-Point-Driven Mathematical Reasoning Distillation (KPDD). KPDD enhances the reasoning performance of SLMs by breaking down the problem-solving process into three stages: Core Question Extraction, Problem-Solving Information Extraction, and Step-by-Step Solution. This method is further divided into KPDD-CoT, which generates Chain-of-Thought rationales, and KPDD-PoT, which creates Program-of-Thought rationales. The experiment results show that KPDD-CoT significantly improves reasoning abilities, while KPDD-PoT achieves state-of-the-art performance in mathematical reasoning tasks. Our approach effectively mitigates misunderstanding errors, advancing the deployment of efficient and capable SLMs.

Updated: 2024-07-14 11:41:03

标题: 关键点驱动的大型语言模型数学推理精炼

摘要: 大型语言模型（LLMs）由于其庞大的参数数量和在大量数据集上的训练，已经在数学推理任务中展现出卓越的能力。尽管具有这些能力，部署LLMs受到其计算需求的限制。将LLM数学推理转化为较小语言模型（SLMs）已经成为解决这一挑战的方案，尽管这些较小模型经常在计算和语义理解上出现错误。之前的工作提出了基于思想程序的精馏（PoTD）来避免计算错误。为了进一步解决语义理解错误，我们提出了基于关键点驱动的数学推理精炼（KPDD）。KPDD通过将问题解决过程分解为三个阶段：核心问题提取、问题解决信息提取和逐步解决方案来增强SLMs的推理性能。这种方法进一步分为KPDD-CoT，它生成思维链理由，和KPDD-PoT，它创建思维程序理由。实验结果显示，KPDD-CoT显著提高了推理能力，而KPDD-PoT在数学推理任务中实现了最先进的性能。我们的方法有效地减轻了误解错误，推动了高效和有能力SLM的部署。

更新时间: 2024-07-14 11:41:03

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2407.10167v1

Lightning Fast Video Anomaly Detection via Adversarial Knowledge Distillation

We propose a very fast frame-level model for anomaly detection in video, which learns to detect anomalies by distilling knowledge from multiple highly accurate object-level teacher models. To improve the fidelity of our student, we distill the low-resolution anomaly maps of the teachers by jointly applying standard and adversarial distillation, introducing an adversarial discriminator for each teacher to distinguish between target and generated anomaly maps. We conduct experiments on three benchmarks (Avenue, ShanghaiTech, UCSD Ped2), showing that our method is over 7 times faster than the fastest competing method, and between 28 and 62 times faster than object-centric models, while obtaining comparable results to recent methods. Our evaluation also indicates that our model achieves the best trade-off between speed and accuracy, due to its previously unheard-of speed of 1480 FPS. In addition, we carry out a comprehensive ablation study to justify our architectural design choices. Our code is freely available at: https://github.com/ristea/fast-aed.

Updated: 2024-07-14 11:34:21

标题: 通过对抗性知识蒸馏实现闪电般快速的视频异常检测

摘要: 我们提出了一个非常快速的视频异常检测的帧级模型，该模型通过从多个高度准确的对象级教师模型中提取知识来学习检测异常。为了提高我们的学生模型的准确性，我们通过同时应用标准和对抗性蒸馏来提取教师模型的低分辨率异常地图，为每个教师引入一个对抗性鉴别器来区分目标和生成的异常地图。我们在三个基准测试（Avenue，ShanghaiTech，UCSD Ped2）上进行实验，结果显示我们的方法比最快的竞争方法快7倍以上，并且比以对象为中心的模型快28到62倍，同时获得与最近方法相当的结果。我们的评估还表明，由于其前所未有的1480 FPS的速度，我们的模型实现了速度和准确性之间的最佳权衡。此外，我们进行了全面的消融研究来证明我们的架构设计选择的合理性。我们的代码可以在以下链接免费获取：https://github.com/ristea/fast-aed。

更新时间: 2024-07-14 11:34:21

领域: cs.CV,cs.AI,cs.LG,cs.MM,stat.ML

下载: http://arxiv.org/abs/2211.15597v3

The Hidden Influence of Latent Feature Magnitude When Learning with Imbalanced Data

Machine learning (ML) models have difficulty generalizing when the numbers of training instances per class are imbalanced. The problem of generalization in the face of data imbalance has largely been attributed to the lack of training data for under-represented classes and to feature overlap. The typical remedy is to implement data augmentation for classes with fewer instances, to assign a higher cost to minority class prediction errors, or to undersample the prevalent class. However, we show that one of the central causes of impaired generalization when learning with imbalanced data is the inherent manner in which ML models perform inference. These models have difficulty generalizing due to their heavy reliance on the magnitude of encoded signals. During inference, the models predict classes based on a combination of encoded signal magnitudes that linearly sum to the largest scalar. We demonstrate that even with aggressive data augmentation, which generally improves minority class prediction accuracy, parametric ML models still associate a class label with a limited number of feature combinations that sum to a prediction, which can affect generalization.
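
The inference mechanism described above, where each class score is a linear sum of encoded signal magnitudes and the largest sum wins, can be sketched with a toy linear classification head. The weights, biases, and feature vectors below are hypothetical values picked for illustration; they are not taken from the paper.

```python
def class_scores(features, weights, biases):
    """One score per class: bias plus the sum of weight * feature magnitude."""
    return [
        b + sum(w * f for w, f in zip(row, features))
        for row, b in zip(weights, biases)
    ]


def predict(features, weights, biases):
    """Index of the class whose linear sum is the largest scalar."""
    scores = class_scores(features, weights, biases)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical two-class head over three latent features.
WEIGHTS = [
    [1.0, 0.5, 0.0],  # class 0 (majority) relies mostly on feature 0
    [0.0, 0.5, 1.0],  # class 1 (minority) relies mostly on feature 2
]
BIASES = [0.0, 0.0]

minority_sample = [0.2, 0.3, 0.6]
# Same minority evidence, but feature 0 happens to have a large magnitude.
large_magnitude_sample = [1.5, 0.3, 0.6]

print(predict(minority_sample, WEIGHTS, BIASES))
print(predict(large_magnitude_sample, WEIGHTS, BIASES))
```

In this toy setup the minority-class evidence (features 1 and 2) is identical in both samples; only the magnitude of feature 0 changes, and that alone is enough to flip the prediction.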

Updated: 2024-07-14 11:20:50

标题: 学习不平衡数据时潜在特征幅度的隐藏影响

摘要: 机器学习（ML）模型在训练类实例数量数字不平衡时很难泛化。在面对数据不平衡的情况下，泛化问题主要归因于对少数类缺乏训练数据和特征重叠。典型的解决方法是为实例较少的类实现数据增强，或者为少数类预测错误分配更高的成本，或者对普遍类进行欠采样。然而，我们表明，在学习不平衡数据时泛化受损的一个中心原因是ML模型执行推理的固有方式。这些模型在泛化上有困难，因为它们过于依赖编码信号的幅度。在推理过程中，模型根据线性求和得到的编码信号幅度来预测类。我们证明，即使采取激进的数据增强措施，通常会提高少数类预测准确度，参数化ML模型仍然将类标签与有限数量的特征组合相关联，这些组合求和得到一个预测，这可能影响泛化。

更新时间: 2024-07-14 11:20:50

领域: cs.LG

下载: http://arxiv.org/abs/2407.10165v1

Pre-training with Fractional Denoising to Enhance Molecular Property Prediction

Deep learning methods have been considered promising for accelerating molecular screening in drug discovery and material design. Due to the limited availability of labelled data, various self-supervised molecular pre-training methods have been presented. While many existing methods utilize common pre-training tasks in computer vision (CV) and natural language processing (NLP), they often overlook the fundamental physical principles governing molecules. In contrast, applying denoising in pre-training can be interpreted as an equivalent force learning, but the limited noise distribution introduces bias into the molecular distribution. To address this issue, we introduce a molecular pre-training framework called fractional denoising (Frad), which decouples noise design from the constraints imposed by force learning equivalence. In this way, the noise becomes customizable, allowing for incorporating chemical priors to significantly improve molecular distribution modeling. Experiments demonstrate that our framework consistently outperforms existing methods, establishing state-of-the-art results across force prediction, quantum chemical properties, and binding affinity tasks. The refined noise design enhances force accuracy and sampling coverage, which contribute to the creation of physically consistent molecular representations, ultimately leading to superior predictive performance.
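
The link between denoising and force learning mentioned above can be made concrete with the standard denoising score matching identity. The derivation below is a generic sketch under stated assumptions (isotropic Gaussian coordinate noise of scale $\sigma$, conformations following a Boltzmann distribution with energy $E$, and force $F = -\nabla E$). It is not Frad's own derivation, which relaxes exactly this restriction on the noise.

```latex
% Assumptions (not from the abstract): Gaussian coordinate noise with scale \sigma,
% conformations x drawn from a Boltzmann distribution with energy E.
\begin{align}
\tilde{x} &= x + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \sigma^2 I), \qquad
p(x) \propto \exp\!\left(-\frac{E(x)}{k_B T}\right) \\
q_\sigma(\tilde{x}) &= \int p(x)\, \mathcal{N}(\tilde{x};\, x,\, \sigma^2 I)\, dx \\
\mathbb{E}[\epsilon \mid \tilde{x}] &= -\sigma^2\, \nabla_{\tilde{x}} \log q_\sigma(\tilde{x}) \\
&\approx -\sigma^2\, \nabla_{\tilde{x}} \log p(\tilde{x})
 = \frac{\sigma^2}{k_B T}\, \nabla_{\tilde{x}} E(\tilde{x})
 = -\frac{\sigma^2}{k_B T}\, F(\tilde{x}) \qquad (\text{small } \sigma)
\end{align}
```

A network trained with a squared-error loss to predict $\epsilon$ from $\tilde{x}$ converges toward this conditional expectation, so under these assumptions it learns a quantity proportional to the force field. The equivalence holds only for this particular noise family, which is the constraint the abstract says Frad decouples from.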

Updated: 2024-07-14 11:09:42

标题: 使用分数去噪预训练以增强分子性质预测

摘要: 深度学习方法被认为在加速药物发现和材料设计中具有潜在前景。由于标记数据的有限可用性，各种自监督分子预训练方法已被提出。尽管许多现有方法利用计算机视觉（CV）和自然语言处理（NLP）中的常见预训练任务，但它们经常忽视控制分子的基本物理原则。相反，在预训练中应用去噪可以被解释为等效的力学学习，但有限的噪声分布会引入偏差到分子分布中。为了解决这个问题，我们引入了一个称为分数去噪（Frad）的分子预训练框架，它将噪声设计与力学学习等效的约束分离。通过这种方式，噪声变得可定制，允许将化学先验纳入其中，从而显著改善分子分布建模。实验证明，我们的框架始终优于现有方法，在力预测、量子化学性质和结合亲和力任务中建立了最先进的结果。精细的噪声设计增强了力的准确性和采样覆盖率，有助于创造物理上一致的分子表示，最终导致更优越的预测性能。

更新时间: 2024-07-14 11:09:42

领域: cs.LG,cs.AI,physics.chem-ph

下载: http://arxiv.org/abs/2407.11086v1

ChatLogic: Integrating Logic Programming with Large Language Models for Multi-Step Reasoning

Large language models (LLMs) such as ChatGPT and GPT-4 have demonstrated impressive capabilities in various generative tasks. However, their performance is often hampered by limitations in accessing and leveraging long-term memory, leading to specific vulnerabilities and biases, especially during long interactions. This paper introduces ChatLogic, an innovative framework for LLM reasoning that enhances the performance of LLMs in multi-step deductive reasoning tasks by integrating logic programming. In ChatLogic, the language model plays a central role, acting as a controller and participating in every system operation stage. We propose a novel method of converting logic problems into symbolic integration with an inference engine. This approach leverages large language models' situational understanding and imitation skills and uses symbolic memory to enhance multi-step deductive reasoning capabilities. Our results show that the ChatLogic framework significantly improves the multi-step reasoning capabilities of LLMs. The source code and data are available at https://github.com/Strong-AI-Lab/ChatLogic

Updated: 2024-07-14 11:06:43

标题: ChatLogic：将逻辑编程与大型语言模型集成，用于多步推理

摘要: 大型语言模型（LLMs）如ChatGPT和GPT-4在各种生成任务中展示出令人印象深刻的能力。然而，它们的性能常常受限于长期记忆的访问和利用能力，导致特定的脆弱性和偏见，特别是在长时间交互中。本文介绍了ChatLogic，一个针对LLM推理任务的创新框架，通过集成逻辑编程可以提升LLMs在多步演绎推理任务中的性能。在ChatLogic中，语言模型扮演着核心角色，作为控制器参与每个系统操作阶段。我们提出了一种将逻辑问题转化为符号集成与推理引擎的新方法。这种方法利用大型语言模型的情境理解和模仿能力，利用符号记忆增强多步演绎推理能力。我们的结果表明，ChatLogic框架显著改善了LLMs的多步推理能力。源代码和数据可在 https://github.com/Strong-AI-Lab/ChatLogic 上获得。

更新时间: 2024-07-14 11:06:43

领域: cs.AI

下载: http://arxiv.org/abs/2407.10162v1

RAPiD-Seg: Range-Aware Pointwise Distance Distribution Networks for 3D LiDAR Segmentation

3D point clouds play a pivotal role in outdoor scene perception, especially in the context of autonomous driving. Recent advancements in 3D LiDAR segmentation often focus intensely on the spatial positioning and distribution of points for accurate segmentation. However, these methods, while robust in variable conditions, encounter challenges due to sole reliance on coordinates and point intensity, leading to poor isometric invariance and suboptimal segmentation. To tackle this challenge, our work introduces Range-Aware Pointwise Distance Distribution (RAPiD) features and the associated RAPiD-Seg architecture. Our RAPiD features exhibit rigid transformation invariance and effectively adapt to variations in point density, with a design focus on capturing the localized geometry of neighboring structures. They utilize inherent LiDAR isotropic radiation and semantic categorization for enhanced local representation and computational efficiency, while incorporating a 4D distance metric that integrates geometric and surface material reflectivity for improved semantic segmentation. To effectively embed high-dimensional RAPiD features, we propose a double-nested autoencoder structure with a novel class-aware embedding objective to encode high-dimensional features into manageable voxel-wise embeddings. Additionally, we propose RAPiD-Seg which incorporates a channel-wise attention fusion and two effective RAPiD-Seg variants, further optimizing the embedding for enhanced performance and generalization. Our method outperforms contemporary LiDAR segmentation work in terms of mIoU on SemanticKITTI (76.1) and nuScenes (83.6) datasets.

Updated: 2024-07-14 10:59:34

标题: RAPiD-Seg：面向范围感知的点间距分布网络，用于3D LiDAR分割

摘要: 3D点云在室外场景感知中发挥关键作用，特别是在自动驾驶的背景下。最近在3D LiDAR分割方面取得的进展通常集中在点的空间定位和分布，以实现准确的分割。然而，这些方法虽然在不同条件下具有稳健性，但由于仅依赖坐标和点强度，面临着等距不变性差和分割不够优化的挑战。为了解决这一挑战，我们的工作引入了Range-Aware Pointwise Distance Distribution（RAPiD）特征和相关的RAPiD-Seg架构。我们的RAPiD特征表现出刚性变换不变性，并有效地适应点密度的变化，设计重点在于捕获相邻结构的局部几何形状。它们利用固有的LiDAR等向辐射和语义分类，以增强本地表示和计算效率，同时整合了一个4D距离度量，将几何和表面材料反射率整合到一起，以改善语义分割。为了有效地嵌入高维度的RAPiD特征，我们提出了一个双嵌套自动编码器结构，并使用一种新颖的类感知嵌入目标，将高维特征编码为可管理的体素嵌入。此外，我们提出了RAPiD-Seg，它结合了通道注意力融合和两种有效的RAPiD-Seg变体，进一步优化了嵌入，以提高性能和泛化能力。我们的方法在SemanticKITTI（76.1）和nuScenes（83.6）数据集上的mIoU方面优于当代LiDAR分割工作。

更新时间: 2024-07-14 10:59:34

领域: cs.CV,cs.LG,cs.RO

下载: http://arxiv.org/abs/2407.10159v1

Look Within, Why LLMs Hallucinate: A Causal Perspective

The emergence of large language models (LLMs) is a milestone in generative artificial intelligence, achieving significant success in text comprehension and generation tasks. Despite the tremendous success of LLMs in many downstream tasks, they suffer from severe hallucination problems, posing significant challenges to their practical application. Most work on LLM hallucination focuses on data quality. Self-attention is a core module in transformer-based LLMs, yet its potential relationship with hallucination has hardly been investigated. To fill this gap, we study this problem from a causal perspective. We propose a method to intervene in LLMs' self-attention layers while keeping their structures and sizes intact. Specifically, we disable different self-attention layers in several popular open-source LLMs and then compare their degrees of hallucination with those of the original models. We evaluate the intervened LLMs on hallucination assessment benchmarks and conclude that disabling some specific self-attention layers in the front or tail of the LLMs can alleviate hallucination issues. The study opens a new path toward understanding and mitigating LLMs' hallucinations.

Updated: 2024-07-14 10:47:44

标题: 往内看，为什么LLM产生幻觉：因果透视

摘要: 大型语言模型（LLMs）的出现是生成人工智能的里程碑，取得了在文本理解和生成任务中的显著成功。尽管LLMs在许多下游任务中取得了巨大成功，但它们遭受严重的幻觉问题，给LLMs的实际应用带来了重大挑战。关于LLMs的幻觉问题的大部分作品都集中在数据质量上。自注意力是基于transformer的LLMs中的核心模块，然而它与LLMs的幻觉问题的潜在关系几乎没有被研究。为了填补这一空白，我们从因果的角度研究了这个问题。我们提出了一种干预LLMs自注意力层并保持它们的结构和大小不变的方法。具体来说，我们禁用了几个流行的开源LLMs中的不同自注意力层，然后将它们的幻觉程度与原始模型进行比较。我们在幻觉评估基准上评估了干预LLMs，并得出结论，禁用LLMs前端或尾部的一些特定自注意力层可以缓解幻觉问题。这项研究为理解和减轻LLMs的幻觉问题开辟了一条新途径。

更新时间: 2024-07-14 10:47:44

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2407.10153v1

Event Trojan: Asynchronous Event-based Backdoor Attacks

As asynchronous event data is used more frequently in various vision tasks, the risk of backdoor attacks becomes more evident. However, research into the potential risk associated with backdoor attacks in asynchronous event data has been scarce, leaving related tasks vulnerable to potential threats. This paper uncovers the possibility of directly poisoning event data streams by proposing the Event Trojan framework, which includes two kinds of triggers, i.e., immutable and mutable triggers. Specifically, our two types of event triggers are based on a sequence of simulated event spikes, which can be easily incorporated into any event stream to initiate backdoor attacks. Additionally, for the mutable trigger, we design an adaptive learning mechanism to maximize its aggressiveness. To improve the stealthiness, we introduce a novel loss function that constrains the generated contents of mutable triggers, minimizing the difference between triggers and original events while maintaining effectiveness. Extensive experiments on public event datasets show the effectiveness of the proposed backdoor triggers. We hope that this paper can draw greater attention to the potential threats posed by backdoor attacks on event-based tasks. Our code is available at https://github.com/rfww/EventTrojan.

Updated: 2024-07-14 10:40:13

标题: 事件木马：异步事件驱动的后门攻击

摘要: 随着异步事件数据在各种视觉任务中的更频繁应用，后门攻击的风险变得更加明显。然而，关于异步事件数据中与后门攻击相关的潜在风险的研究很少，使相关任务容易受到潜在威胁。本文通过提出事件特洛伊框架揭示了直接污染事件数据流的可能性，其中包括两种触发器，即不可变和可变触发器。具体而言，我们的两种事件触发器基于一系列模拟事件尖峰，可以轻松地整合到任何事件流中以发起后门攻击。此外，对于可变触发器，我们设计了一种自适应学习机制来最大化其攻击性。为了提高隐蔽性，我们引入了一种新颖的损失函数，约束可变触发器生成的内容，最小化触发器与原始事件之间的差异，同时保持有效性。对公共事件数据集进行的大量实验显示了所提出的后门触发器的有效性。我们希望本文能够引起更多关于事件任务上后门攻击潜在威胁的关注。我们的代码可在https://github.com/rfww/EventTrojan 上找到。

更新时间: 2024-07-14 10:40:13

领域: cs.CR,cs.CV

下载: http://arxiv.org/abs/2407.06838v2

A Framework for Evaluating Privacy-Utility Trade-off in Vertical Federated Learning

Federated learning (FL) has emerged as a practical solution to tackle data silo issues without compromising user privacy. One of its variants, vertical federated learning (VFL), has recently gained increasing attention because it matches enterprises' demand to leverage more valuable features to build better machine learning models while preserving user privacy. Current works in VFL concentrate on developing a specific protection or attack mechanism for a particular VFL algorithm. In this work, we propose an evaluation framework that formulates the privacy-utility evaluation problem. We then use this framework as a guide to comprehensively evaluate a broad range of protection mechanisms against most of the state-of-the-art privacy attacks for three widely deployed VFL algorithms. These evaluations may help FL practitioners select appropriate protection mechanisms given specific requirements. Our evaluation results demonstrate that model inversion and most label inference attacks can be thwarted by existing protection mechanisms, whereas the model completion (MC) attack is difficult to prevent, which calls for more advanced MC-targeted protection mechanisms. Based on our evaluation results, we offer concrete advice on improving the privacy-preserving capability of VFL systems. The code is available at https://github.com/yankang18/VFL-Attack-Defense

Updated: 2024-07-14 10:23:56

标题: 一个评估垂直联邦学习隐私-效用权衡的框架

摘要: 联邦学习（FL）已经成为解决数据孤岛问题且不损害用户隐私的实际解决方案。其中一种变体，垂直联邦学习（VFL），最近引起了越来越多的关注，因为VFL能够满足企业利用更有价值的特征构建更好的机器学习模型的需求，同时保护用户隐私。目前VFL方面的工作集中在为特定的VFL算法开发特定的保护或攻击机制。在这项工作中，我们提出了一个评估框架，用于制定隐私-效用评估问题。然后我们使用该框架作为指南，全面评估一系列保护机制，以抵御大多数目前广泛部署的VFL算法所面临的隐私攻击。这些评估可以帮助FL从业者根据特定需求选择适当的保护机制。我们的评估结果表明：现有的保护机制可以阻止模型反演和大多数标签推断攻击；模型完成（MC）攻击很难被防止，这需要更先进的针对MC的保护机制。根据我们的评估结果，我们提供了改进VFL系统隐私保护能力的具体建议。代码可在https://github.com/yankang18/VFL-Attack-Defense找到。

更新时间: 2024-07-14 10:23:56

领域: cs.LG,cs.CR,cs.DC

下载: http://arxiv.org/abs/2209.03885v3

Data-Efficient Molecular Generation with Hierarchical Textual Inversion

Developing a molecular generation framework that remains effective even with a limited number of molecules is often important for practical deployment, e.g., in drug discovery, since acquiring task-related molecular data requires expensive and time-consuming experiments. To tackle this issue, we introduce Hierarchical textual Inversion for Molecular generation (HI-Mol), a novel data-efficient molecular generation method. HI-Mol is inspired by the importance of hierarchical information, e.g., both coarse- and fine-grained features, in understanding the molecule distribution. We propose to use multi-level embeddings to reflect such hierarchical features, building on the recent textual inversion technique from the visual domain, which achieves data-efficient image generation. Compared to the conventional textual inversion method in the image domain, which uses a single-level token embedding, our multi-level token embeddings allow the model to effectively learn the underlying low-shot molecule distribution. We then generate molecules based on the interpolation of the multi-level token embeddings. Extensive experiments demonstrate the superiority of HI-Mol with notable data-efficiency. For instance, on QM9, HI-Mol outperforms the prior state-of-the-art method with 50x less training data. We also show the effectiveness of molecules generated by HI-Mol in low-shot molecular property prediction.

Updated: 2024-07-14 09:36:45

标题: Hierarchical Textual Inversion实现的数据高效分子生成

摘要: 开发一种有效的分子生成框架，即使只有有限数量的分子也通常很重要，用于其实际部署，例如药物发现，因为获取与任务相关的分子数据需要昂贵且耗时的实验成本。为了解决这个问题，我们引入了分子生成的层次文本反转（HI-Mol），一种新颖的数据高效的分子生成方法。HI-Mol受到分层信息的重要性的启发，例如，粗粒度和细粒度特征，在理解分子分布方面。我们提出使用多级嵌入来反映这种基于最近的文本反转技术在视觉领域的采用的分层特征，该技术实现了数据高效的图像生成。与在图像领域使用单级标记嵌入的传统文本反转方法相比，我们的多级标记嵌入允许模型有效地学习基础的低射击分子分布。然后，我们基于多级标记嵌入的插值生成分子。大量实验证明了HI-Mol在数据效率方面的优越性。例如，在QM9上，HI-Mol在比先前最先进的方法少50倍的训练数据下表现优异。我们还展示了由HI-Mol生成的分子在低射击分子性质预测中的有效性。

更新时间: 2024-07-14 09:36:45

领域: cs.LG,q-bio.MN

下载: http://arxiv.org/abs/2405.02845v2

SpreadFGL: Edge-Client Collaborative Federated Graph Learning with Adaptive Neighbor Generation

Federated Graph Learning (FGL) has garnered widespread attention by enabling collaborative training on multiple clients for semi-supervised classification tasks. However, most existing FGL studies do not adequately consider the missing inter-client topology information in real-world scenarios, causing insufficient feature aggregation of multi-hop neighbor clients during model training. Moreover, classic FGL commonly adopts FedAvg but neglects the high training costs when the number of clients expands, resulting in the overload of a single edge server. To address these important challenges, we propose a novel FGL framework, named SpreadFGL, to promote the information flow in edge-client collaboration and extract more generalized potential relationships between clients. In SpreadFGL, an adaptive graph imputation generator incorporating a versatile assessor is first designed to exploit the potential links between subgraphs, without sharing raw data. Next, a new negative sampling mechanism is developed to make SpreadFGL concentrate on more refined information in downstream tasks. To facilitate load balancing at the edge layer, SpreadFGL trains in a distributed manner that enables fast model convergence. Using a real-world testbed and benchmark graph datasets, extensive experiments demonstrate the effectiveness of the proposed SpreadFGL. The results show that SpreadFGL achieves higher accuracy and faster convergence than state-of-the-art algorithms.
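
For readers unfamiliar with FedAvg, the aggregation step that the abstract says classic FGL relies on is a sample-size-weighted average of client parameters, computed on one server. The sketch below shows only that baseline step, with flat parameter lists and hypothetical client values; it does not represent SpreadFGL's distributed training scheme.

```python
def fedavg(client_params, client_sizes):
    """Average client parameter vectors, weighted by local sample counts."""
    if len(client_params) != len(client_sizes):
        raise ValueError("one sample count is required per client")
    total = sum(client_sizes)
    n_params = len(client_params[0])
    return [
        sum(params[i] * size for params, size in zip(client_params, client_sizes)) / total
        for i in range(n_params)
    ]


# Hypothetical example: two clients, two parameters each.
client_params = [[1.0, 2.0], [3.0, 4.0]]
client_sizes = [1, 3]  # the second client holds three times as much data

global_params = fedavg(client_params, client_sizes)
print(global_params)
```

Because every client's update passes through this single averaging step, the server's work grows with the number of clients, which is the overload concern the abstract raises.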

Updated: 2024-07-14 09:34:19

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2407.11085v1

Optimal Kernel Choice for Score Function-based Causal Discovery

Score-based methods have demonstrated their effectiveness in discovering causal relationships by scoring different causal structures based on their goodness of fit to the data. Recently, Huang et al. proposed a generalized score function that can handle general data distributions and causal relationships by modeling the relations in reproducing kernel Hilbert space (RKHS). The selection of an appropriate kernel within this score function is crucial for accurately characterizing causal relationships and ensuring precise causal discovery. However, the current method involves manual heuristic selection of kernel parameters, making the process tedious and less likely to ensure optimality. In this paper, we propose a kernel selection method within the generalized score function that automatically selects the optimal kernel that best fits the data. Specifically, we model the generative process of the variables involved in each step of the causal graph search procedure as a mixture of independent noise variables. Based on this model, we derive an automatic kernel selection method by maximizing the marginal likelihood of the variables involved in each search step. We conduct experiments on both synthetic data and real-world benchmarks, and the results demonstrate that our proposed method outperforms heuristic kernel selection methods.

Updated: 2024-07-14 09:32:20

Categories: cs.LG,stat.ME

Download: http://arxiv.org/abs/2407.10132v1

A Survey on Self-supervised Learning: Algorithms, Applications, and Future Trends

Deep supervised learning algorithms typically require a large volume of labeled data to achieve satisfactory performance. However, the process of collecting and labeling such data can be expensive and time-consuming. Self-supervised learning (SSL), a subset of unsupervised learning, aims to learn discriminative features from unlabeled data without relying on human-annotated labels. SSL has garnered significant attention recently, leading to the development of numerous related algorithms. However, there is a dearth of comprehensive studies that elucidate the connections and evolution of different SSL variants. This paper presents a review of diverse SSL methods, encompassing algorithmic aspects, application domains, three key trends, and open research questions. Firstly, we provide a detailed introduction to the motivations behind most SSL algorithms and compare their commonalities and differences. Secondly, we explore representative applications of SSL in domains such as image processing, computer vision, and natural language processing. Lastly, we discuss the three primary trends observed in SSL research and highlight the open questions that remain. A curated collection of valuable resources can be accessed at https://github.com/guijiejie/SSL.

Updated: 2024-07-14 09:30:45

Categories: cs.LG

Download: http://arxiv.org/abs/2301.05712v4

AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology

Software agents have emerged as promising tools for addressing complex software engineering tasks. Existing works, however, frequently oversimplify software development workflows, even though such workflows are typically more complex in the real world. We therefore propose AgileCoder, a multi-agent system that integrates Agile Methodology (AM) into the framework. The system assigns specific AM roles, such as Product Manager, Developer, and Tester, to different agents, who then collaboratively develop software based on user inputs. AgileCoder enhances development efficiency by organizing work into sprints and developing software incrementally across them. Additionally, we introduce Dynamic Code Graph Generator, a module that creates a Code Dependency Graph dynamically as updates are made to the codebase. This allows agents to better comprehend the codebase, leading to more precise code generation and modifications throughout the software development process. AgileCoder surpasses existing baselines, such as ChatDev and MetaGPT, establishing a new standard and showcasing the capabilities of multi-agent systems in advanced software engineering environments.
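
To make the idea of a code dependency graph concrete, the following is a minimal sketch, not the paper's Dynamic Code Graph Generator, whose design the abstract does not specify. It assumes a dependency is simply an import of one project module by another; the function name `build_dependency_graph` and the in-memory `sources` mapping are hypothetical.

```python
import ast


def build_dependency_graph(sources):
    """Map each module name to the sorted list of project modules it imports.

    `sources` is a dict of module name -> source code (an assumed,
    simplified stand-in for a real codebase on disk).
    """
    graph = {}
    for name, code in sources.items():
        deps = set()
        for node in ast.walk(ast.parse(code)):
            if isinstance(node, ast.Import):
                # "import a.b" depends on top-level package "a"
                deps.update(alias.name.split(".")[0] for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                # "from a.b import c" depends on top-level package "a"
                deps.add(node.module.split(".")[0])
        # Keep only edges that point at other modules inside the project.
        graph[name] = sorted(d for d in deps if d in sources and d != name)
    return graph


sources = {
    "app": "import utils\nfrom models import User\nimport os\n",
    "models": "import utils\n",
    "utils": "",
}
graph = build_dependency_graph(sources)
print(graph)
```

Rebuilding the graph after every change to `sources` is the simplest way to keep it "dynamic"; an incremental update that re-parses only the modified module would be a natural refinement.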

Updated: 2024-07-14 09:14:30

Categories: cs.SE,cs.AI

Download: http://arxiv.org/abs/2406.11912v2

Selective Learning: Towards Robust Calibration with Dynamic Regularization

Miscalibration in deep learning refers to a discrepancy between predicted confidence and actual performance. This problem usually arises from overfitting, which is characterized by learning everything presented in the training set and results in overconfident predictions during testing. Existing methods typically address overfitting and mitigate miscalibration by adding a maximum-entropy regularizer to the objective function. The objective can be understood as seeking a model that fits the ground-truth labels by increasing the confidence while also maximizing the entropy of predicted probabilities by decreasing the confidence. However, previous methods lack clear guidance on confidence adjustment, leading to conflicting objectives (increasing but also decreasing confidence). Therefore, we introduce a method called Dynamic Regularization (DReg), which aims to learn what should be learned during training, thereby circumventing the confidence-adjustment trade-off. At a high level, DReg aims to obtain a more reliable model capable of acknowledging what it knows and does not know. Specifically, DReg effectively fits the labels for in-distribution samples (samples that should be learned) while applying regularization dynamically to samples beyond model capabilities (e.g., outliers), thereby obtaining a robust, calibrated model, especially on the samples beyond model capabilities. Both theoretical and empirical analyses demonstrate the superiority of DReg compared with previous methods.
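
The conflict described above can be seen in a small sketch of the maximum-entropy regularized objective used by the existing methods (this is the baseline the abstract criticizes, not DReg itself). The weight `beta` and the function names are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def max_entropy_loss(logits, label, beta=0.1):
    """Cross-entropy minus beta times the entropy of the prediction.

    The cross-entropy term pushes confidence on `label` up, while the
    entropy bonus pushes the prediction toward uniform (confidence down).
    """
    probs = softmax(logits)
    cross_entropy = -math.log(probs[label])
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return cross_entropy - beta * entropy


uniform_loss = max_entropy_loss([0.0, 0.0], label=0, beta=0.5)
print(uniform_loss)
```

Because both terms act on every sample with the same fixed `beta`, the objective cannot distinguish samples that should be fit confidently from samples that should stay uncertain, which is the gap the abstract says DReg targets.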

Updated: 2024-07-14 09:12:25

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2402.08384v2

Ensemble Deep Random Vector Functional Link Neural Network Based on Fuzzy Inference System

The ensemble deep random vector functional link (edRVFL) neural network has demonstrated the ability to address the limitations of conventional artificial neural networks. However, since edRVFL generates features for its hidden layers through random projection, it can potentially lose intricate features or fail to capture certain non-linear features in its base models (hidden layers). To enhance the feature learning capabilities of edRVFL, we propose a novel edRVFL based on a fuzzy inference system (edRVFL-FIS). The proposed edRVFL-FIS combines the capabilities of two emerging domains, namely deep learning and ensemble approaches, with the intrinsic IF-THEN properties of a fuzzy inference system (FIS) and produces rich feature representations to train the ensemble model. Each base model of the proposed edRVFL-FIS encompasses two key feature augmentation components: a) unsupervised fuzzy layer features and b) supervised defuzzified features. The edRVFL-FIS model incorporates diverse clustering methods (R-means, K-means, Fuzzy C-means) to establish fuzzy layer rules, resulting in three model variations (edRVFL-FIS-R, edRVFL-FIS-K, edRVFL-FIS-C) with distinct fuzzified features and defuzzified features. Within the framework of edRVFL-FIS, each base model utilizes the original, hidden-layer, and defuzzified features to make predictions. Experimental results, statistical tests, discussions, and analyses conducted across UCI and NDC datasets consistently demonstrate the superior performance of all variations of the proposed edRVFL-FIS model over baseline models. The source code of the proposed models is available at https://github.com/mtanveer1/edRVFL-FIS.

Updated: 2024-07-14 08:37:14

Categories: cs.LG

Download: http://arxiv.org/abs/2406.00801v2

Evolutionary Trigger Detection and Lightweight Model Repair Based Backdoor Defense

Deep Neural Networks (DNNs) have been widely used in many areas such as autonomous driving and face recognition. However, DNN models are vulnerable to backdoor attacks. A backdoor in a DNN model can be activated by a poisoned input carrying a trigger, leading to wrong predictions and causing serious security issues in applications. It is challenging for current defenses to eliminate the backdoor effectively with limited computing resources, especially when the sizes and numbers of the triggers vary, as they do in the physical world. We propose an efficient backdoor defense based on evolutionary trigger detection and lightweight model repair. In the first phase of our method, we propose the CAM-focus Evolutionary Trigger Filter (CETF) for trigger detection. CETF is an effective sample-preprocessing method based on an evolutionary algorithm, and our experimental results show that CETF not only accurately distinguishes images with triggers from clean images but can also be widely used in practice thanks to its simplicity and stability across different backdoor attack situations. In the second phase of our method, we apply several lightweight unlearning methods with the trigger detected by CETF for model repair, which also demonstrates the underlying correlation between the backdoor and Batch Normalization layers. Source code will be published after acceptance.

Updated: 2024-07-14 08:25:25

Categories: cs.CR,cs.AI

Download: http://arxiv.org/abs/2407.05396v2

Adaptive Differentially Quantized Subspace Perturbation (ADQSP): A Unified Framework for Privacy-Preserving Distributed Average Consensus

Privacy-preserving distributed average consensus has received significant attention recently due to its wide applicability. Based on the achieved performances, existing approaches can be broadly classified into perfect accuracy-prioritized approaches such as secure multiparty computation (SMPC), and worst-case privacy-prioritized approaches such as differential privacy (DP). Methods of the first class achieve perfect output accuracy but reveal some private information, while methods from the second class provide privacy against the strongest adversary at the cost of a loss of accuracy. In this paper, we propose a general approach named adaptive differentially quantized subspace perturbation (ADQSP) which combines quantization schemes with so-called subspace perturbation. Although not relying on cryptographic primitives, the proposed approach enjoys the benefits of both accuracy-prioritized and privacy-prioritized methods and is able to unify them. More specifically, we show that by varying a single quantization parameter the proposed method can vary between SMPC-type performances and DP-type performances. Our results show the potential of exploiting traditional distributed signal processing tools for providing cryptographic guarantees. In addition to a comprehensive theoretical analysis, numerical validations are conducted to substantiate our results.

Updated: 2024-07-14 08:20:50

Categories: cs.CR

Download: http://arxiv.org/abs/2312.07947v2

A Bag of Tricks for Scaling CPU-based Deep FFMs to more than 300m Predictions per Second

Field-aware Factorization Machines (FFMs) have emerged as a powerful model for click-through rate prediction, particularly excelling in capturing complex feature interactions. In this work, we present an in-depth analysis of our in-house, Rust-based Deep FFM implementation, and detail its deployment on a CPU-only, multi-data-center scale. We give an overview of key optimizations devised for both training and inference, demonstrated by previously unpublished benchmark results in efficient model search and online training. Further, we detail an in-house weight quantization that resulted in more than an order of magnitude reduction in the bandwidth footprint of weight transfers across data centers. We disclose the engine and associated techniques under an open-source license to contribute to the broader machine learning community. This paper showcases one of the first successful CPU-only deployments of Deep FFMs at such scale, marking a significant stride in practical, low-footprint click-through rate prediction methodologies.

Updated: 2024-07-14 08:10:20

Categories: cs.LG,cs.AI

Download: http://arxiv.org/abs/2407.10115v1

Spurious Feature Diversification Improves Out-of-distribution Generalization

Generalization to out-of-distribution (OOD) data is a critical challenge in machine learning. Ensemble-based methods, like weight space ensembles that interpolate model parameters, have been shown to achieve superior OOD performance. However, the underlying mechanism for their effectiveness remains unclear. In this study, we closely examine WiSE-FT, a popular weight space ensemble method that interpolates between a pre-trained and a fine-tuned model. We observe an unexpected "FalseFalseTrue" phenomenon, in which WiSE-FT successfully corrects many cases where each individual model makes incorrect predictions, which contributes significantly to its OOD effectiveness. To gain further insights, we conduct theoretical analysis in a multi-class setting with a large number of spurious features. Our analysis predicts the above phenomenon and further shows that ensemble-based models reduce prediction errors in OOD settings by utilizing a more diverse set of spurious features. Contrary to the conventional wisdom that focuses on learning invariant features for better OOD performance, our findings suggest that incorporating a large number of diverse spurious features weakens their individual contributions, leading to improved overall OOD generalization performance. Additionally, our findings provide the first explanation for the mysterious phenomenon of weight space ensembles outperforming output space ensembles in OOD. Empirically, we demonstrate the effectiveness of utilizing diverse spurious features on a MultiColorMNIST dataset, and our experimental results are consistent with the theoretical analysis. Building upon the new theoretical insights into the efficacy of ensemble methods, we further propose a novel averaging method called BAlaNced averaGing (BANG), which significantly enhances the OOD performance of WiSE-FT.
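
The difference between a weight space ensemble and an output space ensemble can be sketched as follows. This is a minimal illustration that assumes model parameters are stored as a dict of flat float lists and that the mixing coefficient is called `alpha`; it shows plain linear interpolation only and does not implement BANG.

```python
def interpolate_weights(pretrained, finetuned, alpha):
    """Weight space ensemble: mix the parameters, then run one model.

    alpha = 0 returns the pre-trained weights, alpha = 1 the fine-tuned ones.
    """
    if pretrained.keys() != finetuned.keys():
        raise ValueError("both models must have the same parameter names")
    return {
        name: [(1 - alpha) * p + alpha * f
               for p, f in zip(pretrained[name], finetuned[name])]
        for name in pretrained
    }


def average_outputs(probs_a, probs_b, alpha):
    """Output space ensemble: run both models, then mix their predictions."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(probs_a, probs_b)]


mixed = interpolate_weights({"w": [0.0, 2.0]}, {"w": [1.0, 4.0]}, alpha=0.5)
print(mixed)
```

For a nonlinear network the two ensembles generally give different predictions, since averaging parameters before the forward pass is not the same as averaging predictions after it.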

Updated: 2024-07-14 08:02:49

Categories: cs.LG

Download: http://arxiv.org/abs/2309.17230v2

MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders

Multi-task dense scene understanding, which learns a model for multiple dense prediction tasks, has a wide range of application scenarios. Modeling long-range dependency and enhancing cross-task interactions are crucial to multi-task dense prediction. In this paper, we propose MTMamba, a novel Mamba-based architecture for multi-task scene understanding. It contains two types of core blocks: self-task Mamba (STM) block and cross-task Mamba (CTM) block. STM handles long-range dependency by leveraging Mamba, while CTM explicitly models task interactions to facilitate information exchange across tasks. Experiments on NYUDv2 and PASCAL-Context datasets demonstrate the superior performance of MTMamba over Transformer-based and CNN-based methods. Notably, on the PASCAL-Context dataset, MTMamba achieves improvements of +2.08, +5.01, and +4.90 over the previous best methods in the tasks of semantic segmentation, human parsing, and object boundary detection, respectively. The code is available at https://github.com/EnVision-Research/MTMamba.

Updated: 2024-07-14 07:50:04

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2407.02228v2

Hierarchical Multi-modal Transformer for Cross-modal Long Document Classification

Long Document Classification (LDC) has gained significant attention recently. However, multi-modal data in long documents, such as text and images, is not being effectively utilized. Prior studies in this area have attempted to integrate text and images in document-related tasks, but they have focused only on short text sequences and images of pages. Classifying long documents with hierarchically structured text and embedded images is a new problem that faces multi-modal representation difficulties. In this paper, we propose a novel approach called Hierarchical Multi-modal Transformer (HMT) for cross-modal long document classification. The HMT conducts multi-modal feature interaction and fusion between images and texts in a hierarchical manner. Our approach uses a multi-modal transformer and a dynamic multi-scale multi-modal transformer to model the complex relationships between image features and the section and sentence features. Furthermore, we introduce a new interaction strategy called the dynamic mask transfer module to integrate these two transformers by propagating features between them. To validate our approach, we conduct cross-modal LDC experiments on two newly created and two publicly available multi-modal long document datasets, and the results show that the proposed HMT outperforms state-of-the-art single-modality and multi-modality methods.

Updated: 2024-07-14 07:12:25

Categories: cs.CV,cs.AI

Download: http://arxiv.org/abs/2407.10105v1

A Self-Supervised Learning Pipeline for Demographically Fair Facial Attribute Classification

Published research highlights the presence of demographic bias in automated facial attribute classification. The proposed bias mitigation techniques are mostly based on supervised learning, which requires a large amount of labeled training data for generalizability and scalability. However, labeled data is limited, requires laborious annotation, poses privacy risks, and can perpetuate human bias. In contrast, self-supervised learning (SSL) capitalizes on freely available unlabeled data, rendering trained models more scalable and generalizable. However, these label-free SSL models may also introduce biases by sampling false negative pairs, especially in low-data regimes (< 200K images) under low compute settings. Further, SSL-based models may suffer from performance degradation due to a lack of quality assurance of the unlabeled data sourced from the web. This paper proposes a fully self-supervised pipeline for demographically fair facial attribute classifiers. Leveraging completely unlabeled data pseudolabeled via pre-trained encoders, diverse data curation techniques, and meta-learning-based weighted contrastive learning, our method significantly outperforms existing SSL approaches proposed for downstream image classification tasks. Extensive evaluations on the FairFace and CelebA datasets demonstrate the efficacy of our pipeline in obtaining fair performance over existing baselines, thus setting a new benchmark for SSL in the fairness of facial attribute classification.

Updated: 2024-07-14 07:11:57

Categories: cs.CV,cs.AI,cs.LG

Download: http://arxiv.org/abs/2407.10104v1

ToxVidLM: A Multimodal Framework for Toxicity Detection in Code-Mixed Videos

In an era of rapidly evolving internet technology, the surge in multimodal content, including videos, has expanded the horizons of online communication. However, the detection of toxic content in this diverse landscape, particularly in low-resource code-mixed languages, remains a critical challenge. While substantial research has addressed toxic content detection in textual data, the realm of video content, especially in non-English languages, has been relatively underexplored. This paper addresses this research gap by introducing a benchmark dataset, the first of its kind, consisting of 931 videos with 4021 code-mixed Hindi-English utterances collected from YouTube. Each utterance within this dataset has been meticulously annotated for toxicity, severity, and sentiment labels. We develop ToxVidLM, an advanced multimodal multitask framework that leverages Language Models (LMs) for toxicity detection in video content, designed for this primary objective along with the additional tasks of sentiment and severity analysis. ToxVidLM incorporates three key modules, the Encoder module, the Cross-Modal Synchronization module, and the Multitask module, forming a generic multimodal LM customized for intricate video classification tasks. Our experiments reveal that incorporating multiple modalities from the videos substantially enhances the performance of toxic content detection, achieving an Accuracy and Weighted F1 score of 94.29% and 94.35%, respectively.

Updated: 2024-07-14 07:09:42

Categories: cs.AI,cs.CL,cs.CV

Download: http://arxiv.org/abs/2405.20628v2

Retrieval-Augmented Generation and Knowledge-Grounded Reasoning for Faithful Patient Discharge Instructions

Language models (LMs), such as ChatGPT, have the potential to assist clinicians in generating various clinical notes. However, LMs are prone to produce "hallucinations", i.e., generated content that is not aligned with facts and knowledge. In this paper, we propose the Re$^3$Writer method with retrieval-augmented generation and knowledge-grounded reasoning to enable LMs to generate faithful clinical texts. We demonstrate the effectiveness of our method in generating patient discharge instructions. This task requires the LMs to understand the patients' long clinical documents, i.e., the health records during hospitalization, in order to generate the critical instructional information provided both to carers and to the patient at the time of discharge. The proposed Re$^3$Writer imitates the working patterns of physicians: it first retrieves related working experience from historical instructions written by physicians, then reasons over related medical knowledge. Finally, it refines the retrieved working experience and reasoned medical knowledge to extract useful information, which is used to generate the discharge instructions for previously unseen patients. Our experiments show that, using our method, the performance of five different LMs can be substantially boosted across all metrics. We also report human evaluations that measure effectiveness in terms of fluency, faithfulness, and comprehensiveness. The code is available at https://github.com/AI-in-Hospitals/Patient-Instructions

Updated: 2024-07-14 07:02:20

Categories: cs.CL,cs.LG

Download: http://arxiv.org/abs/2210.12777v3

A Watermark-Conditioned Diffusion Model for IP Protection

The ethical need to protect AI-generated content has been a significant concern in recent years. While existing watermarking strategies have demonstrated success in detecting synthetic content (detection), there has been limited exploration in identifying the users responsible for generating these outputs from a single model (owner identification). In this paper, we focus on both practical scenarios and propose a unified watermarking framework for content copyright protection within the context of diffusion models. Specifically, we consider two parties: the model provider, who grants public access to a diffusion model via an API, and the users, who can only query the model API and generate images in a black-box manner. Our task is to embed hidden information into the generated content, which facilitates further detection and owner identification. To tackle this challenge, we propose a Watermark-conditioned Diffusion model called WaDiff, which treats the watermark as a conditional input and incorporates fingerprinting into the generation process. All the generative outputs from our WaDiff carry user-specific information, which can be recovered by an image extractor and further facilitate forensic identification. Extensive experiments are conducted on two popular diffusion models, and we demonstrate that our method is effective and robust in both the detection and owner identification tasks. Meanwhile, our watermarking framework exerts only a negligible impact on the original generation and is more stealthy and efficient than existing watermarking strategies.
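
The owner identification step can be illustrated with a small sketch. The abstract says only that user-specific information is recovered by an image extractor; the nearest-fingerprint matching rule, the bit-string representation, and the `max_distance` threshold below are assumptions for illustration, not the paper's procedure.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def identify_owner(extracted_bits, user_fingerprints, max_distance):
    """Return the user whose fingerprint is closest to the extracted bits.

    Returns None when even the closest fingerprint is further away than
    `max_distance`, i.e. the image is treated as not watermarked.
    """
    best_user, best_distance = None, None
    for user, fingerprint in user_fingerprints.items():
        distance = hamming_distance(extracted_bits, fingerprint)
        if best_distance is None or distance < best_distance:
            best_user, best_distance = user, distance
    if best_distance is None or best_distance > max_distance:
        return None
    return best_user


fingerprints = {"alice": "10110010", "bob": "01001101"}
owner = identify_owner("10110110", fingerprints, max_distance=2)
print(owner)
```

In this framing, detection and owner identification share one extractor: a match within the threshold both flags the image as generated by the model and names the user, while no match means the image is treated as unwatermarked.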

Updated: 2024-07-14 06:53:20

Categories: cs.CR

Download: http://arxiv.org/abs/2403.10893v2

DiffBP: Generative Diffusion of 3D Molecules for Target Protein Binding

Generating molecules that bind to specific proteins is an important but challenging task in drug discovery. Previous works usually generate atoms in an auto-regressive way, where element types and 3D coordinates of atoms are generated one by one. However, in real-world molecular systems, the interactions among atoms in an entire molecule are global, so the energy function is pair-coupled among atoms. Under this energy-based view, probability modeling should be based on joint distributions rather than sequential conditional ones. The unnatural sequential auto-regressive modeling of molecule generation is therefore likely to violate physical rules, resulting in poor properties of the generated molecules. In this work, we establish a generative diffusion model for molecular 3D structures that uses target proteins as contextual constraints and operates at the full-atom level in a non-autoregressive way. Given a designated 3D protein binding site, our model learns the generative process that denoises both element types and 3D coordinates of an entire molecule with an equivariant network. Experimentally, the proposed method shows competitive performance compared with prevailing works in terms of high affinity with proteins and appropriate molecule sizes, as well as other drug properties such as drug-likeness of the generated molecules.

Updated: 2024-07-14 06:41:36

Categories: q-bio.BM,cs.LG

Download: http://arxiv.org/abs/2211.11214v4

Generative Modeling by Minimizing the Wasserstein-2 Loss

This paper approaches the unsupervised learning problem by minimizing the second-order Wasserstein loss (the $W_2$ loss) through a distribution-dependent ordinary differential equation (ODE), whose dynamics involves the Kantorovich potential associated with the true data distribution and a current estimate of it. A main result shows that the time-marginal laws of the ODE form a gradient flow for the $W_2$ loss, which converges exponentially to the true data distribution. An Euler scheme for the ODE is proposed and it is shown to recover the gradient flow for the $W_2$ loss in the limit. An algorithm is designed by following the scheme and applying persistent training, which naturally fits our gradient-flow approach. In both low- and high-dimensional experiments, our algorithm outperforms Wasserstein generative adversarial networks by increasing the level of persistent training appropriately.
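For intuition about the quantity being minimized, the $W_2$ distance has a simple closed form in one dimension: for two empirical distributions with the same number of equally weighted samples, the optimal coupling pairs the sorted samples. The sketch below covers only this special case and is not the paper's ODE-based algorithm; the function name `w2_distance_1d` is an illustrative choice, not something taken from the paper.

```python
import math

def w2_distance_1d(xs, ys):
    """W2 distance between two 1D empirical distributions with equal sample counts."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty samples of equal size")
    # In 1D the optimal transport plan pairs the i-th smallest x with the i-th smallest y.
    pairs = zip(sorted(xs), sorted(ys))
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(xs))

# A pure shift by 1 moves every sample by 1, so the distance is 1.0.
print(w2_distance_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))
```

In higher dimensions no such sorting shortcut exists, which is one reason gradient-flow or adversarial formulations are used instead.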

Updated: 2024-07-14 05:54:39

Categories: stat.ML,cs.LG,34A06, 49Q22, 68T01

Download: http://arxiv.org/abs/2406.13619v2

ReactAIvate: A Deep Learning Approach to Predicting Reaction Mechanisms and Unmasking Reactivity Hotspots

A chemical reaction mechanism (CRM) is a sequence of molecular-level events involving bond-breaking/forming processes, generating transient intermediates along the reaction pathway as reactants transform into products. Understanding such mechanisms is crucial for designing and discovering new reactions. One of the currently available methods to probe CRMs is quantum mechanical (QM) computation. The resource-intensive nature of QM methods and the scarcity of mechanism-based datasets motivated us to develop reliable ML models for predicting mechanisms. In this study, we created a comprehensive dataset with seven distinct classes, each representing uniquely characterized elementary steps. Subsequently, we developed an interpretable attention-based GNN that achieved near-unity accuracy for reaction step classification and 96% accuracy for the prediction of reactive atoms in each such step, capturing interactions between the broader reaction context and local active regions. The near-perfect classification enables accurate prediction of both individual events and the entire CRM, mitigating potential drawbacks of Seq2Seq approaches, where a wrongly predicted character leads to incoherent CRM identification. In addition to interpretability, our model adeptly identifies key atom(s) even from out-of-distribution classes. This generalizability allows for the inclusion of new reaction types in a modular fashion, and thus will be of value to experts for understanding the reactivity of new molecules.

Updated: 2024-07-14 05:53:18

Categories: physics.chem-ph,cs.AI,cs.LG

Download: http://arxiv.org/abs/2407.10090v1

Rapid Biomedical Research Classification: The Pandemic PACT Advanced Categorisation Engine

This paper introduces the Pandemic PACT Advanced Categorisation Engine (PPACE) along with its associated dataset. PPACE is a fine-tuned model developed to automatically classify research abstracts from funded biomedical projects according to WHO-aligned research priorities. This task is crucial for monitoring research trends and identifying gaps in global health preparedness and response. Our approach builds on human-annotated projects, which are allocated one or more categories from a predefined list. A large language model is then used to generate 'rationales' explaining the reasoning behind these annotations. This augmented data, comprising expert annotations and rationales, is subsequently used to fine-tune a smaller, more efficient model. Developed as part of the Pandemic PACT project, which aims to track and analyse research funding and clinical evidence for a wide range of diseases with outbreak potential, PPACE supports informed decision-making by research funders, policymakers, and independent researchers. We introduce and release both the trained model and the instruction-based dataset used for its training. Our evaluation shows that PPACE significantly outperforms its baselines. The release of PPACE and its associated dataset offers valuable resources for researchers in multilabel biomedical document classification and supports advancements in aligning biomedical research with key global health priorities.

Updated: 2024-07-14 05:22:53

Categories: cs.CL,cs.AI,68T50,I.2.7

Download: http://arxiv.org/abs/2407.10086v1

Semantic Understanding and Data Imputation using Large Language Model to Accelerate Recommendation System

This paper aims to address the challenge of sparse and missing data in recommendation systems, a significant hurdle in the age of big data. Traditional imputation methods struggle to capture complex relationships within the data. We propose a novel approach that fine-tunes a Large Language Model (LLM) and uses it to impute missing data for recommendation systems. An LLM, which is trained on vast amounts of text, is able to understand complex relationships among data and intelligently fill in missing information. This enriched data is then used by the recommendation system to generate more accurate and personalized suggestions, ultimately enhancing the user experience. We evaluate our LLM-based imputation method against traditional data imputation methods across various tasks within the recommendation system domain, including single classification, multi-classification, and regression. By demonstrating the superiority of LLM imputation over traditional methods, we establish its potential for improving recommendation system performance.

Updated: 2024-07-14 04:53:36

Categories: cs.IR,cs.AI

Download: http://arxiv.org/abs/2407.10078v1

AdvDiff: Generating Unrestricted Adversarial Examples using Diffusion Models

Unrestricted adversarial attacks present a serious threat to deep learning models and adversarial defense techniques. They pose severe security problems for deep learning applications because they can effectively bypass defense mechanisms. However, previous attack methods often directly inject Projected Gradient Descent (PGD) gradients into the sampling of generative models, which are not theoretically provable and thus generate unrealistic examples by incorporating adversarial objectives, especially for GAN-based methods on large-scale datasets like ImageNet. In this paper, we propose a new method, called AdvDiff, to generate unrestricted adversarial examples with diffusion models. We design two novel adversarial guidance techniques to conduct adversarial sampling in the reverse generation process of diffusion models. These two techniques are effective and stable in generating high-quality, realistic adversarial examples by integrating gradients of the target classifier interpretably. Experimental results on MNIST and ImageNet datasets demonstrate that AdvDiff is effective in generating unrestricted adversarial examples, which outperforms state-of-the-art unrestricted adversarial attack methods in terms of attack performance and generation quality.

Updated: 2024-07-14 04:48:53

Categories: cs.LG,cs.CV

Download: http://arxiv.org/abs/2307.12499v4

You Can Wash Hands Better: Accurate Daily Handwashing Assessment with Smartwatches

Hand hygiene is one of the most efficient daily actions to prevent infectious diseases, such as influenza, malaria, and skin infections. We are advised to wash our hands according to professional guidelines to prevent virus infection. However, several surveys show that very few people follow this advice. Thus we propose UWash, a wearable solution with smartwatches, to assess handwashing procedures for the purpose of raising users' awareness and cultivating habits of high-quality handwashing. We address the task of handwashing assessment from readings of motion sensors in a manner similar to the action segmentation problem in computer vision, and propose a simple and lightweight two-stream UNet-like network to achieve it effectively. Experiments over 51 subjects show that UWash achieves an accuracy of 92.27% on handwashing gesture recognition, <0.5 seconds error on onset/offset detection, and <5 points error on gesture scoring in the user-dependent setting, and remains promising in the user-independent evaluation and the user-independent-location-independent evaluation. UWash even performs well on 10 random passersby in a hospital 9 months later. UWash is the first work that scores the handwashing quality by gesture sequences and is instructive to guide users in promoting hand hygiene in daily life. Code and data are available at https://github.com/aiotgroup/UWash

Updated: 2024-07-14 04:35:23

Categories: eess.SP,cs.LG

Download: http://arxiv.org/abs/2112.06657v2

Have ASkotch: Fast Methods for Large-scale, Memory-constrained Kernel Ridge Regression

Kernel ridge regression (KRR) is a fundamental computational tool, appearing in problems that range from computational chemistry to health analytics, with a particular interest due to its starring role in Gaussian process regression. However, it is challenging to scale KRR solvers to large datasets: with $n$ training points, a direct solver (i.e., Cholesky decomposition) uses $O(n^2)$ storage and $O(n^3)$ flops. Iterative methods for KRR, such as preconditioned conjugate gradient (PCG), avoid the cubic scaling of direct solvers and often use low-rank preconditioners; a rank $r$ preconditioner uses $O(rn)$ storage and each iteration requires $O(n^2)$ flops. To reduce the storage and iteration complexity of iterative solvers for KRR, we propose ASkotch ($\textbf{A}$ccelerated $\textbf{s}$calable $\textbf{k}$ernel $\textbf{o}$p$\textbf{t}$imization using block $\textbf{c}$oordinate descent with $\textbf{H}$essian preconditioning). For a given block size $|b| << n$, each iteration of ASkotch uses $O(r|b| + n)$ storage and $O(n|b|)$ flops, so ASkotch scales better than Cholesky decomposition and PCG. We prove that ASkotch obtains linear convergence to the optimum, with the convergence rate depending on the square roots of the $\textit{preconditioned}$ block condition numbers. Furthermore, we solve KRR problems that were considered to be impossibly large while using limited computational resources: we show that ASkotch outperforms PCG methods with respect to generalization error on large-scale KRR (up to $n = 10^8$) and KRR classification tasks (up to $n = 10^7$) while running each of our experiments on $\textit{a single 12 GB Titan V GPU}$. Our work opens up the possibility of as-yet-unimagined applications of KRR across a wide range of disciplines.
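To make the $O(n^2)$ storage and $O(n^3)$ flop cost of the direct solver concrete, here is a minimal pure-Python sketch of the Cholesky baseline mentioned above. It is not ASkotch. The RBF kernel, the regularization convention $(K + \lambda I)\alpha = y$, and all function names are assumptions made for illustration.

```python
import math

def rbf_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def cholesky(a):
    """Lower-triangular L with a = L L^T; O(n^3) flops, O(n^2) storage."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def krr_fit(xs, ys, lam=1e-3, sigma=1.0):
    """Solve (K + lam * I) alpha = y by Cholesky factorization."""
    n = len(xs)
    # The full n x n kernel matrix is what drives the quadratic memory cost.
    k = [[rbf_kernel(xs[i], xs[j], sigma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    low = cholesky(k)
    # Forward substitution: L z = y.
    z = [0.0] * n
    for i in range(n):
        z[i] = (ys[i] - sum(low[i][j] * z[j] for j in range(i))) / low[i][i]
    # Back substitution: L^T alpha = z.
    alpha = [0.0] * n
    for i in reversed(range(n)):
        alpha[i] = (z[i] - sum(low[j][i] * alpha[j] for j in range(i + 1, n))) / low[i][i]
    return alpha

def krr_predict(xs, alpha, x_new, sigma=1.0):
    """Prediction f(x) = sum_i alpha_i * k(x_i, x)."""
    return sum(a * rbf_kernel(x, x_new, sigma) for a, x in zip(alpha, xs))
```

Because the whole kernel matrix is formed and factorized, this approach stops being practical long before $n$ reaches the sizes quoted in the abstract, which is the gap that iterative and block-coordinate methods aim to close.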

Updated: 2024-07-14 04:11:10

Categories: cs.LG,math.OC,stat.ML

Download: http://arxiv.org/abs/2407.10070v1

Contrastive Learning of Shared Spatiotemporal EEG Representations Across Individuals for Naturalistic Neuroscience

Neural representations induced by naturalistic stimuli offer insights into how humans respond to stimuli in daily life. Understanding neural mechanisms underlying naturalistic stimuli processing hinges on the precise identification and extraction of the shared neural patterns that are consistently present across individuals. Targeting the Electroencephalogram (EEG) technique, known for its rich spatial and temporal information, this study presents a framework for Contrastive Learning of Shared SpatioTemporal EEG Representations across individuals (CL-SSTER). CL-SSTER utilizes contrastive learning to maximize the similarity of EEG representations across individuals for identical stimuli, contrasting with those for varied stimuli. The network employed spatial and temporal convolutions to simultaneously learn the spatial and temporal patterns inherent in EEG. The versatility of CL-SSTER was demonstrated on three EEG datasets, including a synthetic dataset, a natural speech comprehension EEG dataset, and an emotional video watching EEG dataset. CL-SSTER attained the highest inter-subject correlation (ISC) values compared to the state-of-the-art ISC methods. The latent representations generated by CL-SSTER exhibited reliable spatiotemporal EEG patterns, which can be explained by properties of the naturalistic stimuli. CL-SSTER serves as an interpretable and scalable framework for the identification of inter-subject shared neural representations in naturalistic neuroscience.

Updated: 2024-07-14 03:48:28

Categories: q-bio.NC,cs.LG,eess.SP

Download: http://arxiv.org/abs/2402.14213v2

Learning to Refuse: Towards Mitigating Privacy Risks in LLMs

Large language models (LLMs) exhibit remarkable capabilities in understanding and generating natural language. However, these models can inadvertently memorize private information, posing significant privacy risks. This study addresses the challenge of enabling LLMs to protect specific individuals' private data without the need for complete retraining. We propose RETURN, a Real-world pErsonal daTa UnleaRNing dataset, comprising 2,492 individuals from Wikipedia with associated QA pairs, to evaluate machine unlearning (MU) methods for protecting personal data in a realistic scenario. Additionally, we introduce the Name-Aware Unlearning Framework (NAUF) for Privacy Protection, which enables the model to learn which individuals' information should be protected without affecting its ability to answer questions related to other unrelated individuals. Our extensive experiments demonstrate that NAUF achieves a state-of-the-art average unlearning score, surpassing the best baseline method by 5.65 points, effectively protecting target individuals' personal data while maintaining the model's general capabilities.

Updated: 2024-07-14 03:05:53

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2407.10058v1

MKDTI: Predicting drug-target interactions via multiple kernel fusion on graph attention network

Drug-target relationships may now be predicted computationally using bioinformatics data, which is a valuable tool for understanding pharmacological effects, enhancing drug development efficiency, and advancing related research. A number of structure-based, ligand-based and network-based approaches have now emerged. Furthermore, the integration of graph attention networks with intricate drug target studies is an application area of growing interest. In our work, we formulate a model called MKDTI by extracting kernel information from various layer embeddings of a graph attention network. This combination improves the prediction ability with respect to novel drug-target relationships. We first build a drug-target heterogeneous network using heterogeneous data of drugs and targets, and then use a self-enhanced multi-head graph attention network to extract potential features in each layer. Next, we utilize embeddings of each layer to computationally extract kernel matrices and fuse multiple kernel matrices. Finally, we use a Dual Laplacian Regularized Least Squares framework to forecast novel drug-target entity connections. This prediction can be facilitated by integrating the kernel matrix associated with the drug-target. We measured our model's efficacy using AUPR and AUC. Compared to the benchmark algorithms, our model outperforms them in the prediction outcomes. In addition, we conducted an experiment on kernel selection. The results show that the multi-kernel fusion approach combined with the kernel matrix generated by the graph attention network provides complementary insights into the model. The fusion of this information helps to enhance the accuracy of the predictions.

Updated: 2024-07-14 02:53:25

Categories: cs.LG,q-bio.QM

Download: http://arxiv.org/abs/2407.10055v1

AutoGRAMS: Autonomous Graphical Agent Modeling Software

We introduce the AutoGRAMS framework for programming multi-step interactions with language models. AutoGRAMS represents AI agents as a graph, where each node can execute either a language modeling instruction or traditional code. Likewise, transitions in the graph can be governed by either language modeling decisions or traditional branch logic. AutoGRAMS supports using variables as memory and allows nodes to call other AutoGRAMS graphs as functions. We show how AutoGRAMS can be used to design highly sophisticated agents, including self-referential agents that can modify their own graph. AutoGRAMS's graph-centric approach aids interpretability, controllability, and safety during the design, development, and deployment of AI agents. We provide our framework as open source at https://github.com/autograms/autograms .
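The graph-based control flow described above can be illustrated with a toy interpreter. This is a hypothetical sketch and does not use the actual AutoGRAMS API: `run_graph`, `ask`, and `check` are invented names, and the language-model call is replaced by a stand-in string operation.

```python
def run_graph(nodes, start, memory, max_steps=100):
    """Execute nodes until one returns None; each node maps memory -> next node name."""
    current, trace = start, []
    for _ in range(max_steps):
        if current is None:
            break
        trace.append(current)
        current = nodes[current](memory)
    return trace

def ask(memory):
    # Stand-in for a language-model instruction; a real agent would call a model here.
    memory["reply"] = "echo: " + memory["user_input"]
    return "check"

def check(memory):
    # Traditional branch logic deciding the next transition.
    memory["attempts"] = memory.get("attempts", 0) + 1
    return None if memory["attempts"] >= 2 else "ask"

memory = {"user_input": "hello"}
print(run_graph({"ask": ask, "check": check}, "ask", memory))
```

Keeping the transitions explicit in a dictionary of nodes is what makes the agent's possible paths easy to inspect, which is the interpretability argument the abstract makes.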

Updated: 2024-07-14 02:25:45

Categories: cs.CL,cs.AI

Download: http://arxiv.org/abs/2407.10049v1

On the Importance of Uncertainty in Decision-Making with Large Language Models

We investigate the role of uncertainty in decision-making problems with natural language as input. For such tasks, using Large Language Models as agents has become the norm. However, none of the recent approaches employ any additional phase for estimating the uncertainty the agent has about the world during the decision-making task. We focus on a fundamental decision-making framework with natural language as input, which is the one of contextual bandits, where the context information consists of text. As a representative of the approaches with no uncertainty estimation, we consider an LLM bandit with a greedy policy, which picks the action corresponding to the largest predicted reward. We compare this baseline to LLM bandits that make active use of uncertainty estimation by integrating the uncertainty in a Thompson Sampling policy. We employ different techniques for uncertainty estimation, such as Laplace Approximation, Dropout, and Epinets. We empirically show on real-world data that the greedy policy performs worse than the Thompson Sampling policies. These findings suggest that, while overlooked in the LLM literature, uncertainty plays a fundamental role in bandit tasks with LLMs.
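The contrast between a greedy policy and Thompson Sampling can be sketched on a much simpler problem than the one studied in the paper. The example below assumes a non-contextual Bernoulli bandit with Beta posteriors, with no LLM and none of the Laplace, Dropout, or Epinet estimators; all function names are illustrative.

```python
import random

def greedy_action(successes, failures):
    """Pick the arm with the largest point estimate of the reward (no uncertainty)."""
    means = [(s + 1) / (s + f + 2) for s, f in zip(successes, failures)]
    return max(range(len(means)), key=means.__getitem__)

def thompson_action(successes, failures, rng):
    """Sample a plausible reward per arm from its Beta posterior, then pick the best sample."""
    draws = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=draws.__getitem__)

def run_bandit(true_probs, steps, policy, seed=0):
    """Simulate a Bernoulli bandit and return the number of pulls per arm."""
    rng = random.Random(seed)
    k = len(true_probs)
    successes, failures, pulls = [0] * k, [0] * k, [0] * k
    for _ in range(steps):
        if policy == "thompson":
            arm = thompson_action(successes, failures, rng)
        else:
            arm = greedy_action(successes, failures)
        reward = rng.random() < true_probs[arm]
        successes[arm] += reward
        failures[arm] += not reward
        pulls[arm] += 1
    return pulls
```

The greedy rule commits to whichever arm currently looks best, so an unlucky early estimate can lock it onto a poor arm. Sampling from the posterior keeps uncertain arms in play until the evidence rules them out.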

Updated: 2024-07-14 02:20:59

Categories: cs.LG

Download: http://arxiv.org/abs/2404.02649v2

Harnessing Feature Clustering For Enhanced Anomaly Detection With Variational Autoencoder And Dynamic Threshold

We introduce an anomaly detection method for multivariate time series data with the aim of identifying critical periods and features influencing extreme climate events like snowmelt in the Arctic. This method leverages the Variational Autoencoder (VAE) integrated with dynamic thresholding and correlation-based feature clustering. This framework enhances the VAE's ability to identify localized dependencies and learn the temporal relationships in climate data, thereby improving the detection of anomalies as demonstrated by its higher F1-score on benchmark datasets. The study's main contributions include the development of a robust anomaly detection method, improving feature representation within VAEs through clustering, and creating a dynamic threshold algorithm for localized anomaly detection. This method offers explainability of climate anomalies across different regions.
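The abstract does not spell out the dynamic threshold algorithm, so the following is only a generic sketch of the idea: a threshold that adapts to recent behavior instead of a fixed global cutoff. It assumes a series of per-timestep reconstruction errors (such as a VAE might produce) and flags a point when its error exceeds the mean plus `k` standard deviations of a trailing window. The window size, `k`, and the function name are assumptions.

```python
import statistics

def dynamic_threshold_anomalies(errors, window=5, k=3.0):
    """Flag indices whose error exceeds mean + k * std of the preceding window."""
    anomalies = []
    for i in range(window, len(errors)):
        recent = errors[i - window:i]
        threshold = statistics.fmean(recent) + k * statistics.pstdev(recent)
        if errors[i] > threshold:
            anomalies.append(i)
    return anomalies

errors = [1.0, 1.1, 0.9, 1.0, 1.05, 5.0, 1.0, 1.1]
print(dynamic_threshold_anomalies(errors))  # the spike at index 5 is the only flagged point
```

Because the threshold is recomputed per window, a region with naturally noisy errors gets a looser cutoff than a quiet one.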

Updated: 2024-07-14 01:52:10

Categories: cs.LG,cs.IR

Download: http://arxiv.org/abs/2407.10042v1

Lean-STaR: Learning to Interleave Thinking and Proving

Traditional language model-based theorem proving assumes that by training on a sufficient amount of formal proof data, a model will learn to prove theorems. Our key observation is that a wealth of informal information that is not present in formal proofs can be useful for learning to prove theorems. For instance, humans think through steps of a proof, but this thought process is not visible in the resulting code. We present Lean-STaR, a framework for training language models to produce informal thoughts prior to each step of a proof, thereby boosting the model's theorem-proving capabilities. Lean-STaR uses retrospective ground-truth tactics to generate synthetic thoughts for training the language model. At inference time, the trained model directly generates the thoughts prior to the prediction of the tactics in each proof step. Building on the self-taught reasoner framework, we then apply expert iteration to further fine-tune the model on the correct proofs it samples and verifies using the Lean solver. Lean-STaR achieves state-of-the-art results on the miniF2F-test benchmark within the Lean theorem proving environment, significantly outperforming base models ($43.4\% \rightarrow 46.3\%$, Pass@64). We also analyze the impact of the augmented thoughts on various aspects of the theorem proving process, providing insights into their effectiveness.

Updated: 2024-07-14 01:43:07

Categories: cs.AI

Download: http://arxiv.org/abs/2407.10040v1

OpenTracer: A Dynamic Transaction Trace Analyzer for Smart Contract Invariant Generation and Beyond

Smart contracts, self-executing programs on the blockchain, facilitate reliable value exchanges without centralized oversight. Despite the recent focus on dynamic analysis of their transaction histories in both industry and academia, no open-source tool currently offers comprehensive tracking of complete transaction information to extract user-desired data such as invariant-related data. This paper introduces OpenTracer, designed to address this gap. OpenTracer guarantees comprehensive tracking of every execution step, providing complete transaction information. OpenTracer has been employed to analyze 350,800 Ethereum transactions, successfully inferring 23 different types of invariant from predefined templates. The tool is fully open-sourced, serving as a valuable resource for developers and researchers aiming to study transaction behaviors or extract and validate new invariants from transaction traces. The source code of OpenTracer is available at https://github.com/jeffchen006/OpenTracer.

Updated: 2024-07-14 01:35:22

Categories: cs.SE,cs.CR,cs.PL

Download: http://arxiv.org/abs/2407.10039v1

Demystifying Invariant Effectiveness for Securing Smart Contracts

Smart contract transactions associated with security attacks often exhibit distinct behavioral patterns compared with historical benign transactions before the attacking events. While many runtime monitoring and guarding mechanisms have been proposed to validate invariants and stop anomalous transactions on the fly, the empirical effectiveness of the invariants used remains largely unexplored. In this paper, we studied 23 prevalent invariants of 8 categories, which are either deployed in high-profile protocols or endorsed by leading auditing firms and security experts. Using these well-established invariants as templates, we developed a tool Trace2Inv which dynamically generates new invariants customized for a given contract based on its historical transaction data. We evaluated Trace2Inv on 42 smart contracts that fell victim to 27 distinct exploits on the Ethereum blockchain. Our findings reveal that the most effective invariant guard alone can successfully block 18 of the 27 identified exploits with minimal gas overhead. Our analysis also shows that most of the invariants remain effective even when the experienced attackers attempt to bypass them. Additionally, we studied the possibility of combining multiple invariant guards, resulting in blocking up to 23 of the 27 benchmark exploits and achieving false positive rates as low as 0.32%. Trace2Inv outperforms current state-of-the-art works on smart contract invariant mining and transaction attack detection in terms of both practicality and accuracy. Though Trace2Inv is not primarily designed for transaction attack detection, it surprisingly found two previously unreported exploit transactions, earlier than any reported exploit transactions against the same victim contracts.
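The idea of instantiating an invariant template from historical transactions and then using it as a runtime guard can be sketched as follows. This is a hypothetical illustration, not Trace2Inv's implementation: the upper-bound template, the `amount_out` field, and both function names are assumptions.

```python
def learn_upper_bound(history, field):
    """Infer an upper-bound invariant for `field` from benign historical transactions."""
    return max(tx[field] for tx in history)

def guard(tx, field, bound):
    """Return True if the transaction satisfies the invariant, False if it should be blocked."""
    return tx[field] <= bound

history = [{"amount_out": 10}, {"amount_out": 25}, {"amount_out": 18}]
bound = learn_upper_bound(history, "amount_out")
print(guard({"amount_out": 20}, "amount_out", bound))      # within the learned bound
print(guard({"amount_out": 10_000}, "amount_out", bound))  # exceeds the bound, would be blocked
```

A bound this tight would also reject any benign transaction larger than anything seen before, which shows why false positive rates have to be measured alongside the number of exploits blocked.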

Updated: 2024-07-14 01:13:43

Categories: cs.CR,cs.PL,cs.SE

Download: http://arxiv.org/abs/2404.14580v2

Towards a Unified Framework for Evaluating Explanations

The challenge of creating interpretable models has been taken up by two main research communities: ML researchers primarily focused on lower-level explainability methods that suit the needs of engineers, and HCI researchers who have more heavily emphasized user-centered approaches often based on participatory design methods. This paper reviews how these communities have evaluated interpretability, identifying overlaps and semantic misalignments. We propose moving towards a unified framework of evaluation criteria and lay the groundwork for such a framework by articulating the relationships between existing criteria. We argue that explanations serve as mediators between models and stakeholders, whether for intrinsically interpretable models or opaque black-box models analyzed via post-hoc techniques. We further argue that useful explanations require both faithfulness and intelligibility. Explanation plausibility is a prerequisite for intelligibility, while stability is a prerequisite for explanation faithfulness. We illustrate these criteria, as well as specific evaluation methods, using examples from an ongoing study of an interpretable neural network for predicting a particular learner behavior.

Updated: 2024-07-14 01:11:22

Categories: cs.LG, cs.AI

Download: http://arxiv.org/abs/2405.14016v2

LeanQuant: Accurate Large Language Model Quantization with Loss-Error-Aware Grid

Large language models (LLMs) have numerous applications across various domains, but their high computational and memory demands pose significant deployment challenges. Weight quantization is an effective technique for reducing the decoding latency and memory requirements of LLMs. Existing approaches primarily aim to maintain the quality of quantized models by preserving outliers in input features, but they still suffer significant quality loss at lower bit widths. Our approach builds on Optimal Brain Quantization (OBQ), an iterative weight-update-based quantization framework. We identify a key limitation of OBQ, specifically that its uniform quantization grid is suboptimal for maintaining model quality, as it introduces large errors to the task loss. To address this, we propose LeanQuant, which learns a loss-error-aware quantization grid by leveraging the inverse diagonal Hessian. Extensive empirical evaluations demonstrate that LeanQuant is both efficient and accurate; it can quantize a 70-billion-parameter model in 6 hours using a single 32GB GPU and performs favorably compared to competitive baselines in the 4-bit, 3-bit, and 2-bit regions.

Updated: 2024-07-14 00:23:51

Categories: cs.LG

Download: http://arxiv.org/abs/2407.10032v1
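
The LeanQuant abstract above contrasts a uniform quantization grid with a learned, loss-error-aware one. The sketch below illustrates only the general idea of a non-uniform grid that concentrates levels where sensitive weights lie, using a sensitivity-weighted k-means in one dimension. The function names and the per-weight `sens` scores are assumptions made for illustration. The abstract names the inverse diagonal Hessian as the source of that information but does not specify how it is used, so this should not be read as the paper's algorithm.

```python
def uniform_grid(weights, bits):
    """Evenly spaced grid of 2**bits levels spanning [min, max] of the weights."""
    lo, hi = min(weights), max(weights)
    levels = 2 ** bits
    step = (hi - lo) / (levels - 1)
    return [lo + i * step for i in range(levels)]


def nearest(x, grid):
    """Index of the grid level closest to x."""
    return min(range(len(grid)), key=lambda k: abs(x - grid[k]))


def quantize(weights, grid):
    """Round each weight to its nearest grid level."""
    return [grid[nearest(x, grid)] for x in weights]


def sensitivity_weighted_grid(weights, sens, bits, iters=25):
    """Weighted 1-D k-means: each level moves to the sensitivity-weighted
    mean of the weights assigned to it. `sens` must be positive
    (hypothetical per-weight sensitivity scores)."""
    grid = uniform_grid(weights, bits)
    for _ in range(iters):
        num = [0.0] * len(grid)
        den = [0.0] * len(grid)
        for x, s in zip(weights, sens):
            k = nearest(x, grid)
            num[k] += s * x
            den[k] += s
        # Levels with no assigned weights keep their previous position.
        grid = [n / d if d > 0 else g for n, d, g in zip(num, den, grid)]
    return sorted(grid)


def weighted_error(weights, quantized, sens):
    """Sensitivity-weighted squared quantization error."""
    return sum(s * (x - q) ** 2 for x, q, s in zip(weights, quantized, sens))
```

Because each iteration can only lower the weighted error from its uniform starting point, comparing `weighted_error` for `quantize(w, uniform_grid(w, 3))` and `quantize(w, sensitivity_weighted_grid(w, sens, 3))` shows how much a non-uniform grid helps on a given weight vector under this proxy objective.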