Arxiv Day: Article

Transforming Information Systems Management: A Reference Model for Digital Engineering Integration

Digital engineering practices offer significant yet underutilized potential for improving information assurance and system lifecycle management. This paper examines how capabilities like model-based engineering, digital threads, and integrated product lifecycles can address gaps in prevailing frameworks. A reference model demonstrates applying digital engineering techniques to a reference information system, exhibiting enhanced traceability, risk visibility, accuracy, and integration. The model links strategic needs to requirements and architecture while reusing authoritative elements across views. Analysis of the model shows digital engineering closes gaps in compliance, monitoring, change management, and risk assessment. Findings indicate purposeful digital engineering adoption could transform cybersecurity, operations, service delivery, and system governance through comprehensive digital system representations. This research provides a foundation for maturing application of digital engineering for information systems as organizations modernize infrastructure and pursue digital transformation.

Updated: 2024-05-29 23:49:47

标题: 转型信息系统管理：数字工程集成的参考模型

摘要: 数字工程实践提供了显著但被低估的潜力，可用于改善信息保障和系统生命周期管理。本文探讨了诸如基于模型的工程、数字线程和集成产品生命周期等能力如何可以解决现有框架中的缺陷。一个参考模型展示了将数字工程技术应用于参考信息系统，展示了增强的可追溯性、风险可见性、准确性和集成性。该模型将战略需求与需求和架构联系起来，同时在各个视图之间重复使用权威元素。对模型的分析显示，数字工程填补了合规性、监控、变更管理和风险评估方面的差距。研究结果表明，有目的地采用数字工程可以通过全面的数字系统表示来改变网络安全、运营、服务交付和系统治理。这项研究为数字工程在信息系统中的应用提供了基础，随着组织更新基础设施并推进数字化转型，数字工程的应用也将逐渐成熟起来。

更新时间: 2024-05-29 23:49:47

领域: cs.CR,cs.SE

下载: http://arxiv.org/abs/2405.19576v1

A Deep Convolutional Neural Network-based Model for Aspect and Polarity Classification in Hausa Movie Reviews

Aspect-based Sentiment Analysis (ABSA) is crucial for understanding sentiment nuances in text, especially across diverse languages and cultures. This paper introduces a novel Deep Convolutional Neural Network (CNN)-based model tailored for aspect and polarity classification in Hausa movie reviews, an underrepresented language in sentiment analysis research. A comprehensive Hausa ABSA dataset is created, filling a significant gap in resource availability. The dataset, preprocessed using sci-kit-learn for TF-IDF transformation, includes manually annotated aspect-level feature ontology words and sentiment polarity assignments. The proposed model combines CNNs with attention mechanisms for aspect-word prediction, leveraging contextual information and sentiment polarities. With 91% accuracy on aspect term extraction and 92% on sentiment polarity classification, the model outperforms traditional machine models, offering insights into specific aspects and sentiments. This study advances ABSA research, particularly in underrepresented languages, with implications for cross-cultural linguistic research.

Updated: 2024-05-29 23:45:42

标题: 一个基于深度卷积神经网络的模型，用于豪萨语电影评论中的方面和极性分类

摘要: 方面情感分析（ABSA）对于理解文本中的情感细微差别尤为重要，尤其是跨越不同语言和文化。本文介绍了一种新颖的基于深度卷积神经网络（CNN）的模型，专门用于豪萨语电影评论中的方面和极性分类，豪萨语在情感分析研究中是一种被忽视的语言。一个全面的豪萨语ABSA数据集被创建，填补了资源供给中的重要空白。该数据集使用sci-kit-learn进行TF-IDF转换进行预处理，包括手动标注的方面级特征本体词和情感极性分配。提出的模型将CNN与注意机制相结合，用于方面词预测，利用上下文信息和情感极性。在方面术语提取上准确率达到91％，在情感极性分类上为92％，该模型优于传统的机器模型，提供了对特定方面和情感的见解。这项研究推动了ABSA研究，特别是在被忽视的语言中，对跨文化语言研究具有重要意义。

更新时间: 2024-05-29 23:45:42

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.19575v1

Memorize What Matters: Emergent Scene Decomposition from Multitraverse

Humans naturally retain memories of permanent elements, while ephemeral moments often slip through the cracks of memory. This selective retention is crucial for robotic perception, localization, and mapping. To endow robots with this capability, we introduce 3D Gaussian Mapping (3DGM), a self-supervised, camera-only offline mapping framework grounded in 3D Gaussian Splatting. 3DGM converts multitraverse RGB videos from the same region into a Gaussian-based environmental map while concurrently performing 2D ephemeral object segmentation. Our key observation is that the environment remains consistent across traversals, while objects frequently change. This allows us to exploit self-supervision from repeated traversals to achieve environment-object decomposition. More specifically, 3DGM formulates multitraverse environmental mapping as a robust differentiable rendering problem, treating pixels of the environment and objects as inliers and outliers, respectively. Using robust feature distillation, feature residuals mining, and robust optimization, 3DGM jointly performs 2D segmentation and 3D mapping without human intervention. We build the Mapverse benchmark, sourced from the Ithaca365 and nuPlan datasets, to evaluate our method in unsupervised 2D segmentation, 3D reconstruction, and neural rendering. Extensive results verify the effectiveness and potential of our method for self-driving and robotics.

Updated: 2024-05-29 23:32:23

标题: 记住重要的事情：多重遍历中的紧急场景分解

摘要: 人类自然会记住永久元素的记忆，而短暂的时刻往往会从记忆中消失。这种选择性的保留对于机器人的感知、定位和绘图至关重要。为了赋予机器人这种能力，我们引入了3D高斯映射（3DGM），这是一个基于3D高斯喷涂的自我监督、仅相机的离线绘图框架。3DGM将来自同一区域的多遍RGB视频转换为基于高斯的环境地图，同时执行2D短暂对象分割。我们的关键观察是环境在遍历过程中保持一致，而对象经常发生变化。这使我们能够利用重复遍历的自我监督来实现环境-对象分解。具体来说，3DGM将多遍环境映射形式化为一个强健的可微分渲染问题，将环境和对象的像素分别视为内点和外点。通过强健特征提炼、特征残差挖掘和强健优化，3DGM联合执行2D分割和3D绘图，无需人工干预。我们构建了Mapverse基准，从Ithaca365和nuPlan数据集中获得，用于评估我们的方法在无监督2D分割、3D重建和神经渲染中的表现。广泛的结果验证了我们的方法在自动驾驶和机器人领域的有效性和潜力。

更新时间: 2024-05-29 23:32:23

领域: cs.CV,cs.AI,cs.RO

下载: http://arxiv.org/abs/2405.17187v2

Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding

Vision-Language Models (VLM) can support clinicians by analyzing medical images and engaging in natural language interactions to assist in diagnostic and treatment tasks. However, VLMs often exhibit "hallucinogenic" behavior, generating textual outputs not grounded in contextual multimodal information. This challenge is particularly pronounced in the medical domain, where we do not only require VLM outputs to be accurate in single interactions but also to be consistent with clinical reasoning and diagnostic pathways throughout multi-turn conversations. For this purpose, we propose a new alignment algorithm that uses symbolic representations of clinical reasoning to ground VLMs in medical knowledge. These representations are utilized to (i) generate GPT-4-guided visual instruction tuning data at scale, simulating clinician-VLM conversations with demonstrations of clinical reasoning, and (ii) create an automatic reward function that evaluates the clinical validity of VLM generations throughout clinician-VLM interactions. Our algorithm eliminates the need for human involvement in training data generation or reward model construction, reducing costs compared to standard reinforcement learning with human feedback (RLHF). We apply our alignment algorithm to develop Dr-LLaVA, a conversational VLM finetuned for analyzing bone marrow pathology slides, demonstrating strong performance in multi-turn medical conversations.

Updated: 2024-05-29 23:19:28

标题: Dr-LLaVA：符号临床基础下的视觉指导调整

摘要: 视觉语言模型（VLM）可以通过分析医学图像并参与自然语言交互来支持临床医生进行诊断和治疗任务。然而，VLM经常表现出“致幻”行为，生成的文本输出不基于上下文的多模态信息。在医学领域，这一挑战尤为突出，我们不仅要求VLM输出在单次交互中准确，还要求在整个多轮对话中与临床推理和诊断路径保持一致。为此，我们提出了一种新的对齐算法，使用临床推理的符号表示来将VLM与医学知识联系起来。这些表示被用于（i）生成规模化的GPT-4引导的视觉指导调整数据，模拟临床医生-VLM对话，并展示临床推理的演示，以及（ii）创建一个自动奖励函数，评估VLM在临床医生-VLM互动中生成的临床有效性。我们的算法消除了在训练数据生成或奖励模型构建中需要人类参与的需求，与使用人类反馈的标准强化学习相比，降低了成本。我们将我们的对齐算法应用于开发Dr-LLaVA，这是一个针对分析骨髓病理切片进行微调的会话式VLM，在多轮医学对话中表现出强大的性能。

更新时间: 2024-05-29 23:19:28

领域: cs.AI,cs.CL,cs.CV,cs.LG

下载: http://arxiv.org/abs/2405.19567v1

Selective Explanations

Feature attribution methods explain black-box machine learning (ML) models by assigning importance scores to input features. These methods can be computationally expensive for large ML models. To address this challenge, there has been increasing efforts to develop amortized explainers, where a machine learning model is trained to predict feature attribution scores with only one inference. Despite their efficiency, amortized explainers can produce inaccurate predictions and misleading explanations. In this paper, we propose selective explanations, a novel feature attribution method that (i) detects when amortized explainers generate low-quality explanations and (ii) improves these explanations using a technique called explanations with initial guess. Our selective explanation method allows practitioners to specify the fraction of samples that receive explanations with initial guess, offering a principled way to bridge the gap between amortized explainers and their high-quality counterparts.

Updated: 2024-05-29 23:08:31

标题: 选择性解释

摘要: 特征归因方法通过为输入特征分配重要性分数来解释黑盒机器学习（ML）模型。对于大型ML模型来说，这些方法可能在计算上昂贵。为了解决这一挑战，人们开始努力开发摊销解释器，其中一个机器学习模型被训练以仅进行一次推理来预测特征归因分数。尽管摊销解释器具有高效性，但它们可能产生不准确的预测和误导性的解释。在本文中，我们提出了选择性解释，这是一种新颖的特征归因方法，它（i）检测摊销解释器生成低质量解释的情况，以及（ii）使用一种称为带有初始猜测的解释技术改进这些解释。我们的选择性解释方法允许实践者指定接收带有初始猜测解释的样本比例，从而提供了一种原则性的方法来弥合摊销解释器和其高质量对应物之间的差距。

更新时间: 2024-05-29 23:08:31

领域: cs.CY,cs.CL,cs.LG

下载: http://arxiv.org/abs/2405.19562v1

Unlocking the Potential of Large Language Models for Clinical Text Anonymization: A Comparative Study

Automated clinical text anonymization has the potential to unlock the widespread sharing of textual health data for secondary usage while assuring patient privacy and safety. Despite the proposal of many complex and theoretically successful anonymization solutions in literature, these techniques remain flawed. As such, clinical institutions are still reluctant to apply them for open access to their data. Recent advances in developing Large Language Models (LLMs) pose a promising opportunity to further the field, given their capability to perform various tasks. This paper proposes six new evaluation metrics tailored to the challenges of generative anonymization with LLMs. Moreover, we present a comparative study of LLM-based methods, testing them against two baseline techniques. Our results establish LLM-based models as a reliable alternative to common approaches, paving the way toward trustworthy anonymization of clinical text.

Updated: 2024-05-29 23:07:58

标题: 释放大型语言模型在临床文本匿名化中的潜力：一项比较研究

摘要: 自动化临床文本匿名化有潜力解锁文本健康数据的广泛共享，同时确保患者隐私和安全。尽管文献中提出了许多复杂且理论上成功的匿名化解决方案，但这些技术仍然存在缺陷。因此，临床机构仍然不愿意将它们应用于其数据的开放访问。最近发展的大型语言模型（LLMs）在进一步推动该领域方面提供了有希望的机会，因为它们能够执行各种任务。本文提出了六个针对LLMs生成匿名化挑战的新评估指标。此外，我们对基于LLMs的方法进行了比较研究，将它们与两种基准技术进行了测试。我们的结果将LLM模型确立为常见方法的可靠替代方案，为临床文本的可靠匿名化铺平了道路。

更新时间: 2024-05-29 23:07:58

领域: cs.CL,cs.AI,cs.CR,cs.LG,I.2.7

下载: http://arxiv.org/abs/2406.00062v1

Quo Vadis ChatGPT? From Large Language Models to Large Knowledge Models

The startling success of ChatGPT and other large language models (LLMs) using transformer-based generative neural network architecture in applications such as natural language processing and image synthesis has many researchers excited about potential opportunities in process systems engineering (PSE). The almost human-like performance of LLMs in these areas is indeed very impressive, surprising, and a major breakthrough. Their capabilities are very useful in certain tasks, such as writing first drafts of documents, code writing assistance, text summarization, etc. However, their success is limited in highly scientific domains as they cannot yet reason, plan, or explain due to their lack of in-depth domain knowledge. This is a problem in domains such as chemical engineering as they are governed by fundamental laws of physics and chemistry (and biology), constitutive relations, and highly technical knowledge about materials, processes, and systems. Although purely data-driven machine learning has its immediate uses, the long-term success of AI in scientific and engineering domains would depend on developing hybrid AI systems that use first principles and technical knowledge effectively. We call these hybrid AI systems Large Knowledge Models (LKMs), as they will not be limited to only NLP-based techniques or NLP-like applications. In this paper, we discuss the challenges and opportunities in developing such systems in chemical engineering.

Updated: 2024-05-29 23:06:54

标题: 何去何从ChatGPT？从大型语言模型到大型知识模型

摘要: ChatGPT和其他基于transformer的大型语言模型（LLMs）在自然语言处理和图像合成等应用中的惊人成功，使许多研究人员对过程系统工程（PSE）中的潜在机会感到兴奋。在这些领域，LLMs几乎人类般的表现确实令人印象深刻，令人惊讶，并且是一项重大突破。它们的能力在某些任务中非常有用，例如撰写文件的初稿，编写代码辅助，文本摘要等。然而，在高度科学领域，它们的成功受到限制，因为它们尚无法进行推理、规划或解释，这是由于它们缺乏深入的领域知识。在化学工程等领域，这是一个问题，因为这些领域受到物理、化学（和生物学）基本法则、本构关系和关于材料、过程和系统的高度技术知识的约束。尽管纯数据驱动的机器学习具有其直接用途，但在科学和工程领域中AI的长期成功将取决于开发有效使用第一原则和技术知识的混合AI系统。我们将这些混合AI系统称为大型知识模型（LKMs），因为它们不仅限于仅基于NLP的技术或类似NLP的应用。在本文中，我们讨论了在化学工程领域开发此类系统面临的挑战和机遇。

更新时间: 2024-05-29 23:06:54

领域: cs.AI,cs.CL,I.2.0; I.2.7

下载: http://arxiv.org/abs/2405.19561v1

Learning Neural Contracting Dynamics: Extended Linearization and Global Guarantees

Global stability and robustness guarantees in learned dynamical systems are essential to ensure well-behavedness of the systems in the face of uncertainty. We present Extended Linearized Contracting Dynamics (ELCD), the first neural network-based dynamical system with global contractivity guarantees in arbitrary metrics. The key feature of ELCD is a parametrization of the extended linearization of the nonlinear vector field. In its most basic form, ELCD is guaranteed to be (i) globally exponentially stable, (ii) equilibrium contracting, and (iii) globally contracting with respect to some metric. To allow for contraction with respect to more general metrics in the data space, we train diffeomorphisms between the data space and a latent space and enforce contractivity in the latent space, which ensures global contractivity in the data space. We demonstrate the performance of ELCD on the high dimensional LASA, multi-link pendulum, and Rosenbrock datasets.

Updated: 2024-05-29 23:05:07

标题: 学习神经收缩动力学：扩展线性化和全局保证

摘要: 在学习的动态系统中，全局稳定性和鲁棒性保证对于确保系统在面对不确定性时表现良好至关重要。我们提出了扩展线性化收缩动力学（ELCD），这是第一个基于神经网络的动态系统，在任意度量下具有全局收缩性保证。ELCD的关键特点是对非线性向量场的扩展线性化进行参数化。在其最基本形式中，ELCD被保证是（i）全局指数稳定的，（ii）平衡收缩的，以及（iii）在某个度量下全局收缩的。为了允许在数据空间中更一般的度量下进行收缩，我们训练数据空间和潜在空间之间的微分同胚，并在潜在空间中强制执行收缩，这确保了数据空间中的全局收缩。我们在高维LASA、多链接摆和Rosenbrock数据集上展示了ELCD的性能。

更新时间: 2024-05-29 23:05:07

领域: cs.LG,math.OC

下载: http://arxiv.org/abs/2402.08090v3

Manipulation and Peer Mechanisms: A Survey

In peer mechanisms, the competitors for a prize also determine who wins. Each competitor may be asked to rank, grade, or nominate peers for the prize. Since the prize can be valuable, such as financial aid, course grades, or an award at a conference, competitors may be tempted to manipulate the mechanism. We survey approaches to prevent or discourage the manipulation of peer mechanisms. We conclude our survey by identifying several important research challenges.

Updated: 2024-05-29 23:04:26

标题: 操纵和同行机制：一项调查

摘要: 在同行机制中，竞争者也决定了谁会赢得奖品。每个竞争者可能会被要求对同行进行排名、评分或提名以获得奖品。由于奖品可能是有价值的，比如财政援助、课程成绩或会议奖项，竞争者可能会被诱导操纵机制。我们调查了预防或遏制同行机制操纵的方法。最后，我们通过确定几个重要的研究挑战来总结我们的调查。

更新时间: 2024-05-29 23:04:26

领域: cs.AI,cs.GT,econ.GN,q-fin.EC

下载: http://arxiv.org/abs/2210.01984v3

Clustering Mixtures of Discrete Distributions: A Note on Mitra's Algorithm

In this note, we provide a refined analysis of Mitra's algorithm \cite{mitra2008clustering} for classifying general discrete mixture distribution models. Built upon spectral clustering \cite{mcsherry2001spectral}, this algorithm offers compelling conditions for probability distributions. We enhance this analysis by tailoring the model to bipartite stochastic block models, resulting in more refined conditions. Compared to those derived in \cite{mitra2008clustering}, our improved separation conditions are obtained.

Updated: 2024-05-29 22:55:45

标题: 离散分布混合的聚类：关于Mitra算法的注解

摘要: 在这篇文章中，我们对Mitra的算法进行了精细分析，用于对一般的离散混合分布模型进行分类\cite{mitra2008clustering}。该算法基于谱聚类\cite{mcsherry2001spectral}，为概率分布提供了引人注目的条件。我们通过将模型定制为二部随机块模型，进一步增强了这一分析，得到了更精细的条件。与\cite{mitra2008clustering}中得到的条件相比，我们获得了改进的分离条件。

更新时间: 2024-05-29 22:55:45

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2405.19559v1

Convergence Bounds for Sequential Monte Carlo on Multimodal Distributions using Soft Decomposition

We prove bounds on the variance of a function $f$ under the empirical measure of the samples obtained by the Sequential Monte Carlo (SMC) algorithm, with time complexity depending on local rather than global Markov chain mixing dynamics. SMC is a Markov Chain Monte Carlo (MCMC) method, which starts by drawing $N$ particles from a known distribution, and then, through a sequence of distributions, re-weights and re-samples the particles, at each instance applying a Markov chain for smoothing. In principle, SMC tries to alleviate problems from multi-modality. However, most theoretical guarantees for SMC are obtained by assuming global mixing time bounds, which are only efficient in the uni-modal setting. We show that bounds can be obtained in the truly multi-modal setting, with mixing times that depend only on local MCMC dynamics.

Updated: 2024-05-29 22:43:45

标题: 使用软分解的序贯蒙特卡洛在多模态分布上的收敛界限

摘要: 我们证明了在顺序蒙特卡洛（SMC）算法获得的样本的经验测度下，函数$f$的方差的界限，其时间复杂度取决于局部而不是全局的马尔可夫链混合动力学。SMC是一种马尔可夫链蒙特卡洛（MCMC）方法，它首先从已知分布中抽取$N$个粒子，然后通过一系列分布，对粒子进行重新加权和重新抽样，在每个实例中应用马尔可夫链进行平滑处理。原则上，SMC试图减轻多模态问题。然而，大多数SMC的理论保证都是基于假设全局混合时间界限得到的，这种方法在单模态设置下是有效的。我们展示了可以在真正的多模态设置下获得界限，混合时间仅取决于局部MCMC动力学。

更新时间: 2024-05-29 22:43:45

领域: math.ST,cs.LG,math.PR,stat.ML,stat.TH

下载: http://arxiv.org/abs/2405.19553v1

Simulation of Graph Algorithms with Looped Transformers

The execution of graph algorithms using neural networks has recently attracted significant interest due to promising empirical progress. This motivates further understanding of how neural networks can replicate reasoning steps with relational data. In this work, we study the ability of transformer networks to simulate algorithms on graphs from a theoretical perspective. The architecture we use is a looped transformer with extra attention heads that interact with the graph. We prove by construction that this architecture can simulate individual algorithms such as Dijkstra's shortest path, Breadth- and Depth-First Search, and Kosaraju's strongly connected components, as well as multiple algorithms simultaneously. The number of parameters in the networks does not increase with the input graph size, which implies that the networks can simulate the above algorithms for any graph. Despite this property, we show a limit to simulation in our solution due to finite precision. Finally, we show a Turing Completeness result with constant width when the extra attention heads are utilized.

Updated: 2024-05-29 22:41:12

标题: 使用环形变压器模拟图算法

摘要: 最近，使用神经网络执行图算法引起了广泛关注，因为实证进展令人鼓舞。这促使进一步了解神经网络如何能够复制与关系数据相关的推理步骤。在这项工作中，我们从理论角度研究了transformer网络模拟图算法的能力。我们使用的架构是一个带有额外注意力头的循环transformer，与图进行交互。我们通过构造证明，这种架构可以模拟个体算法，如Dijkstra的最短路径、广度优先搜索、深度优先搜索和Kosaraju的强连通分量，以及同时模拟多个算法。网络中的参数数量不随输入图大小增加，这意味着网络可以模拟任何图上述算法。尽管具有这种属性，我们在解决方案中展示了由于有限精度而导致模拟的局限性。最后，我们展示了在利用额外注意力头时的恒定宽度的图灵完备性结果。

更新时间: 2024-05-29 22:41:12

领域: cs.LG,cs.AI,cs.DS

下载: http://arxiv.org/abs/2402.01107v2

Few-shot Adaptation to Distribution Shifts By Mixing Source and Target Embeddings

Pretrained machine learning models need to be adapted to distribution shifts when deployed in new target environments. When obtaining labeled data from the target distribution is expensive, few-shot adaptation with only a few examples from the target distribution becomes essential. In this work, we propose MixPro, a lightweight and highly data-efficient approach for few-shot adaptation. MixPro first generates a relatively large dataset by mixing (linearly combining) pre-trained embeddings of large source data with those of the few target examples. This process preserves important features of both source and target distributions, while mitigating the specific noise in the small target data. Then, it trains a linear classifier on the mixed embeddings to effectively adapts the model to the target distribution without overfitting the small target data. Theoretically, we demonstrate the advantages of MixPro over previous methods. Our experiments, conducted across various model architectures on 8 datasets featuring different types of distribution shifts, reveal that MixPro can outperform baselines by up to 7\%, with only 2-4 target examples.

Updated: 2024-05-29 22:38:13

标题: 通过混合源和目标嵌入进行少样本适应分布变化

摘要: 预训练的机器学习模型在部署到新的目标环境时需要适应分布变化。当从目标分布获取标记数据变得昂贵时，只有少量目标示例的少样本适应变得至关重要。在这项工作中，我们提出了MixPro，一种轻量级且高度数据高效的少样本适应方法。MixPro首先通过将大型源数据的预训练嵌入与少量目标示例的嵌入线性组合来生成一个相对较大的数据集。这个过程保留了源分布和目标分布的重要特征，同时减轻了小目标数据中的特定噪音。然后，它在混合嵌入上训练线性分类器，有效地使模型适应目标分布，而不会过度拟合小目标数据。从理论上讲，我们展示了MixPro相对于先前方法的优势。我们的实验跨越了各种模型架构，涉及8个具有不同类型分布变化的数据集，结果显示MixPro可以比基线方法提高高达7\%，仅需2-4个目标示例。

更新时间: 2024-05-29 22:38:13

领域: cs.LG,cs.CL,cs.CV

下载: http://arxiv.org/abs/2305.14521v3

Exploring knowledge graph-based neural-symbolic system from application perspective

Advancements in Artificial Intelligence (AI) and deep neural networks have driven significant progress in vision and text processing. However, achieving human-like reasoning and interpretability in AI systems remains a substantial challenge. The Neural-Symbolic paradigm, which integrates neural networks with symbolic systems, presents a promising pathway toward more interpretable AI. Within this paradigm, Knowledge Graphs (KG) are crucial, offering a structured and dynamic method for representing knowledge through interconnected entities and relationships, typically as triples (subject, predicate, object). This paper explores recent advancements in neural-symbolic integration based on KG, examining how it supports integration in three categories: enhancing the reasoning and interpretability of neural networks with symbolic knowledge (Symbol for Neural), refining the completeness and accuracy of symbolic systems via neural network methodologies (Neural for Symbol), and facilitating their combined application in Hybrid Neural-Symbolic Integration. It highlights current trends and proposes future research directions in Neural-Symbolic AI.

Updated: 2024-05-29 22:37:08

标题: 从应用视角探索基于知识图的神经符号系统

摘要: 人工智能（AI）和深度神经网络的进步推动了视觉和文本处理方面的显著进展。然而，在AI系统中实现类人推理和可解释性仍然是一个重大挑战。神经符号范式将神经网络与符号系统整合在一起，为更具可解释性的AI提供了一条有前途的道路。在这个范式中，知识图（KG）至关重要，通过互联实体和关系的方式提供了一种结构化和动态的知识表示方法，通常以三元组（主语，谓词，宾语）呈现。本文探讨了基于知识图的神经符号整合的最新进展，研究了它如何支持三个方面的整合：通过符号知识增强神经网络的推理和可解释性（符号应用于神经），通过神经网络方法改进符号系统的完整性和准确性（神经应用于符号），以及促进它们在混合神经符号整合中的联合应用。它强调了当前的趋势，并提出了神经符号AI未来研究方向。

更新时间: 2024-05-29 22:37:08

领域: cs.AI

下载: http://arxiv.org/abs/2405.03524v4

Stress-Testing Capability Elicitation With Password-Locked Models

To determine the safety of large language models (LLMs), AI developers must be able to assess their dangerous capabilities. But simple prompting strategies often fail to elicit an LLM's full capabilities. One way to elicit capabilities more robustly is to fine-tune the LLM to complete the task. In this paper, we investigate the conditions under which fine-tuning-based elicitation suffices to elicit capabilities. To do this, we introduce password-locked models, LLMs fine-tuned such that some of their capabilities are deliberately hidden. Specifically, these LLMs are trained to exhibit these capabilities only when a password is present in the prompt, and to imitate a much weaker LLM otherwise. Password-locked models enable a novel method of evaluating capabilities elicitation methods, by testing whether these password-locked capabilities can be elicited without using the password. We find that a few high-quality demonstrations are often sufficient to fully elicit password-locked capabilities. More surprisingly, fine-tuning can elicit other capabilities that have been locked using the same password, or even different passwords. Furthermore, when only evaluations, and not demonstrations, are available, approaches like reinforcement learning are still often able to elicit capabilities. Overall, our findings suggest that fine-tuning is an effective method of eliciting hidden capabilities of current models, but may be unreliable when high-quality demonstrations are not available, e.g. as may be the case when models' (hidden) capabilities exceed those of human demonstrators.

Updated: 2024-05-29 22:26:26

标题: 使用密码锁定模型进行压力测试能力引出

摘要: 为了确定大型语言模型（LLMs）的安全性，AI开发人员必须能够评估它们的危险能力。但简单的提示策略通常无法引发LLM的全部能力。一种更可靠地引发能力的方法是对LLM进行微调以完成任务。在本文中，我们研究了基于微调的引发何时足以引发能力。为了做到这一点，我们引入了密码锁定模型，即通过微调LLMs，使其某些能力被故意隐藏。具体来说，这些LLMs被训练只有在提示中存在密码时才表现出这些能力，并在其他情况下模仿一个远不如它的LLM。密码锁定模型实现了一种评估能力引发方法的新方法，即通过测试这些密码锁定的能力是否可以在不使用密码的情况下被引发。我们发现，少数高质量的演示通常足以完全引发密码锁定的能力。更令人惊讶的是，通过相同密码或甚至不同密码锁定的其他能力也可以通过微调来引发。此外，当只有评估而没有演示时，诸如强化学习的方法仍然经常能够引发能力。总的来说，我们的研究结果表明，微调是引发当前模型隐藏能力的一种有效方法，但在没有高质量演示可用时可能会不可靠，例如当模型的（隐藏）能力超过人类演示者时可能会出现这种情况。

更新时间: 2024-05-29 22:26:26

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2405.19550v1

RLeXplore: Accelerating Research in Intrinsically-Motivated Reinforcement Learning

Extrinsic rewards can effectively guide reinforcement learning (RL) agents in specific tasks. However, extrinsic rewards frequently fall short in complex environments due to the significant human effort needed for their design and annotation. This limitation underscores the necessity for intrinsic rewards, which offer auxiliary and dense signals and can enable agents to learn in an unsupervised manner. Although various intrinsic reward formulations have been proposed, their implementation and optimization details are insufficiently explored and lack standardization, thereby hindering research progress. To address this gap, we introduce RLeXplore, a unified, highly modularized, and plug-and-play framework offering reliable implementations of eight state-of-the-art intrinsic reward algorithms. Furthermore, we conduct an in-depth study that identifies critical implementation details and establishes well-justified standard practices in intrinsically-motivated RL. The source code for RLeXplore is available at https://github.com/RLE-Foundation/RLeXplore.

Updated: 2024-05-29 22:23:20

标题: RLeXplore：加速内在动机驱动的强化学习研究

摘要: 外部奖励可以有效地引导强化学习代理在特定任务中。然而，在复杂环境中，外部奖励经常由于需要设计和注释的大量人力而不足。这种限制凸显了内在奖励的必要性，内在奖励提供辅助和密集的信号，可以使代理以无监督的方式学习。尽管提出了各种内在奖励的公式，但它们的实现和优化细节尚未得到充分探讨，并且缺乏标准化，从而阻碍了研究进展。为了填补这一空白，我们介绍了RLeXplore，这是一个统一的、高度模块化的、即插即用的框架，提供了八种最先进的内在奖励算法的可靠实现。此外，我们进行了深入研究，确定了关键的实现细节，并建立了内在动机的强化学习中合理的标准实践。RLeXplore的源代码可在https://github.com/RLE-Foundation/RLeXplore上找到。

更新时间: 2024-05-29 22:23:20

领域: cs.LG

下载: http://arxiv.org/abs/2405.19548v1

CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive Learning

Data selection has emerged as a core issue for large-scale visual-language model pretaining (e.g., CLIP), particularly with noisy web-curated datasets. Three main data selection approaches are: (1) leveraging external non-CLIP models to aid data selection, (2) training new CLIP-style embedding models that are more effective at selecting high-quality data than the original OpenAI CLIP model, and (3) designing better metrics or strategies universally applicable to any CLIP embedding without requiring specific model properties (e.g., CLIPScore is one popular metric). While the first two approaches have been extensively studied, the third remains under-explored. In this paper, we advance the third approach by proposing two new methods. Firstly, instead of classical CLIP scores that only consider the alignment between two modalities from a single sample, we introduce negCLIPLoss, a CLIP loss-inspired method that adds the alignment between one sample and its contrastive pairs as an extra normalization term for better quality measurement. Secondly, when downstream tasks are known, we propose a new norm-based metric, NormSim, to measure the similarity between pretraining data and target data. We test our methods on the data selection benchmark, DataComp~\cite{gadre2023datacomp}. Compared to the best baseline using only OpenAI's CLIP-L/14, our methods achieve a 5.3\% improvement on ImageNet-1k and a 2.8\% improvement on 38 downstream evaluation tasks. Moreover, both negCLIPLoss and NormSim are compatible with existing techniques. By combining our methods with the current best methods DFN~\cite{fang2023data} and HYPE~\cite{kim2024hype}, we can boost average performance on downstream tasks by 0.9\%, achieving a new state-of-the-art.

Updated: 2024-05-29 22:19:57

标题: CLIPLoss 和基于范数的数据选择方法用于多模式对比学习

摘要: 数据选择已经成为大规模视觉语言模型预训练（例如CLIP）的核心问题，特别是在嘈杂的网络策划数据集中。三种主要的数据选择方法是：（1）利用外部非CLIP模型来帮助数据选择，（2）训练新的CLIP风格嵌入模型，这些模型比原始的OpenAI CLIP模型更有效地选择高质量数据，（3）设计更好的度量标准或策略，普遍适用于任何CLIP嵌入，而不需要特定的模型属性（例如，CLIPScore是一种流行的度量标准）。虽然前两种方法已经得到了广泛研究，但第三种方法仍未得到充分探讨。在本文中，我们通过提出两种新方法推进了第三种方法。首先，我们引入了negCLIPLoss，这是一种受CLIP损失启发的方法，不仅考虑单个样本两种模态之间的对齐，还增加了一个额外的规范化项，用于衡量一个样本与其对比对的对齐情况，以更好地衡量质量。其次，当下游任务已知时，我们提出了一种基于范数的度量标准NormSim，用于衡量预训练数据和目标数据之间的相似性。我们在数据选择基准DataComp上测试了我们的方法。与仅使用OpenAI的CLIP-L/14的最佳基线相比，我们的方法在ImageNet-1k上实现了5.3％的改进，在38个下游评估任务上实现了2.8％的改进。此外，negCLIPLoss和NormSim与现有技术兼容。通过将我们的方法与当前最佳方法DFN和HYPE相结合，我们可以将下游任务的平均性能提升0.9％，实现新的最新技术水平。

更新时间: 2024-05-29 22:19:57

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2405.19547v1

One-Shot Safety Alignment for Large Language Models via Optimal Dualization

The growing safety concerns surrounding Large Language Models (LLMs) raise an urgent need to align them with diverse human preferences to simultaneously enhance their helpfulness and safety. A promising approach is to enforce safety constraints through Reinforcement Learning from Human Feedback (RLHF). For such constrained RLHF, common Lagrangian-based primal-dual policy optimization methods are computationally expensive and often unstable. This paper presents a dualization perspective that reduces constrained alignment to an equivalent unconstrained alignment problem. We do so by pre-optimizing a smooth and convex dual function that has a closed form. This shortcut eliminates the need for cumbersome primal-dual policy iterations, thus greatly reducing the computational burden and improving training stability. Our strategy leads to two practical algorithms in model-based and preference-based scenarios (MoCAN and PeCAN, respectively). A broad range of experiments demonstrate the effectiveness of our methods.

Updated: 2024-05-29 22:12:52

标题: 大语言模型的一次性安全对齐：通过最佳二元化

摘要: 随着对大型语言模型（LLMs）日益增长的安全担忧，迫切需要将它们与多样化的人类偏好相一致，以同时提高它们的实用性和安全性。一种有前途的方法是通过来自人类反馈的强化学习来强制执行安全约束。对于这种受限制的RLHF，常见的基于Lagrangian的原始-对偶策略优化方法计算成本高且经常不稳定。本文提出了一个对偶化视角，将受限制的对齐问题简化为一个等价的无限制对齐问题。我们通过预先优化一个具有封闭形式的平滑和凸二次函数来实现这一目标。这种快捷方式消除了繁琐的原始-对偶策略迭代的需要，从而大大减轻了计算负担并提高了训练稳定性。我们的策略导致了模型为基础和偏好为基础情景下的两种实用算法（MoCAN和PeCAN）。大量实验证明了我们方法的有效性。

更新时间: 2024-05-29 22:12:52

领域: cs.AI,cs.CL,cs.LG,math.OC,stat.ML

下载: http://arxiv.org/abs/2405.19544v1

Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement

Large vision-language models (LVLMs) have achieved impressive results in various visual question-answering and reasoning tasks through vision instruction tuning on specific datasets. However, there is still significant room for improvement in the alignment between visual and language modalities. Previous methods to enhance this alignment typically require external models or data, heavily depending on their capabilities and quality, which inevitably sets an upper bound on performance. In this paper, we propose SIMA, a framework that enhances visual and language modality alignment through self-improvement, eliminating the needs for external models or data. SIMA leverages prompts from existing vision instruction tuning datasets to self-generate responses and employs an in-context self-critic mechanism to select response pairs for preference tuning. The key innovation is the introduction of three vision metrics during the in-context self-critic process, which can guide the LVLM in selecting responses that enhance image comprehension. Through experiments across 14 hallucination and comprehensive benchmarks, we demonstrate that SIMA not only improves model performance across all benchmarks but also achieves superior modality alignment, outperforming previous approaches.

Updated: 2024-05-29 22:07:53

标题: 通过自我改进提升大规模视觉语言模型中的视觉-语言模态对齐

摘要: 大型视觉-语言模型（LVLMs）通过在特定数据集上进行视觉指导调优，在各种视觉问答和推理任务中取得了令人印象深刻的成果。然而，在视觉和语言模态之间的对齐仍有显著的改进空间。以前用于增强这种对齐的方法通常需要外部模型或数据，严重依赖它们的能力和质量，这不可避免地为性能设置了上限。在本文中，我们提出了SIMA，一个通过自我改进增强视觉和语言模态对齐的框架，消除了对外部模型或数据的需求。SIMA利用现有的视觉指导调优数据集中的提示来自动生成响应，并采用上下文自批评机制来选择用于偏好调优的响应对。关键创新在于在上下文自批评过程中引入了三种视觉度量，可以引导LVLM选择增强图像理解的响应。通过在14个虚构和全面基准上的实验，我们证明了SIMA不仅在所有基准上提高了模型性能，而且实现了更优越的模态对齐，优于以前的方法。

更新时间: 2024-05-29 22:07:53

领域: cs.CV,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2405.15973v2

Large Language Model for Mental Health: A Systematic Review

Large language models (LLMs) have attracted significant attention for potential applications in digital health, while their application in mental health is subject to ongoing debate. This systematic review aims to evaluate the usage of LLMs in mental health, focusing on their strengths and limitations in early screening, digital interventions, and clinical applications. Adhering to PRISMA guidelines, we searched PubMed, IEEE Xplore, Scopus, and the JMIR using keywords: 'mental health OR mental illness OR mental disorder OR psychiatry' AND 'large language models'. We included articles published between January 1, 2017, and December 31, 2023, excluding non-English articles. 30 articles were evaluated, which included research on mental illness and suicidal ideation detection through text (n=12), usage of LLMs for mental health conversational agents (CAs) (n=5), and other applications and evaluations of LLMs in mental health (n=13). LLMs exhibit substantial effectiveness in detecting mental health issues and providing accessible, de-stigmatized eHealth services. However, the current risks associated with the clinical use might surpass their benefits. The study identifies several significant issues: the lack of multilingual datasets annotated by experts, concerns about the accuracy and reliability of the content generated, challenges in interpretability due to the 'black box' nature of LLMs, and persistent ethical dilemmas. These include the lack of a clear ethical framework, concerns about data privacy, and the potential for over-reliance on LLMs by both therapists and patients, which could compromise traditional medical practice. Despite these issues, the rapid development of LLMs underscores their potential as new clinical aids, emphasizing the need for continued research and development in this area.

Updated: 2024-05-29 21:55:17

标题: 大型语言模型用于心理健康：系统综述

摘要: 大型语言模型（LLMs）在数字健康领域引起了极大关注，而它们在心理健康领域的应用则存在争议。本系统性综述旨在评估LLMs在心理健康领域的应用，重点关注它们在早期筛查、数字干预和临床应用中的优势和局限性。遵循PRISMA指南，我们在PubMed、IEEE Xplore、Scopus和JMIR上使用关键词搜索：“心理健康 OR 心理疾病 OR 精神障碍 OR 精神病学” 和 “大型语言模型”。我们包括了2017年1月1日至2023年12月31日发表的文章，排除非英语文章。共评估了30篇文章，其中包括有关通过文本检测心理疾病和自杀意念（n=12）、使用LLMs进行心理健康对话代理人（CAs）（n=5）以及其他心理健康领域LLMs应用和评估（n=13）的研究。LLMs在检测心理健康问题和提供可访问、去污名化的eHealth服务方面表现出显著效果。然而，目前与临床应用相关的风险可能超过了它们的好处。研究确定了几个重要问题：缺乏由专家注释的多语言数据集，对生成内容的准确性和可靠性的担忧，由于LLMs的“黑盒”性质而导致解释困难的挑战，以及持续存在的伦理困境。这些问题包括缺乏明确的伦理框架，对数据隐私的担忧，以及治疗师和患者可能过度依赖LLMs的潜在风险，这可能损害传统医疗实践。尽管存在这些问题，LLMs的快速发展突显了它们作为新临床辅助工具的潜力，强调了在这一领域继续进行研究和开发的必要性。

更新时间: 2024-05-29 21:55:17

领域: cs.CY,cs.AI,cs.CL

下载: http://arxiv.org/abs/2403.15401v2

Computing Low-Entropy Couplings for Large-Support Distributions

Minimum-entropy coupling (MEC) -- the process of finding a joint distribution with minimum entropy for given marginals -- has applications in areas such as causality and steganography. However, existing algorithms are either computationally intractable for large-support distributions or limited to specific distribution types and sensitive to hyperparameter choices. This work addresses these limitations by unifying a prior family of iterative MEC (IMEC) approaches into a generalized partition-based formalism. From this framework, we derive a novel IMEC algorithm called ARIMEC, capable of handling arbitrary discrete distributions, and introduce a method to make IMEC robust to suboptimal hyperparameter settings. These innovations facilitate the application of IMEC to high-throughput steganography with language models, among other settings. Our codebase is available at https://github.com/ssokota/mec .

Updated: 2024-05-29 21:54:51

标题: 计算大支持分布的低熵耦合

摘要: 最小熵耦合（MEC）是指在给定边际分布的情况下找到具有最小熵的联合分布的过程，它在因果关系和隐写术等领域有应用。然而，现有算法要么在处理大支持度分布时计算复杂，要么仅限于特定分布类型并对超参数选择敏感。本文通过将一系列迭代MEC（IMEC）方法统一为广义基于分区的形式主义来解决这些限制。从这个框架中，我们推导出一种新颖的IMEC算法称为ARIMEC，能够处理任意离散分布，并介绍一种方法使IMEC对次优超参数设置具有鲁棒性。这些创新促进了IMEC在语言模型高通量隐写术等设置中的应用。我们的代码库可在https://github.com/ssokota/mec 上找到。

更新时间: 2024-05-29 21:54:51

领域: cs.IT,cs.CR,math.IT

下载: http://arxiv.org/abs/2405.19540v1

CheXpert Plus: Hundreds of Thousands of Aligned Radiology Texts, Images and Patients

Since the release of the original CheXpert paper five years ago, CheXpert has become one of the most widely used and cited clinical AI datasets. The emergence of vision language models has sparked an increase in demands for sharing reports linked to CheXpert images, along with a growing interest among AI fairness researchers in obtaining demographic data. To address this, CheXpert Plus serves as a new collection of radiology data sources, made publicly available to enhance the scaling, performance, robustness, and fairness of models for all subsequent machine learning tasks in the field of radiology. CheXpert Plus is the largest text dataset publicly released in radiology, with a total of 36 million text tokens, including 13 million impression tokens. To the best of our knowledge, it represents the largest text de-identification effort in radiology, with almost 1 million PHI spans anonymized. It is only the second time that a large-scale English paired dataset has been released in radiology, thereby enabling, for the first time, cross-institution training at scale. All reports are paired with high-quality images in DICOM format, along with numerous image and patient metadata covering various clinical and socio-economic groups, as well as many pathology labels and RadGraph annotations. We hope this dataset will boost research for AI models that can further assist radiologists and help improve medical care. Data is available at the following URL: https://stanfordaimi.azurewebsites.net/datasets/5158c524-d3ab-4e02-96e9-6ee9efc110a1 Models are available at the following URL: https://github.com/Stanford-AIMI/chexpert-plus

Updated: 2024-05-29 21:48:56

标题: CheXpert Plus：数十万条对齐的放射学文本、图像和患者

摘要: 自从五年前发布了原始的CheXpert论文以来，CheXpert已成为最广泛使用和引用的临床AI数据集之一。视觉语言模型的出现引发了人们对与CheXpert图像相关的报告共享的需求增加，以及AI公平性研究人员对获取人口统计数据的兴趣增加。为了解决这个问题，CheXpert Plus作为一种新的放射学数据源集合，公开提供以增强放射学领域所有后续机器学习任务的扩展性、性能、稳健性和公平性的模型。CheXpert Plus是放射学领域公开发布的最大文本数据集，共有3600万个文本标记，其中包括1300万个印象标记。据我们所知，它代表了放射学领域最大规模的文本去识别工作，近100万个PHI跨度已被匿名化。这是放射学领域第二次发布大规模英语配对数据集，从而首次实现了跨机构规模训练。所有报告均配有DICOM格式的高质量图像，以及涵盖各种临床和社会经济群体的许多图像和患者元数据，以及许多病理标签和RadGraph注释。我们希望这个数据集能促进AI模型的研究，进一步协助放射科医生，并帮助改善医疗护理。数据可在以下网址获取：https://stanfordaimi.azurewebsites.net/datasets/5158c524-d3ab-4e02-96e9-6ee9efc110a1 模型可在以下网址获取：https://github.com/Stanford-AIMI/chexpert-plus

更新时间: 2024-05-29 21:48:56

领域: cs.CL,cs.AI,cs.CV,cs.LG

下载: http://arxiv.org/abs/2405.19538v1

GradSafe: Detecting Jailbreak Prompts for LLMs via Safety-Critical Gradient Analysis

Large Language Models (LLMs) face threats from jailbreak prompts. Existing methods for detecting jailbreak prompts are primarily online moderation APIs or finetuned LLMs. These strategies, however, often require extensive and resource-intensive data collection and training processes. In this study, we propose GradSafe, which effectively detects jailbreak prompts by scrutinizing the gradients of safety-critical parameters in LLMs. Our method is grounded in a pivotal observation: the gradients of an LLM's loss for jailbreak prompts paired with compliance response exhibit similar patterns on certain safety-critical parameters. In contrast, safe prompts lead to different gradient patterns. Building on this observation, GradSafe analyzes the gradients from prompts (paired with compliance responses) to accurately detect jailbreak prompts. We show that GradSafe, applied to Llama-2 without further training, outperforms Llama Guard, despite its extensive finetuning with a large dataset, in detecting jailbreak prompts. This superior performance is consistent across both zero-shot and adaptation scenarios, as evidenced by our evaluations on ToxicChat and XSTest. The source code is available at https://github.com/xyq7/GradSafe.

Updated: 2024-05-29 21:45:35

标题: GradSafe：通过安全关键梯度分析检测LLMs的越狱提示

摘要: 大型语言模型（LLMs）面临来自越狱提示的威胁。现有的检测越狱提示的方法主要是在线审查API或经过微调的LLMs。然而，这些策略通常需要大量且资源密集的数据收集和训练过程。在本研究中，我们提出了GradSafe，通过审查LLMs中安全关键参数的梯度有效地检测越狱提示。我们的方法基于一个重要观察：越狱提示与合规响应配对的LLM损失的梯度在某些安全关键参数上呈现出相似的模式。相比之下，安全提示会导致不同的梯度模式。基于这一观察，GradSafe分析来自提示（与合规响应配对）的梯度，准确地检测越狱提示。我们展示了，应用于Llama-2的GradSafe，在没有进一步训练的情况下，优于Llama Guard，在检测越狱提示方面，尽管后者经过了大规模数据集的广泛微调。这种卓越的性能在零射击和适应场景下都是一致的，这一点在我们对ToxicChat和XSTest的评估中有所体现。源代码可在https://github.com/xyq7/GradSafe上找到。

更新时间: 2024-05-29 21:45:35

领域: cs.CL,cs.CR

下载: http://arxiv.org/abs/2402.13494v2

Preference Learning Algorithms Do Not Learn Preference Rankings

Preference learning algorithms (e.g., RLHF and DPO) are frequently used to steer LLMs to produce generations that are more preferred by humans, but our understanding of their inner workings is still limited. In this work, we study the conventional wisdom that preference learning trains models to assign higher likelihoods to more preferred outputs than less preferred outputs, measured via $\textit{ranking accuracy}$. Surprisingly, we find that most state-of-the-art preference-tuned models achieve a ranking accuracy of less than 60% on common preference datasets. We furthermore derive the $\textit{idealized ranking accuracy}$ that a preference-tuned LLM would achieve if it optimized the DPO or RLHF objective perfectly. We demonstrate that existing models exhibit a significant $\textit{alignment gap}$ -- $\textit{i.e.}$, a gap between the observed and idealized ranking accuracies. We attribute this discrepancy to the DPO objective, which is empirically and theoretically ill-suited to fix even mild ranking errors in the reference model, and derive a simple and efficient formula for quantifying the difficulty of learning a given preference datapoint. Finally, we demonstrate that ranking accuracy strongly correlates with the empirically popular win rate metric when the model is close to the reference model used in the objective, shedding further light on the differences between on-policy (e.g., RLHF) and off-policy (e.g., DPO) preference learning algorithms.

Updated: 2024-05-29 21:29:44

标题: 偏好学习算法无法学习偏好排序

摘要: 偏好学习算法（例如RLHF和DPO）经常被用来引导LLMs生成更受人类偏好的结果，但我们对它们的内部工作原理的理解仍然有限。在这项工作中，我们研究了偏好学习训练模型将更受偏好输出分配更高概率比较不受偏好输出的传统智慧，通过$\textit{排名准确性}$来衡量。令人惊讶的是，我们发现大多数最先进的偏好调整模型在常见偏好数据集上的排名准确性不到60%。此外，我们推导了偏好调整LLM将实现的$\textit{理想排名准确性}$，如果它完美地优化了DPO或RLHF目标。我们证明现有模型存在显著的$\textit{对齐差距}$ -- 即，观察到的排名准确性与理想排名准确性之间的差距。我们将这种差异归因于DPO目标，实证和理论上不适合修复参考模型中的甚至轻微的排名错误，并推导了一个简单而高效的公式，用于量化学习给定偏好数据点的难度。最后，我们证明排名准确性与经验上受欢迎的胜率指标密切相关，当模型接近目标中使用的参考模型时，进一步阐明了在线（例如RLHF）和离线（例如DPO）偏好学习算法之间的差异。

更新时间: 2024-05-29 21:29:44

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2405.19534v1

Contrasting Multiple Representations with the Multi-Marginal Matching Gap

Learning meaningful representations of complex objects that can be seen through multiple ($k\geq 3$) views or modalities is a core task in machine learning. Existing methods use losses originally intended for paired views, and extend them to $k$ views, either by instantiating $\tfrac12k(k-1)$ loss-pairs, or by using reduced embeddings, following a \textit{one vs. average-of-rest} strategy. We propose the multi-marginal matching gap (M3G), a loss that borrows tools from multi-marginal optimal transport (MM-OT) theory to simultaneously incorporate all $k$ views. Given a batch of $n$ points, each seen as a $k$-tuple of views subsequently transformed into $k$ embeddings, our loss contrasts the cost of matching these $n$ ground-truth $k$-tuples with the MM-OT polymatching cost, which seeks $n$ optimally arranged $k$-tuples chosen within these $n\times k$ vectors. While the exponential complexity $O(n^k$) of the MM-OT problem may seem daunting, we show in experiments that a suitable generalization of the Sinkhorn algorithm for that problem can scale to, e.g., $k=3\sim 6$ views using mini-batches of size $64~\sim128$. Our experiments demonstrate improved performance over multiview extensions of pairwise losses, for both self-supervised and multimodal tasks.

Updated: 2024-05-29 21:24:44

标题: 用多重表示与多边匹配间隙进行对比

摘要: 学习复杂对象的有意义表示是机器学习中的核心任务，这些对象可以通过多个（$k\geq 3$）视图或模态进行观察。现有方法使用最初用于配对视图的损失，并将它们扩展到$k$个视图，方法是实例化$\tfrac12k(k-1)$个损失对，或者使用减少的嵌入，遵循“一个对其他的平均值”策略。我们提出了多边缘匹配差（M3G），这是一种从多边缘最优传输（MM-OT）理论中借鉴工具的损失，可以同时纳入所有$k$个视图。给定一个批次包含$n$个点，每个点被视为一个$k$元组的视图，随后转换为$k$个嵌入，我们的损失对比匹配这些$n$个基本真值$k$元组的成本与MM-OT多匹配成本，后者寻找$n$个在这$n\times k$个向量中选择的最优排列的$k$元组。虽然MM-OT问题的指数复杂度$O(n^k$)可能看起来令人生畏，但我们在实验中展示了适用于该问题的Sinkhorn算法的适当泛化可以扩展到，例如，使用大小为$64~\sim128$的小批量处理$k=3\sim 6$个视图。我们的实验表明，在自监督和多模态任务中，相比于配对损失的多视图扩展，我们的方法表现出更好的性能。

更新时间: 2024-05-29 21:24:44

领域: cs.LG

下载: http://arxiv.org/abs/2405.19532v1

Real-Time Dynamic Robot-Assisted Hand-Object Interaction via Motion Primitives

Advances in artificial intelligence (AI) have been propelling the evolution of human-robot interaction (HRI) technologies. However, significant challenges remain in achieving seamless interactions, particularly in tasks requiring physical contact with humans. These challenges arise from the need for accurate real-time perception of human actions, adaptive control algorithms for robots, and the effective coordination between human and robotic movements. In this paper, we propose an approach to enhancing physical HRI with a focus on dynamic robot-assisted hand-object interaction (HOI). Our methodology integrates hand pose estimation, adaptive robot control, and motion primitives to facilitate human-robot collaboration. Specifically, we employ a transformer-based algorithm to perform real-time 3D modeling of human hands from single RGB images, based on which a motion primitives model (MPM) is designed to translate human hand motions into robotic actions. The robot's action implementation is dynamically fine-tuned using the continuously updated 3D hand models. Experimental validations, including a ring-wearing task, demonstrate the system's effectiveness in adapting to real-time movements and assisting in precise task executions.

Updated: 2024-05-29 21:20:16

标题: 实时动态机器人辅助手-物体互动通过运动基元

摘要: 人工智能（AI）的进步推动了人机交互（HRI）技术的发展。然而，要实现无缝交互仍然存在重大挑战，特别是在需要与人类进行物理接触的任务中。这些挑战源于对人类行动的准确实时感知、机器人的自适应控制算法以及人机协调之间的有效协调。在本文中，我们提出了一种增强物理HRI的方法，重点关注动态机器人辅助的手-物体交互（HOI）。我们的方法整合了手部姿势估计、自适应机器人控制和运动基元，以促进人机协作。具体来说，我们采用基于转换器的算法，从单个RGB图像中实时建模人类手部的3D模型，基于此设计了一个运动基元模型（MPM），将人类手部动作转化为机器人动作。机器人的动作实现通过不断更新的3D手部模型进行动态微调。实验验证，包括戴戒指任务，证明了系统在适应实时动作并协助精确任务执行方面的有效性。

更新时间: 2024-05-29 21:20:16

领域: cs.RO,cs.LG

下载: http://arxiv.org/abs/2405.19531v1

AI Risk Management Should Incorporate Both Safety and Security

The exposure of security vulnerabilities in safety-aligned language models, e.g., susceptibility to adversarial attacks, has shed light on the intricate interplay between AI safety and AI security. Although the two disciplines now come together under the overarching goal of AI risk management, they have historically evolved separately, giving rise to differing perspectives. Therefore, in this paper, we advocate that stakeholders in AI risk management should be aware of the nuances, synergies, and interplay between safety and security, and unambiguously take into account the perspectives of both disciplines in order to devise mostly effective and holistic risk mitigation approaches. Unfortunately, this vision is often obfuscated, as the definitions of the basic concepts of "safety" and "security" themselves are often inconsistent and lack consensus across communities. With AI risk management being increasingly cross-disciplinary, this issue is particularly salient. In light of this conceptual challenge, we introduce a unified reference framework to clarify the differences and interplay between AI safety and AI security, aiming to facilitate a shared understanding and effective collaboration across communities.

Updated: 2024-05-29 21:00:47

标题: 人工智能风险管理应该同时考虑安全和安全性

摘要: 在安全对齐的语言模型中暴露安全漏洞，例如易受对抗性攻击的影响，揭示了人工智能安全和人工智能安全之间复杂的相互作用。尽管这两个学科现在在人工智能风险管理的总体目标下汇聚在一起，但它们在历史上是分开发展的，导致了不同的观点。因此，在本文中，我们主张人工智能风险管理的利益相关者应当意识到安全和安全之间的细微差别、协同作用，并明确地考虑这两个学科的观点，以制定大多数有效和整体的风险缓解方法。不幸的是，这种愿景往往被混淆，因为"安全"和"安全"这两个基本概念的定义本身在各个社区之间常常不一致，缺乏共识。随着人工智能风险管理越来越跨学科，这个问题尤为突出。鉴于这一概念上的挑战，我们引入了一个统一的参考框架，以阐明人工智能安全和人工智能安全之间的差异和相互作用，旨在促进社区之间的共同理解和有效合作。

更新时间: 2024-05-29 21:00:47

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2405.19524v1

Predicting Traffic Congestion at Urban Intersections Using Data-Driven Modeling

Traffic congestion at intersections is a significant issue in urban areas, leading to increased commute times, safety hazards, and operational inefficiencies. This study aims to develop a predictive model for congestion at intersections in major U.S. cities, utilizing a dataset of trip-logging metrics from commercial vehicles across 4,800 intersections. The dataset encompasses 27 features, including intersection coordinates, street names, time of day, and traffic metrics (Kashyap et al., 2019). Additional features, such as rainfall/snowfall percentage, distance from downtown and outskirts, and road types, were incorporated to enhance the model's predictive power. The methodology involves data exploration, feature transformation, and handling missing values through low-rank models and label encoding. The proposed model has the potential to assist city planners and governments in anticipating traffic hot spots, optimizing operations, and identifying infrastructure challenges.

Updated: 2024-05-29 21:00:44

标题: 利用数据驱动建模预测城市交叉口交通拥堵

摘要: 在城市地区，交通拥堵是一个重要问题，导致通勤时间增加、安全隐患和运营效率低下。本研究旨在开发一个预测模型，用于预测美国主要城市交叉口的拥堵情况，利用一组商用车辆的行程记录数据集，涵盖4800个交叉口。该数据集包括27个特征，包括交叉口坐标、街道名称、时间和交通指标。为增强模型的预测能力，还引入了额外的特征，如降雨/降雪百分比、距离市中心和郊区的距离以及道路类型。方法包括数据探索、特征转换和通过低秩模型和标签编码处理缺失值。该模型有潜力帮助城市规划者和政府预测交通热点，优化运营并识别基础设施挑战。

更新时间: 2024-05-29 21:00:44

领域: cs.LG

下载: http://arxiv.org/abs/2404.08838v10

Artificial Intelligence Index Report 2024

The 2024 Index is our most comprehensive to date and arrives at an important moment when AI's influence on society has never been more pronounced. This year, we have broadened our scope to more extensively cover essential trends such as technical advancements in AI, public perceptions of the technology, and the geopolitical dynamics surrounding its development. Featuring more original data than ever before, this edition introduces new estimates on AI training costs, detailed analyses of the responsible AI landscape, and an entirely new chapter dedicated to AI's impact on science and medicine. The AI Index report tracks, collates, distills, and visualizes data related to artificial intelligence (AI). Our mission is to provide unbiased, rigorously vetted, broadly sourced data in order for policymakers, researchers, executives, journalists, and the general public to develop a more thorough and nuanced understanding of the complex field of AI. The AI Index is recognized globally as one of the most credible and authoritative sources for data and insights on artificial intelligence. Previous editions have been cited in major newspapers, including the The New York Times, Bloomberg, and The Guardian, have amassed hundreds of academic citations, and been referenced by high-level policymakers in the United States, the United Kingdom, and the European Union, among other places. This year's edition surpasses all previous ones in size, scale, and scope, reflecting the growing significance that AI is coming to hold in all of our lives.

Updated: 2024-05-29 20:59:57

标题: 人工智能指数报告2024

摘要: 2024指数是迄今为止我们最全面的一份报告，恰逢人工智能对社会的影响从未如此显著的时刻。今年，我们将范围扩大，更广泛地涵盖了人工智能技术进步、公众对技术的看法以及围绕其发展的地缘政治动态等重要趋势。本版报告比以往任何时候都包含更多原始数据，引入了新的人工智能培训成本估算、对负责任人工智能领域的详细分析，以及专门探讨人工智能对科学和医学影响的全新章节。人工智能指数报告跟踪、整合、提炼和可视化与人工智能相关的数据。我们的使命是提供经过严格审查、广泛来源的无偏见数据，以便决策者、研究人员、高管、记者和普通公众能够对人工智能这一复杂领域有更全面、更细致的理解。人工智能指数被全球公认为关于人工智能数据和洞见的最可靠和权威来源之一。以往的版本曾在包括《纽约时报》、彭博社和《卫报》在内的主要报纸中被引用，获得了数百个学术引用，并被美国、英国和欧盟等地的高级决策者所引用。今年的版本在规模、规模和范围上超过了以往所有版本，反映了人工智能在我们生活中日益重要的地位。

更新时间: 2024-05-29 20:59:57

领域: cs.AI

下载: http://arxiv.org/abs/2405.19522v1

Crowdsourcing with Difficulty: A Bayesian Rating Model for Heterogeneous Items

In applied statistics and machine learning, the "gold standards" used for training are often biased and almost always noisy. Dawid and Skene's justifiably popular crowdsourcing model adjusts for rater (coder, annotator) sensitivity and specificity, but fails to capture distributional properties of rating data gathered for training, which in turn biases training. In this study, we introduce a general purpose measurement-error model with which we can infer consensus categories by adding item-level effects for difficulty, discriminativeness, and guessability. We further show how to constrain the bimodal posterior of these models to avoid (or if necessary, allow) adversarial raters. We validate our model's goodness of fit with posterior predictive checks, the Bayesian analogue of $\chi^2$ tests. Dawid and Skene's model is rejected by goodness of fit tests, whereas our new model, which adjusts for item heterogeneity, is not rejected. We illustrate our new model with two well-studied data sets, binary rating data for caries in dental X-rays and implication in natural language.

Updated: 2024-05-29 20:59:28

标题: 用困难的众包：一种适用于异构项目的贝叶斯评分模型

摘要: 在应用统计学和机器学习中，用于训练的“黄金标准”通常存在偏差，几乎总是杂音干扰。Dawid和Skene的众包模型可以调整评分者（编码者，注释员）的敏感性和特异性，但无法捕捉用于训练的评分数据的分布特性，因此导致训练偏差。在本研究中，我们引入了一个通用的测量误差模型，通过为难度、区分度和猜测性添加项目级别效应，我们可以推断出共识类别。我们进一步展示如何约束这些模型的双峰后验分布，以避免（或必要时允许）敌对评分者。我们通过后验预测检查验证了我们模型的拟合度，这是$\chi^2$检验的贝叶斯模拟。Dawid和Skene的模型在拟合度测试中被拒绝，而我们的新模型，可以调整项目的异质性，没有被拒绝。我们用两个经过充分研究的数据集，即牙齿X射线中龋齿的二进制评分数据和自然语言中的含义，来说明我们的新模型。

更新时间: 2024-05-29 20:59:28

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2405.19521v1

S2malloc: Statistically Secure Allocator for Use-After-Free Protection And More

Attacks on heap memory, encompassing memory overflow, double and invalid free, use-after-free (UAF), and various heap spraying techniques are ever-increasing. Existing entropy-based secure memory allocators provide statistical defenses against virtually all of these attack vectors. Although they claim protections against UAF attacks, their designs are not tailored to detect (failed) attempts. Consequently, to beat this entropy-based protection, an attacker can simply launch the same attack repeatedly with the potential use of heap spraying to further improve their chance of success. We introduce S2malloc, aiming to enhance UAF-attempt detection without compromising other security guarantees or introducing significant performance overhead. To achieve this, we use three innovative constructs in secure allocator design: free block canaries (FBC) to detect UAF attempts, random in-block offset (RIO) to stop the attacker from accurately overwriting the victim object, and random bag layout (RBL) to impede attackers from estimating the block size based on its address. We show that (a) by reserving 25% of the object size for the RIO offset, an 8-byte canary offers a 69% protection rate if the attacker reuses the same pointer and 96% protection rate if the attacker does not, against UAF exploitation attempts targeting a 64 bytes object, with equal or higher security guarantees against all other attacks; and (b) S2malloc is practical, with only a 2.8% run-time overhead on PARSEC and an 11.5% overhead on SPEC. Compared to state-of-the-art entropy-based allocators, S2malloc improves UAF-protection without incurring additional performance overhead. Compared to UAF-mitigating allocators, S2malloc trades off a minuscule probability of failed protection for significantly lower overhead.

Updated: 2024-05-29 20:59:20

标题: S2malloc：用于使用之后释放保护和更多的统计安全分配器

摘要: 对堆内存的攻击，包括内存溢出、双重和无效释放、使用后释放（UAF）以及各种堆喷射技术，正在不断增加。现有基于熵的安全内存分配器提供针对几乎所有这些攻击向量的统计防御。尽管它们声称对抗UAF攻击，但它们的设计并未针对检测（失败的）尝试。因此，为了击败基于熵的保护，攻击者可以简单地反复发动相同的攻击，并潜在地使用堆喷射来进一步提高成功的机会。我们介绍了S2malloc，旨在增强UAF尝试检测，而不损害其他安全保证或引入显著的性能开销。为了实现这一目标，我们在安全分配器设计中使用了三种创新构造：自由块金丝雀（FBC）用于检测UAF尝试，随机内部偏移（RIO）阻止攻击者准确覆盖受害对象，以及随机包布局（RBL）阻碍攻击者根据地址估计块大小。我们表明，（a）通过将对象大小的25%保留给RIO偏移，如果攻击者重复使用相同指针，则8字节金丝雀提供69%的保护率，如果攻击者没有重复使用，则提供96%的保护率，针对一个64字节对象的UAF利用尝试，具有相等或更高的安全保证针对所有其他攻击；（b）S2malloc是实用的，在PARSEC上只有2.8%的运行时开销，在SPEC上则有11.5%的开销。与最先进的基于熵的分配器相比，S2malloc提高了UAF保护而不产生额外的性能开销。与UAF缓解分配器相比，S2malloc在显著降低开销的同时，权衡了微小的失败保护概率。

更新时间: 2024-05-29 20:59:20

领域: cs.CR

下载: http://arxiv.org/abs/2402.01894v2

Two-layer retrieval augmented generation framework for low-resource medical question-answering: proof of concept using Reddit data

Retrieval augmented generation (RAG) provides the capability to constrain generative model outputs, and mitigate the possibility of hallucination, by providing relevant in-context text. The number of tokens a generative large language model (LLM) can incorporate as context is finite, thus limiting the volume of knowledge from which to generate an answer. We propose a two-layer RAG framework for query-focused answer generation and evaluate a proof-of-concept for this framework in the context of query-focused summary generation from social media forums, focusing on emerging drug-related information. The evaluations demonstrate the effectiveness of the two-layer framework in resource constrained settings to enable researchers in obtaining near real-time data from users.

Updated: 2024-05-29 20:56:52

标题: 一个基于两层检索增强生成的低资源医疗问答框架：以Reddit数据为例的概念验证

摘要: 检索增强生成（RAG）提供了约束生成模型输出的能力，并通过提供相关的上下文文本来减轻产生幻觉的可能性。一个生成性大型语言模型（LLM）可以整合的token数量是有限的，从而限制了生成答案的知识量。我们提出了一个面向查询的答案生成的两层RAG框架，并在社交媒体论坛中的查询焦点摘要生成的背景下评估了这个框架的概念验证。评估结果表明，在资源受限制的环境中，这个两层框架的有效性，可以帮助研究人员从用户那里获得接近实时的数据。

更新时间: 2024-05-29 20:56:52

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.19519v1

Exploring the Potential of Hybrid Machine-Learning/Physics-Based Modeling for Atmospheric/Oceanic Prediction Beyond the Medium Range

This paper explores the potential of a hybrid modeling approach that combines machine learning (ML) with conventional physics-based modeling for weather prediction beyond the medium range. It extends the work of Arcomano et al. (2022), which tested the approach for short- and medium-range weather prediction, and the work of Arcomano et al. (2023), which investigated its potential for climate modeling. The hybrid model used for the forecast experiments of the paper is based on the low-resolution, simplified parameterization atmospheric general circulation model (AGCM) SPEEDY. In addition to the hybridized prognostic variables of SPEEDY, the current version of the model has three purely ML-based prognostic variables. One of these is 6~h cumulative precipitation, another is the sea surface temperature, while the third is the heat content of the top 300 m deep layer of the ocean. The model has skill in predicting the El Ni\~no cycle and its global teleconnections with precipitation for 3-7 months depending on the season. The model captures equatorial variability of the precipitation associated with Kelvin and Rossby waves and MJO. Predictions of the precipitation in the equatorial region have skill for 15 days in the East Pacific and 11.5 days in the West Pacific. Though the model has low spatial resolution, for these tasks it has prediction skill comparable to what has been published for high-resolution, purely physics-based, conventional operational forecast models.

Updated: 2024-05-29 20:56:44

标题: 探索混合机器学习/基于物理的建模在中长期大气/海洋预测中的潜力

摘要: 这篇论文探讨了一种混合建模方法的潜力，该方法将机器学习（ML）与传统的基于物理的建模相结合，用于超出中期范围的天气预测。它延续了Arcomano等人（2022年）的工作，该工作测试了该方法用于短期和中期天气预测，并延续了Arcomano等人（2023年）的工作，该工作调查了其在气候建模中的潜力。用于本文预测实验的混合模型基于低分辨率、简化的参数化大气环流模式（AGCM）SPEEDY。除了SPEEDY的混合预测变量外，该模型的当前版本还具有三个纯ML基础的预测变量。其中一个是6小时累积降水量，另一个是海表温度，第三个是海洋顶部300米深层的热含量。该模型在预测El Ni\~no循环及其与降水的全球遥相关方面具有技能，预测范围为3-7个月，具体取决于季节。该模型捕捉了与Kelvin和Rossby波以及MJO相关的降水的赤道变异性。在东太平洋，赤道地区降水的预测具有15天的技能，而在西太平洋则为11.5天。虽然该模型具有较低的空间分辨率，但对于这些任务，其预测技能与已发布的高分辨率、纯物理基础的传统运行预测模型相当。

更新时间: 2024-05-29 20:56:44

领域: physics.ao-ph,cs.LG,nlin.CD

下载: http://arxiv.org/abs/2405.19518v1

Enabling Visual Recognition at Radio Frequency

This paper introduces PanoRadar, a novel RF imaging system that brings RF resolution close to that of LiDAR, while providing resilience against conditions challenging for optical signals. Our LiDAR-comparable 3D imaging results enable, for the first time, a variety of visual recognition tasks at radio frequency, including surface normal estimation, semantic segmentation, and object detection. PanoRadar utilizes a rotating single-chip mmWave radar, along with a combination of novel signal processing and machine learning algorithms, to create high-resolution 3D images of the surroundings. Our system accurately estimates robot motion, allowing for coherent imaging through a dense grid of synthetic antennas. It also exploits the high azimuth resolution to enhance elevation resolution using learning-based methods. Furthermore, PanoRadar tackles 3D learning via 2D convolutions and addresses challenges due to the unique characteristics of RF signals. Our results demonstrate PanoRadar's robust performance across 12 buildings.

Updated: 2024-05-29 20:52:59

标题: 使用射频技术实现视觉识别

摘要: 本文介绍了PanoRadar，一种新颖的射频成像系统，将射频分辨率提高至接近LiDAR的水平，同时提供对光学信号挑战条件的弹性。我们的类似LiDAR的3D成像结果首次实现了在射频下的各种视觉识别任务，包括表面法线估计、语义分割和目标检测。PanoRadar利用旋转的单芯片毫米波雷达，结合新颖的信号处理和机器学习算法，生成周围环境的高分辨率3D图像。我们的系统准确估计机器人运动，通过合成天线密集网格实现相干成像。它还利用高方位分辨率，利用基于学习的方法增强高程分辨率。此外，PanoRadar通过2D卷积处理3D学习，并解决由射频信号独特特性引起的挑战。我们的结果展示了PanoRadar在12座建筑物上的稳健性能。

更新时间: 2024-05-29 20:52:59

领域: eess.SP,cs.CV,cs.LG,cs.RO

下载: http://arxiv.org/abs/2405.19516v1

Decentralized Optimization in Time-Varying Networks with Arbitrary Delays

We consider a decentralized optimization problem for networks affected by communication delays. Examples of such networks include collaborative machine learning, sensor networks, and multi-agent systems. To mimic communication delays, we add virtual non-computing nodes to the network, resulting in directed graphs. This motivates investigating decentralized optimization solutions on directed graphs. Existing solutions assume nodes know their out-degrees, resulting in limited applicability. To overcome this limitation, we introduce a novel gossip-based algorithm, called DT-GO, that does not need to know the out-degrees. The algorithm is applicable in general directed networks, for example networks with delays or limited acknowledgment capabilities. We derive convergence rates for both convex and non-convex objectives, showing that our algorithm achieves the same complexity order as centralized Stochastic Gradient Descent. In other words, the effects of the graph topology and delays are confined to higher-order terms. Additionally, we extend our analysis to accommodate time-varying network topologies. Numerical simulations are provided to support our theoretical findings.

Updated: 2024-05-29 20:51:38

标题: 在具有任意延迟的时变网络中的分散优化

摘要: 我们考虑了一个受通信延迟影响的网络的分散优化问题。这样的网络的例子包括协作机器学习、传感器网络和多智能体系统。为了模拟通信延迟，我们向网络中添加了虚拟非计算节点，从而得到了有向图。这促使我们研究有向图上的分散优化解决方案。现有的解决方案假设节点知道它们的出度，导致适用性有限。为了克服这一限制，我们引入了一种新颖的基于传闻的算法，称为DT-GO，它不需要知道出度。该算法适用于一般的有向网络，例如具有延迟或有限确认能力的网络。我们推导了凸和非凸目标的收敛速度，表明我们的算法达到了与集中式随机梯度下降相同的复杂度顺序。换句话说，图的拓扑结构和延迟的影响被限制在高阶项中。此外，我们扩展了我们的分析以适应时变网络拓扑。我们提供了数值模拟来支持我们的理论发现。

更新时间: 2024-05-29 20:51:38

领域: cs.LG,cs.DC,cs.SY,eess.SY,math.OC,stat.ML,68W10, 68W15, 68W40, 90C06, 90C35, 90C25,G.1.6; F.2.1; E.4

下载: http://arxiv.org/abs/2405.19513v1

IncidentResponseGPT: Generating Traffic Incident Response Plans with Generative Artificial Intelligence

Traffic congestion due to road incidents poses a significant challenge in urban environments, leading to increased pollution, economic losses, and traffic congestion. Efficiently managing these incidents is imperative for mitigating their adverse effects; however, the complexity of urban traffic systems and the variety of potential incidents represent a considerable obstacle. This paper introduces IncidentResponseGPT, an innovative solution designed to assist traffic management authorities by providing rapid, informed, and adaptable traffic incident response plans. By integrating a Generative AI platform with real-time traffic incident reports and operational guidelines, our system aims to streamline the decision-making process in responding to traffic incidents. The research addresses the critical challenges involved in deploying AI in traffic management, including overcoming the complexity of urban traffic networks, ensuring real-time decision-making capabilities, aligning with local laws and regulations, and securing public acceptance for AI-driven systems. Through a combination of text analysis of accident reports, validation of AI recommendations through traffic simulation, and implementation of transparent and validated AI systems, IncidentResponseGPT offers a promising approach to optimizing traffic flow and reducing congestion in the face of traffic incidents. The relevance of this work extends to traffic management authorities, emergency response teams, and municipal bodies, all integral stakeholders in urban traffic control and incident management. By proposing a novel solution to the identified challenges, this research aims to develop a framework that not only facilitates faster resolution of traffic incidents but also minimizes their overall impact on urban traffic systems.

Updated: 2024-05-29 20:50:48

标题: 事故响应GPT：利用生成人工智能生成交通事故响应计划

摘要: 由于道路事故导致的交通拥堵在城市环境中构成了一个重大挑战，导致污染增加、经济损失和交通拥堵。有效管理这些事故对于减轻其不利影响至关重要；然而，城市交通系统的复杂性和潜在事故的多样性构成了一个相当大的障碍。本文介绍了IncidentResponseGPT，这是一个创新解决方案，旨在通过提供快速、知情和可适应的交通事故应对计划，协助交通管理部门。通过将生成式人工智能平台与实时交通事故报告和操作指南集成，我们的系统旨在简化对交通事故的响应决策过程。该研究解决了在交通管理中部署人工智能所涉及的关键挑战，包括克服城市交通网络的复杂性、确保实时决策能力、与当地法律法规保持一致，以及确保公众对以人工智能驱动的系统的接受。通过对事故报告的文本分析、通过交通模拟验证人工智能建议，并实施透明且经过验证的人工智能系统，IncidentResponseGPT提供了一个有前途的方法，优化交通流量并减少交通事故导致的拥堵。这项工作的相关性延伸到交通管理部门、应急响应团队和市政机构，它们都是城市交通控制和事故管理中不可或缺的利益相关者。通过提出对已确定挑战的新颖解决方案，本研究旨在开发一个框架，不仅能促进交通事故更快速地解决，还能最小化其对城市交通系统的整体影响。

更新时间: 2024-05-29 20:50:48

领域: cs.LG,cs.HC

下载: http://arxiv.org/abs/2404.18550v2

Hybrid Preference Optimization: Augmenting Direct Preference Optimization with Auxiliary Objectives

For aligning large language models (LLMs), prior work has leveraged reinforcement learning via human feedback (RLHF) or variations of direct preference optimization (DPO). While DPO offers a simpler framework based on maximum likelihood estimation, it compromises on the ability to tune language models to easily maximize non-differentiable and non-binary objectives according to the LLM designer's preferences (e.g., using simpler language or minimizing specific kinds of harmful content). These may neither align with user preferences nor even be able to be captured tractably by binary preference data. To leverage the simplicity and performance of DPO with the generalizability of RL, we propose a hybrid approach between DPO and RLHF. With a simple augmentation to the implicit reward decomposition of DPO, we allow for tuning LLMs to maximize a set of arbitrary auxiliary rewards using offline RL. The proposed method, Hybrid Preference Optimization (HPO), shows the ability to effectively generalize to both user preferences and auxiliary designer objectives, while preserving alignment performance across a range of challenging benchmarks and model sizes.

Updated: 2024-05-29 20:48:47

标题: 混合偏好优化：通过辅助目标增强直接偏好优化

摘要: 为了对齐大型语言模型（LLMs），先前的工作利用强化学习通过人类反馈（RLHF）或直接偏好优化（DPO）的变体。虽然DPO提供了一个基于最大似然估计的更简单的框架，但它在调整语言模型以轻松最大化非可微分和非二进制目标方面做出了妥协，这些目标根据LLM设计者的偏好（例如，使用更简单的语言或最小化特定类型的有害内容）。这些可能既不符合用户偏好，也无法通过二进制偏好数据轻松捕捉。为了利用DPO的简易性和性能与RL的泛化能力，我们提出了一种介于DPO和RLHF之间的混合方法。通过对DPO的隐式奖励分解进行简单的增强，我们允许调整LLMs以最大化一组任意辅助奖励，使用离线RL。所提出的方法，混合偏好优化（HPO），展示了在一系列具有挑战性的基准测试和模型大小上保持对齐性能的能力，同时有效地泛化到用户偏好和辅助设计目标。

更新时间: 2024-05-29 20:48:47

领域: cs.AI

下载: http://arxiv.org/abs/2405.17956v2

Serpent: Scalable and Efficient Image Restoration via Multi-scale Structured State Space Models

The landscape of computational building blocks of efficient image restoration architectures is dominated by a combination of convolutional processing and various attention mechanisms. However, convolutional filters, while efficient, are inherently local and therefore struggle with modeling long-range dependencies in images. In contrast, attention excels at capturing global interactions between arbitrary image regions, but suffers from a quadratic cost in image dimension. In this work, we propose Serpent, an efficient architecture for high-resolution image restoration that combines recent advances in state space models (SSMs) with multi-scale signal processing in its core computational block. SSMs, originally introduced for sequence modeling, can maintain a global receptive field with a favorable linear scaling in input size. We propose a novel hierarchical architecture inspired by traditional signal processing principles, that converts the input image into a collection of sequences and processes them in a multi-scale fashion. Our experimental results demonstrate that Serpent can achieve reconstruction quality on par with state-of-the-art techniques, while requiring orders of magnitude less compute (up to $150$ fold reduction in FLOPS) and a factor of up to $5\times$ less GPU memory while maintaining a compact model size. The efficiency gains achieved by Serpent are especially notable at high image resolutions.

Updated: 2024-05-29 20:43:07

标题: 蛇形：通过多尺度结构状态空间模型实现可扩展高效的图像恢复

摘要: 高效图像恢复体系结构的计算构建块景观主要由卷积处理和各种注意机制组合而成。然而，卷积滤波器虽然高效，但本质上是局部的，因此在建模图像中的长程依赖关系方面存在困难。相比之下，注意机制擅长捕捉任意图像区域之间的全局交互作用，但在图像维度方面存在二次成本。在这项工作中，我们提出了Serpent，一种高分辨率图像恢复的高效体系结构，它将最近在状态空间模型（SSMs）和多尺度信号处理方面的进展结合在其核心计算块中。SSMs最初用于序列建模，可以在输入尺寸上保持全局感受域，并且具有良好的线性缩放。我们提出了一种受传统信号处理原则启发的新颖分层体系结构，将输入图像转换为一组序列，并以多尺度方式处理它们。我们的实验结果表明，Serpent可以实现与最先进技术相媲美的重建质量，同时需要数量级更少的计算量（FLOPS减少高达150倍）和少达5倍的GPU内存，同时保持紪洲的模型大小。Serpent实现的效率提升在高分辨率图像上尤为显著。

更新时间: 2024-05-29 20:43:07

领域: eess.IV,cs.CV,cs.LG,I.4.4; I.4.5

下载: http://arxiv.org/abs/2403.17902v2

Comparative Analysis of Open-Source Language Models in Summarizing Medical Text Data

Unstructured text in medical notes and dialogues contains rich information. Recent advancements in Large Language Models (LLMs) have demonstrated superior performance in question answering and summarization tasks on unstructured text data, outperforming traditional text analysis approaches. However, there is a lack of scientific studies in the literature that methodically evaluate and report on the performance of different LLMs, specifically for domain-specific data such as medical chart notes. We propose an evaluation approach to analyze the performance of open-source LLMs such as Llama2 and Mistral for medical summarization tasks, using GPT-4 as an assessor. Our innovative approach to quantitative evaluation of LLMs can enable quality control, support the selection of effective LLMs for specific tasks, and advance knowledge discovery in digital health.

Updated: 2024-05-29 20:40:32

标题: 开源语言模型在总结医学文本数据中的比较分析

摘要: 医疗记录和对话中的非结构化文本包含丰富的信息。最近，大型语言模型（LLMs）的最新进展在非结构化文本数据上的问答和摘要任务中表现出卓越的性能，优于传统的文本分析方法。然而，文献中缺乏系统评估和报告不同LLMs性能的科学研究，特别是针对医疗记录等领域特定数据。我们提出了一种评估方法，用于分析开源LLMs（如Llama2和Mistral）在医疗摘要任务中的表现，以GPT-4作为评估者。我们创新的LLMs定量评估方法可以实现质量控制，支持为特定任务选择有效的LLMs，并推动数字健康领域的知识发现。

更新时间: 2024-05-29 20:40:32

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2405.16295v3

MDS-ViTNet: Improving saliency prediction for Eye-Tracking with Vision Transformer

In this paper, we present a novel methodology we call MDS-ViTNet (Multi Decoder Saliency by Vision Transformer Network) for enhancing visual saliency prediction or eye-tracking. This approach holds significant potential for diverse fields, including marketing, medicine, robotics, and retail. We propose a network architecture that leverages the Vision Transformer, moving beyond the conventional ImageNet backbone. The framework adopts an encoder-decoder structure, with the encoder utilizing a Swin transformer to efficiently embed most important features. This process involves a Transfer Learning method, wherein layers from the Vision Transformer are converted by the Encoder Transformer and seamlessly integrated into a CNN Decoder. This methodology ensures minimal information loss from the original input image. The decoder employs a multi-decoding technique, utilizing dual decoders to generate two distinct attention maps. These maps are subsequently combined into a singular output via an additional CNN model. Our trained model MDS-ViTNet achieves state-of-the-art results across several benchmarks. Committed to fostering further collaboration, we intend to make our code, models, and datasets accessible to the public.

Updated: 2024-05-29 20:28:04

标题: MDS-ViTNet：利用Vision Transformer改进眼动追踪的显著性预测

摘要: 在本文中，我们提出了一种新颖的方法论，称为MDS-ViTNet（Vision Transformer网络的多解码注意力），用于增强视觉显著性预测或眼动追踪。这种方法在包括营销、医学、机器人技术和零售在内的多个领域具有重要潜力。我们提出了一种网络架构，利用Vision Transformer，超越了传统的ImageNet骨干。该框架采用编码器-解码器结构，编码器利用Swin transformer高效嵌入最重要的特征。这个过程涉及到一个迁移学习方法，其中Vision Transformer的层通过编码器Transformer转换并无缝集成到CNN解码器中。这种方法确保了原始输入图像的最小信息损失。解码器采用多解码技术，利用双解码器生成两个不同的注意力地图。这些地图随后通过额外的CNN模型合并为一个输出。我们训练的模型MDS-ViTNet在几个基准测试中取得了最先进的结果。为了促进进一步的合作，我们打算将我们的代码、模型和数据集提供给公众。

更新时间: 2024-05-29 20:28:04

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.19501v1

Momentum for the Win: Collaborative Federated Reinforcement Learning across Heterogeneous Environments

We explore a Federated Reinforcement Learning (FRL) problem where $N$ agents collaboratively learn a common policy without sharing their trajectory data. To date, existing FRL work has primarily focused on agents operating in the same or ``similar" environments. In contrast, our problem setup allows for arbitrarily large levels of environment heterogeneity. To obtain the optimal policy which maximizes the average performance across all potentially completely different environments, we propose two algorithms: FedSVRPG-M and FedHAPG-M. In contrast to existing results, we demonstrate that both FedSVRPG-M and FedHAPG-M, both of which leverage momentum mechanisms, can exactly converge to a stationary point of the average performance function, regardless of the magnitude of environment heterogeneity. Furthermore, by incorporating the benefits of variance-reduction techniques or Hessian approximation, both algorithms achieve state-of-the-art convergence results, characterized by a sample complexity of $\mathcal{O}\left(\epsilon^{-\frac{3}{2}}/N\right)$. Notably, our algorithms enjoy linear convergence speedups with respect to the number of agents, highlighting the benefit of collaboration among agents in finding a common policy.

Updated: 2024-05-29 20:24:42

标题: 胜利的动力：跨异构环境的协作联邦强化学习

摘要: 我们探讨了一个联邦强化学习（FRL）问题，其中$N$个代理共同学习一个共同策略，而不共享它们的轨迹数据。迄今为止，现有的FRL工作主要集中在在相同或“相似”环境中操作的代理。相比之下，我们的问题设置允许任意大的环境异质性水平。为了获得最大化所有潜在完全不同环境中的平均表现的最优策略，我们提出了两种算法：FedSVRPG-M和FedHAPG-M。与现有结果相比，我们证明了利用动量机制的FedSVRPG-M和FedHAPG-M都能够精确收敛到平均表现函数的稳定点，而不受环境异质性大小的限制。此外，通过结合方差减少技术或Hessian逼近的好处，两种算法均实现了具有$\mathcal{O}\left(\epsilon^{-\frac{3}{2}}/N\right)$样本复杂度的最新收敛结果。值得注意的是，我们的算法在代理数量方面享有线性收敛加速度的优势，突显了代理之间合作寻找共同策略的好处。

更新时间: 2024-05-29 20:24:42

领域: cs.LG,cs.MA,math.OC

下载: http://arxiv.org/abs/2405.19499v1

Machine Psychology: Integrating Operant Conditioning with the Non-Axiomatic Reasoning System for Advancing Artificial General Intelligence Research

This paper introduces an interdisciplinary framework called Machine Psychology, which merges principles from operant learning psychology with a specific Artificial Intelligence model, the Non-Axiomatic Reasoning System (NARS), to enhance Artificial General Intelligence (AGI) research. The core premise of this framework is that adaptation is crucial to both biological and artificial intelligence and can be understood through operant conditioning principles. The study assesses this approach via three operant learning tasks using OpenNARS for Applications (ONA): simple discrimination, changing contingencies, and conditional discrimination tasks. In the simple discrimination task, NARS demonstrated rapid learning, achieving perfect accuracy during both training and testing phases. The changing contingencies task showcased NARS's adaptability, as it successfully adjusted its behavior when task conditions were reversed. In the conditional discrimination task, NARS handled complex learning scenarios effectively, achieving high accuracy by forming and utilizing intricate hypotheses based on conditional cues. These findings support the application of operant conditioning as a framework for creating adaptive AGI systems. NARS's ability to operate under conditions of insufficient knowledge and resources, coupled with its sensorimotor reasoning capabilities, establishes it as a robust model for AGI. The Machine Psychology framework, by incorporating elements of natural intelligence such as continuous learning and goal-driven behavior, offers a scalable and flexible approach for real-world applications. Future research should investigate using enhanced NARS systems, more advanced tasks, and applying this framework to diverse, complex challenges to further progress the development of human-level AI.

Updated: 2024-05-29 20:23:57

标题: 机器心理学：将操作条件与非公理推理系统相结合，推动人工通用智能研究。

摘要: 这篇论文介绍了一个名为机器心理学的跨学科框架，将操作学习心理学原理与特定的人工智能模型——非公理推理系统(NARS)相结合，以增强人工通用智能(AGI)研究。该框架的核心前提是适应对于生物和人工智能都至关重要，并可以通过操作条件原理加以理解。该研究通过三个操作学习任务利用OpenNARS for Applications (ONA)评估了这一方法：简单的辨别、改变条件和条件辨别任务。在简单的辨别任务中，NARS展示了快速学习的能力，在训练和测试阶段均达到了完美的准确性。改变条件任务展示了NARS的适应能力，当任务条件被颠倒时，它成功地调整了自己的行为。在条件辨别任务中，NARS有效地处理了复杂的学习场景，通过形成和利用基于条件提示的复杂假设，实现了高准确性。这些发现支持操作条件作为创建适应性AGI系统的框架。NARS在知识和资源不足的情况下运作的能力，以及其感知运动推理能力，使其成为AGI的强大模型。机器心理学框架通过整合连续学习和目标驱动行为等自然智能元素，为实际应用提供了一种可扩展和灵活的方法。未来的研究应该探讨使用增强的NARS系统、更高级的任务，并将这一框架应用于各种复杂挑战，以进一步推动人类级AI的发展。

更新时间: 2024-05-29 20:23:57

领域: cs.AI

下载: http://arxiv.org/abs/2405.19498v1

Gaussian Flow Bridges for Audio Domain Transfer with Unpaired Data

Audio domain transfer is the process of modifying audio signals to match characteristics of a different domain, while retaining the original content. This paper investigates the potential of Gaussian Flow Bridges, an emerging approach in generative modeling, for this problem. The presented framework addresses the transport problem across different distributions of audio signals through the implementation of a series of two deterministic probability flows. The proposed framework facilitates manipulation of the target distribution properties through a continuous control variable, which defines a certain aspect of the target domain. Notably, this approach does not rely on paired examples for training. To address identified challenges on maintaining the speech content consistent, we recommend a training strategy that incorporates chunk-based minibatch Optimal Transport couplings of data samples and noise. Comparing our unsupervised method with established baselines, we find competitive performance in tasks of reverberation and distortion manipulation. Despite encoutering limitations, the intriguing results obtained in this study underscore potential for further exploration.

Updated: 2024-05-29 20:23:01

标题: 高斯流桥在音频领域无配对数据转移中的应用

摘要: 音频领域转移是修改音频信号以匹配不同领域特征的过程，同时保留原始内容。本文研究了高斯流桥的潜力，这是生成建模中一种新兴方法，用于解决这一问题。所提出的框架通过实现一系列两个确定性概率流来解决不同音频信号分布之间的传输问题。所提出的框架通过一个连续控制变量来促进目标分布属性的调整，该变量定义了目标域的某个特定方面。值得注意的是，这种方法不依赖于配对的示例进行训练。为了解决保持语音内容一致的挑战，我们建议采用一种训练策略，该策略将数据样本和噪声进行基于块的小批量最优传输耦合。将我们的无监督方法与已建立的基线进行比较，我们发现在混响和失真操作任务中表现有竞争力。尽管遇到了一些限制，但本研究取得的有趣结果强调了进一步探索的潜力。

更新时间: 2024-05-29 20:23:01

领域: eess.AS,cs.LG,cs.SD

下载: http://arxiv.org/abs/2405.19497v1

Qiskit Code Assistant: Training LLMs for generating Quantum Computing Code

Code Large Language Models (Code LLMs) have emerged as powerful tools, revolutionizing the software development landscape by automating the coding process and reducing time and effort required to build applications. This paper focuses on training Code LLMs to specialize in the field of quantum computing. We begin by discussing the unique needs of quantum computing programming, which differ significantly from classical programming approaches or languages. A Code LLM specializing in quantum computing requires a foundational understanding of quantum computing and quantum information theory. However, the scarcity of available quantum code examples and the rapidly evolving field, which necessitates continuous dataset updates, present significant challenges. Moreover, we discuss our work on training Code LLMs to produce high-quality quantum code using the Qiskit library. This work includes an examination of the various aspects of the LLMs used for training and the specific training conditions, as well as the results obtained with our current models. To evaluate our models, we have developed a custom benchmark, similar to HumanEval, which includes a set of tests specifically designed for the field of quantum computing programming using Qiskit. Our findings indicate that our model outperforms existing state-of-the-art models in quantum computing tasks. We also provide examples of code suggestions, comparing our model to other relevant code LLMs. Finally, we introduce a discussion on the potential benefits of Code LLMs for quantum computing computational scientists, researchers, and practitioners. We also explore various features and future work that could be relevant in this context.

Updated: 2024-05-29 20:21:00

标题: Qiskit代码助手：训练LLMs生成量子计算代码

摘要: 大型语言模型（Code LLMs）已经成为强大的工具，通过自动化编码过程和减少构建应用程序所需的时间和精力，改变了软件开发领域的格局。本文着重训练Code LLMs专门用于量子计算领域。我们首先讨论量子计算编程的独特需求，与传统编程方法或语言有显著差异。专门用于量子计算的Code LLMs需要对量子计算和量子信息理论有基础的了解。然而，由于可用量子代码示例的稀缺性和快速发展的领域，需要不断更新数据集，这带来了重大挑战。此外，我们讨论了我们在训练Code LLMs以使用Qiskit库生成高质量量子代码方面的工作。这项工作包括对用于训练的LLMs的各个方面以及具体训练条件的检查，以及我们当前模型的结果。为了评估我们的模型，我们开发了一个类似于HumanEval的自定义基准，其中包含一组专门设计用于使用Qiskit进行量子计算编程的测试。我们的研究结果表明，我们的模型在量子计算任务中优于现有的最先进模型。我们还提供了代码建议的示例，将我们的模型与其他相关的Code LLMs进行比较。最后，我们介绍了关于Code LLMs对量子计算计算科学家、研究人员和从业者潜在益处的讨论。我们还探讨了在这一背景下可能相关的各种特征和未来工作。

更新时间: 2024-05-29 20:21:00

领域: quant-ph,cs.AI

下载: http://arxiv.org/abs/2405.19495v1

Online Nonparametric Supervised Learning for Massive Data

Despite their benefits in terms of simplicity, low computational cost and data requirement, parametric machine learning algorithms, such as linear discriminant analysis, quadratic discriminant analysis or logistic regression, suffer from serious drawbacks including linearity, poor fit of features to the usually imposed normal distribution and high dimensionality. Batch kernel-based nonparametric classifier, which overcomes the linearity and normality of features constraints, represent an interesting alternative for supervised classification problem. However, it suffers from the ``curse of dimension". The problem can be alleviated by the explosive sample size in the era of big data, while large-scale data size presents some challenges in the storage of data and the calculation of the classifier. These challenges make the classical batch nonparametric classifier no longer applicable. This motivates us to develop a fast algorithm adapted to the real-time calculation of the nonparametric classifier in massive as well as streaming data frameworks. This online classifier includes two steps. First, we consider an online principle components analysis to reduce the dimension of the features with a very low computation cost. Then, a stochastic approximation algorithm is deployed to obtain a real-time calculation of the nonparametric classifier. The proposed methods are evaluated and compared to some commonly used machine learning algorithms for real-time fetal well-being monitoring. The study revealed that, in terms of accuracy, the offline (or Batch), as well as, the online classifiers are good competitors to the random forest algorithm. Moreover, we show that the online classifier gives the best trade-off accuracy/computation cost compared to the offline classifier.

Updated: 2024-05-29 20:04:23

标题: 大规模数据的在线非参数监督学习

摘要: 尽管参数机器学习算法在简单性、低计算成本和数据需求方面具有优势，如线性判别分析、二次判别分析或逻辑回归，但存在严重缺点，包括线性、特征拟合通常施加的正态分布和高维度。批处理基于核的非参数分类器，克服了特征约束的线性性和正态性，代表了监督分类问题的一个有趣的替代方案。然而，它受到“维度灾难”的困扰。在大数据时代，爆炸性样本量可以缓解这个问题，而大规模数据大小在数据存储和分类器计算方面提出一些挑战。这些挑战使得传统的批处理非参数分类器不再适用。这促使我们开发一个快速算法，适用于大规模和流数据框架中非参数分类器的实时计算。这个在线分类器包括两个步骤。首先，我们考虑在线主成分分析来降低特征的维度，计算成本非常低。然后，部署一种随机逼近算法来实现非参数分类器的实时计算。提出的方法被评估并与一些常用的机器学习算法进行比较，用于实时胎儿健康监测。研究表明，在准确性方面，离线（或批处理）以及在线分类器是随机森林算法的良好竞争对手。此外，我们展示了在线分类器在准确性/计算成本的最佳权衡方面优于离线分类器。

更新时间: 2024-05-29 20:04:23

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2405.19486v1

Participation in the age of foundation models

Growing interest and investment in the capabilities of foundation models has positioned such systems to impact a wide array of public services. Alongside these opportunities is the risk that these systems reify existing power imbalances and cause disproportionate harm to marginalized communities. Participatory approaches hold promise to instead lend agency and decision-making power to marginalized stakeholders. But existing approaches in participatory AI/ML are typically deeply grounded in context - how do we apply these approaches to foundation models, which are, by design, disconnected from context? Our paper interrogates this question. First, we examine existing attempts at incorporating participation into foundation models. We highlight the tension between participation and scale, demonstrating that it is intractable for impacted communities to meaningfully shape a foundation model that is intended to be universally applicable. In response, we develop a blueprint for participatory foundation models that identifies more local, application-oriented opportunities for meaningful participation. In addition to the "foundation" layer, our framework proposes the "subfloor'' layer, in which stakeholders develop shared technical infrastructure, norms and governance for a grounded domain, and the "surface'' layer, in which affected communities shape the use of a foundation model for a specific downstream task. The intermediate "subfloor'' layer scopes the range of potential harms to consider, and affords communities more concrete avenues for deliberation and intervention. At the same time, it avoids duplicative effort by scaling input across relevant use cases. Through three case studies in clinical care, financial services, and journalism, we illustrate how this multi-layer model can create more meaningful opportunities for participation than solely intervening at the foundation layer.

Updated: 2024-05-29 19:53:23

标题: 基于基础模型的参与

摘要: 对基础模型能力的日益关注和投资，使得这类系统有可能影响各种公共服务。与此同时，存在这些系统固化现有权力失衡并给边缘化社群带来不成比例的伤害的风险。参与式方法有望赋予边缘化利益相关者代理权和决策权。然而，现有的参与式人工智能/机器学习方法通常深深扎根于特定背景 - 我们如何将这些方法应用于基础模型，而基础模型设计上与背景脱钩？首先，我们审视了将参与融入基础模型的现有尝试。我们强调了参与和规模之间的紧张关系，展示了对于受影响社群来说，让他们能够有意义地塑造一个旨在普遍适用的基础模型是不可行的。作为回应，我们提出了一个参与式基础模型的蓝图，确定了更多地基于本地、应用导向的机会，以进行有意义的参与。除了“基础”层之外，我们的框架还提出了“子楼层”层，其中利益相关者共同开发共享的技术基础设施、规范和治理以服务于一个具体领域，以及“表层”层，其中受影响社群塑造基础模型在特定下游任务中的使用。中间的“子楼层”层限定了需要考虑的潜在伤害范围，并为社群提供更具体的讨论和干预途径。同时，通过跨相关用例扩展输入，避免了重复努力。通过在临床护理、金融服务和新闻业中进行三个案例研究，我们说明了这个多层模型如何创造比仅仅在基础层介入更有意义的参与机会。

更新时间: 2024-05-29 19:53:23

领域: cs.CY,cs.AI,cs.HC,cs.LG

下载: http://arxiv.org/abs/2405.19479v1

Fast and Regret Optimal Best Arm Identification: Fundamental Limits and Low-Complexity Algorithms

This paper considers a stochastic Multi-Armed Bandit (MAB) problem with dual objectives: (i) quick identification and commitment to the optimal arm, and (ii) reward maximization throughout a sequence of $T$ consecutive rounds. Though each objective has been individually well-studied, i.e., best arm identification for (i) and regret minimization for (ii), the simultaneous realization of both objectives remains an open problem, despite its practical importance. This paper introduces \emph{Regret Optimal Best Arm Identification} (ROBAI) which aims to achieve these dual objectives. To solve ROBAI with both pre-determined stopping time and adaptive stopping time requirements, we present an algorithm called EOCP and its variants respectively, which not only achieve asymptotic optimal regret in both Gaussian and general bandits, but also commit to the optimal arm in $\mathcal{O}(\log T)$ rounds with pre-determined stopping time and $\mathcal{O}(\log^2 T)$ rounds with adaptive stopping time. We further characterize lower bounds on the commitment time (equivalent to the sample complexity) of ROBAI, showing that EOCP and its variants are sample optimal with pre-determined stopping time, and almost sample optimal with adaptive stopping time. Numerical results confirm our theoretical analysis and reveal an interesting "over-exploration" phenomenon carried by classic UCB algorithms, such that EOCP has smaller regret even though it stops exploration much earlier than UCB, i.e., $\mathcal{O}(\log T)$ versus $\mathcal{O}(T)$, which suggests over-exploration is unnecessary and potentially harmful to system performance.

Updated: 2024-05-29 19:49:02

标题: 快速且遗憾最小的最佳臂识别：基本极限和低复杂度算法

摘要: 本文考虑了一个具有双重目标的随机多臂赌博机(Multi-Armed Bandit，MAB)问题：(i)快速确定并致力于最佳臂，以及(ii)在一系列$T$个连续轮次中最大化奖励。尽管每个目标都已经被单独研究得很好，即(i)最佳臂确定和(ii)遗憾最小化，但同时实现这两个目标仍然是一个未解决的问题，尽管它在实践中非常重要。本文引入了“遗憾最优最佳臂确定”(Regret Optimal Best Arm Identification，ROBAI)的概念，旨在实现这两个双重目标。为了解决具有预定停止时间和自适应停止时间要求的ROBAI问题，我们分别提出了一个名为EOCP的算法及其变体，它不仅在高斯赌博机和一般赌博机中实现了渐近最优遗憾，还在预定停止时间下在$\mathcal{O}(\log T)$轮内确定了最佳臂，在自适应停止时间下在$\mathcal{O}(\log^2 T)$轮内确定了最佳臂。我们进一步对ROBAI的确定时间（等同于样本复杂度）进行了下界刻画，表明EOCP及其变体在预定停止时间下是样本最优的，而在自适应停止时间下几乎是样本最优的。数值结果证实了我们的理论分析，并揭示了一个有趣的“过度探索”现象，即经典的UCB算法存在这种现象，尽管EOCP停止探索得更早，即$\mathcal{O}(\log T)$与$\mathcal{O}(T)$相比，但它的遗憾更小，这表明过度探索是不必要的，可能对系统性能有害。

更新时间: 2024-05-29 19:49:02

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2309.00591v3

GA-SmaAt-GNet: Generative Adversarial Small Attention GNet for Extreme Precipitation Nowcasting

In recent years, data-driven modeling approaches have gained significant attention across various meteorological applications, particularly in weather forecasting. However, these methods often face challenges in handling extreme weather conditions. In response, we present the GA-SmaAt-GNet model, a novel generative adversarial framework for extreme precipitation nowcasting. This model features a unique SmaAt-GNet generator, an extension of the successful SmaAt-UNet architecture, capable of integrating precipitation masks (binarized precipitation maps) to enhance predictive accuracy. Additionally, GA-SmaAt-GNet incorporates an attention-augmented discriminator inspired by the Pix2Pix architecture. This innovative framework paves the way for generative precipitation nowcasting using multiple data sources. We evaluate the performance of SmaAt-GNet and GA-SmaAt-GNet using real-life precipitation data from the Netherlands, revealing notable improvements in overall performance and for extreme precipitation events compared to other models. Specifically, our proposed architecture demonstrates its main performance gain in summer and autumn, when precipitation intensity is typically at its peak. Furthermore, we conduct uncertainty analysis on the GA-SmaAt-GNet model and the precipitation dataset, providing insights into its predictive capabilities. Finally, we employ Grad-CAM to offer visual explanations of our model's predictions, generating activation heatmaps that highlight areas of input activation throughout the network.

Updated: 2024-05-29 19:41:41

标题: GA-SmaAt-GNet: 生成对抗小关注GNet用于极端降水的即时预测

摘要: 近年来，数据驱动的建模方法在各种气象应用中引起了极大关注，特别是在天气预报方面。然而，这些方法在处理极端天气条件时经常面临挑战。为此，我们提出了GA-SmaAt-GNet模型，这是一种新颖的用于极端降水即时预测的生成对抗框架。该模型具有独特的SmaAt-GNet生成器，这是成功的SmaAt-UNet架构的扩展，能够整合降水蒙版（二值化降水地图）以增强预测准确性。此外，GA-SmaAt-GNet还融合了受Pix2Pix架构启发的注意力增强鉴别器。这种创新框架为利用多种数据源进行生成性降水即时预测铺平了道路。我们利用荷兰的真实降水数据评估了SmaAt-GNet和GA-SmaAt-GNet的性能，发现与其他模型相比，在整体性能和极端降水事件方面有显著改进。具体来说，我们提出的架构在夏季和秋季表现出主要性能增益，因为在这些季节，降水强度通常达到峰值。此外，我们对GA-SmaAt-GNet模型和降水数据集进行了不确定性分析，为其预测能力提供了见解。最后，我们利用Grad-CAM提供我们模型预测的可视解释，生成突出显示整个网络中输入激活区域的激活热图。

更新时间: 2024-05-29 19:41:41

领域: cs.LG,physics.ao-ph,I.2; I.5

下载: http://arxiv.org/abs/2401.09881v2

The Data Minimization Principle in Machine Learning

The principle of data minimization aims to reduce the amount of data collected, processed or retained to minimize the potential for misuse, unauthorized access, or data breaches. Rooted in privacy-by-design principles, data minimization has been endorsed by various global data protection regulations. However, its practical implementation remains a challenge due to the lack of a rigorous formulation. This paper addresses this gap and introduces an optimization framework for data minimization based on its legal definitions. It then adapts several optimization algorithms to perform data minimization and conducts a comprehensive evaluation in terms of their compliance with minimization objectives as well as their impact on user privacy. Our analysis underscores the mismatch between the privacy expectations of data minimization and the actual privacy benefits, emphasizing the need for approaches that account for multiple facets of real-world privacy risks.

Updated: 2024-05-29 19:40:27

标题: 《机器学习中的数据最小化原则》

摘要: 数据最小化原则旨在减少收集、处理或保留数据的数量，以最大程度减少滥用、未经授权访问或数据泄露的风险。根植于隐私设计原则，数据最小化已得到各种全球数据保护法规的认可。然而，由于缺乏严格的规范，其实际实施仍然是一个挑战。本文填补了这一空白，并基于其法律定义引入了一个数据最小化的优化框架。然后，它调整了几种优化算法来执行数据最小化，并在最小化目标的符合性以及对用户隐私的影响方面进行了全面评估。我们的分析强调了数据最小化的隐私期望与实际隐私利益之间的不匹配，强调了需要考虑现实世界隐私风险的多个方面的方法。

更新时间: 2024-05-29 19:40:27

领域: cs.LG,cs.AI,cs.CR

下载: http://arxiv.org/abs/2405.19471v1

A note on the error analysis of data-driven closure models for large eddy simulations of turbulence

In this work, we provide a mathematical formulation for error propagation in flow trajectory prediction using data-driven turbulence closure modeling. Under the assumption that the predicted state of a large eddy simulation prediction must be close to that of a subsampled direct numerical simulation, we retrieve an upper bound for the prediction error when utilizing a data-driven closure model. We also demonstrate that this error is significantly affected by the time step size and the Jacobian which play a role in amplifying the initial one-step error made by using the closure. Our analysis also shows that the error propagates exponentially with rollout time and the upper bound of the system Jacobian which is itself influenced by the Jacobian of the closure formulation. These findings could enable the development of new regularization techniques for ML models based on the identified error-bound terms, improving their robustness and reducing error propagation.

Updated: 2024-05-29 19:39:12

标题: 关于数据驱动封闭模型在大涡模拟湍流中的误差分析的注释

摘要: 在这项工作中，我们提供了一个数学公式，用于利用数据驱动的湍流闭合建模来预测流轨迹时的误差传播。在假设大涡模拟预测的状态必须接近于子采样直接数值模拟的状态的前提下，我们得出了利用数据驱动闭合模型时的预测误差的上限。我们还展示了这种误差受时间步长和雅可比矩阵的显著影响，这些因素在放大使用闭合方法造成的初始一步误差中起作用。我们的分析还表明，误差随着推演时间和系统雅可比矩阵的上限以指数方式传播，而系统雅可比矩阵本身受到闭合公式的雅可比矩阵的影响。这些发现可以使基于确定的误差上限项的ML模型开发新的正则化技术，提高它们的稳健性并减少误差传播。

更新时间: 2024-05-29 19:39:12

领域: physics.flu-dyn,cs.LG,physics.comp-ph

下载: http://arxiv.org/abs/2405.17612v2

Loop Polarity Analysis to Avoid Underspecification in Deep Learning

Deep learning is a powerful set of techniques for detecting complex patterns in data. However, when the causal structure of that process is underspecified, deep learning models can be brittle, lacking robustness to shifts in the distribution of the data-generating process. In this paper, we turn to loop polarity analysis as a tool for specifying the causal structure of a data-generating process, in order to encode a more robust understanding of the relationship between system structure and system behavior within the deep learning pipeline. We use simulated epidemic data based on an SIR model to demonstrate how measuring the polarity of the different feedback loops that compose a system can lead to more robust inferences on the part of neural networks, improving the out-of-distribution performance of a deep learning model and infusing a system-dynamics-inspired approach into the machine learning development pipeline.

Updated: 2024-05-29 19:33:12

标题: 循环极性分析以避免深度学习中的不确定性

摘要: 深度学习是一组强大的技术，用于检测数据中的复杂模式。然而，当该过程的因果结构不明确时，深度学习模型可能变得脆弱，缺乏对数据生成过程分布变化的鲁棒性。在本文中，我们将循环极性分析作为一种工具，用于指定数据生成过程的因果结构，以便在深度学习流程中编码对系统结构和系统行为之间关系的更加稳健的理解。我们使用基于SIR模型的模拟流行病数据来展示，测量组成系统的不同反馈环路的极性如何可以使神经网络更可靠地推断，提高深度学习模型的超出分布性能，并将系统动力学启发式方法融入机器学习开发流程中。

更新时间: 2024-05-29 19:33:12

领域: cs.LG,cs.HC,stat.ME

下载: http://arxiv.org/abs/2309.10211v2

Posterior Sampling via Autoregressive Generation

Real-world decision-making requires grappling with a perpetual lack of data as environments change; intelligent agents must comprehend uncertainty and actively gather information to resolve it. We propose a new framework for learning bandit algorithms from massive historical data, which we demonstrate in a cold-start recommendation problem. First, we use historical data to pretrain an autoregressive model to predict a sequence of repeated feedback/rewards (e.g., responses to news articles shown to different users over time). In learning to make accurate predictions, the model implicitly learns an informed prior based on rich action features (e.g., article headlines) and how to sharpen beliefs as more rewards are gathered (e.g., clicks as each article is recommended). At decision-time, we autoregressively sample (impute) an imagined sequence of rewards for each action, and choose the action with the largest average imputed reward. Far from a heuristic, our approach is an implementation of Thompson sampling (with a learned prior), a prominent active exploration algorithm. We prove our pretraining loss directly controls online decision-making performance, and we demonstrate our framework on a news recommendation task where we integrate end-to-end fine-tuning of a pretrained language model to process news article headline text to improve performance.

Updated: 2024-05-29 19:24:44

标题: 通过自回归生成的后验抽样

摘要: 实际决策需要处理数据不足的情况，因为环境不断变化；智能代理必须理解不确定性并积极收集信息以解决问题。我们提出了一个新的框架，用于从大量历史数据中学习赌博算法，并在一个冷启动推荐问题中进行了演示。首先，我们使用历史数据预训练自回归模型，以预测一系列重复的反馈/奖励（例如，对不同用户显示的新闻文章的反应随时间变化）。通过学习准确预测，模型隐含地学习了基于丰富行为特征（例如，文章标题）的知情先验，以及如何在收集更多奖励（例如，点击每篇文章被推荐时的点击）时如何加强信念。在决策时，我们自回归地采样（填补）每个动作的想象奖励序列，并选择平均填补奖励最高的动作。我们的方法远非启发式方法，而是汤普森采样（带有学习的先验）的实现，这是一个著名的主动探索算法。我们证明我们的预训练损失直接控制在线决策性能，我们在一个新闻推荐任务中展示了我们的框架，其中我们整合了端到端微调一个预训练的语言模型，以处理新闻文章标题文本以提高性能。

更新时间: 2024-05-29 19:24:44

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2405.19466v1

Characterizing Overfitting in Kernel Ridgeless Regression Through the Eigenspectrum

We derive new bounds for the condition number of kernel matrices, which we then use to enhance existing non-asymptotic test error bounds for kernel ridgeless regression (KRR) in the over-parameterized regime for a fixed input dimension. For kernels with polynomial spectral decay, we recover the bound from previous work; for exponential decay, our bound is non-trivial and novel. Our contribution is two-fold: (i) we rigorously prove the phenomena of tempered overfitting and catastrophic overfitting under the sub-Gaussian design assumption, closing an existing gap in the literature; (ii) we identify that the independence of the features plays an important role in guaranteeing tempered overfitting, raising concerns about approximating KRR generalization using the Gaussian design assumption in previous literature.

Updated: 2024-05-29 19:23:41

标题: 通过特征谱表征核岭回归中的过拟合现象

摘要: 我们推导了核矩阵条件数的新界限，然后利用这些界限增强了固定输入维度下超参数化区域内核无岭回归（KRR）的现有非渐近测试误差界限。对于具有多项式谱衰减的核函数，我们恢复了先前工作中的界限；对于指数衰减的核函数，我们的界限是非平凡且新颖的。我们的贡献是双重的：（i）我们严格证明了在次高斯设计假设下的温和过拟合和灾难性过拟合现象，填补了文献中的现有空白；（ii）我们确定特征的独立性在保证温和过拟合方面发挥重要作用，引发了对先前文献中使用高斯设计假设来近似KRR泛化的担忧。

更新时间: 2024-05-29 19:23:41

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2402.01297v3

Leveraging Generative AI for Smart City Digital Twins: A Survey on the Autonomous Generation of Data, Scenarios, 3D City Models, and Urban Designs

The digital transformation of modern cities by integrating advanced information, communication, and computing technologies has marked the epoch of data-driven smart city applications for efficient and sustainable urban management. Despite their effectiveness, these applications often rely on massive amounts of high-dimensional and multi-domain data for monitoring and characterizing different urban sub-systems, presenting challenges in application areas that are limited by data quality and availability, as well as costly efforts for generating urban scenarios and design alternatives. As an emerging research area in deep learning, Generative Artificial Intelligence (AI) models have demonstrated their unique values in data and code generation. This survey paper aims to explore the innovative integration of generative AI techniques and urban digital twins to address challenges in the realm of smart cities in various urban sectors, such as transportation and mobility management, energy system operations, building and infrastructure management, and urban design. The survey starts with the introduction of popular generative AI models with their application areas, followed by a structured review of the existing urban science applications that leverage the autonomous capability of the generative AI techniques to facilitate (a) data augmentation for promoting urban monitoring and predictive analytics, (b) synthetic data and scenario generation, (c) automated 3D city modeling, and (d) generative urban design and optimization. Based on the review, this survey discusses potential opportunities and technical strategies that integrate generative AI models into the next-generation urban digital twins for more reliable, scalable, and automated management of smart cities.

Updated: 2024-05-29 19:23:07

标题: 利用生成式人工智能为智慧城市数字孪生体提供支持：关于数据、场景、3D城市模型和城市设计的自主生成的调查

摘要: 现代城市通过整合先进的信息、通信和计算技术进行数字化转型，标志着基于数据驱动的智能城市应用的时代，以实现高效和可持续的城市管理。尽管这些应用程序有效，但它们通常依赖于大量的高维和多领域数据来监测和描述不同的城市子系统，这在受数据质量和可用性限制的应用领域中提出了挑战，同时为生成城市场景和设计替代方案带来了成本。作为深度学习中的新兴研究领域，生成人工智能（AI）模型已经展示了它们在数据和代码生成方面的独特价值。本调查论文旨在探讨生成AI技术与城市数字孪生体集成的创新，以解决智能城市领域中各种城市部门（如交通和移动管理、能源系统运营、建筑和基础设施管理以及城市设计）中的挑战。调查从介绍流行的生成AI模型及其应用领域开始，接着对现有的利用生成AI技术的自主能力促进城市监测和预测分析的城市科学应用进行了结构化审查，包括数据增强、合成数据和场景生成、自动化3D城市建模以及生成城市设计和优化。基于审查，本调查讨论了将生成AI模型整合到下一代城市数字孪生体中的潜在机会和技术策略，以实现更可靠、可扩展和自动化的智能城市管理。

更新时间: 2024-05-29 19:23:07

领域: cs.AI

下载: http://arxiv.org/abs/2405.19464v1

Stochastic Optimization Algorithms for Instrumental Variable Regression with Streaming Data

We develop and analyze algorithms for instrumental variable regression by viewing the problem as a conditional stochastic optimization problem. In the context of least-squares instrumental variable regression, our algorithms neither require matrix inversions nor mini-batches and provides a fully online approach for performing instrumental variable regression with streaming data. When the true model is linear, we derive rates of convergence in expectation, that are of order $\mathcal{O}(\log T/T)$ and $\mathcal{O}(1/T^{1-\iota})$ for any $\iota>0$, respectively under the availability of two-sample and one-sample oracles, respectively, where $T$ is the number of iterations. Importantly, under the availability of the two-sample oracle, our procedure avoids explicitly modeling and estimating the relationship between confounder and the instrumental variables, demonstrating the benefit of the proposed approach over recent works based on reformulating the problem as minimax optimization problems. Numerical experiments are provided to corroborate the theoretical results.

Updated: 2024-05-29 19:21:55

标题: 基于流数据的工具变量回归的随机优化算法

摘要: 我们通过将问题视为条件随机优化问题，开发并分析了工具变量回归的算法。在最小二乘工具变量回归的背景下，我们的算法既不需要矩阵求逆，也不需要小批量，并提供了一种完全在线的方法，用于处理具有数据流的工具变量回归。当真实模型为线性时，我们在期望中推导出收敛速度为$\mathcal{O}(\log T/T)$和$\mathcal{O}(1/T^{1-\iota})$的结果，分别在两样本和一样本预测器的情况下，其中$T$为迭代次数。重要的是，在有两样本预测器的情况下，我们的程序避免了显式建模和估计混淆因素与工具变量之间的关系，展示了所提出方法相对于基于将问题重新表述为极小化最大化优化问题的最近作品的优势。通过数值实验验证了理论结果。

更新时间: 2024-05-29 19:21:55

领域: stat.ML,cs.LG,econ.EM,math.OC

下载: http://arxiv.org/abs/2405.19463v1

Clustering-Based Validation Splits for Domain Generalisation

This paper considers the problem of model selection under domain shift. In this setting, it is proposed that a high maximum mean discrepancy (MMD) between the training and validation sets increases the generalisability of selected models. A data splitting algorithm based on kernel k-means clustering, which maximises this objective, is presented. The algorithm leverages linear programming to control the size, label, and (optionally) group distributions of the splits, and comes with convergence guarantees. The technique consistently outperforms alternative splitting strategies across a range of datasets and training algorithms, for both domain generalisation (DG) and unsupervised domain adaptation (UDA) tasks. Analysis also shows the MMD between the training and validation sets to be strongly rank-correlated ($\rho=0.63$) with test domain accuracy, further substantiating the validity of this approach.

Updated: 2024-05-29 19:21:17

标题: 基于聚类的领域泛化验证拆分

摘要: 这篇论文考虑了在领域转移下的模型选择问题。在这种情况下，提出训练集和验证集之间高最大均值差异（MMD）增加了所选模型的泛化能力。提出了一种基于核k均值聚类的数据分割算法，该算法最大化了这一目标。该算法利用线性规划来控制分割的大小、标签和（可选的）组分布，并具有收敛保证。该技术在一系列数据集和训练算法中始终优于替代分割策略，适用于领域泛化（DG）和无监督领域自适应（UDA）任务。分析还显示训练集和验证集之间的MMD与测试领域准确性强相关（$\rho=0.63$），进一步证实了这种方法的有效性。

更新时间: 2024-05-29 19:21:17

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2405.19461v1

MemControl: Mitigating Memorization in Medical Diffusion Models via Automated Parameter Selection

Diffusion models show a remarkable ability in generating images that closely mirror the training distribution. However, these models are prone to training data memorization, leading to significant privacy, ethical, and legal concerns, particularly in sensitive fields such as medical imaging. We hypothesize that memorization is driven by the overparameterization of deep models, suggesting that regularizing model capacity during fine-tuning could be an effective mitigation strategy. Parameter-efficient fine-tuning (PEFT) methods offer a promising approach to capacity control by selectively updating specific parameters. However, finding the optimal subset of learnable parameters that balances generation quality and memorization remains elusive. To address this challenge, we propose a bi-level optimization framework that guides automated parameter selection by utilizing memorization and generation quality metrics as rewards. Our framework successfully identifies the optimal parameter set to be updated to satisfy the generation-memorization tradeoff. We perform our experiments for the specific task of medical image generation and outperform existing state-of-the-art training-time mitigation strategies by fine-tuning as few as 0.019% of model parameters. Furthermore, we show that the strategies learned through our framework are transferable across different datasets and domains. Our proposed framework is scalable to large datasets and agnostic to the choice of reward functions. Finally, we show that our framework can be combined with existing approaches for further memorization mitigation.

Updated: 2024-05-29 19:12:08

标题: MemControl：通过自动参数选择减轻医疗扩散模型中的记忆效应

摘要: 扩散模型在生成图像方面表现出非常出色的能力，其生成的图像与训练分布密切相关。然而，这些模型容易出现训练数据记忆化的问题，引发了重要的隐私、道德和法律关注，尤其在医学影像等敏感领域。我们假设记忆化是由深度模型的过度参数化驱动的，因此在微调过程中对模型容量进行规范化可能是一种有效的缓解策略。参数高效微调（PEFT）方法通过有选择地更新特定参数，为控制容量提供了一种有前途的途径。然而，找到平衡生成质量和记忆化的最佳可学习参数子集仍然是困难的。为了解决这一挑战，我们提出了一个双层优化框架，通过利用记忆化和生成质量指标作为奖励来引导自动参数选择。我们的框架成功地确定了要更新的最佳参数集，以满足生成-记忆化的权衡。我们针对医学图像生成的具体任务进行了实验，并通过微调仅占模型参数0.019%的方法，优于现有最先进的训练时间缓解策略。此外，我们展示了通过我们的框架学习到的策略可以在不同数据集和领域之间转移。我们提出的框架适用于大型数据集，并且对奖励函数的选择具有普遍性。最后，我们展示了我们的框架可以与现有方法结合，进一步缓解记忆化问题。

更新时间: 2024-05-29 19:12:08

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.19458v1

An Automated Startup Evaluation Pipeline: Startup Success Forecasting Framework (SSFF)

Evaluating startups in their early stages is a complex task that requires detailed analysis by experts. While automating this process on a large scale can significantly impact businesses, the inherent complexity poses challenges. This paper addresses this challenge by introducing the Startup Success Forecasting Framework (SSFF), a new automated system that combines traditional machine learning with advanced language models. This intelligent agent-based architecture is designed to reason, act, synthesize, and decide like a venture capitalist to perform the analysis end-to-end. The SSFF is made up of three main parts: - Prediction Block: Uses random forests and neural networks to make predictions. - Analyst Block: Simulates VC analysis scenario and uses SOTA prompting techniques - External Knowledge Block: Gathers real-time information from external sources. This framework requires minimal input data about the founder and startup description, enhances it with additional data from external resources, and performs a detailed analysis with high accuracy, all in an automated manner

Updated: 2024-05-29 19:07:42

标题: 一个自动化的创业评估管道：创业成功预测框架（SSFF）

摘要: 在早期阶段评估初创企业是一个复杂的任务，需要专家进行详细分析。虽然在大规模范围内自动化这个过程可以显著影响企业，但固有的复杂性带来挑战。本文通过引入Startup Success Forecasting Framework（SSFF）来应对这一挑战，这是一个结合传统机器学习和先进语言模型的新型自动化系统。这个智能代理架构旨在像风险投资家一样进行推理、行动、综合和决策，以进行端到端的分析。SSFF由三个主要部分组成：- 预测模块：使用随机森林和神经网络进行预测。- 分析师模块：模拟VC分析场景并使用SOTA提示技术- 外部知识模块：从外部来源收集实时信息。这个框架需要关于创始人和初创企业描述的最少输入数据，利用外部资源的额外数据增强，以高精度进行详细分析，全部以自动化方式进行。

更新时间: 2024-05-29 19:07:42

领域: cs.AI

下载: http://arxiv.org/abs/2405.19456v1

Deep Grokking: Would Deep Neural Networks Generalize Better?

Recent research on the grokking phenomenon has illuminated the intricacies of neural networks' training dynamics and their generalization behaviors. Grokking refers to a sharp rise of the network's generalization accuracy on the test set, which occurs long after an extended overfitting phase, during which the network perfectly fits the training set. While the existing research primarily focus on shallow networks such as 2-layer MLP and 1-layer Transformer, we explore grokking on deep networks (e.g. 12-layer MLP). We empirically replicate the phenomenon and find that deep neural networks can be more susceptible to grokking than its shallower counterparts. Meanwhile, we observe an intriguing multi-stage generalization phenomenon when increase the depth of the MLP model where the test accuracy exhibits a secondary surge, which is scarcely seen on shallow models. We further uncover compelling correspondences between the decreasing of feature ranks and the phase transition from overfitting to the generalization stage during grokking. Additionally, we find that the multi-stage generalization phenomenon often aligns with a double-descent pattern in feature ranks. These observations suggest that internal feature rank could serve as a more promising indicator of the model's generalization behavior compared to the weight-norm. We believe our work is the first one to dive into grokking in deep neural networks, and investigate the relationship of feature rank and generalization performance.

Updated: 2024-05-29 19:05:11

标题: 深度理解：深度神经网络是否能更好地泛化？

摘要: 最近对grokking现象的研究揭示了神经网络训练动态和它们的泛化行为的复杂性。Grokking指的是网络在测试集上的泛化准确度急剧上升，这发生在长时间的过拟合阶段之后，网络在此阶段完美拟合训练集。虽然现有研究主要集中在浅层网络，如2层MLP和1层Transformer，我们探索了深层网络（例如12层MLP）上的grokking。我们在实验中复制了这一现象，并发现深度神经网络比其较浅层的对应物更容易受到grokking的影响。同时，当增加MLP模型的深度时，我们观察到一个有趣的多阶段泛化现象，其中测试准确度呈现出二次激增，这在浅层模型上很少见。我们进一步发现，在grokking过程中，特征排名的降低与从过拟合到泛化阶段的相变之间存在引人注目的对应关系。此外，我们发现多阶段泛化现象与特征排名的双减趋势常常一致。这些观察表明，内部特征排名可能比权重范数更有希望成为模型泛化行为的更有前景的指标。我们相信我们的工作是第一个深入研究深度神经网络中的grokking，并研究特征排名与泛化性能之间关系的研究。

更新时间: 2024-05-29 19:05:11

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2405.19454v1

Optimizing Split Points for Error-Resilient SplitFed Learning

Recent advancements in decentralized learning, such as Federated Learning (FL), Split Learning (SL), and Split Federated Learning (SplitFed), have expanded the potentials of machine learning. SplitFed aims to minimize the computational burden on individual clients in FL and parallelize SL while maintaining privacy. This study investigates the resilience of SplitFed to packet loss at model split points. It explores various parameter aggregation strategies of SplitFed by examining the impact of splitting the model at different points-either shallow split or deep split-on the final global model performance. The experiments, conducted on a human embryo image segmentation task, reveal a statistically significant advantage of a deeper split point.

Updated: 2024-05-29 19:03:27

标题: 优化分割点以实现错误鲁棒的SplitFed学习

摘要: 最近去中心化学习方面的进展，例如联邦学习（FL）、分裂学习（SL）和分裂联邦学习（SplitFed），已经拓展了机器学习的潜力。SplitFed旨在最小化联邦学习中个体客户端的计算负担，并在保持隐私的同时并行化SL。本研究调查了SplitFed在模型分裂点丢包情况下的韧性。通过研究在不同点分裂模型（浅层分裂或深层分裂）对最终全局模型性能的影响，探讨了SplitFed的各种参数聚合策略。在进行的人类胚胎图像分割任务实验中，结果显示深度分裂点具有明显的统计学优势。

更新时间: 2024-05-29 19:03:27

领域: cs.AI

下载: http://arxiv.org/abs/2405.19453v1

Gaitor: Learning a Unified Representation Across Gaits for Real-World Quadruped Locomotion

The current state-of-the-art in quadruped locomotion is able to produce robust motion for terrain traversal but requires the segmentation of a desired robot trajectory into a discrete set of locomotion skills such as trot and crawl. In contrast, in this work we demonstrate the feasibility of learning a single, unified representation for quadruped locomotion enabling continuous blending between gait types and characteristics. We present Gaitor, which learns a disentangled representation of locomotion skills, thereby sharing information common to all gait types seen during training. The structure emerging in the learnt representation is interpretable in that it is found to encode phase correlations between the different gait types. These can be leveraged to produce continuous gait transitions. In addition, foot swing characteristics are disentangled and directly addressable. Together with a rudimentary terrain encoding and a learned planner operating in this structured latent representation, Gaitor is able to take motion commands including desired gait type and characteristics from a user while reacting to uneven terrain. We evaluate Gaitor in both simulated and real-world settings on the ANYmal C platform. To the best of our knowledge, this is the first work learning such a unified and interpretable latent representation for multiple gaits, resulting in on-demand continuous blending between different locomotion modes on a real quadruped robot.

Updated: 2024-05-29 19:02:57

标题: Gaitor：学习现实世界四足动物运动中不同步态之间的统一表示

摘要: 目前四足动物运动的最先进技术能够产生适合地形穿越的稳健运动，但需要将期望的机器人轨迹分割成诸如慢跑和爬行等一组离散的运动技能。相比之下，在这项工作中，我们展示了学习四足动物运动的单一、统一表示的可行性，实现了步态类型和特征之间的连续混合。我们提出了Gaitor，它学习了解开的运动技能表示，从而共享训练期间所有步态类型中共同的信息。在学习表示中出现的结构是可解释的，因为发现它编码了不同步态类型之间的相位相关性。这些可以利用来产生连续的步态转换。此外，脚摆特征是解开的，并且可以直接访问。结合一个简单的地形编码和在这个结构化潜在表示中运行的学习规划器，Gaitor能够接收用户的运动命令，包括期望的步态类型和特征，同时对不平整地形作出反应。我们在ANYmal C平台上的模拟和实际环境中评估了Gaitor。据我们所知，这是第一项学习多种步态的统一和可解释潜在表示的工作，在真实四足机器人上实现了按需连续混合不同运动模式。

更新时间: 2024-05-29 19:02:57

领域: cs.RO,cs.LG

下载: http://arxiv.org/abs/2405.19452v1

Look Once to Hear: Target Speech Hearing with Noisy Examples

In crowded settings, the human brain can focus on speech from a target speaker, given prior knowledge of how they sound. We introduce a novel intelligent hearable system that achieves this capability, enabling target speech hearing to ignore all interfering speech and noise, but the target speaker. A naive approach is to require a clean speech example to enroll the target speaker. This is however not well aligned with the hearable application domain since obtaining a clean example is challenging in real world scenarios, creating a unique user interface problem. We present the first enrollment interface where the wearer looks at the target speaker for a few seconds to capture a single, short, highly noisy, binaural example of the target speaker. This noisy example is used for enrollment and subsequent speech extraction in the presence of interfering speakers and noise. Our system achieves a signal quality improvement of 7.01 dB using less than 5 seconds of noisy enrollment audio and can process 8 ms of audio chunks in 6.24 ms on an embedded CPU. Our user studies demonstrate generalization to real-world static and mobile speakers in previously unseen indoor and outdoor multipath environments. Finally, our enrollment interface for noisy examples does not cause performance degradation compared to clean examples, while being convenient and user-friendly. Taking a step back, this paper takes an important step towards enhancing the human auditory perception with artificial intelligence. We provide code and data at: https://github.com/vb000/LookOnceToHear.

Updated: 2024-05-29 19:00:39

标题: 看一次就能听到：通过嘈杂的例子实现目标语音听觉

摘要: 在拥挤的环境中，人类大脑可以专注于目标说话者的讲话，前提是对他们的声音有所了解。我们介绍了一种新型的智能听觉系统，实现了这种能力，使目标说话听力可以忽略所有干扰的讲话和噪音，只关注目标说话者。一种天真的方法是要求使用干净的语音示例来注册目标说话者。然而，这与可穿戴应用领域不太匹配，因为在现实世界的场景中获取干净示例是具有挑战性的，这导致了一个独特的用户界面问题。我们提出了第一个注册界面，佩戴者只需看几秒钟目标说话者，即可捕获目标说话者的一个短暂、高度嘈杂、双耳示例。这个嘈杂的示例用于注册和在干扰说话者和噪音存在的情况下进行后续语音提取。我们的系统使用少于5秒的嘈杂注册音频实现了7.01 dB的信号质量改进，并且可以在嵌入式CPU上以6.24毫秒处理8毫秒的音频块。我们的用户研究展示了对在以前未见的室内和室外多路径环境中的真实静态和移动说话者的泛化。最后，我们的嘈杂示例注册界面与干净示例相比不会导致性能下降，同时又方便用户友好。从更大的角度来看，本文是向人类听觉感知增强人工智能迈出的重要一步。我们在https://github.com/vb000/LookOnceToHear提供代码和数据。

更新时间: 2024-05-29 19:00:39

领域: cs.SD,cs.AI,eess.AS

下载: http://arxiv.org/abs/2405.06289v3

DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference

Given the increasing demand for tree-structured interactions with LLMs, we introduce DeFT (Decoding with Flash Tree-Attention), an IO-aware tree attention algorithm tailored for tree-structured inference. Unlike traditional sequence-based decoding, tree-structured decoding better accommodates modern task requirements, including self-consistency, few-shot prompting, multi-step reasoning, and multi-model/head coordination. However, existing sequence-based inference systems are ill-suited for tree-structured decoding, resulting in redundancy in computation, memory footprints, and memory access, thereby undermining inference efficiency. To address this challenge, DeFT maintains memory-efficient attention calculation with low memory footprints through two key stages: (1) QKV Preparation: We propose a KV-Guided Grouping Strategy with Tree Split to intelligently group QKV, optimizing GPU resource utilization while minimizing memory reads/writes for KV cache between GPU global memory and on-chip shared memory; (2)Attention Calculation: We compute partial attention of each QKV group in a fused kernel and employ a Tree-topology-aware Global Reduction strategy to obtain final attention. By reducing 73-99% KV cache IO and nearly 100% IO for partial results during attention calculation (e.g., Softmax), DeFT achieves up to 2.52/3.82x speedup in the end-to-end/attention latency across three practical tree-based workloads: namely, few-shot prompting, multi-step reasoning, and speculative decoding, over state-of-the-art attention algorithms.

Updated: 2024-05-29 18:46:41

标题: DeFT：使用Flash Tree-attention进行高效树形LLM推理解码

摘要: 鉴于对LLM的树结构交互需求不断增加，我们引入了DeFT（Decoding with Flash Tree-Attention），这是一种专为树结构推理定制的IO感知树注意力算法。与传统基于序列的解码不同，树结构解码更好地满足现代任务需求，包括自洽性、少样本提示、多步推理和多模型/头协调。然而，现有基于序列的推理系统不适用于树结构解码，导致计算冗余、内存占用和内存访问，从而降低推理效率。为了解决这一挑战，DeFT通过两个关键阶段实现了低内存占用的注意力计算：（1）QKV准备：我们提出了一种KV引导分组策略和树分割，智能地对QKV进行分组，优化GPU资源利用率，同时最大限度地减少GPU全局内存和片上共享内存之间的KV缓存读写；（2）注意力计算：我们在一个融合内核中计算每个QKV组的部分注意力，并采用一种树拓扑感知的全局减少策略来获得最终注意力。通过减少73-99%的KV缓存IO和注意力计算期间的部分结果（例如Softmax）近乎100%的IO，DeFT在三个实际基于树的工作负载（即少样本提示、多步推理和推测解码）上实现了高达2.52/3.82倍的端到端/注意力延迟加速，超过了最先进的注意力算法。

更新时间: 2024-05-29 18:46:41

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.00242v2

MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions

Large language models (LLMs) have demonstrated impressive capabilities in mathematical problem solving, particularly in single turn question answering formats. However, real world scenarios often involve mathematical question answering that requires multi turn or interactive information exchanges, and the performance of LLMs on these tasks is still underexplored. This paper introduces MathChat, a comprehensive benchmark specifically designed to evaluate LLMs across a broader spectrum of mathematical tasks. These tasks are structured to assess the models' abilities in multiturn interactions and open ended generation. We evaluate the performance of various SOTA LLMs on the MathChat benchmark, and we observe that while these models excel in single turn question answering, they significantly underperform in more complex scenarios that require sustained reasoning and dialogue understanding. To address the above limitations of existing LLMs when faced with multiturn and open ended tasks, we develop MathChat sync, a synthetic dialogue based math dataset for LLM finetuning, focusing on improving models' interaction and instruction following capabilities in conversations. Experimental results emphasize the need for training LLMs with diverse, conversational instruction tuning datasets like MathChatsync. We believe this work outlines one promising direction for improving the multiturn mathematical reasoning abilities of LLMs, thus pushing forward the development of LLMs that are more adept at interactive mathematical problem solving and real world applications.

Updated: 2024-05-29 18:45:55

标题: MathChat：在多轮交互中基准测试数学推理和指导遵循

摘要: 大型语言模型（LLMs）已经展示出在数学问题解决方面的令人印象深刻的能力，特别是在单轮问答格式中。然而，现实世界的场景通常涉及需要多轮或交互信息交换的数学问题回答，LLMs在这些任务上的表现仍未被充分探索。本文介绍了MathChat，一个专门设计用于评估LLMs在更广泛的数学任务范围内的综合基准。这些任务被设计为评估模型在多轮交互和开放式生成中的能力。我们评估了各种SOTA LLMs在MathChat基准上的表现，并观察到，虽然这些模型在单轮问答方面表现出色，但在需要持续推理和对话理解的更复杂场景中表现明显不佳。为了解决现有LLMs在面对多轮和开放式任务时的上述限制，我们开发了MathChat sync，这是一个用于LLM微调的基于合成对话的数学数据集，专注于提高模型在对话中的交互和指令遵循能力。实验结果强调了训练LLMs需要使用类似MathChatsync这样多样化的对话指令调整数据集的必要性。我们相信这项工作勾勒了改善LLMs多轮数学推理能力的一个有前途的方向，从而推动了更擅长交互式数学问题解决和实际应用的LLMs的发展。

更新时间: 2024-05-29 18:45:55

领域: cs.AI

下载: http://arxiv.org/abs/2405.19444v1

On the Convergence of Multi-objective Optimization under Generalized Smoothness

Multi-objective optimization (MOO) is receiving more attention in various fields such as multi-task learning. Recent works provide some effective algorithms with theoretical analysis but they are limited by the standard $L$-smooth or bounded-gradient assumptions, which are typically unsatisfactory for neural networks, such as recurrent neural networks (RNNs) and transformers. In this paper, we study a more general and realistic class of $\ell$-smooth loss functions, where $\ell$ is a general non-decreasing function of gradient norm. We develop two novel single-loop algorithms for $\ell$-smooth MOO problems, Generalized Smooth Multi-objective Gradient descent (GSMGrad) and its stochastic variant, Stochastic Generalized Smooth Multi-objective Gradient descent (SGSMGrad), which approximate the conflict-avoidant (CA) direction that maximizes the minimum improvement among objectives. We provide a comprehensive convergence analysis of both algorithms and show that they converge to an $\epsilon$-accurate Pareto stationary point with a guaranteed $\epsilon$-level average CA distance (i.e., the gap between the updating direction and the CA direction) over all iterations, where totally $\mathcal{O}(\epsilon^{-2})$ and $\mathcal{O}(\epsilon^{-4})$ samples are needed for deterministic and stochastic settings, respectively. Our algorithms can also guarantee a tighter $\epsilon$-level CA distance in each iteration using more samples. Moreover, we propose a practical variant of GSMGrad named GSMGrad-FA using only constant-level time and space, while achieving the same performance guarantee as GSMGrad. Our experiments validate our theory and demonstrate the effectiveness of the proposed methods.

Updated: 2024-05-29 18:36:59

标题: 多目标优化在广义平滑性下的收敛性

摘要: 多目标优化（MOO）在各个领域，如多任务学习中越来越受到关注。最近的研究提供了一些有效的算法，并进行了理论分析，但它们受到了标准的$L$-smooth或有界梯度假设的限制，这些假设通常不适用于神经网络，如循环神经网络（RNN）和transformers。在本文中，我们研究了一类更一般和更现实的$\ell$-smooth损失函数，其中$\ell$是梯度范数的一般非递减函数。我们为$\ell$-smooth MOO问题开发了两种新颖的单循环算法，分别是Generalized Smooth Multi-objective Gradient descent（GSMGrad）和其随机变体Stochastic Generalized Smooth Multi-objective Gradient descent（SGSMGrad），它们近似冲突避免（CA）方向，该方向最大化目标之间的最小改进。我们提供了这两种算法的全面收敛分析，并展示它们收敛到一个$\epsilon$-准确的帕累托稳定点，其中在所有迭代中都有保证的$\epsilon$级平均CA距离（即更新方向和CA方向之间的差距），在确定性和随机设置中分别需要$\mathcal{O}(\epsilon^{-2})$和$\mathcal{O}(\epsilon^{-4})$个样本。我们的算法还可以在每次迭代中使用更多样本来保证更紧的$\epsilon$级CA距离。此外，我们提出了一个实用的GSMGrad变体，称为GSMGrad-FA，仅使用恒定级别的时间和空间，并实现与GSMGrad相同的性能保证。我们的实验验证了我们的理论，并展示了所提出方法的有效性。

更新时间: 2024-05-29 18:36:59

领域: cs.LG,math.OC,stat.ML

下载: http://arxiv.org/abs/2405.19440v1

Chem-FINESE: Validating Fine-Grained Few-shot Entity Extraction through Text Reconstruction

Fine-grained few-shot entity extraction in the chemical domain faces two unique challenges. First, compared with entity extraction tasks in the general domain, sentences from chemical papers usually contain more entities. Moreover, entity extraction models usually have difficulty extracting entities of long-tailed types. In this paper, we propose Chem-FINESE, a novel sequence-to-sequence (seq2seq) based few-shot entity extraction approach, to address these two challenges. Our Chem-FINESE has two components: a seq2seq entity extractor to extract named entities from the input sentence and a seq2seq self-validation module to reconstruct the original input sentence from extracted entities. Inspired by the fact that a good entity extraction system needs to extract entities faithfully, our new self-validation module leverages entity extraction results to reconstruct the original input sentence. Besides, we design a new contrastive loss to reduce excessive copying during the extraction process. Finally, we release ChemNER+, a new fine-grained chemical entity extraction dataset that is annotated by domain experts with the ChemNER schema. Experiments in few-shot settings with both ChemNER+ and CHEMET datasets show that our newly proposed framework has contributed up to 8.26% and 6.84% absolute F1-score gains respectively.

Updated: 2024-05-29 18:24:15

标题: Chem-FINESE: 通过文本重建验证细粒度少样本实体提取

摘要: 在化学领域，细粒度的少样本实体提取面临两个独特的挑战。首先，与一般领域的实体提取任务相比，化学论文中的句子通常包含更多实体。此外，实体提取模型通常难以提取长尾类型的实体。本文提出了Chem-FINESE，一种基于序列到序列（seq2seq）的新型少样本实体提取方法，以解决这两个挑战。我们的Chem-FINESE包括两个组件：一个seq2seq实体提取器用于从输入句子中提取命名实体，以及一个seq2seq自验证模块用于从提取的实体重建原始输入句子。受到良好的实体提取系统需要忠实提取实体的启发，我们的新自验证模块利用实体提取结果重建原始输入句子。此外，我们设计了一个新的对比损失来减少提取过程中的过度复制。最后，我们发布了ChemNER+，一个由领域专家使用ChemNER模式注释的新型细粒度化学实体提取数据集。在使用ChemNER+和CHEMET数据集的少样本设置中的实验表明，我们新提出的框架分别提高了8.26%和6.84%的绝对F1分数。

更新时间: 2024-05-29 18:24:15

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2401.10189v4

SecureFalcon: Are We There Yet in Automated Software Vulnerability Detection with LLMs?

Software vulnerabilities can cause numerous problems, including crashes, data loss, and security breaches. These issues greatly compromise quality and can negatively impact the market adoption of software applications and systems. Traditional bug-fixing methods, such as static analysis, often produce false positives. While bounded model checking, a form of Formal Verification (FV), can provide more accurate outcomes compared to static analyzers, it demands substantial resources and significantly hinders developer productivity. Can Machine Learning (ML) achieve accuracy comparable to FV methods and be used in popular instant code completion frameworks in near real-time? In this paper, we introduce SecureFalcon, an innovative model architecture with only 121 million parameters derived from the Falcon-40B model and explicitly tailored for classifying software vulnerabilities. To achieve the best performance, we trained our model using two datasets, namely the FormAI dataset and the FalconVulnDB. The FalconVulnDB is a combination of recent public datasets, namely the SySeVR framework, Draper VDISC, Bigvul, Diversevul, SARD Juliet, and ReVeal datasets. These datasets contain the top 25 most dangerous software weaknesses, such as CWE-119, CWE-120, CWE-476, CWE-122, CWE-190, CWE-121, CWE-78, CWE-787, CWE-20, and CWE-762. SecureFalcon achieves 94% accuracy in binary classification and up to 92% in multiclassification, with instant CPU inference times. It outperforms existing models such as BERT, RoBERTa, CodeBERT, and traditional ML algorithms, promising to push the boundaries of software vulnerability detection and instant code completion frameworks.

Updated: 2024-05-29 18:22:48

标题: SecureFalcon：利用LLM实现自动化软件漏洞检测，我们已经到达了目标吗？

摘要: 软件漏洞可能导致多种问题，包括崩溃、数据丢失和安全漏洞。这些问题严重影响质量，并可能对软件应用和系统的市场采用产生负面影响。传统的漏洞修复方法，如静态分析，通常会产生误报。而有界模型检查，一种形式的形式验证（FV），可以提供比静态分析器更准确的结果，但需要大量资源并显著影响开发人员的生产力。机器学习（ML）是否能够实现与FV方法相媲美的准确性，并能在近实时中被用于流行的即时代码补全框架？在本文中，我们介绍了SecureFalcon，这是一种创新的模型架构，仅包含从Falcon-40B模型衍生出的1.21亿个参数，专门用于分类软件漏洞。为了实现最佳性能，我们使用两个数据集对模型进行了训练，即FormAI数据集和FalconVulnDB。FalconVulnDB是最近公开数据集的组合，包括SySeVR框架、Draper VDISC、Bigvul、Diversevul、SARD Juliet和ReVeal数据集。这些数据集包含了排名前25位最危险的软件弱点，如CWE-119、CWE-120、CWE-476、CWE-122、CWE-190、CWE-121、CWE-78、CWE-787、CWE-20和CWE-762。SecureFalcon在二元分类中达到了94%的准确率，在多分类中达到了92%的准确率，并具有即时的CPU推理时间。它优于现有模型，如BERT、RoBERTa、CodeBERT和传统的ML算法，有望推动软件漏洞检测和即时代码补全框架的发展界限。

更新时间: 2024-05-29 18:22:48

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2307.06616v2

Understanding LLMs Requires More Than Statistical Generalization

The last decade has seen blossoming research in deep learning theory attempting to answer, "Why does deep learning generalize?" A powerful shift in perspective precipitated this progress: the study of overparametrized models in the interpolation regime. In this paper, we argue that another perspective shift is due, since some of the desirable qualities of LLMs are not a consequence of good statistical generalization and require a separate theoretical explanation. Our core argument relies on the observation that AR probabilistic models are inherently non-identifiable: models zero or near-zero KL divergence apart -- thus, equivalent test loss -- can exhibit markedly different behaviors. We support our position with mathematical examples and empirical observations, illustrating why non-identifiability has practical relevance through three case studies: (1) the non-identifiability of zero-shot rule extrapolation; (2) the approximate non-identifiability of in-context learning; and (3) the non-identifiability of fine-tunability. We review promising research directions focusing on LLM-relevant generalization measures, transferability, and inductive biases.

Updated: 2024-05-29 18:22:26

标题: 理解LLMs需要的不仅仅是统计概括

摘要: 过去十年来，深度学习理论研究蓬勃发展，试图回答“为什么深度学习能泛化？”这一问题。这一进展是由于对过度参数化模型在插值区域的研究视角的转变。在本文中，我们认为另一个视角的转变是必要的，因为一些LLM（大型语言模型）的可取性并不是良好统计泛化的结果，需要一个单独的理论解释。我们的核心论点建立在AR概率模型本质上是不可识别的这一观察上：模型之间的KL散度接近零或零时，即测试损失相等，但行为却可能截然不同。我们用数学例子和经验观察支持我们的观点，说明为什么不可识别性通过三个案例研究具有实际意义：（1）零样本规则外推的不可识别性；（2）上下文学习的近似不可识别性；（3）可微调性的不可识别性。我们回顾了关注LLM相关泛化度量、可转移性和归纳偏差的有希望的研究方向。

更新时间: 2024-05-29 18:22:26

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2405.01964v2

Machine Learning in Space: Surveying the Robustness of on-board ML models to Radiation

Modern spacecraft are increasingly relying on machine learning (ML). However, physical equipment in space is subject to various natural hazards, such as radiation, which may inhibit the correct operation of computing devices. Despite plenty of evidence showing the damage that naturally-induced faults can cause to ML-related hardware, we observe that the effects of radiation on ML models for space applications are not well-studied. This is a problem: without understanding how ML models are affected by these natural phenomena, it is uncertain "where to start from" to develop radiation-tolerant ML software. As ML researchers, we attempt to tackle this dilemma. By partnering up with space-industry practitioners specialized in ML, we perform a reflective analysis of the state of the art. We provide factual evidence that prior work did not thoroughly examine the impact of natural hazards on ML models meant for spacecraft. Then, through a "negative result", we show that some existing open-source technologies can hardly be used by researchers to study the effects of radiation for some applications of ML in satellites. As a constructive step forward, we perform simple experiments showcasing how to leverage current frameworks to assess the robustness of practical ML models for cloud detection against radiation-induced faults. Our evaluation reveals that not all faults are as devastating as claimed by some prior work. By publicly releasing our resources, we provide a foothold -- usable by researchers without access to spacecraft -- for spearheading development of space-tolerant ML models.

Updated: 2024-05-29 18:13:03

标题: 太空中的机器学习：评估机载机器学习模型对辐射的稳健性

摘要: 现代航天器越来越依赖机器学习（ML）。然而，太空中的物理设备受到各种自然危害的影响，如辐射，这可能会阻碍计算设备的正确运行。尽管有大量证据显示自然诱发的故障可能给与ML相关的硬件造成的损害，但我们观察到对太空应用中ML模型的辐射影响并没有得到充分研究。这是一个问题：在不了解这些自然现象对ML模型的影响的情况下，要开发耐辐射的ML软件就很不确定。作为ML研究者，我们试图解决这个困境。通过与专门从事ML的太空工业从业者合作，我们对现有技术进行了反思分析。我们提供了事实证据，表明以前的工作没有彻底研究自然灾害对太空航天器用ML模型的影响。然后，通过一个“负面结果”，我们展示了一些现有的开源技术几乎无法被研究人员用于研究卫星ML应用中辐射的影响。作为一个建设性的进展，我们进行了简单的实验，展示如何利用当前框架评估针对辐射诱发故障的云检测实用ML模型的稳健性。我们的评估显示，并不是所有的故障都像一些以前的工作所声称的那样具有破坏性。通过公开发布我们的资源，我们提供了一个立足点，研究人员可以利用这些资源开发具有空间容忍性的ML模型，而不需要访问太空航天器。

更新时间: 2024-05-29 18:13:03

领域: cs.LG

下载: http://arxiv.org/abs/2405.02642v2

UNITS: A Unified Multi-Task Time Series Model

Advances in time series models are driving a shift from conventional deep learning methods to pre-trained foundational models. While pre-trained transformers and reprogrammed text-based LLMs report state-of-the-art results, the best-performing architectures vary significantly across tasks, and models often have limited scope, such as focusing only on time series forecasting. Models that unify predictive and generative time series tasks under a single framework remain challenging to achieve. We introduce UniTS, a multi-task time series model that uses task tokenization to express predictive and generative tasks within a single model. UniTS leverages a modified transformer block designed to obtain universal time series representations. This design induces transferability from a heterogeneous, multi-domain pre-training dataset-often with diverse dynamic patterns, sampling rates, and temporal scales-to many downstream datasets, which can also be diverse in task specifications and data domains. Across 38 datasets spanning human activity sensors, healthcare, engineering, and finance domains, UniTS model performs favorably against 12 forecasting models, 20 classification models, 18 anomaly detection models, and 16 imputation models, including repurposed text-based LLMs. UniTS demonstrates effective few-shot and prompt learning capabilities when evaluated on new data domains and tasks. In the conventional single-task setting, UniTS outperforms strong task-specialized time series models. The source code and datasets are available at https://github.com/mims-harvard/UniTS.

Updated: 2024-05-29 18:11:04

标题: 单位：一个统一的多任务时间序列模型

摘要: 时间序列模型的进展正在推动从传统的深度学习方法转向预训练的基础模型。虽然预训练的transformers和重新编程的基于文本的LLMs报告了最先进的结果，但在不同任务中性能最佳的架构差异显著，而且模型通常具有有限的范围，比如仅专注于时间序列预测。统一预测和生成时间序列任务的模型在一个框架下仍然具有挑战性。我们引入了UniTS，一个多任务时间序列模型，它使用任务标记化来表达单个模型中的预测和生成任务。UniTS利用了一个修改后的transformer块，设计用于获得通用的时间序列表示。这种设计从一个异构的、多领域的预训练数据集中获得了可传递性，通常该数据集具有多样的动态模式、采样率和时间尺度，适用于许多下游数据集，这些数据集在任务规范和数据领域上也可能是多样的。在涵盖了人类活动传感器、医疗保健、工程和金融领域的38个数据集中，UniTS模型在12个预测模型、20个分类模型、18个异常检测模型和16个插补模型中表现优异，包括重新用于基于文本的LLMs。UniTS在评估新的数据领域和任务时展示了有效的少样本和提示学习能力。在传统的单一任务设置中，UniTS的性能超越了强大的任务专用时间序列模型。源代码和数据集可在https://github.com/mims-harvard/UniTS上获得。

更新时间: 2024-05-29 18:11:04

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2403.00131v2

Evaluating Vision-Language Models on Bistable Images

Bistable images, also known as ambiguous or reversible images, present visual stimuli that can be seen in two distinct interpretations, though not simultaneously by the observer. In this study, we conduct the most extensive examination of vision-language models using bistable images to date. We manually gathered a dataset of 29 bistable images, along with their associated labels, and subjected them to 116 different manipulations in brightness, tint, and rotation. We evaluated twelve different models in both classification and generative tasks across six model architectures. Our findings reveal that, with the exception of models from the Idefics family and LLaVA1.5-13b, there is a pronounced preference for one interpretation over another among the models, and minimal variance under image manipulations, with few exceptions on image rotations. Additionally, we compared the model preferences with humans, noting that the models do not exhibit the same continuity biases as humans and often diverge from human initial interpretations. We also investigated the influence of variations in prompts and the use of synonymous labels, discovering that these factors significantly affect model interpretations more than image manipulations showing a higher influence of the language priors on bistable image interpretations compared to image-text training data. All code and data is open sourced.

Updated: 2024-05-29 18:04:59

标题: 在双稳图像上评估视觉-语言模型

摘要: 双稳态图像，也称为模糊或可逆图像，呈现出可以在观察者看来以两种不同的解释形式出现的视觉刺激，尽管观察者无法同时看到这两种解释。在这项研究中，我们进行了迄今为止对视觉-语言模型使用双稳态图像的最广泛的研究。我们手动收集了一个包含29个双稳态图像及其相关标签的数据集，并对它们进行了116种不同的亮度、色调和旋转操作。我们在六种模型架构下评估了十二种不同的模型在分类和生成任务中的表现。我们的研究结果显示，除了来自Idefics家族和LLaVA1.5-13b的模型外，其他模型在一种解释上明显偏好于另一种解释，在图像操作中变化很小，只在图像旋转上有少数例外。此外，我们还将模型的偏好与人类进行了比较，发现模型不像人类一样展现出连续性偏差，而且经常与人类最初的解释有分歧。我们还调查了提示变化和使用同义标签的影响，发现这些因素对模型的解释产生了显著影响，比图像操作更具影响力，显示了语言先验对双稳态图像解释的影响要比图像-文本训练数据更大。所有代码和数据都是开源的。

更新时间: 2024-05-29 18:04:59

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.19423v1

Using Contrastive Learning with Generative Similarity to Learn Spaces that Capture Human Inductive Biases

Humans rely on strong inductive biases to learn from few examples and abstract useful information from sensory data. Instilling such biases in machine learning models has been shown to improve their performance on various benchmarks including few-shot learning, robustness, and alignment. However, finding effective training procedures to achieve that goal can be challenging as psychologically-rich training data such as human similarity judgments are expensive to scale, and Bayesian models of human inductive biases are often intractable for complex, realistic domains. Here, we address this challenge by introducing a Bayesian notion of generative similarity whereby two datapoints are considered similar if they are likely to have been sampled from the same distribution. This measure can be applied to complex generative processes, including probabilistic programs. We show that generative similarity can be used to define a contrastive learning objective even when its exact form is intractable, enabling learning of spatial embeddings that express specific inductive biases. We demonstrate the utility of our approach by showing how it can be used to capture human inductive biases for geometric shapes, and to better distinguish different abstract drawing styles that are parameterized by probabilistic programs.

Updated: 2024-05-29 18:01:58

标题: 利用对比学习和生成相似性学习空间，捕捉人类归纳偏好

摘要: 人类依赖强大的归纳偏见来从少量示例中学习，并从感官数据中抽象出有用的信息。在机器学习模型中灌输这种偏见已经被证明可以提高它们在各种基准上的性能，包括少样本学习、鲁棒性和对齐性。然而，要找到有效的训练程序来实现这一目标可能是具有挑战性的，因为富有心理学特点的训练数据，如人类相似性判断，很难扩展，而且对于复杂、现实领域来说，人类归纳偏见的贝叶斯模型通常难以处理。在这里，我们通过引入一种贝叶斯的生成相似性概念来解决这一挑战，其中两个数据点被认为是相似的，如果它们很可能是从同一分布中抽样得到的。这个度量可以应用于复杂的生成过程，包括概率程序。我们展示了生成相似性可以用于定义对比学习目标，即使它的确切形式是难以处理的，从而实现学习表达特定归纳偏见的空间嵌入。我们通过展示我们的方法如何可以用于捕捉几何形状的人类归纳偏见，并更好地区分由概率程序参数化的不同抽象绘画风格，来展示我们方法的效用。

更新时间: 2024-05-29 18:01:58

领域: cs.LG,cs.AI,q-bio.NC

下载: http://arxiv.org/abs/2405.19420v1

Safety through Permissibility: Shield Construction for Fast and Safe Reinforcement Learning

Designing Reinforcement Learning (RL) solutions for real-life problems remains a significant challenge. A major area of concern is safety. "Shielding" is a popular technique to enforce safety in RL by turning user-defined safety specifications into safe agent behavior. However, these methods either suffer from extreme learning delays, demand extensive human effort in designing models and safe domains in the problem, or require pre-computation. In this paper, we propose a new permissibility-based framework to deal with safety and shield construction. Permissibility was originally designed for eliminating (non-permissible) actions that will not lead to an optimal solution to improve RL training efficiency. This paper shows that safety can be naturally incorporated into this framework, i.e. extending permissibility to include safety, and thereby we can achieve both safety and improved efficiency. Experimental evaluation using three standard RL applications shows the effectiveness of the approach.

Updated: 2024-05-29 18:00:21

标题: 通过许可性确保安全：快速且安全的强化学习的屏蔽构建

摘要: 为真实生活问题设计强化学习（RL）解决方案仍然是一个重要挑战。一个主要关注的领域是安全。"屏蔽"是一种流行的技术，通过将用户定义的安全规范转化为安全的代理行为来实施强化学习中的安全性。然而，这些方法要么遭受极端的学习延迟，要么需要大量人力在问题中设计模型和安全领域，要么需要预计算。在本文中，我们提出了一个基于许可性的新框架来处理安全性和屏蔽构造。许可性最初是为了消除（非许可的）不会导致最优解的行动，以提高强化学习培训效率而设计的。本文表明安全性可以自然地纳入这一框架中，即扩展许可性以包括安全性，从而既可以实现安全性又可以提高效率。使用三个标准RL应用程序进行的实验评估显示了该方法的有效性。

更新时间: 2024-05-29 18:00:21

领域: cs.LG

下载: http://arxiv.org/abs/2405.19414v1

VisTA-SR: Improving the Accuracy and Resolution of Low-Cost Thermal Imaging Cameras for Agriculture

Thermal cameras are an important tool for agricultural research because they allow for non-invasive measurement of plant temperature, which relates to important photochemical, hydraulic, and agronomic traits. Utilizing low-cost thermal cameras can lower the barrier to introducing thermal imaging in agricultural research and production. This paper presents an approach to improve the temperature accuracy and image quality of low-cost thermal imaging cameras for agricultural applications. Leveraging advancements in computer vision techniques, particularly deep learning networks, we propose a method, called $\textbf{VisTA-SR}$ ($\textbf{Vis}$ual \& $\textbf{T}$hermal $\textbf{A}$lignment and $\textbf{S}$uper-$\textbf{R}$esolution Enhancement) that combines RGB and thermal images to enhance the capabilities of low-resolution thermal cameras. The research includes calibration and validation of temperature measurements, acquisition of paired image datasets, and the development of a deep learning network tailored for agricultural thermal imaging. Our study addresses the challenges of image enhancement in the agricultural domain and explores the potential of low-cost thermal cameras to replace high-resolution industrial cameras. Experimental results demonstrate the effectiveness of our approach in enhancing temperature accuracy and image sharpness, paving the way for more accessible and efficient thermal imaging solutions in agriculture.

Updated: 2024-05-29 18:00:20

标题: VisTA-SR：提高农业低成本热成像相机的准确性和分辨率

摘要: 热成像相机是农业研究中的重要工具，因为它们允许对植物温度进行非侵入式测量，这与重要的光化学、水力和农学特征相关。利用低成本热成像相机可以降低引入热成像技术在农业研究和生产中的障碍。本文提出了一种改进低成本热成像相机在农业应用中温度准确性和图像质量的方法。利用计算机视觉技术的进展，特别是深度学习网络，我们提出了一种方法，称为$\textbf{VisTA-SR}$（$\textbf{Vis}$ual \& $\textbf{T}$hermal $\textbf{A}$lignment and $\textbf{S}$uper-$\textbf{R}$esolution Enhancement），它结合了RGB和热像，以增强低分辨率热成像相机的能力。研究包括温度测量的校准和验证、成对图像数据集的获取以及为农业热成像量身定制的深度学习网络的开发。我们的研究解决了农业领域图像增强的挑战，并探索了低成本热成像相机替代高分辨率工业相机的潜力。实验结果显示了我们的方法在提高温度准确性和图像清晰度方面的有效性，为农业领域提供了更易接近和高效的热成像解决方案的道路。

更新时间: 2024-05-29 18:00:20

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.19413v1

Ground state phases of the two-dimension electron gas with a unified variational approach

The two-dimensional electron gas (2DEG) is a fundamental model, which is drawing increasing interest because of recent advances in experimental and theoretical studies of 2D materials. Current understanding of the ground state of the 2DEG relies on quantum Monte Carlo calculations, based on variational comparisons of different ansatze for different phases. We use a single variational ansatz, a general backflow-type wave function using a message-passing neural quantum state architecture, for a unified description across the entire density range. The variational optimization consistently leads to lower ground-state energies than previous best results. Transition into a Wigner crystal (WC) phase occurs automatically at rs = 37 +/- 1, a density lower than currently believed. Between the liquid and WC phases, the same ansatz and variational search strongly suggest the existence of intermediate states in a broad range of densities, with enhanced short-range nematic spin correlations.

Updated: 2024-05-29 18:00:01

标题: 基态相变：利用统一的变分方法研究二维电子气体

摘要: 二维电子气（2DEG）是一个基本模型，由于近年来对二维材料的实验和理论研究取得的进展，越来越受到关注。目前对2DEG基态的理解依赖于量子蒙特卡罗计算，基于不同相的不同ansatze的变分比较。我们使用单一变分ansatz，一种使用消息传递神经量子态架构的一般反流型波函数，对整个密度范围进行统一描述。变分优化始终导致比先前最佳结果更低的基态能量。在rs = 37 +/- 1时，自动转变为Wigner晶体（WC）相，密度低于目前的认知。在液态和WC相之间，相同的ansatz和变分搜索强烈暗示存在在广泛密度范围内的中间态，具有增强的短程各向异性自旋相关性。

更新时间: 2024-05-29 18:00:01

领域: cond-mat.str-el,cs.LG,physics.comp-ph,quant-ph

下载: http://arxiv.org/abs/2405.19397v1

Neural Scaling Laws From Large-N Field Theory: Solvable Model Beyond the Ridgeless Limit

Many machine learning models based on neural networks exhibit scaling laws: their performance scales as power laws with respect to the sizes of the model and training data set. We use large-N field theory methods to solve a model recently proposed by Maloney, Roberts and Sully which provides a simplified setting to study neural scaling laws. Our solution extends the result in this latter paper to general nonzero values of the ridge parameter, which are essential to regularize the behavior of the model. In addition to obtaining new and more precise scaling laws, we also uncover a duality transformation at the diagrams level which explains the symmetry between model and training data set sizes. The same duality underlies recent efforts to design neural networks to simulate quantum field theories.

Updated: 2024-05-29 18:00:01

标题: 大 N 场论的神经缩放定律：超越无脊限制的可解模型

摘要: 基于神经网络的许多机器学习模型表现出缩放定律：它们的性能与模型大小和训练数据集的大小呈幂律关系。我们使用大N场论方法解决了Maloney、Roberts和Sully最近提出的模型，该模型提供了一个简化的设置来研究神经网络缩放定律。我们的解决方案将后一篇论文中的结果扩展到岭参数的一般非零值，这些值对于规范化模型的行为至关重要。除了获得新的和更精确的缩放定律外，我们还揭示了一个在图表水平上的对偶变换，解释了模型和训练数据集大小之间的对称性。相同的对偶也支持最近的努力设计神经网络来模拟量子场论。

更新时间: 2024-05-29 18:00:01

领域: hep-th,cond-mat.dis-nn,cs.LG,hep-ph

下载: http://arxiv.org/abs/2405.19398v1

X-VILA: Cross-Modality Alignment for Large Language Model

We introduce X-VILA, an omni-modality model designed to extend the capabilities of large language models (LLMs) by incorporating image, video, and audio modalities. By aligning modality-specific encoders with LLM inputs and diffusion decoders with LLM outputs, X-VILA achieves cross-modality understanding, reasoning, and generation. To facilitate this cross-modality alignment, we curate an effective interleaved any-to-any modality instruction-following dataset. Furthermore, we identify a significant problem with the current cross-modality alignment method, which results in visual information loss. To address the issue, we propose a visual alignment mechanism with a visual embedding highway module. We then introduce a resource-efficient recipe for training X-VILA, that exhibits proficiency in any-to-any modality conversation, surpassing previous approaches by large margins. X-VILA also showcases emergent properties across modalities even in the absence of similar training data. The project will be made open-source.

Updated: 2024-05-29 17:59:58

标题: X-VILA：用于大型语言模型的跨模态对齐

摘要: 我们介绍了X-VILA，这是一个全模态模型，旨在通过整合图像、视频和音频模态来拓展大型语言模型（LLMs）的能力。通过将模态特定的编码器与LLM输入对齐，以及将扩散解码器与LLM输出对齐，X-VILA实现了跨模态的理解、推理和生成。为了促进这种跨模态对齐，我们整理了一个有效的混合式任意-任意模态指令跟随数据集。此外，我们发现了当前跨模态对齐方法存在一个重要问题，导致视觉信息丢失。为了解决这个问题，我们提出了一个具有视觉嵌入高速公路模块的视觉对齐机制。然后，我们介绍了一个资源高效的X-VILA训练方案，展示了在任意-任意模态对话中的熟练程度，超越了以往方法很大的差距。X-VILA还展示了在缺乏类似训练数据的情况下，跨模态中出现的新特性。该项目将开源。

更新时间: 2024-05-29 17:59:58

领域: cs.CV,cs.CL,cs.LG

下载: http://arxiv.org/abs/2405.19335v1

LLMs Meet Multimodal Generation and Editing: A Survey

With the recent advancement in large language models (LLMs), there is a growing interest in combining LLMs with multimodal learning. Previous surveys of multimodal large language models (MLLMs) mainly focus on understanding. This survey elaborates on multimodal generation across different domains, including image, video, 3D, and audio, where we highlight the notable advancements with milestone works in these fields. Specifically, we exhaustively investigate the key technical components behind methods and multimodal datasets utilized in these studies. Moreover, we dig into tool-augmented multimodal agents that can use existing generative models for human-computer interaction. Lastly, we also comprehensively discuss the advancement in AI safety and investigate emerging applications as well as future prospects. Our work provides a systematic and insightful overview of multimodal generation, which is expected to advance the development of Artificial Intelligence for Generative Content (AIGC) and world models. A curated list of all related papers can be found at https://github.com/YingqingHe/Awesome-LLMs-meet-Multimodal-Generation

Updated: 2024-05-29 17:59:20

标题: LLMs满足多模态生成和编辑：一项调查

摘要: 随着大型语言模型（LLMs）的最新进展，人们越来越感兴趣将LLMs与多模态学习相结合。先前对多模态大型语言模型（MLLMs）的调查主要集中在理解方面。本调查详细阐述了跨不同领域的多模态生成，包括图像、视频、3D和音频，在这些领域的重要进展以及里程碑作品。具体而言，我们详尽调查了这些研究中使用的方法和多模态数据集背后的关键技术组件。此外，我们探讨了工具增强的多模态代理，这些代理可以利用现有的生成模型进行人机交互。最后，我们还全面讨论了人工智能安全的进展，调查了新兴应用以及未来展望。我们的工作提供了多模态生成的系统和富有洞见的概述，预计将推动生成内容的人工智能（AIGC）和世界模型的发展。所有相关论文的精选列表可在https://github.com/YingqingHe/Awesome-LLMs-meet-Multimodal-Generation 找到。

更新时间: 2024-05-29 17:59:20

领域: cs.AI,cs.CL,cs.CV

下载: http://arxiv.org/abs/2405.19334v1

Self-Exploring Language Models: Active Preference Elicitation for Online Alignment

Preference optimization, particularly through Reinforcement Learning from Human Feedback (RLHF), has achieved significant success in aligning Large Language Models (LLMs) to adhere to human intentions. Unlike offline alignment with a fixed dataset, online feedback collection from humans or AI on model generations typically leads to more capable reward models and better-aligned LLMs through an iterative process. However, achieving a globally accurate reward model requires systematic exploration to generate diverse responses that span the vast space of natural language. Random sampling from standard reward-maximizing LLMs alone is insufficient to fulfill this requirement. To address this issue, we propose a bilevel objective optimistically biased towards potentially high-reward responses to actively explore out-of-distribution regions. By solving the inner-level problem with the reparameterized reward function, the resulting algorithm, named Self-Exploring Language Models (SELM), eliminates the need for a separate RM and iteratively updates the LLM with a straightforward objective. Compared to Direct Preference Optimization (DPO), the SELM objective reduces indiscriminate favor of unseen extrapolations and enhances exploration efficiency. Our experimental results demonstrate that when finetuned on Zephyr-7B-SFT and Llama-3-8B-Instruct models, SELM significantly boosts the performance on instruction-following benchmarks such as MT-Bench and AlpacaEval 2.0, as well as various standard academic benchmarks in different settings. Our code and models are available at https://github.com/shenao-zhang/SELM.

Updated: 2024-05-29 17:59:07

标题: 自我探索语言模型：在线对齐的主动偏好引诱

摘要: 优化偏好，尤其是通过人类反馈的强化学习（RLHF），在使大型语言模型（LLMs）符合人类意图方面取得了显著成功。与固定数据集的离线对齐不同，来自人类或人工智能对模型生成的在线反馈通常通过迭代过程导致更有能力的奖励模型和更好对齐的LLMs。然而，要实现全局准确的奖励模型，需要系统探索以生成涵盖自然语言广阔空间的多样化响应。仅从标准奖励最大化的LLMs中随机抽样是不足以满足这一要求的。为了解决这个问题，我们提出了一个双层目标，乐观地偏向于潜在高奖励响应，以积极探索超出分布范围的区域。通过使用重新参数化的奖励函数解决内部问题，得到的算法被命名为自我探索语言模型（SELM），消除了需要单独的RM，并通过简单的目标迭代更新LLM。与直接偏好优化（DPO）相比，SELM目标减少了对未见外推的不加区分的偏好，并增强了探索效率。我们的实验结果表明，当在Zephyr-7B-SFT和Llama-3-8B-Instruct模型上进行微调时，SELM显著提升了指令遵循基准测试的性能，如MT-Bench和AlpacaEval 2.0，以及不同设置下的各种标准学术基准测试。我们的代码和模型可在https://github.com/shenao-zhang/SELM上找到。

更新时间: 2024-05-29 17:59:07

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.19332v1

NPGA: Neural Parametric Gaussian Avatars

The creation of high-fidelity, digital versions of human heads is an important stepping stone in the process of further integrating virtual components into our everyday lives. Constructing such avatars is a challenging research problem, due to a high demand for photo-realism and real-time rendering performance. In this work, we propose Neural Parametric Gaussian Avatars (NPGA), a data-driven approach to create high-fidelity, controllable avatars from multi-view video recordings. We build our method around 3D Gaussian Splatting for its highly efficient rendering and to inherit the topological flexibility of point clouds. In contrast to previous work, we condition our avatars' dynamics on the rich expression space of neural parametric head models (NPHM), instead of mesh-based 3DMMs. To this end, we distill the backward deformation field of our underlying NPHM into forward deformations which are compatible with rasterization-based rendering. All remaining fine-scale, expression-dependent details are learned from the multi-view videos. To increase the representational capacity of our avatars, we augment the canonical Gaussian point cloud using per-primitive latent features which govern its dynamic behavior. To regularize this increased dynamic expressivity, we propose Laplacian terms on the latent features and predicted dynamics. We evaluate our method on the public NeRSemble dataset, demonstrating that NPGA significantly outperforms the previous state-of-the-art avatars on the self-reenactment task by 2.6 PSNR. Furthermore, we demonstrate accurate animation capabilities from real-world monocular videos.

Updated: 2024-05-29 17:58:09

标题: NPGA：神经参数高斯化身

摘要: 通过创建高保真度的数字化人头是进一步将虚拟组件融入我们日常生活的重要步骤。构建这样的化身是一个具有挑战性的研究问题，因为对逼真性和实时渲染性能的需求很高。在这项工作中，我们提出了神经参数高斯化身（NPGA），这是一种数据驱动的方法，可以从多视角视频录像中创建高保真度、可控制的化身。我们基于3D高斯飞溅构建了我们的方法，因为它具有高效的渲染性能，并继承了点云的拓扑灵活性。与以往的工作相比，我们将我们的化身动态条件化为神经参数头模型（NPHM）的丰富表达空间，而不是基于网格的3DMMs。为此，我们将底层NPHM的反向变形场提炼为与栅格化渲染兼容的正向变形。所有其余的细节、表达依赖的细节都是从多视角视频中学习的。为增加我们的化身的表现能力，我们使用每个原始潜在特征来增强规范高斯点云，这些特征控制其动态行为。为了规范这种增加的动态表现力，我们提出了潜在特征和预测动态的拉普拉斯项。我们在公共NeRSemble数据集上评估了我们的方法，结果表明NPGA在自我再现任务上比前沿化身表现提高了2.6个PSNR。此外，我们展示了从现实世界单目视频中准确的动画能力。

更新时间: 2024-05-29 17:58:09

领域: cs.CV,cs.AI,cs.GR

下载: http://arxiv.org/abs/2405.19331v1

MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series

Large Language Models (LLMs) have made great strides in recent years to achieve unprecedented performance across different tasks. However, due to commercial interest, the most competitive models like GPT, Gemini, and Claude have been gated behind proprietary interfaces without disclosing the training details. Recently, many institutions have open-sourced several strong LLMs like LLaMA-3, comparable to existing closed-source LLMs. However, only the model's weights are provided with most details (e.g., intermediate checkpoints, pre-training corpus, and training code, etc.) being undisclosed. To improve the transparency of LLMs, the research community has formed to open-source truly open LLMs (e.g., Pythia, Amber, OLMo), where more details (e.g., pre-training corpus and training code) are being provided. These models have greatly advanced the scientific study of these large models including their strengths, weaknesses, biases and risks. However, we observe that the existing truly open LLMs on reasoning, knowledge, and coding tasks are still inferior to existing state-of-the-art LLMs with similar model sizes. To this end, we open-source MAP-Neo, a highly capable and transparent bilingual language model with 7B parameters trained from scratch on 4.5T high-quality tokens. Our MAP-Neo is the first fully open-sourced bilingual LLM with comparable performance compared to existing state-of-the-art LLMs. Moreover, we open-source all details to reproduce our MAP-Neo, where the cleaned pre-training corpus, data cleaning pipeline, checkpoints, and well-optimized training/evaluation framework are provided. Finally, we hope our MAP-Neo will enhance and strengthen the open research community and inspire more innovations and creativities to facilitate the further improvements of LLMs.

Updated: 2024-05-29 17:57:16

标题: MAP-Neo：功能强大且透明的双语大型语言模型系列

摘要: 近年来，大型语言模型（LLMs）在不同任务上取得了前所未有的表现。然而，由于商业利益，像GPT、Gemini和Claude这样最具竞争力的模型被封锁在专有界面后，没有披露训练细节。最近，许多机构已经开源了几个强大的LLMs，如LLaMA-3，与现有闭源LLMs相媲美。然而，大多数细节（如中间检查点、预训练语料库和训练代码等）没有披露，仅提供了模型的权重。为了提高LLMs的透明度，研究界已经开始开源真正开放的LLMs（如Pythia、Amber、OLMo），提供了更多细节（如预训练语料库和训练代码）。这些模型极大地推动了对这些大型模型的科学研究，包括它们的优势、弱点、偏见和风险。然而，我们观察到，现有的真正开放的LLMs在推理、知识和编码任务上仍然不及具有相似模型规模的现有最先进的LLMs。为此，我们开源了MAP-Neo，这是一个具有7B参数的高性能和透明的双语语言模型，从零开始在4.5T高质量令牌上进行训练。我们的MAP-Neo是第一个完全开源的双语LLM，性能可与现有最先进的LLMs相媲美。此外，我们开源了所有细节以重现我们的MAP-Neo，提供了清洁的预训练语料库、数据清理管道、检查点和经过良好优化的训练/评估框架。最后，我们希望我们的MAP-Neo将增强和巩固开放研究社区，并激发更多的创新和创造力，促进LLMs的进一步改进。

更新时间: 2024-05-29 17:57:16

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.19327v1

Code Simulation Challenges for Large Language Models

Many reasoning, planning, and problem-solving tasks share an intrinsic algorithmic nature: correctly simulating each step is a sufficient condition to solve them correctly. This work studies to what extent Large Language Models (LLMs) can simulate coding and algorithmic tasks to provide insights into general capabilities in such algorithmic reasoning tasks. We introduce benchmarks for straight-line programs, code that contains critical paths, and approximate and redundant instructions. We further assess the simulation capabilities of LLMs with sorting algorithms and nested loops and show that a routine's computational complexity directly affects an LLM's ability to simulate its execution. While the most powerful LLMs exhibit relatively strong simulation capabilities, the process is fragile, seems to rely heavily on pattern recognition, and is affected by memorisation. We propose a novel off-the-shelf prompting method, Chain of Simulation (CoSm), which instructs LLMs to simulate code execution line by line/follow the computation pattern of compilers. CoSm efficiently helps LLMs reduce memorisation and shallow pattern recognition while improving simulation performance. We consider the success of CoSm in code simulation to be inspirational for other general routine simulation reasoning tasks.

Updated: 2024-05-29 17:56:58

标题: 大型语言模型的代码模拟挑战

摘要: 这项工作研究了大型语言模型（LLMs）在模拟编码和算法任务方面的能力，以提供对这类算法推理任务中的一般能力的洞察。我们引入了针对直线程序、包含关键路径的代码以及近似和冗余指令的基准。我们进一步评估了LLMs在排序算法和嵌套循环中的模拟能力，并展示了一个程序的计算复杂性直接影响LLMs模拟其执行的能力。尽管最强大的LLMs表现出相对较强的模拟能力，但这一过程是脆弱的，似乎在很大程度上依赖于模式识别，并受到记忆的影响。我们提出了一种新颖的即插即用提示方法，Chain of Simulation（CoSm），该方法指导LLMs按行模拟代码执行/遵循编译器的计算模式。CoSm有效地帮助LLMs减少记忆和浅层模式识别，同时提高了模拟性能。我们认为CoSm在代码模拟中的成功对其他一般例行模拟推理任务具有启发性。

更新时间: 2024-05-29 17:56:58

领域: cs.LG,cs.AI,cs.CL,cs.PL

下载: http://arxiv.org/abs/2401.09074v3

Center-Based Relaxed Learning Against Membership Inference Attacks

Membership inference attacks (MIAs) are currently considered one of the main privacy attack strategies, and their defense mechanisms have also been extensively explored. However, there is still a gap between the existing defense approaches and ideal models in performance and deployment costs. In particular, we observed that the privacy vulnerability of the model is closely correlated with the gap between the model's data-memorizing ability and generalization ability. To address this, we propose a new architecture-agnostic training paradigm called center-based relaxed learning (CRL), which is adaptive to any classification model and provides privacy preservation by sacrificing a minimal or no loss of model generalizability. We emphasize that CRL can better maintain the model's consistency between member and non-member data. Through extensive experiments on standard classification datasets, we empirically show that this approach exhibits comparable performance without requiring additional model capacity or data costs.

Updated: 2024-05-29 17:54:47

标题: 基于中心的放松学习对抗成员推断攻击

摘要: 成员推断攻击（MIAs）目前被认为是主要的隐私攻击策略之一，它们的防御机制也得到了广泛探讨。然而，现有防御方法与性能和部署成本中的理想模型之间仍存在差距。特别是，我们观察到模型的隐私漏洞与模型的数据记忆能力和泛化能力之间的差距密切相关。为了解决这个问题，我们提出了一个新的架构无关的训练范式，称为基于中心的放松学习（CRL），它适应于任何分类模型，并通过牺牲最小或没有损失模型泛化能力来提供隐私保护。我们强调CRL可以更好地保持模型在成员和非成员数据之间的一致性。通过对标准分类数据集进行大量实验，我们经验性地展示了这种方法在不需要额外模型容量或数据成本的情况下表现出可比性能。

更新时间: 2024-05-29 17:54:47

领域: cs.LG,cs.AI,cs.CR

下载: http://arxiv.org/abs/2404.17674v2

Are Large Language Models Chameleons?

Do large language models (LLMs) have their own worldviews and personality tendencies? Simulations in which an LLM was asked to answer subjective questions were conducted more than 1 million times. Comparison of the responses from different LLMs with real data from the European Social Survey (ESS) suggests that the effect of prompts on bias and variability is fundamental, highlighting major cultural, age, and gender biases. Methods for measuring the difference between LLMs and survey data are discussed, such as calculating weighted means and a new proposed measure inspired by Jaccard similarity. We conclude that it is important to analyze the robustness and variability of prompts before using LLMs to model individual decisions or collective behavior, as their imitation abilities are approximate at best.

Updated: 2024-05-29 17:54:22

标题: 大型语言模型是变色龙吗？

摘要: 大型语言模型（LLMs）是否具有自己的世界观和个性倾向？对LLM进行了超过100万次的模拟，要求其回答主观问题。将不同LLMs的回答与来自欧洲社会调查（ESS）的真实数据进行比较，表明提示对偏见和变异性的影响是根本的，突出了主要的文化、年龄和性别偏见。讨论了衡量LLMs与调查数据之间差异的方法，例如计算加权平均值和受Jaccard相似性启发的新提出的测量方法。我们得出结论，重要的是在使用LLMs对个体决策或集体行为建模之前，分析提示的稳健性和变异性，因为它们的模仿能力最多是近似的。

更新时间: 2024-05-29 17:54:22

领域: cs.CL,cs.AI,cs.CY,cs.LG

下载: http://arxiv.org/abs/2405.19323v1

Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF

Reinforcement learning from human feedback (RLHF) has demonstrated great promise in aligning large language models (LLMs) with human preference. Depending on the availability of preference data, both online and offline RLHF are active areas of investigation. A key bottleneck is understanding how to incorporate uncertainty estimation in the reward function learned from the preference data for RLHF, regardless of how the preference data is collected. While the principles of optimism or pessimism under uncertainty are well-established in standard reinforcement learning (RL), a practically-implementable and theoretically-grounded form amenable to large language models is not yet available, as standard techniques for constructing confidence intervals become intractable under arbitrary policy parameterizations. In this paper, we introduce a unified approach to online and offline RLHF -- value-incentivized preference optimization (VPO) -- which regularizes the maximum-likelihood estimate of the reward function with the corresponding value function, modulated by a $\textit{sign}$ to indicate whether the optimism or pessimism is chosen. VPO also directly optimizes the policy with implicit reward modeling, and therefore shares a simpler RLHF pipeline similar to direct preference optimization. Theoretical guarantees of VPO are provided for both online and offline settings, matching the rates of their standard RL counterparts. Moreover, experiments on text summarization and dialog verify the practicality and effectiveness of VPO.

Updated: 2024-05-29 17:51:42

标题: 价值激励偏好优化：在线和离线RLHF的统一方法

摘要: 人类反馈强化学习（RLHF）已经展示出在将大型语言模型（LLMs）与人类偏好对齐方面具有巨大潜力。根据偏好数据的可用性，在线和离线RLHF都是研究的活跃领域。一个关键瓶颈是如何在从偏好数据中学习到的奖励函数中合并不确定性估计，无论偏好数据是如何收集的。虽然在标准强化学习（RL）中乐观或悲观情况下不确定性的原则已经得到了确立，但对于大型语言模型而言，实际可实施且具有理论基础的形式尚未出现，因为标准构建置信区间的技术在任意策略参数化下变得难以处理。在本文中，我们介绍了一种统一的在线和离线RLHF方法--价值激励偏好优化（VPO）--它通过相应的价值函数调节最大似然估计的奖励函数，通过一个$\textit{sign}$来指示选择乐观或悲观状态。VPO还直接通过隐式奖励建模优化策略，因此与直接偏好优化类似，共享更简单的RLHF流程。VPO的理论保证适用于在线和离线设置，并且与它们的标准RL对应物的速率相匹配。此外，在文本摘要和对话方面的实验验证了VPO的实用性和有效性。

更新时间: 2024-05-29 17:51:42

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2405.19320v1

Adaptive Generalized Neyman Allocation: Local Asymptotic Minimax Optimal Best Arm Identification

This study investigates a local asymptotic minimax optimal strategy for fixed-budget best arm identification (BAI). We propose the Adaptive Generalized Neyman Allocation (AGNA) strategy and show that its worst-case upper bound of the probability of misidentifying the best arm aligns with the worst-case lower bound under the small-gap regime, where the gap between the expected outcomes of the best and suboptimal arms is small. Our strategy corresponds to a generalization of the Neyman allocation for two-armed bandits (Neyman, 1934; Kaufmann et al., 2016) and a refinement of existing strategies such as the ones proposed by Glynn & Juneja (2004) and Shin et al. (2018). Compared to Komiyama et al. (2022), which proposes a minimax rate-optimal strategy, our proposed strategy has a tighter upper bound that exactly matches the lower bound, including the constant terms, by restricting the class of distributions to the class of small-gap distributions. Our result contributes to the longstanding open issue about the existence of asymptotically optimal strategies in fixed-budget BAI, by presenting the local asymptotic minimax optimal strategy.

Updated: 2024-05-29 17:43:13

标题: 自适应广义Neyman分配：局部渐近极小极优最佳臂识别

摘要: 这项研究探讨了固定预算下最佳臂辨识（BAI）的本地渐近极小极小优化策略。我们提出了自适应广义尼曼分配（AGNA）策略，并展示其误识别最佳臂的概率的最坏情况上限与小间隙制度下的最坏情况下限相符，其中最佳和次优臂的预期结果之间的差距很小。我们的策略对应于两臂老虎机的尼曼分配的泛化（Neyman, 1934; Kaufmann等，2016）以及现有策略的改进，如Glynn＆Juneja（2004）和Shin等（2018）提出的策略。与Komiyama等人（2022）提出的极小极小速率最优策略相比，我们提出的策略具有更紧的上限，并通过将分布类限制为小间隙分布类，确切地匹配了下限，包括常数项。我们的结果为固定预算BAI中渐近最优策略的存在问题做出了贡献，通过提出本地渐近极小极小最优策略。

更新时间: 2024-05-29 17:43:13

领域: cs.LG,cs.AI,econ.EM,stat.ME,stat.ML

下载: http://arxiv.org/abs/2405.19317v1

Robust Preference Optimization through Reward Model Distillation

Language model (LM) post-training (or alignment) involves maximizing a reward function that is derived from preference annotations. Direct Preference Optimization (DPO) is a popular offline alignment method that trains a policy directly on preference data without the need to train a reward model or apply reinforcement learning. However, typical preference datasets have only a single, or at most a few, annotation per preference pair, which causes DPO to overconfidently assign rewards that trend towards infinite magnitude. This frequently leads to degenerate policies, sometimes causing even the probabilities of the preferred generations to go to zero. In this work, we analyze this phenomenon and propose distillation to get a better proxy for the true preference distribution over generation pairs: we train the LM to produce probabilities that match the distribution induced by a reward model trained on the preference data. Moreover, to account for uncertainty in the reward model we are distilling from, we optimize against a family of reward models that, as a whole, is likely to include at least one reasonable proxy for the preference distribution. Our results show that distilling from such a family of reward models leads to improved robustness to distribution shift in preference annotations, while preserving the simple supervised nature of DPO.

Updated: 2024-05-29 17:39:48

标题: 通过奖励模型蒸馏实现健壮的偏好优化

摘要: 语言模型（LM）的后训练（或对齐）涉及最大化源自偏好注释的奖励函数。直接偏好优化（DPO）是一种流行的离线对齐方法，它直接在偏好数据上训练策略，无需训练奖励模型或应用强化学习。然而，典型的偏好数据集每个偏好对仅有一个，或最多几个注释，这导致DPO过于自信地分配趋向无限大小的奖励。这经常导致退化的策略，有时甚至导致优先生成的概率变为零。在这项工作中，我们分析了这一现象，并提出蒸馏以获得对生成对的真实偏好分布更好的代理：我们训练LM产生与在偏好数据上训练的奖励模型诱导的分布相匹配的概率。此外，为了考虑我们正在蒸馏的奖励模型中的不确定性，我们针对一组奖励模型进行优化，整体上可能包含至少一个合理的偏好分布代理。我们的结果表明，从这样一组奖励模型中进行蒸馏可以提高对偏好注释分布转移的鲁棒性，同时保持DPO的简单监督性质。

更新时间: 2024-05-29 17:39:48

领域: cs.LG,cs.CL

下载: http://arxiv.org/abs/2405.19316v1

Matryoshka Query Transformer for Large Vision-Language Models

Large Vision-Language Models (LVLMs) typically encode an image into a fixed number of visual tokens (e.g., 576) and process these tokens with a language model. Despite their strong performance, LVLMs face challenges in adapting to varying computational constraints. This raises the question: can we achieve flexibility in the number of visual tokens to suit different tasks and computational resources? We answer this with an emphatic yes. Inspired by Matryoshka Representation Learning, we introduce the Matryoshka Query Transformer (MQT), capable of encoding an image into m visual tokens during inference, where m can be any number up to a predefined maximum. This is achieved by employing a query transformer with M latent query tokens to compress the visual embeddings. During each training step, we randomly select m <= M latent query tokens and train the model using only these first m tokens, discarding the rest. Combining MQT with LLaVA, we train a single model once, and flexibly and drastically reduce the number of inference-time visual tokens while maintaining similar or better performance compared to training independent models for each number of tokens. Our model, MQT-LLAVA, matches LLaVA-1.5 performance across 11 benchmarks using a maximum of 256 tokens instead of LLaVA's fixed 576. Reducing to 16 tokens (8x less TFLOPs) only sacrifices the performance by 2.4 points on MMBench. On certain tasks such as ScienceQA and MMMU, we can even go down to only 2 visual tokens with performance drops of just 3% and 6% each. Our exploration of the trade-off between the accuracy and computational cost brought about by the number of visual tokens facilitates future research to achieve the best of both worlds.

Updated: 2024-05-29 17:39:42

标题: 大视觉语言模型的马特里奥什卡查询转换器

摘要: 大型视觉-语言模型（LVLMs）通常将图像编码为固定数量的视觉标记（例如576个），并使用语言模型处理这些标记。尽管它们表现出色，但LVLMs在适应不同计算约束方面面临挑战。这引发了一个问题：我们是否可以实现灵活性，以适应不同任务和计算资源的视觉标记数量？我们断言可以。受马特里奥什卡表示学习的启发，我们引入了马特里奥什卡查询变换器（MQT），能够在推断过程中将图像编码为m个视觉标记，其中m可以是预定义最大数量内的任何数字。这是通过使用具有M个潜在查询标记的查询变换器来压缩视觉嵌入实现的。在每个训练步骤中，我们随机选择m <= M个潜在查询标记，并仅使用这些第一个m个标记来训练模型，丢弃其余的标记。将MQT与LLaVA相结合，我们一次训练一个单一模型，并在推断时灵活且大幅减少视觉标记的数量，同时与为每个标记数量训练独立模型相比，保持类似或更好的性能。我们的模型MQT-LLAVA在11个基准测试中使用最多256个标记（而不是LLAVA的固定576个）与LLaVA-1.5性能相匹配。将标记减少到16个（TFLOPs减少8倍）仅在MMBench上牺牲了2.4个点的性能。在某些任务（如ScienceQA和MMMU）中，我们甚至可以将视觉标记减少到仅2个，性能下降仅为3%和6%。我们对标记数量带来的准确性和计算成本之间的权衡探讨促进了未来研究，以实现两全其美。

更新时间: 2024-05-29 17:39:42

领域: cs.CV,cs.CL,cs.LG

下载: http://arxiv.org/abs/2405.19315v1

Language Models Trained to do Arithmetic Predict Human Risky and Intertemporal Choice

The observed similarities in the behavior of humans and Large Language Models (LLMs) have prompted researchers to consider the potential of using LLMs as models of human cognition. However, several significant challenges must be addressed before LLMs can be legitimately regarded as cognitive models. For instance, LLMs are trained on far more data than humans typically encounter, and may have been directly trained on human data in specific cognitive tasks or aligned with human preferences. Consequently, the origins of these behavioral similarities are not well understood. In this paper, we propose a novel way to enhance the utility of LLMs as cognitive models. This approach involves (i) leveraging computationally equivalent tasks that both an LLM and a rational agent need to master for solving a cognitive problem and (ii) examining the specific task distributions required for an LLM to exhibit human-like behaviors. We apply this approach to decision-making -- specifically risky and intertemporal choice -- where the key computationally equivalent task is the arithmetic of expected value calculations. We show that an LLM pretrained on an ecologically valid arithmetic dataset, which we call Arithmetic-GPT, predicts human behavior better than many traditional cognitive models. Pretraining LLMs on ecologically valid arithmetic datasets is sufficient to produce a strong correspondence between these models and human decision-making. Our results also suggest that LLMs used as cognitive models should be carefully investigated via ablation studies of the pretraining data.

Updated: 2024-05-29 17:37:14

标题: 训练进行算术的语言模型预测人类的风险和时间选择

摘要: 人类和大型语言模型（LLMs）行为的相似性促使研究人员考虑将LLMs作为人类认知模型的潜力。然而，在LLMs被正当认为是认知模型之前，必须解决几个重要挑战。例如，LLMs接受的训练数据远远超过人类通常遇到的数据，并且可能直接在特定认知任务或与人类偏好对齐的人类数据上接受了训练。因此，这些行为相似性的起源尚不清楚。在本文中，我们提出了一种增强LLMs作为认知模型实用性的新方法。这种方法包括（i）利用LLM和理性智能体都需要掌握的计算等效任务来解决认知问题，以及（ii）研究LLM展现出类似人类行为所需的具体任务分布。我们将这种方法应用于决策制定 - 具体而言是风险和时间抉择 - 其中关键的计算等效任务是预期价值计算的算术。我们展示了一个预训练在生态有效算术数据集上的LLM，我们称之为算术-GPT，比许多传统认知模型更好地预测人类行为。在生态有效算术数据集上预训练LLMs足以产生这些模型与人类决策制定之间的强相关性。我们的结果还表明，作为认知模型使用的LLMs应通过预训练数据的消融研究进行认真调查。

更新时间: 2024-05-29 17:37:14

领域: cs.AI,cs.CL,econ.GN,q-fin.EC

下载: http://arxiv.org/abs/2405.19313v1

Causal Inference from Slowly Varying Nonstationary Processes

Causal inference from observational data following the restricted structural causal models (SCM) framework hinges largely on the asymmetry between cause and effect from the data generating mechanisms, such as non-Gaussianity or non-linearity. This methodology can be adapted to stationary time series, yet inferring causal relationships from nonstationary time series remains a challenging task. In this work, we propose a new class of restricted SCM, via a time-varying filter and stationary noise, and exploit the asymmetry from nonstationarity for causal identification in both bivariate and network settings. We propose efficient procedures by leveraging powerful estimates of the bivariate evolutionary spectra for slowly varying processes. Various synthetic and real datasets that involve high-order and non-smooth filters are evaluated to demonstrate the effectiveness of our proposed methodology.

Updated: 2024-05-29 17:33:47

标题: 从缓慢变化的非平稳过程中推断因果关系

摘要: 通过遵循受限制的结构因果模型（SCM）框架从观察数据中进行因果推断，在数据生成机制中，如非高斯性或非线性，因果关系在很大程度上取决于因果关系的不对称性。这种方法可以适用于平稳时间序列，但从非平稳时间序列中推断因果关系仍然是一项具有挑战性的任务。在这项工作中，我们提出了一种新的受限制SCM类，通过时间变化的滤波器和稳态噪声，并利用非平稳性的不对称性，对双变量和网络设置中进行因果识别。我们通过利用用于缓慢变化过程的双变量演化谱的强大估计，提出了高效的程序。评估了涉及高阶和非平滑滤波器的各种合成和真实数据集，以证明我们提出的方法的有效性。

更新时间: 2024-05-29 17:33:47

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2405.06902v2

Measuring and Mitigating Bias for Tabular Datasets with Multiple Protected Attributes

Motivated by the recital (67) of the current corrigendum of the AI Act in the European Union, we propose and present measures and mitigation strategies for discrimination in tabular datasets. We specifically focus on datasets that contain multiple protected attributes, such as nationality, age, and sex. This makes measuring and mitigating bias more challenging, as many existing methods are designed for a single protected attribute. This paper comes with a twofold contribution: Firstly, new discrimination measures are introduced. These measures are categorized in our framework along with existing ones, guiding researchers and practitioners in choosing the right measure to assess the fairness of the underlying dataset. Secondly, a novel application of an existing bias mitigation method, FairDo, is presented. We show that this strategy can mitigate any type of discrimination, including intersectional discrimination, by transforming the dataset. By conducting experiments on real-world datasets (Adult, Bank, Compas), we demonstrate that de-biasing datasets with multiple protected attributes is achievable. Further, the transformed fair datasets do not compromise any of the tested machine learning models' performances significantly when trained on these datasets compared to the original datasets. Discrimination was reduced by up to 83% in our experimentation. For most experiments, the disparity between protected groups was reduced by at least 7% and 27% on average. Generally, the findings show that the mitigation strategy used is effective, and this study contributes to the ongoing discussion on the implementation of the European Union's AI Act.

Updated: 2024-05-29 17:27:08

标题: 测量和减轻具有多个受保护属性的表格数据集的偏见

摘要: 受欧盟AI法案目前勘误（第67条）的启发，我们提出并提出了针对表格数据集中歧视的措施和缓解策略。我们特别关注包含多个受保护属性（如国籍、年龄和性别）的数据集。这使得衡量和减轻偏见更具挑战性，因为许多现有方法是为单个受保护属性设计的。本文具有双重贡献：首先，引入了新的歧视性措施。这些措施在我们的框架中与现有措施分类，指导研究人员和从业者选择正确的衡量方法来评估底层数据集的公平性。其次，介绍了现有偏见缓解方法FairDo的新颖应用。我们展示了该策略可以通过转换数据集来减轻任何类型的歧视，包括交叉歧视。通过在真实数据集（成人、银行、Compas）上进行实验，我们证明了对具有多个受保护属性的数据集进行去偏见化是可行的。此外，与原始数据集相比，经过转换的公平数据集不会显著损害任何经过训练的机器学习模型的性能。在我们的实验中，歧视减少了高达83%。在大多数实验中，受保护群体之间的差异至少减少了7%，平均减少了27%。总的来说，研究结果表明所使用的减轻策略是有效的，这项研究对欧盟AI法案的实施进行了持续讨论的贡献。

更新时间: 2024-05-29 17:27:08

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.19300v1

Neural Isometries: Taming Transformations for Equivariant ML

Real-world geometry and 3D vision tasks are replete with challenging symmetries that defy tractable analytical expression. In this paper, we introduce Neural Isometries, an autoencoder framework which learns to map the observation space to a general-purpose latent space wherein encodings are related by isometries whenever their corresponding observations are geometrically related in world space. Specifically, we regularize the latent space such that maps between encodings preserve a learned inner product and commute with a learned functional operator, in the same manner as rigid-body transformations commute with the Laplacian. This approach forms an effective backbone for self-supervised representation learning, and we demonstrate that a simple off-the-shelf equivariant network operating in the pre-trained latent space can achieve results on par with meticulously-engineered, handcrafted networks designed to handle complex, nonlinear symmetries. Furthermore, isometric maps capture information about the respective transformations in world space, and we show that this allows us to regress camera poses directly from the coefficients of the maps between encodings of adjacent views of a scene.

Updated: 2024-05-29 17:24:25

标题: 神经同构：驯服等变ML的变换

摘要: 真实世界的几何和三维视觉任务充满了具有挑战性的对称性，这些对称性难以通过可处理的分析表达。在本文中，我们介绍了神经等距映射，这是一个自动编码器框架，可以学习将观察空间映射到一个通用的潜在空间，在这个空间中，编码在几何上相关的观察对应时通过等距映射相关。具体地，我们规范潜在空间，使得编码之间的映射保持学习到的内积，同时与学习到的功能算子相结合，就像刚体变换与拉普拉斯算子相结合一样。这种方法为自监督表示学习提供了一个有效的骨干，并且我们证明，在预训练的潜在空间中操作的简单现成等变网络可以取得与精心设计、手工制作的网络相媲美的结果，后者旨在处理复杂的非线性对称性。此外，等距映射可以捕捉有关世界空间中相应变换的信息，我们展示，这使我们能够直接从场景相邻视图的编码之间的映射的系数中回归出相机姿势。

更新时间: 2024-05-29 17:24:25

领域: cs.CV,cs.AI,cs.GR,cs.LG

下载: http://arxiv.org/abs/2405.19296v1

Optimizing Foundation Model Inference on a Many-tiny-core Open-source RISC-V Platform

Transformer-based foundation models have become crucial for various domains, most notably natural language processing (NLP) or computer vision (CV). These models are predominantly deployed on high-performance GPUs or hardwired accelerators with highly customized, proprietary instruction sets. Until now, limited attention has been given to RISC-V-based general-purpose platforms. In our work, we present the first end-to-end inference results of transformer models on an open-source many-tiny-core RISC-V platform implementing distributed Softmax primitives and leveraging ISA extensions for SIMD floating-point operand streaming and instruction repetition, as well as specialized DMA engines to minimize costly main memory accesses and to tolerate their latency. We focus on two foundational transformer topologies, encoder-only and decoder-only models. For encoder-only models, we demonstrate a speedup of up to 12.8x between the most optimized implementation and the baseline version. We reach over 79% FPU utilization and 294 GFLOPS/W, outperforming State-of-the-Art (SoA) accelerators by more than 2x utilizing the HW platform while achieving comparable throughput per computational unit. For decoder-only topologies, we achieve 16.1x speedup in the Non-Autoregressive (NAR) mode and up to 35.6x speedup in the Autoregressive (AR) mode compared to the baseline implementation. Compared to the best SoA dedicated accelerator, we achieve 2.04x higher FPU utilization.

Updated: 2024-05-29 17:16:59

标题: 在一个多核开源RISC-V平台上优化基础模型推断

摘要: 基于Transformer的基础模型已经成为各个领域的关键，尤其是自然语言处理（NLP）或计算机视觉（CV）。这些模型主要部署在高性能GPU或硬件加速器上，具有高度定制的专有指令集。直到现在，对基于RISC-V的通用平台的关注有限。在我们的工作中，我们展示了在一个实现了分布式Softmax原语并利用ISA扩展进行SIMD浮点操作数流和指令重复的开源多微核RISC-V平台上Transformer模型的首个端到端推理结果，同时利用专门的DMA引擎来最小化昂贵的主存访问并容忍它们的延迟。我们专注于两种基础的Transformer拓扑结构，分别是仅编码器模型和仅解码器模型。对于仅编码器模型，我们展示了最优化实现和基线版本之间高达12.8倍的加速。我们达到了超过79%的FPU利用率和294 GFLOPS/W的性能，利用硬件平台超过2倍地超越了最新的加速器，并实现了每个计算单元的可比吞吐量。对于仅解码器拓扑结构，我们在非自回归（NAR）模式下实现了16.1倍的加速，相比基线实现，在自回归（AR）模式下实现了高达35.6倍的加速。与最佳的SoA专用加速器相比，我们实现了2.04倍更高的FPU利用率。

更新时间: 2024-05-29 17:16:59

领域: cs.DC,cs.AI,cs.AR,C.4; C.3; I.2

下载: http://arxiv.org/abs/2405.19284v1

Understanding and Minimising Outlier Features in Neural Network Training

Outlier Features (OF) are neurons whose activation magnitudes significantly exceed the average over a neural network's (NN) width. They are well known to emerge during standard transformer training and have the undesirable effect of hindering quantisation in afflicted models. Despite their practical importance, little is known behind why OFs emerge during training, nor how one can minimise them. Our work focuses on the above questions, first identifying several quantitative metrics, such as the kurtosis over neuron activation norms, to measure OFs. With these metrics, we study how architectural and optimisation choices influence OFs, and provide practical insights to minimise OFs during training. As highlights, we emphasise the importance of controlling signal propagation throughout training, and propose the Outlier Protected transformer block, which removes standard Pre-Norm layers to mitigate OFs, without loss of convergence speed or training stability. Overall, our findings shed new light on our understanding of, our ability to prevent, and the complexity of this important facet in NN training dynamics.

Updated: 2024-05-29 17:11:28

标题: 理解和最小化神经网络训练中的异常特征

摘要: 异常特征（OF）是指其激活幅度明显超过神经网络（NN）宽度上的平均值的神经元。它们众所周知会在标准变压器训练过程中出现，并具有不利影响，会阻碍受影响模型中的量化。尽管它们在实践中具有重要意义，但很少人知道为什么在训练过程中会出现异常特征，也不知道如何最小化它们。我们的工作集中在上述问题上，首先确定了几个量化指标，例如神经元激活范数上的峰度，用于衡量异常特征。通过这些指标，我们研究了架构和优化选择如何影响异常特征，并提供了在训练过程中最小化异常特征的实用见解。作为亮点，我们强调了在整个训练过程中控制信号传播的重要性，并提出了Outlier Protected变压器块，它去除了标准的Pre-Norm层以减轻异常特征，而不会损失收敛速度或训练稳定性。总的来说，我们的发现为我们对神经网络训练动态的理解、预防以及这一重要方面的复杂性带来了新的启示。

更新时间: 2024-05-29 17:11:28

领域: cs.LG

下载: http://arxiv.org/abs/2405.19279v1

Ferrari: Federated Feature Unlearning via Optimizing Feature Sensitivity

The advent of Federated Learning (FL) highlights the practical necessity for the 'right to be forgotten' for all clients, allowing them to request data deletion from the machine learning model's service provider. This necessity has spurred a growing demand for Federated Unlearning (FU). Feature unlearning has gained considerable attention due to its applications in unlearning sensitive features, backdoor features, and bias features. Existing methods employ the influence function to achieve feature unlearning, which is impractical for FL as it necessitates the participation of other clients in the unlearning process. Furthermore, current research lacks an evaluation of the effectiveness of feature unlearning. To address these limitations, we define feature sensitivity in the evaluation of feature unlearning according to Lipschitz continuity. This metric characterizes the rate of change or sensitivity of the model output to perturbations in the input feature. We then propose an effective federated feature unlearning framework called Ferrari, which minimizes feature sensitivity. Extensive experimental results and theoretical analysis demonstrate the effectiveness of Ferrari across various feature unlearning scenarios, including sensitive, backdoor, and biased features.

Updated: 2024-05-29 17:11:04

标题: 法拉利：通过优化特征敏感度进行联邦特征遗忘

摘要: 联合学习（FL）的出现凸显了所有客户端对“被遗忘权利”的实际必要性，使他们能够要求从机器学习模型的服务提供商那里删除数据。这种必要性推动了对联合遗忘（FU）的日益增长的需求。特征遗忘由于在遗忘敏感特征、后门特征和偏见特征方面的应用而引起了广泛关注。现有方法利用影响函数来实现特征遗忘，但这对于FL来说是不切实际的，因为它需要其他客户端参与遗忘过程。此外，目前的研究缺乏对特征遗忘有效性的评估。为了解决这些限制，我们根据Lipschitz连续性在特征遗忘评估中定义了特征敏感性。这个度量描述了模型输出对输入特征扰动的变化率或敏感性。然后，我们提出了一个名为Ferrari的有效的联合特征遗忘框架，该框架最小化了特征敏感性。大量的实验结果和理论分析证明了Ferrari在各种特征遗忘场景下的有效性，包括敏感、后门和有偏见的特征。

更新时间: 2024-05-29 17:11:04

领域: cs.LG

下载: http://arxiv.org/abs/2405.17462v2

Deep Latent Variable Modeling of Physiological Signals

A deep latent variable model is a powerful method for capturing complex distributions. These models assume that underlying structures, but unobserved, are present within the data. In this dissertation, we explore high-dimensional problems related to physiological monitoring using latent variable models. First, we present a novel deep state-space model to generate electrical waveforms of the heart using optically obtained signals as inputs. This can bring about clinical diagnoses of heart disease via simple assessment through wearable devices. Second, we present a brain signal modeling scheme that combines the strengths of probabilistic graphical models and deep adversarial learning. The structured representations can provide interpretability and encode inductive biases to reduce the data complexity of neural oscillations. The efficacy of the learned representations is further studied in epilepsy seizure detection formulated as an unsupervised learning problem. Third, we propose a framework for the joint modeling of physiological measures and behavior. Existing methods to combine multiple sources of brain data provided are limited. Direct analysis of the relationship between different types of physiological measures usually does not involve behavioral data. Our method can identify the unique and shared contributions of brain regions to behavior and can be used to discover new functions of brain regions. The success of these innovative computational methods would allow the translation of biomarker findings across species and provide insight into neurocognitive analysis in numerous biological studies and clinical diagnoses, as well as emerging consumer applications.

Updated: 2024-05-29 17:07:33

标题: 生理信号的深层潜变量建模

摘要: 深层潜变量模型是捕捉复杂分布的强大方法。这些模型假设数据中存在未观察到的潜在结构。在本篇论文中，我们探讨了使用潜变量模型解决与生理监测相关的高维问题。首先，我们提出了一种新颖的深度状态空间模型，用于利用光学获取的信号生成心脏的电波形。通过可穿戴设备进行简单评估，可以实现心脏疾病的临床诊断。其次，我们提出了一种结合概率图模型和深度对抗学习优势的大脑信号建模方案。结构化表示可以提供可解释性，并编码归纳偏差以降低神经振荡的数据复杂性。学习表示的效果进一步在癫痫发作检测中作为无监督学习问题进行研究。第三，我们提出了一个用于联合建模生理测量和行为的框架。现有的结合多种脑数据来源的方法有限。直接分析不同类型生理测量之间的关系通常不涉及行为数据。我们的方法可以识别大脑区域对行为的独特和共享贡献，并可用于发现大脑区域的新功能。这些创新的计算方法的成功将允许生物标志物发现在物种之间的转化，并提供对许多生物学研究和临床诊断中的神经认知分析以及新兴消费者应用的洞察。

更新时间: 2024-05-29 17:07:33

领域: cs.LG

下载: http://arxiv.org/abs/2405.19277v1

A Recipe for Charge Density Prediction

In density functional theory, charge density is the core attribute of atomic systems from which all chemical properties can be derived. Machine learning methods are promising in significantly accelerating charge density prediction, yet existing approaches either lack accuracy or scalability. We propose a recipe that can achieve both. In particular, we identify three key ingredients: (1) representing the charge density with atomic and virtual orbitals (spherical fields centered at atom/virtual coordinates); (2) using expressive and learnable orbital basis sets (basis function for the spherical fields); and (3) using high-capacity equivariant neural network architecture. Our method achieves state-of-the-art accuracy while being more than an order of magnitude faster than existing methods. Furthermore, our method enables flexible efficiency-accuracy trade-offs by adjusting the model/basis sizes.

Updated: 2024-05-29 17:07:24

标题: 一个电荷密度预测的配方

摘要: 在密度泛函理论中，电荷密度是原子系统的核心属性，可以从中推导出所有化学性质。机器学习方法在显著加速电荷密度预测方面具有潜力，然而现有方法要么缺乏准确性，要么缺乏可扩展性。我们提出了一种可以同时实现准确性和可扩展性的方法。具体来说，我们确定了三个关键要素：(1)用原子和虚拟轨道(以原子/虚拟坐标为中心的球形场)表示电荷密度；(2)使用富有表现力和可学习的轨道基组(用于球形场的基函数)；(3)使用高容量等变神经网络架构。我们的方法在实现最新技术准确性的同时，比现有方法快一个数量级以上。此外，我们的方法通过调整模型/基组大小实现灵活的效率-准确性权衡。

更新时间: 2024-05-29 17:07:24

领域: physics.comp-ph,cs.LG

下载: http://arxiv.org/abs/2405.19276v1

Robust Emotion Recognition in Context Debiasing

Context-aware emotion recognition (CAER) has recently boosted the practical applications of affective computing techniques in unconstrained environments. Mainstream CAER methods invariably extract ensemble representations from diverse contexts and subject-centred characteristics to perceive the target person's emotional state. Despite advancements, the biggest challenge remains due to context bias interference. The harmful bias forces the models to rely on spurious correlations between background contexts and emotion labels in likelihood estimation, causing severe performance bottlenecks and confounding valuable context priors. In this paper, we propose a counterfactual emotion inference (CLEF) framework to address the above issue. Specifically, we first formulate a generalized causal graph to decouple the causal relationships among the variables in CAER. Following the causal graph, CLEF introduces a non-invasive context branch to capture the adverse direct effect caused by the context bias. During the inference, we eliminate the direct context effect from the total causal effect by comparing factual and counterfactual outcomes, resulting in bias mitigation and robust prediction. As a model-agnostic framework, CLEF can be readily integrated into existing methods, bringing consistent performance gains.

Updated: 2024-05-29 17:07:14

标题: 在上下文去偏差中的强大情绪识别

摘要: 上下文感知情绪识别（CAER）最近在无约束环境中提升了情感计算技术的实际应用。主流的CAER方法总是从不同的背景和主观特征中提取集合表示，以感知目标个体的情绪状态。尽管取得了进展，但最大的挑战仍然是由于上下文偏见干扰。有害的偏见迫使模型依赖于背景上下文和情绪标签之间的虚假相关性来进行可能性估计，导致严重的性能瓶颈并混淆了有价值的上下文先验。在本文中，我们提出了一个反事实情绪推断（CLEF）框架来解决上述问题。具体来说，我们首先制定了一个广义因果图来解耦CAER中变量之间的因果关系。根据因果图，CLEF引入一个非侵入式的上下文分支来捕捉由上下文偏见引起的不良直接效应。在推断过程中，我们通过比较事实和反事实结果来消除直接上下文效应对总因果效应的影响，从而减轻偏见并进行稳健预测。作为一个与模型无关的框架，CLEF可以轻松集成到现有方法中，带来一致的性能提升。

更新时间: 2024-05-29 17:07:14

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2403.05963v2

Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning

Recent advances in text-to-music editing, which employ text queries to modify music (e.g.\ by changing its style or adjusting instrumental components), present unique challenges and opportunities for AI-assisted music creation. Previous approaches in this domain have been constrained by the necessity to train specific editing models from scratch, which is both resource-intensive and inefficient; other research uses large language models to predict edited music, resulting in imprecise audio reconstruction. To Combine the strengths and address these limitations, we introduce Instruct-MusicGen, a novel approach that finetunes a pretrained MusicGen model to efficiently follow editing instructions such as adding, removing, or separating stems. Our approach involves a modification of the original MusicGen architecture by incorporating a text fusion module and an audio fusion module, which allow the model to process instruction texts and audio inputs concurrently and yield the desired edited music. Remarkably, Instruct-MusicGen only introduces 8% new parameters to the original MusicGen model and only trains for 5K steps, yet it achieves superior performance across all tasks compared to existing baselines, and demonstrates performance comparable to the models trained for specific tasks. This advancement not only enhances the efficiency of text-to-music editing but also broadens the applicability of music language models in dynamic music production environments.

Updated: 2024-05-29 17:05:32

标题: Instruct-MusicGen: 通过指导调整实现音乐语言模型的文本到音乐编辑

摘要: 最近在文本到音乐编辑领域取得了进展，利用文本查询来修改音乐（例如通过改变其风格或调整乐器组件），为AI辅助音乐创作提供了独特的挑战和机遇。在这一领域先前的方法受限于需要从头开始训练特定的编辑模型，这既消耗资源又低效；其他研究使用大型语言模型来预测编辑后的音乐，导致音频重建不精确。为了结合各方面的优势并解决这些限制，我们引入了Instruct-MusicGen，这是一种新颖的方法，通过对预训练的MusicGen模型进行微调，以有效地遵循编辑指令，例如添加、删除或分离音轨。我们的方法涉及对原始MusicGen架构的修改，包括一个文本融合模块和一个音频融合模块，这允许模型同时处理指令文本和音频输入，并产生所需的编辑后音乐。值得注意的是，Instruct-MusicGen仅向原始MusicGen模型引入了8%的新参数，并仅训练了5K步，然而与现有基线相比，在所有任务上都实现了优越的性能，并展示了与针对特定任务进行训练的模型相当的性能。这一进步不仅提高了文本到音乐编辑的效率，还拓宽了音乐语言模型在动态音乐制作环境中的适用性。

更新时间: 2024-05-29 17:05:32

领域: cs.SD,cs.AI,cs.LG,cs.MM,eess.AS

下载: http://arxiv.org/abs/2405.18386v2

Mitigating Disparate Impact of Differential Privacy in Federated Learning through Robust Clustering

Federated Learning (FL) is a decentralized machine learning (ML) approach that keeps data localized and often incorporates Differential Privacy (DP) to enhance privacy guarantees. Similar to previous work on DP in ML, we observed that differentially private federated learning (DPFL) introduces performance disparities, particularly affecting minority groups. Recent work has attempted to address performance fairness in vanilla FL through clustering, but this method remains sensitive and prone to errors, which are further exacerbated by the DP noise in DPFL. To fill this gap, in this paper, we propose a novel clustered DPFL algorithm designed to effectively identify clients' clusters in highly heterogeneous settings while maintaining high accuracy with DP guarantees. To this end, we propose to cluster clients based on both their model updates and training loss values. Our proposed approach also addresses the server's uncertainties in clustering clients' model updates by employing larger batch sizes along with Gaussian Mixture Model (GMM) to alleviate the impact of noise and potential clustering errors, especially in privacy-sensitive scenarios. We provide theoretical analysis of the effectiveness of our proposed approach. We also extensively evaluate our approach across diverse data distributions and privacy budgets and show its effectiveness in mitigating the disparate impact of DP in FL settings with a small computational cost.

Updated: 2024-05-29 17:03:31

标题: 通过强大的聚类减轻联邦学习中差异隐私的影响

摘要: 联邦学习（FL）是一种分散的机器学习（ML）方法，它将数据保留在本地，并经常结合差分隐私（DP）来增强隐私保障。与先前关于ML中DP的工作类似，我们观察到差分私有化的联邦学习（DPFL）引入了性能差异，特别影响少数群体。最近的工作尝试通过聚类来解决普通FL中的性能公平性问题，但这种方法仍然敏感且容易出错，而DPFL中的DP噪声进一步加剧了这种情况。为了填补这一空白，本文提出了一种新颖的聚类DPFL算法，旨在有效识别高度异质设置中的客户群集，同时保持高准确性和DP保证。为此，我们提出根据客户的模型更新和训练损失值来对客户进行聚类的方法。我们提出的方法还通过采用更大的批量大小以及高斯混合模型（GMM）来减轻噪声和潜在的聚类错误的影响，特别是在隐私敏感的情况下，解决了服务器在聚类客户模型更新时的不确定性。我们对我们提出的方法的有效性进行了理论分析。我们还在不同数据分布和隐私预算下广泛评估我们的方法，并展示了其在减轻FL设置中DP的不平等影响方面的有效性，而且计算成本很小。

更新时间: 2024-05-29 17:03:31

领域: cs.LG,cs.CR,cs.DC

下载: http://arxiv.org/abs/2405.19272v1

Rich-Observation Reinforcement Learning with Continuous Latent Dynamics

Sample-efficiency and reliability remain major bottlenecks toward wide adoption of reinforcement learning algorithms in continuous settings with high-dimensional perceptual inputs. Toward addressing these challenges, we introduce a new theoretical framework, RichCLD (Rich-Observation RL with Continuous Latent Dynamics), in which the agent performs control based on high-dimensional observations, but the environment is governed by low-dimensional latent states and Lipschitz continuous dynamics. Our main contribution is a new algorithm for this setting that is provably statistically and computationally efficient. The core of our algorithm is a new representation learning objective; we show that prior representation learning schemes tailored to discrete dynamics do not naturally extend to the continuous setting. Our new objective is amenable to practical implementation, and empirically, we find that it compares favorably to prior schemes in a standard evaluation protocol. We further provide several insights into the statistical complexity of the RichCLD framework, in particular proving that certain notions of Lipschitzness that admit sample-efficient learning in the absence of rich observations are insufficient in the rich-observation setting.

Updated: 2024-05-29 17:02:49

标题: 使用连续潜在动态的Rich-Observation强化学习

摘要: 样本效率和可靠性仍然是在具有高维感知输入的连续环境中广泛采用强化学习算法时的主要瓶颈。为了解决这些挑战，我们引入了一个新的理论框架，即RichCLD（具有连续潜在动态的富观测RL），在这个框架中，代理根据高维观测执行控制，但环境由低维潜在状态和Lipschitz连续动态驱动。我们的主要贡献是针对这种情况的一种新算法，可以证明在统计和计算效率方面是有效的。我们算法的核心是一个新的表示学习目标；我们表明，专门针对离散动态的先前表示学习方案并不自然地扩展到连续环境。我们的新目标适合于实际实施，并在实证中发现，它在标准评估协议中与先前方案相比表现有利。我们进一步提供了关于RichCLD框架的统计复杂性的几个见解，特别是证明在富观测环境中，某些Lipschitzness概念在缺乏丰富观测的情况下适用于样本有效学习。

更新时间: 2024-05-29 17:02:49

领域: cs.LG

下载: http://arxiv.org/abs/2405.19269v1

Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations

Scale has become a main ingredient in obtaining strong machine learning models. As a result, understanding a model's scaling properties is key to effectively designing both the right training setup as well as future generations of architectures. In this work, we argue that scale and training research has been needlessly complex due to reliance on the cosine schedule, which prevents training across different lengths for the same model size. We investigate the training behavior of a direct alternative - constant learning rate and cooldowns - and find that it scales predictably and reliably similar to cosine. Additionally, we show that stochastic weight averaging yields improved performance along the training trajectory, without additional training costs, across different scales. Importantly, with these findings we demonstrate that scaling experiments can be performed with significantly reduced compute and GPU hours by utilizing fewer but reusable training runs. Our code is available at https://github.com/epfml/schedules-and-scaling.

Updated: 2024-05-29 16:56:26

标题: 尺度规律与超越固定训练时长的计算优化训练

摘要: 规模已经成为获得强大机器学习模型的主要要素。因此，理解模型的缩放特性对于有效设计正确的训练设置以及未来的架构至关重要。在这项工作中，我们认为，由于依赖余弦调度，规模和训练研究变得不必要复杂，这阻止了对相同模型大小的不同长度进行训练。我们研究了一个直接的替代方案 - 恒定学习率和冷却时间 - 并发现它与余弦函数相似地可预测和可靠地缩放。此外，我们展示了随机权重平均化在训练轨迹上提供了改进的性能，而不增加额外的训练成本，跨不同规模。重要的是，通过这些发现，我们展示了可以利用更少但可重复使用的训练运行显著减少计算和GPU小时数来进行规模实验。我们的代码可在https://github.com/epfml/schedules-and-scaling上找到。

更新时间: 2024-05-29 16:56:26

领域: cs.LG

下载: http://arxiv.org/abs/2405.18392v2

Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models

Large language models are usually fine-tuned to align with human preferences. However, fine-tuning a large language model can be challenging. In this work, we introduce $\textit{weak-to-strong search}$, framing the alignment of a large language model as a test-time greedy search to maximize the log-likelihood difference between small tuned and untuned models while sampling from the frozen large model. This method serves both as (i) a compute-efficient model up-scaling strategy that avoids directly tuning the large model and as (ii) an instance of weak-to-strong generalization that enhances a strong model with weak test-time guidance. Empirically, we demonstrate the flexibility of weak-to-strong search across different tasks. In controlled-sentiment generation and summarization, we use tuned and untuned $\texttt{gpt2}$s to effectively improve the alignment of large models without additional training. Crucially, in a more difficult instruction-following benchmark, AlpacaEval 2.0, we show that reusing off-the-shelf small model pairs (e.g., $\texttt{zephyr-7b-beta}$ and its untuned version) can significantly improve the length-controlled win rates of both white-box and black-box large models against $\texttt{gpt-4-turbo}$ (e.g., $34.4 \rightarrow 37.9$ for $\texttt{Llama-3-70B-Instruct}$ and $16.0 \rightarrow 20.1$ for $\texttt{gpt-3.5-turbo-instruct}$), despite the small models' low win rates $\approx 10.0$.

Updated: 2024-05-29 16:55:32

标题: 弱到强搜索：通过在小语言模型上进行搜索来对齐大型语言模型

摘要: 大型语言模型通常被微调以与人类偏好对齐。然而，微调大型语言模型可能具有挑战性。在这项工作中，我们介绍了“弱到强搜索”，将大型语言模型的对齐视为一个在测试时贪婪搜索，以最大化小型微调和未微调模型之间的对数似然差异，同时从冻结的大模型中进行抽样。这种方法既可以作为一种计算高效的模型扩展策略，避免直接微调大模型，又可以作为一种弱到强泛化的实例，通过弱测试时引导增强强模型。在经验上，我们展示了弱到强搜索在不同任务中的灵活性。在受控情感生成和总结中，我们使用微调和未微调的gpt2有效地提高了大型模型的对齐性，而无需额外训练。在更困难的指令遵循基准测试AlpacaEval 2.0中，我们展示了重复使用现成的小模型对（例如，zephyr-7b-beta及其未微调版本）可以显著改善白盒和黑盒大模型对gpt-4-turbo的长度控制胜率（例如，Llama-3-70B-Instruct从34.4提高到37.9，gpt-3.5-turbo-instruct从16.0提高到20.1），尽管小模型的胜率约为10.0。

更新时间: 2024-05-29 16:55:32

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.19262v1

Faster Cascades via Speculative Decoding

Cascades and speculative decoding are two common approaches to improving language models' inference efficiency. Both approaches involve interleaving models of different sizes, but via fundamentally distinct mechanisms: cascades employ a deferral rule that invokes the larger model only for "hard" inputs, while speculative decoding uses speculative execution to primarily invoke the larger model in parallel verification mode. These mechanisms offer different benefits: empirically, cascades are often capable of yielding better quality than even the larger model, while theoretically, speculative decoding offers a guarantee of quality-neutrality. In this paper, we leverage the best of both these approaches by designing new speculative cascading techniques that implement their deferral rule through speculative execution. We characterize the optimal deferral rule for our speculative cascades, and employ a plug-in approximation to the optimal rule. Through experiments with T5 models on benchmark language tasks, we show that the proposed approach yields better cost-quality trade-offs than cascading and speculative decoding baselines.

Updated: 2024-05-29 16:55:08

标题: 更快的级联通过推测解码

摘要: 级联和投机解码是改进语言模型推理效率的两种常见方法。这两种方法都涉及交替使用不同大小的模型，但通过基本不同的机制：级联采用延迟规则，仅对“困难”输入调用较大的模型，而投机解码则使用投机执行，主要以并行验证模式调用较大的模型。这些机制提供不同的好处：经验上，级联通常能够产生比较大模型甚至更好的质量，而理论上，投机解码提供了质量中性的保证。在本文中，我们通过设计新的投机级联技术，利用这两种方法的优势，通过投机执行实现它们的延迟规则。我们对我们的投机级联进行了最佳延迟规则的表征，并使用最佳规则的插件近似。通过在基准语言任务上使用T5模型进行实验，我们展示了所提出的方法比级联和投机解码基线产生更好的成本-质量权衡。

更新时间: 2024-05-29 16:55:08

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.19261v1

Exploring Fairness in Educational Data Mining in the Context of the Right to be Forgotten

In education data mining (EDM) communities, machine learning has achieved remarkable success in discovering patterns and structures to tackle educational challenges. Notably, fairness and algorithmic bias have gained attention in learning analytics of EDM. With the increasing demand for the right to be forgotten, there is a growing need for machine learning models to forget sensitive data and its impact, particularly within the realm of EDM. The paradigm of selective forgetting, also known as machine unlearning, has been extensively studied to address this need by eliminating the influence of specific data from a pre-trained model without complete retraining. However, existing research assumes that interactive data removal operations are conducted in secure and reliable environments, neglecting potential malicious unlearning requests to undermine the fairness of machine learning systems. In this paper, we introduce a novel class of selective forgetting attacks designed to compromise the fairness of learning models while maintaining their predictive accuracy, thereby preventing the model owner from detecting the degradation in model performance. Additionally, we propose an innovative optimization framework for selective forgetting attacks, capable of generating malicious unlearning requests across various attack scenarios. We validate the effectiveness of our proposed selective forgetting attacks on fairness through extensive experiments using diverse EDM datasets.

Updated: 2024-05-29 16:52:43

标题: 在“被遗忘权”背景下探索教育数据挖掘中的公平性

摘要: 在教育数据挖掘（EDM）社区中，机器学习在发现模式和结构以应对教育挑战方面取得了显著成功。值得注意的是，在EDM的学习分析中，公平性和算法偏见备受关注。随着对被遗忘权利的需求日益增加，机器学习模型需要忘记敏感数据及其影响，尤其是在EDM领域内。选择性遗忘的范式，也称为机器取消学习，已被广泛研究，以满足这种需求，从而在不完全重新训练的情况下消除预先训练模型的特定数据的影响。然而，现有研究假设交互式数据删除操作是在安全可靠的环境中进行的，忽视了可能的恶意取消学习请求，以破坏机器学习系统的公平性。在本文中，我们介绍了一类新颖的选择性遗忘攻击，旨在破坏学习模型的公平性，同时保持其预测准确性，从而阻止模型所有者检测到模型性能的下降。此外，我们提出了一个创新的选择性遗忘攻击优化框架，能够在各种攻击场景下生成恶意取消学习请求。我们通过使用各种EDM数据集进行广泛实验，验证了我们提出的选择性遗忘攻击对公平性的有效性。

更新时间: 2024-05-29 16:52:43

领域: cs.LG

下载: http://arxiv.org/abs/2405.16798v2

Generating In-Distribution Proxy Graphs for Explaining Graph Neural Networks

Graph Neural Networks (GNNs) have become a building block in graph data processing, with wide applications in critical domains. The growing needs to deploy GNNs in high-stakes applications necessitate explainability for users in the decision-making processes. A popular paradigm for the explainability of GNNs is to identify explainable subgraphs by comparing their labels with the ones of original graphs. This task is challenging due to the substantial distributional shift from the original graphs in the training set to the set of explainable subgraphs, which prevents accurate prediction of labels with the subgraphs. To address it, in this paper, we propose a novel method that generates proxy graphs for explainable subgraphs that are in the distribution of training data. We introduce a parametric method that employs graph generators to produce proxy graphs. A new training objective based on information theory is designed to ensure that proxy graphs not only adhere to the distribution of training data but also preserve explanatory factors. Such generated proxy graphs can be reliably used to approximate the predictions of the labels of explainable subgraphs. Empirical evaluations across various datasets demonstrate our method achieves more accurate explanations for GNNs.

Updated: 2024-05-29 16:52:16

标题: 生成用于解释图神经网络的内部分布代理图 (Note: "In-Distribution" in this context likely refers to graphs that are within the same distribution as the training data for the graph neural network.)

摘要: 图神经网络（GNNs）已经成为图数据处理中的基本构建模块，在关键领域具有广泛的应用。在高风险应用中部署GNNs的不断增长的需求需要用户在决策过程中具有可解释性。解释GNNs的流行范式是通过比较其标签与原始图的标签来识别可解释子图。这个任务很具有挑战性，原因是训练集中的原始图与可解释子图集之间存在重大的分布偏移，这会阻碍对子图标签的准确预测。为了解决这个问题，在本文中，我们提出了一种新颖的方法，生成在训练数据分布中的可解释子图的代理图。我们引入了一种使用图生成器产生代理图的参数化方法。基于信息理论设计了一个新的训练目标，以确保代理图不仅符合训练数据的分布，还保留解释因素。这样生成的代理图可可靠地用于近似预测可解释子图的标签。在各种数据集上的实证评估表明，我们的方法为GNNs实现了更准确的解释。

更新时间: 2024-05-29 16:52:16

领域: cs.LG

下载: http://arxiv.org/abs/2402.02036v2

A Privacy-Preserving Graph Encryption Scheme Based on Oblivious RAM

Graph encryption schemes play a crucial role in facilitating secure queries on encrypted graphs hosted on untrusted servers. With applications spanning navigation systems, network topology, and social networks, the need to safeguard sensitive data becomes paramount. Existing graph encryption methods, however, exhibit vulnerabilities by inadvertently revealing aspects of the graph structure and query patterns, posing threats to security and privacy. In response, we propose a novel graph encryption scheme designed to mitigate access pattern and query pattern leakage through the integration of oblivious RAM and trusted execution environment techniques, exemplified by a Trusted Execution Environment (TEE). Our solution establishes two key security objectives: (1) ensuring that adversaries, when presented with an encrypted graph, remain oblivious to any information regarding the underlying graph, and (2) achieving query indistinguishability by concealing access patterns. Additionally, we conducted experimentation to evaluate the efficiency of the proposed schemes when dealing with real-world location navigation services.

Updated: 2024-05-29 16:47:38

标题: 基于遗忘RAM的隐私保护图加密方案

摘要: 图加密方案在促进在不受信任的服务器上托管的加密图上进行安全查询方面发挥着至关重要的作用。应用程序涵盖导航系统、网络拓扑和社交网络，保护敏感数据的需求变得至关重要。然而，现有的图加密方法存在漏洞，无意中泄露图结构和查询模式的方面，从而对安全性和隐私构成威胁。为此，我们提出了一种新颖的图加密方案，旨在通过集成遗忘型RAM和受信任执行环境技术来减轻访问模式和查询模式泄漏的问题，其中一个典型的例子就是可信执行环境（TEE）。我们的解决方案确立了两个关键的安全目标：（1）确保敌手在面对加密图时对底层图的任何信息保持无知，（2）通过隐藏访问模式实现查询不可区分性。此外，我们进行了实验，评估了提出的方案在处理真实世界位置导航服务时的效率。

更新时间: 2024-05-29 16:47:38

领域: cs.CR

下载: http://arxiv.org/abs/2405.19259v1

PINE: Efficient Norm-Bound Verification for Secret-Shared Vectors

Secure aggregation of high-dimensional vectors is a fundamental primitive in federated statistics and learning. A two-server system such as PRIO allows for scalable aggregation of secret-shared vectors. Adversarial clients might try to manipulate the aggregate, so it is important to ensure that each (secret-shared) contribution is well-formed. In this work, we focus on the important and well-studied goal of ensuring that each contribution vector has bounded Euclidean norm. Existing protocols for ensuring bounded-norm contributions either incur a large communication overhead, or only allow for approximate verification of the norm bound. We propose Private Inexpensive Norm Enforcement (PINE): a new protocol that allows exact norm verification with little communication overhead. For high-dimensional vectors, our approach has a communication overhead of a few percent, compared to the 16-32x overhead of previous approaches.

Updated: 2024-05-29 16:47:17

标题: PINE：用于秘密共享向量的高效规范约束验证

摘要: 高维向量的安全聚合是联邦统计学和学习中的基本原语。 PRIO等双服务器系统允许可扩展的秘密共享向量的聚合。对手客户端可能会尝试操纵聚合，因此确保每个（秘密共享）贡献都是良好形式的至关重要。在这项工作中，我们关注确保每个贡献向量具有有界欧几里得范数的重要且经过充分研究的目标。现有的确保有界范数贡献的协议要么产生大量通信开销，要么仅允许对范数边界进行近似验证。我们提出了私人低成本范数强制执行（PINE）：一种新协议，允许以很少的通信开销进行精确的范数验证。对于高维向量，与先前方法的16-32倍的开销相比，我们的方法的通信开销仅为几个百分比。

更新时间: 2024-05-29 16:47:17

领域: cs.CR

下载: http://arxiv.org/abs/2311.10237v2

UP5: Unbiased Foundation Model for Fairness-aware Recommendation

Recent advances in Foundation Models such as Large Language Models (LLMs) have propelled them to the forefront of Recommender Systems (RS). Despite their utility, there is a growing concern that LLMs might inadvertently perpetuate societal stereotypes, resulting in unfair recommendations. Since fairness is critical for RS as many users take it for decision-making and demand fulfillment, this paper focuses on user-side fairness for LLM-based recommendation where the users may require a recommender system to be fair on specific sensitive features such as gender or age. In this paper, we dive into the extent of unfairness exhibited by LLM-based recommender models based on both T5 and LLaMA backbones, and discuss appropriate methods for promoting equitable treatment of users in LLM-based recommendation models. We introduce a novel Counterfactually-Fair-Prompt (CFP) method towards Unbiased Foundation mOdels (UFO) for fairness-aware LLM-based recommendation. Experiments are conducted on two real-world datasets, MovieLens-1M and Insurance, and compared with both matching-based and sequential-based fairness-aware recommendation models. Results show that CFP achieves better recommendation performance with a high level of fairness. Data and code are open-sourced at https://github.com/agiresearch/UP5.

Updated: 2024-05-29 16:46:47

标题: UP5：公正感知推荐的无偏基础模型

摘要: 最近基于大型语言模型（LLMs）等基础模型的进展将它们推到了推荐系统（RS）的前沿。尽管它们非常实用，但人们越来越担心LLMs可能会无意中持续社会刻板印象，导致不公平的推荐。由于公平对于RS至关重要，因为许多用户将其用于决策和需求满足，本文侧重于基于用户的公平性，用户可能要求推荐系统在特定敏感特征（如性别或年龄）上公平。本文深入探讨了基于T5和LLaMA骨干的LLM推荐模型所展示的不公平程度，并讨论了促进LLM推荐模型中用户公平处理的适当方法。我们引入了一种新颖的反事实公平提示（CFP）方法，以实现面向公平感知的LLM推荐的无偏基础模型（UFO）。实验在两个真实世界数据集MovieLens-1M和Insurance上进行，并与基于匹配和基于顺序的公平感知推荐模型进行比较。结果显示，CFP在高水平的公平性下实现了更好的推荐性能。数据和代码在https://github.com/agiresearch/UP5 上开源。

更新时间: 2024-05-29 16:46:47

领域: cs.IR,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2305.12090v2

HetCAN: A Heterogeneous Graph Cascade Attention Network with Dual-Level Awareness

Heterogeneous graph neural networks(HGNNs) have recently shown impressive capability in modeling heterogeneous graphs that are ubiquitous in real-world applications. Most existing methods for heterogeneous graphs mainly learn node embeddings by stacking multiple convolutional or attentional layers, which can be considered as capturing the high-order information from node-level aspect. However, different types of nodes in heterogeneous graphs have diverse features, it is also necessary to capture interactions among node features, namely the high-order information from feature-level aspect. In addition, most methods first align node features by mapping them into one same low-dimensional space, while they may lose some type information of nodes in this way. To address these problems, in this paper, we propose a novel Heterogeneous graph Cascade Attention Network (HetCAN) composed of multiple cascade blocks. Each cascade block includes two components, the type-aware encoder and the dimension-aware encoder. Specifically, the type-aware encoder compensates for the loss of node type information and aims to make full use of graph heterogeneity. The dimension-aware encoder is able to learn the feature-level high-order information by capturing the interactions among node features. With the assistance of these components, HetCAN can comprehensively encode information of node features, graph heterogeneity and graph structure in node embeddings. Extensive experiments demonstrate the superiority of HetCAN over advanced competitors and also exhibit its efficiency and robustness.

Updated: 2024-05-29 16:43:56

标题: HetCAN：具有双层意识的异构图级联注意力网络

摘要: 异质图神经网络（HGNNs）最近在建模现实世界中普遍存在的异质图方面展示出了令人印象深刻的能力。大多数现有的异质图方法主要通过堆叠多个卷积或注意力层来学习节点嵌入，这可以被视为从节点级别方面捕获高阶信息。然而，在异质图中不同类型的节点具有多样化的特征，因此捕获节点特征之间的相互作用，即从特征级别方面获取高阶信息，也是必要的。此外，大多数方法首先通过将节点特征映射到同一低维空间来对齐节点特征，但这样做可能会丢失一些节点类型信息。为了解决这些问题，本文提出了一种由多个级联块组成的新型异质图级联注意力网络（HetCAN）。每个级联块包括两个组件，即类型感知编码器和维度感知编码器。具体而言，类型感知编码器补偿了节点类型信息的丢失，并旨在充分利用图的异质性。维度感知编码器能够通过捕获节点特征之间的相互作用来学习特征级别的高阶信息。借助这些组件的帮助，HetCAN可以全面地编码节点特征、图的异质性和图结构的信息。大量实验证明了HetCAN相对于先进竞争对手的优越性，也展示了其效率和鲁棒性。

更新时间: 2024-05-29 16:43:56

领域: cs.LG,cs.SI

下载: http://arxiv.org/abs/2311.03275v2

Weak Generative Sampler to Efficiently Sample Invariant Distribution of Stochastic Differential Equation

Sampling invariant distributions from an Ito diffusion process presents a significant challenge in stochastic simulation. Traditional numerical solvers for stochastic differential equations require both a fine step size and a lengthy simulation period, resulting in both biased and correlated samples. Current deep learning-based method solves the stationary Fokker--Planck equation to determine the invariant probability density function in form of deep neural networks, but they generally do not directly address the problem of sampling from the computed density function. In this work, we introduce a framework that employs a weak generative sampler (WGS) to directly generate independent and identically distributed (iid) samples induced by a transformation map derived from the stationary Fokker--Planck equation. Our proposed loss function is based on the weak form of the Fokker--Planck equation, integrating normalizing flows to characterize the invariant distribution and facilitate sample generation from the base distribution. Our randomized test function circumvents the need for mini-max optimization in the traditional weak formulation. Distinct from conventional generative models, our method neither necessitates the computationally intensive calculation of the Jacobian determinant nor the invertibility of the transformation map. A crucial component of our framework is the adaptively chosen family of test functions in the form of Gaussian kernel functions with centres selected from the generated data samples. Experimental results on several benchmark examples demonstrate the effectiveness of our method, which offers both low computational costs and excellent capability in exploring multiple metastable states.

Updated: 2024-05-29 16:41:42

标题: 弱生成采样器用于高效采样随机微分方程的不变分布

摘要: 从Ito扩散过程中抽样不变分布在随机模拟中提出了一个重要挑战。传统的随机微分方程数值解需要精细的步长和漫长的模拟周期，导致样本既有偏差又相关。当前基于深度学习的方法解决了稳态福克-普朗克方程，以确定深度神经网络形式的不变概率密度函数，但它们通常不直接解决从计算密度函数中抽样的问题。在这项工作中，我们引入了一个框架，利用弱生成器采样器（WGS）直接生成由稳态福克-普朗克方程导出的变换映射诱导的独立同分布（iid）样本。我们提出的损失函数基于福克-普朗克方程的弱形式，整合了标准流来表征不变分布并促进从基础分布中生成样本。我们的随机测试函数避免了传统弱形式中的最小最大优化的需要。与传统生成模型不同，我们的方法既不需要计算密集型的雅各比行列式，也不需要变换映射的可逆性。我们框架的一个关键组成部分是自适应选择的以高斯核函数形式的测试函数族，其中中心是从生成的数据样本中选择的。在几个基准示例上的实验结果表明了我们方法的有效性，它既具有低计算成本，又能够很好地探索多个亚稳态。

更新时间: 2024-05-29 16:41:42

领域: cs.LG,cs.NA,math.NA

下载: http://arxiv.org/abs/2405.19256v1

Towards Next-Generation Urban Decision Support Systems through AI-Powered Generation of Scientific Ontology using Large Language Models -- A Case in Optimizing Intermodal Freight Transportation

The incorporation of Artificial Intelligence (AI) models into various optimization systems is on the rise. Yet, addressing complex urban and environmental management problems normally requires in-depth domain science and informatics expertise. This expertise is essential for deriving data and simulation-driven for informed decision support. In this context, we investigate the potential of leveraging the pre-trained Large Language Models (LLMs). By adopting ChatGPT API as the reasoning core, we outline an integrated workflow that encompasses natural language processing, methontology-based prompt tuning, and transformers. This workflow automates the creation of scenario-based ontology using existing research articles and technical manuals of urban datasets and simulations. The outcomes of our methodology are knowledge graphs in widely adopted ontology languages (e.g., OWL, RDF, SPARQL). These facilitate the development of urban decision support systems by enhancing the data and metadata modeling, the integration of complex datasets, the coupling of multi-domain simulation models, and the formulation of decision-making metrics and workflow. The feasibility of our methodology is evaluated through a comparative analysis that juxtaposes our AI-generated ontology with the well-known Pizza Ontology employed in tutorials for popular ontology software (e.g., prot\'eg\'e). We close with a real-world case study of optimizing the complex urban system of multi-modal freight transportation by generating anthologies of various domain data and simulations to support informed decision-making.

Updated: 2024-05-29 16:40:31

标题: 通过使用大型语言模型进行人工智能驱动的科学本体生成，助力下一代城市决策支持系统 - 以优化联运货运为例

摘要: 人工智能（AI）模型的结合正在不断增加到各种优化系统中。然而，解决复杂的城市和环境管理问题通常需要深入的领域科学和信息学专业知识。这种专业知识对于获取数据和模拟驱动的决策支持至关重要。在这种背景下，我们研究了利用预先训练的大型语言模型（LLMs）的潜力。通过采用ChatGPT API作为推理核心，我们概述了一个集成的工作流程，包括自然语言处理、基于methontology的提示调整和转换器。这个工作流程通过使用现有研究文章和城市数据集和模拟技术手册自动创建基于场景的本体论。我们方法的结果是用广泛采用的本体语言（如OWL、RDF、SPARQL）制作的知识图。这些本体图有助于通过增强数据和元数据建模、集成复杂数据集、耦合多领域模拟模型以及制定决策度量和工作流程来开发城市决策支持系统。我们的方法的可行性通过比较分析来评估，将我们的AI生成的本体与用于流行本体软件教程的著名Pizza本体进行对比。最后，我们通过一个真实案例研究来优化多模式货运运输的复杂城市系统，通过生成各种领域数据和模拟来支持理性决策。

更新时间: 2024-05-29 16:40:31

领域: cs.AI

下载: http://arxiv.org/abs/2405.19255v1

Kotlin ML Pack: Technical Report

In this technical report, we present three novel datasets of Kotlin code: KStack, KStack-clean, and KExercises. We also describe the results of fine-tuning CodeLlama and DeepSeek models on this data. Additionally, we present a version of the HumanEval benchmark rewritten by human experts into Kotlin - both the solutions and the tests. Our results demonstrate that small, high-quality datasets (KStack-clean and KExercises) can significantly improve model performance on code generation tasks, achieving up to a 16-point increase in pass rate on the HumanEval benchmark. Lastly, we discuss potential future work in the field of improving language modeling for Kotlin, including the use of static analysis tools in the learning process and the introduction of more intricate and realistic benchmarks.

Updated: 2024-05-29 16:33:50

标题: Kotlin 机器学习包：技术报告

摘要: 在这份技术报告中，我们介绍了三个新颖的Kotlin代码数据集：KStack、KStack-clean和KExercises。我们还描述了在这些数据上对CodeLlama和DeepSeek模型进行微调的结果。此外，我们还展示了由人类专家重新编写成Kotlin的HumanEval基准的版本 - 包括解决方案和测试。我们的结果表明，小而高质量的数据集（KStack-clean和KExercises）可以显著提高模型在代码生成任务上的性能，使在HumanEval基准测试中通过率提高达16个百分点。最后，我们讨论了在改进Kotlin语言建模领域的潜在未来工作，包括在学习过程中使用静态分析工具以及引入更复杂和现实的基准测试。

更新时间: 2024-05-29 16:33:50

领域: cs.SE,cs.AI,cs.PL

下载: http://arxiv.org/abs/2405.19250v1

More Flexible PAC-Bayesian Meta-Learning by Learning Learning Algorithms

We introduce a new framework for studying meta-learning methods using PAC-Bayesian theory. Its main advantage over previous work is that it allows for more flexibility in how the transfer of knowledge between tasks is realized. For previous approaches, this could only happen indirectly, by means of learning prior distributions over models. In contrast, the new generalization bounds that we prove express the process of meta-learning much more directly as learning the learning algorithm that should be used for future tasks. The flexibility of our framework makes it suitable to analyze a wide range of meta-learning mechanisms and even design new mechanisms. Other than our theoretical contributions we also show empirically that our framework improves the prediction quality in practical meta-learning mechanisms.

Updated: 2024-05-29 16:32:36

标题: 更灵活的PAC-Bayesian元学习：通过学习学习算法

摘要: 我们引入了一个新的框架，使用PAC-Bayesian理论研究元学习方法。与先前的工作相比，其主要优势在于它允许在任务之间的知识传递方面更灵活。对于先前的方法，这只能间接发生，通过学习模型先验分布的方式。相反，我们证明的新的泛化界限更直接地表达了元学习过程，即学习应该用于未来任务的学习算法。我们的框架的灵活性使其适合分析各种元学习机制甚至设计新机制。除了我们的理论贡献之外，我们还通过实验证明我们的框架改善了实际元学习机制中的预测质量。

更新时间: 2024-05-29 16:32:36

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2402.04054v2

Position: Bayesian Deep Learning is Needed in the Age of Large-Scale AI

In the current landscape of deep learning research, there is a predominant emphasis on achieving high predictive accuracy in supervised tasks involving large image and language datasets. However, a broader perspective reveals a multitude of overlooked metrics, tasks, and data types, such as uncertainty, active and continual learning, and scientific data, that demand attention. Bayesian deep learning (BDL) constitutes a promising avenue, offering advantages across these diverse settings. This paper posits that BDL can elevate the capabilities of deep learning. It revisits the strengths of BDL, acknowledges existing challenges, and highlights some exciting research avenues aimed at addressing these obstacles. Looking ahead, the discussion focuses on possible ways to combine large-scale foundation models with BDL to unlock their full potential.

Updated: 2024-05-29 16:31:07

标题: 位置：在大规模人工智能时代需要贝叶斯深度学习

摘要: 在当前深度学习研究领域，主要强调在涉及大型图像和语言数据集的监督任务中实现高预测准确性。然而，更广泛的视角揭示了许多被忽视的指标、任务和数据类型，如不确定性、主动学习和持续学习以及科学数据，这些都需要关注。贝叶斯深度学习（BDL）构成了一个有前途的途径，能够在这些不同的情境下提供优势。本文认为BDL可以提升深度学习的能力。它重新审视了BDL的优势，承认了现有的挑战，并强调了一些旨在应对这些障碍的令人兴奋的研究途径。展望未来，讨论着眼于可能的方式，将大规模基础模型与BDL相结合，以发挥它们的全部潜力。

更新时间: 2024-05-29 16:31:07

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2402.00809v3

Comparative Study of Neighbor-based Methods for Local Outlier Detection

The neighbor-based method has become a powerful tool to handle the outlier detection problem, which aims to infer the abnormal degree of the sample based on the compactness of the sample and its neighbors. However, the existing methods commonly focus on designing different processes to locate outliers in the dataset, while the contributions of different types neighbors to outlier detection has not been well discussed. To this end, this paper studies the neighbor in the existing outlier detection algorithms and a taxonomy is introduced, which uses the three-level components of information, neighbor and methodology to define hybrid methods. This taxonomy can serve as a paradigm where a novel neighbor-based outlier detection method can be proposed by combining different components in this taxonomy. A large number of comparative experiments were conducted on synthetic and real-world datasets in terms of performance comparison and case study, and the results show that reverse K-nearest neighbor based methods achieve promising performance and dynamic selection method is suitable for working in high-dimensional space. Notably, it is verified that rationally selecting components from this taxonomy may create an algorithms superior to existing methods.

Updated: 2024-05-29 16:28:12

标题: 邻居方法在本地异常值检测中的比较研究

摘要: 基于邻居的方法已成为处理异常检测问题的强大工具，旨在根据样本及其邻居的紧凑性推断样本的异常程度。然而，现有方法通常侧重于设计不同的过程来定位数据集中的异常值，而不同类型邻居对异常检测的贡献尚未得到充分讨论。因此，本文研究了现有异常检测算法中的邻居，并引入了一个分类法，使用信息、邻居和方法学的三级组件来定义混合方法。这个分类法可以作为一个范例，通过结合这个分类法中的不同组件来提出一种新颖的基于邻居的异常检测方法。在合成和现实世界的数据集上进行了大量的比较实验，以性能比较和案例研究为基础，结果表明基于逆K最近邻的方法取得了有希望的表现，并且动态选择方法适用于在高维空间中工作。值得注意的是，经验证合理地从这个分类法中选择组件可能会创造出比现有方法更优越的算法。

更新时间: 2024-05-29 16:28:12

领域: cs.LG

下载: http://arxiv.org/abs/2405.19247v1

Challenge-Device-Synthesis: A multi-disciplinary approach for the development of social innovation competences for students of Artificial Intelligence

The advent of Artificial Intelligence is expected to imply profound changes in the short-term. It is therefore imperative for Academia, and particularly for the Computer Science scope, to develop cross-disciplinary tools that bond AI developments to their social dimension. To this aim, we introduce the Challenge-Device-Synthesis methodology (CDS), in which a specific challenge is presented to the students of AI, who are required to develop a device as a solution for the challenge. The device becomes the object of study for the different dimensions of social transformation, and the conclusions addressed by the students during the discussion around the device are presented in a synthesis piece in the shape of a 10-page scientific paper. The latter is evaluated taking into account both the depth of analysis and the level to which it genuinely reflects the social transformations associated with the proposed AI-based device. We provide data obtained during the pilot for the implementation phase of CDS within the subject of Social Innovation, a 6-ECTS subject from the 6th semester of the Degree of Artificial Intelligence, UAB-Barcelona. We provide details on temporalisation, task distribution, methodological tools used and assessment delivery procedure, as well as qualitative analysis of the results obtained.

Updated: 2024-05-29 16:24:38

标题: 挑战-设备-合成：一种多学科方法，用于培养人工智能学生的社会创新能力

摘要: 人工智能的出现预计将在短期内带来深远的变化。因此，对于学术界，尤其是计算机科学领域来说，发展将人工智能发展与社会维度结合起来的跨学科工具是至关重要的。为此，我们引入了挑战-设备-综合方法（CDS），在这种方法中，向人工智能学生提出一个具体的挑战，要求他们开发一种设备作为挑战的解决方案。该设备成为社会转型的不同维度研究的对象，并且学生在围绕设备展开讨论时得出的结论以一篇10页的科学论文的形式呈现。对这篇论文的评估考虑了分析深度和其是否真实反映了与所提议的基于人工智能设备相关联的社会转型的水平。我们提供了在Social Innovation（社会创新）这门6学分课程中实施CDS的试点阶段中获得的数据，这门课程是UAB-Barcelona人工智能学位第六学期的课程。我们提供了有关时间安排、任务分配、使用的方法论工具和评估交付程序的详细信息，以及所获得结果的定性分析。

更新时间: 2024-05-29 16:24:38

领域: cs.AI,physics.ed-ph

下载: http://arxiv.org/abs/2405.19243v1

Explanation-based Belief Revision: Moving Beyond Minimalism to Explanatory Understanding

In belief revision, agents typically modify their beliefs when they receive some new piece of information that is in conflict with them. The guiding principle behind most belief revision frameworks is that of minimalism, which advocates minimal changes to existing beliefs. However, minimalism may not necessarily capture the nuanced ways in which human agents reevaluate and modify their beliefs. In contrast, the explanatory hypothesis indicates that people are inherently driven to seek explanations for inconsistencies, thereby striving for explanatory coherence rather than minimal changes when revising beliefs. Our contribution in this paper is two-fold. Motivated by the explanatory hypothesis, we first present a novel, yet simple belief revision operator that, given a belief base and an explanation for an explanandum, it revises the belief bases in a manner that preserves the explanandum and is not necessarily minimal. We call this operator explanation-based belief revision. Second, we conduct two human-subject studies to empirically validate our approach and investigate belief revision behavior in real-world scenarios. Our findings support the explanatory hypothesis and provide insights into the strategies people employ when resolving inconsistencies.

Updated: 2024-05-29 16:20:51

标题: 基于解释的信念修正：从极简主义到解释性理解

摘要: 在信念修正中，代理通常会在收到与其相矛盾的新信息时修改其信念。大多数信念修正框架背后的指导原则是极简主义，即提倡对现有信念进行最小化改变。然而，极简主义可能并不一定能捕捉到人类代理重新评估和修改信念的微妙方式。相比之下，解释假设表明人们本质上驱使他们寻求不一致性的解释，因此在修正信念时努力实现解释的一致性，而不是最小化改变。本文的贡献是双重的。受解释假设的启发，我们首先提出了一种新颖但简单的信念修正算子，即基于解释的信念修正算子，它在给定信念基础和一个解释的情况下，以一种保持解释对象并不一定最小化的方式修正信念基础。其次，我们进行了两项人体实验研究，以经验验证我们的方法，并调查现实场景中的信念修正行为。我们的研究结果支持解释假设，并提供了人们在解决不一致性时采用的策略的见解。

更新时间: 2024-05-29 16:20:51

领域: cs.AI

下载: http://arxiv.org/abs/2405.19238v1

ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning

While large-scale text-to-image diffusion models have demonstrated impressive image-generation capabilities, there are significant concerns about their potential misuse for generating unsafe content, violating copyright, and perpetuating societal biases. Recently, the text-to-image generation community has begun addressing these concerns by editing or unlearning undesired concepts from pre-trained models. However, these methods often involve data-intensive and inefficient fine-tuning or utilize various forms of token remapping, rendering them susceptible to adversarial jailbreaks. In this paper, we present a simple and effective training-free approach, ConceptPrune, wherein we first identify critical regions within pre-trained models responsible for generating undesirable concepts, thereby facilitating straightforward concept unlearning via weight pruning. Experiments across a range of concepts including artistic styles, nudity, object erasure, and gender debiasing demonstrate that target concepts can be efficiently erased by pruning a tiny fraction, approximately 0.12% of total weights, enabling multi-concept erasure and robustness against various white-box and black-box adversarial attacks.

Updated: 2024-05-29 16:19:37

标题: ConceptPrune：通过熟练的神经元修剪在扩散模型中进行概念编辑

摘要: 尽管大规模文本到图像扩散模型展示了令人印象深刻的图像生成能力，但人们对它们可能被用于生成不安全内容、侵犯版权和持续社会偏见表示了重大关注。最近，文本到图像生成社区开始通过编辑或取消预训练模型中不良概念来解决这些问题。然而，这些方法往往涉及数据密集型和低效的微调，或者利用各种形式的标记重映射，使它们容易受到对抗性越狱的攻击。在本文中，我们提出了一种简单有效的无需训练的方法，ConceptPrune，首先识别预训练模型中负责产生不良概念的关键区域，从而通过权重修剪实现直接概念取消。在包括艺术风格、裸露、物体擦除和性别去偏见在内的一系列概念上的实验表明，目标概念可以通过修剪总权重的约0.12%的微小部分有效擦除，从而实现多概念擦除并对各种白盒和黑盒对抗攻击具有鲁棒性。

更新时间: 2024-05-29 16:19:37

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.19237v1

Retrieval Augmented Generation for Domain-specific Question Answering

Question answering (QA) has become an important application in the advanced development of large language models. General pre-trained large language models for question-answering are not trained to properly understand the knowledge or terminology for a specific domain, such as finance, healthcare, education, and customer service for a product. To better cater to domain-specific understanding, we build an in-house question-answering system for Adobe products. We propose a novel framework to compile a large question-answer database and develop the approach for retrieval-aware finetuning of a Large Language model. We showcase that fine-tuning the retriever leads to major improvements in the final generation. Our overall approach reduces hallucinations during generation while keeping in context the latest retrieval information for contextual grounding.

Updated: 2024-05-29 16:18:02

标题: 特定领域问答的检索增强生成

摘要: 问答（QA）已成为大型语言模型先进开发中的重要应用。一般预训练的大型语言模型并未经过培训，无法正确理解特定领域（如金融、医疗、教育和产品客户服务）的知识或术语。为了更好地满足领域特定理解的需求，我们为Adobe产品构建了一个内部问答系统。我们提出了一个新颖的框架，用于编制大型问答数据库，并开发了一个用于大型语言模型的检索感知微调方法。我们展示了微调检索器会导致最终生成的重大改进。我们的整体方法在生成过程中减少了幻觉，同时保持了最新检索信息以进行上下文化基础。

更新时间: 2024-05-29 16:18:02

领域: cs.CL,cs.AI,cs.IR,cs.LG

下载: http://arxiv.org/abs/2404.14760v2

On the Error-Propagation of Inexact Hotelling's Deflation for Principal Component Analysis

Principal Component Analysis (PCA) aims to find subspaces spanned by the so-called principal components that best represent the variance in the dataset. The deflation method is a popular meta-algorithm that sequentially finds individual principal components, starting from the most important ones and working towards the less important ones. However, as deflation proceeds, numerical errors from the imprecise estimation of principal components propagate due to its sequential nature. This paper mathematically characterizes the error propagation of the inexact Hotelling's deflation method. We consider two scenarios: $i)$ when the sub-routine for finding the leading eigenvector is abstract and can represent various algorithms; and $ii)$ when power iteration is used as the sub-routine. In the latter case, the additional directional information from power iteration allows us to obtain a tighter error bound than the sub-routine agnostic case. For both scenarios, we explicitly characterize how the errors progress and affect subsequent principal component estimations.

Updated: 2024-05-29 16:17:24

标题: 不精确 Hotelling's Deflation 在主成分分析中的误差传播

摘要: 主成分分析（PCA）旨在找到由所谓的主成分张成的子空间，最能代表数据集中的方差。缩放方法是一种流行的元算法，顺序地找到单个主成分，从最重要的开始，逐渐向不太重要的主成分工作。然而，随着缩放的进行，由于主成分估计不准确而产生的数值误差会由于其顺序性质而传播。本文数学地表征了不精确Hotelling缩放方法的误差传播。我们考虑了两种情况：$i$）当用于寻找主要特征向量的子程序是抽象的并且可以代表各种算法时；和$ii$）当使用幂迭代作为子程序时。在后一种情况下，来自幂迭代的额外方向信息使我们能够获得比子程序不可知情况更紧密的误差界限。对于这两种情况，我们明确地表征了错误如何进展并影响随后的主成分估计。

更新时间: 2024-05-29 16:17:24

领域: cs.LG,math.OC,stat.ML

下载: http://arxiv.org/abs/2310.04283v2

Exploring the impact of traffic signal control and connected and automated vehicles on intersections safety: A deep reinforcement learning approach

In transportation networks, intersections pose significant risks of collisions due to conflicting movements of vehicles approaching from different directions. To address this issue, various tools can exert influence on traffic safety both directly and indirectly. This study focuses on investigating the impact of adaptive signal control and connected and automated vehicles (CAVs) on intersection safety using a deep reinforcement learning approach. The objective is to assess the individual and combined effects of CAVs and adaptive traffic signal control on traffic safety, considering rear-end and crossing conflicts. The study employs a Deep Q Network (DQN) to regulate traffic signals and driving behaviors of both CAVs and Human Drive Vehicles (HDVs), and uses Time To Collision (TTC) metric to evaluate safety. The findings demonstrate a significant reduction in rear-end and crossing conflicts through the combined implementation of CAVs and DQNs-based traffic signal control. Additionally, the long-term positive effects of CAVs on safety are similar to the short-term effects of combined CAVs and DQNs-based traffic signal control. Overall, the study emphasizes the potential benefits of integrating CAVs and adaptive traffic signal control approaches in order to enhance traffic safety. The findings of this study could provide valuable insights for city officials and transportation authorities in developing effective strategies to improve safety at signalized intersections.

Updated: 2024-05-29 16:17:19

标题: 探讨交通信号控制和连接自动化车辆对交叉口安全性的影响：一种深度强化学习方法

摘要: 在交通网络中，由于不同方向的车辆冲突移动，交叉口存在着重大的碰撞风险。为了解决这一问题，各种工具可以直接和间接地对交通安全产生影响。本研究重点研究了自适应信号控制和连接和自动驾驶车辆（CAVs）对交叉口安全的影响，采用深度强化学习方法。研究目标是评估CAVs和自适应交通信号控制对交通安全的个体和组合效应，考虑了追尾和横穿冲突。该研究采用深度Q网络（DQN）来调节交通信号和CAVs和人驾驶车辆（HDVs）的驾驶行为，并使用时间到冲突（TTC）指标来评估安全性。研究结果表明，通过CAVs和基于DQNs的交通信号控制的组合实施，追尾和横穿冲突显著减少。此外，CAVs对安全的长期积极影响与组合CAVs和基于DQNs的交通信号控制的短期影响相似。总体而言，该研究强调了整合CAVs和自适应交通信号控制方法以提高交通安全性的潜在好处。本研究的发现可以为城市官员和交通管理机构开发有效策略以改善信号交叉口的安全性提供有价值的见解。

更新时间: 2024-05-29 16:17:19

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.19236v1

Masked Autoencoders are PDE Learners

Neural solvers for partial differential equations (PDEs) have great potential to generate fast and accurate physics solutions, yet their practicality is currently limited by their generalizability. PDEs evolve over broad scales and exhibit diverse behaviors; predicting these phenomena will require learning representations across a wide variety of inputs which may encompass different coefficients, boundary conditions, resolutions, or even equations. As a step towards generalizable PDE modeling, we adapt masked pretraining for physics problems. Through self-supervised learning across PDEs, masked autoencoders can consolidate heterogeneous physics to learn meaningful latent representations and perform latent PDE arithmetic in this space. Furthermore, we demonstrate that masked pretraining can improve PDE coefficient regression and the classification of PDE features. Lastly, conditioning neural solvers on learned latent representations can improve time-stepping and super-resolution performance across a variety of coefficients, discretizations, or boundary conditions, as well as on unseen PDEs. We hope that masked pretraining can emerge as a unifying method across large, unlabeled, and heterogeneous datasets to learn latent physics at scale.

Updated: 2024-05-29 16:14:23

标题: 遮蔽自动编码器是PDE学习者

摘要: 神经求解器用于偏微分方程（PDEs）具有生成快速准确的物理解决方案的潜力，然而，它们目前的实用性受到它们的泛化能力的限制。PDEs在广泛的尺度上演变，并展现出多样的行为；预测这些现象将需要学习跨多种输入的表示，这些输入可能涵盖不同的系数、边界条件、分辨率，甚至方程。作为通向可泛化PDE建模的一步，我们将掩码预训练应用于物理问题。通过跨PDE的自监督学习，掩码自编码器可以整合异质物理学，学习有意义的潜在表示，并在该空间中执行潜在PDE算术。此外，我们证明掩码预训练可以改善PDE系数回归和PDE特征分类。最后，基于学习到的潜在表示条件的神经求解器可以提高跨各种系数、离散化或边界条件以及未见PDE的时间步进和超分辨率性能。我们希望掩码预训练可以成为一种统一的方法，适用于大型、无标记和异质数据集，以在规模上学习潜在的物理学。

更新时间: 2024-05-29 16:14:23

领域: cs.LG

下载: http://arxiv.org/abs/2403.17728v2

Forward-Backward Knowledge Distillation for Continual Clustering

Unsupervised Continual Learning (UCL) is a burgeoning field in machine learning, focusing on enabling neural networks to sequentially learn tasks without explicit label information. Catastrophic Forgetting (CF), where models forget previously learned tasks upon learning new ones, poses a significant challenge in continual learning, especially in UCL, where labeled information of data is not accessible. CF mitigation strategies, such as knowledge distillation and replay buffers, often face memory inefficiency and privacy issues. Although current research in UCL has endeavored to refine data representations and address CF in streaming data contexts, there is a noticeable lack of algorithms specifically designed for unsupervised clustering. To fill this gap, in this paper, we introduce the concept of Unsupervised Continual Clustering (UCC). We propose Forward-Backward Knowledge Distillation for unsupervised Continual Clustering (FBCC) to counteract CF within the context of UCC. FBCC employs a single continual learner (the ``teacher'') with a cluster projector, along with multiple student models, to address the CF issue. The proposed method consists of two phases: Forward Knowledge Distillation, where the teacher learns new clusters while retaining knowledge from previous tasks with guidance from specialized student models, and Backward Knowledge Distillation, where a student model mimics the teacher's behavior to retain task-specific knowledge, aiding the teacher in subsequent tasks. FBCC marks a pioneering approach to UCC, demonstrating enhanced performance and memory efficiency in clustering across various tasks, outperforming the application of clustering algorithms to the latent space of state-of-the-art UCL algorithms.

Updated: 2024-05-29 16:13:54

标题: 前向-后向知识蒸馏用于持续聚类

摘要: 无监督连续学习（UCL）是机器学习中一个新兴领域，专注于使神经网络能够顺序学习任务而无需明确的标签信息。灾难性遗忘（CF）是一个重要挑战，指模型在学习新任务时会忘记先前学习的任务，尤其是在UCL中，数据的标签信息不可访问。CF缓解策略，如知识蒸馏和重播缓冲区，往往面临内存效率和隐私问题。尽管当前UCL研究致力于优化数据表示和解决流数据环境中的CF问题，但明显缺乏专门设计用于无监督聚类的算法。为填补这一空白，本文介绍了无监督连续聚类（UCC）的概念。我们提出了用于无监督连续聚类（FBCC）的前向-后向知识蒸馏方法，以在UCC的背景下对抗CF。FBCC采用一个连续学习者（“老师”）和一个聚类投影仪，以及多个学生模型来解决CF问题。该方法包括两个阶段：前向知识蒸馏，老师在专门的学生模型的指导下学习新的聚类，同时保留先前任务的知识；后向知识蒸馏，学生模型模仿老师的行为以保留任务特定的知识，帮助老师完成后续任务。FBCC标志着UCC的一种开创性方法，展示了在各种任务中聚类方面的增强性能和内存效率，优于将聚类算法应用于最先进的UCL算法的潜在空间。

更新时间: 2024-05-29 16:13:54

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2405.19234v1

Track Anything Rapter(TAR)

Object tracking is a fundamental task in computer vision with broad practical applications across various domains, including traffic monitoring, robotics, and autonomous vehicle tracking. In this project, we aim to develop a sophisticated aerial vehicle system known as Track Anything Rapter (TAR), designed to detect, segment, and track objects of interest based on user-provided multimodal queries, such as text, images, and clicks. TAR utilizes cutting-edge pre-trained models like DINO, CLIP, and SAM to estimate the relative pose of the queried object. The tracking problem is approached as a Visual Servoing task, enabling the UAV to consistently focus on the object through advanced motion planning and control algorithms. We showcase how the integration of these foundational models with a custom high-level control algorithm results in a highly stable and precise tracking system deployed on a custom-built PX4 Autopilot-enabled Voxl2 M500 drone. To validate the tracking algorithm's performance, we compare it against Vicon-based ground truth. Additionally, we evaluate the reliability of the foundational models in aiding tracking in scenarios involving occlusions. Finally, we test and validate the model's ability to work seamlessly with multiple modalities, such as click, bounding box, and image templates.

Updated: 2024-05-29 16:09:31

标题: 跟踪任何猛禽（TAR）

摘要: 目标跟踪是计算机视觉中的一个基本任务，具有广泛的实际应用，涵盖了交通监控、机器人技术和自主车辆跟踪等多个领域。在这个项目中，我们旨在开发一种称为Track Anything Rapter（TAR）的先进空中车辆系统，旨在基于用户提供的多模态查询（如文本、图像和点击）检测、分割和跟踪感兴趣的对象。TAR利用像DINO、CLIP和SAM这样的尖端预训练模型来估计查询对象的相对姿态。跟踪问题被视为一项视觉伺服任务，使无人机通过先进的运动规划和控制算法始终专注于对象。我们展示了如何将这些基础模型与自定义高级控制算法集成，从而形成一个高度稳定和精确的跟踪系统，部署在自定义构建的PX4自动驾驶仪启用的Voxl2 M500无人机上。为了验证跟踪算法的性能，我们将其与基于Vicon的地面真相进行比较。此外，我们评估了基础模型在帮助涉及遮挡的情况下跟踪的可靠性。最后，我们测试和验证了模型与多种模态（如点击、边界框和图像模板）无缝协作的能力。

更新时间: 2024-05-29 16:09:31

领域: cs.CV,cs.AI,cs.RO

下载: http://arxiv.org/abs/2405.11655v2

Valid Conformal Prediction for Dynamic GNNs

Graph neural networks (GNNs) are powerful black-box models which have shown impressive empirical performance. However, without any form of uncertainty quantification, it can be difficult to trust such models in high-risk scenarios. Conformal prediction aims to address this problem, however, an assumption of exchangeability is required for its validity which has limited its applicability to static graphs and transductive regimes. We propose to use unfolding, which allows any existing static GNN to output a dynamic graph embedding with exchangeability properties. Using this, we extend the validity of conformal prediction to dynamic GNNs in both transductive and semi-inductive regimes. We provide a theoretical guarantee of valid conformal prediction in these cases and demonstrate the empirical validity, as well as the performance gains, of unfolded GNNs against standard GNN architectures on both simulated and real datasets.

Updated: 2024-05-29 16:07:39

标题: 动态GNN的有效符合性预测

摘要: 图神经网络（GNNs）是强大的黑盒模型，已经展现出令人印象深刻的经验性能。然而，在没有任何形式的不确定性量化的情况下，很难信任这种模型在高风险场景中的表现。符合预测旨在解决这个问题，然而，其有效性需要假设交换性，这限制了其适用性于静态图和转导性制度。我们提出使用展开，允许任何现有的静态GNN输出具有交换性属性的动态图嵌入。利用这一点，我们将符合预测的有效性扩展到动态GNNs的转导性和半归纳性制度。我们在这些情况下提供了有效符合预测的理论保证，并展示了展开的GNN相对于标准GNN架构在模拟和真实数据集上的经验有效性以及性能收益。

更新时间: 2024-05-29 16:07:39

领域: stat.ML,cs.LG,62H30

下载: http://arxiv.org/abs/2405.19230v1

On Generating Monolithic and Model Reconciling Explanations in Probabilistic Scenarios

Explanation generation frameworks aim to make AI systems' decisions transparent and understandable to human users. However, generating explanations in uncertain environments characterized by incomplete information and probabilistic models remains a significant challenge. In this paper, we propose a novel framework for generating probabilistic monolithic explanations and model reconciling explanations. Monolithic explanations provide self-contained reasons for an explanandum without considering the agent receiving the explanation, while model reconciling explanations account for the knowledge of the agent receiving the explanation. For monolithic explanations, our approach integrates uncertainty by utilizing probabilistic logic to increase the probability of the explanandum. For model reconciling explanations, we propose a framework that extends the logic-based variant of the model reconciliation problem to account for probabilistic human models, where the goal is to find explanations that increase the probability of the explanandum while minimizing conflicts between the explanation and the probabilistic human model. We introduce explanatory gain and explanatory power as quantitative metrics to assess the quality of these explanations. Further, we present algorithms that exploit the duality between minimal correction sets and minimal unsatisfiable sets to efficiently compute both types of explanations in probabilistic contexts. Extensive experimental evaluations on various benchmarks demonstrate the effectiveness and scalability of our approach in generating explanations under uncertainty.

Updated: 2024-05-29 16:07:31

标题: 在概率场景中生成单体和模型调和解释

摘要: Explanation generation frameworks旨在使人工智能系统的决策对人类用户透明和可理解。然而，在由不完整信息和概率模型特征的不确定环境中生成解释仍然是一个重要挑战。在本文中，我们提出了一个新颖的框架，用于生成概率单体解释和模型协调解释。单体解释为解释目标提供独立的理由，而不考虑接收解释的代理人，而模型协调解释则考虑接收解释的代理人的知识。对于单体解释，我们的方法通过利用概率逻辑来增加解释目标的概率来整合不确定性。对于模型协调解释，我们提出了一个框架，将基于逻辑的模型协调问题扩展到考虑概率人类模型，目标是找到既增加解释目标概率又最小化解释与概率人类模型之间冲突的解释。我们引入解释增益和解释能力作为量化指标来评估这些解释的质量。此外，我们提出了利用最小修正集和最小不可满足集之间的对偶性来高效计算概率环境下的两种类型解释的算法。对各种基准测试的广泛实验评估表明，我们的方法在不确定情况下生成解释的有效性和可扩展性。

更新时间: 2024-05-29 16:07:31

领域: cs.AI

下载: http://arxiv.org/abs/2405.19229v1

Synthetic Potential Outcomes for Mixtures of Treatment Effects

Modern data analysis frequently relies on the use of large datasets, often constructed as amalgamations of diverse populations or data-sources. Heterogeneity across these smaller datasets constitutes two major challenges for causal inference: (1) the source of each sample can introduce latent confounding between treatment and effect, and (2) diverse populations may respond differently to the same treatment, giving rise to heterogeneous treatment effects (HTEs). The issues of latent confounding and HTEs have been studied separately but not in conjunction. In particular, previous works only report the conditional average treatment effect (CATE) among similar individuals (with respect to the measured covariates). CATEs cannot resolve mixtures of potential treatment effects driven by latent heterogeneity, which we call mixtures of treatment effects (MTEs). Inspired by method of moment approaches to mixture models, we propose "synthetic potential outcomes" (SPOs). Our new approach deconfounds heterogeneity while also guaranteeing the identifiability of MTEs. This technique bypasses full recovery of a mixture, which significantly simplifies its requirements for identifiability. We demonstrate the efficacy of SPOs on synthetic data.

Updated: 2024-05-29 16:05:57

标题: 混合处理效应的合成潜力结果

摘要: 现代数据分析经常依赖于使用大型数据集，通常构建为多样化人群或数据源的合并。这些较小数据集之间的异质性构成了因果推断的两个主要挑战：（1）每个样本的来源可能引入治疗和效应之间的潜在混淆，（2）不同的人群可能对相同的治疗产生不同的反应，从而产生异质性治疗效果（HTEs）。潜在混淆和HTEs的问题已经分别进行了研究，但尚未结合在一起。特别是，先前的研究仅报告了类似个体（在测量协变量方面）之间的条件平均治疗效应（CATE）。CATEs不能解决由潜在异质性驱动的潜在治疗效果的混合，我们称之为治疗效果的混合（MTEs）。受混合模型的矩法方法启发，我们提出了“合成潜在结果”（SPOs）。我们的新方法既去混淆异质性，又保证了MTEs的可识别性。该技术绕过了对混合的完全恢复，从而显着简化了其可识别性的要求。我们展示了SPOs在合成数据上的有效性。

更新时间: 2024-05-29 16:05:57

领域: cs.LG,econ.EM,stat.ME

下载: http://arxiv.org/abs/2405.19225v1

Semantic In-Domain Product Identification for Search Queries

Accurate explicit and implicit product identification in search queries is critical for enhancing user experiences, especially at a company like Adobe which has over 50 products and covers queries across hundreds of tools. In this work, we present a novel approach to training a product classifier from user behavioral data. Our semantic model led to >25% relative improvement in CTR (click through rate) across the deployed surfaces; a >50% decrease in null rate; a 2x increase in the app cards surfaced, which helps drive product visibility.

Updated: 2024-05-29 16:01:27

标题: 搜索查询的语义领域产品识别

摘要: 在搜索查询中准确识别产品的显式和隐式信息对于提升用户体验至关重要，尤其是在像Adobe这样拥有50多种产品并涵盖数百种工具查询的公司。在这项工作中，我们提出了一种新颖的方法，通过用户行为数据训练产品分类器。我们的语义模型导致部署表面上的点击率（CTR）相对提高了超过25％；空结果率降低了50％以上；应用卡片的展示量增加了2倍，有助于提高产品的可见性。

更新时间: 2024-05-29 16:01:27

领域: cs.IR,cs.AI,cs.CL,cs.LG

下载: http://arxiv.org/abs/2404.09091v2

Domain adaptation in small-scale and heterogeneous biological datasets

Machine learning techniques are steadily becoming more important in modern biology, and are used to build predictive models, discover patterns, and investigate biological problems. However, models trained on one dataset are often not generalizable to other datasets from different cohorts or laboratories, due to differences in the statistical properties of these datasets. These could stem from technical differences, such as the measurement technique used, or from relevant biological differences between the populations studied. Domain adaptation, a type of transfer learning, can alleviate this problem by aligning the statistical distributions of features and samples among different datasets so that similar models can be applied across them. However, a majority of state-of-the-art domain adaptation methods are designed to work with large-scale data, mostly text and images, while biological datasets often suffer from small sample sizes, and possess complexities such as heterogeneity of the feature space. This Review aims to synthetically discuss domain adaptation methods in the context of small-scale and highly heterogeneous biological data. We describe the benefits and challenges of domain adaptation in biological research and critically discuss some of its objectives, strengths, and weaknesses through key representative methodologies. We argue for the incorporation of domain adaptation techniques to the computational biologist's toolkit, with further development of customized approaches.

Updated: 2024-05-29 16:01:15

标题: 小规模和异质生物数据集中的域自适应

摘要: 机器学习技术在现代生物学中变得越来越重要，被用来构建预测模型、发现模式和探究生物问题。然而，仅仅基于一个数据集训练的模型通常无法泛化到来自不同队列或实验室的其他数据集，这是因为这些数据集的统计特性存在差异。这些差异可能源自技术上的不同，比如使用的测量技术，或者研究人群之间的相关生物差异。领域自适应是一种迁移学习的类型，可以通过调整不同数据集之间的特征和样本的统计分布，使相似的模型可以跨数据集应用。然而，大多数最新领域自适应方法设计用于处理大规模数据，主要是文本和图像，而生物数据集往往受限于小样本量，并具有特征空间的多样性。本综述旨在讨论领域自适应方法在小规模和高度异质生物数据的背景下的应用。我们描述了领域自适应在生物研究中的好处和挑战，并通过关键代表性方法批判性地讨论了一些其目标、优势和弱点。我们主张将领域自适应技术纳入计算生物学家的工具包，并进一步发展定制化方法。

更新时间: 2024-05-29 16:01:15

领域: q-bio.QM,cs.LG

下载: http://arxiv.org/abs/2405.19221v1

WRDScore: New Metric for Evaluation of Natural Language Generation Models

The problem of natural language generation, and, more specifically, method name prediction, faces significant difficulties when proposed models need to be evaluated on test data. Such a metric would need to consider the versatility with which a single method can be named, with respect to both semantics and syntax. Measuring the direct overlap between the predicted and reference (true) sequences will not be able to capture these subtleties. Other existing embedding based metrics either do not measure precision and recall or impose strict unrealistic assumptions on both sequences. To address these issues, we propose a new metric that, on the one hand, is very simple and lightweight, and, on the other hand, is able to calculate precision and recall without resorting to any assumptions while obtaining good performance with respect to the human judgement.

Updated: 2024-05-29 16:00:46

标题: WRDScore：自然语言生成模型评估的新指标

摘要: 自然语言生成问题，更具体地说是方法名称预测，在提出的模型需要在测试数据上进行评估时面临重大困难。这样的度量标准需要考虑一个单一方法在语义和语法方面命名的多功能性。直接测量预测序列和参考（真实）序列之间的重叠将无法捕捉这些细微差别。其他现有基于嵌入的度量标准要么不衡量精确度和召回率，要么对两个序列都施加严格的不切实际的假设。为了解决这些问题，我们提出了一种新的度量标准，一方面非常简单且轻量级，另一方面能够在不依赖任何假设的情况下计算精确度和召回率，同时在人类判断方面表现良好。

更新时间: 2024-05-29 16:00:46

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.19220v1

LoByITFL: Low Communication Secure and Private Federated Learning

Federated Learning (FL) faces several challenges, such as the privacy of the clients data and security against Byzantine clients. Existing works treating privacy and security jointly make sacrifices on the privacy guarantee. In this work, we introduce LoByITFL, the first communication-efficient Information-Theoretic (IT) private and secure FL scheme that makes no sacrifices on the privacy guarantees while ensuring security against Byzantine adversaries. The key ingredients are a small and representative dataset available to the federator, a careful transformation of the FLTrust algorithm and the use of a trusted third party only in a one-time preprocessing phase before the start of the learning algorithm. We provide theoretical guarantees on privacy and Byzantine-resilience, and provide convergence guarantee and experimental results validating our theoretical findings.

Updated: 2024-05-29 16:00:19

标题: LoByITFL：低通信安全和私密的联邦学习

摘要: Federated Learning（FL）面临着一些挑战，例如客户数据的隐私和对拜占庭客户的安全性。现有的研究在处理隐私和安全性时通常会牺牲隐私保障。在这项工作中，我们介绍了LoByITFL，这是第一个通信高效的信息论私密和安全的FL方案，它在保证隐私保障的同时确保了对抗拜占庭对手的安全性。关键要素是一个小而具代表性的数据集可供联邦者使用，对FLTrust算法进行仔细转换，并在学习算法启动之前仅在一次预处理阶段使用信任的第三方。我们提供了隐私和拜占庭抗性的理论保证，并提供了收敛保证和实验证实了我们的理论发现。

更新时间: 2024-05-29 16:00:19

领域: cs.IT,cs.CR,cs.DC,cs.LG,math.IT

下载: http://arxiv.org/abs/2405.19217v1

Towards Global Glacier Mapping with Deep Learning and Open Earth Observation Data

Accurate global glacier mapping is critical for understanding climate change impacts. Despite its importance, automated glacier mapping at a global scale remains largely unexplored. Here we address this gap and propose Glacier-VisionTransformer-U-Net (GlaViTU), a convolutional-transformer deep learning model, and five strategies for multitemporal global-scale glacier mapping using open satellite imagery. Assessing the spatial, temporal and cross-sensor generalisation shows that our best strategy achieves intersection over union >0.85 on previously unobserved images in most cases, which drops to >0.75 for debris-rich areas such as High-Mountain Asia and increases to >0.90 for regions dominated by clean ice. A comparative validation against human expert uncertainties in terms of area and distance deviations underscores GlaViTU performance, approaching or matching expert-level delineation. Adding synthetic aperture radar data, namely, backscatter and interferometric coherence, increases the accuracy in all regions where available. The calibrated confidence for glacier extents is reported making the predictions more reliable and interpretable. We also release a benchmark dataset that covers 9% of glaciers worldwide. Our results support efforts towards automated multitemporal and global glacier mapping.

Updated: 2024-05-29 15:58:03

标题: 朝向利用深度学习和开放地球观测数据进行全球冰川制图

摘要: 准确的全球冰川映射对于理解气候变化影响至关重要。尽管它的重要性，全球范围内的自动化冰川映射仍然大部分未被探索。在这里，我们填补了这一空白，并提出了Glacier-VisionTransformer-U-Net（GlaViTU），一个卷积-变压器深度学习模型，以及五种用于使用开放卫星图像进行多时相全球范围冰川映射的策略。评估空间、时间和跨传感器泛化显示，我们的最佳策略在大多数情况下实现了交并比>0.85，对先前未观察到的图像，对于富含碎屑的区域，如高山亚洲，降至>0.75，并增至>0.90，对于主要由清冰占主导的区域。通过与人类专家的面积和距离偏差的比较验证强调了GlaViTU的性能，接近或匹配专家级别的划定。添加合成孔径雷达数据，即回波和干涉相干，提高了所有可用区域的准确性。报道了冰川范围的校准置信度，使预测更可靠和可解释。我们还发布了一个覆盖全球9%冰川的基准数据集。我们的结果支持自动多时相和全球冰川映射的努力。

更新时间: 2024-05-29 15:58:03

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2401.15113v2

HawkVision: Low-Latency Modeless Edge AI Serving

The trend of modeless ML inference is increasingly growing in popularity as it hides the complexity of model inference from users and caters to diverse user and application accuracy requirements. Previous work mostly focuses on modeless inference in data centers. To provide low-latency inference, in this paper, we promote modeless inference at the edge. The edge environment introduces additional challenges related to low power consumption, limited device memory, and volatile network environments. To address these challenges, we propose HawkVision, which provides low-latency modeless serving of vision DNNs. HawkVision leverages a two-layer edge-DC architecture that employs confidence scaling to reduce the number of model options while meeting diverse accuracy requirements. It also supports lossy inference under volatile network environments. Our experimental results show that HawkVision outperforms current serving systems by up to 1.6X in P99 latency for providing modeless service. Our FPGA prototype demonstrates similar performance at certain accuracy levels with up to a 3.34X reduction in power consumption.

Updated: 2024-05-29 15:56:33

标题: 鹰眼视觉：低延迟无模式边缘人工智能服务

摘要: 随着无模式ML推理的趋势日益普及，它将模型推理的复杂性隐藏在用户之外，并满足不同用户和应用程序的精度要求。先前的工作主要集中在数据中心中的无模式推理。为了提供低延迟的推理，在本文中，我们推动边缘的无模式推理。边缘环境引入了与低功耗、设备内存有限和网络环境不稳定相关的额外挑战。为了解决这些挑战，我们提出了HawkVision，它提供视觉DNN的低延迟无模式服务。HawkVision利用了一个采用置信度缩放的两层边缘-数据中心架构，以减少模型选项的数量同时满足不同的精度要求。它还支持在网络环境不稳定的情况下进行有损推理。我们的实验结果显示，HawkVision在提供无模式服务方面比当前的服务系统性能提高了高达1.6倍的P99延迟。我们的FPGA原型在某些精度水平上表现出类似的性能，功耗降低高达3.34倍。

更新时间: 2024-05-29 15:56:33

领域: eess.SY,cs.AI,cs.LG,cs.NI,cs.SY

下载: http://arxiv.org/abs/2405.19213v1

Intelligent Anomaly Detection for Lane Rendering Using Transformer with Self-Supervised Pre-Training and Customized Fine-Tuning

The burgeoning navigation services using digital maps provide great convenience to drivers. Nevertheless, the presence of anomalies in lane rendering map images occasionally introduces potential hazards, as such anomalies can be misleading to human drivers and consequently contribute to unsafe driving conditions. In response to this concern and to accurately and effectively detect the anomalies, this paper transforms lane rendering image anomaly detection into a classification problem and proposes a four-phase pipeline consisting of data pre-processing, self-supervised pre-training with the masked image modeling (MiM) method, customized fine-tuning using cross-entropy based loss with label smoothing, and post-processing to tackle it leveraging state-of-the-art deep learning techniques, especially those involving Transformer models. Various experiments verify the effectiveness of the proposed pipeline. Results indicate that the proposed pipeline exhibits superior performance in lane rendering image anomaly detection, and notably, the self-supervised pre-training with MiM can greatly enhance the detection accuracy while significantly reducing the total training time. For instance, employing the Swin Transformer with Uniform Masking as self-supervised pretraining (Swin-Trans-UM) yielded a heightened accuracy at 94.77% and an improved Area Under The Curve (AUC) score of 0.9743 compared with the pure Swin Transformer without pre-training (Swin-Trans) with an accuracy of 94.01% and an AUC of 0.9498. The fine-tuning epochs were dramatically reduced to 41 from the original 280. In conclusion, the proposed pipeline, with its incorporation of self-supervised pre-training using MiM and other advanced deep learning techniques, emerges as a robust solution for enhancing the accuracy and efficiency of lane rendering image anomaly detection in digital navigation systems.

Updated: 2024-05-29 15:54:04

标题: 使用具有自监督预训练和定制微调的Transformer进行车道渲染智能异常检测

摘要: 数字地图的导航服务不断增长，为司机提供了极大的便利。然而，在车道渲染地图图像中存在异常时，偶尔会引入潜在的危险，因为这些异常可能会误导人类驾驶员，从而导致不安全的驾驶条件。为了应对这一问题并准确有效地检测异常，本文将车道渲染图像异常检测转化为分类问题，并提出了一个由数据预处理、使用基于掩膜图像建模（MiM）方法的自监督预训练、使用基于交叉熵损失和标签平滑的自定义微调以及后处理组成的四阶段流水线来解决这一问题，利用最先进的深度学习技术，特别是涉及Transformer模型的技术。各种实验验证了所提出的流水线的有效性。结果表明，所提出的流水线在车道渲染图像异常检测方面表现出优越的性能，特别是自监督预训练与MiM结合可以大大提高检测准确性，同时显著减少总训练时间。例如，使用统一遮罩的Swin Transformer作为自监督预训练（Swin-Trans-UM）产生了94.77%的提高准确率和0.9743的改进的AUC分数，相比之下，未进行预训练的纯Swin Transformer（Swin-Trans）的准确率为94.01%，AUC为0.9498。微调的轮数从原来的280大幅减少到41。总之，所提出的流水线，通过整合使用MiM的自监督预训练和其他先进的深度学习技术，成为数字导航系统中增强车道渲染图像异常检测准确性和效率的强大解决方案。

更新时间: 2024-05-29 15:54:04

领域: cs.CV,cs.AI,cs.LG,eess.IV,stat.ML

下载: http://arxiv.org/abs/2312.04398v2

Partial Information Decomposition for Data Interpretability and Feature Selection

In this paper, we introduce Partial Information Decomposition of Features (PIDF), a new paradigm for simultaneous data interpretability and feature selection. Contrary to traditional methods that assign a single importance value, our approach is based on three metrics per feature: the mutual information shared with the target variable, the feature's contribution to synergistic information, and the amount of this information that is redundant. In particular, we develop a novel procedure based on these three metrics, which reveals not only how features are correlated with the target but also the additional and overlapping information provided by considering them in combination with other features. We extensively evaluate PIDF using both synthetic and real-world data, demonstrating its potential applications and effectiveness, by considering case studies from genetics and neuroscience.

Updated: 2024-05-29 15:54:03

标题: 部分信息分解用于数据解释性和特征选择

摘要: 在本文中，我们介绍了特征的部分信息分解（PIDF），这是一种同时具有数据可解释性和特征选择的新范式。与传统方法将单一重要性值分配给特征不同，我们的方法基于每个特征的三个指标：与目标变量共享的互信息，特征对协同信息的贡献，以及此信息中的冗余量。特别地，我们基于这三个指标开发了一种新颖的程序，不仅揭示了特征与目标的相关性，还考虑到将其与其他特征结合时提供的额外和重叠信息。我们通过使用合成和真实世界数据广泛评估了PIDF，通过考虑遗传学和神经科学的案例研究，展示了其潜在应用和有效性。

更新时间: 2024-05-29 15:54:03

领域: cs.LG,cs.AI,cs.IT,math.IT

下载: http://arxiv.org/abs/2405.19212v1

Gone but Not Forgotten: Improved Benchmarks for Machine Unlearning

Machine learning models are vulnerable to adversarial attacks, including attacks that leak information about the model's training data. There has recently been an increase in interest about how to best address privacy concerns, especially in the presence of data-removal requests. Machine unlearning algorithms aim to efficiently update trained models to comply with data deletion requests while maintaining performance and without having to resort to retraining the model from scratch, a costly endeavor. Several algorithms in the machine unlearning literature demonstrate some level of privacy gains, but they are often evaluated only on rudimentary membership inference attacks, which do not represent realistic threats. In this paper we describe and propose alternative evaluation methods for three key shortcomings in the current evaluation of unlearning algorithms. We show the utility of our alternative evaluations via a series of experiments of state-of-the-art unlearning algorithms on different computer vision datasets, presenting a more detailed picture of the state of the field.

Updated: 2024-05-29 15:53:23

标题: 逝者虽去却不被遗忘：机器学习撤销的改进基准

摘要: 机器学习模型容易受到对抗性攻击的影响，包括那些泄露模型训练数据信息的攻击。最近对如何最好地解决隐私问题，尤其是在数据删除请求存在的情况下，引起了越来越多的关注。机器取消学习算法旨在有效地更新经过训练的模型，以便遵守数据删除请求，同时保持性能，而无需从头开始重新训练模型，这是一项昂贵的工作。机器取消学习文献中的几种算法展示了一定程度的隐私收益，但它们通常只在基本成员推断攻击上进行评估，这并不能代表现实威胁。在本文中，我们描述并提出了针对当前取消学习算法评估中三个关键缺陷的替代评估方法。通过一系列针对不同计算机视觉数据集的最新取消学习算法实验，我们展示了我们替代评估的实用性，从而呈现出该领域现状的更详细画面。

更新时间: 2024-05-29 15:53:23

领域: cs.LG

下载: http://arxiv.org/abs/2405.19211v1

REBEL: Reinforcement Learning via Regressing Relative Rewards

While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the work-horse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generative models. Unfortunately, PPO requires multiple heuristics to enable stable convergence (e.g. value networks, clipping), and is notorious for its sensitivity to the precise implementation of these components. In response, we take a step back and ask what a minimalist RL algorithm for the era of generative models would look like. We propose REBEL, an algorithm that cleanly reduces the problem of policy optimization to regressing the relative reward between two completions to a prompt in terms of the policy, enabling strikingly lightweight implementation. In theory, we prove that fundamental RL algorithms like Natural Policy Gradient can be seen as variants of REBEL, which allows us to match the strongest known theoretical guarantees in terms of convergence and sample complexity in the RL literature. REBEL can also cleanly incorporate offline data and be extended to handle the intransitive preferences we frequently see in practice. Empirically, we find that REBEL provides a unified approach to language modeling and image generation with stronger or similar performance as PPO and DPO, all while being simpler to implement and more computationally efficient than PPO. When fine-tuning Llama-3-8B-Instruct, REBEL achieves strong performance in AlpacaEval 2.0, MT-Bench, and Open LLM Leaderboard.

Updated: 2024-05-29 15:52:20

标题: REBEL：通过回归相对奖励进行强化学习

摘要: 尽管最初是为连续控制问题开发的，但近端策略优化（PPO）已成为各种强化学习（RL）应用的主要工具，包括生成模型的微调。不幸的是，PPO需要多个启发式方法才能实现稳定的收敛（例如价值网络、剪切），并且以其对这些组件的精确实现敏感而臭名昭著。作为回应，我们退后一步，询问在生成模型时代，一个极简的RL算法会是什么样子。我们提出了REBEL，一种算法，将策略优化问题清晰地归约为在政策方面回归两个完成对提示的相对奖励，实现了明显轻量级的实现。理论上，我们证明了像自然政策梯度这样的基本RL算法可以看作是REBEL的变种，这使我们能够在RL文献中具有最强大的已知理论保证，收敛性和样本复杂性方面相匹配。REBEL还可以清晰地整合离线数据，并扩展到处理我们经常在实践中看到的非传递偏好。在实证方面，我们发现REBEL提供了对语言建模和图像生成的统一方法，性能比PPO和DPO更强或相似，同时比PPO更简单实现和更具计算效率。在对Llama-3-8B-Instruct进行微调时，REBEL在AlpacaEval 2.0、MT-Bench和Open LLM Leaderboard中取得了强劲表现。

更新时间: 2024-05-29 15:52:20

领域: cs.LG,cs.CL,cs.CV

下载: http://arxiv.org/abs/2404.16767v2

Gradient Guided Hypotheses: A unified solution to enable machine learning models on scarce and noisy data regimes

Ensuring high-quality data is paramount for maximizing the performance of machine learning models and business intelligence systems. However, challenges in data quality, including noise in data capture, missing records, limited data production, and confounding variables, significantly constrain the potential performance of these systems. In this study, we propose an architecture-agnostic algorithm, Gradient Guided Hypotheses (GGH), designed to address these challenges. GGH analyses gradients from hypotheses as a proxy of distinct and possibly contradictory patterns in the data. This framework entails an additional step in machine learning training, where gradients can be included or excluded from backpropagation. In this manner, missing and noisy data are addressed through a unified solution that perceives both challenges as facets of the same overarching issue: the propagation of erroneous information. Experimental validation of GGH is conducted using real-world open-source datasets, where records with missing rates of up to 98.5% are simulated. Comparative analysis with state-of-the-art imputation methods demonstrates a substantial improvement in model performance achieved by GGH. Specifically in very high scarcity regimes, GGH was found to be the only viable solution. Additionally, GGH's noise detection capabilities are showcased by introducing simulated noise into the datasets and observing enhanced model performance after filtering out the noisy data. This study presents GGH as a promising solution for improving data quality and model performance in various applications.

Updated: 2024-05-29 15:51:40

标题: 渐变引导假设：一种统一解决方案，实现机器学习模型在稀缺和嘈杂数据环境中的应用

摘要: 确保高质量数据对于最大化机器学习模型和商业智能系统的性能至关重要。然而，数据质量方面的挑战，包括数据捕获中的噪音、缺失记录、有限的数据生成和混杂变量，显著限制了这些系统的潜在性能。在本研究中，我们提出了一种与架构无关的算法，称为Gradient Guided Hypotheses (GGH)，旨在解决这些挑战。GGH分析假设的梯度作为数据中不同且可能相互矛盾模式的代理。这个框架包含了机器学习训练中的一个额外步骤，其中梯度可以被包含或排除在反向传播中。通过这种方式，缺失和嘈杂的数据通过一个统一的解决方案得到解决，将这两个挑战都视为同一个根本问题的方面：错误信息的传播。GGH的实验验证是使用真实世界的开源数据集进行的，其中模拟了缺失率高达98.5%的记录。与最先进的插补方法进行比较分析，结果表明GGH实现的模型性能显著提升。特别是在非常高度稀缺的情况下，GGH被发现是唯一可行的解决方案。此外，通过向数据集引入模拟噪音并观察在滤除嘈杂数据后改进的模型性能，展示了GGH的噪声检测能力。这项研究将GGH作为一种有望改善各种应用中数据质量和模型性能的解决方案。

更新时间: 2024-05-29 15:51:40

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.19210v1

VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos

Video-language understanding tasks have focused on short video clips, often struggling with long-form video understanding tasks. Recently, many long video-language understanding approaches have leveraged the reasoning capabilities of Large Language Models (LLMs) to perform long video QA, transforming videos into densely sampled frame captions, and asking LLMs to respond to text queries over captions. However, the frames used for captioning are often redundant and contain irrelevant information, making dense sampling inefficient, and ignoring the fact that video QA requires varying levels of granularity, with some video segments being highly relevant to the question (needing more fine-grained detail) while others being less relevant. Thus, these LLM-based approaches are prone to missing information and operate on large numbers of irrelevant captions, lowering both performance and efficiency. To address these issues, we introduce VideoTree, a query-adaptive and hierarchical framework for long-video understanding with LLMs. VideoTree dynamically extracts query-related information from a video and builds a tree-based representation for LLM reasoning. First, VideoTree adaptively selects frames for captioning by iteratively clustering frames based on their visual features and scoring clusters using their relevance to the query. Second, it organizes visual clusters into a query-adaptive and hierarchical tree structure; the tree encodes varying levels of granularity, with higher resolution on relevant segments. Finally, VideoTree produces an answer by traversing the tree's keyframes and passing their captions to an LLM answerer. Our method improves both reasoning accuracy and efficiency compared to existing methods: VideoTree achieves a 7.0%, 2.2%, and 2.7% accuracy gain over baselines on the EgoSchema, NExT-QA, and IntentQA benchmarks, respectively, while reducing inference time by 40%.

Updated: 2024-05-29 15:49:09

标题: VideoTree：基于树的自适应视频表示法用于长视频上的LLM推理

摘要: 视频语言理解任务通常集中于短视频片段，往往难以处理长视频理解任务。最近，许多长视频语言理解方法利用大型语言模型（LLMs）的推理能力执行长视频问答（QA），将视频转换为密集采样的帧标题，并要求LLMs对标题上的文本查询做出回应。然而，用于标题的帧通常是冗余的并包含无关信息，使得密集采样效率低下，并忽略了视频QA需要不同层次的粒度，有些视频段对问题非常相关（需要更精细的细节），而其他则不太相关。因此，这些基于LLMs的方法容易丢失信息，并在大量无关标题上操作，降低了性能和效率。为了解决这些问题，我们引入了VideoTree，这是一个用于长视频理解的查询自适应和分层框架，结合LLMs使用。VideoTree动态提取视频中与查询相关的信息，并构建基于树状结构的LLM推理。首先，VideoTree通过迭代地根据其视觉特征对帧进行聚类，并根据它们与查询的相关性对聚类进行评分，以自适应地选择帧进行标题。其次，它将视觉聚类组织成一个查询自适应和分层的树结构；树编码不同层次的粒度，对相关段具有更高分辨率。最后，VideoTree通过遍历树的关键帧并将它们的标题传递给LLM回答者来生成答案。与现有方法相比，我们的方法提高了推理准确性和效率：VideoTree在EgoSchema、NExT-QA和IntentQA基准测试中分别比基线实现了7.0%、2.2%和2.7%的准确度增益，同时将推理时间减少了40%。

更新时间: 2024-05-29 15:49:09

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2405.19209v1

A Multi-Source Retrieval Question Answering Framework Based on RAG

With the rapid development of large-scale language models, Retrieval-Augmented Generation (RAG) has been widely adopted. However, existing RAG paradigms are inevitably influenced by erroneous retrieval information, thereby reducing the reliability and correctness of generated results. Therefore, to improve the relevance of retrieval information, this study proposes a method that replaces traditional retrievers with GPT-3.5, leveraging its vast corpus knowledge to generate retrieval information. We also propose a web retrieval based method to implement fine-grained knowledge retrieval, Utilizing the powerful reasoning capability of GPT-3.5 to realize semantic partitioning of problem.In order to mitigate the illusion of GPT retrieval and reduce noise in Web retrieval,we proposes a multi-source retrieval framework, named MSRAG, which combines GPT retrieval with web retrieval. Experiments on multiple knowledge-intensive QA datasets demonstrate that the proposed framework in this study performs better than existing RAG framework in enhancing the overall efficiency and accuracy of QA systems.

Updated: 2024-05-29 15:47:57

标题: 基于RAG的多源检索问答框架

摘要: 随着大规模语言模型的快速发展，检索增强生成（RAG）已被广泛采用。然而，现有的RAG范式不可避免地受到错误检索信息的影响，从而降低了生成结果的可靠性和正确性。因此，为了改善检索信息的相关性，本研究提出了一种方法，用GPT-3.5替换传统的检索器，利用其丰富的语料库知识生成检索信息。我们还提出了一种基于Web检索的方法来实现细粒度知识检索，利用GPT-3.5强大的推理能力实现问题的语义分区。为了减轻GPT检索的错觉并减少Web检索中的噪音，我们提出了一个多源检索框架，名为MSRAG，将GPT检索与Web检索相结合。对多个知识密集型QA数据集的实验证明，本研究提出的框架在提高QA系统的整体效率和准确性方面优于现有的RAG框架。

更新时间: 2024-05-29 15:47:57

领域: cs.IR,cs.AI

下载: http://arxiv.org/abs/2405.19207v1

Matrix Manifold Neural Networks++

Deep neural networks (DNNs) on Riemannian manifolds have garnered increasing interest in various applied areas. For instance, DNNs on spherical and hyperbolic manifolds have been designed to solve a wide range of computer vision and nature language processing tasks. One of the key factors that contribute to the success of these networks is that spherical and hyperbolic manifolds have the rich algebraic structures of gyrogroups and gyrovector spaces. This enables principled and effective generalizations of the most successful DNNs to these manifolds. Recently, some works have shown that many concepts in the theory of gyrogroups and gyrovector spaces can also be generalized to matrix manifolds such as Symmetric Positive Definite (SPD) and Grassmann manifolds. As a result, some building blocks for SPD and Grassmann neural networks, e.g., isometric models and multinomial logistic regression (MLR) can be derived in a way that is fully analogous to their spherical and hyperbolic counterparts. Building upon these works, we design fully-connected (FC) and convolutional layers for SPD neural networks. We also develop MLR on Symmetric Positive Semi-definite (SPSD) manifolds, and propose a method for performing backpropagation with the Grassmann logarithmic map in the projector perspective. We demonstrate the effectiveness of the proposed approach in the human action recognition and node classification tasks.

Updated: 2024-05-29 15:47:35

标题: 矩阵流形神经网络++

摘要: Riemann流形上的深度神经网络(DNNs)在各个应用领域引起了越来越多的关注。例如，球形和双曲流形上的DNNs已经被设计用来解决各种计算机视觉和自然语言处理任务。这些网络成功的关键因素之一是球形和双曲流形具有陀螺群和陀螺向量空间的丰富代数结构。这使得最成功的DNNs能够在这些流形上进行有原则的和有效的泛化。最近的一些研究表明，陀螺群和陀螺向量空间理论中的许多概念也可以泛化到矩阵流形，例如对称正定(SPD)和Grassmann流形。因此，一些用于SPD和Grassmann神经网络的构建模块，如等距模型和多项式逻辑回归(MLR)，可以以与球形和双曲流形类似的方式推导出来。基于这些研究，我们为SPD神经网络设计了全连接(FC)和卷积层。我们还在对称正定半定(SPSD)流形上开发了MLR，并提出了一种在投影器视角下使用Grassmann对数映射进行反向传播的方法。我们在人体动作识别和节点分类任务中展示了所提出方法的有效性。

更新时间: 2024-05-29 15:47:35

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2405.19206v1

TRAMBA: A Hybrid Transformer and Mamba Architecture for Practical Audio and Bone Conduction Speech Super Resolution and Enhancement on Mobile and Wearable Platforms

We propose TRAMBA, a hybrid transformer and Mamba architecture for acoustic and bone conduction speech enhancement, suitable for mobile and wearable platforms. Bone conduction speech enhancement has been impractical to adopt in mobile and wearable platforms for several reasons: (i) data collection is labor-intensive, resulting in scarcity; (ii) there exists a performance gap between state of-art models with memory footprints of hundreds of MBs and methods better suited for resource-constrained systems. To adapt TRAMBA to vibration-based sensing modalities, we pre-train TRAMBA with audio speech datasets that are widely available. Then, users fine-tune with a small amount of bone conduction data. TRAMBA outperforms state-of-art GANs by up to 7.3% in PESQ and 1.8% in STOI, with an order of magnitude smaller memory footprint and an inference speed up of up to 465 times. We integrate TRAMBA into real systems and show that TRAMBA (i) improves battery life of wearables by up to 160% by requiring less data sampling and transmission; (ii) generates higher quality voice in noisy environments than over-the-air speech; (iii) requires a memory footprint of less than 20.0 MB.

Updated: 2024-05-29 15:46:57

标题: TRAMBA：一种用于移动和可穿戴平台上实际音频和骨导语音超分辨率和增强的混合Transformer和Mamba架构

摘要: 我们提出了TRAMBA，这是一种适用于移动和可穿戴平台的声学和骨传导语音增强的混合变压器和曼巴架构。骨传导语音增强由于几个原因而在移动和可穿戴平台上难以采用：(i)数据收集需要大量劳动力，导致数据稀缺；(ii)状态艺术模型与适合资源受限系统的方法之间存在性能差距。为了使TRAMBA适应基于振动传感的模态，我们使用广泛可用的音频语音数据集对TRAMBA进行预训练，然后用户使用少量骨传导数据进行微调。TRAMBA在PESQ上比最先进的GAN模型表现更好，提高了最多7.3%，在STOI上提高了1.8%，内存占用量小了一个数量级，推断速度提高了最多465倍。我们将TRAMBA集成到实际系统中，并展示TRAMBA：(i)通过减少数据采样和传输，使可穿戴设备的电池寿命提高了最多160%；(ii)在嘈杂环境中生成比空中语音更高质量的声音；(iii)内存占用量小于20.0 MB。

更新时间: 2024-05-29 15:46:57

领域: cs.SD,cs.AI,cs.LG,eess.AS

下载: http://arxiv.org/abs/2405.01242v3

Continual Contrastive Spoken Language Understanding

Recently, neural networks have shown impressive progress across diverse fields, with speech processing being no exception. However, recent breakthroughs in this area require extensive offline training using large datasets and tremendous computing resources. Unfortunately, these models struggle to retain their previously acquired knowledge when learning new tasks continually, and retraining from scratch is almost always impractical. In this paper, we investigate the problem of learning sequence-to-sequence models for spoken language understanding in a class-incremental learning (CIL) setting and we propose COCONUT, a CIL method that relies on the combination of experience replay and contrastive learning. Through a modified version of the standard supervised contrastive loss applied only to the rehearsal samples, COCONUT preserves the learned representations by pulling closer samples from the same class and pushing away the others. Moreover, we leverage a multimodal contrastive loss that helps the model learn more discriminative representations of the new data by aligning audio and text features. We also investigate different contrastive designs to combine the strengths of the contrastive loss with teacher-student architectures used for distillation. Experiments on two established SLU datasets reveal the effectiveness of our proposed approach and significant improvements over the baselines. We also show that COCONUT can be combined with methods that operate on the decoder side of the model, resulting in further metrics improvements.

Updated: 2024-05-29 15:43:52

标题: 持续的对比口语理解

摘要: 最近，神经网络在各个领域取得了令人印象深刻的进展，语音处理也不例外。然而，这一领域的最新突破需要使用大型数据集和巨大的计算资源进行广泛的离线训练。不幸的是，这些模型在持续学习新任务时往往难以保留先前获得的知识，而从头开始重新训练几乎总是不切实际的。在本文中，我们研究了在增量学习环境下学习面向序列的语言理解模型的问题，并提出了COCONUT，这是一种依赖于经验重放和对比学习结合的增量学习方法。通过对标准监督对比损失的修改版本仅应用于回放样本，COCONUT通过将来自同一类别的样本拉近并将其他样本推开来保留学到的表示。此外，我们利用多模态对比损失帮助模型通过对齐音频和文本特征学习更具辨别力的表示。我们还研究了不同的对比设计，以结合对比损失与用于蒸馏的师生架构的优势。在两个已建立的SLU数据集上的实验证明了我们提出的方法的有效性和相对基线的显著改进。我们还展示了COCONUT可以与在模型解码器端运行的方法结合，从而进一步改进指标。

更新时间: 2024-05-29 15:43:52

领域: eess.AS,cs.AI

下载: http://arxiv.org/abs/2310.02699v2

Vulnerable Road User Detection and Safety Enhancement: A Comprehensive Survey

Traffic incidents involving vulnerable road users (VRUs) constitute a significant proportion of global road accidents. Advances in traffic communication ecosystems, coupled with sophisticated signal processing and machine learning techniques, have facilitated the utilization of data from diverse sensors. Despite these advancements and the availability of extensive datasets, substantial progress is required to mitigate traffic casualties. This paper provides a comprehensive survey of state-of-the-art technologies and methodologies to enhance the safety of VRUs. The study delves into the communication networks between vehicles and VRUs, emphasizing the integration of advanced sensors and the availability of relevant datasets. It explores preprocessing techniques and data fusion methods to enhance sensor data quality. Furthermore, our study assesses critical simulation environments essential for developing and testing VRU safety systems. Our research also highlights recent advances in VRU detection and classification algorithms, addressing challenges such as variable environmental conditions. Additionally, we cover cutting-edge research in predicting VRU intentions and behaviors, which is crucial for proactive collision avoidance strategies. Through this survey, we aim to provide a comprehensive understanding of the current landscape of VRU safety technologies, identifying areas of progress and areas needing further research and development.

Updated: 2024-05-29 15:42:10

标题: 易受伤害的道路使用者的检测和安全增强：一项全面调查

摘要: 涉及弱势道路使用者（VRU）的交通事故占全球道路事故的相当大比例。交通通信生态系统的进步，加上复杂的信号处理和机器学习技术，促进了来自不同传感器的数据的利用。尽管有这些进展和大量数据集的可用性，但需要大量的进展来减少交通伤亡。本文提供了一个综合调查，介绍了增强VRU安全性的最新技术和方法。研究深入探讨了车辆和VRU之间的通信网络，强调了先进传感器的整合和相关数据集的可用性。它探讨了用于增强传感器数据质量的预处理技术和数据融合方法。此外，我们的研究评估了对开发和测试VRU安全系统至关重要的关键模拟环境。我们的研究还突出了VRU检测和分类算法的最新进展，解决了可变环境条件等挑战。此外，我们涵盖了预测VRU意图和行为的尖端研究，这对于积极的碰撞回避策略至关重要。通过这项调查，我们旨在提供对当前VRU安全技术现状的全面了解，识别需要进一步研究和开发的领域和取得进展的领域。

更新时间: 2024-05-29 15:42:10

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.19202v1

Going beyond compositional generalization, DDPMs can produce zero-shot interpolation

Denoising Diffusion Probabilistic Models (DDPMs) exhibit remarkable capabilities in image generation, with studies suggesting that they can generalize by composing latent factors learned from the training data. In this work, we go further and study DDPMs trained on strictly separate subsets of the data distribution with large gaps on the support of the latent factors. We show that such a model can effectively generate images in the unexplored, intermediate regions of the distribution. For instance, when trained on clearly smiling and non-smiling faces, we demonstrate a sampling procedure which can generate slightly smiling faces without reference images (zero-shot interpolation). We replicate these findings for other attributes as well as other datasets. $\href{https://github.com/jdeschena/ddpm-zero-shot-interpolation}{\text{Our code is available on GitHub.}}$

Updated: 2024-05-29 15:41:53

标题: 超越组合概括，DDPMs可以产生零-shot插值

摘要: 去噪扩散概率模型（DDPMs）在图像生成方面具有显著的能力，研究表明它们可以通过组合从训练数据中学到的潜在因子来实现泛化。在这项工作中，我们进一步研究了在数据分布的严格分离子集上训练的DDPMs，这些子集在潜在因子的支持上具有很大的差距。我们展示了这样一个模型可以有效地在分布的未探索中间区域生成图像。例如，当训练清楚微笑和不微笑的脸部时，我们展示了一个可以生成微微微笑脸部的采样过程，而不需要参考图像（零样本插值）。我们还为其他属性以及其他数据集复制了这些发现。我们的代码可以在GitHub上找到。

更新时间: 2024-05-29 15:41:53

领域: cs.CV,cs.AI,cs.NE

下载: http://arxiv.org/abs/2405.19201v1

Learning Topological Representations with Bidirectional Graph Attention Network for Solving Job Shop Scheduling Problem

Existing learning-based methods for solving job shop scheduling problems (JSSP) usually use off-the-shelf GNN models tailored to undirected graphs and neglect the rich and meaningful topological structures of disjunctive graphs (DGs). This paper proposes the topology-aware bidirectional graph attention network (TBGAT), a novel GNN architecture based on the attention mechanism, to embed the DG for solving JSSP in a local search framework. Specifically, TBGAT embeds the DG from a forward and a backward view, respectively, where the messages are propagated by following the different topologies of the views and aggregated via graph attention. Then, we propose a novel operator based on the message-passing mechanism to calculate the forward and backward topological sorts of the DG, which are the features for characterizing the topological structures and exploited by our model. In addition, we theoretically and experimentally show that TBGAT has linear computational complexity to the number of jobs and machines, respectively, strengthening our method's practical value. Besides, extensive experiments on five synthetic datasets and seven classic benchmarks show that TBGAT achieves new SOTA results by outperforming a wide range of neural methods by a large margin. All the code and data are publicly available online at https://github.com/zcaicaros/TBGAT.

Updated: 2024-05-29 15:32:09

标题: 使用双向图注意力网络学习拓扑表示以解决作业车间调度问题

摘要: 现有的基于学习的方法用于解决作业车间调度问题（JSSP）通常使用现成的GNN模型，专门针对无向图，并忽略了离散图（DGs）的丰富且有意义的拓扑结构。本文提出了一种拓扑感知的双向图注意力网络（TBGAT），这是一种基于注意力机制的新颖GNN架构，用于在本地搜索框架中嵌入DG以解决JSSP。具体而言，TBGAT分别从正向和反向视角嵌入DG，其中消息通过遵循不同视角的拓扑结构传播，并通过图注意力进行聚合。然后，我们提出了一种基于消息传递机制的新操作符来计算DG的正向和反向拓扑排序，这些特征用于表征拓扑结构并被我们的模型利用。此外，我们在理论上和实验上展示，TBGAT对于作业和机器的数量分别具有线性的计算复杂性，增强了我们方法的实际价值。此外，对五个合成数据集和七个经典基准进行的广泛实验表明，TBGAT通过大幅度胜过一系列神经方法取得了新的SOTA结果。所有代码和数据都可以在https://github.com/zcaicaros/TBGAT上公开获得。

更新时间: 2024-05-29 15:32:09

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2402.17606v2

Diffusion-based Dynamics Models for Long-Horizon Rollout in Offline Reinforcement Learning

With the great success of diffusion models (DMs) in generating realistic synthetic vision data, many researchers have investigated their potential in decision-making and control. Most of these works utilized DMs to sample directly from the trajectory space, where DMs can be viewed as a combination of dynamics models and policies. In this work, we explore how to decouple DMs' ability as dynamics models in fully offline settings, allowing the learning policy to roll out trajectories. As DMs learn the data distribution from the dataset, their intrinsic policy is actually the behavior policy induced from the dataset, which results in a mismatch between the behavior policy and the learning policy. We propose Dynamics Diffusion, short as DyDiff, which can inject information from the learning policy to DMs iteratively. DyDiff ensures long-horizon rollout accuracy while maintaining policy consistency and can be easily deployed on model-free algorithms. We provide theoretical analysis to show the advantage of DMs on long-horizon rollout over models and demonstrate the effectiveness of DyDiff in the context of offline reinforcement learning, where the rollout dataset is provided but no online environment for interaction. Our code is at https://github.com/FineArtz/DyDiff.

Updated: 2024-05-29 15:29:46

标题: 基于扩散的动力学模型在离线强化学习中的长期推演模型

摘要: 随着扩散模型（DMs）在生成逼真的合成视觉数据方面取得巨大成功，许多研究人员已经调查了它们在决策和控制方面的潜力。大多数工作利用DMs直接从轨迹空间中采样，其中DMs可以被视为动态模型和策略的组合。在这项工作中，我们探讨了如何在完全离线设置中解耦DMs作为动态模型的能力，允许学习策略展开轨迹。由于DMs从数据集中学习数据分布，它们内在的策略实际上是从数据集中诱导出的行为策略，这导致了行为策略和学习策略之间的不匹配。我们提出了Dynamics Diffusion，简称为DyDiff，它可以迭代地向DMs注入来自学习策略的信息。DyDiff确保了长时间展开的准确性，同时保持了策略的一致性，并且可以轻松部署在无模型算法上。我们提供了理论分析，展示了DMs在长期展开中的优势，并展示了DyDiff在离线强化学习环境中的有效性，其中提供了展开数据集，但没有在线环境进行交互。我们的代码位于https://github.com/FineArtz/DyDiff。

更新时间: 2024-05-29 15:29:46

领域: cs.LG

下载: http://arxiv.org/abs/2405.19189v1

MetaToken: Detecting Hallucination in Image Descriptions by Meta Classification

Large Vision Language Models (LVLMs) have shown remarkable capabilities in multimodal tasks like visual question answering or image captioning. However, inconsistencies between the visual information and the generated text, a phenomenon referred to as hallucinations, remain an unsolved problem with regard to the trustworthiness of LVLMs. To address this problem, recent works proposed to incorporate computationally costly Large (Vision) Language Models in order to detect hallucinations on a sentence- or subsentence-level. In this work, we introduce MetaToken, a lightweight binary classifier to detect hallucinations on the token-level at negligible cost. Based on a statistical analysis, we reveal key factors of hallucinations in LVLMs which have been overseen in previous works. MetaToken can be applied to any open-source LVLM without any knowledge about ground truth data providing a reliable detection of hallucinations. We evaluate our method on four state-of-the-art LVLMs demonstrating the effectiveness of our approach.

Updated: 2024-05-29 15:28:42

标题: 元分类器：通过元分类检测图像描述中的虚构

摘要: 大型视觉语言模型（LVLMs）在视觉问答或图像字幕等多模式任务中展示出卓越的能力。然而，视觉信息与生成文本之间的不一致性，即所谓的幻觉现象，仍然是LVLMs可信度方面的未解决问题。为了解决这个问题，最近的研究提出了在句子或子句级别上检测幻觉的计算成本高昂的大型（视觉）语言模型。在这项工作中，我们引入MetaToken，一个轻量级的二元分类器，以忽略成本检测令牌级别的幻觉。基于统计分析，我们揭示了在先前的研究中被忽视的LVLMs中幻觉的关键因素。MetaToken可以应用于任何开源LVLM，无需了解地面真实数据，提供可靠的幻觉检测。我们在四个最先进的LVLM上评估了我们的方法，展示了我们方法的有效性。

更新时间: 2024-05-29 15:28:42

领域: cs.CV,cs.CL,cs.LG,I.4

下载: http://arxiv.org/abs/2405.19186v1

Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Better Solvers for Math Word Problems

Chain-of-Thought (CoT) prompting has enhanced the performance of Large Language Models (LLMs) across various reasoning tasks. However, CoT still falls short in dealing with complex math word problems, as it usually suffers from three pitfalls: semantic misunderstanding errors, calculation errors and step-missing errors. Prior studies involve addressing the calculation errors and step-missing errors, but neglect the semantic misunderstanding errors, which is the major factor limiting the LLMs' performance. To this end, we propose a simple-yet-effective method, namely Deeply Understanding the Problems (DUP), to improve the LLMs' math problem-solving ability by addressing semantic misunderstanding errors. The core of our method is to encourage the LLMs to deeply understand the problems and extract the key problem-solving information used for better reasoning. Extensive experiments on 10 diverse reasoning benchmarks show that our DUP method consistently outperforms the other counterparts by a large margin. More encouragingly, DUP achieves a new SOTA result on the GSM8K benchmark, with an accuracy of 97.1% under zero-shot setting.

Updated: 2024-05-29 15:27:53

标题: 《在GSM8K上达到>97%的准确率：深入理解问题使LLMs成为数学文字问题更好的解决者》

摘要: 思维链（CoT）提示已经提升了大型语言模型（LLMs）在各种推理任务中的表现。然而，CoT在处理复杂的数学问题时仍然存在不足，因为它通常遭受三个陷阱：语义误解错误、计算错误和步骤遗漏错误。先前的研究涉及解决计算错误和步骤遗漏错误，但忽视了语义误解错误，这是限制LLMs性能的主要因素。为此，我们提出了一种简单但有效的方法，即深入理解问题（DUP），通过解决语义误解错误来提高LLMs的数学问题解决能力。我们方法的核心是鼓励LLMs深入理解问题，并提取用于更好推理的关键问题解决信息。对10个不同推理基准的广泛实验表明，我们的DUP方法始终以较大的优势胜过其他对手。更令人鼓舞的是，在零-shot设置下，DUP在GSM8K基准上实现了新的SOTA结果，准确率为97.1%。

更新时间: 2024-05-29 15:27:53

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.14963v3

A Declarative System for Optimizing AI Workloads

A long-standing goal of data management systems has been to build systems which can compute quantitative insights over large corpora of unstructured data in a cost-effective manner. Until recently, it was difficult and expensive to extract facts from company documents, data from scientific papers, or metrics from image and video corpora. Today's models can accomplish these tasks with high accuracy. However, a programmer who wants to answer a substantive AI-powered query must orchestrate large numbers of models, prompts, and data operations. For even a single query, the programmer has to make a vast number of decisions such as the choice of model, the right inference method, the most cost-effective inference hardware, the ideal prompt design, and so on. The optimal set of decisions can change as the query changes and as the rapidly-evolving technical landscape shifts. In this paper we present Palimpzest, a system that enables anyone to process AI-powered analytical queries simply by defining them in a declarative language. The system uses its cost optimization framework to implement the query plan with the best trade-offs between runtime, financial cost, and output data quality. We describe the workload of AI-powered analytics tasks, the optimization methods that Palimpzest uses, and the prototype system itself. We evaluate Palimpzest on tasks in Legal Discovery, Real Estate Search, and Medical Schema Matching. We show that even our simple prototype offers a range of appealing plans, including one that is 3.3x faster and 2.9x cheaper than the baseline method, while also offering better data quality. With parallelism enabled, Palimpzest can produce plans with up to a 90.3x speedup at 9.1x lower cost relative to a single-threaded GPT-4 baseline, while obtaining an F1-score within 83.5% of the baseline. These require no additional work by the user.

Updated: 2024-05-29 15:27:07

标题: 一个优化人工智能工作负载的声明性系统

摘要: 数据管理系统的一个长期目标是建立能够以成本效益的方式计算大量非结构化数据的定量洞见的系统。直到最近，从公司文件中提取事实、从科学论文中获取数据，或者从图像和视频文集中提取指标都是困难且昂贵的。如今的模型可以以高精度完成这些任务。然而，想要回答一个实质性的基于人工智能的查询的程序员必须协调大量的模型、提示和数据操作。即使是针对一个查询，程序员也必须做出大量决策，比如选择模型、正确的推理方法、最具成本效益的推理硬件、理想的提示设计等等。最佳决策组合可能会随着查询的变化和快速发展的技术环境的变化而改变。在本文中，我们提出了Palimpzest，一个系统，可以通过在声明性语言中定义查询来简单地处理基于人工智能的分析查询。该系统使用其成本优化框架来实现具有最佳权衡关系的查询计划，包括运行时、财务成本和输出数据质量。我们描述了基于人工智能的分析任务的工作负载、Palimpzest使用的优化方法以及原型系统本身。我们在法律发现、房地产搜索和医疗模式匹配任务上评估了Palimpzest。我们展示了即使是我们简单的原型也提供了吸引人的计划范围，包括一个比基准方法快3.3倍、便宜2.9倍，同时提供更好的数据质量。启用并行处理后，Palimpzest可以生成计划，相对于单线程的GPT-4基准，速度提升高达90.3倍，成本降低9.1倍，同时获得与基准值83.5%的F1分数。这些都不需要用户进行额外的工作。

更新时间: 2024-05-29 15:27:07

领域: cs.CL,cs.AI,cs.DB,H.2.3; I.2.5

下载: http://arxiv.org/abs/2405.14696v2

Promoting Two-sided Fairness in Dynamic Vehicle Routing Problem

Dynamic Vehicle Routing Problem (DVRP), is an extension of the classic Vehicle Routing Problem (VRP), which is a fundamental problem in logistics and transportation. Typically, DVRPs involve two stakeholders: service providers that deliver services to customers and customers who raise requests from different locations. Many real-world applications can be formulated as DVRP such as ridesharing and non-compliance capture. Apart from original objectives like optimising total utility or efficiency, DVRP should also consider fairness for all parties. Unfairness can induce service providers and customers to give up on the systems, leading to negative financial and social impacts. However, most existing DVRP-related applications focus on improving fairness from a single side, and there have been few works considering two-sided fairness and utility optimisation concurrently. To this end, we propose a novel framework, a Two-sided Fairness-aware Genetic Algorithm (named 2FairGA), which expands the genetic algorithm from the original objective solely focusing on utility to multi-objectives that incorporate two-sided fairness. Subsequently, the impact of injecting two fairness definitions into the utility-focused model and the correlation between any pair of the three objectives are explored. Extensive experiments demonstrate the superiority of our proposed framework compared to the state-of-the-art.

Updated: 2024-05-29 15:24:28

标题: 在动态车辆路径问题中促进双边公平

摘要: 动态车辆路径问题（DVRP）是经典车辆路径问题（VRP）的延伸，是物流和运输领域中的一个基本问题。通常，DVRP涉及两个利益相关者：提供服务给客户的服务提供者和从不同位置提出请求的客户。许多现实世界的应用可以被形式化为DVRP，如拼车和非合规捕捉。除了原始目标如优化总效用或效率外，DVRP还应考虑所有利益相关方的公平性。不公平会导致服务提供者和客户放弃系统，从而导致负面的财务和社会影响。然而，大多数现有的与DVRP相关的应用都着重于从单一方面提高公平性，很少有研究同时考虑双方公平和效用优化。为此，我们提出了一个新颖的框架，即双方公平意识遗传算法（命名为2FairGA），它将遗传算法从仅关注效用的原始目标扩展到包含双方公平的多目标。随后，探讨了将两种公平定义注入到以效用为中心的模型中的影响以及三个目标中任意一对之间的相关性。广泛的实验证明了我们提出的框架相对于最先进技术的优越性。

更新时间: 2024-05-29 15:24:28

领域: cs.AI

下载: http://arxiv.org/abs/2405.19184v1

Model-independent cosmological inference post DESI DR1 BAO measurements

In this work, we implement Gaussian process regression to reconstruct the expansion history of the universe in a model-agnostic manner, using the Pantheon-Plus SN-Ia compilation in combination with two different BAO measurements (SDSS-IV and DESI DR1). In both the reconstructions, the $\Lambda$CDM model is always included in the 95\% confidence intervals. We find evidence that the DESI LRG data at $z_{\text{eff}} = 0.51$ is not an outlier within our model-independent framework. We study the $\mathcal{O}m$-diagnostics and the evolution of the total equation of state (EoS) of our universe, which hint towards the possibility of a quintessence-like dark energy scenario with a very slowly varying EoS, and a phantom-crossing in higher $z$. The entire exercise is later complemented by considering two more SN-Ia compilations - DES-5YR and Union3 - in combination with DESI BAO. Reconstruction with the DESI BAO + DES-5YR SN data sets predicts that the $\Lambda$CDM model lies outside the 3$\sigma$ confidence levels, whereas with DESI BAO + Union3 data, the $\Lambda$CDM model is always included within 1$\sigma$. We also report constraints on $H_0 r_d$ from our model-agnostic analysis, independent of the pre-recombination physics. Our results point towards an $\approx$ 2$\sigma$ discrepancy between the DESI + Pantheon-Plus and DESI + DES-5YR data sets, which calls for further investigation.

Updated: 2024-05-29 15:18:39

标题: 基于DESI DR1 BAO测量数据的独立模型宇宙学推断

摘要: 在这项工作中，我们实现了高斯过程回归，以一种不受模型限制的方式重建宇宙的扩张历史，使用Pantheon-Plus SN-Ia编译与两种不同的BAO测量（SDSS-IV和DESI DR1）相结合。在两种重建中，ΛCDM模型始终包含在95%置信区间内。我们发现证据表明，在我们的模型无关框架内，$z_{\text{eff}} = 0.51$处的DESI LRG数据不是异常值。我们研究了$\mathcal{O}m$诊断和我们宇宙总态方程（EoS）的演变，这暗示着可能存在一种类似于五体相互作用的暗能量情景，其EoS变化非常缓慢，并在较高的$z$值处穿越幽灵。整个过程后来又通过考虑另外两个SN-Ia编译 - DES-5YR和Union3 - 与DESI BAO相结合来补充。使用DESI BAO + DES-5YR SN数据集进行重建预测，ΛCDM模型位于3$\sigma$置信水平之外，而使用DESI BAO + Union3数据时，ΛCDM模型始终包含在1$\sigma$内。我们还报告了来自我们的模型无关分析的$H_0 r_d$的约束，独立于复合前物理。我们的结果指向了DESI + Pantheon-Plus和DESI + DES-5YR数据集之间的约2$\sigma$差异，这需要进一步调查。

更新时间: 2024-05-29 15:18:39

领域: astro-ph.CO,cs.LG,gr-qc

下载: http://arxiv.org/abs/2405.19178v1

The ethical situation of DALL-E 2

A hot topic of Artificial Intelligence right now is image generation from prompts. DALL-E 2 is one of the biggest names in this domain, as it allows people to create images from simple text inputs, to even more complicated ones. The company that made this possible, OpenAI, has assured everyone that visited their website that their mission is to ensure that artificial general intelligence benefits all humanity. A noble idea in our opinion, that also stood as the motive behind us choosing this subject. This paper analyzes the ethical implications of an AI image generative system, with an emphasis on how society is responding to it, how it probably will and how it should if all the right measures are taken.

Updated: 2024-05-29 15:18:13

标题: 《DALL-E 2的伦理情况》

摘要: 目前人工智能领域的一个热门话题是从提示中生成图像。DALL-E 2 是这一领域中最重要的名字之一，因为它允许人们从简单的文本输入到更复杂的输入创建图像。使这一切成为可能的公司 OpenAI 已向访问他们网站的所有人保证，他们的使命是确保人工通用智能造福于全人类。在我们看来，这是一个高尚的想法，也是我们选择这个主题的动机。本文分析了人工智能图像生成系统的伦理影响，重点关注社会对其的反应，以及它可能会如何应对，以及如果采取了所有正确的措施，应该如何应对。

更新时间: 2024-05-29 15:18:13

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2405.19176v1

Online Linear Regression in Dynamic Environments via Discounting

We develop algorithms for online linear regression which achieve optimal static and dynamic regret guarantees \emph{even in the complete absence of prior knowledge}. We present a novel analysis showing that a discounted variant of the Vovk-Azoury-Warmuth forecaster achieves dynamic regret of the form $R_{T}(\vec{u})\le O\left(d\log(T)\vee \sqrt{dP_{T}^{\gamma}(\vec{u})T}\right)$, where $P_{T}^{\gamma}(\vec{u})$ is a measure of variability of the comparator sequence, and show that the discount factor achieving this result can be learned on-the-fly. We show that this result is optimal by providing a matching lower bound. We also extend our results to \emph{strongly-adaptive} guarantees which hold over every sub-interval $[a,b]\subseteq[1,T]$ simultaneously.

Updated: 2024-05-29 15:17:53

标题: 通过折扣在动态环境中进行在线线性回归

摘要: 我们提出了一种在线线性回归的算法，即使在完全没有先验知识的情况下，也能实现最优的静态和动态遗憾保证。我们提出了一种新颖的分析，表明Vovk-Azoury-Warmuth预测器的折扣变体可以实现形式为$R_{T}(\vec{u})\le O\left(d\log(T)\vee \sqrt{dP_{T}^{\gamma}(\vec{u})T}\right)$的动态遗憾，其中$P_{T}^{\gamma}(\vec{u})$是比较序列的变化性度量，并且表明实现这一结果的折扣因子可以实时学习。我们通过提供相匹配的下界证明了这一结果的最优性。我们还将我们的结果扩展到同时在每个子区间$[a,b]\subseteq[1,T]$上保持\emph{强自适应}保证。

更新时间: 2024-05-29 15:17:53

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2405.19175v1

FaultFormer: Pretraining Transformers for Adaptable Bearing Fault Classification

The growth of global consumption has motivated important applications of deep learning to smart manufacturing and machine health monitoring. In particular, analyzing vibration data offers great potential to extract meaningful insights into predictive maintenance by the detection of bearing faults. Deep learning can be a powerful method to predict these mechanical failures; however, they lack generalizability to new tasks or datasets and require expensive, labeled mechanical data. We address this by presenting a novel self-supervised pretraining and fine-tuning framework based on transformer models. In particular, we investigate different tokenization and data augmentation strategies to reach state-of-the-art accuracies using transformer models. Furthermore, we demonstrate self-supervised masked pretraining for vibration signals and its application to low-data regimes, task adaptation, and dataset adaptation. Pretraining is able to improve performance on scarce, unseen training samples, as well as when fine-tuning on fault classes outside of the pretraining distribution. Furthermore, pretrained transformers are shown to be able to generalize to a different dataset in a few-shot manner. This introduces a new paradigm where models can be pretrained on unlabeled data from different bearings, faults, and machinery and quickly deployed to new, data-scarce applications to suit specific manufacturing needs.

Updated: 2024-05-29 15:13:29

标题: FaultFormer：为适应性轴承故障分类预训练Transformer

摘要: 全球消费的增长促使了深度学习在智能制造和机器健康监测方面的重要应用。特别是，分析振动数据具有很大潜力，可以通过检测轴承故障来提取有意义的见解，从而实现预测性维护。深度学习可以是预测这些机械故障的强大方法；然而，它们缺乏对新任务或数据集的泛化能力，并且需要昂贵的、标记的机械数据。我们通过提出一种基于Transformer模型的新颖的自监督预训练和微调框架来解决这个问题。具体而言，我们研究了不同的分词和数据增强策略，以使用Transformer模型达到最先进的准确性。此外，我们展示了振动信号的自监督掩码预训练及其在低数据范围、任务适应和数据集适应方面的应用。预训练能够提高在稀缺、未见训练样本上的性能，以及在对预训练分布之外的故障类别进行微调时的性能。此外，预训练的Transformer被证明能够以少数样本的方式泛化到不同的数据集。这引入了一种新的范式，即模型可以在来自不同轴承、故障和机械的未标记数据上进行预训练，然后快速部署到新的、数据稀缺的应用程序中，以适应特定的制造需求。

更新时间: 2024-05-29 15:13:29

领域: cs.LG,eess.SP

下载: http://arxiv.org/abs/2312.02380v3

Transformers as Neural Operators for Solutions of Differential Equations with Finite Regularity

Neural operator learning models have emerged as very effective surrogates in data-driven methods for partial differential equations (PDEs) across different applications from computational science and engineering. Such operator learning models not only predict particular instances of a physical or biological system in real-time but also forecast classes of solutions corresponding to a distribution of initial and boundary conditions or forcing terms. % DeepONet is the first neural operator model and has been tested extensively for a broad class of solutions, including Riemann problems. Transformers have not been used in that capacity, and specifically, they have not been tested for solutions of PDEs with low regularity. % In this work, we first establish the theoretical groundwork that transformers possess the universal approximation property as operator learning models. We then apply transformers to forecast solutions of diverse dynamical systems with solutions of finite regularity for a plurality of initial conditions and forcing terms. In particular, we consider three examples: the Izhikevich neuron model, the tempered fractional-order Leaky Integrate-and-Fire (LIF) model, and the one-dimensional Euler equation Riemann problem. For the latter problem, we also compare with variants of DeepONet, and we find that transformers outperform DeepONet in accuracy but they are computationally more expensive.

Updated: 2024-05-29 15:10:24

标题: Transformers作为有限正则性微分方程解的神经算子

摘要: 神经操作学习模型已经成为数据驱动的偏微分方程（PDEs）方法中非常有效的替代品，涵盖了从计算科学和工程学中的不同应用。这种操作学习模型不仅可以实时预测物理或生物系统的特定实例，还可以预测与一组初始和边界条件或强迫项对应的解类别。DeepONet是第一个神经操作模型，并已广泛测试了包括黎曼问题在内的广泛解类别。变压器尚未以这种方式使用，并且特别地，它们尚未针对低正则性PDE解进行测试。在这项工作中，我们首先建立了变压器作为操作学习模型具有通用逼近性质的理论基础。然后，我们将变压器应用于预测多样的动力系统的解，这些解具有有限的正则性，针对多种初始条件和强迫项。特别地，我们考虑了三个示例：Izhikevich神经元模型，温和分数阶泄漏积分-发放（LIF）模型，以及一维欧拉方程黎曼问题。对于后者的问题，我们还与DeepONet的变体进行了比较，发现变压器在准确性上胜过DeepONet，但计算成本更高。

更新时间: 2024-05-29 15:10:24

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.19166v1

Spatio-Temporal Field Neural Networks for Air Quality Inference

The air quality inference problem aims to utilize historical data from a limited number of observation sites to infer the air quality index at an unknown location. Considering the sparsity of data due to the high maintenance cost of the stations, good inference algorithms can effectively save the cost and refine the data granularity. While spatio-temporal graph neural networks have made excellent progress on this problem, their non-Euclidean and discrete data structure modeling of reality limits its potential. In this work, we make the first attempt to combine two different spatio-temporal perspectives, fields and graphs, by proposing a new model, Spatio-Temporal Field Neural Network, and its corresponding new framework, Pyramidal Inference. Extensive experiments validate that our model achieves state-of-the-art performance in nationwide air quality inference in the Chinese Mainland, demonstrating the superiority of our proposed model and framework.

Updated: 2024-05-29 15:10:11

标题: 空间-时间场神经网络用于空气质量推断

摘要: 空气质量推断问题旨在利用来自有限数量观测站的历史数据推断未知位置的空气质量指数。考虑到观测站维护成本高导致数据稀疏，良好的推断算法可以有效节省成本并细化数据粒度。尽管时空图神经网络在这一问题上取得了出色进展，但其对现实的非欧几里得和离散数据结构建模限制了其潜力。在这项工作中，我们首次尝试结合两种不同的时空视角，领域和图，通过提出一个新模型，即时空场神经网络，以及相应的新框架，金字塔推断。大量实验证实，我们的模型在中国大陆全国范围内的空气质量推断中取得了最先进的性能，证明了我们提出的模型和框架的优越性。

更新时间: 2024-05-29 15:10:11

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2403.02354v2

Learning from Litigation: Graphs and LLMs for Retrieval and Reasoning in eDiscovery

Electronic Discovery (eDiscovery) involves identifying relevant documents from a vast collection based on legal production requests. The integration of artificial intelligence (AI) and natural language processing (NLP) has transformed this process, helping document review and enhance efficiency and cost-effectiveness. Although traditional approaches like BM25 or fine-tuned pre-trained models are common in eDiscovery, they face performance, computational, and interpretability challenges. In contrast, Large Language Model (LLM)-based methods prioritize interpretability but sacrifice performance and throughput. This paper introduces DISCOvery Graph (DISCOG), a hybrid approach that combines the strengths of two worlds: a heterogeneous graph-based method for accurate document relevance prediction and subsequent LLM-driven approach for reasoning. Graph representational learning generates embeddings and predicts links, ranking the corpus for a given request, and the LLMs provide reasoning for document relevance. Our approach handles datasets with balanced and imbalanced distributions, outperforming baselines in F1-score, precision, and recall by an average of 12%, 3%, and 16%, respectively. In an enterprise context, our approach drastically reduces document review costs by 99.9% compared to manual processes and by 95% compared to LLM-based classification methods

Updated: 2024-05-29 15:08:55

标题: 学习诉讼：在电子发现中用于检索和推理的图表和LLM

摘要: 电子发现（eDiscovery）涉及根据法律生产请求从庞大的文档集合中识别相关文档。人工智能（AI）和自然语言处理（NLP）的整合已经改变了这一过程，帮助文档审阅并增强效率和成本效益。虽然传统方法如BM25或精细调整的预训练模型在电子发现中很常见，但它们面临性能、计算和可解释性挑战。相比之下，基于大型语言模型（LLM）的方法优先考虑可解释性，但牺牲了性能和吞吐量。本文介绍了DISCOvery Graph（DISCOG），一种混合方法，结合了两个世界的优势：基于异构图的准确文档相关性预测方法和随后的LLM驱动推理方法。图表征学习生成嵌入并预测链接，为给定请求对语料库进行排名，而LLM为文档相关性提供推理。我们的方法处理平衡和不平衡分布的数据集，分别通过平均12%、3%和16%的F1分数、精确度和召回率优于基线。在企业环境中，与手动流程相比，我们的方法可将文档审阅成本降低99.9%，与基于LLM的分类方法相比可降低95%。

更新时间: 2024-05-29 15:08:55

领域: cs.AI,cs.IR

下载: http://arxiv.org/abs/2405.19164v1

Does learning the right latent variables necessarily improve in-context learning?

Large autoregressive models like Transformers can solve tasks through in-context learning (ICL) without learning new weights, suggesting avenues for efficiently solving new tasks. For many tasks, e.g., linear regression, the data factorizes: examples are independent given a task latent that generates the data, e.g., linear coefficients. While an optimal predictor leverages this factorization by inferring task latents, it is unclear if Transformers implicitly do so or if they instead exploit heuristics and statistical shortcuts enabled by attention layers. Both scenarios have inspired active ongoing work. In this paper, we systematically investigate the effect of explicitly inferring task latents. We minimally modify the Transformer architecture with a bottleneck designed to prevent shortcuts in favor of more structured solutions, and then compare performance against standard Transformers across various ICL tasks. Contrary to intuition and some recent works, we find little discernible difference between the two; biasing towards task-relevant latent variables does not lead to better out-of-distribution performance, in general. Curiously, we find that while the bottleneck effectively learns to extract latent task variables from context, downstream processing struggles to utilize them for robust prediction. Our study highlights the intrinsic limitations of Transformers in achieving structured ICL solutions that generalize, and shows that while inferring the right latents aids interpretability, it is not sufficient to alleviate this problem.

Updated: 2024-05-29 15:06:10

标题: 学习正确的潜变量是否必然改进上下文学习？

摘要: 像Transformer这样的大型自回归模型可以通过上下文学习（ICL）解决任务，而无需学习新的权重，这为有效解决新任务提供了途径。对于许多任务，例如线性回归，数据因子化：给定生成数据的任务潜变量，例如线性系数，示例是独立的。虽然最优预测器通过推断任务潜变量利用这种因子化，但不清楚Transformer是否隐式这样做，或者它们是否利用注意力层启用的启发式和统计快捷方式。这两种情况都激发了积极的持续工作。在本文中，我们系统地研究了显式推断任务潜变量的影响。我们通过最小修改Transformer架构，设计了一个瓶颈，以防止快捷方式，而更倾向于更结构化的解决方案，然后在各种ICL任务中与标准Transformer进行性能比较。与直觉和一些最近的工作相反，我们发现两者之间几乎没有明显区别；偏向于与任务相关的潜变量通常不会导致更好的超出分布性能。有趣的是，我们发现虽然瓶颈有效地学会从上下文中提取潜在的任务变量，但下游处理却难以利用它们进行稳健的预测。我们的研究突显了Transformer在实现结构化ICL解决方案上的固有限制，并表明，虽然推断正确的潜变量有助于解释性，但并不足以缓解这个问题。

更新时间: 2024-05-29 15:06:10

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.19162v1

Improving Neural Additive Models with Bayesian Principles

Neural additive models (NAMs) enhance the transparency of deep neural networks by handling input features in separate additive sub-networks. However, they lack inherent mechanisms that provide calibrated uncertainties and enable selection of relevant features and interactions. Approaching NAMs from a Bayesian perspective, we augment them in three primary ways, namely by a) providing credible intervals for the individual additive sub-networks; b) estimating the marginal likelihood to perform an implicit selection of features via an empirical Bayes procedure; and c) facilitating the ranking of feature pairs as candidates for second-order interaction in fine-tuned models. In particular, we develop Laplace-approximated NAMs (LA-NAMs), which show improved empirical performance on tabular datasets and challenging real-world medical tasks.

Updated: 2024-05-29 15:02:21

标题: 用贝叶斯原理改进神经加法模型

摘要: 神经加性模型（NAMs）通过处理输入特征的单独加性子网络，提高了深度神经网络的透明度。然而，它们缺乏提供校准不确定性并启用相关特征和交互作用选择的固有机制。从贝叶斯角度出发，我们通过三种主要方式增强NAMs，即a）为单独的加性子网络提供可信区间；b）通过经验贝叶斯程序估计边缘似然，执行特征的隐式选择；和c）促进特征对的排名，作为调优模型中二阶交互作用的候选。特别是，我们开发了拉普拉斯近似的NAMs（LA-NAMs），在表格数据集和具有挑战性的现实世界医学任务上显示出改进的经验性能。

更新时间: 2024-05-29 15:02:21

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2305.16905v4

Beyond Discrepancy: A Closer Look at the Theory of Distribution Shift

Many machine learning models appear to deploy effortlessly under distribution shift, and perform well on a target distribution that is considerably different from the training distribution. Yet, learning theory of distribution shift bounds performance on the target distribution as a function of the discrepancy between the source and target, rarely guaranteeing high target accuracy. Motivated by this gap, this work takes a closer look at the theory of distribution shift for a classifier from a source to a target distribution. Instead of relying on the discrepancy, we adopt an Invariant-Risk-Minimization (IRM)-like assumption connecting the distributions, and characterize conditions under which data from a source distribution is sufficient for accurate classification of the target. When these conditions are not met, we show when only unlabeled data from the target is sufficient, and when labeled target data is needed. In all cases, we provide rigorous theoretical guarantees in the large sample regime.

Updated: 2024-05-29 15:00:19

标题: 超越差异：深入探讨分布转移理论

摘要: 许多机器学习模型似乎在分布转移下轻松部署，并且在目标分布与训练分布相差较大时表现良好。然而，分布转移的学习理论将目标分布上的性能限制为源分布和目标之间的差距的函数，很少能保证高目标准确性。受到这一差距的启发，本文深入研究了从源分布到目标分布的分类器的分布转移理论。我们不依赖于差距，而是采用类似于不变风险最小化（IRM）的假设连接这些分布，并确定了源分布中的数据何时足以准确分类目标。当这些条件不满足时，我们展示了仅需要目标中的未标记数据时以及何时需要标记的目标数据。在所有情况下，我们在大样本区间提供了严格的理论保证。

更新时间: 2024-05-29 15:00:19

领域: cs.LG

下载: http://arxiv.org/abs/2405.19156v1

A Study of Plasticity Loss in On-Policy Deep Reinforcement Learning

Continual learning with deep neural networks presents challenges distinct from both the fixed-dataset and convex continual learning regimes. One such challenge is plasticity loss, wherein a neural network trained in an online fashion displays a degraded ability to fit new tasks. This problem has been extensively studied in both supervised learning and off-policy reinforcement learning (RL), where a number of remedies have been proposed. Still, plasticity loss has received less attention in the on-policy deep RL setting. Here we perform an extensive set of experiments examining plasticity loss and a variety of mitigation methods in on-policy deep RL. We demonstrate that plasticity loss is pervasive under domain shift in this regime, and that a number of methods developed to resolve it in other settings fail, sometimes even resulting in performance that is worse than performing no intervention at all. In contrast, we find that a class of ``regenerative'' methods are able to consistently mitigate plasticity loss in a variety of contexts, including in gridworld tasks and more challenging environments like Montezuma's Revenge and ProcGen.

Updated: 2024-05-29 14:59:49

标题: 一个关于在线策略深度强化学习中塑性丧失的研究

摘要: 使用深度神经网络进行持续学习面临着与固定数据集和凸连续学习制度不同的挑战。其中一个挑战是可塑性损失，即以在线方式训练的神经网络显示出较差的适应新任务能力。这个问题在监督学习和离策略强化学习中得到了广泛研究，提出了许多解决方法。然而，在策略上的深度强化学习中，可塑性损失受到的关注较少。本文进行了一系列实验，研究了策略上的深度强化学习中的可塑性损失及各种缓解方法。我们证明在这个制度下，可塑性损失在领域转移中普遍存在，并且一些在其他环境中解决此问题的方法失败，有时甚至导致性能比根本不进行干预还要差。相反，我们发现一类“再生”方法能够在各种情境下持续缓解可塑性损失，包括在网格世界任务和更具挑战性的环境中，如蒙特祖玛的复仇和ProcGen。

更新时间: 2024-05-29 14:59:49

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.19153v1

CaLa: Complementary Association Learning for Augmenting Composed Image Retrieval

Composed Image Retrieval (CIR) involves searching for target images based on an image-text pair query. While current methods treat this as a query-target matching problem, we argue that CIR triplets contain additional associations beyond this primary relation. In our paper, we identify two new relations within triplets, treating each triplet as a graph node. Firstly, we introduce the concept of text-bridged image alignment, where the query text serves as a bridge between the query image and the target image. We propose a hinge-based cross-attention mechanism to incorporate this relation into network learning. Secondly, we explore complementary text reasoning, considering CIR as a form of cross-modal retrieval where two images compose to reason about complementary text. To integrate these perspectives effectively, we design a twin attention-based compositor. By combining these complementary associations with the explicit query pair-target image relation, we establish a comprehensive set of constraints for CIR. Our framework, CaLa (Complementary Association Learning for Augmenting Composed Image Retrieval), leverages these insights. We evaluate CaLa on CIRR and FashionIQ benchmarks with multiple backbones, demonstrating its superiority in composed image retrieval.

Updated: 2024-05-29 14:52:10

标题: CaLa：用于增强合成图像检索的互补关联学习

摘要: 复合图像检索（CIR）涉及基于图像-文本对查询搜索目标图像。虽然当前方法将此视为查询-目标匹配问题，但我们认为CIR三元组包含超出这种主要关系的额外关联。在我们的论文中，我们确定了三元组中的两种新关系，将每个三元组视为一个图节点。首先，我们引入了文本桥接图像对齐的概念，其中查询文本作为查询图像和目标图像之间的桥梁。我们提出了一种基于铰链的交叉注意机制，将这种关系纳入网络学习中。其次，我们探讨了互补文本推理，将CIR视为一种形式的跨模态检索，其中两个图像组合以推理关于互补文本。为了有效整合这些观点，我们设计了一个双关注的组合器。通过将这些互补关联与显式查询对-目标图像关系相结合，我们为CIR建立了一套全面的约束。我们的框架CaLa（用于增强复合图像检索的补充关联学习）利用这些见解。我们在CIRR和FashionIQ基准测试中评估了CaLa，并展示了它在复合图像检索中的优越性。

更新时间: 2024-05-29 14:52:10

领域: cs.CV,cs.AI,cs.IR

下载: http://arxiv.org/abs/2405.19149v1

I Bet You Did Not Mean That: Testing Semantic Importance via Betting

Recent works have extended notions of feature importance to \emph{semantic concepts} that are inherently interpretable to the users interacting with a black-box predictive model. Yet, precise statistical guarantees, such as false positive rate control, are needed to communicate findings transparently and to avoid unintended consequences in real-world scenarios. In this paper, we formalize the global (i.e., over a population) and local (i.e., for a sample) statistical importance of semantic concepts for the predictions of opaque models, by means of conditional independence, which allows for rigorous testing. We use recent ideas of sequential kernelized testing (SKIT) to induce a rank of importance across concepts, and showcase the effectiveness and flexibility of our framework on synthetic datasets as well as on image classification tasks using vision-language models such as CLIP.

Updated: 2024-05-29 14:51:41

标题: 我打赌你不是这个意思：通过下注测试语义重要性

摘要: 最近的研究将特征重要性的概念扩展到语义概念，这些概念对与黑盒预测模型进行交互的用户来说是可解释的。然而，需要精确的统计保证，如假阳性率控制，以便透明地传达发现并避免在现实场景中出现意外后果。在本文中，我们通过条件独立性形式化了对不透明模型的预测中语义概念的全局（即针对人口）和局部（即针对样本）统计重要性，这允许进行严格的测试。我们利用最近的顺序核化测试（SKIT）的思想来诱导概念重要性的排名，并展示了我们的框架在合成数据集以及使用视觉语言模型（如CLIP）进行图像分类任务中的有效性和灵活性。

更新时间: 2024-05-29 14:51:41

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2405.19146v1

DGRC: An Effective Fine-tuning Framework for Distractor Generation in Chinese Multi-choice Reading Comprehension

When evaluating a learner's knowledge proficiency, the multiple-choice question is an efficient and widely used format in standardized tests. Nevertheless, generating these questions, particularly plausible distractors (incorrect options), poses a considerable challenge. Generally, the distractor generation can be classified into cloze-style distractor generation (CDG) and natural questions distractor generation (NQDG). In contrast to the CDG, utilizing pre-trained language models (PLMs) for NQDG presents three primary challenges: (1) PLMs are typically trained to generate ``correct'' content, like answers, while rarely trained to generate ``plausible" content, like distractors; (2) PLMs often struggle to produce content that aligns well with specific knowledge and the style of exams; (3) NQDG necessitates the model to produce longer, context-sensitive, and question-relevant distractors. In this study, we introduce a fine-tuning framework named DGRC for NQDG in Chinese multi-choice reading comprehension from authentic examinations. DGRC comprises three major components: hard chain-of-thought, multi-task learning, and generation mask patterns. The experiment results demonstrate that DGRC significantly enhances generation performance, achieving a more than 2.5-fold improvement in BLEU scores.

Updated: 2024-05-29 14:47:01

标题: DGRC：一种有效的微调框架，用于生成中文多项选择阅读理解中的干扰项

摘要: 在评估学习者的知识熟练度时，多项选择题是标准化测试中高效且广泛使用的格式。然而，生成这些问题，特别是合理的干扰项（错误选项），是一个相当大的挑战。通常，干扰项的生成可以分为填空式干扰项生成（CDG）和自然问题干扰项生成（NQDG）。与CDG相比，利用预训练语言模型（PLMs）进行NQDG面临三个主要挑战：（1）PLMs通常训练用于生成“正确”的内容，如答案，而很少训练生成“合理”的内容，如干扰项；（2）PLMs通常难以产生与特定知识和考试风格相匹配的内容；（3）NQDG要求模型生成更长、上下文敏感且与问题相关的干扰项。在本研究中，我们介绍了一个名为DGRC的NQDG中文多项选择阅读理解的微调框架，来自真实考试。DGRC包括三个主要组成部分：硬链式思维、多任务学习和生成掩码模式。实验结果表明，DGRC显著提升了生成性能，BLEU分数实现了超过2.5倍的改进。

更新时间: 2024-05-29 14:47:01

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.19139v1

Principled Preferential Bayesian Optimization

We study the problem of preferential Bayesian optimization (BO), where we aim to optimize a black-box function with only preference feedback over a pair of candidate solutions. Inspired by the likelihood ratio idea, we construct a confidence set of the black-box function using only the preference feedback. An optimistic algorithm with an efficient computational method is then developed to solve the problem, which enjoys an information-theoretic bound on the total cumulative regret, a first-of-its-kind for preferential BO. This bound further allows us to design a scheme to report an estimated best solution, with a guaranteed convergence rate. Experimental results on sampled instances from Gaussian processes, standard test functions, and a thermal comfort optimization problem all show that our method stably achieves better or competitive performance as compared to the existing state-of-the-art heuristics, which, however, do not have theoretical guarantees on regret bounds or convergence.

Updated: 2024-05-29 14:46:51

标题: 原则性优先贝叶斯优化

摘要: 我们研究了偏好贝叶斯优化（BO）的问题，旨在通过对一对候选解的偏好反馈来优化一个黑盒函数。受似然比率思想启发，我们利用仅有的偏好反馈构建了黑盒函数的置信区间。接着开发了一种乐观算法，并采用高效的计算方法来解决这个问题，其在总累积遗憾上具有信息理论界限，这在偏好BO中是首次出现的。这个界限进一步使我们能够设计一个方案来报告估计的最佳解，具有保证的收敛速度。从高斯过程、标准测试函数和热舒适度优化问题的抽样实例的实验结果显示，与现有的最先进的启发式方法相比，我们的方法稳定地实现了更好或竞争性的性能，而这些方法却没有关于遗憾界限或收敛性的理论保证。

更新时间: 2024-05-29 14:46:51

领域: cs.LG

下载: http://arxiv.org/abs/2402.05367v2

RAG-Driver: Generalisable Driving Explanations with Retrieval-Augmented In-Context Learning in Multi-Modal Large Language Model

We need to trust robots that use often opaque AI methods. They need to explain themselves to us, and we need to trust their explanation. In this regard, explainability plays a critical role in trustworthy autonomous decision-making to foster transparency and acceptance among end users, especially in complex autonomous driving. Recent advancements in Multi-Modal Large Language models (MLLMs) have shown promising potential in enhancing the explainability as a driving agent by producing control predictions along with natural language explanations. However, severe data scarcity due to expensive annotation costs and significant domain gaps between different datasets makes the development of a robust and generalisable system an extremely challenging task. Moreover, the prohibitively expensive training requirements of MLLM and the unsolved problem of catastrophic forgetting further limit their generalisability post-deployment. To address these challenges, we present RAG-Driver, a novel retrieval-augmented multi-modal large language model that leverages in-context learning for high-performance, explainable, and generalisable autonomous driving. By grounding in retrieved expert demonstration, we empirically validate that RAG-Driver achieves state-of-the-art performance in producing driving action explanations, justifications, and control signal prediction. More importantly, it exhibits exceptional zero-shot generalisation capabilities to unseen environments without further training endeavours.

Updated: 2024-05-29 14:44:20

标题: RAG-Driver：多模态大型语言模型中具有检索增强上下文学习的可推广驾驶解释

摘要: 我们需要相信那些使用常常不透明的人工智能方法的机器人。它们需要向我们解释自己，而我们需要信任它们的解释。在这方面，可解释性在值得信赖的自主决策中起着关键作用，以促进终端用户之间的透明度和接受度，尤其是在复杂的自主驾驶中。最近在多模态大型语言模型（MLLMs）方面取得的进展显示出在生成控制预测以及自然语言解释方面提高可解释性作为驾驶代理的潜力。然而，由于昂贵的注释成本和不同数据集之间的重大领域差距导致的严重数据稀缺性使得开发一个稳健且具有一般性的系统变得极其具有挑战性。此外，MLLM的训练要求过高以及灾难性遗忘问题尚未解决，进一步限制了它们在部署后的一般性。为了解决这些挑战，我们提出了RAG-Driver，一种新颖的检索增强的多模态大型语言模型，利用上下文学习实现高性能、可解释且具有一般性的自主驾驶。通过基于检索的专家演示，我们经验性地验证了RAG-Driver在产生驾驶行为解释、理由和控制信号预测方面达到了最先进的性能。更重要的是，它展现出在未见过的环境中具有出色的零-shot泛化能力，无需进一步的训练努力。

更新时间: 2024-05-29 14:44:20

领域: cs.RO,cs.AI

下载: http://arxiv.org/abs/2402.10828v2

Analyzing Chat Protocols of Novice Programmers Solving Introductory Programming Tasks with ChatGPT

Large Language Models (LLMs) have taken the world by storm, and students are assumed to use related tools at a great scale. In this research paper we aim to gain an understanding of how introductory programming students chat with LLMs and related tools, e.g., ChatGPT-3.5. To address this goal, computing students at a large German university were motivated to solve programming exercises with the assistance of ChatGPT as part of their weekly introductory course exercises. Then students (n=213) submitted their chat protocols (with 2335 prompts in sum) as data basis for this analysis. The data was analyzed w.r.t. the prompts, frequencies, the chats' progress, contents, and other use pattern, which revealed a great variety of interactions, both potentially supportive and concerning. Learning about students' interactions with ChatGPT will help inform and align teaching practices and instructions for future introductory programming courses in higher education.

Updated: 2024-05-29 14:38:32

标题: 分析初学者程序员使用ChatGPT解决入门编程任务的聊天协议

摘要: 大型语言模型(LLMs)已经席卷世界，人们认为学生在很大程度上会使用相关工具。在这篇研究论文中，我们旨在了解初级编程学生如何与LLMs和相关工具（如ChatGPT-3.5）进行交流。为了实现这一目标，一所德国大型大学的计算机学生被激励在他们的每周初级课程练习中借助ChatGPT解决编程练习。然后学生（n=213）提交了他们的聊天协议（总计2335个提示）作为这项分析的数据基础。数据被分析，关于提示、频率、聊天的进展、内容和其他使用模式，揭示了各种各样的互动，既可能支持也可能引起关注。了解学生与ChatGPT的互动将有助于为未来高等教育的初级编程课程提供信息和对齐教学实践和指导。

更新时间: 2024-05-29 14:38:32

领域: cs.AI

下载: http://arxiv.org/abs/2405.19132v1

Spatio-Spectral Graph Neural Networks

Spatial Message Passing Graph Neural Networks (MPGNNs) are widely used for learning on graph-structured data. However, key limitations of l-step MPGNNs are that their "receptive field" is typically limited to the l-hop neighborhood of a node and that information exchange between distant nodes is limited by over-squashing. Motivated by these limitations, we propose Spatio-Spectral Graph Neural Networks (S$^2$GNNs) -- a new modeling paradigm for Graph Neural Networks (GNNs) that synergistically combines spatially and spectrally parametrized graph filters. Parameterizing filters partially in the frequency domain enables global yet efficient information propagation. We show that S$^2$GNNs vanquish over-squashing and yield strictly tighter approximation-theoretic error bounds than MPGNNs. Further, rethinking graph convolutions at a fundamental level unlocks new design spaces. For example, S$^2$GNNs allow for free positional encodings that make them strictly more expressive than the 1-Weisfeiler-Lehman (WL) test. Moreover, to obtain general-purpose S$^2$GNNs, we propose spectrally parametrized filters for directed graphs. S$^2$GNNs outperform spatial MPGNNs, graph transformers, and graph rewirings, e.g., on the peptide long-range benchmark tasks, and are competitive with state-of-the-art sequence modeling. On a 40 GB GPU, S$^2$GNNs scale to millions of nodes.

Updated: 2024-05-29 14:28:08

标题: 空间-谱图神经网络 (Note: This is a direct translation of the title provided)

摘要: 空间消息传递图神经网络（MPGNNs）被广泛用于学习图结构数据。然而，l步MPGNNs的关键局限在于它们的“接受域”通常仅限于节点的l跳邻域，并且远距离节点之间的信息交换受到过度压缩的限制。受到这些限制的启发，我们提出了空间-频谱图神经网络（S$^2$GNNs）--一种新的图神经网络（GNNs）建模范式，将空间和频谱参数化的图滤波器协同结合起来。在频率域部分参数化滤波器使得全局而高效的信息传播成为可能。我们展示了S$^2$GNNs战胜了过度压缩，并产生比MPGNNs严格更紧的逼近论错误界限。此外，以基本层面重新思考图卷积解锁了新的设计空间。例如，S$^2$GNNs允许自由位置编码，使其比1-Weisfeiler-Lehman（WL）测试严格更具表现力。此外，为了得到通用的S$^2$GNNs，我们提出了针对有向图的频谱参数化滤波器。S$^2$GNNs在肽长程基准任务上胜过了空间MPGNNs，图变压器和图重连等方法，并与最先进的序列建模方法相竞争。在一个40 GB的GPU上，S$^2$GNNs可以扩展到数百万个节点。

更新时间: 2024-05-29 14:28:08

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.19121v1

Can Graph Learning Improve Task Planning?

Task planning is emerging as an important research topic alongside the development of large language models (LLMs). It aims to break down complex user requests into solvable sub-tasks, thereby fulfilling the original requests. In this context, the sub-tasks can be naturally viewed as a graph, where the nodes represent the sub-tasks, and the edges denote the dependencies among them. Consequently, task planning is a decision-making problem that involves selecting a connected path or subgraph within the corresponding graph and invoking it. In this paper, we explore graph learning-based methods for task planning, a direction that is orthogonal to the prevalent focus on prompt design. Our interest in graph learning stems from a theoretical discovery: the biases of attention and auto-regressive loss impede LLMs' ability to effectively navigate decision-making on graphs, which is adeptly addressed by graph neural networks (GNNs). This theoretical insight led us to integrate GNNs with LLMs to enhance overall performance. Extensive experiments demonstrate that GNN-based methods surpass existing solutions even without training, and minimal training can further enhance their performance. Additionally, our approach complements prompt engineering and fine-tuning techniques, with performance further enhanced by improved prompts or a fine-tuned model.

Updated: 2024-05-29 14:26:24

标题: 图学习能够提升任务规划吗？

摘要: 任务规划正在与大型语言模型（LLMs）的发展一起成为一个重要的研究课题。它旨在将复杂的用户请求分解为可解决的子任务，从而实现原始请求。在这种情况下，子任务可以自然地被视为一个图，其中节点代表子任务，边表示它们之间的依赖关系。因此，任务规划是一个涉及在相应图中选择一个连接路径或子图并调用它的决策问题。在本文中，我们探讨了基于图学习的任务规划方法，这是与目前主要关注提示设计方向正交的一个方向。我们对图学习的兴趣源自一个理论发现：注意力和自回归损失的偏差阻碍了LLMs在图上有效导航决策的能力，而图神经网络（GNNs）则能够熟练地解决这个问题。这一理论洞见促使我们将GNNs与LLMs整合，以提升整体性能。大量实验证明，基于GNN的方法甚至在没有训练的情况下超越了现有解决方案，而最少的训练可以进一步提升它们的性能。此外，我们的方法辅助提示工程和微调技术，通过改进提示或微调模型进一步提升性能。

更新时间: 2024-05-29 14:26:24

领域: cs.LG

下载: http://arxiv.org/abs/2405.19119v1

SoK: A Defense-Oriented Evaluation of Software Supply Chain Security

The software supply chain comprises a highly complex set of operations, processes, tools, institutions and human factors involved in creating a piece of software. A number of high-profile attacks that exploit a weakness in this complex ecosystem have spurred research in identifying classes of supply chain attacks. Yet, practitioners often lack the necessary information to understand their security posture and implement suitable defenses against these attacks. We argue that the next stage of software supply chain security research and development will benefit greatly from a defense-oriented approach that focuses on holistic bottom-up solutions. To this end, this paper introduces the AStRA model, a framework for representing fundamental software supply chain elements and their causal relationships. Using this model, we identify software supply chain security objectives that are needed to mitigate common attacks and systematize knowledge on recent and well-established security techniques for their ability to meet these objectives. We validate our model against prior attacks and taxonomies. Finally, we identify emergent research gaps and propose opportunities to develop novel software development tools and systems that are secure-by-design.

Updated: 2024-05-29 14:26:13

标题: SoK: 一个以防御为导向的软件供应链安全评估

摘要: 软件供应链包括一系列高度复杂的操作、流程、工具、机构和人为因素，涉及创建软件的过程。利用这个复杂的生态系统中的弱点进行的一些知名攻击，促使研究者们鉴别供应链攻击的类别。然而，实践者通常缺乏必要的信息来了解他们的安全状况，并采取适当的防御措施来应对这些攻击。我们认为，软件供应链安全研究和发展的下一个阶段将极大受益于一个以防御为导向的方法，专注于全面的自下而上的解决方案。为此，本文介绍了AStRA模型，这是一个代表基本软件供应链元素及其因果关系的框架。利用这一模型，我们鉴别了需要实现的软件供应链安全目标，以减轻常见攻击，并系统化了对最近和已建立的安全技术的了解，以评估其是否能够实现这些目标。我们通过先前的攻击和分类验证了我们的模型。最后，我们确定了新兴研究领域的差距，并提出了发展新型软件开发工具和系统的机会，使其具备安全性设计。

更新时间: 2024-05-29 14:26:13

领域: cs.CR

下载: http://arxiv.org/abs/2405.14993v2

Introducing Adaptive Continuous Adversarial Training (ACAT) to Enhance ML Robustness

Adversarial training enhances the robustness of Machine Learning (ML) models against adversarial attacks. However, obtaining labeled training and adversarial training data in network/cybersecurity domains is challenging and costly. Therefore, this letter introduces Adaptive Continuous Adversarial Training (ACAT), a method that integrates adversarial training samples into the model during continuous learning sessions using real-world detected adversarial data. Experimental results with a SPAM detection dataset demonstrate that ACAT reduces the time required for adversarial sample detection compared to traditional processes. Moreover, the accuracy of the under-attack ML-based SPAM filter increased from 69% to over 88% after just three retraining sessions.

Updated: 2024-05-29 14:23:35

标题: 引入自适应连续对抗训练（ACAT）以增强机器学习鲁棒性

摘要: 对抗训练提高了机器学习（ML）模型对抗性攻击的鲁棒性。然而，在网络/网络安全领域获得带标签的训练和对抗训练数据是具有挑战性和成本高昂的。因此，本信函介绍了自适应连续对抗训练（ACAT）方法，该方法在连续学习会话中使用实际检测到的对抗性数据将对抗训练样本整合到模型中。使用一个垃圾邮件检测数据集的实验结果表明，与传统流程相比，ACAT减少了对抗样本检测所需的时间。此外，在仅进行三次重新训练会话后，受攻击的基于ML的垃圾邮件过滤器的准确性从69%提高到超过88%。

更新时间: 2024-05-29 14:23:35

领域: cs.LG,cs.CR,cs.NI

下载: http://arxiv.org/abs/2403.10461v2

Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis

Recent advancements in neural vocoding are predominantly driven by Generative Adversarial Networks (GANs) operating in the time-domain. While effective, this approach neglects the inductive bias offered by time-frequency representations, resulting in reduntant and computionally-intensive upsampling operations. Fourier-based time-frequency representation is an appealing alternative, aligning more accurately with human auditory perception, and benefitting from well-established fast algorithms for its computation. Nevertheless, direct reconstruction of complex-valued spectrograms has been historically problematic, primarily due to phase recovery issues. This study seeks to close this gap by presenting Vocos, a new model that directly generates Fourier spectral coefficients. Vocos not only matches the state-of-the-art in audio quality, as demonstrated in our evaluations, but it also substantially improves computational efficiency, achieving an order of magnitude increase in speed compared to prevailing time-domain neural vocoding approaches. The source code and model weights have been open-sourced at https://github.com/gemelo-ai/vocos.

Updated: 2024-05-29 14:21:47

标题: Vocos：缩小时域和基于傅立叶的神经声码器之间的差距，实现高质量音频合成

摘要: 最近神经声码器的进展主要是由在时域操作的生成对抗网络（GANs）推动的。虽然有效，这种方法忽视了时频表示提供的归纳偏差，导致冗余和计算密集的上采样操作。基于傅立叶变换的时频表示是一个吸引人的替代方案，更准确地与人类听觉感知相一致，并且受益于其计算的成熟快速算法。然而，直接重建复值谱图历来存在问题，主要是由于相位恢复问题。本研究旨在通过提出Vocos来填补这一差距，这是一个直接生成傅立叶谱系数的新模型。Vocos不仅在我们的评估中展示了与音频质量方面的最新技术匹配，而且还大大提高了计算效率，与当前的时域神经声码器方法相比，速度提高了一个数量级。源代码和模型权重已经在https://github.com/gemelo-ai/vocos 开源。

更新时间: 2024-05-29 14:21:47

领域: cs.SD,cs.LG,eess.AS

下载: http://arxiv.org/abs/2306.00814v3

SEMF: Supervised Expectation-Maximization Framework for Predicting Intervals

This work introduces the Supervised Expectation-Maximization Framework (SEMF), a versatile and model-agnostic framework that generates prediction intervals for datasets with complete or missing data. SEMF extends the Expectation-Maximization (EM) algorithm, traditionally used in unsupervised learning, to a supervised context, enabling it to extract latent representations for uncertainty estimation. The framework demonstrates robustness through extensive empirical evaluation across 11 tabular datasets, achieving$\unicode{x2013}$in some cases$\unicode{x2013}$narrower normalized prediction intervals and higher coverage than traditional quantile regression methods. Furthermore, SEMF integrates seamlessly with existing machine learning algorithms, such as gradient-boosted trees and neural networks, exemplifying its usefulness for real-world applications. The experimental results highlight SEMF's potential to advance state-of-the-art techniques in uncertainty quantification.

Updated: 2024-05-29 14:17:13

标题: SEMF：用于预测区间的监督式期望最大化框架

摘要: 这项工作介绍了监督期望最大化框架（SEMF），这是一个多功能且与模型无关的框架，可为具有完整或缺失数据集生成预测区间。SEMF扩展了传统用于无监督学习的期望最大化（EM）算法，将其应用于监督上下文，使其能够提取用于不确定性估计的潜在表示。该框架通过对11个表格数据集进行广泛的实证评估展现了其鲁棒性，在某些情况下实现了比传统分位数回归方法更窄的标准化预测区间和更高的覆盖率。此外，SEMF与现有的机器学习算法（如梯度提升树和神经网络）无缝集成，展示了其在实际应用中的有用性。实验结果突显了SEMF在不确定性量化方面推进最新技术的潜力。

更新时间: 2024-05-29 14:17:13

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2405.18176v2

Non-Log-Concave and Nonsmooth Sampling via Langevin Monte Carlo Algorithms

We study the problem of approximate sampling from non-log-concave distributions, e.g., Gaussian mixtures, which is often challenging even in low dimensions due to their multimodality. We focus on performing this task via Markov chain Monte Carlo (MCMC) methods derived from discretizations of the overdamped Langevin diffusions, which are commonly known as Langevin Monte Carlo algorithms. Furthermore, we are also interested in two nonsmooth cases for which a large class of proximal MCMC methods have been developed: (i) a nonsmooth prior is considered with a Gaussian mixture likelihood; (ii) a Laplacian mixture distribution. Such nonsmooth and non-log-concave sampling tasks arise from a wide range of applications to Bayesian inference and imaging inverse problems such as image deconvolution. We perform numerical simulations to compare the performance of most commonly used Langevin Monte Carlo algorithms.

Updated: 2024-05-29 14:15:42

标题: 非对数凹和非光滑抽样的Langevin Monte Carlo算法

摘要: 我们研究了从非对数凹分布（例如高斯混合分布）中近似抽样的问题，即使在低维情况下也常常具有挑战性，因为它们具有多峰性。我们专注于通过源自过阻尼朗之万扬散过程的离散化得到的马尔可夫链蒙特卡洛（MCMC）方法来执行这一任务，这些方法通常被称为朗之万蒙特卡洛算法。此外，我们还对两种非光滑情况感兴趣，针对这两种情况已经开发了大量的近端MCMC方法：（i）考虑了一个具有高斯混合似然的非光滑先验；（ii）一个拉普拉斯混合分布。这些非光滑和非对数凹的抽样任务源自广泛的贝叶斯推断和成像反问题的应用，例如图像去卷积。我们进行了数值模拟来比较最常用的朗之万蒙特卡洛算法的性能。

更新时间: 2024-05-29 14:15:42

领域: stat.ML,cs.LG,stat.CO,stat.ME

下载: http://arxiv.org/abs/2305.15988v2

Position: Foundation Agents as the Paradigm Shift for Decision Making

Decision making demands intricate interplay between perception, memory, and reasoning to discern optimal policies. Conventional approaches to decision making face challenges related to low sample efficiency and poor generalization. In contrast, foundation models in language and vision have showcased rapid adaptation to diverse new tasks. Therefore, we advocate for the construction of foundation agents as a transformative shift in the learning paradigm of agents. This proposal is underpinned by the formulation of foundation agents with their fundamental characteristics and challenges motivated by the success of large language models (LLMs). Moreover, we specify the roadmap of foundation agents from large interactive data collection or generation, to self-supervised pretraining and adaptation, and knowledge and value alignment with LLMs. Lastly, we pinpoint critical research questions derived from the formulation and delineate trends for foundation agents supported by real-world use cases, addressing both technical and theoretical aspects to propel the field towards a more comprehensive and impactful future.

Updated: 2024-05-29 14:15:09

标题: 职位：基金会代理人作为决策转变的范式

摘要: 决策需要认知、记忆和推理之间复杂的相互作用，以确定最佳政策。传统的决策方法面临着与样本效率低和泛化能力差有关的挑战。相比之下，语言和视觉领域的基础模型展示了对多样新任务的快速适应能力。因此，我们主张将基础代理构建作为代理学习范式的转变。这一提议基于对基础代理的基本特征和挑战进行了阐述，其动机是大型语言模型（LLMs）的成功。此外，我们详细说明了基础代理的路线图，从大规模互动数据收集或生成，到自监督预训练和适应，以及与LLMs的知识和价值对齐。最后，我们指出了从这一阐述中派生出的关键研究问题，并勾画了支持基础代理的实际用例，解决了推动该领域朝着更全面和有影响力的未来发展的技术和理论方面。

更新时间: 2024-05-29 14:15:09

领域: cs.AI

下载: http://arxiv.org/abs/2405.17009v3

Devil's Advocate: Anticipatory Reflection for LLM Agents

In this work, we introduce a novel approach that equips LLM agents with introspection, enhancing consistency and adaptability in solving complex tasks. Our approach prompts LLM agents to decompose a given task into manageable subtasks (i.e., to make a plan), and to continuously introspect upon the suitability and results of their actions. We implement a three-fold introspective intervention: 1) anticipatory reflection on potential failures and alternative remedy before action execution, 2) post-action alignment with subtask objectives and backtracking with remedy to ensure utmost effort in plan execution, and 3) comprehensive review upon plan completion for future strategy refinement. By deploying and experimenting with this methodology - a zero-shot approach - within WebArena for practical tasks in web environments, our agent demonstrates superior performance over existing zero-shot methods. The experimental results suggest that our introspection-driven approach not only enhances the agent's ability to navigate unanticipated challenges through a robust mechanism of plan execution, but also improves efficiency by reducing the number of trials and plan revisions needed to achieve a task.

Updated: 2024-05-29 14:12:53

标题: 魔鬼的辩护：LLM代理预期性思考

摘要: 在这项工作中，我们引入了一种新颖的方法，为LLM代理提供自省能力，增强解决复杂任务的一致性和适应性。我们的方法促使LLM代理将给定任务分解为可管理的子任务（即制定计划），并不断自省其行动的适宜性和结果。我们实施了三重自省干预：1）在执行行动之前对潜在失败和替代补救进行预期性反思，2）在行动后与子任务目标对齐并回溯补救，以确保最大努力地执行计划，3）在计划完成后进行全面审查，以便未来策略的完善。通过在Web环境中的实际任务中使用这种方法 - 一种零射击方法 - 我们的代理表现出比现有的零射击方法更优越的性能。实验结果表明，我们的自省驱动方法不仅通过计划执行的稳健机制增强了代理商应对意外挑战的能力，还通过减少完成任务所需的试验次数和计划修订次数来提高效率。

更新时间: 2024-05-29 14:12:53

领域: cs.AI

下载: http://arxiv.org/abs/2405.16334v3

Offline Regularised Reinforcement Learning for Large Language Models Alignment

The dominant framework for alignment of large language models (LLM), whether through reinforcement learning from human feedback or direct preference optimisation, is to learn from preference data. This involves building datasets where each element is a quadruplet composed of a prompt, two independent responses (completions of the prompt) and a human preference between the two independent responses, yielding a preferred and a dis-preferred response. Such data is typically scarce and expensive to collect. On the other hand, \emph{single-trajectory} datasets where each element is a triplet composed of a prompt, a response and a human feedback is naturally more abundant. The canonical element of such datasets is for instance an LLM's response to a user's prompt followed by a user's feedback such as a thumbs-up/down. Consequently, in this work, we propose DRO, or \emph{Direct Reward Optimisation}, as a framework and associated algorithms that do not require pairwise preferences. DRO uses a simple mean-squared objective that can be implemented in various ways. We validate our findings empirically, using T5 encoder-decoder language models, and show DRO's performance over selected baselines such as Kahneman-Tversky Optimization (KTO). Thus, we confirm that DRO is a simple and empirically compelling method for single-trajectory policy optimisation.

Updated: 2024-05-29 14:11:29

标题: 离线正则化强化学习用于大型语言模型对齐

摘要: 大型语言模型（LLM）对齐的主要框架，无论是通过强化学习从人类反馈中学习还是直接偏好优化，都是从偏好数据中学习。这涉及构建数据集，其中每个元素都是一个由提示，两个独立响应（提示的完成）和人类对这两个独立响应之间的偏好组成的四元组，产生一个优选和一个不优选的响应。这种数据通常很少且昂贵。另一方面，“单轨迹”数据集，其中每个元素是一个由提示，一个响应和人类反馈组成的三元组，自然更加丰富。这些数据集的经典元素例如是LLM对用户提示的响应，然后是用户的反馈，比如一个大拇指向上/向下。因此，在这项工作中，我们提出了DRO，或称为“直接奖励优化”，作为一个框架和相关算法，不需要成对的偏好。DRO使用一个简单的均方目标，可以以各种方式实现。我们通过使用T5编码器-解码器语言模型在实证上验证了我们的发现，并展示了DRO相对于选择的基线（如Kahneman-Tversky Optimization，KTO）的性能。因此，我们确认DRO是一种简单且在实证上具有说服力的单轨迹策略优化方法。

更新时间: 2024-05-29 14:11:29

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.19107v1

AudioProtoPNet: An interpretable deep learning model for bird sound classification

Recently, scientists have proposed several deep learning models to monitor the diversity of bird species. These models can detect bird species with high accuracy by analyzing acoustic signals. However, traditional deep learning algorithms are black-box models that provide no insight into their decision-making process. For domain experts, such as ornithologists, it is crucial that these models are not only efficient, but also interpretable in order to be used as assistive tools. In this study, we present an adaption of the Prototypical Part Network (ProtoPNet) for audio classification that provides inherent interpretability through its model architecture. Our approach is based on a ConvNeXt backbone architecture for feature extraction and learns prototypical patterns for each bird species using spectrograms of the training data. Classification of new data is done by comparison with these prototypes in latent space, which simultaneously serve as easily understandable explanations for the model's decisions. We evaluated the performance of our model on seven different datasets representing bird species from different geographical regions. In our experiments, the model showed excellent results, achieving an average AUROC of 0.82 and an average cmAP of 0.37 across the seven datasets, making it comparable to state-of-the-art black-box models for bird sound classification. Thus, this work demonstrates that even for the challenging task of bioacoustic bird classification, powerful yet interpretable deep learning models can be developed to provide valuable insights to domain experts.

Updated: 2024-05-29 14:09:17

标题: AudioProtoPNet：一种用于鸟类声音分类的可解释深度学习模型

摘要: 最近，科学家们提出了几种深度学习模型来监测鸟类物种的多样性。这些模型可以通过分析声学信号准确地检测鸟类物种。然而，传统的深度学习算法是黑盒模型，不提供其决策过程的洞察。对于鸟类学家等领域专家来说，这些模型不仅需要高效，还需要可解释性，以便作为辅助工具使用。在这项研究中，我们提出了一种适用于音频分类的原型部分网络（ProtoPNet）的改进版本，通过其模型架构提供固有的可解释性。我们的方法基于ConvNeXt主干架构进行特征提取，并利用训练数据的频谱图为每个鸟类物种学习原型模式。新数据的分类通过与这些原型在潜在空间中的比较来完成，这同时作为模型决策的易于理解的解释。我们在代表不同地理区域鸟类物种的七个不同数据集上评估了我们模型的性能。在我们的实验中，模型表现出色，实现了七个数据集的平均AUROC为0.82，平均cmAP为0.37，与鸟类声音分类的最先进的黑盒模型相当。因此，这项工作表明，即使对于具有挑战性的生物声学鸟类分类任务，也可以开发出功能强大且可解释的深度学习模型，为领域专家提供有价值的见解。

更新时间: 2024-05-29 14:09:17

领域: cs.LG

下载: http://arxiv.org/abs/2404.10420v2

Voice Jailbreak Attacks Against GPT-4o

Recently, the concept of artificial assistants has evolved from science fiction into real-world applications. GPT-4o, the newest multimodal large language model (MLLM) across audio, vision, and text, has further blurred the line between fiction and reality by enabling more natural human-computer interactions. However, the advent of GPT-4o's voice mode may also introduce a new attack surface. In this paper, we present the first systematic measurement of jailbreak attacks against the voice mode of GPT-4o. We show that GPT-4o demonstrates good resistance to forbidden questions and text jailbreak prompts when directly transferring them to voice mode. This resistance is primarily due to GPT-4o's internal safeguards and the difficulty of adapting text jailbreak prompts to voice mode. Inspired by GPT-4o's human-like behaviors, we propose VoiceJailbreak, a novel voice jailbreak attack that humanizes GPT-4o and attempts to persuade it through fictional storytelling (setting, character, and plot). VoiceJailbreak is capable of generating simple, audible, yet effective jailbreak prompts, which significantly increases the average attack success rate (ASR) from 0.033 to 0.778 in six forbidden scenarios. We also conduct extensive experiments to explore the impacts of interaction steps, key elements of fictional writing, and different languages on VoiceJailbreak's effectiveness and further enhance the attack performance with advanced fictional writing techniques. We hope our study can assist the research community in building more secure and well-regulated MLLMs.

Updated: 2024-05-29 14:07:44

标题: 对GPT-4o的Voice Jailbreak攻击

摘要: 最近，人工助理的概念已经从科幻小说发展到现实世界的应用。GPT-4o是最新的跨音频、视觉和文本的多模态大型语言模型（MLLM），通过实现更自然的人机交互，进一步模糊了小说和现实之间的界限。然而，GPT-4o的语音模式的出现也可能引入新的攻击面。在本文中，我们首次系统地对针对GPT-4o语音模式的越狱攻击进行了测量。我们展示了当直接将禁止问题和文本越狱提示转换为语音模式时，GPT-4o对此表现出良好的抵抗力。这种抵抗主要是由于GPT-4o的内部保障措施以及将文本越狱提示调整到语音模式的困难。受GPT-4o类人行为的启发，我们提出了VoiceJailbreak，一种通过虚构故事（背景、角色和情节）使GPT-4o人性化并试图说服其的新型语音越狱攻击。VoiceJailbreak能够生成简单、可听、但有效的越狱提示，将六种禁止场景的平均攻击成功率（ASR）从0.033提高到0.778。我们还进行了大量实验，探讨了互动步骤、虚构写作的关键元素和不同语言对VoiceJailbreak效果的影响，并通过先进的虚构写作技术进一步提高了攻击性能。我们希望我们的研究能帮助研究社区建立更安全、更规范的MLLMs。

更新时间: 2024-05-29 14:07:44

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2405.19103v1

High-Performance Hybrid Algorithm for Minimum Sum-of-Squares Clustering of Infinitely Tall Data

This paper introduces a novel formulation of the clustering problem, namely the Minimum Sum-of-Squares Clustering of Infinitely Tall Data (MSSC-ITD), and presents HPClust, an innovative set of hybrid parallel approaches for its effective solution. By utilizing modern high-performance computing techniques, HPClust enhances key clustering metrics: effectiveness, computational efficiency, and scalability. In contrast to vanilla data parallelism, which only accelerates processing time through the MapReduce framework, our approach unlocks superior performance by leveraging the multi-strategy competitive-cooperative parallelism and intricate properties of the objective function landscape. Unlike other available algorithms that struggle to scale, our algorithm is inherently parallel in nature, improving solution quality through increased scalability and parallelism, and outperforming even advanced algorithms designed for small and medium-sized datasets. Our evaluation of HPClust, featuring four parallel strategies, demonstrates its superiority over traditional and cutting-edge methods by offering better performance in the key metrics. These results also show that parallel processing not only enhances the clustering efficiency, but the accuracy as well. Additionally, we explore the balance between computational efficiency and clustering quality, providing insights into optimal parallel strategies based on dataset specifics and resource availability. This research advances our understanding of parallelism in clustering algorithms, demonstrating that a judicious hybridization of advanced parallel approaches yields optimal results for MSSC-ITD. Experiments on synthetic data further confirm HPClust's exceptional scalability and robustness to noise.

Updated: 2024-05-29 14:07:17

标题: 高性能混合算法用于无穷高数据的最小平方和聚类

摘要: 本文介绍了一种新颖的聚类问题形式，即无限高数据的最小平方和聚类（MSSC-ITD），并提出了HPClust，一组创新的混合并行方法，用于有效解决该问题。通过利用现代高性能计算技术，HPClust提高了关键的聚类指标：有效性、计算效率和可扩展性。与仅通过MapReduce框架加速处理时间的普通数据并行相比，我们的方法通过利用多策略竞争合作并行和目标函数景观的复杂特性，解锁了更优越的性能。与其他现有的算法不同，这些算法往往难以扩展，我们的算法在本质上是并行的，通过增加可扩展性和并行性来改善解决方案质量，并且甚至胜过为小型和中型数据集设计的先进算法。我们对HPClust的评估，包括四种并行策略，证明了其在关键指标上比传统和尖端方法更优越。这些结果还表明，并行处理不仅提高了聚类效率，也提高了准确性。此外，我们探讨了计算效率和聚类质量之间的平衡，根据数据集的具体特性和资源可用性，提供了关于最佳并行策略的见解。这项研究推进了我们对聚类算法中并行性的理解，表明对MSSC-ITD采用先进并行方法的审慎混合可以产生最佳结果。对合成数据的实验进一步证实了HPClust在可扩展性和对噪声的稳健性方面的杰出表现。

更新时间: 2024-05-29 14:07:17

领域: cs.DC,cs.LG,math.OC

下载: http://arxiv.org/abs/2311.04517v3

Poseidon: Efficient Foundation Models for PDEs

We introduce Poseidon, a foundation model for learning the solution operators of PDEs. It is based on a multiscale operator transformer, with time-conditioned layer norms that enable continuous-in-time evaluations. A novel training strategy leveraging the semi-group property of time-dependent PDEs to allow for significant scaling-up of the training data is also proposed. Poseidon is pretrained on a diverse, large scale dataset for the governing equations of fluid dynamics. It is then evaluated on a suite of 15 challenging downstream tasks that include a wide variety of PDE types and operators. We show that Poseidon exhibits excellent performance across the board by outperforming baselines significantly, both in terms of sample efficiency and accuracy. Poseidon also generalizes very well to new physics that is not seen during pretraining. Moreover, Poseidon scales with respect to model and data size, both for pretraining and for downstream tasks. Taken together, our results showcase the surprising ability of Poseidon to learn effective representations from a very small set of PDEs during pretraining in order to generalize well to unseen and unrelated PDEs downstream, demonstrating its potential as an effective, general purpose PDE foundation model. Finally, the Poseidon model as well as underlying pretraining and downstream datasets are open sourced, with code being available at https://github.com/camlab-ethz/poseidon and pretrained models and datasets at https://huggingface.co/camlab-ethz.

Updated: 2024-05-29 14:06:51

标题: 波塞冬：偏微分方程高效基础模型

摘要: 我们引入Poseidon，这是一个用于学习PDE解算子的基础模型。它基于一个多尺度操作器变换器，具有时间条件的层规范，可以实现连续的时间评估。我们还提出了一种新颖的训练策略，利用时间依赖PDE的半群性质，允许显著扩展训练数据。Poseidon在一个多样化的大规模数据集上进行了预训练，用于流体动力学的控制方程。然后在包括各种PDE类型和操作器的15个具有挑战性的下游任务上进行评估。我们展示了Poseidon在所有方面都表现出色，通过显著优于基准线，无论是在样本效率还是准确性方面。Poseidon还非常好地泛化到预训练期间没有见过的新物理。此外，Poseidon在模型和数据大小方面具有可伸缩性，无论是用于预训练还是下游任务。综合考虑，我们的结果展示了Poseidon从一组非常少量的PDE中学习有效表示的惊人能力，以便在下游泛化到看不见的和不相关的PDE，展示了其作为一个有效的通用PDE基础模型的潜力。最后，Poseidon模型以及底层的预训练和下游数据集都是开源的，代码可在https://github.com/camlab-ethz/poseidon 上找到，预训练模型和数据集可在 https://huggingface.co/camlab-ethz 上找到。

更新时间: 2024-05-29 14:06:51

领域: cs.LG

下载: http://arxiv.org/abs/2405.19101v1

DataSafe: Copyright Protection with PUF Watermarking and Blockchain Tracking

Digital watermarking methods are commonly used to safeguard digital media copyrights by confirming ownership and deterring unauthorized use. However, without reliable third-party oversight, these methods risk security vulnerabilities during watermark extraction. Furthermore, digital media lacks tangible ownership attributes, posing challenges for secure copyright transfer and tracing. This study introduces DataSafe, a copyright protection scheme that combines physical unclonable functions (PUFs) and blockchain technology. PUF devices use their unique fingerprints for blockchain registration. Subsequently, these devices incorporate invisible watermarking techniques to embed digital watermarks into media for copyright protection. The watermark verification process is confined within the devices, preserving confidentiality during extraction, validating identities during copyright exchanges, and facilitating blockchain-based traceability of copyright transfers. The implementation of a prototype system on the LPC55S69-EVK development board is detailed, illustrating the practicality and effectiveness of the proposed solution.

Updated: 2024-05-29 14:05:19

标题: DataSafe：采用PUF数字水印和区块链追踪的版权保护

摘要: 数字水印技术通常用于保护数字媒体版权，确认所有权并阻止未经授权的使用。然而，在没有可靠的第三方监督的情况下，这些方法在提取水印过程中存在安全漏洞的风险。此外，数字媒体缺乏有形的所有权属性，给安全版权转让和追踪带来挑战。本研究介绍了DataSafe，一种结合了物理不可克隆功能（PUFs）和区块链技术的版权保护方案。PUF设备使用其独特的指纹进行区块链注册。随后，这些设备结合了隐形水印技术，将数字水印嵌入媒体以进行版权保护。水印验证过程在设备内部进行，保护了提取过程中的保密性，验证了版权交换中的身份，并促进了基于区块链的版权转让的可追溯性。在LPC55S69-EVK开发板上实现了一个原型系统，详细展示了所提出解决方案的实用性和有效性。

更新时间: 2024-05-29 14:05:19

领域: cs.CR

下载: http://arxiv.org/abs/2405.19099v1

Efficient Black-box Adversarial Attacks via Bayesian Optimization Guided by a Function Prior

This paper studies the challenging black-box adversarial attack that aims to generate adversarial examples against a black-box model by only using output feedback of the model to input queries. Some previous methods improve the query efficiency by incorporating the gradient of a surrogate white-box model into query-based attacks due to the adversarial transferability. However, the localized gradient is not informative enough, making these methods still query-intensive. In this paper, we propose a Prior-guided Bayesian Optimization (P-BO) algorithm that leverages the surrogate model as a global function prior in black-box adversarial attacks. As the surrogate model contains rich prior information of the black-box one, P-BO models the attack objective with a Gaussian process whose mean function is initialized as the surrogate model's loss. Our theoretical analysis on the regret bound indicates that the performance of P-BO may be affected by a bad prior. Therefore, we further propose an adaptive integration strategy to automatically adjust a coefficient on the function prior by minimizing the regret bound. Extensive experiments on image classifiers and large vision-language models demonstrate the superiority of the proposed algorithm in reducing queries and improving attack success rates compared with the state-of-the-art black-box attacks. Code is available at https://github.com/yibo-miao/PBO-Attack.

Updated: 2024-05-29 14:05:16

标题: 通过由函数先验引导的贝叶斯优化实现高效的黑盒对抗攻击

摘要: 这篇论文研究了挑战性的黑盒对抗攻击，旨在通过仅使用模型的输出反馈到输入查询来生成针对黑盒模型的对抗样本。一些先前的方法通过将白盒模型的梯度整合到基于查询的攻击中来提高查询效率，这是由于对抗性可转移性。然而，局部梯度信息还不够充分，使得这些方法仍然需要大量查询。在本文中，我们提出了一种Prior-guided Bayesian Optimization（P-BO）算法，该算法利用替代模型作为全局函数先验在黑盒对抗攻击中。由于替代模型包含黑盒模型的丰富先验信息，P-BO使用高斯过程来建模攻击目标，其均值函数初始化为替代模型的损失。我们对遗憾界进行了理论分析，表明P-BO的性能可能会受到不良先验的影响。因此，我们进一步提出了一种自适应整合策略，通过最小化遗憾界来自动调整函数先验上的系数。对图像分类器和大型视觉语言模型进行了大量实验，结果表明与最先进的黑盒攻击相比，所提出的算法在减少查询次数和提高攻击成功率方面具有优势。代码可在https://github.com/yibo-miao/PBO-Attack上找到。

更新时间: 2024-05-29 14:05:16

领域: cs.LG,cs.AI,cs.CR,cs.CV,stat.ML

下载: http://arxiv.org/abs/2405.19098v1

Membership Inference on Text-to-Image Diffusion Models via Conditional Likelihood Discrepancy

Text-to-image diffusion models have achieved tremendous success in the field of controllable image generation, while also coming along with issues of privacy leakage and data copyrights. Membership inference arises in these contexts as a potential auditing method for detecting unauthorized data usage. While some efforts have been made on diffusion models, they are not applicable to text-to-image diffusion models due to the high computation overhead and enhanced generalization capabilities. In this paper, we first identify a conditional overfitting phenomenon in text-to-image diffusion models, indicating that these models tend to overfit the conditional distribution of images given the text rather than the marginal distribution of images. Based on this observation, we derive an analytical indicator, namely Conditional Likelihood Discrepancy (CLiD), to perform membership inference, which reduces the stochasticity in estimating the memorization of individual samples. Experimental results demonstrate that our method significantly outperforms previous methods across various data distributions and scales. Additionally, our method shows superior resistance to overfitting mitigation strategies such as early stopping and data augmentation.

Updated: 2024-05-29 14:03:41

标题: 通过条件似然差异在文本到图像扩散模型中进行成员推断

摘要: 文本到图像扩散模型在可控图像生成领域取得了巨大成功，但同时也伴随着隐私泄露和数据版权等问题。在这些情况下，成员推断作为一种潜在的审计方法出现，用于检测未经授权的数据使用。虽然一些努力已经在扩散模型上进行，但由于高计算开销和增强的泛化能力，这些方法并不适用于文本到图像扩散模型。本文首先确定了文本到图像扩散模型中的条件过拟合现象，表明这些模型倾向于过拟合给定文本的图像条件分布，而不是图像的边际分布。基于这一观察，我们推导出一个分析指标，即条件似然差异（CLiD），用于执行成员推断，从而减少估计单个样本记忆的随机性。实验结果表明，我们的方法在各种数据分布和规模上明显优于先前的方法。此外，我们的方法显示出对过拟合缓解策略（如提前停止和数据增强）具有更强的抵抗力。

更新时间: 2024-05-29 14:03:41

领域: cs.CR,cs.CV

下载: http://arxiv.org/abs/2405.14800v2

Black-Box Access is Insufficient for Rigorous AI Audits

External audits of AI systems are increasingly recognized as a key mechanism for AI governance. The effectiveness of an audit, however, depends on the degree of access granted to auditors. Recent audits of state-of-the-art AI systems have primarily relied on black-box access, in which auditors can only query the system and observe its outputs. However, white-box access to the system's inner workings (e.g., weights, activations, gradients) allows an auditor to perform stronger attacks, more thoroughly interpret models, and conduct fine-tuning. Meanwhile, outside-the-box access to training and deployment information (e.g., methodology, code, documentation, data, deployment details, findings from internal evaluations) allows auditors to scrutinize the development process and design more targeted evaluations. In this paper, we examine the limitations of black-box audits and the advantages of white- and outside-the-box audits. We also discuss technical, physical, and legal safeguards for performing these audits with minimal security risks. Given that different forms of access can lead to very different levels of evaluation, we conclude that (1) transparency regarding the access and methods used by auditors is necessary to properly interpret audit results, and (2) white- and outside-the-box access allow for substantially more scrutiny than black-box access alone.

Updated: 2024-05-29 13:56:29

标题: 黑盒访问对严格的人工智能审计是不够的

摘要: 人们越来越认识到对人工智能系统进行外部审计是人工智能治理的关键机制。然而，审计的有效性取决于审计员被授予的访问权限。最近对最先进的人工智能系统的审计主要依赖于黑匣子访问，其中审计员只能查询系统并观察其输出。然而，对系统内部运作的白匣子访问（例如权重、激活、梯度）允许审计员进行更强的攻击，更深入地解释模型，并进行微调。与此同时，对训练和部署信息的超越匣子访问（例如方法论、代码、文档、数据、部署细节、内部评估结果）允许审计员审查开发过程并设计更有针对性的评估。在本文中，我们探讨了黑匣子审计的局限性以及白匣子和超越匣子审计的优势。我们还讨论了执行这些审计时的技术、物理和法律保障，以降低安全风险。考虑到不同形式的访问可能导致非常不同的评估水平，我们得出结论：(1) 透明度是必要的，以便正确解释审计结果所使用的访问和方法，以及 (2) 白匣子和超越匣子访问允许进行比单独使用黑匣子访问更深入的审查。

更新时间: 2024-05-29 13:56:29

领域: cs.CY,cs.AI,cs.CR

下载: http://arxiv.org/abs/2401.14446v3

Multimodal Unsupervised Domain Generalization by Retrieving Across the Modality Gap

Domain generalization (DG) is an important problem that learns a model which generalizes to unseen test domains leveraging one or more source domains, under the assumption of shared label spaces. However, most DG methods assume access to abundant source data in the target label space, a requirement that proves overly stringent for numerous real-world applications, where acquiring the same label space as the target task is prohibitively expensive. For this setting, we tackle the multimodal version of the unsupervised domain generalization (MUDG) problem, which uses a large task-agnostic unlabeled source dataset during finetuning. Our framework does not explicitly assume any relationship between the source dataset and target task. Instead, it relies only on the premise that the source dataset can be accurately and efficiently searched in a joint vision-language space. We make three contributions in the MUDG setting. Firstly, we show theoretically that cross-modal approximate nearest neighbor search suffers from low recall due to the large distance between text queries and the image centroids used for coarse quantization. Accordingly, we propose paired k-means, a simple clustering algorithm that improves nearest neighbor recall by storing centroids in query space instead of image space. Secondly, we propose an adaptive text augmentation scheme for target labels designed to improve zero-shot accuracy and diversify retrieved image data. Lastly, we present two simple but effective components to further improve downstream target accuracy. We compare against state-of-the-art name-only transfer, source-free DG and zero-shot (ZS) methods on their respective benchmarks and show consistent improvement in accuracy on 20 diverse datasets. Code is available: https://github.com/Chris210634/mudg

Updated: 2024-05-29 13:56:14

标题: 跨模态间隙检索的多模态无监督领域泛化

摘要: 域泛化（DG）是一个重要问题，它学习一个模型，能够在未见过的测试域中泛化，利用一个或多个源域，假设它们共享标签空间。然而，大多数DG方法假设可以访问目标标签空间中丰富的源数据，这一要求对许多现实应用来说过于严格，因为获取与目标任务相同的标签空间成本过高。针对这种情况，我们解决了无监督域泛化（MUDG）问题的多模态版本，利用一个大型任务无关的未标记源数据集进行微调。我们的框架没有明确假设源数据集和目标任务之间的任何关系。相反，它只依赖于一个前提，即源数据集可以在一个联合的视觉-语言空间中被准确而高效地搜索。在MUDG设置中，我们做出了三个贡献。首先，我们理论上表明，跨模态的近似最近邻搜索由于文本查询与用于粗量化的图像质心之间的距离较大而导致召回率低。因此，我们提出了一种简单的聚类算法，即配对k-means，通过在查询空间而不是图像空间中存储质心来改善最近邻召回率。其次，我们提出了一种适应性文本增强方案，用于改善零样本准确性并使检索到的图像数据多样化。最后，我们提出了两个简单但有效的组件，以进一步提高下游目标准确性。我们在各自的基准上与最先进的仅名称传输、无源域DG和零样本（ZS）方法进行比较，并在20个不同的数据集上展示了准确性的一致提升。代码可在https://github.com/Chris210634/mudg找到。

更新时间: 2024-05-29 13:56:14

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2402.04416v2

Faithful Chart Summarization with ChaTS-Pi

Chart-to-summary generation can help explore data, communicate insights, and help the visually impaired people. Multi-modal generative models have been used to produce fluent summaries, but they can suffer from factual and perceptual errors. In this work we present CHATS-CRITIC, a reference-free chart summarization metric for scoring faithfulness. CHATS-CRITIC is composed of an image-to-text model to recover the table from a chart, and a tabular entailment model applied to score the summary sentence by sentence. We find that CHATS-CRITIC evaluates the summary quality according to human ratings better than reference-based metrics, either learned or n-gram based, and can be further used to fix candidate summaries by removing not supported sentences. We then introduce CHATS-PI, a chart-to-summary pipeline that leverages CHATS-CRITIC during inference to fix and rank sampled candidates from any chart-summarization model. We evaluate CHATS-PI and CHATS-CRITIC using human raters, establishing state-of-the-art results on two popular chart-to-summary datasets.

Updated: 2024-05-29 13:55:06

标题: 用ChaTS-Pi进行忠实的图表总结

摘要: 图表到摘要生成可以帮助探索数据，传达见解，并帮助视觉受损的人。多模态生成模型已被用来生成流畅的摘要，但它们可能存在事实和感知错误。在这项工作中，我们提出了CHATS-CRITIC，一个无参考的图表摘要评分度量，用于评估忠实度。CHATS-CRITIC由一个图像到文本模型和一个用于逐句评分总结句子的表格蕴涵模型组成。我们发现，CHATS-CRITIC能够根据人类评分更好地评估摘要质量，优于基于参考的度量，无论是学习的还是n-gram基于的，并且可以进一步用于通过删除不支持的句子修复候选摘要。然后我们介绍了CHATS-PI，一个图表到摘要的流水线，利用CHATS-CRITIC在推理过程中修复和排名从任何图表摘要模型中采样的候选项。我们使用人工评分者评估CHATS-PI和CHATS-CRITIC，在两个流行的图表到摘要数据集上建立了最新颖的结果。

更新时间: 2024-05-29 13:55:06

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.19094v1

MRCpy: A Library for Minimax Risk Classifiers

Libraries for supervised classification have enabled the wide-spread usage of machine learning methods. Existing libraries, such as scikit-learn, caret, and mlpack, implement techniques based on the classical empirical risk minimization (ERM) approach. We present a Python library, MRCpy, that implements minimax risk classifiers (MRCs) based on the robust risk minimization (RRM) approach. The library offers multiple variants of MRCs that can provide performance guarantees, enable efficient learning in high dimensions, and adapt to distribution shifts. MRCpy follows an object-oriented approach and adheres to the standards of popular Python libraries, such as scikit-learn, facilitating readability and easy usage together with a seamless integration with other libraries. The source code is available under the GPL-3.0 license at https://github.com/MachineLearningBCAM/MRCpy.

Updated: 2024-05-29 13:51:15

标题: MRCpy：一种用于最小化风险分类器的库

摘要: 用于监督分类的图书馆已经实现了机器学习方法的广泛应用。现有的图书馆，如scikit-learn、caret和mlpack，实现了基于经典经验风险最小化（ERM）方法的技术。我们介绍了一个名为MRCpy的Python图书馆，它实现了基于鲁棒风险最小化（RRM）方法的极小风险分类器（MRCs）。该图书馆提供多个MRCs的变体，可以提供性能保证，实现高维度下的高效学习，并适应分布变化。MRCpy采用面向对象的方法，遵循流行的Python图书馆标准，如scikit-learn，提高可读性并易于使用，并与其他图书馆无缝集成。源代码在GPL-3.0许可下可在https://github.com/MachineLearningBCAM/MRCpy找到。

更新时间: 2024-05-29 13:51:15

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2108.01952v4

Patch-enhanced Mask Encoder Prompt Image Generation

Artificial Intelligence Generated Content(AIGC), known for its superior visual results, represents a promising mitigation method for high-cost advertising applications. Numerous approaches have been developed to manipulate generated content under different conditions. However, a crucial limitation lies in the accurate description of products in advertising applications. Applying previous methods directly may lead to considerable distortion and deformation of advertised products, primarily due to oversimplified content control conditions. Hence, in this work, we propose a patch-enhanced mask encoder approach to ensure accurate product descriptions while preserving diverse backgrounds. Our approach consists of three components Patch Flexible Visibility, Mask Encoder Prompt Adapter and an image Foundation Model. Patch Flexible Visibility is used for generating a more reasonable background image. Mask Encoder Prompt Adapter enables region-controlled fusion. We also conduct an analysis of the structure and operational mechanisms of the Generation Module. Experimental results show our method can achieve the highest visual results and FID scores compared with other methods.

Updated: 2024-05-29 13:47:32

标题: Patch-enhanced Mask编码提示图像生成

摘要: 人工智能生成内容（AIGC）以其优越的视觉效果而闻名，是一种应用于高成本广告应用的有前途的缓解方法。已经开发了许多方法来在不同条件下操作生成的内容。然而，在广告应用中准确描述产品是一个关键限制。直接应用先前的方法可能会导致广告产品的严重扭曲和变形，主要是由于过于简化的内容控制条件。因此，在这项工作中，我们提出了一种增强贴片掩膜编码器方法，以确保准确描述产品同时保留多样化的背景。我们的方法包括三个组件：贴片灵活可见性、掩膜编码器提示适配器和图像基础模型。贴片灵活可见性用于生成更合理的背景图像。掩膜编码器提示适配器实现了区域控制融合。我们还对生成模块的结构和操作机制进行了分析。实验结果显示，与其他方法相比，我们的方法可以实现最高的视觉效果和FID分数。

更新时间: 2024-05-29 13:47:32

领域: cs.AI,cs.CV

下载: http://arxiv.org/abs/2405.19085v1

Learning Better Representations From Less Data For Propositional Satisfiability

Training neural networks on NP-complete problems typically demands very large amounts of training data and often needs to be coupled with computationally expensive symbolic verifiers to ensure output correctness. In this paper, we present NeuRes, a neuro-symbolic approach to address both challenges for propositional satisfiability, being the quintessential NP-complete problem. By combining certificate-driven training and expert iteration, our model learns better representations than models trained for classification only, with a much higher data efficiency -- requiring orders of magnitude less training data. NeuRes employs propositional resolution as a proof system to generate proofs of unsatisfiability and to accelerate the process of finding satisfying truth assignments, exploring both possibilities in parallel. To realize this, we propose an attention-based architecture that autoregressively selects pairs of clauses from a dynamic formula embedding to derive new clauses. Furthermore, we employ expert iteration whereby model-generated proofs progressively replace longer teacher proofs as the new ground truth. This enables our model to reduce a dataset of proofs generated by an advanced solver by ~32% after training on it with no extra guidance. This shows that NeuRes is not limited by the optimality of the teacher algorithm owing to its self-improving workflow. We show that our model achieves far better performance than NeuroSAT in terms of both correctly classified and proven instances.

Updated: 2024-05-29 13:47:14

标题: 从更少的数据中学习更好的表示方法用于命题可满足性

摘要: 在NP完全问题上训练神经网络通常需要大量的训练数据，并且经常需要与计算上昂贵的符号验证器相结合，以确保输出的正确性。在本文中，我们提出NeuRes，这是一种神经符号方法，用于解决命题可满足性这一典型的NP完全问题。通过结合基于证书驱动的训练和专家迭代，我们的模型学习到比仅用于分类训练的模型更好的表示，且数据效率更高--需要数量级更少的训练数据。NeuRes采用命题分辨作为一个证明系统来生成不可满足性的证明，并加速找到满足真值赋值的过程，同时并行地探索这两种可能性。为了实现这一点，我们提出了一个基于注意力的架构，自回归地从动态公式嵌入中选择子句对来推导新的子句。此外，我们采用专家迭代，其中模型生成的证明逐渐取代较长的老师证明作为新的基准事实。这使得我们的模型在没有额外指导的情况下，能够将一个由高级求解器生成的证明数据集减少约32%。这表明NeuRes不受老师算法的最优性限制，因为它具有自我改进的工作流程。我们展示了我们的模型在正确分类和证明实例方面比NeuroSAT表现得好得多。

更新时间: 2024-05-29 13:47:14

领域: cs.LG,cs.LO

下载: http://arxiv.org/abs/2402.08365v2

Prompt Fuzzing for Fuzz Driver Generation

Crafting high-quality fuzz drivers not only is time-consuming but also requires a deep understanding of the library. However, the state-of-the-art automatic fuzz driver generation techniques fall short of expectations. While fuzz drivers derived from consumer code can reach deep states, they have limited coverage. Conversely, interpretative fuzzing can explore most API calls but requires numerous attempts within a large search space. We propose PromptFuzz, a coverage-guided fuzzer for prompt fuzzing that iteratively generates fuzz drivers to explore undiscovered library code. To explore API usage in fuzz drivers during prompt fuzzing, we propose several key techniques: instructive program generation, erroneous program validation, coverage-guided prompt mutation, and constrained fuzzer scheduling. We implemented PromptFuzz and evaluated it on 14 real-world libraries. Compared with OSS-Fuzz and Hopper (the state-of-the-art fuzz driver generation tool), fuzz drivers generated by PromptFuzz achieved 1.61 and 1.63 times higher branch coverage than those by OSS-Fuzz and Hopper, respectively. Moreover, the fuzz drivers generated by PromptFuzz detected 33 genuine, new bugs out of a total of 49 crashes, out of which 30 bugs have been confirmed by their respective communities.

Updated: 2024-05-29 13:42:18

标题: 快速模糊测试用于生成模糊驱动程序

摘要: 制作高质量的模糊驱动程序不仅耗时，还需要对库的深入理解。然而，目前自动生成模糊驱动程序的技术并未达到预期。虽然从消费者代码中派生的模糊驱动程序可以达到深层状态，但覆盖范围有限。相反，解释性模糊测试可以探索大多数API调用，但需要在庞大的搜索空间中进行大量尝试。我们提出PromptFuzz，一种用于即时模糊测试的覆盖引导模糊器，通过迭代生成模糊驱动程序来探索未发现的库代码。为了在即时模糊测试过程中探索模糊驱动程序中的API使用，我们提出了几种关键技术：指导性程序生成，错误程序验证，覆盖引导的即时变异和受限的模糊器调度。我们实现了PromptFuzz，并在14个实际库上进行了评估。与OSS-Fuzz和Hopper（最先进的模糊驱动程序生成工具）相比，PromptFuzz生成的模糊驱动程序分别比OSS-Fuzz和Hopper的分支覆盖率高1.61和1.63倍。此外，PromptFuzz生成的模糊驱动程序在49次崩溃中检测到33个真实的新错误，其中30个错误已被各自社区确认。

更新时间: 2024-05-29 13:42:18

领域: cs.CR,cs.SE

下载: http://arxiv.org/abs/2312.17677v2

Beyond the Limits: A Survey of Techniques to Extend the Context Length in Large Language Models

Recently, large language models (LLMs) have shown remarkable capabilities including understanding context, engaging in logical reasoning, and generating responses. However, this is achieved at the expense of stringent computational and memory requirements, hindering their ability to effectively support long input sequences. This survey provides an inclusive review of the recent techniques and methods devised to extend the sequence length in LLMs, thereby enhancing their capacity for long-context understanding. In particular, we review and categorize a wide range of techniques including architectural modifications, such as modified positional encoding and altered attention mechanisms, which are designed to enhance the processing of longer sequences while avoiding a proportional increase in computational requirements. The diverse methodologies investigated in this study can be leveraged across different phases of LLMs, i.e., training, fine-tuning and inference. This enables LLMs to efficiently process extended sequences. The limitations of the current methodologies is discussed in the last section along with the suggestions for future research directions, underscoring the importance of sequence length in the continued advancement of LLMs.

Updated: 2024-05-29 13:38:25

标题: 超越极限：大型语言模型中扩展上下文长度的技术调查

摘要: 最近，大型语言模型(LLMs)展示出了令人瞩目的能力，包括理解上下文、进行逻辑推理和生成响应。然而，这是以严格的计算和内存需求为代价实现的，这阻碍了它们有效支持长输入序列的能力。本调查提供了对最近设计的用于扩展LLMs序列长度的技术和方法的全面审查，从而增强了它们对长上下文理解的能力。具体来说，我们审查并分类了一系列广泛的技术，包括建筑修改，比如修改后的位置编码和改变的注意机制，旨在增强对更长序列的处理，同时避免计算需求的成比例增加。本研究调查的多样化方法可以在LLMs的不同阶段，即训练、微调和推断中加以利用。这使LLMs能够高效处理扩展的序列。当前方法的局限性在最后一节中进行了讨论，同时提出了未来研究方向的建议，强调了序列长度对LLMs持续进步的重要性。

更新时间: 2024-05-29 13:38:25

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2402.02244v3

OMPO: A Unified Framework for RL under Policy and Dynamics Shifts

Training reinforcement learning policies using environment interaction data collected from varying policies or dynamics presents a fundamental challenge. Existing works often overlook the distribution discrepancies induced by policy or dynamics shifts, or rely on specialized algorithms with task priors, thus often resulting in suboptimal policy performances and high learning variances. In this paper, we identify a unified strategy for online RL policy learning under diverse settings of policy and dynamics shifts: transition occupancy matching. In light of this, we introduce a surrogate policy learning objective by considering the transition occupancy discrepancies and then cast it into a tractable min-max optimization problem through dual reformulation. Our method, dubbed Occupancy-Matching Policy Optimization (OMPO), features a specialized actor-critic structure equipped with a distribution discriminator and a small-size local buffer. We conduct extensive experiments based on the OpenAI Gym, Meta-World, and Panda Robots environments, encompassing policy shifts under stationary and nonstationary dynamics, as well as domain adaption. The results demonstrate that OMPO outperforms the specialized baselines from different categories in all settings. We also find that OMPO exhibits particularly strong performance when combined with domain randomization, highlighting its potential in RL-based robotics applications

Updated: 2024-05-29 13:36:36

标题: OMPO: 一个统一的框架，用于处理策略和动态变化下的强化学习

摘要: 使用从不同策略或动态中收集的环境交互数据来训练强化学习策略存在一项基本挑战。现有的工作往往忽视由策略或动态转变引起的分布差异，或依赖于带有任务先验的专门算法，因此通常导致次优策略性能和高学习方差。在本文中，我们确定了一个统一的策略，用于在线RL策略学习在不同策略和动态转变的情况下：转换占用匹配。基于此，我们通过考虑转换占用的不一致性引入了一个替代策略学习目标，然后通过双重重构将其转化为一个可处理的极小-极大优化问题。我们的方法，名为占用匹配策略优化（OMPO），具有一个配备分布鉴别器和一个小型本地缓冲区的专门的演员-评论家结构。我们基于OpenAI Gym、Meta-World和熊猫机器人环境进行了广泛的实验，涵盖了在静态和非静态动态下的策略转变，以及领域调整。结果表明，在所有设置中，OMPO优于不同类别的专门基线。我们还发现，当与领域随机化相结合时，OMPO表现出特别强大的性能，突显了其在基于RL的机器人应用中的潜力。

更新时间: 2024-05-29 13:36:36

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.19080v1

Benchmarking General Purpose In-Context Learning

In-context learning (ICL) capabilities are becoming increasingly appealing for building general intelligence due to their sample efficiency and independence from artificial optimization skills. To enhance generalization, biological neural systems primarily inherit learning capabilities and subsequently refine their memory, acquiring diverse skills and knowledge through extensive lifelong experiences. This process gives rise to the concept of general-purpose in-context learning (GPICL). Compared to standard ICL, GPICL addresses a broader range of tasks, extends learning horizons, and starts at a lower zero-shot baseline. We introduce two lightweight but insightful benchmarks specifically crafted to train and evaluate GPICL functionalities. Each benchmark includes a vast number of tasks characterized by significant task variance and minimal transferable knowledge among tasks, facilitating lifelong in-context learning through continuous generation and interaction. These features pose significant challenges for models that rely on context or interactions to improve their proficiency, including language models, decision models, and world models. Our experiments reveal that parameter scale alone may not be crucial for ICL or GPICL, suggesting alternative approaches such as increasing the scale of contexts and memory states.

Updated: 2024-05-29 13:35:01

标题: 基准测试通用情境学习

摘要: 上下文学习（ICL）能力因其样本效率和独立于人工优化技能而变得越来越吸引人，以建立一般智能。为了增强泛化能力，生物神经系统主要继承学习能力，并随后完善其记忆，通过广泛的终身经验获取各种技能和知识。这个过程产生了通用的上下文学习（GPICL）的概念。与标准ICL相比，GPICL涵盖了更广泛的任务范围，扩展了学习视野，并从更低的零射击基线开始。我们引入了两个专门设计用于训练和评估GPICL功能的轻量级但富有洞察力的基准。每个基准包含大量任务，这些任务具有显著的任务差异和任务之间的最小可转移知识，通过持续的生成和交互促进终身上下文学习。这些特征为依赖于上下文或交互以提高其熟练度的模型，包括语言模型、决策模型和世界模型，提出了重大挑战。我们的实验表明，参数规模单独可能对ICL或GPICL不是至关重要的，这表明诸如增加上下文和记忆状态规模之类的替代方法。

更新时间: 2024-05-29 13:35:01

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.17234v2

Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design

We present Cephalo, a series of multimodal vision large language models (V-LLMs) designed for materials science applications, integrating visual and linguistic data for enhanced understanding and interaction within human-AI and multi-agent AI frameworks. A key innovation of Cephalo is its advanced dataset generation method, which employs a sophisticated algorithm to accurately detect and separate images and their corresponding textual descriptions from PDF documents, such as scientific papers. The method includes a careful refinement of image-text pairs through integrated vision and language processing, ensuring high-quality, contextually relevant, and well reasoned training data. Cephalo is trained on integrated image and text data extracted from thousands of scientific papers and science-focused Wikipedia pages demonstrates can interpret complex visual scenes, generate precise language descriptions, and answer queries about images effectively. The combination of a vision encoder with an autoregressive transformer supports complex natural language understanding in an integrated model, which can be coupled with other generative methods to create an image-to-text-to-image or image-to-text-to-3D pipeline. To explore the development of larger models from smaller ones, we merge sets of layers that originate from different pre-trained source models. This hybrid approach allows us to leverage the domain-specific expertise and general conversational capabilities to harness the strengths of multiple models. We examine the models in diverse use cases that incorporate biological materials, fracture and engineering analysis, protein biophysics, and bio-inspired design based on insect behavior. Generative applications include bio-inspired designs, including pollen-inspired architected materials, as well as the synthesis of bio-inspired material microstructures from a photograph of a solar eclipse.

Updated: 2024-05-29 13:34:32

标题: Cephalo: 多模态视觉-语言模型用于生物启发材料分析和设计

摘要: 我们介绍了Cephalo，这是一系列为材料科学应用设计的多模态视觉大语言模型（V-LLMs），集成了视觉和语言数据，以增强人工智能和多智能体人工智能框架内的理解和交互。Cephalo的一个关键创新是其先进的数据集生成方法，该方法采用复杂的算法准确检测和分离来自PDF文件（如科学论文）的图像及其相应的文本描述。该方法通过集成视觉和语言处理精心改进图像-文本对，确保高质量、上下文相关且合理的训练数据。Cephalo是在从成千上万篇科学论文和以科学为重点的维基百科页面中提取的集成图像和文本数据上进行训练的，证明了其能够解释复杂的视觉场景，生成精确的语言描述，并有效地回答关于图像的查询。视觉编码器与自回归变换器的结合支持集成模型中的复杂自然语言理解，可与其他生成方法相结合，创建图像到文本到图像或图像到文本到三维流水线。为了探索从较小模型发展出更大模型，我们合并了来自不同预训练源模型的层集。这种混合方法使我们能够利用领域专业知识和一般对话能力，以利用多个模型的优势。我们在包括生物材料、断裂和工程分析、蛋白质生物物理学和基于昆虫行为的生物启发设计在内的各种用例中研究了这些模型。生成应用包括受生物启发的设计，包括受花粉启发的结构材料，以及从日食的照片中合成生物启发的材料微结构。

更新时间: 2024-05-29 13:34:32

领域: cs.CV,cond-mat.mes-hall,cond-mat.mtrl-sci,cs.CL,cs.LG

下载: http://arxiv.org/abs/2405.19076v1

Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment

Aligning human preference and value is an important requirement for contemporary foundation models. State-of-the-art techniques such as Reinforcement Learning from Human Feedback (RLHF) often consist of two stages: 1) supervised fine-tuning (SFT), where the model is fine-tuned by learning from human demonstration data; 2) Preference learning, where preference data is used to learn a reward model, which is in turn used by a reinforcement learning (RL) step to fine-tune the model. Such reward model serves as a proxy to human preference, and it is critical to guide the RL step towards improving the model quality. In this work, we argue that the SFT stage significantly benefits from learning a reward model as well. Instead of using the human demonstration data directly via supervised learning, we propose to leverage an Inverse Reinforcement Learning (IRL) technique to (explicitly or implicitly) build an reward model, while learning the policy model. This approach leads to new SFT algorithms that are not only efficient to implement, but also promote the ability to distinguish between the preferred and non-preferred continuations. Moreover, we identify a connection between the proposed IRL based approach, and certain self-play approach proposed recently, and showed that self-play is a special case of modeling a reward-learning agent. Theoretically, we show that the proposed algorithms converge to the stationary solutions of the IRL problem. Empirically, we align 1B and 7B models using proposed methods and evaluate them on a reward benchmark model and the HuggingFace Open LLM Leaderboard. The proposed methods show significant performance improvement over existing SFT approaches. Our results indicate that it is beneficial to explicitly or implicitly leverage reward learning throughout the entire alignment process.

Updated: 2024-05-29 13:33:33

标题: 从SFT数据中获得更多信息：从人类示范中学习奖励改善LLM对齐

摘要: 将人类偏好和价值进行对齐是当代基础模型的重要要求。最先进的技术，如从人类反馈中学习强化学习（RLHF），通常包括两个阶段：1）监督微调（SFT），在这个阶段，模型通过从人类示范数据中学习进行微调；2）偏好学习，其中使用偏好数据来学习一个奖励模型，然后该奖励模型由强化学习（RL）步骤来微调模型。这种奖励模型作为人类偏好的代理，并且对指导RL步骤改善模型质量至关重要。在这项工作中，我们认为SFT阶段也极大受益于学习奖励模型。我们提出利用逆强化学习（IRL）技术（显式或隐式）构建奖励模型，同时学习策略模型，而不是直接通过监督学习使用人类示范数据。这种方法导致了新的SFT算法，不仅实现效率高，而且促进了区分首选和非首选延续的能力。此外，我们确定了所提议的基于IRL的方法与最近提出的某些自我对弈方法之间的联系，并显示自我对弈是对建模奖励学习代理的特殊情况。在理论上，我们证明了所提议的算法收敛到IRL问题的稳定解。在实证上，我们使用提出的方法对1B和7B模型进行对齐，并在奖励基准模型和HuggingFace Open LLM排行榜上对它们进行评估。提出的方法显示出明显的性能改进，超过了现有的SFT方法。我们的结果表明，在整个对齐过程中明确或隐含地利用奖励学习是有益的。

更新时间: 2024-05-29 13:33:33

领域: cs.AI

下载: http://arxiv.org/abs/2405.17888v2

Resurrecting Old Classes with New Data for Exemplar-Free Continual Learning

Continual learning methods are known to suffer from catastrophic forgetting, a phenomenon that is particularly hard to counter for methods that do not store exemplars of previous tasks. Therefore, to reduce potential drift in the feature extractor, existing exemplar-free methods are typically evaluated in settings where the first task is significantly larger than subsequent tasks. Their performance drops drastically in more challenging settings starting with a smaller first task. To address this problem of feature drift estimation for exemplar-free methods, we propose to adversarially perturb the current samples such that their embeddings are close to the old class prototypes in the old model embedding space. We then estimate the drift in the embedding space from the old to the new model using the perturbed images and compensate the prototypes accordingly. We exploit the fact that adversarial samples are transferable from the old to the new feature space in a continual learning setting. The generation of these images is simple and computationally cheap. We demonstrate in our experiments that the proposed approach better tracks the movement of prototypes in embedding space and outperforms existing methods on several standard continual learning benchmarks as well as on fine-grained datasets. Code is available at https://github.com/dipamgoswami/ADC.

Updated: 2024-05-29 13:31:42

标题: 用新数据复活旧类别，实现无样本持续学习

摘要: 持续学习方法通常会遭受灾难性遗忘的困扰，这种现象对于不存储先前任务示例的方法来说尤为难以对抗。因此，为了减少特征提取器中潜在的漂移，现有的无示例方法通常在第一个任务明显大于后续任务的设置中进行评估。它们在从较小的第一个任务开始的更具挑战性的设置中的表现急剧下降。为了解决无示例方法中特征漂移估计的问题，我们提出对当前样本进行对抗性扰动，使它们在旧模型嵌入空间中接近旧类原型。然后，我们使用扰动图像从旧模型到新模型估计嵌入空间中的漂移，并相应地补偿原型。我们利用对抗样本在持续学习环境中从旧到新特征空间的可传递性。这些图像的生成简单且计算成本低廉。我们在实验中证明，所提出的方法更好地跟踪嵌入空间中原型的移动，并在几个标准的持续学习基准测试以及细粒度数据集上胜过现有方法。代码可在https://github.com/dipamgoswami/ADC 上找到。

更新时间: 2024-05-29 13:31:42

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.19074v1

Achievable Fairness on Your Data With Utility Guarantees

In machine learning fairness, training models that minimize disparity across different sensitive groups often leads to diminished accuracy, a phenomenon known as the fairness-accuracy trade-off. The severity of this trade-off inherently depends on dataset characteristics such as dataset imbalances or biases and therefore, using a uniform fairness requirement across diverse datasets remains questionable. To address this, we present a computationally efficient approach to approximate the fairness-accuracy trade-off curve tailored to individual datasets, backed by rigorous statistical guarantees. By utilizing the You-Only-Train-Once (YOTO) framework, our approach mitigates the computational burden of having to train multiple models when approximating the trade-off curve. Crucially, we introduce a novel methodology for quantifying uncertainty in our estimates, thereby providing practitioners with a robust framework for auditing model fairness while avoiding false conclusions due to estimation errors. Our experiments spanning tabular (e.g., Adult), image (CelebA), and language (Jigsaw) datasets underscore that our approach not only reliably quantifies the optimum achievable trade-offs across various data modalities but also helps detect suboptimality in SOTA fairness methods.

Updated: 2024-05-29 13:29:39

标题: 您的数据的公平性可实现，并保证效用

摘要: 在机器学习公平性中，训练模型以最小化不同敏感群体之间的差异通常会导致准确性降低，这种现象被称为公平性-准确性权衡。这种权衡的严重程度在很大程度上取决于数据集特征，如数据集不平衡或偏见，因此，在不同数据集之间使用统一的公平性要求仍然存在疑问。为了解决这个问题，我们提出了一种针对个别数据集量身定制的计算效率高的方法，用来近似公平性-准确性权衡曲线，并支持严格的统计保证。通过利用You-Only-Train-Once（YOTO）框架，我们的方法减轻了在近似权衡曲线时训练多个模型的计算负担。关键的是，我们引入了一种新颖的方法来量化我们估计中的不确定性，从而为从业者提供了一个健壮的框架，用于审计模型的公平性，同时避免由于估计误差而产生错误结论。我们的实验涵盖了表格（例如Adult）、图像（CelebA）和语言（Jigsaw）数据集，强调了我们的方法不仅可靠地量化了各种数据模态之间的最佳可实现的权衡，还有助于检测SOTA公平性方法中的次优性。

更新时间: 2024-05-29 13:29:39

领域: stat.ML,cs.CY,cs.LG

下载: http://arxiv.org/abs/2402.17106v2

Relevance-aware Algorithmic Recourse

As machine learning continues to gain prominence, transparency and explainability are increasingly critical. Without an understanding of these models, they can replicate and worsen human bias, adversely affecting marginalized communities. Algorithmic recourse emerges as a tool for clarifying decisions made by predictive models, providing actionable insights to alter outcomes. They answer, 'What do I have to change?' to achieve the desired result. Despite their importance, current algorithmic recourse methods treat all domain values equally, which is unrealistic in real-world settings. In this paper, we propose a novel framework, Relevance-Aware Algorithmic Recourse (RAAR), that leverages the concept of relevance in applying algorithmic recourse to regression tasks. We conducted multiple experiments on 15 datasets to outline how relevance influences recourses. Results show that relevance contributes algorithmic recourses comparable to well-known baselines, with greater efficiency and lower relative costs.

Updated: 2024-05-29 13:25:49

标题: 相关性感知算法性回溯

摘要: 随着机器学习的日益突出，透明度和可解释性变得越来越关键。没有对这些模型的理解，它们可能会复制和加剧人类偏见，对边缘化社区产生不利影响。算法补救措施作为一种工具，用于澄清预测模型所做的决定，提供可操作的见解以改变结果。它们回答了“我需要改变什么？”以实现预期结果。尽管它们很重要，但目前的算法补救方法将所有领域值同等对待，在实际环境中是不现实的。在本文中，我们提出了一个新颖的框架，即关联感知算法补救（RAAR），利用相关性概念将算法补救应用于回归任务。我们在15个数据集上进行了多次实验，以说明相关性如何影响补救措施。结果表明，相关性贡献了与知名基线相当的算法补救，具有更高的效率和较低的相对成本。

更新时间: 2024-05-29 13:25:49

领域: cs.LG

下载: http://arxiv.org/abs/2405.19072v1

Machine Learning in Short-Reach Optical Systems: A Comprehensive Survey

In recent years, extensive research has been conducted to explore the utilization of machine learning algorithms in various direct-detected and self-coherent short-reach communication applications. These applications encompass a wide range of tasks, including bandwidth request prediction, signal quality monitoring, fault detection, traffic prediction, and digital signal processing (DSP)-based equalization. As a versatile approach, machine learning demonstrates the ability to address stochastic phenomena in optical systems networks where deterministic methods may fall short. However, when it comes to DSP equalization algorithms, their performance improvements are often marginal, and their complexity is prohibitively high, especially in cost-sensitive short-reach communications scenarios such as passive optical networks (PONs). They excel in capturing temporal dependencies, handling irregular or nonlinear patterns effectively, and accommodating variable time intervals. Within this extensive survey, we outline the application of machine learning techniques in short-reach communications, specifically emphasizing their utilization in high-bandwidth demanding PONs. Notably, we introduce a novel taxonomy for time-series methods employed in machine learning signal processing, providing a structured classification framework. Our taxonomy categorizes current time series methods into four distinct groups: traditional methods, Fourier convolution-based methods, transformer-based models, and time-series convolutional networks. Finally, we highlight prospective research directions within this rapidly evolving field and outline specific solutions to mitigate the complexity associated with hardware implementations. We aim to pave the way for more practical and efficient deployment of machine learning approaches in short-reach optical communication systems by addressing complexity concerns.

Updated: 2024-05-29 13:25:16

标题: 短距离光学系统中的机器学习：全面调查

摘要: 最近几年，人们进行了大量研究，探索在各种直接检测和自相干短程通信应用中利用机器学习算法。这些应用涵盖了广泛的任务，包括带宽请求预测、信号质量监测、故障检测、流量预测和基于数字信号处理（DSP）的均衡。作为一种多功能方法，机器学习展示了处理光学系统网络中的随机现象的能力，而确定性方法可能会不足。然而，当涉及到DSP均衡算法时，它们的性能改进往往是微小的，而且它们的复杂性过高，尤其是在成本敏感的短距离通信场景中，如被动光网络（PONs）。它们擅长捕捉时间依赖性，有效处理不规则或非线性模式，并适应可变时间间隔。在这个广泛的调查中，我们概述了机器学习技术在短程通信中的应用，特别强调它们在高带宽需求的PONs中的利用。值得注意的是，我们引入了一种新的时间序列方法分类法，用于机器学习信号处理，提供了一个结构化的分类框架。我们的分类法将目前的时间序列方法分为四个不同的组别：传统方法、傅里叶卷积方法、基于变压器的模型和时间序列卷积网络。最后，我们强调了在这个快速发展领域内的未来研究方向，并概述了减轻与硬件实施相关的复杂性的特定解决方案。我们旨在通过解决复杂性问题，为机器学习方法在短程光通信系统中更实用和高效的部署铺平道路。

更新时间: 2024-05-29 13:25:16

领域: eess.SP,cs.LG

下载: http://arxiv.org/abs/2405.09557v2

xTern: Energy-Efficient Ternary Neural Network Inference on RISC-V-Based Edge Systems

Ternary neural networks (TNNs) offer a superior accuracy-energy trade-off compared to binary neural networks. However, until now, they have required specialized accelerators to realize their efficiency potential, which has hindered widespread adoption. To address this, we present xTern, a lightweight extension of the RISC-V instruction set architecture (ISA) targeted at accelerating TNN inference on general-purpose cores. To complement the ISA extension, we developed a set of optimized kernels leveraging xTern, achieving 67% higher throughput than their 2-bit equivalents. Power consumption is only marginally increased by 5.2%, resulting in an energy efficiency improvement by 57.1%. We demonstrate that the proposed xTern extension, integrated into an octa-core compute cluster, incurs a minimal silicon area overhead of 0.9% with no impact on timing. In end-to-end benchmarks, we demonstrate that xTern enables the deployment of TNNs achieving up to 1.6 percentage points higher CIFAR-10 classification accuracy than 2-bit networks at equal inference latency. Our results show that xTern enables RISC-V-based ultra-low-power edge AI platforms to benefit from the efficiency potential of TNNs.

Updated: 2024-05-29 13:16:46

标题: xTern：基于RISC-V边缘系统的能效三值神经网络推断

摘要: 三元神经网络（TNNs）相较于二元神经网络具有更优越的准确性-能量折衷。然而，直到现在，它们需要专门的加速器来实现其效率潜力，这一直阻碍了广泛的采用。为了解决这个问题，我们提出了xTern，这是一个轻量级的RISC-V指令集架构（ISA）扩展，旨在加速通用核心上的TNN推断。为了补充ISA扩展，我们开发了一组利用xTern的优化内核，实现比它们的2位等效内核高出67%的吞吐量。功耗仅增加了5.2%，能效提高了57.1%。我们证明了所提出的xTern扩展，集成到一个八核计算集群中，只增加了0.9%的硅片面积开销，对时序没有影响。在端到端基准测试中，我们展示了xTern使得部署的TNNs在相同推断延迟下可以实现比2位网络高出1.6个百分点的CIFAR-10分类准确度。我们的结果表明，xTern使得基于RISC-V的超低功耗边缘人工智能平台可以从TNNs的效率潜力中受益。

更新时间: 2024-05-29 13:16:46

领域: cs.AR,cs.LG

下载: http://arxiv.org/abs/2405.19065v1

SIG: Efficient Self-Interpretable Graph Neural Network for Continuous-time Dynamic Graphs

While dynamic graph neural networks have shown promise in various applications, explaining their predictions on continuous-time dynamic graphs (CTDGs) is difficult. This paper investigates a new research task: self-interpretable GNNs for CTDGs. We aim to predict future links within the dynamic graph while simultaneously providing causal explanations for these predictions. There are two key challenges: (1) capturing the underlying structural and temporal information that remains consistent across both independent and identically distributed (IID) and out-of-distribution (OOD) data, and (2) efficiently generating high-quality link prediction results and explanations. To tackle these challenges, we propose a novel causal inference model, namely the Independent and Confounded Causal Model (ICCM). ICCM is then integrated into a deep learning architecture that considers both effectiveness and efficiency. Extensive experiments demonstrate that our proposed model significantly outperforms existing methods across link prediction accuracy, explanation quality, and robustness to shortcut features. Our code and datasets are anonymously released at https://github.com/2024SIG/SIG.

Updated: 2024-05-29 13:09:33

标题: SIG：高效的自解释图神经网络用于连续时间动态图

摘要: 动态图神经网络在各种应用中显示出了潜力，但解释连续时间动态图（CTDGs）上的预测是困难的。本文研究了一个新的研究任务：用于CTDGs的自解释GNNs。我们的目标是在预测动态图中的未来链接的同时，为这些预测提供因果解释。存在两个关键挑战：（1）捕获在独立同分布（IID）和分布外（OOD）数据上保持一致的基础结构和时间信息，以及（2）高效生成高质量的链接预测结果和解释。为了解决这些挑战，我们提出了一种新颖的因果推断模型，即独立和混淆的因果模型（ICCM）。ICCM然后被整合到一个同时考虑效果和效率的深度学习架构中。广泛的实验证明，我们提出的模型在链接预测准确性、解释质量和对快捷特征的鲁棒性方面明显优于现有方法。我们的代码和数据集已匿名发布在https://github.com/2024SIG/SIG。

更新时间: 2024-05-29 13:09:33

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.19062v1

On the Condition Monitoring of Bolted Joints through Acoustic Emission and Deep Transfer Learning: Generalization, Ordinal Loss and Super-Convergence

This paper investigates the use of deep transfer learning based on convolutional neural networks (CNNs) to monitor the condition of bolted joints using acoustic emissions. Bolted structures are critical components in many mechanical systems, and the ability to monitor their condition status is crucial for effective structural health monitoring. We evaluated the performance of our methodology using the ORION-AE benchmark, a structure composed of two thin beams connected by three bolts, where highly noisy acoustic emission measurements were taken to detect changes in the applied tightening torque of the bolts. The data used from this structure is derived from the transformation of acoustic emission data streams into images using continuous wavelet transform, and leveraging pretrained CNNs for feature extraction and denoising. Our experiments compared single-sensor versus multiple-sensor fusion for estimating the tightening level (loosening) of bolts and evaluated the use of raw versus prefiltered data on the performance. We particularly focused on the generalization capabilities of CNN-based transfer learning across different measurement campaigns and we studied ordinal loss functions to penalize incorrect predictions less severely when close to the ground truth, thereby encouraging misclassification errors to be in adjacent classes. Network configurations as well as learning rate schedulers are also investigated, and super-convergence is obtained, i.e., high classification accuracy is achieved in a few number of iterations with different networks. Furthermore, results demonstrate the generalization capabilities of CNN-based transfer learning for monitoring bolted structures by acoustic emission with varying amounts of prior information required during training.

Updated: 2024-05-29 13:07:21

标题: 关于通过声发射和深度转移学习进行螺栓连接状态监测：泛化、序数损失和超收敛

摘要: 本文研究了基于卷积神经网络（CNN）的深度迁移学习在使用声发射监测螺栓连接部件状态方面的应用。螺栓结构是许多机械系统中的关键组成部分，监测其状态对有效的结构健康监测至关重要。我们使用了ORION-AE基准评估了我们方法的性能，该结构由两根薄梁通过三个螺栓连接而成，高噪声的声发射测量数据用于检测螺栓拧紧扭矩的变化。从该结构中使用的数据源自将声发射数据流转换为图像，使用连续小波变换，并利用预训练的CNN进行特征提取和去噪。我们的实验比较了单传感器与多传感器融合用于估计螺栓的拧紧水平（松动），并评估了原始数据与预过滤数据在性能上的差异。我们特别关注了基于CNN的迁移学习在不同测量活动中的泛化能力，并研究了序数损失函数，以在接近真实值时轻罚错误预测，从而鼓励误分类错误位于相邻类别中。还调查了网络配置和学习率调度器，并获得了超级收敛，即使用不同网络在少量迭代中实现高分类准确度。此外，结果表明了基于CNN的迁移学习在使用声发射监测螺栓结构时的泛化能力，需要在训练期间提供不同数量的先验信息。

更新时间: 2024-05-29 13:07:21

领域: cs.SD,cs.LG,eess.AS

下载: http://arxiv.org/abs/2405.20887v1

Confidence Under the Hood: An Investigation into the Confidence-Probability Alignment in Large Language Models

As the use of Large Language Models (LLMs) becomes more widespread, understanding their self-evaluation of confidence in generated responses becomes increasingly important as it is integral to the reliability of the output of these models. We introduce the concept of Confidence-Probability Alignment, that connects an LLM's internal confidence, quantified by token probabilities, to the confidence conveyed in the model's response when explicitly asked about its certainty. Using various datasets and prompting techniques that encourage model introspection, we probe the alignment between models' internal and expressed confidence. These techniques encompass using structured evaluation scales to rate confidence, including answer options when prompting, and eliciting the model's confidence level for outputs it does not recognize as its own. Notably, among the models analyzed, OpenAI's GPT-4 showed the strongest confidence-probability alignment, with an average Spearman's $\hat{\rho}$ of 0.42, across a wide range of tasks. Our work contributes to the ongoing efforts to facilitate risk assessment in the application of LLMs and to further our understanding of model trustworthiness.

Updated: 2024-05-29 13:05:16

标题: 引擎盖下的信心：对大型语言模型中信心-概率对齐的调查

摘要: 随着大型语言模型（LLMs）的使用日益普及，了解它们对生成的响应的自我评估信心变得越来越重要，因为这对于这些模型的输出的可靠性至关重要。我们引入了置信度-概率对齐的概念，将一个LLM的内部置信度（通过标记概率量化）与当明确询问其确定性时模型响应中传达的置信度联系起来。通过使用各种数据集和促使技术鼓励模型自省，我们探究了模型内部和表达置信度之间的对齐情况。这些技术包括使用结构化评估量表来评估置信度，包括在提示时提供答案选项，并引出模型对其不认可为自己的输出的置信水平。值得注意的是，在分析的模型中，OpenAI的GPT-4显示出最强的置信度-概率对齐，其平均Spearman's ρ̂为0.42，覆盖了广泛的任务范围。我们的工作有助于促进在LLMs应用中风险评估的持续努力，并进一步增进对模型可信度的理解。

更新时间: 2024-05-29 13:05:16

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.16282v2

Interpretability of Statistical, Machine Learning, and Deep Learning Models for Landslide Susceptibility Mapping in Three Gorges Reservoir Area

Landslide susceptibility mapping (LSM) is crucial for identifying high-risk areas and informing prevention strategies. This study investigates the interpretability of statistical, machine learning (ML), and deep learning (DL) models in predicting landslide susceptibility. This is achieved by incorporating various relevant interpretation methods and two types of input factors: a comprehensive set of 19 contributing factors that are statistically relevant to landslides, as well as a dedicated set of 9 triggering factors directly associated with triggering landslides. Given that model performance is a crucial metric in LSM, our investigations into interpretability naturally involve assessing and comparing LSM accuracy across different models considered. In our investigation, the convolutional neural network model achieved the highest accuracy (0.8447 with 19 factors; 0.8048 with 9 factors), while Extreme Gradient Boosting and Support Vector Machine also demonstrated strong predictive capabilities, outperforming conventional statistical models. These findings indicate that DL and sophisticated ML algorithms can effectively capture the complex relationships between input factors and landslide occurrence. However, the interpretability of predictions varied among different models, particularly when using the broader set of 19 contributing factors. Explanation methods like SHAP, LIME, and DeepLIFT also led to variations in interpretation results. Using a comprehensive set of 19 contributing factors improved prediction accuracy but introduced complexities and inconsistency in model interpretations. Focusing on a dedicated set of 9 triggering factors sacrificed some predictive power but enhanced interpretability, as evidenced by more consistent key factors identified across various models and alignment with the findings of field investigation reports....

Updated: 2024-05-29 13:02:11

标题: 三峡库区滑坡易发区域的统计学、机器学习和深度学习模型的可解释性

摘要: 山体滑坡易发性制图(LSM)对于确定高风险区域并制定预防策略至关重要。本研究调查了统计学、机器学习(ML)和深度学习(DL)模型在预测山体滑坡易发性方面的可解释性。为实现这一目标，研究纳入了各种相关的解释方法和两种类型的输入因素：一个包括19个与山体滑坡统计相关的因素集合，以及一个专门包括9个直接与滑坡触发相关的因素集合。鉴于模型性能在LSM中是一个关键指标，我们的解释性研究自然涉及对不同考虑的模型之间的LSM准确性进行评估和比较。在我们的研究中，卷积神经网络模型实现了最高的准确性(19个因素为0.8447；9个因素为0.8048)，而极限梯度提升和支持向量机也表现出强大的预测能力，超越了传统统计模型。这些发现表明，深度学习和复杂的机器学习算法可以有效地捕捉输入因素与山体滑坡发生之间的复杂关系。然而，不同模型的预测解释性存在差异，尤其是在使用更广泛的19个贡献因素集合时。SHAP、LIME和DeepLIFT等解释方法也导致解释结果的变化。使用一个包括19个贡献因素的综合集合提高了预测准确性，但引入了模型解释中的复杂性和不一致性。专注于一个包括9个触发因素的专门集合牺牲了一些预测能力，但增强了解释性，这体现在各种模型之间识别出更一致的关键因素，并与野外调查报告的发现相一致。

更新时间: 2024-05-29 13:02:11

领域: cs.LG

下载: http://arxiv.org/abs/2405.11762v2

Robust Entropy Search for Safe Efficient Bayesian Optimization

The practical use of Bayesian Optimization (BO) in engineering applications imposes special requirements: high sampling efficiency on the one hand and finding a robust solution on the other hand. We address the case of adversarial robustness, where all parameters are controllable during the optimization process, but a subset of them is uncontrollable or even adversely perturbed at the time of application. To this end, we develop an efficient information-based acquisition function that we call Robust Entropy Search (RES). We empirically demonstrate its benefits in experiments on synthetic and real-life data. The results showthat RES reliably finds robust optima, outperforming state-of-the-art algorithms.

Updated: 2024-05-29 13:00:10

标题: 稳健熵搜索用于安全高效的贝叶斯优化

摘要: 在工程应用中，贝叶斯优化（BO）的实际应用提出了特殊要求：一方面需要高效的采样效率，另一方面需要找到一个稳健的解决方案。我们讨论了对抗性稳健性的情况，即在优化过程中所有参数都是可控的，但其中一部分是不可控的，甚至在应用时受到不利的干扰。为此，我们开发了一种高效的基于信息的获取函数，我们称之为Robust Entropy Search（RES）。我们在合成和现实数据的实验中经验性地展示了其好处。结果表明，RES可可靠地找到稳健的最优解，优于最先进的算法。

更新时间: 2024-05-29 13:00:10

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2405.19059v1

Building Guardrails for Large Language Models

As Large Language Models (LLMs) become more integrated into our daily lives, it is crucial to identify and mitigate their risks, especially when the risks can have profound impacts on human users and societies. Guardrails, which filter the inputs or outputs of LLMs, have emerged as a core safeguarding technology. This position paper takes a deep look at current open-source solutions (Llama Guard, Nvidia NeMo, Guardrails AI), and discusses the challenges and the road towards building more complete solutions. Drawing on robust evidence from previous research, we advocate for a systematic approach to construct guardrails for LLMs, based on comprehensive consideration of diverse contexts across various LLMs applications. We propose employing socio-technical methods through collaboration with a multi-disciplinary team to pinpoint precise technical requirements, exploring advanced neural-symbolic implementations to embrace the complexity of the requirements, and developing verification and testing to ensure the utmost quality of the final product.

Updated: 2024-05-29 12:57:01

标题: 构建大型语言模型的护栏

摘要: 随着大型语言模型（LLMs）越来越融入我们的日常生活，识别和减轻它们的风险变得至关重要，特别是当这些风险可能对人类用户和社会产生深远影响时。护栏技术，即过滤LLMs的输入或输出的技术，已成为一种核心保障技术。本文深入探讨了当前开源解决方案（Llama Guard，Nvidia NeMo，Guardrails AI），并讨论了构建更完整解决方案的挑战和路径。借助以往研究的强有力证据，我们主张采用系统化方法为LLMs构建护栏，基于对各种LLMs应用程序中不同背景的全面考虑。我们提议通过与多学科团队合作，确定精确的技术要求，探索先进的神经符号实现以应对这些要求的复杂性，并开发验证和测试以确保最终产品的最高质量。

更新时间: 2024-05-29 12:57:01

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2402.01822v2

Multiscale Spatio-Temporal Enhanced Short-term Load Forecasting of Electric Vehicle Charging Stations

The rapid expansion of electric vehicles (EVs) has rendered the load forecasting of electric vehicle charging stations (EVCS) increasingly critical. The primary challenge in achieving precise load forecasting for EVCS lies in accounting for the nonlinear of charging behaviors, the spatial interactions among different stations, and the intricate temporal variations in usage patterns. To address these challenges, we propose a Multiscale Spatio-Temporal Enhanced Model (MSTEM) for effective load forecasting at EVCS. MSTEM incorporates a multiscale graph neural network to discern hierarchical nonlinear temporal dependencies across various time scales. Besides, it also integrates a recurrent learning component and a residual fusion mechanism, enhancing its capability to accurately capture spatial and temporal variations in charging patterns. The effectiveness of the proposed MSTEM has been validated through comparative analysis with six baseline models using three evaluation metrics. The case studies utilize real-world datasets for both fast and slow charging loads at EVCS in Perth, UK. The experimental results demonstrate the superiority of MSTEM in short-term continuous load forecasting for EVCS.

Updated: 2024-05-29 12:54:22

标题: Electric Vehicle Charging Stations的多尺度时空增强短期负载预测

摘要: 电动汽车（EVs）的快速扩张使得电动汽车充电站（EVCS）的负荷预测变得日益关键。实现对EVCS的精确负载预测的主要挑战在于考虑到充电行为的非线性、不同站点之间的空间相互作用以及使用模式中复杂的时间变化。为了解决这些挑战，我们提出了一种用于有效预测EVCS负载的多尺度时空增强模型（MSTEM）。MSTEM整合了多尺度图神经网络，以识别在不同时间尺度上的分层非线性时间依赖关系。此外，它还整合了一个循环学习组件和一个残差融合机制，增强了其准确捕捉充电模式的空间和时间变化的能力。通过与六个基线模型进行三个评估指标的比较分析，验证了所提出的MSTEM的有效性。案例研究利用了英国珀斯EVCS的快速和慢速充电负载的真实数据集。实验结果表明，MSTEM在EVCS的短期连续负载预测方面具有优越性。

更新时间: 2024-05-29 12:54:22

领域: eess.SY,cs.AI,cs.LG,cs.SY

下载: http://arxiv.org/abs/2405.19053v1

Large Language Models Reflect Human Citation Patterns with a Heightened Citation Bias

Citation practices are crucial in shaping the structure of scientific knowledge, yet they are often influenced by contemporary norms and biases. The emergence of Large Language Models (LLMs) like GPT-4 introduces a new dynamic to these practices. Interestingly, the characteristics and potential biases of references recommended by LLMs that entirely rely on their parametric knowledge, and not on search or retrieval-augmented generation, remain unexplored. Here, we analyze these characteristics in an experiment using a dataset of 166 papers from AAAI, NeurIPS, ICML, and ICLR, published after GPT-4's knowledge cut-off date, encompassing 3,066 references in total. In our experiment, GPT-4 was tasked with suggesting scholarly references for the anonymized in-text citations within these papers. Our findings reveal a remarkable similarity between human and LLM citation patterns, but with a more pronounced high citation bias in GPT-4, which persists even after controlling for publication year, title length, number of authors, and venue. Additionally, we observe a large consistency between the characteristics of GPT-4's existing and non-existent generated references, indicating the model's internalization of citation patterns. By analyzing citation graphs, we show that the references recommended by GPT-4 are embedded in the relevant citation context, suggesting an even deeper conceptual internalization of the citation networks. While LLMs can aid in citation generation, they may also amplify existing biases and introduce new ones, potentially skewing scientific knowledge dissemination. Our results underscore the need for identifying the model's biases and for developing balanced methods to interact with LLMs in general.

Updated: 2024-05-29 12:50:49

标题: 大型语言模型反映出人类引用模式，并具有增强的引用偏见

摘要: 引文实践在塑造科学知识结构中至关重要，然而往往受到当代规范和偏见的影响。像GPT-4这样的大型语言模型（LLMs）的出现为这些实践引入了新的动态。有趣的是，完全依赖于其参数化知识而不是搜索或检索增强生成的LLMs推荐的参考文献的特征和潜在偏见尚未被探索。在这里，我们使用来自AAAI，NeurIPS，ICML和ICLR的166篇论文数据集进行了实验分析，这些论文发表在GPT-4的知识截止日期之后，总共包含3,066个参考文献。在我们的实验中，我们要求GPT-4为这些论文中的匿名内文引用提供学术引用建议。我们的发现显示，人类和LLM引文模式之间存在显著的相似性，但是GPT-4中存在更为明显的高引文偏见，即使在控制出版年份、标题长度、作者数量和会议场所等因素后仍然存在。此外，我们观察到GPT-4已有生成的参考文献和不存在的生成参考文献之间存在着很大的一致性，表明该模型内部化了引文模式。通过分析引文图，我们展示了GPT-4推荐的参考文献嵌入在相关的引文上下文中，表明了对引文网络的更深层次的概念内化。虽然LLMs可以帮助生成引文，但它们也可能放大现有的偏见并引入新的偏见，可能扭曲科学知识的传播。我们的结果强调了需要识别模型的偏见，并开发与LLMs进行平衡互动的方法的重要性。

更新时间: 2024-05-29 12:50:49

领域: cs.DL,cs.AI,cs.LG,cs.SI

下载: http://arxiv.org/abs/2405.15739v2

Statistical Context Detection for Deep Lifelong Reinforcement Learning

Context detection involves labeling segments of an online stream of data as belonging to different tasks. Task labels are used in lifelong learning algorithms to perform consolidation or other procedures that prevent catastrophic forgetting. Inferring task labels from online experiences remains a challenging problem. Most approaches assume finite and low-dimension observation spaces or a preliminary training phase during which task labels are learned. Moreover, changes in the transition or reward functions can be detected only in combination with a policy, and therefore are more difficult to detect than changes in the input distribution. This paper presents an approach to learning both policies and labels in an online deep reinforcement learning setting. The key idea is to use distance metrics, obtained via optimal transport methods, i.e., Wasserstein distance, on suitable latent action-reward spaces to measure distances between sets of data points from past and current streams. Such distances can then be used for statistical tests based on an adapted Kolmogorov-Smirnov calculation to assign labels to sequences of experiences. A rollback procedure is introduced to learn multiple policies by ensuring that only the appropriate data is used to train the corresponding policy. The combination of task detection and policy deployment allows for the optimization of lifelong reinforcement learning agents without an oracle that provides task labels. The approach is tested using two benchmarks and the results show promising performance when compared with related context detection algorithms. The results suggest that optimal transport statistical methods provide an explainable and justifiable procedure for online context detection and reward optimization in lifelong reinforcement learning.

Updated: 2024-05-29 12:44:41

标题: 深度终身强化学习的统计上下文检测

摘要: 上下文检测涉及将在线数据流的段标记为属于不同的任务。任务标签在终身学习算法中用于执行巩固或其他程序，以防止灾难性遗忘。从在线经验中推断任务标签仍然是一个具有挑战性的问题。大多数方法假设有限和低维观测空间或者在任务标签被学习期间进行初步训练阶段。此外，只有在结合策略时才能检测到转换或奖励函数的变化，因此比检测输入分布的变化更加困难。本文提出了一种在线深度强化学习环境中学习策略和标签的方法。关键思想是使用距离度量，通过最优输运方法（即Wasserstein距离），在适当的潜在动作-奖励空间上测量过去和当前数据流的数据点集之间的距离。这些距离可以用于基于经过调整的Kolmogorov-Smirnov计算的统计测试，以为经验序列分配标签。引入了一个回滚过程，通过确保只使用适当的数据来训练相应的策略，来学习多个策略。任务检测和策略部署的组合允许优化终身强化学习代理，而无需提供任务标签的神谕。该方法使用两个基准进行测试，结果显示与相关上下文检测算法相比表现出有希望的性能。结果表明，最优输运统计方法为在线上下文检测和奖励优化提供了一种可解释和可证明的程序。

更新时间: 2024-05-29 12:44:41

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.19047v1

From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models

To date, toxicity mitigation in language models has almost entirely been focused on single-language settings. As language models embrace multilingual capabilities, it's crucial our safety measures keep pace. Recognizing this research gap, our approach expands the scope of conventional toxicity mitigation to address the complexities presented by multiple languages. In the absence of sufficient annotated datasets across languages, we employ translated data to evaluate and enhance our mitigation techniques. We also compare finetuning mitigation approaches against retrieval-augmented techniques under both static and continual toxicity mitigation scenarios. This allows us to examine the effects of translation quality and the cross-lingual transfer on toxicity mitigation. We also explore how model size and data quantity affect the success of these mitigation efforts. Covering nine languages, our study represents a broad array of linguistic families and levels of resource availability, ranging from high to mid-resource languages. Through comprehensive experiments, we provide insights into the complexities of multilingual toxicity mitigation, offering valuable insights and paving the way for future research in this increasingly important field. Code and data are available at https://github.com/for-ai/goodtriever.

Updated: 2024-05-29 12:28:13

标题: 从一而多：扩大语言模型中毒性缓解的范围

摘要: 迄今为止，语言模型中的毒性缓解几乎完全集中在单一语言环境中。随着语言模型 embracing 多语言能力，我们的安全措施也至关重要。鉴于这一研究空白，我们的方法将传统毒性缓解的范围扩大，以解决多语言环境下出现的复杂性。在缺乏跨语言的充分注释数据集的情况下，我们使用翻译数据来评估和增强我们的缓解技术。我们还比较了在静态和持续毒性缓解情景下，微调缓解方法与检索增强技术之间的差异。这使我们能够研究翻译质量和跨语言转移对毒性缓解的影响。我们还探讨了模型大小和数据数量对这些缓解工作成功的影响。我们的研究涵盖了九种语言，代表了广泛的语言家族和资源可用性水平，从高资源到中等资源语言不等。通过全面的实验，我们深入探讨了多语言毒性缓解的复杂性，提供了有价值的见解，并为这一日益重要的领域的未来研究铺平了道路。代码和数据可在https://github.com/for-ai/goodtriever 上找到。

更新时间: 2024-05-29 12:28:13

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2403.03893v2

State Space Models are Comparable to Transformers in Estimating Functions with Dynamic Smoothness

Deep neural networks based on state space models (SSMs) are attracting much attention in sequence modeling since their computational cost is significantly smaller than that of Transformers. While the capabilities of SSMs have been primarily investigated through experimental comparisons, theoretical understanding of SSMs is still limited. In particular, there is a lack of statistical and quantitative evaluation of whether SSM can replace Transformers. In this paper, we theoretically explore in which tasks SSMs can be alternatives of Transformers from the perspective of estimating sequence-to-sequence functions. We consider the setting where the target function has direction-dependent smoothness and prove that SSMs can estimate such functions with the same convergence rate as Transformers. Additionally, we prove that SSMs can estimate the target function, even if the smoothness changes depending on the input sequence, as well as Transformers. Our results show the possibility that SSMs can replace Transformers when estimating the functions in certain classes that appear in practice.

Updated: 2024-05-29 12:23:48

标题: 状态空间模型与变压器在估计具有动态平滑性函数方面是可比较的

摘要: 基于状态空间模型（SSMs）的深度神经网络在序列建模中引起了广泛关注，因为它们的计算成本显著低于Transformer的成本。虽然SSMs的能力主要是通过实验比较来进行研究的，但对SSMs的理论理解仍然有限。特别是，缺乏统计和定量评估SSM是否可以替代Transformer。在本文中，我们从估计序列到序列函数的角度，从理论上探讨了SSMs可以在哪些任务中成为Transformer的替代品。我们考虑目标函数具有方向相关的平滑性的情况，并证明SSMs可以以与Transformer相同的收敛速率估计这样的函数。此外，我们证明SSMs可以估计目标函数，即使平滑性随输入序列的变化而变化，与Transformer一样。我们的结果表明，在实践中出现的某些类别的函数估计中，SSMs有可能取代Transformer。

更新时间: 2024-05-29 12:23:48

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2405.19036v1

CiliaGraph: Enabling Expression-enhanced Hyper-Dimensional Computation in Ultra-Lightweight and One-Shot Graph Classification on Edge

Graph Neural Networks (GNNs) are computationally demanding and inefficient when applied to graph classification tasks in resource-constrained edge scenarios due to their inherent process, involving multiple rounds of forward and backward propagation. As a lightweight alternative, Hyper-Dimensional Computing (HDC), which leverages high-dimensional vectors for data encoding and processing, offers a more efficient solution by addressing computational bottleneck. However, current HDC methods primarily focus on static graphs and neglect to effectively capture node attributes and structural information, which leads to poor accuracy. In this work, we propose CiliaGraph, an enhanced expressive yet ultra-lightweight HDC model for graph classification. This model introduces a novel node encoding strategy that preserves relative distance isomorphism for accurate node connection representation. In addition, node distances are utilized as edge weights for information aggregation, and the encoded node attributes and structural information are concatenated to obtain a comprehensive graph representation. Furthermore, we explore the relationship between orthogonality and dimensionality to reduce the dimensions, thereby further enhancing computational efficiency. Compared to the SOTA GNNs, extensive experiments show that CiliaGraph reduces memory usage and accelerates training speed by an average of 292 times(up to 2341 times) and 103 times(up to 313 times) respectively while maintaining comparable accuracy.

Updated: 2024-05-29 12:22:59

标题: CiliaGraph：在边缘上实现超轻量级和一次性图分类中的表达增强超维计算

摘要: 图神经网络（GNNs）在资源受限的边缘场景中应用于图分类任务时，由于其涉及多轮前向和后向传播的固有过程，具有计算需求高且效率低的特点。作为一种轻量级的替代方案，超高维计算（HDC）利用高维向量进行数据编码和处理，通过解决计算瓶颈提供了更高效的解决方案。然而，当前的HDC方法主要集中在静态图上，并忽视了有效捕捉节点属性和结构信息，导致准确性不佳。在本研究中，我们提出了CiliaGraph，这是一个增强的表达力高且超轻量级的HDC模型用于图分类。该模型引入了一种新颖的节点编码策略，以保留相对距离同构以实现准确的节点连接表示。此外，节点之间的距离被用作信息聚合的边权重，并将编码后的节点属性和结构信息连接起来以获得全面的图表示。此外，我们探索正交性和维度之间的关系，以减少维度，从而进一步提高计算效率。与SOTA GNN相比，大量实验表明，CiliaGraph在保持可比准确性的同时，将内存使用量减少并分别加快了训练速度，平均分别提高了292倍（最高2341倍）和103倍（最高313倍）。

更新时间: 2024-05-29 12:22:59

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.19033v1

Generalized Sobolev Transport for Probability Measures on a Graph

We study the optimal transport (OT) problem for measures supported on a graph metric space. Recently, Le et al. (2022) leverage the graph structure and propose a variant of OT, namely Sobolev transport (ST), which yields a closed-form expression for a fast computation. However, ST is essentially coupled with the $L^p$ geometric structure within its definition which makes it nontrivial to utilize ST for other prior structures. In contrast, the classic OT has the flexibility to adapt to various geometric structures by modifying the underlying cost function. An important instance is the Orlicz-Wasserstein (OW) which moves beyond the $L^p$ structure by leveraging the \emph{Orlicz geometric structure}. Comparing to the usage of standard $p$-order Wasserstein, OW remarkably helps to advance certain machine learning approaches. Nevertheless, OW brings up a new challenge on its computation due to its two-level optimization formulation. In this work, we leverage a specific class of convex functions for Orlicz structure to propose the generalized Sobolev transport (GST). GST encompasses the ST as its special case, and can be utilized for prior structures beyond the $L^p$ geometry. In connection with the OW, we show that one only needs to simply solve a univariate optimization problem to compute the GST, unlike the complex two-level optimization problem in OW. We empirically illustrate that GST is several-order faster than the OW. Moreover, we provide preliminary evidences on the advantages of GST for document classification and for several tasks in topological data analysis.

Updated: 2024-05-29 12:22:37

标题: 在一个图上的概率测度的generalized Sobolev传输

摘要: 我们研究了支持图度量空间上的测度的最优输运（OT）问题。最近，Le等人（2022年）利用图结构，提出了一种OT的变体，即Sobolev输运（ST），它提供了一个闭式表达式以便快速计算。然而，ST本质上与其定义中的$L^p$几何结构相耦合，这使得利用ST处理其他先验结构变得非常困难。相比之下，经典的OT通过修改基础成本函数具有适应各种几何结构的灵活性。一个重要的例子是Orlicz-Wasserstein（OW），它通过利用\emph{Orlicz几何结构}超越了$L^p$结构。与使用标准$p$阶Wasserstein相比，OW显著促进了某些机器学习方法的进展。然而，由于其两级优化公式，OW在计算上带来了新的挑战。在这项工作中，我们利用一类特定的凸函数用于Orlicz结构，提出了广义Sobolev输运（GST）。GST将ST作为其特殊情况，并可用于超越$L^p$几何的先验结构。与OW相关联，我们展示只需简单解决一个一元优化问题即可计算GST，而不像OW中复杂的两级优化问题。我们在实证上展示了GST比OW快几个数量级。此外，我们提供了GST在文档分类和拓扑数据分析中的几个任务中的优势的初步证据。

更新时间: 2024-05-29 12:22:37

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2402.04516v2

Large Language Models for Code Summarization

Recently, there has been increasing activity in using deep learning for software engineering, including tasks like code generation and summarization. In particular, the most recent coding Large Language Models seem to perform well on these problems. In this technical report, we aim to review how these models perform in code explanation/summarization, while also investigating their code generation capabilities (based on natural language descriptions).

Updated: 2024-05-29 12:18:51

标题: 大型语言模型用于代码摘要生成

摘要: 最近，在软件工程领域中，使用深度学习进行编程生成和总结等任务的活动日益增多。特别是，最近的编程大型语言模型似乎在这些问题上表现良好。在这份技术报告中，我们旨在审查这些模型在代码解释/总结方面的表现，同时也调查它们基于自然语言描述的代码生成能力。

更新时间: 2024-05-29 12:18:51

领域: cs.AI,cs.LG,cs.PL,cs.SE

下载: http://arxiv.org/abs/2405.19032v1

FedAL: Black-Box Federated Knowledge Distillation Enabled by Adversarial Learning

Knowledge distillation (KD) can enable collaborative learning among distributed clients that have different model architectures and do not share their local data and model parameters with others. Each client updates its local model using the average model output/feature of all client models as the target, known as federated KD. However, existing federated KD methods often do not perform well when clients' local models are trained with heterogeneous local datasets. In this paper, we propose Federated knowledge distillation enabled by Adversarial Learning (FedAL) to address the data heterogeneity among clients. First, to alleviate the local model output divergence across clients caused by data heterogeneity, the server acts as a discriminator to guide clients' local model training to achieve consensus model outputs among clients through a min-max game between clients and the discriminator. Moreover, catastrophic forgetting may happen during the clients' local training and global knowledge transfer due to clients' heterogeneous local data. Towards this challenge, we design the less-forgetting regularization for both local training and global knowledge transfer to guarantee clients' ability to transfer/learn knowledge to/from others. Experimental results show that FedAL and its variants achieve higher accuracy than other federated KD baselines.

Updated: 2024-05-29 12:17:52

标题: FedAL：通过对抗学习实现的黑盒联邦知识蒸馏

摘要: 知识蒸馏（KD）可以实现分布式客户端之间的协作学习，这些客户端具有不同的模型架构，并且不与其他客户端共享其本地数据和模型参数。每个客户端使用所有客户端模型的平均模型输出/特征作为目标来更新其本地模型，这被称为联邦知识蒸馏。然而，现有的联邦知识蒸馏方法在客户端的本地模型使用异构本地数据进行训练时通常表现不佳。在本文中，我们提出了一种通过对抗学习实现的联邦知识蒸馏（FedAL）来解决客户端之间的数据异质性。首先，为了减轻由数据异质性引起的客户端之间本地模型输出的分歧，服务器充当鉴别器，通过客户端和鉴别器之间的最小-最大博弈来引导客户端的本地模型训练，以实现客户端之间的一致模型输出。此外，在客户端的本地训练和全局知识传输过程中可能发生灾难性遗忘，这是由于客户端的异构本地数据引起的。为了应对这一挑战，我们设计了一种对本地训练和全局知识传输都适用的减少遗忘的正则化方法，以确保客户端能够将知识传输/学习给/从其他客户端。实验结果表明，FedAL及其变体的准确率比其他联邦知识蒸馏基线更高。

更新时间: 2024-05-29 12:17:52

领域: cs.LG,cs.DC

下载: http://arxiv.org/abs/2311.16584v2

Convex neural network synthesis for robustness in the 1-norm

With neural networks being used to control safety-critical systems, they increasingly have to be both accurate (in the sense of matching inputs to outputs) and robust. However, these two properties are often at odds with each other and a trade-off has to be navigated. To address this issue, this paper proposes a method to generate an approximation of a neural network which is certifiably more robust. Crucially, the method is fully convex and posed as a semi-definite programme. An application to robustifying model predictive control is used to demonstrate the results. The aim of this work is to introduce a method to navigate the neural network robustness/accuracy trade-off.

Updated: 2024-05-29 12:17:09

标题: 凸神经网络综合在1-范数中的鲁棒性

摘要: 随着神经网络被用于控制安全关键系统，它们越来越需要精确（即匹配输入和输出）和稳健。然而，这两个属性经常相互矛盾，需要权衡。为了解决这个问题，本文提出了一种方法来生成一个神经网络的近似，该近似在稳健性方面具有可证实的更强鲁棒性。关键地，该方法完全是凸的，并被表述为一个半定规划。将此方法应用于增强模型预测控制以演示结果。本研究的目的是引入一种方法来平衡神经网络的稳健性和精确性之间的权衡。

更新时间: 2024-05-29 12:17:09

领域: eess.SY,cs.AI,cs.SY

下载: http://arxiv.org/abs/2405.19029v1

Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement

Large Language Models (LLMs) have shown promise as intelligent agents in interactive decision-making tasks. Traditional approaches often depend on meticulously designed prompts, high-quality examples, or additional reward models for in-context learning, supervised fine-tuning, or RLHF. Reinforcement learning (RL) presents a dynamic alternative for LLMs to overcome these dependencies by engaging directly with task-specific environments. Nonetheless, it faces significant hurdles: 1) instability stemming from the exponentially vast action space requiring exploration; 2) challenges in assigning token-level credit based on action-level reward signals, resulting in discord between maximizing rewards and accurately modeling corpus data. In response to these challenges, we introduce Entropy-Regularized Token-level Policy Optimization (ETPO), an entropy-augmented RL method tailored for optimizing LLMs at the token level. At the heart of ETPO is our novel per-token soft Bellman update, designed to harmonize the RL process with the principles of language modeling. This methodology decomposes the Q-function update from a coarse action-level view to a more granular token-level perspective, backed by theoretical proof of optimization consistency. Crucially, this decomposition renders linear time complexity in action exploration. We assess the effectiveness of ETPO within a simulated environment that models data science code generation as a series of multi-step interactive tasks; results underline ETPO's potential as a robust method for refining the interactive decision-making capabilities of language agents. For a more detailed preliminary work describing our motivation for token-level decomposition and applying it in PPO methods, please refer to arXiv:2405.15821.

Updated: 2024-05-29 12:15:46

标题: 熵正则化的语言代理增强中的令牌级策略优化

摘要: 大型语言模型（LLMs）在互动决策任务中显示出智能代理的潜力。传统方法通常依赖于精心设计的提示，高质量的示例，或者额外的奖励模型用于上下文学习，监督微调或RLHF。强化学习（RL）为LLMs提供了一种动态替代方案，可以通过直接与任务特定环境互动来克服这些依赖性。然而，它面临着重要的障碍：1）由于需要探索的指数级庞大的行动空间而导致的不稳定性；2）基于动作级奖励信号分配令牌级信用的挑战，导致在最大化奖励和准确建模语料库数据之间存在不一致。针对这些挑战，我们引入了熵正则化的令牌级策略优化（ETPO），这是一种针对优化LLMs的令牌级的RL方法。ETPO的核心是我们的新颖的每令牌软Bellman更新，旨在将RL过程与语言建模原则协调一致。这种方法将Q函数更新从粗粒度的动作级视图分解为更细粒度的令牌级视角，支持优化一致性的理论证明。关键是，这种分解使得行动探索的时间复杂度线性。我们在一个模拟环境中评估了ETPO的有效性，该环境模拟数据科学代码生成作为一系列多步互动任务；结果突显了ETPO作为优化语言代理的互动决策能力的强大方法的潜力。关于我们对令牌级分解的动机描述和在PPO方法中应用的更详细的初步工作，请参考arXiv：2405.15821。

更新时间: 2024-05-29 12:15:46

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2402.06700v3

Nesting Particle Filters for Experimental Design in Dynamical Systems

In this paper, we propose a novel approach to Bayesian experimental design for non-exchangeable data that formulates it as risk-sensitive policy optimization. We develop the Inside-Out SMC$^2$ algorithm, a nested sequential Monte Carlo technique to infer optimal designs, and embed it into a particle Markov chain Monte Carlo framework to perform gradient-based policy amortization. Our approach is distinct from other amortized experimental design techniques, as it does not rely on contrastive estimators. Numerical validation on a set of dynamical systems showcases the efficacy of our method in comparison to other state-of-the-art strategies.

Updated: 2024-05-29 12:15:40

标题: 粒子滤波在动力系统实验设计中的应用

摘要: 在这篇论文中，我们提出了一种新颖的贝叶斯实验设计方法，针对非可交换数据，将其制定为风险敏感策略优化。我们开发了Inside-Out SMC$^2$算法，一种嵌套顺序蒙特卡洛技术，用于推断最佳设计，并将其嵌入到粒子马尔可夫链蒙特卡洛框架中，以执行基于梯度的策略摊销。我们的方法与其他摊销实验设计技术不同，因为它不依赖于对比估计器。在一组动态系统上进行的数值验证展示了我们的方法在与其他最先进策略相比的有效性。

更新时间: 2024-05-29 12:15:40

领域: stat.ML,cs.LG,stat.ME

下载: http://arxiv.org/abs/2402.07868v4

DiveR-CT: Diversity-enhanced Red Teaming with Relaxing Constraints

Recent advances in large language models (LLMs) have made them indispensable, raising significant concerns over managing their safety. Automated red teaming offers a promising alternative to the labor-intensive and error-prone manual probing for vulnerabilities, providing more consistent and scalable safety evaluations. However, existing approaches often compromise diversity by focusing on maximizing attack success rate. Additionally, methods that decrease the cosine similarity from historical embeddings with semantic diversity rewards lead to novelty stagnation as history grows. To address these issues, we introduce DiveR-CT, which relaxes conventional constraints on the objective and semantic reward, granting greater freedom for the policy to enhance diversity. Our experiments demonstrate DiveR-CT's marked superiority over baselines by 1) generating data that perform better in various diversity metrics across different attack success rate levels, 2) better-enhancing resiliency in blue team models through safety tuning based on collected data, 3) allowing dynamic control of objective weights for reliable and controllable attack success rates, and 4) reducing susceptibility to reward overoptimization. Project details and code can be found at https://andrewzh112.github.io/#diverct.

Updated: 2024-05-29 12:12:09

标题: DiveR-CT：通过放松约束增强多样性的红队行动

摘要: 最近对大型语言模型（LLMs）的最新进展使它们变得不可或缺，引发了对管理其安全性的重大关注。自动化红队测试提供了一种有希望的替代方案，可以减少繁重和容易出错的手动漏洞探测，提供更一致和可扩展的安全评估。然而，现有方法往往通过专注于最大化攻击成功率来牺牲多样性。此外，通过减少与历史嵌入的余弦相似度的方法，以语义多样性奖励导致新颖性停滞随着历史的增长。为了解决这些问题，我们介绍了DiveR-CT，它放宽了对目标和语义奖励的传统约束，为策略提供了更大的自由度来增强多样性。我们的实验表明，DiveR-CT在以下方面明显优于基线：1）生成在不同攻击成功率水平上表现更好的数据，2）通过基于收集数据的安全调整更好地增强蓝队模型的弹性，3）允许动态控制目标权重以获得可靠和可控的攻击成功率，4）减少对奖励的过度优化。项目详情和代码可在https://andrewzh112.github.io/#diverct找到。

更新时间: 2024-05-29 12:12:09

领域: cs.LG,cs.AI,cs.CL,cs.CR

下载: http://arxiv.org/abs/2405.19026v1

Random Inverse Problems Over Graphs: Decentralized Online Learning

We establish a framework of distributed random inverse problems over network graphs with online measurements, and propose a decentralized online learning algorithm. This unifies the distributed parameter estimation in Hilbert spaces and the least mean square problem in reproducing kernel Hilbert spaces (RKHS-LMS). We transform the convergence of the algorithm into the asymptotic stability of a class of inhomogeneous random difference equations in Hilbert spaces with L2-bounded martingale difference terms and develop the L2 -asymptotic stability theory in Hilbert spaces. It is shown that if the network graph is connected and the sequence of forward operators satisfies the infinite-dimensional spatio-temporal persistence of excitation condition, then the estimates of all nodes are mean square and almost surely strongly consistent. Moreover, we propose a decentralized online learning algorithm in RKHS based on non-stationary and non-independent online data streams, and prove that the algorithm is mean square and almost surely strongly consistent if the operators induced by the random input data satisfy the infinite-dimensional spatio-temporal persistence of excitation condition.

Updated: 2024-05-29 12:08:02

标题: 在图上的随机逆问题：分布式在线学习

摘要: 我们建立了一个基于网络图和在线测量的分布式随机逆问题框架，并提出了一个分散的在线学习算法。这将希尔伯特空间中的分布参数估计和再生核希尔伯特空间中的最小均方问题（RKHS-LMS）统一起来。我们将算法的收敛性转化为希尔伯特空间中一类不均匀随机差分方程的渐近稳定性，其中包含L2有界鞅差分项，并发展了希尔伯特空间中的L2渐近稳定理论。结果表明，如果网络图是连通的，并且正向算子序列满足无限维时空激励条件，那么所有节点的估计均为均方且几乎必然强一致的。此外，我们提出了一个基于非稳态和非独立在线数据流的RKHS中的分散式在线学习算法，并证明如果由随机输入数据诱导的算子满足无限维时空激励条件，则该算法是均方的且几乎必然强一致的。

更新时间: 2024-05-29 12:08:02

领域: cs.LG,cs.DC,cs.SY,eess.SY,math.PR

下载: http://arxiv.org/abs/2303.11789v5

Inverse Concave-Utility Reinforcement Learning is Inverse Game Theory

We consider inverse reinforcement learning problems with concave utilities. Concave Utility Reinforcement Learning (CURL) is a generalisation of the standard RL objective, which employs a concave function of the state occupancy measure, rather than a linear function. CURL has garnered recent attention for its ability to represent instances of many important applications including the standard RL such as imitation learning, pure exploration, constrained MDPs, offline RL, human-regularized RL, and others. Inverse reinforcement learning is a powerful paradigm that focuses on recovering an unknown reward function that can rationalize the observed behaviour of an agent. There has been recent theoretical advances in inverse RL where the problem is formulated as identifying the set of feasible reward functions. However, inverse RL for CURL problems has not been considered previously. In this paper we show that most of the standard IRL results do not apply to CURL in general, since CURL invalidates the classical Bellman equations. This calls for a new theoretical framework for the inverse CURL problem. Using a recent equivalence result between CURL and Mean-field Games, we propose a new definition for the feasible rewards for I-CURL by proving that this problem is equivalent to an inverse game theory problem in a subclass of mean-field games. We present initial query and sample complexity results for the I-CURL problem under assumptions such as Lipschitz-continuity. Finally, we outline future directions and applications in human--AI collaboration enabled by our results.

Updated: 2024-05-29 12:07:17

标题: 逆凹效用强化学习是逆博弈论

摘要: 我们考虑具有凹效用的逆强化学习问题。凹效用强化学习（CURL）是标准RL目标的一个泛化，它使用状态占用度量的凹函数，而不是线性函数。CURL近来备受关注，因为它能够表示许多重要应用的实例，包括标准RL，如模仿学习、纯探索、约束MDPs、离线RL、人类正则化RL等。逆强化学习是一个强大的范式，专注于恢复一个未知的奖励函数，可以合理解释代理的观察行为。最近在逆RL方面有了理论上的进展，其中问题被制定为识别可行奖励函数的集合。然而，以前并未考虑逆RL的CURL问题。在本文中，我们展示大多数标准IRL结果不适用于一般的CURL，因为CURL使经典的Bellman方程无效。这要求为逆CURL问题建立一个新的理论框架。利用CURL和均场博弈之间的最近等价结果，我们提出了一种新的I-CURL的可行奖励定义，通过证明这个问题等价于均场博弈子类中的逆博弈理论问题。我们在假设Lipschitz连续性等条件下，提供了I-CURL问题的初始查询和样本复杂性结果。最后，我们概述了由我们的结果实现的人工智能协作的未来方向和应用。

更新时间: 2024-05-29 12:07:17

领域: cs.LG,cs.AI,cs.GT,cs.MA

下载: http://arxiv.org/abs/2405.19024v1

Towards Standardizing AI Bias Exploration

Creating fair AI systems is a complex problem that involves the assessment of context-dependent bias concerns. Existing research and programming libraries express specific concerns as measures of bias that they aim to constrain or mitigate. In practice, one should explore a wide variety of (sometimes incompatible) measures before deciding which ones warrant corrective action, but their narrow scope means that most new situations can only be examined after devising new measures. In this work, we present a mathematical framework that distils literature measures of bias into building blocks, hereby facilitating new combinations to cover a wide range of fairness concerns, such as classification or recommendation differences across multiple multi-value sensitive attributes (e.g., many genders and races, and their intersections). We show how this framework generalizes existing concepts and present frequently used blocks. We provide an open-source implementation of our framework as a Python library, called FairBench, that facilitates systematic and extensible exploration of potential bias concerns.

Updated: 2024-05-29 12:03:45

标题: 走向标准化人工智能偏见探索

摘要: 创建公平的人工智能系统是一个复杂的问题，涉及评估依赖于上下文的偏见关注。现有的研究和编程库将特定关注点表达为偏见的度量，它们旨在限制或减轻偏见。在实践中，人们应该在决定哪些措施值得采取纠正行动之前，探索各种（有时不相容的）措施，但它们狭窄的范围意味着大多数新情况只能在设计新措施之后才能被检查。在这项工作中，我们提出了一个数学框架，将文献中关于偏见的度量转化为构建模块，从而便于形成新的组合，覆盖广泛的公平关注点，例如多个多值敏感属性（例如多种性别和种族，以及它们的交叉点）之间的分类或推荐差异。我们展示了这个框架如何推广现有概念，并呈现经常使用的模块。我们提供了我们的框架的开源实现，作为一个名为FairBench的Python库，促进了潜在偏见关注的系统化和可扩展的探索。

更新时间: 2024-05-29 12:03:45

领域: cs.LG,cs.CY,cs.HC

下载: http://arxiv.org/abs/2405.19022v1

InterpretCC: Intrinsic User-Centric Interpretability through Global Mixture of Experts

Interpretability for neural networks is a trade-off between three key requirements: 1) faithfulness of the explanation (i.e., how perfectly it explains the prediction), 2) understandability of the explanation by humans, and 3) model performance. Most existing methods compromise one or more of these requirements; e.g., post-hoc approaches provide limited faithfulness, automatically identified feature masks compromise understandability, and intrinsically interpretable methods such as decision trees limit model performance. These shortcomings are unacceptable for sensitive applications such as education and healthcare, which require trustworthy explanations, actionable interpretations, and accurate predictions. In this work, we present InterpretCC (interpretable conditional computation), a family of interpretable-by-design neural networks that guarantee human-centric interpretability, while maintaining comparable performance to state-of-the-art models by adaptively and sparsely activating features before prediction. We extend this idea into an interpretable, global mixture-of-experts (MoE) model that allows humans to specify topics of interest, discretely separates the feature space for each data point into topical subnetworks, and adaptively and sparsely activates these topical subnetworks for prediction. We apply variations of the InterpretCC architecture for text, time series and tabular data across several real-world benchmarks, demonstrating comparable performance with non-interpretable baselines, outperforming interpretable-by-design baselines, and showing higher actionability and usefulness according to a user study.

Updated: 2024-05-29 12:03:40

标题: InterpretCC：通过全局专家混合体实现内在用户中心的可解释性

摘要: 神经网络的可解释性在三个关键要求之间存在权衡：1）解释的忠实度（即，它如何完美地解释预测），2）解释对人类的可理解性，以及3）模型性能。大多数现有方法都牺牲了其中一个或多个要求；例如，事后方法提供了有限的忠实度，自动识别的特征掩码牺牲了可理解性，而决策树等固有可解释性方法限制了模型性能。这些缺点对于需要可信赖的解释、可行的解释和准确预测的敏感应用（如教育和医疗保健）是不可接受的。在这项工作中，我们提出了InterpretCC（可解释性条件计算），这是一组通过设计可解释性的神经网络，可以保证人类中心的解释性，同时保持与最先进模型相当的性能，通过在预测前自适应地和稀疏地激活特征。我们将这个想法扩展到一个可解释的全局专家混合（MoE）模型，允许人类指定感兴趣的主题，将每个数据点的特征空间离散分离为主题子网络，并自适应地和稀疏地激活这些主题子网络进行预测。我们在文本、时间序列和表格数据的几个真实基准测试中应用了InterpretCC架构的变体，展示了与非可解释基线相当的性能，胜过可解释性设计基线，并根据用户研究显示了更高的可行性和实用性。

更新时间: 2024-05-29 12:03:40

领域: cs.LG,cs.CY,cs.HC

下载: http://arxiv.org/abs/2402.02933v3

Physics-Aware Neural Implicit Solvers for multiscale, parametric PDEs with applications in heterogeneous media

We propose Physics-Aware Neural Implicit Solvers (PANIS), a novel, data-driven framework for learning surrogates for parametrized Partial Differential Equations (PDEs). It consists of a probabilistic, learning objective in which weighted residuals are used to probe the PDE and provide a source of {\em virtual} data i.e. the actual PDE never needs to be solved. This is combined with a physics-aware implicit solver that consists of a much coarser, discretized version of the original PDE, which provides the requisite information bottleneck for high-dimensional problems and enables generalization in out-of-distribution settings (e.g. different boundary conditions). We demonstrate its capability in the context of random heterogeneous materials where the input parameters represent the material microstructure. We extend the framework to multiscale problems and show that a surrogate can be learned for the effective (homogenized) solution without ever solving the reference problem. We further demonstrate how the proposed framework can accommodate and generalize several existing learning objectives and architectures while yielding probabilistic surrogates that can quantify predictive uncertainty.

Updated: 2024-05-29 12:01:49

标题: 物理感知神经隐式求解器用于多尺度、参数化偏微分方程在异质介质中的应用

摘要: 我们提出了一种名为Physics-Aware Neural Implicit Solvers (PANIS)的新颖、数据驱动的框架，用于学习参数化偏微分方程（PDEs）的替代模型。它由一个概率性学习目标组成，其中加权残差用于探测PDE并提供一种“虚拟”数据的来源，即实际的PDE永远不需要被解决。这与一个物理感知的隐式求解器相结合，该求解器由原始PDE的一个粗糙、离散化版本组成，为高维问题提供必要的信息瓶颈，并使得在分布外的设置（如不同的边界条件）中能够泛化。我们展示了其在随机异质材料的背景下的能力，其中输入参数代表材料微观结构。我们将这一框架扩展到多尺度问题，并展示了可以学习到有效（均匀化）解决方案的替代模型，而无需解决参考问题。我们进一步展示了所提出的框架如何适应和泛化几种现有的学习目标和架构，同时产生能够量化预测不确定性的概率替代模型。

更新时间: 2024-05-29 12:01:49

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2405.19019v1

Efficient Exploration in Average-Reward Constrained Reinforcement Learning: Achieving Near-Optimal Regret With Posterior Sampling

We present a new algorithm based on posterior sampling for learning in Constrained Markov Decision Processes (CMDP) in the infinite-horizon undiscounted setting. The algorithm achieves near-optimal regret bounds while being advantageous empirically compared to the existing algorithms. Our main theoretical result is a Bayesian regret bound for each cost component of $\tilde{O} (DS\sqrt{AT})$ for any communicating CMDP with $S$ states, $A$ actions, and diameter $D$. This regret bound matches the lower bound in order of time horizon $T$ and is the best-known regret bound for communicating CMDPs achieved by a computationally tractable algorithm. Empirical results show that our posterior sampling algorithm outperforms the existing algorithms for constrained reinforcement learning.

Updated: 2024-05-29 11:59:56

标题: 在平均奖励约束强化学习中高效探索：通过后验抽样实现接近最优遗憾

摘要: 我们提出了一种基于后验采样的新算法，用于在无限时域不打折扣的条件马尔可夫决策过程（CMDP）中学习。该算法在实证方面比现有算法具有明显优势，能够实现接近最优的后悔上界。我们的主要理论结果是对于每个成本组件的贝叶斯后悔上界为$\tilde{O} (DS\sqrt{AT})$，适用于具有$S$个状态，$A$个动作和直径$D$的任何通信CMDP。该后悔上界与时间跨度$T$的下界相匹配，是目前对于通信CMDP最佳已知的后悔上界，由一个可计算的易处理算法实现。实证结果表明，我们的后验采样算法在受限强化学习方面表现优于现有算法。

更新时间: 2024-05-29 11:59:56

领域: cs.LG

下载: http://arxiv.org/abs/2405.19017v1

Distributed Management of Fluctuating Energy Resources in Dynamic Networked Systems

Modern power systems integrate renewable distributed energy resources (DERs) as an environment-friendly enhancement to meet the ever-increasing demands. However, the inherent unreliability of renewable energy renders developing DER management algorithms imperative. We study the energy-sharing problem in a system consisting of several DERs. Each agent harvests and distributes renewable energy in its neighborhood to optimize the network's performance while minimizing energy waste. We model this problem as a bandit convex optimization problem with constraints that correspond to each node's limitations for energy production. We propose distributed decision-making policies to solve the formulated problem, where we utilize the notion of dynamic regret as the performance metric. We also include an adjustment strategy in our developed algorithm to reduce the constraint violations. Besides, we design a policy that deals with the non-stationary environment. Theoretical analysis shows the effectiveness of our proposed algorithm. Numerical experiments using a real-world dataset show superior performance of our proposal compared to state-of-the-art methods.

Updated: 2024-05-29 11:54:11

标题: 动态网络系统中波动能源资源的分布管理

摘要: 现代电力系统整合可再生分布式能源资源（DERs）作为一种环境友好的增强措施，以满足不断增长的需求。然而，可再生能源固有的不可靠性使得开发DER管理算法变得迫在眉睫。我们研究了一个由多个DER组成的系统中的能源共享问题。每个代理收集和分配其邻域内的可再生能源，以优化网络的性能同时最小化能源浪费。我们将这个问题建模为一个具有对应于每个节点能源生产限制的约束的赌博凸优化问题。我们提出了分布式决策策略来解决制定的问题，其中我们利用动态遗憾作为性能度量。我们还在我们开发的算法中包括了一个调整策略，以减少约束违反。此外，我们设计了一个处理非静态环境的策略。理论分析显示了我们提出的算法的有效性。使用真实数据集进行的数值实验显示，与最先进的方法相比，我们提出的方法表现出卓越的性能。

更新时间: 2024-05-29 11:54:11

领域: eess.SY,cs.LG,cs.MA,cs.SY

下载: http://arxiv.org/abs/2405.19015v1

Trust the Model Where It Trusts Itself -- Model-Based Actor-Critic with Uncertainty-Aware Rollout Adaption

Dyna-style model-based reinforcement learning (MBRL) combines model-free agents with predictive transition models through model-based rollouts. This combination raises a critical question: 'When to trust your model?'; i.e., which rollout length results in the model providing useful data? Janner et al. (2019) address this question by gradually increasing rollout lengths throughout the training. While theoretically tempting, uniform model accuracy is a fallacy that collapses at the latest when extrapolating. Instead, we propose asking the question 'Where to trust your model?'. Using inherent model uncertainty to consider local accuracy, we obtain the Model-Based Actor-Critic with Uncertainty-Aware Rollout Adaption (MACURA) algorithm. We propose an easy-to-tune rollout mechanism and demonstrate substantial improvements in data efficiency and performance compared to state-of-the-art deep MBRL methods on the MuJoCo benchmark.

Updated: 2024-05-29 11:53:07

标题: 相信模型相信自己——基于模型的带不确定性感知的演员-评论家算法的演变适应

摘要: 动态风格的基于模型的强化学习（MBRL）将无模型代理与通过基于模型的rollouts预测转移模型相结合。这种组合提出了一个关键问题：“何时信任您的模型？”即，哪种rollout长度会使模型提供有用数据？Janner等人（2019年）通过在训练过程中逐渐增加rollout长度来解决这个问题。虽然在理论上很诱人，但均匀的模型准确性是一个在外推时最终崩溃的谬论。相反，我们建议提出问题“在哪里信任您的模型？”利用固有的模型不确定性来考虑局部准确性，我们得到了基于模型的演员-评论家算法，其中包括考虑不确定性的rollout调整（MACURA）。我们提出了一种易于调整的rollout机制，并展示了与MuJoCo基准测试上的最先进的深度MBRL方法相比，在数据效率和性能方面取得了显著的改进。

更新时间: 2024-05-29 11:53:07

领域: cs.LG

下载: http://arxiv.org/abs/2405.19014v1

On Dissipativity of Cross-Entropy Loss in Training ResNets

The training of ResNets and neural ODEs can be formulated and analyzed from the perspective of optimal control. This paper proposes a dissipative formulation of the training of ResNets and neural ODEs for classification problems by including a variant of the cross-entropy as a regularization in the stage cost. Based on the dissipative formulation of the training, we prove that the trained ResNet exhibit the turnpike phenomenon. We then illustrate that the training exhibits the turnpike phenomenon by training on the two spirals and MNIST datasets. This can be used to find very shallow networks suitable for a given classification task.

Updated: 2024-05-29 11:52:53

标题: 关于ResNets训练中交叉熵损失的耗散性

摘要: ResNets和神经ODE的训练可以从最优控制的角度进行建模和分析。本文提出了一种耗散形式的ResNets和神经ODE的训练方法，通过在阶段成本中包含交叉熵的变体作为正则化。基于训练的耗散形式，我们证明训练后的ResNet表现出转角现象。然后，我们通过在两个螺旋和MNIST数据集上进行训练来展示训练过程中的转角现象。这可以用于找到适合特定分类任务的非常浅的网络。

更新时间: 2024-05-29 11:52:53

领域: cs.LG,cs.SY,eess.SY,math.OC

下载: http://arxiv.org/abs/2405.19013v1

Implicit Neural Image Field for Biological Microscopy Image Compression

The rapid pace of innovation in biological microscopy imaging has led to large images, putting pressure on data storage and impeding efficient sharing, management, and visualization. This necessitates the development of efficient compression solutions. Traditional CODEC methods struggle to adapt to the diverse bioimaging data and often suffer from sub-optimal compression. In this study, we propose an adaptive compression workflow based on Implicit Neural Representation (INR). This approach permits application-specific compression objectives, capable of compressing images of any shape and arbitrary pixel-wise decompression. We demonstrated on a wide range of microscopy images from real applications that our workflow not only achieved high, controllable compression ratios (e.g., 512x) but also preserved detailed information critical for downstream analysis.

Updated: 2024-05-29 11:51:33

标题: 隐式神经图像场用于生物显微镜图像压缩

摘要: 生物显微镜成像技术的快速创新速度导致图像尺寸变大，增加了数据存储压力，阻碍了有效的共享、管理和可视化。这需要开发高效的压缩解决方案。传统的编解码（CODEC）方法很难适应多样化的生物成像数据，并经常遭受次优压缩的问题。在这项研究中，我们提出了一种基于隐式神经表示（INR）的自适应压缩工作流程。这种方法允许应用特定的压缩目标，能够压缩任意形状和任意像素解压缩的图像。我们在来自真实应用的广泛显微镜图像上展示了，我们的工作流程不仅实现了高、可控的压缩比（例如512倍），还保留了对下游分析至关重要的详细信息。

更新时间: 2024-05-29 11:51:33

领域: cs.AI

下载: http://arxiv.org/abs/2405.19012v1

Convergence Conditions of Online Regularized Statistical Learning in Reproducing Kernel Hilbert Space With Non-Stationary Data

We study the convergence of recursive regularized learning algorithms in the reproducing kernel Hilbert space (RKHS) with dependent and non-stationary online data streams. Firstly, we study the mean square asymptotic stability of a class of random difference equations in RKHS, whose non-homogeneous terms are martingale difference sequences dependent on the homogeneous ones. Secondly, we introduce the concept of random Tikhonov regularization path, and show that if the regularization path is slowly time-varying in some sense, then the output of the algorithm is consistent with the regularization path in mean square. Furthermore, if the data streams also satisfy the RKHS persistence of excitation condition, i.e. there exists a fixed length of time period, such that each eigenvalue of the conditional expectation of the operators induced by the input data accumulated over every time period has a uniformly positive lower bound with respect to time, then the output of the algorithm is consistent with the unknown function in mean square. Finally, for the case with independent and non-identically distributed data streams, the algorithm achieves the mean square consistency provided the marginal probability measures induced by the input data are slowly time-varying and the average measure over each fixed-length time period has a uniformly strictly positive lower bound.

Updated: 2024-05-29 11:48:59

标题: Reproducing Kernel Hilbert空间中在线正则化统计学习的收敛条件与非稳态数据

摘要: 我们研究了在具有相关性和非平稳在线数据流的再生核希尔伯特空间（RKHS）中递归正则化学习算法的收敛性。首先，我们研究了RKHS中一类随机差分方程的均方渐近稳定性，其中非齐次项是依赖于齐次项的鞅差分序列。其次，我们引入了随机Tikhonov正则化路径的概念，并表明如果正则化路径在某种意义上缓慢变化，那么算法的输出与正则化路径在均方意义上是一致的。此外，如果数据流还满足RKHS持续激励条件，即存在一个固定长度的时间段，使得每个时间段内累积的输入数据诱导的算子的条件期望的每个特征值都对时间有一个均匀正下界，那么算法的输出与未知函数在均方上是一致的。最后，对于具有独立且非同分布数据流的情况下，算法在边际概率测度由输入数据引起缓慢变化且每个固定长度时间段的平均测度具有均匀严格正下界的情况下，能够实现均方一致性。

更新时间: 2024-05-29 11:48:59

领域: cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2404.03211v3

Evaluating the External and Parametric Knowledge Fusion of Large Language Models

Integrating external knowledge into large language models (LLMs) presents a promising solution to overcome the limitations imposed by their antiquated and static parametric memory. Prior studies, however, have tended to over-reliance on external knowledge, underestimating the valuable contributions of an LLMs' intrinsic parametric knowledge. The efficacy of LLMs in blending external and parametric knowledge remains largely unexplored, especially in cases where external knowledge is incomplete and necessitates supplementation by their parametric knowledge. We propose to deconstruct knowledge fusion into four distinct scenarios, offering the first thorough investigation of LLM behavior across each. We develop a systematic pipeline for data construction and knowledge infusion to simulate these fusion scenarios, facilitating a series of controlled experiments. Our investigation reveals that enhancing parametric knowledge within LLMs can significantly bolster their capability for knowledge integration. Nonetheless, we identify persistent challenges in memorizing and eliciting parametric knowledge, and determining parametric knowledge boundaries. Our findings aim to steer future explorations on harmonizing external and parametric knowledge within LLMs.

Updated: 2024-05-29 11:48:27

标题: 评估大型语言模型的外部和参数化知识融合

摘要: 将外部知识整合到大型语言模型（LLMs）中是克服其古老和静态参数化内存所施加限制的一个有前途的解决方案。然而，先前的研究往往过于依赖外部知识，低估了LLMs固有参数化知识的宝贵贡献。LLMs在融合外部和参数化知识方面的有效性仍然大部分未被探索，特别是在外部知识不完整并需要通过其参数化知识进行补充的情况下。我们提出将知识融合分解为四种不同的场景，首次全面调查LLMs在每种场景下的行为。我们开发了一个系统化的数据构建和知识注入流程，模拟这些融合场景，促进一系列受控实验。我们的调查显示，增强LLMs内部参数化知识可以显著增强它们整合知识的能力。然而，我们发现在记忆和引出参数化知识以及确定参数化知识边界方面存在持久的挑战。我们的发现旨在引领未来关于在LLMs中协调外部和参数化知识的探索。

更新时间: 2024-05-29 11:48:27

领域: cs.CL,cs.AI,cs.IR

下载: http://arxiv.org/abs/2405.19010v1

Individual and Contextual Variables of Cyber Security Behaviour -- An empirical analysis of national culture, industry, organisation, and individual variables of (in)secure human behaviour

Cyber security incidents are increasing and humans play an important role in reducing their likelihood and impact. We identify a skewed focus towards technical aspects of cyber security in the literature, whereas factors influencing the secure behaviour of individuals require additional research. These factors span across both the individual level and the contextual level in which the people are situated. We analyse two datasets of a total of 37,075 records from a) self-reported security behaviours across the EU, and b) observed phishing-related behaviours from the industry security awareness training programmes. We identify that national culture, industry type, and organisational security culture play are influential Variables (antecedents) of individuals' security behaviour at contextual level. Whereas, demographics (age, gender, and level or urbanisation) and security-specific factors (security awareness, security knowledge, and prior experience with security incidents) are found to be influential variables of security behaviour at individual level. Our findings have implications for both research and practice as they fill a gap in the literature and provide concrete statistical evidence on the variables which influence security behaviour. Moreover, findings provides practical insights for organisations regarding the susceptibility of groups of people to insecure behaviour. Consequently, organisations can tailor their security training and awareness efforts (e.g., through behaviour change interventions and/or appropriate employee group profiles), adapt their communications (e.g., of information security policies), and customise their interventions according to national culture characteristics to improve security behaviour.

Updated: 2024-05-29 11:45:21

标题: 个人和背景因素对网络安全行为的影响——国家文化、行业、组织和个人因素对（不）安全人类行为的实证分析

摘要: 网络安全事件正在增加，人类在减少其发生可能性和影响方面起着重要作用。我们发现文献中对网络安全技术方面的关注存在偏向，而影响个人安全行为的因素需要进一步研究。这些因素跨越个体层面和个体所处的情境层面。我们分析了两个数据集，总共包括来自欧盟的自报告安全行为和来自行业安全意识培训计划的观察到的钓鱼相关行为的37,075条记录。我们发现国家文化、行业类型和组织安全文化是影响个体安全行为的情境层面的关键变量（前因变量）。而人口统计学因素（年龄、性别和城市化水平）和安全特定因素（安全意识、安全知识和先前安全事件经验）被发现是个体安全行为的影响变量。我们的研究结果对于研究和实践都具有重要意义，因为它们填补了文献中的空白，并提供了对影响安全行为的变量的具体统计证据。此外，研究结果为组织提供了关于不安全行为易感性的实际见解。因此，组织可以根据国家文化特征定制他们的安全培训和意识努力（例如，通过行为变化干预和/或适当的员工群体概况），调整他们的通信（例如，信息安全政策），并根据国家文化特征定制他们的干预措施以改善安全行为。

更新时间: 2024-05-29 11:45:21

领域: cs.CR

下载: http://arxiv.org/abs/2405.16215v2

ParFam -- (Neural Guided) Symbolic Regression Based on Continuous Global Optimization

The problem of symbolic regression (SR) arises in many different applications, such as identifying physical laws or deriving mathematical equations describing the behavior of financial markets from given data. Various methods exist to address the problem of SR, often based on genetic programming. However, these methods are usually complicated and involve various hyperparameters. In this paper, we present our new approach ParFam that utilizes parametric families of suitable symbolic functions to translate the discrete symbolic regression problem into a continuous one, resulting in a more straightforward setup compared to current state-of-the-art methods. In combination with a global optimizer, this approach results in a highly effective method to tackle the problem of SR. We theoretically analyze the expressivity of ParFam and demonstrate its performance with extensive numerical experiments based on the common SR benchmark suit SRBench, showing that we achieve state-of-the-art results. Moreover, we present an extension incorporating a pre-trained transformer network DL-ParFam to guide ParFam, accelerating the optimization process by up to two magnitudes. Our code and results can be found at https://github.com/Philipp238/parfam.

Updated: 2024-05-29 11:41:47

标题: ParFam -- (神经引导) 基于连续全局优化的符号回归

摘要: 符号回归（SR）问题在许多不同的应用中出现，例如识别物理定律或从给定数据推导描述金融市场行为的数学方程。存在各种方法来解决SR问题，通常基于遗传编程。然而，这些方法通常复杂，并涉及各种超参数。在本文中，我们提出了我们的新方法ParFam，利用适合的符号函数的参数化系列将离散符号回归问题转化为连续问题，与当前最先进的方法相比，设置更为简单。结合全局优化器，这种方法可以高效地解决SR问题。我们在理论上分析了ParFam的表达能力，并通过基于常见SR基准套装SRBench的大量数值实验展示了其性能，表明我们实现了最先进的结果。此外，我们提出了一个扩展，将预训练的转换器网络DL-ParFam纳入ParFam，通过高达两个数量级的加速优化过程。我们的代码和结果可以在https://github.com/Philipp238/parfam找到。

更新时间: 2024-05-29 11:41:47

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2310.05537v3

A Canonization Perspective on Invariant and Equivariant Learning

In many applications, we desire neural networks to exhibit invariance or equivariance to certain groups due to symmetries inherent in the data. Recently, frame-averaging methods emerged to be a unified framework for attaining symmetries efficiently by averaging over input-dependent subsets of the group, i.e., frames. What we currently lack is a principled understanding of the design of frames. In this work, we introduce a canonization perspective that provides an essential and complete view of the design of frames. Canonization is a classic approach for attaining invariance by mapping inputs to their canonical forms. We show that there exists an inherent connection between frames and canonical forms. Leveraging this connection, we can efficiently compare the complexity of frames as well as determine the optimality of certain frames. Guided by this principle, we design novel frames for eigenvectors that are strictly superior to existing methods -- some are even optimal -- both theoretically and empirically. The reduction to the canonization perspective further uncovers equivalences between previous methods. These observations suggest that canonization provides a fundamental understanding of existing frame-averaging methods and unifies existing equivariant and invariant learning methods.

Updated: 2024-05-29 11:31:19

标题: 一个关于不变学习和等变学习的规范化视角

摘要: 在许多应用中，我们希望神经网络对数据中固有的对称性表现出不变性或等变性。最近，出现了一种以平均帧为基础的方法，通过对输入相关的群的子集（即帧）进行平均，有效地实现对称性。我们目前缺乏对帧设计的原则性理解。在这项工作中，我们引入了一个规范化的视角，提供了对帧设计的基本和完整的视图。规范化是一种通过将输入映射到它们的规范形式来实现不变性的经典方法。我们表明帧和规范形式之间存在固有联系。利用这种联系，我们可以有效地比较帧的复杂性，并确定某些帧的最优性。在这一原则的指导下，我们设计了针对特征向量的新颖帧，这些帧在理论上和实践中严格优于现有方法，甚至有些是最优的。对规范化视角的归纳进一步揭示了先前方法之间的等价性。这些观察结果表明，规范化提供了对现有平均帧方法的基本理解，并统一了现有的等变和不变学习方法。

更新时间: 2024-05-29 11:31:19

领域: cs.LG

下载: http://arxiv.org/abs/2405.18378v2

FedMAP: Unlocking Potential in Personalized Federated Learning through Bi-Level MAP Optimization

Federated Learning (FL) enables collaborative training of machine learning models on decentralized data while preserving data privacy. However, data across clients often differs significantly due to class imbalance, feature distribution skew, sample size imbalance, and other phenomena. Leveraging information from these not identically distributed (non-IID) datasets poses substantial challenges. FL methods based on a single global model cannot effectively capture the variations in client data and underperform in non-IID settings. Consequently, Personalized FL (PFL) approaches that adapt to each client's data distribution but leverage other clients' data are essential but currently underexplored. We propose a novel Bayesian PFL framework using bi-level optimization to tackle the data heterogeneity challenges. Our proposed framework utilizes the global model as a prior distribution within a Maximum A Posteriori (MAP) estimation of personalized client models. This approach facilitates PFL by integrating shared knowledge from the prior, thereby enhancing local model performance, generalization ability, and communication efficiency. We extensively evaluated our bi-level optimization approach on real-world and synthetic datasets, demonstrating significant improvements in model accuracy compared to existing methods while reducing communication overhead. This study contributes to PFL by establishing a solid theoretical foundation for the proposed method and offering a robust, ready-to-use framework that effectively addresses the challenges posed by non-IID data in FL.

Updated: 2024-05-29 11:28:06

标题: FedMAP：通过双层MAP优化释放个性化联邦学习的潜力

摘要: 联邦学习（FL）使得在去中心化数据上进行机器学习模型的协作训练成为可能，同时保护数据隐私。然而，由于类别不平衡、特征分布倾斜、样本大小不平衡等现象，客户端之间的数据往往存在显著差异。利用这些非独立同分布（non-IID）数据集中的信息带来了重大挑战。基于单一全局模型的FL方法无法有效捕捉客户端数据的变化，在非IID环境中表现不佳。因此，个性化FL（PFL）方法适应每个客户端的数据分布但利用其他客户端数据至关重要，但目前尚未充分探索。我们提出了一种新颖的基于贝叶斯的PFL框架，使用双层优化来解决数据异质性挑战。我们提出的框架利用全局模型作为先验分布，在最大后验估计（MAP）中个性化客户端模型。这种方法通过整合来自先验的共享知识，促进PFL，从而增强局部模型性能、泛化能力和通信效率。我们在真实和合成数据集上广泛评估了我们的双层优化方法，展示了与现有方法相比在模型准确性方面显著改进的同时减少了通信开销。这项研究通过为所提出的方法建立坚实的理论基础，提供一个强大而易于使用的框架，有效地解决了FL中非IID数据带来的挑战，为PFL做出了贡献。

更新时间: 2024-05-29 11:28:06

领域: cs.LG

下载: http://arxiv.org/abs/2405.19000v1

Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion

Editing signals using large pre-trained models, in a zero-shot manner, has recently seen rapid advancements in the image domain. However, this wave has yet to reach the audio domain. In this paper, we explore two zero-shot editing techniques for audio signals, which use DDPM inversion with pre-trained diffusion models. The first, which we coin ZEro-shot Text-based Audio (ZETA) editing, is adopted from the image domain. The second, named ZEro-shot UnSupervized (ZEUS) editing, is a novel approach for discovering semantically meaningful editing directions without supervision. When applied to music signals, this method exposes a range of musically interesting modifications, from controlling the participation of specific instruments to improvisations on the melody. Samples and code can be found in https://hilamanor.github.io/AudioEditing/ .

Updated: 2024-05-29 11:27:24

标题: 零样本无监督和基于文本的音频编辑使用DDPM反演

摘要: 使用大型预训练模型，在零样本方式下编辑信号，在图像领域最近取得了快速进展。然而，这股浪潮尚未触及音频领域。本文中，我们探索了两种用于音频信号的零样本编辑技术，使用预训练扩散模型进行DDPM反演。第一种被我们称为基于文本的零样本音频（ZETA）编辑，源自图像领域。第二种被称为零样本无监督（ZEUS）编辑，是一种新颖方法，用于在没有监督的情况下发现语义上有意义的编辑方向。当应用于音乐信号时，这种方法展示了一系列具有音乐趣味性的修改，从控制特定乐器的参与到对旋律的即兴演奏。样本和代码可以在https://hilamanor.github.io/AudioEditing/找到。

更新时间: 2024-05-29 11:27:24

领域: cs.SD,cs.LG,eess.AS

下载: http://arxiv.org/abs/2402.10009v4

Continuously Optimizing Radar Placement with Model Predictive Path Integrals

Continuously optimizing sensor placement is essential for precise target localization in various military and civilian applications. While information theory has shown promise in optimizing sensor placement, many studies oversimplify sensor measurement models or neglect dynamic constraints of mobile sensors. To address these challenges, we employ a range measurement model that incorporates radar parameters and radar-target distance, coupled with Model Predictive Path Integral (MPPI) control to manage complex environmental obstacles and dynamic constraints. We compare the proposed approach against stationary radars or simplified range measurement models based on the root mean squared error (RMSE) of the Cubature Kalman Filter (CKF) estimator for the targets' state. Additionally, we visualize the evolving geometry of radars and targets over time, highlighting areas of highest measurement information gain, demonstrating the strengths of the approach. The proposed strategy outperforms stationary radars and simplified range measurement models in target localization, achieving a 38-74% reduction in mean RMSE and a 33-79% reduction in the upper tail of the 90% Highest Density Interval (HDI) over 500 Monte Carl (MC) trials across all time steps. Code will be made publicly available upon acceptance.

Updated: 2024-05-29 11:25:53

标题: 不断优化雷达部署：模型预测路径积分

摘要: 持续优化传感器布置对于在各种军事和民用应用中精确定位目标至关重要。虽然信息理论在优化传感器布置方面显示出潜力，但许多研究过于简化传感器测量模型或忽视移动传感器的动态约束。为了解决这些挑战，我们采用了一个集成了雷达参数和雷达-目标距离的距离测量模型，结合模型预测路径积分（MPPI）控制来管理复杂的环境障碍和动态约束。我们通过Cubature Kalman Filter (CKF)估计器的均方根误差（RMSE）比较了所提出的方法与静止雷达或基于简化距离测量模型的方法。此外，我们可视化雷达和目标随时间演变的几何形状，突出显示出具有最高测量信息增益的区域，展示了该方法的优势。所提出的策略在目标定位方面优于静止雷达和简化的距离测量模型，实现了平均RMSE减少38-74%和在所有时间步骤上90%最高密度区间（HDI）上尾部减少33-79%的结果，在500次蒙特卡洛（MC）试验中。接受后将提供代码公开可用。

更新时间: 2024-05-29 11:25:53

领域: stat.AP,cs.AI,cs.RO

下载: http://arxiv.org/abs/2405.18999v1

Kernel Semi-Implicit Variational Inference

Semi-implicit variational inference (SIVI) extends traditional variational families with semi-implicit distributions defined in a hierarchical manner. Due to the intractable densities of semi-implicit distributions, classical SIVI often resorts to surrogates of evidence lower bound (ELBO) that would introduce biases for training. A recent advancement in SIVI, named SIVI-SM, utilizes an alternative score matching objective made tractable via a minimax formulation, albeit requiring an additional lower-level optimization. In this paper, we propose kernel SIVI (KSIVI), a variant of SIVI-SM that eliminates the need for lower-level optimization through kernel tricks. Specifically, we show that when optimizing over a reproducing kernel Hilbert space (RKHS), the lower-level problem has an explicit solution. This way, the upper-level objective becomes the kernel Stein discrepancy (KSD), which is readily computable for stochastic gradient descent due to the hierarchical structure of semi-implicit variational distributions. An upper bound for the variance of the Monte Carlo gradient estimators of the KSD objective is derived, which allows us to establish novel convergence guarantees of KSIVI. We demonstrate the effectiveness and efficiency of KSIVI on both synthetic distributions and a variety of real data Bayesian inference tasks.

Updated: 2024-05-29 11:21:25

标题: 核半隐变分推断

摘要: 半隐式变分推断（SIVI）通过以层次化方式定义的半隐式分布扩展了传统的变分家族。由于半隐式分布的难以处理的密度，经典的SIVI通常会采用证据下限（ELBO）的替代物，这会引入训练偏差。最近在SIVI领域的一个进展，名为SIVI-SM，利用了通过极小极大形式可行的替代得分匹配目标，尽管需要额外的低级优化。在本文中，我们提出了核SIVI（KSIVI），这是SIVI-SM的一个变种，通过核技巧消除了对低级优化的需求。具体地，我们展示了在优化再生核希尔伯特空间（RKHS）时，低级问题有一个显式解决方案。这样，上层目标成为核斯坦距离（KSD），由于半隐式变分分布的层次结构，KSD目标对于随机梯度下降是可计算的。我们推导了KSD目标的蒙特卡洛梯度估计器方差的上界，这使我们能够建立KSIVI的新收敛保证。我们展示了KSIVI在合成分布和各种真实数据的贝叶斯推断任务上的有效性和效率。

更新时间: 2024-05-29 11:21:25

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2405.18997v1

ParsEval: Evaluation of Parsing Behavior using Real-world Out-in-the-wild X.509 Certificates

X.509 certificates play a crucial role in establishing secure communication over the internet by enabling authentication and data integrity. Equipped with a rich feature set, the X.509 standard is defined by multiple, comprehensive ISO/IEC documents. Due to its internet-wide usage, there are different implementations in multiple programming languages leading to a large and fragmented ecosystem. This work addresses the research question "Are there user-visible and security-related differences between X.509 certificate parsers?". Relevant libraries offering APIs for parsing X.509 certificates were investigated and an appropriate test suite was developed. From 34 libraries 6 were chosen for further analysis. The X.509 parsing modules of the chosen libraries were called with 186,576,846 different certificates from a real-world dataset and the observed error codes were investigated. This study reveals an anomaly in wolfSSL's X.509 parsing module and that there are fundamental differences in the ecosystem. While related studies nowadays mostly focus on fuzzing techniques resulting in artificial certificates, this study confirms that available X.509 parsing modules differ largely and yield different results, even for real-world out-in-the-wild certificates.

Updated: 2024-05-29 11:15:12

标题: ParsEval：使用真实世界中的X.509证书评估解析行为

摘要: X.509证书在建立互联网安全通信中发挥关键作用，通过实现身份验证和数据完整性。X.509标准具有丰富的功能集，由多个全面的ISO/IEC文档定义。由于其在整个互联网上的使用，不同的编程语言实现了多种实现，导致一个庞大且分散的生态系统。本研究探讨了一个研究问题：“X.509证书解析器之间是否存在用户可见和安全相关的差异？”。调查了提供用于解析X.509证书的API的相关库，并开发了一个适当的测试套件。从34个库中选择了6个进行进一步分析。选择的库的X.509解析模块被调用了186,576,846个不同证书来自一个真实数据集，并调查了观察到的错误代码。这项研究揭示了wolfSSL的X.509解析模块中的异常，并且生态系统中存在根本差异。尽管相关研究现在主要关注导致人工证书的模糊技术，但这项研究确认可用的X.509解析模块存在较大差异，并产生不同结果，即使是对于真实世界中的野外证书。

更新时间: 2024-05-29 11:15:12

领域: cs.CR

下载: http://arxiv.org/abs/2405.18993v1

Tackling Cyberattacks through AI-based Reactive Systems: A Holistic Review and Future Vision

There is no denying that the use of Information Technology (IT) is undergoing exponential growth in today's world. This digital transformation has also given rise to a multitude of security challenges, notably in the realm of cybercrime. In response to these growing threats, public and private sectors have prioritized the strengthening of IT security measures. In light of the growing security concern, Artificial Intelligence (AI) has gained prominence within the cybersecurity landscape. This paper presents a comprehensive survey of recent advancements in AI-driven threat response systems. To the best of our knowledge, the most recent survey covering the AI reaction domain was conducted in 2017. Since then, considerable literature has been published, and therefore, it is worth reviewing it. In this comprehensive survey of the state of the art reaction systems, five key features with multiple values have been identified, facilitating a homogeneous comparison between the different works. In addition, through a meticulous methodology of article collection, the 22 most relevant publications in the field have been selected. Then each of these publications has been subjected to a detailed analysis using the features identified, which has allowed for the generation of a comprehensive overview revealing significant relationships between the papers. These relationships are further elaborated in the paper, along with the identification of potential gaps in the literature, which may guide future contributions. A total of seven research challenges have been identified, pointing out these potential gaps and suggesting possible areas of development through concrete proposals.

Updated: 2024-05-29 11:14:12

标题: 通过基于人工智能的反应性系统应对网络攻击：一项全面审查及未来展望

摘要: 在当今世界，信息技术（IT）的使用正在经历指数增长，这是不可否认的。这种数字转型也带来了许多安全挑战，特别是在网络犯罪领域。为了应对不断增长的威胁，公共和私营部门已经将加强IT安全措施作为优先事项。鉴于日益增长的安全担忧，人工智能（AI）在网络安全领域日益受到重视。本文介绍了人工智能驱动的威胁响应系统的最新进展的综合调查。据我们所知，最近一次涵盖AI反应领域的调查是在2017年进行的。自那时以来，已经发表了大量文献，因此值得进行回顾。在这份关于最新反应系统状况的综合调查中，已经确定了五个关键特征，具有多个值，从而便于不同作品之间的同质比较。此外，通过精心收集文章的方法论，选择了该领域中最相关的22篇出版物。然后，对每一篇出版物进行了详细分析，使用所识别的特征，这使得能够生成一个全面的概述，揭示了论文之间的重要关系。这些关系在论文中进一步阐述，同时还确定了文献中的潜在空白，可指导未来的贡献。共识别出了七个研究挑战，指出这些潜在空白，并通过具体提议建议可能的发展领域。

更新时间: 2024-05-29 11:14:12

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2312.06229v2

Regularized Q-learning through Robust Averaging

We propose a new Q-learning variant, called 2RA Q-learning, that addresses some weaknesses of existing Q-learning methods in a principled manner. One such weakness is an underlying estimation bias which cannot be controlled and often results in poor performance. We propose a distributionally robust estimator for the maximum expected value term, which allows us to precisely control the level of estimation bias introduced. The distributionally robust estimator admits a closed-form solution such that the proposed algorithm has a computational cost per iteration comparable to Watkins' Q-learning. For the tabular case, we show that 2RA Q-learning converges to the optimal policy and analyze its asymptotic mean-squared error. Lastly, we conduct numerical experiments for various settings, which corroborate our theoretical findings and indicate that 2RA Q-learning often performs better than existing methods.

Updated: 2024-05-29 11:12:24

标题: 通过鲁棒平均实现正则化的Q学习

摘要: 我们提出了一种新的Q-learning变体，称为2RA Q-learning，它以原则性的方式解决了现有Q-learning方法的一些弱点。其中一个弱点是无法控制的潜在估计偏差，通常导致性能不佳。我们提出了一个分布鲁棒估计器用于最大期望值项，这使我们能够精确控制引入的估计偏差水平。分布鲁棒估计器具有封闭形式的解决方案，使得所提出的算法的每次迭代的计算成本与Watkins的Q-learning相当。对于表格案例，我们展示了2RA Q-learning收敛到最优策略，并分析其渐近均方误差。最后，我们进行了各种设置的数值实验，证实了我们的理论发现，并表明2RA Q-learning通常比现有方法表现更好。

更新时间: 2024-05-29 11:12:24

领域: math.OC,cs.LG

下载: http://arxiv.org/abs/2405.02201v2

GRAMMAR: Grounded and Modular Methodology for Assessment of Domain-Specific Retrieval-Augmented Language Model

Retrieval-augmented Generation (RAG) systems have been actively studied and deployed across various industries to query on domain-specific knowledge base. However, evaluating these systems presents unique challenges due to the scarcity of domain-specific queries and corresponding ground truths, as well as a lack of systematic approaches to diagnosing the cause of failure cases -- whether they stem from knowledge deficits or issues related to system robustness. To address these challenges, we introduce GRAMMAR (GRounded And Modular Methodology for Assessment of RAG), an evaluation framework comprising two key elements: 1) a data generation process that leverages relational databases and LLMs to efficiently produce scalable query-answer pairs. This method facilitates the separation of query logic from linguistic variations for enhanced debugging capabilities; and 2) an evaluation framework that differentiates knowledge gaps from robustness and enables the identification of defective modules. Our empirical results underscore the limitations of current reference-free evaluation approaches and the reliability of GRAMMAR to accurately identify model vulnerabilities.

Updated: 2024-05-29 11:12:21

标题: 语法：领域特定检索增强语言模型评估的基础和模块化方法论

摘要: 检索增强生成（RAG）系统已被积极研究和部署在各个行业，以查询领域特定的知识库。然而，评估这些系统面临独特挑战，因为领域特定查询和相应的基本事实稀缺，缺乏系统性方法来诊断失败案例的原因--无论是源于知识缺陷还是与系统鲁棒性相关的问题。为了解决这些挑战，我们介绍了GRAMMAR（GRounded And Modular Methodology for Assessment of RAG），一个评估框架，包括两个关键元素：1）利用关系数据库和LLM来高效生成可扩展的查询-答案对的数据生成过程。这种方法有助于将查询逻辑与语言变化分离，以提高调试能力；和2）一个评估框架，区分知识缺口和鲁棒性，并能够识别有缺陷的模块。我们的实证结果强调了当前无参考评估方法的局限性，以及GRAMMAR准确识别模型漏洞的可靠性。

更新时间: 2024-05-29 11:12:21

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2404.19232v4

InstructVid2Vid: Controllable Video Editing with Natural Language Instructions

We introduce InstructVid2Vid, an end-to-end diffusion-based methodology for video editing guided by human language instructions. Our approach empowers video manipulation guided by natural language directives, eliminating the need for per-example fine-tuning or inversion. The proposed InstructVid2Vid model modifies a pretrained image generation model, Stable Diffusion, to generate a time-dependent sequence of video frames. By harnessing the collective intelligence of disparate models, we engineer a training dataset rich in video-instruction triplets, which is a more cost-efficient alternative to collecting data in real-world scenarios. To enhance the coherence between successive frames within the generated videos, we propose the Inter-Frames Consistency Loss and incorporate it during the training process. With multimodal classifier-free guidance during the inference stage, the generated videos is able to resonate with both the input video and the accompanying instructions. Experimental results demonstrate that InstructVid2Vid is capable of generating high-quality, temporally coherent videos and performing diverse edits, including attribute editing, background changes, and style transfer. These results underscore the versatility and effectiveness of our proposed method.

Updated: 2024-05-29 11:08:41

标题: InstructVid2Vid: 使用自然语言指令进行可控视频编辑

摘要: 我们介绍了InstructVid2Vid，这是一种基于扩散的端到端方法，用于根据人类语言指令进行视频编辑。我们的方法通过自然语言指令引导视频操作，消除了需要逐个微调或反转的需求。提出的InstructVid2Vid模型修改了预训练的图像生成模型Stable Diffusion，以生成一系列时间相关的视频帧。通过利用不同模型的集体智慧，我们构建了一个训练数据集，其中包含丰富的视频指令三元组，这是一种更具成本效益的替代方案，而不是在现实场景中收集数据。为了增强生成视频中连续帧之间的连贯性，我们提出了Inter-Frames一致性损失，并在训练过程中加以应用。在推理阶段通过多模态无分类器指导，生成的视频能够与输入视频和随附指令 resonant。实验结果表明，InstructVid2Vid能够生成高质量、时间一致的视频，并执行各种编辑，包括属性编辑、背景更改和风格转移。这些结果突显了我们提出方法的多功能性和有效性。

更新时间: 2024-05-29 11:08:41

领域: cs.CV,cs.AI,cs.MM

下载: http://arxiv.org/abs/2305.12328v2

Efficient Error Certification for Physics-Informed Neural Networks

Recent work provides promising evidence that Physics-Informed Neural Networks (PINN) can efficiently solve partial differential equations (PDE). However, previous works have failed to provide guarantees on the worst-case residual error of a PINN across the spatio-temporal domain - a measure akin to the tolerance of numerical solvers - focusing instead on point-wise comparisons between their solution and the ones obtained by a solver on a set of inputs. In real-world applications, one cannot consider tests on a finite set of points to be sufficient grounds for deployment, as the performance could be substantially worse on a different set. To alleviate this issue, we establish guaranteed error-based conditions for PINNs over their continuous applicability domain. To verify the extent to which they hold, we introduce $\partial$-CROWN: a general, efficient and scalable post-training framework to bound PINN residual errors. We demonstrate its effectiveness in obtaining tight certificates by applying it to two classically studied PINNs - Burgers' and Schr\"odinger's equations -, and two more challenging ones with real-world applications - the Allan-Cahn and Diffusion-Sorption equations.

Updated: 2024-05-29 11:08:06

标题: 物理信息神经网络的高效误差认证

摘要: 最近的研究提供了有希望的证据，表明物理信息神经网络（PINN）可以有效地解决偏微分方程（PDE）。然而，先前的研究未能提供关于PINN在时空域内最坏情况残差误差的保证 - 这是类似于数值求解器容差的度量 - 而是集中在他们的解与求解器在一组输入上获得的解之间的逐点比较。在实际应用中，不能认为在有限点集上的测试足以作为部署的充分依据，因为在不同点集上性能可能大不相同。为了缓解这个问题，我们建立了针对PINN在其连续适用域上的保证误差条件。为了验证它们的适用程度，我们引入了 $\partial$-CROWN：一个通用、高效和可扩展的后训练框架，用于限定PINN残差误差。通过将其应用于两个经典研究的PINN - Burgers' 和 Schrödinger's 方程 -，以及两个具有实际应用的更具挑战性的方程 - Allan-Cahn 和 Diffusion-Sorption 方程，我们证明了其获得紧密证书的有效性。

更新时间: 2024-05-29 11:08:06

领域: cs.LG,math-ph,math.MP

下载: http://arxiv.org/abs/2305.10157v2

Transition Constrained Bayesian Optimization via Markov Decision Processes

Bayesian optimization is a methodology to optimize black-box functions. Traditionally, it focuses on the setting where you can arbitrarily query the search space. However, many real-life problems do not offer this flexibility; in particular, the search space of the next query may depend on previous ones. Example challenges arise in the physical sciences in the form of local movement constraints, required monotonicity in certain variables, and transitions influencing the accuracy of measurements. Altogether, such transition constraints necessitate a form of planning. This work extends classical Bayesian optimization via the framework of Markov Decision Processes. We iteratively solve a tractable linearization of our utility function using reinforcement learning to obtain a policy that plans ahead for the entire horizon. This is a parallel to the optimization of an acquisition function in policy space. The resulting policy is potentially history-dependent and non-Markovian. We showcase applications in chemical reactor optimization, informative path planning, machine calibration, and other synthetic examples.

Updated: 2024-05-29 11:05:43

标题: 通过马尔可夫决策过程的转换受限贝叶斯优化

摘要: 贝叶斯优化是一种优化黑箱函数的方法。传统上，它侧重于可以任意查询搜索空间的设置。然而，许多现实生活中的问题并不提供这种灵活性；特别是，下一个查询的搜索空间可能取决于先前的查询。例如，在物理科学中存在局部移动约束、某些变量的单调性要求以及转换影响测量准确性等挑战。所有这些过渡约束需要一种规划形式。本研究通过马尔可夫决策过程的框架扩展了经典的贝叶斯优化。我们通过强化学习迭代地解决我们效用函数的可解线性化，以获得一个可以提前规划整个时间段的策略。这类似于在策略空间中优化获取函数。结果策略可能依赖于历史并且非马尔可夫。我们展示了在化学反应器优化、信息路径规划、机器校准和其他合成示例中的应用。

更新时间: 2024-05-29 11:05:43

领域: cs.LG

下载: http://arxiv.org/abs/2402.08406v2

Robust Optimization in Protein Fitness Landscapes Using Reinforcement Learning in Latent Space

Proteins are complex molecules responsible for different functions in nature. Enhancing the functionality of proteins and cellular fitness can significantly impact various industries. However, protein optimization using computational methods remains challenging, especially when starting from low-fitness sequences. We propose LatProtRL, an optimization method to efficiently traverse a latent space learned by an encoder-decoder leveraging a large protein language model. To escape local optima, our optimization is modeled as a Markov decision process using reinforcement learning acting directly in latent space. We evaluate our approach on two important fitness optimization tasks, demonstrating its ability to achieve comparable or superior fitness over baseline methods. Our findings and in vitro evaluation show that the generated sequences can reach high-fitness regions, suggesting a substantial potential of LatProtRL in lab-in-the-loop scenarios.

Updated: 2024-05-29 11:03:42

标题: 在蛋白质适应性景观中使用强化学习在潜在空间中进行鲁棒优化

摘要: 蛋白质是复杂的分子，在自然界中负责不同的功能。增强蛋白质功能性和细胞健康状况可以显著影响各个行业。然而，使用计算方法对蛋白质进行优化仍然具有挑战性，特别是当从低健康序列开始时。我们提出了LatProtRL，一种优化方法，可以有效地遍历由编码器-解码器学习的潜在空间，利用大型蛋白质语言模型。为了避免局部最优解，我们的优化被建模为一个马尔可夫决策过程，利用强化学习直接在潜在空间中发挥作用。我们在两项重要的健康优化任务上评估了我们的方法，展示了它能够实现与基准方法相当或更优的健康状况。我们的发现和体外评估表明，生成的序列可以达到高健康区域，表明LatProtRL在实验室环境中具有巨大潜力。

更新时间: 2024-05-29 11:03:42

领域: cs.LG,q-bio.BM,q-bio.QM

下载: http://arxiv.org/abs/2405.18986v1

Generalizing Weather Forecast to Fine-grained Temporal Scales via Physics-AI Hybrid Modeling

Data-driven artificial intelligence (AI) models have made significant advancements in weather forecasting, particularly in medium-range and nowcasting. However, most data-driven weather forecasting models are black-box systems that focus on learning data mapping rather than fine-grained physical evolution in the time dimension. Consequently, the limitations in the temporal scale of datasets prevent these models from forecasting at finer time scales. This paper proposes a physics-AI hybrid model (i.e., WeatherGFT) which Generalizes weather forecasts to Finer-grained Temporal scales beyond training dataset. Specifically, we employ a carefully designed PDE kernel to simulate physical evolution on a small time scale (e.g., 300 seconds) and use a parallel neural networks with a learnable router for bias correction. Furthermore, we introduce a lead time-aware training framework to promote the generalization of the model at different lead times. The weight analysis of physics-AI modules indicates that physics conducts major evolution while AI performs corrections adaptively. Extensive experiments show that WeatherGFT trained on an hourly dataset, achieves state-of-the-art performance across multiple lead times and exhibits the capability to generalize 30-minute forecasts.

Updated: 2024-05-29 11:02:48

标题: 将天气预报推广到细粒度时间尺度：通过物理-人工智能混合建模

摘要: 基于数据驱动的人工智能（AI）模型在天气预报领域取得了显著进展，特别是在中期和短期预测方面。然而，大多数基于数据驱动的天气预报模型是黑匣子系统，专注于学习数据映射而不是在时间维度上进行精细的物理演化。因此，数据集在时间尺度上的限制阻碍了这些模型对更细致的时间尺度进行预测。本文提出了一种物理-AI混合模型（即WeatherGFT），将天气预报推广到超出训练数据集的更细致的时间尺度。具体而言，我们采用精心设计的PDE核来模拟小时间尺度（例如300秒）上的物理演化，并使用具有可学习路由器的并行神经网络进行偏差校正。此外，我们引入了一种提升模型在不同提前时间下泛化能力的提前时间感知训练框架。物理-AI模块的权重分析表明，物理模块主要进行演化，而AI模块则灵活进行校正。大量实验证明，WeatherGFT在每小时数据集上训练后，在多个提前时间上实现了最先进的性能，并展示了泛化30分钟预报的能力。

更新时间: 2024-05-29 11:02:48

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.13796v3

Double-I Watermark: Protecting Model Copyright for LLM Fine-tuning

To support various applications, a prevalent and efficient approach for business owners is leveraging their valuable datasets to fine-tune a pre-trained LLM through the API provided by LLM owners or cloud servers. However, this process carries a substantial risk of model misuse, potentially resulting in severe economic consequences for business owners. Thus, safeguarding the copyright of these customized models during LLM fine-tuning has become an urgent practical requirement, but there are limited existing solutions to provide such protection. To tackle this pressing issue, we propose a novel watermarking approach named ``Double-I watermark''. Specifically, based on the instruct-tuning data, two types of backdoor data paradigms are introduced with trigger in the instruction and the input, respectively. By leveraging LLM's learning capability to incorporate customized backdoor samples into the dataset, the proposed approach effectively injects specific watermarking information into the customized model during fine-tuning, which makes it easy to inject and verify watermarks in commercial scenarios. We evaluate the proposed "Double-I watermark" under various fine-tuning methods, demonstrating its harmlessness, robustness, uniqueness, imperceptibility, and validity through both quantitative and qualitative analyses.

Updated: 2024-05-29 11:02:16

标题: 双I水印：保护LLM Fine-tuning模型版权

摘要: 为支持各种应用，商业所有者常用且高效的方法是利用他们宝贵的数据集通过LLM所有者或云服务器提供的API来微调预训练的LLM。然而，这一过程存在着模型误用的重大风险，可能导致商业所有者遭受严重的经济后果。因此，在LLM微调过程中保护这些定制模型的版权已成为紧迫的实际需求，但目前存在有限的解决方案来提供此类保护。为了解决这一紧迫问题，我们提出了一种名为“Double-I水印”的新型水印方法。具体来说，基于指导微调数据，引入了两种类型的后门数据范式，分别在指令和输入中设置触发器。通过利用LLM的学习能力将定制后门样本整合到数据集中，所提出的方法在微调过程中有效地向定制模型注入特定的水印信息，从而使得在商业场景中注入和验证水印变得容易。我们通过各种微调方法评估了所提出的“Double-I水印”，通过定量和定性分析证明了其无害性、鲁棒性、唯一性、不可察觉性和有效性。

更新时间: 2024-05-29 11:02:16

领域: cs.CR,cs.AI,cs.LG

下载: http://arxiv.org/abs/2402.14883v2

Linear bandits with polylogarithmic minimax regret

We study a noise model for linear stochastic bandits for which the subgaussian noise parameter vanishes linearly as we select actions on the unit sphere closer and closer to the unknown vector. We introduce an algorithm for this problem that exhibits a minimax regret scaling as $\log^3(T)$ in the time horizon $T$, in stark contrast the square root scaling of this regret for typical bandit algorithms. Our strategy, based on weighted least-squares estimation, achieves the eigenvalue relation $\lambda_{\min} ( V_t ) = \Omega (\sqrt{\lambda_{\max}(V_t ) })$ for the design matrix $V_t$ at each time step $t$ through geometrical arguments that are independent of the noise model and might be of independent interest. This allows us to tightly control the expected regret in each time step to be of the order $O(\frac1{t})$, leading to the logarithmic scaling of the cumulative regret.

Updated: 2024-05-29 10:58:25

标题: 具有多对数极小化最大遗憾的线性赌博机

摘要: 我们研究了一个噪声模型，用于线性随机赌博机，其中次高斯噪声参数随着我们选择动作在接近未知向量的单位球上越来越接近线性消失。我们针对这个问题引入了一种算法，其最小化后悔在时间范围$T$内以$\log^3(T)$的比例增长，与典型赌博机算法的平方根比例增长形成鲜明对比。我们的策略基于加权最小二乘估计，通过几何论证实现了设计矩阵$V_t$在每个时间步骤$t$上的特征值关系$\lambda_{\min} ( V_t ) = \Omega (\sqrt{\lambda_{\max}(V_t ) })$，这些几何论证独立于噪声模型可能具有独立的兴趣。这使我们能够严密控制每个时间步骤中的预期后悔为$O(\frac1{t})$的量级，从而导致累积后悔的对数增长。

更新时间: 2024-05-29 10:58:25

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2402.12042v2

Optimizing Vehicular Networks with Variational Quantum Circuits-based Reinforcement Learning

In vehicular networks (VNets), ensuring both road safety and dependable network connectivity is of utmost importance. Achieving this necessitates the creation of resilient and efficient decision-making policies that prioritize multiple objectives. In this paper, we develop a Variational Quantum Circuit (VQC)-based multi-objective reinforcement learning (MORL) framework to characterize efficient network selection and autonomous driving policies in a vehicular network (VNet). Numerical results showcase notable enhancements in both convergence rates and rewards when compared to conventional deep-Q networks (DQNs), validating the efficacy of the VQC-MORL solution.

Updated: 2024-05-29 10:57:25

标题: 利用基于变分量子电路的强化学习优化车载网络

摘要: 在车辆网络（VNets）中，确保道路安全和可靠的网络连接至关重要。实现这一点需要创建优先考虑多个目标的弹性和高效的决策政策。本文中，我们开发了一个基于变分量子电路（VQC）的多目标强化学习（MORL）框架，以描述车辆网络（VNet）中的高效网络选择和自动驾驶政策。数值结果展示了与传统深度Q网络（DQNs）相比，收敛速度和奖励均有显著提升，验证了VQC-MORL解决方案的有效性。

更新时间: 2024-05-29 10:57:25

领域: cs.LG,cs.AI,cs.NI

下载: http://arxiv.org/abs/2405.18984v1

Federated Learning under Partially Class-Disjoint Data via Manifold Reshaping

Statistical heterogeneity severely limits the performance of federated learning (FL), motivating several explorations e.g., FedProx, MOON and FedDyn, to alleviate this problem. Despite effectiveness, their considered scenario generally requires samples from almost all classes during the local training of each client, although some covariate shifts may exist among clients. In fact, the natural case of partially class-disjoint data (PCDD), where each client contributes a few classes (instead of all classes) of samples, is practical yet underexplored. Specifically, the unique collapse and invasion characteristics of PCDD can induce the biased optimization direction in local training, which prevents the efficiency of federated learning. To address this dilemma, we propose a manifold reshaping approach called FedMR to calibrate the feature space of local training. Our FedMR adds two interplaying losses to the vanilla federated learning: one is intra-class loss to decorrelate feature dimensions for anti-collapse; and the other one is inter-class loss to guarantee the proper margin among categories in the feature expansion. We conduct extensive experiments on a range of datasets to demonstrate that our FedMR achieves much higher accuracy and better communication efficiency. Source code is available at: https://github.com/MediaBrain-SJTU/FedMR.git.

Updated: 2024-05-29 10:56:13

标题: 通过流形重塑在部分类不相交数据下的联邦学习

摘要: 统计异质性严重限制了联邦学习（FL）的性能，促使多种探索，例如FedProx、MOON和FedDyn，以缓解这一问题。尽管有效，但它们考虑的场景通常要求在每个客户端的本地训练过程中几乎需要来自所有类别的样本，尽管客户端之间可能存在一些协变量偏移。事实上，部分类不交集数据（PCDD）是一个实际但尚未充分探索的情况，其中每个客户端贡献了少量类别（而不是所有类别）的样本。具体来说，PCDD的独特崩溃和侵入特性可能在本地训练中引起偏倚的优化方向，从而阻碍了联邦学习的效率。为了解决这一困境，我们提出了一种称为FedMR的流形重塑方法，以校准本地训练的特征空间。我们的FedMR在普通联邦学习中增加了两种相互作用的损失：一种是用于消除特征维度相关性的类内损失，以防止崩溃；另一种是用于保证特征扩展中各类别之间适当间隔的类间损失。我们在一系列数据集上进行了大量实验，以证明我们的FedMR实现了更高的准确性和更好的通信效率。源代码可在https://github.com/MediaBrain-SJTU/FedMR.git上找到。

更新时间: 2024-05-29 10:56:13

领域: cs.LG,cs.DC

下载: http://arxiv.org/abs/2405.18983v1

MANO: Exploiting Matrix Norm for Unsupervised Accuracy Estimation Under Distribution Shifts

Leveraging the models' outputs, specifically the logits, is a common approach to estimating the test accuracy of a pre-trained neural network on out-of-distribution (OOD) samples without requiring access to the corresponding ground truth labels. Despite their ease of implementation and computational efficiency, current logit-based methods are vulnerable to overconfidence issues, leading to prediction bias, especially under the natural shift. In this work, we first study the relationship between logits and generalization performance from the view of low-density separation assumption. Our findings motivate our proposed method MaNo which (1) applies a data-dependent normalization on the logits to reduce prediction bias, and (2) takes the $L_p$ norm of the matrix of normalized logits as the estimation score. Our theoretical analysis highlights the connection between the provided score and the model's uncertainty. We conduct an extensive empirical study on common unsupervised accuracy estimation benchmarks and demonstrate that MaNo achieves state-of-the-art performance across various architectures in the presence of synthetic, natural, or subpopulation shifts.

Updated: 2024-05-29 10:45:06

标题: MANO：利用矩阵范数在分布转变下进行无监督准确度估计

摘要: 利用模型的输出，特别是logits，是一种常见的方法，用于估计预训练神经网络在分布外（OOD）样本上的测试准确度，而无需访问相应的地面真实标签。尽管这些方法易于实现且计算效率高，但当前基于logit的方法容易受到过度自信的问题影响，尤其在自然转移情况下会导致预测偏差。在这项工作中，我们首先从低密度分离假设的角度研究logits与泛化性能之间的关系。我们的发现激发了我们提出的MaNo方法，该方法（1）对logits应用数据相关的归一化以减少预测偏差，（2）将归一化logits矩阵的$L_p$范数作为估计分数。我们的理论分析突出了提供的分数与模型的不确定性之间的联系。我们在常见的无监督准确性估计基准上进行了广泛的实证研究，并证明了在合成、自然或子群转移情况下，MaNo在各种架构中均实现了最先进的性能。

更新时间: 2024-05-29 10:45:06

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2405.18979v1

Self-Pro: Self-Prompt and Tuning Framework for Graph Neural Networks

Graphs have become an important modeling tool for Web applications, and graph neural networks (GNNs) have achieved great success in graph representation learning. However, their performance heavily relies on a large amount of supervision. Recently, ``pre-train, fine-tune'' has become the paradigm to address the issues of label dependency and poor generalization. However, the pre-training strategies vary for graphs with homophily and heterophily, and the objectives for various downstream tasks also differ. This leads to a gap between pretexts and downstream tasks, resulting in ``negative transfer'' and poor performance. Inspired by prompt learning in natural language processing, many studies turn to bridge the gap and fully leverage the pre-trained model. However, existing methods for graph prompting are tailored to homophily, neglecting inherent heterophily on graphs. Meanwhile, most of them rely on randomly initialized prompts, which negatively impact on the stability. Therefore, we propose Self-Prompt, a prompting framework for graphs based on the model and data itself. We first introduce asymmetric graph contrastive learning as pretext to address heterophily and align the objectives of pretext and downstream tasks. Then we reuse the component from pre-training as the self adapter and introduce self-prompts based on graph itself for task adaptation. Finally, we conduct extensive experiments on 11 benchmark datasets to demonstrate its superiority. We provide our codes at \url{https://github.com/gongchenghua/Self-Pro}.

Updated: 2024-05-29 10:41:23

标题: 自我启动：面向图神经网络的自我提示和调整框架

摘要: 图形已成为Web应用程序的重要建模工具，并且图神经网络（GNNs）在图表示学习方面取得了巨大成功。然而，它们的性能严重依赖于大量的监督。最近，“预训练，微调”已成为解决标签依赖性和泛化性差的范例。然而，针对具有同质性和异质性的图的预训练策略各不相同，各种下游任务的目标也不同。这导致了预文本和下游任务之间的差距，导致“负转移”和性能不佳。受自然语言处理中提示学习的启发，许多研究转向弥合差距并充分利用预训练模型。然而，现有的图提示方法专为同质性量身定制，忽视了图上固有的异质性。同时，大多数方法依赖于随机初始化的提示，这对稳定性产生了负面影响。因此，我们提出了Self-Prompt，这是一个基于模型和数据本身的图提示框架。我们首先引入不对称图对比学习作为预文本，以解决异质性，并对齐预文本和下游任务的目标。然后，我们重复使用来自预训练的组件作为自适应器，并引入基于图本身的自提示以进行任务适应。最后，我们在11个基准数据集上进行了大量实验，以证明其优越性。我们在\url{https://github.com/gongchenghua/Self-Pro}提供我们的代码。

更新时间: 2024-05-29 10:41:23

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2310.10362v2

Hierarchical Classification Auxiliary Network for Time Series Forecasting

Deep learning has significantly advanced time series forecasting through its powerful capacity to capture sequence relationships. However, training these models with the Mean Square Error (MSE) loss often results in over-smooth predictions, making it challenging to handle the complexity and learn high-entropy features from time series data with high variability and unpredictability. In this work, we introduce a novel approach by tokenizing time series values to train forecasting models via cross-entropy loss, while considering the continuous nature of time series data. Specifically, we propose Hierarchical Classification Auxiliary Network, HCAN, a general model-agnostic component that can be integrated with any forecasting model. HCAN is based on a Hierarchy-Aware Attention module that integrates multi-granularity high-entropy features at different hierarchy levels. At each level, we assign a class label for timesteps to train an Uncertainty-Aware Classifier. This classifier mitigates the over-confidence in softmax loss via evidence theory. We also implement a Hierarchical Consistency Loss to maintain prediction consistency across hierarchy levels. Extensive experiments integrating HCAN with state-of-the-art forecasting models demonstrate substantial improvements over baselines on several real-world datasets. Code is available at:https://github.com/syrGitHub/HCAN.

Updated: 2024-05-29 10:38:25

标题: 分层分类辅助网络用于时间序列预测

摘要: 深度学习通过其强大的捕捉序列关系的能力，显著推进了时间序列预测。然而，使用均方误差（MSE）损失训练这些模型通常会导致过度平滑的预测，使其难以处理具有高变异性和不可预测性的时间序列数据的复杂性和学习高熵特征。在本文中，我们介绍了一种新颖的方法，通过对时间序列值进行标记化，通过交叉熵损失训练预测模型，同时考虑时间序列数据的连续性。具体而言，我们提出了分层分类辅助网络（HCAN），这是一个通用的与任何预测模型集成的组件。HCAN基于一个层次感知注意模块，它在不同层次级别集成了多粒度高熵特征。在每个级别上，我们为时间步骤分配一个类标签，以训练一个不确定性感知分类器。通过证据理论，该分类器减轻了softmax损失中的过度自信。我们还实施了一个分层一致性损失，以在层次级别上保持预测一致性。将HCAN与最先进的预测模型整合的大量实验证明，在几个真实世界数据集上，相对基线取得了显著的改进。代码可在以下链接找到：https://github.com/syrGitHub/HCAN。

更新时间: 2024-05-29 10:38:25

领域: cs.LG

下载: http://arxiv.org/abs/2405.18975v1

Federated Learning with Bilateral Curation for Partially Class-Disjoint Data

Partially class-disjoint data (PCDD), a common yet under-explored data formation where each client contributes a part of classes (instead of all classes) of samples, severely challenges the performance of federated algorithms. Without full classes, the local objective will contradict the global objective, yielding the angle collapse problem for locally missing classes and the space waste problem for locally existing classes. As far as we know, none of the existing methods can intrinsically mitigate PCDD challenges to achieve holistic improvement in the bilateral views (both global view and local view) of federated learning. To address this dilemma, we are inspired by the strong generalization of simplex Equiangular Tight Frame~(ETF) on the imbalanced data, and propose a novel approach called FedGELA where the classifier is globally fixed as a simplex ETF while locally adapted to the personal distributions. Globally, FedGELA provides fair and equal discrimination for all classes and avoids inaccurate updates of the classifier, while locally it utilizes the space of locally missing classes for locally existing classes. We conduct extensive experiments on a range of datasets to demonstrate that our FedGELA achieves promising performance~(averaged improvement of 3.9% to FedAvg and 1.5% to best baselines) and provide both local and global convergence guarantees. Source code is available at:https://github.com/MediaBrain-SJTU/FedGELA.git.

Updated: 2024-05-29 10:34:44

标题: 使用双向筛选的联邦学习方法处理部分类别不一致的数据

摘要: 部分类别不相交数据（PCDD）是一种常见但尚未充分探索的数据形式，其中每个客户端只贡献部分类别（而不是全部类别）的样本，严重挑战着联邦学习算法的性能。在没有全部类别的情况下，本地目标将与全局目标相矛盾，导致本地缺失类别的角度坍缩问题和本地存在类别的空间浪费问题。据我们所知，目前没有任何现有方法可以固有地缓解PCDD挑战，实现联邦学习双边视图（全局视图和本地视图）的整体改进。为了解决这一困境，我们受到简单正交紧框架（ETF）在不平衡数据上的强大泛化能力的启发，提出了一种名为FedGELA的新方法，其中分类器在全局范围内固定为简单正交紧框架，同时在本地范围内适应个人分布。在全局范围内，FedGELA为所有类别提供公平且平等的区分，避免分类器的不准确更新，而在本地范围内，它利用本地缺失类别的空间来处理本地存在类别。我们在一系列数据集上进行了大量实验，证明我们的FedGELA取得了有希望的性能（相对于FedAvg平均改进了3.9％，相对于最佳基线改进了1.5％），并提供了本地和全局收敛保证。源代码可在https://github.com/MediaBrain-SJTU/FedGELA.git上找到。

更新时间: 2024-05-29 10:34:44

领域: cs.LG,cs.DC

下载: http://arxiv.org/abs/2405.18972v1

UniIF: Unified Molecule Inverse Folding

Molecule inverse folding has been a long-standing challenge in chemistry and biology, with the potential to revolutionize drug discovery and material science. Despite specified models have been proposed for different small- or macro-molecules, few have attempted to unify the learning process, resulting in redundant efforts. Complementary to recent advancements in molecular structure prediction, such as RoseTTAFold All-Atom and AlphaFold3, we propose the unified model UniIF for the inverse folding of all molecules. We do such unification in two levels: 1) Data-Level: We propose a unified block graph data form for all molecules, including the local frame building and geometric feature initialization. 2) Model-Level: We introduce a geometric block attention network, comprising a geometric interaction, interactive attention and virtual long-term dependency modules, to capture the 3D interactions of all molecules. Through comprehensive evaluations across various tasks such as protein design, RNA design, and material design, we demonstrate that our proposed method surpasses state-of-the-art methods on all tasks. UniIF offers a versatile and effective solution for general molecule inverse folding.

Updated: 2024-05-29 10:26:16

标题: UniIF: 统一分子逆向折叠

摘要: 分子逆折叠一直是化学和生物学中长期存在的挑战，具有颠覆药物发现和材料科学的潜力。尽管针对不同小分子或大分子已经提出了特定模型，但很少有人尝试统一学习过程，导致了冗余的努力。与最近分子结构预测的进展相辅相成，例如RoseTTAFold全原子和AlphaFold3，我们提出了适用于所有分子的逆折叠的统一模型UniIF。我们在两个层面上进行统一：1）数据级别：我们为所有分子提出了一个统一的块图数据形式，包括局部框架构建和几何特征初始化。2）模型级别：我们引入了一个几何块注意网络，包括几何交互、交互式注意和虚拟长期依赖模块，以捕获所有分子的三维相互作用。通过对蛋白设计、RNA设计和材料设计等各种任务的全面评估，我们证明了我们提出的方法在所有任务上超越了现有方法。UniIF为一般分子逆折叠提供了多功能且有效的解决方案。

更新时间: 2024-05-29 10:26:16

领域: cs.AI,cs.LG,q-bio.QM

下载: http://arxiv.org/abs/2405.18968v1

Debiasing Algorithm through Model Adaptation

Large language models are becoming the go-to solution for the ever-growing number of tasks. However, with growing capacity, models are prone to rely on spurious correlations stemming from biases and stereotypes present in the training data. This work proposes a novel method for detecting and mitigating gender bias in language models. We perform causal analysis to identify problematic model components and discover that mid-upper feed-forward layers are most prone to convey bias. Based on the analysis results, we intervene in the model by applying a linear projection to the weight matrices of these layers. Our titular method, DAMA, significantly decreases bias as measured by diverse metrics while maintaining the model's performance on downstream tasks. We release code for our method and models, which retrain LLaMA's state-of-the-art performance while being significantly less biased.

Updated: 2024-05-29 10:22:52

标题: 通过模型适应去偏倚算法

摘要: 大型语言模型正在成为日益增长的任务数量的首选解决方案。然而，随着容量的增长，模型容易依赖于训练数据中存在的偏见和刻板印象导致的虚假相关性。本研究提出了一种新颖的方法来检测和减轻语言模型中的性别偏见。我们进行因果分析以识别问题模型组件，并发现中上层前馈层最容易传递偏见。根据分析结果，我们通过对这些层的权重矩阵应用线性投影来干预模型。我们的方法DAMA显著减少偏见，同时在下游任务中保持模型的性能。我们发布了我们的方法和模型的代码，可以重新训练LLaMA的最新性能，同时显著减少了偏见。

更新时间: 2024-05-29 10:22:52

领域: cs.CL,cs.AI,stat.ML

下载: http://arxiv.org/abs/2310.18913v4

Diffusive Gibbs Sampling

The inadequate mixing of conventional Markov Chain Monte Carlo (MCMC) methods for multi-modal distributions presents a significant challenge in practical applications such as Bayesian inference and molecular dynamics. Addressing this, we propose Diffusive Gibbs Sampling (DiGS), an innovative family of sampling methods designed for effective sampling from distributions characterized by distant and disconnected modes. DiGS integrates recent developments in diffusion models, leveraging Gaussian convolution to create an auxiliary noisy distribution that bridges isolated modes in the original space and applying Gibbs sampling to alternately draw samples from both spaces. A novel Metropolis-within-Gibbs scheme is proposed to enhance mixing in the denoising sampling step. DiGS exhibits a better mixing property for sampling multi-modal distributions than state-of-the-art methods such as parallel tempering, attaining substantially improved performance across various tasks, including mixtures of Gaussians, Bayesian neural networks and molecular dynamics.

Updated: 2024-05-29 10:20:04

标题: 扩散式吉布斯采样

摘要: 传统的马尔可夫链蒙特卡洛（MCMC）方法在多峰分布的混合方面存在不足，这在贝叶斯推断和分子动力学等实际应用中构成了重要挑战。为了解决这个问题，我们提出了扩散吉布斯抽样（DiGS），这是一种创新的采样方法系列，旨在有效地从远离和断开的模式特征的分布中进行采样。DiGS集成了最新的扩散模型发展，利用高斯卷积创建一个桥接原始空间中孤立模式的辅助噪声分布，并应用吉布斯抽样交替地从两个空间中抽取样本。提出了一种新颖的在吉布斯方案中的Metropolis方法，以增强去噪采样步骤中的混合性。DiGS对于采样多峰分布表现出比诸如平行温度等现有方法更好的混合性能，获得了在各种任务中显着改善的性能，包括高斯混合物、贝叶斯神经网络和分子动力学。

更新时间: 2024-05-29 10:20:04

领域: stat.ML,cs.LG,stat.CO

下载: http://arxiv.org/abs/2402.03008v5

Text clustering with LLM embeddings

Text clustering is an important approach for organising the growing amount of digital content, helping to structure and find hidden patterns in uncategorised data. In this research, we investigated how different textual embeddings - particularly those used in large language models (LLMs) - and clustering algorithms affect how text datasets are clustered. A series of experiments were conducted to assess how embeddings influence clustering results, the role played by dimensionality reduction through summarisation, and embedding size adjustment. Results reveal that LLM embeddings excel at capturing the nuances of structured language, while BERT leads the lightweight options in performance. In addition, we find that increasing embedding dimensionality and summarisation techniques do not uniformly improve clustering efficiency, suggesting that these strategies require careful analysis to use in real-life models. These results highlight a complex balance between the need for nuanced text representation and computational feasibility in text clustering applications. This study extends traditional text clustering frameworks by incorporating embeddings from LLMs, thereby paving the way for improved methodologies and opening new avenues for future research in various types of textual analysis.

Updated: 2024-05-29 10:16:13

标题: LLM嵌入的文本聚类

摘要: 文本聚类是组织日益增长的数字内容的重要方法，有助于在未分类数据中找到隐藏的模式并进行结构化。在这项研究中，我们调查了不同文本嵌入 - 特别是大型语言模型（LLMs）中使用的那些 - 以及聚类算法如何影响文本数据集的聚类。进行了一系列实验，以评估嵌入如何影响聚类结果，通过摘要降维的作用，以及嵌入大小的调整。结果显示，LLM嵌入在捕捉结构化语言细微差别方面表现出色，而BERT在性能方面领先轻量级选项。此外，我们发现增加嵌入维度和摘要技术并不一致地提高聚类效率，这表明这些策略需要仔细分析才能在现实生活模型中使用。这些结果突显了在文本聚类应用中需要细致文本表示和计算可行性之间的复杂平衡。该研究通过将LLMs中的嵌入融入传统的文本聚类框架，从而为改进方法论打开新途径，并为各种类型的文本分析的未来研究开辟了新的道路。

更新时间: 2024-05-29 10:16:13

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2403.15112v2

Pessimism of the Will, Optimism of the Intellect: Fair Protocols with Malicious but Rational Agents

Fairness is a desirable and crucial property of many protocols that handle, for instance, exchanges of message. It states that if at least one agent engaging in the protocol is honest, then either the protocol will unfold correctly and fulfill its intended goal for all participants, or it will fail for everyone. In this work, we present a game-based framework for the study of fairness protocols, that does not define a priori an attacker model. It is based on the notion of strong secure equilibria, and leverages the conceptual and algorithmic toolbox of game theory. In the case of finite games, we provide decision procedures with tight complexity bounds for determining whether a protocol is immune to nefarious attacks from a coalition of participants, and whether such a protocol could exist based on the underlying graph structure and objectives.

Updated: 2024-05-29 10:15:36

标题: 意志的悲观，智力的乐观：对具有恶意但理性的代理的公平协议

摘要: 公平性是许多处理消息交换的协议所具有的一个理想和关键属性。它表明，如果参与协议的至少一个代理是诚实的，那么协议要么会正确展开并实现所有参与者的预期目标，要么会对所有人都失败。在这项工作中，我们提出了一个基于游戏的框架来研究公平性协议，该框架不会预先定义攻击者模型。它基于强安全均衡的概念，并利用博弈论的概念和算法工具箱。在有限游戏的情况下，我们提供了决策程序，用于确定一个协议是否免疫来自参与者联盟的恶意攻击，并且基于底层图结构和目标来确定这样一个协议是否存在，同时提供了紧密的复杂度界限。

更新时间: 2024-05-29 10:15:36

领域: cs.GT,cs.CC,cs.CR

下载: http://arxiv.org/abs/2405.18958v1

MAGIC: Modular Auto-encoder for Generalisable Model Inversion with Bias Corrections

Scientists often model physical processes to understand the natural world and uncover the causation behind observations. Due to unavoidable simplification, discrepancies often arise between model predictions and actual observations, in the form of systematic biases, whose impact varies with model completeness. Classical model inversion methods such as Bayesian inference or regressive neural networks tend either to overlook biases or make assumptions about their nature during data preprocessing, potentially leading to implausible results. Inspired by recent work in inverse graphics, we replace the decoder stage of a standard autoencoder with a physical model followed by a bias-correction layer. This generalisable approach simultaneously inverts the model and corrects its biases in an end-to-end manner without making strong assumptions about the nature of the biases. We demonstrate the effectiveness of our approach using two physical models from disparate domains: a complex radiative transfer model from remote sensing; and a volcanic deformation model from geodesy. Our method matches or surpasses results from classical approaches without requiring biases to be explicitly filtered out, suggesting an effective pathway for understanding the causation of various physical processes.

Updated: 2024-05-29 10:11:10

标题: MAGIC：具有偏差校正的可泛化模型反演的模块化自编码器

摘要: 科学家经常建立物理过程模型来理解自然界并揭示观察背后的因果关系。由于不可避免的简化，模型预测和实际观察之间经常出现差异，表现为系统性偏差，其影响随着模型完整性的不同而变化。传统的模型反演方法，如贝叶斯推断或回归神经网络，往往忽视偏差或在数据预处理过程中对其性质进行假设，可能导致不合理的结果。受逆向图形学的最新工作启发，我们将标准自动编码器的解码器阶段替换为一个物理模型，然后再加上一个偏差校正层。这种通用方法可以同时反转模型并校正其偏差，而无需对偏差的性质进行强烈的假设。我们使用来自不同领域的两个物理模型（遥感中的复杂辐射传输模型和大地测量中的火山变形模型）展示了我们方法的有效性。我们的方法与传统方法的结果相匹配或超越，而无需显式过滤偏差，这表明了一种理解各种物理过程因果关系的有效途径。

更新时间: 2024-05-29 10:11:10

领域: cs.LG

下载: http://arxiv.org/abs/2405.18953v1

Are You Sure? Rank Them Again: Repeated Ranking For Better Preference Datasets

Training Large Language Models (LLMs) with Reinforcement Learning from AI Feedback (RLAIF) aligns model outputs more closely with human preferences. This involves an evaluator model ranking multiple candidate responses to user prompts. However, the rankings from popular evaluator models such as GPT-4 can be inconsistent. We propose the Repeat Ranking method - where we evaluate the same responses multiple times and train only on those responses which are consistently ranked. Using 2,714 prompts in 62 languages, we generated responses from 7 top multilingual LLMs and had GPT-4 rank them five times each. Evaluating on MT-Bench chat benchmarks in six languages, our method outperformed the standard practice of training on all available prompts. Our work highlights the quality versus quantity trade-off in RLAIF dataset generation and offers a stackable strategy for enhancing dataset and thus model quality.

Updated: 2024-05-29 10:08:31

标题: 你确定吗？重新排名：为更好的偏好数据集重复排名

摘要: 用强化学习从AI反馈（RLAIF）训练大型语言模型（LLMs）可以更接近人类偏好。这涉及一个评估模型对用户提示的多个候选响应进行排名。然而，像GPT-4这样的流行评估模型的排名可能不一致。我们提出了重复排名方法 - 我们多次评估相同的响应，并仅在那些一致排名的响应上进行训练。使用62种语言中的2,714个提示，我们从7个顶级多语言LLMs生成了响应，并让GPT-4对每个响应进行了五次排名。在六种语言的MT-Bench聊天基准上进行评估，我们的方法胜过了在所有可用提示上进行训练的标准做法。我们的工作突出了在RLAIF数据集生成中质量与数量的权衡，并提供了增强数据集和模型质量的可堆叠策略。

更新时间: 2024-05-29 10:08:31

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.18952v1

Risks and Opportunities of Open-Source Generative AI

Applications of Generative AI (Gen AI) are expected to revolutionize a number of different areas, ranging from science & medicine to education. The potential for these seismic changes has triggered a lively debate about the potential risks of the technology, and resulted in calls for tighter regulation, in particular from some of the major tech companies who are leading in AI development. This regulation is likely to put at risk the budding field of open-source generative AI. Using a three-stage framework for Gen AI development (near, mid and long-term), we analyze the risks and opportunities of open-source generative AI models with similar capabilities to the ones currently available (near to mid-term) and with greater capabilities (long-term). We argue that, overall, the benefits of open-source Gen AI outweigh its risks. As such, we encourage the open sourcing of models, training and evaluation data, and provide a set of recommendations and best practices for managing risks associated with open-source generative AI.

Updated: 2024-05-29 10:05:40

标题: 《开源生成人工智能的风险与机遇》

摘要: Generative AI（Gen AI）的应用预计将彻底改变许多不同领域，从科学和医学到教育。这些巨大变革的潜力引发了关于技术潜在风险的激烈辩论，并导致一些主要科技公司呼吁加强监管，特别是那些在AI开发方面领先的公司。这种监管可能会危及新兴的开源生成AI领域。通过使用Gen AI开发的三阶段框架（近期、中期和长期），我们分析了与当前可用模型（近期到中期）具有类似能力的开源生成AI模型以及具有更大能力的模型（长期）的风险和机会。我们认为，总体而言，开源Gen AI的好处超过了其风险。因此，我们鼓励模型、训练和评估数据的开源，并提供一套管理与开源生成AI相关风险的建议和最佳实践。

更新时间: 2024-05-29 10:05:40

领域: cs.LG

下载: http://arxiv.org/abs/2405.08597v3

A Wireless AI-Generated Content (AIGC) Provisioning Framework Empowered by Semantic Communication

Generative AI applications have been recently catering to a vast user base by creating diverse and high-quality AI-generated content (AIGC). With the proliferation of mobile devices and rapid growth of mobile traffic, providing ubiquitous access to high-quality AIGC services via wireless communication networks is becoming the future direction. However, it is challenging to provide qualified AIGC services in wireless networks with unstable channels, limited bandwidth resources, and unevenly distributed computational resources. To tackle these challenges, we propose a semantic communication (SemCom)-empowered AIGC (SemAIGC) generation and transmission framework, where only semantic information of the content rather than all the binary bits should be generated and transmitted by using SemCom. Specifically, SemAIGC integrates diffusion models within the semantic encoder and decoder to design a workload-adjustable transceiver thereby allowing adjustment of computational resource utilization in edge and local. In addition, a Resource-aware wOrk lOad Trade-off (ROOT) scheme is devised to intelligently make workload adaptation decisions for the transceiver, thus efficiently generating, transmitting, and fine-tuning content as per dynamic wireless channel conditions and service requirements. Simulations verify the superiority of our proposed SemAIGC framework in terms of latency and content quality compared to conventional approaches.

Updated: 2024-05-29 10:05:14

标题: 一个由语义通信增强的无线人工智能生成内容（AIGC）供应框架

摘要: 最近，生成式人工智能应用通过创建多样化和高质量的AI生成内容（AIGC）来满足广泛的用户群体。随着移动设备的普及和移动流量的快速增长，通过无线通信网络提供高质量的AIGC服务的普遍访问正在成为未来的发展方向。然而，在具有不稳定信道、有限带宽资源和分布不均匀的计算资源的无线网络中提供合格的AIGC服务是具有挑战性的。为了应对这些挑战，我们提出了一种语义通信（SemCom）增强的AIGC（SemAIGC）生成和传输框架，其中仅通过使用SemCom生成和传输内容的语义信息而非所有二进制位。具体来说，SemAIGC在语义编码器和解码器中集成扩散模型，设计了一个可调整工作负载的收发器，从而允许在边缘和本地调整计算资源利用率。此外，设计了一种资源感知的工作负载权衡（ROOT）方案，智能地为收发器做出工作负载适应决策，从而有效地根据动态无线信道条件和服务需求生成、传输和微调内容。模拟验证了我们提出的SemAIGC框架在延迟和内容质量方面相对于传统方法的优越性。

更新时间: 2024-05-29 10:05:14

领域: cs.NI,cs.AI,cs.LG,eess.IV

下载: http://arxiv.org/abs/2310.17705v2

Learning to Recover from Plan Execution Errors during Robot Manipulation: A Neuro-symbolic Approach

Automatically detecting and recovering from failures is an important but challenging problem for autonomous robots. Most of the recent work on learning to plan from demonstrations lacks the ability to detect and recover from errors in the absence of an explicit state representation and/or a (sub-) goal check function. We propose an approach (blending learning with symbolic search) for automated error discovery and recovery, without needing annotated data of failures. Central to our approach is a neuro-symbolic state representation, in the form of dense scene graph, structured based on the objects present within the environment. This enables efficient learning of the transition function and a discriminator that not only identifies failures but also localizes them facilitating fast re-planning via computation of heuristic distance function. We also present an anytime version of our algorithm, where instead of recovering to the last correct state, we search for a sub-goal in the original plan minimizing the total distance to the goal given a re-planning budget. Experiments on a physics simulator with a variety of simulated failures show the effectiveness of our approach compared to existing baselines, both in terms of efficiency as well as accuracy of our recovery mechanism.

Updated: 2024-05-29 10:03:57

标题: 学习在机器人操作过程中从计划执行错误中恢复：一种神经符号方法

摘要: 自动检测和从故障中恢复对于自主机器人来说是一个重要但具有挑战性的问题。大多数最近关于从演示中学习规划的工作缺乏在没有明确状态表示和/或（子）目标检查功能的情况下检测和恢复错误的能力。我们提出了一种方法（将学习与符号搜索相结合），用于自动化错误发现和恢复，而无需失败的标注数据。我们方法的核心是神经符号状态表示，以密集场景图的形式结构化，基于环境中存在的对象。这使得可以有效学习过渡函数和鉴别器，不仅可以识别失败，还可以定位它们，从而通过计算启发式距离函数实现快速重新规划。我们还提出了我们算法的任何时候版本，其中我们不是恢复到最后一个正确的状态，而是在原始计划中寻找一个子目标，以在给定重新规划预算的情况下最小化到目标的总距离。在一个物理模拟器上进行的各种模拟故障实验表明，与现有基线相比，我们方法在效率和恢复机制的准确性方面的有效性。

更新时间: 2024-05-29 10:03:57

领域: cs.RO,cs.LG

下载: http://arxiv.org/abs/2405.18948v1

Delving into Differentially Private Transformer

Deep learning with differential privacy (DP) has garnered significant attention over the past years, leading to the development of numerous methods aimed at enhancing model accuracy and training efficiency. This paper delves into the problem of training Transformer models with differential privacy. Our treatment is modular: the logic is to `reduce' the problem of training DP Transformer to the more basic problem of training DP vanilla neural nets. The latter is better understood and amenable to many model-agnostic methods. Such `reduction' is done by first identifying the hardness unique to DP Transformer training: the attention distraction phenomenon and a lack of compatibility with existing techniques for efficient gradient clipping. To deal with these two issues, we propose the Re-Attention Mechanism and Phantom Clipping, respectively. We believe that our work not only casts new light on training DP Transformers but also promotes a modular treatment to advance research in the field of differentially private deep learning.

Updated: 2024-05-29 10:01:43

标题: 深入研究差分隐私Transformer

摘要: 使用差分隐私（DP）进行深度学习在过去几年中引起了重大关注，导致了许多方法的发展，旨在提高模型准确性和训练效率。本文深入探讨了使用差分隐私训练Transformer模型的问题。我们的处理是模块化的：逻辑是将训练DP Transformer的问题“简化”为训练DP基本神经网络的更基本问题。后者更容易理解，并且适用于许多与模型无关的方法。这种“简化”是通过首先识别DP Transformer训练中独特的困难：注意力分散现象和与现有技术不兼容的高效梯度剪裁问题。为了解决这两个问题，我们分别提出了再注意力机制和幻影剪裁。我们相信，我们的工作不仅为训练DP Transformers投下新的光，而且促进了模块化处理，推动了在差分私有深度学习领域的研究。

更新时间: 2024-05-29 10:01:43

领域: cs.LG,cs.CR

下载: http://arxiv.org/abs/2405.18194v2

Understanding and Improving Training-free Loss-based Diffusion Guidance

Adding additional control to pretrained diffusion models has become an increasingly popular research area, with extensive applications in computer vision, reinforcement learning, and AI for science. Recently, several studies have proposed training-free loss-based guidance by using off-the-shelf networks pretrained on clean images. This approach enables zero-shot conditional generation for universal control formats, which appears to offer a free lunch in diffusion guidance. In this paper, we aim to develop a deeper understanding of training-free guidance, as well as overcome its limitations. We offer a theoretical analysis that supports training-free guidance from the perspective of optimization, distinguishing it from classifier-based (or classifier-free) guidance. To elucidate their drawbacks, we theoretically demonstrate that training-free guidance is more susceptible to adversarial gradients and exhibits slower convergence rates compared to classifier guidance. We then introduce a collection of techniques designed to overcome the limitations, accompanied by theoretical rationale and empirical evidence. Our experiments in image and motion generation confirm the efficacy of these techniques.

Updated: 2024-05-29 09:59:36

标题: 理解和改善无需训练的基于损失的扩散引导

摘要: 将额外的控制添加到预训练扩散模型已成为一个日益流行的研究领域，在计算机视觉、强化学习和科学人工智能领域有广泛应用。最近，几项研究提出了使用在干净图像上预训练的现成网络进行无需训练的基于损失的引导。这种方法实现了对通用控制格式的零样本条件生成，似乎为扩散引导提供了一种免费午餐。本文旨在深入了解无需训练的引导，并克服其局限性。我们提供了支持无需训练引导的理论分析，从优化的角度区分了它与基于分类器（或无分类器）的引导。为了阐明它们的缺点，我们在理论上证明无需训练的引导更容易受到对抗梯度的影响，收敛速度比分类器引导慢。然后，我们介绍了一系列旨在克服这些限制的技术，配以理论依据和经验证据。我们在图像和运动生成方面的实验证实了这些技术的有效性。

更新时间: 2024-05-29 09:59:36

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2403.12404v2

WTTFNet: A Weather-Time-Trajectory Fusion Network for Pedestrian Trajectory Prediction in Urban Complex

Pedestrian trajectory modelling in an urban complex is challenging because pedestrians can have many possible destinations, such as shops, escalators, and attractions. Moreover, weather and time-of-day may affect pedestrian behavior. In this paper, a new weather-time-trajectory fusion network (WTTFNet) is proposed to improve the performance of baseline deep neural network architecture. By incorporating weather and time-of-day information as an embedding structure, a novel WTTFNet based on gate multimodal unit is used to fuse the multimodal information and deep representation of trajectories. A joint loss function based on focal loss is used to co-optimize both the deep trajectory features and final classifier, which helps to improve the accuracy in predicting the intended destination of pedestrians and hence the trajectories under possible scenarios of class imbalances. Experimental results using the Osaka Asia and Pacific Trade Center (ATC) dataset shows improved performance of the proposed approach over state-of-the-art algorithms by 23.67% increase in classification accuracy, 9.16% and 7.07% reduction of average and final displacement error. The proposed approach may serve as an attractive approach for improving existing baseline trajectory prediction models when they are applied to scenarios with influences of weather-time conditions. It can be employed in numerous applications such as pedestrian facility engineering, public space development and technology-driven retail.

Updated: 2024-05-29 09:56:54

标题: WTTFNet：一种用于城市复杂环境中行人轨迹预测的天气-时间-轨迹融合网络

摘要: 在城市综合体中对行人轨迹进行建模具有挑战性，因为行人可能有许多可能的目的地，例如商店、自动扶梯和景点。此外，天气和时间可能会影响行人的行为。本文提出了一种新的天气-时间-轨迹融合网络（WTTFNet），旨在提高基线深度神经网络架构的性能。通过将天气和时间信息作为嵌入结构，基于门控多模单元的新颖WTTFNet用于融合多模态信息和轨迹的深度表示。基于焦点损失的联合损失函数用于共同优化深度轨迹特征和最终分类器，有助于提高预测行人预期目的地的准确性，从而在可能存在类别不平衡的情况下改善轨迹。使用大阪亚太贸易中心（ATC）数据集的实验结果显示，所提出的方法在分类准确性方面比现有最先进算法提高了23.67%，平均和最终位移误差分别减少了9.16%和7.07%。当应用于受天气-时间条件影响的场景时，所提出的方法可能作为改进现有基线轨迹预测模型的吸引人方法。它可以应用于诸如行人设施工程、公共空间开发和技术驱动的零售等众多应用中。

更新时间: 2024-05-29 09:56:54

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2405.18945v1

Predicting Many Properties of Crystals by a Single Deep Learning Model

The use of machine learning methods for predicting the properties of crystalline materials encounters significant challenges, primarily related to input encoding, output versatility, and interpretability. Here, we introduce CrystalBERT, an adaptable transformer-based framework with novel structure that integrates space group, elemental, and unit cell information. The method's adaptability lies not only in its ability to seamlessly combine diverse features but also in its capability to accurately predict a wide range of physically important properties, including topological properties, superconducting transition temperatures, dielectric constants, and more. CrystalBERT also provides insightful physical interpretations regarding the features that most significantly influence the target properties. Our findings indicate that space group and elemental information are more important for predicting topological and superconducting properties, in contrast to some properties that primarily depend on the unit cell information. This underscores the intricate nature of topological and superconducting properties. By incorporating all these features, we achieve a high accuracy of 91% in topological classification, surpassing prior studies and identifying previously misclassified topological materials, further demonstrating the effectiveness of our model.

Updated: 2024-05-29 09:56:00

标题: 用一个深度学习模型预测晶体的许多性质

摘要: 机器学习方法在预测晶体材料性质方面遇到了重大挑战，主要涉及输入编码、输出多样性和可解释性。在这里，我们介绍了CrystalBERT，这是一个具有新颖结构的可调整的基于转换器的框架，集成了空间群、元素和晶胞信息。该方法的可调性不仅在于其能够无缝地结合各种特征，还在于其能够准确预测一系列物理重要性质，包括拓扑性质、超导转变温度、介电常数等。CrystalBERT还提供了关于最显著影响目标性质的特征的深刻物理解释。我们的研究结果表明，空间群和元素信息对于预测拓扑和超导性质更为重要，与一些主要依赖于晶胞信息的性质形成对比。这突显了拓扑和超导性质的复杂性质。通过整合所有这些特征，我们在拓扑分类中实现了91%的高准确率，超过先前的研究，并识别出先前被错误分类的拓扑材料，进一步展示了我们模型的有效性。

更新时间: 2024-05-29 09:56:00

领域: cond-mat.mtrl-sci,cond-mat.mes-hall,cs.LG

下载: http://arxiv.org/abs/2405.18944v1

Verifiably Robust Conformal Prediction

Conformal Prediction (CP) is a popular uncertainty quantification method that provides distribution-free, statistically valid prediction sets, assuming that training and test data are exchangeable. In such a case, CP's prediction sets are guaranteed to cover the (unknown) true test output with a user-specified probability. Nevertheless, this guarantee is violated when the data is subjected to adversarial attacks, which often result in a significant loss of coverage. Recently, several approaches have been put forward to recover CP guarantees in this setting. These approaches leverage variations of randomised smoothing to produce conservative sets which account for the effect of the adversarial perturbations. They are, however, limited in that they only support $\ell^2$-bounded perturbations and classification tasks. This paper introduces \emph{VRCP (Verifiably Robust Conformal Prediction)}, a new framework that leverages recent neural network verification methods to recover coverage guarantees under adversarial attacks. Our VRCP method is the first to support perturbations bounded by arbitrary norms including $\ell^1$, $\ell^2$, and $\ell^\infty$, as well as regression tasks. We evaluate and compare our approach on image classification tasks (CIFAR10, CIFAR100, and TinyImageNet) and regression tasks for deep reinforcement learning environments. In every case, VRCP achieves above nominal coverage and yields significantly more efficient and informative prediction regions than the SotA.

Updated: 2024-05-29 09:50:43

标题: 可验证的稳健性合规预测

摘要: Conformal Prediction (CP)是一种流行的不确定性量化方法，提供无分布、统计有效的预测集，假设训练和测试数据是可交换的。在这种情况下，CP的预测集被保证覆盖（未知的）真实测试输出，概率由用户指定。然而，当数据遭受对抗性攻击时，这个保证会被违反，通常导致覆盖率的显著损失。最近，已经提出了几种方法以恢复这种情况下的CP保证。这些方法利用随机平滑的变体产生保守集，考虑对抗性扰动的影响。然而，它们的局限在于只支持$\ell^2$-有界扰动和分类任务。本文介绍了\emph{VRCP（可验证鲁棒的符合预测）}，这是一个利用最近的神经网络验证方法，在对抗性攻击下恢复覆盖保证的新框架。我们的VRCP方法是第一个支持包括$\ell^1$、$\ell^2$和$\ell^\infty$在内的任意范数有界扰动，以及回归任务。我们在图像分类任务（CIFAR10、CIFAR100和TinyImageNet）和深度强化学习环境的回归任务上评估和比较我们的方法。在每种情况下，VRCP都实现了超过名义覆盖率，并且产生了比SotA更高效和更具信息性的预测区域。

更新时间: 2024-05-29 09:50:43

领域: cs.LO,cs.AI,cs.LG,68T37 (Primary) 68T27 (Secondary),G.3; I.2.4; F.4.1

下载: http://arxiv.org/abs/2405.18942v1

Content-Agnostic Moderation for Stance-Neutral Recommendation

Personalized recommendation systems often drive users towards more extreme content, exacerbating opinion polarization. While (content-aware) moderation has been proposed to mitigate these effects, such approaches risk curtailing the freedom of speech and of information. To address this concern, we propose and explore the feasibility of \emph{content-agnostic} moderation as an alternative approach for reducing polarization. Content-agnostic moderation does not rely on the actual content being moderated, arguably making it less prone to forms of censorship. We establish theoretically that content-agnostic moderation cannot be guaranteed to work in a fully generic setting. However, we show that it can often be effectively achieved in practice with plausible assumptions. We introduce two novel content-agnostic moderation methods that modify the recommendations from the content recommender to disperse user-item co-clusters without relying on content features. To evaluate the potential of content-agnostic moderation in controlled experiments, we built a simulation environment to analyze the closed-loop behavior of a system with a given set of users, recommendation system, and moderation approach. Through comprehensive experiments in this environment, we show that our proposed moderation methods significantly enhance stance neutrality and maintain high recommendation quality across various data scenarios. Our results indicate that achieving stance neutrality without direct content information is not only feasible but can also help in developing more balanced and informative recommendation systems without substantially degrading user engagement.

Updated: 2024-05-29 09:50:39

标题: 内容不可知的立场中立推荐审查

摘要: 个性化推荐系统通常会驱使用户倾向于更极端的内容，加剧意见分化。虽然已经提出（内容感知的）管理方法来减轻这种影响，但这些方法可能会限制言论和信息的自由。为了解决这一问题，我们提出并探讨了\emph{内容无关}管理作为减少分化的替代方法的可行性。内容无关的管理不依赖于实际被管理的内容，可以说这样的方法不太容易受到审查形式的影响。我们在理论上建立了内容无关的管理在完全通用设置下不能保证有效的证据。然而，我们展示了在实践中，在合理的假设下，这种方法通常可以有效实现。我们引入了两种新颖的内容无关管理方法，通过修改内容推荐器的推荐来分散用户-项目的共聚类，而不依赖于内容特征。为了评估控制实验中内容无关管理的潜力，我们建立了一个模拟环境来分析具有一组特定用户、推荐系统和管理方法的系统的闭环行为。通过在这个环境中的全面实验，我们展示了我们提出的管理方法显著增强了立场中立性，并在各种数据情景下保持了高质量的推荐。我们的结果表明，在没有直接内容信息的情况下实现立场中立性不仅是可行的，而且还有助于开发更加平衡和信息丰富的推荐系统，而不会显著降低用户参与度。

更新时间: 2024-05-29 09:50:39

领域: cs.IR,cs.LG

下载: http://arxiv.org/abs/2405.18941v1

HLOB -- Information Persistence and Structure in Limit Order Books

We introduce a novel large-scale deep learning model for Limit Order Book mid-price changes forecasting, and we name it `HLOB'. This architecture (i) exploits the information encoded by an Information Filtering Network, namely the Triangulated Maximally Filtered Graph, to unveil deeper and non-trivial dependency structures among volume levels; and (ii) guarantees deterministic design choices to handle the complexity of the underlying system by drawing inspiration from the groundbreaking class of Homological Convolutional Neural Networks. We test our model against 9 state-of-the-art deep learning alternatives on 3 real-world Limit Order Book datasets, each including 15 stocks traded on the NASDAQ exchange, and we systematically characterize the scenarios where HLOB outperforms state-of-the-art architectures. Our approach sheds new light on the spatial distribution of information in Limit Order Books and on its degradation over increasing prediction horizons, narrowing the gap between microstructural modeling and deep learning-based forecasting in high-frequency financial markets.

Updated: 2024-05-29 09:46:44

标题: HLOB -- 限价委托簿中的信息持久性和结构

摘要: 我们引入了一个新颖的大规模深度学习模型，用于限价订单簿中价位变化的预测，并将其命名为`HLOB'。该架构（i）利用信息过滤网络编码的信息，即三角形最大过滤图，揭示了体积水平之间更深层次和非平凡的依赖结构；（ii）通过借鉴开创性的同调卷积神经网络类设计选择，确保处理基础系统复杂性的确定性。我们在3个真实的限价订单簿数据集上对我们的模型进行了测试，每个数据集包括在纳斯达克交易的15支股票，并系统地表征了HLOB在哪些情况下胜过最先进的架构。我们的方法揭示了限价订单簿中信息的空间分布及其在增加预测时间跨度时的退化情况，缩小了微观结构建模和基于深度学习的高频金融市场预测之间的差距。

更新时间: 2024-05-29 09:46:44

领域: q-fin.TR,cs.LG

下载: http://arxiv.org/abs/2405.18938v1

Sketch-Plan-Generalize: Continual Few-Shot Learning of Inductively Generalizable Spatial Concepts

Our goal is to enable embodied agents to learn inductively generalizable spatial concepts, e.g., learning staircase as an inductive composition of towers of increasing height. Given a human demonstration, we seek a learning architecture that infers a succinct ${program}$ representation that explains the observed instance. Additionally, the approach should generalize inductively to novel structures of different sizes or complex structures expressed as a hierarchical composition of previously learned concepts. Existing approaches that use code generation capabilities of pre-trained large (visual) language models, as well as purely neural models, show poor generalization to a-priori unseen complex concepts. Our key insight is to factor inductive concept learning as (i) ${\it Sketch:}$ detecting and inferring a coarse signature of a new concept (ii) ${\it Plan:}$ performing MCTS search over grounded action sequences (iii) ${\it Generalize:}$ abstracting out grounded plans as inductive programs. Our pipeline facilitates generalization and modular reuse, enabling continual concept learning. Our approach combines the benefits of the code generation ability of large language models (LLM) along with grounded neural representations, resulting in neuro-symbolic programs that show stronger inductive generalization on the task of constructing complex structures in relation to LLM-only and neural-only approaches. Furthermore, we demonstrate reasoning and planning capabilities with learned concepts for embodied instruction following.

Updated: 2024-05-29 09:46:39

标题: 草图-规划-概括：感知归纳可推广空间概念的持续少样本学习

摘要: 我们的目标是使具有身体的代理人能够归纳地学习可推广的空间概念，例如，将楼梯学习为逐渐增高的塔的归纳组合。在给定人类示范的情况下，我们寻求一种学习架构，推断出一个简洁的程序表示，解释所观察到的实例。此外，这种方法应该归纳地推广到不同大小的新结构或以先前学习的概念的层次组合表达的复杂结构。现有的利用预训练大型（视觉）语言模型的代码生成能力以及纯神经模型的方法显示出对先验未见复杂概念的差劲泛化能力。我们的关键洞察是将归纳概念学习分解为（i）${\it Sketch:}$检测和推断新概念的粗略特征（ii）${\it Plan:}$执行基于地面的动作序列的MCTS搜索（iii）${\it Generalize:}$将基于地面的计划抽象为归纳程序。我们的流程促进了泛化和模块化重用，实现了持续的概念学习。我们的方法结合了大型语言模型（LLM）的代码生成能力和基于地面的神经表示的优势，产生了在构建复杂结构任务中比仅使用LLM或仅神经方法具有更强归纳泛化能力的神经符号程序。此外，我们展示了通过学习的概念进行具体指示后的推理和规划能力。

更新时间: 2024-05-29 09:46:39

领域: cs.LG,cs.RO

下载: http://arxiv.org/abs/2404.07774v2

ViTGAN: Training GANs with Vision Transformers

Recently, Vision Transformers (ViTs) have shown competitive performance on image recognition while requiring less vision-specific inductive biases. In this paper, we investigate if such performance can be extended to image generation. To this end, we integrate the ViT architecture into generative adversarial networks (GANs). For ViT discriminators, we observe that existing regularization methods for GANs interact poorly with self-attention, causing serious instability during training. To resolve this issue, we introduce several novel regularization techniques for training GANs with ViTs. For ViT generators, we examine architectural choices for latent and pixel mapping layers to facilitate convergence. Empirically, our approach, named ViTGAN, achieves comparable performance to the leading CNN-based GAN models on three datasets: CIFAR-10, CelebA, and LSUN bedroom.

Updated: 2024-05-29 09:41:05

标题: ViTGAN: 使用视觉Transformer训练GANs

摘要: 最近，视觉Transformer（ViTs）在图像识别方面表现出竞争力，同时需要更少的视觉特定归纳偏差。在本文中，我们研究了这种性能是否可以扩展到图像生成。为此，我们将ViT架构集成到生成对抗网络（GANs）中。对于ViT鉴别器，我们观察到现有的用于GANs的正则化方法与自注意力相互作用不佳，在训练过程中导致严重的不稳定性。为解决这个问题，我们引入了几种新颖的正则化技术，用于训练具有ViTs的GANs。对于ViT生成器，我们研究了潜在和像素映射层的架构选择，以促进收敛。实证上，我们的方法，命名为ViTGAN，在三个数据集（CIFAR-10，CelebA和LSUN卧室）上实现了与领先的基于CNN的GAN模型可比较的性能。

更新时间: 2024-05-29 09:41:05

领域: cs.CV,cs.LG,eess.IV

下载: http://arxiv.org/abs/2107.04589v2

Sherlock Holmes Doesn't Play Dice: The significance of Evidence Theory for the Social and Life Sciences

While Evidence Theory (Demster-Shafer Theory, Belief Functions Theory) is being increasingly used in data fusion, its potentialities in the Social and Life Sciences are often obscured by lack of awareness of its distinctive features. With this paper we stress that Evidence Theory can express the uncertainty deriving from the fear that events may materialize, that one has not been able to figure out. By contrast, Probability Theory must limit itself to the possibilities that a decision-maker is currently envisaging. Subsequently, we illustrate how Dempster-Shafer's combination rule relates to Bayes' Theorem for various versions of Probability Theory and discuss which applications of Information Theory can be enhanced by Evidence Theory. Finally, we illustrate our claims with an example where Evidence Theory is used to make sense of the partially overlapping, partially contradictory solutions that appear in an auditing exercise.

Updated: 2024-05-29 09:37:39

标题: 《福尔摩斯不玩骰子：证据理论对社会和生命科学的重要意义》

摘要: Evidence Theory（Dempster-Shafer理论，信念函数理论）在数据融合中得到越来越广泛的应用，但在社会和生命科学领域，其潜力常常被人们对其独特特性缺乏认识而掩盖。本文强调Evidence Theory能够表达由于担心事件可能发生而产生的不确定性，而这种担心可能是人们无法想象的。相比之下，概率论必须限制自己在决策者目前所设想的可能性范围内。接着，我们阐述了Dempster-Shafer的组合规则是如何与各种概率论的贝叶斯定理相关，并讨论了信息论的哪些应用可以通过Evidence Theory得到增强。最后，我们通过一个例子说明了Evidence Theory如何被用来理解审计练习中出现的部分重叠、部分矛盾的解决方案。

更新时间: 2024-05-29 09:37:39

领域: cs.AI,cs.HC,94.D.99,H.m

下载: http://arxiv.org/abs/2309.03222v2

LSPI: Heterogeneous Graph Neural Network Classification Aggregation Algorithm Based on Size Neighbor Path Identification

Existing heterogeneous graph neural network algorithms (HGNNs) mostly rely on meta-paths to capture the rich semantic information contained in heterogeneous graphs (also known as heterogeneous information networks (HINs)), but most of these HGNNs focus on different ways of feature aggre gation and ignore the properties of the meta-paths themselves. This paper studies meta-paths in three commonly used data sets and finds that there are huge differences in the number of neighbors connected by different meta paths. At the same time, the noise information contained in large neigh bor paths will have an adverse impact on model performance. Therefore, this paper proposes a Heterogeneous Graph Neural Network Classification and Aggregation Algorithm Based on Large and Small Neighbor Path Iden tification(LSPI). LSPI firstly divides the meta-paths into large and small neighbor paths through the path discriminator , and in order to reduce the noise interference problem in large neighbor paths, LSPI selects neighbor nodes with higher similarity from both topology and feature perspectives, and passes small neighbor paths and filtered large neighbor paths through different graph convolution components. Aggregation is performed to obtain feature information under different subgraphs, and then LSPI uses subgraph level attention to fuse the feature information under different subgraphs to generate the final node embedding. Finally this paper verifies the superiority of the method through extensive experiments and also gives suggestions on the number of nodes to be retained in large neighbor paths through exper iments. The complete reproducible code adn data has been published at: https://github.com/liuhua811/LSPIA.

Updated: 2024-05-29 09:37:23

标题: LSPI: 基于大小邻居路径识别的异构图神经网络分类聚合算法

摘要: 现有的异构图神经网络算法（HGNNs）大多依赖于元路径来捕获异构图中包含的丰富语义信息（也称为异构信息网络（HINs）），但大多数这些HGNNs关注不同的特征聚合方式，忽略了元路径本身的特性。本文研究了三个常用数据集中的元路径，并发现不同元路径连接的邻居数量存在很大差异。同时，大邻居路径中包含的噪声信息将对模型性能产生不利影响。因此，本文提出了一种基于大邻居路径和小邻居路径识别的异构图神经网络分类和聚合算法（LSPI）。LSPI首先通过路径判别器将元路径分为大和小邻居路径，为了减少大邻居路径中的噪声干扰问题，LSPI从拓扑和特征的角度选择具有更高相似度的邻居节点，并通过不同的图卷积组件传递小邻居路径和经过筛选的大邻居路径。聚合以获取不同子图下的特征信息，然后LSPI使用子图级别的注意力来融合不同子图下的特征信息，生成最终的节点嵌入。最后，本文通过大量实验验证了该方法的优越性，并通过实验提出了应在大邻居路径中保留的节点数量的建议。完整可重现的代码和数据已发布在：https://github.com/liuhua811/LSPIA。

更新时间: 2024-05-29 09:37:23

领域: cs.LG

下载: http://arxiv.org/abs/2405.18933v1

A Mallows-like Criterion for Anomaly Detection with Random Forest Implementation

The effectiveness of anomaly signal detection can be significantly undermined by the inherent uncertainty of relying on one specified model. Under the framework of model average methods, this paper proposes a novel criterion to select the weights on aggregation of multiple models, wherein the focal loss function accounts for the classification of extremely imbalanced data. This strategy is further integrated into Random Forest algorithm by replacing the conventional voting method. We have evaluated the proposed method on benchmark datasets across various domains, including network intrusion. The findings indicate that our proposed method not only surpasses the model averaging with typical loss functions but also outstrips common anomaly detection algorithms in terms of accuracy and robustness.

Updated: 2024-05-29 09:36:57

标题: 一个基于随机森林实现的类马洛斯准则用于异常检测

摘要: 异常信号检测的有效性可能会受到依赖单一指定模型固有不确定性的严重影响。在模型平均方法的框架下，本文提出了一种新颖的标准来选择多个模型聚合的权重，其中焦点损失函数考虑了极度不平衡数据的分类。这种策略进一步集成到随机森林算法中，通过替换传统的投票方法。我们在跨各个领域的基准数据集上评估了所提出的方法，包括网络入侵。研究结果表明，我们提出的方法不仅超越了具有典型损失函数的模型平均方法，而且在准确性和稳健性方面也超过了常见的异常检测算法。

更新时间: 2024-05-29 09:36:57

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2405.18932v1

Simultaneous Deep Learning of Myocardium Segmentation and T2 Quantification for Acute Myocardial Infarction MRI

In cardiac Magnetic Resonance Imaging (MRI) analysis, simultaneous myocardial segmentation and T2 quantification are crucial for assessing myocardial pathologies. Existing methods often address these tasks separately, limiting their synergistic potential. To address this, we propose SQNet, a dual-task network integrating Transformer and Convolutional Neural Network (CNN) components. SQNet features a T2-refine fusion decoder for quantitative analysis, leveraging global features from the Transformer, and a segmentation decoder with multiple local region supervision for enhanced accuracy. A tight coupling module aligns and fuses CNN and Transformer branch features, enabling SQNet to focus on myocardium regions. Evaluation on healthy controls (HC) and acute myocardial infarction patients (AMI) demonstrates superior segmentation dice scores (89.3/89.2) compared to state-of-the-art methods (87.7/87.9). T2 quantification yields strong linear correlations (Pearson coefficients: 0.84/0.93) with label values for HC/AMI, indicating accurate mapping. Radiologist evaluations confirm SQNet's superior image quality scores (4.60/4.58 for segmentation, 4.32/4.42 for T2 quantification) over state-of-the-art methods (4.50/4.44 for segmentation, 3.59/4.37 for T2 quantification). SQNet thus offers accurate simultaneous segmentation and quantification, enhancing cardiac disease diagnosis, such as AMI.

Updated: 2024-05-29 09:36:34

标题: 急性心肌梗死MRI的心肌分割和T2定量的同时深度学习

摘要: 在心脏磁共振成像（MRI）分析中，同时进行心肌分割和T2定量对于评估心肌病变至关重要。现有方法通常分别处理这些任务，限制了它们的协同潜力。为了解决这个问题，我们提出了SQNet，这是一个集成了Transformer和卷积神经网络（CNN）组件的双任务网络。SQNet具有一个T2-refine融合解码器，用于定量分析，利用Transformer的全局特征，以及一个带有多个局部区域监督的分割解码器，以增强准确性。一个紧密耦合模块对齐和融合CNN和Transformer分支特征，使SQNet能够专注于心肌区域。对健康对照组（HC）和急性心肌梗死患者（AMI）进行评估显示，与现有方法相比，分割dice分数（89.3/89.2）优越。T2定量与HC/AMI的标签值呈现强相关性（皮尔逊系数：0.84/0.93），表明准确的映射。放射科医生评估证实，SQNet在图像质量得分方面表现卓越（分割为4.60/4.58，T2定量为4.32/4.42），超过现有方法（分割为4.50/4.44，T2定量为3.59/4.37）。因此，SQNet提供了准确的同时分割和定量，增强了心脏疾病诊断，如AMI。

更新时间: 2024-05-29 09:36:34

领域: eess.IV,cs.AI

下载: http://arxiv.org/abs/2405.10570v3

EntProp: High Entropy Propagation for Improving Accuracy and Robustness

Deep neural networks (DNNs) struggle to generalize to out-of-distribution domains that are different from those in training despite their impressive performance. In practical applications, it is important for DNNs to have both high standard accuracy and robustness against out-of-distribution domains. One technique that achieves both of these improvements is disentangled learning with mixture distribution via auxiliary batch normalization layers (ABNs). This technique treats clean and transformed samples as different domains, allowing a DNN to learn better features from mixed domains. However, if we distinguish the domains of the samples based on entropy, we find that some transformed samples are drawn from the same domain as clean samples, and these samples are not completely different domains. To generate samples drawn from a completely different domain than clean samples, we hypothesize that transforming clean high-entropy samples to further increase the entropy generates out-of-distribution samples that are much further away from the in-distribution domain. On the basis of the hypothesis, we propose high entropy propagation~(EntProp), which feeds high-entropy samples to the network that uses ABNs. We introduce two techniques, data augmentation and free adversarial training, that increase entropy and bring the sample further away from the in-distribution domain. These techniques do not require additional training costs. Our experimental results show that EntProp achieves higher standard accuracy and robustness with a lower training cost than the baseline methods. In particular, EntProp is highly effective at training on small datasets.

Updated: 2024-05-29 09:36:20

标题: EntProp：提高准确性和鲁棒性的高熵传播

摘要: 深度神经网络（DNNs）尽管表现出色，但仍然在泛化到与训练中不同的分布领域时存在困难。在实际应用中，DNNs具有高标准准确性和对于不同分布领域的鲁棒性非常重要。一种可以同时实现这两种改进的技术是通过辅助批量归一化层（ABNs）进行混合分布的解缠学习。该技术将干净和转换样本视为不同的领域，使得DNN可以从混合领域中学习更好的特征。然而，如果我们根据熵来区分样本的领域，我们发现一些转换样本与干净样本来自同一领域，这些样本并非完全不同的领域。为了生成完全不同于干净样本的样本，我们假设将干净高熵样本转换以进一步增加熵会生成远离内部分布领域的不同分布样本。基于这一假设，我们提出了高熵传播（EntProp），将高熵样本馈送到使用ABNs的网络中。我们引入了两种技术，数据增强和自由对抗训练，以增加熵并将样本进一步远离内部分布领域。这些技术不需要额外的训练成本。我们的实验结果表明，EntProp比基线方法具有更高的标准准确性和鲁棒性，并且训练成本更低。特别是，EntProp在小数据集上训练非常有效。

更新时间: 2024-05-29 09:36:20

领域: stat.ML,cs.AI,cs.CV,cs.LG

下载: http://arxiv.org/abs/2405.18931v1

Deep Positive-Unlabeled Anomaly Detection for Contaminated Unlabeled Data

Semi-supervised anomaly detection, which aims to improve the performance of the anomaly detector by using a small amount of anomaly data in addition to unlabeled data, has attracted attention. Existing semi-supervised approaches assume that unlabeled data are mostly normal. They train the anomaly detector to minimize the anomaly scores for the unlabeled data, and to maximize those for the anomaly data. However, in practice, the unlabeled data are often contaminated with anomalies. This weakens the effect of maximizing the anomaly scores for anomalies, and prevents us from improving the detection performance. To solve this problem, we propose the positive-unlabeled autoencoder, which is based on positive-unlabeled learning and the anomaly detector such as the autoencoder. With our approach, we can approximate the anomaly scores for normal data using the unlabeled and anomaly data. Therefore, without the labeled normal data, we can train the anomaly detector to minimize the anomaly scores for normal data, and to maximize those for the anomaly data. In addition, our approach is applicable to various anomaly detectors such as the DeepSVDD. Experiments on various datasets show that our approach achieves better detection performance than existing approaches.

Updated: 2024-05-29 09:34:47

标题: 深度正标记异常检测用于受污染无标记数据

摘要: 半监督异常检测旨在通过使用少量异常数据来改善异常检测器的性能，已经引起了关注。现有的半监督方法假设未标记数据大多是正常的。他们训练异常检测器以最小化未标记数据的异常分数，并最大化异常数据的分数。然而，在实践中，未标记数据通常受到异常数据的污染。这削弱了最大化异常分数对异常数据的影响，并阻止我们提高检测性能。为了解决这个问题，我们提出了正例-未标记自动编码器，它基于正例-未标记学习和异常检测器，如自动编码器。通过我们的方法，我们可以使用未标记和异常数据来近似正常数据的异常分数。因此，在没有标记的正常数据的情况下，我们可以训练异常检测器以最小化正常数据的异常分数，并最大化异常数据的分数。此外，我们的方法适用于各种异常检测器，如DeepSVDD。对各种数据集的实验证明，我们的方法实现了比现有方法更好的检测性能。

更新时间: 2024-05-29 09:34:47

领域: stat.ML,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.18929v1

Hacking Task Confounder in Meta-Learning

Meta-learning enables rapid generalization to new tasks by learning knowledge from various tasks. It is intuitively assumed that as the training progresses, a model will acquire richer knowledge, leading to better generalization performance. However, our experiments reveal an unexpected result: there is negative knowledge transfer between tasks, affecting generalization performance. To explain this phenomenon, we conduct Structural Causal Models (SCMs) for causal analysis. Our investigation uncovers the presence of spurious correlations between task-specific causal factors and labels in meta-learning. Furthermore, the confounding factors differ across different batches. We refer to these confounding factors as "Task Confounders". Based on these findings, we propose a plug-and-play Meta-learning Causal Representation Learner (MetaCRL) to eliminate task confounders. It encodes decoupled generating factors from multiple tasks and utilizes an invariant-based bi-level optimization mechanism to ensure their causality for meta-learning. Extensive experiments on various benchmark datasets demonstrate that our work achieves state-of-the-art (SOTA) performance.

Updated: 2024-05-29 09:30:00

标题: Meta-Learning中的任务混淆者入侵

摘要: 元学习通过从各种任务中学习知识，实现对新任务的快速泛化。直觉上认为，随着训练的进行，模型将获得更丰富的知识，从而导致更好的泛化性能。然而，我们的实验揭示了一个意想不到的结果：任务之间存在负面知识转移，影响泛化性能。为了解释这一现象，我们进行了因果分析的结构性因果模型（SCMs）。我们的调查揭示了在元学习中，任务特定因果因素和标签之间存在虚假相关性。此外，混淆因素在不同批次之间也存在差异。我们将这些混淆因素称为“任务混淆因素”。基于这些发现，我们提出了一个即插即用的元学习因果表示学习器（MetaCRL），以消除任务混淆因素。它对来自多个任务的解耦生成因素进行编码，并利用基于不变性的双层优化机制确保它们在元学习中的因果关系。在各种基准数据集上进行的大量实验表明，我们的工作实现了最先进的性能。

更新时间: 2024-05-29 09:30:00

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2312.05771v5

Federated Continual Learning Goes Online: Leveraging Uncertainty for Modality-Agnostic Class-Incremental Learning

Given the ability to model more realistic and dynamic problems, Federated Continual Learning (FCL) has been increasingly investigated recently. A well-known problem encountered in this setting is the so-called catastrophic forgetting, for which the learning model is inclined to focus on more recent tasks while forgetting the previously learned knowledge. The majority of the current approaches in FCL propose generative-based solutions to solve said problem. However, this setting requires multiple training epochs over the data, implying an offline setting where datasets are stored locally and remain unchanged over time. Furthermore, the proposed solutions are tailored for vision tasks solely. To overcome these limitations, we propose a new modality-agnostic approach to deal with the online scenario where new data arrive in streams of mini-batches that can only be processed once. To solve catastrophic forgetting, we propose an uncertainty-aware memory-based approach. In particular, we suggest using an estimator based on the Bregman Information (BI) to compute the model's variance at the sample level. Through measures of predictive uncertainty, we retrieve samples with specific characteristics, and - by retraining the model on such samples - we demonstrate the potential of this approach to reduce the forgetting effect in realistic settings.

Updated: 2024-05-29 09:29:39

标题: Federated Continual Learning Goes Online: 利用不确定性实现模态不可知的类增量学习

摘要: 鉴于模拟更真实和动态问题的能力，联邦式持续学习（FCL）最近受到越来越多的关注。在这种设置中遇到的一个众所周知的问题是所谓的灾难性遗忘，即学习模型倾向于专注于更近期的任务，同时忘记先前学到的知识。当前大多数FCL中的方法都提出基于生成的解决方案来解决这个问题。然而，这种设置需要对数据进行多次训练周期，意味着离线设置，其中数据集存储在本地并随时间不变。此外，所提出的解决方案仅针对视觉任务而设计。为了克服这些限制，我们提出了一种新的模态无关方法来处理在线场景，其中新数据以小批量流的形式到达，只能处理一次。为了解决灾难性遗忘，我们提出了一种基于不确定性感知的基于记忆的方法。特别是，我们建议使用基于Bregman信息（BI）的估计器来计算模型在样本级别的方差。通过预测不确定性的测量，我们检索具有特定特征的样本，并通过在这些样本上重新训练模型，展示了这种方法在真实环境中减少遗忘效应的潜力。

更新时间: 2024-05-29 09:29:39

领域: cs.LG

下载: http://arxiv.org/abs/2405.18925v1

Matrix Information Theory for Self-Supervised Learning

The maximum entropy encoding framework provides a unified perspective for many non-contrastive learning methods like SimSiam, Barlow Twins, and MEC. Inspired by this framework, we introduce Matrix-SSL, a novel approach that leverages matrix information theory to interpret the maximum entropy encoding loss as matrix uniformity loss. Furthermore, Matrix-SSL enhances the maximum entropy encoding method by seamlessly incorporating matrix alignment loss, directly aligning covariance matrices in different branches. Experimental results reveal that Matrix-SSL outperforms state-of-the-art methods on the ImageNet dataset under linear evaluation settings and on MS-COCO for transfer learning tasks. Specifically, when performing transfer learning tasks on MS-COCO, our method outperforms previous SOTA methods such as MoCo v2 and BYOL up to 3.3% with only 400 epochs compared to 800 epochs pre-training. We also try to introduce representation learning into the language modeling regime by fine-tuning a 7B model using matrix cross-entropy loss, with a margin of 3.1% on the GSM8K dataset over the standard cross-entropy loss. Code available at https://github.com/yifanzhang-pro/Matrix-SSL.

Updated: 2024-05-29 09:26:58

标题: 矩阵信息理论用于自监督学习

摘要: 最大熵编码框架提供了一个统一的视角，用于许多非对比学习方法，如SimSiam、Barlow Twins和MEC。受到这一框架的启发，我们引入了Matrix-SSL，这是一种利用矩阵信息理论来解释最大熵编码损失的矩阵均匀性损失的新方法。此外，Matrix-SSL通过无缝地将矩阵对齐损失纳入最大熵编码方法，直接对不同分支中的协方差矩阵进行对齐，从而增强了最大熵编码方法。实验结果显示，Matrix-SSL在线性评估设置下的ImageNet数据集和MS-COCO转移学习任务上优于最先进的方法。具体来说，在MS-COCO上进行转移学习任务时，我们的方法在仅经过400个时代的情况下，与800个时代的预训练相比，优于MoCo v2和BYOL等先前的SOTA方法高达3.3%。我们还尝试通过使用矩阵交叉熵损失对7B模型进行微调，以在GSM8K数据集上比标准交叉熵损失高出3.1%。代码可在https://github.com/yifanzhang-pro/Matrix-SSL找到。

更新时间: 2024-05-29 09:26:58

领域: cs.LG,cs.AI,cs.CV

下载: http://arxiv.org/abs/2305.17326v6

GLANCE: Global Actions in a Nutshell for Counterfactual Explainability

Counterfactual explanations have emerged as an important tool to understand, debug, and audit complex machine learning models. To offer global counterfactual explainability, state-of-the-art methods construct summaries of local explanations, offering a trade-off among conciseness, counterfactual effectiveness, and counterfactual cost or burden imposed on instances. In this work, we provide a concise formulation of the problem of identifying global counterfactuals and establish principled criteria for comparing solutions, drawing inspiration from Pareto dominance. We introduce innovative algorithms designed to address the challenge of finding global counterfactuals for either the entire input space or specific partitions, employing clustering and decision trees as key components. Additionally, we conduct a comprehensive experimental evaluation, considering various instances of the problem and comparing our proposed algorithms with state-of-the-art methods. The results highlight the consistent capability of our algorithms to generate meaningful and interpretable global counterfactual explanations.

Updated: 2024-05-29 09:24:25

标题: GLANCE：反事实可解释性中的全球行动简述

摘要: 反事实解释已经成为理解、调试和审计复杂机器学习模型的重要工具。为了提供全局的反事实可解释性，最先进的方法构建了局部解释的摘要，提供了简洁性、反事实有效性以及对实例施加的反事实成本或负担之间的权衡。在这项工作中，我们提供了一个简洁的问题表述，用于识别全局反事实，并建立了比较解决方案的原则性标准，从Pareto支配中汲取灵感。我们引入了创新算法，旨在解决寻找整个输入空间或特定分区的全局反事实的挑战，采用聚类和决策树作为关键组成部分。此外，我们进行了全面的实验评估，考虑了问题的各种实例，并将我们提出的算法与最先进的方法进行了比较。结果突显了我们的算法生成有意义和可解释的全局反事实解释的一致能力。

更新时间: 2024-05-29 09:24:25

领域: cs.LG

下载: http://arxiv.org/abs/2405.18921v1

Computing low-thrust transfers in the asteroid belt, a comparison between astrodynamical manipulations and a machine learning approach

Low-thrust trajectories play a crucial role in optimizing scientific output and cost efficiency in asteroid belt missions. Unlike high-thrust transfers, low-thrust trajectories require solving complex optimal control problems. This complexity grows exponentially with the number of asteroids visited due to orbital mechanics intricacies. In the literature, methods for approximating low-thrust transfers without full optimization have been proposed, including analytical and machine learning techniques. In this work, we propose new analytical approximations and compare their accuracy and performance to machine learning methods. While analytical approximations leverage orbit theory to estimate trajectory costs, machine learning employs a more black-box approach, utilizing neural networks to predict optimal transfers based on various attributes. We build a dataset of about 3 million transfers, found by solving the time and fuel optimal control problems, for different time of flights, which we also release open-source. Comparison between the two methods on this database reveals the superiority of machine learning, especially for longer transfers. Despite challenges such as multi revolution transfers, both approaches maintain accuracy within a few percent in the final mass errors, on a database of trajectories involving numerous asteroids. This work contributes to the efficient exploration of mission opportunities in the asteroid belt, providing insights into the strengths and limitations of different approximation strategies.

Updated: 2024-05-29 09:20:54

标题: 在小行星带中计算低推力转移：对天体动力学操作和机器学习方法的比较

摘要: 低推力轨迹在优化小行星带任务的科学产出和成本效率中发挥着至关重要的作用。与高推力转移不同，低推力轨迹需要解决复杂的最优控制问题。由于轨道力学的复杂性，这种复杂性随着访问小行星数量的增加而呈指数级增长。在文献中，已经提出了一些用于近似低推力转移而无需完全优化的方法，包括分析和机器学习技术。在这项工作中，我们提出了新的分析近似，并将它们的准确性和性能与机器学习方法进行比较。虽然分析近似利用轨道理论来估计轨迹成本，但机器学习则采用更黑盒的方法，利用神经网络根据各种属性来预测最佳转移。我们建立了一个包含约300万次转移的数据集，通过解决时间和燃料最优控制问题，以获得不同飞行时间的转移，我们也将其开源发布。在这个数据库上两种方法的比较揭示了机器学习的优越性，尤其是对于较长的转移。尽管存在多次革命转移等挑战，但这两种方法在包含众多小行星的轨迹数据库中仍能在最终质量误差方面保持准确性在几个百分点以内。这项工作有助于有效地探索小行星带任务机会，提供了不同近似策略的优势和局限性的见解。

更新时间: 2024-05-29 09:20:54

领域: astro-ph.EP,astro-ph.IM,cs.LG,physics.space-ph

下载: http://arxiv.org/abs/2405.18918v1

Causal Action Influence Aware Counterfactual Data Augmentation

Offline data are both valuable and practical resources for teaching robots complex behaviors. Ideally, learning agents should not be constrained by the scarcity of available demonstrations, but rather generalize beyond the training distribution. However, the complexity of real-world scenarios typically requires huge amounts of data to prevent neural network policies from picking up on spurious correlations and learning non-causal relationships. We propose CAIAC, a data augmentation method that can create feasible synthetic transitions from a fixed dataset without having access to online environment interactions. By utilizing principled methods for quantifying causal influence, we are able to perform counterfactual reasoning by swapping $\it{action}$-unaffected parts of the state-space between independent trajectories in the dataset. We empirically show that this leads to a substantial increase in robustness of offline learning algorithms against distributional shift.

Updated: 2024-05-29 09:19:50

标题: 因果作用影响感知对照数据增强

摘要: 离线数据既是有价值的又是实用的资源，用于教授机器人复杂行为。理想情况下，学习代理应该不受可用演示的稀缺性限制，而是应该在训练分布之外进行泛化。然而，现实世界场景的复杂性通常需要大量数据，以防止神经网络策略捕捉到偶然相关性并学习非因果关系。我们提出了CAIAC，一种数据增强方法，可以在没有在线环境交互的情况下从固定数据集中创建可行的合成转换。通过利用量化因果影响的原则方法，我们能够通过在数据集中独立轨迹之间交换不受$\it{action}$影响的状态空间部分来进行反事实推理。我们在实证上表明，这导致离线学习算法对分布变化的鲁棒性显著增加。

更新时间: 2024-05-29 09:19:50

领域: cs.LG,cs.AI,cs.RO

下载: http://arxiv.org/abs/2405.18917v1

Learn to Refuse: Making Large Language Models More Controllable and Reliable through Knowledge Scope Limitation and Refusal Mechanism

Large language models (LLMs) have demonstrated impressive language understanding and generation capabilities, enabling them to answer a wide range of questions across various domains. However, these models are not flawless and often produce responses that contain errors or misinformation. These inaccuracies, commonly referred to as hallucinations, render LLMs unreliable and even unusable in many scenarios. In this paper, our focus is on mitigating the issue of hallucination in LLMs, particularly in the context of question-answering. Instead of attempting to answer all questions, we explore a refusal mechanism that instructs LLMs to refuse to answer challenging questions in order to avoid errors. We then propose a simple yet effective solution called Learn to Refuse (L2R), which incorporates the refusal mechanism to enable LLMs to recognize and refuse to answer questions that they find difficult to address. To achieve this, we utilize a structured knowledge base to represent all the LLM's understanding of the world, enabling it to provide traceable gold knowledge. This knowledge base is separate from the LLM and initially empty. It can be filled with validated knowledge and progressively expanded. When an LLM encounters questions outside its domain, the system recognizes its knowledge scope and determines whether it can answer the question independently. Additionally, we introduce a method for automatically and efficiently expanding the knowledge base of LLMs. Through qualitative and quantitative analysis, we demonstrate that our approach enhances the controllability and reliability of LLMs.

Updated: 2024-05-29 09:19:35

标题: 学会拒绝：通过知识范围限制和拒绝机制使大型语言模型更加可控和可靠

摘要: 大型语言模型(LLMs)展示了令人印象深刻的语言理解和生成能力，使其能够回答各种领域的广泛问题。然而，这些模型并非完美，通常会产生包含错误或错误信息的响应。这些不准确性，通常称为幻觉，使LLMs在许多场景中变得不可靠甚至无法使用。本文的重点是缓解LLMs中的幻觉问题，特别是在问答的背景下。我们不是试图回答所有问题，而是探索一种拒绝机制，指示LLMs拒绝回答具有挑战性的问题，以避免错误。然后，我们提出了一种简单而有效的解决方案，名为Learn to Refuse (L2R)，该解决方案将拒绝机制纳入，使LLMs能够识别并拒绝回答他们发现难以应对的问题。为了实现这一点，我们利用结构化知识库来表示LLMs对世界的理解，使其能够提供可追溯的黄金知识。这个知识库与LLM分开，并且最初为空。它可以填充经过验证的知识并逐渐扩展。当LLM遇到超出其领域范围的问题时，系统会识别其知识范围，并确定它是否能独立回答问题。此外，我们介绍了一种自动和高效地扩展LLMs知识库的方法。通过定性和定量分析，我们证明了我们的方法增强了LLMs的可控性和可靠性。

更新时间: 2024-05-29 09:19:35

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2311.01041v3

Towards Faithful Chain-of-Thought: Large Language Models are Bridging Reasoners

Large language models (LLMs) suffer from serious unfaithful chain-of-thought (CoT) issues. Previous work attempts to measure and explain it but lacks in-depth analysis within CoTs and does not consider the interactions among all reasoning components jointly. In this paper, we first study the CoT faithfulness issue at the granularity of CoT steps, identify two reasoning paradigms: centralized reasoning and distributed reasoning, and find their relationship with faithfulness. Subsequently, we conduct a joint analysis of the causal relevance among the context, CoT, and answer during reasoning. The result proves that, when the LLM predicts answers, it can recall correct information missing in the CoT from the context, leading to unfaithfulness issues. Finally, we propose the inferential bridging method to mitigate this issue, in which we use the attribution method to recall information as hints for CoT generation and filter out noisy CoTs based on their semantic consistency and attribution scores. Extensive experiments demonstrate that our approach effectively alleviates the unfaithful CoT problem.

Updated: 2024-05-29 09:17:46

标题: 朝着忠实的思维链前进：大型语言模型正在连接推理者

摘要: 大型语言模型(LLMs)存在严重的不忠实的思维链(CoT)问题。先前的工作试图衡量和解释这一问题，但缺乏对CoTs内部的深入分析，并未考虑所有推理组件之间的相互作用。在本文中，我们首先研究CoT忠实性问题在CoT步骤的粒度上，确定了两种推理范式：集中式推理和分布式推理，并发现它们与忠实性的关系。随后，我们对推理过程中上下文、CoT和答案之间的因果关系进行了联合分析。结果证明，当LLM预测答案时，它可以从上下文中召回CoT中缺失的正确信息，导致不忠实问题。最后，我们提出推理桥接方法来缓解这一问题，其中我们使用归因方法召回信息作为CoT生成的线索，并根据它们的语义一致性和归因分数过滤出嘈杂的CoTs。大量实验证明，我们的方法有效地缓解了不忠实的CoT问题。

更新时间: 2024-05-29 09:17:46

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.18915v1

Leveraging Time-Series Foundation Models in Smart Agriculture for Soil Moisture Forecasting

The recent surge in foundation models for natural language processing and computer vision has fueled innovation across various domains. Inspired by this progress, we explore the potential of foundation models for time-series forecasting in smart agriculture, a field often plagued by limited data availability. Specifically, this work presents a novel application of $\texttt{TimeGPT}$, a state-of-the-art (SOTA) time-series foundation model, to predict soil water potential ($\psi_\mathrm{soil}$), a key indicator of field water status that is typically used for irrigation advice. Traditionally, this task relies on a wide array of input variables. We explore $\psi_\mathrm{soil}$'s ability to forecast $\psi_\mathrm{soil}$ in: ($i$) a zero-shot setting, ($ii$) a fine-tuned setting relying solely on historic $\psi_\mathrm{soil}$ measurements, and ($iii$) a fine-tuned setting where we also add exogenous variables to the model. We compare $\texttt{TimeGPT}$'s performance to established SOTA baseline models for forecasting $\psi_\mathrm{soil}$. Our results demonstrate that $\texttt{TimeGPT}$ achieves competitive forecasting accuracy using only historical $\psi_\mathrm{soil}$ data, highlighting its remarkable potential for agricultural applications. This research paves the way for foundation time-series models for sustainable development in agriculture by enabling forecasting tasks that were traditionally reliant on extensive data collection and domain expertise.

Updated: 2024-05-29 09:16:03

标题: 利用时间序列基础模型在智能农业中进行土壤湿度预测

摘要: 最近，基于自然语言处理和计算机视觉的基础模型的激增推动了各个领域的创新。受到这一进展的启发，我们探索了基础模型在智能农业时间序列预测中的潜力，这个领域经常受限于数据的有限可用性。具体来说，本文介绍了$\texttt{TimeGPT}$的一种新颖应用，这是一种最先进的时间序列基础模型，用于预测土壤水势($\psi_\mathrm{soil}$)，这是田间水分状况的关键指标，通常用于灌溉建议。传统上，这项任务依赖于各种输入变量。我们探讨了$\psi_\mathrm{soil}$在以下情况下预测$\psi_\mathrm{soil}$的能力：($i$)零-shot设置，($ii$)仅依赖历史$\psi_\mathrm{soil}$测量的微调设置，以及($iii$)在模型中还添加外生变量的微调设置。我们将$\texttt{TimeGPT}$的性能与已建立的SOTA基线模型进行了比较，用于预测$\psi_\mathrm{soil}$。我们的结果表明，$\texttt{TimeGPT}$仅使用历史$\psi_\mathrm{soil}$数据就实现了竞争力的预测准确性，突显了它在农业应用中的显著潜力。这项研究为农业可持续发展铺平了道路，通过使传统上依赖于大量数据收集和领域专业知识的预测任务成为可能。

更新时间: 2024-05-29 09:16:03

领域: cs.LG

下载: http://arxiv.org/abs/2405.18913v1

NeuralODEs for VLEO simulations: Introducing thermoNET for Thermosphere Modeling

We introduce a novel neural architecture termed thermoNET, designed to represent thermospheric density in satellite orbital propagation using a reduced amount of differentiable computations. Due to the appearance of a neural network on the right-hand side of the equations of motion, the resulting satellite dynamics is governed by a NeuralODE, a neural Ordinary Differential Equation, characterized by its fully differentiable nature, allowing the derivation of variational equations (hence of the state transition matrix) and facilitating its use in connection to advanced numerical techniques such as Taylor-based numerical propagation and differential algebraic techniques. Efficient training of the network parameters occurs through two distinct approaches. In the first approach, the network undergoes training independently of spacecraft dynamics, engaging in a pure regression task against ground truth models, including JB-08 and NRLMSISE-00. In the second paradigm, network parameters are learned based on observed dynamics, adapting through ODE sensitivities. In both cases, the outcome is a flexible, compact model of the thermosphere density greatly enhancing numerical propagation efficiency while maintaining accuracy in the orbital predictions.

Updated: 2024-05-29 09:12:44

标题: 神经ODE用于VLEO模拟：引入ThermoNET进行热层建模

摘要: 我们介绍了一种称为热网络的新型神经架构，旨在利用较少的可微分计算来表示卫星轨道传播中的热层密度。由于在运动方程的右侧出现了一个神经网络，导致结果卫星动力学由神经ODE（神经常微分方程）控制，其特点是完全可微分的性质，允许推导变分方程（因此状态转移矩阵）并促进其与先进数值技术（如基于泰勒的数值传播和微分代数技术）的连接。通过两种不同的方法有效训练网络参数。在第一种方法中，网络独立于航天器动力学进行训练，从事纯回归任务与地面真实模型（包括JB-08和NRLMSISE-00）对抗。在第二范式中，网络参数基于观察到的动态进行学习，通过ODE敏感性进行调整。在两种情况下，结果是一个灵活、紧凑的热层密度模型，极大地提高了数值传播效率，同时在轨道预测中保持准确性。

更新时间: 2024-05-29 09:12:44

领域: astro-ph.EP,cs.LG,physics.ao-ph,physics.space-ph

下载: http://arxiv.org/abs/2405.19384v1

Predicting Parking Availability in Singapore with Cross-Domain Data: A New Dataset and A Data-Driven Approach

The increasing number of vehicles highlights the need for efficient parking space management. Predicting real-time Parking Availability (PA) can help mitigate traffic congestion and the corresponding social problems, which is a pressing issue in densely populated cities like Singapore. In this study, we aim to collectively predict future PA across Singapore with complex factors from various domains. The contributions in this paper are listed as follows: (1) A New Dataset: We introduce the \texttt{SINPA} dataset, containing a year's worth of PA data from 1,687 parking lots in Singapore, enriched with various spatial and temporal factors. (2) A Data-Driven Approach: We present DeepPA, a novel deep-learning framework, to collectively and efficiently predict future PA across thousands of parking lots. (3) Extensive Experiments and Deployment: DeepPA demonstrates a 9.2% reduction in prediction error for up to 3-hour forecasts compared to existing advanced models. Furthermore, we implement DeepPA in a practical web-based platform to provide real-time PA predictions to aid drivers and inform urban planning for the governors in Singapore. We release the dataset and source code at https://github.com/yoshall/SINPA.

Updated: 2024-05-29 09:11:51

标题: 使用跨领域数据预测新加坡停车位可用性：一个新数据集和数据驱动方法

摘要: 车辆数量的增加突显了高效停车空间管理的需求。预测实时停车位可用性（PA）可以帮助缓解交通拥堵及相关社会问题，在新加坡等人口密集的城市中尤为迫切。本研究旨在共同预测新加坡各地未来的PA，涉及各个领域的复杂因素。本文的贡献如下：（1）新数据集：我们引入了SINPA数据集，包含新加坡1,687个停车场一年的PA数据，丰富了各种空间和时间因素。（2）数据驱动方法：我们提出了DeepPA，一种新颖的深度学习框架，能够集体且高效地预测成千上万个停车场的未来PA。（3）大量实验和部署：DeepPA相对于现有先进模型，在长达3小时的预测中，预测误差减少了9.2%。此外，我们将DeepPA实施在一个实用的基于Web的平台上，为新加坡的驾驶员提供实时PA预测，以帮助他们并为新加坡的城市规划者提供信息。我们在https://github.com/yoshall/SINPA发布了数据集和源代码。

更新时间: 2024-05-29 09:11:51

领域: cs.AI

下载: http://arxiv.org/abs/2405.18910v1

Language Generation with Strictly Proper Scoring Rules

Language generation based on maximum likelihood estimation (MLE) has become the fundamental approach for text generation. Maximum likelihood estimation is typically performed by minimizing the log-likelihood loss, also known as the logarithmic score in statistical decision theory. The logarithmic score is strictly proper in the sense that it encourages honest forecasts, where the expected score is maximized only when the model reports true probabilities. Although many strictly proper scoring rules exist, the logarithmic score is the only local scoring rule among them that depends exclusively on the probability of the observed sample, making it capable of handling the exponentially large sample space of natural text. In this work, we propose a straightforward strategy for adapting scoring rules to language generation, allowing for language modeling with any non-local scoring rules. Leveraging this strategy, we train language generation models using two classic strictly proper scoring rules, the Brier score and the Spherical score, as alternatives to the logarithmic score. Experimental results indicate that simply substituting the loss function, without adjusting other hyperparameters, can yield substantial improvements in model's generation capabilities. Moreover, these improvements can scale up to large language models (LLMs) such as LLaMA-7B and LLaMA-13B. Source code: \url{https://github.com/shaochenze/ScoringRulesLM}.

Updated: 2024-05-29 09:09:00

标题: 使用严格恰当评分规则的语言生成

摘要: 基于最大似然估计（MLE）的语言生成已成为文本生成的基本方法。最大似然估计通常通过最小化对数似然损失来实现，也称为统计决策理论中的对数分数。对数分数严格适当，它鼓励诚实的预测，其中期望分数仅在模型报告真实概率时最大化。尽管存在许多严格适当的评分规则，但在其中，对数分数是唯一的局部评分规则，它仅取决于观察样本的概率，使其能够处理自然文本的指数级样本空间。在这项工作中，我们提出了一种简单的策略，将评分规则调整为语言生成，允许使用任何非局部评分规则进行语言建模。利用这种策略，我们使用两种经典的严格适当的评分规则，Brier分数和球形分数，作为对数分数的替代方法，训练语言生成模型。实验结果表明，仅通过替换损失函数，而不调整其他超参数，可以显著提高模型的生成能力。此外，这些改进可以扩展到大型语言模型（LLMs），如LLaMA-7B和LLaMA-13B。源代码：https://github.com/shaochenze/ScoringRulesLM。

更新时间: 2024-05-29 09:09:00

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2405.18906v1

On the Properties and Estimation of Pointwise Mutual Information Profiles

The pointwise mutual information profile, or simply profile, is the distribution of pointwise mutual information for a given pair of random variables. One of its important properties is that its expected value is precisely the mutual information between these random variables. In this paper, we analytically describe the profiles of multivariate normal distributions and introduce a novel family of distributions, Bend and Mix Models, for which the profile can be accurately estimated using Monte Carlo methods. We then show how Bend and Mix Models can be used to study the limitations of existing mutual information estimators, investigate the behavior of neural critics used in variational estimators, and understand the effect of experimental outliers on mutual information estimation. Finally, we show how Bend and Mix Models can be used to obtain model-based Bayesian estimates of mutual information, suitable for problems with available domain expertise in which uncertainty quantification is necessary.

Updated: 2024-05-29 09:04:18

标题: 关于点间互信息轮廓的特性和估计

摘要: 点间互信息剖面，或简称剖面，是给定一对随机变量的点间互信息的分布。其重要特性之一是其期望值恰好是这些随机变量之间的互信息。在本文中，我们解析地描述了多元正态分布的剖面，并引入了一种新的分布族，弯曲与混合模型，通过蒙特卡罗方法可以准确估计其剖面。然后，我们展示了如何使用弯曲与混合模型来研究现有互信息估计器的局限性，调查用于变分估计器的神经评论家的行为，以及了解实验异常值对互信息估计的影响。最后，我们展示了如何使用弯曲与混合模型来获得基于模型的贝叶斯估计的互信息，适用于具有可用领域专业知识且需要不确定性量化的问题。

更新时间: 2024-05-29 09:04:18

领域: stat.ML,cs.IT,cs.LG,math.IT

下载: http://arxiv.org/abs/2310.10240v2

A Causal Framework for Evaluating Deferring Systems

Deferring systems extend supervised Machine Learning (ML) models with the possibility to defer predictions to human experts. However, evaluating the impact of a deferring strategy on system accuracy is still an overlooked area. This paper fills this gap by evaluating deferring systems through a causal lens. We link the potential outcomes framework for causal inference with deferring systems. This allows us to identify the causal impact of the deferring strategy on predictive accuracy. We distinguish two scenarios. In the first one, we can access both the human and the ML model predictions for the deferred instances. In such a case, we can identify the individual causal effects for deferred instances and aggregates of them. In the second scenario, only human predictions are available for the deferred instances. In this case, we can resort to regression discontinuity design to estimate a local causal effect. We empirically evaluate our approach on synthetic and real datasets for seven deferring systems from the literature.

Updated: 2024-05-29 09:03:44

标题: 一个用于评估延迟系统的因果框架

摘要: 推迟系统扩展了监督机器学习（ML）模型，使其具有将预测推迟给人类专家的可能性。然而，评估推迟策略对系统准确性的影响仍然是一个被忽视的领域。本文通过因果镜头评估推迟系统来填补这一空白。我们将因果推断的潜在结果框架与推迟系统联系起来。这使我们能够确定推迟策略对预测准确性的因果影响。我们区分两种情况。在第一种情况下，我们可以访问推迟实例的人类和ML模型的预测。在这种情况下，我们可以确定推迟实例的个体因果效应和它们的聚合效应。在第二种情况下，仅可用于推迟实例的人类预测。在这种情况下，我们可以借助回归断点设计来估计局部因果效应。我们在合成和真实数据集上对来自文献的七个推迟系统的方法进行了实证评估。

更新时间: 2024-05-29 09:03:44

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2405.18902v1

A novel hybrid time-varying graph neural network for traffic flow forecasting

In order to overcome these challenges, we have proposed a novel hybrid time-varying graph neural network (HTVGNN) for traffic flow prediction. Firstly, a novel time-aware multi-attention mechanism based on time-varying mask enhancement was reported to more accurately model the dynamic temporal dependencies among distinct traffic nodes in the traffic network. Secondly, we have proposed a novel graph learning strategy to concurrently learn both static and dynamic spatial associations between different traffic nodes in road networks. Meanwhile, in order to enhance the learning ability of time-varying graphs, a coupled graph learning mechanism was designed to couple the graphs learned at each time step. Finally, the effectiveness of the proposed method HTVGNN was demonstrated with four real data sets. Simulation results revealed that HTVGNN achieves superior prediction accuracy compared to the state of the art space-time graph neural network models. Additionally, the ablation experiment verifies that the coupled graph learning mechanism can effectively improve the long-term prediction performance of HTVGNN.

Updated: 2024-05-29 09:00:32

标题: 一个新颖的混合时变图神经网络用于交通流量预测

摘要: 为了克服这些挑战，我们提出了一种新颖的混合时变图神经网络（HTVGNN）用于交通流预测。首先，基于时变掩模增强的新型时间感知多注意力机制被提出，以更准确地建模交通网络中不同交通节点之间的动态时间依赖关系。其次，我们提出了一种新颖的图学习策略，同时学习道路网络中不同交通节点之间的静态和动态空间关联。同时，为了增强时变图的学习能力，设计了一种耦合图学习机制，用于耦合在每个时间步学习的图。最后，通过四个真实数据集展示了所提出的方法HTVGNN的有效性。模拟结果显示，与现有的时空图神经网络模型相比，HTVGNN实现了更高的预测准确性。此外，消融实验验证了耦合图学习机制可以有效提高HTVGNN的长期预测性能。

更新时间: 2024-05-29 09:00:32

领域: cs.LG

下载: http://arxiv.org/abs/2401.10155v3

Evolutionary Computation in the Era of Large Language Model: Survey and Roadmap

Large language models (LLMs) have not only revolutionized natural language processing but also extended their prowess to various domains, marking a significant stride towards artificial general intelligence. The interplay between LLMs and evolutionary algorithms (EAs), despite differing in objectives and methodologies, share a common pursuit of applicability in complex problems. Meanwhile, EA can provide an optimization framework for LLM's further enhancement under black-box settings, empowering LLM with flexible global search capacities. On the other hand, the abundant domain knowledge inherent in LLMs could enable EA to conduct more intelligent searches. Furthermore, the text processing and generative capabilities of LLMs would aid in deploying EAs across a wide range of tasks. Based on these complementary advantages, this paper provides a thorough review and a forward-looking roadmap, categorizing the reciprocal inspiration into two main avenues: LLM-enhanced EA and EA-enhanced LLM. Some integrated synergy methods are further introduced to exemplify the complementarity between LLMs and EAs in diverse scenarios, including code generation, software engineering, neural architecture search, and various generation tasks. As the first comprehensive review focused on the EA research in the era of LLMs, this paper provides a foundational stepping stone for understanding the collaborative potential of LLMs and EAs. The identified challenges and future directions offer guidance for researchers and practitioners to unlock the full potential of this innovative collaboration in propelling advancements in optimization and artificial intelligence. We have created a GitHub repository to index the relevant papers: https://github.com/wuxingyu-ai/LLM4EC.

Updated: 2024-05-29 09:00:25

标题: 在大语言模型时代的进化计算：调查与路线图

摘要: 大型语言模型(LLMs)不仅彻底改变了自然语言处理，而且将它们的能力扩展到各种领域，标志着人工智能的重要一步。尽管LLMs和进化算法(EAs)在目标和方法论上存在差异，但二者之间存在相互作用，共同追求在复杂问题中的适用性。与此同时，EA可以为LLM提供优化框架，在黑盒设置下进一步增强LLM，赋予LLM灵活的全局搜索能力。另一方面，LLMs内在丰富的领域知识可以使EA进行更智能的搜索。此外，LLMs的文本处理和生成能力将有助于在各种任务中部署EA。基于这些互补优势，本文提供了一份全面的审查和前瞻性路线图，将互相启发分为两个主要途径：LLM增强EA和EA增强LLM。进一步介绍了一些整合的协同方法，以示范LLMs和EAs在不同场景中的互补性，包括代码生成、软件工程、神经架构搜索和各种生成任务。作为着眼于LLMs时代的EA研究的第一份全面审查，本文为理解LLMs和EAs的协同潜力提供了基础性的参考。确定的挑战和未来方向为研究人员和从业者提供指导，以释放这种创新协作在推动优化和人工智能进步方面的全部潜力。我们已创建了一个GitHub存储库来索引相关论文：https://github.com/wuxingyu-ai/LLM4EC。

更新时间: 2024-05-29 09:00:25

领域: cs.NE,cs.AI,cs.CL

下载: http://arxiv.org/abs/2401.10034v3

Eluder-based Regret for Stochastic Contextual MDPs

We present the E-UC$^3$RL algorithm for regret minimization in Stochastic Contextual Markov Decision Processes (CMDPs). The algorithm operates under the minimal assumptions of realizable function class and access to \emph{offline} least squares and log loss regression oracles. Our algorithm is efficient (assuming efficient offline regression oracles) and enjoys a regret guarantee of $ \widetilde{O}(H^3 \sqrt{T |S| |A|d_{\mathrm{E}}(\mathcal{P}) \log (|\mathcal{F}| |\mathcal{P}|/ \delta) )}) , $ with $T$ being the number of episodes, $S$ the state space, $A$ the action space, $H$ the horizon, $\mathcal{P}$ and $\mathcal{F}$ are finite function classes used to approximate the context-dependent dynamics and rewards, respectively, and $d_{\mathrm{E}}(\mathcal{P})$ is the Eluder dimension of $\mathcal{P}$ w.r.t the Hellinger distance. To the best of our knowledge, our algorithm is the first efficient and rate-optimal regret minimization algorithm for CMDPs that operates under the general offline function approximation setting. In addition, we extend the Eluder dimension to general bounded metrics which may be of separate interest.

Updated: 2024-05-29 08:57:58

标题: 基于逃避者的后悔算法用于随机情境MDPs

摘要: 我们提出了用于随机环境背景下马尔可夫决策过程（CMDPs）中遗憾最小化的E-UC$^3$RL算法。该算法在实现函数类和\emph{离线}最小二乘和对数损失回归预言的最低假设下运行。我们的算法是高效的（假设离线回归预言高效），并且拥有$ \widetilde{O}(H^3 \sqrt{T |S| |A|d_{\mathrm{E}}(\mathcal{P}) \log (|\mathcal{F}| |\mathcal{P}|/ \delta) )}) $的遗憾保证，其中$T$是剧集数，$S$是状态空间，$A$是动作空间，$H$是视野，$\mathcal{P}$和$\mathcal{F}$是用于近似依赖于环境的动态和奖励的有限函数类，$d_{\mathrm{E}}(\mathcal{P})$是关于Hellinger距离的$\mathcal{P}$的Eluder维度。据我们所知，我们的算法是第一个在一般离线函数逼近设置下运行的CMDPs的高效和速率最优的遗憾最小化算法。此外，我们将Eluder维度扩展到可能是单独感兴趣的一般有界度量。

更新时间: 2024-05-29 08:57:58

领域: cs.LG

下载: http://arxiv.org/abs/2211.14932v3

Unit-Aware Genetic Programming for the Development of Empirical Equations

When developing empirical equations, domain experts require these to be accurate and adhere to physical laws. Often, constants with unknown units need to be discovered alongside the equations. Traditional unit-aware genetic programming (GP) approaches cannot be used when unknown constants with undetermined units are included. This paper presents a method for dimensional analysis that propagates unknown units as ''jokers'' and returns the magnitude of unit violations. We propose three methods, namely evolutive culling, a repair mechanism, and a multi-objective approach, to integrate the dimensional analysis in the GP algorithm. Experiments on datasets with ground truth demonstrate comparable performance of evolutive culling and the multi-objective approach to a baseline without dimensional analysis. Extensive analysis of the results on datasets without ground truth reveals that the unit-aware algorithms make only low sacrifices in accuracy, while producing unit-adherent solutions. Overall, we presented a promising novel approach for developing unit-adherent empirical equations.

Updated: 2024-05-29 08:57:00

标题: 单位感知遗传规划用于经验方程的开发

摘要: 在开发经验方程式时，领域专家需要确保这些方程式准确且符合物理定律。通常，需要在方程式中发现具有未知单位的常数。传统的单位感知遗传编程（GP）方法无法处理包含未知单位的常数。本文提出了一种尺度分析方法，将未知单位传播为“王牌”，并返回单位违规的大小。我们提出了三种方法，即进化淘汰、修复机制和多目标方法，将尺度分析集成到GP算法中。对具有基准数据的数据集进行实验表明，进化淘汰和多目标方法的性能与没有尺度分析的基线相当。对没有基准数据的数据集结果的广泛分析显示，单位感知算法在准确性方面只有较小的牺牲，同时产生符合单位的解决方案。总体而言，我们提出了一种有前途的新方法，用于开发符合单位的经验方程式。

更新时间: 2024-05-29 08:57:00

领域: cs.LG,cs.SC

下载: http://arxiv.org/abs/2405.18896v1

SoftED: Metrics for Soft Evaluation of Time Series Event Detection

Time series event detection methods are evaluated mainly by standard classification metrics that focus solely on detection accuracy. However, inaccuracy in detecting an event can often result from its preceding or delayed effects reflected in neighboring detections. These detections are valuable to trigger necessary actions or help mitigate unwelcome consequences. In this context, current metrics are insufficient and inadequate for the context of event detection. There is a demand for metrics that incorporate both the concept of time and temporal tolerance for neighboring detections. This paper introduces SoftED metrics, a new set of metrics designed for soft evaluating event detection methods. They enable the evaluation of both detection accuracy and the degree to which their detections represent events. They improved event detection evaluation by associating events and their representative detections, incorporating temporal tolerance in over 36\% of experiments compared to the usual classification metrics. SoftED metrics were validated by domain specialists that indicated their contribution to detection evaluation and method selection.

Updated: 2024-05-29 08:56:49

标题: SoftED：时间序列事件检测的软评估指标

摘要: 时间序列事件检测方法主要通过仅关注检测准确性的标准分类指标进行评估。然而，对事件的不准确检测往往是由其前后效应在相邻检测中反映出来导致的。这些检测对于触发必要的行动或帮助减轻不受欢迎的后果是有价值的。在这种情况下，当前的指标对于事件检测的背景来说是不足和不适当的。需要一种既考虑时间概念又考虑相邻检测的时间容忍度的指标。本文介绍了SoftED指标，这是一组新的指标，旨在对事件检测方法进行软评估。它们使得可以评估检测准确性和它们的检测程度代表事件的程度。通过将事件与其代表性检测相关联，SoftED指标改进了事件检测评估，与通常的分类指标相比，在超过36%的实验中融入了时间容忍度。SoftED指标得到了领域专家的验证，他们指出这些指标对于检测评估和方法选择的贡献。

更新时间: 2024-05-29 08:56:49

领域: cs.LG

下载: http://arxiv.org/abs/2304.00439v2

ACES: Generating Diverse Programming Puzzles with with Autotelic Generative Models

The ability to invent novel and interesting problems is a remarkable feature of human intelligence that drives innovation, art, and science. We propose a method that aims to automate this process by harnessing the power of state-of-the-art generative models to produce a diversity of challenging yet solvable problems, here in the context of Python programming puzzles. Inspired by the intrinsically motivated literature, Autotelic CodE Search (ACES) jointly optimizes for the diversity and difficulty of generated problems. We represent problems in a space of LLM-generated semantic descriptors describing the programming skills required to solve them (e.g. string manipulation, dynamic programming, etc.) and measure their difficulty empirically as a linearly decreasing function of the success rate of Llama-3-70B, a state-of-the-art LLM problem solver. ACES iteratively prompts a large language model to generate difficult problems achieving a diversity of target semantic descriptors (goal-directed exploration) using previously generated problems as in-context examples. ACES generates problems that are more diverse and more challenging than problems produced by baseline methods and three times more challenging than problems found in existing Python programming benchmarks on average across 11 state-of-the-art code LLMs.

Updated: 2024-05-29 08:56:23

标题: 自发生成模型生成多样化的编程谜题ACES: Generating Diverse Programming Puzzles with with Autotelic Generative Models

摘要: 人类智慧的一个显著特征是能够创造出新颖且有趣的问题，这推动了创新、艺术和科学的发展。我们提出了一种方法，旨在通过利用最先进的生成模型的力量来自动化这一过程，以产生多样且具有挑战性但可解决的问题，这里是在Python编程谜题的背景下。受到内在动机的文献启发，Autotelic CodE Search (ACES)同时优化生成的问题的多样性和困难程度。我们将问题表示为LLM生成的描述编码空间中的语义描述符，描述解决问题所需的编程技能（例如字符串操作、动态规划等），并根据最先进的LLM问题解决器Llama-3-70B的成功率的线性递减函数来经验性地衡量其困难程度。ACES迭代地提示一个大型语言模型生成困难问题，以实现目标语义描述符的多样性（目标导向的探索），并使用先前生成的问题作为上下文示例。相对于基准方法生成的问题，ACES生成的问题更加多样化和具有挑战性，平均比现有Python编程基准测试中的问题具有三倍的挑战性，跨11个最先进的代码LLM。

更新时间: 2024-05-29 08:56:23

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2310.10692v4

Decom--CAM: Tell Me What You See, In Details! Feature-Level Interpretation via Decomposition Class Activation Map

Interpretation of deep learning remains a very challenging problem. Although the Class Activation Map (CAM) is widely used to interpret deep model predictions by highlighting object location, it fails to provide insight into the salient features used by the model to make decisions. Furthermore, existing evaluation protocols often overlook the correlation between interpretability performance and the model's decision quality, which presents a more fundamental issue. This paper proposes a new two-stage interpretability method called the Decomposition Class Activation Map (Decom-CAM), which offers a feature-level interpretation of the model's prediction. Decom-CAM decomposes intermediate activation maps into orthogonal features using singular value decomposition and generates saliency maps by integrating them. The orthogonality of features enables CAM to capture local features and can be used to pinpoint semantic components such as eyes, noses, and faces in the input image, making it more beneficial for deep model interpretation. To ensure a comprehensive comparison, we introduce a new evaluation protocol by dividing the dataset into subsets based on classification accuracy results and evaluating the interpretability performance on each subset separately. Our experiments demonstrate that the proposed Decom-CAM outperforms current state-of-the-art methods significantly by generating more precise saliency maps across all levels of classification accuracy. Combined with our feature-level interpretability approach, this paper could pave the way for a new direction for understanding the decision-making process of deep neural networks.

Updated: 2024-05-29 08:53:56

标题: Decom--CAM: 告诉我你看到了什么，详细解释！通过分解类激活图进行特征级别解释

摘要: 深度学习的解释仍然是一个非常具有挑战性的问题。虽然类激活映射（CAM）被广泛用于通过突出对象位置解释深度模型的预测，但它未能提供有关模型用于做出决策的显著特征的见解。此外，现有的评估协议经常忽视解释性能与模型决策质量之间的相关性，这提出了一个更基本的问题。本文提出了一种名为Decomposition Class Activation Map（Decom-CAM）的新的两阶段解释性方法，它提供了模型预测的特征级解释。Decom-CAM通过奇异值分解将中间激活映射分解为正交特征，并通过集成它们生成显著性映射。特征的正交性使CAM能够捕获局部特征，并可用于在输入图像中定位语义组件，如眼睛、鼻子和面部，使其更有益于深度模型解释。为了确保全面比较，我们引入了一种新的评估协议，通过根据分类准确性结果将数据集划分为子集，并分别评估每个子集的可解释性能。我们的实验证明，所提出的Decom-CAM在所有分类准确性水平上生成更精确的显著性映射，明显优于当前的最先进方法。结合我们的特征级解释方法，本文可能为理解深度神经网络的决策过程开辟一条新的方向。

更新时间: 2024-05-29 08:53:56

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2306.04644v2

Few-Shot Testing: Estimating Uncertainty of Memristive Deep Neural Networks Using One Bayesian Test Vector

The performance of deep learning algorithms such as neural networks (NNs) has increased tremendously recently, and they can achieve state-of-the-art performance in many domains. However, due to memory and computation resource constraints, implementing NNs on edge devices is a challenging task. Therefore, hardware accelerators such as computation-in-memory (CIM) with memristive devices have been developed to accelerate the most common operations, i.e., matrix-vector multiplication. However, due to inherent device properties, external environmental factors such as temperature, and an immature fabrication process, memristors suffer from various non-idealities, including defects and variations occurring during manufacturing and runtime. Consequently, there is a lack of complete confidence in the predictions made by the model. To improve confidence in NN predictions made by hardware accelerators in the presence of device non-idealities, in this paper, we propose a Bayesian test vector generation framework that can estimate the model uncertainty of NNs implemented on memristor-based CIM hardware. Compared to the conventional point estimate test vector generation method, our method is more generalizable across different model dimensions and requires storing only one test Bayesian vector in the hardware. Our method is evaluated on different model dimensions, tasks, fault rates, and variation noise to show that it can consistently achieve $100\%$ coverage with only $0.024$ MB of memory overhead.

Updated: 2024-05-29 08:53:16

标题: 少样本测试：使用一个贝叶斯测试向量估计忆阻深度神经网络的不确定性

摘要: 最近深度学习算法（如神经网络）的性能大幅提升，它们可以在许多领域实现最先进的性能。然而，由于内存和计算资源的限制，将神经网络实现在边缘设备上是一项具有挑战性的任务。因此，硬件加速器（如具有忆阻器的计算内存）已被开发用于加速最常见的操作，即矩阵-向量乘法。然而，由于固有的器件属性、外部环境因素（如温度）和不成熟的制造过程，忆阻器存在各种非理想性，包括在制造和运行时发生的缺陷和变化。因此，人们对模型所做的预测缺乏完全的信心。为了在存在器件非理想性的情况下提高硬件加速器所做神经网络预测的置信度，本文提出了一个贝叶斯测试向量生成框架，可以估计基于忆阻器的计算内存硬件上实现的神经网络模型的不确定性。与传统的点估计测试向量生成方法相比，我们的方法更具通用性，可以跨不同的模型维度，并且只需要在硬件中存储一个贝叶斯测试向量。我们的方法在不同的模型维度、任务、故障率和变异噪声上进行评估，结果显示它可以稳定地实现$100\%$的覆盖率，而仅需$0.024$ MB的内存开销。

更新时间: 2024-05-29 08:53:16

领域: cs.LG,cs.AI,cs.ET

下载: http://arxiv.org/abs/2405.18894v1

Locally Estimated Global Perturbations are Better than Local Perturbations for Federated Sharpness-aware Minimization

In federated learning (FL), the multi-step update and data heterogeneity among clients often lead to a loss landscape with sharper minima, degenerating the performance of the resulted global model. Prevalent federated approaches incorporate sharpness-aware minimization (SAM) into local training to mitigate this problem. However, the local loss landscapes may not accurately reflect the flatness of global loss landscape in heterogeneous environments; as a result, minimizing local sharpness and calculating perturbations on client data might not align the efficacy of SAM in FL with centralized training. To overcome this challenge, we propose FedLESAM, a novel algorithm that locally estimates the direction of global perturbation on client side as the difference between global models received in the previous active and current rounds. Besides the improved quality, FedLESAM also speed up federated SAM-based approaches since it only performs once backpropagation in each iteration. Theoretically, we prove a slightly tighter bound than its original FedSAM by ensuring consistent perturbation. Empirically, we conduct comprehensive experiments on four federated benchmark datasets under three partition strategies to demonstrate the superior performance and efficiency of FedLESAM.

Updated: 2024-05-29 08:46:21

标题: 局部估计的全局扰动优于局部扰动对于联邦锐度感知最小化

摘要: 在联邦学习（FL）中，多步更新和客户端之间的数据异质性经常导致具有更尖锐最小值的损失景观，从而降低了生成的全局模型的性能。流行的联邦方法将锐度感知最小化（SAM）纳入本地训练以减轻这个问题。然而，在异构环境中，本地损失景观可能无法准确反映全局损失景观的平坦性；因此，最小化本地锐度并计算客户端数据上的扰动可能无法使FL中SAM的效力与集中式训练保持一致。为了克服这一挑战，我们提出了FedLESAM，一种新颖的算法，它在客户端侧局部估计全局扰动的方向，其差异为在前一轮和当前轮次收到的全局模型之间的差异。除了提高质量外，FedLESAM还加快了基于联邦SAM的方法，因为它每次迭代只执行一次反向传播。从理论上讲，我们通过确保一致的扰动，证明了比其原始FedSAM更紧密的界限。在实证方面，我们在三种分区策略下对四个联邦基准数据集进行了全面实验，以展示FedLESAM的卓越性能和效率。

更新时间: 2024-05-29 08:46:21

领域: cs.LG,cs.DC

下载: http://arxiv.org/abs/2405.18890v1

On Perception of Prevalence of Cheating and Usage of Generative AI

This report investigates the perceptions of teaching staff on the prevalence of student cheating and the impact of Generative AI on academic integrity. Data was collected via an anonymous survey of teachers at the Department of Information Technology at Uppsala University and analyzed alongside institutional statistics on cheating investigations from 2004 to 2023. The results indicate that while teachers generally do not view cheating as highly prevalent, there is a strong belief that its incidence is increasing, potentially due to the accessibility of Generative AI. Most teachers do not equate AI usage with cheating but acknowledge its widespread use among students. Furthermore, teachers' perceptions align with objective data on cheating trends, highlighting their awareness of the evolving landscape of academic dishonesty.

Updated: 2024-05-29 08:46:00

标题: 关于作弊普遍程度认知和生成式人工智能使用的研究

摘要: 这份报告调查了教学人员对学生作弊普遍程度和生成式人工智能对学术诚信的影响的看法。通过对于于乌普萨拉大学信息技术系教师进行匿名调查收集数据，并结合2004年至2023年作弊调查的机构统计进行分析。结果显示，尽管教师通常认为作弊并不普遍，但他们普遍认为其发生率正在增加，可能是由于生成式人工智能的可访问性。大多数教师并不将人工智能使用与作弊等同起来，但承认学生中广泛使用。此外，教师的看法与作弊趋势的客观数据相一致，突显了他们对学术不诚实不断演变的认识。

更新时间: 2024-05-29 08:46:00

领域: cs.CY,cs.AI

下载: http://arxiv.org/abs/2405.18889v1

An approach to improve agent learning via guaranteeing goal reaching in all episodes

Reinforcement learning is commonly concerned with problems of maximizing accumulated rewards in Markov decision processes. Oftentimes, a certain goal state or a subset of the state space attain maximal reward. In such a case, the environment may be considered solved when the goal is reached. Whereas numerous techniques, learning or non-learning based, exist for solving environments, doing so optimally is the biggest challenge. Say, one may choose a reward rate which penalizes the action effort. Reinforcement learning is currently among the most actively developed frameworks for solving environments optimally by virtue of maximizing accumulated reward, in other words, returns. Yet, tuning agents is a notoriously hard task as reported in a series of works. Our aim here is to help the agent learn a near-optimal policy efficiently while ensuring a goal reaching property of some basis policy that merely solves the environment. We suggest an algorithm, which is fairly flexible, and can be used to augment practically any agent as long as it comprises of a critic. A formal proof of a goal reaching property is provided. Simulation experiments on six problems under five agents, including the benchmarked one, provided an empirical evidence that the learning can indeed be boosted while ensuring goal reaching property.

Updated: 2024-05-29 08:45:28

标题: 一种通过确保在所有阶段达到目标来改善代理学习的方法

摘要: 强化学习通常涉及最大化马尔可夫决策过程中积累奖励的问题。通常情况下，某个目标状态或状态空间的子集会获得最大奖励。在这种情况下，当达到目标时，环境可以被认为是解决的。虽然存在许多解决环境的技术，无论是基于学习还是非学习的，但最优解决环境是最大的挑战。比如，可以选择一个惩罚行动努力的奖励率。强化学习目前是最活跃发展的框架之一，通过最大化累积奖励来最优解决环境，换句话说，回报。然而，调整代理是一个非常困难的任务，正如一系列作品所报道的那样。我们的目标是帮助代理有效地学习接近最优策略，同时确保某些基本策略达到目标的性质，仅仅解决环境。我们提出了一种算法，相当灵活，并且可以用来增强几乎任何代理，只要它包含一个评论家。提供了一个目标达成性质的正式证明。在包括基准代理在内的五个代理下进行了六个问题的模拟实验，提供了一个经验证据，即学习确实可以被提升，同时确保目标达成性质。

更新时间: 2024-05-29 08:45:28

领域: cs.AI,cs.SY,eess.SY,math.DS

下载: http://arxiv.org/abs/2405.18118v2

Proactive Load-Shaping Strategies with Privacy-Cost Trade-offs in Residential Households based on Deep Reinforcement Learning

Smart meters play a crucial role in enhancing energy management and efficiency, but they raise significant privacy concerns by potentially revealing detailed user behaviors through energy consumption patterns. Recent scholarly efforts have focused on developing battery-aided load-shaping techniques to protect user privacy while balancing costs. This paper proposes a novel deep reinforcement learning-based load-shaping algorithm (PLS-DQN) designed to protect user privacy by proactively creating artificial load signatures that mislead potential attackers. We evaluate our proposed algorithm against a non-intrusive load monitoring (NILM) adversary. The results demonstrate that our approach not only effectively conceals real energy usage patterns but also outperforms state-of-the-art methods in enhancing user privacy while maintaining cost efficiency.

Updated: 2024-05-29 08:45:04

标题: 基于深度强化学习的居住户中隐私成本权衡的主动负载塑形策略

摘要: 智能电表在提升能源管理和效率方面发挥着至关重要的作用，但它们可能通过能源消耗模式潜在地揭示详细的用户行为，从而引发重大的隐私问题。最近的学术研究努力致力于开发基于电池辅助负荷整形技术，以保护用户隐私同时平衡成本。本文提出了一种新颖的基于深度强化学习的负荷整形算法（PLS-DQN），旨在通过主动创建误导潜在攻击者的人工负荷签名来保护用户隐私。我们针对非侵入性负荷监测（NILM）对手评估了我们提出的算法。结果表明，我们的方法不仅有效地隐藏了真实的能源使用模式，而且在提升用户隐私的同时保持了成本效率，优于当前的最先进方法。

更新时间: 2024-05-29 08:45:04

领域: eess.SY,cs.LG,cs.SY

下载: http://arxiv.org/abs/2405.18888v1

Bayesian Neural Networks: A Min-Max Game Framework

This paper is a preliminary study of the robustness and noise analysis of deep neural networks via a game theory formulation Bayesian Neural Networks (BNN) and the maximal coding rate distortion loss. BNN has been shown to provide some robustness to deep learning, and the minimax method used to be a natural conservative way to assist the Bayesian method. Inspired by the recent closed-loop transcription neural network, we formulate the BNN via game theory between the deterministic neural network $f$ and the sampling network $f + \xi$ or $f + r*\xi$. Compared with previous BNN, BNN via game theory learns a solution space within a certain gap between the center $f$ and the sampling point $f + r*\xi$, and is a conservative choice with a meaningful prior setting compared with previous BNN. Furthermore, the minimum points between $f$ and $f + r*\xi$ become stable when the subspace dimension is large enough with a well-trained model $f$. With these, the model $f$ can have a high chance of recognizing the out-of-distribution data or noise data in the subspace rather than the prediction level, even if $f$ is in online training after a few iterations of true data. So far, our experiments are limited to MNIST and Fashion MNIST data sets, more experiments with realistic data sets and complicated neural network models should be implemented to validate the above arguments.

Updated: 2024-05-29 08:43:20

标题: 贝叶斯神经网络：最小最大博弈框架

摘要: 这篇论文是关于通过博弈论制定贝叶斯神经网络（BNN）和最大编码速率失真损失的鲁棒性和噪声分析的初步研究。已经证明BNN在深度学习中提供了一定的鲁棒性，并且最小最大方法被用作辅助贝叶斯方法的自然保守方式。受最近闭环转录神经网络的启发，我们通过确定性神经网络$f$和采样网络$f + \xi$或$f + r*\xi$之间的博弈论制定了BNN。与先前的BNN相比，通过博弈论制定的BNN在中心$f$和采样点$f + r*\xi$之间学习了一个解空间，并且与先前的BNN相比具有具有意义的先验设置的保守选择。此外，当子空间维度足够大且经过充分训练的模型$f$时，$f$和$f + r*\xi$之间的最小点变得稳定。通过这些，即使$f$在几次真实数据迭代后在线训练，模型$f$也有很高的机会在子空间中识别出分布之外的数据或噪声数据而不是在预测级别上。到目前为止，我们的实验仅限于MNIST和Fashion MNIST数据集，应该实施更多具有现实数据集和复杂神经网络模型的实验来验证上述论点。

更新时间: 2024-05-29 08:43:20

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2311.11126v2

Compressing Large Language Models using Low Rank and Low Precision Decomposition

The prohibitive sizes of Large Language Models (LLMs) today make it difficult to deploy them on memory-constrained edge devices. This work introduces $\rm CALDERA$ -- a new post-training LLM compression algorithm that harnesses the inherent low-rank structure of a weight matrix $\mathbf{W}$ by approximating it via a low-rank, low-precision decomposition as $\mathbf{W} \approx \mathbf{Q} + \mathbf{L}\mathbf{R}$. Here, $\mathbf{L}$ and $\mathbf{R}$ are low rank factors, and the entries of $\mathbf{Q}$, $\mathbf{L}$ and $\mathbf{R}$ are quantized. The model is compressed by substituting each layer with its $\mathbf{Q} + \mathbf{L}\mathbf{R}$ decomposition, and the zero-shot performance of the compressed model is evaluated. Additionally, $\mathbf{L}$ and $\mathbf{R}$ are readily amenable to low-rank adaptation, consequently enhancing the zero-shot performance. $\rm CALDERA$ obtains this decomposition by formulating it as an optimization problem $\min_{\mathbf{Q},\mathbf{L},\mathbf{R}}\lVert(\mathbf{Q} + \mathbf{L}\mathbf{R} - \mathbf{W})\mathbf{X}^\top\rVert_{\rm F}^2$, where $\mathbf{X}$ is the calibration data, and $\mathbf{Q}, \mathbf{L}, \mathbf{R}$ are constrained to be representable using low-precision formats. Theoretical upper bounds on the approximation error of $\rm CALDERA$ are established using a rank-constrained regression framework, and the tradeoff between compression ratio and model performance is studied by analyzing the impact of target rank and quantization bit budget. Results illustrate that compressing LlaMa-$2$ $7$B/$70$B and LlaMa-$3$ $8$B models obtained using $\rm CALDERA$ outperforms existing post-training LLM compression techniques in the regime of less than $2.5$ bits per parameter. The implementation is available at: \href{https://github.com/pilancilab/caldera}{https://github.com/pilancilab/caldera}.

Updated: 2024-05-29 08:42:30

标题: 使用低秩和低精度分解压缩大型语言模型

摘要: 目前大型语言模型（LLMs）的尺寸过大，使其难以部署在内存受限的边缘设备上。本文介绍了一种新的后训练LLM压缩算法$\rm CALDERA$，通过近似权重矩阵$\mathbf{W}$的固有低秩结构，将其表示为$\mathbf{W} \approx \mathbf{Q} + \mathbf{L}\mathbf{R}$。这里，$\mathbf{L}$和$\mathbf{R}$是低秩因子，$\mathbf{Q}$，$\mathbf{L}$和$\mathbf{R}$的元素被量化。通过用$\mathbf{Q} + \mathbf{L}\mathbf{R}$分解替换每一层来压缩模型，并评估压缩模型的零-shot性能。此外，$\mathbf{L}$和$\mathbf{R}$方便地适应于低秩，从而增强了零-shot性能。$\rm CALDERA$通过将其构造为一个优化问题$\min_{\mathbf{Q},\mathbf{L},\mathbf{R}}\lVert(\mathbf{Q} + \mathbf{L}\mathbf{R} - \mathbf{W})\mathbf{X}^\top\rVert_{\rm F}^2$来获得这种分解，其中$\mathbf{X}$是校准数据，$\mathbf{Q}$，$\mathbf{L}$，$\mathbf{R}$受到表示使用低精度格式的约束。使用基于秩约束回归框架建立了$\rm CALDERA$的近似误差的理论上限，并通过分析目标秩和量化比特预算的影响来研究压缩比和模型性能之间的折衷。结果表明，使用$\rm CALDERA$获得的压缩的LlaMa-$2$ $7$B/$70$B和LlaMa-$3$ $8$B模型在每参数少于$2.5$比特的范围内优于现有的后训练LLM压缩技术。该实现可在以下链接找到：\href{https://github.com/pilancilab/caldera}{https://github.com/pilancilab/caldera}。

更新时间: 2024-05-29 08:42:30

领域: cs.LG,cs.AI,math.OC,stat.ML

下载: http://arxiv.org/abs/2405.18886v1

Tuning-Free Alignment of Diffusion Models with Direct Noise Optimization

In this work, we focus on the alignment problem of diffusion models with a continuous reward function, which represents specific objectives for downstream tasks, such as improving human preference. The central goal of the alignment problem is to adjust the distribution learned by diffusion models such that the generated samples maximize the target reward function. We propose a novel alignment approach, named Direct Noise Optimization (DNO), that optimizes the injected noise during the sampling process of diffusion models. By design, DNO is tuning-free and prompt-agnostic, as the alignment occurs in an online fashion during generation. We rigorously study the theoretical properties of DNO and also propose variants to deal with non-differentiable reward functions. Furthermore, we identify that naive implementation of DNO occasionally suffers from the out-of-distribution reward hacking problem, where optimized samples have high rewards but are no longer in the support of the pretrained distribution. To remedy this issue, we leverage classical high-dimensional statistics theory and propose to augment the DNO loss with certain probability regularization. We conduct extensive experiments on several popular reward functions trained on human feedback data and demonstrate that the proposed DNO approach achieves state-of-the-art reward scores as well as high image quality, all within a reasonable time budget for generation.

Updated: 2024-05-29 08:39:39

标题: 无调节扩散模型的直接噪声优化对准

摘要: 在这项工作中，我们关注扩散模型与连续奖励函数的对齐问题，该奖励函数代表了下游任务的特定目标，例如改善人类偏好。对齐问题的中心目标是调整扩散模型学习的分布，使生成的样本最大化目标奖励函数。我们提出了一种新颖的对齐方法，名为直接噪声优化（DNO），该方法优化了扩散模型采样过程中注入的噪声。通过设计，DNO是无需调整且与提示无关的，因为对齐是在生成过程中以在线方式发生的。我们严格研究了DNO的理论性质，并提出了处理不可微分奖励函数的变体。此外，我们发现DNO的朴素实现偶尔会遇到超出分布奖励操作问题，即优化样本具有高奖励，但不再在预训练分布的支持范围内。为了解决这个问题，我们利用经典的高维统计理论，并提出将DNO损失与某种概率正则化相结合。我们在几个受欢迎的以人类反馈数据训练的奖励函数上进行了大量实验，并证明了所提出的DNO方法在合理的生成时间预算内实现了最先进的奖励分数以及高质量图像。

更新时间: 2024-05-29 08:39:39

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.18881v1

Spatiotemporal Forecasting Meets Efficiency: Causal Graph Process Neural Networks

Graph Neural Networks (GNNs) have advanced spatiotemporal forecasting by leveraging relational inductive biases among sensors (or any other measuring scheme) represented as nodes in a graph. However, current methods often rely on Recurrent Neural Networks (RNNs), leading to increased runtimes and memory use. Moreover, these methods typically operate within 1-hop neighborhoods, exacerbating the reduction of the receptive field. Causal Graph Processes (CGPs) offer an alternative, using graph filters instead of MLP layers to reduce parameters and minimize memory consumption. This paper introduces the Causal Graph Process Neural Network (CGProNet), a non-linear model combining CGPs and GNNs for spatiotemporal forecasting. CGProNet employs higher-order graph filters, optimizing the model with fewer parameters, reducing memory usage, and improving runtime efficiency. We present a comprehensive theoretical and experimental stability analysis, highlighting key aspects of CGProNet. Experiments on synthetic and real data demonstrate CGProNet's superior efficiency, minimizing memory and time requirements while maintaining competitive forecasting performance.

Updated: 2024-05-29 08:37:48

标题: 时空预测遇上效率：因果图过程神经网络

摘要: 图神经网络（GNNs）通过利用传感器之间的关系归纳偏差，将传感器（或任何其他测量方案）表示为图中的节点，推动了时空预测的发展。然而，当前的方法通常依赖于循环神经网络（RNNs），导致运行时间和内存使用增加。此外，这些方法通常在1跳邻域内运行，加剧了感受野的缩小。因果图过程（CGPs）提供了一种替代方案，使用图滤波器代替MLP层来减少参数并最小化内存消耗。本文介绍了因果图过程神经网络（CGProNet），这是一个非线性模型，结合了CGPs和GNNs用于时空预测。CGProNet采用高阶图滤波器，通过更少的参数优化模型，减少内存使用量，提高运行效率。我们提出了全面的理论和实验稳定性分析，突出了CGProNet的关键方面。对合成和真实数据的实验表明，CGProNet具有卓越的效率，减少内存和时间需求，同时保持竞争力的预测性能。

更新时间: 2024-05-29 08:37:48

领域: cs.LG

下载: http://arxiv.org/abs/2405.18879v1

Privacy Preserving Data Imputation via Multi-party Computation for Medical Applications

Handling missing data is crucial in machine learning, but many datasets contain gaps due to errors or non-response. Unlike traditional methods such as listwise deletion, which are simple but inadequate, the literature offers more sophisticated and effective methods, thereby improving sample size and accuracy. However, these methods require accessing the whole dataset, which contradicts the privacy regulations when the data is distributed among multiple sources. Especially in the medical and healthcare domain, such access reveals sensitive information about patients. This study addresses privacy-preserving imputation methods for sensitive data using secure multi-party computation, enabling secure computations without revealing any party's sensitive information. In this study, we realized the mean, median, regression, and kNN imputation methods in a privacy-preserving way. We specifically target the medical and healthcare domains considering the significance of protection of the patient data, showcasing our methods on a diabetes dataset. Experiments on the diabetes dataset validated the correctness of our privacy-preserving imputation methods, yielding the largest error around $3 \times 10^{-3}$, closely matching plaintext methods. We also analyzed the scalability of our methods to varying numbers of samples, showing their applicability to real-world healthcare problems. Our analysis demonstrated that all our methods scale linearly with the number of samples. Except for kNN, the runtime of all our methods indicates that they can be utilized for large datasets.

Updated: 2024-05-29 08:36:42

标题: 隐私保护数据缺失值填补在医疗应用中的多方计算

摘要: 处理缺失数据在机器学习中至关重要，但许多数据集中存在由于错误或非响应而导致的缺失。与传统的方法如整体删除相比，虽然简单但不足够，文献提供了更复杂和有效的方法，从而提高了样本大小和准确性。然而，这些方法需要访问整个数据集，这与数据分布在多个来源之间时的隐私规定相矛盾。特别是在医疗和健康领域，这种访问会泄露有关患者的敏感信息。本研究针对使用安全多方计算实现隐私保护的敏感数据插补方法，实现安全计算而不泄露任何一方的敏感信息。在本研究中，我们以隐私保护的方式实现了均值、中位数、回归和kNN插补方法。我们特别针对医疗和健康领域，考虑到保护患者数据的重要性，展示了我们的方法在糖尿病数据集上的应用。对糖尿病数据集的实验验证了我们的隐私保护插补方法的正确性，产生了最大误差约为$3 \times 10^{-3}$，与明文方法接近。我们还分析了我们的方法在不同样本数量下的可扩展性，展示了它们对真实世界医疗问题的适用性。我们的分析表明，我们的所有方法都随着样本数量呈线性缩放。除了kNN，所有我们的方法的运行时间表明它们可以用于大型数据集。

更新时间: 2024-05-29 08:36:42

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2405.18878v1

Continuous Product Graph Neural Networks

Processing multidomain data defined on multiple graphs holds significant potential in various practical applications in computer science. However, current methods are mostly limited to discrete graph filtering operations. Tensorial partial differential equations on graphs (TPDEGs) provide a principled framework for modeling structured data across multiple interacting graphs, addressing the limitations of the existing discrete methodologies. In this paper, we introduce Continuous Product Graph Neural Networks (CITRUS) that emerge as a natural solution to the TPDEG. CITRUS leverages the separability of continuous heat kernels from Cartesian graph products to efficiently implement graph spectral decomposition. We conduct thorough theoretical analyses of the stability and over-smoothing properties of CITRUS in response to domain-specific graph perturbations and graph spectra effects on the performance. We evaluate CITRUS on well-known traffic and weather spatiotemporal forecasting datasets, demonstrating superior performance over existing approaches.

Updated: 2024-05-29 08:36:09

标题: 连续产品图神经网络

摘要: 在计算机科学的各种实际应用中，处理定义在多个图上的多领域数据具有重要潜力。然而，目前的方法大多局限于离散图滤波操作。图上的张量偏微分方程（TPDEGs）为跨多个相互作用的图建模结构化数据提供了一个有原则的框架，解决了现有离散方法的局限性。在本文中，我们介绍了连续乘积图神经网络（CITRUS），它作为TPDEG的自然解决方案。CITRUS利用连续热核与笛卡尔图乘积的可分性，有效地实现了图谱分解。我们对CITRUS的稳定性和过度平滑性质进行了彻底的理论分析，以应对特定领域图扰动和图谱效果对性能的影响。我们在知名的交通和天气时空预测数据集上评估了CITRUS，表明其优于现有方法的性能。

更新时间: 2024-05-29 08:36:09

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.18877v1

On Fairness Concerns in the Blockchain Ecosystem

Blockchains revolutionized centralized sectors like banking and finance by promoting decentralization and transparency. In a blockchain, information is transmitted through transactions issued by participants or applications. Miners crucially select, order, and validate pending transactions for block inclusion, prioritizing those with higher incentives or fees. The order in which transactions are included can impact the blockchain final state. Moreover, applications running on top of a blockchain often rely on governance protocols to decentralize the decision-making power to make changes to their core functionality. These changes can affect how participants interact with these applications. Since one token equals one vote, participants holding multiple tokens have a higher voting power to support or reject the proposed changes. The extent to which this voting power is distributed is questionable and if highly concentrated among a few holders can lead to governance attacks. In this thesis, we audit the Bitcoin and Ethereum blockchains to investigate the norms followed by miners in determining the transaction prioritization. We also audit decentralized governance protocols such as Compound to evaluate whether the voting power is fairly distributed among the participants. Our findings have significant implications for future developments of blockchains and decentralized applications.

Updated: 2024-05-29 08:35:37

标题: 关于区块链生态系统中的公平性关注

摘要: 区块链通过推动去中心化和透明化，彻底改变了像银行和金融这样的中心化领域。在区块链中，信息通过参与者或应用程序发出的交易传输。矿工关键地选择、排序和验证待定交易以包含在区块中，优先考虑那些具有更高激励或费用的交易。包含交易的顺序可能影响区块链的最终状态。此外，在区块链上运行的应用程序通常依赖治理协议，将决策权分散到对核心功能进行更改。这些更改可能影响参与者与这些应用程序的互动方式。由于一个代币等于一票，持有多个代币的参与者具有更高的投票权来支持或拒绝提出的更改。这种投票权的分配程度存疑，如果高度集中在少数持有者之间，可能导致治理攻击。在本论文中，我们对比特币和以太坊区块链进行审计，以调查矿工在确定交易优先级方面遵循的规范。我们还审计像Compound这样的去中心化治理协议，评估投票权是否公平分配给参与者。我们的发现对未来区块链和去中心化应用程序的发展具有重要影响。

更新时间: 2024-05-29 08:35:37

领域: cs.CR

下载: http://arxiv.org/abs/2405.18876v1

Counterfactual Metarules for Local and Global Recourse

We introduce T-CREx, a novel model-agnostic method for local and global counterfactual explanation (CE), which summarises recourse options for both individuals and groups in the form of human-readable rules. It leverages tree-based surrogate models to learn the counterfactual rules, alongside 'metarules' denoting their regions of optimality, providing both a global analysis of model behaviour and diverse recourse options for users. Experiments indicate that T-CREx achieves superior aggregate performance over existing rule-based baselines on a range of CE desiderata, while being orders of magnitude faster to run.

Updated: 2024-05-29 08:35:17

标题: 反事实元规则用于本地和全局可递归性

摘要: 我们引入了T-CREx，这是一种新颖的模型无关方法，用于本地和全局反事实解释（CE），它以人类可读规则的形式总结了个人和群体的补救选项。它利用基于树的替代模型学习反事实规则，同时使用“元规则”表示其最优区域，为用户提供模型行为的全局分析和多样的补救选项。实验证明，T-CREx在一系列CE期望上实现了优越的综合性能，同时运行速度快了数个数量级。

更新时间: 2024-05-29 08:35:17

领域: cs.AI

下载: http://arxiv.org/abs/2405.18875v1

Physics-based material parameters extraction from perovskite experiments via Bayesian optimization

The ability to extract material parameters of perovskite from quantitative experimental analysis is essential for rational design of photovoltaic and optoelectronic applications. However, the difficulty of this analysis increases significantly with the complexity of the theoretical model and the number of material parameters for perovskite. Here we use Bayesian optimization to develop an analysis platform that can extract up to 8 fundamental material parameters of an organometallic perovskite semiconductor from a transient photoluminescence experiment, based on a complex full physics model that includes drift-diffusion of carriers and dynamic defect occupation. An example study of thermal degradation reveals that the carrier mobility and trap-assisted recombination coefficient are reduced noticeably, while the defect energy level remains nearly unchanged. The reduced carrier mobility can dominate the overall effect on thermal degradation of perovskite solar cells by reducing the fill factor, despite the opposite effect of the reduced trap-assisted recombination coefficient on increasing the fill factor. In future, this platform can be conveniently applied to other experiments or to combinations of experiments, accelerating materials discovery and optimization of semiconductor materials for photovoltaics and other applications.

Updated: 2024-05-29 08:33:14

标题: 基于贝叶斯优化的钙钛矿实验中物质参数的提取

摘要: 从定量实验分析提取钙钛矿材料参数的能力对于光伏和光电应用的合理设计至关重要。然而，随着理论模型的复杂性和钙钛矿材料参数数量的增加，这种分析的难度显著增加。在这里，我们使用贝叶斯优化来开发一个分析平台，该平台可以从瞬态光致发光实验中提取有机金属钙钛矿半导体的最多8个基本材料参数，基于一个包括载流子漂移扩散和动态缺陷占据的复杂物理模型。热降解的示例研究表明，载流子迁移率和陷阱辅助复合系数明显降低，而缺陷能级几乎保持不变。降低的载流子迁移率可以通过降低填充因子主导钙钛矿太阳能电池的热降解的整体效应，尽管降低的陷阱辅助复合系数对增加填充因子产生相反效果。未来，这个平台可以方便地应用于其他实验或实验组合，加速材料发现和半导体材料优化，以用于光伏和其他应用。

更新时间: 2024-05-29 08:33:14

领域: cond-mat.mtrl-sci,cs.CE,cs.LG

下载: http://arxiv.org/abs/2402.11101v4

DFAMiner: Mining minimal separating DFAs from labelled samples

We propose DFAMiner, a passive learning tool for learning minimal separating deterministic finite automata (DFA) from a set of labelled samples. Separating automata are an interesting class of automata that occurs generally in regular model checking and has raised interest in foundational questions of parity game solving. We first propose a simple and linear-time algorithm that incrementally constructs a three-valued DFA (3DFA) from a set of labelled samples given in the usual lexicographical order. This 3DFA has accepting and rejecting states as well as don't-care states, so that it can exactly recognise the labelled examples. We then apply our tool to mining a minimal separating DFA for the labelled samples by minimising the constructed automata via a reduction to solving SAT problems. Empirical evaluation shows that our tool outperforms current state-of-the-art tools significantly on standard benchmarks for learning minimal separating DFAs from samples. Progress in the efficient construction of separating DFAs can also lead to finding the lower bound of parity game solving, where we show that DFAMiner can create optimal separating automata for simple languages with up to 7 colours. Future improvements might offer inroads to better data structures.

Updated: 2024-05-29 08:31:34

标题: DFAMiner：从标记样本中挖掘最小分离的确定有限自动机

摘要: 我们提出了DFAMiner，这是一个用于从一组标记样本中学习最小分隔确定性有限自动机（DFA）的被动学习工具。分隔自动机是一类有趣的自动机，在正则模型检测中通常出现，并且引起了对奇偶游戏求解基础问题的兴趣。我们首先提出了一个简单且线性时间的算法，逐步构建了一个三值DFA（3DFA），从一组按照通常词典顺序给出的标记样本中。这个3DFA具有接受和拒绝状态以及不关心状态，因此可以精确识别标记的示例。然后，我们将我们的工具应用于通过将构建的自动机最小化来挖掘最小分隔DFA。通过将问题简化为求解SAT问题。实证评估表明，我们的工具在学习最小分隔DFA的标准基准测试中明显优于当前的最先进工具。分隔DFA的高效构建进展也可以导致找到奇偶游戏求解的下限，我们展示了DFAMiner可以为具有多达7种颜色的简单语言创建最优分隔自动机。未来的改进可能会为更好的数据结构打开大门。

更新时间: 2024-05-29 08:31:34

领域: cs.FL,cs.LG,F.4.3; I.2.6

下载: http://arxiv.org/abs/2405.18871v1

LLMs achieve adult human performance on higher-order theory of mind tasks

This paper examines the extent to which large language models (LLMs) have developed higher-order theory of mind (ToM); the human ability to reason about multiple mental and emotional states in a recursive manner (e.g. I think that you believe that she knows). This paper builds on prior work by introducing a handwritten test suite -- Multi-Order Theory of Mind Q&A -- and using it to compare the performance of five LLMs to a newly gathered adult human benchmark. We find that GPT-4 and Flan-PaLM reach adult-level and near adult-level performance on ToM tasks overall, and that GPT-4 exceeds adult performance on 6th order inferences. Our results suggest that there is an interplay between model size and finetuning for the realisation of ToM abilities, and that the best-performing LLMs have developed a generalised capacity for ToM. Given the role that higher-order ToM plays in a wide range of cooperative and competitive human behaviours, these findings have significant implications for user-facing LLM applications.

Updated: 2024-05-29 08:31:16

标题: LLMs 在高阶心理理论任务上达到成年人类的表现

摘要: 本文研究了大型语言模型（LLMs）在发展高阶心灵理论（ToM）方面的程度；即人类理解多个心理和情绪状态的能力，并以递归方式推理（如我认为你相信她知道）。本文在先前研究的基础上引入了一个手写测试套件--多阶心灵理论问答--并将其用于比较五个LLMs的表现与新收集的成年人基准。我们发现GPT-4和Flan-PaLM在整体ToM任务上达到了成年水平和接近成年水平的表现，而GPT-4在第六阶推理上超过了成年人的表现。我们的结果表明，模型大小和微调之间存在相互作用，以实现ToM能力的实现，并且表现最佳的LLMs已经发展出了一种泛化的ToM能力。鉴于高阶ToM在广泛的合作和竞争人类行为中起着重要作用，这些发现对面向用户的LLM应用具有重要意义。

更新时间: 2024-05-29 08:31:16

领域: cs.AI,cs.CL,cs.HC

下载: http://arxiv.org/abs/2405.18870v1

Towards Data-Driven Electricity Management: Multi-Region Harmonized Data and Knowledge Graph

Due to growing population and technological advances, global electricity consumption, and consequently also CO2 emissions are increasing. The residential sector makes up 25% of global electricity consumption and has great potential to increase efficiency and reduce CO2 footprint without sacrificing comfort. However, a lack of uniform consumption data at the household level spanning multiple regions hinders large-scale studies and robust multi-region model development. This paper introduces a multi-region dataset compiled from publicly available sources and presented in a uniform format. This data enables machine learning tasks such as disaggregation, demand forecasting, appliance ON/OFF classification, etc. Furthermore, we develop an RDF knowledge graph that characterizes the electricity consumption of the households and contextualizes it with household related properties enabling semantic queries and interoperability with other open knowledge bases like Wikidata and DBpedia. This structured data can be utilized to inform various stakeholders towards data-driven policy and business development.

Updated: 2024-05-29 08:30:34

标题: 走向数据驱动的电力管理：多区域协调数据和知识图谱

摘要: 由于人口增长和技术进步，全球电力消耗以及二氧化碳排放量也在增加。居民部门占全球电力消耗的25％，具有巨大潜力提高效率，减少二氧化碳排放量，而不损害舒适度。然而，在涵盖多个地区的家庭层面缺乏统一的消费数据阻碍了大规模研究和强大的多区域模型的发展。本文介绍了一个从公开来源编制的多区域数据集，并以统一格式呈现。这些数据可以支持机器学习任务，如分解、需求预测、电器开关分类等。此外，我们开发了一个RDF知识图，对家庭的电力消耗进行特征化，并将其与家庭相关属性联系起来，实现语义查询和与其他开放知识库（如Wikidata和DBpedia）的互操作性。这种结构化数据可以用于向各方利益相关者提供数据驱动的政策和业务发展建议。

更新时间: 2024-05-29 08:30:34

领域: cs.LG

下载: http://arxiv.org/abs/2405.18869v1

Topological Perspectives on Optimal Multimodal Embedding Spaces

Recent strides in multimodal model development have ignited a paradigm shift in the realm of text-to-image generation. Among these advancements, CLIP stands out as a remarkable achievement which is a sophisticated autoencoder adept at encoding both textual and visual information within a unified latent space. This paper delves into a comparative analysis between CLIP and its recent counterpart, CLOOB. To unravel the intricate distinctions within the embedding spaces crafted by these models, we employ topological data analysis. Our approach encompasses a comprehensive examination of the modality gap drivers, the clustering structures existing across both high and low dimensions, and the pivotal role that dimension collapse plays in shaping their respective embedding spaces. Empirical experiments substantiate the implications of our analyses on downstream performance across various contextual scenarios. Through this investigation, we aim to shed light on the nuanced intricacies that underlie the comparative efficacy of CLIP and CLOOB, offering insights into their respective strengths and weaknesses, and providing a foundation for further refinement and advancement in multimodal model research.

Updated: 2024-05-29 08:28:23

标题: 拓扑视角下的最优多模态嵌入空间

摘要: 最近在多模态模型开发方面取得的进展引发了文本到图像生成领域的范式转变。在这些进步中，CLIP凸显出作为一个卓越的成就，它是一个精密的自动编码器，能够在统一的潜在空间内编码文本和视觉信息。本文深入探讨了CLIP和其最近的对手CLOOB之间的比较分析。为了揭示这些模型所构建的嵌入空间内部的复杂差异，我们采用了拓扑数据分析。我们的方法涵盖了对模态差距驱动因素、存在于高维和低维空间中的聚类结构以及维度崩溃在塑造它们各自嵌入空间中的关键作用的全面检查。实证实验证实了我们分析对各种背景情景下下游性能的影响。通过这项研究，我们旨在揭示CLIP和CLOOB比较效力背后微妙的复杂性，提供关于它们各自优势和劣势的见解，并为多模态模型研究的进一步完善和发展奠定基础。

更新时间: 2024-05-29 08:28:23

领域: cs.AI,68T05

下载: http://arxiv.org/abs/2405.18867v1

SwapMoE: Serving Off-the-shelf MoE-based Large Language Models with Tunable Memory Budget

Mixture of experts (MoE) is a popular technique to improve capacity of Large Language Models (LLMs) with conditionally-activated parallel experts. However, serving MoE models on memory-constrained devices is challenging due to the large parameter size. Typical solutions such as memory swapping or expert pruning may lead to significantly higher latency or severe accuracy loss. In this paper, we introduce SwapMoE, a framework for efficient serving of MoE-based large language models with tunable memory budgets. The main idea of SwapMoE is to keep a small dynamic set of important experts, namely Virtual Experts, in the main memory for inference, while seamlessly maintaining how the Virtual Experts map to the actual experts. Experiments have shown that SwapMoE can reduce the memory footprint while maintaining reasonable accuracy. For example, on text summarization tasks with Switch Transformer, SwapMoE can reduce the memory consumption from 14.2 GiB to 4.7 GiB, together with 50\% latency reduction and a slight Rouge-2 score drop of 0.041.

Updated: 2024-05-29 08:25:03

标题: SwapMoE：使用可调内存预算为现成MoE-based大型语言模型提供服务

摘要: 混合专家（MoE）是一种流行的技术，用于提高大型语言模型（LLMs）的容量，其具有条件激活的并行专家。然而，在内存受限设备上提供MoE模型是具有挑战性的，因为参数规模很大。典型的解决方案，如内存交换或专家修剪，可能导致显著更高的延迟或严重的准确性损失。在本文中，我们介绍了SwapMoE，这是一个用于高效提供基于MoE的大型语言模型的框架，可以调整内存预算。SwapMoE的主要思想是在推断时保留一小组重要的专家，即虚拟专家，以保持虚拟专家与实际专家的映射方式。实验证明，SwapMoE可以减少内存占用量同时保持合理的准确性。例如，在使用Switch Transformer进行文本摘要任务时，SwapMoE可以将内存消耗从14.2 GiB减少到4.7 GiB，同时减少50\%的延迟，Rouge-2分数略微下降0.041。

更新时间: 2024-05-29 08:25:03

领域: cs.AI

下载: http://arxiv.org/abs/2308.15030v4

Domain-Inspired Sharpness-Aware Minimization Under Domain Shifts

This paper presents a Domain-Inspired Sharpness-Aware Minimization (DISAM) algorithm for optimization under domain shifts. It is motivated by the inconsistent convergence degree of SAM across different domains, which induces optimization bias towards certain domains and thus impairs the overall convergence. To address this issue, we consider the domain-level convergence consistency in the sharpness estimation to prevent the overwhelming (deficient) perturbations for less (well) optimized domains. Specifically, DISAM introduces the constraint of minimizing variance in the domain loss, which allows the elastic gradient calibration in perturbation generation: when one domain is optimized above the averaging level \textit{w.r.t.} loss, the gradient perturbation towards that domain will be weakened automatically, and vice versa. Under this mechanism, we theoretically show that DISAM can achieve faster overall convergence and improved generalization in principle when inconsistent convergence emerges. Extensive experiments on various domain generalization benchmarks show the superiority of DISAM over a range of state-of-the-art methods. Furthermore, we show the superior efficiency of DISAM in parameter-efficient fine-tuning combined with the pretraining models. The source code is released at https://github.com/MediaBrain-SJTU/DISAM.

Updated: 2024-05-29 08:22:33

标题: 受领域启发的领域转移下的锐度感知最小化

摘要: 本文提出了一种领域启发的锐度感知最小化(DISAM)算法，用于在领域转移下的优化问题。该算法受到SAM在不同领域之间收敛程度不一致的启发，这导致了对某些领域的优化偏差，从而影响了整体的收敛性。为了解决这个问题，我们考虑了锐度估计中的领域级收敛一致性，以防止对优化不足的领域产生过多（不足）的扰动。具体来说，DISAM引入了在领域损失中最小化方差的约束，这允许在扰动生成中进行弹性梯度校准：当一个领域的优化超过了平均水平，损失方面的梯度扰动将自动减弱，反之亦然。在这种机制下，我们在理论上展示了当出现不一致的收敛时，DISAM可以在原则上实现更快的整体收敛和改进的泛化。对各种领域泛化基准的大量实验证明了DISAM在一系列最先进方法上的优越性。此外，我们展示了DISAM在与预训练模型结合的参数高效调整中的优越效率。源代码发布在https://github.com/MediaBrain-SJTU/DISAM。

更新时间: 2024-05-29 08:22:33

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2405.18861v1

Cascade of phase transitions in the training of Energy-based models

In this paper, we investigate the feature encoding process in a prototypical energy-based generative model, the Restricted Boltzmann Machine (RBM). We start with an analytical investigation using simplified architectures and data structures, and end with numerical analysis of real trainings on real datasets. Our study tracks the evolution of the model's weight matrix through its singular value decomposition, revealing a series of phase transitions associated to a progressive learning of the principal modes of the empirical probability distribution. The model first learns the center of mass of the modes and then progressively resolve all modes through a cascade of phase transitions. We first describe this process analytically in a controlled setup that allows us to study analytically the training dynamics. We then validate our theoretical results by training the Bernoulli-Bernoulli RBM on real data sets. By using data sets of increasing dimension, we show that learning indeed leads to sharp phase transitions in the high-dimensional limit. Moreover, we propose and test a mean-field finite-size scaling hypothesis. This shows that the first phase transition is in the same universality class of the one we studied analytically, and which is reminiscent of the mean-field paramagnetic-to-ferromagnetic phase transition.

Updated: 2024-05-29 08:18:03

标题: 能量基模型训练中的相变级联效应

摘要: 在这篇论文中，我们研究了一个典型的基于能量的生成模型——受限玻尔兹曼机（RBM）中的特征编码过程。我们从使用简化的架构和数据结构进行的分析研究开始，最终以在真实数据集上进行的真实训练的数值分析结束。我们的研究通过对模型的权重矩阵进行奇异值分解来跟踪其演化，揭示了一系列与逐渐学习经验概率分布的主要模式相关的相变。模型首先学习模式的质心，然后通过一系列相变逐渐解决所有模式。我们首先在受控制的设置中从理论上描述了这一过程，这使我们能够在理论上研究训练动态。然后，我们通过在真实数据集上训练伯努利-伯努利RBM来验证我们的理论结果。通过使用维度增加的数据集，我们展示了学习确实在高维极限下导致尖锐的相变。此外，我们提出并测试了一个均场有限尺度缩放假设。这表明第一个相变与我们在分析中研究的相同的普适类别，这与均场铁磁-铁磁相变类似。

更新时间: 2024-05-29 08:18:03

领域: cs.LG,cond-mat.dis-nn,cond-mat.stat-mech

下载: http://arxiv.org/abs/2405.14689v2

LetsMap: Unsupervised Representation Learning for Semantic BEV Mapping

Semantic Bird's Eye View (BEV) maps offer a rich representation with strong occlusion reasoning for various decision making tasks in autonomous driving. However, most BEV mapping approaches employ a fully supervised learning paradigm that relies on large amounts of human-annotated BEV ground truth data. In this work, we address this limitation by proposing the first unsupervised representation learning approach to generate semantic BEV maps from a monocular frontal view (FV) image in a label-efficient manner. Our approach pretrains the network to independently reason about scene geometry and scene semantics using two disjoint neural pathways in an unsupervised manner and then finetunes it for the task of semantic BEV mapping using only a small fraction of labels in the BEV. We achieve label-free pretraining by exploiting spatial and temporal consistency of FV images to learn scene geometry while relying on a novel temporal masked autoencoder formulation to encode the scene representation. Extensive evaluations on the KITTI-360 and nuScenes datasets demonstrate that our approach performs on par with the existing state-of-the-art approaches while using only 1% of BEV labels and no additional labeled data.

Updated: 2024-05-29 08:03:36

标题: LetsMap: 无监督语义BEV地图表示学习

摘要: 语义鸟瞰（BEV）地图为自动驾驶中的各种决策任务提供了丰富的表示，具有强大的遮挡推理能力。然而，大多数BEV映射方法采用了依赖于大量人工标注的BEV地面真实数据的全监督学习范式。在这项工作中，我们通过提出第一种无监督表示学习方法来解决这一限制，以便以一种标签高效的方式从单目前视（FV）图像生成语义BEV地图。我们的方法通过在无监督的方式下预训练网络来独立推理场景几何和场景语义，然后仅使用BEV中的少量标签对其进行微调，用于语义BEV映射任务。我们利用FV图像的空间和时间一致性来实现无标签预训练，同时依赖于一种新颖的时间掩码自编码器公式来编码场景表示。对KITTI-360和nuScenes数据集的广泛评估表明，我们的方法在仅使用1%的BEV标签且没有额外标记数据的情况下与现有的最先进方法性能相当。

更新时间: 2024-05-29 08:03:36

领域: cs.CV,cs.AI,cs.RO

下载: http://arxiv.org/abs/2405.18852v1

Anomaly Detection by Context Contrasting

Anomaly Detection focuses on identifying samples that deviate from the norm. When working with high-dimensional data such as images, a crucial requirement for detecting anomalous patterns is learning lower-dimensional representations that capture normal concepts seen during training. Recent advances in self-supervised learning have shown great promise in this regard. However, many of the most successful self-supervised anomaly detection methods assume prior knowledge about the structure of anomalies and leverage synthetic anomalies during training. Yet, in many real-world applications, we do not know what to expect from unseen data, and we can solely leverage knowledge about normal data. In this work, we propose Con2, which addresses this problem by setting normal training data into distinct contexts while preserving its normal properties, letting us observe the data from different perspectives. Unseen normal data consequently adheres to learned context representations while anomalies fail to do so, letting us detect them without any knowledge about anomalies during training. Our experiments demonstrate that our approach achieves state-of-the-art performance on various benchmarks while exhibiting superior performance in a more realistic healthcare setting, where knowledge about potential anomalies is often scarce.

Updated: 2024-05-29 07:59:06

标题: 上下文对比的异常检测

摘要: 异常检测侧重于识别偏离规范的样本。当处理诸如图像等高维数据时，检测异常模式的关键要求是学习捕捉训练过程中出现的正常概念的低维表示。最近自监督学习的进展在这方面表现出了巨大的潜力。然而，许多最成功的自监督异常检测方法假设对异常结构具有先验知识，并在训练过程中利用合成异常。然而，在许多实际应用中，我们对未知数据的预期不明确，我们只能利用有关正常数据的知识。在这项工作中，我们提出了Con2，通过将正常训练数据放入不同的环境中而保留其正常属性，让我们能从不同的角度观察数据。因此，未知的正常数据遵循学习到的环境表示，而异常则无法做到这一点，让我们能够在训练过程中不了解异常的情况下检测到它们。我们的实验证明，我们的方法在各种基准测试中取得了最新的性能，并在更现实的医疗保健环境中表现出更优异的性能，其中通常缺乏有关潜在异常的知识。

更新时间: 2024-05-29 07:59:06

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.18848v1

Learning Any-View 6DoF Robotic Grasping in Cluttered Scenes via Neural Surface Rendering

A significant challenge for real-world robotic manipulation is the effective 6DoF grasping of objects in cluttered scenes from any single viewpoint without the need for additional scene exploration. This work reinterprets grasping as rendering and introduces NeuGraspNet, a novel method for 6DoF grasp detection that leverages advances in neural volumetric representations and surface rendering. It encodes the interaction between a robot's end-effector and an object's surface by jointly learning to render the local object surface and learning grasping functions in a shared feature space. The approach uses global (scene-level) features for grasp generation and local (grasp-level) neural surface features for grasp evaluation. This enables effective, fully implicit 6DoF grasp quality prediction, even in partially observed scenes. NeuGraspNet operates on random viewpoints, common in mobile manipulation scenarios, and outperforms existing implicit and semi-implicit grasping methods. The real-world applicability of the method has been demonstrated with a mobile manipulator robot, grasping in open, cluttered spaces. Project website at https://sites.google.com/view/neugraspnet

Updated: 2024-05-29 07:58:46

标题: 通过神经表面渲染在混乱场景中学习任意视角的6DoF机器人抓取

摘要: 现实世界中机器人操作的一个重要挑战是在拥挤场景中有效地实现对象的六自由度抓取，而无需额外的场景探索。本研究重新解释抓取为渲染，并引入NeuGraspNet，一种利用神经体积表示和表面渲染进步的六自由度抓取检测新方法。它通过共同学习渲染本地对象表面和学习抓取函数来编码机器人末端执行器与对象表面之间的交互。该方法使用全局（场景级别）特征进行抓取生成，使用局部（抓取级别）神经表面特征进行抓取评估。即使在部分观察到的场景中，这也使得实现有效、完全隐式的六自由度抓取质量预测成为可能。NeuGraspNet在随机视点上运行，这在移动操作场景中很常见，并且胜过现有的隐式和半隐式抓取方法。该方法已经在移动操作机器人上展示了其在开放、拥挤空间中的实际应用性。项目网站位于https://sites.google.com/view/neugraspnet。

更新时间: 2024-05-29 07:58:46

领域: cs.RO,cs.CV,cs.LG

下载: http://arxiv.org/abs/2306.07392v4

Simulation, Modelling and Classification of Wiki Contributors: Spotting The Good, The Bad, and The Ugly

Data crowdsourcing is a data acquisition process where groups of voluntary contributors feed platforms with highly relevant data ranging from news, comments, and media to knowledge and classifications. It typically processes user-generated data streams to provide and refine popular services such as wikis, collaborative maps, e-commerce sites, and social networks. Nevertheless, this modus operandi raises severe concerns regarding ill-intentioned data manipulation in adversarial environments. This paper presents a simulation, modelling, and classification approach to automatically identify human and non-human (bots) as well as benign and malign contributors by using data fabrication to balance classes within experimental data sets, data stream modelling to build and update contributor profiles and, finally, autonomic data stream classification. By employing WikiVoyage - a free worldwide wiki travel guide open to contribution from the general public - as a testbed, our approach proves to significantly boost the confidence and quality of the classifier by using a class-balanced data stream, comprising both real and synthetic data. Our empirical results show that the proposed method distinguishes between benign and malign bots as well as human contributors with a classification accuracy of up to 92 %.

Updated: 2024-05-29 07:56:08

标题: 模拟、建模和分类维基贡献者：发现好的、坏的和丑陋的

摘要: 数据众包是一种数据获取过程，其中一群自愿的贡献者为平台提供高度相关的数据，包括新闻、评论、媒体、知识和分类。它通常处理用户生成的数据流，以提供和完善流行服务，如维基百科、协作地图、电子商务网站和社交网络。然而，这种操作方式引发了对于恶意数据操纵在对抗环境中的严重担忧。本文提出了一种模拟、建模和分类方法，通过使用数据制造来平衡实验数据集中的类别，数据流建模来构建和更新贡献者档案，最终实现自动数据流分类，从而自动识别人类和非人类（机器人）以及良性和恶性贡献者。通过利用WikiVoyage作为一个测试基地，我们的方法证明通过使用包含真实和合成数据的类平衡数据流，显著提高了分类器的置信度和质量。我们的实证结果显示，所提出的方法能够以高达92%的分类准确度区分良性和恶性机器人以及人类贡献者。

更新时间: 2024-05-29 07:56:08

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.18845v1

Data-driven Machinery Fault Detection: A Comprehensive Review

In this era of advanced manufacturing, it's now more crucial than ever to diagnose machine faults as early as possible to guarantee their safe and efficient operation. With the massive surge in industrial big data and advancement in sensing and computational technologies, data-driven Machinery Fault Diagnosis (MFD) solutions based on machine/deep learning approaches have been used ubiquitously in manufacturing. Timely and accurately identifying faulty machine signals is vital in industrial applications for which many relevant solutions have been proposed and are reviewed in many articles. Despite the availability of numerous solutions and reviews on MFD, existing works often lack several aspects. Most of the available literature has limited applicability in a wide range of manufacturing settings due to their concentration on a particular type of equipment or method of analysis. Additionally, discussions regarding the challenges associated with implementing data-driven approaches, such as dealing with noisy data, selecting appropriate features, and adapting models to accommodate new or unforeseen faults, are often superficial or completely overlooked. Thus, this survey provides a comprehensive review of the articles using different types of machine learning approaches for the detection and diagnosis of various types of machinery faults, highlights their strengths and limitations, provides a review of the methods used for condition-based analyses, comprehensively discusses the available machinery fault datasets, introduces future researchers to the possible challenges they have to encounter while using these approaches for MFD and recommends the probable solutions to mitigate those problems. The future research prospects are also pointed out for a better understanding of the field. We believe this article will help researchers and contribute to the further development of the field.

Updated: 2024-05-29 07:50:47

标题: 基于数据驱动的机械故障检测：综述

摘要: 在这个先进制造业的时代，及时诊断机器故障以尽可能保证其安全和高效运行比以往任何时候都更为关键。随着工业大数据的大规模增长和传感和计算技术的进步，基于机器/深度学习方法的数据驱动型机械故障诊断（MFD）解决方案已经在制造业中得到广泛应用。及时而准确地识别出故障机器信号在工业应用中至关重要，为此已经提出了许多相关解决方案，并在许多文章中进行了审查。尽管已经有了大量有关MFD的解决方案和评论，但现有研究往往存在一些缺陷。大多数可用文献由于其集中于特定类型的设备或分析方法，对各种制造环境的适用性有限。此外，有关实施数据驱动方法所面临的挑战，如处理嘈杂数据、选择适当特征以及调整模型以适应新的或意外的故障，往往表面化或完全被忽视。因此，本调查综合审查了利用不同类型机器学习方法进行各种机械故障检测和诊断的文章，突出了它们的优势和局限性，对基于条件的分析方法进行了审查，全面讨论了可用的机械故障数据集，向未来研究者介绍了他们在使用这些方法进行MFD时可能会遇到的挑战，并推荐了可能的解决方案以减轻这些问题。还指出了未来的研究前景，以更好地了解该领域。我们相信本文将帮助研究人员并有助于该领域的进一步发展。

更新时间: 2024-05-29 07:50:47

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.18843v1

Locally Testing Model Detections for Semantic Global Concepts

Ensuring the quality of black-box Deep Neural Networks (DNNs) has become ever more significant, especially in safety-critical domains such as automated driving. While global concept encodings generally enable a user to test a model for a specific concept, linking global concept encodings to the local processing of single network inputs reveals their strengths and limitations. Our proposed framework global-to-local Concept Attribution (glCA) uses approaches from local (why a specific prediction originates) and global (how a model works generally) eXplainable Artificial Intelligence (xAI) to test DNNs for a predefined semantical concept locally. The approach allows for conditioning local, post-hoc explanations on predefined semantic concepts encoded as linear directions in the model's latent space. Pixel-exact scoring concerning the global concept usage assists the tester in further understanding the model processing of single data points for the selected concept. Our approach has the advantage of fully covering the model-internal encoding of the semantic concept and allowing the localization of relevant concept-related information. The results show major differences in the local perception and usage of individual global concept encodings and demand for further investigations regarding obtaining thorough semantic concept encodings.

Updated: 2024-05-29 07:40:40

标题: 本地测试模型检测语义全局概念

摘要: 确保黑盒深度神经网络（DNNs）的质量变得越来越重要，特别是在自动驾驶等安全关键领域。虽然全局概念编码通常使用户能够测试模型的特定概念，但将全局概念编码与单个网络输入的局部处理联系起来揭示了它们的优势和局限性。我们提出的框架全局到局部概念归因（glCA）利用局部（为什么特定预测源于）和全局（模型如何一般工作）可解释人工智能（xAI）的方法，以在本地为预定义的语义概念测试DNNs。该方法允许将预定义的语义概念编码为模型潜在空间中的线性方向，从而在局部条件化后验解释。关于全局概念使用的像素精确评分有助于测试人员进一步了解所选概念的单个数据点的模型处理。我们的方法具有完全覆盖语义概念的模型内部编码并允许定位相关概念相关信息的优势。结果显示个别全局概念编码的局部感知和使用存在主要差异，并需要进一步研究以获得彻底的语义概念编码。

更新时间: 2024-05-29 07:40:40

领域: cs.CV,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.17523v2

PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator

We present Piecewise Rectified Flow (PeRFlow), a flow-based method for accelerating diffusion models. PeRFlow divides the sampling process of generative flows into several time windows and straightens the trajectories in each interval via the reflow operation, thereby approaching piecewise linear flows. PeRFlow achieves superior performance in a few-step generation. Moreover, through dedicated parameterizations, the PeRFlow models inherit knowledge from the pretrained diffusion models. Thus, the training converges fast and the obtained models show advantageous transfer ability, serving as universal plug-and-play accelerators that are compatible with various workflows based on the pre-trained diffusion models. Codes for training and inference are publicly released. https://github.com/magic-research/piecewise-rectified-flow

Updated: 2024-05-29 07:39:56

标题: PeRFlow: 分段整流流作为通用即插即用加速器

摘要: 我们提出了Piecewise Rectified Flow（PeRFlow），这是一种加速扩散模型的基于流的方法。PeRFlow将生成流的采样过程分为几个时间窗口，并通过重新流操作在每个间隔中拉直轨迹，从而接近分段线性流。PeRFlow在几步生成中实现了优越性能。此外，通过专门的参数化，PeRFlow模型继承了预训练扩散模型的知识。因此，训练收敛快速，获得的模型展现出有利的传输能力，作为通用即插即用加速器，与基于预训练扩散模型的各种工作流兼容。训练和推理的代码已经公开发布。https://github.com/magic-research/piecewise-rectified-flow

更新时间: 2024-05-29 07:39:56

领域: cs.LG

下载: http://arxiv.org/abs/2405.07510v3

Do Finetti: On Causal Effects for Exchangeable Data

We study causal effect estimation in a setting where the data are not i.i.d. (independent and identically distributed). We focus on exchangeable data satisfying an assumption of independent causal mechanisms. Traditional causal effect estimation frameworks, e.g., relying on structural causal models and do-calculus, are typically limited to i.i.d. data and do not extend to more general exchangeable generative processes, which naturally arise in multi-environment data. To address this gap, we develop a generalized framework for exchangeable data and introduce a truncated factorization formula that facilitates both the identification and estimation of causal effects in our setting. To illustrate potential applications, we introduce a causal P\'olya urn model and demonstrate how intervention propagates effects in exchangeable data settings. Finally, we develop an algorithm that performs simultaneous causal discovery and effect estimation given multi-environment data.

Updated: 2024-05-29 07:31:18

标题: Do Finetti：关于可交换数据的因果效应

摘要: 我们研究因果效应估计的一个情境，其中数据不是独立同分布的（i.i.d.）。我们着重于满足独立因果机制假设的可交换数据。传统的因果效应估计框架，比如依赖结构因果模型和do-演算，通常局限于i.i.d.数据，并不能拓展到更一般的可交换生成过程，这些过程在多环境数据中自然出现。为了填补这一空白，我们开发了一个针对可交换数据的广义框架，并引入了一个截断因子分解公式，促进了在我们的情境中因果效应的识别和估计。为了说明潜在的应用，我们介绍了一个因果P\'olya乌恩模型，并展示了干预如何在可交换数据情境中传播效应。最后，我们开发了一个算法，可以在给定多环境数据的情况下同时进行因果发现和效应估计。

更新时间: 2024-05-29 07:31:18

领域: stat.ME,cs.LG

下载: http://arxiv.org/abs/2405.18836v1

Discovering Interpretable Directions in the Semantic Latent Space of Diffusion Models

Denoising Diffusion Models (DDMs) have emerged as a strong competitor to Generative Adversarial Networks (GANs). However, despite their widespread use in image synthesis and editing applications, their latent space is still not as well understood. Recently, a semantic latent space for DDMs, coined `$h$-space', was shown to facilitate semantic image editing in a way reminiscent of GANs. The $h$-space is comprised of the bottleneck activations in the DDM's denoiser across all timesteps of the diffusion process. In this paper, we explore the properties of h-space and propose several novel methods for finding meaningful semantic directions within it. We start by studying unsupervised methods for revealing interpretable semantic directions in pretrained DDMs. Specifically, we show that global latent directions emerge as the principal components in the latent space. Additionally, we provide a novel method for discovering image-specific semantic directions by spectral analysis of the Jacobian of the denoiser w.r.t. the latent code. Next, we extend the analysis by finding directions in a supervised fashion in unconditional DDMs. We demonstrate how such directions can be found by relying on either a labeled data set of real images or by annotating generated samples with a domain-specific attribute classifier. We further show how to semantically disentangle the found direction by simple linear projection. Our approaches are applicable without requiring any architectural modifications, text-based guidance, CLIP-based optimization, or model fine-tuning.

Updated: 2024-05-29 07:30:37

标题: 在扩散模型的语义潜在空间中发现可解释的方向

摘要: 去噪扩散模型（DDMs）已经成为生成对抗网络（GANs）的强大竞争对手。然而，尽管它们在图像合成和编辑应用中被广泛使用，但它们的潜在空间仍然不太被理解。最近，一种用于DDMs的语义潜在空间，被称为`$h$-space'，被证明可以以类似GANs的方式促进语义图像编辑。$h$-space由DDM的降噪器在扩散过程的所有时间步骤中的瓶颈激活组成。在本文中，我们探讨了h-space的属性，并提出了几种在其中找到有意义的语义方向的新方法。我们首先研究了在预训练的DDMs中揭示可解释的语义方向的无监督方法。具体来说，我们展示了全局潜在方向在潜在空间中表现为主成分。此外，我们提供了一种通过对降噪器关于潜在编码的雅可比矩阵进行谱分析来发现特定于图像的语义方向的新方法。接下来，我们通过无条件DDMs以监督的方式找到方向来扩展分析。我们演示了如何通过依赖于真实图像的标记数据集或通过使用特定于领域的属性分类器对生成的样本进行注释来找到这样的方向。我们进一步展示了如何通过简单的线性投影对找到的方向进行语义解缠。我们的方法适用于不需要任何架构修改、基于文本的指导、基于CLIP的优化或模型微调。

更新时间: 2024-05-29 07:30:37

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2303.11073v2

CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

Recent advancements in zero-shot text-to-speech (TTS) modeling have led to significant strides in generating high-fidelity and diverse speech. However, dialogue generation, along with achieving human-like naturalness in speech, continues to be a challenge. In this paper, we introduce CoVoMix: Conversational Voice Mixture Generation, a novel model for zero-shot, human-like, multi-speaker, multi-round dialogue speech generation. CoVoMix first converts dialogue text into multiple streams of discrete tokens, with each token stream representing semantic information for individual talkers. These token streams are then fed into a flow-matching based acoustic model to generate mixed mel-spectrograms. Finally, the speech waveforms are produced using a HiFi-GAN model. Furthermore, we devise a comprehensive set of metrics for measuring the effectiveness of dialogue modeling and generation. Our experimental results show that CoVoMix can generate dialogues that are not only human-like in their naturalness and coherence but also involve multiple talkers engaging in multiple rounds of conversation. This is exemplified by instances generated in a single channel where one speaker's utterance is seamlessly mixed with another's interjections or laughter, indicating the latter's role as an attentive listener. Audio samples are available at https://aka.ms/covomix.

Updated: 2024-05-29 07:30:20

标题: CoVoMix：推动零样本语音生成技术，实现类人多方对话

摘要: 最近在零样本文本转语音(TTS)建模方面取得的进展已经在生成高保真度和多样化的语音方面取得了重大进展。然而，对话生成，以及实现类似人类自然语音的自然度，仍然是一个挑战。在本文中，我们介绍了CoVoMix：对话语音混合生成，这是一个新颖的零样本模型，用于生成类似人类、多说话者、多轮对话的语音。CoVoMix首先将对话文本转换为多个离散标记流，每个标记流代表个体发言者的语义信息。然后，这些标记流被输入基于流匹配的声学模型中，以生成混合的mel频谱图。最后，语音波形是使用HiFi-GAN模型生成的。此外，我们设计了一套全面的度量标准，用于衡量对话建模和生成的有效性。我们的实验结果显示，CoVoMix能够生成不仅在自然度和连贯性方面类似人类的对话，而且还涉及多个说话者参与多轮对话。这可以通过在单个通道中生成的实例来说明，其中一个发言者的话语与另一个发言者的插话或笑声无缝混合，表明后者作为专注的听众的角色。音频样本可在https://aka.ms/covomix找到。

更新时间: 2024-05-29 07:30:20

领域: eess.AS,cs.AI,cs.CL,cs.LG,cs.SD

下载: http://arxiv.org/abs/2404.06690v2

MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models

Mixture-of-Experts (MoE) large language models (LLM) have memory requirements that often exceed the GPU memory capacity, requiring costly parameter movement from secondary memories to the GPU for expert computation. In this work, we present Mixture of Near-Data Experts (MoNDE), a near-data computing solution that efficiently enables MoE LLM inference. MoNDE reduces the volume of MoE parameter movement by transferring only the $\textit{hot}$ experts to the GPU, while computing the remaining $\textit{cold}$ experts inside the host memory device. By replacing the transfers of massive expert parameters with the ones of small activations, MoNDE enables far more communication-efficient MoE inference, thereby resulting in substantial speedups over the existing parameter offloading frameworks for both encoder and decoder operations.

Updated: 2024-05-29 07:23:29

标题: MoNDE: 大规模稀疏模型的混合近数据专家

摘要: 混合专家 (MoE) 大型语言模型 (LLM) 的内存需求通常超过 GPU 内存容量，需要将参数从辅助内存传输到 GPU 进行专家计算，造成高昂的成本。在本研究中，我们提出了近数据专家混合 (MoNDE)，这是一种近数据计算解决方案，可以有效地实现 MoE LLM 推断。MoNDE 通过仅将 $\textit{热}$ 专家传输到 GPU，同时在主机内存设备中计算其余 $\textit{冷}$ 专家，减少了 MoE 参数传输的量。通过用小激活替换大量专家参数的传输，MoNDE 实现了更加高效的通信，从而实现了对现有参数卸载框架的显著加速，无论是编码器还是解码器操作。

更新时间: 2024-05-29 07:23:29

领域: cs.LG,cs.AI,cs.AR

下载: http://arxiv.org/abs/2405.18832v1

A Homogenization Approach for Gradient-Dominated Stochastic Optimization

Gradient dominance property is a condition weaker than strong convexity, yet sufficiently ensures global convergence even in non-convex optimization. This property finds wide applications in machine learning, reinforcement learning (RL), and operations management. In this paper, we propose the stochastic homogeneous second-order descent method (SHSODM) for stochastic functions enjoying gradient dominance property based on a recently proposed homogenization approach. Theoretically, we provide its sample complexity analysis, and further present an enhanced result by incorporating variance reduction techniques. Our findings show that SHSODM matches the best-known sample complexity achieved by other second-order methods for gradient-dominated stochastic optimization but without cubic regularization. Empirically, since the homogenization approach only relies on solving extremal eigenvector problem at each iteration instead of Newton-type system, our methods gain the advantage of cheaper computational cost and robustness in ill-conditioned problems. Numerical experiments on several RL tasks demonstrate the better performance of SHSODM compared to other off-the-shelf methods.

Updated: 2024-05-29 07:22:58

标题: 一个用于梯度主导随机优化的均匀化方法

摘要: Gradient dominance property 是一个比强凸性条件更弱的条件，但足以确保全局收敛，即使在非凸优化中也是如此。这个属性在机器学习、强化学习和运营管理中有广泛的应用。在本文中，我们提出了一种基于最近提出的均匀化方法的随机均匀二阶下降方法（SHSODM），用于享有梯度优势特性的随机函数。在理论上，我们提供了其样本复杂性分析，并进一步通过结合方差减少技术提出了一个增强的结果。我们的发现表明，SHSODM与其他梯度占优随机优化的二阶方法在样本复杂性方面达到了已知最佳水平，但没有三次正则化。从实证上看，由于均匀化方法只依赖于在每次迭代中解决极端特征向量问题而不是牛顿类型系统，我们的方法在计算成本更低、在病态问题中更具鲁棒性。在几个RL任务上的数值实验表明，相对于其他现成方法，SHSODM的性能更好。

更新时间: 2024-05-29 07:22:58

领域: math.OC,cs.LG

下载: http://arxiv.org/abs/2308.10630v3

Evaluating Zero-Shot GPT-4V Performance on 3D Visual Question Answering Benchmarks

As interest in "reformulating" the 3D Visual Question Answering (VQA) problem in the context of foundation models grows, it is imperative to assess how these new paradigms influence existing closed-vocabulary datasets. In this case study, we evaluate the zero-shot performance of foundational models (GPT-4 Vision and GPT-4) on well-established 3D VQA benchmarks, namely 3D-VQA and ScanQA. We provide an investigation to contextualize the performance of GPT-based agents relative to traditional modeling approaches. We find that GPT-based agents without any fine-tuning perform on par with the closed vocabulary approaches. Our findings corroborate recent results that "blind" models establish a surprisingly strong baseline in closed-vocabulary settings. We demonstrate that agents benefit significantly from scene-specific vocabulary via in-context textual grounding. By presenting a preliminary comparison with previous baselines, we hope to inform the community's ongoing efforts to refine multi-modal 3D benchmarks.

Updated: 2024-05-29 07:20:28

标题: 评估零射击GPT-4V在3D视觉问答基准测试中的表现

摘要: 随着对基础模型增长的兴趣，有关在基础模型的背景下“重构”三维视觉问答（VQA）问题的兴趣，评估这些新范式如何影响现有的封闭词汇数据集至关重要。在这个案例研究中，我们评估了基础模型（GPT-4 Vision和GPT-4）在已建立的三维VQA基准数据集上的零次性能，即3D-VQA和ScanQA。我们进行了调查，以便将基于GPT的代理相对于传统建模方法的性能进行定位。我们发现，未经任何微调的基于GPT的代理性能与封闭词汇方法相当。我们的发现证实了最近的结果，即“盲目”模型在封闭词汇设置中建立了令人惊讶的强基线。我们证明，代理受益于通过上下文文本基础对特定场景词汇的显著提升。通过与先前基线的初步比较，我们希望为社区不断改进多模态三维基准数据集的努力提供信息。

更新时间: 2024-05-29 07:20:28

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2405.18831v1

Yuan 2.0-M32: Mixture of Experts with Attention Router

Yuan 2.0-M32, with a similar base architecture as Yuan-2.0 2B, uses a mixture-of-experts architecture with 32 experts of which 2 experts are active. A new router network, Attention Router, is proposed and adopted for a more efficient selection of experts, which improves the accuracy compared to the model with classical router network. Yuan 2.0-M32 is trained with 2000B tokens from scratch, and the training computation consumption is only 9.25% of a dense model at the same parameter scale. Yuan 2.0-M32 demonstrates competitive capability on coding, math, and various domains of expertise, with only 3.7B active parameters of 40B in total, and 7.4 GFlops forward computation per token, both of which are only 1/19 of Llama3-70B. Yuan 2.0-M32 surpass Llama3-70B on MATH and ARC-Challenge benchmark, with accuracy of 55.89 and 95.8 respectively. The models and source codes of Yuan 2.0-M32 are released at Github1.

Updated: 2024-05-29 07:19:58

标题: Yuan 2.0-M32:带有注意路由器的专家混合模型

摘要: Yuan 2.0-M32采用了与Yuan-2.0 2B相似的基础架构，使用32个专家的混合专家架构，其中有2个专家处于活跃状态。提出并采用了一种新的路由器网络，Attention Router，以更高效地选择专家，从而提高了与具有经典路由器网络的模型相比的准确性。Yuan 2.0-M32从头开始使用2000B令牌进行训练，训练计算消耗仅为相同参数规模下密集模型的9.25%。Yuan 2.0-M32在编码、数学和各种专业领域上展示了竞争能力，仅使用了40B总参数中的3.7B活跃参数，每令牌的前向计算为7.4 GFlops，这两者仅为Llama3-70B的1/19。Yuan 2.0-M32在数学和ARC-Challenge基准测试中超过了Llama3-70B，准确率分别为55.89和95.8。Yuan 2.0-M32的模型和源代码已在Github上发布。

更新时间: 2024-05-29 07:19:58

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2405.17976v2

Improving Token-Based World Models with Parallel Observation Prediction

Motivated by the success of Transformers when applied to sequences of discrete symbols, token-based world models (TBWMs) were recently proposed as sample-efficient methods. In TBWMs, the world model consumes agent experience as a language-like sequence of tokens, where each observation constitutes a sub-sequence. However, during imagination, the sequential token-by-token generation of next observations results in a severe bottleneck, leading to long training times, poor GPU utilization, and limited representations. To resolve this bottleneck, we devise a novel Parallel Observation Prediction (POP) mechanism. POP augments a Retentive Network (RetNet) with a novel forward mode tailored to our reinforcement learning setting. We incorporate POP in a novel TBWM agent named REM (Retentive Environment Model), showcasing a 15.4x faster imagination compared to prior TBWMs. REM attains superhuman performance on 12 out of 26 games of the Atari 100K benchmark, while training in less than 12 hours. Our code is available at \url{https://github.com/leor-c/REM}.

Updated: 2024-05-29 07:16:28

标题: 用并行观测预测改进基于令牌的世界模型

摘要: 受到将Transformer应用于离散符号序列的成功的启发，最近提出了基于标记的世界模型（TBWM）作为高效样本方法。在TBWM中，世界模型将代理经验作为类似语言的标记序列消耗，其中每个观察构成一个子序列。然而，在想象过程中，逐个标记生成下一个观察结果导致严重的瓶颈，导致长时间训练，GPU利用率低，以及表示受限。为了解决这个瓶颈，我们设计了一种新颖的并行观察预测（POP）机制。POP通过将一个Retentive Network（RetNet）与一个针对我们的强化学习设置量身定制的新颖前向模式相结合来增强。我们将POP纳入一种名为REM（Retentive Environment Model）的新型TBWM代理中，展示比以前的TBWM快15.4倍的想象速度。REM在Atari 100K基准测试中的26个游戏中的12个游戏中取得超人类表现，而训练时间不到12小时。我们的代码可在\url{https://github.com/leor-c/REM}找到。

更新时间: 2024-05-29 07:16:28

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2402.05643v5

Spotting AI's Touch: Identifying LLM-Paraphrased Spans in Text

AI-generated text detection has attracted increasing attention as powerful language models approach human-level generation. Limited work is devoted to detecting (partially) AI-paraphrased texts. However, AI paraphrasing is commonly employed in various application scenarios for text refinement and diversity. To this end, we propose a novel detection framework, paraphrased text span detection (PTD), aiming to identify paraphrased text spans within a text. Different from text-level detection, PTD takes in the full text and assigns each of the sentences with a score indicating the paraphrasing degree. We construct a dedicated dataset, PASTED, for paraphrased text span detection. Both in-distribution and out-of-distribution results demonstrate the effectiveness of PTD models in identifying AI-paraphrased text spans. Statistical and model analysis explains the crucial role of the surrounding context of the paraphrased text spans. Extensive experiments show that PTD models can generalize to versatile paraphrasing prompts and multiple paraphrased text spans. We release our resources at https://github.com/Linzwcs/PASTED.

Updated: 2024-05-29 07:09:59

标题: 发现人工智能的痕迹：在文本中识别LLM改写的片段

摘要: 人工智能生成文本检测已经引起越来越多的关注，因为强大的语言模型接近人类水平的生成能力。目前有限的工作致力于检测（部分）人工智能改写的文本。然而，人工智能改写常常被应用于各种应用场景，用于文本的精炼和多样性。为此，我们提出了一个新颖的检测框架，即改写文本跨度检测（PTD），旨在识别文本中的改写文本跨度。与文本级别检测不同，PTD接受完整文本，并为每个句子分配一个表示改写程度的分数。我们构建了一个专门的数据集PASTED，用于检测改写文本跨度。在分布内和分布外的结果都证明了PTD模型在识别人工智能改写的文本跨度方面的有效性。统计和模型分析解释了改写文本跨度周围上下文的关键作用。大量实验表明，PTD模型可以推广到多样的改写提示和多个改写文本跨度。我们将我们的资源发布在https://github.com/Linzwcs/PASTED。

更新时间: 2024-05-29 07:09:59

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.12689v2

Node Injection Attack Based on Label Propagation Against Graph Neural Network

Graph Neural Network (GNN) has achieved remarkable success in various graph learning tasks, such as node classification, link prediction and graph classification. The key to the success of GNN lies in its effective structure information representation through neighboring aggregation. However, the attacker can easily perturb the aggregation process through injecting fake nodes, which reveals that GNN is vulnerable to the graph injection attack. Existing graph injection attack methods primarily focus on damaging the classical feature aggregation process while overlooking the neighborhood aggregation process via label propagation. To bridge this gap, we propose the label-propagation-based global injection attack (LPGIA) which conducts the graph injection attack on the node classification task. Specifically, we analyze the aggregation process from the perspective of label propagation and transform the graph injection attack problem into a global injection label specificity attack problem. To solve this problem, LPGIA utilizes a label propagation-based strategy to optimize the combinations of the nodes connected to the injected node. Then, LPGIA leverages the feature mapping to generate malicious features for injected nodes. In extensive experiments against representative GNNs, LPGIA outperforms the previous best-performing injection attack method in various datasets, demonstrating its superiority and transferability.

Updated: 2024-05-29 07:09:16

标题: 基于标签传播的节点注入攻击对图神经网络的影响

摘要: 图神经网络（GNN）在各种图学习任务中取得了显著的成功，例如节点分类、链接预测和图分类。GNN成功的关键在于通过邻居聚合实现有效的结构信息表示。然而，攻击者可以通过注入虚假节点轻松扰乱聚合过程，这表明GNN容易受到图注入攻击的影响。现有的图注入攻击方法主要集中在破坏经典特征聚合过程，而忽视了通过标签传播的邻域聚合过程。为了弥合这一差距，我们提出了基于标签传播的全局注入攻击（LPGIA），该攻击对节点分类任务进行图注入攻击。具体而言，我们从标签传播的角度分析聚合过程，并将图注入攻击问题转化为全局注入标签特异性攻击问题。为了解决这个问题，LPGIA利用基于标签传播的策略来优化与注入节点连接的节点组合。然后，LPGIA利用特征映射为注入节点生成恶意特征。在针对代表性GNN的广泛实验中，LPGIA在各种数据集上都优于先前表现最佳的注入攻击方法，展示了其优越性和可迁移性。

更新时间: 2024-05-29 07:09:16

领域: cs.CR

下载: http://arxiv.org/abs/2405.18824v1

Why Reinforcement Learning in Energy Systems Needs Explanations

With economic development, the complexity of infrastructure has increased drastically. Similarly, with the shift from fossil fuels to renewable sources of energy, there is a dire need for such systems that not only predict and forecast with accuracy but also help in understanding the process of predictions. Artificial intelligence and machine learning techniques have helped in finding out wellperforming solutions to different problems in the energy sector. However, the usage of state-of-the-art techniques like reinforcement learning is not surprisingly convincing. This paper discusses the application of reinforcement techniques in energy systems and how explanations of these models can be helpful

Updated: 2024-05-29 07:09:00

标题: 为什么能源系统中的强化学习需要解释说明？

摘要: 随着经济发展，基础设施的复杂性急剧增加。同样，随着从化石燃料向可再生能源的转变，有迫切需要这种不仅能准确预测和预测，而且还能帮助理解预测过程的系统。人工智能和机器学习技术已经帮助找到了在能源领域不同问题上表现良好的解决方案。然而，使用像强化学习这样的尖端技术并不令人惊讶。本文讨论了在能源系统中应用强化学习技术以及这些模型的解释如何有助于解决问题。

更新时间: 2024-05-29 07:09:00

领域: cs.AI

下载: http://arxiv.org/abs/2405.18823v1

Diffeomorphic interpolation for efficient persistence-based topological optimization

Topological Data Analysis (TDA) provides a pipeline to extract quantitative topological descriptors from structured objects. This enables the definition of topological loss functions, which assert to what extent a given object exhibits some topological properties. These losses can then be used to perform topological optimizationvia gradient descent routines. While theoretically sounded, topological optimization faces an important challenge: gradients tend to be extremely sparse, in the sense that the loss function typically depends on only very few coordinates of the input object, yielding dramatically slow optimization schemes in practice.Focusing on the central case of topological optimization for point clouds, we propose in this work to overcome this limitation using diffeomorphic interpolation, turning sparse gradients into smooth vector fields defined on the whole space, with quantifiable Lipschitz constants. In particular, we show that our approach combines efficiently with subsampling techniques routinely used in TDA, as the diffeomorphism derived from the gradient computed on a subsample can be used to update the coordinates of the full input object, allowing us to perform topological optimization on point clouds at an unprecedented scale. Finally, we also showcase the relevance of our approach for black-box autoencoder (AE) regularization, where we aim at enforcing topological priors on the latent spaces associated to fixed, pre-trained, black-box AE models, and where we show thatlearning a diffeomorphic flow can be done once and then re-applied to new data in linear time (while vanilla topological optimization has to be re-run from scratch). Moreover, reverting the flow allows us to generate data by sampling the topologically-optimized latent space directly, yielding better interpretability of the model.

Updated: 2024-05-29 07:00:28

标题: 同胚插值用于高效基于持续性的拓扑优化

摘要: 拓扑数据分析（TDA）提供了从结构化对象中提取定量拓扑描述符的管道。这使得可以定义拓扑损失函数，用来衡量给定对象展示某些拓扑特性的程度。这些损失可以用来执行通过梯度下降例程进行拓扑优化。尽管在理论上是合理的，拓扑优化面临一个重要挑战：梯度往往非常稀疏，即损失函数通常只依赖于输入对象的很少坐标，导致实际中优化方案非常缓慢。在关注点云的中心情况下，我们在这项工作中提出通过各向同性插值来克服这一限制，将稀疏梯度转化为在整个空间上定义的平滑矢量场，具有可量化的Lipschitz常数。特别是，我们展示了我们的方法与TDA中常用的子采样技术有效结合，因为从子样本计算的梯度导出的各向同性插值可以用于更新完整输入对象的坐标，使我们能够在前所未有的规模上对点云进行拓扑优化。最后，我们还展示了我们的方法在黑盒自动编码器（AE）正则化中的相关性，我们旨在对与固定、预先训练的黑盒AE模型相关的潜在空间强制实施拓扑先验，我们展示学习各向同性流可以在一次完成，然后在线性时间内重新应用到新数据中（而普通的拓扑优化必须从头开始重新运行）。此外，逆转流使我们能够通过直接采样经过拓扑优化的潜在空间来生成数据，提供了更好的模型可解释性。

更新时间: 2024-05-29 07:00:28

领域: cs.AI,cs.CG,math.OC

下载: http://arxiv.org/abs/2405.18820v1

SegICL: A Multimodal In-context Learning Framework for Enhanced Segmentation in Medical Imaging

In the field of medical image segmentation, tackling Out-of-Distribution (OOD) segmentation tasks in a cost-effective manner remains a significant challenge. Universal segmentation models is a solution, which aim to generalize across the diverse modality of medical images, yet their effectiveness often diminishes when applied to OOD data modalities and tasks, requiring intricate fine-tuning of model for optimal performance. Few-shot learning segmentation methods are typically designed for specific modalities of data and cannot be directly transferred for use with another modality. Therefore, we introduce SegICL, a novel approach leveraging In-Context Learning (ICL) for image segmentation. Unlike existing methods, SegICL has the capability to employ text-guided segmentation and conduct in-context learning with a small set of image-mask pairs, eliminating the need for training the model from scratch or fine-tuning for OOD tasks (including OOD modality and dataset). Extensive experimental demonstrates a positive correlation between the number of shots and segmentation performance on OOD tasks. The performance of segmentation when provided thre-shots is approximately 1.5 times better than the performance in a zero-shot setting. This indicates that SegICL effectively address new segmentation tasks based on contextual information. Additionally, SegICL also exhibits comparable performance to mainstream models on OOD and in-distribution tasks. Our code will be released after paper review.

Updated: 2024-05-29 07:00:22

标题: SegICL：一种用于增强医学成像中分割的多模态上下文学习框架

摘要: 在医学图像分割领域，以一种成本有效的方式处理超出分布（OOD）分割任务仍然是一个重要挑战。通用分割模型是一个解决方案，旨在概括医学图像的多样性模态，然而当应用于OOD数据模态和任务时，它们的有效性通常会降低，需要对模型进行复杂的微调以实现最佳性能。少样本学习分割方法通常设计用于特定类型的数据，不能直接转移到另一种类型的数据中使用。因此，我们介绍了SegICL，一种利用上下文学习（ICL）进行图像分割的新方法。与现有方法不同，SegICL具有利用文本引导分割和使用少量图像-掩模对进行上下文学习的能力，消除了需要从头开始训练模型或为OOD任务进行微调的需求（包括OOD模态和数据集）。广泛的实验表明，镜头数量与OOD任务上的分割性能之间存在正相关关系。当提供三个镜头时，分割性能大约比零镜头设置中的性能好1.5倍。这表明SegICL有效地根据上下文信息解决新的分割任务。此外，SegICL在OOD和分布任务上也表现出与主流模型相当的性能。我们的代码将在论文审查后发布。

更新时间: 2024-05-29 07:00:22

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2403.16578v3

Gradient Descent with Polyak's Momentum Finds Flatter Minima via Large Catapults

Although gradient descent with Polyak's momentum is widely used in modern machine and deep learning, a concrete understanding of its effects on the training trajectory remains elusive. In this work, we empirically show that for linear diagonal networks and nonlinear neural networks, momentum gradient descent with a large learning rate displays large catapults, driving the iterates towards much flatter minima than those found by gradient descent. We hypothesize that the large catapult is caused by momentum "prolonging" the self-stabilization effect (Damian et al., 2023). We provide theoretical and empirical support for our hypothesis in a simple toy example and empirical evidence supporting our hypothesis for linear diagonal networks.

Updated: 2024-05-29 06:56:37

标题: Gradient Descent with Polyak的动量通过大型弹射找到更平坦的最小值

摘要: 尽管Polyak动量梯度下降在现代机器和深度学习中被广泛使用，但对其对训练轨迹的影响的具体理解仍然难以捉摸。在这项工作中，我们实证显示，对于线性对角网络和非线性神经网络，具有较大学习率的动量梯度下降展示出较大的弹射力，将迭代向更平缓的最小值推进，而这些最小值比梯度下降找到的更平坦。我们假设这种大弹射力是由于动量“延长”了自稳定效应（Damian等，2023）。我们在一个简单的玩具例子中提供了理论和实证支持我们的假设，并提供了支持我们假设的线性对角网络的实证证据。

更新时间: 2024-05-29 06:56:37

领域: cs.LG,math.OC,stat.ML

下载: http://arxiv.org/abs/2311.15051v3

Flow Priors for Linear Inverse Problems via Iterative Corrupted Trajectory Matching

Generative models based on flow matching have attracted significant attention for their simplicity and superior performance in high-resolution image synthesis. By leveraging the instantaneous change-of-variables formula, one can directly compute image likelihoods from a learned flow, making them enticing candidates as priors for downstream tasks such as inverse problems. In particular, a natural approach would be to incorporate such image probabilities in a maximum-a-posteriori (MAP) estimation problem. A major obstacle, however, lies in the slow computation of the log-likelihood, as it requires backpropagating through an ODE solver, which can be prohibitively slow for high-dimensional problems. In this work, we propose an iterative algorithm to approximate the MAP estimator efficiently to solve a variety of linear inverse problems. Our algorithm is mathematically justified by the observation that the MAP objective can be approximated by a sum of $N$ ``local MAP'' objectives, where $N$ is the number of function evaluations. By leveraging Tweedie's formula, we show that we can perform gradient steps to sequentially optimize these objectives. We validate our approach for various linear inverse problems, such as super-resolution, deblurring, inpainting, and compressed sensing, and demonstrate that we can outperform other methods based on flow matching.

Updated: 2024-05-29 06:56:12

标题: 通过迭代损坏轨迹匹配的线性逆问题的流先验

摘要: 基于流匹配的生成模型因其简单性和在高分辨率图像合成中的优越性能而引起了广泛关注。通过利用瞬时变量变换公式，可以直接从学习到的流计算图像的可能性，使其成为下游任务（如逆问题）中有吸引力的先验选择。特别是，一个自然的方法是将这种图像概率纳入最大后验（MAP）估计问题中。然而，一个主要障碍在于对对数似然的缓慢计算，因为它需要通过ODE求解器进行反向传播，这对于高维问题来说可能是极其缓慢的。在这项工作中，我们提出了一种迭代算法，以有效地近似MAP估计器来解决各种线性逆问题。我们的算法在数学上得到了验证，观察到MAP目标可以近似为$N$个“局部MAP”目标的总和，其中$N$是函数评估的数量。通过利用Tweedie的公式，我们表明我们可以执行梯度步骤来逐步优化这些目标。我们验证了我们的方法在各种线性逆问题，如超分辨率、去模糊、修补和压缩感知等方面的有效性，并证明我们可以胜过基于流匹配的其他方法。

更新时间: 2024-05-29 06:56:12

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2405.18816v1

UniPTS: A Unified Framework for Proficient Post-Training Sparsity

Post-training Sparsity (PTS) is a recently emerged avenue that chases efficient network sparsity with limited data in need. Existing PTS methods, however, undergo significant performance degradation compared with traditional methods that retrain the sparse networks via the whole dataset, especially at high sparsity ratios. In this paper, we attempt to reconcile this disparity by transposing three cardinal factors that profoundly alter the performance of conventional sparsity into the context of PTS. Our endeavors particularly comprise (1) A base-decayed sparsity objective that promotes efficient knowledge transferring from dense network to the sparse counterpart. (2) A reducing-regrowing search algorithm designed to ascertain the optimal sparsity distribution while circumventing overfitting to the small calibration set in PTS. (3) The employment of dynamic sparse training predicated on the preceding aspects, aimed at comprehensively optimizing the sparsity structure while ensuring training stability. Our proposed framework, termed UniPTS, is validated to be much superior to existing PTS methods across extensive benchmarks. As an illustration, it amplifies the performance of POT, a recently proposed recipe, from 3.9% to 68.6% when pruning ResNet-50 at 90% sparsity ratio on ImageNet. We release the code of our paper at https://github.com/xjjxmu/UniPTS.

Updated: 2024-05-29 06:53:18

标题: UniPTS：一种用于训练后稀疏的统一框架

摘要: 后训练稀疏性（PTS）是一条最近出现的途径，追求具有有限数据需求的高效网络稀疏性。然而，现有的PTS方法与通过整个数据集重新训练稀疏网络的传统方法相比，性能显著下降，特别是在高稀疏比率下。在本文中，我们尝试通过将三个根本因素转移到PTS的背景中，来调和这种差异，这些因素深刻地改变了传统稀疏性能。我们的努力特别包括（1）一种基础衰减的稀疏性目标，促进从密集网络向稀疏对应物的高效知识转移。（2）一个设计用于确定最佳稀疏性分布的减少-重新生长搜索算法，同时避免在PTS中过拟合到小的校准集。（3）基于前述方面的动态稀疏训练的雇用，旨在全面优化稀疏结构，同时确保训练稳定性。我们提出的框架UniPTS被证实在广泛的基准测试中要比现有的PTS方法优越得多。以POT为例，它将ResNet-50在ImageNet上90％稀疏比率修剪时的性能从3.9％提升到68.6％。我们在https://github.com/xjjxmu/UniPTS 上发布了我们论文的代码。

更新时间: 2024-05-29 06:53:18

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.18810v1

Proof Number Based Monte-Carlo Tree Search

This paper proposes a new game-search algorithm, PN-MCTS, which combines Monte-Carlo Tree Search (MCTS) and Proof-Number Search (PNS). These two algorithms have been successfully applied for decision making in a range of domains. We define three areas where the additional knowledge provided by the proof and disproof numbers gathered in MCTS trees might be used: final move selection, solving subtrees, and the UCB1 selection mechanism. We test all possible combinations on different time settings, playing against vanilla UCT on several games: Lines of Action ($7$$\times$$7$ and $8$$\times$$8$ board sizes), MiniShogi, Knightthrough, and Awari. Furthermore, we extend this new algorithm to properly address games with draws, like Awari, by adding an additional layer of PNS on top of the MCTS tree. The experiments show that PN-MCTS is able to outperform MCTS in all tested game domains, achieving win rates up to 96.2% for Lines of Action.

Updated: 2024-05-29 06:49:31

标题: 基于证明数的蒙特卡罗树搜索

摘要: 本文提出了一种新的游戏搜索算法PN-MCTS，它结合了蒙特卡洛树搜索（MCTS）和证明数搜索（PNS）。这两种算法已成功应用于一系列领域的决策制定中。我们定义了三个领域，在这些领域中，MCTS树中收集的证明和反证号所提供的额外知识可能会被使用：最终移动选择，解决子树，以及UCB1选择机制。我们在不同的时间设置上对所有可能的组合进行了测试，在几个游戏中与基本的UCT进行对战：动作行为（$7$$\times$$7$和$8$$\times$$8$板大小），MiniShogi，Knightthrough和Awari。此外，我们通过在MCTS树的顶部添加一个额外的PNS层，将这种新算法扩展到适当处理具有平局的游戏，例如Awari。实验表明，PN-MCTS能够在所有测试的游戏领域中优于MCTS，在动作行为游戏中获得高达96.2%的胜率。

更新时间: 2024-05-29 06:49:31

领域: cs.AI

下载: http://arxiv.org/abs/2303.09449v4

Semiring Activation in Neural Networks

We introduce a class of trainable nonlinear operators based on semirings that are suitable for use in neural networks. These operators generalize the traditional alternation of linear operators with activation functions in neural networks. Semirings are algebraic structures that describe a generalised notation of linearity, greatly expanding the range of trainable operators that can be included in neural networks. In fact, max- or min-pooling operations are convolutions in the tropical semiring with a fixed kernel. We perform experiments where we replace the activation functions for trainable semiring-based operators to show that these are viable operations to include in fully connected as well as convolutional neural networks (ConvNeXt). We discuss some of the challenges of replacing traditional activation functions with trainable semiring activations and the trade-offs of doing so.

Updated: 2024-05-29 06:47:45

标题: 神经网络中的半环激活

摘要: 我们介绍了一类基于半环的可训练非线性运算符，适用于神经网络。这些运算符概括了神经网络中线性运算符与激活函数的传统交替。半环是描述一般线性性质的代数结构，极大地扩展了可以包含在神经网络中的可训练运算符的范围。事实上，最大或最小池化操作是在带有固定核的热带半环中的卷积。我们进行了实验，将可训练半环为基础的运算符替换为激活函数，以表明这些是可行的操作，可以包括在全连接以及卷积神经网络（ConvNeXt）中。我们讨论了用可训练半环激活函数替换传统激活函数的一些挑战，以及这样做的权衡。

更新时间: 2024-05-29 06:47:45

领域: cs.LG

下载: http://arxiv.org/abs/2405.18805v1

Blind Data Adaptation to tackle Covariate Shift in Operational Steganalysis

The proliferation of image manipulation for unethical purposes poses significant challenges in social networks. One particularly concerning method is Image Steganography, allowing individuals to hide illegal information in digital images without arousing suspicions. Such a technique pose severe security risks, making it crucial to develop effective steganalysis methods enabling to detect manipulated images for clandestine communications. Although significant advancements have been achieved with machine learning models, a critical issue remains: the disparity between the controlled datasets used to train steganalysis models against real-world datasets of forensic practitioners, undermining severely the practical effectiveness of standardized steganalysis models. In this paper, we address this issue focusing on a realistic scenario where practitioners lack crucial information about the limited target set of images under analysis, including details about their development process and even whereas it contains manipulated images or not. By leveraging geometric alignment and distribution matching of source and target residuals, we develop TADA (Target Alignment through Data Adaptation), a novel methodology enabling to emulate sources aligned with specific targets in steganalysis, which is also relevant for highly unbalanced targets. The emulator is represented by a light convolutional network trained to align distributions of image residuals. Experimental validation demonstrates the potential of our strategy over traditional methods fighting covariate shift in steganalysis.

Updated: 2024-05-29 06:47:30

标题: 盲目数据适应以解决操作隐写分析中的协变量转移

摘要: 图像操纵在社交网络中出现的不道德目的的增加，带来了重大挑战。一种特别令人担忧的方法是图像隐写术，允许个人在数字图像中隐藏非法信息而不引起怀疑。这种技术带来严重的安全风险，因此至关重要的是开发有效的隐写分析方法，能够检测用于秘密通信的操纵图像。尽管利用机器学习模型取得了重大进展，但一个关键问题仍然存在：用于训练隐写分析模型的受控数据集与法医实践者的真实世界数据集之间存在巨大差异，严重削弱了标准化隐写分析模型的实际效果。在本文中，我们解决了这个问题，重点关注一个现实情景，即从业者缺乏关于分析对象图像有限目标集的关键信息，包括有关其开发过程的细节，甚至不确定其中是否包含操纵图像。通过利用源和目标残差的几何对齐和分布匹配，我们开发了一种新方法TADA（通过数据适应实现目标对齐），该方法能够在隐写分析中模拟与特定目标对齐的源，对高度不平衡的目标也具有相关性。模拟器由一个轻型卷积网络表示，经过训练可以对齐图像残差的分布。实验验证证实了我们的策略相对于传统方法在对抗隐写分析中的协变量转移方面的潜力。

更新时间: 2024-05-29 06:47:30

领域: eess.IV,cs.AI,cs.CR,cs.MM

下载: http://arxiv.org/abs/2405.16961v2

Enhancing Security and Privacy in Federated Learning using Update Digests and Voting-Based Defense

Federated Learning (FL) is a promising privacy-preserving machine learning paradigm that allows data owners to collaboratively train models while keeping their data localized. Despite its potential, FL faces challenges related to the trustworthiness of both clients and servers, especially in the presence of curious or malicious adversaries. In this paper, we introduce a novel framework named \underline{\textbf{F}}ederated \underline{\textbf{L}}earning with \underline{\textbf{U}}pdate \underline{\textbf{D}}igest (FLUD), which addresses the critical issues of privacy preservation and resistance to Byzantine attacks within distributed learning environments. FLUD utilizes an innovative approach, the $\mathsf{LinfSample}$ method, allowing clients to compute the $l_{\infty}$ norm across sliding windows of updates as an update digest. This digest enables the server to calculate a shared distance matrix, significantly reducing the overhead associated with Secure Multi-Party Computation (SMPC) by three orders of magnitude while effectively distinguishing between benign and malicious updates. Additionally, FLUD integrates a privacy-preserving, voting-based defense mechanism that employs optimized SMPC protocols to minimize communication rounds. Our comprehensive experiments demonstrate FLUD's effectiveness in countering Byzantine adversaries while incurring low communication and runtime overhead. FLUD offers a scalable framework for secure and reliable FL in distributed environments, facilitating its application in scenarios requiring robust data management and security.

Updated: 2024-05-29 06:46:10

标题: 使用更新摘要和基于投票的防御加强联合学习中的安全性和隐私保护

摘要: 联邦学习（FL）是一种有前途的隐私保护机器学习范式，允许数据所有者在保持数据本地化的同时协作训练模型。尽管具有潜力，但FL面临与客户端和服务器的可信度相关的挑战，特别是在存在好奇或恶意对手的情况下。在本文中，我们引入了一个名为FLUD的新框架，该框架解决了在分布式学习环境中隐私保护和对拜占庭攻击的抵抗这一关键问题。FLUD利用一种创新方法，即$\mathsf{LinfSample}$方法，允许客户端计算更新的滑动窗口上的$l_{\infty}$范数作为更新摘要。该摘要使服务器能够计算一个共享距离矩阵，显著减少了与安全多方计算（SMPC）相关的开销，同时有效区分良性和恶意更新。此外，FLUD集成了一种隐私保护的基于投票的防御机制，使用优化的SMPC协议来最小化通信轮次。我们全面的实验表明，FLUD在对抗拜占庭对手时具有有效性，同时产生较低的通信和运行时开销。FLUD为分布式环境中的安全可靠FL提供了可扩展的框架，促进其在需要强大数据管理和安全性的场景中的应用。

更新时间: 2024-05-29 06:46:10

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2405.18802v1

On Efficient and Statistical Quality Estimation for Data Annotation

Annotated datasets are an essential ingredient to train, evaluate, compare and productionalize supervised machine learning models. It is therefore imperative that annotations are of high quality. For their creation, good quality management and thereby reliable quality estimates are needed. Then, if quality is insufficient during the annotation process, rectifying measures can be taken to improve it. Quality estimation is often performed by having experts manually label instances as correct or incorrect. But checking all annotated instances tends to be expensive. Therefore, in practice, usually only subsets are inspected; sizes are chosen mostly without justification or regard to statistical power and more often than not, are relatively small. Basing estimates on small sample sizes, however, can lead to imprecise values for the error rate. Using unnecessarily large sample sizes costs money that could be better spent, for instance on more annotations. Therefore, we first describe in detail how to use confidence intervals for finding the minimal sample size needed to estimate the annotation error rate. Then, we propose applying acceptance sampling as an alternative to error rate estimation We show that acceptance sampling can reduce the required sample sizes up to 50% while providing the same statistical guarantees.

Updated: 2024-05-29 06:43:37

标题: 关于数据标注的高效和统计质量估计

摘要: Annotated datasets are an essential component for training, evaluating, comparing, and deploying supervised machine learning models. High-quality annotations are crucial for these tasks. To ensure quality, good quality management and reliable quality estimates are necessary during the annotation process. Quality estimation is typically done by experts manually labeling instances as correct or incorrect, but checking all instances can be costly. Therefore, only subsets are usually inspected, often with relatively small sample sizes chosen without proper justification. Using small sample sizes can lead to imprecise error rate values, while using unnecessarily large sample sizes is wasteful. To address this issue, this paper describes how confidence intervals can be used to determine the minimal sample size needed for estimating the annotation error rate. Additionally, the paper proposes using acceptance sampling as an alternative method for error rate estimation, which can reduce required sample sizes by up to 50% while providing the same statistical guarantees.

更新时间: 2024-05-29 06:43:37

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2405.11919v2

Federated Q-Learning with Reference-Advantage Decomposition: Almost Optimal Regret and Logarithmic Communication Cost

In this paper, we consider model-free federated reinforcement learning for tabular episodic Markov decision processes. Under the coordination of a central server, multiple agents collaboratively explore the environment and learn an optimal policy without sharing their raw data. Despite recent advances in federated Q-learning algorithms achieving near-linear regret speedup with low communication cost, existing algorithms only attain suboptimal regrets compared to the information bound. We propose a novel model-free federated Q-learning algorithm, termed FedQ-Advantage. Our algorithm leverages reference-advantage decomposition for variance reduction and operates under two distinct mechanisms: synchronization between the agents and the server, and policy update, both triggered by events. We prove that our algorithm not only requires a lower logarithmic communication cost but also achieves an almost optimal regret, reaching the information bound up to a logarithmic factor and near-linear regret speedup compared to its single-agent counterpart when the time horizon is sufficiently large.

Updated: 2024-05-29 06:26:52

标题: 联邦式Q学习与参考优势分解：几乎最优遗憾和对数通信成本

摘要: 在这篇论文中，我们考虑了针对表格式的周期性马尔可夫决策过程的无模型联邦强化学习。在一个中央服务器的协调下，多个代理共同探索环境并学习最优策略，而不共享原始数据。尽管最近在联邦Q学习算法方面取得了近乎线性的遗憾加速和低通信成本，但现有算法与信息界相比仅达到次优遗憾。我们提出了一种新颖的无模型联邦Q学习算法，称为FedQ-Advantage。我们的算法利用参考-优势分解来减少方差，并在两个不同的机制下运行：代理与服务器之间的同步，以及由事件触发的策略更新。我们证明了我们的算法不仅需要更低的对数通信成本，而且在时间跨度足够大时，达到了几乎最优的遗憾，达到了信息界，与其单一代理对应物相比有近乎线性的遗憾加速。

更新时间: 2024-05-29 06:26:52

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2405.18795v1

Test-Time Model Adaptation with Only Forward Passes

Test-time adaptation has proven effective in adapting a given trained model to unseen test samples with potential distribution shifts. However, in real-world scenarios, models are usually deployed on resource-limited devices, e.g., FPGAs, and are often quantized and hard-coded with non-modifiable parameters for acceleration. In light of this, existing methods are often infeasible since they heavily depend on computation-intensive backpropagation for model updating that may be not supported. To address this, we propose a test-time Forward-Optimization Adaptation (FOA) method. In FOA, we seek to solely learn a newly added prompt (as model's input) via a derivative-free covariance matrix adaptation evolution strategy. To make this strategy work stably under our online unsupervised setting, we devise a novel fitness function by measuring test-training statistic discrepancy and model prediction entropy. Moreover, we design an activation shifting scheme that directly tunes the model activations for shifted test samples, making them align with the source training domain, thereby further enhancing adaptation performance. Without using any backpropagation and altering model weights, FOA runs on quantized 8-bit ViT outperforms gradient-based TENT on full-precision 32-bit ViT, while achieving an up to 24-fold memory reduction on ImageNet-C.

Updated: 2024-05-29 06:24:55

标题: 只需前向传递的测试时间模型自适应

摘要: 测试时适应性已被证明在将训练好的模型适应到具有潜在分布变化的未见测试样本上是有效的。然而，在现实世界的场景中，模型通常部署在资源有限的设备上，例如FPGA，并且通常会被量化和硬编码为不可修改的参数以进行加速。鉴于此，现有方法通常是不可行的，因为它们严重依赖计算密集型的反向传播进行模型更新，这可能不被支持。为了解决这个问题，我们提出了一种测试时的前向优化适应（FOA）方法。在FOA中，我们试图通过无导数的协方差矩阵适应进化策略仅学习一个新添加的提示（作为模型的输入）。为了使这种策略在我们的在线无监督设置下稳定运行，我们设计了一个通过测量测试训练统计差异和模型预测熵来衡量的新颖适应函数。此外，我们设计了一个激活偏移方案，直接调整模型对移位测试样本的激活，使它们与源训练域对齐，从而进一步增强适应性能。在不使用任何反向传播和改变模型权重的情况下，FOA在量化的8位ViT上优于基于梯度的32位全精度ViT上的TENT，同时在ImageNet-C上实现了高达24倍的内存减少。

更新时间: 2024-05-29 06:24:55

领域: cs.LG

下载: http://arxiv.org/abs/2404.01650v2

Adaptive Discretization-based Non-Episodic Reinforcement Learning in Metric Spaces

We study non-episodic Reinforcement Learning for Lipschitz MDPs in which state-action space is a metric space, and the transition kernel and rewards are Lipschitz functions. We develop computationally efficient UCB-based algorithm, $\textit{ZoRL-}\epsilon$ that adaptively discretizes the state-action space and show that their regret as compared with $\epsilon$-optimal policy is bounded as $\mathcal{O}(\epsilon^{-(2 d_\mathcal{S} + d^\epsilon_z + 1)}\log{(T)})$, where $d^\epsilon_z$ is the $\epsilon$-zooming dimension. In contrast, if one uses the vanilla $\textit{UCRL-}2$ on a fixed discretization of the MDP, the regret w.r.t. a $\epsilon$-optimal policy scales as $\mathcal{O}(\epsilon^{-(2 d_\mathcal{S} + d + 1)}\log{(T)})$ so that the adaptivity gains are huge when $d^\epsilon_z \ll d$. Note that the absolute regret of any 'uniformly good' algorithm for a large family of continuous MDPs asymptotically scales as at least $\Omega(\log{(T)})$. Though adaptive discretization has been shown to yield $\mathcal{\tilde{O}}(H^{2.5}K^\frac{d_z + 1}{d_z + 2})$ regret in episodic RL, an attempt to extend this to the non-episodic case by employing constant duration episodes whose duration increases with $T$, is futile since $d_z \to d$ as $T \to \infty$. The current work shows how to obtain adaptivity gains for non-episodic RL. The theoretical results are supported by simulations on two systems where the performance of $\textit{ZoRL-}\epsilon$ is compared with that of '$\textit{UCRL-C}$,' the fixed discretization-based extension of $\textit{UCRL-}2$ for systems with continuous state-action spaces.

Updated: 2024-05-29 06:18:09

标题: 度量空间中基于自适应离散化的非时序强化学习

摘要: 我们研究了非记忆式的Lipschitz MDP强化学习，其中状态-动作空间是度量空间，转移核和奖励是Lipschitz函数。我们开发了计算效率高的基于UCB的算法，$\textit{ZoRL-}\epsilon$，该算法自适应地离散化状态-动作空间，并且证明了与$\epsilon$-最优策略相比，它们的遗憾被限制为$\mathcal{O}(\epsilon^{-(2 d_\mathcal{S} + d^\epsilon_z + 1)}\log{(T)})$，其中$d^\epsilon_z$是$\epsilon$-缩放维度。相比之下，如果在MDP的固定离散化上使用基本的$\textit{UCRL-}2$，则遗憾相对于$\epsilon$-最优策略的规模为$\mathcal{O}(\epsilon^{-(2 d_\mathcal{S} + d + 1)}\log{(T)})$，因此在$d^\epsilon_z \ll d$时，自适应性增益是巨大的。值得注意的是，对于大量连续MDP的“均匀好”算法的绝对遗憾至少渐近地以$\Omega(\log{(T)})$的速度增长。尽管经证明自适应离散化在记忆式RL中产生了$\mathcal{\tilde{O}}(H^{2.5}K^\frac{d_z + 1}{d_z + 2})$的遗憾，但是试图通过使用随着$T$增加而增加持续时间的恒定持续时间情节来将其扩展到非记忆式情况是徒劳的，因为随着$T \to \infty$，$d_z \to d$。当前工作展示了如何获得非记忆式RL的自适应性增益。理论结果得到了在两个系统上的模拟支持，其中将$\textit{ZoRL-}\epsilon$的性能与具有连续状态-动作空间的系统的固定离散化扩展$\textit{UCRL-}2$的'$\textit{UCRL-C}$'进行比较。

更新时间: 2024-05-29 06:18:09

领域: cs.LG

下载: http://arxiv.org/abs/2405.18793v1

Kernel Metric Learning for In-Sample Off-Policy Evaluation of Deterministic RL Policies

We consider off-policy evaluation (OPE) of deterministic target policies for reinforcement learning (RL) in environments with continuous action spaces. While it is common to use importance sampling for OPE, it suffers from high variance when the behavior policy deviates significantly from the target policy. In order to address this issue, some recent works on OPE proposed in-sample learning with importance resampling. Yet, these approaches are not applicable to deterministic target policies for continuous action spaces. To address this limitation, we propose to relax the deterministic target policy using a kernel and learn the kernel metrics that minimize the overall mean squared error of the estimated temporal difference update vector of an action value function, where the action value function is used for policy evaluation. We derive the bias and variance of the estimation error due to this relaxation and provide analytic solutions for the optimal kernel metric. In empirical studies using various test domains, we show that the OPE with in-sample learning using the kernel with optimized metric achieves significantly improved accuracy than other baselines.

Updated: 2024-05-29 06:17:33

标题: 基于核度量学习的确定性RL策略样本内离策略评估

摘要: 我们考虑在连续动作空间环境中对确定性目标策略进行脱机评估（OPE）的强化学习（RL）。尽管通常使用重要性抽样进行OPE，但当行为策略与目标策略显着偏离时，重要性抽样会受到高方差的影响。为了解决这个问题，一些关于OPE的最近研究提出了使用重要性重采样的样本内学习。然而，这些方法不适用于连续动作空间的确定性目标策略。为了解决这个限制，我们建议使用核函数放宽确定性目标策略，并学习最小化动作值函数的估计时间差更新向量的整体均方误差的核度量，其中动作值函数用于策略评估。我们推导了由于这种放松而导致的估计误差的偏差和方差，并为最佳核度量提供了解析解。在使用各种测试领域进行的实证研究中，我们展示了使用经过优化度量的核进行样本内学习的OPE比其他基线方法显著提高了准确性。

更新时间: 2024-05-29 06:17:33

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.18792v1

UrbanVLP: Multi-Granularity Vision-Language Pretraining for Urban Region Profiling

Urban region profiling aims to learn a low-dimensional representation of a given urban area while preserving its characteristics, such as demographics, infrastructure, and economic activities, for urban planning and development. However, prevalent pretrained models, particularly those reliant on satellite imagery, face dual challenges. Firstly, concentrating solely on macro-level patterns from satellite data may introduce bias, lacking nuanced details at micro levels, such as architectural details at a place.Secondly, the lack of interpretability in pretrained models limits their utility in providing transparent evidence for urban planning. In response to these issues, we devise a novel framework entitled UrbanVLP based on Vision-Language Pretraining. Our UrbanVLP seamlessly integrates multi-granularity information from both macro (satellite) and micro (street-view) levels, overcoming the limitations of prior pretrained models. Moreover, it introduces automatic text generation and calibration, elevating interpretability in downstream applications by producing high-quality text descriptions of urban imagery. Rigorous experiments conducted across six urban indicator prediction tasks underscore its superior performance.

Updated: 2024-05-29 06:11:30

标题: UrbanVLP：城市区域配置的多粒度视觉语言预训练

摘要: 城市区域剖析旨在学习一个给定城市区域的低维表示，同时保留其特征，如人口统计、基础设施和经济活动，用于城市规划和发展。然而，目前流行的预训练模型，特别是那些依赖卫星图像的模型，面临着双重挑战。首先，仅集中在卫星数据的宏观级别模式上可能引入偏见，缺乏微观级别的细节，比如在某个地方的建筑细节。其次，预训练模型的可解释性不足限制了它们在提供城市规划透明证据方面的实用性。为了解决这些问题，我们设计了一个基于视觉-语言预训练的新框架，名为UrbanVLP。我们的UrbanVLP无缝集成了来自宏观（卫星）和微观（街景）级别的多粒度信息，克服了先前预训练模型的限制。此外，它引入了自动生成文本和校准，通过生成高质量的城市图像描述提高了下游应用中的可解释性。对六个城市指标预测任务进行的严格实验强调了其卓越的性能。

更新时间: 2024-05-29 06:11:30

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2403.16831v2

Insights from the Design Space Exploration of Flow-Guided Nanoscale Localization

Nanodevices with Terahertz (THz)-based wireless communication capabilities are providing a primer for flow-guided localization within the human bloodstreams. Such localization is allowing for assigning the locations of sensed events with the events themselves, providing benefits in precision medicine along the lines of early and precise diagnostics, and reduced costs and invasiveness. Flow-guided localization is still in a rudimentary phase, with only a handful of works targeting the problem. Nonetheless, the performance assessments of the proposed solutions are already carried out in a non-standardized way, usually along a single performance metric, and ignoring various aspects that are relevant at such a scale (e.g., nanodevices' limited energy) and for such a challenging environment (e.g., extreme attenuation of in-body THz propagation). As such, these assessments feature low levels of realism and cannot be compared in an objective way. Toward addressing this issue, we account for the environmental and scale-related peculiarities of the scenario and assess the performance of two state-of-the-art flow-guided localization approaches along a set of heterogeneous performance metrics such as the accuracy and reliability of localization.

Updated: 2024-05-29 06:05:27

标题: Flow-Guided纳米尺度定位的设计空间探索中的见解

摘要: 具有太赫兹（THz）无线通信能力的纳米器件为人体血液中的流动导向定位提供了先导。这种定位允许将感知事件的位置与事件本身进行关联，从而在早期和精准诊断、降低成本和侵入性方面为精准医学带来益处。流动导向定位仍处于初级阶段，只有少数作品针对这一问题。然而，所提出的解决方案的性能评估已经以非标准化方式进行，通常沿着单一性能指标进行，并忽视了在这一规模（如纳米器件的能量有限）和在这样具有挑战性的环境（如体内THz传播的极端衰减）下相关的各种方面。因此，这些评估具有低水平的现实性，不能以客观方式进行比较。为了解决这个问题，我们考虑了场景的环境和规模相关特性，并评估了两种最先进的流动导向定位方法的性能，包括准确性和可靠性等一系列异质性性能指标。

更新时间: 2024-05-29 06:05:27

领域: cs.NI,cs.LG,eess.SP

下载: http://arxiv.org/abs/2305.18493v2

MOKD: Cross-domain Finetuning for Few-shot Classification via Maximizing Optimized Kernel Dependence

In cross-domain few-shot classification, \emph{nearest centroid classifier} (NCC) aims to learn representations to construct a metric space where few-shot classification can be performed by measuring the similarities between samples and the prototype of each class. An intuition behind NCC is that each sample is pulled closer to the class centroid it belongs to while pushed away from those of other classes. However, in this paper, we find that there exist high similarities between NCC-learned representations of two samples from different classes. In order to address this problem, we propose a bi-level optimization framework, \emph{maximizing optimized kernel dependence} (MOKD) to learn a set of class-specific representations that match the cluster structures indicated by labeled data of the given task. Specifically, MOKD first optimizes the kernel adopted in \emph{Hilbert-Schmidt independence criterion} (HSIC) to obtain the optimized kernel HSIC (opt-HSIC) that can capture the dependence more precisely. Then, an optimization problem regarding the opt-HSIC is addressed to simultaneously maximize the dependence between representations and labels and minimize the dependence among all samples. Extensive experiments on Meta-Dataset demonstrate that MOKD can not only achieve better generalization performance on unseen domains in most cases but also learn better data representation clusters. The project repository of MOKD is available at: \href{https://github.com/tmlr-group/MOKD}{https://github.com/tmlr-group/MOKD}.

Updated: 2024-05-29 05:59:52

标题: MOKD：通过最大化优化核依赖进行跨域微调以进行少样本分类

摘要: 在跨领域少样本分类中，最近质心分类器(NCC)旨在学习表示以构建一个度量空间，其中可以通过测量样本与每个类别原型之间的相似性来进行少样本分类。NCC背后的直觉是，每个样本被拉近到其所属的类别质心，同时被推离其他类别的质心。然而，在本文中，我们发现NCC学习的表示中存在两个来自不同类别的样本之间的高相似性。为了解决这个问题，我们提出了一个双层优化框架，最大化优化核依赖(MOKD)来学习一组与给定任务的标记数据所指示的聚类结构相匹配的类别特定表示。具体来说，MOKD首先优化了在Hilbert-Schmidt独立准则(HSIC)中采用的核，以获得可以更精确捕捉依赖关系的优化核HSIC(opt-HSIC)。然后，解决了一个关于opt-HSIC的优化问题，以同时最大化表示和标签之间的依赖关系，并最小化所有样本之间的依赖关系。在Meta-Dataset上的大量实验证明，MOKD不仅在大多数情况下可以在未见领域上实现更好的泛化性能，而且可以学习到更好的数据表示聚类。MOKD的项目存储库可在以下链接找到：https://github.com/tmlr-group/MOKD。

更新时间: 2024-05-29 05:59:52

领域: cs.LG,cs.CV

下载: http://arxiv.org/abs/2405.18786v1

Fairness-aware Federated Minimax Optimization with Convergence Guarantee

Federated learning (FL) has garnered considerable attention due to its privacy-preserving feature. Nonetheless, the lack of freedom in managing user data can lead to group fairness issues, where models are biased towards sensitive factors such as race or gender. To tackle this issue, this paper proposes a novel algorithm, fair federated averaging with augmented Lagrangian method (FFALM), designed explicitly to address group fairness issues in FL. Specifically, we impose a fairness constraint on the training objective and solve the minimax reformulation of the constrained optimization problem. Then, we derive the theoretical upper bound for the convergence rate of FFALM. The effectiveness of FFALM in improving fairness is shown empirically on CelebA and UTKFace datasets in the presence of severe statistical heterogeneity.

Updated: 2024-05-29 05:58:59

标题: 具有收敛保证的公平感知联邦极小化优化

摘要: 联邦学习（FL）由于其保护隐私的特性而受到了广泛关注。然而，对用户数据管理的缺乏自由可能导致群体公平性问题，即模型偏向于种族或性别等敏感因素。为了解决这个问题，本文提出了一种新颖的算法，名为带增广拉格朗日方法的公平联邦平均算法（FFALM），专门设计用于解决FL中的群体公平性问题。具体来说，我们在训练目标上施加了公平约束，并解决了受限制优化问题的极小极大重述。然后，我们推导了FFALM收敛速度的理论上限。实证结果表明，在存在严重的统计异质性的CelebA和UTKFace数据集中，FFALM在提高公平性方面的有效性。

更新时间: 2024-05-29 05:58:59

领域: cs.LG,cs.CY

下载: http://arxiv.org/abs/2307.04417v3

Finite-Time Analysis of Three-Timescale Constrained Actor-Critic and Constrained Natural Actor-Critic Algorithms

Actor Critic methods have found immense applications on a wide range of Reinforcement Learning tasks especially when the state-action space is large. In this paper, we consider actor critic and natural actor critic algorithms with function approximation for constrained Markov decision processes (C-MDP) involving inequality constraints and carry out a non-asymptotic analysis for both of these algorithms in a non-i.i.d (Markovian) setting. We consider the long-run average cost criterion where both the objective and the constraint functions are suitable policy-dependent long-run averages of certain prescribed cost functions. We handle the inequality constraints using the Lagrange multiplier method. We prove that these algorithms are guaranteed to find a first-order stationary point (i.e., $\Vert \nabla L(\theta,\gamma)\Vert_2^2 \leq \epsilon$) of the performance (Lagrange) function $L(\theta,\gamma)$, with a sample complexity of $\mathcal{\tilde{O}}(\epsilon^{-2.5})$ in the case of both Constrained Actor Critic (C-AC) and Constrained Natural Actor Critic (C-NAC) algorithms. We also show the results of experiments on three different Safety-Gym environments.

Updated: 2024-05-29 05:54:19

标题: 有限时间分析三时间尺度受限演员-评论家和受限自然演员-评论家算法

摘要: 演员评论方法在广泛的强化学习任务中找到了巨大的应用，特别是当状态-动作空间很大时。本文考虑了带有函数逼近的受限马尔可夫决策过程（C-MDP）的演员评论和自然演员评论算法，涉及不等式约束，并在非独立同分布（马尔可夫）环境中对这两种算法进行了非渐近分析。我们考虑了长期平均成本准则，其中目标和约束函数均为某些规定成本函数的适当依赖策略的长期平均值。我们使用拉格朗日乘子方法处理不等式约束。我们证明这些算法保证能够找到性能（拉格朗日）函数$L(\theta,\gamma)$的一阶稳定点（即$\Vert \nabla L(\theta,\gamma)\Vert_2^2 \leq \epsilon$），在受限演员评论（C-AC）和受限自然演员评论（C-NAC）算法的情况下，其样本复杂度为$\mathcal{\tilde{O}}(\epsilon^{-2.5})$。我们还展示了在三种不同的Safety-Gym环境中的实验结果。

更新时间: 2024-05-29 05:54:19

领域: cs.LG

下载: http://arxiv.org/abs/2310.16363v3

Generalization Study of Quantum Neural Network

Generalization is an important feature of neural network, and there have been many studies on it. Recently, with the development of quantum compu-ting, it brings new opportunities. In this paper, we studied a class of quantum neural network constructed by quantum gate. In this model, we mapped the feature data to a quantum state in Hilbert space firstly, and then implement unitary evolution on it, in the end, we can get the classification result by im-plement measurement on the quantum state. Since all the operations in quan-tum neural networks are unitary, the parameters constitute a hypersphere of Hilbert space. Compared with traditional neural network, the parameter space is flatter. Therefore, it is not easy to fall into local optimum, which means the quantum neural networks have better generalization. In order to validate our proposal, we evaluated our model on three public datasets, the results demonstrated that our model has better generalization than the classical neu-ral network with the same structure.

Updated: 2024-05-29 05:43:08

标题: 量子神经网络的泛化研究

摘要: 泛化是神经网络的一个重要特征，已经有许多研究对此进行了探讨。最近，随着量子计算技术的发展，为其带来了新的机遇。本文研究了一类由量子门构建的量子神经网络。在这个模型中，我们首先将特征数据映射到希尔伯特空间中的量子态，然后对其进行酉演化，最后通过对量子态进行测量，我们可以得到分类结果。由于量子神经网络中的所有操作都是酉操作，参数构成了希尔伯特空间的一个超球体。与传统神经网络相比，参数空间更为平坦。因此，不容易陷入局部最优解，这意味着量子神经网络具有更好的泛化能力。为了验证我们的提议，我们在三个公共数据集上评估了我们的模型，结果表明我们的模型比具有相同结构的经典神经网络具有更好的泛化性能。

更新时间: 2024-05-29 05:43:08

领域: quant-ph,cs.LG,stat.ML,I.2.6

下载: http://arxiv.org/abs/2006.02388v2

On the Role of Attention Masks and LayerNorm in Transformers

Self-attention is the key mechanism of transformers, which are the essential building blocks of modern foundation models. Recent studies have shown that pure self-attention suffers from an increasing degree of rank collapse as depth increases, limiting model expressivity and further utilization of model depth. The existing literature on rank collapse, however, has mostly overlooked other critical components in transformers that may alleviate the rank collapse issue. In this paper, we provide a general analysis of rank collapse under self-attention, taking into account the effects of attention masks and layer normalization (LayerNorm). In particular, we find that although pure masked attention still suffers from exponential collapse to a rank one subspace, local masked attention can provably slow down the collapse rate. In the case of self-attention with LayerNorm, we first show that for certain classes of value matrices, collapse to a rank one subspace still happens exponentially. However, through construction of nontrivial counterexamples, we then establish that with proper choice of value matrices, a general class of sequences may not converge to a rank one subspace, and the self-attention dynamics with LayerNorm can simultaneously possess a rich set of equilibria with any possible rank between one and full. Our result refutes the previous hypothesis that LayerNorm plays no role in the rank collapse of self-attention and suggests that self-attention with LayerNorm constitutes a much more expressive, versatile nonlinear dynamical system than what was originally thought.

Updated: 2024-05-29 05:41:28

标题: 关于注意力掩码和层归一化在Transformer中的作用

摘要: 自注意力是变压器的关键机制，变压器是现代基础模型的基本构建模块。最近的研究表明，纯自注意力在深度增加时会出现越来越严重的秩坍缩问题，限制了模型的表达能力和进一步利用模型深度的可能性。然而，现有文献大多忽视了变压器中可能缓解秩坍缩问题的其他关键组件。在本文中，我们对自注意力下的秩坍缩进行了一般性分析，考虑了注意力蒙版和层归一化（LayerNorm）的影响。特别是，我们发现尽管纯蒙版注意力仍然会指数级地坍缩为秩一子空间，但局部蒙版注意力可以明显减缓坍缩速率。在带有LayerNorm的自注意力情况下，我们首先证明对于某些类别的值矩阵，仍然会指数级地坍缩为秩一子空间。然而，通过构建非平凡的反例，我们进一步确定了通过适当选择值矩阵，一般类别的序列可能不会收敛到秩一子空间，并且带有LayerNorm的自注意力动力学可以同时具有在秩一和满秩之间的任意秩的丰富平衡点集。我们的结果否定了以前的假设，即LayerNorm在自注意力的秩坍缩中没有作用，并且暗示了带有LayerNorm的自注意力构成了一个比最初想象的更具表达力、灵活性的非线性动力系统。

更新时间: 2024-05-29 05:41:28

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/2405.18781v1

A Graph is Worth $K$ Words: Euclideanizing Graph using Pure Transformer

Can we model Non-Euclidean graphs as pure language or even Euclidean vectors while retaining their inherent information? The Non-Euclidean property have posed a long term challenge in graph modeling. Despite recent graph neural networks and graph transformers efforts encoding graphs as Euclidean vectors, recovering the original graph from vectors remains a challenge. In this paper, we introduce GraphsGPT, featuring an Graph2Seq encoder that transforms Non-Euclidean graphs into learnable Graph Words in the Euclidean space, along with a GraphGPT decoder that reconstructs the original graph from Graph Words to ensure information equivalence. We pretrain GraphsGPT on $100$M molecules and yield some interesting findings: (1) The pretrained Graph2Seq excels in graph representation learning, achieving state-of-the-art results on $8/9$ graph classification and regression tasks. (2) The pretrained GraphGPT serves as a strong graph generator, demonstrated by its strong ability to perform both few-shot and conditional graph generation. (3) Graph2Seq+GraphGPT enables effective graph mixup in the Euclidean space, overcoming previously known Non-Euclidean challenges. (4) The edge-centric pretraining framework GraphsGPT demonstrates its efficacy in graph domain tasks, excelling in both representation and generation. Code is available at \href{https://github.com/A4Bio/GraphsGPT}{GitHub}.

Updated: 2024-05-29 05:40:35

标题: 一张图值$K$个词：使用纯Transformer将图欧几里得化

摘要: 我们能够将非欧几里得图建模为纯语言甚至欧几里得向量，同时保留它们固有的信息吗？非欧几里得特性在图建模中提出了长期挑战。尽管最近的图神经网络和图变换器努力将图编码为欧几里得向量，但从向量中恢复原始图仍然是一个挑战。在本文中，我们介绍了GraphsGPT，其中包含一个将非欧几里得图转换为可学习的图词（Graph Words）在欧几里得空间中的Graph2Seq编码器，以及一个从图词重新构建原始图的GraphGPT解码器，以确保信息的等效性。我们在1亿个分子上对GraphsGPT进行预训练，并得出了一些有趣的发现：（1）预训练的Graph2Seq在图表示学习方面表现出色，在8/9个图分类和回归任务上取得了最先进的结果。（2）预训练的GraphGPT作为一个强大的图生成器，其强大的能力在于进行少样本和有条件的图生成。（3）Graph2Seq+GraphGPT在欧几里得空间中实现了有效的图混合，克服了先前已知的非欧几里得挑战。（4）以边为中心的预训练框架GraphsGPT在图领域任务中表现出其有效性，在表示和生成方面都表现出色。代码可在GitHub上找到：https://github.com/A4Bio/GraphsGPT。

更新时间: 2024-05-29 05:40:35

领域: cs.LG,cs.AI,cs.SI

下载: http://arxiv.org/abs/2402.02464v3

Quantitative Certification of Bias in Large Language Models

Large Language Models (LLMs) can produce responses that exhibit social biases and support stereotypes. However, conventional benchmarking is insufficient to thoroughly evaluate LLM bias, as it can not scale to large sets of prompts and provides no guarantees. Therefore, we propose a novel certification framework QuaCer-B (Quantitative Certification of Bias) that provides formal guarantees on obtaining unbiased responses from target LLMs under large sets of prompts. A certificate consists of high-confidence bounds on the probability of obtaining biased responses from the LLM for any set of prompts containing sensitive attributes, sampled from a distribution. We illustrate the bias certification in LLMs for prompts with various prefixes drawn from given distributions. We consider distributions of random token sequences, mixtures of manual jailbreaks, and jailbreaks in the LLM's embedding space to certify its bias. We certify popular LLMs with QuaCer-B and present novel insights into their biases.

Updated: 2024-05-29 05:39:37

标题: 大型语言模型中偏见的定量认证

摘要: 大型语言模型（LLMs）可能产生展现出社会偏见并支持刻板印象的回应。然而，传统的基准测试无法充分评估LLM的偏见，因为它无法扩展到大量提示集并且没有提供保证。因此，我们提出了一个新颖的认证框架QuaCer-B（偏见的定量认证），它可以提供关于在大量提示集下从目标LLMs获得无偏见回应的正式保证。认证包括对从分布中抽样的包含敏感属性的任何提示集获取LLM偏见回应概率的高置信度界限。我们通过对具有不同前缀的提示进行偏见认证，这些前缀是从给定分布中抽取的。我们考虑随机令牌序列、手动越狱的混合以及在LLM的嵌入空间中的越狱分布。我们使用QuaCer-B对流行的LLMs进行认证，并对它们的偏见提出新颖见解。

更新时间: 2024-05-29 05:39:37

领域: cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.18780v1

SPABA: A Single-Loop and Probabilistic Stochastic Bilevel Algorithm Achieving Optimal Sample Complexity

While stochastic bilevel optimization methods have been extensively studied for addressing large-scale nested optimization problems in machine learning, it remains an open question whether the optimal complexity bounds for solving bilevel optimization are the same as those in single-level optimization. Our main result resolves this question: SPABA, an adaptation of the PAGE method for nonconvex optimization in (Li et al., 2021) to the bilevel setting, can achieve optimal sample complexity in both the finite-sum and expectation settings. We show the optimality of SPABA by proving that there is no gap in complexity analysis between stochastic bilevel and single-level optimization when implementing PAGE. Notably, as indicated by the results of (Dagr\'eou et al., 2022), there might exist a gap in complexity analysis when implementing other stochastic gradient estimators, like SGD and SAGA. In addition to SPABA, we propose several other single-loop stochastic bilevel algorithms, that either match or improve the state-of-the-art sample complexity results, leveraging our convergence rate and complexity analysis. Numerical experiments demonstrate the superior practical performance of the proposed methods.

Updated: 2024-05-29 05:36:03

标题: SPABA：一种单循环和概率随机双层算法实现最优样本复杂性

摘要: 尽管随机双层优化方法已被广泛研究用于解决机器学习中的大规模嵌套优化问题，但解决双层优化的最优复杂度界限是否与单层优化相同仍然是一个开放问题。我们的主要结果解决了这个问题：SPABA，是对非凸优化中的PAGE方法（Li等人，2021）进行调整以适用于双层设置，可以在有限和期望设置下实现最优的样本复杂度。我们通过证明在实施PAGE时，在随机双层和单层优化之间没有复杂度分析差距来展示SPABA的最优性。值得注意的是，正如（Dagr\'eou等人，2022）的结果表明，当实施其他随机梯度估计器时，如SGD和SAGA，可能存在复杂度分析差距。除了SPABA，我们提出了几种其他单循环随机双层算法，这些算法要么匹配，要么优于现有的样本复杂度结果，利用我们的收敛速度和复杂度分析。数值实验证明了所提方法的优越实用性能。

更新时间: 2024-05-29 05:36:03

领域: math.OC,cs.LG

下载: http://arxiv.org/abs/2405.18777v1

MIST: Defending Against Membership Inference Attacks Through Membership-Invariant Subspace Training

In Member Inference (MI) attacks, the adversary try to determine whether an instance is used to train a machine learning (ML) model. MI attacks are a major privacy concern when using private data to train ML models. Most MI attacks in the literature take advantage of the fact that ML models are trained to fit the training data well, and thus have very low loss on training instances. Most defenses against MI attacks therefore try to make the model fit the training data less well. Doing so, however, generally results in lower accuracy. We observe that training instances have different degrees of vulnerability to MI attacks. Most instances will have low loss even when not included in training. For these instances, the model can fit them well without concerns of MI attacks. An effective defense only needs to (possibly implicitly) identify instances that are vulnerable to MI attacks and avoids overfitting them. A major challenge is how to achieve such an effect in an efficient training process. Leveraging two distinct recent advancements in representation learning: counterfactually-invariant representations and subspace learning methods, we introduce a novel Membership-Invariant Subspace Training (MIST) method to defend against MI attacks. MIST avoids overfitting the vulnerable instances without significant impact on other instances. We have conducted extensive experimental studies, comparing MIST with various other state-of-the-art (SOTA) MI defenses against several SOTA MI attacks. We find that MIST outperforms other defenses while resulting in minimal reduction in testing accuracy.

Updated: 2024-05-29 05:34:09

标题: MIST：通过成员不变子空间训练防御成员推断攻击

摘要: 在成员推理（MI）攻击中，对手试图确定一个实例是否被用来训练机器学习（ML）模型。当使用私人数据训练ML模型时，MI攻击是一个重要的隐私问题。文献中大多数MI攻击利用了ML模型被训练以很好地拟合训练数据的事实，因此在训练实例上具有非常低的损失。因此，大多数针对MI攻击的防御措施试图使模型在训练数据上拟合得更差。然而，这样做通常会导致较低的准确性。我们观察到训练实例对MI攻击具有不同程度的脆弱性。大多数实例即使不包含在训练中也会有低的损失。对于这些实例，模型可以很好地拟合它们而不必担心MI攻击。一个有效的防御只需要（可能是隐含地）识别对MI攻击脆弱的实例并避免过度拟合它们。一个主要的挑战是如何在高效的训练过程中实现这样的效果。利用最近两个不同的代表性学习进展：反事实不变表示和子空间学习方法，我们引入了一种新颖的成员不变子空间训练（MIST）方法来抵御MI攻击。MIST避免过度拟合脆弱实例而对其他实例没有显著影响。我们进行了大量实验研究，将MIST与各种其他最新的MI防御措施与几种最新的MI攻击进行了比较。我们发现MIST在超过其他防御措施的同时，测试准确性减少最小。

更新时间: 2024-05-29 05:34:09

领域: cs.CR,cs.LG

下载: http://arxiv.org/abs/2311.00919v2

Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass

Many applications today provide users with multiple auto-complete drafts as they type, including GitHub's code completion, Gmail's smart compose, and Apple's messaging auto-suggestions. Under the hood, language models support this by running an autoregressive inference pass to provide a draft. Consequently, providing $k$ drafts to the user requires running an expensive language model $k$ times. To alleviate the computation cost of running $k$ inference passes, we propose Superposed Decoding, a new decoding algorithm that generates $k$ drafts at the computation cost of one autoregressive inference pass. We achieve this by feeding a superposition of the most recent token embeddings from the $k$ drafts as input to the next decoding step of the language model. At every inference step we combine the $k$ drafts with the top-$k$ tokens to get $k^2$ new drafts and cache the $k$ most likely options, using an n-gram interpolation with minimal compute overhead to filter out incoherent generations. Our experiments show that $k$ drafts from Superposed Decoding are at least as coherent and factual as Nucleus Sampling and Greedy Decoding respectively, while being at least $2.44\times$ faster for $k\ge3$. In a compute-normalized setting, user evaluations demonstrably favor text generated by Superposed Decoding over Nucleus Sampling. Code and more examples open-sourced at https://github.com/RAIVNLab/SuperposedDecoding.

Updated: 2024-05-29 05:33:08

标题: 叠加解码：从单个自回归推断传递中获得多代数

摘要: 今天的许多应用程序在用户输入时提供多个自动完成草稿，包括GitHub的代码完成、Gmail的智能撰写和Apple的消息自动建议。在幕后，语言模型通过运行自回归推理过程来支持这一功能，以提供草稿。因此，向用户提供$k$个草稿需要运行一个昂贵的语言模型$k$次。为了减轻运行$k$次推理传递的计算成本，我们提出了一种新的解码算法Superposed Decoding，它以一个自回归推理传递的计算成本生成$k$个草稿。我们通过将$k$个草稿中最近的令牌嵌入的叠加作为输入提供给语言模型的下一个解码步骤来实现这一点。在每个推理步骤中，我们将$k$个草稿与前$k$个标记组合起来，得到$k^2$个新的草稿，并缓存$k$个最有可能的选项，使用最小的计算开销的n-gram插值来过滤出不连贯的生成结果。我们的实验表明，从Superposed Decoding生成的$k$个草稿在一致性和事实性上至少与核心采样和贪婪解码相当，而对于$k\ge3$，速度至少快$2.44$倍。在一个计算规范化的设置中，用户评价明显偏向于由Superposed Decoding生成的文本而不是核心采样。在https://github.com/RAIVNLab/SuperposedDecoding 上可以找到代码和更多示例。

更新时间: 2024-05-29 05:33:08

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2405.18400v2

LMO-DP: Optimizing the Randomization Mechanism for Differentially Private Fine-Tuning (Large) Language Models

Differentially Private Stochastic Gradient Descent (DP-SGD) and its variants have been proposed to ensure rigorous privacy for fine-tuning large-scale pre-trained language models. However, they rely heavily on the Gaussian mechanism, which may overly perturb the gradients and degrade the accuracy, especially in stronger privacy regimes (e.g., the privacy budget $\epsilon < 3$). To address such limitations, we propose a novel Language Model-based Optimal Differential Privacy (LMO-DP) mechanism, which takes the first step to enable the tight composition of accurately fine-tuning (large) language models with a sub-optimal DP mechanism, even in strong privacy regimes (e.g., $0.1\leq \epsilon<3$). Furthermore, we propose a novel offline optimal noise search method to efficiently derive the sub-optimal DP that significantly reduces the noise magnitude. For instance, fine-tuning RoBERTa-large (with 300M parameters) on the SST-2 dataset can achieve an accuracy of 92.20% (given $\epsilon=0.3$, $\delta=10^{-10}$) by drastically outperforming the Gaussian mechanism (e.g., $\sim 50\%$ for small $\epsilon$ and $\delta$). We also draw similar findings on the text generation tasks on GPT-2. Finally, to our best knowledge, LMO-DP is also the first solution to accurately fine-tune Llama-2 with strong differential privacy guarantees. The code will be released soon and available upon request.

Updated: 2024-05-29 05:32:50

标题: LMO-DP：为差分隐私微调（大型）语言模型优化随机化机制

摘要: 差分私有随机梯度下降（DP-SGD）及其变体被提出，以确保对大规模预训练语言模型进行精细调整时的严格隐私保护。然而，它们严重依赖高斯机制，可能过度扰动梯度并降低准确性，尤其在更强的隐私保护范围内（例如，隐私预算ε < 3）。为解决这些限制，我们提出了一种新颖的基于语言模型的最优差分隐私（LMO-DP）机制，它首次实现了通过次优差分隐私机制紧密精细调整（大型）语言模型，即使在强隐私保护范围内（例如，0.1 ≤ ε < 3）。此外，我们提出了一种新颖的离线最优噪音搜索方法，以高效地推导出显著降低噪音幅度的次优差分隐私。例如，通过显著优于高斯机制（例如，在小ε和δ时约50%）的方式，将RoBERTa-large（具有3亿参数）在SST-2数据集上进行精细调整，可以实现92.20%的准确度（给定ε=0.3，δ=10^-10）。我们还在GPT-2上的文本生成任务中得出了类似的发现。最后，据我们所知，LMO-DP也是第一个能够准确精细调整带有强差分隐私保证的Llama-2的解决方案。该代码将很快发布，并可根据请求提供。

更新时间: 2024-05-29 05:32:50

领域: cs.CR,cs.CL,cs.LG

下载: http://arxiv.org/abs/2405.18776v1

Single Image Unlearning: Efficient Machine Unlearning in Multimodal Large Language Models

Machine unlearning empowers individuals with the `right to be forgotten' by removing their private or sensitive information encoded in machine learning models. However, it remains uncertain whether MU can be effectively applied to Multimodal Large Language Models (MLLMs), particularly in scenarios of forgetting the leaked visual data of concepts. To overcome the challenge, we propose an efficient method, Single Image Unlearning (SIU), to unlearn the visual recognition of a concept by fine-tuning a single associated image for few steps. SIU consists of two key aspects: (i) Constructing Multifaceted fine-tuning data. We introduce four targets, based on which we construct fine-tuning data for the concepts to be forgotten; (ii) Jointly training loss. To synchronously forget the visual recognition of concepts and preserve the utility of MLLMs, we fine-tune MLLMs through a novel Dual Masked KL-divergence Loss combined with Cross Entropy loss. Alongside our method, we establish MMUBench, a new benchmark for MU in MLLMs and introduce a collection of metrics for its evaluation. Experimental results on MMUBench show that SIU completely surpasses the performance of existing methods. Furthermore, we surprisingly find that SIU can avoid invasive membership inference attacks and jailbreak attacks. To the best of our knowledge, we are the first to explore MU in MLLMs. We will release the code and benchmark in the near future.

Updated: 2024-05-29 05:27:38

标题: 单图像遗忘：多模态大型语言模型中高效的机器遗忘

摘要: 机器消除学习赋予个人“被遗忘”的权利，通过消除在机器学习模型中编码的私人或敏感信息。然而，尚不确定MU是否可以有效应用于多模数据语言模型（MLLMs），尤其是在忘记泄漏的概念视觉数据的情况下。为了克服这一挑战，我们提出了一种高效的方法，即单图像消除（SIU），通过对一个相关图像进行少量微调来消除对概念的视觉识别。SIU包括两个关键方面：（i）构建多方面微调数据。我们介绍了四个目标，基于这些目标，我们为要遗忘的概念构建微调数据；（ii）联合训练损失。为了同时忘记概念的视觉识别并保留MLLM的效用，我们通过结合双掩蔽KL散度损失和交叉熵损失，对MLLM进行微调。除此之外，我们建立了MMUBench，这是一个针对MLLM中MU的新基准，并引入了一系列用于评估的指标。在MMUBench上的实验结果表明，SIU完全超过了现有方法的性能。此外，我们惊讶地发现SIU可以避免入侵性成员推断攻击和越狱攻击。据我们所知，我们是第一个探索MLLM中MU的研究者。我们将在不久的将来发布代码和基准。

更新时间: 2024-05-29 05:27:38

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.12523v2

Bagging Improves Generalization Exponentially

Bagging is a popular ensemble technique to improve the accuracy of machine learning models. It hinges on the well-established rationale that, by repeatedly retraining on resampled data, the aggregated model exhibits lower variance and hence higher stability, especially for discontinuous base learners. In this paper, we provide a new perspective on bagging: By suitably aggregating the base learners at the parametrization instead of the output level, bagging improves generalization performances exponentially, a strength that is significantly more powerful than variance reduction. More precisely, we show that for general stochastic optimization problems that suffer from slowly (i.e., polynomially) decaying generalization errors, bagging can effectively reduce these errors to an exponential decay. Moreover, this power of bagging is agnostic to the solution schemes, including common empirical risk minimization, distributionally robust optimization, and various regularizations. We demonstrate how bagging can substantially improve generalization performances in a range of examples involving heavy-tailed data that suffer from intrinsically slow rates.

Updated: 2024-05-29 05:27:04

标题: Bagging方法能够指数级提高泛化能力

摘要: Bagging是一种流行的集成技术，用于提高机器学习模型的准确性。它依赖于一个被广泛认可的理念，即通过在重新采样的数据上反复训练，聚合模型表现出更低的方差，因此具有更高的稳定性，特别是对于不连续的基本学习器。在本文中，我们提供了Bagging的一个新视角：通过适当地在参数化而不是输出级别上聚合基本学习器，Bagging的泛化性能呈指数级改善，这种优势显著强于方差的降低。更确切地说，我们展示了对于一般随机优化问题，其泛化误差缓慢（即多项式）下降的情况，Bagging可以有效地将这些误差减少到指数下降。此外，Bagging的这种能力对解决方案方案是不可知的，包括常见的经验风险最小化、分布鲁棒优化和各种正则化。我们展示了Bagging如何显著改善涉及重尾数据的一系列例子中的泛化性能，这些数据受固有慢速率的影响。

更新时间: 2024-05-29 05:27:04

领域: math.OC,cs.LG,stat.ML

下载: http://arxiv.org/abs/2405.14741v2

Optimizing Search Advertising Strategies: Integrating Reinforcement Learning with Generalized Second-Price Auctions for Enhanced Ad Ranking and Bidding

This paper explores the integration of strategic optimization methods in search advertising, focusing on ad ranking and bidding mechanisms within E-commerce platforms. By employing a combination of reinforcement learning and evolutionary strategies, we propose a dynamic model that adjusts to varying user interactions and optimizes the balance between advertiser cost, user relevance, and platform revenue. Our results suggest significant improvements in ad placement accuracy and cost efficiency, demonstrating the model's applicability in real-world scenarios.

Updated: 2024-05-29 05:25:49

标题: 优化搜索广告策略：将强化学习与广义第二价格拍卖相结合，以增强广告排名和竞价

摘要: 本文探讨了在搜索广告中整合战略优化方法，重点放在电子商务平台内的广告排名和出价机制上。通过采用强化学习和进化策略的组合，我们提出了一个动态模型，该模型能够根据不同的用户互动调整自身，并优化广告主成本、用户相关性和平台收入之间的平衡。我们的结果表明，在广告放置准确性和成本效率方面取得了显著的改进，展示了该模型在实际场景中的适用性。

更新时间: 2024-05-29 05:25:49

领域: cs.LG

下载: http://arxiv.org/abs/2405.13381v2

Active Statistical Inference

Inspired by the concept of active learning, we propose active inference$\unicode{x2013}$a methodology for statistical inference with machine-learning-assisted data collection. Assuming a budget on the number of labels that can be collected, the methodology uses a machine learning model to identify which data points would be most beneficial to label, thus effectively utilizing the budget. It operates on a simple yet powerful intuition: prioritize the collection of labels for data points where the model exhibits uncertainty, and rely on the model's predictions where it is confident. Active inference constructs provably valid confidence intervals and hypothesis tests while leveraging any black-box machine learning model and handling any data distribution. The key point is that it achieves the same level of accuracy with far fewer samples than existing baselines relying on non-adaptively-collected data. This means that for the same number of collected samples, active inference enables smaller confidence intervals and more powerful p-values. We evaluate active inference on datasets from public opinion research, census analysis, and proteomics.

Updated: 2024-05-29 05:20:54

标题: 主动统计推断

摘要: 受主动学习概念启发，我们提出了主动推断——一种结合机器学习辅助数据收集的统计推断方法。假设标签数量存在限制，该方法利用机器学习模型识别哪些数据点最有利于标记，从而有效利用预算。它基于一个简单而强大的直觉运作：优先收集模型表现出不确定性的数据点的标签，并依赖模型在确定时的预测。主动推断构建可证明有效的置信区间和假设检验，同时利用任何黑盒机器学习模型并处理任何数据分布。关键是它能够以比依赖非自适应收集数据的现有基线更少的样本达到相同的准确性水平。这意味着对于相同数量的收集样本，主动推断能够产生更小的置信区间和更强大的p值。我们在公共舆论研究、人口普查分析和蛋白质组学数据集上评估了主动推断方法。

更新时间: 2024-05-29 05:20:54

领域: stat.ML,cs.LG,stat.ME

下载: http://arxiv.org/abs/2403.03208v2

Leveraging Many-To-Many Relationships for Defending Against Visual-Language Adversarial Attacks

Recent studies have revealed that vision-language (VL) models are vulnerable to adversarial attacks for image-text retrieval (ITR). However, existing defense strategies for VL models primarily focus on zero-shot image classification, which do not consider the simultaneous manipulation of image and text, as well as the inherent many-to-many (N:N) nature of ITR, where a single image can be described in numerous ways, and vice versa. To this end, this paper studies defense strategies against adversarial attacks on VL models for ITR for the first time. Particularly, we focus on how to leverage the N:N relationship in ITR to enhance adversarial robustness. We found that, although adversarial training easily overfits to specific one-to-one (1:1) image-text pairs in the train data, diverse augmentation techniques to create one-to-many (1:N) / many-to-one (N:1) image-text pairs can significantly improve adversarial robustness in VL models. Additionally, we show that the alignment of the augmented image-text pairs is crucial for the effectiveness of the defense strategy, and that inappropriate augmentations can even degrade the model's performance. Based on these findings, we propose a novel defense strategy that leverages the N:N relationship in ITR, which effectively generates diverse yet highly-aligned N:N pairs using basic augmentations and generative model-based augmentations. This work provides a novel perspective on defending against adversarial attacks in VL tasks and opens up new research directions for future work.

Updated: 2024-05-29 05:20:02

标题: 利用多对多关系抵御视觉语言对抗攻击

摘要: 最近的研究表明，视觉-语言（VL）模型在图像-文本检索（ITR）中容易受到对抗性攻击的影响。然而，现有的VL模型防御策略主要集中在零样本图像分类上，没有考虑到图像和文本的同时操作，以及ITR的固有的多对多（N:N）性质，即一张图片可以用多种方式描述，反之亦然。因此，本文首次研究了针对ITR中VL模型的对抗性攻击的防御策略。特别地，我们关注如何利用ITR中的N:N关系来增强对抗性鲁棒性。我们发现，尽管对抗训练很容易过拟合于训练数据中特定的一对一（1:1）图像-文本对，但通过多样化增强技术以创建一对多（1:N）/多对一（N:1）图像-文本对可以显著提高VL模型的对抗性鲁棒性。此外，我们还表明，增强后的图像-文本对的对齐对于防御策略的有效性至关重要，不当的增强甚至可能降低模型的性能。基于这些发现，我们提出了一种新颖的防御策略，利用ITR中的N:N关系，通过基本增强和生成模型增强有效地生成多样化但高度对齐的N:N对。这项工作为在VL任务中抵御对抗性攻击提供了一种新的视角，并为未来的研究开辟了新的方向。

更新时间: 2024-05-29 05:20:02

领域: cs.CV,cs.AI,cs.IR

下载: http://arxiv.org/abs/2405.18770v1

Coupled Mamba: Enhanced Multi-modal Fusion with Coupled State Space Model

The essence of multi-modal fusion lies in exploiting the complementary information inherent in diverse modalities. However, prevalent fusion methods rely on traditional neural architectures and are inadequately equipped to capture the dynamics of interactions across modalities, particularly in presence of complex intra- and inter-modality correlations. Recent advancements in State Space Models (SSMs), notably exemplified by the Mamba model, have emerged as promising contenders. Particularly, its state evolving process implies stronger modality fusion paradigm, making multi-modal fusion on SSMs an appealing direction. However, fusing multiple modalities is challenging for SSMs due to its hardware-aware parallelism designs. To this end, this paper proposes the Coupled SSM model, for coupling state chains of multiple modalities while maintaining independence of intra-modality state processes. Specifically, in our coupled scheme, we devise an inter-modal hidden states transition scheme, in which the current state is dependent on the states of its own chain and that of the neighbouring chains at the previous time-step. To fully comply with the hardware-aware parallelism, we devise an expedite coupled state transition scheme and derive its corresponding global convolution kernel for parallelism. Extensive experiments on CMU-MOSEI, CH-SIMS, CH-SIMSV2 through multi-domain input verify the effectiveness of our model compared to current state-of-the-art methods, improved F1-Score by 0.4\%, 0.9\%, and 2.3\% on the three datasets respectively, 49\% faster inference and 83.7\% GPU memory save. The results demonstrate that Coupled Mamba model is capable of enhanced multi-modal fusion.

Updated: 2024-05-29 05:19:15

标题: 耦合Mamba：使用耦合状态空间模型增强多模融合

摘要: 多模态融合的本质在于利用不同模态固有的互补信息。然而，目前流行的融合方法依赖于传统的神经网络架构，无法充分捕捉跨模态之间的相互作用动态，特别是在存在复杂的内部和跨模态相关性的情况下。最近在状态空间模型（SSMs）方面的进展，尤其是Mamba模型的典范，已经成为有前途的竞争者。特别是，其状态演化过程暗示了更强的模态融合范式，使得在SSMs上进行多模态融合成为一个吸引人的方向。然而，由于其硬件感知并行设计，对于SSMs来说，融合多种模态是具有挑战性的。为此，本文提出了耦合状态空间模型，用于耦合多种模态的状态链，同时保持内部模态状态过程的独立性。具体来说，在我们的耦合方案中，我们设计了一个跨模态隐藏状态转换方案，其中当前状态取决于其自己链的状态以及前一个时间步的相邻链的状态。为了完全符合硬件感知并行性，我们设计了一个快速耦合状态转换方案，并推导出相应的全局卷积核以实现并行性。通过对CMU-MOSEI、CH-SIMS、CH-SIMSV2的广泛实验，通过多领域输入验证了我们的模型相对于当前最先进的方法的有效性，分别在三个数据集上将F1分数提高了0.4％、0.9％和2.3％，推理速度提高了49％，GPU内存节省了83.7％。结果表明，耦合Mamba模型能够增强多模态融合。

更新时间: 2024-05-29 05:19:15

领域: cs.AI

下载: http://arxiv.org/abs/2405.18014v2

RNAFlow: RNA Structure & Sequence Design via Inverse Folding-Based Flow Matching

The growing significance of RNA engineering in diverse biological applications has spurred interest in developing AI methods for structure-based RNA design. While diffusion models have excelled in protein design, adapting them for RNA presents new challenges due to RNA's conformational flexibility and the computational cost of fine-tuning large structure prediction models. To this end, we propose RNAFlow, a flow matching model for protein-conditioned RNA sequence-structure design. Its denoising network integrates an RNA inverse folding model and a pre-trained RosettaFold2NA network for generation of RNA sequences and structures. The integration of inverse folding in the structure denoising process allows us to simplify training by fixing the structure prediction network. We further enhance the inverse folding model by conditioning it on inferred conformational ensembles to model dynamic RNA conformations. Evaluation on protein-conditioned RNA structure and sequence generation tasks demonstrates RNAFlow's advantage over existing RNA design methods.

Updated: 2024-05-29 05:10:25

标题: RNAFlow：通过基于逆向折叠的流匹配设计RNA结构和序列

摘要: 在各种生物学应用中，RNA工程的日益重要性引发了对基于结构的RNA设计的人工智能方法的兴趣。虽然扩散模型在蛋白质设计方面表现出色，但将其应用于RNA设计由于RNA的构象灵活性和微调大型结构预测模型的计算成本而面临新的挑战。为此，我们提出了RNAFlow，一种用于蛋白质条件下的RNA序列结构设计的流匹配模型。其去噪网络集成了RNA逆向折叠模型和一个经过预训练的RosettaFold2NA网络，用于生成RNA序列和结构。在结构去噪过程中整合逆向折叠模型使我们能够通过固定结构预测网络简化训练。我们进一步通过在推断的构象集合上对逆向折叠模型进行调节来增强其性能，以模拟动态RNA构象。在蛋白质条件下的RNA结构和序列生成任务上的评估表明，RNAFlow相对于现有的RNA设计方法具有优势。

更新时间: 2024-05-29 05:10:25

领域: q-bio.BM,cs.LG

下载: http://arxiv.org/abs/2405.18768v1

Large Brain Model for Learning Generic Representations with Tremendous EEG Data in BCI

The current electroencephalogram (EEG) based deep learning models are typically designed for specific datasets and applications in brain-computer interaction (BCI), limiting the scale of the models and thus diminishing their perceptual capabilities and generalizability. Recently, Large Language Models (LLMs) have achieved unprecedented success in text processing, prompting us to explore the capabilities of Large EEG Models (LEMs). We hope that LEMs can break through the limitations of different task types of EEG datasets, and obtain universal perceptual capabilities of EEG signals through unsupervised pre-training. Then the models can be fine-tuned for different downstream tasks. However, compared to text data, the volume of EEG datasets is generally small and the format varies widely. For example, there can be mismatched numbers of electrodes, unequal length data samples, varied task designs, and low signal-to-noise ratio. To overcome these challenges, we propose a unified foundation model for EEG called Large Brain Model (LaBraM). LaBraM enables cross-dataset learning by segmenting the EEG signals into EEG channel patches. Vector-quantized neural spectrum prediction is used to train a semantically rich neural tokenizer that encodes continuous raw EEG channel patches into compact neural codes. We then pre-train neural Transformers by predicting the original neural codes for the masked EEG channel patches. The LaBraMs were pre-trained on about 2,500 hours of various types of EEG signals from around 20 datasets and validated on multiple different types of downstream tasks. Experiments on abnormal detection, event type classification, emotion recognition, and gait prediction show that our LaBraM outperforms all compared SOTA methods in their respective fields. Our code is available at https://github.com/935963004/LaBraM.

Updated: 2024-05-29 05:08:16

标题: 大脑模型用于在脑机接口中利用大量脑电数据学习通用表示

摘要: 目前基于脑电图（EEG）的深度学习模型通常针对特定数据集和脑-计算机交互（BCI）应用进行设计，限制了模型的规模，从而降低了其感知能力和泛化能力。最近，大型语言模型（LLMs）在文本处理方面取得了前所未有的成功，促使我们探索大型脑电图模型（LEMs）的能力。我们希望LEM能突破EEG数据集不同任务类型的限制，并通过无监督预训练获得EEG信号的通用感知能力。然后，可以根据不同的下游任务对模型进行微调。然而，与文本数据相比，EEG数据集的数量通常较小，格式差异很大。例如，电极数量不匹配，数据样本长度不等，任务设计各异，信噪比低。为了克服这些挑战，我们提出了一个统一的EEG基础模型，称为大脑模型（LaBraM）。LaBraM通过将EEG信号分割成EEG通道片段，实现了跨数据集学习。使用矢量量化神经频谱预测来训练一个语义丰富的神经分词器，将连续的原始EEG通道片段编码为紧凑的神经代码。然后，通过预测被屏蔽的EEG通道片段的原始神经代码来预训练神经Transformer。LaBraM在约20个数据集中的大约2500小时各种类型的EEG信号上进行了预训练，并在多种不同类型的下游任务上进行了验证。在异常检测、事件类型分类、情绪识别和步态预测实验中，我们的LaBraM在各自领域优于所有对比方法。我们的代码可在https://github.com/935963004/LaBraM获取。

更新时间: 2024-05-29 05:08:16

领域: cs.LG

下载: http://arxiv.org/abs/2405.18765v1

Inpaint Biases: A Pathway to Accurate and Unbiased Image Generation

This paper examines the limitations of advanced text-to-image models in accurately rendering unconventional concepts which are scarcely represented or absent in their training datasets. We identify how these limitations not only confine the creative potential of these models but also pose risks of reinforcing stereotypes. To address these challenges, we introduce the Inpaint Biases framework, which employs user-defined masks and inpainting techniques to enhance the accuracy of image generation, particularly for novel or inaccurately rendered objects. Through experimental validation, we demonstrate how this framework significantly improves the fidelity of generated images to the user's intent, thereby expanding the models' creative capabilities and mitigating the risk of perpetuating biases. Our study contributes to the advancement of text-to-image models as unbiased, versatile tools for creative expression.

Updated: 2024-05-29 05:04:07

标题: 修复偏差：实现准确且无偏的图像生成路径

摘要: 本文研究了先进的文本到图像模型在准确呈现其训练数据集中稀缺或缺失的非传统概念方面的局限性。我们确定了这些局限性不仅限制了这些模型的创造潜力，还存在强化刻板印象的风险。为了解决这些挑战，我们引入了Inpaint Biases框架，该框架采用用户定义的蒙版和修复技术来增强图像生成的准确性，特别是对于新颖或准确度较低的对象。通过实验验证，我们展示了这一框架如何显著提高生成图像与用户意图的一致性，从而扩展模型的创造能力并减轻持续偏见的风险。我们的研究为文本到图像模型作为无偏见、多功能的创意表达工具的进步做出了贡献。

更新时间: 2024-05-29 05:04:07

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.18762v1

FDQN: A Flexible Deep Q-Network Framework for Game Automation

In reinforcement learning, it is often difficult to automate high-dimensional, rapid decision-making in dynamic environments, especially when domains require real-time online interaction and adaptive strategies such as web-based games. This work proposes a state-of-the-art Flexible Deep Q-Network (FDQN) framework that can address this challenge with a selfadaptive approach that is processing high-dimensional sensory data in realtime using a CNN and dynamically adapting the model architecture to varying action spaces of different gaming environments and outperforming previous baseline models in various Atari games and the Chrome Dino game as baselines. Using the epsilon-greedy policy, it effectively balances the new learning and exploitation for improved performance, and it has been designed with a modular structure that it can be easily adapted to other HTML-based games without touching the core part of the framework. It is demonstrated that the FDQN framework can successfully solve a well-defined task in a laboratory condition, but more importantly it also discusses potential applications to more challenging real-world cases and serve as the starting point for future further exploration into automated game play and beyond.

Updated: 2024-05-29 05:00:50

标题: FDQN：一种灵活的深度 Q 网络框架，用于游戏自动化

摘要: 在强化学习中，通常很难自动化处理动态环境中的高维、快速决策，特别是当领域需要实时在线交互和自适应策略，如基于网络的游戏。本文提出了一个最先进的灵活深度Q网络（FDQN）框架，可以通过自适应方法处理高维感知数据，并使用CNN实时动态调整模型架构以适应不同游戏环境的行动空间，从而在各种Atari游戏和Chrome Dino游戏中表现优于先前的基线模型。通过使用ε-贪心策略，它有效平衡了新的学习和开发，以改善性能，并且设计为模块化结构，可以轻松适应其他基于HTML的游戏，而不必涉及框架的核心部分。实验证明，FDQN框架可以成功解决实验室环境中的一个明确定义的任务，更重要的是，它还讨论了将其应用于更具挑战性的真实案例，并作为未来自动游戏玩耍及其他领域探索的起点。

更新时间: 2024-05-29 05:00:50

领域: cs.LG,68T05, 93E35,I.2.8; I.2.6; I.5.1

下载: http://arxiv.org/abs/2405.18761v1

Learning to Continually Learn with the Bayesian Principle

In the present era of deep learning, continual learning research is mainly focused on mitigating forgetting when training a neural network with stochastic gradient descent on a non-stationary stream of data. On the other hand, in the more classical literature of statistical machine learning, many models have sequential Bayesian update rules that yield the same learning outcome as the batch training, i.e., they are completely immune to catastrophic forgetting. However, they are often overly simple to model complex real-world data. In this work, we adopt the meta-learning paradigm to combine the strong representational power of neural networks and simple statistical models' robustness to forgetting. In our novel meta-continual learning framework, continual learning takes place only in statistical models via ideal sequential Bayesian update rules, while neural networks are meta-learned to bridge the raw data and the statistical models. Since the neural networks remain fixed during continual learning, they are protected from catastrophic forgetting. This approach not only achieves significantly improved performance but also exhibits excellent scalability. Since our approach is domain-agnostic and model-agnostic, it can be applied to a wide range of problems and easily integrated with existing model architectures.

Updated: 2024-05-29 04:53:31

标题: 学习使用贝叶斯原则不断学习

摘要: 在当前深度学习时代，持续学习研究主要集中在训练神经网络时如何在非稳态数据流上使用随机梯度下降来减轻遗忘。另一方面，在统计机器学习的更经典文献中，许多模型具有顺序贝叶斯更新规则，产生与批量训练相同的学习结果，即它们完全免疫灾难性遗忘。然而，它们通常过于简单，无法对复杂的现实数据进行建模。在这项工作中，我们采用元学习范式，将神经网络的强表征能力与简单统计模型对遗忘的鲁棒性相结合。在我们的新颖元持续学习框架中，持续学习只发生在统计模型中，通过理想的顺序贝叶斯更新规则，而神经网络则进行元学习，连接原始数据和统计模型。由于神经网络在持续学习过程中保持固定，它们不会受到灾难性遗忘的影响。这种方法不仅实现了显著改进的性能，而且展现出出色的可扩展性。由于我们的方法与领域无关且与模型无关，因此可以应用于各种问题，并轻松集成到现有模型架构中。

更新时间: 2024-05-29 04:53:31

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.18758v1

Provable Contrastive Continual Learning

Continual learning requires learning incremental tasks with dynamic data distributions. So far, it has been observed that employing a combination of contrastive loss and distillation loss for training in continual learning yields strong performance. To the best of our knowledge, however, this contrastive continual learning framework lacks convincing theoretical explanations. In this work, we fill this gap by establishing theoretical performance guarantees, which reveal how the performance of the model is bounded by training losses of previous tasks in the contrastive continual learning framework. Our theoretical explanations further support the idea that pre-training can benefit continual learning. Inspired by our theoretical analysis of these guarantees, we propose a novel contrastive continual learning algorithm called CILA, which uses adaptive distillation coefficients for different tasks. These distillation coefficients are easily computed by the ratio between average distillation losses and average contrastive losses from previous tasks. Our method shows great improvement on standard benchmarks and achieves new state-of-the-art performance.

Updated: 2024-05-29 04:48:11

标题: 可证明的对比继续学习

摘要: 持续学习需要学习具有动态数据分布的增量任务。到目前为止，已经观察到在持续学习中使用对比损失和蒸馏损失的结合进行训练可以获得良好的性能。然而，据我们所知，这种对比持续学习框架缺乏令人信服的理论解释。在这项工作中，我们通过建立理论性能保证来填补这一空白，这些保证揭示了模型的性能如何受到在对比持续学习框架中先前任务的训练损失的限制。我们的理论解释进一步支持预训练可以有助于持续学习的观点。受到我们对这些保证的理论分析的启发，我们提出了一种名为CILA的新型对比持续学习算法，该算法使用不同任务的自适应蒸馏系数。这些蒸馏系数可以通过计算先前任务的平均蒸馏损失和平均对比损失之间的比率来轻松计算。我们的方法在标准基准测试中显示出极大的改进，并取得了新的最先进性能。

更新时间: 2024-05-29 04:48:11

领域: cs.LG,cs.AI,cs.CV,stat.AP,stat.ML

下载: http://arxiv.org/abs/2405.18756v1

Spectraformer: A Unified Random Feature Framework for Transformer

Linearization of attention using various kernel approximation and kernel learning techniques has shown promise. Past methods use a subset of combinations of component functions and weight matrices within the random features paradigm. We identify the need for a systematic comparison of different combinations of weight matrix and component functions for attention learning in Transformer. In this work, we introduce Spectraformer, a unified framework for approximating and learning the kernel function in linearized attention of the Transformer. We experiment with broad classes of component functions and weight matrices for three textual tasks in the LRA benchmark. Our experimentation with multiple combinations of component functions and weight matrices leads us to a novel combination with 23.4% faster training time and 25.2% lower memory consumption over the previous SOTA random feature Transformer, while maintaining the performance, as compared to the Original Transformer. Our code is available at: https://github.com/dukeraphaelng/spectraformer .

Updated: 2024-05-29 04:45:26

标题: Spectraformer：Transformer的统一随机特征框架

摘要: 使用各种核逼近和核学习技术对注意力进行线性化已经显示出很大的潜力。过去的方法使用随机特征范式内的一组组件函数和权重矩阵的子集。我们认识到需要对Transformer中注意力学习的不同权重矩阵和组件函数组合进行系统比较。在这项工作中，我们引入了Spectraformer，这是一个统一的框架，用于在Transformer的线性化注意力中近似和学习核函数。我们在LRA基准测试中为三个文本任务尝试了广泛的组件函数和权重矩阵。我们尝试了多种组件函数和权重矩阵的组合，找到了一种新颖的组合，其训练时间比之前的SOTA随机特征Transformer快23.4%，内存消耗低25.2%，同时性能与原始Transformer相当。我们的代码可以在https://github.com/dukeraphaelng/spectraformer找到。

更新时间: 2024-05-29 04:45:26

领域: cs.LG

下载: http://arxiv.org/abs/2405.15310v2

GIST: Greedy Independent Set Thresholding for Diverse Data Summarization

We propose a novel subset selection task called min-distance diverse data summarization ($\textsf{MDDS}$), which has a wide variety of applications in machine learning, e.g., data sampling and feature selection. Given a set of points in a metric space, the goal is to maximize an objective that combines the total utility of the points and a diversity term that captures the minimum distance between any pair of selected points, subject to the constraint $|S| \le k$. For example, the points may correspond to training examples in a data sampling problem, e.g., learned embeddings of images extracted from a deep neural network. This work presents the $\texttt{GIST}$ algorithm, which achieves a $\frac{2}{3}$-approximation guarantee for $\textsf{MDDS}$ by approximating a series of maximum independent set problems with a bicriteria greedy algorithm. We also prove a complementary $(\frac{2}{3}+\varepsilon)$-hardness of approximation, for any $\varepsilon > 0$. Finally, we provide an empirical study that demonstrates $\texttt{GIST}$ outperforms existing methods for $\textsf{MDDS}$ on synthetic data, and also for a real-world image classification experiment the studies single-shot subset selection for ImageNet.

Updated: 2024-05-29 04:39:24

标题: GIST：贪婪独立集阈值化用于多样化数据摘要

摘要: 我们提出了一种名为最小距离多样化数据摘要（$\textsf{MDDS}$）的新型子集选择任务，该任务在机器学习中具有广泛的应用，例如数据抽样和特征选择。给定一个度量空间中的一组点，目标是最大化一个结合了点的总效用和捕获所选点之间最小距离的多样性项的目标，同时满足约束条件$|S| \le k$。例如，这些点可能对应于数据抽样问题中的训练示例，例如从深度神经网络中提取的图像的学习嵌入。本文介绍了$\texttt{GIST}$算法，通过用双标准贪婪算法近似一系列最大独立集问题，实现了对$\textsf{MDDS}$的$\frac{2}{3}$近似保证。我们还证明了对于任意$\varepsilon > 0$，存在一个互补的$(\frac{2}{3}+\varepsilon)$-难度的近似。最后，我们提供了一项实证研究，证明$\texttt{GIST}$在合成数据上优于现有方法，也优于一个针对ImageNet的真实世界图像分类实验中的单次子集选择研究。

更新时间: 2024-05-29 04:39:24

领域: cs.DS,cs.LG

下载: http://arxiv.org/abs/2405.18754v1

Confronting the Reproducibility Crisis: A Case Study in Validating Certified Robustness

Reproducibility is a cornerstone of scientific research, enabling validation, extension, and progress. However, the rapidly evolving nature of software and dependencies poses significant challenges to reproducing research results, particularly in fields like adversarial robustness for deep neural networks, where complex codebases and specialized toolkits are utilized. This paper presents a case study of attempting to validate the results on certified adversarial robustness in "SoK: Certified Robustness for Deep Neural Networks" using the VeriGauge toolkit. Despite following the documented methodology, numerous software and hardware compatibility issues were encountered, including outdated or unavailable dependencies, version conflicts, and driver incompatibilities. While a subset of the original results could be run, key findings related to the empirical robust accuracy of various verification methods proved elusive due to these technical obstacles, as well as slight discrepancies in the test results. This practical experience sheds light on the reproducibility crisis afflicting adversarial robustness research, where a lack of reproducibility threatens scientific integrity and hinders progress. The paper discusses the broader implications of this crisis, proposing potential solutions such as containerization, software preservation, and comprehensive documentation practices. Furthermore, it highlights the need for collaboration and standardization efforts within the research community to develop robust frameworks for reproducible research. By addressing the reproducibility crisis head-on, this work aims to contribute to the ongoing discourse on scientific reproducibility and advocate for best practices that ensure the reliability and validity of research findings within not only adversarial robustness, but security and technology research as a whole.

Updated: 2024-05-29 04:37:19

标题: 应对可重复性危机：验证认证鲁棒性的案例研究

摘要: 可重复性是科学研究的基石，能够验证、延伸和推进研究。然而，软件和依赖关系的快速发展给重现研究结果带来了重大挑战，特别是在深度神经网络的对抗鲁棒性等领域，复杂的代码库和专门的工具包被使用。本文以尝试验证"SoK: Certified Robustness for Deep Neural Networks"中关于认证对抗鲁棒性的结果为案例研究，使用VeriGauge工具包。尽管按照文档中的方法论进行，但遇到了许多软件和硬件兼容性问题，包括过时或不可用的依赖关系、版本冲突和驱动程序不兼容性。虽然可以运行部分原始结果，但由于这些技术障碍以及测试结果中的轻微差异，与各种验证方法相关的关键发现仍然难以实现。这种实践经验揭示了困扰对抗鲁棒性研究的可重复性危机，即缺乏可重复性威胁了科学诚信并阻碍了进展。本文讨论了这一危机的更广泛影响，提出了潜在解决方案，如容器化、软件保存和全面的文档实践。此外，它强调了研究社区内合作和标准化努力的必要性，以开发可重现研究的强大框架。通过直面可重复性危机，本工作旨在为科学可重复性的持续讨论做出贡献，并倡导确保不仅对抗鲁棒性研究，而且对安全和技术研究的研究结果的可靠性和有效性的最佳实践。

更新时间: 2024-05-29 04:37:19

领域: cs.LG

下载: http://arxiv.org/abs/2405.18753v1

On the Limits of Multi-modal Meta-Learning with Auxiliary Task Modulation Using Conditional Batch Normalization

Few-shot learning aims to learn representations that can tackle novel tasks given a small number of examples. Recent studies show that cross-modal learning can improve representations for few-shot classification. More specifically, language is a rich modality that can be used to guide visual learning. In this work, we experiment with a multi-modal architecture for few-shot learning that consists of three components: a classifier, an auxiliary network, and a bridge network. While the classifier performs the main classification task, the auxiliary network learns to predict language representations from the same input, and the bridge network transforms high-level features of the auxiliary network into modulation parameters for layers of the few-shot classifier using conditional batch normalization. The bridge should encourage a form of lightweight semantic alignment between language and vision which could be useful for the classifier. However, after evaluating the proposed approach on two popular few-shot classification benchmarks we find that a) the improvements do not reproduce across benchmarks, and b) when they do, the improvements are due to the additional compute and parameters introduced by the bridge network. We contribute insights and recommendations for future work in multi-modal meta-learning, especially when using language representations.

Updated: 2024-05-29 04:29:12

标题: 关于使用条件批量归一化进行辅助任务调制的多模态元学习的限制

摘要: Few-shot learning旨在学习能够处理少量示例的新任务的表示。最近的研究表明，跨模态学习可以改进用于少样本分类的表示。更具体地说，语言是一种丰富的模态，可以用来指导视觉学习。在这项工作中，我们尝试了一种用于少样本学习的多模态架构，包括三个组件：分类器、辅助网络和桥接网络。分类器执行主要的分类任务，辅助网络学习从相同输入中预测语言表示，桥接网络将辅助网络的高级特征转换为用于使用条件批量归一化的少样本分类器的层的调制参数。桥接应该鼓励语言和视觉之间一种轻量级语义对齐的形式，这对分类器可能有用。然而，在两个流行的少样本分类基准上评估所提出的方法后，我们发现a）改进在基准之间无法重现，b）当它们这样做时，改进是由桥接网络引入的额外计算和参数导致的。我们为未来的多模态元学习工作，尤其是在使用语言表示时，提供了见解和建议。

更新时间: 2024-05-29 04:29:12

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.18751v1

A SARS-CoV-2 Interaction Dataset and VHH Sequence Corpus for Antibody Language Models

Antibodies are crucial proteins produced by the immune system to eliminate harmful foreign substances and have become pivotal therapeutic agents for treating human diseases. To accelerate the discovery of antibody therapeutics, there is growing interest in constructing language models using antibody sequences. However, the applicability of pre-trained language models for antibody discovery has not been thoroughly evaluated due to the scarcity of labeled datasets. To overcome these limitations, we introduce AVIDa-SARS-CoV-2, a dataset featuring the antigen-variable domain of heavy chain of heavy chain antibody (VHH) interactions obtained from two alpacas immunized with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike proteins. AVIDa-SARS-CoV-2 includes binary labels indicating the binding or non-binding of diverse VHH sequences to 12 SARS-CoV-2 mutants, such as the Delta and Omicron variants. Furthermore, we release VHHCorpus-2M, a pre-training dataset for antibody language models, containing over two million VHH sequences. We report benchmark results for predicting SARS-CoV-2-VHH binding using VHHBERT pre-trained on VHHCorpus-2M and existing general protein and antibody-specific pre-trained language models. These results confirm that AVIDa-SARS-CoV-2 provides valuable benchmarks for evaluating the representation capabilities of antibody language models for binding prediction, thereby facilitating the development of AI-driven antibody discovery. The datasets are available at https://datasets.cognanous.com.

Updated: 2024-05-29 04:22:18

标题: 一个用于抗体语言模型的SARS-CoV-2相互作用数据集和VHH序列语料库

摘要: 抗体是免疫系统产生的关键蛋白质，用于清除有害的外来物质，并已成为治疗人类疾病的重要治疗药物。为加速抗体治疗的发现，人们对利用抗体序列构建语言模型越来越感兴趣。然而，由于标记数据集的稀缺性，预训练语言模型在抗体发现中的适用性尚未得到全面评估。为了克服这些限制，我们介绍了AVIDa-SARS-CoV-2，这是一个数据集，包含了两只被免疫接种严重急性呼吸综合征冠状病毒2（SARS-CoV-2）刺突蛋白的大鼻祖抗体（VHH）与抗原可变区域的相互作用。AVIDa-SARS-CoV-2 包括二进制标签，指示多样的 VHH 序列与12种 SARS-CoV-2 突变体（如 Delta 和 Omicron 变异体）的结合或非结合。此外，我们发布了 VHHCorpus-2M，这是一个用于抗体语言模型的预训练数据集，包含超过两百万个 VHH 序列。我们使用在 VHHCorpus-2M 上预训练的 VHHBERT 和现有的通用蛋白质和抗体特定预训练语言模型，报告了预测 SARS-CoV-2-VHH 结合的基准结果。这些结果证实了 AVIDa-SARS-CoV-2 为评估抗体语言模型在结合预测中的表现能力提供了有价值的基准，从而促进了基于人工智能的抗体发现的发展。这些数据集可在 https://datasets.cognanous.com 上获得。

更新时间: 2024-05-29 04:22:18

领域: cs.LG,q-bio.GN

下载: http://arxiv.org/abs/2405.18749v1

Symmetric Reinforcement Learning Loss for Robust Learning on Diverse Tasks and Model Scales

Reinforcement learning (RL) training is inherently unstable due to factors such as moving targets and high gradient variance. Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning from AI Feedback (RLAIF) can introduce additional difficulty. Differing preferences can complicate the alignment process, and prediction errors in a trained reward model can become more severe as the LLM generates unseen outputs. To enhance training robustness, RL has adopted techniques from supervised learning, such as ensembles and layer normalization. In this work, we improve the stability of RL training by adapting the reverse cross entropy (RCE) from supervised learning for noisy data to define a symmetric RL loss. We demonstrate performance improvements across various tasks and scales. We conduct experiments in discrete action tasks (Atari games) and continuous action space tasks (MuJoCo benchmark and Box2D) using Symmetric A2C (SA2C) and Symmetric PPO (SPPO), with and without added noise with especially notable performance in SPPO across different hyperparameters. Furthermore, we validate the benefits of the symmetric RL loss when using SPPO for large language models through improved performance in RLHF tasks, such as IMDB positive sentiment sentiment and TL;DR summarization tasks.

Updated: 2024-05-29 04:19:00

标题: 对不同任务和模型规模的强化学习对称损失以实现稳健学习

摘要: 强化学习（RL）训练由于移动目标和高梯度变化等因素，本质上是不稳定的。从人类反馈中强化学习（RLHF）和从人工智能反馈中强化学习（RLAIF）可能会引入额外的困难。不同的偏好可能会使对齐过程复杂化，而经过训练的奖励模型中的预测错误可能会随着LLM生成未见输出而变得更为严重。为了增强训练的稳健性，RL采用了来自监督学习的技术，如集成和层归一化。在这项工作中，我们通过将来自嘈杂数据的反交叉熵（RCE）从监督学习中适应到定义对称RL损失，来改善RL训练的稳定性。我们在各种任务和规模上展示了性能的改进。我们在离散动作任务（Atari游戏）和连续动作空间任务（MuJoCo基准和Box2D）中进行了实验，使用了对称A2C（SA2C）和对称PPO（SPPO），有无额外噪声以及在不同超参数下，SPPO的表现特别值得注意。此外，我们通过在RLHF任务中使用SPPO，验证了对称RL损失在大型语言模型中的好处，表现为在IMDB正面情感和TL;DR摘要任务中的性能提升。

更新时间: 2024-05-29 04:19:00

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.17618v2

SynHING: Synthetic Heterogeneous Information Network Generation for Graph Learning and Explanation

Graph Neural Networks (GNNs) excel in delineating graph structures in diverse domains, including community analysis and recommendation systems. As the interpretation of GNNs becomes increasingly important, the demand for robust baselines and expansive graph datasets is accentuated, particularly in the context of Heterogeneous Information Networks (HIN). Addressing this, we introduce SynHING, a novel framework for Synthetic Heterogeneous Information Network Generation aimed at enhancing graph learning and explanation. SynHING systematically identifies major motifs in a target HIN and employs a bottom-up generation process with intra-cluster and inter-cluster merge modules. This process, supplemented by post-pruning techniques, ensures the synthetic HIN closely mirrors the original graph's structural and statistical properties. Crucially, SynHING provides ground-truth motifs for evaluating GNN explainer models, setting a new standard for explainable, synthetic HIN generation and contributing to the advancement of interpretable machine learning in complex networks.

Updated: 2024-05-29 04:16:10

标题: SynHING：用于图学习和解释的合成异构信息网络生成

摘要: 图神经网络(GNNs)在各个领域，包括社区分析和推荐系统中，优秀地描述了图结构。随着对GNNs解释的重要性日益增加，对稳健基准线和广泛图数据集的需求变得更加突出，特别是在异质信息网络(HIN)的情境下。为了解决这个问题，我们引入了SynHING，一个旨在增强图学习和解释的合成异质信息网络生成框架。SynHING系统地识别目标HIN中的主要模式，并采用自下而上的生成过程，包括簇内和簇间合并模块。通过后修剪技术辅助的这一过程确保合成HIN与原始图的结构和统计特性紧密匹配。关键是，SynHING为评估GNN解释模型提供了地面真实的模式，为可解释的合成HIN生成设定了新标准，并促进了在复杂网络中的可解释机器学习的进步。

更新时间: 2024-05-29 04:16:10

领域: cs.LG,cs.AI,cs.SI

下载: http://arxiv.org/abs/2401.04133v2

InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining

Pretraining auto-regressive large language models~(LLMs) with retrieval demonstrates better perplexity and factual accuracy by leveraging external databases. However, the size of existing pretrained retrieval-augmented LLM is still limited (e.g., Retro has 7.5B parameters), which limits the effectiveness of instruction tuning and zero-shot generalization. In this work, we introduce Retro 48B, the largest LLM pretrained with retrieval. Specifically, we continue to pretrain a 43B GPT model on additional 100 billion tokens using the Retro augmentation method by retrieving from 1.2 trillion tokens. Notably, the obtained foundation model, Retro 48B, largely outperforms the counterpart GPT 43B trained on 1.2T tokens in terms of perplexity with only 2.58% additional GPU hours, demonstrating the significant scaling potential of the method. After instruction tuning on Retro, InstructRetro demonstrates significant improvement over the instruction tuned GPT on a wide range of zero-shot tasks. Specifically, the average improvement of InstructRetro is 7% over its GPT counterpart across 8 short-form QA and reading comprehension tasks, 10% over GPT across 4 challenging long-form QA tasks, and 16% over GPT across 3 summarization tasks. Surprisingly, we find that one can ablate the encoder from InstructRetro architecture and directly use its decoder backbone, while achieving comparable results. Our results highlight the promising direction to obtain a better GPT decoder through continued pretraining with retrieval before instruction tuning. Our code and checkpoints are publicly available at: https://huggingface.co/nvidia/retro-48b-instruct-4k.

Updated: 2024-05-29 04:15:39

标题: InstructRetro: 检索增强预训练后的指令调整

摘要: 使用检索预训练自回归大型语言模型（LLMs）显示出更好的困惑度和事实准确性，通过利用外部数据库。然而，现有的预训练检索增强的LLM的规模仍然有限（例如，Retro有75亿个参数），这限制了指导调整和零-shot泛化的有效性。在这项工作中，我们介绍了Retro 48B，这是使用检索预训练的最大LLM。具体来说，我们继续使用Retro增强方法对43B GPT模型进行额外1000亿令牌的预训练，通过从1.2万亿令牌中检索。值得注意的是，所获得的基础模型Retro 48B，在困惑度方面大大优于仅使用额外2.58% GPU小时训练的1.2T令牌的43B GPT对应模型，展示了该方法的显著扩展潜力。在Retro上进行指导调整后，InstructRetro在广泛的零-shot任务中表现出显著改进，特别是在8个短形式QA和阅读理解任务中，InstructRetro的平均改进幅度比其GPT对应模型高出7％，比GPT在4个具有挑战性的长形式QA任务上高出10％，比GPT在3个摘要任务上高出16％。令人惊讶的是，我们发现可以从InstructRetro架构中切除编码器，并直接使用其解码器骨干，同时获得可比较的结果。我们的结果突显了在进行指导调整之前通过持续检索预训练获得更好的GPT解码器的有希望方向。我们的代码和检查点可以在以下网址公开获取：https://huggingface.co/nvidia/retro-48b-instruct-4k。

更新时间: 2024-05-29 04:15:39

领域: cs.CL,cs.AI,cs.IR,cs.LG

下载: http://arxiv.org/abs/2310.07713v3

STIQ: Safeguarding Training and Inferencing of Quantum Neural Networks from Untrusted Cloud

The high expenses imposed by current quantum cloud providers, coupled with the escalating need for quantum resources, may incentivize the emergence of cheaper cloud-based quantum services from potentially untrusted providers. Deploying or hosting quantum models, such as Quantum Neural Networks (QNNs), on these untrusted platforms introduces a myriad of security concerns, with the most critical one being model theft. This vulnerability stems from the cloud provider's full access to these circuits during training and/or inference. In this work, we introduce STIQ, a novel ensemble-based strategy designed to safeguard QNNs against such cloud-based adversaries. Our method innovatively trains two distinct QNNs concurrently, hosting them on same or different platforms, in a manner that each network yields obfuscated outputs rendering the individual QNNs ineffective for adversaries operating within cloud environments. However, when these outputs are combined locally (using an aggregate function), they reveal the correct result. Through extensive experiments across various QNNs and datasets, our technique has proven to effectively masks the accuracy and losses of the individually hosted models by upto 76\%, albeit at the expense of $\leq 2\times$ increase in the total computational overhead. This trade-off, however, is a small price to pay for the enhanced security and integrity of QNNs in a cloud-based environment prone to untrusted adversaries. We also demonstrated STIQ's practical application by evaluating it on real 127-qubit IBM\_Sherbrooke hardware, showing that STIQ achieves up to 60\% obfuscation, with combined performance comparable to an unobfuscated model.

Updated: 2024-05-29 04:09:46

标题: STIQ：保护量子神经网络在不可信的云端进行训练和推理

摘要: 当前量子云服务提供商强加的高昂费用，加上对量子资源不断增长的需求，可能会促使廉价的基于云的量子服务从潜在的不受信任的提供商中出现。在这些不受信任的平台上部署或托管量子模型，如量子神经网络（QNNs），引入了许多安全问题，其中最关键的是模型盗窃。这种漏洞源于云提供商在训练和/或推理期间对这些电路的完全访问。在这项工作中，我们介绍了STIQ，一种新颖的基于集成的策略，旨在保护QNNs免受这些基于云的对手的侵害。我们的方法创新地同时训练两个不同的QNNs，将它们托管在相同或不同的平台上，以每个网络产生模糊输出，使得单个QNN对于在云环境中操作的对手无效。然而，当这些输出在本地组合（使用聚合函数）时，它们会显示出正确的结果。通过在各种QNNs和数据集上进行广泛实验，我们的技术已被证明可以通过增加总计算开销的不超过76\%来有效遮蔽单独托管模型的准确性和损失。然而，这种权衡是为了在容易受到不受信任对手侵害的基于云的环境中增强QNNs的安全性和完整性而付出的小代价。我们还通过在真实的127量子比特IBM_Sherbrooke硬件上评估STIQ来展示了其实际应用，结果显示STIQ实现了高达60\%的混淆，其综合性能与未混淆模型相当。

更新时间: 2024-05-29 04:09:46

领域: quant-ph,cs.LG

下载: http://arxiv.org/abs/2405.18746v1

PermLLM: Private Inference of Large Language Models within 3 Seconds under WAN

The emergence of ChatGPT marks the arrival of the large language model (LLM) era. While LLMs demonstrate their power in a variety of fields, they also raise serious privacy concerns as the users' queries are sent to the model provider. On the other side, deploying the LLM on the user's device will also leak all the model data. Existing methods based on secure multiparty computation (MPC) managed to protect both the privacy of the model parameters and user queries. However, they require gigabytes of data transfer and several minutes to generate just one token, making them impractical for most real-world applications. To improve the efficiency of private LLM inference, we propose PermLLM, which accelerates the evaluation of non-linear functions using secure random permutation. Along with the optimized secret sharing protocols and homomorphic encryption, PermLLM achieves two-party private inference of the ChatGLM-6B model at the speed of around 3s/token, under a realistic network setting (10ms RTT and 1Gbps bandwidth), which is magnitudes faster than existing MPC solutions.

Updated: 2024-05-29 04:06:50

标题: PermLLM: 基于广域网在3秒内进行大型语言模型的私密推断

摘要: ChatGPT的出现标志着大型语言模型（LLM）时代的到来。虽然LLMs展示了它们在各个领域的强大能力，但它们也引发了严重的隐私问题，因为用户的查询会被发送到模型提供者。另一方面，在用户设备上部署LLM也会泄漏所有模型数据。基于安全多方计算（MPC）的现有方法成功地保护了模型参数和用户查询的隐私。然而，它们需要大量的数据传输和几分钟才能生成一个令牌，使它们对大多数实际应用来说是不切实际的。为了提高私人LLM推断的效率，我们提出了PermLLM，它通过使用安全随机排列加速非线性函数的评估。结合优化的秘密共享协议和同态加密，PermLLM在一个现实网络设置下（10ms RTT和1Gbps带宽）以约3秒/令牌的速度实现了ChatGLM-6B模型的双方私人推断，这比现有的MPC解决方案快得多。

更新时间: 2024-05-29 04:06:50

领域: cs.CR

下载: http://arxiv.org/abs/2405.18744v1

Musical Phrase Segmentation via Grammatical Induction

We outline a solution to the challenge of musical phrase segmentation that uses grammatical induction algorithms, a class of algorithms which infer a context-free grammar from an input sequence. We analyze the performance of five grammatical induction algorithms on three datasets using various musical viewpoint combinations. Our experiments show that the LONGESTFIRST algorithm achieves the best F1 scores across all three datasets and that input encodings that include the duration viewpoint result in the best performance.

Updated: 2024-05-29 04:04:36

标题: 通过语法归纳进行音乐乐句分割

摘要: 我们提出了一个解决音乐乐句分割挑战的方案，该方案使用语法归纳算法，这是一类可以从输入序列中推断出上下文无关文法的算法。我们分析了五种语法归纳算法在三个数据集上使用不同音乐视角组合的性能。我们的实验表明，LONGESTFIRST算法在三个数据集上都取得了最好的F1得分，而包括持续时间视角在内的输入编码会带来最佳性能。

更新时间: 2024-05-29 04:04:36

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2405.18742v1

Genshin: General Shield for Natural Language Processing with Large Language Models

Large language models (LLMs) like ChatGPT, Gemini, or LLaMA have been trending recently, demonstrating considerable advancement and generalizability power in countless domains. However, LLMs create an even bigger black box exacerbating opacity, with interpretability limited to few approaches. The uncertainty and opacity embedded in LLMs' nature restrict their application in high-stakes domains like financial fraud, phishing, etc. Current approaches mainly rely on traditional textual classification with posterior interpretable algorithms, suffering from attackers who may create versatile adversarial samples to break the system's defense, forcing users to make trade-offs between efficiency and robustness. To address this issue, we propose a novel cascading framework called Genshin (General Shield for Natural Language Processing with Large Language Models), utilizing LLMs as defensive one-time plug-ins. Unlike most applications of LLMs that try to transform text into something new or structural, Genshin uses LLMs to recover text to its original state. Genshin aims to combine the generalizability of the LLM, the discrimination of the median model, and the interpretability of the simple model. Our experiments on the task of sentimental analysis and spam detection have shown fatal flaws of the current median models and exhilarating results on LLMs' recovery ability, demonstrating that Genshin is both effective and efficient. In our ablation study, we unearth several intriguing observations. Utilizing the LLM defender, a tool derived from the 4th paradigm, we have reproduced BERT's 15% optimal mask rate results in the 3rd paradigm of NLP. Additionally, when employing the LLM as a potential adversarial tool, attackers are capable of executing effective attacks that are nearly semantically lossless.

Updated: 2024-05-29 04:04:05

标题: 原文标题翻译为：Genshin：大型语言模型自然语言处理的通用盾牌

摘要: 大型语言模型（LLMs）如ChatGPT、Gemini或LLaMA最近备受关注，展示了在无数领域中取得的显著进步和广泛适用性。然而，LLMs创建了一个更大的黑盒，加剧了不透明性，可解释性仅限于少数方法。LLMs固有的不确定性和不透明性限制了它们在高风险领域（如金融欺诈、网络钓鱼等）中的应用。当前的方法主要依赖于传统的文本分类与后验可解释算法，受到可能创建多功能对抗样本的攻击者的影响，破坏系统的防御，迫使用户在效率和稳健性之间做出权衡。为了解决这个问题，我们提出了一个名为Genshin（大型语言模型自然语言处理的通用防护盾）的新型级联框架，利用LLMs作为防御性一次性插件。与大多数LLMs的应用试图将文本转换为新的或结构化的内容不同，Genshin利用LLMs将文本恢复到其原始状态。Genshin旨在结合LLM的广泛适用性、中位模型的区分能力和简单模型的可解释性。我们在情感分析和垃圾邮件检测任务上的实验显示了当前中位模型的致命缺陷，以及LLMs的恢复能力带来的令人振奋的结果，证明Genshin既有效又高效。在我们的消融研究中，我们发现了几个有趣的观察结果。利用从第四范式派生的LLM防御者工具，我们在自然语言处理的第三范式中复制了BERT的15％最佳掩码率结果。此外，当将LLM作为潜在的对抗工具时，攻击者能够执行几乎没有语义损失的有效攻击。

更新时间: 2024-05-29 04:04:05

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.18741v1

Mitigating Data Sharing in Public Cloud using Blockchain

Public Cloud Computing has become a fundamental part of modern IT infrastructure as its adoption has transformed the way businesses operate. However, cloud security concerns introduce new risks and challenges related to data protection, sharing, and access control. A synergistic integration of blockchain with the cloud holds immense potential. Blockchain's distributed ledger ensures transparency, immutability, and efficiency as it reduces the reliance on centralized authorities. Motivated by this, our framework proposes a secure data ecosystem in the cloud with the key aspects being Data Rights, Data Sharing, and Data Validation. Also, this approach aims to increase its interoperability and scalability by eliminating the need for data migration. This will ensure that existing public cloud-based systems can easily deploy blockchain enhancing trustworthiness and non-repudiation of cloud data.

Updated: 2024-05-29 03:36:35

标题: 利用区块链技术缓解公共云中的数据共享

摘要: 公共云计算已经成为现代IT基础设施的一个基本组成部分，因为它的采用已经改变了企业运营的方式。然而，云安全问题引入了与数据保护、共享和访问控制相关的新风险和挑战。区块链与云的协同集成具有巨大潜力。区块链的分布式账本确保透明度、不可变性和效率，因为它减少了对中心化机构的依赖。在此基础上，我们的框架提出了在云中建立安全数据生态系统的关键方面，包括数据权利、数据共享和数据验证。此外，这种方法旨在通过消除数据迁移的需要来增加其互操作性和可扩展性。这将确保现有的基于公共云的系统可以轻松部署区块链，增强云数据的可信度和不可否认性。

更新时间: 2024-05-29 03:36:35

领域: cs.CR

下载: http://arxiv.org/abs/2404.16872v2

Protecting Split Learning by Potential Energy Loss

As a practical privacy-preserving learning method, split learning has drawn much attention in academia and industry. However, its security is constantly being questioned since the intermediate results are shared during training and inference. In this paper, we focus on the privacy leakage from the forward embeddings of split learning. Specifically, since the forward embeddings contain too much information about the label, the attacker can either use a few labeled samples to fine-tune the top model or perform unsupervised attacks such as clustering to infer the true labels from the forward embeddings. To prevent such kind of privacy leakage, we propose the potential energy loss to make the forward embeddings become more 'complicated', by pushing embeddings of the same class towards the decision boundary. Therefore, it is hard for the attacker to learn from the forward embeddings. Experiment results show that our method significantly lowers the performance of both fine-tuning attacks and clustering attacks.

Updated: 2024-05-29 03:27:49

标题: 通过潜在能量损失保护分割学习

摘要: 作为一种实际的隐私保护学习方法，分布式学习引起了学术界和工业界的广泛关注。然而，由于在训练和推断过程中共享中间结果，其安全性不断受到质疑。本文关注从分布式学习的前向嵌入中泄露隐私。具体来说，由于前向嵌入包含太多关于标签的信息，攻击者可以使用少量标记样本微调顶层模型，或者执行无监督攻击，如聚类，从前向嵌入中推断真实标签。为防止这种隐私泄露，我们提出了潜在能量损失，使前向嵌入变得更加“复杂”，通过将同一类别的嵌入推向决策边界。因此，攻击者很难从前向嵌入中学习。实验结果表明，我们的方法显著降低了微调攻击和聚类攻击的性能。

更新时间: 2024-05-29 03:27:49

领域: cs.CR,cs.AI,cs.DC,cs.LG

下载: http://arxiv.org/abs/2210.09617v2

Efficient Learning in Chinese Checkers: Comparing Parameter Sharing in Multi-Agent Reinforcement Learning

We show that multi-agent reinforcement learning (MARL) with full parameter sharing outperforms independent and partially shared architectures in the competitive perfect-information homogenous game of Chinese Checkers. To run our experiments, we develop a new MARL environment: variable-size, six-player Chinese Checkers. This custom environment was developed in PettingZoo and supports all traditional rules of the game including chaining jumps. This is, to the best of our knowledge, the first implementation of Chinese Checkers that remains faithful to the true game. Chinese Checkers is difficult to learn due to its large branching factor and potentially infinite horizons. We borrow the concept of branching actions (submoves) from complex action spaces in other RL domains, where a submove may not end a player's turn immediately. This drastically reduces the dimensionality of the action space. Our observation space is inspired by AlphaGo with many binary game boards stacked in a 3D array to encode information. The PettingZoo environment, training and evaluation logic, and analysis scripts can be found on \href{https://github.com/noahadhikari/pettingzoo-chinese-checkers}{Github}.

Updated: 2024-05-29 03:27:30

标题: 《中国跳棋中的高效学习：比较多智能体强化学习中的参数共享》

摘要: 我们展示了在中国跳棋这个竞争性完全信息同质游戏中，全参数共享的多智能体强化学习（MARL）优于独立和部分共享架构。为了运行我们的实验，我们开发了一个新的MARL环境：可变大小的六个玩家中国跳棋。这个定制环境是在PettingZoo中开发的，并支持游戏的所有传统规则，包括链接跳跃。据我们所知，这是对中国跳棋的第一个忠实于真实游戏的实现。由于其大的分支因子和潜在的无限视野，中国跳棋很难学习。我们从其他RL领域的复杂动作空间中借鉴了分支动作（子动作）的概念，其中一个子动作可能不会立即结束玩家的回合。这极大地减少了动作空间的维度。我们的观察空间受到AlphaGo的启发，其中许多二进制游戏板堆叠在3D数组中以编码信息。 PettingZoo环境、训练和评估逻辑以及分析脚本可以在\href{https://github.com/noahadhikari/pettingzoo-chinese-checkers}{Github}上找到。

更新时间: 2024-05-29 03:27:30

领域: cs.AI

下载: http://arxiv.org/abs/2405.18733v1

Approximate Thompson Sampling for Learning Linear Quadratic Regulators with $O(\sqrt{T})$ Regret

We propose an approximate Thompson sampling algorithm that learns linear quadratic regulators (LQR) with an improved Bayesian regret bound of $O(\sqrt{T})$. Our method leverages Langevin dynamics with a meticulously designed preconditioner as well as a simple excitation mechanism. We show that the excitation signal induces the minimum eigenvalue of the preconditioner to grow over time, thereby accelerating the approximate posterior sampling process. Moreover, we identify nontrivial concentration properties of the approximate posteriors generated by our algorithm. These properties enable us to bound the moments of the system state and attain an $O(\sqrt{T})$ regret bound without the unrealistic restrictive assumptions on parameter sets that are often used in the literature.

Updated: 2024-05-29 03:24:56

标题: 用近似汤普森抽样学习具有$O(\sqrt{T})$遗憾的线性二次调节器

摘要: 我们提出了一种近似的Thompson采样算法，用于学习具有改进的贝叶斯遗憾界限为$O(\sqrt{T})$的线性二次调节器（LQR）。我们的方法利用了具有精心设计的预处理器和简单激励机制的Langevin动力学。我们展示了激励信号导致预处理器的最小特征值随时间增长，从而加速了近似后验采样过程。此外，我们确定了由我们的算法生成的近似后验的非平凡浓度特性。这些特性使我们能够限制系统状态的矩，并获得一个$O(\sqrt{T})$的遗憾界限，而不需要文献中经常使用的对参数集的不切实际的限制性假设。

更新时间: 2024-05-29 03:24:56

领域: stat.ML,cs.LG,cs.SY,eess.SY

下载: http://arxiv.org/abs/2405.19380v1

Gemini & Physical World: Large Language Models Can Estimate the Intensity of Earthquake Shaking from Multi-Modal Social Media Posts

This paper presents a novel approach for estimating the ground shaking intensity using social media data and CCTV footage. Employing the Gemini Pro (Reid et al. 2024) model, a multi-modal language model, we demonstrate the ability to extract relevant information from unstructured data utilizing generative AI and natural language processing. The model output, in the form of Modified Mercalli Intensity (MMI) values, align well with independent observational data. Furthermore, our results suggest that beyond its advanced visual and auditory understanding abilities, Gemini appears to utilize additional sources of knowledge, including a simplified understanding of the general relationship between earthquake magnitude, distance, and MMI intensity, which it presumably acquired during its training, in its reasoning and decision-making processes. These findings raise intriguing questions about the extent of Gemini's general understanding of the physical world and its phenomena. The ability of Gemini to generate results consistent with established scientific knowledge highlights the potential of LLMs like Gemini in augmenting our understanding of complex physical phenomena such as earthquakes. More specifically, the results of this study highlight the potential of LLMs like Gemini to revolutionize citizen seismology by enabling rapid, effective, and flexible analysis of crowdsourced data from eyewitness accounts for assessing earthquake impact and providing crisis situational awareness. This approach holds great promise for improving early warning systems, disaster response, and overall resilience in earthquake-prone regions. This study provides a significant step toward harnessing the power of social media and AI for earthquake disaster mitigation.

Updated: 2024-05-29 03:23:34

标题: 双子座与物理世界：大型语言模型可以从多模态社交媒体帖子中估计地震震动的强度

摘要: 这篇论文提出了一种利用社交媒体数据和闭路电视录像来估计地面震动强度的新方法。采用了Gemini Pro (Reid et al. 2024)模型，一种多模式语言模型，我们展示了利用生成式人工智能和自然语言处理从非结构化数据中提取相关信息的能力。模型输出以修改的默卡利强度（MMI）值的形式与独立观测数据很好地吻合。此外，我们的结果表明，Gemini除了具有先进的视听理解能力外，还似乎利用了其他知识源，包括对地震震级、距离和MMI强度之间一般关系的简化理解，在其推理和决策过程中可能在训练过程中获得。这些发现引发了有关Gemini对物理世界及其现象的一般理解程度的有趣问题。Gemini生成与已建立的科学知识一致的结果的能力突显了像Gemini这样的LLM在增强我们对复杂物理现象（如地震）的理解方面的潜力。更具体地说，本研究的结果突显了像Gemini这样的LLM在通过对目击者账目的众包数据进行快速、有效和灵活分析以评估地震影响并提供危机情景意识方面的潜力，从而革新了公民地震学。这种方法在改进早期预警系统、灾难响应和地震多发区的整体弹性方面具有巨大潜力。这项研究是利用社交媒体和人工智能来减轻地震灾害的重要一步。

更新时间: 2024-05-29 03:23:34

领域: physics.geo-ph,cs.AI,cs.LG,physics.app-ph

下载: http://arxiv.org/abs/2405.18732v1

VBIM-Net: Variational Born Iterative Network for Inverse Scattering Problems

Recently, studies have shown the potential of integrating field-type iterative methods with deep learning (DL) techniques in solving inverse scattering problems (ISPs). In this article, we propose a novel Variational Born Iterative Network, namely, VBIM-Net, to solve the full-wave ISPs with significantly improved flexibility and inversion quality. The proposed VBIM-Net emulates the alternating updates of the total electric field and the contrast in the variational Born iterative method (VBIM) by multiple layers of subnetworks. We embed the calculation of the contrast variation into each of the subnetworks, converting the scattered field residual into an approximate contrast variation and then enhancing it by a U-Net, thus avoiding the requirement of matched measurement dimension and grid resolution as in existing approaches. The total field and contrast of each layer's output is supervised in the loss function of VBIM-Net, which guarantees the physical interpretability of variables of the subnetworks. In addition, we design a training scheme with extra noise to enhance the model's stability. Extensive numerical results on synthetic and experimental data both verify the inversion quality, generalization ability, and robustness of the proposed VBIM-Net. This work may provide some new inspiration for the design of efficient field-type DL schemes.

Updated: 2024-05-29 03:21:09

标题: VBIM-Net：变分Born迭代网络用于反散射问题

摘要: 最近的研究表明，将场型迭代方法与深度学习（DL）技术相结合在解决逆散射问题（ISPs）方面具有潜力。在本文中，我们提出了一种新颖的变分波恩迭代网络，即VBIM-Net，用于解决全波ISPs，具有显著提高的灵活性和反演质量。所提出的VBIM-Net通过多层次的子网络模拟了变分波恩迭代方法（VBIM）中总电场和对比度的交替更新。我们将对比度变化的计算嵌入到每个子网络中，将散射场残差转换为近似对比度变化，然后通过U-Net加强，从而避免了现有方法中需要匹配测量维度和网格分辨率的要求。每个层输出的总场和对比度都在VBIM-Net的损失函数中受到监督，这保证了子网络变量的物理可解释性。此外，我们设计了一个带有额外噪声的训练方案，以增强模型的稳定性。在合成和实验数据上的大量数值结果都验证了所提出的VBIM-Net的反演质量、泛化能力和鲁棒性。这项工作可能为高效的场型DL方案的设计提供一些新的启发。

更新时间: 2024-05-29 03:21:09

领域: eess.SP,cs.AI,physics.comp-ph

下载: http://arxiv.org/abs/2405.18731v1

Preferred-Action-Optimized Diffusion Policies for Offline Reinforcement Learning

Offline reinforcement learning (RL) aims to learn optimal policies from previously collected datasets. Recently, due to their powerful representational capabilities, diffusion models have shown significant potential as policy models for offline RL issues. However, previous offline RL algorithms based on diffusion policies generally adopt weighted regression to improve the policy. This approach optimizes the policy only using the collected actions and is sensitive to Q-values, which limits the potential for further performance enhancement. To this end, we propose a novel preferred-action-optimized diffusion policy for offline RL. In particular, an expressive conditional diffusion model is utilized to represent the diverse distribution of a behavior policy. Meanwhile, based on the diffusion model, preferred actions within the same behavior distribution are automatically generated through the critic function. Moreover, an anti-noise preference optimization is designed to achieve policy improvement by using the preferred actions, which can adapt to noise-preferred actions for stable training. Extensive experiments demonstrate that the proposed method provides competitive or superior performance compared to previous state-of-the-art offline RL methods, particularly in sparse reward tasks such as Kitchen and AntMaze. Additionally, we empirically prove the effectiveness of anti-noise preference optimization.

Updated: 2024-05-29 03:19:59

标题: 离线强化学习中优先动作优化扩散策略

摘要: 离线强化学习（RL）旨在从先前收集的数据集中学习最优策略。最近，由于其强大的表示能力，扩散模型已经显示出作为离线RL问题的策略模型的显著潜力。然而，先前基于扩散策略的离线RL算法通常采用加权回归来改进策略。这种方法仅使用收集到的行动来优化策略，并且对Q值敏感，限制了进一步性能提升的潜力。因此，我们提出了一种新颖的优化偏好动作的扩散策略用于离线RL。具体地，利用表达丰富的条件扩散模型来表示行为策略的多样分布。同时，基于扩散模型，通过评论功能自动生成相同行为分布中的优选动作。此外，设计了一种反噪声偏好优化来通过使用优选动作实现策略改进，这可以适应噪声偏好动作以实现稳定训练。大量实验证明，与先前最先进的离线RL方法相比，提出的方法在稀疏奖励任务（如厨房和AntMaze）中提供了具有竞争力或优越的性能。此外，我们通过实验证明了反噪声偏好优化的有效性。

更新时间: 2024-05-29 03:19:59

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.18729v1

CtrlA: Adaptive Retrieval-Augmented Generation via Probe-Guided Control

Retrieval-augmented generation (RAG) has emerged as a promising solution for mitigating hallucinations of large language models (LLMs) with retrieved external knowledge. Adaptive RAG enhances this approach by dynamically assessing the retrieval necessity, aiming to balance external and internal knowledge usage. However, existing adaptive RAG methods primarily realize retrieval on demand by relying on superficially verbalize-based or probability-based feedback of LLMs, or directly fine-tuning LLMs via carefully crafted datasets, resulting in unreliable retrieval necessity decisions, heavy extra costs, and sub-optimal response generation. We present the first attempts to delve into the internal states of LLMs to mitigate such issues by introducing an effective probe-guided adaptive RAG framework, termed CtrlA. Specifically, CtrlA employs an honesty probe to regulate the LLM's behavior by manipulating its representations for increased honesty, and a confidence probe to monitor the internal states of LLM and assess confidence levels, determining the retrieval necessity during generation. Experiments show that CtrlA is superior to existing adaptive RAG methods on a diverse set of tasks, the honesty control can effectively make LLMs more honest and confidence monitoring is proven to be a promising indicator of retrieval trigger. Our codes are available at https://github.com/HSLiu-Initial/CtrlA.git.

Updated: 2024-05-29 03:17:16

标题: CtrlA：通过探针引导控制的自适应检索增强生成

摘要: 检索增强生成（RAG）已经成为一种有希望的解决方案，用于减轻大型语言模型（LLMs）的幻觉，通过检索外部知识。自适应RAG通过动态评估检索的必要性，旨在平衡外部和内部知识的使用。然而，现有的自适应RAG方法主要通过依赖LLMs的表面化基础或基于概率的反馈，或者通过精心设计的数据集直接微调LLMs来实现按需检索，导致不可靠的检索必要性决策，额外的高成本和次优的响应生成。我们首次尝试通过引入一种有效的探测引导自适应RAG框架来深入研究LLMs的内部状态，称为CtrlA。具体来说，CtrlA利用一个诚实探测器通过操纵LLM的表示来调节其行为，增加诚实度，并使用一个自信探测器来监控LLM的内部状态，评估自信水平，在生成过程中确定检索的必要性。实验表明，CtrlA在各种任务中优于现有的自适应RAG方法，诚实控制可以有效使LLMs更加诚实，自信监控被证明是检索触发的一个有前景的指标。我们的代码可在https://github.com/HSLiu-Initial/CtrlA.git 上找到。

更新时间: 2024-05-29 03:17:16

领域: cs.CL,cs.AI,cs.IR

下载: http://arxiv.org/abs/2405.18727v1

Can We Enhance the Quality of Mobile Crowdsensing Data Without Ground Truth?

Mobile crowdsensing (MCS) has emerged as a prominent trend across various domains. However, ensuring the quality of the sensing data submitted by mobile users (MUs) remains a complex and challenging problem. To address this challenge, an advanced method is required to detect low-quality sensing data and identify malicious MUs that may disrupt the normal operations of an MCS system. Therefore, this article proposes a prediction- and reputation-based truth discovery (PRBTD) framework, which can separate low-quality data from high-quality data in sensing tasks. First, we apply a correlation-focused spatial-temporal transformer network to predict the ground truth of the input sensing data. Then, we extract the sensing errors of the data as features based on the prediction results to calculate the implications among the data. Finally, we design a reputation-based truth discovery (TD) module for identifying low-quality data with their implications. Given sensing data submitted by MUs, PRBTD can eliminate the data with heavy noise and identify malicious MUs with high accuracy. Extensive experimental results demonstrate that PRBTD outperforms the existing methods in terms of identification accuracy and data quality enhancement.

Updated: 2024-05-29 03:16:12

标题: 我们能否在没有地面实况的情况下提高移动众感数据的质量？

摘要: 移动众包感知(MCS)已成为各个领域的一个突出趋势。然而，确保移动用户(MUs)提交的感知数据的质量仍然是一个复杂而具有挑战性的问题。为了解决这一挑战，需要一种先进的方法来检测低质量的感知数据，并识别可能破坏MCS系统正常运行的恶意MUs。因此，本文提出了一种基于预测和声誉的真实发现(PRBTD)框架，该框架可以在感知任务中区分低质量数据和高质量数据。首先，我们应用一个以相关性为焦点的时空变换器网络来预测输入感知数据的真实情况。然后，我们基于预测结果提取数据的感知错误作为特征来计算数据之间的影响。最后，我们设计了一个基于声誉的真实发现(TD)模块来识别具有影响的低质量数据。给定由MUs提交的感知数据，PRBTD可以消除具有严重噪音的数据，并高准确度地识别恶意MUs。大量实验结果表明，PRBTD在识别准确性和数据质量提升方面优于现有方法。

更新时间: 2024-05-29 03:16:12

领域: cs.LG,cs.MA

下载: http://arxiv.org/abs/2405.18725v1

Boosting Flow-based Generative Super-Resolution Models via Learned Prior

Flow-based super-resolution (SR) models have demonstrated astonishing capabilities in generating high-quality images. However, these methods encounter several challenges during image generation, such as grid artifacts, exploding inverses, and suboptimal results due to a fixed sampling temperature. To overcome these issues, this work introduces a conditional learned prior to the inference phase of a flow-based SR model. This prior is a latent code predicted by our proposed latent module conditioned on the low-resolution image, which is then transformed by the flow model into an SR image. Our framework is designed to seamlessly integrate with any contemporary flow-based SR model without modifying its architecture or pre-trained weights. We evaluate the effectiveness of our proposed framework through extensive experiments and ablation analyses. The proposed framework successfully addresses all the inherent issues in flow-based SR models and enhances their performance in various SR scenarios. Our code is available at: https://github.com/liyuantsao/BFSR

Updated: 2024-05-29 03:12:58

标题: 通过学习先验知识增强基于流的生成式超分辨率模型

摘要: 基于流的超分辨率（SR）模型已经展示出在生成高质量图像方面的惊人能力。然而，这些方法在图像生成过程中遇到了一些挑战，如网格伪影、反转爆炸以及由于固定采样温度导致的次优结果。为了克服这些问题，本研究在流式SR模型的推理阶段引入了一个条件学习的先验。这个先验是由我们提出的潜变量模块根据低分辨率图像预测的潜在代码，然后通过流模型转换为SR图像。我们的框架旨在与任何当代流式SR模型无缝集成，而无需修改其架构或预训练权重。我们通过大量实验和消融分析评估了我们提出的框架的有效性。提出的框架成功解决了流式SR模型中的所有固有问题，并增强了它们在各种SR场景中的性能。我们的代码可在以下链接找到：https://github.com/liyuantsao/BFSR

更新时间: 2024-05-29 03:12:58

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2403.10988v3

Adapting Differential Molecular Representation with Hierarchical Prompts for Multi-label Property Prediction

Accurate prediction of molecular properties is critical in the field of drug discovery. However, existing methods do not fully consider the fact that molecules in the real world usually possess multiple property labels, and complex high-order relationships may exist among these labels. Therefore, molecular representation learning models should generate differential molecular representations that consider multi-granularity correlation information among tasks. To this end, our research introduces a Hierarchical Prompted Molecular Representation Learning Framework (HiPM), which enhances the differential expression of tasks in molecular representations through task-aware prompts, and utilizes shared information among labels to mitigate negative transfer between different tasks. HiPM primarily consists of two core components: the Molecular Representation Encoder (MRE) and the Task-Aware Prompter (TAP). The MRE employs a hierarchical message-passing network architecture to capture molecular features at both the atomic and motif levels, while the TAP uses agglomerative hierarchical clustering to build a prompt tree that reflects the affinity and distinctiveness of tasks, enabling the model to effectively handle the complexity of multi-label property predictions. Extensive experiments demonstrate that HiPM achieves state-of-the-art performance across various multi-label datasets, offering a new perspective on multi-label molecular representation learning.

Updated: 2024-05-29 03:10:21

标题: 使用分层提示调整差异性分子表示以进行多标签属性预测

摘要: 准确预测分子性质在药物发现领域至关重要。然而，现有方法并未充分考虑现实世界中分子通常具有多个属性标签，并且这些标签之间可能存在复杂的高阶关系。因此，分子表示学习模型应生成考虑任务之间多粒度相关性信息的差异性分子表示。为此，我们的研究引入了一个层级提示的分子表示学习框架（HiPM），通过任务感知提示增强分子表示中任务的差异表达，并利用标签之间的共享信息来减轻不同任务之间的负迁移。HiPM主要由两个核心组件组成：分子表示编码器（MRE）和任务感知提示器（TAP）。MRE采用分层消息传递网络架构捕获原子和图案级别的分子特征，而TAP使用聚类分层聚类构建一个反映任务亲和力和独特性的提示树，使模型能够有效处理多标签属性预测的复杂性。大量实验证明，HiPM在各种多标签数据集上实现了最先进的性能，为多标签分子表示学习提供了新的视角。

更新时间: 2024-05-29 03:10:21

领域: q-bio.QM,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.18724v1

Conformal Depression Prediction

While existing depression recognition methods based on deep learning show promise, their practical application is hindered by the lack of trustworthiness, as these deep models are often deployed as \textit{black box} models, leaving us uncertain about the confidence of the model predictions. For high-risk clinical applications like depression recognition, uncertainty quantification is essential in decision-making. In this paper, we introduce conformal depression prediction (CDP), a depression recognition method with uncertainty quantification based on conformal prediction (CP), giving valid confidence intervals with theoretical coverage guarantees for the model predictions. CDP is a plug-and-play module that requires neither model retraining nor an assumption about the depression data distribution. As CDP provides only an average performance guarantee across all inputs rather than per-input performance guarantee, we propose CDP-ACC, an improved conformal prediction with approximate conditional coverage. CDP-ACC firstly estimates the prediction distribution through neighborhood relaxation, and then introduces a conformal score function by constructing nested sequences, so as to provide tighter prediction interval for each specific input. We empirically demonstrate the application of uncertainty quantification in depression recognition, and the effectiveness and superiority of CDP and CDP-ACC on the AVEC 2013 and AVEC 2014 datasets

Updated: 2024-05-29 03:08:30

标题: 共形抑郁预测

摘要: 尽管基于深度学习的现有抑郁症识别方法显示出潜力，但它们的实际应用受到了信任度的阻碍，因为这些深度模型通常被部署为"黑盒"模型，让我们对模型预测的置信度感到不确定。对于像抑郁症识别这样的高风险临床应用，不确定性量化在决策中至关重要。在本文中，我们介绍了一种基于符合预测（CP）的不确定性量化的抑郁症预测（CDP）方法，为模型预测提供具有理论覆盖保证的有效置信区间。CDP是一个即插即用的模块，既不需要重新训练模型，也不需要对抑郁数据分布做出假设。由于CDP只提供所有输入的平均性能保证而不是每个输入的性能保证，因此我们提出了CDP-ACC，一种改进的近似条件覆盖的符合预测。CDP-ACC首先通过邻域放松来估计预测分布，然后通过构建嵌套序列引入符合得分函数，从而为每个特定输入提供更紧密的预测区间。我们在AVEC 2013和AVEC 2014数据集上通过实证验证了不确定性量化在抑郁症识别中的应用，并展示了CDP和CDP-ACC的有效性和优越性。

更新时间: 2024-05-29 03:08:30

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.18723v1

Correctable Landmark Discovery via Large Models for Vision-Language Navigation

Vision-Language Navigation (VLN) requires the agent to follow language instructions to reach a target position. A key factor for successful navigation is to align the landmarks implied in the instruction with diverse visual observations. However, previous VLN agents fail to perform accurate modality alignment especially in unexplored scenes, since they learn from limited navigation data and lack sufficient open-world alignment knowledge. In this work, we propose a new VLN paradigm, called COrrectable LaNdmark DiScOvery via Large ModEls (CONSOLE). In CONSOLE, we cast VLN as an open-world sequential landmark discovery problem, by introducing a novel correctable landmark discovery scheme based on two large models ChatGPT and CLIP. Specifically, we use ChatGPT to provide rich open-world landmark cooccurrence commonsense, and conduct CLIP-driven landmark discovery based on these commonsense priors. To mitigate the noise in the priors due to the lack of visual constraints, we introduce a learnable cooccurrence scoring module, which corrects the importance of each cooccurrence according to actual observations for accurate landmark discovery. We further design an observation enhancement strategy for an elegant combination of our framework with different VLN agents, where we utilize the corrected landmark features to obtain enhanced observation features for action decision. Extensive experimental results on multiple popular VLN benchmarks (R2R, REVERIE, R4R, RxR) show the significant superiority of CONSOLE over strong baselines. Especially, our CONSOLE establishes the new state-of-the-art results on R2R and R4R in unseen scenarios. Code is available at https://github.com/expectorlin/CONSOLE.

Updated: 2024-05-29 03:05:59

标题: 通过大型模型进行视觉-语言导航的可矫正地标发现

摘要: 视觉语言导航(VLN)要求代理根据语言指令到达目标位置。成功导航的关键因素是将指令中暗示的地标与不同的视觉观察对齐。然而，先前的VLN代理在未知场景中尤其在准确的模态对齐方面失败，因为它们学习于有限的导航数据并且缺乏足够的开放世界对齐知识。在这项工作中，我们提出了一种新的VLN范式，称为COrrectable LaNdmark DiScOvery via Large ModEls (CONSOLE)。在CONSOLE中，我们将VLN视为一个开放世界的顺序地标发现问题，引入了一种基于两个大型模型ChatGPT和CLIP的新颖可校准的地标发现方案。具体来说，我们使用ChatGPT提供丰富的开放世界地标共现常识，并根据这些常识先验进行基于CLIP的地标发现。为了减轻由于缺乏视觉约束而导致的先验中的噪声，我们引入了一个可学习的共现评分模块，根据实际观察来校正每个共现的重要性以实现准确的地标发现。我们进一步为与不同的VLN代理的框架进行优雅结合设计了一种观察增强策略，其中我们利用校正后的地标特征来获取增强的观察特征以进行动作决策。对多个流行的VLN基准(R2R, REVERIE, R4R, RxR)进行的广泛实验结果显示CONSOLE相对于强基线具有显著的优越性。特别是，我们的CONSOLE在未知场景中建立了R2R和R4R的最新技术结果。代码可在https://github.com/expectorlin/CONSOLE获得。

更新时间: 2024-05-29 03:05:59

领域: cs.CV,cs.AI,cs.CL

下载: http://arxiv.org/abs/2405.18721v1

Contextual Position Encoding: Learning to Count What's Important

The attention mechanism is a critical component of Large Language Models (LLMs) that allows tokens in a sequence to interact with each other, but is order-invariant. Incorporating position encoding (PE) makes it possible to address by position, such as attending to the i-th token. However, current PE methods use token counts to derive position, and thus cannot generalize to higher levels of abstraction, such as attending to the i-th sentence. In this paper, we propose a new position encoding method, Contextual Position Encoding (CoPE), that allows positions to be conditioned on context by incrementing position only on certain tokens determined by the model. This allows more general position addressing such as attending to the $i$-th particular word, noun, or sentence. We show that CoPE can solve the selective copy, counting and Flip-Flop tasks where popular position embeddings fail, and improves perplexity on language modeling and coding tasks.

Updated: 2024-05-29 02:57:15

标题: 情境位置编码：学会计算重要的内容

摘要: 注意机制是大型语言模型（LLMs）的关键组件，允许序列中的令牌相互交互，但是与顺序无关。通过引入位置编码（PE），可以按位置进行处理，例如关注第i个令牌。然而，当前的PE方法使用令牌计数来推导位置，因此无法推广到更高的抽象级别，例如关注第i个句子。在本文中，我们提出了一种新的位置编码方法，即上下文位置编码（CoPE），允许位置根据上下文进行调节，只在由模型确定的某些令牌上增加位置。这允许更一般的位置定位，如关注第$i$个特定单词、名词或句子。我们展示了CoPE可以解决选择性复制、计数和Flip-Flop任务，这些任务是流行的位置嵌入失败的地方，并且在语言建模和编码任务中提高了困惑度。

更新时间: 2024-05-29 02:57:15

领域: cs.CL,cs.AI

下载: http://arxiv.org/abs/2405.18719v1

LangCell: Language-Cell Pre-training for Cell Identity Understanding

Cell identity encompasses various semantic aspects of a cell, including cell type, pathway information, disease information, and more, which are essential for biologists to gain insights into its biological characteristics. Understanding cell identity from the transcriptomic data, such as annotating cell types, have become an important task in bioinformatics. As these semantic aspects are determined by human experts, it is impossible for AI models to effectively carry out cell identity understanding tasks without the supervision signals provided by single-cell and label pairs. The single-cell pre-trained language models (PLMs) currently used for this task are trained only on a single modality, transcriptomics data, lack an understanding of cell identity knowledge. As a result, they have to be fine-tuned for downstream tasks and struggle when lacking labeled data with the desired semantic labels. To address this issue, we propose an innovative solution by constructing a unified representation of single-cell data and natural language during the pre-training phase, allowing the model to directly incorporate insights related to cell identity. More specifically, we introduce LangCell, the first Language-Cell pre-training framework. LangCell utilizes texts enriched with cell identity information to gain a profound comprehension of cross-modal knowledge. Results from experiments conducted on different benchmarks show that LangCell is the only single-cell PLM that can work effectively in zero-shot cell identity understanding scenarios, and also significantly outperforms existing models in few-shot and fine-tuning cell identity understanding scenarios.

Updated: 2024-05-29 02:54:47

标题: LangCell：细胞身份理解的语言细胞预训练

摘要: 细胞身份包括细胞的各种语义方面，包括细胞类型、通路信息、疾病信息等，这些对生物学家来说至关重要，可以帮助他们了解细胞的生物特征。从转录组数据了解细胞身份，如注释细胞类型，已经成为生物信息学中的重要任务。由于这些语义方面是由人类专家确定的，因此AI模型在没有单细胞和标签对提供的监督信号的情况下，无法有效地执行细胞身份理解任务。目前用于此任务的单细胞预训练语言模型（PLM）仅在单一模态转录组数据上进行训练，缺乏对细胞身份知识的理解。因此，它们必须针对下游任务进行微调，在缺乏具有所需语义标签的标记数据时遇到困难。为了解决这个问题，我们提出了一种创新的解决方案，通过在预训练阶段构建单细胞数据和自然语言的统一表示，使模型能够直接融入与细胞身份相关的见解。具体来说，我们介绍了LangCell，第一个语言-细胞预训练框架。LangCell利用富含细胞身份信息的文本来深入理解跨模态知识。在不同基准测试上进行的实验结果显示，LangCell是唯一能在零-shot细胞身份理解场景中有效工作的单细胞PLM，同时在少-shot和微调细胞身份理解场景中明显优于现有模型。

更新时间: 2024-05-29 02:54:47

领域: q-bio.GN,cs.AI,cs.CL

下载: http://arxiv.org/abs/2405.06708v3

SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks

Large language models (LLMs) have proven to be highly effective across various natural language processing tasks. However, their large number of parameters poses significant challenges for practical deployment. Pruning, a technique aimed at reducing the size and complexity of LLMs, offers a potential solution by removing redundant components from the network. Despite the promise of pruning, existing methods often struggle to achieve substantial end-to-end LLM inference speedup. In this paper, we introduce SLEB, a novel approach designed to streamline LLMs by eliminating redundant transformer blocks. We choose the transformer block as the fundamental unit for pruning, because LLMs exhibit block-level redundancy with high similarity between the outputs of neighboring blocks. This choice allows us to effectively enhance the processing speed of LLMs. Our experimental results demonstrate that SLEB outperforms previous LLM pruning methods in accelerating LLM inference while also maintaining superior perplexity and accuracy, making SLEB as a promising technique for enhancing the efficiency of LLMs. The code is available at: https://github.com/jiwonsong-dev/SLEB.

Updated: 2024-05-29 02:53:16

标题: SLEB: 通过冗余验证和消除变压器块实现LLMs的优化

摘要: 大型语言模型（LLMs）已被证明在各种自然语言处理任务中非常有效。然而，它们庞大的参数数量给实际部署带来了重大挑战。修剪是一种旨在减小LLMs大小和复杂性的技术，通过从网络中删除冗余组件提供潜在解决方案。尽管修剪有希望，现有方法通常难以实现显著的端到端LLM推理加速。在本文中，我们介绍了SLEB，一种新颖的方法，旨在通过消除冗余的transformer块来简化LLMs。我们选择transformer块作为修剪的基本单元，因为LLMs在输出相似度较高的相邻块之间存在块级冗余。这种选择使我们能够有效提高LLMs的处理速度。我们的实验结果表明，SLEB在加速LLMs推理方面优于以前的LLM修剪方法，同时保持出色的困惑度和准确性，使SLEB成为提高LLMs效率的有前途的技术。代码可在以下链接找到：https://github.com/jiwonsong-dev/SLEB。

更新时间: 2024-05-29 02:53:16

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2402.09025v2

Calibrating Reasoning in Language Models with Internal Consistency

Large language models (LLMs) have demonstrated impressive capabilities in various reasoning tasks, aided by techniques like chain-of-thought (CoT) prompting that elicits verbalized reasoning. However, LLMs often generate text with obvious mistakes and contradictions, raising doubts about their ability to robustly process and utilize generated rationales. In this work, we investigate CoT reasoning in LLMs through the lens of internal representations, focusing on how these representations are influenced by generated rationales. Our preliminary analysis reveals that while generated rationales improve answer accuracy, inconsistencies emerge between the model's internal representations in middle layers and those in final layers, potentially undermining the reliability of their reasoning processes. To address this, we propose internal consistency as a measure of the model's confidence by examining the agreement of latent predictions decoded from intermediate layers. Extensive empirical studies across different models and datasets demonstrate that internal consistency effectively distinguishes between correct and incorrect reasoning paths. Motivated by this, we propose a new approach to calibrate CoT reasoning by up-weighting reasoning paths with high internal consistency, resulting in a significant boost in reasoning performance. Further analysis uncovers distinct patterns in attention and feed-forward modules across layers, providing insights into the emergence of internal inconsistency. In summary, our results demonstrate the potential of using internal representations for self-evaluation of LLMs.

Updated: 2024-05-29 02:44:12

标题: 用内部一致性校准语言模型的推理

摘要: 大型语言模型（LLMs）在各种推理任务中展示了令人印象深刻的能力，其中使用了像链式思维（CoT）提示这样的技术，引发口头推理。然而，LLMs经常生成带有明显错误和矛盾的文本，引发对它们处理和利用生成理由的能力的怀疑。在这项工作中，我们通过内部表示的视角研究LLMs中的CoT推理，重点关注这些表示受生成理由影响的方式。我们的初步分析显示，虽然生成的理由可以提高答案准确性，但该模型中间层和最终层的内部表示之间出现了不一致，可能会削弱其推理过程的可靠性。为了解决这个问题，我们提出了内部一致性作为模型信心的度量，通过检查从中间层解码的潜在预测的一致性来衡量。对不同模型和数据集进行的广泛实证研究表明，内部一致性有效地区分了正确和错误的推理路径。受此启发，我们提出了一种通过加权具有高内部一致性的推理路径来校准CoT推理的新方法，从而显著提升推理性能。进一步分析揭示了各层中关注和前馈模块中的不同模式，为内部不一致性的出现提供了见解。总之，我们的结果显示了利用内部表示进行LLMs自我评估的潜力。

更新时间: 2024-05-29 02:44:12

领域: cs.AI,cs.CL

下载: http://arxiv.org/abs/2405.18711v1

To FP8 and Back Again: Quantifying the Effects of Reducing Precision on LLM Training Stability

The massive computational costs associated with large language model (LLM) pretraining have spurred great interest in reduced-precision floating-point representations to accelerate the process. As a result, the BrainFloat16 (BF16) precision has become the de facto standard for LLM training, with hardware support included in recent accelerators. This trend has gone even further in the latest processors, where FP8 has recently been introduced. However, prior experience with FP16, which was found to be less stable than BF16, raises concerns as to whether FP8, with even fewer bits than FP16, can be a cost-effective option for LLM training. We argue that reduced-precision training schemes must have similar training stability and hyperparameter sensitivities to their higher-precision counterparts in order to be cost-effective. However, we find that currently available methods for FP8 training are not robust enough to allow their use as economical replacements. This prompts us to investigate the stability of reduced-precision LLM training in terms of robustness across random seeds and learning rates. To this end, we propose new evaluation techniques and a new metric for quantifying loss landscape sharpness in autoregressive language models. By simulating incremental bit reductions in floating-point representations, we analyze the relationship between representational power and training stability with the intent of aiding future research into the field.

Updated: 2024-05-29 02:42:23

标题: 回到FP8：量化减少精度对LLM训练稳定性的影响

摘要: 大语言模型（LLM）预训练所涉及的巨大计算成本引发了对减少精度浮点表示以加速过程的极大兴趣。结果，BrainFloat16（BF16）精度已成为LLM训练的事实标准，最近的加速器中包括了硬件支持。这一趋势在最新的处理器中甚至进一步发展，最近还引入了FP8。然而，FP16的先前经验发现其比BF16不稳定，引发了对FP8的关注，因为FP8比FP16的位数更少，是否可以成为LLM训练的经济选择。我们认为，减少精度训练方案必须具有与其高精度对应物相似的训练稳定性和超参数敏感性，以实现经济效益。然而，我们发现目前可用的FP8训练方法不够健壮，不能作为经济替代品。这促使我们研究减少精度LLM训练的稳定性，从随机种子和学习率的角度来看其稳健性。为此，我们提出了新的评估技术和用于量化自回归语言模型中损失景观尖锐度的新度量标准。通过模拟浮点表示中逐步减少位数，我们分析了表示能力与训练稳定性之间的关系，旨在促进未来对该领域的研究。

更新时间: 2024-05-29 02:42:23

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.18710v1

Relative Error Bound Analysis for Nuclear Norm Regularized Matrix Completion

In this paper, we develop a relative error bound for nuclear norm regularized matrix completion, with the focus on the completion of full-rank matrices. Under the assumption that the top eigenspaces of the target matrix are incoherent, we derive a relative upper bound for recovering the best low-rank approximation of the unknown matrix. Although multiple works have been devoted to analyzing the recovery error of full-rank matrix completion, their error bounds are usually additive, making it impossible to obtain the perfect recovery case and more generally difficult to leverage the skewed distribution of eigenvalues. Our analysis is built upon the optimality condition of the regularized formulation and existing guarantees for low-rank matrix completion. To the best of our knowledge, this is the first relative bound that has been proved for the regularized formulation of matrix completion.

Updated: 2024-05-29 02:39:59

标题: 核范数正则化矩阵填充的相对误差界分析

摘要: 在这篇论文中，我们为核范数正则化矩阵完成问题提出了一个相对误差界限，重点是完成满秩矩阵。在假设目标矩阵的顶部特征空间不相关的情况下，我们推导出了恢复未知矩阵最佳低秩逼近的相对上界。尽管已经有多篇论文致力于分析满秩矩阵完成的恢复误差，但它们的误差界限通常是加法的，这使得无法获得完美恢复情况，更普遍地难以利用特征值的偏斜分布。我们的分析建立在正则化公式的最优条件和现有的低秩矩阵完成保证基础上。据我们所知，这是第一个为矩阵完成的正则化公式证明的相对界限。

更新时间: 2024-05-29 02:39:59

领域: cs.LG,stat.ML

下载: http://arxiv.org/abs/1504.06817v2

Cognitive Evolutionary Learning to Select Feature Interactions for Recommender Systems

Feature interaction selection is a fundamental problem in commercial recommender systems. Most approaches equally enumerate all features and interactions by the same pre-defined operation under expert guidance. Their recommendation is unsatisfactory sometimes due to the following issues: (1)~They cannot ensure the learning abilities of models because their architectures are poorly adaptable to tasks and data; (2)~Useless features and interactions can bring unnecessary noise and complicate the training process. In this paper, we aim to adaptively evolve the model to select appropriate operations, features, and interactions under task guidance. Inspired by the evolution and functioning of natural organisms, we propose a novel \textsl{Cognitive EvoLutionary Learning (CELL)} framework, where cognitive ability refers to a property of organisms that allows them to react and survive in diverse environments. It consists of three stages, i.e., DNA search, genome search, and model functioning. Specifically, if we regard the relationship between models and tasks as the relationship between organisms and natural environments, interactions of feature pairs can be analogous to double-stranded DNA, of which relevant features and interactions can be analogous to genomes. Along this line, we diagnose the fitness of the model on operations, features, and interactions to simulate the survival rates of organisms for natural selection. We show that CELL can adaptively evolve into different models for different tasks and data, which enables practitioners to access off-the-shelf models. Extensive experiments on four real-world datasets demonstrate that CELL significantly outperforms state-of-the-art baselines. Also, we conduct synthetic experiments to ascertain that CELL can consistently discover the pre-defined interaction patterns for feature pairs.

Updated: 2024-05-29 02:35:23

标题: 认知进化学习用于为推荐系统选择特征交互

摘要: 特征交互选择是商业推荐系统中的一个基本问题。大多数方法在专家指导下同样枚举所有特征和交互，并通过相同的预定义操作进行推荐。有时他们的推荐不尽人意，原因如下：（1）他们无法确保模型的学习能力，因为它们的架构对任务和数据的适应性较差；（2）无用的特征和交互可能带来不必要的噪音，并使训练过程复杂化。本文旨在根据任务指导，自适应地演化模型以选择适当的操作、特征和交互。受自然生物的演化和功能启发，我们提出了一种新颖的“认知进化学习（CELL）”框架，其中认知能力指的是一种允许生物在不同环境中反应和生存的属性。它包括三个阶段，即DNA搜索、基因组搜索和模型功能。具体来说，如果我们将模型与任务之间的关系视为生物与自然环境之间的关系，特征对的交互可以类比于双链DNA，其中相关特征和交互可以类比于基因组。沿着这条线路，我们诊断模型在操作、特征和交互上的适应性，以模拟生物的存活率进行自然选择。我们展示了CELL可以自适应地演化成不同的模型，以应对不同的任务和数据，从而使从业者可以使用现成的模型。对四个真实世界数据集进行的广泛实验表明，CELL明显优于现有基线。此外，我们进行了合成实验，以确定CELL能够一致地发现预定义的特征对的交互模式。

更新时间: 2024-05-29 02:35:23

领域: cs.AI,cs.IR,cs.NE

下载: http://arxiv.org/abs/2405.18708v1

Adaptive and Parallel Split Federated Learning in Vehicular Edge Computing

Vehicular edge intelligence (VEI) is a promising paradigm for enabling future intelligent transportation systems by accommodating artificial intelligence (AI) at the vehicular edge computing (VEC) system. Federated learning (FL) stands as one of the fundamental technologies facilitating collaborative model training locally and aggregation, while safeguarding the privacy of vehicle data in VEI. However, traditional FL faces challenges in adapting to vehicle heterogeneity, training large models on resource-constrained vehicles, and remaining susceptible to model weight privacy leakage. Meanwhile, split learning (SL) is proposed as a promising collaborative learning framework which can mitigate the risk of model wights leakage, and release the training workload on vehicles. SL sequentially trains a model between a vehicle and an edge cloud (EC) by dividing the entire model into a vehicle-side model and an EC-side model at a given cut layer. In this work, we combine the advantages of SL and FL to develop an Adaptive Split Federated Learning scheme for Vehicular Edge Computing (ASFV). The ASFV scheme adaptively splits the model and parallelizes the training process, taking into account mobile vehicle selection and resource allocation. Our extensive simulations, conducted on non-independent and identically distributed data, demonstrate that the proposed ASFV solution significantly reduces training latency compared to existing benchmarks, while adapting to network dynamics and vehicles' mobility.

Updated: 2024-05-29 02:34:38

标题: 车辆边缘计算中的自适应和并行分割联邦学习

摘要: 车载边缘智能（VEI）是一种有前途的范式，可以通过在车辆边缘计算（VEC）系统中容纳人工智能（AI）来实现未来智能交通系统。联邦学习（FL）作为一种基础技术，促进了本地协作模型训练和聚合，同时保护了VEI中的车辆数据隐私。然而，传统的FL在适应车辆异构性、在资源受限的车辆上训练大型模型以及容易受到模型权重隐私泄露的挑战。同时，分裂学习（SL）被提出作为一种有前途的协作学习框架，可以减轻模型权重泄露的风险，并释放车辆上的训练负载。SL通过在给定的切割层将整个模型分割成车辆端模型和边缘云（EC）端模型来顺序训练模型。在这项工作中，我们结合了SL和FL的优势，开发了一种适应性分裂联邦学习方案用于车载边缘计算（ASFV）。ASFV方案自适应地分割模型并并行化训练过程，考虑了移动车辆选择和资源分配。我们进行了广泛的模拟实验，针对非独立和相同分布的数据，证明了所提出的ASFV解决方案与现有基准相比显著减少了训练延迟，同时适应了网络动态和车辆的移动性。

更新时间: 2024-05-29 02:34:38

领域: cs.LG,cs.AI,cs.NI

下载: http://arxiv.org/abs/2405.18707v1

Rethinking Invariance Regularization in Adversarial Training to Improve Robustness-Accuracy Trade-off

Although adversarial training has been the state-of-the-art approach to defend against adversarial examples (AEs), it suffers from a robustness-accuracy trade-off, where high robustness is achieved at the cost of clean accuracy. In this work, we leverage invariance regularization on latent representations to learn discriminative yet adversarially invariant representations, aiming to mitigate this trade-off. We analyze two key issues in representation learning with invariance regularization: (1) a "gradient conflict" between invariance loss and classification objectives, leading to suboptimal convergence, and (2) the mixture distribution problem arising from diverged distributions of clean and adversarial inputs. To address these issues, we propose Asymmetrically Representation-regularized Adversarial Training (AR-AT), which incorporates asymmetric invariance loss with stop-gradient operation and a predictor to improve the convergence, and a split-BatchNorm (BN) structure to resolve the mixture distribution problem. Our method significantly improves the robustness-accuracy trade-off by learning adversarially invariant representations without sacrificing discriminative ability. Furthermore, we discuss the relevance of our findings to knowledge-distillation-based defense methods, contributing to a deeper understanding of their relative successes.

Updated: 2024-05-29 02:30:40

标题: 重新思考对抗训练中的不变性正则化，以改善鲁棒性-准确性权衡Trade-off

摘要: 尽管对抗训练一直是对抗性例子（AEs）的最先进方法，但它面临着鲁棒性-准确性的权衡问题，高鲁棒性是以牺牲准确性为代价实现的。在这项工作中，我们利用不变性正则化的潜在表示来学习具有区分性但对抗性不变的表示，旨在缓解这种权衡问题。我们分析了使用不变性正则化进行表示学习的两个关键问题：（1）不变性损失与分类目标之间的“梯度冲突”，导致次优收敛，以及（2）由于干净和对抗性输入的分布分歧而产生的混合分布问题。为了解决这些问题，我们提出了不对称表示正则化对抗训练（AR-AT），它结合了不对称不变性损失、停梯度操作和一个预测器来提高收敛性，并使用分割-BatchNorm（BN）结构来解决混合分布问题。我们的方法通过学习对抗性不变的表示显著改善了鲁棒性-准确性的权衡，并且没有牺牲区分能力。此外，我们讨论了我们的发现与基于知识蒸馏的防御方法的相关性，有助于更深入地理解它们的相对成功。

更新时间: 2024-05-29 02:30:40

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2402.14648v2

Comprehensive Analysis of Network Robustness Evaluation Based on Convolutional Neural Networks with Spatial Pyramid Pooling

Connectivity robustness, a crucial aspect for understanding, optimizing, and repairing complex networks, has traditionally been evaluated through time-consuming and often impractical simulations. Fortunately, machine learning provides a new avenue for addressing this challenge. However, several key issues remain unresolved, including the performance in more general edge removal scenarios, capturing robustness through attack curves instead of directly training for robustness, scalability of predictive tasks, and transferability of predictive capabilities. In this paper, we address these challenges by designing a convolutional neural networks (CNN) model with spatial pyramid pooling networks (SPP-net), adapting existing evaluation metrics, redesigning the attack modes, introducing appropriate filtering rules, and incorporating the value of robustness as training data. The results demonstrate the thoroughness of the proposed CNN framework in addressing the challenges of high computational time across various network types, failure component types and failure scenarios. However, the performance of the proposed CNN model varies: for evaluation tasks that are consistent with the trained network type, the proposed CNN model consistently achieves accurate evaluations of both attack curves and robustness values across all removal scenarios. When the predicted network type differs from the trained network, the CNN model still demonstrates favorable performance in the scenario of random node failure, showcasing its scalability and performance transferability. Nevertheless, the performance falls short of expectations in other removal scenarios. This observed scenario-sensitivity in the evaluation of network features has been overlooked in previous studies and necessitates further attention and optimization. Lastly, we discuss important unresolved questions and further investigation.

Updated: 2024-05-29 02:26:10

标题: 基于空间金字塔池化的卷积神经网络网络鲁棒性评估的全面分析

摘要: 连通性鲁棒性是理解、优化和修复复杂网络的关键方面，传统上通过耗时且常常不切实际的模拟来评估。幸运的是，机器学习为解决这一挑战提供了新途径。然而，仍有几个关键问题尚未解决，包括在更一般的边缘删除场景中的性能，通过攻击曲线捕捉鲁棒性而不是直接训练鲁棒性，预测任务的可扩展性以及预测能力的可转移性。在本文中，我们通过设计一个带有空间金字塔池化网络（SPP-net）的卷积神经网络（CNN）模型，调整现有评估指标，重新设计攻击模式，引入适当的过滤规则，并将鲁棒性价值作为训练数据来解决这些挑战。结果表明，所提出的CNN框架在处理各种网络类型、故障组件类型和故障场景中的高计算时间挑战方面是彻底的。然而，所提出的CNN模型的性能有所不同：对于与训练网络类型一致的评估任务，所提出的CNN模型在所有删除场景中始终能够准确评估攻击曲线和鲁棒性值。当预测的网络类型与训练网络不同时，CNN模型仍然在随机节点故障场景中表现出良好的性能，展示了其可扩展性和性能可转移性。然而，在其他删除场景中，性能低于预期。这种观察到的网络特征评估中的场景敏感性在先前的研究中被忽视，需要进一步关注和优化。最后，我们讨论了一些重要的未解决问题和进一步的研究。

更新时间: 2024-05-29 02:26:10

领域: cs.CV,cs.LG,cs.NI,cs.SI,68T07 (Primary) 90B25, 05C80, 05C82, 90B15, 90B18 (Secondary),I.2.6; G.2.2; J.4; F.2.2

下载: http://arxiv.org/abs/2308.08012v3

Image-of-Thought Prompting for Visual Reasoning Refinement in Multimodal Large Language Models

Recent advancements in Chain-of-Thought (CoT) and related rationale-based works have significantly improved the performance of Large Language Models (LLMs) in complex reasoning tasks. With the evolution of Multimodal Large Language Models (MLLMs), enhancing their capability to tackle complex multimodal reasoning problems is a crucial frontier. However, incorporating multimodal rationales in CoT has yet to be thoroughly investigated. We propose the Image-of-Thought (IoT) prompting method, which helps MLLMs to extract visual rationales step-by-step. Specifically, IoT prompting can automatically design critical visual information extraction operations based on the input images and questions. Each step of visual information refinement identifies specific visual rationales that support answers to complex visual reasoning questions. Beyond the textual CoT, IoT simultaneously utilizes visual and textual rationales to help MLLMs understand complex multimodal information. IoT prompting has improved zero-shot visual reasoning performance across various visual understanding tasks in different MLLMs. Moreover, the step-by-step visual feature explanations generated by IoT prompting elucidate the visual reasoning process, aiding in analyzing the cognitive processes of large multimodal models

Updated: 2024-05-29 02:24:36

标题: 多模态大型语言模型中的视觉推理细化的思维启发图像

摘要: 最近在基于CoT和相关基于理性的工作方面取得了显著进展，大型语言模型（LLMs）在复杂推理任务中的表现得到了显著提高。随着多模式大型语言模型（MLLMs）的发展，增强它们解决复杂多模态推理问题的能力是一个关键的前沿。然而，在CoT中融入多模态理性尚未得到彻底的研究。我们提出了Image-of-Thought（IoT）提示方法，帮助MLLMs逐步提取视觉理性。具体来说，IoT提示可以自动设计基于输入图像和问题的关键视觉信息提取操作。视觉信息的每一步细化都可以识别支持复杂视觉推理问题答案的特定视觉理性。除了文本CoT，IoT同时利用视觉和文本理性帮助MLLMs理解复杂多模态信息。IoT提示已经改进了不同MLLMs在各种视觉理解任务中的零样本视觉推理表现。此外，IoT提示生成的逐步视觉特征解释阐明了视觉推理过程，有助于分析大型多模态模型的认知过程。

更新时间: 2024-05-29 02:24:36

领域: cs.AI,cs.CL,cs.CV

下载: http://arxiv.org/abs/2405.13872v2

A Neural Network Transformer Model for Composite Microstructure Homogenization

Heterogeneity and uncertainty in a composite microstructure lead to either computational bottlenecks if modeled rigorously or to solution inaccuracies in the stress field and failure predictions if approximated. Although methods suitable for analyzing arbitrary and non-linear microstructures exist, their computational cost makes them impractical to use in large-scale structural analysis. Surrogate models or Reduced Order Models (ROMs) commonly enhance efficiencies but are typically calibrated with a single microstructure. Homogenization methods, such as the Mori-Tanaka method, offer rapid homogenization for a wide range of constituent properties. However, simplifying assumptions, like stress and strain averaging in phases, render the consideration of both deterministic and stochastic variations in microstructure infeasible. This paper illustrates a transformer neural network architecture that captures the knowledge of various microstructures and constituents, enabling it to function as a computationally efficient homogenization surrogate model. Given an image or an abstraction of an arbitrary composite microstructure of linearly elastic fibers in an elastoplastic matrix, the transformer network predicts the history-dependent, non-linear, and homogenized stress-strain response. Two methods for encoding microstructure features were tested: calculating two-point statistics using Principal Component Analysis (PCA) for dimensionality reduction and employing an autoencoder with a Convolutional Neural Network (CNN). Both methods accurately predict the homogenized material response. The developed transformer neural network offers an efficient means for microstructure-to-property translation, generalizable and extendable to a variety of microstructures. The paper describes the network architecture, training and testing data generation, and performance under cycling and random loadings.

Updated: 2024-05-29 02:20:25

标题: 一个用于复合微观结构均质化的神经网络变换器模型

摘要: 复合微观结构中的异质性和不确定性会导致严格建模时的计算瓶颈，或者在近似时会导致应力场和破坏预测的不准确性。虽然存在适用于分析任意和非线性微观结构的方法，但它们的计算成本使其在大规模结构分析中难以使用。代理模型或减少阶模型（ROMs）通常可提高效率，但通常校准使用单一微观结构。均匀化方法，如Mori-Tanaka方法，可为各种组分性质提供快速均匀化。然而，简化假设，如阶段中的应力和应变平均，使得在微观结构中考虑确定性和随机变化变得不可行。本文展示了一种变压器神经网络架构，它捕捉了各种微观结构和组分的知识，使其能够作为一个计算效率高的均匀化代理模型。给定一个线性弹性纤维在弹塑性基体中的任意复合微观结构的图像或抽象，变压器网络可以预测历史依赖性、非线性和均匀化的应力应变响应。测试了两种编码微观结构特征的方法：使用主成分分析（PCA）计算两点统计以进行维度降低，以及使用卷积神经网络（CNN）的自动编码器。这两种方法都能准确预测均匀化材料响应。开发的变压器神经网络提供了一种有效的微观结构到性能的转换方法，可泛化和扩展到各种微观结构。本文描述了网络架构、训练和测试数据生成，以及在循环和随机加载下的性能。

更新时间: 2024-05-29 02:20:25

领域: cs.LG,physics.app-ph

下载: http://arxiv.org/abs/2304.07877v2

Spectral-Risk Safe Reinforcement Learning with Convergence Guarantees

The field of risk-constrained reinforcement learning (RCRL) has been developed to effectively reduce the likelihood of worst-case scenarios by explicitly handling risk-measure-based constraints. However, the nonlinearity of risk measures makes it challenging to achieve convergence and optimality. To overcome the difficulties posed by the nonlinearity, we propose a spectral risk measure-constrained RL algorithm, spectral-risk-constrained policy optimization (SRCPO), a bilevel optimization approach that utilizes the duality of spectral risk measures. In the bilevel optimization structure, the outer problem involves optimizing dual variables derived from the risk measures, while the inner problem involves finding an optimal policy given these dual variables. The proposed method, to the best of our knowledge, is the first to guarantee convergence to an optimum in the tabular setting. Furthermore, the proposed method has been evaluated on continuous control tasks and showed the best performance among other RCRL algorithms satisfying the constraints.

Updated: 2024-05-29 02:17:25

标题: 具有收敛保证的谱风险安全强化学习

摘要: 风险受限强化学习（RCRL）领域已经发展出一种有效减少最坏情况发生概率的方法，通过明确处理基于风险测量的约束。然而，风险测量的非线性使得实现收敛和最优性变得具有挑战性。为了克服非线性带来的困难，我们提出了一种谱风险测量受限RL算法，谱风险受限策略优化（SRCPO），这是一种双层优化方法，利用了谱风险测量的对偶性。在双层优化结构中，外部问题涉及优化从风险测量中导出的对偶变量，而内部问题涉及在给定这些对偶变量的情况下找到最优策略。据我们所知，所提出的方法是在表格设置中保证收敛到最优解的第一种方法。此外，所提出的方法已在连续控制任务上进行评估，并在满足约束条件的其他RCRL算法中表现最佳。

更新时间: 2024-05-29 02:17:25

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.18698v1

ResNCT: A Deep Learning Model for the Synthesis of Nephrographic Phase Images in CT Urography

Purpose: To develop and evaluate a transformer-based deep learning model for the synthesis of nephrographic phase images in CT urography (CTU) examinations from the unenhanced and urographic phases. Materials and Methods: This retrospective study was approved by the local Institutional Review Board. A dataset of 119 patients (mean $\pm$ SD age, 65 $\pm$ 12 years; 75/44 males/females) with three-phase CT urography studies was curated for deep learning model development. The three phases for each patient were aligned with an affine registration algorithm. A custom model, coined Residual transformer model for Nephrographic phase CT image synthesis (ResNCT), was developed and implemented with paired inputs of non-contrast and urographic sets of images trained to produce the nephrographic phase images, that were compared with the corresponding ground truth nephrographic phase images. The synthesized images were evaluated with multiple performance metrics, including peak signal to noise ratio (PSNR), structural similarity index (SSIM), normalized cross correlation coefficient (NCC), mean absolute error (MAE), and root mean squared error (RMSE). Results: The ResNCT model successfully generated synthetic nephrographic images from non-contrast and urographic image inputs. With respect to ground truth nephrographic phase images, the images synthesized by the model achieved high PSNR (27.8 $\pm$ 2.7 dB), SSIM (0.88 $\pm$ 0.05), and NCC (0.98 $\pm$ 0.02), and low MAE (0.02 $\pm$ 0.005) and RMSE (0.042 $\pm$ 0.016). Conclusion: The ResNCT model synthesized nephrographic phase CT images with high similarity to ground truth images. The ResNCT model provides a means of eliminating the acquisition of the nephrographic phase with a resultant 33% reduction in radiation dose for CTU examinations.

Updated: 2024-05-29 02:12:44

标题: ResNCT：用于CT泌尿造影中肾功能相图像合成的深度学习模型

摘要: 目的：开发并评估一种基于变压器的深度学习模型，用于在CT尿路造影（CTU）检查中合成肾影相图像，从未增强和尿路相中提取。材料和方法：本回顾性研究已获得当地机构审查委员会的批准。共有119名患者（平均年龄65±12岁；男性/女性75/44）的三相CT尿路造影研究数据集被用于深度学习模型的开发。每位患者的三个相均通过仿射配准算法对齐。开发了一种自定义模型，被称为用于合成肾影相CT图像的残差变压器模型（ResNCT），并采用非对比和尿路图像对作为输入，训练以产生肾影相图像。与相应的真实肾影相图像进行比较。合成图像通过多种性能指标进行评估，包括峰值信噪比（PSNR）、结构相似性指数（SSIM）、归一化互相关系数（NCC）、平均绝对误差（MAE）和均方根误差（RMSE）。结果：ResNCT模型成功地从非对比和尿路图像输入中生成了合成的肾影相图像。相对于真实的肾影相图像，模型合成的图像达到了高PSNR（27.8±2.7 dB）、SSIM（0.88±0.05）和NCC（0.98±0.02），以及低MAE（0.02±0.005）和RMSE（0.042±0.016）。结论：ResNCT模型合成的肾影相CT图像与真实图像具有高度相似性。ResNCT模型提供了一种方法，可消除CTU检查中肾影相的采集，从而使辐射剂量减少33%。

更新时间: 2024-05-29 02:12:44

领域: eess.IV,cs.AI,physics.med-ph,J.3

下载: http://arxiv.org/abs/2405.04629v2

ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention

Recently, linear complexity sequence modeling networks have achieved modeling capabilities similar to Vision Transformers on a variety of computer vision tasks, while using fewer FLOPs and less memory. However, their advantage in terms of actual runtime speed is not significant. To address this issue, we introduce Gated Linear Attention (GLA) for vision, leveraging its superior hardware-awareness and efficiency. We propose direction-wise gating to capture 1D global context through bidirectional modeling and a 2D gating locality injection to adaptively inject 2D local details into 1D global context. Our hardware-aware implementation further merges forward and backward scanning into a single kernel, enhancing parallelism and reducing memory cost and latency. The proposed model, ViG, offers a favorable trade-off in accuracy, parameters, and FLOPs on ImageNet and downstream tasks, outperforming popular Transformer and CNN-based models. Notably, ViG-S matches DeiT-B's accuracy while using only 27% of the parameters and 20% of the FLOPs, running 2$\times$ faster on $224\times224$ images. At $1024\times1024$ resolution, ViG-T uses 5.2$\times$ fewer FLOPs, saves 90% GPU memory, runs 4.8$\times$ faster, and achieves 20.7% higher top-1 accuracy than DeiT-T. These results position ViG as an efficient and scalable solution for visual representation learning. Code is available at \url{https://github.com/hustvl/ViG}.

Updated: 2024-05-29 02:06:30

标题: ViG: 具有门控线性注意力的线性复杂度视觉序列学习

摘要: 最近，线性复杂度序列建模网络在各种计算机视觉任务上实现了类似于Vision Transformers的建模能力，同时使用更少的FLOPs和更少的内存。然而，它们在实际运行速度方面的优势并不显著。为了解决这个问题，我们引入了适用于视觉的门控线性注意力（GLA），利用其优越的硬件感知和效率。我们提出了方向性门控，通过双向建模捕获1D全局上下文，以及2D门控局部注入，自适应地将2D局部细节注入到1D全局上下文中。我们的硬件感知实现进一步将前向和后向扫描合并为一个单一核心，增强了并行性，减少了内存成本和延迟。所提出的模型ViG在ImageNet和下游任务中在准确性、参数和FLOPs上提供了有利的权衡，优于流行的Transformer和基于CNN的模型。值得注意的是，ViG-S仅使用了DeiT-B的27%的参数和20%的FLOPs，在$224\times224$图像上运行速度提高了2倍。在$1024\times1024$分辨率下，ViG-T使用了5.2倍的FLOPs，节省了90%的GPU内存，运行速度提高了4.8倍，并且比DeiT-T高出20.7%的top-1准确度。这些结果将ViG定位为视觉表征学习的高效可扩展解决方案。代码可在\url{https://github.com/hustvl/ViG}上找到。

更新时间: 2024-05-29 02:06:30

领域: cs.CV,cs.AI

下载: http://arxiv.org/abs/2405.18425v2

DeepHGNN: Study of Graph Neural Network based Forecasting Methods for Hierarchically Related Multivariate Time Series

Graph Neural Networks (GNN) have gained significant traction in the forecasting domain, especially for their capacity to simultaneously account for intra-series temporal correlations and inter-series relationships. This paper introduces a novel Hierarchical GNN (DeepHGNN) framework, explicitly designed for forecasting in complex hierarchical structures. The uniqueness of DeepHGNN lies in its innovative graph-based hierarchical interpolation and an end-to-end reconciliation mechanism. This approach ensures forecast accuracy and coherence across various hierarchical levels while sharing signals across them, addressing a key challenge in hierarchical forecasting. A critical insight in hierarchical time series is the variance in forecastability across levels, with upper levels typically presenting more predictable components. DeepHGNN capitalizes on this insight by pooling and leveraging knowledge from all hierarchy levels, thereby enhancing the overall forecast accuracy. Our comprehensive evaluation set against several state-of-the-art models confirm the superior performance of DeepHGNN. This research not only demonstrates DeepHGNN's effectiveness in achieving significantly improved forecast accuracy but also contributes to the understanding of graph-based methods in hierarchical time series forecasting.

Updated: 2024-05-29 02:06:17

标题: DeepHGNN：基于图神经网络的层次相关多变量时间序列预测方法研究

摘要: 图神经网络（GNN）在预测领域获得了显著的关注，特别是因为它们能够同时考虑序列内部的时间相关性和序列间的关系。本文介绍了一种新颖的Hierarchical GNN（DeepHGNN）框架，专门设计用于复杂的层次结构中的预测。DeepHGNN的独特之处在于其创新的基于图的层次插值和端到端的协调机制。这种方法确保了在各种层次上的预测准确性和一致性，同时在它们之间共享信号，解决了层次预测中的一个关键挑战。在层次时间序列中的一个关键洞察是在不同层次间的可预测性差异，通常较高层次呈现更可预测的组件。DeepHGNN利用这一洞察，从所有层次汇聚和利用知识，从而提高整体预测准确性。我们对几种最先进的模型进行了全面评估，证实了DeepHGNN的卓越性能。这项研究不仅展示了DeepHGNN在显著改善预测准确性方面的有效性，还有助于理解基于图的方法在层次时间序列预测中的应用。

更新时间: 2024-05-29 02:06:17

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.18693v1

Efficient Preference-based Reinforcement Learning via Aligned Experience Estimation

Preference-based reinforcement learning (PbRL) has shown impressive capabilities in training agents without reward engineering. However, a notable limitation of PbRL is its dependency on substantial human feedback. This dependency stems from the learning loop, which entails accurate reward learning compounded with value/policy learning, necessitating a considerable number of samples. To boost the learning loop, we propose SEER, an efficient PbRL method that integrates label smoothing and policy regularization techniques. Label smoothing reduces overfitting of the reward model by smoothing human preference labels. Additionally, we bootstrap a conservative estimate $\widehat{Q}$ using well-supported state-action pairs from the current replay memory to mitigate overestimation bias and utilize it for policy learning regularization. Our experimental results across a variety of complex tasks, both in online and offline settings, demonstrate that our approach improves feedback efficiency, outperforming state-of-the-art methods by a large margin. Ablation studies further reveal that SEER achieves a more accurate Q-function compared to prior work.

Updated: 2024-05-29 01:49:20

标题: 通过对齐经验估计实现高效的基于偏好的强化学习

摘要: 基于偏好的强化学习（PbRL）已经显示出在没有奖励工程的情况下训练代理的显著能力。然而，PbRL的一个显著限制是其对大量人类反馈的依赖性。这种依赖性源自学习循环，其中包括准确的奖励学习以及值/策略学习，需要大量的样本。为了提升学习循环，我们提出了SEER，一种高效的PbRL方法，它整合了标签平滑和策略正则化技术。标签平滑通过平滑人类偏好标签来减少奖励模型的过度拟合。此外，我们利用当前回放内存中的支持良好的状态-动作对来引入一个保守估计$\widehat{Q}$，以减轻过度估计偏差，并将其用于策略学习正则化。我们在各种复杂任务上的实验结果，无论是在线还是离线设置，都表明我们的方法改善了反馈效率，大大优于最先进的方法。消融研究进一步揭示，与先前工作相比，SEER实现了更准确的Q函数。

更新时间: 2024-05-29 01:49:20

领域: cs.LG,cs.AI,cs.CL

下载: http://arxiv.org/abs/2405.18688v1

AutoCV: Empowering Reasoning with Automated Process Labeling via Confidence Variation

In this work, we propose a novel method named \textbf{Auto}mated Process Labeling via \textbf{C}onfidence \textbf{V}ariation (\textbf{\textsc{AutoCV}}) to enhance the reasoning capabilities of large language models (LLMs) by automatically annotating the reasoning steps. Our approach begins by training a verification model on the correctness of final answers, enabling it to generate automatic process annotations. This verification model assigns a confidence score to each reasoning step, indicating the probability of arriving at the correct final answer from that point onward. We detect relative changes in the verification's confidence scores across reasoning steps to automatically annotate the reasoning process. This alleviates the need for numerous manual annotations or the high computational costs associated with model-induced annotation approaches. We experimentally validate that the confidence variations learned by the verification model trained on the final answer correctness can effectively identify errors in the reasoning steps. Subsequently, we demonstrate that the process annotations generated by \textsc{AutoCV} can improve the accuracy of the verification model in selecting the correct answer from multiple outputs generated by LLMs. Notably, we achieve substantial improvements across five datasets in mathematics and commonsense reasoning. The source code of \textsc{AutoCV} is available at \url{https://github.com/rookie-joe/AUTOCV}.

Updated: 2024-05-29 01:47:35

标题: AutoCV：通过置信度变化赋能自动化流程标记的推理

摘要: 在这项工作中，我们提出了一种名为\textbf{自动过程标记}通过\textbf{置信度变化}（\textbf{\textsc{AutoCV}}）的新方法，以通过自动注释推理步骤来增强大型语言模型（LLMs）的推理能力。我们的方法首先通过训练一个验证模型来评估最终答案的正确性，使其能够生成自动过程注释。该验证模型为每个推理步骤分配一个置信度分数，表示从那一点开始到达正确最终答案的概率。我们检测验证的置信度分数在推理步骤之间的相对变化，以自动注释推理过程。这减轻了对大量手动注释或与模型诱导注释方法相关的高计算成本的需求。我们在实验证实，验证模型在最终答案正确性训练中学习到的置信度变化能够有效识别推理步骤中的错误。随后，我们证明\textsc{AutoCV}生成的过程注释可以提高验证模型在LLMs生成的多个输出中选择正确答案的准确性。值得注意的是，我们在数学和常识推理的五个数据集上取得了显著的改进。\textsc{AutoCV}的源代码可在\url{https://github.com/rookie-joe/AUTOCV}找到。

更新时间: 2024-05-29 01:47:35

领域: cs.CL,cs.LG

下载: http://arxiv.org/abs/2405.16802v3

Advancing Household Robotics: Deep Interactive Reinforcement Learning for Efficient Training and Enhanced Performance

The market for domestic robots made to perform household chores is growing as these robots relieve people of everyday responsibilities. Domestic robots are generally welcomed for their role in easing human labor, in contrast to industrial robots, which are frequently criticized for displacing human workers. But before these robots can carry out domestic chores, they need to become proficient in several minor activities, such as recognizing their surroundings, making decisions, and picking up on human behaviors. Reinforcement learning, or RL, has emerged as a key robotics technology that enables robots to interact with their environment and learn how to optimize their actions to maximize rewards. However, the goal of Deep Reinforcement Learning is to address more complicated, continuous action-state spaces in real-world settings by combining RL with Neural Networks. The efficacy of DeepRL can be further augmented through interactive feedback, in which a trainer offers real-time guidance to expedite the robot's learning process. Nevertheless, the current methods have drawbacks, namely the transient application of guidance that results in repeated learning under identical conditions. Therefore, we present a novel method to preserve and reuse information and advice via Deep Interactive Reinforcement Learning, which utilizes a persistent rule-based system. This method not only expedites the training process but also lessens the number of repetitions that instructors will have to carry out. This study has the potential to advance the development of household robots and improve their effectiveness and efficiency as learners.

Updated: 2024-05-29 01:46:50

标题: 推进家庭机器人技术：深度交互式强化学习以实现高效训练和增强性能

摘要: 家用机器人市场正不断增长，这些机器人旨在执行家务任务，可以解放人们的日常责任。家用机器人通常受到欢迎，因为它们可以减轻人类劳动负担，与此相反，工业机器人经常因为取代人类工人而受到批评。然而，在这些机器人执行家务任务之前，它们需要熟练掌握几项较小的活动，比如识别周围环境、做决策以及理解人类行为。强化学习（RL）已经成为一种关键的机器人技术，使机器人能够与环境互动，学习如何优化行动以最大化奖励。然而，深度强化学习的目标是通过将RL与神经网络相结合，来解决更复杂、连续的行动状态空间在现实世界中的问题。通过交互式反馈，可以进一步增强DeepRL的效果，即训练员提供实时指导，以加快机器人的学习过程。然而，目前的方法存在缺点，即指导的短暂应用导致在相同条件下重复学习。因此，我们提出了一种通过深度交互式强化学习来保存和重复使用信息和建议的新方法，该方法利用了持久的基于规则的系统。这种方法不仅可以加快训练过程，还可以减少教练员需要执行的重复次数。这项研究有潜力推动家用机器人的发展，提高它们作为学习者的效果和效率。

更新时间: 2024-05-29 01:46:50

领域: cs.RO,cs.LG

下载: http://arxiv.org/abs/2405.18687v1

OLLIE: Imitation Learning from Offline Pretraining to Online Finetuning

In this paper, we study offline-to-online Imitation Learning (IL) that pretrains an imitation policy from static demonstration data, followed by fast finetuning with minimal environmental interaction. We find the na\"ive combination of existing offline IL and online IL methods tends to behave poorly in this context, because the initial discriminator (often used in online IL) operates randomly and discordantly against the policy initialization, leading to misguided policy optimization and $\textit{unlearning}$ of pretraining knowledge. To overcome this challenge, we propose a principled offline-to-online IL method, named $\texttt{OLLIE}$, that simultaneously learns a near-expert policy initialization along with an $\textit{aligned discriminator initialization}$, which can be seamlessly integrated into online IL, achieving smooth and fast finetuning. Empirically, $\texttt{OLLIE}$ consistently and significantly outperforms the baseline methods in $\textbf{20}$ challenging tasks, from continuous control to vision-based domains, in terms of performance, demonstration efficiency, and convergence speed. This work may serve as a foundation for further exploration of pretraining and finetuning in the context of IL.

Updated: 2024-05-29 01:42:39

标题: OLLIE：从离线预训练到在线微调的模仿学习

摘要: 在这篇论文中，我们研究了离线到在线模仿学习（IL），该方法从静态演示数据中预训练一个模仿策略，然后通过最小的环境交互进行快速微调。我们发现现有的离线IL和在线IL方法的天真组合在这种情境下往往表现不佳，因为初始的鉴别器（通常在在线IL中使用）会随机地、不协调地对策略初始化进行操作，导致策略优化错误和预训练知识的“遗忘”。为了克服这一挑战，我们提出了一种有原则的离线到在线IL方法，命名为$\texttt{OLLIE}$，它同时学习一个接近专家策略初始化和一个“对齐的鉴别器初始化”，可以无缝地集成到在线IL中，实现平滑、快速的微调。在实证上，$\texttt{OLLIE}$在$\textbf{20}$个具有挑战性的任务中，从连续控制到基于视觉的领域，在性能、演示效率和收敛速度方面一直明显优于基准方法。这项工作可能为在IL背景下进一步探索预训练和微调奠定基础。

更新时间: 2024-05-29 01:42:39

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.17477v2

How to Leverage Diverse Demonstrations in Offline Imitation Learning

Offline Imitation Learning (IL) with imperfect demonstrations has garnered increasing attention owing to the scarcity of expert data in many real-world domains. A fundamental problem in this scenario is how to extract positive behaviors from noisy data. In general, current approaches to the problem select data building on state-action similarity to given expert demonstrations, neglecting precious information in (potentially abundant) $\textit{diverse}$ state-actions that deviate from expert ones. In this paper, we introduce a simple yet effective data selection method that identifies positive behaviors based on their resultant states -- a more informative criterion enabling explicit utilization of dynamics information and effective extraction of both expert and beneficial diverse behaviors. Further, we devise a lightweight behavior cloning algorithm capable of leveraging the expert and selected data correctly. In the experiments, we evaluate our method on a suite of complex and high-dimensional offline IL benchmarks, including continuous-control and vision-based tasks. The results demonstrate that our method achieves state-of-the-art performance, outperforming existing methods on $\textbf{20/21}$ benchmarks, typically by $\textbf{2-5x}$, while maintaining a comparable runtime to Behavior Cloning ($\texttt{BC}$).

Updated: 2024-05-29 01:41:13

标题: 如何利用多样化的演示在离线模仿学习中发挥作用

摘要: 线下模仿学习（IL）中存在缺陷演示的问题引起了越来越多的关注，这是因为在许多现实世界领域中专家数据稀缺。在这种情况下的一个基本问题是如何从嘈杂的数据中提取积极行为。一般来说，当前解决这个问题的方法选择根据状态-动作相似性来构建数据，忽略了在与专家示范有所偏离的（潜在丰富的）$\textit{多样化}$状态-动作中的宝贵信息。在本文中，我们引入了一种简单而有效的数据选择方法，该方法基于其结果状态识别积极行为--这是一种更具信息性的标准，可以明确利用动力学信息并有效提取专家和有益的多样化行为。此外，我们设计了一种轻量级的行为克隆算法，能够正确利用专家和选择的数据。在实验中，我们在一系列复杂且高维的线下IL基准测试中评估了我们的方法，包括连续控制和基于视觉的任务。结果表明，我们的方法实现了最先进的性能，在$\textbf{20/21}$个基准测试中表现优异，通常是$\textbf{2-5倍}$，同时保持与行为克隆（$\texttt{BC}$）相当的运行时间。

更新时间: 2024-05-29 01:41:13

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.17476v2

ImputeFormer: Low Rankness-Induced Transformers for Generalizable Spatiotemporal Imputation

Missing data is a pervasive issue in both scientific and engineering tasks, especially for the modeling of spatiotemporal data. This problem attracts many studies to contribute to data-driven solutions. Existing imputation solutions mainly include low-rank models and deep learning models. The former assumes general structural priors but has limited model capacity. The latter possesses salient features of expressivity but lacks prior knowledge of the underlying spatiotemporal structures. Leveraging the strengths of both two paradigms, we demonstrate a low rankness-induced Transformer to achieve a balance between strong inductive bias and high model expressivity. The exploitation of the inherent structures of spatiotemporal data enables our model to learn balanced signal-noise representations, making it generalizable for a variety of imputation problems. We demonstrate its superiority in terms of accuracy, efficiency, and versatility in heterogeneous datasets, including traffic flow, solar energy, smart meters, and air quality. Promising empirical results provide strong conviction that incorporating time series primitives, such as low-rankness, can substantially facilitate the development of a generalizable model to approach a wide range of spatiotemporal imputation problems.

Updated: 2024-05-29 01:39:55

标题: ImputeFormer：低秩诱导的变压器用于可泛化时空插补

摘要: 缺失数据是科学和工程任务中普遍存在的问题，特别是在处理时空数据建模时。这个问题吸引了许多研究来贡献数据驱动的解决方案。现有的插补方法主要包括低秩模型和深度学习模型。前者假设一般的结构先验，但模型容量有限。后者具有表达能力的显著特征，但缺乏对底层时空结构的先验知识。利用这两种范式的优势，我们展示了一种低秩诱导的Transformer，实现了强归纳偏差和高模型表达力之间的平衡。利用时空数据的固有结构使我们的模型能够学习平衡的信号-噪声表示，使其能够推广到各种插补问题。我们在包括交通流、太阳能、智能电表和空气质量在内的异构数据集中展示了其在准确性、效率和多功能性方面的优越性。有前景的实证结果强有力地表明，将时间序列基元（如低秩性）纳入模型可以极大地促进开发出一种通用模型，以解决各种时空插补问题。

更新时间: 2024-05-29 01:39:55

领域: cs.LG

下载: http://arxiv.org/abs/2312.01728v3

Federated Offline Policy Optimization with Dual Regularization

Federated Reinforcement Learning (FRL) has been deemed as a promising solution for intelligent decision-making in the era of Artificial Internet of Things. However, existing FRL approaches often entail repeated interactions with the environment during local updating, which can be prohibitively expensive or even infeasible in many real-world domains. To overcome this challenge, this paper proposes a novel offline federated policy optimization algorithm, named $\texttt{DRPO}$, which enables distributed agents to collaboratively learn a decision policy only from private and static data without further environmental interactions. $\texttt{DRPO}$ leverages dual regularization, incorporating both the local behavioral policy and the global aggregated policy, to judiciously cope with the intrinsic two-tier distributional shifts in offline FRL. Theoretical analysis characterizes the impact of the dual regularization on performance, demonstrating that by achieving the right balance thereof, $\texttt{DRPO}$ can effectively counteract distributional shifts and ensure strict policy improvement in each federative learning round. Extensive experiments validate the significant performance gains of $\texttt{DRPO}$ over baseline methods.

Updated: 2024-05-29 01:38:59

标题: 使用双重正则化进行联合离线策略优化

摘要: 联邦强化学习（FRL）被认为是在人工物联网时代智能决策的一种有前途的解决方案。然而，现有的FRL方法通常在本地更新过程中需要与环境进行重复交互，这在许多现实世界领域可能是代价高昂甚至不可行的。为了克服这一挑战，本文提出了一种新颖的离线联邦策略优化算法，命名为$\texttt{DRPO}$，它使分布式代理能够仅从私有和静态数据中协作学习决策策略，而无需进一步与环境进行交互。$\texttt{DRPO}$利用双重正则化，同时整合本地行为策略和全局聚合策略，以明智地处理离线FRL中的固有双层分布变化。理论分析表明，双重正则化对性能的影响，通过实现正确的平衡，$\texttt{DRPO}$能够有效抵消分布变化并确保在每一轮联邦学习中严格的策略改进。大量实验证实了$\texttt{DRPO}$相对于基准方法的显著性能提升。

更新时间: 2024-05-29 01:38:59

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.17474v2

Subset-Based Instance Optimality in Private Estimation

We propose a new definition of instance optimality for differentially private estimation algorithms. Our definition requires an optimal algorithm to compete, simultaneously for every dataset $D$, with the best private benchmark algorithm that (a) knows $D$ in advance and (b) is evaluated by its worst-case performance on large subsets of $D$. That is, the benchmark algorithm need not perform well when potentially extreme points are added to $D$; it only has to handle the removal of a small number of real data points that already exist. This makes our benchmark significantly stronger than those proposed in prior work. We nevertheless show, for real-valued datasets, how to construct private algorithms that achieve our notion of instance optimality when estimating a broad class of dataset properties, including means, quantiles, and $\ell_p$-norm minimizers. For means in particular, we provide a detailed analysis and show that our algorithm simultaneously matches or exceeds the asymptotic performance of existing algorithms under a range of distributional assumptions.

Updated: 2024-05-29 01:37:23

标题: 基于子集的私密估计中的实例最优性

摘要: 我们提出了一个新的差分隐私估计算法实例最优性的定义。我们的定义要求一个最优算法在每个数据集$D$上与最佳的私有基准算法同时竞争，该基准算法(a)提前知道$D$，(b)通过在$D$的大子集上的最坏情况表现进行评估。也就是说，基准算法在潜在的极端点被添加到$D$时不必表现良好；它只需要处理一小部分已经存在的真实数据点的删除。这使得我们的基准比之前的工作中提出的基准要强大得多。尽管如此，我们仍然展示了如何针对实值数据集构建差分隐私算法，使其在估计包括均值、分位数和$\ell_p$-范数最小化器在内的广泛数据集属性时实现我们的实例最优性概念。特别是对于均值，我们提供了详细分析，并展示我们的算法在一系列分布假设下同时匹配或超越现有算法的渐近性能。

更新时间: 2024-05-29 01:37:23

领域: cs.LG,cs.CR,cs.IT,math.IT

下载: http://arxiv.org/abs/2303.01262v3

Momentum-Based Federated Reinforcement Learning with Interaction and Communication Efficiency

Federated Reinforcement Learning (FRL) has garnered increasing attention recently. However, due to the intrinsic spatio-temporal non-stationarity of data distributions, the current approaches typically suffer from high interaction and communication costs. In this paper, we introduce a new FRL algorithm, named $\texttt{MFPO}$, that utilizes momentum, importance sampling, and additional server-side adjustment to control the shift of stochastic policy gradients and enhance the efficiency of data utilization. We prove that by proper selection of momentum parameters and interaction frequency, $\texttt{MFPO}$ can achieve $\tilde{\mathcal{O}}(H N^{-1}\epsilon^{-3/2})$ and $\tilde{\mathcal{O}}(\epsilon^{-1})$ interaction and communication complexities ($N$ represents the number of agents), where the interaction complexity achieves linear speedup with the number of agents, and the communication complexity aligns the best achievable of existing first-order FL algorithms. Extensive experiments corroborate the substantial performance gains of $\texttt{MFPO}$ over existing methods on a suite of complex and high-dimensional benchmarks.

Updated: 2024-05-29 01:36:56

标题: 基于动量的联邦强化学习与交互和通信效率

摘要: 最近，联邦强化学习（Federated Reinforcement Learning，FRL）受到越来越多的关注。然而，由于数据分布的固有时空非稳态性，当前方法通常面临高交互和通信成本的问题。在本文中，我们介绍了一种新的FRL算法，命名为$\texttt{MFPO}$，它利用动量、重要性抽样和额外的服务器端调整来控制随机策略梯度的变化，并增强数据利用效率。我们证明，通过正确选择动量参数和交互频率，$\texttt{MFPO}$可以实现$\tilde{\mathcal{O}}(H N^{-1}\epsilon^{-3/2})$和$\tilde{\mathcal{O}}(\epsilon^{-1})$的交互和通信复杂度（其中$N$表示代理数量），其中交互复杂度随着代理数量的增加呈线性加速，通信复杂度与现有一阶FL算法的最佳可实现复杂度相一致。大量实验证实了$\texttt{MFPO}$在一系列复杂和高维基准测试中相对于现有方法的显著性能提升。

更新时间: 2024-05-29 01:36:56

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.17471v2

3D-GPT: Procedural 3D Modeling with Large Language Models

In the pursuit of efficient automated content creation, procedural generation, leveraging modifiable parameters and rule-based systems, emerges as a promising approach. Nonetheless, it could be a demanding endeavor, given its intricate nature necessitating a deep understanding of rules, algorithms, and parameters. To reduce workload, we introduce 3D-GPT, a framework utilizing large language models~(LLMs) for instruction-driven 3D modeling. 3D-GPT positions LLMs as proficient problem solvers, dissecting the procedural 3D modeling tasks into accessible segments and appointing the apt agent for each task. 3D-GPT integrates three core agents: the task dispatch agent, the conceptualization agent, and the modeling agent. They collaboratively achieve two objectives. First, it enhances concise initial scene descriptions, evolving them into detailed forms while dynamically adapting the text based on subsequent instructions. Second, it integrates procedural generation, extracting parameter values from enriched text to effortlessly interface with 3D software for asset creation. Our empirical investigations confirm that 3D-GPT not only interprets and executes instructions, delivering reliable results but also collaborates effectively with human designers. Furthermore, it seamlessly integrates with Blender, unlocking expanded manipulation possibilities. Our work highlights the potential of LLMs in 3D modeling, offering a basic framework for future advancements in scene generation and animation.

Updated: 2024-05-29 01:35:15

标题: 3D-GPT：利用大型语言模型进行程序化3D建模

摘要: 在追求高效自动化内容创作的过程中，利用可修改参数和基于规则的系统进行程序生成成为一种有前途的方法。尽管如此，这可能是一项具有挑战性的工作，因为其复杂的性质需要对规则、算法和参数有深刻的理解。为了减轻工作量，我们引入了3D-GPT，这是一个利用大型语言模型（LLMs）进行指导驱动的3D建模的框架。3D-GPT将LLMs定位为熟练的问题解决者，将程序化的3D建模任务分解为易访问的部分，并为每个任务指定适当的代理。3D-GPT集成了三个核心代理：任务分派代理、概念化代理和建模代理。它们共同实现了两个目标。首先，它提升了简洁的初始场景描述，将其发展成详细的形式，同时根据后续指令动态地调整文本。其次，它集成了程序生成，从丰富的文本中提取参数值，以便轻松地与3D软件进行资产创建的接口。我们的实证研究证实，3D-GPT不仅解释和执行指令，提供可靠的结果，而且还能有效地与人类设计师合作。此外，它与Blender无缝集成，开启了更广泛的操作可能性。我们的工作突显了LLMs在3D建模中的潜力，为未来在场景生成和动画方面的进展提供了一个基本框架。

更新时间: 2024-05-29 01:35:15

领域: cs.CV,cs.GR,cs.LG

下载: http://arxiv.org/abs/2310.12945v2

Rejection via Learning Density Ratios

Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions. The predominant approach is to alter the supervised learning pipeline by augmenting typical loss functions, letting model rejection incur a lower loss than an incorrect prediction. Instead, we propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance. This can be formalized via the optimization of a loss's risk with a $ \phi$-divergence regularization term. Through this idealized distribution, a rejection decision can be made by utilizing the density ratio between this distribution and the data distribution. We focus on the setting where our $ \phi $-divergences are specified by the family of $ \alpha $-divergence. Our framework is tested empirically over clean and noisy datasets.

Updated: 2024-05-29 01:32:17

标题: 通过学习密度比拒绝

摘要: 带拒绝的分类出现为一种学习范式，允许模型避免做出预测。主要方法是通过改变监督学习流程，通过增加典型损失函数，让模型拒绝预测的损失低于错误预测的损失。相反，我们提出了一个不同的分布视角，我们试图找到一个理想化的数据分布，最大化预训练模型的性能。这可以通过优化损失的风险，加上$ \phi$-散度正则化项来形式化。通过这个理想化分布，可以利用该分布与数据分布之间的密度比来做出拒绝决策。我们关注的设置是我们的$ \phi $-散度由$ \alpha $-散度家族指定。我们的框架在干净和嘈杂的数据集上经过了实证测试。

更新时间: 2024-05-29 01:32:17

领域: stat.ML,cs.LG

下载: http://arxiv.org/abs/2405.18686v1

Security--Throughput Tradeoff of Nakamoto Consensus under Bandwidth Constraints

For Nakamoto's longest-chain consensus protocol, whose proof-of-work (PoW) and proof-of-stake (PoS) variants power major blockchains such as Bitcoin and Cardano, we revisit the classic problem of the security-performance tradeoff: Given a network of nodes with limited capacities, against what fraction of adversary power is Nakamoto consensus (NC) secure for a given block production rate? State-of-the-art analyses of Nakamoto's protocol fail to answer this question because their bounded-delay model does not capture realistic constraints such as limited communication- and computation-resources. We develop a new analysis technique to prove a refined security-performance tradeoff for PoW Nakamoto consensus in a bounded-bandwidth model. In this model, we show that, in contrast to the classic bounded-delay model, Nakamoto's private attack is no longer the worst attack, and a new attack strategy we call the teasing strategy, that exploits the network congestion caused by limited bandwidth, is strictly worse. In PoS, equivocating blocks can exacerbate congestion, making the traditional PoS Nakamoto consensus protocol insecure except at very low block production rates. To counter such equivocation spamming, we present a variant of the PoS NC protocol we call Blanking NC (BlaNC), which achieves the same resilience as PoW NC.

Updated: 2024-05-29 01:21:06

标题: 安全性-吞吐量在带宽约束下的纳卡莫托共识权衡

摘要: 对于Nakamoto的最长链共识协议，其工作量证明（PoW）和权益证明（PoS）变体驱动着诸如比特币和卡尔达诺等主要区块链的协议，我们重新审视了安全性与性能之间的经典问题：在具有有限容量的节点网络中，对于给定的区块生产速率，对于对手实力的什么比例，Nakamoto共识（NC）是安全的？最新的Nakamoto协议分析未能回答这个问题，因为它们的有限延迟模型并未捕捉到现实约束，如有限的通信和计算资源。我们开发了一种新的分析技术，以在有限带宽模型中证明PoW Nakamoto共识的精细安全性-性能权衡。在这个模型中，我们展示了，与经典的有限延迟模型相反，Nakamoto的私有攻击不再是最糟糕的攻击，利用由有限带宽引起的网络拥塞的一种新攻击策略，我们称之为挑逗策略，是严格更糟糕的。在PoS中，模棱两可的区块可能加剧拥塞，使传统的PoS Nakamoto共识协议在非常低的区块生产速率下不安全。为了对抗这种模棱两可的垃圾信息，我们提出了一种我们称之为Blanking NC（BlaNC）的PoS NC协议变体，它实现了与PoW NC相同的弹性。

更新时间: 2024-05-29 01:21:06

领域: cs.CR,cs.DC

下载: http://arxiv.org/abs/2303.09113v3

Can GPT Redefine Medical Understanding? Evaluating GPT on Biomedical Machine Reading Comprehension

Large language models (LLMs) have shown remarkable performance on many tasks in different domains. However, their performance in closed-book biomedical machine reading comprehension (MRC) has not been evaluated in depth. In this work, we evaluate GPT on four closed-book biomedical MRC benchmarks. We experiment with different conventional prompting techniques as well as introduce our own novel prompting method. To solve some of the retrieval problems inherent to LLMs, we propose a prompting strategy named Implicit Retrieval Augmented Generation (RAG) that alleviates the need for using vector databases to retrieve important chunks in traditional RAG setups. Moreover, we report qualitative assessments on the natural language generation outputs from our approach. The results show that our new prompting technique is able to get the best performance in two out of four datasets and ranks second in rest of them. Experiments show that modern-day LLMs like GPT even in a zero-shot setting can outperform supervised models, leading to new state-of-the-art (SoTA) results on two of the benchmarks.

Updated: 2024-05-29 01:12:53

标题: 《GPT能重新定义医学理解吗？评估GPT在生物医学机器阅读理解上的表现》

摘要: 大型语言模型(LLMs)在不同领域的许多任务中展现出了卓越的性能。然而，在封闭书籍生物医学机器阅读理解(MRC)中，它们的性能尚未得到深入评估。在这项工作中，我们评估了GPT在四个封闭书籍生物医学MRC基准上的表现。我们尝试不同的传统提示技术，并引入了我们自己的新颖提示方法。为了解决LLMs固有的一些检索问题，我们提出了一种名为隐式检索增强生成(RAG)的提示策略，从而减轻了在传统RAG设置中使用向量数据库检索重要块的需要。此外，我们对我们方法的自然语言生成输出进行了定性评估。结果显示，我们的新提示技术能在四个数据集中的两个中获得最佳性能，并在其余数据集中排名第二。实验表明，像GPT这样的现代LLMs甚至在零-shot设置中也能胜过监督模型，从而在两个基准测试中取得了新的最新技术(SoTA)结果。

更新时间: 2024-05-29 01:12:53

领域: cs.CL,cs.AI,cs.LG

下载: http://arxiv.org/abs/2405.18682v1

Dynamic Matching Bandit For Two-Sided Online Markets

Two-sided online matching platforms are employed in various markets. However, agents' preferences in the current market are usually implicit and unknown, thus needing to be learned from data. With the growing availability of dynamic side information involved in the decision process, modern online matching methodology demands the capability to track shifting preferences for agents based on contextual information. This motivates us to propose a novel framework for this dynamic online matching problem with contextual information, which allows for dynamic preferences in matching decisions. Existing works focus on online matching with static preferences, but this is insufficient: the two-sided preference changes as soon as one side's contextual information updates, resulting in non-static matching. In this paper, we propose a dynamic matching bandit algorithm to adapt to this problem. The key component of the proposed dynamic matching algorithm is an online estimation of the preference ranking with a statistical guarantee. Theoretically, we show that the proposed dynamic matching algorithm delivers an agent-optimal stable matching result with high probability. In particular, we prove a logarithmic regret upper bound $\mathcal{O}(\log(T))$ and construct a corresponding instance-dependent matching regret lower bound. In the experiments, we demonstrate that dynamic matching algorithm is robust to various preference schemes, dimensions of contexts, reward noise levels, and context variation levels, and its application to a job-seeking market further demonstrates the practical usage of the proposed method.

Updated: 2024-05-29 01:12:01

标题: 动态匹配老虎机对于双边在线市场

摘要: 双边在线匹配平台在各种市场中被使用。然而，在当前市场中，代理的偏好通常是隐含的且未知的，因此需要从数据中学习。随着决策过程中涉及的动态边际信息的增加，现代在线匹配方法需要能够根据上下文信息跟踪代理的偏好变化。这激励我们提出了一个新颖的框架，用于解决具有上下文信息的动态在线匹配问题，该框架允许在匹配决策中存在动态偏好。现有的研究侧重于具有静态偏好的在线匹配，但这是不够的：一方的上下文信息更新后，双边偏好立即发生变化，导致匹配不再是静态的。在本文中，我们提出了一个动态匹配赌博算法来适应这个问题。所提出的动态匹配算法的关键组成部分是对偏好排序的在线估计，并具有统计保证。从理论上讲，我们展示了所提出的动态匹配算法以很高的概率提供了一种代理最优稳定匹配结果。特别地，我们证明了对数遗憾上界为$\mathcal{O}(\log(T))$，并构建了相应的实例相关匹配遗憾下界。在实验中，我们展示了动态匹配算法对于各种偏好方案、上下文维度、奖励噪声水平和上下文变化水平的鲁棒性，以及其在求职市场中的应用进一步展示了所提出方法的实际用途。

更新时间: 2024-05-29 01:12:01

领域: cs.LG,cs.GT,cs.MA,stat.ML

下载: http://arxiv.org/abs/2205.03699v3

A random-key GRASP for combinatorial optimization

This paper proposes a problem-independent GRASP metaheuristic using the random-key optimizer (RKO) paradigm. GRASP (greedy randomized adaptive search procedure) is a metaheuristic for combinatorial optimization that repeatedly applies a semi-greedy construction procedure followed by a local search procedure. The best solution found over all iterations is returned as the solution of the GRASP. Continuous GRASP (C-GRASP) is an extension of GRASP for continuous optimization in the unit hypercube. A random-key optimizer (RKO) uses a vector of random keys to encode a solution to a combinatorial optimization problem. It uses a decoder to evaluate a solution encoded by the vector of random keys. A random-key GRASP is a C-GRASP where points in the unit hypercube are evaluated employing a decoder. We describe random key GRASP consisting of a problem-independent component and a problem-dependent decoder. As a proof of concept, the random-key GRASP is tested on five NP-hard combinatorial optimization problems: traveling salesman problem, tree of hubs location problem, Steiner triple covering problem, node capacitated graph partitioning problem, and job sequencing and tool switching problem.

Updated: 2024-05-29 01:07:38

标题: 一种用于组合优化的随机键GRASP算法

摘要: 本文提出了一种使用随机密钥优化器（RKO）范例的问题无关GRASP元启发式方法。GRASP（贪婪随机自适应搜索过程）是一种用于组合优化的元启发式方法，它重复应用半贪婪构造过程，然后是局部搜索过程。在所有迭代中找到的最佳解作为GRASP的解返回。连续GRASP（C-GRASP）是GRASP的扩展，用于在单位超立方体中进行连续优化。随机密钥优化器（RKO）使用随机密钥向量来编码组合优化问题的解。它使用解码器来评估由随机密钥向量编码的解。随机密钥GRASP是一个在单位超立方体中评估点的C-GRASP，采用解码器。我们描述了一个由问题无关组件和问题相关解码器组成的随机密钥GRASP。作为概念验证，随机密钥GRASP在五个NP困难的组合优化问题上进行了测试：旅行推销员问题、中心位置树问题、Steiner三元覆盖问题、节点容量图分区问题和作业顺序和工具切换问题。

更新时间: 2024-05-29 01:07:38

领域: cs.NE,cs.AI,math.OC,90-02, 90B40, 90C27,G.1.6; G.2.1; I.2.8

下载: http://arxiv.org/abs/2405.18681v1

Navigable Graphs for High-Dimensional Nearest Neighbor Search: Constructions and Limits

There has been significant recent interest in graph-based nearest neighbor search methods, many of which are centered on the construction of navigable graphs over high-dimensional point sets. A graph is navigable if we can successfully move from any starting node to any target node using a greedy routing strategy where we always move to the neighbor that is closest to the destination according to a given distance function. The complete graph is navigable for any point set, but the important question for applications is if sparser graphs can be constructed. While this question is fairly well understood in low-dimensions, we establish some of the first upper and lower bounds for high-dimensional point sets. First, we give a simple and efficient way to construct a navigable graph with average degree $O(\sqrt{n \log n })$ for any set of $n$ points, in any dimension, for any distance function. We compliment this result with a nearly matching lower bound: even under the Euclidean metric in $O(\log n)$ dimensions, a random point set has no navigable graph with average degree $O(n^{\alpha})$ for any $\alpha < 1/2$. Our lower bound relies on sharp anti-concentration bounds for binomial random variables, which we use to show that the near-neighborhoods of a set of random points do not overlap significantly, forcing any navigable graph to have many edges.

Updated: 2024-05-29 01:07:26

标题: 高维最近邻搜索的可导航图：构建和限制

摘要: 最近对基于图的最近邻搜索方法引起了很大的兴趣，其中许多方法都集中在构建高维点集上的可导航图。如果我们可以使用贪婪的路由策略成功地从任何起始节点移动到任何目标节点，其中我们始终移动到距离目的地最近的邻居，那么图就是可导航的。对于任何点集来说，完整图都是可导航的，但对于应用程序来说，重要的问题是是否可以构建更稀疏的图。虽然在低维空间中这个问题已经被很好地理解，但我们为高维点集建立了一些首个上界和下界。首先，我们提供了一种简单而有效的方法来构建一个平均度数为$O(\sqrt{n \log n })$的可导航图，适用于任何维度、任何距离函数的$n$个点集。我们用几乎匹配的下界来补充这个结果：即使在$O(\log n)$维的欧几里德度量下，一个随机点集也没有平均度数为$O(n^{\alpha})$的可导航图，其中$\alpha < 1/2$。我们的下界依赖于二项随机变量的尖锐的反集中度界，我们用它来展示随机点集的近邻域不会有显著的重叠，迫使任何可导航图拥有许多边。

更新时间: 2024-05-29 01:07:26

领域: cs.DS,cs.CG,cs.DB,cs.LG

下载: http://arxiv.org/abs/2405.18680v1

Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks

In this work, we reveal a strong implicit bias of stochastic gradient descent (SGD) that drives overly expressive networks to much simpler subnetworks, thereby dramatically reducing the number of independent parameters, and improving generalization. To reveal this bias, we identify invariant sets, or subsets of parameter space that remain unmodified by SGD. We focus on two classes of invariant sets that correspond to simpler (sparse or low-rank) subnetworks and commonly appear in modern architectures. Our analysis uncovers that SGD exhibits a property of stochastic attractivity towards these simpler invariant sets. We establish a sufficient condition for stochastic attractivity based on a competition between the loss landscape's curvature around the invariant set and the noise introduced by stochastic gradients. Remarkably, we find that an increased level of noise strengthens attractivity, leading to the emergence of attractive invariant sets associated with saddle-points or local maxima of the train loss. We observe empirically the existence of attractive invariant sets in trained deep neural networks, implying that SGD dynamics often collapses to simple subnetworks with either vanishing or redundant neurons. We further demonstrate how this simplifying process of stochastic collapse benefits generalization in a linear teacher-student framework. Finally, through this analysis, we mechanistically explain why early training with large learning rates for extended periods benefits subsequent generalization.

Updated: 2024-05-29 01:03:31

标题: 随机崩溃：梯度噪声如何吸引SGD动态朝着更简单的子网络方向发展

摘要: 在这项工作中，我们揭示了随机梯度下降（SGD）存在的一个强烈的隐性偏见，驱使过度表达的网络变得更简单，从而大幅降低独立参数的数量，并提高泛化能力。为了揭示这种偏见，我们确定了不变集，即SGD不会修改的参数空间的子集。我们关注两类对应于更简单（稀疏或低秩）子网络的不变集，这些子网络在现代架构中常见。我们的分析揭示了SGD对这些更简单不变集的随机吸引性。我们建立了基于损失景观围绕不变集的曲率和随机梯度引入的噪声之间竞争的随机吸引性的充分条件。值得注意的是，我们发现噪声水平的增加会加强吸引性，导致与鞍点或局部最大值的训练损失相关的有吸引力的不变集的出现。我们经验性地观察到在训练好的深度神经网络中存在有吸引力的不变集，这意味着SGD动态通常会崩溃为带有消失或冗余神经元的简单子网络。我们进一步展示了随机崩溃的简化过程如何在线性师生框架中有利于泛化。最后，通过这项分析，我们解释了为什么在延长周期内使用大学习率进行早期训练有利于随后的泛化。

更新时间: 2024-05-29 01:03:31

领域: cs.LG,cs.AI,stat.ML

下载: http://arxiv.org/abs/2306.04251v3

Automated Model Selection for Tabular Data

Structured data in the form of tabular datasets contain features that are distinct and discrete, with varying individual and relative importances to the target. Combinations of one or more features may be more predictive and meaningful than simple individual feature contributions. R's mixed effect linear models library allows users to provide such interactive feature combinations in the model design. However, given many features and possible interactions to select from, model selection becomes an exponentially difficult task. We aim to automate the model selection process for predictions on tabular datasets incorporating feature interactions while keeping computational costs small. The framework includes two distinct approaches for feature selection: a Priority-based Random Grid Search and a Greedy Search method. The Priority-based approach efficiently explores feature combinations using prior probabilities to guide the search. The Greedy method builds the solution iteratively by adding or removing features based on their impact. Experiments on synthetic demonstrate the ability to effectively capture predictive feature combinations.

Updated: 2024-05-29 01:03:16

标题: 表格数据的自动模型选择

摘要: 结构化数据以表格数据集的形式存在，其中包含的特征是独特且离散的，对目标的重要性有个体和相对不同。一个或多个特征的组合可能比简单的单个特征贡献更具预测性和意义。R的混合效应线性模型库允许用户在模型设计中提供这种交互式特征组合。然而，鉴于存在许多特征和可能的相互作用可供选择，模型选择变得成倍困难。我们旨在自动化模型选择过程，以在保持计算成本较低的同时进行表格数据集上的预测，同时结合特征交互。该框架包括两种不同的特征选择方法：基于优先级的随机网格搜索和贪婪搜索方法。基于优先级的方法通过使用先验概率来指导搜索有效地探索特征组合。贪婪方法通过根据其影响逐步添加或删除特征来迭代地构建解决方案。对合成数据的实验表明，能够有效捕捉预测特征组合。

更新时间: 2024-05-29 01:03:16

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2401.00961v2

Optimized Layerwise Approximation for Efficient Private Inference on Fully Homomorphic Encryption

Recent studies have explored the deployment of privacy-preserving deep neural networks utilizing homomorphic encryption (HE), especially for private inference (PI). Many works have attempted the approximation-aware training (AAT) approach in PI, changing the activation functions of a model to low-degree polynomials that are easier to compute on HE by allowing model retraining. However, due to constraints in the training environment, it is often necessary to consider post-training approximation (PTA), using the pre-trained parameters of the existing plaintext model without retraining. Existing PTA studies have uniformly approximated the activation function in all layers to a high degree to mitigate accuracy loss from approximation, leading to significant time consumption. This study proposes an optimized layerwise approximation (OLA), a systematic framework that optimizes both accuracy loss and time consumption by using different approximation polynomials for each layer in the PTA scenario. For efficient approximation, we reflect the layerwise impact on the classification accuracy by considering the actual input distribution of each activation function while constructing the optimization problem. Additionally, we provide a dynamic programming technique to solve the optimization problem and achieve the optimized layerwise degrees in polynomial time. As a result, the OLA method reduces inference times for the ResNet-20 model and the ResNet-32 model by 3.02 times and 2.82 times, respectively, compared to prior state-of-the-art implementations employing uniform degree polynomials. Furthermore, we successfully classified CIFAR-10 by replacing the GELU function in the ConvNeXt model with only 3-degree polynomials using the proposed method, without modifying the backbone model.

Updated: 2024-05-29 01:00:01

标题: Fully Homomorphic Encryption的高效私人推理的优化逐层近似

摘要: 最近的研究探讨了利用同态加密（HE）部署保护隐私的深度神经网络，特别是用于私人推理（PI）。许多作品尝试了逼近感知训练（AAT）方法在PI中，通过改变模型的激活函数为低次多项式，从而更容易在HE上进行计算，允许模型重新训练。然而，由于训练环境的限制，通常需要考虑后训练逼近（PTA），利用现有明文模型的预训练参数而无需重新训练。现有的PTA研究统一将所有层的激活函数逼近到高度以减轻逼近带来的精度损失，导致了显著的时间消耗。本研究提出了一种优化的逐层逼近（OLA），这是一个系统框架，通过在PTA场景中为每个层使用不同的逼近多项式来优化精度损失和时间消耗。为了进行高效的逼近，我们通过考虑每个激活函数的实际输入分布来反映对分类精度的逐层影响，构建优化问题。此外，我们提供了一种动态规划技术来解决优化问题，并在多项式时间内实现了优化的逐层度数。结果，与采用统一次数多项式的现有最新实现相比，OLA方法将ResNet-20模型和ResNet-32模型的推理时间分别减少了3.02倍和2.82倍。此外，我们成功地通过使用提出的方法，仅用3次多项式替换ConvNeXt模型中的GELU函数，在不修改基础模型的情况下对CIFAR-10进行分类。

更新时间: 2024-05-29 01:00:01

领域: cs.CR,cs.AI

下载: http://arxiv.org/abs/2310.10349v3

GPLaSDI: Gaussian Process-based Interpretable Latent Space Dynamics Identification through Deep Autoencoder

Numerically solving partial differential equations (PDEs) can be challenging and computationally expensive. This has led to the development of reduced-order models (ROMs) that are accurate but faster than full order models (FOMs). Recently, machine learning advances have enabled the creation of non-linear projection methods, such as Latent Space Dynamics Identification (LaSDI). LaSDI maps full-order PDE solutions to a latent space using autoencoders and learns the system of ODEs governing the latent space dynamics. By interpolating and solving the ODE system in the reduced latent space, fast and accurate ROM predictions can be made by feeding the predicted latent space dynamics into the decoder. In this paper, we introduce GPLaSDI, a novel LaSDI-based framework that relies on Gaussian process (GP) for latent space ODE interpolations. Using GPs offers two significant advantages. First, it enables the quantification of uncertainty over the ROM predictions. Second, leveraging this prediction uncertainty allows for efficient adaptive training through a greedy selection of additional training data points. This approach does not require prior knowledge of the underlying PDEs. Consequently, GPLaSDI is inherently non-intrusive and can be applied to problems without a known PDE or its residual. We demonstrate the effectiveness of our approach on the Burgers equation, Vlasov equation for plasma physics, and a rising thermal bubble problem. Our proposed method achieves between 200 and 100,000 times speed-up, with up to 7% relative error.

Updated: 2024-05-29 00:47:02

标题: GPLaSDI：基于高斯过程的可解释潜在空间动态识别通过深度自动编码器

摘要: 数值求解偏微分方程（PDEs）可能具有挑战性且计算成本高昂。这导致了简化模型（ROMs）的发展，这些模型精确但比完整模型（FOMs）快。最近，机器学习的进展使得非线性投影方法的创建成为可能，例如潜空间动态识别（LaSDI）。LaSDI使用自动编码器将完整PDE解映射到潜空间，并学习控制潜空间动态的ODE系统。通过在简化的潜空间中插值和解决ODE系统，通过将预测的潜空间动态输入解码器，可以进行快速且准确的ROM预测。在本文中，我们介绍了GPLaSDI，一种基于LaSDI的新框架，依赖于高斯过程（GP）进行潜空间ODE插值。使用GP具有两个显著优势。首先，它可以量化ROM预测的不确定性。其次，利用这种预测不确定性可以通过贪婪选择额外的训练数据点实现有效的自适应训练。这种方法不需要事先知道底层PDE。因此，GPLaSDI具有固有的非侵入性，可应用于没有已知PDE或其残差的问题。我们在Burgers方程、等离子体物理学的Vlasov方程和上升热泡问题上展示了我们方法的有效性。我们提出的方法实现了200至100,000倍的加速，并且相对误差高达7%。

更新时间: 2024-05-29 00:47:02

领域: cs.CE,cs.LG,cs.NA,math.NA

下载: http://arxiv.org/abs/2308.05882v3

Deep Bayesian Filter for Bayes-faithful Data Assimilation

State estimation for nonlinear state space models is a challenging task. Existing assimilation methodologies predominantly assume Gaussian posteriors on physical space, where true posteriors become inevitably non-Gaussian. We propose Deep Bayesian Filtering (DBF) for data assimilation on nonlinear state space models (SSMs). DBF constructs new latent variables $h_t$ on a new latent (``fancy'') space and assimilates observations $o_t$. By (i) constraining the state transition on fancy space to be linear and (ii) learning a Gaussian inverse observation operator $q(h_t|o_t)$, posteriors always remain Gaussian for DBF. Quite distinctively, the structured design of posteriors provides an analytic formula for the recursive computation of posteriors without accumulating Monte-Carlo sampling errors over time steps. DBF seeks the Gaussian inverse observation operators $q(h_t|o_t)$ and other latent SSM parameters (e.g., dynamics matrix) by maximizing the evidence lower bound. Experiments show that DBF outperforms model-based approaches and latent assimilation methods in various tasks and conditions.

Updated: 2024-05-29 00:42:00

标题: 深度贝叶斯滤波器用于贝叶斯可信数据同化

摘要: 非线性状态空间模型的状态估计是一项具有挑战性的任务。现有的同化方法主要假设物理空间上的后验分布为高斯分布，而真实的后验分布则不可避免地成为非高斯分布。我们提出了用于非线性状态空间模型（SSMs）数据同化的深度贝叶斯滤波（DBF）。DBF在一个新的潜在（“时髦”）空间上构建新的潜在变量$h_t$，并同化观测$o_t$。通过（i）限制时髦空间上的状态转移为线性，并（ii）学习高斯逆观测算子$q(h_t|o_t)$，DBF的后验分布始终保持高斯分布。与众不同的是，后验分布的结构设计为后验的递归计算提供了解析公式，而不需要在时间步长上累积蒙特卡洛抽样误差。DBF通过最大化证据下界来寻找高斯逆观测算子$q(h_t|o_t)$和其他潜在SSM参数（例如动态矩阵）。实验证明，DBF在各种任务和条件下优于基于模型的方法和潜在同化方法。

更新时间: 2024-05-29 00:42:00

领域: cs.LG,physics.ao-ph,physics.data-an

下载: http://arxiv.org/abs/2405.18674v1

Watermarking Counterfactual Explanations

The field of Explainable Artificial Intelligence (XAI) focuses on techniques for providing explanations to end-users about the decision-making processes that underlie modern-day machine learning (ML) models. Within the vast universe of XAI techniques, counterfactual (CF) explanations are often preferred by end-users as they help explain the predictions of ML models by providing an easy-to-understand & actionable recourse (or contrastive) case to individual end-users who are adversely impacted by predicted outcomes. However, recent studies have shown significant security concerns with using CF explanations in real-world applications; in particular, malicious adversaries can exploit CF explanations to perform query-efficient model extraction attacks on proprietary ML models. In this paper, we propose a model-agnostic watermarking framework (for adding watermarks to CF explanations) that can be leveraged to detect unauthorized model extraction attacks (which rely on the watermarked CF explanations). Our novel framework solves a bi-level optimization problem to embed an indistinguishable watermark into the generated CF explanation such that any future model extraction attacks that rely on these watermarked CF explanations can be detected using a null hypothesis significance testing (NHST) scheme, while ensuring that these embedded watermarks do not compromise the quality of the generated CF explanations. We evaluate this framework's performance across a diverse set of real-world datasets, CF explanation methods, and model extraction techniques, and show that our watermarking detection system can be used to accurately identify extracted ML models that are trained using the watermarked CF explanations. Our work paves the way for the secure adoption of CF explanations in real-world applications.

Updated: 2024-05-29 00:33:56

标题: 数字水印技术在反事实解释中的应用

摘要: 可解释人工智能（XAI）领域专注于为最终用户提供关于现代机器学习（ML）模型基础的决策过程的解释技术。在庞大的XAI技术领域中，反事实（CF）解释通常受到最终用户的青睐，因为它们通过为受到预测结果影响的个体最终用户提供易于理解和可操作的对照案例，有助于解释ML模型的预测。然而，最近的研究表明，在现实世界应用中使用CF解释存在重大安全问题；特别是，恶意对手可以利用CF解释对专有ML模型进行查询高效的模型提取攻击。在本文中，我们提出了一个模型无关的水印框架（用于向CF解释添加水印），可以用于检测未经授权的模型提取攻击（依赖于带水印的CF解释）。我们的新框架解决了一个双层优化问题，将一个无法区分的水印嵌入到生成的CF解释中，以便使用空假设显著性检验（NHST）方案检测依赖于这些带水印的CF解释的任何未来的模型提取攻击，同时确保这些嵌入的水印不会损害生成的CF解释的质量。我们评估了该框架在各种真实世界数据集、CF解释方法和模型提取技术上的性能，并展示我们的水印检测系统可以准确识别使用带水印CF解释训练的提取的ML模型。我们的工作为在现实世界应用中安全采用CF解释铺平了道路。

更新时间: 2024-05-29 00:33:56

领域: cs.LG,cs.CR,stat.ME

下载: http://arxiv.org/abs/2405.18671v1

Adapting Differentially Private Synthetic Data to Relational Databases

Existing differentially private (DP) synthetic data generation mechanisms typically assume a single-source table. In practice, data is often distributed across multiple tables with relationships across tables. In this paper, we introduce the first-of-its-kind algorithm that can be combined with any existing DP mechanisms to generate synthetic relational databases. Our algorithm iteratively refines the relationship between individual synthetic tables to minimize their approximation errors in terms of low-order marginal distributions while maintaining referential integrity. Finally, we provide both DP and theoretical utility guarantees for our algorithm.

Updated: 2024-05-29 00:25:07

标题: 将差分隐私合成数据调整适应关系数据库

摘要: 现有的差分隐私（DP）合成数据生成机制通常假设存在单一源表。在实践中，数据通常分布在具有表间关系的多个表中。本文介绍了一种独特的算法，可以与任何现有的DP机制结合，生成合成关系数据库。我们的算法通过迭代调整个别合成表之间的关系，以最小化它们在低阶边际分布方面的近似误差，同时保持引用完整性。最后，我们为我们的算法提供了DP和理论效用保证。

更新时间: 2024-05-29 00:25:07

领域: cs.LG,cs.CR,cs.DB

下载: http://arxiv.org/abs/2405.18670v1

Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities

Integrating multiple generative foundation models, especially those trained on different modalities, into something greater than the sum of its parts poses significant challenges. Two key hurdles are the availability of aligned data (concepts that contain similar meaning but is expressed differently in different modalities), and effectively leveraging unimodal representations in cross-domain generative tasks, without compromising their original unimodal capabilities. We propose Zipper, a multi-tower decoder architecture that addresses these concerns by using cross-attention to flexibly compose multimodal generative models from independently pre-trained unimodal decoders. In our experiments fusing speech and text modalities, we show the proposed architecture performs very competitively in scenarios with limited aligned text-speech data. We also showcase the flexibility of our model to selectively maintain unimodal (e.g., text-to-text generation) generation performance by freezing the corresponding modal tower (e.g. text). In cross-modal tasks such as automatic speech recognition (ASR) where the output modality is text, we show that freezing the text backbone results in negligible performance degradation. In cross-modal tasks such as text-to-speech generation (TTS) where the output modality is speech, we show that using a pre-trained speech backbone results in superior performance to the baseline.

Updated: 2024-05-29 00:23:55

标题: 拉链：用于融合多模态的多塔解码器架构

摘要: 整合多个生成基础模型，特别是在不同形式上训练的模型，使之形成一个比各部分之和更大的整体，面临着重要挑战。两个关键障碍是对齐数据的可用性（包含相似含义但在不同形式中表达不同的概念），以及在跨领域生成任务中有效利用单模表示，而不损害其原始的单模能力。我们提出了Zipper，一个多塔解码器架构，通过使用交叉注意力来灵活组合从独立预训练的单模解码器中得到的多模生成模型，从而解决这些问题。在我们融合语音和文本形式的实验中，我们展示了所提出的架构在具有有限对齐文本-语音数据的场景中表现出非常有竞争力的性能。我们还展示了我们的模型的灵活性，通过冻结相应的模态塔（例如文本），选择性地维持单模（例如文本到文本生成）生成性能。在输出模态为文本的跨模任务（如自动语音识别（ASR））中，我们展示了冻结文本骨干会导致性能下降可以忽略不计。在输出模态为语音的跨模任务（如文本到语音生成（TTS））中，我们展示了使用预训练的语音骨干会导致优于基线的性能。

更新时间: 2024-05-29 00:23:55

领域: cs.LG,cs.AI,cs.CL,eess.AS

下载: http://arxiv.org/abs/2405.18669v1

DiffAug: A Diffuse-and-Denoise Augmentation for Training Robust Classifiers

We introduce DiffAug, a simple and efficient diffusion-based augmentation technique to train image classifiers for the crucial yet challenging goal of improved classifier robustness. Applying DiffAug to a given example consists of one forward-diffusion step followed by one reverse-diffusion step. Using both ResNet-50 and Vision Transformer architectures, we comprehensively evaluate classifiers trained with DiffAug and demonstrate the surprising effectiveness of single-step reverse diffusion in improving robustness to covariate shifts, certified adversarial accuracy and out of distribution detection. When we combine DiffAug with other augmentations such as AugMix and DeepAugment we demonstrate further improved robustness. Finally, building on this approach, we also improve classifier-guided diffusion wherein we observe improvements in: (i) classifier-generalization, (ii) gradient quality (i.e., improved perceptual alignment) and (iii) image generation performance. We thus introduce a computationally efficient technique for training with improved robustness that does not require any additional data, and effectively complements existing augmentation approaches.

Updated: 2024-05-29 00:16:25

标题: DiffAug：一种用于训练稳健分类器的扩散和去噪增强技术

摘要: 我们介绍了DiffAug，一种简单而高效的基于扩散的增强技术，用于训练图像分类器，以实现改进分类器稳健性这一至关重要且具有挑战性的目标。将DiffAug应用于给定示例包括一个正向扩散步骤，然后是一个反向扩散步骤。我们使用ResNet-50和Vision Transformer架构全面评估了使用DiffAug训练的分类器，并展示了单步反向扩散在提高对协变量转移、认证对抗性准确性和超出分布检测的稳健性方面的惊人有效性。当我们将DiffAug与其他增强技术（如AugMix和DeepAugment）相结合时，我们展示了进一步改善的稳健性。最后，基于这种方法，我们还改进了分类器引导的扩散，在此过程中我们观察到：(i)分类器泛化能力的提高，(ii)梯度质量的改善（即改进的感知对齐）和(iii)图像生成性能的提高。因此，我们引入了一种在不需要任何额外数据的情况下以改进的稳健性进行训练的计算效率高的技术，有效地补充了现有的增强方法。

更新时间: 2024-05-29 00:16:25

领域: cs.CV,cs.LG

下载: http://arxiv.org/abs/2306.09192v2

Two-sided Competing Matching Recommendation Markets With Quota and Complementary Preferences Constraints

In this paper, we propose a new recommendation algorithm for addressing the problem of two-sided online matching markets with complementary preferences and quota constraints, where agents' preferences are unknown a priori and must be learned from data. The presence of mixed quota and complementary preferences constraints can lead to instability in the matching process, making this problem challenging to solve. To overcome this challenge, we formulate the problem as a bandit learning framework and propose the Multi-agent Multi-type Thompson Sampling (MMTS) algorithm. The algorithm combines the strengths of Thompson Sampling for exploration with a new double matching technique to provide a stable matching outcome. Our theoretical analysis demonstrates the effectiveness of MMTS as it can achieve stability and has a total $\widetilde{\mathcal{O}}(Q{\sqrt{K_{\max}T}})$-Bayesian regret with high probability, which exhibits linearity with respect to the total firm's quota $Q$, the square root of the maximum size of available type workers $\sqrt{K_{\max}}$ and time horizon $T$. In addition, simulation studies also demonstrate MMTS's effectiveness in various settings. We provide code used in our experiments \url{https://github.com/Likelyt/Double-Matching}.

Updated: 2024-05-29 00:13:05

标题: 具有配额和互补偏好约束的双边竞争匹配推荐市场

摘要: 在这篇论文中，我们提出了一种新的推荐算法，用于解决具有互补偏好和配额约束的双边在线匹配市场问题，其中代理的偏好事先未知，并且必须从数据中学习。混合配额和互补偏好约束的存在可能导致匹配过程的不稳定性，使得这个问题具有挑战性。为了克服这一挑战，我们将问题建模为一个赌博学习框架，并提出了多智能体多类型汤普森抽样（MMTS）算法。该算法将探索的汤普森抽样方法与新的双匹配技术相结合，以提供稳定的匹配结果。我们的理论分析证明了MMTS的有效性，它可以以高概率实现稳定性，并具有总$\widetilde{\mathcal{O}}(Q{\sqrt{K_{\max}T}})$-贝叶斯遗憾，这种遗憾随着总公司配额$Q$、可用类型工作人员的最大规模的平方根$\sqrt{K_{\max}}$和时间跨度$T$的增加呈线性关系。此外，模拟研究也展示了MMTS在各种设置中的有效性。我们提供了在实验中使用的代码\url{https://github.com/Likelyt/Double-Matching}。

更新时间: 2024-05-29 00:13:05

领域: stat.ML,cs.GT,cs.LG,cs.MA

下载: http://arxiv.org/abs/2301.10230v3

Dynamics-based Feature Augmentation of Graph Neural Networks for Variant Emergence Prediction

During the COVID-19 pandemic, a major driver of new surges has been the emergence of new variants. When a new variant emerges in one or more countries, other nations monitor its spread in preparation for its potential arrival. The impact of the new variant and the timings of epidemic peaks in a country highly depend on when the variant arrives. The current methods for predicting the spread of new variants rely on statistical modeling, however, these methods work only when the new variant has already arrived in the region of interest and has a significant prevalence. Can we predict when a variant existing elsewhere will arrive in a given region? To address this question, we propose a variant-dynamics-informed Graph Neural Network (GNN) approach. First, we derive the dynamics of variant prevalence across pairs of regions (countries) that apply to a large class of epidemic models. The dynamics motivate the introduction of certain features in the GNN. We demonstrate that our proposed dynamics-informed GNN outperforms all the baselines, including the currently pervasive framework of Physics-Informed Neural Networks (PINNs). To advance research in this area, we introduce a benchmarking tool to assess a user-defined model's prediction performance across 87 countries and 36 variants.

Updated: 2024-05-29 00:10:30

标题: 基于动力学的图神经网络特征增强用于变体出现预测

摘要: 在COVID-19大流行期间，新的暴发主要驱动因素之一是新变体的出现。当一个新的变体在一个或多个国家出现时，其他国家会监测其传播情况，以便做好准备。新变体的影响以及一个国家的流行高峰时间高度取决于变体的到达时间。目前用于预测新变体传播的方法依赖于统计建模，然而，这些方法只有在新变体已经在感兴趣地区到达并具有显著的患病率时才有效。我们能否预测一个在其他地方存在的变体何时会到达特定地区？为了解决这个问题，我们提出了一种基于变体动态信息的图神经网络（GNN）方法。首先，我们推导出适用于大类传染病模型的区域（国家）之间变体患病率动态。这些动态促使引入GNN中的某些特征。我们证明了我们提出的动态信息的GNN优于所有基线模型，包括目前普遍的物理信息神经网络（PINNs）框架。为推动这一领域的研究，我们引入了一个基准测试工具，用于评估用户定义模型在87个国家和36个变体中的预测性能。

更新时间: 2024-05-29 00:10:30

领域: q-bio.PE,cs.LG,physics.soc-ph

下载: http://arxiv.org/abs/2401.03390v2

Decoupled Data Consistency with Diffusion Purification for Image Restoration

Diffusion models have recently gained traction as a powerful class of deep generative priors, excelling in a wide range of image restoration tasks due to their exceptional ability to model data distributions. To solve image restoration problems, many existing techniques achieve data consistency by incorporating additional likelihood gradient steps into the reverse sampling process of diffusion models. However, the additional gradient steps pose a challenge for real-world practical applications as they incur a large computational overhead, thereby increasing inference time. They also present additional difficulties when using accelerated diffusion model samplers, as the number of data consistency steps is limited by the number of reverse sampling steps. In this work, we propose a novel diffusion-based image restoration solver that addresses these issues by decoupling the reverse process from the data consistency steps. Our method involves alternating between a reconstruction phase to maintain data consistency and a refinement phase that enforces the prior via diffusion purification. Our approach demonstrates versatility, making it highly adaptable for efficient problem-solving in latent space. Additionally, it reduces the necessity for numerous sampling steps through the integration of consistency models. The efficacy of our approach is validated through comprehensive experiments across various image restoration tasks, including image denoising, deblurring, inpainting, and super-resolution.

Updated: 2024-05-29 00:09:08

标题: 利用扩散净化实现图像恢复的数据一致性解耦

摘要: 扩散模型最近作为一类强大的深度生成先验而受到关注，在各种图像恢复任务中表现出色，这是因为它们在建模数据分布方面具有出色的能力。为了解决图像恢复问题，许多现有技术通过将额外的似然梯度步骤纳入扩散模型的反向抽样过程中，实现数据一致性。然而，额外的梯度步骤对于实际应用中存在挑战，因为它们带来了大量的计算开销，从而增加了推理时间。当使用加速扩散模型采样器时，它们还会给使用者带来额外的困难，因为数据一致性步骤的数量受到反向抽样步骤的限制。在本研究中，我们提出了一种新颖的基于扩散的图像恢复求解器，通过将反向过程与数据一致性步骤解耦，解决了这些问题。我们的方法涉及在重建阶段和通过扩散净化强制实现先验的细化阶段之间交替进行。我们的方法表现出了通用性，使其非常适用于在潜在空间中高效解决问题。此外，通过整合一致性模型，它减少了大量采样步骤的必要性。通过在各种图像恢复任务中进行全面实验验证了我们方法的有效性，包括图像去噪、去模糊、修补和超分辨率。

更新时间: 2024-05-29 00:09:08

领域: eess.IV,cs.AI,cs.CV,cs.LG,eess.SP

下载: http://arxiv.org/abs/2403.06054v5

PARIS: Personalized Activity Recommendation for Improving Sleep Quality

The quality of sleep has a deep impact on people's physical and mental health. People with insufficient sleep are more likely to report physical and mental distress, activity limitation, anxiety, and pain. Moreover, in the past few years, there has been an explosion of applications and devices for activity monitoring and health tracking. Signals collected from these wearable devices can be used to study and improve sleep quality. In this paper, we utilize the relationship between physical activity and sleep quality to find ways of assisting people improve their sleep using machine learning techniques. People usually have several behavior modes that their bio-functions can be divided into. Performing time series clustering on activity data, we find cluster centers that would correlate to the most evident behavior modes for a specific subject. Activity recipes are then generated for good sleep quality for each behavior mode within each cluster. These activity recipes are supplied to an activity recommendation engine for suggesting a mix of relaxed to intense activities to subjects during their daily routines. The recommendations are further personalized based on the subjects' lifestyle constraints, i.e. their age, gender, body mass index (BMI), resting heart rate, etc, with the objective of the recommendation being the improvement of that night's quality of sleep. This would in turn serve a longer-term health objective, like lowering heart rate, improving the overall quality of sleep, etc.

Updated: 2024-05-29 00:06:24

标题: PARIS: 个性化活动推荐以改善睡眠质量

摘要: 睡眠质量对人们的身体和心理健康有深远影响。睡眠不足的人更容易报告身体和心理困扰、活动受限、焦虑和疼痛。此外，近年来，活动监测和健康跟踪的应用和设备呈爆炸式增长。从这些可穿戴设备收集的信号可以用于研究和改善睡眠质量。本文利用体力活动与睡眠质量之间的关系，通过机器学习技术找到帮助人们改善睡眠的方法。人们通常有几种行为模式，他们的生物功能可以被划分为不同部分。通过对活动数据进行时间序列聚类，我们找到与特定主体最明显行为模式相关的聚类中心。然后为每个聚类中的每种行为模式生成有助于良好睡眠质量的活动方案。这些活动方案被提供给活动推荐引擎，建议主体在日常生活中进行一系列从轻松到强烈的活动。根据主体的生活方式限制，如年龄、性别、体重指数（BMI）、静息心率等，进一步个性化推荐，推荐的目的是改善当晚的睡眠质量。这将进而达到更长期的健康目标，比如降低心率、改善整体睡眠质量等。

更新时间: 2024-05-29 00:06:24

领域: cs.LG,cs.AI,cs.HC

下载: http://arxiv.org/abs/2110.13745v2

Fast Explainability via Feasible Concept Sets Generator

A long-standing dilemma prevents the broader application of explanation methods: general applicability and inference speed. On the one hand, existing model-agnostic explanation methods usually make minimal pre-assumptions about the prediction models to be explained. Still, they require additional queries to the model through propagation or back-propagation to approximate the models' behaviors, resulting in slow inference and hindering their use in time-sensitive tasks. On the other hand, various model-dependent explanations have been proposed that achieve low-cost, fast inference but at the expense of limiting their applicability to specific model structures. In this study, we bridge the gap between the universality of model-agnostic approaches and the efficiency of model-specific approaches by proposing a novel framework without assumptions on the prediction model's structures, achieving high efficiency during inference and allowing for real-time explanations. To achieve this, we first define explanations through a set of human-comprehensible concepts and propose a framework to elucidate model predictions via minimal feasible concept sets. Second, we show that a minimal feasible set generator can be learned as a companion explainer to the prediction model, generating explanations for predictions. Finally, we validate this framework by implementing a novel model-agnostic method that provides robust explanations while facilitating real-time inference. Our claims are substantiated by comprehensive experiments, highlighting the effectiveness and efficiency of our approach.

Updated: 2024-05-29 00:01:40

标题: 通过可行概念集生成器实现快速可解释性

摘要: 长期存在的困境阻碍了解释方法的广泛应用：普适性和推理速度。一方面，现有的与模型无关的解释方法通常对待解释的预测模型做出最小的预设。然而，它们需要通过传播或反向传播向模型提出额外的查询来近似模型的行为，导致推理速度缓慢，阻碍了它们在时间敏感任务中的使用。另一方面，已经提出了各种模型相关的解释方法，可以实现低成本、快速推理，但牺牲了适用性，限制了它们适用于特定模型结构。在本研究中，我们通过提出一种新颖的框架，弥合了模型无关方法的普适性和模型特定方法的高效性之间的差距，而且不对预测模型的结构做出假设，实现了高效的推理，并允许实时解释。为了实现这一目标，我们首先通过一组人类可理解的概念定义解释，并提出一个框架，通过最小可行的概念集阐明模型预测。其次，我们展示了一个最小可行集生成器可以作为预测模型的伴随解释器进行学习，为预测生成解释。最后，我们通过实施一种提供强大解释并促进实时推理的新颖模型无关方法来验证这一框架。我们的主张得到了全面实验证实，突出了我们方法的有效性和效率。

更新时间: 2024-05-29 00:01:40

领域: cs.LG,cs.AI

下载: http://arxiv.org/abs/2405.18664v1